Text extraction is the key step in character recognition; its accuracy relies heavily on locating the text region. In this paper, we propose a new method that finds the text location automatically, in order to solve regional problems such as incompleteness, false position or orientation deviation that occur in low-contrast image text extraction. Firstly, we pre-process the original image, including color space transformation, contrast-limited adaptive histogram equalization, the Sobel edge detector, the morphology method and the eight neighborhood processing method (ENPM), to provide results for comparing the different methods. Secondly, we use the connected component analysis (CCA) method to obtain several connected and non-connected parts, then apply the morphology method and CCA again to the non-connected part to erode some noise, obtaining further connected and non-connected parts. Thirdly, we compute the edge features for all connected areas and use a Support Vector Machine (SVM) to classify the real text regions, obtaining the text location coordinates. Finally, we use the text region coordinates to extract the blocks that include text, then binarize, cluster and recognize all the text information. Lastly, we calculate the precision rate and recall rate to evaluate the method on more than 200 images. The experiments show that the proposed method is robust for low-contrast text images with variations in font size and color, different languages, gloomy environments, etc.
Text information extraction from images and video is an important subject in computer vision, widely used in specific applications including page segmentation, address block location and license plate location. There are many possible sources of variation when extracting text from a shaded or textured background, from low-contrast or complex images, or from images with variations in font size, style, color, orientation and alignment; these variations make automatic text information extraction extremely difficult. Numerous methods have been proposed to detect and recognize text in scene imagery, which can be categorized into edge-based, connected-component-based and texture-based detection. The connected-component-based methods assume that text pixels belonging to the same connected region share common features such as color or gray intensity.
To solve the problem of text recognition in low-contrast color images, we propose a new method for positioning the text region automatically. The method includes color space transformation, edge detection, image enhancement, morphological operations, connected component analysis and an SVM. To evaluate the method efficiently, we collect a new dataset of 200 low-contrast images and compute the precision rate and recall rate. Experiments show that our method not only accurately positions the text area but also gains good results on low-contrast images with different sizes and languages.
The paper is organized as follows. In Section 2, the four steps for positioning the text region (image pre-processing, connected component analysis, edge detection and text region merging) are introduced in detail. In Section 3, text extraction and recognition results using an OCR system are given. In Section 4, experiments and evaluations are presented. In Section 5, the work is summarized.
The goal of our approach is to detect text in low-contrast images without being affected by language, font color or font size. For simplicity, we assume that the text in an image is horizontal, with uniform spacing between words. Locating the text region is divided into four steps: pre-processing; obtaining the connected component parts using the CCA method; first- and second-layer judging to locate the text coordinates; and merging the text regions.
1) Convert Original Image to YUV Color Space Image
RGB color space describes color in a complex way and carries redundant information across its components. Since pixel values in RGB color space are highly correlated, RGB is usually converted into another color space. The YUV color model defines a color space in terms of one luminance component and two chrominance components. Because the low-contrast images we choose have similar colors for the characters and the background, and because the luminance information gives a better result than the color information for further processing, we convert the input image from RGB to YUV and keep only the Y channel.
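The paper does not spell out the conversion it uses; a minimal sketch of extracting the luminance channel, assuming the standard BT.601 weights that define the YUV color model:

```python
# Sketch: extract the luminance (Y) channel from an RGB image using the
# BT.601 weights (an assumption; the paper does not give its exact formula).
def rgb_to_y(image):
    """image: list of rows, each row a list of (R, G, B) tuples in 0-255.
    Returns a 2-D list of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]

# A neutral gray pixel keeps its value; a pure-red pixel maps to 0.299 * 255.
y = rgb_to_y([[(128, 128, 128), (255, 0, 0)]])
```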
2) Enhance the Above Y Channel Image to Increase Image Contrast
Because the text images we choose are low-contrast, we first enhance the image to obtain a better result. Comparing the multi-scale retinex algorithm, intensity adjustment, histogram equalization and contrast-limited adaptive histogram equalization, we find that contrast-limited adaptive histogram equalization gives the best enhancement.
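Contrast-limited adaptive histogram equalization works tile by tile; as an illustration only, the contrast-limiting idea on a single tile (clip the histogram, redistribute the excess, then equalize) can be sketched as follows. The clip limit of 40 is an arbitrary assumption, not a value from the paper:

```python
def clip_limited_equalize(tile, clip_limit=40, levels=256):
    """Histogram-equalize one tile with the histogram clipped at clip_limit;
    the clipped excess is redistributed uniformly (the contrast-limited step)."""
    flat = [p for row in tile for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Clip the histogram and pool the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit
    # Redistribute the excess uniformly over all bins.
    hist = [h + excess // levels for h in hist]
    # Build the equalization mapping from the cumulative histogram.
    total = sum(hist)
    cdf, acc = [], 0
    for h in hist:
        acc += h
        cdf.append(round((levels - 1) * acc / total))
    return [[cdf[p] for p in row] for row in tile]
```

Full CLAHE would apply this per tile and bilinearly interpolate the mappings between tile centers; the sketch shows only the per-tile step.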
3) Use the Sobel Edge Detector to Detect Edges in the Enhanced Image
After enhancing the image, we need an edge operator to detect the image edges. In our experiments, the Sobel operator produced the clearest edge response among the detectors we compared, so we adopt it.
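A self-contained sketch of the Sobel operator on a gray image, using the standard 3x3 kernels and the common |Gx| + |Gy| magnitude approximation:

```python
def sobel_edges(img):
    """Approximate gradient magnitude |Gx| + |Gy| with 3x3 Sobel kernels.
    img: 2-D list of gray values; returns an edge-strength map (borders 0)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    r, c = len(img), len(img[0])
    out = [[0] * c for _ in range(r)]
    for i in range(1, r - 1):
        for j in range(1, c - 1):
            gx = sum(kx[u][v] * img[i - 1 + u][j - 1 + v]
                     for u in range(3) for v in range(3))
            gy = sum(ky[u][v] * img[i - 1 + u][j - 1 + v]
                     for u in range(3) for v in range(3))
            out[i][j] = abs(gx) + abs(gy)
    return out

# A vertical step edge yields a strong response along the step.
edges = sobel_edges([[0, 0, 255, 255]] * 4)
```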
4) Use Morphology Method (MM) and Eight Neighborhood Processing Method (ENPM) to Enhance Edge Information
Observing the Sobel detector results, we found that the edge density is very weak, so we adopt the morphology method together with the eight neighborhood processing method to enhance the edge information.
The ENPM algorithm is given below.
CCA can be regarded as a graph algorithm in which subsets of connected components are uniquely labeled based on heuristics about feature consensus, i.e., color similarity and spatial layout. Implementations of CCA often use syntactic pattern recognition methods to analyze spatial and feature consensus and to define text regions. Given the complexity of fine-tuning the syntactic rules, a newer trend is to perform CCA with statistical models.
The basic steps of the connected-component text extraction algorithm are given below. 1) Convert the input image to YUV color space (luminance + chrominance) and use only the luminance channel for further processing; 2) Convert the image to gray scale using only the Y channel; 3) Compute the edge image for the Y-channel gray image; 4) Sharpen the edge image by convolving it with a sharpening filter; 5) Compute horizontal and vertical projection profiles, taking the sharpened edge image as the input intensity image; 6) Segment the candidate text regions based on adaptive threshold values calculated for the vertical and horizontal projections; 7) Perform gap filling to eliminate possible non-text regions.
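The connected parts used throughout can be obtained with a standard labeling pass. A minimal 8-connected BFS labeling sketch, for illustration only (not the paper's implementation):

```python
from collections import deque

def connected_components(img):
    """Label the 8-connected components of a binary image.
    Returns the label map and the number of components found."""
    r, c = len(img), len(img[0])
    labels = [[0] * c for _ in range(r)]
    n = 0
    for i in range(r):
        for j in range(c):
            if img[i][j] and not labels[i][j]:
                n += 1                      # start a new component
                q = deque([(i, j)])
                labels[i][j] = n
                while q:                    # flood-fill by BFS
                    y, x = q.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < r and 0 <= nx < c
                                    and img[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = n
                                q.append((ny, nx))
    return labels, n
```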
Through the above steps, we can use the CCA method to remove partially overlapping blocks from the connected components.
Algorithm (ENPM):
Input: ImgSobelMorph, the image produced from the source image by steps one to three and the morphology method; r, the number of rows of the image; c, the number of columns.
Procedure:
  Let ImgENP be an r x c array initialized to zero.
  For i = 2 to r - 1:
    For j = 2 to c - 1:
      If the sum of ImgSobelMorph(i-1:i+1, j-1:j+1) is greater than 1, set ImgENP(i, j) = 1.
Output: ImgENP, the image after being processed by the ENPM algorithm.

The corresponding figure shows the overlap blocks. After removing the overlap regions, we extract the remaining regions except for those already obtained by CCA.
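The ENPM pseudocode above can be written as a runnable sketch, assuming a binary edge image:

```python
def enpm(img):
    """Eight Neighborhood Processing Method: a pixel is set to 1 when the
    3x3 neighborhood around it in the input binary image sums to more than 1."""
    r, c = len(img), len(img[0])
    out = [[0] * c for _ in range(r)]
    for i in range(1, r - 1):
        for j in range(1, c - 1):
            s = sum(img[i + di][j + dj]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1))
            if s > 1:
                out[i][j] = 1
    return out
```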
The types of split overlap region are as follows, in which the yellow region is the first block and the white region is the second block. According to the dual law, there are four types.
1) First Layer Judge (FLJ)
Reading 200 text images and using the above method, we obtain many connected component regions. By observing the differences between text regions and non-text regions, we find that regions whose height or width is less than 22 pixels, or whose height-to-width ratio is greater than 10 or less than 0.1, are generally non-text regions. The results are shown in the corresponding figure.
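The first-layer judge reduces to a simple geometric filter; a sketch using the thresholds stated above (boxes represented as hypothetical (height, width) pairs):

```python
def first_layer_judge(regions):
    """Discard candidate regions that are very likely non-text:
    height or width below 22 pixels, or height/width ratio above 10
    or below 0.1 (thresholds taken from the paper's observations)."""
    kept = []
    for h, w in regions:
        if h < 22 or w < 22:
            continue                 # too small to be a text line
        ratio = h / w
        if ratio > 10 or ratio < 0.1:
            continue                 # too elongated to be text
        kept.append((h, w))
    return kept
```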
2) Second Layer Judge (SLJ)
a) Get the Edge Detector Image
Edges are a distinctive characteristic that can be used to find possible text areas. Text is mainly composed of strokes in the horizontal, vertical, up-right and up-left directions, so regions with higher edge strength in these directions can be considered text regions. To obtain better edge strength, we ran many experiments to see which style of computation performs best.
The resulting edge maps are shown in the corresponding figure.
b) Detect Candidate Text Regions Based on the Edge Map Image
To some extent, text has a weak and irregular texture property, so text can be viewed as a special texture. We employ statistical features of the edge maps to capture this texture property: the mean, standard deviation, energy, inertia, local homogeneity and correlation of the edge maps.
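The paper does not give its exact feature definitions; a common way to compute such statistics is from a normalized gray-level co-occurrence matrix (Haralick-style). A sketch under that assumption, with the correlation feature omitted for brevity:

```python
def glcm_features(img, levels=8):
    """Texture statistics from a normalized gray-level co-occurrence matrix
    built with a horizontal (0, 1) neighbor offset. These are standard
    Haralick-style forms, assumed rather than taken from the paper."""
    # Build the co-occurrence matrix over horizontal pixel pairs.
    P = [[0.0] * levels for _ in range(levels)]
    total = 0
    for row in img:
        for a, b in zip(row, row[1:]):
            P[a][b] += 1
            total += 1
    P = [[v / total for v in row] for row in P]
    # Marginal mean and standard deviation of the matrix.
    mean = sum(i * P[i][j] for i in range(levels) for j in range(levels))
    var = sum((i - mean) ** 2 * P[i][j]
              for i in range(levels) for j in range(levels))
    std = var ** 0.5
    energy = sum(v * v for row in P for v in row)
    inertia = sum((i - j) ** 2 * P[i][j]
                  for i in range(levels) for j in range(levels))
    homogeneity = sum(P[i][j] / (1 + (i - j) ** 2)
                      for i in range(levels) for j in range(levels))
    return {"mean": mean, "std": std, "energy": energy,
            "inertia": inertia, "homogeneity": homogeneity}
```

A perfectly uniform block gives energy 1 and inertia 0, while a textured (text-like) block spreads mass off the diagonal, raising inertia and lowering energy.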
After computing the features, they form the feature-vector representation for each block; we then use a Support Vector Machine (SVM) classifier to separate the feature vectors into two clusters: text regions and background regions.
Merging text regions mainly relies on their locations: we find neighboring regions, compute the distances between nearby regions, and merge those regions whose distance is small.
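A greedy sketch of this merging step on axis-aligned boxes; the gap threshold of 10 pixels is an illustrative assumption, not a value from the paper:

```python
def merge_regions(boxes, gap=10):
    """Greedily merge horizontally adjacent boxes (x1, y1, x2, y2) whose
    horizontal gap is at most `gap` and whose vertical extents overlap."""
    boxes = sorted(boxes)
    merged = []
    for b in boxes:
        if merged:
            x1, y1, x2, y2 = merged[-1]
            bx1, by1, bx2, by2 = b
            # Close enough horizontally and overlapping vertically: merge.
            if bx1 - x2 <= gap and not (by1 > y2 or by2 < y1):
                merged[-1] = (x1, min(y1, by1), max(x2, bx2), max(y2, by2))
                continue
        merged.append(b)
    return merged
```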
Through the method proposed in Section 2, we can successfully locate the text and then extract the text regions based on their coordinates. For each text region, we first use a global threshold method to obtain a binarized image, then use a mean filter to remove small spot pixels and obtain a smooth image. After that, we use an OCR system to recognize the text, whether Chinese or English.
(Figure: examples of classified text regions and non-text regions.)
Here we collect a new dataset of 200 images with various degrees of low contrast. To evaluate our method, we compute the precision rate and recall rate, and make comparisons with other methods.
The performance of each algorithm has been evaluated based on its precision rate, recall rate and average run time. The precision and recall rates are calculated as follows:
Precision Rate = Correctly detected / (Correctly detected + False positives) × 100% (1)
Method | Test Images Number | Precision Rate (%) | Recall Rate (%) | Average Runtime (s) |
---|---|---|---|---|
Our Method | 200 | 76.5 | 60.39 | 27 |
Text | Chinese | English |
---|---|---|
Number of characters | 2281 | 1304 |
Correctly detected | 1924 | 1271 |
Missed | 357 | 33 |
Number of false alarm regions | 490 | |
Missed rate (%) | 15.6 | 2.5 |
Recall Rate = Correctly detected / (Correctly detected + False negatives) × 100% (2)
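Equations (1) and (2) translate directly into code:

```python
def precision_recall(correct, false_pos, false_neg):
    """Precision and recall in percent, per Equations (1) and (2):
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = correct / (correct + false_pos) * 100
    recall = correct / (correct + false_neg) * 100
    return precision, recall
```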
The computed results are reported in the tables above.
In this paper, we provide a new method to find the text region in low-contrast images automatically. First, we use only the luminance (Y) channel for further processing, and use contrast-limited adaptive histogram equalization to enhance the image. Then we use the connected component analysis (CCA) method to analyze the locations of the connected parts, removing inner or border parts so as to reduce the number of connected parts. Thirdly, we compute the edge features for all connected parts and use a Support Vector Machine (SVM) to obtain the real text regions. Finally, we merge the text regions to extract the blocks that include text, and use an OCR system to recognize all the text information. To evaluate our method efficiently, we collect a new dataset of 200 low-contrast images and compute the precision rate and recall rate. Experiments show that our method not only accurately positions the text area but also gains good results on low-contrast images with different sizes and languages.
This work was supported by Chinese National Natural Science Foundation (No. 11161055) and Program for Innovative Research Team (in Science and Technology) in University of Yunnan Province.
Liu, G.Q., Jiang, M.R., Cun, H.L., Shi, Z.Z. and Hao, J.Y. (2017) An Automatic Text Region Positioning Method for the Low-Contrast Image. Journal of Computer and Communications, 5, 36-49. https://doi.org/10.4236/jcc.2017.510005