Journal of Software Engineering and Applications, 2012, 5, 101-106
doi:10.4236/jsea.2012.512b020 Published Online December 2012 (http://www.scirp.org/journal/jsea)
Copyright © 2012 SciRes. JSEA

A Combination of Feature Selection and Co-occurrence Matrix Methods for Leukocyte Recognition System

Lina, Arlends Chris, Bagus Mulyawan

1 Faculty of Information Technology, Tarumanagara University, Jakarta, Indonesia; 2 Faculty of Medicine, Tarumanagara University, Jakarta, Indonesia.
Email: lina@untar.ac.id

Received 2012.

ABSTRACT

A leukocyte recognition system, as part of a differential blood counter system, is very important in the field of hematology. In this paper, the proposed system aims to automatically classify the white blood cells (leukocytes) in a given microscopic image. The leukocytes are classified based on a combination of color and texture features of the blood cell images. The developed system assigns each leukocyte to one of five categories (neutrophils, eosinophils, basophils, lymphocytes, and monocytes). In the preprocessing stage, the system converts the microscopic images from the Red Green Blue (RGB) color space to the Hue Saturation Value (HSV) color space. Next, the system splits the Hue and Saturation features from the Value feature. For both the Hue and Saturation features, the system processes the color information using the Feature Selection method and the Window Cropping method, while the texture information of the Value feature is processed using the Co-occurrence matrix method. The final recognition stage is performed using the Euclidean distance method. The combination of the Feature Selection and Co-occurrence Matrix methods gives the best overall recognition accuracies for classifying leukocyte images.

Keywords: Leukocyte recognition; White blood cell; Microscopic image; Feature selection; Co-occurrence matrix.

1. Introduction

One of the important issues in hematology is how to accurately diagnose disorders of the hematopoietic system. While manual screening and evaluation by a hematologist using a microscope is relatively accurate, it is a highly complex and time-consuming task.

The blood elements include erythrocytes (red cells), leukocytes (white cells), and platelets. Red blood cells are the most numerous cells in the blood and are required for tissue respiration [1]. In contrast to red cells, normal white blood cells are nucleated and include neutrophils, lymphocytes, monocytes, eosinophils, and basophils [2]. White blood cells serve in immune function. Meanwhile, platelets function in coagulation and hemostasis.

The blood cell reports from a medical laboratory can be categorized into two areas: 1) a standard count for red blood cells, white blood cells, and platelets, and 2) a differential count for white blood cells. Hematologists also need to detect blood disorders, and the leukocyte count is used to determine the presence of an infection in the human body. Since the task is tedious and time-consuming, an automated system is necessary and helpful.

Several researchers have proposed various methods to recognize white blood cells. However, up to now no automatic system exists that can recognize and count blood cells with an accuracy comparable to that of a human expert [3]. Some attempts to solve this problem have been proposed, such as the work by Markiewicz using the Support Vector Machine method [2-3], Colunga with the EM algorithm [4], and Neural Network-based classifiers [5-6].

In this paper, the leukocyte recognition system is developed using the combination of the Feature Selection and the Co-occurrence matrix methods. The proposed system works based on the similarity of image color and texture.
In the first stage, each pixel of the RGB (Red, Green, Blue) image is transformed into the HSV (Hue, Saturation, Value) color space. Next, Hue and Saturation are processed as color features, while Value, which carries the texture information, is processed using the Co-occurrence matrix method. Finally, the Euclidean distance method is applied for recognition. Figure 1 depicts the proposed leukocyte recognition system.

The remainder of this paper is organized as follows. Section 2 describes the methods used in the proposed leukocyte recognition system, i.e. the color space transformation from RGB to HSV, the Feature Selection method, the Window Cropping method, the Co-occurrence matrix method for processing texture characteristics, and the Euclidean distance method for recognition. Section 3 presents the experimental setup and results. Finally, the conclusion is presented in Section 4.

[Figure 1: block diagram with the blocks Training Images, Testing Images, Color Space Transformation, Color Feature, Texture Feature, Feature Selection, Co-occurrence Matrix, Probability Matrix, Haralick Features, Composing Feature Vectors, Feature Vectors, Euclidean Distance, and Recognition Results.]

Figure 1. The proposed leukocyte recognition system.

2. The Leukocyte Recognition System

2.1. RGB to HSV Color Space Transformation

In the developed system, a color space transformation from the RGB domain to the HSV domain is performed in the pre-processing stage. HSV (Hue, Saturation, Value) is a common color space that is popular in graphics applications. It is preferred here over the RGB (Red, Green, Blue) color space because it is more perceptually relevant than the Cartesian RGB representation.
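As a rough illustration of this pre-processing step, the sketch below converts a single RGB pixel to HSV following Equations (1) to (5) of this subsection. The function name `rgb_to_hsv_degrees`, the convention of returning H = 0 for gray pixels, and the wrap of negative hues by adding 360° are assumptions made for the example; the paper does not state how these cases are handled. The standard-library `colorsys` module is used only as a cross-check.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert one RGB pixel (channels in [0, 1]) to (H in degrees, S, V)."""
    v = max(r, g, b)                      # Equation (1)
    mn = min(r, g, b)
    delta = v - mn
    s = 0.0 if v == 0 else delta / v      # Equation (2); black pixel -> S = 0 (assumption)
    if delta == 0:
        h = 0.0                           # gray pixel: hue undefined, 0 by convention (assumption)
    elif v == r:
        h = 60.0 * ((g - b) / delta)      # Equation (3)
    elif v == g:
        h = 60.0 * (2.0 + (b - r) / delta)  # Equation (4)
    else:
        h = 60.0 * (4.0 + (r - g) / delta)  # Equation (5)
    if h < 0:
        h += 360.0                        # wrap negative hues into [0, 360) (assumption)
    return h, s, v


if __name__ == "__main__":
    pixel = (0.2, 0.4, 0.8)
    print(rgb_to_hsv_degrees(*pixel))
    # colorsys returns hue as a fraction of a full turn, so scale by 360 to compare.
    ref_h, ref_s, ref_v = colorsys.rgb_to_hsv(*pixel)
    print((ref_h * 360.0, ref_s, ref_v))
```

Applying the function to every pixel yields separate Hue, Saturation, and Value images, which is the form the later stages operate on.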
The HSV color space consists of three components: 1) Hue, which identifies the dominant color, 2) Saturation, which describes how pure the color is relative to white and the monochromatic (gray) tones, and 3) Value, which corresponds to the brightness level. To convert an image from the RGB color space (R, G, B in [0, 1]) to the HSV color space (H in [0°, 360°]; S and V in [0, 1]), we essentially follow the steps listed below:

$$V = \mathrm{Max}(R, G, B) \tag{1}$$

$$S = \frac{V - \mathrm{Min}(R, G, B)}{V} \tag{2}$$

If Max = R, then

$$H = \frac{G - B}{\mathrm{Max} - \mathrm{Min}} \times 60 \tag{3}$$

If Max = G, then

$$H = \left(2 + \frac{B - R}{\mathrm{Max} - \mathrm{Min}}\right) \times 60 \tag{4}$$

If Max = B, then

$$H = \left(4 + \frac{R - G}{\mathrm{Max} - \mathrm{Min}}\right) \times 60 \tag{5}$$

with H = Hue, S = Saturation, V = Value, R = intensity of the red component of the pixel, G = intensity of the green component, and B = intensity of the blue component. Max and Min are shorthand for Max(R, G, B) and Min(R, G, B). A negative H from Equation (3) is conventionally wrapped into range by adding 360°.

2.2. Feature Selection Method

Feature selection aims to reduce the dimension of the input image. With feature selection, the system chooses a subset of features and processes only those features in the recognition process. To reduce the dimension of the color features, we propose two steps of reduction: 1) process only the Hue and the Saturation values, and 2) compose small windows for each feature.

First, several 4x4 pixel windows are created and arranged so that they do not overlap each other within a single image. Next, the values in each window are summed and averaged, for both the Hue and Saturation images. These average Hue and Saturation values are then passed to the recognition stage. Figure 2 illustrates the feature selection process of an image.

Figure 2. The feature selection process of an image.

2.3. Window Cropping Method

Unlike the Feature Selection method, the window cropping technique aims to adjust the cell block (window) sizes of the testing images to match those of the training images. This adjustment is necessary because the segmentation process is quite likely to produce cell blocks of different sizes. The window cropping process is performed by comparing the size of the testing image with the training images. Then, the larger image is cropped to the minimum size between those images.

2.4. Co-occurrence Matrix Method

For processing the texture information of the microscopic images, the Co-occurrence Matrix method is applied. A co-occurrence matrix is constructed by counting how often pairs of gray-scale values occur together in an image. The matrix is derived from the angular relationship between neighboring pixels as well as the distance between them. The more intensity levels an image contains, the larger the resulting co-occurrence matrix. First, the frequency f(i, j) of each index pair (i, j) is normalized into a probability value p(i, j) [7]:

$$p(i,j) = \frac{f(i,j)}{\sum_{i}\sum_{j} f(i,j)} \tag{6}$$

Next, the characteristic values, known as the Haralick features, are obtained by processing the probability values of the co-occurrence matrix. In the proposed system, five characteristic features are processed, i.e. entropy, contrast, homogeneity, energy, and correlation.

Entropy is used to measure the randomness of the intensity distribution. The entropy value is calculated by:

$$\mathrm{Entropy} = -\sum_{i=0}^{I}\sum_{j=0}^{J} p(i,j)\,\log p(i,j) \tag{7}$$

Another feature is the image contrast, which is used to measure the power of intensity differences in an image. The contrast value is calculated by:

$$\mathrm{Contrast} = \sum_{i=0}^{I}\sum_{j=0}^{J} (i-j)^{2}\, p(i,j) \tag{8}$$

Homogeneity measures the uniformity of intensity variations in an image and behaves inversely to the image contrast. The equation for calculating the homogeneity is as follows:

$$\mathrm{Homogeneity} = \sum_{i=0}^{I}\sum_{j=0}^{J} \frac{p(i,j)}{1 + |i-j|} \tag{9}$$

Moreover, energy, as the fourth feature, is used to measure the texture uniformity.
The energy value is calculated by:

$$\mathrm{Energy} = \sum_{i=0}^{I}\sum_{j=0}^{J} p(i,j)^{2} \tag{10}$$

Finally, the correlation value is used to describe the relation between each pixel value and its neighbors. The correlation value is calculated by:

$$\mathrm{Correlation} = \frac{\sum_{i=0}^{I}\sum_{j=0}^{J} \left[(ij)\,p(i,j)\right] - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{11}$$

where μx, μy and σx, σy are the means and standard deviations of the row and column marginal distributions of p(i, j).

2.5. Euclidean Distance

As the final stage, in the recognition process, the dissimilarities between the testing images and the training images are calculated using the Euclidean distance measurement, as follows [7]:

$$r = \lVert x - w \rVert \tag{12}$$

where r is the Euclidean distance between the testing vector x and the training vector w. A small r value indicates a high similarity between two images.

3. Experimental Setup and Results

We conducted several experiments to test the recognition ability of the proposed system. We created our own blood cell images and developed the FTI-Untar blood cells database. Figure 3 shows examples of each leukocyte type used in the experiments. The FTI-Untar blood cells database consists of two datasets: 1) Dataset 1, which consists of 500 blood cell images, with 266 neutrophil images, 122 lymphocyte images, 94 monocyte images, 13 eosinophil images, and 5 basophil images; and 2) Dataset 2, which contains 135 neutrophil images, 52 lymphocyte images, 43 monocyte images, 5 eosinophil images, and 5 basophil images.

In the experiments, 80% of the images from the database are used as training data, and the remaining images are used for testing. We conducted several experiments with different targets. First, we evaluated the performance of the combination of the Feature Selection and the Co-occurrence matrix methods, and of the combination of the Window Cropping and the Co-occurrence matrix methods, with two different window sizes: 1) 47x47 pixels, and 2) 57x57 pixels.

The recognition accuracies for Dataset 1 with two different image sizes are presented in Table 1 for 47x47 pixels and Table 2 for 57x57 pixels.
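The feature extraction and recognition stages evaluated in these experiments (Sections 2.2, 2.4, and 2.5) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the single pixel offset (one step to the right), the use of the natural logarithm, the handling of partial windows at the image border, and the value returned for the correlation of a constant image are all choices made for the example, since the paper does not specify them.

```python
import math


def window_means(channel, size=4):
    """Average each non-overlapping size x size window of a 2-D channel.

    Partial windows at the right and bottom borders are dropped (assumption).
    """
    rows, cols = len(channel), len(channel[0])
    means = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            block = [channel[y][x]
                     for y in range(top, top + size)
                     for x in range(left, left + size)]
            means.append(sum(block) / len(block))
    return means


def cooccurrence_probabilities(image, levels, dx=1, dy=0):
    """Count gray-level pairs at offset (dx, dy) and normalise them, Equation (6).

    `image` holds integer gray levels in range(levels); the offset is an assumption.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def haralick_features(p):
    """Return [entropy, contrast, homogeneity, energy, correlation], Equations (7)-(11)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    mu_x = sum(i * v for i, _, v in cells)
    mu_y = sum(j * v for _, j, v in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, _, v in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for _, j, v in cells))
    if sd_x == 0 or sd_y == 0:
        correlation = 0.0  # constant image: correlation undefined, 0 by convention (assumption)
    else:
        correlation = (sum(i * j * v for i, j, v in cells) - mu_x * mu_y) / (sd_x * sd_y)
    return [entropy, contrast, homogeneity, energy, correlation]


def nearest_label(x, training):
    """Return the label of the training vector w with the smallest r = ||x - w||."""
    return min(training, key=lambda item: math.dist(x, item[1]))[0]


if __name__ == "__main__":
    value_image = [[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 2, 2, 2],
                   [2, 2, 3, 3]]
    p = cooccurrence_probabilities(value_image, levels=4)
    print(haralick_features(p))
```

In a full pipeline, the `window_means` outputs of the Hue and Saturation images would be concatenated with the Haralick features of the Value image to form the feature vector passed to `nearest_label`. Because the color and texture features live on different numeric scales, some form of scaling may be needed before taking distances; the paper does not say whether any is applied.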
As can be seen in Table 1 and Table 2, for both image sizes, the Feature Selection method outperformed the Window Cropping method overall. For images with 47x47 pixels, the overall recognition accuracies of the system with the Feature Selection method and the Window Cropping method were 80.56% and 61.78%, respectively. Meanwhile, for images with 57x57 pixels, the overall recognition accuracies were 87.28% for the Feature Selection method and 75.93% for the Window Cropping method.

[Figure 3: image panels (a) to (j).]

Figure 3. Examples of each type of white blood cell: (a) Neutrophils, (b) Lymphocytes, (c) Monocytes, (d) Eosinophils, (e) Basophils, (f) cropped Neutrophils, (g) cropped Lymphocytes, (h) cropped Monocytes, (i) cropped Eosinophils, (j) cropped Basophils.

Table 1. The leukocyte recognition accuracies for Dataset 1 with 47x47 pixels.

| No. | Leukocyte Type | # of Training Images | # of Testing Images | Feature Selection: # Correct | Feature Selection: Accuracy (%) | Window Cropping: # Correct | Window Cropping: Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | Neutrophils | 213 | 53 | 42 | 81.13 | 39 | 73.58 |
| 2 | Lymphocytes | 100 | 22 | 22 | 100 | 14 | 63.63 |
| 3 | Monocytes | 74 | 20 | 11 | 55 | 11 | 55 |
| 4 | Eosinophils | 10 | 3 | 2 | 66.67 | 2 | 66.67 |
| 5 | Basophils | 3 | 2 | 2 | 100 | 1 | 50 |

Table 2. The leukocyte recognition accuracies for Dataset 1 with 57x57 pixels.

| No. | Leukocyte Type | # of Training Images | # of Testing Images | Feature Selection: # Correct | Feature Selection: Accuracy (%) | Window Cropping: # Correct | Window Cropping: Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | Neutrophils | 213 | 53 | 40 | 75.47 | 48 | 90.56 |
| 2 | Lymphocytes | 100 | 22 | 20 | 90.91 | 13 | 59.09 |
| 3 | Monocytes | 74 | 20 | 14 | 70 | 6 | 30 |
| 4 | Eosinophils | 10 | 3 | 3 | 100 | 3 | 100 |
| 5 | Basophils | 3 | 2 | 2 | 100 | 2 | 100 |

Table 3. The leukocyte recognition accuracies for Dataset 2 using the Window Cropping method.

| No. | Leukocyte Type | # of Training Images | # of Testing Images | Training image > Testing image: # Correct | Training image > Testing image: Accuracy (%) | Training image < Testing image: # Correct | Training image < Testing image: Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | Neutrophils | 108 | 27 | 17 | 62.96 | 23 | 85.18 |
| 2 | Lymphocytes | 41 | 11 | 2 | 18.18 | 6 | 54.54 |
| 3 | Monocytes | 33 | 10 | 8 | 80 | 0 | 0 |
| 4 | Eosinophils | 4 | 1 | 1 | 100 | 0 | 0 |
| 5 | Basophils | 4 | 1 | 0 | 0 | 1 | 100 |

[Figure 4: bar chart of recognition accuracy (0 to 100) per leukocyte type (Neutrophils, Lymphocytes, Monocytes, Eosinophils, Basophils) for three methods: Feature Selection HSV, Window Cropping HSV, and RGB.]

Figure 4. The overall recognition accuracies for images with 47x47 pixels.

[Figure 5: bar chart of recognition accuracy (0 to 100) per leukocyte type (Neutrophils, Lymphocytes, Monocytes, Eosinophils, Basophils) for three methods: Feature Selection HSV, Window Cropping HSV, and RGB.]

Figure 5. The overall recognition accuracies for images with 57x57 pixels.

In the next experiments, we evaluated the recognition accuracies of images in Dataset 2 by applying the Window Cropping and the Co-occurrence matrix methods. Table 3 shows that for training images that were larger than the testing images, the overall recognition accuracy was 52.23%. Meanwhile, the overall recognition accuracy of the system was 47.94% for training images with smaller dimensions than the testing images. These results show that the Window Cropping method is unable to give high recognition accuracies under either condition.

Finally, we compared the overall recognition accuracies for three different recognition methods: 1) the Feature Selection and the Co-occurrence matrix methods, 2) the Window Cropping and the Co-occurrence matrix methods, and 3) the Co-occurrence matrix method in RGB color space. Figures 4 and 5 show the overall recognition accuracies for images with 47x47 pixels and 57x57 pixels, respectively.
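The overall accuracies quoted for Tables 1 to 3 are reproduced by an unweighted mean of the five per-class accuracies, rather than by pooling the correct counts over all test images. This reading is an inference from the reported numbers and is not stated in the text; the short check below, with hypothetical function names, makes the arithmetic explicit using the per-class values copied from the tables.

```python
def overall_accuracy(per_class_accuracies):
    """Unweighted mean of per-class accuracies, in percent."""
    return sum(per_class_accuracies) / len(per_class_accuracies)


def pooled_accuracy(correct_counts, test_counts):
    """Accuracy over all test images pooled together, in percent."""
    return 100.0 * sum(correct_counts) / sum(test_counts)


# Per-class accuracies (%) as listed in Tables 1 to 3.
table1_feature_selection = [81.13, 100, 55, 66.67, 100]
table1_window_cropping = [73.58, 63.63, 55, 66.67, 50]
table2_feature_selection = [75.47, 90.91, 70, 100, 100]
table2_window_cropping = [90.56, 59.09, 30, 100, 100]
table3_training_larger = [62.96, 18.18, 80, 100, 0]
table3_training_smaller = [85.18, 54.54, 0, 0, 100]

if __name__ == "__main__":
    print(round(overall_accuracy(table1_feature_selection), 2))  # 80.56
    print(round(overall_accuracy(table1_window_cropping), 2))    # 61.78
    print(round(overall_accuracy(table2_feature_selection), 2))  # 87.28
    print(round(overall_accuracy(table2_window_cropping), 2))    # 75.93
    print(round(overall_accuracy(table3_training_larger), 2))    # 52.23
    print(round(overall_accuracy(table3_training_smaller), 2))   # 47.94
    # For comparison: pooling the Table 1 Feature Selection counts gives 79.0.
    print(pooled_accuracy([42, 22, 11, 2, 2], [53, 22, 20, 3, 2]))
```

With an unweighted mean, the classes with only one to three test images (eosinophils and basophils) carry the same weight as neutrophils, so a single image in those classes can shift the overall figure by several percentage points.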
Based on Figure 4, the highest overall recognition accuracy for images with 47x47 pixels was obtained by the Feature Selection and the Co-occurrence matrix methods with 83.64%, followed by the Co-occurrence matrix method in RGB color space with 82.79%. The lowest overall recognition accuracy was given by the combination of the Window Cropping and the Co-occurrence matrix methods with 61.78%.

For images with 57x57 pixels, the Feature Selection and the Co-occurrence matrix methods still outperformed the Co-occurrence matrix method in RGB and gave the highest overall recognition accuracy with 87.12%. The overall recognition accuracy for the Co-occurrence matrix method in RGB color space was 84.93%, followed by the combination of the Window Cropping and the Co-occurrence matrix methods with 75.93%.

4. Conclusion

We have described the methodology of our proposed leukocyte recognition system. The pre-processing stage consists of a color space transformation from the Red Green Blue (RGB) color space to the Hue Saturation Value (HSV) color space. Next, the input images are processed based on: 1) the color information from the Hue and Saturation images, and 2) the texture information from the Value images. The color information is processed using the Feature Selection method or the Window Cropping method, while the texture information is processed using the Co-occurrence matrix method. Finally, the Euclidean distance method is used as the classifier of the leukocyte recognition system. The experimental results show that the combination of the Feature Selection and the Co-occurrence matrix methods gave the best recognition accuracies for leukocyte images.

5. Acknowledgements

The authors would like to thank our research assistants for their help in preparing the experiments.
The research described in this paper is supported by the Indonesian Ministry of Education and Culture - Directorate General for Higher Education (Hibah Penelitian Unggulan Perguruan Tinggi No. 552-SPK-LPPI/Untar/IV/2012).

REFERENCES

[1] M. M. Wintrobe, "Clinical Hematology," 12th Edition, Lippincott Williams & Wilkins, Philadelphia, 2008.

[2] M. Adjouadi, N. Zong and M. Ayala, "Multidimensional Pattern Recognition and Classification of White Blood Cells using Support Vector Machines," Particle & Particle Systems Characterization, Vol. 22, No. 2, 2005, pp. 107-118. doi: 10.1002/ppsc.200400888.

[3] T. Markiewicz and S. Osowski, "Data Mining Techniques for Feature Selection in Blood Cell Recognition," Proceedings of European Symposium on Artificial Neural Networks, Belgium, 2006, pp. 407-412.

[4] M. C. Colunga, O. S. Siordia and S. J. Maybank, "Leukocyte Recognition using EM-algorithm," Proceedings of 8th Mexican International Conference on Artificial Intelligence, Guanajuato, 2009, pp. 545-555.

[5] N. Theera-Umpon and P. D. Gader, "Training Neural Networks to Count White Blood Cells via a Minimum Counting Error Objective Function," Proceedings of International Conference on Pattern Recognition, Barcelona, 2000, pp. 2299-2302.

[6] M. Beksac, M. S. Beksac, V. B. Tipi, H. A. Duru, M. U. Karakas and A. Cakar, "An Artificial Intelligent Diagnostic System on Differential Recognition of Hematopoietic Cells from Microscopic Images," Cytometry, Vol. 30, 1997, pp. 145-150. doi: 10.1002/(SICI)1097-0320(19970615).

[7] A. Chris, S. Sugiharto and Lina, "Detection of Abnormalities of Lymph Node Tissues using Image Texture Analysis," Proceedings of International Conference on Information Technology and Applied Mathematics, Jakarta, 2012, pp. 30-32.