A Journal of Software Engineering and Applications, 2012, 5, 101-106
doi:10.4236/jsea.2012.512b020 Published Online December 2012 (http://www.scirp.org/journal/jsea)
Copyright © 2012 SciRes. JSEA
A Combination of Feature Selection and Co-occurrence
Matrix Methods for Leukocyte Recognition System
Lina, Arlends Chris, Bagus Mulyawan
1Faculty of Information Technology, Tarumanagara University, Jakarta, Indonesia; 2Faculty of Medicine, Tarumanagara University,
Jakarta, Indonesia.
Email: lina@untar.ac.id
Received 2012.
ABSTRACT
A leukocyte recognition system, as part of a differential blood counter system, is very important in the field of hematology. In
this paper, the proposed system aims to automatically classify the white blood cells (leukocytes) in a given microscopic
image. The leukocytes are classified based on a combination of the color and texture features of the
blood cell images. The developed system classifies each leukocyte into one of five categories (neutrophils, eosinophils,
basophils, lymphocytes, and monocytes). In the preprocessing stage, the system first converts the microscopic
images from the Red Green Blue (RGB) color space to the Hue Saturation Value (HSV) color space. Next, the system splits the
Hue and Saturation features from the Value feature. For both the Hue and Saturation features, the system processes their
color information using the Feature Selection method or the Window Cropping method, while the Value feature is
processed for its texture information using the Co-occurrence Matrix method. The final recognition stage is performed
using the Euclidean distance method. The combination of the Feature Selection and Co-occurrence Matrix methods
gives the best overall recognition accuracy for classifying leukocyte images.
Keywords: Leukocyte recognition; White blood cell; Microscopic image; Feature selection; Co-occurrence matrix.
1. Introduction
One of the important issues in hematology is how to accurately
diagnose disorders of the hematopoietic system.
While manual screening and evaluation by a hematologist
using a microscope is relatively accurate, it is a
highly complex and time-consuming task.
The blood elements include erythrocytes (red cells),
leukocytes (white cells), and platelets. Red blood cells are
the most numerous blood cells in the blood and are re-
quired for tissue respiration [1]. In contrast to red cells,
normal white blood cells are nucleated and include neu-
trophils, lymphocytes, monocytes, eosinophils, and ba-
sophils [2]. White blood cells serve in immune function.
Meanwhile, platelets function in coagulation and hemostasis.
Blood cell reports from a medical laboratory can be
categorized into two areas: 1) the standard count of red blood
cells, white blood cells, and platelets, and 2) the differential
count of white blood cells. Hematologists also
need to detect blood disorders, and the leukocyte count is
used to determine the presence of an infection in the human
body. Since the task is tedious and time consuming,
an automated system is both necessary and helpful.
Several researchers have proposed various methods to
recognize white blood cells. However, up to now no
automatic system exists that can recognize and count
blood cells with an accuracy comparable to that of a human
expert [3]. Some attempts to solve this problem have
been proposed, such as the work by Markiewicz using the
Support Vector Machine method [2-3], Colunga with the EM
algorithm [4], and Neural Network-based classifiers [5-6].
In this paper, the leukocyte recognition system is developed
using the combination of the Feature Selection
and the Co-occurrence matrix methods. The proposed
system works based on the similarity of image color and
texture. In the first stage, each pixel of the RGB (Red, Green,
Blue) images is transformed into the HSV (Hue, Saturation,
Value) color space. Next, Hue and Saturation are
processed as color features, while Value, as the texture part,
is processed using the Co-occurrence matrix method. Finally,
the Euclidean distance method is applied to the
system for recognition.
Figure 1 depicts the proposed leukocyte recognition
system. The remainder of this paper is organized as follows.
In section 2, the methods used in the proposed
leukocyte recognition system are described, i.e. the color
space transformation from RGB to HSV color space, the
Feature Selection method, the Window Cropping method,
the Co-occurrence matrix method for processing texture
characteristics, and the Euclidean distance method for
recognition. Section 3 presents the experimental setup
and results. Finally, the conclusion is presented in sec-
tion 4.
[Figure 1: block diagram. Blocks: Training Images, Testing Images, Color Space Transformation, Color Feature (Feature Selection), Texture Feature (Co-occurrence Matrix, Probability Matrix, Haralick Features), Composing Feature Vectors, Feature Vectors, Euclidean Distance, Recognition Results.]
Figure 1. The proposed leukocyte recognition system.
2. The Leukocyte Recognition System
2.1. RGB to HSV Color Space Transformation
In the developed system, a color space transformation
from the RGB domain to the HSV domain is performed in the
pre-processing stage. HSV (Hue, Saturation, Value)
is a common color space that is popular in
graphics applications. The HSV color space is preferred
over the RGB (Red, Green, Blue) color space because it
is more perceptually relevant than the RGB
Cartesian representation. The HSV color space consists
of 3 elements: 1) Hue, which corresponds to the color
type (its angle on the color wheel), 2) Saturation, which corresponds to the
purity of the color (how far it is from white and gray), and 3) Value, which
corresponds to the brightness level.
To convert an image from the RGB color space (R, G, B in [0,1]) to the HSV
color space (H in [0º, 360º]; S, V in [0,1]), we essentially follow
the steps listed below:
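$$V = \max(R, G, B) \tag{1}$$
$$S = \frac{V - \min(R, G, B)}{V} \tag{2}$$
If Max = R, then
$$H = 60 \times \frac{G - B}{\mathrm{Max} - \mathrm{Min}} \tag{3}$$
If Max = G, then
$$H = 60 \times \left(2 + \frac{B - R}{\mathrm{Max} - \mathrm{Min}}\right) \tag{4}$$
If Max = B, then
$$H = 60 \times \left(4 + \frac{R - G}{\mathrm{Max} - \mathrm{Min}}\right) \tag{5}$$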
with H=Hue, S=Saturation, V=Value, R=intensity of red
pixel, G=intensity of green pixel, B=intensity of blue
pixel.
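The conversion in Equations (1)-(5) can be sketched for a single pixel as follows. This is a minimal illustration rather than the authors' implementation: the function name `rgb_to_hsv` is hypothetical, and the handling of black pixels (V = 0), gray pixels (Max = Min), and negative hue angles is an assumption, since the paper does not specify these cases.

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H, S, V).

    H is returned in degrees in [0, 360); S and V are in [0, 1].
    """
    mx = max(r, g, b)
    mn = min(r, g, b)
    v = mx                                        # Eq. (1)
    # Assumption: a black pixel (V = 0) gets S = 0 to avoid division by zero.
    s = 0.0 if mx == 0 else (v - mn) / v          # Eq. (2)
    if mx == mn:
        # Assumption: gray pixels have no defined hue; 0 is used by convention.
        h = 0.0
    elif mx == r:
        h = 60.0 * ((g - b) / (mx - mn))          # Eq. (3)
    elif mx == g:
        h = 60.0 * (2.0 + (b - r) / (mx - mn))    # Eq. (4)
    else:
        h = 60.0 * (4.0 + (r - g) / (mx - mn))    # Eq. (5)
    # Assumption: negative angles produced by Eq. (3) are wrapped into [0, 360).
    if h < 0:
        h += 360.0
    return h, s, v
```

Applying the function to every pixel yields the Hue, Saturation, and Value images that the later stages process separately.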
2.2. Feature Selection Method
Feature selection aims to reduce the dimension of the
input image. With feature selection, the system chooses
and processes only the selected features in the recognition
process. In order to reduce the dimension of the color
features, we propose two steps of reduction: 1)
process only the Hue and the Saturation values, and 2)
compose small windows for each feature.
To compose the windows, several 4x4 pixel windows are first created
and arranged so that they do not overlap with each other in a
single image. Next, the values in each window are summed and
averaged for both the Hue and Saturation images.
These average Hue and Saturation values are then
passed to the recognition stage. Figure 2 illustrates
the feature selection process of an image.
Figure 2. The feature selection process of an image.
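The window averaging described above can be sketched as follows. The helper names `window_means` and `color_feature_vector` are hypothetical, images are represented as plain lists of rows, and dropping edge pixels that do not fill a complete window is an assumption, since the paper does not say how image sizes that are not multiples of 4 are handled.

```python
def window_means(channel, size=4):
    """Average each non-overlapping size x size window of a 2-D channel.

    `channel` is a list of equal-length rows (e.g. a Hue or Saturation image).
    Assumption: rows and columns that do not fill a whole window are dropped.
    """
    rows = len(channel) // size
    cols = len(channel[0]) // size if channel else 0
    means = []
    for wr in range(rows):
        row_means = []
        for wc in range(cols):
            total = 0.0
            # Sum every pixel inside the current window.
            for y in range(wr * size, (wr + 1) * size):
                for x in range(wc * size, (wc + 1) * size):
                    total += channel[y][x]
            row_means.append(total / (size * size))
        means.append(row_means)
    return means


def color_feature_vector(hue, sat, size=4):
    """Concatenate the window means of the Hue and Saturation images."""
    features = []
    for channel in (hue, sat):
        for row in window_means(channel, size):
            features.extend(row)
    return features
```

Under these assumptions a 47x47 channel would be reduced to 11 x 11 = 121 averages, so the color part of the feature vector would hold 242 values instead of 2 x 47 x 47.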
2.3. Window Cropping Method
Unlike the Feature Selection method, the window
cropping technique aims to match the cell block (window)
sizes of the testing images to those of the training images.
Adjusting each window size is necessary,
since the probability of obtaining different cell block
sizes from a segmentation process is relatively
high. The window cropping process is performed
by comparing the size of the testing image with the
training images. Then, the image with the larger size is
cropped to the minimum size of the two images.
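A minimal sketch of the cropping step is given below. The function names are hypothetical, and cropping around the image center is an assumption: the paper states only that the larger image is cropped to the smaller size, not which region is kept.

```python
def crop_center(image, height, width):
    """Return the central height x width region of a 2-D image (list of rows)."""
    top = (len(image) - height) // 2
    left = (len(image[0]) - width) // 2
    return [row[left:left + width] for row in image[top:top + height]]


def crop_to_common_size(test_image, train_image):
    """Crop whichever image is larger so both share the smaller dimensions."""
    height = min(len(test_image), len(train_image))
    width = min(len(test_image[0]), len(train_image[0]))
    return (crop_center(test_image, height, width),
            crop_center(train_image, height, width))
```

After this step the two images have identical dimensions, so their feature vectors have the same length and can be compared element by element.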
2.4. Co-occurrence Matrix Method
For processing the texture information of the microscop-
ic images, the Co-occurrence Matrix method is applied.
A co-occurrence matrix is constructed by grouping the
gray-scale values of an image into pairs and counting how often
each pair occurs. The matrix is derived
from the angular relationship between neighboring
pixels as well as the distance between them. The more
intensity levels an image contains, the larger the
co-occurrence matrix becomes. First, the probability
value p(i,j) of the color frequency
f(i,j) of the index pair i and j is calculated by [7]:
$$p(i,j) = \frac{f(i,j)}{\sum_{i=0}^{I} \sum_{j=0}^{J} f(i,j)} \tag{6}$$
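The construction of the probability matrix can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the Value channel is assumed to have been quantized beforehand to integer gray levels in the range 0 to `levels` - 1, the neighbor offset (`dx`, `dy`) stands in for the distance and angle, and pixel pairs are counted in one direction only.

```python
def cooccurrence_probabilities(image, levels, dx=1, dy=0):
    """Build a normalized gray-level co-occurrence matrix.

    `image` is a list of rows of integer gray levels in [0, levels).
    (dx, dy) is the neighbor offset; (1, 0) pairs each pixel with the one
    to its right (distance 1, angle 0 degrees).
    Assumption: pairs are counted in one direction only, so the matrix is
    not forced to be symmetric.
    """
    freq = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            # Skip pairs whose neighbor falls outside the image.
            if 0 <= ny < height and 0 <= nx < width:
                freq[image[y][x]][image[ny][nx]] += 1
    total = sum(sum(row) for row in freq)
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    # Divide each frequency f(i, j) by the total number of counted pairs.
    return [[count / total for count in row] for row in freq]
```

The entries of the returned matrix sum to 1, which is what the Haralick features expect as input.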
Next, the characteristic values, known as the Haralick
features, are obtained by processing the probability val-
ues of the co-occurrence matrix. In the proposed system,
five characteristic features are processed, i.e. entropy,
contrast, homogeneity, energy, and correlation.
Entropy is used to measure the randomness of intensity
distributions. The entropy value is calculated by:
$$\mathrm{Entropy} = -\sum_{i=0}^{I} \sum_{j=0}^{J} p(i,j) \log p(i,j) \tag{7}$$
Another feature is the image contrast which is used to
measure the power of intensity differences in an image.
The contrast value is calculated by:
$$\mathrm{Contrast} = \sum_{i=0}^{I} \sum_{j=0}^{J} (i-j)^2 \, p(i,j) \tag{8}$$
Homogeneity, which measures the uniformity of
intensity variations in an image, behaves inversely to the
image contrast. The equation for calculating the
homogeneity is as follows:
$$\mathrm{Homogeneity} = \sum_{i=0}^{I} \sum_{j=0}^{J} \frac{p(i,j)}{1 + |i-j|} \tag{9}$$
Moreover, energy, as the fourth feature, is used to
measure the texture uniformity. The energy value is
calculated by:
$$\mathrm{Energy} = \sum_{i=0}^{I} \sum_{j=0}^{J} p(i,j)^2 \tag{10}$$
Finally, the correlation value is used to describe the
relation between each pixel value and its neighbors. The
correlation value is calculated by:
$$\mathrm{Correlation} = \frac{\sum_{i=0}^{I} \sum_{j=0}^{J} [(ij)\, p(i,j)] - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{11}$$
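The five features in Equations (7)-(11) can be computed from a probability matrix as sketched below. Several details are assumptions because the paper leaves them open: the natural logarithm is used, terms with p(i,j) = 0 contribute nothing to the entropy, and the means and standard deviations in Equation (11) are taken over the row and column marginals of p. The function name `haralick_features` is hypothetical.

```python
import math


def haralick_features(p):
    """Compute entropy, contrast, homogeneity, energy and correlation
    from a normalized co-occurrence matrix p (a list of rows summing to 1)."""
    n_i, n_j = len(p), len(p[0])
    entropy = contrast = homogeneity = energy = ij_sum = 0.0
    mu_x = mu_y = 0.0
    for i in range(n_i):
        for j in range(n_j):
            v = p[i][j]
            if v > 0:                            # assumption: 0 * log(0) counts as 0
                entropy -= v * math.log(v)       # Eq. (7), natural log assumed
            contrast += (i - j) ** 2 * v         # Eq. (8)
            homogeneity += v / (1 + abs(i - j))  # Eq. (9)
            energy += v * v                      # Eq. (10)
            ij_sum += i * j * v                  # first term of Eq. (11)
            mu_x += i * v                        # mean of the row marginal
            mu_y += j * v                        # mean of the column marginal
    var_x = sum((i - mu_x) ** 2 * p[i][j] for i in range(n_i) for j in range(n_j))
    var_y = sum((j - mu_y) ** 2 * p[i][j] for i in range(n_i) for j in range(n_j))
    sigma = math.sqrt(var_x * var_y)             # sigma_x * sigma_y
    # Assumption: correlation is reported as 0 when a marginal has no spread.
    correlation = (ij_sum - mu_x * mu_y) / sigma if sigma > 0 else 0.0
    return {
        "entropy": entropy,
        "contrast": contrast,
        "homogeneity": homogeneity,
        "energy": energy,
        "correlation": correlation,
    }
```

For a perfectly diagonal matrix such as [[0.5, 0], [0, 0.5]] the contrast is 0 and the homogeneity and correlation are both 1, which matches the intuition that every pixel has the same value as its neighbor.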
2.5. Euclidean Distance
As the final stage, in the recognition process, the
dissimilarities between the testing images and the training
images are calculated using the Euclidean distance
measurement, as follows [7]:
$$r = \lVert x - w \rVert \tag{12}$$
where r is the Euclidean distance between the testing
vector x and the training vector w. A small r value
indicates a high similarity between the two images.
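Equation (12) and a possible decision rule built on it can be sketched as follows. The paper states only that the Euclidean distance is used for recognition; assigning the label of the single closest training vector (a 1-nearest-neighbor rule) is an assumption, and the function names and labels are hypothetical.

```python
import math


def euclidean_distance(x, w):
    """Eq. (12): r = ||x - w|| for two equal-length feature vectors."""
    if len(x) != len(w):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))


def classify(x, training_set):
    """Return the label of the training vector closest to x.

    `training_set` is a list of (feature_vector, label) pairs.
    Assumption: a 1-nearest-neighbor rule; ties keep the first match.
    """
    best_label, best_r = None, float("inf")
    for w, label in training_set:
        r = euclidean_distance(x, w)
        if r < best_r:
            best_label, best_r = label, r
    return best_label
```

In this sketch the feature vector would be the concatenation of the color features and the five texture features, so training and testing vectors must be composed in the same order.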
3. Experimental Setup and Results
We conducted several experiments to test the recognition
ability of the proposed system. We captured our
own blood cell images and developed the FTI-Untar
blood cell database. Figure 3 shows examples of each
leukocyte type used in the experiments.
The FTI-Untar blood cell database consists of two datasets:
1) Dataset 1, which consists of 500 blood cell images,
with 266 neutrophil images, 122 lymphocyte images, 94
monocyte images, 13 eosinophil images, and 5 basophil
images; and 2) Dataset 2, which contains 135 neutrophil
images, 52 lymphocyte images, 43 monocyte
images, 5 eosinophil images, and 5 basophil images.
In the experiments, 80% of the images from the database are
used as training data, and the remaining images are used for
testing. We conducted several experiments with different
targets. First, we evaluated the performance of the
combination of the Feature Selection and the Co-occurrence
matrix methods and the combination of the Window
Cropping and the Co-occurrence matrix methods with two
different window sizes: 1) 47x47 pixels, and 2) 57x57
pixels.
The recognition accuracies for Dataset 1 with two different
image sizes are presented in Table 1 for 47x47
pixels and Table 2 for 57x57 pixels. As can be seen in
Table 1 and Table 2, for both image sizes, the Feature
Selection method outperformed the Window Cropping
method overall. For images with 47x47 pixels, the overall
recognition accuracies of the system
with the Feature Selection method and the Window
Cropping method were 80.56% and 61.78%, respectively.
Meanwhile, for images with 57x57 pixels, the overall
recognition accuracies were 87.28% for the Feature
Selection method and 75.93% for the Window
Cropping method.
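The paper does not state how the overall accuracy is derived. The reported figures are consistent with an unweighted mean of the five per-class accuracies, which the following sketch checks against the per-class values listed in Table 1; the function name is hypothetical and the averaging rule is an inference, not a statement from the authors.

```python
def overall_accuracy(per_class_accuracies):
    """Unweighted mean of per-class accuracies, rounded to two decimals."""
    return round(sum(per_class_accuracies) / len(per_class_accuracies), 2)


# Per-class accuracies (%) from Table 1 (47x47 pixels), in the order
# neutrophils, lymphocytes, monocytes, eosinophils, basophils.
feature_selection_47 = [81.13, 100, 55, 66.67, 100]
window_cropping_47 = [73.58, 63.63, 55, 66.67, 50]

print(overall_accuracy(feature_selection_47))  # 80.56
print(overall_accuracy(window_cropping_47))    # 61.78
```

Because every class carries the same weight in this average, the two-image basophil test set influences the overall figure as much as the 53-image neutrophil test set.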
Figure 3. Examples of each type of white blood cell: (a) Neutrophils, (b) Lymphocytes, (c) Monocytes, (d)
Eosinophils, (e) Basophils, (f) cropped Neutrophils, (g) cropped Lymphocytes, (h) cropped Monocytes, (i)
cropped Eosinophils, (j) cropped Basophils.
Table 1. The leukocyte recognition accuracies for Dataset 1 with 47x47 pixels.
No. | Leukocyte Type | # of Training Images | # of Testing Images | Feature Selection: # Correct | Feature Selection: Accuracy (%) | Window Cropping: # Correct | Window Cropping: Accuracy (%)
1 | Neutrophils | 213 | 53 | 42 | 81.13 | 39 | 73.58
2 | Lymphocytes | 100 | 22 | 22 | 100 | 14 | 63.63
3 | Monocytes | 74 | 20 | 11 | 55 | 11 | 55
4 | Eosinophils | 10 | 3 | 2 | 66.67 | 2 | 66.67
5 | Basophils | 3 | 2 | 2 | 100 | 1 | 50
Table 2. The leukocyte recognition accuracies for Dataset 1 with 57x57 pixels.
No. | Leukocyte Type | # of Training Images | # of Testing Images | Feature Selection: # Correct | Feature Selection: Accuracy (%) | Window Cropping: # Correct | Window Cropping: Accuracy (%)
1 | Neutrophils | 213 | 53 | 40 | 75.47 | 48 | 90.56
2 | Lymphocytes | 100 | 22 | 20 | 90.91 | 13 | 59.09
3 | Monocytes | 74 | 20 | 14 | 70 | 6 | 30
4 | Eosinophils | 10 | 3 | 3 | 100 | 3 | 100
5 | Basophils | 3 | 2 | 2 | 100 | 2 | 100
Table 3. The leukocyte recognition accuracies for Dataset 2 using the Window Cropping Method.
No. | Leukocyte Type | # of Training Images | # of Testing Images | Training image > Testing image: # Correct | Training image > Testing image: Accuracy (%) | Training image < Testing image: # Correct | Training image < Testing image: Accuracy (%)
1 | Neutrophils | 108 | 27 | 17 | 62.96 | 23 | 85.18
2 | Lymphocytes | 41 | 11 | 2 | 18.18 | 6 | 54.54
3 | Monocytes | 33 | 10 | 8 | 80 | 0 | 0
4 | Eosinophils | 4 | 1 | 1 | 100 | 0 | 0
5 | Basophils | 4 | 1 | 0 | 0 | 1 | 100
[Bar chart: recognition accuracy (%), 0 to 100, for Neutrophils, Lymphocytes, Monocytes, Eosinophils, and Basophils; series: Feature Selection HSV, Window Cropping HSV, RGB.]
Figure 4. The overall recognition accuracies for images with 47x47 pixels.
[Bar chart: recognition accuracy (%), 0 to 100, for Neutrophils, Lymphocytes, Monocytes, Eosinophils, and Basophils; series: Feature Selection HSV, Window Cropping HSV, RGB.]
Figure 5. The overall recognition accuracies for images with 57x57 pixels.
In the next experiments, we evaluated the recognition
accuracies of images in Dataset 2 by applying the Win-
dow Cropping and the Co-occurrence matrix methods.
Table 3 shows that for training images which had larger
sizes than the testing images, the overall recognition ac-
curacy was 52.23%. Meanwhile, the overall recognition
accuracy of the system was 47.94% for training images
with smaller dimensions than that of the testing images.
These results show that the Window Cropping method is
unable to give high recognition accuracies under either
condition.
Finally, we compared the overall recognition accuracies
of three different recognition methods: 1) the Feature
Selection and the Co-occurrence matrix methods, 2)
the Window Cropping and the Co-occurrence matrix
methods, and 3) the Co-occurrence matrix method in
RGB color space. Figures 4 and 5 show the overall recognition
accuracies for images with 47x47 pixels and
57x57 pixels, respectively.
Based on Figure 4, the highest overall recognition accuracy
for images with 47x47 pixels was obtained by
the Feature Selection and the Co-occurrence matrix methods
with 83.64%, followed by the Co-occurrence matrix
method in RGB color space with 82.79%. The lowest
overall recognition accuracy was given by the combination
of the Window Cropping and the Co-occurrence
matrix methods with 61.78%.
For images with 57x57 pixels, the Feature Selection
and the Co-occurrence matrix methods still outperformed
the Co-occurrence matrix method in RGB and gave the
highest overall recognition accuracy with 87.12%. The
overall recognition accuracy for the Co-occurrence matrix
method in RGB color space was 84.93%, followed
by the combination of the Window Cropping and
the Co-occurrence matrix methods with 75.93%.
4. Conclusion
We have described the methodologies of our proposed
leukocyte recognition system. The pre-processing stage
consists of a color space transformation from the Red
Green Blue (RGB) color space to the Hue Saturation
Value (HSV) color space. Next, the input images are
processed based on their: 1) color information from Hue
and Saturation images, and 2) texture information from
Value images. The color information is processed using
the Feature Selection method or the Window Cropping
method, while the texture information is processed using
the Co-occurrence matrix method. Finally, the Euclidean
distance method is used as the classifier of the leukocyte
recognition system. The experimental results show that
the combination of the Feature Selection and the
Co-occurrence matrix methods gave the best recognition
accuracies for recognizing leukocyte images.
5. Acknowledgements
The authors would like to thank our research assistants
for their help in preparing the experiments. The research
described in this paper is supported by the Indonesian
Ministry of Education and Culture - Directorate General
for Higher Education (Hibah Penelitian Unggulan
Perguruan Tinggi No. 552-SPK-LPPI/Untar/IV/2012).
REFERENCES
[1] M.M. Wintrobe, “Clinical Hematology,” 12th Edition,
Lippincott Williams & Wilkins, Philadelphia, 2008.
[2] M. Adjouadi, N. Zong, and M. Ayala, “Multidimensional
Pattern Recognition and Classification of White Blood
Cells using Support Vector Machines,” Particle & Particle
Systems Characterization, Vol. 22, No. 2, 2005, pp.
107-118. doi: 10.1002/ppsc.200400888.
[3] T. Markiewicz and S. Osowski, “Data Mining Techniques
for Feature Selection in Blood Cell Recognition,” Proceedings
of European Symposium on Artificial Neural
Networks, Belgium, 2006, pp. 407-412.
[4] M.C. Colunga, O.S. Siordia, S.J. Maybank, “Leukocyte
Recognition using EM-algorithm,” Proceedings of 8th
Mexican International Conference on Artificial Intelligence,
Guanajuato, 2009, pp. 545-555.
[5] N. Theera-Umpon and P.D. Gader, “Training Neural
Networks to Count White Blood Cells via a Minimum
Counting Error Objective Function,” Proceedings of International
Conference on Pattern Recognition, Barcelona,
2000, pp. 2299-2302.
[6] M. Beksac, M.S. Beksac, V.B. Tipi, H.A. Duru, M.U.
Karakas, and A. Cakar, “An Artificial Intelligent Diagnostic
System on Differential Recognition of Hematopoietic
Cells from Microscopic Images,” Cytometry, Vol.
30, 1997, pp. 145-150. doi:
10.1002/(SICI)1097-0320(19970615).
[7] A. Chris, S. Sugiharto, Lina, “Detection of Abnormalities
of Lymph Node Tissues using Image Texture Analysis,”
Proceedings of International Conference on Information
Technology and Applied Mathematics, Jakarta, 2012,
pp. 30-32.