Journal of Software Engineering and Applications, 2011, 4, 682-687
doi:10.4236/jsea.2011.412080 Published Online December 2011 (http://www.SciRP.org/journal/jsea)
Copyright © 2011 SciRes. JSEA
Face Detection and Localization in Color Images:
An Efficient Neural Approach
Samy Sadek1, Ayoub Al-Hamadi1, Bernd Michaelis1, Usama Sayed2
1Institute for Electronics, Signal Processing and Communications (IESK), Otto-von-Guericke-University Magdeburg, Magdeburg,
Germany; 2Department of Electrical Engineering, Assiut University, Assiut, Egypt.
Email: samy.bakheet@ovgu.de, samy.technik@yahoo.de
Received October 17th, 2011; revised November 29th, 2011; accepted December 11th, 2011.
ABSTRACT
Automatic face detection and localization is a key problem in many computer vision tasks. In this paper, a simple yet
effective approach for detecting and locating human faces in color images is proposed. The contribution of this paper is
twofold. First, an overview of face detection techniques is given, along with a background on neural networks.
Second, and perhaps most importantly, an adaptive cubic-spline neural network is designed to detect and locate
human faces in uncontrolled environments. Experimental results on our test set show the effectiveness of the
proposed approach, which compares favorably with other state-of-the-art approaches in the literature.
Keywords: Human Face Detection and Localization, Spline Activation Function, Color Moments, Human-Computer
Interaction
1. Introduction
Automatic face detection and localization is an active
area of research spanning several disciplines in computer
vision and pattern classification. It has many potential
applications, yet it remains one of the most challenging
computer vision problems. For instance, mugshot
matching, user verification and access control, enhanced
human-computer interaction, and crowd surveillance all
become possible once an effective face detection system
is available [1]. There are two fundamental face detection
techniques: content-based methods and color-based
methods. Content-based methods try to identify features
in a human face. Most content-based methods were
developed for grayscale images to avoid the complexity
of combining the features detected in the RGB color
space. A method developed by Yow and Cipolla [2]
elongates the image in the horizontal direction and
identifies thin horizontal features, such as the eyes and
mouth. Cootes and Taylor [3] develop a technique that
matches features to a model face using statistical
methods. Leung et al. [4] present a similar method that
matches features to a model face, except that they use a
graph matching algorithm to compare detected features
to the model. In [5], Rowley et al. develop a frontal-view
face detection system that uses neural networks to pick
out features. Instead of using neural networks, Sung and
Poggio [6] develop an example-based learning technique,
while Colmenarez and Huang [7] use a probabilistic
visual learning system. A survey of content-based
techniques for general image retrieval can be found in [8].
Unfortunately, content-based techniques are complex and
computationally expensive. Also, if the face is rotated or
partially obscured, the technique has to incorporate other
techniques to solve the image registration and occlusion
problems. On the other hand, color-based methods
calculate histograms of the color values and then develop
a chroma chart to identify the probability that a particular
range of pixel values represents human flesh. It has been
found that the effectiveness of the method depends
highly on the color space used. Chroma charts have been
developed for the standard RGB color space [9], the YIQ
color space [10], the HSV color space [11,12], and the
LUV space [13]. The implementation of color-based
techniques is fairly simple and, after the system has
learned a chroma chart, the processing is very efficient.
Also, these methods handle color images in a more
straightforward manner than the content-based methods.
However, as [14] describes, color-based techniques have
several drawbacks. These disadvantages include
information loss due to quantization, the strong
dependence on the color space, and erroneous retrieval
in the presence of gamma nonlinearity. The most
significant drawback, however, is that a technique based
solely on a color histogram ignores all spatial informa-
tion in the image. That is, color histograms catalog the
global distribution of colors, but do not tell how the col-
ors are arranged to form shapes and features. Despite
these disadvantages, color histograms are very popular
due to their simplicity and ease of calculation.
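The chroma-chart idea described above can be illustrated with a small sketch. It assumes normalized (r, g) chromaticity as the color space, 32 bins per axis, and a per-bin estimate P(skin | bin) = skin count / (skin count + non-skin count); these choices and the function names are hypothetical and are not taken from the methods cited in [9-13].

```python
from collections import Counter

BINS = 32  # hypothetical number of bins per chromaticity axis


def chroma_bin(rgb):
    """Map an (R, G, B) pixel to a bin of normalized (r, g) chromaticity."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (0, 0)  # pure black has no defined chromaticity; park it in bin (0, 0)
    rn, gn = r / total, g / total  # both lie in [0, 1]
    return (min(int(rn * BINS), BINS - 1), min(int(gn * BINS), BINS - 1))


def learn_chroma_chart(skin_pixels, nonskin_pixels):
    """Return a dict mapping each observed bin to an estimate of P(skin | bin)."""
    skin = Counter(chroma_bin(p) for p in skin_pixels)
    other = Counter(chroma_bin(p) for p in nonskin_pixels)
    return {key: skin[key] / (skin[key] + other[key]) for key in set(skin) | set(other)}


def skin_probability(chart, rgb):
    """Look up one pixel; bins never seen during training get probability 0."""
    return chart.get(chroma_bin(rgb), 0.0)
```

For example, with one skin sample (200, 120, 100) and one non-skin sample (30, 60, 200), the chart returns 1.0 for the first color and 0.0 for the second. Like any histogram method, the chart says nothing about where in the image those colors occur.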
The remainder of the paper proceeds as follows.
Section 2 outlines the neural model (i.e., the cubic-spline
neural network) used as the classifier in the proposed
approach. In Section 3, we describe the proposed method,
which is based on YES histograms and color moments. In
Section 4, experimental results are reported. Finally,
concluding remarks and suggestions for possible future
extensions are given in Section 5.
2. Cubic-Spline Neural Networks
Artificial neural networks (ANNs) are a powerful and
widely used computing paradigm. A neural network is a
data modeling tool that is able to capture and represent
complex input/output relationships. The motivation for
the development of neural network technology stemmed
from the desire to develop an artificial system that could
perform “intelligent” tasks similar to those performed by
the human brain. A graphical representation of the neural
model is shown in Figure 1. The ANN learns via a
process called “training”, in which the input data are
repeatedly presented to the network. With each
presentation, the output of the network is compared to
the desired output and an error is computed. This error is
then fed back to the network and used to adjust the
weights so that the error decreases with each iteration
and the model gets closer and closer to producing the
desired output.
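As a minimal illustration of the training loop just described (present the inputs, compare with the desired output, feed the error back, adjust the weights), the sketch below trains a single sigmoid neuron by gradient descent on the squared error. It is a generic example, not the CSNN used in this paper; the learning rate, epoch count and the logical-AND data are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Output of a single sigmoid neuron with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_neuron(samples, epochs=3000, lr=0.5, seed=0):
    """Repeatedly present (input, target) pairs and nudge the weights so the
    squared error 0.5 * (y - target)^2 shrinks (plain gradient descent)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(samples[0][0]))]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = predict(w, b, x)
            error = y - target              # compare output with the desired output
            delta = error * y * (1.0 - y)   # error fed back through the sigmoid slope
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b
```

Trained on the four AND patterns, the neuron should end up producing outputs above 0.5 only for the input (1, 1). A multilayer network repeats the same error-feedback step layer by layer (back-propagation).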
Figure 1. Block diagram of an ANN architecture.

To produce an output closer to the desired output, the
neurons of the network employ a non-linear function, the
so-called activation function, which is usually monotonic
and generally based upon the sigmoidal function. The
activation function models the relationship between the
action potential of the inputs and the output of the
neuron. In this work, to increase flexibility, the hidden
neurons use an adaptive activation function, the
cubic-spline function, rather than a fixed standard
function [15]. A neural model employing this type of
activation function is called a cubic-spline neural network
(CSNN). Mathematically, the cubic-spline activation
function is defined by

$$S_k(x) = s_k + \sum_{i=1}^{3} s_{k,i}\,(x - x_k)^i, \qquad x_k \le x < x_{k+1}, \quad k = 1, 2, \ldots, n-1 \qquad (1)$$

where $s_k$ and $s_{k,i}$ are the coefficients of the cubic-spline
function. Further details on this model can be found in [16].
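Equation (1) can be evaluated with the short sketch below. The knot positions and coefficient tuples are hypothetical inputs; in a CSNN they would be adapted during training [15,16], which this sketch does not attempt. The text does not state how inputs outside [x_1, x_n) are handled, so the sketch simply extends the nearest end segment (an assumption).

```python
from bisect import bisect_right


def spline_activation(x, knots, coeffs):
    """Evaluate S_k(x) = s_k + sum_{i=1..3} s_{k,i} (x - x_k)^i on the interval
    x_k <= x < x_{k+1} that contains x.

    knots:  sorted knot positions [x_1, ..., x_n]
    coeffs: one tuple (s_k, s_k1, s_k2, s_k3) per interval, i.e. n - 1 tuples
    """
    if len(coeffs) != len(knots) - 1:
        raise ValueError("need exactly one coefficient tuple per interval")
    k = bisect_right(knots, x) - 1          # index of the interval holding x
    k = max(0, min(k, len(coeffs) - 1))     # out-of-range x uses the end segment
    s0, s1, s2, s3 = coeffs[k]
    d = x - knots[k]
    return s0 + s1 * d + s2 * d ** 2 + s3 * d ** 3
```

With knots [-1, 0, 1] and coefficients [(-1, 1, 0, 0), (0, 1, 0, 0)], for instance, the function reduces to S(x) = x on both segments; non-zero quadratic and cubic coefficients bend each segment independently, which is the extra flexibility the adaptive activation provides.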
3. Proposed Methodology
This section discusses the proposed methodology for
real-time face detection and localization. Figure 2 is a
simplified block diagram illustrating the main components
of the proposed architecture and how they interact to
achieve the functionality of the whole approach. As
illustrated in the block diagram, the proposed approach
consists of two parts, each carrying out a specific task:
the first part performs face detection, while the second
performs face localization. Each of these two tasks is
described briefly below.
Face Detection
To achieve this task, the proposed approach discriminates
between two classes of images (i.e., a “face” class and a
“non-face” class). Training a neural model for face
detection is challenging because of the difficulty in
characterizing prototypical “non-face” images: it is easy
to get a representative sample of images that contain
faces, but much more difficult to get a representative
sample of those that do not. A simple procedure for this
task works as follows. First, the feature vector x, which
consists of information (YES histograms and color
moments) derived from a given image, is fed into the
designed adaptive CSNN. The output y then represents
the probability that the image contains a human face.
Formally, the output y for a given image is interpreted as:

$$\text{decision}(\mathbf{x}) = \begin{cases} \text{face found}, & y = p(\mathbf{x}) > 0.5 \\ \text{face not found}, & \text{otherwise} \end{cases} \qquad (2)$$
Face Localization
For the face localization task, the approach attempts to
identify the location of a face in a given image. The goal
of this task is to find objects in the image whose shape
resembles that of a face and to treat them as face
candidates. Faces are roughly elliptical, so an ellipse can
approximate the shape of a face. To perform face
localization, the proposed approach carries out two
subtasks: human skin segmentation, to identify possible
regions corresponding to human faces; and shape
analysis, to separate isolated human faces from the
initial segmentation results and then identify the
location of each human face in the image. In the
following subsections, we discuss the different modules
that implement the baseline architecture shown in
Figure 2, with a particular focus on the feature
extraction module.

Figure 2. Main structure of the proposed approach.
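The text does not specify how the elliptical shape of a candidate region is measured. One common option, shown here as an assumption rather than the authors' method, fits an ellipse from the centroid and second-order central moments of a segmented skin region and then checks the ratio of its axes. The ratio bounds in the hypothetical `is_face_like` helper are illustrative only.

```python
import math


def fit_ellipse(pixels):
    """Fit an ellipse to a region given as a list of (row, col) coordinates,
    using the centroid and second-order central moments.

    Returns ((cy, cx), semi_major, semi_minor, angle). For a filled ellipse the
    variance along an axis is (semi-axis)^2 / 4, hence the factor 2 below.
    """
    n = len(pixels)
    cy = sum(r for r, _ in pixels) / n
    cx = sum(c for _, c in pixels) / n
    mu20 = sum((c - cx) ** 2 for _, c in pixels) / n   # spread along columns
    mu02 = sum((r - cy) ** 2 for r, _ in pixels) / n   # spread along rows
    mu11 = sum((c - cx) * (r - cy) for r, c in pixels) / n
    # Eigenvalues of the covariance matrix [[mu20, mu11], [mu11, mu02]]
    common = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam1 = (mu20 + mu02) / 2 + common
    lam2 = (mu20 + mu02) / 2 - common
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)  # major axis vs. column axis
    return (cy, cx), 2 * math.sqrt(lam1), 2 * math.sqrt(max(lam2, 0.0)), angle


def is_face_like(pixels, min_ratio=0.5, max_ratio=0.9):
    """Hypothetical shape test: accept regions whose minor/major axis ratio
    lies in [min_ratio, max_ratio]."""
    _, major, minor, _ = fit_ellipse(pixels)
    return major > 0 and min_ratio <= minor / major <= max_ratio
```

The centroid returned by `fit_ellipse` would serve as the face location, and the semi-axes and angle give a bounding ellipse for drawing the localized face.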
3.1. Preprocessing
For successful feature extraction and classification, it is
important to preprocess all images to remove noisy,
erroneous, and incomplete data, and to prepare
representative features that are suitable for knowledge
generation. To remove noise and reduce image distortion,
each image is smoothed by Gaussian convolution with a
3 × 3 kernel and variance 0.5.
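A minimal sketch of the smoothing step for a single-channel image stored as a 2-D list. It follows the text in reading 0.5 as the variance σ² of the Gaussian (pass `variance=0.25` if σ = 0.5 was intended). Replicating border pixels is an assumption, and a color image would be filtered channel by channel.

```python
import math


def gaussian_kernel_3x3(variance=0.5):
    """Normalized 3x3 Gaussian kernel with the given variance sigma^2."""
    k = [[math.exp(-(x * x + y * y) / (2 * variance)) for x in (-1, 0, 1)]
         for y in (-1, 0, 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def smooth(image, variance=0.5):
    """Convolve a 2-D list of intensities with the kernel, replicating borders."""
    kernel = gaussian_kernel_3x3(variance)
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)  # clamp row index at borders
                    jj = min(max(j + dj, 0), w - 1)  # clamp column index at borders
                    acc += kernel[di + 1][dj + 1] * image[ii][jj]
            out[i][j] = acc
    return out
```

Because the kernel is symmetric, correlation and convolution coincide here. With variance 0.5 the center weight is about 0.33, so the filter smooths only lightly, which suits the goal of removing noise without blurring facial structure.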
3.2. Feature Extraction
Feature extraction is the core of any recognition system,
but it is also the most challenging and time-consuming
part. Moreover, the overall performance of a recognition
system is often considered to depend more heavily on
feature extraction than on classification. In particular,
real-time feature extraction is a key component of any
face detection system that claims to be truly real-time.
Many kinds of visual features can be used for face
detection and localization. In this work, the features
considered are derived directly from the color content of
the input image, namely YES histograms and color
moments. The basic features are defined as follows.
YES Histogram
RGB is the natural color space to work in, since most
color images are encoded in this space. Although RGB
histograms have yielded positive results in many
color-based image retrieval and classification systems
[17], they alone do not give a satisfactory face detection
system. The transformation from the standard RGB color
space to the YES color space is given by the following
matrix equation:
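$$\begin{bmatrix} Y \\ E \\ S \end{bmatrix} = \begin{bmatrix} 0.253 & 0.684 & 0.063 \\ 0.500 & -0.500 & 0.000 \\ 0.250 & 0.250 & -0.500 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \qquad (3)$$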
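Equation (3) applied per pixel, as a small sketch. For 8-bit inputs, Y lies in [0, 255] and E and S lie in [-127.5, 127.5]; these ranges follow directly from the matrix coefficients.

```python
# Rows of the RGB -> YES matrix in Equation (3)
YES_MATRIX = (
    (0.253, 0.684, 0.063),   # Y
    (0.500, -0.500, 0.000),  # E
    (0.250, 0.250, -0.500),  # S
)


def rgb_to_yes(r, g, b):
    """Convert one RGB pixel (0-255 per channel) to a (Y, E, S) tuple."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in YES_MATRIX)


def image_to_yes(pixels):
    """Convert a list of (R, G, B) pixels to a list of (Y, E, S) pixels."""
    return [rgb_to_yes(*p) for p in pixels]
```

A pure white pixel maps to (255, 0, 0): the Y row sums to 1, while the E and S rows sum to 0, so achromatic colors carry no E or S component.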
It is noted that the Y component picks out the edges of
the image, while the E and S components encode the
color (chrominance) information. The Y histogram may,
in some sense, provide the neural network with spatial
information rather than just color intensities. The errors
of the RGB histogram approach may have been due in
part to the similarity of the three R, G, and B histograms,
whereas the YES transform appears to produce
histograms with greater variation. Concatenating the
three histograms with the color moments into a single
vector therefore provides the network with more
information.
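The 24-element YES histogram feature (8 bins per component, as stated in Section 3.3) might be computed as follows. Equal-width bins over each component's range and normalization by the pixel count are assumptions, since the paper does not give the bin edges; the converter from Equation (3) is repeated so the block runs on its own.

```python
# Value range of each YES component for 8-bit RGB input (from Equation (3))
YES_RANGES = ((0.0, 255.0), (-127.5, 127.5), (-127.5, 127.5))
YES_MATRIX = ((0.253, 0.684, 0.063), (0.500, -0.500, 0.000), (0.250, 0.250, -0.500))


def rgb_to_yes(r, g, b):
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in YES_MATRIX)


def yes_histogram(pixels, bins=8):
    """Return a 3*bins vector: normalized Y, E and S histograms, concatenated."""
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        for c, value in enumerate(rgb_to_yes(r, g, b)):
            lo, hi = YES_RANGES[c]
            idx = int((value - lo) / (hi - lo) * bins)
            idx = min(max(idx, 0), bins - 1)  # the range maximum goes in the last bin
            hist[c * bins + idx] += 1.0
    n = len(pixels)
    return [h / n for h in hist]
```

Each 8-bin block of the result sums to 1, so images of different sizes yield comparable feature vectors.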
Color Moments
Color moments have been successfully used in many
color-based image retrieval and classification systems
[18-21], especially when the image contains just the
object. The first-order (mean), second-order (variance)
and third-order (skewness) color moments have proved
to be efficient and effective in representing the color
distributions of images. Mathematically, the first three
moments can be defined as:
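$$\mu_k = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}^{k}$$

$$\sigma_k^2 = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n} \left(p_{ij}^{k} - \mu_k\right)^2$$

$$s_k^3 = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n} \left(p_{ij}^{k} - \mu_k\right)^3$$

where $p_{ij}^{k}$ is the value of the k-th color component of
the ij-th image pixel and $N = mn$, where m and n are the
height and the width of the image, respectively.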
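The three moments above, computed for one color component, in a short sketch. Taking σ_k and s_k as the square root and the signed cube root of the corresponding averages follows the equations. The text does not state which color space the moments are computed in, so the hypothetical `moment_features` helper accepts any three-component pixels.

```python
import math


def color_moments(channel):
    """First three color moments (mu_k, sigma_k, s_k) of one color component.

    channel: flat list of the N = m*n values p_ij^k of component k.
    """
    n = len(channel)
    mu = sum(channel) / n
    var = sum((p - mu) ** 2 for p in channel) / n     # sigma_k^2
    third = sum((p - mu) ** 3 for p in channel) / n   # s_k^3
    sigma = math.sqrt(var)
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)  # signed cube root
    return mu, sigma, skew


def moment_features(pixels):
    """9 features: (mu, sigma, s) for each of the three color components."""
    feats = []
    for k in range(3):
        feats.extend(color_moments([p[k] for p in pixels]))
    return feats
```

The signed cube root keeps s_k negative for distributions skewed toward low values, which a plain `** (1/3)` on a negative float would not do in Python.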
3.3. Feature Classification
In this section, face detection is modeled as a simple
two-class classification task, and the goal is to assign a
class to a given image. There are various supervised
learning algorithms with which a face detector can be
trained. The neural classifier described in Section 2 is
used for the current classification task because of its
strong generalization capability and its reputation as a
highly accurate paradigm. The basic model of the ANN
classifier that we used is an MLP network with multiple
hidden layers of 20 neurons each, which closely
resembles the classical network structures but uses
adaptive activation functions in the hidden units (i.e.,
the hyperbolic-tangent function). Before the training
phase, the classifier is initialized with random weights on
the connections between the neurons. The learning
procedure followed by the ANN classifier is similar to the
well-known back-propagation procedure [22,23]. In our
approach, two classes of images are defined. During the
learning stage, the ANN classifier is trained using the
features extracted from the images in the training set.
The 24-bin YES histograms (8 bins for each component)
representing the color features are first transformed into
plain vectors and then fused with the image-moment
features. All feature vectors are finally fed into the ANN
classifier to distinguish the image classes. After the
learning stage is finished, the system is able to detect
and identify unseen images. The classifier produces a
real value between 0 and 1, which can easily be binarized
using a predetermined threshold.
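The fusion and classification steps can be sketched as a forward pass through a 33-20-1 network, the size reported in Section 4. That the 33 inputs equal the 24 histogram bins plus 9 moments (three per color component) is our inference from the numbers given. The weights here are random and untrained, the hidden units use plain tanh instead of the adaptive cubic-spline activation, and the sigmoid output unit is an assumption; the sketch only shows how a feature vector flows to the thresholded decision of Equation (2).

```python
import math
import random

N_INPUT, N_HIDDEN = 33, 20  # assumed: 24 histogram bins + 9 moments; 20 hidden units


def fuse_features(yes_hist, moments):
    """Concatenate the 24-bin YES histogram and the 9 color moments."""
    if len(yes_hist) != 24 or len(moments) != 9:
        raise ValueError("expected 24 histogram bins and 9 moments")
    return list(yes_hist) + list(moments)


def init_mlp(seed=0):
    """Random initial weights for a 33-20-1 network (untrained)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
    return w1, b1, w2, 0.0


def forward(params, x):
    """tanh hidden layer, sigmoid output unit, so y lies in (0, 1)."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


def detect_face(params, x, threshold=0.5):
    """Binarize the network output with a fixed threshold, as in Equation (2)."""
    return forward(params, x) > threshold
```

Training would adjust `w1`, `b1`, `w2` and `b2` with back-propagation as described above; in the CSNN, the spline coefficients of the hidden units would be adapted as well.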
4. Experimental Results
In this section, the experiments conducted to assess the
performance of the proposed approach are described and
some of their results are presented. To provide an
unbiased estimate of the generalization ability of the
classification process, the images in our dataset were
partitioned into two independent subsets, i.e., a training
set and a test set. More specifically, 50% of all images
were used for training and the remaining 50% were set
aside as a test set. An MLP network with 33 input, 20
hidden and 1 output neurons was trained on the training
set, while the detection performance was evaluated on
the test set. The first half of the training set was labeled
as face images, while the second half was labeled as
non-face. The face-labeled images were chosen to
represent a variety of ages, genders, and skin tones. The
non-face images represent different objects randomly
collected from internet sites. Some of the non-face
images were chosen to “fool” the neural classifier; for
instance, some of them contained flesh tones or facial
features. After training, the neural classifier correctly
classified all training images. The detection results
obtained on the test set are outlined in Table 1 and
depicted graphically in Figure 3 in terms of true positive
(TP) and false positive (FP) rates versus the number of
hidden layers of the network. It is worth mentioning that
some of the non-face images in the test set contained
skin tones that were not represented in the training set
(see Figure 4); such images posed a considerable
challenge for the proposed approach. Figure 5 shows
some results obtained with the proposed method when
applied to “multi-face” images in the test set. These
evaluation results demonstrate that the proposed
approach can not only detect human faces but also
accurately localize them in multi-face images.
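For reference, TP and FP rates such as those in Table 1 can be computed as below, together with a 50/50 split like the one used here. We assume TP is the fraction of face images detected and FP is the fraction of non-face images misclassified as faces; the random shuffle in the hypothetical `split_half` helper is also an assumption, since the text does not say how the halves were drawn.

```python
import random


def detection_rates(predictions, labels):
    """Return (TP rate, FP rate) for boolean predictions and labels.

    TP rate: fraction of face images (label True) detected as faces.
    FP rate: fraction of non-face images (label False) detected as faces.
    """
    faces = [p for p, l in zip(predictions, labels) if l]
    non_faces = [p for p, l in zip(predictions, labels) if not l]
    if not faces or not non_faces:
        raise ValueError("need at least one face and one non-face example")
    return sum(faces) / len(faces), sum(non_faces) / len(non_faces)


def split_half(items, seed=0):
    """Shuffle and split into equal-sized training and test halves."""
    items = list(items)
    random.Random(seed).shuffle(items)
    mid = len(items) // 2
    return items[:mid], items[mid:]
```

Averaging these rates over repeated runs would give figures of the kind labeled "average" in Table 1.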
Table 1. Accuracy performance vs. number of hidden layers.
No. of hidden layers              1      2      3      4      5
Average true positive (TP)        0.85   0.92   0.95   0.98   0.97
Average false positive (FP)       0.18   0.12   0.010  0.09   0.11
RMS error 10
<1.0 10
Figure 3. Detection performance in terms of true positive
(TP) and false positive (FP) vs the number of hidden layers
of the network.
Figure 4. Some results for “face” image localization: (a)
Source image; (b) Skin-colored regions; (c) Filtered image;
and (d) Localized face image.
Figure 5. Results for “multi-face” image localization.
5. Conclusions and Future Work
In this paper we have presented a computationally
efficient approach for real-time face detection and
localization in color images using a finite set of low-level
features directly derived from the input image. The
results obtained show that using YES histograms and
color moments to detect and localize faces is a promising
approach. The key advantage of the proposed approach is
that the training process takes very little time to
complete. Furthermore, the approach can locate multiple
faces with encouraging results, which enable it to
compare favorably with other state-of-the-art approaches
in terms of detection and false-positive rates.
Additionally, locating multiple faces in an image does not
substantially increase the processing time, so the
approach is suitable for real-time applications. However,
it would be advantageous to validate the approach
empirically on larger and more complex benchmark video
datasets that present many technical challenges in data
handling, such as object articulation, occlusion, and
significant background clutter. These issues are of great
interest, and we plan to address them thoroughly in our
future work.
6. Acknowledgements
This work is supported by Transregional Collaborative
Research Centre SFB/TRR 62 “Companion-Technology
for Cognitive Technical Systems” funded by DFG, and
BMBF Bernstein-Group (FKZ: 01GQ0702).
REFERENCES
[1] K. Sung and T. Poggio, “Example-Based Learning for
View-Based Human Face Detection,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 20,
No. 1, 1998, pp. 39-51. doi:10.1109/34.655648
[2] K. Yow and R. Cipolla, “Feature-Based Human Face Detection,”
Image and Vision Computing, Vol. 2, No. 15, 1997,
pp. 713-735. doi:10.1016/S0262-8856(97)00003-6
[3] T. Cootes and C. Taylor, “Locating Faces Using Statistical
Feature Detectors,” Proceedings of the Second International
Conference on Automatic Face and Gesture Recognition,
Killington, 14-16 October 1996, pp. 640-645.
doi:10.1109/AFGR.1996.557265
[4] T. Leung, M. Burl and P. Perona, “Finding Faces in Cluttered
Scenes Using Random Labeled Graph Matching,”
Proceedings of the Fifth International Conference on
Computer Vision, Cambridge, 20-23 June 1995, pp. 637-644.
[5] H. Rowley, S. Baluja and T. Kanade, “Neural Network-Based
Face Detection,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 20, No. 1, 1998,
pp. 23-38. doi:10.1109/34.655647
[6] K. Sung and T. Poggio, “Example-Based Learning for
View-Based Human Face Detection,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 20,
No. 1, 1998, pp. 39-51. doi:10.1109/34.655648
[7] A. Colmenarez and T. Huang, “Face Detection with In-
formation-Based Maximum Discrimination,” Proceed-
ings of the IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition, 17-19 June 1997,
pp. 278-287.
[8] M. D. Mariscoi, L. Cinque and S. Levialdi, “Indexing
Pictorial Documents by Their Content: A Survey of Cur-
rent Techniques,” Image and Vision Computing, Vol. 15,
No. 2, 1997, pp. 119-141.
doi:10.1016/S0262-8856(96)01114-6
[9] B. Schiele and A. Waibel, “Gaze Tracking Based on
Face-Color,” International Workshop on Face and Gesture
Recognition, Zurich, 1995.
[10] Y. Dai and Y. Nakano, “Face-Texture Model Based on
Sgld and Its Applications in Face Detection in a Color
Scene,” Pattern Recognition, Vol. 29, No. 6, 1996, pp.
1007-1017. doi:10.1016/0031-3203(95)00139-5
[11] Q. Chen, H. Wu and M. Yachida, “Face Detection by Fuzzy
Pattern Matching,” Proceedings of the Fifth International
Conference on Computer Vision, Cambridge, 20-23
June 1995, pp. 591-596.
[12] J. Cai and A. Goshtasby, “Detecting Human Faces in
Color Images,” Image and Vision Computing, Vol. 18,
1999, pp. 63-75. doi:10.1016/S0262-8856(99)00006-2
[13] Y. Miyake, H. Saitoh, H. Yaguchi and N. Tsukada, “Fa-
cial Pattern Detection and Color Correction from Televi-
sion Picture and Newspaper Printing,” Journal of Imag-
ing Technology, Vol. 16, No. 5, 1990, pp. 165-169.
[14] D. Androutsos, K. N. Plataniotis and A. N. Venetsanopoulos,
“A Novel Vector-Based Approach to Color
Image Retrieval Using a Vector Angular-Based Distance
Measure,” Computer Vision and Image Understanding,
Vol. 75, No. 1-2, 1999, pp. 46-58.
doi:10.1006/cviu.1999.0767
[15] S. Sadek, A. Al-Hamadi, B. Michaelis and U. Sayed, “Image
Retrieval Using Cubic Spline Neural Networks,” International
Journal of Video & Image Processing and Network
Security (IJIPNS), Vol. 9, No. 10, 2009, pp. 17-22.
[16] S. Sadek, A. Al-Hamadi, B. Michaelis and U. Sayed,
“Cubic-Spline Neural Network-Based System for Image
Retrieval,” Proceedings of Sixth International IEEE Conference
on Image Processing (ICIP’09), Cairo, 7-11 November
2009, pp. 273-276.
[17] S. Sadek, A. Al-Hamadi, B. Michaelis and U. Sayed, “A
Robust Neural System for Objectionable Image Recognition,”
Proceedings of Second International Conference on
Machine Vision (ICMV2009), Dubai, 28-30 December
2009, pp. 32-36.
[18] S. Sadek, A. Al-Hamadi, B. Michaelis and U. Sayed, “A
New Method for Image Classification Based on Multi-
Level Neural Networks,” Proceedings of International Con-
ference on Signal and Image Processing (IC-SIP2009), Amsterdam,
29 July-1 August 2009, pp. 197-200.
[19] B. Si, W. Gao, H. Lu and W. Zeng, “An Image Retrieval
Method Based Regions of Interest,” High Technology Letters,
Vol. 13, No. 5, 2003, pp. 13-18.
[20] S. Sadek, A. Al-Hamadi, B. Michaelis and U. Sayed, “An
Image Classification Approach Using Multilevel Neural
Networks,” Proceedings of IEEE International Confer-
ence on Intelligent Computing and Intelligent Systems
(ICIS’09), Shanghai, 17-20 September 2009, pp. 180-183.
[21] S. Sadek, A. Al-Hamadi, B. Michaelis and U. Sayed, “An
Efficient Approach for Region-Based Image Classifica-
tion and Retrieval,” Communications in Computer and In-
formation Science, Vol. 61, 2009, pp. 56-64.
doi:10.1007/978-3-642-10546-3_8
[22] D. Rumelhart, G. Hinton and R. Williams, “Learning
Internal Representation by Error Propagation,” Parallel Distributed
Processing: Explorations in the Microstructures
of Cognition, Vol. 1, MIT Press, Cambridge, 1986.
[23] C. Bishop, “Neural Networks for Pattern Recognition,”
Oxford University Press, Oxford, 1995.