Journal of Signal and Information Processing, 2013, 4, 62-65
doi:10.4236/jsip.2013.43B011 Published Online August 2013
Combining Multiple Cues for Pedestrian Detection in
Crowded Situations
Shih-Shinh Huang1, Feng-Chia Chang1, Ching-Hu Lu2
1Dept. of Computer and Communication Engineering, National Kaohsiung First University of Science and Technology; 2Dept. of
Information Communication, Yuan Ze University.
Received April, 2013.
This paper proposes a vision-based method for pedestrian detection in crowded situations using a single camera. The main
idea behind our work is to fuse multiple cues so that the major challenges in crowd detection, such as occlusion and
complex background, can be successfully overcome. Based on the assumption that human heads are visible, the circular
Hough transform (CHT) is applied to detect all circular regions, each of which is considered as the head candidate of a
pedestrian. After that, false candidates resulting from complex background are first removed by using a template
matching algorithm. Two proposed cues, called head foreground contrast (HFC) and block color relation (BCR), are
incorporated for further verification. The rectangular region of every detected human is determined by geometric
relationships as well as the foreground mask extracted through a background subtraction process. Three videos are used
to validate the proposed approach, and the experimental results show that the proposed method effectively lowers the
false positives at the expense of a small loss in detection rate.
Keywords: Pedestrian Detection; Circular Hough Transform; Head Foreground Contrast; Block Color Relation
1. Introduction
With the wide deployment of cameras in public environments,
such as airports, parking lots, and mass-transit stations,
accurately estimating the number of people and locating
each individual is an important issue in the automation of
video surveillance systems. It can provide valuable infor-
mation for safety control, urban planning, or business
management. However, successful pedestrian detection
in crowded situations is still a challenging problem due
to severe occlusion, dynamic background and foreground
clutter. The approaches for pedestrian detection can be
generally divided into two categories: approaches using global
features and those using local features. A common way to
use global features is based on template matching, which
constructs human templates from different viewing
angles. A human is detected by comparing the ex-
tracted shape with the constructed templates [9, 10]. This
kind of approach can avoid effects caused by complex
background and noise. However, approaches which
take only global features for human description have a
tendency to fail in detecting partially occluded humans.
In the literature, the use of complementary local fea-
tures, which describe the partial appearance of a human in
the image, has been motivated for solving the occlusion
problem. As is well known, the Histograms of Oriented
Gradients (HOG) were first proposed by Dalal and Triggs [11]
and have been widely used in human detection. Improvements
that make HOG more representative were proposed in [12,
13]. Although the methods using local features effec-
tively tackle the occlusion effect, they suffer from the
problem of false detections in the case of complex back-
ground or severe occlusion.
Based on the assumptions that the camera is stationary
and the pedestrians are standing upright, this paper
proposes a method for detecting pedestrians in crowded
situations by combining multiple features. The remainder
of this paper is organized as follows. In section 2, we
introduce how to generate a set of pedestrian candidates.
Then, three features including shape, head foreground
contrast (HFC) and block color relation (BCR) for veri-
fying the detected candidates are discussed in Section 3.
Finally, we validate the proposed method by using three
videos and give some discussion.
2. Candidate Generation
In this section, we introduce how to generate a set of
possible pedestrian candidates. The foreground regions
are first segmented by a background subtraction
algorithm, based on the assumption that the camera is
stationary. Then, a circle detection algorithm based on the
Hough transform (CHT) is applied to detect all circular
regions, each of which is considered as the head
candidate of a pedestrian.
Copyright © 2013 SciRes. JSIP
2.1. Foreground Segmentation
The intensity distribution of each pixel p is modeled by a
Gaussian distribution N(μ(p), Σ(p)), where μ(p) and Σ(p)
are the mean vector and covariance matrix, respectively.
The first N_B frames I_1, ..., I_{N_B} are used to initialize
μ and Σ as the sample mean and covariance:

μ(p) = (1/N_B) Σ_{t=1}^{N_B} I_t(p),
Σ(p) = (1/N_B) Σ_{t=1}^{N_B} (I_t(p) − μ(p))(I_t(p) − μ(p))^T  (1)

In order to adapt to background changes, μ and Σ are
updated over time; the update strategy is similar to the
method proposed in [1]. For the currently observed image
I, the pixel p is considered as foreground if the probability
Pr(I(p) | G_p) is below a pre-defined threshold θ_1, where
G_p is the constructed Gaussian distribution of the pixel p.
The parameter θ_1 is set to 0.1 in our work. The foreground
mask FM(·) can then be expressed as:

FM(p) = 1 if Pr(I(p) | G_p) < θ_1, and FM(p) = 0 otherwise.  (2)
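As a concrete illustration, the per-pixel Gaussian background model and foreground mask above can be sketched as follows in NumPy. The grayscale, single-variance formulation and the use of θ_1 = 0.1 as a likelihood-density threshold are simplifying assumptions for this sketch; the paper models each pixel with a full mean vector and covariance matrix and updates them over time.

```python
import numpy as np

THETA_1 = 0.1  # likelihood threshold from Sec. 2.1

def init_background(frames):
    """Initialize the per-pixel mean and variance from the first N_B frames."""
    stack = np.stack(frames).astype(np.float64)  # shape (N_B, H, W)
    mu = stack.mean(axis=0)
    var = stack.var(axis=0) + 1e-6               # regularize zero variance
    return mu, var

def foreground_mask(image, mu, var, theta=THETA_1):
    """FM(p) = 1 where the Gaussian likelihood of I(p) falls below theta."""
    likelihood = np.exp(-0.5 * (image - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return (likelihood < theta).astype(np.uint8)
```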
2.2. Circular Hough Transform
In crowded scenes with severe occlusion, the head contour is
a more robust cue, as it has the properties of low vari-
ance in appearance and high visibility from different
views. Accordingly, the detection of the head contour
generally serves as the first step to generate a set of pos-
sible pedestrian candidates [2, 3]. In those approaches, an
Ω-shape template is applied to locate the head-shoulder
candidates in the image. In cases of mutual occlusion, the
Ω-shape template may result in missed detections.
Instead of using an Ω-shape template, this work applies a
circular Hough transform algorithm [4] to find the
head candidates in the edge map of the segmented fore-
ground. The red circles shown in the left-top image of
Figure x are the detected head candidates. Apparently, all
heads are successfully detected, but with several false positives.
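A minimal accumulator-based circular Hough transform over an edge map can be sketched as follows. The paper applies the method of [4], so the candidate radius set, the 60 angular vote samples, and the single-best-circle output here are illustrative assumptions only.

```python
import numpy as np

def hough_circles(edge_map, radii):
    """Minimal circular Hough transform: every edge pixel votes for the
    candidate circle centres at each radius; return the strongest circle
    as (row, col, radius, votes)."""
    h, w = edge_map.shape
    ys, xs = np.nonzero(edge_map)
    thetas = np.linspace(0.0, 2 * np.pi, 60, endpoint=False)
    sin_t, cos_t = np.sin(thetas), np.cos(thetas)
    best = None
    for r in radii:
        acc = np.zeros((h, w), dtype=np.int32)
        # each edge point votes for all centres at distance r from it
        cy = np.rint(ys[:, None] - r * sin_t[None, :]).astype(int)
        cx = np.rint(xs[:, None] - r * cos_t[None, :]).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)
        i, j = np.unravel_index(acc.argmax(), acc.shape)
        if best is None or acc[i, j] > best[3]:
            best = (i, j, r, int(acc[i, j]))
    return best
```

In practice one would keep every accumulator peak above a vote threshold as a head candidate rather than only the single best circle.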
3. Candidate Verification
In this section, the shape feature and the two proposed
features, head foreground contrast (HFC) and block color
relation (BCR), are used to verify the detected head
candidates. These features are applied in a cascaded
manner, and the flow chart of the verification process is
shown in Figure 1.
Figure 1. The flow chart of the candidate verification.
3.1. Template Matching
Pedestrian shape has proven robust for describing
pedestrian appearance. A common way to utilize shape
for checking the existence of a pedestrian is to compare
it with a set of constructed templates [5]. The method
used to construct the templates for matching is the same
as in our previous work [6]; the images on the right-bottom
of Figure 2 are the learned templates. After obtaining the
edge map of the original image I, the similarity between I
and a template T can be calculated using the Chamfer
distance [7], which is defined as:

D_chamfer(I, T) = (1/|T|) Σ_{t∈T} DT_I(t)  (3)

where |T| is the number of edge points in template T
and DT_I is the distance transform of image I.
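The Chamfer distance of Eq. (3) can be sketched as below. The brute-force nearest-edge search stands in for a precomputed distance transform DT_I, and the function name and point-list template format are our own conventions.

```python
import numpy as np

def chamfer_distance(edge_image, template_points):
    """Eq. (3): average, over the template edge points, of the distance to
    the nearest edge pixel of the image (brute-force distance transform)."""
    ys, xs = np.nonzero(edge_image)
    edges = np.stack([ys, xs], axis=1).astype(float)   # image edge pixels
    t = np.asarray(template_points, dtype=float)       # template edge points
    # distance from each template point to its nearest image edge pixel
    d = np.sqrt(((t[:, None, :] - edges[None, :, :]) ** 2).sum(-1)).min(axis=1)
    return d.mean()
```

For repeated matching against many templates, the distance transform of the edge image would be computed once and the template points merely indexed into it.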
The existence of a pedestrian in the window determined by
the circle size of the detected head candidate is first
verified by template matching. If the minimum distance
of the pedestrian window to all templates is less than a
pre-defined threshold θ_2, the head candidate is considered
a true positive so far; θ_2 is set to 10 in our work. Figure 2
shows the verification process of the template matching.
Figure 2. Verification using template matching.
3.2. Head Foreground Contrast (HFC)
When the moving direction of the pedestrians is
horizontal with respect to the image, the foreground re-
gions inside and above the head exhibit high contrast. This
phenomenon is used to further eliminate false posi-
tives. Let R_Int and R_Abv be the two regions inside and
above the detected circular head candidate, respectively;
the pictures on the right-hand side of Figure 3 give a
schematic description of R_Int and R_Abv. We define the
existence confidence of a head candidate H using HFC as:

Conf_HFC(H) = (1/|R_Int|) Σ_{p∈R_Int} 1_FM(p) − (1/|R_Abv|) Σ_{p∈R_Abv} 1_FM(p)  (4)

where 1_FM(p) is an indicator function to check whether
the point p is a foreground one.
Figure 3. Region description of R_Int and R_Abv.
3.3. Block Color Relation (BCR)
According to the observation of color consistency in [2],
the colors of blocks below and above head block (back-
ground) should be different and the colors of the next
two blocks below head block should be similar as shown
in Figure 4. In this work, the Bhattacharyya distance [8]
is adopted to model the color similarity of two blocks.
Let μ_i and Σ_i be the color mean and covariance
matrix over a region R_i. Then, the color difference be-
tween two regions measured by the Bhattacharyya distance
is defined as:

d_b(R_i, R_j) = (1/8)(μ_i − μ_j)^T [(Σ_i + Σ_j)/2]^{-1} (μ_i − μ_j)
              + (1/2) ln( |(Σ_i + Σ_j)/2| / sqrt(|Σ_i| |Σ_j|) )  (5)

Consequently, the existence confidence of H using
BCR is defined as:

Conf_BCR(H) = d_b(R_1, R_2) − d_b(R_3, R_4)  (6)

Then, the HFC and BCR confidences are fused by linear
weighting as in (7), and a head candidate with a high
combined confidence is finally considered a pedestrian head:

Conf(H) = 0.5 Conf_HFC(H) + 0.5 Conf_BCR(H)  (7)

The pedestrian region is determined using the circle
radius and the aspect ratio of the human body; an example
is shown in Figure 5.
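A sketch of the Bhattacharyya distance of Eq. (5) and the BCR confidence of Eq. (6) for Gaussian colour models follows. The block layout, the per-block statistics computation, and the small regularization term added to each covariance are illustrative assumptions.

```python
import numpy as np

def bhattacharyya(mu_i, cov_i, mu_j, cov_j):
    """Eq. (5): Bhattacharyya distance between two Gaussian colour models."""
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    cov_i, cov_j = np.asarray(cov_i, float), np.asarray(cov_j, float)
    c = (cov_i + cov_j) / 2.0
    diff = mu_i - mu_j
    term1 = diff @ np.linalg.solve(c, diff) / 8.0
    term2 = 0.5 * np.log(np.linalg.det(c) /
                         np.sqrt(np.linalg.det(cov_i) * np.linalg.det(cov_j)))
    return term1 + term2

def conf_bcr(r1, r2, r3, r4):
    """Eq. (6): head/background blocks (R1, R2) should differ while the
    two torso blocks (R3, R4) should match."""
    stats = []
    for region in (r1, r2, r3, r4):             # each region: (h, w, 3) pixels
        pixels = region.reshape(-1, 3)
        stats.append((pixels.mean(axis=0),
                      np.cov(pixels.T) + 1e-6 * np.eye(3)))  # regularized
    d12 = bhattacharyya(stats[0][0], stats[0][1], stats[1][0], stats[1][1])
    d34 = bhattacharyya(stats[2][0], stats[2][1], stats[3][0], stats[3][1])
    return d12 - d34
```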
4. Experiment
The proposed method is implemented on a platform with a
2.4 GHz Intel Core i7 and 4 GB of RAM. The OpenCV li-
brary is used to facilitate the development of our system.
In this section, we introduce the scenarios and camera
settings used for collecting three videos and also give a
quantitative analysis of our proposed method.
Figure 4. Block color relation. For a correct candidate, the
regions R1 and R2 should have high color difference; the
regions R3 and R4 should have high color similarity.
Figure 5. The determination of pedestrian region using the
radius of head circle.
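Since the text does not give the exact geometry for mapping the head circle to a pedestrian rectangle (Figure 5), a hypothetical version might look like this; both `width_scale` and `aspect` are invented illustrative values, not parameters from the paper.

```python
def pedestrian_region(cy, cx, r, width_scale=3.0, aspect=0.4):
    """Hypothetical head-circle-to-rectangle mapping: body width from the
    head diameter, height from an assumed width/height aspect ratio."""
    w = width_scale * 2 * r      # body width (illustrative scale factor)
    h = w / aspect               # body height from the aspect ratio
    top = cy - r                 # box starts at the top of the head circle
    left = cx - w / 2
    return left, top, w, h
```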
On our campus, three videos are collected for validating
our proposed method. Each collected video has 120
frames with a resolution of 1280 × 720, and the detailed
descriptions of these three videos in terms of depression
angle, moving direction, near/far, and scene are listed in
Table 1. The first 20 frames of each video are used for
background modeling, and the ground truth of the
remaining 100 frames is manually annotated. The
criteria used for evaluating the detected results are
detection rate (DR) and false alarm rate (FAR).
Table 1. Description of three collected videos.
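The text does not spell out the DR and FAR formulas; a common convention, assumed here, is DR = TP / (TP + FN) and FAR = FP / (TP + FP):

```python
def detection_metrics(tp, fn, fp):
    """Detection rate and false-alarm rate from per-video counts of true
    positives, false negatives, and false positives (assumed definitions)."""
    dr = tp / (tp + fn)    # fraction of annotated pedestrians detected
    far = fp / (tp + fp)   # fraction of detections that are false alarms
    return dr, far
```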
In this section, we compare the performance of three
methods and analyze the complementary property of the
HFC and BCR features. The three methods are based on
the head-foreground contrast (HFC), the block color
relation (BCR), and both combined. Table 2 shows the
experimental results for the three videos, and some
detection results are shown in Figure 6. Apparently, the
method combining HFC and BCR significantly reduces
the FAR at the expense of a little DR. To further discuss
the complementary property of HFC and BCR, we exhibit
the detection rate of these three methods for every frame
in Figure 7. As the red and blue lines show, HFC and
BCR are complementary to each other.

Table 2. Detection rate (DR) and false alarm rate (FAR) of
the three approaches for the three videos, respectively.

Figure 6. Examples of the detection results in crowded
situations.

Figure 7. Complementary analysis of HFC and BCR for
video 2.

5. Conclusions
This paper presents a method for pedestrian detection in
a crowded scene. First, the foreground regions are seg-
mented from the background by using a background
subtraction technique, and the circular Hough transform
is used to extract the head candidates. By combining two
complementary features, HFC and BCR, the experimental
results on three videos show that the proposed method
can reduce the false positive rate at the expense of a little
detection rate. However, the accuracy of the proposed
method is not yet ready for real applications. The motion
cue as well as a tracking strategy should be incorporated
into the system in the near future.

6. Acknowledgements
This research is partially supported by the project grant

REFERENCES
[1] C. Stauffer and W. E. L. Grimson, "Adaptive Background
Mixture Models for Real-Time Tracking," IEEE Intl. Conf.
on Computer Vision and Pattern Recognition, Vol. 2,
1999, pp. 246-252.
[2] P. Tu, et al., "Unified Crowd Segmentation," European
Conf. on Computer Vision, 2008, pp. 691-704.
[3] T. Zhao, "Bayesian Human Segmentation in Crowded
Situations," IEEE Intl. Conf. on Computer Vision and
Pattern Recognition, Vol. 2, 2003, pp. 459-466.
[4] M. Perreira Da Silva, V. Courboulay, A. Prigent and P.
Estraillier, "Fast, Low Resource, Head Detection and
Tracking for Interactive Applications," PsychNology
Journal, 2009, pp. 243-264.
[5] D. M. Gavrila, "A Bayesian, Exemplar-Based Approach
to Hierarchical Shape Matching," IEEE Trans. on Pattern
Analysis and Machine Intelligence, Vol. 29, No. 8, 2007,
pp. 1408-1421.
[6] S. S. Huang, C. Y. Mao, P. Y. Hsiao and L. A. Yen,
"Global Template Matching for Guiding the Learning of
Human Detector," IEEE Conf. on Systems, Man, and Cy-
bernetics, 2012, pp. 565-570.
[7] D. T. Nguyen, P. Ogunbona and W. Li, "Human Detec-
tion Based on Weighted Template Matching," IEEE Intl.
Conf. on Multimedia and Expo, 2009, pp. 634-637.
[8] S. S. Huang, L. C. Fu and P. Y. Hsiao, "Region-Level
Motion-Based Background Modeling and Subtraction
Using MRFs," IEEE Transactions on Image Processing,
Vol. 16, No. 5, 2007, pp. 1446-1456.
[9] S. Belongie and J. Malik, "Matching with Shape Con-
texts," IEEE Workshop on Content-based Access of Im-
age and Video Libraries, 2000, pp. 20-26.
[10] A. Broggi, M. Bertozzi, A. Fascioli and M. Sechi,
"Shape-Based Pedestrian Detection," IEEE Intelligent
Vehicle Symposium, 2000, pp. 215-220.
[11] N. Dalal and B. Triggs, "Histograms of Oriented Gradi-
ents for Human Detection," IEEE Intl. Conf. on Computer
Vision and Pattern Recognition, Vol. 1, 2005, pp. 886-893.
[12] Z. Hao, B. Wang and J. Teng, "Fast Pedestrian Detec-
tion Based on Adaboost and Probability Template
Matching," Intl. Conf. on Advanced Computer Control,
Vol. 2, 2010, pp. 390-394.
[13] S. Paisitkriangkrai, C. Shen and J. Zhang, "Performance
Evaluation of Local Features in Human Classification and
Detection," IET Computer Vision, Vol. 2, No. 4, 2008, pp.
236-246. doi:10.1049/iet-cvi:20080026