Combining Multiple Cues for Pedestrian Detection in Crowded Situations

doi:10.4236/jsip.2013.43B011


Journal of Signal and Information Processing, 2013, 4, 62-65

doi:10.4236/jsip.2013.43B011 Published Online August 2013 (http://www.scirp.org/journal/jsip)


Shih-Shinh Huang1, Feng-Chia Chang1, Ching-Hu Lu2

1Dept. of Computer and Communication Engineering, National Kaohsiung First University of Science and Technology;

2Dept. of Information Communication, Yuan Ze University.

Email: powwhuang@gmail.com

Received April, 2013.

ABSTRACT

This paper proposes a vision-based method for detecting pedestrians in crowded situations using a single camera. The main idea is to fuse multiple cues so that the major challenges of crowd detection, such as occlusion and complex backgrounds, can be overcome. Based on the assumption that human heads are visible, the circular Hough transform (CHT) is applied to detect all circular regions, each of which is considered a head candidate of a pedestrian. The false candidates resulting from complex backgrounds are first removed by a template matching algorithm. Two proposed cues, head foreground contrast (HFC) and block color relation (BCR), are then incorporated for further verification. The rectangular region of every detected human is determined by geometric relationships as well as the foreground mask extracted through background subtraction. Three videos are used to validate the proposed approach, and the experimental results show that the proposed method effectively lowers the number of false positives at the expense of a small loss in detection rate.

Keywords: Pedestrian Detection; Circular Hough Transform; Head Foreground Contrast; Block Color Relation

1. Introduction

With the wide deployment of cameras in public environments, such as airports, parking lots, and mass-transit stations, accurately estimating the number of people and locating each individual is an important issue in the automation of video surveillance systems. It can provide valuable information for safety control, urban planning, or business management. However, successful pedestrian detection in crowded situations is still a challenging problem due to severe occlusion, dynamic backgrounds, and foreground clutter. The approaches for pedestrian detection can generally be divided into two categories: those using global features and those using local features. A common way to use global features is template matching, which constructs human templates from different viewing angles and detects humans by comparing the extracted shape with the constructed templates [9, 10]. This kind of approach can reduce the effects of complex backgrounds and noise. However, approaches that rely only on global features for human description tend to fail in detecting partially occluded humans.

In the literature, complementary local features, which describe the partial appearance of a human in the image, have been used to handle occlusion. As is well known, Histograms of Oriented Gradients (HOGs) were first proposed by Dalal and Triggs [11] and have been widely used in human detection. Improvements that make HOGs more representative were proposed in [12, 13]. Although methods using local features effectively tackle occlusion, they suffer from false detections in the case of complex backgrounds or severe occlusion.

Based on the assumptions that the camera is stationary and the pedestrians are standing upright, this paper proposes a method for detecting pedestrians in crowded situations by combining multiple features. The remainder of this paper is organized as follows. Section 2 introduces how a set of pedestrian candidates is generated. Three features for verifying the detected candidates, namely shape, head foreground contrast (HFC), and block color relation (BCR), are then discussed in Section 3. Finally, we validate the proposed method using three videos and give some discussion.

2. Candidate Generation

In this section, we introduce how to generate a set of possible pedestrian candidates. The foreground regions are first segmented by a background subtraction algorithm, based on the assumption that the camera is stationary. Then, a circle detection algorithm based on the Hough transform (CHT) is applied to detect all circular regions, each of which is considered a head candidate of a pedestrian.

2.1. Foreground Segmentation

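The intensity distribution of each pixel is modeled by a Gaussian distribution $N(\mu, \Sigma)$, where $\mu$ and $\Sigma$ are the mean vector and covariance matrix, respectively. The first $N_B$ frames are used to initialize $\mu$ and $\Sigma$ as follows:

$$\mu = \frac{1}{N_B}\sum_{t=1}^{N_B} x_t, \qquad \Sigma = \frac{1}{N_B}\sum_{t=1}^{N_B} (x_t - \mu)(x_t - \mu)^T \tag{1}$$

where $x_t$ is the value of the pixel observed in frame $t$.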

To adapt to background changes, the parameters $\mu$ and $\Sigma$ are updated over time, using a strategy similar to the method proposed in [1]. For the currently observed image $I$, the pixel $p$ is considered foreground if the probability $\Pr(I(p) \mid N(p))$ is below the pre-defined threshold $\tau_1$, where $N(p)$ is the constructed Gaussian distribution of the pixel $p$. The parameter $\tau_1$ is set to 0.1 in our work. The foreground mask $FM(\cdot)$ can be expressed as:

$$FM(p) = \begin{cases} F & \text{if } \Pr(I(p) \mid N(p)) < \tau_1 \\ B & \text{otherwise} \end{cases} \tag{2}$$

where $F$ and $B$ denote the foreground and background labels, respectively.
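As an illustration of the per-pixel model and the thresholding in (2), the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes scalar (grayscale) intensities instead of color vectors, omits the online update of [1], and compares the peak-normalized likelihood exp(-d^2/2) with the threshold so that a value such as 0.1 stays meaningful for any variance. The function names, the 1/0 encoding of the mask, and the variance floor `min_var` are hypothetical.

```python
import math


def init_background(frames):
    """Estimate a per-pixel mean and variance from the first N_B grayscale frames."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]
    var = [[sum((f[y][x] - mean[y][x]) ** 2 for f in frames) / n for x in range(w)]
           for y in range(h)]
    return mean, var


def foreground_mask(image, mean, var, tau1=0.1, min_var=4.0):
    """Return a mask with 1 where a pixel is unlikely under its background Gaussian."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            v = max(var[y][x], min_var)  # variance floor (assumption) avoids division by ~0
            # Peak-normalized Gaussian likelihood, always in (0, 1].
            likelihood = math.exp(-0.5 * (image[y][x] - mean[y][x]) ** 2 / v)
            mask[y][x] = 1 if likelihood < tau1 else 0
    return mask


# Toy data: six nearly constant background frames, then one frame with a bright pixel.
background = [[[100 + (t % 3) - 1 for _ in range(4)] for _ in range(3)] for t in range(6)]
mean, var = init_background(background)
frame = [[100] * 4 for _ in range(3)]
frame[1][2] = 180
mask = foreground_mask(frame, mean, var)
```

In this toy example only the pixel at row 1, column 2 deviates strongly from its background model, so it is the only one marked as foreground.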

2.2. Circular Hough Transform

In crowded scenes with severe occlusion, the head contour is a more robust cue, since it has low variance in appearance and high visibility from different views. Accordingly, detecting the head contour generally serves as the first step to generate a set of possible pedestrian candidates [2, 3]. In those approaches, an Ω-shape template is applied to locate the head-shoulder candidates in the image. In the case of mutual occlusion, the Ω-shape template may result in missed detections. Instead of using an Ω-shape template, this work applies a circular Hough transform algorithm [4] to find the head candidates in the edge map of the segmented foreground. The red circles shown in the top-left image of Figure x are the detected head candidates. All heads are successfully detected, but with several false alarms.
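The voting principle behind the circular Hough transform can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of [4]: the candidate radii are given in advance, every edge pixel votes for all centers at distance r from it without using gradient direction, and only the strongest peak per radius is kept. All function names are hypothetical.

```python
import math
from collections import Counter


def unit_directions(steps=360):
    """Unit vectors sampled uniformly around the circle."""
    return [(math.cos(2 * math.pi * k / steps), math.sin(2 * math.pi * k / steps))
            for k in range(steps)]


def circle_edge_points(cx, cy, r, steps=360):
    """Rasterize a circle outline into a set of integer edge pixels (toy input)."""
    return {(round(cx + r * c), round(cy + r * s)) for c, s in unit_directions(steps)}


def hough_circles(edge_points, radii, steps=360):
    """Return (votes, cx, cy, r) for the strongest center of each radius, best first."""
    directions = unit_directions(steps)
    peaks = []
    for r in radii:
        accumulator = Counter()
        for x, y in edge_points:
            # Each edge pixel votes once for every cell lying at distance r from it.
            centers = {(round(x - r * c), round(y - r * s)) for c, s in directions}
            accumulator.update(centers)
        if accumulator:
            (cx, cy), votes = accumulator.most_common(1)[0]
            peaks.append((votes, cx, cy, r))
    return sorted(peaks, reverse=True)


edges = circle_edge_points(20, 15, 8)
votes, cx, cy, r = hough_circles(edges, radii=[7, 8, 9])[0]
```

For the synthetic circle, the accumulator cell at the true center collects one vote from every edge pixel, which is why it wins over neighboring cells and over the wrong radii. In a real edge map, several peaks above a vote threshold would be kept as head candidates.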



3. Candidate Verification

In this section, the shape feature and the two proposed features, head foreground contrast (HFC) and block color relation (BCR), are used to verify the detected head candidates. These features are applied in a cascaded manner, and the flow chart is shown in Figure 1.

Figure 1. The flow chart of the candidate verification (stages: Template Matching, then Head Foreground Contrast (HFC), then Block Color Relation (BCR)).
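The cascaded verification can be pictured with the short sketch below. It is an illustration under assumptions rather than the actual system: each candidate is a dictionary with precomputed scores, the shape stage rejects candidates whose Chamfer distance is not below a threshold, and the HFC and BCR scores are assumed to lie in [0, 1] and are combined with equal weights. The field names and `conf_threshold` are hypothetical.

```python
def verify_candidates(candidates, tau2=10.0, conf_threshold=0.5):
    """Return the ids of candidates that pass the shape stage and the fused-confidence stage."""
    accepted = []
    for cand in candidates:
        # Stage 1: shape check, reject early when no template is close enough.
        if cand["chamfer"] >= tau2:
            continue
        # Stage 2: combine the HFC and BCR confidences with equal weights.
        fused = 0.5 * cand["hfc"] + 0.5 * cand["bcr"]
        if fused >= conf_threshold:
            accepted.append(cand["id"])
    return accepted


candidates = [
    {"id": "A", "chamfer": 4.2, "hfc": 0.9, "bcr": 0.7},   # plausible head
    {"id": "B", "chamfer": 15.0, "hfc": 0.9, "bcr": 0.9},  # fails the shape check
    {"id": "C", "chamfer": 6.0, "hfc": 0.2, "bcr": 0.3},   # low fused confidence
]
result = verify_candidates(candidates)
print(result)  # prints ['A']
```

Running the cheap shape test first means the later cues are only evaluated for candidates that already look like a pedestrian.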

3.1. Template Matching

Pedestrian shape has proven to be robust in describing pedestrian appearance. A common way to use shape for checking the existence of a pedestrian is to compare it with a set of constructed templates [5]. The templates for matching are constructed in the same way as in our previous work [6]. Let $\{T_i\}$ be the set of constructed templates. The images at the bottom right of Figure 2 are the learned templates. After obtaining the edge templates, the similarity between the original image $I_{origin}$ and a template $T$ can be calculated using the Chamfer distance [7], which is defined as:

$$D_{chamfer}(I_{origin}, T) = \frac{1}{|T|} \sum_{t \in T} d_{DT}(I_{origin}, t) \tag{3}$$

where $|T|$ is the number of edge points in template $T$ and $d_{DT}(I_{origin}, t)$ is the distance transform of the image $I_{origin}$ evaluated at the edge point $t$.

The existence of a pedestrian in a window, whose size is determined by the circle size of the detected head candidate, is first verified by template matching. If the minimum distance of the pedestrian window to all templates is less than a defined threshold $\tau_2$, this head candidate is considered a true positive so far. $\tau_2$ is set to 10. Figure 2 shows the verification process of template matching.

Figure 2. Verification using template matching.
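The Chamfer distance used for template matching can be illustrated with a small, self-contained sketch. It assumes a binary edge map stored as nested lists and uses a brute-force Euclidean distance transform, which is only practical for tiny images; the function names and the `offset` parameter (the position of the template inside the window) are hypothetical.

```python
import math


def distance_transform(edge_map):
    """Distance from every pixel to the nearest edge pixel (brute force)."""
    h, w = len(edge_map), len(edge_map[0])
    edges = [(x, y) for y in range(h) for x in range(w) if edge_map[y][x]]
    if not edges:
        raise ValueError("edge map contains no edge pixels")
    return [[min(math.hypot(x - ex, y - ey) for ex, ey in edges) for x in range(w)]
            for y in range(h)]


def chamfer_distance(dt, template_points, offset=(0, 0)):
    """Average distance-transform value over the template's edge points."""
    ox, oy = offset
    return sum(dt[y + oy][x + ox] for x, y in template_points) / len(template_points)


# Toy window: a vertical edge segment at x = 2, y = 1..4.
edge_map = [[0] * 6 for _ in range(6)]
for y in range(1, 5):
    edge_map[y][2] = 1
dt = distance_transform(edge_map)

vertical = [(0, 0), (0, 1), (0, 2), (0, 3)]    # template that matches the edge
horizontal = [(0, 0), (1, 0), (2, 0), (3, 0)]  # template that does not
d_vertical = chamfer_distance(dt, vertical, offset=(2, 1))
d_horizontal = chamfer_distance(dt, horizontal, offset=(2, 1))
best = min(d_vertical, d_horizontal)  # minimum distance over all templates
accepted = best < 10                  # compared with the threshold of 10
```

The matching template yields a distance of 0 and the mismatched one 1.5; the candidate is kept when the minimum over all templates is below the threshold.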

3.2. Head Foreground Contrast (HFC)

When the pedestrians move horizontally with respect to the image, the foreground regions inside and above the head have high contrast. This phenomenon is used to further eliminate false positives. Let $R_{Int}$ and $R_{Avt}$ be the two regions inside and above the detected circular head candidate, respectively. The pictures on the right-hand side of Figure 3 give a schematic description of $R_{Int}$ and $R_{Avt}$. We define the existence confidence of a head candidate $H$ using HFC as:

$$Conf_{HFC}(H) = \sum_{p \in R_{Int}} 1_{FM(x)=F}(p) - \sum_{p \in R_{Avt}} 1_{FM(x)=F}(p) \tag{4}$$

where $1_{FM(x)=F}(p)$ is an indicator function that checks whether the point $p$ is a foreground one.

Figure 3. Region description of RInt and RAvt.
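A possible reading of the HFC cue is sketched below. It is an assumption-laden illustration: the confidence is taken as the number of foreground pixels inside the head circle minus the number of foreground pixels in the region above it, and the region above the head is assumed to be a band of height r and width 2r + 1 directly on top of the circle, since its exact shape is only given graphically in Figure 3. The mask uses 1 for foreground and 0 for background, and the function name is hypothetical.

```python
def hfc_confidence(fm, cx, cy, r):
    """Foreground count inside the head circle minus the count in the band above it."""
    h, w = len(fm), len(fm[0])
    inside = above = 0
    for y in range(max(0, cy - 2 * r), min(h, cy + r + 1)):
        for x in range(max(0, cx - r), min(w, cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                inside += fm[y][x]   # pixel belongs to the region inside the head
            elif y < cy - r:
                above += fm[y][x]    # pixel belongs to the assumed band above the head
    return inside - above


# Head at the top of a silhouette: foreground only from row 5 downwards.
silhouette = [[1 if y >= 5 else 0 for _ in range(12)] for y in range(12)]
# Spurious circle in the middle of a large foreground blob.
blob = [[1] * 12 for _ in range(12)]

conf_true = hfc_confidence(silhouette, cx=5, cy=7, r=2)
conf_false = hfc_confidence(blob, cx=5, cy=7, r=2)
```

A true head has foreground inside the circle and background above it, so its score (13 here) is much higher than that of a circle found inside a foreground blob (3 here).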

3.3. Block Color Relation (BCR)

According to the observation of color consistency in [2], the colors of the blocks below and above the head block (background) should be different, and the colors of the next two blocks below the head block should be similar, as shown in Figure 4. In this work, the Bhattacharyya distance [8] is adopted to model the color similarity of two blocks. Let $\mu_i$ and $\Sigma_i$ be the color mean and covariance matrix over a region $R_i$. Then, the color difference between two regions using the Bhattacharyya distance is defined as:

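$$d_b(R_i, R_j) = \frac{1}{8}(\mu_i - \mu_j)^T \left(\frac{\Sigma_i + \Sigma_j}{2}\right)^{-1} (\mu_i - \mu_j) + \frac{1}{2}\ln\frac{\left|\frac{\Sigma_i + \Sigma_j}{2}\right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}} \tag{5}$$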

Consequently, the existence confidence of $H$ using BCR is defined as:

$$Conf_{BCR}(H) = d_b(R_1, R_2) - d_b(R_3, R_4) \tag{6}$$

Then, HFC and BCR are fused by linear weighting as in (7), and a head candidate with a high combined confidence is finally considered a pedestrian head.

$$Conf(H) = 0.5\,Conf_{HFC}(H) + 0.5\,Conf_{BCR}(H) \tag{7}$$

The pedestrian region is determined using the circle radius and the aspect ratio of the human body; an example is shown in Figure 5.
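The BCR cue and the equal-weight fusion can be sketched as follows. The example is a simplification under stated assumptions: each block is summarized by scalar (grayscale) values rather than color vectors, so the univariate form of the Bhattacharyya distance between two Gaussians is used, and the BCR confidence is taken as the distance between the first pair of blocks minus the distance between the second pair. The function names and the variance floor `min_var` are hypothetical, and the two confidences are assumed to be on comparable scales before fusion.

```python
import math


def mean_var(values, min_var=1e-6):
    """Mean and (floored) variance of the pixel values in one block."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, max(v, min_var)


def bhattacharyya(block_i, block_j):
    """Bhattacharyya distance between the univariate Gaussians fitted to two blocks."""
    mi, vi = mean_var(block_i)
    mj, vj = mean_var(block_j)
    mean_term = 0.25 * (mi - mj) ** 2 / (vi + vj)
    var_term = 0.5 * math.log((vi + vj) / (2.0 * math.sqrt(vi * vj)))
    return mean_term + var_term


def bcr_confidence(r1, r2, r3, r4):
    """High when R1 and R2 differ in color while R3 and R4 look alike."""
    return bhattacharyya(r1, r2) - bhattacharyya(r3, r4)


def fused_confidence(conf_hfc, conf_bcr):
    """Equal-weight linear fusion of the two cues."""
    return 0.5 * conf_hfc + 0.5 * conf_bcr


r1 = [200, 202, 198, 200]  # bright block
r2 = [60, 62, 58, 60]      # dark block
r3 = [90, 92, 88, 90]      # two blocks with
r4 = [91, 93, 89, 91]      # nearly identical color
conf_bcr = bcr_confidence(r1, r2, r3, r4)
```

Here the first pair of blocks is far apart (distance 1225) and the second pair is almost identical (distance 0.0625), which gives a large BCR confidence.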

4. Experiment

The proposed method is implemented on a platform with a 2.4 GHz Intel Core i7 processor and 4 GB of RAM. The OpenCV library is used to facilitate the development of our system. In this section, we introduce the scenarios and camera settings used for collecting three videos and give a quantitative analysis of the proposed method.

Figure 4. Block color relation. For a correct candidate, the regions R1 and R2 should have high color difference; the regions R3 and R4 should have high color similarity.

Figure 5. The determination of pedestrian region using the

radius of head circle.
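The way a pedestrian region can be derived from a head circle may be sketched as below. The numeric proportions are illustrative assumptions and are not taken from this work: the body height is set to `heads_per_body` head diameters and the box width to `aspect_ratio` times the height. The function name and both parameters are hypothetical.

```python
def pedestrian_box(cx, cy, r, heads_per_body=7.5, aspect_ratio=0.4, image_size=None):
    """Return (left, top, width, height) of the body box implied by a head circle."""
    height = round(heads_per_body * 2 * r)   # body height in pixels (assumed proportion)
    width = round(aspect_ratio * height)     # box width from the assumed aspect ratio
    left = round(cx - width / 2)             # box is centered on the head horizontally
    top = cy - r                             # box starts at the top of the head
    if image_size is not None:
        # Clip the box to the image boundaries.
        img_w, img_h = image_size
        right, bottom = min(img_w, left + width), min(img_h, top + height)
        left, top = max(0, left), max(0, top)
        width, height = right - left, bottom - top
    return left, top, width, height


box = pedestrian_box(100, 50, 10)
clipped = pedestrian_box(20, 650, 10, image_size=(1280, 720))
```

With these assumed proportions, a head of radius 10 centered at (100, 50) gives the box (70, 40, 60, 150); near the image border the box is clipped, here to (0, 640, 50, 80).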

Three videos were collected on our campus to validate the proposed method. Each video has 120 frames with a resolution of 1280 × 720, and the detailed description of the three videos in terms of depression angle, moving direction, near/far, and scene is listed in Table 1. The first 20 frames of each video are used for background modeling, and the ground truth of the remaining 100 frames is manually annotated. The criteria used for evaluating the detection results are detection rate (DR) and false alarm rate (FAR).
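The exact formulas for DR and FAR are not spelled out here, so the sketch below uses one common convention as an assumption: DR is the fraction of annotated pedestrians that are detected, and FAR is the fraction of reported detections that do not correspond to an annotated pedestrian. The function name and the example counts are hypothetical.

```python
def detection_metrics(true_positives, false_positives, num_ground_truth):
    """Return (DR, FAR) under the assumed definitions described above."""
    num_detections = true_positives + false_positives
    dr = true_positives / num_ground_truth if num_ground_truth else 0.0
    far = false_positives / num_detections if num_detections else 0.0
    return dr, far


# Hypothetical counts for one video: 50 annotated pedestrians, 45 found, 5 false alarms.
dr, far = detection_metrics(true_positives=45, false_positives=5, num_ground_truth=50)
```

Other conventions exist, for example false alarms per frame, so the numbers are only comparable when the same definition is used throughout.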

Table 1. Description of three collected videos.

In this section, we compare the performance of three methods and analyze the complementary property of the HFC and BCR features. The three methods are based on head foreground contrast (HFC), block color relation (BCR), and both. Table 2 shows the experimental results for the three videos, and some results are shown in Figure 6. The method combining HFC and BCR significantly reduces the FAR at the expense of a small loss in DR. To further discuss the complementary property of HFC and BCR, we show the detection rate of the three methods for every frame in Figure 7. HFC and BCR are complementary to each other, as the red and blue lines show.

Table 2. Detection rate (DR) and false alarm rate (FAR) of the three approaches for three videos, respectively.

Figure 6. Examples of the detection results in crowded situations.

Figure 7. Complementary Analysis of HFC and BCR for the video 2.

5. Conclusions

This paper presents a method for pedestrian detection in crowded scenes. First, the foreground regions are segmented from the background using a background subtraction technique, and the circular Hough transform is used to extract the head candidates. By combining two complementary features, HFC and BCR, the experimental results on three videos show that the proposed method can reduce the false positive rate at the expense of a small loss in detection rate. However, the accuracy of the proposed method is not yet sufficient for real applications. Motion cues as well as a tracking strategy should be incorporated into the system in the near future.

6. Acknowledgements

This research is partially supported by the project grant 101-2221-E-327-038-.

REFERENCES

[1] C. Stauffer and W. E. L. Grimson, “Adaptive Background Mixture Models for Real-Time Tracking,” IEEE Intl. Conf. on Computer Vision and Pattern Recognition, Vol. 2, 1999, pp. 246-252.

[2] P. Tu, et al., “Unified Crowd Segmentation,” European Conf. on Computer Vision, 2008, pp. 691-704.

[3] T. Zhao, “Bayesian Human Segmentation in Crowded Situations,” IEEE Intl. Conf. on Computer Vision and Pattern Recognition, Vol. 2, 2003, pp. 459-466.

[4] M. Perreira Da Silva, V. Courboulay, A. Prigent, and P. Estraillier, “Fast, Low Resource, Head Detection and Tracking for Interactive Applications,” PsychNology Journal, 2009, pp. 243-264.

[5] D. M. Gavrila, “A Bayesian, Exemplar-Based Approach to Hierarchical Shape Matching,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 29, No. 8, 2007, pp. 1408-1421.

[6] S. S. Huang, C. Y. Mao, P. Y. Hsiao, and L. A. Yen, “Global Template Matching for Guiding the Learning of Human Detector,” IEEE Conf. on Systems, Man, and Cybernetics, 2012, pp. 565-570.

[7] T. Nguyen, D. P. Ogunbona, and W. Li, “Human Detection Based on Weighted Template Matching,” IEEE Intl. Conf. on Multimedia and Expo, 2009, pp. 634-637.

[8] S. S. Huang, L. C. Fu, and P. Y. Hsiao, “Region-Level Motion-Based Background Modeling and Subtraction Using MRFs,” IEEE Transactions on Image Processing, Vol. 16, No. 5, 2007, pp. 1446-1456. doi:10.1109/TIP.2007.894246

[9] S. Belongie and J. Malik, “Matching with Shape Contexts,” IEEE Workshop on Content-based Access of Image and Video Libraries, 2000, pp. 20-26. doi:10.1109/IVL.2000.853834

[10] A. Broggi, M. Bertozzi, A. Fascioli, and M. Sechi, “Shaped-Based Pedestrian Detection,” IEEE Intelligent Vehicle Symposium, 2000, pp. 215-220.

[11] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” IEEE Intl. Conf. on Computer Vision and Pattern Recognition, Vol. 1, 2005, pp. 886-893.

[12] Z. Hao, B. Wang, and J. Teng, “Fast Pedestrian Detection Based on Adaboost and Probability Template Matching,” Intl. Conf. on Advanced Computer Control, Vol. 2, 2010, pp. 390-394.

[13] S. Paisitkriangkrai, C. Shen, and J. Zhang, “Performance Evaluation of Local Features in Human Classification and Detection,” IET Computer Vision, Vol. 2, No. 4, 2008, pp. 236-246. doi:10.1049/iet-cvi:20080026