Journal of Signal and Information Processing, 2013, 4, 299-307
http://dx.doi.org/10.4236/jsip.2013.43038 Published Online August 2013 (http://www.scirp.org/journal/jsip) 299
Depth-Aided Tracking Multiple Objects under Occlusion
Anh Tu Tran, Koichi Harada
Graduate School of Engineering, Hiroshima University, Higashihiroshima, Japan.
Email: d103714@hiroshima-u.ac.jp, hrd@hiroshima-u.ac.jp
Received May 12th, 2013; revised June 15th, 2013; accepted July 12th, 2013
Copyright © 2013 Anh Tu Tran, Koichi Harada. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ABSTRACT
In this paper, we present a novel tracking method aiming at detecting objects and maintaining their label/identification over time. The key factors of this method are the use of depth information and different strategies to track objects under various occlusion scenarios. The foreground objects are detected and refined by background subtraction and shadow cancellation. Occlusion detection is based on the information of foreground blobs in successive frames. The occlusion regions are projected onto the projection plane XZ to analyze the occlusion situation. According to the occlusion analysis results, different corresponding strategies are introduced to track objects under various occlusion scenarios, including tracking occluded objects in a similar depth layer and in different depth layers. The experimental results show that our proposed method can track moving objects under the most typical and challenging occlusion scenarios.
Keywords: Visual Tracking; Multiple Object Tracking; Stereo Tracking; Occlusion Analysis
1. Introduction
Object tracking in video sequences plays an important role in computer vision research and in a wide range of applications, such as video monitoring and surveillance, video conferencing and video summarization. Based on different camera configurations, objects can be tracked by using a single camera or stereo/multiple cameras. Object tracking with a single camera has been studied extensively, and different methods have been developed, such as model-based tracking [1], appearance-based methods [2-4], feature-based tracking [5], and statistical methods [6-8]. Many algorithms obtain good results in some cases, for example when the targets are well separated. However, multiple object tracking is still a challenging task due to the non-rigid motion of deformable objects, persistent occlusion and the dynamic change of object attributes, such as color distribution, shape and visibility. In real scenes, occlusion between objects often occurs; for example, in a typical surveillance scenario a person is partially or fully occluded by other people. Unfortunately, these occlusions lead to tracking failures. Some classical frameworks have been extended to track multiple objects. In the multi-object tracking system of [9], the level set method is used to handle contour splitting and merging. Further methods, i.e. Monte Carlo based probabilistic methods [10], game theory based approaches [11] and appearance model based deterministic methods [12,13], have been presented to solve the mutual occlusion problem. Another attractive research direction is stereo or multiple camera based methods. While object detection and tracking with a single camera are well-explored topics, the use of multi-camera technology for this purpose has attracted much attention recently due to the availability and low price of new hardware. A multi-camera system observes the scene from two or more different views and obtains more comprehensive information than a monocular camera system; in particular, it can take advantage of depth information to improve tracking performance. Some tracking methods focus on the usage of depth information only [14], on the usage of depth information for better foreground segmentation [15], or on the usage of depth information as a feature to be fused in a maximum likelihood model to predict 3D object positions [16].
In this work, we present a novel tracking method aiming at detecting objects and maintaining their label/identification across the video frame sequence. The main points of this method are the use of depth information and different strategies to track objects under various occlusion scenarios. Figure 1 shows the flowchart of our tracking system.
The rest of this paper is organized as follows: Section
2 presents the proposed tracking method. Section 3
shows experimental results; and, finally, Section 4 con-
cludes this paper.
2. Proposed Tracking Method
Our proposed tracking system is shown in Figure 1; it consists of the following main steps.
2.1. Depth Estimation
Depth estimation aims at calculating the structure and
depth of objects in a scene from two views or a set of
multiple views. This topic has attracted extensive attention in the research community. A comprehensive survey and evaluation of dense two-view stereo matching algorithms can be found in [17].
In this work, depth is estimated with the block matching algorithm proposed in [18]. This block matching technique is a one-pass stereo matching algorithm that uses a sliding sum-of-absolute-differences (SAD) window between pixels in the left image and pixels in the right image. An example of a depth image is shown in Figure 2.
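As an illustrative sketch only (not the authors' exact configuration; image file names and parameters are assumptions), the SAD-based block matching described above can be run with OpenCV as follows:

```python
import cv2

# Minimal sketch: block-matching disparity estimation with a sliding SAD window,
# in the spirit of the one-pass algorithm of [18]. Parameter values are assumptions.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # rectified left image
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)   # rectified right image

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)  # 15x15 SAD window
disparity = stereo.compute(left, right)                  # fixed-point disparity map
depth_gray = cv2.normalize(disparity, None, 0, 255,
                           cv2.NORM_MINMAX).astype("uint8")    # gray-level depth image
cv2.imwrite("depth.png", depth_gray)
```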
2.2. Foreground Segmentation and Shadow
Cancellation
Our method performs foreground segmentation to speed up the process of object tracking. There are many foreground segmentation algorithms, for instance the Gaussian mixture model [19,20]. In our method, we use a simple technique based on the absolute difference between the current image and a background image.
Figure 1. The flowchart of the proposed tracking method.
Figure 2. Color image and depth image.
In some cases, fixed cameras observe the scene, so we may have an image of the scene background. However, in most cases this background is not readily available. Moreover, the background scene often evolves over time, for example because the lighting conditions change or because new objects are added to or removed from the background. Therefore, it is necessary to dynamically build the background model by regularly updating it. This can be accomplished by
computing a moving average using the following formula:

$\mu_t = (1 - \alpha)\,\mu_{t-1} + \alpha\, p_t$,   (1)

where $p_t$ is the pixel value at a given time $t$, $\mu_{t-1}$ is the current average value, and $\alpha$ is called the learning rate; it defines the influence of the current value.
In our method, a color background model is first created by computing a moving average for each channel (R, G and B channels of the color image) of each pixel over the incoming frames (around 10 frames). The decision to label a foreground pixel is simply based on comparing the current frame with the background model and then updating the model. Specifically,

$F(p) = \begin{cases} 0, & \text{if } |I_c(p) - I_{bg}(p)| < t_H \\ I_c(p), & \text{otherwise} \end{cases}$   (2)

where $F(p)$ is the value of pixel $p$ in the foreground image, $|I_c(p) - I_{bg}(p)|$ represents the absolute color difference between the color value $I_c(p)$ of pixel $p$ in the current frame and the color value $I_{bg}(p)$ of pixel $p$ in the background frame for the R, G and B channels, and $t_H$ is a threshold that can be set to $0.3 \cdot I_{bg}(p)$ for each color channel. An example of a foreground image is shown in Figure 3.

Figure 3. Foreground segmentation. (a) input image; (b) segmented foreground image.
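The following sketch (an assumption about implementation details, not the authors' code; the learning rate and the background initialization are hypothetical) shows how Equations (1) and (2) can be realized:

```python
import cv2
import numpy as np

# Minimal sketch: running-average background model (Equation (1)) and per-channel
# foreground test (Equation (2)) with the threshold t_H = 0.3 * background value.
ALPHA = 0.05                       # learning rate, hypothetical value

def update_background(background, frame, alpha=ALPHA):
    """mu_t = (1 - alpha) * mu_(t-1) + alpha * p_t, per pixel and channel.
    The background can be initialized, e.g., from the first frame as float32."""
    return (1.0 - alpha) * background + alpha * frame.astype(np.float32)

def foreground_mask(frame, background):
    """A pixel is foreground if any channel differs from the background by more
    than 0.3 times the background value."""
    diff = np.abs(frame.astype(np.float32) - background)
    t_h = 0.3 * background
    mask = (diff > t_h).any(axis=2).astype(np.uint8) * 255
    # clean up small specks and holes before blob extraction
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask
```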
However, the segmented foreground image includes noise caused by shadows, since shadow regions are detected as parts of the moving objects. Shadow detection and removal are therefore used to refine the foreground. To avoid the effects of shadows, the shadow detection described in [21] is employed; for more detail about this method, please refer to [21]. The result of the shadow detection is shown in Figure 4.

Figure 4. Shadow regions detection. (a) segmented foreground image; (b) shadow detection (the blue color pixels are the detected shadow).

2.3. Blobs Extraction and Blobs' Information Store

First, we extract the blobs of objects from the segmented foreground image. Blob extraction is performed on the foreground binary image by connected component labeling using the CvBlobsLib library [22]. The foreground binary image is obtained by a simple threshold operation followed by erosion and dilation of the segmented foreground image. CvBlobsLib provides two basic functionalities: extracting 8-connected components, referred to as blobs, in binary or grayscale images using Chang's contour tracing algorithm [23], and filtering the obtained blobs to keep only the objects in the image that we are interested in. In our method, we remove any blob whose area is smaller than 100 pixels.

For every frame, after extracting the blobs, the information about each blob is stored in a structured record for later processing steps. The blob's information includes the total number of blobs NB, the blob's center coordinates (x, y), the blob area BA, and the average depth value of the blob Z_min. Blobs are also given a temporary identification ID.

We define two kinds of distance: the distance between blob i and blob j in the same frame t, and the distance between blob i of frame t and blob j of frame t-1.

The distance $d_t(i, j)$ between blob $i$ and blob $j$ in the same frame $t$ is computed by:

$d_t(i, j) = \sqrt{(x_i^t - x_j^t)^2 + (y_i^t - y_j^t)^2}$,   (3)

where $(x_i^t, y_i^t)$ and $(x_j^t, y_j^t)$ are the center coordinates of blob $i$ and blob $j$ at frame $t$, respectively.

Similarly, the distance $D_{t,t-1}(i, j)$ between blob $i$ of frame $t$ and blob $j$ of frame $t-1$ is calculated by:

$D_{t,t-1}(i, j) = \sqrt{(x_i^t - x_j^{t-1})^2 + (y_i^t - y_j^{t-1})^2}$,   (4)

where $(x_i^t, y_i^t)$ and $(x_j^{t-1}, y_j^{t-1})$ are the center coordinates of blob $i$ at frame $t$ and blob $j$ at frame $t-1$, correspondingly.
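A minimal sketch of the blob extraction and record keeping (the paper uses the C++ CvBlobsLib library; this Python version with OpenCV's connected components is an assumed equivalent, not the original implementation):

```python
import cv2

# Minimal sketch: extract 8-connected blobs from the binary foreground mask,
# discard blobs smaller than 100 pixels, and store per-blob records
# (temporary ID, center, area BA, average depth).
def extract_blobs(mask, depth_gray, min_area=100):
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
    blobs = []
    for label in range(1, num):                      # label 0 is the background
        area = stats[label, cv2.CC_STAT_AREA]
        if area < min_area:
            continue
        cx, cy = centroids[label]
        avg_depth = float(depth_gray[labels == label].mean())
        blobs.append({"ID": len(blobs),              # temporary identification
                      "center": (cx, cy),
                      "BA": int(area),
                      "Z": avg_depth})
    return blobs
```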
2.4. Occlusion Detection
We detect occlusion in the current frame t according to the blobs' information at frame t and the previous frame t-1. It is based on two clues. The first clue comes from the shortest distance between blobs in the same frame t-1, and the second one is the difference between the number of blobs at frames t and t-1. First, we find the shortest distance $d_{t-1}(i, j)$ between blobs in frame t-1, assuming that it occurs between blob m and blob n, i.e. $d_{t-1}^{\min}(n, m) = \min_{i,j} d_{t-1}(i, j)$.

We define an occlusion flag $f_{\text{occ\_start}}$. This flag gets value t if occlusion is found at frame t, and otherwise it gets value -1. Specifically (see Equation (5)),

$f_{\text{occ\_start}} = \begin{cases} t, & \text{if } d_{t-1}^{\min}(n, m) < d_{\text{threshold}} \text{ and } NB^{t-1} > NB^{t} \\ -1, & \text{otherwise} \end{cases}$   (5)

where $NB^{t-1}$ and $NB^{t}$ are the total numbers of blobs in frame t-1 and frame t, respectively, and $d_{\text{threshold}}$ is the threshold of blob distance within the same frame.

Similarly, we also detect when the occlusion terminates. The end of occlusion is checked based on the shortest distance between blobs in the current frame t and the difference between the number of blobs at the current frame t and the previous frame t-1. We define the end-of-occlusion flag $f_{\text{occ\_end}}$ as follows:

$f_{\text{occ\_end}} = \begin{cases} t, & \text{if } d_{t}^{\min}(n, m) < d_{\text{threshold}} \text{ and } NB^{t} > NB^{t-1} \\ -1, & \text{otherwise} \end{cases}$   (6)
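A minimal sketch of the occlusion start/end test in Equations (5) and (6) (the blob record layout and the threshold value are assumptions): occlusion starts when two blobs that were close in frame t-1 merge (blob count drops), and ends when a blob splits back into nearby blobs (blob count rises).

```python
import math

D_THRESHOLD = 40.0               # blob distance threshold in pixels, hypothetical value

def shortest_pair_distance(blobs):
    best = float("inf")
    for i in range(len(blobs)):
        for j in range(i + 1, len(blobs)):
            (xi, yi), (xj, yj) = blobs[i]["center"], blobs[j]["center"]
            best = min(best, math.hypot(xi - xj, yi - yj))
    return best

def occlusion_flags(t, blobs_prev, blobs_cur):
    # Equation (5): two close blobs at t-1, fewer blobs at t -> occlusion starts
    occ_start = t if (shortest_pair_distance(blobs_prev) < D_THRESHOLD
                      and len(blobs_prev) > len(blobs_cur)) else -1
    # Equation (6): two close blobs at t, more blobs at t -> occlusion ends
    occ_end = t if (shortest_pair_distance(blobs_cur) < D_THRESHOLD
                    and len(blobs_cur) > len(blobs_prev)) else -1
    return occ_start, occ_end
```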
2.5. Object Tracking
According to the result of occlusion detection, the tracked objects can be divided into two types: tracking objects without occlusion and tracking objects under occlusion.
2.5.1. Tracking Objects without Occlusion
The video object correspondence under non-occlusion is obtained through the shortest distance $D_{t,t-1}(i, j)$ between blobs in the previous frame and blobs in the current frame. This distance is calculated by Equation (4). For instance, once a foreground blob $B_m^t$ at frame t finds its corresponding blob $B_n^{t-1}$ in frame t-1, its label or identification $ID(B_m^t)$ is updated correspondingly to the ID of blob $B_n^{t-1}$. Specifically,

$ID(B_m^t) = ID(B_n^{t-1})$ if $D_{t,t-1}(B_m^t, B_n^{t-1}) = \min_{j} D_{t,t-1}(B_m^t, B_j^{t-1}),\ j = 1, 2, \ldots, NB^{t-1}$, and $D_{t,t-1}(B_m^t, B_n^{t-1}) < D_{\text{thres}}$,   (7)

where $B_j^k$ denotes blob $j$ at frame $k$, $NB^{t-1}$ is the number of blobs in frame $t-1$, and $D_{\text{thres}}$ is a distance threshold.
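A minimal sketch of the ID propagation of Equation (7) under non-occlusion (the blob record layout and the threshold value are assumptions):

```python
import math

D_THRES = 60.0                   # hypothetical correspondence threshold in pixels

def propagate_ids(blobs_prev, blobs_cur, next_id):
    # Each current blob inherits the ID of its nearest previous blob, provided the
    # distance is below D_thres; otherwise it is treated as a new object.
    # Note: a full implementation would also enforce one-to-one assignment.
    for blob in blobs_cur:
        bx, by = blob["center"]
        nearest = min(blobs_prev,
                      key=lambda p: math.hypot(bx - p["center"][0], by - p["center"][1]),
                      default=None)
        if nearest is not None and math.hypot(bx - nearest["center"][0],
                                              by - nearest["center"][1]) < D_THRES:
            blob["ID"] = nearest["ID"]                  # keep the existing label
        else:
            blob["ID"], next_id = next_id, next_id + 1  # new object enters the scene
    return blobs_cur, next_id
```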
2.5.2. Tracking Objects under Occlusion
The main idea of our tracking method is that the object label or identification (ID) is maintained constantly during occlusion and even after the objects switch their positions.
When occlusion occurs, we can detect and extract the occlusion region. We can also detect and separately extract a list of objects that were non-occluded in the previous frame but overlay each other in the current frame.
In order to track the objects under occlusion, depth information is used to analyze the occlusion situation. First, the occluded regions are projected onto the ground plane XZ according to their horizontal position and their depth gray level (more detail in the next subsection). Then, according to the XZ plane, the occluded objects can be divided into two types based on their depth ranges: 1) in different depth layers or 2) in the same depth layer. We deal with these situations in the following parts.
1) Project Occlusion Regions onto Ground Plane XZ
Each foreground pixel in the occlusion region has 3D information obtained from the depth map. These pixels are projected onto the ground plane XZ according to their horizontal position and their depth gray level, where X is the width of the depth map and the range of Z is [0, 255]. The projected point located at (x, z) is defined as p(x, z). The value at position (x, z) of the projection plane is the total number of points at position x in the depth map that have the same gray level (depth value z). Figure 5 illustrates the image plane XY and the ground plane XZ.

Figure 5. The image plane XY and ground plane XZ.

In order to remove noisy points, if the value at point p(x, z) is less than a threshold T1, the point p(x, z) is discarded. Then we also apply morphological operations (dilation and erosion) to remove noisy points and connect nearby points. The remaining points in the XZ plane are grouped into blobs based on the connected component analysis technique using the CvBlobsLib library [22]. If a projected blob is smaller than a threshold T2, it is considered noise and is removed. The projected blobs are defined as PB_j, j = 1, 2, ..., m, where m is the total number of projected blobs. Each projected blob PB_j is masked as an object region. Figure 6 shows an example of projected blobs in the XZ plane.

Figure 6. Projected foreground blobs in XZ plane. (a) input image; (b) foreground blob (occlusion region); (c) occlusion region (depth image); (d) projected blobs in XZ plane.
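A minimal sketch of the projection onto the ground plane XZ (the noise threshold T1 value is an assumption):

```python
import numpy as np

# Minimal sketch: for each column x of the occluded foreground region, count how many
# pixels have each depth gray level z in [0, 255]; sparse bins are discarded as noise.
def project_to_xz(depth_gray, occlusion_mask, t1=3):
    height, width = depth_gray.shape
    xz = np.zeros((256, width), dtype=np.int32)      # rows: depth z, columns: x
    ys, xs = np.nonzero(occlusion_mask)
    for x, z in zip(xs, depth_gray[ys, xs]):
        xz[z, x] += 1
    xz[xz < t1] = 0                                   # drop sparse, noisy points (T1)
    return xz
```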
As mentioned before, according to the projected blobs in the XZ plane, the occluded objects can be divided into two types based on their depth ranges: 1) in different depth layers or 2) in one depth layer.
2) Tracking Occluded Objects in Different Depth Layers
Figure 6 shows a case of occluded objects in different depth layers. Once occluded objects lie in different depth layers, they can be segmented in the color image by means of their depth ranges. An example of color image segmentation by means of depth ranges is shown in Figure 7. In our method, object correspondence across different depth layers is based on the Bhattacharyya distance [24] between color histograms. In statistics, the Bhattacharyya distance measures the similarity of two probability distributions; in our case, it represents the similarity between two normalized histograms. The Bhattacharyya distance is calculated by:
$BD(p, q) = 1 - \sum_{b=1}^{N} \sqrt{p_b\, q_b}$,   (8)

where BD denotes the Bhattacharyya distance, p and q are the two normalized color histograms, and N is the number of bins in the histogram.
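A minimal sketch of building the normalized hue histograms and evaluating Equation (8) (the number of bins is an assumption):

```python
import cv2
import numpy as np

N_BINS = 30                                          # hypothetical number of hue bins

def hue_histogram(bgr_image, mask=None, n_bins=N_BINS):
    # Histogram of the hue channel of the HSV color space, normalized to sum to 1.
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], mask, [n_bins], [0, 180])
    return cv2.normalize(hist, None, norm_type=cv2.NORM_L1).flatten()

def bhattacharyya_distance(p, q):
    # Equation (8): BD = 1 - sum_b sqrt(p_b * q_b)
    return 1.0 - np.sum(np.sqrt(p * q))
```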
Let $O^k = \{O_1^k, O_2^k, \ldots, O_m^k\}$ be the occluded objects at frame k, and let $U^n = \{U_1^n, U_2^n, \ldots, U_m^n\}$ denote the existing non-occluded objects in the previous frame n (the frame before the occlusion is found). The color histograms of each occluded object and of each existing non-occluded object are $H_{O_i^k}$ and $H_{U_j^n}$, respectively. In this paper, color histograms are created from the hue component of the HSV color space. Figure 8 shows an example of the color histogram.

For each occluded object $O_i^k$, we calculate the Bhattacharyya distance $BD(O_i^k, U_j^n)$ between the color histogram of this object and the color histogram of every object $U_j^n$ in $U^n$, and then find the shortest distance $BD_{\min}$.
Figure 7. An example of image segmentation by means of depth ranges. (a) input image; (b) foreground blob (occlusion region); (c) occlusion region (depth image); (d) projected blobs in XZ plane (occluded objects in different depth layers); (e) segmented object based on depth layer 1 (red color in d); (f) segmented object based on depth layer 2 (blue color in d).
Figure 8. An example of color histogram. (a) color image; (b) color histogram (hue vs. frequency).
The Bhattacharyya distance $BD(O_i^k, U_j^n)$ is computed according to Equation (8), specifically:

$BD(O_i^k, U_j^n) = 1 - \sum_{b=1}^{N} \sqrt{H_{O_i^k}(b)\, H_{U_j^n}(b)}$,   (9)
The occluded object $O_i^k$ updates its ID according to $U_j^n$ if $BD(O_i^k, U_j^n) = BD_{\min}$. Figure 9 shows the numeric results of calculating the Bhattacharyya distance between two histograms.
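A minimal sketch of this ID update rule, using the bhattacharyya_distance helper sketched above (the object record layout is an assumption):

```python
# Each occluded object takes the ID of the previous-frame object whose hue
# histogram is closest in Bhattacharyya distance (Equation (9)).
def assign_ids_by_histogram(occluded_objects, previous_objects):
    """Both lists hold dicts with 'ID' and 'hist' (normalized hue histogram)."""
    for obj in occluded_objects:
        best = min(previous_objects,
                   key=lambda u: bhattacharyya_distance(obj["hist"], u["hist"]))
        obj["ID"] = best["ID"]
    return occluded_objects
```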
Figure 9. Bhattacharyya distance between two histograms. (a) input frame (having occlusion objects); (b) segmented object 1 in input frame (O1); (c) segmented object 2 in input frame (O2); (d) previous frame (before occlusion happened); (b1) histogram of object O1 in (b) (H_O1); (c1) histogram of object O2 in (c) (H_O2); (e) object 1 before occlusion happened (U1); (e1) histogram of object U1 in (e) (H_U1); (f) object 2 before occlusion happened (U2); (f1) histogram of object U2 in (f) (H_U2). The four pairwise Bhattacharyya distances between {H_O1, H_O2} and {H_U1, H_U2} shown in the figure are 0.3288, 0.4106, 0.4253 and 0.3367.
Figure 10 illustrates an example of tracking occluded objects in different depth layers.
3) Tracking Occluded Objects in One Depth Layer
Figure 11 shows an example of occluded objects in a similar depth layer. When the occluded objects have a similar depth range, or under full occlusion, it is difficult to segment and track multiple objects with the above technique. To deal with this problem, we propose a tracking method based on the camshift (Continuously Adaptive Mean Shift) algorithm [25], described in the following part.

Figure 10. Tracking occluded objects in different depth layers. (a) Input frame (having occlusion objects); (b) Output (tracking occluded objects in different layers).

Figure 11. An example of tracking occluded objects in one depth layer. (a) input frame (having occlusion objects); (b) occlusion region (depth image); (c) projected blobs in XZ plane (occluded objects in one depth layer).
Assume that there are m occluded objects $O_i^k$, $i = 1, 2, \ldots, m$, in the occlusion region $R_{occ}^k$, and that their existing corresponding tracks in the previous frame are $\{U_j^{k-1}\}$. Our algorithm has the following steps (a code sketch of this procedure is given after the list):

1) Pre-compute the color histogram $H_{U_j}$ for every existing track $U_j^{k-1}$, and the color histogram $H_R$ for the occlusion region. Here we calculate a hue histogram from the HSV color space.

2) Based on the average depth values, sort the list of objects $\{U_j^{k-1}\}$ so that the object with the biggest depth value $U_{foremost}^{k-1}$ (i.e., the shortest distance from the camera) goes first.

3) Calculate a back projection of the hue plane of the occlusion region $R_{occ}^k$ using the pre-computed histogram of $U_{foremost}^{k-1}$. Based on the back projection image, find in $R_{occ}^k$ the corresponding object $O_{foremost}^k$ of $U_{foremost}^{k-1}$ using the camshift algorithm. Label it as $O_{foremost}^k$ with the ID of $U_{foremost}^{k-1}$. Figure 12 shows an example of a projection image.

4) Remove $O_{foremost}^k$ from $R_{occ}^k$. Select the next track in the sorted $\{U_j^{k-1}\}$ list and run step 3) to find the next $O_j^k$ in $R_{occ}^k$.

5) Repeat step 4) until every object $O^k$ in $R_{occ}^k$ has found its corresponding track.

Figure 12. An example of projection image. (a) Previous frame (before occlusion happened); (b1) Object 1 before occlusion happened (with bigger depth); (b2) Object 2 before occlusion happened; (c) Histogram of object 1 in (b1); (d) input frame (having occlusion objects); (e) occluded objects in input frame; (f) Projection image of occluded region in input frame using the pre-computed histogram in (c).

Figure 13 demonstrates a result of tracking occluded objects in one depth layer.
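A minimal sketch of steps 1)-5) with OpenCV's back projection and CamShift (the track record fields, search-window bookkeeping and the way found objects are removed are assumptions, not the authors' exact implementation):

```python
import cv2
import numpy as np

TERM = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)

def track_in_occlusion(frame_bgr, occlusion_mask, tracks):
    """tracks: list of dicts with 'ID', 'hist' (hue histogram scaled to [0, 255] for
    back projection), 'avg_depth' (larger gray value = closer) and a previous
    search 'window' (x, y, w, h)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    remaining = occlusion_mask.copy()
    results = []
    # step 2): front-most object (largest depth gray value) first
    for track in sorted(tracks, key=lambda t: t["avg_depth"], reverse=True):
        # step 3): back projection restricted to what is left of the occlusion region
        backproj = cv2.calcBackProject([hsv], [0], track["hist"], [0, 180], 1)
        backproj[remaining == 0] = 0
        box, window = cv2.CamShift(backproj, track["window"], TERM)
        results.append({"ID": track["ID"], "window": window})   # keep the track's ID
        # step 4): remove the found object from the region before the next track
        x, y, w, h = window
        remaining[y:y + h, x:x + w] = 0
    return results
```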
In practice, when projecting partly occluded objects or fully occluded objects with similar depth ranges onto the XZ plane, we obtain only one blob/region in the XZ plane. In the case where the objects in a similar depth range are partly occluded, the above algorithm works well (see Figure 13). However, an object that is fully occluded in the current frame t will reappear as a partially occluded object behind its occluder in a later frame, so it will then be tracked by our method (see the example in Figure 14).
3. Experimental Results
In this section, we show experimental results to evaluate the proposed tracking method. We evaluate the tracking performance by the capability of detecting foreground objects and maintaining a constant ID for them during the occlusion and after the occlusion is over.

The proposed tracking method has been tested on several video sequences. The input of our method is a pair of video sequences, and the output is the left video sequence in which the moving objects are labeled with IDs and bounding boxes of different colors.

Figure 15 shows results of tracking non-occluded objects. In example 15(a), in the first three frames (left to right) each object appears one after another, and in the fourth frame one of the labeled objects has gone out of the scene. These results illustrate the ability of our algorithm to detect objects and to assign and maintain the objects' IDs.

Figure 13. An example of tracking occluded objects in one depth layer. (a) Input frame (having occlusion objects); (b) Output (tracking occluded objects in different layers).

Figure 14. Tracking fully occluded objects. "Room sequence": from left to right, the frame indexes are 67, 72, 73 and 74, respectively.

Figure 15. Result of tracking non-occluded objects. (a) Tracking of non-occluded objects in "Gym sequence" (from left to right, the frame numbers are 34, 42, 60 and 201); (b) Tracking of non-occluded objects in "Room sequence" (from left to right, the frame numbers are 0, 19, 113 and 250, respectively).

We demonstrate the result of tracking occluded objects in different depth layers in Figure 16. Our proposed method can successfully detect the objects under partial occlusion. All of the objects keep a constant label over time, even when they are moving in a variety of poses and positions.

Tracking occluded objects in a similar depth layer is shown in Figure 17. These examples show that the proposed algorithm can detect and track a partially occluded object that is equivalent to at least one half of a human body.

Figure 14 shows the case where an object is fully occluded and in a similar depth layer with its occluder, so our system cannot detect it. However, when this object later reappears as a partially occluded object, the system can detect it and maintain its ID. This example demonstrates the capability of the proposed algorithm in terms of maintaining a constant label for an object over the video sequence.
4. Conclusions
In this paper, we have presented a novel tracking method aiming at detecting objects and maintaining their ID over time. The key factor of this method is the use of depth information to track objects under various occlusion scenarios. Different object tracking strategies are applied according to the occlusion situation, including finding corresponding objects based on the Bhattacharyya distance between two histograms and using a camshift-based algorithm with the help of object depth ordering. The experimental results have confirmed the capability of our proposed object tracking algorithm under the most typical and challenging occlusion scenarios.

Figure 16. Tracking occluded objects in different depth layers. (a) "Gym sequence": from left to right, the frame indexes are 65, 73, 87 and 145, respectively; (b) "Room sequence": from left to right, the frame indexes are 127, 129, 132 and 135, respectively.

Figure 17. Tracking occluded objects in one depth layer. (a) "Gym sequence": from left to right, the frame indexes are 206, 207, 208 and 209, respectively; (b) "Room sequence": from left to right, the frame indexes are 236, 237, 238 and 239, respectively.

However, the proposed algorithm works only in an indoor or medium-sized environment, since the reliability of the depth information diminishes in proportion to the distance from the camera, and only when the moving velocity of the objects is slow. In future work, to construct a robust moving object tracking system in both indoor and outdoor environments, we will study the use of more object features to classify and track the objects.
REFERENCES
[1] D. Koller, K. Danilidis and H.-H. Nagel, “Model-Based
Object Tracking in Monocular Image Sequences of Road
Traffic Scenes,” International Journal of Computer Vi-
sion, Vol. 10, No. 3, 1993, pp. 257-281.
doi:10.1007/BF01539538
[2] T. Hai, H. S. Sawhney and R. Kumar, “Object Tracking
with Bayesian Estimation of Dynamic Layer Representa-
tions,” IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, Vol. 24, No. 1, 2002, pp. 75-89.
doi:10.1109/34.982885
[3] A. D. Jepson, D. J. Fleet and T. F. El-Maraghi, “Robust
Online Appearance Models for Visual Tracking,” IEEE
Transactions on Pattern Analysis and Machine Intelli-
gence, Vol. 25, No. 10, 2003, pp. 1296-1311.
doi:10.1109/TPAMI.2003.1233903
[4] D. Comaniciu, V. Ramesh and P. Meer, “Kernel-Based
Object Tracking,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, Vol. 25, No. 5, 2003, pp. 564-
577. doi:10.1109/TPAMI.2003.1195991
[5] J. Shi and C. Tomasi, “Good Features to Track,” Pro-
ceedings IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition (CVPR), Seattle,
21-23 June 1994, pp. 593-600.
[6] Y. Rui and Y. Chen, “Better Proposal Distributions: Ob-
ject Tracking Using Unscented Particle Filter,” Proceed-
ings of the Computer Vision and Pattern Recognition, Vol.
2, 2001, pp. 786-793.
[7] M. Isard and A. Blake, “CONDENSATION—Condi-
tional Density Propagation for Visual Tracking,” Interna-
tional Journal of Computer Vision, Vol. 29, No. 1, 1998,
pp. 5-28. doi:10.1023/A:1008078328650
[8] D. Beymer, P. McLauchlan, B. Coifman and J. Malik, “A
Real-Time Computer Vision System for Measuring Traf-
fic Parameters,” IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, San Juan,
17-19 June 1997, pp. 495-501.
[9] N. K. Paragios and R. Deriche, “A PDE-Based Level-Set
Approach for Detection and Tracking of Moving Ob-
jects,” The Sixth International Conference on Computer
Vision, Bombay, 4-7 January 1998, pp. 1139-1145.
[10] Z. Tao, R. Nevatia and W. Bo, “Segmentation and Track-
ing of Multiple Humans in Crowded Environments,”
IEEE Transactions on Pattern Analysis and Machine In-
telligence, Vol. 30, No. 7, 2008, pp. 1198-1211.
doi:10.1109/TPAMI.2007.70770
[11] X. Zhou, Y. F. Li and B. He, “Game-Theoretical Occlu-
sion Handling for Multi-Target Visual Tracking,” Pattern
Recognition, Vol. 46, No. 10, 2013, pp. 2670-2684.
doi:10.1016/j.patcog.2013.02.013
[12] V. Papadourakis and A. Argyros, “Multiple Objects
Tracking in the Presence of Long-Term Occlusions,”
Computer Vision and Image Understanding, Vol. 114, No.
7, 2010, pp. 835-846. doi:10.1016/j.cviu.2010.02.003
[13] M. Wu, X. Peng, Q. Zhang and R. Zhao, “Segmenting
and Tracking Multiple Objects under Occlusion Using
Multi-Label Graph Cut,” Computers & Electrical Engi-
neering, Vol. 36, No. 5, 2010, pp. 927-934.
doi:10.1016/j.compeleceng.2009.12.013
[14] E. Parvizi and Q. M. J. Wu, “Multiple Object Tracking
Based on Adaptive Depth Segmentation,” Canadian Con-
ference on Computer and Robot Vision, Windsor, 28-30
May 2008, pp. 273-277.
[15] S. J. Krotosky and M. M. Trivedi, “On Color-, Infrared-,
and Multimodal-Stereo Approaches to Pedestrian Detec-
tion,” IEEE Transactions on Intelligent Transportation
Systems, Vol. 8, No. 4, 2007, pp. 619-629.
doi:10.1109/TITS.2007.908722
[16] R. Okada, Y. Shirai and J. Miura, “Object Tracking Based
on Optical Flow and Depth,” International Conference on
Multisensor Fusion and Integration for Intelligent Sys-
tems (IEEE/SICE/RSJ), Washington DC, 8-11 December
1996, pp. 565-571.
[17] D. Scharstein and R. Szeliski, “A Taxonomy and Evalua-
tion of Dense Two-Frame Stereo Correspondence Algo-
rithms,” International Journal of Computer Vision, Vol.
47, No. 1-3, 2002, pp. 7-42.
doi:10.1023/A:1014573219977
[18] K. Konolige, “Small Vision Systems: Hardware and Im-
plementation,” In: Y. Shirai and S. Hirose, Eds., Robotics
Research, Springer, London, 1998, pp. 203-212.
[19] Z. Zivkovic, “Improved Adaptive Gaussian Mixture
Model for Background Subtraction,” Proceedings of the
17th International Conference on Pattern Recognition,
Vol. 2, 2004, pp. 28-31.
[20] P. KaewTraKulPong and R. Bowden, “An Improved
Adaptive Background Mixture Model for Real-Time
Tracking with Shadow Detection,” In: P. Remagnino, G.
Jones, N. Paragios and C. Regazzoni, Eds., Video-Based
Surveillance Systems, Springer, Berlin, 2002, pp. 135-
144.
[21] A. Prati, I. Mikic, M. M. Trivedi and R. Cucchiara, “De-
tecting Moving Shadows: Algorithms and Evaluation,”
IEEE Transactions on Pattern Analysis and Machine In-
telligence, Vol. 25, No. 7, 2003, pp. 918-923.
doi:10.1109/TPAMI.2003.1206520
[22] “OpenCV Wiki, cvBlobLib,” 2011.
http://opencv.willowgarage.com/wiki/cvBlobsLib
[23] F. Chang, C.-J. Chen and C.-J. Lu, “A Linear-Time Com-
ponent-Labeling Algorithm Using Contour Tracing Tech-
nique,” Computer Vision and Image Understanding, Vol.
93, No. 2, 2004, pp. 206-220.
doi:10.1016/j.cviu.2003.09.002
[24] A. Bhattacharyya, “On a Measure of Divergence between
Two Statistical Populations Defined by Their Probability
Distributions,” Bulletin of the Calcutta Mathematical So-
ciety, Vol. 35, 1943, pp. 99-109.
[25] G. R. Bradski, “Real Time Face and Object Tracking as a
Component of a Perceptual User Interface,” The Fourth
IEEE Workshop on Applications of Computer Vision,
Princeton, 19-21 October 1998, pp. 214-219.