Journal of Software Engineering and Applications, 2013, 6, 37-42
doi:10.4236/jsea.2013.65B008 Published Online May 2013 (http://www.scirp.org/journal/jsea)
Human Body Tracking and Pose Estimation Using
Modified Camshift Algorithm
Seung-Jun Hwang, Jae-Hong Min, In-Gyu Kim, Seung-Jae Park, Gwang-Pyo Ahn,
Joong-Hwan Baek
Department of Information and Telecommunication Engineering, Korea Aerospace University, Go-yang, South Korea.
Email: sj.fogfog@gmail.com
Received 2013
In this paper, we propose a multiple CAMShift algorithm based on a Kalman filter and weighted search windows that extracts skin color areas and tracks several human body parts for a real-time human tracking system. The proposed CAMShift algorithm searches for skin color regions by detecting the skin color area against a background model. The Kalman filter stabilizes the fluctuating search area of the CAMShift algorithm. Occluded areas are avoided by applying weighted windows to the non-search areas and the main search area, and shadows are eliminated using the background model and the intensity of the shadow. The proposed modified CAMShift algorithm can estimate human pose in real time and achieves 96.82% accuracy even in the case of occlusions.
Keywords: Body Tracking; CAMShift; Pose Estimation; Kalman Filter; Weighted Search Windows
1. Introduction
Recently, with the spread and development of 3D displays, 3D (three-dimensional) content has been developing. Controlling 3D content requires a more convenient and intuitive interface, and implementing an interface that matches these devices requires recognition technology for controlling objects in 3D space. 3D gesture recognition hardware such as TOF cameras and Kinect has been developed. However, it has the disadvantage of a high price compared with a webcam [1].
In this paper, we propose an algorithm to track and recognize the body in RGB images. Estimating the posture of the body from the hands and face requires precise tracking. However, the colors of the hands and face are similar to each other. In addition, when the background contains colors similar to skin, tracking errors occur. To solve this problem, we propose a CAMShift algorithm that searches for the skin color region by detecting the skin color area against a background model [2,3]. We use a Kalman filter to stabilize the detection area of CAMShift. In addition, to prevent the loss of the detection area, we add weights for the main detection area and the non-detection area. For example, when the hands and face overlap each other, the proposed algorithm keeps each tracking area out of the other's occluded area.
This paper is organized as follows. Section 2 describes the Gaussian background model used to detect the body and how shadows are removed. Section 3 describes how to recognize different body parts such as the hands, face, elbows, and feet. Section 4 explains how the CAMShift algorithm can be used to avoid occlusion between tracking regions and how the Kalman filter is applied for stabilization. Experimental results are provided in Section 5, and concluding remarks in Section 6.
2. Body Detection
2.1. Gaussian Background Model
In order to keep track of each part of the body, it is necessary to distinguish the body accurately from the background. We use an adaptive Gaussian background model that responds to changes in the background. The background is learned over a period of time using Gaussian probability density functions and weights based on multiple color models. Figure 1 shows an example of extracting the foreground image from the original image by applying the Gaussian background model.
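As a rough illustration of how an adaptive Gaussian background model classifies a pixel, the following Python sketch keeps one running Gaussian per grayscale pixel. It is a simplification and not the model used in this paper, which combines several weighted Gaussians over color. The class name `RunningGaussianPixel`, the learning rate, the initial variance, the variance floor, and the 2.5-sigma threshold are all assumed values chosen for the example.

```python
import math


class RunningGaussianPixel:
    """Adaptive single-Gaussian model for one grayscale pixel (simplified sketch)."""

    def __init__(self, initial_value, initial_variance=225.0,
                 learning_rate=0.05, k_sigma=2.5):
        self.mean = float(initial_value)
        self.variance = float(initial_variance)
        self.learning_rate = learning_rate  # assumed adaptation speed
        self.k_sigma = k_sigma              # assumed decision threshold

    def is_foreground(self, value):
        # A value far from the background mean (in standard deviations) is foreground.
        return abs(value - self.mean) > self.k_sigma * math.sqrt(self.variance)

    def update(self, value):
        """Classify the value, then adapt the model only if it looks like background."""
        foreground = self.is_foreground(value)
        if not foreground:
            diff = value - self.mean
            self.mean += self.learning_rate * diff
            blended = ((1 - self.learning_rate) * self.variance
                       + self.learning_rate * diff * diff)
            self.variance = max(blended, 4.0)  # assumed floor keeps the model from collapsing
        return foreground


def demo():
    pixel = RunningGaussianPixel(initial_value=100)
    background_frames = [100, 102, 99, 101, 103, 100]
    flags = [pixel.update(v) for v in background_frames]
    person_flag = pixel.update(200)  # a much brighter value, e.g. a person entering
    return flags, person_flag
```

In this sketch the small fluctuations around 100 are absorbed into the background, while the jump to 200 is flagged as foreground and does not contaminate the model.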
2.2. Shadow Elimination
The data that has passed through the background separation by the Gaussian background model contains both the moving object and its shadow. The normalized RGB color model makes it possible to compare colors independently of pixel brightness. Moreover, in this model, the color similarity of each pixel in the shadow can be calculated using Equation (1).

If the similarity of a pixel to the background is above the threshold, the pixel is judged to belong to the shadow area and is removed. Figure 2 shows that, through shadow removal, the shadow around the lower body in the foreground image is removed, so that the toes are separated accurately.
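The idea of comparing chromaticity in normalized RGB can be sketched as follows. Equation (1) is not reproduced in this text, so the similarity measure used here (one minus the summed absolute difference of the normalized r and g components), the thresholds, and the additional brightness-ratio check are assumptions made for illustration only.

```python
def normalized_rg(pixel):
    """Return the (r, g) chromaticity of an (R, G, B) pixel, independent of brightness."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (1 / 3, 1 / 3)  # treat black as neutral chromaticity
    return (r / total, g / total)


def is_shadow(foreground_pixel, background_pixel,
              similarity_threshold=0.95, min_brightness_ratio=0.4):
    """Assumed rule: same chromaticity as the background, but darker."""
    fr, fg = normalized_rg(foreground_pixel)
    br, bg = normalized_rg(background_pixel)
    similarity = 1.0 - (abs(fr - br) + abs(fg - bg))

    background_sum = sum(background_pixel)
    if background_sum == 0:
        return False
    brightness_ratio = sum(foreground_pixel) / background_sum

    return (similarity >= similarity_threshold
            and min_brightness_ratio <= brightness_ratio < 1.0)
```

For example, a pixel (60, 50, 40) over a background of (120, 100, 80) has the same chromaticity at half the brightness and is treated as shadow, whereas a skin-like pixel (200, 140, 120) over the same background is not.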
3. Recognition of Body Parts
3.1. Hand and Face Recognition
In this paper, we detect the skin area by extracting the skin color and computing the center of gravity. To detect skin color in the incoming video, we apply fixed rules that measure the numerical distance between skin and non-skin colors. Because these rules define explicit boundary values for the skin region, fast detection is possible. The input video format is set to RGB; if the value of a pixel satisfies Equation (2), the pixel is detected as skin color [4,5].
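A rule of this kind can be written in a few lines. Equation (2) is not reproduced in this text; the sketch below assumes the widely cited explicit RGB rule for uniform daylight illumination associated with Peer et al. [5], which may differ from the exact thresholds used by the authors. The function names are hypothetical.

```python
def is_skin_rgb(r, g, b):
    """Explicit RGB skin rule (assumed form, uniform daylight illumination)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15
            and r > g and r > b)


def skin_mask(image):
    """Map an image given as rows of (R, G, B) tuples to a binary skin mask."""
    return [[1 if is_skin_rgb(*pixel) else 0 for pixel in row] for row in image]
```

Because the rule consists only of comparisons on the three channels, it needs no training or per-frame model, which is consistent with the fast detection described above.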
Figure 1. Foreground extraction using Gaussian background model: (a) background image, (b) input image, (c) foreground image.
Figure 2. Result of shadow elimination.
However, in addition to the skin area to be detected, noise appears in other parts of the image. These noise components are removed by morphological operations. Regions larger than a certain size are set as regions of interest, and the coordinates of each detected area are taken as the center of gravity of its region of interest. Figure 3 shows the extracted skin color region and the center of gravity.
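The selection of regions of interest and their centers of gravity can be sketched as a connected-component pass over a binary skin mask. This is only an illustration: the morphological noise removal is omitted, 4-connectivity is assumed, and `region_centroids` and `min_area` are hypothetical names.

```python
from collections import deque


def region_centroids(mask, min_area):
    """Return (x, y) centers of gravity of 4-connected regions with at least min_area pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    centroids = []

    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(x, y)])
            seen[y][x] = True
            sum_x = sum_y = area = 0
            while queue:
                cx, cy = queue.popleft()
                sum_x += cx
                sum_y += cy
                area += 1
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:  # small regions are treated as noise
                centroids.append((sum_x / area, sum_y / area))
    return centroids
```

With a mask containing a 2x2 block and one isolated pixel, and `min_area=3`, only the center of the block is returned.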
3.2. Elbow Recognition
When the lower arm and the upper arm overlap, the position of the elbow is ambiguous, so this case is calculated separately. A Kalman filter is used to give the elbow position a more stable value. With the arm extended as shown in Figure 4(a), the elbow lies in the normal direction relative to the midpoint between the hand and the shoulder. When the upper arm and the lower arm are folded together as shown in Figure 4(b), the distance between the hand and the shoulder is shorter than in other poses. When this distance is smaller than a certain value, we assume that the arm is folded and take the farthest point as the elbow.
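The two cases can be sketched as a single function. Several details are not specified in the text and are assumed here: the farthest point is measured from the midpoint of the hand-shoulder segment, `arm_points` is a hypothetical list of foreground pixels belonging to the arm, and `normal_offset` is a hypothetical signed distance along the normal for the extended case.

```python
import math


def estimate_elbow(hand, shoulder, arm_points, fold_distance, normal_offset=0.0):
    """Estimate the elbow position from hand and shoulder positions (illustrative sketch)."""
    hx, hy = hand
    sx, sy = shoulder
    mid = ((hx + sx) / 2, (hy + sy) / 2)
    distance = math.hypot(hx - sx, hy - sy)

    # Folded arm: hand and shoulder are close, so pick the arm pixel
    # farthest from the midpoint of the hand-shoulder segment (assumption).
    if distance < fold_distance and arm_points:
        return max(arm_points,
                   key=lambda p: math.hypot(p[0] - mid[0], p[1] - mid[1]))

    if distance == 0:
        return mid

    # Extended arm: move from the midpoint along the unit normal of the segment.
    nx = -(hy - sy) / distance
    ny = (hx - sx) / distance
    return (mid[0] + normal_offset * nx, mid[1] + normal_offset * ny)
```

For a hand at (100, 0) and a shoulder at (0, 0), the extended case returns a point on the normal through (50, 0); for a hand at (10, 0) with `fold_distance=30`, the folded case returns the arm pixel farthest from the midpoint.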
3.3. Toe Tip Recognition
To recognize the toes, we assume that the feet are at the lowest position of the body. As shown in Figure 5, we track the feet in the lower part of the shadow-removed foreground image. For the left foot we scan from left to right, and for the right foot from right to left. When the number of pixels exceeds a threshold value, the center of those pixels is taken as the toe tip.
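The scan described above can be sketched as follows. The text does not state how the pixels are counted, so this example assumes a per-column count inside a band at the bottom of the foreground mask; `band_height` and `threshold` are hypothetical parameters.

```python
def find_toe_tip(mask, band_height, threshold, scan_left_to_right=True):
    """Scan the bottom band of a binary foreground mask column by column.

    Returns (x, mean_y) for the first column whose foreground pixel count
    exceeds the threshold, or None if no column qualifies.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    top = max(0, height - band_height)
    columns = range(width) if scan_left_to_right else range(width - 1, -1, -1)

    for x in columns:
        rows = [y for y in range(top, height) if mask[y][x]]
        if len(rows) > threshold:
            return (x, sum(rows) / len(rows))
    return None
```

Scanning left to right yields the leftmost qualifying column and scanning right to left the rightmost one, following the scan directions given for the two feet.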
Figure 3. Skin color area extraction and center of gravity.
Figure 4. Result of elbow tracking: (a) extended arm, (b) folded arm.
Copyright © 2013 SciRes. JSEA
4. Body Tracking
Figure 6 shows the block diagram of the KWMCAMShift algorithm, which applies weights to avoid occlusion among tracking regions and adopts a Kalman filter for stabilization.
4.1. CAMShift Applying the Weighted Search Windows
The initial regions of the face and hands are specified by the skin color extraction. The color histograms extracted from these initial regions are similar to each other, so the tracking area can spread under occlusion. In addition, the regions are no longer separated, and the center points of the tracking areas appear in almost the same place. Figure 7(b) shows a case where the searching areas overlap and the color distribution of the image area is expanded. The center point of the track in the original searching area drifts into the other searching area. Therefore, each searching area must be prevented from expanding into the others when the searching areas overlap.
In this paper, as in Equation (3), a filter that adds weights was designed for the tracking region of the previous frame. The terms of Equation (3) are the mean motion vector of the object, the next position of the tracked object, the number of pixels n in the tracking area, the color weight w, the profile of the kernel, and the window size. In addition, tracking can be made robust to occlusion by eliminating the other tracking areas that fall within the current tracking area from the color distribution.
Figure 5. Finding toes’ ends.
Figure 6. Block diagram of KWMCAMShift algorithm.
In this case, we apply the filter of Equations (4) and (5) to the overlapped areas to estimate a new mean motion vector.

In these equations, the b parameters take values from 0 to 1, and the a parameters are the weights of each searching area. The parameters with subscript m define the weight-added function of the main searching area, and the remaining a and b parameters define the weight-added function of the non-searching areas, whose number also appears in the equations. We emphasize the main search area by assigning it a high weight and assign a low weight to the non-search areas. This changes the weights of the histogram, and the new center point is estimated by repeated computation.
By adding weights to the non-searching area, such as the dotted area in Figure 8, we prevent the spread of the searching area and of the center point. For example, when the weight of the hand area is reduced during the search of the face area as shown in Figure 8, the corresponding value in Equation (4) is reduced, and so the value of the mean motion vector decreases. By repeating the calculation while adjusting the weights of the histogram, we prevent the spread of the search area and maintain the original search area.
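The effect of down-weighting a non-search area can be illustrated with a single weighted centroid step on a back-projection image. This sketch does not reproduce Equations (3) to (5), which are not available in this text; it assumes a flat kernel and a simple multiplicative weight, and the names `weighted_centroid`, `main_weight`, and `other_weight` are hypothetical.

```python
def inside(window, x, y):
    """Windows are (x0, y0, x1, y1) with exclusive upper bounds."""
    x0, y0, x1, y1 = window
    return x0 <= x < x1 and y0 <= y < y1


def weighted_centroid(prob, window, other_windows, main_weight=1.0, other_weight=0.0):
    """One mean-shift-style centroid step over `window` on a back-projection image.

    Pixels that also fall inside another tracker's window are scaled by
    `other_weight`; all remaining pixels are scaled by `main_weight`.
    """
    x0, y0, x1, y1 = window
    height = len(prob)
    width = len(prob[0]) if height else 0
    total = sum_x = sum_y = 0.0

    for y in range(max(0, y0), min(height, y1)):
        for x in range(max(0, x0), min(width, x1)):
            in_other = any(inside(w, x, y) for w in other_windows)
            weight = prob[y][x] * (other_weight if in_other else main_weight)
            total += weight
            sum_x += weight * x
            sum_y += weight * y

    if total == 0:
        return None
    return (sum_x / total, sum_y / total)
```

On a one-row image with skin probability at x = 1, 2 (face) and x = 4, 5 (hand), an expanded face window covering the whole row has its centroid pulled to x = 3.0 when both areas count equally, but stays at x = 1.5 when the hand window is given zero weight. Repeating this step with the window re-centered on the result corresponds to the repeated computation described above.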
Figure 7. Non-occlusion and occlusion cases: (a) non-occlusion case, (b) occlusion case.
Figure 9(a) shows the video processed with only the CAMShift algorithm, which recalculates the area within a range of twice the current searching region. It can be seen that the area is extended when a similar color histogram exists in the current searching area. With the histogram weights adjusted so that the search area of each part is maintained, as in Figure 10, the extension does not occur, as shown in Figure 9(b).
4.2. Stabilization Algorithm by Using Kalman Filter
The searching area obtained from the mean motion does not hold a stable value, because the shape and the intensity of the hand region within the searching area are not constant from frame to frame.
Figure 8. Mass center when occlusion.
Figure 9. Result of weighted window CAMShift: (a) only CAMShift, (b) modified CAMShift.
Figure 10. Hue distribution images for search areas: (a) face tracking color distribution image, (b) left and right tracking color distribution image.
In Figure 11, the color values change due to irregular illumination even though the shape of the skin area is similar. Since the CAMShift algorithm is applied in each frame, each searching area changes with the color distribution images obtained by color histogram back-projection. To stabilize this change, we track the center point and the size of the searching area with a Kalman filter. Figure 12 is a block diagram of CAMShift using a Kalman filter.
In this paper, the state equation and the measurement equation are defined in Equations (6) and (7). In these equations, W_k and V_k are Gaussian noise, and the remaining terms are the center point of the searching area, the center point of the current measurement, and the velocity v of the object at step k. Furthermore, we define state and measurement equations (8) and (9) for w_k and h_k, the width and height of the searching area.
Figure 11. Histogram back-projection image in various
Figure 12. Block Diagram of CAMShift using Kalman filter.
In Equations (8) and (9), the measurement terms are the center points of the current measurement, the noise terms are Gaussian noise, and the remaining terms are the ratios of the size of the searching area.
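The stabilizing effect of a Kalman filter on one tracked quantity can be sketched with a standard constant-velocity filter in one dimension. Equations (6) to (9) are not reproduced in this text, so the state layout (position and velocity), the noise values, and the names `ConstantVelocityKalman1D` and `smooth_track` are assumptions; in a tracker, one such filter could be run for each of the center coordinates, the width, and the height.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and a position measurement."""

    def __init__(self, position, process_noise=1e-2, measurement_noise=4.0):
        self.x = float(position)  # position estimate
        self.v = 0.0              # velocity estimate
        # Covariance P = [[pxx, pxv], [pxv, pvv]]
        self.pxx, self.pxv, self.pvv = 1.0, 0.0, 1.0
        self.q = process_noise      # assumed process noise (added to the diagonal)
        self.r = measurement_noise  # assumed measurement noise variance

    def predict(self, dt=1.0):
        # x = F x with F = [[1, dt], [0, 1]];  P = F P F^T + Q
        self.x += self.v * dt
        pxx = self.pxx + 2 * dt * self.pxv + dt * dt * self.pvv + self.q
        pxv = self.pxv + dt * self.pvv
        pvv = self.pvv + self.q
        self.pxx, self.pxv, self.pvv = pxx, pxv, pvv
        return self.x

    def update(self, measured):
        # Measurement matrix H = [1, 0]
        innovation = measured - self.x
        s = self.pxx + self.r
        kx, kv = self.pxx / s, self.pxv / s  # Kalman gain
        self.x += kx * innovation
        self.v += kv * innovation
        # P = (I - K H) P
        pxx = (1 - kx) * self.pxx
        pxv = (1 - kx) * self.pxv
        pvv = self.pvv - kv * self.pxv
        self.pxx, self.pxv, self.pvv = pxx, pxv, pvv
        return self.x


def smooth_track(measurements):
    """Filter a sequence of noisy measurements of one coordinate."""
    kalman = ConstantVelocityKalman1D(measurements[0])
    smoothed = []
    for value in measurements:
        kalman.predict()
        smoothed.append(kalman.update(value))
    return smoothed
```

For a stationary center measured as 100, 104, 96, 103, 97, 101, the filtered sequence stays closer to 100 than the raw measurements do, which is the kind of stabilization described in this section.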
5. Applying the CAMShift Algorithm using
the Kalman Filter and Weighted Search
Windows and the Result
5.1. Experimental Comparison of the Proposed
Algorithm with the Obstruction
Figure 13(a) shows images obtained by applying the multiple CAMShift algorithm without the Kalman filter and the weighted search windows to the skin color areas. The regions of the two hands, which have similar color values, came to overlap in each mean shift computation, as did their center points. Figures 13(b) and (c) show that tracking fails near frame 205 because the areas of the two hands expand and overlap.

Figure 13. Result of CAMShift without weighted search window: (a) result image, (b) distance between the center points of the hands, (c) size and distance of the hands.
Figure 14(a) shows images obtained by applying the multiple CAMShift algorithm with the Kalman filter and the weighted search windows to the skin color areas. Since each tracker avoids the other's area when the hands occlude each other, the original search areas are maintained.

Figures 14(b) and (c) show that the searching areas of the two hands avoid each other when the hands overlap. While the hands overlap, the size of each searching area expands and then returns to its original size.
5.2. Experiments on the Change of Weights under Occlusion with Overlapped Hands
In order to investigate the recognition rate corresponding to the weights under occlusion, Figures 15 and 16 show the results of changing the values of the weight variables in Equation (5). As a result, we obtained a recognition rate of 96.82% with the weights (1, 0, 20, 1).

Figure 14. Result of proposed KWMCAMShift: (a) result image, (b) distance between the center points of the hands, (c) size and distance of the hands.
Figure 15. Result of various non-search window weights: (a) first weight varied in (·, 0, 20, 1), (b) second weight varied in (3, ·, 20, 1).
Figure 16. Result of various main-search window weights: (a) third weight varied in (3, 0, ·, 1), (b) fourth weight varied in (3, 0, 20, ·).

5.3. Result of Pose Estimation
Figure 17 shows video frames in which the posture of the body is estimated by the proposed algorithm.

Figure 17. Result of pose estimation using proposed algorithm.

6. Conclusions
In this paper, we proposed a multiple CAMShift algorithm based on a Kalman filter and weighted search windows that extracts skin color areas and tracks several human body parts for a real-time human tracking system. We estimated the width, height, and position of the CAMShift searching area with the Kalman filter, and we achieved accurate searching under occlusion by adding weights for the main searching area and the non-searching areas to the mean motion vector. The modified CAMShift algorithm proposed in this paper achieved a recognition rate of 96.82% even under occlusion.

7. Acknowledgements
This study was conducted with the assistance of the Korea Aerospace University Technical Research Center for next-generation broadcast media, through the GRRC (Gyeonggi-do Regional Research Center) program.

References
[1] J. Shotton, et al., “Real-Time Human Pose Recognition in Parts from Single Depth Images,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 20-25 June 2011, pp. 1297-1304.
[2] G. R. Bradski, “Computer Vision Face Tracking for Use in a Perceptual User Interface,” Intel Technology Journal, 2nd Quarter, 1998.
[3] X. Cai, L. Jiang, et al., “A New Region Gaussian Background Model for Video Surveillance,” Natural Computation, 2008, Vol. 6, pp. 123-127.
[4] V. Vezhnevets, V. Sazonov and A. Andreeva, “A Survey on Pixel-Based Skin Color Detection Techniques,” Graphicon03, 2003, pp. 85-92.
[5] P. Peer, J. Kovac and F. Solina, “Human Skin Colour Clustering for Face Detection,” Eurocon 2003.