Journal of Software Engineering and Applications, 2013, 6, 37-42
doi:10.4236/jsea.2013.65B008 Published Online May 2013 (http://www.scirp.org/journal/jsea)

Human Body Tracking and Pose Estimation Using Modified Camshift Algorithm

Seung-Jun Hwang, Jae-Hong Min, In-Gyu Kim, Seung-Jae Park, Gwang-Pyo Ahn, Joong-Hwan Baek

Department of Information and Telecommunication Engineering, Korea Aerospace University, Go-yang, South Korea.
Email: sj.fogfog@gmail.com

Received 2013

ABSTRACT

In this paper, we propose a multiple CAMShift algorithm based on a Kalman filter and weighted search windows that extracts skin-color areas and tracks several human body parts for a real-time human tracking system. The proposed CAMShift algorithm searches for skin-color regions by detecting skin-color areas against a background model. The Kalman filter stabilizes the fluctuating search area of the CAMShift algorithm. Occlusion between tracked areas is avoided by applying weighted windows to the non-search areas and the main search area. Shadows are eliminated using the background model and the intensity of the shadow. The proposed modified CAMShift algorithm estimates human pose in real time and achieves 96.82% accuracy even in the case of occlusions.

Keywords: Body Tracking; CAMShift; Pose Estimation; Kalman Filter; Weighted Search Windows

1. Introduction

Recently, with the spread and development of 3D displays, 3D (three-dimensional) content has been developing. To control 3D content, a more convenient and intuitive interface is needed. Implementing an interface that matches these devices requires recognition technology for controlling objects in 3D space. 3D gesture recognition hardware such as TOF cameras and Kinect has been developed, but it is expensive compared with a webcam [1]. In this paper, we propose an algorithm that tracks and recognizes the body in RGB images.
Estimating body posture from the hands and face requires precise tracking. However, the colors of the hands and face are similar to each other. In addition, tracking errors occur when the background color is close to the skin color. To solve this problem, we propose a CAMShift algorithm that searches for skin-color regions by detecting skin-color areas against a background model [2,3]. We use a Kalman filter to stabilize the detection area of CAMShift. In addition, to prevent loss of the detection area, we add weights to the main detection area and the non-detection areas. For example, when the hands and face overlap, the proposed algorithm keeps each tracker out of the areas occluded by the others.

This paper is organized as follows. Chapter 2 describes the Gaussian background model used to detect the body and how shadows are removed. Chapter 3 describes how to recognize body parts such as hands, face, elbows, and feet. Chapter 4 explains how the CAMShift algorithm is modified to avoid occlusion between tracking regions and how a Kalman filter is applied for stabilization. Experimental results are provided in Chapter 5, and concluding remarks in Chapter 6.

2. Body Detection

2.1. Gaussian Background Model

To track each part of the body, the body must be accurately distinguished from the background. We use an adaptive Gaussian background model that responds to changes in the background. The background is learned over a period of time using a Gaussian probability density function and weights based on multiple color models. Figure 1 shows an example of a foreground image extracted from the original image by applying the Gaussian background model.

2.2. Shadow Elimination

The data that passes through background separation by the Gaussian background model contains both the moving object and its shadow.
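The background separation step referred to above can be sketched as follows. This is a minimal illustration, assuming a single running Gaussian per pixel on grayscale values rather than the weighted multi-color model of Section 2.1. The class name `GaussianBackground` and the parameters `alpha`, `k`, `init_var`, and `min_var` are hypothetical and are not taken from the paper.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel values


class GaussianBackground:
    """Per-pixel running Gaussian background model (illustrative only)."""

    def __init__(self, first_frame: Frame, alpha: float = 0.05,
                 k: float = 2.5, init_var: float = 100.0,
                 min_var: float = 4.0) -> None:
        self.alpha = alpha      # learning rate (assumed value)
        self.k = k              # foreground threshold in standard deviations
        self.min_var = min_var  # floor that keeps the variance from collapsing
        self.mean = [[float(v) for v in row] for row in first_frame]
        self.var = [[init_var for _ in row] for row in first_frame]

    def apply(self, frame: Frame) -> List[List[bool]]:
        """Return a foreground mask and adapt the model on background pixels."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                diff = value - self.mean[y][x]
                # Foreground if the pixel is more than k sigma from the mean.
                is_fg = diff * diff > (self.k ** 2) * self.var[y][x]
                if not is_fg:
                    # Only background pixels update the Gaussian parameters.
                    self.mean[y][x] += self.alpha * diff
                    new_var = self.var[y][x] + self.alpha * (diff * diff - self.var[y][x])
                    self.var[y][x] = max(self.min_var, new_var)
                mask_row.append(is_fg)
            mask.append(mask_row)
        return mask
```

Updating only the pixels classified as background is one common design choice; it keeps a stationary person from being absorbed into the background too quickly, at the cost of slower recovery from permanent scene changes.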
The normalized RGB color model allows colors to be compared independently of pixel brightness. In this model, the color similarity of each pixel in the shadow can be calculated using Equation (1).

(1)

If the value computed for a pixel against the background is above the threshold, the pixel is judged to be part of the shadow area and is removed. Figure 2 shows that shadow removal eliminates the shadow around the lower body in the foreground image, so that the toes are accurately separated.

Copyright © 2013 SciRes. JSEA

3. Recognition of Body Parts

3.1. Hand and Face Recognition

To detect skin areas, we extract the skin color and the center of gravity. Skin color is detected in the incoming video by applying fixed rules that compute a numerical distance between skin and non-skin colors. Because the rules define explicit boundary values for the skin region, detection is fast. The input video format is set to RGB; if the value of a pixel satisfies Equation (2), the pixel is detected as skin color [4,5].

(2)

Figure 1. Foreground extraction using Gaussian background model. (a) Background image; (b) Input image; (c) Foreground image.

Figure 2. Result of shadow elimination.

However, noise appears in parts of the image other than the skin areas to be detected. These noise components are removed through morphological operations. Regions larger than a certain size are set as regions of interest. The coordinates of each detected area are taken to be the center of gravity of the region of interest. Figure 3 shows the extracted skin-color region and its center of gravity.

3.2. Elbow Recognition

When the lower arm and upper arm overlap, the position of the elbow is ambiguous, so the extended and folded cases are calculated separately.
A Kalman filter is used to give the elbow position a more stable value. With the arm extended, as shown in Figure 4(a), the elbow lies in the normal direction relative to the midpoint of the line between the hand and the shoulder. When the upper arm and lower arm are folded together, as shown in Figure 4(b), the distance between the hand and the shoulder is shorter than in other poses. When this distance is smaller than a certain value, we assume that the arm is folded and take the farthest point as the elbow.

3.3. Toe Tip Recognition

To recognize the toes, we assume that the feet are at the lowest position of the body. We track each foot from the lower part of the shadow-removed foreground image. As shown in Figure 5, the image is scanned from left to right for the left foot and from right to left for the right foot. When the number of pixels exceeds a threshold value, the center of those pixels is taken as the toe tip.

Figure 3. Skin color area extraction and center of gravity.

Figure 4. Result of elbow tracking. (a) Extended arm; (b) Folded arm.

4. Body Tracking

Figure 6 shows the block diagram of the KWMCAMShift algorithm, which applies weights to avoid occlusion among tracking regions and uses a Kalman filter for stabilization.

4.1. CAMShift Applying the Weighted Search Window

The initial regions of the face and hands are specified by skin-color extraction. The color histograms extracted from the initial regions are similar to each other, so the tracking area can spread when occlusion occurs. In addition, the regions are then no longer separated, and the center points of the tracking areas appear in almost the same place. Figure 7(b) shows a case where the search areas overlap and the color distribution of the image area expands.
We can see that the center point of the track in the original search area spreads into the other search area. Therefore, when search areas overlap, each must be prevented from expanding into the other.

In this paper, as in Equation (3), a filter that adds weights was designed for the tracking region of the previous frame.

(3)

Here $M_g^a$ represents the mean motion vector of object $a$, $y^a$ is the next position of the tracked object, $n$ is the number of pixels in the tracking area, $w$ is the color weight, $g$ is the profile of the kernel, and $h$ indicates the window size. In addition, tracking is made robust to occlusion by eliminating the other tracking areas that fall within the current tracking area from the distribution function.

Figure 5. Finding toes' ends.

Figure 6. Block diagram of KWMCAMShift algorithm.

For the overlapped areas, we apply the filters of Equations (4) and (5) to estimate a new mean motion vector.

(4)

(5)

Here the values of $m_b$ and $n_b^j$ range from 0 to 1, and $m_a$ and $n_a^j$ are the weights of each search area. $m_a$ and $m_b$ are the weighting terms of the main search area, $n_a^j$ and $n_b^j$ are the weighting terms of the non-search areas, and $j$ is the number of non-search areas. We emphasize the main search area by assigning it a high weight and assign a low weight to the non-search areas. This changes the weights of the histogram, and the new center point is estimated by iterating the computation.

By adding weights to the non-search area, such as the dotted area in Figure 8, we prevent the spread of the search area and the center point. For example, when the weight of the hand area is reduced during the search of the face area, as shown in Figure 8, the value in Equation (4) is reduced, so the mean motion vector $M_g^a(y_i^a)$ decreases.
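The effect of down-weighting another tracker's area can be illustrated with a single weighted centroid step over a back-projection image. The exact form of Equations (3) to (5) is not reproduced here, so the sketch below assumes a simpler scheme: each pixel's probability is multiplied by `main_weight`, or by `other_weight` if the pixel lies inside another tracker's window. The function names, the `(x, y, width, height)` window layout, and both weight parameters are hypothetical simplifications, not the paper's filter.

```python
from typing import List, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height), assumed layout


def inside(win: Window, x: int, y: int) -> bool:
    """True if pixel (x, y) lies inside the window."""
    wx, wy, ww, wh = win
    return wx <= x < wx + ww and wy <= y < wy + wh


def weighted_centroid(prob: List[List[float]], window: Window,
                      other_windows: Sequence[Window],
                      main_weight: float = 1.0,
                      other_weight: float = 0.0) -> Tuple[float, float]:
    """One mean-shift style step: weighted center of mass inside `window`.

    Pixels that fall inside any window in `other_windows` are scaled by
    `other_weight`; all remaining pixels are scaled by `main_weight`.
    """
    wx, wy, ww, wh = window
    m00 = m10 = m01 = 0.0
    for y in range(max(wy, 0), min(wy + wh, len(prob))):
        row = prob[y]
        for x in range(max(wx, 0), min(wx + ww, len(row))):
            in_other = any(inside(o, x, y) for o in other_windows)
            mass = (other_weight if in_other else main_weight) * row[x]
            m00 += mass
            m10 += mass * x
            m01 += mass * y
    if m00 == 0.0:
        # No usable mass: keep the current window center.
        return (wx + ww / 2.0, wy + wh / 2.0)
    return (m10 / m00, m01 / m00)
```

With `other_weight` set to 0, pixels owned by another tracker contribute nothing, so the center stays on the tracker's own blob even when its window has grown to cover both; raising `other_weight` toward `main_weight` lets the center drift toward the overlapping region, which is the spreading behavior described above.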
By repeatedly adjusting the histogram weights in this way, we prevent the spread of the search area and maintain the original search area.

Figure 7. Non-occlusion and occlusion cases. (a) Non-occlusion case; (b) Occlusion case.

Figure 9(a) shows video processed with only the CAMShift algorithm, which recalculates the area within a range of twice the current search region. The area is therefore extended whenever a similar color histogram exists near the current search area. Figure 9(b) shows that, when the search area of each part in Figure 10 is maintained by adjusting the histogram weights, no such extension occurs.

4.2. Stabilization Algorithm by Using Kalman Filter

The search area obtained from the mean motion does not hold a stable value, because the shape and intensity of the hand region within the search area are not constant from frame to frame.

Figure 8. Mass center when occlusion.

Figure 9. Result of weighted window CAMShift. (a) Only CAMShift; (b) Modified CAMShift.

Figure 10. Hue distribution images for search areas. (a) Face Tracking Color Distribution Image; (b) Left and Right Tracking Color Distribution Image.

In Figure 11, the color values change due to irregular illumination even though the shape of the skin area is similar. Because the CAMShift algorithm is applied in each frame, each search area changes in the color distribution images obtained by color histogram back-projection. To stabilize this change, we track the center point and the size of the search area with a Kalman filter. Figure 12 is a block diagram of CAMShift using a Kalman filter. In this paper, the state equation and the measurement equation are defined in Equations (6) and (7).
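As a rough illustration of this stabilization, the sketch below filters one coordinate of the search window with a constant-velocity Kalman filter. It assumes a state of position and speed with a position-only measurement, which matches the variables named in this section but not necessarily the paper's exact matrices. The class name `AxisKalman` and the noise parameters `q` and `r` are hypothetical.

```python
from typing import List


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (illustrative)."""

    def __init__(self, pos: float, q: float = 1e-2, r: float = 4.0) -> None:
        self.x: List[float] = [pos, 0.0]                   # state: [position, speed]
        self.P: List[List[float]] = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q = q  # process noise variance (assumed value)
        self.r = r  # measurement noise variance (assumed value)

    def step(self, z: float, dt: float = 1.0) -> float:
        """Predict one frame ahead, then correct with measurement z."""
        pos, vel = self.x
        P = self.P
        # Predict: x' = F x, P' = F P F^T + Q with F = [[1, dt], [0, 1]].
        pos_pred = pos + dt * vel
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        # Update with H = [1, 0]: only the position is measured.
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        resid = z - pos_pred
        self.x = [pos_pred + k0 * resid, vel + k1 * resid]
        self.P = [[(1.0 - k0) * p00, (1.0 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]
```

One such filter could be run for each of the center coordinates and for the width and height of the search window, feeding it the raw CAMShift output of each frame and using the returned value as the stabilized window parameter.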
(6)

(7)

$W_k$ and $V_k$ are Gaussian noise, $x_k$ and $y_k$ are the center points of the search area, and $xc_k$ and $yc_k$ are the center points of the current measurement. $v_{x_k}$ and $v_{y_k}$ are the speeds of the object. Furthermore, we defined the state and measurement equations (8) and (9) for $w_k$ and $h_k$, which are the width and height of the search area.

Figure 11. Histogram back-projection image in various poses.

Figure 12. Block Diagram of CAMShift using Kalman filter.

(8)

(9)

$wc_k$ and $hc_k$ are the width and height of the current measurement, $U_k$ and $Z_k$ are Gaussian noise, and $r_{w_k}$ and $v_{h_k}$ are the ratios of the size of the search area.

5. Applying the CAMShift Algorithm using the Kalman Filter and Weighted Search Windows and the Result

5.1. Experimental Comparison of the Proposed Algorithm under Occlusion

Figure 13(a) shows images from the multiple CAMShift algorithm without the Kalman filter and the weighted search windows for skin-color areas. The regions of the two hands, which have similar color values, overlap in each mean shift computation and around the center value. Figures 13(b) and (c) show that tracking fails near frame 205 because the two hand areas expand and overlap.

Figure 13. Result of CAMShift without weighted search window. (a) Result image; (b) The distance between the center points of hands; (c) Size and distance of the hands.

Figure 14(a) shows images from the multiple CAMShift algorithm with the Kalman filter and the weighted search windows for skin-color areas. Because each tracker avoids the other's area when the hands are occluded, the original search areas are maintained. Figures 14(b) and (c) confirm that the search areas of the two hands avoid each other when the hands overlap.
When the hands overlap, the size of the search area expands and then returns to its original size.

Figure 14. Result of proposed KWMCAMShift. (a) Result image; (b) The distance between the center points of hands; (c) Size and distance of the hands.

5.2. Experiments on the Change of Weights under Occlusion with Overlapping Hands

To investigate the recognition rate as a function of the weights under occlusion, Figures 15 and 16 vary the values of the weight variables ($n_a^j$, $n_b^j$, $m_a$, $m_b$) in Equation (5). As a result, we obtained a recognition rate of 96.82% with the weights (1, 0, 20, 1).

Figure 15. Result of various non-search window weights. (a) ($n_a^j$, 0, 20, 1); (b) (3, $n_b^j$, 20, 1).

Figure 16. Result of various main-search window weights. (a) (3, 0, $m_a$, 1); (b) (3, 0, 20, $m_b$).

5.3. Result of Pose Estimation

Figure 17 shows video in which the posture of the body is estimated by the proposed algorithm.

Figure 17. Result of pose estimation using proposed algorithm.

6. Conclusions

In this paper, we proposed a multiple CAMShift algorithm based on a Kalman filter and weighted search windows that extracts skin-color areas and tracks several human body parts for a real-time human tracking system. We estimated the width, height, and position of the CAMShift search area with the Kalman filter, and achieved accurate search under occlusion by adding weights for the main search area and the non-search areas to the mean motion vector. The modified CAMShift algorithm proposed in this paper achieved a recognition rate of 96.82% even under occlusion.

7. Acknowledgements

This study was conducted with the assistance of the Korea Aerospace University Technical Research Center of the next generation broadcast media by the GRRC (Gyeonggi-do Regional Research Center) program.

REFERENCES

[1] J. Shotton, et al., "Real-time Human Pose Recognition in Parts from Single Depth Images," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 20-25 June, 2011, pp. 1297-1304.

[2] G. R. Bradski, "Computer Vision Face Tracking for Use in a Perceptual User Interface," Intel Technology Journal, 2nd Quarter, 1998.

[3] Xun Cai, Long Jiang, et al., "A New Region Gaussian Background Model for Video Surveillance," Natural Computation, 2008, Vol. 6, pp. 123-127.

[4] V. Vezhnevets, V. Sazonov and A. Andreeva, "A Survey on Pixel-based Skin Color Detection Techniques," Graphicon03, 2003, pp. 85-92.

[5] P. Peer, J. Kovac and F. Solina, "Human Skin Colour Clustering for Face Detection," Eurocon 2003.