Multi-Scale Human Pose Tracking in 2D Monocular Images
OPEN ACCESS JCC
Table 4. Comparison of tracking results on sequences with
scale variations in percentage.
Approach Torso
Head Upper leg Lower leg
Upper arm
Fore arm
Total
[5] 52.5 36.3 53.8 65.0 68.8 63.4 24.3 26.7 33.8
42.5
46.7
[3] 91.1 80.3 62.7 65.4 69.8 73.9 60.4 62.6 54.5
52.1
67.3
Proposed 98.4 94.5 84.7 83.2 81.8 78.0 81.0 82.5 73.1
72.8
83.0
tions. The other two approaches fail for frames in which
the assumed fixed scale is not suitable. The quantitative
comparison is given in Table 4, where the accuracy is
evaluated based on the tracking results for all frames in
the two sequences: HE_Jogging and HE_Gestures.
Clearly, the tracking performance of our approach sur-
passes [5] and [3], although they perform well for se-
quences with no scale variation. The tracking for all body
parts are remarkably improved. It appears that the me-
thod proposed by [5] performs quite poorly when there
are significant scale variations in the image sequence.
This clearly demonstrates the importance of including
scale adjustment in the tracking process, since the overall
performance has been greatly improved.
4. Conclusion
In this paper we propose a human motion tracking
framework for 2D monocular images, specially address
the scale variation problem. An automatic scale evaluat-
ing and adjusting algorithm is proposed to adaptively
change the scale values during the tracking process. Two
metrics for this algorithm are proposed. One is Height_
Metric, which is a simple and straightforward metric
suitable for motions where the tracked target remains
upright. The other is PixelCount_Metric , which is im-
plemented by computing the ratio between pixel counts
of the foreground blobs and the detected body part
bounding boxes. This metric is more complicated yet
more generic and invariant to motion types. The effica-
cious of the proposed algorithm is demonstrated through
experiments on the publicly available Human Eva data-
sets, where the proposed algorithm can produce highly
satisfactory tracking results.
REFERENCES
[1] R. Poppe, “Vision-Based Human Motion Analysis: An
Overview,” Computer Vision and Image Understanding,
Vol. 108, No. 1C2, 2007, pp. 4-18.
[2] H. Zhou and H. Hu, “Human Motion Tracking for Reha-
bilitations—A Survey,” Biomedical Signal Processing
and Control, Vol. 3, No. 1, 2008, pp. 1-18.
Uhttp://dx.doi.org/10.1016/j.bspc.2007.09.001U
[3] Y. Lu, L. Li and P. Peursum, “Human Pose Tracking
Based on Both Generic and Specific Appearance Models,”
Control Automation Robotics & Vision, 2012, pp. 1071-
1076.
[4] J. M. del Rincon, D. Makris, C. O. Urunuela and J.-C.
Nebel, “Tracking Human Position and Lower Body Parts
Using Kalman and Particle Filters Constrained by Human
Biomechanics,” IEEE Transactions on Systems, Man, and
Cybernetics Part B: Cybernetics, Vol. 41, No. 1, 2011, pp.
26-37. Uhttp://dx.doi.org/10.1109/TSMCB.2010.2044041U
[5] D. Ramanan, D. A. Forsyth and A. Zisserman, “Tracking
People by Learning Their Appearance,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, Vol.
29, No. 1, 2007, pp. 65-81.
Uhttp://dx.doi.org/10.1109/TPAMI.2007.250600U
[6] X. Lan and D. P. Huttenlocher, “Beyond Trees: Com-
mon-Factor Models for 2d Human Pose Recovery,” IEEE
ICCV, Vol. 1, 2005, pp. 470-477.
[7] C. Chang, R. Ansari and A. Khokhar, “Cyclic Articulated
Human Motion Tracking by Sequential Ancestral Simula-
tion,” IEEE CVPR, Vol. 2, 2004, pp. II-45.
[8] D. Ramanan and D. A. Forsyth, “Finding and Tracking
People from the Bottom Up,” IEEE CVPR, Vol. 2, 2003,
pp. II-467.
[9] R. Fablet and M. J. Black, “Automatic Detection and
Tracking of Human Motion with a View-Based Repre-
sentation,” ECCV, Springer, 2002, pp. 476-491.
[10] M. A. Fischler and R. A. Elschlager, “The Representation
and Matching of Pictorial Structures,” IEEE Transactions
on Computers, Vol. 100, No. 1, 1973, pp. 67-92.
Uhttp://dx.doi.org/10.1109/T-C.1973.223602U
[11] P. F. Felzenszwalb and D. P. Huttenlocher, “Pictorial
Structures for Object Recognition,” IJCV, Vol. 61, No. 1,
2005, pp. 55-79.
Uhttp://dx.doi.org/10.1023/B:VISI.0000042934.15159.49U
[12] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient
Matching of Pictorial Structures,” IEEE CV PR, Vol. 2,
2000, pp. 66-73.
[13] D. Ramanan, “Learning to Parse Images of Articulated
Bodies,” NIPS, Vol. 19, 2007, p. 1129.
[14] M. Eichner, M. Marin-Jimenez, A. Zisserman and V.
Ferrari, “2d Articulated Human Pose Estimation and Re -
trieval in (Almost) Unconstrained Still Images,” IJCV,
Vol. 99, No. 2, 2012, pp. 190-214.
Uhttp://dx.doi.org/10.1007/s11263-012-0524-9U
[15] M. Andriluka, S. Roth and B. Schiele, “Discriminative
Appearance Models for Pictorial Structures,” IJCV , 2012,
pp. 1-22.
[16] C. Stauffer and W. E. L. Grimson, “Adaptive Background
Mixture Models for Real-Time Tracking,” IEEE CVPR,
Vol. 2, 1999.
[17] P. KaewTraKulPong and R. Bowden, “An Improved
Adaptive Background Mixture Model for Real-Time
Tracking with Shadow Detection,” Video-Based Surveil-
lance Systems, Springer, 2002, pp. 135-144.
[18] M. Andriluka, S. Roth and B. Schiele, “Pictorial Struc-
tures Revisited: People Detection and Articulated Pose
Estimation,” IEEE CVPR, 2009, pp. 1014-1021.
[19] Y. Freund and R. E. Schapire, “A Decision-Theoretic
Generalization of On-Line Learning and an Application
to Boosting,” Journal of Computer and System Sciences,
Vol. 55, No. 1, 1997, pp. 119-139.