Advances in Internet of Things
Vol.3 No.2A(2013), Article ID:33325,9 pages DOI:10.4236/ait.2013.32A006

Intelligent Video Surveillance System for Elderly People Living Alone Based on ODVS

Yiping Tang, Baoqing Ma, Hangchen Yan

Zhejiang University of Technology, Hangzhou, China

Email: typ@zjut.edu.cn

Copyright © 2013 Yiping Tang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received March 9, 2013; revised April 20, 2013; accepted April 28, 2013

Keywords: Intelligent Surveillance; Elderly People Living Alone; ODVS; MHoEI Algorithm; Pose Detection; Abnormal Behavior Recognition

ABSTRACT

Intelligent video surveillance for elderly people living alone using an Omni-directional Vision Sensor (ODVS) is an important application in the field of intelligent video surveillance. In this paper, an ODVS is used to provide a 360˚ panoramic image that captures the real-time situation of the elderly at home. Algorithms for moving object detection, moving object tracking, posture detection and behavior analysis are used to implement elderly monitoring. For motion detection and object tracking, a method based on MHoEI (Motion History or Energy Images) is proposed to obtain the trajectory and the minimum bounding rectangle of the elderly person. The posture is judged by the aspect ratio of the minimum bounding rectangle, and the aspect ratio varies with the distance between the object and the ODVS. To obtain the activity rhythm and detect various behavioral abnormalities, a detection method is proposed that uses time, space, environment, posture and action to describe, analyze and judge the behaviors of the elderly. In addition, the relationship between panoramic image coordinates and ground positions is acquired through ODVS calibration. The experimental results show that the proposed algorithms meet the demands of elderly surveillance and achieve a high recognition rate.

1. Introduction

According to a UN investigation, people over 65 will make up 12.7% of China's total population in 2030 [1], and the number of elderly persons living alone has increased rapidly in recent years. With this growth in the elderly population, various remote care services need to be provided. In 2003, GE conducted a global study of the stress experienced by caregivers of elderly people living alone [2]; falls were at the top of the stress list. According to another report [3], 30% of people over 65 suffer a significant fall within a year, and after the age of 75 this proportion reaches 42%.

At present, a variety of cameras and sensors are used to obtain the real-time situation of the elderly at home, and abnormalities are judged from this information. Depending on the type and credibility of the abnormality, corresponding measures are taken to notify the guardian or others concerned. Support systems differ according to their signal acquisition methods. In general, home health care technology for the elderly can be classified into three kinds: home monitoring based on image understanding, on activity signals, and on physiological sensors.

In home monitoring based on physiological sensors [4,5], physiological parameters such as ECG, blood pressure, respiration, blood glucose and body temperature are acquired in real time and used to judge the health condition of the elderly. With this method, elderly people who suffer from chronic diseases can be monitored effectively, and early symptoms of disease can also be found. However, its effectiveness depends on the ability and willingness of the elderly, who often forget to wear the equipment.

In home monitoring based on activity signals, water, gas and electricity usage data are obtained from human activity sensors, switch sensors or flow sensors installed in the home [6]. By analyzing the activity data over a period of time, a behavior pattern is established that is used to judge abnormal activities and to support the health care of the elderly.

In home monitoring based on image understanding, the scene is first captured by a camera. The scene images are then transmitted over the Internet or by other communication methods, and a remote guardian follows the situation of the elderly by viewing the images. To reduce the workload of remote monitoring personnel, the scene images are preprocessed using image processing and data mining technology.

With the development of computer vision, sensor technology and telecommunication, care support systems for the elderly based on computer vision have attracted increasing attention in recent years. Some researchers obtain the conditions of daily life from video images [7,8], and many results have been achieved in some areas. However, the moving object is easily lost during tracking because a conventional camera has a blind area. To solve this problem, Huei-Yung Lin et al. proposed an intelligent surveillance system using an omnidirectional CCD camera [9]. That work effectively addressed the blind-area problem, but the system still has some shortcomings. For example, the tracking algorithm is so complex that it consumes a large amount of computing and storage resources. The fall judgment does not take environmental factors into account and is so mechanical that it leads to misjudgments. The tracked object is lost when it crosses the edge of the unwrapped image. The tracking data cannot be used to study the life pattern of the elderly, so it is difficult to analyze and judge other abnormal behaviors.

In this paper, an ODVS is adopted to provide a 360˚ panoramic image. Based on the ODVS calibration result and the Bird-View image, a one-to-one correspondence is established between ground locations and panoramic image coordinates. For motion detection and object tracking, we propose a method based on MHoEI to obtain the trajectory and the minimum bounding rectangle of the elderly person. The posture (sitting, lying, standing, squatting, etc.) is judged by the aspect ratio of the minimum bounding rectangle. Finally, abnormal behaviors are described, analyzed and judged using time, space, environment, posture and action.

2. Technical Details

2.1. Panoramic Image Acquisition and Calibration

To detect abnormal behaviors of the elderly accurately and completely, surveillance equipment would have to be installed in all important places, such as the living room, the bedroom, the bathroom and the kitchen. This strategy not only increases the monitoring cost but also makes the fall detection algorithm relatively complex. Moreover, even if monitoring equipment is installed in many places indoors, it still cannot detect falls that occur outdoors. In this work, an ODVS is installed in the main place of daily life for the elderly, such as the middle of the living room, so that the system can monitor most of their daily activities. Figure 1 shows the ODVS used in this study and its panoramic image. The imaging principle and design are introduced in [10].

The calibration of the ODVS conveniently establishes a relationship between image pixels and locations on the ground. Because space is limited, the specific calibration algorithm is not described here; see [11,12].

2.2. Object Tracking

Object tracking is a precondition for posture recognition, action recognition and behavior recognition. Existing tracking algorithms include the Meanshift algorithm, the Camshift algorithm, algorithms based on feature matching, and algorithms based on shape and size. However, these algorithms consume a lot of computing resources and involve a large amount of calculation. In this work, the ultimate aim is to obtain the high-level behavior semantics of the target object by analyzing the video image. Therefore, a rapid and efficient MHoEI algorithm is proposed.

Effective tracking of the target object requires a fast and efficient algorithm. For the MHI (Motion History Images) algorithm and the MEI (Motion Energy Images) algorithm, the calculation is recursive and only the latest information is stored, so both algorithms satisfy this requirement.

Figure 1. An ODVS and its panoramic image.

The motion history image is obtained with the MHI algorithm after inter-frame differencing and gray-level processing are applied over a time interval. The MHI algorithm yields an accurate motion profile template of the target and requires very little computation. In addition, the MHI can be used to create a motion gradient image by calculating the orientation and magnitude of the gradient given by the Sobel operator, and the resulting gradient can further be used to estimate the direction of motion of the object. Because the moving foreground is obtained by inter-frame differencing rather than background modeling, the MHI algorithm runs in real time and its calculation is extremely simple. However, no motion history image is obtained once the moving object has stopped. Moreover, when the foreground object as a whole is temporarily stationary and only part of it is moving, the MHI algorithm detects only the moving part. For example, when the arms swing constantly while the body is at rest, the MHI algorithm detects only the hands. The MHI updating equation is given by

(1)

where ts is the current time and dur is the duration time. The duration dur is determined mainly by the range of motion, so it is obtained through a series of dynamic searches.
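The role of ts and dur can be illustrated with a minimal sketch of a timestamp-based motion history update. It assumes the widely used form in which moving pixels are stamped with the current time and stamps older than ts - dur are cleared; this is an assumption about the general shape of the update, not a transcription of Equation (1). The function name `update_mhi` and the nested-list image representation are illustrative only.

```python
def update_mhi(mhi, silhouette, ts, dur):
    """Return an updated motion history image (nested lists of floats).

    Assumed rule (illustrative, not taken from the paper's Equation (1)):
      - pixel is moving (silhouette != 0): store the current time ts
      - pixel is not moving and its stamp is older than ts - dur: clear to 0
      - otherwise: keep the previous stamp
    """
    updated = []
    for mhi_row, sil_row in zip(mhi, silhouette):
        new_row = []
        for stamp, moving in zip(mhi_row, sil_row):
            if moving:
                new_row.append(ts)
            elif stamp < ts - dur:
                new_row.append(0.0)
            else:
                new_row.append(stamp)
        updated.append(new_row)
    return updated
```

Only the previous image and the current silhouette are needed for each update, which is why the computation stays small.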

The motion energy image is obtained with the MEI algorithm by accumulating the inter-frame difference images over a time interval. The result is called the binary cumulative motion energy image. The MEI updating equation is given by

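$$E_\tau(x, y, t) = \bigcup_{i=0}^{\tau-1} D(x, y, t-i) \tag{2}$$

where $I(x, y, t)$ is the video image sequence and $D(x, y, t)$ is the binary image sequence marking the motion regions. In many applications, $D(x, y, t)$ is generated from the inter-frame difference image.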
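A minimal sketch of the cumulative motion energy image, assuming it is the pixel-wise union of the most recent binary difference masks. The class name `MotionEnergyImage`, the window length `tau` and the nested-list masks are illustrative names, not taken from the paper.

```python
from collections import deque


class MotionEnergyImage:
    """Binary union of the last `tau` inter-frame difference masks."""

    def __init__(self, tau):
        # A bounded deque drops the oldest mask automatically.
        self.masks = deque(maxlen=tau)

    def update(self, diff_mask):
        """Add a binary mask (nested lists of 0/1) and return the current MEI."""
        self.masks.append(diff_mask)
        rows, cols = len(diff_mask), len(diff_mask[0])
        return [
            [int(any(m[r][c] for m in self.masks)) for c in range(cols)]
            for r in range(rows)
        ]
```

Unlike a motion history image, the result records only where motion occurred inside the window, not how recently.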

The MEI and the MHI are two components of a vector-valued image, each encoding a different property of the motion. Since both are computed recursively and store only the latest information, the calculation is fast and efficient. The MEI and the MHI may match differently in some cases. To explain the difference, we distinguish where the movement occurs from how the movement is carried out: the MEI mainly describes where movement occurs, and the MHI mainly describes how the foreground object moves.

In foreground object tracking there are two states, moving and static. A moving foreground object can be detected accurately by the MHI algorithm when an appropriate duration dur is selected, but a temporarily stationary foreground object gradually disappears from the MHI over time. With the MEI algorithm, in contrast, a temporarily stationary foreground object does not disappear over time. Neither algorithm alone can therefore track the foreground object reliably in both states.

To solve this problem, a fast and efficient MHoEI (Motion History or Energy Images) algorithm is proposed in this work. The MHoEI updating equation is given by

(3)

where V is the velocity of the foreground object, ts is the current time and dur is the duration time. The duration time is adjusted dynamically according to the velocity of the foreground object: in general, the faster the foreground object moves, the smaller dur is, and the slower it moves, the larger dur is. The gray value of the foreground object is not decremented by one when its velocity is less than or equal to a threshold, so a stationary foreground object does not gradually disappear over time. Figure 2 shows a flow chart of the MHoEI algorithm.
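The behavior described above can be sketched as follows. This is a hedged illustration rather than the paper's Equation (3): the linear mapping in `duration_for_velocity`, the threshold `v_threshold`, and all default values are hypothetical, chosen only to show a duration that shrinks with speed and a decay that is suspended for a slow or stationary object.

```python
def duration_for_velocity(velocity, dur_min=5, dur_max=30, v_max=2.0):
    """Map speed to a duration: faster objects get a smaller dur.

    The linear rule and its bounds are hypothetical.
    """
    ratio = min(max(velocity / v_max, 0.0), 1.0)
    return round(dur_max - ratio * (dur_max - dur_min))


def update_mhoei(image, silhouette, velocity, v_threshold=0.05):
    """Update a gray-level motion image (nested lists of ints).

    Assumed rule (illustrative, not the paper's Equation (3)):
      - moving pixel: set to the velocity-dependent duration
      - otherwise, if the object moves faster than v_threshold: decay by one
        (history-like behavior)
      - otherwise: keep the value, so a stationary object does not fade
        (energy-like behavior)
    """
    dur = duration_for_velocity(velocity)
    decay = velocity > v_threshold
    updated = []
    for img_row, sil_row in zip(image, silhouette):
        new_row = []
        for value, moving in zip(img_row, sil_row):
            if moving:
                new_row.append(dur)
            elif decay:
                new_row.append(max(value - 1, 0))
            else:
                new_row.append(value)
        updated.append(new_row)
    return updated
```

With velocity at or below the threshold, repeated updates leave the stored values untouched, which is the property that keeps a resting person in the tracked region.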

The velocity of the moving object is computed from the center of its minimum bounding rectangle. Let $(x_t, y_t)$ be the center point of the minimum bounding rectangle in the $t$-th frame and $(x_{t+1}, y_{t+1})$ the center point in the $(t+1)$-th frame. The pixel distance $S$ between them is computed by Equation (4), and the actual distance is then obtained from $S$ and the ODVS calibration. Finally, the velocity $V$ of the moving object is the actual distance divided by the running time of each frame.

$$S = \sqrt{(x_{t+1} - x_t)^2 + (y_{t+1} - y_t)^2} \tag{4}$$
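A minimal sketch of the velocity computation, assuming the calibration is available as a function that maps an image pixel to a ground position. The callable `pixel_to_ground`, the function names and the units are hypothetical; the real mapping comes from the ODVS calibration, which is not reproduced here.

```python
import math


def pixel_distance(center_a, center_b):
    """Euclidean distance in pixels between two bounding-rectangle centers."""
    return math.hypot(center_b[0] - center_a[0], center_b[1] - center_a[1])


def object_velocity(center_a, center_b, frame_time, pixel_to_ground):
    """Ground speed between two consecutive frames.

    pixel_to_ground is a hypothetical stand-in for the calibration result:
    it takes an (x, y) pixel and returns an (X, Y) ground position.
    frame_time is the running time of one frame, in seconds.
    """
    ground_a = pixel_to_ground(center_a)
    ground_b = pixel_to_ground(center_b)
    distance = math.hypot(ground_b[0] - ground_a[0], ground_b[1] - ground_a[1])
    return distance / frame_time
```

For instance, with a purely illustrative uniform scale of 0.01 m per pixel, centers (0, 0) and (3, 4) are 5 pixels apart, and a frame time of 0.1 s gives a speed of about 0.5 m/s.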

Figure 2. MHoEI algorithm flow chart.

Since the geometric projection model of the ODVS differs from that of a conventional CCD camera, the moving object is deformed in the panoramic image. The panoramic image also does not match human visual habits and is inconvenient for computer processing. To solve this problem, Huei-Yung Lin et al. [9] proposed tracking the moving target in a 360˚ unwrapped image; the unwrapping algorithm is introduced in [10]. However, the moving target is treated as two different targets when it is near the 0˚ or 360˚ edge. To solve this problem and keep the tracked object intact and continuous, a 20˚ overlap region is appended to the original unwrapped image in this paper. Figure 3 shows a 380˚ unwrapped image with a tracking result.
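The overlap idea can be sketched in a few lines, assuming the unwrapped image is stored row by row with columns spanning 0˚ to 360˚. The function name `add_overlap` and the nested-list representation are illustrative; the paper does not give its implementation.

```python
def add_overlap(unwrapped, overlap_deg=20):
    """Append the first overlap_deg degrees of each row to its end.

    `unwrapped` is a nested list whose columns span 0 to 360 degrees, so a
    target straddling the 0/360 edge appears as one connected region in the
    extended image.
    """
    width = len(unwrapped[0])
    extra = round(width * overlap_deg / 360)
    return [row + row[:extra] for row in unwrapped]
```

A tracker working on the extended image then needs to map any column beyond the original width back by one full turn when reporting the target's angle.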

2.3. Customization for Home Environment

When people observe a target, they first take note of the environment in which the target is located and then judge its specific behavior. Accurate information cannot be obtained by relying only on the human action to identify the behavior. Although sitting during a meal and sitting on the sofa are the same action, the behaviors they describe are different, so it is difficult to recognize behavior from the action alone. The main purpose of an action is easily understood when the scene in which it takes place is taken into account.

Almost all environmental elements are static objects, and it is extremely difficult to recognize static objects with computer vision. Moreover, practical intelligent video surveillance does not need computer vision to analyze a variety of complex environmental factors: recognizing environmental elements accurately would consume considerable computational resources and require a huge knowledge base. Therefore, a customization method is used for the static camera in the home environment.

In this customization method, the physical space in the video image is divided into a number of grid cells. A one-to-one correspondence is then established between the foreground objects and the environment elements: once the foreground object is found in a grid cell, the environment element at its location is known.

Since the panoramic image is seriously distorted on the imaging plane, especially in the horizontal direction, it is generally difficult to extract geometric features from it. To set up the one-to-one correspondence between image pixels and ground locations easily and to customize all kinds of environmental elements in the home, the Bird-View image is adopted for customizing the environmental information in this paper. Because space is limited, the Bird-View transform algorithm will be introduced in other papers. Each grid cell is named with two letters: the first letter denotes the row and the second the column. Figure 4 shows the customization method in the home environment.

Figure 3. Tracking result in unwrapped image (+20˚).

Figure 4. Customization method in home environment.

In the Bird-View image, AH refers to the bathroom entrance, EI to the room entrance, BC and BG to the floor of the living room, BE to the bathroom in the living room, and BD to the stool in the living room. For example, if the posture is determined to be sitting while the tracking box is located in cell BE, which is customized as the bathroom, the elderly person is judged to be sitting in the bathroom. If the tracking box disappears in cell EI, the elderly person is judged to have gone out of the door. If the tracking box disappears in cell AH, the elderly person is judged to have entered the bathroom.
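The grid lookup can be sketched as a small table keyed by the two-letter cell name. The cell labels below are the ones named in the text; the cell size, the use of ground coordinates in meters, and the assumption that both rows and columns are lettered from A are hypothetical choices for illustration.

```python
import string

# Cell labels as customized in the text (row letter, then column letter).
ENVIRONMENT = {
    "AH": "bathroom entrance",
    "EI": "room entrance",
    "BC": "living room floor",
    "BG": "living room floor",
    "BE": "bathroom",
    "BD": "stool",
}


def grid_name(x, y, cell_size):
    """Two-letter cell name for a ground position (x, y).

    Assumes square cells of side cell_size with the origin at the top-left
    corner of the Bird-View image.
    """
    row = int(y // cell_size)
    col = int(x // cell_size)
    return string.ascii_uppercase[row] + string.ascii_uppercase[col]


def environment_at(x, y, cell_size=0.5):
    """Environment element customized for the cell containing (x, y)."""
    return ENVIRONMENT.get(grid_name(x, y, cell_size), "uncustomized")
```

Because the table is filled in once per home, no visual recognition of furniture or rooms is needed at run time.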

2.4. Posture Detection

For elderly surveillance, standing, sitting and lying are defined as the basic human postures in this work. P is the set of basic human postures:

$$P = \{\text{standing}, \text{sitting}, \text{lying}\} \tag{5}$$

The human target and its posture can be detected only while the target is in motion; if the target is not in motion, it is considered to keep its previous posture. Since the human image obtained through motion detection is a high-dimensional signal that is not easily recognized, processing it directly would take a great deal of time. To reduce this high-dimensional signal to a two-dimensional feature, the human posture is described by the minimum bounding rectangle of the foreground object, where W is the width of the rectangle and H is its height. This bounding rectangle can be obtained directly from the low-level visual processing. The aspect ratio k of the rectangle, computed from W and H, is used as the feature that describes the human posture. In the middle-distance region, postures are distinguished by collecting statistics of the posture ratio k for daily human actions and setting thresholds by the minimum error probability criterion. Table 1 shows the feature thresholds in the middle-distance region.
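A minimal sketch of threshold-based posture classification from the bounding rectangle. The direction of the ratio (height divided by width) and both threshold values are assumptions made for illustration; they are not the values of Table 1, which depend on the distance region.

```python
def classify_posture(width, height, lying_max=0.7, standing_min=1.8):
    """Classify posture from the minimum bounding rectangle.

    Uses k = height / width (an assumed convention). The thresholds
    lying_max and standing_min are hypothetical placeholders.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    k = height / width
    if k >= standing_min:
        return "standing"
    if k <= lying_max:
        return "lying"
    return "sitting"
```

A tall, narrow box such as 40 × 120 pixels is labeled standing, a wide, flat box such as 120 × 40 is labeled lying, and intermediate shapes fall into sitting.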

In practice, however, human postures vary widely. Under the above detection criterion, standing is recognized as sitting if the arms are extended. To solve this problem, a comprehensive judgment is made using compactness, solidity, eccentricity, irregularity and similar features.

Posture misjudgments caused by unusual human actions are eliminated by constructing a posture evaluation function, which is given by

(6)

Table 1. Feature threshold in middle-distance region.

(7)

(8)

where F is the area of the foreground object; the other area term is the area of its minimum bounding rectangle. The function value lies in the range [0,1]. The evaluation function has its highest value when. When the arms are extended, the value of the evaluation function becomes very low. If the value is very low, the posture is treated as an undefined posture, so in certain cases the human posture is not recognized. Figure 5 shows human posture recognition results. The experimental results show that the defined postures can be recognized accurately.

2.5. Abnormal Behavior Detection

The detection of abnormal behavior is the core of the ODVS-based care support system for the elderly. In this work, we aim to establish a low-cost abnormal behavior detection system that also protects the privacy of the elderly. The system automatically notifies the relevant department or the guardian when an abnormal behavior is detected. In addition, the guardian can monitor the activities of the elderly person at home through a web page at any time. To implement effective automatic monitoring, behaviors must be identified under different illumination conditions. To identify the behavior of the elderly accurately, we describe, analyze and judge it using time, space, environment, posture and action.

With this in mind, the set of abnormal behaviors is a Cartesian product of the sets:

$$B = T \times P \times A \times E = \{(t, p, a, e)\} \tag{9}$$

Figure 5. Human posture recognition results.

where $t \in T$, $p \in P$, $a \in A$ and $e \in E$. T refers to time information, P to posture information obtained by posture detection, A to action information acquired by action recognition, and E to environment information created by the home environment customization. Every human behavior can be described by a T code, a P code, an A code and an E code. The TPAE coding system not only provides a means of decomposing, identifying and describing the diversity of human behavior, but also provides a coding system for computer vision analysis.
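A behavior coded as a (time, posture, action, environment) tuple can be matched against a list of abnormal patterns. The sketch below is only an illustration of the coding idea: the code values and both rules are hypothetical, and `None` is used as a wildcard that matches any value.

```python
from typing import NamedTuple, Optional


class Behavior(NamedTuple):
    time: Optional[str]         # T code, e.g. "night"
    posture: Optional[str]      # P code, from posture detection
    action: Optional[str]       # A code, from action recognition
    environment: Optional[str]  # E code, from the customized grid


# Hypothetical rules; None matches any value in that position.
ABNORMAL_PATTERNS = [
    Behavior(None, "lying", "stationary", "living room floor"),
    Behavior("night", "standing", "walking", "room entrance"),
]


def matches(behavior, pattern):
    """True if every non-wildcard field of the pattern equals the behavior's."""
    return all(p is None or p == v for v, p in zip(behavior, pattern))


def is_abnormal(behavior):
    """True if the coded behavior matches any abnormal pattern."""
    return any(matches(behavior, p) for p in ABNORMAL_PATTERNS)
```

Keeping the rules as data makes it straightforward to adapt them to a particular household without changing the detection code.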

3. Experimental Results

Our system was implemented using Eclipse on a PC with an Intel Core i3 2.13 GHz CPU and 4 GB of RAM. The panoramic video sequences, captured by an ODVS, have a size of 640 × 480 at 25 fps. The resolution of the unwrapped image is 740 × 180 pixels. To capture the panoramic image of the room, the ODVS is installed in the middle of the room at a height of about 1800 mm.

To evaluate the system performance, we conducted an experiment in a meeting room of 80 square meters. Fifty volunteers between 20 and 30 years old, of different weights, heights and genders, were asked to take part. They performed 6 kinds of behaviors 6 times in the room. Table 2 shows the experiment result.

In this table, Sum is the number of actions, Y is the number of correct detections, N is the number of false detections and R is the accuracy rate. The result indicates that the proposed method is stable and efficient.

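To validate the performance of the surveillance system, we use two well-known criteria that are widely applied to abnormality detection systems. Sensitivity is the capacity to detect an abnormal behavior, and Specificity is the capacity to detect only abnormal behaviors:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP} \tag{10}$$

The definitions of TP, FP, FN and TN are as follows:

TP (true positive): an abnormal behavior occurs and is detected.
FP (false positive): no abnormal behavior occurs but one is detected.
FN (false negative): an abnormal behavior occurs but is not detected.
TN (true negative): no abnormal behavior occurs and none is detected.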

Table 2. The experiment result of behavior recognition.

Following the above method, we carried out experiments with 1266 abnormal/normal behaviors and 386 abnormal behaviors. Table 3 shows the evaluation of abnormal behavior recognition. According to the test data, the Sensitivity and Specificity of the surveillance system are 94.04% and 97.16%, respectively.

Human object tracking is the critical technique of elderly care support. To verify the effectiveness of the MHoEI algorithm, we carried out a tracking experiment, which showed that, in comparison with ordinary methods, the MHoEI algorithm produces clear results and operates stably. Figure 6 shows the experimental results of tracking a man.
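The two criteria are simple ratios of detection counts, as the following sketch shows. It uses the standard definitions (sensitivity as TP over TP plus FN, specificity as TN over TN plus FP); the counts in the usage note are illustrative and are not taken from Table 3.

```python
def sensitivity(tp, fn):
    """Fraction of abnormal behaviors that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of normal behaviors that were correctly left unflagged."""
    return tn / (tn + fp)
```

For example, with 90 detected and 10 missed abnormal behaviors the sensitivity is 0.9, and with 95 correctly ignored and 5 wrongly flagged normal behaviors the specificity is 0.95.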

Table 3. Evaluation of abnormal behavior recognition.

Figure 6. Experimental results of tracking a man.

The experimental results show that the MHoEI algorithm tracks the moving target steadily in cases such as the target changing from motion to rest, changing from rest to motion, changing posture, changes in environment brightness, and background occlusion. The interference caused by similar colors in the background is also overcome effectively. To leave more processing time for posture recognition and behavior recognition, the program runs at a frame rate of 10 fps, which still meets the real-time requirement. In a trial lasting a few months, no loss of tracking was observed. These results indicate that the MHoEI algorithm is highly robust.

To help improve the accuracy of posture recognition, the image within the minimum bounding rectangle is saved in JPG format, and the recognition results, with time and posture, are saved to local disk. Some examples are shown in Figure 7.

In this work, the squatting posture is treated as the sitting posture. The experiment shows that the moving target is recognized accurately at different velocities, in different directions and in different states (moving and stationary). It also shows that the ratio for lying or standing is close to the ratio for sitting when the ODVS ray is parallel to the direction of lying or standing. In this case the posture is mistakenly identified as sitting. Although the probability is very small, we will eliminate this misjudgment by fusing the motion of the current frame with that of the previous frames.

4. Conclusions and Future Work

Intelligent surveillance is one of the important topics in nursing and home-care systems. In this work, we have proposed an ODVS-based care support system for the elderly. An ODVS is adopted to provide a 360˚ panoramic image. Based on the ODVS calibration result and the Bird-View image, a one-to-one correspondence is established between image pixels and locations on the ground. To obtain the trajectory and the minimum bounding rectangle of the elderly person, an MHoEI algorithm is proposed. The study indicates that the algorithm has the advantages of less computation, higher robustness and more effective tracking. On the basis of effective tracking, the posture of the elderly person is determined using the aspect ratio of the minimum bounding rectangle. The experimental results show that the proposed algorithms meet the demands of elderly surveillance and achieve a high recognition rate. To obtain the activity rhythm and detect various behavioral abnormalities, a detection method is proposed that uses time, space, environment, posture and action to describe, analyze and judge the behaviors of the elderly.

Figure 7. Experimental results of posture recognition.

With the proposed methods, abnormal behaviors within the ODVS view are detected and recognized, but abnormal behaviors outside the ODVS range of observation are not.

To detect abnormal activity beyond the ODVS view, we will establish an activity rhythm model for the elderly in the future. The activity rhythm of the elderly is not constant: the distribution of activity changes with the seasons and with increasing age. These factors will also be considered in order to detect abnormal activity accurately and to adapt the activity model. In addition, to find deeper abnormal events and behaviors and to provide more effective auxiliary support for real-time monitoring, we will spend more time studying daily life data with data mining and neural network technology.

5. Acknowledgements

We thank the National Natural Science Foundation (No. 61070134) for supporting this project and the elderly Surveillance Service Center for its assistance with the experiment in the Tokyo Zone in Japan.

REFERENCES

  1. Y. P. Tang, W. Wang and Y. Z. Fu, “Elder Health Status Monitoring through Analysis of Activity,” Chinese Journal of Computer Engineering and Applications, Vol. 43, No. 3, 2006, pp. 211-213.
  2. C. Paul, G. Meena, G. Catherine and W. Jenny, “Remote Monitoring and Adaptive Models for Caregiver Peace of Mind,” Proceedings of 2003 International Conference on Aging, Disability and Independence, Washington DC, 4-6 December 2003, pp. 183-184.
  3. S. Andrew and J. Neil, “Smart Sensor to Detect the Falls of the Elderly,” IEEE Pervasive Computing, Vol. 3, No. 2, 2004, pp. 42-47. doi:10.1109/MPRV.2004.1316817
  4. M. E. Taylor, M. M. Ketels, K. Delbaere, S. R. Lord, A. S. Mikolaizak and J. C. T. Close, “Gait Impairment and Falls in Cognitively Impaired Older Adults: An Explanatory Model of Sensorimotor and Neuropsychological Mediators,” Age and Ageing, Vol. 41, No. 5, 2012, pp. 665-669. doi:10.1080/028418501127346846
  5. S. H. Kim and D. W. Kim, “A Study on Real-Time Fall Detection Systems Using Acceleration Sensor and Tilt Sensor,” Sensor Letters, Vol. 10, No. 5-6, 2012, pp. 5-6. doi:10.1166/sl.2012.2293
  6. H. Martin, H. Wang, K. Liam and M. Elean, “Monitoring of Activity Levels of the Elderly in Home and Community Environments Using Off the Shelf Cellular Handsets,” Proceedings of the 2010 International Conference on Consumer Electronics, Las Vegas, 9-13 January 2010, pp. 9-10.
  7. B. T. Morris and M. M. Trivedi, “Trajectory Learning for Activity Understanding: Unsupervised, Multilevel, and Long-Term Adaptive Approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 11, 2011, pp. 2287-2301. doi:10.1109/TPAMI.2011.64
  8. P. H. Yuan, K. F. Yang and W. H. Tsai, “Real-Time Security Monitoring Around a Video Surveillance Vehicle With a Pair of Two-Camera Omni-Imaging Devices,” IEEE Transactions on Vehicular Technology, Vol. 60, No. 8, 2011, pp. 3603-3614. doi:10.1109/TVT.2011.2162862
  9. H. Y. Lin, M. L. Wang, C. C. Huang and B. W. Tsai, “Intelligent Surveillance Using an Omnidirectional CCD Camera,” Proceedings of the Automatic Control, Peking University Press, Taipei, 2005.
  10. Y. P. Tang, Y. J. Ye, Y. H. Zhu and X. K. Gu, “The Application Research of Intelligent Omni-Directional Vision Sensor,” Chinese Journal of Sensors and Actuators, Vol. 20, No. 6, 2007, pp. 1316-1320.
  11. D. Scaramuzza, A. Martinelli and R. Siegwart, “A Toolbox for Easy Calibrating Omnidirectional Cameras,” Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Beijing, 9-15 October 2006.
  12. B. Mičušík, “Two-View Geometry of Omnidirectional Cameras,” Ph.D. Thesis, Czech Technical University, Prague, 2004.