
Fairouz Merrouche
Computer Science Department
USTHB, Algiers, Algeria

Anna Elodies
Computer Science Department
ESC, Montreal, Canada

ABSTRACT

The number of elderly people living alone has increased in recent years, and falls are one of the major risks that threaten their lives. Fall detection systems have therefore become a necessity, and computer vision offers an efficient solution among the many accurate approaches developed in this field. This paper proposes a novel vision-based fall detection method using a depth camera, which combines human shape analysis, head tracking, and centroid detection to validate falls. Experiments on the SDUFall dataset, which contains 20 subjects performing five daily activities and falls, demonstrate the efficiency of our method, which achieves up to 93.25% accuracy compared with state-of-the-art methods using the same dataset.

CCS CONCEPTS

• Computing methodologies → Tracking; • Computing methodologies → Shape representations; • Computing methodologies → Object detection

Keywords

Human activity recognition, Video monitoring, Fall detection, Human shape analysis, Head detection, Kinect sensor.

ACM Reference format:

F. Merrouche and N. Baha. 2017. SIG Proceedings Paper in Word Format. In Proceedings of ACM ICCES conference, Istanbul, Turkey, July 2017 (ICCES '17), 6 pages.

1 INTRODUCTION

Nowadays, the life expectancy of the elderly is increasing, which increases the number of people who want to live independently and without help. Falls, particularly repeated falls, are the leading cause of injuries [1], hospitalization, and traumatic-injury-related deaths [2] in persons aged 65 or older. Furthermore, a fall can have severe consequences such as depression, avoidance of activities, and a fear of falling that increases suffering [3].

In order to overcome the elderly fall problem and to prevent fall incidents, automated fall detection is an appropriate technology. There has been numerous research for this purpose; some works are based on wearable sensors such as accelerometers and gyroscopes.

However, these sensors are ineffective if the user forgets to wear them or to recharge or replace the battery. Vision-based approaches using commodity cameras have proven to offer better solutions: cameras are inexpensive and do not require the user to wear anything.

The recently emerged low-cost Kinect depth camera offers several advantages over a regular RGB camera, such as preserving the privacy of the person's life. In this paper we present a new vision-based approach for fall detection in an indoor environment. The contribution of this work lies in detecting falls reliably with a simple system that uses head tracking and centroid movement.

This paper is organized as follows: Section 2 presents related work in the field of fall detection systems. Section 3 presents the stages of our fall detection method. In Section 4, experimental results are presented and discussed. Finally, Section 5 concludes the paper with some remarks.

2 RELATED WORKS

In previous years, numerous works have been proposed for fall detection. According to the state of the art, fall detection methods can be divided into two categories: computer-vision-based approaches and non-computer-vision approaches. The non-computer-vision methods can be further divided into two main classes: wearable-device-based and ambient-device-based.

In the work of [4], a special piezoelectric sensor coupled to the floor is used; a binary fall signal is generated in case of a fall, but the main disadvantage is false alarms. In [5], two accelerometers were attached to the chest and thigh of 10 persons, and a decision tree with thresholds was applied to recognize posture transitions and detect falls. The main drawback of this method is having to wear the sensors. Another work was proposed in [6], where a wearable accelerometer and a mobile device were used to monitor patients in real time; the data recorded by the accelerometer are sent to the mobile device for analysis. This approach consists of three phases: first, gathering data and building a set of simulated falls with the corresponding class, Fall or non-Fall; second, extracting knowledge as a set of IF-THEN rules; and last, monitoring patients in the decisional layer in real time. However, this approach is still preliminary, as the tests were done with just three people.

Many existing solutions are computer-vision-based. Some methods use shape analysis to detect falls. The first step is to detect the person in each video frame by separating the person from the background. With 2D sensors, the bounding-box aspect ratio was used to determine whether the person is in an upright position [7]. In [8] and [9] the shape of the person was approximated by an ellipse. Despite their performance, these approaches are not feasible for falls toward the camera; therefore, multi-camera systems were proposed to overcome this problem by exploiting 3D features to detect falls. In [10] a multi-camera system is used where fall detection is performed at two levels: the first level infers the states of the object in each frame, and the second level consists of a linguistic summarization of the person's 3D states, called voxels. Another work [11] used four cameras mounted in one room and GMM classification to detect falls by analyzing deformations of the human shape.

Recently, 3D cameras have attracted a lot of attention, and the use of the Kinect has widely emerged because of its low cost. In [12] the centroid height of the person relative to the ground and the centroid velocity were used to detect occluded falls: the person was tracked over time to obtain the person's vertical state and segment on-ground events, and an ensemble of decision trees then computed the confidence that a fall occurred before an on-ground state. In [13] the key joints of the human body are extracted using an RDT algorithm trained with Shannon entropy and then tracked; an SVM is employed to detect falls based on the head-joint distance trajectory. A new shape-based method to detect falls is used in [14], where CSS features of the silhouette are extracted in each frame, actions are represented by a bag of CSS words, and falls are detected by classification using VPSO-ELM, which uses particle swarm optimization. In [15] Silhouette Orientation Volume (SOV) features are used to represent actions and classify falls, obtaining good results with high accuracy. In [16] a fall detection method is presented that combines an accelerometer with the Kinect: the accelerometer filters out non-fall events and the Kinect sensor confirms the fall, which leads to good performance.

In this work, we propose a novel human-shape-based fall detection method that exploits the advantages of the Kinect and uses shape information combined with the centroid position, based on the mechanics of free fall, in order to validate falls.

3 PROPOSED METHOD

Our proposed method uses the Kinect sensor, a camera developed by Microsoft for the Xbox 360 console. This sensor is composed of an RGB camera and an infrared (IR) depth sensor, both with a resolution of 640×480 at 30 fps.

The Kinect provides a depth map where each pixel's gray level represents the distance in millimeters between the Kinect and the objects. The infrared sensor works even in a dark room, which is an advantage for any surveillance system. In this work, we use only the depth images to preserve the privacy of the elderly.

In this paper, we make use of human shape analysis and movement. First, the head is tracked in each frame, which enables the localization of its position. Second, the position of the center of mass is measured and its movement is analyzed. Third, the fall is detected and validated. In the following, the main stages are described in more detail. Fig. 1 gives an overview of the proposed method.

Figure 1: Overview of the fall detection algorithm.

3.1 Person Detection

The first step of our system consists of extracting the human body silhouette, followed by the morphological operators erosion and dilation. To detect the person, a GMM background subtraction method is used. Fig. 2 shows an example of an image before and after background subtraction.

Figure 2: Background subtraction.
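As an illustration, the per-pixel update behind background subtraction can be sketched as follows. This is a simplified single-Gaussian stand-in for the per-pixel GMM used in our system; the learning rate and deviation threshold are illustrative values, not the ones used in our experiments.

```cpp
#include <cassert>
#include <cmath>

// Simplified single-Gaussian background model for one depth pixel
// (a stand-in for a per-pixel GMM; alpha and k are illustrative).
struct PixelModel {
    double mean = 0.0;
    double var  = 900.0;   // initial variance (assumed)
    bool   init = false;
};

// Returns true if the depth value is foreground (person); otherwise the
// background model is blended toward the new observation.
bool classifyAndUpdate(PixelModel& m, double depthMm,
                       double alpha = 0.01, double k = 2.5) {
    if (!m.init) {              // first frame initializes the model
        m.mean = depthMm;
        m.init = true;
        return false;
    }
    double diff = depthMm - m.mean;
    if (diff * diff > k * k * m.var)
        return true;            // deviates too far from background: foreground
    m.mean += alpha * diff;     // background: update running mean and variance
    m.var  += alpha * (diff * diff - m.var);
    return false;
}
```

A full GMM keeps several such Gaussians per pixel and matches each observation against them; the single-Gaussian version above captures the same classify-then-update idea.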

3.2 Head Tracking

The head information is very useful for fall detection, as demonstrated in [17]. In this work, the head is approximated by an ellipse and tracked using a particle filter. The particle filter is widely used for tracking because of its performance. It employs a Sequential Importance Sampling (sequential Monte Carlo) algorithm to compute the posterior density of a variable given an observation [18]. The posterior density function (PDF) over the state can be approximated by a set of particles. Each particle represents a state with an associated weight, which is an observation likelihood:

w_t = p(z_t | x_t)    (1)

where z_t is the observation measurement and x_t the state at time t.

In our work, the observation combines the score between the ellipses and the head point. The algorithm has three steps: initialization, update, and resampling. Algorithm 1 illustrates the use of the particle filter for head tracking, and the results are shown in Fig. 3.

Algorithm 1: Head tracking using a particle filter.

Input: Ellipses modelling the head.

1) Initialisation:

Set t = 0.

for i = 1, 2, …, m:

 Spread the m ellipses with the initial weights w_0^i = 1/m.

2) Update:

for i = 1, …, m:

 Update the weight by the likelihood:

  w_t^i = p(z_t | x_t^i)

 Normalize the weights:

  w_t^i = w_t^i / Σ_j w_t^j

3) Resampling:

Choose the most plausible particles based on the likelihood of each of them.

Output: Position of the head in the t-th frame.

Figure 3: Head Tracking in depth frames.
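A minimal C++ sketch of this tracking loop is given below. The likelihood here is an assumed Gaussian score around an observed head point rather than the ellipse-versus-silhouette score used in our system, and the motion-noise parameters are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

// One particle: a candidate head position (ellipse center) and its weight.
struct Particle { double x, y, w; };

// Assumed observation likelihood: Gaussian score around the observed head
// point (the real system scores ellipse models against the depth silhouette).
double likelihood(const Particle& p, double zx, double zy, double sigma = 20.0) {
    double d2 = (p.x - zx) * (p.x - zx) + (p.y - zy) * (p.y - zy);
    return std::exp(-d2 / (2.0 * sigma * sigma));
}

// One spread/update/resample cycle of Algorithm 1; returns the weighted
// mean of the particles as the head estimate for this frame.
std::pair<double, double> trackStep(std::vector<Particle>& ps,
                                    double zx, double zy, std::mt19937& rng) {
    std::normal_distribution<double> noise(0.0, 5.0);  // motion model (assumed)
    double sum = 0.0;
    for (auto& p : ps) {                      // 1) spread, 2) weight update
        p.x += noise(rng);
        p.y += noise(rng);
        p.w = likelihood(p, zx, zy);
        sum += p.w;
    }
    double ex = 0.0, ey = 0.0;
    std::vector<double> ws;
    for (auto& p : ps) {                      // normalize: w_i / sum_j w_j
        p.w /= sum;
        ex += p.w * p.x;
        ey += p.w * p.y;
        ws.push_back(p.w);
    }
    std::discrete_distribution<int> pick(ws.begin(), ws.end());
    std::vector<Particle> next;               // 3) resample by weight
    next.reserve(ps.size());
    for (std::size_t i = 0; i < ps.size(); ++i) next.push_back(ps[pick(rng)]);
    ps = std::move(next);
    return {ex, ey};
}
```

Starting from particles spread uniformly over the frame, a few cycles are enough for the weighted estimate to lock onto the observed head position.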

3.3 Floor Detection

The floor is one of the most important features to measure, since the distance of the head to the floor is a discriminative feature in our system. The floor plane is calculated using the cross product of two vectors formed from three points chosen randomly from the input depth image.

Given three points p1(x1, y1, z1), p2(x2, y2, z2), p3(x3, y3, z3), we can form two vectors: v1 from p1 to p2 and v2 from p2 to p3. The cross product n = v1 × v2 is perpendicular to v1 and v2, and therefore perpendicular to the plane.

Let a, b, c be the coordinates of the vector n; they are used to compute the d parameter using equation (2):

d = ax1 + by1 + cz1    (2)

The floor equation is given by

ax + by + cz = d    (3)

Fig. 4 shows an example of floor detection using the cross product.

Figure 4: Floor plane detection.

The head's distance to the floor is obtained using the point-to-plane distance formula (4), where (x0, y0, z0) is the head position:

D = |ax0 + by0 + cz0 - d| / sqrt(a^2 + b^2 + c^2)    (4)
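Equations (2)-(4) translate directly into code. The sketch below builds the floor plane from three sampled points and measures the head's height above it; the point and structure names are ours.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Plane ax + by + cz = d, as in equation (3).
struct Plane { double a, b, c, d; };

// Build the floor plane from three sampled floor points: the normal is the
// cross product n = (p2 - p1) x (p3 - p2), and d = a*x1 + b*y1 + c*z1 (eq. 2).
Plane planeFromPoints(const Vec3& p1, const Vec3& p2, const Vec3& p3) {
    Vec3 v1{p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2]};
    Vec3 v2{p3[0] - p2[0], p3[1] - p2[1], p3[2] - p2[2]};
    Plane pl;
    pl.a = v1[1] * v2[2] - v1[2] * v2[1];
    pl.b = v1[2] * v2[0] - v1[0] * v2[2];
    pl.c = v1[0] * v2[1] - v1[1] * v2[0];
    pl.d = pl.a * p1[0] + pl.b * p1[1] + pl.c * p1[2];
    return pl;
}

// Point-to-plane distance of equation (4), e.g. head height above the floor.
double distanceToPlane(const Plane& pl, const Vec3& p) {
    return std::fabs(pl.a * p[0] + pl.b * p[1] + pl.c * p[2] - pl.d)
         / std::sqrt(pl.a * pl.a + pl.b * pl.b + pl.c * pl.c);
}
```

With the floor spanning the x-z plane, a head at height 1.6 above it yields a distance of exactly 1.6, independent of its x and z coordinates.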

3.4 Fall Detection Process

The ratio of the height to the width of the rectangle surrounding the person's silhouette has been widely used to discriminate between fall and non-fall activities [7]. In this work we use the person's height from the floor to detect a lying-down pose.

When the head's distance from the ground, Dh, falls below a certain threshold Tbf, either a falling or a lying-down event is detected.

The distance of the human centroid changes rapidly and smoothly during falls. To distinguish between the two events (falling and lying down), we analyze the movement of the center of mass by gathering its position at every frame and generating a trajectory csig from the start of the lying-down pose [19]. Using the physical principle that characterizes free fall, given by formula (5), we generate a trajectory freef:

z(t) = z0 - a(t - t0)^2    (5)

where

z0: ordinate of the center of mass at the beginning of the lying-down pose;

a: the acceleration;

t0: the beginning time of the lying-down pose.

In order to compare the two trajectories csig and freef, we use the covariance, calculated as:

cov(x, y) = (1/N) Σ_{i=1}^{N} (x_i - x̄)(y_i - ȳ)    (6)

where

x_i, y_i: the values of csig and freef at frame i;

x̄, ȳ: the averages of the variables x and y.

The covariance of two variables measures how much one variable increases when the other increases. If the covariance is greater than a threshold Tf, the fall is validated. Fig. 5 shows the free-fall trajectory and the center-of-mass trajectory of a falling person, which descend together and form nearly the same curve, whereas Fig. 6 shows the two trajectories for the same person lying down, which diverge.

Figure 5: Free fall trajectory and centroid trajectory of falling person.

Figure 6: Free fall trajectory and centroid trajectory of a lying down person.
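The validation step built on equations (5) and (6) can be sketched as follows; the frame rate, initial height, acceleration value, and the lying-down trajectory used for comparison in the usage example are illustrative, not the paper's measured values.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Free-fall model of equation (5): z(t) = z0 - a*(t - t0)^2.
double freeFall(double z0, double a, double t0, double t) {
    return z0 - a * (t - t0) * (t - t0);
}

// Sample covariance of equation (6) between two equally long trajectories.
double covariance(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n;  my /= n;
    double c = 0.0;
    for (std::size_t i = 0; i < n; ++i) c += (x[i] - mx) * (y[i] - my);
    return c / n;
}

// A fall is validated when cov(csig, freef) exceeds a threshold Tf.
bool validateFall(const std::vector<double>& csig,
                  const std::vector<double>& freef, double Tf) {
    return covariance(csig, freef) > Tf;
}
```

A centroid trajectory that tracks the generated free-fall curve yields a large positive covariance, while a slow controlled descent (lying down) yields a much smaller one, which is what the threshold Tf separates.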

Fig. 7 illustrates an example of fall detection obtained for person 9 of the SDUFall dataset [20] falling down.

Figure 7: Fall detection.

4 EXPERIMENTAL RESULTS

4.1 Datasets

To test our method, we use the SDUFall dataset available at [20] and the UR Fall dataset [21]. The SDUFall dataset contains several actions of 20 persons, men and women. The data were recorded with a Kinect installed at a height of 1.5 m; sitting down, lying down, bending, squatting, and falling actions were each performed 10 times under different conditions, such as carrying or not carrying an object, turning the light on or off, and changing direction towards the camera. These conditions were chosen randomly. In total, there are 1200 depth videos with a frame size of 320×240, recorded at 30 fps in AVI format.

The UR Fall dataset contains 70 sequences (30 falls + 40 activities of daily living).

4.2 Results and Discussion

All experiments were conducted on a PC with an Intel(R) Core i5 at 1.80 GHz. The algorithm was implemented in C++ using the OpenCV library.

We used the following parameters to evaluate the proposed fall detection method [22]:

TP: true positives, the number of fall events detected as falls.

TN: true negatives, the number of non-fall events detected as non-fall events.

FP: false positives, the number of non-fall events detected as fall events.

FN: false negatives, the number of fall events detected as non-fall events.

Sensitivity: the capacity of the algorithm to detect fall events.

Se = TP / (TP + FN)    (7)

Specificity: the capacity of the algorithm to detect non-fall events.

Sp = TN / (TN + FP)    (8)

Accuracy: how many true results were correctly detected among all measurements.

A = (TP + TN) / (TP + TN + FP + FN)    (9)
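Equations (7)-(9) translate directly into code. As a worked check, the counts reported in Table 1 (TP = 10, FN = 0, FP = 2, TN = 8) give Se = 100%, Sp = 80%, and A = 90%:

```cpp
#include <cassert>
#include <cmath>

// Evaluation metrics of equations (7)-(9), returned as percentages.
double sensitivity(int tp, int fn) { return 100.0 * tp / (tp + fn); }
double specificity(int tn, int fp) { return 100.0 * tn / (tn + fp); }
double accuracy(int tp, int tn, int fp, int fn) {
    return 100.0 * (tp + tn) / (tp + tn + fp + fn);
}
```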

As we can observe in Tables 1, 2, and 4, the true-positive rate is 100%. The results show the performance of our method in detecting fall alarms and its ability to discriminate between fall and lying-down events. There were some false negatives due to errors in head tracking, and some false positives occurred when a person lay down abruptly; other detection errors were due to errors in background subtraction.

Table 1. Experiment results for person 2 (SDUFall)

Event | Falling | Lying
TP    | 10      | -
FN    | -       | 0
FP    | 2       | -
TN    | -       | 8

Table 2. Experiment results for person 9 (SDUFall)

Event | Falling | Lying
TP    | 10      | -
FN    | -       | 0
FP    | 0       | -
TN    | -       | 10

Table 3. Experiment results for person 11 (SDUFall)

Event | Falling | Lying
TP    | 9       | -
FN    | -       | 1
FP    | 0       | -
TN    | -       | 10

Table 4. Experiment results for person 16 (SDUFall)

Event | Falling | Lying
TP    | 10      | -
FN    | -       | 0
FP    | 0       | -
TN    | -       | 10

Fig. 8 and Fig. 9 show the results obtained with all the events in the SDUFall dataset [20] and the UR Fall dataset [21], respectively.

Figure 8: Results with different events (SDUFall).

Figure 9: Results with different events (URFall).

The performance of our method is shown in Table 5 and Table 6.

Table 5. Fall detection performance for SDUFall

Accuracy (%) | Specificity (%) | Sensitivity (%)
93.25        | 94.7            | 86

Table 6. Fall detection performance for URFall

Accuracy (%) | Specificity (%) | Sensitivity (%)
78.68        | 80              | 76.92

The results obtained with the UR Fall dataset are not as good as with SDUFall, due to errors encountered in background subtraction in most cases, in addition to errors in head tracking.

To evaluate our method, we compared its results with three other works that used the SDUFall dataset. The approaches of [23] and [14] are based on pose classification to detect falls. In [23] the accuracy achieved was 88.83%, whereas in [14] the accuracy was 86.83%.

In [24] the method is based on the human head position combined with the center-of-mass velocity in each frame to detect falls; the accuracy achieved was 92.98%. Table 7 summarizes the results.

Table 7. Comparison with different approaches.

Method         | Specificity (%) | Sensitivity (%) | Accuracy (%)
Method of [23] | -               | -               | 88.83
Method of [14] | 77.14           | 91.15           | 86.83
Method of [24] | 93.52           | 90.76           | 92.98
Our method     | 94.7            | 86              | 93.25

5 CONCLUSION

In this paper we have presented a novel fall detection method using the Kinect sensor. The method combines shape features and centroid movement. The head is tracked using a particle filter; the centroid is detected in each frame, and its movement is analyzed using the mechanical principle characterizing free fall, which allows us to distinguish between lying-down events and fall events. We showed experimentally that our method is efficient and can detect falls accurately compared with some state-of-the-art methods using the same dataset, achieving an accuracy of up to 93.25%. In future work, we will address more complicated behaviors with additional scenarios, such as backward falls and falls from a chair; to this end, we plan to build our own dataset.

REFERENCES

[1] World Health Organization, Department of Ageing and Life Course, 2008. WHO global report on falls prevention in older age. World Health Organization, Geneva, Switzerland.

[2] Stevens, J. A., Rudd, R. A., 2014. Circumstances and contributing causes of fall deaths among persons aged 65 and older: United States, 2010. Journal of the American Geriatrics Society, vol. 62, no. 3, pp. 470-475.

[3] Friedman, S. M., Munoz, B., West, S. K., Rubin, G. S., Fried, L. P., 2002. Falls and fear of falling: which comes first? A longitudinal prediction model suggests strategies for primary and secondary prevention. J Am Geriatr Soc, 50(8):1329-35.

[4] Alwan, M., Rajendran, P. J., Kell, S., Mack, D., Dalal, S., Wolfe, M., Felder, R., 2006. A smart and passive floor-vibration based fall detector for elderly. In: IEEE International Conference on Information & Communication Technologies (ICITA), pp. 1003-1007.

[5] Juan, C., Xiang, C., Minfen, S., 2013. A framework for daily activity monitoring and fall detection based on surface electromyography and accelerometer signals. IEEE J. Biomed. Health Inform., pp. 38-45.

[6] Sannino, G., De Falco, I., De Pietro, G., 2015. A supervised approach to automatically extract a set of rules to support fall detection in an mHealth system. Applied Soft Computing, vol. 34, pp. 205-216. doi: 10.1016/j.asoc.2015.04.060.