Jitter Reduction in Eye Gaze Tracking System and Conception of a Metric for Performance Evaluation

Anaelis Sesin, Melvin Ayala, Mercedes Cabrerizo, and Malek Adjouadi

Department of Electrical & Computer Engineering

Florida International University

10555 W. Flagler Street, Miami, FL 33174

U.S.A.

Abstract: - This study focuses on the design of an integrated, real-time assistive system as an alternate human computer interface (HCI) that can be used by individuals with severe motor disabilities. This interface is inherently integrated in terms of both modalities of use and hardware-software assimilation. The integrated aspect of the design is based on the use of a remote eye-gaze tracking (EGT) system to obtain eye coordinates, which are sent to the computer interface, normalized into mouse coordinates according to the current monitor resolution, and passed through a trained neural network to reduce the ubiquitous jitter of the mouse cursor and the calibration errors due to eye movement. The novelty of this work is that it both simplifies and optimizes the eye-tracking technique to achieve near real-time, user-friendly interaction even with the more demanding remote eye-gaze model, which frees the user from any intrusion. This interface also responds to the main concerns of: (a) overcoming the unavoidable calibration errors, (b) reducing the persistent cursor movement jitter, (c) determining a practical solution to the mouse click operations, and (d) developing effective means for performance evaluation.

Key-Words: - Eye gaze tracking, human-computer interaction, motor disability.

1 Introduction

Universal Access is a concept whose vision is to empower persons with disabilities to access all technological resources, so that all persons, able-bodied or disabled, have the same potential to fully harness the power of computing. Although notable advances have been made in the past decade, even with present scientific and technological breakthroughs, enabling persons with severe motor disability to interact effectively with a computer remains a challenging endeavor.

The objective of this study was thus to seek an effective HCI that allows individuals with severe motor disabilities to interact with a computer using only eye movement, with options for practical web browsing and editing. This can be achieved by using an eye-gaze tracking (EGT) system to obtain eye coordinates and creating an interface that normalizes those coordinates into mouse cursor coordinates.

To date, most of the work published in the field deals with improving and simplifying the estimation of eye gaze through mathematical methods based on the geometry of the cornea. Only a limited number of artificial neural network (ANN) applications can be found in the area of face pose tracking, with the purpose of redirecting the camera to the reference eye in case of head movement during working sessions, as exemplified in [1, 2]. Studies in EGT jitter and calibration error reduction by means of ANNs constitute a relatively new research endeavor that has practical merit, as will be demonstrated by the results obtained in this study.

In developing the prototype design, several important tasks have been performed to ensure accurate cursor displacements:

(1) Established a repeatable process in order to obtain a traceable eye image, which includes adjusting the values for the pupil and corneal reflection (C.R.) thresholds;

(2) Accomplished calibration of raw data and computation of the point of regard using the pupil and corneal reflection differences at five different points, one point at the center of the field of view and one point for each of the four quadrants of the visual scene;

(3) Enhanced the data transport mechanism between the eye-gaze system and the host computer, which involved developing a data recovery technique;

(4) Translated the data into cursor movement using three steps: first, the data is converted into monitor coordinates; then to cursor coordinates; and finally calibrated using a trained neural network in order to eliminate the jittering of the mouse pointer;

(5) Experimented with applications involving web browsing, editing, and using email with the assistance of an on-screen keyboard that is included in the software interface, which is explained in detail in [3].

2 Eye Gaze Tracking System

2.1 EGT Components

The EGT system as proposed consists of a CPU, an eye monitor, a stimulus monitor, an eye imaging camera, and an infrared light source (Fig. 1).

Fig. 1: Eye gaze tracking system components

The CPU contains three eye-cards (ISCAN RK-726PCI, RK-620PC, RK-464) as well as the raw eye movement data acquisition software [4].

The RK-726PCI Pupil/Corneal Reflection Tracking System is a one-half-slot real-time image processor that tracks the center of a subject's pupil and the reflection from the corneal surface, and measures the pupil size. The RK-620PC Auto-calibration System is a three-quarter-slot ISA bus real-time computation and display unit used to calculate a subject's point of regard with respect to the viewed scene, using the raw eye position data generated by the RK-726PCI.

The RK-464 Remote Eye Imaging System allows the operator to control the direction, focus, magnification, and iris of the eye imaging camera from the control console.

The raw eye movement data acquisition software allows the user to control the data collection process. It allows the eye tracking system to be adjusted for any subject. Incoming eye movement and auxiliary data can be seen graphically in real-time, calibrated, and recorded. Calibrated or raw eye movement and auxiliary data can be output in real-time through the serial port.

2.2 Eye Tracking Technique

The relationship between the corneal and pupil reflections is the foundation of the technique used to determine the movement of the eye [5]. The technique is based on shining infrared light into the eye and measuring the amount of light that is reflected, as well as the shape and position of the pupil with respect to the head. Infrared light is used to avoid distracting the user and to prevent interference from other light sources such as lamps.

When infrared light is shone into the subject's eye, some light is reflected by the boundaries between the lens and the cornea. These reflections are called Purkinje images [6, 7]. The glint (the Purkinje image) and the reflection produced by the retina are recorded by the infrared-sensitive camera as a bright spot and a less bright disc, respectively. When the eye pans horizontally or vertically, the relative position of the glint and the center of the bright-eye region changes accordingly, and the direction of gaze can be calculated from this relative position.
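The point-of-regard computation itself is performed by the ISCAN hardware described in Section 2.1, so the exact mapping is not spelled out in this paper. Purely as an illustration of the pupil/glint principle, the sketch below (hypothetical function names; assuming a simple affine calibration fitted over the five calibration points mentioned in Section 1) maps pupil-to-glint offset vectors to screen coordinates:

```python
import numpy as np

def fit_gaze_mapping(pupil_glint_offsets, screen_points):
    """Fit an affine map from pupil-glint offset vectors to screen coordinates.

    pupil_glint_offsets: (N, 2) array of (dx, dy) = pupil center - glint.
    screen_points:       (N, 2) array of known calibration targets, e.g. the
                         screen center plus one point per quadrant (N = 5).
    """
    n = len(pupil_glint_offsets)
    A = np.hstack([pupil_glint_offsets, np.ones((n, 1))])  # append bias column
    coeffs, *_ = np.linalg.lstsq(A, screen_points, rcond=None)
    return coeffs  # (3, 2): maps [dx, dy, 1] -> [x_screen, y_screen]

def point_of_regard(offset, coeffs):
    """Estimate the gaze point on screen for one pupil-glint offset."""
    return np.array([offset[0], offset[1], 1.0]) @ coeffs
```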

3 Algorithm for Reducing Jittering and Calibration Errors

The screen coordinates received as input are sporadic due to eye movement; if used directly to move the mouse pointer, the cursor would jitter all over the screen.

The main idea of using an ANN to reduce the jitter of the mouse pointer was to replace the one-to-one relationship between the screen coordinates and the position of the mouse pointer with an n-to-1 mapping learned by a trained ANN.

3.1 Algorithm Description

The mouse pointer trajectory (Fig. 2) is subdivided into small sub-trajectories by using time frames Δt. During each time frame, the x and y screen coordinates generated by the Mouse Coordinates Generator (MCG) module are used as inputs to calculate the actual position of the mouse pointer.

Fig. 2: Trajectory is fragmented using a time frame Δt. PStart and PEnd are the initial and final positions of the mouse pointer, respectively.

Unlike with a mouse, it is relatively difficult to control eye position consciously and precisely at all times [9]. Therefore, by subdividing the movement of the mouse into smaller sections, the trajectory of the mouse can be described linearly, which makes it easier to predict the actual position of the pointer.

A decisive step was the definition of the time frame size, which determines the number of coordinates averaged and sent to the application in use.

The EGT generates data at a frequency of 60 Hz. However, to be able to use more than one input to generate one output, the sampling frequency had to be less than 60 Hz. In deciding on a suitable frequency, two requirements had to be met: (1) still guarantee a smooth perception of mouse pointer movement, and (2) have sufficient input points to reduce them to an acceptable averaged position.

To keep the mouse pointer movement smooth, the sampling frequency should not be much less than 24 Hz, the physiological frequency of the human eye's perception of movement. A natural choice would have been a sampling rate of 24 Hz. However, this would yield only 60/24 = 2.5 inputs per output, too few points to suppress the jittering. After testing different sampling frequencies, the best results were obtained at 10 Hz. At this frequency, the mouse pointer still shows a relatively smooth trajectory, and the size of the sampling window is 60/10 = 6 samples, an acceptable number of reference points for estimating the correct position of the mouse pointer.
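As a minimal sketch of this windowing scheme (assuming the 60 Hz stream is available as a sequence of (x, y) pairs; names are illustrative):

```python
WINDOW_SIZE = 6  # 60 Hz input / 10 Hz output = 6 samples per time frame

def time_frames(samples, window=WINDOW_SIZE):
    """Split the 60 Hz coordinate stream into non-overlapping sub-trajectories.

    Each frame spans one time frame delta-t and is later reduced to a single
    mouse pointer position (the n-to-1 mapping described in Section 3).
    """
    for i in range(0, len(samples) - window + 1, window):
        yield samples[i:i + window]
```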

3.2 Design of the Artificial Neural Network

A decisive parameter that needed to be determined was the number of input units used to build the ANN. The criteria taken into consideration were the number of input points and the number of features used per point. In this case, the network takes only two features per point, the x and y coordinates of the 6 mouse pointer locations. Consequently, the ANN uses 12 input units.

Finally, the last two required parameters were the numbers of hidden and output units. After testing the design with different numbers of hidden units, it was determined that the best results were obtained with 20 hidden units. The training involved 5-fold cross validation. Since the outputs of the network are x and y coordinates, only two output units are needed (xout and yout).
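The paper does not list the network implementation; the stand-in sketch below (assuming scikit-learn's MLPRegressor in place of the authors' backpropagation code, with placeholder training data) reproduces the 12-20-2 topology and the 5-fold cross validation described above:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

# 12 input units (x and y of six pointer locations), 20 hidden units,
# 2 output units (xout, yout).
net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000)

# Placeholder training table: 1200 patterns of 12 normalized coordinates
# mapped to 2 normalized target coordinates (see Section 3.4).
X = np.random.rand(1200, 12)
Y = np.random.rand(1200, 2)

scores = cross_val_score(net, X, Y, cv=5)  # 5-fold cross validation
net.fit(X, Y)                              # final pass on the full table
```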

3.3 Training Process

The purpose of training the ANN was to learn how the EGT inputs statistically relate to the jittering and calibration error during a working session with the stimulus computer, in order to cancel or reduce these effects at run time.

The training process was conceptually divided into three steps: (1) data is read from the EGT system, (2) converted to screen coordinates, and then (3) used as input to the ANN for training (Fig. 3).

For the preliminary training stage, a graphical user interface (GUI) application [8] was developed in which a small button is used as a moving target, traveling across the screen to cover as much screen area as possible. The subject is asked to follow the button with the eyes. As the button moves during each time frame Δt, the EGT data is translated to screen coordinates, which are taken as the input of the network, while the actual position of the button is considered the target. This process is executed for a few minutes to allow the control application to collect training data. These values constitute the training table and are automatically passed to the training module, which learns the mapping between the inputs and the jittering on screen, as well as the offset due to the unavoidable calibration error. The training stage implements a backpropagation algorithm to compute the weights and biases of the network that are applied to cancel or reduce the aforementioned negative effects. Experiments showed that 1 to 2 minutes of data is sufficient to train the network to within a 5% error.

Fig. 3: Employed training logic

3.4 Training Pattern Extraction

The data collection process lasted two minutes for each subject. With the EGT generating data at 60 Hz, the total number of samples collected per subject is

$$n_{samples} = 60\ \text{samples/s} \times 120\ \text{s} = 7200 \qquad (1)$$

Based on the collected data, the training features are extracted and a training table is generated.

For a window size of 6 samples, 1200 (7200 / 6) training patterns were extracted, as illustrated in Fig. 4 (shown at the end of this article). The scrolling window used for data collection does not overlap, which simplifies the programming code, and only one button position is collected in each sampling window. During the pattern extraction procedure, the screen coordinates are collected for each time frame and used as training input patterns. The average location of the moving button, which the subject is asked to follow with his/her eyes, is also collected and used as the training target. For convenience, the coordinates fed to the network are divided by the screen size (in terms of width and height).
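A sketch of this extraction step (hypothetical helper; assuming the raw screen coordinates and the button positions are stored as arrays sampled at 60 Hz):

```python
import numpy as np

def build_training_table(coords, button_pos, screen_w, screen_h, window=6):
    """Build (input, target) training patterns from one recording session.

    coords:     (N, 2) raw screen coordinates from the MCG module at 60 Hz.
    button_pos: (N, 2) position of the moving target button per sample.
    Non-overlapping windows of 6 samples yield N/6 patterns, and all
    coordinates are normalized by the screen width and height.
    """
    X, Y = [], []
    scale = np.array([screen_w, screen_h], dtype=float)
    for i in range(0, len(coords) - window + 1, window):
        X.append((coords[i:i + window] / scale).ravel())  # 12 input values
        # Average button location over the window serves as the target.
        Y.append(button_pos[i:i + window].mean(axis=0) / scale)
    return np.array(X), np.array(Y)
```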

3.5 Usage after Training

Once the ANN is trained, the screen coordinates generated by the MCG module are no longer used directly to move the mouse pointer. Instead, they are passed to the Mouse Control module (Fig. 5), which shrinks the coordinates and feeds them to the trained ANN; the network in turn outputs a percentage location, which is then rescaled to obtain the expected mouse position.
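At run time the same normalization is applied in reverse. A sketch (illustrative names; `net` is the trained network from Section 3.2):

```python
import numpy as np

def smoothed_pointer_position(window_coords, net, screen_w, screen_h):
    """Map one 6-sample window of raw screen coordinates to a pointer position.

    The coordinates are shrunk to fractions of the screen size, fed to the
    trained ANN, and the network output is rescaled back to pixels.
    """
    scale = np.array([screen_w, screen_h], dtype=float)
    x = (np.asarray(window_coords) / scale).ravel().reshape(1, -1)  # 12 inputs
    out = net.predict(x)[0]  # normalized (x, y) location in [0, 1]
    return out * scale       # expected mouse position in pixels
```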

Fig. 5: Application procedure with trained ANN logic

4 Simulation of the Mouse Click Event

In addition to augmenting the quality of the mouse pointer movement, an investigation was conducted to implement a left click on desired screen areas, such as buttons, menus, or links.

When the subject closes the left eye, the stimulus computer reads zeros. Thus, by counting the number of consecutive zeros arriving at the stimulus computer, the application can differentiate between an involuntary blink and a deliberate left-click command.

Repeated experiments revealed that the number of consecutive zeros needed to trigger a voluntary left-click was 20.
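A sketch of this zero-run detector (class and method names are assumptions; the threshold of 20 samples, roughly a third of a second at 60 Hz, is taken from the experiments above):

```python
CLICK_ZERO_RUN = 20  # consecutive zero samples needed for a voluntary click

class ClickDetector:
    """Simulate a left click after a sufficiently long run of zero samples."""

    def __init__(self):
        self.zero_count = 0

    def feed(self, x, y):
        """Return True when a left-click event should be triggered."""
        if x == 0 and y == 0:  # left eye closed: the EGT reads zeros
            self.zero_count += 1
            if self.zero_count == CLICK_ZERO_RUN:
                return True    # long enough to rule out an involuntary blink
        else:
            self.zero_count = 0
        return False
```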

5 Discussion

A practical experiment was thus conducted in order to determine the prospects for jitter and calibration error reduction, using the same application as in the preliminary training stage (Fig. 6).

Fig. 6: Graphical evaluation application

The idea underlying the jittering evaluation proposed in this study is depicted in Fig. 7. For each set of 6 consecutive mouse pointer locations, the relative percentage relation between the sum of the consecutive-point distances $d_{i,i+1}$ and the gap $d_{16}$ between the initial and final points is computed as the degree of jittering in that time frame.

The jittering can be regarded as a degree or percentage of deviation from the shortest path of movement during a time frame. In this work, equation (2) was used to quantify the jittering (Jitt). It is noted that its value decreases to zero when the mouse moves along a straight line.

$$Jitt = \frac{\sum_{i=1}^{5} d_{i,i+1} - d_{16}}{\sum_{i=1}^{5} d_{i,i+1}} \times 100\% \qquad (2)$$

Fig. 7: The jittering degree is computed as a percentage relation between the trajectory length (sum of the individual distances between consecutive points) and the distance between starting and ending point for each six-point time frame.

In order to estimate the calibration error, the Euclidean distance $D_e$ between the centroid $(x_c, y_c)$ of the mouse pointer locations and the target button location $(x_b, y_b)$ was calculated for each time frame in each data set (equation 3). The centroid was determined using the six points that fall within the time frame Δt, and the target was the position of the moving button at that time.

$$D_e = \sqrt{(x_c - x_b)^2 + (y_c - y_b)^2} \qquad (3)$$
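Both metrics are straightforward to compute per time frame. A sketch (assuming each window is a (6, 2) array of pixel coordinates; equation (2) as reconstructed above):

```python
import numpy as np

def jitter_degree(points):
    """Equation (2): percentage deviation of the path from the straight line."""
    points = np.asarray(points, dtype=float)
    path = np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))
    gap = np.linalg.norm(points[-1] - points[0])
    return (path - gap) / path * 100.0  # 0% for a perfectly straight path

def calibration_error(points, button):
    """Equation (3): distance between the window centroid and the target."""
    centroid = np.asarray(points, dtype=float).mean(axis=0)
    return np.linalg.norm(centroid - np.asarray(button, dtype=float))
```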

The experiments were conducted using six subjects. Each subject went through a three-step process. First, data was collected with the evaluation application (Fig. 6) without using the ANN. Then, the neural network was trained using the collected raw data. Finally, data was collected using the ANN. The results from all subjects were averaged and processed using equations (2) and (3) (Table 1).

Table 1: Jittering and calibration error reduction

                      Without ANN (a)   With ANN (b)   Ratio of Improvement ((a - b) / a)
Jittering degree      39.31%            32.94%         16.20%
Calibration error     62 pixels         51 pixels      17.74%

The results reveal more than a 15% reduction in both jittering and calibration error when the EGT is supported with ANN intervention, which represents a substantial improvement in the use of eye gaze to control the mouse pointer. When experimenting with the control application and web browsing, the mouse cursor was found to be more stable and easier to control, since the trajectory was significantly smoother and the user could reach the target and click on it with an improved degree of accuracy.

6 Conclusion

The challenge of this study was to design an integrated, real-time assistive system as an alternate human computer interface that allows individuals with severe motor disabilities to use most Windows applications. The objectives were (a) to acquire a traceable eye image, (b) to calibrate the raw eye data, (c) to reconstruct the data from the EGT, (d) to use the eye output stream in place of the mouse, and (e) to experiment with different Windows applications using different subjects.

The main advantage of the EGT-based interface is that it responds instantly to broad displacements of the user's eye gaze on the computer screen. Eye gaze interaction gives the subjective feeling of a highly responsive system, almost as though the system is executing the user's intentions before the user actually expresses them.