HIGH COMPRESSION OF FACES IN VIDEO SEQUENCES FOR MULTIMEDIA APPLICATIONS

MAHARAJ VIJAYARAM GAJAPATI RAJ COLLEGE OF ENGINEERING


ABSTRACT:

This paper presents a proposal for a novel video coding scheme intended to encode human faces in video sequences at very high compression using a recognition and reconstruction approach. The scheme is based on the well-known eigenspace concepts used in face recognition systems, which have been modified to cope with the video compression application. An adaptive mechanism that updates the eigenspace is presented, which improves the overall approach. Promising preliminary results are reported.

1. INTRODUCTION

Image and video coding are among the most important topics in multimedia processing and communications. During the last thirty years we have witnessed a tremendous explosion of research and applications in the visual communications field. However, and in spite of all this effort, there are some applications that still demand higher compression ratios than those provided by state-of-the-art technologies.

In particular, and due to its wide applicability, there is a need for novel compression schemes to encode faces present in video sequences. Although the new standards H.263+ and the synthetic part of MPEG-4, along with other model-based schemes, achieve high compression ratios for this particular application, we believe that further compression is still needed, among other reasons for mobile and video streaming environments.

It is in this context that we present a novel scheme to encode faces in video sequences based on an eigenspace approach. The eigenface concept has already been presented for still images in a face recognition framework. However, to the best of our knowledge, our approach is original and adapts the eigenspace to the video sequence to take into account the different poses, expressions and lighting conditions of the faces.

Section 2 introduces the topic of very high compression and the basic eigenspace concepts on which our scheme is based. Section 3 presents a fixed eigenspace approach, while Section 4 presents the basics of the adaptive scheme. Section 5 presents some results and, finally, Section 6 draws some conclusions.

2. IMAGE CODING THROUGH RECOGNITION

2.1 Introduction

Many proposals have been made in recent years for image and video coding. In particular, H.263+ is mainly intended for robust compression at low to high data rates and is based on a block-based redundancy removal scheme. In addition, MPEG-4 combines frame-based and segmentation-based approaches along with model-based video coding in the facial animation part of the standard, which allows efficient coding as well as content access and manipulation. It can be said that H.263+ and MPEG-4 represent the state of the art in video coding.

Our proposal relies on fourth-generation video coding techniques based on recognition and reconstruction. Recognition and reconstruction approaches rely on an understanding of the content. In particular, if it is known that an image contains a face, a house and a car, recognition techniques to identify the content can be developed as a step prior to coding. Once the content is recognized, content-based coding techniques can be applied to encode each specific object. MPEG-4 provides a partial answer to this approach by using specific techniques to encode faces and to animate them.

2.2 Face coding using a Principal Component Analysis approach

Let us simplify the visual content by assuming that we are interested in the coding of faces in a video sequence. Let us also assume that automatic tools to detect a face in a video sequence are available. Experiments then show that a face can be well represented by very few coefficients, obtained by projecting the face onto a previously defined eigenspace. The face image can be reconstructed (decoded), up to a certain quality, by coding only these few coefficients. Face coding can be done using a fixed eigenspace, which decreases the quality of the reconstructed image, or by adaptively changing the eigenspace according to the changes experienced by the face as it appears in the video sequence. Prior to coding, the corresponding face is detected and separated from the background using some segmentation technique.

Our coding technique is based on a face recognition approach, which has been modified to cope with the coding application. It assumes that a set of training images for each person contained in the video sequence is previously known. Once these training images have been found (usually coming from an image database or from a video sequence), a Principal Component Analysis (PCA) is performed for each individual using the corresponding training set of that person. This means that a PCA decomposition is available for every face to be coded, and the PCA is done prior to the encoding process.

After the PCA, the face to be coded is projected and reconstructed using each set of eigenvectors (called eigenfaces) obtained in the PCA stage. If the reconstruction error using a specific set of eigenfaces is below a threshold, the face is said to match the training set which generated this set of eigenfaces. In this case the recognized face is coded by quantizing only the most important coefficients used in the reconstruction. The size of the coded image has to be previously normalized for PCA purposes and then denormalized at the decoder.
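To make the coding principle concrete, the following is a minimal sketch in Python/NumPy, assuming the faces have already been detected, segmented from the background and normalized to a common size. The function names are illustrative and the 16-coefficient default simply mirrors the experiment in Section 3; this is not the exact implementation used in the paper.

```python
import numpy as np

def train_eigenspace(training_faces, n_coeffs=16):
    """PCA over a set of normalized face images, each flattened to a vector.

    training_faces: array of shape (N, H*W), one row per training image.
    Returns the mean face and the n_coeffs leading eigenfaces.
    """
    mean_face = training_faces.mean(axis=0)
    centered = training_faces - mean_face
    # The right singular vectors of the centered data are the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_coeffs]            # eigenfaces: (n_coeffs, H*W)

def code_face(face, mean_face, eigenfaces):
    """Encode a face as its projection coefficients on the eigenspace."""
    return eigenfaces @ (face - mean_face)     # (n_coeffs,) coefficients

def decode_face(coeffs, mean_face, eigenfaces):
    """Reconstruct the face from the (quantized) coefficients."""
    return mean_face + coeffs @ eigenfaces

def reconstruction_error(face, mean_face, eigenfaces):
    """Mean squared reconstruction error, used to decide whether the face
    matches the training set that generated this eigenspace."""
    rec = decode_face(code_face(face, mean_face, eigenfaces), mean_face, eigenfaces)
    return np.mean((face - rec) ** 2)
```

Only the quantized coefficients need to be transmitted; the mean face and the eigenfaces are assumed to be available at both encoder and decoder as part of the training stage.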

3. FIXED EIGENSPACE APPROACH

In order to check the validity of the eigenspace approach for image coding, results using a fixed eigenspace are presented first. These results are useful to point out the main drawbacks of the eigenspace approach and to fully understand the adaptive eigenspace proposed in the next section.

Figure 1 shows faces of the original sequence to be coded and the corresponding coded images. The original sequence is 8 bits, 68x105 pixels at 12.5 frames/s, which implies 714 kbits/s. The number of training images has been set to 30 and 16 coefficients per image have been used to encode each frame. The encoded sequence has an average bit-rate of 0.62 kbits/s and an average PSNR of 29.6 dB. Other similar sequences present very similar performance. A dynamic coefficient selection scheme could have been designed, but our purpose here is just to check the validity of the eigenspace approach. A first-order optimized DPCM encoding scheme has been designed to encode the projection coefficients.
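For reference, the figures above can be checked with a little arithmetic; the bits-per-coefficient value below is only inferred, since the exact quantizer design is not detailed in the paper.

```python
# Raw bit-rate of the source: 68x105 pixels, 8 bits/pixel, 12.5 frames/s.
raw_bps = 68 * 105 * 8 * 12.5        # 714,000 bit/s = 714 kbits/s

# Coded representation: 16 projection coefficients per frame.
coeffs_per_s = 16 * 12.5             # 200 coefficients/s
coded_bps = 620.0                    # reported average bit-rate, 0.62 kbits/s

bits_per_coeff = coded_bps / coeffs_per_s    # about 3.1 bits/coefficient after DPCM
compression_ratio = raw_bps / coded_bps      # roughly 1150:1

print(raw_bps, bits_per_coeff, compression_ratio)
```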

All the images have been coded intraframe and no motion compensation has been used. This approach is mainly a still image coder (except for the DPCM encoding of the coefficients), although we are interested in using video sequences to check the adaptive eigenspace presented in the next section.

When viewed in video mode, the coded sequence presents very acceptable visual quality. In order to check how the quality of the coded images varies over the sequence, Figure 2 shows the PSNR of the coded sequence over 500 frames.

An important decrease in quality can be appreciated in some frames of the sequence. This corresponds to a change of expression in the face. Any important change in the expression of the face that is not included in the training images will lead the system to a poor quality reconstruction. The next section presents an adaptive eigenspace approach to cope with these situations.

4. ADAPTIVE EIGENSPACE APPROACH

4.1 Introduction

In this section we propose an eigen coding approach that adapts itself to the face content appearing in the video sequence. Notice that a related technique has been presented previously, although our approach is significantly different.

The initial encoding scheme follows that of the fixed eigenspace explained in Section 3. In that scheme, any important change in the expression of the face will lead to poor performance. In order to overcome this problem, we have designed a fall-back mode system, which consists of a quality-of-reconstruction evaluation block followed by an upgrade mechanism for the coder and decoder eigenspace databases.

If the error obtained from the reconstruction process is low, only the coefficients of the reconstruction are sent to the decoder (this corresponds to the fixed eigenspace approach). If an important change in the expression of the face leads to a high error, that face is coded using one of the available off-the-shelf coding techniques, such as JPEG. In addition, an upgrade mechanism has been implemented, and the corresponding eigenspace is recalculated. In this way the faces of that person contained in the following frames will be better represented using the upgraded eigenspace. This provides a basic approach for updating the eigenspace. The next section provides details of the upgrade mechanism.
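The decision logic of the fall-back mode can be sketched as follows. This is only a sketch: `jpeg_encode` and `update_eigenspace` are placeholders for the off-the-shelf intra coder and for the database upgrade mechanism of Section 4.2, and the error threshold is left as a parameter.

```python
import numpy as np

def encode_frame(face, eigenspace, error_threshold, jpeg_encode, update_eigenspace):
    """One encoding step of the adaptive scheme (sketch).

    eigenspace: (mean_face, eigenfaces) as produced by a PCA training stage.
    jpeg_encode / update_eigenspace: placeholders for the intra coder and the
    eigenspace database upgrade described in Section 4.2.
    """
    mean_face, eigenfaces = eigenspace
    coeffs = eigenfaces @ (face - mean_face)
    reconstruction = mean_face + coeffs @ eigenfaces
    mse = np.mean((face - reconstruction) ** 2)

    if mse <= error_threshold:
        # Normal mode: only the projection coefficients are transmitted.
        return {"mode": "eigen", "payload": coeffs}

    # Fall-back mode: the face is sent with a conventional still-image coder
    # (JPEG in the paper) and both encoder and decoder upgrade their
    # eigenspace databases with this new face.
    bitstream = jpeg_encode(face)
    update_eigenspace(face)
    return {"mode": "intra", "payload": bitstream}
```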

4.2 Eigenspace database updating system

The updating PCA system proposed here works as follows. Let us assume the eigenfaces of a person have been obtained by calculating the PCA of N training images of this person. The encoder starts coding the faces of the video sequence by transmitting the coefficients of the projection of the original faces over the eigenspace. The system continuously evaluates the mean square error of the reconstruction. When the error rises above a specified limit, the eigenspace has to be updated at both the coder and the decoder.

A first updating approach can be designed by substituting the oldest face in the group with the face which is currently being coded. However, this system has an obvious problem: we are wasting the face that is replaced in the substitution process. It would be more useful to keep the information of the existing frames, as the images of a 'talking-head' sequence present short-term and long-term correlation. To illustrate this, let us assume we have a 10-face training database, formed mainly with the first frames of a sequence where the person has a happy expression. During the encoding process, if the person changes to a sad expression, the system will update the face database. But at the same time, if the oldest face is substituted, part of the database will be destroyed and the system will no longer be able to reconstruct good happy faces using only projection coefficients.

There are two solutions to this problem. The size of the training set can be increased without eliminating any face each time an update frame has to be sent (unless a maximum number of training images is fixed). However, as the training set grows, the eigenspace generation becomes more and more time-consuming, and the system will not be able to perform the necessary operations in a reasonable time. The other solution consists of a database of multiple low-dimension eigenspaces. The system starts using a single eigenspace as before. When the eigenspace has to be updated, the training set is duplicated, and the substitution process is performed over one of the resulting sets. A new PCA is then calculated over the new set and the system obtains two eigenspaces. The following frames are reconstructed with the two existing eigenspaces and the one with the minimum reconstruction error is selected. A maximum number of eigenspaces per person is fixed and, when it is reached, the updates are made by direct substitution of the oldest face of the selected eigenspace. This multiple-eigenspace approach is computationally less expensive. Firstly, because every time an update must be done it only has to calculate an eigenspace of low dimension (5 in our experiments), whereas the dimension of the single growing eigenspace can reach 50. Secondly, because the PCA is a much more computationally intensive process than the projection and reconstruction: calculating 10 reconstructions per frame is not as time-consuming as calculating a large eigenspace.
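A possible structure for the multiple low-dimension eigenspace database is sketched below. The class name, the eigenspace-count limit and the choice of which training set to duplicate are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

class EigenspaceDatabase:
    """Multiple low-dimension eigenspaces for one person (sketch of Section 4.2)."""

    def __init__(self, training_faces, n_coeffs=16, max_spaces=10):
        self.n_coeffs = n_coeffs
        self.max_spaces = max_spaces
        self.training_sets = [list(training_faces)]
        self.eigenspaces = [self._pca(training_faces)]

    def _pca(self, faces):
        faces = np.asarray(faces)
        mean = faces.mean(axis=0)
        _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
        return mean, vt[: self.n_coeffs]

    def best_reconstruction(self, face):
        """Reconstruct the face with every stored eigenspace, keep the best."""
        errors = []
        for mean, eig in self.eigenspaces:
            rec = mean + (eig @ (face - mean)) @ eig
            errors.append(np.mean((face - rec) ** 2))
        best = int(np.argmin(errors))
        return best, errors[best]

    def update(self, face):
        """Upgrade mechanism: duplicate the selected training set (while the
        eigenspace limit is not reached), replace its oldest face with the new
        one, and recompute the PCA of that set only."""
        idx, _ = self.best_reconstruction(face)
        if len(self.eigenspaces) < self.max_spaces:
            new_set = list(self.training_sets[idx])   # duplicate the set
            new_set.pop(0)                            # drop the oldest face
            new_set.append(face)
            self.training_sets.append(new_set)
            self.eigenspaces.append(self._pca(new_set))
        else:
            # Limit reached: direct substitution in the selected eigenspace.
            self.training_sets[idx].pop(0)
            self.training_sets[idx].append(face)
            self.eigenspaces[idx] = self._pca(self.training_sets[idx])
```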

5. RESULTS

Preliminary results are shown here. Figure 3 presents a PSNR comparison of coded sequences with and without eigenspace adaptation for the same sequence as above. An update occurs whenever the PSNR of the reconstruction falls below 24 dB. Other thresholds may be used, but we have found that this one provides good visual results for this application. For clarity, only frames 1-100 are shown. The corresponding bit-rate is 4.8 kbits/s. Notice that after the update, the PSNR of the decoded frames increases with respect to the static case, at the expense of a higher bit-rate. This increase in bit-rate corresponds to the JPEG images used in the update process. An image coding scheme other than JPEG could be used, which would improve the coding efficiency.

Figure 3. PSNR of coded sequences with and without updates.
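For completeness, the PSNR value driving the update decision can be computed with the standard definition for 8-bit images; this exact helper is an illustration, not code from the paper.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two 8-bit images. In the adaptive scheme an
    eigenspace update is triggered when this value falls below ~24 dB."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```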

As another example, Figure 4 shows some results for a sequence with moderate expression changes (see the mouth zone).

Figure 4. Above: original image; below: coded image at 3.93 kbits/s.

The first two frames have been coded using only the projection coefficients, while the third image has been coded using JPEG. The fourth and fifth images (coded only with their reconstruction coefficients) present a better visual quality than the first ones due to the update process. The total bit-rate is 3.93 kbits/s. Notice that the visual quality of the coded image can be improved by designing reconstruction error coding schemes suited to this kind of image. This would also provide scalable schemes.

6. CONCLUSIONS

An eigenspace approach for encoding moving faces has been presented. Very acceptable results below 1 kbit/s have been obtained for moderate changes of expression. An adaptive eigenspace scheme has also been designed to cope with more active sequences, providing bit-rates of around 4 kbits/s.

7. REFERENCES

[1] G. Côté, B. Erol, M. Gallant, and F. Kossentini, "H.263+: Video coding at low bit rates", IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 7, November 1998.

[2] ISO/IEC 14496-2:1999, "Information technology – Coding of audio-visual objects – Part 2: Visual", December 1999.

[3] P. Eisert and B. Girod, "Analyzing facial expressions for virtual videoconferencing", IEEE Computer Graphics and Applications, vol. 18, no. 5, pp. 70-78, 1998.

[4] B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 696-710, July 1997.

[5] H. Harashima, K. Aizawa, and T. Saito, "Model-based analysis synthesis coding of videotelephone images – conception and basic study of intelligent image coding", Transactions of the IEICE, vol. E72, no. 5, pp. 452-458, 1989.