THESIS PROPOSAL

- Pooja Vasant Agawane

Student I.D.: 1000-522-697

Date: August 20, 2007

IMPLEMENTATION AND EVALUATION OF

THE RESIDUAL COLOR TRANSFORM

OBJECTIVE:

The objective of the thesis is to implement and evaluate the performance of the Residual Color Transform (RCT) [9] applied to High Definition test sequences [19]. This thesis aims to compare the performance of the RCT with the lossless coding tools in the JM reference software [16].

MOTIVATION:

The 4:4:4 video sampling format is gaining a lot of attention due to its significance in professional applications. Both industry and academia are actively working to achieve better compression efficiency and higher coding gain in the RGB (red, green, and blue) color space. In contrast to typical consumer applications, high video quality is demanded in applications such as professional digital video recording or digital cinema / large-screen digital imagery. These applications require all three color components to be represented with identical spatial resolution. Moreover, for this kind of application, sample values in each color component of a video signal are expected to be captured and displayed with a precision of more than 8 bits. These specific characteristics pose new questions and new challenges, especially regarding the choice of an optimal color space representation. Typically, for both video capture and display purposes, the RGB color space representation can be considered the natural choice. From a coding point of view, however, the RGB domain is often not the optimum color space representation, mainly because for natural source material a significant amount of statistical dependency among the RGB components can usually be observed. Thus, in order to take advantage of these statistical properties, the use of a decorrelating transformation from the original RGB domain to some appropriate color space is often recommended [2].

DISADVANTAGES OF YCbCr AND INTRODUCTION OF YCgCo COLOR SPACE:

Typically, a video is captured and displayed using the RGB (Red, Green and Blue) color space. The disadvantages of encoding the video in the RGB domain are:

  • Color components in the RGB domain are highly correlated.
  • The response of the human visual system (HVS) is better matched to the luminance and chrominance components, rather than RGB. The HVS is very sensitive to the luminance information in the image and less sensitive to the chrominance components.

The YUV color space represents this luminance and chrominance information in a given RGB image. Hence the color conversion from the RGB domain to the YUV domain is performed for encoding. This conversion can be performed as follows [20]:

Y = 0.299R + 0.587G + 0.114B

U = − 0.147R − 0.289G + 0.436B

V = 0.615R − 0.515G − 0.100B

In the YUV domain, the chrominance samples can be subsampled, which leads to compression. The inverse transform from YUV to RGB is then performed for display.
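As a quick numerical check of the matrix above, the forward conversion can be sketched as follows (written in Python rather than MATLAB purely for illustration; the coefficients are taken directly from the equations above):

```python
# Forward RGB -> YUV conversion using the coefficients given above [20].
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

# A gray pixel (R = G = B) carries all its energy in the luminance:
y, u, v = rgb_to_yuv(128, 128, 128)  # u and v are (numerically) zero
```

For a gray input, the chrominance rows sum to zero, which is exactly the decorrelation property the conversion is designed to exploit.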

YCbCr is a family of color spaces. Y stands for luminance, Cb represents the blue chroma and Cr represents the red chroma. The conversion from RGB to YCbCr can be performed as follows [7]:

Y = KR·R + (1 − KR − KB)·G + KB·B

Cb = 0.5·(B − Y) / (1 − KB)

Cr = 0.5·(R − Y) / (1 − KR)

with, e.g., KR = 0.2126, KB = 0.0722.

There are two problems with this approach:

  • The samples are actually represented using integers. A rounding error is introduced in both the forward and inverse color transformations.
  • The above transformation was not originally designed for digital video compression. It makes a sub-optimal trade-off between the complexity of the transformation (with difficult-to-implement coefficient values such as 0.2126 and 0.0722) and coding efficiency.
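The first problem (rounding error) can be demonstrated with a short sketch. This is a hypothetical illustration, not code from the thesis: it applies the constant-based YCbCr transform with KR = 0.2126 and KB = 0.0722, rounds to integers in each direction, and shows that the round trip need not return the original pixel:

```python
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB

def forward(r, g, b):
    # floating-point YCbCr transform, then integer rounding
    y = KR * r + KG * g + KB * b
    cb = 0.5 * (b - y) / (1.0 - KB)
    cr = 0.5 * (r - y) / (1.0 - KR)
    return round(y), round(cb), round(cr)

def inverse(y, cb, cr):
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return round(r), round(g), round(b)

# The rounded round trip does not recover this example pixel exactly.
recon = inverse(*forward(100, 50, 200))
```

The reconstruction is close to, but not identical with, the original triple; this is exactly the integer-rounding loss described above.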

Considering the second problem, a new color space called YCgCo (where "Cg" stands for green chroma and "Co" stands for orange chroma) has been introduced. This color space is much simpler and typically has equal or better coding efficiency. The conversion from RGB to the YCgCo color space can be performed as follows [7]:

Y = 0.25R + 0.5G + 0.25B

Cg = −0.25R + 0.5G − 0.25B

Co = 0.5R − 0.5B

This conversion is less complex than the conversion from RGB to YCbCr and also increases the coding efficiency.
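A minimal sketch of this conversion and its inverse (Python used for illustration; the coefficients Y = 0.25R + 0.5G + 0.25B, Cg = −0.25R + 0.5G − 0.25B, Co = 0.5R − 0.5B are the commonly cited YCgCo definition [7]):

```python
def rgb_to_ycgco(r, g, b):
    # only halvings and additions are needed (no irrational coefficients)
    y = 0.25 * r + 0.5 * g + 0.25 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    co = 0.5 * r - 0.5 * b
    return y, cg, co

def ycgco_to_rgb(y, cg, co):
    # inverse: G = Y + Cg, R = Y - Cg + Co, B = Y - Cg - Co
    g = y + cg
    r = y - cg + co
    b = y - cg - co
    return r, g, b
```

Because the divisors are powers of two, the floating-point round trip is exact for integer inputs, in contrast with the YCbCr coefficients above.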

RESIDUAL COLOR TRANSFORM:

The residual color transform maps RGB to the YCoCg color space. The characteristics of the YCoCg color space can be explained as follows [8]:

  • This color transform has been shown to be capable of achieving a decorrelation that is much better than that obtained by various RGB-to-YCbCr transforms and which, in fact, is very close to that of the Karhunen-Loeve transform [23].
  • The transform is reversible in the sense that each original RGB triple can be exactly recovered from the corresponding YCoCg triple if the color difference components Co and Cg are represented with one additional bit of accuracy relative to the bit depth used for representing RGB, and if, furthermore, no information loss in any subsequent coding step is assumed.
  • Both the forward and inverse RGB-to-YCoCg transforms require only a few shift and add operations per triple which, in addition, can be performed in-line, i.e., without the need for any extra memory apart from one single auxiliary register:

Co = R − B

t = B + (Co >> 1)

Cg = G − t

Y = t + (Cg >> 1)

The ">>" operator denotes the bitwise right shift operator.
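The reversibility claim can be verified with a short sketch of the lifting-based forward and inverse transforms (Python used for illustration; `>>` acts as an arithmetic right shift on negative Python integers, which matches the intended behavior):

```python
def forward_ycocg_r(r, g, b):
    # forward lifting steps: only shifts and adds, one temporary t
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def inverse_ycocg_r(y, co, cg):
    # inverse lifting: the same steps undone in reverse order
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

# exhaustive check over a coarse grid of 8-bit RGB values
for r in range(0, 256, 15):
    for g in range(0, 256, 15):
        for b in range(0, 256, 15):
            assert inverse_ycocg_r(*forward_ycocg_r(r, g, b)) == (r, g, b)
```

Note that Co and Cg range over [−255, 255] for 8-bit RGB, i.e., they need one extra bit of precision, exactly as stated above.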

OVERVIEW OF H.264:

H.264/MPEG-4 AVC is the latest video coding standard. It is noted for achieving very high data compression. H.264 aims at achieving high quality video at low bit rates compared with previous standards such as MPEG-2, H.263 and MPEG-4 Part 2. The price to be paid is an increase in complexity: the decoder complexity is about four times that of MPEG-2 and two times that of MPEG-4 Part 2 Visual [6].

The basic coding structure of H.264 is similar to that of the previous standards MPEG-1, MPEG-2, H.263, etc. [3]. This coding structure is referred to as the motion-compensated transform coding structure. A video is a group of pictures and is coded one picture at a time. A picture consists of one or more slices. A slice consists of a sequence of macroblocks (MB). Each MB covers 16*16 pixels of the luminance component (Y) and 8*8 pixels of each of the two chrominance components (Cb and Cr) for the 4:2:0 sampling format. The 16*16 luminance macroblock can be partitioned into sub-blocks of 16*8, 8*16 and 8*8. Each 8*8 luminance block can be further partitioned into 8*4, 4*8 and 4*4 sub-blocks. The hierarchy of video data organization is shown in Fig. 1:

Figure 1: Hierarchy of video data organization

Figure 2 illustrates the various macroblock and sub-macroblock partitions supported by H.264[6].

Figure 2: Partitioning of a macroblock and a sub macroblock for motion compensated prediction[6]

The H.264/MPEG-4 AVC standard introduced new coding tools and concepts in order to achieve high compression efficiency. Some of the new coding tools introduced in H.264 are listed as follows:

  • Intra prediction,
  • Multiple block size motion compensated prediction,
  • Multiple previous (reference) frames prediction,
  • In-loop adaptive deblocking filter

A number of new coding tools were added to H.264 in an amendment termed the Fidelity Range Extensions (FRExts). This amendment was made in response to increasing demand from professional applications. Some of the features included in FRExts are as follows [4]:

  • More than 8 bits per sample for source video accuracy
  • High resolution frames of 4:2:2 and 4:4:4 format
  • RGB color representation
  • Adaptive residual spatial transform
  • 9 Intra 8*8 prediction methods
  • Adaptive transform block size – 4*4, 8*8.

These extensions were manifested by the introduction of a set of high profiles. Four high profiles (Fig. 3) were introduced in FRExts[4].

Figure 3: High profiles of H.264 [13]

The high profile supports 8-bit video with 4:2:0 sampling. It addresses high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy.

SCHEMATIC OF THE THESIS:

This thesis plans to implement the schematic of residual color transform (RCT). The block diagram for this schematic is shown in Fig. 4.

Figure 4: Block diagram of RCT [9]

The residual color transform is applied in the loop of the encoder (Fig. 5). The input, output, and reference frames are in the RGB domain. Fig. 6 illustrates the decoder involving RCT.

Figure 5: Inloop implementation of RCT in H.264 encoder [17]

Figure 6: Implementation of RCT in the H.264 decoder [1]

This thesis aims at implementation of the encoder and decoder block diagrams shown in Figs. 5 and 6. The reconstructed video is then compared using the metrics of PSNR vs. bitrate and mean squared error (MSE).
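The two metrics can be computed as in the following sketch (Python used for illustration; `peak = 255` assumes 8-bit samples):

```python
import math

def mse(a, b):
    # mean squared error between two equal-length sample sequences
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, peak=255):
    # peak signal-to-noise ratio in dB; infinite for a lossless match
    m = mse(a, b)
    return float('inf') if m == 0 else 10.0 * math.log10(peak * peak / m)
```

For a lossless reconstruction the MSE is zero and the PSNR is unbounded, which is why lossless coding results are usually reported by bitrate alone.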

The thesis also aims at running the JM 12.1 reference software [16]. The statistics of the reconstructed video from the RCT implementation are then compared with those obtained from the JM software.

WORK DONE SO FAR:

MATLAB CODE:

The code reads a high definition (HD) sequence. The number of frames coded has been restricted to three as an intermediate step. The sequences are in the YUV 4:4:4 format. The code converts the input frames from YUV to RGB, transforms them to the YCgCo domain, and then encodes them. Arithmetic coding [21] is used for entropy coding. The frames are then reconstructed using the inverse transform.

The code is given as follows:

close all; clear all; clc;

% Read three frames of a YUV 4:4:4 high definition sequence.
inputVDO = 'PLANE_YUV444.yuv';
samplingFormat = [1920 1080];
totalFrame = 3;
fpin = fopen(inputVDO, 'rb');
for p = 1:totalFrame
    y = fread(fpin, samplingFormat, 'uint8'); y = y';    % luma plane
    cb = fread(fpin, samplingFormat, 'uint8'); cb = cb'; % blue chroma plane
    cr = fread(fpin, samplingFormat, 'uint8'); cr = cr'; % red chroma plane
    Y(:,:,p) = y; CB(:,:,p) = cb; CR(:,:,p) = cr;
end
fclose(fpin);

% Convert each frame from YCbCr to RGB and display it.
for p = 1:totalFrame
    red(:,:,p) = 1.164*(Y(:,:,p)-16) + 1.596*(CR(:,:,p)-128);
    green(:,:,p) = 1.164*(Y(:,:,p)-16) - 0.813*(CR(:,:,p)-128) - 0.392*(CB(:,:,p)-128);
    blue(:,:,p) = 1.164*(Y(:,:,p)-16) + 2.017*(CB(:,:,p)-128);
    image_rgb(:,:,1) = red(:,:,p);
    image_rgb(:,:,2) = green(:,:,p);
    image_rgb(:,:,3) = blue(:,:,p);
    image_rgb_final(:,:,:,p) = image_rgb;
    subplot(3,1,p); imshow(uint8(image_rgb));
end

figure
for p = 1:totalFrame
    resd_frm = image_rgb_final(:,:,:,p);
    red = resd_frm(:,:,1);
    green = resd_frm(:,:,2);  % channel 2 is green and channel 3 is blue
    blue = resd_frm(:,:,3);

    % Forward RGB -> YCgCo transform.
    Y1new = round((green + (red + blue)/2)/2);
    Cgnew = round((green - (red + blue)/2)/2);
    Conew = round((red - blue)/2);

    % Arithmetic encoding of the Y component: map the symbol values
    % that actually occur onto the indices 1..K, then encode the
    % index sequence using the histogram counts as the frequency table.
    Y1_vec = Y1new(:)';
    symb = min(Y1_vec):max(Y1_vec);
    ncount = histc(Y1_vec, symb);
    a = ncount(ncount ~= 0);   % counts of the occurring symbols
    b = symb(ncount ~= 0);     % the occurring symbol values
    Y1_tmp = Y1_vec;
    for j = 1:length(b)
        Y1_tmp(Y1_vec == b(j)) = j;
    end
    bitcount_Y(p) = length(arithenco(Y1_tmp, a));

    % Arithmetic encoding of the Cg component (same procedure).
    Cg_vec = Cgnew(:)';
    symb = min(Cg_vec):max(Cg_vec);
    ncount = histc(Cg_vec, symb);
    a = ncount(ncount ~= 0);
    b = symb(ncount ~= 0);
    Cg_tmp = Cg_vec;
    for j = 1:length(b)
        Cg_tmp(Cg_vec == b(j)) = j;
    end
    bitcount_Cg(p) = length(arithenco(Cg_tmp, a));

    % Arithmetic encoding of the Co component (same procedure).
    Co_vec = Conew(:)';
    symb = min(Co_vec):max(Co_vec);
    ncount = histc(Co_vec, symb);
    a = ncount(ncount ~= 0);
    b = symb(ncount ~= 0);
    Co_tmp = Co_vec;
    for j = 1:length(b)
        Co_tmp(Co_vec == b(j)) = j;
    end
    bitcount_Co(p) = length(arithenco(Co_tmp, a));

    % Inverse YCgCo -> RGB transform and display of the reconstruction.
    red_recon = Y1new + Conew - Cgnew;
    green_recon = Y1new + Cgnew;
    blue_recon = Y1new - Conew - Cgnew;
    resid_frm_recon(:,:,1) = red_recon;
    resid_frm_recon(:,:,2) = green_recon;
    resid_frm_recon(:,:,3) = blue_recon;
    subplot(3,1,p); imshow(uint8(resid_frm_recon));
end

The output reconstructed frames are shown below. The frame size is 1920*1080.

OUTPUT FILE SIZE:

For one frame:

Original file size: 6,220,800 bytes

File size after arithmetic coding: 4,243,400 bytes

Compression ratio: 1.466
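As a sanity check on the numbers above (one 1920x1080 frame with three 8-bit components per pixel):

```python
# original size: 1920 * 1080 pixels, 3 components, 1 byte each
original_bytes = 1920 * 1080 * 3
compressed_bytes = 4243400  # arithmetic-coded size reported above
ratio = original_bytes / compressed_bytes  # compression ratio
```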

JM REFERENCE SOFTWARE:

Planeframe1.yuv - Original file size - 6,220,800 bytes

1: lencod -p Inputfile="J:\JMsoftware\JM\bin\Planeframe1.yuv"

test.264 - 4096 bytes

test_rec.yuv - 114,688 bytes

2: lencod -p Inputfile="J:\JMsoftware\JM\bin\Planeframe1.yuv" -p ProfileIDC=144 -p YUVFormat=3

test.264 - 4096 bytes

test_rec.yuv - 229,376 bytes

These High Definition frames are encoded in the JM 12.1 software. The PSNR values for the Y, U, and V components are noted.

PSNR Y = 43.74 dB

PSNR U = 44.16 dB

PSNR V = 43.28 dB

FUTURE WORK TO BE DONE:

The code written so far does not consider motion estimation and compensation. The thesis aims at implementing these functions to get further compression. Further research includes the following steps:

  • consider and implement reference frames
  • perform inter prediction
  • apply RCT to the residual frame
  • encode the transformed frame using arithmetic coding
  • calculate the compression ratio
  • decode the bitstream
  • apply inverse RCT
  • add the residual frame to the reference frame
  • get the reconstructed frame.
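The steps above can be sketched end to end on a toy example (Python used for illustration; motion estimation is omitted here, so the reference frame itself serves as the predictor, and the entropy coding stage is elided — only the residual-plus-RCT path is shown):

```python
def forward_rct(r, g, b):
    # reversible RGB -> YCoCg lifting transform on a residual sample
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def inverse_rct(y, co, cg):
    # the same lifting steps undone in reverse order
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

def encode(current, reference):
    # residual in the RGB domain, then RCT applied per sample
    return [forward_rct(cr - rr, cg - rg, cb - rb)
            for (cr, cg, cb), (rr, rg, rb) in zip(current, reference)]

def decode(coded, reference):
    # inverse RCT, then add the prediction back
    out = []
    for (y, co, cg), (rr, rg, rb) in zip(coded, reference):
        dr, dg, db = inverse_rct(y, co, cg)
        out.append((dr + rr, dg + rg, db + rb))
    return out

# toy two-pixel "frames": the pipeline is lossless end to end
reference = [(10, 20, 30), (100, 50, 200)]
current = [(12, 18, 33), (90, 60, 190)]
```

Because the RCT step is exactly reversible, any loss in the full system can come only from the prediction and entropy coding stages.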

The thesis aims at analyzing the results of the High 4:4:4 profile in the JM software for encoding a high definition video sequence.

REFERENCE PAPERS AND ARTICLES:

[1] Soon-kak Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264/MPEG-4 Part 10", J. Visual Communication and Image Representation, vol. 17, pp. 183-552, April 2006.

[2] T. Wiegand and G. J. Sullivan, "The H.264 video coding standard", IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.

[3] D. Marpe, T. Wiegand and G. J. Sullivan, "The H.264/MPEG-4 AVC standard and its applications", IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.

[4] D. Marpe and T. Wiegand, "H.264/MPEG4-AVC Fidelity Range Extensions: Tools, Profiles, Performance, and Application Areas", Proc. IEEE International Conference on Image Processing 2005, vol. 1, pp. I-593-6, 11-14 Sept. 2005.

[5] A. Puri et al., "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.

[6] J. Ostermann et al., "Video coding with H.264/AVC: tools, performance, and complexity", IEEE Circuits and Systems Magazine, vol. 4, issue 1, pp. 7-28, First Quarter 2004.

[7] G. Sullivan et al., "The H.264/AVC Standard: Overview and Introduction to the Fidelity Range Extensions", SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.

[8] D. Marpe et al., "Macroblock-Adaptive Residual Color Space Transforms for 4:4:4 Video Coding", Proc. IEEE International Conference on Image Processing (ICIP 2006), Atlanta, GA, USA, Oct. 8-11, 2006.

[9] W. S. Kim, "Residue Color Transform", JVT-L025, 12th meeting: Redmond, WA, USA, 17-23 July 2004.

[10] W. S. Kim, "Adaptive Residue Transform and Sampling", JVT-K018, 11th meeting: Munich, Germany, 15-19 March 2004.

[11] Y. L. Lee, "Lossless Intra Coding for Improved 4:4:4 Coding in H.264/MPEG-4 AVC", JVT-P016, 16th meeting: Poznan, Poland, 24-29 July 2005.

[12] Y. L. Lee, "Lossless Coding for Professional Extensions", JVT-L017, 12th meeting: Redmond, WA, USA, 17-23 July 2004.

[13] W. S. Kim, "Advanced Residual Color Transform", JVT-Q059, 17th meeting: Nice, France, 14-21 October 2005.

REFERENCE WEBSITES:

[14] JM reference software manual -

[15] Overview of H.264,

[16] H.264 JM reference software,

[17] Presentation on "YCgCo Residual Color Transform",

[18] DPX to YUV converter -

[19] High Definition sequences - ftp.tnt.uni-hannover.de

[20] YUV color space -

REFERENCE BOOKS:

[21] K. Sayood, "Introduction to Data Compression", 3rd edition, Morgan Kaufmann Publishers, 2006.

[22] I. E. G. Richardson, "H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia", Wiley, 2003.

[23] K. R. Rao and P. C. Yip, "The Transform and Data Compression Handbook", Boca Raton, FL: CRC Press, 2001.