EE 5359 Multimedia Processing

Study of AVS-China Part 2

and Comparison of its Performance

with H.264 and Dirac

~ Jennie Gloria Abraham

Fall 2009

Department of Electrical Engineering,

University of Texas at Arlington


List of Acronyms

AU Access Unit

AVS Audio Video Standard

AVS-M Audio Video Standard for mobile

B-Frame Interpolated Frame

CAVLC Context Adaptive Variable Length Coding

CBP Coded Block Pattern

CIF Common Intermediate Format

DIP Direct Intra Prediction

DPB Decoded Picture Buffer

DVD Digital Versatile Disc

EOB End of Block

HD High Definition

HHR Horizontal High Resolution

ICT Integer Cosine Transform

IDR Instantaneous Decoding Refresh

I-Frame Intra Frame

IMS IP Multimedia Subsystem

ITU-T International Telecommunication Union - Telecommunication Standardization Sector

MB Macroblock

MBPAFF Macroblock Pair Adaptive Frame/Field

MPEG Moving Picture Experts Group

MPM Most Probable Mode

MV Motion Vector

NAL Network Abstraction Layer

PAFF Picture Adaptive Frame/Field

P-Frame Predicted Frame

PIT Prescaled Integer Transform

PPS Picture Parameter Set

QCIF Quarter Common Intermediate Format

QP Quantization Parameter

RD Rate Distortion

SAD Sum of Absolute Differences

SD Standard Definition

SEI Supplemental Enhancement Information

SPS Sequence Parameter Set

TV Television

VLC Variable Length Coding


List of Figures

Figure 1.1: An example of a Home Media Ecosystem

Figure 1.2: History of audio/video coding standards

Figure 1.3: Video encoding / decoding process

Figure 2.3 Non-Sampling (NS) macroblock pair

Figure 2.4 Vertical-Sampling (VS) macroblock pair

Figure 2.5: Two stages of encoding order in MBPAFF

Figure 2.6: Layered data structure

Figure 2.7: Normal slice structure and flexible slice set in AVS-video

Figure 2.8: Macroblock partitioning [Lu ppt]

Figure 2.9: AVS Video framework

Figure 2.10: Intra prediction

Figure 2.11: Inter prediction in a block-based video codec

Figure 2.12: Inverse transform: combining weighted basis patterns to create a 4x4 residual image block

Figure 2.13: Reconstruction of MB image from residual MB and MB predicted using motion vectors

Figure 3.1: Neighbor Pixels in Luminance Intra Prediction

Figure 3.2: Five luminance intra prediction modes.

Figure 3.3: Symmetric mode of AVS Part 2.

Figure 3.4: Position of integer pixels, 1/2 pixels and 1/4 pixels.

Figure 3.5: Maximum of two reference pictures in AVS-video:

(a) reference of current (the second) field in field-coded I picture,

(b) references of current frame in frame-coded P picture,

(c) references of current (the first) field in field-coded P picture,

(d) references of current (the second) field in field-coded P picture,

(e) references of current frame in frame-coded B picture

(f) references of current (the first) field in field-coded B picture and

(g) references of current (the second) field in field-coded B picture.

Figure 3.6: Quantization of the transformed coefficients of the image block


List of Tables

Table 2.1: Different parts of AVS standard

Table 2.2 Application-based profiles of AVS

Table 2.3: Summary of profiles in AVS-Video


ABSTRACT

The Audio Video Coding Standard (AVS) was established by the Audio Video Coding Standard Workgroup of China. AVS China - Video is the second part of the standard, developed to target high-definition digital video broadcasting and high-density storage media.

This project aims to provide an in-depth view of AVS China - Part 2 standard, beginning from the architecture of the codec to the features it offers and various data formats it supports. The project mainly focuses on providing an understanding of the AVS-P2 video encoder and decoder, while detailing the major coding tools: integer transform, intra and inter-picture prediction, in-loop de-blocking filter and two dimensional variable length coding within this system.

A performance comparison is made with the other popular standards like H.264 and Dirac under specific testing conditions. The experimental results are tabulated and plotted and appropriate conclusions are drawn.


MOTIVATION

The opportunity to get to know the inner workings of the AVS China video codec, with the added benefit of becoming familiar with the H.264 and Dirac video codecs, was a strong motivating factor in choosing this project.

The availability of these codecs in the Multimedia Processing Lab at the University of Texas at Arlington was crucial to the decision to carry out this project.


1. INTRODUCTION

Broadcast television and home entertainment have been revolutionized by the advent of digital TV and DVD-video. These applications and many more were made possible by the standardization of video compression technology. Video compression (or video coding) is an essential technology for applications such as digital television, DVD-Video, mobile TV, videoconferencing and internet video streaming.

Figure 1.1: An example of a Home Media Ecosystem [29]

In today’s world we expect a seamless integration of various standards.


The history of various audio video coding standards as they emerged over the years can be seen in figure 1.2.


Figure 1.2: History of audio/video coding standards [30]

Standardizing video compression makes it possible for products from different manufacturers (e.g. encoders, decoders and storage media) to inter-operate. An encoder converts video into a compressed format and a decoder converts compressed video back into an uncompressed format.


1.1 Scope of a standard

Each industry standard for video compression defines a process of converting digital video into a format that takes up less capacity when it is stored or transmitted. These standards define the format (syntax) for a compressed video bitstream and a method for decoding this syntax to produce a displayable video sequence. The standard document does not actually specify how to encode (compress) digital video - this is left to the manufacturer of a video encoder - but in practice the encoder is likely to mirror the steps of the decoding process.

Figure 1.3: Video encoding / decoding process

Figure 1.3 shows the encoding and decoding processes of a general video compression scheme and highlights the parts specified by the standard.


2. AVS STANDARD

The AVS video coding standards are an important part of the standardization output of the AVS working group. AVS-video is the collective name for all parts of AVS related to the coding of video and its auxiliary information; the AVS standard as a whole covers video, audio and digital media copyright management. The different parts of AVS China are listed in Table 2.1.

Table 2.1: Different parts of AVS standard [34]

Considering the different requirements of various video applications, AVS-video defines different profiles. Each profile combines advanced video coding tools, trading off coding efficiency against encoder/decoder implementation complexity and functional properties, and targets a particular category of applications.

2.1 Profiles and Levels

A ‘‘profile’’ is a specified subset of the coding tools. In AVS video, each profile picks tools from the video coding tool pool. So far, four profiles have been defined -

·  Jizhun (base) profile

·  Jiben (basic) profile

·  Shenzhan (extended) profile

·  Jiaqiang (enhanced) profile

- defined in AVS-video targeting different applications (Table 2.2).

Profile / Key applications
Jizhun (base) profile / Television broadcasting, HDTV, etc.
Jiben (basic) profile / Mobile applications, etc.
Shenzhan (extended) profile / Video surveillance, etc.
Jiaqiang (enhanced) profile / Multimedia entertainment, etc.
Table 2.2: Application-based profiles of AVS [37]

2.1.1 AVS-Video Jizhun profile (base profile)

The Jizhun profile was the first profile defined in the national standard AVS-Part 2, approved as a national standard in 2006. It mainly focuses on digital video applications such as commercial broadcasting and storage media, including high-definition applications. It is intended to deliver high coding efficiency on video sequences of higher resolutions, at the expense of moderate computational complexity.

2.1.2. AVS-video Jiben profile (basic profile)

The Jiben profile is defined in AVS-Part 7 and targets mobile video applications, which feature smaller picture resolutions. Computational complexity therefore becomes a critical issue. In addition, error resilience is needed because of the wireless transmission environment.

2.1.3. AVS-Shenzhan profile (extended profile)

The AVS-Shenzhan profile focuses exclusively on standardizing video surveillance applications. Surveillance sequences have special characteristics: random noise appears in the pictures, only relatively low encoding complexity is affordable, and the bitstream must support event detection and search.

2.1.4. AVS-Jiaqiang profile (enhanced profile)

To fulfill the needs of multimedia entertainment, one of the major concerns of the Jiaqiang profile is movie compression for high-density storage. Relatively high computational complexity can be tolerated at the encoder side to provide higher video quality, while retaining compatibility with AVS-Part 2.

While a profile is a subset of the syntax, semantics and algorithms defined by AVS, a level puts constraints on the parameters of the stream. The purpose of defining profiles and levels is to facilitate interoperability among streams from various applications. AVS Part 2 defines the Jizhun profile, which comprises four levels:

·  levels 4.0 and 4.2 for standard definition (SD) video in 4:2:0 and 4:2:2 formats, respectively

·  levels 6.0 and 6.2 for high definition (HD) video in 4:2:0 and 4:2:2 formats, respectively
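As a sketch of how these level constraints might be represented in software, the mapping below encodes exactly the four levels listed above (the dictionary layout and function name are illustrative, not part of the standard):

```python
# Jizhun profile levels, taken from the bullet list above:
# level number -> (definition, chroma format).
JIZHUN_LEVELS = {
    "4.0": ("SD", "4:2:0"),
    "4.2": ("SD", "4:2:2"),
    "6.0": ("HD", "4:2:0"),
    "6.2": ("HD", "4:2:2"),
}

def describe_level(level):
    """Return a human-readable description of a Jizhun profile level."""
    definition, chroma = JIZHUN_LEVELS[level]
    return f"Level {level}: {definition} video, {chroma} format"

print(describe_level("6.0"))  # Level 6.0: HD video, 4:2:0 format
```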

Table 2.3: Summary of profiles in AVS-Video [37]

2.2 Data Formats used in AVS

Progressive scanning is a method of storing or transmitting images in which all lines of each frame are drawn in sequence. Interlaced scanning draws the odd and even lines alternately. AVS codes video data primarily in progressive scan format. An advantage of coding data in progressive scan format is that motion estimation operates more efficiently, and progressive content can be encoded at significantly lower bit rates than interlaced data. Motion compensation of progressive content is also less complex than for interlaced content.

AVS supports both progressive and interlaced scan formats. In the progressive case, one picture is one frame. For interlaced coding, a picture can be coded as two fields instead of one frame; in that case the two fields share the same picture header and must belong to different slices.
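The separation of a frame into two fields for interlaced coding can be sketched as follows (the frame is represented simply as a list of rows; the convention that the top field holds the even-numbered rows is the usual one, assumed here for illustration):

```python
def split_into_fields(frame):
    """Split a frame (a list of rows) into its two interlaced fields.
    The top field holds the even-numbered rows, the bottom field the odd rows."""
    top_field = frame[0::2]
    bottom_field = frame[1::2]
    return top_field, bottom_field

# A toy 4-row "frame" where each row is labeled by its index.
frame = ["row0", "row1", "row2", "row3"]
top, bottom = split_into_fields(frame)
print(top)     # ['row0', 'row2']
print(bottom)  # ['row1', 'row3']
```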

A sequence can be coded in frame, field, picture-level adaptive frame/field (PAFF), or macroblock pair adaptive frame/field (MBPAFF) mode. In particular, MBPAFF takes a pair of macroblocks, rather than a single macroblock, as the basic coding unit. Figures 2.3 and 2.4 illustrate the two kinds of macroblock pairs in MBPAFF: the non-sampling (NS) pair and the vertical-sampling (VS) pair, respectively.

Figure 2.3: Non-Sampling (NS) macroblock pair [37]

Figure 2.4: Vertical-Sampling (VS) macroblock pair [37]

In a picture applying MBPAFF that allows prediction within the same frame/field, macroblock pairs are coded in a two-stage order, as shown in Fig. 2.5.

Figure 2.5: Two stages of encoding order in MBPAFF [37]

(solid line: the first stage; dash line: the second stage).

Only after all the macroblocks of the first stage, which includes the NS0, NS1 and VS0 macroblocks, are encoded will the macroblocks of the second stage, which includes only the VS1 macroblocks, be processed. This allows the macroblocks coded in the second stage to be predicted with motion compensation from the macroblocks coded in the first stage. Note that when MBPAFF is applied, the height of each slice must be an integer multiple of two macroblocks.
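The two-stage ordering described above can be sketched with the macroblock labels used in the text (the input scan order below is a toy example; only the rule that VS1 macroblocks are deferred to the second stage is taken from the text):

```python
def mbpaff_two_stage_order(mb_labels):
    """Reorder macroblock labels into MBPAFF's two-stage coding order:
    NS0, NS1 and VS0 macroblocks are coded in the first stage, while
    VS1 macroblocks are deferred to the second stage."""
    first_stage = [mb for mb in mb_labels if mb in ("NS0", "NS1", "VS0")]
    second_stage = [mb for mb in mb_labels if mb == "VS1"]
    return first_stage + second_stage

# A toy scan containing one VS pair followed by one NS pair:
# the VS1 macroblock moves to the end of the coding order.
order = mbpaff_two_stage_order(["VS0", "VS1", "NS0", "NS1"])
print(order)  # ['VS0', 'NS0', 'NS1', 'VS1']
```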


2.2.1 Layered Structure

AVS follows a layered structure for the data, and this is clearly visible in the coded bitstream. Figure 2.6 depicts the layered data structure.

Figure 2.6: Layered data structure

At the topmost layer, sets of video frames are grouped together as a sequence. Video frames comprise the next layer and are called pictures. Pictures are subdivided into rectangular regions called slices. Slices are further subdivided into square regions of pixels called macroblocks (MBs). Each MB consists of a set of luminance and chrominance blocks.
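The layering described above can be sketched with simple container types (the class and field names below are illustrative, not taken from the AVS specification):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Block:
    """Smallest coded unit: transform coefficients of the prediction error."""
    coefficients: List[int] = field(default_factory=list)

@dataclass
class Macroblock:
    """Pixels covering a 16x16 region: luminance plus chrominance blocks."""
    luma_blocks: List[Block] = field(default_factory=list)
    chroma_blocks: List[Block] = field(default_factory=list)

@dataclass
class Slice:
    """Contiguous run of macroblocks; the resynchronization unit."""
    macroblocks: List[Macroblock] = field(default_factory=list)

@dataclass
class Picture:
    """One coded video frame, made up of slices."""
    slices: List[Slice] = field(default_factory=list)

@dataclass
class Sequence:
    """Top layer: a set of pictures sharing one sequence header."""
    pictures: List[Picture] = field(default_factory=list)
```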

Sequence

The sequence layer consists of a set of mandatory and optional system parameters. The mandatory parameters are necessary to initialize the decoder; the optional parameters are used for other system settings at the discretion of the network provider. User data can optionally be included in the sequence header. The sequence layer provides an entry point into the coded video. Sequence headers should be placed in the bitstream to support user access appropriately for the given distribution medium; repeated sequence headers may be inserted to support random access. Sequences are terminated with a sequence end code.

Picture

The picture layer provides the coded representation of a video frame. It comprises a header with mandatory and optional parameters and, optionally, user data. AVS defines three types of pictures:

1. Intra pictures (I-pictures)

2. Predicted pictures (P-pictures)

3. Interpolated pictures (B-pictures)

Slice

The slice structure provides the lowest-layer mechanism for resynchronizing the bitstream in case of transmission errors. A slice comprises a series of MBs. Slices must not overlap, must be contiguous, and must begin and terminate at the left and right edges of the picture. A single slice may cover the entire picture. The slice structure is optional. Slices are independently coded, and no slice may refer to another slice during the decoding process.
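The slice constraints above (no overlap, contiguity, covering the full picture) can be checked with a small sketch. Here each slice is described only by its first and last macroblock rows, an illustrative simplification of the normal slice structure in which a slice contains whole lines of macroblocks:

```python
def slices_are_valid(slice_rows, total_mb_rows):
    """Check that slices, given as (first_row, last_row) pairs in coding
    order, are contiguous, non-overlapping, and cover the whole picture."""
    expected_start = 0
    for first, last in slice_rows:
        if first != expected_start or last < first:
            return False  # gap, overlap, or empty slice
        expected_start = last + 1
    return expected_start == total_mb_rows

# A 9-row picture split into three contiguous, non-overlapping slices.
print(slices_are_valid([(0, 2), (3, 5), (6, 8)], 9))  # True
print(slices_are_valid([(0, 2), (2, 5), (6, 8)], 9))  # False (rows overlap)
```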

Figure 2.7: Normal slice structure and flexible slice set in AVS-video [37]

(left: normal slice structure where a slice can only contain continual lines of macroblocks; right: flexible slice set allowing more flexible grouping of macroblocks in slice and slice set, where B0, B1 and B2 are slices of the same slice group).


Macroblock

A picture is divided into macroblocks (MBs). The upper-left sample of each MB must not exceed the picture boundary. Macroblock partitioning is shown in Figure 2.8. The partitioning is used for motion compensation. The number in each rectangle specifies the order of appearance of motion vectors and reference indices in the bitstream.

Figure 2.8: Macroblock partitioning

A macroblock includes the luminance and chrominance component pixels that collectively represent a 16x16 region of the picture. In 4:2:0 mode, the chrominance pixels are sub-sampled by a factor of two in each dimension; therefore each chrominance component contains only one 8x8 block. In 4:2:2 mode, the chrominance pixels are sub-sampled by a factor of two in the horizontal dimension; therefore each chrominance component contains two 8x8 blocks. The MB header contains information about the coding mode and the motion vectors. It may optionally contain the quantization parameter (QP).
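The block counts above follow directly from the subsampling factors; the arithmetic can be sketched as follows (the function name is illustrative):

```python
def chroma_blocks_per_macroblock(chroma_format):
    """Number of 8x8 blocks per chroma component in one 16x16 macroblock.
    4:2:0 subsamples chroma by 2 in both dimensions -> one 8x8 block;
    4:2:2 subsamples only horizontally -> an 8x16 region, i.e. two 8x8 blocks."""
    subsample = {"4:2:0": (2, 2), "4:2:2": (2, 1)}  # (horizontal, vertical)
    h, v = subsample[chroma_format]
    chroma_width, chroma_height = 16 // h, 16 // v
    return (chroma_width * chroma_height) // (8 * 8)

print(chroma_blocks_per_macroblock("4:2:0"))  # 1
print(chroma_blocks_per_macroblock("4:2:2"))  # 2
```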

Block

The block is the smallest coded unit and contains the transform coefficient data for the prediction errors. In the case of intra-coded blocks, intra prediction is performed from neighboring blocks.

2.3 System Architecture