Interim Report for Multimedia Processing

PERFORMANCE COMPARISON OF HEVC MAIN STILL PICTURE PROFILE

SPRING 2015

MULTIMEDIA PROCESSING- EE 5359

04/19/2015

ADVISOR: DR. K. R. RAO

DEPARTMENT OF ELECTRICAL ENGINEERING

UNIVERSITY OF TEXAS, ARLINGTON

DEEPU SLEEBA PHILIP

1001038966

TABLE OF CONTENTS

1.  Acronyms and Abbreviations

2.  Objective of the Project

3.  Overview of HEVC

4.  WebP

5.  Performance Comparison Metrics

6.  Implementation

7.  Results

8.  Test Images

9.  Test Configuration

10.  Conclusions

11.  Appendix

12.  References

1. Acronyms and Abbreviations

•  AMVP: Advanced Motion Vector Prediction

•  AVC: Advanced Video Coding

•  BD-PSNR: Bjøntegaard Delta Peak Signal-to-Noise Ratio

•  BSD: Berkeley Software Distribution

•  CABAC: Context Adaptive Binary Arithmetic Coding

•  CB: Coding Block

•  CIF: Common Intermediate Format

•  CU: Coding Unit

•  CTB: Coding Tree Block

•  CTU: Coding Tree Unit

•  DCT: Discrete Cosine Transform

•  DST: Discrete Sine Transform

•  EBCOT: Embedded Block Coding with Optimized Truncation

•  GIF: Graphics Interchange Format

•  HD: High Definition

•  HEVC: High Efficiency Video Coding

•  JCT-VC: Joint Collaborative Team on Video Coding

•  MC: Motion Compensation

•  ME: Motion Estimation

•  MPEG: Moving Picture Experts Group

•  MSP: Main Still Picture Profile

•  MV: Motion Vector

•  NGOV: Next Generation Open Video

•  PCS: Picture Coding Symposium

•  PNG: Portable Network Graphics

•  PSNR: Peak Signal-to-Noise Ratio

•  PU: Prediction Unit

•  QP: Quantization Parameter

•  QCIF: Quarter Common Intermediate Format

•  RD: Rate Distortion

•  SAO: Sample Adaptive Offset

•  SAD: Sum of Absolute Differences

•  SATD: Sum of Absolute Transformed Differences

•  SHVC: Scalable HEVC

•  SSIM: Structural Similarity

•  SVC: Scalable Video Coding

•  TM: True Motion

•  TU: Transform Unit

•  URQ: Uniform Reconstruction Quantization

•  VCEG: Video Coding Experts Group

2. Objective:

•  The aim of this project is to compare the rate-distortion performance of the HEVC Main Still Picture (MSP) profile with that of WebP.

•  The peak signal-to-noise ratio (PSNR) and the average bit-rate savings in terms of the Bjøntegaard delta rate (BD-rate) are considered for this comparison.
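For reference, PSNR can be computed as in the following minimal sketch, which assumes 8-bit samples passed as flat lists (the function name and interface are illustrative, not from any particular codec package):

```python
import math

def psnr(ref, dist, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a reference image and a
    distorted image, both given as flat lists of equal length."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

The BD-rate is then obtained by fitting rate-PSNR curves at several quantization settings and integrating the difference between them.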

3. Overview of HEVC:

•  The High Efficiency Video Coding (HEVC) standard is the most recent joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, working together in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC) [1].

•  Its predecessor, H.264/MPEG-4 AVC, is widely used for many applications, including broadcast of high definition (HD) TV signals over satellite, cable, and terrestrial transmission systems, video content acquisition and editing systems, camcorders, security applications, Internet and mobile network video, Blu-ray Discs, and real-time conversational applications such as video chat, video conferencing, and telepresence systems. However, an increasing diversity of services, the growing popularity of HD video, and the emergence of beyond-HD formats (e.g., 4k×2k or 8k×4k resolution) are creating even stronger needs for coding efficiency superior to H.264/MPEG-4 AVC's capabilities. An increased desire for higher quality and resolutions is also arising in mobile applications [2].

Block diagrams of the HEVC encoder and decoder:

Figure 1: Block Diagram of HEVC Encoder (with decoder modelling elements shaded in light gray). [3]

Figure 1a: HEVC Decoder Block Diagram [29]

3.1 Macro-block Concept and Prediction Block Sizes:

The macro-block concept of AVC is represented in HEVC [3] by the Coding Tree Unit (CTU). The CTU size can be 16x16, 32x32 or 64x64, while the AVC macro-block size is fixed at 16x16. The larger CTU sizes aim to improve the efficiency of block partitioning on high-resolution video sequences. Larger blocks in turn motivate the quad-tree partitioning of a CTU into smaller coding units (CUs); a CU is a bottom-level (leaf) syntax element of the CTU's quad-tree splitting. Each CU contains prediction units (PUs) and transform units (TUs).

The TU is the syntax element responsible for storing transform data; allowed TU sizes are 32x32, 16x16, 8x8 and 4x4. The PU is the syntax element that stores prediction data such as the intra-prediction angle or the inter-prediction motion vector. A CU can contain up to four prediction units, and can be split into PUs as 2Nx2N, 2NxN, Nx2N, NxN, 2NxnU, 2NxnD, nLx2N or nRx2N, where 2N is the size of the CU being split. In intra-prediction mode only the 2Nx2N PU split is allowed; an NxN PU split is additionally possible for a bottom-level CU that cannot be further split into sub-CUs.
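The quad-tree splitting described above can be sketched as a simple recursion. The function below is an illustrative assumption only: is_homogeneous stands in for the encoder's actual rate-distortion split decision, and the names and interface are not from any reference encoder.

```python
def split_ctu(x, y, size, is_homogeneous, min_size=8):
    """Recursively partition a CTU (top-left corner x, y) into CUs,
    quad-tree style. is_homogeneous(x, y, size) is a stand-in for the
    encoder's real split decision. Returns a list of (x, y, size) CUs."""
    if size <= min_size or is_homogeneous(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus = []
    for dy in (0, half):          # visit the four quadrants
        for dx in (0, half):
            cus += split_ctu(x + dx, y + dy, half, is_homogeneous, min_size)
    return cus
```

For example, a 64x64 CTU whose 32x32 quadrants are all deemed homogeneous yields exactly four 32x32 CUs.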

3.2 Coding units (CUs) and coding blocks (CBs):

Figure 2: 64x64 CTB split into CBs [3]

The quad-tree syntax of the CTU specifies the size and positions of its luma and chroma CBs. The root of the quad tree is associated with the CTU; hence, the size of the luma CTB is the largest supported size for a luma CB. The splitting of a CTU into luma and chroma CBs is signaled jointly. One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU), as shown in Figure 3.

Figure 3: CUs split into CBs [3]

3.3 Prediction Modes:

3.3.1 Intra Prediction Modes:

There are a total of 35 intra-prediction modes in HEVC: planar (mode 0), DC (mode 1) and 33 angular modes (modes 2-34), shown in Figure 4. DC intra-prediction is the simplest mode in HEVC: all PU pixels are set equal to the mean value of all available neighbouring pixels. Planar intra-prediction is the most computationally expensive; it is a two-dimensional linear interpolation. The angular intra-prediction modes 2-34 are linear interpolations of pixel values in the corresponding directions, the vertical modes (18-34) interpolating neighbouring pixel values from the row above. Intra prediction can also be performed at different block sizes, ranging from 4x4 to 64x64.

Figure 4: Modes and directional orientations for intra-picture prediction [9]
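As a minimal sketch of the DC mode described above (simplified: HM's exact integer arithmetic also filters some reference samples, and the function name is illustrative):

```python
def dc_predict(above, left):
    """DC intra prediction, simplified: every pixel of the NxN PU is set
    to the rounded integer mean of the reference samples in the row
    above and the column to the left."""
    neighbours = above + left
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    n = len(above)
    return [[dc] * n for _ in range(n)]
```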

3.3.2 Inter Prediction:

Each PU is predicted from image data in one or two reference pictures (before or after the current picture in display order), using motion-compensated prediction.

Transform and quantization: Any residual data remaining after prediction is transformed using a block transform based on the integer Discrete Cosine Transform (DCT) [4]. Only for 4x4 intra luma blocks is a transform based on the Discrete Sine Transform (DST) used. One or more block transforms of size 32x32, 16x16, 8x8 and 4x4 are applied to the residual data in each CU, and the transformed data are then quantized.
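The transform stage can be illustrated with a floating-point DCT; note this is a conceptual sketch only, since HEVC itself uses a scaled integer approximation of the DCT rather than the exact orthonormal transform below:

```python
import math

def dct2d(block):
    """Separable 2-D orthonormal DCT-II of an NxN residual block
    (a floating-point stand-in for HEVC's integer core transform)."""
    n = len(block)

    def basis(k, i):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))

    return [[sum(basis(u, i) * basis(v, j) * block[i][j]
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]
```

A flat (constant) residual block compacts all its energy into the single DC coefficient, which is why prediction followed by a transform codes smooth regions so cheaply.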

3.3.3 Entropy coding:

Context adaptive binary arithmetic coding (CABAC) is used for entropy coding. This is similar to the CABAC scheme in H.264/MPEG-4 AVC, but has undergone several refinements to improve its throughput speed (especially for parallel-processing architectures) and its compression performance, and to reduce its context memory requirements.

3.3.4 In-loop deblocking filtering:

A deblocking filter similar to the one used in H.264/MPEG-4 AVC is operated within the inter picture prediction loop. However, the design is simplified in regard to its decision-making and filtering processes, and is made more friendly to parallel processing.

3.3.5 Sample adaptive offset (SAO):

A nonlinear amplitude mapping is introduced within the inter picture prediction loop after the deblocking filter. Its goal is to better reconstruct the original signal amplitudes by using a look-up table that is described by a few additional parameters that can be determined by histogram analysis at the encoder side.
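A toy version of SAO's band-offset mode illustrates the look-up-table idea: the 32-band grouping matches HEVC, but the function, its parameters, and the clipping behaviour here are illustrative simplifications of the standard.

```python
def sao_band_offset(samples, band_offsets, start_band, bit_depth=8):
    """Simplified SAO band offset: samples are grouped into 32 equal
    bands; four consecutive bands (from start_band) receive an offset
    from the encoder-signalled table. Other samples pass through."""
    shift = bit_depth - 5  # 32 bands -> band index = sample >> shift
    out = []
    for s in samples:
        band = s >> shift
        if start_band <= band < start_band + len(band_offsets):
            s = min(max(s + band_offsets[band - start_band], 0),
                    (1 << bit_depth) - 1)  # clip to valid sample range
        out.append(s)
    return out
```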

4. Overview of WebP:

Figure 6: Block Diagram of VP8 Encoder [21]

WebP is an image format employing both lossy and lossless compression [19]. It is currently developed by Google, based on technology acquired with the purchase of On2 Technologies. As a derivative of the VP8 video format, it is a sister project to the WebM multimedia container format. WebP-related software is released under a BSD license.

The format was first announced in 2010 as a new open standard for lossily compressed true-color graphics on the web, producing smaller files of comparable image quality to the older JPEG scheme [19]. According to Google's measurements, a conversion from PNG to WebP results in a 45% reduction in file size when starting with PNGs found on the web, and a 28% reduction compared to PNGs that are recompressed with pngcrush and pngout [19].

Google has proposed using WebP for animated images as an alternative to the popular GIF format, citing the advantages of 24-bit color with transparency, combining frames with lossy and lossless compression in the same animation, as well as support for seeking to specific frames. Google reports a 64% reduction in file size for images converted from animated GIF to lossy WebP, and a 19% reduction when converted to lossless WebP [19].

4.1 Prediction Techniques

WebP's lossy compression uses the same methodology as VP8 for predicting (video) frames. VP8 is based on block prediction and, like any block-based codec, divides the frame into smaller segments called macro-blocks. It has two prediction modes: intra prediction uses data within a single video frame, while inter prediction uses data from previously encoded frames.

4.1.1 Intra Prediction:

WebP has three types of blocks:

•  4x4 luma

•  16x16 luma

•  8x8 chroma

Four common intra prediction modes used by these blocks are:

H_PRED (horizontal prediction): Fills each column of the block with a copy of the left column, L.

V_PRED (vertical prediction): Fills each row of the block with a copy of the above row, A.

DC_PRED (DC prediction): Fills the block with a single value, the average of the pixels in the row above (A) and the column to the left (L) [16].

TM_PRED (True Motion prediction): In addition to the row A and column L, TM_PRED uses the pixel C above and to the left of the block. Horizontal differences between pixels in A and vertical differences between pixels in L are propagated (starting from C) to form the prediction block.

For 4x4 luma blocks, there are six additional intra modes corresponding to predicting pixels in different directions. As mentioned above, the TM_PRED mode is unique to VP8. Figure 7 uses a 4x4 block as an example to illustrate how the TM_PRED mode works.

In Figure 7, C, A and L represent reconstructed pixel values from previously coded blocks, and X00 through X33 represent predicted values for the current block. TM_PRED uses the equation Xij = Li + Aj - C (i, j = 0, 1, 2, 3). The TM_PRED mode for 8x8 and 16x16 blocks works in a similar fashion. Among all the intra prediction modes, TM_PRED is one of the most frequently used modes in VP8; for natural video sequences, it is typically used by 20% to 45% of all intra-coded blocks. Together, these intra prediction modes help VP8 achieve high compression efficiency, especially for key frames, which can only use intra modes.

Figure 7: Illustration of intra prediction mode TM_PRED [16]
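The TM_PRED equation above translates directly into a short sketch (clamping to the 8-bit sample range, as a codec would; the function name and list-based interface are illustrative):

```python
def tm_pred(above, left, c):
    """VP8 TM_PRED for an NxN block: X[i][j] = left[i] + above[j] - c,
    clamped to the valid 8-bit sample range [0, 255]."""
    def clamp(v):
        return max(0, min(255, v))
    return [[clamp(l + a - c) for a in above] for l in left]
```

For example, with above = [100, 110, 120, 130], left = [100, 105, 110, 115] and c = 100, the predicted pixel X[1][2] is 105 + 120 - 100 = 125, i.e. the horizontal gradient of A and the vertical gradient of L are both propagated from C.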

4.1.2 Inter Prediction Modes:

Inter prediction modes are used on inter frames (non-keyframes). For any VP8 inter frame, there are typically three previously coded reference frames that can be used for prediction.

A typical inter prediction block is constructed using a motion vector to copy a block from one of the three reference frames. The motion vector points to the location of the pixel block to be copied. In video compression schemes, a good portion of the bits is spent on encoding motion vectors; the portion can be especially large for video encoded at lower data rates. VP8 provides efficient motion vector coding by reusing motion vectors from neighbouring macro-blocks. For example, the prediction modes "NEAREST" and "NEAR" make use of the last and second-to-last non-zero motion vectors from neighbouring macro-blocks. These inter prediction modes can be used in combination with any of the three reference frames.
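As a rough illustration of the NEAREST/NEAR idea, the sketch below picks the first two distinct non-zero motion vectors from a scan of neighbouring macro-blocks. The function name, the tuple representation of motion vectors, and the simple scan order are all illustrative assumptions; the actual VP8 neighbour survey also ranks candidates by how often they occur, which is omitted here.

```python
def nearest_near_candidates(neighbour_mvs):
    """Simplified NEAREST/NEAR selection: scan the motion vectors of
    already-coded neighbouring macro-blocks and take the first two
    distinct non-zero ones as the NEAREST and NEAR candidates."""
    candidates = []
    for mv in neighbour_mvs:
        if mv != (0, 0) and mv not in candidates:
            candidates.append(mv)
        if len(candidates) == 2:
            break
    nearest = candidates[0] if candidates else (0, 0)
    near = candidates[1] if len(candidates) > 1 else (0, 0)
    return nearest, near
```

The saving comes from signalling only a short mode index ("NEAREST" or "NEAR") instead of a full motion vector whenever a neighbour's vector fits.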

In addition, VP8 has a sophisticated, flexible inter prediction mode called SPLITMV. This mode was designed to enable flexible partitioning of a macro-block into sub-blocks to achieve better inter prediction, and is useful when objects within a macro-block have different motion characteristics. Within a macro-block coded using the SPLITMV mode, each sub-block can have its own motion vector. Similar to the strategy of reusing motion vectors at the macro-block level without transmitting them, a sub-block can also use the motion vectors of neighbouring sub-blocks above or to the left of the current block without transmitting them. This strategy is very flexible and can encode any shape of sub-macro-block partitioning. Figure 8(a) shows an example of a macro-block of 16x16 luma pixels partitioned into 16 4x4 blocks:

1 1 1 1
1 2 2 1
1 2 2 1
3 3 3 1

Figure 8: Illustration of VP8 inter prediction mode SPLITMV [16]

In Fig. 8(a), New represents a 4x4 block coded with a new motion vector, and Left and Above represent a 4x4 block coded using the motion vector from the left and above, respectively. This example effectively partitions the 16x16 macro-block into three different segments with three different motion vectors (represented by 1, 2 and 3), as seen in Fig. 8(b).

4.1.3 Reference Frames

VP8 uses three types of reference frames for inter prediction: the "last frame", a "golden frame" and an "alternate reference frame". Depending on content, a frame from the distant past can be very beneficial for inter prediction when objects re-appear after disappearing for a number of frames. Based on such observations, VP8 was designed to use one reference frame buffer to store a video frame from an arbitrary point in the past; this buffer is known as the "golden reference frame". Unlike other types of reference frames used in video compression, which are always displayed to the user by the decoder, the VP8 alternate reference frame is decoded normally but may or may not be shown; it can be used solely as a reference to improve inter prediction for other coded frames.