Performance and Computational Complexity Assessment of High-Efficiency Video Encoders

A project proposal on

Performance and Computational Complexity Assessment of High-Efficiency Video Encoders

Under the guidance of Dr. K. R. Rao

For the fulfillment of the course Multimedia Processing (EE5359)

Spring 2016

Submitted by

Manu Rajendra Sheelvant

UTA ID: 1001112728

Email id:

Department of Electrical Engineering

The University of Texas at Arlington

Table of Contents

List of Acronyms and Abbreviations

Abstract

1. Objective of the Project
2. H.265 / High Efficiency Video Coding (HEVC)
   2.1 Introduction
   2.2 Encoder and Decoder in HEVC
   2.3 Features of HEVC
       2.3.1 Coding tree units and coding tree block (CTB) structure
       2.3.2 Coding units (CUs) and coding blocks (CBs)
       2.3.3 Prediction units and prediction blocks (PBs)
       2.3.4 Transform units (TUs) and transform blocks
       2.3.5 Motion vector signaling
       2.3.6 Motion compensation
       2.3.7 Intra-picture prediction
       2.3.8 Quantization control
       2.3.9 Entropy coding
       2.3.10 In-loop de-blocking filtering
       2.3.11 Sample adaptive offset (SAO)
3. Recent Work on Complexity Analysis
   3.1 Reference Code Analysis
   3.2 Reported Runtime
   3.3 Time-Stamp Counter
   3.4 Energy Consumption Analysis and Estimation
   3.5 Software Profilers
4. HEVC Encoders
5. Experimental Setup
6. References

List of Acronyms and Abbreviations

ALF: Adaptive Loop Filter.

AVC: Advanced Video Coding.

CABAC: Context Adaptive Binary Arithmetic Coding.

CTB: Coding Tree Block.

CTU: Coding Tree Unit.

CU: Coding Unit.

CB: Coding Block.

DBF: De-blocking Filter.

DCT: Discrete Cosine Transform.

DST: Discrete Sine Transform.

DVFS: Dynamic Voltage and Frequency Scaling.

GBD-PSNR: Generalized Bjøntegaard Delta PSNR.

HEVC: High Efficiency Video Coding.

HEVC-DM: HEVC Direct Mode.

HEVC-LM: HEVC Linear Mode.

HM: HEVC Test Model.

HP: Hierarchical Prediction.

JCT: Joint Collaborative Team.

JCT-VC: Joint Collaborative Team on Video Coding.

JM: H.264 Test Model.

JPEG: Joint Photographic Experts Group.

LCU: Largest Coding Unit.

MV: Motion Vector.

MC: Motion Compensation.

ME: Motion Estimation.

MPEG: Moving Picture Experts Group.

NSQT: Non-Square Quadtree.

PC: Prediction Chunking.

PU: Prediction Unit.

PB: Prediction Block.

PSNR: Peak Signal-to-Noise Ratio.

QP: Quantization Parameter.

RDO: Rate Distortion Optimization.

RDTSC: Read Time Stamp Counter.

RQT: Residual Quadtree.

SAO: Sample Adaptive Offset.

TB: Transform Block.

TSC: Time Stamp Counter.

TU: Transform Unit.

VCEG: Video Coding Experts Group.

Abstract

This project presents a performance evaluation study of coding efficiency versus computational complexity for the High Efficiency Video Coding (HEVC) [1] standard. A thorough experimental investigation was carried out to identify the tools that most affect the encoding efficiency and computational complexity of the HEVC encoder. A set of 16 different encoding configurations was created to investigate the impact of each tool, varying the encoding parameter set and comparing the results with a baseline encoder. The results show that, even though the computational complexity increases monotonically from the baseline to the most complex configuration, the encoding efficiency saturates at some point [2]. Moreover, these results provide relevant information for the implementation of complexity-constrained encoders by taking into account the tradeoff between complexity and coding efficiency. It is shown that low-complexity encoding configurations, defined by careful selection of coding tools, achieve coding efficiency comparable to that of high-complexity configurations.

  1. Objective of the Project

The objective of this project is to study the computational complexity and encoding performance of the HEVC encoder. Different encoding configurations will be tested on a wide variety of video contents and test sequences.

The trade-off between HEVC computational complexity and coding efficiency will be studied. Different coding tools will be enabled and disabled in combination to examine how the complexity and the coding efficiency are affected. From these results, the combination of configuration tools that maximizes encoding efficiency while minimizing complexity will be derived.

This will also identify which configuration tools contribute most to HEVC encoding efficiency and which have minimal effect on it.

The simulation will be conducted using the HM 16.4 software [3], with different video sequences [4] and configuration tools such as Search Range, Hadamard ME, Fast Merge Decision, Deblocking Filter, Sample Adaptive Offset, Adaptive Loop Filter, Transform Skipping, and other tools.
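As a rough illustration of how such a test matrix could be organized, the C++ sketch below builds configurations by enabling one additional tool at a time over a baseline, mirroring the incremental methodology described in the abstract. The tool names are illustrative placeholders only, not actual HM 16.4 configuration keys, which must be taken from the HM documentation.

```cpp
// Hypothetical sketch of organizing the encoding configurations to test.
// Tool names below are placeholders, not actual HM 16.4 option keys.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Coding tools whose impact on complexity and efficiency will be assessed.
    const std::vector<std::string> tools = {
        "HadamardME", "FastMergeDecision", "DeblockingFilter",
        "SampleAdaptiveOffset", "TransformSkipping"};

    // Enable one additional tool per configuration over a baseline.
    for (std::size_t n = 0; n <= tools.size(); ++n) {
        std::cout << "Configuration " << n << ": baseline";
        for (std::size_t i = 0; i < n; ++i) std::cout << " + " << tools[i];
        std::cout << '\n';
    }
}
```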

2. H.265 / High Efficiency Video Coding (HEVC)

2.1 Introduction

High Efficiency Video Coding (HEVC) is an international standard for video compression developed by a working group of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group). The main goal of the HEVC standard is to significantly improve compression performance compared to existing standards (such as H.264/Advanced Video Coding [7]), targeting a bit rate reduction of about 50% at similar visual quality [1].

HEVC is designed to address existing applications of H.264/MPEG-4 AVC and to focus on two key issues: increased video resolution and increased use of parallel processing architectures [1]. The first version primarily targets consumer applications, as pixel formats are limited to 4:2:0 8-bit and 4:2:0 10-bit. The next revision of the standard will enable new use cases by supporting additional pixel formats such as 4:2:2 and 4:4:4, bit depths higher than 10 bits [8], embedded bit-stream scalability, and 3D video [9].

2.2 Encoder and Decoder in HEVC

Source video, consisting of a sequence of video frames, is encoded or compressed by a video encoder to create a compressed video bit stream. The compressed bit stream is stored or transmitted. A video decoder decompresses the bit stream to create a sequence of decoded frames [10].

The video encoder performs the following steps:

  • Partitioning each picture into multiple units
  • Predicting each unit using inter or intra prediction, and subtracting the prediction from the unit
  • Transforming and quantizing the residual (the difference between the original picture unit and the prediction)
  • Entropy encoding transform output, prediction information, mode information and headers

The video decoder performs the following steps (a minimal code sketch of this encode/decode round trip follows the list):

  • Entropy decoding and extracting the elements of the coded sequence
  • Rescaling and inverting the transform stage
  • Predicting each unit and adding the prediction to the output of the inverse transform
  • Reconstructing a decoded video image
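The sketch below walks through the step order of both lists. All types and functions are hypothetical stubs standing in for real encoder and decoder modules, not the HM API; the bodies are deliberately trivial.

```cpp
// Minimal sketch of the encoder/decoder step order listed above.
// All names are hypothetical stubs, not the HM API.
struct Unit { /* samples of one partition of a picture */ };
struct Bitstream { /* entropy-coded output */ };

Unit predictFromDecodedData()               { return {}; } // inter or intra prediction
Unit subtract(const Unit&, const Unit&)     { return {}; } // residual = original - prediction
Unit transformQuantize(const Unit&)         { return {}; }
Unit rescaleInverseTransform(const Unit&)   { return {}; }
Unit add(const Unit&, const Unit&)          { return {}; }
void entropyEncode(Bitstream&, const Unit&) {}
Unit entropyDecode(const Bitstream&)        { return {}; }

void encodeUnit(Bitstream& bs, const Unit& original) {
    Unit pred     = predictFromDecodedData();        // inter or intra prediction
    Unit residual = subtract(original, pred);        // prediction error
    Unit coeffs   = transformQuantize(residual);     // transform and quantize
    entropyEncode(bs, coeffs);                       // plus modes and headers
}

Unit decodeUnit(const Bitstream& bs) {
    Unit coeffs   = entropyDecode(bs);               // parse the coded elements
    Unit residual = rescaleInverseTransform(coeffs); // rescale + inverse transform
    Unit pred     = predictFromDecodedData();        // uses the signaled modes
    return add(pred, residual);                      // reconstructed unit
}

int main() {
    Bitstream bs;
    encodeUnit(bs, Unit{});
    Unit reconstructed = decodeUnit(bs);
    (void)reconstructed;
}
```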

Figure 1 shows the block diagram of the HEVC CODEC [10]:

Figure 1: Block Diagram of HEVC CODEC [10].

2.3 Features of HEVC

The video coding layer of HEVC employs the same hybrid approach (inter-/intra-picture prediction and 2-D transform coding) used in all video compression standards. Figure 1 depicts the block diagram of such a hybrid video codec, which can create a bitstream conforming to the HEVC standard. An encoding algorithm producing an HEVC-compliant bitstream would typically proceed as follows. Each picture is split into block-shaped regions, with the exact block partitioning being conveyed to the decoder. The first picture of a video sequence (and the first picture at each clean random access point in a video sequence) is coded using only intra-picture prediction (which uses prediction of data spatially from region to region within the same picture but has no dependence on other pictures). For all remaining pictures of a sequence, or between random access points, inter-picture temporally predictive coding modes are typically used for most blocks.

The encoding process for inter-picture prediction consists of choosing motion data comprising the selected reference picture and motion vector (MV) to be applied for predicting the samples of each block. The encoder and decoder generate identical inter-picture prediction signals by applying motion compensation (MC) using the MV and mode decision data, which are transmitted as side information. The residual signal of the intra- or inter-picture prediction, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantized, entropy coded, and transmitted together with the prediction information. The encoder duplicates the decoder processing loop (the decoder elements within Figure 1) such that both will generate identical predictions for subsequent data. Therefore, the quantized transform coefficients are reconstructed by inverse scaling and are then inverse transformed to duplicate the decoded approximation of the residual signal. The residual is then added to the prediction, and the result of that addition may then be fed into one or two loop filters to smooth out artifacts induced by block-wise processing and quantization.
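As a simplified scalar illustration of this loop (ignoring HEVC's actual integer transforms and normative scaling), the example below quantizes one residual sample and then reconstructs it exactly as a decoder would, so encoder and decoder hold the same lossy approximation.

```cpp
// Simplified scalar illustration of the encoder's internal decoding loop.
// Real HEVC applies this per transform coefficient with normative scaling.
#include <cmath>
#include <cstdio>

int main() {
    const double original   = 127.0;
    const double prediction = 115.0;
    const double qstep      = 10.0;  // illustrative quantization step size

    double residual      = original - prediction;                       // 12.0
    int    level         = static_cast<int>(std::lround(residual / qstep)); // 1
    double reconResidual = level * qstep;                               // 10.0 (lossy)
    double recon         = prediction + reconResidual;                  // 125.0

    std::printf("level=%d recon=%.1f error=%.1f\n",
                level, recon, original - recon);
}
```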

The final picture representation (that is a duplicate of the output of the decoder) is stored in a decoded picture buffer to be used for the prediction of subsequent pictures. In general, the order of encoding or decoding processing of pictures often differs from the order in which they arrive from the source, necessitating a distinction between the decoding order (i.e., bitstream order) and the output order (i.e., display order) for a decoder. Video material to be encoded by HEVC is generally expected to be input as progressive scan imagery (either due to the source video originating in that format or resulting from de-interlacing prior to encoding). No explicit coding features are present in the HEVC design to support the use of interlaced scanning, as interlaced scanning is no longer used for displays and is becoming substantially less common for distribution. However, a metadata syntax has been provided in HEVC to allow an encoder to indicate that interlace-scanned video has been sent by coding each field (i.e., the even or odd numbered lines of each video frame) of interlaced video as a separate picture or that it has been sent by coding each interlaced frame as an HEVC coded picture. This provides an efficient method of coding interlaced video without burdening decoders with a need to support a special decoding process for it. The various features involved in hybrid video coding using HEVC are highlighted in the following subsections.

2.3.1 Coding tree units and coding tree block (CTB) structure: The core of the coding layer in previous standards was the macroblock, containing a 16×16 block of luma samples and, in the usual case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples; the analogous structure in HEVC is the coding tree unit (CTU), which has a size selected by the encoder and can be larger than a traditional macroblock. The CTU consists of a luma CTB and the corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be chosen as L = 16, 32, or 64 samples, with the larger sizes typically enabling better compression. HEVC then supports a partitioning of the CTBs into smaller blocks using a tree structure and quadtree-like signaling. The partitioning of CTBs into CBs ranging from 64×64 down to 8×8 is shown in Figure 2.

Figure 2: 64×64 CTBs split into CBs [13]
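To make the quadtree recursion concrete, the sketch below partitions a 64×64 CTB down toward 8×8 CBs. The split decision here is a placeholder; a real encoder chooses splits by rate-distortion cost.

```cpp
// Sketch of the recursive quadtree partitioning of a CTB into CBs,
// from 64x64 down to a minimum of 8x8.
#include <cstdio>

// Hypothetical stand-in for an RD-based split decision.
bool shouldSplit(int size) { return size > 16; }  // e.g., split until 16x16

void partition(int x, int y, int size) {
    if (size > 8 && shouldSplit(size)) {
        int h = size / 2;
        partition(x,     y,     h);  // visit the four quadrants
        partition(x + h, y,     h);
        partition(x,     y + h, h);
        partition(x + h, y + h, h);
    } else {
        std::printf("CB at (%d,%d), size %dx%d\n", x, y, size, size);
    }
}

int main() {
    partition(0, 0, 64);  // one 64x64 luma CTB
}
```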

2.3.2 Coding units (CUs) and coding blocks (CBs): The quadtree syntax of the CTU specifies the size and positions of its luma and chroma CBs. The root of the quadtree is associated with the CTU. Hence, the size of the luma CTB is the largest supported size for a luma CB. The splitting of a CTU into luma and chroma CBs is signaled jointly. One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU), as shown in Figure 3. A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).

Figure 3: CUs split into CBs [13]

2.3.3 Prediction units and prediction blocks (PBs): The decision whether to code a picture area using inter-picture or intra-picture prediction is made at the CU level. A PU partitioning structure has its root at the CU level. Depending on the basic prediction-type decision, the luma and chroma CBs can then be further split in size and predicted from luma and chroma prediction blocks (PBs), as shown in Figure 4. HEVC supports variable PB sizes from 64×64 down to 4×4 samples.

Figure 4: Partitioning of Prediction Blocks from Coding Blocks [13]
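The sketch below enumerates the inter PU partitionings HEVC defines for a 2N×2N CB, using the partition-mode labels from the specification; dimensions are shown for a 32×32 CB (N = 16). For the asymmetric modes, only the smaller PB size is printed, with the remainder of the CB forming the larger PB.

```cpp
// Sketch of HEVC's inter PU partition modes for a 2Nx2N coding block.
#include <cstdio>

int main() {
    const int s = 32;  // the 2Nx2N CB size in luma samples
    struct Mode { const char* name; int w, h, nParts; };
    const Mode modes[] = {
        {"PART_2Nx2N", s,     s,     1},
        {"PART_2NxN",  s,     s / 2, 2},
        {"PART_Nx2N",  s / 2, s,     2},
        {"PART_NxN",   s / 2, s / 2, 4},
        // Asymmetric motion partitions (1/4 + 3/4 splits); the size shown
        // is the smaller PB, the rest of the CB forms the larger one.
        {"PART_2NxnU", s,     s / 4, 2},
        {"PART_2NxnD", s,     s / 4, 2},
        {"PART_nLx2N", s / 4, s,     2},
        {"PART_nRx2N", s / 4, s,     2},
    };
    for (const Mode& m : modes)
        std::printf("%-11s -> %d PB(s), e.g. %2dx%2d\n",
                    m.name, m.nParts, m.w, m.h);
}
```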

2.3.4 Transform units (TUs) and transform blocks: The prediction residual is coded using block transforms. A TU tree structure has its root at the CU level. The luma CB residual may be identical to the luma transform block (TB) or may be further split into smaller luma TBs, as shown in Figure 5. The same applies to the chroma TBs. Integer basis functions similar to those of a discrete cosine transform (DCT) are defined for the square TB sizes 4×4, 8×8, 16×16, and 32×32. For the 4×4 transform of luma intra-picture prediction residuals, an integer transform derived from a form of discrete sine transform (DST) is alternatively specified.

Figure 5: Partitioning of Transform Blocks from Coding Blocks [13]
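A one-dimensional application of the 4-point integer core transform is sketched below. The matrix values are those defined in the HEVC standard, but the normative scaling and right-shift stages between the two 1-D passes of a real 2-D transform are omitted for brevity; the DST-derived 4×4 matrix used for intra luma residuals would be applied the same way.

```cpp
// 1-D application of HEVC's 4-point DCT-like integer core transform
// (normative scaling/shift stages omitted).
#include <cstdio>

int main() {
    // HEVC 4-point integer transform matrix.
    const int T[4][4] = {
        { 64,  64,  64,  64},
        { 83,  36, -36, -83},
        { 64, -64, -64,  64},
        { 36, -83,  83, -36},
    };
    const int residual[4] = {12, 10, -3, -5};  // one row of a 4x4 residual

    for (int i = 0; i < 4; ++i) {
        int coeff = 0;
        for (int j = 0; j < 4; ++j) coeff += T[i][j] * residual[j];
        std::printf("coefficient %d = %d\n", i, coeff);
    }
}
```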

2.3.5 Motion vector signaling: Advanced motion vector prediction (AMVP) is used, including derivation of several most probable candidates based on data from adjacent PBs and the reference picture. A merge mode for MV coding can also be used, allowing the inheritance of MVs from temporally or spatially neighboring PBs. Moreover, compared to H.264/MPEG-4 AVC [7], improved skipped and direct motion inference is also specified.

2.3.6 Motion compensation: Quarter-sample precision is used for the MVs, and 7-tap or 8-tap filters are used for interpolation of fractional-sample positions (compared to six-tap filtering of half-sample positions followed by linear interpolation for quarter-sample positions in H.264/MPEG-4 AVC). Similar to H.264/MPEG-4 AVC, multiple reference pictures are used, as shown in Figure 7. For each PB, either one or two motion vectors can be transmitted, resulting in uni-predictive or bi-predictive coding, respectively. As in H.264/MPEG-4 AVC, a scaling and offset operation may be applied to the prediction signal(s) in a manner known as weighted prediction.

Figure 6: Quadtree structure used for MVs [13]

Figure 7: Concept of multi-frame motion-compensated prediction [13]
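A one-dimensional sketch of half-sample luma interpolation follows; the coefficients {-1, 4, -11, 40, 40, -11, 4, -1} are the 8-tap half-sample luma filter from the HEVC standard. Note that the normative design keeps intermediate values at higher precision when both horizontal and vertical filtering are needed, which this single-pass sketch ignores.

```cpp
// Sketch of HEVC half-sample luma interpolation with the standard's
// 8-tap filter (coefficients sum to 64, hence the >> 6 normalization).
#include <algorithm>
#include <cstdio>

int main() {
    const int filter[8] = {-1, 4, -11, 40, 40, -11, 4, -1};
    // Integer-position reference samples around the half-sample position.
    const int ref[8] = {100, 102, 104, 110, 118, 120, 121, 121};

    int acc = 0;
    for (int i = 0; i < 8; ++i) acc += filter[i] * ref[i];
    int half = std::min(255, std::max(0, (acc + 32) >> 6)); // round, clip to 8-bit

    std::printf("interpolated half-sample value = %d\n", half);
}
```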

2.3.7 Intra-picture prediction: The decoded boundary samples of adjacent blocks are used as reference data for spatial prediction in regions where inter-picture prediction is not performed. Intra-picture prediction supports 33 directional modes (compared to eight such modes in H.264/MPEG-4 AVC [7]), plus planar (surface fitting) and DC (flat) prediction modes, as shown in Figures 8 and 9. The selected intra-picture prediction modes are encoded by deriving most probable modes (e.g., prediction directions) based on those of previously decoded neighboring PBs.

Figure 8: Directional Prediction Modes in H.264 [13]

Figure 9: Modes and directional orientations for intra picture prediction in HEVC [13]
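As a simple illustration, the DC mode predicts every sample of a block as the rounded average of the decoded boundary samples above and to the left. (HEVC additionally smooths some boundary samples of the DC prediction for small luma blocks; that refinement is omitted here.)

```cpp
// Sketch of DC (flat) intra prediction for a 4x4 block.
#include <cstdio>

int main() {
    const int N = 4;                          // 4x4 prediction block
    const int above[N] = {100, 101, 103, 104};
    const int left[N]  = { 98,  99, 100, 100};

    int sum = 0;
    for (int i = 0; i < N; ++i) sum += above[i] + left[i];
    const int dc = (sum + N) / (2 * N);       // rounded average -> 101

    // Every sample of the NxN block is predicted with the same DC value.
    std::printf("DC predictor = %d\n", dc);
}
```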

2.3.8 Quantization control: As in H.264/MPEG-4 AVC, uniform reconstruction quantization (URQ) is used in HEVC, with quantization scaling matrices supported for the various transform block sizes.
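Concretely, the quantization step size roughly doubles for every increase of 6 in the quantization parameter (QP), i.e., Qstep(QP) ≈ 2^((QP - 4) / 6), as in H.264/AVC. A quick numeric illustration over the QP values commonly used in test conditions:

```cpp
// Relation between QP and the approximate quantization step size.
#include <cmath>
#include <cstdio>

int main() {
    const int qps[] = {22, 27, 32, 37};  // common test QP values
    for (int qp : qps) {
        double qstep = std::pow(2.0, (qp - 4) / 6.0);
        std::printf("QP = %d -> Qstep ~= %.2f\n", qp, qstep);
    }
}
```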

2.3.9 Entropy coding: Context adaptive binary arithmetic coding (CABAC) is used for entropy coding. This is similar to the CABAC scheme in H.264/MPEG-4 AVC, but it has undergone several changes to increase its throughput (especially for parallel-processing architectures), improve its compression performance, and reduce its context memory requirements.

2.3.10 In-loop de-blocking filtering: A de-blocking filter similar to the one used in H.264/MPEG-4 AVC [7] is operated within the inter-picture prediction loop. However, the design is simplified in regard to its decision-making and filtering processes and is made more amenable to parallel processing.
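A simplified sketch of the boundary-strength (Bs) decision the filter makes for each block edge is given below; the normative derivation has more cases, but the main conditions are these.

```cpp
// Simplified sketch of the de-blocking boundary-strength decision in HEVC.
#include <cstdio>
#include <cstdlib>

struct BlockInfo {
    bool intra;           // coded with intra-picture prediction
    bool hasCodedCoeffs;  // non-zero transform coefficients present
    int  refIdx;          // reference picture index
    int  mvx, mvy;        // motion vector in quarter-sample units
};

int boundaryStrength(const BlockInfo& p, const BlockInfo& q) {
    if (p.intra || q.intra) return 2;           // strongest filtering
    if (p.hasCodedCoeffs || q.hasCodedCoeffs) return 1;
    if (p.refIdx != q.refIdx) return 1;         // different reference pictures
    if (std::abs(p.mvx - q.mvx) >= 4 ||         // MVs differ by >= 1 luma sample
        std::abs(p.mvy - q.mvy) >= 4) return 1;
    return 0;                                   // edge is not filtered
}

int main() {
    BlockInfo p{false, false, 0, 8, 0};
    BlockInfo q{false, false, 0, 16, 0};        // MVs differ by 2 luma samples
    std::printf("Bs = %d\n", boundaryStrength(p, q));
}
```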

2.3.11 Sample adaptive offset (SAO): A nonlinear amplitude mapping is introduced within the inter-picture prediction loop after the de-blocking filter. Its goal is to better reconstruct the original signal amplitudes by using a look-up table that is described by a few additional parameters that can be determined by histogram analysis at the encoder side.
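A sketch of the band-offset variant of SAO for 8-bit video follows: the amplitude range is divided into 32 bands of width 8, and offsets are applied to four consecutive bands signaled by the encoder. The band position and offset values below are arbitrary illustrations.

```cpp
// Sketch of SAO band offset for 8-bit samples (32 bands of width 8).
#include <algorithm>
#include <cstdio>

int main() {
    const int startBand  = 12;              // signaled by the encoder
    const int offsets[4] = {2, 1, -1, -2};  // offsets for 4 consecutive bands

    const int samples[] = {95, 100, 104, 130, 200};
    for (int s : samples) {
        int band = s >> 3;                  // 256 / 32 = 8 values per band
        int out  = s;
        if (band >= startBand && band < startBand + 4)
            out = std::min(255, std::max(0, s + offsets[band - startBand]));
        std::printf("in=%3d band=%2d out=%3d\n", s, band, out);
    }
}
```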

3. Recent Work on Complexity Analysis

Several works have been published in the last few years addressing the problem of complexity analysis of video encoders [18] and decoders [23]. Generally, these works evaluate the relationship between compression efficiency and complexity, mainly focusing on computational complexity (processing time or instruction-level profiling). Processing time has been widely used to measure the computational complexity of video coding systems [18], but other approaches, such as theoretical analysis [19], direct analysis of reference code [21], and power consumption evaluation [18], have also been frequently applied.

3.1 Reference Code Analysis: Lehtoranta and Hamalainen [21] analyzed the reference code of an MPEG-4 encoder, evaluating its processing modules by counting their equivalent number of RISC-like instructions. The overall complexity of the MPEG-4 encoder was then compared to that of H.263 and H.264/AVC encoders. Chang-Guo et al. [32] also analyzed the C code implementation of an MPEG decoder, using the number of basic processor instructions to estimate the computational complexity measured in millions of operations per second (MOPS). The code size and complex implementation structures of the reference software in modern video encoders/decoders make this kind of direct analysis of computer code extremely difficult and inaccurate, which encourages the use of other methods and tools.

3.2 Reported Runtime: Other works measure computational complexity simply based on the encoding and decoding reports generated by the reference software of each standard, which generally include processing time information. As an example of this approach, Xiang et al. [33] report on a method called generalized Bjøntegaard delta PSNR (GBD-PSNR) for evaluating coding efficiency that takes into consideration the bit rate, the image distortion, and the processing time. The computational complexity of each of the main H.264/AVC encoding modules is also theoretically estimated and experimentally evaluated in [19], considering all coding mode possibilities. The experimental results in [19] and [33] are based on the runtime reported by the reference software. Although this type of information provides a rough indication of encoding complexity, an accurate analysis using a controlled test environment and specific tools is necessary to support valid conclusions.
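As a minimal illustration of runtime-based measurement, wall-clock timing can be wrapped around the encoding call, as sketched below. Here encodeSequence() is a hypothetical stand-in for running the encoder under test (e.g., the HM reference software); a rigorous study would complement such measurements with time-stamp counters or software profilers, as discussed in the following subsections.

```cpp
// Minimal sketch of wall-clock runtime measurement around an encoder run.
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for invoking the encoder on a test sequence.
void encodeSequence() { /* run the encoder under test */ }

int main() {
    auto start = std::chrono::steady_clock::now();
    encodeSequence();
    auto stop = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    std::printf("encoding time: %lld ms\n", static_cast<long long>(ms.count()));
}
```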