Fast Inter-Prediction Mode Decision Algorithm for H.264 Video Encoder

3/4

Fast Inter-prediction Mode Decision Algorithm for H.264 Video Encoder

Dongil Han, Amruta Kulkarni and K.R.Rao

3/4

Abstract—The latest video compression standard, H.264 (also known as MPEG-4 Part 10/AVC for Advanced Video Coding), has proven to be superior to earlier standards in terms of compression ratio, quality, bit rates and error resilience. All this is achieved at the expense of increased encoder complexity. The algorithm used in this paper is a computationally efficient mode prediction and selection approach which has the following features: (a) both the spatial and temporal information is used to achieve early termination using adaptive thresholds, (b) inclusion of a modulator capable of trading off computational efficiency and accuracy and (c) a homogenous region detection procedure for 8x8 blocks based on adaptive thresholds.

Results obtained using QCIF and CIF format video sequences based on JM17.2 reference software using H.264baseline profile are shown to prove the computational efficiency of the proposed algorithm.

I. INTRODUCTION

MPEG-4 Part 10 or AVC (Advanced Video Coding) –also called as H.264 is a standard for video compression, and is currently one of the most commonly used formats for the recording, compression, and distribution of high definition video[16]. The standard uses more advanced coding features such as quarter pixel motion compensation, adaptive deblocking filter and variable block size mode selection [1, 2, 5, 6, 10, 11]. The prediction methods supported by H.264 are more flexible than those in previous standards, enabling accurate predictions and hence efficient video compression. Without compromising the image quality, an H.264 encoder can reduce the size of a digital video file by more than 80% compared with the Motion JPEG[] format and as much as 50% more than with the MPEG-4 Part 2 standard.

In order to improve coding efficiency, in intra prediction, the intra mode is checked on I4x4 and I16x16, to support 9 and 4 different directional predictions (I is intra). The temporal (inter) prediction varies from block sizes 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 to 4x4, where small size blocks correspond to detailed regions or large motion areas while larger size blocks correspond to homogenous regions, or relatively stationary or small motion areas. In the implementation of the H.264/AVC codec appearing in JM17.2[6], when conducting mode decision for one macroblock, P16x16 is first checked, followed by P16x8 and so on (P is prediction). For each mode, a rate distortion-based mode selection is carried out to obtain the best candidate mode which has the minimum rate distortion denoted by J. Though the rate distortion cost addresses coding efficiency, price to pay is increased encoder complexity which makes it difficult to implement for real time applications and on handheld devices, with limited battery-life. In this paper, we have implemented a FAT (Fast Adaptive Termination) [8] algorithm for mode selection which provides improvement over the existing JM 17.2 [6] implementation.

II. FAST ADAPTIVE EARLY TERMINATION ALGORITHM FOR MODE SELECTION

H.264/AVC is a block-oriented motion compensation based standard, in which video coding is performed on macroblocks from the up left to bottom right corner. Generally, the spatial macroblocks in the same frame have the similar characteristics such as motion, detailed region, etc. For example, if most of the neighboring macroblocks have skip mode, it is quite likely that the current macroblock is having the same mode. Temporal similarity also exists between the co-located macroblocks in the previous encoded frame and current frame.

Fig. 1 Encoder block diagram of H.264[2].

FAT for mode decision exploits statistical similarity between current macro block and predicted macroblock. Predicted mode is obtained from the spatial and temporal macroblocks. For accuracy, the rate distortion cost is checked against Adaptive Threshold I and Adaptive Threshold II

Adaptive Threshold I: RD thres = RD pred x (1-8xβ)

Adaptive Threshold II: RD thres = RD pred x (1+10xβ)

where, β is the modulator

If the predicted mode is less than P8x8, it is checked if the current macroblock is homogeneous or not. Further partitioning is done into 8x4, 4x8 and 4x4 blocks, if the current macroblock is not homogenous.

A mode histogram from spatial and temporal neighboring macroblocks is obtained; then the best mode as the index corresponding to the maximum value in the mode histogram is selected.

The average rate-distortion cost of each neighboring macroblock corresponding to the best mode is then selected as the prediction cost for the current macroblock.

III. FAT ALGORITHM [8]

· Step 1: If current macroblock belongs to I slice, check for intra prediction using I4x4 or I16x16, go to step 10 else go to step 2.

· Step 2: If a current macroblock belongs to the first macroblock in P slice check inter and intra prediction modes, go to step 10 else go to step 2.

· Step 3: Compute mode histogram from neighboring spatial and temporal macroblocks, go to step 4.

· Step 4: Select prediction mode as the index corresponding to maximum in the mode histogram and obtain values of Adaptive Threshold I and Adaptive Threshold II, go to step 5.

· Step 5: Always check over P16x16 mode and check the conditions in the skip mode, if the conditions of skip mode are satisfied go to step 10, otherwise go to step 6.

Fig.2 Flowchart for FAT algorithm [8]

· Step 6 : If all left, up , up-left and up-right have skip modes, then check the skip mode against, Adaptive Threshold I if the rate distortion is less than Adaptive Threshold I , the current macroblock is labeled as skip mode and go to step 10, otherwise, go to step 7.

· Step 7: First round check over the predicted mode; if the predicted mode is P 8x8, go to step 8; otherwise, check the rate distortion cost of the predicted mode against Adaptive Threshold I. If the RD cost is less than Adaptive Threshold I, go to step 10; otherwise go to step 9.

· Step 8: If a current P8x8 is homogeneous, no further partition is required. Otherwise, further partitioning into smaller blocks 8x4, 4x8, 4x4 is performed. If the RD of P8x8 is less than Adaptive Threshold I, go to step 10; otherwise go to step 9.

· Step 9 : Second round check over the remaining modes against Adaptive Threshold II : If the rate distortion is less than Adaptive Threshold II; go to step 10; otherwise continue checking all the remaining modes, go to step 10.

· Step 10: Save the best mode and rate distortion cost.

IV. RESULTS

Original JM 17.2 and FAT algorithm are tested on 16 test sequences [13] (8 CIF and 8 QCIF)

Fig. 3 Akiyo test sequence for CIF and QCIF [17].

Time reduction (%)

CIF Sequences:

i) Total encoding time required for all CIF sequences using original JM 17.2 reference software

(for QP=22, 27, 32, and 37): 4535.043 seconds

ii) Total encoding time required for all CIF sequences using FAT algorithm

(for QP=22, 27, 32, and 37): 2562.531 seconds

Thus, % reduction in time required to encode the CIF sequences is 43.5%

QCIF Sequences:

i) Total encoding time required using original JM 17.2 reference software for all QCIF sequences (for QP=22, 27, 32, and 37): 2573.801 seconds

ii) Total encoding time required using modified JM 17.2 reference software for all QCIF sequences using FAT algorithm (for QP=22, 27, 32, and 37): 1350.269 seconds

Thus, % reduction in time required to encode the QCIF sequences is 47.538%

Table 1. Comparison of PSNR vs bit rate between JM 17.2 and FAT algorithm

Fig. 4. PSNR vs Bit rate for akiyo_cif sequence, 30 FPS, 30 Frames encoded

Table 2. Comparison of encoding time between

JM 17.2 and FAT algorithm

Fig. 5. Encoding time vs QP for

akiyo_cif sequence, 30 FPS, 30 Frames encoded

QP / Original (Y SSIM) / Optimized (Y SSIM ) / ∆SSIM %
22 / 0.9825 / 0.9822 / 0.030534351
27 / 0.9781 / 0.978 / 0.01
32 / 0.9562 / 0.9555 / 0.07
37 / 0.9273 / 0.9269 / 0.04

Table 3. Comparsion of SSIM between JM17.2 and FAT algorithm

Table 4. Comparison of PSNR vs bit rate between JM 17.2 and FAT algorithm

QP / Bit rate / PSNR / Bit Rate / PSNR
Original / Original / Optimized / Optimized
(kbps) / (dB) / (kbps) / (dB)
22 / 95.61 / 42.58 / 95.14 / 42.493
27 / 53.07 / 39.701 / 53.52 / 39.622
32 / 47.4 / 36.159 / 32.73 / 36.863
37 / 20.55 / 34.233 / 20.91 / 34.141

Fig. 6. PSNR vs Bit rate for akiyo_qcif sequence, 30 FPS, 30 Frames encoded

Table 5. Comparison of encoding time between

JM 17.2 and FAT algorithm

Fig. 7. Encoding time vs QP for

akiyo_qcif sequence, 30 FPS, 30 Frames encoded

Table 6. Comparison of SSIM between JM 17.2

and FAT algorithm

V. CONCLUSIONS

To achieve time complexity reduction in inter prediction, a fast adaptive termination mode selection algorithm; named FAT [8] is used for implementing H.264 video encoder.

The algorithm involves- i) Initial prediction based on mode histogram from neighboring macroblocks ii) Early termination using adaptive thresholds iii) A refinement if the distortion cost remains larger than the thresholds.

Experimental results reported on different video sequences and comparison with open source code (JM17.2)[6] indicate that the algorithm used achieves faster computation speed with a negligible loss in video quality and very little increment in bit rate.

VI. ACKNOWLEDGEMENT

This work (Grants No. 000355930109) was supported by Business for Cooperative R&D between Industry, Academy, and Research Institute funded Korea Small and Medium Business Administration in 2009.

VII. REFERENCES

[1] ITU-T Rec. H.264 | ISO/IEC 14496-10: Information Technology – Coding of Audio-visual Objects, Part 10: Advanced Video Coding 2002.

[2] T.Wiegand, et al, “Overview of the H.264/AVC Video Coding Standard.” IEEE Trans. Circuits and Syst. for Video Technol., Vol. 13, pp. 560-576, July. 2003.

[3] Z.Chen, et al, “Fast Integer Pixel and Fractional Pixel Motion Estimation for JVT.” Doc. #JVT-F017, Dec. 2002.

[4] B.Hsieh, et al, “Fast Motion Estimation for H.264/MPEG-4 AVC by Using Multiple Reference Frame Skipping Criteria.” VCIP 2003, Proceedings of SPIE, Vol. 5150, pp. 1551-1560, Oct. 2003.

[5] A.Puri, et al, “Video coding using the H.264/MPEG-4 AVC compression standard”, Signal Processing: Image Communication, vol.19, pp. 793-849, Oct. 2004.

[6] H.264/AVC JM software: http://iphome.hhi.de/suehring/tml/

[7] An overview of the H.264 encoder: www.vcodex.com

[8] J. Ren, et al, “Computationally efficient mode selection in H.264/AVC video coding”, IEEE Trans. on Consumer Electronics, vol.54, pp.877 – 886, May 2008.

[9] J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 454 – 465, Springer-Verlag Berlin Heidelberg, 2006.

[10] I. Richardson, “The H.264 advanced video compression standard” –second edition, Wiley, 2010.

[11] Soon-kak Kwon, et al “Overview of H.264/MPEG-4 part 10”, J. VCIR, vol.17, pp. 186-216, April 2006.

[12] J. Kim, et al “Complexity Reduction Algorithm for Intra Mode Selection in H.264/AVC Video Coding”, J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 454 – 465, 2006.Springer-Verlag Berlin Heidelberg, 2006.

[13] YUV test video sequences : http://trace.eas.asu.edu/yuv/

[14] “http://en.wikipedia.org/wiki/bidirectional prediction”, bidirectional prediction.

[15] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

[16] T.Wiegand and G.J. Suvillan ,” The picture phone is here, Really”, IEEE spectrum, vol. 48, pp.50-54,Sept. 2011.