PROJECT PROPOSAL
Topic: Advanced Image Coding
By
Radhika Veerla
Under the guidance of Dr. K. R. Rao
TABLE OF ACRONYMS
AIC advanced image coding
AVC advanced video coding
BMP bit map format
CABAC context adaptive binary arithmetic coding
DCT discrete cosine transform
DWT discrete wavelet transform
EBCOT embedded block coding with optimized truncation
EZW embedded zero-tree wavelet coding
FRExt fidelity range extensions
HD-photo high-definition photo
I-frame intra frame
JM joint model
JPEG joint photographic experts group
JPEG-LS joint photographic experts group lossless coding
JPEG-XR joint photographic experts group extended range
LBT lapped bi-orthogonal transform
MSE mean square error
PGM portable graymap
PNM portable any map
PPM portable pixel map
PSNR peak signal to noise ratio
SSIM structural similarity index
VLC variable length coding
LIST OF FIGURES
Figure
1 The process flow of the AIC encoder and decoder
2 YCbCr sampling formats - 4:4:4, 4:2:2 and 4:2:0
3 Different prediction modes used for prediction in AIC
4 The specific coding parts of the profiles in H.264
5 Basic coding structure for a macroblock in H.264/AVC
6 Block diagram for CABAC
7 Diagram for zig-zag scan and scan line order
8 Block diagram of JPEG encoder and decoder
9 Structure of JPEG 2000 codec
10 Tiling, DC level shifting, color transformation, DWT of each image component
11 Block diagram of JPEG-XR encoder and decoder
12 JPEG-LS block diagram
13 A causal template of LOCO-I
14 SSIM measurement system
Implementation of AIC based on I-frame only coding in H.264 and comparison with other still frame image coding standards such as JPEG, JPEG 2000, JPEG-LS, JPEG-XR
Objective:
It is proposed to implement advanced image coding (AIC) based on I-frame only coding using the JM software and to compare the results with other image compression techniques such as JPEG, JPEG 2000, JPEG-LS, JPEG-XR, Microsoft HD photo and H.263 I-frame coding. Coding simulations will be performed on various sets of test images. Experimental results are to be measured in terms of bit rate and quality (PSNR, SSIM, etc.). This project considers only the main and (FRExt) high profiles in H.264/AVC I-frame coding, JPEG using the baseline method, and JPEG 2000 in non-scalable but optimal mode.
Introduction:
The aim of AIC [1] is to provide better quality with a reduced level of complexity while optimizing readability and clarity. Though its aim is not to optimize speed, it is faster than many of the JPEG 2000 codecs [10]. H.264 technology aims to provide good video quality at considerably low bit rates and at a reasonable level of complexity, while providing flexibility for a wide range of applications [2]. Coding efficiency is further improved in the fidelity range extensions (FRExt) using an 8x8 integer transform, which works well for more complex visual content. JPEG [15] is the first still image compression standard; it uses 8x8 block based DCT decomposition. JPEG 2000 is a wavelet-based compression standard which has improved coding performance over JPEG, with additional features like scalability and lossless coding capability, and it performs best with smooth spatial data. JPEG performs well in low complexity applications whereas JPEG 2000 works well in high complexity, lower bit-rate applications. JPEG 2000 has a rate-distortion advantage over JPEG. Microsoft HD photo [19] is a new still-image compression algorithm for continuous-tone photographic images which either maintains the highest image quality or delivers the best performance. JPEG-XR [16] (extended range), a standard for HD photo, has high dynamic-range image coding and performance as its most desirable features. Its performance is close to that of JPEG 2000, with computational and memory requirements close to those of JPEG. With half the file size of JPEG, HD photo delivers a lossy compressed image with better perceptual quality than JPEG, and a lossless compressed image 2.5 times smaller than the original image. JPEG-LS [30] (lossless) is an ISO/ITU-T standard for lossless coding of still images. In addition, it also provides support for "near-lossless" compression. The main goal of JPEG-LS is to deliver a low complexity solution for lossless image coding with the best possible compression efficiency.
JPEG uses Huffman coding, the H.264/AVC and AIC systems adopt the CABAC encoding technique, and HD photo uses a reversible integer-to-integer mapping lapped bi-orthogonal transform [7]. LOCO-I (low complexity lossless compression for images), the algorithm underlying JPEG-LS, uses adaptive prediction, context modeling and Golomb coding. It supports near-lossless compression by allowing a fixed maximum sample error. Transcoding converts the H.263 compression format to that of H.264 and vice versa. If the transcoding is done in the compressed domain, it gives better results as the computation only needs to be performed on compressed data.
Although the above mentioned compression techniques were developed for different signals, they work well for still image compression and are hence worthwhile for comparison. Different software packages, namely the AIC reference software, JM software for H.264 [17], JPEG reference software [18] for JPEG, HD-photo reference software [19], JasPer [20] for JPEG 2000 and JPEG-LS reference software [30], are used for comparison between the different codecs. The evaluation will be done using bit rates, quality assessment metrics such as PSNR and SSIM, and complexity.
The following topics are discussed in this proposal. AIC is described in detail as it is the codec being implemented, and the various other codecs used for comparison are described in brief. The different settings used in the software packages and the evaluation methodology are discussed. A few results obtained by evaluating different test images, and test images of different sizes, using the AIC reference software are included.
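The bit-rate and PSNR measurements mentioned above can be sketched in a few lines. This is only an illustrative sketch, assuming 8-bit images held as NumPy arrays; the function names (mse, psnr, bits_per_pixel) are hypothetical helpers and are not taken from any of the reference software packages.

```python
import numpy as np

def mse(reference, test):
    """Mean square error between two images of identical shape."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    if ref.shape != tst.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((ref - tst) ** 2))

def psnr(reference, test, peak=255.0):
    """Peak signal to noise ratio in dB (peak=255 assumes 8-bit samples)."""
    err = mse(reference, test)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / err))

def bits_per_pixel(file_size_bytes, width, height):
    """Bit rate of a compressed file expressed in bits per pixel."""
    return 8.0 * file_size_bytes / (width * height)
```

For example, a 512x512 image compressed to 32768 bytes corresponds to 1.0 bit per pixel.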
Advanced Image Coding
Advanced image coding (AIC) is a still image compression system which combines the algorithms of the H.264 and JPEG standards, as shown in Fig.1, in order to achieve the best compression capability in terms of quality factor with less complexity. The performance of AIC is close to that of JPEG 2000 and considerably better than that of JPEG. AIC uses intra-frame block prediction, originally used in H.264, to reduce the number of bits needed to code the original input. Both AIC and H.264 use CABAC coding, while AIC uses the position of the coefficient in the matrix as the context [1].
It is observed that each block in AIC is modified to get the best compression efficiency possible.
Fig.1: The process flow of the AIC encoder and decoder [1].
Overview:
The color conversion from RGB to YCbCr allows better compression in channels as chrominance channels have less information content. Then each channel is divided into 8x8 blocks for prediction. Prediction uses 9 modes and is based on previously encoded and decoded blocks. Chrominance channels use the same prediction modes as the corresponding blocks in luminance. Entropy is reduced further when the DCT is applied to the residual blocks. CABAC is used for encoding the bit stream; it uses contexts in which commonly occurring prediction modes and DCT coefficients take fewer bits than rarely used prediction modes and coefficients [1].
Color conversion:
The color conversion from RGB to YCbCr allows better compression in channels as chrominance channels have less information content. AIC achieves a higher quality/compression ratio without the use of sub-sampling, which is employed by H.264 and JPEG. This is possible with the use of block prediction and binary arithmetic coding. AIC uses the 4:4:4 format shown in Fig.2. Sub-sampling has a negative impact on image quality.
Fig.2: YCbCr sampling formats - 4:4:4, 4:2:2 and 4:2:0 [33]
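The color conversion step can be illustrated as follows. This is a sketch only: the proposal does not state which conversion matrix AIC uses, so the full-range BT.601 (JFIF) coefficients below are an assumption, and the function names are hypothetical. No sub-sampling is applied, matching the 4:4:4 format.

```python
import numpy as np

# Assumed matrix: full-range BT.601 coefficients as used by JFIF.
# The matrix actually used by the AIC reference software may differ.
_FORWARD = np.array([
    [ 0.299,     0.587,     0.114   ],   # Y
    [-0.168736, -0.331264,  0.5     ],   # Cb
    [ 0.5,      -0.418688, -0.081312],   # Cr
])
_OFFSET = np.array([0.0, 128.0, 128.0])  # centers Cb and Cr for 8-bit data

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) RGB array to YCbCr, keeping full 4:4:4 resolution."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb @ _FORWARD.T + _OFFSET

def ycbcr_to_rgb(ycbcr):
    """Inverse of rgb_to_ycbcr (floating point, no rounding or clipping)."""
    ycbcr = np.asarray(ycbcr, dtype=np.float64)
    return (ycbcr - _OFFSET) @ np.linalg.inv(_FORWARD).T
```

A neutral gray pixel maps to Y equal to its gray level with both chrominance values at the 128 offset, which illustrates why the chrominance channels usually carry less information.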
Block prediction:
Each channel is divided into 8x8 blocks for prediction. Each 8x8 block is encoded in scan line order, from left to right and top to bottom. H.264 supports 4x4 and 16x16 block prediction algorithms, whereas AIC uses the 4x4 block algorithms extended to the 8x8 block case. Prediction is performed using all previously encoded and decoded blocks. Both H.264 and AIC use 9 prediction modes to predict the current block, shown in Fig.3. The mode which gives the minimum difference between the original and predicted block is chosen. Prediction needs information about the neighboring pixels, so the first block cannot be predicted from previous blocks; the DC mode is used for this purpose. The prediction modes chosen for Y are also used for Cb and Cr in order to reduce complexity. Residual blocks are obtained by subtracting the predicted block from the original block.
AIC – Block Prediction Implementation Details:
Different modes used for block prediction are shown in Fig.3.
Mode 0: Vertical
Mode 1: Horizontal
Mode 2: DC
Mode 3: Diagonal Down-Left
Mode 4: Diagonal Down-Right
Mode 5: Vertical-Right
Mode 6: Horizontal-Down
Mode 7: Vertical-Left
Mode 8: Horizontal-Up
Fig.3: Different prediction modes used for prediction in AIC [1]
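The mode selection described above can be sketched for three of the nine modes (vertical, horizontal and DC). This is a simplified illustration, not the AIC reference implementation: the sum of absolute differences is assumed as the "minimum difference" measure, the DC mode is assumed to fall back to 128 when no neighbors exist, and the names predict and best_mode are hypothetical.

```python
import numpy as np

N = 8  # AIC block size

def predict(mode, top, left):
    """Predict an NxN block from the row above (top) and the column to the left.

    top and left are length-N arrays of reconstructed pixels, or None when the
    neighbor is unavailable. Returns None if the mode cannot be used.
    """
    if mode == 0:  # vertical: every row repeats the pixels above
        return None if top is None else np.tile(top, (N, 1))
    if mode == 1:  # horizontal: every column repeats the pixels to the left
        return None if left is None else np.tile(left.reshape(N, 1), (1, N))
    if mode == 2:  # DC: mean of the available neighbors (128 assumed if none)
        neighbors = [a for a in (top, left) if a is not None]
        dc = np.mean(np.concatenate(neighbors)) if neighbors else 128.0
        return np.full((N, N), dc)
    raise ValueError("only modes 0, 1 and 2 are sketched here")

def best_mode(block, top, left):
    """Return (mode, sad, residual) for the mode with the smallest SAD."""
    block = np.asarray(block, dtype=np.float64)
    top = None if top is None else np.asarray(top, dtype=np.float64)
    left = None if left is None else np.asarray(left, dtype=np.float64)
    best = None
    for mode in (0, 1, 2):
        pred = predict(mode, top, left)
        if pred is None:
            continue
        sad = float(np.abs(block - pred).sum())
        if best is None or sad < best[1]:
            best = (mode, sad, block - pred)
    return best
```

For the first block of an image both neighbors are missing, so only the DC mode is available, which matches the behavior described in the text.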
DCT and Quantization:
DCT is applied on each 8x8 residual block. DCT has a property of energy compaction. Uniform quantization is applied without actually discarding the bits. Quality level setting is nothing but setting the amount of quantization. AIC uses floating point algorithms to produce the best quality images.
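A floating point 8x8 DCT followed by uniform quantization can be sketched as below. The single quantization step size and the function names are assumptions made for illustration; the proposal does not specify how the AIC quality setting maps to a step size.

```python
import numpy as np

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n).reshape(n, 1)   # frequency index
    i = np.arange(n).reshape(1, n)   # sample index
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row has a different scale factor
    return c

C = dct_matrix()

def forward(block, step):
    """2-D DCT of a residual block followed by uniform quantization."""
    coeffs = C @ np.asarray(block, dtype=np.float64) @ C.T
    return np.rint(coeffs / step).astype(np.int64)

def inverse(levels, step):
    """Dequantize and apply the inverse 2-D DCT."""
    return C.T @ (np.asarray(levels, dtype=np.float64) * step) @ C
```

A constant block illustrates the energy compaction property: all of its energy ends up in the single DC coefficient, and a larger step size (lower quality setting) leaves fewer non-zero levels to encode.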
In JPEG, shown in Fig.8, the DCT coefficients are transmitted in the zig-zag order shown in Fig.7(a), rather than the scan-line order shown in Fig.7(b) that is employed by AIC. Zig-zag scanning reorders the coefficients to form runs of zeros, which can be encoded using run length coding. CABAC does not need the coefficients to be reordered, so run length encoding is not needed.
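The two scan orders can be compared with a short sketch. The helper names are hypothetical, and the example block is made up purely to show how zig-zag scanning lengthens the final run of zeros when the non-zero coefficients sit in the low-frequency corner.

```python
def zigzag_order(n=8):
    """(row, col) visiting order of the JPEG-style zig-zag scan."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run top to bottom,
        # even diagonals run bottom to top.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def scan_line_order(n=8):
    """(row, col) visiting order of a plain left-to-right, top-to-bottom scan."""
    return [(r, c) for r in range(n) for c in range(n)]

def scan(block, order):
    """Read the coefficients of a 2-D block in the given order."""
    return [block[r][c] for r, c in order]

def trailing_zeros(sequence):
    """Length of the final run of zeros in a scanned coefficient sequence."""
    count = 0
    for value in reversed(sequence):
        if value != 0:
            break
        count += 1
    return count
```

With non-zero coefficients only in the top-left 2x2 corner, the zig-zag scan ends in a longer run of zeros than the scan-line order does.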
CABAC:
The resulting prediction modes and DCT coefficients obtained from the above processes must be stored in a stream. AIC uses CABAC algorithms to minimize the size of the bit stream. CABAC uses different contexts to encode symbols.
Arithmetic coding can spend a fractional number of bits per symbol and outperforms Huffman coding, but it is more complex and slower. The position of a coefficient in the matrix can serve as the context, because after the DCT there is a high probability of zero coefficients in the high-frequency region. The different contexts AIC uses are prediction-prediction mode, prediction mode, coefficient map, last coefficient, coefficient greater than 1, absolute coefficient value and coded block [1].
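The benefit of spending a fractional number of bits per symbol can be shown with a small calculation. The probability used in the example (a high-frequency coefficient being zero 95% of the time) is a made-up figure for illustration, not a measured one.

```python
import math

def binary_entropy(p_zero):
    """Ideal average code length, in bits per symbol, of a binary source.

    An arithmetic coder can approach this value, whereas a Huffman code over
    a two-symbol alphabet always spends at least 1 bit per symbol.
    """
    if p_zero in (0.0, 1.0):
        return 0.0
    p_one = 1.0 - p_zero
    return -(p_zero * math.log2(p_zero) + p_one * math.log2(p_one))
```

For the assumed probability of 0.95, the ideal cost is about 0.29 bits per symbol, well below the 1 bit per symbol floor of a binary Huffman code.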
H.264 standard
H.264 or MPEG-4 part 10 aims at coding video sequences at approximately half the bit rate compared to MPEG-2 at the same quality. It also aims at having significant improvements in coding efficiency using CABAC entropy coder, error robustness and network friendliness. Parameter set concept, arbitrary slice ordering, flexible macroblock structure, redundant pictures, switched predictive and switched intra pictures have contributed to error resilience/robustness of this standard. Adaptive (directional) intra prediction (Fig.3) is one of the factors which contributed to the high coding efficiency of this standard [2].
Fig.4: The specific coding parts of the profiles in H.264 [2]
Each profile specifies a subset of the entire bitstream syntax, together with limits, that shall be supported by all decoders conforming to that profile. There are three profiles in the first version: baseline, main, and extended. The main profile is designed for digital storage media and television broadcasting. The H.264 main profile, which is a subset of the high profile, was designed with compression coding efficiency as its main target. The fidelity range extensions [3] provide a major breakthrough with regard to compression efficiency. The profiles are shown in Fig. 4.
There are four High profiles defined in the fidelity range extensions: High, High 10, High 4:2:2, and High 4:4:4. The High profile supports 8-bit video with 4:2:0 sampling for applications using high resolution. The High 10 profile supports 4:2:0 sampling with up to 10 bits of representation accuracy per sample. The High 4:2:2 profile supports up to 4:2:2 chroma sampling and up to 10 bits per sample. The High 4:4:4 profile supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and it additionally supports efficient lossless region coding [2].
H.264/AVC Main Profile Intra-Frame Coding:
The main difference between H.264/AVC main profile intra-frame coding and JPEG 2000 is in the transformation stage. The characteristics of this stage also decide the quantization and entropy coding stages. H.264 uses block based coding, shown in Fig. 5, which is like the block translational model employed in the inter-frame coding framework [7]. A 4x4 transform block size is used instead of 8x8. H.264 exploits spatial redundancies using intra-frame prediction of the macroblock from the neighboring pixels of the same frame, thus taking advantage of inter-block spatial prediction.
The combination of spatial prediction and a wavelet-like two-level transform iteration is effective in smooth image regions. This feature enables H.264 to be competitive with JPEG 2000 in high resolution, high quality applications. JPEG is not competitive, even though it also uses DCT-based block coding. A DCT coding framework is competitive with wavelet transform coding if the correlation between neighboring pixels is properly exploited using context adaptive entropy coding.
In H.264, after transformation, the coefficients are scalar quantized, zig-zag scanned and entropy coded by CABAC. The alternative entropy coder, CAVLC, operates by switching between different VLC tables, which are designed using exponential Golomb codes [32], based on locally available contexts collected from neighboring blocks; it is used at the cost of some coding efficiency [2].
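The exponential Golomb codes mentioned above can be illustrated with a short sketch of the order-0 unsigned code. Bits are represented as character strings for readability, and the function names are hypothetical; this is not an excerpt from the JM software.

```python
def exp_golomb_encode(value):
    """Order-0 unsigned exponential Golomb code word for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]              # value + 1 written in binary
    return "0" * (len(binary) - 1) + binary  # prefixed by len - 1 zeros

def exp_golomb_decode(bits):
    """Decode one code word from the front of a bit string.

    Returns (value, remaining_bits).
    """
    zeros = 0
    while bits[zeros] == "0":                # count the zero prefix
        zeros += 1
    end = 2 * zeros + 1                      # total length of the code word
    return int(bits[zeros:end], 2) - 1, bits[end:]
```

Small values receive short code words (0 is coded as "1", 1 as "010"), so the code suits sources in which small values are the most frequent.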
H.264/AVC FRExt High Profile Intra-Frame Coding:
Main feature in FRExt that improves coding efficiency is the 8x8 integer transform- and all the coding methods as well as prediction modes associated with adaptive selection between 4x4 and 8x8 integer transforms. Other features include [3, 7]
- higher resolution for color representation such as YUV 4:2:2 and YUV 4:4:4, shown in Fig.2.
- addition of 8x8 block size is a key factor in very high resolution, high bit rates
- achieve very high fidelity – even for selective lossless representation of video
Fig.5: Basic coding structure for a macroblock in H.264/AVC [2].
Context-based Adaptive Binary Arithmetic Coding (CABAC):
CABAC uses arithmetic coding in order to achieve good compression. The CABAC encoding process, shown in Fig. 6, consists of three elementary steps [11].
Fig.6: Block diagram for CABAC [8]
Step 1: binarization – Non-binary symbols are mapped into a binary sequence before being passed to the arithmetic coder.
Step 2: context modeling – A probability model is selected for one or more elements based on previously encoded syntax elements.
Step 3: binary arithmetic coding – Elements are encoded based on the selected probability model.
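The three steps can be sketched in a heavily simplified form. This is not the CABAC of the H.264 standard, which uses standardized binarization schemes, a table-driven probability state machine and a range coder. Here unary binarization, a frequency-count model and an ideal cost of -log2(p) bits per bin stand in for the three stages, and selecting the context by bin position is an assumption made for illustration.

```python
import math

def unary_binarize(value):
    """Step 1: map a non-negative integer to bins, e.g. 3 -> [1, 1, 1, 0]."""
    return [1] * value + [0]

class AdaptiveContext:
    """Step 2: an adaptive probability model for one context."""

    def __init__(self):
        self.counts = [1, 1]  # smoothed counts of the bins 0 and 1 seen so far

    def probability(self, bin_value):
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value):
        self.counts[bin_value] += 1

def estimate_bits(values, max_contexts=3):
    """Step 3 stand-in: total ideal arithmetic-coding cost of all bins, in bits."""
    contexts = [AdaptiveContext() for _ in range(max_contexts)]
    total = 0.0
    for value in values:
        for position, bin_value in enumerate(unary_binarize(value)):
            # Assumed context selection: one model per bin position.
            ctx = contexts[min(position, max_contexts - 1)]
            total += -math.log2(ctx.probability(bin_value))
            ctx.update(bin_value)
    return total
```

As the model adapts to a skewed source, the estimated cost per symbol drops well below one bit, which a fixed one-bit-per-bin code cannot achieve.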
JPEG
JPEG is the first ISO/ITU-T standard for continuous tone still images [15]. It allows lossy and lossless coding of still images. JPEG gives good compression results for lossy compression with the least complexity. There are several modes defined for JPEG, including baseline, progressive and hierarchical. The baseline mode, which supports lossy compression alone, is the most popular. An average compression ratio of 15:1 is achieved using lossy coding with the help of DCT-block based compression. Lossless coding is made possible with predictive coding compression techniques, which include differential coding, run length coding and Huffman coding. JPEG employs uniform quantization with HVS weighting. Zig-zag scanning is performed on the quantized coefficients since it allows entropy coding to be performed in order from low frequency to high frequency components [15].
Fig.7(a): Zig-zag scan [15] Fig.7(b): Scan line order [1]
The process flow of the JPEG baseline (lossy) algorithm is shown in Fig.8.
Fig.8(a): Block diagram of JPEG encoder (b): Block diagram of JPEG decoder [15]
The process flow starts with the color conversion for color images followed by 8x8 block based DCT (process flow starts here for gray scale images), quantization, zig-zag ordering, and entropy coding using Huffman tables in the encoding process and vice versa for decoding process. Different quantization matrices are used for luminance and chrominance components. Quality factor ‘Q’ is set using quantization tables and different kinds of artifacts in varied ranges are observed [15].
JPEG2000
JPEG 2000 [10] is an image compression standard which supports lossy and lossless compression of gray scale or color images. In addition to the compression capability, JPEG 2000 offers excellent low bit rate performance without sacrificing performance at high bit rates, region of interest coding, and EBCOT (embedded block coding with optimized truncation), which overcomes the limitations of EZW (embedded zero-tree wavelet coding), namely in random access to specific regions of the image and in error resilience. It also supports a flexible file format and progressive decoding of the image, from lossy to lossless, by fidelity and by resolution. It is a transform based framework that uses wavelet based decomposition. The wavelet transform offers a 3 dB improvement over DCT based compression [14]. Lossless compression results from the transform and entropy coding stages. We consider the non-scalable, single layer mode since the scalability feature has an adverse effect on rate-distortion performance. We also disable the tiling mode because it likewise lowers rate-distortion performance. Tiling allows the image to be partitioned into non-overlapping rectangular tiles that are encoded independently [7].
Fig.9: Structure of JPEG 2000 codec. The structure of the (a) encoder and (b) decoder [22]
Fig.10: Tiling, DC level shifting, color transformation, DWT of each image component [9]
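The wavelet decomposition step can be illustrated with one level of the reversible 5/3 lifting transform, the integer filter JPEG 2000 uses for lossless coding. The sketch below handles a single one-dimensional, even-length signal with symmetric extension at the boundaries; the function names are hypothetical and the code is not taken from JasPer.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform.

    x is an even-length list of integers. Returns (low, high) sub-bands.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch needs an even-length signal")
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: odd sample minus the floor-average of its even neighbors.
    high = [odd[n] - ((even[n] + even[min(n + 1, half - 1)]) >> 1)
            for n in range(half)]
    # Update step: even sample plus a rounded quarter of the neighboring details.
    low = [even[n] + ((high[max(n - 1, 0)] + high[n] + 2) >> 2)
           for n in range(half)]
    return low, high

def dwt53_inverse(low, high):
    """Exact inverse of dwt53_forward: undo the update step, then the predict step."""
    half = len(low)
    even = [low[n] - ((high[max(n - 1, 0)] + high[n] + 2) >> 2)
            for n in range(half)]
    odd = [high[n] + ((even[n] + even[min(n + 1, half - 1)]) >> 1)
           for n in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x
```

Because every lifting step uses integer arithmetic and is undone exactly by the inverse, the original samples are recovered without loss. A two-dimensional decomposition would apply the same transform to the rows and then to the columns of each image component.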
JPEG XR
JPEG XR [16] is a coded file format designed mainly for storage of continuous-tone photographic content. It supports a wide range of color formats, including n-channel encodings using fixed and floating point numerical representations, and a variety of bit depths, enabling a wide range of data compression scenarios. The ultimate goal is to support a wide range of color encodings, maintain forward compatibility with existing formats and keep device implementation simple. It also aims at providing the same algorithm for lossless as well as lossy compression.