Image Compression Tutorial
Chapter 2 Existing Image Coding Standards
After introducing the basics of each component in image compression flow chart, it is time to introduce two of the most well-known image compression standards JPEG and JPEG 2000. In this section, we will introduce the basic functional blocks in these two image compression standards.
2.1The JPEG Standard
JPEG is the first international digital image compression standard for still images, both grayscale and color, and many of the later image and video compression standards are mainly based on the basic structure of JPEG. The JPEG standard defines two basic compression methods. One is the DCT-Based compression method, which is the lossy compression, and another is the Predictive compression method, which is the lossless compression. The DCT-Based compression method is known as the JPEGBaseline, which is the most widely used image compression format in the World Wide Web and digital camera. In this subsection, we will introduce both the lossy and lossless JPEG compression methods.
2.1.1 The JPEG Encoder
Fig. 2.1 shows the DCT-Based JPEG encoder. The YCbCr color transform and the chrominance downsampling is not defined in the JPEG standard, but most JPEG compression software will perform these processing because it makes the JPEG encoder to reduce the data quantity more efficiently. The downsampled YCbCr passing through the JPEG encoder will be encoded as bitstream afterward.
Fig. 2.1 The DCT-Based encoder of JPEG compression standard
2.1.2 Transform Coding – Discrete Cosine Transform
As we know, the natural image contents change slowly across the image in a small region, so the JPEG encoder will segment the original image into several small 8×8 blocks, which are called Macroblocks. The encoder will perform DCT on these macroblocks and the frequency transform coefficients will be obtained. Because the transform are performed on the 8×8 blocks, the forward DCT formula in (2.1) can be rewritten as follows
(2.1)
2.1.3 Quantization
The second step after DCT is quantization. As just mentioned, the human visual is more sensitive to the low frequency components than the high frequency components, so the quantization will assign large quantization step size to the high frequency components to discard the redundant information and reduce the data quantity and assign small quantization step size to the low frequency components to save the significant information. The JPEG standard defines two quantization tables, one for the luminance and another for the chrominance. The luminance quantization table and chrominance quantization table are defined in (2.2) and (2.3), respectively.
(2.2)
(2.3)
With the quantization table, we can encode the DCT coefficients as
(2.4)
The quantization step side is smaller in the left top direction because the low frequency components locate. The right bottom direction components are assigned larger quantization step size because the high frequency components locate. On the other hand, it can be observed that most of the high frequency quantization step sizes for chrominance are large and the same, this is due to the characteristics of the human visual system. The human visual system is more sensitive to the luminance information than the chrominance information, so the distortion of the high frequency chrominance components is not detected by the human eyes easily. Thus, in general, we will assign large quantization step size to the high frequency components in the chrominance signal.
2.1.4 Differential Coding of the DC Components
After quantization, we will separate the DC components from the AC components because large correlation still exists between the DC components in the neighboring macroblocks. Assume the DC component in i-th macroblock is DCi and the DC component in the previous block is DCi-1. One example is shown in Fig. 2.2.
Fig. 2.2Differential Coding of the DC components in the neighboring blocks
We will perform differential coding on the two DC components as follows
(2.5)
We set DC0 = 0. DC of the current block DCi will be equal to DCi-1 + Diffi. Therefore, in the JPEGfile, the first coefficient is actuallythe difference of DCs.Then the difference is encoded with the AC coefficients.
2.1.5 Zigzag Scanning of the AC Coefficients
However, the quantizated and differential coded coefficients are 2D signal and the structure seems not suitable for the entropy encoder. Thus, the JPEG standard perform zigzag scanning on the AC coefficients to convert it to the 1D signal and order the sequence from low frequency to high frequency. The zigzag scan order is illustrated in Fig. 2.3. It is worth noting that the zigzag scanning is performed on the 63 AC coefficients only, exclusive of the DC coefficient.
Fig. 2.3Zigzag scanning order
2.1.6 Run Length Coding of the AC Coefficients
Because we assign larger quantization step size to the high frequency components, most of the high frequency components will be truncated to zero while coarse quantization (strong quantization) is applied. In order to take advantage of this property, the JPEG standard defines run length coding to convert the AC coefficients to more efficient format for the entropy encoder. The RLC step replaces the quantized values by
(2.6)
where RUNLENGTH is the number of zeros, and VALUE is the nonzero coefficients.
For example, assume the quantized 63 AC coefficients are as follows
(2.7)
Then we can perform RLC on this vector and we will have
(2.8)
where EOB means that the remaining coefficients are all zeros.
2.1.7 Entropy Coding
The DC and the AC coefficients after run length coding finally undergo the entropy coding. Because it takes large computation to grow the optimal Huffman tree for the input image and the dynamic Huffman tree must be included in the bitstream, the JPEG defines the two Huffman tables for the DC and AC coefficients, respectively. These Huffman coding tables save large amount of computation to encode the images and the Huffman tree is not necessarily included in the encoded bitstream because the default Huffman tree defined in the JPEG standard is fixed. The codeword table for the value of the DC and AC coefficients is shown in Table 2.1.
Table 2.1The codeword table for VALUE of the coefficients
Values / Bits for the value / Size0 / 0
-1,1 / 0,1 / 1
-3,-2,2,3 / 00,01,10,11 / 2
-7,-6,-5,-4,4,5,6,7 / 000,001,010,011,100,101,110,111 / 3
-15,...,-8,8,...,15 / 0000,...,0111,1000,...,1111 / 4
-31,...,-16,16,...31 / 00000,...,01111,10000,...,11111 / 5
-63,...,-32,32,...63 / 000000,...,011111,100000,...,111111 / 6
-127,...,-64,64,...,127 / 0000000,...,0111111,1000000,...,1111111 / 7
-255,..,-128,128,..,255 / ... / 8
-511,..,-256,256,..,511 / ... / 9
-1023,..,-512,512,..,1023 / ... / 10
-2047,...,-1024,1024,...,2047 / ... / 11
Each differential coded DC coefficients will be encoded as
(2.9)
where SIZE indicates how many bits are needed to represent the VALUE of the coefficient, and VALUE is the actual encoded binary DC coefficients. SIZE must be Huffman coded while VALUE is not.
For example, if we have four DC coefficients 150, 5, -6, 3, then the DC coefficients will be turned into (2.10) according to Table 2.1.
(2.10)
After the value of SIZEis obtained, we encode them with the Huffman table defined in Table 2.2, so we can finally encode all the DC coefficients as
(2.11)
Table 2.2 The Huffman table for the SIZE information of the luminance DC coefficients
Size / Values / Code Length0 / 00 / 2
1 / 010 / 3
2 / 011 / 3
3 / 100 / 3
4 / 101 / 3
5 / 110 / 3
6 / 1110 / 4
7 / 11110 / 5
8 / 111110 / 6
9 / 1111110 / 7
10 / 11111110 / 8
11 / 111111110 / 9
The AC coefficients can be encoded with similar Huffman Table. The Huffman table for the AC coefficients is shown in Table 2.3. The codeword format of the AC coefficients is defined as follows
(2.12)
Table 2.3 The Run/Size Huffman table for the luminance AC coefficients
Run/Size / code length / code word0/0 (EOB) / 4 / 1010
15/0 (ZRL) / 11 / 11111111001
0/1 / 2 / 00
...
0/6 / 7 / 1111000
...
0/10 / 16 / 1111111110000011
1/1 / 4 / 1100
1/2 / 5 / 11011
...
1/10 / 16 / 1111111110001000
2/1 / 5 / 11100
...
4/5 / 16 / 1111111110011000
...
15/10 / 16 / 1111111111111110
With Table 2.1, the AC coefficients after RLC in (2.8) can be encode as follows
(2.13)
Finally we must encode the Run/Length data in (2.13) with the Huffman table defined in Table 2.3, so the final bitstream for the AC coefficients can be encoded as
(2.14)
2.1.8 The JPEG Decoder
The encoded bitstream can be converted back to the spatial image coefficients through an inverse process. The structure of the JPEG decoder is shown in Fig. 2.4. It is easy to reconstruct the image at the decoder.
Fig. 2.4 The DCT-Based decoder of JPEG compression standard
However, the quantization operation is a many-to-one mapping, so the compression in baseline JPEG is lossy operation. The reconstructed image is just an approximation of the original image. On the other hand, the JPEG segment the original image into several macroblocks and perform transform coding and quantization on each blocks. This will introduce the very serious artifact called blocking effect, which results from coarse quantization that discards most of the high frequency components of each segmented macroblock. One example of the blocking effect is shown in Fig. 2.5.
Fig. 2.5 (a) The original image / (b) The highly compressed image2.2The Lossless JPEG
In addition to the lossy DCT-Based compression method, the JPEG standard also defines the lossless encoder for compressing the image and achieving perfect reconstruction. The encoder of the lossless JPEG is shown in Fig. 2.6.
Fig. 2.6 The encoder of the lossless JPEG
2.2.2 Predictor of Lossless JPEG
In order to illustrate the predictor in lossless JPEG, we must refer to Fig. 2.7.
Fig. 2.73-Sample Prediction Neighborhood
The syntax X represents the pixel to be predicted, and A, B, C are its neighboring pixels. The function of the predictor is defined in Table 2.4, and we can choose one from the eight prediction modes. The predictive value X will be subtracted from the actual value of X afterward. It is worth noting that the sample X can be predicted from the left and upper samples because of the decoding order. At the decoder, we will decode the coefficients from left to right and from top to bottom. Therefore, we cannot predict the sample X from the samples in the right and lower sides, which are not being decoded yet.
Table 2.4 The function of the predictor
Selection Value / Prediction0 / No prediction
1 / A
2 / B
3 / C
4 / A+B-C
5 / A+[(B-C)/2]
6 / B+[(A-C)/2]
7 / (A+B)/2
The residue of the sample X will be encoded with the Huffman table defined in the JPEG standard afterward. Since there is no quantization operation in the lossless JPEG and the predictor functions are composed of the ADD, SUBTRACT and SHIFT(divided by 2), there is no information loss in the lossless JPEG.
2.3The JPEG 2000 Standard
With the expansion of the high requirements of multimedia and Internet applications, the JPEG cannot fulfill the demand gradually. Therefore, the new call for the new standard JPEG 2000 was launched to solve many increasing shortcomings of the existing image compression standards. The features of JPEG 2000 include: (1) Efficient lossy and lossless compression with the same coding platform. (2) Superior image quality. (3) Additional features such as spatial scalability and region of interest. The JPEG 2000 encoder and decoder are shown in Fig. 2.8 and Fig. 2.9, respectively. We will introduce each functional blocks in the rest of this subsection.
Fig. 2.8The block diagram of the JPEG 2000encoder
Fig. 2.9The block diagram of the JPEG 2000 decoder
2.3.1 Forward Component Transform
As mentioned in the previous sections, we must perform color space transform to separate the luminance components from the chrominance components for efficient processing, and the forward component transform is aim to accomplish this work. In JPEG 2000, two types of component transform are defined: the reversible component transform (RCT) and the irreversible component transform (ICT). The RCT is reversible and integer-to-integer, while the ICT is irreversible and real-to-real.
The forward RCT is defined as (2.15) and we can find that it is floating-point transform. Since the RCT is irreversible, it can be used in the lossy compression only.
(2.15)
where , , and are the red, green, and blue planes, respectively. , , and are the Y, Cb, and Cr planes, respectively.
The ICT is defined in (2.16) to (2.18). It is obvious that the transform is integer transform, so it can be applied to both lossy and lossless transform.
(2.16)
(2.17)
(2.18)
Where , , , , , and are defined the same as in (2.15).
2.3.2Discrete Wavelet Transform
In JPEG 2000, the Discrete Wavelet Transform is exploited to reduce the correlation between pixels. Before DWT, each image components will be partitioned into several tiles as shown in Fig. 2.10. The tiling operation will result into the loss of the image quality, but each tile can be encoded and decoded separately. This property makes JPEG 2000 enable the applications of region of interest (ROI), which can provide higher quality in some region we are interested in. On the other hand, the arithmetic codes suffer from the noise in the channel, and the tiling operation enable the decoder to reconstruct the image even parts of the tiles cannot be recovered.
Fig. 2.10Tiling, and DWT of each image tile
After tiling, we perform 2D DWT on each tile to split it into several frequency subbands. The DWT used in JPEG 2000 can be reversible or irreversible. The default irreversible DWT defined in JPEG 2000 is the Daubechies 9/7 filter, and the analysis and synthesis filter coefficients are listed in Table 2.5. Since the 9/7 filter coefficients are real, the forward and inverse transform are both irreversible, so it can be applied in lossy compression only. On the other hand, the reversible DWT defined in JPEG 2000 is the 5/3 filter, and the analysis and synthesis filter coefficients are listed in Table 2.6. Because the 5/3 filter in JPEG 2000 is reversible, it can be applied in both lossy and lossless compression.
Table 2.5 Daubechies 9/7 analysis filter coefficients
Analysis Filter Coefficients / Synthesis Filter Coefficientsn / Lowpass Filter / Highpass Filter / Lowpass Filter / Highpass Filter
0 / 0.602949018236 / 1.115087052456 / 1.115087052456 / 0.6029490182363
±1 / 0.266864118442 / -0.059127176311 / 0.591271763114 / -0.2668641184428
±2 / -0.078223266528 / -0.057543526228 / -0.057543526228 / -0.0782232665289
±3 / -0.0168641184428 / 0.091271763114 / -0.0912717631142 / 0.0168641184428
±4 / 0.026748757410 / 0.0267487574108
Table 2.6 5/3 analysis filter coefficients
Analysis Filter Coefficients / Synthesis Filter Coefficientsn / Lowpass Filter / Highpass Filter / Lowpass Filter / Highpass Filter
0 / 6/8 / 6/8 / 1 / 6/8
±1 / 2/8 / 2/8 / 1/2 / -2/8
±2 / -1/8 / -1/8 / -1/8
2.3.3 Quantization
After transform coding, we must perform quantization to reduce the precision of the data. In JPEG 2000, different quantization is performed on different subbands of each tile. The largest characteristic of the JPEG 2000 quantization is that it can achieve lossless and lossy quantization by specifying different parameters. We represent the wavelet coefficient in subband b as, and we can quantize it as
(2.19)
where is the quantization step size and. is the nominal dynamic range of subband b, which depends on the largest value in that subband. The exponent and the mantissa are the other two parameters to adjust the strength of quantization.
We can achieve lossy and lossless quantization in JPEG 2000 by means of adjusting the parameters
- Lossless Quantization: If , , then the step side . Thus, lossless quantization can be achieved.
- Lossy Quantization: If lossy quantization is applied, then we must specify and in the following ranges - and .
2.3.4 Tier-1 Encoder
The entropy encoder adopted in JPEG 2000 is called Embedded Block Coding with Optimized Truncation, abbreviated EBCOT. We can divide the operation of EBCOT into two steps: Tier-1 and Tier-2. The Tier-1 encoder is shown in Fig. 2.11, and it is composed of three parts: bit-plane conversion, context formation, and the arithmetic coding. Each part of the tier-1 encoder will be introduced in the rest of this sub-section.
Fig. 2.11 The block diagram of the tier-1 encoder
1) Bit-plane Conversion
This is the first part of the tier-1 encoder, and it converts the quantized wavelet coefficients into several bit-planes. The first bit-plane is the sign plane, which is composed of the sing bit of each wavelet coefficients. The other planes are the magnitude plane, from MSB to LSB. The bit-plane conversion is shown in the left of Fig. 2.12. After the bit-planes are obtained, we must determine the significance of every bit. We take the right figure of Fig. 2.13 for example. A bit is viewed as insignificant before the first nonzero bit is met from MSB to LSB, which is marked white; otherwise it is viewed as significant, which is marked gray. The bit 1 marked green is the first nonzero bit at position [x,y].We define the significance of the bit in (x,y) in bit-plane P as vP[x,y], and vP[x,y]=1 if it is significant; else vP[x,y]=0.
Fig. 2.12 Bit-plan scanning and the significant and insignificant bits
2) Fractional Bit-plane Coding
After the significance of each bit is marked, the encoder can generate the contextual information for each bit. With the contextual information, the arithmetic encoder can compress the data more efficiently. Before encoding, the each subband will be partitioned into several non-overlapping rectangles called code blocks. After the code blocks are obtained, the data in each code block can be encoded separately from the other code blocks. The order of the coding in each code block is shown in Fig. 2.13, the scanning order is from top to down, from left to right, and from the higher bit-plane to the lower bit-plane. After 64 bits are scanned in the code block, the coding position will jump to the next four bits in the vertical direction, and repeat the same scanning scheme.
Fig. 2.13The scanning order of the bit to be coded in a code block
After we know the scanning order of the bits, we can generate the contextual information based on the adjacent bits around it. The adjacent bits will form a context window as shown in Fig. 2.14, and the bit marked X is the current one to be coded. The contextual information is generated by four coding methods: zero coding, sign coding, magnitude refinement coding, and run length coding. Each coding will generate the Context (CX) and the Decision (D), which are used for arithmetic coding. The four coding methods needs two variables σ and φ, and both of them are initialized to 0 in each bit. σ[x,y] is 1 after the first nonzero bit at [x,y] is met, and φ[x,y] is just the sign bit at position [x,y] in the bit plane.