A Quick Illustration of JPEG 2000

Fall 2003 ECE533 Final Project Proposal

Department of Electrical and Computer Engineering

University of Wisconsin-Madison

By

______

Kim-Huei Low
Data Fok

Submitted to: Professor Hu

Dec 12th 2003

1. Introduction

The international JPEG (Joint Photographic Experts Group) and JBIG (Joint Bi-level Image Experts Group) committees, which represent a wide variety of companies and academic institutions worldwide, have created a new image coding system that uses state-of-the-art compression techniques based on wavelet technology. This standard is called JPEG 2000, and its architecture is intended to lend itself to a wide range of uses, from portable digital cameras through to advanced pre-press, medical imaging and other key sectors.

The JPEG 2000 standard has 11 parts [2], of which part 1, the core coding system, is now published as an International Standard. Parts 2-6 are complete or nearly complete, and parts 8-11 are still under development. A Java software implementation of the standard (JJ2000) is also available [1]; it implements all of part 1 of the standard.

Although both the standard and the tools are available to the public, the standard is complex, especially for amateurs who are just beginning to learn about image coding, so we feel that there is a need for a quick tutorial on JPEG 2000. This paper is therefore written to give new users a grasp of JPEG 2000.

2. Approach

The most straightforward way to present a quick tutorial of JPEG 2000 is to illustrate and examine its features one by one in detail. In order to do that, we familiarized ourselves with the standard and the tools. At first, we had difficulties understanding some of the algorithms and specifications in the standard. However, after several iterations of reading and with the help of the Java tool set, we were able to figure out all the features in JPEG 2000.

We will present our work in the same order as the JPEG 2000 final committee draft. We will briefly explain each section of the standard, illustrate each feature, discuss its applications, and list the pros and cons. For visualization purposes, most of the time we encode images at an extremely low bit rate to show noticeable differences even with small display images. The encoding bit rates are tuned with very fine granularity, based on the feature that we are illustrating.

3. Experiments, Results & Discussions

Annex A, Annex C and Annex D of the JPEG 2000 standard will not be examined, because these sections do not describe features. Annex A defines the headers and markers, which are simply sets of constants used to represent an image efficiently. Although it is essential to understand Annex C (Arithmetic Entropy Coding) and Annex D (Coefficient Bit Modeling), to a user these are internal algorithms; feature-wise, they do not play an important role.

3.1 Annex B: Data Ordering

3.1.1 Tile division

In JPEG 2000, an image offset and a tile offset are often given to specify the upper left corner of the desired cropped image. It is expensive to load a huge image into hardware and encode it all at once. Therefore, images are typically broken down into multiple tiles that are encoded independently; tiling is designed for this purpose, and the Discrete Wavelet Transform (DWT) is then applied to each tile on its own.
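
To make the tile division concrete, the following is a minimal Java sketch that splits a grayscale image, stored as a 2-D int array, into fixed-size tiles so that each tile could be transformed and encoded independently. The image and tile offsets of the standard are omitted for brevity, and the tile size is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: divide a grayscale image into tiles that can be
// encoded independently. The tile size is illustrative; the reference-grid
// offsets defined by the standard are not modeled here.
public class TileDivision {

    // Copy the tile whose upper-left corner is (x0, y0); tiles at the
    // right/bottom edges may be smaller than tileW x tileH.
    static int[][] extractTile(int[][] img, int x0, int y0, int tileW, int tileH) {
        int h = Math.min(tileH, img.length - y0);
        int w = Math.min(tileW, img[0].length - x0);
        int[][] tile = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                tile[y][x] = img[y0 + y][x0 + x];
        return tile;
    }

    static List<int[][]> divideIntoTiles(int[][] img, int tileW, int tileH) {
        List<int[][]> tiles = new ArrayList<>();
        for (int y0 = 0; y0 < img.length; y0 += tileH)
            for (int x0 = 0; x0 < img[0].length; x0 += tileW)
                tiles.add(extractTile(img, x0, y0, tileW, tileH));
        return tiles;
    }

    public static void main(String[] args) {
        int[][] img = new int[256][384];                 // dummy 384 x 256 image
        List<int[][]> tiles = divideIntoTiles(img, 128, 128);
        System.out.println("number of tiles: " + tiles.size());  // 6
    }
}
```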

A tile-component is the part of a tile belonging to a single component. For example, an RGB image would be broken down into an R tile-component, a G tile-component and a B tile-component. Each tile-component is then further divided into different resolutions and sub-bands with the use of the DWT. Each resolution is divided into multiple precincts, which identify a geometric position within a tile-component of an image. Furthermore, each sub-band at each resolution is divided into multiple code-blocks, which will be coded into individual packets.

Shown below is an illustration of the selection of code-blocks for encoding, assuming only two levels of DWT decomposition. Due to the cropping of a tile-component, not all precinct partitions and code-blocks are included for coding. A precinct is included only if the entire precinct falls within the cropped region of the tile-component. Moreover, only code-blocks that overlap the designated precincts are included for coding, as shown in Figure 3.1.1-4.

Figure 3.1.1-1: Original DWT
Figure 3.1.1-2: Precinct Selection
Figure 3.1.1-3: Sub-band Selection
Figure 3.1.1-4: Code-block Selection

3.1.2 Progression Order

For a given tile, each packet contains data from a specific layer, a specific component, a specific resolution, and a specific precinct. The order in which these packets are interleaved is called the progression order. The interleaving of the packets can progress along four axes: layer, component, resolution and precinct.

As shown below, progression in layer and resolution results in a sharper image as the bit rate increases, while progression in precinct results in an overall clearer image, as more precincts/portions of the image are decoded. We also show that progression in component results in less color distortion and increasing contrast as more components are decoded.

There are altogether five progression types defined in the JPEG 2000 standard. They are listed below; a sketch of the corresponding packet interleaving loops follows the list.

1) Layer-Resolution-Component-Position Progressive

- All positions are encoded before all components before all resolutions before all layers.

- Image quality is reduced before any color components or parts of the image are thrown away.

2) Resolution-Layer-Component-Position Progressive

- Visually the same effects as Layer-Resolution-Component-Position Progressive.

3) Resolution-Position-Component-Layer Progressive

- In general, the same effects as Layer-Resolution-Component-Position Progressive.

- Better quality, since the layer has the highest priority to be coded, trading off some portions of the image.

4) Position-Component-Resolution-Layer Progressive

- All layers are encoded before all resolutions before all components before all positions.

- Results in loss of parts of image as positions are truncated to achieve the target bit rate.

5) Component-Position-Resolution-Layer Progressive

- Similar to Position-Component-Resolution-Layer Progressive.

- Loss of color components occurs before any parts of the image are truncated.
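
To make the notion of progression order concrete, the sketch below enumerates the packet order implied by the nested loops of the LRCP and RLCP progressions for a toy code-stream. The loop counts are made-up values, and the real ordering in the standard also depends on how precincts are laid out within each resolution; this is only meant to show why the leading letters of the name survive truncation best.

```java
// Minimal sketch: packet interleaving order for two progression types.
// The loop nesting follows the name, outermost first (e.g. LRCP = for
// each Layer, for each Resolution, for each Component, for each
// Precinct/position). The counts below are toy values for illustration.
public class ProgressionOrderDemo {
    static final int LAYERS = 2, RESOLUTIONS = 2, COMPONENTS = 3, PRECINCTS = 2;

    static void lrcp() {
        System.out.println("LRCP (Layer-Resolution-Component-Position):");
        for (int l = 0; l < LAYERS; l++)
            for (int r = 0; r < RESOLUTIONS; r++)
                for (int c = 0; c < COMPONENTS; c++)
                    for (int p = 0; p < PRECINCTS; p++)
                        System.out.printf("  packet L%d R%d C%d P%d%n", l, r, c, p);
    }

    static void rlcp() {
        System.out.println("RLCP (Resolution-Layer-Component-Position):");
        for (int r = 0; r < RESOLUTIONS; r++)
            for (int l = 0; l < LAYERS; l++)
                for (int c = 0; c < COMPONENTS; c++)
                    for (int p = 0; p < PRECINCTS; p++)
                        System.out.printf("  packet L%d R%d C%d P%d%n", l, r, c, p);
    }

    public static void main(String[] args) {
        lrcp();   // truncating the tail of this order drops the last quality layers first
        rlcp();   // truncating the tail of this order drops the highest resolutions first
    }
}
```

Because the outermost loop changes slowest, whichever axis appears first in the name is the one best preserved when packets at the end of the code-stream are truncated to meet a target bit rate.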

For illustration, we have grouped these progression types into three basic types: layer or resolution progressive, component progressive, and position progressive. We decode these images with decreasing bit rate to observe the progression differences. Note that the target bit rate must be achieved through packet truncation.

With layer or resolution progression, as shown in Figure 3.1.2-1, as the bit rate decreases the image first becomes blurry, i.e. loses layer or resolution, and then becomes colorless, i.e. loses components, as the bit rate gets extremely low. In the other case, shown in Figure 3.1.2-2 with component progression, the image loses its color components before its layer or resolution degrades. This can be verified by examining the 0.5 bpp image, which still retains all the obvious details while all the color components have completely vanished.

Figure 3.1.2-1: 1bpp, 0.5bpp, 0.05bpp and 0.01bpp J2K Image with Layer or Resolution Progression

Figure 3.1.2-2: 1bpp, 0.5bpp, 0.1bpp and 0.01bpp J2K Image with Component Progression

Figure 3.1.2-3: 1bpp, 0.5bpp and 0.1bpp J2K Image with Position Progression

As for position progression, as shown in the second image in Figure 3.1.2-3, the bottom right strawberry becomes blurry while the top center strawberry still retains its high resolution. A similar observation can be made in the third image, where the top center strawberry is split into two portions, clear on the left side and blurry on the right side. This demonstrates that more portions of the image become clear as the bit rate increases, hence the name position progressive.

3.2 Annex E: Quantization

There are three types of quantization in JPEG 2000: reversible, expounded and derived. However, each is tied to one of the two discrete wavelet transforms, namely the 5/3 wavelet transform and the 9/7 wavelet transform. The difference between the three types of quantization lies in the quantization step size.

Generally, after the forward wavelet transform, each transform coefficient ab(u,v) of sub-band b is quantized to the value qb(u,v) according to the following equation:

qb(u,v) = sign(ab(u,v)) * floor( |ab(u,v)| / Δb )

where Δb is the quantization step size of sub-band b.

A user can specify the number of guard bits to prevent possible overflow beyond the nominal range of the integer representation of the coefficients. The number of guard bits ranges from 0 to 7; typical values are 1 or 2.
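
As a concrete illustration of the quantizer above, the following Java sketch applies the equation to a few wavelet coefficients. The step size and the coefficient values are made-up, and the mid-point dequantization shown is just one common reconstruction choice, not mandated by the standard.

```java
// Minimal sketch of the scalar dead-zone quantizer described above:
// q = sign(a) * floor(|a| / stepSize). The step size and coefficients
// are made-up values; dequantization uses a simple mid-point
// reconstruction, which is one common (not the only) choice.
public class DeadZoneQuantizer {

    static int quantize(double a, double stepSize) {
        return (int) (Math.signum(a) * Math.floor(Math.abs(a) / stepSize));
    }

    static double dequantize(int q, double stepSize) {
        if (q == 0) return 0.0;
        return (q > 0 ? 1 : -1) * (Math.abs(q) + 0.5) * stepSize;  // mid-point reconstruction
    }

    public static void main(String[] args) {
        double stepSize = 0.1;
        double[] coeffs = { 0.73, -0.26, 0.04, -1.58 };
        for (double a : coeffs) {
            int q = quantize(a, stepSize);
            System.out.printf("a = %6.2f  ->  q = %4d  ->  a' = %6.2f%n",
                              a, q, dequantize(q, stepSize));
        }
    }
}
```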

3.2.1 Reversible quantization

Reversible quantization can only be used if the image has undergone the 5/3 wavelet transform. The quantization step size is required to be one (so no quantization is actually performed) in order for the quantization to be reversible.

Following are images with different target bits per pixel (bpp) values using reversible quantization:

Figure 3.2.1-1: original image (JPEG), 54,089 bytes
Figure 3.2.1-2: 1 bpp, 32,762 bytes
Figure 3.2.1-3: 0.5 bpp, 16,319 bytes
Figure 3.2.1-4: 0.05 bpp, 1,629 bytes

Although no quantization is actually performed, comparing the original image with the 0.5 bpp image in Figure 3.2.1-3 shows the power of the data compression in JPEG 2000. With similar picture quality, the file size of Figure 3.2.1-3 is much smaller than that of the original JPEG image.

3.2.2 Irreversible quantization

Irreversible quantization can only be used with the 9/7 wavelet transform. There are two types of irreversible quantization: expounded and derived.

Explicit Quantization

Expounded quantization is also known as explicit quantization. During decoding, the step size of each sub-band is calculated from values explicitly stored for that sub-band. Encoding is performed in the opposite manner: the step size can be specified by the user, is decomposed into a pair of values by a formula, and these values are explicitly stored in the bit stream for every sub-band.
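
As a rough sketch of what "decomposed into a pair of values by a formula" means, the code below rebuilds a step size from a stored exponent/mantissa pair using the commonly quoted Annex E relation Δb = 2^(Rb - εb) * (1 + μb / 2^11), where Rb is the nominal dynamic range of sub-band b. The numeric values are invented for illustration, and the actual marker-segment packing is not shown.

```java
// Sketch of how a decoder might rebuild an explicit quantization step
// size from a stored (exponent, mantissa) pair, using the commonly
// quoted Annex E relation  delta_b = 2^(R_b - eps_b) * (1 + mu_b / 2^11).
// All numbers below are invented for illustration only.
public class ExplicitStepSize {

    // rangeBits: nominal dynamic range R_b of sub-band b (in bits)
    // eps: exponent, mu: 11-bit mantissa, both read from the code-stream
    static double stepSize(int rangeBits, int eps, int mu) {
        return Math.pow(2, rangeBits - eps) * (1.0 + mu / 2048.0);
    }

    public static void main(String[] args) {
        // Hypothetical values: an 8-bit sub-band with exponent 12, mantissa 1229
        double delta = stepSize(8, 12, 1229);
        System.out.printf("reconstructed step size = %.6f%n", delta);  // about 0.1
    }
}
```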

Following are images with different step sizes using explicit quantization with a target bpp of 0.5:

Figure 3.2.2-1: step size 1, 868 bytes
Figure 3.2.2-2: step size 0.1, 11,986 bytes
Figure 3.2.2-3: step size 0.01, 16,363 bytes
Figure 3.2.2-4: step size 0.0078125, 16,337 bytes

Comparing Figure 3.2.1-3 with Figure 3.2.2-2 shows that by using explicit quantization with a step size of 0.1, we can obtain an image of similar quality with a smaller file size than by using reversible quantization. This is because some information in the image is discarded during quantization. However, as long as the result "looks" like the original image, we gain a smaller file size.

One should note that although the same step size is used for Figure 3.2.2-1 and Figure 3.2.1-2, they are processed with different DWTs; thus they have different image quality.

Implicit Quantization

Derived quantization is also known as implicit quantization. It is similar to explicit quantization, but instead of being stored for each sub-band, the values used to construct the step size are stored only for the LL band. Every other sub-band's step size is derived implicitly from the values of the LL band.
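
A sketch of how that derivation might look is given below. It assumes the commonly quoted Part 1 rule that only the LL band's exponent/mantissa pair (ε0, μ0) is signalled and every other sub-band reuses it with the exponent adjusted by its decomposition level, i.e. εb = ε0 - NL + nb and μb = μ0; all numeric values are invented for illustration.

```java
// Rough sketch of derived (implicit) quantization: only the LL band's
// exponent/mantissa pair (eps0, mu0) is signalled, and every other
// sub-band reuses it with the exponent shifted by its decomposition
// level (eps_b = eps0 - NL + n_b, mu_b = mu0, as commonly quoted from
// Annex E). All numeric values here are invented for illustration.
public class DerivedStepSize {

    static double stepSize(int rangeBits, int eps, int mu) {
        return Math.pow(2, rangeBits - eps) * (1.0 + mu / 2048.0);
    }

    public static void main(String[] args) {
        int NL = 3;                  // number of decomposition levels (hypothetical)
        int rangeBits = 8;           // nominal dynamic range of the sub-bands
        int eps0 = 12, mu0 = 1229;   // pair signalled for the NL-LL band only

        for (int n = NL; n >= 1; n--) {        // n = decomposition level of sub-band b
            int epsB = eps0 - NL + n;          // derived exponent
            double delta = stepSize(rangeBits, epsB, mu0);
            System.out.printf("level %d sub-bands: step size %.6f%n", n, delta);
        }
    }
}
```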

Following are images with different step sizes using implicit quantization with a target bpp of 0.5:

Figure 3.2.2-5: step size 1, 787 bytes
Figure 3.2.2-6: step size 0.1, 11,704 bytes
Figure 3.2.2-7: step size 0.01, 16,320 bytes
Figure 3.2.2-8: step size 0.0078125, 16,327 bytes

Comparing Figure 3.2.2-6 with Figure 3.2.2-2, we can see that even with the same step size, the file size of the image using implicit quantization is smaller than that of the one using explicit quantization, while the image quality is about the same. This is because explicit quantization has to store step size information for every sub-band, whereas implicit quantization stores that information only for the LL band. The overhead is smaller, which results in a smaller file size.

The different quantization options cater to the needs of different applications. Users can choose any of the quantization types according to their needs. However, they have to keep in mind that the choice of quantization is fixed by the type of discrete wavelet transform they choose.

3.3 Annex F: Discrete Wavelet Transformation

There are two types of wavelet transformation a user can choose from in JPEG 2000: the reversible (5/3) transformation and the irreversible (9/7) transformation. Both follow the same algorithm, but with different low-pass and high-pass filters.

3.3.1 Forward Discrete Wavelet Transform (FDWT)

For encoding, the image is first divided into tile-components. Each tile-component then undergoes a forward discrete wavelet transform (FDWT) on its own and is transformed into a set of two-dimensional sub-band signals ab(ub,vb), each representing the activity of the signal in a particular frequency band b at a particular spatial resolution. The number of levels of spatial resolution is called the number of decomposition levels (NL), which can be specified by the user.

Figure 3.3.1-1: The FDWT (NL = 2)

The total number of sub-bands is (NL x 3) + 1. The sub-bands are labeled in the following way: an index lev corresponding to the level of the sub-band decomposition, followed by two letters which are either LL, HL, LH or HH. Coefficients from the sub-band b = levHL are the transform coefficients obtained from low-pass filtering vertically and high-pass filtering horizontally at decomposition level lev. Coefficients from the sub-band b = levLH are obtained from high-pass filtering vertically and low-pass filtering horizontally at decomposition level lev. Coefficients from the sub-band b = levHH are obtained from high-pass filtering vertically and high-pass filtering horizontally at decomposition level lev.

The decomposition continues in the LL sub-band until the desired NL is reached. Thus, coefficients from the sub-band b = NLLL are the transform coefficients obtained from low-pass filtering vertically and low-pass filtering horizontally at the last decomposition level NL. Figure 3.3.1-1 illustrates all the sub-bands when NL = 2.

The sub-bands are then passed on to the next stage in the following order: NLLL, NLHL, NLLH, NLHH, (NL-1)HL, (NL-1)LH, (NL-1)HH, ..., 1HL, 1LH, 1HH.
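
The labeling and ordering above can be generated mechanically; the short Java sketch below enumerates the sub-band labels for a given NL and checks that their count matches (NL x 3) + 1. It is purely illustrative and performs no actual filtering.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerate the JPEG 2000 sub-band labels in the order given above and
// verify that the count equals (NL x 3) + 1. Illustration only; no
// wavelet filtering is performed here.
public class SubbandLabels {

    static List<String> subbandOrder(int nl) {
        List<String> bands = new ArrayList<>();
        bands.add(nl + "LL");                       // coarsest approximation band
        for (int lev = nl; lev >= 1; lev--) {       // detail bands, coarse to fine
            bands.add(lev + "HL");
            bands.add(lev + "LH");
            bands.add(lev + "HH");
        }
        return bands;
    }

    public static void main(String[] args) {
        int nl = 2;
        List<String> bands = subbandOrder(nl);
        System.out.println(bands);                       // [2LL, 2HL, 2LH, 2HH, 1HL, 1LH, 1HH]
        System.out.println(bands.size() == nl * 3 + 1);  // true
    }
}
```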

3.3.2 Inverse Discrete Wavelet Transform (IDWT)

The inverse discrete wavelet transform is performed in a similar but reverse manner. Upon receiving a set of sub-bands with coefficients ab(ub,vb), the IDWT transforms the coefficients back into DC-level-shifted tile-component samples; the reconstruction depends on NL.

Figure 3.3.2-1: The IDWT (NL = 2)

3.3.3 Reversible and irreversible transformation

As its name suggests, the 5/3 transformation is reversible, which means that all the original data can be completely retrieved after the inverse transformation, provided that the original data are integers. The irreversible 9/7 transformation cannot completely retrieve the original data. Yet, it can help to reduce the image file size without degrading the image quality very much.
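
To show what "reversible" means in practice, here is a minimal one-dimensional Java sketch of the 5/3 lifting steps (forward and inverse) on an even-length integer signal. The boundary handling is a simplified symmetric extension, so this is an illustration of the idea rather than a bit-exact implementation of the standard's 2-D procedure.

```java
import java.util.Arrays;

// Minimal 1-D sketch of the reversible 5/3 lifting transform: a predict
// step produces the high-pass samples d[], an update step produces the
// low-pass samples s[], and the inverse undoes them exactly for integer
// input. Boundary handling here is a simplified symmetric extension.
public class Lifting53Demo {

    // Forward transform: returns {lowpass s, highpass d}.
    static int[][] forward(int[] x) {
        int n = x.length / 2;
        int[] s = new int[n], d = new int[n];
        for (int i = 0; i < n; i++) {                       // predict (high-pass)
            int right = (2 * i + 2 < x.length) ? x[2 * i + 2] : x[2 * i];
            d[i] = x[2 * i + 1] - Math.floorDiv(x[2 * i] + right, 2);
        }
        for (int i = 0; i < n; i++) {                       // update (low-pass)
            int dPrev = (i > 0) ? d[i - 1] : d[0];
            s[i] = x[2 * i] + Math.floorDiv(dPrev + d[i] + 2, 4);
        }
        return new int[][] { s, d };
    }

    // Inverse transform: reconstructs x exactly from {s, d}.
    static int[] inverse(int[] s, int[] d) {
        int n = s.length;
        int[] x = new int[2 * n];
        for (int i = 0; i < n; i++) {                       // undo update
            int dPrev = (i > 0) ? d[i - 1] : d[0];
            x[2 * i] = s[i] - Math.floorDiv(dPrev + d[i] + 2, 4);
        }
        for (int i = 0; i < n; i++) {                       // undo predict
            int right = (2 * i + 2 < x.length) ? x[2 * i + 2] : x[2 * i];
            x[2 * i + 1] = d[i] + Math.floorDiv(x[2 * i] + right, 2);
        }
        return x;
    }

    public static void main(String[] args) {
        int[] x = { 10, 12, 15, 11, 9, 8, 14, 13 };
        int[][] sd = forward(x);
        int[] y = inverse(sd[0], sd[1]);
        System.out.println(Arrays.equals(x, y) ? "perfect reconstruction" : "mismatch");
    }
}
```

Because every lifting step adds or subtracts the floor of the same integer expression, the inverse recovers the input exactly, which is what makes lossless coding with the 5/3 filter possible; the 9/7 filter uses real-valued lifting coefficients and therefore cannot be inverted exactly after rounding.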

Here are some test images with different transformations and decomposition levels:

           5/3               9/7
NL = 14    275,242 bytes     99,027 bytes
NL = 10    274,942 bytes     98,807 bytes
NL = 3     274,060 bytes     98,465 bytes

Figure 3.3.3-1

The result shows that despite its implementation complexity and its lossy nature, the 9/7 transform is included in JPEG 2000 because of the high compression ratio it can achieve while maintaining picture quality. This can be seen from the file sizes of the test images.

One may also see that as the decomposition level decreases, the file size decreases as well. This can be explained by the fact that a higher decomposition level produces more sub-bands, which increases the overhead and the amount of information sent through the bit stream. This might lead to the question of why we need to decompose into sub-bands at all if that increases the file size. The answer lies in the nature of the DWT. The low-level sub-bands of an image are largely redundant to our eyes, since we will not notice much change in the picture even if they are discarded. Therefore, by decomposing an image into sub-bands and discarding the low-level sub-bands, one can obtain an image with a small file size that still looks acceptable to our eyes, as illustrated in Figure 3.3.3-2.

NL = 3     8.3636 bpp    274,060 bytes
NL = 14    0.9948 bpp    32,600 bytes

Figure 3.3.3-2 (both images use the 5/3 filter)

Depending on the intended usage, a user can select the most suitable transformation, with the ability to fine-tune via the number of decomposition levels and the target bits per pixel.

3.4 Annex G: DC Level Shifting and Component Transformation

Prior to the discrete wavelet transformation, all pixels in an image are level-shifted by a fixed constant value. This reduces the number of bits needed to represent the DWT coefficients. Furthermore, if the image is in RGB format, a component decorrelating transformation is often applied to convert the image into a YCbCr representation. There are two types of component transformation: the Reversible Component Transformation (RCT), designed for lossless compression, and the Irreversible Component Transformation (ICT), designed to produce an excellent compression ratio for lossy compression.

Figure 3.4-1(a),(b),(c): Raw Image; 0.035bpp J2K Image with RCT; 0.035bpp J2K Image with ICT

We encoded a raw image at an extremely low bit rate to illustrate a visually noticeable quality loss when RCT is used for lossy compression. Comparing Figure 3.4-1(a) through Figure 3.4-1(c) and closely examining the bottom right side of the valley, it can be seen that the RCT image is generally blurrier than the ICT image. More details are lost in the RCT image as the encoder attempts to achieve a low bit rate, as compared to the ICT image. Therefore, for lossy compression, it is recommended to use ICT. However, because RCT does not require any multiplications, which are expensive in hardware, and because the quality loss is insignificant at higher bit rates, we believe that RCT can be chosen for both lossless and lossy compression to reduce cost.
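
To complement this discussion, here is a minimal Java sketch of DC level shifting followed by the RCT as it is usually quoted for Part 1 (Y = floor((R + 2G + B)/4), Cb = B - G, Cr = R - G) together with its exact integer inverse. The pixel values are invented, and the ICT (a fixed-point YCbCr matrix) is not shown; note that both directions of the RCT use only additions and integer divisions by powers of two, which is why it is attractive for cheap hardware.

```java
// Minimal sketch of DC level shifting plus the Reversible Component
// Transformation (RCT) and its exact inverse for one 8-bit RGB pixel.
// Forward:  Y  = floor((R + 2G + B) / 4),  Cb = B - G,  Cr = R - G
// Inverse:  G  = Y - floor((Cb + Cr) / 4), R = Cr + G,  B = Cb + G
// Pixel values below are invented for illustration.
public class RctDemo {

    static int[] forwardRct(int r, int g, int b) {
        int y  = Math.floorDiv(r + 2 * g + b, 4);
        int cb = b - g;
        int cr = r - g;
        return new int[] { y, cb, cr };
    }

    static int[] inverseRct(int y, int cb, int cr) {
        int g = y - Math.floorDiv(cb + cr, 4);
        int r = cr + g;
        int b = cb + g;
        return new int[] { r, g, b };
    }

    public static void main(String[] args) {
        int levelShift = 128;                        // 2^(B-1) for 8-bit samples
        int r = 200 - levelShift, g = 150 - levelShift, b = 90 - levelShift;

        int[] ycbcr = forwardRct(r, g, b);
        int[] rgb = inverseRct(ycbcr[0], ycbcr[1], ycbcr[2]);

        System.out.printf("Y=%d Cb=%d Cr=%d%n", ycbcr[0], ycbcr[1], ycbcr[2]);
        System.out.println((rgb[0] == r && rgb[1] == g && rgb[2] == b)
                ? "exact inverse" : "mismatch");
    }
}
```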