MULTIMEDIA PROCESSING
EE5359 PROJECT REPORT
ADAPTIVE INTERPOLATION FILTER FOR H.264/AVC
UNDER THE GUIDANCE OF: DR. RAO
- Bhavana Prabhakar
Student Id: 1000790889
ABSTRACT:
To reduce the bit-rate of video signals, current coding standards apply hybrid coding with motion-compensated prediction and transform coding of the prediction error [16]. Prior research [3] has shown that aliasing components contained in an image signal, as well as motion blur, limit the prediction efficiency obtained by motion compensation. The objective is therefore to show that an optimal interpolation filter can be developed analytically under particular constraints, resulting in coding improvements at broadcast quality compared to the H.264/advanced video coding (AVC) [8] high profile. Furthermore, spatial adaptation to local image characteristics yields further improvements for common intermediate format (CIF) sequences compared to a globally adaptive filter. Additionally, it will be shown that the presented approach is generally applicable, i.e., motion blur can also be compensated exactly if particular constraints are fulfilled.
REQUIREMENT OF INTERPOLATION:
Motion-compensated prediction (MCP) is key to the success of modern video coding standards, as it removes the temporal redundancy in video sequences and reduces the size of bitstreams significantly. With MCP, the pixels to be coded are predicted from temporally neighboring ones, and only the prediction errors and the motion vectors (MVs) are transmitted. However, because of the finite sampling rate, the actual position of the prediction in the neighboring frames may lie off the sampling grid, where the intensity is unknown. The intensities at positions between the integer pixels, called sub-pixel positions, must therefore be interpolated, and the resolution of the MVs is increased accordingly.
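As a minimal illustration of integer-pel motion-compensated prediction, the sketch below performs a full search over a square window and picks the displacement with the smallest sum of absolute differences (SAD). SAD as the matching criterion, the block size, the search range and the function names are assumptions chosen for illustration; any displacement that falls between these integer positions would require interpolation of the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def full_search(cur, ref, bx, by, size, search_range):
    """Return (cost, dx, dy) of the best integer-pel displacement."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = by + dy, bx + dx
            # Only consider candidate blocks lying completely inside the reference frame.
            if top < 0 or left < 0 or top + size > height or left + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Only the winning displacement (dx, dy) and the residual would then be coded, which is the saving MCP provides.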
INTERPOLATION IN H.264/AVC
In H.264/AVC, the MV resolution is a quarter pixel, so the reference frame is interpolated to 16 times its size for MCP (4 times in each direction). As shown in Fig. 1(a), the interpolation defined in H.264 consists of two stages, which interpolate the half-pixel and the quarter-pixel sub-positions, respectively. The interpolation in the first stage is separable: the sampling rate in one direction is doubled by inserting zero-valued samples, followed by filtering with the 1-D filter h1 = [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]/32 [21], and the process is then repeated in the other direction. The second stage, which is non-separable, uses bilinear filtering supported by the integer pixels and the interpolated half-pixel values.
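The first stage can be checked numerically. The sketch below is a minimal 1-D illustration; it assumes zero-valued samples outside the signal borders (a simplification, since real encoders pad the reference frame) and uses hypothetical function names. It doubles the sampling rate by zero-insertion and filters with h1. Because the center tap is 32/32 and all other taps at even offsets from the center are zero, the integer samples pass through unchanged, while every inserted sample becomes the 6-tap combination (1, -5, 20, 20, -5, 1)/32 of its neighbors.

```python
H1 = [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]  # h1, to be divided by 32


def zero_insert(signal):
    """Double the sampling rate: x0, 0, x1, 0, x2, 0, ..."""
    upsampled = []
    for sample in signal:
        upsampled.extend([sample, 0])
    return upsampled


def interpolate_half_pel(signal):
    """Zero-insertion followed by filtering with h1.
    Samples outside the signal are treated as 0 (simplified border handling).
    Even output indices are the original samples, odd ones the half-pel values."""
    up = zero_insert(signal)
    center = len(H1) // 2
    out = []
    for i in range(len(up)):
        acc = 0
        for k, tap in enumerate(H1):
            j = i + k - center
            if 0 <= j < len(up):
                acc += tap * up[j]
        out.append(acc / 32)
    return out
```

On a linear ramp such as [0, 10, 20, ..., 70], the interior half-pel values fall exactly midway between their neighbors (e.g. 35 between 30 and 40), because the taps sum to 32 and are symmetric.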
Fig. 1 Interpolation process of (a) the filter in H.264/AVC, (b) the optimal AIF, and (c) the separable AIF
(In Fig. 1, the upsampling symbol with factor m denotes an m-fold increase of the sampling rate due to interpolation.)
REVIEW OF ADAPTIVE INTERPOLATION FILTERS (AIF)
Considering the time-varying statistics of video sources, some researchers propose using an adaptive interpolation filter (AIF) [16], which is one of the design elements that make the KTA software (the ITU-T VCEG Key Technical Areas software) significantly outperform JM [12], the H.264/AVC reference software. With AIF, the filter coefficients are optimized on a frame basis, such that for each frame the energy of the MCP error is minimized. The optimal filter coefficients are quantized, coded, and transmitted as side information of the associated frame.
The 2-D non-separable AIF [22], whose interpolation process is shown in Fig. 1(b), increases the spatial sampling rate 16 times in a single step by zero-insertion, and each sub-pixel position is interpolated directly by filtering the surrounding 6x6 integer pixels. Fig. 2(a) shows the support region of the 2-D non-separable AIF. Because the spatial statistics are assumed to be isotropic, the filter h is circularly symmetric, and therefore only 1/8 of the coefficients are coded, as shown in Fig. 2(b).

The assumption of isotropic spatial statistics may not hold for every frame of a video sequence. The 2-D separable AIF was therefore proposed; it treats the spatial statistics of the horizontal and vertical directions as different and also reduces the complexity of the 2-D non-separable AIF. The 1-D AIFs for the two directions are designed separately. As shown in Fig. 1(c), the horizontal sampling rate is increased four times by zero-insertion, and a 1-D filter h1 calculated for the current frame is applied. The process is then repeated for the vertical direction using the separately designed vertical filter.
Fig. 2. 2-D non-separable AIF: (a) support region and (b) coded coefficients [22]
DESCRIPTION:
To reduce the bit-rate of video signals, the International Telecommunication Union (ITU) coding standards [14] apply hybrid video coding, which combines motion-compensated prediction with transform coding of the prediction error. In the first step, motion-compensated prediction is performed: the temporal redundancy, i.e., the correlation between consecutive images, is exploited to predict the current image from already transmitted images. In the second step, the residual error is transform coded, which reduces the spatial redundancy.
To perform motion-compensated prediction, the current image of a sequence is split into blocks. For each block, a displacement vector is estimated and transmitted that refers to the corresponding position of its image signal in an already transmitted reference image. The displacement vectors have fractional-pel resolution; H.264/AVC [8] uses ¼-pel displacement resolution [1]. Displacement vectors with fractional resolution may refer to positions in the reference image that lie between the sampled positions. To estimate and compensate the fractional-pel displacements, the reference image therefore has to be interpolated at the fractional-pel positions.
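A quarter-pel displacement component splits into a full-pel offset and a fractional phase. The minimal sketch below assumes displacement components stored as integers in quarter-pel units; the function names are hypothetical, and the mapping to the letters a to o assumes that the sub-pel positions in Fig. 3 are labeled row by row (a, b, c in the first row; d to g, h to k, l to o in the following rows).

```python
FRACTIONAL_NAMES = "abcdefghijklmno"  # assumed row-major labeling of Fig. 3


def split_quarter_pel(mv):
    """Split one displacement component in quarter-pel units into
    (full-pel offset, fractional phase 0..3).
    Floor division keeps the full-pel offset at the position not greater
    than the true displacement, also for negative values."""
    return mv // 4, mv % 4


def sub_pel_position(frac_x, frac_y):
    """Assumed Fig. 3 label of a fractional position, or None for full-pel."""
    if frac_x == 0 and frac_y == 0:
        return None
    return FRACTIONAL_NAMES[4 * frac_y + frac_x - 1]
```

For example, a component of -3 quarter-pels (-0.75 pel) splits into a full-pel offset of -1 and a phase of 1, since -0.75 = -1 + 0.25.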
H.264/AVC [8] uses a 6-tap Wiener interpolation filter with filter coefficients (1, -5, 20, 20, -5, 1)/32. The interpolation process is depicted in Fig. 3 and can be subdivided into two steps. First, the half-pel positions are calculated using a horizontal or vertical 6-tap Wiener filter, respectively. The central half-pel position j is computed by applying the same Wiener filter to the already calculated half-pel values. In the second step, the remaining quarter-pel positions are obtained using a bilinear filter applied to the already calculated half-pel positions and the existing full-pel positions.
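The integer arithmetic behind these two steps can be sketched as follows. The rounding (adding 16 and shifting by 5 for the horizontal and vertical half-pel values; adding 512 and shifting by 10 for j, which is computed from unrounded intermediate values) and the clipping to 8 bits follow the H.264/AVC luma interpolation as we understand it. Edge padding by clamping coordinates, the list-of-lists image layout and the function names are assumptions for illustration.

```python
TAPS = (1, -5, 20, 20, -5, 1)


def clip8(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def pel(img, x, y):
    """Integer sample with coordinates clamped to the frame (edge padding)."""
    height, width = len(img), len(img[0])
    return img[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def tap6(values):
    """Unnormalized 6-tap Wiener filter output."""
    return sum(t * v for t, v in zip(TAPS, values))


def half_pel_b(img, x, y):
    """Horizontal half-pel between (x, y) and (x + 1, y)."""
    return clip8((tap6([pel(img, x + k, y) for k in range(-2, 4)]) + 16) >> 5)


def half_pel_h(img, x, y):
    """Vertical half-pel between (x, y) and (x, y + 1)."""
    return clip8((tap6([pel(img, x, y + k) for k in range(-2, 4)]) + 16) >> 5)


def half_pel_j(img, x, y):
    """Central half-pel: vertical 6-tap over unrounded horizontal intermediates."""
    rows = [tap6([pel(img, x + k, y + r) for k in range(-2, 4)]) for r in range(-2, 4)]
    return clip8((tap6(rows) + 512) >> 10)


def quarter_pel(p, q):
    """Bilinear average of two neighboring full/half-pel values, rounding halves up."""
    return (p + q + 1) >> 1
```

Keeping the intermediate values for j unrounded avoids accumulating two rounding errors in the central position.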
Fig. 3. Integer pixels (shaded blocks with upper-case letters) and fractional pixel positions (non-shaded blocks with lower-case letters). Example for a filter size of 6x6 [15].
An adaptive interpolation filter as proposed in [3] is estimated independently for every image. This approach takes changing image signal properties, such as aliasing, into account by minimizing the prediction error energy. However, an analytical calculation of the optimal filter coefficients is not possible, because the successive application of two 1-D filters makes the 2-D filter response a product of coefficients and hence the minimization problem nonlinear. In [4], a 3-D filter is proposed that combines two techniques: a 2-D spatial filter and a motion-compensated interpolation filter (MCIF).
The main disadvantage of MCIF is its sensitivity to displacement vector estimation errors. Besides aliasing, further distorting factors impair the efficiency of motion-compensated prediction. A further drawback of combining a 2-D spatial filter with MCIF as proposed in [4] is that the coefficients of the separable 2-D filter are determined numerically. Because the procedure is iterative, its run time is not deterministic, and it requires significantly higher encoder complexity.
To limit the increase in encoder complexity compared to the standard H.264/AVC [8] on the one hand, and to reach the theoretical bound for the coding gain obtainable with a 2-D filter on the other hand, a non-separable filter scheme is proposed, in which an individual filter is used to interpolate each fractional-pel position.
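In the following, the calculation of the filter coefficients is shown more precisely. Let $h^{SP}_{i,j}$, $i, j = 0, \dots, 5$, be the 36 filter coefficients of the 6x6-tap 2-D filter used for a particular sub-pel position $SP$. The value $p^{SP}$ to be interpolated is then computed by a two-dimensional convolution:

$$p^{SP} = \sum_{i=0}^{5}\sum_{j=0}^{5} P_{i,j}\, h^{SP}_{i,j}$$

where $P_{i,j}$ is an integer sample value within the 6x6 support (one of the shaded integer pixels in Fig. 3).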
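The convolution amounts to a weighted sum over the 6x6 support. As a minimal sketch, assuming the support samples and coefficients are given as 6x6 nested lists indexed [i][j], the code below evaluates it with exact fractions. As a sanity check it also builds the separable 2-D filter implied by the standard 6-tap filter at position j (the outer product of (1, -5, 20, 20, -5, 1)/32 with itself, ignoring the intermediate rounding of the standard); a non-separable adaptive filter would simply replace these 36 values. The names are hypothetical.

```python
from fractions import Fraction


def interpolate_2d(patch, coeffs):
    """p^SP = sum_i sum_j P[i][j] * h^SP[i][j] over a 6x6 support."""
    return sum(patch[i][j] * coeffs[i][j] for i in range(6) for j in range(6))


# Standard H.264/AVC 6-tap weights as exact fractions.
W = [Fraction(t, 32) for t in (1, -5, 20, 20, -5, 1)]

# Separable 2-D filter for position j: outer product of the 1-D weights.
H_J = [[W[i] * W[j] for j in range(6)] for i in range(6)]

# Example support: a plane sloping in both directions.
patch = [[10 * j + 3 * i for j in range(6)] for i in range(6)]
print(interpolate_2d(patch, H_J))  # 65/2
```

On a plane, the result equals the plane's value at the center of the support (offset 2.5 in both directions), which is what an interpolator for position j should return.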
The calculation of coefficients and the motion compensation are performed in the following steps:
1. Displacement vectors are estimated for every image to be coded. For the purpose of interpolation, the standard interpolation filter of H.264/AVC is applied to every reference image.
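2. 2-D filter coefficients are calculated independently for each sub-pel position $SP$ by minimizing the prediction error energy:

$$\left(e^{SP}\right)^2 = \sum_{x}\sum_{y}\Big(S_{x,y} - \sum_{i=0}^{5}\sum_{j=0}^{5} h^{SP}_{i,j}\, P_{\tilde{x}+i,\,\tilde{y}+j}\Big)^2$$

with

$$\tilde{x} = x + \lfloor mv_x \rfloor - FO, \qquad \tilde{y} = y + \lfloor mv_y \rfloor - FO,$$

where $S$ is the original image, $P$ a previously decoded image, $i, j$ the filter indices, $mv_x$ and $mv_y$ the components of the estimated displacement vector (in pel units), $FO$ a so-called Filter Offset that centers the filter ($FO = \text{filter size}/2 - 1$, i.e., $FO = 2$ for a 6-tap filter), and $\lfloor\cdot\rfloor$ the floor function, which maps the estimated displacement to the largest full-pel position not greater than it. This step is necessary because the previously decoded images contain information only at full-pel positions. Note that only the sub-pel positions actually referred to by motion vectors are used for the error minimization, i.e., the sum for $SP$ runs only over pixels whose displacement vectors point to $SP$. For each sub-pel position, an individual set of equations is therefore obtained by setting the derivative of $(e^{SP})^2$ with respect to each filter coefficient to zero:

$$\frac{\partial \left(e^{SP}\right)^2}{\partial h^{SP}_{k,l}} = 0, \qquad k, l \in \{0, \dots, 5\}.$$

The number of equations equals the number of filter coefficients used for the current sub-pel position.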
For each sub-pel position that uses a 6x6-tap 2-D filter, a system of 36 equations with 36 unknowns has to be solved. For the remaining sub-pel positions, which require only a 1-D filter, systems of 6 equations have to be solved. This results in 360 filter coefficients (nine 2-D filter sets with 36 coefficients each and six 1-D filter sets with 6 coefficients each, i.e., 9 × 36 + 6 × 6 = 360).
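Setting the derivatives to zero yields the normal equations R h = r, where R accumulates products of support samples and r accumulates products of support samples with the original pixel. The sketch below builds and solves them for one sub-pel position, assuming the training pairs (support samples, original pixel) have already been gathered from all pixels whose displacement vectors refer to that position. Plain Gaussian elimination with partial pivoting is used only to keep the example dependency-free; the helper names are hypothetical, and a dedicated linear-algebra routine would normally be preferred.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wiener_coefficients(training):
    """training: list of (support, original) pairs; support holds N reference
    samples (6 for a 1-D position, 36 flattened for a 2-D position).
    Builds R h = r from d(e^2)/dh_k = 0 and solves it."""
    n = len(training[0][0])
    R = [[0.0] * n for _ in range(n)]
    r = [0.0] * n
    for support, original in training:
        for k in range(n):
            r[k] += support[k] * original
            for l in range(n):
                R[k][l] += support[k] * support[l]
    return solve_linear(R, r)


def make_training_data(true_filter, count, seed=0):
    """Hypothetical noise-free data generated by a known filter, for testing."""
    rng = random.Random(seed)
    data = []
    for _ in range(count):
        support = [rng.uniform(0, 255) for _ in true_filter]
        data.append((support, sum(h * s for h, s in zip(true_filter, support))))
    return data
```

With noise-free synthetic data the solver recovers the generating filter; with real images the solution is the filter that minimizes the prediction error energy for the given displacement vectors.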
3. New displacement vectors are estimated, this time using the adaptive interpolation filter computed in step 2 for interpolation. This step reduces motion estimation errors caused by aliasing, camera noise, etc., on the one hand, and allows the problem to be treated in the rate-distortion sense on the other hand.
4. Steps 2 and 3 can be repeated until a particular quality improvement threshold is reached. Since some of the displacement vectors change in step 3, new filter coefficients adapted to the new displacement vectors can then be estimated.
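The control flow of steps 1 to 4 can be sketched as a loop. This is a minimal sketch under stated assumptions: the callbacks estimate_motion, fit_filters and prediction_error are hypothetical placeholders for an encoder's motion search, the least-squares filter fit and the error measure, and the stopping rule (stop once the error reduction falls below min_gain, keeping any improvement) is one possible reading of the quality improvement threshold.

```python
def adaptive_filter_loop(estimate_motion, fit_filters, prediction_error,
                         standard_filters, min_gain, max_iterations=4):
    """Steps 1-4: motion estimation with the standard filter, filter fitting,
    re-estimation with the adaptive filter, repeated while the gain is large enough."""
    filters = standard_filters
    mvs = estimate_motion(filters)                     # step 1
    best_error = prediction_error(filters, mvs)
    for _ in range(max_iterations):
        new_filters = fit_filters(mvs)                 # step 2
        new_mvs = estimate_motion(new_filters)         # step 3
        error = prediction_error(new_filters, new_mvs)
        gain = best_error - error
        if gain > 0:                                   # keep any improvement
            filters, mvs, best_error = new_filters, new_mvs, error
        if gain < min_gain:                            # step 4: stop below threshold
            break
    return filters, mvs, best_error
```

The max_iterations cap bounds the encoder run time even if the gain threshold is never reached.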
Quantization, prediction and entropy coding of the filter coefficients:
• First, finding an optimal quantization step size is very important. On the one hand, the finer the quantization, the more accurate the prediction. On the other hand, the amount of side information increases, which may impair the coding gain. As a trade-off, 9 bits are used for the amplitude of the filter coefficients.
• Second, for the 1-D fractional-pel positions, the differences to the standard Wiener filter are entropy coded and transmitted. In case of a symmetric filter, these are the fractional-pel positions a and b. The filter coefficients for the fractional-pel position c are obtained from the filter of position a by mirroring.
• To predict the filter coefficients of the 2-D positions, the calculated filter coefficients of the 1-D positions are used: the non-separable 2-D filter, which can also be regarded as a polyphase filter [6], is predicted with 2-D separable filters built from these 1-D filters. Fig. 4 illustrates an example, showing the interpolated impulse response of a predicted filter at the fractional-pel position j together with the actually calculated filter coefficients from [16]. The spline surface represents the interpolated prediction of the coefficients of a polyphase filter, and the dots represent the calculated coefficient values of such a filter, sampled at fractional-pel positions. The greater the distance from the dots to the plotted surface, the greater the prediction error. In case of a symmetric filter, the filter coefficients for the remaining 2-D positions (g, i, k, m, n and o) are also obtained by mirroring; in case of a non-symmetric filter, they are predicted in the same manner as described above.
Fig. 4. Prediction of the impulse response of a 6x6-tap 2-D Wiener filter at the fractional-pel position j (displacement vector [0.5, 0.5]) and actually calculated filter coefficients from [16].
• Entropy coding is performed using the signed Exp-Golomb code [1]. This code is well suited to Laplacian-distributed values and is already implemented in the standard H.264/AVC, so no additional calculations or look-up tables are required.
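The coefficient coding chain can be sketched as follows. The Exp-Golomb mapping follows the ue(v) and se(v) codes of H.264/AVC (positive values k map to code number 2k - 1, non-positive values to -2k). The uniform quantizer with 8 fractional bits (one possible reading of the 9-bit amplitude as sign plus 8 bits), the use of the standard filter as the predictor, and all function names are assumptions for illustration. Note that Python's round() rounds exact halves to the nearest even integer.

```python
def ue(code_num):
    """Unsigned Exp-Golomb: M leading zeros followed by the (M+1)-bit binary of code_num + 1."""
    bits = bin(code_num + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def se(k):
    """Signed Exp-Golomb: k > 0 -> code number 2k - 1, k <= 0 -> -2k."""
    return ue(2 * k - 1 if k > 0 else -2 * k)


def quantize(coeff, frac_bits=8):
    """Hypothetical uniform quantizer with step 2**-frac_bits."""
    return round(coeff * (1 << frac_bits))


def encode_filter(coeffs, reference, frac_bits=8):
    """Quantize, predict from the reference filter, and Exp-Golomb code the differences."""
    return "".join(
        se(quantize(c, frac_bits) - quantize(r, frac_bits))
        for c, r in zip(coeffs, reference)
    )
```

When the adaptive filter equals its prediction, every difference is 0 and costs a single bit, which is why good prediction keeps the side information small.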
To keep the amount of necessary side information as low as possible, and thus enable the highest coding gains, the filter coefficients are quantized, predicted and entropy coded. For error-resilience reasons, only intra prediction of the coefficients is performed. Aliasing effects are minimized by suppressing the high-frequency components.
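To address blurring effects, a pure displacement without blurring is considered first. Let $\mathbf{d}$ denote the average displacement vector. The prediction of the current image $s_t$ from the previous image $s_{t-1}$ is then expressed in the frequency domain as

$$\hat{S}_t(j\boldsymbol{\Omega}) = S_{t-1}(j\boldsymbol{\Omega})\, e^{-j\boldsymbol{\Omega}^{T}\mathbf{d}} \qquad (1)$$

With the intention of compensating the blurring effects, the adaptive interpolation filter $H(j\boldsymbol{\Omega})$ for perfect motion-compensated prediction has to satisfy the condition

$$S_t(j\boldsymbol{\Omega}) = S_{t-1}(j\boldsymbol{\Omega}) \cdot H(j\boldsymbol{\Omega}) \qquad (2)$$

In the blur-free case, eq. (2) is satisfied by the pure phase shift $H(j\boldsymbol{\Omega}) = e^{-j\boldsymbol{\Omega}^{T}\mathbf{d}}$ of eq. (1). It can be shown that an adaptive interpolation filter reduces the blurring effects caused by motion.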
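Eq. (1) states that a pure displacement is a phase shift in the frequency domain. The sketch below is a minimal 1-D illustration under the assumption of a half-pel shift. It evaluates the frequency response H(ω) = Σ h_k e^{jωk} of the standard 6-tap filter, with taps at integer offsets k = -2, ..., 3 relative to the left neighbor of the half-pel position, and compares it with the ideal response e^{jω/2} of an exact half-sample shift (the sign of the exponent depends on the direction convention). The helper names are hypothetical.

```python
import cmath
import math

TAPS = [t / 32 for t in (1, -5, 20, 20, -5, 1)]
OFFSETS = range(-2, 4)  # integer sample offsets of the taps


def freq_response(taps, offsets, omega):
    """H(omega) = sum_k h_k * exp(j * omega * k) for a filter applied as sum_k h_k * s[n + k]."""
    return sum(h * cmath.exp(1j * omega * k) for h, k in zip(taps, offsets))


def ideal_shift(omega, shift=0.5):
    """Response of an exact sub-sample shift (a pure linear-phase term)."""
    return cmath.exp(1j * omega * shift)


for omega in (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4, math.pi):
    h = freq_response(TAPS, OFFSETS, omega)
    print(f"omega={omega:.3f}  |H|={abs(h):.4f}  |H - ideal|={abs(h - ideal_shift(omega)):.4f}")
```

Because the taps are symmetric about the half-pel position, the phase of H matches the ideal shift exactly and only the magnitude deviates: it is 1 at ω = 0, about 1.06 at ω = π/2 and 0 at ω = π. A fixed filter cannot additionally reproduce a blur or deblur term, which is what the adaptive filter of eq. (2) can absorb.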
EXPERIMENTAL RESULTS:
To explain the filter behavior as a function of camera motion, an analysis using real sequences is presented. The HDTV sequence Raven is chosen for the evaluation; it is characterized by a strong camera pan in x direction with alternating acceleration and deceleration.
Fig. 10 contains four plots. The top-left plot gives the PSNR prediction quality of the sequence Raven for three filters: the standard Wiener filter (black), the symmetric adaptive filter (red) and the non-symmetric adaptive filter (blue). The non-symmetric filter provides the best and the standard Wiener filter the worst prediction quality for all images. Furthermore, the absolute quality rises when the motion accelerates: motion blur then occurs, and it can be predicted very accurately. Conversely, the quality decreases when the motion slows down, because the required deblurring cannot be predicted ideally with an FIR filter. The bottom-left plot shows the average displacement vector in x direction for each image.
Fig. 10. Evaluation of the HDTV sequence Raven in terms of PSNR prediction quality (top left), the displacement vectors per frame in quarter-pel resolution (bottom left) and two cut-outs (top right and bottom right) for the standard Wiener filter, the symmetric adaptive interpolation filter and the non-symmetric adaptive interpolation filter.
The top-right and bottom-right plots of Fig. 10 depict the frequency responses of the three filters for two cut-outs. The top-right plot corresponds to a very strong motion deceleration: the absolute quality decreases, but the non-symmetric adaptive filter still outperforms the standard Wiener filter in prediction quality by more than 1 dB. Both filters amplify high frequencies, with the non-symmetric filter amplifying them more distinctly. The bottom-right plot corresponds to motion acceleration, where both adaptive filters show blurring-compensating frequency responses. In general, the prediction quality gains provided by both the symmetric and the non-symmetric filters become smaller as the motion differences between successive images decrease. Furthermore, the more blurring or deblurring has to be compensated, the greater the advantage in prediction quality of an adaptive filter over the standard Wiener filter.
To evaluate the coding efficiency of the adaptive Wiener interpolation filter, several QCIF, CIF and HDTV sequences were coded.
In Fig. 11, four rate-distortion curves are depicted for each CIF sequence, comparing the standard H.264/AVC (labeled std) with [3] (labeled wedi), with the globally adaptive interpolation filter (labeled vatis) and with the locally adaptive interpolation filter (labeled vatislaif). vatis outperforms wedi for all sequences at almost all bit-rates. The exception is very low bit-rates, e.g., for the sequence Foreman, where the cost of the additional side information exceeds the coding gain. Applying vatislaif yields an additional coding gain of approximately 0.15 dB. Overall, the coding gain of the proposed approach is up to 0.7 dB compared to the standard H.264/AVC and up to 0.3 dB compared to wedi.
In Fig. 12, five rate-distortion curves are depicted for the 720p (1280x720 progressive) sequences City, Raven and ShuttleStart and the 1080p (1920x1080 progressive) sequence Sunflower. These curves show the performance of the standard H.264/AVC, the reference method, the globally adaptive symmetric 6x6-tap filter, the globally adaptive symmetric 8x8-tap filter and the globally adaptive non-symmetric 6x6-tap filter.
For HDTV sequences, significant coding gains of up to 1.2 dB are achieved compared to the standard H.264/AVC, and of up to 0.4 dB compared to the reference method. The highest gains are achieved at high bit-rates when the non-symmetric filter is deployed. This is attributable to the blurring contained in the sequences: three of the four tested sequences are characterized by strong, alternating camera motion, and blurring caused by acceleration and deceleration cannot be compensated exactly by the symmetric filter. In the sequence ShuttleStart, the camera motion is much slower and more monotonous, so the symmetric and non-symmetric filters provide very similar efficiency. At low bit-rates, the symmetric filter outperforms the non-symmetric one because of its lower side-information cost.
Table 2. Most important H.264/AVC coder settings according to common test conditions
In contrast to the reference method, the 8x8-tap symmetric globally adaptive filter does not provide better results than the 6x6-tap symmetric globally adaptive filter. Whereas in the reference method the transition from a 6-tap to an 8-tap filter adds only one filter coefficient, the number of coefficients of the non-separable filter nearly doubles (from 36 to 64 per 2-D filter set), which increases the side information accordingly.
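The side-information argument can be made concrete with a small count. This sketch assumes that the layout of the 6x6 scheme carries over unchanged to 8x8 (nine 2-D positions with N x N coefficients and six 1-D positions with N coefficients, without exploiting symmetry); the function name is hypothetical.

```python
def total_coefficients(taps, n_2d=9, n_1d=6):
    """Coefficients per image: n_2d non-separable taps x taps filters plus n_1d 1-D filters."""
    return n_2d * taps * taps + n_1d * taps


for taps in (6, 8):
    print(taps, total_coefficients(taps))
```

The 6-tap case reproduces the 360 coefficients stated for the proposed scheme, while the 8-tap case under the same assumption needs 624, so the coding gain of the longer filter would have to pay for roughly 1.7 times as many coefficients.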
To show that the proposed method is also competitive against the standard H.264/AVC with its best settings, a comparison to the standard H.264/AVC is performed using the recommended settings given in Table 2 and an IBBP scenario. For that purpose, the globally adaptive filter was extended to also support B-images. Furthermore, for some images it is more efficient to use only the standard Wiener filter. This can be the case, for example, when only a marginal part of an image contains moving objects while the major part of the image does not move. It is also the case when the absolute bit-rate required for a particular image is so low that the quality gain obtained with the globally adaptive filter does not justify the side information that has to be transmitted. To overcome this problem, a rate-distortion optimized coder control has been implemented, which does not increase the encoder complexity.
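The report does not detail this coder control. A common way to make such a per-image decision is a Lagrangian cost J = D + λR, where the adaptive filter is selected only if its distortion plus the weighted rate, including the rate of the filter side information, is lower than that of the standard filter. The sketch below assumes this formulation; λ, the distortion measure and the function name are illustrative assumptions.

```python
def choose_filter(d_std, r_std, d_aif, r_aif, r_side, lam):
    """Pick the interpolation filter with the lower Lagrangian cost J = D + lam * R.
    d_*: distortion (e.g. SSD), r_*: bits for the image data,
    r_side: bits for the adaptive filter coefficients (side information)."""
    j_std = d_std + lam * r_std
    j_aif = d_aif + lam * (r_aif + r_side)
    return ("adaptive", j_aif) if j_aif < j_std else ("standard", j_std)
```

At a low bit-rate the fixed side-information cost r_side can dominate the small savings in image data, so the standard filter wins; at higher rates the same side information is amortized and the adaptive filter is chosen.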