Homework 5 Solution (Motion Estimation for Video Coding)

EE4414 Multimedia Communication System II, Fall 2005, Yao Wang

Homework 5 Solution (Motion estimation for video coding)

______

Describe the major steps of EBMA algorithms with integer-pel accuracy, either in the form of a pseudo C-code, or matlab code, or flow graph.

Pseudo code in MATLAB style:

%frame size: fheight,fwidth

%block size: bheight, bwidth

%search range: hsrange, vsrange

%vMV_field, hMV_field: store the vertical and horizontal MVs

MV_field_height=floor(fheight/bheight);

MV_field_width=floor(fwidth/bwidth);

vMV_field=zeros(MV_field_height,MV_field_width);

hMV_field=zeros(MV_field_height,MV_field_width);

MAX_SAD=255*bheight*bwidth;

%for every block in the anchor frame, left corner at (r,c)

for (r=1:bheight:fheight)

for (c=1:bwidth:fwidth)

%initialize

SAD_min=MAX_SAD;vMV=0,hMV=0;

anchor_block=anchor_frame(r:r+bheight-1,c:c+bwidth-1);

%for every candidate_block_position_in_search_range,

%left corner at (sr,sc)

%increment by 1 pixel to left or down

for (sr=r-vsrange:1:r+vsrange)

for (sc=c-hsrange:1:c+hsrange)

candidate_block=

target_frame(sr:1:sr+bheight-1,sc:1:sc+bwidth-1);

SAD=calculate_SAD(anchor_block,candidate_block);

if SAD<SAD_min

SAD_min=SAD;

vMV=sr-r; hMV=sc-c;

end;

blk_r=floor(r/bheight);blk_c=floor(c/bwidth);

vMV_field(blk_r,blk_c)=vMV; hMV_field(blk_r,blk_c)=hMV;

predicted_frame(r:r+bheight-1,c:c+bwidth-1)=candidate_block;

end;

For a video of 30 frame/second, 352x288/frame, what is the number of operations needed per second to accomplish EBMA if we use block size of 16x16, search range of –16 to 16? (count one subtraction and taking absolute value, and sum of two numbers as one operation). Does the total operation count depend on the block size?

For each anchor block, one must do comparison with 33x33 candidate blocks in the search range. Each comparison consists of pixel-wise subtracting,taking absolute value and sum for 16x16 pixels, with 16^2 operations per candidate. Since we need to do 33^2 comparisons for each anchor block, the total operation is 33^2*16^2 per anchor block

There are 352/16 x 288./16 in each frame, hence total operation per frame is 352x288/(16^2)*33^2*16^2=352x288x33^2 operations/frame

Since there are 30 frame/second, the number of operations per second is 352x288x33^2x30=3.3 x10^9 operations.

As can be seen from the above derivation, the total operation number does not depend on block size. Although the operation for each block linearly depends on the block size, the number of blocks is inversely proportional to the block size. Overall, the impact of the block size is cancelled. It should be noted however, that the block size does affect the accuracy of motion estimation. A smaller block size will lead to smaller prediction errors. But then more motion vectors must be transmitted.

Describe the major steps of EBMA algorithms with half-pel accuracy, either in the form of a pseudo C-code, or matlab code, or flow graph.

Pseudo code in MATLAB style: (only those in “red” are changes from integer-pel EBMA)

%frame size: fheight,fwidth

%block size: bheight, bwidth

%search range: hsrange, vsrange

%vMV_field, hMV_field: store the vertical and horizontal MVs

MV_field_height=floor(fheight/bheight);

MV_field_width=floor(fwidth/bwidth);

vMV_field=zeros(MV_field_height,MV_field_width);

hMV_field=zeros(MV_field_height,MV_field_width);

MAX_SAD=255*bheight*bwidth;

%first upsample the target frame using bilinear filtering

up_target_frame=imresize(target_frame, 2, ‘bilinear’);

%for every block in the anchor frame, left corner at (r,c)

for (r=1:bheight:fheight)

for (c=1:bwidth:fwidth)

%initialize

SAD_min=MAX_SAD;vMV=0,hMV=0;

anchor_block=anchor_frame(r:r+bheight-1,c:c+bwidth-1);

%for every candidate_block in_search_range,

%left corner at (sr,sc)

%increment by 0.5 pixel to left or down

for (sr=r-vsrange:0.5:r+vsrange)

for (sc=c-hsrange:0.5:c+hsrange)

candidate_block=

up_target_frame(2*sr:2:2*sr+2*bheight-1,2*sc:2:2*sc+2*bwidth-1);

%need to take every other samples from the %up_target_frame

SAD=calculate_SAD(anchor_block,candidate_block);

if SAD<SAD_min

SAD_min=SAD;

vMV=sr-r; hMV=sc-c;

end;

blk_r=floor(r/bheight);blk_c=floor(c/bwidth);

vMV_field(blk_r,blk_c)=vMV; hMV_field(blk_r,blk_c)=hMV;

predicted_frame(r:r+bheight-1,c:c+bwidth-1)=candidate_block;

end;

What is the pros and cons of half-pel EBMA compared to integer-pel EBMA?

Half-pel EBMA can yield more accurate motion field and prediction, but at more than 4 times of the computation, because there are 4 times more candidate blocks to compare with for each block in the anchor frame.

What will be the operation counts for half-pel accuracy EBMA for the same parameter setting of Problem 2?

4 times if we don’t count the operation needed for the initial up-sampling of the target frame, extra memory access cost in skipping pixels when extracting the candidate blocks from the up-sampled target frame.

What is motion-compensated temporal prediction? How is it used in video coding? How does it reduce the bit rate needed to code a video, compared to code each frame directly?

To predict a pixel in a current frame, instead of using the pixel in the reference frame that has the same spatial coordinate, motion-compensated temporal prediction finds the pixel in the target frame that corresponds to the same object point as the current pixel by motion estimation. In video coding, this process is done at block-level, i.e., for each macroblock in the current frame, a best matching block in the reference frame is determined and used to predict the current block. The prediction error block is then coded using DCT-based method. If the prediction is fairly accurate, the error block will consist of mostly zero pixels or small-valued pixels that will be quantized to zeros. Such blocks of mostly zeros require fewer bits to code than the original blocks, in which the pixels typically have large dynamic ranges.

How would you apply EBMA to accomplish bi-directional motion compensated prediction?

A simple way is to first use EBMA on a block in the current frame and the previous reference frame, to produce one best matching block. Then EBMA is applied to a block in the current frame and the following reference frame, to produce another best matching block. The average of the two best matching blocks is then used as the prediction for the current block. Sometimes, if one prediction error is much larger than the other, the best matching block with the smaller prediction error alone is used.