Multi-resolutionTexture Synthesis on Cell/B.E

Tong Wei, Deng Shien, Pan Gang, Che Ming, Zhang Jiawan, Yu Ce

School of Computer Science and Technology, Tianjin University

,

1.  Introduction

Texture synthesis algorithm based on samples uses a smaller size of texture sample to synthesize large size of texture graphics. It can be used to resolve the problems include slot, distort and deformation that appear in the traditional texture synthesis algorithm such as texture mapping, process texture synthesis. This algorithm has wide application potential in mass scenes production, realistic graphics drawing, computer animation and synthesis of games background. In addition, texture synthesis algorithm based on samples can be used to repair and fill the texture graphics which have defects and loopholes, compress the mass texture graphics to samples, generate special effects texture-such as graded texture based on different texture samples, and texture synthesis of the three-dimensional surface.

To synthesize a pixel, the algorithm of texture synthesis based on pixel searches every pixel of the sample image. So there is tremendous calculation and frequent memory access. In the code, we use double buffering to make the computation and data transfer run concurrently. The output picture has been divided into 8 blocks, each SPE deal with one block independently. So the synthesized picture is not smooth at the boundary of two adjoining blocks. To solve this, we refer to the minimum error boundary, which is a concept of synthesis based on block [1].

2.  Algorithm of Multi-resolutionTexture Synthesis

The process of the serial texture synthesis algorithm:

Get input texture sample image Iin and a white random noise Iout;

Get point P0(x, y)’s L-neighborhood N0(x, y) in a raster scan ordering from output image Iout;

Find a pixel P1(x’, y’) from the input sample image ,whose L-neighborhood is most liked the L-neighborhood N0(x, y);

Set P1(x’, y’) to P0(x, y) ;

Repeat upper steps until Iout complete;

The multi-resolution synthesis algorithm proceeds as follows. Two Gaussian pyramids, Gin and Gout, are first built from Iin and Iout, respectively. The algorithm then transforms Gout from lower to higher resolutions, such that each higher resolution level is constructed from the already synthesized lower resolution levels. This is similar to the sequence in which a picture is painted: long and thick strokes are placed first, and details are then added. Within each output pyramid level Gout (L), the pixels are synthesized in a way similar to the single resolution case where the pixels are assigned in a raster scan ordering. The only modification is that for the multi-resolution case, each neighborhood N(p) contains pixels in the current resolution as well as those in the lower resolutions. The similarity between two multi-resolution neighborhoods is measured by computing the sum of the squared distance of all pixels within them. These lower resolution pixels constrain the synthesis process so that the added high frequency details will be consistent with the already synthesized low frequency structures.

3.  Parallelized algorithm for Cell/B.E

(1) Vector of color:

The color of every pixel is 24 bit, the first byte is red, the second is green, and the last is blue. But in Cell one vector contains 128-bit, so we add one byte for every pixel. All bits of the additional byte are 0. Now every pixel consists of four bytes, and there are four pixels in one vector, as showed in figure 1.

Figure 1: Color vector of every pixel

(2)PPE program:

The process of program:

Read the sample picture;

Initiate the output picture, the color of every pixel is random;

Create the input and output multi-layer pyramid

Create transfer block “ctx”, which contains the information for transfer between SPE and PPE;

Create the spe thread;

for(the layers of output pyramid) {

Wait for the signal that all SPEs finished the current layer image;

Send signal to all SPEs to start the synthesis of next layers ;}

Wait for all SPEs finished the last layer;

Synthesize the overlap area of every two blocks;

Write the output picture to disk;

End.

Firstly, PPE reads sample picture, creates the input multi-layer image pyramid. Secondly, PPE initiates the output picture (the color of every pixel is initiated randomly), and create the output pyramid. Thirdly, PPE creates 8 SPEs to synthesize every layers of the output pyramid parallel. The image of every layer in the output pyramid has been divided into 8 blocks, which are distributed to the 8 SPEs (figure 2). Every SPE synthesizes a part of the image, so PPE uses “mbox” to synchronize the 8 SPEs. When SPE finished its block, it sends a signal to PPE through mbox. When PPE receive all SPEs’ signals, it sends signal to SPEs to start the next layer synthesis. After SPEs finished the last layer image, PPE synthesizes the overlap area between every two adjoining blocks. Finally, PPE writes the output picture to disk.

Figure 2: The block of output picture for each SPE.

Because every SPE synthesizes the block independently, there will be some gaps at the boundary area between two adjoining blocks. To resolve this, we refer to the minimum error boundary in image quilting algorithm [1]. First, every block SPE synthesizes two more blocks at the boundary to the adjoining blocks. Then PPE calculates a new boundary on the overlap area. If two pixels from the overlap areas of block B1 and B2 are at the same position of the minimum error path, the difference between them is the least. Finally, the overlap area which is at the B1 side of the minimum error path will be filled by block B1, and the other side of the minimum error path will be filled with the pixels from B2 (Figure 3).

Figure 3: adjoining blocks boundary area synthesis.

(3) SPE program:

The process of program:

Read the transfer block “ctx” from main storage;

According to the data in the “ctx”, read the input and output pyramids which only contain the points of the images, not the data of colors of pixels;

Calculate the start position and the size of the block which belongs to this SPE;

Transfer the address of “numLevels-1” layer image of output pyramid to output pyramid loop queue;

L= numLevels-1; //”L” means the layer number, No. “numLevels-1” layer is the smallest

//image layer in the pyramid

while(“L” is not 0 )//double buffering

{

Transfer the address of L-1 layer image of input pyramid to input pyramid loop queue;

Transfer the address of L-1 layer image of output pyramid to output pyramid loop queue;

Wait for the L layer transfer finish

row=0; //”row” means the row of the pixels

Transfer the No.0 row pixel of L layer in output pyramid to output pyramid loop queue(oCB);

while (No.” row” row is not the last row of the image)//double buffering

{

Transfer the No.”row+1” row pixel of L layer in output pyramid to output pyramid loop queue(oCB);

Wait for transfer of the No. “row” row pixels finish;

foreach( pixel∈the pixels in the No. “row” row)

oCB(pixel)=FindBestMatch(oCB,L,pixel);//search the best march pixel in

//the input pyramid;

Transfer the row of best march pixels back to main storage

}

Synthesize the last row pixels;

Send finish signal to PPE;

L=L-1;

}

Synthesize the last layer of output pyramid;

Return

The process of function FindBestMatch(oCB,L,pixel) is:

row=0; //”row” means the row of the pixels

NX=BuildNeighborHood(oCB,pixel); //to built the L-Neighborhood

NBest=NULL;

Load the No.0 row pixels of “L” layer in the input pyramid to input pyramid loop queue(iCB);

while(“row” is not the last row of the image )

{

Load the No.”row+1” row pixels to input pyramid loop queue(iCB);

wait for the transfer of the No. “row” row pixels finish;

foreach( p∈the No. “row” row)

{

NZ=BuildNeighborHood(iCB,p);//to built the L-Neighborhood

if(Match(NZ,NX)<Match(NBest,NX))

{ NBest=NZ; Color=GetColor(p); }

}

}

Synthesize the last row pixels;

return Color;

In the SPE code, we use double buffering. In serial algorithm, to synthesize one pixel needs to search all the pixels of the input pyramid, and calculate the distance of two pixels. The data of picture’s colors is tremendous, but the SPE’s local storage is only 256 KB. SPE has to firstly transfer a part of the image by DMA, calculate it, and transfer the result back to the main storage. So we use double buffering in the code, let SPE calculates a part of the image, at the same time, MFC is transferring the next part.

SPE firstly transfer the element of “ctx”--transfer block, according to the second parameter of the main function. Then SPE transfer the input and output pyramids which only contains the width, height and address of the image of every layer in the pyramid according to the “ctx”. The next step, SPE calculates the start position and size of the block to be synthesized of every layer image in the output pyramid. When the data is ready, from the top of the output pyramid (small picture to large one) SPE synthesizes image in every layer with double buffering and loop queue.

4.  Results and Performance Analysis

(1) The speed contrast between serial and parallel

Time
(microseconds)
Image Size / Parallel without block edges smoothed (A) / Parallel with block edges smoothed (B) / Serial on x86
(C) / Serial on PPE
(D)* / Serial with one SPE (E)
128*128 / 113,790,825 / 164,216,575 / 2,658,539,612 / 3,711,235,764 / 908,969,929
144*144 / 144,040,022 / 200,724,468 / 3,249,120,915 / 4,841,895,675 / 1,150,856,609
160*160 / 177,791,017 / 241,013,893 / 3,983,156,475 / 5,496,580,292 / 1,421,306,305
176*176 / 215,201,065 / 284,627,833 / 4,848,789,256 / 6,572,517,913 / 1,720,285,671
192*192 / 256,036,229 / 332,016,972 / 5,720,681,972 / 2,047,720,120
208*208 / 300,508,022 / 382,727,163 / 6,690,790,606 / 2,403,738,278
224*224 / 348,428,683 / 437,211,189 / 7,760,963,585 / 2,788,222,763
240*240 / 400,156,322 / 495,108,847 / 8,870,850,712 / 3,201,627,169
256*256 / 455,155,449 / 556,749,036 / 10,065,607,232 / 3,571,553,194

*When the image size exceeds 192*192, the running time of D is too long to time.

Table 1: Running time of programs.

Chart 1: Curve of the running time.

The time data of A, B, D and E measures from one node of the Cell BladeCenter QS20, the time of C measures from PC with 2.4GHz processor.

In the data table and chart, parallel program is almost 30 times as the serial program, and the program using one SPE is 4 times as the serial program. There are two reasons for the phenomenon. Firstly, the serial programs whether x86 or PPE, do not use the “vector”. Because of the vector, PPE and SPE can process four pixels at one time. So the speed of serial program which uses one SPE is 3 times more then program on X86 or on PPE. Secondly, there are 8 SPEs are synthesizing the image parallel in the parallel program. So the speed of parallel program is almost 30 times more then the serial programs without vector, and nearly 8 times as the speed of the program using one SPE. The data B is more then A, because the program of B synthesizes some more overlap blocks, and at last the PPE has to synthesizes them.

(2) The synthesized image contrast between serial and parallel.

Serial Parallel

Figure 4: The serial and parallel results.

The parallel algorithm on Cell is Suitable for the random sample, because it based on pixel not block. If the sample is more random, the picture been synthesized by parallel program almost is the same as serial. Because every SPE synthesizes a part of the image independently, there are some gaps at the area around the boundaries between two adjoining blocks. So we use a method in the “Image Quilting” algorithm to solve this [1].

(3) The images which have been smoothed the area of the boundary between two adjoining blocks and the images which are not.

Sample Smoothed Without smoothed

(a) (b) (c)

Figure 5: The parallel results with block edges smoothed and without smoothed. (a) Original picture, (b) Synthesized picture with block edges smoothed, (c) Synthesized picture without smoothed.

The modified code can improve the synthesis of the texture, but there are still a few gaps in the picture. So next step we want to improve the code to make the picture smoother.

5.  References

[1] Alexei A. Efros and William T. Freeman. Image Quilting for Texture Synthesis and Transfer. In Proceedings of ACM SIGGRAPH 2001, 341-346. 2001.

[2] IBM. Cell Broadband Engine Programming Tutorial. June 15, 2006

[3] IBM. C/C++ Language Extensions for Cell Broadband Engine Architecture, March 8, 2007

[4] IBM. Cell Broadband Engine Programming Handbook, April 19, 2006

[5] Alexei Efros and Thomas Leung. Texture synthesis by non-parametric sampling. In: International Conference on Computer Vision, 1999-09. Volume 2, 1033–1041

[6] Li-Yi Wei, Marc Levoy. Fast texture synthesis using tree-structured vector quantization. In: Proceedings of SIGGRAPH 2000, 2000. 479-488

[7] Michael Ashikhmin. Synthesizing natural textures. In: 2001 ACM Symposium on Interactive 3D Graphics, 2001. 217~216

[8] Vivek Kwatra, Irfan E A. Texture Optimization for Example-based Synthesis. In: ACM Transactions on Graphics, SIGGRAPH 2005, 2005-07. 24(3), 795-802

[9] J. S. De Bonet. Multiresolution sampling procedure for analysis and synthesis of texture images. In T. Whitted, editor, SIGGRAPH 97 Conference Proceedings, Annual Conference Series, pages 361–368. ACM SIGGRAPH, AddisonWesley, Aug. 1997.

[10] D. J. Heeger and J. R. Bergen. Pyramid-Based texture analysis/synthesis. In R. Cook, editor, SIGGRAPH 95 Conference Proceedings, Annual Conference Series, pages 229–238. ACM SIGGRAPH, AddisonWesley, Aug. 1995.