24.06.09 AGE Spec

Animation Graphic Engine (AGE) Specification

1. General
The AGE is a computer graphics subsystem that performs the basic steps required to render a polygon-based 3D graphics model, generating photo-realistic images on desktop/laptop machines in real time.
The 3D graphics pipeline, i.e. polygon-based 3D graphics rendering, is the process of converting the geometric description of a 3D model (or a virtual world) into a two-dimensional image (a 2D array of picture elements, or pixels) that can be displayed on a computer monitor.
Each pixel represents a color value consisting of red, green, and blue (RGB) components.

2. System Level Introduction

A commonly used computer graphics subsystem architecture consists of host-CPU-based geometry processing and a dedicated hardware accelerator for animation and rasterization.
The Animation Graphics Engine (AGE) is a three-dimensional (3D) computer graphics hardware accelerator, performing the basic animation operations (3D graphics manipulation) and the pixel-related back-end rasterization operations of the polygon-based (triangle) rendering process.

The handshaking between the SW and HW is as follows:

The software runs on a host computer equipped with a graphics board. The application first initializes the system by performing the following operations:

  1. Defining the object to be displayed, both its geometry and color.
  2. Performing triangulation of its surface, calculating the vertices, edges, and triangles that comprise
    the object.
  3. Assigning colors to every vertex.
  4. Sending the following initial data to the USB port:
  • A list of vertices comprising the triangulated object’s surface. Every vertex has world coordinates and RGB colors in agreed formats. A special order is imposed on this list to enable pipeline processing of triangles within any frame.
  • A list of triangles given by their enclosing edges in cyclic order. The list should be synchronized with that of the vertices, such that continuous progression along the triangle list imposes continuous progression along the vertex list.
  • An outward normal vector for every triangle, set to unit length by software.
    The order of this list must match the order of the triangles to enable the hardware pipeline.
  • A box of the real world where the object exists, given by XWmin, XWmax, YWmin, YWmax, ZWmin, ZWmax.
  • The screen viewport where the object should be displayed, given by XSmin, XSmax, YSmin, YSmax.
  • A light unit vector.
  • Background RGB colors for the frame buffer.

The application then performs an animation session in which the above object, together with its world box, moves and rotates in space according to some externally defined trajectory. The animation-related data is transferred to the USB port once per animation frame and comprises a matrix representing the object’s new position and an indicator of which of the projection planes will be displayed in the screen viewport (XY = 00, XZ = 01, YZ = 10). The scaling factors remain those sent during initialization (no zooming).

The AGE then performs the animation at the required frame rate by applying hardware operations whose mathematical definitions and hardware implementation are described subsequently. The result of every animation step is the contents of a frame buffer comprising pixels, each associated with an RGB color. The frame buffer data is addressed directly to the graphics board of the host.

Software and hardware pipeline implications

The so-called “graphics pipeline” described herein lends itself to a pipelined hardware implementation in which the processing of the triangles comprising the displayed objects possesses a great deal of overlap. It is unnecessary to first transform all vertices into new world coordinates and only then start rasterization of the first triangle. Instead, if the order of vertices stored in memory is such that “surface continuity” is maintained, where addressing the next vertex in memory defines a new triangle whose two other vertices have already been addressed and transformed, one can start pipelined processing of that triangle. This requires synchronizing the addresses of vertices and triangles in their corresponding memories to maintain this continuity of addresses. This synchronization (the order of streaming vertices, triangles, and triangle normal vectors) is the responsibility of the application software. The hardware will need to maintain a pointer into vertex memory, designating up to what address vertices have already been transformed, as the processing of a triangle cannot start before the world coordinates of its three vertices are updated.
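As an illustration of the gating rule described above, the following C sketch shows the check the vertex pointer implies before a triangle may enter the pipeline; the names transformed_ptr and tri_vertex_idx are illustrative and not part of this spec.

    /* Sketch only: triangle processing is gated on the vertex pointer. */
    #include <stdbool.h>
    #include <stdint.h>

    /* A triangle may enter the pipeline only when all three of its vertex
       addresses lie below the "transformed up to here" pointer. */
    static bool triangle_ready(const uint16_t tri_vertex_idx[3],
                               uint16_t transformed_ptr)
    {
        return tri_vertex_idx[0] < transformed_ptr &&
               tri_vertex_idx[1] < transformed_ptr &&
               tri_vertex_idx[2] < transformed_ptr;
    }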

Major System considerations:

- Data transfer protocol:

  • Initialization data transfer order:
  1. V - Number of vertices.
  2. T - Number of triangles.
  3. Vertices (V*12B).
  4. RGB (V*3B).
  5. Triangles (T*6B).
  6. Triangle outward normals (T*12B).
  7. Real world borders – XWmin, XWmax, YWmin, YWmax, ZWmin, ZWmax.
  8. Screen viewport – XSmin, XSmax, YSmin, YSmax.
  9. Light unit vector.
  10. Background RGB colors
  11. Scale factors – SFx,y,z (World/Screen), 1/SFx,y,z (Screen/World).
    Permanent Scale Factor for all Projection Planes – No Zooming!
  • Real time data transfer order:
  1. Rotation matrix.
  2. Projection plane indicator.
  • Data transfer mode:

The USB device can provide a maximum packet size of 1024B. Our data is 3B-based; hence we will use a packet size of 1023B.
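For example, a full 1023B packet carries exactly 341 three-byte units (341 × 3B = 1023B), whereas 1024 is not a multiple of 3, so the 3B-based data would straddle packet boundaries.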

Since the data that the host sends cannot tolerate packet errors (a mistake in the initialization data will persist for the whole process, and a mistake in the 4X4 transformation matrix will cause a wrong transformation), we will use a robust mode (bulk mode with ACK/NAK) here.

The data that the AGE sends to the host can tolerate a few packet errors (a few pixel errors); hence, we can use isochronous mode here.

- Execution efficiency – sustain the required refresh rate (24 FPS) and screen image size.
- AGE die size – our AGE is core-limited, and the die size will be determined mostly by the memory size.
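As a rough, non-normative illustration of the payload ordering listed above, the following C sketch models the per-item layouts; the field names, packing pragma, and fixed-point types are assumptions for illustration only.

    /* Non-normative sketch of the per-item layouts; packing and names are assumed. */
    #include <stdint.h>

    #pragma pack(push, 1)
    typedef struct {
        int32_t xyz[3];            /* one vertex: 12B of fixed-point world coordinates */
    } age_vertex_t;

    typedef struct {
        uint16_t edge[3];          /* one triangle: three vertex indices in cyclic order, 6B */
    } age_triangle_t;

    typedef struct {               /* real-time payload, sent once per animation step */
        int32_t  matrix[4][4];     /* 4X4 transformation matrix; the AGE keeps rows t11..t34 (6.2.9) */
        uint8_t  projection_plane; /* XY = 00, XZ = 01, YZ = 10 */
    } age_frame_update_t;
    #pragma pack(pop)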

3. System Block Diagram

Fig. 1: System Block Diagram

4. AGE Top Level Introduction

The AGE contains the following Functional Blocks, necessary for Device operation:

  • Universal Serial Bus (USB) 2.0 Device Controller – used to interface the AGE to the host computer.
  • Direct Memory Access Controller (DMAC) – to control data flow to and from the AGE.
  • Local Memories and Registers – to store the data that we work with.
  • GPU (Graphics Processing Unit) – to execute the rasterization and animation operations, using a pipeline.

5. AGE Block Diagram

Fig. 2: AGE Block Diagram

6. Major Building Blocks
[Block definition includes functionality and interfaces]

6.1. Graphics Pipeline

The rendering pipeline and its mathematical calculations:

1. Multiplication of real-world vertices by the transformation matrix. This operation takes place in real-world coordinates. A vertex stored in the variable world vertex memory is first converted into homogeneous representation and then multiplied by the position matrix to yield its new position in the world. The result is then stored back in the variable world vertex memory, overriding the previous position. The operation involves 9 multiplications and 9 additions of 4-byte operands. The use of a single memory for vertex coordinates implies that the hosting application will send incremental position matrices describing the position change since the last animation step.
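A minimal C sketch of this stage, assuming 32-bit fixed-point operands and the 3x4 matrix rows t11..t34 held in the TM register; radix-point alignment and saturation are omitted.

    /* Sketch of stage 1: homogeneous vertex (x, y, z, 1) times the 3x4 position
       matrix; 9 multiplications and 9 additions, as noted above. */
    #include <stdint.h>

    static void transform_vertex(const int32_t m[3][4], int32_t v[3])
    {
        int64_t x = v[0], y = v[1], z = v[2];      /* widen to avoid overflow */
        for (int r = 0; r < 3; ++r)                /* 3 mults + 3 adds per row */
            v[r] = (int32_t)(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]);
    }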

2. Multiplication of the triangle outward normal vector by the transformation matrix. As the object changes position, the outward normal vectors of its triangles change correspondingly. This change is obtained by first converting the vector stored in the variable triangle outward normal memory into homogeneous representation and then multiplying it by the position matrix to yield the new normal. The result is then stored back in the variable triangle outward normal memory, overriding the previous normal. This operation involves 9 multiplications and 6 additions of 4-byte operands, as explained below.
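A companion sketch for this stage under the same assumptions; since a normal is a direction, only the 3x3 rotational part of the matrix is applied (no translation column), which is why the count drops to 9 multiplications and 6 additions.

    /* Sketch of stage 2: rotate the normal; no translation term. */
    #include <stdint.h>

    static void transform_normal(const int32_t m[3][4], int32_t n[3])
    {
        int64_t a = n[0], b = n[1], c = n[2];      /* widen to avoid overflow */
        for (int r = 0; r < 3; ++r)                /* 3 mults + 2 adds per row */
            n[r] = (int32_t)(m[r][0] * a + m[r][1] * b + m[r][2] * c);
    }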

These two stages are independent and are therefore executed concurrently. Three multipliers are used for each stage. As these stages read from and write to the two memories mentioned above, special care should be taken to synchronize these memory accesses.

3. Projecting the triangle onto the viewing plane. Since the 3D object is projected onto a 2D screen plane, the depth coordinate, which is perpendicular to the projection plane, is dropped: zw for the xy projection plane, yw for the xz plane, and xw for the yz plane.

No actual work is done by the pipe here.

4. Converting every vertex to pixel coordinates. The projected world coordinate pair (xw, yw), (xw, zw), or (yw, zw) is converted into a screen coordinate pair by the corresponding viewport transformation.
The screen coordinates thus obtained are stored in the variable screen viewport vertex memory.
The scaling factor of the above transformations is vertex- and triangle-independent and can therefore be calculated once per session; it is computed by the application SW and stored during initialization in 3 local registers (XSF, YSF, ZSF).

This stage can start as soon as stage 1 completes the transformation of one coordinate. To save memory accesses, the input here is the output of stage 1. The conversion of xw and yw can be executed concurrently, thus requiring 2 multipliers.
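The following sketch assumes the screen mapping is the linear viewport transform implied by the XSF/YSF registers of 6.2.9 (XSF = (XSmax-XSmin)/(XWmax-XWmin)); floating point is used for readability only, and it is not a normative formula.

    /* Assumed linear viewport mapping; parameter names mirror the 6.2.9 registers. */
    #include <stdint.h>

    typedef struct { int32_t xs, ys; } screen_px_t;   /* illustrative type */

    static screen_px_t world_to_screen(double xw, double yw,
                                       double xwmin, double ywmin,
                                       double xsmin, double ysmin,
                                       double xsf, double ysf)
    {
        screen_px_t p;
        p.xs = (int32_t)(xsmin + (xw - xwmin) * xsf + 0.5);   /* round to pixel */
        p.ys = (int32_t)(ysmin + (yw - ywmin) * ysf + 0.5);
        return p;
    }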

5. Deciding on hidden triangles is done by observing the outward normal vector (A, B, C).
A triangle is certainly hidden from the observer’s eye if C ≤ 0 for the xy projection plane, B ≤ 0 for the xz plane, or A ≤ 0 for the yz plane, a case where rasterization of the triangle is ruled out, thus saving a lot of computations.

To save memory accesses, this stage gets its input directly from stage 2. The decision is made by comparing the MSB of C (or B or A) to ‘1’.
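A trivial sketch of this test for the xy projection plane (C is replaced by B or A for the other planes); in hardware the negative case is just the MSB of the 2’s-complement word.

    /* Sketch of the hidden-triangle test. */
    #include <stdbool.h>
    #include <stdint.h>

    static bool triangle_hidden(int32_t c_coeff)   /* C (or B, or A) */
    {
        return c_coeff <= 0;                       /* MSB set, or exactly zero */
    }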

6. Computation of D (normal). The value of D in the plane representation A·x + B·y + C·z + D = 0 is required later for Z-depth calculations. It is obtained from D = −(A·x + B·y + C·z), where the point (x, y, z) is taken as one of the triangle’s vertices. The vector (A, B, C) is initially set to unit length by the application software (i.e. sqrt(A² + B² + C²) = 1). Its length is then maintained as unit length, since the transformation matrix preserves vector length (we assume that in this implementation animation excludes scaling of the world and perspective projections).

To save computations and memory accesses, this stage is calculated after deciding whether the triangle is hidden or not. A, B, and C are taken from the previous stage, while x, y, and z are taken directly from the “Vertex Memory”, thus suggesting a dual-port memory, as stage 1 also uses this memory.
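A sketch of the D computation, assuming the plane is written as A*x + B*y + C*z + D = 0 with (x, y, z) one of the triangle’s already-transformed vertices.

    /* Sketch of stage 6: D = -(A*x + B*y + C*z). */
    #include <stdint.h>

    static int64_t plane_d(int32_t a, int32_t b, int32_t c,
                           int32_t x, int32_t y, int32_t z)
    {
        return -((int64_t)a * x + (int64_t)b * y + (int64_t)c * z);
    }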

7. Scanning a triangle for rasterization. This is the most time-consuming step of the pipeline.
The first step is to compute the slopes of the triangle’s edges. This computation can be performed in screen rather than world coordinates, at the expense of precision. The advantage of division in screen coordinates is the possibility of implementing division as multiplication, avoiding the need for a hardware divider: since the range of screen coordinates is limited to 1024 or 2048 at most, all denominator reciprocals can be pre-calculated and stored in an appropriate memory prior to starting the animation. The second step is to find the vertex with the smallest y coordinate. Two edges emanate from this vertex, one left-upward and one right-upward, with slopes that have been calculated before. Every horizontal scan line is obtained from the previous one by incrementing y. Each new y is tested against the opposite ends of the upward-left and upward-right edges to check whether it exceeds either of them, a case where one of the edges terminates and the triangle’s third edge is invoked, or the scan terminates. Every scan line extends from a leftmost to a rightmost pixel, obtained as follows. Let xL and xR denote the leftmost and rightmost pixels of the scan line, respectively. Initially they are equal to each other, as obtained from the lowest vertex of the triangle. Let mL and mR be the (inverse) slopes of the corresponding edges, respectively. Then xL_next = round_int(xL + mL), where round_int is a rounding operation to the nearest integer, and similarly xR_next = round_int(xR + mR).

This stage executes only on triangles that pass the “hidden triangles” test. The rasterization consists of a variable number of comparisons, increments, and additions/subtractions, depending on the triangle and the number of pixels inside it. This stage gets the triangles to work on from stage 5. As stage 5 (probably) finishes testing a different triangle before the rasterization of one triangle is finished, a small FIFO buffer will be used to store the triangle indices.
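A rough C sketch of the scan-conversion loop described above; it uses floating-point division for clarity where the hardware would multiply by reciprocals from the screen coordinate reciprocal ROM (6.2.6), and it ignores clipping, flat (degenerate) edges, and the hand-over to the third edge.

    /* Rough sketch of stage 7 scan conversion; simplified, not the hardware algorithm. */
    #include <math.h>

    typedef void (*pixel_cb)(int xs, int ys);      /* called for each covered pixel */

    static void scan_triangle(const int xs[3], const int ys[3], pixel_cb emit)
    {
        int lo = 0;                                /* vertex with the smallest y */
        if (ys[1] < ys[lo]) lo = 1;
        if (ys[2] < ys[lo]) lo = 2;
        int a = (lo + 1) % 3, b = (lo + 2) % 3;

        /* inverse slopes (dx per scan line) of the two edges leaving the lowest vertex */
        double ml = (double)(xs[a] - xs[lo]) / (double)(ys[a] - ys[lo]);
        double mr = (double)(xs[b] - xs[lo]) / (double)(ys[b] - ys[lo]);

        double xl = xs[lo], xr = xs[lo];           /* leftmost/rightmost start equal */
        int ytop = ys[a] < ys[b] ? ys[a] : ys[b];  /* stop where the shorter edge ends */

        for (int y = ys[lo]; y <= ytop; ++y) {
            int from = (int)lround(xl < xr ? xl : xr);
            int to   = (int)lround(xl < xr ? xr : xl);
            for (int x = from; x <= to; ++x)
                emit(x, y);
            xl += ml;                              /* x_next = round_int(x + m) */
            xr += mr;
        }
        /* ...continue along the third edge with a recomputed slope (omitted). */
    }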

8. Deciding on pixel visibility. This is accomplished with the aid of a Z-buffer. Initially, the content of the Z-buffer is reset, per frame, to store the smallest integer (in 2’s-complement representation of 32-bit fixed-point numbers). Then, for every pixel in its turn, the real-world depth corresponding to that pixel is computed. If it is closer to the viewer’s eye than the depth currently found in the Z-buffer, the color calculation of that pixel proceeds and the depth value of that pixel gets updated. Otherwise, the pixel is ignored and the next pixel is considered. The calculation of the Z-value is made by first translating the (xs, ys) pair of the given pixel into (xw, yw) by the inverse of the transformation used formerly to convert vertices from world to screen coordinates; yw is obtained analogously to xw. Notice that the scale factor is invariant along the entire computation of the session, hence it can be calculated once and stored in a register. Once (xw, yw) is known, the depth in the real world is obtained from the plane equation A·xw + B·yw + C·zw + D = 0, yielding (for the xy projection plane) zw = −(A·xw + B·yw + D)/C.
The coefficients are stored in the variable triangle outward normal memory. Notice that the coefficients are invariant for the entire triangle rasterization. Therefore, in order to avoid unnecessary memory accesses, the coefficients should be stored in registers. The result of this depth test is either an update of both the nearest depth coordinate and the triangle which implied it, or simply ignoring the update if the pixel is found to be deeper than the nearest one so far.

This stage gets its pixel input from the previous stage.
To save accesses to the “Normal memory”, this part receives the A, B, C, D coefficients from stages 5 and 6. A problem might occur if stages 5 and 6 finish working on the next triangle before this stage finishes, thus sending the wrong coefficients. A possible solution is to use a small FIFO buffer to store the coefficients that stage 6 sends. The same problem occurs with the pixels received from the previous stage, as the next pixel will be sent before the current pixel computation is over. Again, the solution is to use a small FIFO buffer to store the pixels.
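A sketch of the per-pixel depth test, under the same linear viewport assumption as before (using the stored 1/XSF, 1/YSF reciprocals) and assuming that a larger recovered zw means nearer to the viewer, consistent with resetting the Z-buffer to the most negative value; floating point is used for readability only.

    /* Sketch of stage 8: inverse viewport mapping, plane-equation depth, Z-test. */
    #include <stdbool.h>

    static bool depth_test(double xs, double ys,
                           double xsmin, double ysmin,
                           double xwmin, double ywmin,
                           double inv_xsf, double inv_ysf,
                           double a, double b, double c, double d,
                           double *zbuf_entry)
    {
        double xw = xwmin + (xs - xsmin) * inv_xsf;   /* inverse viewport transform */
        double yw = ywmin + (ys - ysmin) * inv_ysf;
        double zw = -(a * xw + b * yw + d) / c;       /* A*x + B*y + C*z + D = 0 */

        if (zw > *zbuf_entry) {                       /* nearer than what is stored */
            *zbuf_entry = zw;
            return true;                              /* pixel proceeds to coloring */
        }
        return false;                                 /* deeper: pixel is ignored */
    }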

9. Setting the pixel’s color. This operation is executed only once per pixel, according to the nearest triangle covering that pixel, whose index is found in the Z-buffer memory. Notice that this mode of operation excludes setting pixel colors from the hardware pipeline, since all triangles must be processed first in order to know which of them is the nearest to the viewer’s eye at that pixel. This operation could be added to the pipeline, but the number of pixel color calculations would more than double, and more than half of them would be unnecessary. A pixel is assigned nominal RGB values derived from those existing at the vertices of the triangle it belongs to, by interpolation over its three vertices. Once RGB values have been set, a further account of the object’s surface curvature takes place by multiplying the RGB values by the factor L·N, where N is the triangle’s outward normal vector and L is a unit light vector pointing to the viewer (perpendicular to the screen). Notice that this dot product is fixed for the entire triangle, but because the pixel’s color setting is excluded from the hardware pipeline, it is recalculated for every pixel. This is an overhead in this case, but it gets smaller as objects get more and more complex. The overhead could be avoided by calculating the above lighting coefficient as a part of the hardware pipeline and storing it in a dedicated memory.

As this stage starts after all the triangles have been processed, there is no danger of memory access collisions here.
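A sketch of the per-pixel color step, assuming barycentric interpolation weights w[0..2] (summing to 1) over the triangle’s vertex colors and a precomputed dot product L·N; the interpolation scheme is an assumption for illustration.

    /* Sketch of stage 9: blend vertex colors, then scale by the lighting factor. */
    #include <stdint.h>

    typedef struct { uint8_t r, g, b; } rgb_t;

    static rgb_t shade_pixel(const rgb_t v[3], const double w[3], double l_dot_n)
    {
        if (l_dot_n < 0.0) l_dot_n = 0.0;             /* light behind the surface */
        rgb_t out;
        out.r = (uint8_t)((v[0].r * w[0] + v[1].r * w[1] + v[2].r * w[2]) * l_dot_n);
        out.g = (uint8_t)((v[0].g * w[0] + v[1].g * w[1] + v[2].g * w[2]) * l_dot_n);
        out.b = (uint8_t)((v[0].b * w[0] + v[1].b * w[1] + v[2].b * w[2]) * l_dot_n);
        return out;
    }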

10. Writing a pixel into the frame buffer. The RGB values obtained for the above pixel are written into a frame buffer that is eventually sent to the USB port for display on the host’s screen. This takes place at the frame rate. At every animation step (or once per session?) the frame buffer is first filled with a background color as defined by the host application. It is then filled pixel by pixel as a result of the above color calculation. Once filled, the frame buffer is flushed out to the USB port.
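A sketch of the frame-buffer handling, assuming a row-major buffer of D pixels at 3 bytes per pixel; the names and addressing scheme are illustrative only.

    /* Sketch of stage 10: clear once per animation step, then write shaded pixels. */
    #include <stddef.h>
    #include <stdint.h>

    static void clear_frame(uint8_t *fb, size_t d,            /* fb holds D*3 bytes */
                            uint8_t bg_r, uint8_t bg_g, uint8_t bg_b)
    {
        for (size_t i = 0; i < d; ++i) {
            fb[3 * i + 0] = bg_r;
            fb[3 * i + 1] = bg_g;
            fb[3 * i + 2] = bg_b;
        }
    }

    static void write_pixel(uint8_t *fb, size_t width, size_t xs, size_t ys,
                            uint8_t r, uint8_t g, uint8_t b)
    {
        size_t i = ys * width + xs;                           /* row-major addressing */
        fb[3 * i + 0] = r;
        fb[3 * i + 1] = g;
        fb[3 * i + 2] = b;
    }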

6.2. Internal Memories
(Type, Size, organization, access, etc.)

Notations:

V - number of vertices

T - number of triangles

D - size of the frame buffer (in pixels)

6.2.1 Variable vertex memory:

  • Dual port memory
  • Word size: 4B
  • Memory size: V*12B
  • In:

Port 1 Addr lines – ≥ log2(3V) bit

Port 2 Addr lines – ≥ log2(3V) bit

Port 1 DataV_in – 4B
Port 1 R/nW Signal
Port 2 R/nW Signal

  • Out:

Port 1 DataV_out - 4B
Port 2 DataV_out - 4B

6.2.2 Permanent RGB memory:

  • Word size: 3B
  • Memory size: V*3B
  • In:

AddrRGB lines ≥log2V

DataRGB_in-3B
R/nW Signal

  • Out:

DataRGB_out-3B

6.2.3 Permanent triangle memory:

  • Word size: 6B
  • Memory size: T*6B
  • In:

AddrT lines ≥ log2T

DataT_in-6B
R/nW Signal

  • Out:

DataT_out-2B

6.2.4 Variable triangle outward normal memory:

  • Word size: 4B
  • Memory size: T*12B (An outward normal vector to every triangle).
  • In:

AddrN lines ≥ log2(3T)

DataN_in-16B
R/nW Signal

  • out:

DataN_out-16B

6.2.5 Variable screen viewport vertex memory:

  • Word size: 4B
  • Memory size: V*4B
  • In:

AddrSV lines ≥ log2V

DataSV_in-4B
R/nW Signal

  • Out:

DataSV_out-4B

6.2.6 Permanent screen coordinate reciprocal memory:

  • ROM (Read Only Memory)
  • Word size: 12 bit

  • Memory size: Width of screen in pixels (Sw)*12bit
  • In:

AddrCR lines ≥ log2Sw
R Signal

  • Out:

DataCR_out-12bit

6.2.7 Variable Z-buffer (depth buffer) memory:

  • Word size: 4B
  • Memory size: D*4B
  • In:

AddrZ lines ≥ log2D

DataZ_in-4B
R/nW Signal

  • Out:

DataZ_out-4B

6.2.8 Variable Display (frame buffer) memory:

  • Word size: 3B
  • Memory size: D*3B
  • In:

AddrFB lines ≥ log2D

DataFB_in-3B
R/nW Signal

  • Out:

DataFB_out-3B

6.2.9 Registers

TM – Transformation Matrix (4X4). Word size – 4DW (16B). 3 words are stored (t11-t14, t21-t24, t31-t34)

XWmax – 1DW
XWmin – 1DW
YWmax – 1DW
YWmin – 1DW
ZWmax – 1DW
ZWmin – 1DW
XSmax – 1DW
XSmin – 1DW
YSmax – 1DW
YSmin – 1DW
XSF – 1DW [(XSmax-XSmin)/(XWmax-XWmin)]
YSF – 1DW [(YSmax-YSmin)/(YWmax-YWmin)]
1/XSF – 1DW
1/YSF – 1DW
LV (Light Vector) [lx, ly, lz] – 3DW
PP (Projection Plane) – 1B [ XY = 00, XZ = 01, YZ = 10]

Background_RGB - 3B

V – (log2V)*1b

T - (log2T)*1b

D - (log2D)*1b

6.2.10 Variable vertex memory Latch:

In:

Din (DW),

Clk,

En.

Out:

DoutZ(DW),

DoutY(DW),

DoutX(DW).

6.2.11 Variable vertex memory Counter: