EE552 Project: Introduction to Micropipeline

By: On Wa Yeung

December 2002

(858)527-0919

I alone prepared and wrote this project. I received no help from anyone else. The material is not copied or paraphrased from any other source except where specially indicated. I grant my permission for this project to be placed on the course homepage during future semesters. I understand that I could receive an F for the course retroactively, even after graduation, if this work is later found to be plagiarized.

1.Introduction

Micropipelines are an event driven design style for asynchronous circuits. Blocks communicate with each other through a bundled data interface (Figure 1). When the data is ready, the sender generates a request and holds the data stable. When the receiver has finished processing, it triggers the acknowledge line to signal that the data is free to change. For correct operation, the request must arrive at the receiver later than the data. In other words, a delay may be required on the request line to compensate for the data processing delay.

Figure 1: Bundled Data Interface
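The bundling constraint described above can be written as a simple inequality: the request must reach the receiver after the slowest data bit. The following is a minimal Python sketch of that check, not a timing tool. The function names and the delay values used with them are hypothetical and do not come from any of the referenced designs.

```python
def bundling_slack(data_delays, request_delay):
    """Request arrival time minus the arrival time of the slowest data bit.

    All delays are measured from the moment the sender launches the data
    (hypothetical units, e.g. gate delays).
    """
    return request_delay - max(data_delays)


def bundling_ok(data_delays, request_delay):
    """True if the request arrives strictly later than every data bit."""
    return bundling_slack(data_delays, request_delay) > 0


def extra_request_delay(data_delays, request_delay, margin):
    """Delay to add on the request line so that the slack is at least `margin`."""
    return max(0, margin - bundling_slack(data_delays, request_delay))
```

For example, with data bits arriving after 3, 5 and 4 units and a bare request wire of 2 units, the slack is negative and a matched delay has to be inserted on the request line.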

The signaling protocol can either be 2 phase (Figure 2) or 4 phase (Figure 3).

Figure 2: 2 Phase Signaling Protocol

In a 2 phase protocol, the request (Rin) and acknowledge (Ain) signals do not return to zero after each transaction. The interface does not differentiate between 1->0 and 0->1 transitions; it is the relative state of Rin and Ain that distinguishes the internal state of the pipeline. A circuit using this protocol requires a 2 phase to 4 phase converter to interface with traditional latches, which are level sensitive.

In a 4 phase protocol, the request (Rin) and acknowledge (Ain) signals return to zero in every cycle.

Figure 3: 4 Phase Signaling Protocol
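To make the difference between the two protocols concrete, the sketch below lists the signal events on a single Rin/Ain interface for a given number of transfers. It is an illustrative model only; the function names are hypothetical, and the data lines and all delays are ignored.

```python
def two_phase_trace(n_transfers):
    """Events for a 2 phase handshake: each transition is one event."""
    rin = ain = 0
    events = []
    for _ in range(n_transfers):
        rin ^= 1                      # request: Rin toggles
        events.append(("Rin", rin))
        ain ^= 1                      # acknowledge: Ain toggles
        events.append(("Ain", ain))
    return events


def four_phase_trace(n_transfers):
    """Events for a 4 phase handshake: both signals return to zero."""
    events = []
    for _ in range(n_transfers):
        events += [("Rin", 1), ("Ain", 1), ("Rin", 0), ("Ain", 0)]
    return events


def transfer_pending(rin, ain):
    """In the 2 phase protocol a transfer is pending while Rin and Ain differ."""
    return rin != ain
```

In this model a 2 phase transfer costs two transitions on the interface and a 4 phase transfer costs four, which is the trade-off the figures illustrate.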

2.Building Blocks of Micropipelines

Figure 4: Event Logic Modules of Micropipelines

Figure 4 shows the common constructs used in a micropipeline design [3].
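As an illustration of how these event modules behave, the following Python sketch models three of them: the Muller C-element, the XOR used to merge events, and the Toggle. It is a behavioral model with hypothetical class and method names, not a description of the circuits in Figure 4.

```python
class CElement:
    """Muller C-element: the output follows the inputs when they agree
    and holds its previous value when they differ."""

    def __init__(self, out=0):
        self.out = out

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


def xor_merge(a, b):
    """XOR as an event merge: a transition on either input toggles the output."""
    return a ^ b


class Toggle:
    """Steers successive input events alternately to output 0 and output 1."""

    def __init__(self):
        self.outputs = [0, 0]
        self.next_output = 0

    def event(self):
        fired = self.next_output
        self.outputs[fired] ^= 1      # produce a transition on that output
        self.next_output ^= 1         # the next event goes to the other output
        return fired
```

The C-element acts as the rendezvous (AND of events), while the XOR acts as the OR of events.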

Figure 5: 2 Phase to 4 Phase Converter

Figure 5 shows a 2 phase to 4 phase converter that interfaces with a level sensitive latch using the above modules [2]. Initially, all outputs are low and the latch is transparent. Rin then rises and the C element output follows. The transitions are XORed and sent to the Toggle block. Toggle then steers the transition to Ain and Rout, signaling the data arrival. At the same time, the latch is closed. After the data is consumed, Aout rises to 1. The latch is now transparent again, and Toggle steers a 1 to the inverting input of the C element, setting up for the falling Rin cycle.

Sutherland [3] showed a circular buffer FIFO control using the above blocks. Interestingly, it is a real circuit even though it appears to be just like a block diagram. It is reproduced in Figure 6.

Figure 6: Circular FIFO Control Logic Using the Event Blocks

The storage element can be implemented in many ways. One of the earliest forms is a capture pass element, proposed by Sutherland in his Turing Award lecture [3].

Figure 7: Capture Pass Element

After initialization, the storage is empty and the “Capture” and “Pass” inputs are at opposite polarities, forming a transparent path between the input and output. When new input arrives, “Capture” toggles, placing it at the same polarity as “Pass”; the output then forms a closed loop, holding the value it has captured. When the processing is completed, “Pass” toggles, placing it at the opposite polarity to “Capture” again, and the output becomes transparent again.
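The behavior described above can be sketched as a small behavioral model. The sketch assumes the polarity convention of this section (transparent while Capture and Pass differ, holding while they are equal). The class and method names are hypothetical, and the model says nothing about the transistor-level circuit.

```python
class CapturePass:
    """Behavioral model of an event-controlled capture pass storage element."""

    def __init__(self):
        self.capture = 0
        self.pass_ = 1          # opposite to capture after initialization
        self.data_in = None
        self.held = None

    @property
    def transparent(self):
        return self.capture != self.pass_

    @property
    def out(self):
        # Transparent: output follows input. Otherwise: output is the held value.
        return self.data_in if self.transparent else self.held

    def drive(self, value):
        self.data_in = value

    def capture_event(self):
        self.held = self.data_in   # freeze the current input
        self.capture ^= 1          # any transition counts as an event

    def pass_event(self):
        self.pass_ ^= 1            # release the held value
```

Because only transitions matter, the same element works for both the rising and the falling phase of a 2 phase handshake.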

A CMOS implementation is shown in Figure 8 [1].

Figure 8: CMOS Capture Pass Element Implementation

3.Micropipeline Implementation

Figure 9 shows a 3 stage micropipeline [3]. The inverter on the 2nd input of each C-element ensures that the element is transparent when the 2 inputs differ.

Figure 9: 3 Stage Micropipeline

After initialization, all C-element outputs return to 0 and the pipeline is in the empty state.

When data arrives,

Rin -> 1 => C1 -> 1; then

Data is captured and held at the 1st stage

Ain -> 1

Then

R1 -> 1 => C2 -> 1; then

Data is captured and held at the 2nd stage

A1 -> 1

Then

Stage 1 storage becomes transparent again

C1 is ready to respond to Rin -> 0.

The rest of the stages propagate in a similar manner.

The pipeline stalls if all stages are full and Aout does not toggle. In this case, Aout and R2 have the same value, and A2 will not toggle again until Aout changes its state.
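The fill and stall behavior described above can be reproduced with a zero-delay model of the control chain. In the sketch below, each stage is a C-element whose inputs are the request from the previous stage and the inverted acknowledge from the next stage. The function names and the list representation are hypothetical, and the matched delays and the data path are left out.

```python
def settle(c, rin, aout):
    """Update the C-element outputs in list c until nothing changes.

    c[i] is the output of stage i's C-element. It serves as the acknowledge
    to the previous stage and as the request to the next stage.
    """
    n = len(c)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            req = rin if i == 0 else c[i - 1]
            ack = aout if i == n - 1 else c[i + 1]
            # Inputs are req and NOT ack; they agree exactly when req != ack.
            if req != ack and c[i] != req:
                c[i] = req
                changed = True
    return c


def occupancy(c, aout):
    """Number of stages holding data: a stage is full while its request
    to the next stage has not yet been acknowledged."""
    nxt = c[1:] + [aout]
    return sum(1 for here, there in zip(c, nxt) if here != there)
```

Toggling Rin three times with Aout held at 0 fills all three stages. A fourth toggle leaves the state unchanged, so Ain (c[0]) no longer matches Rin and the sender is stalled until Aout toggles.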

Delays are inserted in the generation of R1, R2 and Rout to match the delay of the data processing blocks so that the bundled data condition is satisfied. A micropipeline design is speed independent only at the block level and may not be speed independent within a block.

4.Improvement on the Micropipeline

Sutherland and Fairbanks proposed using GasP instead of the C-element as the rendezvous element [4]. The structure is reproduced in Figure 10 [4].

Figure 10: GasP with Self Resetting NAND

The circuit is divided into PATH and PLACE sections. PLACE holds the data and PATH controls the data flow between PLACEs. A PLACE can be either full or empty. A PATH will be active only when the predecessor PLACE is full and the successor PLACE is empty. The empty and full states are stored in the “state conductor”.

After initialization, all PLACEs are empty. When new data arrives, the following sequence happens.

Inverter cc turns on momentarily and the data is latched.

Inverter c and N type transistor d turn on, and the successor PLACE becomes full (0).

P type transistor y turns on, and the predecessor PLACE becomes empty (1).

After some delay, inverters r and s and P type transistor t reset the NAND gate.

The data keeps propagating in this manner.
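The firing rule for a PATH (predecessor full, successor empty) can be illustrated with a token-level model. The sketch below uses the state conductor encoding given above (full = 0, empty = 1). It abstracts away the transistors and the self-resetting pulse entirely, and the function and variable names are hypothetical.

```python
FULL, EMPTY = 0, 1   # state conductor encoding used in the text


def step(state, data):
    """Fire every enabled PATH once.

    state[i] is the state conductor of PLACE i and data[i] its latch content.
    PATHs are scanned from the output end so that each data item advances
    at most one PLACE per call. Returns True if any PATH fired.
    """
    fired = False
    for i in range(len(state) - 2, -1, -1):
        if state[i] == FULL and state[i + 1] == EMPTY:
            data[i + 1] = data[i]   # data is latched in the successor
            state[i + 1] = FULL     # successor PLACE becomes full (0)
            state[i] = EMPTY        # predecessor PLACE becomes empty (1)
            fired = True
    return fired
```

Calling step repeatedly moves data toward the last PLACE until no PATH is enabled, which corresponds to a full or drained pipeline.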

In contrast to Figure 9, the forward latency (4 gate delays) in this design is longer than the reverse latency (2 gate delays). This is in fact desirable, since in very fast asynchronous circuits moving data forward takes more time because it requires changing the latch output. The circuit operates in pulse mode and achieves the speed of a 3 inverter ring oscillator [4].
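The ring oscillator comparison can be checked with a short calculation. The sketch below assumes that the cycle time of a stage is the sum of its forward and reverse latencies and that all gate delays are equal; both are simplifying assumptions, and the function names are hypothetical.

```python
def handshake_cycle_time(forward_latency, reverse_latency):
    """Cycle time of one stage, assuming it is forward plus reverse latency."""
    return forward_latency + reverse_latency


def ring_oscillator_period(n_inverters, gate_delay=1):
    """Period of a ring of inverters: a transition must travel around
    the ring twice to restore the original level."""
    return 2 * n_inverters * gate_delay
```

With the figures quoted above, 4 + 2 = 6 gate delays per cycle, which equals the period of a ring of 3 inverters.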

Day and Woods [1] proposed using the conventional pass transistor transparent latch instead of the capture pass latch to reduce the area and total gate capacitance (Figure 11). Compared with Figure 8, the conventional latch uses only 1 transmission gate instead of the 4 in the capture pass latch. However, it is level sensitive, so it requires a 2 phase to 4 phase converter, and each transaction requires 2 state changes.

Figure 11: Conventional Pass Transistor Transparent Latch

Yun et al. [2] proposed using a dual edge triggered D flip flop (DETDFF) as the storage element to improve the speed. Compared with the simple transparent latch, DETDFFs are faster and do not require a 2 phase to 4 phase converter at the interface. However, the design requires more area and consumes more power. The structure is shown in Figure 12 [2].

Figure 12: 2 Phase Micropipeline with DETDFF
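The reason a DETDFF needs no 2 phase to 4 phase converter is that it samples its input on every clock transition, so each 2 phase request event can capture data directly. The following behavioral sketch illustrates this. The class name and interface are hypothetical and do not represent the circuit in [2].

```python
class DualEdgeDFF:
    """Behavioral model of a dual edge triggered D flip flop:
    D is sampled on every clock transition, rising or falling."""

    def __init__(self):
        self.clk = 0
        self.q = None

    def update(self, clk, d):
        if clk != self.clk:     # any edge triggers a capture
            self.q = d
        self.clk = clk
        return self.q
```

A level sensitive latch, by contrast, would need the control signal to return to its idle level before the next item could be captured.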

In the same paper, Yun et al. also proposed a 4 phase extended burst mode approach to improve the speed. It is shown in Figure 13 [2]. The controller specification and timing are shown in Figure 14 and Figure 15 [2]. The CMOS implementation is shown in Figure 16 [2].

As shown in Figure 13, the circuit has 6 external signal transitions within each loop. By comparison, the traditional 2 phase design has only 4 transitions: Rin -> 1, Rout -> 1, Aout -> 1 and Ain -> 1. However, by eliminating the signal transition concurrency, the control circuit is simplified and the overall speed increases.

Figure 13: 4 Phase Circuit

Figure 14: 4 Phase Pipeline Controller Specification

Figure 15: 4 Phase Pipeline Timing

Figure 16: 4 Phase Circuit Implementation

5.Conclusion

This paper gives an overview of the general ideas of micropipeline design. Basic event driven building blocks are introduced with design examples. The micropipeline implementation is then discussed. The paper concludes by presenting various speed improvement approaches, e.g. using GasP as the rendezvous element and 4 phase extended burst mode control circuits.

References:

[1] Paul Day and J. Viv. Woods. Investigation into Micropipeline Latch Design Styles. IEEE Transactions on VLSI Systems, 3(2), June 1995. Class handout.

[2] K. Y. Yun, P. A. Beerel, and J. Arceo. High-Performance Asynchronous Pipeline Circuits. ASYNC-96, pp 17-28, April 1996.

[3] Ivan E. Sutherland. Micropipelines. Communications of the ACM, 32(6):720-738, June 1989.

[4] Ivan E. Sutherland and Scott Fairbanks. GasP: A Minimal FIFO Control. Proceedings of the Seventh International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2001. pp 46-53.
