EE457 Pipeline – list of important points discussed in the form of “fill-in the blanks” Oct 16, 2014, Oct 13, 2016

A.  Pipeline introduction:

1.  The ______(Single-cycle / Multi-cycle) DPU can be used as the starting point for designing the pipelined DPU. We ______(can / can’t) allow sharing of resources such as the ALU across stages, that is, across different instructions at the same time.

2.  The ______(Single-cycle / Multi-cycle) CU (control unit) is used as part of the “Data Stationary Method of Control” to control the pipelined DPU.

B.  Data Dependencies in Pipeline:

1.  Spurious stalls ______(A / B)
A. will reduce performance but you will still be able to produce the correct result when you run a program
B. will cause your program to produce wrong results

2.  Spurious forwarding from a nop or store-word instruction ______(A / B)
A. will reduce performance but you will still be able to produce the correct result when you run a program
B. will cause your program to produce wrong results

3.  It is ______(OK / not OK) to forward to instructions that are not interested in forwarding help (for example, to a jump instruction, or into Rt for lw even though Rt is not lw’s source register).

4.  Whenever possible, we should prefer to do ______(forwarding / stalling) to solve a data-dependency problem over ______(forwarding / stalling) as it saves clocks.

5.  To detect the need/opportunity to ______(stall / forward / stall or forward), we use ______(5-bit / 32-bit) comparators to compare the ______(register IDs / register contents) of the ______(source / destination) registers of the dependent junior instruction with the ______(source / destination) registers of the senior instructions. We qualify/validate such a match with the ______(A / B / C)
A. intent to write to the destination by the senior instructions
B. intent to read the sources by the junior instructions
C. both the above

6.  Help from the nearer senior has ______(higher/lower) priority compared to the help from the farther senior.
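
Items 5 and 6 can be pictured with a small behavioral model. The following is only a sketch under assumptions, not the course's reference design: the names Senior and forward_select are hypothetical, the latch names follow the usual textbook convention, and the match is qualified only by the senior's intent to write and by a non-$0 destination. Whether the junior's intent to read also needs to be checked is left to items 3 and 5.

```python
from dataclasses import dataclass


@dataclass
class Senior:
    """Hypothetical view of one senior instruction held in a pipeline latch."""
    reg_write: bool  # senior intends to write the register file
    rd: int          # 5-bit destination register ID


def forward_select(src_id: int, ex_mem: Senior, mem_wb: Senior) -> str:
    """Pick the operand source for one source register of the junior in EX.

    Returns "EX/MEM", "MEM/WB", or "RF" (no forwarding, use the value
    read from the register file).
    """
    src_id &= 0x1F  # register IDs are 5 bits wide

    def helps(senior: Senior) -> bool:
        # 5-bit ID comparison, qualified by write intent and by rd != $0.
        return senior.reg_write and senior.rd != 0 and senior.rd == src_id

    # The nearer senior (EX/MEM) is checked first, so it wins when both match.
    if helps(ex_mem):
        return "EX/MEM"
    if helps(mem_wb):
        return "MEM/WB"
    return "RF"
```

The if/if ordering is one way to express the priority; in hardware the same effect is obtained by gating the farther match with the negation of the nearer match.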

7.  While the internally forwarding register file (I.F.R.F) takes care of the ______(EX-Hazard / MEM-Hazard / WB-Hazard), the FU in the EX stage takes care of the remaining two hazards.
Priority between the forwarding due to the I.F.R.F and the forwarding due to the FU is ______(A / B).
A. Implemented explicitly
B. Solved through the natural ordering of the two forwarding paths

C.  Control Hazards in pipeline:

1.  Late branch incurs a branch penalty of ______(1 / 2 / 3) clocks in the case of ______(unsuccessful / successful / all) branches.
Early branch incurs a branch penalty of ______(1 / 2 / 3) clocks in the case of ______(unsuccessful / successful / all) branches.

2.  Cost of forwarding is more in the case of the ______(late / early) branch implementation.
Cost of hazard detection and stalling is more in the case of the ______(late / early) branch implementation.

3.  You are likely to see more stalls in the case of the ______(late / early) branch implementation.
You are likely to see more flushes in the case of the ______(late / early) branch implementation.

4.  Each of the two statements could be correct for some program and some compiler, but do you think the text book design assumes A or B? ______(A / B).
A. Even though the early branch implementation requires more hazard detection hardware, the resulting stalls do not occur so frequently as to outweigh the advantage of the early branch. The compiler reduces (if not eliminates) these dependency situations, so that the net effect of the early branch is an improvement in performance compared to the medium-delay branch (the one that executes the branch from the EX stage).
B. Because of the increased stalls due to more dependency problems, faced by Beq (who insists that all help is brought to him when he is in the ID stage), in most programs, the performance of the early branch is actually lower than the performance of the medium-delay branch.
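
The trade-off argued in statements A and B can be put into a rough formula. This is a simplified model with hypothetical parameter names, not a result from the course or the textbook: it assumes a base CPI of 1, charges flush_penalty clocks for each successful branch, and charges stall_clocks for the fraction of branches that must wait in ID for a senior's result. All numbers in the usage lines are made-up placeholders, not measurements.

```python
def cpi(branch_frac: float, taken_frac: float, flush_penalty: int,
        stall_frac: float = 0.0, stall_clocks: int = 0) -> float:
    """Rough CPI estimate for a pipeline with an ideal CPI of 1.

    branch_frac   -- fraction of instructions that are branches
    taken_frac    -- fraction of branches that are successful
    flush_penalty -- clocks lost per successful branch
    stall_frac    -- fraction of branches stalled by a data dependency
    stall_clocks  -- clocks lost per stalled branch
    """
    per_branch = taken_frac * flush_penalty + stall_frac * stall_clocks
    return 1.0 + branch_frac * per_branch


# Placeholder inputs only; substitute penalties and rates for your own design.
medium = cpi(branch_frac=0.2, taken_frac=0.5, flush_penalty=2)
early = cpi(branch_frac=0.2, taken_frac=0.5, flush_penalty=1,
            stall_frac=0.1, stall_clocks=1)
early_is_faster = early < medium
```

Under this model the earlier branch comes out ahead whenever the flush clocks it saves per branch exceed the extra stall clocks it adds per branch, which is the quantity statements A and B disagree about.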

5.  There is a conflict between HDU and successful branch in the case of ______(A / B / C / A and B / other).
A. Late branch
B. Medium-delay Branch
C. Early Branch

6.  In Early Branch, the ______(FU_Br / HDU_Br) logic can be simplified so that he can consider only (EX/MEM.RegWrite) instead of [EX/MEM.RegWrite & (not (EX/MEM.MemRead))], because ______(FU_Br / HDU_Br) makes sure that no harm is caused because of the above simplification.

7.  In Late Branch and in Early Branch, the ______(FU / HDU) logic can be simplified so that he can consider only (EX/MEM.RegWrite) instead of [EX/MEM.RegWrite & (not (EX/MEM.MemRead))], because ______(FU / HDU) makes sure that no harm is caused because of the above simplification (by separating the lw instruction and the very next instruction, which is dependent on it, by 1 clock (1 stage)).
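
The 1-clock separation mentioned in the parenthesis comes from a load-use check. A minimal sketch of such a check follows, assuming the usual textbook latch names; the function name load_use_stall is hypothetical, and the $0 test is an added assumption in the spirit of the OR-gate check mentioned in item 10.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """Return True if the instruction in ID must wait one clock.

    id_ex_mem_read -- the senior in EX is a lw (it will read data memory)
    id_ex_rt       -- 5-bit destination register ID of that lw
    if_id_rs/rt    -- 5-bit source register IDs of the junior in ID
    """
    if not id_ex_mem_read:
        return False          # senior is not a load: no load-use hazard
    if id_ex_rt == 0:
        return False          # assumption: a load into $0 creates no dependency
    # Two 5-bit comparisons of register IDs, not of register contents.
    return id_ex_rt in (if_id_rs, if_id_rt)
```

With this stall in place, a junior that depends on a lw reaches EX only after the lw has moved on from the EX/MEM latch, which is the situation the simplification relies on.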

8.  If FU or FU_Br logic is simplified so that it does not care to check the senior instruction’s intent to write to a register file, he ______(cannot / can) be saved by HDU or HDU_Br.

9.  EX/MEM.MemRead is used as one of the conditions for stalling in the case of ______(A / B / C / A and B / other).
A. Late branch
B. Medium-delay Branch
C. Early Branch

10.  In this question and the next couple of questions related to Lab 6 Part 5, do not count the 5-input OR gates (ORing the 5-bit ID of the register in question to confirm that it is not the $0) as 5-bit comparison units.
In Lab 6 Part 5, before the optimization, there were
____ 5-bit comparators in the HDU_Br,
____ 5-bit comparators in the HDU,
____ 5-bit comparators in the FU_Br,
____ 5-bit comparators in the FU.
After opening the 4 boxes up (to avoid loss due to compartmentalization) and further deciding to do all comparisons in the ID stage only (to avoid further duplication), we needed altogether only ______ 5-bit comparators in the ID stage (not counting the 2 comparators inside the IFRF).

11.  Now let us assume that the 5-stage pipeline is changed to a 7-stage pipeline (as shown at the bottom of page 11/24 of Lab 6 Part 4:
http://www-classes.usc.edu/engr/ee-s/457/ee457_lab6/ee457_Lab6_Part4_r3.pdf).
How many 5-bit comparators are needed in the ID stage to take care of all stalling and forwarding needs? ______(not counting the 2 comparators inside the IFRF).

D.  Some additional questions on hazards in pipeline:

1. Among the three branch designs (late, medium-delay, and early), which two are almost identical?
______

2. Can we use a 2nd ALU instead of the “equality checker” in the ID stage for checking equality? Any timing problems? ______
______
3. Is there any difference among the three branch designs (late, medium-delay, and early) in how the IF stage is flushed (by wrist-banding the IF instruction) when the branch is successful? ______
______
Similarly, if the IF stage is split into two stages, IF1 and IF2 (as done in the 7-stage pipeline of Lab 6 Part 4), is there any difference among the three branch designs (late, medium-delay, and early) in how the IF stages are flushed (by wrist-banding the IF instructions) when the branch is successful? ______
______
4. ______(Because / Even though) MIPS ISA designers have defined one load-word delay slot, it is ______(imperative / not necessary) that we build only a 5-stage pipeline. Explain ______
______
5. ______(Because / Even though) MIPS ISA designers have defined one branch delay slot, it is ______(imperative / not necessary) that we build only a 5-stage pipeline. Explain ______
______
6. The branch delay slot is ______(a new / an old) idea that ______(is / isn’t) used in current-day CPU designs. Explain ______
______
7. What are the advantages of performing all register ID comparisons in the ID stage itself, in the so-called “comparator station in the ID stage” in Lab 7? ______
______
______