CEC470, Spring 2013 Homework #1, Part B
Name: ______

Open book, open notes; but do your own work, please. You’re on your honor here.

[50 pts total] Answer all parts (a) through (j), inclusive, below.

I have defined a new computer program to be used as a benchmark for performance measurements. It's called the "ERAU Run". It contains the mix of instructions shown in the table below. (The numbers are not at all intended to be real-world; this is torture for undergraduates, so the numbers don’t have to be sensible ;-) The table also shows the cycles per instruction (CPI) for each instruction class, which reflect the fact that our hardware includes a floating point co-processor.

Instruction Type | Cycles Per Instruction (CPI) | ERAU Run Instruction Count
Floating point multiply or divide | 8 | 7,000,000
Floating point add or subtract | 12 | 2,000,000
All others (non floating point) | 5 | 10,000,000

The gcc compiler we use is intended for use on many different machines, some with floating point co-processors and some without. The user can therefore, via a compile-time option, request that the code be compiled either for a floating point architecture or for a non-floating point architecture. In the latter case, the compiler must translate each floating point instruction into a set of equivalent integer instructions that run on the main CPU only.

Assuming that our hardware is clocked at 100 MHz:

(a) [3 pts] What is the average CPI for the FP version of the ERAU Run?

(b) [3 pts] How long will it take to execute this FP version of the ERAU Run on our (pretty slow) hardware?

(c) [3 pts] What is the MIPS rate for our hardware with the FP co-processor?
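One way to check the arithmetic for parts (a) through (c) is a short script built on the standard definitions: total cycles = Σ (CPI_i × IC_i), average CPI = total cycles / total instruction count, execution time = total cycles / clock rate, and MIPS = instruction count / (execution time × 10⁶). The following is a minimal Python sketch under those definitions; the names `ERAU_MIX`, `CLOCK_HZ`, and the helper functions are hypothetical, and the data are simply copied from the table above.

```python
# Minimal sketch for parts (a)-(c). Data are copied from the ERAU Run table;
# all names here are hypothetical, chosen only for this illustration.

# (instruction class, cycles per instruction, instruction count)
ERAU_MIX = [
    ("FP multiply/divide", 8, 7_000_000),
    ("FP add/subtract", 12, 2_000_000),
    ("all others", 5, 10_000_000),
]
CLOCK_HZ = 100e6  # 100 MHz clock, as stated in the assignment


def total_cycles(mix):
    """Sum of CPI_i * IC_i over every instruction class."""
    return sum(cpi * count for _, cpi, count in mix)


def total_instructions(mix):
    """Total instruction count (IC) of the program."""
    return sum(count for _, _, count in mix)


def average_cpi(mix):
    """Weighted average CPI = total cycles / total instruction count."""
    return total_cycles(mix) / total_instructions(mix)


def exec_time_s(mix, clock_hz=CLOCK_HZ):
    """CPU time in seconds = total cycles / clock rate."""
    return total_cycles(mix) / clock_hz


def mips(mix, clock_hz=CLOCK_HZ):
    """MIPS = IC / (execution time * 10^6), i.e. clock rate / (CPI * 10^6)."""
    return total_instructions(mix) / (exec_time_s(mix, clock_hz) * 1e6)


if __name__ == "__main__":
    print(f"average CPI: {average_cpi(ERAU_MIX):.3f}")
    print(f"time (s):    {exec_time_s(ERAU_MIX):.3f}")
    print(f"MIPS:        {mips(ERAU_MIX):.2f}")
```

Note that the average CPI is weighted by instruction count, not a simple mean of the three CPI values in the table.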

If we compile the ERAU Run with the no-FP-hardware option and then measure the running time of the resulting (integer-equivalent) program, we get a time of 4.25 seconds.

(d) [3 pts] How much faster is our program when we compile with the floating point co-processor option than when we compile without it?

(e) [8 pts] What is the average number of integer instructions the compiler produces to replace each floating point instruction? (Average over all FP instructions; you have no way of telling how many integer instructions replace an FP add versus how many replace an FP divide, for example.) As per the third row of the table above, continue to assume that the CPI for all integer instructions is 5.

(f) [2 pts] What is the MIPS rate of our hardware executing this integer-equivalent version of the ERAU Run?
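For parts (d) through (f), a similar sketch can work backward from the measured 4.25 s. It assumes, as the problem implies, that the 10,000,000 non-FP instructions appear unchanged in the integer-only build and that every instruction in that build has a CPI of 5, so the integer build's instruction count is (4.25 s × 100 MHz) / 5. All names below are hypothetical.

```python
# Minimal sketch for parts (d)-(f); all names are hypothetical.
# Assumption: the 10,000,000 non-FP instructions are identical in both builds,
# and every instruction in the integer-only build has a CPI of 5.

CLOCK_HZ = 100e6            # 100 MHz clock
INT_CPI = 5                 # CPI of every integer instruction (third table row)
NO_FP_TIME_S = 4.25         # measured run time of the integer-only build
FP_CLASSES = {              # class -> (CPI with FP co-processor, instruction count)
    "mul/div": (8, 7_000_000),
    "add/sub": (12, 2_000_000),
}
OTHER_COUNT = 10_000_000    # non-FP instructions, unaffected by the compile option


def fp_build_time_s():
    """Run time of the FP build: (FP cycles + other cycles) / clock rate."""
    fp_cycles = sum(cpi * n for cpi, n in FP_CLASSES.values())
    return (fp_cycles + INT_CPI * OTHER_COUNT) / CLOCK_HZ


def speedup_with_fp():
    """Speedup = time without FP hardware / time with FP hardware."""
    return NO_FP_TIME_S / fp_build_time_s()


def integer_build_instruction_count():
    """Cycles = time * clock rate; with CPI fixed at 5, IC = cycles / 5."""
    return NO_FP_TIME_S * CLOCK_HZ / INT_CPI


def avg_int_per_fp():
    """Integer instructions added by the compiler / FP instructions they replace."""
    replacement = integer_build_instruction_count() - OTHER_COUNT
    fp_count = sum(n for _, n in FP_CLASSES.values())
    return replacement / fp_count


def integer_build_mips():
    """MIPS = IC / (execution time * 10^6)."""
    return integer_build_instruction_count() / (NO_FP_TIME_S * 1e6)


if __name__ == "__main__":
    print(f"speedup with FP co-processor: {speedup_with_fp():.3f}")
    print(f"integer instructions per FP instruction: {avg_int_per_fp():.3f}")
    print(f"integer-build MIPS: {integer_build_mips():.2f}")
```

Comparing the MIPS value printed here with the one from part (c) is a useful lead-in to part (g).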

You now have three commonly used quantitative performance indices (elapsed time, MIPS, CPI) to look at.

(g) [2 pts] Which one provides the most accurate picture here, and why?

As a result of your brilliant ERAU education, you come up with two improvements to the design of the FP co-processor on the chip our hardware uses. One of them reduces the number of cycles it takes to do an FP multiply or divide from 8 to 7; the other reduces the number of cycles for an FP add or subtract from 12 to 10.

(h) [8 pts] Let’s assume you want to make only one of your two possible enhancements. Purely on a performance basis – ignoring cost, that’s part (j), below – which one would you recommend, and why? (Justify your answer quantitatively.)
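For part (h), only the cycle counts change, so comparing execution times is enough; equivalently, each enhancement saves ΔCPI × (instruction count of the affected class) cycles, which is Amdahl's law applied to each option. A minimal sketch, with hypothetical names, that recomputes the run time with one class's CPI changed at a time:

```python
# Minimal sketch for part (h); names are hypothetical.
CLOCK_HZ = 100e6  # 100 MHz clock

# class -> (CPI, instruction count), copied from the ERAU Run table
BASE_MIX = {
    "mul/div": (8, 7_000_000),
    "add/sub": (12, 2_000_000),
    "other": (5, 10_000_000),
}


def time_s(mix):
    """Execution time = sum of CPI_i * IC_i, divided by the clock rate."""
    return sum(cpi * n for cpi, n in mix.values()) / CLOCK_HZ


def with_cpi(mix, cls, new_cpi):
    """Return a copy of `mix` with one class's CPI replaced (counts unchanged)."""
    updated = dict(mix)
    _, n = updated[cls]
    updated[cls] = (new_cpi, n)
    return updated


def speedup(old_mix, new_mix):
    """Speedup = old execution time / new execution time."""
    return time_s(old_mix) / time_s(new_mix)


if __name__ == "__main__":
    options = {
        "FP mul/div 8 -> 7": with_cpi(BASE_MIX, "mul/div", 7),
        "FP add/sub 12 -> 10": with_cpi(BASE_MIX, "add/sub", 10),
    }
    for label, mix in options.items():
        print(f"{label}: {time_s(mix):.3f} s, speedup {speedup(BASE_MIX, mix):.4f}")
```

Copying the dictionary in `with_cpi` keeps the baseline mix intact, so both options are measured against the same unmodified ERAU Run.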

(i) [8 pts] The current FP co-processor chip is 1 in² and contains 10⁶ transistors (yes, I know, that’s unrealistically low these days). The chip is being manufactured on 10-inch-diameter wafers. If the manufacturing process results in a defect density of 0.3 defects per in², what is the net expected number of usable dies per wafer? Use my formula here, not the book’s. (You don’t know the value of the book’s α factor anyway; I haven’t supplied one.)

(j) [10 pts] The potential improvement to the FP multiply/divide circuits would require 100,000 new transistors; the improvement to the FP add/subtract circuits would require 150,000. Now what is your final recommendation? Justify it quantitatively, of course. (Remember to consider that it may not be cost-effective to pursue either alternative.)