(A) Calculate the Average CPI for Each Machine, M1, and M2

Homework assignment #1

1. [15 points] Consider two different implementations, M1 and M2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 80 MHz and M2 has a clock rate of 100 MHz. The average number of cycles for each instruction class and their frequencies (for a typical program) are as follows:

(a) Calculate the average CPI for each machine, M1, and M2.

M1: 0.6 x 1 + 0.3 x 2 + 0.1 x 4 = 1.6 average CPI

M2: 0.6 x 2 + 0.3 x 3 + 0.1 x 4 = 2.5 average CPI

(b) Calculate the average MIPS ratings for each machine, M1 and M2.

M1: 80 MHz / 1.6 cycles/instr. = 50 MIPS

M2: 100MHz / 2.5 cycles/inst = 40 MIPS

(c) Which machine has a smaller MIPS rating ? Which individual instruction class CPI do you need to change, and by how much, to have this machine have the same or better performance as the machine with the higher MIPS rating (you can only change the CPI for one of the instruction classes on the slower machine)?

M2 has smaller MIPS rating . We change instruction class A to have 1 cycle on M2

M2 new CPI is 0.6 x 1 + 0.3 x 3 + 0.1x 4 = 1.9

M2 new MIPS = 100MHz/1.9 cycles/intr = 52.6 MIPS

2. [10 points] (Amdahl’s law question) Suppose you have a machine which executes a program consisting of 50% floating point multiply, 20% floating point divide, and the remaining 30% are from other instructions.

(a) Management wants the machine to run 4 times faster. You can make the divide run at most 3 times faster and the multiply run at most 8 times faster. Can you meet management’s goal by making only one improvement, and which one?

divider 3 times faster time = 0.5 + 0.2/3 + 0.3 = 0.86  1/0.86 = 1.15x faster

multiply 8 times faster time = 0.5/8 + 0.2 + 0.3 = 0.565  1/0.565 = 1.8x faster

CANNOT make the machine run 4 times faster.

(b) If you make both the multiply and divide improvements, what is the speed of the improved machine relative to the original machine?

divider 3 times faster and multiply 8 times faster

time = 0.5/8 + 0.2/3 + 0.3 = 0.429  1/0.429 = 2.33x faster

3. [5 points] Suppose that we can improve the floating point instruction performance of machine by a factor of 15 (the same floating point instructions run 15 times faster on this new machine). What percent of the instructions must be floating point to achieve a Speedup of at least 4?

100% = fp% + (100 – fp)%

now we want 100/4 = fp/15 + 100 –fp

(14/15)fp = 75 fp = 80.3 ------ 80.3% of the instructions are fp instructions

4. [15 points] In MIPS assembly, write an assembly language version of the following C code segment: int A[10], B[10]; for (i=1; i < 10; i++) { A[i] = A[i-1] + B[i]; } At the beginning of this code segment, the only values in registers are the base address of arrays A and B in registers $a0 and $a1. Avoid the use of multiplication instructions–they are unnecessary. Using SPIM to do this problem.

Array A base address is in $a0

Array B base address is in $a1

The MIPS assembly program is as follows;

addi$t0, $zero, 1# starting index i = 1

addi $t5, $zero, 10# loop bound

loop:lw $t1,0($a0)# load A[i-1], a0 has the base address of array A

lw $t2, 4($a1)# load B[i], a1 has the base address of array B

add $t3, $t2, $t1# A[i-1] + B[i]

sw $t3, 4($a0)# store A[i] = A[i-1] + B[i]

addi $a0, $a0, 4# go to next element in Array A

addi $a1, $a1, 4# go to next element in Array B

addi $t0, $t0, 1# increment the index variable

beq $t0, $t5, loop# compare with loop bound

exit: nop

5. [20 points] Convert the C function below to MIPS assembly language. Make sure that your assembly language code could be called from a standard C program (that is to say, make sure you follow the MIPS calling conventions). Using SPIM to do this problem.

unsigned int sum(unsigned int n)

{ if (n == 0) return 0; else return n + sum(n-1); }

This machine has no delay slots. The stack grows downward (toward lower memory addresses). The following registers are used in the calling convention:

# This program gets an interger number as input,

# calculates recursively outputs sum = n + (n-1) + (n-2) + ... +1 +0

# program ends when input = -1

.data

newline : .asciiz "\n"

endprogram : .asciiz " the end of program \n"

.text

.globl main

main:

jal read_int

beq $v0,-1,end

move $a0,$v0

jal fact

move $a0,$v0

jal print_int

jal new_line

j main

fact :

sub $sp,8

sw $ra,4($sp)

sw $a0,0($sp)

slt $t0,$a0,1

beq $t0,$zero,L1

add $v0,$zero, $zero

add $sp,$sp,8

jr $ra

L1 :

sub $a0,$a0,1

jal fact

lw $a0,0($sp)

lw $ra,4($sp)

add $sp,$sp,8

add $v0,$a0,$v0

jr $ra

print_int :

li $v0,1

syscall

jr $31

read_int :

li $v0,5

syscall

jr $ra

new_line :

li $v0, 4

la $a0, newline

syscall

jr $31

end :

li $v0, 4

la $a0, endprogram

syscall