ENCM415 Blackfin ADSP-BF533 Assignment 3, 2007
Handling 64-bit operations on a Blackfin
Only 1 report needed from you and your laboratory partner
Due Tuesday 16th October 10 p.m.
(25% penalty if late up to 1 day, zero marks after that as answers will be posted)

The following letter was received in the Dean’s Office from Santa Claus early in September.

Santa’s House

North Pole,

Canada H0H0H0

Dear ENCM415 student,

I have tried to become a “technology-aware” Santa Claus. I went down the chimney of a number of Grade 12 students and placed ADSP-BF533 evaluation boards into their stockings, but nothing has gone right.

They claim that when they use the evaluation board for 1 second or 5 seconds, they find that the processor performs well, giving clock rates of over 300 MHz. BUT when they use the board for 10 seconds, the processor gets tired and slows down to only 6 MHz. To prove this, they have sent me some of their test files and the screen capture below.

Unless I can find out (A) why they got such strange results, and (B) fix the problem so I can show them that the processor does not get tired, my reputation is ruined. Can you help?

Father Christmas

P.S. I know you guys are working hard. I have told your instructor that I will be putting coal in his stocking for next Christmas unless he lets you use the same assembly language functions you develop for this assignment during Lab. 2 to cut down your workload!


REMINDER: Unless I can find out (A) why they got such strange results, and (B) fix the problem so I can show them that the processor does not get tired, my reputation is ruined. (Maximum 1 page, 10 marks.)

A lot of documentation on how to use the CYCLES register in assembly code is available from the Analog Devices website, www.analog.com/dsp, although, like most manufacturers’ websites, it is not easy to find.

HINT: The answers to Santa’s two questions can be found in the hardware and software manuals. This assignment shows the importance of reading those manuals, especially when you, or somebody else, finds that the processor does not work as expected.

This is a typical “defect”. It is unusual in that the “final product” customer has actually sent in a “simple” example and documented output. You are doing customer maintenance on released code.

A nasty thing about this particular defect: IF you can actually reproduce the defect and then single-step through the code to find the problem, it does not occur. The actual process of debugging stops the defect from occurring.

One of the main differences between errors and defects is the amount of time spent in tracking down a defect.

There are two satisfactory answers to this question that are worth 10 marks:

A)  I could reproduce this defect. It was caused by ………. Tell me what you did.

B)  I could not reproduce this defect. Since it had the following characteristics, I sought it here, I sought it there, I could not find that scarlet defect anywhere. It was not this ……. because ……… or this …….. because

I marked the reports as follows: F unsatisfactory, D very basic, C average understanding, B detailed understanding, A excellent explanation.

The fault is in the “CYCLES” subroutines.

Hints for that explanation are everywhere in the question: we are writing that code, and the screen dump shows the timing failing after some time.

The following is needed for an A – your wording may be different.

CYCLES2 / CYCLES together form a 64-bit counter made of two 32-bit registers. You can’t read a 64-bit value in one instruction; you have to do it in two. The counter WILL change in the time between reading one register and reading the other.

Let’s model CYCLES2 / CYCLES as two 2-bit registers, and assume that one Blackfin read takes 1 cycle. Each row of the table is one clock cycle.

CYCLES2 | CYCLES | Read instruction executed (value read) | Program value | Proper answer | Result
00 | 00 | Read CYCLES2 = 00
00 | 01 | Read CYCLES = 01 | 00 01 | 00 00 or 00 01 | close
00 | 10
00 | 11 | Read CYCLES2 = 00
01 | 00 | Read CYCLES = 00 | 00 00 | 01 00 or 00 11 | WRONG
01 | 01
01 | 10
01 | 11 | Read CYCLES = 11
10 | 00 | Read CYCLES2 = 10 | 10 11 | 01 11 or 10 00 | WRONG
10 | 01
10 | 10
10 | 11
11 | 00
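The table can be reproduced with a small host-side C++ model. This is a sketch under stated assumptions, not Blackfin code: `ToyCounter`, `readHighThenLow` and `readLowThenHigh` are hypothetical names, the 4-bit counter stands in for CYCLES2 / CYCLES, and exactly one cycle passes between the two reads.

```cpp
#include <cstdint>

// Toy model of the table (hypothetical, for illustration only):
// a 4-bit counter whose top 2 bits stand in for CYCLES2 and whose
// bottom 2 bits stand in for CYCLES. It ticks once per clock cycle.
struct ToyCounter {
    std::uint8_t value = 0;  // 4-bit count, 0..15

    void tick() { value = static_cast<std::uint8_t>((value + 1) & 0x0F); }
    std::uint8_t hi() const { return static_cast<std::uint8_t>((value >> 2) & 0x03); }  // "CYCLES2"
    std::uint8_t lo() const { return static_cast<std::uint8_t>(value & 0x03); }         // "CYCLES"
};

// Read the high half, let one cycle pass, then read the low half.
std::uint8_t readHighThenLow(ToyCounter &c) {
    std::uint8_t h = c.hi();
    c.tick();  // the counter keeps running between the two reads
    std::uint8_t l = c.lo();
    return static_cast<std::uint8_t>((h << 2) | l);
}

// Same idea with the two reads in the opposite order.
std::uint8_t readLowThenHigh(ToyCounter &c) {
    std::uint8_t l = c.lo();
    c.tick();  // the counter keeps running between the two reads
    std::uint8_t h = c.hi();
    return static_cast<std::uint8_t>((h << 2) | l);
}
```

Starting from 00 11, `readHighThenLow` returns 00 00, and starting from 01 11, `readLowThenHigh` returns 10 11: these are the two WRONG rows. Swapping the order of the two reads does not help on its own.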


Many programmers solve this sort of “clock roll-over” problem by reading the clock registers twice: if the two readings are “very different”, then a clock roll-over must have occurred and the clock values can be adjusted.
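One common form of this software approach reads the high word, then the low word, then the high word again, and retries if the two high-word readings differ. The C++ sketch below illustrates that variant on a simulated counter; `SplitCounter` and `readConsistent` are hypothetical names, and the assumption that every read advances the count by exactly one cycle is a simplification.

```cpp
#include <cstdint>

// Hypothetical 64-bit counter exposed as two 32-bit halves that keeps
// counting while it is being read (a stand-in for hardware, not a real API).
struct SplitCounter {
    std::uint64_t count;

    std::uint32_t readHigh() {
        std::uint32_t h = static_cast<std::uint32_t>(count >> 32);
        ++count;  // each read instruction costs one cycle
        return h;
    }
    std::uint32_t readLow() {
        std::uint32_t l = static_cast<std::uint32_t>(count);
        ++count;  // each read instruction costs one cycle
        return l;
    }
};

// Software fix for roll-over: read high, low, high again. If the two high
// reads differ, the low word rolled over in between, so try again.
std::uint64_t readConsistent(SplitCounter &c) {
    for (;;) {
        std::uint32_t hi1 = c.readHigh();
        std::uint32_t lo = c.readLow();
        std::uint32_t hi2 = c.readHigh();
        if (hi1 == hi2) {
            return (static_cast<std::uint64_t>(hi1) << 32) | lo;
        }
    }
}
```

Starting one tick before the low word overflows, the first attempt sees high words 0 and 1, so it retries and returns 0x1 00000002, a value the counter really held. The cost is extra reads and a loop whose run time varies.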

Blackfin has a hardware solution to this. Each time you read the CYCLES register, the hardware stores the current CYCLES2 value into the SHADOW CYCLES2 register. When you read the CYCLES2 register, you actually read the SHADOW CYCLES2 register, that is, the value CYCLES2 had the last time the CYCLES register was read. CYCLES must therefore be read first.

CYCLES2 | CYCLES | Read instruction executed (value read) | Program value | Proper answer | Result
00 | 00 | Read SHADOWCYCLES2 = 00 (reset value)
00 | 01 | Read CYCLES = 01, hardware sets SHADOWCYCLES2 = 00 | 00 01 | 00 00 or 00 01 | close (what students did)
00 | 10
00 | 11 | Read SHADOWCYCLES2 = 00
01 | 00 | Read CYCLES = 00, hardware sets SHADOWCYCLES2 = 01 | 00 00 | 01 00 or 00 11 | WRONG (what students did)
01 | 01
01 | 10
01 | 11 | Read CYCLES = 11, hardware sets SHADOWCYCLES2 = 01
10 | 00 | Read SHADOWCYCLES2 = 01 | 01 11 | 01 11 | CORRECT
10 | 01
10 | 10
10 | 11
11 | 00
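The shadow-register behaviour in this table can be modelled the same way. In this hypothetical C++ sketch (`ShadowCounter`, `readStudentsOrder` and `readCorrectOrder` are illustration names, not Analog Devices APIs), reading CYCLES copies the high word into a shadow, reading CYCLES2 returns that shadow, and each read instruction is assumed to cost one cycle. The halves are 32 bits each, as on the real processor.

```cpp
#include <cstdint>

// Toy model (hypothetical) of the CYCLES / CYCLES2 shadow behaviour:
// reading CYCLES latches the current high word into a shadow register,
// and reading CYCLES2 returns that shadow, not the live high word.
struct ShadowCounter {
    std::uint64_t count = 0;          // live 64-bit cycle count
    std::uint32_t shadowCycles2 = 0;  // reset value 0

    std::uint32_t readCYCLES() {
        std::uint32_t lo = static_cast<std::uint32_t>(count);
        shadowCycles2 = static_cast<std::uint32_t>(count >> 32);  // latched at the same instant
        ++count;  // one cycle per read instruction
        return lo;
    }
    std::uint32_t readCYCLES2() {
        std::uint32_t hi = shadowCycles2;  // stale unless CYCLES was just read
        ++count;
        return hi;
    }
};

// Students' order: R1 = CYCLES2; R0 = CYCLES;
std::uint64_t readStudentsOrder(ShadowCounter &c) {
    std::uint32_t r1 = c.readCYCLES2();
    std::uint32_t r0 = c.readCYCLES();
    return (static_cast<std::uint64_t>(r1) << 32) | r0;
}

// Correct order: R0 = CYCLES; R1 = CYCLES2;
std::uint64_t readCorrectOrder(ShadowCounter &c) {
    std::uint32_t r0 = c.readCYCLES();
    std::uint32_t r1 = c.readCYCLES2();
    return (static_cast<std::uint64_t>(r1) << 32) | r0;
}
```

With the count one cycle before the low word rolls over, the students' order returns 0 instead of a value near 2^32, while the correct order returns 0xFFFFFFFF, the value the counter held when CYCLES was read.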

Thus the code (as I suggested in the assignment)

    R1 = CYCLES2;
    R0 = CYCLES;

gives the correct answer except when the CYCLES2 / CYCLES clock has rolled over since the previous read of CYCLES, for example from 0x00000000 FFFFFFFF to 0x00000001 00000000.

The correct answer was to read CYCLES first:

    R0 = CYCLES;     /* also stores CYCLES2 into SHADOWCYCLES2 */
    R1 = CYCLES2;    /* actually reads SHADOWCYCLES2 */

What made the DEFECT (unintentionally) more interesting was that my Blackfin processor (revision 0.2 silicon) was powering up in 436 MHz mode, so the CYCLES register rolled over at around 8 seconds. That is why I showed the values for 1, 2, 5 and 10 seconds in the screen dump. However, the processors in the lab were running at 218 MHz, so they did not fail after 8 seconds but after 16 seconds.
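The same arithmetic accounts for the numbers in Santa’s letter. At 436 MHz, 10 seconds is 4,360,000,000 cycles, which exceeds 2^32 = 4,294,967,296. If the high word is lost, only 4,360,000,000 - 4,294,967,296 = 65,032,704 cycles remain, and dividing by 10 s gives about 6.5 MHz, consistent with the “6 MHz” the students reported. The sketch below is host-side C++; `apparentRateHz` is a hypothetical name, and it assumes a perfectly constant core clock.

```cpp
#include <cstdint>

// Apparent clock rate (Hz) reported by a timing routine that loses the
// CYCLES2 high word, so only the low 32 bits of the count survive.
double apparentRateHz(double coreHz, double seconds) {
    // True number of cycles elapsed (fits easily in 64 bits).
    std::uint64_t trueCycles = static_cast<std::uint64_t>(coreHz * seconds);
    // Truncate to 32 bits: all that is left once the high word is lost.
    std::uint32_t lowCycles = static_cast<std::uint32_t>(trueCycles);
    return static_cast<double>(lowCycles) / seconds;
}
```

At 1 and 5 seconds the function returns the full 436 MHz, because the count has not yet passed 2^32. At 218 MHz the count takes twice as long to pass 2^32, so 10 seconds still gives the full 218 MHz.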

Q1) This was an exercise to get you to:

1)  Understand mixed-mode display

2)  Learn to read the screen and compare C++ and assembly code

3)  Get useful practice for the midterm and final

All of the following should be easy to do. I cannot predict your exact answers; they depend on how much code you added and the file names you used.

Q2) Load the Assign3 executable onto the processor. Now click on the Count1Second.cpp file to bring it into the editor window. Right click in the window and select the option “Mixed”. You should now see both the C++ code and the assembly code.

Capture a screen shot of the code and place it in your report. 2 marks

Answer the following questions based on your screen shot

A)  What is the memory location (in section program) for the start of the Count1Second( ) function?

B)  What is the memory location (in section program) for the end of the Count1Second( ) function?

C)  What register is being used for the count variable?

D)  What is the memory location (in section program) where the test count < COUNT_1_SECOND is made?

E)  What is the memory location (in section program) where the operation count++ is performed?

5 marks

Now you turn on the compiler’s optimizer. Since this code has no useful output (as far as the compiler is concerned), it is dead code and is optimized away to nothing.

A and B still have answers, but C, D and E do not.

In fact, if you turned on IPA (inter-procedural optimization), you would find that this code got “inlined” (placed into the main code as straight-line code rather than called as a subroutine), and then that straight-line code would have been removed entirely.

Q3) We can now ask the compiler to generate optimized code.

Capture a screen shot of the code and place it in your report. 2 marks

Answer the following questions based on your screen shot

A)  What is the memory location (in section program) for the start of the Count1Second( ) function?

B)  What is the memory location (in section program) for the end of the Count1Second( ) function?

C)  What register is being used for the count variable?

D)  What is the memory location (in section program) where the test count < COUNT_1_SECOND is made?

E)  What is the memory location (in section program) where the operation count++ is performed?

2 marks

Add volatile to the loop counter. To get full marks, you needed to explain that volatile “forces the compiler” to re-read (and re-write) the loop counter every time around the loop, as was discussed in class during the human microprocessor play. I had the TA adjust the mark if this explanation was missing.
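A host-side illustration of the volatile fix follows. `CountLoop` is a hypothetical stand-in for Count1Second( ) (the original source is not reproduced here), with COUNT_1_SECOND replaced by a `limit` parameter so the result can be checked. Because `count` is volatile, the compiler has to perform every read and write of it, so even at high optimization levels the loop cannot be removed as dead code.

```cpp
// Hypothetical stand-in for Count1Second(): COUNT_1_SECOND is replaced by a
// parameter so the result can be checked on a host machine.
unsigned long CountLoop(unsigned long limit) {
    // volatile: every test and every increment must be done as a real
    // memory access, so the optimizer cannot delete the loop.
    volatile unsigned long count = 0;
    while (count < limit) {
        count = count + 1;  // same effect as count++; written this way because
                            // C++20 deprecates ++ on volatile objects
    }
    return count;
}
```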

BONUS – You can add ONE keyword to the Count1Second.cpp code and restore (most of) the code to this function in an optimized format. What is that keyword, where does it go, why does it work, and what optimizations are present in the code? TO GET THIS BONUS, YOU MUST WORK OUT THE ANSWER YOURSELF; DON’T ASK ANYBODY ELSE, INCLUDING T.A.s AND THE INSTRUCTOR. HINT: The keyword has already been mentioned in class.
6 marks

Q4) Now build the functions.
THESE WILL BE NEEDED IN LAB. 2, SO COMPLETE THEM BEFORE COMING INTO THE LAB.

unsigned long long int ReadCycleCounterASM(void); // Return the 64-bit value of the Blackfin CYCLES2 / CYCLES counter
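Once ReadCycleCounterASM( ) returns the full 64-bit count, timing becomes a plain 64-bit subtraction. The helper below is a hypothetical host-side sketch: `elapsedSeconds` and its `coreHz` parameter are illustration names, and the caller must supply the actual core clock frequency.

```cpp
#include <cstdint>

// Hypothetical helper: convert two 64-bit cycle-counter readings into
// seconds. coreHz is the core clock frequency, supplied by the caller.
double elapsedSeconds(std::uint64_t startCycles, std::uint64_t endCycles, double coreHz) {
    // Unsigned 64-bit subtraction stays correct even when the low 32 bits
    // (CYCLES) have rolled over between the two readings.
    std::uint64_t cycles = endCycles - startCycles;
    return static_cast<double>(cycles) / coreHz;
}
```

On the board, `startCycles` and `endCycles` would come from two calls to ReadCycleCounterASM( ). At 436 MHz a 64-bit count takes well over a thousand years to wrap, so the 10-second “tired processor” effect disappears.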

Q5) Question from the 2005 Final. Hand in your answer on a separate sheet. 10 marks

Design a documented Blackfin assembly code subroutine
void WaitForSignal(int *leftAudioChannel, long int threshold)

which will continually monitor the left audio channel and only return from this subroutine when there is a signal (higher than a certain noise threshold level) present on the left audio channel. Follow the coding conventions established for this course. This code should not make use of memory locations declared in other files.

To make life easier for the markers (and increase your chances of partial marks) please try to match your assembly code up (left side of the page) with the documentation (right side of the page).

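The answer layout should look something like this (assembly code on the left, documentation on the right):

Blackfin assembly code / Documentation

    .section program;
    .global _WaitForSignal__FpiL;            /* (or some equivalent name mangle) */

    #define pointer_inpar1_R0   R0           /* int *leftAudioChannel */
    #define threshold_inpar2_R1 R1           /* long int threshold */
    #define pointer_P0          P0           /* the address must be in a P register to load */
    #define temp_R2             R2           /* latest sample; R1 already holds threshold */

    _WaitForSignal__FpiL:                    /* void WaitForSignal(int *leftAudioChannel, long int threshold) { */
        pointer_P0 = pointer_inpar1_R0;
    DOLOOP:                                  /*   while (*leftAudioChannel < threshold) */
        temp_R2 = [pointer_P0];              /*     re-read the channel every time around */
        CC = temp_R2 < threshold_inpar2_R1;  /*     signed compare: sample < threshold? */
        IF CC JUMP DOLOOP;                   /*     do nothing, keep waiting */
    _WaitForSignal__FpiL.END: RTS;           /* } */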
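For checking the logic on a host machine, the documentation column corresponds to the C++ below. Declaring the pointer `volatile int *` is an assumption added here, not part of the exam prototype: without it, an optimizing compiler may read `*leftAudioChannel` once and spin forever, the same issue as the Count1Second( ) loop in Q3.

```cpp
// C++ reference model of the documented behaviour (a sketch, not the
// required assembly answer). volatile forces a fresh read of the channel
// on every pass, because the value is expected to change outside this loop.
void WaitForSignal(volatile int *leftAudioChannel, long int threshold) {
    while (*leftAudioChannel < threshold) {
        /* do nothing: wait until a sample reaches the threshold */
    }
}
```

In the lab, the left-channel value would presumably be updated by the audio interrupt routine while this loop polls it.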