Run Length Encoding/Decoding

EE113D Project

Imran Hoque

Yipeng Li

Diwei Zhang

Introduction and Background

Run length encoding (RLE) is a simple technique to compress digital data by representing successive runs of the same value in the data as the value followed by the count, rather than the original run of values. The goal is to reduce the amount of data needed to be stored or transmitted. The need to use run length encoding often arises in various applications in DSP, especially image processing and compression.

Example of RLE:

Original

1 / 1 / 1 / 0 / 0 / 1 / 1 / 1

RLE Representation

1 / 3 / 0 / 2 / 1 / 3

As we can see from the above simple example, RLE works the best when applied to data where there are successive runs of the same values. Although one might think that such situations are trivial and not the norm, they actually appear over and over in many fields of DSP. As an example, when applying a low-frequency digital filter against a random input of varying frequencies, all the frequencies above the cut-off frequency will be represented as 0. Thus in the corresponding output, there would be runs of 0’s which could be better represented as the value (0) followed by the count of consecutive 0’s. Another real world application of RLE is in image processing. During image compression, higher spatial “frequencies” are filtered out. This is done by applying a 2-D DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform), then multiplying the result with a quantization matrix to reduce the amount of precision stored in the higher spatial frequencies (i.e. the higher spatial frequency components will be 0 or very close to 0). The resulting image is often stored as a vector of values and they are read in from the 2-D matrix using a zig-zag technique that maximizes the number of consecutive 0’s so that RLE would be most efficient. Thus, standard image compression techniques such as JPEG uses RLE extensively, and what we implemented on the DSP would become part of a whole system that does JPEG encoding and decoding. The advantages of RLE encoding is that it’s, in principle, very easy to comprehend and implement compared with other compression techniques used today. However, the compression results depends heavily upon the input.

In the best case, RLE can reduce data to merely two numbers if all the values in the original data are exactly the same, regardless of the size of the input. However, in the worst case, that is if there are no repeating values in the data, RLE could actually double the amount of numbers compared with the original data. Thus RLE should only be used in cases where runs of the same value are expected.

Another advantage of RLE is a lossless (or reversible) compression technique. That is unlike some other compression techniques, such as JPEG, one can obtain exactly the original data. This is done through an RLE decoder which we have also implemented in both Matlab and assembly code.

Project Development

The first week of the project development time was spent on making a matlab algorithm and verifying the results against theoretical results. The matlab code for both the encoder and decoder worked just as expected. A convenience that we had in matlab but not in assembly code was the ability to add elements to an array whenever needed. This was very important in RLE since the output size of the encoder depends on the input patterns and isn’t known until runtime.

At the time of submission, we were unable to declare a vector of variable size in assembly code. Thus, both the input and output to our run length encoder is stored as a pre-defined static vector. However, we did realize that our vectors are actually represented, in assembly, as a pointer to a specific memory location. Thus, we could theoretically increment the pointer beyond our predefined memory allotment for the output, effectively creating a vector of variable size.

The problem with this method is that by dereferencing the pointer at memory addresses that aren’t specifically allocated for the output data, we could be accidentally altering important memory that is in use elsewhere by the DSP. This is actually less of a problem when programming the DSP than it would be if we were programming a standard CPU, because we can actually monitor and control exactly what memory addresses are used by the DSP. On the other hand a C++ compiler would automatically allocate a random address in CPU memory for the output, and since a CPU handles multiple tasks, altering random memory addresses belonging to other CPU functions may result in serious problems.

Our current run length encoder algorithm works only for a preset data input of 16 numbers. We thus defined the output as a static vector of 32 numbers, all initialized to ‘-1’. This is because, given an arbitrary 16-number input, the maximum possible output can only be 32 (as is the case if there are no repeated numbers in succession). For example, the input: ‘1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16’ will yield the output: ‘1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,12,1,13,1,14,1,15,1,16,1’.

For outputs of less than 32 numbers, it is easy to see where the encoded output ends, because our entire output array is initialized to ‘-1’ at the start, and because our input data shall contain no negative numbers, thus our actual encoded output shall also contain no negative numbers.

The decoder follows a slightly different algorithm than the encoder, thus there are not quite as many restrictions placed on the input to the decoder. The encoder algorithm requires the input to be exactly 16 numbers, because the code is programmed to stop after reading through exactly 16 numbers of input. The decoder, however, is programmed to stop after exactly 16 numbers have been read to the output. Thus, the input: “1,16” will yield the exact same output (‘1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1’) as the input: “1,16,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1’, which would be the actual output to the encoder. Provided a valid run-length-encoded input to the decoder, the output should be exactly 16 positive numbers.

One problem that occurred during the initial debugging of our code was that the original encoder algorithm worked perfectly when individually stepping through it. However, when we just let the DSP run, the output came out completely wrong. The output simply displayed the last number of the input sequence, and that it was repeated 16 times. So even though our initial algorithm worked in theory, it did not work in accordance with the speed limitations of the DSP. Originally, we had incremented the two pointers (the one pointing to one input location, and the one pointing to the next) before we performed the comparison operation on them. It appeared that the DSP was reading through to the end of the input data faster than it could individually compare every two consecutive numbers. So although the counter kept counting up (all the way to 16), the comparisons were not being performed properly (which also meant the output loop was not being called within the comparison loop), which explains why in the end only one output (the very last input number) was displayed along with a count of 16. We were able to fix this problem by incrementing the pointers only after the comparison operation had been performed.

Discussion of Results

Our MATLab simulation was a simple algorithm, but it performed the RLE task correctly. The correctness of the code was easy to verify, we simply feed the algorithm an input and compared the output with the input. We knew what a correctly RLE-ed version of the input looked like, so it was easy to compare and verify the validity of the MATLab simulation.

Verifying the validity of our DSP assembly code was a similar process. We feed the DSP an input bit stream, which was in the form of a text file. The output was retrieved by reading the assigned memory location, which was then also saved in the form of a text file. While we were able to correctly retrieve the output of the encoder when we stepped thought the algorithm, when we ran the simulation completely, we faced speed limitations on the DSP. The debugging phase was by far the hardest part of the whole DSP programming procedure. We spent a good amount of time trying to find a solution which worked and eventually we determined that incrementing the pointers after the comparison operation fixed the problem. After this point we were able to run the simulation at full speed. The decoder did not give us many problems. We were able to code the decoder fairly quickly and it worked flawlessly without many adjustments being needed to be made.

Sample Inputs Tested (Encoder)

Input: 4,5,5,2,7,3,6,9,9,10,10,10,10,10,10,0,0

Matlab Output: 4,1,5,2,2,1,7,1,3,1,6,1,9,2,10,6,0,2

DSP Output: 4,1,5,2,2,1,7,1,3,1,6,1,9,2,10,6,0,2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, -1,-1

Input: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Matlab Output: 0,16

DSP Output: 0,16,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1

Input: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Matlab Output:0,1,1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,12,1,13,1,14,1,15,1

DSP Output: 0,1,1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,12,1,13,1,14,1,15,1

Since we know that the output of the encoder ends right before the -1’s begin, we can easily check our results from the DSP against the results from Matlab and they both match theoretical results.

Sample Inputs Tested (Decoder)

Input: 0,16

Matlab Output: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

DSP Output: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Input: 0,1,1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,12,1,13,1,14,1,15,1

Matlab Output: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

DSP Output: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Then we checked for the results of the decoder and we looked for two things, first the results should match the matlab output as well as theoretical results, second the decoder should give the inverse result as the encoder. As we can see from these couple of sample inputs and their corresponding results, the DSP output matched both the matlab output and the theoretical results. In addition, giving the decoder the output of the encoder got us back to exactly the original input to the encoder which is what we’d expect.

We also tested for the efficiency of the assembly code. We did so by simply invoking breakpoints along the assembly code in Code Explorer and see how often a critical piece of code gets called. This would give us an insight into how many loops are needed to encode a given input sequence. Theoretically, RLE should be linear time since there only needs to be n loops for an input of n length. During our debugging phase, we verified that both the encoder and decoder also ran in approximately linear time. For the encoder, ‘loop2’ is indicative of how many iterations the code has to go through, and by setting the breakpoint there we could see that for the 16 number input samples we picked, we had 16 breaks on ‘loop2’. For the decoder, ‘loop2’ was chosen as the breakpoint again and we saw that again for the 16 number input samples we chose, we had 16 break points.

Conclusion

We were able to successfully implement the RLE on our DSP. The RLE algorithm is a simple one; however, its implementation on the DSP is not as simple. Our MATLab implementation was fairly simple to write and run. Writing the DSP assembly code was far more complex. The DSP assembly code requires us to be much more precise and deliberate.

Future improvements to our RLE/D algorithm that we considered were variable length inputs and real time outputs. Through this project we were able to gain valuable hands on experience with the entire DSP programming procedure.

Code

Encoder input:

;e_inputs.asm

;encoder input file

;'array' specifies the location of an array of numbers in data memory

;space, where the first item is located at address 'array'. This is

;essentially the input to our encoder, which is a 16-number data set.

;We have specifically designed our RLE encoder to take just positive

;numbers or 0 as inputs, this is not a limitation of the algorithm.

;It's just a way we decided to implement the algorithm simply things.

;similarly, 'output' stores the 32-number output data set, which is all

;initialized to '-1' at the beginning. Depending on the input, the

;output may or may not take up all of the 32 memory addresses allocated.

;Thus, it should be known that any '-1's stored in the memory addresses

;do not pertain to the encoded output created by the encoder.

array .word 4,5,5,2,7,3,6,9,9,10,10,10,10,10,10,0,0

output .word -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1

Encoder

;EE113D Final Project(encoder)

;

;Run Length Encoder: Shortens a series of input data by representing

;consecutive repeated numbers as the repeated number, followed by

;the number of repetitions.

;

;(Written by: Yi-peng Li, Diwei Zhang, Imran Hoque)

;

.setsect ".text", 0x500,0 ;Executible code in ".text"

;section will begin at 0x500

;in program memory

.setsect ".data", 0x800,1 ;Numbers to be sorted will

;begin at 0x800 in data memory

.data ;Data section begins

.copy "e_inputs.asm";Reads input values and initialize output

.text ;Executible code section begins.

count1.set 1;Initialize a counter starting at the number 1

count2.set 15;Initialize a counter starting at the number 15

AR6 = #array ;AR6 points to the input data location

AR3 = #array ;AR3 points to the next input data location

A = *AR3+

A = *AR3

AR5 = A;AR5 represents the actual number stored

;in the memory address AR3 points to

;(the actual number represented in the input)

A = *AR6

AR0 = A;AR0 represents the actual number stored

;in the memory address AR6 points to

AR4 = #output;AR4 points to the output data location

AR2 = #count2;AR2 keeps track of how much of the input

;data has been read

;loop1 initializes the count of AR1 to '1'

loop1 AR1 = #count1 ;Register AR1 is used to keep track of the

;number of repeated inputs in succession

;loop2 reads through the input data, and keeps

;track of the number of consecutive inputs

loop2TC = (AR0 != AR5);Compares the number stored in AR5 with the

;number stored in AR0

A = *AR6+;Increment the pointer AR6

A = *AR3+;Increment the pointer AR3

A = *AR3

AR5 = A;Re-initialize AR5

A = *AR6

AR0 = A;Re-initialize AR0

if (TC) goto loop3;Break loop if next number different

A = *AR1+;else continue counting

if (*AR2- != 0) goto loop2 ;Stop encoder if count of AR2 reaches zero

A = *AR2+;Leave count of AR2 at zero

;loop3 stores the encoded input, followed by

;its repeated count into the output array

loop3A = *AR6-;Point back to the last repeated number

A = *AR6

*AR4+ = A;Add the repeated number to output

A = AR1

*AR4+ = A;Add the count of repeated number to output

A = *AR6+;Move pointer back to where it left off

if (*AR2- != 0) goto loop1;Stop encoder if count of AR2 reaches zero

stop nop

goto stop ;infinite loop

.end

Decoder Input

;d_inputs.asm

;decoder input file

;'array' specifies the location of an array of numbers in data memory

;space, where the first item is located at address 'array'. This is

;essentially the input to our decoder (or the output to our encoder).

;Due to our decoder algorithm, the input to the decoder may be of an

;arbitrary data length (w/ or w/o the arbitrary '-1's that may be

;contained at the end of the encoder output). Note that all input

;numbers must be positive or 0 since we assumed that the input to the

;encoder were all positive or 0.

;'output' stores the 16-number output to the decoder, which is all

;initialized to '-1' at the beginning. Because the original input

;to the encoder must contain exactly 16 numbers, the output to the

;decoder must also contain exactly 16 numbers. Thus, the final output

;of the decoder must not contain any '-1's given a valid input.

array .word 1,1,0,15

output .word -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1

Decoder

;EE113D Final Project(decoder)

;

;Run Length Encoder: Takes as its input, a string of data encoded

;according the the run length encoder algorithm, and outputs it

;as the decoded string of data originally input to the encoder.

;

;(Written by: Yi-peng Li, Diwei Zhang, Imran Hoque)

;

.setsect ".text", 0x500,0 ;Executible code in ".text"

;section will begin at 0x500

;in program memory

.setsect ".data", 0x800,1 ;Numbers to be sorted will

;begin at 0x800 in data memory

.data ;Data section begins

.copy "d_inputs.asm";Get input values and initialze outputs

.text ;Executible code section begins.

count2.set 14;Initialize a counter starting at the number 14

AR6 = #array ;AR6 points to the input data location

AR3 = #array ;AR3 points to the next input data location

A = *AR3+

A = *AR3

AR5 = A;AR5 keeps track of the number of repetitions

AR4 = #output;AR4 points to the output data location

AR2 = #count2;AR0 keeps track of how much of the input

;data has been read

;loop2 reads through the input data to the decoder

loop2if (*AR5- != 0) goto loop3;Keep outputting the current input number until

;the following count of that number reaches zero

A = *AR6+;Else continue reading thru input

A = *AR6+;Increment twice to get next number in output

A = *AR3+

A = *AR3+;Increment twice to get the count of that number

A = *AR3

AR5 = A;Re-initialize AR5

if (*AR2- != 0) goto loop2 ;Stop encoder if count of AR2 reaches zero

goto stop

;loop3 stores the decoded output, by expanding the

;number of repeated inputs

loop3A = *AR6

*AR4+ = A;Add the repeated number to output

goto loop2

stop nop

goto stop ;infinite loop

.end