ETM2126 INFORMATION THEORY AND ERROR CODING

FACULTY OF ENGINEERING

LAB SHEET

INFORMATION THEORY AND ERROR CODING

ETM 2126

TRIMESTER 2 (2010/2011)

Experiment 1: / IT1 –Huffman Coding
Experiment 2: / IT2 – Linear Block Coding

Note: Students are advised to read through this lab sheet before doing experiment. On-the-spot evaluation may be carried out during or at the end of the experiment. Your performance, teamwork effort, and learning attitude will count towards the marks.

Multimedia University

Experiment 1: Huffman Coding

1.OBJECTIVE

  • To apply source coding techniques using Huffman Codes.
  • To design and generate Huffman Codes with MATLAB software.

2.SOFTWARE REQUIRED

  • MATLAB

3.INTRODUCTION TO MATLAB

3.1MATLAB

MATLAB is a powerful collection of tools for algorithm development, computation and visualization. It provides more control and flexibility compared to a traditional high-level programming language. Unlike such languages, MATLAB is compact and easy to learn, letting you express algorithms in concise, readable code. In addition, MATLAB provides an extensive set of ready-to-use functions including mathematical and matrix operations, graphics, color and sound control, and low-level file I/O. MATLAB is readily extendable - you can use the MATLAB language to easily create functions that operate as part of the MATLAB environment.

The MATLAB User's Guide describes how to work with the MATLAB language. It discusses how to enter and manipulate data and use MATLAB's extensive collection of functions. It explains how to perform command-line computations and how to create your own functions and scripts. The MATLAB Reference Guide provides reference descriptions of supplied MATLAB functions and commands.

3.2SIMULINK

SIMULINK is a dynamic system simulation environment. It allows you to represent systems as block diagrams, which you build using your mouse to connect blocks and your keyboard to edit block parameters.

The SIMULINK User's Guide describes how to work with SIMULINK. It also provides reference descriptions of each block in the standard SIMULINK libraries.

3.3Communications Toolbox

The Communications Toolbox is a collection of computation functions and simulation blocks for research, development, system design, analysis, and simulation in the communications area. The toolbox is designed to be suitable for both experts and beginners in the communications field. The toolbox contains ready-to-use functions and blocks, which you can easily modify to implement your own schemes, methods, and algorithms. The Communications Toolbox is designed for use with MATLAB and SIMULINK. The Communications Toolbox includes:

  • SIMULINK blocks
  • MATLAB functions

You can use the toolbox directly from the MATLAB workspace. You can use the SIMULINK environment to construct a simulation block diagram for your communication system. For convenience, the Communications Toolbox provides online interactive examples that use any of the function found in this toolbox.

3.4Available Functions

This toolbox supports a variety of functions in the communications area. These functions include:

  • Data source
  • Source coding/ decoding
  • Error-control coding
  • Modulation/demodulation
  • Transmission/ reception filters
  • Transmitting channel
  • Multiple access
  • Synchronization
  • Utilities

4.MATLAB FUNDAMENTALS

In this section, we will illustrate some commands used for this experiment. (For additional information on the individual MATLABcommands, we refer you to the MATLAB User's Guide or the online help facility.)

4.1Fundamental Expressions

Working in the MATLAB environment is straightforward because most commands are entered as you would write them mathematically. For example, entering the following simple expression

>a = 4/3

yields the MATLAB response

a =

1.3333

In general, it is better to use appropriate and memorable names for variables. MATLAB recognizes the first 19 characters of a variable name but the first character in a variable name be a letter. MATLAB is case sensitive and all MATLAB commands are written in lowercase letters.

If you prefer to create a new variable but do not wish to see the MATLAB response, type a semicolon at the end of the expression. For example,

>b = 4+7;

will create a new variable b whose value is 11, but MATLAB will not display the value of b.

4.2Creating Script Files

It is much more convenient to use script files than to enter commands line by line at the MATLAB prompt. A script file is an ASCII file (regular text file) that contains a series of MATLAB commands written just as you would enter them in the MATLAB environment. Statements that begin with a '%' are considered to be comments and are ignored by MATLAB. The script file is created outside of MATLAB with any available text editor or word processor. Each script file should have a name that ends in ".m". The commands in the script file are executed in the MATLAB environment simply by entering the name of the script file without the ".m". For example, suppose the text file magphase.m contains the following statements:

% magphase.m: example m-file to

% compute the magnitude and phase of G at  = 1.

 =1;

G =l/(j* + 2);

mag =abs(G)

phase =atan(imag(G)/real(G))

Then type magphase at the MATLAB prompt will yield the following MATLAB response:

mag=

0.4472

phase=

-0.4636

which are the magnitude and phase of the transfer function G(j) = l/(j + 2) evaluated at  = 1. By entering help magphasewill give you back the text in the comment lines at the beginning of the file:

magphase.m: example m-file to

compute the magnitude and phase of G at =1.

It should be obvious that it is very desirable to include at least some brief comments as a header to each m-file.

4.3Matrices

Matrices are entered into MATLAB by listing the elements of the matrix and enclosing them within a pair of brackets. Commas or blanks separate elements of a single row, and semicolons or carriage returns separate rows. For example,

>A = [1 2; 3 4]

yields the MATLAB response

A=

1 2

3 4

To get the dimensions of a matrix, we can use the size command, e.g.,

>[K,N] = size(A)

K=

2

N=

2

Special vectors can be created using the ':' operator. For example, the command

>k= 1:10

will generate a row vector with elements from 1 to 10 with increment 1.

Using the simple commands described thus far, you can easily manipulate both matrices and vectors. For example, to add a row onto matrix A we type,

>A = [A; [7 8]]

A=

1 2

3 4

7 8

To extract the submatrix of A that consists of the first through second columns of the second through third rows, use vectors as indices as follows:

>B = A(2:3,1:2)

B=

3 4

7 8

MATLAB has commands for generating special matrices. For example, you can create a n x n identity matrix with the eye(n). The zeros, ones and rand commands work similarly to eye and make matrices with elements equal to zero, elements equal to one, and random elements, respectively. These commands can also be used to create non-square matrices. For example, zeros(2,4) generates a 2 x 4 matrix of zeros.

Finally, A = A' command will give the transpose of the matrix A.

4.4Additional Commands

A) huffmandict

DICT = HUFFMANDICT(SYM, PROB)

Generates a binary Huffman code dictionary using the maximum variance algorithm for the distinct symbols given by the SYM vector. The symbols can be represented as a numeric vector or single-dimensional alphanumeric cell array. The second input, PROB, represents the probability of occurrence for each of these symbols. SYM and PROB must be of same length.

DICT = HUFFMANDICT(SYM, PROB, N)

Generates an N-ary Huffman code dictionary using the maximum variance algorithm. N is an integer between 2 and 10 (inclusive) that must not exceed the number of source symbols whose probabilities appear in PROB.

DICT = HUFFMANDICT(SYM, PROB, N, VARIANCE)

Generates an N-ary Huffman code with the specified variance. The possible values for VARIANCE are 'min' and 'max'.

[DICT, AVGLEN] = HUFFMANDICT(...)

Returns the average codeword length of the Huffman code.

B) huffmanenco

ENCO = HUFFMANENCO(SIG, DICT)

Encodes the input signal, SIG, based on the code dictionary, DICT. The code dictionary is generated using the HUFFMANDICT function. Each of the symbols appearing in SIG must be present in the code dictionary, DICT. The SIG input can be a numeric vector or a single-dimensional cell array containing alphanumeric values.

C) HUFFMANDECO

HUFFMANDECO(COMP, DICT)

Decodes the numeric Huffman code vector COMP using the code dictionary, DICT. The encoded signal is generated by the HUFFMANENCO function. The code dictionary can be generated using the HUFFMANDICT function. The decoded signal will be a numeric vector if the original signals are only numeric. If any signal value in DICT is alphanumeric, then the decoded signal will be represented as a single-dimensional cell array.

5.THEORY

5.1Source Coding

Source encoding (or source coding) is a process by the help of which the data generated by a discrete source is efficiently represented. The device that performs the representation is called a source encoder.

For a source encoder to be efficient, we require knowledge of the statistics of the source. In particular, if some source symbols are known to be more probable than the others, then we may exploit this feature in the generation of a source code by assigning short code words to frequent source symbols, and long code words to rare source symbols. We refer to such a source code as a variable-length code.

An efficient source encoder should satisfy the following two functional requirements:

1.The code words produced by the encoder are in binary form.

2.The source code is uniquely decodable, so that the original source sequence can be reconstructed perfectly from the encoded binary sequence.

Consider a discrete memoryless source whose output sk is converted by the source encoder into a block of 0s and 1s, denoted by bk. We assume that the source has an alphabet with K different symbols, and that the kth symbol sk occurs with probability pk, k = 0, 1, …, K-1. Let the binary code word assigned to sk by the encoder have a length lk, measured in bits. We define the average code word length,L, of the source encoder as

L = pklk k = 0, 1, …, K-1

L represents the average number of bits per source symbol used in the encoding process.

Let Lmin denote the minimum possible value of L. We then define the coding efficiency of the source encoder as

 = Lmin/L

With LLmin, we clearly have  1. The source encoder is said to be efficient when  approaches unity.

5.2 Source Coding Theorem

Given a discrete memoryless source of entropy H(S), the average code-word length L for any distortionless source encoding is bounded as

LH(S)

This is also known as Noiseless coding theorem or Shannon’s first theorem.

Accordingly, the entropy H(S) represents a fundamental limit on the average number of bits per source symbol necessary to represent a discrete memoryless source. In other words, L cannot be made smaller than H(S).

Thus with Lmin = H(S), we may rewrite the efficiency of a source encoder in terms of the entropy H(S) as

 = H(S)/L

5.3 Huffman Coding

Huffman code is a variable-length code. The basic principle of Huffman coding is to assign short codes to symbols with high p, and long codes to symbols with low p. The purpose is to construct a code such that its average code-word length that approaches the minimum i.e. entropy of source H(S).

The followings are the encoding steps:

1.The source symbols are listed in order of decreasing probability.

2.The two symbols with the lowest probability are assigned a 0 and a 1.

3.The probabilities of the last two symbols are added.

4.The list of symbols is re-sorted with the last two symbols combined.

5.Repeat steps 2, 3, 4 until there are only two symbols left.

6.Assign the final two combined symbols a 0 and a 1.

6.PROCEDURE

Step 1:Start the MATLAB program.

Step 2: In the Command Window, type the MATLAB code line by line, as given in Example (see Section 7). Observe the result after typing each line.

Alternatively, you can type the entire MATLAB code in an m-file (this is actually the preferred method). By saving the m-file, you can run the code at a later date, or you can make minor modifications without retyping the entire code, or you can copy the code to another computer to be executed there.

Step 1:Start the MATLAB program.

Step 2:To view the present working directory, type:

pwd

This is where all your m-files will be stored. To change the present working directory, use the "cd" command. For example:

cd d:\infot\exp1

Step 3:Go to File -> New -> M-file. The MATLAB Editor window will open.

Step 4:Type the entire code as given in Example (see Section 7).

Step 5: Save the m-file by going to File -> Save As. Choose a short name for your m-file. Make sure it contains no spaces.

Step 6:To run the m-file from the MATLAB Editor window, go to Tools -> Run. You can also execute the m-file from the Command Window by typing the m-file name (without the “.m”) on the command line.

7.EXAMPLE

STEP 1: Generating the message sequence

Let us generate a message which consists of one sentence of text: 'information theory is interesting'. The message will be stored in a variable called msg. In the MATLAB command window, type

msg = 'information theory is interesting'

Our message contains a total of 33 characters, including spaces.

STEP 2: Estimating the source

Before a Huffman coder can properly encode the given message, it requires information about the source alphabet and the source statistics. The source alphabet is a list of distinct symbols in the message sequence. There are altogether 14 distinct symbols in our message:

symb = {'i' 'n' 'f' 'o' 'r' 'm' 'a' 't' ' ' 'h' 'e' 'y' 's' 'g'}

The source statistics, on the other hand, is a list of probabilities corresponding to each symbol in the source alphabet. Since we have 14 distinct symbols, our source statistics should have 14 probabilities:

p = [5/33 4/33 1/33 3/33 3/33 1/33 1/33 4/33 1/33 3/33 1/33 2/33 1/33 3/33]

Observe that we have enclosed the elements of symb with curly braces { } instead of the usual square brackets [ ]. Doing so indicates that symb is a cell array, instead of a character array. The variable symb will become an input to a later function called huffmandict, which requires that the source alphabet be of cell data type.

STEP 3: Constructing the Huffman codebook

Based on the supplied source alphabet and source statistics, the MATLAB function huffmandict generates the Huffman codebook:

dict = huffmandict(symb, p)

The entire codebook construction process (including the Huffman tree) is done in the background, and is invisible to the user. The resulting codebook is stored in the variable dict.

Like the variable symb previously, the variable dict is also a cell array. In order to access the elements in dict, we again use curly braces{ }. To see the Huffman code for the first symbol, type:

dict{1,:}

To view the Huffman code for the rest of the symbols, simply increment the subscript value:

dict{2,:}

dict{3,:}

STEP 4: Encoding the message

Now that the Huffman codebook is ready, we may proceed to convert the message sequence into a binary stream:

binstream = huffmanenco(msg, dict)

The encoding process simply consists of replacing each symbol in the message with the corresponding binary word in the codebook.

STEP 5: Decoding the binary stream

The binary stream will be transmitted over a communications channel to the receiver. At the receiving end, it will be decoded in order to recover the original message sequence. Message decoding is generally performed with the help of a code tree. In MATLAB, we simply type:

msgdeco = huffmandeco(binstream, dict)

Huffman code is a type of prefix code, so no ambiguity will arise in the decoding process. If our transmission is error-free, the decoded message should match the original message emitted by the source. Also note that the same dictionary (or codebook) is used for both encoding and decoding.

8.PROBLEM

1.Generate a message sequence that incorporates your name. The message should be phrased as

'my name is <insert name here>'

Store the message in a variable called myname.

2.Estimate the source alphabet and source statistics of your message. Store the source alphabet in the variable salpha, and the source statistics in the variable sstat.

3.Draw the Huffman tree corresponding to the information found in part 2 (do this by hand). Trace the branches in the Huffman tree to construct the Huffman codebook.

4.Type the MATLAB code that automatically generates the Huffman codebook. Compare this codebook with the one constructed manually in part 3. Is there any difference between the two codebooks? If so, why?

5.Encode your message sequence and convert it to a binary stream.

6.Answer the following questions:

(a) How many bits were used to encode your message?

(b) If 7-bit ASCII were employed instead of Huffman, what would the length of your binary stream be?

(c) Estimate the percentage of compression that is achieved by Huffman coding relative to using 7-bit ASCII.

(d) Calculate the efficiency of your Huffman coder.

9.REPORT

Print out and submit your report 2 weeks after the laboratory session. The report should contain the following:

  • Introduction
  • Procedures
  • Results and Discussion
  • Conclusions
  • Appendices (MATLAB code)

Lab report writing is an individual work. Fabricating results and copying the report of others are strictly prohibited. Severe penalties will be imposed for any such offences.

FACULTY OF ENGINEERING

LAB REPORT SUBMISSION

ETM2126 INFORMATION THEORY

AND ERROR CODING

TRIMESTER 2 SESSION 2010/2011