COMP1321 WEEK 3 – CPU Simulator Programs, and Assembly Language

Exercise 3(a) Using a CPU simulator with simple instruction set

As you know by now (I hope!) each CPU type has its own instruction set. These represent commands to the CPU to do something!

The Intel 8086 family has over 100 instructions, but only a small number are regularly used. This simulator uses a typical small subset of CPU instructions, but the trigger codes will be different from real 8086 instructions (more about this later…)

The simulator puts into action the fetch-execute principles that were covered in the lecture.

Links to simulator provided here:

Download to a folder on your local machine… and run…

You can use this simulator with the commands:

* End

* Add

* Subtract

* Store

* Load

* Branch Always

* Branch if ACC = 0

* Branch if ACC >= 0

* Input

* Output

Each command has a corresponding “opcode”, and input and output functions are triggered by different operands with the same opcode.
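To make the opcode/operand idea concrete, here is a minimal sketch of a fetch-execute loop in Python. It is not the simulator used in this exercise: the opcode numbers (borrowed from "Little Man Computer" style teaching machines), the three-digit instruction format (opcode times 100 plus operand) and the names `run` and `add_program` are all assumptions made for illustration.

```python
# Hypothetical opcode numbers (assumed, not taken from the course simulator).
END, ADD, SUBTRACT, STORE, LOAD = 0, 1, 2, 3, 5
BRANCH_ALWAYS, BRANCH_IF_ZERO, BRANCH_IF_NOT_NEGATIVE, IN_OUT = 6, 7, 8, 9
INPUT_OPERAND, OUTPUT_OPERAND = 1, 2  # same opcode, different operands


def run(memory, inputs):
    """Run the program held in `memory` and return the list of output values."""
    memory = list(memory)          # work on a copy of the program
    pending = list(inputs)         # values waiting to be read by Input
    outputs = []
    acc = 0                        # accumulator (ACC)
    pc = 0                         # program counter
    while True:
        instruction = memory[pc]                     # fetch
        pc += 1
        opcode, operand = divmod(instruction, 100)   # decode
        if opcode == END:                            # execute
            return outputs
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == SUBTRACT:
            acc -= memory[operand]
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == LOAD:
            acc = memory[operand]
        elif opcode == BRANCH_ALWAYS:
            pc = operand
        elif opcode == BRANCH_IF_ZERO:
            if acc == 0:
                pc = operand
        elif opcode == BRANCH_IF_NOT_NEGATIVE:
            if acc >= 0:
                pc = operand
        elif opcode == IN_OUT and operand == INPUT_OPERAND:
            acc = pending.pop(0)
        elif opcode == IN_OUT and operand == OUTPUT_OPERAND:
            outputs.append(acc)
        else:
            raise ValueError(f"unknown instruction {instruction}")


# Input, store at location 10, input, add location 10, output, end.
add_program = [901, 310, 901, 110, 902, 0, 0, 0, 0, 0, 0]
print(run(add_program, [7, 5]))  # [12]
```

Note how Input (901) and Output (902) share opcode 9 and differ only in the operand, which mirrors the description above.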

Try it out now…

We’ll look at a different simulator next week.

Some real assembly language…

The main purpose of the following exercises is to familiarise you with the use of a command line interface (CLI), but they also explore some important low-level programming concepts. You should, by the end of the session, be able to:

* Use DEBUG from the prompt

* Use DEBUG to unassemble a named executable program.

* Recognize the 8086 assembly language symbols (or mnemonics) for moving data and interrupting the CPU.

* Recognize the address format for a full 8086 memory location.

* Appreciate that assembly language programs can store "raw" data as well as instructions.

* Get out of debug, and back to the CLI prompt.

* Appreciate that executable assembly language programs are much shorter than equivalent programs written in conventional (third generation) languages.

* Appreciate that an assembly language program takes longer to design and code than a program written in a third generation language.

* Suggest types of program that should be written in assembly language and why it is often advisable to write programs in a 3GL (or 4GL), in preference to a 2GL.

Exercise 3(b): What does an Assembly Language Program look like?

As you are using 64-bit Windows 10, you’ll need to install the command line and debugger program yourself. This is not difficult!

  1. Locate the program DOSBOX on the Internet, download it and install it on your local machine. Note the folder where it is installed (clue: “program files (x86)”)

  1. Copy the DOSBOX folder to a more accessible location (create a new folder called DOSBOX and copy it there).
  1. Locate and download the program DEBUG.EXE from the Internet. Copy it to the DOSBOX folder.
  1. Run DOSBOX from there. You should get a Z: prompt
  1. Type MOUNT C: C:/DOSBOX
  1. Type C:
  1. Type DEBUG. If the debug prompt (-) appears, all is ready. Type q to exit!

Using DEBUG, it is possible to take a glance at the actual contents of any program.

  1. Find a program (ending with .exe, e.g. xyz.exe) and copy it to the DOSBOX folder
  1. Type DEBUG at the prompt, followed by the filename, e.g. DEBUG XYZ.EXE. The debug prompt (-) should appear.
  1. Type u (for unassemble). This changes the executable code of XYZ.EXE back into assembly language. The peculiar symbols you can see are 8086 assembly language codes (also known as mnemonics).

A further series of codes appears on the screen. These are the "unassembled" assembly language instructions. If you are unsure about hexadecimal, don't despair! Shortly, you’ll see and use a useful tool to manipulate hexadecimal numbers.

Quick Answer Question 1 : Why does each location in memory need an address?

QAQ 2 : What is the biggest decimal number that could be represented by two hexadecimal digits?

QAQ 3: How many addresses could be covered by 8 hex digits, such as 0D4C:0000?

Notice the sequences of eight hexadecimal digits on the left of the screen. These are the addresses for the locations in memory where the codes are stored. When I did this on my computer the starting address was 0D4C:0000. What was yours?

Also notice the numbers next to the memory addresses. These are the hexadecimal representations of the binary numbers stored in each memory location.
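The two halves of an address such as 0D4C:0000 are a segment and an offset. As a minimal sketch, assuming real-mode 8086 addressing (segment multiplied by 16, plus offset, kept to 20 bits), the physical location can be worked out as follows. The function name `physical_address` is just an illustrative choice.

```python
def physical_address(address):
    """Convert an 8086 'segment:offset' string into a 20-bit physical address."""
    segment_text, offset_text = address.split(":")
    segment = int(segment_text, 16)
    offset = int(offset_text, 16)
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only the lowest 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address("0D4C:0000")))  # 0xd4c0
print(hex(physical_address("0D4C:0010")))  # 0xd4d0
print(hex(physical_address("0D4D:0000")))  # 0xd4d0
```

The last two lines show that different segment:offset pairs can refer to the same byte of memory.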

  1. Press u again… and again… and again… you can view the whole program in this way!

Answers to QAQs: 1. So the CPU can find it again and gain access to its contents. 2. 255

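3. 16^8 = 2^32 (about 4.3 billion) combinations of digits. On a real 8086 many of these combinations overlap, so they reach only 2^20 (about one million) physical locations.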

Exercise 3(c): What do the Codes produced by Debug "Unassemble" actually mean?

Before an executable computer program can run, the program must be loaded into the computer's RAM. If that part of memory is accessed, the contents of individual memory locations can be scrutinised. This is exactly what the DEBUG program is doing.

These memory locations will contain the "machine code" that the microprocessor understands. However, not all locations used by the program will contain machine code...

The "Registry editor" program contains quite a lot of assembly language instructions. Sometimes, programs also contain data. These will be mistakenly unassembled by the "less than perfect" DEBUG, which assumes that they are instructions (!).

Quick Answer Question 4: One of the on-screen memory locations at one stage contained the hexadecimal number 48. Which letter is this the ASCII code for? Upper or Lower Case?

The actual program instructions have been correctly unassembled by DEBUG, and can be identified as MOV, INT, ADD, etc.

Quick Answer Question 5: Which part of the computer processes an assembly language instruction?

As you might guess, a MOV instruction is all about moving data from one place to another. Notice that two bytes (and therefore two memory locations) are required. The first represents the code for the instruction itself, and the second refers to the data being moved.

An INT instruction represents a "software" interrupt to the CPU (central processing unit). Interrupts are often used to input and output data. More about those later. Notice that two bytes are again required: one for the instruction, and the other for its data (the interrupt number).
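The two-byte pattern for MOV and INT can be sketched in Python. This is a deliberately tiny illustration, not a real unassembler: it recognises only the "move a byte value into an 8-bit register" form of MOV (first byte B0 to B7) and INT (first byte CD). The function name `unassemble_pair` is hypothetical.

```python
# 8-bit registers in the order used by opcodes B0..B7 (MOV register, byte value).
REGISTERS_8BIT = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def unassemble_pair(first, second):
    """Decode one two-byte instruction; only two instruction forms are handled."""
    if 0xB0 <= first <= 0xB7:
        # First byte selects MOV and the register, second byte is the data.
        return f"MOV {REGISTERS_8BIT[first - 0xB0]},{second:02X}"
    if first == 0xCD:
        # First byte is INT, second byte is the interrupt number.
        return f"INT {second:02X}"
    return "??"


print(unassemble_pair(0xB4, 0x02))  # MOV AH,02
print(unassemble_pair(0xCD, 0x21))  # INT 21
```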

PUSH and POP are all about manipulating an area of memory called the stack.
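The key property of a stack is "last in, first out". The sketch below uses a Python list as a stand-in for the stack area of memory; it models only the order in which values come back, not the way a real 8086 tracks the stack with its own registers.

```python
stack = []  # a Python list standing in for the stack area of memory

stack.append(0x1234)  # PUSH a value
stack.append(0xABCD)  # PUSH another value

first_popped = stack.pop()   # POP returns the most recently pushed value
second_popped = stack.pop()  # the next POP returns the earlier value

print(hex(first_popped), hex(second_popped))  # 0xabcd 0x1234
```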

If this is again confusing, you will be glad to know that the following section will look at memory locations in more detail...

Exercise 3(d) How is data actually stored in a Memory Location?

A memory location contains eight electronic switches, which are either in the "off" (0) or "on" (1) state. If these are put together, the location can be considered to "contain" a binary number, eight characters long. This is known as a byte.

Answers to QAQs : 4. H (capital) 5. Central Processing Unit (CPU)

An eight character binary number is rather long and cumbersome, so it is normally considered as two separate four character numbers (which, believe it or not, are known as "nibbles"). Each nibble can be represented by a single hexadecimal digit.
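As a quick illustration, this Python sketch splits a byte into its two nibbles and shows the hexadecimal digit for each. The function name `nibbles` and the example value are arbitrary choices.

```python
def nibbles(byte):
    """Split a byte (0..255) into its high and low four-bit halves."""
    high = (byte >> 4) & 0xF  # top four bits
    low = byte & 0xF          # bottom four bits
    return high, low


value = 0b10110110
high, low = nibbles(value)
print(f"{high:04b} {low:04b}")  # 1011 0110
print(f"{high:X}{low:X}")       # B6
```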

QAQ6: Why do computers store data in binary, instead of decimal numbers?

Like most computer system utilities, Debug works in hexadecimal numbers. Those of you who are more comfortable with base 10 numbers will be glad to know that it can add and subtract any hexadecimal numbers for you...

  1. Now, type h from the dash prompt, followed by any two hexadecimal numbers of two digits each. The screen display should provide you with their sum and difference.

QAQ7: What is the sum of A1 (hexadecimal) and 23 (hexadecimal)? Answer as a hexadecimal number, please!

QAQ8: And their difference?

  1. That's quite enough assembly language for a start! You can get out of debug by typing q at the dash prompt...
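If you want to check results from the h command away from DEBUG, the arithmetic can be sketched in Python. This assumes results are shown as four hex digits and wrap around at 16 bits; the function name `hex_sum_and_difference` is made up for this example, and the values are deliberately different from those in the questions above.

```python
def hex_sum_and_difference(first, second):
    """Return the sum and difference of two hex strings as four-digit hex."""
    a = int(first, 16)
    b = int(second, 16)
    total = (a + b) & 0xFFFF       # keep the result within 16 bits
    difference = (a - b) & 0xFFFF  # a negative result wraps around
    return f"{total:04X}", f"{difference:04X}"


print(hex_sum_and_difference("3F", "1A"))  # ('0059', '0025')
print(hex_sum_and_difference("1A", "3F"))  # ('0059', 'FFDB')
```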

Now, a little background reading...

The development of Assembly and "High Level" Languages

The early computers were programmed by a very long and tedious process. This involved the programmer writing lots of binary numbers directly into the computer's memory. This original computing language became known as machine language (because it was understood by the machine!).

Fairly soon, programmers on a particular computer probably thought that "there must be a quicker way to do this!". They were probably fed up with feeding in the same two or three bytes into memory for the computer to perform a certain task, and so they developed a sort of short-hand form. The short-hand could be converted into the necessary bytes, and inputted directly into the computer. This would have been the first assembly language, and the converter would have (eventually) been called an assembler.

assembly language → assembler → machine language
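A toy assembler shows the idea of converting short-hand into bytes. This sketch handles only two 8086 instruction forms (MOV of a byte value into an 8-bit register, and INT), with all numbers written in hexadecimal as DEBUG does. The names `assemble_line` and `REGISTER_CODES` are illustrative, and a real assembler does far more.

```python
# Register numbers used by the "MOV register, byte value" opcodes B0..B7.
REGISTER_CODES = {"AL": 0, "CL": 1, "DL": 2, "BL": 3,
                  "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def assemble_line(line):
    """Translate one line of short-hand into its two machine code bytes."""
    mnemonic, operands = line.split(maxsplit=1)
    mnemonic = mnemonic.upper()
    if mnemonic == "INT":
        return bytes([0xCD, int(operands, 16)])
    if mnemonic == "MOV":
        register, value = (part.strip() for part in operands.split(","))
        return bytes([0xB0 + REGISTER_CODES[register.upper()], int(value, 16)])
    raise ValueError(f"unsupported instruction: {line}")


source = ["MOV AH,02", "MOV DL,41", "INT 21", "INT 20"]
machine_code = b"".join(assemble_line(line) for line in source)
print(" ".join(f"{byte:02X}" for byte in machine_code))
# B4 02 B2 41 CD 21 CD 20
```

Four short lines of text become eight bytes, which is exactly the tedious part that early programmers wanted to stop doing by hand.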

QAQ 9: Which would be quicker to write - A program in machine language, or the same program in Assembly Language?

The reverse process (unassembly) is also possible, using a conversion program called an unassembler. The problem with unassemblers is that they try to make assembly language symbols out of every memory location - even those that contain data!

machine language → unassembler → assembly language

ASCII coded data → unassembler → assembly language

So it is very important to check any unassembled code, to make sure this has not happened (the instructions will not make any sense, in any case).
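To see why unassembled data looks like nonsense instructions, consider this small Python sketch. On the 8086, the single-byte codes 40 to 5F (hex) are the INC, DEC, PUSH and POP instructions for the 16-bit registers, and the same range covers the ASCII capital letters. The function name `misread_as_code` is hypothetical, and the sketch ignores every other instruction.

```python
REGISTERS_16BIT = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
# Codes 40..47 are INC, 48..4F are DEC, 50..57 are PUSH, 58..5F are POP.
MNEMONICS = ["INC", "DEC", "PUSH", "POP"]


def misread_as_code(text):
    """Show how an unassembler would interpret ASCII text as instructions."""
    lines = []
    for byte in text.encode("ascii"):
        if 0x40 <= byte <= 0x5F:
            index = byte - 0x40
            mnemonic = MNEMONICS[index // 8]
            register = REGISTERS_16BIT[index % 8]
            lines.append(f"{byte:02X}  {mnemonic} {register}")
        else:
            lines.append(f"{byte:02X}  ??")
    return lines


for line in misread_as_code("DATA"):
    print(line)
# 44  INC SP
# 41  INC CX
# 54  PUSH SP
# 41  INC CX
```

The word "DATA" stored in a program would therefore be listed as four perfectly valid, but meaningless, instructions.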

QAQ 10: A group of memory locations contain hexadecimal numbers that make no sense in machine language. Can you be sure that the contents of these locations have been corrupted?

Further reading:

Answers to QAQs: 6. Memory hardware can only be in one of two states (0 and 1) 7. C4 8. 7E 9. Assembly language 10. No - they could contain data

Exercise 3(e) Introducing input-output signals and SAM1

To be completed for next week…

CPUs are used for processing, but their input-output capabilities can also be used for control. If you have time:

Go straight to Activity 2. To access SAM1, go to:

Try to complete activities through to 6. If you don’t have time now, you are recommended to complete these in time for next week’s session, when we will start to look beyond the CPU…

RCH171