The MARIE: A Simple Computer
Goals for this chapter:
1.Describe the top–level organization of a modern stored–program computer;
specifically name and describe the four main components.
2.Describe some of the lower–level components, such as registers,
that are common to three of the four main components of the computer.
3.Define an assembly language for this simple computer and use that assembly
language to investigate the functioning of a stored program computer.
NOTES:
1.The hypothetical computer designed by our textbook’s authors is called
the MARIE. See the textbook for the origin of the name.
2.The MARIE has an extremely simple design. Any computer the student sees or
uses, including the computers in one’s microwave oven and washing machine,
is more powerful than the MARIE.
3.While a very simple design, the MARIE illustrates all of the important features
of a modern stored–program computer. “Simpler” means “easier to learn”.
Why a Binary Computer?
The MARIE is typical of all modern computers in that it is a binary device.
1.All data are stored in binary format, and
2.All arithmetic is performed using two’s–complement binary conventions.
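As a small illustration of the two's-complement convention, here is a Python sketch that converts between signed integers and their bit patterns. The 16-bit width and the function names are assumptions made for this example only.

```python
WIDTH = 16  # assumed word width for this illustration


def to_twos_complement(value, width=WIDTH):
    """Return the unsigned bit pattern that represents a signed value."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {width} bits")
    return value & ((1 << width) - 1)


def from_twos_complement(pattern, width=WIDTH):
    """Return the signed value represented by an unsigned bit pattern."""
    if pattern & (1 << (width - 1)):      # sign bit set: value is negative
        return pattern - (1 << width)
    return pattern


def add(a, b, width=WIDTH):
    """Add two bit patterns, discarding any carry out of the top bit."""
    return (a + b) & ((1 << width) - 1)


if __name__ == "__main__":
    total = add(to_twos_complement(5), to_twos_complement(-7))
    print(from_twos_complement(total))
```

The same binary adder handles both positive and negative operands, which is one reason the convention is so widely used.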
Why not have decimal format? Just create memory elements that store one of 10 values.
The answer has two parts. Each is valid.
1.It is easy to make binary storage devices and binary arithmetic devices.
2.There are a number of significant design challenges in creating ten–state
devices. Mostly these have to do with electronic reliability.
In fact, no modern electronic computer has ever used true decimal storage and arithmetic.
Early machines, such as the ENIAC, were called decimal computers. In fact:
1.Each decimal digit was stored in binary format, and
2.All arithmetic was done in binary format, as adapted for storage
of individual decimal digits.
Computer Basics and Organization
The computer has four top–level components.
1.The CPU (Central Processing Unit)
2.The Main Memory
3.Input/Output Devices, including a Hard Disk
4.A Bus Structure to facilitate communications between the other components.
Major Components Defined
The system memory (of which this computer has 512 MB) is used for transient storage of programs and data. This is accessed much like an array, with the memory address serving the function of an array index.
The Input / Output system (I/O System) is used for the computer to save data and programs and for it to accept input data and communicate output data.
Technically the hard drive is an I/O device.
The Central Processing Unit (CPU) handles execution of the program.
It has four main components:
1.The ALU (Arithmetic Logic Unit), which performs all of the arithmetic
and logical operations of the CPU, including logic tests for branching.
2.The Control Unit, which causes the CPU to follow the instructions
found in the assembly language program being executed.
3.The register file, which stores data internally in the CPU. There are user
registers and special purpose registers used by the Control Unit.
4.A set of internal busses to allow the CPU units to communicate.
The System Level Bus allows the top–level components to communicate.
The System Clock
We discussed the system clock when we discussed the basic flip–flops.
Here are two depictions of the system clock that will be used in this course.
The top representation is used when discussing the system bus and the I/O bus.
The bottom representation is used elsewhere.
The system clock regulates the execution of instructions in the CPU and
synchronizes all of the CPU components to prevent errors due to bad timing.
The System Bus
The system bus comprises a set of digital lines, each of which carries a signal, power, or ground. The ground is used to complete circuits.
If the line carries a signal, that line is set either to logic 1 (often +5 volts) or logic 0
(often 0 volts). Consider it as transmitting a Boolean variable, in that the value on
that line can be changed by some control unit as a program is executed.
The ground lines also provide isolation between the signal lines, which can act as little antennas – either to “broadcast” signals or receive them from other signal lines.
This is called “cross talk”. It is not desirable.
High–speed busses are shorter than low–speed busses. High–speed busses are used for:
1.Connecting the CPU to main memory,
2.Connecting the CPU to the graphics system, and
3.Possibly connecting the CPU to other high–speed devices.
Low–speed busses are used to connect the CPU to I/O devices, possibly including the
main hard disk.
The speed of a data bus is determined by the time it takes an electrical signal to travel its
length. Light travels about 1 foot per nanosecond; signals travel about 8 inches in that time. A bus operating at 1 GHz cannot be longer than 8 inches; likely it is shorter.
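The length limit can be checked with a short calculation. This Python sketch assumes, as the notes do, that a signal covers about 8 inches per nanosecond and that it must traverse the bus within one clock period. The function name is hypothetical.

```python
# Assumption: signal speed of about 8 inches per nanosecond, the figure
# quoted in these notes. Real values depend on the bus construction.
SIGNAL_INCHES_PER_NS = 8.0


def max_bus_length_inches(clock_hz):
    """Distance a signal can travel during one clock period, in inches."""
    period_ns = 1e9 / clock_hz
    return SIGNAL_INCHES_PER_NS * period_ns


if __name__ == "__main__":
    for freq in (1e9, 500e6, 100e6):
        length = max_bus_length_inches(freq)
        print(f"{freq / 1e6:6.0f} MHz: at most {length:5.1f} inches")
```

This is an upper bound only; a practical bus is likely shorter, since the signal also needs time to settle.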
Notations Used for a Bus
Here is the way that we would commonly represent a small bus.
The big “double arrow” notation indicates a bus of a number of different signals.
Our author calls this a “fat arrow”.
Lines with similar function are grouped together. Their count is denoted with the
“diagonal slash” notation.
From top to bottom, we have
1.Three data lines: D2, D1, and D0
2.Two address lines: A1 and A0
3.The clock signal for the bus.
Not all busses transmit a clock signal; the system bus does.
Power and ground lines usually are not shown in this diagram.
Busses: Common and Point–to–Point
In general, a design should minimize point–to–point busses, as they introduce a
number of difficulties into the design.
Shared busses tend to have lower data rates than point–to–point busses, which then
are the design choice when the bus must support a high data rate.
Typical high–rate busses include the memory bus and the graphics bus. Each of these
is implemented as a point–to–point bus for two reasons:
1.To maximize the data rate, and
2.Because there is only one device with which the CPU communicates.
Some high–rate busses are shared busses. An example would be a bus connecting the
ALU to the register file in the CPU. At any time, at most one of the registers is using
this bus to communicate data to the ALU.
Busses that manage most I/O devices tend to be shared.
Connecting External Devices to the Computer Bus
External devices include printers, network cards, disk drives, and the computer keyboard.
Each device must be connected to one of the computer busses through a device called
an interface, often an “interface card”.
Each device will have software dedicated to controlling it, called a “device driver”.
Device drivers are often considered part of the computer operating system, because
they are called only by the operating system.
The main function of the device driver is to translate between the standard device
control signals used by the operating system and the device–specific control
signals required by the device’s interface card.
The function of the interface card is to present the data and control signals, properly
formatted, to the device being managed, and accept data back.
From the view of the CPU, each device is represented as a number of addressable
registers, some containing data and some control information. The interface card
presents these to the actual device. We shall develop this idea later.
Asynchronous and Synchronous Busses
One aspect of a bus depends on what assumptions can be made about the timing of
the devices attached. Can each device be assumed to work with fixed timing?
Consider a keyboard attached to a common bus. This produces data only when a
user actually presses a key. The timing of data availability is totally unpredictable.
In cases such as managing most I/O devices, an asynchronous bus is used. This means
that there is no clock signal used to coordinate events.
An asynchronous bus must use specific control signals to coordinate between the device
producing the data and the device receiving the data. Here is a sample set:
Request: the CPU signals the device that input is required.
Ready: the input device signals the CPU that data are ready to be read.
ACK: the CPU acknowledges that it has received the data.
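The three-signal exchange can be sketched in Python as follows. This is a simplified model, not a description of any real bus: the class `AsyncBus` and the function `read_from_device` are hypothetical names, and the three steps run one after another rather than concurrently.

```python
class AsyncBus:
    """Hypothetical bus with three control lines and one data value."""

    def __init__(self):
        self.request = False
        self.ready = False
        self.ack = False
        self.data = None
        self.trace = []          # order in which control lines were asserted

    def assert_signal(self, name):
        setattr(self, name, True)
        self.trace.append(name)


def read_from_device(bus, device_value):
    """Perform one Request / Ready / ACK exchange and return the data."""
    # 1. The CPU signals the device that input is required.
    bus.assert_signal("request")

    # 2. The device sees the request, places data on the bus, signals Ready.
    if bus.request:
        bus.data = device_value
        bus.assert_signal("ready")

    # 3. The CPU sees Ready, copies the data, and acknowledges receipt.
    value = None
    if bus.ready:
        value = bus.data
        bus.assert_signal("ack")

    # All control lines are released once the exchange is complete.
    bus.request = bus.ready = bus.ack = False
    return value


if __name__ == "__main__":
    bus = AsyncBus()
    print(read_from_device(bus, 0x41), bus.trace)
```

No clock appears anywhere in the exchange; each step is triggered by the signal that precedes it.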
For some devices, such as memory, we may assume constant timings. Here we have
fewer control signals. The sequence to read memory is typically simple.
1.The CPU asserts a memory address and the READ control signal.
2.After a fixed time, the CPU reads the data from the appropriate memory register.
The data can be assumed to be present and correct at the specified time.
The Memory Component
The memory stores the instructions and data for an executing program.
Memory is characterized by the smallest addressable unit:
Byte addressable: the smallest unit is an 8–bit byte.
Word addressable: the smallest unit is a word, usually 16 or 32 bits in length.
Most modern computers are byte addressable, facilitating access to character data.
Logically, computer memory should be considered as an array.
The index into this array is called the address or “memory address”.
A logical view of such a byte addressable memory might be written in code as:
Const MemSize =
byte Memory[MemSize] // Indexed 0 … (MemSize – 1)
The CPU has two registers dedicated to handling memory.
The MAR (Memory Address Register) holds the address being accessed.
The MBR (Memory Buffer Register) holds the data being written to the
memory or being read from the memory. This is sometimes
called the Memory Data Register.
The Simplistic Physical View of Memory
I call this the “linear view”, as memory is still modeled as one large linear array.
The N–bit address selects one of the $2^N$ entities, numbered 0 through ($2^N$ – 1).
Read sequence: First, address to MAR; command a READ.
Then, copy the contents of the MBR.
Write sequence: First, address to MAR; data to the MBR.
Then, command a WRITE.
This is logically correct, but difficult to implement at an acceptable price.
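The read and write sequences of the linear view can be modeled in a few lines of Python. This is a minimal sketch only: the class `MemorySystem` and the helpers `cpu_read` and `cpu_write` are hypothetical names, and the memory is assumed to be byte addressable.

```python
class MemorySystem:
    """Linear-view model: an array of bytes plus the MAR and MBR registers."""

    def __init__(self, address_bits):
        self.size = 2 ** address_bits     # 2^N addressable bytes
        self.cells = [0] * self.size
        self.mar = 0                      # Memory Address Register
        self.mbr = 0                      # Memory Buffer Register

    def read(self):
        """READ command: copy the addressed byte into the MBR."""
        self.mbr = self.cells[self.mar]

    def write(self):
        """WRITE command: copy the MBR (low 8 bits) into the addressed byte."""
        self.cells[self.mar] = self.mbr & 0xFF


def cpu_read(mem, address):
    mem.mar = address        # first: address to MAR, command a READ
    mem.read()
    return mem.mbr           # then: copy the contents of the MBR


def cpu_write(mem, address, value):
    mem.mar = address        # first: address to MAR, data to MBR
    mem.mbr = value
    mem.write()              # then: command a WRITE


if __name__ == "__main__":
    mem = MemorySystem(address_bits=4)
    cpu_write(mem, 5, 0x2A)
    print(cpu_read(mem, 5))
```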
Memory Organization and Addressing
Memory is based on binary bits. Each bit can hold one of two values: 0 or 1.
Except for unusual designs, individual bits in memory are not directly addressable
by the CPU (Central Processing Unit). The old IBM 1401 could access bits directly.
The most common memory groupings are as follows:
8 bits: a byte
16 bits: a word (some call this a short word)
32 bits: a longword (some call this a word)
The term “word” is somewhat ambiguous due to multiple definitions. In this course, we refer to “16–bit word”, “32–bit word”, etc.
In some computers, a word is the smallest addressable memory unit. Most of these, such as the CDC–6600 (60–bit words) are now obsolete.
In a byte–addressable computer (such as the Intel Pentium series), each byte is addressable individually, although 32–bit words can be directly accessed.
All computers with byte addressing provide instructions to access both 16–bit words and
32–bit longwords. The CPU just accesses two or four bytes at a time.
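To illustrate how a multi-byte access is built from individually addressable bytes, here is a Python sketch. It assumes little-endian byte order (least significant byte at the lowest address), as used by Intel processors. The helper name `read_word` is hypothetical.

```python
def read_word(memory, address, n_bytes):
    """Combine n_bytes consecutive bytes, starting at address, into one integer.

    Assumes little-endian order: the byte at the lowest address is the
    least significant byte of the result.
    """
    return int.from_bytes(memory[address:address + n_bytes], "little")


if __name__ == "__main__":
    memory = bytearray([0x78, 0x56, 0x34, 0x12])
    print(hex(read_word(memory, 0, 1)))   # a single byte
    print(hex(read_word(memory, 0, 2)))   # a 16-bit word
    print(hex(read_word(memory, 0, 4)))   # a 32-bit longword
```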
Memory Organization and Addressing (Part 2)
Memory is often described by a notation with the structure (L x W)
L is the number of addressable units in memory
W is the number of bits in each addressable unit
The old CDC–6600 usually had a 256 K x 60 memory. This was 256 × 1024 =
262,144 words, each of 60 bits. Yes, this was called a “supercomputer”.
A modern Pentium might have a memory described as 512 M x 8;
512 × $2^{20}$ = 512 × 1,048,576 = 536,870,912 addressable units, each with 8 bits.
This would be called a 512 MB memory.
Main memory sizes are not quoted in bits. Memory chip sizes often are quoted in bits,
but could be quoted in numbers of 4–bit elements as well as 8–bit bytes.
Common notation: 1K = $2^{10}$ = 1,024 (almost never seen these days)
1M = $2^{20}$ = 1,048,576
1G = $2^{30}$ = 1,073,741,824
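The (L x W) arithmetic can be checked with a short Python sketch. The constants follow the common notation for K, M, and G; the function names are hypothetical.

```python
K = 2 ** 10
M = 2 ** 20
G = 2 ** 30


def total_bits(units, bits_per_unit):
    """Total storage, in bits, of an (L x W) memory."""
    return units * bits_per_unit


def describe(units, bits_per_unit):
    """Format an (L x W) memory as a readable string."""
    bits = total_bits(units, bits_per_unit)
    return f"{units:,} units x {bits_per_unit} bits = {bits:,} bits"


if __name__ == "__main__":
    print(describe(256 * K, 60))   # the CDC-6600 example
    print(describe(512 * M, 8))    # the 512 MB Pentium example
```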
Address Space and Memory Addressing
N bits will address $2^N$ items. The Pentium has 32–bit addressing and will address $2^{32}$ bytes.
To address M items, we need N bits, with $2^{N-1} < M \le 2^N$. Quite often these days, we
have M = $2^N$ (obviously requiring N bits to address). It is also the case that we normally just state the number of address bits and keep the actual memory small enough.
Example: One of my recent laptop computers had 384 MB of byte–addressable
memory. How many bits would be required to address this?
Answer: This memory has 384 MB and is byte addressable, so it has to be able
to address 384 × $2^{20}$ distinct memory cells. 1M = $2^{20}$ = 1,048,576.
Now 384 = 256 + 128 = $2^8 + 2^7$,
so 384 MB = 384 × $2^{20}$ bytes = ($2^8 + 2^7$) × $2^{20}$ bytes = ($2^{28} + 2^{27}$) bytes.
If M = ($2^{28} + 2^{27}$), it should be obvious that $2^{28} < M \le 2^{29}$, so we need a 29–bit address.
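The same answer can be computed directly. This Python sketch finds the smallest N such that N bits can address M items; the function name `address_bits` is hypothetical.

```python
def address_bits(m):
    """Smallest N such that 2**N >= m, i.e. N bits can address m items.

    For m >= 2 this is the N satisfying 2**(N-1) < m <= 2**N.
    """
    if m < 1:
        raise ValueError("need at least one item to address")
    # (m - 1) is the largest address; its bit length is the width required.
    return (m - 1).bit_length()


if __name__ == "__main__":
    print(address_bits(384 * 2 ** 20))   # the 384 MB laptop example
    print(address_bits(2 ** 32))         # a full 32-bit address space
```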
Practicality: All Pentium computers provide a 32–bit address through
a 32–bit MAR (Memory Address Register).
RULE: Memory is cheap. You can never have enough memory on a computer.
When you buy a computer, buy all the memory it can be configured to use.
Memory as a Collection of Chips
In fact, physical memory is built from standard memory chips. For example, a 256 MB memory might be built from sixteen 16 MB chips, each of which might itself be implemented as eight 16 Mb (megabit) chips; a total of 128 chips.
Consider the textbook’s example: a 32 KB memory built from 4KB chips.
32 KB = $2^{15}$ bytes and 4 KB = $2^{12}$ bytes. We need ($2^{15}$ / $2^{12}$) = $2^3$ = 8 chips. In standard fashion, these chips will be numbered as 0 through 7 inclusive.
We need a 15–bit address for this memory. Address bits are numbered 14 through 0.
Here we adopt low–order interleaving. Consecutive addresses are placed in different chips. This facilitates faster access to memory. Here is the textbook’s figure showing the location of the first 32 addressable bytes.
Low–Order Interleaving: Partitioning the Address
Low–order interleaving will always use a chip count that is a power of 2: $2^K$ with K > 1.
The N–bit memory address will be broken into K bits for the chip selection and
(N – K) address bits for each chip.
In our example N = 15 and K = 3. In this low–order interleaving, the three low order bits select the chip to be used. These are bits 2, 1, and 0.
Bit:  14 13 12 11 10 9 8 7 6 5 4 3  |  2 1 0
Use:  12–bit address to each chip   |  Chip Select
In high–order interleaving (also called “memory banking”, not much used)
the high–order K bits select the chip. In our example
bits 14 – 12 select the chip and bits 11 – 0 are sent to each chip.
In low–order interleaving Chip_Number = Address Mod $2^K$, the remainder from division by the number of chips in the chip set. This count is always a power of 2.
This organization is often closely connected to the size of the memory cache blocks.
If the memory is $2^K$–way interleaved (low order), each cache line might have $2^K$ bytes.
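The two ways of partitioning an address can be compared with a short Python sketch, using the example values N = 15 and K = 3. The function names are hypothetical; each returns a pair (chip number, address within the chip).

```python
def split_low_order(address, n_bits, k_bits):
    """Low-order interleaving: the low K bits select the chip."""
    if not 0 <= address < 2 ** n_bits:
        raise ValueError("address out of range")
    chip = address % (2 ** k_bits)       # Chip_Number = Address Mod 2^K
    offset = address >> k_bits           # remaining (N - K) high-order bits
    return chip, offset


def split_high_order(address, n_bits, k_bits):
    """High-order interleaving: the high K bits select the chip."""
    if not 0 <= address < 2 ** n_bits:
        raise ValueError("address out of range")
    chip = address >> (n_bits - k_bits)
    offset = address & ((1 << (n_bits - k_bits)) - 1)
    return chip, offset


if __name__ == "__main__":
    N, K = 15, 3
    for addr in range(8, 16):
        low = split_low_order(addr, N, K)
        high = split_high_order(addr, N, K)
        print(addr, low, high)
```

With low-order interleaving, addresses 8 through 15 fall in eight different chips; with high-order interleaving they all fall in chip 0.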
Why have low–order interleaving?
This choice is due to the principle of locality; memory locations tend to be accessed one after another. If consecutive locations are in different chips, the CPU can initiate memory–read operations at a rate faster than any single memory chip could handle.
Consider the organization from the book, with an 8–way low–order interleaving.
Suppose that the CPU wants to fill a cache line with the eight bytes, indexed 8 to 15.
The CPU sends an address and READ command to module 0.
Without waiting for a response, the CPU sends an address and READ to module 1.
Finally, the CPU sends an address and READ command to module 7.
Then, the CPU actually reads from module 0.
If the memory access time is 80 nanoseconds, the CPU can issue one command every
10 nanoseconds, as it will take 80 nanoseconds to get back and read a given module.
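The benefit can be estimated with a simple timing model. This Python sketch assumes that each read goes to a different module, that a module delivers its data one access time after it receives the command, and that reading the returned data takes no extra time. These are simplifying assumptions, and the function names are hypothetical.

```python
def sequential_time_ns(n_reads, access_ns):
    """One module only: each read must finish before the next can start."""
    return n_reads * access_ns


def interleaved_time_ns(n_reads, access_ns, issue_ns):
    """Each read goes to a different module; commands are issued back to back.

    The last command is issued at (n_reads - 1) * issue_ns and its data
    are available one access time later.
    """
    return (n_reads - 1) * issue_ns + access_ns


if __name__ == "__main__":
    reads, access, issue = 8, 80, 10
    print("one module:    ", sequential_time_ns(reads, access), "ns")
    print("8-way interleaved:", interleaved_time_ns(reads, access, issue), "ns")
```

Under these assumptions the eight-byte cache line arrives in 150 nanoseconds rather than 640.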
The Central Processing Unit (CPU)
The CPU also has four main components:
1.The Control Unit (along with the IR) interprets the machine language instruction
and issues the control signals to make the CPU execute it.
2.The ALU (Arithmetic Logic Unit) that does the arithmetic and logic.
3.The Register Set (Register File) that stores temporary results related to the
computations. There are also Special Purpose Registers used by the Control Unit.
4.An internal bus structure for communication.
The Register File
There are two sets of registers, called “General Purpose” and “Special Purpose”.
The origin of the register set is simply the need to have some sort of memory on the computer and the inability to build what we now call “main memory”.
When reliable technologies, such as magnetic cores, became available for main memory, the concept of CPU registers was retained.
Registers are now implemented as a set of flip–flops physically located on the CPU chip. These are used because access times for registers are two orders of magnitude faster than access times for main memory: 1 nanosecond vs. 80 nanoseconds.
General Purpose Registers
These are mostly used to store intermediate results of computation. The count of such registers is often a power of 2, say $2^4$ = 16 or $2^5$ = 32, because N bits address $2^N$ items.
The registers are often numbered and named with a strange notation so that the assembler will not confuse them for variables; e.g. %R0 … %R15. %R0 is often fixed at 0.
NOTE: The MARIE has only one general purpose register – the AC (Accumulator).
Think of the AC as the display on a standard calculator.
Special Purpose Registers
These are often used by the control unit in its execution of the program.
PC: the Program Counter, so called because it does not count anything.
It is also called the IP (Instruction Pointer), a much better name.
The PC points to the memory location of the instruction to be executed next.