Memory Hierarchy:

-Cache: volatile, very fast, and expensive.

-RAM: volatile, medium-speed, and medium-price.

-Disk Storage: nonvolatile, slow, and inexpensive.

Memory Manager: The part of the O/S that manages the memory hierarchy.

-Keeps track of which parts of memory are in use and which parts are not in use.

-Allocates memory to processes when they need it and de-allocates it when they are done.

-Manages swapping between MM (main memory) and disk when MM is too small to hold all processes.

1)Basic Memory Management:

-Memory management systems are divided into two classes:

  • Those that move processes back and forth between MM and disk during execution (swapping & paging), and
  • Those that do not.

a)Monoprogramming without Swapping or Paging:

i)Runs one program at a time, sharing the memory between the O/S and that program.

b)Multiprogramming with Fixed Partitions:

i)Multiprogramming increases the utilization of the CPU.

ii)To achieve multiprogramming, divide memory into n fixed partitions.

iii)When a job arrives, it is put in the input queue for the smallest partition large enough to hold it.

iv)In this scheme, any space not used by a job is lost.

v)MFT (Multiprogramming with a Fixed Number of Tasks) used on IBM OS/360: has incoming jobs queued until a suitable partition is available, at which time the job is loaded into that partition and run until it terminates.

Relocation and Protection: two problems introduced by multiprogramming.

(1)When a program is linked, a linker must know at what address the program will begin in memory.

(2)Ex: if the first instruction is a call to a procedure at absolute address 100 and the program is loaded into partition 1 (starting at address 100K), that instruction will jump to absolute address 100, which is inside the O/S. What is needed is a call to 100K + 100. This problem is known as relocation.

(3)Relocation during loading doesn’t solve the protection problem. Programs in this system use absolute memory addresses, so there is no way to stop a program from building an instruction that reads or writes any word in memory. This is especially dangerous in a multi-user environment.

(4)Two solutions:

(a)Assign a 4-bit protection code to each block of memory, and give the PSW a 4-bit key. Any memory access by a running process is trapped if the block’s protection code differs from the PSW key (the scheme used on the IBM 360).

(b)Two hardware registers, called the base and limit registers.

(i)When a process is scheduled, the base register is loaded with the address of the start of its partition, and the limit register is loaded with the length of the partition.

(ii)Every memory address generated has the base register contents added to it before being sent to memory, and addresses are checked against the limit register to make sure the process does not address memory outside its partition.

(iii)Users can’t modify the base and limit registers. (A sketch of this translation follows.)
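A minimal sketch of this mechanism (Python; the partition values are made up for illustration):

    # Base/limit relocation and protection: the hardware adds the base
    # register to every program-generated address and traps any address
    # at or beyond the limit.
    class ProtectionFault(Exception):
        pass

    def relocate(virtual_addr: int, base: int, limit: int) -> int:
        if virtual_addr >= limit:          # outside the partition: trap
            raise ProtectionFault(f"address {virtual_addr} exceeds limit {limit}")
        return base + virtual_addr         # relocation by the base register

    # A call to "address 100" in a program loaded at 100K becomes 100K + 100:
    print(relocate(100, base=102400, limit=65536))   # -> 102500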

2)Swapping:

Sometimes there is not enough MM to hold all the currently active processes, so excess processes must be kept on disk and brought in to run dynamically.

Two approaches to memory management can be used:

a)Swapping: a strategy that consists of bringing in each process in its entirety, running it for a while, then putting it back on the disk.

b)Another strategy, called virtual memory, allows programs to run even when they are only partially in memory.

c)Swapping is illustrated in Figure 4-3. The flexibility of not being tied to a fixed number of partitions that may be too large or too small improves memory utilization, but it also complicates allocating and de-allocating memory, as well as keeping track of it.

d)Memory Compaction: When swapping creates multiple holes in memory, it is possible to combine them into one big one by moving all processes downward as far as possible. It is a technique that requires a lot of CPU time.

e)If a process’s data segment grows, e.g., by dynamically allocating memory from a heap, a problem occurs when it outgrows its allocation.

f)If a hole is adjacent to the process, it can be allocated and the process is allowed to grow into the hole.

g)If the process is adjacent to another process:

i)The growing process will either have to be moved to a hole in memory large enough for it, or

ii)One or more processes have to be swapped out to create a large enough hole.

iii)If a process can’t grow in memory and the swap area on the disk is full, the process will have to wait or be killed.

Figure 4-4.

h)Memory Management with Bit Maps “memory is assigned dynamically”:

i)Memory is divided up into allocation units. Corresponding to each allocation unit is a bit in the bit map, which is 0 if the unit is free and 1 if it is occupied.

ii)The smaller the allocation unit, the larger the bit map.

iii)Disadvantage: when it has been decided to bring a k-unit process into memory, the memory manager must search the bit map to find a run of k consecutive 0 bits in the map, a slow operation (see the sketch below).
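A minimal sketch of that search (Python; the bit map is modeled as a list of 0/1 integers for clarity, though real implementations pack the bits into words):

    def find_free_run(bitmap, k):
        """Return the index of the first run of k free (0) units, or -1."""
        run_start, run_len = 0, 0
        for i, bit in enumerate(bitmap):
            if bit == 0:
                if run_len == 0:
                    run_start = i            # a new run of free units begins
                run_len += 1
                if run_len == k:
                    return run_start
            else:
                run_len = 0                  # run broken by an occupied unit
        return -1                            # no hole large enough

    print(find_free_run([1, 1, 0, 0, 0, 1, 0, 0, 0, 0], 4))   # -> 6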

i)Memory Management with Linked Lists:

i)Maintain a linked list of allocated and free memory segments, where a segment is either a process or a hole between two processes. Fig. 4 – 5 (c).

ii)The segment list is kept sorted by address.

(1)Advantage: When a process terminates or is swapped out, updating the list is easy.

iii)A terminating process has 2 neighbors (except when it is at the very top or bottom of memory).

iv)Neighbors may be either processes or holes, leading to four combinations in Fig. 4 – 6.

When the processes and holes are kept on a list sorted by address, several algorithms can be used to allocate memory for a newly created or swapped in process.

v)First Fit Algorithm: the memory manager scans along the list of segments until it finds a hole that is big enough. The hole is then broken up into two pieces:

(1)One for the process, and

(2)One for the unused memory, except in the unlikely case of an exact fit.

Simple and fast (“little searching is involved”); it generates larger leftover holes on average than best fit does.

vi)Next Fit Algorithm: Same logic as first fit, except that it

(1)Keeps track of where it is whenever it finds a suitable hole.

(a)The next time it is called to find a hole, it starts searching the list from the place where it left off last time, instead of always starting at the beginning as first fit does.

vii)Best Fit Algorithm:

(1)Searches the entire list and takes the smallest hole that is adequate.

(2)Problems:

(a)Slow

(b)More useless holes “It tends to fill up MM with tiny holes”

viii) Worst Fit Algorithm (to avoid the tiny-holes problem):

(1)Always takes the largest available hole, so that the hole broken off will be big enough to be useful. Simulation has shown that worst fit is not a very good idea either. (The four strategies are sketched below.)
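A sketch of the four fit strategies over a hole list kept sorted by address (Python; the (start, length) hole representation and the names are illustrative, not from the text):

    def pick_hole(holes, k, strategy, last=0):
        """Return the index of the hole chosen for a k-unit request, or None."""
        candidates = [(i, h) for i, h in enumerate(holes) if h[1] >= k]
        if not candidates:
            return None
        if strategy == "first":                  # first adequate hole
            return candidates[0][0]
        if strategy == "next":                   # resume where we left off
            later = [c for c in candidates if c[0] >= last]
            return (later or candidates)[0][0]
        if strategy == "best":                   # smallest adequate hole
            return min(candidates, key=lambda c: c[1][1])[0]
        if strategy == "worst":                  # largest hole
            return max(candidates, key=lambda c: c[1][1])[0]

    holes = [(0, 5), (20, 12), (50, 8), (90, 30)]
    for s in ("first", "best", "worst"):
        print(s, "->", pick_hole(holes, 8, s))   # first -> 1, best -> 2, worst -> 3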

3)Virtual Memory: the basic idea is that the combined size of the program, data, and stack may exceed the amount of physical memory available for it. The O/S keeps those parts of the program currently in use in MM, and the rest on disk.

Ex: a 16-MB program can run on a 4-MB machine by carefully choosing which 4 MB to keep in memory at each instant, with pieces of the program being swapped between disk and MM as needed.

a)Paging: A technique used by virtual memory.

Ex:MOVE REG, 1000

“Copies contents of memory address 1000 to REG.”

These program-generated addresses are called virtual addresses and form the virtual address space.

i)MMU (Memory Management Unit): a chip that maps the virtual addresses onto the physical memory addresses.

ii)Ex: Fig. 4 – 8. A complete copy of a program’s core image must be present on disk so that pieces can be brought into memory as needed.

iii)Ex: MOVE REG, 0 is transformed by the MMU into a reference to physical address 8192 (virtual page 0 maps onto page frame 2); MOVE REG, 8192 becomes a reference to physical address 24576 (virtual page 2 maps onto page frame 6); and virtual address 20500, which lies 20 bytes into virtual page 5 (20480 – 24575), maps onto physical address 12288 + 20 = 12308.

iv)Pages: The virtual address space is divided up into units.

v)Page Frames: The corresponding units in the physical memory.

vi)Page Fault: a reference to an unmapped page causes a trap to the O/S.

vii)Page Table: The page number is used as an index in the page table.

Ex: MOVE REG, 32780

The MMU notices that the page is unmapped and causes the CPU to trap to the O/S causing a page fault.

The O/S picks a little-used page frame and writes its contents to the disk.

It then fetches the page just referenced into the page frame just freed, changes the map, and restarts the trapped instruction.

If the O/S decides to evict page frame 1, it would load virtual page 8 at physical address 4K and make two changes to the MMU map:

  1. Mark virtual page 1’s entry as unmapped “X”, to trap any future accesses to virtual addresses between 4K and 8K.
  2. Replace the cross “X” in virtual page 8’s entry with a 1, so that when the trapped instruction is re-executed, it will map virtual address 32780 onto physical address 4108 (4096 + 12). (This fault-handling sequence is sketched below.)
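A hedged sketch of this fault-handling sequence (Python; pick_victim, write_to_disk, read_from_disk, and is_dirty are hypothetical helpers standing in for the O/S machinery):

    def handle_page_fault(page_table, faulting_page,
                          pick_victim, write_to_disk, read_from_disk, is_dirty):
        victim = pick_victim(page_table)      # a little-used page
        frame = page_table[victim]
        if is_dirty(victim):                  # write back only if modified
            write_to_disk(victim, frame)
        page_table[victim] = None             # mark the victim unmapped ("X")
        read_from_disk(faulting_page, frame)  # fetch the referenced page
        page_table[faulting_page] = frame     # update the MMU map
        # the trapped instruction is then restarted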

-Why have we chosen a page size that is a power of 2? Because then a virtual address splits cleanly into a page number (the high-order bits) and an offset within the page (the low-order bits), with no arithmetic needed beyond bit extraction.

-Fig 4 – 9 shows an example of the virtual address 8196 (0010000000000100) being mapped using the MMU map of Fig. 4 – 8.

-The incoming 16-bit virtual address is split up into a 4-bit page number and a 12-bit offset.

-With 4 bits for the page number, we can represent 16 pages, and with 12 bits for the offset, we can address all 4096 bytes within a page.

-The page number is used as an index into the page table, yielding the number of the page frame corresponding to the virtual page.

-If the Present/absent bit is 0, a trap to the O/S is caused.

-If it is 1, the page frame number found in the page table is copied to the high-order 3 bits of the output register, along with the 12-bit offset, which is copied unmodified from the incoming virtual address. Together they form a 15-bit physical address.

-The output register is then put onto the memory bus as the physical memory address. (This translation is sketched below.)
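A minimal sketch of this 16-bit translation (Python; the page table contents are made up, except that virtual page 2 maps onto page frame 6 as in the example above):

    PAGE_SIZE = 4096                          # 2**12, so offsets are 12 bits

    def translate(vaddr, page_table):
        page = vaddr >> 12                    # high-order 4 bits: page number
        offset = vaddr & 0xFFF                # low-order 12 bits: offset
        present, frame = page_table[page]
        if not present:
            raise RuntimeError("page fault")  # trap to the O/S
        return (frame << 12) | offset         # 3-bit frame + 12-bit offset

    table = [(False, 0)] * 16                 # 16 pages, all unmapped...
    table[2] = (True, 6)                      # ...except page 2 -> frame 6
    print(translate(8196, table))             # 8196 = page 2, offset 4 -> 24580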

b)Page Tables: Map virtual pages onto page frames.

i)Major Issues:

(1)The page table can be extremely large: modern computers use virtual addresses of at least 32 bits, and with a 4-KB page size a 32-bit address space has 2^20 pages. Each process needs its own page table.

(2)The mapping must be fast. The virtual-to-physical mapping must be done on every memory reference. The need for large, fast page mapping is a significant constraint on the way computers are built.

ii)Multilevel Page Tables: to get around having huge page tables in memory all the time, use a multilevel page table. Here a 32-bit virtual address is partitioned into a 10-bit PT1 field, a 10-bit PT2 field, and a 12-bit offset field. Since offsets are 12 bits, pages are 4 KB, and there are 2^20 of them.

(1)Only page tables that are needed should be kept in MM.

(2)The figure below shows a top-level page table with 1024 entries, corresponding to the 10-bit PT1 field. The MMU extracts the PT1 field and uses this value as an index into the top-level page table.

(3)Each of these 1024 entries represents 4 MB, because the entire 4-GB virtual address space has been chopped into 1024 chunks of 4 MB each.

(4)The entry located by indexing into the top-level page table yields the address or page frame number of a second-level page table.

-Entry 0 of the top-level page table points to the page table for the program text, entry 1 points to the page table for the data, and entry 1023 points to the page table for the stack.

-The shaded entries are not used.

-The PT2 field is now used as an index into the selected second-level page table to find the page frame number for the page itself.

-Ex: Consider the 32-bit virtual address 0x00403004 (4,206,596 decimal), which is 12,292 bytes into the data segment.

-This address corresponds to PT1 = 1, PT2 = 3, and offset = 4.

-The MMU uses PT1 to index into the top-level page table and obtain entry 1, which corresponds to addresses 4 MB – 8 MB.

-It then uses PT2 to index into the second-level page table just found and extract entry 3, which corresponds to addresses 12288 – 16383 within its 4-MB chunk (absolute addresses 4,206,592 – 4,210,687, i.e., 4 MB + 12288 through 4 MB + 16383).

-This entry contains the page frame number of the page containing virtual address 0x00403004.

-If the page is not in memory, Present/absent bit will be 0 causing a page fault.

-If the page is in memory, the page frame number taken from the second-level page table is combined with the offset (4) to construct a physical address.

-The address is put on the bus and sent to memory. (The two-level walk is sketched below.)
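A sketch of the two-level walk for this example (Python; the frame number is made up, but the PT1/PT2/offset split follows the text):

    def split(vaddr):
        pt1 = (vaddr >> 22) & 0x3FF           # top 10 bits
        pt2 = (vaddr >> 12) & 0x3FF           # next 10 bits
        offset = vaddr & 0xFFF                # low 12 bits
        return pt1, pt2, offset

    def walk(top_level, vaddr):
        pt1, pt2, offset = split(vaddr)
        second_level = top_level[pt1]         # find the second-level table
        frame = second_level[pt2]             # find the page frame number
        if frame is None:
            raise RuntimeError("page fault")  # Present/absent bit is 0
        return (frame << 12) | offset

    print(split(0x00403004))                  # -> (1, 3, 4)
    top = {1: {3: 0x2A}}                      # made-up frame number
    print(hex(walk(top, 0x00403004)))         # -> 0x2a004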

The figure below shows a sample page table entry:

-Page frame number: the goal of the page mapping is to locate this value.

-Present/absent bit: if 1, the entry is valid; if it is 0, the virtual page isn’t currently in memory and a page fault is caused.

-Protection bit: tells what kind of access is permitted. The bit is 0 for read/write, 1 for read only.

-Modified & Referenced bits: keep track of page usage. When a page is written to, H/W automatically sets the modified bit “dirty bit.” If the page has been modified “dirty”, it must be written back to disk. If it hasn’t been modified “clean”, it can be abandoned since the disk copy is still valid.

-The Referenced bit is set whenever a page is referenced, either for reading or writing. Its value helps the O/S choose a page to evict when a page fault occurs: pages that are not being used are better candidates than pages that are.

-Caching disabled bit: if the O/S is waiting for an I/O device to respond to a command, it is essential that the H/W fetch the word from the device itself, not an old cached copy. With this bit, caching can be turned off for the page. (An illustrative packing of these fields follows.)
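An illustrative packing of these fields into a single entry (Python; the exact bit positions are an assumption, since real MMUs lay entries out differently):

    FRAME_SHIFT    = 5
    PRESENT_BIT    = 1 << 0   # 1 = entry valid
    READ_ONLY_BIT  = 1 << 1   # 0 = read/write, 1 = read only
    MODIFIED_BIT   = 1 << 2   # the "dirty" bit
    REFERENCED_BIT = 1 << 3   # set on any read or write
    NO_CACHE_BIT   = 1 << 4   # caching disabled

    def make_entry(frame, present=True, read_only=False):
        entry = frame << FRAME_SHIFT
        if present:
            entry |= PRESENT_BIT
        if read_only:
            entry |= READ_ONLY_BIT
        return entry

    e = make_entry(frame=6, read_only=True)
    print(bool(e & PRESENT_BIT), e >> FRAME_SHIFT)   # -> True 6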

c)TLBs-Translation Lookaside Buffers “associative memory”:

i)With paging, additional memory references are needed to access the page table. “Reduces CPU performance.”

ii)Most programs tend to make a large number of references to a small number of pages, and not the other way around. Thus only a small fraction of the page table entries are heavily read; the rest are barely used.

iii)Solution: a hardware device, the TLB, for mapping virtual addresses onto physical addresses without going through the page table. The device is usually inside the MMU and is illustrated in the figure below. It consists of:

(1)A small number of entries, each of which contains information about one page:

(a)The virtual page number

(b)A bit that is set when the page is modified

(c)The protection code (Read/Write/Execute) permissions

(d)The physical page frame in which the page is located

(e)A bit to indicate whether the entry is valid or not.

These fields have a one-to-one correspondence with the fields in the page table.

The functionality of TLB:

  1. When a V.A. is presented to the MMU for translation, the H/W checks to see if the virtual page number is present in the TLB by comparing it to all the entries simultaneously.
  2. If a valid match is found and the access doesn’t violate the protection bits, the page frame is taken directly from the TLB, without going to the page table.
  3. If the virtual page number is present in the TLB but the instruction is trying to write on a read-only page, a protection fault is generated.
  4. If the virtual page number is not in the TLB (a miss), the MMU does an ordinary page table lookup, evicts one entry from the TLB, and replaces it with the page table entry just looked up (see the sketch below).
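A minimal sketch of the lookup (Python; entries are dicts checked in a loop, whereas a real TLB compares all entries in parallel in hardware):

    class ProtectionFault(Exception):
        pass

    def tlb_lookup(tlb, vpage, write=False):
        """Return the page frame on a hit, or None on a miss."""
        for e in tlb:
            if e["valid"] and e["vpage"] == vpage:
                if write and "W" not in e["prot"]:
                    raise ProtectionFault(vpage)   # write to a read-only page
                return e["frame"]                  # hit: no page table access
        return None   # miss: do the page table lookup and replace one entry

    tlb = [{"valid": True, "vpage": 19, "frame": 3, "prot": "RX"}]
    print(tlb_lookup(tlb, 19))        # -> 3 (hit)
    print(tlb_lookup(tlb, 20))        # -> None (miss)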

4)Page Replacement Algorithms:

a)The Optimal Page Replacement Algorithm: “Impossible to implement, Crystal Ball”

i)At the moment that a page fault occurs, some set of pages is in memory. One of these pages will be referenced on the very next instruction.

ii)Other pages may not be referenced until 10, 100, 1000 instructions later.

iii)Each page can be labeled with the number of instructions that will be executed before that page is first referenced.

iv)The optimal page algorithm says that the page with the highest label should be removed.

v)However, at the time of the page fault, the O/S has no way of knowing when each of the pages will be referenced next, so the algorithm is realizable only in simulation, where the full reference string is known in advance (see the sketch below).
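A sketch of the optimal choice, workable only in simulation where the future reference string is known (Python; the names are illustrative):

    def optimal_victim(in_memory, future_refs):
        """Evict the page whose next reference is farthest in the future."""
        def next_use(page):
            try:
                return future_refs.index(page)   # distance to next reference
            except ValueError:
                return float("inf")              # never used again: ideal victim
        return max(in_memory, key=next_use)

    print(optimal_victim({1, 2, 3}, future_refs=[2, 1, 2, 1]))   # -> 3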

b)The Not Recently Used Page Replacement Algorithm:

i)In order to allow O/S to collect useful statistics about which pages are used and which are not, most computers with V.M. have 2 status bits associated with each page:

(1)R is set whenever the page is referenced (r/w)

(2)M is set when the page is written to (Modified)

Note: Bits are updated on every memory reference.

(3)The O/S sets the R & M bits to 0 when a process is started up. Periodically (e.g., on each clock interrupt), the R bit is cleared, to distinguish pages that have not been referenced recently from those that have.

(4)When a page fault occurs, the O/S inspects all the pages and divides them into 4 categories based on the current values of their R and M bits:

(a)Class 0: not referenced, not modified

(b)Class 1: not referenced, modified (this arises when a clock interrupt clears the R bit of a class 3 page)

(c)Class 2: referenced, not modified

(d)Class 3: referenced, modified

The NRU algorithm removes a page at random from the lowest numbered nonempty class.

In this algorithm, the implicit assumption is that it is better to remove a modified page that hasn’t been referenced in at least one clock tick than a clean page that is in heavy use. (A sketch follows.)
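A minimal sketch of NRU (Python; pages is a dict mapping each page to its current (R, M) bits):

    import random

    def nru_victim(pages):
        classes = {0: [], 1: [], 2: [], 3: []}
        for page, (r, m) in pages.items():
            classes[2 * r + m].append(page)      # class number = 2*R + M
        for c in (0, 1, 2, 3):                   # lowest numbered nonempty class
            if classes[c]:
                return random.choice(classes[c])

    print(nru_victim({10: (1, 1), 11: (0, 1), 12: (1, 0)}))   # -> 11 (class 1)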

c)FIFO Page Replacement Algorithm:

i)The O/S maintains a list of all pages currently in memory, with the page at the head of the list the oldest one and the page at the tail the most recent arrival.

ii)On a page fault, the page at the head is removed and the new page added to the tail of the list.

iii)FIFO in its pure form is rarely used, because it may throw out heavily used pages. (A minimal sketch follows.)
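A minimal sketch of pure FIFO (Python, using a deque as the page list):

    from collections import deque

    def fifo_replace(pages, new_page):
        victim = pages.popleft()      # head of the list: the oldest page
        pages.append(new_page)        # the new page goes on the tail
        return victim

    pages = deque(["A", "B", "C"])
    print(fifo_replace(pages, "D"))   # -> "A"; the list is now B, C, D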

d)The Second Chance Page Replacement Algorithm:

i)A simple modification to FIFO that avoids the problem of throwing out a heavily used page is to inspect the R bit of the oldest page.

ii)Reference Bit = 0: the page is old and unused, so it is replaced immediately.

= 1: the bit is cleared, the page is put onto the end of the list, and its load time is updated as though it had just arrived in MM. Then the search continues.

iii)Figure 4-13 shows how the second chance Algorithm works.

-If A has the R bit cleared, it is evicted from memory, whether “dirty or clean.”

-If A has the R bit set, A is put at the end of the list and its load time is reset to the current time (20). Its R bit is cleared and the search for a suitable page continues with B. (See the sketch below.)
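A minimal sketch of second chance (Python; pages are (name, R-bit) pairs kept in FIFO order in a deque):

    from collections import deque

    def second_chance(pages):
        """Return the evicted page; referenced pages get a second chance."""
        while True:
            name, r_bit = pages.popleft()    # inspect the oldest page
            if r_bit == 0:
                return name                  # old and unused: evict it
            pages.append((name, 0))          # clear R and move to the tail,
                                             # as though it had just arrived

    pages = deque([("A", 1), ("B", 0), ("C", 1)])
    print(second_chance(pages))              # A gets a second chance; B is evicted

If every page has its R bit set, the algorithm clears each bit in turn and eventually evicts the original head, degenerating to pure FIFO.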