 
ECE232: Hardware Organization and Design
Lecture 28: More Virtual Memory
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
Overview
• Virtual memory is used to protect applications from each other
• Portions of an application are located both in main memory and on disk
• Need to speed up access for virtual memory
• Idea: use a small cache to store translations for frequently used pages
ECE232: More Virtual Memory 2
How to Translate Fast?
• Problem: virtual memory requires two memory accesses!
  • one to translate the virtual address into a physical address (page table lookup); the page table is in physical memory
  • one to transfer the actual data (hopefully a cache hit)
  • VM hierarchy only or cache-memory-disk hierarchy
• Why not create a cache of virtual-to-physical address translations to make translation fast? (smaller is faster)
• For historical reasons, such a "page table cache" is called a Translation Lookaside Buffer, or TLB
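The cost of the extra page table access can be illustrated with a minimal sketch. Everything named here (`TinyTLB`, `count_memory_accesses`, the 4 KB `PAGE_SIZE`, the dictionary page table) is a hypothetical stand-in chosen for illustration, not something defined in the lecture. The sketch counts memory accesses for a stream of virtual addresses with and without a small translation cache.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative, not from this slide)


class TinyTLB:
    """Hypothetical fully associative translation cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> physical page number

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, ppn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[vpn] = ppn


def count_memory_accesses(addresses, page_table, tlb=None):
    """Count memory accesses: one per data reference, plus one page table
    lookup whenever the translation is not found in the TLB."""
    accesses = 0
    for va in addresses:
        vpn = va // PAGE_SIZE
        ppn = tlb.lookup(vpn) if tlb is not None else None
        if ppn is None:
            accesses += 1  # page table lookup in physical memory
            ppn = page_table[vpn]
            if tlb is not None:
                tlb.insert(vpn, ppn)
        accesses += 1  # the data access itself
    return accesses


page_table = {0: 5, 1: 9}
addresses = [0, 4, 8, 4096, 12]
print(count_memory_accesses(addresses, page_table))              # no TLB
print(count_memory_accesses(addresses, page_table, TinyTLB(4)))  # with TLB
```

Without the translation cache every reference pays for two accesses; with it, only the first touch of each page pays for the page table lookup. The sketch ignores page faults and the data cache.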
Translation-Lookaside Buffer (TLB)
[Figure: physical pages 0 through N-1 in main memory. Source: H. Stone, "High Performance Computer Architecture," AW 1993]
TLB and Page Table
Translation Look-Aside Buffers
• TLB is usually small, typically 32-512 entries
• Like any other cache, the TLB can be fully associative, set associative, or direct mapped
[Figure: the processor sends a virtual address to the TLB. On a TLB hit the physical address goes to the cache, and a cache miss goes to main memory. On a TLB miss the page table in memory is consulted. A page fault or protection violation goes to the OS fault handler, which uses the disk.]
Steps in Memory Access - Example
[Figure: memory access path from the CPU through the TLB (virtual to physical address) and the cache to main memory, with the page table, OS fault handler, and disk on the miss paths.]
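The steps in the figure can be sketched as a single function. This is only an illustration under stated assumptions: the TLB, page table, cache, and memory are plain dictionaries, `fault_handler` is a hypothetical callback standing in for the OS fault handler and disk, the page size is assumed to be 4 KB, and protection checks are left out.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def access(va, tlb, page_table, cache, memory, fault_handler):
    """Follow one memory access and return (data, list of events).

    tlb, page_table: dict mapping virtual page number -> physical page number
    cache, memory:   dict mapping physical address -> data
    fault_handler:   hypothetical callback; given a virtual page number it
                     returns the physical page the OS loaded the page into
    """
    events = []
    vpn, offset = divmod(va, PAGE_SIZE)

    ppn = tlb.get(vpn)
    if ppn is not None:
        events.append("TLB hit")
    else:
        events.append("TLB miss")
        ppn = page_table.get(vpn)       # page table lookup in memory
        if ppn is None:
            events.append("page fault")
            ppn = fault_handler(vpn)    # OS brings the page in from disk
            page_table[vpn] = ppn
        tlb[vpn] = ppn                  # remember the translation

    pa = ppn * PAGE_SIZE + offset       # physical address
    if pa in cache:
        events.append("cache hit")
    else:
        events.append("cache miss")
        cache[pa] = memory.get(pa, 0)   # fill the cache from main memory
    return cache[pa], events


tlb, cache = {}, {}
page_table = {0: 2}
memory = {2 * PAGE_SIZE + 8: 42}
print(access(8, tlb, page_table, cache, memory, lambda vpn: 7))
print(access(8, tlb, page_table, cache, memory, lambda vpn: 7))
```

The first call misses in both the TLB and the cache; the repeated call hits in both, which is the common case the TLB is meant to make fast.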
DECStation 3100 / MIPS R2000
• Virtual address (bits 31-0): virtual page number (20 bits), page offset (12 bits)
• TLB: 64 entries, fully associative; each entry holds Valid, Dirty, Tag, and Physical page number (20 bits)
• Physical address: physical page number (20 bits), page offset (12 bits)
• The cache splits the physical address into: physical address tag (16 bits), cache index (14 bits), byte offset (2 bits)
• Cache: 16K entries, direct mapped; each entry holds Valid, Tag, and Data (32 bits)
• Outputs: TLB hit, cache hit, data
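The field widths on this slide can be checked with a short sketch. The bit widths (20/12 for the virtual address, 16/14/2 for the cache's view of the physical address) come from the slide; the function names and the sample addresses are hypothetical.

```python
PAGE_OFFSET_BITS = 12   # page offset width from the slide
CACHE_INDEX_BITS = 14   # 16K cache entries -> 14 index bits
BYTE_OFFSET_BITS = 2    # 4-byte (32-bit) data word


def split_virtual(va):
    """Split a 32-bit virtual address into (virtual page number, page offset)."""
    vpn = (va >> PAGE_OFFSET_BITS) & 0xFFFFF            # upper 20 bits
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)         # lower 12 bits
    return vpn, offset


def make_physical(ppn, offset):
    """Concatenate a 20-bit physical page number with the 12-bit page offset."""
    return (ppn << PAGE_OFFSET_BITS) | offset


def split_physical(pa):
    """Split a 32-bit physical address into (tag, cache index, byte offset)."""
    byte = pa & ((1 << BYTE_OFFSET_BITS) - 1)
    index = (pa >> BYTE_OFFSET_BITS) & ((1 << CACHE_INDEX_BITS) - 1)
    tag = pa >> (BYTE_OFFSET_BITS + CACHE_INDEX_BITS)   # remaining 16 bits
    return tag, index, byte


vpn, offset = split_virtual(0x12345678)
pa = make_physical(0xABCDE, offset)   # 0xABCDE is a made-up TLB result
print(hex(vpn), hex(offset), hex(pa), [hex(f) for f in split_physical(pa)])
```

Note that the page offset passes through translation unchanged; only the upper 20 bits are replaced by the TLB.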
Real Stuff: Pentium Pro Memory Hierarchy
• Address size: 32 bits (VA, PA)
• VM page size: 4 KB
• TLB organization: separate i,d TLBs (i-TLB: 32 entries, d-TLB: 64 entries)
  • 4-way set associative
  • LRU approximated
  • hardware handles miss
• L1 cache: 8 KB, separate i,d
  • 4-way set associative
  • LRU approximated
  • 32-byte block
  • write back
• L2 cache: 256 or 512 KB
Intel "Nehalem" quad-core processor
• 13.5 × 19.6 mm die; 731 million transistors; two 128-bit memory channels
• Each core has private 32-KB instruction and 32-KB data caches and a 256-KB L2 cache
• The four cores share an 8-MB L3 cache
• Each core also has a two-level TLB
Comparing Intel's Nehalem to AMD's Opteron

                     Intel Nehalem                   AMD Opteron X4
Virtual addr         48 bits                         48 bits
Physical addr        44 bits                         48 bits
Page size            4KB, 2/4MB                      4KB, 2/4MB
L1 TLB (per core)    L1 I-TLB: 128 entries           L1 I-TLB: 48 entries
                     L1 D-TLB: 64 entries            L1 D-TLB: 48 entries
                     Both 4-way, LRU replacement     Both fully associative, LRU replacement
L2 TLB (per core)    Single L2 TLB: 512 entries      L2 I-TLB: 512 entries
                     4-way, LRU replacement          L2 D-TLB: 512 entries
                                                     Both 4-way, round-robin LRU
TLB misses           Handled in hardware             Handled in hardware
Further Comparison

L1 caches (per core)
• Intel Nehalem
  • L1 I-cache: 32KB, 64-byte blocks, 4-way, approx LRU, hit time n/a
  • L1 D-cache: 32KB, 64-byte blocks, 8-way, approx LRU, write-back/allocate, hit time n/a
• AMD Opteron X4
  • L1 I-cache: 32KB, 64-byte blocks, 2-way, LRU, hit time 3 cycles
  • L1 D-cache: 32KB, 64-byte blocks, 2-way, LRU, write-back/allocate, hit time 9 cycles

L2 unified cache (per core)
• Intel Nehalem: 256KB, 64-byte blocks, 8-way, approx LRU, write-back/allocate, hit time n/a
• AMD Opteron X4: 512KB, 64-byte blocks, 16-way, approx LRU, write-back/allocate, hit time n/a

L3 unified cache (shared)
• Intel Nehalem: 8MB, 64-byte blocks, 16-way, write-back/allocate, hit time n/a
• AMD Opteron X4: 2MB, 64-byte blocks, 32-way, write-back/allocate, hit time 32 cycles
Summary
• Virtual memory allows the appearance of a main memory that is larger than what is physically present
• Virtual memory can be shared by multiple applications
• Page table indicates how to translate from virtual to physical address
• TLB speeds up access to virtual memory
  • Generally set associative or fully associative
  • Much smaller than main memory
• Next time: Putting it all together (cache, TLB, virtual memory)