ECE232: Hardware Organization and Design, Lecture 28: More Virtual Memory (PowerPoint presentation)


SLIDE 1

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

ECE232: Hardware Organization and Design

Lecture 28: More Virtual Memory

SLIDE 2

ECE232: More Virtual Memory 2

Overview

  • Virtual memory is used to protect applications from each other
  • Portions of an application are located both in main memory and on disk
  • Need to speed up access for virtual memory
  • Idea: use a small cache to store translations for frequently used pages

SLIDE 3

How to Translate Fast?

  • Problem: virtual memory requires two memory accesses!
    • one to translate the virtual address into a physical address (page table lookup); the page table itself resides in physical memory
    • one to transfer the actual data (hopefully a cache hit)
  • VM hierarchy only, or a cache-memory-disk hierarchy
  • Why not create a cache of virtual-to-physical address translations to make translation fast? (smaller is faster)
  • For historical reasons, such a “page table cache” is called a Translation Lookaside Buffer, or TLB
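As a rough illustration of why the second memory access matters, here is a back-of-the-envelope effective-access-time calculation. The latencies and hit rate below are illustrative assumptions, not figures from the slides:

```python
# Illustrative effective-access-time calculation (all numbers are assumptions).
MEM_LATENCY = 100    # cycles per access to physical memory (assumed)
TLB_LATENCY = 1      # cycles for a TLB lookup (assumed)
TLB_HIT_RATE = 0.99  # TLBs typically hit the vast majority of accesses

# Without a TLB: every access first reads the page table in memory,
# then reads the data itself -> two memory accesses.
no_tlb = 2 * MEM_LATENCY

# With a TLB: a hit replaces the page-table read with a fast lookup;
# a miss still pays for the page-table read plus the data access.
with_tlb = (TLB_LATENCY
            + TLB_HIT_RATE * MEM_LATENCY
            + (1 - TLB_HIT_RATE) * 2 * MEM_LATENCY)

print(f"without TLB: {no_tlb} cycles per access")
print(f"with TLB:    {with_tlb:.0f} cycles per access")
```

Even with these crude numbers, the TLB roughly halves the average access cost because the translation almost never touches memory.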

SLIDE 4

Translation-Lookaside Buffer (TLB)

[Diagram: TLB entries map virtual page numbers to physical pages 0 through N-1 in main memory]

  • H. Stone, “High Performance Computer Architecture,” AW 1993
SLIDE 5

TLB and Page Table

SLIDE 6

Translation Look-Aside Buffers

  • TLB is usually small, typically 32-512 entries
  • Like any other cache, the TLB can be fully associative, set associative, or direct mapped

[Diagram: the processor sends a virtual address to the TLB; a hit yields the physical address for the cache, a miss consults the page table; a page fault or protection violation invokes the OS fault handler; data is returned from the cache, main memory, or disk]
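The three organizations differ in how many TLB entries a given virtual page number (VPN) may occupy. A minimal sketch, assuming a hypothetical 64-entry TLB:

```python
# Which TLB entries a VPN may occupy under each organization,
# for a hypothetical 64-entry TLB (sizes are assumptions for illustration).
TLB_ENTRIES = 64

def direct_mapped_slot(vpn):
    # Exactly one candidate entry, selected by the low VPN bits.
    return vpn % TLB_ENTRIES

def set_associative_slots(vpn, ways=4):
    # The VPN selects one set; any of the `ways` entries in it may be used.
    sets = TLB_ENTRIES // ways
    first = (vpn % sets) * ways
    return list(range(first, first + ways))

def fully_associative_slots(vpn):
    # Any entry may hold the translation; hardware compares all tags in parallel.
    return list(range(TLB_ENTRIES))

print(direct_mapped_slot(0x12345))            # one candidate entry
print(set_associative_slots(0x12345))         # four candidate entries
print(len(fully_associative_slots(0x12345)))  # all 64 entries
```

More associativity reduces conflicts between pages that share index bits, at the cost of comparing more tags per lookup.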

SLIDE 7

Steps in Memory Access - Example

[Diagram: steps in a memory access]

  • The CPU issues a virtual address to the TLB
  • TLB hit: the physical address goes to the cache; TLB miss: the page table in main memory supplies the translation
  • Page fault: the OS fault handler brings the page in from disk
  • Cache hit: data is returned; cache miss: the block is fetched from main memory
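The flow above can be sketched as a tiny software model; the structures and names below are illustrative, not from the slides:

```python
# Simplified model of the memory-access flow: TLB -> page table -> cache/memory.
# All structures and names are illustrative, not from the slides.
PAGE_SIZE = 4096

tlb = {}         # virtual page number -> physical page number (small, fast)
page_table = {}  # full mapping, lives in main memory; absent = page on disk

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                       # TLB hit: no page-table access needed
        ppn = tlb[vpn]
    elif page_table.get(vpn) is not None:
        ppn = page_table[vpn]            # TLB miss: walk the page table in memory
        tlb[vpn] = ppn                   # fill the TLB for next time
    else:
        # Page fault: the OS would fetch the page from disk and update the
        # page table; here we just signal it.
        raise RuntimeError("page fault: OS fault handler runs")
    return ppn * PAGE_SIZE + offset      # physical address, sent to the cache

page_table[5] = 42
print(hex(translate(5 * PAGE_SIZE + 0x10)))  # TLB miss, then entry is filled
print(hex(translate(5 * PAGE_SIZE + 0x20)))  # now a TLB hit
```

Note that only the offset passes through unchanged; the page number is what gets translated.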

SLIDE 8

DECStation 3100 / MIPS R2000 address translation and cache access:

  • Virtual address (32 bits): 20-bit virtual page number (bits 31-12) + 12-bit page offset (bits 11-0)
  • TLB: 64 entries, fully associative; each entry holds valid and dirty bits, a 20-bit tag (virtual page number), and a 20-bit physical page number
  • Physical address: 20-bit physical page number + 12-bit page offset
  • Cache: 16K entries, direct mapped; the physical address splits into a 16-bit tag, a 14-bit cache index, and a 2-bit byte offset; a valid bit and tag comparison signal a cache hit, delivering 32-bit data
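The field widths above can be checked with a little bit-slicing; the example addresses below are arbitrary, chosen only to exercise each field:

```python
# Splitting addresses into the DECStation 3100 / MIPS R2000 fields described
# above: 32-bit addresses, 4 KB pages, a 16K-entry direct-mapped cache of
# 4-byte words. Example addresses are arbitrary.

def va_fields(va):
    return {"vpn": va >> 12,             # 20-bit virtual page number (bits 31-12)
            "page_offset": va & 0xFFF}   # 12-bit page offset (bits 11-0)

def pa_fields(pa):
    return {"tag": pa >> 16,               # 16-bit cache tag
            "index": (pa >> 2) & 0x3FFF,   # 14-bit cache index (16K entries)
            "byte_offset": pa & 0x3}       # 2-bit byte offset (4-byte words)

va = 0xDEADBEEF
print(va_fields(va))

# A physical address is a 20-bit physical page number plus the unchanged offset:
ppn = 0x12345
pa = (ppn << 12) | (va & 0xFFF)
print(pa_fields(pa))
```

The widths add up in both views: 20 + 12 = 32 for translation, and 16 + 14 + 2 = 32 for the cache lookup.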

SLIDE 9

Real Stuff: Pentium Pro Memory Hierarchy

  • Address size: 32 bits (VA, PA)
  • VM page size: 4 KB
  • TLB organization: separate i- and d-TLBs (i-TLB: 32 entries, d-TLB: 64 entries); 4-way set associative; LRU approximated; hardware handles misses
  • L1 cache: 8 KB, separate i,d; 4-way set associative; LRU approximated; 32-byte block; write back
  • L2 cache: 256 or 512 KB
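One consequence of these entry counts is the TLB “reach”: how much memory the translations resident in the TLB cover at once. The calculation below uses the entry counts and 4 KB page size from this slide; the notion of reach itself is standard background, not slide material:

```python
# TLB "reach" for the Pentium Pro TLBs: entries * page size.
PAGE_SIZE = 4 * 1024          # 4 KB pages (from the slide)

i_tlb_reach = 32 * PAGE_SIZE  # 32-entry i-TLB
d_tlb_reach = 64 * PAGE_SIZE  # 64-entry d-TLB

print(f"i-TLB reach: {i_tlb_reach // 1024} KB")  # 128 KB
print(f"d-TLB reach: {d_tlb_reach // 1024} KB")  # 256 KB
```

A working set larger than the reach forces TLB misses even when the pages are all resident in memory, which is one motivation for the larger 2/4 MB pages on the later processors below.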

SLIDE 10

Intel “Nehalem” quad-core processor

  • Each core has private 32-KB instruction and 32-KB data caches and a 512-KB L2 cache; the four cores share an 8-MB L3 cache
  • Each core also has a two-level TLB
  • 13.5 × 19.6 mm die; 731 million transistors; two 128-bit memory channels

SLIDE 11

Comparing Intel’s Nehalem to AMD’s Opteron X4:

  • Virtual address: 48 bits (both)
  • Physical address: Nehalem 44 bits; Opteron 48 bits
  • Page sizes: 4KB and 2/4MB (both)
  • L1 TLB (per core): Nehalem has a 128-entry L1 I-TLB and a 64-entry L1 D-TLB, both 4-way with LRU replacement; Opteron has 48-entry L1 I- and D-TLBs, both fully associative with LRU replacement
  • L2 TLB (per core): Nehalem has a single 512-entry L2 TLB, 4-way with LRU replacement; Opteron has 512-entry L2 I- and D-TLBs, both 4-way with round-robin LRU
  • TLB misses: handled in hardware (both)

SLIDE 12

Further Comparison

  • L1 caches (per core):
    • Nehalem: L1 I-cache 32KB, 64-byte blocks, 4-way, approx LRU, hit time n/a; L1 D-cache 32KB, 64-byte blocks, 8-way, approx LRU, write-back/allocate, hit time n/a
    • Opteron X4: L1 I-cache 32KB, 64-byte blocks, 2-way, LRU, hit time 3 cycles; L1 D-cache 32KB, 64-byte blocks, 2-way, LRU, write-back/allocate, hit time 9 cycles
  • L2 unified cache (per core): Nehalem 256KB, 64-byte blocks, 8-way, approx LRU, write-back/allocate, hit time n/a; Opteron 512KB, 64-byte blocks, 16-way, approx LRU, write-back/allocate, hit time n/a
  • L3 unified cache (shared): Nehalem 8MB, 64-byte blocks, 16-way, write-back/allocate, hit time n/a; Opteron 2MB, 64-byte blocks, 32-way, write-back/allocate, hit time 32 cycles

SLIDE 13

Summary

  • Virtual memory allows the appearance of a main memory that is larger than what is physically present
  • Virtual memory can be shared by multiple applications
  • The page table indicates how to translate from virtual to physical addresses
  • The TLB speeds up access to virtual memory
    • generally set associative or fully associative
    • much smaller than main memory
  • Next time: putting it all together (cache, TLB, virtual memory)