  1. Memory System Design: Virtual Memory
     Virendra Singh, Associate Professor
     Computer Architecture and Dependable Systems Lab
     Department of Electrical Engineering
     Indian Institute of Technology Bombay
     http://www.ee.iitb.ac.in/~viren/
     E-mail: viren@ee.iitb.ac.in
     EE-739: Processor Design, Lecture 20 (04 Mar 2013), CADSL

  2. Performance: Miss
     • Miss rate – fraction of cache accesses that result in a miss
     • Causes of misses:
       – Compulsory: first reference to a block
       – Capacity: blocks discarded and later retrieved
       – Conflict: the program makes repeated references to multiple addresses from different blocks that map to the same location in the cache
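The miss-rate definition and conflict misses can be made concrete with a toy direct-mapped cache model. The parameters (4 sets, 16-byte blocks) and the access pattern are illustrative assumptions, not from the slides:

```python
# Minimal direct-mapped cache model: counts accesses and misses, and
# demonstrates conflict misses between two blocks mapping to the same set.

class DirectMappedCache:
    def __init__(self, num_sets=4, block_size=16):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [None] * num_sets   # one block per set (direct-mapped)
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        self.accesses += 1
        block = addr // self.block_size
        index = block % self.num_sets   # which set the block maps to
        tag = block // self.num_sets
        if self.tags[index] != tag:     # miss: fill the block
            self.misses += 1
            self.tags[index] = tag

    def miss_rate(self):
        return self.misses / self.accesses

cache = DirectMappedCache()
# Blocks 0 and 4 both map to set 0, so they keep evicting each other:
for _ in range(4):
    cache.access(0x00)   # block 0
    cache.access(0x40)   # block 4, conflicts with block 0
print(cache.miss_rate())  # 1.0 -- every access is a conflict miss
```

With a 2-way set-associative cache both blocks would fit, which is exactly the "higher associativity reduces conflict misses" point of the next slides.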

  3. Memory Optimization
     • Reducing miss rate – larger block size, larger cache size, higher associativity
     • Reducing miss penalty – multi-level caches, read priority over write
     • Reducing time to hit in the cache – avoid address translation when indexing caches

  4. Memory Hierarchy Basics
     • Six basic cache optimizations (1–3):
       – Larger block size: reduces compulsory misses; increases capacity and conflict misses, increases miss penalty
       – Larger total cache capacity to reduce miss rate: increases hit time, increases power consumption
       – Higher associativity: reduces conflict misses; increases hit time, increases power consumption

  5. Memory Hierarchy Basics
     • Six basic cache optimizations (4–6):
       – Higher number of cache levels: reduces overall memory access time
       – Giving priority to read misses over writes: reduces miss penalty
       – Avoiding address translation in cache indexing: reduces hit time

  6. Summary
     • Memory technology
     • Memory hierarchy – temporal and spatial locality
     • Caches – placement, identification, replacement, write policy
     • Pipeline integration of caches

  7. Memory Hierarchy
     • Temporal locality: keep recently referenced items at higher levels; future references satisfied quickly
     • Spatial locality: bring neighbors of recently referenced items to higher levels; future references satisfied quickly
     • Hierarchy (diagram): CPU → I & D L1 caches → shared L2 cache → main memory → disk

  8. Four Burning Issues
     • Placement – where can a block of memory go?
     • Identification – how do I find a block of memory?
     • Replacement – how do I make space for new blocks?
     • Write policy – how do I propagate changes?
     • Consider these for registers and main memory (main memory is usually DRAM)

  9. Placement

     Memory type    Placement                 Comments
     Registers      Anywhere; int, FP, SPR    Compiler/programmer manages
     Cache (SRAM)   Fixed in H/W              Direct-mapped, set-associative, fully-associative
     DRAM           Anywhere                  O/S manages
     Disk           Anywhere                  O/S manages

  10. Register File
      • Registers managed by programmer/compiler
        – Assign variables, temporaries to registers
        – Limited name space matches available storage

      Placement        Flexible (subject to data type)
      Identification   Implicit (name == location)
      Replacement      Spill code (store to stack frame)
      Write policy     Write-back (store on replacement)

  11. Main Memory and Virtual Memory
      • Use of virtual memory
        – Main memory becomes another level in the memory hierarchy
        – Enables programs with an address space or working set that exceeds physically available memory
          • No need for the programmer to manage overlays, etc.
          • Sparse use of a large address space is OK
        – Allows multiple users or programs to timeshare a limited amount of physical memory space and address space
      • Bottom line: efficient use of an expensive resource, and ease of programming

  12. Virtual Memory
      • Enables:
        – Using more memory than the system physically has
        – Each program can think it is the only one running
          • No need to manage address space usage across programs
          • E.g., each can think it always starts at address 0x0
        – Memory protection: each program has a private VA space, so no one else can clobber it
        – Better performance: start running a large program before all of it has been loaded from disk

  13. Virtual Memory – Placement
      • Main memory managed in larger blocks
        – Page size typically 4K – 16K
      • Fully flexible placement; fully associative
        – Operating system manages placement
        – Indirection through a page table
        – Maintains the mapping between:
          • Virtual address (seen by the programmer)
          • Physical address (seen by main memory)
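The indirection can be sketched in a few lines. Here the page table is just a dict from virtual page number (VPN) to physical page number (PPN); the 4 KB page size and the specific VPN/PPN values are assumptions chosen to match the example addresses on the later slides:

```python
# Sketch of virtual-to-physical address indirection through a page table.
# Assumed: 4 KB pages; the page table is a dict VPN -> PPN (a real table
# is an in-memory array managed by the O/S).

PAGE_SIZE = 4096  # 4 KB, within the 4K-16K range quoted on the slide

page_table = {0x20004: 0x2000}  # VPN 0x20004 maps to PPN 0x2000

def translate(va):
    vpn, offset = va // PAGE_SIZE, va % PAGE_SIZE
    ppn = page_table[vpn]            # a missing key would be a page fault
    return ppn * PAGE_SIZE + offset  # offset is unchanged by translation

print(hex(translate(0x20004123)))    # -> 0x2000123
```

Because placement is fully associative, any VPN can map to any PPN; the table, not the address bits, decides where a page lives.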

  14. Virtual Memory – Placement
      • Fully associative implies expensive lookup?
        – In caches, yes: check multiple tags in parallel
      • In virtual memory, the expensive lookup is avoided by using a level of indirection
        – Lookup table or hash table
        – Called a page table

  15. Virtual Memory – Identification
      Example page table entry:

      Virtual Address   Physical Address   Dirty bit
      0x20004000        0x2000             Y/N

      • Similar to a cache tag array
        – Page table entry contains VA, PA, dirty bit
      • Virtual address:
        – Matches the programmer's view; based on register values
        – Can be the same for multiple programs sharing the same system, without conflicts
      • Physical address:
        – Invisible to the programmer, managed by the O/S
        – Created/deleted on a demand basis, can change

  16. Virtual Memory – Replacement
      • Similar to caches:
        – FIFO
        – LRU; overhead too high
          • Approximated with reference-bit checks
          • Clock algorithm
        – Random
      • O/S decides, manages
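The clock (second-chance) approximation of LRU mentioned above can be sketched as follows. The frame count and the page names are illustrative; a real kernel keeps the reference bits in the PTEs, set by hardware:

```python
# Clock (second-chance) replacement sketch: each frame has a reference
# bit; on a fault, the hand clears set bits until it finds a frame whose
# bit is already 0, and evicts that frame.

class Clock:
    def __init__(self, nframes):
        self.pages = [None] * nframes
        self.ref = [0] * nframes
        self.hand = 0

    def touch(self, page):
        if page in self.pages:             # hit: set the reference bit
            self.ref[self.pages.index(page)] = 1
            return False                   # no fault
        while self.ref[self.hand]:         # referenced -> second chance
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page       # evict victim, install new page
        self.ref[self.hand] = 1
        self.hand = (self.hand + 1) % len(self.pages)
        return True                        # fault

c = Clock(2)
faults = sum(c.touch(p) for p in ["A", "B", "A", "C", "A"])
print(faults)  # 4
```

Note the cost profile: hits only set a bit (cheap, done by hardware in practice), and the O/S pays the scan cost only on a fault, which is why full LRU bookkeeping on every reference is considered too expensive here.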

  17. Virtual Memory – Write Policy
      • Write back
        – Disks are too slow to write through
      • Page table maintains a dirty bit
        – Hardware must set the dirty bit on the first write
        – O/S checks the dirty bit on eviction
        – Dirty pages are written to the backing store
          • Disk write, 10+ ms
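The dirty-bit protocol above can be sketched with a tiny frame table. The frame numbers, VPNs, and the `disk_writes` list are illustrative stand-ins for real hardware/O/S state:

```python
# Write-back eviction sketch: the dirty bit decides whether an evicted
# page must be written to the backing store or can simply be dropped.

frames = {0: {"vpn": 0x10, "dirty": False},
          1: {"vpn": 0x20, "dirty": False}}
disk_writes = []  # stands in for 10+ ms disk writes

def store(ppn):
    frames[ppn]["dirty"] = True          # hardware sets dirty on first write

def evict(ppn):
    page = frames.pop(ppn)
    if page["dirty"]:                    # O/S checks dirty bit on eviction
        disk_writes.append(page["vpn"])  # write back only if modified

store(1)
evict(0)            # clean page: dropped, no disk write
evict(1)            # dirty page: must be written back
print(disk_writes)  # [0x20] -- only the modified page hit the disk
```

This is exactly why write-through is a non-starter for VM: every store would cost a disk access instead of only the (rare) eviction of a dirty page.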

  18. Virtual Memory Implementation
      • Caches have fixed policies, a hardware FSM for control, and pipeline stalls on a miss
      • VM has very different miss penalties
        – Remember, disks are 10+ ms!
      • Hence it is engineered differently

  19. Page Faults
      • A virtual memory miss is a page fault
        – The physical memory location does not exist
        – An exception is raised; the PC is saved
        – The OS page fault handler is invoked:
          • Find a physical page (possibly evicting one)
          • Initiate the fetch from disk
        – Switch to another task that is ready to run
        – Interrupt when the disk access completes
        – Restart the original instruction

  20. Address Translation
      Example page table entry:

      VA           PA       Dirty   Ref   Protection
      0x20004000   0x2000   Y/N     Y/N   Read/Write/Execute

      • O/S and hardware communicate via the PTE
      • How do we find a PTE?
        – &PTE = PTBR + page number * sizeof(PTE)
        – PTBR is private for each program
          • A context switch replaces the PTBR contents
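The slide's formula &PTE = PTBR + page number * sizeof(PTE) is plain arithmetic; here it is worked through with assumed values (4 KB pages, 4-byte PTEs, and a made-up PTBR of 0x100000):

```python
# Computing the address of a PTE in a flat page table, per the slide's
# formula. Page size, PTE size, and the PTBR value are assumptions.

PAGE_SIZE = 4096   # 4 KB pages
PTE_SIZE = 4       # 4-byte PTEs, as used on the page-table-size slide

def pte_addr(ptbr, va):
    vpn = va // PAGE_SIZE          # "page number" in the slide's formula
    return ptbr + vpn * PTE_SIZE   # &PTE = PTBR + VPN * sizeof(PTE)

print(hex(pte_addr(0x100000, 0x20004000)))  # -> 0x180010
```

Because the PTBR is part of each program's context, the same `pte_addr` arithmetic lands in a different table after a context switch, which is how two programs can use identical virtual addresses without conflict.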

  21. Address Translation
      [Diagram: the virtual page number indexes the page table via the PTBR; the selected PTE supplies the PA (and D bit), which is combined with the page offset to form the physical address.]

  22. Page Table Size
      • How big is the page table?
        – 2^32 / 4KB * 4B = 4MB per program (!)
        – Much worse for 64-bit machines
      • To make it smaller:
        – Use limit register(s)
          • If a VA exceeds the limit, invoke the O/S to grow the region
        – Use a multi-level page table
        – Make the page table pageable (use VM)
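The 4 MB figure, and why 64-bit machines are "much worse", follows directly from the slide's parameters (4 KB pages, 4-byte PTEs); the 48-bit case below is an assumed illustration of a typical 64-bit implementation's usable VA width:

```python
# Size of a flat page table: one PTE per virtual page.
# Parameters from the slide: 4 KB pages, 4-byte PTEs.

def flat_pt_bytes(va_bits, page_size=4096, pte_size=4):
    num_pages = 2**va_bits // page_size   # 2^32 / 4K = 1M pages for 32-bit
    return num_pages * pte_size

print(flat_pt_bytes(32) // 2**20)   # 4   (MB per program, as on the slide)
print(flat_pt_bytes(48) // 2**30)   # 256 (GB -- hence multi-level tables)
```

The table must exist per program, so even the 32-bit 4 MB cost multiplies across processes; the 64-bit numbers make flat tables outright impossible, motivating the multilevel and hashed schemes on the next slides.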

  23. Multilevel Page Table
      [Diagram: the VA is split into several index fields plus an offset; starting from the PTBR, each index selects an entry at one level, which points to the next-level table, until the last level yields the physical page.]
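A two-level walk for 32-bit VAs can be sketched as below. The 10/10/12 bit split and the table contents are assumptions (this is the classic x86-style split, not something the slide specifies):

```python
# Two-level page-table walk sketch: 10-bit root index, 10-bit leaf index,
# 12-bit offset (assumed split). Only tables covering mapped regions need
# to be allocated -- the point of going multilevel.

PAGE_SIZE = 4096

def walk(root, va):
    i1 = (va >> 22) & 0x3FF    # top 10 bits index the root table
    i2 = (va >> 12) & 0x3FF    # next 10 bits index the leaf table
    offset = va & 0xFFF
    leaf = root[i1]            # a missing entry would mean an unmapped region
    ppn = leaf[i2]
    return ppn * PAGE_SIZE + offset

# Sparse address space: one leaf table suffices for this mapping.
leaf = {4: 0x2000}             # leaf index 4 -> PPN 0x2000
root = {0x080: leaf}           # root index 0x080 -> that leaf table

print(hex(walk(root, 0x20004123)))  # -> 0x2000123
```

A flat table for this address space would need 2^20 entries; here two small tables cover the single mapped page, trading table space for an extra memory reference per walk.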

  24. Hashed Page Table
      • Use a hash table or inverted page table
        – The PT contains an entry for each real (physical) address, instead of an entry for every virtual address
        – An entry is found by hashing the VA
        – Oversize the PT to reduce collisions: #PTE = 4 x (# physical pages)
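A lookup in such a table might look like the sketch below. The physical page count, the use of Python's built-in `hash`, and chaining for collisions are all illustrative assumptions; the 4x sizing follows the slide:

```python
# Hashed/inverted page table sketch: the table is sized by physical
# memory, not virtual, and a VPN is located by hashing it into a bucket.

NUM_PHYS_PAGES = 16
NUM_BUCKETS = 4 * NUM_PHYS_PAGES   # oversize 4x to reduce collisions

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(vpn, ppn):
    buckets[hash(vpn) % NUM_BUCKETS].append((vpn, ppn))

def lookup(vpn):
    for v, p in buckets[hash(vpn) % NUM_BUCKETS]:  # scan the collision chain
        if v == vpn:
            return p
    return None                    # not resident -> page fault

insert(0x20004, 0x2000)
print(lookup(0x20004))   # 0x2000
print(lookup(0x99999))   # None
```

The win is that table size scales with physical memory (here 64 entries) rather than with the virtual address space, which is what makes 64-bit VAs tractable.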

  25. Hashed Page Table
      [Diagram: the virtual page number is hashed; the hash selects a group of PTEs (PTE0–PTE3) within the table located by the PTBR; the offset passes through unchanged.]

  26. High-Performance VM
      • VA translation
        – Requires an additional memory reference to the PTE
        – Each instruction fetch/load/store now takes 2 memory references
          • Or more, with a multilevel table or hash collisions
        – Even if PTEs are cached, this is still slow
      • Hence, use a special-purpose cache for PTEs
        – Called a TLB (translation lookaside buffer)
        – Caches PTE entries
        – Exploits temporal and spatial locality (it is just a cache)
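A TLB in front of the page table can be sketched as a small fully-associative cache with LRU replacement. The 2-entry capacity, the page-table contents, and the access sequence are illustrative assumptions:

```python
# Tiny fully-associative TLB sketch with LRU replacement, caching
# VPN -> PPN entries in front of a page-table dict.

from collections import OrderedDict

PAGE_SIZE = 4096
page_table = {0x20004: 0x2000, 0x20005: 0x2001}

class TLB:
    def __init__(self, entries=2):
        self.entries = entries
        self.map = OrderedDict()   # VPN -> PPN, kept in LRU order
        self.hits = self.misses = 0

    def translate(self, va):
        vpn, offset = va // PAGE_SIZE, va % PAGE_SIZE
        if vpn in self.map:
            self.hits += 1
            self.map.move_to_end(vpn)        # mark as most recently used
        else:
            self.misses += 1                 # walk the page table, fill TLB
            self.map[vpn] = page_table[vpn]
            if len(self.map) > self.entries:
                self.map.popitem(last=False) # evict LRU entry
        return self.map[vpn] * PAGE_SIZE + offset

tlb = TLB()
for va in (0x20004000, 0x20004004, 0x20005000, 0x20004008):
    tlb.translate(va)
print(tlb.hits, tlb.misses)  # 2 2
```

All accesses within a page hit on the same TLB entry after the first miss, which is the spatial and temporal locality the slide refers to; a hit removes the extra page-table memory reference entirely.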

  27. TLB
      [Diagram: TLB organization.]
