SLIDE 1

ECE 2162 Memory

SLIDE 2

Views of Memory

  • Real machines have limited amounts of memory
    – 640KB? A few GB?
    – (This laptop = 2GB)

  • Programmer doesn’t want to be bothered
    – Do you think, “oh, this computer only has 128MB so I’ll write my code this way…”?
    – What happens if you run on a different machine?

SLIDE 3

Programmer’s View

  • Example 32-bit memory
    – When programming, you don’t care about how much real memory there is
    – Even if you use a lot, memory can always be paged to disk

[Diagram: the 4GB virtual address space (a.k.a. virtual addresses) laid out as Kernel, Text, Data, Heap, and Stack, with the user portion spanning 0–2GB]

SLIDE 4

Programmer’s View

  • Really “Program’s View”
  • Each program/process gets its own 4GB space

[Diagram: three processes, each with its own full Kernel/Text/Data/Heap/Stack address space]

SLIDE 5

CPU’s View

  • At some point, the CPU is going to have to load-from/store-to memory… all it knows is the real, a.k.a. physical, memory
  • … which unfortunately is often < 4GB
  • … and is never 4GB per process

SLIDE 6

Pages

  • Memory is divided into pages, which are nothing more than fixed-sized and aligned regions of memory
    – Typical size: 4KB/page (but not always)

Page 0: bytes 0–4095, Page 1: 4096–8191, Page 2: 8192–12287, Page 3: 12288–16383, …
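As a minimal illustration (assuming the 4KB page size above; the constants and names exist only for this sketch), the page number and offset fall directly out of a shift and a mask:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12                 /* 4KB pages: 2^12 bytes per page */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)

    int main(void) {
        uint32_t addr   = 9000;                     /* falls in page 2 (8192-12287) */
        uint32_t page   = addr >> PAGE_SHIFT;       /* which page                   */
        uint32_t offset = addr & (PAGE_SIZE - 1);   /* where within the page        */
        printf("address %u -> page %u, offset %u\n",
               (unsigned)addr, (unsigned)page, (unsigned)offset);
        return 0;
    }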

SLIDE 7

Page Table

  • Map from virtual addresses to physical locations

[Diagram: virtual pages at 0K, 4K, 8K, 12K mapped onto physical pages spanning 0K–28K]
“Physical location” may include the hard disk
The page table implements this V→P mapping

SLIDE 8

Page Tables

[Diagram: two processes’ virtual pages (0K–12K each) mapped into a shared physical memory (0K–28K)]

SLIDE 9

Need for Translation

[Diagram: the virtual address splits into a virtual page number and a page offset; the page table translates the VPN into a physical frame in main memory.
Example: virtual address 0xFC51908B → VPN 0xFC519, offset 0x08B; the page table maps VPN 0xFC519 → PPN 0x00152, giving physical address 0x0015208B]
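The arithmetic behind these numbers can be reproduced with a few shifts and masks; a small C sketch (assuming 4KB pages, with PPN 0x00152 simply taken as the mapping the slide postulates):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12   /* 4KB pages, as in the slide's example */

    int main(void) {
        uint32_t va  = 0xFC51908B;
        uint32_t vpn = va >> PAGE_SHIFT;               /* 0xFC519 */
        uint32_t off = va & ((1u << PAGE_SHIFT) - 1);  /* 0x08B   */

        /* The page table maps VPN 0xFC519 to PPN 0x00152 in this example. */
        uint32_t ppn = 0x00152;
        uint32_t pa  = (ppn << PAGE_SHIFT) | off;      /* 0x0015208B */

        printf("VA 0x%08X -> VPN 0x%05X, offset 0x%03X, PA 0x%08X\n",
               (unsigned)va, (unsigned)vpn, (unsigned)off, (unsigned)pa);
        return 0;
    }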

SLIDE 10

Simple Page Table

  • Flat organization
    – One entry per page
    – Entry contains physical page number (PPN) or indicates page is on disk or invalid
    – Also meta-data (e.g., permissions, dirtiness, etc.)
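A hedged sketch of what one flat-table entry might look like in C (field names and widths are illustrative, not any particular architecture’s PTE format):

    /* One possible layout for a flat page-table entry. */
    typedef struct {
        unsigned ppn      : 20;  /* physical page number (when valid)  */
        unsigned valid    : 1;   /* mapping exists in physical memory  */
        unsigned on_disk  : 1;   /* page currently paged out to disk   */
        unsigned dirty    : 1;   /* written since it was brought in    */
        unsigned writable : 1;   /* permission meta-data               */
    } pte_t;

    /* Flat table: one entry per virtual page.  With 32-bit virtual
     * addresses and 4KB pages, that is 2^20 entries. */
    #define NUM_VPAGES (1u << 20)
    static pte_t flat_page_table[NUM_VPAGES];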

SLIDE 11

Multi-Level Page Tables

  • Break up the virtual address space into multiple page tables
  • Increase the utilization and reduce the physical size of a page table
  • A simple technique is a two-level page table

SLIDE 12

Multi-Level Page Tables

[Diagram: the virtual page number splits into a Level 1 index and a Level 2 index (plus the page offset); the two-level walk yields the physical page number]
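A sketch of the corresponding two-level lookup in C, assuming an illustrative 10/10/12 split of a 32-bit virtual address (the structures and field names are assumptions for the sketch, not a specific architecture’s format):

    #include <stdint.h>
    #include <stddef.h>

    #define L1_BITS     10
    #define L2_BITS     10
    #define PAGE_SHIFT  12

    typedef struct {
        uint32_t ppn;
        int      valid;
    } leaf_pte_t;

    typedef struct {
        leaf_pte_t *entries;   /* NULL if this chunk of the address space is unused */
    } l1_entry_t;

    static l1_entry_t l1_table[1u << L1_BITS];

    /* Walk the two levels; returns 0 and fills *pa on success, -1 on a fault. */
    int translate(uint32_t va, uint32_t *pa) {
        uint32_t l1  = (va >> (PAGE_SHIFT + L2_BITS)) & ((1u << L1_BITS) - 1);
        uint32_t l2  = (va >> PAGE_SHIFT) & ((1u << L2_BITS) - 1);
        uint32_t off = va & ((1u << PAGE_SHIFT) - 1);

        leaf_pte_t *l2_table = l1_table[l1].entries;
        if (l2_table == NULL || !l2_table[l2].valid)
            return -1;                       /* unmapped: page fault */

        *pa = (l2_table[l2].ppn << PAGE_SHIFT) | off;
        return 0;
    }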

SLIDE 13

Choosing a Page Size

  • Page size inversely proportional to page table overhead
  • Large page size permits more efficient transfer to/from disk
    – vs. many small transfers
    – Like downloading from the Internet
  • Small page leads to less fragmentation
    – Big page likely to have more bytes unused
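For a sense of scale: with 32-bit virtual addresses, 4KB pages, and 4-byte entries, a flat table needs 2^20 entries (4MB) per process; 64KB pages would cut that to 2^16 entries (256KB), but each partially used page then wastes more space.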

SLIDE 14

CPU Memory Access

  • Program deals with virtual addresses
    – “Load R1 = 0[R2]”
  • On a memory instruction:
    1. Compute virtual address (0[R2])
    2. Compute virtual page number
    3. Compute physical address of the VPN’s page table entry
    4. Load* mapping
    5. Compute physical address
    6. Do the actual Load* from memory

*Could be more loads, depending on the page table organization (see the sketch below)
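A toy C sketch of those six steps for a flat page table held in a small simulated physical memory (the flat layout, PT_BASE, and phys_read32 are assumptions made just for this sketch; a multi-level table would add more loads at step 4):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define MEM_WORDS  (1u << 20)          /* small simulated physical memory  */

    static uint32_t mem[MEM_WORDS];        /* word-addressed stand-in for DRAM */
    static uint32_t PT_BASE = 0x1000;      /* where the flat page table lives  */

    static uint32_t phys_read32(uint32_t pa) { return mem[(pa / 4) % MEM_WORDS]; }

    /* "Load R1 = 0[R2]" as the six steps on the slide. */
    static uint32_t load_word(uint32_t r2) {
        uint32_t va     = r2 + 0;                            /* 1. virtual address      */
        uint32_t vpn    = va >> PAGE_SHIFT;                  /* 2. virtual page number  */
        uint32_t pte_pa = PT_BASE + vpn * 4;                 /* 3. address of the PTE   */
        uint32_t ppn    = phys_read32(pte_pa);               /* 4. load the mapping     */
        uint32_t pa     = (ppn << PAGE_SHIFT)                /* 5. physical address     */
                        | (va & ((1u << PAGE_SHIFT) - 1));
        return phys_read32(pa);                              /* 6. the actual load      */
    }

    int main(void) {
        mem[(PT_BASE / 4) + 5] = 0x00152;                    /* map VPN 5 -> PPN 0x152  */
        mem[((0x00152u << PAGE_SHIFT) | 0x10) / 4] = 42;     /* data at that PA         */
        printf("loaded %u\n", (unsigned)load_word((5u << PAGE_SHIFT) | 0x10));
        return 0;
    }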

SLIDE 15

Impact on Performance?

  • Every time you load/store, the CPU must perform two (or more) accesses!
  • Even worse, every instruction fetch requires translation of the PC!
  • Observation:
    – Once a virtual page is mapped into a physical page, it’ll likely stay put for quite some time

SLIDE 16

Idea: Caching!

  • Not caching of data, but caching of translations

[Diagram: virtual pages 0K–12K mapped onto physical pages 0K–28K, with a small translation cache holding VPN→PPN pairs such as VPN 8 → PPN 16]

SLIDE 17

Translation Cache: TLB

  • TLB = Translation Look-aside Buffer

[Diagram: the virtual address goes to the TLB; the resulting physical address indexes the cache data and is compared against the cache tags for a hit]

If TLB hit, no need to do a page table lookup from memory
Note: the data cache is accessed by physical addresses now
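A software model of the idea, assuming a 32-entry fully-associative TLB (the structure, the naive fill policy, and the page_table_walk stub are illustrative, not how any particular TLB is built):

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 32
    #define PAGE_SHIFT  12

    /* One entry of a fully-associative TLB: a cached VPN -> PPN mapping. */
    typedef struct {
        uint32_t vpn;
        uint32_t ppn;
        bool     valid;
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Stand-in for the real page-table walk (the slow path that goes to memory). */
    static uint32_t page_table_walk(uint32_t vpn) { return vpn; /* identity map */ }

    uint32_t translate(uint32_t va) {
        uint32_t vpn = va >> PAGE_SHIFT;
        uint32_t off = va & ((1u << PAGE_SHIFT) - 1);

        /* Hardware compares all entries in parallel; this loop models that. */
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return (tlb[i].ppn << PAGE_SHIFT) | off;   /* hit: no memory lookup */

        /* Miss: walk the page table, then cache the translation (naive fill). */
        uint32_t ppn = page_table_walk(vpn);
        tlb[0] = (tlb_entry_t){ .vpn = vpn, .ppn = ppn, .valid = true };
        return (ppn << PAGE_SHIFT) | off;
    }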

SLIDE 18

PAPT Cache

  • Previous slide showed a Physically-Addressed, Physically-Tagged cache
    – Sometimes called PIPT (I = Indexed)
  • Con: TLB lookup and cache access are serialized
    – Caches already take > 1 cycle
  • Pro: cache contents stay valid so long as the page table is not modified

SLIDE 19

Virtually Addressed Cache

[Diagram: the virtual address indexes the cache data and tags directly; the TLB is consulted only on a cache miss to produce the physical address sent to L2]

  • Pro: latency – no need to check the TLB
  • Con: cache must be flushed on a process change

(VIVT: virtually indexed, virtually tagged)

SLIDE 20

Virtually Indexed Physically Tagged

[Diagram: the virtual address indexes the cache while the TLB is looked up in parallel; the physical address from the TLB supplies the physical tag for comparison]

  • Pro: latency – TLB parallelized
  • Pro: don’t need to flush $ on process swap
  • Con: synonyms

Big page size can help here
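A common way to sidestep synonyms is to keep every cache index bit inside the page offset, i.e., keep (cache size ÷ associativity) ≤ page size: for example, a 32KB 8-way cache has 4KB per way, so with 4KB pages the index bits are unchanged by translation. Larger pages relax this constraint, which is why a big page size helps.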

SLIDE 21

Synonyms or Aliases

[Diagram: two virtual addresses VA and VA′ that translate to the same physical address PA; because their tag/set-index/block-offset fields differ, the same data can land in two different cache locations]

SLIDE 22

TLB Design

  • Often fully-associative
    – For latency, this means few entries
    – However, each entry is for a whole page
    – Ex.: 32-entry TLB, 4KB pages… how big a working set while avoiding TLB misses?
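(With those numbers the TLB covers 32 × 4KB = 128KB, so working sets up to about 128KB can run without TLB misses.)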

  • If many misses:
    – Increase TLB size (latency problems)
    – Increase page size (fragmentation problems)

SLIDE 23

Process Changes

  • With physically-tagged caches, don’t need to flush the cache on a context switch
    – But the TLB is no longer valid!
    – Add a process ID to the translation

[Diagram: TLB entries tagged with a PID, e.g., (PID:0, VPN:8 → PPN:28) and (PID:1, VPN:8 → PPN:44) can coexist]
Only flush the TLB when recycling PIDs
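A minimal sketch of the tagged entry and its hit check (the struct and field widths are assumptions for illustration):

    #include <stdint.h>
    #include <stdbool.h>

    /* TLB entry extended with a process ID (often called an ASID). */
    typedef struct {
        uint8_t  pid;
        uint32_t vpn;
        uint32_t ppn;
        bool     valid;
    } tagged_tlb_entry_t;

    /* A hit now also requires the PID to match, so (PID 0, VPN 8 -> PPN 28)
     * and (PID 1, VPN 8 -> PPN 44) can sit in the TLB at the same time and
     * no flush is needed on a context switch. */
    static bool tlb_hit(const tagged_tlb_entry_t *e, uint8_t cur_pid, uint32_t vpn)
    {
        return e->valid && e->pid == cur_pid && e->vpn == vpn;
    }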

SLIDE 24

SRAM vs. DRAM

  • DRAM = Dynamic RAM
  • SRAM: 6T per bit
    – built with normal high-speed CMOS technology
  • DRAM: 1T per bit
    – built with a special DRAM process optimized for density

SLIDE 25

Hardware Structures

[Diagram: a 6T SRAM cell driven by a wordline with complementary bitlines (b, b̄), next to a 1T DRAM cell with a single bitline and storage capacitor]

SLIDE 26

DRAM Chip Organization

[Diagram: the row address feeds a row decoder that selects a row of the memory cell array; sense amps capture it into the row buffer; the column address and column decoder pick which bits go out on the data bus]

SLIDE 27

DRAM Chip Organization (2)

  • Differences with SRAM
    – reads are destructive: contents are erased after reading
    – row buffer
      • read lots of bits all at once, and then parcel them out based on different column addresses
      • similar to reading a full cache line, but only accessing one word at a time
  • “Fast-Page Mode” (FPM) DRAM organizes the DRAM row to contain bits for a complete page
    – row address held constant, and then fast reads from different locations in the same page

SLIDE 28

DRAM Read Operation

[Diagram: after row 0x1FE is read into the row buffer, successive column addresses 0x000, 0x001, 0x002 are streamed out over the data bus]

Accesses need not be sequential

SLIDE 29

Destructive Read

[Diagram: waveforms of the bitline and storage-cell voltages as the wordline and then the sense amp are enabled during a read of a cell storing a 1 (initially at Vdd)]
After a read of 0 or 1, the cell contains something close to 1/2

SLIDE 30

Refresh

  • So after a read, the contents of the DRAM cell are gone
  • The values are stored in the row buffer
  • Write them back into the cells for the next read in the future

[Diagram: the row buffer / sense amps writing the values back into the DRAM cells]

SLIDE 31

Refresh (2)

  • Fairly gradually, the DRAM cell will lose its contents even if it’s not accessed
    – This is why it’s called “dynamic”
    – Contrast to SRAM, which is “static” in that once written, it maintains its value forever (so long as power remains on)
  • All DRAM rows need to be regularly read and re-written
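(In practice this happens on a fixed schedule rather than waiting for decay; typical DDR devices, for example, require every row to be refreshed within roughly a 64 ms window.)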

[Diagram: a DRAM cell storing a 1 slowly losing its charge through gate leakage]

If it keeps its value even if power is removed, then it’s “non-volatile” (e.g., flash, HDD, DVDs)

SLIDE 32

DRAM Read Timing

Accesses are asynchronous: triggered by the RAS and CAS (row and column address strobe) signals, which can in theory occur at arbitrary times (subject to DRAM timing constraints)

SLIDE 33

SDRAM Read Timing

[Timing diagram showing a burst of data words (the burst length) per read command]
Double-Data Rate (DDR) DRAM transfers data on both the rising and falling edges of the clock

Timing figures taken from “A Performance Comparison of Contemporary DRAM Architectures” by Cuppu, Jacob, Davis and Mudge

Command frequency does not change

SLIDE 34

More Latency

Width/speed varies depending on the memory type
Significant wire delay just getting from the CPU to the memory controller
More wire delay getting to the memory chips (plus the return trip…)

SLIDE 35

Memory Controller

[Diagram: the memory controller sits between the CPU and the DRAM banks; requests arrive in read/write queues, a scheduler and buffer issue commands and data to Bank 0/Bank 1, and results return through a response queue]

Like a write-combining buffer, the scheduler may coalesce multiple accesses together, or re-order them to reduce the number of row accesses

SLIDE 36

Memory Reference Scheduling

  • Just like registers, need to enforce RAW, WAW, WAR dependencies
  • No “memory renaming” in the memory controller, so enforce all three dependencies
  • Like everything else, still need to maintain the appearance of sequential access
    – Consider multiple read/write requests to the same address
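A minimal sketch of the check a scheduler might apply before swapping two queued requests (the request struct is an assumption; the rule is simply that same-address requests involving a write must keep their order):

    #include <stdint.h>
    #include <stdbool.h>

    /* A queued memory request as the controller might see it (illustrative). */
    typedef struct {
        uint64_t addr;      /* physical address of the access */
        bool     is_write;
    } mem_req_t;

    /* Two requests may swap places only if doing so cannot change what any
     * read returns or what finally ends up in memory.  Same-address pairs
     * where at least one is a write carry a RAW, WAR, or WAW ordering. */
    static bool can_reorder(const mem_req_t *older, const mem_req_t *younger)
    {
        if (older->addr != younger->addr)
            return true;                                /* independent accesses  */
        return !older->is_write && !younger->is_write;  /* two reads: still safe */
    }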

SLIDE 37

Faster DRAM Speed

  • Clock the FSB faster
    – DRAM chips may not be able to keep up
  • Latency dominated by wire delay
    – Bandwidth may be improved (DDR vs. regular) but latency doesn’t change much
      • Instead of 2 cycles for row access, may take 3 cycles at a faster bus speed
      • Doesn’t address the latency of the memory access

SLIDE 38

On-Chip Memory Controller

All on the same chip:
  – No slow PCB wires to drive
  – Memory controller can run at CPU speed instead of FSB clock speed
Disadvantage: the memory type is now tied to the CPU implementation

SLIDE 39

Read 5.3 in the text
