ECE232: Hardware Organization and Design
Lecture 21: Memory Hierarchy
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
Overview
- Ideally, computer memory would be large and fast
- Unfortunately, memory implementation involves tradeoffs
- Memory Hierarchy
- Includes caches, main memory, and disks
- Caches
- Small and fast
- Contain subset of data from main memory
- Generally close to the processor
- Terminology
- Cache blocks, hit rate, miss rate
- More mathematical than material from earlier in the course
Recap: Machine Organization
- Processor (CPU), the active part: Control and Datapath
- Memory, the passive part: where programs and data live when running
- Devices: Input and Output
Memory Basics
- Users want large and fast memories
- Fact
- Large memories are slow
- Fast memories are small
- Large memories use DRAM technology: Dynamic Random Access Memory
- High density, low power, cheap, slow
- Dynamic: needs to be “refreshed” regularly
- DRAM access times are 50-70 ns at a cost of $10 to $20 per GB
- FPM (Fast Page Mode) is one common DRAM variant
- Fast memories use SRAM: Static Random Access Memory
- Low density, high power, expensive, fast
- Static: content lasts “forever” (until power is lost)
- SRAM access times are 0.5-5 ns at a cost of $400 to $1,000 per GB
Memory Technology
- SRAM and DRAM are random-access storage
- Access time is the same for all locations (a hardware decoder is used)
- For even larger and cheaper storage than DRAM, use a hard drive (disk): sequential access
- Very slow; data are accessed sequentially, access time is location-dependent, and the disk is treated as I/O
- Disk access times are 5 to 20 million ns (i.e., milliseconds) at a cost of $0.20 to $2.00 per GB
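The slide's numbers put these technologies several orders of magnitude apart; a quick sketch (using the quoted upper-bound figures, which are representative rather than exact) makes the ratios explicit:

```python
# Rough speed/cost comparison using the figures quoted on this slide
# (upper-bound values from the slide; real parts vary).
technologies = {
    # name: (access time in ns, cost in $ per GB)
    "SRAM": (5.0, 1000.0),
    "DRAM": (70.0, 20.0),
    "Disk": (10_000_000.0, 2.0),
}

sram_ns, sram_cost = technologies["SRAM"]
for name, (ns, cost) in technologies.items():
    slowdown = ns / sram_ns
    savings = sram_cost / cost
    print(f"{name}: {ns:,.0f} ns, ${cost:,.0f}/GB "
          f"({slowdown:,.0f}x slower, {savings:,.0f}x cheaper than SRAM)")
```

Disk comes out two million times slower than SRAM but 500 times cheaper per GB, which is exactly the tradeoff the hierarchy exploits.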
Processor-Memory Speed gap Problem
[Figure: processor vs. DRAM performance, 1980-2000, log scale. Processor performance grows ~60%/yr (2x/1.5 yr) while DRAM performance grows ~5%/yr (2x/15 yrs), so the processor-DRAM performance gap grows about 50% per year. This gap is the motivation for the memory hierarchy.]
Need for speed
- Assume the CPU runs at 3 GHz
- Every instruction requires 4 B of instruction fetch and at least one memory access (4 B of data)
- 3 GHz × 8 B = 24 GB/sec
- This is the peak demand for a sequential burst transfer (performance for random access is much, much slower due to latency)
- Memory bandwidth and access time are a performance bottleneck
| Interface | Width | Frequency | Bytes/Sec |
|---|---|---|---|
| 4-way interleaved PC1600 (DDR200) SDRAM | 4 × 64 bits | 100 MHz DDR | 6.4 GB/s |
| Opteron HyperTransport memory bus | 128 bits | 200 MHz DDR | 6.4 GB/s |
| Pentium 4 "800 MHz" FSB | 64 bits | 200 MHz QDR | 6.4 GB/s |
| PC2 6400 (DDR-II 800) SDRAM | 64 bits | 400 MHz DDR | 6.4 GB/s |
| PC2 5300 (DDR-II 667) SDRAM | 64 bits | 333 MHz DDR | 5.3 GB/s |
| Pentium 4 "533 MHz" FSB | 64 bits | 133 MHz QDR | 4.3 GB/s |

FSB = Front-Side Bus; DDR = Double Data Rate; SDRAM = Synchronous DRAM
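The 24 GB/s demand figure can be checked with a one-line calculation (assuming one instruction completed per cycle, i.e., the peak case from the slide):

```python
# Back-of-the-envelope check of the 24 GB/s figure above
# (assumed: one instruction per cycle, 4 B fetch + 4 B data access).
clock_hz = 3e9             # 3 GHz
bytes_per_instruction = 4 + 4

demand_gb_s = clock_hz * bytes_per_instruction / 1e9
print(demand_gb_s)         # 24.0

# Versus the ~6.4 GB/s peak of the fastest buses in the table:
print(demand_gb_s / 6.4)   # 3.75 -- demand outstrips peak bus bandwidth
```

Even the best bus in the table supplies barely a quarter of what a 3 GHz processor could consume, which is why caches are needed between the two.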
Need for Large Memory
- Small memories are fast
- So just write small programs
“640 K of memory should be enough for anybody” (Bill Gates, 1981)
- Today’s programs require large memories
- Database applications may require gigabytes of memory
The Goal: Illusion of large, fast, cheap memory
- How do we create a memory that is large, cheap and fast
(most of the time)?
- Strategy: provide a small, fast memory that holds a subset of the main memory, called a cache
- Keep frequently-accessed locations in fast cache
- Cache retrieves more than one word at a time
- Sequential accesses are faster after first access
Memory Hierarchy
- Hierarchy of Levels
- Uses smaller and faster memory technologies close to the processor
- Fast access time in the highest level of the hierarchy
- Cheap, slow memory furthest from the processor
- The aim of memory hierarchy design is an access time close to that of the highest level with a size equal to that of the lowest level
Memory Hierarchy Pyramid
[Figure: memory hierarchy pyramid. The processor (CPU) sits at the apex; Level 1, Level 2, Level 3, ..., Level n grow in size going down. Increasing distance from the CPU means decreasing cost per MB; decreasing distance means decreasing access time (memory latency). Levels are connected by a transfer datapath (bus).]
Basic Philosophy
- Move data into ‘smaller, faster’ memory
- Operate on it
- Move it back to ‘larger, cheaper’ memory
- How do we keep track of whether the data has changed?
- What if we run out of space in ‘smaller, faster’ memory?
- Important Concepts: Latency, Bandwidth
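Latency and bandwidth combine into the time to move a block between levels; a minimal sketch, with assumed round numbers (the 70 ns latency and 8 GB/s bandwidth below are illustrative, not measured):

```python
# Illustrative model (assumed numbers): moving a block between levels
# costs a fixed latency plus the block size divided by bandwidth.
def transfer_time_ns(block_bytes, latency_ns, bandwidth_gb_s):
    """1 GB/s moves 1 byte per ns, so bytes/bandwidth gives ns."""
    return latency_ns + block_bytes / bandwidth_gb_s

# A 64 B block from DRAM (assumed 70 ns latency, 8 GB/s bandwidth):
print(transfer_time_ns(64, 70.0, 8.0))  # 78.0 -- latency dominates
```

For small blocks the fixed latency dominates, which is why fetching a larger block costs little extra once the transfer has started.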
Typical Hierarchy
- Notice that the data width changes between levels
- Why? Bandwidth: the transfer rate differs between the various levels
- CPU-Cache: 24 GB/s
- Cache-Main: 0.5-6.4 GB/s
- Main-Disk: 187 MB/s (Serial ATA/1500)
[Figure: CPU registers ↔ cache in 8 B transfers; cache ↔ main memory in 32 B blocks; main memory ↔ disk in 4 KB pages (virtual memory).]
Why large blocks?
- Fetch large blocks at a time
- Take advantage of spatial locality
for (i=0; i < length; i++) sum += array[i];
- array has spatial locality
- sum has temporal locality
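To quantify the benefit for the loop above, a simplified cold-miss count (assuming fixed-size blocks and ignoring capacity and conflict misses):

```python
# Counting cold misses for the sum loop above, assuming the cache
# fetches fixed-size blocks (a simplified model, not a full simulation).
def cold_misses(num_elements, words_per_block):
    # Sequential access: only the first word of each block misses.
    return -(-num_elements // words_per_block)  # ceiling division

length = 1024
print(cold_misses(length, 1))  # 1024 misses: every access goes to memory
print(cold_misses(length, 8))  # 128 misses: one miss per 8-word block
```

Fetching 8-word blocks cuts the miss count by a factor of 8 for this access pattern; that is spatial locality paying off.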
Why Hierarchy works: Natural Locality
- The Principle of Locality
- Programs access a relatively small portion of the address space at any instant
[Figure: probability of reference vs. memory address (0 to 2^n - 1): at any moment, references cluster in a few small regions of the address space.]
- Temporal Locality (Locality in Time): recently accessed data tend to be referenced again soon
- Spatial Locality (Locality in Space): nearby items tend to be referenced soon
Taking Advantage of Locality
- Memory hierarchy
- Store everything on disk
- Copy recently accessed (and nearby) items from disk to smaller
DRAM memory
- Main memory
- Copy more recently accessed (and nearby) items from DRAM to
smaller SRAM memory
- Cache memory attached to CPU
Principle of Locality
- Programs access a small proportion of their address space at any
time
- Temporal locality
- Items accessed recently are likely to be accessed again soon
- e.g., instructions in a loop, induction variables
- Spatial locality
- Items near those accessed recently are likely to be accessed
soon
- E.g., sequential instruction access, array data
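A common illustration of spatial locality in array data is traversal order over a 2-D array. The sketch below is schematic (Python lists do not expose real addresses, but in a row-major language like C the first loop touches consecutive memory while the second strides across rows):

```python
# Both loops compute the same sum, but row-major order visits
# consecutive elements (spatial locality) while column-major order
# jumps by a whole row between accesses.
N = 4
matrix = [[i * N + j for j in range(N)] for i in range(N)]

row_major = sum(matrix[i][j] for i in range(N) for j in range(N))
col_major = sum(matrix[i][j] for j in range(N) for i in range(N))
print(row_major, col_major)  # 120 120 -- same result, different pattern
```

The results are identical; only the memory access pattern differs, and on real hardware the row-major version is the one that benefits from block fetches.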
Memory Hierarchy: Terminology
- Hit: data appears in some block (Block X) in the upper level
- Hit Rate: the fraction of memory accesses found in the upper level
- Miss: data must be retrieved from a block (Block Y) in the lower level
- Miss Rate = 1 - (Hit Rate)
- Hit Time: time to access the upper level, which consists of the time to determine hit/miss plus the upper-level access time
- Miss Penalty: time to replace a block in the upper level plus the time to deliver the block to the processor
- Note: Hit Time << Miss Penalty
[Figure: on a hit, Block X is delivered from the upper level to the processor; on a miss, Block Y is brought from the lower level into the upper level.]
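These terms combine into the standard average memory access time (AMAT) formula; the numbers below are assumed for illustration:

```python
# Average memory access time (AMAT) built from the terms defined above:
#   AMAT = hit time + miss rate * miss penalty
# The hit time, miss rate, and miss penalty here are assumed values.
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    return hit_time_ns + miss_rate * miss_penalty_ns

print(amat(1.0, 0.05, 100.0))  # 6.0 -- a 5% miss rate dominates the average
```

Because Hit Time << Miss Penalty, even a small miss rate multiplies the average access time, which is why cache design focuses so heavily on reducing misses.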
Current Memory Hierarchy
| | Regs | L1 Cache | L2 Cache | Main Memory | Secondary Memory (Disk) |
|---|---|---|---|---|---|
| Speed (ns) | 1 | 2 | 6 | 100 | 10,000,000 |
| Size (MB) | 0.0005 | 0.1 | 1-4 | 1000-6000 | 500,000 |
| Cost ($/MB) | - | $10 | $3 | $0.01 | $0.002 |
| Technology | Regs | SRAM | SRAM | DRAM | Disk |
- Cache / main memory boundary: exists for speed
- Main memory / disk boundary (virtual memory): exists for capacity
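The averaging idea extends across multiple levels of this hierarchy: an L1 miss costs an L2 access, and an L2 miss costs a main-memory access. The hit times and miss rates below are assumed for illustration:

```python
# Two-level average access time (assumed numbers): an L1 miss pays
# the L2 access time, and an L2 miss additionally pays the memory
# access time.
def two_level_amat(t_l1, m_l1, t_l2, m_l2, t_mem):
    return t_l1 + m_l1 * (t_l2 + m_l2 * t_mem)

# L1: 1 ns, 5% misses; L2: 6 ns, 20% misses; memory: 100 ns.
print(round(two_level_amat(1.0, 0.05, 6.0, 0.20, 100.0), 3))  # 2.3
```

Two levels of caching bring the average from 100 ns of raw DRAM down to a few nanoseconds, which is the "illusion of large, fast, cheap memory" the earlier slide promised.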
How is the hierarchy managed?
- Registers ↔ Memory
- By the compiler (or assembly-language programmer)
- Cache ↔ Main Memory
- By hardware
- Main Memory ↔ Disks
- By a combination of hardware and the operating system (virtual memory)
Summary
- Computer performance is often limited by the memory hierarchy
- An effective hierarchy contains multiple types of memory of increasing size and decreasing speed
- Caches (our first target) are a subset of main memory
- Caches have their own special architecture
- Generally made from SRAM
- Located close to the processor
- Main memory and disks
- Bulkier storage
- Main memory: Volatile (loses data when power removed)
- Disk: Non-volatile