  1. ECE232: Hardware Organization and Design, Lecture 21: Memory Hierarchy. Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

  2. Overview
     • Ideally, computer memory would be large and fast; unfortunately, memory implementation involves tradeoffs
     • Memory hierarchy: includes caches, main memory, and disks
     • Caches: small and fast, hold a subset of the data in main memory, and are generally close to the processor
     • Terminology: cache blocks, hit rate, miss rate
     • More mathematical than material from earlier in the course

  3. Recap: Machine Organization
     [Diagram: a personal computer consists of the Processor (CPU, active: Control + Datapath), Memory (passive: where programs and data live when running), and Devices (Input and Output)]

  4. Memory Basics
     • Users want large and fast memories
     • Fact: large memories are slow; fast memories are small
     • Large memories use DRAM technology: Dynamic Random Access Memory
       • High density, low power, cheap, slow
       • Dynamic: needs to be "refreshed" regularly
       • DRAM access times are 50-70 ns at a cost of $10 to $20 per GB (FPM: Fast Page Mode)
     • Fast memories use SRAM: Static Random Access Memory
       • Low density, high power, expensive, fast
       • Static: content lasts "forever" (until power is lost)
       • SRAM access times are 0.5-5 ns at a cost of $400 to $1,000 per GB

  5. Memory Technology
     • SRAM and DRAM are random access storage: access time is the same for all locations (a hardware decoder is used)
     • For even larger and cheaper storage (than DRAM), use a hard drive (disk): sequential access
       • Very slow; data is accessed sequentially, access time is location dependent, and the disk is treated as I/O
       • Disk access times are 5 to 20 million ns (i.e., milliseconds) at a cost of $0.20 to $2.00 per GB
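     A rough sanity check on these latencies, expressed in processor cycles: the C sketch below assumes the 3 GHz clock used on a later slide and access times from within the ranges quoted above; the exact figures are illustrative, not measurements.

       #include <stdio.h>

       int main(void) {
           double clock_hz = 3e9;   /* assumed 3 GHz CPU clock                  */
           double sram_ns  = 1.0;   /* SRAM access, within the 0.5-5 ns range   */
           double dram_ns  = 60.0;  /* DRAM access, within the 50-70 ns range   */
           double disk_ns  = 5e6;   /* disk access, low end of 5-20 million ns  */

           /* convert each access time into CPU clock cycles */
           printf("SRAM: %12.0f cycles\n", sram_ns * 1e-9 * clock_hz);
           printf("DRAM: %12.0f cycles\n", dram_ns * 1e-9 * clock_hz);
           printf("Disk: %12.0f cycles\n", disk_ns * 1e-9 * clock_hz);
           return 0;
       }

     At 3 GHz a single disk access costs on the order of 15 million cycles, which is why disk sits at the bottom of the hierarchy and is treated as I/O.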

  6. Processor-Memory Speed Gap
     • Problem: the processor-DRAM performance gap is the motivation for a memory hierarchy
     [Plot: performance (log scale) vs. time, 1980-2000. Processor performance improves ~60%/yr (2x every 1.5 years); DRAM performance improves ~5%/yr (2x every 15 years); the processor-memory performance gap grows ~50% per year]
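     The growth rates in the plot can be checked with a small loop; the sketch below simply compounds the slide's approximate rates (60%/yr for processors, 5%/yr for DRAM), which is where the "gap grows ~50% per year" figure comes from.

       #include <stdio.h>

       int main(void) {
           double proc = 1.0, dram = 1.0;       /* normalized 1980 performance */
           for (int year = 1980; year <= 2000; year++) {
               printf("%d: processor %10.0fx, DRAM %4.1fx, gap %9.0fx\n",
                      year, proc, dram, proc / dram);
               proc *= 1.60;                    /* ~60%/yr processor improvement */
               dram *= 1.05;                    /* ~5%/yr DRAM improvement       */
           }
           /* the gap ratio grows by 1.60 / 1.05 each year, i.e. roughly 50%/yr */
           return 0;
       }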

  7. Need for Speed
     • Assume the CPU runs at 3 GHz; every instruction requires 4 B of instruction fetch and at least one memory access (4 B of data)
       • 3 GHz * 8 B = 24 GB/sec
     • The peak bus bandwidths below are for sequential burst transfers (performance for random access is much slower due to latency)
     • Memory bandwidth and access time are a performance bottleneck

       Interface                                   Width        Frequency      Bytes/sec
       4-way interleaved PC1600 (DDR200) SDRAM     4 x 64 bits  100 MHz DDR    6.4 GB/s
       Opteron HyperTransport memory bus           128 bits     200 MHz DDR    6.4 GB/s
       Pentium 4 "800 MHz" FSB                     64 bits      200 MHz QDR    6.4 GB/s
       PC2 6400 (DDR-II 800) SDRAM                 64 bits      400 MHz DDR    6.4 GB/s
       PC2 5300 (DDR-II 667) SDRAM                 64 bits      333 MHz DDR    5.3 GB/s
       Pentium 4 "533 MHz" FSB                     64 bits      133 MHz QDR    4.3 GB/s

     FSB - Front-Side Bus; DDR - Double Data Rate; SDRAM - Synchronous DRAM
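     The 24 GB/s demand figure is a simple product; the sketch below redoes the arithmetic and compares it against a 6.4 GB/s bus from the table, assuming (as the slide does) one instruction completed per cycle.

       #include <stdio.h>

       int main(void) {
           double clock_hz    = 3e9;  /* 3 GHz CPU, assumed one instruction per cycle */
           double instr_bytes = 4.0;  /* 4 B instruction fetch per instruction        */
           double data_bytes  = 4.0;  /* at least one 4 B data access per instruction */

           double demand = clock_hz * (instr_bytes + data_bytes);  /* bytes/second  */
           double bus    = 6.4e9;      /* peak of a 6.4 GB/s bus from the table     */

           printf("CPU demand: %.1f GB/s\n", demand / 1e9);        /* 24.0 GB/s     */
           printf("Bus supply: %.1f GB/s (sequential burst peak)\n", bus / 1e9);
           printf("Shortfall:  %.1fx\n", demand / bus);             /* ~3.8x         */
           return 0;
       }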

  8. Need for Large Memory
     • Small memories are fast, so just write small programs?
       • "640 K of memory should be enough for anybody" -- Bill Gates, 1981
     • Today's programs require large memories
       • Database applications may require gigabytes of memory

  9. The Goal: Illusion of a Large, Fast, Cheap Memory
     • How do we create a memory that is large, cheap, and fast (most of the time)?
     • Strategy: provide a small, fast memory that holds a subset of main memory, called a cache
       • Keep frequently accessed locations in the fast cache
       • Have the cache retrieve more than one word at a time
       • Sequential accesses are faster after the first access

  10. Memory Hierarchy
     • Hierarchy of levels
       • Uses smaller and faster memory technologies close to the processor
       • Fast access time in the highest level of the hierarchy
       • Cheap, slow memory furthest from the processor
     • The aim of memory hierarchy design is an access time close to that of the highest level with a size equal to that of the lowest level

  11. Memory Hierarchy Pyramid
     [Diagram: pyramid with the processor (CPU) at the top, connected by the datapath/bus to Level 1, Level 2, Level 3, ..., Level n. Moving down the pyramid, distance from the CPU increases, access time (memory latency) increases, cost per MB decreases, and the size of memory at each level increases]

  12. Basic Philosophy
     • Move data into the 'smaller, faster' memory
     • Operate on it
     • Move it back to the 'larger, cheaper' memory
       • How do we keep track of whether it changed? (see the sketch below)
     • What if we run out of space in the 'smaller, faster' memory?
     • Important concepts: latency, bandwidth
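     One common answer to "how do we keep track of whether it changed" is a per-block dirty bit, checked when the block is evicted. The C sketch below is a hypothetical illustration of that bookkeeping, not the organization of any particular cache; the names (cache_block, needs_writeback) and the 32 B block size are made up for illustration.

       #include <stdbool.h>
       #include <stdint.h>

       #define BLOCK_BYTES 32            /* assumed block size for illustration     */

       /* Hypothetical bookkeeping for one cache block. */
       typedef struct {
           bool     valid;               /* block holds real data                   */
           bool     dirty;               /* block was written since it was fetched  */
           uint32_t tag;                 /* which memory block this is a copy of    */
           uint8_t  data[BLOCK_BYTES];   /* the cached bytes themselves             */
       } cache_block;

       /* On a write hit, update the copy in fast memory and mark it dirty. */
       void write_byte(cache_block *b, int offset, uint8_t value) {
           b->data[offset] = value;
           b->dirty = true;
       }

       /* When the block is evicted, a dirty block must be copied back to the
          larger, cheaper memory; a clean block can simply be dropped. */
       bool needs_writeback(const cache_block *b) {
           return b->valid && b->dirty;
       }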

  13. Typical Hierarchy
     [Diagram: CPU (registers) <-> cache in 8 B transfers, cache <-> main memory in 32 B blocks (cache/MM interface), main memory <-> disk in 4 KB pages (virtual memory)]
     • Notice that the data width changes at each level. Why?
     • Bandwidth: transfer rate between the various levels
       • CPU-Cache: 24 GB/s
       • Cache-Main memory: 0.5-6.4 GB/s
       • Main memory-Disk: 187 MB/s (Serial ATA/1500)

  14. Why Large Blocks?
     • Fetch large blocks at a time to take advantage of spatial locality

       for (i = 0; i < length; i++)
           sum += array[i];

     • array has spatial locality
     • sum has temporal locality
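     The payoff from large blocks depends on the access pattern. The sketch below (illustrative function names, assuming a matrix larger than the cache) contrasts a traversal that uses every byte of each fetched block with one that touches a different block almost every iteration.

       #define N 1024

       /* Row-major traversal: consecutive elements share a cache block, so one
          fetched block serves several iterations (spatial locality). */
       double sum_row_major(double a[N][N]) {
           double sum = 0.0;        /* sum stays in a register: temporal locality */
           for (int i = 0; i < N; i++)
               for (int j = 0; j < N; j++)
                   sum += a[i][j];
           return sum;
       }

       /* Column-major traversal of the same data jumps N*8 bytes per access,
          touching a new block nearly every iteration, so large blocks no longer help. */
       double sum_col_major(double a[N][N]) {
           double sum = 0.0;
           for (int j = 0; j < N; j++)
               for (int i = 0; i < N; i++)
                   sum += a[i][j];
           return sum;
       }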

  15. Why the Hierarchy Works: Natural Locality
     • The Principle of Locality: programs access a relatively small portion of the address space at any instant
     [Plot: probability of reference vs. memory address (0 to 2^n - 1); references are concentrated around a few addresses rather than spread uniformly]
     • Temporal locality (locality in time): recently accessed data tend to be referenced again soon
     • Spatial locality (locality in space): items near those recently accessed tend to be referenced soon

  16. Taking Advantage of Locality
     • Memory hierarchy
     • Store everything on disk
     • Copy recently accessed (and nearby) items from disk to a smaller DRAM memory
       • Main memory
     • Copy more recently accessed (and nearby) items from DRAM to a smaller SRAM memory
       • Cache memory attached to the CPU

  17. Principle of Locality
     • Programs access a small proportion of their address space at any time
     • Temporal locality
       • Items accessed recently are likely to be accessed again soon
       • e.g., instructions in a loop, induction variables
     • Spatial locality
       • Items near those accessed recently are likely to be accessed soon
       • e.g., sequential instruction access, array data
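     As a concrete illustration of both kinds, the short kernel below (a hypothetical histogram loop, not taken from the slides) is annotated with where each form of locality appears.

       #include <stddef.h>

       void histogram(const unsigned char *pixels, size_t n, unsigned counts[256]) {
           for (size_t i = 0; i < n; i++) {   /* the loop body's instructions are
                                                 re-executed every iteration:
                                                 temporal locality in the code    */
               counts[pixels[i]]++;           /* pixels[i]: sequential array data,
                                                 spatial locality; counts[]: reused
                                                 entries give temporal locality
                                                 in the data                      */
           }
           /* i and n live in registers: the induction variable is the extreme
              case of temporal locality. */
       }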

  18. Memory Hierarchy: Terminology
     • Hit: the data appears in some block X in the upper level
       • Hit rate: the fraction of memory accesses found in the upper level
       • Hit time: time to access the upper level, i.e., time to determine hit/miss + upper-level access time
     • Miss: the data must be retrieved from a block Y in the lower level
       • Miss rate = 1 - hit rate
       • Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor
     • Note: hit time << miss penalty
     [Diagram: the processor exchanges data with the upper level (block X); on a miss, block Y is brought from the lower level into the upper level]
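     These terms combine into the standard average memory access time expression, AMAT = hit time + miss rate x miss penalty (not written out on this slide). The sketch below plugs in illustrative numbers roughly matching the L1 and main-memory figures on the next slide; the 5% miss rate is an assumption.

       #include <stdio.h>

       int main(void) {
           double hit_time     = 2.0;    /* ns: determine hit/miss + upper-level access  */
           double miss_penalty = 100.0;  /* ns: replace block + deliver it (lower level) */
           double miss_rate    = 0.05;   /* assumed 5% miss rate (hit rate = 95%)        */

           /* AMAT = hit time + miss rate * miss penalty */
           double amat = hit_time + miss_rate * miss_penalty;
           printf("Average memory access time = %.1f ns\n", amat);  /* 7.0 ns */
           return 0;
       }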

  19. Current Memory Hierarchy
     [Diagram: processor (control, datapath, registers) -> L1 cache -> L2 cache -> main memory -> secondary memory (disk)]

       Level         Regs      L1 Cache   L2 Cache   Main Memory   Secondary Memory
       Speed (ns)    1         2          6          100           10,000,000
       Size (MB)     0.0005    0.1        1-4        1,000-6,000   500,000
       Cost ($/MB)   --        $10        $3         $0.01         $0.002
       Technology    Regs      SRAM       SRAM       DRAM          Disk

     • Cache - main memory: speed
     • Main memory - disk (virtual memory): capacity

  20. How Is the Hierarchy Managed?
     • Registers <-> memory: by the compiler (or the assembly-language programmer)
     • Cache <-> main memory: by hardware
     • Main memory <-> disks: by a combination of hardware and the operating system (virtual memory)

  21. Summary
     • Computer performance is generally determined by the memory hierarchy
     • An effective hierarchy contains multiple types of memory of increasing size
     • Caches (our first target) hold a subset of main memory
       • Caches have their own special architecture
       • Generally made from SRAM
       • Located close to the processor
     • Main memory and disks
       • Bulkier storage
       • Main memory: volatile (loses data when power is removed)
       • Disk: non-volatile
