ECE232: Hardware Organization and Design, Lecture 21: Memory Hierarchy


SLIDE 1

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

ECE232: Hardware Organization and Design

Lecture 21: Memory Hierarchy

SLIDE 2

ECE232: Memory Hierarchy 2

Overview

  • Ideally, computer memory would be large and fast
  • Unfortunately, memory implementation involves tradeoffs
  • Memory Hierarchy
    • Includes caches, main memory, and disks
  • Caches
    • Small and fast
    • Contain a subset of the data in main memory
    • Generally close to the processor
  • Terminology
    • Cache blocks, hit rate, miss rate
    • More mathematical than material from earlier in the course
SLIDE 3

Recap: Machine Organization

[Diagram: a personal computer consists of the Processor/CPU (active; contains Control and Datapath), Memory (passive; where programs and data live when running), and Devices (Input and Output).]

SLIDE 4

Memory Basics

  • Users want large and fast memories
  • Fact:
    • Large memories are slow
    • Fast memories are small
  • Large memories use DRAM technology: Dynamic Random Access Memory
    • High density, low power, cheap, slow
    • Dynamic: needs to be “refreshed” regularly
    • DRAM access times are 50-70 ns at a cost of $10 to $20 per GB
    • FPM (Fast Page Mode)
  • Fast memories use SRAM: Static Random Access Memory
    • Low density, high power, expensive, fast
    • Static: content lasts “forever” (until power is lost)
    • SRAM access times are 0.5-5 ns at a cost of $400 to $1,000 per GB
SLIDE 5

Memory Technology

  • SRAM and DRAM are random access storage
    • Access time is the same for all locations (a hardware decoder is used)
  • For even larger and cheaper storage (than DRAM), use a hard drive (disk): sequential access
    • Very slow; data are accessed sequentially, access time is location dependent, and the disk is treated as I/O
    • Disk access times are 5 to 20 million ns (i.e., milliseconds) at a cost of $0.20 to $2.00 per GB

SLIDE 6

Processor-Memory Speed Gap Problem

[Plot: performance (log scale, 1 to 1000) versus year, 1980-2000. Processor performance improves ~60%/yr (2x every 1.5 years) while DRAM performance improves ~5%/yr (2x every 15 years), so the processor-DRAM performance gap grows about 50% per year: the motivation for a memory hierarchy.]

SLIDE 7

Need for speed

  • Assume the CPU runs at 3 GHz
  • Every instruction requires 4 B of instruction fetch and at least one memory access (4 B of data)
    • 3 × 8 = 24 GB/sec
  • This is the peak rate for a sequential burst of transfers (performance for random access is much slower due to latency)
  • Memory bandwidth and access time are a performance bottleneck

| Interface                               | Width      | Frequency   | Bytes/Sec |
|-----------------------------------------|------------|-------------|-----------|
| 4-way interleaved PC1600 (DDR200) SDRAM | 4 × 64 bits | 100 MHz DDR | 6.4 GB/s  |
| Opteron HyperTransport memory bus       | 128 bits   | 200 MHz DDR | 6.4 GB/s  |
| Pentium 4 "800 MHz" FSB                 | 64 bits    | 200 MHz QDR | 6.4 GB/s  |
| PC2 6400 (DDR-II 800) SDRAM             | 64 bits    | 400 MHz DDR | 6.4 GB/s  |
| PC2 5300 (DDR-II 667) SDRAM             | 64 bits    | 333 MHz DDR | 5.3 GB/s  |
| Pentium 4 "533 MHz" FSB                 | 64 bits    | 133 MHz QDR | 4.3 GB/s  |

FSB: Front-Side Bus; DDR: Double Data Rate; QDR: Quad Data Rate; SDRAM: Synchronous DRAM

SLIDE 8

Need for Large Memory

  • Small memories are fast
    • So just write small programs?
  • “640 K of memory should be enough for anybody” - Bill Gates, 1981
  • Today’s programs require large memories
    • Database applications may require gigabytes of memory
SLIDE 9

The Goal: Illusion of large, fast, cheap memory

  • How do we create a memory that is large, cheap, and fast (most of the time)?
  • Strategy: provide a small, fast memory that holds a subset of the main memory, called a cache
    • Keep frequently accessed locations in the fast cache
    • The cache retrieves more than one word at a time
    • Sequential accesses are faster after the first access
SLIDE 10

Memory Hierarchy

  • Hierarchy of levels
    • Uses smaller and faster memory technologies close to the processor
    • Fast access time in the highest level of the hierarchy
    • Cheap, slow memory furthest from the processor
  • The aim of memory hierarchy design is an access time close to that of the highest level with a size equal to that of the lowest level

SLIDE 11

Memory Hierarchy Pyramid

[Pyramid diagram: the processor (CPU) sits at the apex; Levels 1, 2, 3, ..., n grow in size toward the base and are connected by a transfer datapath (bus). Increasing distance from the CPU means decreasing cost per MB; decreasing distance means decreasing access time (memory latency).]

SLIDE 12

Basic Philosophy

  • Move data into ‘smaller, faster’ memory
  • Operate on it
  • Move it back to ‘larger, cheaper’ memory
    • How do we keep track of whether it changed?
    • What if we run out of space in the ‘smaller, faster’ memory?
  • Important concepts: latency, bandwidth
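One common way hardware keeps track of whether a block changed is a per-block dirty bit, set on writes and checked at eviction. The slides don't show an implementation, so this is a hypothetical sketch (the struct and field names are illustrative, not from the course):

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_BYTES 32

/* Illustrative cache-block bookkeeping: 'valid' says the block holds real
 * data, 'dirty' says it was written and must be copied back before eviction. */
struct cache_block {
    bool     valid;
    bool     dirty;
    uint32_t tag;                 /* identifies which memory block is cached */
    uint8_t  data[BLOCK_BYTES];
};

/* On a store, mark the block dirty so eviction knows to write it back. */
void cache_write(struct cache_block *b, unsigned offset, uint8_t byte) {
    b->data[offset % BLOCK_BYTES] = byte;
    b->dirty = true;
}

/* On eviction, only valid, dirty blocks cost a write back to the
 * 'larger, cheaper' level; clean blocks can simply be dropped. */
bool needs_writeback(const struct cache_block *b) {
    return b->valid && b->dirty;
}
```

Running out of space in the small memory then means choosing a victim block, writing it back if `needs_writeback` is true, and reusing its slot.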
SLIDE 13

Typical Hierarchy

  • Notice that the data width changes between levels. Why?
  • Bandwidth: transfer rate between the various levels
    • CPU-Cache: 24 GB/s
    • Cache-Main: 0.5-6.4 GB/s
    • Main-Disk: 187 MB/s (Serial ATA/1500)

[Diagram: CPU registers <-> cache in 8 B transfers; cache <-> main memory in 32 B transfers; main memory <-> disk in 4 KB transfers, managed by the cache/main-memory boundary and virtual memory respectively.]

SLIDE 14

Why large blocks?

  • Fetch large blocks at a time
  • Take advantage of spatial locality

for (i=0; i < length; i++) sum += array[i];

  • array has spatial locality
  • sum has temporal locality
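The loop above can be pushed further: in C's row-major layout, traversal order determines how much spatial locality the array accesses have. A sketch contrasting the two orders (the array size is arbitrary):

```c
#define N 512

static double a[N][N];

/* Row-major order: a[i][j] and a[i][j+1] are adjacent in memory, so each
 * fetched cache block supplies several useful elements in a row. */
double sum_rowwise(void) {
    double sum = 0.0;               /* sum: temporal locality, reused every iteration */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];         /* a: spatial locality, sequential addresses */
    return sum;
}

/* Column order strides N * sizeof(double) bytes between accesses, touching
 * a different cache block almost every time: same result, worse locality. */
double sum_colwise(void) {
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}
```

Both functions compute the same sum; only the memory access pattern (and hence the cache behavior) differs.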
SLIDE 15

Why Hierarchy works: Natural Locality

  • The Principle of Locality
    • Programs access a relatively small portion of the address space at any instant

[Plot: probability of reference versus memory address (0 to 2^n - 1); references cluster in a few small regions of the address space.]

  • Temporal Locality (locality in time): recently accessed data tend to be referenced again soon
  • Spatial Locality (locality in space): nearby items tend to be referenced soon

SLIDE 16

Taking Advantage of Locality

  • Memory hierarchy
    • Store everything on disk
    • Copy recently accessed (and nearby) items from disk to a smaller DRAM memory: the main memory
    • Copy more recently accessed (and nearby) items from DRAM to a smaller SRAM memory: the cache attached to the CPU
SLIDE 17

Principle of Locality

  • Programs access a small proportion of their address space at any time
  • Temporal locality
    • Items accessed recently are likely to be accessed again soon
    • e.g., instructions in a loop, induction variables
  • Spatial locality
    • Items near those accessed recently are likely to be accessed soon
    • e.g., sequential instruction access, array data
SLIDE 18

Memory Hierarchy: Terminology

  • Hit: the data appears in the upper level (block X)
    • Hit Rate: the fraction of memory accesses found in the upper level
  • Miss: the data must be retrieved from a block in the lower level (block Y)
    • Miss Rate = 1 - Hit Rate
  • Hit Time: time to access the upper level, i.e., time to determine hit/miss + upper-level access time
  • Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
  • Note: Hit Time << Miss Penalty

[Diagram: on a hit, block X in the upper level is delivered to the processor; on a miss, block Y is brought from the lower level into the upper level.]

SLIDE 19

Current Memory Hierarchy

|             | Regs   | L1 Cache | L2 Cache | Main Memory | Secondary Memory (Disk) |
|-------------|--------|----------|----------|-------------|-------------------------|
| Technology  | Regs   | SRAM     | SRAM     | DRAM        | Disk                    |
| Speed (ns)  | 1      | 2        | 6        | 100         | 10,000,000              |
| Size (MB)   | 0.0005 | 0.1      | 1-4      | 1000-6000   | 500,000                 |
| Cost ($/MB) |        | $10      | $3       | $0.01       | $0.002                  |

  • Cache - Main memory: Speed
  • Main memory – Disk (virtual memory): Capacity
SLIDE 20

How is the hierarchy managed?

  • Registers <-> Memory
    • By the compiler (or assembly language programmer)
  • Cache <-> Main Memory
    • By hardware
  • Main Memory <-> Disks
    • By a combination of hardware and the operating system (virtual memory)


SLIDE 21

Summary

  • Computer performance is largely determined by the memory hierarchy
  • An effective hierarchy contains multiple types of memories of increasing size
  • Caches (our first target) hold a subset of main memory
    • Caches have their own special architecture
    • Generally made from SRAM
    • Located close to the processor
  • Main memory and disks
    • Bulkier storage
    • Main memory: volatile (loses data when power is removed)
    • Disk: non-volatile