Cache Memory Chapter 17 S. Dandamudi

Outline
• Introduction
• How cache memory works
• Why cache memory works
• Cache design basics
• Mapping function
  ∗ Direct mapping
  ∗ Associative mapping
  ∗ Set-associative mapping
• Replacement policies
• Write policies
• Space overhead
• Types of cache misses
• Types of caches
• Example implementations
  ∗ Pentium
  ∗ PowerPC
  ∗ MIPS
• Cache operation summary
• Design issues
  ∗ Cache capacity
  ∗ Cache line size
  ∗ Degree of associativity

2003 S. Dandamudi. To be used with S. Dandamudi, “Fundamentals of Computer Organization and Design,” Springer, 2003.

Introduction
• Memory hierarchy
  ∗ Registers
  ∗ Memory
  ∗ Disk
  ∗ …
• Cache memory is a small amount of fast memory
  ∗ Placed between two levels of memory hierarchy
    » To bridge the gap in access times
      – Between processor and main memory (our focus)
      – Between main memory and disk (disk cache)
  ∗ Expected to behave like a large amount of fast memory

How Cache Memory Works
• Prefetch data into cache before the processor needs it
  ∗ Need to predict the processor's future access requirements
    » Not difficult owing to locality of reference
• Important terms
  ∗ Miss penalty
  ∗ Hit ratio
  ∗ Miss ratio = (1 – hit ratio)
  ∗ Hit time
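The four terms above are commonly combined into a single estimate of average access time: hit time plus miss ratio times miss penalty. The slide does not state this relation; it is the standard textbook formula, and the function name and parameter order below are illustrative assumptions.

```c
#include <assert.h>

/* Average access time = hit_time + miss_ratio * miss_penalty.
   All times must use the same unit (for example, clock cycles).
   Hypothetical helper, not taken from the slides. */
double average_access_time(double hit_time, double hit_ratio,
                           double miss_penalty)
{
    assert(hit_ratio >= 0.0 && hit_ratio <= 1.0);

    double miss_ratio = 1.0 - hit_ratio;   /* as defined in the term list */
    return hit_time + miss_ratio * miss_penalty;
}
```

With a hit ratio of 1.0 the result is just the hit time; every drop in hit ratio adds a proportional share of the miss penalty.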

How Cache Memory Works (cont’d)
[Figure: Cache read operation]

How Cache Memory Works (cont’d)
[Figure: Cache write operation]

Why Cache Memory Works
• Example
    for (i=0; i<M; i++)
        for (j=0; j<N; j++)
            X[i][j] = X[i][j] + K;
  ∗ Each element of X is double (eight bytes)
  ∗ Loop body is executed (M * N) times
    » Placing the code in cache avoids access to main memory
      – Repetitive use (one of the factors)
      – Temporal locality
    » Prefetching data
      – Spatial locality

How Cache Memory Works (cont’d)
[Figure: Execution time (ms) versus matrix size (500 to 1000) for column-order and row-order access]
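The two access orders compared in the plot can be sketched as follows. This is a minimal illustration, not the code used to produce the measurements: the function names are hypothetical, and the matrix is assumed to be stored as a flat row-major array, which is how C lays out two-dimensional arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Row order: the inner loop walks consecutive addresses, so neighbouring
   elements share a cache line (spatial locality). */
void add_row_order(double *x, size_t m, size_t n, double k)
{
    assert(x != NULL);
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            x[i * n + j] += k;
}

/* Column order: consecutive accesses are n * sizeof(double) bytes apart,
   so each access may touch a different cache line. */
void add_column_order(double *x, size_t m, size_t n, double k)
{
    assert(x != NULL);
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            x[i * n + j] += k;
}
```

Both functions compute the same result; only the order in which memory is touched differs, which is what the execution-time comparison isolates.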

Cache Design Basics
• On every read miss
  ∗ A fixed number of bytes are transferred
    » More than what the processor needs
      – Effective due to spatial locality
• Cache is divided into blocks of B bytes
    » b bits are needed as offset into the block: b = log₂ B
    » Blocks are called cache lines
• Main memory is also divided into blocks of same size
  ∗ Address is divided into two parts: block number and offset

Cache Design Basics (cont’d)
[Figure: example with B = 4 bytes, b = 2 bits]
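The split of an address into a block number and a b-bit offset can be sketched in a few lines. The function names and the 32-bit address width are assumptions made for illustration; the block size B is assumed to be a power of two, as b = log₂ B requires.

```c
#include <assert.h>
#include <stdint.h>

/* Number of offset bits b = log2(B) for a power-of-two block size B. */
unsigned offset_bits(uint32_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    unsigned b = 0;
    while ((block_size >> b) != 1)
        b++;
    return b;
}

/* Low-order b bits of the address: byte offset within the block. */
uint32_t block_offset(uint32_t addr, uint32_t block_size)
{
    return addr & (block_size - 1);
}

/* Remaining high-order bits: the block number. */
uint32_t block_number(uint32_t addr, uint32_t block_size)
{
    return addr >> offset_bits(block_size);
}
```

For B = 4, address 30 splits into block number 7 and offset 2, since 30 = 7 × 4 + 2.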

Cache Design Basics (cont’d)
• Transfer between main memory and cache
  ∗ In units of blocks
  ∗ Exploits spatial locality
• Transfer between cache and processor
  ∗ In units of words
• Need policies for
  ∗ Block placement
  ∗ Mapping function
  ∗ Block replacement
  ∗ Write policies

Cache Design Basics (cont’d)
[Figure: Read cycle operations]

Mapping Function
• Determines how memory blocks are mapped to cache lines
• Three types
  ∗ Direct mapping
    » Specifies a single cache line for each memory block
  ∗ Set-associative mapping
    » Specifies a set of cache lines for each memory block
  ∗ Associative mapping
    » No restrictions
      – Any cache line can be used for any memory block

Mapping Function (cont’d)
[Figure: Direct mapping example]

Mapping Function (cont’d)
• Implementing direct mapping
  ∗ Easier than the other two
  ∗ Maintains three pieces of information
    » Cache data
      – Actual data
    » Cache tag
      – Problem: More memory blocks than cache lines
        ⇒ Several memory blocks are mapped to a cache line
      – Tag stores the address of memory block in cache line
    » Valid bit
      – Indicates if cache line contains a valid block

Mapping Function (cont’d)
Direct mapping
Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4
Hit ratio = 0%

Mapping Function (cont’d)
Direct mapping
Reference pattern: 0, 7, 9, 10, 0, 7, 9, 10, 0, 7, 9, 10
Hit ratio = 67%
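The two direct-mapping hit ratios can be checked with a small simulation. This is a sketch under stated assumptions: the cache is taken to have four lines (the figure that fixed the size is not available, but four lines is consistent with both quoted ratios), the reference patterns are read as memory block numbers, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LINES 4   /* assumed cache size in lines; not stated in the text */

struct cache_line {
    bool     valid;   /* does the line currently hold a block? */
    unsigned tag;     /* identifies which memory block is resident */
};

/* Count hits for a sequence of block references in a direct-mapped cache. */
size_t direct_mapped_hits(const unsigned *blocks, size_t count)
{
    struct cache_line cache[NUM_LINES] = {0};
    size_t hits = 0;

    assert(blocks != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        unsigned line = blocks[i] % NUM_LINES;  /* the single candidate line */
        unsigned tag  = blocks[i] / NUM_LINES;

        if (cache[line].valid && cache[line].tag == tag) {
            hits++;
        } else {
            /* Miss: no replacement choice, overwrite the mapped line. */
            cache[line].valid = true;
            cache[line].tag = tag;
        }
    }
    return hits;
}
```

Under these assumptions blocks 0, 4, and 8 all compete for line 0, so the first pattern never hits, while blocks 0, 7, 9, and 10 occupy four different lines and the second pattern hits on 8 of its 12 references.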

Mapping Function (cont’d)
[Figure: Associative mapping]

Mapping Function (cont’d)
Associative mapping
Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4
Hit ratio = 75%

Mapping Function (cont’d)
[Figure: Address match logic for associative mapping]

Mapping Function (cont’d)
[Figure: Associative cache with address match logic]

Mapping Function (cont’d)
[Figure: Set-associative mapping]

Mapping Function (cont’d)
[Figure: Address partition in set-associative mapping]

Mapping Function (cont’d)
Set-associative mapping
Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4
Hit ratio = 67%
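A single set-associative simulation covers all three mapping functions, since direct mapping is the one-way case and associative mapping is the case where one set holds every line. The sketch below assumes a four-line cache, LRU replacement, and reference patterns given as block numbers; none of these are stated in the text, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINES 64   /* arbitrary upper bound for this sketch */

struct way {
    bool     valid;
    unsigned block;      /* resident block number (serves as the tag) */
    size_t   last_used;  /* index of the most recent reference */
};

/* Count hits for a W-way set-associative cache with LRU replacement. */
size_t set_assoc_hits(const unsigned *blocks, size_t count,
                      unsigned num_lines, unsigned ways)
{
    struct way cache[MAX_LINES] = {0};
    unsigned num_sets;
    size_t hits = 0;

    assert(ways > 0 && num_lines >= ways && num_lines <= MAX_LINES);
    assert(num_lines % ways == 0);
    num_sets = num_lines / ways;

    for (size_t i = 0; i < count; i++) {
        struct way *set = &cache[(blocks[i] % num_sets) * ways];
        struct way *victim = &set[0];
        bool hit = false;

        for (unsigned w = 0; w < ways; w++) {
            if (set[w].valid && set[w].block == blocks[i]) {
                victim = &set[w];   /* reuse the matching way */
                hit = true;
                break;
            }
            /* Prefer an empty way; otherwise the least recently used one. */
            if (victim->valid &&
                (!set[w].valid || set[w].last_used < victim->last_used))
                victim = &set[w];
        }
        if (hit)
            hits++;
        victim->valid = true;
        victim->block = blocks[i];
        victim->last_used = i;
    }
    return hits;
}
```

For the pattern 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4 with four lines, this gives 8 hits with two ways (67%), 9 hits with four ways (75%, the associative case), and no hits with one way (the direct-mapped case).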

Replacement Policies
• We invoke the replacement policy
  ∗ When there is no place in cache to load the memory block
• Depends on the actual placement policy in effect
  ∗ Direct mapping does not need a special replacement policy
    » Replace the mapped cache line
  ∗ Several policies for the other two mapping functions
    » Popular: LRU (least recently used)
    » Random replacement
    » Of less interest: FIFO, LFU

Replacement Policies (cont’d)
• LRU
  ∗ Expensive to implement
    » Particularly for set sizes more than four
• Implementations resort to approximation
  ∗ Pseudo-LRU
    » Partitions each set into two groups
      – Tracks which group has been accessed more recently
      – Requires only one bit
    » Applied recursively within each group, requires only (W – 1) bits (W = degree of associativity)
      – PowerPC is an example
        ⇒ Details later

Replacement Policies (cont’d)
[Figure: Pseudo-LRU implementation]
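One common way to realize the two-group idea is a binary tree of bits, sketched here for a four-way set, where W – 1 = 3 bits suffice. This is an illustrative assumption rather than the scheme in the figure or the PowerPC design; the bit assignments and function names are hypothetical.

```c
#include <assert.h>

/* Pseudo-LRU state for one 4-way set, held in the low 3 bits.
   bit 0: set if the pair (ways 2, 3) was accessed more recently
          than the pair (ways 0, 1)
   bit 1: set if way 1 was accessed more recently than way 0
   bit 2: set if way 3 was accessed more recently than way 2 */
typedef unsigned plru4_t;

/* Record an access to the given way and return the updated state. */
plru4_t plru4_touch(plru4_t state, unsigned way)
{
    assert(way < 4);
    if (way < 2) {
        state &= ~1u;                                   /* lower pair is recent */
        state = (way == 1) ? (state | 2u) : (state & ~2u);
    } else {
        state |= 1u;                                    /* upper pair is recent */
        state = (way == 3) ? (state | 4u) : (state & ~4u);
    }
    return state;
}

/* Pick the way to replace: the older way within the older pair. */
unsigned plru4_victim(plru4_t state)
{
    if (state & 1u)
        return (state & 2u) ? 0 : 1;
    return (state & 4u) ? 2 : 3;
}
```

The result is only an approximation of LRU. After accesses to ways 0, 1, 2, 3, 0 in that order, true LRU would replace way 1, whereas this sketch selects way 2, because the bits remember only which pair was touched last and which way within each pair.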
