SLIDE 1
IC220: Caching 1 (Chapter 5)
SLIDE 2
Memory, Cost, and Performance
- Ideal world: we want a memory that is
  – Fast, – Big, & – Cheap! (Choose any two!)
- Recent "real world" situation:
  – SRAM: access time 0.5–2.5 ns, at a cost of $500 to $1000 per GB
  – DRAM: access time 50–70 ns, at a cost of $10 to $20 per GB
  – Flash: access time 5,000–50,000 ns, at a cost of $0.75 to $1 per GB
  – Disk: access time 5–20 million ns, at a cost of $0.05 to $0.10 per GB
SLIDE 3
Caching Concepts and Terminology
- Locality: Temporal and Spatial
- Each access: Hit or Miss
- Eviction strategies: Random or Least-Recently Used (LRU)
- Reasons for a Miss: Compulsory, Capacity, or Conflict
- Measurements: Miss Rate, Hit Time, Miss Penalty
- Cache types: Fully Associative, Direct-Mapped, or Set Associative
- Parameters: N (total size), B (size of block), k (associativity)
- Write strategies: Write-through or Write-back
- Implementation details: Stall, Valid bit, Dirty bit, Tag
SLIDE 4
Principle of Locality
- Basic observations on how memory tends to be accessed in computer programs:
- If an item is referenced,
  1. it will tend to be referenced again soon (TEMPORAL LOCALITY)
  2. nearby items will tend to be referenced soon (SPATIAL LOCALITY)
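As a concrete illustration (a hypothetical example, not from the slides), even a simple array-summing loop exhibits both kinds of locality:

```python
def sum_array(a):
    """Sum a list; the access pattern shows both kinds of locality."""
    total = 0           # 'total' is touched every iteration: temporal locality
    for x in a:         # elements are read in adjacent order: spatial locality
        total += x
    return total
```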
SLIDE 5
Caching Basics
- Cache consists of N bytes (N = 8 in our examples here)
- To read or write to a given address:
  1. First look in cache.
  2. If it's there, we have a HIT.
  3. Otherwise, it's a MISS and you must fetch from main memory.
     – If cache is full, must EVICT a line to insert the new data.
- Which cache line should be evicted?
  – Random
  – Least-Recently Used (LRU)
  (our examples will always follow LRU)
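The lookup-then-evict procedure above can be sketched as a tiny simulator; `LRUCache` and its `access` method are hypothetical names for illustration, not any real library's API:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal sketch of the slide's lookup procedure: N lines, LRU eviction."""
    def __init__(self, n_lines):
        self.n = n_lines
        self.lines = OrderedDict()          # address -> data, oldest first

    def access(self, addr, memory):
        if addr in self.lines:              # steps 1-2: look in cache -> HIT
            self.lines.move_to_end(addr)    # mark as most recently used
            return ("hit", self.lines[addr])
        if len(self.lines) == self.n:       # cache full: EVICT the LRU line
            self.lines.popitem(last=False)
        self.lines[addr] = memory[addr]     # step 3: MISS, fetch from memory
        return ("miss", self.lines[addr])
```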
SLIDE 6
Example 1 – Fully associative (no blocking)
Cache (N = 8): eight lines, each holding one (Address, Data) pair.
Memory contents (Address: Data):
  16: 67, 17: 3, 18: 27, 19: 32, ...,
  42: 78, 43: 59, 44: 24, 45: 56, 46: 87, 47: 36, 48: 98, 49: 59
Processor access sequence:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18
Total hits? Total misses?
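One way to check your answer is a small fully associative LRU simulator (a sketch; `run_trace` is a hypothetical helper name). For the slide's sequence with 8 one-byte lines it counts 3 hits and 10 misses:

```python
from collections import OrderedDict

def run_trace(trace, n_lines):
    """Fully associative, one address per line (no blocking), LRU eviction."""
    cache, hits = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # refresh recency on a hit
        else:
            if len(cache) == n_lines:
                cache.popitem(last=False)  # evict the least-recently used line
            cache[addr] = True
    return hits, len(trace) - hits

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
```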
SLIDE 7
Analysis of Example 1 (FA, no blocking)
- Miss Rate = (# of misses) / (# of accesses)
- Reasons for a miss:
  – Compulsory miss (first time accessing)
  – Capacity miss (not enough room, got evicted)
- What was good:
- What was bad:
- Measurement concepts:
  – Hit Time: how long to look up something that is in cache
  – Miss Penalty: how long to fetch something not in cache
SLIDE 8
How to handle a miss?
1. Stall the CPU until the miss completes
2. (If cache is full) Evict old data
3. Fetch the needed data from memory
4. Restart the CPU
Time for this is called the Miss Penalty.
SLIDE 9
Blocking
- Goal: Exploit spatial locality
- Main idea:
  – Group memory into blocks
  – B bytes in each block of memory
  – B bytes in each cache line
  – Always fetch and evict entire blocks (even if not all data was requested yet)
  – Position within cache line determined by Byte Offset: Address mod B
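The block number and byte offset above reduce to simple integer arithmetic (a sketch; function names are hypothetical, and B is assumed to divide the address space evenly):

```python
def block_number(addr, B):
    """Which B-byte block an address falls in (blocks are fetched/evicted whole)."""
    return addr // B

def byte_offset(addr, B):
    """Position of the address within its cache line."""
    return addr % B
```

For example, with B = 2, addresses 42 and 43 fall in the same block (21) at offsets 0 and 1, so fetching either one brings in both.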
SLIDE 10
Example 2 – Fully associative with blocking
Cache (N = 8, B = 2): four lines, each holding one two-byte block (Offset 0, Offset 1).
Memory contents, grouped into two-byte blocks (Address: Data):
  [16: 67, 17: 3]  [18: 27, 19: 32]  ...
  [42: 78, 43: 59]  [44: 24, 45: 56]  [46: 87, 47: 36]  [48: 98, 49: 59]
Processor access sequence:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18
Total hits? Total misses?
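Extending the fully associative simulation to operate on block numbers (a sketch; `run_trace_blocked` is a hypothetical name) shows how blocking exploits spatial locality: the same trace now counts 8 hits and 5 misses, since neighboring addresses like 42 and 43 share a block:

```python
from collections import OrderedDict

def run_trace_blocked(trace, N, B):
    """Fully associative with blocking: N-byte cache, B-byte blocks, LRU."""
    n_lines = N // B
    cache, hits = OrderedDict(), 0
    for addr in trace:
        blk = addr // B                    # whole blocks are fetched and evicted
        if blk in cache:
            hits += 1
            cache.move_to_end(blk)
        else:
            if len(cache) == n_lines:
                cache.popitem(last=False)  # evict least-recently used block
            cache[blk] = True
    return hits, len(trace) - hits

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
```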
SLIDE 11
Analysis of Example 2 (FA with blocking)
- Miss Rate = (# of misses) / (# of accesses)
- Advantages:
- Disadvantages:
- Measurement concepts:
  – Hit Time: how long to look up something that is in cache
  – Miss Penalty: how long to fetch something not in cache
SLIDE 12
How big should the blocks be?
- Keeping cache size N fixed,
  – Smaller B means:
  – Larger B means:
- Increasing block size tends to decrease miss rate, up to a point:
SLIDE 13
Improving Hit Time with Direct Mapping
- Problem: How to determine whether a block is in cache?
- Fully associative (previous examples):
  – Requested block could be anywhere in cache
  – Must search through all cache lines, or keep an extra data structure
- Direct Mapped Cache:
  – Assign index 0 through (N/B - 1) to each cache line
  – Each memory block is assigned one possible index
  – Formula: Index = (Address / B) mod (N/B)
SLIDE 14
Example 3 – Direct-Mapped
Cache (N = 8, B = 2): four lines with indices 0–3, each holding one two-byte block (Offset 0, Offset 1).
Memory contents, grouped into two-byte blocks (Address: Data):
  [16: 67, 17: 3]  [18: 27, 19: 32]  ...
  [42: 78, 43: 59]  [44: 24, 45: 56]  [46: 87, 47: 36]  [48: 98, 49: 59]
Processor access sequence:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18
Total hits? Total misses?
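A direct-mapped version of the simulator (a sketch; `run_trace_dm` is a hypothetical name) counts 4 hits and 9 misses on the same trace. Note that blocks 9 (addresses 18–19) and 21 (addresses 42–43) both map to index 1, so they repeatedly knock each other out: conflict misses.

```python
def run_trace_dm(trace, N, B):
    """Direct-mapped: each block maps to exactly one line, index = block % (N/B)."""
    n_lines = N // B
    lines = [None] * n_lines          # stored block number per cache line
    hits = 0
    for addr in trace:
        blk = addr // B
        idx = blk % n_lines           # the block's one possible index
        if lines[idx] == blk:
            hits += 1
        else:
            lines[idx] = blk          # replace whatever was there (possible conflict)
    return hits, len(trace) - hits

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
```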
SLIDE 15
Analysis of Example 3 (Direct mapped)
- Miss Rate = (# of misses) / (# of accesses)
- Reasons for a miss:
  – Compulsory miss (first time accessing)
  – Capacity miss (not enough room, got evicted)
  – Conflict miss (same index, would have enough room in FA)
- Advantages:
- Disadvantages:
- Measurement concepts:
  – Hit Time: how long to look up something that is in cache
  – Miss Penalty: how long to fetch something not in cache
SLIDE 16
Compromise: Set-Associative
- Goal: Combine low miss rate of FA with good hit time of DM
- Fully associative (FA):
  – Requested block could be anywhere in cache
- Direct Mapped (DM):
  – Assign index 0 through (N/B - 1) to each cache line
  – Each memory block is assigned one possible index
- k-way Set Associative Cache:
  – Group cache lines into "sets" of k lines each
  – Each set has a DM index, 0 through N/(kB) - 1
  – Within each set, addresses can go anywhere (associative)
  – Formula: Index = (Address / B) mod (N/(kB))
SLIDE 17
Example 4 – 2-way Set-Associative
Cache (N = 8, B = 2, k = 2): two sets with indices 0–1, each set holding two two-byte blocks (Offset 0, Offset 1).
Memory contents, grouped into two-byte blocks (Address: Data):
  [16: 67, 17: 3]  [18: 27, 19: 32]  ...
  [42: 78, 43: 59]  [44: 24, 45: 56]  [46: 87, 47: 36]  [48: 98, 49: 59]
Processor access sequence:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18
Total hits? Total misses?
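A k-way set-associative simulator (a sketch; `run_trace_sa` is a hypothetical name) counts 6 hits and 7 misses here. It also unifies the earlier examples: k = 1 behaves as direct-mapped, and k = N/B as fully associative.

```python
from collections import OrderedDict

def run_trace_sa(trace, N, B, k):
    """k-way set associative: index = block % (N/(k*B)), LRU within each set."""
    n_sets = N // (k * B)
    sets = [OrderedDict() for _ in range(n_sets)]   # each set: block -> present
    hits = 0
    for addr in trace:
        blk = addr // B
        s = sets[blk % n_sets]          # the one set this block may live in
        if blk in s:
            hits += 1
            s.move_to_end(blk)          # refresh recency within the set
        else:
            if len(s) == k:
                s.popitem(last=False)   # evict LRU block of this set only
            s[blk] = True
    return hits, len(trace) - hits

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
```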
SLIDE 18
Performance Tradeoffs
- Block size B:
  – Advantages of small B:
  – Advantages of large B:
  – Typical values: 64 bytes (bytes, not bits!)
- Associativity k:
  – Advantages of small k (DM):
  – Advantages of large k (SA, FA):
  – Typical values: 4, 8, 12
SLIDE 19
What to do on a write?
Cache (N = 5, B = 1, k = 1): lines hold (Address, Data) pairs.
Memory contents (Address: Data):
  20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 34, 27: 87, 28: 53, 29: 93
Processor access sequence:
  1. Read 24
  2. Write 5 to 24
  3. Read 26
  4. Write 8 to 25
  5. Write 9 to 21
  6. Write 2 to 24
  7. Read 29
SLIDE 20
Write Strategies
- Write-through: Update memory immediately
- Write-back: Update memory on eviction
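The traffic difference between the two strategies can be sketched with a toy simulator (hypothetical `simulate` helper; the 4-line, direct-mapped, one-byte-block cache is an assumption for illustration, not the slide's exact cache). Write-through pays one memory write per store; write-back only writes a line back when a dirty line is evicted:

```python
def simulate(trace, n_lines, policy):
    """Count main-memory writes under 'through' vs 'back' policies.
    Toy model: B=1, direct-mapped by addr % n_lines."""
    lines = [None] * n_lines        # stored address per cache line
    dirty = [False] * n_lines       # dirty bit per line (used by write-back)
    mem_writes = 0
    for op, addr in trace:
        idx = addr % n_lines
        if lines[idx] != addr:                     # miss: replace the old line
            if policy == "back" and dirty[idx]:
                mem_writes += 1                    # flush dirty data on eviction
            lines[idx], dirty[idx] = addr, False
        if op == "w":
            if policy == "through":
                mem_writes += 1                    # update memory immediately
            else:
                dirty[idx] = True                  # just mark the line dirty
    return mem_writes

trace = [("r", 24), ("w", 24), ("r", 26), ("w", 25),
         ("w", 21), ("w", 24), ("r", 29)]
```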
SLIDE 21
Write-back example
Cache (N = 5, B = 1, k = 1): lines hold (Address, Data) pairs.
Memory contents (Address: Data):
  20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 34, 27: 87, 28: 53, 29: 93
Processor access sequence:
  1. Read 24
  2. Write 5 to 24
  3. Read 26
  4. Write 8 to 25
  5. Write 9 to 21
  6. Write 2 to 24
  7. Read 29
SLIDE 22
Efficient Bit Manipulation
Given a 2-way associative cache with N = 64 and B = 8, what is the set index for address 153?
Formulas (assuming we are dealing with powers of 2):
- a. Express in binary. (153₁₀ = 99₁₆ = 10011001₂)
- b. Grab the right bits!
ByteOffset =
Index =
Tag =
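"Grabbing the right bits" can be written out as shifts and masks (a sketch assuming all sizes are powers of 2; `split_address` is a hypothetical helper). For address 153 = 10011001₂ with N = 64, B = 8, k = 2 (four sets), this gives ByteOffset = 1 (bits 001), Index = 3 (bits 11), and Tag = 4 (bits 100):

```python
def split_address(addr, N, B, k):
    """Split an address into (tag, index, offset); all sizes powers of 2."""
    offset_bits = B.bit_length() - 1            # log2(B) low bits: byte offset
    n_sets = N // (k * B)
    index_bits = n_sets.bit_length() - 1        # log2(# of sets): set index
    offset = addr & (B - 1)
    index = (addr >> offset_bits) & (n_sets - 1)
    tag = addr >> (offset_bits + index_bits)    # everything above the index
    return tag, index, offset
```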
SLIDE 23
Real Cache with Efficient Bit Manipulation
SLIDE 24
Example #1: Bit Manipulation
1. Suppose a direct-mapped cache has:
   – B = 8 byte blocks
   – 2 KiB cache
   Show how to break the following address into the tag, index, & byte offset.
   0000 1000 0101 1100 0001 0001 0111 1001
2. Same cache, but now 4-way associative. How does this change things?
   0000 1000 0101 1100 0001 0001 0111 1001
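The same shift-and-mask approach applies here (a sketch; `split` is a hypothetical helper). A 2 KiB direct-mapped cache with 8-byte blocks has 256 lines, so the low 3 bits are the byte offset, the next 8 bits the index, and the remaining 21 bits the tag. Going 4-way associative leaves 64 sets, shrinking the index to 6 bits and growing the tag by two bits:

```python
def split(addr, cache_bytes, block_bytes, k):
    """Split a 32-bit address into (tag, index, offset); sizes powers of 2."""
    offset_bits = block_bytes.bit_length() - 1
    n_sets = cache_bytes // (k * block_bytes)
    index_bits = n_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (n_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# The address from the slide:
addr = 0b0000_1000_0101_1100_0001_0001_0111_1001
```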
SLIDE 25
Example #2: Bit Manipulation
Suppose a direct-mapped cache divides addresses as follows:
  | tag: 21 bits | index: 7 bits | byte offset: 4 bits |
What is the block size? The number of blocks? Total size of the cache? (usually refers to size of data only)
SLIDE 26
Review: Main concepts
- Locality: Temporal and Spatial
- Each access: Hit or Miss
- Eviction strategies: Random or Least-Recently Used (LRU)
- Reasons for a Miss: Compulsory, Capacity, or Conflict
- Measurements: Miss Rate, Hit Time, Miss Penalty
- Cache types: Fully Associative, Direct-Mapped, or Set Associative
- Parameters: N (total size), B (size of block), k (associativity)
- Write strategies: Write-through or Write-back
- Implementation details: Stall, Valid bit, Dirty bit, Tag