SLIDE 1
IC220: Caching 1 (Chapter 5)
SLIDE 2
Memory, Cost, and Performance
- Ideal world: we want a memory that is
  – Fast, – Big, & – Cheap! (Choose any two!)
- Recent "real world" situation:
  – SRAM: access time 0.5–2.5 ns, at a cost of $500 to $1000 per GB
  – DRAM: access time 50–70 ns, at a cost of $10 to $20 per GB
  – Flash: access time 5,000–50,000 ns, at a cost of $0.75 to $1 per GB
  – Disk: access time 5–20 million ns, at a cost of $0.05 to $0.10 per GB
SLIDE 3
Caching Concepts and Terminology
- Locality: Temporal and Spatial
- Each access: Hit or Miss
- Eviction strategies: Random or Least-Recently Used (LRU)
- Reasons for a Miss: Compulsory, Capacity, or Conflict
- Measurements: Miss Rate, Hit Time, Miss Penalty
- Cache types: Fully Associative, Direct-Mapped, or Set Associative
- Parameters: N (total size), B (size of block), k (associativity)
- Write strategies: Write-through or Write-back
- Implementation details: Stall, Valid bit, Dirty bit, Tag
SLIDE 4
Principle of Locality
- Basic observations on how memory tends to be accessed in computer programs:
- If an item is referenced,
  1. it will tend to be referenced again soon (TEMPORAL LOCALITY)
  2. nearby items will tend to be referenced soon (SPATIAL LOCALITY)
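As a concrete illustration (a hypothetical example, not from the slides), even a simple array-summing loop exhibits both kinds of locality:

```python
def sum_array(a):
    """Sum a list; the access pattern shows both kinds of locality."""
    total = 0           # 'total' is touched every iteration: temporal locality
    for x in a:         # elements are read in adjacent order: spatial locality
        total += x
    return total
```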
SLIDE 5
Caching Basics
- Cache consists of N bytes (N = 8 in our examples here)
- To read or write to a given address:
  1. First look in cache.
  2. If it's there, we have a HIT.
  3. Otherwise, it's a MISS and you must fetch from main memory.
     – If cache is full, must EVICT a line to insert the new data.
- Which cache line should be evicted?
  – Random
  – Least-Recently Used (LRU)
  (our examples will always follow LRU)
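The lookup-then-evict procedure above can be sketched as a tiny simulator; `LRUCache` and its `access` method are hypothetical names for illustration, not any real library's API:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal sketch of the slide's lookup procedure: N lines, LRU eviction."""
    def __init__(self, n_lines):
        self.n = n_lines
        self.lines = OrderedDict()          # address -> data, oldest first

    def access(self, addr, memory):
        if addr in self.lines:              # steps 1-2: look in cache -> HIT
            self.lines.move_to_end(addr)    # mark as most recently used
            return ("hit", self.lines[addr])
        if len(self.lines) == self.n:       # cache full: EVICT the LRU line
            self.lines.popitem(last=False)
        self.lines[addr] = memory[addr]     # step 3: MISS, fetch from memory
        return ("miss", self.lines[addr])
```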
SLIDE 6
Example 1 – Fully associative (no blocking)
Cache (N = 8): eight lines, each holding one (Address, Data) pair.
Memory contents (Address: Data):
  16: 67, 17: 3, 18: 27, 19: 32, ...,
  42: 78, 43: 59, 44: 24, 45: 56, 46: 87, 47: 36, 48: 98, 49: 59
Processor access sequence:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18
Total hits? Total misses?
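One way to check your answer is a small fully associative LRU simulator (a sketch; `run_trace` is a hypothetical helper name). For the slide's sequence with 8 one-byte lines it counts 3 hits and 10 misses:

```python
from collections import OrderedDict

def run_trace(trace, n_lines):
    """Fully associative, one address per line (no blocking), LRU eviction."""
    cache, hits = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # refresh recency on a hit
        else:
            if len(cache) == n_lines:
                cache.popitem(last=False)  # evict the least-recently used line
            cache[addr] = True
    return hits, len(trace) - hits

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
```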
SLIDE 7
Analysis of Example 1 (FA, no blocking)
- Miss Rate = (# of misses) / (# of accesses)
- Reasons for a miss:
  – Compulsory miss (first time accessing)
  – Capacity miss (not enough room, got evicted)
- What was good:
- What was bad:
- Measurement concepts:
  – Hit Time: how long to look up something that is in cache
  – Miss Penalty: how long to fetch something not in cache
SLIDE 8
How to handle a miss?
1. Stall the CPU until the miss completes
2. (If cache is full) Evict old data
3. Fetch the needed data from memory
4. Restart the CPU
Time for this is called the Miss Penalty.
SLIDE 9
Blocking
- Goal: Exploit spatial locality
- Main idea:
  – Group memory into blocks
  – B bytes in each block of memory
  – B bytes in each cache line
  – Always fetch and evict entire blocks (even if not all data was requested yet)
  – Position within cache line determined by Byte Offset: Address mod B
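The block number and byte offset above reduce to simple integer arithmetic (a sketch; function names are hypothetical, and B is assumed to divide the address space evenly):

```python
def block_number(addr, B):
    """Which B-byte block an address falls in (blocks are fetched/evicted whole)."""
    return addr // B

def byte_offset(addr, B):
    """Position of the address within its cache line."""
    return addr % B
```

For example, with B = 2, addresses 42 and 43 fall in the same block (21) at offsets 0 and 1, so fetching either one brings in both.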
SLIDE 10
Example 2 – Fully associative with blocking
Cache (N = 8, B = 2): four lines, each holding one two-byte block (Offset 0, Offset 1).
Memory contents, grouped into two-byte blocks (Address: Data):
  [16: 67, 17: 3]  [18: 27, 19: 32]  ...
  [42: 78, 43: 59]  [44: 24, 45: 56]  [46: 87, 47: 36]  [48: 98, 49: 59]
Processor access sequence:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18
Total hits? Total misses?
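Extending the fully associative simulation to operate on block numbers (a sketch; `run_trace_blocked` is a hypothetical name) shows how blocking exploits spatial locality: the same trace now counts 8 hits and 5 misses, since neighboring addresses like 42 and 43 share a block:

```python
from collections import OrderedDict

def run_trace_blocked(trace, N, B):
    """Fully associative with blocking: N-byte cache, B-byte blocks, LRU."""
    n_lines = N // B
    cache, hits = OrderedDict(), 0
    for addr in trace:
        blk = addr // B                    # whole blocks are fetched and evicted
        if blk in cache:
            hits += 1
            cache.move_to_end(blk)
        else:
            if len(cache) == n_lines:
                cache.popitem(last=False)  # evict least-recently used block
            cache[blk] = True
    return hits, len(trace) - hits

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
```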
SLIDE 11
Analysis of Example 2 (FA with blocking)
- Miss Rate = (# of misses) / (# of accesses)
- Advantages:
- Disadvantages:
- Measurement concepts:
  – Hit Time: how long to look up something that is in cache
  – Miss Penalty: how long to fetch something not in cache
SLIDE 12
How big should the blocks be?
- Keeping cache size N fixed,
  – Smaller B means:
  – Larger B means:
- Increasing block size tends to decrease miss rate, up to a point:
SLIDE 13
Improving Hit Time with Direct Mapping
- Problem: How to determine whether a block is in cache?
- Fully associative (previous examples):
  – Requested block could be anywhere in cache
  – Must search through all cache lines, or keep an extra data structure
- Direct Mapped Cache:
  – Assign index 0 through (N/B - 1) to each cache line
  – Each memory block is assigned one possible index
  – Formula: Index = (Address / B) mod (N/B)
SLIDE 14
Example 3 – Direct-Mapped
Cache (N = 8, B = 2): four lines with indices 0–3, each holding one two-byte block (Offset 0, Offset 1).
Memory contents, grouped into two-byte blocks (Address: Data):
  [16: 67, 17: 3]  [18: 27, 19: 32]  ...
  [42: 78, 43: 59]  [44: 24, 45: 56]  [46: 87, 47: 36]  [48: 98, 49: 59]
Processor access sequence:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18
Total hits? Total misses?
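A direct-mapped version of the simulator (a sketch; `run_trace_dm` is a hypothetical name) counts 4 hits and 9 misses on the same trace. Note that blocks 9 (addresses 18–19) and 21 (addresses 42–43) both map to index 1, so they repeatedly knock each other out: conflict misses.

```python
def run_trace_dm(trace, N, B):
    """Direct-mapped: each block maps to exactly one line, index = block % (N/B)."""
    n_lines = N // B
    lines = [None] * n_lines          # stored block number per cache line
    hits = 0
    for addr in trace:
        blk = addr // B
        idx = blk % n_lines           # the block's one possible index
        if lines[idx] == blk:
            hits += 1
        else:
            lines[idx] = blk          # replace whatever was there (possible conflict)
    return hits, len(trace) - hits

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
```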
SLIDE 15
Analysis of Example 3 (Direct mapped)
- Miss Rate = (# of misses) / (# of accesses)
- Reasons for a miss:
  – Compulsory miss (first time accessing)
  – Capacity miss (not enough room, got evicted)
  – Conflict miss (same index, would have enough room in FA)
- Advantages:
- Disadvantages:
- Measurement concepts:
  – Hit Time: how long to look up something that is in cache
  – Miss Penalty: how long to fetch something not in cache
SLIDE 16
Compromise: Set-Associative
- Goal: Combine low miss rate of FA with good hit time of DM
- Fully associative (FA):
  – Requested block could be anywhere in cache
- Direct Mapped (DM):
  – Assign index 0 through (N/B - 1) to each cache line
  – Each memory block is assigned one possible index
- k-way Set Associative Cache:
  – Group cache lines into "sets" of k lines each
  – Each set has a DM index, 0 through N/(kB) - 1
  – Within each set, addresses can go anywhere (associative)
  – Formula: Index = (Address / B) mod (N/(kB))
SLIDE 17
Example 4 – 2-way Set-Associative
Cache (N = 8, B = 2, k = 2): two sets with indices 0–1, each set holding two two-byte blocks (Offset 0, Offset 1).
Memory contents, grouped into two-byte blocks (Address: Data):
  [16: 67, 17: 3]  [18: 27, 19: 32]  ...
  [42: 78, 43: 59]  [44: 24, 45: 56]  [46: 87, 47: 36]  [48: 98, 49: 59]
Processor access sequence:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18
Total hits? Total misses?
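A k-way set-associative simulator (a sketch; `run_trace_sa` is a hypothetical name) counts 6 hits and 7 misses here. It also unifies the earlier examples: k = 1 behaves as direct-mapped, and k = N/B as fully associative.

```python
from collections import OrderedDict

def run_trace_sa(trace, N, B, k):
    """k-way set associative: index = block % (N/(k*B)), LRU within each set."""
    n_sets = N // (k * B)
    sets = [OrderedDict() for _ in range(n_sets)]   # each set: block -> present
    hits = 0
    for addr in trace:
        blk = addr // B
        s = sets[blk % n_sets]          # the one set this block may live in
        if blk in s:
            hits += 1
            s.move_to_end(blk)          # refresh recency within the set
        else:
            if len(s) == k:
                s.popitem(last=False)   # evict LRU block of this set only
            s[blk] = True
    return hits, len(trace) - hits

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
```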
SLIDE 18
Performance Tradeoffs
- Block size B:
  – Advantages of small B:
  – Advantages of large B:
  – Typical values: 64 bytes (bytes, not bits!)
- Associativity k:
  – Advantages of small k (DM):
  – Advantages of large k (SA, FA):
  – Typical values: 4, 8, 12
SLIDE 19
What to do on a write?
Cache (N = 5, B = 1, k = 1): lines hold (Address, Data) pairs.
Memory contents (Address: Data):
  20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 34, 27: 87, 28: 53, 29: 93
Processor access sequence:
  1. Read 24
  2. Write 5 to 24
  3. Read 26
  4. Write 8 to 25
  5. Write 9 to 21
  6. Write 2 to 24
  7. Read 29
SLIDE 20
Write Strategies
- Write-through: Update memory immediately
- Write-back: Update memory on eviction
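The traffic difference between the two strategies can be sketched with a toy simulator (hypothetical `simulate` helper; the 4-line, direct-mapped, one-byte-block cache is an assumption for illustration, not the slide's exact cache). Write-through pays one memory write per store; write-back only writes a line back when a dirty line is evicted:

```python
def simulate(trace, n_lines, policy):
    """Count main-memory writes under 'through' vs 'back' policies.
    Toy model: B=1, direct-mapped by addr % n_lines."""
    lines = [None] * n_lines        # stored address per cache line
    dirty = [False] * n_lines       # dirty bit per line (used by write-back)
    mem_writes = 0
    for op, addr in trace:
        idx = addr % n_lines
        if lines[idx] != addr:                     # miss: replace the old line
            if policy == "back" and dirty[idx]:
                mem_writes += 1                    # flush dirty data on eviction
            lines[idx], dirty[idx] = addr, False
        if op == "w":
            if policy == "through":
                mem_writes += 1                    # update memory immediately
            else:
                dirty[idx] = True                  # just mark the line dirty
    return mem_writes

trace = [("r", 24), ("w", 24), ("r", 26), ("w", 25),
         ("w", 21), ("w", 24), ("r", 29)]
```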
SLIDE 21
Write-back example
Cache (N = 5, B = 1, k = 1): lines hold (Address, Data) pairs.
Memory contents (Address: Data):
  20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 34, 27: 87, 28: 53, 29: 93
Processor access sequence:
  1. Read 24
  2. Write 5 to 24
  3. Read 26
  4. Write 8 to 25
  5. Write 9 to 21
  6. Write 2 to 24
  7. Read 29
SLIDE 22
Efficient Bit Manipulation
Given a 2-way associative cache with N = 64 and B = 8, what is the set index for address 153?
Formulas (assuming we are dealing with powers of 2):
- a. Express in binary. (153₁₀ = 99₁₆ = 10011001₂)
- b. Grab the right bits!
ByteOffset =
Index =
Tag =
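"Grabbing the right bits" can be written out as shifts and masks (a sketch assuming all sizes are powers of 2; `split_address` is a hypothetical helper). For address 153 = 10011001₂ with N = 64, B = 8, k = 2 (four sets), this gives ByteOffset = 1 (bits 001), Index = 3 (bits 11), and Tag = 4 (bits 100):

```python
def split_address(addr, N, B, k):
    """Split an address into (tag, index, offset); all sizes powers of 2."""
    offset_bits = B.bit_length() - 1            # log2(B) low bits: byte offset
    n_sets = N // (k * B)
    index_bits = n_sets.bit_length() - 1        # log2(# of sets): set index
    offset = addr & (B - 1)
    index = (addr >> offset_bits) & (n_sets - 1)
    tag = addr >> (offset_bits + index_bits)    # everything above the index
    return tag, index, offset
```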
SLIDE 23
Real Cache with Efficient Bit Manipulation
SLIDE 24
Example #1: Bit Manipulation
1. Suppose a direct-mapped cache has:
   – B = 8 byte blocks
   – 2 KiB cache
   Show how to break the following address into the tag, index, & byte offset.
   0000 1000 0101 1100 0001 0001 0111 1001
2. Same cache, but now 4-way associative. How does this change things?
   0000 1000 0101 1100 0001 0001 0111 1001
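The same shift-and-mask approach applies here (a sketch; `split` is a hypothetical helper). A 2 KiB direct-mapped cache with 8-byte blocks has 256 lines, so the low 3 bits are the byte offset, the next 8 bits the index, and the remaining 21 bits the tag. Going 4-way associative leaves 64 sets, shrinking the index to 6 bits and growing the tag by two bits:

```python
def split(addr, cache_bytes, block_bytes, k):
    """Split a 32-bit address into (tag, index, offset); sizes powers of 2."""
    offset_bits = block_bytes.bit_length() - 1
    n_sets = cache_bytes // (k * block_bytes)
    index_bits = n_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (n_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# The address from the slide:
addr = 0b0000_1000_0101_1100_0001_0001_0111_1001
```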
SLIDE 25
Example #2: Bit Manipulation
Suppose a direct-mapped cache divides addresses as follows:
  | tag: 21 bits | index: 7 bits | byte offset: 4 bits |
What is the block size? The number of blocks? Total size of the cache? (usually refers to size of data only)
SLIDE 26
Review: Main concepts
- Locality: Temporal and Spatial
- Each access: Hit or Miss
- Eviction strategies: Random or Least-Recently Used (LRU)
- Reasons for a Miss: Compulsory, Capacity, or Conflict
- Measurements: Miss Rate, Hit Time, Miss Penalty
- Cache types: Fully Associative, Direct-Mapped, or Set Associative
- Parameters: N (total size), B (size of block), k (associativity)
- Write strategies: Write-through or Write-back
- Implementation details: Stall, Valid bit, Dirty bit, Tag