CS 230 - Spring 2020 3-1
CS 230 Introduction to Computers and Computer Systems
Lecture 17: Caching Continued
Cache Writing Strategy
Store word loads blocks just like load word
Then what does it do with the new value?
Write-through
store word updates both cache and main memory
Write-back (useful when there are many updates in the same block)
only update cache, mark block as dirty
add dirty bit column to cache diagram
1 (dirty) if store word executed for this block, 0 (clean) if not
update main memory when dirty block is evicted
and set dirty bit back to 0 if the new block came from a load word
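The difference between the two strategies can be sketched by counting main-memory writes. This is a simplified illustration, not part of the slides: the function name memory_writes is hypothetical, and it assumes all stores go to one cached block that is evicted afterwards.

```python
def memory_writes(stores_to_block: int, policy: str) -> int:
    """Count main-memory writes for repeated stores to one cached block,
    followed by the eviction of that block (simplified model)."""
    if policy == "write-through":
        # every store word updates main memory immediately
        return stores_to_block
    if policy == "write-back":
        # stores only mark the block dirty; a dirty block is written once, at eviction
        return 1 if stores_to_block > 0 else 0
    raise ValueError(f"unknown policy: {policy}")


print(memory_writes(10, "write-through"))  # 10
print(memory_writes(10, "write-back"))     # 1
```

Under these assumptions write-back saves memory traffic exactly when a block is stored to more than once before it is evicted.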
Associative Caches
n-way set associative
divide a cache with M cache-blocks into M/n sets
each set contains n cache-blocks
sets are contiguous
addresses have cache-set instead of cache-block
for address P we have cache-set Cs = (P / B) MOD (M / n), using integer division
in binary it's the next log2(M/n) bits adjacent to the "within the block" bits
tag is T = P / (B * M / n), using integer division, which is the remaining bits in binary
when loading: check all tags in cache-set for a hit
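The two formulas above can be written as a small helper. This is a minimal sketch: the name split_address and the sample parameters (M = 8, B = 2, n = 2, address 44) are chosen for illustration.

```python
def split_address(P: int, B: int, M: int, n: int):
    """Return (tag, cache_set, offset) for byte address P.

    B = bytes per block, M = cache-blocks in the cache, n = blocks per set.
    """
    num_sets = M // n
    offset = P % B                    # the "within the block" bits
    cache_set = (P // B) % num_sets   # Cs = (P / B) MOD (M / n)
    tag = P // (B * num_sets)         # T = P / (B * M / n)
    return tag, cache_set, offset


tag, cache_set, offset = split_address(44, B=2, M=8, n=2)
print(format(tag, "05b"), format(cache_set, "02b"), offset)  # 00101 10 0
```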
Fully Associative and Direct Mapped
Direct-mapped cache is 1-way set associative
n = 1
no choice of where to put blocks
Fully associative
n-way set associative with n = M (a single set that holds every cache-block)
allow a given block to go in any cache entry
requires all entries to be searched at once
hardware comparator per entry (expensive)
Replacement Policy
Where do we put loaded blocks?
associative caches give us a choice
n=1 (direct-mapped) has only one block per set
no choice, just put it in the one spot
n>1
fill any empty spots in order
if no empty spots: evict least recently used
the block that was interacted with (hit or miss) the longest time ago
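Least recently used replacement within a single set can be sketched with an ordered dictionary. This is an illustrative model only: the class name LRUSet is hypothetical, and it tracks which tags are resident without modelling the slot positions or the data.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set holding up to n blocks, ordered oldest access first."""

    def __init__(self, n: int):
        self.n = n
        self.blocks = OrderedDict()  # tag -> None

    def access(self, tag):
        """Return (hit, evicted_tag) for one access to this set."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # a hit makes the block most recent
            return True, None
        evicted = None
        if len(self.blocks) == self.n:    # no empty spot: evict the oldest
            evicted, _ = self.blocks.popitem(last=False)
        self.blocks[tag] = None           # the new block is most recent
        return False, evicted


s = LRUSet(2)
print(s.access("A"))  # (False, None)
print(s.access("B"))  # (False, None)
print(s.access("A"))  # (True, None)
print(s.access("C"))  # (False, 'B')
```

The last access evicts B rather than A because the hit on A made it the more recently used block.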
Example
Assume 2-way set associative, M = 8, B = 2 bytes, 8-bit machine word, least-recently used eviction policy

Cache-set  Index  Valid  Dirty  Tag    Data
00         000    N      N
           001    N      N
01         010    N      N
           011    N      N
10         100    N      N
           101    N      N
11         110    N      N
           111    N      N
Example

Instruction  Address (decimal)  Binary addr  Hit/miss  Cache set
lw           44                 00101 10 0   Miss      10

Cache-set  Index  Valid  Dirty  Tag    Data
00         000    N      N
           001    N      N
01         010    N      N
           011    N      N
10         100    Y      N      00101  Mem[0010110X]
           101    N      N
11         110    N      N
           111    N      N
Example

Instruction  Address (decimal)  Binary addr  Hit/miss  Cache set
lw           68                 01000 10 0   Miss      10

Cache-set  Index  Valid  Dirty  Tag    Data
00         000    N      N
           001    N      N
01         010    N      N
           011    N      N
10         100    Y      N      00101  Mem[0010110X]
           101    Y      N      01000  Mem[0100010X]
11         110    N      N
           111    N      N
Example

Instruction  Address (decimal)  Binary addr  Hit/miss  Cache set
sw           45                 00101 10 1   Hit       10

Cache-set  Index  Valid  Dirty  Tag    Data
00         000    N      N
           001    N      N
01         010    N      N
           011    N      N
10         100    Y      Y      00101  Mem[0010110X]
           101    Y      N      01000  Mem[0100010X]
11         110    N      N
           111    N      N
Example

Instruction  Address (decimal)  Binary addr  Hit/miss  Cache set
sw           8                  00001 00 0   Miss      00

Cache-set  Index  Valid  Dirty  Tag    Data
00         000    Y      Y      00001  Mem[0000100X]
           001    N      N
01         010    N      N
           011    N      N
10         100    Y      Y      00101  Mem[0010110X]
           101    Y      N      01000  Mem[0100010X]
11         110    N      N
           111    N      N
Example

Instruction  Address (decimal)  Binary addr  Hit/miss  Cache set
lw           12                 00001 10 0   Miss      10

Cache-set  Index  Valid  Dirty  Tag    Data
00         000    Y      Y      00001  Mem[0000100X]
           001    N      N
01         010    N      N
           011    N      N
10         100    Y      Y      00101  Mem[0010110X]
           101    Y      N      00001  Mem[0000110X]
11         110    N      N
           111    N      N

The block that was previously at index 101 got evicted
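The example trace can be replayed in code. The sketch below assumes a write-back cache with least-recently used eviction, as in the example; the function name simulate and the dictionary layout are hypothetical, and the data itself is not modelled.

```python
def simulate(trace, M, B, n):
    """Run (op, address) pairs through an n-way write-back cache with LRU.

    Returns (outcomes, cache); cache is a list of M entries in index order.
    """
    num_sets = M // n
    cache = [{"valid": False, "dirty": False, "tag": None, "used": -1}
             for _ in range(M)]
    outcomes = []
    for time, (op, addr) in enumerate(trace):
        cache_set = (addr // B) % num_sets
        tag = addr // (B * num_sets)
        ways = cache[cache_set * n:(cache_set + 1) * n]  # sets are contiguous
        entry = next((e for e in ways if e["valid"] and e["tag"] == tag), None)
        if entry is not None:
            outcome = "hit"
        else:
            # fill the first empty spot, otherwise evict the least recently used
            entry = next((e for e in ways if not e["valid"]), None)
            outcome = "miss"
            if entry is None:
                entry = min(ways, key=lambda e: e["used"])
                outcome = "miss+evict"
                # a dirty victim would be written back to main memory here
            entry.update(valid=True, dirty=False, tag=tag)
        if op == "sw":
            entry["dirty"] = True  # store word marks the block dirty
        entry["used"] = time       # hits and misses both count as interaction
        outcomes.append(outcome)
    return outcomes, cache


trace = [("lw", 44), ("lw", 68), ("sw", 45), ("sw", 8), ("lw", 12)]
outcomes, cache = simulate(trace, M=8, B=2, n=2)
print(outcomes)  # ['miss', 'miss', 'hit', 'miss', 'miss+evict']
print(format(cache[5]["tag"], "05b"), cache[5]["dirty"])  # 00001 False
```

The final line inspects index 101, where the block loaded by lw 68 was replaced by the block for lw 12.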
Try it Yourself
Draw a cache diagram showing what the cache would look like after loading or storing the following addresses from/to memory. What is the hit chance of this cache for this access sequence? Assume 4-way assoc., M = 8, B = 4 bytes, 8-bit machine word. lw 0x08, lw 0x3E, lw 0x83, lw 0x0B, sw 0x22, lw 0x32, sw 0x18

Cache-set  Index  Valid  Dirty  Tag    Data
0          000    N      N
           001    N      N
           010    N      N
           011    N      N
1          100    N      N
           101    N      N
           110    N      N
           111    N      N

lw 0x08: tag 00001, set 0, xx 00

Cache-set  Index  Valid  Dirty  Tag    Data
0          000    Y      N      00001  Mem[000010XX]
           001    N      N
           010    N      N
           011    N      N
1          100    N      N
           101    N      N
           110    N      N
           111    N      N

Results so far: miss

lw 0x3E: tag 00111, set 1, xx 10

Cache-set  Index  Valid  Dirty  Tag    Data
0          000    Y      N      00001  Mem[000010XX]
           001    N      N
           010    N      N
           011    N      N
1          100    Y      N      00111  Mem[001111XX]
           101    N      N
           110    N      N
           111    N      N

Results so far: miss miss

lw 0x83: tag 10000, set 0, xx 11

Cache-set  Index  Valid  Dirty  Tag    Data
0          000    Y      N      00001  Mem[000010XX]
           001    Y      N      10000  Mem[100000XX]
           010    N      N
           011    N      N
1          100    Y      N      00111  Mem[001111XX]
           101    N      N
           110    N      N
           111    N      N

Results so far: miss miss miss

lw 0x0B: tag 00001, set 0, xx 11

Cache-set  Index  Valid  Dirty  Tag    Data
0          000    Y      N      00001  Mem[000010XX]
           001    Y      N      10000  Mem[100000XX]
           010    N      N
           011    N      N
1          100    Y      N      00111  Mem[001111XX]
           101    N      N
           110    N      N
           111    N      N

Results so far: miss miss miss hit

sw 0x22: tag 00100, set 0, xx 10

Cache-set  Index  Valid  Dirty  Tag    Data
0          000    Y      N      00001  Mem[000010XX]
           001    Y      N      10000  Mem[100000XX]
           010    Y      Y      00100  Mem[001000XX]
           011    N      N
1          100    Y      N      00111  Mem[001111XX]
           101    N      N
           110    N      N
           111    N      N

Results so far: miss miss miss hit miss

lw 0x32: tag 00110, set 0, xx 10

Cache-set  Index  Valid  Dirty  Tag    Data
0          000    Y      N      00001  Mem[000010XX]
           001    Y      N      10000  Mem[100000XX]
           010    Y      Y      00100  Mem[001000XX]
           011    Y      N      00110  Mem[001100XX]
1          100    Y      N      00111  Mem[001111XX]
           101    N      N
           110    N      N
           111    N      N

Results so far: miss miss miss hit miss miss

sw 0x18: tag 00011, set 0, xx 00

Cache-set  Index  Valid  Dirty  Tag    Data
0          000    Y      N      00001  Mem[000010XX]
           001    Y      Y      00011  Mem[000110XX]
           010    Y      Y      00100  Mem[001000XX]
           011    Y      N      00110  Mem[001100XX]
1          100    Y      N      00111  Mem[001111XX]
           101    N      N
           110    N      N
           111    N      N

Results so far: miss miss miss hit miss miss miss+evict

Results: miss miss miss hit miss miss miss+evict
7 accesses, 1 hit, 6 misses
hit chance = 1/7 = 0.143 (decimal)
Cache Performance
CPU time
program execution cycles (includes cache hits)
memory stall cycles from cache misses
With simplifying assumptions:
\[
\text{Memory stall cycles}
= \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty}
= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}
\]
Include this in CPI to get Effective CPI
Cache Performance Example
For some program we have:
Miss rate = 4%
Miss penalty = 100 cycles to main memory
Base CPI = 2
36% of instructions are memory access
Miss cycles per instruction
0.36 * 0.04 * 100 = 1.44
Effective CPI = 2 + 1.44 = 3.44
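The same calculation as a small function, as a sketch under the simplifying assumptions above. The name effective_cpi and its parameter names are illustrative.

```python
def effective_cpi(base_cpi, mem_fraction, miss_rate, miss_penalty):
    """Base CPI plus the average memory stall cycles per instruction."""
    miss_cycles_per_instruction = mem_fraction * miss_rate * miss_penalty
    return base_cpi + miss_cycles_per_instruction


# 36% memory accesses, 4% miss rate, 100-cycle miss penalty, base CPI 2
print(round(effective_cpi(2, 0.36, 0.04, 100), 2))  # 3.44
```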
Average Access Time
Average memory access time (AMAT)
AMAT = hit time + miss rate * miss penalty
Example
1ns clock
hit time = 1 cycle
miss penalty = 20 cycles to main memory
cache miss rate = 5%
AMAT = 1 + 0.05 * 20 = 2 cycles = 2ns
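A minimal sketch of the AMAT formula in code. The function name amat is illustrative, and the result is converted to nanoseconds using the 1ns clock from the example.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time."""
    return hit_time + miss_rate * miss_penalty


clock_ns = 1                   # 1ns clock, so one cycle takes 1ns
cycles = amat(1, 0.05, 20)     # hit time 1 cycle, 5% miss rate, 20-cycle penalty
print(round(cycles * clock_ns, 2))  # 2.0
```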
Multilevel Caches
Primary (level-1) cache attached to CPU
small, but fast (usually instruction speed)
Level-2 cache handles misses from L1 cache
larger, slower, but still faster than main memory
Main memory handles L2 cache misses
Most modern computers have even more levels
Multilevel Cache Example
Assume
CPU base CPI = 1, clock rate = 4GHz
miss rate = 2%
50% of program is lw/sw
main memory access time = 100ns
With just a one-level cache
miss penalty = 100ns/0.25ns = 400 cycles
effective CPI = 1 + 0.5 * 0.02 * 400 = 5
Multilevel Cache Example
Now add L2 cache
access time = 5ns
global miss rate to main memory = 0.5%
this is the chance of missing L1 cache and L2 cache
Primary miss with L2 hit
penalty = 5ns/0.25ns = 20 cycles
Primary miss with L2 miss
100ns main memory access time = 400 cycles
Effective CPI = 1 + 0.5 * 0.02 * 20 + 0.5 * 0.005 * 400
= 1 + 0.2 + 1 = 2.2
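The two-level calculation can be sketched as a function that converts the access times to cycles and adds both stall terms. The name two_level_cpi and its parameter names are illustrative; the model is the one used in this example, where every L1 miss pays the L2 access time and every global miss also pays the main memory access time.

```python
def two_level_cpi(base_cpi, mem_fraction, l1_miss_rate, l2_penalty,
                  global_miss_rate, memory_penalty):
    """Effective CPI with an L1 and L2 cache; penalties are in cycles."""
    l2_stalls = mem_fraction * l1_miss_rate * l2_penalty
    memory_stalls = mem_fraction * global_miss_rate * memory_penalty
    return base_cpi + l2_stalls + memory_stalls


cycle_ns = 1 / 4                   # 4GHz clock, so 0.25ns per cycle
l2_penalty = 5 / cycle_ns          # 20 cycles
memory_penalty = 100 / cycle_ns    # 400 cycles
cpi = two_level_cpi(1, 0.5, 0.02, l2_penalty, 0.005, memory_penalty)
print(round(cpi, 2))  # 2.2
```

Compared with the one-level result of 5, the added L2 cache cuts the effective CPI to less than half in this example.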
Try it Yourself
Consider a processor with a 2-level cache
it has a base CPI of 3
L1 cache has a miss rate of 20%
L2 cache has a hit time of 10 cycles
the global miss rate is 4%
main memory access time is 50 cycles
40% of instructions are lw/sw
What is the effective CPI of this processor?
Start with the base CPI:
3
Add the L1 misses, each paying the L2 hit time:
3 + 0.4 * 0.2 * 10
Add the global misses, each paying the main memory access time:
3 + 0.4 * 0.2 * 10 + 0.4 * 0.04 * 50
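Evaluating the two stall terms, using only the values given in the question:

\[
3 + 0.4 \times 0.2 \times 10 + 0.4 \times 0.04 \times 50 = 3 + 0.8 + 0.8 = 4.6
\]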