ECE232: Hardware Organization and Design
Lecture 22: Introduction to Caches
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
Overview
- Caches hold a subset of data from the main memory
- Three types of caches
- Direct mapped
- Set associative
- Fully associative
- Today: Direct mapped
- Each memory value can only be in one place in the cache
- Is it there (Hit?)
- Or is it not there (Miss?)
Direct Mapped Cache - Textbook
- Location determined by address
- Direct mapped: only one choice
- (Block address) modulo (#Blocks in cache)
- #Blocks is a power of 2
- Use low-order address bits
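The modulo mapping can be sketched in a few lines of Python (the function name here is illustrative, not from the lecture):

```python
# Direct-mapped placement: index = (block address) mod (#blocks in cache).
# Because #blocks is a power of 2, the modulo reduces to the low-order bits.
def cache_index(block_address: int, num_blocks: int) -> int:
    assert num_blocks & (num_blocks - 1) == 0, "num_blocks must be a power of 2"
    return block_address & (num_blocks - 1)  # same as block_address % num_blocks

# Memory blocks 0, 4, 8, 12 all land in index 0 of a 4-block cache
print([cache_index(b, 4) for b in (0, 4, 8, 12)])  # [0, 0, 0, 0]
```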
Direct mapped cache (assume 1 byte/Block)
- Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12
- Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13
- Cache block 2 can be occupied by data from memory blocks 2, 6, 10, 14
- Cache block 3 can be occupied by data from memory blocks 3, 7, 11, 15
[Figure: 4-block direct mapped cache; memory blocks 0-15 (addresses 0000, 0100, 1000, 1100, ...) map onto cache indices 0-3]
Direct Mapped Cache – Index and Tag
- index determines block in cache
- index = (address) mod (# blocks)
- The number of cache blocks is a power of 2, so the cache index is simply the lower n bits of the memory address (where # blocks = 2^n)
[Figure: memory block address split into tag and index fields; memory blocks 0-15 (1 byte each) map onto the 4-entry cache, with the index taken from the low 2 address bits]
Direct Mapped w/Tag
- tag determines which memory block occupies a cache block
- hit: cache tag field = tag bits of address
- miss: cache tag field ≠ tag bits of address

[Figure: memory addresses 00 10, 01 10, 10 10, 11 10 all map to cache index 10 and are distinguished by their tag bits (00, 01, 10, 11)]
Direct Mapped Cache
- Simplest mapping is a direct mapped cache
- Each memory address is associated with one possible block
within the cache
- Therefore, we only need to look in a single location in the
cache for the data if it exists in the cache
Finding Item within Block
- In reality, a cache block consists of a number of bytes/words, to (1) increase the cache hit rate (exploiting the locality property) and (2) reduce the overall cache miss time
- Given an address of item, index tells which block of cache to
look in
- Then, how to find requested item within the cache block?
- Or, equivalently, “What is the byte offset of the item within
the cache block?”
Selecting part of a block (block size > 1 byte)
- If block size > 1, the rightmost bits of the index are really the offset within the indexed block
- Address fields: TAG | INDEX | OFFSET
  - Tag: to check if we have the correct block
  - Index: to select a block in the cache
  - Offset: byte offset within the block
- Example: block size of 8 bytes; select byte 4 (the 2nd word)
  - Memory address 11 01 100: tag = 11, cache index = 01, byte offset = 100
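The tag/index/offset split can be checked with a small helper (a sketch; the field widths are parameters, and the function name is an assumption):

```python
def split_address(addr: int, offset_bits: int, index_bits: int):
    """Split a memory address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Address 11 01 100 with 8-byte blocks (3 offset bits) and 4 blocks (2 index bits):
print(split_address(0b1101100, 3, 2))  # (3, 1, 4) -> tag 11, index 01, offset 100
```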
Accessing data in a direct mapped cache
- Three types of events:
- cache hit: cache block is valid and its tag matches the address, so read desired word
- cache miss: nothing in cache in appropriate block, so fetch
from memory
- cache miss, block replacement: wrong data is in cache at
appropriate block, so discard it and fetch desired data from memory
- Cache Access Procedure:
- (1) Use Index bits to select cache block
- (2) If valid bit is 1, compare the tag bits of the address
with the cache block tag bits
- (3) If they match, use the offset to read out the
word/byte
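The three-step procedure can be sketched as a lookup function (a minimal model; the CacheLine structure and names are assumptions, not from the lecture):

```python
from dataclasses import dataclass, field

@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    data: list = field(default_factory=list)  # block contents, one entry per byte

def lookup(cache, addr, offset_bits, index_bits):
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    line = cache[index]                  # (1) use index bits to select the block
    if line.valid and line.tag == tag:   # (2) check valid bit, compare tags
        return line.data[offset]         # (3) on a match, use offset to read byte
    return None                          # miss: caller must fetch from memory
```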
Tags and Valid Bits
- How do we know which particular block is stored in a cache
location?
- Store block address as well as the data
- Actually, only need the high-order bits
- Called the tag
- What if there is no data in a location?
- Valid bit: 1 = present, 0 = not present
- Initially 0
Cache Example
- 8-blocks, 1 byte/block, direct mapped
- Initial state
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N
Cache Example
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
22    10 110       Miss      110
Cache Example
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
26    11 010       Miss      010
Cache Example
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
22    10 110       Hit       110
26    11 010       Hit       010
Cache Example
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
16    10 000       Miss      000
3     00 011       Miss      011
16    10 000       Hit       000
Cache Example
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
18    10 010       Miss      010
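The full access sequence from these example slides can be replayed with a short simulation (a sketch for the 8-block, 1-byte/block cache; stored tags stand in for the data):

```python
def simulate(addresses, num_blocks=8):
    cache = [None] * num_blocks  # each entry holds a tag, or None if invalid
    results = []
    for addr in addresses:
        index, tag = addr % num_blocks, addr // num_blocks
        if cache[index] == tag:
            results.append("Hit")
        else:
            results.append("Miss")
            cache[index] = tag  # fetch from memory, replacing the old block
    return results

print(simulate([22, 26, 22, 26, 16, 3, 16, 18]))
# ['Miss', 'Miss', 'Hit', 'Hit', 'Miss', 'Miss', 'Hit', 'Miss']
```

The final miss on address 18 is a block replacement: index 010 already held tag 11 (from address 26), which is discarded.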
Example: Larger Block Size
- 64 blocks, 16 bytes/block
- To what block number does address 1200 map?
- Block address = 1200/16 = 75
- Block number = 75 modulo 64 = 11
Address fields: Tag (bits 31-10, 22 bits) | Index (bits 9-4, 6 bits) | Offset (bits 3-0, 4 bits)
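The arithmetic for address 1200 can be checked directly:

```python
block_size = 16    # bytes per block
num_blocks = 64
addr = 1200
block_address = addr // block_size         # 1200 // 16 = 75
block_number = block_address % num_blocks  # 75 mod 64 = 11
print(block_address, block_number)  # 75 11
```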
Block Size Considerations
- Larger blocks should reduce miss rate
- Due to spatial locality
- But in a fixed-sized cache
- Larger blocks → fewer of them
- More competition → increased miss rate
- Larger blocks → pollution
- Larger miss penalty
- Can override benefit of reduced miss rate
- Early restart and critical-word-first can help
Cache Misses
- On cache hit, CPU proceeds normally
- On cache miss
- Stall the CPU pipeline
- Fetch block from next level of hierarchy
- Instruction cache miss
- Restart instruction fetch
- Data cache miss
- Complete data access
Write-Through
- On data-write hit, could just update the block in cache
- But then cache and memory would be inconsistent
- Write through: also update memory
- But makes writes take longer
- e.g., if base CPI = 1, 10% of instructions are stores, write to
memory takes 100 cycles
- Effective CPI = 1 + 0.1×100 = 11
- Solution: write buffer
- Holds data waiting to be written to memory
- CPU continues immediately
- Only stalls on write if write buffer is already full
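The effective-CPI arithmetic from the write-through example works out as:

```python
base_cpi = 1.0
store_fraction = 0.10   # 10% of instructions are stores
write_cycles = 100      # cycles to write through to memory
effective_cpi = base_cpi + store_fraction * write_cycles
print(effective_cpi)  # 11.0
```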
Write-Back
- Alternative: On data-write hit, just update the block in cache
- Keep track of whether each block is dirty
- When a dirty block is replaced
- Write it back to memory
- Can use a write buffer to allow replacing block to be read first
Measuring Cache Performance
- Components of CPU time
- Program execution cycles
- Includes cache hit time
- Memory stall cycles
- Mainly from cache misses
- With simplifying assumptions:
Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
Average Access Time
- Hit time is also important for performance
- Average memory access time (AMAT)
- AMAT = Hit time + Miss rate × Miss penalty
- Example
- CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20
cycles, I-cache miss rate = 5%
- AMAT = 1 + 0.05 × 20 = 2ns
- 2 cycles per instruction
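The AMAT example above in code:

```python
hit_time = 1          # cycles
miss_rate = 0.05      # I-cache miss rate
miss_penalty = 20     # cycles
amat = hit_time + miss_rate * miss_penalty
print(amat)  # 2.0 cycles -> 2 ns with a 1 ns clock
```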
Summary
- Today: Direct mapped cache
- Performance: tied to whether values are located in the cache
- Cache miss = bad performance
- Need to understand how to numerically determine system
performance based on cache hit rate
- Why might direct mapped caches be bad?
- Lots of data map to same location in cache
- Idea
- Maybe we should have multiple locations for each data value
- Next time: set associative