ECE232: Hardware Organization and Design, Lecture 22: Introduction to Caches (PowerPoint PPT presentation)



SLIDE 1

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

ECE232: Hardware Organization and Design

Lecture 22: Introduction to Caches

SLIDE 2

ECE232: Introduction to Caches 2

Overview

  • Caches hold a subset of data from the main memory
  • Three types of caches
  • Direct mapped
  • Set associative
  • Fully associative
  • Today: Direct mapped
  • Each memory value can only be in one place in the cache
  • Is it there (Hit?)
  • Or is it not there (Miss?)
SLIDE 3

Direct Mapped Cache - Textbook

  • Location determined by address
  • Direct mapped: only one choice
  • (Block address) modulo (#Blocks in cache)
  • #Blocks is a power of 2
  • Use low-order address bits as the index
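Because #Blocks is a power of 2, taking the block address modulo #Blocks is the same as keeping its low-order bits. A minimal sketch, assuming a hypothetical 8-block cache:

```python
NUM_BLOCKS = 8  # must be a power of 2

def cache_index(block_address):
    """Index by modulo, as in the textbook formula."""
    return block_address % NUM_BLOCKS

def cache_index_low_bits(block_address):
    """Equivalent: mask off the low-order log2(NUM_BLOCKS) = 3 bits."""
    return block_address & (NUM_BLOCKS - 1)

# Both forms agree for every block address
for addr in range(64):
    assert cache_index(addr) == cache_index_low_bits(addr)

print(cache_index(29))  # 29 mod 8 = 5
```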

SLIDE 4

Direct mapped cache (assume 1 byte/Block)

  • Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12
  • Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13
  • Cache block 2 can be occupied by data from memory blocks 2, 6, 10, 14
  • Cache block 3 can be occupied by data from memory blocks 3, 7, 11, 15

[Figure: 4-block direct mapped cache; memory blocks 0-15 map to cache indices 0-3; addresses 0000₂, 0100₂, 1000₂, 1100₂ all map to cache index 0]

SLIDE 5

Direct Mapped Cache – Index and Tag

  • Index determines the block in the cache
  • index = (block address) mod (# blocks)
  • The number of cache blocks is a power of 2 (2ⁿ) → the cache index is the lower n bits of the memory block address

[Figure: memory block address split into tag and index fields; memory blocks 0-15 map to cache indices 0-3 (addresses 00 00₂, 01 00₂, 10 00₂, 11 00₂ share index 00); 1 byte per block]

SLIDE 6

Direct Mapped w/Tag


  • Tag determines which memory block occupies the cache block
  • Hit: cache tag field = tag bits of the address
  • Miss: cache tag field ≠ tag bits of the address

[Figure: memory block address split into tag and index fields; the cache stores the tag (e.g., 11) alongside each block so a lookup can confirm which memory block is present]

SLIDE 7

Direct Mapped Cache

  • The simplest mapping is a direct mapped cache
  • Each memory address is associated with one possible block within the cache
  • Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache

SLIDE 8

Finding Item within Block

  • In reality, a cache block consists of a number of bytes/words to (1) increase the cache hit rate due to the locality property and (2) reduce the cache miss time
  • Given the address of an item, the index tells which block of the cache to look in
  • Then, how do we find the requested item within the cache block?
  • Or, equivalently, "What is the byte offset of the item within the cache block?"

SLIDE 9

Selecting part of a block (block size > 1 byte)

  • If block size > 1, the rightmost bits of the address are the byte offset within the indexed block
  • Address fields: TAG | INDEX | OFFSET
  • Tag: checks whether we have the correct block; Index: selects a block in the cache; Offset: selects the byte within the block
  • Example: block size of 8 bytes; memory address 11 01 100₂ has tag 11, cache index 01, and offset 100₂ = 4, selecting byte 4 (the 2nd word)
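The tag/index/offset split can be computed with shifts and masks; a sketch assuming the slide's parameters (4 cache blocks, 8 bytes per block):

```python
OFFSET_BITS = 3  # 8 bytes per block
INDEX_BITS = 2   # 4 cache blocks

def split_address(addr):
    """Return the (tag, index, offset) fields of a memory address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Address 11 01 100 from the slide: tag = 11, index = 01, offset = 100 (byte 4)
print(split_address(0b1101100))  # (3, 1, 4)
```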

SLIDE 10

Accessing data in a direct mapped cache

  • Three types of events:
  • Cache hit: cache block is valid and contains the proper address, so read the desired word
  • Cache miss: nothing in cache at the appropriate block, so fetch from memory
  • Cache miss, block replacement: wrong data is in cache at the appropriate block, so discard it and fetch the desired data from memory
  • Cache access procedure:
  • (1) Use the index bits to select the cache block
  • (2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits
  • (3) If they match, use the offset to read out the word/byte

SLIDE 11

Tags and Valid Bits

  • How do we know which particular block is stored in a cache location?

  • Store block address as well as the data
  • Actually, only need the high-order bits
  • Called the tag
  • What if there is no data in a location?
  • Valid bit: 1 = present, 0 = not present
  • Initially 0
SLIDE 12

Cache Example

  • 8-blocks, 1 byte/block, direct mapped
  • Initial state

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N

SLIDE 13

Cache Example

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
22    10 110       Miss      110

SLIDE 14

Cache Example

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
26    11 010       Miss      010

SLIDE 15

Cache Example

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
22    10 110       Hit       110
26    11 010       Hit       010

SLIDE 16

Cache Example

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
16    10 000       Miss      000
3     00 011       Miss      011
16    10 000       Hit       000

SLIDE 17

Cache Example

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Addr  Binary addr  Hit/miss  Cache block
18    10 010       Miss      010   (replaces Mem[11010], tag 11)
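The access sequence from this example (addresses 22, 26, 22, 26, 16, 3, 16, 18 against an 8-block, 1-byte-per-block cache) can be replayed with a minimal direct mapped cache simulator; this is an illustrative sketch, not production code:

```python
NUM_BLOCKS = 8  # 8 blocks, 1 byte/block: index = low 3 bits of the address

def simulate(addresses):
    """Replay a sequence of accesses, returning 'hit'/'miss' per access."""
    cache = [{"valid": False, "tag": None} for _ in range(NUM_BLOCKS)]
    results = []
    for addr in addresses:
        index = addr % NUM_BLOCKS
        tag = addr // NUM_BLOCKS
        line = cache[index]
        if line["valid"] and line["tag"] == tag:
            results.append("hit")
        else:
            results.append("miss")  # fetch from memory and (re)fill the block
            line["valid"], line["tag"] = True, tag
    return results

print(simulate([22, 26, 22, 26, 16, 3, 16, 18]))
# ['miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'hit', 'miss']
```

The final miss is the block-replacement case: address 18 (10 010₂) indexes block 010, which holds tag 11 from address 26, so the old contents are evicted.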

SLIDE 18

Example: Larger Block Size

  • 64 blocks, 16 bytes/block
  • To what block number does address 1200 map?
  • Block address = 1200/16 = 75
  • Block number = 75 modulo 64 = 11

Address fields: Tag = bits 31-10 (22 bits) | Index = bits 9-4 (6 bits) | Offset = bits 3-0 (4 bits)
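The arithmetic on this slide can be checked directly; a sketch assuming the slide's 64-block, 16-bytes-per-block cache:

```python
BLOCK_SIZE = 16   # bytes per block
NUM_BLOCKS = 64

addr = 1200
block_address = addr // BLOCK_SIZE         # 1200 / 16 = 75
block_number = block_address % NUM_BLOCKS  # 75 mod 64 = 11
print(block_address, block_number)  # 75 11
```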

SLIDE 19

Block Size Considerations

  • Larger blocks should reduce the miss rate
  • Due to spatial locality
  • But in a fixed-sized cache:
  • Larger blocks → fewer of them
  • More competition → increased miss rate
  • Larger blocks → pollution
  • Larger miss penalty
  • Can override the benefit of the reduced miss rate
  • Early restart and critical-word-first can help
SLIDE 20

Cache Misses

  • On cache hit, CPU proceeds normally
  • On cache miss
  • Stall the CPU pipeline
  • Fetch block from next level of hierarchy
  • Instruction cache miss
  • Restart instruction fetch
  • Data cache miss
  • Complete data access
SLIDE 21

Write-Through

  • On data-write hit, could just update the block in cache
  • But then cache and memory would be inconsistent
  • Write through: also update memory
  • But makes writes take longer
  • e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles
  • Effective CPI = 1 + 0.1 × 100 = 11
  • Solution: write buffer
  • Holds data waiting to be written to memory
  • CPU continues immediately
  • Only stalls on write if write buffer is already full
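The effective-CPI arithmetic for write-through without a write buffer can be checked directly, using the slide's figures:

```python
base_cpi = 1.0
store_fraction = 0.10   # 10% of instructions are stores
write_cycles = 100      # cycles per write to memory

# Without a write buffer, every store stalls for the full memory write
effective_cpi = base_cpi + store_fraction * write_cycles
print(effective_cpi)  # effective CPI = 11
```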
SLIDE 22

Write-Back

  • Alternative: On data-write hit, just update the block in cache
  • Keep track of whether each block is dirty
  • When a dirty block is replaced
  • Write it back to memory
  • Can use a write buffer to allow replacing block to be read first
SLIDE 23

Measuring Cache Performance

  • Components of CPU time
  • Program execution cycles
  • Includes cache hit time
  • Memory stall cycles
  • Mainly from cache misses
  • With simplifying assumptions:

Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
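A worked instance of the stall-cycle formula; all the counts below are hypothetical figures chosen for illustration, not from the slides:

```python
instructions = 1_000_000   # instructions per program (hypothetical)
accesses_per_instr = 1.2   # memory accesses per instruction (hypothetical)
miss_rate = 0.02           # hypothetical
miss_penalty = 50          # cycles (hypothetical)

memory_accesses = instructions * accesses_per_instr
memory_stall_cycles = memory_accesses * miss_rate * miss_penalty
print(memory_stall_cycles)
```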

SLIDE 24

Average Access Time

  • Hit time is also important for performance
  • Average memory access time (AMAT)
  • AMAT = Hit time + Miss rate × Miss penalty
  • Example
  • CPU with 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%

  • AMAT = 1 + 0.05 × 20 = 2ns
  • 2 cycles per instruction
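The AMAT formula, as a small sketch using the example's numbers:

```python
hit_time = 1        # cycles
miss_rate = 0.05
miss_penalty = 20   # cycles

amat = hit_time + miss_rate * miss_penalty
print(amat)  # 2.0 cycles, i.e. 2 ns with a 1 ns clock
```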
SLIDE 25

Summary

  • Today: Direct mapped cache
  • Performance: tied to whether values are located in the cache
  • Cache miss = bad performance
  • Need to understand how to numerically determine system performance based on the cache hit rate
  • Why might direct mapped caches be bad?
  • Lots of data map to the same location in the cache
  • Idea: maybe we should have multiple locations for each data value
  • Next time: set associative caches