CACHE OPTIMIZATION: Mahdi Nazm Bojnordi, Assistant Professor, School of Computing, University of Utah (PowerPoint presentation)



SLIDE 1

CACHE OPTIMIZATION

CS/ECE 6810: Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor, School of Computing, University of Utah

SLIDE 2

Overview

- Announcement
  - Homework 3 will be released on Oct. 31st
- This lecture
  - Cache replacement policies
  - Cache write policies
  - Reducing miss penalty

SLIDE 3

Recall: Cache Optimizations

- How to improve cache performance?
- Reduce hit time (th)
  - Memory technology, critical access path
- Improve hit rate (1 - rm)
  - Size, associativity, placement/replacement policies
- Reduce miss penalty (tp)
  - Multi-level caches, data prefetching

AMAT = th + rm × tp
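The AMAT formula can be checked with a quick calculation. This is a sketch with illustrative latencies (1-cycle hit, 5% miss rate, 20-cycle miss penalty), not numbers from the slides; note that a multilevel hierarchy nests the formula, since the L1 miss penalty is itself the AMAT of the L2.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: AMAT = th + rm * tp."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1-cycle hit, 5% miss rate, 20-cycle miss penalty.
print(amat(1.0, 0.05, 20.0))                    # 2.0 cycles

# Two levels: AMAT = th_L1 + rm_L1 * (th_L2 + rm_L2 * t_mem)
print(amat(1.0, 0.05, amat(10.0, 0.5, 100.0)))  # 4.0 cycles
```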

SLIDE 4

Recall: Cache Miss Classifications

- Start by measuring miss rate with an ideal cache
  - 1. Ideal: fully associative, infinite capacity → cold (compulsory) misses
  - 2. Then reduce capacity to the size of interest → capacity misses
  - 3. Then reduce associativity to the degree of interest → conflict misses
- Cold (compulsory): first access to a block
  - How to improve: larger blocks, prefetching
- Capacity: cache is smaller than the program's data
  - How to improve: larger cache
- Conflict: set size is smaller than the number of memory locations mapped to it
  - How to improve: larger cache, more associativity
SLIDE 5

Miss Rates: Example Problem

- 100,000 loads and stores are generated; the L1 cache has 3,000 misses; the L2 cache has 1,500 misses. What are the various miss rates?

SLIDE 6

Miss Rates: Example Problem

- 100,000 loads and stores are generated; the L1 cache has 3,000 misses; the L2 cache has 1,500 misses. What are the various miss rates?
- L1 miss rates
  - Local/global: 3,000/100,000 = 3%
- L2 miss rates
  - Local: 1,500/3,000 = 50%
  - Global: 1,500/100,000 = 1.5%
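The distinction between local and global rates is just a change of denominator, which the numbers from the example make concrete:

```python
accesses = 100_000   # loads and stores seen by L1
l1_misses = 3_000
l2_misses = 1_500    # of the L1 misses forwarded to L2

# L1 local and global rates coincide: L1 sees every access.
l1_local = l1_misses / accesses    # 0.03  -> 3%
# L2 local rate: misses relative to the accesses that reach L2.
l2_local = l2_misses / l1_misses   # 0.5   -> 50%
# L2 global rate: misses relative to all processor accesses.
l2_global = l2_misses / accesses   # 0.015 -> 1.5%
print(l1_local, l2_local, l2_global)
```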

SLIDE 7-14

Cache Replacement Policies

- Which block to replace on a miss?
  - Only one candidate in a direct-mapped cache
  - Multiple candidates in a set-associative or fully associative cache
- Ideal replacement (Belady's algorithm)
  - Replace the block accessed farthest in the future
- Least recently used (LRU)
  - Replace the block accessed farthest in the past
- Most recently used (MRU)
  - Replace the block accessed nearest in the past
- Random replacement
  - Hardware randomly selects a cache block to replace

[Figure: a two-entry cache set servicing the request stream A, B, C, B, B, B, C, A under each policy]

SLIDE 15

Example Problem

- Blocks A, B, and C are mapped to a single set with only two block frames; find the miss rates for the LRU and MRU policies.
  - 1. A, B, C, A, B, C, A, B, C
  - 2. A, A, B, B, C, C, A, B, C

SLIDE 16

Example Problem

- Blocks A, B, and C are mapped to a single set with only two block frames; find the miss rates for the LRU and MRU policies.
- 1. A, B, C, A, B, C, A, B, C
  - LRU: 100%
  - MRU: 66%
- 2. A, A, B, B, C, C, A, B, C
  - LRU: 66%
  - MRU: 44%
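These answers can be verified with a small single-set simulator. This is a sketch (the function name `simulate` and its interface are chosen here, not taken from the slides); it also implements Belady's policy, shown on the A, B, C, B, B, B, C, A stream from the earlier slides.

```python
def simulate(requests, ways, policy):
    """Miss rate for one cache set with `ways` block frames.

    policy: 'lru'    evicts the block used farthest in the past,
            'mru'    evicts the block used nearest in the past,
            'belady' evicts the block needed farthest in the future.
    """
    cache, last_use, misses = [], {}, 0
    for i, blk in enumerate(requests):
        if blk not in cache:
            misses += 1
            if len(cache) == ways:              # set is full: pick a victim
                if policy == 'lru':
                    victim = min(cache, key=lambda b: last_use[b])
                elif policy == 'mru':
                    victim = max(cache, key=lambda b: last_use[b])
                else:                           # belady: look into the future
                    future = requests[i + 1:]
                    victim = max(cache,
                                 key=lambda b: future.index(b)
                                 if b in future else len(future) + 1)
                cache.remove(victim)
            cache.append(blk)
        last_use[blk] = i
    return misses / len(requests)

seq1 = list("ABCABCABC")
seq2 = list("AABBCCABC")
print(simulate(seq1, 2, 'lru'))               # 1.0   -> 100%
print(simulate(seq1, 2, 'mru'))               # 0.66... -> 66%
print(simulate(seq2, 2, 'lru'))               # 0.66... -> 66%
print(simulate(seq2, 2, 'mru'))               # 0.44... -> 44%
print(simulate(list("ABCBBBCA"), 2, 'belady'))  # 0.5: 4 misses in 8
```

LRU thrashes on sequence 1 because each block is evicted just before it is reused; MRU avoids that pathological pattern, which is why it wins here.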

SLIDE 17-20

Cache Write Policies

- Write vs. read
  - Data and tag arrays are accessed for both reads and writes
  - Only a write needs to update the data array
- Cache write policies: a write lookup either hits or misses
  - On a write miss: read the block from the lower level?
    - Yes: write allocate
    - No: write no-allocate
  - On a write hit: when is the lower level written?
    - Only when the block is evicted: write back
    - On every write: write through

[Figure: decision tree for a write lookup, branching on hit vs. miss]

SLIDE 21

Write back

- On a write access, write to the cache only
  - Write the cache block to memory only when it is replaced from the cache
  - Dramatically decreases bus bandwidth usage
  - Keep a bit (called the dirty bit) per cache block

[Figure: core, cache, and main memory; writes stop at the cache]

SLIDE 22-23

Write through

- Write to both the cache and memory (or the next level)
  - Improved miss penalty
  - More reliable because two copies are maintained
- Use a write buffer alongside the cache
  - Works fine if the rate of stores < 1 / DRAM write cycle
  - Otherwise the write buffer fills up and the processor stalls to let memory catch up

[Figure: core, cache, write buffer, and main memory]
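The bandwidth difference between the two policies shows up even in a one-block sketch: repeated stores to the same cached block reach the lower level once under write back but on every store under write through (class and method names here are illustrative, not from the slides).

```python
class WriteCounter:
    """One-block sketch counting writes that reach the lower level."""

    def __init__(self, policy):
        self.policy = policy      # 'back' or 'through'
        self.dirty = False
        self.lower_writes = 0

    def write(self):
        if self.policy == 'through':
            self.lower_writes += 1    # every store goes to the lower level
        else:
            self.dirty = True         # write back: just mark the block dirty

    def evict(self):
        if self.policy == 'back' and self.dirty:
            self.lower_writes += 1    # dirty block written once, on eviction
            self.dirty = False

# 100 stores to the same cached block, then one eviction:
for policy in ('through', 'back'):
    c = WriteCounter(policy)
    for _ in range(100):
        c.write()
    c.evict()
    print(policy, c.lower_writes)     # through: 100, back: 1
```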

SLIDE 24

Write (No-)Allocate

- Write allocate
  - Allocate a cache line for the new data and replace the old line
  - Just like a read miss
- Write no-allocate
  - Do not allocate space in the cache for the data
  - Only really makes sense in systems with write buffers
- How to handle a read miss after a write miss?
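The closing question is exactly where the two choices diverge. In this hypothetical one-block sketch (names are illustrative), a write miss followed by a read of the same block costs one miss with write allocate but two with write no-allocate, since the store bypassed the cache:

```python
def run(refs, allocate_on_write_miss):
    """One-block cache sketch; refs are ('r' | 'w', addr) pairs.
    Returns the number of misses."""
    cached = None
    misses = 0
    for op, addr in refs:
        if cached != addr:
            misses += 1
            if op == 'r' or allocate_on_write_miss:
                cached = addr        # fetch the line into the cache
            # write no-allocate: the store bypasses the cache
    return misses

refs = [('w', 0x40), ('r', 0x40)]    # write miss, then read the same block
print(run(refs, allocate_on_write_miss=True))   # 1: the read hits
print(run(refs, allocate_on_write_miss=False))  # 2: the read misses too
```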

SLIDE 25

Reducing Miss Penalty

- Some cache misses are inevitable
  - When they do happen, we want to service them as quickly as possible
- Other miss-penalty reduction techniques
  - Multilevel caches
  - Giving read misses priority over writes
  - Sub-block placement
  - Critical word first

SLIDE 26

Victim Cache

- How to reduce conflict misses?
  - Larger cache capacity
  - More associativity
- Associativity is expensive
  - More hardware; longer hit time
  - More energy consumption
- Observation
  - Conflict misses do not occur in all sets
  - Can we increase associativity on the fly, only for the sets that need it?

SLIDE 27-28

Victim Cache

- Small fully associative cache next to the main cache
  - On eviction, move the victim block to the victim cache

[Figure: a 4-way set-associative last-level cache backed by a small fully associative victim cache]
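The lookup path can be sketched as follows: probe the main cache first, then the victim cache, and swap the blocks on a victim hit so the hot block returns to the main cache. This is a simplified model (a direct-mapped main cache and made-up class and field names, not the 4-way design in the figure):

```python
from collections import OrderedDict

class VictimCached:
    """Direct-mapped main cache backed by a small FA LRU victim cache."""

    def __init__(self, sets, victim_entries):
        self.sets = sets
        self.main = {}                  # set index -> block address
        self.victim = OrderedDict()     # fully associative, LRU order
        self.victim_entries = victim_entries

    def access(self, addr):
        idx = addr % self.sets
        if self.main.get(idx) == addr:
            return 'hit'
        if addr in self.victim:         # victim hit: swap the two blocks
            self.victim.pop(addr)
            if self.main.get(idx) is not None:
                self._insert_victim(self.main[idx])
            self.main[idx] = addr
            return 'victim hit'
        if self.main.get(idx) is not None:
            self._insert_victim(self.main[idx])   # evictee -> victim cache
        self.main[idx] = addr
        return 'miss'

    def _insert_victim(self, blk):
        self.victim[blk] = True
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)       # drop the LRU victim

c = VictimCached(sets=4, victim_entries=2)
# Blocks 0 and 4 conflict in set 0 of the main cache:
print([c.access(a) for a in (0, 4, 0, 4)])
# ['miss', 'miss', 'victim hit', 'victim hit']
```

Without the victim cache, all four of those accesses would be conflict misses; the victim cache converts the ping-ponging pair into fast victim hits.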

SLIDE 29

Cache Inclusion

- How to reduce the number of accesses that miss in all cache levels?
  - Should a block be allocated in all levels?
    - Yes: inclusive cache
    - No: non-inclusive or exclusive
  - Non-inclusive: only allocated in L1
- Modern processors
  - L3: inclusive of L1 and L2
  - L2: non-inclusive of L1 (acts as a large victim cache)