CACHE ARCHITECTURE


SLIDE 1

CACHE ARCHITECTURE

CS/ECE 6810: Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor, School of Computing, University of Utah

SLIDE 2

Overview

• Announcement
  - Homework 3 will be released on Oct. 31st
• This lecture
  - Cache addressing and lookup
  - Cache optimizations
    - Techniques to improve miss rate
    - Replacement policies
    - Write policies

SLIDE 3

Recall: Cache Addressing

• Instead of specifying a cache address, we specify a main memory address
• Simplest: direct-mapped cache

[Figure: a 16-entry memory (addresses 0000 to 1111) mapped onto a 4-entry cache (indices 00 to 11)]

Note: each memory address maps to a single cache location determined by modulo hashing

How do we specify exactly which blocks are in the cache?

SLIDE 4

Direct-Mapped Lookup

• Byte offset: selects the requested byte within the block
• Tag: maintains the rest of the address for identification
• Valid flag (v): indicates whether the content is meaningful
• Data and tag are always accessed together

[Figure: lookup datapath for a 1024-line direct-mapped cache; the address splits into tag, index, and byte offset fields; the index selects one line (rows 1 to 1023 shown), the stored tag is compared (=) with the address tag, and hit = valid AND tag match]

SLIDE 5

Example Problem

• Find the size of the tag, index, and offset bits for an 8MB, direct-mapped L3 cache with 64B cache blocks. Assume that the processor can address up to 4GB of main memory.

SLIDE 6

Example Problem

• Find the size of the tag, index, and offset bits for an 8MB, direct-mapped L3 cache with 64B cache blocks. Assume that the processor can address up to 4GB of main memory.

• 4GB = 2^32 B → address bits = 32
• 64B = 2^6 B → byte offset bits = 6
• 8MB/64B = 2^17 blocks → index bits = 17
• tag bits = 32 - 6 - 17 = 9


SLIDE 9

Cache Optimizations

• How to improve cache performance?
• Reduce hit time (th)
• Improve hit rate (1 - rm)
• Reduce miss penalty (tp)

AMAT = th + rm × tp


SLIDE 12

Cache Optimizations

• How to improve cache performance?
• Reduce hit time (th)
  - Memory technology, critical access path
• Improve hit rate (1 - rm)
  - Size, associativity, placement/replacement policies
• Reduce miss penalty (tp)
  - Multi-level caches, data prefetching

AMAT = th + rm × tp


SLIDE 14

Set Associative Caches

• Improve the cache hit rate by allowing a memory location to be placed in more than one cache block
  - N-way set associative cache
  - Fully associative cache
• For fixed capacity, higher associativity typically leads to higher hit rates
  - more places to simultaneously map cache lines
  - 8-way set associative is close to fully associative in practice

[Figure: two memory locations a and b map to the same set of a 2-way cache; they can reside in way 0 and way 1 simultaneously, so the loop for (i=0; i<10000; i++) { a++; b++; } keeps both in the cache]

SLIDE 15

n-Way Set Associative Lookup

• Index selects a cache set
• Multiple tag comparisons (one per way)
• Multiple data reads (one per way)
• Special cases
  - Direct mapped: single-block sets
  - Fully associative: single-set cache

[Figure: lookup datapath for a 512-set cache; the index selects one set (rows 1 to 511 shown), each way's stored tag is compared (=) with the address tag, the per-way matches are ORed into the hit signal, and a mux selects the data from the matching way]

SLIDE 16

Example Problem

• Find the size of the tag, index, and offset bits for a 4MB, 4-way set associative cache with 32B cache blocks. Assume that the processor can address up to 4GB of main memory.

SLIDE 17

Example Problem

• Find the size of the tag, index, and offset bits for a 4MB, 4-way set associative cache with 32B cache blocks. Assume that the processor can address up to 4GB of main memory.

• 4GB = 2^32 B → address bits = 32
• 32B = 2^5 B → byte offset bits = 5
• 4MB/(4 × 32B) = 2^15 sets → index bits = 15
• tag bits = 32 - 5 - 15 = 12

SLIDE 18

Cache Miss Classifications

• Start by measuring the miss rate with an ideal cache
  - 1. the ideal cache is fully associative with infinite capacity
  - 2. then reduce the capacity to the size of interest
  - 3. then reduce the associativity to the degree of interest

SLIDE 24

Cache Miss Classifications

• Start by measuring the miss rate with an ideal cache
  - 1. the ideal cache is fully associative with infinite capacity
  - 2. then reduce the capacity to the size of interest
  - 3. then reduce the associativity to the degree of interest

• 1. Cold (compulsory)
  - Cold start: first access to a block
  - How to improve: large blocks, prefetching
• 2. Capacity
  - Cache is smaller than the program data
  - How to improve: large cache
• 3. Conflict
  - Set size is smaller than the number of memory locations mapped to it
  - How to improve: large cache, more associativity
SLIDE 25

Miss Rates: Example Problem

• 100,000 loads and stores are generated; the L1 cache has 3,000 misses; the L2 cache has 1,500 misses. What are the various miss rates?

SLIDE 26

Miss Rates: Example Problem

• 100,000 loads and stores are generated; the L1 cache has 3,000 misses; the L2 cache has 1,500 misses. What are the various miss rates?

• L1 miss rates
  - Local/global: 3,000/100,000 = 3%
• L2 miss rates
  - Local: 1,500/3,000 = 50%
  - Global: 1,500/100,000 = 1.5%