Cache Control
Philipp Koehn, Computer Systems Fundamentals
16 October 2019


SLIDE 1

Cache Control
Philipp Koehn, 16 October 2019

SLIDE 2

Memory Tradeoff

  • Fastest memory is on the same chip as the CPU
    ... but it is not very big (say, 32 KB in an L1 cache)
  • Slowest memory is DRAM on different chips
    ... but it can be very large (say, 256 GB in a compute server)
  • Goal: the illusion that large memory is fast
  • Idea: use the small memory as a cache for the large memory

SLIDE 3

Simplified View

[Diagram: Processor, cache, main memory]

The smaller memory mirrors some of the large memory's content.

SLIDE 4

Direct Mapping

  • Idea: keep the mapping from cache to main memory simple
    ⇒ use part of the address as an index into the cache
  • Address is broken up into 3 parts
    – position in block (offset)
    – index
    – tag to identify the position in main memory
  • If two blocks with the same index are used, the older one is overwritten
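In software terms, the address split can be sketched as follows; a minimal illustration in C using the bit widths of the running example (the function names are made up for this sketch):

```c
#include <stdint.h>

/* Bit widths from the running example: 256-byte blocks (8 offset bits),
   1 MB cache = 4096 blocks (12 index bits); the remaining 12 bits are the tag. */
#define OFFSET_BITS 8
#define INDEX_BITS  12

uint32_t get_offset(uint32_t addr) { return addr & ((1u << OFFSET_BITS) - 1); }
uint32_t get_index(uint32_t addr)  { return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); }
uint32_t get_tag(uint32_t addr)    { return addr >> (OFFSET_BITS + INDEX_BITS); }
```

For the example address 0010 0011 1101 1100 0001 0011 1010 1111 (hex $23dc13af) this yields tag $23d, index $c13, offset $af.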

SLIDE 5

Direct Mapping: Example

  • Main memory address (32 bit)

    0010 0011 1101 1100 0001 0011 1010 1111

  • Block size: 256 bytes (8 bits)

  • Cache size: 1 MB (20 bits)

    0010 0011 1101 | 1100 0001 0011 | 1010 1111
         Tag             Index         Offset

SLIDE 6

Cache Organization

  • Mapping of the address

    0010 0011 1101 | 1100 0001 0011 | 1010 1111
         Tag             Index         Offset

  • Cache data structure

    Index | Tag (12 bits) | Valid (1 bit) | Data (256 bytes)
    ------+---------------+---------------+-----------------
     000  |               |               |
     001  |      xx       |       x       | xx xx xx xx ...
     002  |               |               |
     ...  |               |               |
     fff  |               |               |
    (4096 slots)
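The table maps directly onto a data structure; a minimal C sketch (`cache_line` and `cache` are illustrative names, not from the slides):

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 256    /* bytes per block: 8 offset bits  */
#define NUM_SLOTS 4096    /* rows in the cache: 12 index bits */

/* One row of the table: tag, valid bit, and the cached data block. */
struct cache_line {
    uint32_t tag;                /* upper 12 bits of the address   */
    bool     valid;              /* set if this row holds real data */
    uint8_t  data[BLOCK_SIZE];
};

/* The whole cache: 4096 rows, addressed directly by the index bits.
   Total data capacity: 4096 x 256 bytes = 1 MB. */
struct cache {
    struct cache_line lines[NUM_SLOTS];
};
```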

SLIDE 7

cache read

SLIDE 8

Cache Hit

[Diagram: CPU, Cache, Main Memory; the request is served from the cache]

  • Memory request from CPU
  • Data found in cache
  • Send data to CPU

SLIDE 9

Cache Circuit

[Circuit diagram: the index is decoded to select one of the cache rows; each row holds a tag, a valid bit, and a 256-byte memory block]

  • Address is split up into tag, index, and offset
  • Index contains the address of the block in the cache
  • It is decoded to select the correct row

SLIDE 10

Cache Circuit

[Circuit diagram: the stored tag is compared ("=") with the address tag; the result is ANDed with the valid bit]

  • Check tag for equality
  • Check if valid bit is set

SLIDE 11

Cache Circuit

[Circuit diagram: on a hit, the offset selects the requested byte within the 256-byte block]

  • Retrieve the correct byte from the block (identified by the offset)
  • Use the cache only if the valid bit is set and the tag is correct
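The hit path of this circuit can be modeled in software; a simplified C sketch (not hardware, and `cache_read` is an illustrative name):

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 8                      /* 256-byte blocks */
#define INDEX_BITS  12                     /* 4096 cache rows */
#define BLOCK_SIZE  (1 << OFFSET_BITS)
#define NUM_SLOTS   (1 << INDEX_BITS)

struct cache_line { uint32_t tag; bool valid; uint8_t data[BLOCK_SIZE]; };

/* Returns true on a hit and stores the requested byte in *out. */
bool cache_read(const struct cache_line *lines, uint32_t addr, uint8_t *out)
{
    uint32_t offset = addr & (BLOCK_SIZE - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & (NUM_SLOTS - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    const struct cache_line *line = &lines[index];  /* decoder: select row          */
    if (line->valid && line->tag == tag) {          /* "=" comparator AND valid bit */
        *out = line->data[offset];                  /* offset selects the byte      */
        return true;
    }
    return false;                                   /* miss: go to main memory      */
}
```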

SLIDE 12

cache miss

SLIDE 13

Cache Miss

[Diagram: CPU, Cache, Main Memory; the request travels through the cache to main memory and back]

  • Memory request from CPU
  • Data not found in cache
  • Memory request from cache to main memory
  • Send data from memory to cache
  • Store data in cache
  • Send data to CPU
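The steps above can be sketched as a miss handler; an illustrative C model in which `main_mem` stands in for DRAM:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_BITS 8
#define INDEX_BITS  12
#define BLOCK_SIZE  (1 << OFFSET_BITS)
#define NUM_SLOTS   (1 << INDEX_BITS)

struct cache_line { uint32_t tag; bool valid; uint8_t data[BLOCK_SIZE]; };

/* On a miss: fetch the whole block from main memory, install it in the
   selected slot (replacing whatever was there), then serve the byte. */
uint8_t handle_miss(struct cache_line *lines, const uint8_t *main_mem, uint32_t addr)
{
    uint32_t offset      = addr & (BLOCK_SIZE - 1);
    uint32_t index       = (addr >> OFFSET_BITS) & (NUM_SLOTS - 1);
    uint32_t block_start = addr & ~(uint32_t)(BLOCK_SIZE - 1);

    struct cache_line *line = &lines[index];
    memcpy(line->data, main_mem + block_start, BLOCK_SIZE);  /* memory -> cache */
    line->tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    line->valid = true;
    return line->data[offset];                               /* cache -> CPU    */
}
```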

SLIDE 14

Cache Miss

  • Requires load of block from main memory
  • Blocks execution of instructions
  • Recall discussion of memory access speeds

– CPU clock cycle: 3 GHz → 0.33 ns per instruction
– DRAM speed: 50 ns
⇒ significant delay (≈150 instruction cycles stalled)

SLIDE 15

Block Loading

[Circuit diagram: as before, but the missing 256-byte block is transferred from main memory into the selected cache row]

  • Example

– block size: 256 bytes
– request to read memory address $00d3ff53

  • Cache miss triggers read of block $00d3ff00-$00d3ffff

SLIDE 16

Read $00d3ff53

[Diagram: bytes of the block transferred in address order (offsets 00, 10, 20, ..., f0); the requested byte at offset 53 arrives only partway through]

Transfer Block from Main Memory

  • But: this requires $53 read cycles before the relevant byte is loaded

SLIDE 17

Better

[Diagram: transfer starts at the requested byte (offset 53) and wraps around the block]

Transfer Block from Main Memory

  • Read requested byte first
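This "requested byte first" order (critical word first) can be sketched as (illustrative C; `transfer_order` is a made-up name):

```c
#include <stdint.h>

#define BLOCK_SIZE 256

/* Fill `order` with the offsets in the order they are transferred:
   start at the requested offset and wrap around the block, so the
   critical byte arrives in the very first transfer cycle. */
void transfer_order(uint32_t requested_offset, uint32_t order[BLOCK_SIZE])
{
    for (uint32_t i = 0; i < BLOCK_SIZE; i++)
        order[i] = (requested_offset + i) % BLOCK_SIZE;
}
```

For the example address $00d3ff53 the bytes arrive as $53, $54, ..., $ff, $00, ..., $52, so the CPU can resume after the first transfer while the rest of the block streams in.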

SLIDE 18

cache write

SLIDE 19

Write Through

[Diagram: CPU, Cache, Main Memory]

  • Writes change the value in the cache
  • Write through: immediately store the changed value in memory
  • Drawback: slows down every write

SLIDE 20

Write Back

[Diagram: CPU, Cache, Main Memory]

  • Only change the value in the cache
  • Record that the cache block has changed with a "dirty bit"
  • Write back to RAM only when the block is evicted
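A write-back store and the later write-back on eviction can be sketched as (illustrative C; `write_back_store` and `evict` are made-up names):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 256

struct cache_line { uint32_t tag; bool valid; bool dirty; uint8_t data[BLOCK_SIZE]; };

/* Write hit: change only the cached copy and set the dirty bit. */
void write_back_store(struct cache_line *line, uint32_t offset, uint8_t value)
{
    line->data[offset] = value;
    line->dirty = true;                 /* main memory is now out of date */
}

/* Eviction: copy the block back to main memory only if it was changed. */
void evict(struct cache_line *line, uint8_t *main_mem, uint32_t block_start)
{
    if (line->valid && line->dirty)
        memcpy(main_mem + block_start, line->data, BLOCK_SIZE);
    line->valid = false;
    line->dirty = false;
}
```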

SLIDE 21

Write Buffer

  • CPU does not need to wait for write to finish
  • Write buffer

– store the value in the write buffer
– transfer values from the write buffer to main memory in the background
– free the write buffer entry

  • This works fine, unless the process overloads the write buffer
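The write buffer is essentially a small FIFO between cache and main memory; a minimal single-threaded ring-buffer sketch in C (all names illustrative; a real buffer is filled and drained concurrently):

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 8                 /* write buffers are typically small */

struct wb_entry { uint32_t addr; uint8_t value; };

struct write_buffer {
    struct wb_entry entries[WB_ENTRIES];
    unsigned head, tail;             /* ring buffer indices */
};

/* CPU side: returns false if the buffer is full (the CPU must stall). */
bool wb_push(struct write_buffer *wb, uint32_t addr, uint8_t value)
{
    if (wb->tail - wb->head == WB_ENTRIES)
        return false;                                /* overloaded: stall      */
    wb->entries[wb->tail % WB_ENTRIES] = (struct wb_entry){ addr, value };
    wb->tail++;
    return true;                                     /* CPU continues at once  */
}

/* Memory side: drain one entry to main memory in the background. */
bool wb_drain_one(struct write_buffer *wb, uint8_t *main_mem)
{
    if (wb->head == wb->tail)
        return false;                                /* nothing pending        */
    struct wb_entry *e = &wb->entries[wb->head % WB_ENTRIES];
    main_mem[e->addr] = e->value;
    wb->head++;                                      /* free the buffer entry  */
    return true;
}
```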

SLIDE 22

Write Miss

  • Problem: CPU writes to address X, but X is not cached
  • Need to load the block into the cache first
  • Write allocate
    – allocate a cache slot
    – write in the value for X
    – load the remaining values from main memory
    – set the dirty bit
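These steps can be sketched as (illustrative C; this sketch loads the block first and then overwrites the written byte, which has the same effect):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_BITS 8
#define INDEX_BITS  12
#define BLOCK_SIZE  (1 << OFFSET_BITS)
#define NUM_SLOTS   (1 << INDEX_BITS)

struct cache_line { uint32_t tag; bool valid; bool dirty; uint8_t data[BLOCK_SIZE]; };

/* Write miss with write allocate: install the block in its slot, fill it
   from main memory, write the new byte, and mark the block dirty. */
void write_allocate(struct cache_line *lines, const uint8_t *main_mem,
                    uint32_t addr, uint8_t value)
{
    uint32_t offset      = addr & (BLOCK_SIZE - 1);
    uint32_t index       = (addr >> OFFSET_BITS) & (NUM_SLOTS - 1);
    uint32_t block_start = addr & ~(uint32_t)(BLOCK_SIZE - 1);

    struct cache_line *line = &lines[index];                /* allocate the slot   */
    memcpy(line->data, main_mem + block_start, BLOCK_SIZE); /* load the block      */
    line->data[offset] = value;                             /* write the value X   */
    line->tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    line->valid = true;
    line->dirty = true;                                     /* set the dirty bit   */
}
```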

SLIDE 23

split cache

SLIDE 24

MIPS Pipeline

IF → ID → EX → MEM → WB

  • 2 stages access memory
    – IF: instruction fetch loads the current instruction
    – MEM: memory stage reads and writes data
  ⇒ 2 memory caches in the processor
    – instruction memory
    – data memory

SLIDE 25

Architecture

[Diagram: CPU connects to an instruction cache and a data cache, both backed by main memory]

SLIDE 26

Comments

  • IF and MEM operations can be executed simultaneously
  • Possible drawback: the same memory block in both caches
    ... but this is very unlikely: code and data are usually separated
  • Cache misses are possible in both caches
    → contention for memory lookup, blocking
  • Instruction cache is simpler: no writes
