

SLIDE 1

Cache Memory

Raul Queiroz Feitosa

SLIDE 2 Cache Memory 2 10/09/2020

Content

  • Memory Hierarchy
  • Principle of Locality
  • Some Definitions
  • Cache Architectures
  • Fully Associative
  • Direct Mapping
  • Set Associative
  • Replacement Policy
  • Main Memory Update Policy
SLIDE 3

Memory Hierarchy

  • Tradeoff between cost and speed
  • Memory is split into hierarchical levels
  • A request is sent to the next level below until it is carried out.

(Figure: hierarchy pyramid annotated with cost/bit, speed, capacity, access time and access probability per level.)
SLIDE 4

Cache operation – overview

  • CPU requests contents of memory location
  • Check cache for this data
  • If present, get from cache (fast)
  • If not present, read required block from main memory to cache
  • Then deliver from cache to CPU
  • Cache includes tags to identify which block of main memory is in each cache slot

SLIDE 5

Cache Read Operation

SLIDE 6

Cache and Main Memory

from now on

SLIDE 7

Cache Addressing

  • Where does cache sit?
  • Between processor and virtual memory management unit
  • Between MMU and main memory
  • Logical cache (virtual cache) stores data using virtual addresses
  • Processor accesses cache directly, not through physical cache
  • Cache access faster, before MMU address translation
  • Virtual addresses use same address space for different applications
  • Must flush cache on each context switch
  • Physical cache stores data using main memory physical addresses

SLIDE 8

Principle of locality

  • Spatial

The processor tends to access few restricted areas of the address space.

  • Temporal

The processor tends to access in the near future addresses accessed in the recent past.

SLIDE 9

Definitions

  • Hit: access served by the cache
  • Miss: access not served by the cache
  • Hit ratio (h): proportion of accesses served by the cache
  • Miss ratio (m): proportion of accesses not served by the cache
  • Clearly m + h = 1

h = (number of accesses served by the cache) / (total number of accesses)

m = (number of accesses not served by the cache) / (total number of accesses)

SLIDE 10

Definitions

Example: Let

 h be the hit ratio,
 thit the access time on a hit,
 tmiss the access time on a miss.

The average memory access time t will be: t = h · thit + (1 − h) · tmiss

(Figure: t as a function of h, falling from tmiss at h = 0 to thit at h = 1.)
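As a quick numeric check of the formula above, with hypothetical timings (not from the slides):

```python
# Hypothetical figures: h = 0.95, thit = 2 ns, tmiss = 60 ns.
h, t_hit, t_miss = 0.95, 2.0, 60.0

# Average memory access time: t = h * thit + (1 - h) * tmiss
t = h * t_hit + (1 - h) * t_miss
print(t)  # ≈ 4.9 ns
```

Even a 5% miss ratio more than doubles the average access time relative to thit, which is why high hit ratios matter so much.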

SLIDE 11

Definitions

Block

 A set of 2^b bytes at consecutive addresses, starting at an address whose b least significant bits are zero.

 Note that the addresses of the bytes belonging to the same block coincide to the left of the b least significant bits.

 The data exchange between the cache and the main memory is carried out block-by-block.

Does it make sense?

(Figure: with b = 3, addresses 00000000–00000111 form block 0 and addresses 00001000–00001111 form block 1.)
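The definition above can be sketched in code; with b = 3 (8-byte blocks, as in the figure), the block number is just the address with its b least significant bits dropped:

```python
b = 3  # hypothetical block size of 2**b = 8 bytes, matching the figure

def block_number(addr: int) -> int:
    # Bytes whose addresses coincide above the b least significant bits
    # belong to the same block.
    return addr >> b

print(block_number(0b00000001))  # 0: byte 1 lies in block 0
print(block_number(0b00001001))  # 1: byte 9 lies in block 1
```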

SLIDE 12

Fully Associative Cache

Architecture

 VALUE: contains a copy of the memory block.
 TAG: contains the number of the block copied in that line.
 valid bit: indicates if the line contains a valid memory block copy.

(Figure: 2^L cache lines, numbered 0 … 2^L − 1, each with a valid bit, a TAG and a VALUE field.)

SLIDE 13

Fully Associative Cache

Operation

 The cache controller compares the block number with the TAG field of all lines simultaneously (associative search).

 If a TAG matches the block number and the valid bit is “on”, it is a hit; otherwise it is a miss.

 The b least significant bits are used as a pointer to the byte/word within the block.

(Figure: address generated by the CPU — bits a−1 … b hold the block number of the addressed byte/word; bits b−1 … 0 point to the byte/word in the block.)
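A minimal software sketch of this lookup follows; the Line class and its field names are illustrative, not from the slides (real hardware compares all TAGs in parallel rather than looping):

```python
# Sketch of a fully associative lookup; field names are assumptions.
class Line:
    def __init__(self):
        self.valid = False  # valid bit
        self.tag = None     # number of the block copied into this line
        self.value = None   # copy of one memory block (2**b bytes)

def lookup(lines, addr, b):
    block = addr >> b                # block number: address bits a-1 .. b
    offset = addr & ((1 << b) - 1)   # bits b-1 .. 0 point into the block
    for line in lines:               # hardware does this comparison at once
        if line.valid and line.tag == block:
            return line.value[offset]  # hit
    return None                        # miss: block must be fetched
```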

SLIDE 14

Fully Associative Cache

Problem

To compare the block number with the TAG fields of all cache lines simultaneously (associative search), one needs lots of comparators.

Consequence

The fully associative design is only used for small capacity caches.

SLIDE 15

Direct Mapped Caches

Basic Idea:

 Assign each main memory block to a single cache line.

(Figure: mapping f from main memory blocks onto cache lines.)

SLIDE 16

Direct Mapped Caches

Basic Idea:

Each main memory block can only be loaded into the cache line it is mapped to.

Thus, it is no longer necessary to check all lines, but just one.

SLIDE 17

Direct Mapped Caches

Operation:

 The cache controller compares the address field to the left with the TAG field of the (single) cache line selected by the L bits.

(Figure: address generated by the CPU — bits a−1 … b+L are compared with the TAG field; bits b+L−1 … b point to a cache line; bits b−1 … 0 point to a byte/word within the block.)
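The address split above can be sketched as follows; the field widths b = 3 and L = 4 are hypothetical, not from the slides:

```python
b, L = 3, 4  # hypothetical: 8-byte blocks, 2**4 = 16 cache lines

def split(addr: int):
    offset = addr & ((1 << b) - 1)        # bits b-1 .. 0: byte within block
    index = (addr >> b) & ((1 << L) - 1)  # bits b+L-1 .. b: cache line
    tag = addr >> (b + L)                 # bits a-1 .. b+L: compared with TAG
    return tag, index, offset

print(split(0b1_0110_101))  # (1, 6, 5)
```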

SLIDE 18

Direct Mapped Caches

Problem:

Some lines may be often requested by different blocks, while other lines are rarely requested, which implies a non-optimal use of the cache capacity.

SLIDE 19

Set Associative Caches

Basic Idea:

 Instead of assigning each main memory block to a single cache line, assign each block to a set (associative) of cache lines.

(Figure: mapping f from main memory blocks onto associative sets of cache lines.)

SLIDE 20

Set Associative Caches

Basic Idea:

 A block may be loaded into any cache line of the associative set it is assigned to.

(Figure: mapping f from main memory blocks onto associative sets of cache lines.)

SLIDE 21

Set Associative Caches

Architecture

(Figure: 2^S associative sets, numbered 0 … 2^S − 1; each set contains lines 0 … 2^c − 1, and each line holds a valid bit v, a TAG and a VALUE field.)
SLIDE 22

Set Associative Caches

Operation: assume that there are 2^S sets

 The cache controller compares the address field to the left with the TAG field of all lines of the associative set selected by the S bits (associative search).

(Figure: address generated by the CPU — bits a−1 … b+S are compared with the TAG fields; bits b+S−1 … b point to a set; bits b−1 … 0 point to a byte/word within the block.)
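In code, the only difference from the direct mapped split is that the middle field now selects a set rather than a single line; the widths b = 3 and S = 4 are hypothetical:

```python
b, S = 3, 4  # hypothetical: 8-byte blocks, 2**4 = 16 associative sets

def set_and_tag(addr: int):
    index = (addr >> b) & ((1 << S) - 1)  # bits b+S-1 .. b select the set
    tag = addr >> (b + S)                 # compared with every TAG in the set
    return index, tag

print(set_and_tag(0b1_0110_101))  # (6, 1)
```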

SLIDE 23

Set Associative Caches

 Fully Associative Caches:

Are set associative caches with a single associative set.

 Direct Mapped Caches:

Are set associative caches whose “associative” sets each contain a single line.

SLIDE 24

Set Associative Caches

Set size:

Keeping the overall cache capacity constant and changing the number of lines/set:

(Figure: miss ratio versus lines/set, for 2^0 (direct mapped), 2^1 (two-way), 2^2 (four-way), 2^3 (eight-way) and fully associative.)

Above 4 lines/set the miss ratio does not change significantly.

SLIDE 25

Set Associative Caches

(Figure: hit ratio versus cache size, from 1k to 1M bytes, for direct, 2-way, 4-way, 8-way and 16-way organizations.)
SLIDE 26

Replacement Policy

 Least Recently Used – LRU

 The least recently used line will be removed from the cache to make room for a new main memory block.

 Pseudo LRU

Example: a four-way set associative cache

 The least recently used line of the least recently used half is elected to leave the cache.

(Figure: decision tree — bit I0 points to the least recently used half; bits I1 and I2 each point to the least recently used line within their half.)
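The three-bit tree on the slide can be sketched for one four-way set as follows; the update rule (point the bits away from the line just touched) is the usual pseudo-LRU convention:

```python
# Sketch of 4-way pseudo-LRU with the slide's three bits I0, I1, I2.
# Lines 0 and 1 form one half, lines 2 and 3 the other.
class PseudoLRU4:
    def __init__(self):
        self.I0 = 0  # points to the least recently used half
        self.I1 = 0  # LRU line within half {0, 1}
        self.I2 = 0  # LRU line within half {2, 3}

    def victim(self):
        # Follow I0 to the LRU half, then I1 or I2 to the LRU line.
        return self.I1 if self.I0 == 0 else 2 + self.I2

    def touch(self, line):
        # Point the bits away from the line just accessed.
        if line < 2:
            self.I0, self.I1 = 1, 1 - line
        else:
            self.I0, self.I2 = 0, 3 - line
```

With only 3 bits per set instead of the bookkeeping needed for exact LRU, pseudo-LRU tracks an approximation of the true access order.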
SLIDE 27

Replacement Policy

  • Example:
 A four-way set associative cache; the lines in the set are initially empty (only accesses to this set are shown).
  • LRU
  • Pseudo LRU

(Table: for the access sequence a b a c b a d c b a …, the slide traces the set contents after each access, once under LRU and once under pseudo-LRU.)
SLIDE 28

Main Memory Update Policy

  • Write Through

 All writes are carried out in the cache and in the main

memory.

 The CPU does not halt until the main memory is updated.

  • Problem

 Lots of traffic → especially harmful in multiprocessors
 15% of memory references are writes.

SLIDE 29

Main Memory Update Policy

  • Write Back

 Each cache line has a bit (dirty) that indicates, when set (=1), that the block copy in the cache differs from the main memory copy.

 When the block is brought from main memory into the cache, dirty = 0.

 All writes are performed in the cache only and, in this case, dirty = 1.

 The main memory is updated when the block selected for replacement has dirty = 1.

  • I/O must access main memory through cache
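The dirty-bit bookkeeping above can be sketched like this; the CacheLine class and its field names are illustrative, not from the slides:

```python
# Sketch of write-back bookkeeping for one cache line.
class CacheLine:
    def __init__(self, block_no, value):
        self.block_no = block_no
        self.value = bytearray(value)
        self.dirty = False  # dirty = 0 when the block is brought in

def write(line, offset, byte):
    line.value[offset] = byte  # writes go to the cache only ...
    line.dirty = True          # ... and set dirty = 1

def evict(line, main_memory):
    # Main memory is updated only if the evicted copy was modified.
    if line.dirty:
        main_memory[line.block_no] = bytes(line.value)
```

A clean line can be dropped without any memory traffic, which is the point of the policy.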
SLIDE 30

Multilevel Caches

  • High logic density enables caches on chip
  • Faster than bus access
  • Frees bus for other transfers
  • Common to use both on and off chip cache
  • L1 on chip, L2 off chip in static RAM
  • L2 access much faster than DRAM or ROM
  • L2 often uses separate data path
  • L2 may now be on chip
  • Resulting in L3 cache: bus access, or now on chip…
SLIDE 31

Multilevel Caches (L1 & L2)

 A hit is counted in either cache.
  • Only advantageous if L2 > L1
SLIDE 32

Unified v Split Caches

  • One cache for data and instructions, or two: one for data and one for instructions
  • Advantages of unified cache
  • Higher hit rate
  • Balances load of instruction and data fetch
  • Only one cache to design & implement
  • Advantages of split cache
  • Eliminates cache contention between instruction fetch/decode unit and execution unit
  • Important in pipelining
SLIDE 33

Exercises

Exercise 1

A cache with 64 Kbyte capacity operates with 8 byte blocks and is organized in associative sets having 4 lines each. What is the number of the associative set that may contain a copy of the byte at main memory address 3B EF 56 H?
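A hedged sketch of how Exercise 1 can be checked numerically; the field widths follow from the stated sizes (64 K / 8 = 8192 lines, 8192 / 4 = 2048 = 2^11 sets, so b = 3 and S = 11):

```python
# Exercise 1 sketch: 64 Kbyte cache, 8-byte blocks, 4 lines per set.
lines = (64 * 1024) // 8  # 8192 cache lines
sets = lines // 4         # 2048 = 2**11 associative sets
b, S = 3, 11              # 2**3-byte blocks, 2**11 sets

addr = 0x3BEF56
set_number = (addr >> b) & ((1 << S) - 1)
print(hex(set_number))  # 0x5ea
```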

SLIDE 34

Exercises

Exercise 2

Determine the main memory address of the byte stored in the VALUE field at the position indicated by the shadowed box. It is an 8-way set associative cache with 256 Kbyte capacity and 8 byte blocks. It is further known that the byte is stored in associative set number 24H and the TAG field has the value 18H.

(Figure: cache line with its TAG and VALUE fields; one byte of VALUE is shadowed.)

SLIDE 35

Exercises

Exercise 3

How many bits are required to represent all relevant configurations of the eight lines of a single associative set in a cache, which uses the LRU policy? And for the pseudo-LRU?

SLIDE 36

Exercises

Exercise 4

The physical address space of a processor is 4 Gbyte (2^32 byte). Its cache stores up to 1 Mbyte (2^20 byte), operates with 256 byte blocks and is organized in associative sets containing four lines each.

How many main memory blocks map into a single associative set?

SLIDE 37

Exercises

Exercise 5:

A processor accesses 6 blocks of main memory in the following sequence: a b c d d c e b c a d f. All of them map into the same set of a four-way set associative cache. Assume that at the beginning all lines of this set are empty and are filled as misses occur. Indicate below each replaced (outgoing) block the corresponding incoming block whose miss caused the replacement. Do it for LRU and for pseudo-LRU.

(Table: for LRU and for pseudo-LRU, one row for the incoming block that caused each miss and one row for the outgoing block that was replaced.)
SLIDE 38

References

Stallings chapters 4 and 5

SLIDE 39

Simulators

 By William Stallings

http://www.ecs.umass.edu/ece/koren/architecture/Cache/frame0.htm

SLIDE 40

Cache Memory

END