Cache Memory
Raul Queiroz Feitosa
Content
Memory Hierarchy
Principle of Locality
Some Definitions
Cache Architectures: Fully Associative, Direct Mapped, Set Associative
Replacement Policy
Main Memory Update Policy
Multilevel Caches
Unified vs. Split Caches
Exercises

Cache and Main Memory
[Figure: the cache sits between the CPU and main memory; from now on, a copy of a main memory block is held in each cache slot.]

Cache Read Operation
[Figure: flowchart of a read; on a miss, the block is brought from main memory to the cache.]

Cache Addressing
[Figure: logical vs. physical addresses.]
Principle of Locality
Spatial locality: the processor tends to access a few restricted areas of the address space.
Temporal locality: the processor tends to access in the near future addresses accessed in the recent past.
Definitions
Hit ratio:
h = (number of accesses served by the cache) / (total number of accesses)
Miss ratio:
m = (number of accesses not served by the cache) / (total number of accesses) = 1 - h
Definitions
Example: Let
h be the hit ratio, t_hit the access time on a hit, and t_miss the access time on a miss.
The average memory access time t will be: t = h·t_hit + (1-h)·t_miss
[Figure: t as a function of h, falling from t_miss at h = 0 to t_hit at h = 1.]
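As a quick check of the formula, a minimal sketch; the numbers are illustrative, not from the slides:

```python
# Average memory access time: t = h*t_hit + (1-h)*t_miss
# Illustrative values: 95% hit ratio, 1 ns on a hit, 20 ns on a miss.
h, t_hit, t_miss = 0.95, 1.0, 20.0
t = h * t_hit + (1 - h) * t_miss
print(f"average access time = {t:.2f} ns")  # -> 1.95 ns
```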
Definitions
Block
The set of 2^b bytes in consecutive addresses, starting at an address whose b least significant bits are zero.
Note that the addresses of the bytes belonging to the same block coincide to the left of the b least significant bits.
The data exchange between the cache and the main memory is carried out block-by-block. Does it make sense?

address                block
00000000 … 00000111    block 0
00001000 … 00001111    block 1
…
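The block number and byte offset can be read directly off the address bits; a small sketch using the slide's example of 8-byte blocks (b = 3):

```python
# With 2**b-byte blocks, the block number is the address with the
# b least significant bits dropped; those b bits are the byte offset.
b = 3                   # 8-byte blocks, as in the table above
address = 0b00001001    # byte 9, which falls in block 1
block_number = address >> b         # drop the b offset bits
offset = address & ((1 << b) - 1)   # keep only the b offset bits
print(block_number, offset)  # -> 1 1
```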
Fully Associative Cache
Architecture
Each of the 2^L lines (line 0 to line 2^L - 1) holds a valid bit, a TAG field, and a VALUE field:
the TAG contains the number of the memory block; the VALUE field contains a copy of that block.
[Figure: cache with 2^L lines, each showing the valid bit, TAG, and VALUE fields.]
Fully Associative Cache
Operation
The cache controller compares the block number with the TAG field of
all lines simultaneously (associative search).
If a TAG matches the block number and the valid bit is "on", it is a hit.
The b least significant bits are used as a pointer to the byte/word within the
block.
[Address generated by the CPU: bits a-1 … b hold the block number; bits b-1 … 0 point to the byte/word within the block.]
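The associative search can be modeled in software; a minimal sketch, with the loop standing in for the simultaneous hardware comparison (the line layout and names are illustrative):

```python
# Each line is modeled as (valid, tag, value); the tag holds the block number.
def fa_lookup(lines, address, b):
    block_number = address >> b
    for valid, tag, value in lines:
        if valid and tag == block_number:       # hit: tag matches, valid bit on
            offset = address & ((1 << b) - 1)
            return value[offset]
    return None                                 # miss

# One valid line holding block 0x1D, one invalid line.
lines = [(1, 0x1D, bytes(range(8))), (0, 0x2A, bytes(8))]
print(fa_lookup(lines, (0x1D << 3) | 5, b=3))   # byte 5 of block 0x1D -> 5
```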
Fully Associative Cache
Problem
Comparing the block number with the TAG fields of all cache lines simultaneously (associative search) requires one comparator per line, which is expensive in hardware.
Consequence
Fully associative design is only used for small
capacity caches.
Direct Mapped Caches
Basic Idea:
Assign each main memory block to a single cache line.
[Figure: mapping function f from main memory blocks to cache lines.]
Direct Mapped Caches
Basic Idea:
Each main memory block can only be loaded into
the cache line it is mapped to.
Thus, it is no longer necessary to check all lines,
but just one.
Direct Mapped Caches
Operation:
The cache controller compares the leftmost address field
with the TAG field of the (single) cache line selected by the L bits.
[Address generated by the CPU: bits a-1 … b+L are compared with the TAG field; bits b+L-1 … b (L bits) point to a cache line; bits b-1 … 0 point to a byte/word within the block.]
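The three-way address split above can be sketched as follows; the sizes b = 3 and L = 10 are illustrative assumptions, not from the slides:

```python
# Direct-mapped address split: tag | line | offset.
b, L = 3, 10                         # 8-byte blocks, 1024 cache lines
address = 0x3BEF56
offset = address & ((1 << b) - 1)    # bits b-1..0: byte within block
line   = (address >> b) & ((1 << L) - 1)  # bits b+L-1..b: cache line
tag    = address >> (b + L)          # remaining bits: compared with TAG
print(hex(tag), hex(line), hex(offset))  # -> 0x1df 0x1ea 0x6
```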
Direct Mapped Caches
Problem:
Some lines may often be requested by different blocks, while other lines are rarely requested, which results in a non-optimal use of the cache capacity.
Set Associative Caches
Basic Idea:
Instead of assigning each main memory block to a single
cache line, assign each block to a set (associative) of cache lines.
[Figure: mapping function f from main memory blocks to associative sets of cache lines.]
Set Associative Caches
Basic Idea:
A block may be loaded into any cache line of the associative
set it is assigned to.
Set Associative Caches
Architecture
[Figure: 2^S sets (set 0 to set 2^S - 1), each containing several lines; every line holds valid bit, TAG, and VALUE fields.]
Set Associative Caches
Operation: assume that there are 2^S sets
The cache controller compares the leftmost address field
with the TAG fields of all lines of the associative set selected by the S bits (associative search).
[Address generated by the CPU: bits a-1 … b+S are compared with the TAG field; bits b+S-1 … b (S bits) point to a set; bits b-1 … 0 point to a byte/word within the block.]
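The only change from the direct-mapped split is that S bits now select a set rather than a single line; a sketch with illustrative sizes (b = 3, S = 12):

```python
# Set-associative address split: tag | set | offset.
b, S = 3, 12                          # 8-byte blocks, 4096 sets
address = 0x3BEF56
offset = address & ((1 << b) - 1)     # bits b-1..0: byte within block
set_no = (address >> b) & ((1 << S) - 1)  # bits b+S-1..b: associative set
tag    = address >> (b + S)           # remaining bits: compared with TAGs
print(hex(tag), hex(set_no), hex(offset))  # -> 0x77 0xdea 0x6
```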
Set Associative Caches
Fully Associative Caches:
Are set associative caches with a single associative set.
Direct Mapped Caches:
Are set associative caches whose "associative" sets
each contain a single line.
Set Associative Caches
Set size:
Keeping the overall cache capacity constant and
changing the number of lines/set.
[Figure: miss ratio vs. lines/set (1, 2, 4, 8): direct mapped, two-way, four-way, eight-way, fully associative. Above 4 lines/set the miss ratio does not change significantly.]
Set Associative Caches
[Figure: hit ratio vs. cache size (1 KB to 1 MB) for direct mapped and 2-, 4-, 8-, 16- and 32-way set associative caches.]

Replacement Policy
Least Recently Used - LRU
The least recently used line is evicted from the cache to
make room for a new main memory block.

Pseudo LRU
Example: a four-way set associative cache
The least recently used line of the least recently used half is
elected to leave the cache.
[Figure: decision tree for pseudo-LRU in a four-way set:
bit I0 points to the least recently used half (=0 or =1);
bits I1 and I2 each point to the least recently used line within their half.]

Replacement Policy
[Table: worked example on a reference sequence (a b a c b a d c b a …) showing the set contents after each access, for LRU and pseudo-LRU, with the incoming block that caused each miss.]
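The LRU policy for one associative set can be modeled with a short list-based sketch (the four-way size and the reference string are illustrative):

```python
# Toy LRU model for one associative set: the list front is the LRU
# line, the back is the most recently used line.
def lru_trace(refs, ways=4):
    set_lines = []
    evictions = []
    for block in refs:
        if block in set_lines:
            set_lines.remove(block)             # hit: refresh recency
        elif len(set_lines) == ways:
            evictions.append(set_lines.pop(0))  # miss in a full set: evict LRU
        set_lines.append(block)                 # block becomes most recent
    return evictions

print(lru_trace("abcddcebcadf"))  # -> ['a', 'd', 'e', 'b']
```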
Main Memory Update Policy
Write-through
All writes are carried out in the cache and in the main
memory.
The CPU does not halt until the main memory is updated.
Problem: lots of traffic, especially harmful in multiprocessors; about 15% of memory references are writes.
Main Memory Update Policy
Write-back
Each cache line has a bit (dirty) that indicates, when set (=1),
that the block copy in the cache differs from the main memory.
When the block is brought from main memory into the
cache, dirty =0
All writes are performed in the cache only and, in this case,
dirty=1.
The main memory is updated when the block selected for
replacement has dirty=1.
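The dirty-bit bookkeeping described above can be sketched as follows (the class and function names are illustrative, not a real cache API):

```python
# Write-back sketch: writes go to the cache only and set the dirty bit;
# main memory is written only when a dirty line is evicted.
class Line:
    def __init__(self, tag, value):
        self.tag, self.value, self.dirty = tag, value, 0  # dirty=0 on fill

writebacks = []  # stands in for main memory update traffic

def write(line, offset, byte):
    line.value[offset] = byte
    line.dirty = 1               # cache copy now differs from main memory

def evict(line):
    if line.dirty:               # clean lines are dropped without traffic
        writebacks.append((line.tag, bytes(line.value)))

ln = Line(0x1D, bytearray(8))
write(ln, 0, 0xFF)
evict(ln)
print(len(writebacks))  # -> 1 (one write-back, triggered by the dirty bit)
```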
Multilevel Caches
On-chip cache (L1): faster than bus access; frees the bus for other transfers.
Off-chip cache (L2) in static RAM: L2 access is much faster than DRAM or ROM; L2 often uses a separate data path.
L2 may now be on chip, resulting in an L3 cache (bus access, or now also on chip).

Multilevel Caches (L1 & L2)
[Figure: combined hit ratio of L1 and L2; a hit is counted in either cache.]

Unified vs. Split Caches
Split: one cache for data and one for instructions.
Eliminates contention between the instruction fetch/decode unit and the execution unit.
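The average access time formula extends naturally to two cache levels: a miss in L1 is served by L2 when it hits there, otherwise by main memory. A sketch with illustrative numbers (not from the slides):

```python
# Two-level average access time:
# t = h1*t1 + (1-h1) * (h2*t2 + (1-h2)*t_mem)
h1, h2 = 0.95, 0.80          # hit ratios of L1 and of L2 (on L1 misses)
t1, t2, t_mem = 1.0, 5.0, 60.0  # access times in ns
t = h1 * t1 + (1 - h1) * (h2 * t2 + (1 - h2) * t_mem)
print(f"{t:.2f} ns")  # -> 1.75 ns
```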
Exercises
Exercise 1: A cache with 64 Kbyte capacity operates with 8 byte blocks and is organized in associative sets of 4 lines each. What is the number of the associative set that may contain a copy of the byte at main memory address 3B EF 56 H?
Exercises
Exercise 2
Determine the address of the byte stored in the VALUE field at the position indicated by the shaded box. It is an 8-way set associative cache with 256 Kbyte capacity and 8 byte blocks. It is further known that the byte is stored in associative set number 24H and the TAG field has the value 18H.
[Figure: cache diagram with TAG and VALUE fields; one byte of a VALUE field is shaded.]
Exercises
Exercise 3
How many bits are required to represent all relevant configurations of the eight lines of a single associative set in a cache, which uses the LRU policy? And for the pseudo-LRU?
Cache Memory, 10/09/2020

Exercises
Exercise 4
The physical address space of the processor is 4 Gbyte (2^32 byte). Its cache stores up to 1 Mbyte (2^20 byte), operates with 256 byte blocks and is
How many main memory blocks map into a single associative set?
Exercises
Exercise 5:
A processor accesses 6 blocks of main memory in the following sequence: a b c d d c e b c a d f. All of them map into the same set of a four-way set associative cache. Assume that at the beginning all lines of this set are empty and are filled as misses occur. Indicate below each block replaced in the cache and the corresponding incoming block whose access caused the replacement. Do it for LRU and for pseudo-LRU.
References
Stallings chapters 4 and 5
Simulators
By William Stallings
http://www.ecs.umass.edu/ece/koren/architecture/Cache/frame0.htm