CS422 Computer Architecture, Spring 2004, Lecture 18, 26 Feb 2004

SLIDE 1

CS422 Computer Architecture
Spring 2004, Lecture 18, 26 Feb 2004
Bhaskaran Raman
Department of CSE, IIT Kanpur
http://web.cse.iitk.ac.in/~cs422/index.html

SLIDE 2

Memory Hierarchy

  • Two principles:
    – Smaller is faster
    – Principle of locality
  • Processor speed grows much faster than memory speed
  • Registers – Cache – Memory – Disk
    – Upper level vs. lower level
  • Cache design
SLIDE 3

Cache Design Questions

  • Cache is arranged in terms of blocks
    – To take advantage of spatial locality
  • Design choices:
    – Q1: block placement – where to place a block in the upper level?
    – Q2: block identification – how to find a block in the upper level?
    – Q3: block replacement – which block to replace on a miss?
    – Q4: write strategy – what happens on a write?

SLIDE 4

Block Placement: Fully Associative

[Figure: memory blocks mapped to an 8-block cache – block 11 can go anywhere in the cache]

SLIDE 5

Block Placement: Direct

[Figure: memory blocks mapped to an 8-block cache – block 11 can go only in cache block number 11 mod 8]

SLIDE 6

Block Placement: Set Associative

[Figure: memory blocks mapped to an 8-block, 2-way set-associative cache – block 11 can go in set number 11 mod 4]

SLIDE 7

Continuum of Choices

  • Memory has n blocks, cache has m blocks
  • Fully associative is the same as set associative with one set (m-way set associative)
  • Direct placement is the same as 1-way set associative (with m sets)
  • Most processors use direct-mapped, 2-way, or 4-way set associative
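The continuum above can be sketched as one function: only the associativity k changes which cache blocks a memory block may occupy. A minimal Python sketch, using the figures' 8-block cache and assuming the k ways of a set are laid out contiguously:

```python
def candidate_blocks(b, m, k):
    """Cache block indices where memory block b may be placed in an
    m-block cache organized as m // k sets of k ways each."""
    num_sets = m // k
    s = b % num_sets                        # the set block b maps to
    return list(range(s * k, (s + 1) * k))  # the k ways of that set

m = 8
fully_assoc = candidate_blocks(11, m, k=m)  # one set of 8: anywhere
direct      = candidate_blocks(11, m, k=1)  # 8 sets of 1: only 11 mod 8 = 3
two_way     = candidate_blocks(11, m, k=2)  # 4 sets of 2: set 11 mod 4 = 3
```

With k = m this reproduces the fully associative figure, and with k = 1 the direct-mapped one.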

SLIDE 8

Block Identification

  • How many different blocks of memory can be mapped (at different times) to a cache block?
    – Fully associative: n
    – Direct: n/m
    – k-way set associative: k*n/m
  • Each cache block has a tag saying which block of memory is currently present in it
    – A valid bit is set to 0 if no memory block is in the cache block currently

SLIDE 9

Block Identification (continued)

  • How many bits for the tag? log2(k*n/m)
  • How many sets in cache? m/k
  • How many bits to identify the correct set? log2(m/k)

SLIDE 10

Block Identification (continued)

  • How many blocks in memory? n, so log2(n) bits to represent a block number in memory
  • Given a memory address, it splits into:
    – Tag: log2(k) + log2(n) − log2(m) bits
    – Index: log2(m) − log2(k) bits
    – Block offset: log2(block size) bits
  • Using the address:
    – Select set using index, block within set using tag
    – Select location within block using block offset
    – tag + index = block address
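The split above can be sketched in Python. The parameters here (a 64-block, 4-way cache with 32-byte blocks, and the address 0x1234) are made up for illustration:

```python
def split_address(addr, m, k, block_size):
    """Split a byte address into (tag, index, block offset) for an
    m-block, k-way cache with the given block size (powers of two)."""
    offset_bits = block_size.bit_length() - 1   # log2(block size)
    index_bits = (m // k).bit_length() - 1      # log2(m/k) sets
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (m // k - 1)
    tag = addr >> (offset_bits + index_bits)    # remaining upper bits
    return tag, index, offset

tag, index, offset = split_address(0x1234, m=64, k=4, block_size=32)
# 0x1234 = 4660: offset = 4660 mod 32 = 20, index = 145 mod 16 = 1, tag = 9
```

Note that tag + index together identify the block address, matching the last bullet.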

SLIDE 11

Block Replacement Policy

  • Cache miss ==> bring block onto cache

– What if no free block in set? – Need to replace a block

  • Possible policies:

– Random – Least-Recently Used (LRU)

  • Lesser miss-rate, but harder to implement
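A minimal LRU sketch for a single k-way set (illustrative only; real hardware typically uses cheaper approximations of true LRU):

```python
from collections import OrderedDict

class LRUSet:
    """One k-way set with LRU replacement."""
    def __init__(self, k):
        self.k = k
        self.blocks = OrderedDict()          # tag -> None, oldest first

    def access(self, tag):
        """Return True on a hit; on a miss install tag, evicting the LRU."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # now most-recently used
            return True
        if len(self.blocks) >= self.k:
            self.blocks.popitem(last=False)  # evict least-recently used
        self.blocks[tag] = None
        return False

s = LRUSet(k=2)
hits = [s.access(t) for t in [8, 16, 8, 24, 16]]  # only the third access hits
```

The difficulty the slide alludes to: tracking exact recency order in hardware needs extra state and updates on every access, which is why Random is attractive despite its higher miss rate.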
SLIDE 12

Replacement Policy Performance

[Chart: cache miss rate (0%–6%) vs. cache size (16 KB, 64 KB, 256 KB) for 2-way, 4-way, and 8-way set-associative caches under LRU and Random replacement]

SLIDE 13

Write Strategy

  • Reads are dominant
    – All instruction fetches are reads
    – Even for data, loads dominate over stores
  • Reads can be fast
    – Can read from multiple blocks while performing tag comparison
    – Cannot do the same with writes
  • Should pay attention to write performance too!

SLIDE 14

When do Writes go to Memory?

  • Write through: each write is mirrored to memory also
    – Easier to implement
  • Write back: write to memory only when the block is replaced
    – Faster writes
    – Some writes do not go to memory at all!
    – But a read miss may cause more delay
      • The block being replaced has to be written back
      • Optimize using a dirty bit
    – Also, bad for multiprocessors and I/O
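The dirty-bit optimization can be sketched as follows (illustrative; the `Block` and `replace` names are hypothetical, not from the slides). Only blocks that were actually modified are written back on replacement; clean blocks are simply dropped, since memory already holds their contents:

```python
class Block:
    def __init__(self, tag):
        self.tag = tag
        self.dirty = False   # set when a store modifies the block

writebacks = 0               # memory writes caused by replacement

def replace(old, new_tag):
    """Evict `old` in favor of new_tag; write back only if old is dirty."""
    global writebacks
    if old.dirty:
        writebacks += 1      # dirty block must go to memory first
    return Block(new_tag)    # a clean block is dropped with no traffic

b = Block(tag=11)
b = replace(b, 16)           # block 11 was never written: no write-back
b.dirty = True               # a store hits block 16
b = replace(b, 24)           # block 16 is dirty: one write-back
```

This is also why a read miss can be slower under write-back: if the victim is dirty, its write-back happens before (or alongside) fetching the new block.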

SLIDE 15

Write Stalls

  • In write through, the processor may have to stall waiting for a write to complete
    – Called a write stall
    – Can employ a write buffer to enable the processor to proceed during the write-through
SLIDE 16

What to do on a Write Miss?

  • Write-allocate (or, fetch on write): load the block on a cache miss during a write
  • No-write allocate (or, write around): just write directly to main memory
  • Write-allocate usually goes with write-back, and no-write allocate goes with write-through

SLIDE 17

The Alpha AXP 21064 Cache

  • 34-bit physical address
    – 29 bits for block address
    – 5 bits for block offset
  • 8 KB cache, direct-mapped
    – 8 bits for index
    – 29 − 8 = 21 bits for tag
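The breakdown above can be checked arithmetically from the stated parameters (34-bit physical address, 8 KB direct-mapped cache, 32-byte blocks):

```python
address_bits = 34
cache_bytes = 8 * 1024
block_bytes = 32

offset_bits = (block_bytes - 1).bit_length()         # 5-bit block offset
num_blocks = cache_bytes // block_bytes              # 256 blocks = 256 sets
index_bits = (num_blocks - 1).bit_length()           # 8-bit index
tag_bits = address_bits - index_bits - offset_bits   # 34 - 8 - 5 = 21
```

The 29-bit block address on the slide is simply the 34 address bits minus the 5 offset bits.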

SLIDE 18

Steps in Memory Read

  • Four steps:
    – Step 1: CPU puts out the address
    – Step 2: Index selection
    – Step 3: Tag comparison; data read
    – Step 4: Data returned to CPU (assuming a hit)
  • This takes two cycles
SLIDE 19

Steps in Memory Write

  • Write-through policy is used
  • Write buffer with four entries
    – Each entry can have up to 4 words from the same block
    – Write merging: successive writes to the same block use the same write-buffer entry
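Write merging can be sketched as below. This is illustrative only: it assumes 16-byte blocks (4 words of 4 bytes), a buffer that never drains during the write burst, and byte-granular offset tracking:

```python
BLOCK = 16  # hypothetical block size: 4 words of 4 bytes

def buffer_writes(addresses, entries=4):
    """Merge a sequence of byte-address writes into buffer entries.
    Each entry is keyed by block address and records the offsets written."""
    buf = {}
    for a in addresses:
        blk = a - a % BLOCK          # block-aligned address
        if blk in buf:
            buf[blk].add(a - blk)    # merge into the existing entry
        elif len(buf) < entries:
            buf[blk] = {a - blk}     # allocate a fresh entry
    return buf

# Four sequential word writes occupy only two entries instead of four:
buf = buffer_writes([100, 104, 108, 112])
```

Merging keeps the small buffer from filling (and stalling the processor) under bursts of writes to nearby addresses.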

SLIDE 20

Some More Details

  • What happens on a miss?
    – Cache sends a signal to the CPU asking it to wait
    – No replacement policy required (direct-mapped)
    – Write miss ==> write-around
  • 8 KB separate instruction cache
SLIDE 21

Separate versus Unified Cache

  • Direct-mapped cache, 32-byte blocks, SPEC92, on DECstation 5000
  • Unified cache has twice the size of I-cache or D-cache
  • 75% instruction references

Miss rates:

Size     I-Cache   D-Cache   U-Cache
1 KB     3.06%     24.61%    13.34%
2 KB     2.26%     20.57%     9.78%
4 KB     1.78%     15.94%     7.24%
8 KB     1.10%     10.19%     4.57%
16 KB    0.64%      6.47%     2.87%
32 KB    0.39%      4.82%     1.99%
64 KB    0.15%      3.77%     1.35%
128 KB   0.02%      2.88%     0.95%
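Using the table's 16 KB rows and the 75% instruction-reference mix, a split configuration (16 KB I + 16 KB D) can be compared against a unified cache of the same total size (the 32 KB row); a sketch:

```python
i_miss, d_miss = 0.0064, 0.0647   # 16 KB I-cache and D-cache rows
u_miss = 0.0199                   # 32 KB unified row (same total size)

# Effective miss rate of the split caches, weighted by the reference mix:
split_miss = 0.75 * i_miss + 0.25 * d_miss
# About 2.10% for the split pair vs. 1.99% for the unified cache.
```

Despite the slightly higher effective miss rate, the split design lets instruction and data accesses proceed in the same cycle, which is one reason the 21064 uses separate 8 KB caches.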