CS422 Computer Architecture
Spring 2004 Lecture 18, 26 Feb 2004 Bhaskaran Raman Department of CSE IIT Kanpur
http://web.cse.iitk.ac.in/~cs422/index.html
Memory Hierarchy
Two principles:
– Smaller is faster
– Principle of locality
– Upper level vs. lower level
– To take advantage of spatial locality
– Q1: block placement – where to place a block in
– Q2: block identification – how to find a block in
– Q3: block replacement – which block to replace
– Q4: write strategy – what happens on a write?
– A valid bit is set to 0 if no memory block is in the
– Address field widths in bits, left to right: $\log_2 k$ | $\log_2 n - \log_2 m$ | $\log_2 m - \log_2 k$ | $\log_2(\text{block-size})$
– Select set using index, block from set using tag
– Select location from block using block offset
– tag + index = block address
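The address split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` and the example address are hypothetical, and the field widths (8 index bits, 5 offset bits) are chosen for the example.

```python
def split_address(addr, index_bits, offset_bits):
    """Return (tag, index, block_offset) for a byte address."""
    # Low bits select the location inside the block.
    offset = addr & ((1 << offset_bits) - 1)
    # Everything above the offset is the block address.
    block_address = addr >> offset_bits
    # The low bits of the block address select the set.
    index = block_address & ((1 << index_bits) - 1)
    # The remaining high bits are the tag compared within the set.
    tag = block_address >> index_bits
    return tag, index, offset


tag, index, offset = split_address(0x12345, index_bits=8, offset_bits=5)
# tag and index together rebuild the block address.
assert (tag << 8) | index == 0x12345 >> 5
```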
– What if no free block in set?
– Need to replace a block
– Random
– Least-Recently Used (LRU)
[Figure: percentage axis from 0.00% to 6.00% for 16KB, 64KB, and 256KB caches; series: 2-way, 4-way, and 8-way, each with LRU and Random replacement]
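The two replacement policies can be sketched for a single set as follows. This is a minimal model, not the lecture's own code: the class name `CacheSet`, its parameters, and the fixed random seed are assumptions made for the example, and only tags are tracked (no data).

```python
import random
from collections import OrderedDict


class CacheSet:
    """One set of a set-associative cache holding up to `ways` block tags."""

    def __init__(self, ways, policy="lru", rng=None):
        self.ways = ways
        self.policy = policy            # "lru" or "random"
        self.rng = rng or random.Random(0)
        self.blocks = OrderedDict()     # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)          # now the most recently used
            return True
        if len(self.blocks) >= self.ways:         # no free block in the set
            if self.policy == "lru":
                self.blocks.popitem(last=False)   # evict least recently used
            else:
                victim = self.rng.choice(list(self.blocks))
                del self.blocks[victim]           # evict a random block
        self.blocks[tag] = None
        return False


lru_set = CacheSet(ways=2, policy="lru")
hits = [lru_set.access(t) for t in [1, 2, 1, 3, 2]]
```

In this trace the access to tag 3 evicts tag 2 under LRU, because tag 1 was used more recently, so the final access to tag 2 misses.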
– All instructions are read
– Even for data, loads dominate over stores
– Can read from multiple blocks while performing
– Cannot do the same with writes
– Easier to implement
– Faster writes
– Some writes do not go to memory at all!
– But, read miss may cause more delay
– Also, bad for multiprocessors and I/O
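The bullets above appear to contrast the two standard write strategies, write-through and write-back. The sketch below counts how many writes reach memory under each, using a deliberately tiny model: a hypothetical one-block cache that allocates on every store. The function name `memory_writes` and the store trace are made up for the example.

```python
def memory_writes(stores, policy):
    """Count writes that reach memory for a sequence of stored block numbers.

    Assumed model: a cache holding a single block, allocating on each store.
    """
    if policy == "write-through":
        return len(stores)          # every store is also sent to memory
    writes = 0
    cached_block = None             # the one block currently in the cache
    dirty = False                   # True once the cached block is modified
    for block in stores:
        if block != cached_block:
            if dirty:
                writes += 1         # write the modified block back on eviction
            cached_block = block
        dirty = True
    if dirty:
        writes += 1                 # final flush of the last dirty block
    return writes


trace = [7, 7, 7, 9]
through = memory_writes(trace, "write-through")
back = memory_writes(trace, "write-back")
```

Repeated stores to block 7 are absorbed by the cache under write-back, which is the sense in which some writes never go to memory.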
– Called a write stall
– Can employ a write buffer to enable the
– 29 bits for block address
– 5 bits for block offset
– 8 bits for index
– 29 – 8 = 21 bits for tag
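The bit counts above can be checked with a short calculation. The sketch assumes values inferred from the bullets rather than stated in them: a 34-bit address (29 + 5), 32-byte blocks (5 offset bits), and a direct-mapped cache of 2^8 blocks, i.e. 8KB. The function name `field_widths` is hypothetical.

```python
def field_widths(address_bits, cache_bytes, block_bytes, ways=1):
    """Return (tag_bits, index_bits, offset_bits); sizes must be powers of two."""
    offset_bits = block_bytes.bit_length() - 1       # log2(block size)
    num_sets = cache_bytes // (block_bytes * ways)   # blocks / associativity
    index_bits = num_sets.bit_length() - 1           # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# Inferred example: 34-bit address, 8KB direct-mapped cache, 32-byte blocks.
tag_bits, index_bits, offset_bits = field_widths(34, 8 * 1024, 32)
assert (tag_bits, index_bits, offset_bits) == (21, 8, 5)
```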
– Step-1: CPU puts out the address
– Step-2: Index selection
– Step-3: Tag comparison, read from data
– Step-4: Data returned to CPU (assuming hit)
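The four steps of a read hit can be traced in a small model of a direct-mapped cache. This is a sketch under assumptions: the class `DirectMappedCache`, the `fill` helper, and the example addresses are hypothetical, and in hardware the tag comparison and data read happen in parallel rather than in sequence.

```python
class DirectMappedCache:
    def __init__(self, index_bits, offset_bits):
        self.index_bits = index_bits
        self.offset_bits = offset_bits
        n = 1 << index_bits
        self.valid = [False] * n                       # valid bit per block
        self.tags = [0] * n
        self.data = [bytes(1 << offset_bits)] * n      # one block per index

    def _index(self, addr):
        return (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)

    def _tag(self, addr):
        return addr >> (self.offset_bits + self.index_bits)

    def fill(self, addr, block):
        """Install a block, as would happen after a miss (hypothetical helper)."""
        i = self._index(addr)
        self.valid[i], self.tags[i], self.data[i] = True, self._tag(addr), block

    def read(self, addr):
        """Return the byte at addr on a hit, or None on a miss."""
        # Step 1: the CPU puts out the address.
        offset = addr & ((1 << self.offset_bits) - 1)
        # Step 2: index selection.
        i = self._index(addr)
        # Step 3: tag comparison (plus valid-bit check), read from data.
        hit = self.valid[i] and self.tags[i] == self._tag(addr)
        # Step 4: data returned to the CPU, assuming a hit.
        return self.data[i][offset] if hit else None


cache = DirectMappedCache(index_bits=8, offset_bits=5)
cache.fill(0x12345, bytes(range(32)))
hit_value = cache.read(0x12345)                 # same block: hit
miss_value = cache.read(0x12345 + (1 << 13))    # same index, different tag
```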
– Each entry can have up to 4 words from the same
– Write merging: successive writes to the same
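Write merging can be sketched as below. The two bullets above are cut off, so this is one plausible reading, not a statement of the lecture's design: it assumes each entry covers an aligned group of 4 consecutive words and that a write to a group already in the buffer is merged into that entry. The class `WriteBuffer` and its parameters are hypothetical.

```python
class WriteBuffer:
    WORDS_PER_ENTRY = 4   # from the slide: up to 4 words per entry

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = {}   # entry base -> {word slot: value}

    def write(self, word_addr, value):
        """Buffer a write; return False if the buffer is full (CPU must stall)."""
        base, slot = divmod(word_addr, self.WORDS_PER_ENTRY)
        if base in self.entries:
            self.entries[base][slot] = value    # merge into the existing entry
            return True
        if len(self.entries) >= self.num_entries:
            return False                        # no free entry: write stall
        self.entries[base] = {slot: value}
        return True


buf = WriteBuffer(num_entries=2)
merged = [buf.write(a, a) for a in (100, 101, 102, 103)]  # one entry
second = buf.write(200, 0)                                # second entry
stalled = buf.write(300, 0)                               # buffer full
```

Without merging, the four writes to words 100 to 103 would each take an entry and fill a 2-entry buffer after the second write.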
– Cache sends signal to CPU asking it to wait
– No replacement policy required (direct mapped)
– Write miss ==> write-around
Size    I-Cache  D-Cache  U-Cache
1KB     3.06%    24.61%   13.34%
2KB     2.26%    20.57%    9.78%
4KB     1.78%    15.94%    7.24%
8KB     1.10%    10.19%    4.57%
16KB    0.64%     6.47%    2.87%
32KB    0.39%     4.82%    1.99%
64KB    0.15%     3.77%    1.35%
128KB   0.02%     2.88%    0.95%
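One way to read the table, assuming the percentages are miss rates for instruction (I), data (D), and unified (U) caches, is to compare a split I/D pair against a unified cache of the same total size. The sketch below does this with a weighted average. The instruction fraction of 0.75 is an assumed value for illustration and does not come from the slides; the function name `split_miss_rate` is hypothetical.

```python
# Values copied from the table above, in percent.
I_CACHE = {"16KB": 0.64, "32KB": 0.39}
D_CACHE = {"16KB": 6.47, "32KB": 4.82}
U_CACHE = {"16KB": 2.87, "32KB": 1.99}


def split_miss_rate(size, instr_fraction):
    """Overall miss rate (%) of split I and D caches, each of the given size."""
    return (instr_fraction * I_CACHE[size]
            + (1.0 - instr_fraction) * D_CACHE[size])


# Assumption: 75% of memory accesses are instruction fetches.
split_16k = split_miss_rate("16KB", instr_fraction=0.75)
unified_32k = U_CACHE["32KB"]
```

Under that assumed mix, the 16KB + 16KB split pair comes out slightly worse than the 32KB unified cache; a different instruction fraction changes the comparison.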