SLIDE 1

Chapter Seven

© 2004 Morgan Kaufmann Publishers

SLIDE 2

Memories: Review

  • SRAM:

– value is stored on a pair of inverting gates
– very fast, but takes up more space than DRAM (4 to 6 transistors)

  • DRAM:

– value is stored as a charge on a capacitor (must be refreshed)
– very small, but slower than SRAM (by a factor of 5 to 10)


[Figure: an SRAM cell (cross-coupled inverters with internal nodes A, bit lines B, word line W) and a DRAM cell (word line, pass transistor, capacitor, bit line)]

SLIDE 3

Exploiting Memory Hierarchy

  • Users want large and fast memories!

Technology   Access time          Cost per GB
SRAM         0.5 – 5 ns           $4,000 – $10,000
DRAM         50 – 70 ns           $100 – $200
Disk         5 – 20 million ns    $0.50 – $2

  • Try and give it to them anyway

– build a memory hierarchy

SLIDE 4

Locality

  • A principle that makes having a memory hierarchy a good idea
  • If an item is referenced,

– temporal locality: it will tend to be referenced again soon
– spatial locality: nearby items will tend to be referenced soon

Why does code have locality? (Loops re-execute the same instructions, and instructions and data are laid out and accessed sequentially.)

  • Our initial focus: two levels (upper, lower)

– block: minimum unit of data transferred between adjacent levels
– hit: data requested is in the upper level
– miss: data requested is not in the upper level
– miss penalty: the time required to fetch a block into a level of the memory hierarchy from the lower level

SLIDE 5

Cache

  • Two issues:

– How do we know if a data item is in the cache?
– If it is, how do we find it?

  • Our first example:

– block size is one word of data
– "direct mapped"

For each item of data at the lower level, there is exactly one location in the cache where it might be; i.e., lots of items at the lower level share locations in the upper level.

SLIDE 6

Direct Mapped Cache

  • Mapping: cache block index = (memory address) modulo (the number of blocks in the cache)

[Figure: an eight-block direct-mapped cache (indices 000 – 111); memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 each map to the cache block given by their low-order three bits]
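As a concrete illustration (not from the original slides), here is a minimal Python sketch of the index/tag split for the word-addressed, one-word-block cache in the figure above; the 8-block size comes from the slide, everything else is an assumption:

```python
# Minimal sketch of direct-mapped address decomposition, assuming a
# word-addressed memory and a block size of one word (as on this slide).
NUM_BLOCKS = 8  # must be a power of two

def index(address: int) -> int:
    """Cache index = address modulo the number of cache blocks."""
    return address % NUM_BLOCKS

def tag(address: int) -> int:
    """Tag = the address bits left over once the index bits are removed."""
    return address // NUM_BLOCKS

for addr in (0b00001, 0b00101, 0b01001, 0b01101):
    print(f"address {addr:05b} -> index {index(addr):03b}, tag {tag(addr):02b}")
```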

SLIDE 7

Direct Mapped Cache for MIPS

  • Tag – contains the address information needed to identify whether the associated block corresponds to a requested word
  • Valid bit – a bit to indicate that the associated block in the hierarchy contains valid data

Why a "valid" bit? (After power-up or a process switch the cache holds no useful data, so each entry must be marked invalid until a block is actually loaded into it.)

SLIDE 8

Direct Mapped Cache - Example

Memory request   Decimal address   Binary address   Hit or miss   Assigned cache block
a                22                10110            Miss          10110 mod 8 = 110
b                26                11010            Miss          11010 mod 8 = 010
c                18                10010            Miss          10010 mod 8 = 010
b                26                11010            Miss          11010 mod 8 = 010
a                22                10110            Hit           10110 mod 8 = 110

Cache contents after each reference (Index / V / Tag / Data; unlisted indices remain invalid):

a (Miss): index 110 -> Y, tag 10, a
b (Miss): index 010 -> Y, tag 11, b; index 110 -> Y, tag 10, a
c (Miss): index 010 -> Y, tag 10, c (replaces b); index 110 -> Y, tag 10, a
b (Miss): index 010 -> Y, tag 11, b (replaces c); index 110 -> Y, tag 10, a
a (Hit):  unchanged; index 110 still holds a

SLIDE 9

Direct Mapped Cache

  • Increase block size:

– E.g., a 16KB cache contains 256 blocks with 16 words per block
– What kind of locality are we taking advantage of? (spatial locality)

SLIDE 10

Analysis of Tag Bits and Index Bits

  • Assume a 32-bit byte address and a direct-mapped cache of 2^n blocks with 2^m-word (2^(m+2)-byte) blocks

– Tag field: 32 − (n + m + 2) bits
– Cache size: 2^n × (2^m × 32 + (32 − n − m − 2) + 1) bits

  • Ex1. Bits in a cache
  • How many total bits are required for a direct-mapped cache with 16KB of data and 4-word blocks, assuming a 32-bit address?

– 16KB = 2^14 bytes = 2^12 words
– Number of blocks = 2^12 / 4 = 2^10 blocks
– Tag field = 32 − (2 + 2 + 10) = 18 bits
– Total size = 2^10 × (4 × 32 + 18 + 1) = 147 Kbits

  • Ex2. Mapping an address to a multiword cache block
  • Consider a cache with 64 blocks and a block size of 16 bytes. What block number does byte address 1200 map to?

– Block address = ⌊1200 / 16⌋ = 75
– Block number = 75 modulo 64 = 11
– Block number 11 holds byte addresses 1200 through 1215

[Address fields, high to low: tag | index | word offset | byte offset]
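A sketch (not from the slides) encoding the total-bits formula above and checking both exercises:

```python
# Total bits = 2^n blocks x (data bits + tag bits + valid bit),
# for a direct-mapped cache with a 32-bit byte address.
def cache_total_bits(n: int, m: int) -> int:
    data_bits = (2 ** m) * 32                 # 2^m words of 32 bits
    tag_bits = 32 - (n + m + 2)
    return (2 ** n) * (data_bits + tag_bits + 1)

# Ex1: 16KB of data in 4-word blocks -> n = 10, m = 2
print(cache_total_bits(10, 2) // 1024)        # 147 (Kbits)

# Ex2: block number for byte address 1200 (64 blocks, 16-byte blocks)
print((1200 // 16) % 64)                      # 11
```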

SLIDE 11

Hits vs. Misses

  • Read hits

– this is what we want!

  • Read misses

– stall the CPU, fetch the block from memory, deliver it to the cache, restart

  • Write hits:

– replace the data in cache and memory (write-through)

  • The writes always update both the cache and the memory, ensuring that data is always consistent between the two.

– write the data only into the cache, and write the cache block back later (write-back)

  • The modified blocks (dirty blocks) are written to the lower level of the hierarchy when the block is replaced.

– write the data into the cache and a buffer (write buffer)

  • After writing into the buffer, the CPU continues execution; writing to memory is controlled by the memory controller
  • If the buffer is full, the CPU must wait for a free buffer entry

  • Write misses:

– read the entire block into the cache, then write the word
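An illustrative sketch (not from the slides) contrasting the write-through and write-back policies; the dict-based "cache" and "memory" are toy stand-ins:

```python
# Write-through: every write updates both levels.
# Write-back: writes mark the block dirty; memory is updated on eviction.
class WriteThroughCache:
    def __init__(self, memory: dict):
        self.memory, self.cache = memory, {}

    def write(self, block: int, value: int) -> None:
        self.cache[block] = value
        self.memory[block] = value        # memory stays consistent

class WriteBackCache:
    def __init__(self, memory: dict):
        self.memory, self.cache, self.dirty = memory, {}, set()

    def write(self, block: int, value: int) -> None:
        self.cache[block] = value
        self.dirty.add(block)             # defer the memory update

    def evict(self, block: int) -> None:
        if block in self.dirty:           # dirty block: write it back now
            self.memory[block] = self.cache[block]
            self.dirty.discard(block)
        self.cache.pop(block, None)
```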

SLIDE 12

Hardware Issues

  • Assume a cache block of 4 words (transfer 4 words for one miss)
  • Make reading multiple words easier by using banks of memory

[Figure: three memory organizations: a. one-word-wide memory; b. wide memory (wider bus and memory, with a multiplexor between cache and CPU); c. interleaved memory (four banks). With interleaving, the width of the bus and cache need not change.]

  • Ex. 3 Assume the following memory access times:

– 1 clock cycle to send the address
– 15 clock cycles for each DRAM access initiated
– 1 clock cycle to send a word of data

Assume the cache block is 4 words. What is the cache miss penalty for the different memory organizations?

– a. One-word-wide memory: 1 + 4 × 15 + 4 × 1 = 65
– b. Wide memory: 1 + 1 × 15 + 1 = 17 (4-word wide); 1 + 2 × 15 + 2 = 33 (2-word wide)
– c. Interleaved memory: 1 + 1 × 15 + 4 × 1 = 20
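A sketch (not from the slides) of the Ex. 3 arithmetic:

```python
# Miss penalty = address cycle + DRAM accesses + word transfers,
# for a 4-word cache block.
ADDR, DRAM, XFER, WORDS = 1, 15, 1, 4

one_word_wide = ADDR + WORDS * DRAM + WORDS * XFER   # 65 cycles
four_word_wide = ADDR + 1 * DRAM + 1 * XFER          # 17 cycles
two_word_wide = ADDR + 2 * DRAM + 2 * XFER           # 33 cycles
interleaved = ADDR + 1 * DRAM + WORDS * XFER         # 20 cycles (4 banks)

print(one_word_wide, four_word_wide, two_word_wide, interleaved)
```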

SLIDE 13

Performance

  • Use split caches because there is more spatial locality in code:
  • Increasing the block size tends to decrease the miss rate:

Program   Block size (words)   Instruction miss rate   Data miss rate   Effective combined miss rate
gcc       1                    6.1%                    2.1%             5.4%
gcc       4                    2.0%                    1.7%             1.9%
spice     1                    1.2%                    1.3%             1.2%
spice     4                    0.3%                    0.6%             0.4%

  • However, for a fixed cache size, as block size increases past a threshold value, the miss rate will increase. (Why? With fewer, larger blocks, blocks compete for space and are evicted before most of their words are used.)

SLIDE 14

Performance

  • Simplified model:

execution time = (execution cycles + stall cycles) × cycle time
stall cycles = # of instructions × miss ratio × miss penalty

  • Two ways of improving performance:

– decreasing the miss ratio
– decreasing the miss penalty

What happens if we increase block size?

SLIDE 15

Cache Performance Examples

  • Ex. 4(a) Instruction cache miss rate = 2%, data cache miss rate = 4%, CPI = 2 without any memory stalls, miss penalty = 100 clock cycles. How much faster would the processor run with a perfect cache that never missed? (Assume the percentage of lw and sw instructions is 36%.)

– Instruction miss cycles = I × 2% × 100 = 2.00 × I
– Data miss cycles = I × 36% (lw and sw percentage) × 4% × 100 = 1.44 × I
– CPI with memory stalls = 2 + 3.44 = 5.44
– CPU_time_stall / CPU_time_nostall = (I × CPI_stall × cycle_time) / (I × CPI_nostall × cycle_time) = CPI_stall / CPI_nostall = 5.44 / 2 = 2.72

  • Ex. 4(b) What happens if the processor is made twice as fast by reducing the CPI from 2 to 1, but the memory system is not?

– (1 + 3.44) / 1 = 4.44

  • Ex. 4(c) Double the clock rate; the time to handle a cache miss does not change. How much faster will the computer be with the same miss rate?

– Miss cycles/instruction = (2% × 200) + 36% × (4% × 200) = 6.88
– Performance_fast / Performance_slow = execution_time_slow / execution_time_fast = (IC × CPI_slow × cycle_time) / (IC × CPI_fast × (cycle_time / 2)) = 5.44 / (8.88 × 0.5) = 1.23
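A sketch (not from the slides) reproducing the Ex. 4 arithmetic with the simplified model, CPI_with_stalls = base CPI + miss cycles per instruction:

```python
def cpi_with_stalls(base_cpi, miss_penalty,
                    i_miss_rate=0.02, d_miss_rate=0.04, mem_frac=0.36):
    """Base CPI plus instruction- and data-miss stall cycles per instruction."""
    stalls = i_miss_rate * miss_penalty + mem_frac * d_miss_rate * miss_penalty
    return base_cpi + stalls

# Ex. 4(a): speedup of a perfect cache over the real one
print(cpi_with_stalls(2, 100) / 2)            # 2.72

# Ex. 4(b): base CPI reduced to 1, same memory system
print(cpi_with_stalls(1, 100) / 1)            # 4.44

# Ex. 4(c): clock doubled, so the miss penalty doubles to 200 cycles;
# compare times, remembering the fast machine's cycle is half as long
print(cpi_with_stalls(2, 100) / (cpi_with_stalls(2, 200) * 0.5))  # ~1.23
```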

SLIDE 16

Decreasing miss ratio with associativity

  • Direct-mapped placement – each block can go in exactly one location
  • Fully associative placement – a block can be placed in any location in the cache
  • Set-associative placement – each block can be placed in a fixed number of locations (at least two). Mapping = (block number) modulo (number of sets in the cache)

[Figure: locating a block under the three placements: direct mapped indexes a single Tag/Data entry, set associative searches the entries of one set, fully associative searches every entry]

SLIDE 17

Decreasing miss ratio with associativity

[Figure: an eight-block cache configured as direct mapped (8 sets of 1), two-way set associative (4 sets of 2), four-way set associative (2 sets of 4), and eight-way set associative, i.e., fully associative (1 set of 8)]

m-way set associative: m blocks per set

Compared to direct mapped, give a series of references that:

– results in a lower miss ratio using a 2-way set associative cache
– results in a higher miss ratio using a 2-way set associative cache

assuming we use the "least recently used" replacement strategy

SLIDE 18

Misses and Associativity Example

  • Ex. 5 Question
  • There are three small caches, each consisting of four one-word blocks. One cache is direct mapped, a second is two-way set associative, and the third is fully associative. Find the number of misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, 8. (Assume LRU is used for replacement.)
  • Least recently used (LRU): the block replaced is the one that has been unused for the longest time

SLIDE 19

Misses and Associativity Example

Answer

Direct mapped (block address mod 4: 0 -> 0, 6 -> 2, 8 -> 0):

Address  Hit or miss  Block 0  Block 1  Block 2  Block 3
0        Miss         Mem[0]
8        Miss         Mem[8]
0        Miss         Mem[0]
6        Miss         Mem[0]            Mem[6]
8        Miss         Mem[8]            Mem[6]

Five misses.

Two-way set associative (block address mod 2: 0, 6, and 8 all map to set 0):

Address  Hit or miss  Set 0   Set 0   Set 1  Set 1
0        Miss         Mem[0]
8        Miss         Mem[0]  Mem[8]
0        Hit          Mem[0]  Mem[8]
6        Miss         Mem[0]  Mem[6]
8        Miss         Mem[8]  Mem[6]

Four misses.

Fully associative:

Address  Hit or miss  Block 0  Block 1  Block 2  Block 3
0        Miss         Mem[0]
8        Miss         Mem[0]   Mem[8]
0        Hit          Mem[0]   Mem[8]
6        Miss         Mem[0]   Mem[8]   Mem[6]
8        Hit          Mem[0]   Mem[8]   Mem[6]

Three misses.
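A runnable sketch (not from the slides) that replays the Ex. 5 trace through an LRU set-associative cache; for this 4-block cache, ways=1 is direct mapped and ways=4 (one set) is fully associative:

```python
from collections import OrderedDict

def count_misses(trace, num_blocks=4, ways=1):
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU order
    misses = 0
    for block in trace:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)          # hit: mark most recently used
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)     # evict the least recently used
            s[block] = True
    return misses

trace = [0, 8, 0, 6, 8]
for ways in (1, 2, 4):
    print(ways, count_misses(trace, ways=ways))  # 1 -> 5, 2 -> 4, 4 -> 3
```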

SLIDE 20

An Implementation of a Four-Way Set-Associative Cache

[Figure: a four-way set-associative cache of 256 sets; the address is split into a 22-bit tag and an 8-bit index, four comparators check the four tags in the indexed set, and a 4-to-1 multiplexor selects the hit data]

More specifically, the hit logic is only four AND gates and one OR gate.

SLIDE 21

Performance

[Figure: miss rate (3% – 15%) versus associativity (one-way, two-way, four-way, eight-way) for cache sizes from 1 KB to 128 KB]

SLIDE 22

Size of Tags versus Set Associativity

  • Increasing associativity requires more comparators and more tag bits per cache block.
  • Ex. 6 Assume a cache of 4K blocks, a four-word block size, and a 32-bit address. Find the total number of sets and the total number of tag bits for caches that are direct mapped, two-way and four-way set associative, and fully associative.

– Direct mapped: byte offset 2 bits, word offset 2 bits, index 12 bits (4K sets), ∴ tag bits = 32 − 16 = 16 and 16 × 4K = 64 Kbits
– Two-way set associative: byte offset 2 bits, word offset 2 bits, 4K/2 = 2K sets, index 11 bits, ∴ tag bits = 32 − 15 = 17 and 17 × 2K × 2 = 68 Kbits
– Four-way set associative: byte offset 2 bits, word offset 2 bits, 4K/4 = 1K sets, index 10 bits, ∴ tag bits = 32 − 14 = 18 and 18 × 1K × 4 = 72 Kbits
– Fully associative: byte offset 2 bits, word offset 2 bits, 1 set, index 0 bits, ∴ tag bits = 32 − 4 = 28 and 28 × 4K × 1 = 112 Kbits
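A sketch (not from the slides) of the Ex. 6 tag-bit arithmetic:

```python
# 4K-block cache, four-word blocks, 32-bit byte address:
# 2 byte-offset bits + 2 word-offset bits = 4 offset bits.
from math import log2

BLOCKS, OFFSET_BITS = 4 * 1024, 4

for ways in (1, 2, 4, BLOCKS):     # direct mapped ... fully associative
    sets = BLOCKS // ways
    index_bits = int(log2(sets))
    tag_bits = 32 - OFFSET_BITS - index_bits
    total_kbits = tag_bits * BLOCKS // 1024
    print(f"{ways}-way: {sets} sets, tag = {tag_bits} bits, "
          f"total = {total_kbits} Kbits")   # 64, 68, 72, 112 Kbits
```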

SLIDE 23

Decreasing miss penalty with multilevel caches

  • Add a second-level cache:

– often the primary cache is on the same chip as the processor
– use SRAMs to add another cache above primary memory (DRAM)
– miss penalty goes down if the data is in the 2nd-level cache

  • Ex. 7: CPI of 1.0 on a 5 GHz machine with a 2% miss rate and 100ns DRAM access. How much faster will the machine be if we add a 2nd-level cache with a 5ns access time that decreases the miss rate to main memory to 0.5%?

– Miss penalty for main memory = 100ns / (1 / 5 GHz) = 500 cycles
– Total CPI = base CPI + memory stall cycles/instruction = 1.0 + 2% × 500 = 11
– Miss penalty on the second-level cache = 5ns / (1 / 5 GHz) = 25 cycles
– Total CPI for the 2-level cache = 1 + primary stalls/instr + secondary stalls/instr = 1 + 2% × 25 + 0.5% × 500 = 1 + 0.5 + 2.5 = 4.0
– 11 / 4 = 2.8 times faster if the 2nd-level cache is used

  • Using multilevel caches:

– try to optimize the hit time on the 1st-level cache
– try to optimize the miss rate on the 2nd-level cache
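A sketch (not from the slides) of the Ex. 7 multilevel-cache arithmetic:

```python
CLOCK_GHZ = 5.0
cycle_ns = 1 / CLOCK_GHZ                     # 0.2 ns per cycle

main_penalty = 100 / cycle_ns                # 500 cycles
l2_penalty = 5 / cycle_ns                    # 25 cycles

cpi_one_level = 1.0 + 0.02 * main_penalty    # 11.0
cpi_two_level = 1.0 + 0.02 * l2_penalty + 0.005 * main_penalty  # 4.0

print(cpi_one_level / cpi_two_level)         # 2.75, i.e. ~2.8x faster
```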

SLIDE 24

Cache Complexities

  • Not always easy to understand the implications of caches:

[Figure: theoretical behavior of Radix sort vs. Quicksort (instructions per item) and observed behavior (clock cycles per item), plotted against the size of the array to sort, 4K – 4096K items]

SLIDE 25

Cache Complexities

  • Here is why:

[Figure: cache misses per item for Radix sort vs. Quicksort, plotted against the size of the array to sort, 4K – 4096K items]

  • Memory system performance is often the critical factor

– multilevel caches and pipelined processors make it harder to predict outcomes
– compiler optimizations to increase locality sometimes hurt ILP

  • Difficult to predict the best algorithm: need experimental data

SLIDE 26

Virtual Memory

  • Main memory can act as a cache for the secondary storage (disk)

[Figure: address translation maps virtual addresses to physical addresses in main memory or to disk addresses]

  • Advantages:

– illusion of having more physical memory
– program relocation
– protection

SLIDE 27

Pages: virtual memory blocks

  • Page faults: the data is not in memory, so retrieve it from disk

– huge miss penalty, thus pages should be fairly large (e.g., 4KB)
– reducing page faults is important (LRU is worth the price)
– the faults can be handled in software instead of hardware
– write-through is too expensive, so we use write-back

SLIDE 28

Page Tables

SLIDE 29

Page Tables

SLIDE 30

Page table

  • Page table implementation

– often resides in memory
– is indexed by the page number from the virtual address

  • With 2^20 pages in the virtual address space => 2^20 entries in the page table

– each program has its own page table, in a contiguous memory space

  • Page table register: holds the start address of the page table
  • State of a program: PC, registers, page table, ...

– Page fault: handled by the O.S. through an exception

  • Check the valid bit (off means a page fault has occurred)
  • The OS takes over control (through the exception)
  • Find the page's address on disk
  • Find a page in memory to replace (LRU, approximate LRU, ...)

– Write the old page to disk (if its dirty bit is on) -> use write-back. Why? (Writing through to disk on every store would be far too slow; see Slide 27.)
– Read the page from disk into memory, and set the reference bit on

  • Transfer control back to the user process

– Swap space

  • The space on the disk reserved for the virtual memory space of a process
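A toy, runnable sketch (not from the slides) of the page-fault path just described; the dict "disk", the two-frame "memory", and all names are hypothetical stand-ins for real OS machinery, with true LRU replacement:

```python
from collections import OrderedDict

disk = {vpn: f"page-{vpn}-data" for vpn in range(8)}  # backing store
NUM_FRAMES = 2
frames = OrderedDict()   # vpn -> (data, dirty), kept in LRU order

def access(vpn, write=False):
    if vpn in frames:                       # valid: no fault
        frames.move_to_end(vpn)
    else:                                   # page fault: "OS" takes over
        if len(frames) == NUM_FRAMES:
            victim, (data, dirty) = frames.popitem(last=False)
            if dirty:                       # write-back: dirty pages only
                disk[victim] = data
        frames[vpn] = (disk[vpn], False)    # read the page from disk
        print(f"page fault on {vpn}")
    data, dirty = frames[vpn]
    frames[vpn] = (data, dirty or write)    # a store sets the dirty bit

for vpn, wr in [(0, False), (1, True), (0, False), (2, False)]:
    access(vpn, wr)   # faults on 0, 1, 2; dirty page 1 is written back
```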
SLIDE 31

Making Address Translation Fast: TLB

  • Translation lookaside buffer (TLB): a cache for holding a portion of the page table

SLIDE 32

TLB

  • TLB implementation

– Tag in the TLB

  • The TLB needs a tag field because it holds only a portion of the page table entries.
  • The page table does not need a tag field because it holds an entry for every virtual page.

– Associativity in the TLB

  • Small TLB: fully associative -> low miss rate, and the cost is not too high (because it is small)
  • Large TLB: small associativity -> full associativity (with LRU) would cost too much

  • Associativity of page placement in memory

– Fully associative placement of pages in memory (but no full search is needed; the page table is indexed directly)
– Replacement: LRU, approximate LRU, or more sophisticated algorithms
– Some typical values for a TLB might be:

  • TLB size: 16 – 512 entries; block size: 1 – 2 page table entries
  • Hit time: 0.5 – 1 clock cycle; miss penalty: 10 – 100 clock cycles
  • Miss rate: 0.01% – 1%

– A TLB miss can be handled by hardware or software

  • A page fault is often handled by software (the OS, through an exception)
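A toy sketch (not from the slides) of a fully associative TLB with LRU replacement in front of a page table; the sizes and names are illustrative:

```python
from collections import OrderedDict

page_table = {vpn: vpn + 100 for vpn in range(32)}  # vpn -> frame number
TLB_ENTRIES = 4
tlb = OrderedDict()       # vpn -> frame, kept in LRU order

def translate(vpn):
    if vpn in tlb:                        # TLB hit: fast path
        tlb.move_to_end(vpn)
        return tlb[vpn]
    frame = page_table[vpn]               # TLB miss: walk the page table
    if len(tlb) == TLB_ENTRIES:
        tlb.popitem(last=False)           # evict the LRU translation
    tlb[vpn] = frame
    return frame

for vpn in (1, 2, 1, 3, 4, 5, 1):
    print(vpn, translate(vpn))
```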
SLIDE 33

TLBs and Caches

  • Physically addressed cache
  • Virtually addressed cache
  • Aliasing problem? (With a virtually addressed cache, two different virtual addresses can refer to the same physical page, so the same data can end up in two cache locations.)

SLIDE 34

  • The possible combinations of events in the TLB, virtual memory system, and cache:

TLB   Page table   Cache   Possible? If so, under what circumstance?
Hit   Hit          Miss    Possible
Miss  Hit          Hit     Possible
Miss  Hit          Miss    Possible
Miss  Miss         Miss    Possible
Hit   Miss         Miss    Impossible (a translation cannot be in the TLB if the page is not in memory)
Hit   Miss         Hit     Impossible (a translation cannot be in the TLB if the page is not in memory)
Miss  Miss         Hit     Impossible (data cannot be in the cache if the page is not in memory)

SLIDE 35

Modern Systems

SLIDE 36

Modern Systems

  • Things are getting complicated!

SLIDE 37

Some Issues

  • Processor speeds continue to increase very fast

– much faster than either DRAM or disk access times

  • Design challenge: dealing with this growing disparity

– Prefetching? 3rd-level caches and more? Memory design?

SLIDE 38

Summary (p. 578)

  • Cache performance:

– The total # of cycles spent on a program is the sum of the processor cycles and the memory-stall cycles.
– As processors get faster, the relative effect of the memory-stall cycles increases.
– The # of memory-stall cycles depends on both the miss rate and the miss penalty.

  • Reduce the miss rate: associative placement schemes

– The choice among the different placement strategies depends on the cost of a miss vs. the cost of implementing associativity, both in time and in extra hardware.

  • Reduce the miss penalty: multilevel caches

– Allow a larger secondary cache to handle misses to the primary cache.

  • Memory hierarchies: the 3 Cs and 4 Qs.
SLIDE 39

The Three Cs

  • Cache miss rates: the 3 Cs

– Compulsory miss (also called cold-start miss)

  • First-ever reference to a given block of memory
  • Increasing the block size may reduce this rate (but increases the miss penalty)

– Capacity miss

  • Working set exceeds cache capacity
  • Increasing the total cache size may reduce this rate (but increases access time)

– Conflict miss (also called collision miss)

  • Placement restrictions cause useful blocks to be replaced
  • Higher associativity may reduce this rate (but increases access time)

  • Cache miss rate is determined by

– Locality
– Cache organization
SLIDE 40

The Four Qs

  • Question 1: Where can a block be placed?

– One place (direct mapped), a few places (set-associative), or any place (fully associative)

  • Question 2: How is a block found?

– Indexing (direct mapped), limited search (set-associative), full search (fully associative), or a separate lookup table (as in a page table)

  • Question 3: What block is replaced on a miss?

– Typically, LRU or a random block

  • Question 4: How are writes handled?

– Write-through, write-back, write buffer