IC220 Caching 2: Memory Hierarchy (more from Chapter 5 - specifically 5.7, 5.8)


  1. IC220 Caching 2: Memory Hierarchy (more from Chapter 5 - specifically 5.7, 5.8)

  2. Cache design overview
     ANY cache can be viewed as k-way associative. What are the pros and cons of each?
     • Fully associative: k = N/B
     • 4-way set associative: k = 4
     • Direct-mapped: k = 1
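     As a sketch of how k determines the cache geometry (the 32 KiB capacity and 64-byte blocks below are assumed example values, not from the slides), the number of sets is N divided by the bytes held per set:

     ```c
     #include <assert.h>
     #include <stdio.h>

     /* Hypothetical cache: N bytes total, B bytes per block, k-way associative.
        Each set holds k blocks, so the number of sets is N / (B * k). */
     static unsigned num_sets(unsigned N, unsigned B, unsigned k) {
         return N / (B * k);
     }

     int main(void) {
         unsigned N = 32 * 1024, B = 64;          /* assumed 32 KiB cache, 64 B blocks */
         printf("direct-mapped: %u sets\n", num_sets(N, B, 1));      /* k = 1 */
         printf("4-way:         %u sets\n", num_sets(N, B, 4));      /* k = 4 */
         printf("fully assoc.:  %u set\n",  num_sets(N, B, N / B));  /* k = N/B */
         return 0;
     }
     ```

     Direct-mapped gives the most sets (simple, fast index), while fully associative collapses to a single set (any block anywhere, but every tag must be compared).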

  3. Improving Cache Performance
     Remember the key metrics: Miss Rate, Hit Time, Miss Penalty. What happens if we:
     • Increase the cache size (N)?
     • Increase the block size (keeping N the same)?
     • Increase associativity (keeping N the same)?

  4. Cache performance key tradeoff
     Inherent conflict: HIT TIME vs MISS RATE

  5. More hierarchy – L2 cache?
     • Problem: CPUs get faster, DRAM gets bigger
       – Must keep hit time small (1 or 2 cycles)
       – But then the cache must be small too (fast SRAM is expensive)
       – So the miss rate gets higher...
     • Solution: Add another level of cache:
       – try to optimize the ____________ on the 1st-level cache
       – try to optimize the ____________ on the 2nd-level cache

  6. Memory Hierarchy

  7. Questions
     • Will the miss rate of an L2 cache be higher or lower than for the L1 cache?
     • Claim: “The register file is really the lowest-level cache.” What are the reasons for and against this statement?

  8. Split Caches
     • Instructions and data have different properties
       – May benefit from different cache organizations (block size, associativity, …)
     [Diagram: CPU → split L1 ICache and DCache → L2 cache → L3, L4, …? → main memory]

  9. What does an address refer to?
     The old way:
     • The address refers to a specific byte in main memory (DRAM).
     • This is called a physical address.
     Problems with this:
     [Diagram: CPU → physical address → cache → memory]

  10. Virtual memory: Main idea
      The CPU works with (fake) virtual addresses; the operating system translates them to physical addresses.
      Advantages:
      New challenge:
      [Diagram: CPU → virtual address → OS translation → physical address → cache → memory]

  11. Pages and virtual address translation
      • Virtual AND physical addresses are divided into blocks called pages.
      • Typical page size is 4 KiB (meaning 12 bits for the offset)
      [Diagram: cache ↔ memory ↔ disk]
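      With 4 KiB pages, the low 12 bits of an address are the offset within the page and the remaining bits are the page number. A minimal sketch (the example address 0x00403A7C is an assumption for illustration):

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>

      #define PAGE_OFFSET_BITS 12   /* 4 KiB pages => 12 offset bits */

      /* Split a virtual address into virtual page number and page offset. */
      static uint32_t vpn_of(uint32_t vaddr)    { return vaddr >> PAGE_OFFSET_BITS; }
      static uint32_t offset_of(uint32_t vaddr) { return vaddr & ((1u << PAGE_OFFSET_BITS) - 1); }

      int main(void) {
          uint32_t vaddr = 0x00403A7C;   /* arbitrary example address (an assumption) */
          printf("VPN = 0x%X, offset = 0x%X\n", vpn_of(vaddr), offset_of(vaddr));
          assert(vpn_of(vaddr) == 0x403 && offset_of(vaddr) == 0xA7C);
          return 0;
      }
      ```

      Only the page number gets translated; the offset passes through unchanged.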

  12. Page Tables
      • The translation from virtual to physical pages is stored in the page table.

  13. Pages: virtual memory blocks
      • Page faults: the data is not in memory; retrieve it from disk
        – huge miss penalty (slow disk), thus:
          • pages should be fairly ____
      • Replacement strategy:
        – can handle the faults in software instead of hardware
      • Writeback or write-through?

  14. Address Translation
      Terminology:
      • Cache block → ____
      • Cache miss → ____
      • Cache tag → ____
      • Byte offset → ____

  15. Making Address Translation Fast
      • A cache for address translations: the translation lookaside buffer (TLB)
      • Typical values: 16–512 PTEs (page table entries); miss rate: 0.01%–1%; miss penalty: 10–100 cycles

  16. Virtual Memory Take-Aways
      • The CPU/programs deal with virtual addresses (virtual page number + page offset).
      • These are translated to physical addresses (physical page # + page offset) between the CPU and the cache.
      • Memory is divided into blocks called pages, commonly 4 KiB (therefore 12 bits for the page offset).
      • Page tables, managed by the operating system for each process, store the virtual->physical page number mapping, as well as that process’s permissions (read/write).
      • The TLB is a special CPU cache for page table lookups.
      • Physical addresses can reside in DRAM (typical), be stored on disk (making RAM “look” larger to the CPU), or even refer to other devices (memory-mapped I/O).
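      The lookup the take-aways describe can be sketched as a toy page table (the table contents, size, and -1-means-not-resident convention below are all illustrative assumptions, not the real OS data structure):

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define OFFSET_BITS 12
      #define NUM_PAGES   16   /* toy virtual address space: 16 pages */

      /* Toy page table: page_table[vpn] holds the physical page number,
         or -1 if the page is not resident (the OS would handle that fault). */
      static int page_table[NUM_PAGES] = {
           5,  9, -1, -1, -1, -1, -1, -1,
          -1, -1, -1, -1, -1, -1, -1, -1,
      };

      /* Translate a virtual address; returns -1 on a page fault. */
      static int64_t translate(uint32_t vaddr) {
          uint32_t vpn = vaddr >> OFFSET_BITS;
          uint32_t off = vaddr & ((1u << OFFSET_BITS) - 1);
          if (vpn >= NUM_PAGES || page_table[vpn] < 0)
              return -1;   /* page fault */
          return ((int64_t)page_table[vpn] << OFFSET_BITS) | off;
      }

      int main(void) {
          assert(translate(0x0ABC) == 0x5ABC);   /* VPN 0 -> PPN 5, offset kept */
          assert(translate(0x1FFF) == 0x9FFF);   /* VPN 1 -> PPN 9 */
          assert(translate(0x2004) == -1);       /* VPN 2 not resident: fault */
          return 0;
      }
      ```

      The TLB simply caches recent (vpn, ppn) pairs so most translations skip this table walk.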

  17. Modern Systems

  18. Program Design: 2D array layout
      • Consider this C declaration:
        int A[4][3] = { {10, 11, 12}, {20, 21, 22}, {30, 31, 32}, {40, 41, 42} };
      • How is this array stored in memory?
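      C stores 2D arrays in row-major order: all of row 0, then row 1, and so on, so A[i][j] sits at linear offset i*3 + j from the start. A small check of that layout (the `element_at` helper is just for illustration):

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* The slide's array, at file scope so the helper can see it. */
      static int A[4][3] = { {10, 11, 12}, {20, 21, 22}, {30, 31, 32}, {40, 41, 42} };

      /* Read the array as one flat row-major sequence of 12 ints. */
      static int element_at(int linear) { return (&A[0][0])[linear]; }

      int main(void) {
          assert(element_at(0)  == 10);       /* A[0][0] */
          assert(element_at(3)  == 20);       /* row 1 begins right after row 0 */
          assert(element_at(7)  == 31);       /* A[2][1] at offset 2*3 + 1 */
          assert(element_at(11) == 42);       /* last element */
          printf("row-major layout confirmed\n");
          return 0;
      }
      ```

      This layout is why the inner loop should vary the last index: consecutive elements of a row are adjacent in memory.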

  19. Program Design for Caches – Example 1
      • Option #1:
        for (j = 0; j < 20; j++)
          for (i = 0; i < 200; i++)
            x[i][j] = x[i][j] + 1;
      • Option #2:
        for (i = 0; i < 200; i++)
          for (j = 0; j < 20; j++)
            x[i][j] = x[i][j] + 1;
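      A runnable version of both options (the declaration of `x` and its zero initialization are assumptions; the slide shows only the loops):

      ```c
      #include <assert.h>

      #define ROWS 200
      #define COLS 20

      static int x[ROWS][COLS];   /* zero-initialized static array (assumed) */

      /* Option #1: column-major traversal. Consecutive accesses to x[i][j] are
         a whole row (COLS ints) apart, so each may land in a different cache
         block: poor spatial locality. */
      static void option1(void) {
          for (int j = 0; j < COLS; j++)
              for (int i = 0; i < ROWS; i++)
                  x[i][j] = x[i][j] + 1;
      }

      /* Option #2: row-major traversal. Accesses are adjacent in memory, so
         several in a row hit the same cache block: good spatial locality. */
      static void option2(void) {
          for (int i = 0; i < ROWS; i++)
              for (int j = 0; j < COLS; j++)
                  x[i][j] = x[i][j] + 1;
      }

      int main(void) {
          option1();
          option2();
          /* Both orders compute the same result; only the access order differs. */
          assert(x[0][0] == 2 && x[ROWS - 1][COLS - 1] == 2);
          return 0;
      }
      ```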

  20. Program Design for Caches – Example 2
      • Why might this code be problematic?
        int A[1024][1024];
        int B[1024][1024];
        for (i = 0; i < 1024; i++)
          for (j = 0; j < 1024; j++)
            A[i][j] += B[i][j];
      • How to fix it?
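      The slide leaves the fix open; one common answer is padding. With the original power-of-two sizes, A[i][j] and B[i][j] lie a large power-of-two distance apart and can map to the same set of a low-associativity cache, evicting each other on every iteration (conflict misses). A hedged sketch of the padding fix (the pad of 16 ints, i.e. one 64-byte block per row, is an assumption):

      ```c
      #include <assert.h>

      #define N   1024
      #define PAD 16    /* assumed: one extra 64-byte block (16 ints) per row */

      /* Padding each row shifts B's elements relative to A's, so matching
         elements no longer collide set-for-set in the cache. */
      static int A[N][N + PAD];
      static int B[N][N + PAD];

      static void add_arrays(void) {
          for (int i = 0; i < N; i++)
              for (int j = 0; j < N; j++)   /* the padding columns are never touched */
                  A[i][j] += B[i][j];
      }

      int main(void) {
          B[3][5] = 7;
          add_arrays();
          assert(A[3][5] == 7 && A[0][0] == 0);
          return 0;
      }
      ```

      The result is identical to the unpadded version; only the memory layout, and therefore the cache mapping, changes.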

  21. Concluding Remarks
      • Fast memories are small; large memories are slow
        – We really want fast, large memories
        – Caching gives this illusion
      • Principle of locality
        – Programs use a small part of their memory space frequently
      • Memory hierarchy
        – L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk
      • Memory system design is critical for multiprocessors
