IC220 Caching 2: Memory Hierarchy (more from Chapter 5 - specifically 5.7, 5.8)


Cache design overview

ANY cache can be viewed as k-way associative. What are the pros and cons of each?

  • Fully associative: k = N/B
  • 4-way set associative, k = 4
  • Direct-mapped, k = 1
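
To make the relationship between k and the number of sets concrete, here is a minimal C sketch (my own illustration; set_index and the parameter names N, B, k are assumptions, not from the slides):

    /* A cache of N bytes with B-byte blocks holds N/B blocks total.
       With associativity k it has N/(B*k) sets:
         fully associative: k = N/B  ->  1 set (any block can go anywhere)
         direct-mapped:     k = 1    ->  N/B sets (each block has one home) */
    unsigned set_index(unsigned addr, unsigned N, unsigned B, unsigned k) {
        unsigned num_sets = N / (B * k);
        return (addr / B) % num_sets;   /* block number mod number of sets */
    }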


Improving Cache Performance

Remember the key metrics: Miss Rate, Hit Time, Miss Penalty. What happens if we:

  • Increase the cache size (N)?
  • Increase the block size (keeping N the same)?
  • Increase associativity (keeping N the same)?
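
A standard yardstick for all three questions is average memory access time (AMAT); the formula is the usual Chapter 5 one, but the numbers below are illustrative, not from the slides:

    AMAT = Hit Time + Miss Rate × Miss Penalty

For example, a 1-cycle hit time, 5% miss rate, and 100-cycle miss penalty give AMAT = 1 + 0.05 × 100 = 6 cycles. Each change above improves one of the three terms and may worsen another.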


Cache performance key tradeoff

Inherent conflict: HIT TIME vs. MISS RATE


More hierarchy – L2 cache?

  • Problem: CPUs get faster, DRAM gets bigger

– Must keep hit time small (1 or 2 cycles)
– But then cache must be small too (fast SRAM is expensive)
– So miss rate gets higher...

  • Solution: Add another level of cache:

– try to optimize the hit time on the 1st-level cache
– try to optimize the miss rate on the 2nd-level cache
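
The standard way to see why this split works is the two-level AMAT expansion (MissRate_L2 here is the local L2 miss rate; the numbers are illustrative, not from the slides):

    AMAT = HitTime_L1 + MissRate_L1 × (HitTime_L2 + MissRate_L2 × MissPenalty_mem)

For example, 1 + 0.05 × (10 + 0.25 × 100) = 2.75 cycles, versus 6 cycles with no L2 in the earlier example: the small L1 keeps hit time at 1 cycle, while the large L2 turns most L1 misses into 10-cycle events instead of 100-cycle trips to memory.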


Memory Hierarchy


Questions

  • Will the miss rate of an L2 cache be higher or lower than for the L1 cache?

  • Claim: “The register file is really the lowest-level cache.” What are reasons for and against this statement?


Split Caches

  • Instructions and data have different properties

– May benefit from different cache organizations (block size, associativity, …)

[Diagram: CPU ↔ ICache (L1) and DCache (L1) ↔ L2 Cache ↔ Main memory (L3, L4, …?)]


What does an address refer to?

The old way:

  • Address refers to a specific byte in main memory (DRAM).
  • This is called a physical address.

[Diagram: CPU → Cache → Memory, all using the physical address]

Problems with this:


Virtual memory: Main idea

CPU works with (fake) virtual addresses. The operating system translates them to physical addresses.

[Diagram: CPU issues a virtual address; after OS translation, a physical address goes to the cache and memory]

Advantages:

New challenge: OS translation


Pages and virtual address translation

[Diagram: cache, memory, and disk, with pages as the unit of transfer]

  • Virtual AND physical addresses are divided into blocks called pages.

  • Typical page size is 4 KiB (meaning 12 bits for the page offset).


Page Tables

  • Translation from virtual to physical pages stored in page table.
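
A minimal C sketch of the lookup (assuming a single-level table and 4 KiB pages; page_table, translate, and NUM_PAGES are illustrative names, not from the slides):

    #include <stdint.h>

    #define PAGE_BITS 12                  /* 4 KiB pages -> 12 offset bits */
    #define PAGE_SIZE (1u << PAGE_BITS)
    #define NUM_PAGES 1024                /* toy-sized virtual address space */

    uint32_t page_table[NUM_PAGES];       /* indexed by virtual page number */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn    = vaddr >> PAGE_BITS;       /* virtual page number */
        uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* unchanged by translation */
        uint32_t ppn    = page_table[vpn];          /* the page table lookup */
        return (ppn << PAGE_BITS) | offset;
    }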

Pages: virtual memory blocks

  • Page faults: the data is not in memory, retrieve it from disk

– huge miss penalty (slow disk), thus:

  • pages should be fairly large (e.g., 4 KiB)
  • Replacement strategy: reducing page faults is important, so an LRU-like policy is worth the cost

– can handle the faults in software instead of hardware

  • Write-back or write-through? (write-through to disk is far too slow, so write-back is used)


Address Translation

Terminology:

  • Cache block ↔ page
  • Cache miss ↔ page fault
  • Cache tag ↔ virtual page number
  • Byte offset ↔ page offset


Making Address Translation Fast

  • A cache for address translations: translation lookaside buffer (TLB)

– Typical values: 16–512 PTEs (page table entries)
– Miss rate: 0.01%–1%
– Miss penalty: 10–100 cycles
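
A hedged sketch of where the TLB sits, reusing the definitions from the page-table sketch above (the direct-mapped organization and TLB_ENTRIES are my assumptions, not from the slides):

    #define TLB_ENTRIES 64

    struct tlb_entry { uint32_t vpn, ppn; int valid; };
    struct tlb_entry tlb[TLB_ENTRIES];

    uint32_t translate_with_tlb(uint32_t vaddr) {
        uint32_t vpn = vaddr >> PAGE_BITS;
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
        if (!e->valid || e->vpn != vpn) {        /* TLB miss: walk the table */
            e->vpn   = vpn;
            e->ppn   = page_table[vpn];
            e->valid = 1;
        }                                        /* TLB hit: skip the walk */
        return (e->ppn << PAGE_BITS) | (vaddr & (PAGE_SIZE - 1));
    }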


Virtual Memory Take-Aways

  • CPU/programs deal with virtual addresses (virtual page number + page offset).

  • Translated to physical addresses (physical page # + page offset) between CPU and cache.

  • Memory is divided into blocks called pages, commonly 4 KiB (therefore 12 bits for page offset).

  • Page tables, managed by the operating system for each process, store the virtual→physical page number mapping, as well as that process’s permissions (read/write).

  • TLB is a special CPU cache for page table lookups.

  • Physical addresses can reside in DRAM (typical), or be stored on disk (making RAM “look” larger to the CPU), or can even refer to other devices (memory-mapped I/O).

Modern Systems


Program Design – 2D array layout

  • Consider this C declaration:

    int A[4][3] = { {10, 11, 12},
                    {20, 21, 22},
                    {30, 31, 32},
                    {40, 41, 42} };

  • How is this array stored in memory?
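
In C the answer is row-major order: each row is laid out contiguously, so A[i][j] sits at offset (i*3 + j) * sizeof(int) from &A[0][0], and memory reads 10, 11, 12, 20, 21, ..., 42. A quick illustrative check (my own snippet, not from the slides):

    #include <stdio.h>

    int main(void) {
        int A[4][3] = { {10, 11, 12}, {20, 21, 22},
                        {30, 31, 32}, {40, 41, 42} };
        int *flat = &A[0][0];             /* view the rows as one flat run */
        printf("%d %d %d\n", flat[0], flat[4], flat[11]);  /* prints: 10 21 42 */
        return 0;
    }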


Program Design for Caches – Example 1

  • Option #1

    for (j = 0; j < 20; j++)
        for (i = 0; i < 200; i++)
            x[i][j] = x[i][j] + 1;

  • Option #2

    for (i = 0; i < 200; i++)
        for (j = 0; j < 20; j++)
            x[i][j] = x[i][j] + 1;
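
Given the row-major layout just discussed, Option #2 walks x sequentially (stride of one int), so every word of each fetched cache block is used before the block is evicted; Option #1 strides 20 ints (80 bytes) between accesses, typically touching a new block each time and not returning to it until an entire column pass later. Swapping the loops this way is the classic loop-interchange optimization.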


Program Design for Caches – Example 2

  • Why might this code be problematic?

    int A[1024][1024];
    int B[1024][1024];

    for (i = 0; i < 1024; i++)
        for (j = 0; j < 1024; j++)
            A[i][j] += B[i][j];

  • How to fix it?
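
One likely culprit, assuming a direct-mapped (or low-associativity) cache: each array is 4 MiB and power-of-two sized, so A[i][j] and B[i][j] can map to the same cache set and evict each other on every iteration. A common fix is to break the power-of-two alignment, e.g. by padding one array; a hedged sketch:

    int A[1024][1024];
    int B[1024][1025];      /* one extra, never-used column shifts B's rows */
                            /* so A[i][j] and B[i][j] land in different sets */
    void add_arrays(void) {
        for (int i = 0; i < 1024; i++)
            for (int j = 0; j < 1024; j++)
                A[i][j] += B[i][j];
    }

Merging the two arrays into a single array of structs (so matching elements are adjacent in memory) is another standard fix.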

Concluding Remarks

  • Fast memories are small, large memories are slow

– We really want fast, large memories
– Caching gives this illusion

  • Principle of locality

– Programs use a small part of their memory space frequently

  • Memory hierarchy

– L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk

  • Memory system design is critical for multiprocessors