IC220 Caching 2: Memory Hierarchy (more from Chapter 5 - specifically 5.7, 5.8)



SLIDE 1

IC220 Caching 2: Memory Hierarchy (more from Chapter 5 - specifically 5.7, 5.8)

SLIDE 2

Cache design overview

ANY cache can be viewed as k-way set associative (N = cache size in bytes, B = block size in bytes). What are the pros and cons of each?

  • Fully associative: k = N/B (one set holding every block)
  • 4-way set associative: k = 4
  • Direct-mapped: k = 1
SLIDE 3

Improving Cache Performance

Remember key metrics: Miss Rate, Hit Time, Miss Penalty.

What happens if we:

  • Increase the cache size (N)?
  • Increase the block size (keeping N the same)?
  • Increase associativity (keeping N the same)?
SLIDE 4

Cache performance key tradeoff

Inherent conflict:

HIT TIME vs. MISS RATE

SLIDE 5

More hierarchy – L2 cache?

  • Problem: CPUs get faster, DRAM gets bigger

– Must keep hit time small (1 or 2 cycles)
– But then the cache must be small too (fast SRAM is expensive)
– So the miss rate gets higher...

  • Solution: Add another level of cache:

– try to optimize the ____________ on the 1st-level cache
– try to optimize the ____________ on the 2nd-level cache

SLIDE 6

Memory Hierarchy

SLIDE 7

Questions

  • Will the miss rate of an L2 cache be higher or lower than for the L1 cache?
  • Claim: “The register file is really the lowest-level cache”

What are the reasons for and against this statement?

SLIDE 8

Split Caches

  • Instructions and data have different properties

– May benefit from different cache organizations (block size, associativity, …)

[Diagram: CPU → split L1 ICache and DCache → L2 Cache → Main memory; L3, L4, …?]

SLIDE 9

What does an address refer to?

The old way:

  • Address refers to a specific byte in main memory (DRAM).
  • This is called a physical address.

[Diagram: CPU → Cache → Memory, all addressed by the physical address]

Problems with this:

SLIDE 10

Virtual memory: Main idea

The CPU works with (fake) virtual addresses; the operating system translates them to physical addresses.

[Diagram: CPU issues a virtual address; after OS translation, the cache and memory see the physical address]

Advantages:

New challenge: OS Translation

SLIDE 11

Pages and virtual address translation

[Diagram: cache ↔ main memory ↔ disk]

  • Virtual AND physical addresses are divided into blocks called pages.
  • Typical page size is 4KiB (means 12 bits for the page offset).

SLIDE 12

Page Tables

  • The translation from virtual to physical pages is stored in a page table.
SLIDE 13

Pages: virtual memory blocks

  • Page faults: the data is not in memory; retrieve it from disk

– huge miss penalty (slow disk), thus:

  • pages should be fairly large
  • Replacement strategy:

– can handle the faults in software instead of hardware

  • Writeback or write-through?
SLIDE 14

Address Translation

Terminology:

  • Cache block ↔ page
  • Cache miss ↔ page fault
  • Cache tag ↔ virtual page number
  • Byte offset ↔ page offset
SLIDE 15

Making Address Translation Fast

  • A cache for address translations: translation lookaside buffer (TLB)

Typical values:

– 16–512 PTEs (page table entries)
– miss rate: 0.01%–1%
– miss penalty: 10–100 cycles

SLIDE 16

Virtual Memory Take-Aways

  • CPU/programs deal with virtual addresses (virtual page number + page offset).
  • Translated to physical addresses (physical page # + page offset) between CPU and cache.
  • Memory is divided into blocks called pages, commonly 4KiB (therefore 12 bits for the page offset).
  • Page tables, managed by the operating system for each process, store the virtual->physical page number mapping, as well as that process’s permissions (read/write).
  • TLB is a special CPU cache for page table lookups.
  • Physical addresses can reside in DRAM (typical), or be stored on disk (making RAM “look” larger to the CPU), or can even refer to other devices (memory-mapped I/O).
SLIDE 17

Modern Systems

SLIDE 18

Program Design: 2D array layout

  • Consider this C declaration:

    int A[4][3] = { {10, 11, 12},
                    {20, 21, 22},
                    {30, 31, 32},
                    {40, 41, 42} };

  • How is this array stored in memory?
SLIDE 19

Program Design for Caches – Example 1

  • Option #1

    for (j = 0; j < 20; j++)
        for (i = 0; i < 200; i++)
            x[i][j] = x[i][j] + 1;

  • Option #2

    for (i = 0; i < 200; i++)
        for (j = 0; j < 20; j++)
            x[i][j] = x[i][j] + 1;

SLIDE 20

Program Design for Caches – Example 2

  • Why might this code be problematic?

    int A[1024][1024];
    int B[1024][1024];

    for (i = 0; i < 1024; i++)
        for (j = 0; j < 1024; j++)
            A[i][j] += B[i][j];

  • How to fix it?
SLIDE 21

Concluding Remarks

  • Fast memories are small, large memories are slow

– We really want fast, large memories
– Caching gives this illusion

  • Principle of locality

– Programs use a small part of their memory space frequently

  • Memory hierarchy

– L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk

  • Memory system design is critical for multiprocessors