IC220: Caching 1 (Chapter 5) – Memory, Cost, and Performance



SLIDE 1

IC220: Caching 1 (Chapter 5)

SLIDE 2

Memory, Cost, and Performance

  • Ideal World: we want a memory that is
    – Fast,
    – Big, &
    – Cheap! (Choose any two!)
  • Recent “real world” situation:
    – SRAM access times are 0.5–2.5 ns at a cost of $500 to $1000 per GB.
    – DRAM access times are 50–70 ns at a cost of $10 to $20 per GB.
    – Flash memory access times are 5,000–50,000 ns at a cost of $0.75 to $1 per GB.
    – Disk access times are 5 to 20 million ns at a cost of $0.05 to $0.10 per GB.
  • Solution: CACHING

SLIDE 3

Caching Concepts and Terminology

  • Locality: Temporal and Spatial
  • Each access: Hit or Miss
  • Eviction strategies: Random or Least-Recently Used (LRU)
  • Reasons for a Miss: Compulsory, Capacity, or Conflict
  • Measurements: Miss Rate, Hit Time, Miss Penalty
  • Cache types: Fully Associative, Direct-Mapped, or Set Associative
  • Parameters: N (total size), B (size of block), k (associativity)
  • Write strategies: Write-through or Write-back
  • Implementation details: Stall, Valid bit, Dirty bit, Tag

SLIDE 4

Principle of Locality

  • Basic observations on how memory tends to be accessed in computer programs:
  • If an item is referenced,
    1. it will tend to be referenced again soon (TEMPORAL LOCALITY)
    2. nearby items will tend to be referenced soon (SPATIAL LOCALITY)

SLIDE 5

Caching Basics

  • Cache consists of N bytes (N = 8 in our examples here)
  • To read or write to a given address:
    1. First look in cache.
    2. If it’s there, we have a HIT.
    3. Otherwise, it’s a MISS and you must fetch from main memory.
       – If cache is full, must EVICT to insert the new data.
  • Which cache line should be evicted?
    – Random
    – Least-Recently Used (LRU)
    (our examples will always follow LRU)

SLIDE 6

Example 1 – Fully associative (no blocking)

Cache (N = 8): eight lines, each holding one Address / Data pair (initially empty)

Memory (excerpt):

  Address  Data
  16       67
  17       3
  18       27
  19       32
  ...
  42       78
  43       59
  44       24
  45       56
  46       87
  47       36
  48       98
  49       59

Processor accesses:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18

Total hits? Total misses?
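The hit/miss totals asked for above can be checked with a short simulation. This sketch is my own illustration (the helper name and structure are not from the course): a fully associative cache of single-byte lines with LRU eviction, modeled as an ordered dictionary whose oldest entry is the LRU line.

```python
from collections import OrderedDict

def simulate_fa(trace, num_lines):
    """Fully associative cache with LRU eviction; returns (hits, misses)."""
    cache = OrderedDict()              # keys = cached addresses, oldest first
    hits = misses = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)    # mark as most recently used
        else:
            misses += 1
            if len(cache) == num_lines:      # full: evict least recently used
                cache.popitem(last=False)
            cache[addr] = True
    return hits, misses

# The access sequence from Example 1 (N = 8, no blocking):
trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
hits, misses = simulate_fa(trace, num_lines=8)
```

Running it on the 13 reads produces the totals the slide asks for; varying `num_lines` shows how capacity changes the miss count.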

SLIDE 7

Analysis of Example 1 (FA, no blocking)

  • Miss Rate = (# of misses) / (# of accesses) =

  • Two kinds of misses:
    – Compulsory miss (first time accessing)
    – Capacity miss (not enough room, got evicted)
  • What was good:
  • What was bad:
  • Measurement concepts:
    – Hit Time: How long to look up something that is in cache
    – Miss Penalty: How long to fetch something not in cache

SLIDE 8

How to handle a miss?

  • Things we need to do:
    1. Stall the CPU until miss completes
    2. (If cache is full) Evict old data
       – Random or LRU
    3. Fetch the needed data from memory
    4. Restart the CPU
  • Time for this is called the Miss Penalty.

SLIDE 9

Blocking

  • Goal: Exploit spatial locality
  • Main idea:
    – Group memory into blocks
    – B bytes in each block of memory
    – B bytes in each cache line
    – Always fetch and evict entire blocks (even if not all data was requested yet)
  • Position within cache line determined by Byte Offset:
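The slide leaves the byte-offset formula as an exercise. The standard relations for B-byte blocks (an assumption here, but conventional) are block number = address ÷ B and byte offset = address mod B; a small illustrative snippet:

```python
def block_number(addr, B):
    """Which memory block an address belongs to (blocks are B bytes)."""
    return addr // B

def byte_offset(addr, B):
    """Position of the address within its block / cache line."""
    return addr % B

# With B = 2, addresses 42 and 43 share block 21 at offsets 0 and 1.
```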
SLIDE 10

Example 2 – Fully associative with blocking

Cache (N = 8, B = 2): four lines, each holding one 2-byte block
(Address / Data, with Offset 0 and Offset 1)

Memory (excerpt; each 2-byte block bracketed):

  [16: 67, 17: 3]   [18: 27, 19: 32]   ...
  [42: 78, 43: 59]  [44: 24, 45: 56]   [46: 87, 47: 36]   [48: 98, 49: 59]

Processor accesses:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18

Total hits? Total misses?
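As with Example 1, the totals can be checked by simulation. The only change from the unblocked case is that whole blocks, identified by `addr // B`, are fetched and evicted; the helper below is my own sketch, not course code:

```python
from collections import OrderedDict

def simulate_fa_blocking(trace, N, B):
    """Fully associative cache of N/B lines, B-byte blocks, LRU eviction."""
    num_lines = N // B
    cache = OrderedDict()              # keys = cached block numbers
    hits = misses = 0
    for addr in trace:
        block = addr // B              # whole blocks are fetched and evicted
        if block in cache:
            hits += 1
            cache.move_to_end(block)   # mark block as most recently used
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)
            cache[block] = True
    return hits, misses

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
hits, misses = simulate_fa_blocking(trace, N=8, B=2)
```

Adjacent addresses like 42 and 43 now share a line, so spatial locality turns several of Example 1's misses into hits.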

SLIDE 11

Analysis of Example 2 (FA with blocking)

  • Miss Rate = (# of misses) / (# of accesses) =

  • Advantages:
  • Disadvantages:
  • Measurement concepts:
    – Hit Time: How long to look up something that is in cache
    – Miss Penalty: How long to fetch something not in cache

SLIDE 12

How big should the blocks be?

  • Keeping cache size N fixed,
    – Smaller B means:
    – Larger B means:
  • Increasing block size tends to decrease miss rate, up to a point:
SLIDE 13

Improving Hit Time with Direct Mapping

  • Problem: How to determine whether a block is in cache?
  • Fully associative (previous examples):
    – Requested block could be anywhere in cache
    – Must search through all cache lines, or keep an extra data structure
  • Direct Mapped Cache:
    – Assign index 0 through (N/B - 1) to each cache line
    – Each memory block is assigned one possible index
    – Formula:
SLIDE 14

Example 3 – Direct-Mapped

Cache (N = 8, B = 2): four lines with indices 0–3, each holding one 2-byte block
(Index / Address / Data, with Offset 0 and Offset 1)

Memory (excerpt; each 2-byte block bracketed):

  [16: 67, 17: 3]   [18: 27, 19: 32]   ...
  [42: 78, 43: 59]  [44: 24, 45: 56]   [46: 87, 47: 36]   [48: 98, 49: 59]

Processor accesses:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18

Total hits? Total misses?
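A direct-mapped lookup needs no search and no LRU bookkeeping: each block has exactly one legal line. A sketch of my own for checking the totals above:

```python
def simulate_dm(trace, N, B):
    """Direct-mapped cache: each block can live in exactly one line."""
    num_lines = N // B
    lines = [None] * num_lines         # lines[i] = block number stored there
    hits = misses = 0
    for addr in trace:
        block = addr // B
        index = block % num_lines
        if lines[index] == block:
            hits += 1
        else:
            misses += 1                # compulsory, capacity, or conflict
            lines[index] = block       # no choice about where to place it
    return hits, misses

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
hits, misses = simulate_dm(trace, N=8, B=2)
```

Blocks 9 (addresses 18–19) and 21 (addresses 42–43) share index 1 here, which is where conflict misses come from.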

SLIDE 15

Analysis of Example 3 (Direct mapped)

  • Miss Rate = (# of misses) / (# of accesses) =

  • THREE kinds of misses:
    – Compulsory miss (first time accessing)
    – Capacity miss (not enough room, got evicted)
    – Conflict miss (same index, would have had enough room in FA)
  • Advantages:
  • Disadvantages:
  • Measurement concepts:
    – Hit Time: How long to look up something that is in cache
    – Miss Penalty: How long to fetch something not in cache

SLIDE 16

Compromise: Set-Associative

  • Goal: Combine low miss rate of FA with good hit time of DM
  • Fully associative (FA):
    – Requested block could be anywhere in cache
  • Direct Mapped (DM):
    – Assign index 0 through (N/B - 1) to each cache line
    – Each memory block is assigned one possible index
  • k-way Set Associative Cache:
    – Group cache lines into “sets” of k lines each
    – Each set has a DM index, 0 through N/(kB) - 1
    – Within each set, addresses can go anywhere (associative)
    – Formula:
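This formula blank follows the same pattern as the direct-mapped one: the standard rule (assumed, not quoted from the slide) is set index = (block number) mod (number of sets), where the number of sets is N/(kB):

```python
def set_index(addr, N, B, k):
    """k-way set-associative: set index = (block number) mod (number of sets)."""
    num_sets = N // (k * B)
    return (addr // B) % num_sets

# With N = 8, B = 2, k = 2 (the next example) there are 2 sets.
```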
SLIDE 17

Example 4 – 2-way Set-Associative

Cache (N = 8, B = 2, k = 2): two sets with indices 0 and 1, each set holding two 2-byte blocks
(Index / Address / Data, with Offset 0 and Offset 1)

Memory (excerpt; each 2-byte block bracketed):

  [16: 67, 17: 3]   [18: 27, 19: 32]   ...
  [42: 78, 43: 59]  [44: 24, 45: 56]   [46: 87, 47: 36]   [48: 98, 49: 59]

Processor accesses:
  1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
  6. Read 42   7. Read 19   8. Read 45   9. Read 44   10. Read 46
  11. Read 43  12. Read 47  13. Read 18

Total hits? Total misses?
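Combining the two previous ideas gives a checker for this example: direct-mapped indexing picks a set, and LRU operates within the set. This is my own sketch, not course code:

```python
from collections import OrderedDict

def simulate_sa(trace, N, B, k):
    """k-way set-associative cache with LRU eviction inside each set."""
    num_sets = N // (k * B)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        block = addr // B
        s = sets[block % num_sets]     # DM step: pick the set
        if block in s:
            hits += 1
            s.move_to_end(block)       # FA step: LRU within the set
        else:
            misses += 1
            if len(s) == k:            # set full: evict its LRU block
                s.popitem(last=False)
            s[block] = True
    return hits, misses

trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]
hits, misses = simulate_sa(trace, N=8, B=2, k=2)
```

Note that k = 1 reduces this to direct-mapped, and k = N/B (a single set) to fully associative.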

SLIDE 18

Performance Tradeoffs

  • Block size
    – Advantages of small B:
    – Advantages of large B:
    – Typical values: 64 bytes (bytes, not bits!)
  • Associativity
    – Advantages of small k (DM):
    – Advantages of large k (SA, FA):
    – Typical values: 4, 8, 12

SLIDE 19

What to do on a write?

Cache (N = 5, B = 1, k = 1): five lines with indices 0–4 (Address / Data)

Memory (excerpt):

  Address  Data
  20       7
  21       3
  22       27
  23       32
  24       101
  25       78
  26       34
  27       87
  28       53
  29       93

Processor accesses:
  1. Read 24
  2. Write 5 to 24
  3. Read 26
  4. Write 8 to 25
  5. Write 9 to 21
  6. Write 2 to 24
  7. Read 29

SLIDE 20

Write Strategies

  • Write-through: Update memory immediately
  • Write-back: Update memory on eviction
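The difference between the two strategies is when memory sees a store. A minimal write-back sketch of my own (write-allocate, B = 1, direct-mapped like the slides' small write example; the function and trace format are illustrative assumptions): a dirty bit defers the memory update until the line is evicted.

```python
def simulate_writeback(trace, memory, num_lines):
    """Write-back, write-allocate, direct-mapped cache with B = 1.
    A dirty bit marks lines that must be written to memory on eviction.
    trace entries are ('R', addr) or ('W', addr, value)."""
    lines = {}                         # index -> [addr, value, dirty]
    writes_to_memory = 0
    for op in trace:
        kind, addr = op[0], op[1]
        index = addr % num_lines
        line = lines.get(index)
        if line is None or line[0] != addr:       # miss
            if line is not None and line[2]:      # evicted line is dirty:
                memory[line[0]] = line[1]         # write it back now
                writes_to_memory += 1
            line = [addr, memory[addr], False]    # fetch from memory
            lines[index] = line
        if kind == 'W':
            line[1] = op[2]
            line[2] = True             # defer the memory update
    return writes_to_memory

memory = {20: 7, 21: 3, 22: 27, 23: 32, 24: 101,
          25: 78, 26: 34, 27: 87, 28: 53, 29: 93}
trace = [('R', 24), ('W', 24, 5), ('R', 26), ('W', 25, 8),
         ('W', 21, 9), ('W', 24, 2), ('R', 29)]
writes = simulate_writeback(trace, memory, num_lines=5)
```

Write-through would instead do `memory[addr] = value` on every store (four memory writes for this trace); write-back touches memory only when a dirty line is evicted.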
SLIDE 21

Write-back example

Cache (N = 5, B = 1, k = 1): five lines with indices 0–4 (Address / Data)

Memory (excerpt):

  Address  Data
  20       7
  21       3
  22       27
  23       32
  24       101
  25       78
  26       34
  27       87
  28       53
  29       93

Processor accesses:
  1. Read 24
  2. Write 5 to 24
  3. Read 26
  4. Write 8 to 25
  5. Write 9 to 21
  6. Write 2 to 24
  7. Read 29

SLIDE 22

Efficient Bit Manipulation

  • Given a 2-way associative cache with N = 64 and B = 8, what is the set index for address 153?
  • Formulas: NEW (assuming we are dealing with powers of 2):
    a. Express in binary. (153₁₀ = 99₁₆ = 10011001₂)
    b. Grab the right bits!
  • ByteOffset =
  • Index =
  • Tag =
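"Grab the right bits" can be written directly with shifts and masks. A sketch under the slide's power-of-two assumption (the function name is mine): the low log₂(B) bits are the byte offset, the next log₂(#sets) bits are the set index, and the remaining bits are the tag.

```python
def split_address(addr, N, B, k):
    """Split an address into (tag, set index, byte offset) for a k-way
    set-associative cache of N bytes with B-byte blocks (all powers of 2)."""
    offset_bits = B.bit_length() - 1           # log2(B)
    num_sets = N // (k * B)
    index_bits = num_sets.bit_length() - 1     # log2(number of sets)
    offset = addr & (B - 1)                    # low offset_bits bits
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Address 153 = 0b10011001 with N = 64, B = 8, k = 2 (4 sets):
tag, index, offset = split_address(153, N=64, B=8, k=2)
```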

SLIDE 23

Real Cache with Efficient Bit Manipulation

SLIDE 24

Example #1: Bit Manipulation

1. Suppose a direct-mapped cache has:
   – B = 8 byte blocks
   – 2 KiB cache
   Show how to break the following address into the tag, index, & byte offset.

   0000 1000 0101 1100 0001 0001 0111 1001

2. Same cache, but now 4-way associative. How does this change things?

   0000 1000 0101 1100 0001 0001 0111 1001
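The field widths for part 1 follow from the sizes: 3 offset bits (B = 8), and with 2 KiB / 8 B = 256 lines, 8 index bits, leaving 21 tag bits of a 32-bit address. For part 2, a 4-way cache has 256 / 4 = 64 sets and so 6 index bits. A worked sketch of my own:

```python
addr = 0b0000_1000_0101_1100_0001_0001_0111_1001

B = 8                      # block size -> 3 offset bits
N = 2 * 1024               # 2 KiB cache
num_lines = N // B         # 256 lines -> 8 index bits (direct-mapped)

offset = addr & (B - 1)
index = (addr >> 3) & (num_lines - 1)
tag = addr >> (3 + 8)      # remaining 21 bits

# Part 2, 4-way associative: 256 / 4 = 64 sets -> 6 index bits, 23 tag bits
index4 = (addr >> 3) & (64 - 1)
tag4 = addr >> (3 + 6)
```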

SLIDE 25

Example #2: Bit Manipulation

  • Suppose a direct-mapped cache divides addresses as follows:

    tag: 21 bits | index: 7 bits | byte offset: 4 bits

  • What is the block size? The number of blocks?
    Total size of the cache? (usually refers to size of data only)
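This question inverts the previous one: given the field widths, recover the geometry. A sketch of my own, using the 4-bit byte offset and 7-bit index from this example:

```python
def cache_geometry(offset_bits, index_bits):
    """Derive block size, block count, and data size from address-field widths
    of a direct-mapped cache."""
    block_size = 2 ** offset_bits      # bytes per block
    num_blocks = 2 ** index_bits       # one block per index in a DM cache
    total_data = block_size * num_blocks
    return block_size, num_blocks, total_data

block_size, num_blocks, total_data = cache_geometry(offset_bits=4, index_bits=7)
```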

SLIDE 26

Review: Main concepts

  • Locality: Temporal and Spatial
  • Each access: Hit or Miss
  • Eviction strategies: Random or Least-Recently Used (LRU)
  • Reasons for a Miss: Compulsory, Capacity, or Conflict
  • Measurements: Miss Rate, Hit Time, Miss Penalty
  • Cache types: Fully Associative, Direct-Mapped, or Set Associative
  • Parameters: N (total size), B (size of block), k (associativity)
  • Write strategies: Write-through or Write-back
  • Implementation details: Stall, Valid bit, Dirty bit, Tag