A Fully Associative Software-Managed Cache Design, Erik G. Hallnor



slide-1
SLIDE 1

A Fully Associative Software-Managed Cache Design

Erik G. Hallnor and Steven K. Reinhardt

Presented By: Maryam Sadooghi-Alvandi

slide-3
SLIDE 3

Motivation

  • Two Trends:
  • Growing DRAM access latency
  • Multi-megabyte on-chip caches

Re-examine how caches are organized and managed
slide-4
SLIDE 4

Motivation (II)

  • The cache-DRAM relationship is similar to the DRAM-disk storage relationship

slide-6
SLIDE 6

Virtual Memory

  • Two mechanisms used in Virtual Memory:
  • Full Associativity
  • Software Management

Apply these to Caches

slide-7
SLIDE 7

Goal

  • A fully associative software-managed cache
slide-8
SLIDE 8

The System

  • Consists of 2 parts:
  • The Indirect Index Cache (IIC)
  • The Generational Replacement Algorithm
slide-9
SLIDE 9

Potential Uses of A Software-Managed Cache

  • More sophisticated replacement algorithms
  • Reduced penalty for locking data
  • Arbitrary partitioning of the data store
slide-10
SLIDE 10

Conventional Caches

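[Figure: the address is divided into TAG, SET, and OFFSET fields; the two stored TAGs of the selected set are each compared with the address TAG (MATCH?)]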
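As a rough illustration of the TAG / SET / OFFSET split, the sketch below decomposes an address for a set-associative cache. The function name `split_address` and the default parameters (64 KB capacity, 2-way, 32-byte blocks) are assumptions chosen for the example, not details taken from the figure.

```python
def split_address(addr, cache_bytes=64 * 1024, ways=2, block_bytes=32):
    """Split an address into (tag, set_index, offset).

    Illustrative only: every size is assumed to be a power of two.
    """
    num_sets = cache_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1   # log2(block_bytes)
    set_bits = num_sets.bit_length() - 1         # log2(num_sets)

    offset = addr & (block_bytes - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + set_bits)
    return tag, set_index, offset
```

Setting `ways` equal to the number of blocks leaves a single set, so the SET field disappears and every stored tag has to be compared. That is the fully associative case.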

slide-17
SLIDE 17

Full Associativity

Data block may be placed in any location

[Figure: the DATA ARRAY, with PC and INDEX labels]

slide-18
SLIDE 18

IIC

[Figure, animated over slides 18 to 31: the address is divided into TAG and OFFSET fields. The TAG is passed through a HASH to select a row of four tag entries (TE), each compared with the address TAG (MATCH?); later frames add a fifth TE beyond the row of four. A tag entry holds TAG, STATUS, INDEX, and REPL. fields, and the INDEX of the matching entry is used to reach the data. Intermediate animation states were not preserved.]

slide-32
SLIDE 32

IIC: Storage Overhead

[Figure, animated over slides 32 to 37: the IIC diagram (HASH, tag entries, MATCH? comparators) with the tag entry fields TAG, STATUS, INDEX, and REPL. and numbered callouts 1 through 5. The text attached to the callouts was not preserved.]

slide-38
SLIDE 38

IIC: Timing Overhead

  • Accessing tag and data arrays sequentially
  • Traversing the hash chain
  • Overhead of software management
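The first two costs can be made concrete with a small model. This is a minimal sketch, not the hardware design: the class and method names, the modulo hash, and the use of Python lists for hash chains are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    tag: int             # block address tag
    index: int           # position of the block in the data array
    valid: bool = True


class IndirectIndexCache:
    """Toy model: tags live in hash chains, data lives in any free slot."""

    def __init__(self, num_buckets, num_blocks, block_bytes=32):
        self.block_bytes = block_bytes
        self.buckets = [[] for _ in range(num_buckets)]  # one list per hash chain
        self.data = [None] * num_blocks
        self.free = list(range(num_blocks))

    def _bucket(self, tag):
        return self.buckets[tag % len(self.buckets)]     # placeholder hash

    def lookup(self, addr):
        """Return (block, entries_examined); block is None on a miss."""
        tag = addr // self.block_bytes
        examined = 0
        for te in self._bucket(tag):        # step 1: traverse the hash chain
            examined += 1
            if te.valid and te.tag == tag:
                # step 2: only now can the data array be read
                return self.data[te.index], examined
        return None, examined

    def insert(self, addr, block):
        """Place a block in any free slot (raises IndexError when full)."""
        tag = addr // self.block_bytes
        index = self.free.pop()
        self.data[index] = block
        self._bucket(tag).append(TagEntry(tag, index))
        return index
```

The `entries_examined` count stands in for chain-traversal time, and the data read cannot start until a tag entry has supplied its `index`, which is the sequential tag-then-data access listed above. A full model would call a replacement policy instead of raising when no slot is free.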
slide-41
SLIDE 41

Generational Replacement: Motivation

[Figure: L1 and L2 caches, with a block labeled "repeatedly referenced"]

L2 does not see all references

slide-42
SLIDE 42

Generational Replacement: Motivation

  • LRU looks at when a block was last accessed
  • GR emphasizes how many times a block is accessed over when it was accessed
  • Priority for staying is given to blocks that are repeatedly accessed, even though they may not be the most recently used
  • Copes with long intervals between accesses
slide-43
SLIDE 43

On A Hit ...

[Figure, animated over slides 43 to 45: a FRESH POOL and four priority pools, POOL 0 (least frequently used) through POOL 3 (most frequently used). Block 1 receives a HIT; its resulting position was not preserved.]

slide-46
SLIDE 46

On A Miss ...

[Figure, animated over slides 46 to 53: the FRESH POOL and POOL 0 (least frequently used) through POOL 3 (most frequently used) handling a miss; the sequence ends with an EVICT. Intermediate block movements were not preserved.]

slide-54
SLIDE 54

GR vs. LRU

[Figure, animated over slides 54 to 83: a worked example with blocks B1, B2, and B3 and pools POOL 0 through POOL 2. A numbered sequence of references (1 through 7) is marked HIT or MISS, and the misses for references 6 and 7 each end in an EVICT. Block positions at each step were not preserved.]

slide-84
SLIDE 84

GR: Timing Overhead

  • Number of operations on a miss: proportional to the number of priority queues
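The slides do not spell out the pool update rules, so what follows is a simplified sketch under assumed rules rather than the published algorithm: a hit only sets a reference bit; on a miss the head of each pool is promoted if its bit is set and demoted otherwise; the victim is the head of the lowest non-empty pool. The class name, `fresh_size`, and the tie-breaking choices are hypothetical.

```python
from collections import deque


class GenerationalSketch:
    """Toy pool-based replacement policy; pool 0 has the lowest priority."""

    def __init__(self, capacity, num_pools=4, fresh_size=1):
        self.capacity = capacity
        self.fresh_size = fresh_size
        self.fresh = deque()                               # recently inserted blocks
        self.pools = [deque() for _ in range(num_pools)]   # FIFO queues
        self.referenced = {}                               # block -> reference bit

    def access(self, block):
        """Return (hit, evicted_block)."""
        if block in self.referenced:
            self.referenced[block] = True    # a hit only sets the reference bit
            return True, None
        self._age()
        evicted = self._evict() if len(self.referenced) >= self.capacity else None
        self.fresh.append(block)
        self.referenced[block] = False
        return False, evicted

    def _age(self):
        # One head is examined per queue, so the work per miss grows with
        # the number of queues, not with the number of cached blocks.
        heads = [(i, pool[0]) for i, pool in enumerate(self.pools) if pool]
        top = len(self.pools) - 1
        for i, block in heads:
            if self.referenced[block]:
                target = min(i + 1, top)     # promote and clear the bit
                self.referenced[block] = False
            else:
                target = max(i - 1, 0)       # demote
            if target != i:
                self.pools[i].popleft()
                self.pools[target].append(block)
        # Assumed rule: the oldest fresh block graduates into pool 0.
        if self.fresh and len(self.fresh) >= self.fresh_size:
            self.pools[0].append(self.fresh.popleft())

    def _evict(self):
        # Assumed rule: take the head of the lowest-priority non-empty queue.
        for queue in (*self.pools, self.fresh):
            if queue:
                victim = queue.popleft()
                del self.referenced[victim]
                return victim
        return None
```

With `capacity=2` and `num_pools=2`, the access sequence A, A, B, C evicts B under these rules, whereas LRU would evict A. The block that was accessed repeatedly stays even though it is not the most recently used.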

slide-85
SLIDE 85

GR: Storage Overhead

  • FIFO queues: doubly linked lists
  • Two 12-bit pointers per block
  • Head and tail pointers per queue
  • Two full timestamps per queue
  • 8-bit timestamps per block
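The per-block figures above can be added up directly. The helper names below are made up for the example, and per-queue state (head and tail pointers, two full timestamps) is left out because the slide gives neither the width of a full timestamp nor the number of queues.

```python
def gr_bits_per_block(pointer_bits=12, pointers_per_block=2, timestamp_bits=8):
    """Per-block replacement state listed on the slide, in bits."""
    return pointers_per_block * pointer_bits + timestamp_bits


def addressable_blocks(pointer_bits=12):
    """Number of distinct blocks a pointer of this width can name."""
    return 2 ** pointer_bits


bits = gr_bits_per_block()                            # 2 * 12 + 8 = 32 bits
blocks = addressable_blocks()                         # 2 ** 12 = 4096 blocks
total_bytes = bits * blocks // 8                      # 16384 bytes = 16 KB
```

So two 12-bit pointers plus an 8-bit timestamp come to 32 bits per block, and 12-bit pointers can name at most 4096 blocks, for about 16 KB of per-block replacement state at that size.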
slide-86
SLIDE 86

Evaluation

  • Windows NT system address traces on Intel Architecture platform

  • S/390 trace running OLTP workload
slide-87
SLIDE 87

Simulations

  • 64KB, 2-way, split L1 cache with 32B blocks
  • 1MB L2 cache
slide-88
SLIDE 88

Miss vs. Associativity (TPCC_LONG)

[Plot: misses (axis ticks 40,000 to 160,000) versus associativity (4, 8, 16, 32, 64, 128, 256, FA). Series: LRU 128, LRU 256, LRU 512, OPT 128, OPT 256, OPT 512, with the LRU and OPT groups bracketed and a BETTER direction marker. Data points were not preserved.]

slide-91
SLIDE 91

Miss vs. Associativity (OLTP1W)

[Plot: misses (axis ticks 50,000,000 to 250,000,000) versus associativity (4, 8, 16, 32, 64, 128, 256, FA). Series: LRU 128, LRU 256, LRU 512, OPT 128, OPT 256, OPT 512, with a BETTER direction marker. Data points were not preserved.]

slide-92
SLIDE 92

Miss vs. Associativity (SPECWEB)

[Plot: misses (axis ticks 40,000 to 370,000) versus associativity (4, 8, 16, 32, 64, 128, 256, FA). Series: LRU 128, LRU 256, LRU 512, OPT 128, OPT 256, OPT 512, with the LRU and OPT groups bracketed and a BETTER direction marker. Data points were not preserved.]

slide-95
SLIDE 95

Results (I)

[Bar chart, OLTP1W: millions of misses (axis ticks 5 to 25) for LRU 4-WAY, + PGCOLOR, + VICTIM, LRU 8-WAY, LRU 16-WAY, CLOCK FA, GEN FA, LRU FA, and OPT FA, with a BETTER direction marker. Bar heights were not preserved.]

slide-97
SLIDE 97

Results (II)

[Bar chart, SPECWEB: thousands of misses (axis ticks 50 to 400) for LRU 4-WAY, + PGCOLOR, + VICTIM, LRU 8-WAY, LRU 16-WAY, CLOCK FA, GEN FA, LRU FA, and OPT FA, with a BETTER direction marker. Bar heights were not preserved.]

slide-99
SLIDE 99

Results (III)

[Bar chart, TPCC_LONG: thousands of misses (axis ticks 20 to 180) for LRU 4-WAY, + PGCOLOR, + VICTIM, LRU 8-WAY, LRU 16-WAY, CLOCK FA, GEN FA, LRU FA, and OPT FA, with a BETTER direction marker. Bar heights were not preserved.]

slide-100
SLIDE 100

Thank you.

slide-101
SLIDE 101

Generational Replacement: Motivation

  • L1 hits are filtered from L2’s observed reference stream
  • A block not referenced between 2 misses is more likely to be in the program’s working set