CacheQuery: Learning Replacement Policies from Hardware Caches


Pepe Vila, Pierre Ganty, Marco Guarnieri, and Boris Köpf. IMDEA Software Institute and Microsoft Research. PLDI 2020, Synthesis II.


SLIDE 1

CacheQuery: Learning Replacement Policies from Hardware Caches

Pepe Vila, Pierre Ganty, Marco Guarnieri, and Boris Köpf
IMDEA Software Institute, Microsoft Research
PLDI 2020, Synthesis II

SLIDE 2

Caches: those little although faster friends...

  • Memory is partitioned into memory blocks (64 bytes = 2^6)
  • The cache is partitioned into equally sized cache sets (1024 = 2^10 = 256 KB / (64 * 4))
  • Each cache set has capacity for N cache lines (N is also known as the number of ways, or the associativity)

[Figure: the CPU issues a memory address split into Tag, Set (10 bits), and Offset (6 bits) fields; the Set field selects one set (Set 0, Set 1, ...) of a 256 KB cache whose lines hold a tag and data; memory is shown as blocks 0, 1, 2, 3, ...]
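The address layout on this slide can be sketched in a few lines of Python. This is only an illustration under the slide's parameters (64-byte blocks, 1024 sets, 4 ways); the names `split_address`, `BLOCK_BITS`, and `SET_BITS` are hypothetical and do not come from the talk.

```python
BLOCK_BITS = 6    # 64-byte blocks = 2^6, so 6 offset bits
SET_BITS = 10     # 1024 sets = 2^10, so 10 set-index bits
CACHE_BYTES = 256 * 1024
WAYS = 4          # associativity implied by 256 KB / (64 * 4) = 1024 sets


def split_address(addr):
    """Split a memory address into (tag, set index, block offset)."""
    offset = addr & ((1 << BLOCK_BITS) - 1)
    set_index = (addr >> BLOCK_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (BLOCK_BITS + SET_BITS)
    return tag, set_index, offset


# The number of sets follows from the cache size, block size, and ways.
num_sets = CACHE_BYTES // ((1 << BLOCK_BITS) * WAYS)

print(num_sets)                 # 1024
print(split_address(0x12345))   # (1, 141, 5)
```

Two addresses compete for the same cache set exactly when their set indices agree, which is why the replacement policy can be studied one set at a time.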

SLIDE 5

Caches: those little although faster friends...

[Figure: same cache diagram; a tag stored in the selected set equals (=) the tag of the memory address, so the access is a HIT]

SLIDE 6

Caches: those little although faster friends...

[Figure: same cache diagram; HIT: 64 bytes of data are returned with a fast access time]


SLIDE 8

Caches: those little although faster friends...

[Figure: same cache diagram; no tag stored in the selected set equals the tag of the memory address, so the access is a MISS]

SLIDE 9

Caches: those little although faster friends...

[Figure: same cache diagram; MISS: the replacement policy evicts one block]

SLIDE 10

Caches: those little although faster friends...

[Figure: same cache diagram; MISS: the new block is inserted, and its 64 bytes of data arrive with a slow access time]

SLIDE 11

Caches: their importance and impact

SLIDE 12

Problem: cache as a black box

[Figure: memory addresses (f30, f40, f50, f30) go into the black-box cache; time measurements (15, 16, 14, 4) come out]

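SLIDE 13

Our approach for learning replacement policies

[Figure: pipeline of four numbered components: hardware interface, policy abstraction, automata learning, and program synthesis (template in, explanation out). Example traces shown between the components:]

  • Memory addresses and timings: f30 f40 f50 f30 → 4c 4c 12c 12c; f30 f40 f50 f40 → 4c 4c 12c 4c
  • Abstract blocks and hit/miss outcomes: A B C A → H H M M; A B C B → H H M H
  • Policy inputs and outputs: h(0) h(1) m() → _ _ 0
  • Synthesized program:

    int missIdx (int[4] state) {
      for (int i = 0; i < 4; i = i + 1)
        if (state[i] == 3) return i;
    }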

SLIDE 14

CacheQuery: a hardware interface

[Figure: the pipeline overview from slide 13, with the hardware interface component labelled CacheQuery]
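The step from raw timings to hit/miss outcomes in the example traces (4c becomes H, 12c becomes M) can be sketched as follows. This is not CacheQuery's actual code: the function names and the 8-cycle threshold are assumptions picked only so that the slide's numbers separate cleanly.

```python
from string import ascii_uppercase


def name_blocks(addresses):
    """Rename addresses to A, B, C, ... in order of first appearance."""
    names = {}
    for addr in addresses:
        if addr not in names:
            names[addr] = ascii_uppercase[len(names)]
    return [names[addr] for addr in addresses]


def classify(latencies_cycles, threshold_cycles=8):
    """Label each access H (hit) or M (miss) by comparing its latency
    against an assumed threshold; real thresholds need calibration."""
    return ["H" if c < threshold_cycles else "M" for c in latencies_cycles]


print(name_blocks(["f30", "f40", "f50", "f30"]))  # ['A', 'B', 'C', 'A']
print(classify([4, 4, 12, 12]))                   # ['H', 'H', 'M', 'M']
print(classify([4, 4, 12, 4]))                    # ['H', 'H', 'M', 'H']
```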

SLIDE 15

CacheQuery: a hardware interface

SLIDE 16

Polca: a cache automaton abstraction

[Figure: the pipeline overview from slide 13, with the policy abstraction component labelled Polca and the hardware interface labelled CacheQuery]

SLIDE 17

Polca: a cache policy automaton abstraction

Polca = Mapper

  • Abstract automaton: replacement policy
      Input: {h(0), h(1), ..., h(n-1), m()}
      Output: {_, 0, 1, ..., n-1}
      Example: h(0) h(1) m() → _ _ 0
  • Concrete automaton: cache management (keep track of content)
      Input: {A, B, C, ...}
      Output: {H, M}
      Example: A B C A → H H M M; A B C B → H H M H
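The mapping between the two alphabets can be sketched in Python. This is a simplified illustration, not Polca's implementation: `run_mapper` and `LruPolicy` are hypothetical names, and LRU on a 2-way set is an assumed stand-in for the unknown hardware policy.

```python
class LruPolicy:
    """Assumed stand-in policy: least recently used, over line indices."""

    def __init__(self, ways):
        self.order = list(range(ways))  # least recently used line first

    def hit(self, line):
        self.order.remove(line)
        self.order.append(line)

    def miss(self):
        victim = self.order.pop(0)      # evict the least recently used line
        self.order.append(victim)       # the refilled line is now most recent
        return victim


def run_mapper(policy, content, blocks):
    """Translate block accesses into policy inputs h(i)/m() by tracking
    which block currently sits in which line of the cache set."""
    content = list(content)
    policy_in, policy_out, cache_out = [], [], []
    for block in blocks:
        if block in content:
            line = content.index(block)
            policy.hit(line)
            policy_in.append(f"h({line})")
            policy_out.append("_")
            cache_out.append("H")
        else:
            line = policy.miss()
            content[line] = block
            policy_in.append("m()")
            policy_out.append(str(line))
            cache_out.append("M")
    return policy_in, policy_out, cache_out


p_in, p_out, c_out = run_mapper(LruPolicy(2), ["A", "B"], "ABCA")
print(p_in)   # ['h(0)', 'h(1)', 'm()', 'm()']
print(p_out)  # ['_', '_', '0', '1']
print(c_out)  # ['H', 'H', 'M', 'M']
```

The first three steps reproduce the slide's example: accesses A B C on a set holding A and B become h(0) h(1) m(), with policy outputs _ _ 0.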

SLIDE 18

Caches: those little although faster friends...

SLIDE 19

LearnLib: an automata learning framework

[Figure: the pipeline overview from slide 13, with components labelled Program synthesis, Automata learning, Polca, and CacheQuery]

SLIDE 20

LearnLib: an automata learning framework

  • LearnLib is an open-source Java framework for automata learning developed at TU Dortmund: https://learnlib.de/
  • Angluin’s L* algorithm has been extended to Mealy machines:
    ○ Membership queries are replaced by output queries
    ○ Equivalence queries are approximated by test sequences for conformance testing
    ○ Finding a reset sequence is a bootstrapping problem; we solve it with Flush+Refill

WP-method: a test-sequence selection method that, given an upper bound on the number of states of the System Under Learning (SUL), guarantees equivalence
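To make output queries and approximate equivalence checks concrete, here is a small Python sketch. It does not use the LearnLib API and does not implement the WP-method: it replaces the test-sequence selection with brute-force enumeration of all input words up to a length bound. The `Mealy` class, `find_counterexample`, and the 2-way LRU machine used as the SUL are all illustrative assumptions.

```python
from itertools import product


class Mealy:
    """Deterministic Mealy machine:
    transitions[(state, symbol)] = (next_state, output)."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = transitions

    def output_query(self, word):
        """Run `word` from the initial state (as after a reset)
        and return the sequence of outputs."""
        state, outputs = self.initial, []
        for symbol in word:
            state, out = self.transitions[(state, symbol)]
            outputs.append(out)
        return tuple(outputs)


def find_counterexample(sul, hypothesis, alphabet, max_len):
    """Naive conformance test: try every word up to max_len symbols."""
    for length in range(1, max_len + 1):
        for word in product(alphabet, repeat=length):
            if sul.output_query(word) != hypothesis.output_query(word):
                return word
    return None


ALPHABET = ["h(0)", "h(1)", "m()"]

# Assumed SUL: 2-way LRU policy; the state is the least recently used line.
lru = Mealy(0, {
    (0, "h(0)"): (1, "_"), (0, "h(1)"): (0, "_"), (0, "m()"): (1, "0"),
    (1, "h(0)"): (1, "_"), (1, "h(1)"): (0, "_"), (1, "m()"): (0, "1"),
})

# A wrong one-state hypothesis that always evicts line 0.
guess = Mealy(0, {
    (0, "h(0)"): (0, "_"), (0, "h(1)"): (0, "_"), (0, "m()"): (0, "0"),
})

print(lru.output_query(("h(0)", "h(1)", "m()")))     # ('_', '_', '0')
print(find_counterexample(lru, guess, ALPHABET, 3))  # ('h(0)', 'm()')
```

Exhaustive enumeration grows exponentially with the length bound, which is the cost that smarter test-sequence selection is meant to avoid.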

SLIDE 21

Sketch: synthesizing programs as explanations

[Figure: the pipeline overview from slide 13, with components labelled Program synthesis, Automata learning, Polca, and CacheQuery]

SLIDE 22

Sketch: synthesizing programs as explanations

SLIDE 23

Sketch: synthesizing programs as explanations

Domain knowledge or high-level view of a replacement policy:

  • Each block has an associated age
  • A promotion rule decides how the ages are updated upon a hit
  • A replacement rule decides which block is evicted upon a miss
  • An insertion rule decides the age of a new block

We use it to “sketch” a template for replacement policies and encode the automaton’s output and transition functions as constraints!

SLIDE 24

Sketch: synthesizing programs as explanations

    hit (state, line) :: States×Lines → States
      state = promote(state, line)
      state = normalize(state, line)
      return state

    miss (state) :: States → States×Lines
      Lines idx = -1
      state = normalize(state, idx)
      idx = evict(state)
      state[idx] = insert(state, idx)
      state = normalize(state, idx)
      return ⟨state, idx⟩

SLIDE 25

Sketch: synthesizing programs as explanations

    hit (state, line) :: States×Lines → States
      state = promote(state, line)
      state = normalize(state, line)
      return state

    miss (state) :: States → States×Lines
      Lines idx = -1
      state = normalize(state, idx)
      idx = evict(state)
      state[idx] = insert(state, idx)
      state = normalize(state, idx)
      return ⟨state, idx⟩

    promote (state, pos) :: States×Lines → States
      States final = state
      if (??{boolExpr(state[pos])})
        final[pos] = ??{natExpr(state[pos])}
      for (i in Lines)
        if (i != pos ∧ ??{boolExpr(state[pos], state[i])})
          final[i] = ??{natExpr(state[i])}
      return final
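To see how the template's holes might be filled, the sketch below mirrors `hit` and `miss` in Python and plugs in hand-written rules for LRU. It is written in Python rather than Sketch, nothing here is synthesized, and the `lru_*` functions are assumptions made for illustration: ages run from 0 (most recent) to n-1 (next to be evicted).

```python
def hit(state, line, promote, normalize):
    state = promote(state, line)
    state = normalize(state, line)
    return state


def miss(state, evict, insert, normalize):
    idx = -1
    state = normalize(state, idx)
    idx = evict(state)
    state = list(state)
    state[idx] = insert(state, idx)
    state = normalize(state, idx)
    return state, idx


def lru_promote(state, pos):
    """Accessed block gets age 0; blocks that were younger age by one."""
    final = list(state)
    final[pos] = 0
    for i in range(len(state)):
        if i != pos and state[i] < state[pos]:
            final[i] = state[i] + 1
    return final


def lru_evict(state):
    """Evict the block with the largest age."""
    return state.index(max(state))


def lru_insert(state, idx):
    """A new block starts with age 0."""
    return 0


def lru_normalize(state, idx):
    """If another block shares the age of line idx, age all other blocks
    so that the ages form a permutation again."""
    if idx < 0:
        return list(state)
    clash = any(i != idx and a == state[idx] for i, a in enumerate(state))
    if not clash:
        return list(state)
    return [a if i == idx else a + 1 for i, a in enumerate(state)]


s = hit([0, 1, 2, 3], 2, lru_promote, lru_normalize)
print(s)                                                   # [1, 2, 0, 3]
print(miss(s, lru_evict, lru_insert, lru_normalize))       # ([2, 3, 1, 0], 3)
```

In the talk's setting the synthesizer searches for such rules automatically; writing them by hand just shows what a filled-in template looks like.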

SLIDE 26

Results

SLIDE 27

Results

Description of the Skylake/Kaby Lake L3 policy (New2):

    int[4] hitState (int[4] state, int pos) {
      int[4] final = state;
      // Promotion
      if (final[pos] > 1) final[pos] = 1;
      else final[pos] = 0;
      // Is there a block with age 3?
      bit found = 0;
      for (int j = 0; j < 4; j = j + 1)
        if (!found) {
          for (int i = 0; i < 4; i = i + 1)
            if (!found && final[i] == 3) found = 1;
          // If not, increase all blocks
          if (!found)
            for (int i = 0; i < 4; i = i + 1)
              final[i] = final[i] + 1;
        }
      return final;
    }

    // Replace first block with age 3 starting from the left
    int missIdx (int[4] state) {
      for (int i = 0; i < 4; i = i + 1)
        if (state[i] == 3) return i;
    }

    int[4] missState (int[4] state) {
      int[4] final = state;
      int replace = missIdx(state);
      // Insertion
      final[replace] = 1;
      // Is there a block with age 3?
      bit found = 0;
      for (int j = 0; j < 4; j = j + 1)
        if (!found) {
          for (int i = 0; i < 4; i = i + 1)
            if (!found && final[i] == 3) found = 1;
          // If not, increase all blocks
          if (!found)
            for (int i = 0; i < 4; i = i + 1)
              final[i] = final[i] + 1;
        }
      return final;
    }

Initial insertion on a flushed cache set:

    int[4] s0 = {3,3,3,3};
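The New2 description can be traced by hand with a small Python port. This is an informal translation, not the authors' artifact, and it assumes that the outer loop over `j` repeats the "increase all blocks" step until some block has age 3. The snake_case names are chosen here for illustration.

```python
WAYS = 4
MAX_AGE = 3


def normalize(state):
    """Increase all ages, at most WAYS times, until some block has age 3."""
    final = list(state)
    for _ in range(WAYS):
        if MAX_AGE in final:
            break
        final = [age + 1 for age in final]
    return final


def hit_state(state, pos):
    final = list(state)
    final[pos] = 1 if final[pos] > 1 else 0   # promotion
    return normalize(final)


def miss_idx(state):
    """Replace the first block with age 3, starting from the left."""
    for i, age in enumerate(state):
        if age == MAX_AGE:
            return i
    raise ValueError("no block with age 3")


def miss_state(state):
    final = list(state)
    final[miss_idx(state)] = 1                # insertion
    return normalize(final)


state = [3, 3, 3, 3]                          # flushed cache set
for _ in range(3):
    state = miss_state(state)
print(state)                                  # [1, 1, 1, 3]
print(hit_state(state, 3))                    # [3, 3, 3, 3]
print(miss_state(state))                      # [3, 3, 3, 3]
```

In this trace, filling the last line leaves every block at age 1, so normalization ages all four blocks twice and the set returns to its initial state.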

SLIDE 28

Thank you for listening! Questions?

https://github.com/cgvwzq/cachequery
https://github.com/cgvwzq/polca
https://arxiv.org/pdf/1912.09770.pdf