SLIDE 1

Strand Persistency

Vaibhav Gogte, William Wang$, Stephan Diestelhorst$, Peter M. Chen, Satish Narayanasamy, Thomas F. Wenisch

NVMW 03/12/2019


SLIDE 4

Promise of persistent memory (PM)

Non-volatility, performance, density

“Optane DC Persistent Memory will be offered in packages of up to 512GB per stick.”

“… expanding memory per CPU socket to as much as 3TB.” *

* Source: www.extremetech.com

Byte-addressable, load-store interface to durable storage

SLIDE 7

Persistent memory system

Figure: CPU and writeback caches above DRAM and persistent memory (PM); after a failure, recovery inspects PM.

Recovery can inspect PM data structures to restore the system to a consistent state.

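SLIDE 12

Recovery requires PM access ordering

Hardware systems provide primitives to express persist order to PM.

Figure: CPU → writeback caches → PM. The program executes St a = x; St b = y, and recovery relies on a = x persisting before b = y, but the writeback caches may write b = y back to PM first. The consistency model orders the stores; the persistency model orders their persists, which Intel x86 expresses with these primitives:

St a = x
CLWB(a)
SFENCE
St b = y
CLWB(b)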
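
As a rough model of why this ordering matters, the Python sketch below enumerates the PM states a failure could leave behind for a given set of persist-order constraints: any subset of persists that also contains every persist ordered before one of its members. The labels and the `crash_states` helper are hypothetical, and the model ignores caches, DRAM, and timing; it only captures the ordering constraint.

```python
from itertools import combinations


def crash_states(persists, order):
    """Return every set of persists that a failure could leave durable in PM.

    persists: list of persist labels, e.g. ["a=x", "b=y"].
    order:    set of (before, after) pairs; `after` may be durable only if
              `before` is durable too.
    A crash state is any subset that is downward closed under `order`.
    """
    states = []
    for k in range(len(persists) + 1):
        for subset in combinations(persists, k):
            chosen = set(subset)
            # Reject states where a later persist is durable but an earlier one is not.
            if all(before in chosen for before, after in order if after in chosen):
                states.append(chosen)
    return states


# Without CLWB/SFENCE, writeback caches may persist the two stores in any order.
unordered = crash_states(["a=x", "b=y"], set())
# St a = x; CLWB(a); SFENCE; St b = y; CLWB(b) orders a = x before b = y.
ordered = crash_states(["a=x", "b=y"], {("a=x", "b=y")})

print(len(unordered), {"b=y"} in unordered)  # 4 True
print(len(ordered), {"b=y"} in ordered)      # 3 False
```

Without ordering, the state in which b = y is durable but a = x is not is one of four possible outcomes; the CLWB/SFENCE sequence removes exactly that state.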


SLIDE 16

Hardware imposes overly strict constraints

Primitives in existing hardware systems overconstrain PM accesses.

Ideal DAG over persists A, B, C:
St A = 1; CLWB(A)
St B = 2; CLWB(B)
St C = 3; CLWB(C)

DAG 1:
St A = 1; CLWB(A)
SFENCE
St B = 2; CLWB(B)
St C = 3; CLWB(C)

DAG 2:
St A = 1; CLWB(A)
St C = 3; CLWB(C)
SFENCE
St B = 2; CLWB(B)
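
One way to see the overconstraint is to derive the persist order that a CLWB/SFENCE sequence implies and compare it with the intended DAG. The Python sketch below assumes a simplified model in which SFENCE orders every persist before it ahead of every persist after it, while persists between two SFENCEs are unordered. It also assumes that the ideal DAG requires only A before B and leaves C unconstrained. The `sfence_order` helper is hypothetical.

```python
def sfence_order(program):
    """Persist order implied by a CLWB/SFENCE sequence (simplified model).

    program: list of persisted locations and the marker "SFENCE".
    Returns the set of (before, after) pairs that the fences enforce.
    """
    order = set()
    fenced = []  # persists that precede the most recent SFENCE
    epoch = []   # persists issued since the most recent SFENCE
    for op in program:
        if op == "SFENCE":
            fenced.extend(epoch)  # now ordered before every later persist
            epoch = []
        else:
            order.update((prev, op) for prev in fenced)
            epoch.append(op)
    return order


ideal = {("A", "B")}  # assumed: A before B, C unconstrained
dag1 = sfence_order(["A", "SFENCE", "B", "C"])
dag2 = sfence_order(["A", "C", "SFENCE", "B"])

print(sorted(dag1 - ideal))  # [('A', 'C')]
print(sorted(dag2 - ideal))  # [('C', 'B')]
```

Both fenced versions enforce A before B, but each adds an edge the ideal DAG does not need. In this model, no SFENCE placement within a single thread avoids ordering C against A or B, because C must sit on one side of the fence that separates A from B.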

SLIDE 18

Contributions

  • Employ strand persistency [Pelley14]

– Hardware ISA primitives to specify precise ordering constraints

  • Comprises two primitives: PersistBarrier and NewStrand

– Can encode an arbitrary DAG

  • Map language-level persistency models to ISA-level primitives

– Leverage strand persistency to build persistency models efficiently

Strand persistency improves the performance of language-level persistency models by 21.4% (avg.)

SLIDE 19

Outline

  • Contributions
  • Example: Failure atomicity
  • Existing hardware primitives
  • Strand persistency
  • Evaluation

SLIDE 21

Example: Failure atomicity

Failure atomicity: which group of stores persists atomically?

Failure atomicity limits the states that recovery can observe after a failure.

Failure-atomic region:
atomic_begin()
x = 100;
y = 200;
atomic_end()

SLIDE 23

Undo logging for failure atomicity

Init: x = 0; y = 0

Failure-atomic region:
atomic_begin()
x = 1;
y = 2;
atomic_end()

Undo-logging steps: persistUndoLog (L), mutateData (M), commitLog (C), persistData (P)

The undo-logging steps are ordered to ensure failure atomicity.
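
The undo-logging idea can be checked with a small crash simulation. The Python sketch below is a minimal model, assuming the conventional undo-logging discipline: the old value of each location is logged before that location is updated in place, and the log is discarded only once all updates are durable. That step order is an assumption and is not necessarily the exact L/M/C/P sequence on the slide. The `UndoLogPM` class is hypothetical, and "durable" simply means a Python dictionary that survives the simulated failure.

```python
class UndoLogPM:
    """Toy model of PM data plus an undo log (hypothetical, for illustration).

    Each step is assumed to become durable, in program order, before the next
    one starts (the job CLWB/SFENCE does on x86). A failure is simulated by
    stopping after `crash_after` steps and then running recover().
    """

    def __init__(self, data):
        self.pm = dict(data)  # durable data
        self.log = {}         # durable undo log: location -> old value

    def atomic_update(self, updates, crash_after=None):
        steps = []
        for loc, new in updates.items():
            steps.append(("log", loc, self.pm[loc]))  # persist the old value first
            steps.append(("write", loc, new))         # then update in place
        steps.append(("commit", None, None))          # all updates durable: drop the log
        for i, (kind, loc, val) in enumerate(steps):
            if i == crash_after:
                return False                          # simulated failure
            if kind == "log":
                self.log[loc] = val
            elif kind == "write":
                self.pm[loc] = val
            else:
                self.log.clear()
        return True

    def recover(self):
        """Undo every update that still has a log entry."""
        for loc, old in self.log.items():
            self.pm[loc] = old
        self.log.clear()


outcomes = set()
for crash_after in range(6):  # 5 steps; crash_after=5 means no failure
    mem = UndoLogPM({"x": 0, "y": 0})
    mem.atomic_update({"x": 1, "y": 2}, crash_after=crash_after)
    mem.recover()
    outcomes.add((mem.pm["x"], mem.pm["y"]))

print(sorted(outcomes))  # [(0, 0), (1, 2)]
```

Wherever the failure strikes, recovery sees either the initial state or the complete update, never a mix of the two.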

SLIDE 26

Hardware imposes stricter constraints

atomic_begin()
x = 1;
y = 2;
atomic_end()

Ideal ordering:
Log(Lx,x) CLWB(Lx) Store(x,1)
Log(Ly,y) CLWB(Ly) Store(y,2)

SFENCE ordering (two SFENCEs):
Log(Ly,y) CLWB(Ly) Log(Lx,x) CLWB(Lx) Store(x,1) Store(y,2)

SLIDE 33

Strand persistency enables persist concurrency

  • Provides primitives to express precise persist order

PersistBarrier: orders persists within a thread

NewStrand: initiates a new stream of persists (a strand)

Strand 0: Persist A; PersistBarrier; Persist B
NewStrand
Strand 1: Persist C

Resulting DAG: A → B; C is unordered with respect to A and B.

Persists on different strands can be issued concurrently to PM.

SLIDE 36

What if ordering is needed across strands?

  • Conflicting accesses establish persist order across strands

Strand 0: Persist A; PersistBarrier; Persist B
NewStrand
Strand 1: Persist A; PersistBarrier; Persist C

Inter-strand order: the two persists to A conflict, so A on strand 0 persists before A on strand 1, which in turn persists before C.
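
The rules on these slides can be captured in a small model: PersistBarrier orders persists within the current strand, NewStrand starts a strand with no order to earlier persists, and persists to the same address (conflicting accesses) are ordered even across strands. The Python sketch below derives the resulting persist DAG for one thread under those simplified assumptions. It is an illustration, not the formal semantics of strand persistency, and the `strand_order` and `show` helpers and the operation encoding are hypothetical.

```python
def strand_order(program):
    """Derive persist-order edges for one thread under a simplified strand model.

    program: list of ("persist", addr), ("barrier",) or ("newstrand",).
    Each persist becomes a node (index, addr); an edge (u, v) means u must
    persist before v. Assumed rules:
      * PersistBarrier orders earlier persists of the current strand before
        later persists of that strand;
      * NewStrand starts a strand with no order to earlier persists;
      * persists to the same address are ordered, even across strands.
    """
    edges = set()
    ordered_before = []  # current-strand persists preceding the latest barrier
    since_barrier = []   # current-strand persists after the latest barrier
    last_by_addr = {}    # address -> latest persist to it, on any strand
    index = 0
    for op in program:
        if op[0] == "newstrand":
            ordered_before, since_barrier = [], []
        elif op[0] == "barrier":
            ordered_before = ordered_before + since_barrier
            since_barrier = []
        else:
            addr = op[1]
            node = (index, addr)
            index += 1
            edges.update((prev, node) for prev in ordered_before)  # intra-strand
            if addr in last_by_addr:
                edges.add((last_by_addr[addr], node))  # conflicting access
            last_by_addr[addr] = node
            since_barrier.append(node)
    return edges


def show(edges):
    """Render edges as sorted 'A0 -> B1' strings (address + persist index)."""
    return sorted(f"{u[1]}{u[0]} -> {v[1]}{v[0]}" for u, v in edges)


# Two strands that both persist A (the inter-strand example).
cross = [("persist", "A"), ("barrier",), ("persist", "B"),
         ("newstrand",),
         ("persist", "A"), ("barrier",), ("persist", "C")]
print(show(strand_order(cross)))
# ['A0 -> A2', 'A0 -> B1', 'A2 -> C3']

# Undo logging with strands: each log entry is ordered before its own store only.
undo_log = [("persist", "Lx"), ("barrier",), ("persist", "x"),
            ("newstrand",),
            ("persist", "Ly"), ("barrier",), ("persist", "y")]
print(show(strand_order(undo_log)))
# ['Lx0 -> x1', 'Ly2 -> y3']
```

The second program is the strand-based undo-logging sequence that follows: it yields two independent chains, Lx before x and Ly before y, with no edge between them.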
SLIDE 38

Logging using strand persistency

atomic_begin()
x = 1;
y = 2;
atomic_end()

Ideal ordering:
Log(Lx,x) CLWB(Lx) Store(x,1)
Log(Ly,y) CLWB(Ly) Store(y,2)

Strand 0: Log(Lx,x) CLWB(Lx) PersistBarrier Store(x,1)
NewStrand
Strand 1: Log(Ly,y) CLWB(Ly) PersistBarrier Store(y,2)

We need a log buffer that can manage concurrent log updates.

SLIDE 40

Log space under strand persistency

Log buffer: Invalid | Log 0 | Log 1 | Invalid

Persistent head atomically commits logs; volatile tail for concurrent log creation.

  • Failure exposes log write reorderings

– Identify valid logs in case of failure
– Record order of log creation
– Recovery rolls back partial updates using valid logs

More details in the paper.
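
A log buffer with a persistent head and a volatile tail might look like the Python sketch below. It is a hypothetical design under stated assumptions, not the paper's, which the slide defers to. Writers reserve slots by advancing the volatile tail, with a lock standing in for an atomic fetch-and-add. Each entry carries a sequence number that records creation order, and entries whose sequence number is below the persistent head count as committed. Recovery undoes the remaining valid entries newest first. All class, method, and field names are hypothetical.

```python
import threading


class LogBuffer:
    """Hypothetical circular undo-log buffer: persistent head, volatile tail.

    `slots` and `head` model durable PM state; `_tail` and `_lock` are volatile.
    A slot left as None (or holding a stale, committed entry) models a log
    write that did not reach PM before a failure.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity  # durable entries: (seq, addr, old_value)
        self.head = 0                   # durable: entries with seq < head are committed
        self._tail = 0                  # volatile: next sequence number to hand out
        self._lock = threading.Lock()   # stands in for an atomic fetch-and-add

    def reserve(self):
        """Give each concurrent log writer a distinct, increasing sequence number."""
        with self._lock:
            if self._tail - self.head >= len(self.slots):
                raise RuntimeError("log buffer full")
            seq = self._tail
            self._tail += 1
        return seq

    def write(self, seq, addr, old):
        """Persist one undo entry; storing seq records the order of log creation."""
        self.slots[seq % len(self.slots)] = (seq, addr, old)

    def commit(self):
        """Atomically commit every entry reserved so far by advancing the head."""
        self.head = self._tail

    def recover(self, pm):
        """Roll back partial updates using valid (uncommitted) entries, newest first."""
        valid = [e for e in self.slots if e is not None and e[0] >= self.head]
        for _seq, addr, old in sorted(valid, reverse=True):
            pm[addr] = old
        if valid:
            self.head = max(e[0] for e in valid) + 1  # retire the rolled-back entries
        self._tail = self.head                         # volatile tail is rebuilt


pm = {"x": 0, "y": 0}
buf = LogBuffer(capacity=4)

# Two updates reserve log slots concurrently (e.g. on different strands).
seq_x, seq_y = buf.reserve(), buf.reserve()
# Failure after only y's log entry, and then y itself, reached PM;
# x's entry (seq_x) never persisted, so x was never updated.
buf.write(seq_y, "y", pm["y"])
pm["y"] = 2
buf.recover(pm)
print(pm)  # {'x': 0, 'y': 0}

# A completed, committed update is left alone by recovery.
seq = buf.reserve()
buf.write(seq, "x", pm["x"])
pm["x"] = 1
buf.commit()
buf.recover(pm)
print(pm)  # {'x': 1, 'y': 0}
```

The sequence number is what lets recovery tell a valid entry from a stale one left in a reused slot, which is one way to handle the log write reorderings that a failure can expose.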

SLIDE 43

Language persistency models to ISA primitives

High-level languages: failure atomicity for language-level persistency models
Compiler: logging implementations that map to hardware primitives
Hardware ISA: ISA primitives PersistBarrier and NewStrand

SLIDE 46

Evaluation: Language-level persistency models

ATLAS [Chakrabarti14]

  • Failure-atomic outermost critical sections

L1.lock();
  x -= 100; y += 100;
  L2.lock();
    a -= 100; b += 100;
  L2.unlock();
L1.unlock();

Coupled-SFR [Gogte18]

  • Failure-atomic synchronization-free regions

Integrate our logging mechanisms with ATLAS and Coupled-SFR
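
The two models differ in where failure-atomic regions begin and end. The Python sketch below partitions the nested-lock example both ways, assuming a simplified trace format in which every store lies inside some critical section. ATLAS-style grouping runs from the first acquire to the release that leaves no lock held (the outermost critical section), while Coupled-SFR-style grouping ends a region at every lock acquire or release (synchronization-free regions). The trace encoding and both functions are hypothetical.

```python
def atlas_regions(trace):
    """Failure-atomic regions = outermost critical sections (ATLAS-style).

    Assumes every store in `trace` lies inside some critical section.
    """
    regions, current, depth = [], [], 0
    for op, arg in trace:
        if op == "lock":
            depth += 1
        elif op == "unlock":
            depth -= 1
            if depth == 0:  # the outermost critical section ends here
                regions.append(current)
                current = []
        else:
            current.append(arg)
    return regions


def sfr_regions(trace):
    """Failure-atomic regions = synchronization-free regions (Coupled-SFR-style).

    Every lock or unlock ends the current region; empty regions are omitted.
    """
    regions, current = [], []
    for op, arg in trace:
        if op in ("lock", "unlock"):
            if current:
                regions.append(current)
            current = []
        else:
            current.append(arg)
    if current:
        regions.append(current)
    return regions


trace = [
    ("lock", "L1"), ("store", "x"), ("store", "y"),
    ("lock", "L2"), ("store", "a"), ("store", "b"),
    ("unlock", "L2"), ("unlock", "L1"),
]
print(atlas_regions(trace))  # [['x', 'y', 'a', 'b']]
print(sfr_regions(trace))    # [['x', 'y'], ['a', 'b']]
```

For this example, ATLAS makes all four updates one failure-atomic region, whereas Coupled-SFR splits them at L2.lock() into two regions.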

SLIDE 47

Methodology

  • gem5 simulator
  • Workloads: write-intensive micro-benchmarks

– Queue: insert/delete entries in a queue
– Hashmap: update values in a persistent hash table
– Array swaps: random swaps of array elements
– RBTree: insert/delete entries in a red-black tree
– TPCC: new-order transaction from TPCC

SLIDE 48

Performance evaluation

Figure: performance improvement (in %) for ATLAS and Coupled-SFR on Queue, Hashmap, Array swap, RBTree, TPCC, and Mean.

  • Improves performance of ATLAS by up to 29.9% (18.2% avg.)
  • Improves performance of Coupled-SFR by up to 34.5% (21.4% avg.)

SLIDE 50

Conclusion

  • Strand persistency to precisely order persists
  • Two primitives: PersistBarrier and NewStrand

– Work together to relax ordering constraints in undo logging

  • Evaluation using language-level persistency models
  • Performance improvement of up to 34.5%
