Failure-atomic Synchronization-free Regions



slide-1
SLIDE 1

Failure-atomic Synchronization-free Regions

Vaibhav Gogte, Stephan Diestelhorst$, William Wang$, Satish Narayanasamy, Peter M. Chen, Thomas F. Wenisch

$

NVMW 2018, San Diego, CA 03/13/2018

slide-2
SLIDE 2

Promise of persistent memory (PM)

  • Non-volatility
  • Performance
  • Density
  • Byte-addressable, load-store interface to storage

slide-3
SLIDE 3

Persistent memory system

[Figure: Core-1 through Core-4, each with an L1 cache (L1 $), above an LLC, DRAM, and Persistent Memory (PM), with a Recovery component]

Recovery can inspect the data structures in PM to restore the system to a consistent state

slide-4
SLIDE 4

Memory persistency models

  • Provide guarantees required for recoverable software

– Academia [Condit ‘09][Pelley ‘14][Joshi ‘15][Kolli ‘17] …
– Industry [Intel ‘14][ARM ‘16]

  • Express primitives to define state of the program after failure
  • Govern ordering constraints on persists to PM
  • Ensure failure atomicity for a group of persists
slide-5
SLIDE 5

Semantics for failure-atomicity

  • Assures that either all or none of the updates are visible post failure
  • Guaranteed by hardware, library or language implementations
  • Design space for semantics

  • Individual persists [Intel ‘14][ARM ‘16][Joshi ’15]…

– Do not provide clean failure semantics

  • Outermost critical sections [Chakrabarti ‘14][Boehm ‘16]

– Have high performance overhead

  • Transactions [Coburn ‘11][Volos ‘11]…

– Do not compose well with general sync. primitives

Existing mechanisms suffer a trade-off between programmability and performance

slide-6
SLIDE 6

Contributions

  • Failure-atomic Synchronization-free Regions (SFR)

– Extend clean semantics to post-failure recovery
– Employ existing synchronization primitives in C++

  • Propose failure-atomicity as a part of language implementation

– Build compiler pass that emits logging code for persistent accesses

  • Propose two designs: Coupled-SFR and Decoupled-SFR
  • Achieve 65% better performance than the state-of-the-art technique (ATLAS)
slide-7
SLIDE 7

Outline

  • Design space for granularities of failure-atomicity
  • Our proposal: Failure-atomic SFRs

– Coupled-SFR design
– Decoupled-SFR design

  • Evaluation
slide-8
SLIDE 8

Language-level persistency models [Chakrabarti ’14][Kolli ’17]

  • Enable writing portable, recoverable software
  • Extend language memory-model with persistency semantics
  • Persistency model guarantees:

– Ordering: How can programmers order persists?
– Failure-atomicity: Which group of stores persist atomically?

slide-9
SLIDE 9

Persist ordering

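Thread 1: L1.acq(); A = 100; L1.rel();
Thread 2: L1.acq(); B = 200; L1.rel();

[Figure: Core-1 and Core-2, each with an L1 cache (L1 $), above an LLC and PM; Thread 1's release of L1 synchronizes with (sw) Thread 2's acquire of L1]

St A <hb St B implies St A <p St B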

Ascribe persist ordering using synchronization primitives in language
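
One way to picture how persist order could be derived from synchronization is to treat the six operations of the two threads above as graph nodes, add program-order edges plus one synchronizes-with edge, and check reachability. The sketch below does only that. The event names and the functions `buildHappensBefore`, `happensBefore`, and `persistsBefore` are hypothetical, and it assumes Thread 2 acquires L1 after Thread 1 releases it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event ids for the two threads in the example above.
enum Event : std::size_t { T1_Acq, T1_StA, T1_Rel, T2_Acq, T2_StB, T2_Rel, kNumEvents };

using Graph = std::vector<std::vector<std::size_t>>;

// Program-order edges within each thread, plus one synchronizes-with (sw)
// edge from Thread 1's release of L1 to Thread 2's acquire of L1.
Graph buildHappensBefore() {
    Graph g(kNumEvents);
    g[T1_Acq].push_back(T1_StA);  // program order, Thread 1
    g[T1_StA].push_back(T1_Rel);
    g[T2_Acq].push_back(T2_StB);  // program order, Thread 2
    g[T2_StB].push_back(T2_Rel);
    g[T1_Rel].push_back(T2_Acq);  // sw edge (assumed lock hand-off order)
    return g;
}

// Iterative depth-first search: true if `to` is reachable from `from`.
bool happensBefore(const Graph& g, std::size_t from, std::size_t to) {
    std::vector<bool> seen(g.size(), false);
    std::vector<std::size_t> stack{from};
    while (!stack.empty()) {
        std::size_t e = stack.back();
        stack.pop_back();
        for (std::size_t next : g[e]) {
            if (next == to) return true;
            if (!seen[next]) {
                seen[next] = true;
                stack.push_back(next);
            }
        }
    }
    return false;
}

// In this model, the persist order between two stores mirrors happens-before.
bool persistsBefore(const Graph& g, Event a, Event b) {
    return happensBefore(g, a, b);
}
```

With this graph, `persistsBefore(g, T1_StA, T2_StB)` holds and the reverse query does not, which matches St A <p St B above. Standard C++ itself says nothing about persistence; the mapping from happens-before to persist order is what a language-level persistency model would add.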

slide-10
SLIDE 10

Why failure-atomicity?

Task: Fill node and add to linked list, safely

[Figure: in-memory data (linked list); a Fence orders fillNewNode() before updateTailPtr()]

slide-11
SLIDE 11

Why failure-atomicity?

Task: Fill node and add to linked list, safely

[Figure: in-memory data (linked list); fillNewNode() and updateTailPtr() are grouped as one Atomic unit]

Failure-atomicity → Persistent memory programming becomes easier
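
To make the difference concrete, the following sketch enumerates which persistent states a crash could expose for the two-step list append under three regimes: no ordering, fence ordering, and failure-atomicity. It is a simplified model, not code from the talk; the bit flags, `Mode`, `crashStates`, `isCorrupt`, and `needsCleanup` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoding of what has reached PM at the time of a crash.
enum : std::uint8_t {
    kNodeFilled = 1,   // writes of fillNewNode() persisted
    kTailUpdated = 2   // write of updateTailPtr() persisted
};

enum class Mode { Unordered, Fenced, FailureAtomic };

// Persistent states that a crash may expose under each regime.
std::vector<std::uint8_t> crashStates(Mode m) {
    switch (m) {
    case Mode::Unordered:      // persists may reach PM in any order
        return {0, kNodeFilled, kTailUpdated, kNodeFilled | kTailUpdated};
    case Mode::Fenced:         // node contents persist before the tail pointer
        return {0, kNodeFilled, kNodeFilled | kTailUpdated};
    case Mode::FailureAtomic:  // all or nothing
        return {0, kNodeFilled | kTailUpdated};
    }
    return {};
}

// Corrupt list: the tail points at a node whose contents never persisted.
bool isCorrupt(std::uint8_t s) {
    return (s & kTailUpdated) != 0 && (s & kNodeFilled) == 0;
}

// Recovery has extra work if a filled node exists that was never linked.
bool needsCleanup(std::uint8_t s) {
    return s == kNodeFilled;
}
```

In this model only the unordered regime can expose a corrupt list, the fenced regime can still leave a filled but unlinked node for recovery to handle, and the failure-atomic regime exposes neither.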

slide-12
SLIDE 12

Granularity of failure-atomicity - I

L1.lock();
  x -= 100;
  y += 100;
  L2.lock();
    a -= 100;
    b += 100;
  L2.unlock();
L1.unlock();

Individual persists [Condit ‘09][Pelley ‘14][Joshi ‘16][Kolli ‘17]

  • Mechanisms ensure atomicity of individual persists
  • Non-sequentially consistent state visible to recovery

→ Need additional custom logging

slide-13
SLIDE 13

Granularity of failure-atomicity - II

L1.lock();
  x -= 100;
  y += 100;
  L2.lock();
    a -= 100;
    b += 100;
  L2.unlock();
L1.unlock();

Outer critical sections

[Chakrabarti ‘14][Boehm ‘16]

  • Guarantee that recovery observes a sequentially consistent (SC) state

→ Easier to build recovery code

  • Require complex dependency tracking between critical sections

→ > 2x performance cost

slide-14
SLIDE 14

Our proposal: Failure-atomic SFRs

l1.acq();
  x -= 100; y += 100;    (SFR1)
l2.acq();
  a -= 100; b += 100;    (SFR2)
l2.rel();
l1.rel();

Synchronization-free regions (SFRs): thread regions delimited by synchronization operations or system calls

Persistent state atomically moves from one sync. operation to the next
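
The region boundaries can be illustrated by splitting a thread's operation trace at every synchronization operation. The sketch below is only an illustration under that reading: the `Op` record, `splitIntoSFRs`, and `exampleTrace` are hypothetical, and the trace is written out by hand rather than produced by a compiler pass.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One step of a thread's execution trace (hypothetical representation).
struct Op {
    bool isSync;       // true for acquire/release operations or system calls
    std::string text;  // e.g. "l1.acq()" or "x -= 100"
};

// Split a trace into synchronization-free regions: maximal runs of
// non-synchronizing operations between consecutive sync operations.
std::vector<std::vector<std::string>> splitIntoSFRs(const std::vector<Op>& trace) {
    std::vector<std::vector<std::string>> regions;
    std::vector<std::string> current;
    for (const Op& op : trace) {
        if (op.isSync) {
            // A sync operation closes the region that precedes it, if any.
            if (!current.empty()) {
                regions.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(op.text);
        }
    }
    if (!current.empty()) regions.push_back(current);
    return regions;
}

// The trace from the example above.
std::vector<Op> exampleTrace() {
    return {{true, "l1.acq()"}, {false, "x -= 100"}, {false, "y += 100"},
            {true, "l2.acq()"}, {false, "a -= 100"}, {false, "b += 100"},
            {true, "l2.rel()"}, {true, "l1.rel()"}};
}
```

Calling `splitIntoSFRs(exampleTrace())` yields two regions, the updates to x and y and the updates to a and b, corresponding to SFR1 and SFR2. Empty regions, such as the one between `l2.rel()` and `l1.rel()`, are dropped in this sketch.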

slide-15
SLIDE 15

Failure-atomicity of SFRs

  • Persist SFRs in sequentially consistent order
  • Extends clean SC semantics to post-failure recovery
  • Allow hardware/compiler optimizations within SFR
  • Persist ordering

– Synchronizing acquire and release ops in C++

  • Failure-atomicity

– Undo-logging for SFRs

Two logging designs → Coupled-SFR and Decoupled-SFR

slide-16
SLIDE 16

Undo-logging for SFRs

L1.acq(); x = 100; L1.rel();    (SFR1, failure-atomic)

Steps: createUndoLog (L), mutateData (M), persistData (P), commitLog (C)

The steps of undo-logging must be ordered for SFRs to be failure-atomic
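
Why the step order matters can be seen in a small simulation of the SFR `x = 100` that crashes after each step and then runs recovery. This is a sketch under assumptions, not the talk's implementation: it assumes the conventional undo-log order (log, mutate, persist, commit), models PM as a plain struct, and uses the hypothetical names `PersistentState`, `CrashPoint`, `runSFR`, and `recover`.

```cpp
#include <cassert>

// Hypothetical model of what survives a failure for a single variable x.
struct PersistentState {
    int x;           // value of x that has reached PM
    bool logValid;   // true while an uncommitted undo log exists
    int loggedOldX;  // old value of x saved in the undo log
};

enum class CrashPoint { AfterLog, AfterMutate, AfterPersist, None };

// Execute the SFR "x = newValue" and stop at `crash` to simulate a failure.
// Cached (volatile) updates are lost, so only the returned state survives.
PersistentState runSFR(PersistentState pm, int newValue, CrashPoint crash) {
    pm.loggedOldX = pm.x;  // L: create and persist the undo log
    pm.logValid = true;
    if (crash == CrashPoint::AfterLog) return pm;
    int cachedX = newValue;  // M: mutate data; the update is still in cache
    if (crash == CrashPoint::AfterMutate) return pm;
    pm.x = cachedX;  // P: persist the mutated data
    if (crash == CrashPoint::AfterPersist) return pm;
    pm.logValid = false;  // C: commit the log, making the update final
    return pm;
}

// Recovery rolls back with the undo log whenever the log was not committed.
int recover(const PersistentState& pm) {
    return pm.logValid ? pm.loggedOldX : pm.x;
}
```

Starting from x = 0, recovery returns 0 for a crash after L, M, or P, and 100 only once the log is committed, so the region appears to take effect entirely or not at all. If the log were committed before the data persisted, a crash in between would leave no log to roll back with and no guarantee that the new value had reached PM.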

slide-17
SLIDE 17

Design 1: Coupled-SFR

Thread 1 (SFR1): L1.acq(); x = 100; L1.rel();
Thread 2 (SFR2): L1.acq(); x = 200; L1.rel();

Timeline:
Thread 1 (SFR1): L1 → M1 → P1 → C1 → REL1
Thread 2 (SFR2): ACQ2 → L2 → M2 → P2 → C2
REL1 synchronizes with (sw) ACQ2

+ Persistent state lags execution by at most one SFR
→ Simpler implementation, latest state at failure

– Need to flush updates at the end of each SFR
→ High performance cost

slide-18
SLIDE 18

Design 2: Decoupled-SFR

  • Coupled-SFR has a simple design, but lower performance

– Persists and log commits are on the critical execution path

  • Key idea: Decouple persistent state from program execution

– Persist updates and commit logs in the background
– Create undo logs in order
– Roll back updates in reverse order of creation on failure

slide-19
SLIDE 19

Decoupled-SFR in action

Thread 1 (SFR1): L1.acq(); x = 100; L1.rel();
Thread 2 (SFR2): L1.acq(); x = 200; L1.rel();

Timeline (logs are created in order during execution):
Thread 1 (SFR1): L1 → M1 → REL1
Thread 2 (SFR2): ACQ2 → L2 → M2
REL1 synchronizes with (sw) ACQ2
Background (flush and commit): P1 → C1 → P2 → C2

Need to commit logs in order → record the order in which logs are created

slide-20
SLIDE 20

Log ordering in Decoupled-SFR

Init: X = 0

Thread 1: X = 100; L1.rel();
Thread 2: L1.acq(); X = 200;
Thread 1's L1.rel() synchronizes with (sw) Thread 2's L1.acq()

Thread 1 log (after header): Store X (X = 0); Rel L1 (Seq = 0)
Thread 2 log (after header): Acq L1 (Seq = 1); Store X (X = 100)

Sequence Table: L1: 0 → 1

  • Sequence numbers in logs record inter-thread order of log creation
  • Background threads commit logs using the recorded sequence numbers
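
The log contents above can be reproduced with a small single-threaded replay. The sketch below is a guess at the bookkeeping, not the actual runtime: `LogEntry`, `DecoupledLogger`, `loggedBefore`, and `replayExample` are hypothetical, and it assumes a release records the lock's current sequence number and then increments it, while an acquire records the number left behind by the latest release.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical log-entry layout; the real log format is not given here.
struct LogEntry {
    enum Kind { Store, Acquire, Release };
    Kind kind;
    std::string name;  // variable name for Store, lock name otherwise
    int value;         // old value for Store, sequence number otherwise
};

struct DecoupledLogger {
    std::map<std::string, int> memory;         // current values; missing keys read as 0
    std::map<std::string, int> sequenceTable;  // per-lock sequence counter
    std::vector<std::vector<LogEntry>> logs;   // one undo log per thread

    explicit DecoupledLogger(std::size_t threads) : logs(threads) {}

    void store(std::size_t tid, const std::string& var, int v) {
        logs[tid].push_back({LogEntry::Store, var, memory[var]});  // log old value first
        memory[var] = v;
    }
    // Assumption: a release records the current sequence number, then bumps it.
    void release(std::size_t tid, const std::string& lock) {
        logs[tid].push_back({LogEntry::Release, lock, sequenceTable[lock]++});
    }
    // Assumption: an acquire records the number left by the latest release.
    void acquire(std::size_t tid, const std::string& lock) {
        logs[tid].push_back({LogEntry::Acquire, lock, sequenceTable[lock]});
    }
    // Undo one thread's logged stores, newest first, then drop its log.
    void rollback(std::size_t tid) {
        for (auto it = logs[tid].rbegin(); it != logs[tid].rend(); ++it)
            if (it->kind == LogEntry::Store) memory[it->name] = it->value;
        logs[tid].clear();
    }
};

// A release was logged before an acquire of the same lock if its number is smaller.
bool loggedBefore(const LogEntry& rel, const LogEntry& acq) {
    return rel.name == acq.name && rel.value < acq.value;
}

// Replays the example above: Thread 1 is tid 0, Thread 2 is tid 1.
DecoupledLogger replayExample() {
    DecoupledLogger d(2);
    d.store(0, "X", 100);
    d.release(0, "L1");
    d.acquire(1, "L1");
    d.store(1, "X", 200);
    return d;
}
```

After `replayExample()`, Thread 1's log holds Store X (old value 0) and Rel L1 (Seq 0), Thread 2's log holds Acq L1 (Seq 1) and Store X (old value 100), and the sequence table entry for L1 is 1. Because the release's number is smaller than the acquire's, Thread 2's log is newer, so rolling back Thread 2 and then Thread 1 restores X to 100 and then to its initial 0.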

slide-21
SLIDE 21

Evaluation setup

  • Implemented our logging approaches in LLVM v3.6.0

– Instruments stores and sync. ops. to emit undo logs
– Creates log space for managing per-thread undo logs
– Launches background threads to flush/commit logs in Decoupled-SFR

  • Workloads: write-intensive micro-benchmarks

– 12 threads, 10M operations

  • Performed experiments on Intel E5-2683 v3

– 2GHz, 12 physical cores, 2-way hyper-threading

slide-22
SLIDE 22

Performance evaluation

[Figure: normalized execution time (y-axis, 0.2 to 1.2) for CQ, SPS, PC, RB-tree, TATP, LL, TPCC, and Mean; series: Atlas, Coupled-SFR, Decoupled-SFR, No-persistency; annotation: 65% better]

Decoupled-SFR performs 65% better than the state-of-the-art ATLAS design [Chakrabarti ’14]

slide-23
SLIDE 23

Performance evaluation

[Figure: normalized execution time (y-axis, 0.2 to 1.2) for CQ, SPS, PC, RB-tree, TATP, LL, TPCC, and Mean; series: Atlas [Chakrabarti ’14], Coupled-SFR, Decoupled-SFR, No-persistency; annotation: Better]

Coupled-SFR performs better than Decoupled-SFR when there are fewer stores per SFR

slide-24
SLIDE 24

Conclusion

  • Failure-atomic synchronization-free regions

– Persistent state moves from one sync. operation to the next

  • Coupled-SFR design

– Easy to reason about PM state after failure; high performance cost

  • Decoupled-SFR design

– Persistent state lags execution; performs 65% better than ATLAS

More details in our full paper on “Persistency for Synchronization-free Regions” appearing in PLDI ’18

slide-25
SLIDE 25

Thank you! Questions?
