Failure-atomic Synchronization-free Regions
Vaibhav Gogte, Stephan Diestelhorst$, William Wang$, Satish Narayanasamy, Peter M. Chen, Thomas F. Wenisch
NVMW 2018, San Diego, CA 03/13/2018
Promise of persistent memory (PM): non-volatility
[System diagram: Core-1 through Core-4, each with a private L1 cache, a shared LLC, DRAM, and Persistent Memory (PM); a recovery process inspects PM]
Recovery can inspect the data structures in PM to restore the system to a consistent state.
– Academia: [Condit ‘09][Pelley ‘14][Joshi ‘15][Kolli ‘17] …
– Industry: [Intel ‘14][ARM ‘16]
Existing mechanisms:
– Individual persists [Intel ‘14][ARM ‘16][Joshi ’15]…: have high performance, but do not provide clean failure semantics
– Outermost critical sections [Chakrabarti ‘14][Boehm ‘16]
– Transactions [Coburn ‘11][Volos ‘11]…: do not compose well with general sync. primitives
Existing mechanisms suffer a trade-off between programmability and performance
– Extend clean semantics to post-failure recovery – Employ existing synchronization primitives in C++
– Build compiler pass that emits logging code for persistent accesses
– Coupled-SFR design
– Decoupled-SFR design
– Ordering: How can programmers order persists?
– Failure-atomicity: Which groups of stores persist atomically?
Thread 1: L1.acq(); A = 100; L1.rel();
Thread 2: L1.acq(); B = 200; L1.rel();
[Diagram: Core-1 and Core-2 with private L1 caches, a shared LLC, and PM; a synchronizes-with (sw) edge from Thread 1's release to Thread 2's acquire]
St A <hb St B (happens-before) ⇒ St A <p St B (persist order)
Ascribe persist ordering using synchronization primitives in language
[Ordering example: fillNewNode() and updateTailPtr() update in-memory data; a fence orders fillNewNode()'s persists before updateTailPtr()'s]
[Failure-atomicity example: fillNewNode() and updateTailPtr() update in-memory data and must persist atomically as a group]
L1.lock();
  x -= 100; y += 100;
  L2.lock();
    a -= 100; b += 100;
  L2.unlock();
L1.unlock();
Individual persists [Condit ‘09][Pelley ‘14][Joshi ‘16][Kolli ‘17]
→ Need additional custom logging
L1.lock();
  x -= 100; y += 100;
  L2.lock();
    a -= 100; b += 100;
  L2.unlock();
L1.unlock();
Outermost critical sections [Chakrabarti ‘14][Boehm ‘16]
→ Easier to build recovery code
→ > 2x performance cost
l1.acq();
x -= 100; y += 100;      // SFR1
l2.acq();
a -= 100; b += 100;      // SFR2
l2.rel();
l1.rel();

Synchronization-free regions (SFRs): thread regions delimited by synchronization operations or system calls. Persistent state atomically moves from one sync. operation to the next.
Two logging designs → Coupled-SFR and Decoupled-SFR
L1.acq(); x = 100; L1.rel();   // SFR1, failure-atomic

Undo-logging steps: createUndoLog (L), mutateData (M), persistData (P), commitLog (C)
Need to ensure the ordering of these steps for SFRs to be failure-atomic.
Thread 1 (SFR1): L1.acq(); x = 100; L1.rel();
Thread 2 (SFR2): L1.acq(); x = 200; L1.rel();

Coupled-SFR:
+ Persistent state lags execution by at most one SFR → simpler implementation, latest state at failure
→ High performance cost
[Timeline: Thread 1 runs SFR1 as L1 M1 P1 C1 then REL1; a synchronizes-with (sw) edge orders it before Thread 2's SFR2, which runs ACQ2 then L2 M2 P2 C2]
– Persists and log commits on the critical execution path
Decoupled-SFR:
– Persist updates and commit logs in background
– Create undo logs in order
– Roll back updates in reverse order of creation on failure
Thread 1 (SFR1): L1.acq(); x = 100; L1.rel();
Thread 2 (SFR2): L1.acq(); x = 200; L1.rel();

[Timeline: Thread 1 runs SFR1 as L1 M1 then REL1; a synchronizes-with (sw) edge orders it before Thread 2's SFR2, which runs ACQ2 then L2 M2; P1 C1 P2 C2 are flushed and committed in the background]
Create logs in order during execution.
Need to commit logs in order → record the order in which logs are created.
Thread 1: X = 100; L1.rel();
Thread 2: L1.acq(); X = 200;
(Initially X = 0; a synchronizes-with (sw) edge runs from Thread 1's release to Thread 2's acquire)

Thread 1 log: [Header] Store X, old value X = 0; Rel L1, Seq = 0
Thread 2 log: [Header] Acq L1, Seq = 1; Store X, old value X = 100
Sequence table: L1: 0 → 1
– Instruments stores and sync. ops. to emit undo logs – Creates log space for managing per-thread undo-logs – Launches background threads to flush/commit logs in Decoupled-SFR
– 12 threads, 10M operations
– 2GHz, 12 physical cores, 2-way hyper-threading
[Chart: normalized execution time for CQ, SPS, PC, RB-tree, TATP, LL, TPCC, and Mean, comparing Atlas [Chakrabarti ’14], Coupled-SFR, Decoupled-SFR, and No-persistency; lower is better]
Decoupled-SFR performs 65% better than the state-of-the-art Atlas design.
[Same chart: normalized execution time per benchmark for Atlas [Chakrabarti ’14], Coupled-SFR, Decoupled-SFR, and No-persistency]
Coupled-SFR performs better than Decoupled-SFR when there are fewer stores per SFR.
– SFRs: persistent state moves from one sync. operation to the next
– Coupled-SFR: easy to reason about PM state after failure; high performance cost
– Decoupled-SFR: persistent state lags execution; performs 65% better than Atlas
More details in our full paper on “Persistency for Synchronization-free Regions” appearing in PLDI ’18