
Failure-atomic Synchronization-free Regions
Vaibhav Gogte, Stephan Diestelhorst$, William Wang$, Satish Narayanasamy, Peter M. Chen, Thomas F. Wenisch. NVMW 2018, San Diego, CA, 03/13/2018.


  1. Failure-atomic Synchronization-free Regions
     Vaibhav Gogte, Stephan Diestelhorst$, William Wang$, Satish Narayanasamy, Peter M. Chen, Thomas F. Wenisch
     NVMW 2018, San Diego, CA, 03/13/2018

  2. Promise of persistent memory (PM)
     • Non-volatility
     • Performance
     • Density
     Byte-addressable, load-store interface to storage

  3. Persistent memory system
     [Diagram: Core-1 to Core-4, each with a private L1 cache (L1 $), above a shared LLC, DRAM and Persistent Memory (PM); a recovery process reads PM.]
     Recovery can inspect the data structures in PM to restore the system to a consistent state

  4. Memory persistency models
     • Provide guarantees required for recoverable software
       – Academia [Condit ’09][Pelley ’14][Joshi ’15][Kolli ’17] …
       – Industry [Intel ’14][ARM ’16]
     • Express primitives to define state of the program after failure
     • Govern ordering constraints on persists to PM
     • Ensure failure atomicity for a group of persists

  5. Semantics for failure-atomicity
     • Assures that either all or none of the updates are visible post failure
     • Guaranteed by hardware, library or language implementations
     • Design space for semantics
       – Individual persists [Intel ’14][ARM ’16][Joshi ’15]…: do not provide clean failure semantics
       – Transactions [Coburn ’11][Volos ’11]…: do not compose well with general sync. primitives
       – Outermost critical sections [Chakrabarti ’14][Boehm ’16]: have high performance overhead
     Existing mechanisms suffer a trade-off between programmability and performance

  6. Contributions
     • Failure-atomic Synchronization-free Regions (SFR)
       – Extend clean semantics to post-failure recovery
       – Employ existing synchronization primitives in C++
     • Propose failure-atomicity as a part of language implementation
       – Build compiler pass that emits logging code for persistent accesses
     • Propose two designs: Coupled-SFR and Decoupled-SFR
     • Achieve 65% better performance over state-of-the-art tech.

  7. Outline
     • Design space for granularities of failure-atomicity
     • Our proposal: Failure-atomic SFRs
       – Coupled-SFR design
       – Decoupled-SFR design
     • Evaluation

  8. Language-level persistency models [Chakrabarti ’14][Kolli ’17]
     • Enables writing portable, recoverable software
     • Extend language memory-model with persistency semantics
     • Persistency model guarantees:
       – Ordering: How can programmers order persists?
       – Failure-atomicity: Which group of stores persist atomically?

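  9. Persist ordering
       Thread 1:  L1.acq(); A = 100; L1.rel();
       Thread 2:  L1.acq(); B = 200; L1.rel();
     Thread 1's release synchronizes with (sw) Thread 2's acquire: St A <hb St B, so St A <p St B.
     [Diagram: Core-1 and Core-2 with private L1 caches (L1 $), a shared LLC, and PM holding a and b.]
     Ascribe persist ordering using synchronization primitives in the language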
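
The happens-before edge that the slide relies on can be sketched with a standard C++ mutex. This is only an illustration in ordinary volatile memory: `Shared`, `a_written` and `run_ordering_demo` are hypothetical names, and nothing here touches real persistent memory. Under the persistency model described in the slides, the persist order of the two stores would follow the same edge.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical names: L1, A and B mirror the slide; ordinary DRAM stands in for PM.
struct Shared {
    std::mutex L1;
    int A = 0;
    int B = 0;
    bool a_written = false;  // lets thread 2 detect that thread 1's section ran first
};

void thread1(Shared& s) {
    std::lock_guard<std::mutex> guard(s.L1);  // L1.acq()
    s.A = 100;                                // St A
    s.a_written = true;
}                                             // L1.rel() when guard is destroyed

void thread2(Shared& s) {
    for (;;) {
        {
            // L1.acq(): synchronizes with the previous release of L1.
            std::lock_guard<std::mutex> guard(s.L1);
            if (s.a_written) {
                // St B: St A happens-before this store, so A is seen as 100.
                s.B = s.A + 100;
                return;
            }
        }
        std::this_thread::yield();  // thread 1 has not run yet; retry
    }
}

// Runs both threads and returns the final value of B.
int run_ordering_demo() {
    Shared s;
    std::thread t2(thread2, std::ref(s));
    std::thread t1(thread1, std::ref(s));
    t1.join();
    t2.join();
    return s.B;
}
```

Calling `run_ordering_demo()` yields B = 200, matching the slide, because thread 2 only stores B after it has acquired L1 following thread 1's release.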

  10. Why failure-atomicity?
      Task: Fill node and add to linked list, safely
      [Diagram: in-memory data; fillNewNode(), then a fence, then updateTailPtr().]

  11. Why failure-atomicity?
      Task: Fill node and add to linked list, safely
      [Diagram: in-memory data; fillNewNode() and updateTailPtr() grouped in one atomic region.]
      Failure-atomicity → Persistent memory programming easier

  12. Granularity of failure-atomicity - I
      Individual persists [Condit ’09][Pelley ’14][Joshi ’16][Kolli ’17]
        L1.lock();
        x -= 100;
        y += 100;
        L2.lock();
        a -= 100;
        b += 100;
        L2.unlock();
        L1.unlock();
      • Mechanisms ensure atomicity of individual persists
      • Non-sequentially consistent state visible to recovery → Need additional custom logging

  13. Granularity of failure-atomicity - II
      Outer critical sections [Chakrabarti ’14][Boehm ’16]
        L1.lock();
        x -= 100;
        y += 100;
        L2.lock();
        a -= 100;
        b += 100;
        L2.unlock();
        L1.unlock();
      • Guarantee that recovery observes an SC state → Easier to build recovery code
      • Require complex dependency tracking between critical sections → > 2x performance cost

  14. Our proposal: Failure-atomic SFRs
        l1.acq();
        x -= 100;    // SFR1
        y += 100;    // SFR1
        l2.acq();
        a -= 100;    // SFR2
        b += 100;    // SFR2
        l2.rel();
        l1.rel();
      Synchronization-free regions (SFR): thread regions delimited by synchronization operations or system calls
      Persistent state atomically moves from one sync. operation to the next
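
A minimal sketch of how a thread's operations fall into SFRs, assuming a simplified trace in which every operation is either a synchronization operation or a store. `Op`, `split_into_sfrs` and `slide14_trace` are hypothetical names introduced for illustration; the sketch drops empty regions (such as the one between `l2.rel()` and `l1.rel()`), which is a simplification and not something the slides specify.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class OpKind { Sync, Store };

struct Op {
    OpKind kind;
    std::string text;
};

// Groups the stores of one thread into synchronization-free regions.
// Every synchronization operation closes the region that precedes it.
std::vector<std::vector<std::string>> split_into_sfrs(const std::vector<Op>& trace) {
    std::vector<std::vector<std::string>> regions;
    std::vector<std::string> current;
    for (const Op& op : trace) {
        if (op.kind == OpKind::Sync) {
            if (!current.empty()) {  // empty regions are dropped in this sketch
                regions.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(op.text);
        }
    }
    if (!current.empty()) regions.push_back(current);
    return regions;
}

// The operation sequence shown on the slide.
std::vector<Op> slide14_trace() {
    return {
        {OpKind::Sync, "l1.acq()"},
        {OpKind::Store, "x -= 100"},
        {OpKind::Store, "y += 100"},
        {OpKind::Sync, "l2.acq()"},
        {OpKind::Store, "a -= 100"},
        {OpKind::Store, "b += 100"},
        {OpKind::Sync, "l2.rel()"},
        {OpKind::Sync, "l1.rel()"},
    };
}
```

Applied to `slide14_trace()`, the function returns two regions: the stores to x and y (SFR1) and the stores to a and b (SFR2).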

  15. Failure-atomicity of SFRs
      • Persist SFRs in sequentially consistent order
      • Extends clean SC semantics to post-failure recovery
      • Allow hardware/compiler optimizations within SFR
      • Persist ordering
        – Synchronizing acquire and release ops in C++
      • Failure-atomicity
        – Undo-logging for SFRs
      Two logging designs → Coupled-SFR and Decoupled-SFR

  16. Undo-logging for SFRs
        L1.acq();
        x = 100;     // SFR1, failure-atomic
        L1.rel();
      Steps for SFR1: createUndoLog (L), mutateData (M), persistData (P), commitLog (C)
      Need to ensure the ordering of steps in undo-logging for SFRs to be failure-atomic
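
The four undo-logging steps can be sketched against a simulated persistent memory to show why their order matters. Everything here is an assumption made for illustration: `SimulatedPM`, `Machine`, `recover` and `x_after_crash` are hypothetical names, a `std::map` stands in for PM, and a second map stands in for the volatile cache that is lost at failure.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simulated persistent memory: contents survive a failure.
struct SimulatedPM {
    std::map<std::string, int> data;
    std::vector<std::pair<std::string, int>> undo_log;  // (variable, old value)
};

struct Machine {
    SimulatedPM pm;
    std::map<std::string, int> cache;  // volatile: lost at failure
};

// L: record the old value durably before the data is touched.
void create_undo_log(Machine& m, const std::string& var) {
    m.pm.undo_log.emplace_back(var, m.pm.data[var]);
}

// M: update the volatile copy.
void mutate_data(Machine& m, const std::string& var, int value) {
    m.cache[var] = value;
}

// P: flush the updated values to persistent memory.
void persist_data(Machine& m) {
    for (const auto& kv : m.cache) m.pm.data[kv.first] = kv.second;
}

// C: the region is durable, so its undo entries are no longer needed.
void commit_log(Machine& m) {
    m.pm.undo_log.clear();
}

// Recovery: undo any uncommitted region, newest entry first.
void recover(SimulatedPM& pm) {
    for (auto it = pm.undo_log.rbegin(); it != pm.undo_log.rend(); ++it) {
        pm.data[it->first] = it->second;
    }
    pm.undo_log.clear();
}

// Runs the SFR "x = 100" from x == 0, fails after the given number of
// completed steps (0..4, in the order L, M, P, C), and returns the value
// of x that recovery leaves in persistent memory.
int x_after_crash(int steps_completed) {
    Machine m;
    m.pm.data["x"] = 0;
    if (steps_completed >= 1) create_undo_log(m, "x");
    if (steps_completed >= 2) mutate_data(m, "x", 100);
    if (steps_completed >= 3) persist_data(m);
    if (steps_completed >= 4) commit_log(m);
    m.cache.clear();  // failure: volatile state is lost
    recover(m.pm);
    return m.pm.data["x"];
}
```

In this sketch a failure after any of the first three steps recovers x to 0, and only a failure after the commit leaves x at 100, so the region appears all-or-nothing. If the data were persisted before its undo entry existed, a failure in between would leave x = 100 with nothing to roll it back.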

  17. Design 1: Coupled-SFR
        Thread 1 (SFR1):  L1.acq(); x = 100; L1.rel();
        Thread 2 (SFR2):  L1.acq(); x = 200; L1.rel();
      Thread 1's release synchronizes with (sw) Thread 2's acquire.
      [Timeline diagram: undo-logging steps L1, M1, P1, C1 for SFR1 and L2, M2, P2, C2 for SFR2, with release REL1 and acquire ACQ2.]
      + Persistent state lags execution by at most one SFR → Simpler implementation, latest state at failure
      - Need to flush updates at the end of each SFR → High performance cost

  18. Design 2: Decoupled-SFR
      • Coupled-SFR has simple design, but lower perf.
        – Persists and log commits on critical execution path
      • Key idea: Decouple persistent state from program exec.
        – Persist updates and commit logs in background
        – Create undo logs in order
        – Roll back updates in reverse order of creation on failure
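
The need to roll back in reverse order of creation can be sketched with two regions that update the same variable. This is an illustration under assumptions: `UndoEntry`, `roll_back` and `x_after_recovery` are hypothetical names, and a `std::map` stands in for persistent memory. The scenario assumes x went from 0 to 100 to 200, both updates reached PM, and neither log was committed before the failure.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct UndoEntry {
    std::string var;
    int old_value;  // value the variable held before the logged store
};

// Applies the uncommitted undo entries to a copy of persistent memory,
// either newest-first (reverse order of creation) or oldest-first.
std::map<std::string, int> roll_back(std::map<std::string, int> pm,
                                     const std::vector<UndoEntry>& uncommitted,
                                     bool reverse_order) {
    if (reverse_order) {
        for (auto it = uncommitted.rbegin(); it != uncommitted.rend(); ++it) {
            pm[it->var] = it->old_value;
        }
    } else {
        for (const UndoEntry& e : uncommitted) {
            pm[e.var] = e.old_value;
        }
    }
    return pm;
}

// x: 0 -> 100 -> 200, with both undo entries still uncommitted at failure.
int x_after_recovery(bool reverse_order) {
    std::map<std::string, int> pm{{"x", 200}};
    std::vector<UndoEntry> undo_log{{"x", 0}, {"x", 100}};  // creation order
    return roll_back(pm, undo_log, reverse_order).at("x");
}
```

Rolling back newest-first restores x to 0, the state before either region ran. Applying the same entries oldest-first would leave x at 100, a value from a region whose log was never committed.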

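  19. Decoupled-SFR in action
        Thread 1 (SFR1):  L1.acq(); x = 100; L1.rel();
        Thread 2 (SFR2):  L1.acq(); x = 200; L1.rel();
      Thread 1's release synchronizes with (sw) Thread 2's acquire.
      [Timeline diagram: steps L1, M1 for SFR1 and L2, M2 for SFR2 during execution, with release REL1 and acquire ACQ2; P1, P2, C1, C2 follow in the background.]
      • Create logs in order during execution
      • Flush and commit in background
      • Need to commit logs in order → record order in which logs are created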

  20. Log ordering in Decoupled-SFR
        Init: X = 0
        Thread 1:  X = 100; L1.rel();
        Thread 2:  L1.acq(); X = 200;
      Thread 1's release synchronizes with (sw) Thread 2's acquire.
        Thread 1 log:    Header | Store X, X = 0 | Rel L1, Seq = 0
        Thread 2 log:    Header | Acq L1, Seq = 1 | Store X, X = 100
        Sequence table:  L1: 0 → 1
      • Sequence numbers in logs record inter-thread order of log creation
      • Background threads commit logs using recorded sequence no.
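
One way the sequence numbers on this slide could be produced is sketched below. The update rule is an assumption inferred from the slide's values (Rel L1 with Seq = 0, Acq L1 with Seq = 1, table entry L1 going from 0 to 1): a release logs the lock's current number and then increments it, and an acquire logs the current number. `SequenceTable`, `LogRecord`, `build_slide20_logs` and `must_commit_before` are hypothetical names, and the code runs the two threads' steps sequentially in the order the slide implies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One undo-log record: a store (value = old data) or a sync op (value = Seq).
struct LogRecord {
    std::string kind;    // "Store", "Rel" or "Acq"
    std::string target;  // variable or lock name
    unsigned value;
};

// Assumed rule: a release logs the current number and then bumps it;
// an acquire logs the current number.
class SequenceTable {
public:
    unsigned on_release(const std::string& lock) { return next_[lock]++; }
    unsigned on_acquire(const std::string& lock) { return next_[lock]; }
private:
    std::map<std::string, unsigned> next_;
};

struct Slide20Logs {
    std::vector<LogRecord> thread1;
    std::vector<LogRecord> thread2;
    unsigned final_x;
};

// Replays the slide's example: X = 100; L1.rel() || L1.acq(); X = 200.
Slide20Logs build_slide20_logs() {
    SequenceTable table;
    Slide20Logs logs;
    unsigned X = 0;
    logs.thread1.push_back({"Store", "X", X});  // old value 0
    X = 100;
    logs.thread1.push_back({"Rel", "L1", table.on_release("L1")});
    logs.thread2.push_back({"Acq", "L1", table.on_acquire("L1")});
    logs.thread2.push_back({"Store", "X", X});  // old value 100
    X = 200;
    logs.final_x = X;
    return logs;
}

// A background committer must commit the log ending in `rel` before the log
// starting with `acq` when both name the same lock and rel was logged first.
bool must_commit_before(const LogRecord& rel, const LogRecord& acq) {
    return rel.kind == "Rel" && acq.kind == "Acq" &&
           rel.target == acq.target && rel.value < acq.value;
}
```

With this rule the replay reproduces the slide's numbers, and `must_commit_before` tells a background thread that Thread 1's log has to be committed ahead of Thread 2's.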

  21. Evaluation setup
      • Designed our logging approaches in LLVM v3.6.0
        – Instruments stores and sync. ops. to emit undo logs
        – Creates log space for managing per-thread undo-logs
        – Launches background threads to flush/commit logs in Decoupled-SFR
      • Workloads: write-intensive micro-benchmarks
        – 12 threads, 10M operations
      • Performed experiments on Intel E5-2683 v3
        – 2GHz, 12 physical cores, 2-way hyper-threading

  22. Performance evaluation
      [Bar chart: normalized exec. time (lower is better) for Atlas [Chakrabarti ’14], Coupled-SFR, Decoupled-SFR and No-persistency on CQ, SPS, PC, RB-tree, TATP, LL, TPCC and their mean.]
      Decoupled-SFR performs 65% better than state-of-the-art ATLAS design

  23. Performance evaluation
      [Bar chart: normalized exec. time (lower is better) for Atlas [Chakrabarti ’14], Coupled-SFR, Decoupled-SFR and No-persistency on CQ, SPS, PC, RB-tree, TATP, LL, TPCC and their mean.]
      Coupled-SFR performs better than Decoupled-SFR when there are fewer stores per SFR

  24. Conclusion
      • Failure-atomic synchronization-free regions
        – Persistent state moves from one sync. operation to the next
      • Coupled-SFR design
        – Easy to reason about PM state after failure; high performance cost
      • Decoupled-SFR design
        – Persistent state lags execution; performs 65% better than ATLAS
      More details in our full paper on “Persistency for Synchronization-free Regions” appearing in PLDI ’18

  25. Thank you! Questions?
