Relaxed Persist Ordering Using Strand Persistency

Vaibhav Gogte, William Wang, Stephan Diestelhorst, Peter M. Chen, Satish Narayanasamy, Thomas F. Wenisch
ISCA 2020

Promise of persistent memory (PM)
• Performance
• Density: “Optane DC Persistent Memory will be offered in packages of up to 512GB per stick.” “… expanding memory per CPU socket to as much as 3TB.” (Source: www.extremetech.com)
• Non-volatility
Byte-addressable, load-store interface to durable storage

Persistent memory system
[Figure: CPU with writeback caches, backed by DRAM and persistent memory (PM); a failure is followed by recovery]
Recovery can inspect PM data structures to restore the system to a consistent state

Recovery requires PM access ordering
• Consistency model (CPU): St a = x; St b = y
• Persistency model for recovery (writeback caches to PM): St a = x; St b = y
• Intel x86 primitives:
    St a = x
    CLWB(a)
    SFENCE
    St b = y
    CLWB(b)
Hardware systems provide primitives to express persist order to PM
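To make the role of SFENCE in the sequence above concrete, here is a minimal Python sketch of a toy persist model. It is an illustration under stated assumptions, not the x86 specification: a cache line is assumed to become eligible to persist only once it is CLWB'd, and an SFENCE is assumed to make every earlier CLWB persist before any later one. The function name `reachable_persisted_sets` is hypothetical.

```python
from itertools import combinations

def reachable_persisted_sets(program):
    """Return every set of variables that could be in PM at a crash.

    Toy model (an assumption, not the x86 specification): a line becomes
    eligible to persist once it is CLWB'd, and an SFENCE makes all earlier
    CLWBs persist before any later one.
    """
    epochs = [[]]                      # CLWB targets grouped between SFENCEs
    for op in program:
        if op == "SFENCE":
            epochs.append([])
        elif op.startswith("CLWB("):
            epochs[-1].append(op[len("CLWB("):-1])

    states = set()
    persisted = frozenset()            # everything from completed epochs
    for epoch in epochs:
        for k in range(len(epoch) + 1):
            for subset in combinations(epoch, k):
                states.add(persisted | frozenset(subset))
        persisted |= frozenset(epoch)
    return states

fenced = ["St a = x", "CLWB(a)", "SFENCE", "St b = y", "CLWB(b)"]
unfenced = ["St a = x", "CLWB(a)", "St b = y", "CLWB(b)"]

b_without_a = frozenset({"b"})
print(b_without_a in reachable_persisted_sets(fenced))    # False
print(b_without_a in reachable_persisted_sets(unfenced))  # True
```

In this model, the fenced sequence never leaves b in PM without a, which is the property recovery code would rely on.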

Hardware imposes overly strict constraints
Ideal:
    St A = 1; CLWB(A)
    St B = 2; CLWB(B)
    St C = 3; CLWB(C)
DAG 1:
    St A = 1; CLWB(A)
    SFENCE
    St B = 2; CLWB(B)
    St C = 3; CLWB(C)
DAG 2:
    St A = 1; CLWB(A)
    St C = 3; CLWB(C)
    SFENCE
    St B = 2; CLWB(B)
[Figure: persist-order graphs over A, B, and C for the ideal DAG, DAG 1, and DAG 2]
Primitives in existing hardware systems overconstrain PM accesses
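One way to see the overconstraint is to count the persist orders each graph permits. The sketch below assumes that the ideal DAG requires only that A persist before B, with C unordered; the slide's figure is not recoverable from the text, so that edge set is an assumption. The edges for DAG 1 and DAG 2 follow from where the SFENCE sits in each sequence. The helper `allowed_orders` is hypothetical.

```python
from itertools import permutations

def allowed_orders(edges, persists="ABC"):
    """List the persist orders that respect every (before, after) edge."""
    result = []
    for order in permutations(persists):
        pos = {p: i for i, p in enumerate(order)}
        if all(pos[u] < pos[v] for u, v in edges):
            result.append("".join(order))
    return result

ideal = [("A", "B")]               # assumed: only A must persist before B
dag1 = [("A", "B"), ("A", "C")]    # SFENCE placed right after CLWB(A)
dag2 = [("A", "B"), ("C", "B")]    # SFENCE placed right before St B

print(allowed_orders(ideal))  # ['ABC', 'ACB', 'CAB']
print(allowed_orders(dag1))   # ['ABC', 'ACB']
print(allowed_orders(dag2))   # ['ACB', 'CAB']
```

Under this assumption, either SFENCE placement rules out one order that the ideal graph would allow.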

Contributions
• Our proposal: StrandWeaver
  – Builds strand persistency model in hardware
  – Specifies precise persist ordering constraints
• Comprises primitives: PersistBarrier, NewStrand, and JoinStrand
  – Can encode an arbitrary DAG
• Map language-level persistency models to ISA-level primitives
  – Leverage hw primitives to build persistency models efficiently
StrandWeaver results in 1.45x (avg.) speedup over Intel x86

Outline
• Contributions
• Example: Failure atomicity
• Existing hardware vs. strand persistency model
• Our proposal: StrandWeaver
• Evaluation

Example: Failure atomicity
Failure atomicity: Which group of stores persist atomically?
    atomic_begin()
    x = 100;
    y = 200;
    atomic_end()
The stores between atomic_begin() and atomic_end() form the failure-atomic region.
Failure atomicity limits state that recovery can observe after failure

Undo logging for failure atomicity
Init: x = 0; y = 0
    atomic_begin()
    x = 1;
    y = 2;
    atomic_end()
Steps for the failure-atomic region, in order:
• persistUndoLog (L)
• mutateData (M)
• persistData (P)
• commitLog (C)
Undo logging steps ordered to ensure failure atomicity
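The four steps above can be sketched as a small Python simulation. This is a toy model under stated assumptions, not the authors' implementation: PM is a dictionary that survives a crash, the cache is a dictionary that does not, and persistData is split into two sub-steps (one per variable) so that a crash can land between them. The names `run_transaction`, `recover`, and `fresh_pm` are hypothetical.

```python
def fresh_pm():
    """PM contents before the transaction: x = 0, y = 0, no live undo log."""
    return {"x": 0, "y": 0, "log": None}

def run_transaction(pm, crash_after):
    """Run x = 1; y = 2 with undo logging, crashing after `crash_after` steps."""
    cache = {}                                   # volatile, lost on a crash

    def persist_undo_log():                      # L: old values reach PM first
        pm["log"] = {"x": pm["x"], "y": pm["y"]}

    def mutate_data():                           # M: update the cached copies
        cache.update(x=1, y=2)

    def persist_x():                             # P (first half)
        pm["x"] = cache["x"]

    def persist_y():                             # P (second half)
        pm["y"] = cache["y"]

    def commit_log():                            # C: discard the undo log
        pm["log"] = None

    steps = [persist_undo_log, mutate_data, persist_x, persist_y, commit_log]
    for step in steps[:crash_after]:
        step()
    return pm                                    # the cache is simply dropped

def recover(pm):
    """Roll back to the logged values if the log was never committed."""
    if pm["log"] is not None:
        pm["x"], pm["y"] = pm["log"]["x"], pm["log"]["y"]
        pm["log"] = None
    return pm["x"], pm["y"]

outcomes = [recover(run_transaction(fresh_pm(), n)) for n in range(6)]
print(outcomes)
# [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (1, 2)]
```

Whatever the crash point, recovery in this model observes either both old values or both new values, never a mix.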

Hardware imposes stricter constraints
    atomic_begin()
    x = 1;
    y = 2;
    atomic_end()
Ideal ordering (two independent chains):
    Log(Lx, x) → CLWB(Lx) → Store(x, 1)
    Log(Ly, y) → CLWB(Ly) → Store(y, 2)
SFENCE ordering (one serial chain):
    Log(Lx, x)
    CLWB(Lx)
    SFENCE
    Store(x, 1)
    Log(Ly, y)
    CLWB(Ly)
    SFENCE
    Store(y, 2)

StrandWeaver: Hardware Strand Persistency Model
• High-level languages: failure atomicity for language-level persistency models
• Compiler: logging impl. that map to hardware primitives
• Hardware ISA: ISA primitives PersistBarrier, NewStrand, JoinStrand

StrandWeaver enables persist concurrency
• Provides primitives to express precise persist order
  – PersistBarrier: orders persists within a strand
  – NewStrand: initiates new stream of persists
  – JoinStrand: merges prior initiated strands
Example:
    Persist A
    PersistBarrier
    Persist B
    NewStrand
    Persist C
    JoinStrand
    Persist D
[Figure: Strand 0 holds A then B, Strand 1 holds C, and D follows the join]
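As a rough illustration of how the three primitives shape the persist order, the sketch below derives ordering edges from a primitive sequence. The semantics are assumptions based on the one-line descriptions on the slide: PersistBarrier orders earlier persists of the current strand before later ones, NewStrand starts a strand with no ordering against earlier strands, and JoinStrand orders every prior persist before every later one. The function `persist_order_edges` is hypothetical.

```python
def persist_order_edges(program):
    """Return the set of (earlier, later) persist-order edges.

    Assumed semantics, taken from the slide's one-line descriptions:
    PersistBarrier orders persists within the current strand, NewStrand
    begins an unordered strand, JoinStrand orders all prior persists
    before all later ones.
    """
    edges = set()
    all_seen = []        # every persist issued so far
    joined = []          # persists ordered before everything later by a join
    before_barrier = []  # current strand, ahead of the latest PersistBarrier
    since_barrier = []   # current strand, after the latest PersistBarrier

    for op in program:
        if op == "PersistBarrier":
            before_barrier += since_barrier
            since_barrier = []
        elif op == "NewStrand":
            before_barrier, since_barrier = [], []
        elif op == "JoinStrand":
            joined = list(all_seen)
            before_barrier, since_barrier = [], []
        else:                                    # a persist, named by `op`
            for earlier in joined + before_barrier:
                edges.add((earlier, op))
            since_barrier.append(op)
            all_seen.append(op)
    return edges

program = ["A", "PersistBarrier", "B", "NewStrand", "C", "JoinStrand", "D"]
print(sorted(persist_order_edges(program)))
# [('A', 'B'), ('A', 'D'), ('B', 'D'), ('C', 'D')]
```

C has no incoming edge, so in this model it may persist concurrently with A and B, while D waits for all three.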

StrandWeaver architecture
[Figure: CPU with load-store queue and persist queue, L1 cache, and a strand buffer unit holding strand buffers SB0, SB1, …, SBn]
Persist queue
• Manages ongoing StrandWeaver primitives
• Orders CLWBs separated by JoinStrand
Strand buffer unit
• Issues CLWBs and flushes dirty cache lines
• Ensures CLWBs on diff. strands are concurrent
• Monitors coherence reqs. for inter-thread order

Running example
Example code (also the contents of the persist queue):
    CLWB(A)
    NewStrand
    CLWB(B)
    JoinStrand
    CLWB(C)
[Figure: CPU with persist queue, L1 cache, and a strand buffer unit with buffers SB0 and SB1 and a buffer index]
• CLWB(A) is placed in strand buffer SB0
• CLWB(B) is placed in strand buffer SB1
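The dispatch behaviour in this running example can be mimicked with a very small Python model. It is only a sketch under assumptions that the slides do not confirm: NewStrand is assumed to advance the buffer index round-robin, and JoinStrand is assumed to hold back later CLWBs until every strand buffer has drained. The slides end before showing where CLWB(C) lands, so its placement here is an artifact of the model. The function `dispatch` is hypothetical.

```python
def dispatch(persist_queue, num_buffers=2):
    """Toy model of persist-queue dispatch into strand buffers.

    Returns a list of events: ("issue", op, buffer_index) or
    ("drain", snapshot_of_buffers).
    """
    buffers = [[] for _ in range(num_buffers)]
    idx = 0                                   # current strand buffer
    events = []
    for op in persist_queue:
        if op == "NewStrand":
            idx = (idx + 1) % num_buffers     # assumed round-robin choice
        elif op == "JoinStrand":
            # Assumed: later CLWBs wait until every buffer has drained.
            events.append(("drain", [list(b) for b in buffers]))
            for b in buffers:
                b.clear()
        else:
            buffers[idx].append(op)
            events.append(("issue", op, idx))
    return events

queue = ["CLWB(A)", "NewStrand", "CLWB(B)", "JoinStrand", "CLWB(C)"]
events = dispatch(queue)
for event in events:
    print(event)
```

The first two events place CLWB(A) in buffer 0 and CLWB(B) in buffer 1, matching the SB0 and SB1 contents shown in the running example.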
