Relaxed Persist Ordering Using Strand Persistency
Vaibhav Gogte, William Wang$, Stephan Diestelhorst$, Peter M. Chen, Satish Narayanasamy, Thomas F. Wenisch
ISCA 2020
Promise of persistent memory (PM)
– Performance
– Density
– Non-volatility
“Optane DC Persistent Memory will be
“… expanding memory per CPU socket to as much as 3TB.” *
* Source: www.extremetech.com
Byte-addressable, load-store interface to durable storage
Recovery can inspect PM data structures to restore the system to a consistent state.
[Figure: CPU with writeback caches; labels: consistency model, persistency model, "for recovery"]
– Builds the strand persistency model in hardware
– Specifies precise persist ordering constraints
– Can encode an arbitrary DAG
– Leverages hardware primitives to build persistency models efficiently
Init: x = 0; y = 0
atomic_begin()
  x = 1;
  y = 2;
atomic_end()
Undo-logging operations: persistUndoLog (L), mutateData (M), commitLog (C), persistData (P)
atomic_begin()
  x = 1;
  y = 2;
atomic_end()

Log(Lx,x)
CLWB(Lx)
SFENCE
Store(x,1)
Log(Ly,y)
CLWB(Ly)
SFENCE
Store(y,2)

[Figure: the same operations grouped per variable: Log(Lx,x), CLWB(Lx), Store(x,1) and Log(Ly,y), CLWB(Ly), Store(y,2)]
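
The ordering that fences add beyond what undo logging needs can be illustrated with a small model. The sketch below is a simplification under stated assumptions, not exact x86 semantics: it treats each SFENCE as an epoch boundary that orders every earlier persist before every later one, and it names each persist by the location it updates (`Lx`, `x`, `Ly`, `y`). The function `epoch_order` and the `required` set are hypothetical names introduced only for this illustration.

```python
def epoch_order(program):
    """Return (earlier, later) persist pairs ordered by a simplified epoch model.

    Assumption: a fence orders all persists before it ahead of all persists
    after it. Real x86 persist semantics are more subtle than this.
    """
    edges = set()
    earlier_epochs = []   # persists that precede at least one fence
    current_epoch = []    # persists issued since the most recent fence
    for op in program:
        if op == "SFENCE":
            earlier_epochs.extend(current_epoch)
            current_epoch = []
        else:
            edges.update((prev, op) for prev in earlier_epochs)
            current_epoch.append(op)
    return edges


# Persists of the undo-logging example, named by the location they update.
program = ["Lx", "SFENCE", "x", "Ly", "SFENCE", "y"]
edges = epoch_order(program)

# Undo logging only needs each log entry to persist before its own update.
required = {("Lx", "x"), ("Ly", "y")}
extra = edges - required
print(sorted(extra))
```

In this model the two required pairs are ordered, but so are three others, including `Lx` before `Ly`, even though the two log entries are independent.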
– Hardware ISA: ISA primitives PersistBarrier, NewStrand, JoinStrand
– Compiler: logging implementations that map to hardware primitives
– High-level languages: failure atomicity for language-level persistency models
– PersistBarrier: orders persists within a thread
– NewStrand: initiates a new stream of persists (a strand)
– JoinStrand: merges previously initiated strands

Example:
Strand 0: Persist A; PersistBarrier; Persist B
NewStrand
Strand 1: Persist C
JoinStrand
Persist D
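
The effect of the three primitives above can be sketched as a function that derives the persist-order pairs for a single thread. This is a minimal model under stated assumptions: PersistBarrier orders persists only within the current strand, NewStrand starts a strand with no ordering against earlier strands, and JoinStrand orders every earlier persist before every later one. It ignores ordering that arises from conflicting accesses to the same address and from other threads. The name `strand_order` is hypothetical.

```python
def strand_order(program):
    """Return the (earlier, later) persist pairs implied by the primitives."""
    edges = set()
    joined = []          # persists issued before the most recent JoinStrand
    since_join = []      # persists issued since the most recent JoinStrand
    behind_barrier = []  # current-strand persists that precede a PersistBarrier
    open_epoch = []      # current-strand persists since its last PersistBarrier
    for op in program:
        if op == "PersistBarrier":
            behind_barrier.extend(open_epoch)
            open_epoch = []
        elif op == "NewStrand":
            # A new strand carries no intra-strand ordering from older strands.
            behind_barrier, open_epoch = [], []
        elif op == "JoinStrand":
            joined.extend(since_join)
            since_join = []
            behind_barrier, open_epoch = [], []
        else:
            # Any other token is treated as a persist.
            edges.update((prev, op) for prev in joined + behind_barrier)
            open_epoch.append(op)
            since_join.append(op)
    return edges


example = ["A", "PersistBarrier", "B", "NewStrand", "C", "JoinStrand", "D"]
order = strand_order(example)
print(sorted(order))
```

For the example, A is ordered before B, and A, B, and C are all ordered before D, while C is unordered with respect to A and B. The result is a DAG rather than a single chain.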
[Figure: CPU with L1 cache, load-store queue, persist queue, and strand buffer unit (strand buffers SB0, SB1, …, SBn)]
Example code:
CLWB(A)
NewStrand
CLWB(B)
JoinStrand
CLWB(C)

[Figure: the persist queue holds the example code and feeds the strand buffer unit (SB0, SB1, selected by a buffer index), which issues flushes to the L1 cache; A is flushed first, then B, and finally C]
– CLWBs A and B flush data concurrently
– JoinStrand stalls until prior CLWBs complete
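
The walkthrough above can be approximated with a toy scheduler. This is only a sketch under assumptions that the slides do not spell out: NewStrand advances the buffer index round-robin, JoinStrand starts a new issue round once all earlier CLWBs are complete, and a CLWB after JoinStrand stays in the currently selected buffer. PersistBarrier is not modeled. The name `schedule` and the notion of a "round" are hypothetical.

```python
def schedule(persist_queue, num_buffers=2):
    """Assign each CLWB a strand buffer index and an issue round.

    CLWBs that share a round may flush concurrently in this toy model.
    """
    buffer_idx = 0   # strand buffer that receives the next CLWB
    round_no = 0     # incremented each time a JoinStrand drains the buffers
    placed = []
    for op in persist_queue:
        if op == "NewStrand":
            # Assumption: buffers are selected round-robin.
            buffer_idx = (buffer_idx + 1) % num_buffers
        elif op == "JoinStrand":
            # Later CLWBs wait until all earlier ones have completed.
            round_no += 1
        elif op.startswith("CLWB"):
            placed.append((op, buffer_idx, round_no))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return placed


queue = ["CLWB(A)", "NewStrand", "CLWB(B)", "JoinStrand", "CLWB(C)"]
result = schedule(queue)
for op, buf, rnd in result:
    print(f"{op}: SB{buf}, round {rnd}")
```

CLWB(A) and CLWB(B) land in different buffers in the same round, so they can flush concurrently, while CLWB(C) falls in the next round.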
atomic_begin()
  x = 1;
  y = 2;
atomic_end()

Strand 0:
  Log(Lx,x)
  CLWB(Lx)
  PersistBarrier
  Store(x,1)
NewStrand
Strand 1:
  Log(Ly,y)
  CLWB(Ly)
  PersistBarrier
  Store(y,2)
JoinStrand

[Figure: resulting operations per strand: Log(Lx,x), CLWB(Lx), Store(x,1) on Strand 0 and Log(Ly,y), CLWB(Ly), Store(y,2) on Strand 1]
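
The strand-based logging sequence above follows a regular pattern, which the sketch below generates for an arbitrary list of writes. It is illustrative only: the helper name `undo_log_with_strands` and the `L<var>` log naming are assumptions taken from the slide's notation, and steps that the slide does not show, such as committing the log, are left out.

```python
def undo_log_with_strands(writes):
    """Emit one strand per write: log, flush, barrier, then the store.

    writes: list of (variable, new_value) pairs in program order.
    """
    instrs = []
    for i, (var, value) in enumerate(writes):
        if i > 0:
            # Each later write gets its own strand, so its log flush
            # does not wait for the previous write's log flush.
            instrs.append("NewStrand")
        log = f"L{var}"
        instrs += [
            f"Log({log},{var})",     # record the old value in the undo log
            f"CLWB({log})",          # flush the log entry
            "PersistBarrier",        # log entry persists before the update
            f"Store({var},{value})",
        ]
    instrs.append("JoinStrand")      # merge all strands at the end
    return instrs


sequence = undo_log_with_strands([("x", 1), ("y", 2)])
print("\n".join(sequence))
```

For the two writes in the example this reproduces the sequence shown for Strand 0 and Strand 1, with a single JoinStrand at the end.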
L1.lock();
x -= 100;
y += 100;
L2.lock();
a -= 100;
b += 100;
L2.unlock();
L1.unlock();
– Queue: insert/delete entries in a queue
– Hashmap: update values in a persistent hash table
– Array swaps: random swaps of array elements
– RBTree: insert/delete entries in a red-black tree
– TPCC: new order transaction from TPCC
– N-Store [Arulraj15]: persistent KV-store benchmark
[Figure: speedup (y-axis 0.5 to 2.5) of the Intel x86, HOPS, StrandWeaver, and Non-atomic designs on Queue, Hashmap, Array Swap, RB-Tree, TPCC, N-Store, and the mean]
– StrandWeaver achieves an average speedup of 1.5x over the baseline (chart annotations: 1.5x and 1.9x)
– StrandWeaver achieves an average speedup of 1.2x over HOPS
– StrandWeaver performance is within 4% of the non-atomic design
– Work together to relax ordering constraints in undo logging