Efficient Architectural Support for Persistent Memory
Vijay Nagarajan
People: Marcelo Cintra (Intel), Stratis Viglas (Google), Arpit Joshi (Edinburgh)
Emerging System
[Figure: two memory hierarchies. Core, Cache, DRAM, Secondary Storage is labeled "Software Controlled"; Core, Cache, NVM, Secondary Storage is labeled "Hardware Controlled".]
Need Efficient Persistency Primitives!
[Figure: pseudo-code for a linked-list example, with the list (HEAD, Node, Node 1, Node 2) shown both in the Cache and in NVM.]
Reordering of writes to NVM renders data inconsistent.
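The hazard can be made concrete with a small simulation. The sketch below is not from the slides: it assumes the example is an insert at the head of the list, models NVM as a Python dict, and models a crash as persisting only some of the insert's writes. The names `crash_states` and `is_consistent` are hypothetical.

```python
from itertools import combinations

# Hypothetical model: NVM is a dict from address to value.
# Before the insert, the persistent list is HEAD -> node1 -> node2.
INITIAL_NVM = {"HEAD": "node1", "node1": "node2", "node2": None}

# Inserting a new node at the head takes two stores, in program order:
#   1. node.next = node1   (initialise the new node)
#   2. HEAD = node         (publish it)
INSERT_WRITES = [("node", "node1"), ("HEAD", "node")]


def is_consistent(nvm):
    """Walk from HEAD; every pointer must name a node present in NVM."""
    cur = nvm["HEAD"]
    while cur is not None:
        if cur not in nvm:
            return False  # dangling pointer: the node never reached NVM
        cur = nvm[cur]
    return True


def crash_states(writes, ordered):
    """Yield each NVM image that a crash could leave behind.

    ordered=True:  writes persist in program order, so only prefixes survive.
    ordered=False: the cache may write lines back in any order (any subset).
    """
    n = len(writes)
    if ordered:
        subsets = [tuple(range(k)) for k in range(n + 1)]
    else:
        subsets = [c for k in range(n + 1) for c in combinations(range(n), k)]
    for subset in subsets:
        nvm = dict(INITIAL_NVM)
        for i in subset:
            addr, value = writes[i]
            nvm[addr] = value
        yield nvm


unordered_ok = all(is_consistent(s) for s in crash_states(INSERT_WRITES, ordered=False))
ordered_ok = all(is_consistent(s) for s in crash_states(INSERT_WRITES, ordered=True))
print(unordered_ok, ordered_ok)  # prints: False True
```

The only inconsistent image is the one where the HEAD update reaches NVM but the new node does not, which is the reordering the slide warns about.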
St a; St b; St c; St a; Persist Barrier; St d; St e; St d; Persist Barrier; St p; St q; St d; …
[Figure: Visibility and Persistence timelines for the stores above. The persist barriers divide them into Epoch 1 (a, b, c, a), Epoch 2 (d, e, d) and Epoch 3 (p, q, d).]
* Pelley et al., “Memory Persistency”, in ISCA-2014.
Persist operations happen in the critical path of execution.
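As an illustration of how persist barriers carve a store stream into epochs, here is a minimal sketch. It is not taken from the talk: `BARRIER`, `split_into_epochs` and `lines_to_persist` are hypothetical names, and the coalescing of repeated stores within one epoch is a modelling assumption.

```python
BARRIER = "PB"  # stands in for the slide's "Persist Barrier"


def split_into_epochs(stream):
    """Group stores into epochs; each barrier closes the current epoch."""
    epochs, current = [], []
    for op in stream:
        if op == BARRIER:
            epochs.append(current)
            current = []
        else:
            current.append(op)
    if current:
        epochs.append(current)
    return epochs


def lines_to_persist(epoch):
    """Assumption: repeated stores to one location within an epoch coalesce,
    so each location needs to be persisted only once per epoch."""
    return list(dict.fromkeys(epoch))  # unique locations, first-store order


# The store stream from the slide.
stream = ["a", "b", "c", "a", BARRIER, "d", "e", "d", BARRIER, "p", "q", "d"]
epochs = split_into_epochs(stream)
persists = [lines_to_persist(e) for e in epochs]
print(epochs)    # [['a', 'b', 'c', 'a'], ['d', 'e', 'd'], ['p', 'q', 'd']]
print(persists)  # [['a', 'b', 'c'], ['d', 'e'], ['p', 'q', 'd']]
```

With a strict barrier, the core would wait at each `BARRIER` until every location in the closing epoch has reached NVM, which is why the persists sit on the critical path.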
Significant performance improvement over a strict barrier.
* Condit et al., “Better I/O through byte-addressable, persistent memory”, in SOSP-2009.
[Figure: Visibility and Persistence timelines for Epoch 1, Epoch 2 and Epoch 3 over stores a, b, c, d, e, p, q. Labels: "Cache Line Eviction" and "d − Conflicting request".]
Conflicts bring persist operations back in the critical path.
Intra-thread conflict: in the example, store d is the conflicting request.
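A rough software model can show where such a conflict arises. The sketch below is an assumption-laden simplification rather than the hardware design from the talk: a barrier only increments an epoch counter, dirty lines remember the epoch that wrote them, and a store to a line still dirty from an older epoch forces the older epochs to persist first. The class name `LazyBarrierCache` and its fields are hypothetical.

```python
class LazyBarrierCache:
    """Toy model of a lazy persist barrier with intra-thread conflict detection."""

    def __init__(self):
        self.epoch = 1
        self.dirty = {}       # line -> epoch that last wrote it (not yet in NVM)
        self.persisted = []   # (epoch, line) in the order lines reached NVM
        self.conflicts = []   # (line, older_epoch, current_epoch)

    def barrier(self):
        # Lazy: start a new epoch without flushing anything.
        self.epoch += 1

    def _flush_through(self, epoch):
        # Persist all dirty lines of epochs <= epoch, oldest epoch first.
        for e in sorted({v for v in self.dirty.values() if v <= epoch}):
            for line in [l for l, v in self.dirty.items() if v == e]:
                self.persisted.append((e, line))
                del self.dirty[line]

    def store(self, line):
        older = self.dirty.get(line)
        if older is not None and older < self.epoch:
            # Conflict: the line belongs to an older, unpersisted epoch.
            # The store must wait until that epoch (and earlier ones) persist.
            self.conflicts.append((line, older, self.epoch))
            self._flush_through(older)
        self.dirty[line] = self.epoch


def run(stream):
    cache = LazyBarrierCache()
    for op in stream:
        if op == "PB":
            cache.barrier()
        else:
            cache.store(op)
    return cache


cache = run(["a", "b", "c", "a", "PB", "d", "e", "d", "PB", "p", "q", "d"])
print(cache.conflicts)  # [('d', 2, 3)]
print(cache.persisted)  # [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'd'), (2, 'e')]
```

In this model nothing is persisted until the final store to d, at which point Epochs 1 and 2 must drain before the store can proceed.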
[Figure: two-thread example. Threads T0 and T1 with epochs E00, E10 and E11; Visibility and Persistence timelines over reads (R) and writes (W) to locations A, B, E, F, P, Q, X, Y and Z.]
Persist Barrier Designs
LB: Lazy barrier
LB+IDT: Lazy barrier with inter-thread dependence tracking (IDT)
LB+PF: Lazy barrier with proactive flush (PF)
LB++: Lazy barrier with both IDT and PF
[Slide text partially lost. Surviving fragments: "system simulation mode", "memory controllers".]
[Figure: performance results, higher is better. Annotations: 15%, 22%.]
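Atomic_Begin
A = 1
B = 1
Atomic_End
Initial State: A and B hold their old values.
Possible final states after a crash: (A = 1, B = 1); (A old, B old); (A = 1, B old); (A old, B = 1).
Atomic durability permits only the first two: either both updates persist or neither does.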
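The all-or-nothing requirement can be checked mechanically. The sketch below enumerates every final state in which each location independently holds its old or new value, then keeps only the atomic ones. The old value 0 for A and B is an assumption made for illustration, and `crash_outcomes` and `is_atomic` are hypothetical names.

```python
from itertools import product


def crash_outcomes(old, new):
    """Every final state where each location holds either its old or new value."""
    keys = sorted(old)
    for choice in product([False, True], repeat=len(keys)):
        yield {k: (new[k] if took else old[k]) for k, took in zip(keys, choice)}


def is_atomic(state, old, new):
    """All-or-nothing: the region's updates are all present or all absent."""
    return state == old or state == new


old = {"A": 0, "B": 0}  # assumption: both locations start at 0
new = {"A": 1, "B": 1}  # values written inside Atomic_Begin ... Atomic_End

outcomes = list(crash_outcomes(old, new))
allowed = [s for s in outcomes if is_atomic(s, old, new)]
print(len(outcomes))  # 4
print(allowed)        # [{'A': 0, 'B': 0}, {'A': 1, 'B': 1}]
```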
Write-Ahead-Logging
REDO ➡ read redirection
UNDO ➡ fine grained log->data ordering
Undo logging: writes old value to log
[Figure: Core, Cache and NVM, where NVM holds a Data region and a Log region. Build sequence: the core stores A = 1; the old value is logged as L(A) = 0; the log write completes (Log Done); the Data region then holds A = 1.]
Posted Log: Offload log->data ordering to the memory controller!
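For readers unfamiliar with undo logging, the following software sketch shows the ordering rule that the hardware has to enforce: the old value must be in the log before the data is overwritten. It models neither the posted log nor the memory controller. The class `UndoLoggedNVM`, its methods and the `committed` flag are hypothetical, and the initial values of 0 are an assumption.

```python
class UndoLoggedNVM:
    """Toy persistent memory with an undo log (software model only)."""

    def __init__(self, data):
        self.data = dict(data)  # persistent data region
        self.log = {}           # persistent undo log: addr -> old value
        self.committed = True   # stands in for a persistent commit record

    def atomic_begin(self):
        self.log.clear()
        self.committed = False

    def store(self, addr, value):
        # log->data ordering: record the old value first, then update data.
        if addr not in self.log:
            self.log[addr] = self.data[addr]
        self.data[addr] = value

    def atomic_end(self):
        self.committed = True
        self.log.clear()

    def recover(self):
        # After a crash inside an atomic region, roll back using the log.
        if not self.committed:
            for addr, old_value in self.log.items():
                self.data[addr] = old_value
            self.log.clear()
            self.committed = True


# Crash after the first store: atomic_end() never runs.
nvm = UndoLoggedNVM({"A": 0, "B": 0})
nvm.atomic_begin()
nvm.store("A", 1)
nvm.recover()
after_crash = dict(nvm.data)

# Region runs to completion: recovery has nothing to undo.
nvm.atomic_begin()
nvm.store("A", 1)
nvm.store("B", 1)
nvm.atomic_end()
nvm.recover()
after_commit = dict(nvm.data)

print(after_crash, after_commit)  # {'A': 0, 'B': 0} {'A': 1, 'B': 1}
```

If the data write could reach NVM before its log entry, `recover` would have no old value to restore, which is why the log->data ordering has to be enforced per store.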
[Figure: message timeline between SQ, Cache, Mem Ctrl and Memory for ST(A), the log write L(A) and WRITE L(A), showing the resulting store completion time.]
[Figure: timeline between SQ, Cache, Mem Ctrl and Memory with RDx(A), WRITE L(A) and ST(A), showing the resulting store completion time.]
Log at the source: reduce data movement
Atomic Durability Designs
BASE: Baseline hardware undo log implementation
ATOM: Posted log writes to memory controller
ATOM-OPT: Posted log writes with source logging
NON-ATOMIC: No logging (upper bound on performance)
[Figure: performance results. Annotations: 27%; within 11% of the upper bound.]
* Ceze et al., “BulkSC: Bulk enforcement of sequential consistency”, in ISCA-2007.
[Figure: results for "S +posted log". Annotations: 59%, 17%, 15%, 20%; checkpointing at 32% overhead.]
[Figure: bar chart over benchmarks btree, hash, queue, rbtree, sdg, sps and gmean (y-axis from 1 to 2) comparing ATOM, sdTM, HTM-undo and DHTM. Annotations: 35%, 15%, 45%, 17%; ~60% overhead over volatile.]
too?
barrier?