Efficient Hardware-assisted Logging with Asynchronous and Direct Update for Persistent Memory
Jungi Jeong, Chang Hyun Park, Jaehyuk Huh, and Seungryoul Maeng
International Symposium on Microarchitecture (MICRO) 2018
2018-10-23
Storage-Class Memory
- Directly attached to the app’s virtual address space
- Accessible through load/store instructions
- In-memory data persistency
- Ex) Doubly linked-list insertion (insert B between A and C)
    store B->next = C
    store B->prev = A
    store A->next = B
    store C->prev = B
[Figure: an application in user space issues load/store directly to memory-mapped storage-class memory (NVDIMMs) at the device level; an NVM-aware file system sits in kernel space.]
New requirement: Supporting Crash-Consistency of NVM stores
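To make the crash-consistency requirement concrete, the sketch below replays the four stores of the linked-list insertion and cuts them off after every possible prefix. It is a toy model, not code from the talk: it assumes each completed store has already reached NVM in program order (real caches may write back in a different order, which only adds more failure cases), and the names `Node`, `insert_with_crash`, and `is_consistent` are hypothetical.

```python
class Node:
    """A doubly linked-list node living in (simulated) persistent memory."""

    def __init__(self, name):
        self.name = name
        self.prev = None
        self.next = None


def insert_with_crash(crash_after):
    """Insert B between A and C, but stop after `crash_after` stores."""
    a, b, c = Node("A"), Node("B"), Node("C")
    a.next, c.prev = c, a  # initial persistent list: A <-> C
    stores = [
        lambda: setattr(b, "next", c),  # store B->next = C
        lambda: setattr(b, "prev", a),  # store B->prev = A
        lambda: setattr(a, "next", b),  # store A->next = B
        lambda: setattr(c, "prev", b),  # store C->prev = B
    ]
    for store in stores[:crash_after]:
        store()
    return a, c


def is_consistent(head, tail):
    """Walking forward from head must mirror walking backward from tail."""
    forward, node = [], head
    while node is not None:
        forward.append(node.name)
        node = node.next
    backward, node = [], tail
    while node is not None:
        backward.append(node.name)
        node = node.prev
    return forward == backward[::-1]


results = [is_consistent(*insert_with_crash(k)) for k in range(5)]
for k, ok in enumerate(results):
    print(k, "stores persisted:", "consistent" if ok else "INCONSISTENT")
```

In this model the list is broken only when the crash lands between the third and fourth store: A already points to B, but C still points back to A.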
Atomic Durability through Logging
- Transaction
  - All stores in a transaction become durable all together or not at all
- Ex) Atomic durability in software
  - Durability with cache-flush
  - Atomicity and ordering with write-ahead logging
  - Persist-ordering with store-fence
Log Write:
    store log[0] = A->next
    store log[1] = C->prev
    cache-flush
    sfence
Data Update:
    store B->next = C
    store B->prev = A
    store A->next = B
    store C->prev = B
    cache-flush
    sfence
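The log-write and data-update sequence can be exercised in a small simulation. This is a sketch under stated assumptions rather than the software scheme from the talk: `PersistentMemory` models a volatile cache in front of NVM, a crash discards the cache, and the `log.valid` flag plus the final log retirement are additions that the slide does not show (some commit marker is needed so recovery knows whether to roll back). All names are hypothetical.

```python
class Crash(Exception):
    """Raised to simulate a power failure."""


class PersistentMemory:
    """Stores land in a volatile cache until they are explicitly flushed."""

    def __init__(self, initial):
        self.nvm = dict(initial)  # survives a crash
        self.cache = {}           # lost on a crash

    def store(self, addr, value):
        self.cache[addr] = value

    def load(self, addr):
        return self.cache[addr] if addr in self.cache else self.nvm.get(addr)

    def cache_flush(self, addr):
        if addr in self.cache:
            self.nvm[addr] = self.cache.pop(addr)

    def sfence(self):
        pass  # this sequential model already completes flushes in order


def insert_b(pm, crash_at=None):
    """Undo-logged insertion of B; crash just before step number `crash_at`."""
    steps = [
        # Log write: old values of the two locations that get overwritten.
        lambda: pm.store("log[0]", ("A.next", pm.load("A.next"))),
        lambda: pm.store("log[1]", ("C.prev", pm.load("C.prev"))),
        lambda: pm.cache_flush("log[0]"),
        lambda: pm.cache_flush("log[1]"),
        lambda: pm.sfence(),
        lambda: pm.store("log.valid", True),  # assumed flag, not on the slide
        lambda: pm.cache_flush("log.valid"),
        lambda: pm.sfence(),
        # Data update.
        lambda: pm.store("B.next", "C"),
        lambda: pm.store("B.prev", "A"),
        lambda: pm.store("A.next", "B"),
        lambda: pm.store("C.prev", "B"),
        lambda: pm.cache_flush("B.next"),
        lambda: pm.cache_flush("B.prev"),
        lambda: pm.cache_flush("A.next"),
        lambda: pm.cache_flush("C.prev"),
        lambda: pm.sfence(),
        # Commit: retire the log (assumed).
        lambda: pm.store("log.valid", False),
        lambda: pm.cache_flush("log.valid"),
    ]
    for i, step in enumerate(steps):
        if i == crash_at:
            pm.cache.clear()
            raise Crash(i)
        step()


def recover(pm):
    """Roll back to the logged old values if the log is still valid."""
    if pm.nvm.get("log.valid"):
        for slot in ("log[0]", "log[1]"):
            addr, old_value = pm.nvm[slot]
            pm.nvm[addr] = old_value
        pm.nvm["log.valid"] = False


def run(crash_at):
    pm = PersistentMemory({"A.next": "C", "C.prev": "A"})
    try:
        insert_b(pm, crash_at)
    except Crash:
        recover(pm)
    return pm.nvm["A.next"], pm.nvm["C.prev"]


outcomes = {run(k) for k in range(20)}  # k = 19 means no crash
print(outcomes)
```

Whatever the crash point, the list in NVM ends up either untouched or fully updated. After a rollback B's own fields may still hold new values, which is harmless because B is unreachable from A and C.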
HW-assisted Logging
- Simple programming model
  - HW is responsible for 1) log-write and 2) data update
- Advantages over software-logging
  - Fine-grained ordering & fewer CPU cycles
Transaction_begin()
    store B->next = C
    …
    store C->prev = B
Transaction_end()
[Figure: a processor with caches, a log controller (Log Ctrl.), and NVM holding the NVM log and NVM data; 1) Log-Write goes to the NVM log, 2) Data-Update goes to the NVM data.]
[Figure: (a) Ordering with SW: Log A, Log B, Fence, Store A, Store B. (b) Ordering with HW: Log A, Log B, Store A, Store B.]
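One way to read "fine-grained ordering" is that a software fence forces every log write to persist before any data store, while hardware only has to order each store behind the log entry for its own address. The sketch below enumerates the persist orders each rule permits. This reading of the ordering figure is an assumption, and `ok_with_fence` and `ok_fine_grained` are hypothetical names.

```python
from itertools import permutations

EVENTS = ("Log A", "Log B", "Store A", "Store B")


def ok_with_fence(order):
    """SW ordering: a fence separates all log writes from all data stores."""
    last_log = max(order.index("Log A"), order.index("Log B"))
    first_store = min(order.index("Store A"), order.index("Store B"))
    return last_log < first_store


def ok_fine_grained(order):
    """HW ordering: each store waits only for the log entry of its own address."""
    return (order.index("Log A") < order.index("Store A")
            and order.index("Log B") < order.index("Store B"))


sw_orders = [o for o in permutations(EVENTS) if ok_with_fence(o)]
hw_orders = [o for o in permutations(EVENTS) if ok_fine_grained(o)]
print(len(sw_orders), "persist orders allowed with a fence")
print(len(hw_orders), "persist orders allowed with per-address ordering")
```

Every fence-compatible order is also valid under per-address ordering, but the reverse does not hold: `Log A, Store A, Log B, Store B` is only allowed in the fine-grained case.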
Past Proposal: Undo-based HW-Logging
- A. Joshi et al. HPCA 2017.
- S. Shin et al. ISCA 2017.
- Store old value in logs
- Update data in NVM before commit (synchronous data-update)
- Log entry: Addr, Old Value
[Figure: a processor with caches, a log controller (Log Ctrl.), and NVM holding the NVM log and NVM data; 1) store old value in NVM logs, 2) update data in NVM.]
Long critical path due to synchronous data-update
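The cost of the synchronous update shows up when counting the NVM writes that have to complete before a transaction may commit. The following is a toy model, not the ATOM or Proteus design: `NVM`, `undo_transaction`, and `roll_back` are hypothetical names, and each log or data write is counted as one unit regardless of size.

```python
class NVM:
    """Toy NVM that counts the writes it absorbs."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = []      # undo log entries: (addr, old_value)
        self.writes = 0

    def write_log(self, addr, old_value):
        self.log.append((addr, old_value))
        self.writes += 1

    def write_data(self, addr, value):
        self.data[addr] = value
        self.writes += 1


def undo_transaction(nvm, updates):
    """Return how many NVM writes must finish before the commit."""
    start = nvm.writes
    for addr, new_value in updates.items():
        nvm.write_log(addr, nvm.data[addr])  # 1) store old value in NVM logs
        nvm.write_data(addr, new_value)      # 2) update data in NVM
    nvm.log.clear()                          # commit: old values no longer needed
    return nvm.writes - start


def roll_back(nvm):
    """Recovery for an uncommitted transaction: restore the logged old values."""
    for addr, old_value in reversed(nvm.log):
        nvm.data[addr] = old_value
    nvm.log.clear()


nvm = NVM({"A.next": "C", "C.prev": "A"})
critical_path_writes = undo_transaction(nvm, {"A.next": "B", "C.prev": "B"})
print(critical_path_writes)  # log writes plus data writes

# A crash in the middle of a transaction: one location logged and updated.
crashed = NVM({"A.next": "C", "C.prev": "A"})
crashed.write_log("A.next", crashed.data["A.next"])
crashed.write_data("A.next", "B")
roll_back(crashed)
print(crashed.data)
```

Both the log writes and the data writes sit in front of the commit, which is the long critical path the slide points at.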
Past Proposal: Redo-based HW-Logging
- K. Doshi et al. HPCA 2016.
- Store new value in logs
- Update data in NVM after commit (asynchronous data-update)
- However, update by reading log entries from NVM (indirect data-update)
- Log entry: Addr, New Value
[Figure: a processor with caches, a log controller (Log Ctrl.), and NVM holding the NVM log and NVM data; 1) store new value in NVM logs, 2) update data in NVM.]
Wastes extra NVM bandwidth for reading logs from NVM
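The same counting exercise for redo logging shows where the extra bandwidth goes. Again this is a toy model and not the Wrap design: `NVM`, `redo_commit`, and `apply_log_indirect` are hypothetical names, and every access counts as one unit.

```python
class NVM:
    """Toy NVM that counts reads and writes."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = []      # redo log entries: (addr, new_value)
        self.reads = 0
        self.writes = 0

    def append_log(self, addr, new_value):
        self.log.append((addr, new_value))
        self.writes += 1

    def read_log(self, index):
        self.reads += 1
        return self.log[index]

    def write_data(self, addr, value):
        self.data[addr] = value
        self.writes += 1


def redo_commit(nvm, updates):
    """1) Store new values in the NVM log; the transaction commits right after."""
    for addr, new_value in updates.items():
        nvm.append_log(addr, new_value)


def apply_log_indirect(nvm):
    """2) Asynchronous data update that reads each log entry back from NVM."""
    for index in range(len(nvm.log)):
        addr, new_value = nvm.read_log(index)
        nvm.write_data(addr, new_value)
    nvm.log.clear()


nvm = NVM({"A.next": "C", "C.prev": "A"})
redo_commit(nvm, {"A.next": "B", "C.prev": "B"})
writes_before_commit = nvm.writes
data_at_commit = dict(nvm.data)
apply_log_indirect(nvm)
print(writes_before_commit, nvm.reads, nvm.data)
```

Only the log writes precede the commit, so the critical path is shorter than with undo logging, but every data update later costs one additional NVM read.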
Past Proposal: Undo-Redo HW-Logging
- M. Ogleari et al. HPCA 2018.
- Store both old and new value in logs (larger log sizes)
- Update data in NVM after commit
- Log entry: Addr, Old Value, New Value
[Figure: a processor with caches and a log buffer, and NVM holding the NVM log and NVM data; 1) store both old and new value in NVM logs, 2) update data in NVM.]
Requires more NVM writes for storing logs in NVM
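A quick back-of-the-envelope calculation shows how the log entry layouts compare. The field widths are assumptions chosen for illustration (an 8-byte address and 8-byte data words); real designs log at other granularities and carry extra metadata, so only the ratio is meaningful here.

```python
import struct

ADDR = "Q"   # assumption: 8-byte address
VALUE = "Q"  # assumption: 8-byte data word

entry_bytes = {
    "undo": struct.calcsize("<" + ADDR + VALUE),               # Addr, Old Value
    "redo": struct.calcsize("<" + ADDR + VALUE),               # Addr, New Value
    "undo-redo": struct.calcsize("<" + ADDR + VALUE + VALUE),  # Addr, Old, New
}

stores_per_tx = 4  # hypothetical transaction with four logged stores
for scheme, size in entry_bytes.items():
    print(scheme, size * stores_per_tx, "log bytes per transaction")
```

Under these assumptions an undo-redo entry is 1.5 times the size of an undo-only or redo-only entry, which is where the additional NVM log writes come from.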
Past Proposals: Summary
Proposal              Log-Write   Data-Update              Drawback
ATOM [HPCA 2017]      Undo        Direct, Synchronous      Long Critical Path
Proteus [ISCA 2017]   Undo        Direct, Synchronous      Long Critical Path
Wrap [HPCA 2016]      Redo        Indirect, Asynchronous   Waste NVM Bandwidth
FWB [HPCA 2018]       UndoRedo    Direct, Asynchronous     More Log Write
[Figure: cycles per transaction (CPT, lower is better) for large & sequential workloads (undo-friendly) and small & random workloads (redo-friendly).]
Trade-offs exist!
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
114
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3
Processor
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
115
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1
Processor
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
116
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
117
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
118
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
119
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
120
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed)
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
121
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed)
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
122
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed)
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
123
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed) Transaction # 2
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
124
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed) Transaction # 2 (committed)
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
125
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed) Transaction # 2 (committed) Transaction # 3
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
126
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed) Transaction # 2 (committed) Transaction # 3 (not committed)
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
127
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed) Transaction # 2 (committed) Transaction # 3 (not committed) (deleted)
Logs removed since data-update completed
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
128
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed) Transaction # 2 (committed) Transaction # 3 (not committed) (deleted)
Logs removed since data-update completed
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
129
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed) Transaction # 2 (committed) Transaction # 3 (not committed) (deleted)
Logs removed since data-update completed
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
130
Caches
NVM Log NVM Data NVM
Tx1 Tx2 Tx3 Tx1 Tx2
Processor Transaction # 1 (committed) Transaction # 2 (committed) Transaction # 3 (not committed) (deleted)
Logs removed since data-update completed
Design Goal & Challenges
- Redo log with asynchronous & direct update to NVM
- Challenge # 1: tracking write-sets of previous transactions
- Without data update, logs keep growing
- Challenge # 2: handling an early-eviction
- Eviction of uncommitted changes from volatile CPU caches
2018-10-23
131
[Diagram: a processor and its caches sit above NVM, which holds an NVM log (Tx1, Tx2, Tx3) and NVM data (Tx1, Tx2). Transactions #1 and #2 are committed; transaction #3 is not committed. The logs of committed transactions are deleted once their data update completes. Uncommitted data must not be written to NVM.]
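The two challenges can be illustrated with a small software model. This is only a sketch under stated assumptions: `RedoLogModel`, `nvm_log`, `nvm_data`, and `commit_order` are hypothetical names for a toy in-memory model, not the hardware structures of the talk. It shows that log entries can be deleted only after the committed write-set has reached the data region, and that uncommitted writes are never applied.

```python
class RedoLogModel:
    """Toy model of a redo log with asynchronous data update (illustrative only)."""

    def __init__(self):
        self.nvm_data = {}       # persistent data region: address -> value
        self.nvm_log = {}        # tx_id -> list of (address, value) redo entries
        self.commit_order = []   # committed tx_ids whose data update is pending

    def write(self, tx_id, addr, value):
        # A transactional store only appends a redo entry; data is untouched.
        self.nvm_log.setdefault(tx_id, []).append((addr, value))

    def commit(self, tx_id):
        # Commit returns without waiting for the data update (asynchronous).
        self.commit_order.append(tx_id)

    def background_update(self):
        # Apply committed write-sets in commit order, then delete their logs.
        # Transactions that never committed are left in the log and never applied.
        while self.commit_order:
            tx_id = self.commit_order.pop(0)
            for addr, value in self.nvm_log.pop(tx_id, []):
                self.nvm_data[addr] = value

    def log_entries(self):
        # Number of redo entries still occupying the log area.
        return sum(len(entries) for entries in self.nvm_log.values())


model = RedoLogModel()
model.write(1, "A", 10)
model.commit(1)
model.write(2, "B", 20)
model.commit(2)
model.write(3, "C", 30)      # Tx3 is not committed
model.background_update()    # only Tx1 and Tx2 reach the data region
```

Until `background_update` runs, the log keeps growing (challenge #1); the entry for Tx3 stays in the log because its data must not reach NVM before commit (challenge #2).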
Naïve Solution: On-chip Cache Extension
- Additional storage to hold multiple write-sets
- E.g., to store all physical addresses, scan the entire cache hierarchy
- Cache replacement policy must be aware of transactions
- E.g., evict non-transactional cache blocks first
- Has to discard the cache block on overflow
- Need to search the log area for read accesses
- Need indirect data update
[Diagram: the processor and caches write redo logs through a log buffer into the NVM log; the NVM data is updated asynchronously.]
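The transaction-aware replacement policy named in the bullets can be sketched as a victim-selection function. This is an illustrative model only: `CacheLine`, `choose_victim`, and the `tx_id` field are hypothetical names, and the LRU tie-break is an assumption, not something stated on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheLine:
    addr: int
    last_used: int                # smaller value = less recently used
    tx_id: Optional[int] = None   # None marks a non-transactional block


def choose_victim(cache_set: List[CacheLine]) -> CacheLine:
    """Pick a victim from one cache set, preferring non-transactional blocks."""
    non_tx = [line for line in cache_set if line.tx_id is None]
    # Overflow case: if every block is transactional, one of them must go.
    candidates = non_tx if non_tx else cache_set
    # Assumed tie-break: least recently used among the candidates.
    return min(candidates, key=lambda line: line.last_used)
```

In the overflow case the victim is a transactional block that has to be discarded, which is why the naive design then needs log searches on reads and an indirect data update.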
Redo log with Direct Update (ReDU)
- Our approach: use DRAM for handling direct-update
- Synchronous update to the FAST DRAM
- Asynchronous update to the SLOW NVM
[Diagram: the processor and caches send redo logs through a log buffer to the NVM log (asynchronous update to NVM). Data is updated synchronously in fast DRAM and asynchronously in slow NVM.]
ReDU – Direct-Update
- Track the write-set within L1 cache
- No on-chip cache modifications except L1
- DRAM cache stores:
- “Early-evicted”: modified cachelines evicted from L1 before commit
- “On-commit-flushed”: modified cachelines in L1 on commit
- For both events, explicitly flush through the DRAM cache
[Diagram: processor with two L1 caches and a last-level cache, backed by DRAM (holding the DRAM cache) and NVM. Data reaches the DRAM cache through early eviction and on-commit flush.]
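The two events that send data to the DRAM cache can be modeled in a few lines. This is a rough software sketch, not the hardware design: `L1Cache`, `DramCache`, `fill`, the two-line capacity, and the first-written-first-evicted order are all assumptions made for illustration.

```python
class DramCache:
    """Toy DRAM cache: holds modified lines tagged with their transaction ID."""

    def __init__(self):
        self.lines = {}         # addr -> (value, tx_id)
        self.committed = set()  # committed transaction IDs

    def fill(self, addr, value, tx_id):
        self.lines[addr] = (value, tx_id)


class L1Cache:
    """Toy L1 that tracks the write-set and flushes it to the DRAM cache."""

    def __init__(self, dram_cache, capacity=2):
        self.dram = dram_cache
        self.capacity = capacity
        self.write_set = {}     # addr -> (value, tx_id), ordered by first write

    def store(self, tx_id, addr, value):
        if addr not in self.write_set and len(self.write_set) >= self.capacity:
            # Early eviction: a modified line leaves L1 before its commit.
            old_addr = next(iter(self.write_set))
            old_value, old_tx = self.write_set.pop(old_addr)
            self.dram.fill(old_addr, old_value, old_tx)
        self.write_set[addr] = (value, tx_id)

    def commit(self, tx_id):
        # On-commit flush: push the transaction's remaining L1 lines to DRAM.
        owned = [a for a, (_, t) in self.write_set.items() if t == tx_id]
        for addr in owned:
            value, _ = self.write_set.pop(addr)
            self.dram.fill(addr, value, tx_id)
        self.dram.committed.add(tx_id)


dram = DramCache()
l1 = L1Cache(dram, capacity=2)
l1.store(1, 0x10, "a")
l1.store(1, 0x20, "b")
l1.store(1, 0x30, "c")   # L1 is full, so 0x10 is early-evicted to the DRAM cache
l1.commit(1)             # 0x20 and 0x30 are flushed on commit
```

Both paths end in the DRAM cache, so uncommitted data never reaches NVM directly and the rest of the on-chip hierarchy needs no knowledge of transactions.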
ReDU – Direct-Update
- Update to NVM is done asynchronously
- Only flush cachelines that belong to committed transactions
- DRAM cache maintains the committed transaction IDs
- Various write-back policies are possible
- E.g., eager or LRU
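An eager write-back pass over the DRAM cache might look like the sketch below. The function name `write_back_committed` and the dictionary layout are hypothetical; dropping a line after write-back is also an assumption (a real design could keep a clean copy instead).

```python
def write_back_committed(dram_lines, committed_tx_ids, nvm_data):
    """Eagerly write back DRAM-cache lines owned by committed transactions.

    dram_lines: dict mapping addr -> (value, tx_id)
    committed_tx_ids: set of committed transaction IDs kept by the DRAM cache
    nvm_data: dict mapping addr -> value (the persistent data region)
    Returns the number of lines written to NVM.
    """
    written = 0
    for addr in list(dram_lines):
        value, tx_id = dram_lines[addr]
        if tx_id in committed_tx_ids:
            nvm_data[addr] = value   # safe: the owning transaction committed
            del dram_lines[addr]     # assumption: drop the line after write-back
            written += 1
        # Lines of uncommitted transactions stay in DRAM and never reach NVM here.
    return written


dram_lines = {0x10: ("a", 1), 0x20: ("b", 2)}
nvm_data = {}
count = write_back_committed(dram_lines, {1}, nvm_data)
```

An LRU policy would differ only in when and in which order committed lines are selected; the committed-ID check stays the same.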
More in the paper…
- Full design space exploration of HW logging
- Log optimization #1: coalescing
- Log optimization #2: packing
- Details of DRAM cache organization
- Transaction Table and Offset Table
- Bloom filter-based HW-filter to reduce DRAM accesses
- Evaluation of LRU write-back policy of the DRAM cache
- Log management
- …
Methodology
- Gem5 simulator
- Comparing schemes
- All equally include log optimizations (e.g., coalescing and packing)
- UndoSync: undo log with synchronous commit
- RedoIndirect: redo log with asynchronous but indirect update
- Undo+Redo: undo+redo log with asynchronous & direct update
- ReDU: our approach
Processor: OoO, 2 GHz, x86
L1 I/D cache: private, 32 KB, 8-way
L2 cache: private, 256 KB, 8-way
L3 cache: shared, 8 MB, 16-way
DRAM cache: 40 MB (8 MB meta + 32 MB data)
NVM: read 50 ns, write 150 ns
- Benchmarks
Micro-bench: Vector, Swap
NVML: HashMap, B-Tree, RB-Tree
Macro-bench: YCSB, TPCC, ECHO
Evaluation – Transaction Throughput
- Large & sequential workloads
- Undo and ReDU perform similarly (same data path and NVM bandwidth saturated)
- Redo suffers from indirect update
- UndoRedo requires double NVM writes for logs
- Small & random workloads
- Undo waits for synchronous commit
- Redo suffers from indirect update
- UndoRedo requires double NVM writes for logs
- On average
- Asynchronous update: 9%
- Direct update: 16%
- Small log size: 30%
[Figure: cycles per transaction normalized to undo (lower is better), four series]
Summary
- Problem: crash-consistency in storage-class memory
- Atomicity and durability support for NVM writes
- Existing hardware solutions exhibit trade-offs
- Solution: Redo log with Direct Updates (ReDU)
- Redo-based log with optimizations
- Synchronous update to the fast DRAM
- Asynchronous update to the slow NVM
- Results: ReDU outperforms existing solutions in various workloads
- Bringing DRAM into atomicity and durability support