

Efficient Hardware-assisted Logging with Asynchronous and Direct Update for Persistent Memory

Jungi Jeong, Chang Hyun Park, Jaehyuk Huh, and Seungryoul Maeng

International Symposium on Microarchitecture (MICRO) 2018

Storage-Class Memory

  • Directly attached to the app's virtual address space
  • Accessible through load/store instructions
  • In-memory data persistency
  • Ex) Doubly linked-list insertion

[Figure: the application memory-maps NVDIMMs through an NVM-aware file system and accesses them from user space with load/store instructions.]

Inserting node B between A and C:

    store B->next = C
    store B->prev = A
    store A->next = B
    store C->prev = B

New requirement: supporting crash-consistency of NVM stores
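
The store sequence above maps directly onto ordinary C pointer updates. Below is a minimal sketch of the slide's example, assuming a hypothetical node struct kept in memory-mapped persistent memory; it is only meant to show why a crash in the middle of the four stores leaves the list inconsistent.

    /* Minimal sketch of the slide's example: insert node B between A and C in a
     * doubly linked list living in memory-mapped persistent memory.
     * The struct layout and function name are illustrative, not from the paper. */
    struct node {
        struct node *prev;
        struct node *next;
        /* ... payload ... */
    };

    void insert_between(struct node *A, struct node *C, struct node *B)
    {
        B->next = C;   /* store B->next = C */
        B->prev = A;   /* store B->prev = A */
        A->next = B;   /* store A->next = B */
        C->prev = B;   /* store C->prev = B */
        /* A crash part-way through these stores leaves the list inconsistent
         * in NVM, which is exactly the crash-consistency problem above. */
    }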

Atomic Durability through Logging

  • Transaction: all stores in a transaction become durable all together or not at all
  • Ex) Atomic durability in software
  • Durability with cache-flush
  • Atomicity and ordering with write-ahead logging
  • Persist-ordering with store-fence

Durability alone (cache-flush only):

    store B->next = C
    store B->prev = A
    store A->next = B
    store C->prev = B
    cache-flush

Write-ahead logging for atomicity and ordering:

    Log Write:
        store log[0] = A->next
        store log[1] = C->prev
        cache-flush
        sfence

    Data Update:
        store B->next = C
        store B->prev = A
        store A->next = B
        store C->prev = B
        cache-flush
        sfence
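
In x86 terms, the sequence above can be written with cache-line flush and store-fence intrinsics. The sketch below follows the slide's Log Write / Data Update structure; the log layout and the choice of _mm_clflush (rather than clflushopt/clwb) are assumptions made for illustration.

    #include <emmintrin.h>             /* _mm_clflush, _mm_sfence */

    /* Illustrative node and log layout, not taken from the paper. */
    struct node { struct node *prev, *next; };
    static struct node *log_area[2];   /* assumed persistent log slots */

    static inline void persist(void *p) { _mm_clflush(p); }

    void insert_with_sw_logging(struct node *A, struct node *B, struct node *C)
    {
        /* 1) Log Write: save the values this transaction will overwrite. */
        log_area[0] = A->next;
        log_area[1] = C->prev;
        persist(&log_area[0]);
        persist(&log_area[1]);
        _mm_sfence();                  /* logs must be durable before the data update */

        /* 2) Data Update: perform the stores, then persist them. */
        B->next = C;  B->prev = A;
        A->next = B;  C->prev = B;
        persist(B);
        persist(&A->next);
        persist(&C->prev);
        _mm_sfence();                  /* transaction is now durable */
    }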

HW-assisted Logging

  • Simple programming model
  • HW is responsible for 1) the log-write and 2) the data update
  • Advantages over software logging: fine-grained ordering and fewer CPU cycles

    Transaction_begin()
    store B->next = C
    ...
    store C->prev = B
    Transaction_end()

[Figure: stores inside the transaction are forwarded to a log controller, which performs 1) the Log-Write to the NVM log and 2) the Data-Update to the NVM data area. (a) With software ordering, a fence separates Log A/Log B from Store A/Store B; (b) with hardware ordering, each Log/Store pair is ordered individually.]
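
For contrast with the software sequence earlier, the hardware-assisted model reduces to marking transaction boundaries and issuing plain stores. The wrapper names below (tx_begin/tx_end) are hypothetical stand-ins for whatever instructions or registers the hardware exposes.

    /* Hypothetical transaction-boundary wrappers; the real mechanism
     * (new instructions, MMIO, etc.) is hardware-specific. */
    void tx_begin(void);
    void tx_end(void);

    struct node { struct node *prev, *next; };   /* as in the earlier sketch */

    void insert_with_hw_logging(struct node *A, struct node *B, struct node *C)
    {
        tx_begin();                 /* hardware starts logging the following stores */
        B->next = C;  B->prev = A;  /* ordinary stores: no explicit flushes or fences */
        A->next = B;  C->prev = B;
        tx_end();                   /* hardware makes the transaction atomically durable */
    }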

Past Proposal: Undo-based HW-Logging

  • A. Joshi et al. HPCA 2017; S. Shin et al. ISCA 2017
  • Store the old value in the logs (Addr, Old Value)
  • Update data in NVM before commit → synchronous data-update
  • Drawback: long critical path due to the synchronous data-update

[Figure: the log controller 1) stores the old value in the NVM log, then 2) updates the data in NVM before the transaction can commit.]

Past Proposal: Redo-based HW-Logging

  • K. Doshi et al. HPCA 2016
  • Store the new value in the logs (Addr, New Value)
  • Update data in NVM after commit → asynchronous data-update
  • However, the update is performed by reading log entries back from NVM → indirect data-update
  • Drawback: wastes extra NVM bandwidth for reading logs from NVM

[Figure: the log controller 1) stores the new value in the NVM log; after commit, it 2) reads the log back to update the data in NVM.]

Past Proposal: Undo-Redo HW-Logging

  • M. Ogleari et al. HPCA 2018
  • Store both the old and the new value in the logs (Addr, Old Value, New Value) → larger log size
  • Update data in NVM after commit
  • Drawback: requires more NVM writes for storing logs in NVM

[Figure: an on-chip log buffer 1) stores both the old and the new value in the NVM log, then 2) updates the data in NVM after commit.]

Past Proposals: Summary

    Scheme                Log-Write    Data-Update               Drawback
    ATOM [HPCA 2017]      Undo         Direct, Synchronous       Long critical path
    Proteus [ISCA 2017]   Undo         Direct, Synchronous       Long critical path
    Wrap [HPCA 2016]      Redo         Indirect, Asynchronous    Wastes NVM bandwidth
    FWB [HPCA 2018]       Undo+Redo    Direct, Asynchronous      More log writes

[Charts: cycles per transaction (CPT, lower is better) for large & sequential workloads, which are undo-friendly, and for small & random workloads, which are redo-friendly.]

Trade-offs exist!
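
To make the Log-Write column concrete, here is a rough sketch of what a single log entry might hold under each scheme; the field names and 64-byte granularity are assumptions for illustration, not the papers' exact formats.

    #include <stdint.h>

    /* Illustrative log-entry layouts for the schemes in the table above. */
    struct undo_entry     { uint64_t addr; uint8_t old_val[64]; };               /* ATOM, Proteus */
    struct redo_entry     { uint64_t addr; uint8_t new_val[64]; };               /* Wrap */
    struct undoredo_entry { uint64_t addr; uint8_t old_val[64], new_val[64]; };  /* FWB: roughly 2x the log data */

    /* Recovery direction differs: undo logs roll back uncommitted transactions by
     * restoring old values; redo logs roll forward committed ones by replaying new values. */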

Design Goal & Challenges

  • Goal: a redo log with asynchronous & direct update to NVM
  • Challenge #1: tracking the write-sets of previous transactions
  • Without data update, logs keep growing
  • Challenge #2: handling an early eviction
  • Eviction of uncommitted changes from the volatile CPU caches

[Figure: transactions 1 and 2 have committed and their data updates have completed, so their logs can be removed; transaction 3 has not committed, and its uncommitted data must not be written to NVM.]

Naïve Solution: On-chip Cache Extension

  • Additional storage to hold multiple write-sets
  • E.g., to record all physical addresses, scan the entire cache hierarchy
  • The cache replacement policy has to be aware of transactions
  • E.g., evict non-transactional cache blocks first
  • Cache blocks have to be discarded on overflow
  → Need to search the log area for read accesses → need indirect data update

[Figure: redo logs are buffered in an on-chip log buffer and written to the NVM log; data updates to NVM happen asynchronously.]

Redo log with Direct Update (ReDU)

  • Our approach: use DRAM for handling the direct update
  • Synchronous update to the fast DRAM
  • Asynchronous update to the slow NVM

[Figure: redo logs go through an on-chip log buffer to the NVM log; data is updated synchronously in a fast DRAM in front of the NVM data area, and written back to the slow NVM asynchronously.]

ReDU – Direct-Update

  • Track the write-set within the L1 cache
  • No on-chip cache modifications except L1
  • DRAM cache stores:
  • "Early-evicted": modified cachelines evicted from L1 before commit
  • "On-commit-flushed": modified cachelines in L1 on commit
  • For both events, the cachelines are explicitly flushed through the DRAM cache

[Figure: per-core L1 caches and a shared last-level cache; a DRAM cache sits in front of NVM and receives the early-evicted and on-commit-flushed data.]
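
A conceptual, software-level sketch of the two events described above, in the same spirit as the earlier sketches; every structure and function name here is hypothetical and only meant to illustrate when lines reach the DRAM cache.

    #include <stdint.h>
    #include <stdbool.h>

    #define L1_LINES 512
    struct line { uint64_t addr; uint8_t data[64]; int tx_id; bool dirty; };
    static struct line l1[L1_LINES];

    /* Assumed hooks into a DRAM-cache model. */
    static void dram_cache_insert(uint64_t addr, const uint8_t *d, int tx_id)
    { (void)addr; (void)d; (void)tx_id; /* model omitted */ }
    static void dram_cache_mark_committed(int tx_id) { (void)tx_id; }

    /* Event 1: a dirty transactional line leaves L1 before its transaction commits. */
    void on_early_eviction(struct line *l)
    {
        if (l->dirty)
            dram_cache_insert(l->addr, l->data, l->tx_id);       /* "early-evicted" */
    }

    /* Event 2: commit flushes the transaction's remaining dirty L1 lines. */
    void on_commit(int tx_id)
    {
        for (int i = 0; i < L1_LINES; i++)
            if (l1[i].dirty && l1[i].tx_id == tx_id) {
                dram_cache_insert(l1[i].addr, l1[i].data, tx_id); /* "on-commit-flushed" */
                l1[i].dirty = false;
            }
        dram_cache_mark_committed(tx_id);   /* DRAM cache records the committed TX ID */
    }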

ReDU – Direct-Update

  • The update to NVM is done asynchronously
  • Only cachelines that belong to a committed transaction are flushed
  • The DRAM cache maintains the committed transaction IDs
  • Various write-back policies are possible, e.g., eager or LRU
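
Continuing the hypothetical model above, the asynchronous drain only touches lines whose transaction ID is in the committed set; the difference between policies is when an eligible line is written back. The sketch below shows an eager drain; an LRU-style policy would instead write a committed line back only when it is evicted from the DRAM cache.

    #include <stdint.h>
    #include <stdbool.h>

    #define DC_LINES 4096
    struct dc_line { uint64_t addr; uint8_t data[64]; int tx_id; bool valid; };
    static struct dc_line dcache[DC_LINES];

    /* Assumed hooks: committed-TX-ID table lookup and the NVM write path. */
    static bool tx_is_committed(int tx_id) { (void)tx_id; return true; }
    static void nvm_write(uint64_t addr, const uint8_t *d) { (void)addr; (void)d; }

    /* Eager policy: write back every committed line as soon as it is found. */
    void drain_committed_lines(void)
    {
        for (int i = 0; i < DC_LINES; i++) {
            struct dc_line *l = &dcache[i];
            if (l->valid && tx_is_committed(l->tx_id)) {
                nvm_write(l->addr, l->data);   /* asynchronous, off the critical path */
                l->valid = false;
            }
        }
    }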

More in the paper…

  • Full design-space exploration of HW logging
  • Log optimization #1: coalescing
  • Log optimization #2: packing
  • Details of the DRAM cache organization
  • Transaction Table and Offset Table
  • Bloom filter-based HW filter to reduce DRAM accesses
  • Evaluation of the LRU write-back policy of the DRAM cache
  • Log management

Methodology

  • Gem5 simulator
  • Comparing schemes (all equally include log optimizations, e.g., coalescing and packing)
  • UndoSync: undo log with synchronous commit
  • RedoIndirect: redo log with asynchronous but indirect update
  • Undo+Redo: undo+redo log with asynchronous & direct update
  • ReDU: our approach

    Simulated system:
        Processor      OoO, 2 GHz, x86
        L1 I/D cache   Private, 32 KB, 8-way
        L2 cache       Private, 256 KB, 8-way
        L3 cache       Shared, 8 MB, 16-way
        DRAM cache     40 MB (8 MB meta + 32 MB data)
        NVM            Read: 50 ns, write: 150 ns

    Benchmarks:
        Micro-bench    Vector, Swap
        NVML           HashMap, B-Tree, RB-Tree
        Macro-bench    YCSB, TPCC, ECHO
Evaluation – Transaction Throughput

  • Large & sequential workloads
  • Undo and ReDU perform similarly (same data path, and the NVM bandwidth is saturated)
  • Redo suffers from the indirect update
  • UndoRedo requires double the NVM writes for logs
  • Small & random workloads
  • Undo waits for the synchronous commit
  • Redo suffers from the indirect update
  • UndoRedo requires double the NVM writes for logs
  • On average
  • Asynchronous update → 9%
  • Direct update → 16%
  • Small log size → 30%

[Chart: cycles per transaction normalized to undo (lower is better) for the four schemes across the benchmarks.]

Summary

  • Problem: crash consistency in storage-class memory
  • Atomicity and durability support for NVM writes
  • Existing hardware solutions exhibit trade-offs
  • Solution: Redo log with Direct Updates (ReDU)
  • Redo-based log with optimizations
  • Synchronous update to the fast DRAM
  • Asynchronous update to the slow NVM
  • Results: ReDU outperforms existing solutions across various workloads
  • Bringing DRAM into the atomicity and durability guarantees
