Architectural Support for Atomic Durability in Non-Volatile Memory



slide-1
SLIDE 1

Architectural Support for Atomic Durability in Non-Volatile Memory

Arpit Joshi, Vijay Nagarajan, Stratis Viglas, Marcelo Cintra

NVMW 2018

slide-2
SLIDE 2

Summary

  • Non-Volatile Memory (NVM) on the memory bus enables in-memory persistent data structures
  • Persistent data structures require an atomic durability primitive to ensure crash consistency
  • Logging is a technique to provide atomic durability
  • ATOM: hardware support for atomic durability by way of undo logging

slide-9
SLIDE 9

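Atomic Durability

  • All or nothing persists: think transactions (ACID)

Atomic_Begin
    A = A - 50
    B = B + 50
Atomic_End

Initial State: A = 100, B = 100

Possible final states:
  • A = 50, B = 150 (everything persisted)
  • A = 100, B = 100 (nothing persisted)
  • A = 50, B = 100 (only the update to A persisted; violates all or nothing)
  • A = 100, B = 150 (only the update to B persisted; violates all or nothing)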
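The all-or-nothing rule above can be sketched in a few lines of Python. This is only an illustration: the function name `reachable_states` and the crash model (without atomic durability, any subset of the updates may have reached NVM) are assumptions, not something taken from the talk.

```python
from itertools import combinations


def reachable_states(initial, updates, atomic):
    """Return the set of memory states a crash could leave behind.

    initial: dict of variable -> value before the atomic region
    updates: dict of variable -> value written inside the region
    atomic:  if True, only all-or-nothing outcomes are possible
    """
    names = sorted(updates)
    if atomic:
        subsets = [(), tuple(names)]            # nothing, or everything
    else:
        # Assumed model: cache evictions may persist any subset of updates.
        subsets = [c for r in range(len(names) + 1)
                   for c in combinations(names, r)]
    states = set()
    for subset in subsets:
        state = dict(initial)
        for name in subset:
            state[name] = updates[name]
        states.add(tuple(sorted(state.items())))
    return states


initial = {"A": 100, "B": 100}
updates = {"A": 50, "B": 150}       # A = A - 50, B = B + 50
non_atomic_states = reachable_states(initial, updates, atomic=False)
atomic_states = reachable_states(initial, updates, atomic=True)
```

With atomic durability only the two states that preserve A + B = 200 remain reachable; without it, the two partial states from the slide appear as well.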

slide-11
SLIDE 11

Mechanisms

Shadow Paging
➡ beneficial for coarse grained updates

Write-Ahead-Logging

REDO
➡ reads redirection
➡ victim cache

UNDO
➡ fine grained log->data ordering

slide-15
SLIDE 15

Undo Logging

  • 1. Compute: Compute the new value (V = A - 50)
  • 2. Log: Write old value of data to log space in persistent memory (Log [A , 100])
  • 3. Modify: Modify data in-place (A = V)

NVM Data: A = 100, then 50 after step 3
NVM Log: [A , 100]

Log writes reach NVM before data writes. (Log -> Data ordering)

slide-16
SLIDE 16

Undo Logging

Logging is essentially a data movement task.
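The three undo-logging steps can be sketched as a toy Python model. The class `UndoLoggedNVM` and its method names are hypothetical, and the model simply assumes that every write lands in NVM in program order; it is meant to show why the old value must be logged before the in-place update, not how ATOM implements it.

```python
class UndoLoggedNVM:
    """Toy model of persistent memory with a data region and an undo log."""

    def __init__(self, data):
        self.data = dict(data)   # persistent data region
        self.log = []            # persistent undo log: (address, old value)

    def store(self, addr, new_value):
        # Step 2 (Log): persist the old value before touching the data.
        self.log.append((addr, self.data[addr]))
        # Step 3 (Modify): update the data in place.
        self.data[addr] = new_value

    def commit(self):
        # All updates are durable, so the undo log can be discarded.
        self.log.clear()

    def recover(self):
        # After a crash, roll back in reverse order using the logged values.
        for addr, old_value in reversed(self.log):
            self.data[addr] = old_value
        self.log.clear()


nvm = UndoLoggedNVM({"A": 100, "B": 100})
v = nvm.data["A"] - 50     # Step 1 (Compute): V = A - 50
nvm.store("A", v)          # Log [A, 100], then A = 50
# Simulated crash here: B was never updated and commit() never ran.
nvm.recover()
```

Because the log entry [A, 100] was written before A changed, recovery can restore the initial state. If the data write had reached NVM first, a crash in between would leave no way to undo it.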

slide-19
SLIDE 19

System Architecture

Disk Based Persistence: Core, Cache, DRAM, Secondary Storage
    Movement of data to persistent storage: Software Controlled

NVM Based Persistence: Core, Cache, NVM, Secondary Storage
    Movement of data to persistent memory: Hardware Controlled

slide-22
SLIDE 22

Logging with Disk

Volatile Phase: Compute -> Log -> Modify
Persistence Phase: Flush Log -> Flush Data

Clear separation of volatile and persistence phases.

slide-24
SLIDE 24

Logging with NVM

Compute -> Log -> Flush Log -> Modify -> Flush Data

Volatile and persistence phases overlap.

slide-26
SLIDE 26

ATOM

Compute -> Log -> Flush Log -> Modify -> Flush Data

In Hardware: Log, Flush Log

Goal: Move logging out of the critical path.

slide-27
SLIDE 27

Programming Model

Software Logging:

    while ( ! Done ) {
        Write Undo Log
        Flush Log
        Modify Data
    }
    Flush Data

ATOM:

    ATOMIC_BEGIN
    while ( ! Done ) {
        Modify Data
    }
    Flush Data
    ATOMIC_END
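To make the difference between the two programming models concrete, the sketch below records which operations the programmer has to issue in each case. Everything here is hypothetical scaffolding: the `Memory` class, the `atomic` context manager standing in for ATOMIC_BEGIN / ATOMIC_END, and the idea of emulating the hardware logger inside `modify` are assumptions made for illustration only.

```python
from contextlib import contextmanager


class Memory:
    """Hypothetical memory model that records what software had to issue."""

    def __init__(self, data):
        self.data = dict(data)
        self.undo_log = []        # (address, old value) entries
        self.software_ops = []    # operations written explicitly by software
        self.in_atomic = False

    def write_undo_log(self, addr):
        self.software_ops.append("Write Undo Log")
        self.undo_log.append((addr, self.data[addr]))

    def flush_log(self):
        self.software_ops.append("Flush Log")

    def modify(self, addr, value):
        if self.in_atomic:
            # Stand-in for the hardware: log the old value transparently.
            self.undo_log.append((addr, self.data[addr]))
        self.software_ops.append("Modify Data")
        self.data[addr] = value

    def flush_data(self):
        self.software_ops.append("Flush Data")


@contextmanager
def atomic(mem):
    mem.in_atomic = True          # ATOMIC_BEGIN
    try:
        yield mem
    finally:
        mem.in_atomic = False     # ATOMIC_END


def software_logging(mem, updates):
    for addr, value in updates:
        mem.write_undo_log(addr)
        mem.flush_log()
        mem.modify(addr, value)
    mem.flush_data()


def atom_style(mem, updates):
    with atomic(mem):
        for addr, value in updates:
            mem.modify(addr, value)
        mem.flush_data()


updates = [("A", 50), ("B", 150)]
sw = Memory({"A": 100, "B": 100})
hw = Memory({"A": 100, "B": 100})
software_logging(sw, updates)
atom_style(hw, updates)
```

Both runs end with the same data and the same undo log entries, but `hw.software_ops` contains only the data modifications and the final flush, which mirrors the shorter ATOM column on the slide.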

slide-31
SLIDE 31

Baseline Hardware Logging

  • Create Undo Log: on a store, write old value to log
  • Flush Undo Log: enforce log -> data ordering

Core: A = 50, L(A) = 100
Cache: A = 100, then 50 once Log Done
NVM Data: A = 100
NVM Log: [A, 100]

slide-34
SLIDE 34

ATOM Design Philosophy

Where is log -> data ordering enforced?

Core (Store Queue, Store Buffer) -> Cache -> Memory Controller -> NVM

Baseline Design: ordering enforced at the core (Store Queue)
ATOM Design: ordering enforced at the Memory Controller

slide-44
SLIDE 44

Baseline Implementation

Timeline lanes: SQ, Cache, Mem Ctrl, Memory
Messages (in build order): ST(A), L(A), L(A), WRITE L(A), L(A), L(A), ST(A)
Interval marked: Store Completion Time

Log persist operation in the critical path of retiring stores.

slide-54
SLIDE 54

ATOM Posted Log

Timeline lanes: SQ, Cache, Mem Ctrl, Memory
Messages (in build order): ST(A), L(A), L(A), WRITE L(A), L(A), L(A), ST(A)
Interval marked: Store Completion Time

Remove log persist operations from the critical path by enforcing ordering at the memory controller.
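A rough way to see why posting the log helps is to model a memory controller that acknowledges a log entry as soon as it arrives and then holds back any data write for that address until the log entry has reached NVM. The sketch below is an assumed model, not the ATOM hardware: the class `MemoryController`, its methods, and the latency numbers in `store_completion_time` are all hypothetical.

```python
from collections import deque


class MemoryController:
    """Toy controller that enforces log -> data ordering (assumed model)."""

    def __init__(self):
        self.pending_logs = set()     # addresses whose log entry is not yet in NVM
        self.blocked_data = deque()   # data writes waiting for their log entry
        self.nvm_writes = []          # order in which writes reach NVM

    def post_log(self, addr):
        # Posted write: acknowledge on receipt, so the store can complete
        # without waiting for the NVM write itself.
        self.pending_logs.add(addr)
        return "ack"

    def write_data(self, addr):
        if addr in self.pending_logs:
            self.blocked_data.append(addr)    # hold until the log persists
        else:
            self.nvm_writes.append(("data", addr))

    def persist_log(self, addr):
        # The log write reaches NVM; release data writes held for it.
        self.nvm_writes.append(("log", addr))
        self.pending_logs.discard(addr)
        still_blocked = deque()
        for blocked in self.blocked_data:
            if blocked == addr:
                self.nvm_writes.append(("data", blocked))
            else:
                still_blocked.append(blocked)
        self.blocked_data = still_blocked


def store_completion_time(posted, hop=1, nvm_write=10):
    """Store latency in hypothetical time units (the values are made up)."""
    to_controller = 2 * hop     # SQ -> Cache -> Mem Ctrl
    back_to_core = 2 * hop      # Mem Ctrl -> Cache -> SQ
    if posted:
        return to_controller + back_to_core
    return to_controller + nvm_write + back_to_core


mc = MemoryController()
ack = mc.post_log("A")      # store to A can complete once this returns
mc.write_data("A")          # an early eviction of A arrives at the controller
mc.persist_log("A")         # log reaches NVM, the held data write follows
```

Even though the data write arrived before the log was persisted, `mc.nvm_writes` ends up with the log entry first, so the store no longer has to wait for the NVM write to keep the ordering.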

slide-66
SLIDE 66

ATOM Store Miss

Timeline lanes: SQ, Cache, Mem Ctrl, Memory
Messages (in build order): RD(A), RD(A), RD(A), L(A), L(A), WRITE L(A), L(A), L(A), ST(A)
Interval marked: Store Completion Time

Same data goes from Mem Ctrl to Cache and back.

slide-74
SLIDE 74

ATOM-OPT Source Log

Timeline lanes: SQ, Cache, Mem Ctrl, Memory
Messages (in build order): RDx(A), RDx(A), WRITE L(A), RDx(A), ST(A)
Interval marked: Store Completion Time

Remove redundant data movement by creating the log entry in the memory controller.
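The saving from source logging can be illustrated by counting cache-line transfers on the link between the cache and the memory controller for one store miss. The class and function names below are hypothetical, and the model assumes one transfer per line moved in either direction.

```python
class SourceLoggingController:
    """Toy memory controller that counts cache-line transfers (assumed model)."""

    def __init__(self, nvm_data):
        self.data = dict(nvm_data)
        self.log = []
        self.link_transfers = 0    # lines moved between cache and controller

    def read(self, addr):
        # RD(A): plain fill, the line travels from controller to cache.
        self.link_transfers += 1
        return self.data[addr]

    def post_log(self, addr, old_value):
        # L(A): the cache sends the old value back as a log entry.
        self.link_transfers += 1
        self.log.append((addr, old_value))

    def read_exclusive(self, addr):
        # RDx(A): the controller creates the log entry from the line it is
        # already reading, then returns the line to the cache.
        old_value = self.data[addr]
        self.log.append((addr, old_value))
        self.link_transfers += 1
        return old_value


def store_miss_atom(ctrl, addr):
    old_value = ctrl.read(addr)
    ctrl.post_log(addr, old_value)


def store_miss_atom_opt(ctrl, addr):
    ctrl.read_exclusive(addr)


atom = SourceLoggingController({"A": 100})
atom_opt = SourceLoggingController({"A": 100})
store_miss_atom(atom, "A")
store_miss_atom_opt(atom_opt, "A")
```

Both controllers end up with the same log entry, but in this model the source-logging path moves the line across the link once instead of twice.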

slide-75
SLIDE 75

Evaluation

  • System Configuration
  • We evaluate the proposed design using GEM5 full-system simulation mode
  • 32 Core CMP with 32x1MB LLC cache banks and 4 memory controllers

Atomic Durability Designs

BASE: Baseline hardware undo log implementation
ATOM: Posted log writes to memory controller
ATOM-OPT: Posted log writes with source logging
NON-ATOMIC: No logging (Upper bound on performance)

slide-78
SLIDE 78

Transaction Throughput

[Bar chart: transaction throughput, y-axis from 1 to 1.8, for btree, hash, queue, rbtree, sdg, sps and gmean; series ATOM, ATOM-OPT, NON-ATOMIC. Higher is Better.]

Highlighted value: 23%

slide-79
SLIDE 79

Transaction Throughput

[Same bar chart as the previous slide. Higher is Better.]

Highlighted value: 27%

slide-80
SLIDE 80

Transaction Throughput

[Same bar chart as the previous slide. Higher is Better.]

Highlighted value: 11%

ATOM-OPT performance is within 11% of optimal design.

slide-81
SLIDE 81

Crash Consistency Primitives

  • Ordering primitive [Condit’09, Joshi’15, Kolli’16, Haria’17]
  • necessary for crash consistent programming
  • reasoning about crash consistency with ordering as the only primitive is difficult
  • Atomic Durability [Doshi’16, Joshi’17, Shin’17]
  • eases the burden of reasoning about crash consistency
  • enables further performance optimizations
slide-84
SLIDE 84

Crash Consistency Primitives

Atomic_Begin
    Log (A)
    Write (A)
    Log (B)
    Write (B)
Atomic_End

Annotations on the sequence:
  • Explicit Constraint
  • Implicit Constraint with Ordering

slide-85
SLIDE 85

Crash Consistency Primitives

Atomic_Begin
    Log (A)
    Write (A)
    Log (B)
    Write (B)
Atomic_End

Annotations on the sequence:
  • Explicit Constraint
  • Implicit constraint: X (Not present with Atomic Durability)
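One way to read this slide is in terms of how many persist orders each primitive allows. The sketch below counts them by brute force. It rests on two assumptions that the slide text does not spell out: that the explicit constraints are Log (X) before Write (X) for each variable, and that the implicit constraint added by an ordering primitive is Write (A) before Log (B). The function name is also made up.

```python
from itertools import permutations

OPS = ["Log(A)", "Write(A)", "Log(B)", "Write(B)"]


def valid_persist_orders(constraints):
    """Return every order of OPS that respects the (before, after) pairs."""
    orders = []
    for order in permutations(OPS):
        position = {op: i for i, op in enumerate(order)}
        if all(position[a] < position[b] for a, b in constraints):
            orders.append(order)
    return orders


# Assumed explicit constraints: each log entry persists before its data.
explicit = [("Log(A)", "Write(A)"), ("Log(B)", "Write(B)")]
# Assumed implicit constraint when only an ordering primitive is available.
with_ordering = explicit + [("Write(A)", "Log(B)")]

atomic_durability_orders = valid_persist_orders(explicit)
ordering_only_orders = valid_persist_orders(with_ordering)
```

Under these assumptions the ordering-only version admits a single persist order, while atomic durability leaves six, which is one way to picture the extra room for performance optimizations.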

slide-86
SLIDE 86

Conclusion

  • Non-Volatile memory is an opportunity to enable in-memory persistent data structures
  • Primitives like Atomic Durability are necessary to ensure crash consistency
  • Traditional approaches to logging perform log writes to NVM in the critical path of store operations
  • ATOM removes log writes from the critical path by enforcing ordering at the memory controller
