Better I/O Through Byte-Addressable, Persistent Memory (Jeremy Condit) - PowerPoint PPT Presentation



SLIDE 1

Better I/O Through Byte-Addressable, Persistent Memory

Jeremy Condit, Ed Nightingale, Chris Frost, Engin Ipek, Ben Lee, Doug Burger, Derrick Coetzee

SLIDE 2

A New World of Storage

DRAM:
  + Fast
  + Byte-addressable
  - Volatile

Disk / Flash:
  + Non-volatile
  - Slow
  - Block-addressable

SLIDE 3

A New World of Storage

BPRAM: Byte-addressable, Persistent RAM
  + Fast
  + Byte-addressable
  + Non-volatile

SLIDE 4

A New World of Storage

BPRAM: Byte-addressable, Persistent RAM
  + Fast
  + Byte-addressable
  + Non-volatile

How do we build fast, reliable systems with BPRAM?

SLIDE 5

Phase Change Memory

  • Most promising form of BPRAM
  • “Melting memory chips in mass production” – Nature, 9/25/09
SLIDE 6

Phase Change Memory

[Diagram: phase change material (chalcogenide) between electrodes]

Slow cooling -> crystalline state (1)
Fast cooling -> amorphous state (0)

Properties:
  • Reads: 2-4x DRAM latency
  • Writes: 5-10x DRAM latency
  • Endurance: 10^8+ writes
SLIDE 7

A New World of Storage

Disk / Flash:
  + Non-volatile
  - Slow
  - Block-addressable

BPRAM (Byte-addressable, Persistent RAM):
  + Fast
  + Byte-addressable
  + Non-volatile

How do we build fast, reliable systems with BPRAM?

This talk: BPFS, a file system for BPRAM
Result: Improved performance and reliability

SLIDE 8

Goal

New guarantees for applications:
  • File system operations will commit atomically and in program order
  • Your data is durable as soon as the cache is flushed

New mechanism: short-circuit shadow paging

SLIDE 9

Design Principles

  1. Eliminate the DRAM buffer cache; use the L1/L2 cache instead
  2. Put BPRAM on the memory bus
  3. Provide atomicity and ordering in hardware (e.g., Write A before Write B)

SLIDE 10

Outline

  • Intro
  • File System
  • Hardware Support
  • Evaluation
  • Conclusion

SLIDE 11

BPRAM in the PC

[Diagram: CPU with L1/L2 caches; DRAM on the memory bus; HD / Flash on the PCI/IDE bus]

SLIDE 12

BPRAM in the PC

[Diagram: BPRAM added alongside DRAM on the memory bus]

  • BPRAM and DRAM are addressable by the CPU
  • Physical address space is partitioned
  • BPRAM data may be cached in L1/L2

SLIDE 13

BPRAM in the PC

[Diagram: L1/L2, DRAM, and BPRAM on the memory bus]

  • BPRAM and DRAM are addressable by the CPU
  • Physical address space is partitioned
  • BPRAM data may be cached in L1/L2

SLIDE 14

BPFS: A BPRAM File System

  • Guarantees that all file operations execute atomically and in program order
  • Despite these guarantees, significant performance improvements over NTFS on the same media
  • Short-circuit shadow paging often allows atomic, in-place updates

SLIDE 15

BPFS: A BPRAM File System

[Diagram: root pointer, inodes, indirect blocks, directory and file blocks]

SLIDE 17

Enforcing FS Consistency Guarantees

  • What happens if we crash during an update?

SLIDE 20

Enforcing FS Consistency Guarantees

  • What happens if we crash during an update?

  – Disk: use journaling or shadow paging
  – BPRAM: use short-circuit shadow paging

SLIDE 21

Review 1: Journaling

  • Write to journal, then write to file system

[Diagram: blocks A and B in the file system; journal empty]

SLIDE 24

Review 1: Journaling

  • Write to journal, then write to file system

[Diagram: A’ and B’ written to the journal, then copied into the file system]

  • Reliable, but all data is written twice
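The journaling scheme above can be sketched as a toy Python model (illustrative only; the class and names are hypothetical, not the actual NTFS/BPFS code): each committed update is written once to the journal and once to the file system, and recovery replays only records that reached a commit mark.

```python
# Toy model of write-ahead journaling: every update is written twice.
class JournaledStore:
    def __init__(self):
        self.blocks = {}    # the "file system"
        self.journal = []   # write-ahead log of ("write"/"commit", ...) records

    def commit(self, updates):
        # Phase 1: log every update, then a commit record.
        for block_id, data in updates.items():
            self.journal.append(("write", block_id, data))
        self.journal.append(("commit",))
        # Phase 2: apply the same updates in place -- the second write.
        for block_id, data in updates.items():
            self.blocks[block_id] = data

    def recover(self):
        # Replay only updates followed by a commit record; a crash
        # mid-journal leaves the file system untouched.
        pending = []
        for rec in self.journal:
            if rec[0] == "write":
                pending.append((rec[1], rec[2]))
            else:  # commit record
                self.blocks.update(dict(pending))
                pending = []

store = JournaledStore()
store.commit({"A": "A'", "B": "B'"})
```

A crash after the journal writes but before the commit record simply discards the partial entries on recovery, which is the reliability guarantee the slide refers to.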

SLIDE 25

Review 2: Shadow Paging

  • Use copy-on-write up to root of file system

[Diagram: file blocks A and B reached via the file’s root pointer]

SLIDE 26

Review 2: Shadow Paging

  • Use copy-on-write up to root of file system

[Diagram: copies A’ and B’ created alongside A and B]

SLIDE 30

Review 2: Shadow Paging

  • Use copy-on-write up to root of file system

[Diagram: copied blocks bubble up to the file’s root pointer]

  • Any change requires bubbling to the FS root
  • Small writes require large copying overhead

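The copy-on-write idea can be sketched as follows (a hypothetical Python tree, not BPFS code): updating a leaf copies every block on the path to the root, and the commit is a single swap of the root pointer.

```python
class Block:
    """A file-system block: a leaf holding data or an index of children."""
    def __init__(self, children=None, data=None):
        self.children = dict(children or {})
        self.data = data

def cow_update(block, path, data):
    """Return a NEW root reflecting the write; the old tree is never touched."""
    if not path:                                # reached the leaf: fresh copy
        return Block(data=data)
    new_block = Block(children=block.children)  # copy this index block too
    child = block.children.get(path[0], Block())
    new_block.children[path[0]] = cow_update(child, path[1:], data)
    return new_block                            # copying bubbles toward root

# Old tree: root -> "A" -> leaf
root = Block(children={"A": Block(data="old")})
new_root = cow_update(root, ["A"], "new")
# Commit point: atomically swap the root pointer.
root_pointer = new_root
```

Note how every level of the path is copied even for a one-byte change, which is the overhead the slide calls out.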

SLIDE 31

Short-Circuit Shadow Paging

  • Uses byte-addressability and atomic 64-bit writes
  • Inspired by shadow paging
    – Optimization: in-place update when possible

[Diagram: file blocks A and B under the file’s root pointer]
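The short circuit can be sketched like this (a hypothetical Python model; BPFS's real rules also cover metadata): a write that fits in one aligned 64-bit word commits in place, while anything larger falls back to copy-on-write of the affected block.

```python
ATOMIC = 8  # BPRAM commits aligned 64-bit (8-byte) stores atomically

def write_block(block, offset, data):
    """Update `block` (a bytearray). Returns the block now holding the new
    data; the caller commits a CoW result by swapping one pointer."""
    fits = len(data) <= ATOMIC and offset % ATOMIC == 0
    if fits:
        block[offset:offset + len(data)] = data  # atomic in-place update
        return block                             # short circuit: no copying
    shadow = bytearray(block)                    # copy-on-write fallback
    shadow[offset:offset + len(data)] = data
    return shadow

buf = bytearray(32)
same = write_block(buf, 8, b"12345678")    # 8 bytes, aligned: in place
copy = write_block(buf, 0, b"0123456789")  # 10 bytes: CoW copy
```

Small aligned writes avoid any copying at all; larger writes still only copy the blocks they touch before one atomic pointer swap, rather than bubbling to the file-system root.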

SLIDE 32

Short-Circuit Shadow Paging

  • Uses byte-addressability and atomic 64-bit writes
  • Inspired by shadow paging
    – Optimization: in-place update when possible

[Diagram: A’ and B’ committed by one atomic update of the file’s root pointer]

SLIDE 35
Opt. 1: In-Place Writes

  • Aligned 64-bit writes are performed in place
    – Data and metadata

[Diagram: blocks under the file’s root pointer]

SLIDE 36
Opt. 1: In-Place Writes

  • Aligned 64-bit writes are performed in place
    – Data and metadata

[Diagram: a single aligned 64-bit word written in place]

SLIDE 40
Opt. 2: Exploit Data-Metadata Invariants

  • Appends committed by updating file size

[Diagram: file tree under the file’s root pointer and size field]

SLIDE 42
Opt. 2: Exploit Data-Metadata Invariants

  • Appends committed by updating file size

[Diagram: new bytes appended in place past end-of-file, then committed by a file size update]
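The invariant can be sketched in a few lines (a hypothetical Python model, not BPFS code): bytes beyond the committed size are invisible to readers, so an append writes them in place and commits with a single atomic update of the 64-bit size field.

```python
class AppendOnlyFile:
    def __init__(self, capacity=4096):
        self.data = bytearray(capacity)  # already-allocated blocks
        self.size = 0                    # committed length: one 64-bit field

    def append(self, new_bytes):
        end = self.size + len(new_bytes)
        # Step 1: write past end-of-file -- invisible, since readers never
        # look beyond `size`. A crash here loses nothing visible.
        self.data[self.size:end] = new_bytes
        # Step 2: the atomic size update is the commit point.
        self.size = end

    def read(self):
        return bytes(self.data[:self.size])

f = AppendOnlyFile()
f.append(b"hello")
```

A crash between the two steps leaves stray bytes past end-of-file, but because no reader consults them, the file system is still consistent.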

SLIDE 43

BPFS Example

[Diagram: root pointer, inodes, indirect blocks, directory and file blocks]

SLIDE 44

BPFS Example

[Diagram: rename adds an entry in one directory and removes it from another]

  • Cross-directory rename bubbles to the common ancestor

SLIDE 46

Outline

  • Intro
  • File System
  • Hardware Support
  • Evaluation
  • Conclusion

SLIDE 47

Problem 1: Ordering

[Diagram: CoW writes and the commit record in L1/L2, heading to BPRAM]

SLIDE 52

Problem 2: Atomicity

[Diagram: CoW writes and the commit record in L1/L2, heading to BPRAM]

SLIDE 56

Enforcing Ordering and Atomicity

  • Ordering
    – Solution: epoch barriers to declare constraints
    – Faster than write-through
    – Important hardware primitive (cf. SCSI TCQ)

  • Atomicity
    – Solution: capacitor on DIMM
    – Simple and cheap!
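A toy model of the epoch mechanism (an assumed software interface, simplified from the talk's hardware design): the cache may write back dirty lines in any order within an epoch, but no line from epoch N+1 may reach BPRAM before all of epoch N has.

```python
class EpochedCache:
    def __init__(self):
        self.epoch = 0
        self.dirty = []   # (epoch, addr, value) lines not yet in BPRAM
        self.bpram = {}

    def write(self, addr, value):
        self.dirty.append((self.epoch, addr, value))

    def barrier(self):
        # Declare an ordering constraint: later writes form a new epoch.
        self.epoch += 1

    def evict_one(self):
        # Hardware picks freely, but only among lines of the OLDEST epoch;
        # younger epochs are ineligible for eviction until it drains.
        oldest = min(e for e, _, _ in self.dirty)
        for i, (e, addr, value) in enumerate(self.dirty):
            if e == oldest:
                self.bpram[addr] = value
                del self.dirty[i]
                return addr

cache = EpochedCache()
cache.write("cow_block", "A'")   # copy-on-write data: epoch 0
cache.barrier()
cache.write("root_ptr", "new")   # commit record: epoch 1, must land last
first = cache.evict_one()
```

This is why epoch barriers beat write-through: within an epoch the cache keeps full freedom to coalesce and reorder, and only the cross-epoch order is constrained.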

SLIDE 57

Ordering and Atomicity

[Diagram: CoW writes, epoch barrier, then commit record in L1/L2, heading to BPRAM]

SLIDE 58

Ordering and Atomicity

[Diagram: the CoW writes are tagged epoch 1; after the barrier, the commit record is tagged epoch 2 and is ineligible for eviction until every epoch-1 line has reached BPRAM]

SLIDE 65

Ordering and Atomicity

[Diagram: all writes, including the commit record, have drained to BPRAM]

Multiprocessor (MP) systems work too (see paper)

SLIDE 66

Outline

  • Intro
  • File System
  • Hardware Support
  • Evaluation
  • Conclusion

SLIDE 67

Methodology

  • Built and evaluated BPFS in Windows
  • Three parts:

  – Experimental: BPFS vs. NTFS on DRAM
  – Simulation: epoch barrier evaluation
  – Analytical: BPFS on PCM

SLIDE 68

Microbenchmarks

[Charts: “Append n Bytes” and “Random n Byte Write” – time (s) vs. write size (8 to 4096 bytes) for NTFS-Disk, NTFS-RAM (not durable), and BPFS-RAM (durable)]

SLIDE 69

BPFS Throughput On PCM

[Chart: execution time relative to NTFS/Disk for NTFS-Disk, NTFS-RAM, BPFS-RAM, and projected BPFS-PCM]

SLIDE 70

BPFS Throughput On PCM

[Charts: execution time relative to NTFS/Disk for NTFS-Disk, NTFS-RAM, BPFS-RAM, and projected BPFS-PCM; plus projected BPFS-PCM throughput vs. sustained PCM throughput (200–800 MB/s)]

SLIDE 71

Conclusions

  • BPRAM changes the trade-offs for storage
    – Use a consistency technique designed for the medium
  • Short-circuit shadow paging:
    – improves performance
    – improves reliability

Bonus: PCM chips on display at the poster session!