Better I/O Through Byte-Addressable, Persistent Memory
Jeremy Condit, Ed Nightingale, Chris Frost, Engin Ipek, Ben Lee, Doug Burger, Derrick Coetzee
A New World of Storage
DRAM: + Fast, + Byte-addressable, - Volatile
Disk / Flash: + Non-volatile
BPRAM: Byte-addressable, Persistent RAM
+ Fast, + Byte-addressable, + Non-volatile
How do we build fast, reliable systems with BPRAM?
Phase Change Memory
“... in mass production” – Nature, 9/25/09
[Diagram labels: phase change material (chalcogenide), electrode]
slow cooling -> crystalline state (1)
fast cooling -> amorphous state (0)
Properties
Reads: 2-4x DRAM
Writes: 5-10x DRAM
Endurance: 10^8+ writes
A New World of Storage
How do we build fast, reliable systems with BPRAM?
This talk: BPFS, a file system for BPRAM
Result: Improved performance and reliability
Goal
New mechanism: short-circuit shadow paging
New guarantees for applications
– ... atomically and in program order
– ... cache is flushed
Design Principles
– ... use the L1/L2 cache instead
[Diagram labels: Write A, Write B, memory bus]
Outline
BPRAM in the PC
[Diagram, three builds. Labels: L1, L2, DRAM, BPRAM, Memory bus; HD / Flash and PCI/IDE bus appear in the first two builds only]
– ... addressable by the CPU
– ... partitioned
– ... cached in L1/L2
BPFS: A BPRAM File System
– ... atomically and in program order
– ... improvements over NTFS on the same media
– ... atomic, in-place updates
[Diagram, two builds: the BPFS tree. Labels: root pointer, indirect blocks, inodes, inode file, directory, file]
Enforcing FS Consistency Guarantees
– Disk: Use journaling or shadow paging
– BPRAM: Use short-circuit shadow paging
Review 1: Journaling
[Diagram, four builds: blocks A and B in the file system, next to a journal; the new versions A’ and B’ are written to the journal, then written over A and B in the file system]
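The journaling review can be illustrated with a small model. This is a minimal sketch of redo journaling in general, not code from BPFS or NTFS; the class `JournaledStore` and its method names are hypothetical. It shows that new block versions go to the journal first, become durable with a commit record, and only then overwrite the originals, so each updated block is written twice.

```python
class JournaledStore:
    """Toy block store with a redo journal (all names are illustrative)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # the "file system" area
        self.journal = []           # pending (block_id, new_data) records
        self.committed = False      # stands in for the journal commit record

    def log(self, updates):
        # Step 1: write the new versions (A', B') to the journal only.
        self.journal = list(updates.items())
        # Step 2: the commit record makes the whole transaction durable.
        self.committed = True

    def checkpoint(self):
        # Step 3: copy committed journal entries over the originals,
        # then clear the journal. Uncommitted entries are discarded.
        if self.committed:
            for block_id, data in self.journal:
                self.blocks[block_id] = data
        self.journal = []
        self.committed = False

    def recover(self):
        # After a crash: replay a committed journal, drop an uncommitted one.
        self.checkpoint()
```

A crash after `log` but before `checkpoint` loses nothing, because `recover` replays the journal; a crash before the commit record leaves the original blocks untouched.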
Review 2: Shadow Paging
[Diagram, six builds: a tree under the file’s root pointer with blocks A and B; copies A’ and B’ are written, the copy-on-write propagates up the tree, and the file’s root pointer is updated last]
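The shadow paging review can be sketched as a copy-on-write tree. This is an illustrative model under simple assumptions (an in-memory tree, string leaves), not the BPFS implementation; `Node`, `cow_update`, and `FileSystem` are hypothetical names. Every block on the path from the changed leaf to the root is copied, and the update becomes visible only when the root pointer is swapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Immutable interior block; children are Nodes or leaf strings."""
    children: tuple


def cow_update(node, path, new_leaf):
    """Return a new subtree with the leaf at `path` replaced.

    Nothing is modified in place: each block along the path is copied,
    and untouched subtrees are shared between old and new versions.
    """
    if not path:
        return new_leaf
    index = path[0]
    children = list(node.children)
    children[index] = cow_update(children[index], path[1:], new_leaf)
    return Node(tuple(children))


class FileSystem:
    def __init__(self, root):
        self.root = root  # the single pointer that defines the visible state

    def write(self, path, new_leaf):
        new_root = cow_update(self.root, path, new_leaf)
        # Commit point: one pointer update switches to the new tree.
        self.root = new_root
```

If a crash happens before the final assignment, the old tree is still fully intact, which is what makes the root pointer update the commit.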
Short-Circuit Shadow Paging
– Optimization: In-place update when possible
[Diagram, four builds: the tree under the file’s root pointer, with blocks A and B and their copies A’ and B’]
– Data and metadata
[Diagram, five builds: the tree under the file’s root pointer; one build is labeled “in-place write”]
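The choice between an in-place write and a copy-on-write can be sketched as a simple size and alignment check. The slide text does not state the atomic write size, so `ATOMIC_BYTES = 8` is an assumption of this sketch, and `plan_write` is a hypothetical helper rather than a BPFS function.

```python
ATOMIC_BYTES = 8  # ASSUMPTION: size of one atomic hardware write; not on the slides


def plan_write(offset, length):
    """Classify a write inside a block as 'in-place' or 'copy-on-write'.

    A write is done in place only if it falls entirely inside one aligned
    ATOMIC_BYTES-sized word, so a crash leaves either the old or the new
    bytes, never a mix. Anything larger falls back to copy-on-write.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first_word = offset // ATOMIC_BYTES
    last_word = (offset + length - 1) // ATOMIC_BYTES
    return "in-place" if first_word == last_word else "copy-on-write"
```

For example, `plan_write(16, 8)` covers exactly one aligned word and can be done in place, while `plan_write(20, 8)` straddles two words and needs a copy.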
Invariants
[Diagram, three builds: the file’s root pointer + size; an “in-place append” is followed by a “file size update”]
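The in-place append followed by a file size update can be modeled as follows. This is a sketch under the assumption that bytes beyond the recorded file size are never visible to readers; `AppendOnlyFile` and the `crash_before_commit` flag are hypothetical names used only to simulate a crash.

```python
class AppendOnlyFile:
    """Toy file whose recorded size decides which bytes are visible."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)  # space already allocated to the file
        self.size = 0                    # committed file size

    def append(self, payload, crash_before_commit=False):
        end = self.size + len(payload)
        if end > len(self.data):
            raise ValueError("append exceeds allocated space")
        # Step 1: write the new bytes in place, beyond the committed size.
        self.data[self.size:end] = payload
        if crash_before_commit:
            return  # simulated crash: size unchanged, new bytes stay invisible
        # Step 2: one small write of the size field commits the append.
        self.size = end

    def read(self):
        # Readers only ever see bytes below the committed size.
        return bytes(self.data[:self.size])
```

Because readers ignore everything past `size`, the data write needs no copy: the size update alone acts as the commit.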
BPFS Example
[Diagram, three builds: the BPFS tree (root pointer, indirect blocks, inodes, inode file, directory, file); the middle build is annotated “add entry”, “remove entry”, and “ancestor”]
Outline
Problem 1: Ordering
[Diagram, five builds: the write sequence “... CoW, Commit ...” passes through the L1 / L2 cache on its way to BPRAM]
Problem 2: Atomicity
[Diagram, four builds: the write sequence “... CoW, Commit ...” passes through the L1 / L2 cache on its way to BPRAM]
Enforcing Ordering and Atomicity
Ordering
– Solution: Epoch barriers to declare constraints
– Faster than write-through
– Important hardware primitive (cf. SCSI TCQ)
Atomicity
– Solution: Capacitor on DIMM
– Simple and cheap!
Ordering and Atomicity
[Diagram, nine builds: the write sequence “... CoW, Barrier, Commit ...” enters the L1 / L2 cache; lines written before the barrier are tagged 1 and the line written after it is tagged 2; the line tagged 2 is marked “Ineligible for eviction!” until the lines tagged 1 have left the cache for BPRAM]
MP works too (see paper)
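The epoch rule in the diagram can be sketched as a small cache model. This is a single-processor simplification and not the hardware design from the paper; `EpochCache` and its methods are hypothetical names. It shows that a dirty line may leave the cache only when no line from an older epoch is still waiting, so BPRAM never sees the commit before the copy-on-write data.

```python
class EpochCache:
    """Toy write-back cache that tags dirty lines with an epoch number."""

    def __init__(self):
        self.current_epoch = 1
        self.dirty = {}      # address -> (epoch, value), not yet persistent
        self.bpram = {}      # address -> value, persistent
        self.write_log = []  # (epoch, address) in the order lines reached BPRAM

    def write(self, address, value):
        # Simplification: rewriting a line from an older epoch first
        # flushes every line up to and including that older epoch.
        if address in self.dirty and self.dirty[address][0] < self.current_epoch:
            self.flush_through(self.dirty[address][0])
        self.dirty[address] = (self.current_epoch, value)

    def barrier(self):
        # Writes issued after the barrier belong to a new epoch.
        self.current_epoch += 1

    def can_evict(self, address):
        epoch, _ = self.dirty[address]
        oldest = min(e for e, _ in self.dirty.values())
        return epoch == oldest

    def evict(self, address):
        if not self.can_evict(address):
            raise RuntimeError("ineligible for eviction: older epoch still cached")
        epoch, value = self.dirty.pop(address)
        self.bpram[address] = value
        self.write_log.append((epoch, address))

    def flush_through(self, epoch):
        # Evict dirty lines in epoch order, oldest first.
        for addr in sorted(self.dirty, key=lambda a: self.dirty[a][0]):
            if self.dirty[addr][0] <= epoch:
                self.evict(addr)
```

Within one epoch the lines can be evicted in any order, which is why this is less restrictive than forcing every write straight through to memory.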
Outline
Methodology
– Experimental: BPFS vs. NTFS on DRAM
– Simulation: Epoch barrier evaluation
– Analytical: BPFS on PCM
Microbenchmarks
[Two charts: “Append n Bytes” and “Random n Byte Write”, for n = 8, 64, 512, 4096; y-axis: Time (s); series: NTFS - Disk, NTFS - RAM, BPFS - RAM; annotations: “NOT DURABLE!” (twice) and “DURABLE!” (twice)]
BPFS Throughput On PCM
[Bar chart: Execution Time (vs. NTFS / Disk) for NTFS Disk, NTFS RAM, BPFS RAM, and BPFS PCM (Proj)]
[Line chart, “Projected Throughput”: Execution Time (vs. NTFS / Disk) against Sustained Throughput of PCM (MB/s), 200 to 800; series: BPFS - PCM]
Conclusions
– Use consistency technique designed for medium
– improves performance
– improves reliability
Bonus: PCM chips on display at poster session!