SLIDE 1

Consistent, Durable, and Safe Memory Management for Byte-Addressable Non-Volatile Main Memory

Iulian Moraru*, David Andersen*, Michael Kaminsky°, Niraj Tolia♢, Nathan Binkert‡, Partha Ranganathan§

*Carnegie Mellon University, °Intel Labs, ♢Maginatics, ‡Amiato, §Google

SLIDE 2

New memory technologies

NVRAM: phase change, spin torque, memristor

  • Memory that is both fast and non-volatile
  • Will help build fast, robust systems
  • Placed on the memory bus (alongside DRAM)
  • Needs new ways of handling persistent state

NVRAM is much faster than HDD and Flash.

[Figure: access time in ns (log scale) for DRAM, PCM, and Flash]

SLIDE 3

NV main memory - assumptions

  • Mapped into process address space
  • Accessed through CPU loads/stores
  • Persistent namespace

[Figure: applications access DRAM + NVRAM through the OS and a persistence library]
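
Taken together, these assumptions amount to the following access pattern. A minimal sketch, assuming the OS exposes NVRAM through a file-like persistent namespace; "/nvram/mystore" and the region size are made up for illustration:

```c
/* Sketch of the assumed access model, not the paper's API: a named
 * persistent region is mapped into the process and then accessed
 * with ordinary CPU loads/stores. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("/nvram/mystore", O_RDWR | O_CREAT, 0600);
    if (fd < 0) return 1;
    ftruncate(fd, 1 << 20);              /* size the region: 1 MiB */
    char *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);   /* map into address space */
    if (p == MAP_FAILED) return 1;
    strcpy(p, "survives restarts");      /* plain store, no syscall */
    munmap(p, 1 << 20);
    close(fd);
    return 0;
}
```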

SLIDE 4

NVRAM - challenge

➡ Give developers the right tools to build data structures for NVRAM, with high performance and robustness (no data loss)

SLIDE 5

Sources of data loss, and the building block that counters each:

  • Wear-out → memory allocator
  • Erroneous writes → virtual memory protection
  • CPU caches → cache line counters
  • ...

SLIDE 6

Malloc for NVRAM - goals

  • Avoid frequently re-allocating the same block: HW wear leveling may slow down applications that write often to one location
  • Minimize the number of metadata updates in NVRAM: both for wear-out and for speed
  • Make metadata robust to accidental corruption: avoids extensive loss of (persistent) data

SLIDE 7

Problems with allocators for DRAM

  • They reuse recently freed blocks → more writes than necessary
  • Metadata is embedded in allocated / free blocks → frequent writes to one location
  • Metadata corruption can cause data loss

[Figure: animation of allocations and frees in a free list; the "size" field in block headers is rewritten over and over, and corrupted metadata (X) makes blocks unreachable]
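
To make the embedded-metadata problem concrete, here is a hedged sketch of the classic in-band header layout (simplified, dlmalloc-style; not any particular allocator's real structures):

```c
/* Conventional in-band allocator metadata. The header sits
 * immediately before the payload, so (a) every alloc/free cycle of a
 * reused block rewrites the same NVRAM lines, and (b) a buffer
 * underrun through the payload pointer lands directly on metadata. */
#include <stddef.h>

struct embedded_hdr {
    size_t size;                 /* rewritten on every alloc/free */
    struct embedded_hdr *next;   /* free-list link, also in-band */
};

static void *payload_of(struct embedded_hdr *h) {
    return h + 1;                /* payload starts right after header */
}
```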

SLIDE 8

NVMalloc

  • In DRAM: a free / in-use bitmap (000000... → 010001...) and per-size free lists (size 1, size 2, size 3, ...)
  • In NVRAM: the blocks themselves, each with a header recording state, size, and a checksum
  • A "don't allocate" list holds recently freed blocks

[Figure: NVMalloc metadata layout across NVRAM and DRAM]

SLIDE 9

Our solution: NVMalloc

  • Freed blocks go on a "don't allocate list" for at least T seconds (e.g. T = 0.2 s), so no block is allocated more often than once every T seconds (rate 1/T)
  • Keep most metadata in DRAM: an in-DRAM bitmap tracks block state, and in-DRAM free lists accelerate allocations
  • A checksum in each block header, computed over: state, size, location
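
A hedged sketch of a checksummed block header in this style; the field layout and mixing function are illustrative, not NVMalloc's exact on-NVRAM format:

```c
#include <stdbool.h>
#include <stdint.h>

enum block_state { BLOCK_FREE = 0, BLOCK_IN_USE = 1 };

struct block_header {
    uint32_t state;      /* free / in-use */
    uint32_t size;       /* block size */
    uint64_t checksum;   /* over state, size, and location */
};

/* The checksum covers state, size, and the header's own address, so
 * a header that was overwritten, or that ended up at the wrong
 * location, fails validation instead of being trusted. */
static uint64_t header_checksum(const struct block_header *h) {
    uint64_t x = (uint64_t)h->state
               ^ ((uint64_t)h->size << 32)
               ^ (uint64_t)(uintptr_t)h;          /* location */
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;
    return x;
}

static bool header_valid(const struct block_header *h) {
    return h->checksum == header_checksum(h);
}
```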

SLIDE 10

Building blocks

  • Memory allocator (covered above)
  • Virtual memory protection (next)
  • Cache line counters

SLIDE 11

Problem: erroneous writes

The address space holds code, data, the stack, and now persistent data. With a file system, a narrow interface shields persistent state; with NVRAM mapped into the address space, a single erroneous overwrite can corrupt it directly.

Solution: virtual memory protection (if we can make it fast enough)

SLIDE 12

Virtual memory protection

  • Today's solution: mprotect
  • Used for protecting in-memory databases since the 1990s
  • Used today by in-memory databases, the JVM, and garbage collectors

High performance penalties: it is synchronous, incurs syscall overhead, and handles one request at a time.
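
For reference, the synchronous pattern these penalties come from looks roughly like this (a sketch; assumes `p` is page-aligned and 4 KiB pages):

```c
#include <sys/mman.h>

#define PAGE 4096

/* Two synchronous system calls bracket every update to protected
 * persistent memory; this is the cost the asynchronous design on the
 * next slide removes from the critical path. */
static void protected_store(char *p, char v) {
    mprotect(p, PAGE, PROT_READ | PROT_WRITE);   /* un-protect */
    *p = v;                                      /* the update */
    mprotect(p, PAGE, PROT_READ);                /* re-protect */
}
```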

SLIDE 13

Asynchronous mprotect

  • No waiting
  • No system call overhead
  • Batching and sorting of requests: mprotect is cheaper on ranges

[Figure: the user process enqueues into a request buffer; a kernel thread drains it]
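
A user-space approximation of the design, hedged: the worker thread below stands in for the in-kernel thread of the actual system, and the queue and batching details are illustrative only (the fixed-size queue omits overflow handling):

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

struct req { void *addr; size_t len; int prot; };

#define QCAP 1024
static struct req queue[QCAP];
static int head, tail;   /* head: consumer, tail: producer */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* Enqueue and return immediately: the caller never waits and pays
 * no system call on this path. */
void request_protect(void *addr, size_t len, int prot) {
    pthread_mutex_lock(&mu);
    queue[tail++ % QCAP] = (struct req){ addr, len, prot };
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&mu);
}

void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&mu);
        while (head == tail)
            pthread_cond_wait(&nonempty, &mu);
        struct req r = queue[head++ % QCAP];
        pthread_mutex_unlock(&mu);
        /* A real implementation drains the whole queue, sorts the
         * requests, and merges adjacent pages so that one mprotect
         * covers a whole range, amortizing the per-call cost. */
        mprotect(r.addr, r.len, r.prot);
    }
    return NULL;
}
```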

SLIDE 14

Un-protect: possible scenarios

  • A protect request for a range is still queued when the un-protect request for it arrives → coalesce the two
  • The un-protect request completes sufficiently in advance of the write (e.g. in a GC) → fully asynchronous
  • The write arrives before the un-protect request has been processed → handle the resulting page fault in the kernel, transparently

SLIDE 15

Building blocks

  • Memory allocator (covered)
  • Virtual memory protection (covered)
  • Cache line counters (next)

SLIDE 16

Write Caching for NVRAM

  • CPU caching: an important optimization for DRAM, and even more so for NVRAM
  • But there is no guaranteed order for cache line write-back: if write (1) stores data and write (2) publishes a pointer to it, the pointer may be flushed first
  • Goals: avoid forcing cache line flushes (costly), yet enforce ordering
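
The conventional fix enforces order with explicit flushes and fences, which is exactly the per-write cost the cache line counters aim to avoid. An x86-specific sketch, assuming `*root` and the object live in NVRAM:

```c
#include <emmintrin.h>   /* _mm_clflush, _mm_sfence (x86 SSE2) */

struct obj { int payload; };

/* Write the data (1), force it out of the cache, then publish the
 * pointer (2): a crash can no longer expose a persistent pointer to
 * data that never reached NVRAM. */
void publish(struct obj **root, struct obj *o, int v) {
    o->payload = v;       /* (1) the data */
    _mm_clflush(o);
    _mm_sfence();         /* data durable before the pointer */
    *root = o;            /* (2) the pointer */
    _mm_clflush(root);
    _mm_sfence();
}
```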

SLIDE 17

Cache line counters

  • Make applications aware of cache state: how many cache lines updated by a transaction are still in cache?
  • Enforce ordering in software
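
A loudly hypothetical sketch of how software might consume such counters; `clc_dirty_lines()` stands in for the proposed hardware interface and does not exist today:

```c
#include <stdint.h>

/* Hypothetical read of a proposed per-tag cache line counter: how
 * many lines written under `tag` are still dirty in the cache? */
extern uint64_t clc_dirty_lines(int tag);

/* Ordering enforced in software, without forced flushes: let the
 * cache write lines back on its own, and only declare the
 * transaction durable once its counter drains to zero. */
void commit_when_durable(int tag, void (*mark_committed)(void)) {
    while (clc_dirty_lines(tag) != 0)
        ;   /* spin (or yield) while updated lines remain cached */
    mark_committed();
}
```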

SLIDE 18

Evaluation

  • Testing setup: Core i7 860 (2.8 GHz), 8 GB DRAM
  • DRAM used as a proxy for NVRAM

SLIDE 19

Results: wear leveling

Workload: 100K operations (10 B-4 KB allocations, 50% deallocations).

malloc concentrates its writes on a few blocks and issues 30%-50% more writes overall than NVMalloc.

[Figure: total writes per 64 B block vs. block number, malloc vs. NVMalloc]

SLIDE 20

NVMalloc in B+ tree

Workload: 1M insert operations.

[Figure: percentage of all 64 B blocks vs. number of writes per 64-byte block (log scale), NVMalloc vs. malloc]

SLIDE 21

Async mprotect in B+ tree

Workload: 1M inserts, 256 B values. Slowdown is relative to running with no mprotect (the ideal); the two async configurations use different batching quanta.

                 Slowdown   Latency (median)   Latency (max)
  Sync           9.0x       -                  -
  Async 200 ms   1.7x       200 ms             350 ms
  Async 0.1 ms   7.9x       0.1 ms             1.2 ms

SLIDE 22

DRAM as NVRAM proxy

How would our results change on real NVRAM?

  • Same malloc / NVMalloc access patterns, but NVMalloc writes less frequently to NVRAM
  • Same asynchronous mprotect overhead, but lower relative overhead because NVRAM is slower

SLIDE 23

Summary

  • Challenges of NVRAM on the memory bus: wear-out, erroneous writes, CPU caches
  • Building blocks for NVRAM data stores: NVMalloc, async mprotect, and cache line counters (CLC), supporting persistent data stores

SLIDE 24

Code

github.com/efficient/nvram

SLIDE 25

Related Work

[BPFS] J. Condit, et al. Better I/O Through Byte-Addressable, Persistent Memory. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP '09), 2009.

[Mnemosyne] H. Volos, A. J. Tack, M. M. Swift. Mnemosyne: Lightweight Persistent Memory. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '11), 2011.

[NV-Heaps] J. Coburn, et al. NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation, Non-Volatile Memories. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '11), 2011.

[Rio-Crashes] P. M. Chen, et al. The Rio File Cache: Surviving Operating System Crashes. In Proceedings of ASPLOS-VII, 1996.

[Start-Gap] M. K. Qureshi, et al. Enhancing Lifetime and Security of PCM-Based Main Memory with Start-Gap Wear Leveling. In Proceedings of ACM MICRO, 2009.

[Wear-level-malicious] M. K. Qureshi, et al. Practical and Secure PCM Systems by Online Detection of Malicious Write Streams. In Proceedings of HPCA, 2011.

[BTree-NVBM] S. Venkataraman, et al. Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST '11), 2011.

[FlexSC] L. Soares, M. Stumm. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls. In Proceedings of the 9th USENIX OSDI, 2010.