Log-Structured Non-Volatile Main Memory



SLIDE 1

Log-Structured Non-Volatile Main Memory

Qingda Hu*, Jinglei Ren, Anirudh Badam, Jiwu Shu* and Thomas Moscibroda
*Tsinghua University, Microsoft Research

SLIDE 2

Non-volatile memory is coming…

  • Data storage

[Figure: storage technology comparison. Labels: Read ~50ns / Write ~10GB/s; Read ~10µs / Write ~100MB/s; Read ~100ns / Write ~1GB/s. Technologies named: 3D XPoint/Optane (2015 - ), PCM]

SLIDE 3

Background: Impact of NVM

  • Architecture: Non-Volatile Main Memory (NVMM)
  • Data persistence as a bottleneck

→ 10+x application performance improvement

[Figure: DRAM + SSD versus DRAM + NVM]

SLIDE 4

Executive Summary

  • Motivation
    • Inefficient use of memory space
    • Inefficient support for crash consistency
  • Solution: Log-structured memory management for NVMM.
  • Evaluation: 7x less memory waste; 90% higher write throughput.

[Figure labels: Application, Library, DRAM, SSD, NVMM]

SLIDE 5

Outline

  • Motivation
  • Log-Structured NVMM
  • Tree-Based Address Mapping
  • Evaluation

SLIDE 6

Motivation I

  • Inefficient use of memory space
  • Reason: Traditional DRAM allocators incur high memory fragmentation.
  • Explanation: internal fragmentation and external fragmentation.

[Figure, internal fragmentation: free lists per size class (8B, 16B, …); a 24B object placed in a 32B slot leaves waste]

[Figure, external fragmentation: free 32B slots are wasted (32B each) when a 64B request arrives]
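
The two kinds of waste on this slide can be made concrete with a small calculation. The sketch below is illustrative only: the power-of-two size classes and the helper names are assumptions, not the behavior of any particular allocator.

```python
SIZE_CLASSES = [8, 16, 32, 64, 128]  # assumed power-of-two size classes


def slot_size(request: int) -> int:
    """Return the smallest size class that can hold the request."""
    for size in SIZE_CLASSES:
        if request <= size:
            return size
    raise ValueError("request larger than the largest size class")


def internal_waste(request: int) -> int:
    """Bytes lost inside the slot handed out for this request."""
    return slot_size(request) - request


def can_satisfy(free_chunks: list[tuple[int, int]], request: int) -> bool:
    """True if some single free chunk (start, size) is large enough."""
    return any(size >= request for _, size in free_chunks)


# Internal fragmentation: a 24B object occupies a 32B slot.
waste_24 = internal_waste(24)

# External fragmentation: two non-adjacent 32B holes total 64B,
# yet neither of them can serve a 64B request.
holes = [(0, 32), (64, 32)]
total_free = sum(size for _, size in holes)
served = can_satisfy(holes, 64)
```

Here `waste_24` is 8 bytes, and `served` is false even though `total_free` equals the request size.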

SLIDE 7

Motivation I

  • Inefficient use of memory space (cont.)
  • Fragmentation is a more severe issue for NVM!

[Figure labels: processes on DRAM; processes on NVMM]

SLIDE 8

Motivation II

  • Inefficient support for crash consistency
  • Reason: Write-twice in log and home.
  • Explanation: Redo logging for example.

transaction { a += 1; b -= 1; }

[Figure: NVMM with a Log area holding a’ and b’, and a Home area holding a and b]
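
The write-twice cost of redo logging can be counted in a toy model. This is a minimal sketch, not the design of any real library: the class name, the 8-byte value size, and the use of byte counters in place of real NVMM writes are all assumptions.

```python
class RedoLogStore:
    """Toy model: byte counters stand in for NVMM writes (8B per value assumed)."""

    VALUE_SIZE = 8

    def __init__(self, initial: dict[str, int]):
        self.home = dict(initial)               # home locations
        self.log: list[tuple[str, int]] = []    # redo log area
        self.bytes_written = 0

    def transaction(self, updates: dict[str, int]) -> None:
        # Step 1: persist the new values (a', b') in the log.
        for name, delta in updates.items():
            self.log.append((name, self.home[name] + delta))
            self.bytes_written += self.VALUE_SIZE
        # Step 2: after commit, apply the logged values to their home locations.
        for name, value in self.log:
            self.home[name] = value
            self.bytes_written += self.VALUE_SIZE
        self.log.clear()


store = RedoLogStore({"a": 10, "b": 10})
store.transaction({"a": +1, "b": -1})
```

Two 8-byte values are updated, but `store.bytes_written` ends at 32: every value is written once to the log and once to home.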

SLIDE 9

Outline

  • Motivation
  • Log-Structured NVMM
  • Tree-Based Address Mapping
  • Evaluation

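SLIDE 10

Log-Structured NVMM

  • Library and architecture

[Figure: inside a process (user space), the application runs a transaction on a. The library keeps an address mapping in DRAM from home addresses (&a, &b, …) to log addresses, queried through translate(&a). Memory management is an append-only log with allocated and available regions, in which a is updated to a’. The NVM device is accessed through mmap().]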

SLIDE 11

Log-Structured NVMM

  • Low fragmentation
  • For internal fragmentation: Compact append
  • For external fragmentation: Log cleaning

[Figure: log with allocated and available regions; compact append leaves no internal fragmentation; a is superseded by a’]
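
Compact append and log cleaning can be sketched with a toy in-memory log. Everything here is an assumption made for illustration (the `AppendLog` class, a Python list standing in for NVMM, a dict standing in for the address mapping); the slides do not show the real data layout.

```python
class AppendLog:
    """Toy append-only log; a dict maps each home address to its live entry."""

    def __init__(self):
        self.entries = []   # list of (home_addr, payload bytes)
        self.mapping = {}   # home_addr -> index of the live entry

    def append(self, home_addr: int, payload: bytes) -> None:
        # Compact append: the entry takes exactly len(payload) bytes,
        # so there is no size-class rounding.
        self.mapping[home_addr] = len(self.entries)
        self.entries.append((home_addr, payload))

    def used_bytes(self) -> int:
        return sum(len(p) for _, p in self.entries)

    def live_bytes(self) -> int:
        return sum(len(self.entries[i][1]) for i in self.mapping.values())

    def clean(self) -> None:
        # Log cleaning: keep only live entries and drop stale versions.
        live = [self.entries[i] for i in sorted(self.mapping.values())]
        self.entries = live
        self.mapping = {addr: i for i, (addr, _) in enumerate(live)}


log = AppendLog()
log.append(0xA0, b"x" * 24)   # a
log.append(0xB0, b"y" * 16)   # b
log.append(0xA0, b"z" * 24)   # a' supersedes a
before = (log.used_bytes(), log.live_bytes())
log.clean()
after = (log.used_bytes(), log.live_bytes())
```

Before cleaning the log uses 64 bytes for 40 live bytes; after cleaning both figures are 40.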

SLIDE 12

Log-Structured NVMM

  • Efficient crash-consistent update
  • No separate areas. Write only once.
  • Header: size, checksum, etc.

transaction { a += 1; b -= 1; }

[Figure: the address mapping (home addr. to log addr.) covers &a and &b; the log holds a and b in its allocated region, and the transaction appends a’ and b’]
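
One way a size-and-checksum header can support write-once updates is sketched below. The header layout (home address, size, CRC32) and the function names are assumptions; the slide only says the header holds "size, checksum, etc.", and how entries are grouped into transactions is not shown.

```python
import struct
import zlib

HEADER = struct.Struct("<QII")  # assumed layout: home address, size, CRC32


def encode_entry(home_addr: int, payload: bytes) -> bytes:
    """Serialize one log entry: header followed by the new value."""
    return HEADER.pack(home_addr, len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> dict[int, bytes]:
    """Rebuild the home -> value mapping, stopping at the first bad entry."""
    mapping, pos = {}, 0
    while pos + HEADER.size <= len(log):
        home_addr, size, crc = HEADER.unpack_from(log, pos)
        payload = log[pos + HEADER.size : pos + HEADER.size + size]
        if len(payload) < size or zlib.crc32(payload) != crc:
            break  # torn or incomplete write: ignore the tail
        mapping[home_addr] = payload
        pos += HEADER.size + size
    return mapping


log = encode_entry(0xA0, b"a=11") + encode_entry(0xB0, b"b=09")
torn = log[:-2]  # simulate a crash in the middle of the last entry
```

Recovering from `log` yields both values, while recovering from `torn` yields only the first, since the truncated entry fails validation.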

SLIDE 13

Outline

  • Motivation
  • Log-Structured NVMM
  • Tree-Based Address Mapping
  • Evaluation

SLIDE 14

Tree-Based Address Mapping

  • Unique challenges to NVMM
  • Pervasive and highly frequent memory accesses.
  • Allocation granularity ≠ access granularity → No O(1) lookup.
  • Filesystems: hash(block number) as the index.
  • Databases: hash(key or tuple ID) as the index.
  • Main memory: hash(address)? That maps every address!
  • Tree-based mapping made performant.

[Figure: objects at 0xABB4 (size=16), 0xABC0 (size=24), ...; which object contains address 0xABC8?]
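
The lookup problem from the figure can be reproduced in a few lines. In this sketch a sorted list searched with `bisect` stands in for the tree; the two objects are taken from the slide, and the `lookup` helper is a hypothetical name.

```python
import bisect

# Objects from the slide's figure: (start address, size).
objects = [(0xABB4, 16), (0xABC0, 24)]

by_start = {start: size for start, size in objects}   # hash on start address
starts = sorted(by_start)                             # ordered index


def lookup(addr: int):
    """Return (start, size) of the object containing addr, or None."""
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start = starts[i]
    size = by_start[start]
    return (start, size) if addr < start + size else None


hash_hit = 0xABC8 in by_start      # an interior address is not a hash key
ordered_hit = lookup(0xABC8)       # the ordered search finds the enclosing object
```

A hash keyed on start addresses misses 0xABC8, whereas the ordered search returns the object starting at 0xABC0.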

SLIDE 15

Tree-Based Address Mapping

  • Two-layer mapping
  • Improves transaction throughput by 39.6% on average.

[Figure: a partition index with O(1) lookup points to one tree per small partition (4KB); each tree is searched in O(log n)]
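
A rough sketch of the two-layer idea follows. Several parts are assumptions made for brevity: a dict plays the role of the O(1) partition index, a sorted list plays the role of each per-partition tree, and an object that crosses a 4KB boundary is registered in every partition it overlaps.

```python
import bisect

PARTITION_BITS = 12  # 4KB partitions, as on the slide


class TwoLayerMap:
    def __init__(self):
        # Layer 1: partition number -> sorted list of (start, size).
        self.partitions: dict[int, list[tuple[int, int]]] = {}

    def insert(self, start: int, size: int) -> None:
        # Assumption: register the object in every partition it overlaps.
        first = start >> PARTITION_BITS
        last = (start + size - 1) >> PARTITION_BITS
        for p in range(first, last + 1):
            bisect.insort(self.partitions.setdefault(p, []), (start, size))

    def lookup(self, addr: int):
        # O(1) step: choose the partition from the high-order address bits.
        entries = self.partitions.get(addr >> PARTITION_BITS, [])
        # O(log n) step: search only within that small partition.
        i = bisect.bisect_right(entries, (addr, float("inf"))) - 1
        if i >= 0:
            start, size = entries[i]
            if start <= addr < start + size:
                return (start, size)
        return None


m = TwoLayerMap()
m.insert(0x1FF0, 32)   # spans the boundary between two partitions
m.insert(0x2100, 16)
```

With these two objects, `m.lookup(0x2008)` resolves to the object that started in the previous partition, and `m.lookup(0x2104)` resolves to the second object.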

SLIDE 16

Tree-Based Address Mapping

  • Skip list
  • A probabilistically balanced tree. No complex balancing operations → No locking for read-only operations.
  • Improves transaction throughput by 48.9% with four threads.
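
For readers unfamiliar with skip lists, here is a minimal single-threaded sketch. It only shows how random node heights replace rebalancing; it does not demonstrate the lock-free reads mentioned on the slide. The class names, the maximum level of 8, and the assumption of unique keys are all illustrative choices.

```python
import random


class Node:
    def __init__(self, key, value, level):
        self.key, self.value = key, value
        self.forward = [None] * level  # one next-pointer per level


class SkipList:
    MAX_LEVEL = 8

    def __init__(self, seed=0):
        self.head = Node(None, None, self.MAX_LEVEL)
        self.rng = random.Random(seed)

    def _random_level(self):
        # Coin flips decide the height; this replaces explicit rebalancing.
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key, value):
        # Assumes keys are unique (no in-place update of an existing key).
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in reversed(range(self.MAX_LEVEL)):
            while node.forward[lvl] is not None and node.forward[lvl].key < key:
                node = node.forward[lvl]
            update[lvl] = node
        new = Node(key, value, self._random_level())
        for lvl in range(len(new.forward)):
            new.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = new

    def floor(self, key):
        """Return (k, v) with the largest k <= key, or None."""
        node = self.head
        for lvl in reversed(range(self.MAX_LEVEL)):
            while node.forward[lvl] is not None and node.forward[lvl].key <= key:
                node = node.forward[lvl]
        return None if node is self.head else (node.key, node.value)


sl = SkipList()
for start, size in [(0xABC0, 24), (0xABB4, 16), (0xAC00, 8)]:
    sl.insert(start, size)
```

A floor query such as `sl.floor(0xABC8)` returns the entry with the largest start address not exceeding the queried address, which is the candidate object for an address lookup.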

SLIDE 17

Tree-Based Address Mapping

  • Group update
  • Within each transaction, all writes are first buffered in DRAM.
  • Writes with contiguous addresses are combined on transaction commit.
  • Improves transaction throughput by 42.3% on average.
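
Combining contiguous writes at commit time can be sketched as a single pass over the sorted write buffer. The `coalesce` function is a hypothetical helper, and the sketch assumes the buffered writes do not overlap.

```python
def coalesce(writes: list[tuple[int, bytes]]) -> list[tuple[int, bytes]]:
    """Merge buffered (address, data) writes whose ranges are adjacent."""
    merged: list[tuple[int, bytes]] = []
    for addr, data in sorted(writes):
        if merged and merged[-1][0] + len(merged[-1][1]) == addr:
            # This write starts exactly where the previous one ends.
            prev_addr, prev_data = merged.pop()
            merged.append((prev_addr, prev_data + data))
        else:
            merged.append((addr, data))
    return merged


buffered = [(0x108, b"BBBB"), (0x100, b"AAAAAAAA"), (0x200, b"CC")]
grouped = coalesce(buffered)
```

The three buffered writes become two: the writes at 0x100 and 0x108 are adjacent and merge into one 12-byte write, so fewer log entries and mapping updates are needed.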

SLIDE 18

Outline

  • Motivation
  • Log-Structured NVMM
  • Tree-Based Address Mapping
  • Evaluation

SLIDE 19

Evaluation

  • Environment:
  • 8-core Intel Xeon CPU E5-2637 v3 (3.5 GHz), 64 GB DRAM
  • 64-bit Linux kernel version 4.2.3
  • NVM emulation: write latency = $\max\{500\,\text{ns},\ \text{write\_size} / (1\,\text{GB/s})\}$
  • Part I: How effective are individual optimizations? – Already shown.
  • Part II: How does LSNVMM perform against traditional systems?
  • Part III: What are the inherent costs of the log-structured approach?
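
Reading the emulation rule as max{500ns, write_size / 1GB/s} (a reconstruction of the slide's formula, so treat it as an assumption), the emulated latency is easy to compute. The function and constant names below are hypothetical, and 1GB/s is taken as 10^9 bytes per second.

```python
MIN_LATENCY_NS = 500.0
BANDWIDTH_BYTES_PER_NS = 1.0  # 1 GB/s equals 1 byte per nanosecond


def emulated_write_latency_ns(write_size: int) -> float:
    """Latency charged to a write of write_size bytes under the assumed rule."""
    return max(MIN_LATENCY_NS, write_size / BANDWIDTH_BYTES_PER_NS)


small = emulated_write_latency_ns(64)     # latency floor dominates
large = emulated_write_latency_ns(4096)   # bandwidth term dominates
```

Under this reading, writes below 500 bytes all cost the 500ns floor, and larger writes scale linearly with their size.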

SLIDE 20

Evaluation

  • Fragmentation: Compared to Hoard and jemalloc
  • Workloads 1 ~ 3 collected from [S. Rumble, FAST ’14].
  • Hoard/jemalloc produces 25.3%/35.0% fragmentation on average.

➢ Log-structured NVM (LSNVMM) produces 4.5% fragmentation on average.

SLIDE 21

Evaluation

  • Transaction throughput compared to Mnemosyne
  • With 4 threads, log-structured NVMM performs 44.7% and 80.8% better than Mnemosyne and Mnemosyne-Undo, respectively, on average.

SLIDE 22

Conclusion

  • Takeaway I: Applying the log-structured approach to NVMM can largely reduce memory fragmentation and improve system performance.
  • Takeaway II: A tree-based address mapping mechanism can be made efficient to serve log-structured NVMM.
  • Thank you!
  • Q & A

SLIDE 23

Evaluation

  • Cost of log cleaning
  • The performance degradation due to log cleaning is 8% at 90% memory utilization.

SLIDE 24

Tree-Based Address Mapping

  • Hot tree node cache
  • A thread-local cache that references recently accessed nodes of the trees.
  • A special hash table design: Deliberately high collision.
  • Motivation: With an ordinary hash, addresses within a cached node are not hit, because their hash values are randomly distributed.
  • Solution: Use high-order bits of an address as its hash value.
  • Improves transaction throughput by 30.1% on average.

[Figure: looking up 0xABC08 in a cache with slots 0xABB*, 0xABC*, 0xABD*; the 0xABC* slot references the node holding 0xABC00 (size=24) and 0xABCD0 (size=16). Collision and found!]
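
The deliberately colliding hash can be sketched as follows. The 8-bit node span matches the figure's 0xABC* prefix, but it, the cache size, and the `HotNodeCache` class are assumptions; the thread-local aspect is left out.

```python
NODE_BITS = 8      # assumed: one tree node covers a 256-byte address range
CACHE_SLOTS = 64   # assumed cache size


class HotNodeCache:
    def __init__(self):
        self.slots = [None] * CACHE_SLOTS  # each slot holds (node_tag, node)

    @staticmethod
    def _tag(addr: int) -> int:
        # High-order bits: every address inside one node shares this tag,
        # so all of them deliberately collide on the same slot.
        return addr >> NODE_BITS

    def put(self, addr: int, node) -> None:
        tag = self._tag(addr)
        self.slots[tag % CACHE_SLOTS] = (tag, node)

    def get(self, addr: int):
        tag = self._tag(addr)
        entry = self.slots[tag % CACHE_SLOTS]
        return entry[1] if entry is not None and entry[0] == tag else None


cache = HotNodeCache()
node = {0xABC00: 24, 0xABCD0: 16}   # objects held by one tree node (from the figure)
cache.put(0xABC00, node)
hit = cache.get(0xABC08)            # different address, same high-order bits
miss = cache.get(0xABD08)           # neighbouring node, not cached
```

After caching the node through 0xABC00, a later access to 0xABC08 lands on the same slot and finds the node, which is the "collision and found" case in the figure.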

SLIDE 25

Backup

  • Recovery time (10GB logs)

SLIDE 26

Backup

  • DRAM footprint (1GB data)