Virtualize and Share Non-Volatile Memory in User Space


SLIDE 1

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

Chih Chieh Chou, Jaemin Jung, A. L. Narasimha Reddy, Paul Gratz, and Doug Voigt

Virtualize and Share Non-Volatile Memory in User Space

May 23, 2019

SLIDE 2

Outline

  • Introduction
  • Motivation and Goal
  • Architecture
  • Conclusions
  • Acknowledgements
SLIDE 3

Introduction

  • Non-volatile memory (NVM) has become a promising storage device because of several attractive properties
    – Byte-addressability
    – Non-volatility
    – Low latency
    – Low idle power (except for NVDIMM)

[Figure: HPE 8GB NVDIMM Single Rank x4 DDR4-2133 module]

SLIDE 4

Introduction

  • Unlike DRAM and disk, there is no agreement so far on how to deploy NVM (that is, in which layer of the memory hierarchy to place it)
  • 1. Use DRAM as a cache of NVM (without non-volatility)
  • 2. Use NVM as a cache of disk (without byte-addressability)

[Figure: candidate memory hierarchies built from DRAM, NVM, and disk, each with a different device acting as the cache]

Can we do more?

SLIDE 5

Challenge

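  • NVM attaches directly to the memory bus as a DIMM, below the CPU cache
    – Data still in the cache are “not persistent” after power cycling
  • Need write ordering! (Solution: logs and transactions)

[Figure: the cache holds updated values A’, B’, C’ while NVM holds A, B, C. Cache lines reach NVM in no guaranteed order, so when the system crashes NVM may hold a mix such as A’, B, C’.]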
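The write-ordering requirement can be illustrated with a minimal redo-log sketch. This is not the deck's implementation: `persist_barrier`, `struct log_record`, `log_write`, and `log_apply` are hypothetical names, and the barrier is only a stand-in for the cache-line flush plus fence that real NVM hardware would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a persist barrier. On real hardware this would
 * flush the affected cache lines and fence; here it only orders the stores. */
static void persist_barrier(void) {
    atomic_thread_fence(memory_order_seq_cst);
}

/* One redo-log record: the new value plus a flag that marks it complete. */
struct log_record {
    uint64_t slot;   /* index of the target word in the data region */
    uint64_t value;  /* new value */
    uint64_t valid;  /* written last, so a torn record is never replayed */
};

/* Log the new value first, then mark the record valid, in that order. */
static void log_write(struct log_record *rec, uint64_t slot, uint64_t value) {
    rec->slot = slot;
    rec->value = value;
    persist_barrier();   /* payload is ordered before the flag */
    rec->valid = 1;
    persist_barrier();   /* from here on the record counts as committed */
}

/* Apply a valid record to the data region (also what recovery would do).
 * Returns 1 if a record was applied, 0 if the record was incomplete. */
static int log_apply(struct log_record *rec, uint64_t *data) {
    if (!rec->valid)
        return 0;        /* incomplete record: ignore it */
    data[rec->slot] = rec->value;
    persist_barrier();
    rec->valid = 0;      /* retire the record only after the update */
    persist_barrier();
    return 1;
}
```

Because the in-place update happens only after the record is marked valid, a crash at any point leaves either the old value or a replayable record, never a half-applied change.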

SLIDE 6

Motivations

  • Several prior works focus on building a specific file system tailored for NVM
    • SCMFS (SC’11), NOVA (FAST’16, MSST’17), Strata (SOSP’17)
    – Users are limited to those file systems
    – No concurrency
    – System calls are too expensive and squander the low latency provided by NVM
  • Handling almost everything in user space provides much better performance
    • Intel SPDK (https://spdk.io): user-space, polling-based NVMe driver
    – ULL SSD: Intel Optane SSD/Samsung Z-NAND

SLIDE 7

Motivations

  • SNIA NVM Programming Model/Intel PMDK (https://pmem.io/pmdk/)
    – Use the mmap interface to access NVM
  • Virtualize and share NVM (between processes), like virtual memory (mmap)
    – Virtual NVM capacity can exceed the physically available capacity
  • Leverage the storage device as the final destination of data
  • Leverage DRAM as a cache
    – Performance: better latency; avoids log searching
    – Limited write endurance of PCM: reduce writes to NVM
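The mmap access form mentioned on this slide can be sketched with plain POSIX calls. This is an illustration only: an ordinary file stands in for an NVM-backed (DAX) file, and `map_shared_file` and `store_and_sync` are hypothetical helper names, not part of PMDK or vNVML.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `size` bytes of the file at `path` as shared memory, creating and
 * resizing the file if needed. Returns NULL on failure. */
static void *map_shared_file(const char *path, size_t size) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Store a string through the mapping with ordinary memory writes, then
 * flush the mapped range to the backing file. Returns 0 on success. */
static int store_and_sync(void *base, size_t size, const char *msg) {
    size_t n = strlen(msg) + 1;
    if (n > size)
        return -1;
    memcpy(base, msg, n);
    return msync(base, size, MS_SYNC);
}
```

Two mappings of the same file with MAP_SHARED see each other's stores, which is the sharing behavior the slide compares to virtual memory.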

SLIDE 8

Our Goals

  • User space

– library

  • Transactional interface

– Log

  • mmap-like access form
  • Virtualization and sharing of NVM

– Leverage storage device

  • DRAM cache
SLIDE 9

Methodology

  • Leverage the existing mmap function
  • Integrate DRAM, NVM, and SSD to provide virtual NVM
    – Treat (DRAM + NVM + SSD) as a huge NVM pool
    – Its performance is very close to that of NVM (or DRAM)

SLIDE 10

Methodology

  • User space library: vNVML
    – Access NVM only through vNVML
    – Support concurrent access by multiple processes
    – Allocate (virtual) NVM regardless of the actual NVM size

[Figure: App1 and App2 access DRAM, NVM, and storage through vNVML]

SLIDE 11

Example

    ptr = nv_allocate(filepath, filesize, mode);
    tid = nv_txbegin();                             // TX starts
    x = *ptr;                                       // read
    y = *(ptr + sizeof(x));                         // read
    x = 1;
    y = 2;
    nv_write(tid, ptr, &x, sizeof(x));              // write
    nv_write(tid, ptr + sizeof(x), &y, sizeof(y));  // write
    nv_commit(tid);                                 // TX commits

SLIDE 12

Components

[Figure: a process's virtual address space maps the file on storage (mmapping), and maps the NVM log buffer, cache, and metadata as a SHM object (DAX shared mapping)]

  • Limitations/challenges:
  • 1. The file system must support mmap
  • 2. Virtual addresses cannot be stored in NVM
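A common way to live with the second limitation is to store offsets from the start of the mapped region instead of virtual addresses, since each process may map the region at a different address. The sketch below shows that general technique; it is not taken from vNVML, and `nv_off_t`, `to_offset`, `from_offset`, and `struct node` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A persistent "pointer": a byte offset from the start of the mapped region.
 * Offset 0 is reserved to mean "null", so no object may live at offset 0. */
typedef uint64_t nv_off_t;

/* Convert an address inside the region into a position-independent offset. */
static nv_off_t to_offset(const void *base, const void *p) {
    return p ? (nv_off_t)((const char *)p - (const char *)base) : 0;
}

/* Convert a stored offset back into an address valid in this mapping. */
static void *from_offset(void *base, nv_off_t off) {
    return off ? (char *)base + off : NULL;
}

/* A list node that is safe to keep in NVM: it links by offset. */
struct node {
    uint64_t value;
    nv_off_t next;
};
```

Every dereference pays one extra addition, but the stored data stays valid no matter where the region is mapped after a restart or in another process.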
SLIDE 13

Components

[Figure: the virtual address spaces of Process 1 and Process 2 each map files on storage and the same NVM SHM object (log buffer, cache, metadata)]

SLIDE 14

Data access flow

[Figure: File A on storage is mapped with a private mmap; NVM holds the log, the cache, and metadata. The arrows are numbered as follows.]

  • 1. write
  • 2. commit
  • 3. read
  • 4. write back to SSD

SLIDE 15

R/W flows

[Figure: reads (R) and transactional writes (W, TX) flowing through the DRAM read cache, the NVM log buffer, the NVM write cache, and storage]

  • DRAM as read-only cache
  • Limitation: read committed
  • NVM as log buffer and write-only cache
  • Two background threads
    – Update the logs to the write cache
    – Update the write cache to storage
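The read-committed limitation can be seen in a toy model where writes are buffered in a log and become readable only at commit. This is a sketch under stated assumptions, not vNVML code: `toy_store`, `toy_tx`, and the `toy_*` functions are hypothetical, and plain arrays stand in for the NVM log buffer and the readable copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STORE_SIZE 64
#define MAX_PENDING 8

/* `committed` plays the role of the copy that readers see. */
struct toy_store {
    unsigned char committed[STORE_SIZE];
};

/* One logged write: target offset, length, and up to 8 bytes of payload. */
struct toy_write {
    size_t off;
    size_t len;
    unsigned char data[8];
};

/* The pending array plays the role of the per-transaction log. */
struct toy_tx {
    struct toy_write pending[MAX_PENDING];
    size_t count;
};

static void toy_txbegin(struct toy_tx *tx) { tx->count = 0; }

/* Record the write in the log only; the committed copy is untouched. */
static int toy_write(struct toy_tx *tx, size_t off, const void *src, size_t len) {
    if (tx->count == MAX_PENDING || len > sizeof tx->pending[0].data ||
        off > STORE_SIZE || len > STORE_SIZE - off)
        return -1;
    struct toy_write *w = &tx->pending[tx->count++];
    w->off = off;
    w->len = len;
    memcpy(w->data, src, len);
    return 0;
}

/* Apply the logged writes in order, making them visible to readers. */
static void toy_commit(struct toy_store *s, struct toy_tx *tx) {
    for (size_t i = 0; i < tx->count; i++)
        memcpy(s->committed + tx->pending[i].off, tx->pending[i].data,
               tx->pending[i].len);
    tx->count = 0;
}

/* Abort simply drops the logged writes. */
static void toy_abort(struct toy_tx *tx) { tx->count = 0; }
```

A read issued between `toy_write` and `toy_commit` still returns the old value, which is the behavior the slide lists as a limitation.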

SLIDE 16

Log structure

[Figure: open lists and a committed list of log objects, one per transaction (tid 1, tid 2, ..., tid N), each linked to its page objects (P = page object, i.e. log page). A log object such as tid 37 leaves the open lists in one of two ways.]

  • 1. Committed
  • 2. Abort

Limitation: the transaction that writes first must commit first (only when transactions write the same object)

SLIDE 17

NVM Cache Management

[Figure: state diagram of NVM cache pages across three lists: dirty list, clean list, and free list. Labeled transitions: log content adoption (after commit); writeback (when over 30% of pages are dirty); cache hit; cache miss.]
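The writeback trigger can be sketched as a small page-state model. Only the three states and the 30% threshold come from the slide; the names below are hypothetical, and flushing every dirty page in one pass is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>

enum page_state { PAGE_FREE, PAGE_CLEAN, PAGE_DIRTY };

#define CACHE_PAGES 10u

/* A toy cache: one state per page instead of three linked lists. */
struct toy_cache {
    enum page_state state[CACHE_PAGES];
};

static size_t count_state(const struct toy_cache *c, enum page_state s) {
    size_t n = 0;
    for (size_t i = 0; i < CACHE_PAGES; i++)
        if (c->state[i] == s)
            n++;
    return n;
}

/* Adopting committed log content into page i makes the page dirty. */
static void adopt_log(struct toy_cache *c, size_t i) {
    c->state[i] = PAGE_DIRTY;
}

/* Writeback is due when more than 30% of all pages are dirty. */
static int writeback_due(const struct toy_cache *c) {
    return count_state(c, PAGE_DIRTY) * 10u > CACHE_PAGES * 3u;
}

/* Assumed policy: write back every dirty page, which then becomes clean.
 * Returns the number of pages written back. */
static size_t writeback(struct toy_cache *c) {
    size_t n = 0;
    for (size_t i = 0; i < CACHE_PAGES; i++) {
        if (c->state[i] == PAGE_DIRTY) {
            c->state[i] = PAGE_CLEAN;
            n++;
        }
    }
    return n;
}
```

With ten pages, three dirty pages are exactly 30% and do not trigger writeback; the fourth does.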

SLIDE 18

Shared files

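[Figure: File A on storage is mapped with a shared mmap. Writes (W) enter the NVM log buffer; reads (R) go through the mapping in DRAM. Transactions on the committed list (tid N1 to tid N4) are “digested” by a background thread, with msync to storage.]

Limitations:

  • 1. The transaction that writes first must commit first (only when transactions write the same object)
  • 2. All writes of a TX must go to the same shared file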
SLIDE 19

Recovery after crashing

[Figure: three states (a), (b), (c) of a linked list over nodes A, B, C]

  • Dirty pages: check dirty bits
  • Logs of committed list: leverage the 8-byte atomicity (pointer) of the CPU
  • Insert: (a) => (b) => (c)
  • Delete: (c) => (b) => (a)
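The insert sequence can be sketched as a list update that becomes visible through a single pointer-sized store. This is an illustration, not the deck's code: `struct log_node` and both functions are hypothetical, raw pointers are used for brevity although NVM-resident data would hold offsets instead, and the persist barriers real NVM needs are marked only as comments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct log_node {
    int tid;
    _Atomic(struct log_node *) next;
};

/* Insert `n` after `prev`. The list is well formed before and after the
 * single publishing store, so a crash leaves either the old or new list. */
static void list_insert_after(struct log_node *prev, struct log_node *n) {
    /* (a) => (b): the new node points at the successor but is unreachable */
    atomic_store(&n->next, atomic_load(&prev->next));
    /* a persist barrier would go here on real NVM */
    /* (b) => (c): one pointer-sized atomic store publishes the node */
    atomic_store(&prev->next, n);
}

/* Remove the node after `prev` with one pointer-sized store, the mirror
 * image of insertion. Returns the removed node, or NULL if there is none. */
static struct log_node *list_remove_after(struct log_node *prev) {
    struct log_node *victim = atomic_load(&prev->next);
    if (victim)
        atomic_store(&prev->next, atomic_load(&victim->next));
    return victim;
}
```

Recovery never needs to repair a half-linked node in this scheme, because a node is either fully reachable or not reachable at all.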
SLIDE 20

Experiment methodology

  • YCSB + MongoDB + Library
    – YCSB generates read/write traffic (workload) to MongoDB
      • Fixed size record: 64KB
      • Run 100K records for each experiment
    – MongoDB accesses the NVM through the library
    – Baseline: MongoDB writes its files directly to NVM, with journaling/msync disabled
  • Platform setting:
    – 12GB DRAM, 12GB emulated NVM
    – CPU: 4 cores
    – 4 MongoDB instances run concurrently

[Figure: software stack: YCSB, MongoDB, vNVML, over DRAM, NVM, and storage]

SLIDE 21

Evaluation

  • Assuming the NVM size is fixed, how should it be partitioned between the log buffer and the cache?
  • How does vNVML perform compared to other libraries?

SLIDE 22

Results of fixed cache size

  • NVM cache size is 4GB; the record number is the size of the data set in MongoDB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000, 15000, 20000, 25000, 30000 records; 0.6GB, 0.9GB, 1.22GB, 1.5GB, 1.8GB per instance; 2.4GB, 3.6GB, 4.88GB, 6.0GB, 7.2GB for 4 instances) for log sizes 2G, 1G, 512M, 128M. Panels: Uniform, W/R=95/5; Zipfian, W/R=95/5.]
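The data set sizes on the x-axis follow from the 64KB record size and the four MongoDB instances given in the experiment setup. The helper names below are hypothetical, and reading "GB" as 2^30 bytes is an assumption (it is the reading that reproduces the 1.22GB figure for 20000 records).

```c
#include <assert.h>
#include <stdint.h>

#define RECORD_BYTES (64u * 1024u)  /* fixed record size: 64KB */

/* Total data set size in bytes for a record count and instance count. */
static uint64_t dataset_bytes(uint64_t records, uint64_t instances) {
    return records * RECORD_BYTES * instances;
}

/* Convert bytes to GB, taking 1GB = 2^30 bytes (an assumption). */
static double to_gb(uint64_t bytes) {
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For example, 20000 records come to about 1.22GB per instance and about 4.88GB across four instances, which is already larger than the 4GB NVM cache used in these runs.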

SLIDE 23

Results of fixed cache size

  • NVM cache size is 4GB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances) for log sizes 2G, 1G, 512M, 128M. Panels: Uniform, W/R=50/50; Zipfian, W/R=50/50.]

SLIDE 24

Results of fixed cache size

  • NVM cache size is 4GB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances) for log sizes 2G, 1G, 512M, 128M. Panels: Uniform, W/R=30/70; Zipfian, W/R=30/70.]

SLIDE 25

Results of fixed cache size

  • NVM cache size is 4GB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances) for log sizes 2G, 1G, 512M, 128M. Panels: Uniform, W/R=5/95; Zipfian, W/R=5/95.]

SLIDE 26

Results of fixed log buffer size

  • NVM log buffer size is 2GB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances) for cache sizes 8G, 4G, 2G, 1G. Panels: Uniform, W/R=95/5; Zipfian, W/R=95/5.]

SLIDE 27

Results of fixed log buffer size

  • NVM log buffer size is 2GB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances) for cache sizes 8G, 4G, 2G, 1G. Panels: Uniform, W/R=50/50; Zipfian, W/R=50/50.]

SLIDE 28

Results of fixed log buffer size

  • NVM log buffer size is 2GB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances) for cache sizes 8G, 4G, 2G, 1G. Panels: Uniform, W/R=30/70; Zipfian, W/R=30/70.]

SLIDE 29

Results of fixed log buffer size

  • NVM log buffer size is 2GB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances) for cache sizes 8G, 4G, 2G, 1G. Panels: Uniform, W/R=5/95; Zipfian, W/R=5/95.]

SLIDE 30

Results of read only case

  • NVM log buffer size is 128MB, cache size is 4GB

[Figure: normalized throughput (0.1 to 1) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances). Panels: Uniform, W/R=0/100; Zipfian, W/R=0/100.]

SLIDE 31

Results of docker container using bind mount

  • NVM log buffer size is 2GB
  • Baseline: access the library from normal processes with the same setting

[Figure: normalized throughput (0.1 to 1.2) vs. record count (10000 to 30000 records; 0.6GB to 1.8GB per instance, 2.4GB to 7.2GB for 4 instances) for cache sizes 8G, 4G, 2G, 1G. Panels: Zipfian, W/R=30/70; Zipfian, W/R=95/5.]

SLIDE 32

Comparison of other libraries

  • We use a microbenchmark to compare three libraries: vNVML, PMDK, and SoftWrAP (MSST’15)
  • We allocate a 2GB array in NVM and write a certain amount of data to each 4K page until we have written all pages in the 2GB NVM array
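The access pattern of the microbenchmark can be sketched as a loop over 4K pages. This is a scaled-down illustration with hypothetical names: an ordinary heap buffer stands in for the 2GB NVM array, and plain `memset` stands in for the transactional writes each library under test would issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u

/* Write `bytes_per_page` bytes at the start of every page, `passes` times.
 * Returns the total number of bytes written. */
static size_t write_pages(unsigned char *array, size_t array_size,
                          size_t bytes_per_page, unsigned passes) {
    assert(bytes_per_page <= DEMO_PAGE_SIZE);
    size_t written = 0;
    for (unsigned p = 0; p < passes; p++) {
        for (size_t off = 0; off + DEMO_PAGE_SIZE <= array_size;
             off += DEMO_PAGE_SIZE) {
            /* each pass stores a different byte value (p + 1) */
            memset(array + off, (int)(p + 1), bytes_per_page);
            written += bytes_per_page;
        }
    }
    return written;
}
```

Varying `bytes_per_page` corresponds to the amount of data written per page, and `passes` distinguishes writing the array once from writing it repeatedly.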

SLIDE 33

Results

[Figure: total execution time (y-axis) vs. amount of data written to each page (x-axis). Panels: write 2GB NVM array once; write 2GB NVM array 16 times.]

SLIDE 34

Conclusions

  • The log buffer size has little effect on performance: shrinking the log buffer from 2GB to 128MB changes throughput by less than 10%
  • vNVML delivers over 90% of the baseline throughput if the NVM cache system can handle the write traffic well
  • Accessing vNVML from a Docker container performs about the same as accessing it from normal processes

SLIDE 35

Acknowledgements

  • We thank Hewlett Packard Enterprise and the National Science Foundation for their generous support through the IUCRC (Industry–University Cooperative Research Centers) Program

SLIDE 36

Thank You