slide-1
SLIDE 1

Libnvmmio: Reconstructing SW IO Path with Failure-Atomic Memory-Mapped Interface

Jungsik Choi¹, Jaewan Hong², Youngjin Kwon², Hwansoo Han¹

USENIX ATC '20


slide-2
SLIDE 2

SW Overhead Greater than Storage Latency

[Figure: storage device latency on a time scale from ns through µs to ms. NVMM devices (NVDIMM-N, DCPMM) sit in the ns range, SSDs (Optane, XL-Flash, TLC 3D NAND) in the µs range, and HDDs in the ms range; for the fastest devices, SW overhead now exceeds the device latency itself.]

slide-3
SLIDE 3

Reconstruct SW IO Path with Libnvmmio

  • Libnvmmio
    − A library that runs on any POSIX FS providing DAX-mmap
    − Transparent MMIO with logging
  • Makes the common IO path efficient
    − Handles data ops at user level
    − Routes metadata ops to the kernel FS
  • Low-latency & scalable IO
  • Data-atomicity

[Figure: Libnvmmio architecture. Data operations (write, read, fsync) are handled at user level by Libnvmmio, which performs atomic writes through logs and memory-mapped files on NVMM; metadata operations (open, close, mmap, munmap) are routed to the kernel's NVM-aware FS.]

slide-4
SLIDE 4

User-Level IO is Suitable in NVMM system

  • The kernel's IO stack introduces SW overhead
  • User-level IO with mmap
    − Accesses files directly with load/store
    − Reduces user/kernel mode switches
    − Avoids complex IO stacks: no indexing, no permission checks
  • MMIO is the fastest way to access files

[Figure: read/write from the application traverses the OS kernel (VFS → file system → device driver) to reach NVMM, while load/store through mmap reaches NVMM directly.]
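The user-level path can be illustrated with a short sketch: once the file is mapped, data moves with ordinary loads and stores rather than a syscall per access. This is an illustrative sketch (the function name `mmio_write` is hypothetical, not Libnvmmio's API); it uses a regular file and `msync`, whereas on a DAX mount of NVMM the stores would be persisted with cache-line flushes instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy len bytes into a file at offset off through a memory mapping:
 * after mmap, the write is a plain memcpy (store instructions), not a
 * write() syscall per access. Returns 0 on success, -1 on error. */
static int mmio_write(const char *path, const void *buf,
                      size_t len, off_t off)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    size_t map_len = (size_t)off + len;
    char *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(map + off, buf, len);   /* store directly into the file */
    msync(map, map_len, MS_SYNC);  /* flush (clwb + fence on DAX NVMM) */
    munmap(map, map_len);
    close(fd);
    return 0;
}
```

The mapped region must lie within the file's size; storing past EOF in a `MAP_SHARED` mapping raises SIGBUS, so the file is sized (e.g. with `ftruncate`) before mapping.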

slide-5
SLIDE 5

Logging is More Efficient than CoW

  • CoW (or shadow paging)
    − High write amplification
    − Hugepages make CoW more expensive
    − Frequent TLB shootdowns
  • Logging (or journaling)
    − Writes data twice: to the logs, then to the files
    − Differential logging reduces the amount of logged data
    − Checkpointing can be postponed
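Differential logging can be sketched as follows: instead of copying a whole 4 KB block into the log, only the modified byte range is recorded. This is an illustrative sketch under assumed names (`struct delta_log_entry`, `log_delta` are hypothetical), not Libnvmmio's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* Hypothetical differential log entry: records only the modified
 * byte range of one 4 KB block, not the whole block. */
struct delta_log_entry {
    uint64_t block_no;         /* which file block was modified */
    uint32_t offset;           /* start of the dirty range within the block */
    uint32_t len;              /* length of the dirty range */
    char     data[BLOCK_SIZE]; /* only the first len bytes are valid */
};

/* Record a write of len bytes at file offset file_off.
 * Returns the number of bytes actually logged. */
static uint32_t log_delta(struct delta_log_entry *e,
                          uint64_t file_off, const void *src, uint32_t len)
{
    e->block_no = file_off / BLOCK_SIZE;
    e->offset   = (uint32_t)(file_off % BLOCK_SIZE);
    e->len      = len;
    memcpy(e->data, src, len); /* log the delta only */
    return len;                /* vs. BLOCK_SIZE for full-block logging */
}
```

For a 5-byte write, full-block logging would copy 4096 bytes; the delta entry copies 5.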

slide-6
SLIDE 6

Redo vs. Undo

  • Most logging systems use only one policy (redo or undo)
  • They have different pros & cons depending on access type

[Figure: UNDO logging copies old data to the log and writes new data to the file in place, so reads go straight to the file; REDO logging writes new data to the log and applies it to the file asynchronously, so reads must check the log first.]

  • REDO is better for writing, UNDO is better for reading
slide-7
SLIDE 7

Hybrid Logging

  • Uses an adaptive policy depending on each file's access type
    − Read-intensive file → undo logging
    − Write-intensive file → redo logging
  • Maintains per-file read/write counters
  • Determines the logging policy on each fsync
  • Achieves the best-case performance of both logging policies
  • Reduces SW overhead and improves logging efficiency
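The policy decision described above can be sketched in a few lines of C. The names (`enum log_policy`, the counter fields) are hypothetical and the real implementation differs, but the idea is the same: count reads and writes per file and pick the cheaper policy at each fsync.

```c
#include <assert.h>
#include <stdint.h>

enum log_policy { POLICY_UNDO, POLICY_REDO };

/* Hypothetical per-file metadata: access counters plus the
 * currently active logging policy. */
struct file_meta {
    uint64_t read_cnt;
    uint64_t write_cnt;
    enum log_policy policy;
};

/* Called on each fsync: a read-intensive file keeps reads fast with
 * undo logging (reads go straight to the file); a write-intensive
 * file keeps writes fast with redo logging (writes go to the log). */
static enum log_policy decide_policy(struct file_meta *f)
{
    f->policy = (f->read_cnt >= f->write_cnt) ? POLICY_UNDO : POLICY_REDO;
    f->read_cnt = f->write_cnt = 0; /* restart sampling for the next interval */
    return f->policy;
}
```

Resetting the counters on every fsync makes the decision track the file's recent behavior rather than its whole history.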

slide-8
SLIDE 8

Centralized Logging with Fine-Grained Locks

  • Decentralized logging was designed for transactions
  • e.g., per-thread logging, per-transaction logging
  • Centralized logging is appropriate for file IO, but not scalable
  • Requires fine-grained locks for scalable file IO

[Figure: decentralized logging attaches multiple logs (per thread or per transaction) to one file; centralized logging attaches a single log per file.]

slide-9
SLIDE 9

Per-Block Logging

[Figure: the file is divided into blocks, each with its own per-block log; the logs are indexed by a multi-level radix tree keyed on file offset.]

slide-10
SLIDE 10

Lock-Free Radix Tree

[Figure: lock-free radix tree. A file offset is decomposed into three 9-bit directory indexes (LGD/LUD/LMD: Global, Upper, and Middle directories) plus a 12-bit table offset; a skip field in radix_root lets lookups bypass unused levels. Each table entry points to an index entry holding a rwlock and log metadata (offset, len, dest, policy, epoch, size), which references the per-block delta log entry (4 KB).]
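The 9/9/9/12-bit split shown in the figure is plain bit arithmetic. A sketch (field and function names hypothetical): the low 12 bits address a byte within the 4 KB block, and each group of 9 bits above them indexes one 512-entry directory level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decomposition of a file offset into radix-tree
 * indexes, following the figure's 9-9-9-12 split:
 * [ LGD:9 | LUD:9 | LMD:9 | offset in block:12 ]. */
struct tree_index {
    uint32_t lgd; /* Global directory index (9 bits) */
    uint32_t lud; /* Upper directory index  (9 bits) */
    uint32_t lmd; /* Middle directory index (9 bits) */
    uint32_t off; /* offset within the 4 KB block (12 bits) */
};

static struct tree_index index_of(uint64_t file_off)
{
    struct tree_index ix;
    ix.off = (uint32_t)(file_off & 0xFFF);         /* low 12 bits */
    ix.lmd = (uint32_t)((file_off >> 12) & 0x1FF); /* next 9 bits */
    ix.lud = (uint32_t)((file_off >> 21) & 0x1FF);
    ix.lgd = (uint32_t)((file_off >> 30) & 0x1FF);
    return ix;
}
```

Each directory level thus spans 512 entries, and the full tree covers 39 bits of file offset (512 GB).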

slide-11
SLIDE 11

Commit & Checkpoint based on Epoch

  • Per-block logs are atomically committed on fsync
  • Libnvmmio commits by increasing the global epoch value
  • Committed logs have an epoch smaller than the global epoch
  • Background checkpointing

[Figure: on fsync, (1) Libnvmmio increases the global epoch in the per-file metadata, atomically committing the per-block logs indexed by the radix tree; (2) background threads later checkpoint the committed logs into the memory-mapped file.]
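Epoch-based committing can be sketched as follows (illustrative names; simplified to a single global counter where Libnvmmio keeps per-file epochs): each log entry is stamped with the epoch current at write time, fsync commits everything outstanding with one atomic increment, and the background checkpointer treats any entry whose epoch is below the global epoch as committed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global epoch (per-file in the real design). */
static atomic_uint_fast64_t global_epoch = 1;

struct log_entry_hdr {
    uint64_t epoch; /* epoch the entry was written in */
};

/* New log entries are stamped with the current epoch. */
static void stamp_entry(struct log_entry_hdr *e)
{
    e->epoch = atomic_load(&global_epoch);
}

/* fsync: one atomic increment commits every outstanding entry. */
static void commit_on_fsync(void)
{
    atomic_fetch_add(&global_epoch, 1);
}

/* Background checkpointer: entries older than the global epoch are
 * committed and may be applied to the mapped file. */
static bool is_committed(const struct log_entry_hdr *e)
{
    return e->epoch < atomic_load(&global_epoch);
}
```

Commit cost is independent of how many entries are outstanding, which is why checkpointing can safely be deferred to background threads.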

slide-12
SLIDE 12

Design Summary

  • Low-latency IO
    − User-level IO with mmap
    − Differential logging
    − Hybrid logging
    − Various log sizes
    − Epoch-based committing
    − Background checkpointing
  • Scalable IO
    − Per-block logging
    − Lock-free index data structure

Libnvmmio provides low-latency and scalable IO

while guaranteeing data-atomicity

slide-13
SLIDE 13

Experimental Setup

  • Experimental Machines
  • 32GB NVDIMM-N, 20 cores and 32GB DRAM
  • 256GB Optane DC, 16 cores and 128GB DRAM (in our paper)
  • Comparison systems

Filesystem   File IO   Data-Atomicity   Kernel
Ext4-DAX     Kernel    X                5.1
PMFS         Kernel    X                4.13
NOVA         Kernel    O                5.1
SplitFS      User      O                4.13
Libnvmmio*   User      O                5.1

slide-14
SLIDE 14

Hybrid Logging

[Figure: elapsed time (sec) over read:write ratios from 0:100 to 100:0, comparing Undo, Redo, and Hybrid logging; Hybrid tracks the better of the two pure policies at every ratio.]

slide-15
SLIDE 15

FIO: Different Access Patterns

  • A single thread, file size=4GB, block size=4KB, time=60s

[Figure: FIO bandwidth (GiB/s) for access patterns SR, RR, SW, and RW, comparing Ext4-DAX, PMFS, NOVA, and Libnvmmio.]

slide-16
SLIDE 16

FIO: Different Write Sizes

[Figure: FIO bandwidth (GiB/s) for write sizes of 128B, 1KB, 4KB, 64KB, and 1MB, comparing Ext4-DAX, PMFS, NOVA, and Libnvmmio.]

slide-17
SLIDE 17

FIO: Random Write with Multithreads

[Figure: FIO random-write bandwidth (GiB/s) with 1, 2, 4, 8, and 16 threads, for a private file (left) and a shared file (right), comparing Ext4-DAX, PMFS, NOVA, and Libnvmmio.]

slide-18
SLIDE 18

TPC-C on SQLite


  • The underlying FS runs SQLite with its WAL; Libnvmmio runs without the WAL

[Figure: normalized tpmC over Ext4-DAX, PMFS, NOVA, and SplitFS: each FS alone (with SQLite WAL) vs. Libnvmmio on top of that FS (without WAL).]

slide-19
SLIDE 19

SQLite WAL vs. Libnvmmio

  • SQLite WAL
    − Designed for block devices
    − Similar to redo logging
    − Reads both the WAL and the DB file
    − Only one writer at a time
    − Synchronous checkpointing
  • Libnvmmio
    − Designed for NVMM
    − Hybrid logging
    − Reads only the DB file (undo)
    − Concurrent writes
    − Background checkpointing
  • Easily improves performance with Libnvmmio
    − Supports any FS, even one that does not provide data-atomicity

slide-20
SLIDE 20

Conclusion

  • It is important to minimize SW overhead in NVMM systems
  • Libnvmmio is a simple and practical solution
    − Reconstructs the SW IO path
    − Runs on any filesystem that provides DAX-mmap
    − Low-latency, scalable IO while guaranteeing data-atomicity
    − 2.2x better throughput
    − 13x better scalability
  • https://github.com/chjs/libnvmmio

slide-21
SLIDE 21

Q&A

chjs@skku.edu