Revisiting Virtual File System for Metadata Optimized Non-Volatile Main Memory File System - PowerPoint PPT Presentation



slide-1
SLIDE 1

Revisiting Virtual File System for Metadata Optimized Non-Volatile Main Memory File System

Ying Wang, Dejun Jiang, Jin Xiong Institute of Computing Technology, CAS University of Chinese Academy of Sciences

slide-2
SLIDE 2

Outline

  • Background & Motivation
  • Design

– Cachelet for metadata caching – Global hash based metadata index – Metadata scalability

  • Evaluation
  • Summary

2

slide-3
SLIDE 3

Background

  • Non-Volatile Main Memories (NVMMs) provide low latency,

high bandwidth, byte-addressable and persistent storage

– PCM, MRAM, RRAM, 3D XPoint[1]

  • Intel releases Optane DC Persistent Memory (Optane PMM)

3

[6]
            R lat.   W lat.   R BW       W BW
DRAM        60 ns    69 ns    20 GB/s    ~15 GB/s
Optane PMM  305 ns   81 ns    ~6 GB/s    ~2 GB/s
NVMe SSD    120 us   30 us    2 GB/s     500 MB/s
HDD         10 ms    10 ms    0.1 GB/s   0.1 GB/s

[1] What is Intel Optane DC Persistent Memory. Intel. [2] Condit, SIGOPS 2009 [3] Wu, SC 2011 [4] Dulloor, EuroSys 2014 [5] Haris, EuroSys 2014 [6] The data from our evaluation and the paper of “Basic Performance Measurements of the Intel Optane DC Persistent Memory Module”

slide-4
SLIDE 4

Background

  • Non-Volatile Main Memories (NVMMs) provide low latency,

high bandwidth, byte-addressable and persistent storage

– PCM, MRAM, RRAM, 3D XPoint[1]

  • Intel releases Optane DC Persistent Memory (Optane PMM)
  • File system can be directly built on memory

– Software has become the main factor affecting file system performance and scalability[2,3,4,5]

[Figure: the file system sits directly on NVMM over the CPU memory bus, replacing the block I/O path]

[6]
            R lat.   W lat.   R BW       W BW
DRAM        60 ns    69 ns    20 GB/s    ~15 GB/s
Optane PMM  305 ns   81 ns    ~6 GB/s    ~2 GB/s
NVMe SSD    120 us   30 us    2 GB/s     500 MB/s
HDD         10 ms    10 ms    0.1 GB/s   0.1 GB/s

[1] What is Intel Optane DC Persistent Memory. Intel. [2] Condit, SIGOPS 2009 [3] Wu, SC 2011 [4] Dulloor, EuroSys 2014 [5] Haris, EuroSys 2014 [6] The data from our evaluation and the paper of “Basic Performance Measurements of the Intel Optane DC Persistent Memory Module”

slide-5
SLIDE 5

Background

  • Existing kernel-level NVMM file systems

slide-6
SLIDE 6

Background

  • Existing kernel-level NVMM file systems

– Remove page cache, generic block layer and I/O scheduler layer


slide-7
SLIDE 7

Background

  • Existing kernel-level NVMM file systems

– Remove page cache, generic block layer and I/O scheduler layer – Retain virtual file system(VFS)[1,2,3,4]


[Figure: Application (user space) calls into VFS and the NVMM FS, which run in the kernel directly on NVMM]

slide-8
SLIDE 8

Background

  • Existing kernel-level NVMM file systems

– Remove page cache, generic block layer and I/O scheduler layer – Retain virtual file system(VFS)[1,2,3,4]

  • dentry -> dcache

– Speed up path lookup and maintain a unified namespace

  • inode -> icache

– Speed up file metadata access



slide-9
SLIDE 9

Background

  • File metadata operation type

– Lookup – Update

slide-10
SLIDE 10

Background

  • File metadata operation type

– Lookup

  • VFS warm cache (Cache hit)
  • VFS cold cache (Cache miss)

– Update

slide-11
SLIDE 11

Background

  • File metadata operation type

– Lookup

  • VFS warm cache (Cache hit)
  • VFS cold cache (Cache miss)

– Update

  • Only lookup in VFS
slide-12
SLIDE 12

Background

  • File metadata operation type

– Lookup

  • VFS warm cache (Cache hit)
  • VFS cold cache (Cache miss)

– Update

  • Only lookup in VFS

Cold cache miss: look up in both VFS and the physical FS, then build the VFS cache

slide-13
SLIDE 13

Background

  • File metadata operation type

– Lookup

  • VFS warm cache (Cache hit)
  • VFS cold cache (Cache miss)

– Update

  • Only lookup in VFS

Cold cache miss: look up in both VFS and the physical FS, then build the VFS cache
Update: update both VFS and PFS

slide-14
SLIDE 14

Background

  • File metadata operation type

– Lookup

  • VFS warm cache (Cache hit)
  • VFS cold cache (Cache miss)

– Update

  • Only lookup in VFS

Cold cache miss: look up in both VFS and the physical FS, then build the VFS cache
Update: update both VFS and PFS

Types    Syscalls   Examples
Lookup   20         open(lookup), stat, access
Update   29         open(create), remove, rename, chown

slide-15
SLIDE 15

Motivation

  • The latency of NVMM is close to DRAM and supports high

concurrent access

slide-16
SLIDE 16

Motivation

  • The latency of NVMM is close to DRAM and supports high

concurrent access

– The metadata performance of physical file system is close to VFS

slide-17
SLIDE 17

Motivation

  • The latency of NVMM is close to DRAM and supports high

concurrent access

– The metadata performance of physical file system is close to VFS – File system requires high concurrent software support

slide-18
SLIDE 18

Motivation

  • The latency of NVMM is close to DRAM and supports high

concurrent access

– The metadata performance of physical file system is close to VFS – File system requires high concurrent software support

  • Traditional metadata management is not suitable for NVMM file systems

slide-19
SLIDE 19

Motivation

  • The latency of NVMM is close to DRAM and supports high

concurrent access

– The metadata performance of physical file system is close to VFS – File system requires high concurrent software support

  • Traditional metadata management is not suitable for NVMM file systems

– Two-layer metadata lookup and maintenance overhead – Low-scalability metadata operations

slide-20
SLIDE 20

Motivation

  • Two-layer metadata lookup and maintenance

– The latency of NVMM is close to DRAM, yet VFS and the physical file system each maintain a copy of the metadata

[Figure: metadata time as a share of total execution time (0-70%), for 4KB/16KB read and write on ext4-dax and NOVA, split into VFS and PFS portions]

slide-21
SLIDE 21

Motivation

  • Two-layer metadata lookup and maintenance

– The latency of NVMM is close to DRAM, yet VFS and the physical file system each maintain a copy of the metadata

  • In ext4-dax, metadata overhead accounts for 49.1% of execution time, of which VFS lookup overhead accounts for 21.2%


slide-22
SLIDE 22

Motivation

  • Low-scalability metadata operations
[Figure: metadata operation throughput (M ops/s) vs. thread count (1-24): create file, delete file, cold lookup, warm lookup]

slide-23
SLIDE 23

Motivation

  • Low-scalability metadata operations

– All metadata operations that reach the physical file system must lock the parent directory


slide-24
SLIDE 24

Motivation

  • Low-scalability metadata operations

– All metadata operations that reach the physical file system must lock the parent directory

  • Limit the scalability of file system metadata operations
  • Create file, delete file

slide-25
SLIDE 25

Motivation

  • Low-scalability metadata operations

– All metadata operations that reach the physical file system must lock the parent directory

  • Limit the scalability of file system metadata operations
  • Create file, delete file

– When metadata is added/deleted in VFS, the VFS lock limits scalability


slide-26
SLIDE 26

Motivation

  • VFS results in two-copy metadata overhead and limits metadata scalability in NVMM file systems

[Figure: time (us) of open, stat, remove under cold and warm cache, split into VFS and PFS portions]

slide-27
SLIDE 27

Motivation

  • VFS results in two-copy metadata overhead and limits metadata scalability in NVMM file systems

– Can we directly delete the metadata cache in VFS?


slide-28
SLIDE 28

Motivation

  • VFS results in two-copy metadata overhead and limits metadata scalability in NVMM file systems

– Can we directly delete the metadata cache in VFS?

  • Compared with lookups that hit the VFS cache, removing the VFS cache entirely yields low performance



slide-30
SLIDE 30

Contribution

  • DirectFS
slide-31
SLIDE 31

Contribution

  • DirectFS

– Cachelet: a small metadata cache in VFS

slide-32
SLIDE 32

Contribution

  • DirectFS

– Cachelet: a small metadata cache in VFS – A global hash based metadata index

slide-33
SLIDE 33

Contribution

  • DirectFS

– Cachelet: a small metadata cache in VFS – A global hash based metadata index – Using fine-grained flag and atomic write to improve metadata scalability

slide-34
SLIDE 34

Contribution

  • DirectFS

– Cachelet: a small metadata cache in VFS – A global hash based metadata index – Using fine-grained flag and atomic write to improve metadata scalability

[Figure: DirectFS architecture. System calls enter unified FS interfaces; under mnt1, the VFS index, icache, and dcache front a physical file system (Ext4, NOVA); under mnt2, the S_DIRECTFS flag routes to DirectFS, whose Global mindex indexes the cachelet, dentry, and inode directly]


slide-36
SLIDE 36

Outline

  • Background & Motivation
  • Design

– Cachelet for metadata caching – Global hash based metadata index – Metadata scalability

  • Evaluation
  • Summary

11

slide-37
SLIDE 37

Cachelet for metadata caching

  • VFS cachelet
slide-38
SLIDE 38

Cachelet for metadata caching

  • VFS cachelet
dcache (192 B): file name, inode addr., lock, …
icache (592 B): file metadata, access status, security, …

slide-39
SLIDE 39

Cachelet for metadata caching

  • VFS cachelet
dcache (192 B): file name, inode addr., lock, …
icache (592 B): file metadata, access status, security, …
VFS cachelet (128 B): frequently read metadata, simplified access status, security

slide-40
SLIDE 40

Cachelet for metadata caching

  • VFS cachelet

– Reducing metadata maintenance overhead while keeping metadata lookup performance


slide-41
SLIDE 41

Outline

  • Background & Motivation
  • Design

– Cachelet for metadata caching – Global hash based metadata index – Metadata scalability

  • Evaluation
  • Summary

13

slide-42
SLIDE 42

Global hash based metadata index

  • Global mindex
slide-43
SLIDE 43

Global hash based metadata index

  • Global mindex
[Figure: VFS index in DRAM maps inode number to dcache and icache; PFS index in NVMM holds dentry and inode]

slide-44
SLIDE 44

Global hash based metadata index

  • Global mindex

– A global hash based metadata index indexes both the metadata cache and the metadata

  • Reduce metadata lookup overhead
[Figure: traditional design keeps a VFS index (DRAM) plus a PFS index (NVMM); DirectFS keeps a single Global mindex mapping inode number to the cachelet (DRAM) and the dentry and inode (NVMM)]

slide-45
SLIDE 45

Global hash based metadata index

  • File lookup

– VFS warm cache – VFS cold cache

  • File metadata update

15

slide-46
SLIDE 46

Global hash based metadata index

  • File lookup

– VFS warm cache – VFS cold cache

  • File metadata update

15

Lookup VFS

slide-47
SLIDE 47

Global hash based metadata index

  • File lookup

– VFS warm cache – VFS cold cache

  • File metadata update

15

Warm cache: lookup VFS (traditional) vs. lookup Global mindex (DirectFS)

slide-48
SLIDE 48

Global hash based metadata index

  • File lookup

– VFS warm cache – VFS cold cache

  • File metadata update

15

Warm cache: lookup VFS (traditional) vs. lookup Global mindex (DirectFS)
Cold cache: lookup VFS, lookup physical FS, build cache in VFS (traditional)

slide-49
SLIDE 49

Global hash based metadata index

  • File lookup

– VFS warm cache – VFS cold cache

  • File metadata update

15

Warm cache: lookup VFS (traditional) vs. lookup Global mindex (DirectFS)
Cold cache: lookup VFS, lookup physical FS, build cache in VFS (traditional) vs. lookup Global mindex, build cachelet (DirectFS)

slide-50
SLIDE 50

Global hash based metadata index

  • File lookup

– VFS warm cache – VFS cold cache

  • File metadata update

15

Warm cache: lookup VFS (traditional) vs. lookup Global mindex (DirectFS)
Cold cache: lookup VFS, lookup physical FS, build cache in VFS (traditional) vs. lookup Global mindex, build cachelet (DirectFS)
Update: update two indexes and two metadata copies (traditional)

slide-51
SLIDE 51

Global hash based metadata index

  • File lookup

– VFS warm cache – VFS cold cache

  • File metadata update

15

Warm cache: lookup VFS (traditional) vs. lookup Global mindex (DirectFS)
Cold cache: lookup VFS, lookup physical FS, build cache in VFS (traditional) vs. lookup Global mindex, build cachelet (DirectFS)
Update: update two indexes and two metadata copies (traditional) vs. one index and two small metadata copies (DirectFS)

slide-52
SLIDE 52

Outline

  • Background & Motivation
  • Design

– Cachelet for metadata caching – Global hash based metadata index – Metadata scalability

  • Evaluation
  • Summary

16

slide-53
SLIDE 53

Metadata scalability

[Figure: VFS index (DRAM) plus PFS index (NVMM) vs. the Global mindex over cachelet, dentry, and inode, as in the index design slides]

slide-54
SLIDE 54

Metadata scalability

  • VFS lock limits the concurrency of metadata operations in a directory
[Figure: in the two-index design, a VFS lock serializes directory operations]

slide-55
SLIDE 55

Metadata scalability

  • VFS lock limits the concurrency of metadata operations in a directory

  • Fine-grained flags and atomic writes to remove the VFS lock
[Figure: the VFS lock is replaced by a fine-grained flag on the cachelet in the Global mindex]

slide-56
SLIDE 56

Metadata scalability

  • Case study: creating a file

slide-57
SLIDE 57

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and

Cachelet

  • Mark Cachelet as unreadable
[Figure: the Global mindex maps inode number to the cachelet (DRAM) and the dentry and inode (NVMM)]

slide-58
SLIDE 58

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and

Cachelet

  • Mark Cachelet as unreadable
[Figure: the new cachelet in the Global mindex is marked unreadable]

slide-59
SLIDE 59

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and

Cachelet

  • Mark Cachelet as unreadable
  • 2. Atomically update Global

mindex

  • Insert cachelet, inode and

dentry


slide-60
SLIDE 60

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and

Cachelet

  • Mark Cachelet as unreadable
  • 2. Atomically update Global

mindex

  • Insert cachelet, inode and

dentry

  • Unreadable flag prevents concurrent readers from finding the file being created


slide-61
SLIDE 61

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and Cachelet
  • Mark Cachelet as unreadable
  • 2. Atomically update Global mindex
  • Insert cachelet, inode and dentry
  • Unreadable flag prevents concurrent readers from finding the file being created

  • Unreadable flag prevents concurrent

creation of the same file


slide-62
SLIDE 62

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and Cachelet
  • Mark Cachelet as unreadable
  • 2. Atomically update Global mindex
  • Insert cachelet, inode and dentry
  • 3. Updating inode and cachelet of

parent directory


slide-63
SLIDE 63

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and Cachelet
  • Mark Cachelet as unreadable
  • 2. Atomically update Global mindex
  • Insert cachelet, inode and dentry
  • 3. Updating inode and cachelet of

parent directory

  • 4. Mark Cachelet as readable
[Figure: the cachelet flag flips from unreadable to readable]

slide-64
SLIDE 64

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and Cachelet
  • Mark Cachelet as unreadable
  • 2. Atomically update Global mindex
  • Insert cachelet, inode and dentry
  • 3. Updating inode and cachelet of

parent directory

  • 4. Mark Cachelet as readable

slide-65
SLIDE 65

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and Cachelet
  • Mark Cachelet as unreadable
  • 2. Atomically update Global mindex
  • Insert cachelet, inode and dentry
  • 3. Updating inode and cachelet of

parent directory

  • 4. Mark Cachelet as readable
  • How to guarantee consistency

slide-66
SLIDE 66

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and Cachelet
  • Mark Cachelet as unreadable
  • 2. Atomically update Global mindex
  • Insert cachelet, inode and dentry
  • 3. Updating inode and cachelet of

parent directory

  • 4. Mark Cachelet as readable
  • How to guarantee consistency

– Extending dentry


slide-67
SLIDE 67

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and Cachelet
  • Mark Cachelet as unreadable
  • 2. Atomically update Global mindex
  • Insert cachelet, inode and dentry
  • 3. Updating inode and cachelet of

parent directory

  • 4. Mark Cachelet as readable
  • How to guarantee consistency

– Extending dentry

  • Recording the update of parent

directory

[Figure: an extended dentry (Edentry) records the update of the parent directory]

slide-68
SLIDE 68

Metadata scalability

  • Case study: creating a file
  • 1. Creating inode, dentry and Cachelet
  • Mark Cachelet as unreadable
  • 2. Atomically update Global mindex
  • Insert cachelet, inode and dentry
  • 3. Updating inode and cachelet of

parent directory

  • 4. Mark Cachelet as readable
  • How to guarantee consistency

– Extending dentry

  • Recording the update of parent

directory

– Atomically record the dentry address into the log

  • Reducing contention on log writes
[Figure: the Edentry address (addr) is appended to a log in NVMM]

slide-69
SLIDE 69

Metadata scalability

  • Case study: deleting a file


slide-70
SLIDE 70

Metadata scalability

  • Case study: deleting a file

– Mark the file as deleting

  • Prevent concurrent delete operations

[Figure: the file's entry in the Global mindex is marked deleting]

slide-71
SLIDE 71

Metadata scalability

  • Case study: deleting a file

– Mark the file as deleting

  • Prevent concurrent delete operations

– Persistently mark the file as deleted

  • Guarantee consistency

[Figure: the deletion mark (D) is persisted to NVMM]

slide-72
SLIDE 72

Metadata scalability

  • Case study: deleting a file

– Mark the file as deleting

  • Prevent concurrent delete operations

– Persistently mark the file as deleted

  • Guarantee consistency

– Mark the file as invalid

  • The file cannot be found by other

concurrent metadata operations

[Figure: the index entry is marked invalid]

slide-73
SLIDE 73

Metadata scalability

  • Case study: deleting a file

– Mark the file as deleting

  • Prevent concurrent delete operations

– Persistently mark the file as deleted

  • Guarantee consistency

– Mark the file as invalid

  • The file cannot be found by other

concurrent metadata operations

– Update inode and cachelet of parent directory


slide-74
SLIDE 74

Metadata scalability

  • Case study: deleting a file

– Mark the file as deleting

  • Prevent concurrent delete operations

– Persistently mark the file as deleted

  • Guarantee consistency

– Mark the file as invalid

  • The file cannot be found by other

concurrent metadata operations

– Update inode and cachelet of parent directory – Asynchronously recycle the file data

  • Support concurrent threads to read and

write the file

[Figure: the index entry is cleared (NULL); the file data is recycled asynchronously]

slide-75
SLIDE 75

Other design issues

  • Support hard link
  • Support getcwd
  • The design of Global mindex

– How to index metadata and cache – How to collect inaccessible metadata

  • More concurrency control

– File lookup, delete and rename

  • Please refer to the paper
slide-76
SLIDE 76

Outline

  • Background & Motivation
  • Design

– Cachelet for metadata caching – Global hash based metadata index – Metadata scalability

  • Evaluation
  • Summary

23

slide-77
SLIDE 77

Evaluation

  • Platform

– Two NUMA nodes

  • Intel Gold 6271 CPU, 24 CPU cores
  • 64 GB DRAM, 512 GB Optane PMM
  • Only running evaluation on NUMA node 0 to avoid the effect of

NUMA architecture

  • Compared system

– Ext4-dax, NOVA

  • Benchmark

– System call, small file read/write, filebench

24

slide-78
SLIDE 78

System call

  • For metadata read operation (stat)

25

[Figure: execution time (us) of stat, create, rename, delete under cold and warm cache: ext4-dax, NOVA, DirectFS]

slide-79
SLIDE 79

System call

  • For metadata read operation (stat)

– Cold cache: fewer lookups and lower maintenance overhead; 56% improvement


slide-80
SLIDE 80

System call

  • For metadata read operation (stat)

– Cold cache: fewer lookups and lower maintenance overhead; 56% improvement – Warm cache: a small cachelet plus the Global mindex; similar performance


slide-81
SLIDE 81

System call

  • For metadata read operation (stat)

– Cold cache: fewer lookups and lower maintenance overhead; 56% improvement – Warm cache: a small cachelet plus the Global mindex; similar performance

  • For metadata write operation

– Lower metadata maintenance overhead; 47% improvement


slide-82
SLIDE 82

System call

  • Scalability

– Fine-grained flags and atomic write

  • Improving metadata scalability

26

[Figure: throughput (M ops/s) vs. threads (1-24) for warm-cache lookup and delete: ext4-dax, NOVA, DirectFS]


slide-84
SLIDE 84

Small file read/write

  • Optimizing file metadata lookup and updates

27

[Figure: small-file read/write throughput (K ops/s) for 1KB-32KB files: ext4-dax (ED), NOVA (N), DirectFS (D)]

slide-85
SLIDE 85

Small file read/write

  • Optimizing file metadata lookup and updates

– Compared with ext4-dax and NOVA, the throughput of small-file operations increases by 35.6% and 38.3% respectively.

27


slide-86
SLIDE 86

Filebench

  • Varmail
  • Fileserver

28

[Figure: Varmail and Fileserver throughput (100 K ops/s) vs. threads (1-16): ext4-dax, NOVA, DirectFS]

slide-87
SLIDE 87

Filebench

  • Varmail

– For single thread, DirectFS increases throughput by 27%

  • Fileserver

28


slide-88
SLIDE 88

Filebench

  • Varmail

– For single thread, DirectFS increases throughput by 27% – For multiple threads, DirectFS increases throughput by 66.9%

  • Fileserver

28


slide-89
SLIDE 89

Filebench

  • Varmail

– For single thread, DirectFS increases throughput by 27% – For multiple threads, DirectFS increases throughput by 66.9%

  • Fileserver

– NVMM bandwidth has become the main factor affecting performance when running multiple threads

28


slide-90
SLIDE 90

Outline

  • Background & Motivation
  • Design

– Cachelet for metadata caching – Global hash based metadata index – Metadata scalability

  • Evaluation
  • Summary

29

slide-91
SLIDE 91

Summary

30

slide-92
SLIDE 92

Summary

  • The features of NVMM enable FS to be built on the

memory bus, improving the performance of FS

30

slide-93
SLIDE 93

Summary

  • The features of NVMM enable FS to be built on the

memory bus, improving the performance of FS

  • Existing NVMM file systems retain VFS

– Two-layer metadata lookup and maintenance overhead – Low-scalability metadata operations

30

slide-94
SLIDE 94

Summary

  • The features of NVMM enable FS to be built on the

memory bus, improving the performance of FS

  • Existing NVMM file systems retain VFS

– Two-layer metadata lookup and maintenance overhead – Low-scalability metadata operations

  • DirectFS: a metadata-optimized, high-performance, and scalable file system for NVMM

30

slide-95
SLIDE 95

Summary

  • The features of NVMM enable FS to be built on the

memory bus, improving the performance of FS

  • Existing NVMM file systems retain VFS

– Two-layer metadata lookup and maintenance overhead – Low-scalability metadata operations

  • DirectFS: a metadata-optimized, high-performance, and scalable file system for NVMM

– A small metadata cache in VFS

30

slide-96
SLIDE 96

Summary

  • The features of NVMM enable FS to be built on the

memory bus, improving the performance of FS

  • Existing NVMM file systems retain VFS

– Two-layer metadata lookup and maintenance overhead – Low-scalability metadata operations

  • DirectFS: a metadata-optimized, high-performance, and scalable file system for NVMM

– A small metadata cache in VFS – A global hash based metadata index

30

slide-97
SLIDE 97

Summary

  • The features of NVMM enable FS to be built on the

memory bus, improving the performance of FS

  • Existing NVMM file systems retain VFS

– Two-layer metadata lookup and maintenance overhead – Low-scalability metadata operations

  • DirectFS: a metadata-optimized, high-performance, and scalable file system for NVMM

– A small metadata cache in VFS – A global hash based metadata index – Using fine-grained flag and atomic write to improve metadata scalability

30

slide-98
SLIDE 98

Summary

  • The features of NVMM enable FS to be built on the

memory bus, improving the performance of FS

  • Existing NVMM file systems retain VFS

– Two-layer metadata lookup and maintenance overhead – Low-scalability metadata operations

  • DirectFS: a metadata-optimized, high-performance, and scalable file system for NVMM

– A small metadata cache in VFS – A global hash based metadata index – Using fine-grained flag and atomic write to improve metadata scalability – Increase the application throughput by up to 66.9%

30

slide-99
SLIDE 99

Thanks

31

Author email: wangying01@ict.ac.cn