Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support

Yue Zhu*, Teng Wang*, Kathryn Mohror+, Adam Moody+, Kento Sato+, Muhib Khan*, Weikuan Yu*

*Florida State University
+Lawrence Livermore National Laboratory


ROSS’18 S-2

Outline

• Background & Motivation
• Design
• Performance Evaluation
• Conclusion


ROSS’18 S-3

Introduction

• High-performance computing (HPC) systems need efficient file systems to support large-scale scientific applications
  – Different file systems are used for different kinds of data within a single job
  – Both kernel- and user-level file systems can be used by applications
  – Because of the development complexity, reliability, and portability issues of kernel-level file systems, user-level file systems are increasingly leveraged for special-purpose I/O workloads

• Filesystem in Userspace (FUSE)
  – A software interface for Unix-like operating systems
  – Allows non-privileged users to create their own file systems without modifying kernel code
  – The user-defined file system runs as a separate process in user space
  – Examples: SSHFS, GlusterFS client, FusionFS (BigData'14)


ROSS’18 S-4

How does FUSE Work?

• Execution path of a function call (a minimal example follows below)
  – Send the request to the user-level file system process
    » App program → VFS → FUSE kernel module → user-level file system process
  – Return the data back to the application program
    » User-level file system process → FUSE kernel module → VFS → app program

[Figure: FUSE architecture: the application program and the user-level file system process in user space; VFS, the FUSE kernel module, the page cache, and Ext4 in kernel space; the storage device below]
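To make the detour concrete, here is a minimal sketch of a user-level file system written against the FUSE high-level API (FUSE 2.x). It is a generic hello-world illustration, not code from Direct-FUSE or the ROSS'18 work; every callback runs in the user-space file system process only after the request has crossed VFS and the FUSE kernel module, which is exactly the path Direct-FUSE later removes.

```c
/* Minimal FUSE user-level file system (high-level API, FUSE 2.x).
 * Illustrative hello-world only; readdir is omitted for brevity.
 * Build roughly as: gcc hello.c `pkg-config fuse --cflags --libs` -o hello */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

static const char *hello_path = "/hello";        /* single file exposed by this FS */
static const char *hello_data = "hello world\n";

/* Each callback executes in the user-level file system process after the
 * request has traversed VFS and the FUSE kernel module. */
static int hello_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
    } else if (strcmp(path, hello_path) == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = strlen(hello_data);
    } else {
        return -ENOENT;
    }
    return 0;
}

static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;
    return 0;
}

static int hello_read(const char *path, char *buf, size_t size, off_t off,
                      struct fuse_file_info *fi)
{
    size_t len = strlen(hello_data);
    (void) fi;
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((size_t) off >= len)
        return 0;
    if (off + size > len)
        size = len - off;
    memcpy(buf, hello_data + off, size);
    return (int) size;                           /* bytes copied back through the kernel */
}

static struct fuse_operations hello_oper = {
    .getattr = hello_getattr,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* fuse_main() mounts the file system and enters the FUSE event loop. */
    return fuse_main(argc, argv, &hello_oper, NULL);
}
```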


ROSS’18 S-5

FUSE File System vs. Native File System

                                FUSE File System    Native File System
  # User-Kernel Mode Switches          4                    2
  # Context Switches                   2                    0
  # Memory Copies                      2                    1

[Figure: I/O paths of a FUSE file system vs. a native file system, showing the application program, VFS, page cache, FUSE kernel module, user-level file system process, Ext4, and storage device]


ROSS’18 S-6

Number of Context Switches & I/O Bandwidth

• The complexity added to the FUSE file system execution path degrades I/O bandwidth
  – tmpfs: a file system that stores data in volatile memory
  – FUSE-tmpfs: a FUSE file system deployed on top of tmpfs
  – The dd micro-benchmark and the perf profiling tool are used to gather the I/O bandwidth and the number of context switches
  – Experiment method: continually issue 1000 writes (a minimal C version of the write loop is sketched after the table)

  Block Size (KB)     Write Bandwidth                      # Context Switches
                      FUSE-tmpfs (MB/s)   tmpfs (GB/s)     FUSE-tmpfs   tmpfs
        4                  163                1.3              1012       7
       16                  372                1.6              1012       7
       64                  519                1.7              1012       7
      128                  549                2.0              1012       7
      256                  569                2.4              2012       7
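Below is a hypothetical C sketch of the write portion of this experiment: open a target file and issue 1000 consecutive writes of a chosen block size, then report bandwidth. The paper's measurements used dd and perf; this sketch only reproduces the bandwidth loop (context-switch counting via perf is not shown), and the default path and argument handling are illustrative.

```c
/* Usage sketch: ./wbench <path> <block size in KB>
 * e.g. ./wbench /mnt/fuse-tmpfs/test 64 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(int argc, char *argv[])
{
    const char *path = (argc > 1) ? argv[1] : "/tmp/bench.dat";
    size_t block = (argc > 2) ? strtoul(argv[2], NULL, 10) * 1024 : 4096;
    char *buf = calloc(1, block);
    struct timespec t0, t1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0 || !buf) { perror("setup"); return 1; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 1000; i++) {                  /* 1000 consecutive writes */
        if (write(fd, buf, block) != (ssize_t) block) { perror("write"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f MB/s\n", 1000.0 * block / sec / (1024 * 1024));
    close(fd);
    free(buf);
    return 0;
}
```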


ROSS’18 S-7

Breakdown of Metadata & Data Latency

• The actual file system operations (i.e., metadata or data operations) occupy only a small fraction of the total execution time
  – Tests are run on tmpfs and FUSE-tmpfs
  – Real Operation (metadata): the time spent conducting the operation itself
  – Data Movement: the actual write time within a complete write function call
  – Overhead: everything besides the above two, e.g., the time spent on context switches

[Fig. 1. Time Expense in Metadata Operations: latency (ns) of Create and Close on tmpfs vs. FUSE-tmpfs, split into Real Operation and Overhead]
[Fig. 2. Time Expense in Data Operations: latency (ns) for 1 KB to 256 KB transfer sizes, split into Data Movement and Overhead]

ROSS’18 S-8

Existing Solution and Our Approach

• How to reduce the overheads of FUSE?
  – Build an independent user-space library to avoid going through the kernel (e.g., IndexFS (SC'14), FusionFS)
  – However, this approach cannot support multiple FUSE libraries with distinct file paths and file descriptors

• We propose Direct-FUSE to support multiple backend I/O services for a single application
  – We adapted libsysio to our purpose in Direct-FUSE
    » libsysio is developed by the Scalability team at Sandia National Laboratories: it provides POSIX-like file I/O and name space support for remote file systems from an application's user-level address space


ROSS’18 S-9

Outline

• Background & Motivation
• Design
• Performance Evaluation
• Conclusion


ROSS’18 S-10

The Overview of Direct-FUSE

• Direct-FUSE mainly consists of three components
  1. Adapted-libsysio
     – Intercepts file paths and file descriptors to identify the backend service
     – Simplifies the metadata and data execution paths of the original libsysio
  2. lightweight-libfuse (not the real libfuse)
     – Abstracts file system operations from backend services into unified APIs
  3. Backend services
     – Provide the defined file system operations (e.g., FusionFS)

[Figure: Direct-FUSE overview: the application program calls Adapted-libsysio, which invokes lightweight-libfuse and the backend services (FUSE-Ext4 over Ext4, the FusionFS client with FusionFS servers, ...)]


ROSS’18 S-11

Path and File Descriptor Operations

• To facilitate the interception of file system operations for multiple backends, the operations are categorized into two groups (see the sketch after this list)
  1. File path operations
     i.  Intercept the prefix and path (e.g., sshfs:/sshfs/test.txt) and return the mount information
     ii. Look up the corresponding inode based on the mount information, and redirect to the defined operations
  2. File descriptor operations
     i.  Find the open-file record for the given file descriptor
         » The open-file record contains pointers to the inode, the current stream position, etc.
     ii. Redirect to the defined operations based on the inode info in the open-file record
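A minimal sketch of this prefix-based dispatch follows. All structure and function names (backend_ops, mount_entry, dfuse_open, dfuse_read) are hypothetical stand-ins for illustration; the actual Direct-FUSE implementation resolves backends through adapted-libsysio's mount information and inodes rather than the flat array used here.

```c
/* Hypothetical sketch of prefix-based backend dispatch; names are illustrative
 * and do not come from the Direct-FUSE or libsysio sources. */
#include <stddef.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>

struct backend_ops {                       /* unified API exposed by lightweight-libfuse */
    int (*open)(const char *path, int flags);
    int (*read)(int fd, void *buf, size_t len, off_t off);
};

/* Stub backends for illustration; a real deployment would wire these to the
 * fuse_operations of FUSE-Ext4, FusionFS, SSHFS, and so on. */
static int stub_open(const char *path, int flags) { (void)path; (void)flags; return 3; }
static int stub_read(int fd, void *buf, size_t len, off_t off)
{ (void)fd; (void)off; memset(buf, 0, len); return (int)len; }
static struct backend_ops sshfs_ops    = { stub_open, stub_read };
static struct backend_ops fusionfs_ops = { stub_open, stub_read };

/* Mount table: each entry maps a path prefix to a backend. */
struct mount_entry {
    const char *prefix;                    /* e.g. "sshfs:" in sshfs:/sshfs/test.txt */
    struct backend_ops *ops;
};
static struct mount_entry mounts[] = {
    { "sshfs:",    &sshfs_ops    },
    { "fusionfs:", &fusionfs_ops },
};

/* File path operation: match the prefix, strip it, redirect to the backend. */
static int dfuse_open(const char *full_path, int flags)
{
    for (size_t i = 0; i < sizeof(mounts) / sizeof(mounts[0]); i++) {
        size_t n = strlen(mounts[i].prefix);
        if (strncmp(full_path, mounts[i].prefix, n) == 0)
            return mounts[i].ops->open(full_path + n, flags);
    }
    return -ENOENT;                        /* no backend owns this prefix */
}

/* File descriptor operation: the open-file record remembers which backend owns
 * the descriptor, so reads redirect straight to that backend. */
struct open_file {
    struct backend_ops *ops;
    int backend_fd;
    off_t pos;                             /* current stream position */
};

static int dfuse_read(struct open_file *of, void *buf, size_t len)
{
    int n = of->ops->read(of->backend_fd, buf, len, of->pos);
    if (n > 0)
        of->pos += n;
    return n;
}
```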


ROSS’18 S-12

Requirements for New Backends

• Interact with the FUSE high-level APIs
• Be separated out as an independent user-space library (a skeleton of such a library is sketched below)
  – The library contains the FUSE file system operations, an initialization function, and an unmount function
  – If a backend passes specialized data to the FUSE module via fuse_mount(), that data has to be globalized for later file system operations
• Be implemented in C/C++, or be binary compatible with C/C++
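The following is a minimal sketch of the shape such a backend library might take under these requirements; the names (mybackend_init, mybackend_unmount, mybackend_oper, mybackend_state) are hypothetical and are not the actual Direct-FUSE interface, which the slide only describes in outline.

```c
/* Hypothetical skeleton of a Direct-FUSE backend packaged as a user-space
 * library: FUSE high-level operations plus explicit init/unmount entry points. */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <errno.h>
#include <sys/stat.h>

/* Data the backend would normally hand to fuse_mount() must instead live in a
 * global, since there is no FUSE session to carry private data. */
static struct { const char *root; } mybackend_state;

static int mybackend_getattr(const char *path, struct stat *st)
{
    (void) path; (void) st;
    return -ENOSYS;                        /* real backend logic goes here */
}

static int mybackend_read(const char *path, char *buf, size_t size, off_t off,
                          struct fuse_file_info *fi)
{
    (void) path; (void) buf; (void) size; (void) off; (void) fi;
    return -ENOSYS;
}

/* The operations table that lightweight-libfuse dispatches into. */
struct fuse_operations mybackend_oper = {
    .getattr = mybackend_getattr,
    .read    = mybackend_read,
};

/* Initialization and unmount functions exported by the library. */
int mybackend_init(const char *root)
{
    mybackend_state.root = root;           /* globalize mount-time data */
    return 0;
}

void mybackend_unmount(void)
{
    mybackend_state.root = NULL;
}
```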


ROSS’18 S-13

Outline

• Background and Challenges
• Design
• Performance Evaluation
• Conclusion


ROSS’18 S-14

Experimental Methodology

• We compare the bandwidth of Direct-FUSE with a local FUSE file system and the native file system, on disk and in memory, using IOzone
  – Disk
    » Ext4-fuse: a FUSE file system overlying Ext4
    » Ext4-direct: Ext4-fuse accessed through Direct-FUSE, bypassing the FUSE kernel
    » Ext4-native: the original Ext4 on disk
  – Memory
    » tmpfs-fuse, tmpfs-direct, and tmpfs-native, analogous to the three disk configurations

• We also compare the I/O bandwidth of a distributed FUSE file system with Direct-FUSE
  – FusionFS: a distributed file system that supports metadata- and write-intensive operations


ROSS’18 S-15

Sequential Write Bandwidth

• Direct-FUSE achieves bandwidth comparable to the native file system
  – Ext4-direct outperforms Ext4-fuse by 16.5% on average
  – tmpfs-direct outperforms tmpfs-fuse by at least 2.15x

[Figure: sequential write bandwidth (MB/s) vs. transfer size (4 KB to 1 MB) for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native]


ROSS’18 S-16

Sequential Read Bandwidth

• Similar to the sequential write results, the read bandwidth of Direct-FUSE is comparable to the native file system
  – Ext4-direct outperforms Ext4-fuse by 2.5% on average
  – tmpfs-direct outperforms tmpfs-fuse by at least 2.26x

[Figure: sequential read bandwidth (MB/s) vs. transfer size (4 KB to 1 MB) for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native]


ROSS’18 S-17

Distributed I/O Bandwidth

• Direct-FUSE outperforms FusionFS in write bandwidth and shows comparable read bandwidth
  – Writes benefit more from bypassing the FUSE kernel
• Direct-FUSE delivers scalability similar to that of the original FusionFS

[Figure: write and read bandwidth (MB/s) vs. number of nodes (1 to 16) for fusionfs and direct-fusionfs]


ROSS’18 S-18

Overhead Analysis

• The dummy read/write occupies less than 3% of the complete I/O function time in Direct-FUSE, even when the I/O size is very small
  – Dummy write/read: no actual data movement; the call returns immediately once it reaches the backend service (see the sketch below)
  – Real write/read: the actual Direct-FUSE read and write I/O calls

[Figure: latency (ns) of dummy vs. real writes and dummy vs. real reads for transfer sizes from 1 B to 1 KB]
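In other words, a dummy operation is a backend callback that acknowledges the request without touching data, so its measured latency is essentially the Direct-FUSE dispatch overhead. A hypothetical illustration (the function name is not from the paper):

```c
/* Hypothetical illustration of the "dummy write" used in the overhead analysis.
 * It performs no data movement and returns as soon as it reaches the backend,
 * so timing it isolates the Direct-FUSE dispatch overhead from real I/O. */
#include <sys/types.h>

static ssize_t dummy_write(int fd, const void *buf, size_t count)
{
    (void) fd;                       /* no descriptor work beyond dispatch */
    (void) buf;                      /* no copy of the user buffer */
    return (ssize_t) count;          /* report all bytes as "written" */
}
```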


ROSS’18 S-19

Conclusions

• We have revealed and analyzed the context-switch counts and time overheads in FUSE metadata and data operations
• We have designed and implemented Direct-FUSE, which avoids crossing the kernel boundary and supports multiple FUSE backends simultaneously
• Our experimental results indicate that Direct-FUSE achieves significant performance improvements over the original FUSE file systems


ROSS’18 S-20

Sponsors of This Research