Direct-FUSE: A User-level File System with Multiple Backends
Yue Zhu (yzhu@cs.fsu.edu), Florida State University
Outline
Ø Background & Motivation
Ø The Overview of Direct-FUSE
Ø Performance Evaluation
Ø Conclusions
User Space vs. Kernel-level File Systems
Ø The development complexity, reliability, and portability of kernel-level and user-space file systems differ.
– Development Complexity
- Kernel-level: 1) system crashes and restarts during debugging; 2) language limitations.
- User-level: 1) few system crashes and restarts during debugging; 2) numerous user-space tools for debugging; 3) fewer language limitations and more useful libraries; 4) file systems can be mounted and developed by non-privileged users.
– Reliability
- Kernel-level: a kernel bug can crash the entire production system.
- User-level: lower possibility of crashing the kernel.
– Portability
- Kernel-level: significant effort to port a file system to a different system.
- User-level: easy to port to other systems.
Filesystem in Userspace
Ø What is Filesystem in Userspace (FUSE)?
– A software interface for Unix-like operating systems.
– Non-privileged users can create their own file systems without editing kernel code.
– However, the FUSE kernel module must be pre-installed by a system administrator.
– Examples:
- SSHFS: a file system client that interacts with directories and files on a remote server over an SSH connection.
- FusionFS (BigData'14): a distributed file system that supports metadata-intensive and write-intensive operations.
- IndexFS Client (SC'14): the client of IndexFS, which redirects applications' file operations to the appropriate destination.
How does FUSE File System Work?
Ø Execution path of a function call (a minimal FUSE file system illustrating this path is sketched below)
1. Send the request to the user-level file system
- App program → VFS → FUSE kernel module → user-level file system process
2. Return the data back to the application program
- User-level file system process → FUSE kernel module → VFS → app program
[Figure: FUSE data path across user space and kernel space — application program, VFS, FUSE kernel module, user-level file system process, in-built file system, and storage device, with the six request/response steps numbered.]
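For reference, here is a minimal sketch of a FUSE file system written against the libfuse 2.x high-level API (fuse_main() and struct fuse_operations); it is not code from the presentation, and the file name and contents are made up for illustration. Every stat/open/read the application issues on the mount point travels through the VFS and the FUSE kernel module before reaching these user-space callbacks.

```c
/* Minimal FUSE file system sketch (libfuse 2.x high-level API).
 * Exposes a single read-only file, /hello.
 * Build (typical): gcc hello_fuse.c `pkg-config --cflags --libs fuse` -o hello_fuse
 * Run:             ./hello_fuse /tmp/mnt
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

static const char *hello_path = "/hello";               /* hypothetical file */
static const char *hello_data = "hello from user space\n";

static int hello_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    if (strcmp(path, hello_path) == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = (off_t)strlen(hello_data);
        return 0;
    }
    return -ENOENT;
}

static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;
    return 0;
}

static int hello_read(const char *path, char *buf, size_t size, off_t off,
                      struct fuse_file_info *fi)
{
    size_t len = strlen(hello_data);
    (void)fi;
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((size_t)off >= len)
        return 0;                          /* EOF */
    if (off + size > len)
        size = len - (size_t)off;
    memcpy(buf, hello_data + off, size);   /* one of the extra memory copies */
    return (int)size;
}

static struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* Mounts the file system and enters the request loop fed by /dev/fuse. */
    return fuse_main(argc, argv, &hello_ops, NULL);
}
```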
FUSE File System vs. Native File System
Ø Overheads in a FUSE file system
– 4 user-kernel mode switches
- App ↔ kernel
- Kernel ↔ file system process
– 2 context switches
- App ↔ file system process
– 2 or 3 memory copies
- Write: App → page cache → file system process → page cache (if the write is made to a native file system underneath)
Ø Overheads in a native file system (Ext4)
– 2 user-kernel mode switches
- App ↔ kernel
– 0 context switches
– 1 memory copy
- Write: App → page cache
Number of Context Switches & I/O Bandwidth
Ø We measure the number of context switches and the bandwidth of a FUSE file system and a native file system.
– The dd microbenchmark and perf are used in the tests.
– FUSE-tmpfs is a FUSE file system deployed on top of tmpfs and mounted with tuned option values.
Block Size (KB) | FUSE-tmpfs Throughput (MB/s) | FUSE-tmpfs # Context Switches | tmpfs Throughput (GB/s) | tmpfs # Context Switches
4    | 163 | 1012 | 1.3 | 7
16   | 372 | 1012 | 1.6 | 7
64   | 519 | 1012 | 1.7 | 7
128  | 549 | 1012 | 2.0 | 7
256  | 569 | 2012 | 2.4 | 7
1024 | 576 | 8012 | 2.5 | 7
Breakdown of Metadata Operation Latency
Ø The create() and close() latency on tmpfs and FUSE-tmpfs.
– Real Operation: the time spent in the operation itself (the actual create or close time).
– Overhead: the cost besides the real operation, e.g., the involvement of the FUSE kernel module.
Ø The real operation accounts for only a small portion of a complete FUSE function call.
[Figure: create() and close() latency breakdown (Real Operation vs. Overhead, in µs) on tmpfs and FUSE-tmpfs; on FUSE-tmpfs the real operation is only 11.18% of the create latency and 2.17% of the close latency.]
Breakdown of Data Operation Latency
Ø The write latency on tmpfs and FUSE-tmpfs.
– Data Movement: the actual write operation within a complete write function call.
– Overhead: the cost besides the data movement.
Ø The data movement accounts for only a small portion of a complete FUSE I/O call.
[Figure: write latency breakdown (Data Movement vs. Overhead, in µs) for transfer sizes of 1, 4, 16, 64, 128, and 256 KB; data movement accounts for roughly 10%–38% of the total latency across these sizes.]
Desirable Objectives
Ø Some file systems, such as TableFS (USENIX'13), are used as libraries to avoid involving the FUSE kernel module.
– However, this approach may not support multiple FUSE libraries with distinct file paths and file descriptors.
Ø We propose Direct-FUSE to provide multiple backend services for one application without going through the FUSE kernel.
– To reduce the overheads of the FUSE kernel module, we adopt libsysio to serve FUSE clients without going through the kernel.
– Libsysio
- Developed by the Scalable I/O team at Sandia National Laboratories.
- Provides POSIX-like file I/O.
- Provides name-space support for file systems from the application's user-level address space.
Outline
Ø Background & Motivation
Ø The Overview of Direct-FUSE
Ø Performance Evaluation
Ø Conclusions
The Overview of Direct-FUSE
Ø Direct-FUSE's components include adopted-libsysio, lightweight-libfuse, and the backend services.
– Adopted-libsysio
- Distinguishes file paths and file descriptors for the different backends.
– Lightweight-libfuse
- Not the real libfuse.
- Exposes file system operations to the underlying backend services.
– Backend services
- Provide the defined file system operations (a minimal sketch of this interface follows below).
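The sketch below illustrates one way this layering could look; the type and field names are invented for illustration and are not the actual Direct-FUSE source. The idea is that lightweight-libfuse exposes each backend as a plain dispatch table of file system operations plus init/unmount hooks, which adopted-libsysio can call directly from the application's address space.

```c
/* Hypothetical sketch of a per-backend dispatch table (names invented).
 * Adopted-libsysio selects a table by path prefix and calls it directly,
 * so no request ever crosses into the FUSE kernel module. */
#include <sys/types.h>

struct dfuse_backend_ops {
    int     (*open)  (const char *path, int flags, mode_t mode);
    ssize_t (*read)  (int bfd, void *buf, size_t count, off_t offset);
    ssize_t (*write) (int bfd, const void *buf, size_t count, off_t offset);
    int     (*close) (int bfd);
    int     (*unlink)(const char *path);
};

struct dfuse_backend {
    const char                     *prefix;  /* e.g. "sshfs:" or "fusionfs:" */
    const struct dfuse_backend_ops *ops;     /* filled in by the backend library */
    int (*init)(void);                       /* called at mount time */
    int (*fini)(void);                       /* called at unmount */
};
```

Keeping the dispatch at the level of an ordinary function-pointer table is what removes the extra mode switches and context switches listed earlier: a read() on a Direct-FUSE path becomes a direct call into the backend's read operation.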
Path and File Descriptor Operations
Ø To support multiple FUSE backends, file system operations are divided into two categories: path operations and file descriptor operations (see the dispatch sketch after this list).
– Path operations
- Apply a prefix to the path (e.g., sshfs:/sshfs/test.txt).
- Intercept the prefix and path to return the mount information, which contains pointers to the defined operations.
- When a new file is opened, the file descriptor returned by the backend is mapped to a new file descriptor assigned by adopted-libsysio.
– File descriptor operations
- The file record is found by the file descriptor in the open file table.
- The file record contains pointers to the operations, the current stream position, etc.
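The following is a hedged sketch of those two lookup steps, reusing the hypothetical dfuse_backend table from the previous slide; all function and field names are invented for illustration. A path operation strips the prefix to find the owning backend, and a file descriptor operation goes through an open file table that maps the descriptor handed to the application back to the backend's own descriptor and operation pointers.

```c
/* Hypothetical path/fd dispatch sketch (not the actual Direct-FUSE code).
 * Builds on the dfuse_backend and dfuse_backend_ops structs sketched earlier. */
#include <string.h>
#include <errno.h>
#include <sys/types.h>

#define DFUSE_MAX_OPEN 1024

struct dfuse_file {                        /* one entry in the open file table */
    const struct dfuse_backend *backend;
    int   backend_fd;                      /* descriptor returned by the backend */
    off_t pos;                             /* current stream position */
};

static struct dfuse_file open_files[DFUSE_MAX_OPEN];
extern struct dfuse_backend backends[];    /* registered backends */
extern int num_backends;

/* Path operation: resolve "sshfs:/sshfs/test.txt" to a backend, open the file
 * there, and hand the application a descriptor assigned by adopted-libsysio. */
int dfuse_open(const char *path, int flags, mode_t mode)
{
    for (int i = 0; i < num_backends; i++) {
        size_t plen = strlen(backends[i].prefix);
        if (strncmp(path, backends[i].prefix, plen) != 0)
            continue;                      /* prefix does not match this backend */
        int bfd = backends[i].ops->open(path + plen, flags, mode);
        if (bfd < 0)
            return bfd;
        for (int fd = 0; fd < DFUSE_MAX_OPEN; fd++) {
            if (open_files[fd].backend == NULL) {
                open_files[fd].backend    = &backends[i];
                open_files[fd].backend_fd = bfd;
                open_files[fd].pos        = 0;
                return fd;                 /* descriptor seen by the application */
            }
        }
        backends[i].ops->close(bfd);
        return -EMFILE;
    }
    return -ENOENT;                        /* no backend owns this prefix */
}

/* File descriptor operation: look up the file record and forward the call. */
ssize_t dfuse_read(int fd, void *buf, size_t count)
{
    struct dfuse_file *f;
    if (fd < 0 || fd >= DFUSE_MAX_OPEN || open_files[fd].backend == NULL)
        return -EBADF;
    f = &open_files[fd];
    ssize_t n = f->backend->ops->read(f->backend_fd, buf, count, f->pos);
    if (n > 0)
        f->pos += n;                       /* advance the stream position */
    return n;
}
```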
Requirements for New Backends
Ø The file system operations work with paths and file names instead of inodes.
Ø An independent library contains the FUSE file system operations, the initialization function, and the unmount function.
– If there is no existing library for the backend, we have to build the library ourselves.
– If there is a library for the backend, we have to wrap its APIs and provide the initialization function (see the wrapper sketch below).
Ø No user data is passed to the FUSE module via the fuse_mount() function.
– If the file system passes user data via fuse_mount() when mounting, additional effort is needed to globalize the user data for the other file system operations.
Ø Implemented in C or C++.
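As an illustration of the second case, here is a hedged sketch of wrapping an existing user-level file system library into a backend; the library name (examplefs) and all of its functions are placeholders, not a real API, and the dfuse_* structs come from the earlier hypothetical sketch.

```c
/* Hypothetical wrapper (placeholder names) turning an existing library,
 * "examplefs", into a Direct-FUSE-style backend: it supplies the
 * initialization and unmount functions and adapts the library's calls
 * to the dfuse_backend_ops table sketched earlier. */
#include <sys/types.h>

extern int examplefs_startup(const char *conf);          /* placeholder API */
extern int examplefs_shutdown(void);
extern int examplefs_open(const char *path, int flags, mode_t mode);

static int examplefs_backend_init(void)
{
    /* No user data is passed through fuse_mount(); any state the library
     * needs is kept global inside the wrapper. */
    return examplefs_startup("/etc/examplefs.conf");
}

static int examplefs_backend_fini(void)
{
    return examplefs_shutdown();
}

static const struct dfuse_backend_ops examplefs_ops = {
    .open = examplefs_open,
    /* .read, .write, .close, .unlink wrapped the same way ... */
};

static struct dfuse_backend examplefs_backend = {
    .prefix = "examplefs:",
    .ops    = &examplefs_ops,
    .init   = examplefs_backend_init,
    .fini   = examplefs_backend_fini,
};
```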
Outline
Ø Background & Motivation
Ø The Overview of Direct-FUSE
Ø Performance Evaluation
Ø Conclusions
Experimental Methodology
Ø We compare the bandwidth of Direct-FUSE with a local FUSE file system and the native file system on disk and on RAM-disk using IOzone.
– Disk
- Ext4-fuse: a FUSE file system overlying Ext4.
- Ext4-direct: Ext4-fuse with the FUSE kernel bypassed.
- Ext4-native: the original Ext4 on disk.
– RAM-disk
- Tmpfs-fuse, Tmpfs-direct, and Tmpfs-native are defined analogously to the three disk configurations.
Ø We also compare the I/O bandwidth of a distributed FUSE file system with Direct-FUSE.
– FusionFS: a distributed file system that supports metadata-intensive and write-intensive operations.
Ø We also present a breakdown analysis of I/O processing in Direct-FUSE.
Sequential Write Bandwidth
Ø The bandwidth of Direct-FUSE is very close to that of the native file system.
[Figure: sequential write bandwidth (MB/s, log scale) vs. transfer size from 4 KB to 16 MB for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native.]
Sequential Read Bandwidth
Ø Similar to the sequential write bandwidth, the read bandwidth of Direct-FUSE is close to that of the native file system.
[Figure: sequential read bandwidth (MB/s, log scale) vs. transfer size from 4 KB to 16 MB for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native.]
I/O Bandwidth of FusionFS
Ø As shown in the figure, doubling the number of nodes roughly doubles both read and write throughput, which demonstrates the near-linear scalability of FusionFS and Direct-FUSE up to 16 nodes.
[Figure: read and write bandwidth (MB/s, log scale) vs. number of nodes (1 to 16) for fusionfs and direct-fusionfs.]
Breakdown Analysis of I/O Processing in Direct-FUSE
Ø The dummy read/write takes only about 38 ns, which occupies less than 3% of the complete I/O function time in Direct-FUSE, even when the I/O size is very small.
– Dummy write/read: no actual data movement; returns immediately after reaching the backend service.
– Real write/read: the actual Direct-FUSE read and write I/O calls.
[Figure: dummy vs. real write latency and dummy vs. real read latency (ns, log scale) for transfer sizes from 1 B to 1 KB.]
Conclusions
Ø We have analyzed the additional overheads of the FUSE file system in detail.
Ø To support multiple backend services, we propose Direct-FUSE.
Ø Direct-FUSE largely removes the overheads of the FUSE kernel module and supports multiple FUSE backends simultaneously.