
Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support
Yue Zhu*, Teng Wang*, Kathryn Mohror+, Adam Moody+, Kento Sato+, Muhib Khan*, Weikuan Yu*
*Florida State University, +Lawrence Livermore National Laboratory


  1. Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support. Yue Zhu*, Teng Wang*, Kathryn Mohror+, Adam Moody+, Kento Sato+, Muhib Khan*, Weikuan Yu*. *Florida State University, +Lawrence Livermore National Laboratory.

  2. Outline
     • Background & Motivation
     · Design
     · Performance Evaluation
     · Conclusion

  3. Introduction
     • High-performance computing (HPC) systems need efficient file systems to support large-scale scientific applications
       - Different file systems are used for different kinds of data within a single job
       - Both kernel- and user-level file systems can be used by applications
       - Because kernel-level file systems suffer from development complexity, reliability, and portability issues, user-level file systems are often preferred for special-purpose I/O workloads
     • Filesystem in Userspace (FUSE) (a minimal example follows this slide)
       - A software interface for Unix-like operating systems
       - Allows non-privileged users to create their own file systems without modifying kernel code
       - The user-defined file system runs as a separate process in user space
       - Examples: SSHFS, the GlusterFS client, FusionFS (BigData '14)
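To make the FUSE programming model on this slide concrete, below is a minimal, self-contained sketch of a user-level file system written against libfuse's high-level API (libfuse 2.x style). It is not taken from the slides or the paper; the hello_* names and the single read-only /hello file are purely illustrative.

```c
/*
 * Minimal sketch of a FUSE file system using libfuse's high-level API
 * (libfuse 2.x).  Build (assuming libfuse2 is installed):
 *   gcc -Wall hello.c `pkg-config fuse --cflags --libs` -o hello
 * The hello_* names and the single read-only /hello file are illustrative.
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

static const char *msg = "hello from user space\n";

/* Answer stat()/getattr requests forwarded by the FUSE kernel module. */
static int hello_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode  = S_IFDIR | 0755;          /* root directory */
        st->st_nlink = 2;
    } else if (strcmp(path, "/hello") == 0) {
        st->st_mode  = S_IFREG | 0444;          /* one read-only file */
        st->st_nlink = 1;
        st->st_size  = (off_t)strlen(msg);
    } else {
        return -ENOENT;
    }
    return 0;
}

/* Serve read() requests entirely from this user-space process. */
static int hello_read(const char *path, char *buf, size_t size,
                      off_t off, struct fuse_file_info *fi)
{
    size_t len = strlen(msg);
    (void)fi;
    if (strcmp(path, "/hello") != 0)
        return -ENOENT;
    if ((size_t)off >= len)
        return 0;
    if ((size_t)off + size > len)
        size = len - (size_t)off;
    memcpy(buf, msg + off, size);   /* copy data into the kernel-supplied buffer */
    return (int)size;
}

static struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* fuse_main() mounts the file system and runs the request loop as an
     * unprivileged process; every VFS call on the mount point is routed
     * through the FUSE kernel module to the callbacks above. */
    return fuse_main(argc, argv, &hello_ops, NULL);
}
```

Mounting it (e.g., ./hello /tmp/mnt) runs the request loop as an ordinary user process; this extra hop through the kernel and back is exactly what the following slides quantify and what Direct-FUSE later removes.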

  4. How Does FUSE Work?
     • Execution path of a function call
       - Sending the request to the user-level file system process:
         App program → VFS → FUSE kernel module → user-level file system process
       - Returning the data to the application program:
         User-level file system process → FUSE kernel module → VFS → app program
     [Figure: FUSE architecture, with the application program and user-level file system in user space, and the VFS, FUSE kernel module, Ext4, page cache, and storage device in kernel space]

  5. FUSE File System vs. Native File System

                                    FUSE File System   Native File System
     # User-kernel mode switches           4                    2
     # Context switches                    2                    0
     # Memory copies                       2                    1

     [Figure: same FUSE architecture diagram as the previous slide]

  6. Number of Context Switches & I/O Bandwidth
     • The complexity added to the FUSE execution path degrades I/O bandwidth
       - tmpfs: a file system that stores data in volatile memory
       - FUSE-tmpfs: a FUSE file system deployed on top of tmpfs
       - The dd micro-benchmark and the perf profiling tool are used to gather the I/O bandwidth and the number of context switches
       - Experiment method: continually issue 1000 writes

     Block Size   Write Bandwidth            # Context Switches
       (KB)       FUSE-tmpfs   tmpfs         FUSE-tmpfs   tmpfs
                    (MB/s)     (GB/s)
        4            163        1.3             1012        7
       16            372        1.6             1012        7
       64            519        1.7             1012        7
      128            549        2.0             1012        7
      256            569        2.4             2012        7

  7. Breakdown of Metadata & Data Latency
     • The actual file system operations (i.e., metadata or data operations) occupy only a small fraction of the total execution time
       - Tests run on tmpfs and FUSE-tmpfs
       - Real Operation (metadata): the time spent performing the operation itself
       - Data Movement: the actual write time within a complete write call
       - Overhead: everything besides the above two, e.g., the time spent on context switches
     [Fig. 1. Time Expense in Metadata Operations: latency (ns) of create and close on tmpfs vs. FUSE-tmpfs]
     [Fig. 2. Time Expense in Data Operations: latency (ns) for transfer sizes of 1-256 KB on tmpfs vs. FUSE-tmpfs]

  8. Existing Solutions and Our Approach
     • How can the overheads of FUSE be reduced?
       - Build an independent user-space library to avoid going through the kernel (e.g., IndexFS (SC '14), FusionFS)
       - However, this approach cannot support multiple FUSE libraries with distinct file paths and file descriptors
     • We propose Direct-FUSE to support multiple backend I/O services for a single application
       - We adapted libsysio for this purpose in Direct-FUSE
       - libsysio, developed by the Scalability team at Sandia National Laboratories, provides POSIX-like file I/O and name space support for remote file systems from an application's user-level address space

  9. Outline
     · Background & Motivation
     • Design
     · Performance Evaluation
     · Conclusion

  10. The Overview of Direct-FUSE
     • Direct-FUSE mainly consists of three components:
       1. Adapted-libsysio
          - Intercepts file paths and file descriptors to identify the backend service
          - Simplifies the metadata and data execution paths of the original libsysio
       2. Lightweight-libfuse (not the real libfuse)
          - Abstracts file system operations from backend services into unified APIs (see the sketch below)
       3. Backend services
          - Provide the defined file system operations (e.g., FusionFS)
     [Figure: Direct-FUSE architecture; the application program calls into Direct-FUSE (adapted-libsysio + lightweight-libfuse), which dispatches to backend services such as FUSE-Ext4 and the FusionFS client, backed by Ext4, the FusionFS server, etc.]
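The slides do not show code, but the "unified APIs" idea can be pictured as each backend exporting a small table of function pointers that the adapted-libsysio layer dispatches to. The sketch below is a hypothetical illustration only; dfuse_ops, dfuse_backend, and their fields are assumptions, not the actual Direct-FUSE interfaces.

```c
/*
 * Hypothetical illustration (not the actual Direct-FUSE source): each backend
 * is exposed to the adapted-libsysio layer as a table of unified operations
 * keyed by a path prefix.  All names here are assumptions.
 */
#include <stddef.h>
#include <sys/types.h>

struct dfuse_ops {
    int     (*open) (const char *path, int flags);
    ssize_t (*read) (int fd, void *buf, size_t count, off_t off);
    ssize_t (*write)(int fd, const void *buf, size_t count, off_t off);
    int     (*close)(int fd);
};

struct dfuse_backend {
    const char             *prefix;   /* e.g. "sshfs:" or "fusionfs:"          */
    const struct dfuse_ops *ops;      /* unified entry points into the backend */
};

/*
 * Each backend library (FUSE-Ext4, FusionFS client, ...) fills in one
 * dfuse_backend at initialization time; adapted-libsysio then dispatches
 * application I/O calls to the matching table without crossing into the
 * kernel.
 */
```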

  11. Path and File Descriptor Operations
     • To facilitate interception of file system operations for multiple backends, the operations are split into two categories (a sketch follows this list):
       1. File path operations
          i.  Intercept the prefix and path (e.g., sshfs:/sshfs/test.txt) and return the mount information
          ii. Look up the corresponding inode based on the mount information and redirect to the defined operations
       2. File descriptor operations
          i.  Find the open-file record for the given file descriptor
              - An open-file record contains pointers to the inode, the current stream position, etc.
          ii. Redirect to the defined operations based on the inode info in the open-file record
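A hypothetical sketch of the two interception paths described above, reusing the dfuse_ops/dfuse_backend types from the previous sketch; dfuse_open, dfuse_read, and struct open_file are illustrative assumptions rather than the real Direct-FUSE code.

```c
/*
 * Hypothetical sketch of the two interception paths, reusing the
 * dfuse_ops/dfuse_backend types from the previous sketch.  dfuse_open,
 * dfuse_read, and struct open_file are illustrative, not Direct-FUSE code.
 */
#include <string.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_OPEN 1024

struct open_file {
    const struct dfuse_backend *backend;     /* which backend owns this file   */
    int                         backend_fd;  /* descriptor inside that backend */
    off_t                       pos;         /* current stream position        */
};

static struct open_file open_table[MAX_OPEN];

/* File-path operation: match a "sshfs:"-style prefix, strip it, and hand the
 * remaining path to that backend's own open(). */
int dfuse_open(const char *path, int flags,
               const struct dfuse_backend *backends, int nbackends)
{
    for (int i = 0; i < nbackends; i++) {
        size_t plen = strlen(backends[i].prefix);
        if (strncmp(path, backends[i].prefix, plen) != 0)
            continue;
        int bfd = backends[i].ops->open(path + plen, flags);
        if (bfd < 0)
            return bfd;
        for (int fd = 0; fd < MAX_OPEN; fd++) {
            if (open_table[fd].backend == NULL) {
                open_table[fd] = (struct open_file){ &backends[i], bfd, 0 };
                return fd;   /* application-visible descriptor (a real library
                                would avoid clashing with libc descriptors) */
            }
        }
        backends[i].ops->close(bfd);
        return -EMFILE;
    }
    return -ENOENT;          /* no prefix matched: fall back to the native path */
}

/* File-descriptor operation: look up the open-file record and redirect. */
ssize_t dfuse_read(int fd, void *buf, size_t count)
{
    if (fd < 0 || fd >= MAX_OPEN || open_table[fd].backend == NULL)
        return -EBADF;
    struct open_file *of = &open_table[fd];
    ssize_t n = of->backend->ops->read(of->backend_fd, buf, count, of->pos);
    if (n > 0)
        of->pos += n;        /* advance the stream position kept in the record */
    return n;
}
```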

  12. Requirements for New Backends
     • Interact with the FUSE high-level APIs
     • Be packaged as an independent user-space library
       - The library contains the FUSE file system operations, an initialization function, and an unmount function
       - If a backend passes specialized data to the FUSE module via fuse_mount(), that data has to be made global for later file system operations
     • Be implemented in C/C++, or be binary compatible with C/C++
     (A hypothetical backend skeleton is sketched below.)
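As a rough picture of these requirements, a backend library might export an initialization and a teardown entry point alongside its usual fuse_operations table, with the fuse_mount() private data promoted to a global. The skeleton below is an assumption-laden sketch; mybackend_init, mybackend_fini, and mybackend_oper are invented names, not symbols required by the real framework.

```c
/*
 * Assumption-laden skeleton of what a Direct-FUSE-ready backend library might
 * export; mybackend_init, mybackend_fini, and mybackend_oper are invented
 * names, not symbols required by the real framework.
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <stddef.h>

/* The backend's usual high-level FUSE callbacks (getattr, read, write, ...),
 * filled in elsewhere in the library. */
static struct fuse_operations mybackend_oper;

/* Data a normal libfuse backend would receive through
 * fuse_get_context()->private_data must instead live in a global, since
 * there is no FUSE kernel session to carry it. */
static void *mybackend_private_data;

/* Called by Direct-FUSE once, in place of fuse_mount()/fuse_main(). */
int mybackend_init(const char *mountpoint, void *userdata)
{
    (void)mountpoint;
    mybackend_private_data = userdata;
    if (mybackend_oper.init)
        mybackend_oper.init(NULL);   /* reuse the backend's own init hook; many
                                        ignore the fuse_conn_info argument */
    return 0;
}

/* Called at application exit, in place of fuse_unmount(). */
void mybackend_fini(void)
{
    if (mybackend_oper.destroy)
        mybackend_oper.destroy(mybackend_private_data);
}
```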

  13. Outline
     · Background & Motivation
     · Design
     • Performance Evaluation
     · Conclusion

  14. Experimental Methodology
     • We compare the bandwidth of Direct-FUSE against a local FUSE file system and the native file system, on disk and in memory, using the IOzone benchmark
       - Disk
         - Ext4-fuse: a FUSE file system overlying Ext4
         - Ext4-direct: Ext4-fuse with the FUSE kernel bypassed
         - Ext4-native: the original Ext4 on disk
       - Memory
         - tmpfs-fuse, tmpfs-direct, and tmpfs-native are defined analogously to the three disk configurations
     • We also compare the I/O bandwidth of a distributed FUSE file system with Direct-FUSE
       - FusionFS: a distributed file system that supports metadata- and write-intensive operations

  15. Sequential Write Bandwidth
     • Direct-FUSE achieves write bandwidth comparable to the native file system
       - Ext4-direct outperforms Ext4-fuse by 16.5% on average
       - tmpfs-direct outperforms tmpfs-fuse by at least 2.15x
     [Figure: sequential write bandwidth (MB/s, log scale) vs. write transfer size (4 KB-1024 KB) for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native]

  16. Sequential Read Bandwidth
     • Similar to sequential writes, the read bandwidth of Direct-FUSE is comparable to the native file system
       - Ext4-direct outperforms Ext4-fuse by 2.5% on average
       - tmpfs-direct outperforms tmpfs-fuse by at least 2.26x
     [Figure: sequential read bandwidth (MB/s, log scale) vs. read transfer size (4 KB-1024 KB) for the same six configurations]

  17. Distributed I/O Bandwidth
     • Direct-FUSE outperforms FusionFS in write bandwidth and shows comparable read bandwidth
       - Writes benefit more from bypassing the FUSE kernel
     • Direct-FUSE delivers scalability similar to the original FusionFS
     [Figure: write and read bandwidth (MB/s, log scale) of fusionfs vs. direct-fusionfs on 1-16 nodes]

  18. Overhead Analysis
     • The dummy read/write occupies less than 3% of the complete I/O function time in Direct-FUSE, even for very small I/O sizes
       - Dummy write/read: no actual data movement; the call returns as soon as it reaches the backend service (see the sketch below)
       - Real write/read: the actual Direct-FUSE read and write I/O calls
     [Figure: latency (ns, log scale) of dummy vs. real write and dummy vs. real read for transfer sizes from 1 B to 1 KB]
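The "dummy" operation in this measurement can be thought of as a backend callback that acknowledges the request without touching any data, so its measured latency is pure framework overhead. The snippet below is a guess at what such a stub might look like, matching the hypothetical dfuse_ops.write signature from the earlier sketches; it is not taken from the Direct-FUSE sources.

```c
/*
 * Guess at what the "dummy" backend operation in this experiment might look
 * like: it acknowledges the request without moving any data, so its measured
 * latency is pure Direct-FUSE framework overhead.  Not from the real sources.
 */
#include <sys/types.h>

static ssize_t dummy_write(int fd, const void *buf, size_t count, off_t off)
{
    (void)fd; (void)buf; (void)off;      /* no data movement at all */
    return (ssize_t)count;               /* pretend the whole buffer was written */
}
```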

  19. Conclusions
     • We revealed and analyzed the context switch counts and time overheads in FUSE metadata and data operations
     • We designed and implemented Direct-FUSE, which avoids crossing the kernel boundary and supports multiple FUSE backends simultaneously
     • Our experimental results show that Direct-FUSE achieves significant performance improvements over the original FUSE file systems

  20. Sponsors of This Research
