

  1. CompoundFS: Compounding I/O Operations in Firmware File Systems
     Yujie Ren¹, Jian Zhang², and Sudarsun Kannan¹
     ¹Rutgers University; ²ShanghaiTech University

  2. Outline
     • Background
     • Analysis
     • Design
     • Evaluation
     • Conclusion

  3. In-storage Processors Are Powerful
                 Intel X25M    Samsung 840    Samsung 970
     Year:       2008          2013           2018
     Price:      $7.4/GB       $0.92/GB       $0.80/GB
     CPU:        2-core        3-core         5-core
     RAM:        128MB DDR2    512MB LPDDR2   1GB LPDDR4
     B/W:        250 MB/s      500 MB/s       3300 MB/s
     Latency:    ~70 µs        ~60 µs         ~40 µs

  4. Software Latency Matters Now
     [Diagram: the write() path. The application issues write(); a kernel
     trap enters the VFS layer, page cache, the actual file system (e.g.,
     PMFS, ext4), the block I/O layer, and the device driver, with data
     copies and OS overhead along the way. The software stack adds 1-4 µs.]
     OS kernel overhead matters!

  5. Current Solutions
     • Direct-access file systems (e.g., Strata, SplitFS, DevFS) reduce
       software overhead by bypassing the OS kernel partially or fully
     [Diagram: three architectures, with data-plane and control-plane ops
     distinguished. Strata (SOSP'17): application + FS Lib in user space
     over a kernel FS server. SplitFS (SOSP'19): application + FS Lib in
     user space over a kernel DAX FS. DevFS (FAST'18): application + FS Lib
     in user space over a firmware FS inside the storage device.]

  6. Limitations of Current Solutions
     • DirectFS designs do not reduce boundary crossings
       - Strata needs boundary crossings between the FS Lib and FS Server
       - SplitFS needs kernel traps for control-plane operations
       - DevFS suffers high PCIe latency for every operation
     • DirectFS designs do not efficiently reduce data copies
       - Current solutions need multiple data copies back and forth between
         the application and the storage stack
     • DirectFS designs do not utilize in-storage computation
       - Current solutions use only host CPUs for I/O-related operations

  7. Outline
     • Background
     • Analysis
     • Design
     • Evaluation
     • Conclusion

  8. Analysis Methodology
     • Storage
       - Persistent memory emulated on DRAM, as in prior work (e.g., SplitFS)
     • File systems
       - ext4-DAX: ext4 on byte-addressable storage, bypassing the page cache
       - SplitFS: direct-access file system bypassing the kernel for
         data-plane ops
     • Application
       - LevelDB: well-known persistent key-value store
       - db_bench: random-write and random-read benchmarks

  9. LevelDB Overhead Breakdown
     [Bar chart: run-time percentage by component (data allocation, data
     copy, filesystem update, and lock in the OS; data allocation, data
     copy, and CRC32 in user space) for 256B and 4096B values on ext4-DAX
     and SplitFS.]
     • LevelDB spends significant time (~50%) in the OS storage stack
     • Spends ~15% of its time on data copies between the app and the OS
     • Spends ~20% of its time on app-level crash consistency (CRC of data)

  10. Outline
      • Background
      • Analysis
      • Design
      • Evaluation
      • Conclusion

  11. Our Solution: CompoundFS
      • Combine (compound) multiple file system I/O ops into one
      • Offload I/O pre- and post-processing to storage-level CPUs
      • Bypass the OS kernel and provide direct access

  12. Our Solution: CompoundFS
      • Combine (compound) multiple file system I/O ops into one
        - e.g., a write() after a read() is compounded into write-after-read()
        - Reduces boundary crossings between host and storage (e.g., syscalls)
      • Offload I/O pre- and post-processing to storage-level CPUs
        - e.g., a checksum() after a write() is compounded into
          write-and-checksum()
        - Storage CPUs perform the computation (e.g., checksum) and persist it
        - Reduces data movement cost across boundaries
      • Bypass the OS kernel and provide direct access
        - A firmware file system design provides direct access for data-plane
          and most control-plane operations

  13. I/O-Only Compound Operations: Read-Modify-Write
      Traditional FS path: read(data); modify in user space; write(data)
        - 2 syscalls (kernel traps) + 2 data copies
      CompoundFS path: read_modify_write(data)
        - Only 1 data copy, with direct access; StorageFS performs the
          compound op

  14. I/O + Compute Compound Operations: Write-and-Checksum
      Traditional FS path: write(data); write(checksum)
        - 2 syscalls (kernel traps) + 2 data copies
      CompoundFS path: write_and_checksum(data)
        - Only 1 data copy, with direct access; StorageFS handles the
          checksum calculation

  15. CompoundFS Architecture
      [Diagram: application threads issue ops, e.g.
        Op1:  open(File1) -> fd1
        Op2+: read_modify_write(fd2, buf, off=30, sz=5)
        Op3*: write_and_checksum(fd1, buf, off=10, sz=1K, checksum_pos=head)
        Op4:  read(fd2, buf, off=30, sz=5)
      UserLib (in host): converts POSIX I/O syscalls into CompoundFS
      compound ops; maintains a per-inode I/O queue and per-inode data
      buffer. StorageFS (in device): I/O request processing threads on
      device CPU cores compound I/O ops and perform the CRC calculation
      before write(); a journal (TxB ... TxE), metadata and data blocks in
      NVM, and a per-CPU credential table.]

  16. CompoundFS Implementation
      • Command-based architecture based on PMFS (EuroSys'14)
        - Control-plane ops (e.g., open) are issued as commands via ioctl()
        - ioctl() carries the arguments for each I/O op
      • Avoids VFS overhead
        - Control-plane ops issued via ioctl() skip the VFS layer
      • Avoids system call overhead
        - UserLib and StorageFS share a command buffer
        - UserLib adds requests to the command buffer
        - StorageFS processes requests from the buffer

  17. CompoundFS Challenges
      • Crash-consistency model for compound I/O operations
      • All-or-nothing model (current solution)
        - An entire compound operation is one transaction
        - Partially completed operations cannot be recovered
        - e.g., in write-and-checksum, only the data is persisted but the
          checksum is not
      • All-or-something model (ongoing)
        - Fine-grained journaling and partial recovery are supported
        - Recovery could become complex

  18. Outline
      • Background
      • Analysis
      • Design
      • Evaluation
      • Conclusion

  19. Evaluation Goals
      • Effectiveness in reducing boundary crossings
      • Effectiveness in reducing data copy overheads
      • Ability to exploit the compute capability of modern storage

  20. Experimental Setup
      • Hardware platform
        - Dual-socket 64-core Xeon Scalable CPU @ 2.6GHz
        - 512GB Intel DC Optane NVM
      • Emulated firmware-level FS
        - Reserve dedicated device threads to handle I/O requests
        - Add PCIe latency for every I/O operation
        - Reduce CPU frequency to 1.2GHz for device CPUs
      • State-of-the-art file systems
        - ext4-DAX (kernel-level file system)
        - SplitFS (user-level file system)
        - DevFS (device-level file system)

  21. Micro-Benchmark
      [Bar charts: throughput (MB/s) for 256B and 4096B value sizes on
      ext4-DAX, SplitFS, DevFS, CompoundFS, and CompoundFS-slowCPU;
      read-modify-write (up to 1.25x) and write-and-checksum (up to 2.1x).]
      • CompoundFS reduces unnecessary data movement and system call
        overhead by combining operations
      • Even with slow device CPUs, CompoundFS can still provide gains from
        in-storage computation

  22. LevelDB
      [Bar charts: db_bench random read and random write with 500k keys and
      512B/4096B values; latency (us/op) and throughput (MB/s) across
      ext4-DAX, SplitFS, DevFS, CompoundFS, and CompoundFS-slowCPU; up to
      1.75x improvement.]
      • CompoundFS also shows promising speedups for LevelDB

  23. Conclusion
      • Storage hardware is moving into the microsecond era
        - Software overhead matters, and providing direct access is critical
        - Storage compute capability can benefit I/O-intensive applications
      • CompoundFS combines I/O ops and offloads computation
        - Reduces boundary crossing (system call) and data copy overheads
        - Takes advantage of in-storage compute resources
      • Our ongoing work
        - Fine-grained crash-consistency mechanism
        - Efficient I/O scheduler for managing computation in storage

  24. Thanks! Questions?
      yujie.ren@rutgers.edu
