SplitFS: Reducing Software Overhead in File Systems for Persistent Memory
Rohan Kadekodi, Se Kwon Lee, Sanidhya Kashyap*, Taesoo Kim, Aasheesh Kolli, Vijay Chidambaram (* on the job market)

Persistent Memory (PM): non-volatile, fast


  1. Outline
  • Target usage scenario
  • High-level design
  • Handling data operations
    • Handling file reads and updates
    • Handling file appends
  • Consistency guarantees
  • Evaluation

  7. Handling reads and updates
  The application issues reads and updates to U-Split in user space. On first access, U-Split calls mmap, and K-Split (ext4-DAX) in the kernel performs the mmap, establishing DAX-mmaps of the file on PM. Subsequent reads and updates are served directly through these DAX-mmaps: in the common case, file reads and updates do not pass through the kernel.

  19. Handling appends
  U-Split asks the kernel to create a pre-allocated staging file (staging-file inode, size = 100) and mmaps it. An append(foo, "abc") to file foo (inode size = 10) becomes a store into the staging file's mapping; a subsequent read(foo) of the appended data is served with loads from the same mapping. On fsync(foo), U-Split invokes relink(), which moves the staging file's blocks to foo's inode in a single ext4-journal transaction, without copying data. In the common case, file appends do not pass through the kernel.

  25. Consistency Guarantees

  Mode   | Comparable File Systems | SplitFS Mode   | Guarantees
  POSIX  | ext4-DAX                | SplitFS-POSIX  | Metadata atomicity
  Sync   | PMFS                    | SplitFS-Sync   | + Synchronous operations
  Strict | NOVA, Strata            | SplitFS-Strict | + Data atomicity

  Optimized logging is used to provide the stronger guarantees in sync and strict modes.

  28. Optimized logging
  SplitFS employs a per-application log in sync and strict modes, which logs every logical operation. In the common case:
  • Each log entry fits in one cache line
  • The entry is persisted using a single non-temporal store followed by an sfence instruction

  31. Flexible SplitFS
  Each application links against its own U-Split instance, and each instance can run in a different mode: App 1 with U-Split-strict, App 2 with U-Split-sync, App 3 with U-Split-POSIX. Data operations are served by the per-application U-Splits, while metadata operations from all applications go through the shared K-Split (ext4-DAX), which manages the files on PM.

  36. Visibility
  When are updates from one application visible to another?
  • All metadata operations are immediately visible to all other processes
  • Writes are visible to all other processes on a subsequent fsync()
  • Memory-mapped files have the same visibility guarantees as ext4-DAX

  40. SplitFS Techniques

  Technique                               | Benefit
  SplitFS architecture                    | Low-overhead data operations, correct metadata operations
  Staging + relink                        | Optimized appends, no data copy
  Optimized logging + out-of-place writes | Stronger guarantees

  44. Evaluation
  Setup:
  • 2-socket, 96-core machine with a 32 MB LLC
  • 768 GB Intel Optane DC PMM, 378 GB DRAM
  File systems compared: ext4-DAX, PMFS, NOVA, Strata


  47. Evaluation questions
  • Does SplitFS reduce software overhead compared to other file systems?
  • How does SplitFS perform on data-intensive workloads?
  • How does SplitFS perform on metadata-intensive workloads?
    • < 15% overhead for metadata-intensive workloads

  49. Software Overhead of SplitFS
  Benchmark: append 4 KB of data to a file. Time taken to copy the user data to PM: ~700 ns.
  Time per append (software overhead relative to the raw device time in parentheses):
  • device: 700 ns
  • SplitFS-strict: 1251 ns (0.8x)
  • Strata: 2450 ns (2.5x)
  • NOVA: 3021 ns (3x)
  • PMFS: 4150 ns (5x)
  • ext4-DAX: 9002 ns (12x)

  51. Workloads
  • Microbenchmarks: seq writes, seq reads, appends, rand reads, rand writes
  • Data intensive: YCSB on LevelDB, Redis, TPCC on SQLite
  • Metadata intensive: Tar, Git, Rsync

  53. YCSB on LevelDB
  Yahoo! Cloud Serving Benchmark: an industry-standard macro-benchmark. Insert 5M keys, then run 5M operations; key size = 16 bytes, value size = 1 KB.
  [Figure: throughput of SplitFS-Strict normalized to NOVA for each workload; absolute throughputs range from 13.39 kops/s to 191.54 kops/s]
  • Load A - 100% writes
  • Run A - 50% reads, 50% writes
  • Run B - 95% reads, 5% writes
  • Run C - 100% reads
  • Run D - 95% reads (latest), 5% writes
  • Load E - 100% writes
  • Run E - 95% range queries, 5% writes
  • Run F - 50% reads, 50% read-modify-writes
