All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications

Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Ramnatthan Alagappan, Samer Al-Kiswany, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau


  1. Studying File Systems
      - Used BOB to study how persistence properties vary across file systems
      - Studied six file systems: ext2, ext3, ext4, btrfs, xfs, reiserfs
        - A total of 16 configurations
      CMU SDI Seminar 14 25

  2-10. Study Results: Atomicity
      Question asked for each column:
      - Single sector: write(512) atomic?
      - Multi sector: write(1GB) atomic?
      - Append content: open(file, O_APPEND), write(12K) atomic?
      - Directory operation: rename(old, new) atomic?

      x marks a configuration where the operation was found not to be atomic; - means no x on the slide.

      Configuration      Single sector   Multi sector   Append content   Directory operation
      ext2 async         -               x              x                x
      ext2 sync          -               x              x                x
      ext3 writeback     -               x              x                -
      ext3 ordered       -               x              -                -
      ext3 journal       -               x              -                -
      ext4 writeback     -               x              x                -
      ext4 ordered       -               x              -                -
      ext4 no-delalloc   -               x              -                -
      ext4 journal       -               x              -                -
      btrfs              -               x              -                -
      xfs default        -               x              -                -
      xfs wsync          -               x              -                -

  11-17. Study Results: Ordering
      Example for the first column: write(4K) -> rename()

      x marks a configuration where the two operations were found not to persist in order; - means no x on the slide.

      Configuration      Overwrite -> any op   Append -> any op   Dir op -> any op   Append(f) -> rename(f)
      ext2 async         x                     x                  x                  x
      ext2 sync          -                     -                  -                  -
      ext3 writeback     x                     x                  -                  x
      ext3 ordered       x                     -                  -                  -
      ext3 journal       -                     -                  -                  -
      ext4 writeback     x                     x                  -                  x
      ext4 ordered       x                     -                  -                  -
      ext4 no-delalloc   x                     -                  -                  -
      ext4 journal       -                     -                  -                  -
      btrfs              -                     x                  x                  -
      xfs default        x                     x                  -                  -
      xfs wsync          x                     -                  -                  -

  18. File-System Study Results
      - Persistence properties vary widely among file systems
        - Even across different configurations of the same file system
      - Applications should not rely on them
      - Testing application correctness on a single file system is not enough
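
One way an application can avoid leaning on implicit ordering is to make each step explicit with fsync. The following is a minimal sketch, not something shown in the talk; the helper name `atomic_replace` is hypothetical and a POSIX system is assumed. It still relies on rename being atomic, which the atomicity results above do not show for every configuration.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` without relying on implicit write ordering."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory,
    # so the final rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data to disk before the rename
        os.replace(tmp_path, path)  # rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the directory entry so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The two fsync calls address the ordering and durability columns respectively: the first orders the file data before the rename, the second makes the directory operation durable.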

  19. Outline
      - Introduction
      - Background
      - Analyzing file systems with BOB
      - Analyzing applications with ALICE
      - Application Study
      - Conclusion and Future Work

  20. Application-Level Intelligent Crash Explorer (ALICE)
      - ALICE: a tool to find crash vulnerabilities
      - Application crash vulnerabilities
        - Code that depends on specific persistence properties for correct behavior
        - Ex: if the file system doesn't persist two system calls in order, data corruption results

  21. ALICE Methodology
      - Construct a crash state by violating a single persistence property
      - Run the application on the crash state (allow recovery)
      - Examine the application state
      - If the application is inconsistent, it depended on the persistence property violated in the crash state
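
The methodology above can be sketched as a small loop. This is an illustrative toy, not ALICE's code: the disk is modeled as a dict from path to bytes, and `find_vulnerable_properties` and `checker` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Disk = Dict[str, bytes]  # toy disk image: path -> file contents


def find_vulnerable_properties(
    crash_states: Iterable[Tuple[str, Disk]],
    recover_and_check: Callable[[Disk], bool],
) -> List[str]:
    """Return the persistence properties whose violation broke the application."""
    broken = []
    for violated_property, disk in crash_states:
        # Run recovery plus a consistency check on a copy,
        # so one crash state cannot affect the next.
        if not recover_and_check(dict(disk)):
            broken.append(violated_property)
    return broken


def checker(disk: Disk) -> bool:
    # Toy invariant: "perm" is either absent or holds the full 4 bytes.
    return "perm" not in disk or len(disk["perm"]) == 4


states = [
    ("ordering: link persisted before append", {"tmp": b"", "perm": b""}),
    ("atomicity: prefix only", {"tmp": b"data"}),
]
print(find_vulnerable_properties(states, checker))
```

Each crash state violates exactly one property, so a failed check points directly at the property the application depended on.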

  22. ALICE Overview
      [Figure: ALICE pipeline. An application workload (git add file1) produces a system-call trace: creat(index.lock), creat(tmp), append(tmp, 4K), fsync(tmp), link(tmp, perm). The crash state constructor, together with the FS abstract persistence model, turns the trace into crash states. The application checker (git status) runs on each crash state and reports ERROR for an inconsistent one.]

  24. Tracing the Workload
      - Run the application workload
      - Collect the system-call traces
      - System calls are converted into logical operations:
        - Abstract away the current file offset, fd, etc.
        - Group writev(), pwrite(), etc. into a single type of operation
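
A rough sketch of this conversion, under stated assumptions: the trace record format (dicts with `call`, `fd`, `path`, `offset`, `data`) is invented for illustration, and an open of a path not seen before is treated as a creat. ALICE's real trace format is not shown in the slides.

```python
def to_logical_ops(trace):
    """Turn fd/offset-based records into path-based logical operations."""
    fd_path, fd_offset, size = {}, {}, {}
    ops = []
    for rec in trace:
        call = rec["call"]
        if call == "open":
            fd_path[rec["fd"]] = rec["path"]
            fd_offset[rec["fd"]] = 0
            if rec["path"] not in size:  # simplification: first open creates
                size[rec["path"]] = 0
                ops.append(("creat", rec["path"]))
        elif call in ("write", "pwrite"):
            path = fd_path[rec["fd"]]
            # pwrite carries its own offset; write uses the fd's current one.
            offset = rec["offset"] if call == "pwrite" else fd_offset[rec["fd"]]
            length = len(rec["data"])
            kind = "append" if offset >= size[path] else "overwrite"
            ops.append((kind, path, offset, length))
            size[path] = max(size[path], offset + length)
            if call == "write":
                fd_offset[rec["fd"]] = offset + length
        elif call == "fsync":
            ops.append(("fsync", fd_path[rec["fd"]]))
    return ops


raw = [
    {"call": "open", "path": "tmp", "fd": 3},
    {"call": "write", "fd": 3, "data": b"abcd"},
    {"call": "pwrite", "fd": 3, "offset": 0, "data": b"zz"},
    {"call": "fsync", "fd": 3},
]
print(to_logical_ops(raw))
```

After conversion, write and pwrite are indistinguishable: both become an append or an overwrite on a path, with no fd left in the operation.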

  26. Constructing Crash States
      ALICE constructs crash states by applying a subset of operations to the initial disk image:
        creat(index.lock)
        creat(tmp)
        append(tmp, 4K)
        fsync(tmp)
        link(tmp, perm)
      [Figure: initial disk state -> crash state]
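
As a minimal sketch of "applying a subset of operations", assume the disk image is a dict from path to bytes and operations are tuples. The names `apply_ops` and `TRACE` are hypothetical; the operations mirror the trace on the slide.

```python
def apply_ops(initial, ops):
    """Apply logical operations to a copy of the initial disk image."""
    disk = dict(initial)
    for op in ops:
        name = op[0]
        if name == "creat":
            disk[op[1]] = b""
        elif name == "append":
            disk[op[1]] = disk.get(op[1], b"") + op[2]
        elif name == "link":
            if op[1] in disk:  # linking a missing file is skipped in this toy
                disk[op[2]] = disk[op[1]]
        # fsync changes no contents; it only constrains which subsets are legal.
    return disk


TRACE = [
    ("creat", "index.lock"),
    ("creat", "tmp"),
    ("append", "tmp", b"x" * 4096),
    ("fsync", "tmp"),
    ("link", "tmp", "perm"),
]

crash_state = apply_ops({}, TRACE[:3])  # crash after the append
print(sorted(crash_state))
```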

  27-30. Constructing Crash States
      Persistence properties violated:
      1. Atomicity across system calls
         Method: apply a prefix of operations
           creat(index.lock)
           creat(tmp)
           append(tmp, 4K)
           fsync(tmp)
           link(tmp, perm)
      2. Atomicity within system calls
         Method: apply a prefix + a partial operation
         (the slide shows append(tmp, 4K) broken into append(tmp, 512) pieces)
           creat(index.lock)
           creat(tmp)
           append(tmp, 512)
           …
           append(tmp, 512)
           fsync(tmp)
           link(tmp, perm)
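
The two methods above can be sketched as follows. This is a toy under assumptions: operations are tuples, only appends are split, and the 512-byte piece size is taken from the slide; the function names are hypothetical.

```python
def prefix_states(ops):
    """Atomicity across system calls: every prefix of the trace."""
    return [ops[:n] for n in range(len(ops) + 1)]


def partial_states(ops, sector=512):
    """Atomicity within a system call: a prefix plus part of the next append."""
    states = []
    for n, op in enumerate(ops):
        if op[0] != "append":
            continue
        _, path, data = op
        # Stop strictly inside the append, at each sector boundary.
        for cut in range(sector, len(data), sector):
            states.append(ops[:n] + [("append", path, data[:cut])])
    return states


TRACE = [
    ("creat", "index.lock"),
    ("creat", "tmp"),
    ("append", "tmp", b"x" * 4096),
    ("fsync", "tmp"),
    ("link", "tmp", "perm"),
]

print(len(prefix_states(TRACE)), len(partial_states(TRACE)))
```

A 4K append has seven interior 512-byte boundaries, so it contributes seven partial states on top of the six prefixes.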

  31-32. Constructing Crash States
      Persistence properties violated:
      3. Ordering among system calls
         Method: ignore an operation, apply a prefix
           creat(index.lock)
           creat(tmp)
           append(tmp, 4K)
           fsync(tmp)
           link(tmp, perm)
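
"Ignore an operation, apply a prefix" can be sketched like this. The enumeration is a simplification with hypothetical names: it skips one operation and keeps every prefix that extends past it, without yet asking whether a given file system could actually produce that state.

```python
def reordered_states(ops):
    """Ordering: drop one operation, then apply a prefix that reaches past it."""
    states = []
    for skipped in range(len(ops)):
        # end is exclusive; start at skipped + 2 so at least one later op applies.
        for end in range(skipped + 2, len(ops) + 1):
            states.append((skipped, ops[:skipped] + ops[skipped + 1:end]))
    return states


TRACE = [
    ("creat", "index.lock"),
    ("creat", "tmp"),
    ("append", "tmp", b"x" * 4096),
    ("fsync", "tmp"),
    ("link", "tmp", "perm"),
]

for skipped, state in reordered_states(TRACE):
    if skipped == 2 and len(state) == 4:
        print([op[0] for op in state])  # the append is missing, the link is present
```

The printed state links perm to an empty tmp. Whether that state is possible depends on the file system, since the fsync in between would normally forbid it.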

  34. FS Abstract Persistence Model (APM)
      - Each file system implements persistence properties differently
        - Ex: ext4 orders writes of a file before its rename
      - The APM defines which crash states are permitted
      - The APM defines atomicity and ordering constraints
      - APMs allow ALICE to model file-system behavior without the file-system implementation
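
One way to picture an APM is as a predicate that accepts or rejects a candidate crash state. The two toy models below are assumptions made for illustration, not ALICE's actual model format: `weak_apm` only honors fsync, and `rename_ordered_apm` adds a rule in the spirit of the ext4 example above.

```python
def weak_apm(ops, skipped, end):
    """Permit skipping ops[skipped] unless a later applied fsync covers its file."""
    path = ops[skipped][1]
    return not any(op == ("fsync", path) for op in ops[skipped + 1:end])


def rename_ordered_apm(ops, skipped, end):
    """Additionally order appends to a file before a later rename of that file."""
    if not weak_apm(ops, skipped, end):
        return False
    op = ops[skipped]
    if op[0] != "append":
        return True
    return not any(
        later[0] == "rename" and later[1] == op[1]
        for later in ops[skipped + 1:end]
    )


ops = [("creat", "tmp"), ("append", "tmp", b"data"), ("rename", "tmp", "perm")]
# Candidate crash state: the rename persisted but the append did not.
print(weak_apm(ops, 1, 3), rename_ordered_apm(ops, 1, 3))
```

The same candidate state is permitted under the weak model and rejected under the stricter one, which is how one trace can yield different vulnerability counts on different file systems.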

  36. Finding Crash Vulnerabilities
      - Identify the persistence property violated
      - Identify the system calls involved
      - Identify the source code lines involved
      Trace shown on the slide:
        creat(index.lock)
        creat(tmp)
        append(tmp, 4K)
        fsync(tmp)
        link(tmp, perm)

  38. ALICE Limitations
      - Not complete
        - Does not execute all code paths in the application
        - Does not explore all crash states
        - Does not test combinations of persistence property violations (ex: atomicity + ordering)
      - Cannot prove an update protocol is correct

  40. Application Study
      Used ALICE to study eleven applications:
      - Version Control Systems: Git, Mercurial
      - Key-Value Stores: LevelDB, GDBM, LMDB
      - Relational Databases: PostgreSQL, HSQLDB, SQLite
      - Distributed Systems: HDFS, ZooKeeper
      - Virtualization Platforms: VMWare Player

  41. Study Goals
      - Analyzed applications using a weak APM
        - Minimum constraints on possible crash states
      - Sought to answer:
        - Which persistence properties do applications depend upon?
        - What are the consequences of vulnerabilities?
        - How many vulnerabilities occur on today's file systems?
      - Did not seek to compare applications

  42. Study: Setup
      - What is correct behavior for an application?
        - We use the guarantees in its documentation
        - Where there is no documentation, we assume typical user expectations ("committed data is durable")
      - Configurations change guarantees
        - We test each configuration separately
        - Tested 34 configurations across 11 applications
      - Post-crash, we run all appropriate application recovery mechanisms
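
The "committed data is durable" expectation can be written as a small check. This is a hypothetical sketch: the function name and arguments are invented, and the acknowledgement string is the one printed in the Git example later in the deck.

```python
def commit_is_durable(stdout_seen: str, recovered_commits: list, new_commit: str) -> bool:
    """If the user was told the commit finished, it must survive the crash."""
    acknowledged = "finished commit" in stdout_seen
    # An unacknowledged commit may legitimately be missing after recovery.
    return (not acknowledged) or (new_commit in recovered_commits)


print(commit_is_durable("finished commit", [], "c1"))  # False: acknowledged but lost
```

The check says nothing about commits that were never acknowledged, which matches the idea that only output already seen by the user creates a durability obligation.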

  43-47. Example: Git
      store object:
        mkdir(o/x)
        creat(o/x/tmp_y)
        append(o/x/tmp_y)
        fsync(o/x/tmp_y)
        link(o/x/tmp_y, o/x/y)
        unlink(o/x/tmp_y)
      git commit:
        do(store object)
        creat(branch.lock)
        append(branch.lock)
        append(branch.lock)
        append(logs/branch)
        append(logs/HEAD)
        rename(branch.lock, x/branch)
        stdout("finished commit")
      Labels added across the builds:
      - Atomicity: next to append(logs/branch)
      - Ordering: next to append(o/x/tmp_y) and fsync(o/x/tmp_y)
      - Durability: next to append(logs/HEAD) and rename(branch.lock, x/branch)

  48-53. Vulnerability Types
      [Stacked bar chart; values per application, - means no bar segment]

      Application     Multi-call atomicity   Single-call atomicity   Ordering   Durability
      Git             1                      1                       6          1
      Mercurial       2                      2                       5          2
      LevelDB-1.10    1                      2                       6          1
      LevelDB-1.15    1                      2                       3          -
      GDBM            1                      1                       1          2
      LMDB            -                      1                       -          -
      PostgreSQL      -                      1                       -          -
      HSQLDB          -                      3                       4          3
      SQLite          -                      -                       -          1
      HDFS            -                      1                       1          -
      ZooKeeper       -                      1                       1          2
      VMWare Player   -                      1                       -          -

      60 vulnerabilities across 11 applications

  54-59. Vulnerability Consequences
      [Stacked bar chart; values per application, - means no bar segment]

      Application     Silent errors   Data loss   Cannot open   Failed reads/writes   Misc
      Git             -               1           3             5                     3
      Mercurial       -               2           1             6                     5
      LevelDB-1.10    1               1           5             4                     -
      LevelDB-1.15    2               -           2             2                     -
      GDBM            -               2           3             -                     -
      LMDB            -               -           -             -                     1
      PostgreSQL      -               -           1             -                     -
      HSQLDB          2               3           5             -                     -
      SQLite          -               1           -             -                     -
      HDFS            -               -           2             -                     -
      ZooKeeper       -               2           2             -                     -
      VMWare Player   -               -           1             -                     -

      Many vulnerabilities result in data loss, silent errors, and failed reads/writes

  60-64. Vulnerabilities on Current File Systems
      [Bar chart]

      Model / file system   #vulnerabilities
      Weak APM              60
      ext3-writeback        16
      ext3-ordered          12
      ext3-journal          10
      ext4-ordered          17
      btrfs                 31

      Every current file system exposes at least one vulnerability; btrfs exposes more than half

  65. Observations
      - Applications are very careful when overwriting user data
        - None required atomicity for multi-block overwrites
      - Applications are not as careful when appending to logs
        - Multi-block appends require prefix atomicity
        - Ex: write("ABC") should result in "A"/"AB"/"ABC"
      - Atomicity across system calls doesn't seem useful
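
Prefix atomicity for appends can be stated as a short predicate. This is an illustrative sketch with a hypothetical function name; it also accepts the case where none of the append persisted, which the slide's example does not list explicitly.

```python
def is_prefix_atomic(before: bytes, appended: bytes, after_crash: bytes) -> bool:
    """True if the file holds the old contents plus a prefix of the appended data."""
    if not after_crash.startswith(before):
        return False
    tail = after_crash[len(before):]
    return appended.startswith(tail)


for observed in (b"A", b"AB", b"ABC", b"AC", b"AB\x00"):
    print(observed, is_prefix_atomic(b"", b"ABC", observed))
```

The last two inputs model reordered and garbage-filled appends, which are the states a log reader that assumes prefix atomicity would mishandle.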
