
Verifying a high-performance crash-safe file system using a tree specification

Haogang Chen, Tej Chajed, Stephanie Wang, Alex Konradi, Atalay İleri, Adam Chlipala, M. Frans Kaashoek, Nickolai Zeldovich


  1. Verifying a high-performance crash-safe file system using a tree specification • Haogang Chen, Tej Chajed, Stephanie Wang, Alex Konradi, Atalay İleri, Adam Chlipala, M. Frans Kaashoek, Nickolai Zeldovich

  2. File systems are difficult to make correct • Complicated implementations • on-disk layout • in-memory data structures • Computer can crash at any time

  3. Despite much effort, file systems have bugs • File systems still have subtle bugs • Well documented [Lu, TOS ’14] [Min, SOSP ’15] • Example from ext4: combination of two optimizations allows data to leak from one file to another on crash • Discovered after 6 years [Kara 2014]

  4. Approach: formal verification • Write a specification • Prove implementation meets the specification • Ensures implementation handles all corner cases • Proof assistant (Coq) ensures proof is correct • Avoid large class of bugs

  5. Existing verified file systems [chart with axes correctness and performance; points: verified file systems (FSCQ [SOSP ’15], BilbyFS [ASPLOS ’16], Yggdrasil [OSDI ’16]) and ext4, btrfs, ZFS]

  6. Goal: verified high-performance file system [same chart, with a "?" marking the goal]

  7. Strawman: optimize FSCQ [diagram on the correctness and performance axes: FSCQ code]

  8. Strawman: optimize FSCQ [diagram: spec and proof added for the FSCQ code]

  9. Strawman: optimize FSCQ [diagram: spec, proof, FSCQ code; a second point, fast FSCQ, labeled "proof?"]

  10. Problem: specification incompatible with high performance • Achieving high performance requires optimizations • Some optimizations change file-system behavior • Requires changes to specification

  11. Example optimization: deferred commit • Deferred commit: buffer system calls until fsync • FSCQ’s specification: “if create(f) has returned and computer crashes, f exists” • Deferred commit requires a new specification
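The conflict between deferred commit and FSCQ's specification can be made concrete with a toy model. The following is a minimal sketch, not DFSCQ's implementation: the class `DeferredCommitFS` and its methods are hypothetical names, the "disk" is just a set of file names, and a crash is modeled as discarding everything buffered in memory.

```python
from typing import Iterable, List, Set, Tuple


class DeferredCommitFS:
    """Toy file system that buffers metadata operations until fsync (hypothetical)."""

    def __init__(self) -> None:
        self.disk: Set[str] = set()               # names that are durable
        self.pending: List[Tuple[str, str]] = []  # buffered (operation, name) pairs

    @staticmethod
    def _apply(names: Iterable[str], ops: List[Tuple[str, str]]) -> Set[str]:
        out = set(names)
        for op, name in ops:
            if op == "create":
                out.add(name)
            else:  # "unlink"
                out.discard(name)
        return out

    def create(self, name: str) -> None:
        self.pending.append(("create", name))  # returns before anything is durable

    def unlink(self, name: str) -> None:
        self.pending.append(("unlink", name))

    def exists(self, name: str) -> bool:
        # Reads observe buffered operations as well as the disk.
        return name in self._apply(self.disk, self.pending)

    def fsync(self) -> None:
        self.disk = self._apply(self.disk, self.pending)
        self.pending = []

    def crash(self) -> None:
        self.pending = []  # memory is reset; only the disk survives


fs = DeferredCommitFS()
fs.create("f")
visible_before_crash = fs.exists("f")
fs.crash()
visible_after_crash = fs.exists("f")
```

Here `create("f")` has returned, yet `f` does not exist after the crash, which is exactly what the quoted FSCQ specification forbids. Calling `fs.fsync()` before the crash would make the file durable.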

  12. Optimizations that change crash behavior • Deferred commit: buffer system calls until fsync • Log-bypass writes: skip log for data writes • Buffer cache: cache data until fdatasync • Existing specifications do not support these optimizations

  13. Contribution: DFSCQ file system • Precise specification for a subset of POSIX • supports deferred commit and log-bypass writes • Verified, crash-safe file system • Traditional journalling file-system design • Implements most of ext4’s optimizations • Machine-checked proof that implementation meets specification • Performance on par with ext4 (but DFSCQ has fewer features)

  14. Specifying a file system • Design abstract state

  15. Specifying a file system • Design abstract state • Describe how system calls execute

  16. Specifying a file system • Design abstract state • Describe how system calls execute • Describe effect of crashes

  17. Starting point: tree as abstract state • Trees are a simplified abstraction of a file system [diagram: tree with nodes g and f]

  18. Specification abstracts implementation details [diagram: abstract state (tree with g and f) related to the implementation’s state]

  19. Specify how system calls affect abstract state • Specification describes transition [diagram: unlink(g) takes a tree with g and f to a tree with f]
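The idea of a system call as a transition on an abstract tree can be sketched as a pure function. This is an illustrative model only: the `Tree` type and the `unlink` function below are hypothetical and cover just the root directory, whereas DFSCQ's actual specification is written in Coq.

```python
from typing import Dict, Union

# Hypothetical abstract state: a directory maps names to subtrees,
# and a file is represented by its contents.
Tree = Union[bytes, Dict[str, "Tree"]]


def unlink(tree: Dict[str, Tree], name: str) -> Dict[str, Tree]:
    """Specification-style transition: return the tree without `name`.

    The input tree is left untouched, so both the pre-state and the
    post-state of the transition remain available.
    """
    if name not in tree:
        raise FileNotFoundError(name)
    return {k: v for k, v in tree.items() if k != name}


before: Dict[str, Tree] = {"f": b"data-f", "g": b"data-g"}
after = unlink(before, "g")
```

Writing the transition as a function from old state to new state, rather than as an in-place update, mirrors how a specification relates the state before a call to the state after it.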

  20. Challenges in specifying crash behavior • Optimizations mean crashes can be complex • Problem 1: deferred commit • Problem 2: log-bypass writes • Problem 3: caching

  21. Problem 1: deferred commit leads to many crash states [diagram: unlink(g) on a tree with g and f]

  22. Problem 1: deferred commit leads to many crash states [diagram: unlink(g), then crash: reset memory]

  23. Problem 1: deferred commit leads to many crash states [diagram: unlink(g), then crash: reset memory; trees with and without g shown]

  24. How do we specify crash outcomes with deferred commit? [diagram: trees with g and f]

  25. How do we specify crash outcomes with deferred commit? [diagram: trees with g and f; crash]

  26. Specify deferred commit using tree sequences [diagram: tree sequence]

  27. Specify deferred commit using tree sequences • Abstract state is a sequence of trees [diagram: tree sequence]

  28. Specify deferred commit using tree sequences • Abstract state is a sequence of trees • Always read from the latest tree [diagram: tree sequence]

  29. Specify deferred commit using tree sequences • Metadata updates add new trees in the specification • Always read from the latest tree [diagram: unlink(g) appends a tree]

  30. Specify deferred commit using tree sequences • Metadata updates add new trees in the specification • Always read from the latest tree [diagram: resulting tree sequence]

  31. Specify deferred commit using tree sequences • Metadata updates add new trees in the specification • Always read from the latest tree [diagram: truncate(f,2) appends a tree]

  32. Specify deferred commit using tree sequences • Metadata updates add new trees in the specification • Always read from the latest tree [diagram: resulting tree sequence]

  33. Specify deferred commit using tree sequences • Metadata updates add new trees in the specification • Always read from the latest tree [diagram: rename(f,/) appends a tree]
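The two rules on these slides (metadata updates append a tree, reads use the latest tree) can be sketched in a few lines. This is a simplified model under stated assumptions: the class `TreeSeq` is a hypothetical name, each tree is a flat root directory mapping names to contents, and only `unlink` and `truncate` are modeled.

```python
from typing import Dict, List

Tree = Dict[str, bytes]  # simplifying assumption: one flat directory


class TreeSeq:
    """Abstract state as a sequence of trees, oldest first (hypothetical model)."""

    def __init__(self, initial: Tree) -> None:
        self.trees: List[Tree] = [dict(initial)]

    def latest(self) -> Tree:
        return self.trees[-1]

    def read(self, name: str) -> bytes:
        # Reads always observe the latest tree.
        return self.latest()[name]

    def unlink(self, name: str) -> None:
        new = dict(self.latest())
        del new[name]
        self.trees.append(new)  # metadata update: append, never overwrite

    def truncate(self, name: str, size: int) -> None:
        new = dict(self.latest())
        # Shrink, or extend with zero bytes, to exactly `size` bytes.
        new[name] = new[name][:size].ljust(size, b"\0")
        self.trees.append(new)


ts = TreeSeq({"f": b"abcd", "g": b"xy"})
ts.unlink("g")
ts.truncate("f", 2)
```

After these two calls the sequence holds three trees. The earlier trees are kept because each one records a state the disk might still be in if the corresponding update has not yet been committed.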

  34. Behavior of tree sequences on crash • What about crash behavior? [diagram: tree sequence]

  35. Behavior of tree sequences on crash • What about crash behavior? [diagram: tree sequence; crash; post-crash tree sequence]

  36. Crash specification allows background commits [diagram: tree sequence; crash; post-crash states]

  37. Specification for fsync [diagram: fsync("/") on a tree sequence]
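Crash and fsync behavior on a tree sequence can be sketched as two small functions. This is a hedged reading of the slides rather than DFSCQ's Coq definitions: it assumes that a crash may leave any one tree of the sequence as the sole surviving tree (which is what permits background commits), and that fsync collapses the sequence to its latest tree. The function names are hypothetical.

```python
from typing import Dict, List

Tree = Dict[str, bytes]  # simplifying assumption: one flat directory


def crash_outcomes(trees: List[Tree]) -> List[List[Tree]]:
    """All post-crash tree sequences allowed by the sketch.

    Each outcome is a one-element sequence: the crash picks some tree,
    and every other tree in the sequence is forgotten.
    """
    return [[dict(t)] for t in trees]


def fsync(trees: List[Tree]) -> List[Tree]:
    """Collapse the sequence to its latest tree."""
    return [dict(trees[-1])]


seq: List[Tree] = [
    {"f": b"abcd", "g": b"xy"},  # initial tree
    {"f": b"abcd"},              # after unlink(g)
    {"f": b"ab"},                # after truncate(f, 2)
]
outcomes_before_fsync = crash_outcomes(seq)
outcomes_after_fsync = crash_outcomes(fsync(seq))
```

Before fsync there are three possible post-crash states. After fsync only the latest tree can survive a crash, which is the durability guarantee an application gets from calling it.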

  38. Problem 2: log-bypass writes may reorder updates • Log-bypass writes: update file data blocks in place, skipping log [diagram: write, then rename]

  39. Problem 2: log-bypass writes may reorder updates • Log-bypass writes: update file data blocks in place, skipping log • Effect: data writes and metadata updates can be reordered on crash [diagram: write, rename, crash]

  40. Log-bypass writes • At minimum, writes to latest tree [diagram: write(f,…) on a tree sequence]

  41. Log-bypass writes • Affects the same file in earlier trees [diagram: write(f,…) on a tree sequence]

  42. Specify that other files are unaffected • Puts an obligation on the implementation to avoid block re-use within a tree sequence [diagram: write(f,…) to block b21; "?" on an earlier tree]

  43. Specify that other files are unaffected • Puts an obligation on the implementation to avoid block re-use within a tree sequence [diagram: write(f,…); block b21]

  44. Specify that other files are unaffected • Puts an obligation on the implementation to avoid block re-use within a tree sequence [diagram: write(f,…); blocks b21 and b51]
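The effect of a log-bypass write on a tree sequence can be sketched as follows. This is a simplified model, not DFSCQ's definition: identifying "the same file" across trees by an inode number is an assumption of this sketch, the function `write_bypass` is a hypothetical name, and the write replaces the whole file contents rather than a single block.

```python
from typing import Dict, List, Tuple

File = Tuple[int, bytes]  # (hypothetical inode number, contents)
Tree = Dict[str, File]    # simplifying assumption: one flat directory


def write_bypass(trees: List[Tree], name: str, data: bytes) -> List[Tree]:
    """Apply a data write to `name` in the latest tree and to the same
    file (same inode number) in every earlier tree; all other files are
    left exactly as they were."""
    inum, _ = trees[-1][name]
    updated: List[Tree] = []
    for tree in trees:
        updated.append({
            n: ((i, data) if i == inum else (i, contents))
            for n, (i, contents) in tree.items()
        })
    return updated


seq: List[Tree] = [
    {"f": (1, b"old"), "g": (2, b"gg")},  # earlier tree, g still present
    {"f": (1, b"old")},                   # latest tree, after unlink(g)
]
result = write_bypass(seq, "f", b"new")
```

In the result, `f` has the new contents in both trees while `g` in the earlier tree is untouched. An implementation that reused one of g's freed blocks for `f` before the unlink was committed would overwrite g's data in place and so could not satisfy this property, which is the obligation to avoid block re-use within a tree sequence.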

  45. Problem 3: data writes are cached • Write-back buffer cache [diagram: write, crash]

  46. Problem 3: data writes are cached • Write-back buffer cache • Data can be persisted in any order [diagram: write, crash]

  47. Specifying data caching: block sets [diagram: tree sequence; an uncached block has two possible values: old and new]

  48. Behavior of block sets on crash [diagram: tree sequence; crash]

  49. Behavior of block sets on crash • Two degrees of non-determinism in crash states [diagram: tree sequence; crash]

  50. Behavior of block sets on crash • Two degrees of non-determinism in crash states • Specification allows metadata and data updates to be reordered [diagram: tree sequence; crash]

  51. Specification for fdatasync [diagram: fdatasync(f) on a tree sequence]

  52. Specification for fdatasync • fdatasync specification says block sets collapse in every tree [diagram: fdatasync(f) on a tree sequence]
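Block sets can be sketched by letting each block of a file carry the list of values it might hold on disk. This is an illustrative model for a single file in a single tree; the names `write_block`, `fdatasync`, and `crash_contents` are hypothetical, and the slides state that in DFSCQ's specification the collapse happens in every tree of the sequence.

```python
from itertools import product
from typing import List

BlockSet = List[bytes]  # possible on-disk values, newest last
File = List[BlockSet]   # one block set per block of the file


def write_block(file: File, index: int, data: bytes) -> File:
    """A cached write adds a new possible value; older values stay possible."""
    return [bs + [data] if i == index else list(bs) for i, bs in enumerate(file)]


def fdatasync(file: File) -> File:
    """Collapse every block set to its newest value."""
    return [[bs[-1]] for bs in file]


def crash_contents(file: File) -> List[List[bytes]]:
    """Every file content a crash may leave behind: each block
    independently holds any value from its block set."""
    return [list(choice) for choice in product(*file)]


f: File = [[b"a"], [b"b"]]     # two blocks, both durable
f = write_block(f, 0, b"A")
f = write_block(f, 1, b"B")
possible = crash_contents(f)
synced = crash_contents(fdatasync(f))
```

With two unsynced writes to different blocks there are four possible post-crash contents, since data can be persisted in any order. After `fdatasync` only the newest contents remain possible.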

  53. Summary: DFSCQ’s tree-based specification • metadata operations add a new tree • fsync collapses to latest tree • writes update block sets in every tree • fdatasync collapses block sets in every tree

  54. Prove implementation meets specification [diagram: stat(g) on the abstract state and on the implementation, each returning length: 2, type: file, …]

  55. Prove implementation meets specification • Return values match [diagram: stat(g) on the abstract state and on the implementation, each returning length: 2, type: file, …]

  56. Prove implementation meets specification • Return values match [diagram: stat(g), then unlink(g), on the abstract state and on the implementation]

  57. Prove implementation meets specification • Return values match • Disk continues to relate to abstract state [diagram: stat(g), then unlink(g), on the abstract state and on the implementation]
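The two proof obligations on these slides (return values match, and the disk continues to relate to the abstract state) can be illustrated with a runtime check on toy models. This is only a sketch: DFSCQ establishes these properties for all executions with a machine-checked Coq proof, whereas the code below checks them for one run. `SpecFS`, `ImplFS`, `abstraction`, and `check_step` are hypothetical names.

```python
from typing import Dict, List, Tuple


class SpecFS:
    """Abstract state: a flat tree mapping names to file contents."""

    def __init__(self, tree: Dict[str, bytes]) -> None:
        self.tree = dict(tree)

    def stat(self, name: str) -> Dict[str, object]:
        return {"length": len(self.tree[name]), "type": "file"}

    def unlink(self, name: str) -> None:
        del self.tree[name]


class ImplFS:
    """Toy implementation: directory entries plus an inode table."""

    def __init__(self, tree: Dict[str, bytes]) -> None:
        self.inodes: Dict[int, bytes] = {}
        self.dirents: List[Tuple[str, int]] = []
        for inum, (name, data) in enumerate(sorted(tree.items())):
            self.inodes[inum] = data
            self.dirents.append((name, inum))

    def _lookup(self, name: str) -> int:
        for entry_name, inum in self.dirents:
            if entry_name == name:
                return inum
        raise FileNotFoundError(name)

    def stat(self, name: str) -> Dict[str, object]:
        return {"length": len(self.inodes[self._lookup(name)]), "type": "file"}

    def unlink(self, name: str) -> None:
        inum = self._lookup(name)
        self.dirents = [(n, i) for n, i in self.dirents if i != inum]
        del self.inodes[inum]

    def abstraction(self) -> Dict[str, bytes]:
        """Map the implementation's state to the abstract tree it represents."""
        return {n: self.inodes[i] for n, i in self.dirents}


def check_step(impl: ImplFS, spec: SpecFS, op: str, *args: str) -> object:
    """Run one operation on both sides and check both obligations."""
    impl_ret = getattr(impl, op)(*args)
    spec_ret = getattr(spec, op)(*args)
    assert impl_ret == spec_ret, "return values must match"
    assert impl.abstraction() == spec.tree, "disk must still relate to abstract state"
    return impl_ret


initial = {"f": b"hello", "g": b"hi"}
impl, spec = ImplFS(initial), SpecFS(initial)
stat_g = check_step(impl, spec, "stat", "g")
check_step(impl, spec, "unlink", "g")
```

The second assertion in `check_step` is what lets the argument continue step by step: because the relation between disk and abstract state holds after each operation, the next operation starts from a related pair of states again.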

  58. DFSCQ Design [layer diagram labels: directory, name cache, inode, k-indirect blocks, dirty blocks, block allocator, free-bit cache, avoid re-use, logging, checksums, deferred commit, log-bypass API, buffer cache]

  59. Many single-layer optimizations • Affect only proof of single layer [same layer diagram]

  60. Many single-layer optimizations • Affect only proof of single layer [same layer diagram, with the added label "cache free blocks"]
