Verifying a high-performance crash-safe file system using a tree specification - PowerPoint PPT Presentation



slide-1
SLIDE 1

Verifying a high-performance crash-safe file system using a tree specification

Haogang Chen, Tej Chajed, Stephanie Wang, Alex Konradi, Atalay İleri, Adam Chlipala, M. Frans Kaashoek, Nickolai Zeldovich

slide-2
SLIDE 2

File systems are difficult to make correct

  • Complicated implementations
  • on-disk layout
  • in-memory data structures
  • Computer can crash at any time

2

slide-3
SLIDE 3

Despite much effort, file systems have bugs

  • File systems still have subtle bugs
  • Well documented [Lu, TOS ’14] [Min, SOSP ’15]
  • Example from ext4:


combination of two optimizations allows data to leak from one file to another on crash

  • Discovered after 6 years [Kara 2014]

3

slide-4
SLIDE 4

Approach: formal verification

  • Write a specification
  • Prove implementation meets the specification
  • Ensures implementation handles all corner cases
  • Proof assistant (Coq) ensures proof is correct
  • Avoids a large class of bugs

4

slide-5
SLIDE 5

Existing verified file systems

5

FSCQ [SOSP ’15] BilbyFS [ASPLOS ’16] Yggdrasil [OSDI ’16] ext4 btrfs ZFS

correctness performance

verified file systems

slide-6
SLIDE 6

Goal: verified high-performance file system

6

FSCQ [SOSP ’15] BilbyFS [ASPLOS ’16] Yggdrasil [OSDI ’16] ext4 btrfs ZFS

correctness performance

?

verified file systems

slide-7
SLIDE 7

Strawman: optimize FSCQ

7

correctness performance

FSCQ code

slide-8
SLIDE 8

Strawman: optimize FSCQ

7

correctness performance

FSCQ code spec proof

slide-9
SLIDE 9

Strawman: optimize FSCQ

7

correctness performance

FSCQ code spec proof fast FSCQ proof?

slide-10
SLIDE 10

Problem: specification incompatible with high performance

  • Achieving high performance requires optimizations
  • Some optimizations change file-system behavior
  • Requires changes to specification

8

slide-11
SLIDE 11

Example optimization: deferred commit

  • Deferred commit: buffer system calls until fsync
  • FSCQ's specification: "if create(f) has returned and the computer crashes, f exists"
  • Deferred commit requires a new specification

9

slide-12
SLIDE 12

Optimizations that change crash behavior

  • Deferred commit: buffer system calls until fsync
  • Log-bypass writes: skip log for data writes
  • Buffer cache: cache data until fdatasync
  • Existing specifications do not support these optimizations

10

slide-13
SLIDE 13

Contribution: DFSCQ file system

  • Precise specification for a subset of POSIX
  • supports deferred commit and log-bypass writes
  • Verified, crash-safe file system
  • Traditional journalling file-system design
  • Implements most of ext4's optimizations
  • Machine-checked proof that implementation meets specification
  • Performance on par with ext4 (but DFSCQ has fewer features)

11

slide-14
SLIDE 14

Specifying a file system

  • Design abstract state

12

slide-15
SLIDE 15

Specifying a file system

  • Design abstract state
  • Describe how system calls execute

12

slide-16
SLIDE 16

Specifying a file system

  • Design abstract state
  • Describe how system calls execute
  • Describe effect of crashes

12

slide-17
SLIDE 17

Starting point: tree as abstract state

Trees are a simplified abstraction of a file system

13

g f

slide-18
SLIDE 18

Specification abstracts implementation details

14

g f

abstract state vs. implementation's state

slide-19
SLIDE 19

Specify how system calls affect abstract state

15

unlink(g) g f f unlink(g)

specification describes transition

slide-20
SLIDE 20

Challenges in specifying crash behavior

  • Optimizations mean crashes can be complex
  • Problem 1: deferred commit
  • Problem 2: log-bypass writes
  • Problem 3: caching

16

slide-21
SLIDE 21

Problem 1: deferred commit leads to many crash states

17

unlink(g) g f f

slide-22
SLIDE 22

Problem 1: deferred commit leads to many crash states

17

unlink(g) g f f

crash: reset memory

slide-23
SLIDE 23

Problem 1: deferred commit leads to many crash states

17

unlink(g) g f f

crash: reset memory

g f f

slide-24
SLIDE 24

How do we specify crash outcomes with deferred commit?

18

f g f

slide-25
SLIDE 25

How do we specify crash outcomes with deferred commit?

18

f g f crash

slide-26
SLIDE 26

tree sequence

Specify deferred commit using tree sequences

19

f g

slide-27
SLIDE 27

tree sequence

Specify deferred commit using tree sequences

  • Abstract state is a sequence of trees

19

f g

slide-28
SLIDE 28

tree sequence

Specify deferred commit using tree sequences

  • Abstract state is a sequence of trees
  • Always read from the latest tree

19

f g

slide-29
SLIDE 29

Specify deferred commit using tree sequences

  • Metadata updates add new trees in the specification
  • Always read from the latest tree

20

unlink(g) f g f f g

slide-30
SLIDE 30

Specify deferred commit using tree sequences

  • Metadata updates add new trees in the specification
  • Always read from the latest tree

21

f g f

slide-31
SLIDE 31

Specify deferred commit using tree sequences

  • Metadata updates add new trees in the specification
  • Always read from the latest tree

22

truncate(f,2) f g f f f g f

slide-32
SLIDE 32

Specify deferred commit using tree sequences

  • Metadata updates add new trees in the specification
  • Always read from the latest tree

23

f g f f

slide-33
SLIDE 33

Specify deferred commit using tree sequences

  • Metadata updates add new trees in the specification
  • Always read from the latest tree

24

f g f f rename(f,/) f f g f f

slide-34
SLIDE 34

tree sequence

Behavior of tree sequences on crash

  • What about crash behavior?

25

f g f f f

slide-35
SLIDE 35

tree sequence

Behavior of tree sequences on crash

  • What about crash behavior?

25

f g post-crash tree sequence

crash

f g f f f

slide-36
SLIDE 36

Crash specification allows background commits

26

post-crash states: f f f g f tree sequence f g f f f crash

slide-37
SLIDE 37

Specification for fsync

27

f g f f f f fsync("/")

slide-38
SLIDE 38

Problem 2: log-bypass writes may reorder updates

  • Log-bypass writes: update file data blocks in place, skipping log

28

f f

rename

f

write

slide-39
SLIDE 39

Problem 2: log-bypass writes may reorder updates

  • Log-bypass writes: update file data blocks in place, skipping log
  • Effect: data writes and metadata updates can be reordered on crash

28

f f

rename

f

write crash

f

slide-40
SLIDE 40

Log-bypass writes

29

f g f f

At minimum, writes to latest tree

f g f f f f write(f,…)

slide-41
SLIDE 41

Log-bypass writes

30

Affects the same file in earlier trees

f g f f f f g f f f write(f,…)

slide-42
SLIDE 42

Specify that other files are unaffected

31

f g f f

Puts an obligation on the implementation to avoid block re-use within a tree sequence

b21 b21 b21 f g f f f f

?

write(f,…)

slide-43
SLIDE 43

Specify that other files are unaffected

32

f g f f b21 b21 f g f f f f

Puts an obligation on the implementation to avoid block re-use within a tree sequence

write(f,…) b21

slide-44
SLIDE 44

Specify that other files are unaffected

32

f g f f b21 b21 b51 f g f f f f

Puts an obligation on the implementation to avoid block re-use within a tree sequence

write(f,…) b21 b51

slide-45
SLIDE 45

Problem 3: data writes are cached

  • Write-back buffer cache

33

f f

write

f

crash

slide-46
SLIDE 46

Problem 3: data writes are cached

  • Write-back buffer cache
  • Data can be persisted in any order

33

f f

write

f f f

crash

f

slide-47
SLIDE 47

Specifying data caching: block sets

34

f g f

uncached: two possible values, old and new

f f

slide-48
SLIDE 48

f

Behavior of block sets on crash

f f g f f f g f f crash

slide-49
SLIDE 49

Behavior of block sets on crash

f f g f f f g f f f f f

two degrees of non-determinism in crash states:

crash

slide-50
SLIDE 50

Behavior of block sets on crash

f f g f f f g f f f f f

specification allows metadata and data updates to be reordered

two degrees of non-determinism in crash states:

crash

slide-51
SLIDE 51

Specification for fdatasync

37

f g f f f fdatasync(f)

slide-52
SLIDE 52

Specification for fdatasync

38

f g f f fdatasync(f) f g f f

fdatasync specification says block sets collapse in every tree

f f

slide-53
SLIDE 53

Summary: DFSCQ’s tree-based specifica6on

  • metadata operations add a new tree
  • fsync collapses to the latest tree
  • writes update block sets in every tree
  • fdatasync collapses block sets in every tree

39
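The four rules above can be sketched as a small executable model. This is only an illustrative toy, not DFSCQ's Coq definitions; the name TreeSeq and its representation (a tree maps file name to a list of blocks, each block a list of candidate on-disk values with the newest last) are invented here.

```python
import copy
import itertools

class TreeSeq:
    """Toy model of the tree-sequence abstract state (illustrative only)."""

    def __init__(self, tree):
        self.trees = [tree]  # oldest .. latest

    def metadata_op(self, op):
        # metadata operations (unlink, rename, ...) add a new tree
        self.trees.append(op(copy.deepcopy(self.trees[-1])))

    def fsync(self):
        # fsync collapses the sequence to the latest tree
        self.trees = [self.trees[-1]]

    def write(self, fname, blk, value):
        # log-bypass write: the new value becomes a candidate for this
        # block in *every* tree that contains the file
        for tree in self.trees:
            if fname in tree:
                tree[fname][blk].append(value)

    def fdatasync(self, fname):
        # fdatasync collapses the file's block sets in every tree
        for tree in self.trees:
            if fname in tree:
                tree[fname] = [[blk[-1]] for blk in tree[fname]]

    def crash_states(self):
        # a crash may expose any tree, with any candidate value per block
        states = []
        for tree in self.trees:
            names = sorted(tree)
            per_file = [list(itertools.product(*tree[f])) for f in names]
            for combo in itertools.product(*per_file):
                states.append({n: list(c) for n, c in zip(names, combo)})
        return states

ts = TreeSeq({"f": [["old"]], "g": [["x"]]})
ts.metadata_op(lambda t: {n: b for n, b in t.items() if n != "g"})  # unlink(g)
ts.write("f", 0, "new")
states = ts.crash_states()  # 4 states: g present or not, f old or new
ts.fdatasync("f")
ts.fsync()                  # now only {"f": ["new"]} remains
```

The example mirrors the slides: unlink(g) adds a tree, the write leaves both old and new data possible in every tree, and fdatasync plus fsync narrow the crash states down to one.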

slide-54
SLIDE 54

Prove implementation meets specification

40

length: 2 type: file … stat(g) g f g f length: 2 type: file … stat(g)

slide-55
SLIDE 55

Prove implementation meets specification

40

length: 2 type: file … stat(g) g f

return values match

g f length: 2 type: file … stat(g)

slide-56
SLIDE 56

Prove implementation meets specification

40

unlink(g) length: 2 type: file … stat(g) g f

return values match

g f g f f length: 2 type: file … stat(g) unlink(g)

slide-57
SLIDE 57

Prove implementation meets specification

40

unlink(g) length: 2 type: file … stat(g) g f

return values match; disk continues to relate to abstract state

g f g f f length: 2 type: file … stat(g) unlink(g)

slide-58
SLIDE 58

DFSCQ Design

41

buffer cache logging checksums deferred commit log-bypass API block allocator free-bit cache avoid re-use inode k-indirect blocks dirty blocks directory name cache

slide-59
SLIDE 59

Many single-layer optimizations

42

  • Affect only proof of single layer

buffer cache logging checksums deferred commit log-bypass API block allocator free-bit cache avoid re-use inode k-indirect blocks dirty blocks directory name cache

slide-60
SLIDE 60

Many single-layer optimizations

42

  • Affect only proof of single layer

buffer cache logging checksums deferred commit log-bypass API block allocator free-bit cache avoid re-use inode k-indirect blocks dirty blocks directory name cache cache free blocks

slide-61
SLIDE 61

Many single-layer optimizations

42

  • Affect only proof of single layer

buffer cache logging checksums deferred commit log-bypass API block allocator free-bit cache avoid re-use inode k-indirect blocks dirty blocks directory name cache improves performance with no change to abstraction cache free blocks

slide-62
SLIDE 62

Cross-layer optimizations

43

buffer cache logging checksums deferred commit log-bypass API block allocator free-bit cache avoid re-use inode k-indirect blocks dirty blocks directory name cache

  • Break abstraction boundaries

  • Complicate proofs
  • Good for performance
slide-63
SLIDE 63

Cross-layer optimizations

43

buffer cache logging checksums deferred commit log-bypass API block allocator free-bit cache avoid re-use inode k-indirect blocks dirty blocks directory name cache

  • Break abstraction boundaries

  • Complicate proofs
  • Good for performance

track dirty blocks in the cache

slide-64
SLIDE 64

Cross-layer optimizations

43

buffer cache logging checksums deferred commit log-bypass API block allocator free-bit cache avoid re-use inode k-indirect blocks dirty blocks directory name cache

  • Break abstraction boundaries

  • Complicate proofs
  • Good for performance

records dirent offset from inode layer track dirty blocks in the cache

slide-65
SLIDE 65

Implementation and proof

  • Extend FSCQ [SOSP ’15]
  • 75,000 lines of Coq (compared to 31,000 in FSCQ)

44

specification code proofs

Coq

OK

Coq proof checker

slide-66
SLIDE 66

Running DFSCQ

45

code

Coq code extraction

implementa6on

Haskell

slide-67
SLIDE 67

Running DFSCQ

45

code

Coq

DFSCQ FUSE server

GHC code extraction

implementa6on

Haskell

FUSE interface

Haskell

+

slide-68
SLIDE 68

Performance evaluation

  • Several workloads
  • micro benchmarks
  • application workloads
  • Compare with ext4 in default mode
  • Running on an SSD on a desktop

46

(see paper for more results)

slide-69
SLIDE 69

DFSCQ is competitive with ext4

47

[chart: smallfile benchmark (files/s), FSCQ vs DFSCQ vs ext4]

slide-70
SLIDE 70

DFSCQ is competitive with ext4

47

[charts: smallfile (files/s) and largefile (MB/s), FSCQ vs DFSCQ vs ext4]

slide-71
SLIDE 71

DFSCQ is competitive with ext4

  • DFSCQ still has high CPU overhead compared to ext4
  • Haskell code allocates large amounts of memory

47

[charts: smallfile (files/s) and largefile (MB/s), FSCQ vs DFSCQ vs ext4]

slide-72
SLIDE 72

DFSCQ outperforms ext4 on mailbench

48

[chart: mailbench (msgs/s), FSCQ vs DFSCQ vs ext4]

slide-73
SLIDE 73

DFSCQ outperforms ext4 on mailbench

48

[chart: mailbench (msgs/s), FSCQ vs DFSCQ vs ext4]

  • mailbench simulates a qmail-like mail server
  • metadata and fsync-heavy workload
slide-74
SLIDE 74

SQLite on DFSCQ is competitive with ext4

49

[chart: TPC-C on SQLite (txns/s), FSCQ vs DFSCQ vs ext4]

slide-75
SLIDE 75

SQLite on DFSCQ is competitive with ext4

49

[chart: TPC-C on SQLite (txns/s), FSCQ vs DFSCQ vs ext4]

  • Write-heavy database workload
  • DFSCQ issues less I/O, but has higher CPU overhead
slide-76
SLIDE 76

Future work

  • Reduce CPU overhead
  • Concurrency

50

slide-77
SLIDE 77

Summary

  • DFSCQ: verified, efficient, crash-safe file system
  • Precise tree-based specification of deferred commit and log-bypass writes

  • Proof that implementation meets specification
  • Performance on par with Linux ext4

51

https://github.com/mit-pdos/fscq

slide-78
SLIDE 78

Backup slides

  • ext4 async commit + log-bypass bug
  • verification architecture diagram
  • write-ahead logging
  • group commit
  • log-bypass writes
  • deferred commit and log-bypass perf
  • spec example
  • fsync(2)
  • atomic write
  • FUSE architecture
  • LOC

52

slide-79
SLIDE 79

Optimizations are hard to implement correctly

Subtle interaction between optimizations

  • bug where crash could leak data in Linux ext4
  • discovered after 6 years

ext4 now forbids both optimizations

53

Author: Jan Kara <jack@suse.cz>
Date:   Tue Nov 25 20:19:17 2014 -0500

    ext4: forbid journal_async_commit in data=ordered mode [...]

slide-80
SLIDE 80

Approach to avoid bugs: verification

54

disk hardware file system application

specification

verify FS correct

specification

verify applications

slide-81
SLIDE 81

Write-ahead logging

  • System calls can update multiple disk blocks

55

create(‘d/a’)

(address, block)

slide-82
SLIDE 82

Write-ahead logging

  • System calls can update multiple disk blocks
  • Logging ensures all updates are persisted or none, even if the computer crashes

55

disk log data create(‘d/a’)

(address, block)
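The all-or-nothing behavior of the log can be sketched as a toy Python model. Disk, commit, install, and recover are names invented for this sketch, not DFSCQ's verified code; the point is only that a single commit-record write decides whether all of a syscall's block updates survive a crash, or none do.

```python
class Disk:
    """Toy disk with a log region and a data region (illustrative only)."""
    def __init__(self):
        self.log = []           # [(address, block), ...]
        self.committed = False  # the commit record
        self.data = {}          # address -> block

def install(disk):
    # copy committed log entries into the data region, then clear the log
    for addr, block in disk.log:
        disk.data[addr] = block
    disk.log = []
    disk.committed = False

def commit(disk, updates):
    disk.log = list(updates)  # 1. write every update to the log
    disk.committed = True     # 2. one write makes the batch durable
    install(disk)             # 3. install into the data region

def recover(disk):
    # after a crash, replay the log only if the commit record made it;
    # otherwise discard it, so the transaction appears atomic
    if disk.committed:
        install(disk)
    else:
        disk.log = []

d = Disk()
d.log = [(3, "inode"), (7, "dirent")]  # crash before the commit record
recover(d)
empty_after_crash = dict(d.data)       # none of the updates applied

commit(d, [(3, "inode"), (7, "dirent")])
full_after_commit = dict(d.data)       # both updates applied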

slide-83
SLIDE 83

disk

Deferred commit enables high throughput

56

log data

slide-84
SLIDE 84

disk

Deferred commit enables high throughput

56

memory log data

  • 1. Buffer system calls in memory
slide-85
SLIDE 85

disk

Deferred commit enables high throughput

56

➡ mkdir(‘d’) ➡ create(‘d/a’) ➡ rename(‘d/a’, ‘d/b’)


memory log data

  • 1. Buffer system calls in memory
slide-86
SLIDE 86

disk

Deferred commit enables high throughput

56

➡ mkdir(‘d’) ➡ create(‘d/a’) ➡ rename(‘d/a’, ‘d/b’) ➡ fsync(‘d’)


memory log data

  • 1. Buffer system calls in memory
  • 2. fsync() flushes cached transactions to the on-disk log in a batch

slide-87
SLIDE 87

disk

Deferred commit enables high throughput

56

➡ mkdir(‘d’) ➡ create(‘d/a’) ➡ rename(‘d/a’, ‘d/b’) ➡ fsync(‘d’) memory log data

  • 1. Buffer system calls in memory
  • 2. fsync() flushes cached transactions to the on-disk log in a batch
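The buffer-then-batch behavior can be sketched as a toy Python model. LoggingFS and its methods are invented names for this illustration, not the verified implementation.

```python
class LoggingFS:
    """Toy sketch of deferred commit (illustrative only)."""
    def __init__(self):
        self.mem_txns = []  # transactions buffered in memory
        self.disk_log = []  # persistent on-disk log

    def syscall(self, name, *args):
        # e.g. mkdir('d'), create('d/a'): buffered, not yet durable
        self.mem_txns.append((name, args))

    def fsync(self):
        # one batched log write commits every buffered transaction
        self.disk_log.extend(self.mem_txns)
        self.mem_txns = []

    def after_crash(self):
        # memory is lost; only transactions in the on-disk log survive
        return list(self.disk_log)

fs = LoggingFS()
fs.syscall("mkdir", "d")
fs.syscall("create", "d/a")
lost = fs.after_crash()  # crash before fsync: both calls are lost
fs.fsync()
kept = fs.after_crash()  # crash after fsync: both calls survive
```

This is exactly why FSCQ's "if create(f) has returned and the computer crashes, f exists" specification no longer holds once commit is deferred.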

slide-88
SLIDE 88

Log-bypass writes avoid doubling data writes

57

disk log data ➡ mkdir(‘d’) ➡ create(‘d/a’)

  • 1. Record metadata updates in log as usual

transaction cache


slide-89
SLIDE 89

Log-bypass writes avoid doubling data writes

57

disk log data ➡ mkdir(‘d’) ➡ create(‘d/a’) ➡ write(‘d/a’,...)

  • 1. Record metadata updates in log as usual

transaction cache


slide-90
SLIDE 90

Log-bypass writes avoid doubling data writes

57

disk log data ➡ mkdir(‘d’) ➡ create(‘d/a’) ➡ write(‘d/a’,...)

  • 1. Record metadata updates in log as usual

  • 2. Bypass log for file data

transaction cache


slide-91
SLIDE 91

Log-bypass writes avoid doubling data writes

57

disk log data ➡ mkdir(‘d’) ➡ create(‘d/a’) ➡ write(‘d/a’,...)

  • 1. Record metadata updates in log as usual

  • 2. Bypass log for file data

transaction cache

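The two paths can be sketched as a toy Python model. BypassFS and its methods are invented names for this illustration, not the verified code; metadata goes through the transaction cache and log, while file data is written to the data region once, in place.

```python
class BypassFS:
    """Toy sketch of log-bypass writes (illustrative only)."""
    def __init__(self):
        self.txn_cache = []  # buffered metadata updates (logged on fsync)
        self.disk_log = []   # persistent log
        self.disk_data = {}  # data-block address -> contents

    def metadata_op(self, desc):
        # metadata updates go through the transaction cache and log
        self.txn_cache.append(desc)

    def write(self, addr, block):
        # file data bypasses the log: one in-place write to the data region
        self.disk_data[addr] = block

    def fsync(self):
        self.disk_log.extend(self.txn_cache)
        self.txn_cache = []

fs = BypassFS()
fs.metadata_op("create d/a")
fs.write(21, "hello")
# a crash here exposes the data write but not the metadata update
data_on_disk = dict(fs.disk_data)
log_on_disk = list(fs.disk_log)
```

Because data skips the log while metadata waits for fsync, a crash can expose the write without the create; this reordering is what the tree-sequence specification has to capture.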

slide-92
SLIDE 92

Deferred commit and log bypass matter in practice

58

ext4 largefile performance:
  synchronous        120 MB/s
  + deferred commit  150 MB/s
  + log-bypass       300 MB/s

fdatasync every 10 MB to an SSD

slide-93
SLIDE 93

Specifications

59

SPEC   unlink(cwd_ino, pathname)
PRE    disk: tree_rep(tree_seq)
POST   disk: tree_rep(tree_seq ++ [new_tree]) /\
       new_tree = tree_prune(tree_seq.latest, cwd_ino, pathname)
CRASH  disk: tree_intact(tree_seq ++ [new_tree])

slide-94
SLIDE 94

POSIX manual gives complicated specification

  • not clear enough about crash behavior

60

fsync() flushes modified buffer cache pages for fd to the disk device so that all changed information can be retrieved even after the system crashes or is rebooted. fsync() also flushes metadata information associated with the file (see inode(7)). fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled.

paraphrase of fsync(2) manpage

slide-95
SLIDE 95

Evaluating the specification: atomic write pattern

61

f tmpfile

rename()

Goal: on crash, f either:

  • doesn't exist
  • or contains data
slide-96
SLIDE 96

Proved atomic write pattern crash safe

62

def atomic_write(data, name):
    with open(tmpfile, "cw") as f:
        ftruncate(f, len(data))
        write(f, data)
        fdatasync(f)
    rename(tmpfile, name)
    fsync(dirname(name))

prepare tmpfile; persist data; move to destination; persist metadata

slide-97
SLIDE 97

Proved atomic write pattern crash safe

62

def atomic_write(data, name):
    with open(tmpfile, "cw") as f:
        ftruncate(f, len(data))
        write(f, data)
        fdatasync(f)
    rename(tmpfile, name)
    fsync(dirname(name))

prepare tmpfile; persist data; move to destination; persist metadata

Specification is sufficient to prove application-level properties

slide-98
SLIDE 98

Atomic write is correct

63

/tmp name

Specification: on crash, name either does not exist

  • or contains data

/tmp name

crash states:

(just after rename)

slide-99
SLIDE 99

DFSCQ runs ordinary Linux programs using FUSE

64

DFSCQ FUSE server

userspace Linux kernel

$ mv src dst $ git clone FUSE

slide-100
SLIDE 100

Effort to implement DFSCQ

  • Total of 75,000 lines of verified code, specs, and proofs in Coq
  • Compared to FSCQ’s 31,000 lines
  • 4,800 lines of implementation
  • Took 5 authors 2 years (but less than 10 person years)

65

[pie chart: proof effort split 10% / 12% / 43% / 35% across CHL infrastructure, FS impl and proofs, Top-level API, Tree sequences]