Specifying and Checking File System Crash-Consistency Models


slide-1
SLIDE 1

Specifying and Checking File System Crash-Consistency Models

James Bornholt Antoine Kaufmann Jialin Li Arvind Krishnamurthy Emina Torlak Xi Wang

University of Washington

slide-2
SLIDE 2

File systems persist our data

File System Application

slide-3
SLIDE 3

File systems persist our data

File System

The best of times The worst of times

Application

slide-4
SLIDE 4

File systems persist our data

File System

The best of times The worst of times

Application

The best of times The worst of times

slide-5
SLIDE 5

But what if the system crashes?

File System

The best of times The worst of times

Application

The best of times The worst of times

slide-6
SLIDE 6

But what if the system crashes?

File System

The best of times The worst of times

Application

The best of times The worst of times POSIX system calls

slide-7
SLIDE 7

But what if the system crashes?

File System

The best of times The worst of times

Application

The best of times The worst of times

This provides roughly the same level of guarantees as ext3.

Linux kernel ext4 documentation

If the file system is inconsistent after a crash it is usually automatically checked and repaired when the system is rebooted

Proposed POSIX fsync documentation

POSIX system calls

slide-8
SLIDE 8

But what if the system crashes?

File System

The best of times The worst of times

Application

The best of times The worst of times POSIX system calls

slide-9
SLIDE 9

But what if the system crashes?

File System

The best of times The worst of times

Application

The best of times The worst of times
POSIX system calls
Optimizations are exposed
The best o00000 0000000 of tim

slide-10
SLIDE 10

But what if the system crashes?

File System

The best of times The worst of times

Application

The best of times The worst of times
POSIX system calls
Optimizations are exposed

When gradually appending to a file, the content gets corrupted, causing Chrome to crash

ChromeOS “FS corruption on panic”, 2015

…some of the KDE core config files were reset. Also some of my MySQL databases were killed…

Ubuntu “ext4 data loss”, 2009

The best o00000 0000000 of tim

slide-11
SLIDE 11

Crash-consistency models

File System Application Crash-consistency model

slide-12
SLIDE 12

Crash-consistency models

File System Application Crash-consistency model

A precise formal specification of the crash guarantees that a file system provides

slide-13
SLIDE 13

Crash-consistency models

File System Application Crash-consistency model

A precise formal specification of the crash guarantees that a file system provides

Just like a memory model!

slide-14
SLIDE 14

Crash-consistency models

File System Application Crash-consistency model

A precise formal specification of the crash guarantees that a file system provides

Ferrite

Validate the model against the system with litmus tests

Just like a memory model!

slide-15
SLIDE 15

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-16
SLIDE 16

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-17
SLIDE 17

Replacing the contents of a file

foo.txt foo.txt foo.txt

The best of times The worst of times The age of wisdom The epoch of belief

slide-18
SLIDE 18

foo.txt foo.txt

Atomic replace via rename

f = create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
close(f)
rename(“foo.tmp”, “foo.txt”)

The best of times The worst of times

slide-19
SLIDE 19

foo.tmp foo.txt foo.txt

Atomic replace via rename

f = create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
close(f)
rename(“foo.tmp”, “foo.txt”)

The best of times The worst of times

slide-20
SLIDE 20

foo.tmp foo.txt foo.txt

Atomic replace via rename

f = create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
close(f)
rename(“foo.tmp”, “foo.txt”)

The best of times The worst of times The age of wisdom

slide-21
SLIDE 21

foo.tmp foo.txt foo.txt

Atomic replace via rename

f = create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
close(f)
rename(“foo.tmp”, “foo.txt”)

The best of times The worst of times The age of wisdom The epoch of belief

slide-22
SLIDE 22

foo.tmp foo.txt foo.txt

Atomic replace via rename

f = create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
close(f)
rename(“foo.tmp”, “foo.txt”)

The best of times The worst of times The age of wisdom The epoch of belief

slide-23
SLIDE 23

foo.tmp foo.txt foo.txt

Atomic replace via rename

f = create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
close(f)
rename(“foo.tmp”, “foo.txt”)

The best of times The worst of times The age of wisdom The epoch of belief

slide-24
SLIDE 24

Atomic replace via rename

f = create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
close(f)
rename(“foo.tmp”, “foo.txt”)
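
As a rough illustration, the pattern on this slide might look like the following in Python. This is a sketch only: the helper name `replace_via_rename` and the `.tmp` suffix are assumptions made for the example, and it deliberately contains no synchronization call, just like the slide.

```python
import os
import tempfile

def replace_via_rename(path, chunks):
    """Write the new contents to a temp file, then rename it over `path`."""
    tmp = path + ".tmp"  # hypothetical naming, mirrors foo.tmp on the slide
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            os.write(fd, chunk.encode())
    finally:
        os.close(fd)
    # os.replace maps to rename(2) on POSIX; no fsync is issued, as on the slide.
    os.replace(tmp, path)

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo.txt")
with open(target, "w") as fh:
    fh.write("The best of times\nThe worst of times\n")

replace_via_rename(target, ["The age of wisdom\n", "The epoch of belief\n"])

with open(target) as fh:
    result = fh.read()
print(result)
```

Without a crash this always yields the new contents. Whether it is also atomic across a crash is the question the following slides examine.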

slide-25
SLIDE 25

Atomic replace via rename

f = create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
close(f)
rename(“foo.tmp”, “foo.txt”)

create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
rename(“foo.tmp”, “foo.txt”)

slide-26
SLIDE 26

Atomic replace via rename

create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
rename(“foo.tmp”, “foo.txt”)

slide-27
SLIDE 27

Atomic replace via rename

create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
rename(“foo.tmp”, “foo.txt”)

File operations Writes

slide-28
SLIDE 28

Atomic replace via rename

create(“foo.tmp”)
write(f, “The age of …”)
write(f, “The epoch of …”)
rename(“foo.tmp”, “foo.txt”)

slide-29
SLIDE 29

Atomic replace via rename

create(“foo.tmp”)
rename(“foo.tmp”, “foo.txt”)
write(f, “The age of …”)
write(f, “The epoch of …”)

foo.txt foo.tmp foo.txt

The best of times The worst of times

slide-30
SLIDE 30

Atomic replace via rename

create(“foo.tmp”)
rename(“foo.tmp”, “foo.txt”)
write(f, “The age of …”)
write(f, “The epoch of …”)

foo.txt foo.tmp foo.txt

The best of times The worst of times

slide-31
SLIDE 31

Atomic replace via rename

create(“foo.tmp”)
rename(“foo.tmp”, “foo.txt”)
write(f, “The age of …”)
write(f, “The epoch of …”)

foo.txt foo.tmp foo.txt

The best of times The worst of times

Crash!
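
The effect of this reordering can be replayed on a tiny in-memory model. The sketch below is illustrative only: the `replay` helper, the inode numbers, and the tuple encoding of operations are assumptions made for the example, not part of the talk. It applies the operations in the order the slide shows them reaching disk and stops after the rename.

```python
def replay(names, data, ops):
    """Apply operations in the order they reached disk.

    names maps file name -> inode id; data maps inode id -> content.
    """
    names, data = dict(names), dict(data)
    for op in ops:
        if op[0] == "create":        # ("create", name, inode)
            names[op[1]] = op[2]
            data[op[2]] = ""
        elif op[0] == "write":       # ("write", inode, text), appends to the inode
            data[op[1]] = data.get(op[1], "") + op[2]
        elif op[0] == "rename":      # ("rename", src, dst)
            names[op[2]] = names.pop(op[1])
    return {name: data[inode] for name, inode in names.items()}

names = {"foo.txt": 1}
data = {1: "The best of times\nThe worst of times\n"}

# Persisted order from the slide: the rename has moved ahead of both writes.
persisted_order = [
    ("create", "foo.tmp", 2),
    ("rename", "foo.tmp", "foo.txt"),
    ("write", 2, "The age of …\n"),
    ("write", 2, "The epoch of …\n"),
]

crashed = replay(names, data, persisted_order[:2])  # crash right after the rename
complete = replay(names, data, persisted_order)     # no crash
print(crashed)
```

In this model the crash leaves `foo.txt` present but empty: neither the old nor the new contents survive.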

slide-32
SLIDE 32

The storage stack

write(f, “The age of …”) write(f, “The epoch of …”)

slide-33
SLIDE 33

The storage stack

write(f, “The age of …”) write(f, “The epoch of …”)

Controller Low-level Driver Block Layer File System

slide-34
SLIDE 34

The storage stack

write(f, “The age of …”) write(f, “The epoch of …”)

Controller Low-level Driver Block Layer File System

Diagram by Werner Fischer

slide-35
SLIDE 35

The storage stack

write(f, “The age of …”) write(f, “The epoch of …”)

Controller Low-level Driver Block Layer File System

slide-36
SLIDE 36

The storage stack

write(f, “The age of …”) write(f, “The epoch of …”)

Controller Low-level Driver Block Layer File System

This provides roughly the same level of guarantees as ext3.

Linux kernel ext4 documentation

slide-37
SLIDE 37

The storage stack

write(f, “The age of …”) write(f, “The epoch of …”)

Controller Low-level Driver Block Layer File System

This provides roughly the same level of guarantees as ext3.

Linux kernel ext4 documentation

The key aspects of fsync() are unreasonable to test in a test suite

POSIX specification for fsync

slide-38
SLIDE 38

Existing work

write(f, “The age of …”) write(f, “The epoch of …”)

Controller Low-level Driver Block Layer File System

Formalize the existing POSIX interface (e.g. SibylFS [SOSP’15])

But the interface says nothing about crash safety

slide-39
SLIDE 39

Existing work

write(f, “The age of …”) write(f, “The epoch of …”)

Controller Low-level Driver Block Layer File System

Formalize the existing POSIX interface (e.g. SibylFS [SOSP’15])

But the interface says nothing about crash safety

Build a new crash-safe file system (e.g. FSCQ [SOSP’15])

Comes with extremely high verification burden

slide-40
SLIDE 40

Existing work

write(f, “The age of …”) write(f, “The epoch of …”)

Controller Low-level Driver Block Layer File System

Formalize the existing POSIX interface (e.g. SibylFS [SOSP’15])

But the interface says nothing about crash safety

Build a new crash-safe file system (e.g. FSCQ [SOSP’15])

Comes with extremely high verification burden

Find bugs in existing file systems (e.g. eXplode [OSDI’06])

Ours is a complementary problem: precisely specifying actual behavior

slide-41
SLIDE 41

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-42
SLIDE 42

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-43
SLIDE 43

Crash-consistency models

slide-44
SLIDE 44

Crash-consistency models

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

slide-45
SLIDE 45

Crash-consistency models

Litmus tests
Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

Formal specifications
Axiomatic descriptions of crash consistency using first-order logic
slide-46
SLIDE 46

Crash-consistency models

Litmus tests
Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes
Documentation for application developers

Formal specifications
Axiomatic descriptions of crash consistency using first-order logic

slide-47
SLIDE 47

Crash-consistency models

Litmus tests
Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes
Documentation for application developers

Formal specifications
Axiomatic descriptions of crash consistency using first-order logic
Automated reasoning about crash safety

slide-48
SLIDE 48

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

slide-49
SLIDE 49

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Initial setup (cannot crash)

slide-50
SLIDE 50

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Initial setup (cannot crash)
Main body (may crash at any point)

slide-51
SLIDE 51

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Initial setup (cannot crash)
Main body (may crash at any point)
Check whether some (possibly crashing) execution satisfies predicates

slide-52
SLIDE 52

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Initial setup (cannot crash)
Main body (may crash at any point)
Check whether some (possibly crashing) execution satisfies predicates
Check for behavior that may surprise application writers

slide-53
SLIDE 53

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new
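
One way to picture the three sections of a litmus test is as a small data structure. The sketch below is an assumption-laden illustration, not Ferrite's actual representation: the `LitmusTest` class, the tuple encoding, and the `replay` helper are all hypothetical. It checks the `exists?` predicate against every crash point under the simplest possible assumption, namely that operations persist strictly in program order.

```python
from dataclasses import dataclass
from typing import Callable

OLD, NEW = "old", "new"

def replay(ops):
    """Replay name-based operations into a name -> content map."""
    disk = {}
    for op in ops:
        if op[0] == "create":
            disk[op[1]] = ""
        elif op[0] == "write":
            disk[op[1]] = disk[op[1]] + op[2]
        elif op[0] == "rename":
            disk[op[2]] = disk.pop(op[1])
        # "close" does not change the on-disk state in this sketch
    return disk

@dataclass
class LitmusTest:
    initial: list                    # setup, assumed to complete without a crash
    main: list                       # body, may crash after any operation
    exists: Callable[[dict], bool]   # predicate over the post-crash state

    def crash_states_sequential(self):
        """Post-crash states when operations persist in program order."""
        for k in range(len(self.main) + 1):
            yield replay(self.initial + self.main[:k])

    def observable(self):
        return any(self.exists(s) for s in self.crash_states_sequential())

arvr = LitmusTest(
    initial=[("create", "file"), ("write", "file", OLD)],
    main=[("create", "file.tmp"), ("write", "file.tmp", NEW),
          ("close", "file.tmp"), ("rename", "file.tmp", "file")],
    exists=lambda disk: disk.get("file") != OLD and disk.get("file") != NEW,
)
print(arvr.observable())
```

Under this in-order assumption the surprising outcome is never observable. A real file system that reorders operations can behave differently, which is why the test is run against actual systems.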

slide-54
SLIDE 54

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

memory parallelism

slide-55
SLIDE 55

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

memory parallelism

Initially A = B = 0
Thread 1: A = 1; r1 = B
Thread 2: B = 1; r2 = A
Can r1 = 0 & r2 = 0?
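
For the memory-model litmus test, the question can be answered mechanically for the strongest model by enumerating every interleaving that keeps each thread's program order (sequential consistency). The sketch below does exactly that; the tuple encoding of instructions is an assumption made for the example.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

thread1 = [("store", "A", 1), ("load", "B", "r1")]
thread2 = [("store", "B", 1), ("load", "A", "r2")]

def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"A": 0, "B": 0}
    regs = {}
    for kind, loc, arg in schedule:
        if kind == "store":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r1"], regs["r2"]

outcomes = {run(s) for s in interleavings(thread1, thread2)}
print(sorted(outcomes))
```

Under sequential consistency the outcome r1 = 0 & r2 = 0 never appears. Relaxed hardware memory models can allow it, which is the analogy the slide draws with file systems that reorder operations across a crash.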

slide-56
SLIDE 56

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

Litmus test: Prefix append, Atomic replace via rename, Atomic create via rename

slide-57
SLIDE 57

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

File system   Prefix append   Atomic replace via rename   Atomic create via rename
ext4          Unsafe          Unsafe                      Unsafe

slide-58
SLIDE 58

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

File system   Prefix append   Atomic replace via rename   Atomic create via rename
ext4          Unsafe          Unsafe                      Unsafe
xfs           Safe            Unsafe                      Unsafe
f2fs          Unsafe          Unsafe                      Unsafe
nilfs2        Safe            Unsafe                      Unsafe
btrfs         Safe            Safe                        Unsafe
ufs2          Unsafe          Unsafe                      Unsafe

slide-59
SLIDE 59

Litmus tests

Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes

File system   Prefix append   Atomic replace via rename   Atomic create via rename
ext4          Unsafe          Unsafe                      Unsafe
xfs           Safe            Unsafe                      Unsafe
f2fs          Unsafe          Unsafe                      Unsafe
nilfs2        Safe            Unsafe                      Unsafe
btrfs         Safe            Safe                        Unsafe
ufs2          Unsafe          Unsafe                      Unsafe

We suspect that most modern filesystems exhibit the safe append property.

SQLite Atomic Commit documentation

slide-60
SLIDE 60

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-61
SLIDE 61

Formal specifications

Axiomatic descriptions of crash consistency using first-order logic

Ordering constraints on events in traces

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

slide-62
SLIDE 62

Formal specifications

Axiomatic descriptions of crash consistency using first-order logic

Ordering constraints on events in traces

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

create(“foo.tmp”) write(f, “The epoch of …”) rename(“foo.tmp”, “foo.txt”)

slide-63
SLIDE 63

Formal specifications

Axiomatic descriptions of crash consistency using first-order logic

Ordering constraints on events in traces

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

A trace is a sequence of file system events generated by an execution of P

create(“foo.tmp”) write(f, “The epoch of …”) rename(“foo.tmp”, “foo.txt”)

slide-64
SLIDE 64

Formal specifications

Axiomatic descriptions of crash consistency using first-order logic

Ordering constraints on events in traces

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

A trace is a sequence of file system events generated by an execution of P

create(“foo.tmp”) write(f, “The epoch of …”) rename(“foo.tmp”, “foo.txt”)

A crash-consistency model is a filter on traces: it specifies which traces (and prefixes of traces) are allowed.

slide-65
SLIDE 65

Formal specifications

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

create(“foo.tmp”) write(f, “The epoch of …”) rename(“foo.tmp”, “foo.txt”)

A crash-consistency model is a filter on traces: it specifies which traces (and prefixes of traces) are allowed.

Stronger models: fewer traces
Weaker models: more traces

slide-66
SLIDE 66

Formal specifications

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

create(“foo.tmp”) write(f, “The epoch of …”) rename(“foo.tmp”, “foo.txt”)

A crash-consistency model is a filter on traces: it specifies which traces (and prefixes of traces) are allowed.

Stronger models: fewer traces (sequential crash-consistency allows no reorderings)
Weaker models: more traces

slide-67
SLIDE 67

Formal specifications

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

create(“foo.tmp”) write(f, “The epoch of …”) rename(“foo.tmp”, “foo.txt”)

A crash-consistency model is a filter on traces: it specifies which traces (and prefixes of traces) are allowed.

Stronger models: fewer traces (sequential crash-consistency allows no reorderings)
Weaker models: more traces (relaxed file systems allow more reorderings)
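
The "filter on traces" idea can be made concrete with a few lines of code. In the sketch below a model is given as a set of ordering constraints, and the allowed traces are all reorderings of the program's events that respect those constraints, together with their prefixes (a crash cuts a trace short). The three-event program and the particular relaxed constraint set are assumptions chosen for illustration, not the talk's formalization.

```python
from itertools import permutations

P = ["create", "write", "rename"]  # simplified events of the program

def prefixes(trace):
    return [tuple(trace[:k]) for k in range(len(trace) + 1)]

def allowed_traces(program, must_precede):
    """All orderings of `program` that respect `must_precede`, plus their prefixes."""
    result = set()
    for perm in permutations(program):
        if all(perm.index(a) < perm.index(b) for a, b in must_precede):
            result.update(prefixes(perm))
    return result

# Sequential crash-consistency: every pair keeps its program order.
sequential = allowed_traces(
    P, [(P[i], P[j]) for i in range(len(P)) for j in range(i + 1, len(P))])

# A hypothetical relaxed model: only the create is ordered before the others,
# so the rename may overtake the write.
relaxed = allowed_traces(P, [("create", "write"), ("create", "rename")])

print(len(sequential), len(relaxed))
```

The stronger model admits a strict subset of the traces of the weaker one. The extra traces in the relaxed model include the prefix where the rename has happened but the write has not.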

slide-68
SLIDE 68

ext4 crash-consistency

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

create(“foo.tmp”) write(f, “The epoch of …”) rename(“foo.tmp”, “foo.txt”)

slide-69
SLIDE 69

ext4 crash-consistency

P =
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)

create(“foo.tmp”) write(f, “The epoch of …”) rename(“foo.tmp”, “foo.txt”)

ext4 crash-consistency: allows traces that respect ordering of:

  1. Metadata updates to the same file
  2. Same-block writes
  3. Same-directory operations
  4. Write-append operations
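
The four rules can be read as a predicate over pairs of events: if any rule applies, the pair keeps its program order on disk; otherwise the pair may be reordered. The sketch below is one possible reading only. The `Event` fields, the way each event is classified, and in particular the interpretation of rule 4 as "two appends to the same file" are assumptions made for illustration and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    name: str
    file: Optional[str] = None       # file whose data or metadata is touched
    directory: Optional[str] = None  # set for directory operations
    block: Optional[int] = None      # set for data writes
    metadata: bool = False           # True for metadata updates
    append: bool = False             # True for appending writes

def order_preserved(a: Event, b: Event) -> bool:
    """True if this toy reading of the rules keeps a before b on disk."""
    same_file = a.file is not None and a.file == b.file
    if a.metadata and b.metadata and same_file:                    # rule 1
        return True
    if same_file and a.block is not None and a.block == b.block:   # rule 2
        return True
    if a.directory is not None and a.directory == b.directory:     # rule 3
        return True
    if same_file and a.append and b.append:                        # rule 4 (assumed reading)
        return True
    return False

create = Event("create file.tmp", file="file.tmp", directory="/", metadata=True)
write = Event("write file.tmp", file="file.tmp", block=0, append=True)
rename = Event("rename file.tmp -> file", file="file.tmp", directory="/", metadata=True)

print(order_preserved(create, rename), order_preserved(write, rename))
```

Under this reading the create and the rename stay ordered, but nothing orders the data write before the rename, which matches the unsafe atomic-replace-via-rename outcome discussed earlier.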
slide-70
SLIDE 70

Crash-consistency models

Like memory consistency models but for describing file system crashes

Litmus tests
Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes
Documentation for application developers

Formal specifications
Axiomatic descriptions of crash consistency using first-order logic
Automated reasoning about crash safety

slide-71
SLIDE 71

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-72
SLIDE 72

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-73
SLIDE 73

The storage stack is complex

write(f, “The age of …”) write(f, “The epoch of …”)

Diagram by Werner Fischer

slide-74
SLIDE 74

Building models with Ferrite

Litmus tests File System

(via QEMU)

Ferrite

slide-75
SLIDE 75

Building models with Ferrite

Litmus tests File System

(via QEMU)

Ferrite

System calls

slide-76
SLIDE 76

Building models with Ferrite

Litmus tests File System

(via QEMU)

Ferrite

Storage stack System calls

slide-77
SLIDE 77

Building models with Ferrite

Litmus tests File System

(via QEMU)

Ferrite

Storage stack System calls Disk commands

slide-78
SLIDE 78

Building models with Ferrite

Litmus tests File System

(via QEMU)

Ferrite

Storage stack
System calls
Disk commands
Correlate system calls and disk commands; generate possible crash outcomes
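
To give a feel for what "generate possible crash outcomes" could mean, the sketch below enumerates disk images from a recorded list of disk commands. It is not Ferrite's algorithm: it rests on the simplifying assumptions that any subset of the writes issued since the last flush may have reached the disk, and that a flush makes everything before it durable. The command encoding is hypothetical.

```python
from itertools import combinations

def crash_images(commands):
    """Return the distinct block maps a crash could leave behind.

    commands: list of ("write", block, data) or ("flush",).
    """
    durable, pending, images = {}, [], []

    def emit():
        # Any subset of the pending writes may have reached the disk.
        for r in range(len(pending) + 1):
            for subset in combinations(pending, r):
                image = dict(durable)
                image.update(subset)   # (block, data) pairs in issue order
                if image not in images:
                    images.append(image)

    for cmd in commands:
        if cmd[0] == "write":
            pending.append((cmd[1], cmd[2]))
        elif cmd[0] == "flush":
            emit()                     # crash at some point before the flush completes
            durable.update(pending)
            pending.clear()
    emit()                             # crash after the last command
    return images

commands = [("write", 0, "data"), ("flush",),
            ("write", 1, "inode"), ("write", 2, "dirent")]
images = crash_images(commands)
print(len(images))
```

Each image would then be mounted or inspected to see which litmus-test outcomes it corresponds to. The image containing the `dirent` block without the `inode` block is the kind of state that in-order reasoning misses.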

slide-79
SLIDE 79

Building models with Ferrite

Litmus tests File System

(via QEMU)

Ferrite Crash-consistency Model

slide-80
SLIDE 80

Checking models with Ferrite

Litmus tests Results Ferrite Crash-consistency Model

Check the model produces expected outcomes.
slide-81
SLIDE 81

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-82
SLIDE 82

Crash behavior of modern file systems Crash-consistency models

Litmus tests & formal specifications

Ferrite: developing crash-consistency models Building crash-safe applications

slide-83
SLIDE 83

Automating crash consistency

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

slide-84
SLIDE 84

Automating crash consistency

initial:
  f = create(“file”)
  write(f, old)
main:
  fsync(f)
  f = create(“file.tmp”)
  fsync(f)
  write(f, new)
  fsync(f)
  close(f)
  fsync(f)
  rename(“file.tmp”, “file”)
  fsync(f)
exists?:
  content(“file”) != old & content(“file”) != new

slide-85
SLIDE 85

Synthesizing crash consistency

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Program Crash-consistency model Spec

slide-86
SLIDE 86

Synthesizing crash consistency

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Program Crash-consistency model Synthesizer Spec

slide-87
SLIDE 87

Synthesizing crash consistency

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  fsync(f)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Program Crash-consistency model Synthesizer Spec

slide-88
SLIDE 88

Synthesizing crash consistency

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  fsync(f)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Crash-safe program

Program Crash-consistency model Synthesizer Spec

slide-89
SLIDE 89

Synthesizing crash consistency

initial:
  f = create(“file”)
  write(f, old)
main:
  f = create(“file.tmp”)
  write(f, new)
  fsync(f)
  close(f)
  rename(“file.tmp”, “file”)
exists?:
  content(“file”) != old & content(“file”) != new

Crash-safe program
Minimal necessary synchronization

Program Crash-consistency model Synthesizer Spec
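
The synthesis step can be imitated with a brute-force search: try inserting `fsync` calls at every combination of positions, smallest combinations first, and return the first program for which no crash state satisfies the `exists?` predicate. The sketch below is a toy, not the talk's synthesizer. Its crash model is an assumption (a data write may be missing after a crash unless a later `fsync` of the same file is in the executed prefix, while create and rename persist in order), and it omits `close` for brevity.

```python
from itertools import combinations

OLD, NEW = "old", "new"
INITIAL = {"file": OLD}
MAIN = [("create", "file.tmp"), ("write", "file.tmp", NEW),
        ("rename", "file.tmp", "file")]

def crash_states(program):
    """Enumerate post-crash states under the toy relaxed model."""
    for k in range(len(program) + 1):
        prefix = program[:k]
        # Writes with no later fsync of the same file may be lost.
        unsynced = [i for i, op in enumerate(prefix)
                    if op[0] == "write"
                    and not any(p[0] == "fsync" and p[1] == op[1]
                                for p in prefix[i + 1:])]
        for r in range(len(unsynced) + 1):
            for dropped in combinations(unsynced, r):
                disk = dict(INITIAL)
                for i, op in enumerate(prefix):
                    if op[0] == "create":
                        disk[op[1]] = ""
                    elif op[0] == "write" and i not in dropped:
                        disk[op[1]] += op[2]
                    elif op[0] == "rename":
                        disk[op[2]] = disk.pop(op[1])
                yield disk

def bad(disk):
    """The exists? predicate: neither the old nor the new contents."""
    return disk["file"] != OLD and disk["file"] != NEW

def is_safe(program):
    return not any(bad(d) for d in crash_states(program))

def synthesize(program, target):
    """Insert as few fsync(target) calls as possible to make `program` safe."""
    gaps = range(1, len(program) + 1)          # gap i is just after operation i
    for count in range(len(program) + 1):
        for chosen in combinations(gaps, count):
            candidate = []
            for i, op in enumerate(program, start=1):
                candidate.append(op)
                if i in chosen:
                    candidate.append(("fsync", target))
            if is_safe(candidate):
                return candidate
    return None

safe_program = synthesize(MAIN, "file.tmp")
print(safe_program)
```

In this toy model a single `fsync` between the write and the rename is enough, which agrees with the position of `fsync(f)` in the crash-safe program on the slide.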

slide-90
SLIDE 90

Crash-consistency models

File System Application Crash-consistency model

A precise formal specification of the crash guarantees that a file system provides

Ferrite

Validate the model against the system with litmus tests

slide-91
SLIDE 91

Crash-consistency models

File System Application Crash-consistency model

A precise formal specification of the crash guarantees that a file system provides

Ferrite

Validate the model against the system with litmus tests

A DNA-Based Archival Storage System

Wednesday, right before lunch