Specifying and Checking File System Crash-Consistency Models
James Bornholt Antoine Kaufmann Jialin Li Arvind Krishnamurthy Emina Torlak Xi Wang
University of Washington
File systems persist our data
Application
POSIX system calls
File System
The best of times The worst of times
This provides roughly the same level of guarantees as ext3.
Linux kernel ext4 documentation
If the file system is inconsistent after a crash it is usually automatically checked and repaired when the system is rebooted
Proposed POSIX fsync documentation
Optimizations are exposed
The best o00000 0000000 of tim
When gradually appending to a file, the content gets corrupted, causing Chrome to crash
ChromeOS “FS corruption on panic”, 2015
…some of the KDE core config files were reset. Also some of my MySQL databases were killed…
Ubuntu “ext4 data loss”, 2009
File System Application Crash-consistency model
A precise formal specification
file system provides
Ferrite
Validate the model against the system with litmus tests Just like a memory model!
Litmus tests & formal specifications
foo.txt foo.txt foo.txt
The best of times The worst of times The age of wisdom The epoch of belief
    f = create("foo.tmp")
    write(f, "The age of …")
    write(f, "The epoch of …")
    close(f)
    rename("foo.tmp", "foo.txt")
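The replace-via-rename pattern on this slide can be tried directly with Python's os module. The following is a minimal sketch, assuming a POSIX-like system; the helper name replace_via_rename and the temporary directory are illustrative and do not come from the talk. Like the slide, it issues no fsync, so it shows the pattern as applications commonly write it and makes no claim of crash safety.

```python
import os
import tempfile

OLD = b"The best of times\nThe worst of times\n"
NEW = b"The age of wisdom\nThe epoch of belief\n"

def replace_via_rename(path: str, data: bytes) -> None:
    """Write data to path + '.tmp', then rename it over path (no fsync)."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # create
    try:
        os.write(fd, data)                                           # write
    finally:
        os.close(fd)                                                 # close
    os.replace(tmp, path)                                            # rename

# Set up foo.txt with the old contents, then replace it.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo.txt")
with open(target, "wb") as f:
    f.write(OLD)

replace_via_rename(target, NEW)

with open(target, "rb") as f:
    result = f.read()
```

Without a crash, foo.txt holds the new contents and foo.tmp is gone. What a crash partway through can leave behind depends on the file system, which is the subject of the slides that follow.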
    create("foo.tmp")
    write(f, "The age of …")
    write(f, "The epoch of …")
    rename("foo.tmp", "foo.txt")
File operations: create, rename
Writes: write
    create("foo.tmp")
    rename("foo.tmp", "foo.txt")
    write(f, "The age of …")
    write(f, "The epoch of …")
foo.txt foo.tmp foo.txt
The best of times The worst of times
Crash!
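To see why the reordered sequence is dangerous, the crash points can be enumerated in a toy in-memory model. This is only a sketch under simplifying assumptions: the directory and inode dictionaries, the crash_states helper, and the rule that a crash keeps exactly a prefix of the given order are all hypothetical, and the full file contents are taken from the earlier foo.txt slide.

```python
OLD = "The best of times\nThe worst of times\n"
NEW = "The age of wisdom\nThe epoch of belief\n"

def crash_states(ops):
    """Return foo.txt's contents after a crash following each prefix of ops."""
    states = []
    for cut in range(len(ops) + 1):
        names = {"foo.txt": 0}   # directory: name -> inode number
        inodes = {0: OLD}        # inode number -> contents
        for op, *args in ops[:cut]:
            if op == "create":
                inodes[1] = ""
                names[args[0]] = 1
            elif op == "write":
                # Writes go to the inode created as foo.tmp,
                # whatever name currently points at it.
                inodes[1] += args[0]
            elif op == "rename":
                names[args[1]] = names.pop(args[0])
        states.append(inodes[names["foo.txt"]])
    return states

program = [
    ("create", "foo.tmp"),
    ("write", "The age of wisdom\n"),
    ("write", "The epoch of belief\n"),
    ("rename", "foo.tmp", "foo.txt"),
]
# The order from the slide: file operations first, writes afterwards.
reordered = [program[0], program[3], program[1], program[2]]

in_order = set(crash_states(program))
after_reordering = set(crash_states(reordered))
```

In program order, a crash leaves foo.txt with either the old or the new contents. In the reordered sequence, a crash after the rename but before the writes leaves foo.txt empty or half written.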
Controller Low-level Driver Block Layer File System
Diagram by Werner Fischer
This provides roughly the same level of guarantees as ext3.
Linux kernel ext4 documentation
The key aspects of fsync() are unreasonable to test in a test suite
POSIX specification for fsync
Controller Low-level Driver Block Layer File System
Formalize the existing POSIX interface (e.g. SibylFS [SOSP’15])
But the interface says nothing about crash safety
Build a new crash-safe file system (e.g. FSCQ [SOSP’15])
Comes with extremely high verification burden
Find bugs in existing file systems (e.g. eXplode [OSDI’06])
Ours is a complementary problem: precisely specifying actual behavior
Litmus tests & formal specifications
Litmus tests
Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes
Documentation for application developers
Formal specifications
Axiomatic descriptions of crash consistency using first-order logic
Automated reasoning about crash safety
Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes
    initial:
      f = create("file")
      write(f, old)
    main:
      f = create("file.tmp")
      write(f, new)
      close(f)
      rename("file.tmp", "file")
    exists?:
      content("file") != old & content("file") != new
initial: Initial setup (cannot crash)
main: Main body (may crash at any point)
exists?: Check whether some (possibly crashing) execution satisfies predicates
Check for behavior that may surprise application writers
memory parallelism
    Initially A = B = 0
    Thread 1: A = 1; r1 = B
    Thread 2: B = 1; r2 = A
    Can r1 = 0 & r2 = 0?
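The memory-model litmus test on this slide can be checked by brute force. The sketch below reads the slide's test as the classic store-buffering shape (Thread 1 stores A then loads B, Thread 2 stores B then loads A) and enumerates every interleaving under sequential consistency; the tuple encoding and helper names are illustrative assumptions.

```python
from itertools import combinations

# Each operation is (kind, location, value-or-register).
T1 = [("store", "A", 1), ("load", "B", "r1")]
T2 = [("store", "B", 1), ("load", "A", "r2")]

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves each thread's order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]

def outcomes():
    """Collect the (r1, r2) results of all sequentially consistent runs."""
    seen = set()
    for schedule in interleavings(T1, T2):
        mem = {"A": 0, "B": 0}   # Initially A = B = 0
        regs = {}
        for kind, loc, x in schedule:
            if kind == "store":
                mem[loc] = x
            else:
                regs[x] = mem[loc]
        seen.add((regs["r1"], regs["r2"]))
    return seen

sc_outcomes = outcomes()
```

Under sequential consistency the outcome r1 = 0 & r2 = 0 never appears, so observing it on real hardware shows that the hardware implements a weaker model. File system litmus tests play the same role for crashes.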
File system   Prefix append   Atomic replace via rename   Atomic create via rename
ext4          Unsafe          Unsafe                      Unsafe
xfs           Safe            Unsafe                      Unsafe
f2fs          Unsafe          Unsafe                      Unsafe
nilfs2        Safe            Unsafe                      Unsafe
btrfs         Safe            Safe                        Unsafe
ufs2          Unsafe          Unsafe                      Unsafe
We suspect that most modern filesystems exhibit the safe append property.
SQLite Atomic Commit documentation
Litmus tests & formal specifications
Axiomatic descriptions of crash consistency using first-order logic
Ordering constraints on events in traces
P =
    f = create("file.tmp")
    write(f, new)
    close(f)
    rename("file.tmp", "file")
A trace is a sequence of file system events generated by an execution of P
    create("foo.tmp") write(f, "The epoch of …") rename("foo.tmp", "foo.txt")
A crash-consistency model is a filter on traces: it specifies which traces (and prefixes of traces) are allowed.
Stronger models: fewer traces
Weaker models: more traces
Sequential crash-consistency allows no reorderings
Relaxed file systems allow more reorderings
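The idea of a model as a filter on traces can be made concrete with a small enumeration. In this sketch a trace is a tuple of event names, and a model is a predicate over traces. The sequential predicate follows the slide's definition. The relaxed predicate is a made-up example of a weaker model (it only requires create to come first) and is not the ext4 model from the talk.

```python
from itertools import permutations

# Events of the example trace, in program order.
P = ["create", "write", "rename"]

def candidate_traces(events):
    """Every prefix of every ordering of the events."""
    for k in range(len(events) + 1):
        yield from permutations(events, k)

def sequential(trace):
    """Allow only prefixes of program order (no reorderings)."""
    return list(trace) == P[:len(trace)]

def relaxed(trace):
    """Hypothetical weaker model: create must come first, nothing else."""
    return len(trace) == 0 or trace[0] == "create"

all_traces = list(candidate_traces(P))
seq_allowed = [t for t in all_traces if sequential(t)]
rel_allowed = [t for t in all_traces if relaxed(t)]
```

Of the 16 candidate traces, the sequential model admits 4 and the hypothetical relaxed model admits 6, and every trace the stronger model allows is also allowed by the weaker one.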
ext4 crash-consistency: allows traces that respect ordering of:
Like memory consistency models but for describing file system crashes
Litmus tests
Small programs that demonstrate allowed or forbidden behaviors of a file system across crashes
Documentation for application developers
Formal specifications
Axiomatic descriptions of crash consistency using first-order logic
Automated reasoning about crash safety
Litmus tests & formal specifications
Diagram by Werner Fischer
Ferrite
Litmus tests
File System (via QEMU)
Storage stack
System calls
Disk commands
Correlate system calls and disk commands; generate possible crash outcomes
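One way to picture "generate possible crash outcomes" is to replay a log of disk commands and enumerate what could be on disk at each crash point. The sketch below is not Ferrite's implementation. It assumes a hypothetical log format of ("write", block, data) and ("flush",) commands, and a simple rule: writes before the last flush are durable, while any subset of the writes after it may have reached the disk.

```python
from itertools import combinations

def crash_outcomes(commands):
    """Return the set of possible disk states over all crash points.

    Each state is a sorted tuple of (block, data) pairs.
    """
    outcomes = set()
    for crash_point in range(len(commands) + 1):
        durable, pending = {}, []
        for cmd in commands[:crash_point]:
            if cmd[0] == "flush":
                # A flush makes every earlier write durable.
                for _, block, data in pending:
                    durable[block] = data
                pending = []
            else:
                pending.append(cmd)
        # Any subset of the unflushed writes may have persisted.
        for k in range(len(pending) + 1):
            for subset in combinations(pending, k):
                disk = dict(durable)
                for _, block, data in subset:
                    disk[block] = data
                outcomes.add(tuple(sorted(disk.items())))
    return outcomes

# Hypothetical log: one data block, a flush, then two metadata blocks.
log = [
    ("write", 0, "data"),
    ("flush",),
    ("write", 1, "inode"),
    ("write", 2, "dirent"),
]
states = crash_outcomes(log)
```

For this log there are five distinct outcomes. The two metadata blocks can persist in either order or not at all, but block 2 never appears without block 0 because of the flush between them.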
Litmus tests File System
(via QEMU)
Ferrite Crash-consistency Model
Litmus tests Results Ferrite Crash-consistency Model
Check the model produces expected
Litmus tests & formal specifications
    initial:
      f = create("file")
      write(f, old)
    main:
      f = create("file.tmp")
      write(f, new)
      close(f)
      rename("file.tmp", "file")
    exists?:
      content("file") != old & content("file") != new

    initial:
      f = create("file")
      write(f, old)
    main:
      fsync(f)
      f = create("file.tmp")
      fsync(f)
      write(f, new)
      fsync(f)
      close(f)
      fsync(f)
      rename("file.tmp", "file")
      fsync(f)
Program Crash-consistency model Spec
Synthesizer
Crash-safe program
Minimal necessary synchronization
    initial:
      f = create("file")
      write(f, old)
    main:
      f = create("file.tmp")
      write(f, new)
      fsync(f)
      close(f)
      rename("file.tmp", "file")
    exists?:
      content("file") != old & content("file") != new
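The synthesis step can be imitated with a brute-force search over fsync placements. This sketch is not the talk's synthesizer, and its crash model is a deliberately simple assumption: create and rename persist in program order, while a data write may or may not persist unless a later fsync forces it. All names here (crash_contents, open_slots, synthesize) are hypothetical.

```python
from itertools import combinations

OLD, NEW = "old", "new"
MAIN = [("create", "file.tmp"), ("write", NEW), ("close",),
        ("rename", "file.tmp", "file")]

def crash_contents(program):
    """All possible contents of "file" after a crash, under the toy model."""
    results = set()
    for cut in range(len(program) + 1):
        prefix = program[:cut]
        writes = [i for i, op in enumerate(prefix) if op[0] == "write"]
        last_sync = max((i for i, op in enumerate(prefix)
                         if op[0] == "fsync"), default=-1)
        forced = [i for i in writes if i < last_sync]    # made durable by fsync
        optional = [i for i in writes if i > last_sync]  # may be lost
        for k in range(len(optional) + 1):
            for chosen in combinations(optional, k):
                persisted = set(forced) | set(chosen)
                names, inodes = {"file": 0}, {0: OLD}
                for i, op in enumerate(prefix):
                    if op[0] == "create":
                        inodes[1] = ""
                        names[op[1]] = 1
                    elif op[0] == "write" and i in persisted:
                        inodes[1] += op[1]
                    elif op[0] == "rename":
                        names[op[2]] = names.pop(op[1])
                results.add(inodes[names["file"]])
    return results

def is_crash_safe(program):
    """The exists? predicate must be unsatisfiable: only old or new content."""
    return crash_contents(program) <= {OLD, NEW}

def open_slots(program):
    """Positions where fsync(f) is legal, i.e. while f is open."""
    slots, is_open = [], False
    for i, op in enumerate(program):
        if is_open:
            slots.append(i)  # fsync may go right before program[i]
        if op[0] == "create":
            is_open = True
        elif op[0] == "close":
            is_open = False
    return slots

def synthesize(program):
    """Return the program with the fewest fsyncs that is crash-safe."""
    slots = open_slots(program)
    for k in range(len(slots) + 1):
        for chosen in combinations(slots, k):
            candidate = []
            for i, op in enumerate(program):
                if i in chosen:
                    candidate.append(("fsync",))
                candidate.append(op)
            if is_crash_safe(candidate):
                return candidate
    return None

safe_program = synthesize(MAIN)
```

Under this toy model the search finds that a single fsync between the write and the close is enough, which matches the shape of the crash-safe program on the slide.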
File System Application Crash-consistency model
A precise formal specification
file system provides
Ferrite
Validate the model against the system with litmus tests
Wednesday, right before lunch