Using Model Checking to Find Serious File System Errors



slide-1
SLIDE 1

Using Model Checking to Find Serious File System Errors

Junfeng Yang, Paul Twohey, Dawson Engler

Stanford University

Madanlal Musuvathi

Microsoft Research

slide-2
SLIDE 2

Authors

Junfeng Paul Dawson Madan

slide-3
SLIDE 3

FS Errors are Destructive

  • Kernel crash, FS corruption
  • Recovery code is error-prone

– Crash at any point, must recover

  • Hard to test

– Slow reboot, reconstruction
– Many crash possibilities, hard to cover all

slide-4
SLIDE 4

FiSC = File System Model Checker

  • Leverages CMC [OSDI 02, NSDI 04]

– Implementation-level Model Checker

  • Generic and FS-specific checks
  • Good at enumerating failures/crashes
  • 32 Bugs on JFS, ReiserFS and ext3

– 10 unrecoverable losses of ‘/’, hard to get with static analysis
– 3 security holes
– 30 confirmed and 21 fixed quickly

slide-5
SLIDE 5

Outline

  • How FiSC works
  • Two consistency checks
  • How to plug a file system into FiSC
  • Checking crashes during recovery
  • Results
slide-6
SLIDE 6

Idealized Checking Process

[Figure: file system states drawn as trees (root; root with file0; root with dir0), connected by operations such as mkdir]

slide-7
SLIDE 7

Galactic View of FiSC

[Figure: FiSC architecture. Labels: Test Driver, FS operations, ext3, disk r/w, FiSC disks, fsck, libc interceptor, libc r/w, User Mode Linux, CMC]

slide-8
SLIDE 8

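The Checking Loop

[Figure: checking loop. From a state (e.g. root, file0), the scheduler picks a kernel thread or a Test Driver operation (create, mkdir, …) and produces a new state (e.g. root, file0, dir1). Disk writes go to the permuter; the checkers flag errors (Err!); a state already seen is dropped, otherwise it goes on the state queue.]

  • The permuter is covered later
  • Our modified scheduler enumerates all kernel threads and file system operations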
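The loop in the figure can be read as a worklist search over states. Below is a minimal sketch in C under toy assumptions: states are small integers, and `toy_next` and `toy_check` are hypothetical stand-ins for running one FS operation and for the checkers. It illustrates the shape of the loop only and is not FiSC's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STATES 64   /* toy bound on distinct states (assumption) */
#define NUM_ACTIONS 2   /* toy stand-ins for create, mkdir, ... */

/* Hypothetical transition: apply action a to state s. */
static int toy_next(int s, int a) { return (2 * s + a + 1) % MAX_STATES; }

/* Hypothetical checker: false when a state violates an invariant. */
static bool toy_check(int s) { return s != 13; }

/* Explore every reachable state once. Returns the number of states that
 * fail the check; *visited receives the number of distinct states seen.
 * start must be in 0..MAX_STATES-1. */
static int explore(int start, int *visited) {
    bool seen[MAX_STATES] = { false };
    int queue[MAX_STATES];
    size_t head = 0, tail = 0;
    int errors = 0;

    seen[start] = true;
    queue[tail++] = start;
    while (head < tail) {
        int s = queue[head++];
        if (!toy_check(s))
            errors++;                   /* "Err!" in the figure */
        for (int a = 0; a < NUM_ACTIONS; a++) {
            int n = toy_next(s, a);
            if (seen[n])
                continue;               /* state seen? then drop it */
            seen[n] = true;
            assert(tail < MAX_STATES);  /* each state is queued at most once */
            queue[tail++] = n;          /* otherwise add to the state queue */
        }
    }
    *visited = (int)tail;
    return errors;
}
```

Because every new state is recorded before it is queued, each state is expanded once, which is what makes the search systematic rather than random.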

slide-9
SLIDE 9

Difference With Randomized Testing

  • Randomized testing = only one possible execution
  • Our approach = guided search

– Systematic: enumerate through all actions
– Better controlled: choose what to explore
– Visibility: see all events
– Repeatable: bugs are replicable

slide-10
SLIDE 10
Long-lived JFS fsck Bug Fixed in 2 Days

  • Loss of an extent of inodes!
  • 3 years old, present ever since the first version!
  • Caused serious data loss

– Dave Kleikamp (IBM JFS): “I'm sure this has bitten us before, but it's usually hard to go back and find out what causes the file system to get messed up so bad”

  • Fixed in 2 days with our complete trace

slide-11
SLIDE 11

Outline

  • How FiSC works
  • Two consistency checks
  • How to plug a file system into FiSC
  • Checking crashes during recovery
  • Results
slide-12
SLIDE 12

Checking FS Operations are Correct

[Figure: Current State (Abstract FS: root, file0; block cache) → Next State (Abstract FS: root, file0, dir1; dirty blocks). abstract_mkdir updates the Abstract FS; actual_mkdir updates the actual FS.]

  • Abstract FS: model of a file system. Currently tracks topology and file sizes. Can be extended
  • Reference model, run in parallel with the actual FS

slide-13
SLIDE 13

Checking FS Operations are Correct

[Figure: Next State. Abstract FS (root, file0, dir1) =? abstraction of the Actual FS (root, file0, dir1; dirty blocks)]

  • Generic, implemented by FiSC
  • abstract = marshal the actual FS, record the topology and file sizes, throw away details
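The comparison between the abstract model and the actual file system can be sketched as follows. This is a toy illustration in C, not FiSC's implementation: the struct layouts, `abstract`, and `abs_equal` are hypothetical names, and the "actual" FS is just an array of nodes that carry extra details (inode number, mtime) for the abstraction to discard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 8

/* Toy "actual" node: carries details the abstraction throws away. */
struct actual_node { const char *path; long size; unsigned long ino; long mtime; };

/* Abstract node: topology (the path) and file size only. */
struct abs_node { const char *path; long size; };

struct abs_fs { struct abs_node nodes[MAX_NODES]; size_t n; };

static int cmp_node(const void *a, const void *b) {
    return strcmp(((const struct abs_node *)a)->path,
                  ((const struct abs_node *)b)->path);
}

/* Marshal the actual FS into abstract form, sorted by path so that two
 * abstract states compare equal regardless of traversal order. */
static struct abs_fs abstract(const struct actual_node *fs, size_t n) {
    struct abs_fs out = { .n = 0 };
    assert(n <= MAX_NODES);
    for (size_t i = 0; i < n; i++) {
        out.nodes[i].path = fs[i].path;
        out.nodes[i].size = fs[i].size;
    }
    out.n = n;
    qsort(out.nodes, out.n, sizeof out.nodes[0], cmp_node);
    return out;
}

/* True when both abstract states have the same paths and sizes. */
static bool abs_equal(const struct abs_fs *a, const struct abs_fs *b) {
    if (a->n != b->n)
        return false;
    for (size_t i = 0; i < a->n; i++)
        if (strcmp(a->nodes[i].path, b->nodes[i].path) != 0 ||
            a->nodes[i].size != b->nodes[i].size)
            return false;
    return true;
}
```

After each operation, a checker in this style would compare `abstract(actual)` with the state the reference model predicts and report a mismatch as an error.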

slide-14
SLIDE 14

Permuter: Write Schedules are Recoverable

[Figure: the checking loop (scheduler, kernel threads, Test Driver operations create, mkdir, …, checkers, state seen?, drop, state queue), with disk writes feeding the permuter]

slide-15
SLIDE 15

Permuter: Write Schedules are Recoverable

  • Stable FS: what FS should recover to after crash
  • FS-Specific, provided by FS developers

[Figure: Current State (Stable FS: root, file0; block cache) → mkdir → Next State (Stable FS: root, file0; dirty blocks)]

slide-16
SLIDE 16

Permuter: Write Schedules are Recoverable

[Figure: Next State. Clone the disk, permute the dirty-block writes onto the clone, run fsck; Recovered FS (root, file0) =? Stable FS (root, file0)]
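The permuter check can be sketched on a toy disk. The code below is an assumption-heavy illustration in C: the three-block layout, `toy_fsck`, and `count_unrecoverable` are hypothetical, and only subsets of the dirty writes are enumerated (a fuller permuter would also vary their order).

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 3
enum { DATA = 0, JOURNAL = 1, COMMIT = 2 };  /* toy disk layout (assumption) */

struct write { int block; int value; };

/* Toy recovery: if a commit record is present, replay the journal. */
static void toy_fsck(int disk[NBLOCKS]) {
    if (disk[COMMIT]) {
        disk[DATA] = disk[JOURNAL];
        disk[COMMIT] = 0;
    }
}

/* Try every subset of the dirty writes (bit i set = write i reached the
 * disk before the crash), recover each resulting image, and count the
 * subsets whose recovered data differs from the stable value. */
static int count_unrecoverable(const int disk[NBLOCKS],
                               const struct write *w, int n, int stable) {
    int bad = 0;
    assert(n >= 0 && n < 16);
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        int clone[NBLOCKS];
        memcpy(clone, disk, sizeof clone);          /* clone the disk */
        for (int i = 0; i < n; i++)
            if (mask & (1u << i))
                clone[w[i].block] = w[i].value;     /* permuted writes */
        toy_fsck(clone);
        if (clone[DATA] != stable)                  /* recovered =? stable */
            bad++;
    }
    return bad;
}
```

With an uncommitted journal write as the only dirty block, every subset recovers to the old data. Adding a direct write to the data block before the commit makes some subsets recover to a state other than the stable one, which is the kind of error this check reports.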

slide-17
SLIDE 17

Outline

  • How FiSC works
  • Two consistency checks
  • How to plug a file system into FiSC
  • Checking crashes during recovery
  • Results
slide-18
SLIDE 18

Plugging an FS into FiSC

  • 1. FS utilities: mkfs, fsck
  • 2. Dirty buffers

– Not needed if using standard system mark_dirty

  • 3. Minimum disk and memory sizes

– 2MB, 16 pages for ext3

  • 4. Function to compute the Stable FS

– Stable FS: What FS should recover to, FS-specific

....................................................................

  • Roughly 1-2 weeks for us
slide-19
SLIDE 19

Stable FS Trick for Journaling FS

  • Only log write can update the Stable FS

– Log write → use fsck to compute Stable FS
– FS write → fsck and abstract, compare result to Stable FS
– FS writes cannot change Stable FS

  • Log write = commit + normal log write

– Only commit can update the Stable FS
– If easy to recognize commit, update Stable FS on commit

slide-20
SLIDE 20

Checking More Thoroughly

  • Downscale

– Small disks. 2MB for ext3
– Small memory. 16 pages for ext3
– Tiny FS topology. 2-4 nodes

  • Canonicalization

– General rule: set things to constants, e.g. inode generation #, mount count
– Filenames. “x”, “y”, “z” == “1”, “2”, “3”
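Filename canonicalization can be sketched as a renaming pass applied before states are compared. This is a minimal sketch in C; `canonicalize` is a hypothetical helper, and numbering names in order of first appearance is an assumption about how the mapping could be done, not something the slide specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN 8

/* Replace each name with "1", "2", ... in order of first appearance, so
 * that states differing only in the names chosen map to the same form.
 * Returns the number of distinct names. */
static int canonicalize(const char *names[], int n, char out[][NAME_LEN]) {
    const char *seen[MAX_NAMES];
    int nseen = 0;
    assert(n >= 0 && n <= MAX_NAMES);
    for (int i = 0; i < n; i++) {
        int id = -1;
        for (int j = 0; j < nseen; j++)
            if (strcmp(seen[j], names[i]) == 0) { id = j; break; }
        if (id < 0) {                   /* first time this name appears */
            id = nseen;
            seen[nseen++] = names[i];
        }
        snprintf(out[i], NAME_LEN, "%d", id + 1);
    }
    return nseen;
}
```

Two states that use the names x, y, x, z and foo, bar, foo, baz both become 1, 2, 1, 3, so the checker treats them as one state and explores it once.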

slide-21
SLIDE 21

Exposing choice points

  • Choice point = can abstractly do multiple actions, practically does one
  • Want to explore all actions

struct block* read_block(int i) {
    struct block *b;
    if ((b = cache_lookup(i)))
        return b;
    return disk_read(i);
}

Choice point added to the code: if (fisc_choose(2) == 0)

  • fisc_choose(2) returns twice: the 1st time it returns 0, the 2nd time it returns 1
  • If there are N possible actions, call fisc_choose(N); it returns 0, 1, …, N-1
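One way to get the effect of a choice point without checkpointing is to re-run the code once per choice sequence. The sketch below does that in C. `toy_choose`, `next_run`, and `explore_all` are hypothetical names, and replay from the start is a swapped-in technique: the slide describes fisc_choose as returning twice, which this sketch only emulates.

```c
#include <assert.h>

#define MAX_DEPTH 16

/* Choices made on the current run, and how many options each one had. */
static int path[MAX_DEPTH], width[MAX_DEPTH];
static int depth;      /* choice points hit so far in this run */
static int planned;    /* length of the prefix being replayed */

/* Hypothetical stand-in for fisc_choose(n): returns a value in 0..n-1.
 * Replays the planned prefix, then picks 0 at each new choice point. */
static int toy_choose(int n) {
    assert(depth < MAX_DEPTH && n > 0);
    if (depth >= planned)
        path[depth] = 0;
    width[depth] = n;
    return path[depth++];
}

/* Advance to the next unexplored choice sequence; return 0 when done. */
static int next_run(void) {
    while (depth > 0 && path[depth - 1] + 1 >= width[depth - 1])
        depth--;                    /* this choice point is exhausted */
    if (depth == 0)
        return 0;
    path[depth - 1]++;              /* take the next option there */
    planned = depth;
    depth = 0;
    return 1;
}

/* Code under test: the cache lookup is a choice point, so both the hit
 * path (1) and the miss path (2) get explored. */
static int read_block_path(void) {
    if (toy_choose(2) == 0)
        return 1;                   /* pretend the block was cached */
    return 2;                       /* pretend we went to disk */
}

/* Run the code under every choice sequence; return the number of runs. */
static int explore_all(int *sum) {
    int runs = 0;
    depth = planned = 0;
    *sum = 0;
    do {
        int p = read_block_path();
        int c = toy_choose(3);      /* a second, three-way choice point */
        *sum += p + 10 * c;
        runs++;
    } while (next_run());
    return runs;
}
```

With a two-way and a three-way choice point, the driver performs six runs, one per combination.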

slide-22
SLIDE 22

Scheduler is a Built-in Choice Point

[Figure: the checking loop (scheduler, kernel threads, Test Driver operations create, mkdir, …, disk writes, permuter, checkers, state seen?, drop, state queue)]

Kernel threads and FS operations are possible actions. Enumerate through all of them.

slide-23
SLIDE 23

Outline

  • How FiSC works
  • Two consistency checks
  • How to plug a file system into FiSC
  • Checking crashes during recovery
  • Results
slide-24
SLIDE 24

The Basic Check

  • Obtain a crashed disk image D
  • Run fsck, recording all writes
  • Simulate a crash during recovery

– Apply prefix to D
– Re-run fsck
– Compare to Stable FS

  • Repeat until all the prefixes are tried
  • Effective, but speed is a concern (redundant crashes)
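The steps above can be sketched on a toy disk. This is an illustration in C under stated assumptions: the three-block layout, `toy_fsck`, and `check_recovery_crashes` are hypothetical, and the `clear_first` flag exists only to contrast a safe write order with an unsafe one.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 3
#define MAX_WRITES 8
enum { DATA = 0, JOURNAL = 1, COMMIT = 2 };   /* toy layout (assumption) */

struct write { int block; int value; };
struct trace { struct write w[MAX_WRITES]; int n; };

/* Perform a write and, if a trace is given, record it. */
static void do_write(int *disk, struct trace *t, int block, int value) {
    disk[block] = value;
    if (t) {
        assert(t->n < MAX_WRITES);
        t->w[t->n].block = block;
        t->w[t->n].value = value;
        t->n++;
    }
}

/* Toy recovery. With clear_first != 0 it clears the commit record before
 * replaying the journal, which is the wrong order. */
static void toy_fsck(int *disk, struct trace *t, int clear_first) {
    if (!disk[COMMIT])
        return;
    if (clear_first) {
        do_write(disk, t, COMMIT, 0);
        do_write(disk, t, DATA, disk[JOURNAL]);
    } else {
        do_write(disk, t, DATA, disk[JOURNAL]);
        do_write(disk, t, COMMIT, 0);
    }
}

/* For every prefix of fsck's writes: apply it to a copy of the crashed
 * image d, re-run fsck, and compare the data with the stable value.
 * Returns the number of prefixes that recover to something else. */
static int check_recovery_crashes(const int *d, int stable, int clear_first) {
    int scratch[NBLOCKS], bad = 0;
    struct trace t = { .n = 0 };

    memcpy(scratch, d, sizeof scratch);
    toy_fsck(scratch, &t, clear_first);           /* record all writes */

    for (int k = 0; k <= t.n; k++) {
        int img[NBLOCKS];
        memcpy(img, d, sizeof img);
        for (int i = 0; i < k; i++)               /* crash after k writes */
            img[t.w[i].block] = t.w[i].value;
        toy_fsck(img, NULL, clear_first);         /* second recovery run */
        if (img[DATA] != stable)
            bad++;
    }
    return bad;
}
```

In this toy, replaying the journal before clearing the commit record survives a crash at every prefix, while clearing the record first leaves one prefix from which the second fsck run can no longer recover the data.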
slide-25
SLIDE 25

Assume: fsck is Deterministic

  • Same inputs → same outputs

– Inputs = disk reads, outputs = writes

  • Is crash after a write redundant?

– A write doesn’t change prior reads → 2nd fsck computes the same write → redundant crash, can be optimized away

  • More optimizations in paper

– Obvious: cache fsck results

slide-26
SLIDE 26

Equivalent: Write But No Read

  • No read of B2 prior to write of B2

Schedule 1: read B1, write B2, …, done
Schedule 2: read B1, write B2, crash & re-run, read B1, write B2 (same!), …, done

Schedule 1 = Schedule 2
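The "write but no read" condition can be sketched as a scan over fsck's trace of disk operations. This is a minimal sketch in C; `struct op`, `crash_is_redundant`, and `crashes_to_check` are hypothetical names, and only the single condition stated on this slide is applied.

```c
#include <assert.h>
#include <stdbool.h>

enum op_kind { OP_READ, OP_WRITE };
struct op { enum op_kind kind; int block; };

/* Slide's condition: a crash right after write i can be skipped when
 * fsck did not read that block earlier in the trace. */
static bool crash_is_redundant(const struct op *trace, int n, int i) {
    assert(i >= 0 && i < n && trace[i].kind == OP_WRITE);
    for (int j = 0; j < i; j++)
        if (trace[j].kind == OP_READ && trace[j].block == trace[i].block)
            return false;
    return true;
}

/* Count the crash points that still have to be simulated. */
static int crashes_to_check(const struct op *trace, int n) {
    int need = 0;
    for (int i = 0; i < n; i++)
        if (trace[i].kind == OP_WRITE && !crash_is_redundant(trace, n, i))
            need++;
    return need;
}
```

For the trace read B1, write B2, read B2, write B1, write B2, only the first write is skipped: B2 had not been read when it was first written, while the later writes touch blocks that fsck had already read.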

slide-27
SLIDE 27

Equivalent: Dominated Write

  • 2nd write of B2 is dominated by 1st write of B2

Schedule 1: read B1, write B2, …, write B2, …, done
Schedule 2: read B1, write B2, …, write B2, crash & re-run, read B1, write B2 (same!), …, done

Schedule 1 = Schedule 2

slide-28
SLIDE 28

Results

Error Type    VFS   ext2  ext3  JFS   Reiser  total
Data loss     N/A   N/A   1     8     1       10
False clean   N/A   N/A   1     1     -       2
Security      -     2     2     1     -       3 + 2
Crashes       1     -     -     10    1       12
Other         1     -     1     1     -       3
Total         2     2     5     21    2       32

32 in total, 21 fixed, 9 of the remaining 11 confirmed

slide-29
SLIDE 29

Recovery Write Ordering Bugs

  • Under Normal operation:

– Changes must first be flushed to log before they can reach the actual FS

  • All FS seem to get this right
  • During Recovery:

– Changes must first be flushed to the actual FS before the log can be cleared

  • Found this type of bug in all FS, total 5
slide-30
SLIDE 30

ext3 Recovery Bug

recover_ext3_journal(…) {
    // …
    retval = -journal_recover(journal);
    // …
    // clear the journal
    e2fsck_journal_release(…);
    // …
}

journal_recover(…) {
    // replay the journal
    // …
    // sync modifications to disk
    fsync_no_super(…);
}

  • Code was directly adapted from the kernel
  • But, fsync_no_super was defined as NOP !

// Error! Empty macro, doesn’t sync data!
#define fsync_no_super(dev) do {} while (0)

slide-31
SLIDE 31

Conclusion

  • FiSC, a FS model checker

– On average 1-2 weeks to plug in an FS
– Checked JFS, ReiserFS and ext3
– Serious data-loss bugs in all, 10 in total

  • Model Checking worked well

– Can crash everywhere. Must always be recoverable.
– Systematic

  • Future work: anything that must handle failure correctly, always

– RAID, databases, consensus algorithms…