Recon: Verifying File System Consistency at Runtime

SLIDE 1

October 4, 2011

Recon: Verifying File System Consistency at Runtime

Daniel Fryer, Jack (Kuei) Sun, Rahat Mahmood, TingHao Cheng, Shaun Benjamin, Angela Demke Brown and Ashvin Goel University of Toronto

SLIDE 2

Metadata Integrity is Crucial

You don’t know what you’ve got ’til it’s gone…

[Figure: kernel storage stack (File System, Block Layer, Storage) with data (D) and metadata (M) blocks]

SLIDE 3

File Systems Have Bugs

Why can’t existing solutions handle this problem?

Bugs in Linux Ext3 File System (with date closed)

  • panic/ext3 fs corruption with RHEL4-U6-re20070927.0 (closed 2007-11)
  • Re: [2.6.27] filesystem (ext3) corruption (access beyond end) (closed 2008-06)
  • linux-2.6: ext3 filesystem corruption (closed 2008-09)
  • linux-image-2.6.29-2-amd64: occasional ext3 filesystem corruption (closed 2009-06)
  • ENOSPC during fsstress leads to filesystem corruption on ext2, ext3, and ext4 (closed 2010-03)
  • ext3: Fix fs corruption when make_indexed_dir() fails (closed 2011-06)
  • Data corruption: resume from hibernate always ends up with EXT3 fs errors (not yet closed)

SLIDE 4

“Solutions”

RAID? Checksums? Journals?

  • None of these protect against bugs in file systems
  • Existing approaches assume file systems are correct

[Figure: kernel storage stack (File System, Block Layer, Storage)]

SLIDE 5

Offline Checking

  • Check consistency offline, e.g., fsck
  • Consistency properties necessary for correctness

  • FS1: No double allocation
  • FS2: Refcount-based sharing

[Figure: metadata (M) and data (D) blocks, one labeled "Ref: 2"]

SLIDE 6

Problems with Offline Checking

  • Slow, getting slower with larger disks
  • Requires taking file system offline
  • After the fact, repair is error prone

[Figure: metadata (M) and data (D) blocks]

SLIDE 7

Outline

  • Problem
      • Metadata can be corrupted by bugs and existing techniques are inadequate
  • Our Solution: Recon
      • A system for protecting metadata from bugs
  • Key idea
      • Runtime consistency checking
  • Design
  • Evaluation

SLIDE 8

Runtime Consistency Checking

  • Ensure every update results in a consistent file system

  • Makes repair unnecessary!
  • “What happens in DRAM stays in DRAM”

BUT

  • Consistency properties are global
  • Global properties require full scan
  • We can’t run fsck at every write

SLIDE 9

Consistency Invariants

  • We transform global consistency properties to fast, local consistency invariants
  • Assume initial consistent state
      • New file system is clean
      • Use checksums/redundancy to handle errors below FS
  • At runtime, check only what is changing
      • Do so before changes become persistent
      • Resulting new state is consistent

SLIDE 10

Example: Block Allocation in Ext3

  • Ext3 maintains a block bitmap: every allocated block is marked in the bitmap

[Figure: block bitmap covering blocks 5 to 9 and an inode (size, time, block pointers 7 and 8); the bitmap block and the inode block are each marked "Updated Block"]

SLIDE 11

Example: Block Allocation in Ext3

  • Consistency Invariant
      • Bitmap bit X flips from “0” to “1” ⇔ block pointer set to X
  • Invariant fails if either update is missing
      • Should not mark allocated without setting block pointer
      • Should not set block pointer without marking allocated
  • Can any consistency property be transformed?
      • File systems should maintain consistency efficiently
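The invariant above can be sketched on raw bitmap bytes. This is a minimal illustration, not Recon's code: the function names are hypothetical, the bit layout (bit i of byte j covers block 8*j + i) is an assumption, and the caller is assumed to supply the set of block numbers newly written into block pointers.

```python
def bits_flipped_on(old_bitmap: bytes, new_bitmap: bytes, first_block: int = 0):
    """Return the block numbers whose bitmap bit went from 0 to 1.

    Assumed layout: bit i of byte j covers block first_block + 8*j + i.
    """
    flipped = set()
    for j, (old, new) in enumerate(zip(old_bitmap, new_bitmap)):
        turned_on = ~old & new & 0xFF  # bits clear before and set now
        for i in range(8):
            if (turned_on >> i) & 1:
                flipped.add(first_block + 8 * j + i)
    return flipped


def allocation_invariant_holds(old_bitmap, new_bitmap, new_pointers):
    """True only if every newly marked block is pointed to, and vice versa."""
    return bits_flipped_on(old_bitmap, new_bitmap) == set(new_pointers)


# Block 7 was already allocated; the update allocates block 8.
old = bytes([0b10000000, 0b00000000])
new = bytes([0b10000000, 0b00000001])
ok = allocation_invariant_holds(old, new, {8})
missing_pointer = allocation_invariant_holds(old, new, set())
missing_bitmap_update = allocation_invariant_holds(old, old, {8})
```

Only the changed bits and the changed pointers are examined, so the cost depends on the size of the update rather than the size of the file system.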

SLIDE 12

When to Check Invariants

  • Invariants involve changes to multiple blocks
  • When should they be consistent?
  • Transactions are used for crash consistency
  • Consistency can be checked at transaction boundaries
  • Must check transaction just before commit block reaches disk

[Figure: a transaction moving from memory to disk]
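A minimal sketch of checking at the transaction boundary, under stated assumptions: the class and function names are hypothetical, a dict stands in for the disk, and `no_empty_blocks` is a toy placeholder rather than a real file system invariant. The point it illustrates is ordering: all checks run on the buffered writes before anything becomes persistent.

```python
class InvariantViolation(Exception):
    """Raised when a transaction would commit an inconsistent state."""


class CheckedTransaction:
    """Buffers metadata block writes in memory until commit (hypothetical)."""

    def __init__(self, disk, checks):
        self.disk = disk        # dict: block number -> bytes, stands in for the disk
        self.checks = checks    # callables taking (old_blocks, new_blocks) -> bool
        self.pending = {}       # writes that are not yet persistent

    def write(self, block_no, data):
        self.pending[block_no] = data

    def commit(self):
        # Look only at the blocks this transaction touches.
        old = {n: self.disk.get(n) for n in self.pending}
        new = dict(self.pending)
        for check in self.checks:
            if not check(old, new):
                self.pending.clear()  # nothing has reached the disk
                raise InvariantViolation(getattr(check, "__name__", repr(check)))
        self.disk.update(new)         # stands in for the commit block reaching disk
        self.pending.clear()


def no_empty_blocks(old, new):
    """Toy stand-in for a real invariant: reject zero-length block writes."""
    return all(len(data) > 0 for data in new.values())
```

A transaction that fails a check raises `InvariantViolation` and leaves the disk dict unchanged, which mirrors the idea that corruption is stopped before it is committed.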

SLIDE 13

Outline

  • Problem
      • Metadata corruption caused by bugs
  • Solution
      • Recon
  • Key idea
      • Runtime checking
  • Design
      • Metadata interpretation
      • Logical change generation
  • Evaluation

SLIDE 14

The Recon Design

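[Figure: Recon architecture. Recon sits in the Block Layer between the File System and the disk ("Ye Olde Disk"). Components: Metadata Write Cache, Metadata Read Cache, FS Recon Interface with per-file-system modules (Ext3_Recon, Btrfs_Recon), metadata interpretation, logical change generation]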

SLIDE 15

Metadata Interpretation

  • To check invariants, we need to determine the type of a block on a read or write
  • Take advantage of tree structure of metadata
      • Superblock is the root of the tree
      • Parents are read before children
          • For example, inode is read before indirect blocks
      • We see the pointer to the block before the block, and
      • The pointer within the parent determines the type of the child block
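The parent-before-child idea can be sketched as follows. Everything here is illustrative: the `CHILD_TYPE` table, the field names, and the `BlockTyper` class are hypothetical, and the caller is assumed to have already parsed the pointers out of each block.

```python
# Hypothetical map: (parent block type, pointer field) -> child block type.
CHILD_TYPE = {
    ("superblock", "group_desc"): "group_descriptor",
    ("group_descriptor", "block_bitmap"): "block_bitmap",
    ("group_descriptor", "inode_table"): "inode_table",
    ("inode_table", "indirect"): "indirect_block",
    ("indirect_block", "ptr"): "data",
}


class BlockTyper:
    """Learns block types by following pointers down from the superblock."""

    def __init__(self, superblock_no):
        self.types = {superblock_no: "superblock"}  # the root type is known up front

    def on_read(self, block_no, pointers):
        """Record the types of children named by this block.

        pointers: list of (field_name, child_block_no) found in the block.
        Returns the type of block_no, or None if no parent pointed to it yet.
        """
        block_type = self.types.get(block_no)
        if block_type is None:
            return None
        for field, child in pointers:
            child_type = CHILD_TYPE.get((block_type, field))
            if child_type is not None:
                self.types[child] = child_type
        return block_type
```

Because a parent is always read first, the type of a child is already recorded by the time the child block itself is seen.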

SLIDE 16

Logical Change Generation

  • Invariants are expressed in terms of logical changes to structures, e.g., bitmaps, pointers
  • Recon generates these changes based on
      • Block types
      • Comparing the blocks in the write and read cache
  • Logical changes to metadata structures are represented as a set of change records: [type, id, field, old, new]

Invariant: bitmap bit X flips from “0” to “1” ⇔ block pointer set to X
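One way change records of the form [type, id, field, old, new] could be produced is by diffing the read-cache and write-cache versions of an already interpreted structure. This is a sketch under assumptions: `ChangeRecord` and `diff_structure` are hypothetical names, structures are modeled as plain dicts of field values, and a pointer value of 0 is assumed to mean "unset".

```python
from typing import Any, NamedTuple


class ChangeRecord(NamedTuple):
    type: str
    id: int
    field: str
    old: Any
    new: Any


def diff_structure(struct_type, struct_id, old_fields, new_fields):
    """Emit one record per field that differs between the two versions.

    old_fields: field values from the read cache (last known on-disk state).
    new_fields: field values from the write cache (state about to be written).
    """
    records = []
    for field in sorted(set(old_fields) | set(new_fields)):
        old, new = old_fields.get(field), new_fields.get(field)
        if old != new:
            records.append(ChangeRecord(struct_type, struct_id, field, old, new))
    return records


# Inode 12 before and after a one-block append (values are illustrative).
old_inode = {"i_size": 4096, "i_blocks": 8, "blockptr[1]": 0}
new_inode = {"i_size": 8192, "i_blocks": 16, "blockptr[1]": 501}
records = diff_structure("inode", 12, old_inode, new_inode)
```

Unchanged fields produce no records, so the invariant checks that consume these records only ever see what the transaction actually modified.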

SLIDE 17

Checking with Change Records

Transaction appends a new block to inode 12:

  type    id   field        oldval  newval
  inode   12   blockptr[1]  0       501
  inode   12   i_size       4096    8192
  inode   12   i_blocks     8       16
  Bitmap  501               0       1
  BGD          free_blocks  1500    1499

Invariant: bitmap bit X flips from “0” to “1” ⇔ block pointer set to X
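The block allocation invariant can be evaluated directly over a transaction's change records. The sketch below is illustrative only: `check_block_allocation` is a hypothetical name, records are plain tuples, an old pointer value and old bitmap bit of 0 are assumed for the appended block, and `None` marks cells whose value is not given on the slide.

```python
def check_block_allocation(records):
    """Return the block numbers that violate the invariant.

    records: iterable of (type, id, field, old, new) tuples.
    A block violates the invariant if its bitmap bit flipped 0 -> 1 without a
    pointer being set to it, or a pointer was set to it without the bit flip.
    """
    allocated = set()  # bitmap bits that flipped from 0 to 1
    pointed = set()    # block numbers newly written into a block pointer
    for rtype, rid, field, old, new in records:
        if rtype == "bitmap" and old == 0 and new == 1:
            allocated.add(rid)
        elif field is not None and field.startswith("blockptr") and new != 0:
            pointed.add(new)
    return allocated ^ pointed  # symmetric difference: one side without the other


# Change records for appending a new block to inode 12.
txn = [
    ("inode", 12, "blockptr[1]", 0, 501),
    ("inode", 12, "i_size", 4096, 8192),
    ("inode", 12, "i_blocks", 8, 16),
    ("bitmap", 501, None, 0, 1),
    ("BGD", None, "free_blocks", 1500, 1499),
]
violations = check_block_allocation(txn)
```

For the full transaction the result is an empty set. Dropping either the bitmap record or the block pointer record leaves block 501 in the result, which matches the two failure cases the invariant is meant to catch.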

SLIDE 18

Outline

  • Problem
      • Metadata corruption caused by bugs
  • Solution
      • Recon
  • Key idea
      • Runtime checking
  • Design
  • Evaluation
      • Complexity
      • Corruption detection
      • Performance overhead

SLIDE 19

Complexity

  • Much simpler than FS code
      • Only need to verify result of file system operations
      • Each invariant can be checked independently
  • Code divided into three sections
      • Generic Recon framework: 1.5 kLOC
      • Ext3 metadata interpretation: 1.5 kLOC
      • 31 Ext3 invariants: 800 LOC

SLIDE 20

Corruption Detection

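[Figure: bar chart of corruptions caught (0% to 100%) by type of corrupted structure: inode (stat), inode (blk ptr), inode (others), dir, bgd, bbm, ibm, random. Series: detected by both, e2fsck only, Recon only. Data labels in the chart: 31 79 52 59 112 17 72 352 2 2 1 4 25 8 23 31]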

Recon matches e2fsck

SLIDE 21

Performance Evaluation

  • Used Linux port of Sun’s FileBench
  • Used 5 different emulated workloads
      • webserver, webproxy, varmail, fileserver, ms_nfs
      • ms_nfs configured to match metadata characteristics from Microsoft study (FAST’11)
  • 3 GHz dual core Xeon CPUs, 2 GB RAM
  • 1 TB ext3 file system

SLIDE 22

Performance Evaluation

[Figure: performance results for the webserver, webproxy, varmail, fileserver, and ms_nfs workloads; cache size = 128MB]

For reasonable cache sizes, performance impact is modest

SLIDE 23

Handling Violations

Several options

  • Prevent all writes, remount read-only
      • Preserves correctness
      • Reduces availability
  • Take snapshot of filesystem and continue
      • Minimal availability impact, snapshot is correct
      • Requires repair afterwards
  • Micro-reboot file system or kernel
      • Transparent to applications
      • Overcomes transient failures

SLIDE 24

Conclusion

  • All consistency properties of fsck can be enforced on updates without full disk scan
  • Checking can be done outside the file system, entirely at the block layer
  • Preventing corruption from being committed is a huge win over after-the-fact repair!

SLIDE 25

Thanks!

  • To our anonymous reviewers
  • To our shepherd, Junfeng Yang
  • To the Systems Software Reading Group @ U of T

For their many insightful comments & suggestions!

  • To Vivek Lakshmanan

For early insights that helped start the project!

This work was supported by NSERC through the Discovery Grants program
