SLIDE 1

A Study of Linux File System Evolution

Lanyue Lu Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau Shan Lu University of Wisconsin - Madison

SLIDE 8

Local File Systems Are Important

Google GFS, Hadoop DFS
Android, iPhone
Windows, Mac, Linux

SLIDE 11

Why Is This Study Useful?

Study drives system designs
➡ previous work focuses on measurements
➡ little emphasis on system evolution

Answer important questions
➡ complexity of file systems
➡ dominant bug types
➡ performance optimizations
➡ reliability enhancements
➡ similarities across file systems

SLIDE 15

Who Is This Study Useful To?

File system developers
➡ avoid the same mistakes
➡ improve existing design and implementation

System researchers
➡ identify problems that plague existing systems
➡ match research to reality

Tool builders
➡ large-scale statistical bug patterns
➡ effective bug-finding tools
➡ realistic fault injection

SLIDE 19

How Did We Study?

File systems are evolving
➡ code base is not static
➡ new features, bug fixes
➡ performance and reliability improvements

Patches describe evolution
➡ how one version transforms into the next
➡ every patch is available
➡ “system archeology”

Study with other rich information
➡ source code, design documents
➡ forums, mailing lists

SLIDE 23

What We Did

Manual patch inspection
➡ XFS, Ext4, Btrfs, Ext3, Reiserfs, JFS
➡ Linux 2.6 series
➡ 5079 patches, multiple passes

Quantitative analysis of various aspects
➡ patch types, bug patterns and consequences
➡ performance and reliability techniques

Provide an annotated dataset
➡ rich data for further analysis

SLIDE 31

Major Results Preview

Bugs are prevalent
Semantic bugs dominate
Bugs are constant
Corruption and crash are most common
Metadata management has high bug density
Failure paths are error-prone
Various performance techniques are used

SLIDE 32

Outline

Introduction
Methodology
Study Results

SLIDE 36

Methodology

Diverse: Ext4, Btrfs, Ext3, JFS, Reiser, XFS
Complete: Linux 2.6.0 (Dec. 2003) to 2.6.39 (May 2011), 5079 patches
Comprehensive: each patch has a header, a description, and code

SLIDE 37

Patch Header

[PATCH] fix possible NULL pointer in ext3/super.c.

SLIDE 38

Patch Description

In fs/ext3/super.c::ext3_get_journal() at line 1675, `journal' can be NULL, but it is not handled right (detect by Coverity's checker).

SLIDE 41

Related Code

--- /fs/ext3/super.c
+++ /fs/ext3/super.c
@@ -1675,6 +1675,7 @@ journal_t *ext3_get_journal()
     if (!journal) {
         printk(KERN_ERR "EXT3: Could not load");
         iput(journal_inode);
+        return NULL;
     }
     journal->j_private = sb;

SLIDE 45

Classifications

Patch overview
➡ type: bug
➡ size: 1

Bug analysis
➡ pattern: memory (nullptr)
➡ consequence: crash
➡ data structure: super
➡ tool: Coverity

Performance and reliability
➡ pattern
➡ location

SLIDE 49

Limitations

Only six popular file systems
➡ many other file systems

Only Linux 2.6 major versions
➡ omit earlier versions

Only reported bugs
➡ existing, but unknown bugs

SLIDE 50

Outline

Introduction
Methodology
Study Results

SLIDE 51

Questions to Answer

What do patches do?
What do bugs look like?
Do bugs diminish over time?
What consequences do bugs have?
Where does the complexity of file systems lie?
Do bugs occur on normal paths?
What performance techniques are used?

SLIDE 52

Q1: What do patches do?

SLIDE 54

Patch Overview

Type         Description
Bug          Fix existing bugs
Performance  Propose efficient design or implementation
Reliability  Improve robustness
Feature      Add new functionality
Maintenance  Maintain the code and documentation

SLIDE 64

Patch Overview

[Chart: patch-type breakdown (Bug, Performance, Reliability, Feature, Maintenance) per file system; patch counts: XFS 2004, Ext4 1154, Btrfs 809, Ext3 537, Reiser 384, JFS 191, All 5079]

SLIDE 68

45% of patches are for maintenance
35% of patches are bug fixing

SLIDE 69

Q2: What do bugs look like?

SLIDE 71

Bug Pattern

Type         Description
Semantic     Incorrect design or implementation (e.g. incorrect state update, wrong design)
Concurrency  Incorrect concurrent behavior (e.g. missed unlock, deadlock)
Memory       Incorrect handling of memory objects (e.g. resource leak, null dereference)
Error Code   Missing or wrong error code handling (e.g. return wrong error code)

SLIDE 73

Semantic Bug Example

ext3/ialloc.c, 2.6.4
find_group_other(...) {
    ...
-   group = parent_group + 1;
-   for (i = 2; i < ngroups; i++) {
+   group = parent_group;
+   for (i = 0; i < ngroups; i++) {
    }
    ...
}

SLIDE 78

Concurrency Bug Example

ext4/extents.c, 2.6.30
ext4_ext_put_in_cache(...) {
    ...
+   spin_lock(i_br_lock);
    cex = &EXT4_I(inode)->i_cached_extent;
    cex->ec_FOO = FOO;
+   spin_unlock(i_br_lock);
}


SLIDE 83

Memory Bug Example

btrfs/inode.c, 2.6.30
btrfs_new_inode(...) {
    inode = new_inode(...);
    ret = btrfs_set_inode_index(...);
    if (ret) {
+       iput(inode);
        return ERR_PTR(ret);
    }
}


SLIDE 86

Error Code Example

reiserfs/xattr_acl.c, 2.6.16
reiserfs_get_acl(...) {
    ...
    acl = posix_acl_from_disk(...);
+   if (!IS_ERR(acl))
        *p_acl = posix_acl_dup(acl);
}

SLIDE 87

Bug Pattern

[Chart: breakdown of bug patterns (Semantic, Concurrency, Memory, Error Code) per file system; bug counts: XFS 511, Ext4 450, Btrfs 358, Ext3 229, Reiser 158, JFS 80, All 1786]

SLIDE 88

55% of file-system bugs are semantic bugs

SLIDE 89

Q3: Do bugs diminish over time?

SLIDE 93

Ext3 Bug Trend

[Chart: number of bugs per Linux 2.6 version for Ext3; axes: Linux version vs. number of bugs]

Annotated spikes:
2.6.10: block reservation
2.6.11: xattr in inode
2.6.17: multiple block allocation
2.6.38: missed error handling

SLIDE 99

Bug Trend

[Charts: number of bugs per Linux version for XFS, Ext4, Btrfs, Ext3, ReiserFS, and JFS, broken down by pattern (Semantic, Concurrency, Memory, Error Code)]

2.6.33: remove BKL

SLIDE 100

Bug fixing is a constant in a file system's lifetime

SLIDE 101

Q4: What consequences do file-system bugs have?

SLIDE 103

Bug Consequence

Type        Description
Corruption  On-disk or in-memory data is corrupted
Crash       File system becomes unusable
Error       Unexpected operation failure or error code
Deadlock    Wait for resources in a circular chain
Hang        File system makes no progress
Leak        Resources are not freed properly
Wrong       Diverges from expectation (excluding the above)

SLIDE 110

Bug Consequence

[Chart: breakdown of consequences (Corruption, Crash, Error, Deadlock, Hang, Leak, Wrong) per file system; consequence counts: XFS 525, Ext4 461, Btrfs 366, Ext3 235, Reiser 166, JFS 80, All 1833]

SLIDE 111

Corruption and Crash are most common

SLIDE 112

Q5: Does each logical component have an equal degree of complexity?

SLIDE 114

Components

Type         Description
balloc       Data block allocation and deallocation
dir          Directory management
extent       Contiguous physical block mapping
file         File read and write operations
inode        Inode-related metadata management
transaction  Journaling or other transactional support
super        Superblock-related metadata management
tree         Generic tree structure procedures
other        Other supporting components (e.g., xattr)

SLIDE 119

Correlation

[Scatter plots: percentage of bugs vs. percentage of code for each component (file, balloc, inode, dir, super, extent, trans, tree, other) in XFS, Ext4, Btrfs, Ext3, ReiserFS, and JFS]

SLIDE 122

Metadata management has high bug density

Tree-related code is not particularly error-prone

SLIDE 123

Q6: Do bugs occur on normal paths or failure paths?

SLIDE 126

Failure Path

A wide range of failures
➡ resource allocation failure
➡ I/O operation failure
➡ silent data corruption
➡ incorrect system states

Unique code style
➡ goto statements
➡ error code propagation

SLIDE 129

A Semantic Bug on a Failure Path

ext4/resize.c, 2.6.25
ext4_group_extend(...) {
    ...
    if (count != ext4_blocks_count(es)) {
        ext4_warning("multiple resizers run on filesystem!");
        err = -EBUSY;
+       ext4_journal_stop(handle);
        goto exit_put;
    }
}

SLIDE 132

A Memory Bug on a Failure Path

ext4/inode.c, 2.6.22
ext4_read_inode(struct inode *inode) {
    ...
    if (inode_is_bad) {
+       brelse(bh);
        goto bad_inode;
    }
}

SLIDE 133

Bugs on Failure Paths

[Chart: percentage of total bugs occurring on failure paths, per file system; failure-path bug counts: XFS 200, Ext4 149, Btrfs 144, Ext3 88, Reiser 63, JFS 28, All 672]

SLIDE 134

38% of bugs are on failure paths

SLIDE 135

Q7: What performance techniques are used by file systems?

SLIDE 137

Performance

Type                 Description
Synchronization      Improve synchronization efficiency
Access Optimization  Apply smarter access strategies
Scheduling           Improve I/O operation scheduling
Scalability          Scale on-disk and in-memory structures
Locality             Overcome sub-optimal block allocation
Other                Other performance improvements (e.g., reducing function stack usage)

SLIDE 140

Synchronization Example

ext4/extents.c, 2.6.31
ext4_fiemap(...) {
-   down_write(&EXT4_I(inode)->sem);
+   down_read(&EXT4_I(inode)->sem);
    error = ext4_ext_walk_space(...);
-   up_write(&EXT4_I(inode)->sem);
+   up_read(&EXT4_I(inode)->sem);
}
```


SLIDE 143

Access Optimization Example

btrfs/free-space-cache.c, 2.6.39
btrfs_find_space_cluster(...) {
+   if (free_space < min_bytes) {
+       spin_unlock(&tree_lock);
+       return -ENOSPC;
+   }
    /* start to search for blocks */
}

SLIDE 148

Performance

[Chart: breakdown of performance-patch types (Sync, Access Opt, Scheduling, Scalability, Locality, Other) per file system; performance patch counts: XFS 135, Ext4 80, Btrfs 126, Ext3 41, Reiser 24, JFS 9, All 415]

SLIDE 149

A wide variety of performance techniques is used, and the same techniques recur across all file systems

SLIDE 151

Results Summary

Bugs are prevalent
Semantic bugs dominate
Bugs are constant
Corruption and crash are common
Metadata management is complex
Failure paths are error-prone
Diverse performance techniques

SLIDE 155

Lessons Learned

A large-scale study is feasible and valuable
➡ time-consuming, but still manageable
➡ similar studies for other OS components
➡ new research opportunities

Research should match reality
➡ new tools for semantic bugs
➡ more attention to failure paths

History repeats itself
➡ similar mistakes
➡ similar performance and reliability techniques
➡ learn from history for a better future

SLIDE 159

Resources

More information in the paper
➡ detailed bug patterns
➡ reliability patches
➡ common patches across file systems

Even more information in the dataset
➡ fix-on-fix
➡ detailed bug consequences
➡ tools used

Our dataset is released
➡ http://research.cs.wisc.edu/wind/Traces/fs-patch/