Cross-checking Semantic Correctness: The Case of Finding File System Bugs

SLIDE 1

Cross-checking Semantic Correctness: The Case of Finding File System Bugs

Changwoo Min, Sanidhya Kashyap, Byoungyoung Lee, Chengyu Song, Taesoo Kim

Georgia Institute of Technology School of Computer Science

SLIDE 3

Two promising approaches to make bug-free software

  • Formal proof → requires a “proof”
    – Guarantees high-level invariants (e.g., functional correctness)
  • Model checking → requires a “model”
    – Checks whether code fits a domain model (e.g., locking rules)

In practice, much software is (already) built without such theories

SLIDE 5

There exist many similar implementations of a program

  • File systems: >50 implementations in Linux
  • JavaScript: ECMAScript, V8, SpiderMonkey, etc.
  • POSIX C Library: GNU Libc, FreeBSD, eLibc, etc.

Without proof or model, can we leverage these existing implementations?

SLIDE 8

File system bugs are critical

[Figure: three examples dated 2013-01-07, 2014-10-17, and 2015-03-19]

SLIDE 10

A majority of bugs in file systems are hard to detect

Semantic bugs:
  • Incorrect condition check
  • Incorrect state update
  • Incorrect argument
  • Incorrect error code
  • ...

Memory bugs:
  • NULL dereference
  • Use-after-free
  • ...

[Chart values, in the order shown: 87.4%, 12.6%]

SLIDE 12

Example of semantic bug: Missing capability check in OCFS2

    ocfs2: trusted xattr missing CAP_SYS_ADMIN check

    Signed-off-by: Sanidhya Kashyap <sanidhya@gatech.edu>
    ...
    @@ static size_t ocfs2_xattr_trusted_list
    +    if (!capable(CAP_SYS_ADMIN))
    +        return 0;

Can we find this bug by leveraging other implementations?

SLIDE 15

A majority of file systems already implement the capability check

  • ocfs2: trusted xattr missing CAP_SYS_ADMIN check

    Signed-off-by: Sanidhya Kashyap <sanidhya@gatech.edu>
    ...
    @@ static size_t ocfs2_xattr_trusted_list
    +    if (!capable(CAP_SYS_ADMIN))
    +        return 0;

  • ext2

    static size_t ext2_xattr_trusted_list()
        if (!capable(CAP_SYS_ADMIN))
            return 0;

  • XFS

    static size_t xfs_xattr_put_listent()
        if ((flags & XFS_ATTR_ROOT) && !capable(CAP_SYS_ADMIN))
            return 0;

  • ext4

    static size_t ext4_xattr_trusted_list()
        if (!capable(CAP_SYS_ADMIN))
            return 0;

...

Deviant implementation → potential bugs?

A new bug we found: it had been hidden for 6 years

SLIDE 16

Case study: Write a page

  • Each file system defines how to write a page
  • Semantics of writepage()
    – Success → return locked page
    – Failure → return unlocked page
  • Documentation/filesystems/vfs.txt specifies this rule
    – Hard to detect without domain knowledge

What if 99% of file systems follow the above pattern, but one does not? Is it a bug?

SLIDE 18

Our approach can reveal such bugs without domain-specific knowledge

  • 52 file systems follow the locking rules
  • But 2 file systems don't (Ceph and AFFS)

    ------------------------------- fs/ceph/addr.c --------------------------------
    index fd5599d..e723482 100644
    @@ static int ceph_write_begin
    +    if (r < 0)
    +        page_cache_release(page);
    +    else
    +        *pagep = page;

We found 3 bugs in 2 file systems, hidden for over 5 years

SLIDE 19

Our approach to finding bugs

Idea:
  Find deviant implementations as potential bugs

Intuition:
  Bugs are rare
  The majority of implementations are correct
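
The idea and intuition above can be sketched as a majority vote. This is a minimal illustration and not Juxta's actual code: the function name `find_deviants` and the boolean summary per file system are assumptions made for the example, and the `ocfs2` entry reflects the pre-patch state shown on the earlier slides.

```python
from collections import Counter


def find_deviants(behaviors):
    """Return (majority_behavior, sorted names that deviate from it)."""
    majority, _ = Counter(behaviors.values()).most_common(1)[0]
    deviants = sorted(name for name, b in behaviors.items() if b != majority)
    return majority, deviants


# Hypothetical summary: does the trusted-xattr list handler check CAP_SYS_ADMIN?
checks_cap = {"ext2": True, "ext4": True, "xfs": True, "ocfs2": False}

majority, deviants = find_deviants(checks_cap)
print(majority, deviants)  # True ['ocfs2']
```

A deviant is only a candidate: it still needs manual review, since a file system may differ from the majority on purpose.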

SLIDE 20

Our approach is promising in finding semantic bugs (Example: file systems)

  • New semantic bugs
    – 118 new bugs in 54 file systems
  • Critical bugs
    – System crash, data corruption, deadlock, etc.
  • Bugs difficult to find
    – Bugs were hidden for 6.2 years on average
  • Various kinds of bugs
    – Condition check, argument use, return value, locking, etc.

SLIDE 21

Technical challenges

  • All software differs in one way or another
    – e.g., disk layout in file systems
  • How to compare different implementations?
    – Q1: Where to start?
    – Q2: What to compare?
    – Q3: How to compare?

SLIDE 22

Juxta: the case of file systems

  • All file systems should follow the VFS API in Linux
    – e.g., vfs_rename() in each file system
  • How to compare different file systems?
    – Q1: Where to start? → VFS entries in file systems
    – Q2: What to compare? → symbolic environment
    – Q3: How to compare? → statistical comparison

SLIDE 24

Juxta overview

[Figure: Juxta components: file system source code, symbolic execution, per-filesystem path database, statistical path comparison, and 7 checkers (path condition checker, argument checker, ...)]

SLIDE 25

Comparing multiple file systems

  • Q1: Where to start?
    – Identifying semantically similar entry points
  • Q2: What to compare?
    – Building per-path symbolic environment
  • Q3: How to compare?
    – Statistically comparing each path

SLIDE 27

Identifying semantically similar entry points

  • Linux Virtual File System (VFS)
    – Uses common data structures and behavior (e.g., inode and page cache)
    – Defines filesystem-specific interfaces (e.g., open, rename)

SLIDE 29

Example: inode_operations → rename()

Compare *_rename() to find deviant rename() implementations.

    struct inode_operations {
        int (*rename) (struct inode *, ...);
        int (*create) (struct inode *, ...);
        int (*unlink) (struct inode *, ...);
        int (*mkdir) (struct inode *, ...);
    };

    btrfs_rename(...);
    ext4_rename(...);
    xfs_vn_rename(...);
    ...
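
One way to picture the entry-point grouping is to collect, for every VFS operation slot, the functions that different file systems register for it. The slides do not say how Juxta extracts this mapping, so the regex approach below is only a stand-in, and the `SOURCE` text is a hypothetical, heavily simplified set of initializers rather than real kernel source.

```python
import re
from collections import defaultdict

# Hypothetical, simplified initializers; real kernel tables have many more fields.
SOURCE = """
const struct inode_operations ext4_dir_inode_operations = {
    .create = ext4_create,
    .rename = ext4_rename,
};
const struct inode_operations btrfs_dir_inode_operations = {
    .create = btrfs_create,
    .rename = btrfs_rename,
};
const struct inode_operations xfs_dir_inode_operations = {
    .create = xfs_vn_create,
    .rename = xfs_vn_rename,
};
"""

STRUCT_RE = re.compile(r"struct\s+(\w+)\s+\w+\s*=\s*\{(.*?)\};", re.S)
FIELD_RE = re.compile(r"\.(\w+)\s*=\s*(\w+)")


def group_entry_points(source):
    """Map (struct type, slot name) -> functions registered for that slot."""
    groups = defaultdict(list)
    for struct_type, body in STRUCT_RE.findall(source):
        for slot, func in FIELD_RE.findall(body):
            groups[(struct_type, slot)].append(func)
    return dict(groups)


groups = group_entry_points(SOURCE)
print(groups[("inode_operations", "rename")])
```

Every list in `groups` is then a set of semantically similar entry points that can be compared against each other.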

SLIDE 31

Building per-path symbolic environment

  • Context/flow-sensitive symbolic execution
    – C language level
    – Build symbolic environment per path (e.g., path condition, return values, side-effects, function calls)
  • Key idea: return-oriented comparison
    – Error codes represent per-path semantics (e.g., comparing all paths returning EACCES in rename() implementations)

SLIDE 37

Example: Per-path symbolic environment

    int foo_rename(int flag) {
        if (flag == RO)
            return -EACCES;
        inode->flag = flag;
        kmalloc(…, GFP_NOFS);
        return SUCCESS;
    }

Execution path information:

    Condition      flag: !RO
    Side-effect    inode->flag = flag
    Call           kmalloc(…, GFP_NOFS)
    Return         SUCCESS
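
The per-path information above maps naturally onto a small record type, and return-oriented comparison then amounts to bucketing paths by their return value. The `PathInfo` schema and `group_by_return` helper below are hypothetical names chosen for illustration; the actual path database format is not shown in the slides.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PathInfo:
    """One explored execution path of a VFS entry function (hypothetical schema)."""
    function: str
    conditions: list = field(default_factory=list)
    side_effects: list = field(default_factory=list)
    calls: list = field(default_factory=list)
    returns: str = "SUCCESS"


# The two paths through foo_rename() from the example.
paths = [
    PathInfo("foo_rename", conditions=["flag == RO"], returns="-EACCES"),
    PathInfo(
        "foo_rename",
        conditions=["flag != RO"],
        side_effects=["inode->flag = flag"],
        calls=["kmalloc(..., GFP_NOFS)"],
        returns="SUCCESS",
    ),
]


def group_by_return(paths):
    """Return-oriented grouping: bucket paths by the value they return."""
    buckets = defaultdict(list)
    for p in paths:
        buckets[p.returns].append(p)
    return dict(buckets)


by_ret = group_by_return(paths)
print(sorted(by_ret))
print(by_ret["-EACCES"][0].conditions)
```

Comparing only the paths in the same bucket (for example, all `-EACCES` paths of every `*_rename()`) keeps the comparison between paths that are meant to express the same outcome.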

SLIDE 38

Constructing path database

Per-Filesystem Path Database (ext4, btrfs, xfs, ...)

  • 54 file systems (680K LoC)
  • 8 million paths (300 GB)
  • Took 3 hours to generate on our 80-core machine

SLIDE 42

Two types of per-path symbolic data

                   ext4_rename   btrfs_rename   xfs_rename   ...
    Range data     flag: !RO     flag: !RO      flag: WO
    Occurrences    f(NOFS)       f(NOFS)        f(KERNEL)

  • Range data (or symbolic constraint)
    – What is the range of an argument for this execution path?
      e.g., path condition, return value, etc.
  • Occurrences
    – How many times is a particular API flag used?
      e.g., API argument usage, error handling, etc.

SLIDE 43

Two statistical comparison methods

  • For range data → histogram-based comparison
    – Compare range data and find deviant sub-ranges
  • For occurrences → entropy-based comparison
    – Find deviation in event occurrences

SLIDE 44

Histogram-based comparison

  • 1. Represent range data → histogram (see our paper)
  • 2. Build a representative histogram → average histograms
    – High rank: frequently used common patterns (e.g., VFS)
    – Low rank: specific implementations of file systems
  • 3. Measure distance between histograms
    – Sum up the sizes of non-overlapping areas

SLIDE 46

Example: Path condition checker

Let's compare *_rename() on the -EACCES path

    int foo_rename(flag) {
        if (flag == RO)
            return -EACCES;
    }

    int bar_rename(flag) {
        if (flag == RO)
            return -EACCES;
    }

    int cad_rename(flag) {
        if (flag == WO)
            return -EACCES;
    }

SLIDE 47

Represent range data → histogram

[Figure: foo_rename(), bar_rename(), and cad_rename(), each shown next to its histogram over the flag values RO and WO. foo: 1.0 at RO; bar: 1.0 at RO; cad: 1.0 at WO]

SLIDE 50

Build a representative histogram

[Figure: the histograms of foo_rename, bar_rename, and cad_rename are averaged (∑/3) into the VFS histogram for vfs_rename: 0.7 at RO, 0.3 at WO]

  • Increase commonality
  • Decrease uncommonality
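
The averaging step can be sketched in a few lines. The slides defer the exact histogram representation to the paper, so this sketch assumes the simplest possible form: a dictionary mapping each discrete flag value to a height. The name `average_histograms` is made up for the example.

```python
def average_histograms(histograms):
    """Average per-bucket heights across histograms; a missing bucket counts as 0."""
    buckets = sorted({b for h in histograms for b in h})
    n = len(histograms)
    return {b: sum(h.get(b, 0.0) for h in histograms) / n for b in buckets}


# Histograms of the -EACCES path condition for the three example functions.
foo = {"RO": 1.0}
bar = {"RO": 1.0}
cad = {"WO": 1.0}

vfs = average_histograms([foo, bar, cad])
print({k: round(v, 1) for k, v in vfs.items()})  # {'RO': 0.7, 'WO': 0.3}
```

Averaging is what makes the common pattern dominate: the RO check shared by two of three implementations ends up at about 0.7, while the lone WO check shrinks to about 0.3.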

SLIDE 53

Measure distance between histograms

[Figure: foo_rename (1.0 at RO) overlaid on vfs_rename (0.7 at RO, 0.3 at WO)]

Non-overlapping regions = 0.3 + 0.3 = 0.6

SLIDE 54

Histogram distance

[Figure: histograms of foo_rename, bar_rename, and cad_rename over RO and WO]

    distance(foo, VFS) = 0.6
    distance(bar, VFS) = 0.6
    distance(cad, VFS) = 1.2
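
A minimal sketch of the distance step, assuming histograms are dictionaries of bucket heights and that "non-overlapping area" is the sum of absolute per-bucket differences. Both are assumptions; the slides do not show the exact histogram shapes. With these assumptions the sketch reproduces 0.6 for foo and bar but gives 1.4 rather than the slide's 1.2 for cad, so the absolute numbers should not be read as reproducing the authors' figures. The ranking is the same.

```python
def histogram_distance(a, b):
    """Sum of non-overlapping area: |a - b| over every bucket in either histogram."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


# Rounded representative histogram taken from the slides.
vfs = {"RO": 0.7, "WO": 0.3}
impls = {"foo": {"RO": 1.0}, "bar": {"RO": 1.0}, "cad": {"WO": 1.0}}

distances = {name: histogram_distance(h, vfs) for name, h in impls.items()}
ranking = sorted(distances, key=distances.get, reverse=True)

for name in ranking:
    print(f"{name}: {distances[name]:.1f}")
```

The most deviant implementation, `cad`, comes out on top, which is all the ranking step needs.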

SLIDE 57

Ranking based on distance

    Function   Distance   Reason
    cad        1.2        Missing check: flag == RO
    foo        0.6
    bar        0.6

Larger distance → more deviant

We found 59 new semantic bugs using histogram-based comparison

SLIDE 62

Entropy-based comparison

  • Find deviation in event occurrence
    – Function argument, return value handling, etc.
  • Shannon entropy

[Figure: entropy plotted against the ratio Event A : Event B, with marked points at 50 : 50, at 100 : 0 or 0 : 100, and at 98 : 2 or 2 : 98]
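
The marked ratios can be checked with a direct implementation of Shannon entropy, $H = -\sum_i p_i \log_2 p_i$. The function name `shannon_entropy` is chosen for this sketch only.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a sequence of event counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)


for ratio in [(50, 50), (100, 0), (98, 2)]:
    print(ratio, round(shannon_entropy(ratio), 2))
```

An even split gives the maximum of 1 bit, a unanimous split gives 0, and 98 : 2 gives about 0.14: a small but non-zero entropy, which is the signature of a near-consensus with a few deviants.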

SLIDE 63

Example: Argument checker

  • Inferring API usage patterns
    – e.g., kmalloc() in file systems → GFP_NOFS to avoid deadlock
  • Without any special knowledge, the argument checker can statically identify incorrect uses of API flags in file systems

SLIDE 65

Calculating entropy of GFP flag usages in file systems

    VFS entry            GFP_KERNEL   GFP_NOFS   Entropy
    inode → set_acl()    60           40         0.97
    file → read()        40           60         0.97
    file → write()       2            98         0.14

SLIDE 68

Ranking based on entropy

    VFS entry            GFP_KERNEL   GFP_NOFS   Entropy
    inode → set_acl()    60           40         0.97
    file → read()        40           60         0.97
    file → write()       2            98         0.14

Smaller entropy → more deviant

We found 59 new semantic bugs using entropy-based comparison
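
The entropy ranking can be reproduced from the three rows shown. This is a sketch under assumptions: the dictionary layout and the `minority_flag` helper are invented for the example, and the numbers are taken from the slide as given, without knowing whether they are counts or percentages.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a sequence of event counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c)


FLAGS = ("GFP_KERNEL", "GFP_NOFS")

# Usage numbers per VFS entry, copied from the slide.
usage = {
    "inode->set_acl()": (60, 40),
    "file->read()": (40, 60),
    "file->write()": (2, 98),
}

scores = {entry: entropy(counts) for entry, counts in usage.items()}
ranked = sorted(scores, key=scores.get)  # smallest entropy first


def minority_flag(counts):
    """Name of the less-used flag, i.e. the suspicious usage."""
    return FLAGS[counts.index(min(counts))]


top = ranked[0]
print(top, round(scores[top], 2), "suspicious flag:", minority_flag(usage[top]))
```

For `file->write()`, the few `GFP_KERNEL` call sites are the ones worth inspecting. An entry with entropy exactly 0 has no deviants at all, so a real checker would presumably skip it.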

SLIDE 69

Specialized Checkers for Specific Types of Semantic Bugs

[Figure: Juxta components: per-filesystem path database, statistical path comparison (entropy-based and histogram-based), 7 checkers, and a spec. generator]

  • Path Condition Checker
  • Return Code Checker
  • Function Call Checker
  • Side-effect Checker
  • Argument Checker
  • Error Handling Checker
  • Lock Checker

SLIDE 70

Implementation of Juxta

  • 12K LoC in total
    – Symbolic path explorer → 6K lines of C/C++ (Clang 3.6)
    – Tools and library → 3K lines of Python
    – Checkers → 3K lines of Python
  • VFS entry database → Linux kernel 4.0-rc2

SLIDE 71

Evaluation questions

  • How effective is Juxta in finding new bugs?
  • What types of semantic bugs can Juxta find?
  • How complete is Juxta's approach?
  • How effective is Juxta's ranking scheme?

SLIDE 72

Juxta found 118 bugs in 54 file systems

    Checker          # reports   # manually verified reports   New bugs
    Return code      573         150                           2
    Side-effect      389         150                           6
    Function call    521         100                           5
    Path condition   470         150                           46
    Argument         56          10                            4
    Error handling   242         100                           47
    Lock             131         50                            8
    Total            2,382       710                           118
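
As a quick arithmetic check on the table, the per-checker numbers can be summed and the share of manually verified reports that turned out to be new bugs computed. The ratio is derived here from the table's own columns and is not a figure reported in the slides.

```python
# (checker, reports, manually verified reports, new bugs), copied from the table.
rows = [
    ("Return code", 573, 150, 2),
    ("Side-effect", 389, 150, 6),
    ("Function call", 521, 100, 5),
    ("Path condition", 470, 150, 46),
    ("Argument", 56, 10, 4),
    ("Error handling", 242, 100, 47),
    ("Lock", 131, 50, 8),
]

total_reports = sum(r[1] for r in rows)
total_verified = sum(r[2] for r in rows)
total_bugs = sum(r[3] for r in rows)

for name, _, verified, bugs in rows:
    print(f"{name:15s} {bugs:3d} / {verified:3d} = {bugs / verified:.1%}")
print(f"{'Total':15s} {total_bugs:3d} / {total_verified:3d} = {total_bugs / total_verified:.1%}")
```

The column sums match the Total row of the table (2,382 reports, 710 verified, 118 new bugs).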

SLIDE 73

Juxta found 7 types of new semantic bugs

[Table repeated from the previous slide: one row of new bugs per checker, 118 in total]

SLIDE 74

Juxta found most known bugs

  • Test case
    – 21 known file system semantic bugs from PatchDB [Lu:FAST12]
    – Synthesize them into Linux kernel 4.0-rc2
  • Juxta found 19 out of 21 bugs
  • 2 missed bugs ← incomplete symbolic execution
    – state explosion
    – limited inter-procedural analysis

SLIDE 79

Juxta's ranking scheme is effective

[Figure: cumulative # of bugs plotted against entropy-based ranking, from highest-ranked to lowest-ranked]

More than 50% of real bugs were found in the top 100

SLIDE 80

Limitations

  • Deviations do not always mean bugs
    – e.g., 24 patches were rejected after developers' review
  • Not universally applicable
    – e.g., requirement: multiple existing implementations
  • Symbolic execution is not complete
    – e.g., state explosion, limited inter-procedural analysis

SLIDE 81

Discussion

  • Self-regression
    – e.g., comparing subsequent versions
  • Cross-layer refactoring
    – promoting common code to VFS in Linux file systems
    – e.g., if all file systems need the same capability check, shall we move such a check to the VFS?
  • Potential programs to be checked
    – e.g., C libs, SCSI device drivers, JavaScript engines, etc.

SLIDE 82

Conclusion

  • Cross-checking semantic correctness by comparing and contrasting multiple implementations
  • Juxta: a static tool to find bugs in file systems
    – Seven specialized checkers were developed
    – 118 new semantic bugs found (e.g., in ext4, XFS, Ceph, etc.)
  • Our code and database will be released soon

SLIDE 83

Thank you! Questions?

Changwoo Min (changwoo@gatech.edu), Sanidhya Kashyap, Byoungyoung Lee, Chengyu Song, Taesoo Kim

Georgia Institute of Technology School of Computer Science

SLIDE 84

Case study: Rename a file

  • rename() has complex semantics
    – e.g., rename(old_dir/a, new_dir/b) requires 3x3x3x3 combinations for update (e.g., mtime of dir and file)
  • POSIX specification defines a subset of such combinations
    – e.g., ctime and mtime of old_dir and new_dir

SLIDE 85

Compare rename() of existing file systems in Linux

  • Majority follows the POSIX spec
    – Found 6 incorrect implementations (e.g., HPFS)
  • Found inconsistency in undocumented combinations
    – Found 6 potential bugs (e.g., HFS)

    Attribute        # Updated FS   # Not updated FS
    old_dir ctime    53             1
    old_dir mtime    53             1
    new_dir ctime    52             2
    new_dir mtime    52             2
    file ctime       48             6

[Slide annotations on the table: "Bugs", "Hidden Spec."]