Cross-checking Semantic Correctness: The Case of Finding File System Bugs
Changwoo Min, Sanidhya Kashyap, Byoungyoung Lee, Chengyu Song, Taesoo Kim
Georgia Institute of Technology School of Computer Science
Two promising approaches to make ...
– Guarantee high-level invariants (e.g., functional correctness)
– Check if code fits the domain model (e.g., locking rules)
Timeline (figure): 2013-01-07, 2014-10-17, 2015-03-19
Memory bugs (e.g., NULL dereference, use-after-free, ...): 12.6%
Signed-off-by: Sanidhya Kashyap <sanidhya@gatech.edu> ...
@@ static size_t ocfs2_xattr_trusted_list
+	if (!capable(CAP_SYS_ADMIN))
+		return 0;
static size_t ext2_xattr_trusted_list() if (!capable(CAP_SYS_ADMIN)) return 0;
static size_t xfs_xattr_put_listent() if ((flags & XFS_ATTR_ROOT) && !capable(CAP_SYS_ADMIN)) return 0;
static size_t ext4_xattr_trusted_list() if (!capable(CAP_SYS_ADMIN)) return 0;
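The cross-checking idea behind these snippets can be sketched as a majority vote: if most file systems guard trusted-xattr listing with `capable(CAP_SYS_ADMIN)`, the one that does not is a likely bug. This is a minimal sketch, not the tool's implementation; `struct impl_fact` and `find_deviants()` are hypothetical names, the per-file-system facts are assumed to be already extracted, and XFS's conditional check (only when `XFS_ATTR_ROOT` is set) is simplified to "has a check".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fact extracted per file system: does the listing path for
 * "trusted." extended attributes check capable(CAP_SYS_ADMIN)? */
struct impl_fact {
    const char *fs;
    bool checks_cap_sys_admin;
};

/* Print and count implementations that disagree with the majority.
 * A tie is treated as "no check" (majority == false) for simplicity. */
size_t find_deviants(const struct impl_fact *facts, size_t n)
{
    size_t with_check = 0;
    for (size_t i = 0; i < n; i++)
        if (facts[i].checks_cap_sys_admin)
            with_check++;

    bool majority = with_check * 2 > n;
    size_t deviants = 0;
    for (size_t i = 0; i < n; i++) {
        if (facts[i].checks_cap_sys_admin != majority) {
            printf("deviant: %s\n", facts[i].fs);
            deviants++;
        }
    }
    return deviants;
}
```

With the facts {ext2, ext4, xfs: check; ocfs2: no check}, the function prints `deviant: ocfs2` and returns 1, which corresponds to the check that the ocfs2 patch adds.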
– Success
– Failure
– Hard to detect without domain knowledge
index fd5599d..e723482 100644
@@ static int ceph_write_begin
+	if (r < 0)
+		page_cache_release(page);
+	else
+		*pagep = page;
– 118 new bugs in 54 file systems
– System crash, data corruption, deadlock, etc.
– Bugs had been hidden for 6.2 years on average
– Condition check, argument use, return value, locking, etc.
– e.g., disk layout in each file system
– e.g., vfs_rename() in each file system
– Q1: Where to start?
– Q2: What to compare?
– Q3: How to compare?
Overview (figure): File System Source Code → Symbolic Execution → Per-Filesystem Path Database → Statistical Path Comparison → Path Condition Checker, Argument Checker, ...
– Identifying semantically similar entry points
– Building per-path symbolic environment
– Statistically comparing each path
– Use common data structures and behavior
– Define filesystem-specific interfaces
struct inode_operations {
	int (*rename) (struct inode *, ...);
	int (*create) (struct inode *, ...);
	int (*unlink) (struct inode *, ...);
	int (*mkdir) (struct inode *, ...);
};
btrfs_rename(...); ext4_rename(...); xfs_vn_rename(…); ...
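Because every file system fills in the same operations structure, semantically similar entry points can be found by reading the same slot across file systems. The sketch below assumes a heavily simplified, hypothetical `struct toy_inode_ops` with a single `rename` slot and a toy signature; the real kernel structure, signatures, and registration differ.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct inode_operations:
 * only a rename slot, with a toy signature. */
struct toy_inode_ops {
    int (*rename)(int flag);
};

/* Stub bodies; real implementations are file-system specific. */
static int toy_ext4_rename(int flag)  { (void)flag; return 0; }
static int toy_btrfs_rename(int flag) { (void)flag; return 0; }

static const struct toy_inode_ops toy_ext4_ops  = { .rename = toy_ext4_rename };
static const struct toy_inode_ops toy_btrfs_ops = { .rename = toy_btrfs_rename };
static const struct toy_inode_ops toy_norename_ops = { .rename = NULL };

struct fs_entry {
    const char *name;
    const struct toy_inode_ops *ops;
};

static const struct fs_entry registry[] = {
    { "ext4",     &toy_ext4_ops },
    { "btrfs",    &toy_btrfs_ops },
    { "norename", &toy_norename_ops },  /* hypothetical FS without rename */
};

/* Collect file systems that fill in the same VFS slot: their rename
 * implementations are the semantically similar entry points to compare. */
size_t collect_rename_entries(const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++)
        if (registry[i].ops->rename != NULL && n < max)
            out[n++] = registry[i].name;
    return n;
}
```

Here the collected entry points are ext4 and btrfs; the file system that leaves the slot empty is simply not part of the comparison.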
– C language level
– Build symbolic environment per path
– Error codes represent per-path semantics
Example path (figure):
Condition     flag: !RO
Side-effect   inode→flag = flag
Call          kmalloc(…, GFP_NOFS)
Return        SUCCESS
Per-filesystem path database (figure): ext4 ..., btrfs ..., xfs ..., ...
Entry points compared (figure): ext4_rename, btrfs_rename, xfs_rename, ...
– What is the range of an argument for this execution path?
e.g., path condition, return value, etc.
Example: flag: !RO, flag: !RO, flag: WO
– How many times is a particular API flag used?
e.g., API argument usage, error handling, etc.
Example: f(NOFS), f(NOFS), f(KERNEL)
– Compare range data and find deviant sub-ranges
– Find deviations in event occurrences
– High rank: frequently used common patterns (e.g., VFS)
– Low rank: specific implementations of file systems
– Sum up the sizes of non-overlapping areas
int foo_rename(int flag) { if (flag == RO) return -EACCES; return 0; }
int bar_rename(int flag) { if (flag == RO) return -EACCES; return 0; }
int cad_rename(int flag) { if (flag == WO) return -EACCES; return 0; }
Histograms (figure): the per-file-system RO/WO histograms are summed and divided by 3 (∑/3) into an average histogram, which increases commonality and decreases uncommonality.
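The averaging and distance steps for the foo_rename/bar_rename/cad_rename example can be sketched as follows. Each path is encoded as a two-bin histogram over the flag value that leads to -EACCES (bin 0 = RO, bin 1 = WO); this encoding, `NBINS`, `average_histogram()`, and `histogram_distance()` are assumptions for illustration. The distance is the total non-overlapping area, computed as the sum of per-bin absolute differences from the average.

```c
#include <stddef.h>

#define NBINS 2   /* bin 0: condition flag == RO, bin 1: flag == WO */

/* Average n per-file-system histograms bin by bin (sum / n). */
void average_histogram(const double h[][NBINS], size_t n, double avg[NBINS])
{
    for (size_t b = 0; b < NBINS; b++) {
        avg[b] = 0.0;
        for (size_t i = 0; i < n; i++)
            avg[b] += h[i][b];
        avg[b] /= (double)n;
    }
}

/* Distance of one histogram from the average: the total size of the
 * non-overlapping area, i.e., the sum of per-bin absolute differences. */
double histogram_distance(const double h[NBINS], const double avg[NBINS])
{
    double d = 0.0;
    for (size_t b = 0; b < NBINS; b++) {
        double diff = h[b] - avg[b];
        d += diff < 0 ? -diff : diff;
    }
    return d;
}
```

With foo = (1, 0), bar = (1, 0), and cad = (0, 1), the average is (2/3, 1/3); foo_rename and bar_rename are each 2/3 away, while cad_rename is 4/3 away, so it ranks as the most deviant path, and its WO condition is the reason.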
Report (figure): Distance and Reason for foo_rename, bar_rename, and cad_rename, shown against the RO/WO histograms.
– Function argument, return value handling, etc.
– e.g., kmalloc() in file system
Checkers (figure): Path Condition Checker, Return Code Checker, Function Call Checker, Side-effect Checker, Argument Checker, Error Handling Checker, Lock Checker, Spec. Generator
Built on: Per-Filesystem Path Database, Statistical Path Comparison (entropy-based, histogram-based)
– Symbolic path explorer
– Tools and library
– Checkers
Checker          # reports   # manually verified reports   New bugs
Return code            573                           150          2
Side-effect            389                           150          6
Function call          521                           100          5
Path condition         470                           150         46
Argument                56                            10          4
Error handling         242                           100         47
Lock                   131                            50          8
Total                2,382                           710        118
– 21 known file system semantic bugs from PatchDB
– Synthesize them into Linux kernel 4.0-rc2
– State explosion
– Limited inter-procedural analysis
Figure: cumulative # of bugs vs. entropy-based ranking (highest-ranked to lowest-ranked reports)
– e.g., 24 patches were rejected after developers' review
– e.g., requirement: multiple existing implementations
– e.g., state explosion, limited inter-procedural analysis
– e.g., comparing subsequent versions
– Promoting common code to VFS in Linux file systems
– e.g., if all file systems need the same capability check, should we move that check into the VFS?
– e.g., C libs, SCSI device drivers, JavaScript engines, etc.
– Seven specialized checkers were developed
– 118 new semantic bugs found (e.g., ext4, XFS, Ceph, etc.)
Changwoo Min (changwoo@gatech.edu), Sanidhya Kashyap, Byoungyoung Lee, Chengyu Song, Taesoo Kim
– e.g., rename(old_dir/a, new_dir/b) requires 3x3x3x3
– e.g., ctime and mtime of old_dir and new_dir
– Found 6 incorrect implementations (e.g., HPFS)
– Found 6 potential bugs (e.g., HFS)
Attribute          # Updated FS   # Not updated FS
old_dir  ctime               53                  1
old_dir  mtime               53                  1
new_dir  ctime               52                  2
new_dir  mtime               52                  2
file     ctime               48                  6