DScope: Detecting Real-World Data Corruption Hang Bugs in Cloud Server Systems
Ting Dai1, Jingzhu He1, Xiaohui (Helen) Gu1, Shan Lu2, Peipei Wang1
1NC State University 2University of Chicago
DScope, SoCC’18
Figure: corrupted data causes a software hang instead of recovering from backup.
public static void skipFully(InputStream in, long len) … {
  …
}
Corrupted InputStream: the loop stride (ret) is always 0 when in is corrupted, so the loop never makes progress toward its exit condition and hangs.
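To make the zero-stride hang concrete, here is a minimal Java sketch. It assumes the corruption shows up as a stream holding fewer bytes than len, which is only one possible case. The loop is written in the spirit of skipFully rather than copied from it (the slide elides the body), skipLoop and maxIterations are hypothetical names, and the iteration cap exists only so the demo returns instead of hanging.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class SkipLoopDemo {
    /**
     * Hypothetical skip loop: the stride is whatever in.skip() returns.
     * Returns the number of iterations, or -1 once maxIterations is exceeded
     * (a real loop has no such cap and would keep spinning).
     */
    static int skipLoop(InputStream in, long len, int maxIterations) {
        int iterations = 0;
        try {
            while (len > 0) {
                if (++iterations > maxIterations) {
                    return -1;                 // demo-only guard standing in for a hang
                }
                long ret = in.skip(len);       // loop stride: stays 0 once nothing is left to skip
                len -= ret;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return iterations;
    }
}
```

With new ByteArrayInputStream(new byte[8]) and len = 8 the loop finishes after one iteration. With a 4-byte stream and len = 10, the first skip returns 4 and every later call returns 0, so skipLoop hits the cap and returns -1.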
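Figure: control-flow graph of a loop whose header is at line 549; the Yes branch enters the loop body (lines 550 to 559) and the No branch exits to line 560.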
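Figure: control-flow graph of nested loops. The outer loop header at line 544 exits to line 572 on No; its body (up to line 571) contains the inner loop, whose header at line 549 enters lines 550 to 559 on Yes and exits to line 560 on No.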
For both the outer and the inner loop, DScope then extracts the exit conditions of each loop path.
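The path and exit-condition extraction can be sketched on a toy control-flow graph. This is a minimal illustration under our own assumptions, not DScope's implementation (which works on Soot IR): nodes are line numbers, the loop body is given as a set, a loop path is an acyclic walk from the header back to the header, and a node counts as an exit point when one of its successors lies outside the body. LoopPathExtractor and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LoopPathExtractor {
    /** Enumerates acyclic paths that start at the header, stay in the body, and return to the header. */
    static List<List<Integer>> loopPaths(Map<Integer, List<Integer>> succ, int header, Set<Integer> body) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> path = new ArrayList<>();
        path.add(header);
        walk(succ, header, body, path, out);
        return out;
    }

    private static void walk(Map<Integer, List<Integer>> succ, int header, Set<Integer> body,
                             List<Integer> path, List<List<Integer>> out) {
        int node = path.get(path.size() - 1);
        for (int next : succ.getOrDefault(node, List.of())) {
            if (next == header) {
                out.add(new ArrayList<>(path));      // back edge closes one loop path
            } else if (body.contains(next) && !path.contains(next)) {
                path.add(next);
                walk(succ, header, body, path, out);
                path.remove(path.size() - 1);        // backtrack
            }
            // successors outside the body are exit edges: the walk stops there
        }
    }

    /** Exit conditions of one path: conditions of nodes that have a successor outside the body. */
    static List<String> exitConditions(List<Integer> path, Map<Integer, List<Integer>> succ,
                                       Set<Integer> body, Map<Integer, String> cond) {
        List<String> exits = new ArrayList<>();
        for (int node : path) {
            for (int next : succ.getOrDefault(node, List.of())) {
                if (!body.contains(next)) {
                    exits.add(cond.get(node));
                    break;
                }
            }
        }
        return exits;
    }
}
```

For a loop with header 1 whose node 4 also has an edge out of the loop (a break), the sketch returns two paths, [1, 2, 3, 5] and [1, 2, 4, 5]; the second carries two exit conditions (those of nodes 1 and 4). Because the walk never revisits a node, an inner loop's back edge is not followed when enumerating the outer loop's paths.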
120 while (!dataFile.isEOF()) {
      …
129   try {
130     key = decorateKey(…dataFile); …
139   } catch (Throwable th) {
140     //ignore exception
141   } …
185   try {
186     if (key == null)
187       throw new IOError(…); …
207   } catch (Throwable th) {
208     //ignore exception
      }
    }
Corrupted dataFile
Some functions throw exceptions based on their arguments. DScope groups the statements that throw exceptions when their arguments get corrupted, and keeps only the feasible loop paths.
Figure: loop paths of the while loop at line 120 (exit to line 257), branching through the two try/catch blocks (lines 129 to 141 and 185 to 208); one of the paths is marked infeasible.
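The hang pattern in this excerpt, where every exception is caught and ignored so the read position never moves past a corrupted record, can be reproduced with a toy model. FakeDataFile, decodeKey, and scan are hypothetical stand-ins rather than Cassandra's classes, a record starting with "!" is our assumed marker for corruption, and the iteration cap only lets the demo return instead of hanging. In this model both try blocks throw together whenever the record is corrupted.

```java
import java.util.List;

/** Hypothetical stand-in for a data file: a list of records and a read position. */
final class FakeDataFile {
    private final List<String> records;
    private int pos = 0;

    FakeDataFile(List<String> records) { this.records = records; }

    boolean isEOF() { return pos >= records.size(); }

    /** Decodes the record at the current position; throws on a corrupted record WITHOUT advancing. */
    String decodeKey() {
        String r = records.get(pos);
        if (r.startsWith("!")) {
            throw new IllegalStateException("corrupted record at " + pos);
        }
        pos++;
        return r;
    }
}

final class SwallowedExceptionLoop {
    /** Mirrors the excerpt: exceptions are ignored. Returns iterations, or -1 if maxIterations is exceeded. */
    static int scan(FakeDataFile dataFile, int maxIterations) {
        int iterations = 0;
        while (!dataFile.isEOF()) {
            if (++iterations > maxIterations) {
                return -1;                       // demo-only guard; the original loop has none
            }
            String key = null;
            try {
                key = dataFile.decodeKey();
            } catch (Throwable th) {
                // ignore exception: the read position does not move
            }
            try {
                if (key == null) {
                    throw new IllegalStateException("null key");
                }
                // ... process key ...
            } catch (Throwable th) {
                // ignore exception
            }
        }
        return iterations;
    }
}
```

A clean file of two records finishes in two iterations; a file whose second record is corrupted keeps retrying that record until the cap is hit.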
198 $i1 = r0.<InputStream: read()>(r2)  //$i1 is an I/O-related variable
199 if $i1 == -1 goto line #203         //"$i1 == -1" is the exit condition
202 goto line #198
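A rough way to spot such exit conditions is to scan the IR text: collect variables assigned from an InputStream call, then report any conditional jump that reads one of them. This regex-based sketch is only an illustration under simplifying assumptions: it expects statements without line numbers or comments, treats every "if … goto" as a loop exit, and ignores indirect dependencies. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IoExitConditionFinder {
    // Assumption: "x = ...<InputStream: ...>..." means x is defined by an I/O operation.
    private static final Pattern IO_DEF = Pattern.compile("^(\\$?\\w+) = .*<InputStream: .*>");
    // Assumption: every "if <cond> goto ..." in the snippet is a loop exit.
    private static final Pattern EXIT = Pattern.compile("^if (.+) goto ");
    private static final Pattern VAR = Pattern.compile("\\$?[A-Za-z_]\\w*");

    /** Returns the exit conditions that directly read a variable defined by an I/O call. */
    static List<String> ioRelatedExits(List<String> statements) {
        Set<String> ioVars = new HashSet<>();
        for (String s : statements) {
            Matcher def = IO_DEF.matcher(s);
            if (def.find()) {
                ioVars.add(def.group(1));
            }
        }
        List<String> exits = new ArrayList<>();
        for (String s : statements) {
            Matcher exit = EXIT.matcher(s);
            if (!exit.find()) {
                continue;
            }
            String cond = exit.group(1);
            Matcher var = VAR.matcher(cond);
            while (var.find()) {
                if (ioVars.contains(var.group())) {
                    exits.add(cond);
                    break;
                }
            }
        }
        return exits;
    }
}
```

On the three statements above it reports "$i1 == -1". On the skip loop that follows it reports nothing, because $b5 and l8 are only indirectly derived from skip(); catching those requires following data dependencies.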
//Soot IR
3  if l8 >= l0 goto line #12            //"l8 >= l0" is the exit condition
...
5  $l2 = l0 - l8
6  $l4 = $r2.<InputStream: skip>($l2)   //$l4 is an I/O-related variable
7  $b5 = $l4 cmp 0L
8  if $b5 == 0 goto line #12            //"$b5 == 0" is the exit condition
9  $l7 = l8 + $l4
10 l8 = $l7
11 goto line #3
Dependency: I/O operation → $l4; $l4 → $b5; $l4 → $l7 → l8
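This dependency chain amounts to a forward data-flow (taint) fixpoint over the loop body. Below is a minimal Java sketch (Java 16+ for records) with a hypothetical Stmt record standing in for Soot statements; it is an illustration of the idea, not DScope's code. Iterating to a fixpoint matters because l8 is defined at line 10 but read earlier, at lines 3 and 5.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IoDependency {
    /** Hypothetical IR statement: defined variable (null if none), variables read, and two flags. */
    record Stmt(int line, String def, List<String> uses, boolean ioCall, boolean exitCondition) {}

    /** Variables produced by an I/O call or computed from such a value (forward taint). */
    static Set<String> ioDependent(List<Stmt> body) {
        Set<String> tainted = new HashSet<>();
        boolean changed = true;
        while (changed) {                        // loop-carried uses (l8) need more than one pass
            changed = false;
            for (Stmt s : body) {
                if (s.def() == null || tainted.contains(s.def())) {
                    continue;
                }
                if (s.ioCall() || s.uses().stream().anyMatch(tainted::contains)) {
                    tainted.add(s.def());
                    changed = true;
                }
            }
        }
        return tainted;
    }

    /** Line numbers of exit conditions that read an I/O-dependent variable. */
    static List<Integer> ioDependentExits(List<Stmt> body) {
        Set<String> tainted = ioDependent(body);
        List<Integer> lines = new ArrayList<>();
        for (Stmt s : body) {
            if (s.exitCondition() && s.uses().stream().anyMatch(tainted::contains)) {
                lines.add(s.line());
            }
        }
        return lines;
    }
}
```

Encoding lines 3 to 10 of the IR as Stmt values, ioDependentExits returns [3, 8]: both "l8 >= l0" and "$b5 == 0" depend on the value returned by skip().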
public static long readVLong(DataInput stream) … {
  …
}
This is a false positive (FP): len is I/O dependent, but the loop stride is always 1 and the upper bound (len-1) is fixed during the loop, so the loop always terminates.
DScope filters such false positives with two checks: bound checking and stride forwarding.
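One way to read the two checks, under our own assumptions: bound checking asks whether the loop's bound stays fixed inside the loop, and stride forwarding asks whether the induction variable always advances toward that bound by a positive constant. If both hold, the loop terminates even when the bound itself came from corrupted I/O, as with readVLong's len-1 bound and stride 1. The LoopSummary record (Java 16+) and its fields are hypothetical inputs assumed to come from earlier analysis; this is a sketch, not DScope's filter.

```java
/**
 * Hypothetical per-loop facts, assumed to be computed by earlier analysis steps.
 * constantStride is null when the stride is not a constant (e.g. a value returned by skip()).
 */
record LoopSummary(String name, boolean exitDependsOnIO, boolean boundFixedInLoop, Long constantStride) {}

final class FalsePositiveFilter {
    /** Returns true if the loop should still be reported as a potential data corruption hang. */
    static boolean report(LoopSummary loop) {
        if (!loop.exitDependsOnIO()) {
            return false;                                   // exit does not depend on (possibly corrupted) data
        }
        boolean boundChecked = loop.boundFixedInLoop();     // bound checking
        boolean strideForwarded = loop.constantStride() != null
                && loop.constantStride() > 0;               // stride forwarding (assumes an upward-counting loop)
        return !(boundChecked && strideForwarded);          // both hold: the loop must terminate
    }
}
```

For readVLong (I/O-dependent exit, fixed bound, constant stride 1) the loop is filtered out, while a skipFully-style loop, whose stride is skip()'s return value (null here), is still reported.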
System         Description                                       # of bugs
Cassandra      Distributed database management system            2
Compress       Libraries for I/O operations on compressed files  2
Hadoop Common  Hadoop utilities and libraries                    10
Mapreduce      Hadoop big data processing framework              5
HDFS           Hadoop distributed file system                    4
Yarn           Hadoop resource management platform               4
Hive           Data warehouse                                    12
Kafka          Distributed streaming platform                    1
Lucene         Indexing and search server                        2
System         Version     DScope TP  DScope FP  Findbugs/Infer TP
Cassandra      v2.0.8      2          1          1
Compress       v1.0        2          2          -
Common         v0.23.0     4          6          -
Common         v2.5.0      6          6          -
Mapreduce      v0.23.0     3          -          -
Mapreduce      v2.5.0      2          -          -
HDFS           v0.23.0     1          1          -
HDFS           v2.5.0      3          5          1
Yarn           v0.23.0     2          2          1
Yarn           v2.5.0      2          5          -
Hive           v1.0.0      7          6          -
Hive                       5          1          -
Kafka          v0.10.0.0   1          1          -
Lucene         v2.1.0      2          1          -
Total                      42         37         2 (Findbugs), 1 (Infer)
public static void skipFully(InputStream in, long len) … {
  …
}
Corrupted InputStream: skipFully never leaves its loop because the stride (ret) stays 0.
BUFFER_SIZE = conf.getInt();   (line 194, read from a corrupted configuration file)
Empty array: the loop stride (size) is always 0 when a read operation is performed on an empty array.
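Per the java.io.InputStream.read(byte[]) contract, a zero-length array makes the call read nothing and return 0, so a loop that waits for -1 makes no progress. The sketch below shows this with a ByteArrayInputStream; the configured size is passed in directly instead of through a real configuration object, and drain, checkedBufferSize, and the iteration cap are hypothetical. Rejecting a non-positive size up front is one possible mitigation, not necessarily the fix the developers applied.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class EmptyBufferDemo {
    /** Hypothetical guard: reject a non-positive buffer size taken from configuration. */
    static int checkedBufferSize(int configured) {
        if (configured <= 0) {
            throw new IllegalArgumentException("buffer size must be positive: " + configured);
        }
        return configured;
    }

    /** Reads to EOF; the stride is read()'s return value. Returns total bytes, or -1 past maxIterations. */
    static long drain(InputStream in, int bufferSize, int maxIterations) {
        byte[] buf = new byte[bufferSize];     // bufferSize == 0 models a corrupted BUFFER_SIZE
        long total = 0;
        int iterations = 0;
        try {
            int size;
            while ((size = in.read(buf)) != -1) {
                if (++iterations > maxIterations) {
                    return -1;                 // demo-only guard: a zero stride never reaches EOF
                }
                total += size;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

With a 10-byte stream and a 4-byte buffer, drain returns 10 after three reads; with a 0-byte buffer every read returns 0 and drain returns -1 once the cap is reached.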
Figure: code excerpt (lines 1668, 1669, and 1689) annotated "Corrupted block" and "Application function".
private int readWithBounceBuffer(…) {
  …
}
private int drainDataBuf(…) {
  …
}
Annotations: check bounds, forward index.