Efficient and Precise Points-to Analysis:
Modeling the Heap by Merging Equivalent Automata
Tian Tan, Yue Li and Jingling Xue
PLDI 2017 June, 2017
A New Points-to Analysis Technique for Object-Oriented Programs
Points-to Analysis
Determines which objects a variable can point to
Security analysis, bug detection, compiler optimization, program verification, program understanding, …
On-the-fly construction
3-object-sensitive points-to analysis
Analyze Java programs
[Figure: analysis time (sec.) for findbugs and pmd: Unscalable (> 5 hours), 14469 (4 hours)]
Model control-flow
Call-site-sensitivity (PLDI’04, PLDI’06), Object-sensitivity (ISSTA’02, TOSEM’05, SAS’16), Type-sensitivity (POPL’11), …
Model data-flow
Allocation-site abstraction, Type-based abstraction, …
Dynamic execution: infinite-size heap
Static analysis: finite (abstract) heap
The heap is abstracted by partitioning its objects into abstract objects.
One object per allocation site
A a1 = new A();
A a2 = new A();
B b = new B();
Abstract objects, one for each of the three allocation sites: A, A, B
Over-partition for call graph construction
A a1 = new A();
A a2 = new A();
foo(a1);
foo(a2);
void foo(Object o) { … }
Call target for both A objects: A::toString()
Over-partition for type-dependent clients
A a1 = new A();
A a2 = new A();
foo(a1);
foo(a2);
void foo(Object o) {
    A a = (A) o;
}
Both objects reaching the cast have type A, so keeping them apart does not change its outcome.
One object per type
A a1 = new A();
A a2 = new A();
B b = new B();
Precision loss for type-dependent clients
A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
With one object per type, a1 and a2 share one abstract object, so o is resolved to both B::toString() and C::toString().
False positive: C::toString()
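The precision loss above can be reproduced with a few lines of Java. This is only a sketch of the idea, not the analysis from the talk: the class `HeapAbstractionDemo`, the helper `Alloc`, and the site labels `a1`, `a2`, `b`, `c` are names made up for illustration, and the "analysis" is hard-wired to the seven statements on the slide.

```java
import java.util.*;
import java.util.function.Function;

class HeapAbstractionDemo {
    /** One allocation: a label for its site and the allocated type. */
    static final class Alloc {
        final String site, type;
        Alloc(String site, String type) { this.site = site; this.type = type; }
    }

    /** Types `o` may have after `Object o = a1.f`, under the given heap abstraction. */
    static Set<String> typesOfO(Function<Alloc, String> abstraction) {
        Alloc a1 = new Alloc("a1", "A"), a2 = new Alloc("a2", "A");
        Alloc b = new Alloc("b", "B"), c = new Alloc("c", "C");

        // Abstract object -> allocations that its field f may point to.
        Map<String, List<Alloc>> fieldF = new HashMap<>();
        fieldF.computeIfAbsent(abstraction.apply(a1), k -> new ArrayList<>()).add(b); // a1.f = b;
        fieldF.computeIfAbsent(abstraction.apply(a2), k -> new ArrayList<>()).add(c); // a2.f = c;

        // Object o = a1.f; collect the types of everything a1.f may point to.
        Set<String> types = new TreeSet<>();
        for (Alloc target : fieldF.get(abstraction.apply(a1))) types.add(target.type);
        return types;
    }

    /** One abstract object per allocation site. */
    static Set<String> allocationSite() { return typesOfO(a -> a.site); }

    /** One abstract object per type. */
    static Set<String> typeBased() { return typesOfO(a -> a.type); }

    public static void main(String[] args) {
        System.out.println("allocation-site: " + allocationSite());
        System.out.println("type-based:      " + typeBased());
    }
}
```

Under the allocation-site abstraction the two A objects keep separate `f` fields, so only B is reported. Under the type-based abstraction they collapse into one abstract object whose `f` holds both b and c, which is where the spurious C comes from.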
MAHJONG
Improve Efficiency
Preserve Precision
Allocation-site abstraction: adopted by all mainstream points-to analyses
Analysis time (sec.) for findbugs and pmd:
Allocation-site abstraction: Unscalable (> 5 hours), 14469 (4 hours)
MAHJONG: 524, 128
[Figure: #call graph edges for pmd under MAHJONG and the allocation-site abstraction; values shown: 44004 and 44016]
$O_i^T$ and $O_j^T$ are type-consistent objects if, for every sequence of field names, the objects they reach along that sequence have the same type.
MAHJONG only merges type-consistent objects
Example
Types reached from $O_1^T$ and $O_2^T$ along each field path:
Field path   From O1   From O2
.f           U         U
.f.h         Y         Y
.g           X         X
.g.k         Y         Y
$O_1^T$ and $O_2^T$ are type-consistent objects
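The per-path comparison in this example can be sketched in Java as follows. The graph is an assumption reconstructed from the slides: the edges out of O2 (to O4, O6, O8) appear later in the deck, while the names O3, O5 and O7 for the objects reachable from O1 are hypothetical. The class `FieldPathTypes` and its method `typeAlong` are illustrative names as well.

```java
import java.util.*;

class FieldPathTypes {
    // Object -> type. O3, O5 and O7 are hypothetical names for O1's successors.
    static final Map<String, String> TYPE = Map.of(
        "O1", "T", "O2", "T", "O3", "U", "O4", "U",
        "O5", "X", "O6", "X", "O7", "Y", "O8", "Y");

    // Field points-to graph: object -> (field -> target object).
    static final Map<String, Map<String, String>> FPG = Map.of(
        "O1", Map.of("f", "O3", "g", "O5"),
        "O2", Map.of("f", "O4", "g", "O6"),
        "O3", Map.of("h", "O7"), "O4", Map.of("h", "O8"),
        "O5", Map.of("k", "O7"), "O6", Map.of("k", "O8"));

    /** Type of the object reached from root along a dotted path, or null if a field is missing. */
    static String typeAlong(String root, String path) {
        String cur = root;
        for (String field : path.split("\\.")) {
            cur = FPG.getOrDefault(cur, Map.of()).get(field);
            if (cur == null) return null;
        }
        return TYPE.get(cur);
    }

    public static void main(String[] args) {
        // Print one row per field path: path, type from O1, type from O2.
        for (String p : List.of("f", "f.h", "g", "g.k"))
            System.out.println("." + p + "  " + typeAlong("O1", p) + "  " + typeAlong("O2", p));
    }
}
```

Checking a fixed list of paths only illustrates the definition. The definition quantifies over every sequence of field names, and cyclic structures have infinitely many, which is the motivation for recasting the check as automata equivalence.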
Check Type-Consistency ↔ Test Equivalence
6-tuple (Q, Σ, δ, q0, Γ, γ), where:
Q: a set of states ↔ a set of objects
Σ: a set of input symbols ↔ a set of field names
δ: the next-state map ↔ the field points-to map
q0: the initial state ↔ the object to be checked
Γ: a set of output symbols ↔ a set of types
γ: the output map ↔ the object-to-type map
Example for $O_2^T$:
Q = {O2, O4, O6, O8}
Σ = {f, g, h, k}
δ: O2 -f-> O4, O2 -g-> O6, O4 -h-> O8, O6 -k-> O8
q0 = O2
Γ = {T, U, X, Y}
γ: O2 ↦ T, O4 ↦ U, O6 ↦ X, O8 ↦ Y
Check Type-Consistency by testing equivalence: Hopcroft-Karp algorithm*
* J. E. Hopcroft and R. M. Karp, A linear algorithm for testing equivalence of finite automata, Technical Report 71-114, 1971
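A minimal sketch of a Hopcroft-Karp style equivalence test, using union-find over states, might look like this in Java. It is not the MAHJONG implementation. It assumes the automata are already deterministic (each field of an object has a single target), that both automata live in one shared graph and differ only in their initial state, and that a field present on only one side makes two states inequivalent. The names `AutomataEquivalence`, `OUTPUT`, `DELTA`, and the objects O3, O5, O7 are hypothetical.

```java
import java.util.*;

class AutomataEquivalence {
    // Example data: state -> output symbol (type), and state -> (field -> next state).
    static final Map<String, String> OUTPUT = Map.of(
        "O1", "T", "O2", "T", "O3", "U", "O4", "U",
        "O5", "X", "O6", "X", "O7", "Y", "O8", "Y");
    static final Map<String, Map<String, String>> DELTA = Map.of(
        "O1", Map.of("f", "O3", "g", "O5"),
        "O2", Map.of("f", "O4", "g", "O6"),
        "O3", Map.of("h", "O7"), "O4", Map.of("h", "O8"),
        "O5", Map.of("k", "O7"), "O6", Map.of("k", "O8"));

    // Union-find forest over states; a state without an entry is its own root.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String s) {
        String p = parent.getOrDefault(s, s);
        if (p.equals(s)) return s;
        String root = find(p);
        parent.put(s, root); // path compression
        return root;
    }

    /** Are the automata with initial states q1 and q2 equivalent? */
    static boolean equivalent(Map<String, String> output,
                              Map<String, Map<String, String>> delta,
                              String q1, String q2) {
        AutomataEquivalence uf = new AutomataEquivalence();
        Deque<String[]> work = new ArrayDeque<>();
        uf.parent.put(uf.find(q1), uf.find(q2)); // tentatively merge the initial states
        work.push(new String[] {q1, q2});
        while (!work.isEmpty()) {
            String[] pair = work.pop();
            String p = pair[0], q = pair[1];
            // Merged states must produce the same output symbol (type).
            if (!Objects.equals(output.get(p), output.get(q))) return false;
            Map<String, String> dp = delta.getOrDefault(p, Map.of());
            Map<String, String> dq = delta.getOrDefault(q, Map.of());
            // Assumption: a field present on one side only means "not equivalent".
            if (!dp.keySet().equals(dq.keySet())) return false;
            for (String symbol : dp.keySet()) {
                String r1 = uf.find(dp.get(symbol)), r2 = uf.find(dq.get(symbol));
                if (!r1.equals(r2)) {
                    uf.parent.put(r1, r2); // merge the successor classes
                    work.push(new String[] {dp.get(symbol), dq.get(symbol)});
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equivalent(OUTPUT, DELTA, "O1", "O2"));
    }
}
```

Each pair of successor states is merged at most once, so the number of merges is bounded by the number of states. That bound is what keeps the check close to linear rather than exploring paths one by one.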
Pre-Analysis (fast but imprecise, e.g., context-insensitive) → Field Points-to Graph (FPG)
Heap Modeler, ∀ $O_i^T$, $O_j^T$ in FPG:
NFA Builder ($NFA_{O_i^T}$, $NFA_{O_j^T}$) → DFA Converter ($DFA_{O_i^T}$, $DFA_{O_j^T}$) → Automata Equivalence Checker ($DFA_{O_i^T} \equiv DFA_{O_j^T}$ ?)
Heap Abstraction: the allocation-site heap becomes the MAHJONG heap by merging type-consistent objects
Points-to Analysis (precise but expensive, e.g., 3-object-sensitive) runs on the MAHJONG heap
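The merging step itself can be sketched as grouping objects into classes of mutually type-consistent objects. In the Java sketch below, `HeapModeler`, `merge`, and the `typeConsistent` predicate are hypothetical names. The predicate stands in for the automata equivalence check and is assumed to be an equivalence relation, which is why comparing against one representative per class is enough.

```java
import java.util.*;
import java.util.function.BiPredicate;

class HeapModeler {
    /** Maps every object to the representative of its class of type-consistent objects. */
    static Map<String, String> merge(List<String> objects,
                                     BiPredicate<String, String> typeConsistent) {
        List<String> representatives = new ArrayList<>();
        Map<String, String> mergedInto = new LinkedHashMap<>();
        for (String obj : objects) {
            String rep = null;
            // Look for an existing class whose representative is type-consistent with obj.
            for (String r : representatives) {
                if (typeConsistent.test(r, obj)) { rep = r; break; }
            }
            // No match: obj starts a new class and represents it.
            if (rep == null) { rep = obj; representatives.add(obj); }
            mergedInto.put(obj, rep);
        }
        return mergedInto;
    }

    public static void main(String[] args) {
        // Stand-in predicate for illustration only: objects named with the same prefix match.
        Map<String, String> merged = merge(
            List.of("A#1", "A#2", "B#1"),
            (x, y) -> x.charAt(0) == y.charAt(0));
        System.out.println(merged);
    }
}
```

A points-to analysis would then use the representative wherever it previously used the allocation site, so merged sites share one abstract object.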
1500 LOC of Java in total Integrated with Can also be easily integrated to other
RQ1: MAHJONG’s effectiveness as a pre-analysis
RQ2: MAHJONG-based points-to analysis
Efficiency
Heap partitioning
Program      CI      FPG    MAHJONG   T
antlr        44.1    1.3    1.3       46.7
fop          34.7    0.7    1.1       36.5
luindex      26.2    0.8    1.1       28.1
pmd          44.8    1.4    1.5       47.7
chart        37.7    2.4    1.9       42.0
checkstyle   89.6    2.3    4.0       95.9
xalan        66.6    3.0    3.1       72.7
bloat        38.7    1.2    1.7       41.6
lusearch     41.4    0.8    1.0       43.2
JPC          58.9    2.1    4.5       65.5
findbugs     90.6    4.6    3.2       98.4
eclipse      174.1   15.5   21.4      211.0
CI: Context-Insensitive points-to analysis
FPG: Read Field Points-to Graph
MAHJONG: Check automata equivalence, build heap abstraction
T: total (CI + FPG + MAHJONG)
Number of abstract objects created by the allocation-site abstraction and MAHJONG
Allocation-Site Abstraction: 7729 7159 6190 7363 8106 14337 6523 7807 10888 11181 14063 19529
MAHJONG: 2228 2474 2108 2727 3107 5285 2229 2942 4028 4142 5233 9414
Average reduction: 62%
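The reduction can be rechecked from the object counts on this slide. The Java sketch below assumes that the first twelve values belong to the allocation-site abstraction and the last twelve to MAHJONG, and it computes the reduction over all programs taken together. Whether the slide's average is this aggregate or a mean of per-program ratios is not stated, so treat the class `ObjectReduction` as an illustration only.

```java
class ObjectReduction {
    // Counts as they appear on the slide; the split into two series is an assumption.
    static final int[] ALLOCATION_SITE = {
        7729, 7159, 6190, 7363, 8106, 14337, 6523, 7807, 10888, 11181, 14063, 19529};
    static final int[] MAHJONG = {
        2228, 2474, 2108, 2727, 3107, 5285, 2229, 2942, 4028, 4142, 5233, 9414};

    /** Fraction of abstract objects removed, over all programs taken together. */
    static double overallReduction() {
        long before = 0, after = 0;
        for (int n : ALLOCATION_SITE) before += n;
        for (int n : MAHJONG) after += n;
        return 1.0 - (double) after / before;
    }

    public static void main(String[] args) {
        System.out.printf("overall reduction: %.1f%%%n", 100 * overallReduction());
    }
}
```

With these numbers the aggregate comes out at roughly 62%, in line with the figure on the slide.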
Efficiency
Precision
5 mainstream context-sensitive points-to analyses
Time budget: 5 hours
Call graph construction, devirtualization, may-fail casting
Efficiency
Precision
Call graph: -0.02%, Devirtualization: -0.29%, May-fail casting: -0%
Call graph: -0.02%, Devirtualization: -0.18%, May-fail casting: -0.03%
For checkstyle, xalan, lusearch, JPC, findbugs 3-object-sensitive analysis:
MAHJONG
Call-site-, object- and type-sensitivity
Direct impact