Efficient and Precise Points-to Analysis: Modeling the Heap by Merging Equivalent Automata

SLIDE 1

Efficient and Precise Points-to Analysis:

Modeling the Heap by Merging Equivalent Automata

Tian Tan, Yue Li and Jingling Xue

PLDI 2017, June 2017

SLIDE 2

A New Points-to Analysis Technique for Object-Oriented Programs

SLIDE 3

Points-to Analysis

• Determines
  • which objects can a variable point to?
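A minimal sketch of answering this question, assuming a toy program representation with only two statement kinds (`x = new T()` at a labeled allocation site, and the copy `x = y`). It is a context-insensitive, flow-insensitive, inclusion-based (Andersen-style) analysis iterated to a fixed point; the class, record and method names are hypothetical and not taken from MAHJONG or any framework.

```java
import java.util.*;

/** Context-insensitive, flow-insensitive, Andersen-style points-to analysis (toy sketch). */
final class TinyPointsTo {
    /** lhs = new T() at allocation site `site`. */
    record New(String lhs, String site) {}
    /** lhs = rhs (copy). */
    record Assign(String lhs, String rhs) {}

    /** Returns pts(v) for every variable v, iterating the rules to a fixed point. */
    static Map<String, Set<String>> solve(List<New> news, List<Assign> assigns) {
        Map<String, Set<String>> pts = new HashMap<>();
        for (New n : news)                                   // rule: site is in pts(lhs)
            pts.computeIfAbsent(n.lhs(), k -> new HashSet<>()).add(n.site());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Assign a : assigns) {                       // rule: pts(lhs) includes pts(rhs)
                Set<String> src = pts.getOrDefault(a.rhs(), Set.of());
                Set<String> dst = pts.computeIfAbsent(a.lhs(), k -> new HashSet<>());
                if (dst.addAll(src)) changed = true;
            }
        }
        return pts;
    }

    public static void main(String[] args) {
        // a = new A();  // site o1
        // b = new B();  // site o2
        // c = a;  c = b;
        Map<String, Set<String>> pts = solve(
            List.of(new New("a", "o1"), new New("b", "o2")),
            List.of(new Assign("c", "a"), new Assign("c", "b")));
        System.out.println(new TreeSet<>(pts.get("c")));   // [o1, o2]
    }
}
```

Field loads, stores and calls are omitted; a real analysis adds rules for them in the same fixed-point style.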

SLIDE 4

Uses of Points-to Analysis

Clients:
• Security analysis
• Bug detection
• Compiler optimization
• Program verification
• Program understanding
• …

Tools:
• Chord

SLIDE 5

Uses of Points-to Analysis

Clients:
• Security analysis
• Bug detection
• Compiler optimization
• Program verification
• Program understanding
• …

Tools:
• Chord

Call Graph

SLIDE 6

Existing Call Graph Construction

• On-the-fly construction (run with points-to analysis)
  • Precise
  • Inefficient

SLIDE 7

Existing Call Graph Construction

• On-the-fly construction (run with points-to analysis)
  • Precise
  • Inefficient
• 3-object-sensitive points-to analysis
  • Very precise
  • Adopted by, e.g., Chord

SLIDE 8

3-Object-Sensitive Points-to Analysis

• Analyze Java programs
  • Intel Xeon E5 3.70GHz, 128GB of memory
  • Time budget: 5 hours (18000 secs)

SLIDE 9

3-Object-Sensitive Points-to Analysis

• Analyze Java programs
  • Intel Xeon E5 3.70GHz, 128GB of memory
  • Time budget: 5 hours (18000 secs)

[Chart: analysis time (sec.): findbugs unscalable (> 5 hours); pmd 14469 (≈ 4 hours)]

SLIDE 10

Two Mainstreams of Points-to Analysis Techniques

• Model control-flow
• Model data-flow

SLIDE 11

Two Mainstreams of Points-to Analysis Techniques

• Model control-flow
  • Context-sensitivity
    • Call-site-sensitivity (PLDI’04, PLDI’06)
    • Object-sensitivity (ISSTA’02, TOSEM’05, SAS’16)
    • Type-sensitivity (POPL’11)
    • …
• Model data-flow

SLIDE 12

Two Mainstreams of Points-to Analysis Techniques

• Model control-flow
  • Context-sensitivity
    • Call-site-sensitivity (PLDI’04, PLDI’06)
    • Object-sensitivity (ISSTA’02, TOSEM’05, SAS’16)
    • Type-sensitivity (POPL’11)
    • …
• Model data-flow
  • Heap abstraction
    • Allocation-site abstraction
    • Type-based abstraction
    • …

SLIDE 14

Heap Abstraction

[Figure: the infinite-size heap of a dynamic execution is abstracted, or partitioned, into a finite set of abstract objects for static analysis]

SLIDE 15

Allocation-Site Abstraction

• One object per allocation site

1  A a1 = new A();
2  A a2 = new A();
3  B b = new B();

SLIDE 16

Allocation-Site Abstraction

• One object per allocation site

1  A a1 = new A();
2  A a2 = new A();
3  B b = new B();

[Figure: abstract objects $O_1^A$, $O_2^A$ and $O_3^B$, one per allocation site (labeled by line number)]

SLIDE 17

Allocation-Site Abstraction

• One object per allocation site
  • Adopted by all mainstream points-to analyses

1  A a1 = new A();
2  A a2 = new A();
3  B b = new B();

[Figure: abstract objects $O_1^A$, $O_2^A$ and $O_3^B$, one per allocation site (labeled by line number)]

SLIDE 18

Allocation-Site Abstraction

• Over-partition for call graph construction

void foo(Object o) {
    o.toString();
}
1  A a1 = new A();
2  A a2 = new A();
3  foo(a1);
4  foo(a2);

[Figure: o points to $O_1^A$ and $O_2^A$; for both, o.toString() resolves to the same target, A::toString()]

SLIDE 19

Allocation-Site Abstraction

• Over-partition for type-dependent clients
  • Call graph construction
  • Devirtualization
  • May-fail casting

void foo(Object o) {
    o.toString();
    A a = (A) o;
}
1  A a1 = new A();
2  A a2 = new A();
3  foo(a1);
4  foo(a2);

[Figure: o points to $O_1^A$ and $O_2^A$, two abstract objects of the same type A]
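To see why separate objects are an over-partition here, note that these three clients only consult the types of the objects `o` may point to. The hypothetical sketch below (names are ours; method lookup is simplified to "the receiver's class declares toString()", and subtyping is ignored for casts) gives identical answers for the allocation-site points-to set {$O_1^A$, $O_2^A$} and for a single merged A object.

```java
import java.util.*;

final class TypeDependentClients {
    /** An abstract object: a label plus the dynamic type of the objects it stands for. */
    record Obj(String label, String type) {}

    /** Call graph client: targets of o.toString() (assumes every type declares toString()). */
    static Set<String> toStringTargets(Set<Obj> pts) {
        Set<String> targets = new TreeSet<>();
        for (Obj o : pts) targets.add(o.type() + "::toString()");
        return targets;
    }

    /** Devirtualization client: the call can be devirtualized if it has a single target. */
    static boolean devirtualizable(Set<Obj> pts) {
        return toStringTargets(pts).size() == 1;
    }

    /** May-fail casting client: (castType) o may fail if some pointee has another type. */
    static boolean castMayFail(Set<Obj> pts, String castType) {
        // Simplification: subtyping is ignored; only exact type equality counts.
        return pts.stream().anyMatch(o -> !o.type().equals(castType));
    }

    public static void main(String[] args) {
        Set<Obj> bySite = Set.of(new Obj("O1", "A"), new Obj("O2", "A")); // allocation-site
        Set<Obj> merged = Set.of(new Obj("O1+O2", "A"));                 // one merged object
        for (Set<Obj> pts : List.of(bySite, merged))
            System.out.println(toStringTargets(pts) + " " + devirtualizable(pts)
                + " " + castMayFail(pts, "A"));
        // both lines print: [A::toString()] true false
    }
}
```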

SLIDE 20

Type-Based Abstraction

• One object per type

1  A a1 = new A();
2  A a2 = new A();
3  B b = new B();

SLIDE 21

Type-Based Abstraction

• One object per type

1  A a1 = new A();
2  A a2 = new A();
3  B b = new B();

[Figure: abstract object A (for lines 1 and 2) and abstract object B (for line 3)]
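Both abstractions can be read as functions from allocation sites to abstract-object names. This hypothetical sketch applies each to the three allocation sites of the example; the record and constant names are illustrative only.

```java
import java.util.*;
import java.util.function.Function;

final class HeapAbstractions {
    /** A `new T()` expression at a given source line. */
    record AllocSite(int line, String type) {}

    /** Allocation-site abstraction: one abstract object per allocation site. */
    static final Function<AllocSite, String> BY_SITE = s -> "O" + s.line() + "^" + s.type();
    /** Type-based abstraction: one abstract object per type. */
    static final Function<AllocSite, String> BY_TYPE = AllocSite::type;

    /** The set of abstract objects that heap abstraction h creates for the given sites. */
    static Set<String> abstractObjects(List<AllocSite> sites, Function<AllocSite, String> h) {
        Set<String> objs = new TreeSet<>();
        for (AllocSite s : sites) objs.add(h.apply(s));
        return objs;
    }

    public static void main(String[] args) {
        List<AllocSite> sites = List.of(
            new AllocSite(1, "A"),   // 1  A a1 = new A();
            new AllocSite(2, "A"),   // 2  A a2 = new A();
            new AllocSite(3, "B"));  // 3  B b = new B();
        System.out.println(abstractObjects(sites, BY_SITE)); // [O1^A, O2^A, O3^B]
        System.out.println(abstractObjects(sites, BY_TYPE)); // [A, B]
    }
}
```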

SLIDE 22

Type-Based Abstraction

• Precision loss for type-dependent clients

A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();

[Figure: one abstract object per type: A, B and C]

SLIDE 23

Type-Based Abstraction

• Precision loss for type-dependent clients

A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();

[Figure: a1.f = b and a2.f = c make the f field of the single abstract object A point to both B and C]

SLIDE 24

Type-Based Abstraction

• Precision loss for type-dependent clients

A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();

[Figure: the f field of the abstract object A points to both B and C, so o = a1.f points to both B and C]

SLIDE 25

Type-Based Abstraction

• Precision loss for type-dependent clients

A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();

[Figure: o points to both B and C, so o.toString() resolves to B::toString() and C::toString()]

SLIDE 26

Type-Based Abstraction

• Precision loss for type-dependent clients

A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();

[Figure: o points to both B and C, so o.toString() resolves to B::toString() and C::toString()]

False positive: C::toString(), since o = a1.f can only point to the B object
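The false positive can be reproduced by evaluating the example under both abstractions. The sketch below is hand-rolled for this straight-line program only (no general solver; names are hypothetical). Since each variable is assigned by exactly one `new`, the variable name stands in for its allocation site.

```java
import java.util.*;
import java.util.function.Function;

final class TypeBasedPrecisionLoss {
    /** `T var = new T();` */
    record Site(String var, String type) {}

    /** Returns the toString() targets of o in the example under heap abstraction h. */
    static Set<String> targets(Function<Site, String> h) {
        List<Site> sites = List.of(new Site("a1", "A"), new Site("a2", "A"),
                                   new Site("b", "B"), new Site("c", "C"));
        Map<String, String> objOf = new HashMap<>();   // variable -> abstract object
        Map<String, String> typeOf = new HashMap<>();  // abstract object -> type
        for (Site s : sites) {
            objOf.put(s.var(), h.apply(s));
            typeOf.put(h.apply(s), s.type());
        }
        // a1.f = b; a2.f = c;  (field f of each abstract object -> its pointees)
        Map<String, Set<String>> fieldF = new HashMap<>();
        fieldF.computeIfAbsent(objOf.get("a1"), k -> new HashSet<>()).add(objOf.get("b"));
        fieldF.computeIfAbsent(objOf.get("a2"), k -> new HashSet<>()).add(objOf.get("c"));
        // Object o = a1.f;  o.toString();
        Set<String> ptsO = fieldF.getOrDefault(objOf.get("a1"), Set.of());
        Set<String> result = new TreeSet<>();
        for (String obj : ptsO) result.add(typeOf.get(obj) + "::toString()");
        return result;
    }

    public static void main(String[] args) {
        System.out.println(targets(s -> "site:" + s.var()));  // [B::toString()]
        System.out.println(targets(Site::type));              // [B::toString(), C::toString()]
    }
}
```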

SLIDE 27

Our Goal: Improve Efficiency, Preserve Precision

SLIDE 28

MAHJONG: A New Heap Abstraction

Improve Efficiency

[Chart: analysis time (sec.) with the allocation-site abstraction (adopted by all mainstream points-to analyses) vs. MAHJONG. Allocation-site: findbugs unscalable (> 5 hours), pmd 14469 (≈ 4 hours). MAHJONG: findbugs 524, pmd 128.]

SLIDE 29

MAHJONG: A New Heap Abstraction

Improve Efficiency, Preserve Precision

[Chart: analysis time (sec.) with the allocation-site abstraction (adopted by all mainstream points-to analyses) vs. MAHJONG. Allocation-site: findbugs unscalable (> 5 hours), pmd 14469 (≈ 4 hours). MAHJONG: findbugs 524, pmd 128.]

[Chart: #call graph edges for pmd under MAHJONG and the allocation-site abstraction; the two bars are labeled 44004 and 44016]

SLIDE 30

MAHJONG: A New Heap Abstraction

Improve Efficiency, Preserve Precision

[Chart: analysis time (sec.) with the allocation-site abstraction (adopted by all mainstream points-to analyses) vs. MAHJONG. Allocation-site: findbugs unscalable (> 5 hours), pmd 14469 (≈ 4 hours). MAHJONG: findbugs 524, pmd 128.]

[Chart: #call graph edges for pmd under MAHJONG and the allocation-site abstraction; the two bars are labeled 44004 and 44016]

How?

SLIDE 31

• Merging objects alleviates over-partition
• Blindly merging objects causes precision loss

SLIDE 32

• Merging objects alleviates over-partition
• Blindly merging objects causes precision loss

[Figure: $O_1^A$ points via f to $O_3^B$ and $O_2^A$ points via f to $O_4^C$; the f fields of $O_1^A$ and $O_2^A$ point to objects of inconsistent types]

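SLIDE 33

• Merging objects alleviates over-partition
• Blindly merging objects causes precision loss

[Figure: $O_1^A$ points via f to $O_3^B$ and $O_2^A$ points via f to $O_4^C$; blindly merging $O_1^A$ and $O_2^A$ yields one A object whose f field points to both B and C, i.e., to objects of inconsistent types]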

SLIDE 34

Type-Consistent Objects

• Definition
  $O_i^T$ and $O_j^T$ are type-consistent objects if, for every sequence of field names $\pi = f_1.f_2.\ldots.f_n$, $O_i^T.\pi$ and $O_j^T.\pi$ point to objects of the same types.

SLIDE 35

Type-Consistent Objects

• Definition
  $O_i^T$ and $O_j^T$ are type-consistent objects if, for every sequence of field names $\pi = f_1.f_2.\ldots.f_n$, $O_i^T.\pi$ and $O_j^T.\pi$ point to objects of the same types.

MAHJONG only merges type-consistent objects

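SLIDE 36

Type-Consistent Objects

• Example

[Figure: field points-to graph. $O_1^T$ points via f to $O_3^U$ and via g to $O_5^X$; $O_3^U$ (via h) and $O_5^X$ (via k) lead to the Y objects $O_7^Y$, $O_9^Y$ and $O_{11}^Y$. $O_2^T$ points via f to $O_4^U$ and via g to $O_6^X$; $O_4^U$ (via h) and $O_6^X$ (via k) both point to $O_8^Y$.]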

SLIDE 37

Type-Consistent Objects

• Example

[Figure: same field points-to graph as on slide 36]

Field path   $O_1^T$   $O_2^T$
.f           U         U
.f.h         Y         Y
.g           X         X
.g.k         Y         Y

SLIDE 38

Type-Consistent Objects

• Example

[Figure: same field points-to graph as on slide 36]

Field path   $O_1^T$   $O_2^T$
.f           U         U
.f.h         Y         Y
.g           X         X
.g.k         Y         Y

∵ the types match along every field path ∴ $O_1^T$ and $O_2^T$ are type-consistent objects
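The comparison can be replayed mechanically. This sketch encodes the field points-to graph as we read it from the figure, which is an assumption where the figure is ambiguous: we take $O_3^U$ to reach $O_7^Y$ and $O_9^Y$ via h, and $O_5^X$ to reach $O_{11}^Y$ via k (any assignment among the Y objects yields the same types). It checks only the four listed paths, whereas the definition quantifies over all field paths. Names are hypothetical.

```java
import java.util.*;

final class FieldPathTypes {
    // Field points-to graph as read from the figure (assumed): object -> field -> pointees.
    static final Map<String, Map<String, Set<String>>> FPG = Map.of(
        "O1", Map.of("f", Set.of("O3"), "g", Set.of("O5")),
        "O3", Map.of("h", Set.of("O7", "O9")),
        "O5", Map.of("k", Set.of("O11")),
        "O2", Map.of("f", Set.of("O4"), "g", Set.of("O6")),
        "O4", Map.of("h", Set.of("O8")),
        "O6", Map.of("k", Set.of("O8")));
    static final Map<String, String> TYPE = Map.of(
        "O1", "T", "O2", "T", "O3", "U", "O4", "U", "O5", "X", "O6", "X",
        "O7", "Y", "O8", "Y", "O9", "Y", "O11", "Y");

    /** Types of the objects reached from `start` along a non-empty field path such as "f.h". */
    static Set<String> typesAlong(String start, String path) {
        Set<String> current = Set.of(start);
        for (String field : path.split("\\.")) {
            Set<String> next = new HashSet<>();
            for (String o : current)
                next.addAll(FPG.getOrDefault(o, Map.of()).getOrDefault(field, Set.of()));
            current = next;
        }
        Set<String> types = new TreeSet<>();
        for (String o : current) types.add(TYPE.get(o));
        return types;
    }

    public static void main(String[] args) {
        for (String p : List.of("f", "f.h", "g", "g.k"))
            System.out.println("." + p + ": " + typesAlong("O1", p) + " vs " + typesAlong("O2", p));
    }
}
```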

SLIDE 39

How to Check Type-Consistency?

SLIDE 40

Our Solution: Sequential Automata

Check type-consistency of objects ⇒ Test equivalence of automata

SLIDE 41

Sequential Automata

• 6-tuple $(Q, \Sigma, \delta, q_0, \Gamma, \gamma)$, where:
  • $Q$ is a set of states
  • $\Sigma$ is a set of input symbols
  • $\delta: Q \times \Sigma \to \mathcal{P}(Q)$ is the next-state map
  • $q_0$ is the initial state
  • $\Gamma$ is a set of output symbols
  • $\gamma: Q \to \Gamma$ is the output map
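The 6-tuple transcribes directly into a Java record. In this hypothetical sketch (names are ours), a missing entry in the next-state map means the empty set, and `outputsAfter` applies the output map to every state reachable by an input word; `main` instantiates it with the example object $O_2^T$ used on the next slides.

```java
import java.util.*;

/** (Q, Sigma, delta, q0, Gamma, gamma) with delta: Q x Sigma -> P(Q), gamma: Q -> Gamma (sketch). */
record SequentialAutomaton<Q, S, G>(
        Set<Q> states,                    // Q
        Set<S> inputs,                    // Sigma
        Map<Q, Map<S, Set<Q>>> delta,     // next-state map; a missing entry means the empty set
        Q q0,                             // initial state
        Set<G> outputs,                   // Gamma
        Map<Q, G> gamma) {                // output map

    /** States reached from q0 by reading `word`. */
    Set<Q> run(List<S> word) {
        Set<Q> current = Set.of(q0);
        for (S symbol : word) {
            Set<Q> next = new HashSet<>();
            for (Q q : current)
                next.addAll(delta.getOrDefault(q, Map.of()).getOrDefault(symbol, Set.of()));
            current = next;
        }
        return current;
    }

    /** The output map applied to every state reached by `word`. */
    Set<G> outputsAfter(List<S> word) {
        Set<G> out = new HashSet<>();
        for (Q q : run(word)) out.add(gamma.get(q));
        return out;
    }

    public static void main(String[] args) {
        SequentialAutomaton<String, String, String> a = new SequentialAutomaton<>(
            Set.of("O2", "O4", "O6", "O8"), Set.of("f", "g", "h", "k"),
            Map.of("O2", Map.of("f", Set.of("O4"), "g", Set.of("O6")),
                   "O4", Map.of("h", Set.of("O8")),
                   "O6", Map.of("k", Set.of("O8"))),
            "O2", Set.of("T", "U", "X", "Y"),
            Map.of("O2", "T", "O4", "U", "O6", "X", "O8", "Y"));
        System.out.println(a.outputsAfter(List.of("g", "k"))); // [Y]
    }
}
```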

SLIDE 42

Check type-consistency of objects ⇒ Test equivalence of automata

How?

SLIDE 43

Objects                        Automata
A set of objects               $Q$: a set of states
A set of field names           $\Sigma$: a set of input symbols
The field points-to map        $\delta$: the next-state map
The object to be checked       $q_0$: the initial state
A set of types                 $\Gamma$: a set of output symbols
The object-to-type map         $\gamma$: the output map

[Figure: $O_2^T \xrightarrow{f} O_4^U$, $O_2^T \xrightarrow{g} O_6^X$, $O_4^U \xrightarrow{h} O_8^Y$, $O_6^X \xrightarrow{k} O_8^Y$]

SLIDE 44

Objects ↔ states

$Q = \{O_2^T, O_4^U, O_6^X, O_8^Y\}$

SLIDE 45

Field names ↔ input symbols

$\Sigma = \{f, g, h, k\}$

SLIDE 46

Field points-to map ↔ next-state map

$\delta(O_2^T, f) = \{O_4^U\}$, $\delta(O_2^T, g) = \{O_6^X\}$, $\delta(O_4^U, h) = \{O_8^Y\}$, $\delta(O_6^X, k) = \{O_8^Y\}$

SLIDE 47

Checked object ↔ initial state

$q_0 = O_2^T$

SLIDE 48

Types ↔ output symbols

$\Gamma = \{T, U, X, Y\}$

SLIDE 49

Object-to-type map ↔ output map

$\gamma(O_2^T) = T$, $\gamma(O_4^U) = U$, $\gamma(O_6^X) = X$, $\gamma(O_8^Y) = Y$
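Putting the six correspondences together, an NFA builder needs only the field points-to graph, the object-to-type map and the object to check. In this hedged sketch (hypothetical names; plain maps stand in for a dedicated graph type), the state set is restricted to the objects reachable from the checked object, since no field path can observe any other object.

```java
import java.util.*;

final class NfaBuilder {
    /** NFA for one checked object; delta and gamma cover only states reachable from q0. */
    record Nfa(Set<String> states, Set<String> inputs,
               Map<String, Map<String, Set<String>>> delta,
               String q0, Set<String> outputs, Map<String, String> gamma) {}

    static Nfa build(Map<String, Map<String, Set<String>>> fpg,
                     Map<String, String> typeOf, String checked) {
        Set<String> states = new LinkedHashSet<>();
        Set<String> inputs = new TreeSet<>();
        Map<String, Map<String, Set<String>>> delta = new HashMap<>();
        Deque<String> work = new ArrayDeque<>(List.of(checked));
        while (!work.isEmpty()) {                        // objects -> states (reachable only)
            String o = work.pop();
            if (!states.add(o)) continue;
            Map<String, Set<String>> fields = fpg.getOrDefault(o, Map.of());
            if (!fields.isEmpty()) delta.put(o, fields); // field points-to map -> next-state map
            for (var e : fields.entrySet()) {
                inputs.add(e.getKey());                  // field names -> input symbols
                work.addAll(e.getValue());
            }
        }
        Map<String, String> gamma = new HashMap<>();     // object-to-type map -> output map
        for (String q : states) gamma.put(q, typeOf.get(q));
        return new Nfa(states, inputs, delta, checked, new TreeSet<>(gamma.values()), gamma);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<String>>> fpg = Map.of(
            "O2", Map.of("f", Set.of("O4"), "g", Set.of("O6")),
            "O4", Map.of("h", Set.of("O8")),
            "O6", Map.of("k", Set.of("O8")));
        Map<String, String> typeOf = Map.of("O2", "T", "O4", "U", "O6", "X", "O8", "Y");
        Nfa nfa = build(fpg, typeOf, "O2");
        System.out.println(nfa.inputs() + " " + nfa.outputs()); // [f, g, h, k] [T, U, X, Y]
    }
}
```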

SLIDE 50

Check type-consistency of objects ⇒ Test equivalence of automata

(Objects ↔ Automata, as on slide 43)

SLIDE 51

Test Equivalence of Automata

• Hopcroft-Karp algorithm*
  • Almost linear in terms of $|Q_{larger}|$
  • $Q_{larger}$: set of states of the larger automaton

* J. E. Hopcroft and R. M. Karp, A linear algorithm for testing equivalence of finite automata, Technical Report 71-114, 1971
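A hedged sketch of the Hopcroft-Karp idea for deterministic automata whose states carry outputs (here, sets of types): start from the pair of initial states, merge successor pairs with union-find, and fail as soon as two states being merged have different outputs. Names are hypothetical; missing transitions go to one shared sink state with an empty output, `find` uses path halving only (no union by rank), and the demo DFAs are the ones we derive for $O_1^T$ and $O_2^T$ from our reading of the example figure (an assumption).

```java
import java.util.*;

/** Hopcroft-Karp-style equivalence test for DFAs whose states carry outputs (sketch). */
final class HopcroftKarp {
    /** States 0..n-1; delta.get(s) is a partial transition map; out.get(s) is s's output. */
    record Dfa(int start, List<Map<String, Integer>> delta, List<Set<String>> out) {}

    static boolean equivalent(Dfa a, Dfa b) {
        int na = a.out().size(), nb = b.out().size();
        int dead = na + nb;                       // shared sink: no transitions, empty output
        Set<String> sigma = new TreeSet<>();
        for (Map<String, Integer> m : a.delta()) sigma.addAll(m.keySet());
        for (Map<String, Integer> m : b.delta()) sigma.addAll(m.keySet());
        int[] parent = new int[dead + 1];         // union-find over the states of both DFAs
        for (int i = 0; i <= dead; i++) parent[i] = i;
        int s1 = a.start(), s2 = na + b.start();  // b's states are shifted by na
        if (!output(a, b, s1).equals(output(a, b, s2))) return false;
        parent[s1] = s2;
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] {s1, s2});
        while (!stack.isEmpty()) {
            int[] pq = stack.pop();
            for (String c : sigma) {
                int p = step(a, b, pq[0], c), q = step(a, b, pq[1], c);
                int rp = find(parent, p), rq = find(parent, q);
                if (rp == rq) continue;           // already in the same class
                if (!output(a, b, p).equals(output(a, b, q))) return false;
                parent[rp] = rq;                  // merge the two classes
                stack.push(new int[] {p, q});
            }
        }
        return true;
    }

    private static int step(Dfa a, Dfa b, int s, String c) {
        int na = a.out().size(), dead = na + b.out().size();
        if (s == dead) return dead;
        Integer t = s < na ? a.delta().get(s).get(c) : b.delta().get(s - na).get(c);
        if (t == null) return dead;               // missing transition -> sink
        return s < na ? t : na + t;
    }

    private static Set<String> output(Dfa a, Dfa b, int s) {
        int na = a.out().size();
        if (s == na + b.out().size()) return Set.of();
        return s < na ? a.out().get(s) : b.out().get(s - na);
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {                  // path halving
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    public static void main(String[] args) {
        // DFA of O1^T: {O1} -f-> {O3} -h-> {O7,O9};  {O1} -g-> {O5} -k-> {O11}
        Dfa d1 = new Dfa(0,
            List.of(Map.of("f", 1, "g", 2), Map.of("h", 3), Map.of("k", 4), Map.of(), Map.of()),
            List.of(Set.of("T"), Set.of("U"), Set.of("X"), Set.of("Y"), Set.of("Y")));
        // DFA of O2^T: {O2} -f-> {O4} -h-> {O8};  {O2} -g-> {O6} -k-> {O8}
        Dfa d2 = new Dfa(0,
            List.of(Map.of("f", 1, "g", 2), Map.of("h", 3), Map.of("k", 3), Map.of()),
            List.of(Set.of("T"), Set.of("U"), Set.of("X"), Set.of("Y")));
        System.out.println(equivalent(d1, d2)); // true
    }
}
```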

SLIDE 52

Methodology (MAHJONG)

SLIDE 53

Overview

[Figure: the MAHJONG pipeline]
• Pre-Analysis (fast but imprecise, e.g., context-insensitive) → Field Points-to Graph (FPG)
• MAHJONG, for all $O_i^T, O_j^T$ in the FPG:
  • NFA Builder → $NFA_{O_i^T}$, $NFA_{O_j^T}$
  • DFA Converter → $DFA_{O_i^T}$, $DFA_{O_j^T}$
  • Automata Equivalence Checker: $DFA_{O_i^T} \equiv DFA_{O_j^T}$?
  • Heap Modeler → Heap Abstraction
• Heap Abstraction → Points-to Analysis (precise but expensive, e.g., 3-object-sensitive)
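The DFA Converter step can be realized with the textbook subset construction. In this sketch (hypothetical names), each DFA state is a set of objects and its output is the set of their types, so the two h-successors of $O_3^U$ collapse into one state with output {Y}. The input graph is our reading of the example figure (an assumption), and missing transitions are simply left absent.

```java
import java.util.*;

/** Subset construction from an object NFA to a DFA over sets of objects (sketch). */
final class DfaConverter {
    /** DFA states are sets of objects; out maps each state to the set of its objects' types. */
    record Dfa(Set<String> start,
               Map<Set<String>, Map<String, Set<String>>> delta,
               Map<Set<String>, Set<String>> out) {}

    static Dfa convert(Map<String, Map<String, Set<String>>> nfaDelta,
                       Map<String, String> gamma, String q0) {
        Set<String> start = Set.of(q0);
        Map<Set<String>, Map<String, Set<String>>> delta = new LinkedHashMap<>();
        Map<Set<String>, Set<String>> out = new HashMap<>();
        Deque<Set<String>> work = new ArrayDeque<>(List.of(start));
        while (!work.isEmpty()) {
            Set<String> s = work.pop();
            if (delta.containsKey(s)) continue;             // already converted
            Set<String> types = new TreeSet<>();
            for (String q : s) types.add(gamma.get(q));
            out.put(s, types);                               // output: types of the objects in s
            Map<String, Set<String>> succ = new TreeMap<>(); // field -> union of successors
            for (String q : s)
                for (var e : nfaDelta.getOrDefault(q, Map.of()).entrySet())
                    succ.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            delta.put(s, succ);
            work.addAll(succ.values());
        }
        return new Dfa(start, delta, out);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<String>>> fpg = Map.of(
            "O1", Map.of("f", Set.of("O3"), "g", Set.of("O5")),
            "O3", Map.of("h", Set.of("O7", "O9")),
            "O5", Map.of("k", Set.of("O11")));
        Map<String, String> gamma = Map.of("O1", "T", "O3", "U", "O5", "X",
                                           "O7", "Y", "O9", "Y", "O11", "Y");
        Dfa dfa = convert(fpg, gamma, "O1");
        System.out.println(dfa.delta().size());                // 5
        System.out.println(dfa.out().get(Set.of("O7", "O9"))); // [Y]
    }
}
```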

SLIDE 54

Working with Points-to Analysis

• Original: allocation-site heap abstraction
• New: MAHJONG heap abstraction

[Figure: in the MAHJONG heap abstraction, each group of type-consistent objects is represented by a single object]
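The Heap Modeler then maps every object to a representative of its class of type-consistent objects. A hedged sketch with hypothetical names: objects are bucketed by type first (the empty field path already forces equal types), and each object is compared only against existing representatives, which suffices because type-consistency is an equivalence relation. The `equivalent` predicate stands in for the automata-equivalence check.

```java
import java.util.*;
import java.util.function.BiPredicate;

/** Maps each object to the representative of its type-consistency class (sketch). */
final class HeapModeler {
    static Map<String, String> merge(Map<String, String> typeOf,
                                     BiPredicate<String, String> equivalent) {
        Map<String, List<String>> byType = new TreeMap<>();
        for (String o : new TreeSet<>(typeOf.keySet()))      // deterministic order
            byType.computeIfAbsent(typeOf.get(o), t -> new ArrayList<>()).add(o);
        Map<String, String> repOf = new HashMap<>();
        for (List<String> sameType : byType.values()) {
            List<String> reps = new ArrayList<>();            // one representative per class
            for (String o : sameType) {
                String rep = null;
                for (String r : reps)
                    if (equivalent.test(o, r)) { rep = r; break; }
                if (rep == null) { reps.add(o); rep = o; }    // o starts a new class
                repOf.put(o, rep);
            }
        }
        return repOf;
    }

    public static void main(String[] args) {
        // Objects of the type-consistency example (types as read from the figure).
        Map<String, String> typeOf = Map.of("O1", "T", "O2", "T", "O3", "U", "O4", "U",
            "O5", "X", "O6", "X", "O7", "Y", "O8", "Y", "O9", "Y", "O11", "Y");
        // Assumption: in this example every same-type pair passes the automata check.
        Map<String, String> rep = merge(typeOf, (x, y) -> true);
        System.out.println(typeOf.size() + " objects -> " + new HashSet<>(rep.values()).size());
        // prints: 10 objects -> 4
    }
}
```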

SLIDE 55

Implementation

• 1500 LOC of Java in total
• Integrated with …
• Can also be easily integrated into other points-to analysis frameworks

SLIDE 56

Evaluation

SLIDE 57

Evaluation - Research Questions

• RQ1: MAHJONG’s effectiveness as a pre-analysis
• RQ2: MAHJONG-based points-to analysis

SLIDE 58

RQ1: MAHJONG’s Effectiveness as a Pre-Analysis

• Efficiency
  • Is MAHJONG lightweight for large programs?
• Heap partitioning
  • Can MAHJONG avoid heap over-partition?

SLIDE 59

Pre-Analysis: Efficiency

Time (sec.)
Program      CI      FPG    MAHJONG   Total
antlr        44.1    1.3    1.3       46.7
fop          34.7    0.7    1.1       36.5
luindex      26.2    0.8    1.1       28.1
pmd          44.8    1.4    1.5       47.7
chart        37.7    2.4    1.9       42.0
checkstyle   89.6    2.3    4.0       95.9
xalan        66.6    3.0    3.1       72.7
bloat        38.7    1.2    1.7       41.6
lusearch     41.4    0.8    1.0       43.2
JPC          58.9    2.1    4.5       65.5
findbugs     90.6    4.6    3.2       98.4
eclipse      174.1   15.5   21.4      211.0

Each program (on average): about 1 minute in total; MAHJONG itself: 3.8 seconds

CI: context-insensitive points-to analysis; FPG: reading the field points-to graph; MAHJONG: checking automata equivalence and building the heap abstraction
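The summary line can be checked against the table. This short sketch (array names are ours) re-adds each row and averages the columns: MAHJONG itself averages about 3.8 s, and the whole pre-analysis about 69 s, i.e. roughly one minute, per program.

```java
import java.util.Arrays;

final class PreAnalysisTimes {
    // Seconds per program, in table order: antlr, fop, luindex, pmd, chart, checkstyle,
    // xalan, bloat, lusearch, JPC, findbugs, eclipse.
    static final double[] CI      = {44.1, 34.7, 26.2, 44.8, 37.7, 89.6, 66.6, 38.7, 41.4, 58.9, 90.6, 174.1};
    static final double[] FPG     = {1.3, 0.7, 0.8, 1.4, 2.4, 2.3, 3.0, 1.2, 0.8, 2.1, 4.6, 15.5};
    static final double[] MAHJONG = {1.3, 1.1, 1.1, 1.5, 1.9, 4.0, 3.1, 1.7, 1.0, 4.5, 3.2, 21.4};
    static final double[] TOTAL   = {46.7, 36.5, 28.1, 47.7, 42.0, 95.9, 72.7, 41.6, 43.2, 65.5, 98.4, 211.0};

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        for (int i = 0; i < TOTAL.length; i++) {
            double sum = CI[i] + FPG[i] + MAHJONG[i];
            if (Math.abs(sum - TOTAL[i]) > 0.05)         // tolerance for one-decimal rounding
                System.out.println("row " + i + " does not add up");
        }
        System.out.printf("MAHJONG mean: %.1f s%n", mean(MAHJONG)); // 3.8 s
        System.out.printf("Total mean:   %.1f s%n", mean(TOTAL));   // 69.1 s
    }
}
```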

SLIDE 60

Pre-Analysis: Heap Partition

[Chart: number of abstract objects created by the allocation-site abstraction and by MAHJONG, per program]
Allocation-site abstraction: 7729, 7159, 6190, 7363, 8106, 14337, 6523, 7807, 10888, 11181, 14063, 19529
MAHJONG: 2228, 2474, 2108, 2727, 3107, 5285, 2229, 2942, 4028, 4142, 5233, 9414

Average reduction: 62%
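As a check on the 62% figure, this sketch (names are ours; it assumes both series list the programs in the same order) computes the reduction two ways. The pooled ratio over all twelve programs gives about 62%, matching the slide, while the unweighted mean of per-program reductions is a little above 63%, so the slide's number appears to be the pooled one.

```java
final class ObjectReduction {
    // Abstract-object counts per program, in the order the chart lists them.
    static final int[] ALLOC_SITE = {7729, 7159, 6190, 7363, 8106, 14337, 6523, 7807, 10888, 11181, 14063, 19529};
    static final int[] MAHJONG    = {2228, 2474, 2108, 2727, 3107, 5285, 2229, 2942, 4028, 4142, 5233, 9414};

    /** 1 - (total MAHJONG objects / total allocation-site objects). */
    static double pooledReduction() {
        long a = 0, m = 0;
        for (int i = 0; i < ALLOC_SITE.length; i++) { a += ALLOC_SITE[i]; m += MAHJONG[i]; }
        return 1.0 - (double) m / a;
    }

    /** Unweighted mean of the per-program reductions. */
    static double meanReduction() {
        double sum = 0;
        for (int i = 0; i < ALLOC_SITE.length; i++)
            sum += 1.0 - (double) MAHJONG[i] / ALLOC_SITE[i];
        return sum / ALLOC_SITE.length;
    }

    public static void main(String[] args) {
        System.out.printf("pooled: %.1f%%, mean: %.1f%%%n",
            100 * pooledReduction(), 100 * meanReduction());
    }
}
```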

SLIDE 61

RQ2: MAHJONG-Based Points-to Analysis

• Efficiency
  • Can MAHJONG accelerate points-to analysis?
• Precision
  • Can MAHJONG preserve precision for type-dependent clients?

SLIDE 62

Evaluated Points-to Analyses

• 5 mainstream context-sensitive points-to analyses:
  1. 2-call-site-sensitive analysis
  2. 2-type-sensitive analysis
  3. 3-type-sensitive analysis
  4. 2-object-sensitive analysis
  5. 3-object-sensitive analysis
• Time budget: 5 hours

SLIDE 63

Evaluated Clients

• Call graph construction
• Devirtualization
• May-fail casting

SLIDE 64

MAHJONG-Based Points-to Analysis: Results

• Most precise analysis (3-object-sensitive)
  • Efficiency: speedup 131X
  • Precision: call graph -0.02%, devirtualization -0.29%, may-fail casting -0%

SLIDE 65

MAHJONG-Based Points-to Analysis: Results

• Most precise analysis (3-object-sensitive)
  • Efficiency: speedup 131X
  • Precision: call graph -0.02%, devirtualization -0.29%, may-fail casting -0%
• On average
  • Efficiency: speedup 15X
  • Precision: call graph -0.02%, devirtualization -0.18%, may-fail casting -0.03%

SLIDE 66

MAHJONG-Based Points-to Analysis: Results

• Most precise analysis (3-object-sensitive)
  • Efficiency: speedup 131X
  • Precision: call graph -0.02%, devirtualization -0.29%, may-fail casting -0%
• On average
  • Efficiency: speedup 15X
  • Precision: call graph -0.02%, devirtualization -0.18%, may-fail casting -0.03%

For checkstyle, xalan, lusearch, JPC and findbugs under 3-object-sensitive analysis:
  • without MAHJONG: unscalable (> 5 hours)
  • with MAHJONG: finishes in 1 to 84 minutes (33 minutes on average)

SLIDE 67

Conclusion

• MAHJONG
  • Significantly improves the efficiency of different points-to analyses
    • Call-site-, object- and type-sensitivity
  • Preserves almost the same precision for type-dependent clients
• Direct impact
  • Benefits many program analyses where call graphs are required

SLIDE 68

Thank you!