


SLIDE 1

Scaling Java Points-To Analysis Using SPARK

(Soot Pointer Analysis Research Kit)

Ondřej Lhoták and Laurie Hendren

Sable Research Group
McGill University
April 8th, 2003

– p. 1/53

SLIDE 2

Problems

Implementing a points-to analysis to handle the details of Java
  • is a lot of work.
  • is difficult to do correctly.

Research done on disparate implementations is often incomparable.

– p. 2/53

SLIDE 3

Objectives

  • Develop a flexible, efficient framework for experimenting with variations in Java points-to analyses
  • Demonstrate its usefulness with an empirical comparison of precision and efficiency of some of these variations

– p. 3/53

SLIDE 4

Outline

  • Spark overview
  • Empirical study
  • Overall performance
  • Uses of Spark
  • Conclusion

– p. 4/53

SLIDE 5

Spark overview

  • Part of Soot bytecode transformation and annotation framework [CC 00] [CC 01]
  • Initial representation is Soot’s Jimple
      • Typed [SAS 00]
      • Three-address (only simple operations)
  • Spark internal representation is Pointer Assignment Graph (PAG)
      • Nodes for variables, allocation sites, field references
      • Edges representing subset constraints

– p. 5/53
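The PAG described above can be pictured as a small graph data structure. The following is only a minimal sketch: the class, enum, and method names are hypothetical and are not taken from Spark's source. It models the three node kinds from the slide and treats each edge as a subset constraint between the points-to sets of its endpoints.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Spark's actual class layout.
final class PagSketch {
    // The three node kinds named on the slide.
    enum NodeKind { VARIABLE, ALLOCATION_SITE, FIELD_REFERENCE }

    static final class Node {
        final NodeKind kind;
        final String name;
        Node(NodeKind kind, String name) { this.kind = kind; this.name = name; }
    }

    // An edge src -> dst records that whatever src may point to, dst may point to as well.
    static final class Edge {
        final Node src;
        final Node dst;
        Edge(Node src, Node dst) { this.src = src; this.dst = dst; }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Node addNode(NodeKind kind, String name) {
        Node n = new Node(kind, name);
        nodes.add(n);
        return n;
    }

    void addEdge(Node src, Node dst) {
        edges.add(new Edge(src, dst));
    }

    // Builds the graph for:  x = new A();  y = x;  y.f = x;
    static PagSketch example() {
        PagSketch pag = new PagSketch();
        Node newA = pag.addNode(NodeKind.ALLOCATION_SITE, "new A");
        Node x = pag.addNode(NodeKind.VARIABLE, "x");
        Node y = pag.addNode(NodeKind.VARIABLE, "y");
        Node yf = pag.addNode(NodeKind.FIELD_REFERENCE, "y.f");
        pag.addEdge(newA, x); // allocation
        pag.addEdge(x, y);    // assignment
        pag.addEdge(x, yf);   // store into a field
        return pag;
    }

    public static void main(String[] args) {
        PagSketch pag = example();
        System.out.println(pag.nodes.size() + " nodes, " + pag.edges.size() + " edges");
    }
}
```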

SLIDE 6

Spark overview

Spark proceeds in three steps:

Jimple → Construct PAG → Simplify PAG → Propagate → Points-to Sets

  • Analysis variations expressed by building different PAGs for the same code
  • This talk concentrates on flow-insensitive, subset-based variations

– p. 6/53

SLIDE 7

Empirical study

  • Factors affecting precision
      • Enforcing declared types
      • Field reference representation
      • Call graph construction
  • Factors affecting only efficiency
      • Pointer assignment graph simplification
      • Set implementation
      • Propagation algorithms

– p. 7/53

SLIDE 10

Declared types: ignore

Hierarchy: A B C

    A x, z;
    B y;
    x = new A();
    y = new B();
    y = (B) x;
    z = y;

Points-to sets:

    x : A    { A }
    y : B    { A, B }
    z : A    { A, B }

– p. 8/53

SLIDE 11

Declared types: enforce after analysis

[OOPSLA 00] [Rountev,Milanova,Ryder 01]

Hierarchy: A B C

    A x, z;
    B y;
    x = new A();
    y = new B();
    y = (B) x;
    z = y;

Points-to sets:

    x : A    { A }
    y : B    { A, B }, filtered to { B } afterwards (an A object is not a B)
    z : A    { A, B }

– p. 9/53

SLIDE 12

Declared types: enforce during analysis

Hierarchy: A B C

    A x, z;
    B y;
    x = new A();
    y = new B();
    y = (B) x;
    z = y;

Points-to sets:

    x : A    { A }
    y : B    { B }    (A is rejected: an A object is not a B)
    z : A    { B }    (A never reaches z through y)

– p. 10/53
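The three ways of treating declared types can be compared on the slide's example with a small fixed-point solver. This is a sketch under stated assumptions, not Spark's implementation: objects are identified by their class name, the only subtype fact used is that B extends A (which the cast and the assignment `z = y` require), and all identifiers are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: objects are named by their class; only "B extends A" is assumed.
final class DeclaredTypesSketch {
    enum Policy { IGNORE, AFTER, DURING }

    // Declared types from the slide: A x, z; B y;
    static final Map<String, String> DECLARED = Map.of("x", "A", "y", "B", "z", "A");

    // True if an object of class objType may be held by a variable declared as varType.
    static boolean compatible(String objType, String varType) {
        return objType.equals(varType) || (objType.equals("B") && varType.equals("A"));
    }

    static Map<String, Set<String>> analyze(Policy policy) {
        Map<String, Set<String>> pts = new LinkedHashMap<>();
        for (String v : List.of("x", "y", "z")) pts.put(v, new TreeSet<>());
        pts.get("x").add("A"); // x = new A();
        pts.get("y").add("B"); // y = new B();
        // Assignment edges {source, target}:  y = (B) x;  z = y;
        String[][] edges = { {"x", "y"}, {"y", "z"} };
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] e : edges) {
                for (String obj : List.copyOf(pts.get(e[0]))) {
                    // DURING: reject objects the target variable could never hold.
                    boolean ok = policy != Policy.DURING || compatible(obj, DECLARED.get(e[1]));
                    if (ok && pts.get(e[1]).add(obj)) changed = true;
                }
            }
        }
        if (policy == Policy.AFTER) {
            // AFTER: filter each finished set by the declared type of its variable.
            for (Map.Entry<String, Set<String>> en : pts.entrySet()) {
                en.getValue().removeIf(obj -> !compatible(obj, DECLARED.get(en.getKey())));
            }
        }
        return pts;
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) System.out.println(p + " " + analyze(p));
    }
}
```

In this sketch, ignoring types leaves A in the sets of both y and z; filtering afterwards removes A from y but not from z; filtering during propagation keeps A out of both.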

SLIDE 13

Enforcing declared types

  • ignoring types produces many large sets (> 1000 elements) of spurious points-to relationships
  • in practice, enforcing types after analysis almost as precise as during analysis
  • enforcing types during analysis prevents blowup during the analysis

    ignore             slow    less precise
    after analysis     slow    more precise
    during analysis    fast    more precise

– p. 11/53

SLIDE 15

Field representation

Field references can be represented in different ways:
  • field-sensitive distinguishes fields of different objects
  • field-based ignores the base object, grouping all objects having the field together

– p. 13/53

SLIDE 16

Field-sensitive representation

    A x, y, z;
    B u, v, w;
    l1: x = new A();
        y = x;
    l2: z = new A();
        u = new B();
        x.f = u;
        v = y.f;
        w = z.f;

Points-to sets:

    x       { A1 }
    y       { A1 }
    z       { A2 }
    u       { B }
    A1.f    { B }
    v       { B }
    A2.f    { }
    w       { }

– p. 14/53

SLIDE 17

Field-based representation

    A x, y, z;
    B u, v, w;
    l1: x = new A();
        y = x;
    l2: z = new A();
        u = new B();
        x.f = u;
        v = y.f;
        w = z.f;

Points-to sets:

    x      { A1 }
    y      { A1 }
    z      { A2 }
    u      { B }
    A.f    { B }
    v      { B }
    w      { B }

– p. 15/53
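The difference between the two representations can be reproduced on the slide's example with a few lines of Java. This is a hedged sketch with hypothetical names, not Spark code: points-to sets live in a map keyed by node name, a field-sensitive field node is keyed by allocation site (`A1.f`, `A2.f`), and the field-based variant uses the single hard-coded node `A.f`.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; the statements of the slide's example are hard-coded.
final class FieldRepresentationSketch {
    static Set<String> pts(Map<String, Set<String>> m, String key) {
        return m.computeIfAbsent(key, k -> new TreeSet<>());
    }

    // Adds pts(src) to pts(dst); returns true if pts(dst) grew.
    static boolean flow(Map<String, Set<String>> m, String src, String dst) {
        return pts(m, dst).addAll(pts(m, src));
    }

    // base.f = src
    static boolean store(Map<String, Set<String>> m, String base, String src, boolean fieldSensitive) {
        if (!fieldSensitive) return flow(m, src, "A.f"); // one node for the field, base ignored
        boolean changed = false;
        for (String site : pts(m, base)) changed |= flow(m, src, site + ".f");
        return changed;
    }

    // dst = base.f
    static boolean load(Map<String, Set<String>> m, String dst, String base, boolean fieldSensitive) {
        if (!fieldSensitive) return flow(m, "A.f", dst);
        boolean changed = false;
        for (String site : pts(m, base)) changed |= flow(m, site + ".f", dst);
        return changed;
    }

    static Map<String, Set<String>> analyze(boolean fieldSensitive) {
        Map<String, Set<String>> m = new TreeMap<>();
        pts(m, "x").add("A1"); // l1: x = new A();
        pts(m, "z").add("A2"); // l2: z = new A();
        pts(m, "u").add("B");  //     u = new B();
        boolean changed = true;
        while (changed) { // iterate to a fixed point
            changed = flow(m, "x", "y");                    // y = x;
            changed |= store(m, "x", "u", fieldSensitive);  // x.f = u;
            changed |= load(m, "v", "y", fieldSensitive);   // v = y.f;
            changed |= load(m, "w", "z", fieldSensitive);   // w = z.f;
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println("field-sensitive: " + analyze(true));
        System.out.println("field-based:     " + analyze(false));
    }
}
```

Under these assumptions, w stays empty in the field-sensitive run because nothing is ever stored into A2.f, while the field-based run reports that w may point to B.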

SLIDE 18

Field representation

  • Field-sensitive requires iterating
  • Field-based less precise, but possible in a single iteration
  • Clever propagation algorithm can make speed difference very small

    field-based        very fast         less precise
    field-sensitive    almost as fast    more precise

– p. 16/53

SLIDE 20

Call graph construction

  • An approximation of the call graph is required for points-to analysis
  • It can be built
      • ahead-of-time using an analysis such as Class Hierarchy Analysis
      • on-the-fly during the analysis as actual types of receivers are computed

– p. 18/53

SLIDE 21

Call graph construction: CHA

Hierarchy: A B C

    class B { foo() { . . . } }
    class C { foo() { . . . } }
    A x = new B();
    A y = x.foo();

[Diagram: x with points-to set { B }; y; this and return nodes for B.foo() and for C.foo().]

– p. 19/53

SLIDE 22

Call graph construction: on-the-fly

Hierarchy: A B C

    class B { foo() { . . . } }
    class C { foo() { . . . } }
    A x = new B();
    A y = x.foo();

[Diagram: x with points-to set { B }; y; this and return nodes for B.foo() and for C.foo().]

– p. 20/53

SLIDE 24

Call graph construction

  • Building call graph on-the-fly
      • requires adding edges during propagation
      • requires more iteration
      • reduces simplification opportunities before propagation
  • CHA call graph includes more spurious, unreachable methods than on-the-fly

    CHA           fast    less precise
    on-the-fly    slow    more precise

– p. 22/53
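The contrast between the two call graph strategies can be illustrated for the call `x.foo()` from the earlier example. The sketch below rests on assumptions that the transcript does not confirm: B and C are both taken to be subclasses of A, method lookup ignores inherited implementations, and every identifier is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch. Assumed hierarchy: B and C both extend A, and each declares foo().
final class CallGraphSketch {
    // Subtypes (including the type itself) of each class in the assumed hierarchy.
    static final Map<String, List<String>> SUBTYPES = Map.of(
            "A", List.of("A", "B", "C"),
            "B", List.of("B"),
            "C", List.of("C"));

    // Classes that declare foo(), as on the slide.
    static final Set<String> DECLARES_FOO = Set.of("B", "C");

    // CHA: every subtype of the receiver's declared type is a possible target.
    static Set<String> chaTargets(String declaredType) {
        Set<String> targets = new TreeSet<>();
        for (String t : SUBTYPES.get(declaredType)) {
            if (DECLARES_FOO.contains(t)) targets.add(t + ".foo()");
        }
        return targets;
    }

    // On-the-fly: only the run-time types found in the receiver's points-to set count.
    static Set<String> onTheFlyTargets(Set<String> receiverPointsTo) {
        Set<String> targets = new TreeSet<>();
        for (String t : receiverPointsTo) {
            if (DECLARES_FOO.contains(t)) targets.add(t + ".foo()");
        }
        return targets;
    }

    public static void main(String[] args) {
        // A x = new B();  A y = x.foo();
        System.out.println("CHA:        " + chaTargets("A"));
        System.out.println("on-the-fly: " + onTheFlyTargets(Set.of("B")));
    }
}
```

With these assumptions, CHA reports both B.foo() and C.foo() as targets, while the on-the-fly variant reports only B.foo() because the points-to set of x contains only B.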

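SLIDE 29

Pointer assignment graph simplification

Groups of nodes can be merged [Rountev,Chandra 00]
  • strongly-connected components
  • single-entry subgraphs

[Diagram: strongly-connected component example in which nodes a, b, c, d, e, f are simplified to a, bcde, f; single-entry subgraph example in which nodes a, b, c, d, e, f, g, h, i are simplified to a, bcdefg, h, i.]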

– p. 24/53
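Merging strongly-connected components can be sketched with Tarjan's algorithm, used here as one standard way to find the components; the slides do not say which algorithm Spark uses. The edge set in `exampleGraph` is an assumption chosen so that b, c, d, e form a cycle, since the edges of the slide's diagram are not recoverable, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of SCC merging using Tarjan's algorithm (recursive form).
final class SccMergeSketch {
    private final Map<String, List<String>> succ;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> low = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Map<String, String> merged = new HashMap<>();
    private int counter = 0;

    private SccMergeSketch(Map<String, List<String>> succ) { this.succ = succ; }

    // Maps every node to the name of the merged node that replaces it.
    static Map<String, String> merge(Map<String, List<String>> succ) {
        SccMergeSketch s = new SccMergeSketch(succ);
        for (String n : succ.keySet()) {
            if (!s.index.containsKey(n)) s.visit(n);
        }
        return s.merged;
    }

    private void visit(String n) {
        index.put(n, counter);
        low.put(n, counter);
        counter++;
        stack.push(n);
        for (String m : succ.getOrDefault(n, List.of())) {
            if (!index.containsKey(m)) {
                visit(m);
                low.put(n, Math.min(low.get(n), low.get(m)));
            } else if (!merged.containsKey(m)) { // m is still on the stack
                low.put(n, Math.min(low.get(n), index.get(m)));
            }
        }
        if (low.get(n).equals(index.get(n))) { // n is the root of a component
            TreeSet<String> members = new TreeSet<>();
            String m;
            do {
                m = stack.pop();
                members.add(m);
            } while (!m.equals(n));
            String name = String.join("", members);
            for (String x : members) merged.put(x, name);
        }
    }

    // Assumed edges: a -> b -> c -> d -> e -> b (a cycle), and e -> f.
    static Map<String, List<String>> exampleGraph() {
        return Map.of(
                "a", List.of("b"), "b", List.of("c"), "c", List.of("d"),
                "d", List.of("e"), "e", List.of("b", "f"), "f", List.of());
    }

    public static void main(String[] args) {
        System.out.println(new java.util.TreeMap<>(merge(exampleGraph())));
    }
}
```

All nodes in a cycle must end up with the same points-to set under subset constraints, which is why collapsing them into one node loses no precision.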

SLIDE 30

Pointer assignment graph simplification

Factors limiting simplification opportunities
  • Enforcing declared types changes points-to sets
  • On-the-fly call graph eliminates edges from initial pointer assignment graph

– p. 25/53

SLIDE 31

Pointer assignment graph simplification

– p. 26/53

SLIDE 33

Set implementation

    hash      Using java.util.HashSet
    array     Sorted array, binary search
    bit       Bit vector
    hybrid    Array for small sets, bit vector for large sets

[Diagram: a sorted array holding a, b, d; a bit vector with one position per element a through z, with the bits of the members set to 1.]

– p. 28/53
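A hybrid set of the kind listed above might look roughly like the following. It is a sketch under assumptions rather than Spark's class: elements are taken to be small non-negative integers (for example, allocation-site numbers), the switch-over threshold of 16 is an arbitrary illustrative value, and the class and method names are hypothetical. It relies only on `java.util.Arrays.binarySearch` and `java.util.BitSet` from the standard library.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: a sorted array while the set is small, a bit vector once it grows.
final class HybridSetSketch {
    static final int THRESHOLD = 16; // assumed switch-over size, not a value from the talk

    private int[] small = new int[THRESHOLD]; // sorted; only the first `size` entries are used
    private int size = 0;
    private BitSet large = null;              // non-null once the set has outgrown the array

    boolean contains(int e) {
        if (large != null) return large.get(e);
        return Arrays.binarySearch(small, 0, size, e) >= 0;
    }

    // Adds e; returns true if the set changed.
    boolean add(int e) {
        if (large != null) {
            if (large.get(e)) return false;
            large.set(e);
            return true;
        }
        int pos = Arrays.binarySearch(small, 0, size, e);
        if (pos >= 0) return false;       // already present
        if (size == THRESHOLD) {          // array is full: convert to a bit vector
            large = new BitSet();
            for (int i = 0; i < size; i++) large.set(small[i]);
            small = null;
            large.set(e);
            return true;
        }
        int insertAt = -pos - 1;          // binarySearch encodes the insertion point
        System.arraycopy(small, insertAt, small, insertAt + 1, size - insertAt);
        small[insertAt] = e;
        size++;
        return true;
    }

    int size() { return large != null ? large.cardinality() : size; }

    boolean usesBitVector() { return large != null; }

    public static void main(String[] args) {
        HybridSetSketch s = new HybridSetSketch();
        for (int i = 20; i >= 0; i--) s.add(i * 2);
        System.out.println(s.size() + " elements, bit vector in use: " + s.usesBitVector());
    }
}
```

The design choice mirrors the slide: most points-to sets stay small and are cheap as arrays, while the few large ones benefit from the bit vector's fast membership tests and unions.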

SLIDE 34

Set implementation

    hash      slow    large
    array     slow    small
    bit       fast    large
    hybrid    fast    small

In the above table,
  • slow is up to 100 times slower than fast
  • large is up to 3 times larger than small

Set implementation is very important

– p. 29/53

SLIDE 36

Propagation algorithms: iterative

    repeat
        for each edge e
            propagate along e;
        end for
    until no change

Slightly more complicated to handle
  • field references
  • on-the-fly call graph

– p. 31/53
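The iterative pseudocode translates almost line for line into Java when only simple assignment edges are considered. The sketch below leaves out field references and on-the-fly call graph edges, and its names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the iterative algorithm for simple (non-field) edges only.
final class IterativePropagationSketch {
    // edges: each String[] is {source, target}; pts: initial points-to sets, mutated in place.
    // Returns the number of passes over the edge list, including the last pass that finds no change.
    static int propagate(List<String[]> edges, Map<String, Set<String>> pts) {
        int passes = 0;
        boolean changed;
        do {                                   // repeat
            changed = false;
            passes++;
            for (String[] e : edges) {         // for each edge e
                Set<String> src = pts.computeIfAbsent(e[0], k -> new TreeSet<>());
                Set<String> dst = pts.computeIfAbsent(e[1], k -> new TreeSet<>());
                if (dst.addAll(src)) changed = true; // propagate along e
            }
        } while (changed);                     // until no change
        return passes;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> pts = new HashMap<>();
        pts.put("a", new TreeSet<>(List.of("o1")));
        // Edges listed against the direction of flow, so several passes are needed.
        List<String[]> edges = List.of(new String[] {"b", "c"}, new String[] {"a", "b"});
        int passes = propagate(edges, pts);
        System.out.println(passes + " passes, c = " + pts.get("c"));
    }
}
```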

SLIDE 38

Propagation algorithms: worklist

    while worklist not empty do
        remove node n from worklist;
        for each edge e starting at n
            propagate along e;
            add all affected nodes to worklist;
        end for
    end while

  • With field references, difficult to determine affected nodes
  • Very costly to determine all affected nodes due to aliasing

– p. 32/53
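For simple edges, where the affected node is just the target of the edge, the worklist pseudocode can be sketched as below. Field references, which the slide identifies as the hard case, are deliberately left out, and the names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the worklist algorithm for simple (non-field) edges only.
final class WorklistPropagationSketch {
    // succ: outgoing edges of each node; pts: initial points-to sets, mutated in place.
    static void propagate(Map<String, List<String>> succ, Map<String, Set<String>> pts) {
        // Start from the nodes that already have something to propagate.
        Deque<String> worklist = new ArrayDeque<>(pts.keySet());
        while (!worklist.isEmpty()) {
            String n = worklist.remove();
            Set<String> src = pts.computeIfAbsent(n, k -> new TreeSet<>());
            for (String m : succ.getOrDefault(n, List.of())) {
                Set<String> dst = pts.computeIfAbsent(m, k -> new TreeSet<>());
                // The target is affected only if its set actually grew.
                if (dst.addAll(src) && !worklist.contains(m)) worklist.add(m);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> succ = Map.of(
                "a", List.of("b"), "b", List.of("c"), "c", List.of("a"));
        Map<String, Set<String>> pts = new HashMap<>();
        pts.put("a", new TreeSet<>(List.of("o1")));
        propagate(succ, pts);
        System.out.println(new java.util.TreeMap<>(pts));
    }
}
```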

SLIDE 39

Propagation algorithms: worklist

    repeat
        while worklist not empty do
            remove node n from worklist;
            for each edge e starting at n
                propagate along e;
                add most affected nodes to worklist;
            end for
        end while
        propagate along all field reference edges;
    until no change

Solution: find most affected nodes, and add outer loop to handle missed nodes

– p. 33/53

SLIDE 43

Propagation algorithms: incremental worklist

[Diagram: set x holds A B C D, then E; set y receives A B C D, then E.]

    1st iteration: propagate { A, B, C, D }
    add E to x
    2nd iteration: propagate { A, B, C, D, E }

– p. 37/53

SLIDE 49

Propagation algorithms: incremental worklist

Idea: split sets into new and old part

[Diagram: sets x and y, each split into an old and a new part; both end up holding A B C D E.]

    1st iteration: propagate { A, B, C, D }
    flush new to old
    add E to x
    2nd iteration: propagate { E }
    flush new to old

– p. 43/53
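The old/new split can be sketched as follows. This is an illustrative reading of the idea with hypothetical names, not Spark's data structure: each set keeps the elements it has already propagated separately from those still pending, and only the pending part is sent along edges before being flushed into the old part.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an incremental worklist over simple edges.
final class IncrementalWorklistSketch {
    // A points-to set split into an already-propagated part and a pending part.
    static final class SplitSet {
        final Set<String> oldPart = new TreeSet<>();
        final Set<String> newPart = new TreeSet<>();

        boolean add(String e) { return !oldPart.contains(e) && newPart.add(e); }

        void flush() { oldPart.addAll(newPart); newPart.clear(); }
    }

    final Map<String, SplitSet> sets = new HashMap<>();
    final Map<String, List<String>> succ = new HashMap<>();
    final List<Set<String>> propagated = new ArrayList<>(); // log of what was sent along edges

    SplitSet set(String node) { return sets.computeIfAbsent(node, k -> new SplitSet()); }

    void addEdge(String from, String to) {
        succ.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Runs the worklist from `start`, sending only the new part of each set along its edges.
    void propagateFrom(String start) {
        Deque<String> worklist = new ArrayDeque<>(List.of(start));
        while (!worklist.isEmpty()) {
            String n = worklist.remove();
            SplitSet src = set(n);
            if (src.newPart.isEmpty()) continue;
            Set<String> delta = new TreeSet<>(src.newPart);
            for (String m : succ.getOrDefault(n, List.of())) {
                boolean changed = false;
                for (String e : delta) changed |= set(m).add(e);
                if (changed && !worklist.contains(m)) worklist.add(m);
            }
            if (succ.containsKey(n)) propagated.add(delta);
            src.flush(); // flush new to old
        }
    }

    // Replays the slide's scenario on a single edge x -> y.
    static IncrementalWorklistSketch demo() {
        IncrementalWorklistSketch s = new IncrementalWorklistSketch();
        s.addEdge("x", "y");
        for (String o : List.of("A", "B", "C", "D")) s.set("x").add(o);
        s.propagateFrom("x"); // 1st iteration
        s.set("x").add("E");
        s.propagateFrom("x"); // 2nd iteration
        return s;
    }

    public static void main(String[] args) {
        System.out.println(demo().propagated);
    }
}
```

In this sketch the second run sends only E along the edge, rather than all five elements, which is the saving the slides illustrate.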

SLIDE 50

Propagation algorithms

  • When to use worklist?
    Always, about twice as fast as iterative
  • When to use incremental worklist?
    Always, except with CHA call graph field-based analysis, in which there is not enough iteration

– p. 44/53

SLIDE 51

Summary of findings

  • Declared types should be enforced during propagation for a scalable analysis
  • Hybrid set implementation much faster than others, up to 2 orders of magnitude, with reasonable memory consumption
  • Field-based can be done in one iteration, but field-sensitive with worklist algorithm is almost as fast and slightly more precise
  • Tradeoff: On-the-fly call graph slower but more precise than ahead-of-time CHA call graph

– p. 45/53

SLIDE 53

Related studies

  • Rountev, Milanova, Ryder [OOPSLA 01]
      • 360 MHz SPARC, solver written in ML
      • version 1.1.8 library (150 KLOC)
  • Whaley, Lam [SAS 02]
      • 2 GHz Pentium, solver written in Java
      • version 1.3.1 library (500 KLOC)
      • optimistic call graph (potentially unsafe)
  • (Spark) Lhoták, Hendren [CC 03]
      • 1.67 GHz Athlon, solver written in Java
      • version 1.3.1 library (500 KLOC)

Common metric: number of methods analyzed

– p. 47/53

SLIDE 54

Overall performance: time

– p. 48/53

SLIDE 55

Overall performance: space

– p. 49/53

SLIDE 57

Uses of Spark

  • Use points-to and side-effect information in Soot analyses
  • Encode in attributes
      • for use in JITs
      • for use in program understanding
  • Experiment with points-to algorithms
      • using Spark command-line switches
      • by implementing new algorithms within Spark

– p. 51/53

SLIDE 58

Conclusions

  • Spark is a flexible and efficient framework for experimenting with variations in Java points-to analyses
  • We have demonstrated its usefulness in an empirical study of some of these variations
  • Ongoing work
      • BDD-based solvers [PLDI 03]
      • Object-sensitivity [Milanova,Rountev,Ryder 02]
      • On-the-fly cycle detection [Heintze,Tardieu 01]
      • Shared bit-vector [Heintze,Tardieu 01]

– p. 52/53

SLIDE 59

Obtaining Spark

  • Spark is part of Soot since version 1.2.4
  • Soot is available under the LGPL
    http://www.sable.mcgill.ca/soot
  • Future plans for Soot
      • Major update (version 2.0) in June 2003
      • Tutorial at PLDI

– p. 53/53