CO444H Pointer analysis Ben Livshits 1 Call Graphs Class - - PowerPoint PPT Presentation



SLIDE 1

Datalog Pointer analysis

CO444H

Ben Livshits

SLIDE 2

Call Graphs

  • Class analysis:
  • Given a reference variable x, what are the classes of the objects that x refers to at runtime?
  • We saw CHA (class hierarchy analysis) and RTA (rapid type analysis)
  • Deal with polymorphic/virtual calls: x.m()
  • Compilers: can we devirtualize a virtual call x.m()?
  • Software engineering:
  • Construct the call graph of the program
  • Why is that important in everyday development?
SLIDE 3

Features of RTA

  • RTA may evaluate a method several times
  • If new callers are discovered, the method has to be re-evaluated
  • RTA runs until the worklist is empty, at which point it has reached a fixed point and cannot resolve any new call edges to add to the call graph

SLIDE 4

RTA Revisited

RAPID TYPE ANALYSIS
RTA = call graph of only methods (no edges)
CHA = class hierarchy analysis call graph
W = worklist containing the main method
while W is not empty
    M = next method in W
    T = set of allocated types in M
    T = T U {allocated types in RTA callers of M}
    for each callsite (C) in M
        if C is a static dispatch or constructor:
            add an edge to statically resolved method
        otherwise:
            M' = methods called from M in CHA
            M' = M' intersection {methods declared in T or supertypes of T}
            add an edge from the method M to each method in M'
            add each method in M' to worklist W
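The pseudocode above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the `PARENT`, `DECLARES` and `METHODS` tables are a hypothetical toy encoding of the main/foo/bar program from the "RTA May Be Unsound" slide, and the sketch also enqueues statically resolved callees, which the pseudocode leaves implicit.

```python
from collections import deque

# Hypothetical toy program encoding (not a real analysis framework API).
PARENT = {"Object": None, "A": "Object"}                 # class -> superclass
DECLARES = {("Object", "toString"), ("A", "toString")}   # (class, method) pairs
METHODS = {
    "main": {"allocs": set(), "calls": [("static", "foo"), ("static", "bar")]},
    "foo": {"allocs": {"A"}, "calls": []},
    "bar": {"allocs": set(), "calls": [("virtual", "Object", "toString")]},
}
EMPTY = {"allocs": set(), "calls": []}   # body used for methods without source

def supertypes(cls):
    """Return cls together with all of its ancestors."""
    out = set()
    while cls is not None:
        out.add(cls)
        cls = PARENT[cls]
    return out

def cha_targets(declared, name):
    """CHA: dispatch targets over every subtype of the declared receiver type."""
    targets = set()
    for cls in PARENT:
        if declared in supertypes(cls):
            c = cls
            # Walk up to the nearest class that declares the method.
            while c is not None and (c, name) not in DECLARES:
                c = PARENT[c]
            if c is not None:
                targets.add(f"{c}.{name}")
    return targets

def rta(main="main"):
    edges = set()
    work = deque([main])
    while work:
        m = work.popleft()
        body = METHODS.get(m, EMPTY)
        # T = types allocated in m plus types allocated in its current RTA callers.
        types = set(body["allocs"])
        for caller, callee in edges:
            if callee == m:
                types |= METHODS.get(caller, EMPTY)["allocs"]
        allowed = set()
        for t in types:
            allowed |= supertypes(t)
        for call in body["calls"]:
            if call[0] == "static":
                targets = {call[1]}
            else:
                _, declared, name = call
                targets = {t for t in cha_targets(declared, name)
                           if t.split(".")[0] in allowed}
            for t in targets:
                if (m, t) not in edges:
                    edges.add((m, t))
                    work.append(t)   # a new caller forces (re-)evaluation of t
    return edges
```

On this toy input the result contains only the edges main → foo and main → bar: the type A is allocated in foo, which is neither bar nor a caller of bar, so the virtual call in bar resolves to nothing.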

SLIDE 5

Using RTA in Eclipse

SLIDE 6

RTA May Be Unsound

public static void main(String[] args){
    Object o = foo();
    bar(o);
}
public static Object foo(){
    return new A();
}
public static void bar(Object o){
    o.toString();
}

  • main calls foo, which returns an allocation of type A that is then passed as a parameter in the call to bar
  • The call edge to A.toString would be missing because neither bar nor its parents (main) allocated a type of A

SLIDE 7

Call Graph Construction: Reachability Computation

Queue worklist;
CallGraph graph;
worklist.addAtTail(main());
graph.addNode(main());
while (worklist.notEmpty()) {
    m = worklist.getFromHead();
    process_method_body(m);
}
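The loop above can be made concrete with a short Python sketch. It assumes, purely for illustration, that every call is already resolved and given as a dictionary `calls` from a method name to its callees; the loop body stands in for `process_method_body`, whose real job (resolving call sites) is not specified on the slide.

```python
from collections import deque

def reachable_methods(calls, entry="main"):
    """calls: hypothetical map from a method name to the methods its body invokes."""
    worklist = deque([entry])
    graph = {entry: set()}            # call graph: node -> set of callees
    while worklist:
        m = worklist.popleft()
        # Stand-in for process_method_body(m): add edges, enqueue unseen callees.
        for callee in calls.get(m, ()):
            graph[m].add(callee)
            if callee not in graph:
                graph[callee] = set()
                worklist.append(callee)
    return graph
```

Methods that are never reached from the entry point, such as a hypothetical `dead` method that calls `foo`, do not appear in the resulting graph at all.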

SLIDE 8

Next Steps…

  • Ingredients
  • Adding pointers
  • Adding call graphs
  • Combining those two
  • How do we mix the ingredients?
  • Can first build a call graph; then add pointers
  • Can do it all at once: we can use Datalog to represent everything, with some Datalog relations encoding intraprocedural aspects and some interprocedural

SLIDE 9

Pointer Analysis: Basics and Algorithms

SLIDE 10

Variants of Pointer Analysis

  • For C:
  • Andersen analysis
  • Steensgaard analysis
  • Pointer analysis for Java
  • How to encode these in Datalog
  • Other variants

SLIDE 11

What is the Goal of Pointer Analysis?

  • What memory locations can a pointer expression refer to?
  • Alias analysis: When do two pointer expressions refer to the same storage location?

int x;
p = &x;
q = p;

  • *p and *q alias
  • as do x and *p
  • and x and *q
SLIDE 12

Sources of Aliasing

  • Aliasing can arise for several reasons, depending on the language…
  • Pointers
  • e.g., int *p, i; p = &i;
  • Call-by-reference

void m(Object a, Object b) { … }
m(x,x); // a and b alias in m

  • Array indexing
  • int i, j, a[100];
  • i = j; // a[i] and a[j] alias

SLIDE 13

Why do we Want to Know?

  • Pointer analysis tells us what memory locations code uses or modifies
  • Useful in many analyses
  • E.g., available expressions

*p = a + b;
y = a + b;

  • If *p aliases a or b, then second computation of a+b is not redundant
  • E.g., consider constant propagation

x = 3;
*p = 4;
y = x;

  • Is y constant?
  • If *p and x do not alias each other, then yes (y is 3).
  • If *p and x always alias each other, then yes (y is 4).
  • If *p and x sometimes alias each other, then no

SLIDE 14

Pointer Analysis Dimensions

  • Intraprocedural / interprocedural
  • Flow-sensitive / flow-insensitive
  • Context-sensitive / context-insensitive
  • Definiteness: May versus must
  • Heap modelling
  • Data representation

SLIDE 15

Flow-sensitive vs. Flow-insensitive Points-To

  • Flow-sensitive pointer analysis computes for each program point what memory locations pointer expressions may refer to
  • Flow-insensitive pointer analysis computes what memory locations pointer expressions may refer to, at any time in program execution
  • Flow-sensitive pointer analysis is (traditionally) too expensive to perform for whole program
  • Flow-insensitive pointer analyses typically used for whole program analyses

SLIDE 16

Context Sensitivity

  • Also difficult, but success in scaling up to hundreds of thousands LOC
  • BDDs: see Whaley and Lam, PLDI 2004
  • Doop: Bravenboer and Smaragdakis, OOPSLA 2009

SLIDE 17

May vs. Must

  • May analysis: aliasing that may occur during execution
  • (cf. must-not alias, although often has different representation)
  • Must analysis: aliasing that must occur during execution
  • Sometimes both are useful
  • E.g., consider liveness analysis for *p = *q + 4;
  • If *p must alias x, then x in kill set for statement
  • If *q may alias y, then y in gen set for statement

SLIDE 18

Representation Options

  • Points-to pairs: first element points to the second
  • e.g., (p → b), (q → b)
  • *p and b alias, as do *q and b, as do *p and *q
  • Pairs that refer to the same memory
  • e.g., (*p,b), (*q,b), (*p,*q), (**r, b)
  • General, may be less concise than points-to pairs
  • Equivalence sets: sets that are aliases
  • e.g., {*p,*q,b}
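To see how the three representations relate, the following Python sketch derives alias sets and alias pairs from points-to pairs. It is an illustration only: the function names are hypothetical, only one level of dereference is handled, and the per-location groups are true equivalence classes only when every pointer has a single target.

```python
from itertools import combinations

def alias_sets(points_to_pairs):
    """Group the expressions that may name each pointed-to location."""
    groups = {}
    for pointer, target in points_to_pairs:
        # The location itself and every dereference reaching it are aliases.
        groups.setdefault(target, {target}).add("*" + pointer)
    return groups

def alias_pairs(points_to_pairs):
    """Expand each group into unordered alias pairs (stored as sorted tuples)."""
    pairs = set()
    for names in alias_sets(points_to_pairs).values():
        pairs |= {tuple(sorted(pair)) for pair in combinations(names, 2)}
    return pairs
```

For the pairs (p → b), (q → b) this yields the set {*p, *q, b} and the three pairs (*p, b), (*q, b), (*p, *q), matching the examples on the slide; the pair count grows quadratically with the set size, which is why pairs may be less concise.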

SLIDE 19

Modeling Memory Locations

  • We want to describe what memory locations a pointer expression may refer to
  • How do we model memory locations?
  • For global variables, no trouble, use a single “node”
  • For local variables, use a single “node” per context
  • i.e., just one node if context insensitive
  • For dynamically allocated memory
  • Problem: Potentially unbounded locations created at runtime
  • Need to model locations with some finite abstraction

SLIDE 20

Modeling Dynamic Memory Locations

  • For each allocation statement, use one node per context
  • Note: could choose context-sensitivity for modelling heap locations to be less precise than context-sensitivity for modelling procedure invocation
  • Other solutions:
  • One node for entire heap
  • One node for each type
  • Nodes based on analysis of “shape” of heap

SLIDE 21

Problem Statement

  • Let’s consider flow-insensitive may pointer analysis
  • Assume program consists of statements of form

p = &a (address of, includes allocation statements)
p = q
*p = q
p = *q

  • Assume pointers p,q ∈ P and address-taken variables a,b ∈ A are disjoint
  • Can transform program to make this true
  • For any variable v for which this isn’t true, add statement p_v = &a_v, and replace v with *p_v
  • Want to compute relation pts : P ∪ A → 2^A
  • Essentially points-to pairs

SLIDE 22

Andersen-style Pointer Analysis

  • View pointer assignments as subset constraints
  • Use constraints to propagate points-to information
  • Called inclusion-based pointer analysis

SLIDE 23

Andersen-style Pointer Analysis

  • Can solve these constraints directly on sets pts(p)
  • p = &a;    p ⊇ {a}
  • q = p;     q ⊇ p
  • p = &b;    p ⊇ {b}
  • r = p;     r ⊇ p
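A minimal sketch of solving these base and simple constraints by repeated propagation, assuming a hypothetical encoding where `base` holds (p, a) for `p = &a` and `subset` holds (dst, src) for `dst = src`. It is a naive fixed-point loop, not the worklist algorithm.

```python
def solve_simple(base, subset):
    """base: (pointer, location) pairs; subset: (dst, src) pairs meaning dst ⊇ src."""
    pts = {}
    for p, a in base:
        pts.setdefault(p, set()).add(a)
    changed = True
    while changed:                      # iterate until no set grows
        changed = False
        for dst, src in subset:
            before = len(pts.setdefault(dst, set()))
            pts[dst] |= pts.get(src, set())
            changed |= len(pts[dst]) != before
    return pts
```

For the four statements on the slide, all of p, q and r end up with {a, b}. Because the analysis is flow-insensitive, q receives b even though `q = p` appears before `p = &b`.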

SLIDE 24

Example of Subset Constraints

SLIDE 25

How Precise Is This Analysis?

SLIDE 26

Andersen-style as Graph Closure

  • Can be cast as a graph closure problem
  • One node for each pts(p), pts(a)
  • Each node has an associated points-to set
  • Compute transitive closure of graph, and add edges according to complex constraints

SLIDE 27

Work List Algorithm

  • Initialize graph and points-to sets using base and simple constraints
  • Let W = { v | pts(v) ≠ ∅ } (all nodes with non-empty points-to sets)
  • While W not empty
      • v ← select from W
      • for each a ∈ pts(v) do
          • for each constraint p ⊇ *v
              • add edge a → p, and add a to W if edge is new
          • for each constraint *v ⊇ q
              • add edge q → a, and add q to W if edge is new
      • for each edge v → q do
          • pts(q) = pts(q) ∪ pts(v), and add q to W if pts(q) changed
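The worklist algorithm can be sketched in Python as below. The input encoding (four lists of pairs, one per statement form) and the function name are assumptions made for this illustration; an edge x → y in `succ` means pts(y) must include pts(x).

```python
from collections import defaultdict, deque

def andersen(addr_of, copies, loads, stores):
    """Hypothetical input encoding:
    addr_of (p, a): p = &a      copies (p, q): p = q
    loads   (p, q): p = *q      stores (p, q): *p = q"""
    pts = defaultdict(set)
    succ = defaultdict(set)          # edge x -> y: pts(y) must include pts(x)
    for p, a in addr_of:             # base constraints
        pts[p].add(a)
    for p, q in copies:              # simple constraints
        succ[q].add(p)
    work = deque(pts)                # nodes with a non-empty points-to set
    while work:
        v = work.popleft()
        for a in list(pts[v]):
            for dst, ptr in loads:       # complex constraint dst ⊇ *v
                if ptr == v and dst not in succ[a]:
                    succ[a].add(dst)
                    work.append(a)
            for ptr, src in stores:      # complex constraint *v ⊇ src
                if ptr == v and a not in succ[src]:
                    succ[src].add(a)
                    work.append(src)
        for q in list(succ[v]):          # propagate along existing edges
            if not pts[v] <= pts[q]:
                pts[q] |= pts[v]
                work.append(q)
    return {v: set(s) for v, s in pts.items() if s}
```

For the program `p = &a; q = &b; *p = q; r = *p; s = r`, the store adds the edge q → a and the load adds a → r, so a, r and s all end up pointing to b.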

SLIDE 28

Same Example, as A Graph (Initial)

W: p q r s a

SLIDE 29

Same Example, as A Graph (Final)

W: {}

SLIDE 30

Cycle Elimination

  • Andersen-style pointer analysis is O(n^3), where n is the number of nodes in the graph
  • Actually, quadratic in practice [Sridharan and Fink, SAS 09]
  • Improve scalability by reducing the value of n
  • Cycle elimination: important optimization for Andersen-style analysis
  • Detect strongly connected components in points-to graph, collapse to single node
  • Why? All nodes in an SCC will have same points-to relation at end of analysis
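One way to sketch the collapsing step is shown below, using Tarjan's strongly connected components algorithm in its recursive form (fine for small graphs; a production solver would detect cycles incrementally while edges are being added). The function names are hypothetical, and every successor is assumed to appear in `nodes`.

```python
def strongly_connected_components(nodes, succ):
    """Tarjan's algorithm; succ maps a node to an iterable of successors."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    for v in nodes:
        if v not in index:
            visit(v)
    return sccs

def collapse_cycles(nodes, succ):
    """Return (representative of each node, edges between representatives)."""
    rep = {}
    for component in strongly_connected_components(nodes, succ):
        leader = min(component)          # arbitrary but deterministic choice
        for v in component:
            rep[v] = leader
    collapsed = {}
    for v in nodes:
        for w in succ.get(v, ()):
            if rep[v] != rep[w]:
                collapsed.setdefault(rep[v], set()).add(rep[w])
    return rep, collapsed
```

With edges p → q → r → p and r → s, the nodes p, q and r collapse into one representative with a single outgoing edge to s, so the solver propagates one points-to set instead of three.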

SLIDE 31

Steensgaard-style Analysis

  • Also a constraint-based analysis
  • Uses equality constraints instead of subset constraints
  • Originally phrased as a type-inference problem
  • Less precise than Andersen-style, but more scalable

SLIDE 32

Steensgaard-style Example

[Figure: equivalence classes after each step: {a} {b} {c} {p} {q}; {a,b} {c} {p,q}; {a,b} {c} {p,q} {r}; {a,b} {c} {p,q,s,t} {r}; {a,b,c} {p,q,s,t,r}]

All pointers end up in the same equivalence class, pointing to all the locations

SLIDE 33

Implementing Steensgaard

  • Can be efficiently implemented using the union-find algorithm
  • Nearly linear time: O(nα(n))
  • Each statement needs to be processed just once
  • Unlike Andersen’s, which is a lot more difficult to scale
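A compact sketch of a Steensgaard-style analysis on top of union-find follows. The class and method names are made up for this illustration, fresh placeholder nodes are named `$t0`, `$t1`, … on the assumption that no program variable uses such a name, and union by rank is omitted for brevity, so the sketch does not strictly meet the O(nα(n)) bound.

```python
import itertools

class Steensgaard:
    def __init__(self):
        self.parent = {}      # union-find forest
        self.target = {}      # class representative -> node its class points to
        self._fresh = itertools.count()

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def pointee(self, x):
        """Class pointed to by x's class, created on demand."""
        r = self.find(x)
        if r not in self.target:
            self.target[r] = self.find(f"$t{next(self._fresh)}")
        return self.find(self.target[r])

    def join(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        self.parent[y] = x
        tx, ty = self.target.get(x), self.target.pop(y, None)
        if tx is None and ty is not None:
            self.target[x] = ty
        elif tx is not None and ty is not None:
            self.join(tx, ty)   # merged classes must share a single pointee class

    # Each statement form is processed exactly once.
    def address_of(self, p, a):   # p = &a
        self.join(self.pointee(p), a)

    def copy(self, p, q):         # p = q
        self.join(self.pointee(p), self.pointee(q))

    def store(self, p, q):        # *p = q
        self.join(self.pointee(self.pointee(p)), self.pointee(q))

    def load(self, p, q):         # p = *q
        self.join(self.pointee(p), self.pointee(self.pointee(q)))

    def points_to(self, p, locations):
        t = self.pointee(p)
        return {a for a in locations if self.find(a) == t}
```

For `p = &a; q = &b; r = p; r = q`, the equality constraints merge the pointee classes of p, q and r, so p is reported as pointing to {a, b}. A subset-based analysis would keep pts(p) = {a} here, which illustrates the precision that is traded for speed.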

SLIDE 34

Datalog-based Formulation of Pointer Analysis

SLIDE 35

Pointer or Points-to Analysis

  • We shall consider Andersen’s formulation of Java object references.
  • Flow/context insensitive analysis.
  • Cast of characters:

1. Local (or stack) variables, which point to…
2. Heap objects, which may have fields that are references to other heap objects.

SLIDE 36

Matches a Language Like Java

  • Pointers (or references) go from the stack(s) to the heap elements
  • Also from one heap element to another

[Figure: Stack 1 and Stack 2 with references into the Heap]

SLIDE 37

Representing Heap Objects

  • A heap object is named by the statement in which it is created.
  • Note many run-time objects may have the same name.
  • Example: h: T v = new T; says variable v can point to (one of) the heap object(s) created by statement h.

[Figure: v points to h]

SLIDE 38

Field Store

  • v.f = w makes the f field of the heap object h pointed to by v point to what variable w points to.

[Figure: field store; nodes v, w, h, g, i with edges labelled f]

SLIDE 39

Field Load

  • v = w.f makes v point to what the f field of the heap object h pointed to by w points to.

[Figure: field load; nodes v, w, h, g, i with an edge labelled f]

SLIDE 40

Variable Assignment

  • v = w makes v point to whatever w points to
  • Interprocedural analysis: also models copying an actual parameter to the corresponding formal, or a return value to a variable

[Figure: variable assignment; nodes v, w, h]

SLIDE 41

EDB (Initial) Relations

  • The facts about the statements in the program and what they do to pointers are accumulated and placed in several EDB relations.
  • Example: there would be an EDB relation Copy(To,From) whose tuples are the pairs (v,w) such that there is a copy statement v=w.

SLIDE 42

Convention for Initial EDB

  • Instead of using EDB relations for the various statement forms, we shall simply use the quoted statement itself to stand for an atom derived from the statement.
  • Example: “v=w” stands for Copy(v,w).
SLIDE 43

What Do We Compute?

  • Pts(V,H) will get the set of pairs (v, h) such that variable v can point to heap object h.
  • Hpts(H1,F,H2) will get the set of triples (h, f, g) such that the field f of heap object h can point to heap object g.

SLIDE 44

Datalog Rules for Points-to

1. Pts(V,H) :- “H: V = new T”.
2. Pts(V,H) :- “V=W”, Pts(W,H).
3. Pts(V,H) :- “V=W.F”, Pts(W,G), Hpts(G,F,H).
4. Hpts(H,F,G) :- “V.F=W”, Pts(V,H), Pts(W,G).

SLIDE 45

Program Example

T p(T x) {
    h: T a = new T;
    a.f = x;
    return a;
}
void main() {
    g: T b = new T;
    b = p(b);
    b = b.f;
}

SLIDE 46

Apply Rules Recursively --- Round 1

T p(T x) {h: T a = new T; a.f = x; return a;} void main() {g: T b = new T; b = p(b); b = b.f;}

Pts(a,h) Pts(b,g)

SLIDE 47

Apply Rules Recursively --- Round 2

T p(T x) {h: T a = new T; a.f = x; return a;} void main() {g: T b = new T; b = p(b); b = b.f;}

Pts(a,h) Pts(b,g) Pts(b,h) Pts(x,g)

SLIDE 48

Apply Rules Recursively --- Round 3

T p(T x) {h: T a = new T; a.f = x; return a;} void main() {g: T b = new T; b = p(b); b = b.f;}

Pts(a,h) Pts(b,g) Pts(x,g) Pts(b,h) Hpts(h,f,g) Pts(x,h)

SLIDE 49

Apply Rules Recursively --- Round 4

T p(T x) {h: T a = new T; a.f = x; return a;} void main() {g: T b = new T; b = p(b); b = b.f;}

Pts(a,h) Pts(b,g) Pts(x,g) Pts(b,h) Pts(x,h) Hpts(h,f,g) Hpts(h,f,h)
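The four rounds can be reproduced with a naive bottom-up evaluation of the four Datalog rules, sketched here in Python rather than in a Datalog engine. The tuple encodings of the EDB facts are assumptions for this illustration; in particular, the call `b = p(b)` is modelled as the two copies `x = b` (actual to formal) and `b = a` (return value to variable).

```python
def points_to(news, copies, loads, stores):
    """Hypothetical EDB encoding:
    news   (v, h):    h: v = new T      copies (v, w):    v = w
    loads  (v, w, f): v = w.f           stores (v, f, w): v.f = w"""
    pts, hpts = set(), set()
    rounds = []                       # facts newly derived in each round
    while True:
        new_pts = set(news)                                              # rule 1
        new_pts |= {(v, h) for v, w in copies
                    for x, h in pts if x == w}                           # rule 2
        new_pts |= {(v, h) for v, w, f in loads
                    for x, g in pts if x == w
                    for g2, f2, h in hpts if g2 == g and f2 == f}        # rule 3
        new_hpts = {(h, f, g) for v, f, w in stores
                    for x, h in pts if x == v
                    for y, g in pts if y == w}                           # rule 4
        if new_pts <= pts and new_hpts <= hpts:
            return pts, hpts, rounds  # fixed point: nothing new derived
        rounds.append((new_pts - pts, new_hpts - hpts))
        pts |= new_pts
        hpts |= new_hpts
```

Each round applies every rule to the facts known at the end of the previous round, which is why Hpts(h,f,g) only appears in round 3, after Pts(x,g) was derived in round 2. Real Datalog engines use semi-naive evaluation to avoid rederiving old facts.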

SLIDE 50

Extension to Support Flow Sensitivity

  • IDB predicates need additional arguments B, I.
  • B = block number.
  • I = position within block, 0, 1,…, n for n-statement block.
  • Position 0 is before first statement, position 1 is between 1st and 2nd statement, etc.
  • Is there another way to introduce flow sensitivity?
SLIDE 51

Adding Context Sensitivity

  • Include a component C = context.
  • C doesn’t change within a function.
  • Call and return can extend the context if the called function is not mutually recursive with the caller.

SLIDE 52

Context Sensitive Analysis: Maintaining the Context

Pts(X,H,B0,0,D) :-
    Pts(V,H,B,I,C),
    “B,I: call P(…,V,…)”,
    “X is the formal parameter of P corresponding to the actual V”,
    “B0 is the entry of P”,
    “context D is C extended by P”.

SLIDE 53

Path Numbering To Make Calling Contexts Finite

Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, Whaley and Lam, 2004

SLIDE 54

Expressing the Entire Analysis in Datalog
