Datalog Pointer analysis
CO444H
Ben Livshits
1
Call Graphs
Class analysis: Given a reference variable x, what are the classes of the objects that x refers to at runtime?
We saw CHA and RTA
Deal with polymorphic/virtual calls:
to be re-evaluated
point it has reached a fixed point and cannot resolve any new call edges to add to the call graph
3
RAPID TYPE ANALYSIS
RTA = call graph of only methods (no edges)
CHA = class hierarchy analysis call graph
W = worklist containing the main method
while W is not empty
    M = next method in W
    T = set of allocated types in M
    T = T U {allocated types in RTA callers of M}
    for each callsite (C) in M
        if C is a static dispatch or constructor:
            add an edge to statically resolved method
        otherwise:
            M' = methods called from M in CHA
            M' = M' intersection {methods declared in T or supertypes of T}
            add an edge from the method M to each method in M'
            add each method in M' to worklist W
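The pseudocode above can be sketched in Python on a toy program model. This is a minimal sketch, not the course's implementation: the dictionaries `allocs`, `callsites`, `declaring` and `supers` are hypothetical encodings invented for illustration, statically resolved callees are also queued so that their bodies get processed, and a method is re-queued only when a new edge is added (which keeps the loop finite). The example program mirrors the main/foo/bar example on the following slides, under the assumption that `bar` calls `o.toString()`.

```python
from collections import deque

# Hypothetical toy program model (all names are illustrative):
#   allocs[m]    -> set of class names allocated directly in method m
#   callsites[m] -> list of ("static", target) or ("virtual", [CHA targets])
#   declaring[t] -> class that declares method t
#   supers[c]    -> set containing c and all of its supertypes

def rta(main, allocs, callsites, declaring, supers):
    edges = set()                      # RTA call graph edges (caller, callee)
    worklist = deque([main])
    while worklist:
        m = worklist.popleft()
        # T = types allocated in m plus types allocated in current RTA callers of m
        types = set(allocs.get(m, ()))
        for caller, callee in edges:
            if callee == m:
                types |= set(allocs.get(caller, ()))
        # classes in T together with their supertypes
        usable = set()
        for t in types:
            usable |= supers[t]
        for kind, target in callsites.get(m, ()):
            if kind == "static":
                targets = [target]     # statically resolved callee
            else:
                # keep only CHA targets declared in T or a supertype of T
                targets = [t for t in target if declaring[t] in usable]
            for t in targets:
                if (m, t) not in edges:
                    edges.add((m, t))
                    worklist.append(t)  # (re)process the callee
    return edges

# Assumed encoding of the main/foo/bar example; bar's call to toString is an assumption.
allocs = {"foo": {"A"}}
callsites = {
    "main": [("static", "foo"), ("static", "bar")],
    "bar": [("virtual", ["Object.toString", "A.toString"])],
}
declaring = {"Object.toString": "Object", "A.toString": "A"}
supers = {"A": {"A", "Object"}, "Object": {"Object"}}

rta_edges = rta("main", allocs, callsites, declaring, supers)
```

With this encoding, neither `bar` nor its caller `main` allocates an `A`, so the virtual call in `bar` resolves to no target and the edge to `A.toString` is not produced.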
4
5
public static void main(String[] args) {
    Object o = foo();
    bar(o);
}
public static Object foo() {
    return new A();
}
public static void bar(Object o) {
}
returns an allocation
passed as a parameter in the call to bar
toString would be missing because neither bar nor its parents (main) allocated a type of A
6
Queue worklist;
CallGraph graph;
worklist.addAtTail(main());
graph.addNode(main());
while (worklist.notEmpty()) {
    m = worklist.getFromHead();
    process_method_body(m);
}
everything, with some Datalog relations encoding intraprocedural aspects and some interprocedural
8
9
10
refer to?
refer to the same storage location?
int x; p = &x; q = p;
11
depending on the language…
void m(Object a, Object b) { … }
m(x,x); // a and b alias in m
12
what memory locations code uses or modifies
expressions
*p = a + b; y = a + b;
second computation of a+b is not redundant
propagation x = 3; *p = 4; y = x;
13
interprocedural
flow-insensitive
context-insensitive
versus must
14
analysis computes for each program point what memory locations pointer expressions may refer to
analysis computes what memory locations pointer expressions may refer to, at any time in program execution
analysis is (traditionally) too expensive to perform for whole program
analyses typically used for whole program analyses
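The difference between a per-program-point result and a single whole-execution result can be illustrated on straight-line code. The following is a minimal sketch under stated assumptions: only address-of and copy statements are modelled, the tuple encoding `(kind, lhs, rhs)` and both function names are hypothetical, and there is no control flow, so the flow-sensitive version simply walks the statements in order.

```python
def flow_sensitive(stmts):
    """Return the points-to map that holds after each statement."""
    state = {}
    states = []
    for kind, lhs, rhs in stmts:
        state = dict(state)                 # new map per program point
        if kind == "addr":                  # lhs = &rhs overwrites lhs
            state[lhs] = {rhs}
        else:                               # lhs = rhs overwrites lhs
            state[lhs] = set(state.get(rhs, set()))
        states.append(state)
    return states

def flow_insensitive(stmts):
    """Return one points-to map valid at any time, ignoring statement order."""
    pts = {}
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for kind, lhs, rhs in stmts:
            new = {rhs} if kind == "addr" else pts.get(rhs, set())
            cur = pts.setdefault(lhs, set())
            if not new <= cur:
                cur |= new
                changed = True
    return pts

# p = &x; q = p; p = &y;
program = [("addr", "p", "x"), ("copy", "q", "p"), ("addr", "p", "y")]
fs_states = flow_sensitive(program)
fi_result = flow_insensitive(program)
```

After the last statement the flow-sensitive result has p pointing only to y and q only to x, while the flow-insensitive result reports that both p and q may point to x or y.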
15
but success in scaling up to hundreds of thousands of LOC
Lam PLDI 2004
Smaragdakis OOPSLA 2009
16
that may occur during execution
although often has different representation)
that must occur during execution
useful
analysis for *p = *q + 4;
x in kill set for statement
in gen set for statement
17
element points to the second
and b, as do *p and *q
same memory
(*p,*q), (**r, b)
concise than points-to pairs
that are aliases
18
what memory locations a pointer expression may refer to
memory locations?
trouble, use a single “node”
a single “node” per context
context insensitive
allocated memory
unbounded locations created at runtime
locations with some finite abstraction
19
statement, use one node per context
context-sensitivity for modelling heap locations to be less precise than context-sensitivity for modelling procedure invocation
entire heap
each type
analysis of “shape” of heap
20
insensitive may pointer analysis
consists of statements
p = &a (address of, includes allocation statements)
p = q
*p = q
p = *q
address-taken variables a,b∈A are disjoint
make this true
this isn’t true, add statement p_v = &a_v, and replace v with *p_v
pts : P ∪ A → 2^A
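A flow-insensitive, inclusion-based (Andersen-style) solver for the four statement forms above can be sketched as follows. This is a minimal sketch, not the algorithm as presented in the lecture: it uses naive re-iteration over all statements until nothing changes rather than a worklist over a constraint graph, and the statement encoding `(kind, lhs, rhs)` and the function name `andersen` are assumptions made for illustration.

```python
from collections import defaultdict

def andersen(stmts):
    """Compute pts for statements of the forms p=&a, p=q, *p=q, p=*q."""
    pts = defaultdict(set)

    def add(dst, vals):
        # pts(dst) must include vals; report whether anything was added
        before = len(pts[dst])
        pts[dst] |= vals
        return len(pts[dst]) != before

    changed = True
    while changed:                                  # naive fixed-point iteration
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":                      # lhs = &rhs
                changed |= add(lhs, {rhs})
            elif kind == "copy":                    # lhs = rhs
                changed |= add(lhs, set(pts[rhs]))
            elif kind == "store":                   # *lhs = rhs
                for target in list(pts[lhs]):
                    changed |= add(target, set(pts[rhs]))
            elif kind == "load":                    # lhs = *rhs
                for target in list(pts[rhs]):
                    changed |= add(lhs, set(pts[target]))
    return {v: s for v, s in pts.items() if s}

# p = &a; q = &b; r = p; *r = q; s = *p;
example = [
    ("addr", "p", "a"),
    ("addr", "q", "b"),
    ("copy", "r", "p"),
    ("store", "r", "q"),
    ("load", "s", "p"),
]
andersen_result = andersen(example)
```

In this example the store through r writes into a, so the later load through p makes s point to b.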
21
Andersen-style Pointer Analysis
22
Andersen-style Pointer Analysis
q ⊇ p
r ⊇ p
23
24
25
according to complex constraints
26
simple constraints
points to sets)
27
28
W: p q r s a
29
W: {}
SAS 09];
style analysis
graph, collapse to single node
relation at end of analysis
30
31
32
[Figure: successive equivalence classes during unification]
a b c p q
a,b c p,q
a,b c p,q r
a,b c p,q,s,t r
a,b,c p,q,s,t,r
All pointers end up in the same equivalence class pointing to all the locations
using UnionFind algorithm
processed just once
more difficult to scale
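The unification-based approach that relies on union-find can be sketched as below. This is a minimal sketch under stated assumptions rather than the lecture's code: each equivalence class has at most one pointee class, every statement is processed exactly once, and the class name `Unifier`, its methods, and the fresh placeholder names `$t1`, `$t2`, … are hypothetical.

```python
class Unifier:
    def __init__(self):
        self.parent = {}     # union-find parent links
        self.target = {}     # class representative -> a node it points to
        self.fresh = 0

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def pointee(self, x):
        """Representative of the class x points to (created on demand)."""
        r = self.find(x)
        if r not in self.target:
            self.fresh += 1
            self.target[r] = f"$t{self.fresh}"
        return self.find(self.target[r])

    def join(self, a, b):
        """Merge two classes and, recursively, the classes they point to."""
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        ta, tb = self.target.pop(a, None), self.target.pop(b, None)
        self.parent[a] = b
        if ta is not None and tb is not None:
            self.target[b] = tb
            self.join(ta, tb)
        elif ta is not None or tb is not None:
            self.target[b] = ta if ta is not None else tb

    def process(self, kind, lhs, rhs):
        if kind == "addr":       # lhs = &rhs
            self.join(self.pointee(lhs), rhs)
        elif kind == "copy":     # lhs = rhs
            self.join(self.pointee(lhs), self.pointee(rhs))
        elif kind == "store":    # *lhs = rhs
            self.join(self.pointee(self.pointee(lhs)), self.pointee(rhs))
        elif kind == "load":     # lhs = *rhs
            self.join(self.pointee(lhs), self.pointee(self.pointee(rhs)))

# p = &a; q = &b; p = q;
u = Unifier()
for stmt in [("addr", "p", "a"), ("addr", "q", "b"), ("copy", "p", "q")]:
    u.process(*stmt)
```

Here the copy p = q merges a and b into one class, so q is reported as possibly pointing to a as well, which an inclusion-based analysis would not conclude.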
33
34
35
1. Local (or stack) variables, which point to…
2. Heap objects, which may have fields that are references to other heap objects.
references) go from the stack(s) to the heap elements
heap element to another
36
[Figure: Stack 1, Stack 2, Heap]
37
it is created.
name.
point to (one of) the heap object(s) created by statement h.
[Figure: variable v pointing to heap object h]
38
pointed to by v point to what variable w points to.
[Figure: nodes v, h, g, w, i; edges labeled f]
39
heap object h pointed to by w points to.
[Figure: nodes v, h, g, w, i; edge labeled f]
40
an actual parameter to the corresponding formal or return value to a variable
[Figure: nodes v, h, w]
41
what they do to pointers are accumulated and placed in several EDB relations.
Copy(To,From) whose tuples are the pairs (v,w) such that there is a copy statement v=w.
42
Convention for Initial EDB
statement forms, we shall simply use the quoted statement itself to stand for an atom derived from the statement.
43
that variable v can point to heap object h.
g) such that the field f of heap object h can point to heap object g.
44
1. Pts(V,H) :- “H: V = new T”.
2. Pts(V,H) :- “V=W”, Pts(W,H).
3. Pts(V,H) :- “V=W.F”, Pts(W,G), Hpts(G,F,H).
4. Hpts(H,F,G) :- “V.F=W”, Pts(V,H), Pts(W,G).
45
T p(T x) {
    h: T a = new T;
    a.f = x;
    return a;
}
void main() {
    g: T b = new T;
    b = p(b);
    b = b.f;
}
46
T p(T x) {h: T a = new T; a.f = x; return a;} void main() {g: T b = new T; b = p(b); b = b.f;}
Pts(a,h) Pts(b,g)
47
T p(T x) {h: T a = new T; a.f = x; return a;} void main() {g: T b = new T; b = p(b); b = b.f;}
Pts(a,h) Pts(b,g) Pts(b,h) Pts(x,g)
48
T p(T x) {h: T a = new T; a.f = x; return a;} void main() {g: T b = new T; b = p(b); b = b.f;}
Pts(a,h) Pts(b,g) Pts(x,g) Pts(b,h) Hpts(h,f,g) Pts(x,h)
49
T p(T x) {h: T a = new T; a.f = x; return a;} void main() {g: T b = new T; b = p(b); b = b.f;}
Pts(a,h) Pts(b,g) Pts(x,g) Pts(b,h) Pts(x,h) Hpts(h,f,g) Hpts(h,f,h)
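The derivation above can be reproduced by evaluating the four Pts/Hpts rules naively until no new facts appear. The sketch below is an illustration under stated assumptions, not a Datalog engine: the EDB is encoded by hand as Python sets named `new`, `copy`, `load` and `store` (hypothetical names), and the call `b = p(b)` is assumed to contribute the copy statements `x = b` (actual to formal) and `b = a` (return value to variable).

```python
def solve(new, copy, load, store):
    pts = set(new)          # rule 1: Pts(V,H) :- "H: V = new T"
    hpts = set()
    while True:
        derived_pts = set()
        derived_hpts = set()
        for v, w in copy:   # rule 2: Pts(V,H) :- "V=W", Pts(W,H)
            derived_pts |= {(v, h) for (w2, h) in pts if w2 == w}
        for v, w, f in load:    # rule 3: Pts(V,H) :- "V=W.F", Pts(W,G), Hpts(G,F,H)
            for (w2, g) in pts:
                if w2 == w:
                    derived_pts |= {(v, h) for (g2, f2, h) in hpts
                                    if g2 == g and f2 == f}
        for v, f, w in store:   # rule 4: Hpts(H,F,G) :- "V.F=W", Pts(V,H), Pts(W,G)
            for (v2, h) in pts:
                if v2 == v:
                    derived_hpts |= {(h, f, g) for (w2, g) in pts if w2 == w}
        if derived_pts <= pts and derived_hpts <= hpts:
            return pts, hpts    # fixed point reached
        pts |= derived_pts
        hpts |= derived_hpts

# Hand-written EDB for the example program (encoding is an assumption).
new = {("a", "h"), ("b", "g")}          # h: a = new T;  g: b = new T
copy = {("x", "b"), ("b", "a")}         # parameter passing and return of p
load = {("b", "b", "f")}                # b = b.f
store = {("a", "f", "x")}               # a.f = x

pts_facts, hpts_facts = solve(new, copy, load, store)
```

The resulting sets contain the same seven facts as the final slide of the walk-through: five Pts facts and two Hpts facts.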
50
Extension to Support Flow Sensitivity
block.
and 2nd statement, etc.
51
function is not mutually recursive with the caller.
52
Pts(X,H,B0,0,D) :-
    Pts(V,H,B,I,C),
    “B,I: call P(…,V,…)”,
    “X is the formal parameter of P corresponding to the actual V”,
    “B0 is the entry of P”,
    “context D is C extended by P”.
53
Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, Whaley and Lam, 2004
54