CO444H: Pointer Analysis (Ben Livshits)


  1. Datalog: CO444H Pointer Analysis, Ben Livshits

  2. Call Graphs
     • Class analysis:
       • Given a reference variable x, what are the classes of the objects that x refers to at runtime?
       • We saw CHA and RTA
       • Deal with polymorphic/virtual calls: x.m()
     • Compilers: can we devirtualize a virtual call x.m()?
     • Software engineering:
       • Construct the call graph of the program
     • Why is that important in everyday development?

  3. Features of RTA
     • RTA may evaluate a method several times
     • If new callers are discovered, the method has to be re-evaluated
     • RTA runs until the worklist is empty, at which point it has reached a fixed point and cannot resolve any new call edges to add to the call graph

  4. RTA Revisited: Rapid Type Analysis
       RTA = call graph of only methods (no edges)
       CHA = class hierarchy analysis call graph
       W = worklist containing the main method
       while W is not empty
         M = next method in W
         T = set of allocated types in M
         T = T ∪ {allocated types in RTA callers of M}
         for each callsite C in M
           if C is a static dispatch or constructor:
             add an edge to the statically resolved method
           otherwise:
             M' = methods called from M in CHA
             M' = M' ∩ {methods declared in T or supertypes of T}
             add an edge from the method M to each method in M'
             add each method in M' to worklist W

  5. Using RTA in Eclipse

  6. RTA May Be Unsound
       public static void main(String[] args) {
         Object o = foo();
         bar(o);
       }
       public static Object foo() {
         return new A();
       }
       public static void bar(Object o) {
         o.toString();
       }
     • main calls foo, which returns an allocation of type A that is then passed as a parameter in the call to bar
     • The call edge to A.toString would be missing because neither bar nor its parent (main) allocated a type of A
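
The worklist pseudocode from the RTA slide can be tried out on this example. The following is a minimal Python sketch, not the lecture's implementation: the program model (`PROGRAM`, `SUPER`, `DECLARES`) and the helper names are hypothetical, "RTA callers" is read as direct callers only, and statically resolved callees are also added to the worklist, which the slide leaves implicit.

```python
from collections import deque

# Hypothetical class hierarchy for the slide's example: A extends Object.
SUPER = {"Object": None, "A": "Object"}
# Methods that each class declares itself (A overrides toString).
DECLARES = {"Object": {"toString"}, "A": {"toString"}}

# Hypothetical program model: allocated types and callsites per method.
PROGRAM = {
    "main": {"allocs": set(), "calls": [("static", "foo"), ("static", "bar")]},
    "foo": {"allocs": {"A"}, "calls": []},
    "bar": {"allocs": set(), "calls": [("virtual", "Object", "toString")]},
    "Object.toString": {"allocs": set(), "calls": []},
    "A.toString": {"allocs": set(), "calls": []},
}

def supertypes(cls):
    """Return cls together with all of its superclasses."""
    out = set()
    while cls is not None:
        out.add(cls)
        cls = SUPER[cls]
    return out

def cha_targets(declared, name):
    """CHA: implementations of `name` for every subtype of `declared`."""
    targets = set()
    for cls in SUPER:
        if declared in supertypes(cls):
            owner = cls
            while name not in DECLARES[owner]:  # assumes some ancestor declares it
                owner = SUPER[owner]
            targets.add(f"{owner}.{name}")
    return targets

def rta(methods, entry="main"):
    """Return the set of call edges (caller, callee) found by the slide's RTA."""
    edges = set()
    worklist = deque([entry])
    while worklist:
        m = worklist.popleft()
        # T = types allocated in m, plus types allocated in m's known callers.
        types = set(methods[m]["allocs"])
        for caller, callee in edges:
            if callee == m:
                types |= methods[caller]["allocs"]
        allowed = set()
        for t in types:
            allowed |= supertypes(t)
        for call in methods[m]["calls"]:
            if call[0] == "static":
                targets = {call[1]}
            else:
                _, declared, name = call
                targets = {t for t in cha_targets(declared, name)
                           if t.split(".")[0] in allowed}
            for t in targets:
                if (m, t) not in edges:  # new edge: (re-)evaluate the callee
                    edges.add((m, t))
                    worklist.append(t)
    return edges

print(sorted(rta(PROGRAM)))  # [('main', 'bar'), ('main', 'foo')]
```

Under these assumptions the edge from bar to A.toString never appears, because the allocation of A happens in foo, which is neither bar nor a caller of bar.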

  7. Call Graph Construction: Reachability Computation
       Queue worklist;
       CallGraph graph;
       worklist.addAtTail(main());
       graph.addNode(main());
       while (worklist.notEmpty()) {
         m = worklist.getFromHead();
         process_method_body(m);
       }

  8. Next Steps…
     • Ingredients
       • Adding pointers
       • Adding call graphs
       • Combining those two
     • How do we mix the ingredients?
       • Can first build a call graph, then add pointers
       • Can do it all at once: we can use Datalog to represent everything, with some Datalog relations encoding intraprocedural aspects and some interprocedural

  9. Pointer Analysis: Basics and Algorithms

  10. Variants of Pointer Analysis
      • For C:
        • Andersen analysis
        • Steensgaard analysis
      • Pointer analysis for Java
      • How to encode these in Datalog
      • Other variants

  11. What is the Goal of Pointer Analysis?
      • What memory locations can a pointer expression refer to?
      • Alias analysis: when do two pointer expressions refer to the same storage location?
          int x;
          p = &x;
          q = p;
      • *p and *q alias, as do x and *p, and x and *q

  12. Sources of Aliasing
      • Aliasing can arise for several reasons, depending on the language…
      • Pointers
        • e.g., int *p, i; p = &i;
      • Call-by-reference
          void m(Object a, Object b) { … }
          m(x,x); // a and b alias in m
      • Array indexing
          int i, j, a[100];
          i = j; // a[i] and a[j] alias

  13. Why do we Want to Know?
      • Pointer analysis tells us what memory locations code uses or modifies
      • Useful in many analyses
      • E.g., available expressions
          *p = a + b;
          y = a + b;
        • If *p aliases a or b, then the second computation of a+b is not redundant
      • E.g., consider constant propagation
          x = 3;
          *p = 4;
          y = x;
        • Is y constant?
        • If *p and x do not alias each other, then yes.
        • If *p and x always alias each other, then yes.
        • If *p and x sometimes alias each other, then no

  14. Pointer Analysis Dimensions
      • Intraprocedural / interprocedural
      • Flow-sensitive / flow-insensitive
      • Context-sensitive / context-insensitive
      • Definiteness: may versus must
      • Heap modelling
      • Data representation

  15. Flow-sensitive vs. Flow-insensitive Points-To
      • Flow-sensitive pointer analysis computes, for each program point, what memory locations pointer expressions may refer to
      • Flow-insensitive pointer analysis computes what memory locations pointer expressions may refer to, at any time in program execution
      • Flow-sensitive pointer analysis is (traditionally) too expensive to perform for a whole program
      • Flow-insensitive pointer analyses are typically used for whole-program analyses

  16. Context Sensitivity
      • Also difficult, but success in scaling up to hundreds of thousands of LOC
      • BDDs: see Whaley and Lam, PLDI 2004
      • Doop: Bravenboer and Smaragdakis, OOPSLA 2009

  17. May vs. Must
      • May analysis: aliasing that may occur during execution
        • (cf. must-not alias, although often has different representation)
      • Must analysis: aliasing that must occur during execution
      • Sometimes both are useful
        • E.g., consider liveness analysis for *p = *q + 4;
        • If *p must alias x, then x is in the kill set for the statement
        • If *q may alias y, then y is in the gen set for the statement

  18. Representation Options
      • Points-to pairs: first element points to the second
        • e.g., (p → b), (q → b)
        • *p and b alias, as do *q and b, as do *p and *q
      • Pairs that refer to the same memory
        • e.g., (*p,b), (*q,b), (*p,*q), (**r, b)
        • General, may be less concise than points-to pairs
      • Equivalence sets: sets that are aliases
        • e.g., {*p,*q,b}

  19. Modeling Memory Locations
      • We want to describe what memory locations a pointer expression may refer to
      • How do we model memory locations?
      • For global variables, no trouble, use a single “node”
      • For local variables, use a single “node” per context
        • i.e., just one node if context insensitive
      • For dynamically allocated memory
        • Problem: potentially unbounded locations created at runtime
        • Need to model locations with some finite abstraction

  20. Modeling Dynamic Memory Locations
      • For each allocation statement, use one node per context
        • Note: could choose context-sensitivity for modelling heap locations to be less precise than context-sensitivity for modelling procedure invocation
      • Other solutions:
        • One node for entire heap
        • One node for each type
        • Nodes based on analysis of “shape” of heap

  21. Problem Statement
      • Let’s consider flow-insensitive may pointer analysis
      • Assume program consists of statements of form
          p = &a   (address of, includes allocation statements)
          p = q
          *p = q
          p = *q
      • Assume pointers p,q ∈ P and address-taken variables a,b ∈ A are disjoint
        • Can transform program to make this true
        • For any variable v for which this isn’t true, add statement p_v = &a_v, and replace v with *p_v
      • Want to compute relation pts : P ∪ A → 2^A
        • Essentially points-to pairs

  22. Andersen-style Pointer Analysis
      • View pointer assignments as subset constraints
      • Use constraints to propagate points-to information
      • Called inclusion-based pointer analysis

  23. Andersen-style Pointer Analysis
      • Can solve these constraints directly on sets pts(p)
          p = &a;    p ⊇ {a}
          q = p;     q ⊇ p
          p = &b;    p ⊇ {b}
          r = p;     r ⊇ p
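
To make "solving directly on sets" concrete, here is a minimal Python sketch that iterates the four constraints above to a fixed point. The function name and the tuple encoding of constraints are assumptions made for illustration; only base (p ⊇ {a}) and simple (q ⊇ p) constraints are handled.

```python
def solve_subset_constraints(base, simple):
    """base: pairs (p, a) meaning p ⊇ {a}; simple: pairs (q, p) meaning q ⊇ p."""
    pts = {}
    for p, a in base:
        pts.setdefault(p, set()).add(a)
    changed = True
    while changed:  # repeat until no points-to set grows any further
        changed = False
        for lhs, rhs in simple:
            src = pts.get(rhs, set())
            dst = pts.setdefault(lhs, set())
            if not src <= dst:
                dst |= src
                changed = True
    return pts

# The four statements from the slide: p = &a; q = p; p = &b; r = p;
pts = solve_subset_constraints(
    base=[("p", "a"), ("p", "b")],
    simple=[("q", "p"), ("r", "p")],
)
for var in sorted(pts):
    print(var, sorted(pts[var]))
```

Because the analysis is flow-insensitive, q ends up with both a and b even though `q = p` textually precedes `p = &b`.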

  24. Example of Subset Constraints

  25. How Precise Is This Analysis?

  26. Andersen-style as Graph Closure
      • Can be cast as a graph closure problem
      • One node for each pts(p), pts(a)
      • Each node has an associated points-to set
      • Compute transitive closure of graph, and add edges according to complex constraints

  27. Work List Algorithm
      • Initialize graph and points-to sets using base and simple constraints
      • Let W = { v | pts(v) ≠ ∅ } (all nodes with non-empty points-to sets)
      • While W not empty
        • v ← select from W
        • for each a ∈ pts(v) do
          • for each constraint p ⊇ *v
            • add edge a→p, and add a to W if edge is new
          • for each constraint *v ⊇ q
            • add edge q→a, and add q to W if edge is new
        • for each edge v→q do
          • pts(q) = pts(q) ∪ pts(v), and add q to W if pts(q) changed
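
The worklist algorithm can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the course's implementation: the statement encoding (`"addr"`, `"copy"`, `"load"`, `"store"`), the function name `andersen`, and the example program are all hypothetical, and the load case follows the usual formulation in which a constraint p ⊇ *v adds an edge a→p for each a ∈ pts(v).

```python
from collections import defaultdict

def andersen(stmts):
    """stmts: triples (kind, lhs, rhs) for the four statement forms."""
    pts = defaultdict(set)      # node -> points-to set
    succ = defaultdict(set)     # edge v -> q means pts(q) ⊇ pts(v)
    loads = defaultdict(list)   # v -> every p with a constraint p ⊇ *v
    stores = defaultdict(list)  # v -> every q with a constraint *v ⊇ q
    for kind, lhs, rhs in stmts:
        if kind == "addr":      # lhs = &rhs   (base constraint)
            pts[lhs].add(rhs)
        elif kind == "copy":    # lhs = rhs    (simple constraint)
            succ[rhs].add(lhs)
        elif kind == "load":    # lhs = *rhs   (complex constraint)
            loads[rhs].append(lhs)
        elif kind == "store":   # *lhs = rhs   (complex constraint)
            stores[lhs].append(rhs)
    work = [v for v in pts if pts[v]]
    while work:
        v = work.pop()
        for a in list(pts[v]):
            for p in loads[v]:          # p ⊇ *v: add edge a -> p
                if p not in succ[a]:
                    succ[a].add(p)
                    work.append(a)
            for q in stores[v]:         # *v ⊇ q: add edge q -> a
                if a not in succ[q]:
                    succ[q].add(a)
                    work.append(q)
        for q in list(succ[v]):         # propagate along edges v -> q
            if not pts[v] <= pts[q]:
                pts[q] |= pts[v]
                work.append(q)
    return dict(pts)

# Hypothetical program: p = &a; q = &b; *p = q; r = &c; s = p; t = *p; *s = r;
result = andersen([
    ("addr", "p", "a"), ("addr", "q", "b"), ("store", "p", "q"),
    ("addr", "r", "c"), ("copy", "s", "p"), ("load", "t", "p"),
    ("store", "s", "r"),
])
for var in sorted(result):
    print(var, sorted(result[var]))
```

A node is re-added to the worklist only when an edge or a points-to set actually changes, so the loop terminates once the graph and all sets are stable.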

  28. Same Example, as a Graph (Initial)
      • W: p, q, r, s, a

  29. Same Example, as a Graph (Final)
      • W: {}

  30. Cycle Elimination
      • Andersen-style pointer analysis is O(n^3), where n is the number of nodes in the graph
        • Actually, quadratic in practice [Sridharan and Fink, SAS 09]
        • Improve scalability by reducing the value of n
      • Cycle elimination: important optimization for Andersen-style analysis
        • Detect strongly connected components in points-to graph, collapse to single node
        • Why? All nodes in an SCC will have the same points-to relation at end of analysis
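
One way to picture the collapsing step is the Python sketch below, which maps every node of a subset-edge graph to a representative of its strongly connected component. It uses Kosaraju's two-pass algorithm as an assumed choice (the slides do not name an SCC algorithm), runs offline on a fixed graph rather than during the analysis, and the function name and example edges are hypothetical.

```python
def collapse_cycles(edges):
    """edges: dict node -> set of successors. Returns node -> SCC representative."""
    nodes = set(edges) | {w for ws in edges.values() for w in ws}

    # Pass 1: depth-first search, recording nodes in order of completion.
    order, seen = [], set()
    def visit(v):  # recursive for brevity; fine for small graphs
        seen.add(v)
        for w in edges.get(v, ()):
            if w not in seen:
                visit(w)
        order.append(v)
    for v in sorted(nodes):
        if v not in seen:
            visit(v)

    # Pass 2: search the reversed graph in decreasing completion order;
    # everything reached from a root belongs to that root's SCC.
    reverse = {v: set() for v in nodes}
    for v, ws in edges.items():
        for w in ws:
            reverse[w].add(v)
    rep = {}
    for root in reversed(order):
        if root in rep:
            continue
        rep[root] = root
        stack = [root]
        while stack:
            v = stack.pop()
            for w in reverse[v]:
                if w not in rep:
                    rep[w] = root
                    stack.append(w)
    return rep

# Hypothetical subset edges: p -> q -> r -> p forms a cycle, r -> s leaves it.
rep = collapse_cycles({"p": {"q"}, "q": {"r"}, "r": {"p", "s"}})
print(rep["p"] == rep["q"] == rep["r"], rep["s"] == rep["p"])  # True False
```

Since p, q and r each include one another's points-to sets, they can share a single node and a single set, which is what reduces n.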

  31. Steensgaard-style Analysis
      • Also a constraint-based analysis
      • Uses equality constraints instead of subset constraints
      • Originally phrased as a type-inference problem
      • Less precise than Andersen-style, but more scalable

  32. Steensgaard-style Example
      • [Figure: equivalence classes merged step by step, from separate nodes p, q, a, b, c to the classes {p,q,s,t,r} and {a,b,c}]
      • All pointers end up in the same equivalence class pointing to all the locations

  33. Implementing Steensgaard
      • Can be efficiently implemented using the union-find algorithm
      • Nearly linear time: O(n α(n))
      • Each statement needs to be processed just once
      • Unlike Andersen’s, which is a lot more difficult to scale
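
A union-find based version can be sketched in Python as below. This is a simplified illustration, not Steensgaard's original formulation: the class and method names are hypothetical, placeholder nodes named `("fresh", n)` stand in for not-yet-known pointees, and union by rank is omitted, so the sketch does not itself achieve the O(n α(n)) bound.

```python
class Steensgaard:
    def __init__(self):
        self.parent = {}  # union-find forest over variables and placeholders
        self.target = {}  # class representative -> some member of its pointee class

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        tx, ty = self.target.pop(x, None), self.target.pop(y, None)
        self.parent[y] = x
        if tx is not None and ty is not None:
            self.target[x] = tx
            self.union(tx, ty)  # merged classes must share one pointee class
        elif tx is not None or ty is not None:
            self.target[x] = tx if tx is not None else ty

    def pointee(self, x):
        """Representative of the class x points to, created on demand."""
        r = self.find(x)
        if r not in self.target:
            self.target[r] = self.find(("fresh", len(self.parent)))
        return self.find(self.target[r])

    def addr(self, p, a):   # p = &a
        self.union(self.pointee(p), a)

    def copy(self, p, q):   # p = q
        self.union(self.pointee(p), self.pointee(q))

    def load(self, p, q):   # p = *q
        self.union(self.pointee(p), self.pointee(self.pointee(q)))

    def store(self, p, q):  # *p = q
        self.union(self.pointee(self.pointee(p)), self.pointee(q))

    def points_to(self, p, variables):
        t = self.pointee(p)
        return {v for v in variables if self.find(v) == t}

# Hypothetical program: p = &a; q = &b; p = q;
s = Steensgaard()
s.addr("p", "a")
s.addr("q", "b")
s.copy("p", "q")
print(sorted(s.points_to("q", ["a", "b", "p", "q"])))  # ['a', 'b']
```

On this program an inclusion-based analysis would keep pts(q) = {b}; the equality constraint for `p = q` merges the two pointee classes, so q is also reported as pointing to a. Each statement is processed exactly once, which is where the scalability comes from.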

  34. Datalog-based Formulation of Pointer Analysis
