Course Script Static analysis and all that IN5440 / autumn 2020 - PDF document


  1. Course Script Static analysis and all that IN5440 / autumn 2020 Martin Steffen

  2. Contents

     2 Data flow analysis
       2.1 Introduction
       2.2 Intraprocedural analysis
         2.2.1 Determining the control flow graph
         2.2.2 Available expressions
         2.2.3 Reaching definitions
         2.2.4 Very busy expressions
         2.2.5 Live variable analysis
       2.3 Theoretical properties and semantics
         2.3.1 Semantics
         2.3.2 Intermezzo: Lattices
       2.4 Monotone frameworks
       2.5 Equation solving
       2.6 Interprocedural analysis
         2.6.1 Introduction
         2.6.2 Extending the semantics and the CFGs
         2.6.3 Naive analysis (non-context-sensitive)
         2.6.4 Taking paths into account
         2.6.5 Context-sensitive analysis
       2.7 Static single assignment
         2.7.1 Value numbering

  3. Chapter 2: Data flow analysis

     What is it about? Learning targets of this chapter:
     • various DFAs
     • monotone frameworks
     • operational semantics
     • foundations
     • special topics (SSA, context-sensitive analysis ...)

     2.1 Introduction

     In this part, we cover classical data flow analysis, first in the form of a few specific analyses. Among others, we revisit reaching definitions analysis, and we also cover some other well-known ones. Those analyses are based on common principles, and those principles then lead to the notion of monotone framework. All of this is done for

  4. the simple while-language from the general introduction. We also have a look at some extensions. One is the treatment of procedures. Those will be first-order procedures, not higher-order procedures. Nonetheless, they already complicate the data flow problem (and its computational complexity), leading to what is known as context-sensitive analysis. Another extension deals with dynamically allocated memory on heaps (if we have time). Analyses that deal with that particular language feature are known as alias analysis, pointer analysis, and shape analysis. We might also cover SSA this time.

     2.2 Intraprocedural analysis

     As a start, we have a closer look at what we already discussed in the introductory warm-up: the very basics of data flow analysis, without complications like procedures, pointers, etc. This form of analysis is called intraprocedural analysis, i.e., an analysis focusing on the body of one procedure (or method, etc.). As already discussed, data flow analysis is done on top of so-called control-flow graphs. The control flow of each procedure can be abstractly represented by one such graph. Later, we will see how to “connect” different control-flow graphs to cover procedure calls and returns, and how to extend the analyses for that.

     Compared to the introduction, we dig a bit deeper: we show how a CFG can be computed for the while language (not that it’s complicated), and we list different kinds of analyses, not just reaching definitions. Looking at those will show that they share common traits, and that prepares for what is known as monotone frameworks, a classic general framework for all kinds of data flow analyses.

     2.2.1 Determining the control flow graph

     This section shows how to turn an abstract syntax tree into a CFG. The starting point is the (labelled) abstract syntax from the introduction.
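As a rough illustration of what a labelled abstract syntax can look like as a data structure, here is a minimal Python sketch. The class and field names (`Assign`, `Skip`, `Seq`, `If`, `While`, `label`, and so on) and the helper `show` are assumptions made for this example, not taken from the script. Expressions are kept as plain strings for brevity, and the sample program (a factorial-style loop) is chosen purely for illustration.

```python
from typing import NamedTuple, Union

# Hypothetical encoding of the labelled while-language; all names are
# illustrative assumptions. Expressions are plain strings for brevity.

class Assign(NamedTuple):   # [x := a]^l
    label: int
    var: str
    expr: str

class Skip(NamedTuple):     # [skip]^l
    label: int

class Seq(NamedTuple):      # S1 ; S2
    first: "Stmt"
    second: "Stmt"

class If(NamedTuple):       # if [b]^l then S1 else S2
    label: int
    cond: str
    then: "Stmt"
    orelse: "Stmt"

class While(NamedTuple):    # while [b]^l do S
    label: int
    cond: str
    body: "Stmt"

Stmt = Union[Assign, Skip, Seq, If, While]

def show(s: Stmt) -> str:
    """Render a statement in labelled concrete syntax."""
    if isinstance(s, Assign):
        return f"[{s.var} := {s.expr}]^{s.label}"
    if isinstance(s, Skip):
        return f"[skip]^{s.label}"
    if isinstance(s, Seq):
        return f"{show(s.first)}; {show(s.second)}"
    if isinstance(s, If):
        return (f"if [{s.cond}]^{s.label} then ({show(s.then)}) "
                f"else ({show(s.orelse)})")
    return f"while [{s.cond}]^{s.label} do ({show(s.body)})"

# Sample program: every elementary block carries its own label.
program = Seq(Assign(1, "y", "x"),
          Seq(Assign(2, "z", "1"),
              While(3, "y > 1",
                    Seq(Assign(4, "z", "z * y"), Assign(5, "y", "y - 1")))))

print(show(program))
# [y := x]^1; [z := 1]^2; while [y > 1]^3 do ([z := z * y]^4; [y := y - 1]^5)
```

Note that only elementary blocks (assignments, skip, and the tests of conditionals and loops) carry labels; sequential composition does not.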
     While language and control flow graph
     • starting point: while language from the intro
     • labelled syntax (unique labels)
     • labels = nodes of the CFG
     • initial and final labels
     • edges of a CFG: given by the function flow

     Determining the edges of the control-flow graph

     Given a program in labelled (and abstract) syntax, the control-flow graph is easily calculated. We already have the nodes (in the form of the labels); the edges are given by a function flow. This function needs, as auxiliary functions, the functions init and final. The latter two functions have the following types:

  5. \[
       \mathit{init} : \mathbf{Stmt} \to \mathbf{Lab}
       \qquad
       \mathit{final} : \mathbf{Stmt} \to 2^{\mathbf{Lab}}
       \tag{2.1}
     \]

     Their definition is straightforward, by induction on the labelled syntax:

     \[
       \begin{array}{l|l|l}
         & \mathit{init} & \mathit{final} \\ \hline
         [x := a]^{l} & l & \{ l \} \\
         [\mathtt{skip}]^{l} & l & \{ l \} \\
         S_1 ; S_2 & \mathit{init}(S_1) & \mathit{final}(S_2) \\
         \mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 & l & \mathit{final}(S_1) \cup \mathit{final}(S_2) \\
         \mathtt{while}\ [b]^{l}\ \mathtt{do}\ S & l & \{ l \}
       \end{array}
       \tag{2.2}
     \]

     The label $\mathit{init}(S)$ is the entry node to the graph of $S$. The language is simple and initial nodes are unique, but “exits” are not. Note that the concept of unique entry is not the same as that of “isolated” entry (mentioned already in the introduction). Isolated would mean: the entry is not the target of any edge. That’s not the case, for instance, if the program consists of an outer while loop. In general, however, it may be preferable to have an isolated entry as well, and one can easily arrange for that by adding one extra sentinel node at the beginning. For isolated exits, one can do the same at the end.

     Using those, determining the edges by a function $\mathit{flow} : \mathbf{Stmt} \to 2^{\mathbf{Lab} \times \mathbf{Lab}}$ works as follows:

     \[
       \begin{aligned}
         \mathit{flow}([x := a]^{l}) &= \emptyset \\
         \mathit{flow}([\mathtt{skip}]^{l}) &= \emptyset \\
         \mathit{flow}(S_1 ; S_2) &= \mathit{flow}(S_1) \cup \mathit{flow}(S_2) \cup \{ (l, \mathit{init}(S_2)) \mid l \in \mathit{final}(S_1) \} \\
         \mathit{flow}(\mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) &= \mathit{flow}(S_1) \cup \mathit{flow}(S_2) \cup \{ (l, \mathit{init}(S_1)), (l, \mathit{init}(S_2)) \} \\
         \mathit{flow}(\mathtt{while}\ [b]^{l}\ \mathtt{do}\ S) &= \mathit{flow}(S) \cup \{ (l, \mathit{init}(S)) \} \cup \{ (l', l) \mid l' \in \mathit{final}(S) \}
       \end{aligned}
       \tag{2.3}
     \]

     Two further helpful functions

     In the following, we make use of two further (very easy) functions with the following types:

     \[
       \mathit{labels} : \mathbf{Stmt} \to 2^{\mathbf{Lab}}
       \qquad \text{and} \qquad
       \mathit{blocks} : \mathbf{Stmt} \to 2^{\mathbf{Stmt}}
     \]

     They are defined straightforwardly as follows:

  6. \[
       \begin{aligned}
         \mathit{blocks}([x := a]^{l}) &= \{ [x := a]^{l} \} \\
         \mathit{blocks}([\mathtt{skip}]^{l}) &= \{ [\mathtt{skip}]^{l} \} \\
         \mathit{blocks}(S_1 ; S_2) &= \mathit{blocks}(S_1) \cup \mathit{blocks}(S_2) \\
         \mathit{blocks}(\mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) &= \{ [b]^{l} \} \cup \mathit{blocks}(S_1) \cup \mathit{blocks}(S_2) \\
         \mathit{blocks}(\mathtt{while}\ [b]^{l}\ \mathtt{do}\ S) &= \{ [b]^{l} \} \cup \mathit{blocks}(S)
       \end{aligned}
       \tag{2.4}
     \]

     \[
       \mathit{labels}(S) = \{ l \mid [B]^{l} \in \mathit{blocks}(S) \}
       \tag{2.5}
     \]

     All the definitions and concepts are straightforward and should be intuitively clear almost without giving a definition at all. One point about those definitions, though, is the following: they are all “constructive”. They are given by structural induction over the labelled syntax, which means they directly describe recursive procedures on the syntax trees. It’s a leitmotif of the lecture: we are dealing with static analysis, which is a phase of a compiler, so all definitions and concepts need to be realized in the form of algorithms and data structures. There must be a concrete control-flow graph data structure, and there must be a function that determines it.

     Flow and reverse flow

     \[
       \mathit{labels}(S) = \{ \mathit{init}(S) \} \cup \{ l \mid (l, l') \in \mathit{flow}(S) \} \cup \{ l' \mid (l, l') \in \mathit{flow}(S) \}
     \]

     • data flow analysis can be forward (like reaching definitions, RD) or backward
     • $\mathit{flow}$: for forward analyses
     • for backward analyses: reverse flow $\mathit{flow}^{R}$, simply invert the edges

     Program of interest
     • $S_*$: program being analysed, top-level statement
     • analogously $\mathbf{Lab}_*$, $\mathbf{Var}_*$, $\mathbf{Blocks}_*$
     • trivial expression: a single variable or constant
     • $\mathbf{AExp}_*$: non-trivial arithmetic sub-expressions of $S_*$, analogous for $\mathbf{AExp}(a)$ and $\mathbf{AExp}(b)$
     • useful restrictions
       – isolated entries: $(l, \mathit{init}(S_*)) \notin \mathit{flow}(S_*)$
       – isolated exits: $\forall l_1 \in \mathit{final}(S_*).\ (l_1, l_2) \notin \mathit{flow}(S_*)$
       – label consistency: if $[B_1]^{l}, [B_2]^{l} \in \mathit{blocks}(S)$ then $B_1 = B_2$ (“$l$ labels the block $B$”)
     • even better: unique labelling
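Since the definitions (2.2) to (2.5) are given by structural induction, they translate almost literally into recursive functions. The following Python sketch is one possible rendering, not a reference implementation. The tuple-based syntax encoding, the names `Cond`, `flow_r` and `has_isolated_entry`, and the sample programs are assumptions made for this example. The while case is read as flow(S) ∪ {(l, init(S))} ∪ {(l′, l) | l′ ∈ final(S)}, and elementary blocks are returned as singleton sets.

```python
from collections import namedtuple

# Hypothetical tuple encoding of the labelled syntax (names are assumptions).
Assign = namedtuple("Assign", "label var expr")      # [x := a]^l
Skip = namedtuple("Skip", "label")                   # [skip]^l
Seq = namedtuple("Seq", "first second")              # S1 ; S2
If = namedtuple("If", "label cond then orelse")      # if [b]^l then S1 else S2
While = namedtuple("While", "label cond body")       # while [b]^l do S
Cond = namedtuple("Cond", "label cond")              # the test block [b]^l

def init(s):
    """Unique entry label of s, following (2.2)."""
    return init(s.first) if isinstance(s, Seq) else s.label

def final(s):
    """Set of exit labels of s, following (2.2)."""
    if isinstance(s, Seq):
        return final(s.second)
    if isinstance(s, If):
        return final(s.then) | final(s.orelse)
    return {s.label}  # assignment, skip, and while

def flow(s):
    """Edges of the control-flow graph, following (2.3)."""
    if isinstance(s, Seq):
        return (flow(s.first) | flow(s.second)
                | {(l, init(s.second)) for l in final(s.first)})
    if isinstance(s, If):
        return (flow(s.then) | flow(s.orelse)
                | {(s.label, init(s.then)), (s.label, init(s.orelse))})
    if isinstance(s, While):
        return (flow(s.body) | {(s.label, init(s.body))}
                | {(l, s.label) for l in final(s.body)})
    return set()  # assignment and skip contribute no edges

def flow_r(s):
    """Reverse flow: every edge of flow(s) inverted."""
    return {(l2, l1) for (l1, l2) in flow(s)}

def blocks(s):
    """Elementary blocks of s, following (2.4)."""
    if isinstance(s, Seq):
        return blocks(s.first) | blocks(s.second)
    if isinstance(s, If):
        return {Cond(s.label, s.cond)} | blocks(s.then) | blocks(s.orelse)
    if isinstance(s, While):
        return {Cond(s.label, s.cond)} | blocks(s.body)
    return {s}  # an assignment or skip is its own block

def labels(s):
    """Labels occurring in s, following (2.5)."""
    return {b.label for b in blocks(s)}

def has_isolated_entry(s):
    """True if no edge of flow(s) targets init(s)."""
    return all(target != init(s) for (_, target) in flow(s))

# [y:=x]^1; [z:=1]^2; while [y>1]^3 do ([z:=z*y]^4; [y:=y-1]^5); [y:=0]^6
program = Seq(Assign(1, "y", "x"),
          Seq(Assign(2, "z", "1"),
          Seq(While(3, "y > 1",
                    Seq(Assign(4, "z", "z * y"), Assign(5, "y", "y - 1"))),
              Assign(6, "y", "0"))))
loop = While(1, "x > 0", Assign(2, "x", "x - 1"))  # program that is one outer loop

print(sorted(flow(program)))  # [(1, 2), (2, 3), (3, 4), (3, 6), (4, 5), (5, 3)]
print(has_isolated_entry(program), has_isolated_entry(loop))  # True False
```

The second sample program illustrates the remark about entries: its entry label 1 is unique but not isolated, because the back edge (2, 1) of the loop targets it.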
