Dataflow analysis: First example (analysis #1), Available expressions

Michel Schinz, Advanced compiler construction, 2008-05-09


First example (analysis #1): Available expressions

Common subexpression elimination

The following C program fragment sets r to $x^y$ for y > 0. How can it be (slightly) optimised?

     1  int y1 = 1;
     2  int r = x;
     3  while (y1 != y) {
     4    int t = y1*2;
     5    if (t <= y) {
     6      r = r*r;
     7      y1 = y1*2;
     8    } else {
     9      r = r*x;
    10      y1 = y1+1;
    11    }
    12  }

Here, y1*2 (line 7) can be replaced by t.

Available expressions

Why is the previous optimisation valid?

Because at line 7, where expression y1*2 appears for the second time, it is available. That is, no matter how we reach line 7, y1*2 will have been computed previously at line 4.

The computation of line 4 is still valid at line 7 because no redefinition of y1 appears between those two points.

Generally speaking, we can define for every program point the set of available expressions, which is the set of all non-trivial expressions whose value has already been computed at that point.

Available expressions

Sets of available expressions before and after each statement of the fragment. Note: we only consider arithmetic expressions.

    before      statement          after
    {}          int y1 = 1         {}
    {}          int r = x          {}
    {}          while (y1 != y)    {}
    {}          int t = y1*2       { y1*2 }
    { y1*2 }    if (t <= y)        { y1*2 }
    { y1*2 }    r = r*r            { y1*2 }
    { y1*2 }    r = r*x            { y1*2 }
    { y1*2 }    y1 = y1*2          {}
    { y1*2 }    y1 = y1+1          {}

Formalising the analysis

How can these ideas be formalised?

1. introduce a variable $i_n$ for the set of expressions available before node n, and a variable $o_n$ for the set of expressions available after node n,
2. define equations between those variables,
3. solve those equations.
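As a quick sanity check of the optimisation discussed for the C fragment, the loop can be ported to Python and run in both versions. This is only a sketch: the function names `power_original` and `power_optimised` are hypothetical, and the port assumes y > 0 as the slides do.

```python
def power_original(x: int, y: int) -> int:
    """Python port of the C fragment: computes x**y for y > 0."""
    y1 = 1
    r = x
    while y1 != y:
        t = y1 * 2
        if t <= y:
            r = r * r
            y1 = y1 * 2  # line 7 of the fragment: y1*2 is computed again
        else:
            r = r * x
            y1 = y1 + 1
    return r


def power_optimised(x: int, y: int) -> int:
    """Same loop, with the second computation of y1*2 replaced by t."""
    y1 = 1
    r = x
    while y1 != y:
        t = y1 * 2
        if t <= y:
            r = r * r
            y1 = t  # y1*2 is available here, so its value is reused
        else:
            r = r * x
            y1 = y1 + 1
    return r
```

Both functions should agree with Python's built-in `x ** y` for every y > 0, which is what the replacement of y1*2 by t relies on.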

Equations

CFG nodes, numbered by source line:

    1   int y1 = 1
    2   int r = x
    3   while (y1 != y)
    4   int t = y1*2
    5   if (t <= y)
    6   r = r*r
    7   y1 = y1*2
    9   r = r*x
    10  y1 = y1+1

Edges: 1→2, 2→3, 3→4, 4→5, 5→6, 6→7, 7→3, 5→9, 9→10, 10→3.

$$
\begin{aligned}
i_1 &= \{\} & o_1 &= i_1 \\
i_2 &= o_1 & o_2 &= i_2 \\
i_3 &= o_2 \cap o_7 \cap o_{10} & o_3 &= i_3 \\
i_4 &= o_3 & o_4 &= \{\texttt{y1*2}\} \cup i_4 \\
i_5 &= o_4 & o_5 &= i_5 \\
i_6 &= o_5 & o_6 &= i_6 \downarrow \texttt{r} \\
i_7 &= o_6 & o_7 &= i_7 \downarrow \texttt{y1} \\
i_9 &= o_5 & o_9 &= i_9 \downarrow \texttt{r} \\
i_{10} &= o_9 & o_{10} &= i_{10} \downarrow \texttt{y1}
\end{aligned}
$$

Notation: $S \downarrow x = S \setminus \{\text{all expressions using } x\}$

Solving equations

The equations can be solved by iteration:

1. initialise all sets $i_1, \dots, i_{10}$, $o_1, \dots, o_{10}$ to the set of all non-trivial expressions in the program, here { y1*2, y1+1, r*r, r*x },
2. viewing the equations as assignments, compute the "new" value of those sets,
3. iterate until a fixed point is reached.

Initialisation is done that way because we are interested in finding the largest sets satisfying the equations: the larger a set is, the more information it conveys (for this analysis).

Solving equations

To simplify the equations, we can first replace all $i_k$ variables by their value, to obtain a simpler system, and then solve that system.

For our example, we get:

$$
\begin{aligned}
o_1 &= \{\} & o_6 &= o_5 \downarrow \texttt{r} \\
o_2 &= o_1 & o_7 &= o_6 \downarrow \texttt{y1} \\
o_3 &= o_2 \cap o_7 \cap o_{10} & o_9 &= o_5 \downarrow \texttt{r} \\
o_4 &= o_3 \cup \{\texttt{y1*2}\} & o_{10} &= o_9 \downarrow \texttt{y1} \\
o_5 &= o_4
\end{aligned}
$$

Solving equations

The simpler system can be solved by iterating until a fixed point is reached, which happens after 7 iterations.

    It.    1    2    3    4                     5                     6          7
    o_1    YR   {}   {}   {}                    {}                    {}         {}
    o_2    YR   YR   {}   {}                    {}                    {}         {}
    o_3    YR   YR   R    {}                    {}                    {}         {}
    o_4    YR   YR   YR   { y1*2, r*r, r*x }    { y1*2 }              { y1*2 }   { y1*2 }
    o_5    YR   YR   YR   YR                    { y1*2, r*r, r*x }    { y1*2 }   { y1*2 }
    o_6    YR   Y    Y    Y                     Y                     { y1*2 }   { y1*2 }
    o_7    YR   R    {}   {}                    {}                    {}         {}
    o_9    YR   Y    Y    Y                     Y                     { y1*2 }   { y1*2 }
    o_10   YR   R    {}   {}                    {}                    {}         {}

Notation: Y = { y1*2, y1+1 }, R = { r*r, r*x }, YR = Y ∪ R

Generalisation

In general, for a node n of the control-flow graph, the equations have the following form:

$$i_n = o_{p_1} \cap o_{p_2} \cap \dots \cap o_{p_k}$$

where $p_1 \dots p_k$ are the predecessors of n.

$$o_n = \mathrm{gen}_{AE}(n) \cup (i_n \setminus \mathrm{kill}_{AE}(n))$$

where $\mathrm{gen}_{AE}(n)$ are the non-trivial expressions computed by n, and $\mathrm{kill}_{AE}(n)$ is the set of all non-trivial expressions that use a variable modified by n.

Substituting $i_n$ in $o_n$, we obtain the following equation for $o_n$:

$$o_n = \mathrm{gen}_{AE}(n) \cup [(o_{p_1} \cap o_{p_2} \cap \dots \cap o_{p_k}) \setminus \mathrm{kill}_{AE}(n)]$$

These equations are the dataflow equations for the available expressions dataflow analysis.

Note: generated expressions

The equation giving the expressions available at the exit of node n is:

$$o_n = \mathrm{gen}_{AE}(n) \cup (i_n \setminus \mathrm{kill}_{AE}(n))$$

where $\mathrm{gen}_{AE}(n)$ are the non-trivial expressions computed by n, and $\mathrm{kill}_{AE}(n)$ is the set of all non-trivial expressions that use a variable modified by n.

In order for this equation to be correct, expressions that are computed by n but which use a variable modified by n must not be part of $\mathrm{gen}_{AE}(n)$. For example

$$\mathrm{gen}_{AE}(\texttt{x=y*y}) = \{\texttt{y*y}\} \quad\text{but}\quad \mathrm{gen}_{AE}(\texttt{y=y*y}) = \{\}$$
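The iteration over the example's equations can be sketched in Python as follows. This is a minimal sketch, not the course's implementation: the names `USES`, `PREDS`, `GEN`, `KILLED_VAR`, `down` and `solve` are hypothetical, all sets are recomputed from the previous round's values, and the kill information follows the example's equations literally (only nodes 6, 7, 9 and 10 remove expressions).

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Non-trivial expressions of the fragment and the variables each one uses.
USES: Dict[str, FrozenSet[str]] = {
    "y1*2": frozenset({"y1"}),
    "y1+1": frozenset({"y1"}),
    "r*r": frozenset({"r"}),
    "r*x": frozenset({"r", "x"}),
}
ALL_EXPRS: FrozenSet[str] = frozenset(USES)

# Predecessors of each CFG node (nodes are numbered by source line).
PREDS: Dict[int, List[int]] = {
    1: [], 2: [1], 3: [2, 7, 10], 4: [3], 5: [4],
    6: [5], 7: [6], 9: [5], 10: [9],
}
# Only node 4 generates an expression; nodes 6, 7, 9, 10 kill via one variable.
GEN: Dict[int, FrozenSet[str]] = {4: frozenset({"y1*2"})}
KILLED_VAR: Dict[int, str] = {6: "r", 7: "y1", 9: "r", 10: "y1"}


def down(exprs: FrozenSet[str], var: Optional[str]) -> FrozenSet[str]:
    """S 'down' x: drop every expression of S that uses variable x."""
    return frozenset(e for e in exprs if var not in USES[e])


def solve() -> Tuple[Dict[int, FrozenSet[str]], int]:
    """Iterate o_n = gen(n) | (i_n 'down' killed var) from the largest sets."""
    out = {n: ALL_EXPRS for n in PREDS}
    columns = 1  # the initial assignment counts as iteration 1
    while True:
        new = {}
        for n, preds in PREDS.items():
            # i_n is the intersection over predecessors; empty at the entry.
            if preds:
                i_n = frozenset.intersection(*(out[p] for p in preds))
            else:
                i_n = frozenset()
            new[n] = GEN.get(n, frozenset()) | down(i_n, KILLED_VAR.get(n))
        columns += 1
        if new == out:  # fixed point: nothing changed in this round
            return out, columns
        out = new


available_out, columns = solve()
```

With this update order, `columns` counts the initial assignment plus every recomputation, including the last one that detects the fixed point, so it is comparable to the iteration numbers in the table.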

Dataflow analysis

Available expressions is one example of a dataflow analysis.

Dataflow analysis is a global analysis framework that can be used to approximate various properties of programs.

The results of those analyses can be used to perform several optimisations, for example:

• common sub-expression elimination, as we have seen,
• dead-code elimination,
• constant propagation,
• register allocation,
• etc.

Analysis scope

In this course, we will only consider intra-procedural dataflow analyses, that is, analyses that work on a single function at a time.

As in our example, those analyses work on the code of a function represented as a control-flow graph (CFG).

The nodes of the CFG are the statements of the function.

The edges of the CFG represent the flow of control: there is an edge from $n_1$ to $n_2$ if and only if control can flow immediately from $n_1$ to $n_2$, that is, if the statements of $n_1$ and $n_2$ can be executed in direct succession.

Analysis #2: Live variables

Live variable

A variable is said to be live at a given point if its value will be read later.

While liveness is clearly undecidable, a conservative approximation can be computed using dataflow analysis.

This approximation can then be used, for example, to allocate registers: a set of variables that are never live at the same time can share a single register.

Intuitions

Intuitively, a variable is live after a node if it is live before any of its successors.

Moreover, a variable is live before node n if it is either read by n, or live after n and not written by n.

Finally, no variable is live after an exit node.

Equations

We associate to every node n a pair of variables $(i_n, o_n)$ that give the set of variables live when the node is entered or exited, respectively. These variables are defined as follows:

$$i_n = \mathrm{gen}_{LV}(n) \cup (o_n \setminus \mathrm{kill}_{LV}(n))$$

where $\mathrm{gen}_{LV}(n)$ is the set of variables read by n, and $\mathrm{kill}_{LV}(n)$ is the set of variables written by n.

$$o_n = i_{s_1} \cup i_{s_2} \cup \dots \cup i_{s_k}$$

where $s_1 \dots s_k$ are the successors of n.

Substituting $o_n$ in $i_n$, we obtain the following equation for $i_n$:

$$i_n = \mathrm{gen}_{LV}(n) \cup [(i_{s_1} \cup i_{s_2} \cup \dots \cup i_{s_k}) \setminus \mathrm{kill}_{LV}(n)]$$
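The live-variable equations can be solved with a short iterative sketch in Python. The input is the six-node example CFG used in these slides (x=read-int; y=read-int; if x<y; z=x; z=y; print z); the names `SUCCS`, `GEN`, `KILL` and `live_variables` are hypothetical, and updating the sets in place rather than round by round is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, List, Set

# Successors of each node of the example CFG.
SUCCS: Dict[int, List[int]] = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
# gen_LV(n): variables read by n; kill_LV(n): variables written by n.
GEN: Dict[int, Set[str]] = {
    1: set(), 2: set(), 3: {"x", "y"}, 4: {"x"}, 5: {"y"}, 6: {"z"},
}
KILL: Dict[int, Set[str]] = {
    1: {"x"}, 2: {"y"}, 3: set(), 4: {"z"}, 5: {"z"}, 6: set(),
}


def live_variables(
    succs: Dict[int, List[int]],
    gen: Dict[int, Set[str]],
    kill: Dict[int, Set[str]],
) -> Dict[int, FrozenSet[str]]:
    """Return i_n (variables live on entry to n) for every node n."""
    # Start from the empty sets: the smallest solution is the one of interest.
    live_in: Dict[int, FrozenSet[str]] = {n: frozenset() for n in succs}
    changed = True
    while changed:
        changed = False
        for n in succs:
            # o_n is the union of i_s over all successors s (empty at exits).
            live_out = frozenset().union(*(live_in[s] for s in succs[n]))
            new_in = frozenset(gen[n]) | (live_out - kill[n])
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in


live_in = live_variables(SUCCS, GEN, KILL)
```

Because liveness flows backwards, the sets of a node depend on its successors rather than its predecessors, which is the main structural difference from the available-expressions equations.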

Equation solving

We are interested in finding the smallest sets of variables live at a given point, as the information conveyed by a set decreases as its size increases.

Therefore, to solve the equations by iteration, we initialise all sets with the empty set.

Example

CFG:

    1  x=read-int
    2  y=read-int
    3  if x<y
    4  z=x
    5  z=y
    6  print z

Edges: 1→2, 2→3, 3→4, 3→5, 4→6, 5→6.

Equations (left) and solution (right):

$$
\begin{aligned}
i_1 &= i_2 \setminus \{x\} & i_1 &= \{\} \\
i_2 &= i_3 \setminus \{y\} & i_2 &= \{x\} \\
i_3 &= \{x, y\} \cup (i_4 \cup i_5) & i_3 &= \{x, y\} \\
i_4 &= \{x\} \cup (i_6 \setminus \{z\}) & i_4 &= \{x\} \\
i_5 &= \{y\} \cup (i_6 \setminus \{z\}) & i_5 &= \{y\} \\
i_6 &= \{z\} & i_6 &= \{z\}
\end{aligned}
$$

Using live variables

The previous analysis shows that neither x nor y is live at the same time as z. Therefore, z can be replaced by x or y, thereby removing one assignment.

    node   original CFG   analysis result    optimised CFG
    1      x=read-int     i_1 = {}           x=read-int
    2      y=read-int     i_2 = { x }        y=read-int
    3      if x<y         i_3 = { x, y }     if x<y
    4      z=x            i_4 = { x }        y=x
    5      z=y            i_5 = { y }        (removed)
    6      print z        i_6 = { z }        print y

Analysis #3: Reaching definitions

Reaching definitions

The reaching definitions for a program point are the assignments that may have defined the values of variables at that point.

Dataflow analysis can approximate the set of reaching definitions for all program points. These sets can then be used to perform constant propagation, for example.

Intuitions

Intuitively, a definition reaches the beginning of a node if it reaches the exit of any of its predecessors.

Moreover, a definition contained in a node n always reaches the end of n itself.

Finally, a definition reaches the end of a node n if it reaches the beginning of n and is not killed by n itself.

(A node n kills a definition d if and only if n is a definition and defines the same variable as d.)

As a first approximation, we consider that no definition reaches the beginning of the entry node.
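The intuitions above translate fairly directly into an iterative computation. The following Python sketch applies them to the six-node CFG of the live-variables example; it is an illustration under stated assumptions, not the formulation given in the course. The names `DEFINES`, `PREDS` and `reaching_definitions` are hypothetical, a definition is identified by the number of the node containing it, and starting the iteration from empty sets is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, List, Tuple

# Node -> variable it defines (nodes 3 and 6 define nothing).
DEFINES: Dict[int, str] = {1: "x", 2: "y", 4: "z", 5: "z"}
# Predecessors of each node; node 1 is the entry node.
PREDS: Dict[int, List[int]] = {1: [], 2: [1], 3: [2], 4: [3], 5: [3], 6: [4, 5]}

Sets = Dict[int, FrozenSet[int]]


def reaching_definitions(
    preds: Dict[int, List[int]], defines: Dict[int, str]
) -> Tuple[Sets, Sets]:
    """Return (definitions reaching the beginning, the end) of each node."""
    reach_in: Sets = {n: frozenset() for n in preds}
    reach_out: Sets = {n: frozenset() for n in preds}
    changed = True
    while changed:
        changed = False
        for n in preds:
            # Beginning of n: union over the exits of all predecessors
            # (nothing reaches the beginning of the entry node).
            new_in = frozenset().union(*(reach_out[p] for p in preds[n]))
            if n in defines:
                # n kills every definition of the variable it defines,
                # then its own definition reaches its end.
                killed = {d for d, var in defines.items() if var == defines[n]}
                new_out = frozenset({n}) | (new_in - killed)
            else:
                new_out = new_in
            if (new_in, new_out) != (reach_in[n], reach_out[n]):
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in, reach_out


reach_in, reach_out = reaching_definitions(PREDS, DEFINES)
```

In this sketch both definitions of z (nodes 4 and 5) reach the beginning of node 6, which reflects the "may have defined" wording: either assignment could have produced the value printed there.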
