Compiler Design Spring 2018 8.0 Data-Flow Analysis Thomas R. Gross - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Compiler Design

Spring 2018

8.0 Data-Flow Analysis

1

Thomas R. Gross Computer Science Department ETH Zurich, Switzerland

slide-2
SLIDE 2

Properties of program analysis

§ Analysis must be correct
  § Depends on use
  § Do not mislead the compiler

§ Analysis should be accurate
  § As accurate as possible (given the information in the program)
  § As accurate as feasible (may not be able to keep all details)

3

slide-3
SLIDE 3

Outline

§ Introduction
  § Why do we need data-flow analysis
  § Examples

§ 8.1 Program representation
§ 8.2 Points
§ 8.3 Paths
§ 8.4 Transfer functions

4

slide-4
SLIDE 4

6

slide-5
SLIDE 5

int foo(int max, int [] A) {
  int k = 1;
  int minVal = A[0];
  while (k < max) {
    if (A[k] < minVal) {
      minVal = A[k];
    }
    k = k + 1;
  }
  return minVal;
}

9

slide-6
SLIDE 6

int foo(int max, int [] A) {
  k = 1
  minVal = A[0]
  L: TCond1 = k < max
  if (TCond1)
  TCond2 = A[k] < minVal
  if (TCond2)
  minVal = A[k]
  k = k + 1
  Goto L
  return minVal
}

10

slide-9
SLIDE 9

§ All instructions in a box are executed together
  § We ignore for now that there could be exceptions

§ if (ConditionVariable)
  § Test ConditionVariable and proceed accordingly
    § “TRUE” path
    § “FALSE” path
    § Only one path is taken

§ “Goto Label” has the obvious meaning

15

slide-10
SLIDE 10

Basic block

§ The maximal sequence of instructions that is executed together is known as a “basic block”
  § Together means: without a change in control flow
  § You can form basic blocks directly from the JavaLi IR
  § May be harder in other languages or IRs

16

slide-11
SLIDE 11

Basic block

§ Observations
  § An if-statement ends a basic block
    § We do not know which statement will be executed next
  § A goto-statement ends a basic block
  § A return statement ends a basic block
  § A label starts a basic block (and ends the previous block)
  § Method call does not end a basic block
    § For some compilers a method call ends a basic block
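The observations above can be applied in a single pass over a flat IR. The sketch below is only an illustration under stated assumptions: the flat listing of the running example does not spell out where the FALSE paths go, so the labels `Next` and `Done` and the `if (...) else Goto X` form are hypothetical additions, and `form_basic_blocks` is not part of the course's JavaLi infrastructure.

```python
import re

# Flat IR of the running example. The labels Next/Done and the
# "else Goto" form are assumptions that make the FALSE paths explicit.
IR = [
    "k = 1",
    "minVal = A[0]",
    "L: TCond1 = k < max",
    "if (TCond1) else Goto Done",
    "TCond2 = A[k] < minVal",
    "if (TCond2) else Goto Next",
    "minVal = A[k]",
    "Next: k = k + 1",
    "Goto L",
    "Done: return minVal",
]

LABEL = re.compile(r"^\w+:\s+")          # a leading "Label: "
ENDS_BLOCK = {"if", "Goto", "return"}    # statements that end a basic block


def form_basic_blocks(statements):
    """Single pass: a label starts a block; if/Goto/return ends one."""
    blocks, current = [], []
    for stmt in statements:
        if LABEL.match(stmt) and current:    # label also ends the previous block
            blocks.append(current)
            current = []
        current.append(stmt)
        body = LABEL.sub("", stmt)           # look at the statement behind the label
        if body.split()[0] in ENDS_BLOCK:
            blocks.append(current)
            current = []
    if current:                              # trailing statements, if any
        blocks.append(current)
    return blocks


blocks = form_basic_blocks(IR)
for i, block in enumerate(blocks):
    print(f"B{i}: {block}")
```

With these assumed labels the pass yields six blocks, B0 through B5. Method calls are treated like ordinary statements here, matching the first of the two conventions mentioned above.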

17

slide-12
SLIDE 12

Control-flow graph

§ We can build a graph that captures the control flow
  § CFG: control-flow graph

§ Nodes: basic blocks
§ Edges: There is an edge from block B1 to block B2 if B2 may be executed immediately after B1.
§ Two special nodes: ENTRY and EXIT
  § ENTRY has no in-edges
  § EXIT has no out-edges
  § All other nodes have at least one in-edge and one out-edge
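To make the definition concrete, here is a minimal sketch that derives the edges for the running example, including ENTRY and EXIT. It rests on assumptions: the block contents are written out by hand, and the labels `Next` and `Done` together with the `if (...) else Goto X` form are hypothetical, introduced only so that the FALSE paths have explicit targets. `build_cfg` is an illustrative name.

```python
import re

# Basic blocks in program order. Labels Next/Done and "else Goto" are assumed.
BLOCKS = {
    "B0": ["k = 1", "minVal = A[0]"],
    "B1": ["L: TCond1 = k < max", "if (TCond1) else Goto Done"],
    "B2": ["TCond2 = A[k] < minVal", "if (TCond2) else Goto Next"],
    "B3": ["minVal = A[k]"],
    "B4": ["Next: k = k + 1", "Goto L"],
    "B5": ["Done: return minVal"],
}


def build_cfg(blocks):
    """Return the set of CFG edges (src, dst), including ENTRY and EXIT."""
    names = list(blocks)
    label_of = {}                                  # label -> block it starts
    for name, stmts in blocks.items():
        m = re.match(r"(\w+):\s", stmts[0])
        if m:
            label_of[m.group(1)] = name

    edges = {("ENTRY", names[0])}
    for i, name in enumerate(names):
        last = re.sub(r"^\w+:\s+", "", blocks[name][-1])   # drop a leading label
        fallthrough = names[i + 1] if i + 1 < len(names) else "EXIT"
        target = re.search(r"Goto (\w+)$", last)
        kind = last.split()[0]
        if kind == "return":
            edges.add((name, "EXIT"))
        elif kind == "if":
            edges.add((name, fallthrough))                  # TRUE path
            edges.add((name, label_of[target.group(1)]))    # FALSE path
        elif kind == "Goto":
            edges.add((name, label_of[target.group(1)]))
        else:
            edges.add((name, fallthrough))                  # plain fall-through
    return edges


edges = build_cfg(BLOCKS)
for src, dst in sorted(edges):
    print(src, "->", dst)
```

The back edge B4 -> B1 is what closes the loop, and the only edge into EXIT comes from the block that ends in `return`.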

18

slide-13
SLIDE 13

B0: k = 1
    minVal = A[0]
B1: L: TCond1 = k < max
    if (TCond1)
B2: TCond2 = A[k] < minVal
    if (TCond2)
B3: minVal = A[k]
B4: k = k + 1
    Goto L
B5: return minVal

19

slide-14
SLIDE 14

ENTRY
B0: k = 1
    minVal = A[0]
B1: L: TCond1 = k < max
    if (TCond1)
B2: TCond2 = A[k] < minVal
    if (TCond2)
B3: minVal = A[k]
B4: k = k + 1
    Goto L
B5: return minVal
EXIT

20

slide-15
SLIDE 15

x = 2;
if (x > 0) {
  y = 0;
  while (x < Nmax) {
    y = y + 1;
    x = 2 * x;
    if (x == Nspecial) {
      x = x * x;
    }
  }
} else {
  y = 1;
}

22

Construct the CFG for this program (5 min)

slide-16
SLIDE 16

§ How many nodes (basic blocks) are in the CFG?

§ Did you include ENTRY and EXIT?

23

slide-17
SLIDE 17

25

slide-18
SLIDE 18

Control-flow graph

§ Edge defines successor/predecessor relationship
  § A block can be its own predecessor

§ Edges capture possible successor (predecessor) relationships
  § Upon further inspection may want to remove edges and/or blocks

§ Always built for one method/function
  § May want to include header block to deal with parameters
  § Some compilers include code in the header block to free registers, perform “stack banging”, etc., and may have an exit block for restore operations

27

slide-19
SLIDE 19

Control-flow graph

§ Exceptional control flow discussion postponed
  § “try-throw-catch”
  § Exception triggered by hardware

28

slide-20
SLIDE 20

CFG construction

§ Basic blocks are a convenient abstraction
  § Not necessary for compiler

§ Some algorithms easier to describe when using basic blocks, with control flow edges between blocks
  § But many algorithms easy to explain in flat IR (sequences of statements)

§ If you must/want to construct a CFG (and identify blocks):
  § Single pass over current IR
  § Form basic blocks (CFG nodes) on the fly

29

slide-21
SLIDE 21

Comments on intermediate representations

§ Previous example “close” to JavaLi source
  § For illustration
  § Statements: JavaLi statements (almost)

§ Basic block concept applies also to low-level (assembler) languages
  § Statement: asm instruction

§ Or to other kinds of internal representations

30

slide-22
SLIDE 22

Comments (cont’d)

§ Many algorithms easier to explain/discuss for “simple” statements
  § destination = source1 op source2
  § Instead of destination = source1 op source2 op source3 op source4 …

§ We assume that our programs have this form (2 operands)
  § Three-address code

§ It’s easy to transform a program
    a = b + c + d
  § turns into
    temp1 = c + d
    a = b + temp1
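The transformation can be sketched as a post-order walk over an expression tree, emitting one temporary per inner node. This is an illustration only: the tuple encoding `(op, left, right)` and the function name `to_three_address` are assumptions, and the grouping `b + (c + d)` is chosen to reproduce the slide's output. A front end that groups from the left would instead produce `temp1 = b + c` first.

```python
from itertools import count


def to_three_address(dest, expr):
    """Lower `dest = expr` to statements with at most two operands each.

    `expr` is a variable name (str) or a tuple (op, left, right).
    """
    code = []
    temps = count(1)

    def lower(node):
        if isinstance(node, str):          # plain variable: nothing to emit
            return node
        op, left, right = node
        lhs, rhs = lower(left), lower(right)
        temp = f"temp{next(temps)}"        # fresh temporary for this inner node
        code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    if isinstance(expr, str):
        code.append(f"{dest} = {expr}")
    else:
        op, left, right = expr
        lhs, rhs = lower(left), lower(right)
        code.append(f"{dest} = {lhs} {op} {rhs}")   # root writes dest directly
    return code


# a = b + (c + d)
for stmt in to_three_address("a", ("+", "b", ("+", "c", "d"))):
    print(stmt)
```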

31

slide-23
SLIDE 23

33

slide-24
SLIDE 24

§ Once we identify c+d as a “common sub-expression” (i.e., an expression that’s evaluated more than once) it’s easy to change the program

§ We could also work on large trees but it’s painful

§ Of course, it’s not clear if the compiler should replace the second occurrence of c+d by temp1.

34

slide-25
SLIDE 25

Analysis inside a basic block

§ “Local” program analysis
§ As the compiler knows that all instructions are executed together, it’s “easy” to analyze a basic block
§ It’s still far from trivial to consider transformations
  § …or to identify operands

35

slide-26
SLIDE 26

Example

a = b + c + d;
x = b + d;

§ There is a common sub-expression but the compiler (most likely) won’t find it
  § Even if we deal with integer operands…
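One way to see why the sub-expression goes unnoticed: once the statements are in three-address form, a compiler that compares right-hand sides as written never encounters `b + d` twice. The sketch below assumes such a purely textual comparison, which is a simplification, and the helper name `repeated_expressions` is hypothetical. The regrouped variant is only legal because integer addition is associative and commutative.

```python
from collections import Counter


def repeated_expressions(code):
    """Return right-hand sides that occur more than once, compared as text."""
    rhs = Counter(stmt.split(" = ", 1)[1] for stmt in code)
    return [expr for expr, n in rhs.items() if n > 1]


# a = b + c + d; x = b + d, lowered with c + d grouped first
as_written = ["temp1 = c + d", "a = b + temp1", "x = b + d"]

# The same computation regrouped as (b + d) + c
regrouped = ["temp1 = b + d", "a = temp1 + c", "x = b + d"]

print(repeated_expressions(as_written))
print(repeated_expressions(regrouped))
```

Only the regrouped form exposes `b + d` as a repeated expression, so finding it requires the compiler to consider reassociation before it looks for matches.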

36

slide-27
SLIDE 27

Another example

a[k] = 1;
a[m] = 2;
b = a[k] + 1;

§ Can the compiler assume a[k] == 1 when it computes b?
  § k, m method parameters (int)
  § a some (large) array (int)
  § no multi-threading, no hidden changes to k, m

§ No, as k == m is possible.
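Running the three statements for both cases shows the difference. This is a small illustration with made-up index values. The function name `fragment` is hypothetical.

```python
def fragment(a, k, m):
    """The three statements from the slide, executed literally."""
    a[k] = 1
    a[m] = 2
    b = a[k] + 1
    return b


print(fragment([0] * 8, k=3, m=5))   # k != m: a[k] is still 1
print(fragment([0] * 8, k=3, m=3))   # k == m: the second store overwrites a[k]
```

If the compiler replaced `a[k] + 1` by the constant 2, the second call would return the wrong value.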

38

slide-28
SLIDE 28

Analysis inside a basic block

§ Analysis (and optimization) of basic blocks postponed
§ Not difficult
  § Chapter 8 of Aho et al. contains a discussion of the topic

39

slide-29
SLIDE 29

Terminology

§ Local {analysis | transformation}: inside a basic block
§ Global {analysis | transformation}: inside a method/function
  § Intra-procedural…
§ Inter-procedural {analysis | transformation}: across methods/functions

40

slide-30
SLIDE 30

Outline

§ Introduction
  § Why do we need data-flow analysis
  § Examples

§ 8.1 Program representation
§ 8.2 Points
§ 8.3 Paths
§ 8.4 Transfer functions

41

slide-31
SLIDE 31

43

slide-32
SLIDE 32

Points (cont’d)

§ Points can be extended to basic blocks as well
§ Point (as before): a place in a program

§ Given a basic block B
  § Pbefore_B: point before basic block B is executed
  § Pafter_B: point after basic block B is executed
  § Drop B if no risk of confusion

45

slide-33
SLIDE 33

B0: k = 1
    minVal = A[0]
B1: L: TCond1 = k < max
    if (TCond1)
B2: TCond2 = A[k] < minVal
    if (TCond2)
B3: minVal = A[k]
B4: k = k + 1
    Goto L
B5: return minVal

46

slide-34
SLIDE 34

52

slide-35
SLIDE 35

Comments

§ Points may have multiple predecessors
  § Join points
  § Join nodes (in the CFG)

§ Points may have multiple successors
  § Split points
  § Split nodes

§ When summarizing paths we may just list the basic blocks

53

slide-36
SLIDE 36

Paths

54

[Figure: CFG with blocks B0, B1, B2, B3]

slide-37
SLIDE 37

Paths

56

[Figure: CFG with blocks B0, B1, B2, B3, containing the statements i = 0 and if (i>0)]

slide-38
SLIDE 38

Paths

57

[Figure: CFG with blocks B0 to B6, containing the branches if (i>0) and if (i<0)]

slide-39
SLIDE 39

Paths

§ We are interested in all paths in a CFG

§ Even if they can never be taken in an execution

§ Need summary information

§ There can be arbitrarily many paths

59

slide-40
SLIDE 40

Loops

60

[Figure: CFG of a loop with blocks B0, B1, B2, B3 and the branch if (COND)]

§ B0 B1 B3
§ B0 B1 B2 B1 B3
§ B0 B1 B2 B1 B2 B1 B3
§ B0 B1 B2 B1 B2 B1 B2 B1 B2 B1 B2 B1 B2 B1 B2 B1 B2 B1 B2 …
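The listed paths can be reproduced by enumerating walks through the loop CFG up to a length bound. This is a sketch under assumptions: the successor table `SUCC` is read off the paths above (B1 branches to B2 or B3, B2 returns to B1), and `paths` and `max_len` are illustrative names. Without the bound the enumeration would not terminate, which is the reason summary information is needed.

```python
# Successors in the loop CFG, inferred from the paths listed on the slide.
SUCC = {"B0": ["B1"], "B1": ["B2", "B3"], "B2": ["B1"], "B3": []}


def paths(src, dst, max_len):
    """Yield every path from src to dst that visits at most max_len blocks."""
    stack = [[src]]
    while stack:
        path = stack.pop()
        if path[-1] == dst:
            yield path
            continue
        if len(path) < max_len:            # the bound keeps the search finite
            for nxt in SUCC[path[-1]]:
                stack.append(path + [nxt])


found = sorted(paths("B0", "B3", max_len=7), key=len)
for p in found:
    print(" ".join(p))
```

Raising `max_len` by two adds one more trip around the loop, so the number of paths grows without limit as the bound grows.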

slide-41
SLIDE 41

Paths

§ We deal only with finite paths
§ Practical view: What has happened when program execution reaches B3
  § Pbefore_B3
  § Execution (for some input) may never reach B3…

§ Summary information is needed

62