Course Script
Static analysis and all that
IN5440 / autumn 2020 Martin Steffen
Contents
2 Data flow analysis
  2.1 Introduction
  2.2 Intraprocedural analysis
    2.2.1 Determining the control flow graph
    2.2.2 Available expressions
    2.2.3 Reaching definitions
    2.2.4 Very busy expressions
    2.2.5 Live variable analysis
  2.3 Theoretical properties and semantics
    2.3.1 Semantics
    2.3.2 Intermezzo: Lattices
  2.4 Monotone frameworks
  2.5 Equation solving
  2.6 Interprocedural analysis
    2.6.1 Introduction
    2.6.2 Extending the semantics and the CFGs
    2.6.3 Naive analysis (non-context-sensitive)
    2.6.4 Taking paths into account
    2.6.5 Context-sensitive analysis
  2.7 Static single assignment
    2.7.1 Value numbering
2 Data flow analysis
What is it about?
Learning targets of this chapter: various DFAs, monotone frameworks, foundations, special topics (SSA, context-sensitive analysis, ...).
2.1 Introduction
In this part, we cover classical data flow analysis, starting with a few specific analyses. Among others, we revisit the reaching definitions analysis, and besides that, we also cover some other well-known ones. Those analyses are based on common principles, and those principles then lead to the notion of a monotone framework. All of this is done for
the simple while-language from the general introduction. We also have a look at procedures, though only first-order ones, not higher-order procedures. Nonetheless, even those already complicate the data flow problem (and its computational complexity), leading to what is known as context-sensitive analysis. Another extension deals with dynamically allocated memory on heaps (if we have time). Analyses that deal with that particular language feature are known as alias analysis, pointer analysis, and shape analysis. We might also cover SSA this time.
2.2 Intraprocedural analysis
As a start, we have a closer look at what we already discussed in the introductory warm-up: the very basics of data flow analysis, without complications like procedures, pointers, etc. This form of analysis is called intraprocedural analysis, i.e., an analysis focusing on the body of a single procedure (or method, etc.). As already discussed, data flow analysis is done on top of so-called control-flow graphs. The control flow of each procedure can be abstractly represented by one such graph. Later, we will see how to “connect” different control-flow graphs to cover procedure calls and returns and how to extend the analyses for that. Compared to the introduction, we dig a bit deeper: we show how a CFG can be computed for the while language (not that it’s complicated), and we list different kinds of analyses, not just reaching definitions. Looking at those will show that they share common traits, which prepares for what is known as monotone frameworks, a classic general framework for all kinds of data flow analyses.
2.2.1 Determining the control flow graph
This section shows how to turn an abstract syntax tree into a CFG. The starting point is the (labelled) abstract syntax from the introduction.

While language and control flow graph
Determining the edges of the control-flow graph

Given a program in labelled (and abstract) syntax, the control-flow graph is easily calculated. The nodes we have already (in the form of the labels); the edges are given by a function flow. This function needs, as auxiliary functions, the functions init and final. These two functions are of the following types:
init : Stmt → Lab
final : Stmt → 2^Lab
                                                            (2.1)

Their definition is straightforward, by induction on the labelled syntax:

                                  init        final
[x := a]l                         l           {l}
[skip]l                           l           {l}
S1; S2                            init(S1)    final(S2)
if [b]l then S1 else S2           l           final(S1) ∪ final(S2)
while [b]l do S                   l           {l}
                                                            (2.2)

The label init(S) is the entry node to the graph of S. The language is simple and initial nodes are unique, but “exits” are not. Note that the concept of unique entry is not the same as that of “isolated” entry (mentioned already in the introduction). Isolated would mean: the entry is not the target of any edge. That’s not the case, for instance, if the program consists of an outer while loop. In general, however, it may be preferable to have an isolated entry as well, and one can easily arrange for that, adding one extra sentinel node at the beginning. For isolated exits, one can do the same at the end.

Using those, determining the edges by a function flow : Stmt → 2^(Lab×Lab) works as follows:

flow([x := a]l)                 = ∅
flow([skip]l)                   = ∅
flow(S1; S2)                    = flow(S1) ∪ flow(S2) ∪ {(l, init(S2)) | l ∈ final(S1)}
flow(if [b]l then S1 else S2)   = flow(S1) ∪ flow(S2) ∪ {(l, init(S1)), (l, init(S2))}
flow(while [b]l do S)           = flow(S) ∪ {(l, init(S))} ∪ {(l′, l) | l′ ∈ final(S)}
                                                            (2.3)

Two further helpful functions

In the following, we make use of two further (very easy) functions with the following types: labels : Stmt → 2^Lab and blocks : Stmt → 2^Stmt. They are defined straightforwardly as follows:
blocks([x := a]l)                = {[x := a]l}
blocks([skip]l)                  = {[skip]l}
blocks(S1; S2)                   = blocks(S1) ∪ blocks(S2)
blocks(if [b]l then S1 else S2)  = {[b]l} ∪ blocks(S1) ∪ blocks(S2)
blocks(while [b]l do S)          = {[b]l} ∪ blocks(S)
                                                            (2.4)

labels(S) = {l | [B]l ∈ blocks(S)}                          (2.5)

All the definitions and concepts are really straightforward and should be intuitively clear almost without giving a definition at all. One point about those definitions, though, is the following: the given definitions are all “constructive”. They are given by structural induction over the labelled syntax. That means they directly describe recursive procedures
for computing them. That matters because static analysis is a phase of a compiler, which means all definitions and concepts need to be realized in the form of algorithms and data structures: there must be a concrete control-flow graph data structure and there must be a function that determines it.

Flow and reverse flow

The labels of a statement can alternatively be characterized via init and flow:

labels(S) = init(S) ∪ {l | (l, l′) ∈ flow(S)} ∪ {l′ | (l, l′) ∈ flow(S)}
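Since the definitions are constructive, they translate almost literally into code. Here is a minimal sketch (the tuple encoding of the labelled syntax is my own assumption, not fixed by the script) of init, final, and flow as recursive functions:

```python
# Hypothetical tuple encoding of the labelled abstract syntax (my choice):
#   ('assign', x, a, l) | ('skip', l) | ('seq', S1, S2)
#   | ('if', b, l, S1, S2) | ('while', b, l, S)

def init(S):
    """Entry label of S, by structural recursion (cf. (2.2))."""
    kind = S[0]
    if kind == 'assign':
        return S[3]
    if kind == 'skip':
        return S[1]
    if kind == 'seq':
        return init(S[1])
    return S[2]                      # if / while: the label of the test

def final(S):
    """Set of exit labels of S (cf. (2.2))."""
    kind = S[0]
    if kind == 'seq':
        return final(S[2])
    if kind == 'if':
        return final(S[3]) | final(S[4])
    if kind == 'while':
        return {S[2]}
    return {init(S)}                 # assign / skip

def flow(S):
    """Edge set of the CFG (cf. (2.3))."""
    kind = S[0]
    if kind in ('assign', 'skip'):
        return set()
    if kind == 'seq':
        S1, S2 = S[1], S[2]
        return flow(S1) | flow(S2) | {(l, init(S2)) for l in final(S1)}
    if kind == 'if':
        _, b, l, S1, S2 = S
        return flow(S1) | flow(S2) | {(l, init(S1)), (l, init(S2))}
    _, b, l, body = S                # while
    return flow(body) | {(l, init(body))} | {(l2, l) for l2 in final(body)}

# e.g. [x:=a+b]0; while [y>a+b]1 do [x:=a+b]2
prog = ('seq', ('assign', 'x', 'a+b', 0),
        ('while', 'y>a+b', 1, ('assign', 'x', 'a+b', 2)))
print(init(prog), final(prog), flow(prog))   # 0 {1} {(0,1), (1,2), (2,1)}
```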
Program of interest

We write S∗ for the program under analysis and often make the following assumptions about it:
– isolated entries: (l, init(S∗)) ∉ flow(S∗) for all l
– isolated exits: (l1, l2) ∉ flow(S∗) for all l1 ∈ final(S∗) and all l2
– label consistency: if [B1]l, [B2]l ∈ blocks(S∗), then B1 = B2 (“l labels the block B”)
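For illustration, a small sketch of how the three assumptions could be checked on a concrete CFG representation; the data layout (entry label, set of final labels, edge set, label-to-block pairs) is my own choice:

```python
def isolated_entry(init_label, flow):
    # isolated entry: the entry label is not the target of any edge
    return all(l2 != init_label for (_l1, l2) in flow)

def isolated_exits(finals, flow):
    # isolated exits: no edge leaves a final label
    return all(l1 not in finals for (l1, _l2) in flow)

def label_consistent(blocks):
    # blocks: iterable of (label, block) pairs; a label may occur twice,
    # but only attached to the same block
    seen = {}
    for l, b in blocks:
        if seen.setdefault(l, b) != b:
            return False
    return True
```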
Often one makes restrictions or assumptions on the form of the CFG. A typical one is the notion of entry and exit nodes being isolated. These are not really restrictions: a program consisting of a while-loop at the top level would not enjoy the property of isolated entries, but adding a skip statement in front of it would, and the parser or some other preprocessing step can easily arrange for such an extra node in the CFG. Those isolated entries and exits are not crucial; it’s just that the subsequent analysis might get a tiny bit clearer (avoiding taking care of some “special corner cases”). One often finds even more restrictions, namely unique entries and exits. As said, in our language entries are unique anyway, insofar it’s not an additional restriction.

Concerning label consistency: that’s kind of a weird assumption. Basically: if one labels an abstract syntax tree (respectively builds a control-flow graph), no one would even think of using the same label for two different blocks (sharing of blocks aside, we are not talking about that here). The natural way to go about it is: each individual block gets a new, fresh label. That would lead to unique labels, no label is reused, which is stronger (and much more natural) than the assumption here. So why bother about label consistency? Well, actually we don’t bother much. It’s more for the “theoretical” account later. Later, we will look at the operational semantics. We describe how the program evolves, and that evolution changes the “abstract syntax” which represents the run-time configuration of the program; describing the behavior by referring to the abstract syntax and how it evolves under execution is a common way to define a language’s semantics. It’s seldom that a programming language is actually implemented that way; it would correspond to a fully interpreted language, which, as said, is not very common. At any rate: we show a simple operational semantics working on the abstract syntax of the while language. That representation will actually lead to duplication of parts of the program, namely when unrolling loops. At that point, we have a representation with a label occurring twice. Establishing correctness for analyses is always a preservation argument. The core of the argument, namely that in an arbitrary number of steps the program does not run into a situation not covered by the static analysis, is: “doing one step preserves some relevant property”. Doing this inductive argumentation means we encounter at run-time CFGs with non-unique labelling, even if the original program, at compile time, was labelled uniquely, of course. To cover that, label consistency is the appropriate, weaker requirement (and it is preserved by the semantics). Actually, the semantics would preserve a better property, it seems to me. Not only is the labelling “consistent” in the sense defined here, but also the edges and neighbors of a node remain comparable. But the book does not point that out.
2.2.2 Available expressions
This is the first of a few classical data flow analyses we cover (like reaching definitions, as well). The analysis can be used for so-called common subexpression elimination. CSE is a program transformation or optimization which makes use of the available expressions analysis: if an expression is computed twice, it may pay off to store its value the first time it’s computed and, at the second occurrence, look it up instead of recomputing it.
Of course, that’s not just a syntactical problem, i.e., it’s not enough to find syntactically identical occurrences of an expression. For expressions containing variables, the content of the variables mentioned in such an expression may or may not have changed between different occurrences of the same expression, and that has to be figured out by the analysis.
Avoid recomputation: Available expressions

[x := a + b]0; [y := a ∗ b]1; while [y > a + b]2 do ([a := a + 1]3; [x := a + b]4)
Goal: For each program point, determine which expressions must have already been computed (and not later modified) on all paths to that program point.

One important aspect in the (informal) goal of the analysis is the use of the word “must”. That’s different from the situation for reaching definitions. There, it’s about whether a “definition” may reach the point in question. It’s also worthwhile to reflect about “approximation”. As always, exact information is not attainable, which is why we content ourselves with “must” information (or “may” in the dual case). In the case here (and related to it): if we have some safe set of available expressions, then a smaller set is safe, too. Conversely, in the may-setting of reaching definitions, enlarging sets was safe. The situation here is therefore dual. What obviously is also different is the nature or type of the information of interest. Here, it’s sets of expressions; for reaching definitions it is sets containing pairs of variables and labels.

Available expressions: general

The two ingredients of the intra-block flow specification are the kill and generate functions.
In the introduction, the reaching definitions analysis was done without explicitly mentioning the notions of “kill” and “generate”, but they were there implicitly anyway (in the intra-block equations). Now we formulate basically all analyses using kill and generate functions, to see the commonalities and differences. Not really all data flow analyses (in the monotone framework setting) can actually be phrased that way, but very many, including many important ones.

Available expressions: types
killAE, genAE : Blocks∗ → 2^AExp∗
AEentry, AEexit : Lab∗ → 2^AExp∗

AExp∗ can be taken as all arithmetic expressions occurring in the program, including all their subexpressions. To be hyper-precise, one may refine it in that trivial (sub-)expressions don’t count. Trivial expressions are constants and single variables. Those trivial expressions are uninteresting from the perspective of available expressions and are therefore left out. They are likewise left out for the very busy expressions analysis, which will be discussed soon.

Intra-block flow specification: kill and generate

killAE([x := a]l)  = {a′ ∈ AExp∗ | x ∈ fv(a′)}
killAE([skip]l)    = ∅
killAE([b]l)       = ∅

genAE([x := a]l)   = {a′ ∈ AExp(a) | x ∉ fv(a′)}
genAE([skip]l)     = ∅
genAE([b]l)        = AExp(b)

The interesting case is, of course, the one for assignments (for generation, the case for boolean expressions is similar). An assignment kills all expressions which contain the variable assigned to, and generates all (non-trivial) sub-expressions of the expression on the right-hand side of the assignment. For generation, we have, however, to be careful: those sub-expressions of a which contain the variable x are of course not generated (because they are no longer “valid” after the assignment). Note (see the flow equations below): the flow within a block is forward, and the information at the exit is computed from the in-flow by first removing the killed expressions and then adding the generated ones.
Because of this order, we must not generate sub-expressions which contain x. The data flow analyses, at least those formulated with the help of kill and generate functions, use them in that order. One might as well use killing and generating in the opposite order, but obviously, in that case, the exact definitions of the kill and generate functions would have to be adapted to reflect that.

Flow equations: AE=, split into
– nodes: intra-block equations, using kill and generate
– edges: inter-block equations, using flow

Flow equations for AE:

AEentry(l) = ∅                                        if l = init(S∗)
AEentry(l) = ⋂{AEexit(l′) | (l′, l) ∈ flow(S∗)}       otherwise

AEexit(l)  = (AEentry(l) \ killAE(Bl)) ∪ genAE(Bl)    where Bl ∈ blocks(S∗)
Apart from the fact that before we did not make use of explicit kill and generate functions, the flow equations here are pretty similar to the ones for reaching definitions. One conceptual difference is the replacement of union ⋃ (may) for reaching definitions by intersection ⋂ (must). Note that the definition of the flow equations assumes isolated entries, which can be seen in the equation for AEentry(l) in the case where l is the initial label (otherwise it would be a bit more complex). Note also: for AEentry, we must make the case distinction between the initial node (no incoming edges) and others; otherwise the empty intersection would be something like the “full set” of expressions. As a subtle and perhaps not too relevant remark in that connection: that the empty intersection corresponds to the “full set” is by definition (of ultimately dealing with lattices). That sounds strange, but it’s ok due to the following observation: the initial node is the only one without incoming edges (assuming isolated entries). Having an isolated entry is not guaranteed by the syntax, which means we have to additionally assume it resp. ensure it otherwise. As mentioned: be aware of the order of kill and generate in the equation for the exit: first the killed ones are removed, then the generated ones are added. Because of that order, one must make sure that no expressions are generated that contain the assigned variable.
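To make the equations concrete, here is a small chaotic-iteration sketch for the AE equations on the running example (the program encoding and the precomputed kill/gen sets are mine; for a must analysis, non-entry nodes start at the full set and shrink):

```python
# Running example: [x:=a+b]0; [y:=a*b]1; while [y>a+b]2 do ([a:=a+1]3; [x:=a+b]4)
AEXP = {'a+b', 'a*b', 'a+1'}
flow = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 2)}
kill = {0: set(), 1: set(), 2: set(), 3: {'a+b', 'a*b', 'a+1'}, 4: set()}
gen  = {0: {'a+b'}, 1: {'a*b'}, 2: {'a+b'}, 3: set(), 4: {'a+b'}}
init_label = 0

# must analysis: start non-entry nodes at the full set and shrink
entry = {l: (set() if l == init_label else set(AEXP)) for l in kill}
exit_ = {l: (entry[l] - kill[l]) | gen[l] for l in kill}

changed = True
while changed:                        # chaotic iteration until stable
    changed = False
    for l in kill:
        if l != init_label:
            preds = [exit_[p] for (p, q) in flow if q == l]
            new_entry = set.intersection(*preds)
        else:
            new_entry = set()
        new_exit = (new_entry - kill[l]) | gen[l]
        if (new_entry, new_exit) != (entry[l], exit_[l]):
            entry[l], exit_[l], changed = new_entry, new_exit, True

print(entry[2])   # {'a+b'}: a + b is available at the loop test, despite a:=a+1
```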
Available expressions

– ⇒ must analysis (as opposed to may)
– an expression is available at a program point if it is computed on every path to that point (and not killed on any)

Example AE

[x := a + b]0; [y := a ∗ b]1; while [y > a + b]2 do ([a := a + 1]3; [x := a + b]4)

[Figure: CFG of the example with nodes l0, ..., l4; the true-branch of the test l2 leads through the loop body l3, l4 and back to l2.]

Worthwhile is (for instance) the entry of node/block l2. At that point, the expression a + b is available. That’s despite the fact that a is changed inside the body of the loop.
2.2.3 Reaching definitions
Reaching definitions
[x := 5]0; [y := 1]1; while [x > 1]2 do ([y := x ∗ y]3; [x := x − 1]4)
[Figure: CFG of the example with nodes l0, ..., l4.]

Reaching definitions: types
killRD, genRD : Blocks∗ → 2^(Var∗ × Lab?∗)
RDentry, RDexit : Lab∗ → 2^(Var∗ × Lab?∗)

where Lab?∗ = Lab∗ + {?}
The information is the same as in the introduction (except that here we are explicit that it should not be arbitrary sets of variables and labels, but only the variables and labels of the program itself, which makes the types precise, perhaps overly so). As far as the mappings or functions RDentry and RDexit are concerned: in a practical implementation, one might use arrays for that. If the implementation identifies nodes by “numbers”, one can have an integer-indexed standard array, which typically is a fast way to store and access that information.
Reaching defs: kill and generate

killRD([x := a]l)  = {(x, ?)} ∪ {(x, l′) | Bl′ is an assignment to x in S∗}
killRD([skip]l)    = ∅
killRD([b]l)       = ∅

genRD([x := a]l)   = {(x, l)}
genRD([skip]l)     = ∅
genRD([b]l)        = ∅

Similar to the AE analysis: the interesting case is, of course, the one for assignments. The generation and killing are indeed also quite similar to before. It is the assignment to x which affects the flow, of course. Here, it eliminates all pairs recording assignments to x; in the AE analysis, it invalidates all expressions which mention x. For the generation, the AE case was a bit more complex than the analysis here: here, just the pair of the variable and the current label is added (actually, with unique labelling, even the label alone would suffice). For AE, the relevant generated information is not drawn from x in an assignment x := a, but from a (its non-trivial sub-expressions).

Flow equations: RD=, split as before into intra-block equations (nodes) and inter-block equations (edges).
Flow equations for RD:

RDentry(l) = {(x, ?) | x ∈ fv(S∗)}                    if l = init(S∗)
RDentry(l) = ⋃{RDexit(l′) | (l′, l) ∈ flow(S∗)}       otherwise

RDexit(l)  = (RDentry(l) \ killRD(Bl)) ∪ genRD(Bl)    where Bl ∈ blocks(S∗)
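The RD equations can be solved, for instance, with a simple worklist algorithm over the edges; below a minimal sketch for the example program (the encodings and precomputed kill/gen sets are mine, not from the script):

```python
# Example: [x:=5]0; [y:=1]1; while [x>1]2 do ([y:=x*y]3; [x:=x-1]4)
flow = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 2)]
kill = {0: {('x', '?'), ('x', 0), ('x', 4)},
        1: {('y', '?'), ('y', 1), ('y', 3)},
        2: set(),
        3: {('y', '?'), ('y', 1), ('y', 3)},
        4: {('x', '?'), ('x', 0), ('x', 4)}}
gen = {0: {('x', 0)}, 1: {('y', 1)}, 2: set(), 3: {('y', 3)}, 4: {('x', 4)}}

entry = {l: set() for l in kill}
entry[0] = {('x', '?'), ('y', '?')}              # init label: all variables "unknown"

worklist = list(flow)                            # process edges until stable
while worklist:
    l1, l2 = worklist.pop()
    out = (entry[l1] - kill[l1]) | gen[l1]       # transfer function of block l1
    if not out <= entry[l2]:                     # new facts flow along the edge
        entry[l2] |= out                         # may analysis: grow by union
        worklist.extend(e for e in flow if e[0] == l2)

for l in sorted(entry):
    print(l, sorted(entry[l], key=str))
```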
2.2.4 Very busy expressions
This is another example of a classical data flow analysis. As for AE, one is interested in expressions (not assignments). This time it’s about whether an expression is “needed” in the future. For AE, the question was whether an expression that has been evaluated in the past is still available at some given point. Here it’s the opposite: will an expression be of use in the future?
This change of perspective also means that VB is an example of a backward analysis. The natural way of analysing very busy expressions is: at the place where an expression is actually used, immediately in front of that place it’s definitely very busy. And then from there, let the information flow backwards: in the previous location, it’s also very busy (unless relevant variables are changed, which “destroys” the “busy-ness”), and then continue the argument. Being very busy also means an expression is used on all future paths, which makes it a must analysis.

One can make use of very-busy information as follows: if an expression is very busy, it may pay off to calculate it already now, i.e., the information can be used for a program transformation that moves the calculation of an expression in an “eager” fashion as early as possible. Transformations like this are known as expression “hoisting”. This may lead to shorter code: an expression that is calculated in two branches can be computed once before the branch. Note, though, that this may reduce the code size but not really the run-time for executing the code. Transformations like the one mentioned are often done (also) on low-level code (like machine code or low-level intermediate representations which are already close to machine code, but still machine-independent). Executing one command (“one line of machine code”) already costs clock cycle(s), since the command itself needs to be loaded to the processor; on top of that come the costs for loading the operands. So, shortening straight-line code may well improve the execution time. However, hoisting an expression out of both branches of a conditional and positioning it in front of the branch shortens the size of the code without making it faster.

Very busy expressions

if [a > b]1 then ([x := b − a]2; [y := a − b]3) else ([y := b − a]4; [x := a − b]5)

Definition 2.2.1 (Very busy expression). An expression is very busy at the exit of a label if on “all” paths from that label, the expression is used before any of its variables are “redefined” (= overwritten).

Goal: For each program point, determine which expressions are very busy at the exit of that point.

Note that I wrote “all” paths in quotation marks. It’s a subtle point that we look at later in an example. In principle one may take the definition to refer to all paths; it’s only that for infinitely long paths, one has to be careful. In a way it’s counter-intuitive (as we will perhaps see in the example) and it’s a more theoretical consequence of the way the fixpoints are defined. Practically one may perhaps ignore it; infinite paths are not too
relevant in practice. But infinite loops do exist, even if sometimes they indicate a problem (the program “hangs”).

Note that the definition and the goal are formulated in a subtle way: it’s about information at the exits of the basic blocks, not the entries. In principle, and as far as the equations are concerned, they are formulated in the same way as, for example, the equations for reaching definitions, which mentioned both RDentry and RDexit. So it seems one calculates exits and entries. Nothing wrong with that, but looking carefully at the pseudo-code formulation of the algorithms later, a refinement of the rather sketchy random iteration of the introduction, we will see that what is given back is indeed only the very busy information at the exits. That’s because the very busy expressions analysis works backwards; for forward analyses it’s the corresponding information at the entries of the blocks. Why is that? Basically (in the case of the backward analysis), having the solution at the exits allows one to reconstruct immediately the solution values at the entries per block (via the kill and generate functions attached to the block, something which will be called the transfer function of the block). The pseudo-code will indeed work with “arrays” VBexit and VBentry; it’s only that the algorithm will give back VBexit only. One can, however, implement basically the same algorithm leaving out VBentry, storing only VBexit throughout the run. Anyway, the reason why the goal is formulated like that is that (for a backward analysis) the exit information is the crucial one; if one has that, the entry information follows by applying the transfer function (a combination of kill and generate) to the exit information, so there is no need to store it separately in VBentry if one wants to do without a second array during the run.

Very busy expressions: types
killVB, genVB : Blocks∗ → 2^AExp∗
VBentry, VBexit : Lab∗ → 2^AExp∗

Very busy expressions: kill and generate (the core of the intra-block flow specification)
killVB([x := a]l)  = {a′ ∈ AExp∗ | x ∈ fv(a′)}
killVB([skip]l)    = ∅
killVB([b]l)       = ∅

genVB([x := a]l)   = AExp(a)
genVB([skip]l)     = ∅
genVB([b]l)        = AExp(b)

A comparison with the kill and generate functions for AE might be interesting. First of all, in both cases the functions have the same types, i.e., operate on the same domains. Of course, one difference is that now the flow is backwards. For the blocks without side effects, this does not matter, i.e., the generate function is identical in both cases (the kill function as well, of course). For the assignment, there are obviously differences. Let’s first look at the kill case. Literally, the two definitions coincide, but they have a different intuition (backward vs. forward). Here for VB we ask, because we are thinking backwards: given what is very busy at the exit of the block, what is very busy at its entry? The killing works backwards too: of whatever was very busy at the exit of the block, all expressions that contain x are affected by the assignment and thus are not very busy at the entry of the block (one could say, as there is no branching within one block, they are not even busy at all), and thus the kill function removes those. The reasoning for the AE case is similar, only working forward. For the generation function, as we are working backwards, the assignment generates a as very busy at the entry of the block. Unlike for AE, a free occurrence of x does not play a role. That’s because the order of application is still first kill, then generate, also when working backwards.

Flow equations: VB=, split as before into intra-block and inter-block equations;
however: everything works backwards now.

Flow equations for VB:

VBexit(l) = ∅                                         if l ∈ final(S∗)
VBexit(l) = ⋂{VBentry(l′) | (l′, l) ∈ flowR(S∗)}      otherwise

VBentry(l) = (VBexit(l) \ killVB(Bl)) ∪ genVB(Bl)     where Bl ∈ blocks(S∗)

Note: Doing a backward analysis, the roles of entries and exits are now reversed. The kill and generate functions now calculate the entry as a function of the exit point. Analogously,
the inter-block flow equations (of the graph) calculate the exit of a block as a function of the entries of others.

Example

[Figure: reversed CFG of the branching example from above.]

Example: infinite loop

[Figure: reversed CFG of (while [x > 1]l0 do [skip]l1); [x := x + 1]l2.]

Since the very busy expressions analysis works backwards, the illustrations show the reversed control flow graphs. Besides that: the looping example is quite instructive. It illustrates a subtle(!) point which might not be immediately clear from the informal formulation of what “very busy” means. The expression x + 1 occurs only in node l2. Now the question is:
Is the expression x + 1 very busy at the beginning of the program or not? Assuming that x > 1, there is obviously an infinite loop and the assignment at l2 will never be executed. Consequently, the expression will not be needed at all. That’s of course naive, in that standard data flow analysis does not try to figure out whether a branch is actually taken or a loop ever terminates (which is undecidable in general). On the other hand, it seems that the analysis could make the assumption that there is actually a path on which x + 1 is never, ever evaluated. That seems to indicate that x + 1 should not count as very busy: if a loop may be taken infinitely often (as in this case), it seems plausible to say there’s a chance that x + 1 may not be executed and therefore count it as not very busy. Plausible as that argument is: it’s wrong, and x + 1 is indeed very busy! Informally, the reason is that, in a way, “infinite paths don’t count” (like the one cycling infinitely many times through the skip-body). Formally, the fact comes from the fact that we are interested in the largest safe solution and from the way the largest fixpoint is defined (and then the way that fixpoint iteration, like the chaotic iteration, calculates it). Later, the same example will be used for live variable analysis. Like the one here, it’s a backward analysis. Different from very busy expressions, it’s a may analysis (and consequently it’s about the smallest possible safe solution). Being a may analysis, it will count x as live at the beginning: there is a possibility that x is used (in l2), and that possibility does not involve making an argument about infinite paths. Unlike the situation for very busy expressions, this seems intuitively plausible.
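The subtle point can be observed in a tiny computation. Iterating from the full set (the greatest-fixpoint discipline of a must analysis) classifies x + 1 as very busy at l0; iterating from ∅ instead would not. A minimal sketch (the encoding is mine):

```python
# (while [x>1]0 do [skip]1); [x:=x+1]2; the only non-trivial expression is x+1
AEXP = {'x+1'}
succ = {0: [1, 2], 1: [0], 2: []}       # forward successors; VB flows backwards
kill = {0: set(), 1: set(), 2: {'x+1'}}
gen  = {0: set(), 1: set(), 2: {'x+1'}}

# greatest fixpoint: start from the full set everywhere and shrink
entry = {l: set(AEXP) for l in succ}
exit_ = {l: set(AEXP) for l in succ}
changed = True
while changed:
    changed = False
    for l in succ:
        if succ[l]:                     # must: intersection over successor entries
            new_exit = set.intersection(*(entry[s] for s in succ[l]))
        else:
            new_exit = set()            # final label
        new_entry = (new_exit - kill[l]) | gen[l]
        if (new_exit, new_entry) != (exit_[l], entry[l]):
            exit_[l], entry[l], changed = new_exit, new_entry, True

print(entry[0])   # {'x+1'}: very busy at the start; starting from ∅ would lose it
```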
2.2.5 Live variable analysis
This analysis focuses on variables again (not on expressions). If we use “dead” for being not live, a variable is intuitively dead if its value is definitely (“must”) not used in the future. The resources holding a dead variable can then be “deallocated”. That is in particular done at lower levels of the compiler. There, the compiler attempts to generate code which makes “optimal” use of available registers (except that real optimality is out of reach, so it’s more like the compiler typically makes a decent effort at making good use of registers, at least on average). A register currently containing a dead variable can be recycled (to be very precise: the register can be recycled if it contains only dead variables, as in some cases a register can hold the content of more than one variable . . . ). So a variable is live if there is a potential use of it in the future (“may”). Referring to the future use of variables entails that the question of liveness of variables leads to a backward analysis, similar to the situation for very busy expressions, which was also backwards.
For the participants of the compiler construction lecture (INF5110): that lecture covered live variable analysis as well, namely in a local variant for elementary blocks of straight-line code. Additionally, a “global” live analysis was sketched, which corresponds to the one here.

When can variables be “recycled”: Live variable analysis

[x := 2]0; [y := 4]1; [x := 1]2; (if [y > x]3 then [z := y]4 else [z := y ∗ y]5); [x := z]6

Live variable: A variable is live (at the exit of a label) if there exists a path from the mentioned exit to a use of that variable which does not assign to the variable (i.e., redefine its value).

Goal therefore: for each program point, determine which variables may be live at the exit of that point.
Live variables are about: when is a variable still “needed”? If not needed, one can free the resources holding its value. The analysis is related to reaching definitions, at least in that it’s again about “variables”, not expressions. In both cases we would like to connect the assignment (also called the definition) of a variable to its use. The perspective here is different, though. For RD, the question is: given an assignment, which program points can it reach? Here it’s rather: given a program point, may the current value of a variable still be used in the future? This switch in perspective is the difference between forward and backward analysis. Unlike in the informal definition of very busy expressions, here the word is may. With the may-word, the intuition is that making the solution larger is ok; therefore we are interested in the smallest solution. This is consistent with using live variable analysis for recycling variables. If we estimate too many variables as live, we cannot reuse their memory, which is safe; we only may lose efficiency. Making the opposite approximation, marking an actually live variable erroneously as non-live, can cause errors and is therefore unsafe. Note again the formulation of the goal: “backward” corresponds to “we are interested in the information at the exit”.

Live variables: types
killLV, genLV : Blocks∗ → 2^Var∗
LVentry, LVexit : Lab∗ → 2^Var∗
Live variables: kill and generate

killLV([x := a]l)  = {x}
killLV([skip]l)    = ∅
killLV([b]l)       = ∅

genLV([x := a]l)   = fv(a)
genLV([skip]l)     = ∅
genLV([b]l)        = fv(b)

We need to remember that the calculation is backwards. As for kill: in the only interesting case, the assignment, the question is: given the live variables at the end of the block, which ones are live at the entry? Certainly x is no longer live, as its value at the entry is not used before being overwritten. That also explains the generation: all free variables in a, resp. in b, are live at the beginning of the block, because the block uses them. Note that x itself may well occur in fv(a) and then remains live at the entry; that works out because kill is applied before generate, also in the generation function for assignments, as we are working backwards.

Flow equations: LV=, split as before into intra-block and inter-block equations; however: everything works backwards now.

Flow equations for LV:

LVexit(l) = ∅                                         if l ∈ final(S∗)
LVexit(l) = ⋃{LVentry(l′) | (l′, l) ∈ flowR(S∗)}      otherwise

LVentry(l) = (LVexit(l) \ killLV(Bl)) ∪ genLV(Bl)     where Bl ∈ blocks(S∗)

The example of why one is this time interested in the smallest solution is the same program as for VB: a simple recursive equation (induced by a trivial while-loop). This time the loop contributes a ∪. We can make the solution as large as possible (but not arbitrarily small; the empty set is not a solution). However, the smallest set is the most informative one, and the intended use of freeing/re-using “non-live” variables makes clear that “larger is less precise”.
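For comparison with the VB example: the least-fixpoint computation for LV on the same looping program (encoding again mine) reports x as live at the start, without any argument about infinite paths:

```python
# Same looping program: (while [x>1]l0 do [skip]l1); [x:=x+1]l2
succ = {0: [1, 2], 1: [0], 2: []}
kill = {0: set(), 1: set(), 2: {'x'}}
gen  = {0: {'x'}, 1: set(), 2: {'x'}}   # the test and the use generate liveness

entry = {l: set() for l in succ}        # least fixpoint: start from bottom
exit_ = {l: set() for l in succ}
changed = True
while changed:
    changed = False
    for l in succ:
        new_exit = set().union(*(entry[s] for s in succ[l]))  # may: union
        new_entry = (new_exit - kill[l]) | gen[l]
        if (new_exit, new_entry) != (exit_[l], entry[l]):
            exit_[l], entry[l], changed = new_exit, new_entry, True

print(entry[0])   # {'x'}: x is (may-)live at the beginning of the program
```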
Example

(while [x > 1]l0 do [skip]l1); [x := x + 1]l2

As one can see in the flow equations, especially the case dealing with the final nodes, variables are considered dead at the end. One may instead have the intuition that the variables (or some of them) are returned to somewhere, in which case they are still needed “after” the final node (for being returned) and hence should be marked live. With the latter intuition, it would probably be clearer to have an explicit return statement (after which the variables are really dead).

For the participants of the compiler construction course (INF5110): the course presented a local live variable analysis, which concentrated on straight-line code. The code was so-called three-address code and had two types of variables: normal ones and so-called “temporaries” (temporary variables). The standard variables were assumed live at the end of the straight-line code. The reason was that the straight-line code was contained inside one block of a larger control-flow graph; the analysis had to assume conservatively that chances are that the variables may be used by some potentially following block, and thus the variables were assumed live. Temporaries, on the other hand, could be assumed dead at the end of a block, since the generated code never used temporaries from a previous block. That, of course, depended on the knowledge of how this code was actually generated. The general point is that the formulation of the live variable analysis (or others) must go hand in hand with what is actually going on, i.e., the semantics of the language and the assumptions about how the program is used (“will the content of the variables be returned to a caller after the program code or not”, “might there be a block after the code being analyzed or not, and if so, will it make use of temporaries resp. variables or not”).

Looping example

[Figure: CFG of the looping example with nodes l0, l1, l2.]
2.3 Theoretical properties and semantics
2.3.1 Semantics
So far we have formulated a number of analyses (using flow equations or constraints). We also stressed the importance that the analyses be safe (or correct, or sound), meaning that the information given back by the analysis says something “true” about the program, more precisely about the program’s behavior. So far, that is an empty claim, as we have not fixed what the behavior actually is. Doing so may look superfluous, in particular as the while language we are currently dealing with is so simple that its semantics seems pretty “obvious” to most. That, however, may no longer be the case when dealing with more advanced or novel features or non-standard syntax etc. Being clear about the semantics also pays off when implementing a language; after all, the ultimately running program is expected to implement exactly the specified semantics, down to the actual machine code. Defining a language by whatever its implementation happens to do is not considered a very dignified engineering approach (“the semantics of a program is what happens if you run it, you’ll see”).

In this section we will precisely define the semantics of the while-language. The semantics is defined at a rather high level, on the level of the abstract syntax, and the task of the compiler would be to preserve exactly that semantics through all its phases. If optimizations and transformations are done, for example based on some static analyses, it’s the task of the compiler to make sure they preserve the semantics; the semantics is the yardstick that all further actions of the compiler are measured against.

The semantics of a programming language can be specified in different styles or flavors. We make use of operational semantics, a style of semantics which describes the steps a program takes when executed. More specifically, we use a structural operational semantics, where “structural” refers to the fact that the steps are described making inductive use of the structure of the program (i.e., its abstract syntax). That’s arguably a straightforward way of fixing the semantics. It basically describes the semantics as steps transforming an abstract syntax tree step by step and can be seen as a formal description of an interpreter. There is, however, not one unique way how such an operational semantics is defined; here too, different flavors and styles exist. Here, we are doing what is known as a small-step semantics; later we may cover some variations.

Relating programs with analyses

– analyses are intended as (static) abstractions or overapproximations of real program behavior
– so far: without real connection to programs
States, configs, and transitions: fixing some data types. A state σ ∈ State = Var → Z maps variables to values; configurations are of the form ⟨S, σ⟩ or, for terminated computations, just σ.

Transitions:

⟨S, σ⟩ → σ′
⟨S, σ⟩ → ⟨S′, σ′⟩

Semantics of expressions:

[[·]]A : AExp → (State → Z)
[[·]]B : BExp → (State → B)

Simplifying assumption: no errors.

[[x]]A σ          = σ(x)
[[n]]A σ          = N(n)
[[a1 opa a2]]A σ  = [[a1]]A σ  opa  [[a2]]A σ

[[not b]]B σ      = ¬([[b]]B σ)
[[b1 opb b2]]B σ  = [[b1]]B σ  opb  [[b2]]B σ
[[a1 opr a2]]B σ  = [[a1]]A σ  opr  [[a2]]A σ

Clearly: if ∀x ∈ fv(a). σ1(x) = σ2(x), then [[a]]A σ1 = [[a]]A σ2.
In the introductory remarks, we mentioned that we will do a specific form of semantics, namely an operational semantics. That’s not 100% true. For dealing with the control-flow structure of the while language, we will indeed formulate operational rules to describe the transitions. We must, however, also give meaning to the expressions a and b. To do that operationally would be possible, but perhaps overkill. More straightforward is an inductive definition in the way given. That style corresponds more to a denotational semantics; it works smoothly here because expressions have no side effects. So things like x := 5 * y++, which can be found in for example C-like languages, where the right-hand side of the assignment is at the same time an expression as well as having a side effect, are not welcome here. Without such side effects, the denotation-style semantics for expressions is just the easiest way of specifying their meaning. That way we can focus the operational treatment on the statements and the control flow of the semantics.
It would be possible to also specify the meaning of expressions in an operational, step-wise manner, as part of the small-step semantics (maybe distinguishing the transitions between configurations from “micro-transitions” evaluating the expressions). For participants of the compiler construction course (INF5110): an operational semantics for the expressions showed up in some way in that lecture, when translating expressions into three-address code. Since there is no recursion and there are no deeply nested expressions in three-address code (recursion is used in the definition of [[a]]A σ), an expression has to be “expanded” into sequences of non-nested expressions together with temporary variables (“temporaries”) to hold intermediate results of subexpressions. That in a way corresponds to an explicit, step-by-step execution of a compound expression. It’s not the same as an operational semantics, as it does not specify “transitions”, but “code generation”. On the other hand, it’s not far away, as each single three-address-code instruction would correspond to one (micro-)step.
SOS

Ass:     ⟨[x := a]l, σ⟩ → σ[x ↦ [[a]]A σ]

Skip:    ⟨[skip]l, σ⟩ → σ

Seq1:    if ⟨S1, σ⟩ → ⟨S1′, σ′⟩, then ⟨S1; S2, σ⟩ → ⟨S1′; S2, σ′⟩

Seq2:    if ⟨S1, σ⟩ → σ′, then ⟨S1; S2, σ⟩ → ⟨S2, σ′⟩

If1:     if [[b]]B σ = ⊤, then ⟨if [b]l then S1 else S2, σ⟩ → ⟨S1, σ⟩

While1:  if [[b]]B σ = ⊤, then ⟨while [b]l do S, σ⟩ → ⟨S; while [b]l do S, σ⟩

While2:  if [[b]]B σ = ⊥, then ⟨while [b]l do S, σ⟩ → σ
Each construct is covered by one or more rules (the second rule for conditionals, for the false-case, is left out; it is analogous). The conditions of the rules are mutually exclusive. For instance, in the case of the conditional, either the rule for the true-case applies or the rule for the false-case (not shown). This “mutual exclusiveness” means that the semantics is deterministic: given a configuration, there is at most one successor. That is to be expected from a conventional sequential language. Not only are the conditions mutually exclusive, the rules and conditions are also “exhaustive”: for each configuration, exactly one rule actually applies. That means each configuration has exactly one successor, i.e., not only is the behavior deterministic, a program also does not get “stuck”. The only situation in which there is no successor is when the final state has been reached, i.e., when the configuration consists only of a state (and no code). That’s the way (in this formalization) end-states look.
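The rules translate directly into a one-step interpreter. A minimal sketch (the statement encoding matches the earlier CFG sketch; evaluating expressions with Python’s eval is a shortcut of this sketch, standing in for [[·]]A and [[·]]B):

```python
def evalexp(e, sigma):
    # stands in for [[.]]A / [[.]]B; assumes expressions happen to be
    # Python-evaluable strings, which is an assumption of this sketch
    return eval(e, {}, dict(sigma))

def step(S, sigma):
    """One SOS step: returns (S', sigma') or just sigma' on termination."""
    kind = S[0]
    if kind == 'assign':                         # rule Ass
        _, x, a, _l = S
        return {**sigma, x: evalexp(a, sigma)}
    if kind == 'skip':                           # rule Skip
        return sigma
    if kind == 'seq':                            # rules Seq1 / Seq2
        res = step(S[1], sigma)
        if isinstance(res, dict):                # S1 terminated
            return (S[2], res)
        S1p, sigp = res
        return (('seq', S1p, S[2]), sigp)
    if kind == 'if':                             # rules If1 / If2
        _, b, _l, S1, S2 = S
        return ((S1 if evalexp(b, sigma) else S2), sigma)
    _, b, _l, body = S                           # while: rules While1 / While2
    if evalexp(b, sigma):
        return (('seq', body, S), sigma)         # unroll the loop once
    return sigma

# [y:=1]1; while [x>1]2 do ([y:=x*y]3; [x:=x-1]4), started with x = 5
prog = ('seq', ('assign', 'y', '1', 1),
        ('while', 'x>1', 2, ('seq', ('assign', 'y', 'x*y', 3),
                             ('assign', 'x', 'x-1', 4))))
cfg = (prog, {'x': 5})
while not isinstance(cfg, dict):                 # iterate until only a state is left
    cfg = step(*cfg)
print(cfg)                                       # {'x': 1, 'y': 120}
```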
As a side remark about “getting stuck”: it may sound strange, why should a program “get stuck” anyway? Real programs don’t tend to get stuck; why should its formal semantics do that? Indeed, there is a point to it; one could make the argument, when a processor executes machine code, then either there [. . . continue the argument . . . ]

Derivation sequences

– finite sequence: ⟨S1, σ1⟩, . . . , ⟨Sn, σn⟩, σn+1
– infinite sequence: ⟨S1, σ1⟩, . . . , ⟨Si, σi⟩, . . .
Lemma 2.3.1 (Steps and CFG). If ⟨S, σ⟩ → ⟨S′, σ′⟩, then

a) final(S) ⊇ final(S′)
b) flow(S) ⊇ flow(S′)
c) blocks(S) ⊇ blocks(S′); if S is label consistent, then so is S′.

We don’t do the proof here. The properties should actually be pretty obvious (given that we defined the semantics etc. all properly, which we did). Steps decrease the set of labels, for instance. The set of labels may actually also stay the same; that will be the case when unrolling a while loop. Even if we don’t do the actual proof, we can remark on how that proof is done. The proof needs to be based on the semantics, and it will involve a big case distinction (in the case of skip, such and such; in the case of assignment, such and such, etc.). Furthermore, some of the rules involve an inductive case: the step defined for some construct, say S1; S2, involves mentioning a step for S1 in the premise. S1 is a substructure of S1; S2. Therefore one can do induction on the structure of the abstract syntax (i.e., the abstract syntax tree). That form of induction is also called proof by structural induction. One could alternatively do a proof by induction on the (structure of the) derivation of the step ⟨S, σ⟩ → ⟨S′, σ′⟩. Note that the operational semantics is given by derivation rules, and the justification of a step is therefore a tree, a derivation tree, and one can do induction on that structure. Both approaches would basically lead to the same proof. Note: induction over the structure of the syntax tree does not work for the case of the while loop (in the true-case): the configuration after unrolling the loop is not structurally smaller than before the step.

Correctness of live analysis
LV constraint system

LVexit(l) ⊇ ∅                                         if l ∈ final(S∗)
LVexit(l) ⊇ ⋃{LVentry(l′) | (l′, l) ∈ flowR(S∗)}      otherwise

LVentry(l) ⊇ (LVexit(l) \ killLV(Bl)) ∪ genLV(Bl)

For functions liveentry, liveexit : Lab∗ → 2^Var∗, we write live ⊨ LV⊆(S) for “live solves the constraint system LV⊆(S)” (analogously live ⊨ LV=(S) for the equations LV=(S)).

Equational vs. constraint analysis

We mentioned it a couple of times: for data flow analyses, the formulation based on equational constraints and the one based on ⊆-constraints are “essentially” the same. Essentially insofar as they give the same result as far as the smallest solution is concerned. The lemma makes this explicit. Note that the lemma is about a may analysis, namely live variable analysis. For must analyses, a corresponding result would mention not the least solution, but the largest.

Lemma 2.3.2 (Equational and inclusion constraints). If live ⊨ LV=, then live ⊨ LV⊆. The least solutions of live ⊨ LV= and live ⊨ LV⊆ coincide.
2.3.2 Intermezzo: Lattices
This section provides some of the foundations on which data flow analysis rests, in particular the partially ordered structures known as lattices; we will cover some basic facts and definitions about those. Actually, lattices and similar concepts are more widely “used” than in data flow analysis. “Used” in the sense that they provide conceptual foundations also for other fields besides data flow analysis.

Anyway, the concept of lattice is about orders or orderings. An ordered structure is about comparing things, like in being larger resp. smaller, or similar. For us, the possible results of an analysis are ordered, and the ordering is about precision (under the side condition of being safe or correct): from all safe approximations we are interested in the most precise one (which can be the smallest solution or the largest solution, depending on whether we are dealing with a may or a must analysis). So, order relations allow one to compare elements (i.e., order them) and they satisfy some properties; there are also different kinds of order relations (total order, partial order, and
more), and a lattice is a specific form of a partial order with some further requirements; lattices themselves come in different forms, depending on the conditions they satisfy. There is a whole zoo of order relations and lattice variations; the field is called order theory or lattice theory. We will stick to one straightforward notion called complete lattice. Let’s start with order relations, sticking to the one we will work with, the notion of partial order.
Definition 2.3.3 (Partial order). A partial order (A, ⊑) is a set with a reflexive, transitive, and antisymmetric binary relation ⊑.

Reflexivity means a ⊑ a for all elements. Transitivity means that if a ⊑ b and b ⊑ c, then a ⊑ c. Finally, antisymmetry for a binary relation means that a ⊑ b and b ⊑ a implies a = b.

Why “partial” order? Unlike in so-called total orders, there may be incomparable elements in a partial order. A total order would require that for all pairs of elements a and b, we have a ⊑ b or b ⊑ a (or both, in which case a = b, thanks to antisymmetry). Good examples of total orders are the numbers, say (N, ≤). Total orders are also called linear orders, since one can visualize them by arranging the elements on a straight line. One can define partial orders (and total orders) also slightly differently, as so-called strict partial orders, where the relation is not reflexive and the property corresponding to antisymmetry needs to be reformulated. In that case, one would write ⊏ instead of ⊑ (where ⊑ is ⊏ ∪ =), but otherwise it’s the same concept. In the following, when comparing things, we will sometimes say in the English text things like “a is larger than b” etc. We mean “larger” to refer to the order ⊑, without always saying more precisely “larger-or-equal”.

The relation ⊑ allows one to say when an element is smaller than (or equal to) another, comparing two elements. Obviously, with ⊑ given, one can also say when one element is larger than two other elements. As boring and obvious as that is, there is a name for it, namely being an upper bound. So: c is an upper bound of a and b if c ⊒ a and c ⊒ b. Obviously, we assume that the reader understands that ⊒ is “the same” relation as ⊑, just written the other way around. Anyway, the definition of upper bound or, dually, of lower bound of two elements is clear, and one can “generalize” it to speak of the upper bound of 3, 4, etc. elements. Like: c is a lower bound of a1, a2, and a3 if c ⊑ a1, c ⊑ a2, and c ⊑ a3. Actually, we might as well use the concepts on arbitrary sets of elements from A, speaking of an upper resp. lower bound of a set A′ where A′ ⊆ A.

Let’s have a closer look at the “nature” of upper and lower bounds. There it becomes slightly more interesting, especially for partial orders. For total orders, like (N, ≤), the situation is straightforward. Given two numbers n1 and n2, they always have upper and lower bounds. Let’s assume wlog (“without loss of generality”) that n1 ≤ n2. Then all numbers n′1 with n′1 ≤ n1 are lower bounds of n1 and n2. Dually, all numbers n′2 with n′2 ≥ n2 are upper bounds of n1 and n2. So, both upper and lower bounds exist, and they are not unique.

But actually, that’s not quite true, or at least we have to be careful what we mean by “upper and lower bounds exist” (let’s say for (N, ≤)). Certainly, if we take two numbers n1
and n2 as above, those bounds exist; there was nothing wrong with the above argumentation. Same for the upper bound of 3, 4, etc. numbers. But what about the following: is there an upper bound of all even numbers, i.e., of the set {0, 2, 4, 6, . . .} ⊆ N? Well, there is no upper bound, at least not inside N. The point here is that there is a qualitative step from asking for the existence of bounds for finite sets to the existence of bounds for infinite sets. For the particular setting of the total order (N, ≤), upper bounds for infinite sets are not guaranteed; for lower bounds, there would always be at least one lower bound, namely 0.
Let’s keep staying with total orders for a while, for concreteness’ sake just (N, ≤). We know that, at least when sticking to finite sets, upper and lower bounds do exist, though there is typically more than one lower resp. upper bound. At least for upper bounds, it’s guaranteed that there are infinitely many of them for any finite set of elements. Let’s focus on upper bounds in the discussion, since for lower bounds it’s the same, just the other way around. Now, as said, there are typically many upper bounds, and of course the ≤-relation also orders the bounds themselves. In particular, since we are still in a total order, one can easily establish: given the set of upper bounds of two numbers n1 and n2, there is a smallest upper bound. One conventionally writes max(n1, n2) for that, and it’s either n1 or n2, depending on which is larger (or “both”, if n1 = n2). One could likewise establish that this applies not just to upper bounds of two resp. finitely many elements; it would apply to arbitrary sets of elements. (N, ≤), however, is not suitable to illustrate this particular point, simply because there is no infinite set that actually has an upper bound. If we switch to (Z, ≤) instead, there are now infinite sets that do have upper bounds. If a given set A′ ⊆ Z has upper bounds, then it’s guaranteed that there is a smallest or least upper bound. Switching to lower bounds instead of upper bounds, one gets the dual notion of a greatest lower bound; again, it’s unique, if it exists. For numbers, one uses min and max to refer to those. The uniqueness of least upper bounds is ultimately a consequence of the antisymmetry of ≤. For linear orders, this is very straightforward to see (“assume that there are two least upper bounds . . . ”).

For data flow analysis, we are not working with linear or total orders, but with partial orders, and as a consequence, there may be pairs of (incomparable) elements for which no upper bound exists (same for lower bounds). The partially ordered sets are intended as the domain of the “search space”. When having two incomparable safe approximations, that’s still ok if there is an upper bound; one can take that approximation, which is looser than the two incomparable ones but still safe. Even better is if there is not only an upper bound, but a least upper bound. Like in the total-order setting, one can still establish that least upper bounds are unique (if they exist). The argument is a tad less immediate than with total orders, but still very easy and again a consequence of antisymmetry. Since least upper bounds and greatest lower bounds are so important for us, and since partial orders themselves don’t guarantee their existence, we just postulate their existence and focus on partial orders with lubs and glbs. Those are called complete lattices.
As mentioned, lubs and glbs are unique. That’s why we can say the greatest lower bound, for instance, as opposed to just a lower bound. Being unique, we can also introduce a notation or symbol for them: we write a1 ⊔ a2 for the least upper bound of a1 and a2, and a1 ⊓ a2 for their greatest lower bound. Actually, we are not quite done for complete lattices. We stated that complete lattices “have” glbs and lubs, but, as illustrated for total orders using N or Z, one has to be careful when treating infinite sets. The fact that the binary operators ⊔ and ⊓ exist does not mean that one gets bounds for arbitrary subsets, including infinite ones. For a complete lattice we require the existence of ⊔L′ and ⊓L′, the least upper bound resp. greatest lower bound of an arbitrary subset L′ ⊆ L (where L is the set of elements that form the complete lattice).

A (probably obvious) remark on notation: we write ⊑, ⊒, ⊔, ⊓ when talking about lattices in general. The symbols ⊆, ⊇, ∪, ∩ are used when dealing with sets, i.e., denoting the subset resp. superset relations, set union and set intersection. Sets of sets, together with those operations and relations, indeed form a lattice, and in the data flow analyses we have seen examples of that.

Definition 2.3.4 (Lattice). A lattice (L, ⊑) is a partial order for which binary least upper bounds and binary greatest lower bounds exist, i.e., for all a1 and a2, there exist a1 ⊔ a2 and a1 ⊓ a2. It’s a complete lattice if arbitrary least upper bounds and greatest lower bounds exist, i.e., ⊔L′ and ⊓L′ for all L′ ⊆ L.

Why does one need such specific assumptions, insisting on working with lattices? We are trying to find solutions to a constraint problem (data flow equations or ⊑-constraints). We know already, from earlier discussions, that those constraints have more than one solution (there is more than one safe approximation), that solutions are ordered wrt. their precision, and that solving data-flow constraints corresponds to finding a fixpoint (resp. a pre-fixpoint or post-fixpoint). Trying to solve a constraint system is, very generally, a search problem: find a solution if it exists (or maybe find all). In particular, since here solutions are ordered, find a good solution. The fact that one is after a good solution, for a given measure, turns the search problem into an optimization problem.

How fast one can solve such a problem of course depends on the size of the search space. That’s clear and not too insightful. When talking about the (computational) complexity of the analyses later, we will have a look at what influences the problem size: basically the size of the program, but also the level of abstraction used in the data flow problem. A finer abstraction allows being more precise wrt. the information being tracked (for the same problem setting, like focusing on reaching definitions, for instance), at the price of enlarging the search space. More information will lead to a better, i.e., more precise analysis. Of course, at one given level of abstraction and precision, there is still exactly one best solution. It’s only that there may be even better ones when adding more information to be tracked (we will see an example of that effect when we look at interprocedural data-flow analysis). Anyway, a more interesting factor in how fast one can solve optimization problems is the “structure” of the search space, how it is “shaped”. Depending on the properties of the given problem, innumerable specific search strategies exist. There are whole research fields
concerned with optimizations in various settings. For us, the “search spaces”, being lattices are particularly well-behaved! This is captured by the lattice-requirement together with the fact that the function for which we look for fixpoints, is monotone (resp. continuous which is a strengthening of being monotone). Now, this setting —continuous functions
Since it’s uniques, we can call it the best solution, not just “a” best solution. By saying “exactly guaranteing”, I mean the following: if one looks at the proof of establishing the result, one sees that every single condition is actually needed to carry out the proof. Not more and not less than monotone resp. continuous functions over some form of lattice is needed to obtain the desired result, and thus no make the mononone frameworks work. So, that’s what one has to do to drum up with a new data flow analysis: phrase the problem domain for the information of interest in the form of a lattice, formulate (in)-equations in a way that corresponds to a continuous function, and that’s it. Finally, some remarks about monotone resp. continuous functions. We will see details later, but such much already here. Both conditions on a function are closely related and every continuous function is monotone. The notion of continuity is not 100% identical to the concept one may know from analysis (as mathematical field), when dealing with functions over the reals R (where here we are dealing with monotone functions over some lattice). There are connections and it’s not a coincidence that in both situations one speaks of continuity, but it’s not very important for us. Anyway, the difference for us between monotone and continuous function is: Monotonicity guarantees that a unique best fixpoint exists mathematically. Continuity is a constructive thing: not only does such a best solution exits, there is an algorithmic way to compute it (or, to be more precise, approximate it arbitrarily). So there exists a fixpoint algorithm sometimes called fixpoint iteration. Being monotone is kind of like a “finite” property of a function, whereas continuity states how the function behave “in the limit”; monotonicity states, when taking two elements, one larger than the other, and when feeding them to a function, the two resulting outcomes are ordered the same way. Continuity requires a similar thing, when feeding an infinitely long series of increasing values into a function, not just 2 elements. As computer scientists, we are after “algorithms”, so that means, continuity is what we
domain, the lattice, is often finite. Like for the examples we have seen: in a given program, there are only finitely many labels, finitely many variables, finitely many expressions, etc. Thus, there are only finitely many sets of such things, so everything is finite. In that case, the distinction between monotonicity and continuity disappears: in finite settings, both notions coincide, as does the distinction between lattices and complete lattices. That means also: not only can we approximate the desired fixpoint arbitrarily closely, we can actually reach it exactly in a finite amount of time by the mentioned fixpoint iteration.
Intermezzo: orders, lattices etc. as a reminder:
meets and joins exist for all subsets; furthermore ⊤ = ⊓∅ and ⊥ = ⊔∅. There are also other forms of lattices, for instance join semi-lattices, if one only needs joins but not meets. Here, we generally simply assume complete lattices, and thus the monotone framework is happy. In particular, if we are dealing with finite lattices, which is an important case, we don't need to consider infinite sets, and "standard" lattices with binary meets and joins (and least and largest elements) are complete already.

Fixpoints

Given: a complete lattice L and a monotone f : L → L.
   Fix(f) = {a | f(a) = a}
   Red(f) = {a | f(a) ⊑ a}
   Ext(f) = {a | f(a) ⊒ a}

Define "lfp" / "gfp":

   lfp(f) = ⊓ Red(f)    and    gfp(f) = ⊔ Ext(f)    (2.6)

We define the concept of fixpoint for monotone functions over a lattice. Strictly speaking, the three sets can be defined for any function
f : A → A, without more assumptions on A. In general, though, without additional assumptions like the ones here on A and f, fixpoints may or may not exist. Equation (2.6) just gives names to the two elements of the lattice defined by the corresponding right-hand sides. We know that those elements do exist, thanks to the fact that L is a complete lattice, and lfp(f) and gfp(f) as defined are unique elements of the lattice. The names suggest that they are indeed the
least fixpoint, resp. the greatest fixpoint of the monotone function f. But that requires a separate argument. Finally, if we take it really seriously, an argument should be found that allows speaking of the least fixpoint: if least fixpoints were not unique, one should avoid talking about "the least fixpoint" (same for the greatest fixpoint). The argument for the uniqueness of least fixpoints (or of greatest fixpoints) is very simple, though, similar to arguing for the uniqueness of "the least upper bound" etc.
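Spelled out, the uniqueness argument is a one-liner in standard lattice notation (a sketch; this is the same antisymmetry argument as for least upper bounds):

   % suppose a and b are both least fixpoints of f; each of them is a
   % fixpoint, and each is below every fixpoint, hence below the other
   a \sqsubseteq b \;\wedge\; b \sqsubseteq a
     \;\Longrightarrow\; a = b
     \quad\text{(antisymmetry of } \sqsubseteq\text{)}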
If one carries out the argument, i.e., the proof, that it all fits together, in the sense that the lfp(f) and gfp(f) defined above are actually the least fixpoint and the greatest fixpoint, and if one carefully keeps track of what is actually needed to make the proof go through step by step, then one sees that every single condition for being a complete lattice is needed (plus the fact that f is monotone). If one removes one condition, the argument fails! Conversely, that means the following: we are interested in uniquely "best approximations" (least or greatest fixpoints, depending on whether it's a may or a must analysis), and, having a monotone f, a complete lattice is exactly what guarantees that those fixpoints exist. Exactly that, nothing less and nothing more. If your framework has monotone functions and is based on a complete lattice, it is guaranteed to have those best solutions.
That explains the importance of lattices and monotone functions. Also, I would guess that historically, the need to assure the existence of fixpoints has led Tarski (the mathematician whose concepts we are currently covering) exactly to the definition of lattice, not the other way around, in the spirit of: "given such structures and monotone functions on them, perhaps I could define some lfp(f) like above and see if I could prove something interesting about it; perhaps it's a fixpoint?". But as said, that is speculation. One may also remark: there are other "fixpoint theories" around. The one here is based on lattices and partial orders, since that is what corresponds to approximations: they are ordered by their precision. Those other, non-lattice-based kinds of conditions one could use to ensure the existence of fixpoints don't play a role for our purposes.

Having stressed the importance of complete lattices, for fairness' sake it should be said that there's also a place for analyses which fail to meet those conditions. In that case, a unique best solution may not exist, and one may have to search among incomparable
solutions to find an acceptable one. If that happens, the cost of the analysis may explode. To avoid that, one may give up looking for a "best solution" and settle for a "good enough" one, or even cut corners and give up "soundness". Anyway, and fortunately, plenty of important analyses fit well into the monotone framework with its lattices, its unique best solution and, perhaps best of all, a guaranteed way to compute it. Those are called classical data flow analyses.

Tarski's theorem

Perhaps the core insight of the whole lattice/fixpoint business: not only does the meet of all pre-fixpoints uniquely exist (that's what the lattice is for), but, and that's the trick, it's a pre-fixpoint itself (ultimately due to monotonicity of f).

Theorem 2.3.5. Let L be a complete lattice and f : L → L monotone. Then

   lfp(f) = ⊓ Red(f) ∈ Fix(f)    and    gfp(f) = ⊔ Ext(f) ∈ Fix(f)    (2.7)
Fixpoint iteration
Iterating from ⊥ yields an ascending chain:

   ⊥ ⊑ f(⊥) ⊑ f²(⊥) ⊑ . . .

and for all n:

   ⊥ ⊑ fⁿ(⊥) ⊑ ⊔ₙ fⁿ(⊥) ⊑ lfp(f)        gfp(f) ⊑ ⊓ₙ fⁿ(⊤) ⊑ fⁿ(⊤) ⊑ f(⊤) ⊑ ⊤

For continuous f, the limit of the chain is itself a fixpoint, since continuity means

   f(⊔ₙ lₙ) = ⊔ₙ f(lₙ)    for each ascending chain (lₙ).
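As a minimal sketch of this iteration in code: the following computes the least fixpoint of a monotone function over a finite powerset lattice by iterating from ⊥ until the chain stabilizes. All names here (IntSet, lfp, the example f) are hypothetical and only for illustration, not part of the course implementation.

   module IntSet = Set.Make (Int)

   (* iterate f from bottom until the ascending chain stabilizes;
      this terminates because the finite powerset lattice has no
      infinite ascending chains *)
   let lfp (f : IntSet.t -> IntSet.t) : IntSet.t =
     let rec iter x =
       let fx = f x in
       if IntSet.equal fx x then x else iter fx
     in
     iter IntSet.empty

   (* a monotone example: f(X) = {0} ∪ { x+1 | x ∈ X, x < 4 } *)
   let f x =
     IntSet.add 0
       (IntSet.map (fun n -> n + 1) (IntSet.filter (fun n -> n < 4) x))

   let () =
     lfp f |> IntSet.elements |> List.iter (Printf.printf "%d ")
     (* prints: 0 1 2 3 4 *)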
Basic preservation results

Lemma 2.3.6 ("Smaller" graph → fewer constraints). Assume live ⊨ LV⊆(S1). If flow(S1) ⊇ flow(S2) and blocks(S1) ⊇ blocks(S2), then live ⊨ LV⊆(S2).

Corollary 2.3.7 ("Subject reduction"). If live ⊨ LV⊆(S) and ⟨S, σ⟩ → ⟨´S, ´σ⟩, then live ⊨ LV⊆(´S).

Lemma 2.3.8 (Inter-block flow). Assume live ⊨ LV⊆(S). If l →flow l′, then liveexit(l) ⊇ liveentry(l′).

The three mentioned results are actually pretty straightforward, resp. express properties of the live variable analysis which should be (after some reflection) pretty obvious. Analogous results would hold for other data flow analyses. Lemma 2.3.6 compares the analysis results for two programs S1 and S2, where S2 has a "smaller" control-flow graph (fewer edges and/or fewer blocks). Since the control flow graph directly corresponds to a set of constraints, removing parts of the graph means removing constraints. That means more solutions are possible, which is what the lemma expresses (live ⊨ LV⊆(S) means that live, an assignment of liveness information to all variables of the constraint system, satisfies the constraint system of the program S). It's probably obvious: the variables of the constraint system are (of course) not the program variables of the live variable analysis. The constraint variables are the (entry and exit points of the) nodes of the graph (which in turn correspond to the labels in the labelled abstract syntax). Corollary 2.3.7 is a direct consequence of that. In general, that's what the term "corollary" means: an immediate interesting follow-up of a preceding lemma or theorem etc.
However, the result is not without subtlety. It has to do with the step ⟨S, σ⟩ → ⟨´S, ´σ⟩, resp. what this step does to the (labelled) program S. The interesting case is the step covered by one of the rules dealing with the while-loop, namely While1. It's interesting insofar as it duplicates the body of the loop. That leads to a program which is no longer uniquely labelled (even if S had been)! It is, however, still label consistent. The last lemma is a direct consequence of the construction (backward may analysis), where the inter-block constraints connect the exits of a pre-block (here l) with the entries of the post-block (here l′). These lemmas as such are not interesting in themselves.

Correctness relation
⇒ Correctness relation on states. Given V = set of variables:

   σ1 ∼V σ2    iff    ∀x ∈ V. σ1(x) = σ2(x)    (2.8)

   ⟨S, σ1⟩ → ⟨S′, σ′1⟩ → . . . → ⟨S′′, σ′′1⟩ → . . . → σ′′′1
   ⟨S, σ2⟩ → ⟨S′, σ′2⟩ → . . . → ⟨S′′, σ′′2⟩ → . . . → σ′′′2

with the states related pairwise: σ1 ∼V σ2, σ′1 ∼V′ σ′2, σ′′1 ∼V′′ σ′′2, . . . , and finally σ′′′1 ∼X(l) σ′′′2.
Notation: N(l) = liveentry(l), X(l) = liveexit(l).

In the definition of ∼V above, V is an arbitrary set of "variables". The intention (in the present context) is that V contains the variables that the analysis has marked as live. Of course, the set of variables determined as live changes during execution. In the figure above, the "control part" of the configurations, i.e., the code S, S′ etc., is identical step by step for both versions: both programs execute the very same steps. As a side remark: the while language is deterministic, meaning a program code S and a state σ determine the successor configuration (if we are not yet at the final configuration). Note also: the intra-block (and backward) definition of liveness directly gives that, for an assignment x := a, the free variables of a are live right in front of the assignment. Likewise, the variables in a boolean condition b are live right in front of the conditional or loop to which b belongs. Those variables are therefore contained in the V-set directly before a step, for the two variants of the state. Consequently, both systems do exactly the same next step. And then the next step is the same again, and then the next . . . I.e., by induction, both systems behave the same, which is exactly what we want to establish ("dead variables don't matter").
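For a finite V, the relation ∼V is directly computable. A tiny sketch (all names hypothetical; states represented as functions from variables to integers):

   type var = string
   type state = var -> int

   (* sigma1 ~V sigma2 iff the two states agree on all variables in V *)
   let related (v : var list) (sigma1 : state) (sigma2 : state) : bool =
     List.for_all (fun x -> sigma1 x = sigma2 x) v

   (* example: the states agree on y but differ on the dead variable x *)
   let s1 = function "x" -> 1 | _ -> 42
   let s2 = function "x" -> 99 | _ -> 42
   let _ = related ["y"] s1 s2       (* true: x is not in V *)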
Correctness (1)

Lemma 2.3.9 (Preservation inter-block flow). Assume live ⊨ LV⊆(S). If σ1 ∼X(l) σ2 and l →flow l′, then σ1 ∼N(l′) σ2.

Correctness

Theorem 2.3.10 (Correctness). Assume live ⊨ LV⊆(S).

1. If ⟨S, σ1⟩ → ⟨´S, ´σ1⟩ and σ1 ∼N(init(S)) σ2, then there exists ´σ2 s.t. ⟨S, σ2⟩ → ⟨´S, ´σ2⟩ and ´σ1 ∼N(init(´S)) ´σ2.

2. If ⟨S, σ1⟩ → ´σ1 and σ1 ∼N(init(S)) σ2, then there exists ´σ2 s.t. ⟨S, σ2⟩ → ´σ2 and ´σ1 ∼X(init(S)) ´σ2.

[In the slides, the two cases are drawn as commuting diagrams: the given steps and the relation ∼N(init(S)) as solid arrows and lines, the claimed step to ´σ2 and the resulting relation (∼N(init(´S)) resp. ∼X(init(S))) as dotted ones.]
The pictures are drawn in a specific manner to capture the formulation of the theorem. In particular, see the use of "solid" arrows and lines vs. "dotted" ones. That is a diagrammatic way to indicate "for all such . . . " (solid) and ". . . there exists some . . . " (dotted). This notation is rather standard and allows expressing such properties in a short, diagrammatic, but still precise manner.

Correctness (many steps)

Assume live ⊨ LV⊆(S).
1. If ⟨S, σ1⟩ →∗ ⟨´S, ´σ1⟩ and σ1 ∼N(init(S)) σ2, then there exists ´σ2 s.t. ⟨S, σ2⟩ →∗ ⟨´S, ´σ2⟩ and ´σ1 ∼N(init(´S)) ´σ2.

2. If ⟨S, σ1⟩ →∗ ´σ1 and σ1 ∼N(init(S)) σ2, then there exists ´σ2 s.t. ⟨S, σ2⟩ →∗ ´σ2 and ´σ1 ∼X(l) ´σ2 for some l ∈ final(S).
2.4 Monotone frameworks
We have seen 4 different classical analyses, which all shared some similarities. In this section, those analyses will be systematically put into a larger context, known as monotone frameworks. As a unifying principle, this was first formulated by Kildall [4] and constitutes, in a way, the common, orthodox, and completely standardized understanding of what classical data flow analysis is.
Besides the fact that it captures many known analyses, it's also a "recipe" for designing other data flow analyses, starting from the program given in the form of a control flow graph. Indeed, the 4 analyses we have seen are only (important) representatives of the 4 classes of analyses (may vs. must, forward vs. backward); there are further members of those classes.

Besides that, the monotone framework concept lays down exactly what needs to be assumed about the structure of the information that is given back by the analysis. All four analyses somehow dealt with sets, like "sets of variables such that this and that". Being static, the analysis does not know exactly what the program does and has to approximate the information, via taking a subset or a superset. Which direction is safe in a given analysis depends on whether it's a "may" or a "must" analysis. At any rate, sets with the subset relation are a special case of the more general notion of lattice, which is exactly the notion needed to make the monotone framework work. Even if the monotone framework is based on the general notion of lattice for good reason, the special case of sets and subsets is an important one. Not only is it conceptually simple, it also allows efficient implementations: finite sets over a given domain may be implemented as bit vectors, and union and intersection, the crucial operations for "may" and "must" analyses, become cheap bit-level operations on bitvectors.
Monotone framework: general pattern

   Analysis◦(l) = ι                                       if l ∈ E
   Analysis◦(l) = ⊔{Analysis•(l′) | (l′, l) ∈ F}          otherwise

   Analysis•(l) = fl(Analysis◦(l))    (2.9)
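Packaged as a data structure, an instance of a monotone framework might look as follows. This is a sketch under the stated assumptions; all names (framework, entry, etc.) are hypothetical and not from the course implementation.

   (* one instance of a monotone framework, with the property space L
      represented by some type 'l *)
   type label = int

   type 'l framework = {
     join : 'l -> 'l -> 'l;            (* least upper bound on L *)
     bot  : 'l;                        (* least element of L *)
     iota : 'l;                        (* extremal information *)
     ext  : label list;                (* E: extremal labels *)
     flow : (label * label) list;      (* F: forward or reversed flow *)
     f    : label -> 'l -> 'l;         (* transfer function per label *)
   }

   (* the first equation of (2.9): iota at extremal labels, otherwise
      the join of the exit information of all flow predecessors *)
   let entry (fw : 'l framework) (exit_of : label -> 'l) (l : label) : 'l =
     if List.mem l fw.ext then fw.iota
     else
       List.fold_left
         (fun acc (l', l'') ->
            if l'' = l then fw.join acc (exit_of l') else acc)
         fw.bot fw.flow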
The definition is "generic", as it leaves open the alternatives "may" vs. "must" as well as "forward" vs. "backward". Another ingredient is the "flow information of interest" and, as a special case, what the information at the initial node resp. the final nodes is supposed to be. As we have discussed, especially in connection with live variable analysis, this has to be decided by a case-by-case consideration, depending on the specific conditions of the intended analysis and the language. One final ingredient is the transfer function. We have encountered that concept implicitly for the intra-block data flow; it's only that we did not explicitly call it a transfer function. Instead, the concept was formulated making use of kill and generate functions. It turns out that many transfer functions can be formulated, as we did, via kill and generate functions (as many analysis domains are sets of information items, where removing and adding elements is natural). Transfer functions transform flow information at one end of a basic block into flow information at the other end. For forward analyses, the information at the exit of a block is given
as a function of the entry of the block; for backward analyses, it's the other way around.

Monotone frameworks

Direction of flow:
– forward:
  – F = flow(S∗)
  – Analysis◦ for entries and Analysis• for exits
  – assumption: isolated entries
– backward:
  – F = flowR(S∗)
  – Analysis◦ for exits and Analysis• for entries
  – assumption: isolated exits

Sort of solution:
– may:
  – properties for some path
  – smallest solution
– must:
  – properties of all paths
  – greatest solution

Into which of the four categories a concrete analysis falls needs to be thought through on a case-by-case basis, of course. However, it may not be as clean-cut as it seems, resp. it may also be a matter of perspective. Take, for example, live variable analysis. That one is a may (and a backward) analysis. We can switch perspective from concentrating on live variables to "dead" variables (those which are not live), still with the same purpose of recycling the memory of variables which are not live, i.e., dead. If the data flow analysis streams sets of dead variables through its equations instead of live variables, the analysis will be a must analysis instead. After all, a variable is dead if it's not used in the future on all paths (which is the dual of being live, which refers to usage on some path). Consequently, one would be interested in the largest safe solution of dead variables. In a way, both are the "same" analysis, or rather, dual to each other; it is ultimately a matter of taste whether one presents it as live variable analysis or as the more morbid dual one. This switching to the dual perspective is easily possible if we are dealing with finite domains in the analysis, as we often do. Like in live variable analysis: there are only finitely many sets of variables.
Basic definitions: property space
The property space (here called L) captures the "information of interest" (sets of variables, sets of non-trivial expressions, . . . ). Technically, it needs to be some form of lattice (see the corresponding section). To a good approximation, the lattice and its laws closely resemble the situation with sets of information (with ∪, ∩, ⊆, ⊇, ∅, . . . ). Indeed, the power set of some set is an important special case of a lattice.

Transfer functions fl : L → L with l ∈ Lab∗

Requirements on the set F of transfer functions:
– containing all the needed transfer functions fl
– containing the identity
– closed under composition

The transfer functions, as defined above, are attached to the elementary blocks (which here contain one single statement or expression, but in general may contain straight-line code). In other accounts, the control flow graphs and/or the transfer functions may be represented differently, without changing anything relevant. For instance, here, the nodes of the graph are the elementary blocks, and also the transfer functions are attached to the nodes. One can see it like this: the transfer function is the semantics of the corresponding piece of syntax. Not the "real" semantics, but the semantics on the chosen abstraction level of the analysis, i.e., on the level of the property space L. Some authors prefer to attach the pieces of syntax and/or the transfer functions to the edges instead.
Summary
   f(l1 ⊔ l2) = f(l1) ⊔ f(l2)

Instead of the above condition, one might require only f(l1 ⊔ l2) ⊑ f(l1) ⊔ f(l2). This weaker condition is enough, as the other direction, f(l1 ⊔ l2) ⊒ f(l1) ⊔ f(l2), follows by monotonicity of f and the fact that ⊔ is the least upper bound: from l1 ⊑ l1 ⊔ l2 and l2 ⊑ l1 ⊔ l2, monotonicity gives f(l1) ⊑ f(l1 ⊔ l2) and f(l2) ⊑ f(l1 ⊔ l2), hence f(l1) ⊔ f(l2) ⊑ f(l1 ⊔ l2).

The 4 classical examples fit into the monotone framework:
– lattice of properties: immediate (subset/superset)
– ascending chain condition: finite set of syntactic entities
– closure conditions on F:
  – monotone
  – closure under identity and composition
– distributivity: assured by using the kill- and generate-formulation

Overview over the 4 examples
          avail. expr.    reaching def's             very busy expr.   live var's
  L       2^AExp∗         2^(Var∗×Lab∗?)             2^AExp∗           2^Var∗
  ⊑       ⊇               ⊆                          ⊇                 ⊆
  ⊥       AExp∗           ∅                          AExp∗             ∅
  ι       ∅               {(x, ?) | x ∈ fv(S∗)}      ∅                 ∅
  E       {init(S∗)}      {init(S∗)}                 final(S∗)         final(S∗)
  F       flow(S∗)        flow(S∗)                   flowR(S∗)         flowR(S∗)
  F       {f : L → L | ∃lk, lg. f(l) = (l \ lk) ∪ lg}    (the same for all four)
  fl      fl(l) = (l \ kill([B]^l)) ∪ gen([B]^l)    where [B]^l ∈ blocks(S∗)
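The last two rows of the table suggest a direct implementation. A sketch of the generic kill/gen transfer function as set operations (hypothetical names, using sets of strings as a stand-in for the property space); for such functions, distributivity holds by construction:

   module VarSet = Set.Make (String)

   (* f_l(l) = (l \ kill) ∪ gen, the generic kill/gen transfer function *)
   let transfer ~kill ~gen (l : VarSet.t) : VarSet.t =
     VarSet.union (VarSet.diff l kill) gen

   (* distributivity check: f(l1 ∪ l2) = f(l1) ∪ f(l2), since set
      difference and union distribute over union *)
   let distributive ~kill ~gen l1 l2 =
     VarSet.equal
       (transfer ~kill ~gen (VarSet.union l1 l2))
       (VarSet.union (transfer ~kill ~gen l1) (transfer ~kill ~gen l2))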
2.5 Equation solving
We know how to interpret the data flow problem as a constraint system, either in the form of equational constraints or inclusion constraints (typically subset constraints). We are aware that the constraints are formulated over a domain which forms a complete lattice, and that a solution to such a data flow constraint system corresponds to finding a fixpoint (resp. a pre-fixpoint or a post-fixpoint). In general terms, remembering in particular Kleene's result about fixpoint iteration, we also know how to calculate such fixpoints. Besides that, we have seen a first formulation of a corresponding iterative approximation algo in the introduction, called chaotic iteration.
The algorithm works randomly, i.e., it follows a random strategy. One could say it follows no strategy at all; one could also say that, doing random moves, it follows the most general strategy. Since one can show that even this arbitrary behavior is still guaranteed to find the fixpoint, all other, more specific strategies or heuristics will find it as well. So while chaotic iteration is not really practical, it assures: feel free to come up with smart strategies, depth-first, priority queues, some other heuristic. Whatever heuristic you use, it will still be correct! So, any heuristic is a matter of optimization, not of correctness. That's a very comforting setting. In this section, we don't do many different heuristics; we show just one way of organizing and steering the approximation algorithm. That general form of algorithm is known as the worklist algorithm. The name comes from a central data structure that steers the algorithm through its search space, which in this case means: which particular constraint to tackle next. The traversal of the search space can be
seen as a traversal strategy through the control flow graph. Chaotic iteration corresponds to randomly picking edges or nodes from the graph. Note that chaotic means really chaotic: not following edges randomly, i.e., following some random path, but randomly picking edges (or constraints) that need treatment, "jumping" arbitrarily through the graph.

Worklist data structure

Calling the data structure a "worklist" is also a misnomer in a way. It's a collection of elements, but, despite the name, it need not be a list. It could be a stack, a queue, or some other container. What it holds is the "work" still to be done. That work, in our case, corresponds to constraints still not satisfied. Whether the container follows a FIFO strategy (queue) or a LIFO strategy (stack) or another, more complex one influences the traversal (breadth-first, depth-first, or something more complex).

Comparison to the chaotic iteration

One may compare the worklist approach to the chaotic iteration. One connection seems clear: the chaotic approach randomly picks a "piece of work". That could be seen as a situation where the worklist behaves randomly (like a set). What's a piece of work anyway? In the chaotic iteration, a piece of work is one constraint that needs repair, i.e., some j where

   RDj ≠ Fj(RD)    (2.10)

(with RD denoting the vector of all the RDi). Now, the chaotic iteration does not make explicit how to determine such a j. Searching for such violations is wasteful, especially since it has to be done over and over again. It's better to keep the work to be done in a dedicated data structure, to avoid such repeated checks. Looking carefully, the worklist does not contain exactly the current violations described by equation (2.10); it contains an approximation thereof! The worklist contains a superset of the exact set of violated constraints: a set of constraints that are suspected to be violated and thus potentially in need of further treatment.
That can be seen later in the worklist algo: for one, the initialization fills up the worklist with all constraints, independent of whether they are initially violated or not. Secondly, when removing a piece of work from the worklist, it is first checked whether, at that point, the corresponding constraint actually needs treatment; it may well be that it's already satisfied, as the worklist contains an overapproximation of the pieces of work.

Equation solving and traversal

The worklist data structure can be used to steer the way the constraints (equations or inequations) are solved. It contains the "work still to be done", i.e., individual constraints that, in the current state of the overall iteration, are not yet satisfied (or at least suspected of not being satisfied).

Solving the analyses
– MFP: "maximal fixpoint"
– MOP: "meet over all paths"

Finally, we come to address how to solve the equations. We have seen two glimpses of the problem: one was the chaotic iteration in the introduction, the other one was the "theory" related to the fixpoints. We will shortly revisit the chaotic iteration. What was lacking there in the introduction was a more concrete (= deterministic) realization.

MFP
– central data structure: worklist
– a list (or container/set) of pairs
Chaotic iteration

   Input:  the equations for reaching definitions for the given program
   Output: the least solution (RD1, . . . , RD12)
   ------------------------------------------------------------------
   Initialization:
      RD1 := ∅; . . . ; RD12 := ∅
   Iteration:
      while RDj ≠ Fj(RD1, . . . , RD12) for some j
      do RDj := Fj(RD1, . . . , RD12)
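A compact sketch of this loop in code, under the stated assumptions (the constraint functions Fj are passed in as an array; all names are hypothetical): repeatedly find any violated equation and repair it, with no particular strategy.

   module VarSet = Set.Make (String)

   (* rd.(j) approximates RD_j; fs.(j) is F_j, reading the whole vector *)
   let chaotic (fs : (VarSet.t array -> VarSet.t) array) : VarSet.t array =
     let rd = Array.map (fun _ -> VarSet.empty) fs in
     let rec loop () =
       match
         (* find some j with RD_j <> F_j(RD); any choice would do *)
         Array.to_list fs
         |> List.mapi (fun j f -> (j, f rd))
         |> List.find_opt (fun (j, v) -> not (VarSet.equal rd.(j) v))
       with
       | None -> ()                        (* all equations satisfied *)
       | Some (j, v) -> rd.(j) <- v; loop ()
     in
     loop (); rd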
Worklist algorithms

⇒ worklist: central data structure, a "container" holding "the work still to be done"

WL-algo for DFA

⇒ input: instance of a monotone framework
– worklist: flow-edges yet to be (re-)considered
– array to store the "current state" of Analysis◦
Remember that the result of the analysis is a mapping assigning information to the entry and the exit point of each block. An array is of course a good representation of a finite function. Here, during the run, only the entry information is stored. Why does the algo operate only with the "entry" parts of the blocks (assuming a forward analysis)? In the chaotic iteration, we clearly see pre- and post-states. There, an RDi depends via F on all RDj. Of course, in reality that's not the case, and moreover, we should distinguish between entry and exit points. The exit points only depend on the corresponding entry point and nothing else. The worklist algorithm actually considers only the entry points explicitly; the post-conditions are represented only implicitly, in that they are calculated on the fly from the given pre-condition when needed. That can be seen also in step 3.
ML code

   let rec solve (wl1 : edge list) : unit =
     match wl1 with
     | [] -> ()                                   (* worklist done *)
     | (l, l') :: wl' ->
         (* extract the current ``states'' at both ends of the edge *)
         let ana_pre  : var list = lookx (ana, l)
         and ana_post : var list = lookx (ana, l') in
         (* push the information at l through the block: transfer function *)
         let ana_exit_pre : var list = f_trans (ana_pre, l) in
         if not (subset (ana_exit_pre, ana_post)) then begin
           (* constraint violated: enlarge the state at l' ... *)
           enter (ana, l', union (ana_post, ana_exit_pre));
           (* ... and re-schedule the edges leaving l' in the flow; for
              this (apparently backward) analysis the flow runs against
              the CFG edges, so they lead to the CFG predecessors of l' *)
           let (new_edges : edge list) =
             let (preds : node list) = Flow.Graph.pred l' in
             List.map (fun n -> (l', n)) preds
           in
           solve (new_edges @ wl')
         end
         else
           (* nothing to do here *)
           solve wl'
   in
   solve wl_init;
   fun (x : node) -> lookx (ana, x);;
This is a slice of the code of an implementation I once did. It's available at https://github.uio.no/msteffen/saprojects.

MFP: properties

Lemma 2.5.1. The algorithm terminates and computes a solution of the (in-)equations (indeed, the least one).

Proof.
– invariant: the array always stays below Analysis◦
– at loop exit: the array "solves" the (in-)equations
Time complexity

– at most b different labels in E
– at most e ≥ b pairs in the flow F
– height of the lattice: at most h
– non-loop steps: O(b + e)
– loop: at most h additions to the WL per pair

⇒ O(e · h)    (2.11)
2.6 Interprocedural analysis
2.6.1 Introduction
This section makes the analysis, resp. the language, a bit more realistic. So far it was really a minimalistic set-up, the most reduced form of an imperative language. Now we add procedures. That means we have to extend the syntax, of course, and also the operational semantics. Extending the syntax is not a big issue. For the semantics, we had before an SOS formulation, a structural operational semantics. We will stick to that, but the rules dealing with calls and returns will look a bit more complex. For the semantics, the core problem is as follows: with procedures, we have to accommodate calls and returns and their LIFO discipline: procedure activations are nested, and the callee always returns to the latest call. On top of that, procedures have local variables and formal parameters. Actually, we will have only formal parameters as procedure-local variables; there will be no syntax to introduce further local variables. Covering that as well and adapting the semantics correspondingly would be only mildly more complex compared with formal parameters as the only form of local variables. At any rate, the semantics then needs to cover things like a run-time stack. The variables of
the pure while language so far are of course also part of the run-time environment; it's just a much simpler part of it, namely part of the static memory. The formalization of the run-time environment, i.e., here, the organization of the memory for procedures, is more complex (being dynamic). So far we had a finite number of variables per program, statically fixed. Now, since procedures can be called recursively, the number of variable incarnations is no longer statically bounded; the corresponding memory is
arranged in a stack. There are different ways to formalize that. For instance, one could make the stack an explicit part of the configurations. In our formalization, the stack discipline is still there (conceptually,
that's how it's done), but it's not so visible; it's more implicit (using a syntactic bind-construct, see later). Instead of a state (written σ) as before, the memory is represented more fine-grained, in terms of locations and stores. Basically, that representation makes explicit that variables are connected to addresses (locations), and one can allocate new addresses when needed. That allocation works in a LIFO manner, i.e., it works like a stack.
Adding procedures

So far:
– minimalistic imperative language
– reading and writing to variables plus
– simple control flow, given as flow graph

New aspects:
– calls/return (control flow)
– parameter passing
– scopes
– higher-order functions/procedures
– call-by-result

Extending the syntax

   D ::= proc p(val x, res y) is^ln S end^lx | D D
   S ::= . . . | [call p(a, z)]^lc_lr
A call statement carries two labels: lc for the call itself and a second one, lr, for the return; the two have different names. The language is still rather simple. It features first-order procedures. The procedures, additionally, cannot be nested. And furthermore, the procedures need to be defined "before" the main part of the program (which is given in the form of a statement). However, recursive procedure definitions are allowed.
For those from the compiler course: the language corresponds roughly to the language of the obligs there, as far as procedures are concerned. There are syntactic differences: what is called a statement here, the body of the program, had to be put into a procedure called main or similar. But those are questions of concrete syntax; here we are not concerned with that. Other differences (for now) are that here we don't support heap-allocated data (like records or objects). But that is an issue orthogonal to the question of procedures.

Parameter passing

What might look a bit unusual is the parameter-passing mechanism. It highlights the fact that parameter passing is typically not a one-way street. It's not just about passing actual parameters from the caller to the callee; it also involves a mechanism to pass results back. Of course, one can evaluate a procedure with the intention of getting its side effect only (with a return type like void or similar). That is often done when doing call-by-reference, which we don't do; side effects would have to be done on the global variables in the current language. That probably should not be called "passing the result back", though it can be used that way. It should not even be called evaluating the procedure: the word "evaluating" conventionally means executing something with the purpose of obtaining a value at the end, and for a side-effect-only procedure, one cannot speak of a resulting value, at least not officially. Now, the mechanism here has not only "input" parameters, it also has "output" or result parameters. The syntax restricts to just one value parameter and one result parameter. The call-
site syntax call p(a, z) can be read as z := p(a): the result will be stored in z, which is a variable in the scope of the caller.

Example: Fibonacci

   begin proc fib(val z, u, res v) is^1
           if [z < 3]^2
           then [v := u + 1]^3
           else ([call fib(z − 1, u, v)]^4_5; [call fib(z − 2, v, v)]^6_7)
         end^8;
         [call fib(x, 0, y)]^9_10
   end

Actually, the example is not meant to show how Fibonacci is best implemented; it's a rather inefficient way to do so. The reason for its inefficiency is the two recursive calls. However, for illustrating the challenges of interprocedural analysis, we want an example with two calls, and this is an easy one.
2.6.2 Extending the semantics and the CFGs
Next comes the adaptation of the definition of the flow graph. To do so, we need to adapt and extend the definitions of flow, blocks, etc. Actually, it's pretty straightforward for the most part.
The basic trick is that we introduce new kinds of edges to deal with the procedures and
calling them. The definition/presentation proceeds (in the slides) in two steps: first the call sites, afterwards the procedures themselves. Besides the fact that we need "non-conventional" edges (namely connecting 4 nodes), it's still straightforward. Another thing one can learn from the example is parameter passing, in particular the way the result is passed back "by reference". However, the focus in the following will not be on how to program with this syntax (it is not intended as concrete surface syntax anyway), but on how to analyze it.

Block, labels, etc.

   init([call p(a, z)]^lc_lr)   = lc
   final([call p(a, z)]^lc_lr)  = {lr}
   blocks([call p(a, z)]^lc_lr) = {[call p(a, z)]^lc_lr}
   labels([call p(a, z)]^lc_lr) = {lc, lr}
   flow([call p(a, z)]^lc_lr)   = {(lc; ln), (lx; lr)}
       where proc p(val x, res y) is^ln S end^lx is in D∗.
For procedure declarations:

   init(p)   = ln
   final(p)  = {lx}
   blocks(p) = {is^ln, end^lx} ∪ blocks(S)
   labels(p) = {ln, lx} ∪ labels(S)
   flow(p)   = {(ln, init(S))} ∪ flow(S) ∪ {(l, lx) | l ∈ final(S)}

"Standard" flow of the complete program (not yet the interprocedural flow IF):

   init∗   = init(S∗)
   final∗  = final(S∗)
   blocks∗ = ⋃{blocks(p) | proc p(val x, res y) is^ln S end^lx ∈ D∗} ∪ blocks(S∗)
   labels∗ = ⋃{labels(p) | proc p(val x, res y) is^ln S end^lx ∈ D∗} ∪ labels(S∗)
   flow∗   = ⋃{flow(p)   | proc p(val x, res y) is^ln S end^lx ∈ D∗} ∪ flow(S∗)

As before, S∗ is the notation for the complete program "of interest".
New kind of edges: interprocedural flow (IF)

   inter-flow∗ = {(lc, ln, lx, lr) | P∗ contains [call p(a, z)]^lc_lr
                                     and proc p(val x, res y) is^ln S end^lx}

Now, this is an "edge" between 4 nodes, not 2. This form of generalization is sometimes called a hyper-edge, and graphs that allow that are called hyper-graphs. Not that it matters much in our context how it's called in graph-theoretical terms.

Example: Fibonacci flow

The picture does not show the special interflow edges in some specific manner. In the graphics, they just look like ordinary edges, which corresponds to their treatment in the "naive" approach (i.e., non-context-sensitive analysis). (For the Fibonacci program above, the hyper-edges would be inter-flow∗ = {(9, 1, 8, 10), (4, 1, 8, 5), (6, 1, 8, 7)}.)

Semantics: stores, locations, . . .

The operational semantics, as before, will be defined by transitions between configurations. Configurations describe the "situation" a program is currently in. That includes the current content of the memory together with the current place where the execution is ("data and control"). That will still be the case here, though we need to do adaptations. We start with the data part, i.e., the memory. The big difference is that the memory is no longer static, but dynamic. We need that insofar as calling a procedure allocates memory for the procedure body. In our setting, the allocation is effectively done in a stack-based manner.
Assume a procedure has a local variable x (for instance a formal parameter); each time it is called, we need a new incarnation of that variable, i.e., a new piece of memory to store it in. One way of doing that is introducing addresses, here called locations. Instead of saying that a variable "contains" a value, the picture will be: a variable resides at a particular address in memory, and at that address, its value is stored. That is not yet the same as having references, as the user has no way to directly access the addresses. However, such a differentiation between variables and their location in memory is also needed when dealing with heap-allocated data (for instance when introducing pointers or reference data like objects etc.). Anyway, we are not introducing reference data or pointers here; we use the "memory model" to describe procedure calls with a stack-like behavior. In particular, we allow generating and allocating new addresses (namely when we treat procedure calls). What about deallocation? Actually, we don't bother. In a stack-based view, there is "push" and "pop", where pushing corresponds to allocation and popping to deallocation. Of course, deallocation in that picture ultimately simply means changing the stack and/or frame pointer, which marks the corresponding memory as free, but the bytes are still there. . . They are hopefully just no longer accessible (finding a way to access or manipulate them, thereby destroying the stack abstraction, is a "popular" way of breaching security, usable for security exploits). Anyway, in our formalization, the addresses allocated via a procedure call simply become no longer accessible (without officially shortening an explicit stack), turning them into "garbage". That results in a stack discipline.
– different "incarnations" of a variable ⇒ locations
– remember: σ ∈ State = Var∗ → Z

Representation of "memory":

   ξ ∈ Loc                     locations
   ρ ∈ Env = Var∗ → Loc        environment
   ς ∈ Store = Loc →fin Z      store

(Note: the old state is recovered as the composition of the two mappings, σ = ς ◦ ρ!)

Explanations

The notation →fin represents finite partial functions. Of course, also the environment ρ is a finite function, simply because the domain Var∗ is finite. In contrast, the set Loc of potential addresses is infinite, though in any concrete configuration only a finite amount of memory is allocated. The conditions mentioned on the slides are well-formedness conditions, for instance preventing addresses which have no value stored, etc.
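A sketch of the split memory model in code (a hypothetical representation, with finite maps as association lists; none of these names are from the course implementation):

   type var = string
   type loc = int                          (* xi: locations *)

   type env   = (var * loc) list           (* rho: Var -> Loc *)
   type store = (loc * int) list           (* varsigma: Loc ->fin Z *)

   (* the old state sigma is recovered as the composition: store after env *)
   let state (rho : env) (st : store) (x : var) : int =
     List.assoc (List.assoc x rho) st

   (* allocation of a fresh location, as needed when treating calls *)
   let fresh (st : store) : loc =
     1 + List.fold_left (fun m (xi, _) -> max m xi) 0 st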
SOS steps

Now to the semantics as transitions between configurations. We focus on the transitions for the new concepts, i.e., calls and returns. Before we do that, we need to generalize the notion of configuration. Part of it is achieved already by introducing locations or addresses and "splitting" the state into two separate mappings (the store and the environment). The environment is responsible for associating addresses to variables. It will also be used to realize the "stack-like" arrangement of the accessible address space. There are different ways to do that. The way it's done here is that the environment part is "baked into the syntax part" of the configuration. As given, however, the syntax is not prepared for that; there's no syntax that can be (mis-)used for that. What we do then is simply add additional syntax. This syntax is not for the user, i.e., it won't show up in the program before it starts to run. It's only generated on-the-fly, at run-time, with the purpose of capturing some semantic concepts of behavior (in this case calls and returns and the corresponding stack-like run-time environment). Such "semantic tricks" are actually quite common. For the simple while language without procedures, we did not see such run-time syntax, basically since the language is so trivial. In many other situations, one works with such run-time syntax. It's a way to formalize run-time environments, for instance in connection with memory treatment, in an abstract and syntax-directed way. The spirit of SOS (structural operational semantics) is: the structure of the syntax steers the execution, and if that calls for new syntax, why not. The new construct (in one of the following slides) is known as the bind-construct.
   ρ ⊢ ⟨S, ς⟩ → ⟨´S, ´ς⟩    (2.12)

   ρ ⊢ ⟨S, ς⟩ → ´ς    (2.13)
The adaptation of the old rules is straightforward. Basically nothing changes, except that the state σ is now split into two components and written in a strange way, using ⊢. Writing it like that is of course not central; the configuration is now a triple instead of a pair, as it was for the previous statements. In the rules covering old syntax, the environment part never changes, though of course executing an assignment updates the store part (in the analogous way that an assignment in the previous setting updated the state σ). That reflects the fact that executing code inside one current procedure body (more generally, executing code inside one current scope) never changes the association of variables to addresses. That binding of variables to addresses, here represented as the environment ρ, is fixed per scope. More interesting are the rules dealing with calls and returns. That's where code is executed in a different scope: the body of the called procedure is evaluated in a different context or environment than the call. That's where the bind-construct comes in (see next).
Before we go to the concrete rules: the transition in equation (2.12) captures that the program with the store executes one step inside a given environment ρ. And the environment does not change! How can we then arrange that, conceptually, the body of a procedure is executed in a different environment, one that uses local state space for locally scoped variables? The form of the rules seems unable to capture that, since ρ is fixed. This is exactly what the bind-construct is introduced for: it allows adding a "local" environment baked into the syntax.

Call-rule

This and the following slide show the treatment of calls and returns (including the treatment of the bind-construct). Altogether, the behavior is covered by 3 rules. The first one deals with calls, and it introduces the new run-time construct. It can be seen as realizing the parameter passing, actually the parameter passing from caller to callee, not yet passing back the return value (but preparing for it by allocating some memory for it). Calling a procedure p looks up the definition of that procedure in D∗ (and we assure it can be found uniquely there). Since the procedure is defined with two formal parameters, we need two places where the parameters can be stored resp. passed (by value resp. by result). Those are the fresh locations ξ1 and ξ2 in the rule. The parameter passing is done in the last line of the premise, assigning the values of the actuals to the new locations
(here only one of each kind, for simplicity). The address ξ2 for the result parameter is already allocated, but of course the return value is not yet computed. Still, the address needs to have a value already now, and thus a random value v is picked; it is stored simply to avoid "undefined" entries in the store. One may compare that with a concrete machine: when reserving some memory space for the eventual return value, there will be some "old" bit patterns at the corresponding address(es), which can be seen as a random value; there is no such thing as an "empty" memory slot. Of course here, in the formalization, it's more like an arbitrarily picked value.
The conclusion of the rule shows how a call is replaced by a bind-statement. The arrangement for the return of the result involves two variables: the caller-site variable z, and the formal result parameter y at the site of the callee. The last thing that's done after the procedure body is the assignment z := y, copying back the result. In effect, this makes the variable z from the caller site look as if passed by reference, though the callee knows it under y and copies the result back only at the end, in the then-part of the bind-construct, not sharing the access during the execution. As we will see in the rules for the bind-construct, its body will be executed in the environment mentioned in the construct, in the rule ρ∗[x → ξ1][y → ξ2]. The environment ρ∗ is the "top-level" environment containing the global variables. The fact that the body is executed in the environment ρ∗[x → ξ1][y → ξ2] also highlights the restrictions of the language: the procedures cannot be nested. Besides that, the language does not bother with procedure-local variables (beyond the formal parameters).
Comparison with stack-based run-time environments in compilers

The discussion here is mostly for those who followed the compiler lecture. It can be skipped if wished (it's not pensum). This section discusses the formalization in comparison with how a compiler arranges its run-time environment, especially stack frames. In the lecture, there had been some remarks and feedback from the audience about what to expect from a semantics. It was mentioned that it should, for instance, fix the order in which parameters are passed and stored in the activation record; the conventions or specifications about how to do the stepwise passing of the arguments (like, in which order) are known as calling conventions. The call-rule here deals with that in a more abstract form. For example, we don't pass arguments in some particular order, like fixing that the function arguments are to be stored "in reverse order" or something. In line with that: the locations, representing addresses, are not even ordered. Besides that, since the language does not support side effects in procedure arguments, the order does not really matter (though of course a compiler will arrange the copying in one particular order).

Let's also discuss further aspects of stack frames as they can be found concretely in the run-time environments of a compiler. One piece of information that belongs to a stack frame is the return address of the caller. That's not done here. The operational semantics does not make use of program counters, code addresses, jumps, or similar to represent the progression of control during execution. It's a structural operational semantics, making use of the structure of the program, i.e., its abstract syntax. In the case of a call to p followed by the rest of a program, ⟨call p(a, z); S⟩, the execution proceeds to ⟨Sp; S⟩. In the post-configuration, Sp is the body of the procedure p, "copied in". The configuration here is simplified for clarity, omitting in particular the treatment of environments ρ. Anyway, the step shows how calls are treated: by copying in the code of the body in front of the rest of the program,
achieving the same behavior without bothering to introduce code addresses. Besides space for parameters, local data, and return addresses, there are other slots often found in standard stack frames or activation records. What is needed depends on the complexity of the language. Ours is simple: only first-order procedures, no nesting of procedures, no local variables (besides formal parameters). As far as scoping is concerned, there are basically two levels of variables: global ones and local ones. As mentioned, there are no local variable declarations in procedures. Supporting those would actually not be so hard, especially in a version where local variables had a lifetime spanning the whole procedure body (as the formal parameters do). Slightly more complex would be nested scopes (like blocks) inside a procedure, but not radically so. Disallowing the nesting of procedures means: the value of each variable is either found in the static memory (i.e., via ρ∗) or in the local stack frame, here represented by
ρ∗[x → ξ1][y → ξ2]. In particular, without nested procedures, the value is not found in an "earlier" or older stack frame. In standard run-time environments for nested first-order procedures, where, unlike here, variables exist that are neither global nor local but declared in a surrounding procedure, such earlier stack frames can be located by what is called a static link or access link. This is basically a pointer in one slot of the stack frame, pointing to the frame of the textually enclosing procedure. Here, there is no need for that. Assuming two different environments ρ1 and ρ2 at some point in the execution, one for the caller, one for the callee, ρ1 and ρ2 are unrelated except for the fact that they share the same bindings for the global variables. I.e., both ρ1 and ρ2 extend ρ∗, but the callee has no access to the bindings in ρ1 (or other "earlier" stack frames). Finally, in standard stack frames, there is also the notion of a dynamic link, which allows the run-time environment to pop off a stack frame (since stack frames are of different sizes). Conceptually, we need to deal with that too, but it's done implicitly in the formalization and not with pointers, in the same way that we conceptually return from the body of a procedure call but the formalization does not need to remember explicit return addresses (as explained above).
   proc p(val x, res y) is^ln S end^lx ∈ D∗    ξ1, ξ2 ∉ dom(ς)    v ∈ Z
   ´ς = ς[ξ1 → [[a]]A (ς ◦ ρ)][ξ2 → v]
   ------------------------------------------------------------------ Call
   ρ ⊢ ⟨[call p(a, z)]^lc_lr, ς⟩ → ⟨bind ρ∗[x → ξ1][y → ξ2] in S then z := y, ´ς⟩
Bind-construct

The bind-construct is covered by two rules. The construct is introduced when executing a call-statement, and thus the rules for bind describe how to execute the body of a procedure after being called. There are two cases: during execution, and "termination", which in this case means finishing with the body; in the larger context of being called, that means returning to the caller. Executing the body is done as for standard statements (covered by the other rules). Of course, that includes further calls. If that occurs, it leads to another application of the Call-rule, which in turn leads to executing the second procedure body in yet another environment. Note that the rule Bind1 is a recursive derivation rule (Bind2 as well, and actually so were the prior rules covering sequential composition): the step of a bind-construct is explained by recursively determining the step of its body. When some code calls a procedure, which in turn calls a further procedure, and that in turn a further one . . . , that implicitly leads (in the derivation tree) to a number of environments ρ arranged in a nested fashion; the derivation tree is an abstract representation of the stack frames.
   ´ρ ⊢ ⟨S, ς⟩ → ⟨´S, ´ς⟩
   ------------------------------------------------------------------ Bind1
   ρ ⊢ ⟨bind ´ρ in S then z := y, ς⟩ → ⟨bind ´ρ in ´S then z := y, ´ς⟩

   ´ρ ⊢ ⟨S, ς⟩ → ´ς
   ------------------------------------------------------------------ Bind2
   ρ ⊢ ⟨bind ´ρ in S then z := y, ς⟩ → ´ς[ρ(z) → ´ς(´ρ(y))]
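The update ´ς[ρ(z) → ´ς(´ρ(y))] in Bind2 is just a store update. Continuing the hypothetical association-list representation sketched earlier (types repeated for self-containment):

   type var = string
   type loc = int
   type env   = (var * loc) list
   type store = (loc * int) list

   (* copy-back at return (cf. Bind2): store the value of the callee's
      result parameter y (looked up via the callee environment rho')
      at the location of the caller's variable z (via the caller
      environment rho) *)
   let return_copy (rho : env) (rho' : env) (st : store)
                   (z : var) (y : var) : store =
     let v  = List.assoc (List.assoc y rho') st in
     let xi = List.assoc z rho in
     (xi, v) :: List.remove_assoc xi st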
⇒ formulation of correctness must be adapted, too (Chap. 3)1
2.6.3 Naive analysis (non-context-sensitive)
When extending the analysis to cover procedures, we first start with a naive formulation, basically treating the control flow transfer between caller and callee as if they were ordinary control flow edges. We had, in the definition of the interprocedural control flow graph, given the edges between a caller and a callee (corresponding to calls resp. returns) a special status. The naive approach treats them like standard edges, and we know already how to treat those. So one might not even call it interprocedural analysis, as it simply ignores the additional challenges posed by procedures. Analyses that take care of the special nature of procedure calls and returns are known as context-sensitive analyses. In contrast, the "naive" treatment here is called a non-context-sensitive analysis (not context-free. . . ). Before addressing "real" interprocedural analyses, we will (after the naive approach) introduce the notion of a path through a CFG.

Transfer functions: naive formulation
– for each proc. call: 2 transfer functions, flc (call) and flr (return)
– for each proc. definition: 2 transfer functions, fln (enter) and flx (exit)

Naive treatment: all four of them are simply the identity.

Equation system ("naive" version)

   A•(l) = fl(A◦(l))
   A◦(l) = ⊔{A•(l′) | (l′, l) ∈ F or (l′; l) ∈ IF} ⊔ ι_E^l

   with ι_E^l = ι if l ∈ E, and ι_E^l = ⊥ if l ∉ E.
1Not covered in the lecture.
As a side remark: the equational system here is formulated without the assumption of isolated entries/exits (unlike what we did before); it doesn't make a big difference either way. More important is to reflect on the following: the slide mentions that the analysis is unnecessarily imprecise or approximating. That an analysis based on a control-flow abstraction is approximative is clear; data flow analysis is necessarily approximative. The interesting question is where an additional imprecision comes from. In the language without procedures, the reason for the imprecision was that the analyses abstract away from the influence of the data on the control flow: in a branching situation, originating from conditionals and while-loops, the analysis makes no attempt to figure out which branch is actually taken. For a given input, it may not be the case that actually both branches are taken. When tackling a concrete program, it may even be that, no matter what the input, some branches are never taken, under any circumstances. A situation like that may be seen as a chance for optimization, removing dead code. Or actually, it may be flagged as a suspicious situation: why did the programmer write a condition where one branch is never taken? It could be a sign of a bug; perhaps the loop condition or branch condition is slightly off. One can indeed use data flow analyses similar to the ones we covered to try to figure out such situations. But that's not the point here. The point is: if we abstract away from the influence of the data on the control flow (as data flow analysis typically does), then all decision points in the program could go either way, resp. all branches can be taken. More generally, all paths through the CFG starting from the initial node are possible in that sense. That concrete data for a concrete run makes some (actually most) paths impossible is secondary, as data flow analysis in its basic form ignores the influence of data on control. Under that abstraction, all nodes in the graph are reachable from the initial node. And as a consequence, all paths starting from the initial node are, in principle, possible, given that data abstraction.

As a side remark: when adding procedures, some of the latter statements about the CFGs are no longer guaranteed, strictly speaking: the graph may no longer be connected, and as a consequence, not all nodes may be reachable in the CFG from the initial node. That happens if one defines a procedure but it's never mentioned in a call-statement in the rest of the program. Detecting such situations is slightly more complicated than checking if all procedures are called somewhere. One would have to rule out a situation like the following: two procedures call each other mutually recursively, but the main program does not call either of those. In that and similar situations, the graph is still unconnected and not all nodes are reachable from the initial node. BTW: a graph which shows which procedures call which other procedures is known as a call graph. It's easy to determine in our setting with static dispatch. And it's easy enough to check, so let's simply assume that there are no superfluous procedures (resp. that the call-graph is connected).

But back to the remark about unnecessary imprecision. In the setting with procedures, even if the call-graph is connected, not all paths through the CFG are actually possible (assuming data abstraction). The reason is that, no matter the data, a called procedure must
necessarily return to the point of being called. If there are two or more places where a procedure is called, then ignoring that coupling between call-edge and return-edge in a path, as the context-insensitive analysis does, leads to an imprecision on top of that caused by abstracting data. In that sense it is "unnecessary", an additional factor of imprecision. Context-sensitive analyses try to tackle or mitigate that additional imprecision.

Path (in)sensitivity

Before we do that, we may further reflect on that imprecision. The data flow analyses, the monotone framework, were so far kind of memoryless. In the context of static analysis, a better name would be path-insensitive. When analyzing a program in a path-sensitive manner, one would, at each branching point, remember
the choice, and proceed. For instance, if there is a choice based on the condition x ≥ 0, then taking the true-branch for further exploration makes clear that, when following that branch, x ≥ 0 holds. In contrast, when following the other branch, x < 0 can be assumed. Of course, our data flow analyses don't operate with concrete data like that; they work with more abstract information, but they don't exclude branches based on things that happened in the past, nor do they remember anything about a choice that could influence decisions in the future. They don't even have a concept of path. That is visible in the way the analyses are solved, by the worklist algorithm or, even clearer, in the chaotic iteration. Following an edge corresponding to x ≥ 0 does not mean one continues exploring that path; the algorithm is free to pick any other edge to explore and repair, "jumping around" in the program at will. It may be profitable, for efficiency of the analysis, to follow a path, but it's not necessary; the data flow problem and its analysis are not based on the notion of path: they are path-insensitive.

We used remembering x ≥ 0 resp. x < 0 for illustrating path-sensitivity, resp. how not remembering makes the data flow analysis path-insensitive. Now, our analyses don't even operate on concrete data values. With procedures, however, we can remember something else. When a procedure (as in the Fibonacci example) is called from different places, then the return node has different successor nodes: all the places it could return to. As explained, when confronted with a choice point, the plain data flow analysis makes the decision on the spot, "memoryless" or path-insensitive, resp. explores all alternatives indiscriminately (without being influenced by the past and without remembering anything to influence decisions in the future). To let the flow analysis of the callee's body return to the place where the procedure had been called, it helps to remember that place (or "information about that place"). That will be the approach the lecture will pursue: extending the information tracked by the analysis.

Path sensitivity, context-sensitivity, and others

In the above discussion, we used as illustration a conditional with x ≥ 0 as branch condition; in that context we mentioned the terminology of path sensitivity. Classical data flow analysis is not path-sensitive. It can be made more precise (becoming path-sensitive) if one extends the information tracked by the analysis and has it remember information about that choice. Similar remarks apply to our setting here: one can make interprocedural analysis more precise by remembering information about where a procedure had been called. This information can make the place to return to more precise. As far as terminology is concerned: the first
situation, concerning branching, is about path-insensitivity vs. path-sensitivity. The second, concerning procedure calls, is about context-insensitivity vs. context-sensitivity.
So even though one distinguishes between path-sensitive extensions and context-sensitive extensions to basic data flow analyses, the underlying theme is the same: the underlying analysis can be made more precise by additional information being “tracked” (i.e., in our setting added to the lattice on which the data flow equations operate). One could view path-sensitive analysis also as “context-sensitive” in the following sense: by remembering information about a particular choice, the corresponding branch following that choice is analysed in or with that “context”. In the example, in the “context” x ≥ 0 in the true- branch, and in the context x < 0 in the alternative. That’s in the same spirit of context- senstive interprocedural analysis, where call-site information is added and the procedure body is analized with or in this context. But as said, despite those general similarities, one calls improving on the analysis of the branching-situation in that way as path-sensitive, not context-sensitive. And, indeed, the discussion should not be taken as the message, that improving the analysis of conditional branches in a path-senstivie way is the same as context-sensitive interprocedural analysis. It’s two enhancements in the same spririt: add more more information and you get more precise (but the analysis with take more time). Now we have a general feeling what needs to be done. Before continuing with technical details, let’s reflect on the notion of paths in our context and shed more light about some wordings we used in the discussion, namely that the analyses need to be enhanced by ad- dition information, for instance about the call-site, or that the analysis has to “remember” such information, for instance about the call site. We want to clarify that better, because “information about the call-site” is rather wishy-washy. We will cover two different classes
later). But first about the notion of paths. We want to discuss that because we just mentioned path-sensitive analyses. Secondly, when covering context-sensitive analyses, the notion of paths will play a role (and we mentioned that already a bit before; see also Section 2.6.4).
The graphs in question are the control-flow graphs, perhaps with additional conditions, like we assume that they are connected and all nodes are reachable. A path through some graph is a way through the graph following the edges one after the other. Technically, it may be defined as a sequence of nodes (each succeeding pair connected by an edge), or a sequence of edges (“connected” by nodes), or also a sequence of node-edge-node-edge etc. That depends a bit on whom you ask. Now, in our case, we focus on paths which start in the initial node of the CFG (at least for forward analyses; for backward analyses one follows edges backwards, starting in the final node(s)). That very conventional notion of paths will be defined in Section 2.6.4. We know that data flow problems in the monotone framework have unique best solutions. We also know that so far the monotone framework does not operate on the notion of paths. An alternative problem is to ask: given a node in a graph and some data flow problem, what is the smallest value in the lattice at that location, considering all the different ways I can reach that location? That is different from the standard data flow question, since the data
flow equations don’t “follow the edges”, i.e., don’t explore paths (remember the chaotic iteration). The follow-the-paths solution to a data-flow problem is called meet-over-all-paths (MOP) or join-over-all-paths, depending on whether we do a must or a may analysis. Anyway, asking that question is more complex! In a typical program, with loops, there are infinitely many different paths by which nodes can be reached; some nodes of course can still be reached by only finitely many paths (nodes at the “beginning” of a program resp. of a control-flow graph, before encountering any loop). To explore infinitely many paths one by one is not an algorithmic option. One can see it more like a definition: what would the smallest solution be if one followed all paths individually (even though that would take an infinite amount of time)? It’s like the ultimate path-sensitive analysis (working with actual paths, though still using data abstraction, so there are probably (many) more paths explored than are executed when running the actual program with concrete data). If analysing a closed program (no input), there is actually only one path, but programs without any input are not very common. . . To be precise, there is one complete path, which is finite if the program terminates, plus many prefixes.

Now, what’s the relationship between the MFP and the MOP solution of a data flow problem? The good news is: they are the same (under mild assumptions): both are the same if the analysis is distributive. That is the case for all the 4 analyses we have discussed, and it’s the case for all analyses with transfer functions given by kill and generate. That is good news indeed: we don’t need to worry about losing precision by using a worklist algorithm which ignores the notion of paths completely.

How about interprocedural analysis? As already mentioned, if we ignore the special nature
of calls and returns, i.e., ignore from where a procedure had been called, we lose precision in that the analysis explores paths that in reality would never occur. To remedy this, one can refine the problem of meet-over-all-paths to meet-over-all-valid-paths, counting only the “meaningful” paths. That, however, still involves an infinite set of paths. So it’s still not an approach one can directly follow in practice, but it shows in which direction to go when extending the naive, context-insensitive approach.

Path sensitivity follows paths in a graph?

Let’s revisit the notion of path sensitivity after having discussed the notion of path a bit (details will follow). We have not technically introduced path-sensitivity; we illustrated it with a condition like x ≥ 0. Does path sensitivity mean that the analysis operates with paths and “follows paths” in that it explores them? Well, yes and no. It’s more like: for a path-insensitive analysis, taking a decision like in a conditional has no influence on the further analysis, further down the path. If some piece
of information about the decision is recorded and influences the analysis later, the analysis is path sensitive. Path sensitive does not necessarily mean that the piece of information which is recorded is the whole path (as is done in the MOP or MVP setting, which are more of theoretical interest). Paths may grow unboundedly and there are infinitely many of them overall, so that makes it impractical. Instead, some relevant information is stored; in the above example, the fact that x ≥ 0. This is related to symbolic execution. [continue here] Data-flow analysis is path insensitive and context-insensitive, right? [continue here]
2.6.4 Taking paths into account
It corresponds to the standard notion of a path through a graph. Note that so far that notion did not play a role. The data flow analysis, for instance in the form of a worklist algorithm, is concerned with solving equations, piecewise repairing violations of the equations (resp. of the constraints, in the constraint-based formulation). Anyway, repairing a constraint corresponds to treating an edge (or a set of edges). But the solving procedure for the constraints is not required to follow the edges one after the other, like following paths through the control flow graph. The concrete semantics and execution will do that; the analysis may choose other traversal strategies.
Paths

Now, finally, the definition of paths, though, as indicated, it’s basically a standard notion of a way through the graph, following the nodes of the program, resp. the edges. Anyway, now that we know that a path here is a sequence of nodes, following the edges through the graph, we can connect a transfer function to the whole path. It’s simply the composition of the individual transfer functions. As a side remark: remember the definition of monotone frameworks. There, we required a general set F of transfer functions. It is required that this set of functions is closed under composition (and includes the identity function).
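To make the composition concrete, here is a small Python sketch (not from the script; the reaching-definitions-style kill/gen functions and the numeric labels are made-up examples):

from functools import reduce

def compose_path(transfer, path):
    """Transfer function of a path [l1, ..., ln]:
    the composition f_ln o ... o f_l1 o id."""
    def f_path(d):
        return reduce(lambda acc, l: transfer[l](acc), path, d)
    return f_path

def rd_transfer(var, label):
    """Kill/gen style (as in reaching definitions): kill all earlier
    definitions of var, generate the pair (var, label)."""
    def f(d):
        return {(v, l) for (v, l) in d if v != var} | {(var, label)}
    return f

transfer = {1: rd_transfer("x", 1), 2: rd_transfer("y", 2), 3: rd_transfer("x", 3)}
f = compose_path(transfer, [1, 2, 3])
print(f(set()))   # {('y', 2), ('x', 3)} -- the definition at label 1 is killed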
– path to entry of a block: blocks traversed from the “extremal block” of the program, but not including it
– path to exit of a block: up to and including it

Paths

path◦(l) = {[l1, . . . , ln−1] | li →flow li+1 ∧ ln = l ∧ l1 ∈ E}
path•(l) = {[l1, . . . , ln] | li →flow li+1 ∧ ln = l ∧ l1 ∈ E}
Transfer function of a path ⃗l = [l1, . . . , ln]:

f⃗l = fln ◦ . . . ◦ fl1 ◦ id
Meet over all paths
– forward: paths from the init block to the entry of a block
– backward: paths from the exits of a block to a final block
– up to but not including l
– up to and including l

MOP

MOP◦(l) = ⊔{f⃗l(ι) | ⃗l ∈ path◦(l)}
MOP•(l) = ⊔{f⃗l(ι) | ⃗l ∈ path•(l)}

The definition of meet or join over all paths is straightforward: take all paths, for each take the corresponding transfer function, and build the meet (or the join), depending on what interests you in a particular case. In the definition, there is a small refinement: given a path, we are explicit about whether we want the transfer function including the last node or not. In the forward case, that means reaching the exit of the last node, or only reaching the entry of the last node (thereby ignoring the transfer function of the last node). Problematic is that there are infinitely many paths, in general. That makes the MOP problem undecidable in general. It is perhaps also intuitively clear that taking paths into account, something that the plain random constraint solving does not do, can only increase the precision. Additionally it can be proven that MOP and MFP yield the same solution in case the analysis is distributive, which is often the case. In particular, it’s the case for the four classic data-flow analyses we covered.

MOP vs. MFP
– MFP approximates MOP (“MFP ⊒ MOP”)

Lemma 2.6.1.

MFP◦ ⊒ MOP◦ and MFP• ⊒ MOP•   (2.14)

In case of a distributive framework:

MFP◦ = MOP◦ and MFP• = MOP•   (2.15)

If the transfer function is given by kill and generate as shown, the analysis is distributive. The “all-path” setting was done without taking procedures into account. In particular, in case there are procedures, the calls and returns are not treated specially. As discussed earlier, with procedures, not all paths make sense. Without procedures, one can say that, modulo the fact that we have abstracted the data, all paths make sense. Next we simply define which paths “make sense” (they will be called valid) and then build the meet (or join) only over those. That can only make the analysis more precise, though it does not help with the fact that there are still infinitely many paths to consider, in general. How to define valid paths? Well, it’s about fixing that a return from a procedure returns to the place where the procedure was called. Calls and returns work in a nested manner; executing a return always has to return to the last open call.
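The distributivity needed for the equality is easy to check for kill/gen transfer functions on a powerset lattice, where the join is union; a small Python sketch (the kill/gen sets are made up):

def kg(kill, gen):
    """A kill/gen transfer function: f(Y) = (Y \\ kill) | gen."""
    return lambda y: (y - kill) | gen

f = kg(kill={("x", 1)}, gen={("x", 3)})      # made-up kill/gen sets
a = {("x", 1), ("y", 2)}
b = {("z", 4)}
# distributivity over the join (here: union) -- hence MFP = MOP
assert f(a | b) == f(a) | f(b)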
That kind of nesting can be described well by context-free grammars. The possible finite paths can be seen as a language: finite sequences of letters (where the letters are the labels or nodes). Without procedures, the corresponding language is regular, with the control-flow graph being considered a “finite-state automaton”. That, in language-theoretic terms, languages are typically sequences of letters attached to the edges (of a FSA) and not sequences of nodes is a not-so-relevant detail. Now, valid paths for procedures no longer form a regular language. Instead, they can be captured by a context-free language. Note: for the terminology, we are heading towards context-sensitive program analysis. The fact that we are about to characterize the valid paths via a context-free grammar may be confusing, but in the one case we talk about the language of valid paths, in the other about whether an analysis takes the call-site into account. Context-free languages are typically given by context-free grammars, which is effectively what we do in a few slides, though we don’t use notations like (E)BNF for that.

Actually, before we define valid paths, which is what we are interested in, we define as a stepping stone something slightly simpler but closely related: complete paths. The concepts hang together as follows: in both cases, returns have to return to the call site. A complete path is one where there are no unanswered calls, i.e., each call is followed by a matching return. A valid path is a prefix of such a path, i.e., there can be unanswered calls. That can be illustrated by the prototypical example of context-free languages, namely well-balanced parentheses or brackets. Let’s assume that we have three forms of parentheses: (), {} and []. Then the following string is not “well-balanced”: not all opened parentheses are closed:
({[()]}[

In our terminology, that would be a valid word but not complete (if we interpret opening parentheses as calls and closing parentheses as returns). The analogy actually is pretty close.
This analogy perhaps also makes it plausible that complete paths, corresponding to well-balanced strings, are easier to define than valid paths. In the case of three kinds of parentheses, the standard grammar in BNF looks as follows:

S ::= (S) | {S} | [S] | S S | ε

The definition of complete and valid paths will likewise be given by a context-free grammar, with non-terminals CPl1,l2 representing complete paths from l1 to l2, and the terminals of the grammar are the labels l of a given control-flow graph.
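As an aside, the analogy can be made executable: a stack-based check distinguishes complete (well-balanced) words from merely valid ones (prefixes with unanswered opening parentheses). A sketch, not part of the original script:

PAIRS = {")": "(", "}": "{", "]": "["}

def classify(word):
    """'complete' = well-balanced, 'valid' = prefix of a balanced word."""
    stack = []
    for c in word:
        if c in "({[":
            stack.append(c)                      # an "opening call"
        elif c in PAIRS:
            if not stack or stack.pop() != PAIRS[c]:
                return "invalid"                 # return without matching call
    return "complete" if not stack else "valid (unanswered opens)"

print(classify("(){}[]"))     # complete
print(classify("({[()]}["))   # valid (unanswered opens)
print(classify("([)]"))       # invalid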
MVP
⇒ capture the nesting structure
– appropriate call-nesting
– all calls are answered

Complete paths (we consider forward analysis here)
CPl,l → l

CPl1,l3 → l1, CPl2,l3    if (l1, l2) ∈ F

CPlc,l → lc, CPln,lx, CPlr,l    if (lc, ln, lx, lr) ∈ IF
The notion of complete path is rather straightforward. It informally says that each call is answered by one corresponding return, and also that each return is matched by one corresponding call. It directly corresponds to the prototypical context-free parenthetic languages, except that we have an arbitrary number of different “parentheses”, namely the different calls. The calls are not identified by the name of the function being called, but by the call-sites and the identity of the function being called, more precisely by the two labels at the call site plus the two labels of the entry and the exit of the procedure. That is visible in the third rule. The definition is given in a rule-like manner. They are not really derivation rules, but rather schemes of productions: one for pairs of labels (second rule), resp. quadruples of labels (last rule). As the premises of the last two rules show, not for all pairs or all quadruples, of course, only those given by the control flow graph, in particular taking care of the inter-procedural flow in the last rule. The interpretation is as follows. There is only one complete path from a label to itself: the trivial path. The second rule just splits off one label on the left (one could do
it also differently). Also in the third rule, there is a split-off of the first step. A terminating execution will have a complete path. There are only a finite number of productions. As a side remark: being a complete path, in some way, is not a safety property, whereas being a valid path is.

Example: Fibonacci
CP9,10 → 9, CP1,8, CP10,10
CP10,10 → 10
CP1,8 → 1, CP2,8
CP2,8 → 2, CP3,8
CP2,8 → 2, CP4,8
CP3,8 → 3, CP8,8
CP8,8 → 8
CP4,8 → 4, CP1,8, CP5,8
CP5,8 → 5, CP6,8
CP6,8 → 6, CP1,8, CP7,8
CP7,8 → 7, CP8,8

Valid paths (context-free grammar)

Valid paths (generated from non-terminal VP∗):
VP∗ → VPl1,l2    if l1 ∈ E, l2 ∈ Lab∗

VPl,l → l

VPl1,l3 → l1, VPl2,l3    if (l1, l2) ∈ F

VPlc,l → lc, CPln,lx, VPlr,l    if (lc, ln, lx, lr) ∈ IF

VPlc,l → lc, VPln,l    if (lc, ln, lx, lr) ∈ IF
The grammar for valid paths is slightly more complex than the one for complete paths. There is an easy explanation of what a valid path is: a valid path is a prefix of a complete path. The grammar expresses that being such a prefix is a context-free property (namely by giving the corresponding (family of) productions).
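Operationally, validity can also be checked by simulating the stack of pending calls; the following sketch assumes the sets F (intraprocedural edges) and IF (tuples (lc, ln, lx, lr)) are given as in the text:

def is_valid_path(path, F, IF):
    """Check inter-procedural validity of a path (a list of labels) by
    simulating the pending-call stack."""
    calls = {(lc, ln): (lx, lr) for (lc, ln, lx, lr) in IF}
    stack = []
    for a, b in zip(path, path[1:]):
        if (a, b) in calls:
            stack.append((a, b))              # record the open call
        elif any((a, b) == (lx, lr) for (lx, lr) in calls.values()):
            if not stack:
                return False                  # return without an open call
            if calls[stack.pop()] != (a, b):  # must return to the last open call
                return False
        elif (a, b) not in F:
            return False                      # not an edge of the CFG at all
    return True                               # valid; complete iff stack == []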
MVP
vpath◦(l) = {[l1, . . . , ln−1] | ln = l ∧ [l1, . . . , ln] valid}
vpath•(l) = {[l1, . . . , ln] | ln = l ∧ [l1, . . . , ln] valid}

MVP◦(l) = ⊔{f⃗l(ι) | ⃗l ∈ vpath◦(l)}
MVP•(l) = ⊔{f⃗l(ι) | ⃗l ∈ vpath•(l)}
Fixpoint calculations come next: how to reconcile the path approach with MFP.
2.6.5 Context-sensitive analysis
Next we finally address the core technical part for interprocedural analysis, which is known as context-sensitive analysis. We talked about that generally, namely that it’s about adding information with the purpose of making the naive analysis more precise; in particular, to allow disambiguating situations where a procedure is called more than once (like in the fibonacci example). In the extreme case, the disambiguation could be 100%. That’s what we did with the MVP description. Valid paths are defined as those where returns go back exactly to the place where they should. In that way, that disambiguates the situations completely. However, MVP was defined by making the meet or join over infinitely many paths in general, and that makes it a non-practical solution. It should not even be seen as solving the interprocedural analysis problem; it just specifies the theoretical optimum of precision (under data abstraction), which is practically not obtainable. Independently from that, for interprocedural analysis, making a set of paths first and then building the meet or join over it is also a bad algorithmic design. The chaotic iteration
and the worklist algorithms could choose their traversal strategies as they saw fit (or do it randomly). What we do, therefore, is extend that approach, i.e., extend the definition of the data flow problem and thereby also extend the monotone framework. The result will be called an embellished monotone framework, i.e., the old framework with additional information added (to the lattice). It’s a general theme, of course: more information being tracked leads to a more precise analysis. One could add some information making the analysis path-sensitive, and then call that embellished. We focus here on adding information to make the analysis deal with procedures, making the analysis context sensitive. The additional information is about the call-site. This information is called context.
If we added information about the path which leads to a procedure call, one could completely disambiguate the call and return situation and, in this way, obtain the MVP solution as an embellished MF. We will take a look at that, though one would not technically need all the paths in the form we have defined them so far. It will be enough to just add the sequence of all open calls, because we have to return to the last still open call. Older calls and internal edges don’t need to be remembered. That will be called call strings. Still, that call string information, basically an abstract version of the current call stack, is unbounded; adding it yields the same precision as MVP, and that is undecidable; we will see what to do about that. But adding path information is not the only way to do the disambiguation. Adding the full path resp. the call string as the stack of pending calls would lead to full disambiguation; alternatively, one can use (an abstraction of) the data with which the procedure is called. Perhaps at some places, a particular procedure is called with a non-negative argument, and at others with negative ones. That may not disambiguate completely, but it cuts down on the imprecision about where to return to and thus improves the precision (while perhaps staying decidable). To sum up the intro: we will introduce contexts as additional info as part of the data-flow problem, “tracked” by the analysis. The information for the context can be of different nature: control-flow information like the path or an abstraction thereof, or data, or combinations.

Contexts
⇒ instead of MVP: “embellish” MFP

δ ∈ ∆   (2.16)
⇒ “embellishment”: adding contexts

Embellished monotone framework

(L̂, F̂, F, E, ι̂, f̂)
Embellishment is notationally indicated by placing a hat ˆ on top. The following will proceed in two stages: the intra- and the inter-procedural part. The first part, of course, basically consists of adapting the given underlying framework by embellishing it (indicated by the hatted syntax). That will involve a change in the lattice and other concomitant changes. When thinking in terms of an implementation, the type of the analysis changes, like the type of the involved lattice, insofar as a new component is involved: the context. Consequently, also the intraprocedural part needs to be adapted, taking care of the contexts. As far as the intra-procedural part is concerned,
that is rather trivial: it basically consists of “ignoring” the context: as long as one deals with data-flow within one function body, the context remains the same. Nonetheless, the additional context-component has to be mentioned as being unchanged when defining the embellished transfer functions and other parts of the monotone framework. But the key message is: the intra-procedural part is “basically unchanged” by the embellishment, as one would expect.

Notational remark

In the following we encounter a silly problem. It’s that we used the letters L and l for different purposes: L for the lattice, but we also have the set of labels
Lab, with typical elements l as well. Actually, in [7], there are two “different kinds” of l’s, typeset in two slightly different fonts. For the slides, it’s difficult to do the same, as slides are typeset with sans serif fonts, and the difference disappears. Anyway, it may be confusing, so from now on we write d, d′, di, etc. for the elements or values of the lattice (d for data). Actually, this overloading of the letter l does not just occur now; it was already in conflict earlier, though hopefully it was clear from the context what
is what. Now it becomes a bit more critical: when introducing the contexts, we will encounter situations where locations from Loc are part of the information stored in the lattice L. In a way, it was like that at least for the reaching definitions: we had labels or nodes, and we had data flow information, which were pairs of variables and labels, so the elements of the lattice contained labels. But we never talked about the lattice theory at the same time as we talked about reaching definitions; lattices came later. Here, when soon introducing the contexts, the danger of confusion is larger and more obvious, so we change the conventions a bit.

Functional style, using higher-order functions in the following definitions

The following material and the slides will be “notationally heavy”, involving a number of ingredients. Besides that, another aspect may seem complex on a first reading: the extended analysis will make use of a functional way of presenting things. That already starts with the embellished lattice L̂, which is of the following form

L̂ = ∆ → L   (2.17)

which contains functions from a context in ∆ to an element of the original, unembellished lattice L. One can convince oneself that, if L is a lattice, then so is L̂ (with the operations defined pointwise). How
to think about it? The context reflects (an abstraction of) the situation when inside a procedure body. When not doing a call or a return, i.e., staying within one procedure body, doing intra-procedural analysis, the context is unchanged. Equation (2.17) defines the embellished lattice L̂ by giving one lattice element d ∈ L per context δ ∈ ∆. This is a way of lifting the lattice to the new setting. Of course we need not just lift the lattice from L to L̂, we need also to lift the intraprocedural transfer functions, which are functions from L → L, to L̂ → L̂. We write f̂ for its
embellished counterpart. We can see this “hatting” operation, lifting functions f to f̂, as a higher-order function as follows:

·̂ : (L → L) → (L̂ → L̂)   (2.18)

One could call that operation lifting, basically mirroring the behavior of f in the new, more complex setting. Lifting is a technical term which is used in situations like this; a well-known example is the map function on lists, a higher-order function of the form (A → B) → (List A → List B). Actually, we also need to lift the lattice relation, i.e., the partial order ⊑ on L, to its counterpart on L̂. We write
⊑̂ for that relation. We don’t make that definition explicit here, but one can try it oneself as an (easy) exercise. Anyway, that form of lifting expresses that nothing relevant changes; old definitions are just “ported” to a new, more complex structure, without conceptual changes. That covers the intraprocedural aspect of the context-sensitive analysis. Of course, for the new aspects, involving calls and returns, we cannot just lift something, we have to come up with something new, but that’s for later.

The lifting from the underlying transfer function f to its embellished counterpart f̂ is shown in equation (2.22). The transfer function f̂l (for some location l) is of type L̂ → L̂, i.e., a function of type

(∆ → L) → (∆ → L)   (2.19)

So, it’s a function that takes as argument a function of type ∆ → L and returns a function, namely one with ∆ as argument and a return value from L. That’s defined in equation (2.22): the “first” functional argument d̂ : L̂ is applied in the definition to the “second” argument δ, and the result is fed into the unlifted transfer function fl. Equation (2.22) may be written identically as follows, making use of λ-abstraction, and adding the types of the arguments to make it more explicit:

f̂l = λd̂:(∆ → L). λδ:∆. fl(d̂(δ))   (2.20)

Currying and uncurrying

Now, f̂ is defined in some “higher-order” style, taking a function as argument and applying it (I leave out mentioning the l for now, the location f “belongs to”). There is another aspect in the definition we can discuss here: we make use of currying. The function takes two arguments, d̂ and δ, only that it does so in a “gradual” way: first d̂, resulting in a function that takes the second argument. Independent from the concrete types of the transfer function: if we have a function of type A → B → C, which is understood as A → (B → C), that works basically as a function that takes two arguments, from A and from B. Instead of saying that the function takes two arguments, we may say the function takes a pair of arguments, i.e., it’s a function A × B → C. Another way to see it is that both types
A → B → C and A × B → C   (2.21)

are essentially the same. Of course the two types are not really identical. A function of the first type is invoked as (f a) b, a function of the second type as f(a, b), with the tuple (a, b) as an element of A × B. The two types are equivalent in the following sense: each function of the first type can be “transformed” into one of the second type doing the same, and also in the other direction. Not only can one manually re-write each function of one type equivalently into one of the other: the two types are isomorphic, and the two transformations are called currying and uncurrying. A function of the first type is called the curried version, one of the second type the uncurried one.
The terminology is in honor of the logician Haskell Brooks Curry, though perhaps historically more correct would be to call the concept schönfinkeling (after Moses Schönfinkel), not currying, but perhaps that sounded too odd.
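As an illustration (a sketch, not the script’s formal machinery): in Python, the two transformations, and the pointwise lifting of equation (2.22), are one-liners:

def curry(f):             # (A x B -> C)  ~~>  A -> (B -> C)
    return lambda a: lambda b: f(a, b)

def uncurry(f):           # A -> (B -> C)  ~~>  (A x B -> C)
    return lambda a, b: f(a)(b)

def lift(f_l):            # eq. (2.22): (L -> L) ~~> (Delta -> L) -> (Delta -> L)
    return lambda d_hat: lambda delta: f_l(d_hat(delta))

f_l = lambda d: d | {"gen"}          # some made-up underlying transfer function
d_hat = lambda delta: set()          # bottom in every context
print(lift(f_l)(d_hat)("any-context"))           # {'gen'}
print(uncurry(curry(lambda a, b: a + b))(1, 2))  # 3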
Anyway, it’s not too important, but we touched upon it here since the concept will pop up here and there in the following, and it helps to understand better how the embellished framework is written down in the following.

Intra-procedural: basically unchanged
– property lattice L̂ = ∆ → L
– monotone functions F̂
– transfer functions: pointwise

f̂l(d̂)(δ) = fl(d̂(δ))   (2.22)
A•(l) = f̂l(A◦(l))

A◦(l) = ⊔{A•(l′) | (l′, l) ∈ F or (l′; l) ∈ F} ⊔ ι̂lE   (2.23)

(with ι̂lE as before: ι̂ if l ∈ E, and ⊥ otherwise)
The embellished lattice L̂ is defined by the function type ∆ → L, and we have defined the intraprocedural transfer functions (lifting them). But let’s also have a look at what the solution of the data-flow problem now is, in the embellished setting. Equivalently, let’s have a look at the data structure A (resp. A• and A◦), mentioned in equation (2.23) and used in the worklist algorithm. The solution of a constraint problem is a mapping from the variables of the constraint system to values of the solution framework. In our setting, the labels or nodes of the control flow graph play the role of constraint variables (more precisely, the entry and the exit for each node, but let’s ignore that as a side issue for now; let’s use N (nodes) for short here, representing Lab∗, resp. the entries and exits for each node or label).
Now, in the underlying unembellished framework, a solution of a problem is a mapping from the nodes of the cfg to elements of the lattice, i.e., an unembellished solution is of type N → L. An embellished solution maps nodes to elements of L̂, i.e.,

N → ∆ → L   (2.24)

So a solution gives, for each constraint variable (from N), a function that yields, for each possible context, a value from the underlying lattice. Referring to the discussion about currying etc., that type is the “same” as

(N × ∆) → L   (2.25)

Seen like that, in the uncurried version, the context is simply “paired” with the location or node in the control-flow graph and represents relevant information about the call-site where the function was called.

Remarks concerning implementing that

Let’s ponder how one could implement that. The unembellished setting is less problematic. Implementing and maintaining a function N → L is straightforward. The set N is finite, and the function is represented via A◦ and A•, which one might concretely realize as arrays. A practical problem now might be: there can be many possible contexts from ∆, maybe even infinitely many. That poses a “challenge”, independently of whether one bases the implementation on a curried or uncurried view (equation (2.24) vs. (2.25)). In case ∆ is finite, one can think of implementing (N × ∆) → L as a two-dimensional array (that perhaps
being faster than alternatives). When ∆ is infinite, that’s not an option anyway. What one would need is some solution “on-the-fly”. The analysis might explore the control-flow graph, starting at the initial node (for a forward analysis) and starting with only the “initial context”, the context for the main-function body. Only if a call happens in the exploration does the relevant context need to be taken into account, like allocating memory for a new array or whatever. This way one would not need to allocate memory beforehand for contexts from ∆ that may never occur during exploration. Actually, the same “on-the-fly” technique may be used for N (and in the unembellished framework). We said N is finite, so an (updateable) finite mapping N → L can nicely be represented as an array. Though, if N is huge, one may also treat that on-the-fly. However, that may not bring much, for the following reason. One may assume that all nodes are reachable anyway (resp. it’s quite easy to find out if some nodes are not reachable, as we mentioned earlier), so it’s clear up-front what N is, and the analysis will have to find values for all elements of N (or a known subset, if some functions are not called at all). For the ∆ part, it’s not a priori clear which contexts will play a role and which not, so there is a larger incentive to treat that part dynamically.
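The on-the-fly idea can be sketched with lazily allocated tables; everything here (node names, the bottom value) is a made-up illustration:

from collections import defaultdict

BOTTOM = frozenset()     # assumed bottom element of the underlying lattice L

# analysis[node][context]: contexts are materialized only on first access,
# so no memory is reserved for contexts that never occur in the exploration.
analysis = defaultdict(lambda: defaultdict(lambda: BOTTOM))

analysis["entry"][()] = frozenset({("x", "+")})   # initial (empty) call string
print(analysis["return-site"][("l9",)])           # frozenset() -- created on demand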
Sign analysis

In the following we illustrate the concepts on a specific analysis, known as sign-analysis. We start by defining that analysis first for the unembellished setting. The presentation will again be heavy on the notational side, though the analysis is rather mundane. Actually, at its core, it is so straightforward that most people will know, at some level, how it “works” anyway. Indeed, the definitional core is not even spelled out (see later); we focus on embedding the core idea of sign analysis into the embellished framework. Before we start doing that in the following slides, we still want to comment on the nature
of this choice; the sign analysis itself is not central here, as we are doing interprocedural analysis. We just need some analysis to start with, and sign-analysis is a problem which is intuitively clear. On the other hand, we can take the opportunity to highlight a few points. One is that the analysis is slightly different from the ones we have seen before (the 4 instances of the monotone framework). To see that, we need to understand at least informally what sign analysis is. Well, the name says it. In the while language, the data to operate on is integers; the values stored in memory are elements from Z. The sign analysis does not exactly try to find out what values there are, but it tries to find out an abstraction thereof, namely the signs −, 0 and +. I hope it’s clear, those are meant as abstract values; the abstract value + represents the set of concrete values {1, 2, 3, . . .} etc., + there is not meant as addition. . .

What makes this analysis different from the ones before? Earlier we covered 4 classical analyses which later led to the concept of monotone frameworks. One thing we stressed is the way that data flow analysis of that kind treats decisions, those coming from boolean expressions in conditionals or loops. The analyses did not attempt to figure out which way the actual program would run, i.e., whether the boolean condition would evaluate to true or false. They could
not distinguish the two branches, given the abstract information of the analyses. That was the case for all 4 analyses. As a technical term: the analyses were path insensitive. In order to be path sensitive, the analysis must track information about the values or data being handled by the program. Otherwise, for a boolean condition b, it has no chance to figure out, at least sometimes, whether the condition is true or false. The 4 previous analyses were, we stressed that, 4 classical data flow analyses. The data flow constraints expressed conditions on which way abstract information “flows” through the control-flow graph. But, despite being called data flow, the information flowing through the graph was not concerned with data, data in the sense of information about stored values! That’s why the analyses had to be path-insensitive. The information tracked by the analysis is information about variables, for instance, like “this variable is live here, and dead there”, but no information about what data is stored in the variable. Similarly for expressions: the information may be “this expression is available here, but not there”, but independent of what the actual value of the expression would be (not even approximately).
This is different here. Tracking information about signs means there is approximative information about data. It’s in some abstracted form, but still. With this information, it could be possible in some situations to decide between alternatives. If at some point, the analysis knows that x is negative or zero, and in that situation the information “flows” through a boolean condition x ≤ 0, it’s clear that the false-branch is not taken. That’s not much (and perhaps one decides that the sign-analysis should not attempt to do so, as being not worth the effort), but at least one could make it path-sensitive.

Is what we are doing known as “the sign analysis”?

At the core, the sign analysis is intuitively simple, and, since we are interested in interprocedural analysis of procedures, it’s not even in the focus. The sign analysis is just used as a simple example we can embellish. Still, we want to elaborate a bit on sign-analysis; at least we want to avoid the misconception that what we present here is “the” sign-analysis. As often, when interested in some information as part of some (static) analysis, one can do the analysis at different levels of abstraction, precision (and effort). Let’s look at the sign-analysis. At the core, instead of the infinite set Z of concrete values, the analysis will work on the three-element set {−, 0, +} of abstract values (let’s call it Sign). The way the analysis “works” with those abstractions may differ, though. No matter how, it is unavoidable that abstraction leads to overapproximation and sometimes to “losing the overview”. One does not always lose the overview: for instance, if one has an expression x + y where it’s known that x and y are positive, then it’s clear that the result is positive: “plus plus plus gives plus”, as everyone knows. If we do a minus on two positive numbers, however, we don’t know if the result is positive, negative, or 0; it may be any of those. Now there are two plausible choices for how to capture that lack of precision. A concrete, precise state σ is an element of Var∗ → Z. The values are abstracted into the 3-element set Sign. To accommodate the lack of precision, one can let the analysis operate on sets of mappings from Var∗ → Sign. That is done in the analysis covered on the slides. There is an alternative: one could map every variable to a set of values from Sign. Either way yields a lattice; after
all, we know that data flow analyses basically need to be based on complete lattices. . . The two lattices are shown in equations (2.26) and (2.27).

Lsign = 2^(Var∗→Sign)   (2.26)

L′sign = Var∗ → 2^Sign   (2.27)

In the first case, a variable x known to be non-negative is represented as {[x → +], [x → 0]}, in the second representation as [x → {+, 0}].
That sounds pretty much the “same” (and it is, as long as we have only one single variable). But let’s look at a more general example. Assume two variables x and y and a situation in the lattice Lsign:

{[x → −, y → −], [x → +, y → +]}

The “corresponding” information in the second lattice L′sign looks as follows:

[x → {+, −}, y → {+, −}]

That should make it obvious that the two approaches are different (i.e., the two lattices are not isomorphic: L′sign ≄ Lsign), and the first one is strictly more precise (at least if one deals with two variables or more).
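The loss of the correlation can be demonstrated by converting from the first representation to the second; flatten is a hypothetical helper computing the best approximation in L′sign:

correlated = [{"x": "-", "y": "-"}, {"x": "+", "y": "+"}]   # element of Lsign

def flatten(states):
    """Best approximation in L'sign (Var -> 2^Sign) of a set of sign-states."""
    out = {}
    for sigma in states:
        for var, s in sigma.items():
            out.setdefault(var, set()).add(s)
    return out

print(flatten(correlated))   # {'x': {'-', '+'}, 'y': {'-', '+'}}
# the flattened element also admits x = +, y = - : the correlation is lost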
Sign analysis (unembellished)
[[·]]Asign : AExp → (Var∗ → Sign) → 2^Sign

Transfer function for [x := a]^l:

fsign_l(Y) = ⋃{φsign_l(σsign) | σsign ∈ Y}   (2.28)

where Y ⊆ Var∗ → Sign and

φsign_l(σsign) = {σsign[x → s] | s ∈ [[a]]Asign σsign}   (2.29)

We start with the unembellished part, i.e., without even considering contexts. For that basic setting, the lattice we start with is a set of functions; we can think of it as a set of states. The above definition of the transfer function proceeds in 3 steps. At the core is the semantic function [[·]]Asign. This function is for expressions, and is already non-deterministic. Equation (2.29) reflects the effect of an assignment for one abstract state, and equation (2.28) is finally the transfer function. Note, it’s another example of lifting: the function φ, defined in equation (2.29) per “sign-state”, is lifted to work on sets of states in the obvious manner (just on each individual state). One also says, φ is lifted pointwise. Why does [[·]]Asign give back a set? Clearly, because of the non-determinism due to abstraction. The sign-analysis is not yet embellished here (embellished = adding context). This means, there is not even a mentioning of ∆ here. The real work is done in φ: the overall input to the transfer function is Y, which is a set of states, and fsign_l just applies φsign_l pointwise, interpreting the expression on the right-hand side of the assignment and updating the state accordingly.
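A sketch of (2.28)/(2.29) for assignments; the small abstract evaluator standing in for [[·]]Asign (here only for +, with a made-up operator table) is an assumption for illustration:

SIGNS = {"-", "0", "+"}

def abs_plus(s1, s2):
    """Abstract addition on signs: 'plus plus plus gives plus', etc."""
    if s1 == "0": return {s2}
    if s2 == "0": return {s1}
    return {s1} if s1 == s2 else SIGNS

def eval_sign(expr, sigma):
    """[[a]]Asign sigma: a variable name, or a tuple ("+", lhs, rhs)."""
    if isinstance(expr, str):
        return {sigma[expr]}
    _, a, b = expr
    return {s for s1 in eval_sign(a, sigma) for s2 in eval_sign(b, sigma)
              for s in abs_plus(s1, s2)}

def phi(x, expr):                  # eq. (2.29), per abstract state
    return lambda sigma: [{**sigma, x: s} for s in eval_sign(expr, sigma)]

def f(x, expr):                    # eq. (2.28), lifted pointwise to sets
    return lambda Y: [s2 for sigma in Y for s2 in phi(x, expr)(sigma)]

Y = [{"x": "+", "y": "-"}]
print(f("x", ("+", "x", "y"))(Y))  # x := x + y: x may become -, 0 or +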
On the next slides, we will embellish the analysis. Since we are not yet in the inter-procedural part, the embellishment is not very interesting, just a “lifting” to the embellished setting. We know already how that works, abstractly (see equation (2.22)). As mentioned before, the embellished lattice L̂ is of the form ∆ → L. Here the underlying lattice L is of a particular form, namely 2^(Var∗→Sign). With this one, we, in a way, make use of uncurrying to represent ∆ → 2^(Var∗→Sign) slightly differently (but isomorphically). In equation (2.30), we use the symbol ≃ for that. Since we use a slightly different representation, not a function from contexts to sets of sign-states, but sets of pairs containing contexts and sign-states, the transfer function in equation (2.31) below is not literally in the form of (2.22), but an equivalent, uncurried variant.
Sign analysis: embellished

L̂sign = ∆ → Lsign = ∆ → 2^(Var∗→Sign) ≃ 2^(∆×(Var∗→Sign))   (2.30)

Transfer function for [x := a]^l:

f̂sign_l(Z) = ⋃{{δ} × φsign_l(σsign) | (δ, σsign) ∈ Z}   (2.31)

The unembellished analysis so far was a simple instance of the monotone framework. The transfer function just “joins” all possible outcomes, where it is assumed that we have a function that calculates the set of signs for an expression. That was completely standard. Now, it does not get really more complex: equation (2.31) just does nothing with the contexts δ, since we are still within a single procedure. In the following we go to the inter-procedural fragment and there things get more complex, since for dealing with calls and returns we have to connect the contexts of the caller and the callee. It’s a bit like parameter passing.

Inter-procedural
f̂ln, f̂lx : (∆ → L) → (∆ → L) = id
2 transfer functions
f̂1_lc : (∆ → L) → (∆ → L),   A•(lc) = f̂1_lc(A◦(lc))   (2.32)

f̂2_lc,lr : (∆ → L) × (∆ → L) → (∆ → L),   A•(lr) = f̂2_lc,lr(A◦(lc), A◦(lr))   (2.33)

Procedure call

Note: we write d̂ for elements of the embellished lattice (not l̂ as in the picture). Note the unfortunate notational collision in the picture: l̂ is an element of the embellished lattice (an abstract value), while lc etc. are nodes/labels in the control flow graph. The situation may become even more confusing for analyses like RD: there labels (which are nodes in the control-flow graph) are part of the values of interest and thus also elements of the lattice. Next come two different simplifications for f̂2. However, one way to understand the 2 arguments for the return is that often one wants to match the return with the call (via the context).

Ignoring the call context

f̂2_lc,lr(d̂, d̂′) = f̂2_lr(d̂′)
Merging call contexts

f̂2_lc,lr(d̂, d̂′) = f̂2A_lc,lr(d̂) ⊔ f̂2B_lc,lr(d̂′)

Context sensitivity
– the analysis distinguishes (information stemming from) different call sites

⇒ context sensitive analysis
⇒ more precision + more effort

In the following:
2 specializations:
– call strings (control-based contexts)
– assumption sets (data-based contexts)

(combinations are, of course, possible)

Combinations of the two approaches are not covered in the lecture. The call-string approach corresponds more or less to the previously sketched MVP approach.

Call strings
∆ = Lab∗ (call strings)

L̂ = ∆ → L

ι̂(δ) = ι   if δ = ε
ι̂(δ) = ⊥   otherwise
The definition of ι̂ : L̂ should be clear: at the beginning of the program, there are no calls; hence the call string is empty (represented by ε). Note again the higher-order approach: ι̂ is again defined point-wise.

Fibonacci flow
Fibonacci call strings

Some call strings: ε, [9], [9, 4], [9, 6], [9, 4, 4], [9, 4, 6], [9, 6, 4], [9, 6, 6], . . .

The call strings are not the same as the valid paths. Both concepts are related, though. An important difference is the treatment of the returns. In the valid (or complete) path description, the returns are part of the paths, and the paths “never forget”, they only grow. With call strings, in contrast, the effect of a return is to remove the previous call. The call string only tracks the currently open calls. It corresponds to the current call stack (in abstracted form). That is also the way to match the contexts of the callee and the caller. The paths as defined before also recorded internal nodes, unrelated to calls and returns; that too is ignored in call-strings.

Transfer functions for call strings
Lift the underlying f1_lc and f2_lc,lr to f̂1_lc and f̂2_lc,lr.

Transfer functions

(f̂1_lc d̂)([δ; lc]) = f1_lc(d̂(δ))
(f̂1_lc d̂)(_) = ⊥   (2.34)

(f̂2_lc,lr(d̂, d̂′))(δ) = f2_lc,lr(d̂(δ), d̂′([δ; lc]))
(f̂2_lc,lr(d̂, d̂′))(_) = ⊥   (2.35)
– f1_lc and f2_lc,lr: the underlying transfer functions
– d̂: at the call site; d̂′: at procedure exit

The definition is quite condensed or “convoluted”, since it’s pretty higher-order. But what it achieves is exactly what we wanted. One can separate that into two parts: a) if a function is called from two different places, differentiate the two situations (context-sensitivity), and b) when returning from a call, return to the call site. So far for what the definition is supposed to capture. The starting point are the two given unembellished transfer functions for the calls and for the returns, f1_lc and f2_lc,lr. Those capture what’s really going on in calls and returns, given some specific analysis. So, the task is on the one hand to lift that intended transfer behavior from the unembellished to the embellished setting, like we did for the intra-procedural part. The other aspect, specific to f1_lc and f2_lc,lr, is the two points a) and b) just
mentioned: differentiate the call sites and return to the proper place. For that, we make use of the call string context. Let’s first have a look at how it’s done for calls in equation (2.34). The “lifting” of f1_lc is done in the same higher-order way as we have seen in the intraprocedural case (in equation (2.22)). But the higher-order application of the underlying function is not quite identical. In equation (2.22), the context δ does not play a role at all: (f̂l d̂) δ = fl (d̂ δ). Now, for equation (2.34), the call-string context δ changes; that’s the key thing. The change of δ from the caller to the extended one [δ; lc] effectively lifts f1_lc to the embellished setting, connecting analysis-results only for matching callers and callees as far as the call-string is concerned. That transfers the data flow information according to the underlying transfer function f1_lc from the caller to the callee, and the callee is analyzed with that information and in a context appropriately extended by lc. Remembering that caller-information distinguishes the different call-site instances and also allows connecting the callee back to the site where it had been called, in equation (2.35). In the definitions, _ stands for a “wild-card” argument, covering all cases not covered by the first line. So the two lines of the definitions in (2.34) and in (2.35) are to be read in a first-match manner (as in programming languages). Note that the else case is defined as ⊥, i.e., as the bottom value of the underlying lattice. That means there is no data flow information transfer in those cases (since ⊥ ⊔ d = d), so there is information transfer only between matching pairs of activations of callers and callees.
The same technique is used also for lifting the transfer function f2_lc,lr (again assumed given) for returns, matching the corresponding activations of callee and caller in equation (2.35). In the following, we apply the definition in the specific setting of the sign-analysis. The definition of how to cover calls and returns with the corresponding transfer functions was “involved”, due to the higher-order formulation. In the specific setting of the sign-analysis, we make use of uncurrying the representation (as we did before), i.e., we operate on sets of pairs.
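In that uncurried view, the effect of (2.34)/(2.35) can be sketched as follows (call strings as tuples of call-site labels; f1 and f2 are the assumed underlying transfer functions, here simplified to map data to data):

def lift_call(f1, lc):
    """Eq. (2.34), uncurried: Z is a set of (call_string, d) pairs."""
    def f1_hat(Z):
        # extend the caller's call string by lc; all other contexts stay bottom
        return {(delta + (lc,), f1(d)) for (delta, d) in Z}
    return f1_hat

def lift_return(f2, lc):
    """Eq. (2.35), uncurried: connect only matching caller/callee activations."""
    def f2_hat(Z, Z_prime):
        return {(delta, f2(d, d2))
                for (delta, d) in Z
                for (delta2, d2) in Z_prime
                if delta2 == delta + (lc,)}
    return f2_hat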
Sign analysis (continued)
L̂ ≃ 2^(∆×(Var∗→Sign)) (see Eq. (2.30))
transfer functions fsign_l (with the help of φsign_l)
Sign analysis: aux. functions φ still unembellished
calls: abstract parameter-passing
φsign1_lc(σsign) = {σsign[x → s][y → s′] | s ∈ [[a]]Asign σsign, s′ ∈ {−, 0, +}}
returns (analogously)
φsign2_lc,lr(σsign1, σsign2) = {σsign2[x, y, z → σsign1(x), σsign1(y), σsign2(y)]}

(formal params: x, y, where y is the result parameter; actual parameter z)
The underlying functions φ work straightforwardly. The call-case represents parameter passing. The input parameter (here x) is treated in the language by call-by-value. Here, we are working with abstract values, so we pass elements from Sign. The values to be passed are determined by [[a]]Asign σsign. Due to abstraction, the evaluation of expression a may yield more than one element s from Sign. The result-parameter y is treated randomly, picking an arbitrary element from Sign. That corresponds to the treatment in the operational semantics by rule Call.
Sign analysis: embellished f’s for calls and returns

calls: abstract parameter-passing + glueing calls-returns
f̂sign1_lc(Z) = ⋃{{δ′} × φsign1_lc(σsign) | (δ, σsign) ∈ Z, δ′ = [δ; lc]}   (2.36)
Returns: analogously
f̂sign2_lc,lr(Z, Z′) = ⋃{{δ} × φsign2_lc,lr(σsign1, σsign2) | (δ, σsign1) ∈ Z, (δ′, σsign2) ∈ Z′, δ′ = [δ; lc]}   (2.37)

(formal params: x, y; actual parameter z)

Here we see the “uncurrying” we discussed before in action, exploiting the special form of the lattice for the sign analysis (equation (2.30)). Well, actually the situation is not so special. Many analyses work with sets of this or that, i.e., with subset-lattices. So in very many cases, one can switch to an uncurried representation, like here. That allows a formulation pairing contexts δ with the rest of the lattice information, instead of using the function space as in the general formulation. The lifting of the underlying φsign1_lc in the case of calls in equation (2.36) extends δ from the caller to δ′ = [δ; lc] in the callee. The same connection between callee and caller is made for the returns in equation (2.37).
Call strings of bounded length
Back from the specific sign analysis to the general setting of the embellished monotone framework, in particular the embellishment with call strings. What we have achieved is: turning the MVP approach into an instance of a monotone framework by making the lattice more complex and lifting the underlying transfer functions to the more complex setting (“embellished”, as it’s called here). The embellished lattice, however, has become quite a bit more complex. In many settings, the underlying lattice L is finite (like in the 4 classical examples and also in the case of the sign analysis). The embellished lattice L̂, however, is definitely not finite. That means there is no guarantee of stabilization for the worklist algorithm or the chaotic iteration. That perhaps should not come as a surprise. It has been mentioned that MVP is undecidable (in general, at least; there may be particular programs where it’s not, of course). Since adding call strings is a way of encoding MVP, that cannot make an undecidable problem decidable, so there cannot be a guarantee of stabilization. What can we do about it? Well, after all that complex formalization, the remedy is pretty simple and pragmatic. It simply says: don’t use call strings of unbounded length, just take call strings up to a particular length. That corresponds to the following: calls and returns are matched up to a predefined depth of the call string. Afterwards, the analysis gives up on that precision and works like the naive approach. In the extreme case of k = 0, i.e., having no call-strings whatsoever, we are, not surprisingly, back at the context-insensitive approach as a special case.
⇒ restrict the length: ∆ = Lab≤k for some k ≥ 0
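Bounding the length amounts to truncating on the left, keeping the most recent call sites; a one-line sketch:

def extend(delta, lc, k):
    """Extend call string delta by call site lc, keeping at most the
    k most recent call sites (k = 0: context-insensitive)."""
    return (delta + (lc,))[-k:] if k > 0 else ()

print(extend(("l9", "l4"), "l6", k=2))   # ('l4', 'l6') -- oldest call forgotten
print(extend(("l9", "l4"), "l6", k=0))   # ()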
What follows now is an alternative to call strings. It’s an analysis that can be seen as analogous to call-strings of bounded length, namely of length k = 1. The previous short discussion said one should simply consider call-strings up to some bound and cut them off afterwards. The alternative now keeps no more information than about the last call. That corresponds to a depth of k = 1, only that we don’t stack up the return addresses (i.e., the lc’s), but information about data or states. But one can see how concretely one can “cut off” contextual information; it’s not hard anyway: instead of a stack (“call string”) one uses just “one slot of memory”, and each time a new call is done, the “older” information is discarded.
Another form of contexts: assumption sets
As mentioned earlier, there are two different kinds of information one can use for contexts. One can base the contexts on “control” or on “data”. They can also be combined, using information about both data and control (though we don’t show an example). What we have seen so far, the call string approach, is based purely on control. It stores an abstract version of the current call-stack, remembering a stack of labels lc. It’s indeed an abstraction of the concrete call stack, ignoring everything except the return addresses (in the form of the labels). With that contextual information, the data flow analysis can analyze the program, returning precisely to the call-site instance (that’s what a return address is good for, after all). But one can use other information besides the return address: one can use the data value with which a function is called to at least narrow down where the function was called from. There’s no guarantee that it locates the call site instance precisely, as the return address does, but still it can disambiguate the situation to some extent, thereby adding precision to some extent. In general,
it won’t be possible to use actual, concrete data values for that, like, in our language, integers. The data domains for arguments are often infinite, and anyway, the underlying analysis does not even concretely track values. Thus, one will generally use abstractions as context, not concrete values. And actually, many analyses don’t even work with abstractions of data. For illustration: our toy language uses integers as data values. However, none of the 4 classical analyses actually worked with abstractions of Z! Only the sign-analysis later did that, abstracting (or partitioning) the infinite set Z into three classes: the positive numbers, the negative numbers, and the set containing 0. The earlier 4 analyses worked on other abstractions (“is a variable live here” and similar). But also that information can be used to disambiguate calls to some extent: one call is analyzed from places where x is live, and another instance of a call from where it is not.

A small side remark concerning “data” and “control”: We made the point that call-strings are contexts with control-information and the assumption set method uses “data” information instead. That’s an ok way to see it, but it’s perhaps not strictly true in all cases. For example, the information for reaching definitions had been sets of pairs of locations and variables. I.e., the lattice contained control information, namely the locations where variables had been defined, i.e., given their last value. So, one should not interpret the situation here too strictly, thinking that there are absolutely no pieces of control information involved.

The subsequent slides are rather unspecific about what that information could be. Some set D is assumed (“data”), and as lattice one assumes a subset lattice:

L = 2^D   (2.38)

In that generality, it covers pretty much everything. The slides mention that the lattice represents “assumptions” on states. If we interpret D as containing states, then 2^D contains sets of states. That can be seen as containing assumptions or properties of states, namely all those states where the property or assumption holds. Technically, we had defined (concrete) states as elements from Var∗ → Z, though the formulation here is not meant as applying only to settings where one makes assumptions on states in this particular sense. The idea can be used for lattices of the form specified in equation (2.38), which covers basically all analyses we have seen. There is only one example that may look like an exception: that was when discussing the alternative lattice L′sign for the sign analysis, where the variables were treated uncoupled. However, using “currying/uncurrying” one can make also that conform to the format 2^D. Conceptually, one can use the idea broadly. Technically, the slides show how to lift the transfer functions and connect calls and returns making use of an auxiliary function φ. We have seen an example of that in the sign-analysis. Defining the transfer functions with the help of auxiliary functions φ was done previously in the sign-analysis (and only there). The analysis made use of abstract states AState = Var∗ → Sign.
The auxiliary function φ had been of type AState → 2^AState; it captured the general case where the (abstract) state is changed. That happens for the sign analysis for assignments and for parameter passing. Working on abstract sign-information, the “result” of the auxiliary function is, in the general case, a set of abstract states, i.e., an element of 2^AState. That captures the “non-determinism” in the effect of the abstraction, in particular for the assignment. Remember in particular equation (2.29), which made that clear for the sign-analysis. This setting, where transfer functions of type L → L, i.e., 2^D → 2^D, are defined making use of an auxiliary function of type D → 2^D, is characteristic for situations where D is seen as (abstract) state, and the sign-analysis is an example of that. That means the other 4 analyses, where the lattice is perhaps not naturally called “state” and where the transfer functions are not defined via some auxiliary construction φ, will not be exactly formulated the way we do in the following. That does not imply one cannot introduce contexts that contain “data”
information, like having ∆ = L. One can do that, though the embellished transfer functions cannot be written exactly as in the formulas that follow. But one can of course just use the general curried, higher-order formulation from equation (2.20), which works for all kinds of contexts. Anyway, here we are in a setting that more resembles the sign-analysis, with D corresponding to the type of abstract states AState = Var∗ → Sign.
Assumption sets
⇒ L̂ = ∆ → L ≃ 2^(∆×D)

∆ = 2^D

ι̂ = {({ι}, ι)}   (extremal value)
Transfer functions
We show how to lift the underlying transfer functions to the embellished setting, and we do that for the cases of calls and returns; the easier lifting of the transfer functions for intra-procedural analysis is not shown (resp. it’s done anyway as before: contexts don’t change). As before in the sign-analysis, we don’t lift the actual transfer functions f to their embellished counterparts f̂ directly, but we take the auxiliary function φ (used in the definition of the f’s) and lift that. For calls, the underlying auxiliary function is of type

φ1_lc : D → 2^D
and the corresponding embellished transfer function is given in equation (2.39). It’s analogous to the definition we did for the sign analysis with call strings in equation (2.36). As for the definition from equation (2.39): the type of the transfer function (for calls) is f̂1_lc : L̂ → L̂ = (∆ → L) → (∆ → L) with L̂ ≃ 2^(2^D×D). Elements of the lattice are thus sets of pairs; each pair consists of the current data and the context. As mentioned, the general picture here is: when calling a function, use the current data d to identify the place of the call, where the identification is not 100% precise, so it’s about narrowing down possible places where the procedure is called. But actually, that’s not all. It’s not just the data d handed over by the caller that is used as context of the callee to disambiguate different callers; it’s additionally the caller-context δ. That’s the tuple (δ, d) ∈ Z in the definition, where δ came from the caller of the caller. However, that additional information does not “stack up”. That stacking up was done with the call-string approach, maintaining a call-string during the analysis. Here, what is remembered when analysing a procedure body is the context of the caller (together with d), but not the contexts of the earlier callers of the caller. In other words: it corresponds to a call-string with bound k = 1 (except that the information remembered is not the control-information lc as for the call-strings, but d from D).
The transfer function for return from equation (2.40) works correspondingly. It connects the callee back to the caller by requiring a connection between the shape of the caller context δ and the callee context δ′.
f̂¹_lc(Z) = ⋃ { {δ′} × φ¹_lc(d) | (δ, d) ∈ Z ∧ δ′ = {d′′ | (δ, d′′) ∈ Z} }        (2.39)

where φ¹_lc : D → 2^D
f̂²_lc,lr(Z, Z′) = ⋃ { {δ} × φ²_lc,lr(d, d′) | (δ, d) ∈ Z ∧ (δ′, d′) ∈ Z′ ∧ δ′ = {d′′ | (δ, d′′) ∈ Z} }        (2.40)
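To make the shape of these definitions concrete, here is a small executable sketch in Python of the two embellished transfer functions from equations (2.39) and (2.40). The representation is an assumption for illustration: a lattice element Z is a set of (context, data) pairs, a context is a frozenset of data values, and phi1, phi2 stand for the underlying, analysis-specific auxiliary functions (assumed given, e.g. from the sign analysis).

def f_hat_call(Z, phi1):
    # Equation (2.39): on a call, the new callee context delta_new collects
    # all data values that share the caller's context delta.
    result = set()
    for (delta, d) in Z:
        delta_new = frozenset(d2 for (g, d2) in Z if g == delta)
        result |= {(delta_new, d_new) for d_new in phi1(d)}
    return result

def f_hat_return(Z, Z_callee, phi2):
    # Equation (2.40): on return, reconnect callee pairs (delta', d') to
    # the caller pair (delta, d) whose call gave rise to delta'.
    result = set()
    for (delta, d) in Z:
        delta_call = frozenset(d2 for (g, d2) in Z if g == delta)
        for (delta_prime, d_prime) in Z_callee:
            if delta_prime == delta_call:
                result |= {(delta, d_new) for d_new in phi2(d, d_prime)}
    return result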
Small assumption sets
The following cuts down on the contextual information tracked. The only thing tracked is the "data" with which the procedure is called (abstractly), not by whom (represented by the caller context). That does not mean there is no context: the data is the callee context, but it is not paired with information about the caller. As explained, the previous approach can be seen as taking "historical" context information with a bound k = 1 into account. One can simplify that further, storing no such information, i.e., k = 0. That's not a context-insensitive approach, as it had been the case with call strings: there, k = 0 meant that no information is used whatsoever. Here, we use "data" (or state) information from D, and we simply use that as callee context, but not additionally the context of the caller. I.e., we use the data d itself to distinguish different call instances and to connect back the analysis when returning, but we ignore the context of the data. Basically, we identify all call activations with the same "arguments" from all possible callers, where of course the arguments are not actual, concrete data, but the abstract information used in the flow analysis. We ignore, however, the context of the caller(s), which was taken into account in the previous approach. This approach is known as small assumption sets (whereas the previous one and similar ones are consequently large assumption set approaches).
∆ = D

L̂ ≃ 2^(D×D) (instead of 2^(2^D×D) as before)
f̂¹_lc(Z) = ⋃ { {d} × φ¹_lc(d) | (δ, d) ∈ Z }

f̂²_lc,lr(Z, Z′) = ⋃ { {δ} × φ²_lc,lr(d, d′) | (δ, d) ∈ Z ∧ (d, d′) ∈ Z′ }
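For comparison, a sketch of the small-assumption-set variants, under the same representation assumptions as in the sketch above, except that a context is now a single data value:

def f_hat_call_small(Z, phi1):
    # the data value d itself becomes the callee context
    return {(d, d_new) for (delta, d) in Z for d_new in phi1(d)}

def f_hat_return_small(Z, Z_callee, phi2):
    # reconnect: the callee context must equal the caller's data d
    return {(delta, d_new)
            for (delta, d) in Z
            for (ctx, d_prime) in Z_callee
            if ctx == d
            for d_new in phi2(d, d_prime)}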
Flow-(in-)sensitivity
The interprocedural analysis became rather involved (and can be costly to do). At the end of this section, things get rather trivial for a change. Before we do that, we talk about "sensitivity": analyses can be characterized by their sensitivity (or sensitivities, an analysis can be sensitive to more than one aspect). We encountered context-sensitivity in connection with the analysis for procedures. Being sensitive to contexts here means being able to distinguish, to some degree, different call-site instances. Similarly, we touched shortly upon path-sensitivity: being able to distinguish, to some degree, different execution paths.

There is another sensitivity, namely being sensitive to the "flow" or not. By that one means: does the order of execution of statements or blocks matter? The control flow graph is a representation of (an abstraction of) the "flow" (= control flow), and respecting the flow means following the edges of that graph. A flow-insensitive analysis does not even need a CFG, at least not the edges, as it does not care about them.

To avoid a misconception: we did data flow analysis using algorithms like chaotic iteration or some worklist formulations. Especially the chaotic iteration seems not to care about the order in which it tackles edges; it randomly treats them until stabilization. That does not mean the algorithm works flow-insensitively. Flow-sensitivity or flow-insensitivity is about the problem, not about solution strategies. Those algorithms worked on the control flow graph (resp. constraint systems representing the data-flow problem over the control flow graph). The algorithms certainly honored the order of statements as far as the result is concerned, even though in the solution process the traversal may not treat the edges in the order of the flow (though following the edges may be a good strategy, at least better than doing it randomly). And obviously, the data flow problems we looked at had been flow-sensitive: for reaching definitions, it's about whether at some location a variable had been defined (= assigned to) before. For live variables, it's about whether a variable may be used afterwards. So the flow order is crucial.

So we have actually not seen a flow-insensitive problem. But it's plausible that those should be fairly trivial. It might even not be immediate to come up with an interesting flow-insensitive problem. What can one statically analyze which is not ridiculously easy? One can count the number of occurrences of an identifier, but that's so simple that one can do it by grepping the source code; one might not even call that program analysis. It might get a bit more complex if one takes scopes into account (and anyway, in this lecture we are not analyzing source code, we analyze abstract syntax). But still, the analysis would be much simpler than all we have discussed. Later we will see one that determines, for a procedure, the variables that may be modified (also indirectly) by calling it. Sometimes that's information kept in interfaces (of methods, procedures, etc.) in the form of a so-called modify-clause.

Technically, flow insensitivity does not mean that all "orderings" in the program can be ignored. Technically, it means that the order imposed by sequential composition ("semicolon") is irrelevant, see equation (2.41). But there may be other, less obvious "orderings" in a program. For instance, one may be interested in a so-called call graph: which procedure calls which one, i.e., which procedure may call which one, for instance in the while-language with procedures (in higher-order languages, things get much more involved; we have stumbled upon that already in the introduction). In such a call graph, one is interested in the fact that, for instance, f calls g, in the sense that the callee g can start executing only after f has started (unless g may call f as well, in which case one might not know). At any rate, the call graph can contain some forms of ordering but not others: for instance, if f uniformly calls g1 and afterwards g2, then the ordering of g1 "before" g2 is not part of the call graph. The corresponding information is flow-insensitive, in the sense defined. And it's also not ridiculously trivial, one has to build up some graph, but it's still fairly simple, at least for a while-language with procedures of the form we have seen. Also the analysis IAV later (implicitly) performs such a call-graph analysis.
S1; S2  vs.  S2; S1        (2.41)
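As a toy illustration (with a hypothetical statement representation, not the script's syntax): the set of assigned variables is a flow-insensitive property, since swapping the order of sequential composition does not change it.

def assigned_vars(stmts):
    # each statement is a (target, expression) pair
    return {target for (target, _expr) in stmts}

S1 = [("x", "y + 1")]
S2 = [("y", "x * 2")]
assert assigned_vars(S1 + S2) == assigned_vars(S2 + S1)  # order irrelevant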
Set of assigned variables
IAV(p): the global variables that may be assigned to (also indirectly) when p is called

– AV(S): assigned variables in S
– CP(S): called procedures in S

IAV(p) = (AV(S) \ {x}) ∪ ⋃{ IAV(p′) | p′ ∈ CP(S) }        (2.42)

where proc p(val x, res y) is^ℓn S end^ℓx ∈ D∗
Example
begin proc fib(val z) is
        if [z < 3] then [call add(a)]
        else [call fib(z − 1)]; [call fib(z − 2)]
      end;
      proc add(val u) is (y := y + 1; u := 0) end
      y := 0;
      [call fib(x)]
end
Example
[Figure: call graph of the example. main∗ calls fib; fib calls fib and add]
IAV(fib) = (∅ \ {z}) ∪ IAV(fib) ∪ IAV(add)
IAV(add) = {y, u} \ {u}

⇒ smallest solution: IAV(fib) = {y}
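A minimal sketch of computing the smallest solution by fixpoint iteration, with AV, CP and the formal parameters written out by hand for the example (rather than derived from the syntax):

AV = {"fib": set(), "add": {"y", "u"}}       # directly assigned variables
CP = {"fib": {"fib", "add"}, "add": set()}   # directly called procedures
formal = {"fib": {"z"}, "add": {"u"}}        # formal (val) parameters

IAV = {p: set() for p in AV}                 # start at the bottom element
changed = True
while changed:
    changed = False
    for p in AV:
        # equation (2.42): own assignments minus formals, plus callees' IAV
        new = (AV[p] - formal[p]).union(*(IAV[q] for q in CP[p]))
        if new != IAV[p]:
            IAV[p], changed = new, True

print(IAV)   # {'fib': {'y'}, 'add': {'y'}}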
2.7 Static single assignment
This section is not covered by the book Nielson et al. [7]. It's added this year as it's an interesting and important angle on data flow analysis. The book covers so-called definition-use or use-definition chains (aka du- and ud-chains), which are left out instead this time. Only so much: du- and ud-chains are basically a generalization of reaching definitions, one version working forward (like rd), one backward. And SSA can be seen as a generalization, in turn. More than one ever wants to know about SSA can be found in [5].
Origins
Early citations are Rosen et al. [9], Alpern et al. [1], and Cytron et al. [3]. So also historically, it's connected to value numbering.
Intro
SSA
A program is in SSA form if each variable is the target of exactly one assignment in the program text.

Referential transparency refers to a generally welcome property of expressions (in this case variables). It means that the value of an expression (or here a variable) is independent of where the expression appears. The statement should perhaps be qualified in that the value of a variable is always the same (unless there is no value, i.e., the variable is undefined). If we ignore "being undefined" as a special status, then a referentially transparent variable means indeed that the value of the variable is "always the same" (which means the value is immutable and the variable is single-assignment; there might be other characterizations as well). It's also related to functional or declarative programming. If one deals with referentially transparent variables, many things become easier; the concept of variable becomes more logical. Variables in imperative programming languages have the flavor of being names for memory addresses, whereas variables in mathematical textbooks, equations, or logical formulas do not give that feeling. The reason is that expressions and variables there are intuitively understood as being referentially transparent (not that any math text would ever point that out; it simply does not cross anybody's mind what a variable has to do with a mutable memory cell . . . ). Also in the lecture, we carefully try to separate assignments x := e from equations x = e, the latter being understood as referentially transparent (or declarative . . . ).

Anyway, one way of understanding the general motivation of SSA is that it's a format that is referentially transparent. One nice property is that a unique assignment x := e; S is referentially transparent; we might also write it x = e; S or actually let x = e in S. The semantics of the construct is that first e is evaluated to some value, then that value is stored in x, and finally the rest S is evaluated. Now, in a single-assignment setting, once x has gotten its value (say v), that value of x won't change any more. As a consequence, the variable and the value are "synonymous", x "is" v. That captures the most fundamental property of what it means to be "equal", the standard mathematical, logical (referentially transparent, declarative . . . ) meaning of equality: two things being equal can be used interchangeably. After evaluating e to v, the let x = v in S can be explained as S[v/x], replacing or substituting x by v in S. Obviously, such a substitution explanation does not work for general (non-single) assignments x := v; S (but for single assignments it holds).

What relevance does that have for static analysis? Basically, static analysis in the form of this lecture is an automatic logical analysis of (abstract) properties of programs (like variables being live, etc.). The cleaner the program, the easier the analysis, and if the variables of the program behave like logical variables, that may help in a correct analysis. Note that ultimately, of course, the "logical, single-assignment variables" must be mapped to mutable memory cells of a standard von Neumann architecture or variations thereof. Nonetheless, during the semantic phase, the analysis may profit from a logically clean intermediate representation. All that may be a bit philosophical. More down to earth is SSA as intermediate representation: it's a format which has some data flow analysis already "built in". If there is only one assignment to each variable, then when using a variable, it's already clear by the identity of the variable "where it comes from", where it was "defined". Because of that, SSA can be seen as a generalization of so-called def-use chains. Since this valuable piece of data flow information is already built into the variables of the SSA, it's a good general starting point for all kinds of more specialized, subsequent analyses.
Example in SLC
The core idea of SSA is actually pretty simple; the following shows an example. If one intends that each "variable" is assigned only once, one simply has to use enough variables. If the program to start with violates the SSA assumption, like assigning to a variable more than once, one simply "renames" the variable, using a new variable each time an assignment targets it.

We are not using the while-language from before, but some intermediate language format. It could be called three-address intermediate code, because the basic data manipulation operation ("assignment") works with three "addresses": two source variables and a target variable. Of course, it's not 100% strict; also something like x := y is in three-address format, even if only two variables are involved. What is not in 3AC format are assignments of the while-language, which are of the form x := a, where a is an arithmetic expression which may contain arbitrarily many variables. A format like 3AC is a possible form of so-called intermediate code. So it's no longer user-level abstract syntax, but closer to a typical machine-code level (which may be in 3A-format or 2A-format). Such formats are also common for virtual machines and interpreters, for instance for byte-code (though historically there had been attempts in HW). Turning assignments of the general form of the while-language into 3AC is not hard, and the translation resembles what needs to be done for SSA: simply introduce new variables where the 3AC cannot handle a compound expression. The additional variables are also called temporary variables or just temporaries. We illustrate the idea on a later slide. If one wanted SSA, one could do the translation from user-level syntax to SSA in one step. We take as starting point of our discussion a 3AC representation, i.e., the user syntax has already been simplified to 3AC. That step, from user syntax to intermediate 3AC, typically would also replace conditionals, while-loops etc. by (conditional) jumps, and perhaps it would do something about scopes, since the intermediate code is supposed to be closer to the execution format.

Anyway, we don't fix the exact abstract syntax and semantics, and everything would work analogously for the while-language. Also the concept of control flow graph applies at the intermediate-code level, and indeed, often that's the level where a compiler applies the CFG as intermediate format. At any rate: the "numbering" trick shown on the slides should be obvious. The subscripts are just a way of making systematically clear that b1, b2, etc. all have their origin in the variable b. How to represent the versions is an implementation issue, but using natural numbers for different versions of a variable is certainly an option.
3AC
a := x + y
b := a − 1
a := y + b
b := x ∗ 4
a := a + b
3AC in SSA
a1 := x + y
b1 := a1 − 1
a2 := y + b1
b2 := x ∗ 4
a3 := a2 + b2

The transformation for straight-line code is straightforward. It's so simple that one could ask: doesn't the whole thing seem pretty trivial? Well, as we mentioned a few times: for static analysis, straight-line code is not too complex. Actually, analysing straight-line code allows basically exact analysis, without approximation: a variable either is used in the future or else it is not (liveness or dead-ness of a variable is clear). Same for reaching definitions etc. Trying to exactly analyse straight-line code and making some "optimal" compilation based on that is sometimes called super-optimization. That word is, of course, doubly questionable: "optimization" is already a misnomer, as compilers do not produce optimal code in general. To call the situation for SLC, where it may be obtainable, "super-optimal", i.e., better than optimal, is pretty silly. . . Appel [2] states that it's a form of value numbering.
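The renaming itself is mechanical. A minimal executable sketch for straight-line 3AC follows; the tuple representation of instructions is an assumption for illustration, not the script's syntax.

def slc_to_ssa(instrs):
    # instrs: list of (target, left, op, right); operands are variables or
    # literals. A use refers to the latest version of its variable; source
    # variables that are never assigned keep their plain name.
    version = {}
    def cur(a):
        return f"{a}{version[a]}" if a in version else a
    out = []
    for tgt, l, op, r in instrs:
        l, r = cur(l), cur(r)
        version[tgt] = version.get(tgt, 0) + 1
        out.append(f"{tgt}{version[tgt]} := {l} {op} {r}")
    return out

prog = [("a", "x", "+", "y"), ("b", "a", "-", "1"),
        ("a", "y", "+", "b"), ("b", "x", "*", "4"),
        ("a", "a", "+", "b")]
print("\n".join(slc_to_ssa(prog)))
# prints the SSA version from above: a1, b1, a2, b2, a3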
Basic idea (for SLC)

Use a fresh version (subscript) of a variable at each assignment, i.e., for x on the left-hand side; a use of x then refers to the latest version.
Compare: 3AC (here for expressions): 2*a + (b−3)

[Figure: abstract syntax tree of 2*a + (b−3)]
Three-address code
t1 = 2 ∗ a
t2 = b − 3
t3 = t1 + t2
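A sketch of how such code is generated from the expression tree; the tuple encoding of the tree is a hypothetical choice for illustration.

import itertools

_temp_ids = itertools.count(1)

def to_3ac(expr, code):
    # expr is a leaf (variable/constant as string) or a tuple (op, l, r);
    # appends three-address instructions to `code` and returns the name
    # that holds the expression's value.
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    a = to_3ac(left, code)
    b = to_3ac(right, code)
    t = f"t{next(_temp_ids)}"          # fresh temporary
    code.append(f"{t} = {a} {op} {b}")
    return t

code = []
to_3ac(("+", ("*", "2", "a"), ("-", "b", "3")), code)
print("\n".join(code))   # t1 = 2 * a / t2 = b - 3 / t3 = t1 + t2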
Compare to 3AC and temporaries
– temporaries = abstract form of registers
– unboundedly many assumed ⇒ each one assigned to only once

For the participants of the compiler construction course (INF5110), this comparison with the generation of 3AC should be familiar. But also without having participated, the idea is simple. Complex expressions are not supported by standard 3AC (as it's supposed to be close to machine code). That means they need to be broken into pieces, and intermediate results need to be stored in temporaries. Temporaries are special at least in the sense that they don't show up in the source code; otherwise, they are not too special anyway. Since the compiler can generate a new temporary when needed and since no upper bound on their number is assumed, that leads to code in static single assignment form, at least as far as assignments to temporaries are concerned. Static single assignment just pushes that idea a bit further and makes sure that also ordinary variables are assigned to only once (at least "statically"). If we were sticking to straight-line code, without conditional jumps, conditionals, loops etc., everything would be rather simplistic and straightforward. The problem starts with conditionals. That will be treated next, and leads to the introduction of so-called Φ-functions.
Join points and phony functions
[Figure: a join point. Two branches assign x := 1 (at l0) and x := 2 (at l1); at the join l2, y := f(x) becomes ambiguous, y := f(???). In SSA form: x1 := 1, x2 := 2, then x3 := Φ(x1, x2) and y := x3 at the join.]

Φ-functions are placed at join nodes to assure SSA format.
SSA in a nutshell

Transformation to SSA
Phony functions
Φ-functions are dealt with in the last step; they are either

– "virtual", for the purpose of analysis only, or
– ultimately "real", i.e., code for the Φ's will be generated

The transformation is done by 2-phase algorithm(s), in this order: first placement of the Φ-functions, then renaming of the variables.
Brainless SSA form
Maximal SSA recipe
Placement: for all variables, at the beginning of each join block add x ← Φ(x, . . . , x), where the number of x's is the number of predecessors of the node (≥ 2)

Renaming: rename the variables consistently (making use of reaching definitions analysis)

– guarantees single-assignment format
– is sound

A sketch of the placement step is given below.
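A minimal sketch of this maximal placement; the block/CFG data layout (dicts with "phis" lists) is a hypothetical choice for illustration, not from the script.

def place_maximal_phis(blocks, preds, variables):
    # blocks: dict node -> {"phis": [...], "stmts": [...]}
    # preds:  dict node -> list of CFG predecessors
    for n, block in blocks.items():
        k = len(preds.get(n, []))
        if k >= 2:                       # join nodes only
            for x in variables:
                # x <- Phi(x, ..., x), one argument per predecessor;
                # the arguments are filled in properly during renaming
                block["phis"].append({"var": x, "args": [x] * k})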
Room for improvements
– costly extra computations
– subsequent analyses may suffer a loss of precision

In many situations a phony function is unnecessary; the key concept for detecting that is dominance.
Improvement

Core idea
Assume an assignment x := e in n1. If all paths from n0 to n2 must go through n1, then n1's assignment to x does not need to be covered by a phony function for x at n2.

[Figure: nodes n0, n1, n2, where every path from n0 to n2 passes through n1; no Φ needed at n2 (marked ✘)]

That covers the improvement mentioned before: if there is no actual choice involved, like in SLC, no phony function is needed. A separate question is whether the other optimization can be done, namely: if the target variable is not live, one should not bother assigning to it.
Domination
Node n1 dominates n2 if all paths from the entry n0 to n2 must pass through n1.
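Dominators can be computed by a simple iterative fixpoint computation, itself a data-flow-style analysis. A naive sketch (the dict-based graph representation is an assumption for illustration):

def dominators(nodes, preds, entry):
    # dom(entry) = {entry}; otherwise dom(n) = {n} ∪ ⋂ dom(p) over preds(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry or not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom   # n1 dominates n2 iff n1 in dom[n2]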
The previous argument pointed at situations where one does not need a phony function for a variable. If one were to place phony functions everywhere except where there is exactly one origin (a situation of domination as in the previous illustration), this would be naive. It's better than placing such functions everywhere, but one can do much better. Why is that? Assume a situation where there is no dominance. Let's take two additional nodes n3 and n4, additional to the previous picture, both containing another assignment to x and both "after" n2 (which is dominated). We assume that these two nodes are not dominated by n1. Obviously, that implies that n3 and n4 are also not dominated by n2. At any rate, n3 and n4 are, SSA-wise, in an ambiguous situation wrt. x; therefore they are in need of disambiguation, i.e., some Φ-coverage.
Now, should one therefore cover both nodes by some phony function? That could be. However, in some situations we may do better! Assume that somehow n4 comes "after" n3, in such a way that n3 dominates n4. Then a phony function at n3 already disambiguates x for n4 as well, and no extra Φ is needed at n4.
Dominance frontier
The dominance frontier df(n) of a node n is the collection of nodes m such that:

1. n dominates a predecessor of m, but
2. n does not strictly dominate m itself.
Dominator trees
[Figure: example CFG with nodes n0–n8]
[Figure: the dominator tree of the example CFG]
2.7.1 Value numbering
Dominance frontier
forall nodes n in the CFG
    DF(n) := ∅
forall nodes n
    if n has multiple predecessors then
        forall predecessors p of n
            runner := p
            while runner ≠ idom(n) do
                DF(runner) := DF(runner) ∪ {n}
                runner := idom(runner)

The algo calculates, for each node, the dominance frontier df(n). A node dominating another is a node "in the past", but the dominance frontier works forwards, "to the future". The nodes of df(n) are not being dominated by n: it's not that n2 ∈ df(n1) implies that n1 dominates n2; rather, n1 does not dominate n2, but almost does, in that a predecessor of n2 is being dominated. So the dominance frontier describes some form of "front" for n, just one step beyond the border into the undominated territory. Remember that being dominated is the "safe" zone in that it does not need Φ-functions, whereas the undominated territory is the "danger zone" where Φ-functions are necessary (for join nodes). And the frontier, one step beyond the border, is the place of earliest convenience where to put the Φ.

Now to the algorithm. The frontier is calculated for all nodes which have more than one predecessor (Φ-functions are needed for join nodes only). The nodes are treated one by one. The algo is simple enough, and needs both the original graph as well as the immediate dominator tree. For each join node n, we consider its predecessors (more than one, as we concentrate on join nodes). From each of those predecessors, we walk up the dominator tree. That walk is "deterministic", as we use the dominator tree upwards. Note that we are not only following the tree upwards, it's also like following the original graph backwards. In case of a loop in the program, the predecessor of n used as a starting point is, of course, both a predecessor as well as a successor of n (that's what makes it a loop). Therefore, following the dominator tree upwards ends up where the runner is immediately dominated by n (that's the while-loop's exit condition). In that loop situation, runner is a successor of n as well as a predecessor of n. Then we add n to the frontier of all the "runners" (not the other way around). Note also that we climb up the tree until the runner equals the immediate dominator. In case of a loop, that will lead to the situation that the entry point of a loop is in its own dominance frontier (which is consistent with the definition, in particular the second part, which mentions strict dominance). That can be seen in the algo applied to n1. Note again that applying the inner loop of the algo to a node n does not contribute to calculating the dominance frontier of n, but rather the other way around: it adds n to the frontiers of other nodes (represented by runner).
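The same algorithm as a small executable sketch, with the predecessor map and the immediate-dominator map as assumed inputs:

def dominance_frontiers(nodes, preds, idom):
    # preds: dict node -> list of CFG predecessors
    # idom:  dict node -> immediate dominator in the dominator tree
    df = {n: set() for n in nodes}
    for n in nodes:
        if len(preds.get(n, [])) >= 2:      # only join nodes contribute
            for p in preds[n]:
                runner = p
                while runner != idom[n]:    # climb up the dominator tree
                    df[runner].add(n)
                    runner = idom[runner]
    return df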
Example
      n0   n1     n2     n3     n4   n5     n6     n7     n8
DF    ∅    {n1}   {n3}   {n1}   ∅    {n3}   {n7}   {n3}   {n7}

Interesting may be the node n1 (and n3, in connection with that). Note that n1 is in the dominance frontier "of itself". n1 is the entry node of the loop, so this highlights the way loops are treated. The basic intuition of the dominance frontier is: go forward, and the frontier is all the nodes where one goes from "dominated" to "undominated" territory. That's fine, but too simple for the loop case. If we applied this intuition to n1, the entry of the loop: of course n1 dominates itself by definition; all paths to n1 must somehow pass through n1 (except that one could argue about the words "pass through": the paths may not actually go in and out of the node, they end in the node). That's why the second condition in the definition of the dominance frontier talks about "strict dominance". Here, n1 dominates a predecessor of n1 (namely n3). A successor of n3 is n1, looping back. Now, moving from n3 to its successor n1 does not switch from dominated to undominated territory; it loops back to dominated territory. Still, that does not make the place n1 safe or unambiguous. The node can intuitively be reached in two different ways, resp. a variable inside n1 can be influenced by n0 and by n3, for instance. To cover such looping situations, one needs to talk about strict dominance in that part of the definition.
Further improvement
– how to actually place the Φ's
– how to then rename the variables
Renaming
rename(b):
    for each Φ-function ``x ← Φ(. . .)'' in b
        rewrite x as newname(x)
    for each statement ``x ← y ⊕ z'' in b
        rewrite y with subscript top(stack[y])
        rewrite z with subscript top(stack[z])
        rewrite x as newname(x)
    for each successor of b in the CFG
        fill in Φ-function parameters
    for each successor s of b in the dominator tree
        rename(s)
    for each ``x ← y ⊕ z'' in b and each Φ-function ``x ← Φ(. . .)''
        pop(stack[x])

Here, newname(x) increments a counter for x, pushes the new version number onto stack[x], and returns the subscripted variable.
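A runnable sketch of the same renaming pass in Python; the block layout (dicts with "phis" and "stmts") is the same hypothetical one as in the placement sketch above, and we assume every used variable has a definition (guaranteed after Φ placement, or by initializing version 0 for parameters/globals beforehand).

from collections import defaultdict

counter = defaultdict(int)   # next version number per variable
stack = defaultdict(list)    # stack of active version numbers per variable

def newname(x):
    i = counter[x]
    counter[x] += 1
    stack[x].append(i)
    return f"{x}{i}"

def rename(b, blocks, succs, dom_children):
    pushed = []                                  # to undo pushes on exit
    for phi in blocks[b]["phis"]:                # 1. rename Phi targets
        pushed.append(phi["var"])
        phi["target"] = newname(phi["var"])
    for stmt in blocks[b]["stmts"]:              # 2. uses first, then defs
        stmt["uses"] = [f"{y}{stack[y][-1]}" for y in stmt["uses"]]
        pushed.append(stmt["var"])
        stmt["target"] = newname(stmt["var"])
    for s in succs[b]:                           # 3. fill Phi parameters
        for phi in blocks[s]["phis"]:
            phi["args"].append(f"{phi['var']}{stack[phi['var']][-1]}")
    for c in dom_children[b]:                    # 4. recurse, dominator tree
        rename(c, blocks, succs, dom_children)
    for x in pushed:                             # 5. pop this block's names
        stack[x].pop()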
Bibliography
[1] Alpern, B., Wegman, M. N., and Zadeck, F. K. (1988). Detecting equality of variables in programs. In POPL'88 [8].
[2] Appel, A. W. (1998). Modern Compiler Implementation in ML. Cambridge University Press.

[3] Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. (1991). Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451–490.
[4] Kildall, G. (1973). A unified approach to global program optimization. In Proceedings of POPL '73, pages 194–206. ACM.

[5] Lots of authors (2015). SSA book. http://ssabook.gforge.inria.fr/latest/book.
[6] Louden, K. (1997). Compiler Construction, Principles and Practice. PWS Publishing.

[7] Nielson, F., Nielson, H.-R., and Hankin, C. L. (1999). Principles of Program Analysis. Springer Verlag.

[8] POPL'88 (1988). Fifteenth Symposium on Principles of Programming Languages (POPL). ACM.

[9] Rosen, B. K., Wegman, M. N., and Zadeck, F. K. (1988). Global value numbers and redundant computations. In POPL'88 [8].
Index
access link, 51
activation record, 50
analysis
    context-sensitive, 62
ascending chain condition, 36
assumption set, 78
available expressions, 6
backward analysis, 11
call string, 63
call-by-result, 44
calling conventions, 50
chaotic iteration, 37
code generation, 88
common subexpression elimination, 5, 6
complete lattice, 25, 36
constraint, 56
context-sensitive analysis, 62
control flow
    reverse, 4
control flow graph, 2
CSE, 5
data flow analysis, 1
derivation sequence, 23
dynamic link, 51
dynamic memory, 42
hoisting, 12
IF, 46
intraprocedural analysis, 2
isolated entry, 3
kill and generate, 7
label consistency, 5
lattice, 24
    complete, 36
monotone framework, 1, 33
parameter passing, 44
partial order, 25
path insensitivity, 68
program transformation, 12
property space, 36
reaching definitions, 9
referential opaqueness, 85
referential transparency, 85
reverse control flow, 4
run-time environment, 42
SOS, 20
SSA, 2
stabilization, 36
stack frame, 50
static link, 51
static memory, 42
structural operational semantics, 20
symbolic execution, 56
under-approximation, 6
unique entry, 3
very busy, 12
worklist, 38
worklist algorithm, 38