SLIDE 1

CS510 Software Engineering

Program Representations

Asst. Prof. Mathias Payer
Department of Computer Science, Purdue University
TA: Scott A. Carr
Slides inspired by Xiangyu Zhang
http://nebelwelt.net/teaching/15-CS510-SE

Spring 2015

SLIDE 2

Why Program Representations?

Programs originally exist as source code, binaries, and test cases. These forms are hard to analyze and a poor fit for automated reasoning. Software is therefore translated, with or without loss of information, into representations that support specific analyses.

Mathias Payer (Purdue University) CS510 Software Engineering 2015 2 / 35

SLIDE 3

Control-Flow Graph

Table of Contents

1. Control-Flow Graph
2. Cyclomatic Complexity
3. Program Dependence Graph
4. Super Control-Flow Graph
5. Call Graph
6. Other Representations and Tools

SLIDE 4

Control-Flow Graph (CFG)

The CFG is an abstract representation of a program that captures all possible flows of control through it. A CFG is a graph that consists of basic blocks (nodes) and possible control-flow transfers (edges). A basic block (BB) is a linear sequence of program statements with a single entry point and a single exit point. Control flow can enter the block only at its entry point and cannot exit or halt anywhere inside the block except at its exit point. The entry and exit points coincide if the basic block contains only one statement.

SLIDE 5

Control-Flow Graph: Definition

Control-Flow Graph: A control-flow graph (or flow graph) G is defined by a finite set N of nodes and a finite set E of edges. An edge (i, j) in E connects two nodes n_i and n_j in N. We often write G = (N, E) to denote a flow graph G with nodes given by N and edges given by E.

SLIDE 6

Control-Flow Graph

In a CFG, each BB becomes a node, and edges indicate the flow of control between blocks. An edge (i, j) connecting blocks b_i and b_j implies that control may flow from block b_i to block b_j; the graph is therefore directed. By convention, the graph also has a start node and an end node (both in N). The start node has no incoming edge and the end node has no outgoing edge.
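One possible way to store a flow graph G = (N, E) in C is a node count plus an array of directed edges. The sketch below is only an illustration: the names `Edge`, `FlowGraph`, and `is_well_formed` are hypothetical, and nodes are assumed to be numbered 0 to num_nodes − 1. The check encodes the start/end convention stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not taken from the slides. */
typedef struct {
    int from; /* index i of the source block b_i */
    int to;   /* index j of the target block b_j */
} Edge;

typedef struct {
    int num_nodes;     /* |N|; nodes are numbered 0 .. num_nodes-1 */
    int start;         /* index of the start node */
    int end;           /* index of the end node */
    const Edge *edges; /* E, the directed edges */
    size_t num_edges;  /* |E| */
} FlowGraph;

/* Returns true if start and end lie in N, every edge connects nodes
 * in N, the start node has no incoming edge, and the end node has
 * no outgoing edge. */
bool is_well_formed(const FlowGraph *g) {
    if (g->start < 0 || g->start >= g->num_nodes) return false;
    if (g->end < 0 || g->end >= g->num_nodes) return false;
    for (size_t k = 0; k < g->num_edges; k++) {
        const Edge e = g->edges[k];
        if (e.from < 0 || e.from >= g->num_nodes) return false;
        if (e.to < 0 || e.to >= g->num_nodes) return false;
        if (e.to == g->start) return false; /* edge into start */
        if (e.from == g->end) return false; /* edge out of end */
    }
    return true;
}
```

An edge list keeps the structure close to the set notation; an adjacency list would be the more common choice once successor lookups dominate.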

SLIDE 7

CFG by Example

[Figure: three example CFGs: an if-else condition (nodes 1 to 4), a for/while loop (nodes 1 to 3), and a do-while loop (nodes 1 to 3)]

SLIDE 8

Path

Path: Consider a flow graph G = (N, E). A sequence of k edges (e_1, e_2, ..., e_k), k > 0, denotes a path through the flow graph if the following sequence condition holds: given nodes n_p, n_q, n_r, n_s in N and 0 < i < k, if e_i = (n_p, n_q) and e_{i+1} = (n_r, n_s), then n_q = n_r. A complete path is a path from start to end. A subpath is a contiguous subsequence of a complete path.
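The sequence condition can be checked mechanically. The following C sketch uses hypothetical names (`Edge`, `is_path`, `is_complete_path`) and assumes the caller only passes edges that belong to E; it tests that each edge ends where the next one begins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical edge type: (from, to) are node indices. */
typedef struct { int from; int to; } Edge;

/* True if (e_1, ..., e_k) satisfies the sequence condition:
 * k > 0 and the target of e_i equals the source of e_{i+1}. */
bool is_path(const Edge *seq, size_t k) {
    if (k == 0) return false;
    for (size_t i = 0; i + 1 < k; i++) {
        if (seq[i].to != seq[i + 1].from) return false;
    }
    return true;
}

/* A complete path additionally begins at start and finishes at end. */
bool is_complete_path(const Edge *seq, size_t k, int start, int end) {
    return is_path(seq, k) && seq[0].from == start && seq[k - 1].to == end;
}
```

Membership of each edge in E is deliberately left to the caller so that the sketch stays focused on the sequence condition itself.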

SLIDE 9

Feasible Paths

A path p through a flow graph for program P is considered feasible if there exists at least one test case which, when input to P, produces path p.

    int func(int n) {
        int i, ret = n;
        for (i = n - 1; i >= 1; i--) {
            ret = ret * i;
        }
        return ret; /* n * (n-1) * ... * 1 */
    }

[Figure: CFG with nodes Start, 1, 2, End]

p_1 = (Start, 1, 2, 1, End)
p_2 = (Start, 1, End)
p_err = (Start, 1, 2, End)

p_1 and p_2 are feasible; p_err is not.

SLIDE 10

Number of Paths

A program may allow many distinct paths, depending on its conditions. A program without conditions contains exactly one path from Start to End. Each condition increases the number of paths by at least one. Conditions can also have a multiplicative effect on the number of paths: for example, k if-else statements in sequence yield 2^k paths.
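To make the multiplicative effect concrete, the sketch below counts the complete paths of a loop-free flow graph with a memoized depth-first traversal. The names `Edge`, `count_from`, `count_paths`, and the bound `MAX_NODES` are assumptions for this illustration, and the function is only meaningful for acyclic graphs, since a loop can make the number of paths unbounded.

```c
#include <stddef.h>

#define MAX_NODES 64 /* assumed upper bound on node indices */

typedef struct { int from; int to; } Edge;

/* Number of distinct paths from node to end in an ACYCLIC graph.
 * memo/done cache results so each node is expanded only once. */
static unsigned long long count_from(int node, int end,
                                     const Edge *edges, size_t num_edges,
                                     unsigned long long *memo,
                                     unsigned char *done) {
    if (node == end) return 1;
    if (done[node]) return memo[node];
    unsigned long long total = 0;
    for (size_t k = 0; k < num_edges; k++) {
        if (edges[k].from == node) {
            total += count_from(edges[k].to, end, edges, num_edges,
                                memo, done);
        }
    }
    memo[node] = total;
    done[node] = 1;
    return total;
}

/* Counts complete paths from start to end; node indices must be
 * smaller than MAX_NODES. */
unsigned long long count_paths(int start, int end,
                               const Edge *edges, size_t num_edges) {
    unsigned long long memo[MAX_NODES] = {0};
    unsigned char done[MAX_NODES] = {0};
    return count_from(start, end, edges, num_edges, memo, done);
}
```

For three if-else statements in sequence, each branching into two nodes that join again, the count is 2 * 2 * 2 = 8, while a straight-line program yields 1.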

SLIDE 11

Simplified CFG

In a simplified CFG, each statement is represented by its own node, so every basic block contains exactly one statement, which is both its entry and its exit. A simplified CFG is easy to read and implement but not efficient. A naive CFG construction algorithm starts with a simplified CFG and merges nodes n_i and n_{i+1} iff n_i has exactly one outgoing edge, n_{i+1} has exactly one incoming edge, and that edge is e = (n_i, n_{i+1}).
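The naive merge step might be sketched as follows. The function and type names (`Edge`, `merge_into_blocks`, and the helpers) are hypothetical, statements are assumed to be numbered 0 to num_stmts − 1 in program order, and instead of rewriting the graph the sketch only assigns each statement the index of the basic block it ends up in.

```c
#include <stddef.h>

typedef struct { int from; int to; } Edge;

static int out_degree(int n, const Edge *e, size_t m) {
    int d = 0;
    for (size_t k = 0; k < m; k++) if (e[k].from == n) d++;
    return d;
}

static int in_degree(int n, const Edge *e, size_t m) {
    int d = 0;
    for (size_t k = 0; k < m; k++) if (e[k].to == n) d++;
    return d;
}

static int has_edge(int a, int b, const Edge *e, size_t m) {
    for (size_t k = 0; k < m; k++)
        if (e[k].from == a && e[k].to == b) return 1;
    return 0;
}

/* Writes the basic-block index of statement i to block_of[i] and
 * returns the number of blocks. Statement i joins the block of
 * statement i-1 iff i-1 has one outgoing edge, i has one incoming
 * edge, and the edge (i-1, i) exists. */
int merge_into_blocks(int num_stmts, const Edge *edges, size_t num_edges,
                      int *block_of) {
    if (num_stmts <= 0) return 0;
    int current = 0;
    block_of[0] = 0;
    for (int i = 1; i < num_stmts; i++) {
        int mergeable = out_degree(i - 1, edges, num_edges) == 1 &&
                        in_degree(i, edges, num_edges) == 1 &&
                        has_edge(i - 1, i, edges, num_edges);
        if (!mergeable) current++;
        block_of[i] = current;
    }
    return current + 1;
}
```

For a program with two initializations, a while condition, a two-statement loop body, and a final statement, this grouping yields four blocks: the initializations, the condition, the body, and the final statement.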

SLIDE 12

Dominator

Dominators: X dominates Y iff all possible paths from Start to Y pass through X. X strictly dominates Y iff X dominates Y and X ≠ Y. X immediately dominates Y iff X strictly dominates Y and X is the last dominator before Y on any path from Start to Y.

SLIDE 13

Dominators: Example

    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i += 1;
    5      sum += i;
    6  }
    7  printf("Sum: %d", sum);

[Figure: CFG with basic blocks {1, 2}, {3}, {4, 5}, {7}, labeled by line number]

sdom(7) = {1, 2; 3}
idom(7) = {3}
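Dominator sets are commonly computed with an iterative fixed-point scheme: dom(Start) = {Start}, and for every other node v, dom(v) = {v} united with the intersection of dom(p) over all predecessors p. The sketch below is one minimal rendering of that idea, not necessarily how any particular tool implements it. It assumes at most 32 nodes (sets are bitmasks), that every node is reachable from the start node, and the names `compute_dominators` and `immediate_dominator` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int from; int to; } Edge;

/* On return, bit u of dom[v] is set iff node u dominates node v.
 * Assumes 1 <= n <= 32 and all nodes reachable from start. */
void compute_dominators(int n, int start, const Edge *edges, size_t m,
                        uint32_t *dom) {
    uint32_t all = (n >= 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
    for (int v = 0; v < n; v++) dom[v] = all;
    dom[start] = UINT32_C(1) << start;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int v = 0; v < n; v++) {
            if (v == start) continue;
            uint32_t meet = all; /* intersection over predecessors */
            for (size_t k = 0; k < m; k++) {
                if (edges[k].to == v) meet &= dom[edges[k].from];
            }
            uint32_t next = meet | (UINT32_C(1) << v);
            if (next != dom[v]) { dom[v] = next; changed = 1; }
        }
    }
}

/* The immediate dominator of v is the strict dominator u whose own
 * dominator set equals the strict dominators of v. Returns -1 for
 * the start node, which has no strict dominator. */
int immediate_dominator(int n, int v, const uint32_t *dom) {
    uint32_t strict = dom[v] & ~(UINT32_C(1) << v);
    for (int u = 0; u < n; u++) {
        if (((strict >> u) & 1u) && dom[u] == strict) return u;
    }
    return -1;
}
```

Numbering the blocks of the example as 0 = {1, 2}, 1 = {3}, 2 = {4, 5}, 3 = {7}, with edges 0→1, 1→2, 2→1, 1→3, block 3 ends up with strict dominators {0, 1} and immediate dominator 1, matching sdom(7) and idom(7) above.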

SLIDE 14

Post-dominator

Post-Dominators: X post-dominates Y iff all possible paths from Y to End pass through X. X strictly post-dominates Y iff X post-dominates Y and X ≠ Y. X immediately post-dominates Y iff X strictly post-dominates Y and X is the first post-dominator after Y on any path from Y to End.

SLIDE 15

Post-dominators: Example

    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i += 1;
    5      sum += i;
    6  }
    7  printf("Sum: %d", sum);

[Figure: CFG with basic blocks {1, 2}, {3}, {4, 5}, {7}, labeled by line number]

spdom(4, 5) = {3; 7}
ipdom(4, 5) = {3}
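The post-dominance definition can be tested directly with a reachability search: X post-dominates Y exactly when End can no longer be reached from Y once X is removed from the graph. The C sketch below follows that reading. It assumes End is reachable from Y in the original graph and that node indices stay below the assumed bound `MAX_NODES`; the names `reaches_end_avoiding` and `post_dominates` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64 /* assumed upper bound on node indices */

typedef struct { int from; int to; } Edge;

/* Depth-first search: can end be reached from node without ever
 * visiting the node named blocked? */
static bool reaches_end_avoiding(int node, int end, int blocked,
                                 const Edge *edges, size_t m, bool *seen) {
    if (node == blocked) return false;
    if (node == end) return true;
    if (seen[node]) return false;
    seen[node] = true;
    for (size_t k = 0; k < m; k++) {
        if (edges[k].from == node &&
            reaches_end_avoiding(edges[k].to, end, blocked, edges, m, seen)) {
            return true;
        }
    }
    return false;
}

/* True iff every path from y to end passes through x. */
bool post_dominates(int x, int y, int end, const Edge *edges, size_t m) {
    if (x == y) return true;
    bool seen[MAX_NODES] = {false};
    return !reaches_end_avoiding(y, end, x, edges, m, seen);
}
```

With blocks numbered 0 = {1, 2}, 1 = {3}, 2 = {4, 5}, 3 = {7} and End = 3, blocks 1 and 3 post-dominate block 2 while block 0 does not, which agrees with spdom(4, 5) above. A production analysis would more likely run a dominator algorithm on the reversed graph; the search here trades efficiency for closeness to the definition.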

SLIDE 16

Backward Edges

    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i += 1;
    5      sum += i;
    6  }
    7  printf("Sum: %d", sum);

[Figure: CFG with basic blocks {1, 2}, {3}, {4, 5}, {7}, labeled by line number]

A back edge is an edge whose head (target) dominates its tail (source). Back edges often identify loops. In this CFG, the edge from block {4, 5} to block {3} is a back edge.
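Given dominator sets, the back-edge test is a single membership check. In the sketch below the dominator table is assumed to be precomputed and is passed in as bitmasks (bit u of dom[v] set iff u dominates v); the names `Edge` and `is_back_edge` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int from; int to; } Edge;

/* An edge is a back edge iff its head (target) dominates its tail
 * (source). dom[v] holds the dominators of v as a bitmask. */
bool is_back_edge(Edge e, const uint32_t *dom) {
    int tail = e.from;
    int head = e.to;
    return ((dom[tail] >> head) & 1u) != 0;
}
```

For the example program with blocks 0 = {1, 2}, 1 = {3}, 2 = {4, 5}, 3 = {7}, the dominator masks are 0x1, 0x3, 0x7, and 0xB, so only the edge 2→1 is reported as a back edge.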

SLIDE 17

Cyclomatic Complexity

SLIDE 18

Cyclomatic Complexity

Cyclomatic Complexity: Cyclomatic complexity is a software metric that quantifies the complexity of a program by counting the number of linearly independent paths through its source code. The complexity M is defined as M = E − N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components (i.e., functions). Rule of thumb: if the complexity M of a function exceeds 10 to 15, the function should be split into multiple components.

SLIDE 19

Cyclomatic Complexity: Example

    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i += 1;
    5      sum += i;
    6  }
    7  printf("Sum: %d", sum);

[Figure: CFG with basic blocks {1, 2}, {3}, {4, 5}, {7}, labeled by line number]

E = 4, N = 4, P = 1, so M = E − N + 2P = 4 − 4 + 2 = 2.
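The formula translates directly into code. The helper names below (`cyclomatic_complexity`, `should_split`) are hypothetical, and the threshold is left to the caller because the rule of thumb only gives a range of 10 to 15.

```c
/* M = E - N + 2P for a flow graph with E edges, N nodes, and
 * P connected components. */
int cyclomatic_complexity(int num_edges, int num_nodes, int num_components) {
    return num_edges - num_nodes + 2 * num_components;
}

/* Rule of thumb: flag a function whose complexity exceeds the chosen
 * threshold. Returns 1 if it should be split, 0 otherwise. */
int should_split(int complexity, int threshold) {
    return complexity > threshold;
}
```

For the loop example, `cyclomatic_complexity(4, 4, 1)` evaluates to 2, which is well below either end of the suggested threshold range.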

SLIDE 20

Program Dependence Graph

SLIDE 21

Program Dependence Graph (PDG)

Nodes are single statements, not basic blocks. The data-dependence graph tracks data dependencies. The control-dependence graph tracks control dependencies. The PDG is a widely used program representation.

SLIDE 22

Data Dependence

Data Dependence: X is data dependent on Y iff (i) there is a variable v that is defined at Y and used at X and (ii) there exists a path of nonzero length from Y to X along which v is not redefined.

SLIDE 23

Data Dependence: Example

    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i += 1;
    5      sum += i;
    6  }
    7  printf("Sum: %d", sum);

[Figure: CFG with basic blocks {1, 2}, {3}, {4, 5}, {7}, labeled by line number]

DataDep(sum, 7) = {5, 1}

The use of sum at line 7 reads the definition from line 5 if the loop body executed, and the definition from line 1 otherwise.

SLIDE 24

Difficulties with Data Dependence

Statically computing data dependencies is hard due to aliasing: a variable can refer to multiple memory locations/objects.

    int x, y, z, *p;
    x = ...;
    y = ...;
    p = &x;
    p = p + z;
    ... = *p;   /* the location read depends on the run-time value of z */

SLIDE 25

Control Dependence

Control Dependence: Y is control dependent on X iff X directly determines whether Y executes; statements inside one branch of a predicate are usually control dependent on the predicate. Formally, both of the following hold:

(i) There exists a path from X to Y such that every node on the path other than X and Y is post-dominated by Y. (The nodes between X and Y on this path have no path to End that bypasses Y.)
(ii) Y does not strictly post-dominate X. (There is a path from X to End that does not pass through Y, or X = Y.)

Reading assignment: http://dl.acm.org/citation.cfm?id=24041
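The two conditions can be checked with a successor-based formulation that is equivalent to the path condition: Y is control dependent on X iff some successor S of X is post-dominated by Y (S = Y counts) and Y does not strictly post-dominate X. The C sketch below decides post-dominance with a simple reachability search. All names are hypothetical, End is assumed reachable from every node, and node indices are assumed to stay below `MAX_NODES`.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64 /* assumed upper bound on node indices */

typedef struct { int from; int to; } Edge;

/* Can end be reached from node without visiting blocked? */
static bool reaches_end_avoiding(int node, int end, int blocked,
                                 const Edge *edges, size_t m, bool *seen) {
    if (node == blocked) return false;
    if (node == end) return true;
    if (seen[node]) return false;
    seen[node] = true;
    for (size_t k = 0; k < m; k++) {
        if (edges[k].from == node &&
            reaches_end_avoiding(edges[k].to, end, blocked, edges, m, seen)) {
            return true;
        }
    }
    return false;
}

/* True iff every path from y to end passes through x. */
static bool post_dominates(int x, int y, int end,
                           const Edge *edges, size_t m) {
    if (x == y) return true;
    bool seen[MAX_NODES] = {false};
    return !reaches_end_avoiding(y, end, x, edges, m, seen);
}

/* True iff y is control dependent on x. */
bool control_dependent(int y, int x, int end, const Edge *edges, size_t m) {
    /* Condition (ii): y must not strictly post-dominate x. */
    if (x != y && post_dominates(y, x, end, edges, m)) return false;
    /* Condition (i): some successor of x is post-dominated by y. */
    for (size_t k = 0; k < m; k++) {
        if (edges[k].from == x &&
            post_dominates(y, edges[k].to, end, edges, m)) {
            return true;
        }
    }
    return false;
}
```

On a while loop with blocks 0 (initialization), 1 (condition), 2 (body), and 3 (code after the loop, also End), the body and the condition itself are control dependent on the condition, whereas the code after the loop is not because it always executes.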

SLIDE 26

Control Dependence: Example

[Figure: a path from X to Y; all nodes between X and Y are post-dominated by Y, while X itself is not post-dominated by Y]

SLIDE 27

Using the PDG

A program dependence graph combines the control dependence graph and the data dependence graph of the program. In debugging: which statements could have induced the fault? In security: where can a value possibly be redefined?

SLIDE 28

Super Control-Flow Graph

SLIDE 29

Super Control-Flow Graph (SCFG)

The SCFG adds inter-procedural aspects to the intra-procedural CFG: each call site is connected to the entry point of its callee, and each return statement is connected back to the call site.

SLIDE 30

Call Graph

SLIDE 31

Call Graph (CG)

Each node represents a function; each edge represents a function invocation. The CG is useful when reasoning across function boundaries (e.g., for profiling or debugging).

SLIDE 32

Other Representations and Tools

SLIDE 33

Other Representations

Points-to Graph
Static Single Assignment (SSA)

SLIDE 34

Analysis Tools

C/C++: LLVM, CIL, CBMC
Java: SOOT, Wala
Binary: Valgrind, Pin, Libdetox

SLIDE 35

Questions?