
EECS 583 – Class 2 Control Flow Analysis LLVM Introduction

University of Michigan September 8, 2014


Announcements & Reading Material

❖ HW 1 out today, due Friday, Sept 22 (2 wks)

» This homework is not hard, but it takes lots of time to figure LLVM out, so start soon!!

» Part I: get “hello world” application working
» Part II: Run profilers (control & memory dep), collect some stats

❖ Reading

» Today’s class

▪ Ch 9.6 from Dragon book
▪ Or Ch 7.1, 7.3, 7.4 from Muchnick
▪ “Trace Selection for Compiling Large C Applications to Microcode”, Chang and Hwu, MICRO-21, 1988.

» Next class

▪ “The Superblock: An Effective Technique for VLIW and Superscalar Compilation”, Hwu et al., Journal of Supercomputing, 1993


From last time: Control Flow Graph (CFG)

❖ Defn: Control Flow Graph – Directed graph G = (V, E) where each vertex in V is a basic block and there is an edge in E, v1 (BB1) → v2 (BB2), if BB2 can immediately follow BB1 in some execution sequence

» A BB has an edge to every block it can branch to
» Standard representation used by many compilers
» Often has 2 pseudo vertices

▪ entry node
▪ exit node

[Figure: example CFG with Entry, BB1 through BB7, and Exit]
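A CFG is commonly stored as successor lists, with predecessor lists derived from them. The sketch below is a minimal illustration under stated assumptions: the diamond-shaped graph in `SUCCS` is hypothetical (its edges are not read from the figure), and `predecessors` and `edges` are helper names chosen for this example.

```python
# Hypothetical CFG (edges assumed for illustration): block -> list of successors.
SUCCS = {
    "Entry": ["BB1"],
    "BB1": ["BB2", "BB3"],
    "BB2": ["BB4"],
    "BB3": ["BB4"],
    "BB4": ["BB5", "BB6"],
    "BB5": ["BB7"],
    "BB6": ["BB7"],
    "BB7": ["Exit"],
    "Exit": [],
}


def predecessors(succs):
    """Invert the successor lists: an edge v1 -> v2 makes v1 a predecessor of v2."""
    preds = {bb: [] for bb in succs}
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].append(src)
    return preds


def edges(succs):
    """Return the edge set E as (source, target) pairs."""
    return [(src, dst) for src, dsts in succs.items() for dst in dsts]


if __name__ == "__main__":
    print(predecessors(SUCCS)["BB4"])  # ['BB2', 'BB3']
    print(len(edges(SUCCS)))           # 10
```

Keeping both directions is convenient: dominator analysis walks predecessors, while post-dominator analysis walks successors.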


Weighted CFG

❖ Profiling – Run the application on 1 or more sample inputs, record some behavior

» Control flow profiling

▪ edge profile
▪ block profile

» Path profiling
» Cache profiling
» Memory dependence profiling

❖ Annotate control flow profile onto a CFG → weighted CFG

❖ Optimize more effectively with profile info!!

» Optimize for the common case
» Make educated guesses

[Figure: the example CFG (Entry, BB1 through BB7, Exit) annotated with profile edge weights]
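An edge profile is enough to derive the block profile and the branch probabilities used later for trace selection. The sketch below assumes a hypothetical edge profile (`EDGE_PROFILE`, counts invented for illustration, not the figure's numbers); block weight is taken as the sum of incoming edge counts, with Entry using its outgoing count.

```python
# Hypothetical edge profile (counts invented for illustration):
# (source, target) -> number of times the edge was taken.
EDGE_PROFILE = {
    ("Entry", "BB1"): 20,
    ("BB1", "BB2"): 10, ("BB1", "BB3"): 10,
    ("BB2", "BB4"): 10, ("BB3", "BB4"): 10,
    ("BB4", "BB5"): 5, ("BB4", "BB6"): 15,
    ("BB5", "BB7"): 5, ("BB6", "BB7"): 15,
    ("BB7", "Exit"): 20,
}


def inflow(profile, bb):
    """Total count of edges entering bb."""
    return sum(c for (_, dst), c in profile.items() if dst == bb)


def outflow(profile, bb):
    """Total count of edges leaving bb."""
    return sum(c for (src, _), c in profile.items() if src == bb)


def block_profile(profile):
    """Block weight = incoming edge counts (Entry has none, so use its outflow)."""
    blocks = {bb for edge in profile for bb in edge}
    return {bb: inflow(profile, bb) if bb != "Entry" else outflow(profile, bb)
            for bb in blocks}


def branch_probability(profile, src, dst):
    """Fraction of src's executions that continue along src -> dst."""
    return profile[(src, dst)] / outflow(profile, src)


if __name__ == "__main__":
    weights = block_profile(EDGE_PROFILE)
    print(weights["BB4"], weights["BB6"])                  # 20 15
    print(branch_probability(EDGE_PROFILE, "BB4", "BB6"))  # 0.75
```

The block profile can be recovered from the edge profile but not the reverse: knowing only that BB4 ran 20 times does not say how often each of its outgoing branches was taken.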


Property of CFGs: Dominator (DOM)

❖ Defn: Dominator – Given a CFG(V, E, Entry, Exit), a node x dominates a node y if every path from the Entry block to y contains x

❖ 3 properties of dominators

» Each BB dominates itself
» If x dominates y, and y dominates z, then x dominates z
» If x dominates z and y dominates z, then either x dominates y or y dominates x

❖ Intuition

» Given some BB, which blocks are guaranteed to have executed prior to executing the BB


Dominator Examples

[Figure: two example CFGs, one with BB1 through BB4 and one with BB1 through BB7, each with Entry and Exit]


Dominator Analysis

❖ Compute dom(BBi) = set of BBs that dominate BBi

❖ Initialization

» Dom(entry) = entry
» Dom(everything else) = all nodes

❖ Iterative computation

» while change, do

▪ change = false
▪ for each BB (except the entry BB)

◆ tmp(BB) = BB + {intersect of Dom of all predecessor BBs}
◆ if (tmp(BB) != dom(BB))
    dom(BB) = tmp(BB)
    change = true

[Figure: example CFG with Entry, BB1 through BB7, and Exit]
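The iterative computation translates almost line for line into Python sets. This is a sketch, assuming a hypothetical diamond-shaped CFG in `SUCCS` (edges not read from the figure) and that every block is reachable from Entry, so each non-entry block has at least one predecessor.

```python
# Hypothetical CFG (edges assumed for illustration): block -> successors.
SUCCS = {
    "Entry": ["BB1"],
    "BB1": ["BB2", "BB3"],
    "BB2": ["BB4"],
    "BB3": ["BB4"],
    "BB4": ["BB5", "BB6"],
    "BB5": ["BB7"],
    "BB6": ["BB7"],
    "BB7": ["Exit"],
    "Exit": [],
}


def predecessors(succs):
    preds = {bb: [] for bb in succs}
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].append(src)
    return preds


def dominators(succs, entry="Entry"):
    """Iterative dominator analysis using the slide's initialization and update rule."""
    preds = predecessors(succs)
    dom = {bb: set(succs) for bb in succs}  # Dom(everything else) = all nodes
    dom[entry] = {entry}                    # Dom(entry) = entry
    change = True
    while change:
        change = False
        for bb in succs:
            if bb == entry:
                continue
            # tmp(BB) = BB + intersection of Dom over all predecessors
            # (assumes bb is reachable, so preds[bb] is non-empty).
            tmp = {bb} | set.intersection(*(dom[p] for p in preds[bb]))
            if tmp != dom[bb]:
                dom[bb] = tmp
                change = True
    return dom


if __name__ == "__main__":
    print(sorted(dominators(SUCCS)["BB7"]))  # ['BB1', 'BB4', 'BB7', 'Entry']
```

Dropping Entry from the resulting sets gives the same values as the BB/DOM table on the Dominator Tree slide (e.g. dom(BB7) = {1, 4, 7}). Visiting blocks in reverse postorder usually needs fewer passes, but any visiting order reaches the same fixed point.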


Immediate Dominator

❖ Defn: Immediate dominator (idom) – Each node n (other than Entry) has a unique immediate dominator m that is the last dominator of n, other than n itself, on any path from the initial node to n

» Closest node that strictly dominates n

[Figure: example CFG with Entry, BB1 through BB7, and Exit]
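Once dominator sets are known, the idom can be read off directly. By the third dominator property the strict dominators of a node form a chain, so the closest one is the strict dominator with the largest dominator set. In this sketch `DOM` holds the BB/DOM table from the Dominator Tree slide (Entry omitted), and `immediate_dominators` is a helper name chosen for illustration.

```python
# Dominator sets per the BB/DOM table on the Dominator Tree slide (Entry omitted).
DOM = {
    "BB1": {"BB1"},
    "BB2": {"BB1", "BB2"},
    "BB3": {"BB1", "BB3"},
    "BB4": {"BB1", "BB4"},
    "BB5": {"BB1", "BB4", "BB5"},
    "BB6": {"BB1", "BB4", "BB6"},
    "BB7": {"BB1", "BB4", "BB7"},
}


def immediate_dominators(dom):
    """Map each block to its immediate dominator (None for the root)."""
    idom = {}
    for n, doms in dom.items():
        strict = doms - {n}  # dominators other than n itself
        if not strict:
            idom[n] = None
            continue
        # Strict dominators form a chain; the closest one is dominated by all
        # the others and therefore has the largest dominator set.
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom
```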


Dominator Tree

[Figure: example CFG with BB1 through BB7]

BB   DOM
1    1
2    1,2
3    1,3
4    1,4
5    1,4,5
6    1,4,6
7    1,4,7

Dom tree: the first BB is the root node; each node dominates all of its descendants

[Figure: dominator tree with BB1 at the root and children BB2, BB3, BB4; BB4 has children BB5, BB6, BB7]
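The dominator tree links each block to its immediate dominator, so the subtree under a node is exactly the set of blocks it dominates. The sketch below takes the idoms implied by the BB/DOM table above as given; `dominator_tree` and `dominated_by` are illustrative helper names.

```python
# Immediate dominators implied by the BB/DOM table (None marks the root).
IDOM = {"BB1": None, "BB2": "BB1", "BB3": "BB1", "BB4": "BB1",
        "BB5": "BB4", "BB6": "BB4", "BB7": "BB4"}


def dominator_tree(idom):
    """Return (root, children), where each node's parent is its immediate dominator."""
    root = None
    children = {n: [] for n in idom}
    for n, parent in idom.items():
        if parent is None:
            root = n
        else:
            children[parent].append(n)
    return root, children


def dominated_by(children, n):
    """n plus all of its descendants, i.e. every block that n dominates."""
    result = [n]
    for child in children[n]:
        result.extend(dominated_by(children, child))
    return result


if __name__ == "__main__":
    root, children = dominator_tree(IDOM)
    print(root, children["BB4"])  # BB1 ['BB5', 'BB6', 'BB7']
```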


Class Problem

Draw the dominator tree for the following CFG

[Figure: CFG with Entry, BB1 through BB8, and Exit]


Post Dominator (PDOM)

❖ Reverse of dominator
❖ Defn: Post Dominator – Given a CFG(V, E, Entry, Exit), a node x post dominates a node y if every path from y to the Exit contains x

❖ Intuition

» Given some BB, which blocks are guaranteed to execute after the BB executes

❖ pdom(BBi) = set of BBs that post dominate BBi

❖ Initialization

» Pdom(exit) = exit
» Pdom(everything else) = all nodes

❖ Iterative computation

» while change, do

▪ change = false
▪ for each BB (except the exit BB)

◆ tmp(BB) = BB + {intersect of pdom of all successor BBs}
◆ if (tmp(BB) != pdom(BB))
    pdom(BB) = tmp(BB)
    change = true
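Post-dominator analysis is the same iteration run over successors, starting from Pdom(exit) = exit. This sketch assumes the same hypothetical diamond-shaped CFG as the earlier examples (edges not taken from the figure) and that every block can reach Exit.

```python
# Hypothetical CFG (edges assumed for illustration): block -> successors.
SUCCS = {
    "Entry": ["BB1"],
    "BB1": ["BB2", "BB3"],
    "BB2": ["BB4"],
    "BB3": ["BB4"],
    "BB4": ["BB5", "BB6"],
    "BB5": ["BB7"],
    "BB6": ["BB7"],
    "BB7": ["Exit"],
    "Exit": [],
}


def post_dominators(succs, exit_bb="Exit"):
    """Iterative post-dominator analysis: intersect pdom over successors."""
    pdom = {bb: set(succs) for bb in succs}  # Pdom(everything else) = all nodes
    pdom[exit_bb] = {exit_bb}                # Pdom(exit) = exit
    change = True
    while change:
        change = False
        # Visiting blocks from the bottom of the listing up tends to converge
        # faster for this backward problem; the result does not depend on order.
        for bb in reversed(list(succs)):
            if bb == exit_bb:
                continue
            # Assumes bb can reach Exit, so succs[bb] is non-empty.
            tmp = {bb} | set.intersection(*(pdom[s] for s in succs[bb]))
            if tmp != pdom[bb]:
                pdom[bb] = tmp
                change = True
    return pdom
```

In this graph BB4 and BB7 post dominate BB1: whichever branch BB1 takes, both are guaranteed to execute afterwards.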


Post Dominator Examples

[Figure: two example CFGs, one with BB1 through BB4 and one with BB1 through BB7, each with Entry and Exit]


Immediate Post Dominator

❖ Defn: Immediate post dominator (ipdom) – Each node n (other than Exit) has a unique immediate post dominator m that is the first post dominator of n, other than n itself, on any path from n to the Exit

» Closest node that post dominates
» First breadth-first successor that post dominates a node

[Figure: example CFG with Entry, BB1 through BB7, and Exit]


Why Do We Care About Dominators?

❖ Loop detection – next subject
❖ Dominator

» Guaranteed to execute before
» Redundant computation – an op is redundant if the same computation is performed in a dominating BB (and its operands are not redefined in between)
» Most global optimizations use dominance info

❖ Post dominator

» Guaranteed to execute after
» Make a guess (i.e., 2 pointers do not point to the same location)
» Check that they really do not point to the same location in the post dominating BB

[Figure: example CFG with Entry, BB1 through BB7, and Exit]


Natural Loops

❖ Cycle suitable for optimization

» Discuss optimizations later

❖ 2 properties

» Single entry point called the header

▪ Header dominates all blocks in the loop

» There must be a way to iterate the loop (i.e., at least 1 path back to the header from within the loop), called a backedge

❖ Backedge detection

» Edge x → y where the target (y) dominates the source (x)


Backedge Example

[Figure: example CFG with Entry, BB1 through BB6, and Exit]


Loop Detection

❖ Identify all backedges using Dom info
❖ Each backedge (x → y) defines a loop

» Loop header is the backedge target (y)
» Loop BB – basic blocks that comprise the loop

▪ y, x, and every block from which control can reach x without going through y are in the loop

❖ Merge loops with the same header

» E.g., a loop with 2 continues
» LoopBackedge = LoopBackedge1 + LoopBackedge2
» LoopBB = LoopBB1 + LoopBB2

❖ Important property

» Header dominates all LoopBB
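The loop body for a backedge x → y can be collected with a backward worklist from x that never expands past y; loops sharing a header are then merged. This sketch assumes the same hypothetical two-loop CFG used for backedge detection, with its backedges listed in `BACKEDGES` as if already found from dominator info; `natural_loop` and `find_loops` are illustrative names.

```python
# Hypothetical CFG with two loops (edges assumed for illustration).
SUCCS = {
    "Entry": ["BB1"],
    "BB1": ["BB2"],
    "BB2": ["BB3", "BB4"],
    "BB3": ["BB5", "BB2"],
    "BB4": ["BB5"],
    "BB5": ["BB2", "BB6"],
    "BB6": ["BB1", "Exit"],
    "Exit": [],
}
# Assumed to be already identified using Dom info.
BACKEDGES = [("BB3", "BB2"), ("BB5", "BB2"), ("BB6", "BB1")]


def predecessors(succs):
    preds = {bb: [] for bb in succs}
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].append(src)
    return preds


def natural_loop(preds, x, y):
    """Loop of backedge x -> y: y, x, and all blocks reaching x without passing y."""
    body = {y, x}
    worklist = [x] if x != y else []
    while worklist:
        n = worklist.pop()
        for p in preds[n]:
            if p not in body:  # y is already in body, so the search stops there
                body.add(p)
                worklist.append(p)
    return body


def find_loops(succs, backedges):
    """Merge loops with the same header: header -> backedges and LoopBB."""
    preds = predecessors(succs)
    loops = {}
    for x, y in backedges:
        loop = loops.setdefault(y, {"backedges": [], "body": set()})
        loop["backedges"].append((x, y))
        loop["body"] |= natural_loop(preds, x, y)
    return loops
```

Here the two backedges into BB2 (the `continue`-like BB3 → BB2 and BB5 → BB2) merge into one loop whose LoopBB is BB2 through BB5.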


Loop Detection Example

[Figure: example CFG with Entry, BB1 through BB6, and Exit]


Important Parts of a Loop

❖ Header, LoopBB
❖ Backedges, BackedgeBB
❖ Exitedges, ExitBB

» For each LoopBB, examine each outgoing edge
» If the edge is to a BB not in LoopBB, then it's an exit

❖ Preheader (Preloop)

» New block before the header (falls through to header)
» Whenever you invoke the loop, the preheader is executed
» Whenever you iterate the loop, the preheader is NOT executed
» All edges entering header

▪ Backedges – no change
▪ All others, retarget to preheader

❖ Postheader (Postloop) - analogous
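Exit edges and preheader insertion both fall out of the LoopBB set. This sketch assumes the hypothetical two-loop CFG from the loop-detection example and treats the BB2 loop as already detected; the preheader name `PRE_BB2` is hypothetical. Because the header dominates every LoopBB, any edge into the header from inside LoopBB is a backedge, so only edges from outside the loop are retargeted.

```python
# Hypothetical CFG with two loops (edges assumed for illustration).
SUCCS = {
    "Entry": ["BB1"],
    "BB1": ["BB2"],
    "BB2": ["BB3", "BB4"],
    "BB3": ["BB5", "BB2"],
    "BB4": ["BB5"],
    "BB5": ["BB2", "BB6"],
    "BB6": ["BB1", "Exit"],
    "Exit": [],
}
# Assumed output of loop detection for the loop headed by BB2.
HEADER = "BB2"
LOOP_BB = {"BB2", "BB3", "BB4", "BB5"}


def exit_edges(succs, loop_bb):
    """Edges from a LoopBB to a block outside LoopBB."""
    return [(x, y) for x in sorted(loop_bb) for y in succs[x] if y not in loop_bb]


def insert_preheader(succs, header, loop_bb, name):
    """Return a new CFG with preheader `name` placed in front of `header`."""
    new = {bb: list(dsts) for bb, dsts in succs.items()}
    new[name] = [header]  # the preheader falls through to the header
    for src in succs:
        if src in loop_bb:
            continue  # edges from inside the loop into the header are backedges: no change
        new[src] = [name if dst == header else dst for dst in new[src]]
    return new
```

With this graph the only exit edge is BB5 → BB6, and only BB1 → BB2 gets retargeted to the preheader.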


Preheaders for each Loop

[Figure: example CFG with Entry, BB1 through BB6, and Exit]

??


Characteristics of a Loop

❖ Nesting (generally within a procedure scope)

» Inner loop – Loop with no loops contained within it
» Outer loop – Loop contained within no other loops
» Nesting depth

▪ depth(outer loop) = 1
▪ depth = depth(parent or containing loop) + 1

❖ Trip count (average trip count)

» How many times (on average) does the loop iterate
» for (I=0; I<100; I++) → trip count = 100
» With profile info:

▪ Ave trip count = weight(header) / weight(preheader)
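Nesting depth and average trip count are simple to compute once loops are merged by header. The sketch below assumes hypothetical loops (`LOOPS`) and invented profile weights; it relies on the fact that natural loops with different headers are either disjoint or nested, so a loop's depth is the number of loop bodies that contain its header.

```python
# Hypothetical merged loops: header -> LoopBB (outer loop at BB1, inner loop at BB2).
LOOPS = {
    "BB1": {"BB1", "BB2", "BB3", "BB4", "BB5", "BB6"},
    "BB2": {"BB2", "BB3", "BB4", "BB5"},
}


def nesting_depth(loops):
    """Depth = number of loops (including itself) whose body contains this header."""
    return {h: sum(1 for body in loops.values() if h in body) for h in loops}


def inner_loops(loops):
    """Loops that contain no other loop's header."""
    return [h for h, body in loops.items()
            if not any(other != h and other in body for other in loops)]


def avg_trip_count(weight_header, weight_preheader):
    """Ave trip count = weight(header) / weight(preheader)."""
    return weight_header / weight_preheader


if __name__ == "__main__":
    print(nesting_depth(LOOPS))     # {'BB1': 1, 'BB2': 2}
    print(inner_loops(LOOPS))       # ['BB2']
    # Hypothetical profile: preheader ran 10 times, header 1000 times.
    print(avg_trip_count(1000, 10))  # 100.0
```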


Trip Count Calculation Example

[Figure: example CFG with Entry, BB1 through BB6, and Exit, annotated with profile weights]

Calculate the trip counts for all the loops in the graph


Reducible Flow Graphs

❖ A flow graph is reducible if and only if we can partition the edges into 2 disjoint groups, often called forward and back edges, with the following properties

» The forward edges form an acyclic graph in which every node can be reached from the Entry
» The back edges consist only of edges whose destinations dominate their sources

❖ More simply – Take a CFG, remove all the backedges (x → y where y dominates x); you should have a connected, acyclic graph

[Figure: CFG with blocks bb1, bb2, bb3, labeled “Non-reducible!”]
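The "more simply" test can be coded directly: drop every edge whose target dominates its source, then check the remaining graph for cycles (here with Kahn's topological-sort algorithm). This is a sketch assuming every block is reachable from the entry; the graphs `IRREDUCIBLE` and `SIMPLE_LOOP` are hypothetical, the first being a three-block graph in the spirit of the bb1/bb2/bb3 example (its edges are assumed).

```python
def dominators(succs, entry):
    """Iterative dominator analysis (assumes every block is reachable)."""
    preds = {bb: [] for bb in succs}
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].append(src)
    dom = {bb: set(succs) for bb in succs}
    dom[entry] = {entry}
    change = True
    while change:
        change = False
        for bb in succs:
            if bb != entry:
                tmp = {bb} | set.intersection(*(dom[p] for p in preds[bb]))
                if tmp != dom[bb]:
                    dom[bb], change = tmp, True
    return dom


def is_acyclic(succs):
    """Kahn's algorithm: acyclic iff every node is removed in topological order."""
    indegree = {bb: 0 for bb in succs}
    for dsts in succs.values():
        for dst in dsts:
            indegree[dst] += 1
    ready = [bb for bb, k in indegree.items() if k == 0]
    removed = 0
    while ready:
        bb = ready.pop()
        removed += 1
        for dst in succs[bb]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return removed == len(succs)


def is_reducible(succs, entry):
    """Remove backedges (x -> y, y dominates x) and check that no cycle remains."""
    dom = dominators(succs, entry)
    forward = {x: [y for y in ys if y not in dom[x]] for x, ys in succs.items()}
    return is_acyclic(forward)


# Hypothetical graphs: a cycle entered at two points, and an ordinary loop.
IRREDUCIBLE = {"bb1": ["bb2", "bb3"], "bb2": ["bb3"], "bb3": ["bb2"]}
SIMPLE_LOOP = {"bb1": ["bb2"], "bb2": ["bb3"], "bb3": ["bb2", "bb4"], "bb4": []}
```

In `IRREDUCIBLE` neither bb2 nor bb3 dominates the other, so neither edge of the bb2/bb3 cycle counts as a backedge and the cycle survives.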


Regions

❖ Region: A collection of operations that are treated as a single unit by the compiler

» Examples

▪ Basic block
▪ Procedure
▪ Body of a loop

» Properties

▪ Connected subgraph of operations
▪ Control flow is the key parameter that defines regions
▪ Hierarchically organized

❖ Problem

» Basic blocks are too small (3-5 operations)

▪ Hard to extract sufficient parallelism

» Procedure control flow too complex for many compiler transformations

▪ Plus only parts of a procedure are important (90/10 rule)


Regions (2)

❖ Want

» Intermediate sized regions with simple control flow
» Bigger basic blocks would be ideal!!
» Separate important code from less important
» Optimize frequently executed code at the expense of the rest

❖ Solution

» Define new region types that consist of multiple BBs
» Profile information used in the identification
» Sequential control flow (sorta)
» Pretend the regions are basic blocks


Region Type 1 - Trace

❖ Trace - Linear collection of basic blocks that tend to execute in sequence

» “Likely control flow path”
» Acyclic (outer backedge ok)

❖ Side entrance – branch into the middle of a trace

❖ Side exit – branch out of the middle of a trace

❖ Compilation strategy

» Compile assuming path occurs 100% of the time
» Patch up side entrances and exits afterwards

❖ Motivated by scheduling (i.e., trace scheduling)

[Figure: CFG with BB1 through BB6 annotated with edge weights]


Linearizing a Trace

[Figure: the trace laid out linearly (BB1 through BB6), with edge weights labeled as entry count, exit count, side exits, and side entrances]


Intelligent Trace Layout for Icache Performance

[Figure: procedure view vs. trace view of BB1 through BB6, grouped into trace 1, trace 2, trace 3, and the rest; labels: intraprocedural code placement, procedure positioning, procedure splitting]


Issues With Selecting Traces

❖ Acyclic

» Cannot go past a backedge

❖ Trace length

» Longer = better?
» Not always!

❖ On-trace / off-trace transitions

» Maximize on-trace
» Minimize off-trace
» Compile assuming on-trace is 100% (i.e., single BB)
» Penalty for off-trace

❖ Tradeoff (heuristic)

» Length
» Likelihood of remaining within the trace

[Figure: CFG with BB1 through BB6 annotated with edge weights]


Trace Selection Algorithm

i = 0
mark all BBs unvisited
while (there are unvisited nodes) do
    seed = unvisited BB with largest execution freq
    trace[i] += seed
    mark seed visited
    current = seed
    /* Grow trace forward */
    while (1) do
        next = best_successor_of(current)
        if (next == 0) then break
        trace[i] += next
        mark next visited
        current = next
    endwhile
    /* Grow trace backward analogously */
    i++
endwhile


Best Successor/Predecessor

❖ Node weight vs edge weight

» edge more accurate

❖ THRESHOLD

» controls off-trace probability
» 60-70% found best

❖ Notes on this algorithm

» BB only allowed in 1 trace
» Cumulative probability ignored
» Min weight for seed to be chosen (i.e., executed 100 times)

best_successor_of(BB)
    e = control flow edge with highest probability leaving BB
    if (e is a backedge) then return 0 endif
    if (probability(e) <= THRESHOLD) then return 0 endif
    d = destination of e
    if (d is visited) then return 0 endif
    return d
end procedure
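The trace selection loop and `best_successor_of` can be sketched in Python as follows. Assumptions: the weighted CFG (`BLOCK_W`, `EDGE_W`) is hypothetical with invented weights, `None` stands in for the pseudocode's 0, and `best_predecessor_of` is an assumed mirror image of `best_successor_of` (the slides only say "analogously"), using the share of a block's executions that arrive along each incoming edge. The minimum seed weight from the notes is omitted.

```python
# Hypothetical profiled CFG (weights invented for illustration).
BLOCK_W = {"BB1": 100, "BB2": 55, "BB3": 45, "BB4": 100, "BB5": 90, "BB6": 100}
EDGE_W = {
    ("BB1", "BB2"): 55, ("BB1", "BB3"): 45,
    ("BB2", "BB4"): 55, ("BB3", "BB4"): 45,
    ("BB4", "BB5"): 90, ("BB4", "BB6"): 10,
    ("BB5", "BB6"): 90,
}
BACKEDGES = set()  # assumed precomputed with dominator info; this graph is acyclic
THRESHOLD = 0.60   # the slides report 60-70% working best


def best_successor_of(bb, visited):
    """Mirrors the pseudocode; returns None where it returns 0."""
    out = [(w / BLOCK_W[bb], dst) for (src, dst), w in EDGE_W.items() if src == bb]
    if not out:
        return None
    prob, dst = max(out)  # highest-probability edge leaving bb
    if (bb, dst) in BACKEDGES or prob <= THRESHOLD or dst in visited:
        return None
    return dst


def best_predecessor_of(bb, visited):
    """Assumed mirror image: probability = share of bb's executions entering via the edge."""
    inc = [(w / BLOCK_W[bb], src) for (src, dst), w in EDGE_W.items() if dst == bb]
    if not inc:
        return None
    prob, src = max(inc)
    if (src, bb) in BACKEDGES or prob <= THRESHOLD or src in visited:
        return None
    return src


def select_traces():
    visited, traces = set(), []
    while len(visited) < len(BLOCK_W):
        # Seed: heaviest unvisited block (ties go to the earlier entry in BLOCK_W).
        seed = max((bb for bb in BLOCK_W if bb not in visited), key=BLOCK_W.get)
        trace = [seed]
        visited.add(seed)
        current = seed
        while (nxt := best_successor_of(current, visited)) is not None:  # grow forward
            trace.append(nxt)
            visited.add(nxt)
            current = nxt
        current = seed
        while (prv := best_predecessor_of(current, visited)) is not None:  # grow backward
            trace.insert(0, prv)
            visited.add(prv)
            current = prv
        traces.append(trace)
    return traces


if __name__ == "__main__":
    print(select_traces())  # [['BB1'], ['BB4', 'BB5', 'BB6'], ['BB2'], ['BB3']]
```

The 55/45 split at BB1 falls below the 60% threshold, so BB1 stays in a trace by itself, and the same split stops BB4's trace from growing backward into BB2.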


Class Problems

[Figure: first CFG, BB1 through BB8, annotated with profile weights]

Find the traces. Assume a threshold probability of 60%.

[Figure: second CFG, BB1 through BB9, annotated with profile weights]