
Control-Flow Profiles and Code Motion Using Control Flow Profiles



  1. Profile-Guided Optimizations

     CS553 Lecture, Profile-Guided Optimizations

     Recall
     – Instruction scheduling
     – List scheduling
     – Register renaming
     – Loop unrolling
     – Software pipelining
     – Alias analysis
       – how can we use alias analysis for instruction scheduling?
       – what causes conservative results?

     Today
     – More instruction scheduling
     – Profiling
     – Trace scheduling

     Motivation for Profiling

     Limitations of static analysis
     – Compilers can analyze possible paths but must behave conservatively
     – Frequency information cannot be obtained through static analysis

     How runtime information helps
     – Control flow information
       [Figure: branch on condition c, with one outcome taken 10% of the time and the other 90%]
       Optimize the more frequent path (perhaps at the expense of the less frequent path)
     – Memory conflicts
         st r1,0(r5)
         ld r2,0(r4)
       If r5 and r4 always have different values, we can move the load above the store

     Profile-Guided Optimizations

     Basic idea
     – Instrument and run program on sample inputs to get likely runtime behavior
     – Can use this information to improve instruction scheduling
     – Many other uses
       – Code placement
       – Inlining
       – Value speculation
       – Branch prediction
       – Class-based optimization (static method lookup)

     Profiling Issues

     Profile data
     – Collected over whole program run
     – May not be useful (unbiased branches)
     – May not reflect all runs
     – May be expensive and inconvenient to gather
       – Continuous profiling [Anderson 97]
     – May interfere with program
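The memory-conflict point above can be checked on a toy model. This is a minimal sketch, not part of the lecture: the dictionary-based memory, the function names `run` and `reordering_is_safe`, and the stored value 42 are all assumptions made for illustration.

```python
def run(order, r4, r5, r1=42):
    """Execute the store and the load in the given order on a tiny memory.

    r4 and r5 hold addresses; r1 is the value being stored (hypothetical).
    Returns the value that ends up in r2.
    """
    mem = {r4: 0, r5: 0}          # both locations start at 0
    r2 = None
    for op in order:
        if op == "st":
            mem[r5] = r1          # st r1,0(r5)
        else:
            r2 = mem[r4]          # ld r2,0(r4)
    return r2


def reordering_is_safe(r4, r5):
    """True when hoisting the load above the store leaves r2 unchanged."""
    return run(["st", "ld"], r4, r5) == run(["ld", "st"], r4, r5)
```

With distinct addresses the two orders agree, so the load can move above the store; when `r4 == r5` the hoisted load reads the stale value, which is the case a static compiler must conservatively assume unless it can prove otherwise.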

  2. Control-Flow Profiles

     Commonly gather two types of information
     – Execution frequencies of basic blocks
     – Branch frequencies of conditional branches
     – Represent information in a weighted flow graph

     [Figure: weighted flow graph over blocks 1–4, annotated with execution frequencies and branch frequencies (values shown: 100, 100, 30, 70, 70, 30)]

     Instrumentation
     – Insert instrumentation code at basic block entrances and before each branch
     – Take average of values from multiple training runs
     – Fairly inexpensive

     Code Motion Using Control Flow Profiles

     Code motion across basic blocks
     – Increased scheduling freedom

     Move code above a join
     [Figure: blocks A and B both flow into C, which contains s1]
     – If we want to move s1 to A, we must move s1 to both A and B

     Move code below a split
     [Figure: block A, containing s1, branches to B and C]
     – If we want to move s1 to B, we must move s1 to both B and C

     Code Motion Using Control Flow Profiles (cont)

     Code motion across basic blocks
     – Increased scheduling freedom

     Move code below a join
     [Figure: s1 moves from A down into C; C is duplicated as C′ for the path from B]
     – Tail duplication prevents B → C from seeing s1

     Move code above a split
     [Figure: block A branches to B and C; s1 is in B]
     – If we want to move s1 from B to A and if s1 would destroy a value along the A → C path, do renaming
     – What if s1 might cause an exception?

     Memory-Dependence Profiles

     Gather information about memory conflicts
     – Frequencies of address matches between pairs of loads and stores
     – Attempts to answer the question: Are two references independent of one another?
     – Concentrate on ambiguous reference pairs (those that the compiler cannot figure out)

         st1: store r5
         ld2: load r4
         (st1, ld2, 7)

       If this number is low, we can speculatively assume that st1 and ld2 do not conflict

     Instrumentation
     – Much more expensive (in both space and time) to gather than control flow information
     – First perform control flow profiling
     – Apply only to most frequently executed blocks
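The control-flow instrumentation described above can be sketched as follows. This is an illustration under stated assumptions, not the lecture's implementation: the four-block diamond (B1 branching to B2 or B3, then joining at B4), the branch condition `x > 0`, and the names `profile_run` and `average_profiles` are all hypothetical.

```python
from collections import Counter


def profile_run(x_values):
    """One training run of a hypothetical instrumented loop body.

    Counters are bumped at each basic block entrance and before each
    branch is taken, giving block and branch frequencies.
    """
    blocks, edges = Counter(), Counter()
    for x in x_values:
        blocks["B1"] += 1                 # counter at block entrance
        if x > 0:
            edges[("B1", "B2")] += 1      # counter before the taken branch
            blocks["B2"] += 1
        else:
            edges[("B1", "B3")] += 1
            blocks["B3"] += 1
        blocks["B4"] += 1                 # join block
    return blocks, edges


def average_profiles(runs):
    """Average a list of Counters (one per training run) key by key."""
    total = Counter()
    for run in runs:
        total.update(run)                 # Counter.update adds counts
    return {key: count / len(runs) for key, count in total.items()}
```

Calling `average_profiles` on the edge counters of several runs yields the edge weights of a weighted flow graph; the same function applied to the block counters yields the execution frequencies.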

  3. Trace Scheduling [Fisher 81] and [Ellis 85]

     Basic idea
     – We want large blocks to create large scheduling windows, but basic blocks are small because branches are frequent
     – Create superblocks to increase scheduling window
     – Use profile information to create good superblocks
     – Optimize each superblock independently

     Superblocks
     – A sequence of basic blocks with a single entrance and multiple exits
     [Figure: flow graph over blocks 1–4 with one path marked as a superblock]

     Goals
     – Want large superblocks
     – Want to avoid early exits
     – Want blocks that match actual execution paths

     Trace Scheduling (example)

     original code:
         b[i] = "old"
         a[i] = ...
         if (a[i]>0) then
           b[i]="new";
         else
           stmt X
           stmt Y
         endif
         c[i] = ...

     trace:
         b[i] = "old"
         a[i] = ...
         b[i]="new";
         c[i] = ...
         if (a[i]<=0) then goto repair
       continue:
         ...

       repair:
         restore old b[i]
         stmt X
         stmt Y
         recalculate c[i]?
         goto continue

     Trace Scheduling (cont)

     Three steps
     1. Create superblocks
     2. Enlarge superblocks
     3. Compact (optimize) superblocks

     1. Superblock formation
     – Create traces using mutual-most-likely heuristic (two blocks A and B are mutual-most-likely if B is the most likely successor of A, and A is the most likely predecessor of B)
     – A trace is a maximal sequence of mutual-most-likely blocks that does not contain a back edge
     – Each block belongs to exactly one trace
     [Figure: example flow graph over blocks A–E with edge frequencies 70, 30, 70, 10, 10]

     Trace Scheduling (cont)

     1. Superblock formation (cont)
     – Convert traces into superblocks
     – Use tail duplication to eliminate side entrances
     – Tail duplication increases code size
     [Figure: a trace and the resulting superblock; block E is duplicated as E′]
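The mutual-most-likely heuristic above can be sketched in a few lines. This is a minimal illustration, not the algorithm from [Fisher 81] or [Ellis 85] as published: the function names, the edge-weight dictionary, and the example graph in the usage note are assumptions, ties are broken by first-seen order, and the forward edges are assumed to be acyclic once back edges are excluded.

```python
def form_traces(blocks, edges, back_edges=frozenset()):
    """Group blocks into traces with the mutual-most-likely heuristic.

    blocks: list of block names; edges: {(src, dst): frequency}.
    Back edges are ignored so that no trace contains one.
    """
    fwd = {e: w for e, w in edges.items() if e not in back_edges}
    best_succ, best_pred = {}, {}
    for (src, dst), w in fwd.items():
        if src not in best_succ or w > fwd[(src, best_succ[src])]:
            best_succ[src] = dst
        if dst not in best_pred or w > fwd[(best_pred[dst], dst)]:
            best_pred[dst] = src
    # Link src -> dst only when each is the other's most likely neighbor.
    link = {s: d for s, d in best_succ.items() if best_pred.get(d) == s}
    linked_targets = set(link.values())
    traces = []
    for head in blocks:
        if head in linked_targets:
            continue                      # not the start of a trace
        trace = [head]
        while trace[-1] in link:
            trace.append(link[trace[-1]])
        traces.append(trace)
    return traces


def side_entrances(trace, edges):
    """Edges entering the trace at a non-head block from off the trace path.

    These are the entrances that tail duplication would have to remove.
    """
    found = []
    for i in range(1, len(trace)):
        for (src, dst) in edges:
            if dst == trace[i] and src != trace[i - 1]:
                found.append((src, dst))
    return found
```

For a hypothetical graph with edges A→B (70), A→C (30), B→E (70), and C→E (30), `form_traces` yields the traces `[A, B, E]` and `[C]`, and `side_entrances` reports C→E as the side entrance into the first trace. Duplicating E for the path from C would turn that trace into a superblock.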

  4. Trace Scheduling (cont)

     2. Superblock enlargement
     – Enlarge superblocks that are too small
     – Code expansion can hurt i-cache performance

     Three techniques for enlargement
     – Branch target expansion
       – If the last branch in a superblock is likely to jump to the start of another superblock, append the contents of the target superblock to the first superblock
     – Loop peeling
     – Loop unrolling
       – These last two techniques apply to superblock loops, which are superblocks whose last blocks are likely to jump to their first blocks
       – Assume that each loop body has a single dominant path

     Trace Scheduling (cont)

     3. Optimizations
     – Perform list scheduling for each superblock
     – Memory-dependence profiles can be used to speculatively assume that load/store pairs do not conflict
       – Insert repair code in case the assumption is incorrect
     – Software pipelining

     Enhancements to Profile-Guided Code Scheduling

     Path profiling [Ball and Larus 96]
     – Collect information about entire paths instead of about individual edges
     [Figure: three copies of a flow graph with every edge labeled 50; one is labeled "Edge profiles" and two are labeled "Path profiles"]
     – Limit paths to some specified length (can thus handle loops)
     – Can also stop paths at back edges
     – Disadvantages of path profiling?

     Speculation based on memory-dependence profiles (example)

     original code:
         b[i] = "old"
         a[i] = ...
         if (a[i]>0) then
           b[i]="new";
         else
           stmt X
           stmt Y
         endif
         c[i] = a[j]

     trace:
         b[i] = "old"
         c[i] = a[j]
         a[i] = ...
         b[i]="new";
         if (i==j) then goto deprepair
         if (a[i]<=0) then goto repair
       continue:
         ...

       deprepair:
         c[i] = a[i]
         if (a[i]<=0) then goto repair
         goto continue

       repair:
         restore old b[i]
         stmt X
         stmt Y
         goto continue
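The reason path profiles carry more information than edge profiles can be shown with a small worked example. This sketch is an assumption-laden illustration rather than the Ball and Larus algorithm: the control-flow graph (two consecutive diamonds, A→{B,C}→D→{E,F}→G), the path strings, and the counts are hypothetical, chosen so that every edge is executed 50 times.

```python
from collections import Counter


def edge_profile(path_profile):
    """Collapse a path profile {path: count} into edge counts.

    Each path is a string of single-letter block names, e.g. "ABDEG".
    """
    edges = Counter()
    for path, count in path_profile.items():
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += count
    return edges


# Two different hypothetical behaviors of the same program:
# the second branch always repeats the first branch's direction ...
correlated = {"ABDEG": 50, "ACDFG": 50}
# ... or the two branches are independent.
uniform = {"ABDEG": 25, "ABDFG": 25, "ACDEG": 25, "ACDFG": 25}

same_edges = edge_profile(correlated) == edge_profile(uniform)
```

Here `same_edges` is true: both behaviors produce a count of 50 on every edge, so an edge profile cannot tell them apart, while the path profiles show that in the correlated case only two of the four paths are worth forming traces for.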

  5. Lessons

     Larger scope helps
     – How can we increase scope? How do we schedule across control dependences?

     Static information is limited
     – Use profiles
     – How else can profiles be used in optimization?
     – Can we do these kinds of optimizations at runtime?

     Concepts

     Instruction scheduling
     – Trace scheduling
       – Uses profile information
       – Looks at scopes beyond basic blocks

     Miscellany
     – Path profiling
