Loop Fusion Amid Complex Control Flow
R Ramshankar, Dibyendu Das
AMD

Loop Fusion
• Two loops with proximity in control flow iterating over the same large arrays
  – Will show poor scalability
  – Why? Loops on large arrays stride over memory that is too big to fit in the cache.
  – Loops can be fused if dependences can be preserved, but
  – How do we deal with proximity amid complex control flows (and function calls)?

Loop fusion with control dependence
• Build from trivial loop fusion: adjacent loops
  – Loops are typically guarded by an if (i != end) condition
  – Control dependence graph: derive from the CFG
• If two loops have the same or almost identical control dependence

Control dependence
    if (x) { A; }
• A is control-dependent on the block that contains the conditional branch BR (x == true), A (i.e., A is control-dependent on the block that decides to bypass A or go to A)
• Formally, a statement y is said to be control dependent on another statement x if
  – (1) there exists a non-trivial path from x to y such that every statement z ≠ x in the path is post-dominated by y and
  – (2) x is not post-dominated by y
• Added the control dependence construction algorithm from Kennedy/Allen
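The formal definition above can be checked mechanically on the `if (x) { A; }` example. The following is a minimal sketch, assuming a three-node CFG (node 0 holds the branch on x, node 1 is A, node 2 is the join/exit) and using the standard edge-based formulation: y is control dependent on x when y post-dominates some successor of x but does not strictly post-dominate x itself. This is not necessarily the Kennedy/Allen construction the slide refers to, and every name in it is hypothetical.

```c
#include <assert.h>

#define NODES 3 /* assumed CFG: 0 = branch on x, 1 = A, 2 = join/exit */

/* pdom[n] is a bitmask of the nodes that post-dominate n (including n). */
void compute_postdominators(int succ[NODES][NODES], int exit_node,
                            unsigned pdom[NODES]) {
    const unsigned all = (1u << NODES) - 1u;
    int n, s, changed = 1;
    assert(exit_node >= 0 && exit_node < NODES);
    for (n = 0; n < NODES; n++) pdom[n] = all;
    pdom[exit_node] = 1u << exit_node;
    /* Iterate to a fixed point: pdom(n) = {n} plus the intersection of
       pdom(s) over all successors s of n. */
    while (changed) {
        changed = 0;
        for (n = 0; n < NODES; n++) {
            unsigned meet = all;
            int has_succ = 0;
            if (n == exit_node) continue;
            for (s = 0; s < NODES; s++) {
                if (succ[n][s]) { meet &= pdom[s]; has_succ = 1; }
            }
            if (!has_succ) continue;
            meet |= 1u << n;
            if (meet != pdom[n]) { pdom[n] = meet; changed = 1; }
        }
    }
}

/* cd[y][x] is set to 1 when y is control dependent on x. */
void compute_control_dependence(int succ[NODES][NODES], unsigned pdom[NODES],
                                int cd[NODES][NODES]) {
    int x, y, s;
    for (y = 0; y < NODES; y++)
        for (x = 0; x < NODES; x++) cd[y][x] = 0;
    for (x = 0; x < NODES; x++) {
        for (s = 0; s < NODES; s++) {
            if (!succ[x][s]) continue;
            for (y = 0; y < NODES; y++) {
                int y_pdom_s = (int)((pdom[s] >> y) & 1u);
                int y_strict_pdom_x = (y != x) && ((pdom[x] >> y) & 1u);
                if (y_pdom_s && !y_strict_pdom_x) cd[y][x] = 1;
            }
        }
    }
}
```

With edges 0→1, 0→2 and 1→2, the only control dependence found is A (node 1) on the branch block (node 0). The join block post-dominates the branch, so it is not control dependent on it.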

Generic CFG pattern containing natural loops
    int test(int A[], long size…) {
      long i =0;
      for (i=0; i < size; i++) {
        A[i] |= (1 << a);
      }
      for (i=0; i < size; i++) {
        A[i] |= (1 << b);
      }
      // …
      return 0;
    }
• entry leads to the first loop
  – By nature, a control dependence
• Generalize based on this standard pattern
  – Two proximal singly nested loops
• For ex: proximal in breadth-first order
  – What if instead of the single blocks "entry"/"if.end" we have complex control flow?
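For the adjacent-loop pattern on this slide, the fused result can be sketched directly. The block below is an illustration, not the authors' transformation output: the function names and the explicit `a` and `b` parameters are assumptions (the slide elides the rest of the parameter list). Both statements touch only `A[i]` in the same iteration, so running them in one pass preserves the dependences while traversing the array once instead of twice.

```c
#include <assert.h>

/* Unfused form, following the slide: two adjacent loops over A. */
void set_bits_unfused(int A[], long size, int a, int b) {
    long i;
    assert(a >= 0 && a < 31 && b >= 0 && b < 31); /* keep the shifts defined */
    for (i = 0; i < size; i++) {
        A[i] |= (1 << a);
    }
    for (i = 0; i < size; i++) {
        A[i] |= (1 << b);
    }
}

/* Fused form: a single pass over A, with the bodies in original order. */
void set_bits_fused(int A[], long size, int a, int b) {
    long i;
    assert(a >= 0 && a < 31 && b >= 0 && b < 31); /* keep the shifts defined */
    for (i = 0; i < size; i++) {
        A[i] |= (1 << a);
        A[i] |= (1 << b);
    }
}
```

Calling both functions on copies of the same array should leave the copies identical, which is a quick sanity check of the legality argument.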

Fusing loops despite complex control flow: slicing out paths from the CFG
    int test(int A[], long size, int a, int b, int c, int d, int e) {
      long i =0;
      if (a & b) {
        for (i=0; i < size; i++) {
          A[i] |= …;
        }
      }
      if (d&e) {
        for (i=0; i < size; i++) {
          A[i] |= …;
        }
      }
      …
• Suppose a&b and d&e are not mutually exclusive
  – Loop fusion will be of benefit
• entry and if.end are the control-dependences
• entry dominates if.end and if.end post-dominates entry
• if.end is the single exit for first loop (could be a DAG)
• if.end18 is the first common post-dominator of the loops’ exits
• Handle complex control flow by this approach: Transform the CFG by duplicating paths leading from entry to if.end18
• Use aforementioned dominance/control dependence relations

Loop fusion
• To fuse, merge the entry and if.end blocks
  – Create control flow: no need for C/C++ short-circuiting
  – All conditions are anticipated at entry: collapse conditions with bitwise-and: done here in entrypflLander
• Fuse all the way to the common post-dominator for both loops’ exits: if.end18
• Preserves the CFG structure; easy recursive application of loop fusion with subsequent loops
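At source level, the merge described on this slide might look like the sketch below. It is an assumption-laden illustration rather than the pass's actual output: the loop bodies are elided in the slides, so `mask1` and `mask2` are hypothetical stand-ins, and the function names are invented. Both guards are evaluated up front and combined with a bitwise and (no short-circuit branch). That is only valid because neither guard has side effects and the first loop does not modify `d` or `e`.

```c
#include <assert.h>

/* Reference form: two separately guarded loops, as on the earlier slide. */
void guarded_unfused(int A[], long size, int a, int b, int d, int e,
                     int mask1, int mask2) {
    long i;
    if (a & b) {
        for (i = 0; i < size; i++) A[i] |= mask1;
    }
    if (d & e) {
        for (i = 0; i < size; i++) A[i] |= mask2;
    }
}

/* Fused form: conditions anticipated at entry and collapsed. */
void guarded_fused(int A[], long size, int a, int b, int d, int e,
                   int mask1, int mask2) {
    long i;
    int c1 = (a & b) != 0;
    int c2 = (d & e) != 0;
    assert(size >= 0);
    if (c1 & c2) {
        /* Both guards hold: one pass runs both bodies. */
        for (i = 0; i < size; i++) {
            A[i] |= mask1;
            A[i] |= mask2;
        }
    } else {
        /* Duplicated paths: at most one of the original loops runs. */
        if (c1) {
            for (i = 0; i < size; i++) A[i] |= mask1;
        }
        if (c2) {
            for (i = 0; i < size; i++) A[i] |= mask2;
        }
    }
}
```

The else branch corresponds to the paths that were duplicated rather than fused, so all four guard combinations behave as in the unfused function.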

Loop fusion – control merging using closures
• We want to allow more control-dependences to be merged:
  – Create closures of the control dependence graph
    • Warshall’s algorithm
  – Ensure that the newly created control flow preserves data dependences
  – Start from the common control prefix of the two loops and attempt to merge or collapse the suffixes
  – Control how different the closures are using a heuristic number on the size of suffixes (<5 control dependences now)
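The slide names Warshall's algorithm for building the closures. A minimal sketch of that algorithm on a boolean adjacency matrix follows. The matrix representation, the `MAX_N` bound, and the function name are assumptions for illustration; the slides do not say how the control dependence graph is stored.

```c
#include <assert.h>

#define MAX_N 8 /* assumed upper bound on graph size for this sketch */

/* In-place transitive closure: on return, reach[i][j] is nonzero when j is
   reachable from i through one or more edges of the input graph. */
void warshall_closure(int n, int reach[MAX_N][MAX_N]) {
    int i, j, k;
    assert(n >= 0 && n <= MAX_N);
    for (k = 0; k < n; k++) {
        for (i = 0; i < n; i++) {
            if (!reach[i][k]) continue; /* no path i -> k, nothing to extend */
            for (j = 0; j < n; j++) {
                if (reach[k][j]) reach[i][j] = 1;
            }
        }
    }
}
```

If `reach[i][j]` initially means "j is directly control dependent on i", the closed matrix gives each node's full set of transitive control dependences. Two loops' sets can then be compared for a common prefix.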

Head and tail control flow strands
• for.end could be more than one block
  – Deal with tail control flows between the two loops
  – Likewise with if.then: there can be head control flows leading to the two loops
• The approach used at this time is to enumerate all paths through the head/tail control flow blocks and insert the fused loop in each path
  – Managing this with profile data should be more profitable (TBD)
  – Orthogonal approach would be code motion (TBD)

Fusing more than two adjacent loops
• Recursive application of fusion using a graph with edges between loop fusion candidates
  – Share a prefix control dependence closure
  – Second loop has a control dependence parent that post-dominates first loop’s exit
  – Breadth-first order of the control flow graph breaks ties
    • Provides a proximity metric
    • Perhaps allows rethinking recursions until fixed point
• Walk over the graph and merge from bottom-up
• Iteratively build loop graphs and fuse, until fixed point (or a specific number of iterations)
• Intensive optimization

Complex control flow
• Dependences/aliases/phis/opaque-calls will prune the number of collapsed paths
• Adjacent function calls may have loops that can be fused
  – Inlining may allow some loops to be fused
  – Function unswitching (useful approach that looks for the quickly exiting function pattern)
• Inter-procedural mod-ref information provides additional alias information
  – Added metadata to carry over mod-ref info for non-address-taken globals in loads/stores, for use in scalar transforms or analysis
• Inline functions in a selective manner
• Walk over call graph SCCs and ascertain if inlining a call may allow loop fusion

Dependence analysis
• First cut approach chooses inner-most loops that are simple (for example, loops that may be favored by the loop vectorizer)
• Need to develop a cache model that verifies to a certain degree of accuracy if loop fusion will be beneficial or not
• Exit/step SCEVs of both loops are checked to be exact matches; the dependence analyzer is used to check for no LCD
• Used LLVM Dependence Analyzer
  – Dependence Analyzer is said not to be robust, but was able to handle our tests
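To see why the dependence check matters, consider a pair of loops with identical trip counts that still must not be fused. The sketch below assumes "LCD" on the slide means loop-carried dependence; the functions and arrays are hypothetical and are not taken from the authors' tests. The second loop reads `A[i + 1]`, which the first loop writes one iteration later, so fusing the bodies makes the read happen before the write.

```c
#include <assert.h>

/* Original order: all of A[0..n-1] is updated before B is filled.
   A must have at least n + 1 elements, B at least n. */
void shifted_read_unfused(int A[], int B[], long n) {
    long i;
    assert(n >= 0);
    for (i = 0; i < n; i++) A[i] += 1;
    for (i = 0; i < n; i++) B[i] = A[i + 1];
}

/* Naive fusion (deliberately wrong): B[i] now sees A[i + 1] before the
   increment that the original first loop would already have applied. */
void shifted_read_fused_wrong(int A[], int B[], long n) {
    long i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        A[i] += 1;
        B[i] = A[i + 1];
    }
}
```

Starting from an all-zero `A`, the unfused version stores 1 in `B[0]` while the naively fused version stores 0. A legality check should reject fusion for a pair like this.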

Results (preliminary)
• Several synthetic cases demonstrate effectiveness
  – for() {} if () { for(){} } else { for () }
  – for() {} if () { for(){} }
  – for() {} for() {}
  – if() {for() {}} if() {for() {} }
  – For large arrays fusion improved performance almost exponentially
• Improves SPECCPU INT 2006
• 462.libquantum rate performance improves close to 2.5X in x86 (AMD/Intel)
  – Non-trivial control flow, inlining, unswitching, global mod-ref – more than 100 loop fusion steps
• POC code received favorable response from llvmdev
  – Working to address llvmdev comments
• Need to explore way for use of profile information

References
– R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann, 2001, ISBN 1-55860-286-0
– S. S. Muchnick, Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997, ISBN 1-55860-320-4
– M. Wolfe, High Performance Compilers for Parallel Computing. Addison-Wesley, 1996, ISBN 0-8053-2730-4

Trademark Attribution
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
