Optimization++: Complexities and Strategies of Optimization - PowerPoint Presentation



SLIDE 1

Optimization++

  • Complexities and strategies of optimization
  • Instruction Scheduling
  • Register Allocation
SLIDE 2

Optimization Recap

  • 1. Intermediate language (IL) module
  • better separation of front-end and back-end modules
  • permits multi-pass optimization
  • we're focusing on 3-address code
  • 2. Basic blocks (BBs) & Control Flow Graphs (CFGs)
  • BBs are jump-free sequences of code
  • CFGs link up BBs
  • clearly and efficiently identify jump-free sequences of code and control flow

  • 3. IL-based optimization
  • data-flow analysis (abstract program execution of facts)

* available expressions, avail. copies, useless vars, …

  • atomized, recombinable optimizations

* common subexpression elimination, copy propagation, useless expression (statement) elimination, reduction in strength with induction variable elimination
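The BB/CFG idea above can be sketched concretely: a minimal leader-based splitter that cuts a stream of 3-address instructions into basic blocks. The tuple encoding and opcode names here are illustrative, not the course's IL.

```python
# Split 3-address code into basic blocks using the classic "leader" rule:
# the first instruction, every jump target, and every instruction that
# follows a jump each begin a new block.
def basic_blocks(code):
    # code: list of (op, *args); jumps are ('jmp', label) / ('br', cond, label);
    # labels are ('label', name) pseudo-instructions
    leaders = {0}
    targets = {args[-1] for op, *args in code if op in ('jmp', 'br')}
    for i, (op, *args) in enumerate(code):
        if op in ('jmp', 'br') and i + 1 < len(code):
            leaders.add(i + 1)              # instruction after a jump
        if op == 'label' and args[0] in targets:
            leaders.add(i)                  # jump target
    cuts = sorted(leaders)
    return [code[a:b] for a, b in zip(cuts, cuts[1:] + [len(code)])]

prog = [
    ('add', 'x', 'y', 'z'),
    ('label', 'L1'),
    ('mul', 'x', 'x', 't'),
    ('br', 't', 'L1'),        # back edge: a loop
    ('st', 't', 'w'),
]
blocks = basic_blocks(prog)   # three jump-free blocks
```

The CFG is then just edges from each block's final jump (and fall-through) to the blocks that start at its targets.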

SLIDE 3

Optimization Strategies

Optimizations can be done:

  • Locally (within a BB)
  • Globally (over a function's CFG); truly whole-program optimization is rare
  • Functions can be inlined to get better results

– can bloat code; replication hurts instruction-cache locality

  • Peephole: sliding window over IL or assembler

– e.g., reduce 2 simple instructions to 1 complex instruction

80/20 rules of optimization

  • 50% of the improvement can be achieved with local opts

– Also easier to implement (a 50/10 rule? ;)

  • 80% of instructions executed are in 20% of the code – in inner loops

– Focusing optimizations on inner loops makes the optimizer run much faster yet still capture most (~80%) of the benefit
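The peephole strategy above can be sketched as a tiny pass over a hypothetical textual IL: slide a two-instruction window over the stream and rewrite recognized pairs. The store-then-reload pattern and mnemonics are illustrative examples, not a fixed catalog.

```python
# Peephole optimization: examine a 2-instruction window and replace
# recognized pairs with something cheaper.
def peephole(instrs):
    out, i = [], 0
    while i < len(instrs):
        if i + 1 < len(instrs):
            a, b = instrs[i], instrs[i + 1]
            # pattern: store to X immediately followed by a load from X;
            # keep the store, turn the load into a register-to-register move
            if a[0] == 'st' and b[0] == 'ld' and a[2] == b[1]:
                out.append(a)
                out.append(('mov', a[1], b[2]))
                i += 2
                continue
        out.append(instrs[i])
        i += 1
    return out

code = [('st', 'r1', 'x'), ('ld', 'x', 'r2'), ('add', 'r1', 'r2', 'r3')]
optimized = peephole(code)   # the reload of x becomes a mov
```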

SLIDE 4

Complexities – Subject of Research

Pointers

  • Unknown what is being set, so facts must be killed to be safe – suppressing optimization

Function calls

  • Like huge outer loops (think recursion), but used in many places
  • Algorithmic cost to handle them is high; often treated like pointers

Debugging

  • Optimization makes it hard to set breakpoints, inspect values, etc.

Runtime optimization

  • Java virtual machines compile & optimize during execution
  • Must be very lean; may use runtime information

Jeanne Ferrante, Brad Calder, Andrew Chien, Scott Baden

SLIDE 5

Instruction Scheduling

  • RISC machines separate memory instructions from the rest
  • Each instruction does memory or computation, not both
  • Allows most instructions to execute in a few cycles (say, 5); only multiplies and divides take longer
  • If adjacent operations are unrelated, they can be overlapped (compute on x not near load of x)
  • pipeline: fetch, decode, execute, (execute,) store (e.g.)
  • one-cycle net cost if perfectly overlapped, no cache misses
  • also, cannot do two fetches, etc., at a time (hazards)
  • To get best performance, then, need to reorder instructions to minimize 'stalling' dependences

SLIDE 6

Example of Pipeline Stalls

ld x, r1         f d e e s
ld y, r2         f d e e s
add r1, r2, r3   f d e s
st r3, z         f d e e s

  • 10 cycles (3 cycles of stalls)
  • Ideas? These in front:

ld a, r4
ld b, r5
sub r4, r5, r6
st r6, c
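The stall counts above can be approximated with a crude dependence-distance model: each instruction takes one issue slot, a load's result is usable two slots later, and an ALU result one slot later. This is my simplification (it reports 1 stall cycle for the 4-instruction sequence rather than the diagram's 3, because it ignores the full 5-stage overlap), but it shows why interleaving the two independent computations removes all stalls.

```python
LAT = {'ld': 2}   # loads deliver their result 2 slots later; others 1

def stall_cycles(instrs):
    # instrs: list of (op, reads, writes); count slots spent waiting for
    # a result that is not ready yet, in program order.
    ready, slot, total = {}, 0, 0
    for op, reads, writes in instrs:
        need = max([ready.get(r, 0) for r in reads], default=0)
        if need > slot:                # operand not ready: stall
            total += need - slot
            slot = need
        for w in writes:
            ready[w] = slot + LAT.get(op, 1)
        slot += 1
    return total

naive = [
    ('ld',  [],           ['r1']),   # ld x, r1
    ('ld',  [],           ['r2']),   # ld y, r2
    ('add', ['r1', 'r2'], ['r3']),   # add r1, r2, r3
    ('st',  ['r3'],       []),       # st r3, z
]
interleaved = [
    ('ld',  [],           ['r4']),   # ld a, r4
    ('ld',  [],           ['r5']),   # ld b, r5
    ('ld',  [],           ['r1']),   # ld x, r1
    ('ld',  [],           ['r2']),   # ld y, r2
    ('sub', ['r4', 'r5'], ['r6']),   # sub r4, r5, r6
    ('st',  ['r6'],       []),       # st r6, c
    ('add', ['r1', 'r2'], ['r3']),   # add r1, r2, r3
    ('st',  ['r3'],       []),       # st r3, z
]
```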

SLIDE 7

Reorder instructions - no stalls/bubbles!

ld a, r4         f d e e s
ld b, r5         f d e e s
ld x, r1         f d e e s
ld y, r2         f d e e s s
sub r4, r5, r6   f d e s s
st r6, c         f d e e s s
add r1, r2, r3   f d e s s
st r3, z         f d e e s

  • Assumes no cache misses on loads
  • Any ideas for implementation (an algorithm?)
SLIDE 8

Maximize distance between dependences

ld a, r4
ld b, r5
sub r4, r5, r6
st r6, c
ld x, r1
ld y, r2
add r1, r2, r3
st r3, z

(dependence-graph edge annotations from the slide: (1s, 2i) (1s, 2i) (0s, 1i) (1s, 2i) (1s, 2i) (0s, 1i))

Repeat until no instructions remain:

  • 1. Examine all "roots" (all predecessors scheduled)
  • 2. Schedule one that
  • a. can cause stalls on successors
  • b. has most successors (creates most choices)
  • c. has the longest path to a leaf
  • 3. Delete it from the graph
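This list-scheduling loop can be sketched in Python. The dependence-graph encoding and tie-breaking details are mine; priorities (a)–(c) follow the slide, with "can cause stalls on successors" approximated as "has latency greater than 1".

```python
# List scheduling over a dependence DAG: repeatedly pick the best "root"
# (a node whose predecessors are all scheduled) and delete it.
def longest_path(node, succs, memo=None):
    if memo is None:
        memo = {}
    if node not in memo:
        memo[node] = 1 + max((longest_path(s, succs, memo)
                              for s in succs[node]), default=0)
    return memo[node]

def list_schedule(nodes, preds, succs, latency):
    scheduled, order = set(), []
    while len(order) < len(nodes):
        roots = [n for n in nodes
                 if n not in scheduled and preds[n] <= scheduled]
        best = max(roots, key=lambda n: (latency[n] > 1,      # (a) can stall
                                         len(succs[n]),       # (b) successors
                                         longest_path(n, succs)))  # (c)
        order.append(best)
        scheduled.add(best)
    return order

# the slide's 8-instruction example
nodes = ['ld_a', 'ld_b', 'ld_x', 'ld_y', 'sub', 'st_c', 'add', 'st_z']
preds = {'ld_a': set(), 'ld_b': set(), 'ld_x': set(), 'ld_y': set(),
         'sub': {'ld_a', 'ld_b'}, 'st_c': {'sub'},
         'add': {'ld_x', 'ld_y'}, 'st_z': {'add'}}
succs = {'ld_a': ['sub'], 'ld_b': ['sub'], 'ld_x': ['add'], 'ld_y': ['add'],
         'sub': ['st_c'], 'st_c': [], 'add': ['st_z'], 'st_z': []}
latency = {n: 2 if n.startswith('ld') else 1 for n in nodes}
order = list_schedule(nodes, preds, succs, latency)
```

The four loads are scheduled first (they have latency > 1), so the dependent arithmetic lands far from the loads that feed it.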
SLIDE 9

Optimizing Register Allocation

Memory operations are expensive

  • ‘Extra’ instructions: they take longer, cause stalls, and can miss the cache

Best never to load or store data from memory

  • Keep all data in registers – but registers are in short supply
  • How to prioritize; ideas?

Temporal locality

Values to be

  • a. used the most
  • b. in the shortest span of instructions

SLIDE 10

Greedy Algorithm

Policy: let a variable loaded into a register stay in that register until the register is needed for something else.

x := y + y

Without reuse:        With reuse:
ld [fp-4], r1         ld [fp-4], r1
ld [fp-4], r2         add r1, r1, r3
add r1, r2, r3        st r3, [fp-8]
st r3, [fp-8]

  • Achieved by having each variable's STO remember its register, and vice versa

R = Machine.smartGetReg(varSTO);

  • If called on an STO that already has a register, just returns that register
  • If all registers are in use, chooses a register to free
  • Keeps the memory-based model – a modular change
SLIDE 11

smartGetReg

Register smartGetReg(STO var) {
    if (!(reg = varFile.varsReg(var))) {        // var not already in a register
        if (!(reg = getReg())) {                // no free register available
            STO ovar = varFile.stalestVar();    // pick LRU var to evict
            Register oreg = varFile.varsReg(ovar);
            varFile.remVar(ovar);               // remove LRU var/reg from file
            freeReg(oreg);
            reg = getReg();                     // guaranteed to succeed now
        }
        varFile.put(var, reg);
        emitLoad(var, reg);                     // DOES PREEMPTIVE LOAD!!
    }
    varFile.markUsed(var);                      // updates LRU 'timestamp'
    return reg;
}

  • Must free all at end of function
  • Can also do lazy stores, but not as big a win
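The same LRU policy can be restated as a runnable Python sketch. The class and its bookkeeping are illustrative stand-ins, not the course's Machine/STO API; an OrderedDict plays the role of the "var file".

```python
from collections import OrderedDict

class GreedyRegFile:
    """LRU register file: a variable keeps its register until evicted."""
    def __init__(self, regs):
        self.free = list(regs)
        self.file = OrderedDict()   # var -> reg, least recently used first
        self.loads = []             # record of emitted (preemptive) loads

    def get_reg(self, var):
        if var in self.file:                 # hit: already in a register
            self.file.move_to_end(var)       # refresh the LRU 'timestamp'
            return self.file[var]
        if not self.free:                    # miss, no free reg: evict LRU
            _, oreg = self.file.popitem(last=False)
            self.free.append(oreg)
        reg = self.free.pop()
        self.file[var] = reg
        self.loads.append(var)               # preemptive load of var
        return reg

rf = GreedyRegFile(['r1', 'r2'])
rf.get_reg('x'); rf.get_reg('y')
r = rf.get_reg('x')       # hit: no new load, refreshes x
rf.get_reg('z')           # evicts 'y', the least recently used variable
```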

SLIDE 12

Global Register Allocation

Actually examine temporal locality

  • Values likely to be used the most
  • In the shortest span of instructions

In short, reference density

  • Static: number of variable references in the code
  • Dynamic: number of times variable references occur (loops)

Two challenges

  • Determining (guessing) dynamic density

– estimate from static density, or from profiling via test cases

  • Choosing the allocation that gets the most dense variables (most references) into registers
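Static density as described above is just a count of operand occurrences per variable; weighting each count by an assumed loop trip count gives a crude dynamic estimate. The ×10-per-nesting-level weight here is a guess of mine, the kind a compiler uses when no profile is available.

```python
from collections import Counter

LOOP_WEIGHT = 10   # assumed iterations per loop nesting level (a guess)

def reference_density(code):
    # code: list of (nesting_depth, [variables referenced by the instr])
    static, dynamic = Counter(), Counter()
    for depth, vars_ in code:
        for v in vars_:
            static[v] += 1                       # raw count in the code
            dynamic[v] += LOOP_WEIGHT ** depth   # loop-weighted estimate
    return static, dynamic

code = [(0, ['x', 'y']),   # x := y, outside any loop
        (0, ['x']),        # another straight-line use of x
        (1, ['i'])]        # a single inner-loop use of i
static, dynamic = reference_density(code)
# x wins statically, but the loop variable i dominates dynamically
```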

SLIDE 13

Graph-Coloring Register Allocation

Schedule variables into registers like classes into rooms

  • Schedule the most classes possible for the available rooms
  • Can’t have two classes (variables) in a room (register) at the same time

[1] z := 1
[2] x := 2 * z
[3] y := 3 * z
[4] w := x + y
[5] z := y + z
[6] x := y * w

(figure: lifetime bars for x, y, w, z alongside the code; a naive left-to-right try of r1 r2 r3 runs out, leaving “? ?”)

  • r: r1 r2 r1 r3

SLIDE 14

Graph-Coloring Register Allocation

Create a conflict graph; an edge means “cannot be scheduled in same ‘room’ (register) because (life)times overlap”.

[1] z := 1
[2] x := 2 * z
[3] y := 3 * z
[4] w := x + y
[5] z := y + z
[6] x := y * w

(figure: conflict graph over nodes x, y, w, z with lifetime bars)

Repeat until all variables are allocated:

  • Select a node from the unallocated set*
  • Give it a register not in use by any neighbor
  • Remove the node from the unallocated set

*prioritize non-trivial, dense nodes:

  • a. constrained nodes (more edges than registers)
  • b. nodes with higher reference density

(figure annotations: per-node constraints/density (2c, 1/3d) (3c, 1d) (3c, 3/4d) (2c, 1/2d); resulting registers r3 r1 r2 r3)
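The coloring loop above can be sketched in Python: visit unallocated nodes most-constrained first (standing in for the slide's constrained/dense priority) and give each the lowest register not used by an already-colored neighbor. The edge set below is one plausible reading of the slide's figure, in which x and w do not conflict and so can share a register.

```python
def color_registers(conflicts, num_regs):
    # conflicts: var -> set of vars whose lifetimes overlap with it
    order = sorted(conflicts, key=lambda v: len(conflicts[v]), reverse=True)
    assignment = {}
    for v in order:
        taken = {assignment[n] for n in conflicts[v] if n in assignment}
        reg = next((f'r{i}' for i in range(1, num_regs + 1)
                    if f'r{i}' not in taken), None)
        if reg is None:
            raise RuntimeError(f'must spill {v}')   # not enough registers
        assignment[v] = reg
    return assignment

# a plausible conflict graph for the slide's example (x and w don't overlap)
conflicts = {'x': {'y', 'z'},
             'y': {'x', 'z', 'w'},
             'w': {'y', 'z'},
             'z': {'x', 'y', 'w'}}
assignment = color_registers(conflicts, 3)   # x and w end up sharing a reg
```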

SLIDE 15

Global Register Allocation

[1] r2 := 1
[2] r3 := 2 * r2
[3] r1 := 3 * r2
[4] r3 := r3 + r1   ! x and w “share”
[5] z := r1 + r2
[6] x := r1 * r3

  • Entire calculation done in registers
  • Note that this is done on the IL – restricted/typed temps
  • Accuracy of the allocation depends on the mapping to actual assembly

SLIDE 16

Lessons Learned

Significant benefits are possible at little cost or complexity

Modelling (formalization) of problems

  • 3-addr code, BB, CFG, dependence graphs

Clarifies structure

  • Exposes differences, similarities, (mis)matches
  • Identifies opportunities (for optimization)
  • Efficient and simple algorithms available “off the shelf”

Also useful for thinking, software design, debugging, etc.