Optimal ILP and Register Tiling: Analytical Model and Optimization Framework
Lakshminarayanan Renganarayana, Upadrasta Ramakrishna, Sanjay Rajopadhye
Computer Science Department, Colorado State University

Overview
- ILP and register reuse
- Execution time and register pressure functions
- Optimal ILP and register tiling problem
- Optimal tiling problem as a convex optimization problem
- Validation
- Related work
- Conclusions & Future work
October 21, 2005 LCPC '05

ILP and Register Reuse
- Loop programs
  - dominate application execution time
  - main sources of ILP and register reuse
- Transformations
  - expose / exploit ILP
  - enable register reuse
- These transformations interact in subtle ways
- ILP - Register Reuse tradeoff?

ILP - Register Reuse Tradeoff
- Optimal combination of transformations
- Quantification of interactions
- A mathematical model
  - to study the interactions
  - to choose the optimal transformation parameters
- To the best of our knowledge (TTBOOK), no such model has been studied

Contributions
- Cost model with transformation parameters as variables
  - closed forms: execution time & register pressure
- Convex optimization problem formulation
- A globally optimal solution
- First such formulation & optimal solution

Exposing and Exploiting ILP
- Exposing ILP
  - Unroll and Jam
  - Loop permutation or skewing
  - Multi-dimensional scheduling
- Exploiting ILP
  - DAG schedulers
  - Software pipelining

Exposing ILP with Unroll and Jam
for i1 = 1 to 6
  for i2 = 1 to 6
    A[i1,i2] = A[i1-1,i2] + A[i1,i2-1]
[Figure: unrolled loop body (9 iterations); its DAG exposes parallelism]
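The parallelism that the DAG exposes can be made concrete with a small sketch. It assumes the 9-iteration unrolled body is a 3 x 3 block (the slide does not state the unroll factors) and uses a hypothetical helper, dag_levels, to assign each unrolled statement its earliest schedule level under the two dependences of the loop body.

```python
from itertools import product

def dag_levels(u1, u2):
    """Earliest schedule level of each statement in a u1 x u2 unrolled body.

    Statement (a, b) computes A[i1+a, i2+b]. It depends on statements
    (a-1, b) and (a, b-1) whenever those lie inside the unrolled body.
    """
    level = {}
    # Lexicographic order visits both predecessors before the statement itself.
    for a, b in product(range(u1), range(u2)):
        preds = [level[p] for p in ((a - 1, b), (a, b - 1)) if p in level]
        level[(a, b)] = 1 + max(preds, default=0)
    return level

# Assumed unroll factors: 3 x 3, giving the 9 iterations mentioned on the slide.
levels = dag_levels(3, 3)
critical_path = max(levels.values())
# Number of statements that can issue together at each level.
widths = [sum(1 for v in levels.values() if v == k)
          for k in range(1, critical_path + 1)]
print(critical_path, widths)  # prints: 5 [1, 2, 3, 2, 1]
```

Under these assumptions the 9 statements fit in 5 dependence levels, with up to 3 independent statements per level, which is the ILP a DAG scheduler could exploit.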

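Exposing ILP with Permutation
Before permutation (i2 inner-most, dependence carried by the inner loop):
for i1 = 1 to 6
  for i2 = 1 to 6
    A[i1,i2] = 3.23 * A[i1,i2-1]
After permutation (i1 inner-most, inner-loop iterations are independent):
for i2 = 1 to 6
  for i1 = 1 to 6
    A[i1,i2] = 3.23 * A[i1,i2-1]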
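Whether a permutation exposes a parallel inner-most loop can be checked directly on dependence distance vectors. The sketch below is one simple way to do that, not the authors' procedure; the function names and the encoding of a permutation as a tuple of old loop indices (outermost first) are assumptions made for illustration.

```python
def permute(dep, perm):
    """Reorder a distance vector according to the new loop order."""
    return tuple(dep[k] for k in perm)

def is_legal(deps, perm):
    """A permutation is legal if every nonzero permuted distance vector
    stays lexicographically positive."""
    for dep in deps:
        first_nonzero = next((x for x in permute(dep, perm) if x != 0), None)
        if first_nonzero is not None and first_nonzero < 0:
            return False
    return True

def innermost_is_parallel(deps, perm):
    """The inner-most loop is parallel if no dependence is carried by it,
    i.e. no permuted vector is zero in all outer positions and nonzero
    in the last one."""
    for dep in deps:
        p = permute(dep, perm)
        if all(x == 0 for x in p[:-1]) and p[-1] != 0:
            return False
    return True

# A[i1,i2] = 3.23 * A[i1,i2-1] has the single distance vector (0, 1).
deps = [(0, 1)]
print(innermost_is_parallel(deps, (0, 1)))  # i2 inner-most: False
print(innermost_is_parallel(deps, (1, 0)))  # i1 inner-most: True
print(is_legal(deps, (1, 0)))               # True
```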

Exposing ILP with Skewing
for i1 = 1 to 6
  for i2 = 1 to 6
    A[i1,i2] = A[i1-1,i2] + A[i1,i2-1]
After skewing, all the iterations of the inner-most loop are parallel
- Sufficient ILP
- Performance limited only by the available execution resources
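A small experiment illustrates the claim for this loop nest. The sketch assumes the skew w = i1 + i2 as the new outer loop (the slide does not name the skew it uses) and arbitrary boundary values of 1. It runs the inner loop forwards and backwards: if the inner-most iterations are truly independent, both orders must reproduce the original result.

```python
def make_grid(n):
    # Row 0 and column 0 hold boundary values; 1 is an arbitrary choice.
    return [[1] * (n + 1) for _ in range(n + 1)]

def run_original(n):
    A = make_grid(n)
    for i1 in range(1, n + 1):
        for i2 in range(1, n + 1):
            A[i1][i2] = A[i1 - 1][i2] + A[i1][i2 - 1]
    return A

def run_skewed(n, reverse_inner=False):
    A = make_grid(n)
    for w in range(2, 2 * n + 1):           # skewed outer loop, w = i1 + i2
        lo, hi = max(1, w - n), min(n, w - 1)
        inner = range(lo, hi + 1)           # values of i2 on this wavefront
        if reverse_inner:
            inner = reversed(inner)
        for i2 in inner:
            i1 = w - i2
            A[i1][i2] = A[i1 - 1][i2] + A[i1][i2 - 1]
    return A

reference = run_original(6)
print(run_skewed(6) == reference)                      # True
print(run_skewed(6, reverse_inner=True) == reference)  # True
```

Both reads of each statement come from the previous wavefront (w - 1), which is why the order within a wavefront does not matter.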

Register Reuse
- Unroll and Jam
  - Scalar replacement
    - scalar replacement enables register placement
    - classic register allocators are sufficient
- Loop tiling
  - array register allocation
    - registers allocated to array values
    - no code size increase
    - requires an array register allocator

Scalar Replacement
Original loop:
for i1 = 1 to 6
  for i2 = 1 to 6
    A[i1,i2] = A[i1-1,i2] + A[i1,i2-1]
Unroll 2 x 2:
for i1 = 1 to 6 step 2
  for i2 = 1 to 6 step 2
    A[i1,i2] = A[i1-1,i2] + A[i1,i2-1]
    A[i1+1,i2] = A[i1,i2] + A[i1+1,i2-1]
    A[i1,i2+1] = A[i1-1,i2+1] + A[i1,i2]
    A[i1+1,i2+1] = A[i1,i2+1] + A[i1+1,i2]
Replace A[i1,i2] by a scalar T:
for i1 = 1 to 6 step 2
  T = A[i1,0]
  for i2 = 1 to 6 step 2
    T = A[i1-1,i2] + T
    A[i1+1,i2] = T + A[i1+1,i2-1]
    A[i1,i2+1] = A[i1-1,i2+1] + T
    A[i1+1,i2+1] = A[i1,i2+1] + A[i1+1,i2]
    A[i1,i2] = T
- Saves 2 loop independent loads plus 1 loop carried load
- T can be allocated to a register
- Which array references to scalar replace?
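The transformation can be tried out as runnable code. The sketch below is a variant of the slide's example rather than a transcription of it: it uses two scalars, t and t_prev (an assumption beyond the slide's single T), so that the value carried into the next i2 iteration is A[i1,i2+1], the element that iteration actually reads. Boundary values of 1 and an even n are also assumed.

```python
def make_grid(n):
    # Row 0 and column 0 hold boundary values; 1 is an arbitrary choice.
    return [[1] * (n + 1) for _ in range(n + 1)]

def original(n):
    A = make_grid(n)
    for i1 in range(1, n + 1):
        for i2 in range(1, n + 1):
            A[i1][i2] = A[i1 - 1][i2] + A[i1][i2 - 1]
    return A

def unrolled_scalar_replaced(n):
    """2 x 2 unroll and jam with scalar replacement; n must be even."""
    A = make_grid(n)
    for i1 in range(1, n + 1, 2):
        t_prev = A[i1][0]                 # carries A[i1, i2-1] across i2 iterations
        for i2 in range(1, n + 1, 2):
            t = A[i1 - 1][i2] + t_prev    # value of A[i1, i2], kept in a scalar
            A[i1][i2] = t
            A[i1 + 1][i2] = t + A[i1 + 1][i2 - 1]
            t_prev = A[i1 - 1][i2 + 1] + t    # value of A[i1, i2+1]
            A[i1][i2 + 1] = t_prev
            A[i1 + 1][i2 + 1] = t_prev + A[i1 + 1][i2]
    return A

print(unrolled_scalar_replaced(6) == original(6))  # True
```

Every read of A[i1,i2], A[i1,i2+1] and A[i1,i2-1] in the unrolled body is served from a scalar, which a classic register allocator can keep in a register.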

Tiling
- Similar to Unroll and Jam
- Decreases life time of values
- Limits MAXLIVE
for i1 = 1 to 6
  for i2 = 1 to 6
    A[i1,i2] = A[i1-1,i2] + A[i1,i2-1]

Register tiling
for i1 = 1 to 6
  for i2 = 1 to 6
    A[i1,i2] = A[i1-1,i2] + A[i1,i2-1]
[Figure: 3x3 register tile]
Tile sizes:
- Affect load/store savings
- Constrained by number of registers
- How to choose the tile sizes?
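How tile sizes affect load savings can be illustrated by counting, for one rectangular tile of this loop nest, the values it must read from outside the tile. This is only a counting sketch under the assumption that everything produced inside a tile stays in registers; it is not the load/store volume function of the model, and tile_inputs is a hypothetical helper.

```python
def tile_inputs(t1, t2, deps=((1, 0), (0, 1))):
    """Number of distinct values a t1 x t2 tile reads from outside itself."""
    tile = {(a, b) for a in range(t1) for b in range(t2)}
    # Each point (a, b) reads the points (a - d1, b - d2), one per dependence.
    reads = {(a - d1, b - d2) for (a, b) in tile for (d1, d2) in deps}
    return len(reads - tile)

for t in (1, 2, 3, 6):
    points = t * t
    inputs = tile_inputs(t, t)
    print(f"{t}x{t} tile: {inputs} outside reads for {points} points "
          f"({inputs / points:.2f} per point)")
```

For the 3x3 tile of the slide this gives 6 outside reads for 9 computed points. Larger tiles lower the reads per point but hold more values live at once, which is where the register constraint comes in.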

Traditional vs. Our Approach
Traditional Approach
- Choose optimal unroll and scalar promotion parameters
- Code transformation: Unroll and Jam + Scalar Promotion
- Sched. & Reg. Alloc: DAG Scheduler or Software Pipelining + Scalar Register Allocation
Our Approach
- Choose optimal skew and tile parameters
- Code transformation: Permutation or Skewing + Tiling
- Sched. & Reg. Alloc: Software Pipelining + Scalar & Array Register Allocation

Program, Tiling, and Architecture Class
- Input loops:
  - perfectly nested, rectangular loops
  - uniform dependence bodies
- Rectangular tiling
  - we assume: input loop nest admits rectangular tiling
- ILP exposed by: permutation or skewing
- Architectures: superscalar or VLIW

Execution Time (when permutation exposes ILP)
T = (ntiles * tile_cost) + loop_overhead
tile_cost = max(comp_cost, load_store_cost)
comp_cost = α * tile_vol
load_store_cost = β * LS(t,D)
loop_overhead = η * LO(t,N)
ntiles = (N_1/t_1) * … * (N_n/t_n)
where
t = vector of tile sizes
N = vector of iteration space sizes
D = dependence matrix
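The model above translates almost line by line into a function. The slide does not give closed forms for LS and LO, so the sketch takes them as caller-supplied functions; the two example functions and all numeric values below are placeholders chosen only to make the arithmetic easy to follow.

```python
from math import prod

def execution_time(t, N, D, alpha, beta, eta, LS, LO):
    """T = ntiles * tile_cost + loop_overhead, following the slide's model.

    LS(t, D) and LO(t, N) are passed in because their closed forms are
    not given on the slide.
    """
    ntiles = prod(n_i / t_i for n_i, t_i in zip(N, t))
    tile_vol = prod(t)
    comp_cost = alpha * tile_vol
    load_store_cost = beta * LS(t, D)
    tile_cost = max(comp_cost, load_store_cost)
    loop_overhead = eta * LO(t, N)
    return ntiles * tile_cost + loop_overhead

# Placeholder functions (assumptions, not the model's definitions):
def ls_example(t, D):
    return sum(t)                   # e.g. one boundary face per dimension

def lo_example(t, N):
    return prod(n_i / t_i for n_i, t_i in zip(N, t))   # one unit per tile

D = [(1, 0), (0, 1)]
T = execution_time(t=(3, 3), N=(6, 6), D=D,
                   alpha=1, beta=2, eta=1, LS=ls_example, LO=lo_example)
print(T)  # prints: 52.0
```

With these placeholder values there are 4 tiles, each tile costs max(9, 12) = 12, and the loop overhead is 4, so T = 4 * 12 + 4 = 52.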

Execution Time Model (when permutation cannot expose ILP: skew)
Skewing affects
- iteration space shape: makes counting partial tiles, full tiles, and the total number of tiles hard.
- dependence lengths: affects the amount of data loaded / stored in a tile.
- Partial tiles treated as full tiles.
- Number of tiles approximated by (N_1/t_1) * … * (N_n/t_n)
- Dep. matrix = SD
- LS(t,SD) is the load store volume

Optimal ILP and Register Tiling: Optimization Problem Formulation
minimize TotalExecutionTime(t,S)
subject to LoadStoreVolume(t,S) ≤ Registers
For a fixed skew S
- t is the only variable
- the optimization problem reduces to an integer convex optimization problem

Solution Steps
Can permutation expose a parallel loop?
Yes!
- No skewing, only tiling
- Fix S = I in opt. prob.
- Solve for optimal tile sizes
- Single integer convex opt. problem.
No!
- Construct set (Γ) of valid skews
- For each element in Γ solve the fixed skew optimization problem
- Pick the best
- Only d(d-1) problems

Solving for Optimal Tile Sizes
- Opt. Prob. for tile sizes is an Integer Geometric Program (à la Integer Linear Programs)
- GPs can be transformed into convex opt. probs.
- Standard solvers are available
- Running time:
  - depends on #vars & #constraints
  - few seconds (< 10 secs.)
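For very small instances the fixed-skew problem can also be solved by plain enumeration, which helps to see what the optimizer is choosing between. The sketch below deliberately swaps the geometric-programming solver for a brute-force search over integer tile sizes. The cost function, the load/store volume t1 + t2, the register count and all coefficients are hypothetical placeholders, not values from the model.

```python
from itertools import product
from math import prod

def total_time(t, N, alpha=1, beta=2, eta=1):
    """Placeholder cost: ntiles * max(comp, load/store) + loop overhead."""
    ntiles = prod(n_i / t_i for n_i, t_i in zip(N, t))
    tile_cost = max(alpha * prod(t), beta * load_store_volume(t))
    return ntiles * tile_cost + eta * ntiles

def load_store_volume(t):
    # Hypothetical volume for a 2-D tile: one boundary face per dimension.
    return sum(t)

def best_tile(N, registers):
    """Enumerate all integer tile sizes and keep the cheapest feasible one."""
    best = None
    for t in product(*(range(1, n_i + 1) for n_i in N)):
        if load_store_volume(t) > registers:
            continue                      # violates the register constraint
        cost = total_time(t, N)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

tile, cost = best_tile(N=(6, 6), registers=8)
print(tile, cost)  # prints: (4, 4) 38.25
```

Enumeration grows exponentially with the loop depth, so it is only a way to inspect toy instances; it does not replace the solver-based approach described on the slide.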

Validation
- Experimental validation requires
  - array register allocator
  - architectural support (like rotating registers)
- Similar model used for finding optimal unroll factor
  - optimal unroll factors can be found with small tweaks
- In tiling for memory hierarchy
  - we have successfully used a similar model
  - almost all the cost models used by other researchers can be cast into our GP framework [RR-SC04]

Related Work
- Unroll and Jam approach
  - [Callahan et al.-90], [Carr-Kennedy-94], [Sarkar-01]
- Hierarchical tiling
  - [Carter et al.-95], [Mitchell et al.-98]
- Software pipelining of loop nests
  - [Ramanujam-94], [Rong et al.-04], [Rong et al.-05]
- Code generation for register tiling
  - [Jimenez et al.-02], [Sarkar-01]

Conclusions & Future Work
- A mathematical formulation of the combined ILP and register tiling problem.
- A globally optimal solution.
- Future work:
  - adapting modulo schedulers to pipeline skewed loops
  - developing an array register allocator
  - experimental validation on benchmarks
