Parametric Tiling with Inter-Tile Data Reuse

Alexandre Isoard, Alain Darte
Compsys, LIP (Laboratoire de l’Informatique du Parallélisme), Lyon

IMPACT: 4th International Workshop on Polyhedral Compilation Techniques
January 20, 2014, Vienna, Austria

Outline

1. Motivation and challenges
     Kernel offloading: rules of the game
     Reminders: scheduling and tiling
     Inter-tile data reuse: example
2. Parametric analysis
     Tile index vs tile origin index
     Exact inter-tile reuse
     Approximated inter-tile reuse
3. Current implementation and results
     Current status
     Script with iscc
     Local memory allocation for PolyBench examples

Kernel Offloading

[Figure: a host (CPU) with slow global memory, connected to an accelerator (FPGA/GPU/MPPA/...) with fast local memory.]

☛ Perform computations by blocks;
☛ Exploit data reuse;
☛ Use pipelining/prefetching;
☛ Reduce and coalesce communications (burst).

Rules and objectives

Parametric in terms of tile sizes?

Data reuse: on the full iteration domain
Rule 1: always use local data if already loaded or computed.
☛ Reduces communication volume, increases local memory usage.
☛ Enables full pipelining (load/compute/store sequence).

Blocking: thanks to tiling
Rule 2: tiles are executed in sequence (but a tile can be parallelized).
☛ Increases temporal reuse, reduces local memory usage.
☛ Increases spatial reuse, enables burst communications.

Variants for the reuse domain, i.e., where data reuse is performed
☛ Iteration domain reduced thanks to hierarchical tiling.
☛ Data reuse in a p-dimensional stripe, or at bounded distance.

Then: scheduling/pipelining & memory allocation
Rule 3: reuse analysis is performed independently of scheduling.
Rule 4: load as late as possible, store as soon as possible.
☛ Overlaps transfer and computation (multi-buffering).
☛ Reduces live ranges, and possibly local memory size.
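The load/compute/store sequence of Rules 2 and 4 can be sketched in C with two input buffers, so that the load of the next tile is issued while the current tile is being computed. This is only an illustration under stated assumptions: the kernel (scaling by 2), the tile size TILE, and the helpers load_tile, store_tile, and tile_len are hypothetical and do not come from the slides. The transfers are plain memcpy calls, so nothing actually overlaps here; on a real accelerator they would be asynchronous DMA or burst transfers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TILE = 4 }; /* hypothetical fixed tile size, for illustration only */

/* Hypothetical stand-ins for global <-> local memory transfers. */
static void load_tile(float *local, const float *global, size_t len)
{
    memcpy(local, global, len * sizeof *local);
}

static void store_tile(float *global, const float *local, size_t len)
{
    memcpy(global, local, len * sizeof *global);
}

/* Number of elements in tile t of an array of n elements (last tile may be partial). */
static size_t tile_len(size_t n, size_t t)
{
    size_t start = t * TILE;
    return (n - start < TILE) ? n - start : TILE;
}

/* Computes out[i] = 2 * in[i], tile by tile, with double-buffered inputs. */
void scale_pipelined(const float *in, float *out, size_t n)
{
    float in_buf[2][TILE];
    float out_buf[TILE];
    size_t ntiles = (n + TILE - 1) / TILE;

    assert(in != NULL && out != NULL);
    if (ntiles == 0)
        return;

    /* Prologue: the first tile has to be loaded before any computation. */
    load_tile(in_buf[0], in, tile_len(n, 0));

    for (size_t t = 0; t < ntiles; t++) {
        size_t cur = t % 2, nxt = (t + 1) % 2;
        size_t len = tile_len(n, t);

        /* Issue the load of tile t+1 into the other buffer. */
        if (t + 1 < ntiles)
            load_tile(in_buf[nxt], in + (t + 1) * TILE, tile_len(n, t + 1));

        /* Compute tile t entirely from local data. */
        for (size_t i = 0; i < len; i++)
            out_buf[i] = 2.0f * in_buf[cur][i];

        /* Store as soon as the tile's results are available. */
        store_tile(out + t * TILE, out_buf, len);
    }
}
```

This kernel has no reuse between tiles, so it shows only the pipelining side of the rules; the load sets that inter-tile reuse requires are a separate question.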

Challenges and contributions

General principle for Load sets
Load a data element indexed by $\vec{m}$ just before the tile indexed by $\vec{T}$ if:
    $\vec{m}$ is live-in for $\vec{T}$, i.e., read but not written earlier in $\vec{T}$;
    $\vec{m}$ has not been loaded in a previous tile;
    $\vec{m}$ has not been defined earlier.
Tiling defines a schedule on tile and iteration indices, which gives meaning to “previous” and “earlier”.
This schedule is not affine in terms of tile sizes.

Exact case
Reads/writes are functions of iteration points. Can we express the relation “happens before” among iterations in a quasi-affine way?
☛ Yes. Parametric tiling with exact inter-tile reuse is feasible.

Approximations
What if contributions of reads/writes are summarized at tile level? Approximated?
☛ No information loss if approximations are “pointwise”. More approximations needed otherwise.
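The three conditions above can be written as a single set difference. The notation below is an assumption introduced for illustration, not taken from the slides: In(T) is the live-in set of tile T, Out(T) is the set of elements written in T, and T' ≺ T means that tile T' is executed before T. Since every loaded element is live-in for the tile that loads it, removing the live-in sets of all earlier tiles covers "has not been loaded in a previous tile". The second formula shows why the tiled schedule is not affine in the tile size: for one dimension with tile size s, tile index I, and intra-tile offset i', the term s·I is a product of a parameter and a variable.

```latex
% Assumed notation: In = live-in set, Out = written set, \prec = "executed before".
\[
  \mathrm{Load}(\vec{T}) \;=\; \mathrm{In}(\vec{T}) \setminus
  \Bigl(\, \bigcup_{\vec{T}' \prec \vec{T}} \mathrm{In}(\vec{T}')
  \;\cup\; \bigcup_{\vec{T}' \prec \vec{T}} \mathrm{Out}(\vec{T}') \Bigr)
\]

% One tiled dimension: the product s I makes the constraint quadratic when s is a parameter.
\[
  i \;=\; s\,I + i', \qquad 0 \le i' < s
\]

% In terms of the tile origin I_0 = s I, the same tile is described by
% constraints that are affine in s, with I_0 restricted to multiples of s.
\[
  I_0 \le i < I_0 + s, \qquad I_0 \equiv 0 \pmod{s}
\]
```

The last form is one possible reading of the outline item "Tile index vs tile origin index"; that reading is an assumption here.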

Reads, writes, schedule

Product of two polynomials: arguments in A and B; result in C.

    for (int k = 0; k < 2*n - 1; k++) {
        C[k] = 0;                    // S0
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            C[i+j] += A[i] * B[j];   // S1
        }
    }

[Figure: iteration domain over axes i and j, annotated with arrays A, B, and C.]
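To make the benefit of inter-tile reuse concrete on this example, the following C sketch counts loads for S1 under two policies: reloading each tile's whole live-in set, and loading an element only the first time any tile needs it (Rule 1). It is a brute-force count, not the parametric analysis of the talk. The assumptions are loud ones: only S1 is offloaded, S0 has already run on the host so that C is live-in, the tiles are b-by-b squares visited in lexicographic order, and the names load_counts, touch, and count_loads are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Number of element loads for the offloaded S1 kernel under two policies. */
struct load_counts {
    long per_tile;   /* each tile reloads everything it reads */
    long inter_tile; /* an element is loaded only on its first use overall */
};

static long min_long(long a, long b) { return a < b ? a : b; }

/* Records an access to element idx and counts the loads it triggers. */
static void touch(bool *in_tile, bool *ever, long idx, struct load_counts *c)
{
    if (!in_tile[idx]) { in_tile[idx] = true; c->per_tile++; }
    if (!ever[idx])    { ever[idx] = true;    c->inter_tile++; }
}

struct load_counts count_loads(long n, long b)
{
    assert(n > 0 && b > 0);

    /* One flag per element: A at offset 0, B at offset n, C at offset 2n. */
    size_t total = (size_t)(4 * n - 1); /* n + n + (2n - 1) elements */
    bool *in_tile = calloc(total, sizeof *in_tile);
    bool *ever = calloc(total, sizeof *ever);
    struct load_counts c = {0, 0};
    assert(in_tile != NULL && ever != NULL);

    /* I and J are tile origins (multiples of b), not tile indices. */
    for (long I = 0; I < n; I += b) {
        for (long J = 0; J < n; J += b) {
            memset(in_tile, 0, total * sizeof *in_tile); /* new tile */
            for (long i = I; i < min_long(I + b, n); i++) {
                for (long j = J; j < min_long(J + b, n); j++) {
                    touch(in_tile, ever, i, &c);             /* A[i]   */
                    touch(in_tile, ever, n + j, &c);         /* B[j]   */
                    touch(in_tile, ever, 2 * n + i + j, &c); /* C[i+j] */
                }
            }
        }
    }

    free(in_tile);
    free(ever);
    return c;
}
```

For n = 8 and b = 4, each of the four tiles touches 4 elements of A, 4 of B, and 7 of C, so per_tile is 60, while inter_tile is 8 + 8 + 15 = 31: every element is loaded exactly once.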
