

SLIDE 1

A Note on the Performance Distribution of Affine Schedules

Louis-Noël Pouchet1, Cédric Bastoul1, John Cavazos2 and Albert Cohen1

1ALCHEMY, INRIA Futurs / University of Paris-Sud XI, France 2Computer and Information Sciences, University of Delaware, USA

January 27, 2008

2nd Workshop on Statistical and Machine learning approaches to ARchitectures and compilaTion

Göteborg, Sweden

SLIDE 2

Outline: SMART’08

Motivation

◮ Automatic performance portability: iterative compilation
◮ Search space expressiveness → bring the iterative optimization problem into the polyhedral model
◮ Trade-off between expressiveness and ease of traversal

◮ Improve the static characterization of the search space
◮ Highlight dynamic properties
◮ Validate a dedicated heuristic to traverse the space

SLIDE 3

Building the Search Space: SMART’08

The Model

Original Schedule

for (i = 0; i < n; ++i)
  for (j = 0; j < n; ++j) {
S1:   C[i][j] = 0;
      for (k = 0; k < n; ++k)
S2:     C[i][j] += A[i][k] * B[k][j];
  }

Θ^{S1} · x_{S1} = ( 1 0 0 0 ; 0 1 0 0 ) · ( i j n 1 )^T
Θ^{S2} · x_{S2} = ( 1 0 0 0 0 ; 0 1 0 0 0 ; 0 0 1 0 0 ) · ( i j k n 1 )^T
(matrix rows separated by ';')

Generated code (identical to the original):

for (i = 0; i < n; ++i)
  for (j = 0; j < n; ++j) {
    C[i][j] = 0;
    for (k = 0; k < n; ++k)
      C[i][j] += A[i][k] * B[k][j];
  }

◮ Represent Static Control Parts (control flow and dependences must be statically computable)
◮ Use a code generator (e.g. CLooG) to generate C code from the polyhedral representation (given the iteration domains + schedules)


SLIDE 6

Building the Search Space: SMART’08

The Model

Distribute loops

Θ^{S1} · x_{S1} = ( 1 0 0 0 ; 0 1 0 0 ) · ( i j n 1 )^T
Θ^{S2} · x_{S2} = ( 1 0 0 1 0 ; 0 1 0 0 0 ; 0 0 1 0 0 ) · ( i j k n 1 )^T

Generated code:

for (i = 0; i < n; ++i)
  for (j = 0; j < n; ++j)
    C[i][j] = 0;
for (i = n; i < 2*n; ++i)
  for (j = 0; j < n; ++j)
    for (k = 0; k < n; ++k)
      C[i-n][j] += A[i-n][k] * B[k][j];

◮ All instances of S1 are executed before the first S2 instance

SLIDE 7

Building the Search Space: SMART’08

The Model

Distribute loops + Interchange loops for S2

Θ^{S1} · x_{S1} = ( 1 0 0 0 ; 0 1 0 0 ) · ( i j n 1 )^T
Θ^{S2} · x_{S2} = ( 0 0 1 1 0 ; 0 1 0 0 0 ; 1 0 0 0 0 ) · ( i j k n 1 )^T

Generated code:

for (i = 0; i < n; ++i)
  for (j = 0; j < n; ++j)
    C[i][j] = 0;
for (k = n; k < 2*n; ++k)
  for (j = 0; j < n; ++j)
    for (i = 0; i < n; ++i)
      C[i][j] += A[i][k-n] * B[k-n][j];

◮ The outer-most loop for S2 becomes k

SLIDE 8

Building the Search Space: SMART’08

The Model

Illegal schedule

Θ^{S1} · x_{S1} = ( 1 0 1 0 ; 0 1 0 0 ) · ( i j n 1 )^T
Θ^{S2} · x_{S2} = ( 0 0 1 0 0 ; 0 1 0 0 0 ; 1 0 0 0 0 ) · ( i j k n 1 )^T

Generated code:

for (k = 0; k < n; ++k)
  for (j = 0; j < n; ++j)
    for (i = 0; i < n; ++i)
      C[i][j] += A[i][k] * B[k][j];
for (i = n; i < 2*n; ++i)
  for (j = 0; j < n; ++j)
    C[i-n][j] = 0;

◮ All instances of S1 are executed after the last S2 instance: the schedule violates the dependence from S1 to S2

SLIDE 9

Building the Search Space: SMART’08

The Model

A legal schedule

Θ^{S1} · x_{S1} = ( 1 0 1 0 ; 0 1 0 0 ) · ( i j n 1 )^T
Θ^{S2} · x_{S2} = ( 0 0 1 1 1 ; 0 1 0 0 0 ; 1 0 0 0 0 ) · ( i j k n 1 )^T

Generated code:

for (i = n; i < 2*n; ++i)
  for (j = 0; j < n; ++j)
    C[i-n][j] = 0;
for (k = n+1; k <= 2*n; ++k)
  for (j = 0; j < n; ++j)
    for (i = 0; i < n; ++i)
      C[i][j] += A[i][k-n-1] * B[k-n-1][j];

◮ Delay the S2 instances
◮ Constraints must be expressed between Θ^{S1} and Θ^{S2}

SLIDE 10

Building the Search Space: SMART’08

The Model

Implicit fine-grain parallelism

Θ^{S1} · x_{S1} = ( 1 0 0 0 ) · ( i j n 1 )^T
Θ^{S2} · x_{S2} = ( 0 0 1 1 0 ) · ( i j k n 1 )^T

Generated code:

for (i = 0; i < n; ++i)
  pfor (j = 0; j < n; ++j)
    C[i][j] = 0;
for (k = n; k < 2*n; ++k)
  pfor (j = 0; j < n; ++j)
    pfor (i = 0; i < n; ++i)
      C[i][j] += A[i][k-n] * B[k-n][j];

◮ Number of rows of Θ ↔ number of outer-most sequential loops

SLIDE 11

Building the Search Space: SMART’08

The Model

Representing a schedule

Θ^{S1} · x_{S1} = ( 1 0 1 0 ; 0 1 0 0 ) · ( i j n 1 )^T
Θ^{S2} · x_{S2} = ( 0 0 1 1 1 ; 0 1 0 0 0 ; 1 0 0 0 0 ) · ( i j k n 1 )^T

Generated code (the legal schedule of the previous slide):

for (i = n; i < 2*n; ++i)
  for (j = 0; j < n; ++j)
    C[i-n][j] = 0;
for (k = n+1; k <= 2*n; ++k)
  for (j = 0; j < n; ++j)
    for (i = 0; i < n; ++i)
      C[i][j] += A[i][k-n-1] * B[k-n-1][j];

Both per-statement schedules can be gathered into a single matrix operating on the concatenated iteration vectors:

Θ · x = ( 1 0 0 0 1 1 1 0 1 ; 0 1 0 1 0 0 0 0 0 ; 0 0 1 0 0 0 0 0 0 ) · ( i j i j k n n 1 1 )^T

SLIDE 12

Building the Search Space: SMART’08

The Model

Representing a schedule

Same combined schedule as on the previous slide; the columns of Θ partition into three classes of coefficients:

Θ · x = ( 1 0 0 0 1 1 1 0 1 ; 0 1 0 1 0 0 0 0 0 ; 0 0 1 0 0 0 0 0 0 ) · ( i j i j k n n 1 1 )^T

◮ ı — the iterator coefficients (columns i, j, i, j, k)
◮ p — the parameter coefficients (columns n, n)
◮ c — the constant coefficients (columns 1, 1)

SLIDE 13

Building the Search Space: SMART’08

The Model

Representing a schedule


Class   Transformation   Description
ı       reversal         Changes the direction in which a loop traverses its iteration range
ı       skewing          Makes the bounds of a given loop depend on an outer loop counter
ı       interchange      Exchanges two loops in a perfectly nested loop, a.k.a. permutation
p       fusion           Fuses two loops, a.k.a. jamming
p       distribution     Splits a single loop nest into many, a.k.a. fission or splitting
c       peeling          Extracts one iteration of a given loop
c       shifting         Allows to reorder loops

SLIDE 14

Building the Search Space: SMART’08

The Search Space

Challenges

◮ Completeness (combinatorial problem)
◮ Scalability (large integer polyhedra computations)

Proposed solution

◮ Philosophically close to Feautrier’s maximal fine-grain parallelism
◮ One point in the space ⇔ one distinct legal program version
◮ Bound the schedule coefficients to [−1, 1] to limit control overhead
◮ No completeness, but decent scalability
◮ Deliver a mechanism to automatically complete / correct schedules

SLIDE 15

Building the Search Space: SMART’08

The Hypothesis

Extremely large generated spaces: > 10^30 points

→ we must leverage static characteristics to build traversal mechanisms

Hypothesis:

◮ It is possible to statically order the impact of the transformation coefficients on performance, that is, to decompose the search space into subspaces where the performance variation is maximal or reduced
◮ The more a schedule dimension impacts the performance distribution, the more it is constrained

SLIDE 16

Performance Distribution: DCT Benchmark SMART’08

DCT benchmark

◮ 32×32 Discrete Cosine Transform, 5 statements, 35 dependences
◮ 2 imperfectly nested loops
◮ 3 sequential schedule dimensions in the output

Schedule dimension   ı          ı+p        ı+p+c
Dimension 1          39         66         471
Dimension 2          729        19683      531441
Dimension 3          60750      1006020    64855485
Total combined       1.7×10^9   1.3×10^12  1.6×10^16

Figure: Search Space Statistics for dct

SLIDE 17

Performance Distribution: DCT Benchmark SMART’08


◮ Search space analyzed: 66 × 19683 = 1.29×10^6 distinct legal program versions (arbitrary compositions of skewing, reversal, interchange, fusion and distribution)

SLIDE 18

Performance Distribution: DCT Benchmark SMART’08

Performance Distribution [1/2]

[Figure: Performance Distribution for DCT — (a) best/average/worst speedup for each point of Θ1; (b) raw (sorted) speedup of each point of Θ2, with Θ1 fixed to its best value; speedups range from 0.2 to 1.8.]

◮ Only 0.14% of the analyzed points achieve at least 80% of the maximal speedup
◮ Θ1 is a good discriminant for performance
◮ Variance analysis shows ı > p > c

SLIDE 19

Performance Distribution: DCT Benchmark SMART’08

Performance Distribution [2/2]

[Figure: Hardware Counters Distribution for DCT — (a) L1 accesses, (b) L2 accesses, (c) branch count, each plotted as best/average/worst per point of the first schedule dimension.]

◮ The L1 access count captures the shape of the performance distribution
◮ The branch count shows the control overhead introduced
◮ The origin of the performance improvement is opaque most of the time:
  ◮ interaction with the compiler (triggering optimizations)
  ◮ better use of processor features

SLIDE 20

Performance Distribution: Highly Constrained Benchmarks SMART’08

Search Space Statistics

Benchmark   #St.   #Deps.   #Dim.   ı   ı+p   ı+p+c
latnrm      11     75       3       1   9     27
fir         4      36       2       1   9     18
lmsfir      9      112      2       1   9     27
iir         8      66       3       1   9     18

Figure: Search Space Statistics

◮ Only one sequence of interchange + skewing + reversal is possible for the outer-most loop
◮ Highly constrained benchmarks: a side effect of the search space construction algorithm
◮ The search space must be computed to detect the pattern

SLIDE 21

Performance Distribution: Highly Constrained Benchmarks SMART’08

Performance Distribution

[Figure: Performance Distribution for 3 UTDSP benchmarks — best/average/worst speedup per point of the first schedule dimension for (a) iir, (b) lmsfir, (c) latnrm.]

◮ Significant speedup to discover
◮ The performance distribution is almost flat
◮ The final variance analysis confirms the base hypothesis

SLIDE 22

Performance Distribution: Heuristic Traversal of the Search Space SMART’08

Results of the Decoupling Heuristic

◮ Capitalize on the performance distribution ordering: propose a decoupling heuristic mechanism
◮ Principle: iterate first on the most performance-impacting coefficients, then use a completion algorithm for the non-explored coefficients

Benchmark   #Inst.   #Loops   ı     Space       Id Best   Speedup
dct         5        6        39    1.6×10^16   46        57.1%
matmult     2        3        76    912         16        42.87%
lpc         12       7        243   > 10^25     489       31.15%
edge-c2d    3        4        1     5.6×10^15   11        5.58%
iir         8        2        1     > 10^19     34        37.50%
fir         4        2        1     9.5×10^7    33        40.24%
lmsfir      9        3        1     2.8×10^8    51        30.98%
latnrm      11       3        1     > 10^22     6         15.11%

Figure: Heuristic Performance for AMD Athlon

◮ Near space-optimal speedup discovered in at most 51 runs for SCoPs of fewer than 10 statements

SLIDE 23

Conclusion: SMART’08

Conclusion

Properties of the search space

◮ "Classical" transformations are usually associated with specific schedule coefficients
◮ The classes of schedule coefficients (ı, p, c) map into subspaces ordered w.r.t. performance variation
◮ Schedule rows map into subspaces ordered w.r.t. performance
◮ Very low density of the best transformations (0.xx%)

Application

◮ Partition the optimization space to narrow the search
◮ Motivate a heuristic traversal leveraging these characteristics
◮ Validated on Intel x86_32, AMD x86_64, embedded MIPS32 (Au1500), embedded VLIW (ST231)

SLIDE 24

Conclusion: SMART’08

Ongoing Work

◮ Scalability: use a genetic algorithm traversal for the larger SCoPs
  ◮ Legality-preserving operators
◮ Expressiveness: integrate tiling by means of permutability constraints
  ◮ New (static/dynamic) properties of the search space
◮ Parallelism: express coarse-grain parallelism thanks to tiling
  ◮ New search algorithms