Polyhedral Transformations of Explicitly Parallel Programs
  1. Polyhedral Transformations of Explicitly Parallel Programs
     Prasanth Chatarasi, Jun Shirako, Vivek Sarkar
     Habanero Extreme Scale Software Research Group, Department of Computer Science, Rice University
     IMPACT Workshop, January 19, 2015

  2. Outline
     1 Introduction
     2 Explicit Parallelism and Motivation
     3 Our Approach
     4 Preliminary Results
     5 Related Work
     6 Conclusions, Future Work and Acknowledgments

  3. Introduction
     - Software with explicit parallelism is on the rise.
     - Two major compiler approaches to program optimization: AST-based and polyhedral-based.
     - Past work on transformations of parallel programs has used AST-based approaches, e.g., [Nicolau et al. 2009], [Nandivada et al. 2013].
     - What about polyhedral frameworks for analysis and transformation of explicitly parallel programs?

  4. Introduction
     - Explicit parallelism differs from sequential execution: it specifies a partial order instead of a total order.
     - No execution order among parallel portions → no dependence.
     - For the compiler, explicit parallelism can mitigate the imprecision that accompanies unanalyzable data accesses from a variety of sources (illustrated in the sketch below):
       - Unrestricted pointer aliasing
       - Unknown function calls
       - Non-affine constructs: non-affine expressions in array subscripts, indirect array subscripts, non-affine loop bounds
       - Use of structs
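     A minimal sketch (example and names are mine, not from the slides) of accesses that defeat static dependence analysis:

       /* Indirect subscript: idx[i] is unknown at compile time, so a conservative
          analysis must assume any two iterations may write the same element of A. */
       for (int i = 0; i < n; i++)
           A[idx[i]] += f(B[i]);        /* f is an externally defined, unanalyzed function */

       /* Non-affine loop bound and non-affine subscript. */
       for (int i = 0; i < n * n; i++)
           C[i * i] = 0.0;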

  5. Outline — Section 2: Explicit Parallelism and Motivation

  6. Explicit Parallelism
     - Logical parallelism is a specification of a partial order, referred to as a happens-before relation:
       HB(S1, S2) = true ↔ S1 must happen before S2
     - Currently, we focus on explicitly parallel programs that satisfy the serial-elision property:
       - Doall parallelism
       - Doacross parallelism
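     A minimal sketch (mine, not from the slides) of the serial-elision property: deleting the parallel construct leaves a valid sequential program whose result matches a legal execution of the parallel one.

       void axpy(int n, double a, const double *x, double *y) {
           #pragma omp parallel for      /* elide this line -> a correct serial program */
           for (int i = 0; i < n; i++)
               y[i] = a * x[i] + y[i];
       }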

  7. Explicit Parallelism - Doall (OpenMP)
     - In OpenMP, Doall parallelism corresponds to the parallel for construct.
     - Happens-before relations exist only among statements in the same iteration.
     - Guarantees no cross-iteration dependence.

       #pragma omp parallel for
       for (i-loop) {
           S1;
           S2;
           S3;
       }

  8. Explicit Parallelism - Doall (OpenMP) - Example
     LU Decomposition - Rodinia benchmarks [Shuai et al. 2009]

       for (i = 0; i < size; i++) {
           #pragma omp parallel for
           for (j = i; j < size; j++) {
               #pragma omp parallel for reduction(+:a)
               for (k = 0; k < i; k++) {
                   a[i*size + j] -= a[i*size + k] * a[k*size + j];
               }
           }
           ....
       }

     - The j- and k-loops are annotated as parallel loops, and the k-loop is parallel with a reduction on array a.
     - Poor spatial locality because of the access pattern k*size + j for array a.
     - With the happens-before relations implied by doall, loop permutation can be applied to improve spatial locality.

  9. Explicit Parallelism - Doall (OpenMP) - Example
     Permuted kernel:

       for (i = 0; i < size; i++) {
           #pragma omp parallel for reduction(+:a) private(j)
           for (k = 0; k < i; k++) {
               for (j = i; j < size; j++) {
                   a[i*size + j] -= a[i*size + k] * a[k*size + j];
               }
           }
           ....
       }

     - 1.25x performance improvement on an Intel Xeon Phi coprocessor with 228 threads, for an input size of 2K.
     - The array subscripts are non-affine, but they can be made affine with delinearization, after which the permutation can be performed [Tobias et al. 2015].
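     A minimal sketch (mine, not from the slides) of what delinearization does here: the flattened subscript i*size + j is reinterpreted as a two-dimensional access, assuming the buffer a can be viewed as a size x size matrix, which makes every subscript affine in the loop iterators.

       /* Hedged illustration only: cast the flat buffer to a 2-D view. */
       double (*A2)[size] = (double (*)[size]) a;
       A2[i][j] -= A2[i][k] * A2[k][j];   /* same computation, affine subscripts */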

  10. Explicit Parallelism - Doacross (OpenMP)
     - In OpenMP, Doacross parallelism corresponds to a proposed extension [Shirako et al. 2013] of the ordered clause (appearing in OpenMP 4.1), used to specify the cross-iteration dependences of a parallelized loop.

       #pragma omp parallel for ordered(1)
       for (i-loop) {
           S1;
           #pragma omp ordered depend(sink: i-1)
           S2;
           #pragma omp ordered depend(source: i)
           S3;
       }

  11. Explicit Parallelism - Doacross (OpenMP) - Example

       // Assume array A is a nested array
       #pragma omp parallel for ordered(3)
       for (t = 0; t <= _PB_TSTEPS - 1; t++) {
           for (i = 1; i <= _PB_N - 2; i++) {
               for (j = 1; j <= _PB_N - 2; j++) {
                   #pragma omp ordered depend(sink: t, i-1, j+1) depend(sink: t, i, j-1) \
                                       depend(sink: t-1, i+1, j+1)
                   A[i][j] = (A[i-1][j-1] + A[i-1][j] + A[i-1][j+1] + A[i][j-1]
                            + A[i][j] + A[i][j+1] + A[i+1][j-1] + A[i+1][j]
                            + A[i+1][j+1]) / 9.0;
                   #pragma omp ordered depend(source: t, i, j)
               }
           }
       }

     - Two-dimensional 9-point Gauss-Seidel computation [PolyBench].
     - Annotated as a 3-D Doacross loop nest.
     - Even though the loop nest has affine accesses, C's unrestricted aliasing semantics for nested arrays can prevent a sound compiler analysis from detecting exact cross-iteration dependences.
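     A minimal sketch (example is mine) of why nested, pointer-to-pointer arrays block exact analysis: nothing in the type system prevents two row pointers from aliasing, so a sound compiler must assume distinct rows of A may overlap.

       #include <stdlib.h>
       /* A is built as an array of row pointers; a legal (if unusual) program
          could later execute A[4] = A[3];, making two rows alias.           */
       double **A = malloc(n * sizeof *A);
       for (int r = 0; r < n; r++)
           A[r] = malloc(n * sizeof **A);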

  12. Explicit Parallelism - Doacross (OpenMP) - Example (continued)
     - Same Gauss-Seidel kernel as on the previous slide.
     - Using the cross-iteration dependences specified via doacross, loop skewing and tiling can be performed to improve both locality and parallelism granularity.
     - 2.2x performance improvement on an Intel Xeon Phi coprocessor with 228 threads, for 100 time steps on a 2K x 2K matrix.
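     One possible enabling skew (my illustration; the slides do not show the generated code): mapping (t, i, j) to (t, i + t, j + 2t + i) turns the three dependence distances (0,1,-1), (0,0,1), (1,-1,-1) into (0,1,0), (0,0,1), (1,0,0), all component-wise non-negative, so rectangular tiling of the skewed loops becomes legal.

       /* Hedged sketch of the skewed (untiled) iteration space, assuming
          ii = i + t and jj = j + 2*t + i; the body is the same 9-point update. */
       for (t = 0; t <= _PB_TSTEPS - 1; t++)
           for (ii = t + 1; ii <= t + _PB_N - 2; ii++)
               for (jj = t + ii + 1; jj <= t + ii + _PB_N - 2; jj++) {
                   int i = ii - t, j = jj - t - ii;
                   A[i][j] = (A[i-1][j-1] + A[i-1][j] + A[i-1][j+1] + A[i][j-1]
                            + A[i][j] + A[i][j+1] + A[i+1][j-1] + A[i+1][j]
                            + A[i+1][j+1]) / 9.0;
               }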

  13. Outline — Section 3: Our Approach

  14. Approach - Idea
     - Overestimate dependences based on the sequential order (ignore parallel constructs).
     - Improve dependence accuracy via explicit parallelism:
       - obtain happens-before relations from the parallel constructs
       - intersect the HB relations with the conservative dependences (see the sketch below)
     - Transformations via polyhedral optimizers: PLuTo [Bondhugula et al. 2008], Poly+AST [Shirako et al. 2014].
     - Code generation with parallel constructs.
     - Focus on Doall and Doacross constructs, non-affine subscripts and indirect array subscripts.
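     A minimal sketch (example and names are mine) of the intersection step: sequential-order analysis must treat the indirect write as possibly loop-carried, while the doall annotation supplies happens-before edges only within an iteration, so intersecting the two keeps only intra-iteration dependences.

       void kernel(int n, double *A, double *B, double *C, double *D, const int *idx) {
           /* Conservative (sequential-order) view: A[idx[i]] is unanalyzable, so a
              may-dependence is recorded between every pair of iterations (i, i'). */
           #pragma omp parallel for
           for (int i = 0; i < n; i++) {
               A[idx[i]] = B[i] + C[i];   /* S1 */
               D[i]      = A[idx[i]];     /* S2: true dependence S1 -> S2 in iteration i */
           }
           /* Doall HB relation: edges exist only between statements of the same
              iteration. Intersecting it with the conservative dependences discards
              all cross-iteration may-dependences; only S1 -> S2 per iteration
              survives, so the optimizer may freely reorder iterations.            */
       }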

  15. Algorithm - Framework
     (Figure: overview of the proposed framework.)

  16. Algorithm - Motivation
     - Conservative dependence analysis needs may-information on the access range of non-affine array subscripts.
     - Our existing implementation uses the scoplib format for convenience (rather than OpenScop).
     - No support for access relations in the scoplib format (to the best of our knowledge).
     - What could represent the possible access range of a non-affine subscript in the polyhedral model?
       - An iterator? It cannot be — the expression is not part of any loop.
       - A parameter? It cannot be — the expression is not loop-invariant.

  17. Approach - Dummy vector
     - Approach: use dummy variables to overestimate the access range of non-affine subscripts.
     - Each dummy variable corresponds to one non-affine expression.
     - Compute conservative dependences via the dummy variables.
     - Dummy vector = vector of the dummy variables from the same SCoP.
     - Each dynamic instance of a statement S is uniquely identified by the combination of:
       - its iteration vector (i_S)
       - its dummy vector (d_S)
       - the parameter vector (p)
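     A minimal sketch (kernel and names are mine, not the paper's) of how a dummy variable models a non-affine subscript:

       /* The subscript B[i] is non-affine, so in the polyhedral model it is
          replaced by a dummy variable d whose range over-approximates the values
          B[i] can take, e.g. 0 <= d < M. Each dynamic instance of S is then
          identified by (i; d; n, M) -- iteration vector, dummy vector, parameter
          vector -- and dependences are computed conservatively over all d.      */
       for (int i = 0; i < n; i++)
           A[B[i]] = A[B[i]] + 1.0;     /* S: modeled as an access to A[d] */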
