

slide-1
SLIDE 1

Polyhedral-Based Data Reuse Optimization for Configurable Computing

Louis-Noël Pouchet¹  Peng Zhang¹  P. Sadayappan²  Jason Cong¹

¹ University of California, Los Angeles  ² The Ohio State University

February 12, 2013

ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Monterey, CA


slide-7
SLIDE 7

Overview: FPGA’13

Overview

The current situation:

◮ Tremendous improvements in FPGA capacity/speed/energy
◮ But off-chip communication remains very costly, and on-chip memory is scarce
⇒ Our solution: an automatic, resource-aware data reuse optimization framework (combining loop transformations, on-chip buffers, and communication generation)
◮ HLS/ESL tools have made great progress (e.g., AutoESL/Vivado)
◮ But extensive manual effort is still needed for best performance
⇒ Our solution: a complete HLS-focused source-to-source compiler
◮ Numerous previous research works on C-to-FPGA (PICO, DEFACTO, MMAlpha, etc.) and data reuse optimization
◮ But (strong) limitations in applicability / transformations supported / performance achieved
⇒ Our solution: unleash the true power of the polyhedral framework (loop transformations, communication scheduling, etc.)

UCLA / OSU 2

slide-8
SLIDE 8

The Polyhedral Model: FPGA’13

The Polyhedral Model in a Nutshell

Affine program regions:

◮ Loops have affine control only (over-approximation otherwise)

⊲ Image processing, including the medical imaging pipeline (NSF CDSC project)
⊲ Linear algebra
⊲ Iterative solvers (PDE, etc.)


slide-9
SLIDE 9

The Polyhedral Model: FPGA’13

The Polyhedral Model in a Nutshell

Affine program regions:

◮ Loops have affine control only (over-approximation otherwise)
◮ Iteration domain: represented as integer polyhedra

for (i = 1; i <= n; ++i)
  for (j = 1; j <= n; ++j)
    if (i <= n-j+2)
      s[i] = ...;

$$ \mathcal{D}_{S1} : \begin{pmatrix} 1 & 0 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & -1 & 1 & 0 \\ -1 & -1 & 1 & 2 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \ge \vec{0} $$

slide-10
SLIDE 10

The Polyhedral Model: FPGA’13

The Polyhedral Model in a Nutshell

Affine program regions:

◮ Loops have affine control only (over-approximation otherwise)
◮ Iteration domain: represented as integer polyhedra
◮ Memory accesses: static references, represented as affine functions of $\vec{x}_S$ and $\vec{p}$

for (i = 0; i < n; ++i) {
  s[i] = 0;
  for (j = 0; j < n; ++j)
    s[i] = s[i] + a[i][j] * x[j];   // S2
}

With $\vec{x}_{S2} = (i, j)$:

$$ f_s(\vec{x}_{S2}) = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \qquad f_a(\vec{x}_{S2}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \qquad f_x(\vec{x}_{S2}) = \begin{pmatrix} 0 & 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} $$

slide-11
SLIDE 11

The Polyhedral Model: FPGA’13

The Polyhedral Model in a Nutshell

Affine program regions:

◮ Loops have affine control only (over-approximation otherwise)
◮ Iteration domain: represented as integer polyhedra
◮ Memory accesses: static references, represented as affine functions of $\vec{x}_S$ and $\vec{p}$
◮ Data dependence between S1 and S2: a subset of the Cartesian product of $\mathcal{D}_{S1}$ and $\mathcal{D}_{S2}$ (exact analysis)

for (i = 1; i <= 3; ++i) {
  s[i] = 0;                // S1
  for (j = 1; j <= 3; ++j)
    s[i] = s[i] + 1;       // S2
}

$$ \mathcal{D}_{S1 \delta S2} : \begin{pmatrix} 1 & -1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} = 0, \qquad \begin{pmatrix} 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 3 \\ 0 & 1 & 0 & -1 \\ 0 & -1 & 0 & 3 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 3 \end{pmatrix} \cdot \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \ge \vec{0} $$

(Figure: S1 and S2 iterations plotted along i.)

slide-12
SLIDE 12

The Polyhedral Model: FPGA’13

The Polyhedral Model in a Nutshell

Affine program regions:

◮ Loops have affine control only (over-approximation otherwise)
◮ Iteration domain: represented as integer polyhedra
◮ Memory accesses: static references, represented as affine functions of $\vec{x}_S$ and $\vec{p}$
◮ Data dependence between S1 and S2: a subset of the Cartesian product of $\mathcal{D}_{S1}$ and $\mathcal{D}_{S2}$ (exact analysis)

Polyhedral compilation:

◮ Precise dataflow analysis [Feautrier, 88]
◮ Optimal algorithms for data locality [Bondhugula, 08]
◮ Effective code generation [Bastoul, 04]
◮ Computationally expensive algorithms (ILP/PIP)

slide-13
SLIDE 13

Data Reuse Optimization: FPGA’13

Step 1: Scheduling for Better Data Reuse

◮ Main idea: schedule operations accessing the same data as close as possible to each other
◮ Tiling is useful, but not all programs are tilable by default!
 ⊲ A complex sequence of loop transformations is needed to enable tiling
 ⊲ The Tiling Hyperplane method automatically finds such a sequence
 ⊲ Uses an ILP formulation of the optimization problem
◮ In our software, the first stage is to transform the input code so that:
 1. The number of tilable "loops" is maximized
 2. Temporal data locality is maximized
 3. All tilable loops can be tiled with an arbitrary tile size

slide-14
SLIDE 14

Data Reuse Optimization: FPGA’13

Step 2: Reuse Data Using On-Chip Buffers

Key ideas:

◮ Compute the set of data used at a given loop iteration
◮ Reuse data between consecutive loop iterations
◮ The process works for any loop in the program
◮ Natural complement of tiling: the tile size determines how much data is read by a non-inner-loop iteration
◮ The polyhedral framework can be used to easily compute all this information, including what to communicate

slide-15
SLIDE 15

Data Reuse Optimization: FPGA’13

Computing the Per-Iteration Data Reuse

(Figure: 5-point stencil footprint on the (i, j) grid.)

// Two-dimensional Jacobi-like stencil
for (t = 0; t < T; ++t)
  for (i = 0; i < N; ++i)
    for (j = 0; j < N; ++j)
      B[i][j] = 0.2 * (A[i][j-1] + A[i][j] + A[i][j+1]
                     + A[i-1][j] + A[i+1][j]);

slide-16
SLIDE 16

Data Reuse Optimization: FPGA’13

Computing the Per-Iteration Data Reuse

(Figure: data space of A at one iteration, highlighted on the (i, j) grid.)

Compute the data space of A at iteration $\vec{x} = (t, i, j)$:

$$ DS_A(\vec{x}) = \bigcup_{s \in S} F^s_A(\vec{x}) $$

$F(\vec{x})$ is the image of $\vec{x}$ by the function $F$.

slide-17
SLIDE 17

Data Reuse Optimization: FPGA’13

Computing the Per-Iteration Data Reuse

(Figure: data space of A at the previous iteration, on the (i, j) grid.)

Compute the data space of A at iteration $\vec{y} = (t, i, j-1)$:

$$ DS_A(\vec{y}) = \bigcup_{s \in S} F^s_A(\vec{y}) $$

slide-18
SLIDE 18

Data Reuse Optimization: FPGA’13

Computing the Per-Iteration Data Reuse

(Figure: reused data shown as the red set on the (i, j) grid.)

Reused data: red set

$$ \mathrm{ReuseSet} = DS_A(\vec{x}) \cap DS_A(\vec{y}) $$

slide-19
SLIDE 19

Data Reuse Optimization: FPGA’13

Computing the Per-Iteration Data Reuse

(Figure: per-iteration communication shown as the blue set on the (i, j) grid.)

Per-iteration communication: blue set

$$ \mathrm{PerCommSet} = DS_A(\vec{x}) - \mathrm{ReuseSet} $$

slide-20
SLIDE 20

Data Reuse Optimization: FPGA’13

Computing the Per-Iteration Data Reuse

(Figure: stencil footprint on the (i, j) grid.)

These sets are parametric polyhedral sets:

◮ Use CLooG to scan them
◮ Works for any value of t, i, j
→ an initialization copy is executed before the first iteration of the loop, and communications are done at each iteration

slide-21
SLIDE 21

Data Reuse Optimization: FPGA’13

Computing the Per-Iteration Data Reuse

(Figure: buffer set highlighted on the (i, j) grid.)

Buffer set: full blue set (data space at (t, i, j))

slide-22
SLIDE 22

Data Reuse Optimization: FPGA’13

Quick Overview of the Full Algorithm

1. For each array and each loop, compute:
 ⊲ the buffer polyhedron
 ⊲ the per-iteration communication polyhedron
2. For a given array, find the loop which minimizes the communication volume with a buffer fitting the FPGA resources
3. Make the entire program use on-chip arrays (buffers)
 ◮ Example: A[i][j] = A[i][j+1] becomes, for a buffer A_l[bs1][bs2]:
   A_l[i % bs1][j % bs2] = A_l[i % bs1][(j+1) % bs2]
4. Insert the code scanning the polyhedral sets into the program
 ◮ Example of a copy-in statement: A_l[i % bs1][j % bs2] = A[i][j];

slide-23
SLIDE 23

High-Level Synthesis: FPGA’13

Step 3: HLS-specific Optimizations

For good performance, numerous complementary optimizations are needed:

◮ Reduce the II of inner loops by forcing innermost parallel loops
 ⊲ Use polyhedral-based parallelization methods
◮ Exhibit usable task-level parallelism
 ⊲ Use polyhedral-based analysis, and factor the tasks into functions
◮ Overlap communication and computation
 ⊲ Use FIFO communication modules, and also scan the polyhedral communication sets in prefetch functions to issue requests
◮ Find the best tile size / shape for a program
 ⊲ Create a machine-specific, accurate communication latency model
 ⊲ Run AutoESL on a variety of tile sizes, and retain the best one

slide-24
SLIDE 24

Experimental Results: FPGA’13

Performance Results

(Plots: total execution time, in cycles, vs. total BRAMs, in 16 kB blocks; Pareto-optimal design points for Denoise, Segmentation, and DGEMM.)

Benchmark     | Description                            | basic off-chip | PolyOpt    | hand-tuned [17]
------------- | -------------------------------------- | -------------- | ---------- | ---------------
denoise       | 3D Jacobi+Seidel-like 7-point stencils | 0.02 GF/s      | 4.58 GF/s  | 52.0 GF/s
segmentation  | 3D Jacobi-like 7-point stencils        | 0.05 GF/s      | 24.91 GF/s | 23.39 GF/s
DGEMM         | matrix-multiplication                  | 0.04 GF/s      | 22.72 GF/s | N/A
GEMVER        | sequence of matrix-vector operations   | 0.10 GF/s      | 1.07 GF/s  | N/A

◮ Convey HC-1 (4 Xilinx Virtex-6 FPGAs), total bandwidth up to 80 GB/s
◮ AutoESL version 2011.1, using the memory/control interfaces provided by Convey
◮ Core design frequency: 150 MHz; off-chip memory frequency: 300 MHz

slide-25
SLIDE 25

Software Infrastructure: FPGA’13

PolyOpt/HLS

Toolchain components:

◮ Parser: C-to-AST
◮ Unparser: AST-to-C
◮ PolyParser: AST-to-polyhedra
◮ PolyUnparser: PAST-to-AST
◮ Outliner: restructures code for HLS
◮ Candl: dependence analysis
◮ Pluto: transformations for tilability
◮ vectorizer: transformations for inner-loop parallelism
◮ LMP: buffer and communication generation
◮ CLooG: polyhedra-to-PAST

Representations and infrastructure: PIPLib and ISL; C code; Sage AST (ROSE); SCoP (polyhedral representation); PAST (polyhedral AST). Input: a full C program; output: an HLS-friendly C program. Built on PoCC, the Polyhedral Compiler Collection; PolyOpt, a polyhedral optimizer for the ROSE compiler; and the ROSE compiler infrastructure (LLNL).

More at http://www.cs.ucla.edu/~pouchet/software/polyopthls

slide-26
SLIDE 26

Conclusion: FPGA’13

Conclusions

Take-home message:

◮ Affine programs are an excellent fit for FPGA/HLS
◮ Recent progress in HLS tools lets compiler researchers target FPGA optimization
◮ A complete, end-to-end framework has been implemented and its effectiveness demonstrated

Future work:

◮ Use analytical models for tile size selection
◮ Further improve performance with additional optimizations
◮ Support more machines/FPGAs (currently developed for the Convey HC-1)
◮ Improve polyhedral code generation for HLS/FPGAs

UCLA / OSU 12