socao source to source opencl compiler for intel altera
play

SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs - PowerPoint PPT Presentation

SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs Johanna Rohde 1 , Marcos Martinez-Peir 2 , Rafael Gadea-Girons 2 Date: 7.9.2017 1 Computer Systems Group, TU Darmstadt 2 Department of Electronic Engineering, Universitat


  1. SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs Johanna Rohde 1 , Marcos Martinez-Peiró 2 , Rafael Gadea-Gironés 2 Date: 7.9.2017 1 Computer Systems Group, TU Darmstadt 2 Department of Electronic Engineering, Universitat Politècnica de València 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1

  2. Overview ● Introduction ● Background ● Design ● Implementation ● Evaluation ● Conclusion 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 2

  3. Introduction Problem: Accelerate a program with an FPGA ● How do I program the FPGA? ● How do I communicate with the FPGA? ● How much time do I need to rewrite the code? 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 3

  4. Introduction Low Level OpenCL SOCAO Compiler ASIC Tools for FPGA for C to OpenCL FPGA Programmers Parallel Programmers Sofuware Programmers 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 4

  5. Background ● OpenCL Open programming standard for heterogeneous parallel systems – Calculations are passed to external accelerator – Accelerator can be – ● CPU ● GPU ● FPGA ● ... 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 5

  6. Background OpenCL Platform ● Host – Manages the system ● Host Is connected to one or more ● compute devices Compute Device – Compute Device Executes a kernel Compute Unit Compute Unit Compute Unit ● PE PE PE PE PE PE Consists of multiple compute ... ... ... ... ● units PE PE PE PE PE PE Compute Unit – Consists of multiple processing ● elements 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 6

  7. Background OpenCL ● Memory Model – Global Memory ● Used to transfer data ● Accessible by all work groups DEVICE Global ● Normally the slowest memory Memory WORK GROUP WORK Host HOST – Constant Memory ITEM Memory Local Private Memory Memory ● Used to save constants Constant WORK ITEM Memory – Local Memory Private Memory ... ● Accessible by all work items of one work group ● Not accessible by host – Private Memory ● Accessible by one work item ● Holds intermediate values 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 7

  8. Background OpenCL Host Program Flow ● Host OpenCL Host OpenCL Program Kernel Program Kernel Set Kernel Write Buffer Arguments Initialize Load Kernel Environment Launch Kernel Read Buffer Execution 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 8

  9. Background OpenCL for FPGAs ● 2 additional forms of parallelism ● Instruction-level parallelism – Instructions that are independent of each other can be calculated at the same time a b + a b c d c d + + + x = ( a + b )*( c + d ); y = x - e ; * e 4 * time e z = x << 4 ; - << - y z y 4 << z 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 9

  10. Background OpenCL for FPGAs ● Loop Pipelining – Iterations are overlapped – Ideal case: One Iteration per clock cycle – Problem when the loop has loop-carried dependencies time +/+ * <</- for( int a = 0 ; a < 4 ; a ++) iteration 1: { +/+ * <</- iteration 2: x = ( a + b )*( c + d ); +/+ * <</- y = x << 4 ; iteration 3: z = x - e ; +/+ * <</- } iteration 4: 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 10

  11. Background OpenCL for FPGAs ● Intel’s SDK for OpenCL Host Code OpenCL Accelerator Code void sum ( int * A , int * B , __kernel void void sum ( int * A , int * B , __kernel void int * res , int size ) sum ( __global int * A , int * res , int size ) sum ( __global int * A , { __global int * B , { __global int * B , clEnqueueWriteBuffer (...); __global int * res , clEnqueueWriteBuffer (...); __global int * res , ClEnqueueTask (...); int size ) ClEnqueueTask (...); int size ) ClEnqueueReadBuffer (...); { ClEnqueueReadBuffer (...); { } for( int i = 0 ; i < size ; i ++) } for( int i = 0 ; i < size ; i ++) Intel Intel res [ i ] = A [ i ] + B [ i ]; res [ i ] = A [ i ] + B [ i ]; Cross Cross Offmine } Offmine } Compiler Compiler Compiler Compiler . exe . exe . aocx . aocx 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 11

  12. Design Input Program //Altera_OpenCL_Accelerate //Altera_OpenCL_Accelerate //Altera_OpcnCL_size A size //Altera_OpcnCL_size A size //Altera_OpenCL... //Altera_OpenCL... void sum ( int * A , int * B , int * res , int size ) void sum ( int * A , int * B , int * res , int size ) { { for( int i = 0 ; i < size ; i ++) for( int i = 0 ; i < size ; i ++) res [ i ] = A [ i ]+ B [ i ]; res [ i ] = A [ i ]+ B [ i ]; } } SOCAO Compiler SOCAO Compiler SOCAO Compiler SOCAO Compiler Host Code OpenCL Accelerator Code void sum ( int * A , int * B , __kernel void void sum ( int * A , int * B , __kernel void int * res , int size ) aocl_generated_kernel ( int * res , int size ) aocl_generated_kernel ( { __global int * A , { __global int * A , clEnqueueWriteBuffer ( ... ); __global int * B , clEnqueueWriteBuffer ( ... ); __global int * B , clEnqueueTask ( … ); __global int * res , clEnqueueTask ( … ); __global int * res , clEnqueueReadBuffer ( … ); int size ) clEnqueueReadBuffer ( … ); int size ) } { } { for( int i = 0 ; i < size ; i ++) for( int i = 0 ; i < size ; i ++) res [ i ] = A [ i ] + B [ i ]; res [ i ] = A [ i ] + B [ i ]; } } 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 12

  13. Implementation Front-end Middle-end Back-end Semantic Check Unparse Start Function Detection yes Parse Successful End Function Analysis & no yes Transformation Successful Abort no Host Program Abort Transformation Kernel Creation • ROSE Framework – Open-source compiler framework – Provides front-end, back-end and additional functionalities 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 13

  14. Implementation Front-end Middle-end Back-end Semantic Check Unparse Start Function Detection yes Parse Successful End Function Analysis & no yes Transformation Successful Abort no Host Program Abort Transformation Kernel Creation • ROSE Framework – Open-source compiler framework – Provides front-end, back-end and additional functionalities 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 14

  15. Implementation Inline Constant Value Constant Constant Array 2D to 1D Array Transformation Transformation Folding Analysis Transformation Memory Typedef Parameter Loop Unrolling In/Out Analysis Analysis Analysis Analysis ● The Function Analysis & Transformation phase is the most important ● All decisions are made during this phase ● Consists of 10 analysis/transformation steps 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 15

  16. Implementation Inline Constant Value Constant Constant Array 2D to 1D Array Transformation Transformation Folding Analysis Transformation Memory Typedef Parameter Loop Unrolling In/Out Analysis Analysis Analysis Analysis void vector_process ( char * input , void vector_update ( char * input , char value ) int ilen ) { { int i ; { for( i = 0 ; i < 64 ; i ++) char value_1 = 'c' ; input [ i ] += value ; int i ; } for( i = 0 ; i < 64 ; i ++) input [ i ] += value_1 ; void vector_update ( char * input , } int ilen ) } { vector_process ( input , 'c' ); } 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 16

  17. Implementation Inline Constant Value Constant Constant Array 2D to 1D Array Transformation Transformation Folding Analysis Transformation Memory Typedef Parameter Loop Unrolling In/Out Analysis Analysis Analysis Analysis program.cpp aocl_kernel.cl const uint32_t K [] = { 0x428A2F98 , __constant const uint32_t K [] = { …}; 0x428A2F98 , …}; //Altera_OpenCL_Accelerate __kernel void //Altera_OpenCL_size K 64 aocl_generated_kernel ( …. ) //Altera_OpenCL_const_vec K { … … void update_accelerated ( … ) } 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 17

  18. Implementation Inline Constant Value Constant Constant Array 2D to 1D Array Transformation Transformation Folding Analysis Transformation Memory Typedef Parameter Loop Unrolling In/Out Analysis Analysis Analysis Analysis #define WIDTH 512 #define WIDTH 512 int [ 5 ][ WIDTH ] A ; __kernel void aocl_generated_kernel ( int //Altera_OpenCL_Accelerate __global __restrict__ * A , ...) //Altera_OpenCL_size A 2560 { void func (...) ... { int tmp = A [ 2 * WIDTH + 3 ]; int tmp = A [ 2 ][ 3 ]; } } 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 18

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend