openmp device offloading to fpga accelerators
play

OpenMP Device Offloading to FPGA Accelerators Lukas Sommer, Jens - PowerPoint PPT Presentation

OpenMP Device Offloading to FPGA Accelerators Lukas Sommer, Jens Korinth, Andreas Koch Motivation Increasing use of heterogeneous systems to overcome CPU power limitations 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth,


  1. OpenMP Device Offloading to FPGA Accelerators Lukas Sommer, Jens Korinth, Andreas Koch

  2. Motivation ● Increasing use of heterogeneous systems to overcome CPU power limitations 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 2 2

  3. Motivation ● Increasing use of heterogeneous systems to overcome CPU power limitations ● FPGAs increasingly used for implementation of accelerators in HPC systems (e.g. Microsoft Azure) 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 3 2

  4. Motivation ● Increasing use of heterogeneous systems to overcome CPU power limitations ● FPGAs increasingly used for implementation of accelerators in HPC systems (e.g. Microsoft Azure) ● Programming of heterogeneous systems is non-trivial 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 4 2

  5. Motivation ● Increasing use of heterogeneous systems to overcome CPU power limitations ● FPGAs increasingly used for implementation of accelerators in HPC systems (e.g. Microsoft Azure) ● Programming of heterogeneous systems is non-trivial ● Desirable: Programming with a single, portable code base 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 5 2

  6. OpenMP Device Offloading ● Denote target regions to execute on device #pragma omp target \ map(to:x[0:SIZE]) \ map(tofrom:y[0:SIZE]) { #pragma omp parallel for[...] for(i=0; i<SIZE; i++){ Target Region y[i] = a*x[i]+y[i]; } } 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 6 3

  7. OpenMP Device Offloading ● Denote target regions to execute on device #pragma omp target \ map(to:x[0:SIZE]) \ ● Specify which and how map(tofrom:y[0:SIZE]) data is transferred to { device memory #pragma omp parallel for[...] for(i=0; i<SIZE; i++){ Target Region y[i] = a*x[i]+y[i]; } } 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 7 3

  8. OpenMP Device Offloading ● Denote target regions to execute on device #pragma omp target \ map(to:x[0:SIZE]) \ ● Specify which and how map(tofrom:y[0:SIZE]) data is transferred to { device memory #pragma omp parallel for[...] for(i=0; i<SIZE; i++){ ● Use additional parallel y[i] = a*x[i]+y[i]; } constructs inside target } region (also target- specific, e.g. teams, distribute ,...) 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 8 3

  9. Goal ● Implement mapping of target regions to FPGA accelerators in LLVM Clang – Preserve FPGA-specific pragmas (e.g. Vivado HLS) – Automated flow from OpenMP-annotated input program to FPGA bitstream + software executable 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 9 4

  10. Goal ● Implement mapping of target regions to FPGA accelerators in LLVM Clang – Preserve FPGA-specific pragmas (e.g. Vivado HLS) – Automated flow from OpenMP-annotated input program to FPGA bitstream + software executable ● Extend LLVM OpenMP Runtime – Manage data-transfers between host and FPGA – Control device execution on the FPGA accelerator 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 10 4

  11. ThreadPoolComposer ● Toolchain to fast-track implementation of FPGA-based accelerators in heterogeneous systems ● Synthesize accelerator from kernel code TPC is available as open source: https://goo.gl/qTsU3B 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 11 5

  12. ThreadPoolComposer ● Toolchain to fast-track implementation of FPGA-based accelerators in heterogeneous systems ● Assemble (multiple) instances of different kernels in top-level design, combined with standardized host- and memory connection TPC is available as open source: https://goo.gl/qTsU3B 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 12 5

  13. ThreadPoolComposer ● Toolchain to fast-track implementation of FPGA-based accelerators in heterogeneous systems ● Control execution and data- transfer using two-layered API – Higher-level TPC API is device/platform-agnostic, allows for portable implementation (write once, run everywhere) TPC is available as open source: https://goo.gl/qTsU3B 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 13 5

  14. Compilation Flow ● Start from a single, portable source file 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 14 6

  15. Compilation Flow ● Start from a single, portable source file ● Standard host- compilation, including fallback if offloading fails 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 15 6

  16. Compilation Flow ● Start from a single, portable source file ● Standard host- compilation, including fallback if offloading fails ● One device-specific compilation flow per device type – Limited to extracted target regions 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 16 6

  17. Compilation Flow ● Custom Clang toolchain for TPC-based offloading to FPGA accelerators – Identified with new LLVM target triple – Preserves FPGA- specific pragmas, e.g. Vivado HLS pragmas – Yields three artifacts 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 17 6

  18. Compilation Flow ● TPC-specific software binary – Entry point for FPGA device execution – Transfers kernel arguments and launches hardware execution using TPC API 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 18 6

  19. Compilation Flow ● TPC-specific software binary – Entry point for FPGA device execution – Transfers kernel arguments and launches hardware execution using TPC API – Included in the combined binary 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 19 6

  20. Compilation Flow ● Hardware kernel code extracted from target region 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 20 6

  21. Compilation Flow ● Hardware kernel code extracted from target region ● Description of input argument types 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 21 6

  22. Compilation Flow ● Hardware kernel code extracted from target region ● Description of input argument types ● TPC automates synthesis from kernel code and description to full FPGA design No additional user input required! 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 22 6

  23. Compilation Flow ● Hardware kernel code extracted from target region ● Description of input argument types ● TPC automates synthesis from kernel code and description to full FPGA design ● Resulting bitstream features standardized host- and memory connection 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 23 6

  24. Runtime Flow Components: Host 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 24 7

  25. Runtime Flow Components: ● LLVM OpenMP Runtime Infrastructure Host 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 25 7

  26. Runtime Flow Components: ● LLVM OpenMP Runtime Infrastructure ● TPC-based plugin for Host LLVM OpenMP Runtime 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 26 7

  27. Runtime Flow Components: ● LLVM OpenMP Runtime Infrastructure ● TPC-based plugin for Host LLVM OpenMP Runtime ● TPC-specific software binary resulting from compilation 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 27 7

  28. Runtime Flow Components: ● LLVM OpenMP Runtime Infrastructure ● TPC-based plugin for Host LLVM OpenMP Runtime ● TPC-specific software binary resulting from compilation ● FPGA abstraction as provided by TPC 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 28 7

  29. Runtime Flow ● Host-centric: Execution starts on the host If target region is encountered: Host 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 29 7

  30. Runtime Flow ● Host-centric: Execution starts on the host If target region is encountered: ● Transfer data to FPGA Host memory 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 30 7

  31. Runtime Flow ● Host-centric: Execution starts on the host If target region is encountered: ● Transfer data to FPGA Host memory ● Invoke binary ● Sets kernel arguments ● Launches hardware execution 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 31 7

  32. Runtime Flow ● Host-centric: Execution starts on the host If target region is encountered: ● Transfer data to FPGA Host memory ● Invoke binary ● Sets kernel arguments ● Launches hardware execution ● Transfer data back to host 2017-07-12 | OpenMP FPGA Device Offloading | L. Sommer, J. Korinth, A. Koch | TU Darmstadt 32 7

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend