
From Dataflow Specifications to Customised Reconfigurable Datapaths Using HLS: the OpenCL Case for FPGAs



  1. From Dataflow Specifications to Customised Reconfigurable Datapaths Using HLS: the OpenCL Case for FPGAs. Rubén Salvador [Kindly hosted by INSA: KDesnos, MPelcat, JFNezan, DMenard, LMorin …]. Universidad Politécnica de Madrid (UPM), School of Telecommunications Systems and Engineering (ETSIST), Research Center on Software Technologies and Multimedia Systems (CITSEM). Dataflow Workshop, Rennes, 12-14 December 2017

  2. Context Rubén Salvador From Dataflow to Customised FPGA Datapaths 2

  3. Context

  4. OUTLINE: Motivation; OpenCL FPGA; Dataflow Mapping On Top Of OpenCL FPGA; Implementation Steps

  5. OUTLINE: Motivation; OpenCL FPGA; Dataflow Mapping On Top Of OpenCL FPGA; Implementation Steps

  6. Dataflow: a (naive) view from a newcomer. A flow of data moves between (2) points in space, is transformed along its way, and sinks. Spatial computing: hardware datapath, FPGA on-chip memory, point-to-point comms architecture.

  7. Customised FPGA-based datapaths for dataflow graphs: Dataflow, FPGA, HLS [figure: dataflow graph with actors A, B, C]. System-level integration; wide (SW) developers love; community embrace.

  8. Conquering Computing Community Embrace: what can dataflow bring to the OpenCL community?
     OpenCL pros: functionally portable; wide community acceptance; support for HLS; …
     OpenCL cons: not dataflow (streaming) friendly; global memory comms; compute accelerator model; data offload (writes/reads); throughput oriented (vs. latency)
     Dataflow pros: graph analysis & guarantees (schedulability, deadlocks, FIFO sizing); concurrent execution model; comms interaction
     Dataflow cons: niche domain; most work targets multi/manycore
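
The "graph analysis & guarantees" listed among the dataflow pros start from the SDF balance equations. For every edge, the producer's firing count times its production rate must equal the consumer's firing count times its consumption rate. The Python sketch below solves these equations for a connected SDF graph and rejects inconsistent rates. The function name `repetition_vector` is hypothetical, and the code illustrates the standard analysis; it is not the implementation of any tool named in these slides.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def repetition_vector(actors, edges):
    """Solve q[src] * prod == q[dst] * cons for every edge (src, dst, prod, cons)."""
    rates = {actors[0]: Fraction(1)}  # fix the first actor, propagate to the others
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rates and dst not in rates:
                rates[dst] = rates[src] * prod / cons
                changed = True
            elif dst in rates and src not in rates:
                rates[src] = rates[dst] * cons / prod
                changed = True
    if len(rates) != len(actors):
        raise ValueError("graph is not connected")
    # Consistency check: every balance equation must hold (catches bad cycles).
    for src, dst, prod, cons in edges:
        if rates[src] * prod != rates[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    # Scale the fractional solution to the smallest positive integer vector.
    scale = lcm(*(r.denominator for r in rates.values()))
    q = {a: int(r * scale) for a, r in rates.items()}
    g = gcd(*q.values())
    return {a: n // g for a, n in q.items()}
```

Consider the chain A -(2:3)-> B -(1:2)-> C. Its smallest consistent iteration fires A three times, B twice and C once. If the rates admit no solution, the graph cannot run in bounded memory, and the function reports this.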

  9. OUTLINE: Motivation; OpenCL FPGA; Dataflow Mapping On Top Of OpenCL FPGA; Implementation Steps

  10. OpenCL: a framework for heterogeneous/parallel computing. Task parallelism and data parallelism (SIMD); work-groups (WG) made of work-items (WI).
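
The sketch below makes the work-group/work-item terminology concrete. It emulates, in plain Python, how a 1D NDRange is split into work-groups and how each work-item derives its global index. It assumes a zero global offset, so that `get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)`. The helper names are hypothetical. A real OpenCL runtime executes the work-items concurrently, not in this sequential loop.

```python
def ndrange_1d(global_size, local_size):
    """Yield (group_id, local_id, global_id) for every work-item of a 1D NDRange."""
    if global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the work-group size.
        raise ValueError("global_size must be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            # get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)
            yield group_id, local_id, group_id * local_size + local_id


def vector_add(a, b, local_size=4):
    """Data-parallel kernel body, executed once per work-item."""
    out = [0] * len(a)
    for _, _, gid in ndrange_1d(len(a), local_size):
        out[gid] = a[gid] + b[gid]
    return out
```

On a GPU, this data-parallel (SIMD) style maps to many work-items. On an FPGA, the task-parallel style with single work-item kernels is often the more natural fit for streaming datapaths.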

  11. OpenCL FPGA Model. Intel FPGA SDK for OpenCL: Best Practices Guide, https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf; https://www.altera.com/products/design-software/embedded-software-developers/opencl/developer-zone.html

  12. OUTLINE: Motivation; OpenCL FPGA; Dataflow Mapping On Top Of OpenCL FPGA; Implementation Steps

  13. Dataflow On Top Of OpenCL: SoC FPGAs. Desired features & expected gains: hardware acceleration (custom datapath); reduced processor (communication) overhead; reduced memory transactions; self-timed execution.

  14. Dataflow On Top Of OpenCL FPGA. Dataflow community: leverage OpenCL FPGA constructs to generate efficient dataflow; dataflow-driven "OpenCL" code generation; tool expertise & design space exploration. OpenCL community: OpenCL is a Khronos Group standard; recent (2017) proposal to add dataflow semantics to the OpenCL standard.

  15. MoC semantics for OpenCL pipes. Kapre, N., and H. Patel. Applying Models of Computation to OpenCL Pipes for FPGA Computing. Proc. 5th IWOCL. ACM, 2017. OpenCL compute model + MoC comms schemes (Synchronous Dataflow (SDF), Bulk Synchronous Parallel (BSP)); a proposal for the OpenCL standard, a.k.a. the compiler's job.

  16. OpenCL: increasing streaming support. Pipes (OpenCL 2.0): standard OpenCL kernel-to-kernel communication; overlapped multi-kernel operation. Channels (Intel FPGA): preferred kernel-to-kernel communication; self-triggered kernels (free-running, decoupled from the host). Host-kernel pipes: Kang, K., and P. Yiannacouras. Host Pipes: Direct Streaming Interface Between OpenCL Host and Kernel. Proc. 5th IWOCL. ACM, 2017 (… only a prototype demo so far).
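
Pipes and channels are, in essence, bounded FIFOs between kernels that run concurrently. The sketch below is only a rough host-side analogy. It models two kernels as Python threads connected by a `queue.Queue` with a fixed capacity. `put` blocks while the FIFO is full and `get` blocks while it is empty, which mirrors blocking channel writes and reads. The capacity stands in for a channel depth. The function names are hypothetical.

```python
import queue
import threading


def producer_kernel(fifo, n):
    for i in range(n):
        fifo.put(i)      # blocks while the FIFO is full (like a blocking channel write)


def consumer_kernel(fifo, n, results):
    for _ in range(n):
        x = fifo.get()   # blocks while the FIFO is empty (like a blocking channel read)
        results.append(x * x)


def run_pipeline(n, depth=4):
    fifo = queue.Queue(maxsize=depth)  # bounded FIFO; capacity plays the role of a channel depth
    results = []
    kernels = [
        threading.Thread(target=producer_kernel, args=(fifo, n)),
        threading.Thread(target=consumer_kernel, args=(fifo, n, results)),
    ]
    for k in kernels:
        k.start()        # both "kernels" run concurrently, decoupled from the caller
    for k in kernels:
        k.join()
    return results
```

Neither thread touches shared global memory directly, so all data moves through the FIFO. This is the kernel-to-kernel pattern that channels and pipes enable on the FPGA.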

  17. Kernel operation possibilities. Autorun kernels: no host-kernel communication logic; autostart & auto-restart; communicate through channels. Intel FPGA SDK for OpenCL: Best Practices Guide, https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf

  18. Channels. Kernel execution decoupled from the host; blocking/non-blocking read/write API; synchronization mechanisms; I/O channels -> streaming DSP. Intel FPGA SDK for OpenCL: Programming Guide, https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/hb/opencl-sdk/aocl_programming_guide.pdf
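
The non-blocking variants of the channel API return immediately instead of stalling the kernel. In the Intel FPGA SDK, `read_channel_nb_intel` reports through a `valid` flag whether data was read. `write_channel_nb_intel` returns whether the write succeeded. The Python sketch below mimics those semantics on a bounded `queue.Queue`. The names `read_nb` and `write_nb` are hypothetical and used only for illustration.

```python
import queue


def read_nb(fifo):
    """Non-blocking read: returns (value, valid) and never waits."""
    try:
        return fifo.get_nowait(), True
    except queue.Empty:
        return None, False


def write_nb(fifo, value):
    """Non-blocking write: returns True on success, False if the FIFO is full."""
    try:
        fifo.put_nowait(value)
        return True
    except queue.Full:
        return False


if __name__ == "__main__":
    fifo = queue.Queue(maxsize=1)  # depth-1 channel
    print(read_nb(fifo))           # (None, False): nothing to read yet
    print(write_nb(fifo, 7))       # True
    print(write_nb(fifo, 8))       # False: FIFO full, the kernel carries on
    print(read_nb(fifo))           # (7, True)
```

Non-blocking accesses let a single kernel poll several inputs or fall back to other work. The cost is that the kernel code must then check the flag explicitly and handle the case where no data arrived.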

  19. OUTLINE: Motivation; OpenCL FPGA; Dataflow Mapping On Top Of OpenCL FPGA; Implementation Steps

  20. Mapping (Pi)SDF dataflow graphs to the OpenCL model [figure: actors A, B, C and OpenCL kernel K]. 1. Actor firing rules within OpenCL kernels: check overhead vs. performance; acceptable? 2. Code generation from PREESM: 2.1 actor firing rules (scheduler); 2.2 FIFO analysis & buffer generation; 2.3 memory access optimization.
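
Item 1 (actor firing rules within kernels) and item 2.1 (scheduler) both rest on the SDF firing rule. An actor may fire once every input FIFO holds at least as many tokens as it consumes per firing. The Python sketch below applies that rule in repeated round-robin sweeps. Each actor fires as soon as it is enabled, until it reaches its repetition count. This approximates self-timed execution of one graph iteration on a single thread. The sketch assumes unbounded FIFOs with no initial tokens (delays), so any cycle deadlocks. The function names are hypothetical.

```python
from collections import Counter


def try_fire(actor, edges, tokens):
    """Fire `actor` once if its SDF firing rule holds; edges are (src, dst, prod, cons)."""
    inputs = [i for i, (_, dst, _, _) in enumerate(edges) if dst == actor]
    if any(tokens[i] < edges[i][3] for i in inputs):
        return False                    # not enough tokens on some input FIFO
    for i in inputs:
        tokens[i] -= edges[i][3]        # consume
    for i, (src, _, prod, _) in enumerate(edges):
        if src == actor:
            tokens[i] += prod           # produce
    return True


def self_timed_iteration(actors, edges, q):
    """Fire every actor q[actor] times, each as soon as its firing rule holds."""
    tokens = [0] * len(edges)           # assumption: no initial tokens (delays)
    fired = Counter()
    trace = []
    progress = True
    while progress:
        progress = False
        for a in actors:
            if fired[a] < q[a] and try_fire(a, edges, tokens):
                fired[a] += 1
                trace.append(a)
                progress = True
    if any(fired[a] < q[a] for a in actors):
        raise RuntimeError("deadlock: firing rules can no longer be satisfied")
    return trace
```

For the chain A -(2:3)-> B -(1:2)-> C with repetition counts A=3, B=2, C=1, the resulting firing order is A, A, B, A, B, C. Every FIFO ends the iteration empty. Inside a kernel, the same check is what blocking channel reads enforce implicitly. This is why the slides ask whether channel synchronization alone is enough.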

  21. Mapping (PiSDF) dataflow graphs to the OpenCL model [figure: kernel K]. 2.1 Actor firing rules (scheduler): is channel synchronization enough? Actor I/O interfaces, firing rules, templates: borrow from CAPH? Host code: platform initialization (automatic); job management (automatic): input data & result data, "only" necessary at the host/device frontier, pointers mapped to device buffers. Kernel code: I/O interfaces (firing rules) automatic; functionality manual (provided by the user).
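
The split between automatic I/O interfaces and user-provided functionality can be pictured as template-based code generation. The Python sketch below emits an OpenCL kernel skeleton for one actor. The skeleton contains:

- file-scope channel declarations;
- an autorun single-work-item kernel that performs blocking reads of one firing's worth of tokens from each input channel;
- a placeholder comment for the user's code;
- the corresponding channel writes.

The generator functions and the buffer naming scheme are hypothetical; this is not PREESM's or CAPH's output. The emitted code relies on the Intel FPGA channel extension (`read_channel_intel`, `write_channel_intel`) and on the `autorun` attribute. Because an autorun kernel restarts automatically, each kernel execution corresponds to one actor firing.

```python
def channel_decls(channels, dtype="int", depth=8):
    """Emit the channel extension pragma and one declaration per channel name."""
    lines = ["#pragma OPENCL EXTENSION cl_intel_channels : enable"]
    for name in channels:
        lines.append(f"channel {dtype} {name} __attribute__((depth({depth})));")
    return "\n".join(lines)


def kernel_skeleton(actor, inputs, outputs, dtype="int"):
    """inputs/outputs: lists of (channel_name, tokens_per_firing)."""
    lines = [
        "__attribute__((autorun))",                 # no host launch; auto-restarts
        "__attribute__((max_global_work_dim(0)))",  # single work-item kernel
        f"__kernel void {actor}(void) {{",
    ]
    # Firing rule: blocking reads wait until `rate` tokens arrive on each input.
    for ch, rate in inputs:
        lines.append(f"  {dtype} {ch}_buf[{rate}];")
        lines.append(f"  for (int i = 0; i < {rate}; i++) {ch}_buf[i] = read_channel_intel({ch});")
    for ch, rate in outputs:
        lines.append(f"  {dtype} {ch}_buf[{rate}];")
    # Functionality stays manual, as on the slide.
    lines.append(f"  /* user-provided functionality of actor {actor}: fill the output buffers */")
    for ch, rate in outputs:
        lines.append(f"  for (int i = 0; i < {rate}; i++) write_channel_intel({ch}, {ch}_buf[i]);")
    lines.append("}")
    return "\n".join(lines)
```

As an example, `kernel_skeleton("B", inputs=[("ab", 3)], outputs=[("bc", 1)])` produces a kernel that reads three tokens from `ab` and writes one token to `bc` per execution. The default channel depth of 8 is an arbitrary placeholder. In practice, it would come from the FIFO analysis.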

  22. Mapping (PiSDF) dataflow graphs to the OpenCL model [figure: kernel K]. 2.1 Actor firing rules (scheduler): is channel synchronization enough? Actor I/O interfaces, firing rules, templates: borrow from CAPH? 2.2 Buffer generation: leverage the current PREESM buffer generation; pipes vs. channels vs. ad-hoc buffers. 2.3 Memory access optimization: streaming dataflow; shared/global memory; different workloads? Local FPGA DDRs (kernel only); out-of-order accesses?
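
For item 2.2, the FIFO analysis that precedes buffer generation can be as simple as replaying a schedule and recording the peak token count on each edge. That peak is the smallest buffer (or channel depth) that lets the schedule run without blocking. The Python sketch below does this for a given firing sequence. It assumes that each firing consumes its inputs before producing its outputs. `buffer_sizes` is a hypothetical helper, not PREESM's buffer-generation algorithm.

```python
def buffer_sizes(trace, edges):
    """Peak token count per edge when replaying `trace`; edges are (src, dst, prod, cons)."""
    tokens = [0] * len(edges)
    peak = [0] * len(edges)
    for actor in trace:
        # Assumption: a firing consumes all its inputs before producing its outputs.
        for i, (_, dst, _, cons) in enumerate(edges):
            if dst == actor:
                if tokens[i] < cons:
                    raise ValueError(f"{actor} fired without enough tokens on edge {i}")
                tokens[i] -= cons
        for i, (src, _, prod, _) in enumerate(edges):
            if src == actor:
                tokens[i] += prod
                peak[i] = max(peak[i], tokens[i])
    return peak
```

Take the chain A -(2:3)-> B -(1:2)-> C. Replaying the interleaved schedule A A B A B C needs buffers of 4 and 2 tokens. Firing all three A's first (A A A B B C) raises the first buffer to 6. The schedule and the buffer sizes therefore have to be chosen together.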

  23. future(future). 3. Hack the flow: kernel (actor) functionality: component library, plug HDL / CAPH; DSE: predictability of area / latency / throughput. Host-device communications: device wrapper; DSE: predictability, compute upper bound? Open run time? Graph reconfiguration: dynamic reconfiguration? 4. New devices: Intel Xeon + FPGA; HPC community.

  24. Thanks for your attention!! ruben.salvador@upm.es https://twitter.com/RubenSalvadorP http://blogs.upm.es/rubensalvador/
