GPGPU Computing with OpenCL

  1. GPGPU Computing with OpenCL
     - Matthias Vogelgesang (IPE), Daniel Hilk (IEKP)
     - Institute for Data Processing and Electronics, Institut für Experimentelle Kernphysik
     - KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
     - Oct. 18th 2013
     - www.kit.edu

  2. Motivation
     - Despite Moore's law, CPUs hit a performance wall
     - More data is generated, so more data has to be processed and analyzed
     - GPU architectures can give a higher throughput and better performance

  3. GPU advantages
     - Why are GPUs good at what they do?
       - GPUs are heavily optimized towards pixelation of 3D data
       - GPUs have flexible, programmable pipelines
       - Architecture consists of many but rather simple compute cores
       - Instruction set is tailored towards math and image operations
     - Some numbers of NVIDIA's GTX Titan flagship
       - 4500 (SP) / 1500 (DP) GFLOPS (equivalent of a supercomputer in 2000)
       - 6 GB at 288.4 GB/s
       - 250 W power consumption

  4. Limitations
     - There are no silver bullets
       - Optimal performance with regular, parallel tasks
       - High operations-per-memory-access ratios¹
       - Bus can become a bottleneck²
       - Limited main memory, thus partitioning might be necessary
     - Think about your algorithm first
       - Cliché quote: "premature optimization is the root of all evil"
       - O(c^n) is slow, no matter where you run it
     - ¹ 4500 GFLOPS / 288.4 GB/s ≈ 16 FLOP/B
     - ² 4500 GFLOPS / 16 GB/s (PCIe 3.0 x16) ≈ 280 FLOP/B
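The two footnote ratios on the Limitations slide can be reproduced with a few lines of C. This is only a sketch: the function names `flop_per_byte` and `is_bandwidth_bound` are hypothetical helpers, not part of OpenCL or of the original slides, and the numbers are the peak figures quoted for the GTX Titan.

```c
#include <stdio.h>

/* Ratio of peak compute throughput to bandwidth, in FLOP per byte.
 * Inputs are GFLOP/s and GB/s, so the giga prefixes cancel.
 * (Hypothetical helper, introduced for illustration only.) */
double flop_per_byte(double peak_gflops, double bandwidth_gbs)
{
    return peak_gflops / bandwidth_gbs;
}

/* A kernel that performs fewer FLOP per transferred byte than the
 * machine ratio cannot reach peak compute throughput; it is limited
 * by the memory system or the bus instead. Returns 1 in that case.
 * (Hypothetical helper, introduced for illustration only.) */
int is_bandwidth_bound(double kernel_flop_per_byte,
                       double peak_gflops, double bandwidth_gbs)
{
    return kernel_flop_per_byte < flop_per_byte(peak_gflops, bandwidth_gbs);
}

/* Prints both ratios from the slide footnotes. */
void print_titan_ratios(void)
{
    printf("device memory: %.1f FLOP/B\n", flop_per_byte(4500.0, 288.4));
    printf("PCIe 3.0 x16:  %.1f FLOP/B\n", flop_per_byte(4500.0, 16.0));
}
```

Under these assumptions, a kernel doing only 2 FLOP per byte read from device memory would be far below the roughly 16 FLOP/B needed to keep the compute cores busy, which is the point behind the "high operations-per-memory-access ratios" bullet.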

  5. History and Background
     - Development of GPGPU abstractions
       - Early research prototypes (e.g. Brook) used OpenGL shaders
       - NVIDIA presented CUDA in 2007
       - OpenCL was initiated by Apple and first released in 2008/09
       - High-level pragmas in OpenACC à la OpenMP since 2012
     - Why OpenCL?
       - Open, vendor-neutral standard
       - Cross-platform support (Linux, Windows, Mac)
       - Multiple hardware platforms (CPUs, GPUs, FPGAs)

  6. OpenCL concepts

  7. Programming model
     - Platform
       - A host controls ≥ 1 platforms (e.g. vendor SDKs)
       - A platform consists of ≥ 1 devices
       - The host manages resources and schedules execution
     - Devices
       - The devices execute code assigned to them by the host
       - A device has ≥ 1 compute units (CUs)
       - Each CU has ≥ 1 processing elements (PEs)
       - How CUs and PEs are mapped to hardware is not specified

  9. Execution model
     - Work is arranged as work items on a 1D, 2D or 3D grid
     - Grid is split into work groups
     - Work groups are scheduled on one or more CUs
     - Work items are executed on PEs
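The relation between the grid, the work groups and the work items can be illustrated in plain C for the 1D case. In OpenCL C the corresponding built-ins are `get_global_id`, `get_group_id`, `get_local_id` and `get_local_size`; the sketch below only emulates their arithmetic on the host, assuming a zero global offset and a grid size that is a multiple of the work-group size. The type `work_item_pos` and both function names are hypothetical.

```c
#include <stddef.h>

/* Position of one work item: which work group it belongs to and
 * where it sits inside that group. (Hypothetical type.) */
typedef struct {
    size_t group_id;
    size_t local_id;
} work_item_pos;

/* Split a 1D global index into (work group, index within group).
 * local_size is the number of work items per work group. */
work_item_pos split_global_id(size_t global_id, size_t local_size)
{
    work_item_pos pos;
    pos.group_id = global_id / local_size;
    pos.local_id = global_id % local_size;
    return pos;
}

/* Inverse mapping: with a zero global offset, the global index is
 * group_id * local_size + local_id. */
size_t join_global_id(work_item_pos pos, size_t local_size)
{
    return pos.group_id * local_size + pos.local_id;
}
```

For example, with 4 work items per group, global index 10 falls into work group 2 at local position 2. The same decomposition applies independently to each dimension of a 2D or 3D grid.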

  13. Kernel
     - A kernel is a piece of code executed by each work item
     - In most cases it corresponds to the innermost body of a for loop, e.g. from

           for (int i = 1; i < N-1; i++)
               x[i] = sin(y[i]) + 0.5 * (x[i-1] + x[i+1]);

       you would extract the kernel

           x[i] = sin(y[i]) + 0.5 * (x[i-1] + x[i+1]);

     - A kernel has implicit parameters to identify itself
       - Location relative to the global grid
       - Location relative to the work group
       - Number of work groups/items

  14. Memory model
     - Memory, buffers and images
       - Host cannot access device memory directly and vice versa
       - Buffers transfer data between host and device memory
       - Images are structured buffers
     - Device memory
       - Global: host-accessible, read/write-able by all work items
       - Constant: host-accessible, read-only by all work items
       - Local: local to a work group
       - Private: local to a work item

  19. OpenCL API

  20. Implementations

      | Vendor | Rev. | GPU | CPU | FPGA |
      |--------|------|-----|-----|------|
      | NVIDIA | 1.1  | ✓   | ✗   | ✗    |
      | AMD    | 1.2  | ✓   | ✓   | ✗    |
      | Intel  | 1.2  | ✓   | ✓   | ✗    |
      | Apple  | 1.1¹ | ✓   | ✓   | ✗    |
      | Altera | 1.0  | ✗   | ✗   | ✓    |

      ¹ OpenCL 1.2 from OS X 10.9
