GPGPU Computing with OpenCL

SLIDE 1

GPGPU Computing with OpenCL

Matthias Vogelgesang (IPE), Daniel Hilk (IEKP)
Institute for Data Processing and Electronics, Institut für Experimentelle Kernphysik
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Oct. 18ᵗʰ 2013
www.kit.edu

SLIDE 2

Motivation

  • More data is generated, so more data has to be processed and analyzed
  • Despite Moore’s law, single-core CPU performance has hit a wall
  • GPU architectures can give higher throughput and better performance

SLIDE 3

GPU advantages

Why are GPUs good at what they do?

  • GPUs are heavily optimized towards rasterization of 3D data
  • GPUs have flexible, programmable pipelines
  • The architecture consists of many but rather simple compute cores
  • The instruction set is tailored towards math and image operations

Some numbers for NVIDIA’s GTX Titan flagship

  • 6 GB memory at 288.4 GB/s
  • 4500 (SP) / 1500 (DP) GFLOPS (the equivalent of a supercomputer in 2000)
  • 250 W power consumption

SLIDE 4

Limitations

There are no silver bullets

  • Optimal performance only with regular, parallel tasks
  • High operations-per-memory-access ratios are required¹
  • The bus can become a bottleneck²
  • Main memory is limited, so partitioning might be necessary

Think about your algorithm first

  • Cliché quote: “premature optimization is the root of all evil”
  • O(cⁿ) is slow, no matter where you run it

¹ 4500 GFLOPS / 288.4 GB/s ≈ 16 FLOP/B
² 4500 GFLOPS / 16 GB/s (PCIe 3.0 x16) ≈ 280 FLOP/B

SLIDE 5

History and Background

Development of GPGPU abstractions

  • Early research prototypes (e.g. Brook) used OpenGL shaders
  • NVIDIA presented CUDA in 2007
  • OpenCL, initiated by Apple, was first released in 2008/09
  • High-level pragmas in OpenACC, à la OpenMP, since 2012

Why OpenCL?

  • Open, vendor-neutral standard
  • Cross-platform support (Linux, Windows, Mac)
  • Multiple hardware platforms (CPUs, GPUs, FPGAs)

SLIDE 6

OpenCL concepts

SLIDE 7

Programming model

Platform

  • A host controls ≥ 1 platforms (e.g. vendor SDKs)
  • A platform consists of ≥ 1 devices
  • The host manages resources and schedules execution
  • The devices execute code assigned to them by the host

Devices

  • A device has ≥ 1 compute units (CUs)
  • Each CU has ≥ 1 processing elements (PEs)
  • How CUs and PEs are mapped to hardware is not specified

SLIDE 9

Execution model

  • Work is arranged as work items on a 1D, 2D or 3D grid
  • The grid is split into work groups
  • Work groups are scheduled on one or more CUs
  • Work items are executed on PEs

SLIDE 13

Kernel

  • A kernel is a piece of code executed by each work item
  • In most cases it corresponds to the innermost body of a for loop, e.g. from

for (int i = 1; i < N-1; i++)
    x[i] = sin(y[i]) + 0.5 * (x[i-1] + x[i+1]);

you would extract the kernel

x[i] = sin(y[i]) + 0.5 * (x[i-1] + x[i+1]);

  • A kernel has implicit parameters to identify itself:
    • Location relative to the work group
    • Location relative to the global grid
    • Number of work groups/items

SLIDE 14

Memory model

Memory, buffers and images

  • Host cannot access device memory directly and vice versa
  • Buffers transfer data between host and device memory
  • Images are structured buffers

Device memory

  Global     host-accessible, read/write-able by all work items
  Constant   host-accessible, read-only by all work items
  Local      local to a work group
  Private    local to a work item

SLIDE 19

OpenCL API

SLIDE 20

Implementations

  Vendor   Rev.   GPU   CPU   FPGA
  NVIDIA   1.1    ✓     ✗     ✗
  AMD      1.2    ✓     ✓     ✗
  Intel    1.2    ✓     ✓     ✗
  Apple    1.1¹   ✓     ✓     ✗
  Altera   1.0    ✗     ✗     ✓

¹ OpenCL 1.2 from OS X 10.9

SLIDE 21

Prerequisites

  • OpenCL is specified as a C API and a kernel language
  • Link against -lOpenCL; a generic driver loads the implementation at run-time
  • Header location depends on the host platform:

/* UNIX and Windows */
#include <CL/cl.h>

/* Apple */
#include <OpenCL/cl.h>

SLIDE 22

Kernel syntax

  • Written in a C99 superset
  • Address space specifiers (global and local)
  • Work-item and math-related builtins
  • Vector types (e.g. int4, float3, …)

kernel void
scale_vector (global float *output,
              global float *input,
              float scale)
{
    int idx = get_global_id (0);   /* global location */
    output[idx] = scale * input[idx];
}

SLIDE 23

Querying all platforms

cl_uint n_platforms;
cl_platform_id *platforms = NULL;

e = clGetPlatformIDs (0, NULL, &n_platforms);
platforms = malloc (n_platforms * sizeof (cl_platform_id));
e = clGetPlatformIDs (n_platforms, platforms, NULL);

SLIDE 24

Querying devices of one platform

cl_uint n_devices;
cl_device_id *devices = NULL;

e = clGetDeviceIDs (platforms[0], CL_DEVICE_TYPE_ALL,
                    0, NULL, &n_devices);
devices = malloc (n_devices * sizeof (cl_device_id));
e = clGetDeviceIDs (platforms[0], CL_DEVICE_TYPE_ALL,
                    n_devices, devices, NULL);

/* If you don't use a device anymore, decrement its reference */
e = clReleaseDevice (devices[0]);

SLIDE 25

Device contexts

Resources are shared between devices in the same context; a context thus models application-specific behaviour:

cl_context context;

context = clCreateContext (NULL, n_devices, devices,
                           NULL, NULL, &err);

SLIDE 26

Buffer objects

Buffers are created in a context. At run-time, the OpenCL environment decides when memory is transferred to a specific device.

size_t size;
cl_mem dev_input;
cl_mem dev_result;

size = 1024 * 1024 * sizeof (float);
dev_input = clCreateBuffer (context, CL_MEM_READ_ONLY,
                            size, NULL, &err);
dev_result = clCreateBuffer (context, CL_MEM_WRITE_ONLY,
                             size, NULL, &err);

SLIDE 27

Command queues

Device commands (data transfers, kernel launches, …) are enqueued in one command queue per device:

cl_command_queue queue;

queue = clCreateCommandQueue (context, devices[0], 0, &err);

The third parameter can be used to toggle out-of-order execution and profiling.

SLIDE 28

Transferring data

e = clEnqueueWriteBuffer (queue, dev_input,
                          CL_TRUE,    /* blocking call? */
                          0, size, host_input,
                          0, NULL, NULL);

SLIDE 29

Building kernel code

Kernel code is compiled at run-time because the target hardware is not necessarily known at compile-time (… and this allows cool stunts like run-time code generation).

cl_program program;
cl_kernel kernel;

/* Create and build program */
program = clCreateProgramWithSource (context, 1, &source, NULL, &e);
e = clBuildProgram (program, n_devices, devices, NULL, NULL, NULL);

/* Extract kernel */
kernel = clCreateKernel (program, "scale_vector", &e);

SLIDE 30

Launching kernels

size_t global_work_size[] = { 1024 };
size_t global_work_offset[] = { 0 };
cl_event event;

e = clEnqueueNDRangeKernel (queue, kernel,
                            1,      /* grid dimensions */
                            global_work_offset, global_work_size,
                            NULL,   /* local work size: let the implementation choose */
                            0, NULL, &event);

SLIDE 31

Events

All commands accept and return cl_event objects

cl_int clEnqueueXXX (...,
                     cl_uint wait_list_length,
                     const cl_event *wait_list,
                     cl_event *event);

which can then be used like this:

/* Wait for one or more events */
e = clWaitForEvents (1, &event);

/* Query event information */
e = clGetEventInfo (event, CL_EVENT_COMMAND_EXECUTION_STATUS,
                    sizeof (cl_int), &result, NULL);

SLIDE 32

Kernel synchronization

Events are also used to ensure correct execution order in out-of-order queues:

clEnqueueNDRangeKernel (queue, kernel_foo, ...,
                        0, NULL, &foo_event);
clEnqueueNDRangeKernel (queue, kernel_bar, ...,
                        1, &foo_event, &bar_event);

clReleaseEvent (foo_event);
clReleaseEvent (bar_event);

SLIDE 33

Work item synchronization

Guarantee that all work items of a work group are waiting at the same point before proceeding:

barrier (mem_fence_flags);

Make sure that all the other work items read the same values:

mem_fence (mem_fence_flags);
write_mem_fence (mem_fence_flags);
read_mem_fence (mem_fence_flags);

mem_fence_flags must be a combination of

  • CLK_LOCAL_MEM_FENCE: for guarantees inside a work group
  • CLK_GLOBAL_MEM_FENCE: across all work items

SLIDE 34

Considerations

  • All resources are reference-counted → release them when no longer used!
  • Every call returns an error code → check all of them!
  • Using double will decrease performance by a factor of two (if it works at all)