GPGPU and Stream Computing

Julian Fietkau, University of Hamburg, June 30th, 2011

Things to clear up beforehand...
These slides are published under the CC-BY-SA 3.0 license. Sources for the numbered figures are in the list of figures. Non-numbered pictures and illustrations are from the OpenClipArt Project or are based on content from there.
Download these slides and give feedback: http://www.julian-fietkau.de/gpgpu_and_stream_computing

Agenda
Introduction
  General Idea of GPGPU
  Stream Computing
Languages
  Common Ideas
  OpenCL
  CUDA
  Others
  Compilation to Intermediary Languages
Properties
  Programmability
  Efficiency
Prospects and Conclusions
  Future Developments
  Conclusion

Introduction: General Idea of GPGPU
Flynn’s Taxonomy

                 Single instruction   Multiple instruction
Single data      SISD                 MISD
Multiple data    SIMD                 MIMD

Introduction: General Idea of GPGPU
Why Does It Exist?
• How long can Moore’s law hold true? → parallelism as a possible answer to computational demands
• “swiss army knife” (generally optimal solution) for parallel programming has not been found
• idea: exploit consumer-grade graphics hardware
Figure 1: Moore’s law – 2011

Introduction: General Idea of GPGPU
About Graphics Hardware
• games need to display increasingly realistic objects/scenes in real time
• need to calculate a lot of vertices and a lot of pixels very quickly → Pixel/Vertex Shaders, later Unified Shader Model
• consumer market ensures that graphics adapters remain (relatively) cheap
• General Purpose computation on Graphics Processing Units

Introduction: Stream Computing
Stream Computing
• idea: operate on a “stream” of data passing through different “kernels”
• related to SIMD
• mitigates some of the difficulties of parallelism on von Neumann architectures as well as on simple SIMD implementations like SSE or AltiVec
• first came up in the 1970s; “pure” implementations didn’t gain much traction, but hybrid architectures survived

Introduction: Stream Computing
Stream Computing Example

    Input: u, v, w;
    x = u - (v + w);
    y = u * (v + w);
    Output: x, y;

Figure 2: Stream Computing Example
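The example in Figure 2 can be sketched in plain Python as small kernels applied element by element to the input streams. This runs on the CPU and only illustrates the idea, not GPU code. The names `kernel_sum`, `kernel_x`, `kernel_y`, and `run_stream` are made up for this sketch, and computing the shared intermediate `v + w` in its own kernel is one possible arrangement, assumed here for illustration.

```python
from typing import Iterable, Iterator, Tuple


def kernel_sum(v: float, w: float) -> float:
    """Hypothetical first kernel: the shared intermediate s = v + w."""
    return v + w


def kernel_x(u: float, s: float) -> float:
    """Hypothetical second kernel: x = u - (v + w)."""
    return u - s


def kernel_y(u: float, s: float) -> float:
    """Hypothetical third kernel: y = u * (v + w)."""
    return u * s


def run_stream(
    us: Iterable[float], vs: Iterable[float], ws: Iterable[float]
) -> Iterator[Tuple[float, float]]:
    """Pass each element of the input streams through the kernels.

    Every element is processed independently of the others, which is
    what would allow a stream processor to handle many at once.
    """
    for u, v, w in zip(us, vs, ws):
        s = kernel_sum(v, w)
        yield kernel_x(u, s), kernel_y(u, s)


results = list(run_stream([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.5, 0.5]))
print(results)
```

Because no element depends on another, the loop body could in principle run for all elements at the same time.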

Languages: Common Ideas
Common Ideas
modern streaming programming languages...
• ...are verbose about different usage scenarios for memory
• ...help with partitioning problem spaces in a multitude of ways
• ...are not afraid to introduce limitations to facilitate optimization

Languages: OpenCL
OpenCL™
• Open Computing Language, free standard by Khronos™ Group
Figure 3: OpenCL™ Application Model (elements shown: Application, Kernel, Command Queue, Context, Device)

Languages: OpenCL
OpenCL™ in Detail
Figure 4: OpenCL™ Problem Partitioning (elements shown: the host with the application; the device with an NDRange divided into work groups with 2D indices such as (0,0), (1,0), (2,0); each work group divided into work items with 3D indices such as (0,0,0), (1,0,0), (2,0,0))
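The partitioning in Figure 4 can be sketched as index arithmetic. In OpenCL, with a global offset of zero, a work item's global ID in each dimension equals its group ID times the work-group size plus its local ID (the quantities that `get_group_id`, `get_local_size`, and `get_local_id` return inside a kernel). The Python below only mimics that arithmetic on the CPU. The helper names `global_id` and `enumerate_work_items` and the 3×2 grid of 2×2 work groups are assumptions chosen for illustration.

```python
from itertools import product
from typing import Iterator, Tuple

Index = Tuple[int, ...]


def global_id(group_id: Index, local_id: Index, local_size: Index) -> Index:
    """Per dimension: group index * work-group size + local index."""
    return tuple(g * n + l for g, l, n in zip(group_id, local_id, local_size))


def enumerate_work_items(
    num_groups: Index, local_size: Index
) -> Iterator[Tuple[Index, Index, Index]]:
    """Yield (group id, local id, global id) for every work item."""
    for group in product(*(range(n) for n in num_groups)):
        for local in product(*(range(n) for n in local_size)):
            yield group, local, global_id(group, local, local_size)


# Assumed example: 3 x 2 work groups, each holding 2 x 2 work items.
items = list(enumerate_work_items(num_groups=(3, 2), local_size=(2, 2)))
print(len(items))
```

The same helper works for one, two, or three dimensions, since it simply zips over however many components the index tuples have.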

Languages: CUDA
CUDA
• NVIDIA’s custom framework for high-level GPGPU
• (it’s actually older than OpenCL though)
• same basic idea, but specific to NVIDIA GPUs
• conceptually only minor differences between CUDA and OpenCL
• biggest one: CUDA is compiled at application compile time while OpenCL is (typically) compiled at application run time
• also, annoying nomenclature differences (e.g. shared vs. local vs. private memory)

Languages: Others
Others
There are several more stream processing languages, some of them long in development. Notable:
• Brook (and Brook+)
• Cilk, compare also Intel Array Building Blocks

Languages: Compilation to Intermediary Languages
Intermediary Languages
Problem: The actual binary code that runs on devices needs to “know” about exact numbers for cores, memory, registers etc., information that is generally not known at compile time.
→ compilation to an intermediary language like NVIDIA’s PTX and AMD’s IL, low-level and assembly-like yet abstracting some hardware limitations

Languages: Compilation to Intermediary Languages
PTX and AMD IL

PTX example

    .reg .b32 r1, r2;
    .global .f32 array[N];
    start:
    mov.b32 r1, %tid.x;
    shl.b32 r1, r1, 2;              // shift thread id by 2 bits
    ld.global.b32 r2, array[r1];    // thread[tid] gets array[tid]
    add.f32 r2, r2, 0.5;            // add 1/2

AMD IL example

    sample_resource(0)_sampler(0) r0.x, v0.xy00
    mov r2.x, r0.xxxx
    dcl_output_generic o0
    ret
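To make the PTX example easier to follow, here is a rough CPU emulation of what a single thread does, written in Python. It reads the snippet as follows: shifting the thread ID left by 2 bits multiplies it by 4, which gives the byte offset of a 32-bit element, so thread `tid` loads `array[tid]` and adds 0.5 to it in a register. The function name `run_thread` and the sample data are hypothetical. As in the snippet itself, nothing is written back to memory.

```python
import struct


def run_thread(tid: int, memory: bytes) -> float:
    """Emulate the PTX snippet for one thread (illustrative only)."""
    r1 = tid                                      # mov.b32 r1, %tid.x
    r1 = r1 << 2                                  # shl.b32 r1, r1, 2 (tid * 4 bytes)
    (r2,) = struct.unpack_from("<f", memory, r1)  # ld.global.b32 r2, array[r1]
    r2 = r2 + 0.5                                 # add.f32 r2, r2, 0.5
    return r2


# Assumed sample contents of the global array, packed as 32-bit floats.
values = [1.0, 2.5, -4.0, 8.0]
memory = struct.pack("<4f", *values)

# One emulated thread per array element.
results = [run_thread(tid, memory) for tid in range(len(values))]
print(results)
```

On a GPU all of these threads would execute the same instructions concurrently, each with its own value of `%tid.x`.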

Properties: Programmability
Programmability
• as they’re mostly custom versions of C, GPGPU languages are rather simple to pick up for someone with C experience
• OpenCL™ and CUDA both look slightly boilerplate-y for small tasks
• hypothesis: they might not be designed for small tasks
• disadvantage of the cutting edge: toolchain maturity might be lacking
• watch out for vendor dependencies!

Properties: Efficiency
Efficiency
• hard to find actual data
• optimizations and proficiency might skew the results
• conceptual similarities indicate that implementations would also be similar
• CUDA can get a (constant) head start vs. OpenCL™ due to being precompiled
• CUDA might generally perform faster, sometimes significantly, than OpenCL (but take this with a grain of salt)

Prospects and Conclusions: Future Developments
Things to Come
The future remains notoriously hard to predict.
• at the moment, we see increased interest in specialized GPGPU boards (cf. NVIDIA Tesla and AMD FireStream)
• OpenCL promotes device flexibility at the cost of efficiency – no way to know if this strategy will win
• Intel pushes for integrated solutions with more processing power (cf. Sandy Bridge, Ivy Bridge)

Prospects and Conclusions: Conclusion
Conclusion
• GPGPU is a viable way to do massively parallel work even on a home PC
• will be further developed and refined, knowledge may be valuable

External Links: Weblinks
• AMD Developer Central: Introduction to OpenCL™ Programming
  http://developer.amd.com/zones/openclzone/...-may-2010.aspx
• GPGPU: OpenCL™ (Università di Catania)
  http://www.dmi.unict.it/~bilotta/gpgpu/notes/11-opencl.html
• NVIDIA: PTX ISA Version 2.1
  http://developer.download.nvidia.com/compute/.../ptx_isa_2.1.pdf
• AMD: High Level Programming for GPGPU
  http://coachk.cs.ucf.edu/courses/CDA6938/s08/AMD_IL.pdf

External Links: List of figures
1. Moore’s Law – 2011, by Wgsimon via Wikimedia Commons, CC-BY-SA
2. Stream Computing Example, by Kallistratos via German Wikipedia, public domain
3. OpenCL – Simple Kernel Exec, by Joachim Weging, CC-BY-SA
4. OpenCL – Problem Partitioning, by Joachim Weging, CC-BY-SA
