GPGPU and Stream Computing, Julian Fietkau, University of Hamburg

SLIDE 1

GPGPU and Stream Computing

Julian Fietkau

University of Hamburg

June 30th, 2011

SLIDE 2

Things to clear up beforehand...

These slides are published under the CC-BY-SA 3.0 license.

Sources for the numbered figures are in the list of figures at the end. Non-numbered pictures and illustrations are from the OpenClipArt Project or are based on content from there.

Download these slides and give feedback:

http://www.julian-fietkau.de/gpgpu_and_stream_computing

SLIDE 3

Agenda

Introduction
    General Idea of GPGPU
    Stream Computing
Languages
    Common Ideas
    OpenCL
    CUDA
    Others
    Compilation to Intermediary Languages
Properties
    Programmability
    Efficiency
Prospects and Conclusions
    Future Developments
    Conclusion

SLIDE 4

Introduction: General Idea of GPGPU Julian Fietkau

Flynn’s Taxonomy

                    Single instruction   Multiple instruction
Single data         SISD                 MISD
Multiple data       SIMD                 MIMD
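As a rough illustration of the SISD and SIMD corners of the taxonomy, the plain-Python sketch below contrasts the two styles. Python executes both variants sequentially, so the second function only models the idea of "one instruction, several data elements"; the function names and the lane width of 4 are assumptions made for this example.

```python
# Toy contrast between SISD-style and SIMD-style processing.
# Plain Python runs both variants sequentially; only the *shape* of the
# computation differs. All names here are illustrative assumptions.

def scale_sisd(values, factor):
    """SISD style: one instruction stream handles one data element at a time."""
    result = []
    for value in values:              # explicit element-by-element control flow
        result.append(value * factor)
    return result


def scale_simd(values, factor, width=4):
    """SIMD style: the same multiply is applied to a group of lanes at once.

    `width` models the number of lanes (an assumed value for this sketch).
    """
    result = []
    for start in range(0, len(values), width):
        lanes = values[start:start + width]               # load up to `width` elements
        result.extend([lane * factor for lane in lanes])  # one "instruction", all lanes
    return result


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(scale_sisd(data, 2.0))
print(scale_simd(data, 2.0))
```

Both calls produce the same values; what changes is how many elements each conceptual instruction touches.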

SLIDE 5

Introduction: General Idea of GPGPU Julian Fietkau

Why Does It Exist?

How long can Moore’s law hold true? → parallelism as a possible answer to computational demands

a “Swiss army knife” (generally optimal solution) for parallel programming has not been found

idea: exploit consumer-grade graphics hardware

Figure 1: Moore’s law – 2011

SLIDE 6

Introduction: General Idea of GPGPU Julian Fietkau

About Graphics Hardware

games need to display increasingly realistic objects/scenes in real time

need to calculate a lot of vertices and a lot of pixels very quickly → Pixel/Vertex Shaders, later Unified Shader Model

consumer market ensures that graphics adapters remain (relatively) cheap

hence GPGPU: General Purpose computation on Graphics Processing Units

SLIDE 7

Introduction: Stream Computing Julian Fietkau

Stream Computing

idea: operate on a “stream” of data passing through different “kernels”

related to SIMD

mitigates some of the difficulties of parallelism on von Neumann architectures as well as simple SIMD implementations like SSE or AltiVec

first came up in the 1970s; “pure” implementations didn’t gain much traction, but hybrid architectures survived

SLIDE 8

Introduction: Stream Computing Julian Fietkau

Stream Computing Example

Input: u, v, w;
x = u - (v + w);
y = u * (v + w);
Output: x, y;

Figure 2: Stream Computing Example
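The example can be sketched in plain Python as three small kernels connected into a graph, with the input streams flowing through them element by element. This is only a sequential emulation under assumed names (`add_kernel`, `sub_kernel`, `mul_kernel`, `run_stream`); the slide itself gives just the formulas.

```python
# Minimal sketch of the stream example: three input streams u, v, w flow
# through small kernels. The kernel names are assumptions; the slide only
# gives the formulas.

def add_kernel(v, w):
    return v + w          # shared intermediate result (v + w)


def sub_kernel(u, s):
    return u - s          # x = u - (v + w)


def mul_kernel(u, s):
    return u * s          # y = u * (v + w)


def run_stream(us, vs, ws):
    """Lazily push every (u, v, w) element through the kernel graph."""
    for u, v, w in zip(us, vs, ws):
        s = add_kernel(v, w)                     # computed once, consumed twice
        yield sub_kernel(u, s), mul_kernel(u, s)


u_stream = [10, 20, 30]
v_stream = [1, 2, 3]
w_stream = [4, 5, 6]
results = list(run_stream(u_stream, v_stream, w_stream))
print(results)
```

Because no kernel depends on another stream element, every element could in principle be processed independently, which is what makes this style attractive for parallel hardware.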

SLIDE 9

Languages: Common Ideas Julian Fietkau

Common Ideas

modern streaming programming languages...

... are verbose about different usage scenarios for memory
... help with partitioning problem spaces in a multitude of ways
... are not afraid to introduce limitations to facilitate optimization

SLIDE 10

Languages: OpenCL Julian Fietkau

OpenCL™

Open Computing Language, free standard by Khronos™ Group

Figure 3: OpenCL™ Application Model (components shown: Application, Context, Command Queue, Kernel, Device)
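To make the roles of these components more concrete, here is a toy model in plain Python. It is not the OpenCL API: every class and method name is a hypothetical stand-in, and the queue simply runs commands in submission order, which is a simplification.

```python
# Toy model of the components named in the figure. This is NOT the OpenCL API;
# every class and method name below is a hypothetical stand-in.
from collections import deque


class Device:
    def __init__(self, name):
        self.name = name

    def execute(self, kernel, data):
        # A real device would run the kernel on many work items in parallel.
        return [kernel(item) for item in data]


class Context:
    """Groups the devices that an application wants to work with."""
    def __init__(self, devices):
        self.devices = list(devices)


class CommandQueue:
    """Collects commands for one device of a context and runs them in order."""
    def __init__(self, context, device):
        if device not in context.devices:
            raise ValueError("device does not belong to this context")
        self.device = device
        self._pending = deque()

    def enqueue(self, kernel, data):
        self._pending.append((kernel, data))

    def finish(self):
        results = []
        while self._pending:
            kernel, data = self._pending.popleft()
            results.append(self.device.execute(kernel, data))
        return results


# "Application" side: set up context and queue, then submit two kernels.
gpu = Device("hypothetical-gpu")
ctx = Context([gpu])
queue = CommandQueue(ctx, gpu)
queue.enqueue(lambda x: x * x, [1, 2, 3])
queue.enqueue(lambda x: x + 1, [10, 20])
outputs = queue.finish()
print(outputs)
```

The point of the sketch is the division of labour: the application only talks to the context and the queue, while the device decides how a kernel is actually executed.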

SLIDE 11

Languages: OpenCL Julian Fietkau

OpenCL™ in Detail

Figure 4: OpenCL™ Problem Partitioning (the host application hands an NDRange to the device; the NDRange is divided into work groups such as (0,0), (1,0), (0,1), and each work group into work items such as (0,0,0), (1,0,0), (2,0,0))
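The index arithmetic behind this partitioning can be sketched in plain Python. The function and parameter names are assumptions for illustration; the sketch assumes, as OpenCL 1.x does, that each global size is a multiple of the work group size, and it ignores global offsets.

```python
# Sketch of how an index space is split into work groups and work items.
# Function and parameter names are assumptions for illustration.
from itertools import product


def partition(global_size, local_size):
    """Yield (group_id, local_id, global_id) for every work item.

    Both sizes are tuples with one entry per dimension; each global size
    must be a multiple of the corresponding local (work group) size.
    """
    if any(g % l for g, l in zip(global_size, local_size)):
        raise ValueError("global size must be divisible by local size")
    groups_per_dim = [g // l for g, l in zip(global_size, local_size)]
    for group_id in product(*(range(n) for n in groups_per_dim)):
        for local_id in product(*(range(l) for l in local_size)):
            # global index = group index * group size + index inside the group
            global_id = tuple(
                grp * size + loc
                for grp, size, loc in zip(group_id, local_size, local_id)
            )
            yield group_id, local_id, global_id


# A 6x4 index space split into work groups of 3x2 work items each.
items = list(partition((6, 4), (3, 2)))
print(len(items))     # total number of work items
print(items[0])
print(items[-1])
```

Every work item ends up with a unique global index, so a kernel can use that index to pick "its" element of the problem without coordinating with other work items.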

SLIDE 12

Languages: CUDA Julian Fietkau

CUDA

NVIDIA’s custom framework for high-level GPGPU (it’s actually older than OpenCL, though)

same basic idea, but specific to NVIDIA GPUs

conceptually only minor differences between CUDA and OpenCL

biggest one: CUDA is compiled at application compile time while OpenCL is (typically) compiled at application run time

also, annoying nomenclature differences (e.g. shared vs. local vs. private memory)
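As a reading aid for the nomenclature differences, the following small lookup table maps common CUDA terms to their usual OpenCL counterparts. It is a rough, non-exhaustive mapping written for this sketch, and the helper name `to_opencl` is an assumption.

```python
# Rough CUDA -> OpenCL vocabulary map (a reading aid, not an official table).
CUDA_TO_OPENCL = {
    # execution model
    "thread": "work item",
    "thread block": "work group",
    "grid": "NDRange",
    # memory spaces
    "global memory": "global memory",
    "constant memory": "constant memory",
    "shared memory": "local memory",     # shared by one block / work group
    "local memory": "private memory",    # visible to a single thread / work item
}


def to_opencl(cuda_term):
    """Translate a CUDA term; unknown terms are returned unchanged."""
    return CUDA_TO_OPENCL.get(cuda_term.lower(), cuda_term)


print(to_opencl("shared memory"))
print(to_opencl("local memory"))
```

The trap is the word "local": in CUDA it refers to per-thread memory, while in OpenCL it refers to memory shared within a work group.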

SLIDE 13

Languages: Others Julian Fietkau

Others

There are several more stream processing languages, some of them long in development. Notable:

Brook (and Brook+)
Cilk, compare also Intel Array Building Blocks

SLIDE 14

Languages: Compilation to Intermediary Languages Julian Fietkau

Intermediary Languages

Problem

The actual binary code that runs on devices needs to “know” about exact numbers for cores, memory, registers etc., information that is generally not known at compile time.

→ compilation to an intermediary language like NVIDIA’s PTX and AMD’s IL, low-level and assembly-like yet abstracting some hardware limitations

SLIDE 15

Languages: Compilation to Intermediary Languages Julian Fietkau

PTX and AMD IL

PTX example

.reg .b32 r1, r2;
.global .f32 array[N];

start:
    mov.b32 r1, %tid.x;
    shl.b32 r1, r1, 2;              // shift thread id by 2 bits
    ld.global.b32 r2, array[r1];    // thread[tid] gets array[tid]
    add.f32 r2, r2, 0.5;            // add 1/2
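To see what the PTX snippet does per thread, the plain-Python walk-through below mirrors its four instructions. It is only an emulation: the function name `run_thread` and the sample data are assumptions, and the threads run one after another here instead of in parallel.

```python
# Plain-Python walk-through of the PTX snippet, executed once per thread id.
# The per-thread function name and the sample data are assumptions.
import struct


def run_thread(tid, memory):
    """Mimic the four PTX instructions for the thread with index `tid`."""
    r1 = tid                                        # mov.b32 r1, %tid.x
    r1 = r1 << 2                                    # shl.b32: tid * 4 = byte offset
    (r2,) = struct.unpack_from("<f", memory, r1)    # ld.global.b32 r2, array[r1]
    r2 = r2 + 0.5                                   # add.f32 r2, r2, 0.5
    return r2


# "Global memory": four 32-bit floats packed into a byte buffer.
array = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
results = [run_thread(tid, array) for tid in range(4)]
print(results)
```

The left shift by 2 multiplies the thread id by 4, the size in bytes of one 32-bit float, which is why thread `tid` ends up reading element `tid` of the array.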

AMD IL example

sample_resource(0)_sampler(0) r0.x, v0.xy00
mov r2.x, r0.xxxx
dcl_output_generic o0
ret

SLIDE 16

Properties: Programmability Julian Fietkau

Programmability

as they’re mostly custom versions of C, GPGPU languages are rather simple to pick up for someone with C experience

OpenCL™ and CUDA both look slightly boilerplate-y for small tasks

hypothesis: they might not be designed for small tasks

disadvantage of the cutting edge: toolchain maturity might be lacking

watch out for vendor dependencies!

SLIDE 17

Properties: Efficiency Julian Fietkau

Efficiency

hard to find actual data

    optimizations and proficiency might skew the results

conceptual similarities indicate that implementations would also be similar

CUDA can get a (constant) head start vs. OpenCL™ due to being precompiled

CUDA might generally perform faster, sometimes significantly, than OpenCL (but take this with a grain of salt)
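What a "constant" head start means can be illustrated with a simple cost model: a fixed setup cost (such as compiling a kernel at run time) plus work that grows with the problem size. All timings in the sketch below are invented placeholder numbers, not measurements, and the function names are assumptions.

```python
# Illustrative cost model; all timings are invented placeholder numbers,
# not measurements.

def total_time(setup_seconds, per_item_seconds, n_items):
    """Constant setup cost plus work that grows with the problem size."""
    return setup_seconds + per_item_seconds * n_items


def overhead_share(setup_seconds, per_item_seconds, n_items):
    """Fraction of the total run time spent on the constant setup."""
    return setup_seconds / total_time(setup_seconds, per_item_seconds, n_items)


RUNTIME_COMPILE = 0.5   # hypothetical: kernel compiled at application start
PER_ITEM = 1e-6         # hypothetical: identical kernel speed in both cases

for n in (10**4, 10**6, 10**8):
    share = overhead_share(RUNTIME_COMPILE, PER_ITEM, n)
    print(f"{n:>10} items: setup is {share:.1%} of the run time")
```

Under these assumptions the setup cost dominates small jobs and becomes negligible for large ones, so a constant head start alone would not explain a significant difference on long-running workloads.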

SLIDE 18

Prospects and Conclusions: Future Developments Julian Fietkau

Things to Come

The future remains notoriously hard to predict.

at the moment, we see increased interest in specialized GPGPU boards (cf. NVIDIA Tesla and AMD FireStream)

OpenCL promotes device flexibility at the cost of efficiency – no way to know if this strategy will win

Intel pushes for integrated solutions with more processing power (cf. Sandy Bridge, Ivy Bridge)

SLIDE 19

Prospects and Conclusions: Conclusion Julian Fietkau

Conclusion

GPGPU is a viable way to do massively parallel work even on a home PC

will be further developed and refined, knowledge may be valuable

SLIDE 20

External Links: Weblinks Julian Fietkau

Weblinks

AMD Developer Central: Introduction to OpenCL™ Programming

http://developer.amd.com/zones/openclzone/...-may-2010.aspx

GPGPU: OpenCL™ (Università di Catania)

http://www.dmi.unict.it/~bilotta/gpgpu/notes/11-opencl.html

NVIDIA: PTX ISA Version 2.1

http://developer.download.nvidia.com/compute/.../ptx_isa_2.1.pdf

AMD: High Level Programming for GPGPU

http://coachk.cs.ucf.edu/courses/CDA6938/s08/AMD_IL.pdf

SLIDE 21

External Links: List of figures Julian Fietkau

List of figures

1 Moore’s Law – 2011, by Wgsimon via Wikimedia Commons, CC-BY-SA
2 Stream Computing Example, by Kallistratos via German Wikipedia, public domain
3 OpenCL – Simple Kernel Exec, by Joachim Weging, CC-BY-SA
4 OpenCL – Problem Partitioning, by Joachim Weging, CC-BY-SA