Introduction to GPU Computing, Ilya Kuzovkin, 13 May 2014, Tartu (PowerPoint presentation)



slide-1
SLIDE 1

INTRODUCTION TO GPU COMPUTING

Ilya Kuzovkin

13 May 2014, Tartu

slide-2
SLIDE 2

PART I “TEAPOT”

slide-4
SLIDE 4

SIMPLE OPENGL PROGRAM

The idea of computing on the GPU emerged because GPUs became very good at parallel computations.

  • Let us start by observing an example of parallelism in a simple OpenGL application.

slide-5
SLIDE 5

SIMPLE OPENGL PROGRAM

You will need CodeBlocks (Windows, Linux) or Xcode (Mac) to run this example.

  • Install CodeBlocks bundled with the MinGW compiler from http://www.codeblocks.org/downloads/26
  • Download the codebase from https://github.com/kuz/Introduction-to-GPU-Computing
  • Open the project from code/Cube
  • Compile & run it
slide-7
SLIDE 7

SHADER PROGRAM

A shader is a program that is executed on the GPU. It has to be written in a shading language. In OpenGL this language is GLSL, which is based on C. OpenGL has 5 main shader stages:

  • Vertex Shader
  • Tessellation (Control and Evaluation)
  • Geometry Shader
  • Fragment Shader
  • Compute Shader (since 4.3)

http://www.opengl.org/wiki/Shader

slide-11
SLIDE 11

LIGHTING

Is it a cube or not? We will find out as soon as we add lighting to the scene.

https://github.com/konstantint/ComputerGraphics2013/blob/master/Lectures/07%20-%20Color%20and%20Lighting/slides07_colorandlighting.pdf

Exercise: code that equation into the fragment shader of the Cube program.
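
The lighting equation itself is not reproduced in this text, so the sketch below assumes the standard Phong model: intensity = ambient + diffuse · max(N·L, 0) + specular · max(R·V, 0)^shininess. It is written as plain C++ for a single fragment, only to show the arithmetic. `Vec3`, `PhongParams` and `phong` are illustrative names, not identifiers from the Cube project. In a GLSL fragment shader the same steps would use the built-in `vec3`, `dot`, `normalize`, `max` and `pow`.

```cpp
#include <algorithm>
#include <cmath>

// Minimal 3D vector; in GLSL this role is played by the built-in vec3.
struct Vec3 {
    float x, y, z;
};

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Assumes v is not the zero vector.
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return (1.0f / len) * v;
}

// Hypothetical material/light coefficients (not taken from the Cube project).
struct PhongParams {
    float ambient;    // k_a
    float diffuse;    // k_d
    float specular;   // k_s
    float shininess;  // specular exponent
};

// Scalar Phong intensity for one fragment.
// n: surface normal, l: direction from the surface to the light,
// v: direction from the surface to the viewer.
float phong(Vec3 n, Vec3 l, Vec3 v, const PhongParams& p) {
    n = normalize(n);
    l = normalize(l);
    v = normalize(v);
    float n_dot_l = std::max(dot(n, l), 0.0f);
    // Mirror reflection of the light direction about the normal: r = 2(n.l)n - l
    Vec3 r = 2.0f * dot(n, l) * n - l;
    float r_dot_v = std::max(dot(r, v), 0.0f);
    // No specular highlight on surfaces facing away from the light.
    float spec = (n_dot_l > 0.0f) ? std::pow(r_dot_v, p.shininess) : 0.0f;
    return p.ambient + p.diffuse * n_dot_l + p.specular * spec;
}
```

On the GPU this function body runs once per fragment, and many fragments are processed at the same time, which is the parallelism this part of the talk is about.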

slide-12
SLIDE 12

LIGHTING

slide-17
SLIDE 17

COMPARE FPS

  • Run the program with lighting enabled and look at the FPS values
  • In the idle() function of cube.cpp, uncomment the dummy code, which simulates approximately the same amount of computation as the Phong lighting model requires
  • Note that these computations are performed on the CPU
  • Observe how the FPS has changed

Parallel computations are fast on the GPU. Let's use it to compute something useful.
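
The dummy code in cube.cpp is not shown in this text. As a rough illustration of the idea, the sketch below performs a lighting-like amount of arithmetic once per pixel, one pixel after another on the CPU, and turns the elapsed time into an FPS estimate. All function names and the arithmetic inside `fake_fragment_work` are assumptions made for this example.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Stand-in for the per-fragment work of a lighting model: a few multiplies,
// a square root and a power. This is not the actual dummy code from cube.cpp.
float fake_fragment_work(float x) {
    float d = std::sqrt(x * x + 1.0f);
    return 0.1f + 0.6f / d + 0.3f * std::pow(1.0f / d, 32.0f);
}

// Run that work once per pixel, sequentially, as a single CPU thread would.
// The sum is returned so that the compiler cannot discard the loop.
float shade_frame_on_cpu(std::size_t width, std::size_t height) {
    float checksum = 0.0f;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            checksum += fake_fragment_work(static_cast<float>(x + y));
    return checksum;
}

// Time one sequential pass over the frame and convert it to frames per second.
double cpu_fps_estimate(std::size_t width, std::size_t height) {
    auto start = std::chrono::steady_clock::now();
    volatile float sink = shade_frame_on_cpu(width, height);
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    return seconds > 0.0 ? 1.0 / seconds : 0.0;
}
```

Calling `cpu_fps_estimate` with the window size gives a figure to set against the FPS observed when the fragment shader does the same kind of work on the GPU.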

slide-18
SLIDE 18

PART II “OLD SCHOOL”

slide-22
SLIDE 22

OPENGL PIPELINE + GLSL

  • Take the input data from the CPU memory and put it as an image into the GPU memory
  • In the fragment shader, perform a computation on each of the pixels of that image
  • Store the resulting image to the Render Buffer inside the GPU memory
  • Read the output from the GPU memory back to the CPU memory

http://www.opengl.org/wiki/Framebuffer

slide-24
SLIDE 24

OPENGL PIPELINE + GLSL

  • Create a texture that will store the input data
  • Create a FrameBuffer Object (FBO) to “render” to

http://www.opengl.org/wiki/Framebuffer

slide-30
SLIDE 30

OPENGL PIPELINE + GLSL

  • Run the OpenGL pipeline
  • Render GL_QUADS of the same size as the texture matrix
  • Use the fragment shader to perform per-fragment computations using data from the texture
  • OpenGL will store the result in the texture given to the Render Buffer (within the Framebuffer Object)
  • Read the data from the Render Buffer
  • Can we use that to properly debug GLSL?

http://www.opengl.org/wiki/Framebuffer
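
The data flow of this “old school” approach can be sketched without any OpenGL at all. The C++ below is a CPU-only emulation: a plain array stands in for the texture and the render buffer, and an ordinary function stands in for the fragment shader. It makes no OpenGL calls, and every name in it is hypothetical. In a real program the upload, attachment and read-back steps would typically go through calls such as `glTexImage2D`, `glFramebufferTexture2D` and `glReadPixels`.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// A single-channel float "image"; stands in for a texture or a render buffer.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> pixels;  // row-major, expected to hold width * height values
};

// Step 1: copy the input data from CPU memory into a "texture".
Image upload_as_texture(const std::vector<float>& data, std::size_t width, std::size_t height) {
    return Image{width, height, data};
}

// Steps 2-3: run the "fragment shader" once per pixel and store each result
// in the "render buffer". On a GPU these invocations run in parallel;
// here they run one after another.
Image render_full_quad(const Image& texture, const std::function<float(float)>& fragment_shader) {
    Image render_buffer{texture.width, texture.height,
                        std::vector<float>(texture.pixels.size())};
    for (std::size_t i = 0; i < texture.pixels.size(); ++i)
        render_buffer.pixels[i] = fragment_shader(texture.pixels[i]);
    return render_buffer;
}

// Step 4: read the result back into CPU memory.
std::vector<float> read_back(const Image& render_buffer) {
    return render_buffer.pixels;
}
```

For instance, uploading the values 1, 2, 3, 4 as a 2×2 image and rendering with a shader that squares its input reads back 1, 4, 9, 16. The quad has the same size as the texture so that exactly one fragment is produced per input element.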

slide-31
SLIDE 31

DEMO

Run the project from code/FBO

slide-32
SLIDE 32

PART III “MODERN TIMES”

slide-35
SLIDE 35

COMPUTE SHADER

  • Since OpenGL 4.3
  • Used to compute things not directly related to rendering

Will not talk about it

http://web.engr.oregonstate.edu/~mjb/cs557/Handouts/compute.shader.1pp.pdf

slide-40
SLIDE 40

  • CUDA: supported only by nVidia hardware (https://developer.nvidia.com/cuda-gpus)
  • OpenCL: supported by nVidia, AMD, Intel, Qualcomm (http://www.khronos.org/conformance/adopters/conformant-products#opencl)

Implementations only by nVidia OpenCL

~same performance levels

Developer-friendly OpenCL

http://wiki.tiker.net/CudaVsOpenCL

slide-41
SLIDE 41

PART III CHAPTER 1

slide-42
SLIDE 42
slide-43
SLIDE 43
slide-44
SLIDE 44
slide-45
SLIDE 45
slide-46
SLIDE 46
slide-47
SLIDE 47

KERNEL

slide-52
SLIDE 52

WRITE AND READ DATA ON GPU

… run computations here …

slide-53
SLIDE 53

THE COMPUTATION

slide-58
SLIDE 58

DEMO

Open, study and run the project from the code/OpenCL

slide-59
SLIDE 59

PART III CHAPTER 2

slide-62
SLIDE 62

CUDA PROGRAMMING MODEL

  • CPU is called “host”
  • Move data CPU <-> GPU memory: cudaMemcpy
  • Allocate memory: cudaMalloc
  • Launch kernels on GPU
  • GPU is called “device”

Typical CUDA program

  • 1. CPU allocates memory on GPU
  • 2. CPU copies data to GPU memory
  • 3. CPU launches kernels on GPU (process the data)
  • 4. CPU copies results back to CPU memory

Very similar to the logic of OpenCL
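
The four steps can be walked through with a CPU-only mock. The C++ below is not CUDA code. The `mock_*` functions are hypothetical stand-ins that play the roles `cudaMalloc`, `cudaMemcpy`, a kernel launch and `cudaFree` play in a real program, and the "kernel" is an ordinary function called once per thread index. It is only meant to show which side owns which buffer at each step.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Mock "device memory": a separate buffer that the host only touches via copies.
float* mock_device_alloc(std::size_t count) { return new float[count]; }
void mock_device_free(float* device_ptr) { delete[] device_ptr; }

void mock_copy(float* dst, const float* src, std::size_t count) {
    if (count == 0) return;
    std::memcpy(dst, src, count * sizeof(float));
}

// "Kernel": the work done by one thread, identified by its index.
void square_kernel(float* data, std::size_t thread_index) {
    data[thread_index] = data[thread_index] * data[thread_index];
}

// Mock launch: a GPU would run the threads in parallel; here they run in a loop.
void mock_launch(void (*kernel)(float*, std::size_t), float* device_data, std::size_t threads) {
    for (std::size_t i = 0; i < threads; ++i) kernel(device_data, i);
}

std::vector<float> square_on_mock_device(const std::vector<float>& host_in) {
    const std::size_t n = host_in.size();
    float* device_data = mock_device_alloc(n);   // 1. allocate memory on the "GPU"
    mock_copy(device_data, host_in.data(), n);   // 2. copy data to "GPU" memory
    mock_launch(square_kernel, device_data, n);  // 3. launch the kernel
    std::vector<float> host_out(n);
    mock_copy(host_out.data(), device_data, n);  // 4. copy results back to CPU memory
    mock_device_free(device_data);
    return host_out;
}
```

Passing the values 1, 2, 3 to `square_on_mock_device` returns 1, 4, 9. In real CUDA the host pointer and the device pointer live in different memories, which is why steps 2 and 4 are explicit copies.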

slide-63
SLIDE 63

EXAMPLE

Introduction to Parallel Programming @ Udacity https://www.udacity.com/course/cs344

slide-70
SLIDE 70

PART IV “DISCUSSION”

slide-71
SLIDE 71

LINKS

  • Code repository for this presentation
      • https://github.com/kuz/Introduction-to-GPU-Computing
      • Feel free to leave feature requests there, ask questions, etc.
  • OpenCL tutorials & presentations
      • http://streamcomputing.eu/knowledge/for-developers/tutorials/
      • http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201&referringTitle=OpenCL%20Tutorials
      • http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/06-intro_to_opencl.pdf
  • Introduction to Parallel Programming @ Udacity
      • https://www.udacity.com/course/cs344
  • Slides about Compute Shader
      • http://web.engr.oregonstate.edu/~mjb/cs557/Handouts/compute.shader.1pp.pdf
  • Introduction to Computer Graphics with codebases
      • https://github.com/konstantint/ComputerGraphics2013
  • GLSL Computing (aka “Old school”)
      • http://www.computer-graphics.se/gpu-computing/lab1.html