Shader Programming vs CUDA
Tien-Tsin Wong
The Chinese University of Hong Kong
5 June 2008, CIGPU, WCCI 2008

GPGPU
• Apply consumer parallel graphics hardware to general-purpose (GP) computing
• Almost every PC comes with a GPU
• Let’s focus on two approaches:
  – Shader programming
  – CUDA

Shader Programming
• The GPU was originally designed for graphics, not for GPGPU
• Shader (program)
• Shading language (a specialized, C-like language)
• A graphics “shell” is needed to run your GP program

Programming as “Drawing”
• Every program must be a “drawing”, even if you draw nothing
• Two dummy triangles cover the screen

Programming as “Drawing” (2)
• Then rasterization (discretization into pixels) takes place
• Each pixel triggers a shader

Pixel as Chromosome
• For evolutionary computation (EC), it is natural to let each pixel be a chromosome
• Each shader instance evaluates the objective function
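The pixel-as-chromosome layout can be emulated on the CPU in a few lines. This is only a sketch of the idea, not shader code: the render-target size, the two-gene chromosome, and the sphere objective are all assumptions made for illustration.

```python
# CPU emulation of the "pixel as chromosome" layout (illustrative only).
WIDTH, HEIGHT = 4, 2  # hypothetical render-target size: 8 pixels = 8 chromosomes


def objective(chromosome):
    """Hypothetical objective: the sphere function (sum of squared genes)."""
    return sum(g * g for g in chromosome)


def fragment_shader(population, x, y):
    """Work done for one pixel: evaluate the chromosome stored at (x, y)."""
    return objective(population[y][x])


def render_pass(population):
    """Emulate rasterization: one shader invocation per covered pixel."""
    return [[fragment_shader(population, x, y) for x in range(WIDTH)]
            for y in range(HEIGHT)]


# Each "texel" holds one chromosome with two genes (an assumption).
population = [[(float(x), float(y)) for x in range(WIDTH)]
              for y in range(HEIGHT)]
fitness = render_pass(population)
print(fitness)
```

On a GPU the nested loops in `render_pass` disappear: the rasterizer generates the pixels and the hardware runs the shader for each of them in parallel.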

CUDA
• A tailor-made platform for GPGPU on the GPU
• No dummy graphics “shell”

CUDA Architecture
• shader => kernel
• Shared memory
• Thread synchronization
• Communication!

Shader vs CUDA
• Learning curve:
  – Shader: dummy graphics “shell” needed, plus a specialized shading language => longer learning curve for non-graphics people
  – CUDA: just like multi-threaded programming, basically the C language => easier to pick up for most people

Shader vs CUDA
• Communication among processes:
  – Shader: none => multiple passes, reading & writing textures to share data
  – CUDA: yes, via shared memory & synchronization => fewer passes, more efficient and flexible
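The difference in pass count can be illustrated with a sum reduction, emulated here on the CPU. This is a sketch under assumptions: both function names are hypothetical, the Python list `shared` merely stands in for CUDA shared memory, and the end of each `while` iteration stands in for a `__syncthreads()` barrier. Input is assumed non-empty.

```python
def shader_style_sum(values):
    """Multi-pass reduction: each pass reads one texture and writes another.

    Within a pass no shader instance can see another instance's output,
    so partial sums must go through a texture and a new pass.
    """
    src = list(values)
    passes = 0
    while len(src) > 1:
        half = (len(src) + 1) // 2
        # One output pixel per input pair; reads come only from `src`.
        dst = [src[2 * i] + (src[2 * i + 1] if 2 * i + 1 < len(src) else 0)
               for i in range(half)]
        src = dst  # ping-pong: this pass's output is the next pass's input
        passes += 1
    return src[0], passes


def cuda_style_sum(values):
    """Single-launch reduction: threads cooperate through a shared array."""
    shared = list(values)  # stands in for a __shared__ array
    n = len(shared)
    stride = 1
    while stride < n:
        for tid in range(0, n, 2 * stride):  # threads active in this step
            if tid + stride < n:
                shared[tid] += shared[tid + stride]
        stride *= 2  # a barrier would separate consecutive steps here
    return shared[0], 1  # one kernel launch


print(shader_style_sum(range(1, 9)))  # total and number of render passes
print(cuda_style_sum(range(1, 9)))    # total and number of kernel launches
```

The real CUDA version is limited to one thread block per shared array, so large inputs still need a second step to combine the per-block results.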

Shader vs CUDA (2)
• Logical number of instances
  – Shader: strongly coupled with the screen resolution
    No. of pixels = No. of shader instances = No. of chromosomes
    => straightforward problem formulation
  – CUDA: depends on the hardware limit
    No. of threads < No. of chromosomes
    => each thread handles multiple chromosomes
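When there are fewer threads than chromosomes, one common way to share the work is a strided loop, in which thread t handles chromosomes t, t + T, t + 2T, and so on for T threads. The sketch below computes that assignment on the CPU. The function name and the sizes are hypothetical, and the slides do not say which assignment scheme was actually used.

```python
def chromosomes_for_thread(thread_id, num_threads, num_chromosomes):
    """Indices one thread would process in a strided loop."""
    return list(range(thread_id, num_chromosomes, num_threads))


NUM_THREADS = 4        # hypothetical thread count
NUM_CHROMOSOMES = 10   # hypothetical population size

assignment = {t: chromosomes_for_thread(t, NUM_THREADS, NUM_CHROMOSOMES)
              for t in range(NUM_THREADS)}
for t, indices in assignment.items():
    print(f"thread {t}: chromosomes {indices}")
```

Every chromosome is visited exactly once, and the per-thread workloads differ by at most one chromosome.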

Shader vs CUDA (3)
• Efficiency
• In theory, CUDA should be as efficient as shader programming

Shader vs CUDA (4)
• Standardization
  – Shader: there are standards
    GLSL (OpenGL Shading Language)
    HLSL (Microsoft DirectX High Level Shading Language)
    => cross-platform (can be ATI or NVIDIA)
  – CUDA: the standard is still forming
    CUDA is basically supported only by its vendor, NVIDIA; it is not clear whether ATI will support it

Shader vs CUDA (5)
• Access to graphics-specific functionalities
• Mipmapping, cubemap look-up
  – Shader: accessible
    => fast evaluation (look-up) of spherical functions
    => fast downsampling and upsampling
  – CUDA: no access
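To show what mipmapping offers for downsampling, here is a CPU sketch that builds a mip chain by repeated 2×2 averaging. It assumes a square, power-of-two, single-channel image and a plain box filter. Actual hardware and drivers may filter differently, and the function names are hypothetical.

```python
def downsample(level):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(level), len(level[0])
    return [[(level[2 * y][2 * x] + level[2 * y][2 * x + 1] +
              level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w // 2)]
            for y in range(h // 2)]


def build_mip_chain(base):
    """Return [base, half size, quarter size, ..., 1x1]."""
    chain = [base]
    while len(chain[-1]) > 1:
        chain.append(downsample(chain[-1]))
    return chain


# Hypothetical 4x4 image holding the values 0..15.
base = [[float(4 * y + x) for x in range(4)] for y in range(4)]
chain = build_mip_chain(base)
print(chain[-1])  # the 1x1 level holds the mean of the base image
```

With a box filter the coarsest level is the mean of all texels, which is why a mip chain is convenient for averaging many per-pixel results in a shader.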

Debugging Shader
• So far, quite limited
• printf-style visual debugging (graphics)
• Microsoft Shader Debugger
  – MS DirectX shaders can be debugged
  – Shader emulation on the CPU, not debugging on the actual GPU
  – Seldom used by us, as we stick to OpenGL for backward compatibility

Debugging Shader (2)
• NVIDIA Shader Debugger for FX Composer
  – Recently released (April 2008) as a plugin for FX Composer
    http://developer.nvidia.com/object/shader_debugger_beta.html
• glslDevil, OpenGL GLSL Debugger
    http://www.vis.uni-stuttgart.de/glsldevil/

Debugging Shader (3)
• The number of execution cycles a shader needs can be determined offline:
    nvshaderperf -a G70 -f main shader.cg
    http://developer.nvidia.com/object/nvshaderperf_home.html

Debugging CUDA
• CUDA can be executed in device emulation mode => threads are executed sequentially
• Setting breakpoints is feasible
• Currently, debugging tools are still quite scarce

Debugging CUDA (2)
• VC++ debug modes
  – EmuDebug, Debug
• Kernel code is traceable in EmuDebug (emulation) mode, not on actual hardware
• gdb debugger (not yet released)

Debugging CUDA (3)
• Profiling in CUDA
  Controlled by the CUDA_PROFILE environment variable: set it to 1 to enable, 0 to disable

    ./shaderprogram -N1024

    method=[ memcopy ] gputime=[ 1427.200 ]
    method=[ memcopy ] gputime=[ 10.112 ]
    method=[ memcopy ] gputime=[ 9.632 ]
    method=[ real2complex ] gputime=[ 1654.080 ] cputime=[ 1702.000 ] occupancy=[ 0.667 ]
    method=[ c2c_radix4 ] gputime=[ 8651.936 ] cputime=[ 8683.000 ] occupancy=[ 0.333 ]
    method=[ transpose ] gputime=[ 2728.640 ] cputime=[ 2773.000 ] occupancy=[ 0.333 ]
    method=[ c2c_radix4 ] gputime=[ 8619.968 ] cputime=[ 8651.000 ] occupancy=[ 0.333 ]
    method=[ c2c_transpose ] gputime=[ 2731.456 ] cputime=[ 2762.000 ] occupancy=[ 0.333 ]
    method=[ solve_poisson] gputime=[ 6389.984 ] cputime=[ 6422.000 ] occupancy=[ 0.667 ]
    method=[ c2c_radix4 ] gputime=[ 8518.208 ] cputime=[ 8556.000 ] occupancy=[ 0.333 ]
    method=[ c2c_transpose] gputime=[ 2724.000 ] cputime=[ 2757.000 ] occupancy=[ 0.333 ]
    method=[ c2c_radix4 ] gputime=[ 8618.752 ] cputime=[ 8652.000 ] occupancy=[ 0.333 ]
    method=[ c2c_transpose] gputime=[ 2767.840 ] cputime=[ 5248.000 ] occupancy=[ 0.333 ]
    method=[ complex2real_scaled ] gputime=[ 2844.096 ] cputime=[ 3613.000 ] occupancy=[ 0.667 ]
    method=[ memcopy ] gputime=[ 2461.312 ]
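A profile log in this format is easy to summarize with a short script. The sketch below totals `gputime` per method. The regular expression is inferred only from the lines shown on the slide, so it is an assumption about the log format rather than a documented grammar, and `gpu_time_by_method` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches "method=[ name ] gputime=[ 123.456 ]"; spacing inside the
# brackets varies in the slide's log, so whitespace is optional.
RECORD = re.compile(
    r"method=\[\s*([^\]\s]+)\s*\]\s*gputime=\[\s*([0-9.]+)\s*\]")


def gpu_time_by_method(log_text):
    """Sum the gputime values of all records, grouped by method name."""
    totals = defaultdict(float)
    for name, gputime in RECORD.findall(log_text):
        totals[name] += float(gputime)
    return dict(totals)


# A few records copied from the log above.
sample_log = """
method=[ memcopy ] gputime=[ 1427.200 ]
method=[ memcopy ] gputime=[ 10.112 ]
method=[ real2complex ] gputime=[ 1654.080 ] cputime=[ 1702.000 ] occupancy=[ 0.667 ]
method=[ solve_poisson] gputime=[ 6389.984 ] cputime=[ 6422.000 ] occupancy=[ 0.667 ]
"""

totals = gpu_time_by_method(sample_log)
for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {total:10.3f}")
```

Sorting by total GPU time puts the most expensive kernels first, which is usually where tuning should start.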

Debugging CUDA (4)
• Occupancy: the fraction of a multiprocessor's warp capacity that is actually in use; it is limited by the amount of shared memory and registers used by each thread block
• The CUDA Occupancy Calculator computes the multiprocessor occupancy of the GPU for a given CUDA kernel
    http://developer.download.nvidia.com/compute/cuda/CUDA_Occupancy_calculator.xls
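The arithmetic behind an occupancy figure can be sketched as follows, taking occupancy as active warps divided by the maximum number of warps a multiprocessor can hold. The default limits (768 threads, 8 blocks, 8192 registers, and 16 KB of shared memory per multiprocessor) are assumed values for first-generation CUDA hardware. The function is a simplification that ignores allocation granularity, so the real calculator may give different numbers in some cases.

```python
def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_threads=768, max_blocks=8,
              regs_per_sm=8192, smem_per_sm=16384, warp_size=32):
    """Simplified occupancy estimate: active warps / maximum warps.

    The default hardware limits are assumptions, not values from the slides.
    """
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    max_warps = max_threads // warp_size

    # How many blocks fit under each resource limit.
    by_warps = max_warps // warps_per_block
    regs_per_block = regs_per_thread * threads_per_block
    by_regs = regs_per_sm // regs_per_block if regs_per_block else max_blocks
    by_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks

    blocks = min(max_blocks, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / max_warps


# Hypothetical kernels with 256 threads per block and 4 KB of shared memory.
for regs in (10, 16, 32):
    print(regs, "registers/thread ->", round(occupancy(256, regs, 4096), 3))
```

Under these assumptions, raising register use from 10 to 16 to 32 per thread lowers the number of resident blocks from 3 to 2 to 1, which is the kind of step change that appears in the `occupancy` column of a profile log.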

Panel Discussions
• Components needed for GPGPU from the perspective of the EC community
• Debugging experience
• Standardization of GPGPU platforms and languages
• Any other topics
