Lecture 2.1 - Introduction to CUDA C: CUDA C vs. Thrust vs. CUDA Libraries


SLIDE 1

GPU Teaching Kit
Accelerated Computing

Lecture 2.1 - Introduction to CUDA C
CUDA C vs. Thrust vs. CUDA Libraries

SLIDE 2

Objective

– To learn the main venues and developer resources for GPU computing

– Where CUDA C fits in the big picture

SLIDE 3

3 Ways to Accelerate Applications

Applications

– Libraries: Easy to use, Most Performance
– Compiler Directives: Easy to use, Portable code
– Programming Languages: Most Performance, Most Flexibility

SLIDE 4

Libraries: Easy, High-Quality Acceleration

– Ease of use: Using libraries enables GPU acceleration without in-depth knowledge of GPU programming
– “Drop-in”: Many GPU-accelerated libraries follow standard APIs, thus enabling acceleration with minimal code changes
– Quality: Libraries offer high-quality implementations of functions encountered in a broad range of applications

SLIDE 5

GPU Accelerated Libraries

– Linear Algebra: FFT, BLAS, SPARSE, Matrix (NVIDIA cuFFT, cuBLAS, cuSPARSE)
– Numerical & Math: RAND, Statistics (NVIDIA Math Lib, NVIDIA cuRAND)
– Data Struct. & AI: Sort, Scan, Zero Sum (GPU AI – Board Games, GPU AI – Path Finding)
– Visual Processing: Image & Video (NVIDIA NPP, NVIDIA Video Encode)

SLIDE 6

Vector Addition in Thrust

thrust::device_vector<float> deviceInput1(inputLength);
thrust::device_vector<float> deviceInput2(inputLength);
thrust::device_vector<float> deviceOutput(inputLength);
thrust::copy(hostInput1, hostInput1 + inputLength, deviceInput1.begin());
thrust::copy(hostInput2, hostInput2 + inputLength, deviceInput2.begin());
thrust::transform(deviceInput1.begin(), deviceInput1.end(),
                  deviceInput2.begin(), deviceOutput.begin(),
                  thrust::plus<float>());
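Thrust deliberately mirrors the C++ standard library, so the same pattern can be sketched on the CPU with std::vector and std::transform. This is a minimal sketch, not from the slide; the helper name vector_add is an assumption:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// CPU-only analogue of the Thrust slide: element-wise addition of two
// vectors via std::transform. std::vector plays the role of
// thrust::device_vector, and std::plus<float> corresponds to
// thrust::plus<float>. (vector_add is a hypothetical helper name.)
std::vector<float> vector_add(const std::vector<float>& in1,
                              const std::vector<float>& in2) {
    std::vector<float> out(in1.size());
    std::transform(in1.begin(), in1.end(),  // first input range
                   in2.begin(),             // second input range
                   out.begin(),             // where results are written
                   std::plus<float>());     // element-wise operation
    return out;
}
```

Swapping std::vector for thrust::device_vector and std::transform for thrust::transform moves the same computation onto the GPU; the thrust::copy calls on the slide handle the host-to-device transfers.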

SLIDE 7

Compiler Directives: Easy, Portable Acceleration

– Ease of use: Compiler takes care of details of parallelism management and data movement
– Portable: The code is generic, not specific to any type of hardware, and can be deployed into multiple languages
– Uncertain: Performance of code can vary across compiler versions

SLIDE 8

OpenACC

– Compiler directives for C, C++, and FORTRAN

#pragma acc parallel loop copyin(input1[0:inputLength], input2[0:inputLength]) copyout(output[0:inputLength])
for (i = 0; i < inputLength; ++i) {
    output[i] = input1[i] + input2[i];
}
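The directive annotates an ordinary serial loop: a compiler without OpenACC support simply ignores the pragma and the loop runs sequentially on the CPU, which is what makes the code portable. A minimal sketch under that assumption (the function name vecAdd is mine, not from the slide):

```cpp
#include <cstddef>

// Serial vector addition wrapped in a function. With an OpenACC compiler
// the pragma offloads the loop to the GPU, copying the two inputs in and
// the output back out; without OpenACC support the pragma is ignored and
// the identical code runs sequentially, unchanged.
void vecAdd(const float* input1, const float* input2,
            float* output, std::size_t inputLength) {
    #pragma acc parallel loop copyin(input1[0:inputLength], input2[0:inputLength]) \
                              copyout(output[0:inputLength])
    for (std::size_t i = 0; i < inputLength; ++i) {
        output[i] = input1[i] + input2[i];
    }
}
```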

SLIDE 9

Programming Languages: Most Performance and Flexible Acceleration

– Performance: Programmer has best control of parallelism and data movement
– Flexible: The computation does not need to fit into a limited set of library patterns or directive types
– Verbose: The programmer often needs to express more details

SLIDE 10

GPU Programming Languages

– Fortran: CUDA Fortran
– C: CUDA C
– C++: CUDA C++
– Python: PyCUDA, Copperhead, Numba
– F#: Alea.cuBase
– Numerical analytics: MATLAB, Mathematica, LabVIEW

SLIDE 11

CUDA - C

Applications

– Libraries: Easy to use, Most Performance
– Compiler Directives: Easy to use, Portable code
– Programming Languages: Most Performance, Most Flexibility (this is where CUDA C fits)

SLIDE 12

GPU Teaching Kit
Accelerated Computing

The GPU Teaching Kit is licensed by NVIDIA and the University of Illinois under the Creative Commons Attribution-NonCommercial 4.0 International License.