Lecture 3.2 CUDA Parallelism Model Multidimensional Kernel - PowerPoint PPT Presentation

Jul 16, 2023

GPU Teaching Kit: Accelerated Computing
Lecture 3.2 – CUDA Parallelism Model: Multidimensional Kernel Configuration
Objective
– To understand multidimensional grids
– Multi-dimensional block and thread indices
– Mapping block/thread indices to data indices
A Multi-Dimensional Grid Example
[Figure: the host launches Kernel 1 on the device as Grid 1, a 2 × 2 arrangement of blocks labeled Block (0,0), Block (0,1), Block (1,0) and Block (1,1); a second grid, Grid 2, is also shown. Block (1,0) is expanded to show its threads, which carry three-dimensional indices from Thread (0,0,0) through Thread (1,0,3).]
Processing a Picture with a 2D Grid
[Figure: a 62 × 76 picture covered by a 2D grid of 16 × 16 blocks.]
Row-Major Layout in C/C++
A 4 × 4 matrix M is stored row by row in linear memory:
  M0,0 M0,1 M0,2 M0,3
  M1,0 M1,1 M1,2 M1,3
  M2,0 M2,1 M2,2 M2,3
  M3,0 M3,1 M3,2 M3,3
becomes the linear array M0 M1 M2 ... M15, in the order M0,0 M0,1 M0,2 M0,3 M1,0 ... M3,3.
The element at (Row, Col) has linear index Row*Width+Col. For example, M2,1 is at 2*4+1 = 9, that is, M9.
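The row-major formula above can be checked with a small host-side C sketch. The helper names `row_major_index` and `row_major_coords` are hypothetical and are not part of the lecture; they only restate Row*Width+Col and its inverse.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (row, col) in a row-major matrix that has
   `width` columns. Hypothetical helper, not from the lecture. */
static size_t row_major_index(size_t row, size_t col, size_t width)
{
    assert(col < width);
    return row * width + col;
}

/* Inverse mapping: recover (row, col) from a linear offset. */
static void row_major_coords(size_t index, size_t width,
                             size_t *row, size_t *col)
{
    assert(width > 0);
    *row = index / width;
    *col = index % width;
}
```

With Width = 4, `row_major_index(2, 1, 4)` gives 9, matching the M2,1 example, and `row_major_coords(9, 4, ...)` maps back to row 2, column 1.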
Source Code of a PictureKernel

    __global__ void PictureKernel(float* d_Pin, float* d_Pout,
                                  int height, int width)
    {
      // Calculate the row # of the d_Pin and d_Pout element
      int Row = blockIdx.y*blockDim.y + threadIdx.y;

      // Calculate the column # of the d_Pin and d_Pout element
      int Col = blockIdx.x*blockDim.x + threadIdx.x;

      // each thread computes one element of d_Pout if in range
      if ((Row < height) && (Col < width)) {
        d_Pout[Row*width+Col] = 2.0*d_Pin[Row*width+Col];
      }
    }

Scale every pixel value by 2.0
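To trace the kernel's index arithmetic without a GPU, the following host-only C sketch replays it with four nested loops standing in for blockIdx.y, blockIdx.x, threadIdx.y and threadIdx.x. This is an illustration under assumptions, not the lecture's code: `picture_kernel_host` and its `block` parameter (a square block edge, playing the role of blockDim.x and blockDim.y) are hypothetical, and the loops run sequentially where a GPU would run the threads in parallel.

```c
#include <assert.h>

/* Host-only emulation of the index arithmetic in PictureKernel.
   Returns the number of emulated threads that fail the range check. */
static int picture_kernel_host(const float *pin, float *pout,
                               int height, int width, int block)
{
    assert(height > 0 && width > 0 && block > 0);
    int grid_x = (width - 1) / block + 1;   /* blocks along x */
    int grid_y = (height - 1) / block + 1;  /* blocks along y */
    int idle = 0;

    for (int by = 0; by < grid_y; ++by)
        for (int bx = 0; bx < grid_x; ++bx)
            for (int ty = 0; ty < block; ++ty)
                for (int tx = 0; tx < block; ++tx) {
                    int row = by * block + ty;  /* blockIdx.y*blockDim.y + threadIdx.y */
                    int col = bx * block + tx;  /* blockIdx.x*blockDim.x + threadIdx.x */
                    if (row < height && col < width)
                        pout[row * width + col] = 2.0f * pin[row * width + col];
                    else
                        ++idle;  /* thread lies outside the picture */
                }
    return idle;
}
```

The range check is what keeps the extra threads in edge blocks from writing past the end of the output array.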
Host Code for Launching PictureKernel

    // assume that the picture is m × n,
    // m pixels in y dimension and n pixels in x dimension
    // input d_Pin has been allocated on and copied to device
    // output d_Pout has been allocated on device
    …
    dim3 DimGrid((n-1)/16 + 1, (m-1)/16+1, 1);
    dim3 DimBlock(16, 16, 1);
    PictureKernel<<<DimGrid,DimBlock>>>(d_Pin, d_Pout, m, n);
    …
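The grid dimensions in the launch code are a ceiling division written in integer arithmetic. A minimal C sketch of that calculation follows; the helper name `blocks_needed` is an assumption made for illustration and does not appear in the lecture.

```c
#include <assert.h>

/* Number of blocks needed to cover `extent` elements when each block
   spans `block` elements: the (extent - 1) / block + 1 expression used
   for DimGrid, valid for extent >= 1. */
static int blocks_needed(int extent, int block)
{
    assert(extent > 0 && block > 0);
    return (extent - 1) / block + 1;
}
```

For the 62 × 76 picture (m = 62, n = 76) with 16 × 16 blocks, this gives 5 blocks in x and 4 blocks in y, so the grid holds 20 blocks.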
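Covering a 62 × 76 Picture with 16 × 16 Blocks
Not all threads in a block will follow the same control flow path: in blocks that overhang the right or bottom edge of the picture, some threads fail the range test and skip the write.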
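How much of the grid overhangs the picture can be counted directly. The C sketch below is an illustration under assumptions: the `coverage` struct and `cover_picture` function are hypothetical names, and square blocks are assumed.

```c
#include <assert.h>

struct coverage {
    int full_blocks;     /* blocks whose threads are all in range */
    int partial_blocks;  /* blocks that overhang the picture edge */
    int idle_threads;    /* threads that fail the range check */
};

static struct coverage cover_picture(int height, int width, int block)
{
    assert(height > 0 && width > 0 && block > 0);
    int grid_x = (width - 1) / block + 1;   /* blocks along x */
    int grid_y = (height - 1) / block + 1;  /* blocks along y */
    int full_x = width / block;    /* block columns entirely inside */
    int full_y = height / block;   /* block rows entirely inside */

    struct coverage c;
    c.full_blocks = full_x * full_y;
    c.partial_blocks = grid_x * grid_y - c.full_blocks;
    c.idle_threads = (grid_x * block) * (grid_y * block) - height * width;
    return c;
}
```

For the 62 × 76 picture with 16 × 16 blocks, 12 of the 20 blocks lie entirely inside the picture, 8 overhang an edge, and 408 of the 5120 launched threads do no work.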
The GPU Teaching Kit is licensed by NVIDIA and the University of Illinois under the Creative Commons Attribution-NonCommercial 4.0 International License.
