Lecture 2.2 - Introduction to CUDA C: Memory Allocation and Data Movement API Functions


  1. GPU Teaching Kit - Accelerated Computing
     Lecture 2.2 - Introduction to CUDA C: Memory Allocation and Data Movement API Functions

  2. Objective
     – To learn the basic API functions in CUDA host code
       – Device Memory Allocation
       – Host-Device Data Transfer

  3. Data Parallelism - Vector Addition Example
     [Figure: vectors A and B are added element-wise to produce vector C, i.e. C[i] = A[i] + B[i] for i = 0 … N-1]

  4. Vector Addition - Traditional C Code

     // Compute vector sum C = A + B
     void vecAdd(float *h_A, float *h_B, float *h_C, int n)
     {
         int i;
         for (i = 0; i < n; i++)
             h_C[i] = h_A[i] + h_B[i];
     }

     int main()
     {
         // Memory allocation for h_A, h_B, and h_C
         // I/O to read h_A and h_B, N elements
         …
         vecAdd(h_A, h_B, h_C, N);
     }

  5. Heterogeneous Computing vecAdd CUDA Host Code
     [Figure: Part 1 copies data from Host Memory (CPU) to Device Memory (GPU), Part 2 computes on the GPU, Part 3 copies the result back to the host]

     #include <cuda.h>

     void vecAdd(float *h_A, float *h_B, float *h_C, int n)
     {
         int size = n * sizeof(float);
         float *d_A, *d_B, *d_C;

         // Part 1
         // Allocate device memory for A, B, and C
         // Copy A and B to device memory

         // Part 2
         // Kernel launch code - the device performs the actual vector addition

         // Part 3
         // Copy C from the device memory
         // Free device vectors
     }

  6. Partial Overview of CUDA Memories
     – Device code can:
       – R/W per-thread registers
       – R/W all-shared global memory
     – Host code can:
       – Transfer data to/from per-grid global memory
     [Figure: a (Device) Grid containing Block (0, 0) and Block (0, 1); each thread - Thread (0, 0), Thread (0, 1) - has its own Registers; the Host connects to the device's Global Memory]
     We will cover more memory types and more sophisticated memory models later.
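     As a preview of the distinction above (kernel syntax is introduced in a later lecture), a minimal device-code sketch: the index variable below lives in a per-thread register, while the pointer arguments refer to device global memory that the host fills through transfers. The kernel name and bounds guard are illustrative assumptions, not part of these slides.

     // Sketch only: the __global__ qualifier and launch syntax are covered later
     __global__ void vecAddKernel(float *A, float *B, float *C, int n)
     {
         int i = blockIdx.x * blockDim.x + threadIdx.x; // i is held in a per-thread register
         if (i < n)                                     // guard threads past the end of the vectors
             C[i] = A[i] + B[i];                        // A, B, C reside in global memory
     }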

  7. CUDA Device Memory Management API Functions
     – cudaMalloc()
       – Allocates an object in the device global memory
       – Two parameters
         – Address of a pointer to the allocated object
         – Size of allocated object in terms of bytes
     – cudaFree()
       – Frees object from device global memory
       – One parameter
         – Pointer to freed object
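     A minimal usage sketch of the two calls described above; the name d_A follows the later host-code slide, and the element count is an assumption:

     float *d_A;
     int size = 256 * sizeof(float);    // size is given in bytes
     cudaMalloc((void **) &d_A, size);  // pass the ADDRESS of the pointer
     // ... use d_A in device code ...
     cudaFree(d_A);                     // pass the pointer itself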

  8. Host-Device Data Transfer API Functions
     – cudaMemcpy()
       – Memory data transfer
       – Requires four parameters
         – Pointer to destination
         – Pointer to source
         – Number of bytes copied
         – Type/direction of transfer
       – Transfer to device is asynchronous
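     A brief sketch of the four parameters in order, using the pointer names from the surrounding slides:

     // destination, source, number of bytes, direction of transfer
     cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice); // host to device
     cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost); // device to host: pointers and direction flag reversed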

  9. Vector Addition Host Code

     void vecAdd(float *h_A, float *h_B, float *h_C, int n)
     {
         int size = n * sizeof(float);
         float *d_A, *d_B, *d_C;

         cudaMalloc((void **) &d_A, size);
         cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
         cudaMalloc((void **) &d_B, size);
         cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);
         cudaMalloc((void **) &d_C, size);

         // Kernel invocation code - to be shown later

         cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

         cudaFree(d_A);
         cudaFree(d_B);
         cudaFree(d_C);
     }
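     For completeness, a hypothetical caller mirroring the main() outline from the traditional C slide; the vector length and input values are illustrative assumptions:

     #include <stdlib.h>

     int main(void)
     {
         int n = 1024;                         // assumed vector length
         int size = n * sizeof(float);
         float *h_A = (float *) malloc(size);
         float *h_B = (float *) malloc(size);
         float *h_C = (float *) malloc(size);
         for (int i = 0; i < n; i++) {         // sample inputs
             h_A[i] = 1.0f;
             h_B[i] = 2.0f;
         }
         vecAdd(h_A, h_B, h_C, n);             // result: h_C[i] == 3.0f for all i
         free(h_A); free(h_B); free(h_C);
         return 0;
     }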

  10. In Practice, Check for API Errors in Host Code

      cudaError_t err = cudaMalloc((void **) &d_A, size);
      if (err != cudaSuccess) {
          printf("%s in %s at line %d\n",
                 cudaGetErrorString(err), __FILE__, __LINE__);
          exit(EXIT_FAILURE);
      }
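      Checking every call inline gets verbose, so a common pattern is to wrap the slide's test in a macro; CHECK is a hypothetical name, not part of the CUDA API:

      #include <stdio.h>
      #include <stdlib.h>

      #define CHECK(call)                                               \
          do {                                                          \
              cudaError_t err = (call);                                 \
              if (err != cudaSuccess) {                                 \
                  printf("%s in %s at line %d\n",                       \
                         cudaGetErrorString(err), __FILE__, __LINE__);  \
                  exit(EXIT_FAILURE);                                   \
              }                                                         \
          } while (0)

      // Usage:
      CHECK(cudaMalloc((void **) &d_A, size));
      CHECK(cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice));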

  11. GPU Teaching Kit - Accelerated Computing
      The GPU Teaching Kit is licensed by NVIDIA and the University of Illinois under the Creative Commons Attribution-NonCommercial 4.0 International License.
