GPU Teaching Kit – Accelerated Computing
Lecture 1.1 – Course Introduction
Course Introduction and Overview

Course Goals
– Learn how to program heterogeneous parallel computing systems and achieve
  – High performance and energy efficiency
  – Functionality and maintainability
  – Scalability across future generations
  – Portability across vendor devices
– Technical subjects
  – Parallel programming API, tools and techniques
  – Principles and patterns of parallel algorithms
  – Processor architecture features and constraints

People
– Wen-mei Hwu (University of Illinois)
– David Kirk (NVIDIA)
– Joe Bungo (NVIDIA)
– Mark Ebersole (NVIDIA)
– Abdul Dakkak (University of Illinois)
– Izzat El Hajj (University of Illinois)
– Andy Schuh (University of Illinois)
– John Stratton (Colgate College)
– Isaac Gelado (NVIDIA)
– John Stone (University of Illinois)
– Javier Cabezas (NVIDIA)
– Michael Garland (NVIDIA)

Course Content

Module 1 – Course Introduction
• Course Introduction and Overview
• Introduction to Heterogeneous Parallel Computing
• Portability and Scalability in Heterogeneous Parallel Computing

Module 2 – Introduction to CUDA C
• CUDA C vs. CUDA Libs vs. OpenACC
• Memory Allocation and Data Movement API Functions
• Data Parallelism and Threads
• Introduction to CUDA Toolkit

Module 3 – CUDA Parallelism Model
• Kernel-Based SPMD Parallel Programming
• Multidimensional Kernel Configuration
• Color-to-Greyscale Image Processing Example
• Blur Image Processing Example

Module 4 – Memory Model and Locality
• CUDA Memories
• Tiled Matrix Multiplication
• Tiled Matrix Multiplication Kernel
• Handling Boundary Conditions in Tiling
• Tiled Kernel for Arbitrary Matrix Dimensions

Module 5 – Kernel-based Parallel Programming
• Histogram (Sort) Example
• Basic Matrix-Matrix Multiplication Example
• Thread Scheduling
• Control Divergence

Course Content (continued)

Module 6 – Performance Considerations: Memory
• DRAM Bandwidth
• Memory Coalescing in CUDA

Module 7 – Atomic Operations
• Atomic Operations

Module 8 – Parallel Computation Patterns (Part 1)
• Convolution
• Tiled Convolution
• 2D Tiled Convolution Kernel

Module 9 – Parallel Computation Patterns (Part 2)
• Tiled Convolution Analysis
• Data Reuse in Tiled Convolution

Module 10 – Performance Considerations: Parallel Computation Patterns
• Reduction
• Basic Reduction Kernel
• Improved Reduction Kernel

Module 11 – Parallel Computation Patterns (Part 3)
• Scan (Parallel Prefix Sum)
• Work-Inefficient Parallel Scan Kernel
• Work-Efficient Parallel Scan Kernel
• More on Parallel Scan

Course Content (continued)

Module 12 – Performance Considerations: Scan Applications
• Scan Applications: Per-thread Output Variable Allocation
• Scan Applications: Radix Sort
• Performance Considerations (Histogram (Atomics) Example)
• Performance Considerations (Histogram (Scan) Example)

Module 13 – Advanced CUDA Memory Model
• Advanced CUDA Memory Model
• Constant Memory
• Texture Memory

Module 14 – Floating Point Considerations
• Floating Point Precision Considerations
• Numerical Stability

Module 15 – GPU as part of the PC Architecture
• GPU as part of the PC Architecture

Module 16 – Efficient Host-Device Data Transfer
• Data Movement API vs. Unified Memory
• Pinned Host Memory
• Task Parallelism/CUDA Streams
• Overlapping Transfer with Computation

Module 17 – Application Case Study: Advanced MRI Reconstruction
• Advanced MRI Reconstruction

Module 18 – Application Case Study: Electrostatic Potential Calculation
• Electrostatic Potential Calculation (Part 1)
• Electrostatic Potential Calculation (Part 2)

Course Content (continued)

Module 19 – Computational Thinking for Parallel Programming
• Computational Thinking for Parallel Programming

Module 20 – Related Programming Models: MPI
• Joint MPI-CUDA Programming
• Joint MPI-CUDA Programming (Vector Addition - Main Function)
• Joint MPI-CUDA Programming (Message Passing and Barrier) (Data Server and Compute Processes)
• Joint MPI-CUDA Programming (Adding CUDA)
• Joint MPI-CUDA Programming (Halo Data Exchange)

Module 21 – CUDA Python Using Numba
• CUDA Python using Numba

Module 22 – Related Programming Models: OpenCL
• OpenCL Data Parallelism Model
• OpenCL Device Architecture
• OpenCL Host Code (Part 1)
• OpenCL Host Code (Part 2)

Module 23 – Related Programming Models: OpenACC
• Introduction to OpenACC
• OpenACC Subtleties

Module 24 – Related Programming Models: OpenGL
• OpenGL and CUDA Interoperability

Course Content (continued)

Module 25 – Dynamic Parallelism
• Effective use of Dynamic Parallelism
• Advanced Architectural Features: Hyper-Q

Module 26 – Multi-GPU
• Multi-GPU

Module 27 – Using CUDA Libraries
• Example Applications Using Libraries: CUBLAS
• Example Applications Using Libraries: CUFFT
• Example Applications Using Libraries: CUSOLVER

Module 28 – Advanced Thrust
• Advanced Thrust

Module 29 – Other GPU Development Platforms: QwickLABS
• Other GPU Development Platforms: QwickLABS

Where to Find Support

The GPU Teaching Kit is licensed by NVIDIA and the University of Illinois under the Creative Commons Attribution-NonCommercial 4.0 International License.
