

SLIDE 1

April 4-7, 2016 | Silicon Valley

Yogesh Kini, GTC 2016

CUDA ON MOBILE

SLIDE 2

ABSTRACT

  • Typical pipeline
  • CUDA interop APIs
  • Unified Memory on Tegra

SLIDE 3

TYPICAL USE CASES

  • Automobiles: autonomous cars
  • Embedded: drones, robots, smart surveillance
  • Mobile devices: consoles, tablets

SLIDE 4

TYPICAL PIPELINE

CAPTURE → PROCESS → DISPLAY

[Diagram: Camera sensor → ISP/DSP → CUDA → Graphics/Display → Actuators]

SLIDE 5

CUDA OPENGL(ES) INTEROP

SLIDE 6

CUDA–OPENGL(ES)

  • Provides access to OpenGL ES resources in CUDA
  • Support for EGL
  • Supported on Android, L4T, Vibrante Linux, QNX
  • Implicit synchronization support
  • Useful for graphics applications and games
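The interop flow above can be sketched with the CUDA runtime graphics-interop API. This is a minimal illustration, not the presentation's own code: it assumes a GL context is current, that `vbo` was created with glGenBuffers()/glBufferData(), and the kernel name is made up for the example.

```cuda
// Sketch: mapping an OpenGL ES buffer object into CUDA (runtime API).
#include <cuda_gl_interop.h>

__global__ void scale_kernel(float *data, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

void process_vbo(GLuint vbo, int n)
{
    cudaGraphicsResource_t res;
    // Register once per buffer lifetime.
    cudaGraphicsGLRegisterBuffer(&res, vbo, cudaGraphicsRegisterFlagsNone);

    // Map before CUDA touches the buffer.
    cudaGraphicsMapResources(1, &res, 0);
    float *dptr = NULL;
    size_t nbytes = 0;
    cudaGraphicsResourceGetMappedPointer((void **)&dptr, &nbytes, res);

    scale_kernel<<<(n + 255) / 256, 256>>>(dptr, n, 2.0f);

    // Unmapping hands the buffer back to OpenGL ES; this boundary is
    // where the implicit synchronization mentioned above takes effect.
    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
}
```

Register/unregister once, map/unmap per frame: mapping is the cheap per-frame operation, registration is not.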

SLIDE 7

EGL IMAGE INTEROP

SLIDE 8

EGL IMAGE

Sources for an EGL image:

  • GStreamer
  • OpenGL ES
  • OpenMAX
  • Android GraphicBuffer

Khronos EGL_KHR_image_base spec:

https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_image_base.txt

SLIDE 9

EGL IMAGE

An EGLImage is registered with cuGraphicsEGLRegisterImage(); the registered resource can then be mapped as a CUDA device pointer with cuGraphicsResourceGetMappedPointer() or as a cudaArray with cuGraphicsResourceGetMappedArray().

Usage pattern: begin and end resource usage in the other API, synchronize, then begin resource usage in CUDA, run the CUDA code, and end resource usage in CUDA.
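The registration step can be sketched with the CUDA driver API. This is an illustrative fragment under assumptions: `eglImage` was already created with eglCreateImageKHR() from one of the sources listed earlier, and all use of the image in the producing API has finished.

```cuda
// Sketch (driver API): registering an existing EGLImage for CUDA access.
#include <cudaEGL.h>

CUgraphicsResource res;
CUeglFrame frame;

// Register the EGLImage as a CUDA graphics resource.
cuGraphicsEGLRegisterImage(&res, eglImage, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);

// Retrieve the frame description: pitch-linear images come back as
// device pointers, block-linear images as CUDA arrays.
cuGraphicsResourceGetMappedEglFrame(&frame, res, 0, 0);

// ... launch kernels on frame.frame.pPitch[0] (pitch-linear) or use
//     frame.frame.pArray[0] (block-linear) ...

cuCtxSynchronize();                 // finish CUDA work before handing back
cuGraphicsUnregisterResource(res);  // release the registration
```

YUV planar images expose one entry per plane in the CUeglFrame, which is what makes this path usable for camera/ISP output.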

SLIDE 10

EGL STREAMS INTEROP

SLIDE 11

EGL STREAMS

  • Producer-consumer architecture
  • EGL streams spec: https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_stream.txt
  • Implicit synchronization
  • Cross-process support
  • Supports YUV formats

[Diagram: ISP producer → EGL stream → CUDA consumer (CUDA, cuDNN, cuBLAS, VisionWorks); CUDA producer → EGL stream → OpenGL consumer]

SLIDE 12

EGL STREAMS

A frame travels between a CUDA producer and a CUDA consumer over an EGL stream:

  1. Both ends connect: cuEGLStreamProducerConnect() on the producer side, cuEGLStreamConsumerConnect() on the consumer side.
  2. The producer presents a frame with cuEGLStreamProducerPresentFrame(frame).
  3. The consumer acquires it with cuEGLStreamConsumerAcquireFrame(frame), uses it in CUDA, then releases it with cuEGLStreamConsumerReleaseFrame(frame).
  4. The producer reclaims the frame with cuEGLStreamProducerReturnFrame(frame).
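The consumer side of the steps above can be sketched as a driver-API loop. This is an assumed-context illustration: the EGLStreamKHR, its producer end, and the CUstream are created elsewhere, and the timeout value is arbitrary.

```cuda
// Sketch (driver API): CUDA as an EGLStream consumer, e.g. receiving
// camera/ISP frames.
#include <cudaEGL.h>

void consume(EGLStreamKHR eglStream, CUstream cudaStream)
{
    CUeglStreamConnection conn;
    cuEGLStreamConsumerConnect(&conn, eglStream);              // step 1

    for (;;) {
        CUgraphicsResource res;
        // Waits until the producer has presented a frame (steps 2-3).
        if (cuEGLStreamConsumerAcquireFrame(&conn, &res, &cudaStream,
                                            /*timeout us*/ 16000) != CUDA_SUCCESS)
            break;

        CUeglFrame frame;
        cuGraphicsResourceGetMappedEglFrame(&frame, res, 0, 0);
        // ... process frame.frame.pPitch[0] with a CUDA kernel on cudaStream ...

        // Hand the frame back so the producer can reuse it (step 4).
        cuEGLStreamConsumerReleaseFrame(&conn, res, &cudaStream);
    }
    cuEGLStreamConsumerDisconnect(&conn);
}
```

Because synchronization is implicit in acquire/release, no explicit fences are needed between the producer's writes and the consumer's kernels.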

SLIDE 13

INTEROP SUMMARY

CUDA-OpenGL:
  • OpenGL-ES support
  • EGL support
  • Portable across Tegra and discrete GPUs

EGL Image:
  • Easy setup
  • Works with several EGL client APIs
  • YUV planar image support

EGL Streams:
  • Producer-consumer model
  • Implicit synchronization
  • Cross-process support
  • YUV planar image support

SLIDE 14

CUDA UNIFIED MEMORY ON TEGRA

  • Helps take advantage of the unified DRAM on Tegra
  • Easier to program; one unified allocator: cudaMallocManaged
  • Programming model enforced through memory access protection
  • No memcpy needed; migration is managed by the CUDA driver
  • Saves memory consumption and power
  • The attach API helps achieve optimal performance

[Diagram: on Tegra, CPU and GPU share a single DRAM.]
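The points above can be sketched end to end. This is a minimal illustration, not code from the talk; the size and kernel are made up, and cudaStreamAttachMemAsync is used as one concrete form of the "attach API" mentioned above.

```cuda
// Sketch: unified memory with an optional stream attach.
#include <cuda_runtime.h>

__global__ void add_one(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main(void)
{
    const int n = 1 << 20;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));     // one allocator, no memcpy

    for (int i = 0; i < n; ++i) data[i] = (float)i;  // CPU writes directly

    cudaStream_t s;
    cudaStreamCreate(&s);
    // Optional attach: scope the allocation to this stream so work in
    // other streams is not stalled by its coherence management.
    cudaStreamAttachMemAsync(s, data, 0, cudaMemAttachSingle);

    add_one<<<(n + 255) / 256, 256, 0, s>>>(data, n);
    cudaStreamSynchronize(s);        // required before the CPU touches data again

    float first = data[0];           // CPU reads the result, no cudaMemcpyDtoH
    cudaFree(data);
    cudaStreamDestroy(s);
    return (first == 1.0f) ? 0 : 1;
}
```

Note the synchronize before CPU access: with unified memory the driver enforces this ordering through memory access protection, as the slide says.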

SLIDE 15

CUDA MEMORY TYPES

Each allocation type follows the sequence allocate → CPU use → migrate → CUDA kernel → migrate → CPU use:

  • Traditional: malloc()/cudaMalloc(), CPU use, cudaMemcpyHtoD(), kernel_launch<<<>>>(), cudaMemcpyDtoH(), CPU use
  • Unified Memory: cudaMallocManaged(), CPU use, cudaMemAttach [optional], kernel_launch<<<>>>(), cudaMemAttach [optional], CPU use
  • Zero Copy: cudaMallocHost(), CPU use, no migration, kernel_launch<<<>>>(), no migration, CPU use
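The zero-copy row can be sketched as follows. Illustrative only: the size and kernel are invented, and passing the cudaMallocHost pointer straight to a kernel assumes unified virtual addressing, which holds on Tegra.

```cuda
// Sketch: zero copy via cudaMallocHost; the GPU dereferences the host
// allocation directly, so there is no migrate step.
#include <cuda_runtime.h>

__global__ void scale(float *d, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= s;
}

int main(void)
{
    const int n = 4096;
    float *buf;
    cudaMallocHost(&buf, n * sizeof(float));        // pinned, GPU-visible

    for (int i = 0; i < n; ++i) buf[i] = 1.0f;      // CPU use

    scale<<<(n + 255) / 256, 256>>>(buf, n, 3.0f);  // kernel reads host memory
    cudaDeviceSynchronize();

    // CPU use again: no cudaMemcpy in either direction.
    int ok = (buf[0] == 3.0f);
    cudaFreeHost(buf);
    return ok ? 0 : 1;
}
```

These accesses bypass the caches on both sides (next slide), which is why zero copy suits data that is touched once, not reused.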

SLIDE 16

CUDA MEMORY TYPES

Time taken (ms) by the matrix-multiply CUDA kernel with different allocation types:

  Size    Traditional   Zero Copy   Managed
  16KB    0.617         0.544       0.644
  1MB     9.723         11.119      7.093
  4MB     59.37618      62.232      46.42551
  16MB    377.9244      403.2382    344.926

Traditional:
  • Easy portability from existing desktop programs
  • Faster for some small allocations
  • Suitable for GPU intermediate buffers, tables, etc.

Managed:
  • Memory access by CPU and GPU goes through the cache
  • Faster for larger allocations
  • Suitable when memory is used on both host and GPU

Zero Copy:
  • The cache is bypassed by both GPU and CPU when accessing these allocations
  • Suitable when memory access is not affected by caching

SLIDE 17

THANK YOU

JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join