S8483 - Empowering CUDA Developers with Virtual Desktops

Tony Foster, Sr. Advisor, Technical Marketing, Dell EMC
VMware vExpert; VMware EUC Champion; VMware Experts Program, BDSEW; NVIDIA vGPU Community Advisor (NGCA)
@wonder_nerd
www.wondernerd.net
V 4.0, Date: 3-24-18
#GTC18 #S8483 @wonder_nerd

Agenda
Slides available at: www.wondernerd.net
More: 1drnrd.me/blog
• Define the Technologies
• Why do This? (in 20 minutes)
• Environment Overview
• Deployment
• Testing
• Questions
• Resources

What is CUDA and Virtualization
CUDA
• Provides a development environment for creating high performance GPU-accelerated applications.
Virtualization
• Takes physical computing resources and divides them up among virtual machines.
Virtual GPU (vGPU)
• Provides a shared instance of a GPU to a virtual machine, delivering resources of the underlying physical GPU to the virtual machine, such as graphics processing or CUDA.

Why I Did This
More: 1drnrd.me/blog
• Cool part of the job: pushing technology further
• Limited resources in my home lab
• 1 - P4 GPU
• $1/Day power consumption
• Happy Wife
• Multiple Code Branches
• Multiple Projects
• Easy to Change OS

In The Real World
Why?
• Resource Optimization
• Multiple Workspaces
• Version Control (v3.5)
• Security
• Resource Sharing
• Backup / DR
• New Workspace
• Automated Delivery

Environment Overview

Requirements
• GPU (P4, P40, etc.)
• VMware Horizon
• Linux VM
• NVIDIA CUDA Toolkit
• NVIDIA Quadro vDWS, Virtual GPU Software License (Important)

My Virtual Environment
[Diagram: Virtual Desktops, VMware vCenter Server, “Lab”, Office]

Scaling to the Organization
[Diagram: Centralized Virtual Desktops, Virtualized Environment, Remote Workers, Data Lakes, VMware Horizon Connection Server]

Hardware Specs
More: 1drnrd.me/lab
Testing on 2U host
• Dual E5-2640 (6 Core Procs)
• 64GB of RAM
• NVIDIA P4 @ 384.111
• VMware vSphere 6.5 (Build 7388607)
• vCenter Server Appliance 6.5.0 (Build 6.5.0.14100)
• VMware Horizon 7.4.0 (Build 7400497)
• Basic Environment Only
Management environment on separate 1U host
• vCenter Appliance
• AD/DNS (Windows 2k8 R2)
• Jump Box (Windows 2k8 R2)
• NVIDIA GRID License Server (CentOS7.1 & Windows 2k8 R2)
• vSphere Connection Server (Windows 2k8 R2)
• Horizon View Client running on Jump box
• Sub-optimal Unsupported Lab Configuration

VM Specs
More: 1drnrd.me/ubuntu
• CentOS 7.1 (x64)
• 4 vCPU
• 12GB vRAM
• VMware Blast Extreme protocol
vGPU Profile
• Quadro vDWS P4-4Q
• Equal Share Scheduling
• CUDA Toolkit 9.0.176
Passthrough
• NVIDIA P4 GPU
• CUDA Toolkit 9.1.85
Flings
• https://labs.vmware.com/flings/horizon-ova-for-ubuntu

Deployment

Why Horizon/VDI?
[Diagram: Traditional VMs (Virtual Display, Console, VM) compared with GPU Enabled VMs (Horizon, VM, vGPU); each panel shows a terminal window at the prompt "user@deepthought ~ $"]

Preparing Hosts & VM
• GTC17 Session S7349 (More: 1drnrd.me/S7349)
• VMworld Session VMTN6636U (More: 1drnrd.me/VMTN6636U)

Licensing
Requires NVIDIA Quadro vDWS
Examples:
• P4
  • P4-8Q; P4-4Q; P4-2Q; P4-1Q
• P40
  • P40-24Q; P40-12Q; P40-8Q
• P100
  • P100-16Q; P100-8Q
  • P100C-12Q; P100C-6Q
[Screenshot label: grid_p4-4q]

Two Parts of a vGPU
Memory
• “Frame Buffer”
• vGPU Profiles
Streaming Multiprocessor (SM)
• Does the computation
[Diagram: RAM chips labeled DDR5]

vGPU Profiles

Profile   Frame Buffer (Mbytes)   Maximum vGPUs per Board   License Required
P40-24Q   24576                   1                         Quadro vDWS
P40-12Q   12288                   2                         Quadro vDWS
P40-8Q    8192                    3                         Quadro vDWS
P40-6Q    6144                    4                         Quadro vDWS
P40-4Q    4096                    6                         Quadro vDWS
P40-3Q    3072                    8                         Quadro vDWS
P40-2Q    2048                    12                        Quadro vDWS
P40-1Q    1024                    24                        Quadro vDWS

vGPUs per Card = GPU Card Memory (24GB) ÷ Frame Buffer
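The division on this slide can be checked with a few lines of shell. This is only a sketch: `vgpus_per_board` is a hypothetical helper name, and the 24576 MB card size is the 24GB P40 figure from the slide.

```shell
#!/bin/sh
# Sketch: reproduce the "Maximum vGPUs per Board" column for a 24GB P40.
# vgpus_per_board is a hypothetical helper, not part of any NVIDIA tool.
vgpus_per_board() {
    card_mb=$1      # total GPU memory on the card, in Mbytes
    profile_mb=$2   # frame buffer of the vGPU profile, in Mbytes
    echo $(( card_mb / profile_mb ))
}

# Walk the frame buffer sizes listed for the P40 profiles.
for fb in 24576 12288 8192 6144 4096 3072 2048 1024; do
    printf 'P40-%sQ: %s vGPUs per board\n' \
        "$(( fb / 1024 ))" "$(vgpus_per_board 24576 "$fb")"
done
```

The frame buffer is the only input, which is why picking a profile is mostly a question of how much GPU memory each developer desktop needs.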

Scheduling vGPUs
More: 1drnrd.me/GPUQoS
Schedulers impose a limit on GPU processing cycles used by a vGPU, which prevents vGPU-intensive applications running in one VM from affecting the performance of vGPU-light applications running in other VMs. On GPUs based on the Pascal architecture, you can select the vGPU scheduler to use.

Share of GPU cycles with three VMs on P40-6Q:

Scheduler               VM1   VM2   VM3   No VM
Best Effort             17%   33%   50%   -
Equal Share (Default)   33%   33%   33%   -
Fixed Share             25%   25%   25%   25%
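The Equal Share and Fixed Share percentages in the charts follow from simple division, sketched below. `share_pct` is a hypothetical helper, and the model is an assumption drawn from the slide's numbers: Equal Share splits cycles among the running vGPUs, while Fixed Share gives each vGPU one slot out of the profile's maximum per board, so unused slots stay idle. Best Effort depends on demand and is not modelled.

```shell
#!/bin/sh
# Sketch: per-vGPU share of GPU cycles under two of the schedulers.
# share_pct SCHEDULER RUNNING_VGPUS MAX_VGPUS_PER_BOARD  (hypothetical helper)
share_pct() {
    scheduler=$1
    running=$2
    max_per_board=$3
    case "$scheduler" in
        equal) echo $(( 100 / running )) ;;        # split among running vGPUs
        fixed) echo $(( 100 / max_per_board )) ;;  # one slot per possible vGPU
        *) echo "unknown scheduler: $scheduler" >&2; return 1 ;;
    esac
}

# Three VMs on P40-6Q, a profile that allows four vGPUs per board.
echo "equal share: $(share_pct equal 3 4)% per VM"
echo "fixed share: $(share_pct fixed 3 4)% per VM"
```

With three of four slots in use, Fixed Share leaves a quarter of the GPU idle, which matches the "No VM, 25%" slice.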

Configuring Scheduling
More: 1drnrd.me/scheduling

RmPVMRL Registry Key

Value   Meaning                           Usage
0x00    Best Effort Scheduler
0x01    Equal Share Scheduler (Default)   Enterprise
0x11    Fixed Share Scheduler             Service Provider

1. SSH to the ESXi host
2. Issue the following:
   1. For all cards on a host:
      esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=<value>"
   2. For individual cards on a host:
      List the GPUs in the host: lspci | grep NVIDIA
      1. Results in: 0000:85:00.0 VGA compatible…
      2. Set the policy per card:
         esxcli system module parameters set -m nvidia \
         -p "NVreg_RegistryDwordsPerDevice=pci=<pci-domain:pci-bdf>;RmPVMRL=<value>[;pci=<pci-domain:pci-bdf>;RmPVMRL=<value>][;...]"
3. Reboot
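To avoid typing the registry value by hand, the host-wide command can be assembled from a scheduler name. This sketch only prints the command and does not run it. The names `scheduler_value` and `print_host_wide_cmd`, and the keywords `best-effort`, `equal-share` and `fixed-share`, are made up for illustration; the values and the esxcli line come from the slide.

```shell
#!/bin/sh
# Sketch: map a scheduler name to its RmPVMRL value and print the
# host-wide esxcli command from the slide. Nothing is executed on the host.
scheduler_value() {
    case "$1" in
        best-effort) echo 0x00 ;;
        equal-share) echo 0x01 ;;
        fixed-share) echo 0x11 ;;
        *) echo "unknown scheduler: $1" >&2; return 1 ;;
    esac
}

print_host_wide_cmd() {
    value=$(scheduler_value "$1") || return 1
    printf 'esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=%s"\n' "$value"
}

print_host_wide_cmd fixed-share
```

Printing first makes it easy to review the exact string before pasting it into the SSH session; the host still needs the reboot from step 3 for the change to take effect.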

vGPU Driver Requirements
More: 1drnrd.me/vCUDAp1
• Must match between host and VM
ESXi Host
• NVIDIA GPU P40
• GPU VIB X.Y.Z
Virtual Machine (Linux)
• NVIDIA Virtual GPU P40-8Q
• GPU Driver X.Y.Z
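A quick way to apply the "must match" rule is to compare the version reported on the host with the one reported in the VM. The sketch below assumes both strings have already been collected; `driver_versions_match` is a hypothetical helper, and 384.111 is the version listed on the Hardware Specs slide.

```shell
#!/bin/sh
# Sketch: compare the host VIB version with the guest driver version.
# driver_versions_match HOST_VERSION VM_VERSION  (hypothetical helper)
driver_versions_match() {
    host_version=$1
    vm_version=$2
    if [ "$host_version" = "$vm_version" ]; then
        echo "match: $vm_version"
    else
        echo "mismatch: host $host_version, VM $vm_version" >&2
        return 1
    fi
}

# On a real system each value could be read on the host and in the VM with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
driver_versions_match 384.111 384.111
```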

Two Methods to Install the CUDA Toolkit
RPM/Deb Deploy (*.Deb, *.RPM)
• NVIDIA CUDA Toolkit Deb/RPM
  • CUDA Compatible GPU
  • GPU Driver A.B.C
.run
• NVIDIA CUDA Toolkit (run)
  • CUDA Compatible GPU
  • GPU Driver configurable
[Diagram: Virtual Machine (Linux) with NVIDIA Virtual GPU P40-8Q and GPU Driver X.Y.Z; ESXi Host with NVIDIA GPU P40 and GPU VIB X.Y.Z]

CUDA Deployment Overview
1. NVIDIA GPU VIB (VIB)
2. .run
3. VMware Horizon Agent (.sh)
4. .run

Get the Right Installer
More: 1drnrd.me/getCUDA
Select appropriate installer

Using .run to Deploy CUDA Toolkit
More: 1drnrd.me/CUDAguide
1. Disable Nouveau (varies per OS)
2. Switch to runlevel 3 (text mode). When you do this the virtual console will be functional again until you exit the run level.
3. Execute the run file: sudo sh ./cuda_<version>_linux.run
   1. Follow the prompts on screen
   2. When asked to install the GPU driver enter No (N). This is the most important part of this process.
   3. If you select yes, the file will overwrite the already installed driver with the driver included in the CUDA package
4. Finish answering the prompts and complete the installation of the run file
5. Apply any patches
6. Complete Post-Installation Actions
   1. Mandatory Actions
   2. Recommended Actions
   3. Optional Actions
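Steps 1 and 2 vary per OS. As a rough sketch for a CentOS 7 VM (the OS listed on the VM Specs slide), Nouveau is commonly disabled with a modprobe blacklist file. `write_nouveau_blacklist` is a hypothetical helper, and the demo writes to a temporary file rather than to /etc so that it is safe to run anywhere.

```shell
#!/bin/sh
# Sketch: generate the blacklist file that keeps the Nouveau driver from loading.
# write_nouveau_blacklist FILE  (hypothetical helper)
write_nouveau_blacklist() {
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$1"
}

# Assumed use on the VM as root (CentOS 7 with dracut and systemd):
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   dracut --force                        # rebuild the initramfs, then reboot
#   systemctl isolate multi-user.target   # text mode, the systemd form of runlevel 3
#   sudo sh ./cuda_<version>_linux.run    # answer No (N) to the driver prompt

# Safe demonstration against a temporary file.
demo=$(mktemp)
write_nouveau_blacklist "$demo"
cat "$demo"
rm -f "$demo"
```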

CUDA Toolkit Install

CUDA Toolkit Install - Complete

Post Installation Steps
1. Add /usr/local/cuda-<version>/bin to the PATH variable (non-persistent):
   export PATH=/usr/local/cuda-<version>/bin${PATH:+:${PATH}}
2. Add the 64-bit library directory to the LD_LIBRARY_PATH variable (non-persistent):
   export LD_LIBRARY_PATH=/usr/local/cuda-<version>/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
3. Install the writable samples:
   cuda-install-samples-<version>.sh <dir>
4. Make the samples:
   cd ~/NVIDIA_CUDA-<version>_Samples
   make
   This can take a while to run; you may want to do this over lunch.
5. Reboot your VM
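The `${PATH:+:${PATH}}` form in the two exports appends ":old value" only when the variable is already set and non-empty, so an unset LD_LIBRARY_PATH does not end up with a stray colon. The sketch below shows the same expansion in a hypothetical `prepend_path` helper; /usr/local/cuda-9.0 and /opt/lib are example paths assumed for illustration.

```shell
#!/bin/sh
# Sketch: what the ${VAR:+:${VAR}} expansion in the export lines does.
# prepend_path NEW_DIR CURRENT_VALUE  (hypothetical helper)
prepend_path() {
    current=$2
    # ":current" is emitted only when current is non-empty.
    printf '%s\n' "$1${current:+:${current}}"
}

prepend_path /usr/local/cuda-9.0/lib64 ""          # variable was empty
prepend_path /usr/local/cuda-9.0/lib64 "/opt/lib"  # variable already had a value
```

Because the exports are non-persistent, one common approach is to add the same two lines to the user's shell startup file (for example ~/.bashrc) so they survive the reboot in step 5.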

Validating CUDA Functionality
More: 1drnrd.me/CUDAtest
deviceQuery, part of the NVIDIA CUDA Samples

Licensing or Insufficient vGPU Profile
… code=46(cudaErrorDevicesUnavailable) …
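When validating many desktops, the error shown on this slide can be spotted automatically by scanning the sample's output. A minimal sketch, assuming the output contains the exact string from the slide; `has_devices_unavailable` is a hypothetical helper and the demo input is a stand-in for real deviceQuery output.

```shell
#!/bin/sh
# Sketch: flag the licensing / insufficient-profile error in sample output.
# Reads text on stdin and succeeds if the error string from the slide appears.
has_devices_unavailable() {
    grep -qF 'code=46(cudaErrorDevicesUnavailable)'
}

# Assumed use on the VM:
#   ./deviceQuery 2>&1 | has_devices_unavailable && echo "check license and vGPU profile"

# Stand-in input for demonstration.
if printf '%s\n' '... code=46(cudaErrorDevicesUnavailable) ...' | has_devices_unavailable; then
    echo "check license and vGPU profile"
fi
```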

Testing

P4-4Q – MC_EstimatePiP
More: 1drnrd.me/CUDA4Q
Single VM, Equal Share Scheduling

Monte Carlo Estimate Pi (with batch PRNG)
=========================================
Estimating Pi on GPU (GRID P4-4Q)
Precision: single
Number of sims: 100000
Tolerance: 1.000000e-02
GPU result: 3.136320e+00
Expected: 3.141593e+00
Absolute error: 5.272627e-03
Relative error: 1.678329e-03
MonteCarloEstimatePiP, Performance = 565585.27 sims/s, Time = 176.81(ms), NumDevsUsed = 1, Blocksize = 128
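The error and throughput figures in this output can be sanity-checked from the other printed values. The sketch below uses awk for the floating-point arithmetic; `rel_error` and `sims_per_second` are hypothetical helpers. Because the inputs are the rounded numbers from the slide, the results land close to, not exactly on, the printed figures.

```shell
#!/bin/sh
# Sketch: recompute relative error and throughput from the printed values.
rel_error() {
    # |got - want| / want
    awk -v got="$1" -v want="$2" 'BEGIN {
        d = got - want; if (d < 0) d = -d
        printf "%e\n", d / want
    }'
}

sims_per_second() {
    # simulations divided by elapsed time, with the time given in milliseconds
    awk -v sims="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", sims / (ms / 1000) }'
}

echo "relative error: $(rel_error 3.136320 3.141593)"
echo "throughput:     $(sims_per_second 100000 176.81) sims/s"
```

The same two helpers can be pointed at results from other vGPU profiles or schedulers to compare runs on equal terms.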
