 
ACCELERATED COMPUTING WITH NVIDIA GPUS
Jesse Tetreault, Solutions Architect
October 2019

ACCELERATED COMPUTING

NVIDIA “THE AI COMPUTING COMPANY”
GPU Computing, Computer Graphics, Artificial Intelligence
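
HOW GPU ACCELERATION WORKS
Application code is split in two. The compute-intensive functions (5% of the code) run on the GPU. The rest of the sequential CPU code stays on the CPU. The application then runs on GPU + CPU together.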
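
The split between GPU and CPU work can be turned into a rough estimate with Amdahl's law. This is a minimal sketch, not something taken from the slides: the slide only says the compute-intensive functions are 5% of the code, not what share of the run time they take, so both the run-time fraction and the GPU speedup below are assumed inputs, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(accelerated_fraction: float, gpu_speedup: float) -> float:
    """Overall speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time spent in the
        compute-intensive functions moved to the GPU (assumed input).
    gpu_speedup: how much faster those functions run on the GPU
        (assumed input).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    cpu_part = 1.0 - accelerated_fraction          # stays on the CPU
    gpu_part = accelerated_fraction / gpu_speedup  # shrinks on the GPU
    return 1.0 / (cpu_part + gpu_part)


if __name__ == "__main__":
    # Hypothetical numbers: 90% of run time in hot functions, 20x faster on GPU.
    print(f"{amdahl_speedup(0.90, 20.0):.2f}x overall")
```

The sequential part that stays on the CPU bounds the overall gain, which is why the choice of functions to offload matters more than the raw GPU speedup.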

HOW TO START WITH GPUS
1. Review available GPU-accelerated applications
2. Check for GPU-accelerated applications and libraries
3. Add OpenACC directives for quick acceleration results and portability
4. Dive into CUDA for highest performance and flexibility

Diagram: (1) Applications; (2) Libraries: easy to use, most performance; (3) Compiler Directives: easy to start, portable code; (4) Programming Languages: most performance, most flexibility.

NVIDIA CUDA-X LIBRARIES
Software to Deliver Acceleration for HPC & AI Apps; 500+ New Updates
600+ Apps: Machine Learning & Deep Learning; Computational Physics & Chemistry; Computational Fluid Dynamics; Life Sciences & Bioinformatics; Structural Mechanics; Weather & Climate; Geoscience, Seismology & Imaging; Numerical Analytics; Electronic Design Automation
CUDA-X HPC & AI, 40+ GPU Acceleration Libraries: Linear Algebra, Parallel Algorithms, Signal Processing, Deep Learning, Machine Learning, Visualization
CUDA
Platforms: Desktop Development, Data Center, Supercomputers, GPU-Accelerated Cloud

NVIDIA DEEP LEARNING SOFTWARE STACK
Training: Training Data, Data Management, Training, Model Assessment, Trained Neural Network
Inference: Data center (TensorRT); Embedded (JETPACK SDK); Automotive (DriveWorks SDK)
NVIDIA DEEP LEARNING SDK and CUDA
developer.nvidia.com/deep-learning-software

NGC: GPU-OPTIMIZED SOFTWARE HUB
Ready-to-run GPU Optimized Software, Anywhere
50+ Containers: DL, ML, HPC
15+ Model Training Scripts: NLP, Image Classification, Object Detection & more
60 Pre-trained Models: NLP, Image Classification, Object Detection & more
Industry Workflows: Medical Imaging, Intelligent Video Analytics
Runs on: On-prem, Cloud, Hybrid Cloud, Multi-cloud

GPU-ACCELERATED DATA SCIENCE PLATFORMS
Unparalleled Performance and Productivity

Segments, from max flexibility to max performance:
ML in the Cloud: all the top CSPs
ML Enthusiast: high-end PCs
Enterprise Desktop: individual workstations
Enterprise Data Center: shared infrastructure for data science teams

Platforms (typical GPU memory is system dependent):
NVIDIA GPUs in the Cloud
  Benefit: ease of getting started, low/no barrier to entry, elasticity of resources
  Typical GPU memory: varies depending on offering
  GPU fabric: varies depending on offering
GeForce
  Benefit: enthusiast PC solution, easy to acquire, low cost, great performance
  Typical GPU memory: 22GB
  GPU fabric: 2-way NVLink
TITAN RTX
  Benefit: the ultimate PC GPU for data scientists; easy to acquire, deploy and get started experimenting
  Typical GPU memory: 48GB
  GPU fabric: 2-way NVLink
NVIDIA-Powered Data Science Workstations
  Benefit: enterprise workstation for experienced data scientists
  Typical GPU memory: 96GB
  GPU fabric: 2-way NVLink
T4 Enterprise Servers
  Benefit: standard GPU-accelerated data center infrastructures with the world’s leading servers
  Typical GPU memory: 64 GB (4 x 16 GB)
  GPU fabric: PCIe 3.0
DGX Station, DGX-1 / HGX-1
  Benefit: enterprise server, proven 4 or 8-way configuration, modular approach for scale-up, multi-GPU & multi-node training
  Typical GPU memory: 128GB-256GB
  GPU fabric: 4- and 8-way NVLink
DGX-2 / HGX-2
  Benefit: largest compute and memory capacity in a single node, fastest training solution
  Typical GPU memory: 512GB
  GPU fabric: 16-way NVSwitch
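
The memory figures on this slide are aggregates across GPUs, as the "64 GB (4 x 16 GB)" entry shows. The sketch below reproduces that arithmetic and adds a rough check of whether a dataset fits. The function names and the `usable_fraction` headroom factor are assumptions for illustration, not values from the slides.

```python
def total_gpu_memory_gb(num_gpus: int, gb_per_gpu: float) -> float:
    """Aggregate GPU memory of a node: number of GPUs times memory per GPU."""
    return num_gpus * gb_per_gpu


def fits_in_gpu_memory(dataset_gb: float, num_gpus: int, gb_per_gpu: float,
                       usable_fraction: float = 0.5) -> bool:
    """Rough check whether a dataset fits in aggregate GPU memory.

    usable_fraction is an assumed headroom factor (working copies and
    intermediate results also need space); tune it for a real workload.
    """
    budget = total_gpu_memory_gb(num_gpus, gb_per_gpu) * usable_fraction
    return dataset_gb <= budget


if __name__ == "__main__":
    # The slide's own example: 4 GPUs with 16 GB each.
    print(total_gpu_memory_gb(4, 16), "GB total")
    print("30 GB dataset fits:", fits_in_gpu_memory(30, 4, 16))
```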

ACCELERATING DATA SCIENCE IN HEALTHCARE

DAY IN THE LIFE OF A DATA SCIENTIST

CHALLENGES IN DATA SCIENCE
Pipeline: Wrangle Data → Train → Deploy
Data sources: Imaging, Genomic, Medical Records, Claims, Wearables
Wrangle Data: Data Lake, ETL, Data Preparation (Dataframe Manipulation, Feature Engineering)
Train: Train, Evaluate (Cross Validation, Hyperparameter Tuning)
Deploy: Inference
Performance in these two domains (data preparation and train/evaluate) is typically a pain point for data scientists.
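
One reason the train/evaluate stage hurts is that cross validation and hyperparameter tuning multiply the number of model fits. The sketch below, in plain Python with no GPU library, splits sample indices into k folds and counts the training runs a grid search would need. The helper names and the example grid are hypothetical and are not taken from the slides.

```python
from itertools import product


def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, valid_idx) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def grid_points(param_grid: dict):
    """Yield every hyperparameter combination in the grid as a dict."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def count_training_runs(param_grid: dict, k: int) -> int:
    """Each grid point is trained once per fold."""
    return sum(1 for _ in grid_points(param_grid)) * k


if __name__ == "__main__":
    # Hypothetical grid: 3 x 3 combinations, evaluated with 5 folds.
    grid = {"max_depth": [4, 6, 8], "learning_rate": [0.05, 0.1, 0.3]}
    print(count_training_runs(grid, k=5), "training runs")
```

Even this small grid needs dozens of fits, so any per-fit speedup is repeated across the whole search.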

RAPIDS IN DATA SCIENCE
Pipeline: Wrangle Data → Train → Deploy
Data sources: Imaging, Genomic, Medical Records, Claims, Wearables
Wrangle Data: Data Lake, ETL, Data Preparation with cuDF (Dataframe Manipulation, Feature Engineering)
Train: Train, Evaluate with cuML (Cross Validation, Hyperparameter Tuning)
Deploy: Inference
Performance in these two domains (data preparation and train/evaluate) is typically a pain point for data scientists.

cuML Algorithms

Data Processing Evolution
Faster data access, less data movement

Hadoop Processing, Reading from disk
  HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train
Spark In-Memory Processing (25-100x Improvement; Less code; Language flexible; Primarily In-Memory)
  HDFS Read → Query → ETL → ML Train
RAPIDS (50-100x Improvement; Same code; Language flexible; Primarily on GPU)
  Arrow Read → Query → ETL → ML Train
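
The "less data movement" point can be made concrete by counting the storage round trips in each pipeline. The sketch below encodes the three stage sequences as read off the slide and counts read and write steps. It is only an illustrative tally: it models nothing about actual run time, and `PIPELINES` and `count_io_steps` are hypothetical names.

```python
# Stage sequences as read off the slide (the order is a reconstruction).
PIPELINES = {
    "Hadoop": ["HDFS Read", "Query", "HDFS Write", "HDFS Read", "ETL",
               "HDFS Write", "HDFS Read", "ML Train"],
    "Spark": ["HDFS Read", "Query", "ETL", "ML Train"],
    "RAPIDS": ["Arrow Read", "Query", "ETL", "ML Train"],
}


def count_io_steps(steps: list) -> int:
    """Count stages that are a storage read or write."""
    return sum(1 for step in steps if step.lower().endswith(("read", "write")))


if __name__ == "__main__":
    for name, steps in PIPELINES.items():
        print(f"{name}: {count_io_steps(steps)} I/O steps of {len(steps)} stages")
```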

BIG DATA EVOLUTION
Disk → Memory → GPUs
Disk: scalable, but slow due to repeated reads & writes to disk.
Memory: faster, by keeping data always in host memory instead of on disk; performance limited by CPUs.
GPUs: keeps data in GPU memory instead of CPU memory; computations are GPU accelerated.

Real Outcomes using Accelerated Machine Learning

CuPy Acceleration

TRANSFORM GENETICS WITH RAPIDS
Personalize Immunotherapy for Cancer Patients
Users of K-means, PCA, and XGBoost

[Figure: bar charts comparing a single V100 with a single node with 2x E5v4 CPUs, for GPU Accelerated XGBoost and GPU Accelerated cuDF; axes from 0 to 250; labeled speedups 10.24X and 18.14X.]

“We see close to 20x speedup using XGBoost on DGX-1. This helps us significantly improve our personalized immunotherapy and expand our analysis to millions of peptide candidates.”
Yong Hou, Duty Director of BGI Research
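
A speedup factor is easier to reason about once converted into run time saved. The sketch below applies that arithmetic to the two speedups labeled on this slide (10.24X and 18.14X). The helper names and the 100-second baseline are illustrative assumptions; the slide does not give absolute timings.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of the baseline run time removed by a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def accelerated_time(baseline_seconds: float, speedup: float) -> float:
    """Run time after acceleration, given the baseline and the speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


if __name__ == "__main__":
    for factor in (10.24, 18.14):  # speedups labeled on the slide
        saved = time_saved_fraction(factor)
        # The 100 s baseline is hypothetical, used only to show the scale.
        print(f"{factor}x: {saved:.1%} less time, "
              f"100 s becomes {accelerated_time(100.0, factor):.2f} s")
```

Both speedups remove more than 90% of the run time, so the difference between them matters less than the jump from CPU to GPU.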

Faster Speeds, Real-World Benefits
[Figure: time in seconds (shorter is better), CPU cluster vs. DGX cluster. Panels: cuIO/cuDF (Load and Data Preparation); cuML (XGBoost); End-to-End. Legend: cuIO/cuDF (Load and Data Prep), Data Conversion, XGBoost. Values shown: 8762, 6148, 3925, 3221, 322, 213.]
Benchmark: 200GB CSV dataset; data prep includes joins, variable transformations.
CPU Cluster Configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark.
DGX Cluster Configuration: 5x DGX-1 on InfiniBand network.