cuML: A Library for GPU Accelerated Machine Learning Onur Yilmaz, - PowerPoint PPT Presentation

cuML: A Library for GPU Accelerated Machine Learning
Onur Yilmaz, Ph.D. | oyilmaz@nvidia.com | Senior ML/DL Scientist and Engineer
Corey Nolet | cnolet@nvidia.com | Data Scientist and Senior Engineer

About Us
Onur Yilmaz, Ph.D.: Senior ML/DL Scientist and Engineer on the RAPIDS cuML team at NVIDIA. Focuses on building single- and multi-GPU machine learning algorithms to support extreme data loads at light speed. Ph.D. in computer engineering, focusing on ML for finance.
Corey Nolet: Data Scientist & Senior Engineer on the RAPIDS cuML team at NVIDIA. Focuses on building and scaling machine learning algorithms to support extreme data loads at light speed. Over a decade of experience building massive-scale exploratory data science and real-time analytics platforms for HPC environments in the defense industry. Working towards a Ph.D. in Computer Science, focused on unsupervised representation learning.

Agenda
• Introduction to cuML
• Architecture Overview
• cuML Deep Dive
• Benchmarks
• cuML Roadmap

Introduction
“Details are confusing. It is only by selection, by elimination, by emphasis, that we get to the real meaning of things.” ~ Georgia O'Keeffe, Mother of American Modernism

Realities of Data

Problem: Data sizes continue to grow
min(variance) • min(bias)
Massive Dataset: Histograms / Distributions • Dimension Reduction • Feature Selection • Remove Outliers • Sampling
Better to start with as much data as possible and explore / preprocess to scale to performance needs.
Iterate. Cross Validate & Grid Search. Iterate some more. Meet a reasonable speed vs. accuracy tradeoff.
Time Increases: Hours? Days?
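
As a rough illustration of why this loop stretches into hours or days, here is a minimal pure-Python sketch (not cuML code) of k-fold cross-validation combined with a grid search over a ridge penalty for a one-feature linear model. The helper names `fit_ridge_1d`, `mse`, and `kfold_grid_search`, the contiguous fold split, and the `alphas` grid are hypothetical choices made for illustration. The point is the cost model: every grid point is refit once per fold, so the number of fits is folds × grid points, and every further iteration multiplies it again.

```python
# Hypothetical helpers for illustration only; not part of cuML or scikit-learn.

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit of y ~ w*x + b; the penalty alpha shrinks w only."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sxy / (sxx + alpha)
    b = my - w * mx  # intercept is unpenalized, recovered from the means
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the line w*x + b on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def kfold_grid_search(xs, ys, alphas, k=5):
    """Return (best_alpha, number_of_fits) using k contiguous validation folds."""
    n = len(xs)
    fold = n // k
    fits = 0
    best_alpha, best_err = None, float("inf")
    for alpha in alphas:                 # outer loop: grid points
        err = 0.0
        for i in range(k):               # inner loop: one refit per fold
            lo = i * fold
            hi = (i + 1) * fold if i < k - 1 else n  # last fold takes the remainder
            w, b = fit_ridge_1d(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:], alpha)
            fits += 1
            err += mse(w, b, xs[lo:hi], ys[lo:hi])
        err /= k
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, fits


xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # noiseless toy data, so alpha = 0 is best
best_alpha, n_fits = kfold_grid_search(xs, ys, alphas=[0.0, 0.1, 1.0, 10.0], k=5)
print(best_alpha, n_fits)  # 0.0 20
```

Doubling either the grid or the fold count doubles the number of fits, which is why speeding up every fit, not just one, matters for exploration.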

ML Workflow Stifles Innovation: It Requires Exploration and Iterations
Manage Data: All Data → ETL → Structured Data Store
Training: Feature Engineering → Model Training
Evaluate: Tuning & Selection
Deploy: Inference
Iterate … Cross Validate … Grid Search … Iterate some more.
Accelerating just `Model Training` does have benefit but doesn’t address the whole problem. End-to-end acceleration is needed.
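
The observation that accelerating only model training leaves most of the problem in place is an instance of Amdahl's law: if a fraction p of end-to-end time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below evaluates it for a hypothetical workflow in which training is 30% of wall-clock time; both the fraction and the 50x factor are illustrative assumptions, not measurements from this talk.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is accelerated by factor s."""
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("p must be in [0, 1] and s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical workflow: training is 30% of wall-clock time; the other 70%
# is ETL, feature engineering, tuning, and inference.
training_only = amdahl_speedup(0.30, 50.0)  # only training runs 50x faster
end_to_end = amdahl_speedup(1.00, 50.0)     # every stage runs 50x faster
print(f"training only: {training_only:.2f}x, end to end: {end_to_end:.2f}x")
# prints: training only: 1.42x, end to end: 50.00x
```

Even with infinitely fast training, this hypothetical workflow is capped at 1 / 0.7 ≈ 1.43x, because the untouched 70% still runs at its original speed.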

Architecture
“More data requires better approaches!” ~ Xavier Amatriain, CTO, CurAI

RAPIDS: Open GPU Data Science
cuDF, cuML, and cuGraph mimic well-known libraries: cuDF is Pandas-like, cuML is Scikit-Learn-like, and cuGraph is NetworkX-like.
[Stack diagram: Data Preparation, Model Training, and Visualization in Python; Dask; RAPIDS (cuDF, cuML, cuGraph) alongside DL frameworks and cuDNN; CUDA; Apache Arrow]

High-Level APIs
• Python: Dask-cuML (Dask Multi-GPU ML) and cuML (Scikit-Learn-like)
• CUDA/C++: libcuml (ML Algorithms, ML Primitives)
• Multi-Node & Multi-GPU Communications
[Diagram: Host 1 and Host 2, each with GPU1 to GPU4]

cuML API: GPU-accelerated machine learning at every layer
• Python: Scikit-learn-like interface for data scientists utilizing cuDF & Numpy
• Algorithms: CUDA C++ API for developers to utilize accelerated machine learning algorithms
• Primitives: Reusable building blocks for composing machine learning algorithms

Primitives: GPU-accelerated math optimized for feature matrices
Linear Algebra • Statistics • Matrix / Math
• Element-wise operations • Matrix multiply • Random • Norms • Distance / Metrics • Eigen Decomposition • SVD/RSVD • Objective Functions • Transpose • Sparse Conversions • QR Decomposition
More to come!
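
To show what "reusable building blocks" means in practice, here is a minimal pure-Python sketch in which a single distance primitive is reused to build a pairwise-distance matrix, which is then composed into a k-nearest-neighbors classifier. The function names are hypothetical and are not libcuml APIs; on a GPU these steps would be batched kernels rather than Python loops.

```python
from collections import Counter

# Hypothetical primitive/algorithm split for illustration; not libcuml code.

def squared_euclidean(a, b):
    """Distance primitive: squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_distances(queries, points, metric=squared_euclidean):
    """Primitive built on a metric: a len(queries) x len(points) distance matrix."""
    return [[metric(q, p) for p in points] for q in queries]


def knn_predict(train_x, train_y, queries, k=3):
    """Algorithm composed from primitives: majority vote among the k nearest points."""
    preds = []
    for row in pairwise_distances(queries, train_x):
        nearest = sorted(range(len(row)), key=row.__getitem__)[:k]
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, [[0.2, 0.3], [5.5, 5.2]], k=3))  # ['a', 'b']
```

Swapping the `metric` argument changes every algorithm built on `pairwise_distances` at once, which is the payoff of keeping primitives separate from algorithms.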

Algorithms: GPU-accelerated Scikit-Learn
• Classification / Regression: Decision Trees / Random Forests, Linear Regression, Logistic Regression, K-Nearest Neighbors
• Statistical Inference: Kalman Filtering, Bayesian Inference, Gaussian Mixture Models, Hidden Markov Models
• Clustering: K-Means, DBSCAN, Spectral Clustering
• Decomposition & Dimensionality Reduction: Principal Components, Singular Value Decomposition, UMAP, Spectral Embedding
• Timeseries Forecasting: ARIMA, Holt-Winters
• Recommendations: Implicit Matrix Factorization
• Cross Validation
• Hyper-parameter Tuning
More to come!

High-Level APIs
• Python: Dask Multi-GPU ML (data distribution) and a Scikit-Learn-like API
• CUDA/C++: ML Algorithms (model parallelism) and ML Primitives
• Multi-Node / Multi-GPU Communications
[Diagram: Host 1 and Host 2, each with GPU1 to GPU4]
• Portability • Efficiency • Speed

Dask cuML: Distributed Data-Parallelism Layer
• Distributed computation scheduler for Python
• Scales up and out
• Distributes data across processes
• Enables model-parallel cuML algorithms
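
The data-parallel pattern described here (partition the rows, compute locally, then combine) can be sketched for a one-feature least-squares fit: each hypothetical worker reduces its partition to five sufficient statistics, the tuples are summed, and the line is solved once. Plain Python lists stand in for Dask partitions and GPUs, and the helper names are illustrative assumptions; this is not Dask or cuML code.

```python
# Hypothetical data-parallel sketch; lists stand in for Dask partitions / GPUs.

def partition(rows, n_workers):
    """Split rows into at most n_workers contiguous chunks of near-equal size."""
    size = -(-len(rows) // n_workers)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def local_stats(chunk):
    """Per-worker step: reduce (x, y) pairs to sufficient statistics."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(stats):
    """Reduce step: element-wise sum of the per-worker tuples."""
    return tuple(sum(values) for values in zip(*stats))


def solve_line(n, sx, sy, sxx, sxy):
    """Closed-form OLS for y = w*x + b from the global statistics."""
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return w, b


rows = [(float(x), 3.0 * x - 2.0) for x in range(12)]
chunks = partition(rows, n_workers=4)
w, b = solve_line(*combine(local_stats(c) for c in chunks))
print(w, b)  # 3.0 -2.0
```

Because only five numbers per worker are combined, the communication cost does not grow with the number of rows, and the result matches a single-device fit on the full data.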

ML Technology Stack
• Python: Dask cuML, Dask cuDF, cuDF, Numpy
• Cython
• cuML Algorithms
• cuML Prims
• CUDA Libraries: Thrust, Cub, cuSolver, nvGraph, CUTLASS, cuSparse, cuRand, cuBlas
• CUDA

cuML Deep Dive
“I would posit that every scientist is a data scientist.” ~ Arun Subramaniyan, V.P. of Data Science & Analytics, Baker Hughes, a GE Company

Linear Regression (OLS): Python Layer
[Code comparison slides: Pandas vs. cuDF, then Scikit-Learn vs. cuML]

Linear Regression (OLS): cuML Algorithms, CUDA C++ Layer

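
As a library-free companion to the OLS slides, here is a minimal pure-Python sketch of what an OLS fit computes: build the normal equations (XᵀX)w = Xᵀy and solve them with Gaussian elimination. The helper names are hypothetical, and this should not be read as cuML's implementation; it only illustrates the linear-algebra steps (transpose, matrix multiply, solve) that such an algorithm composes. The `fit_intercept` flag borrows the parameter name used by scikit-learn's `LinearRegression`.

```python
# Hypothetical helpers for illustration; not cuML's OLS implementation.

def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ols_fit(X, y, fit_intercept=True):
    """Coefficients minimizing ||Xw - y||^2 (intercept first if fit_intercept)."""
    if fit_intercept:
        X = [[1.0] + list(row) for row in X]  # prepend a column of ones
    xt = transpose(X)
    xtx = matmul(xt, X)
    xty = [row[0] for row in matmul(xt, [[v] for v in y])]
    return solve(xtx, xty)


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [1.0 + 2.0 * a - 0.5 * b for a, b in X]  # exact linear target
print(ols_fit(X, y))  # approximately [1.0, 2.0, -0.5]
```

Forming XᵀX squares the condition number of X, which is one reason production solvers often prefer decompositions such as QR, SVD, or eigendecomposition, all of which appear in the primitives list.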
