RAPIDS, FOSDEM19 Dr. Christoph Angerer, Manager AI Developer - PowerPoint PPT Presentation

RAPIDS, FOSDEM’19 Dr. Christoph Angerer, Manager AI Developer Technologies, NVIDIA

HPC & AI TRANSFORMS INDUSTRIES Computational & Data Scientists Are Driving Change
Healthcare, Industrial, Consumer Internet, Automotive, Ad Tech / MarTech, Retail, Financial / Insurance

DATA SCIENCE IS NOT A LINEAR PROCESS It Requires Exploration and Iterations
• Manage Data: All Data, ETL, Structured Data Store
• Training: Data Preparation, Model Training
• Evaluate: Visualization
• Deploy: Inference
Iterate … Cross Validate … Grid Search … Iterate some more.
Accelerating only `Model Training` does have a benefit, but it doesn't address the whole problem.

DAY IN THE LIFE Or: Why did I want to become a Data Scientist?
Data scientists are valued resources. Why not give them the environment to be more productive?
[Diagram label: configure ETL workflow]

PERFORMANCE AND DATA GROWTH Post-Moore's law
• Data sizes continue to grow
• Moore's law is no longer a predictor of capacity in CPU market growth
• Distributing CPUs exacerbates the problem

TRADITIONAL DATA SCIENCE CLUSTER
Workload Profile: Fannie Mae Mortgage Data:
• 192GB data set
• 16 years, 68 quarters
• 34.7 Million single family mortgage loans
• 1.85 Billion performance records
• XGBoost training set: 50 features
300 Servers | $3M | 180 kW

GPU-ACCELERATED MACHINE LEARNING CLUSTER NVIDIA Data Science Platform with DGX-2
1 DGX-2 | 10 kW
1/8 the Cost | 1/15 the Space | 1/18 the Power
[Chart: End-to-End comparison of 20, 30, 50, and 100 CPU Nodes against DGX-2 and 5x DGX-1; axis from 0 to 10,000]
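The ratios on this slide can be checked against the cluster figures on the previous one. A minimal sketch, assuming only the numbers quoted on the slides (300 servers, $3M, 180 kW versus one DGX-2 at 10 kW and 1/8 the cost); the variable names are illustrative, and the implied GPU system cost is derived from the "1/8 the Cost" claim rather than quoted in the talk.

```python
# Figures quoted on the slides.
cpu_cluster = {"servers": 300, "cost_usd": 3_000_000, "power_kw": 180}
dgx2 = {"servers": 1, "power_kw": 10}

# Power ratio: 180 kW for the CPU cluster versus 10 kW for one DGX-2.
power_ratio = cpu_cluster["power_kw"] / dgx2["power_kw"]

# Implied cost of the GPU system if it is 1/8 of the CPU cluster's cost
# (an inference from the "1/8 the Cost" claim, not a quoted price).
implied_gpu_cost_usd = cpu_cluster["cost_usd"] / 8

print(f"power ratio: {power_ratio:.0f}x")
print(f"implied GPU system cost: ${implied_gpu_cost_usd:,.0f}")
```

The power ratio matches the "1/18 the Power" figure on the slide.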

DELIVERING DATA SCIENCE VALUE
• Maximized Productivity: Oak Ridge National Labs, 215x Speedup Using RAPIDS with XGBoost
• Top Model Accuracy: Global Retail Giant, $1B Potential Saving with 4% Error Rate Reduction
• Lowest TCO: Streaming Media Company, $1.5M Infrastructure Cost Saving

DATA SCIENCE WORKFLOW WITH RAPIDS Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
DATA to PREDICTIONS: DATA PREPARATION
• GPU-accelerated compute for in-memory data preparation
• Simplified implementation using familiar data science tools
• Python drop-in Pandas replacement built on CUDA C++
• GPU-accelerated Spark (in development)

DATA SCIENCE WORKFLOW WITH RAPIDS Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
DATA to PREDICTIONS: MODEL TRAINING
• GPU acceleration of today's most popular ML algorithms
• XGBoost, PCA, K-means, k-NN, DBSCAN, tSVD …

DATA SCIENCE WORKFLOW WITH RAPIDS Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
DATA to PREDICTIONS: VISUALIZATION
• Effortless exploration of datasets, billions of records in milliseconds
• Dynamic interaction with data = faster ML model development
• Data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS

THE EFFECTS OF END-TO-END ACCELERATION Faster Data Access, Less Data Movement
• Hadoop Processing, Reading from disk: HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
• Spark In-Memory Processing: HDFS Read -> Query -> ETL -> ML Train. 25-100x Improvement, Less code, Language flexible, Primarily In-Memory
• GPU/Spark In-Memory Processing: HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train. 5-10x Improvement, More code, Language rigid, Substantially on GPU
• RAPIDS: Arrow Read -> Query -> ETL -> ML Train. 50-100x Improvement, Same code, Language flexible, Primarily on GPU

ADDRESSING CHALLENGES IN GPU ACCELERATED DATA SCIENCE Yes GPUs are fast but …
• Too much data movement
• Too many makeshift data formats
• Writing CUDA C/C++ is involved
• No Python API for data manipulation

DATA MOVEMENT AND TRANSFORMATION The bane of productivity and performance APP B Read Data APP B GPU APP B Copy & Convert Data CPU GPU Copy & Convert GPU APP A Copy & Convert Data APP A APP A Load Data 14

DATA MOVEMENT AND TRANSFORMATION What if we could keep data on the GPU? APP B Read Data APP B GPU APP B Copy & Convert Data CPU GPU Copy & Convert GPU APP A Copy & Convert Data APP A APP A Load Data 15

LEARNING FROM APACHE ARROW From Apache Arrow Home Page - https://arrow.apache.org/ 16
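Apache Arrow specifies a columnar in-memory format so that tools can share data without converting it. The sketch below is not the Arrow API; it uses only Python's standard library to contrast a row-oriented layout with one contiguous typed buffer per column, and the sample records are hypothetical.

```python
from array import array

# Row-oriented layout: one Python tuple per record (id, price).
rows = [(1, 9.5), (2, 3.25), (3, 7.0)]

# Column-oriented layout: one contiguous typed buffer per field,
# which is the general idea behind a columnar format.
columns = {
    "id": array("q", (r[0] for r in rows)),     # 64-bit signed integers
    "price": array("d", (r[1] for r in rows)),  # 64-bit floats
}

# A column scan touches a single contiguous buffer.
total_price = sum(columns["price"])

# memoryview exposes the buffer without copying, the way libraries that
# agree on one layout can hand data to each other instead of converting it.
view = memoryview(columns["price"])
print(total_price, view.nbytes)
```

Sharing one agreed layout is what removes the repeated "Copy & Convert" steps shown on the data movement slides.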

CUDA DATA FRAMES IN PYTHON GPUs at your Fingertips Illustrations from https://changhsinlee.com/pyspark-dataframe-basics/ 17

RAPIDS OPEN GPU DATA SCIENCE 18

RAPIDS Open GPU Data Science
• Learn what the data science community needs
• Use best practices and standards
• Build scalable systems and algorithms
• Test applications and workflows
• Iterate
[Diagram layers: APPLICATIONS, ALGORITHMS, SYSTEMS, CUDA, ARCHITECTURE]

RAPIDS COMPONENTS
• Data Preparation: cuDF (Analytics)
• Model Training: cuML (Machine Learning), cuGraph (Graph Analytics), PyTorch & Chainer (Deep Learning)
• Visualization: Kepler.GL (Visualization)
• Shared layers: GPU Memory, DASK

CUML & CUGRAPH
• Data Preparation: cuDF (Analytics)
• Model Training: cuML (Machine Learning), cuGraph (Graph Analytics), PyTorch & Chainer (Deep Learning)
• Visualization: Kepler.GL (Visualization)
• Shared layers: GPU Memory, DASK

AI LIBRARIES cuML & cuGraph Accelerating more of the AI ecosystem
• Machine Learning: Decision Trees, Random Forests, Linear Regression, Logistic Regression, K-Means, K-Nearest Neighbor, DBSCAN, Kalman Filtering, Principal Components, Singular Value Decomposition, Bayesian Inferencing
• Time Series: ARIMA, Holt-Winters
• Graph Analytics: PageRank, BFS, Jaccard Similarity, Single Source Shortest Path, Triangle Counting, Louvain Modularity
• XGBoost, Mortgage Dataset, 90x: 3 Hours to 2 mins on 1 DGX-1
Graph Analytics is fundamental to network analysis. Machine Learning is fundamental to prediction, classification, clustering, anomaly detection and recommendations. Both can be accelerated with NVIDIA GPU: 8x V100 20-90x faster than dual socket CPU.
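To make one of the listed graph algorithms concrete, here is a minimal CPU sketch of PageRank by power iteration in plain Python. It is not the cuGraph implementation or API; the function name, the damping factor of 0.85, the iteration count, and the sample edge list are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        # Each node passes its damped rank evenly along its out-links.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical four-edge graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(edges)
print(ranks)
```

Every iteration is a sweep over all edges, which is the kind of data-parallel work a GPU graph library can distribute across many threads.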

CUDF + XGBOOST DGX-2 vs Scale Out CPU Cluster
• Full end-to-end pipeline
• Leveraging Dask + PyGDF
• Store each GPU's results in system memory, then read them back in
• Arrow to DMatrix (CSR) for XGBoost
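The "Arrow to DMatrix (CSR)" step refers to the compressed sparse row layout. The following is a generic illustration of CSR in plain Python, not the conversion code used in the benchmark; the helper name `to_csr` and the sample matrix are hypothetical, and zeros are treated as the absent entries.

```python
def to_csr(rows):
    """Convert a list of equal-length numeric rows to CSR arrays."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)   # non-zero values, row by row
                indices.append(col)  # column index of each stored value
        indptr.append(len(data))     # where the next row starts in data
    return data, indices, indptr

dense = [
    [0.0, 2.0, 0.0],
    [1.0, 0.0, 3.0],
    [0.0, 0.0, 0.0],
]
data, indices, indptr = to_csr(dense)
print(data, indices, indptr)
```

Row `i` occupies `data[indptr[i]:indptr[i + 1]]`, so an empty row simply repeats the previous offset.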

CUDF + XGBOOST Scale Out GPU Cluster vs DGX-2
• Full end-to-end pipeline
• Leveraging Dask for multi-node + PyGDF
• Store each GPU's results in system memory, then read them back in
• Arrow to DMatrix (CSR) for XGBoost
[Chart: DGX-2 vs 5x DGX-1, time in seconds (axis from 0 to 350) broken down into ETL+CSV, ML Prep, and ML]

CUML Benchmarks of initial algorithms 25

NEAR FUTURE WORK ON CUML Additional algorithms in development right now
• K-means: Released
• K-NN: Released
• Kalman filter: v0.5
• GLM: v0.5
• Random Forests: v0.6
• ARIMA: v0.6
• UMAP: v0.6
• Collaborative filtering: Q2 2019

CUGRAPH GPU-Accelerated Graph Analytics Library Coming Soon: Full NVGraph Integration Q1 2019 27

CUDF
• Data Preparation: cuDF (Analytics)
• Model Training: cuML (Machine Learning), cuGraph (Graph Analytics), PyTorch & Chainer (Deep Learning)
• Visualization: Kepler.GL (Visualization)
• Shared layers: GPU Memory, DASK

CUDF GPU DataFrame library
• Apache Arrow data format
• Pandas-like API
• Unary and Binary Operations
• Joins / Merges
• GroupBys
• Filters
• User-Defined Functions (UDFs)
• Accelerated file readers
• Etc.
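Two of the operations listed above, filters and group-bys, can be illustrated on a columnar table. This is a standard-library sketch of their semantics only, not cuDF code; in a Pandas-like API the same steps are normally written as boolean-mask indexing followed by `groupby(...).sum()`. The helper names and sample data are hypothetical.

```python
from collections import defaultdict

# Columnar table: one list per column (hypothetical sample data).
table = {
    "state": ["CA", "NY", "CA", "TX", "NY"],
    "balance": [100.0, 250.0, 50.0, 75.0, 25.0],
}

def filter_rows(table, column, predicate):
    """Keep the rows where predicate(value) is true for the given column."""
    keep = [i for i, v in enumerate(table[column]) if predicate(v)]
    return {name: [values[i] for i in keep] for name, values in table.items()}

def groupby_sum(table, key, value):
    """Sum the value column for each distinct entry of the key column."""
    totals = defaultdict(float)
    for k, v in zip(table[key], table[value]):
        totals[k] += v
    return dict(totals)

large = filter_rows(table, "balance", lambda b: b >= 50.0)
totals = groupby_sum(large, "state", "balance")
print(totals)
```

Both helpers work column by column, which is the access pattern a columnar GPU DataFrame is laid out for.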

CUDF Today CUDA With Python Bindings
• Low level library containing function implementations and C/C++ API
• Importing/exporting Apache Arrow using the CUDA IPC mechanism
• CUDA kernels to perform element-wise math operations on GPU DataFrame columns
• CUDA sort, join, groupby, and reduction operations on GPU DataFrames
• A Python library for manipulating GPU DataFrames
• Python interface to CUDA C++ with additional functionality
• Creating Apache Arrow from Numpy arrays, Pandas DataFrames, and PyArrow Tables
• JIT compilation of User-Defined Functions (UDFs) using Numba

CUSTRING & NVSTRING GPU-Accelerated string functions with a Pandas-like API
API and functionality follow Pandas: https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling
• lower(): ~22x speedup
• find(): ~40x speedup
• slice(): ~100x speedup
[Chart: time in milliseconds, Pandas vs cudastrings, for lower(), find(#), slice(1,15)]

DASK
• Data Preparation: cuDF (Analytics)
• Model Training: cuML (Machine Learning), cuGraph (Graph Analytics), PyTorch & Chainer (Deep Learning)
• Visualization: Kepler.GL (Visualization)
• Shared layers: GPU Memory, DASK

DASK What is Dask and why does RAPIDS use it for scaling out?
• Dask is a distributed computation scheduler built to scale Python workloads from laptops to supercomputer clusters.
• Extremely modular: scheduling, compute, data transfer, and out-of-core handling are all decoupled, allowing us to plug in our own implementations.
• Can easily run multiple Dask workers per node, which allows a simpler development model of one worker per GPU regardless of single-node or multi-node environment.
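The "one worker per GPU" model can be sketched as a placement plan that looks the same on one node or many. This is not the Dask API; `plan_workers` and the host names are hypothetical, and the sketch assumes each worker process is pinned to its device through the `CUDA_VISIBLE_DEVICES` environment variable.

```python
def plan_workers(gpus_per_node):
    """Return one worker spec per GPU for a mapping of host -> GPU count."""
    plan = []
    for host, gpu_count in gpus_per_node.items():
        for gpu in range(gpu_count):
            plan.append({
                "name": f"{host}-gpu{gpu}",
                "host": host,
                # Restrict each worker process to a single device.
                "env": {"CUDA_VISIBLE_DEVICES": str(gpu)},
            })
    return plan

# The same function covers a single node and a multi-node cluster.
single_node = plan_workers({"dgx": 2})
multi_node = plan_workers({"node-a": 2, "node-b": 2})

for worker in multi_node:
    print(worker["name"], worker["env"])
```

Because each worker sees exactly one device, code written against a single GPU does not change when more nodes are added.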
