

  1. Scaling RAPIDS with Dask. Matthew Rocklin, Systems Software Manager. GTC San Jose 2019.

  2. PyData is Pragmatic, but Limited
     How do we accelerate an existing software stack?
     The PyData ecosystem:
     • NumPy: arrays
     • Pandas: dataframes
     • Scikit-Learn: machine learning
     • Jupyter: interaction
     • … (many other projects)
     It is well loved:
     • Easy to use
     • Broadly taught
     • Community governed
     But sometimes slow:
     • Single CPU core
     • In-memory data

  3. 95% of the time, PyData is great (and you can ignore the rest of this talk). 5% of the time, you want more performance.

  4. Scale up and out with RAPIDS and Dask
     Two axes: Scale Up / Accelerate and Scale Out / Parallelize.
     • PyData: NumPy, Pandas, Scikit-Learn, and many more. Single CPU core, in-memory data.
     • RAPIDS and others (scale up): accelerated on a single GPU. NumPy -> CuPy/PyTorch/..., Pandas -> cuDF, Scikit-Learn -> cuML, Numba -> Numba.
     • Dask (scale out): multi-core and distributed PyData. NumPy -> Dask Array, Pandas -> Dask DataFrame, Scikit-Learn -> Dask-ML, … -> Dask Futures.
     • Dask + RAPIDS (scale up and out): multi-GPU, on a single node (DGX) or across a cluster.

  5. Scale up and out with RAPIDS and Dask
     First, scaling up / accelerating: from PyData (NumPy, Pandas, Scikit-Learn, and many more; single CPU core, in-memory data) to RAPIDS and others, accelerated on a single GPU: NumPy -> CuPy/PyTorch/..., Pandas -> cuDF, Scikit-Learn -> cuML, Numba -> Numba.

  6. RAPIDS: GPU variants of PyData libraries
     • NumPy -> CuPy, PyTorch, TensorFlow: array computing. Mature due to the deep learning boom, also useful for other domains. An obvious fit for GPUs.
     • Pandas -> cuDF: tabular computing. New development: parsing, joins, groupbys. Not an obvious fit for GPUs.
     • Scikit-Learn -> cuML: traditional machine learning. Somewhere in between.
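
For a taste of the API parity, here is a minimal sketch (the file name and columns are hypothetical; requires a CUDA GPU with cudf installed):

    import cudf  # GPU DataFrame library from RAPIDS

    # Same call shape as pandas.read_csv, but parsing runs on the GPU.
    gdf = cudf.read_csv('data.csv')  # hypothetical file with columns 'a' and 'b'

    # Groupbys and elementwise operations mirror the pandas API.
    print(gdf.groupby('a')['b'].mean())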

  11. Scale up and out with RAPIDS and Dask
      Next, scaling out / parallelizing: from PyData (single CPU core, in-memory data) to Dask, multi-core and distributed PyData: NumPy -> Dask Array, Pandas -> Dask DataFrame, Scikit-Learn -> Dask-ML, … -> Dask Futures.

  12. Dask Parallelizes PyData Natively
      • PyData native: built on top of NumPy, Pandas, Scikit-Learn, … (easy to migrate), with the same APIs (easy to train) and the same developer community (well trusted)
      • Scales: out to thousand-node clusters, yet easy to install and use on a laptop
      • Popular: the most common parallelism framework today at PyData and SciPy conferences
      • Deployable: HPC (SLURM, PBS, LSF, SGE), cloud (Kubernetes), Hadoop/Spark (YARN); see the sketch below
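
A sketch of what deployment looks like in practice (cluster sizes and resources here are illustrative):

    from dask.distributed import Client, LocalCluster

    # On a laptop: a local cluster of worker processes.
    cluster = LocalCluster(n_workers=4, threads_per_worker=2)
    client = Client(cluster)

    # On an HPC system, dask-jobqueue offers the same pattern, e.g.:
    #   from dask_jobqueue import SLURMCluster
    #   cluster = SLURMCluster(cores=8, memory="16GB")
    #   cluster.scale(10)  # ask SLURM for 10 workers

    print(client)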

  13. Parallel NumPy
      For imaging, simulation analysis, machine learning.
      • Same API as NumPy
      • One Dask Array is built from many NumPy arrays, either lazily fetched from disk or distributed throughout a cluster
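
A minimal sketch of the idea (array size and chunking are arbitrary):

    import dask.array as da

    # One logical array, backed by 100 NumPy chunks of shape (1000, 1000).
    x = da.random.random((10000, 10000), chunks=(1000, 1000))

    y = (x + x.T).mean(axis=0)  # builds a task graph lazily
    print(y.compute())          # executes the graph in parallel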

  14. Parallel Pandas
      For ETL, time series, data munging.
      • Same API as Pandas
      • One Dask DataFrame is built from many Pandas DataFrames, either lazily fetched from disk or distributed throughout a cluster
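
A minimal sketch of the "many pandas DataFrames" idea (the example frame is made up):

    import pandas as pd
    import dask.dataframe as dd

    pdf = pd.DataFrame({'x': range(1000), 'y': 1.0})

    # Split one pandas DataFrame into a 4-partition Dask DataFrame.
    ddf = dd.from_pandas(pdf, npartitions=4)

    # Familiar pandas operations, executed per partition in parallel.
    print(ddf.x.sum().compute())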

  15. Parallel Scikit-Learn
      For hyper-parameter optimization, random forests, ...
      • Same API: the same exact code, just wrapped in a context manager
      • Replaces the default ThreadPool execution with Dask, allowing scaling onto clusters
      • Available in most Scikit-Learn algorithms, wherever joblib is used (see the sketch below)
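
A sketch of what that looks like (the model and grid are illustrative; assumes dask.distributed is installed):

    import joblib
    from dask.distributed import Client
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    client = Client()  # start or connect to a Dask cluster

    X, y = make_classification(n_samples=1000, random_state=0)
    search = GridSearchCV(SVC(), {'C': [0.1, 1.0, 10.0]}, cv=3)

    # Swap joblib's default thread/process pool for the Dask backend.
    with joblib.parallel_backend('dask'):
        search.fit(X, y)

    print(search.best_params_)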

  17. Parallel Python
      For custom systems, ML algorithms, workflow engines.
      • Parallelize existing codebases
      Example application: M. Tepper, G. Sapiro, "Compressed nonnegative matrix factorization is fast and accurate", IEEE Transactions on Signal Processing, 2016.
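
A sketch of the futures interface used for such custom workloads (the functions are trivial stand-ins):

    from dask.distributed import Client

    client = Client()  # local cluster by default

    def load(i):        # stand-in for real I/O
        return list(range(i))

    def process(data):  # stand-in for real computation
        return sum(data)

    # Submit arbitrary Python functions; Dask tracks the dependencies.
    data_futures = [client.submit(load, i) for i in range(10)]
    results = [client.submit(process, f) for f in data_futures]
    print(client.gather(results))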

  19. Dask Connects Python Users to Hardware
      The user writes high-level code (NumPy/Pandas/Scikit-Learn), Dask turns it into a task graph, and the graph executes on distributed hardware.
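
A tiny illustration of that pipeline using dask.delayed (the functions are trivial stand-ins):

    import dask

    @dask.delayed
    def inc(x):
        return x + 1

    @dask.delayed
    def add(x, y):
        return x + y

    # High-level code builds a task graph instead of executing immediately...
    total = add(inc(1), inc(2))
    # ...and the scheduler executes the graph, locally or on a cluster.
    print(total.compute())  # 5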

  21. Example: Dask + Pandas on NYC Taxi
      See how well New Yorkers tip:

      import dask.dataframe as dd

      df = dd.read_csv('gcs://bucket-name/nyc-taxi-*.csv',
                       parse_dates=['pickup_datetime', 'dropoff_datetime'])
      df2 = df[(df.tip_amount > 0) & (df.fare_amount > 0)]
      df2['tip_fraction'] = df2.tip_amount / df2.fare_amount
      hour = df2.groupby(df2.pickup_datetime.dt.hour).tip_fraction.mean()
      hour.compute().plot(figsize=(10, 6), title='Tip Fraction by Hour')

  22. Try it live: examples.dask.org

  23. Dask scales PyData libraries, but is compute-agnostic to those libraries. (A good fit if you're building a new data science platform.)

  25. Scale up and out with RAPIDS and Dask
      Returning to the full picture: combining the two, Dask + RAPIDS fills the remaining quadrant: multi-GPU, on a single node (DGX) or across a cluster.

  26. Combine Dask with cuDF
      Many GPU DataFrames form one distributed DataFrame.
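
A minimal sketch of the combination (the file pattern and columns assume NYC taxi data; requires GPUs with RAPIDS installed):

    import dask_cudf

    # Each partition is a cuDF DataFrame resident on a GPU.
    df = dask_cudf.read_csv('nyc-taxi-*.csv')

    # The same Dask DataFrame algorithms drive the GPU partitions.
    print(df.groupby('passenger_count')['tip_amount'].mean().compute())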

  28. Combine Dask with CuPy
      Many GPU arrays form one distributed GPU array.
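
A sketch of one way to build a CuPy-backed Dask array (shapes are arbitrary; requires a GPU with cupy installed):

    import cupy
    import dask.array as da

    # Start from a NumPy-backed Dask array...
    x = da.random.random((10000, 10000), chunks=(1000, 1000))

    # ...and move each chunk to the GPU, keeping the Dask array interface.
    gx = x.map_blocks(cupy.asarray)

    print(gx.sum().compute())  # the reduction runs on GPU chunks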

  30. Experiments
      • SVD with Dask Array
      • NYC Taxi with Dask DataFrame

  31. So what works in DataFrames? Lots!
      • Read CSV
      • Elementwise operations
      • Reductions
      • Groupby aggregations
      • Joins (hash, sorted, large-to-small)
      These leverage Dask DataFrame algorithms that have been around for years; the API matches Pandas.
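
For instance, a join and an elementwise operation look just like their pandas counterparts (the frames below are made up; dask_cudf partitions work the same way):

    import pandas as pd
    import dask.dataframe as dd

    left = dd.from_pandas(pd.DataFrame({'id': [1, 2, 3], 'x': [10, 20, 30]}), npartitions=2)
    right = dd.from_pandas(pd.DataFrame({'id': [1, 2, 3], 'y': [0.1, 0.2, 0.3]}), npartitions=2)

    joined = left.merge(right, on='id')  # hash join across partitions
    joined['z'] = joined.x * joined.y    # elementwise
    print(joined.compute())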

  32. So what doesn't work? Lots!
      • Read Parquet/ORC
      • Some reductions
      • Some groupby aggregations
      • Rolling window operations

  33. So what doesn't work? API alignment.
      • When cuDF and Pandas match, existing Dask algorithms work seamlessly
      • But the APIs don't always match

  35. So what works in Arrays? We genuinely don't know yet.
      • This work is much younger, but moving quickly
      • CuPy has been around for a while and is fairly mature
      • Most work today is happening upstream in NumPy and Dask (thanks to Peter Entschev, Hameer Abbasi, Stephan Hoyer, Marten van Kerkwijk, and Eric Wieser)
      • The ecosystem approach benefits other NumPy-like arrays as well: sparse arrays, Xarray, ...
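
The upstream NumPy work referred to here includes the __array_function__ protocol (NEP 18), which lets plain NumPy functions dispatch to NumPy-like arrays. A minimal sketch:

    import numpy as np
    import dask.array as da

    x = da.ones((1000, 1000), chunks=(100, 100))

    # With the __array_function__ protocol (NumPy >= 1.17), calling a plain
    # NumPy function on a Dask array dispatches to Dask and stays lazy.
    y = np.sum(x, axis=0)
    print(type(y))          # a Dask array, not a NumPy array
    print(y.compute()[:5])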
