From Python to PySpark and Back Again
- Unifying Single-host and Distributed Machine Learning with Maggy
Moritz Meister, @morimeister Software Engineer, Logical Clocks Jim Dowling, @jim_dowling Associate Professor, KTH Royal Institute of Technology
ML Model Development
A simplified view
[Diagram: Feature Pipelines feeding Exploration → Experimentation → Model Training → Explainability and Validation → Serving]
ML Model Development
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
It’s simple - only four steps
Artifacts and Non-DRY Code
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
What It’s Really Like
… not linear but iterative
What It’s Really Really Like
… not linear but iterative
Root Cause: Iterative Development of ML Models
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
[Diagram: iterative loop over EDA, HParam Tuning, Training (Dist), and Ablation Studies]
Iterative Development Is a Pain, We Need DRY Code!
Each step requires different implementations of the training code
OBLIVIOUS TRAINING FUNCTION
# RUNS ON THE WORKERS
def train():
    def input_fn():
        …  # return dataset
    model = …
    optimizer = …
    model.compile(…)
    rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy')
    keras_estimator = tf.keras.estimator.model_to_estimator(….)
    tf.estimator.train_and_evaluate(keras_estimator, input_fn)
[Diagram: iterative loop over EDA, HParam Tuning, Training (Dist), and Ablation Studies]
The Oblivious Training Function
Challenge: Obtrusive Framework Artifacts
▪ TF_CONFIG (a minimal sketch of this boilerplate follows below)
▪ Distribution Strategy
▪ Dataset (Sharding, DFS)
▪ Integration in Python - hard from inside a notebook
▪ Keras vs. Estimator vs. Custom Training Loop
Example: TensorFlow
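To make the obtrusive boilerplate concrete, here is a minimal sketch of the TF_CONFIG environment variable each worker needs when the distribution setup is managed by hand; the host addresses and the worker index are illustrative only and differ on every host.

import json
import os

# Cluster spec and task identity must be set per worker; values here are illustrative.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},
    "task": {"type": "worker", "index": 0}
})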
Where is Deep Learning headed?
Productive High-Level APIs
Or why data scientists love Keras and PyTorch
[Diagram: Idea → Experiment → Results loop, supported by Infrastructure, Framework, Tracking, and Visualization]
Francois Chollet, “Keras: The Next 5 Years”
Which infrastructure? Hopsworks (Open Source) · Databricks · Apache Spark · Cloud Providers
How do we keep our high-level APIs transparent and productive?
What Is Transparent Code?
import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import SGD

def dataset(batch_size):
    # MNIST; the test split is unused here
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train / np.float32(255)
    y_train = y_train.astype(np.int64)
    train_dataset = tf.data.Dataset.from_tensor_slices(
        (x_train, y_train)).shuffle(60000).repeat().batch(batch_size)
    return train_dataset

def build_and_compile_cnn_model(lr):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Reshape(target_shape=(28, 28, 1)),  # add channel dim for Conv2D
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(
        loss=SparseCategoricalCrossentropy(from_logits=True),
        optimizer=SGD(learning_rate=lr))
    return model
NO CHANGES!
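The two functions above stay identical in every context; as a minimal sketch (batch size, epochs, and steps are illustrative), only the surrounding distribution context changes:

import tensorflow as tf

# Single host: call the functions directly.
model = build_and_compile_cnn_model(lr=0.1)
model.fit(dataset(batch_size=64), epochs=3, steps_per_epoch=70)

# Distributed multi-host: each worker runs the same code under a strategy
# scope (and with its own TF_CONFIG); the functions themselves are unchanged.
strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = build_and_compile_cnn_model(lr=0.1)
model.fit(dataset(batch_size=64), epochs=3, steps_per_epoch=70)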
Building Blocks for Distribution Transparency
Distribution Context
Single-host vs. parallel multi-host vs. distributed multi-host
[Diagram: three contexts - single host (one worker); parallel multi-host (Driver acting as Experiment Controller over Worker 1 … Worker N); distributed multi-host (Driver plus Workers 1-8 wired together via TF_CONFIG)]
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
Model Development Best Practices
▪ Modularize
▪ Parametrize
▪ Higher-order training functions
▪ Usage of callbacks at runtime (see the sketch below)
Dataset Generation · Model Generation · Training Logic
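A minimal sketch of what such a modular, parametrized training function can look like; the argument names and the reporter callback are illustrative, and the injected dataset/model generators are assumed to be functions like the ones shown earlier.

def train_fun(dataset_fn, model_fn, lr, batch_size, reporter=None):
    # Dataset and model generation are injected, so the same function can be
    # launched single-host, as a tuning trial, or as a distributed worker.
    train_ds = dataset_fn(batch_size)
    model = model_fn(lr)
    history = model.fit(train_ds, epochs=3, steps_per_epoch=70)
    metric = history.history['loss'][-1]
    if reporter is not None:
        # Runtime callback, e.g. to report metrics for early stopping.
        reporter(metric)
    return metric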
Oblivious Training Function as an Abstraction
Let the system handle the complexities
System takes care of ...
… fixing parameters
… launching the function
… launching trials (parametrized instantiations of the function)
… generating new trials
… collecting and logging results
… setting up TF_CONFIG
… wrapping in Distribution Strategy
… launching function as workers
… collecting results
Maggy
Spark+AI Summit 2019 → Today
With Hopsworks and Maggy, we provide a unified development and execution environment for distribution-transparent ML model development.
Make the Oblivious Training Function a core abstraction on Hopsworks
Hopsworks - Award-Winning Platform
Recap: Maggy - Asynchronous Trials on Spark
Spark is bulk-synchronous
[Diagram: the Driver launches synchronous stages of tasks (Task11 … Task1N, Task21 … Task2N, Task31 … Task3N) separated by barriers, with Metrics1-3 written to HopsFS; trials that are early-stopped inside a stage leave wasted compute until the next barrier]
Recap: The Solution
Add Communication and Long Running Tasks
[Diagram: long-running tasks (Task11 … Task1N) communicate with the Driver across the barrier, streaming metrics and receiving new trials]
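As a conceptual sketch only (not Maggy's actual internals), each long-running task can be thought of as a loop that asks the driver for the next trial and reports metrics back; the client object and its methods are hypothetical.

def worker_loop(client, train_fun):
    # Hypothetical client API: the driver hands out trials and receives metrics.
    while True:
        trial = client.get_next_trial()        # blocks until the driver assigns one
        if trial is None:                      # driver signals the experiment is over
            break
        metric = train_fun(**trial.params)     # run one parametrized trial
        client.report_final_metric(trial.id, metric)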
What’s New?
Worker discovery and distribution context set-up
[Diagram: the Driver discovers the workers (Task11 … Task1N), sets up the distribution context, and launches the oblivious training function in that context]
What’s New: Distribution Context
from maggy import experiment

experiment.set_dataset_generator(gen_dataset)
experiment.set_model_generator(gen_model)

# Hyperparameter optimization
experiment.set_context('optimization', 'randomsearch', searchspace)
result = experiment.lagom(train_fun)
params = result.get('best_hp')

# Distributed Training
experiment.set_context('dist_training', 'MultiWorkerMirroredStrategy', params)
experiment.lagom(train_fun)

# Ablation study
experiment.set_context('ablation', 'loco', ablation_study, params)
experiment.lagom(train_fun)
DEMO
Code changes required to go from standard Python code to scale-out hyperparameter tuning and distributed training.
What’s Next
Extend the platform to provide a unified development and execution environment for distribution-transparent Jupyter Notebooks.
Summary
▪ Moving between distribution contexts requires code rewriting
▪ Factor out obtrusive framework artifacts
▪ Let the system handle the distribution context
▪ Keep productive high-level APIs
Thank You!
Get Started: hopsworks.ai · github.com/logicalclocks/maggy
Twitter: @morimeister, @jim_dowling, @logicalclocks, @hopsworks
Web: www.logicalclocks.com
Contributions from colleagues:
▪ Sina Sheikholeslami
▪ Robin Andersson
▪ Alex Ormenisan
▪ Kai Jeggle