From Python to PySpark and Back Again
- Unifying Single-host and Distributed Machine Learning with Maggy
Moritz Meister, @morimeister Software Engineer, Logical Clocks Jim Dowling, @jim_dowling Associate Professor, KTH Royal Institute of Technology
ML Model Development
A simplified view
[Diagram: Feature Pipelines feeding Exploration → Experimentation → Model Training → Explainability and Validation → Serving]
ML Model Development
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
It’s simple - only four steps
Artifacts and Non-DRY Code
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
What It’s Really Like
… not linear but iterative
What It’s Really Really Like
… not linear but iterative
Root Cause: Iterative Development of ML Models
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
[Diagram: iterative loop over EDA, HParam Tuning, Training (Dist), and Ablation Studies]
Iterative Development Is a Pain, We Need DRY Code!
Each step requires different implementations of the training code
OBLIVIOUS TRAINING FUNCTION
# RUNS ON THE WORKERS
def train():
    def input_fn():
        …  # return dataset
    model = …
    optimizer = …
    model.compile(…)
    rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy')
    keras_estimator = tf.keras.estimator.model_to_estimator(….)
    tf.estimator.train_and_evaluate(keras_estimator, input_fn)
[Diagram: iterative loop over EDA, HParam Tuning, Training (Dist), and Ablation Studies]
The Oblivious Training Function
Challenge: Obtrusive Framework Artifacts
▪ TF_CONFIG (a minimal sketch of this boilerplate follows below)
▪ Distribution Strategy
▪ Dataset (Sharding, DFS)
▪ Integration in Python - hard from inside a notebook
▪ Keras vs. Estimator vs. Custom Training Loop
Example: TensorFlow
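To make the obtrusive boilerplate concrete, here is a minimal sketch of the TF_CONFIG environment variable each worker needs when the distribution setup is managed by hand; the host addresses and the worker index are illustrative only and differ on every host.

import json
import os

# Cluster spec and task identity must be set per worker; values here are illustrative.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},
    "task": {"type": "worker", "index": 0}
})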
Where is Deep Learning headed?
Productive High-Level APIs
Or why data scientists love Keras and PyTorch
[Diagram: Idea → Experiment → Results loop, supported by Infrastructure, Framework, Tracking, and Visualization]
Francois Chollet, “Keras: The Next 5 Years”
Which infrastructure? Hopsworks (Open Source) · Databricks · Apache Spark · Cloud Providers
How do we keep our high-level APIs transparent and productive?
What Is Transparent Code?
import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import SGD

def dataset(batch_size):
    # MNIST; the test split is unused here
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train / np.float32(255)
    y_train = y_train.astype(np.int64)
    train_dataset = tf.data.Dataset.from_tensor_slices(
        (x_train, y_train)).shuffle(60000).repeat().batch(batch_size)
    return train_dataset

def build_and_compile_cnn_model(lr):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Reshape(target_shape=(28, 28, 1)),  # add channel dim for Conv2D
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(
        loss=SparseCategoricalCrossentropy(from_logits=True),
        optimizer=SGD(learning_rate=lr))
    return model
NO CHANGES!
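The two functions above stay identical in every context; as a minimal sketch (batch size, epochs, and steps are illustrative), only the surrounding distribution context changes:

import tensorflow as tf

# Single host: call the functions directly.
model = build_and_compile_cnn_model(lr=0.1)
model.fit(dataset(batch_size=64), epochs=3, steps_per_epoch=70)

# Distributed multi-host: each worker runs the same code under a strategy
# scope (and with its own TF_CONFIG); the functions themselves are unchanged.
strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = build_and_compile_cnn_model(lr=0.1)
model.fit(dataset(batch_size=64), epochs=3, steps_per_epoch=70)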
Building Blocks for Distribution Transparency
Distribution Context
Single-host vs. parallel multi-host vs. distributed multi-host
[Diagram: three contexts - single host (one worker); parallel multi-host (Driver acting as Experiment Controller over Worker 1 … Worker N); distributed multi-host (Driver plus Workers 1-8 wired together via TF_CONFIG)]
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
Model Development Best Practices
▪ Modularize
▪ Parametrize
▪ Higher-order training functions
▪ Usage of callbacks at runtime (see the sketch below)
Dataset Generation · Model Generation · Training Logic
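A minimal sketch of what such a modular, parametrized training function can look like; the argument names and the reporter callback are illustrative, and the injected dataset/model generators are assumed to be functions like the ones shown earlier.

def train_fun(dataset_fn, model_fn, lr, batch_size, reporter=None):
    # Dataset and model generation are injected, so the same function can be
    # launched single-host, as a tuning trial, or as a distributed worker.
    train_ds = dataset_fn(batch_size)
    model = model_fn(lr)
    history = model.fit(train_ds, epochs=3, steps_per_epoch=70)
    metric = history.history['loss'][-1]
    if reporter is not None:
        # Runtime callback, e.g. to report metrics for early stopping.
        reporter(metric)
    return metric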
Oblivious Training Function as an Abstraction
Let the system handle the complexities
System takes care of ...
… fixing parameters
… launching the function
… launching trials (parametrized instantiations of the function)
… generating new trials
… collecting and logging results
… setting up TF_CONFIG
… wrapping in Distribution Strategy
… launching function as workers
… collecting results
Maggy
Spark+AI Summit 2019 → Today
With Hopsworks and Maggy, we provide a unified development and execution environment for distribution-transparent ML model development.
Make the Oblivious Training Function a core abstraction on Hopsworks
Hopsworks - Award-Winning Platform
Recap: Maggy - Asynchronous Trials on Spark
Spark is bulk-synchronous
[Diagram: the Driver launches synchronous stages of tasks (Task11 … Task1N, Task21 … Task2N, Task31 … Task3N) separated by barriers, with Metrics1-3 written to HopsFS; trials that are early-stopped inside a stage leave wasted compute until the next barrier]
Recap: The Solution
Add Communication and Long Running Tasks
[Diagram: long-running tasks (Task11 … Task1N) communicate with the Driver across the barrier, streaming metrics and receiving new trials]
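As a conceptual sketch only (not Maggy's actual internals), each long-running task can be thought of as a loop that asks the driver for the next trial and reports metrics back; the client object and its methods are hypothetical.

def worker_loop(client, train_fun):
    # Hypothetical client API: the driver hands out trials and receives metrics.
    while True:
        trial = client.get_next_trial()        # blocks until the driver assigns one
        if trial is None:                      # driver signals the experiment is over
            break
        metric = train_fun(**trial.params)     # run one parametrized trial
        client.report_final_metric(trial.id, metric)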
What’s New?
Worker discovery and distribution context set-up
[Diagram: the Driver discovers the workers (Task11 … Task1N), sets up the distribution context, and launches the oblivious training function in that context]
What’s New: Distribution Context
from maggy import experiment

experiment.set_dataset_generator(gen_dataset)
experiment.set_model_generator(gen_model)

# Hyperparameter optimization
experiment.set_context('optimization', 'randomsearch', searchspace)
result = experiment.lagom(train_fun)
params = result.get('best_hp')

# Distributed Training
experiment.set_context('dist_training', 'MultiWorkerMirroredStrategy', params)
experiment.lagom(train_fun)

# Ablation study
experiment.set_context('ablation', 'loco', ablation_study, params)
experiment.lagom(train_fun)
DEMO
Code changes required to go from standard Python code to scale-out hyperparameter tuning and distributed training.
What’s Next
Extend the platform to provide a unified development and execution environment for distribution-transparent Jupyter Notebooks.
Summary
▪ Moving between distribution contexts requires code rewriting
▪ Factor out obtrusive framework artifacts
▪ Let the system handle the distribution context
▪ Keep productive high-level APIs
Thank You!
Get Started: hopsworks.ai · github.com/logicalclocks/maggy
Twitter: @morimeister, @jim_dowling, @logicalclocks, @hopsworks
Web: www.logicalclocks.com
Contributions from colleagues:
▪ Sina Sheikholeslami
▪ Robin Andersson
▪ Alex Ormenisan
▪ Kai Jeggle