INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP
Distributed Deep Learning Using Hopsworks
CGI Trainee Program Workshop. Kim Hammar, kim@logicalclocks.com
DISTRIBUTED COMPUTING + DEEP LEARNING = ?
[Figure: Distributed Computing (a cluster) + Deep Learning (a neural network)]
Why combine the two?
◮ We like challenging problems
◮ More productive data science
◮ Unreasonable effectiveness of data1
◮ To achieve state-of-the-art results2
1. Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
2. Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE SCALING
3. Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
DDL IS NOT A SECRET ANYMORE
4. Tal Ben-Nun and Torsten Hoefler. “Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis”. In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/1802.09941.
[Figure: frameworks for DDL (TensorflowOnSpark, CaffeOnSpark, Distributed TF, ...) and companies using DDL]
DDL REQUIRES AN ENTIRE SOFTWARE/INFRASTRUCTURE STACK
[Figure: distributed training (executors e1..e4, each computing a gradient ∇) is only one box in a much larger stack]
The surrounding stack includes: distributed systems, data validation, feature engineering, data collection, hardware management, hyperparameter tuning, model serving, pipeline management, A/B testing, and monitoring.
OUTLINE
◮ HopsML, PySpark, and Tensorflow
◮ Hopsworks, Metadata Store, PySpark, and Maggy5
5. Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.
HOPSWORKS
◮ HopsFS and HopsYARN (GPU/CPU as a resource)
◮ Frameworks (ML/Data)
◮ Distributed Metadata (available from REST API)
◮ ML/AI assets: Feature Store, Pipelines, Experiments, Models
◮ APIs:

from hops import featurestore
from hops import experiment

featurestore.get_features([
    "average_attendance",
    "average_player_age"])
experiment.collective_all_reduce(features, model)
INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING
Outer loop: a search method proposes hyperparameters h and receives back a metric τ for each trial.
Inner loop: workers 1..N train model replicas on shards of the data and synchronize their gradients ∇1, ..., ∇N.
[Figure: the outer loop (search method ⇄ metric) wraps the inner loop (worker1..workerN with data synchronization)]
INNER LOOP: DISTRIBUTED DEEP LEARNING
[Figure: training pipeline: features x1 ... xn → model θ → prediction ŷ → loss L(y, ŷ) → gradient ∇θL(y, ŷ)]
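The loop in the figure can be sketched end to end for a single linear neuron with a squared-error loss; the model, data point, and learning rate here are all illustrative, not part of the slides:

```python
# Sketch of: features -> model(theta) -> prediction -> loss -> gradient -> update.
# A single linear neuron with squared-error loss L = 0.5*(y_hat - y)^2.

def predict(theta, x):
    # y_hat = bias + w1*x1 + w2*x2 + ...
    return theta[0] + sum(w * xi for w, xi in zip(theta[1:], x))

def loss(y, y_hat):
    return 0.5 * (y_hat - y) ** 2

def gradient(theta, x, y):
    # dL/dtheta = (y_hat - y) * d(y_hat)/d(theta): [1, x1, x2, ...] scaled by the error
    err = predict(theta, x) - y
    return [err] + [err * xi for xi in x]

def sgd_step(theta, x, y, lr=0.1):
    g = gradient(theta, x, y)
    return [t - lr * gi for t, gi in zip(theta, g)]

theta = [0.0, 0.0, 0.0]
x, y = [1.0, 2.0], 3.0
for _ in range(200):
    theta = sgd_step(theta, x, y)
print(round(predict(theta, x), 3))  # prediction has converged to the label y = 3.0
```

Repeating this step over minibatches is the inner loop; everything that follows is about distributing it.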
INNER LOOP: DISTRIBUTED DEEP LEARNING
[Figure: data-parallel training: executors e1..e4 each compute a gradient ∇ over their own data partition p1..p4]
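A sequential sketch of what the figure describes, assuming a toy one-parameter least-squares model: each "worker" computes a gradient on its own partition, and an all-reduce averages the gradients before every worker applies the same update:

```python
# Data-parallel SGD, simulated sequentially. In a real deployment each
# partition lives on a different executor and all_reduce_mean is a ring
# all-reduce over the network.

def local_gradient(theta, partition):
    # gradient of mean squared error 0.5*(theta*x - y)^2 over this shard
    g = 0.0
    for x, y in partition:
        g += (theta * x - y) * x
    return g / len(partition)

def all_reduce_mean(grads):
    # stand-in for the all-reduce step: average of the workers' gradients
    return sum(grads) / len(grads)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
partitions = [data[0:2], data[2:4]]  # shard across 2 "workers"

theta = 0.0
for _ in range(100):
    grads = [local_gradient(theta, p) for p in partitions]
    theta -= 0.05 * all_reduce_mean(grads)
print(round(theta, 3))  # converges toward the true slope 2.0
```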
DISTRIBUTED DEEP LEARNING IN PRACTICE
◮ Implementation of algorithms is becoming a commodity (TF, PyTorch etc.)
◮ The hardest part:
  ◮ Cluster management
  ◮ Allocating GPUs
  ◮ Data management
  ◮ Operations & performance
[Figure: models, GPUs, and data distribution]
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)

[Figure: the Spark driver (in a YARN container with a conda env) sends resource requests to the HopsYARN RM, which allocates YARN containers (GPU as a resource, conda env) running Spark executors; the executors report their IPs to the driver, then compute gradients ∇ against training data in the Hops Distributed File System (HopsFS)]
◮ Hide complexity behind simple API
◮ Allocate resources using pyspark
◮ Allocate GPUs for spark executors using HopsYARN
◮ Serve sharded training data to workers from HopsFS
◮ Use HopsFS for aggregating logs, checkpoints and results
◮ Store experiment metadata in metastore
◮ Use dynamic allocation for interactive resource management
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: the training pipeline again, now with hyperparameters (η, num_layers, neurons) feeding the model θ alongside the features x1 ... xn; model → prediction ŷ → loss L(y, ŷ) → gradient ∇θL(y, ŷ)]
Example use-case from one of our clients:
◮ Goal: train a One-Class GAN model for fraud detection
◮ Problem: GANs are extremely sensitive to hyperparameters, and the space of possible hyperparameters is very large
◮ Example hyperparameters to tune: learning rates η, ...
[Figure: GAN architecture; random noise z feeds the Generator, whose output and the real input x feed the Discriminator]
[Figure: 3D hyperparameter search space (learning rate, number of layers, neurons per layer); a shared task queue of configurations η1, ..., η5 feeds parallel workers w1..w4, each running the training pipeline (features and hyperparameters → model θ → prediction ŷ → loss L(y, ŷ) → gradient ∇θL(y, ŷ))]
◮ Which algorithm to use for search?
◮ How to monitor progress?
◮ How to aggregate results?
◮ Fault tolerance?
This should be managed with platform support!
PARALLEL EXPERIMENTS
from hops import experiment
experiment.random_search(train_fn)
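Conceptually, random search samples configurations from the search space, evaluates the training function on each, and keeps the best metric. A minimal sequential sketch; the objective `train_fn` and its search space are hypothetical stand-ins, and on Hopsworks the trials would run on parallel Spark executors rather than in a loop:

```python
# Random search over a toy hyperparameter space.
import random

random.seed(0)

def train_fn(lr, num_layers):
    # hypothetical stand-in for a real training run returning a metric;
    # peaks near lr=0.01 and 4 layers
    return 1.0 - abs(lr - 0.01) * 10 - abs(num_layers - 4) * 0.01

search_space = {"lr": (0.0001, 0.1), "num_layers": (1, 12)}

def random_search(n_trials):
    best = None
    for _ in range(n_trials):
        # sample one configuration uniformly from the search space
        h = {"lr": random.uniform(*search_space["lr"]),
             "num_layers": random.randint(*search_space["num_layers"])}
        metric = train_fn(**h)
        if best is None or metric > best[0]:
            best = (metric, h)
    return best

metric, hparams = random_search(50)
print(hparams)
```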
ASYNCHRONOUS SEARCH WORKFLOW
[Figure: asynchronous search. Workers pull trials (hyperparameters λ) from a global task queue, train one model per trial, and report back a metric α. The coordinator runs a black-box optimizer solving min_x f(x), x ∈ S; it sends suggested tasks, collects results and heartbeats, issues early stops, and writes checkpoints. A live plot shows trial progress (accuracy vs. epochs for each setting, e.g. lr=0.0021, layers=5; lr=0.01, layers=2; ...)]
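The coordinator/worker protocol above can be sketched sequentially: workers pull trials from a global task queue and send per-epoch metric heartbeats, and the coordinator early-stops trials that lag far behind the best one. The grace period, stopping rule, and toy metric here are illustrative, not Maggy's actual policy:

```python
# Sequential sketch of the asynchronous-search protocol.
from queue import Queue

class Coordinator:
    """Tracks the best metric seen in heartbeats and decides early stops."""
    def __init__(self, grace_epochs=2, factor=0.5):
        self.best = 0.0
        self.grace_epochs = grace_epochs
        self.factor = factor

    def heartbeat(self, epoch, metric):
        self.best = max(self.best, metric)
        # after a grace period, stop trials lagging far behind the best one
        return epoch < self.grace_epochs or metric >= self.factor * self.best

def run_trial(trial, coordinator, epochs=5):
    metric = 0.0
    for epoch in range(epochs):
        metric += trial["lr"]  # toy stand-in for per-epoch improvement
        if not coordinator.heartbeat(epoch, metric):
            return (trial["id"], "early-stopped")
    return (trial["id"], "completed")

tasks = Queue()
for i, lr in enumerate([0.1, 0.01, 0.2]):
    tasks.put({"id": i, "lr": lr})

coordinator = Coordinator()
results = []
while not tasks.empty():
    results.append(run_trial(tasks.get(), coordinator))
print(results)  # the weak lr=0.01 trial is stopped early
```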
FEATURE STORE
[Figure: a raw data matrix (entries x1,1 ... xn,n with labels y1 ... yn) is transformed by the Feature Store into features ϕ(x) used to predict ŷ]

“Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.”6

6. Scaling Machine Learning at Uber with Michelangelo. Uber Engineering Blog. 2018.
WHAT IS A FEATURE?
A feature is a measurable property of some data sample. A feature could be:
◮ An aggregate value (min, max, mean, sum)
◮ A raw value (a pixel, a word from a piece of text)
◮ A value from a database table (the age of a customer)
◮ A derived representation: e.g. an embedding or a cluster
Features are the fuel for AI systems:
[Figure: training pipeline: features x1 ... xn → model θ → prediction ŷ → loss L(y, ŷ) → gradient ∇θL(y, ŷ)]
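The feature kinds listed above can be sketched on a toy customer table; the column names and values are made up for illustration:

```python
# Computing raw and aggregate features from a toy customer table.
customers = [
    {"age": 34, "purchases": [120.0, 80.0, 30.0]},
    {"age": 51, "purchases": [200.0]},
]

def features(row):
    return {
        "age": row["age"],                     # raw value from a table
        "total_spent": sum(row["purchases"]),  # aggregate (sum)
        "max_purchase": max(row["purchases"]), # aggregate (max)
        "avg_purchase": sum(row["purchases"]) / len(row["purchases"]),  # aggregate (mean)
    }

feature_vectors = [features(r) for r in customers]
print(feature_vectors[0]["total_spent"])  # 230.0
```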
FEATURE ENGINEERING IS CRUCIAL FOR MODEL PERFORMANCE
[Figure: the same data plotted with feature x1 only, and again with engineered features x1 and x2]
DISENTANGLE YOUR ML PIPELINES WITH A FEATURE STORE
[Figure: data sources (Dataset 1, Dataset 2, ..., Dataset n) feed the Feature Store, which feeds the Models]
◮ Feature Store: a data management platform for machine learning; the interface between data engineering and data science
◮ Models: trained using sets of features; the features are fetched from the feature store and can overlap between models
The feature store also provides: backfilling, analysis, versioning, and documentation of features.
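A minimal sketch of the reuse this slide describes: features are registered once and different models fetch overlapping subsets. The dict-backed store and function names are illustrative, not the Hopsworks API:

```python
# Toy feature store: named feature columns, fetched as training rows.
feature_store = {
    "average_attendance": [0.6, 0.7, 0.8],
    "average_player_age": [24.0, 27.0, 30.0],
    "home_wins": [1, 0, 1],
}

def get_features(names):
    missing = [n for n in names if n not in feature_store]
    if missing:
        raise KeyError(f"features not registered: {missing}")
    # rows of the training dataset, one column per requested feature
    return list(zip(*(feature_store[n] for n in names)))

model_a = get_features(["average_attendance", "average_player_age"])
model_b = get_features(["average_player_age", "home_wins"])  # overlapping reuse
print(model_a[0])  # (0.6, 24.0)
```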
SUMMARY
◮ Deep Learning is going distributed
◮ Algorithms for DDL are available in several frameworks
◮ Applying DDL in practice brings a lot of operational complexity
◮ Hopsworks is a platform for scale-out deep learning and big data processing
◮ Hopsworks makes DDL simpler by providing simple abstractions for distributed training, parallel experiments and much more
@hopshadoop www.hops.io
@logicalclocks www.logicalclocks.com
We are open source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops

Thanks to the Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan, Rasmus Toivonen and Steffen Grohsschmiedt.
[Figure: feature computation pipeline: raw/structured data → data lake → feature store (curated features) → model]
kim/workshop_cheat.txt
EXERCISE 1 (HELLO HOPSWORKS)
◮ “Experiment” mode
◮ 1 GPU
◮ 4000 MB memory for the driver (appmaster)
◮ 8000 MB memory for the executor
◮ Rest can be default

print("Hello Hopsworks")
EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH GPU)
def executor():
    print("Hello from GPU")

from hops import experiment
experiment.launch(executor)
EXERCISE 3 (LOAD MNIST FROM HOPSFS)
from hops import hdfs
import tensorflow as tf

def create_tf_dataset():
    train_files = [hdfs.project_path() +
                   "TestJob/data/mnist/train/train.tfrecords"]
    dataset = tf.data.TFRecordDataset(train_files)
    def decode(example):
        example = tf.parse_single_example(example, {
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64)})
        image = tf.reshape(
            tf.decode_raw(example['image_raw'], tf.uint8), (28, 28, 1))
        label = tf.one_hot(tf.cast(example['label'], tf.int32), 10)
        return image, label
    return dataset.map(decode).batch(128).repeat()
create_tf_dataset()
EXERCISE 4 (DEFINE CNN MODEL)
from tensorflow import keras

def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Conv2D(filters=32, kernel_size=3, padding='same',
                                  activation='relu', input_shape=(28, 28, 1)))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.MaxPooling2D(pool_size=2))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Conv2D(filters=64, kernel_size=3, padding='same',
                                  activation='relu'))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.MaxPooling2D(pool_size=2))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(10, activation='softmax'))
    return model
create_model().summary()
EXERCISE 5 (DEFINE & RUN THE EXPERIMENT)
from hops import tensorboard
from tensorflow.python.keras.callbacks import TensorBoard

def train_fn():
    dataset = create_tf_dataset()
    model = create_model()
    model.compile(loss=keras.losses.categorical_crossentropy,
                  # the optimizer argument is cut off on the slide; Adam is an
                  # assumption, and 'acc' must be tracked for the return below
                  optimizer=keras.optimizers.Adam(),
                  metrics=['acc'])
    tb_callback = TensorBoard(log_dir=tensorboard.logdir())
    model_ckpt_callback = keras.callbacks.ModelCheckpoint(
        tensorboard.logdir(), monitor='acc')
    history = model.fit(dataset, epochs=50, steps_per_epoch=80,
                        callbacks=[tb_callback])
    return history.history["acc"][-1]
experiment.launch(train_fn)
REFERENCES
◮ Example notebooks: https://github.com/logicalclocks/hops-examples
◮ HopsML8
◮ Hopsworks9
◮ Hopsworks’ feature store10
◮ Maggy: https://github.com/logicalclocks/maggy

8. Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
9. Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
10. Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018.