Distributed Deep Learning Using Hopsworks

Distributed Deep Learning Using Hopsworks (SF Machine Learning) - PowerPoint PPT Presentation



  1. Distributed Deep Learning Using Hopsworks
SF Machine Learning, Mesosphere
Kim Hammar, kim@logicalclocks.com
(Deck sections: Introduction, Hopsworks, Distributed Deep Learning, Parallel Black-Box Optimization, Summary)

  2. Distributed Computing + Deep Learning = ?
[Figure: "Distributed Computing" and "Deep Learning" (a small feed-forward network with inputs x, bias units b, output ŷ) side by side]
Why combine the two?

  3. Distributed Computing + Deep Learning = ?
Why combine the two?
- We like challenging problems

  4. Distributed Computing + Deep Learning = ?
Why combine the two?
- We like challenging problems
- More productive data science
- Unreasonable effectiveness of data [1]
- To achieve state-of-the-art results [2]
[1] Chen Sun et al. "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". CoRR abs/1707.02968 (2017). arXiv: 1707.02968. http://arxiv.org/abs/1707.02968
[2] Jeffrey Dean et al. "Large Scale Distributed Deep Networks". In: Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 1223-1231.

  5. Distributed Deep Learning (DDL): Predictable Scaling [3]
[3] Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.

  6. Distributed Deep Learning (DDL): Predictable Scaling

  7. DDL Is Not a Secret Anymore [4]
[4] Tal Ben-Nun and Torsten Hoefler. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis". CoRR abs/1802.09941 (2018). arXiv: 1802.09941. http://arxiv.org/abs/1802.09941

  8. DDL Is Not a Secret Anymore
Companies using DDL. Frameworks for DDL: TensorflowOnSpark, Distributed TF, CaffeOnSpark.
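For context, "Distributed TF" here is TensorFlow's built-in multi-worker training, where every process must be told about the whole cluster, typically via the TF_CONFIG environment variable. This manual wiring is part of what the managed approach later in the deck hides. A minimal illustration (host names and ports are made up):

    import json, os

    # Every worker process gets the same cluster spec but its own index.
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {
            "worker": ["host1:2222", "host2:2222", "host3:2222", "host4:2222"],
        },
        "task": {"type": "worker", "index": 0},   # this process is worker 0 (0..3)
    })
    # With TF_CONFIG set, tf.distribute.experimental.MultiWorkerMirroredStrategy()
    # discovers its peers from the cluster spec and all-reduces gradients across
    # the four worker processes.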

  9. DDL Requires an Entire Software/Infrastructure Stack
Distributed Systems, A/B Testing, Data Validation, Distributed Training, Model Serving, Data Collection, Hyperparameter Tuning, Monitoring, Hardware Management, Feature Engineering, Pipeline Management
[Figure: distributed training with executors e1 ... e4 exchanging gradients ∇]

  10. Outline
1. Hopsworks: background of the platform
2. Managed distributed deep learning using HopsYARN, HopsML, PySpark, and TensorFlow
3. Black-box optimization using Hopsworks, the Metadata Store, PySpark, and Maggy [5]
[5] Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.


  12. Hopsworks

  13. Hopsworks
HopsFS

  14. Hopsworks
HopsYARN (GPU/CPU as a resource)
HopsFS

  15. Hopsworks
Frameworks (ML/Data)
HopsYARN (GPU/CPU as a resource)
HopsFS

  16. Hopsworks
ML/AI Assets: Feature Store, Pipelines, Experiments, Models
Frameworks (ML/Data)
HopsYARN (GPU/CPU as a resource)
HopsFS

  17. Hopsworks
APIs:
    from hops import featurestore
    from hops import experiment

    featurestore.get_features([
        "average_attendance", "average_player_age"])
    experiment.collective_all_reduce(features, model)
ML/AI Assets: Feature Store, Pipelines, Experiments, Models
Frameworks (ML/Data)
HopsYARN (GPU/CPU as a resource)
HopsFS

  18. Hopsworks
APIs:
    from hops import featurestore
    from hops import experiment

    featurestore.get_features([
        "average_attendance", "average_player_age"])
    experiment.collective_all_reduce(features, model)
ML/AI Assets: Feature Store, Pipelines, Experiments, Models
Frameworks (ML/Data)
HopsYARN (GPU/CPU as a resource)
Distributed Metadata (available from a REST API)
HopsFS
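To make the two API calls on this slide concrete, here is a minimal sketch of how they typically fit together: the Feature Store returns the training data, and a training function is handed to the experiment module, which runs it on several executors with collective all-reduce. The label column "won_match", the tiny Keras model, and the exact collective_all_reduce signature are assumptions for illustration, not a verbatim Hopsworks example.

    from hops import featurestore
    from hops import experiment

    # Driver side: pull training data from the Feature Store (a Spark DataFrame).
    df = featurestore.get_features(
        ["average_attendance", "average_player_age", "won_match"]).toPandas()
    X = df[["average_attendance", "average_player_age"]].values
    y = df["won_match"].values

    def train_fn():
        # Runs on every executor; TensorFlow's collective all-reduce strategy
        # keeps the model replicas in sync by averaging their gradients.
        import tensorflow as tf
        strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
        with strategy.scope():
            model = tf.keras.Sequential([
                tf.keras.layers.Dense(8, activation="relu", input_shape=(2,)),
                tf.keras.layers.Dense(1, activation="sigmoid"),
            ])
            model.compile(optimizer="adam", loss="binary_crossentropy",
                          metrics=["accuracy"])
        history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
        return history.history["accuracy"][-1]   # metric logged by HopsML

    # HopsML asks HopsYARN for executors (and GPUs), runs train_fn on each of
    # them, and records code, logs, and the returned metric in the metadata store.
    experiment.collective_all_reduce(train_fn, name="attendance_model")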

  19. Inner and Outer Loop of Large-Scale Deep Learning
[Figure: the inner loop (distributed deep learning). Workers 1, 2, ..., N each hold a replica of the network, compute gradients ∇1, ∇2, ..., ∇N on the training data, and synchronize.]

  20. Inner and Outer Loop of Large-Scale Deep Learning
[Figure: outer loop and inner loop. A search method proposes hyperparameters h; workers 1, 2, ..., N train with those hyperparameters, computing and synchronizing gradients ∇1, ∇2, ..., ∇N over the data (inner loop), and report a metric τ back to the search method (outer loop).]

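The outer loop on this slide is what Maggy automates on Hopsworks: a search method proposes hyperparameters h, each trial runs the (possibly distributed) inner training loop, and the returned metric τ drives the next proposal. Below is a framework-free sketch of that loop using plain random search; the search space and the train_and_evaluate placeholder are hypothetical, and on Hopsworks Maggy would supply the optimizer and run the trials on Spark executors.

    import math
    import random

    # Hypothetical two-dimensional search space.
    LEARNING_RATE_RANGE = (1e-4, 1e-1)           # sampled log-uniformly
    BATCH_SIZES = [32, 64, 128, 256]

    def propose_hparams():
        # Search method: plain random search in this sketch.
        lo, hi = LEARNING_RATE_RANGE
        return {
            "learning_rate": 10 ** random.uniform(math.log10(lo), math.log10(hi)),
            "batch_size": random.choice(BATCH_SIZES),
        }

    def train_and_evaluate(hparams):
        # Placeholder for the inner loop (distributed training with the proposed
        # hyperparameters); a dummy metric keeps the sketch self-contained.
        return random.random()

    best_tau, best_h = float("-inf"), None
    for trial in range(20):                       # outer loop: 20 trials
        h = propose_hparams()                     # search method proposes h
        tau = train_and_evaluate(h)               # inner loop reports metric tau
        if tau > best_tau:
            best_tau, best_h = tau, h

    print("best hyperparameters:", best_h, "metric:", best_tau)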

  22. Inner Loop: Distributed Deep Learning
[Figure: four executors e1, e2, e3, e4]

  23. Inner Loop: Distributed Deep Learning
[Figure: executors e1, e2, e3, e4, each computing a gradient ∇ and exchanging gradients with the others]

  24. Inner Loop: Distributed Deep Learning
[Figure: executors e1, e2, e3, e4, each reading its own data partition p1, p2, p3, p4, computing a gradient ∇ on that partition, and exchanging gradients with the others]
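Slides 22-24 describe synchronous data-parallel training: every executor holds a model replica, computes a gradient on its own data partition, and the gradients are averaged (an all-reduce) before each identical weight update. A toy, single-process NumPy simulation of that scheme for linear regression; the data, partition count, and learning rate are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))                 # full training set
    y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

    n_workers = 4
    partitions = np.array_split(np.arange(len(X)), n_workers)   # p1 ... p4
    w = np.zeros(3)                                # model replica (same start everywhere)
    lr = 0.1

    def local_gradient(w, idx):
        # Gradient of the MSE loss on one executor's data partition.
        Xp, yp = X[idx], y[idx]
        return 2.0 * Xp.T @ (Xp @ w - yp) / len(idx)

    for step in range(100):
        # Each executor computes a gradient on its own partition ...
        grads = [local_gradient(w, idx) for idx in partitions]
        # ... then an all-reduce averages them, so every replica applies the
        # same update and the replicas stay synchronized.
        avg_grad = np.mean(grads, axis=0)
        w -= lr * avg_grad

    print(w)   # close to the true weights [1.5, -2.0, 0.5]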
