Distributed Deep Learning Using Hopsworks

Distributed Deep Learning Using Hopsworks (SF Machine Learning) - PowerPoint PPT Presentation



  1. Distributed Deep Learning Using Hopsworks
SF Machine Learning, Mesosphere
Kim Hammar, kim@logicalclocks.com
(Deck sections: Introduction, Hopsworks, Distributed Deep Learning, Parallel Black-Box Optimization, Summary)

  2. Distributed Computing + Deep Learning = ?
[Figure: "Distributed Computing" and "Deep Learning" (a small feed-forward network with inputs x, bias units b, output ŷ) side by side]
Why combine the two?

  3. Distributed Computing + Deep Learning = ?
Why combine the two?
- We like challenging problems

  4. Distributed Computing + Deep Learning = ?
Why combine the two?
- We like challenging problems
- More productive data science
- Unreasonable effectiveness of data [1]
- To achieve state-of-the-art results [2]
[1] Chen Sun et al. "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". CoRR abs/1707.02968 (2017). arXiv: 1707.02968. http://arxiv.org/abs/1707.02968
[2] Jeffrey Dean et al. "Large Scale Distributed Deep Networks". In: Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 1223-1231.

  5. Distributed Deep Learning (DDL): Predictable Scaling [3]
[3] Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.

  6. Distributed Deep Learning (DDL): Predictable Scaling

  7. DDL Is Not a Secret Anymore [4]
[4] Tal Ben-Nun and Torsten Hoefler. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis". CoRR abs/1802.09941 (2018). arXiv: 1802.09941. http://arxiv.org/abs/1802.09941

  8. DDL Is Not a Secret Anymore
Companies using DDL. Frameworks for DDL: TensorflowOnSpark, Distributed TF, CaffeOnSpark.
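For context, "Distributed TF" here is TensorFlow's built-in multi-worker training, where every process must be told about the whole cluster, typically via the TF_CONFIG environment variable. This manual wiring is part of what the managed approach later in the deck hides. A minimal illustration (host names and ports are made up):

    import json, os

    # Every worker process gets the same cluster spec but its own index.
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {
            "worker": ["host1:2222", "host2:2222", "host3:2222", "host4:2222"],
        },
        "task": {"type": "worker", "index": 0},   # this process is worker 0 (0..3)
    })
    # With TF_CONFIG set, tf.distribute.experimental.MultiWorkerMirroredStrategy()
    # discovers its peers from the cluster spec and all-reduces gradients across
    # the four worker processes.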

  9. DDL Requires an Entire Software/Infrastructure Stack
Distributed Systems, A/B Testing, Data Validation, Distributed Training, Model Serving, Data Collection, Hyperparameter Tuning, Monitoring, Hardware Management, Feature Engineering, Pipeline Management
[Figure: distributed training with executors e1 ... e4 exchanging gradients ∇]

  10. Outline
1. Hopsworks: background of the platform
2. Managed distributed deep learning using HopsYARN, HopsML, PySpark, and TensorFlow
3. Black-box optimization using Hopsworks, the Metadata Store, PySpark, and Maggy [5]
[5] Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.


  12. Hopsworks

  13. Hopsworks
HopsFS

  14. Hopsworks
HopsYARN (GPU/CPU as a resource)
HopsFS

  15. Hopsworks
Frameworks (ML/Data)
HopsYARN (GPU/CPU as a resource)
HopsFS

  16. Hopsworks
ML/AI Assets: Feature Store, Pipelines, Experiments, Models
Frameworks (ML/Data)
HopsYARN (GPU/CPU as a resource)
HopsFS

  17. Hopsworks
APIs:
    from hops import featurestore
    from hops import experiment

    featurestore.get_features([
        "average_attendance", "average_player_age"])
    experiment.collective_all_reduce(features, model)
ML/AI Assets: Feature Store, Pipelines, Experiments, Models
Frameworks (ML/Data)
HopsYARN (GPU/CPU as a resource)
HopsFS

  18. Hopsworks
APIs:
    from hops import featurestore
    from hops import experiment

    featurestore.get_features([
        "average_attendance", "average_player_age"])
    experiment.collective_all_reduce(features, model)
ML/AI Assets: Feature Store, Pipelines, Experiments, Models
Frameworks (ML/Data)
HopsYARN (GPU/CPU as a resource)
Distributed Metadata (available from a REST API)
HopsFS
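To make the two API calls on this slide concrete, here is a minimal sketch of how they typically fit together: the Feature Store returns the training data, and a training function is handed to the experiment module, which runs it on several executors with collective all-reduce. The label column "won_match", the tiny Keras model, and the exact collective_all_reduce signature are assumptions for illustration, not a verbatim Hopsworks example.

    from hops import featurestore
    from hops import experiment

    # Driver side: pull training data from the Feature Store (a Spark DataFrame).
    df = featurestore.get_features(
        ["average_attendance", "average_player_age", "won_match"]).toPandas()
    X = df[["average_attendance", "average_player_age"]].values
    y = df["won_match"].values

    def train_fn():
        # Runs on every executor; TensorFlow's collective all-reduce strategy
        # keeps the model replicas in sync by averaging their gradients.
        import tensorflow as tf
        strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
        with strategy.scope():
            model = tf.keras.Sequential([
                tf.keras.layers.Dense(8, activation="relu", input_shape=(2,)),
                tf.keras.layers.Dense(1, activation="sigmoid"),
            ])
            model.compile(optimizer="adam", loss="binary_crossentropy",
                          metrics=["accuracy"])
        history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
        return history.history["accuracy"][-1]   # metric logged by HopsML

    # HopsML asks HopsYARN for executors (and GPUs), runs train_fn on each of
    # them, and records code, logs, and the returned metric in the metadata store.
    experiment.collective_all_reduce(train_fn, name="attendance_model")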

  19. Inner and Outer Loop of Large-Scale Deep Learning
[Figure: the inner loop (distributed deep learning). Workers 1, 2, ..., N each hold a replica of the network, compute gradients ∇1, ∇2, ..., ∇N on the training data, and synchronize.]

  20. Inner and Outer Loop of Large-Scale Deep Learning
[Figure: outer loop and inner loop. A search method proposes hyperparameters h; workers 1, 2, ..., N train with those hyperparameters, computing and synchronizing gradients ∇1, ∇2, ..., ∇N over the data (inner loop), and report a metric τ back to the search method (outer loop).]

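The outer loop on this slide is what Maggy automates on Hopsworks: a search method proposes hyperparameters h, each trial runs the (possibly distributed) inner training loop, and the returned metric τ drives the next proposal. Below is a framework-free sketch of that loop using plain random search; the search space and the train_and_evaluate placeholder are hypothetical, and on Hopsworks Maggy would supply the optimizer and run the trials on Spark executors.

    import math
    import random

    # Hypothetical two-dimensional search space.
    LEARNING_RATE_RANGE = (1e-4, 1e-1)           # sampled log-uniformly
    BATCH_SIZES = [32, 64, 128, 256]

    def propose_hparams():
        # Search method: plain random search in this sketch.
        lo, hi = LEARNING_RATE_RANGE
        return {
            "learning_rate": 10 ** random.uniform(math.log10(lo), math.log10(hi)),
            "batch_size": random.choice(BATCH_SIZES),
        }

    def train_and_evaluate(hparams):
        # Placeholder for the inner loop (distributed training with the proposed
        # hyperparameters); a dummy metric keeps the sketch self-contained.
        return random.random()

    best_tau, best_h = float("-inf"), None
    for trial in range(20):                       # outer loop: 20 trials
        h = propose_hparams()                     # search method proposes h
        tau = train_and_evaluate(h)               # inner loop reports metric tau
        if tau > best_tau:
            best_tau, best_h = tau, h

    print("best hyperparameters:", best_h, "metric:", best_tau)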

  22. Inner Loop: Distributed Deep Learning
[Figure: four executors e1, e2, e3, e4]

  23. Inner Loop: Distributed Deep Learning
[Figure: executors e1, e2, e3, e4, each computing a gradient ∇ and exchanging gradients with the others]

  24. Inner Loop: Distributed Deep Learning
[Figure: executors e1, e2, e3, e4, each reading its own data partition p1, p2, p3, p4, computing a gradient ∇ on that partition, and exchanging gradients with the others]
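Slides 22-24 describe synchronous data-parallel training: every executor holds a model replica, computes a gradient on its own data partition, and the gradients are averaged (an all-reduce) before each identical weight update. A toy, single-process NumPy simulation of that scheme for linear regression; the data, partition count, and learning rate are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))                 # full training set
    y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

    n_workers = 4
    partitions = np.array_split(np.arange(len(X)), n_workers)   # p1 ... p4
    w = np.zeros(3)                                # model replica (same start everywhere)
    lr = 0.1

    def local_gradient(w, idx):
        # Gradient of the MSE loss on one executor's data partition.
        Xp, yp = X[idx], y[idx]
        return 2.0 * Xp.T @ (Xp @ w - yp) / len(idx)

    for step in range(100):
        # Each executor computes a gradient on its own partition ...
        grads = [local_gradient(w, idx) for idx in partitions]
        # ... then an all-reduce averages them, so every replica applies the
        # same update and the replicas stay synchronized.
        avg_grad = np.mean(grads, axis=0)
        w -= lr * avg_grad

    print(w)   # close to the true weights [1.5, -2.0, 0.5]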
