Intro | Hopsworks | Distributed DL | Black-Box Optimization | Feature Store | Summary | Demo/Workshop

Distributed Deep Learning Using Hopsworks
CGI Trainee Program Workshop
Kim Hammar, kim@logicalclocks.com
Before we start:
1. Register for an account at: www.hops.site
2. Follow the instructions at: http://bit.ly/2EnZQgW
Distributed Computing + Deep Learning = ?

[Figure: a distributed-computing cluster alongside a feed-forward neural network]

Why combine the two?
- We like challenging problems
- More productive data science
- Unreasonable effectiveness of data¹
- To achieve state-of-the-art results²

¹ Chen Sun et al. "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
² Jeffrey Dean et al. "Large Scale Distributed Deep Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
Distributed Deep Learning (DDL): Predictable Scaling³

³ Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
DDL Is Not a Secret Anymore⁴

Companies using DDL; frameworks for DDL: TensorflowOnSpark, Distributed TF, CaffeOnSpark.

⁴ Tal Ben-Nun and Torsten Hoefler. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis". In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/1802.09941.
DDL Requires an Entire Software/Infrastructure Stack

[Figure: the ML infrastructure components surrounding distributed training: Data Collection, Data Validation, Feature Engineering, Hyperparameter Tuning, Distributed Training, A/B Testing, Model Serving, Monitoring, Hardware Management, Pipeline Management, Distributed Systems]
Outline

1. Hopsworks: background of the platform
2. Managed Distributed Deep Learning using HopsYARN, HopsML, PySpark, and Tensorflow
3. Black-Box Optimization (Hyperparameter Tuning) using Hopsworks, the Metadata Store, PySpark, and Maggy⁵
4. Feature Store: data management for machine learning
5. Coffee Break
6. Demo: an end-to-end ML pipeline
7. Hands-on Workshop: try out Hopsworks on our cluster in Luleå

⁵ Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.
Hopsworks

[Figure: the Hopsworks stack, built up layer by layer]
- HopsFS
- HopsYARN (GPU/CPU as a resource)
- Distributed Metadata (available from a REST API)
- Frameworks (ML/Data)
- ML/AI Assets: Feature Store, Pipelines, Experiments, Models
- APIs:

    from hops import featurestore
    from hops import experiment

    features = featurestore.get_features(
        ["average_attendance", "average_player_age"])
    experiment.collective_all_reduce(features, model)
Inner and Outer Loop of Large-Scale Deep Learning

[Figure: outer loop, a search method proposes hyperparameters h and receives back a metric τ; inner loop, workers 1…N each train a replica of the model on a partition of the data and synchronize their gradients ∇₁ … ∇_N]
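The outer loop in the figure can be sketched in a few lines: a search method proposes hyperparameters h, the inner loop trains a model and reports a metric τ, and the search keeps the best configuration. Below is a minimal random-search sketch in plain Python; the `train` function is a hypothetical stand-in for the inner training loop, not the Hopsworks or Maggy API.

```python
import random

# Hypothetical stand-in for the inner loop: "training" returns a metric
# tau for a given hyperparameter (here a single learning rate); the
# metric peaks at lr = 0.1 in this toy example.
def train(lr):
    return -(lr - 0.1) ** 2

def random_search(trials=50, seed=0):
    rng = random.Random(seed)
    best_h, best_tau = None, float("-inf")
    for _ in range(trials):
        h = rng.uniform(0.001, 1.0)    # search method proposes hparams h
        tau = train(h)                 # inner loop reports metric tau
        if tau > best_tau:
            best_h, best_tau = h, tau  # keep the best configuration
    return best_h, best_tau

best_h, best_tau = random_search()
```

Smarter search methods (e.g. Bayesian optimization, as used for black-box optimization later in the deck) replace only the proposal step; the overall outer/inner structure stays the same.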
Inner Loop: Distributed Deep Learning

[Figure: one training step, features x₁ … x_n → model θ → prediction ŷ → loss L(y, ŷ) → gradient ∇_θ L(y, ŷ)]
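The single training step in the figure, features → model → prediction → loss → gradient, can be written out concretely. The sketch below uses a linear model as a stand-in for the network, with a squared loss; one gradient step reduces the loss.

```python
import numpy as np

# One inner-loop step: features -> model theta -> prediction y_hat
# -> loss L(y, y_hat) -> gradient w.r.t. theta. A linear model is a
# toy stand-in for the neural network in the figure.
def loss_and_grad(theta, x, y):
    y_hat = theta @ x              # prediction
    loss = (y - y_hat) ** 2        # squared loss L(y, y_hat)
    grad = -2.0 * (y - y_hat) * x  # gradient of the loss w.r.t. theta
    return loss, grad

theta = np.zeros(3)
x, y = np.array([1.0, 2.0, 3.0]), 1.0
loss0, grad = loss_and_grad(theta, x, y)
theta = theta - 0.01 * grad        # one SGD update
loss1, _ = loss_and_grad(theta, x, y)
# loss1 < loss0: the step moved theta downhill
```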
Inner Loop: Distributed Deep Learning

[Figure: data-parallel training, four data partitions p₁ … p₄, one worker per partition, exchanging gradients ∇ over links e₁ … e₄]
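The data-parallel scheme in the figure can be sketched as follows: each worker computes a gradient on its own data partition, the synchronization step averages the gradients (an all-reduce), and every worker applies the same update. This is a toy simulation in NumPy, not the Hopsworks `collective_all_reduce` API; with equal-sized partitions the averaged gradient equals the full-batch gradient.

```python
import numpy as np

# Mean-squared-error gradient for a linear model on one data partition.
def partition_grad(theta, X, y):
    residual = X @ theta - y
    return 2.0 * X.T @ residual / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
theta = np.zeros(3)

# Split the data into 4 equal partitions, one per simulated worker.
parts = [(X[i::4], y[i::4]) for i in range(4)]
grads = [partition_grad(theta, Xp, yp) for Xp, yp in parts]
avg_grad = np.mean(grads, axis=0)      # synchronization (all-reduce)

full_grad = partition_grad(theta, X, y)  # equals avg_grad here
theta = theta - 0.1 * avg_grad           # shared model update
```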