  1. Device Placement Optimization with Reinforcement Learning
     Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

  2. What is device placement
     ● Consider a TensorFlow computational graph G, which consists of M operations {o_1, o_2, ..., o_M}, and a list of D available devices.
     ● A placement P = {p_1, p_2, ..., p_M} is an assignment of an operation o_i to a device p_i.
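The definitions on this slide can be made concrete with a minimal sketch; the op and device names below are made up for illustration and are not from the paper.

```python
# M operations of a hypothetical computational graph, and D available devices.
ops = ["embed", "lstm_1", "lstm_2", "attention", "softmax"]   # M = 5
devices = ["/cpu:0", "/gpu:0", "/gpu:1"]                      # D = 3

# A placement P assigns each operation o_i to a device p_i.
placement = {
    "embed": "/cpu:0",
    "lstm_1": "/gpu:0",
    "lstm_2": "/gpu:1",
    "attention": "/gpu:0",
    "softmax": "/gpu:1",
}

assert set(placement) == set(ops)                 # every op is placed
assert set(placement.values()) <= set(devices)    # only on available devices
```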

  3. Why device placement
     ● Trend toward many-device training, bigger models, larger batch sizes
     ● Growth in size and computational requirements of training and inference

  4. Typical approaches
     ● Use a heterogeneous distributed environment with a mixture of many CPUs and GPUs
     ● Often based on greedy heuristics
     ● Require deep understanding of devices: bandwidth, latency behavior
     ● Are not flexible enough and do not generalize well

  5. ML for device placement
     ● ML is repeatedly replacing rule-based heuristics
     ● RL can be applied to device placement
       – Effective search across large state and action spaces to find optimal solutions
       – Automatic learning from the underlying environment, based only on a reward function

  6. RL-based device placement
     [Diagram: the neural model to be placed and the available devices (CPU, GPU) are the input; an RL policy outputs an assignment of the model's ops to devices; the resulting runtime is evaluated and fed back to the policy.]
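The feedback loop on this slide can be sketched as follows. This is a toy illustration, not the paper's system: the policy here is just random search that remembers its best placement, and `measure_runtime` is a hypothetical stand-in for actually executing the graph (it charges one time unit per op on the busiest device and ignores communication).

```python
import random

random.seed(0)

class RandomPolicy:
    """Placeholder for the neural policy of later slides: samples ops onto
    devices uniformly at random and remembers the best placement seen."""
    def __init__(self):
        self.best = (float("inf"), None)   # (runtime, placement)

    def sample_placement(self, ops, devices):
        return {op: random.choice(devices) for op in ops}

    def update(self, placement, reward):
        runtime = -reward                  # reward is negative runtime
        if runtime < self.best[0]:
            self.best = (runtime, placement)

def measure_runtime(placement, devices):
    # Stand-in for running the real graph: runtime is the load on the
    # busiest device (perfectly parallel ops, no communication cost).
    loads = {d: 0 for d in devices}
    for d in placement.values():
        loads[d] += 1
    return max(loads.values())

ops = [f"op{i}" for i in range(8)]
devices = ["/gpu:0", "/gpu:1"]
policy = RandomPolicy()
for _ in range(200):
    p = policy.sample_placement(ops, devices)
    policy.update(p, reward=-measure_runtime(p, devices))
```

Under this toy cost model the best achievable runtime is 4 (the 8 ops split evenly across the 2 devices), which random search finds quickly; the paper replaces the random sampler with a learned sequence-to-sequence policy.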

  7. Problem formulation
     Minimize the expected runtime of the placements sampled from the policy:
       J(θ) = E_{P ∼ π(P|G; θ)}[R(P)]
     ● J(θ): expected runtime
     ● θ: trainable parameters of the policy
     ● R(P): runtime of placement P
     ● π(P|G; θ): policy
     ● P: output placements

  8. Training with REINFORCE
     ● Learn the network parameters using the Adam optimizer, based on policy gradients computed via the REINFORCE equation:
       ∇_θ J(θ) = E_{P ∼ π(P|G; θ)}[R(P) · ∇_θ log p(P|G; θ)]
     ● Use K placement samples to estimate the policy gradient, and a baseline term B to reduce variance:
       ∇_θ J(θ) ≈ (1/K) Σ_{i=1..K} (R(P_i) − B) · ∇_θ log p(P_i|G; θ)
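A minimal sketch of the REINFORCE update with a baseline, under simplifying assumptions that differ from the paper: the policy here is an independent softmax over devices per op (the paper uses a sequence-to-sequence network), `toy_runtime` is a made-up stand-in for executing the real graph, and plain gradient ascent replaces Adam.

```python
import numpy as np

rng = np.random.default_rng(0)

M, D, K = 6, 2, 10          # ops, devices, placement samples per update
theta = np.zeros((M, D))    # per-op logits over devices (trainable parameters)

def toy_runtime(placement):
    # Hypothetical runtime model: load on the busiest device, so balanced
    # placements run faster. The paper measures real execution time instead.
    return float(np.bincount(placement, minlength=D).max())

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

lr = 0.1
for step in range(200):
    probs = softmax(theta)                       # π(p_i | θ) for each op i
    # Sample K placements and record rewards (negative runtime).
    placements = np.array([[rng.choice(D, p=probs[i]) for i in range(M)]
                           for _ in range(K)])
    rewards = np.array([-toy_runtime(p) for p in placements])
    baseline = rewards.mean()                    # baseline B reduces variance
    # REINFORCE estimate: average of (R - B) * ∇_θ log p over the K samples.
    grad = np.zeros_like(theta)
    for p, r in zip(placements, rewards):
        one_hot = np.eye(D)[p]                   # ∇_θ log p = one_hot - probs
        grad += (r - baseline) * (one_hot - probs)
    theta += lr * grad / K                       # ascend the expected reward
```

The `one_hot - probs` term is the exact gradient of the log-probability of a categorical sample with respect to its logits, so each update nudges the policy toward device choices that appeared in faster-than-average placements.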

  9. Model architecture

  10. Challenges
     ● Vanishing and exploding gradient issues
     ● Large memory footprints
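One standard mitigation for exploding gradients (a common remedy, not necessarily the one used in this work) is clipping by global norm, sketched here with NumPy:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most
    max_norm, leaving the gradient direction unchanged."""
    global_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))  # no-op if already small
    return [g * scale for g in grads], global_norm
```

For example, gradients [3.0] and [4.0] have global norm 5; clipping with `max_norm=1.0` rescales both by 0.2 while a `max_norm` above 5 leaves them untouched.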

  11. Distributed training

  12. Experiments
     ● Recurrent Neural Language Model (RNNLM)
     ● Neural Machine Translation with attention mechanism (NMT)
     ● Inception-V3

  13. Learned placement on NMT

  14. NMT end-to-end runtime

  15. Learned placement on Inception-V3

  16. Inception-V3 end-to-end runtime

  17. Profiling on NMT

  18. Profiling on Inception-V3

  19. Profiling on Inception-V3

  20. Running times (in seconds)

  21. Summary
     ● Propose an RL model to optimize device placement for neural networks
     ● Use policy gradients to learn the parameters
     ● The policy finds non-trivial assignments of operations to devices that outperform heuristic approaches
     ● Profiling of the results shows the policy learns implicit trade-offs between computation and communication in hardware
