
Evolution Strategies: Distributed Deep Reinforcement Learning


  1. Evolution Strategies: Distributed deep reinforcement learning (blog.otoro.net). Steven Schmatz, November 21, 2017, @stevenschmatz

  2. Deep Reinforcement Learning: Evolution Strategies.

  3. Agenda: 1. Why is deep reinforcement learning hard? 2. How do evolution strategies (ES) help? 3. Advice on applying ES to real-world problems.

  4. RL (reinforcement learning) in a nutshell.

  5. Deep RL in a nutshell.

  6. Deep CNNs are useful.

  7. Assumptions of supervised learning: a stationary distribution, independence of examples, and a clear input-output relationship.

  8. RL violates these assumptions. 😮 Stationary distribution, independence of examples, clear input-output relationship.

  9. RL violates these assumptions. 😮 Stationary distribution: the training data changes as you act differently.

  10. RL violates these assumptions. 😮 Independence of examples: adjacent game frames are usually very similar.

  11. RL violates these assumptions. 😮 Clear input-output relationship: there can be a large delay between action and reward.

  12. Deep Q-Learning: model and training objective.
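The equations on this slide were images and did not survive extraction; as a reference point, the standard deep Q-learning model and objective (an assumption about what the slide showed, not taken from it) are:

    % model: a network Q(s, a; \theta) approximates the optimal action-value function
    Q(s, a; \theta) \approx Q^*(s, a)
    % training objective: squared TD error against a bootstrapped target,
    % with \theta^- the parameters of a periodically updated target network
    L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \big)^2 \right]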

  13. Policy gradients: our objective and our weight update.
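The slide's objective and update equations were also images; the standard REINFORCE-style formulation (assumed here as the reference version) is:

    % objective: expected return of trajectories sampled from the policy
    J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right]
    % weight update: step along the score-function gradient estimate
    \theta \leftarrow \theta + \alpha \, \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, R(\tau) \right]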

  14. Policy gradients: What if our reward function is highly nonlinear? What if our reward is received much later? What if our policy is non-differentiable? How far should we step?

  15. Local optima.

  16. Local optima.

  17. Black-box optimization.

  18. ES to the rescue! At each iteration: 1. Generate candidate solutions from old candidates by adding noise. 2. Evaluate a fitness function for each candidate. 3. Aggregate the results and discard bad candidates.
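A minimal sketch of this loop (the toy fitness function, population size, and noise scale are illustrative assumptions, not values from the talk):

    import numpy as np

    def fitness(x):
        # toy objective: maximize the negative squared distance from the origin
        return -np.sum(x ** 2)

    def es_loop(dim=10, pop_size=50, sigma=0.1, elite_frac=0.2, iters=100):
        candidates = np.random.randn(pop_size, dim)
        n_elite = int(pop_size * elite_frac)
        for _ in range(iters):
            # 1. generate candidate solutions from old candidates by adding noise
            candidates = candidates + sigma * np.random.randn(pop_size, dim)
            # 2. evaluate a fitness function for each candidate
            scores = np.array([fitness(c) for c in candidates])
            # 3. aggregate the results and discard bad candidates
            elite = candidates[np.argsort(scores)[-n_elite:]]
            candidates = elite[np.random.randint(n_elite, size=pop_size)]
        scores = np.array([fitness(c) for c in candidates])
        return candidates[np.argmax(scores)]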

  19. Simple ES. Basic idea: select the single best previous solution and add Gaussian noise (keeping the standard deviation fixed).
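A sketch of this variant under the same illustrative assumptions (generic fitness function, arbitrary population size and sigma):

    import numpy as np

    def simple_es(fitness, dim, pop_size=50, sigma=0.1, iters=200):
        best = np.zeros(dim)                      # starting point
        for _ in range(iters):
            # sample the next generation around the single best solution,
            # with a fixed standard deviation
            population = best + sigma * np.random.randn(pop_size, dim)
            scores = np.array([fitness(p) for p in population])
            best = population[np.argmax(scores)]  # only the best survives
        return best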

  20. Genetic ES. Basic idea: only keep the top-performing 10% of solutions. Randomly select two of them and recombine by assigning each parameter value from either parent at random (and add fixed Gaussian noise). Example: combining (1, 2, 3) and (4, 5, 6) can give (1, 5, 6) or (4, 2, 3).
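A sketch of one generation of this simple genetic algorithm (the elite fraction and fixed noise follow the slide; everything else is an illustrative assumption):

    import numpy as np

    def genetic_es_step(population, fitness, elite_frac=0.10, sigma=0.1):
        scores = np.array([fitness(p) for p in population])
        n_elite = max(2, int(len(population) * elite_frac))
        elites = population[np.argsort(scores)[-n_elite:]]       # keep the top 10%
        dim = population.shape[1]
        children = []
        for _ in range(len(population)):
            a, b = elites[np.random.choice(n_elite, 2, replace=False)]
            mask = np.random.rand(dim) < 0.5                     # per-parameter crossover
            child = np.where(mask, a, b)                         # e.g. (1,2,3) x (4,5,6) -> (1,5,6)
            children.append(child + sigma * np.random.randn(dim))  # fixed Gaussian noise
        return np.array(children)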

  21. CMA–ES. Basic idea: select the best 25% of the population and calculate the covariance matrix of those candidates (it represents a promising area to search for new candidates). Generate new candidates using the per-parameter means and variances.

  22. CMA–ES (continued). 😮 Problem: the covariance matrix has an entry for every pair of parameters, so the computation grows roughly quadratically with the number of parameters and becomes impractical for neural networks with millions of weights.
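A sketch of the simplified CMA-ES-style step the slide describes (full CMA-ES also adapts the step size and uses evolution paths; this version only refits a Gaussian to the elite set, and the small ridge term is an assumption added to keep the covariance well conditioned):

    import numpy as np

    def cma_like_step(population, fitness, elite_frac=0.25):
        scores = np.array([fitness(p) for p in population])
        n_elite = max(2, int(len(population) * elite_frac))
        elites = population[np.argsort(scores)[-n_elite:]]    # best 25%
        mean = elites.mean(axis=0)
        cov = np.cov(elites, rowvar=False)                    # promising region to search
        cov += 1e-6 * np.eye(population.shape[1])             # keep it positive definite
        return np.random.multivariate_normal(mean, cov, size=len(population))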

  23. Natural ES. Basic idea: treat the problem a bit differently, optimizing the expected fitness of a search distribution over the parameters, then use the resulting gradient with your favorite SGD optimizer.
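The slide's equations were images; the standard natural-evolution-strategies formulation (assumed here) expresses the idea as a score-function gradient:

    % maximize the expected fitness under a search distribution p_\psi over parameters
    J(\psi) = \mathbb{E}_{\theta \sim p_\psi}\left[ F(\theta) \right]
    % its gradient needs only fitness evaluations, no backprop through F
    \nabla_\psi J(\psi) = \mathbb{E}_{\theta \sim p_\psi}\left[ F(\theta)\, \nabla_\psi \log p_\psi(\theta) \right]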

  24. OpenAI ES. Basic idea: similar to Natural ES, but with σ held constant.

  25. OpenAI ES. Basic idea: similar to Natural ES, but with σ held constant. Note: to parallelize, we only need to know the (random seed, fitness) pairs!
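A sketch of one OpenAI-style ES update (following Salimans et al. 2017 as summarized on the slide; the mean/std return normalization is a simplification of the paper's rank-based fitness shaping, and all hyperparameters are illustrative):

    import numpy as np

    def openai_es_step(theta, fitness, pop_size=100, sigma=0.1, lr=0.01):
        noise = np.random.randn(pop_size, theta.shape[0])       # eps_i ~ N(0, I)
        returns = np.array([fitness(theta + sigma * eps) for eps in noise])
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        grad = noise.T @ returns / (pop_size * sigma)           # estimate of the ES gradient
        return theta + lr * grad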

  26. Parallelization. Initialize: create a shared list of random seeds, one per worker, and start every worker from the same parameters. (Figure: a ring of seven worker nodes.)

  27. Parallelization (continued). Repeat: 1. Sample: each worker draws its own noise vector from its seed.

  28. Parallelization (continued). 2. Evaluate: each worker computes the fitness of the parameters perturbed by its noise.

  29. Parallelization (continued). 3. Communicate the resulting scalar fitness to all nodes.

  30. Parallelization (continued). 4. Reconstruct the noise of all other nodes using the known random seeds.

  31. Parallelization (continued). 5. Update: every worker applies the same fitness-weighted parameter update.
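A runnable sketch of this seed trick (the all-gather of scalar returns is simulated in-process here; a real system would use MPI or similar, and the helper names and hyperparameters are assumptions, not from the talk):

    import numpy as np

    def perturbation(seed, dim):
        # deterministic noise: any worker can regenerate it from the seed alone
        return np.random.RandomState(seed).randn(dim)

    def parallel_es_iteration(theta, fitness, seeds, sigma=0.1, lr=0.01):
        dim = theta.shape[0]
        # steps 1-2: each worker i samples eps_i from its own seed and evaluates F_i;
        # step 3: the scalar returns F_i are broadcast to every node (simulated here)
        returns = np.array([fitness(theta + sigma * perturbation(s, dim)) for s in seeds])
        # step 4: reconstruct every other worker's noise from the shared seed list
        noise = np.stack([perturbation(s, dim) for s in seeds])
        # step 5: identical fitness-weighted update on every worker
        return theta + lr * noise.T @ returns / (len(seeds) * sigma)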

  32. Efficiency. • The only information communicated at each iteration is a single scalar per machine. • Most distributed update mechanisms (A3C, Gorila) must communicate entire parameter lists. • Result: linear horizontal parallelization.

  34. Benefits: non-differentiable policies (hard attention!); no backprop (a 3x computation-time decrease, and much cheaper than GPUs!); sparse rewards (learn long-term policies in hard environments!).

  35. Drawbacks: not useful for supervised learning (where good, reliable gradients are available); data inefficient (about 3–10x less data efficient).

  36. Bottom Line: if you have a large number of CPU cores (>100), or if you have sparse rewards, evolution strategies may be a good bet.

  37. Evolution Strategies. Steven Schmatz, November 21, 2017, @stevenschmatz.

  38. Appendix.
