Evolution Strategies
Distributed deep reinforcement learning
Steven Schmatz
@stevenschmatz Evolution Strategies November 21, 2017
(blog.otoro.net)
(reinforcement learning)
Reinforcement learning violates three assumptions that supervised learning relies on:
- Stationary distribution
- Independence
- Clear input-output relationship
Stationary distribution: the training data changes as you act differently.

Independence: adjacent game frames are usually very similar.

Clear input-output relationship: there can be a large delay between action and reward.
Model: a policy that maps observations to actions, parameterized by weights θ.

Training objective (our objective): maximize the expected total reward, J(θ) = E[R].

Weight update (our weight update): gradient ascent on J, θ ← θ + α ∇θ J(θ).
- What if our reward function is highly nonlinear?
- How far should we step?
- What if our reward is received much later?
- What if our policy is non-differentiable?
At each iteration:
1. Generate candidate solutions from a probability distribution.
2. Evaluate a fitness function for each candidate.
3. Aggregate the results and discard bad candidates.
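The loop above can be sketched in a few lines of NumPy. This is only an illustration: the fitness function, population size, and elite fraction are made-up placeholders, not anything specific from the talk.

```python
import numpy as np

def evolution_strategy(fitness, dim, pop_size=50, elite_frac=0.2, iters=100, seed=0):
    """Generic ES loop: sample candidates, score them, keep the best."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)          # center of the current search distribution
    sigma = 0.5                   # fixed standard deviation
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        # 1. Generate candidate solutions from the search distribution.
        candidates = mean + sigma * rng.standard_normal((pop_size, dim))
        # 2. Evaluate a fitness function for each candidate.
        scores = np.array([fitness(c) for c in candidates])
        # 3. Aggregate the results and discard bad candidates.
        elite = candidates[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
    return mean

# Toy fitness: negative squared distance from (3, -2); the maximum is at (3, -2).
best = evolution_strategy(lambda x: -np.sum((x - np.array([3.0, -2.0]))**2), dim=2)
```

Note that the loop never computes a gradient of the fitness function; it only ranks candidates, which is why the policy need not be differentiable.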
Basic idea:
Select the single best previous solution, and add Gaussian noise. (Keep standard deviation fixed.)
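A minimal sketch of this simplest variant, greedy hill-climbing with fixed Gaussian noise (toy fitness and all parameter values are illustrative assumptions):

```python
import numpy as np

def simple_es(fitness, x0, pop_size=50, sigma=0.3, iters=200, seed=0):
    """Simplest ES: perturb the single best solution with fixed Gaussian noise."""
    rng = np.random.default_rng(seed)
    best = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Candidates = best-so-far plus Gaussian noise (standard deviation fixed).
        pop = best + sigma * rng.standard_normal((pop_size, best.size))
        scores = [fitness(p) for p in pop]
        champ = pop[int(np.argmax(scores))]
        # Greedily keep whichever single solution scores highest so far.
        if fitness(champ) > fitness(best):
            best = champ
    return best

# Toy fitness with its maximum at the origin.
peak = simple_es(lambda x: -np.sum(x**2), x0=[5.0, 5.0])
```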
Basic idea:
Only keep the top-performing 10% of solutions. Randomly select two of them and recombine: assign each parameter value at random from either parent, and add fixed Gaussian noise.
Example:
Combine (1, 2, 3) and (4, 5, 6): each child coordinate comes from one parent or the other, e.g. (1, 5, 3) or (4, 2, 6), plus noise.
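The recombination step can be sketched as follows (the noise scale is an illustrative assumption; the talk only says it is fixed):

```python
import random

def crossover(parent_a, parent_b, sigma=0.1, rng=random):
    """Recombine two parents: each gene is taken from either parent at
    random, then fixed Gaussian noise (mutation) is added."""
    child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
    return [g + rng.gauss(0.0, sigma) for g in child]

random.seed(0)
child = crossover([1, 2, 3], [4, 5, 6])
# Each gene of `child` lies near 1 or 4, near 2 or 5, and near 3 or 6.
```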
Basic idea:
Select the best 25% of the population. Calculate a covariance matrix of these best 25%. (represents a promising area to search for new candidates) Generate new candidates using the per- parameter means and variances.
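One generation of this idea can be sketched with NumPy. This is a simplified cross-entropy-style illustration of fitting a Gaussian to the elites, not the full CMA-ES update (which also adapts step sizes and accumulates evolution paths); the fitness function and sizes are made up.

```python
import numpy as np

def next_generation(population, fitness, pop_size, elite_frac=0.25, rng=None):
    """Fit a Gaussian to the best 25% and sample new candidates from it."""
    if rng is None:
        rng = np.random.default_rng(0)
    scores = np.array([fitness(x) for x in population])
    n_elite = max(2, int(len(population) * elite_frac))
    elite = population[np.argsort(scores)[-n_elite:]]
    mean = elite.mean(axis=0)                        # per-parameter means
    cov = np.cov(elite, rowvar=False)                # covariance of the elites
    cov += 1e-6 * np.eye(len(mean))                  # ridge for numerical safety
    # New candidates are drawn from the promising region the elites define.
    return rng.multivariate_normal(mean, cov, size=pop_size)

rng = np.random.default_rng(1)
pop = rng.standard_normal((40, 2)) * 3.0             # random initial population
for _ in range(30):
    pop = next_generation(pop, lambda x: -np.sum(x**2), pop_size=40, rng=rng)
```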
Problem:
Computing, storing, and sampling a full covariance matrix scales quadratically with the number of parameters, so this becomes impractical for neural networks with many thousands of weights.
Basic idea:
Treat the problem a bit differently: maximize expected fitness under a Gaussian search distribution,
J(θ) = E[F(θ + σε)], with ε ~ N(0, I).
This objective is smooth even when F is not, and its gradient can be estimated from fitness evaluations alone:
∇θ J(θ) = (1/σ) E[F(θ + σε) ε].
Then use the gradient with your favorite SGD optimizer.
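A minimal sketch of this gradient estimator, assuming the score-function identity above and plugging the estimate into plain gradient ascent (the toy fitness and all hyperparameters are illustrative):

```python
import numpy as np

def es_gradient(F, theta, sigma=0.1, n=100, rng=None):
    """Monte Carlo estimate of grad_theta E[F(theta + sigma * eps)]
    via (1 / (n * sigma)) * sum_i F(theta + sigma * eps_i) * eps_i."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.standard_normal((n, theta.size))
    returns = np.array([F(theta + sigma * e) for e in eps])
    return (eps * returns[:, None]).sum(axis=0) / (n * sigma)

# Plug the estimate into any SGD-style optimizer (plain gradient ascent here).
rng = np.random.default_rng(0)
theta = np.array([2.0, -1.0])
for _ in range(300):
    theta = theta + 0.05 * es_gradient(lambda x: -np.sum(x**2), theta, n=200, rng=rng)
```

Note that `F` is only ever evaluated, never differentiated, which is what makes the method work for non-differentiable policies.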
Basic idea:
Similar to Natural ES, but σ constant.
Note: to parallelize, workers only need to exchange (random seed, reward) pairs!
Initialize:
1. Create a shared list of random seeds, one per worker, and the initial parameters.

Repeat:
1. Each worker samples its own Gaussian perturbation from its seed.
2. Each worker evaluates the perturbed policy to get a scalar return.
3. Each worker communicates its scalar return to all nodes.
4. Each worker reconstructs the perturbations of all other nodes using the known random seeds.
5. Every worker applies the same parameter update, so all nodes stay in sync.
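The seed trick can be simulated in a single process, with each "worker" as a loop index (the toy fitness, seven workers, and all hyperparameters are illustrative assumptions):

```python
import numpy as np

def distributed_es_step(theta, fitness, seeds, sigma=0.1, lr=0.01):
    """One ES iteration as every worker would run it. Workers share only
    scalar returns; perturbations are reconstructed from the shared seed
    list, never transmitted."""
    # Each worker samples its own perturbation from its seed and
    # broadcasts a single scalar return.
    returns = []
    for seed in seeds:
        eps = np.random.default_rng(seed).standard_normal(theta.size)
        returns.append(fitness(theta + sigma * eps))
    # Every worker now reconstructs all perturbations from the seed list
    # and applies the identical update, keeping parameters in sync.
    grad = np.zeros_like(theta)
    for seed, ret in zip(seeds, returns):
        eps = np.random.default_rng(seed).standard_normal(theta.size)
        grad += ret * eps
    grad /= len(seeds) * sigma
    return theta + lr * grad

theta = np.array([1.5, -0.5])
workers = list(range(7))  # one seed slot per worker
for step in range(400):
    # Fresh seeds each iteration so perturbations are not reused.
    seeds = [w + 7 * step for w in workers]
    theta = distributed_es_step(theta, lambda x: -np.sum(x**2), seeds)
```

Reconstructing a perturbation costs one RNG draw per worker, which is why shipping a seed list once beats shipping full parameter vectors every step.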
Communication per iteration is a single scalar per machine. Other distributed architectures (e.g., Gorila) must communicate entire parameter lists.
- Non-differentiable policies! (hard attention!)
- No backprop!
- 3x computation time decrease!
- Sparse rewards!
- Learn long-term policies in hard environments!
- And CPUs are much cheaper than GPUs!
- Not useful for supervised learning, where good, reliable gradients are available.
- Data inefficient: about 3–10x less data efficient.
If you have a large number of CPU cores (>100), evolution strategies may be a good bet.
Appendix