Parameter Space Noise for Exploration Matthias Plappert, Rein - - PowerPoint PPT Presentation



SLIDE 1

Parameter Space Noise for Exploration

Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, and Marcin Andrychowicz

SLIDE 2

“Let the Noise Flo” - Flo Rida

SLIDE 3

Background – Reinforcement Learning

SLIDE 4

Parameter Space Noise – Motivation

SLIDE 5

Parameter Space Noise – Formulation

We sample the noise at the beginning of each rollout, and keep it fixed for the duration of the rollout.
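
A minimal sketch of the per-rollout sampling described above, assuming i.i.d. Gaussian noise on every parameter. The two-layer policy, the toy dynamics, and the names `init_policy`, `perturb`, and `rollout` are hypothetical and only serve the illustration; they are not taken from the slides.

```python
import numpy as np

def init_policy(rng, obs_dim, hidden_dim, act_dim):
    """Hypothetical two-layer policy; parameters are kept in a dict."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(obs_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.5, size=(hidden_dim, act_dim)),
        "b2": np.zeros(act_dim),
    }

def act(params, obs):
    """Deterministic forward pass of the policy."""
    h = np.tanh(obs @ params["W1"] + params["b1"])
    return np.tanh(h @ params["W2"] + params["b2"])

def perturb(params, sigma, rng):
    """Return a noisy copy of the parameters (assumed Gaussian noise)."""
    return {k: v + rng.normal(0.0, sigma, size=v.shape) for k, v in params.items()}

def rollout(params, sigma, rng, horizon=5):
    # Noise is sampled once, at the beginning of the rollout ...
    noisy = perturb(params, sigma, rng)
    obs = np.ones(3)
    actions = []
    for _ in range(horizon):
        # ... and the same perturbed parameters are used at every step.
        a = act(noisy, obs)
        actions.append(a)
        obs = 0.9 * obs + 0.1 * a.mean()  # toy dynamics standing in for an environment
    return noisy, np.array(actions)
```

The unperturbed parameters are left untouched, so a fresh perturbation can be drawn for the next rollout.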

SLIDE 6

Parameter Space Noise – Formulation

SLIDE 7

Parameter Space Noise – Problems

SLIDE 8

Parameter Space Noise – Problems

SLIDE 9

Parameter Space Noise – Problems

SLIDE 10

Parameter Space Noise – Problem 1

Adding noise to the parameters now perturbs activations that are normalized to zero mean and unit variance
 Each layer would then have similar sensitivity to the noise
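
A small sketch of the normalization idea, assuming layer normalization is the normalization step (the slide text does not name the method). The function `layer_norm` and the chosen weight scales are hypothetical; the point is only that the normalized activations have zero mean and unit variance regardless of how large the underlying weights are.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain or bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))

# Deliberately large weights: without normalization, this layer's
# pre-activations would live on a very different scale from other layers'.
W = rng.normal(scale=10.0, size=(16, 32))
noisy_W = W + rng.normal(scale=0.1, size=W.shape)  # parameter noise

h = layer_norm(x @ noisy_W)  # activations are back on a common scale
```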

SLIDE 11

Parameter Space Noise – Problem 2

SLIDE 12

Parameter Space Noise – Experiments (1)

We test for exploration on a simple but scalable toy environment [1]
 Chains of length N with a fixed initial state
 Each episode lasts N + 9 steps
 An algorithm is successful if it can get the optimal reward of 10
 Experiments on DQN with different exploration methods

[1] “Deep exploration via Bootstrapped DQN”, Osband et al., 2016

SLIDE 13

Parameter Space Noise – Experiments (2)

SLIDE 14

Parameter Space Noise – Experiments (3)

SLIDE 15

Parameter Space Noise – Experiments (4)

Evaluation on 7 MuJoCo continuous control problems
 DDPG with different exploration methods
 Exploration of additive Gaussian noise (left) vs. parameter space noise (right)
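
The difference between the two exploration schemes can be sketched on a toy linear policy. Everything here (the `policy` function, the observation, the noise scale `sigma`) is a hypothetical stand-in rather than the DDPG setup used in the experiments.

```python
import numpy as np

def policy(W, obs):
    """Toy deterministic policy: a single tanh layer."""
    return np.tanh(obs @ W)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
obs = np.array([0.5, -1.0, 2.0])
sigma = 0.2

# Action space noise: fresh Gaussian noise is added to the action at every
# step, so visiting the same state twice yields two different actions.
a1 = policy(W, obs) + rng.normal(0.0, sigma, size=2)
a2 = policy(W, obs) + rng.normal(0.0, sigma, size=2)

# Parameter space noise: the weights are perturbed once and then held fixed,
# so the same state always maps to the same action within a rollout.
W_noisy = W + rng.normal(0.0, sigma, size=W.shape)
p1 = policy(W_noisy, obs)
p2 = policy(W_noisy, obs)
```

Here `a1` and `a2` differ, while `p1` and `p2` are identical.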

SLIDE 16

Parameter Space Noise – Experiments (5)

SLIDE 17

Parameter Space Noise – Conclusion

Conceptually simple method, designed as a drop-in replacement for action space noise (or as an addition to it)
 Often leads to better performance due to better exploration
 Helps most when exploration matters most (e.g. sparse rewards)
 Seems to escape local optima (e.g. HalfCheetah)
 Works for off-policy and on-policy algorithms, in both discrete and continuous action spaces


SLIDE 18

Parameter Space Noise – Related Work

Concurrently with our work, DeepMind proposed “Noisy Networks for Exploration”, Fortunato et al., 2017
 “Deep Exploration via Bootstrapped DQN”, Osband et al., 2016
 “Evolution strategies as a scalable alternative to reinforcement learning”, Salimans et al., 2017
 “State-dependent exploration for policy gradient methods”, Rückstieß et al., 2008
 And a lot of other papers on the general topic of exploration in RL

SLIDE 19

Thank you!