Overcoming Exploration in Reinforcement Learning with Demonstrations
Authors: Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel Presentation by: Scott Larter
Introduction
Addresses the problem of exploration in reinforcement learning tasks with sparse rewards by combining reinforcement learning with a small number of demonstrations.
Motivation
With sparse rewards, the agent may not see any reward signal for many consecutive timesteps, making exploration difficult in multi-step manipulation tasks such as grasping and stacking blocks. Prior imitation-learning work on block stacking required over 100,000 demonstrations.
Background: Markov Decision Processes
At each timestep t, the agent observes state s_t, takes action a_t, receives reward r_t, and transitions to state s_{t+1}.
The return is the discounted sum of future rewards, R_t = Σ_{j=t}^{T} γ^(j−t) r_j, with horizon T and discount factor γ.
The goal is to learn a policy maximizing R w.r.t. the policy parameters θπ.
Transitions (s_t, a_t, r_t, s_{t+1}) are stored in a replay buffer R.
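As a concrete illustration of the return defined above, the discounted sum can be computed for every timestep of an episode with a single backward sweep (a minimal sketch; the helper name is hypothetical, not from the paper):

```python
def discounted_returns(rewards, gamma):
    """R_t = sum_{j=t}^{T} gamma^(j - t) * r_j for every timestep t,
    computed with one backward pass over the reward sequence."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for j in reversed(range(len(rewards))):
        running = rewards[j] + gamma * running
        returns[j] = running
    return returns
```

For a sparse-reward episode such as `[0, 0, 1]` with γ = 0.5, this yields `[0.25, 0.5, 1.0]`: the signal decays quickly as we move away from the rewarding state, which is exactly why exploration is hard in these tasks.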
DDPG updates (with Hindsight Experience Replay [1]):
1) Sample N transition tuples from the replay buffer R.
2) Update the critic parameters by minimizing the loss against the target y_j = r_j + γ Q′(s_{j+1}, π′(s_{j+1})) for each sampled state s_j.
3) Update the policy parameters with the deterministic policy gradient.
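The three steps above can be sketched end-to-end on a toy problem. This is not the paper's implementation: a single-step task (so the TD target reduces to the reward), a linear-in-features critic, and a linear policy are assumptions made purely to keep the sketch self-contained and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: state s ~ U(-1, 1), action a, reward -(s + a)^2,
# episode ends after one step, so the critic target is just the reward.
# Critic: Q(s, a) = w . phi(s, a), quadratic features (can represent r exactly).
# Policy: pi(s) = theta * s; the optimal gain is theta = -1.
def phi(s, a):
    return np.array([s * s, s * a, a * a, 1.0])

w = np.zeros(4)
theta = 0.0
lr_q, lr_pi = 0.05, 0.05

# Fill the replay buffer with exploratory transitions.
replay = [(s, a, -(s + a) ** 2)
          for s, a in rng.uniform(-1, 1, size=(500, 2))]

for _ in range(5000):
    # 1) Sample a tuple from the replay buffer.
    s, a, r = replay[rng.integers(len(replay))]
    # 2) Critic update: gradient step on (Q(s, a) - y)^2 with target y = r.
    err = w @ phi(s, a) - r
    w -= lr_q * 2 * err * phi(s, a)
    # 3) Policy update: deterministic policy gradient, dQ/da * dpi/dtheta.
    dq_da = w[1] * s + 2 * w[2] * (theta * s)   # dQ/da evaluated at a = pi(s)
    theta += lr_pi * dq_da * s
```

After training, theta should end up near -1. Real DDPG runs the same three steps with neural networks for the critic and policy, slowly-updated target networks, and bootstrapped TD targets for multi-step episodes.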
Behavior cloning loss with a Q-filter
A behavior cloning loss pushes the learned policy toward the demonstration policy. The Q-filter applies this loss only on samples where the critic scores the demonstrated action higher than the learned policy's action, ignoring demonstrations where the learned policy is better.
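The filtering rule can be written down directly: keep a demonstration sample only when Q(s, a_demo) > Q(s, π(s)). A minimal sketch (function names are hypothetical, not the paper's code):

```python
def q_filter_mask(q_fn, policy_fn, demo_states, demo_actions):
    """Boolean mask: True where the demonstrated action scores higher
    under the learned critic than the current policy's own action."""
    return [q_fn(s, a_demo) > q_fn(s, policy_fn(s))
            for s, a_demo in zip(demo_states, demo_actions)]

def bc_loss(policy_fn, demo_states, demo_actions, mask):
    """Mean squared error to the demo actions, counted only where the
    Q-filter mask is True; demonstrations the policy already beats
    contribute nothing."""
    terms = [(policy_fn(s) - a) ** 2
             for s, a, keep in zip(demo_states, demo_actions, mask) if keep]
    return sum(terms) / len(terms) if terms else 0.0
```

With a toy critic q(s, a) = -(s + a)^2 and a policy that always outputs 0, a demo action of -1 at state 1 passes the filter (it is better than 0), while a demo action of 0.5 at state 0 is filtered out.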
Resets to demonstration states
Some training episodes are reset to states sampled from the demonstration trajectories, exposing the agent to higher rewards during training.
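The reset scheme amounts to a small sampling routine at episode start. A sketch under stated assumptions (the function name, the `None` convention for "use the default initial-state distribution", and the mixing probability are all illustrative, not from the paper):

```python
import random

def sample_initial_state(demo_trajectories, p_demo, rng):
    """With probability p_demo, draw the episode's start state from a
    recorded demonstration trajectory; otherwise return None, meaning
    'reset to the environment's default initial state distribution'."""
    if rng.random() < p_demo:
        trajectory = rng.choice(demo_trajectories)  # pick one demonstration
        return rng.choice(trajectory)               # pick a state along it
    return None
```

Starting some episodes deep inside a demonstration means the sparse reward is only a few steps away, so the critic sees non-zero returns long before the policy can reach the goal from scratch.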
Experiments
Block stacking with a simulated robot arm; related work [2] applied demonstrations to robotic insertion tasks (fitting parts into holes).
References
[1] M. Andrychowicz et al., "Hindsight experience replay," in Advances in Neural Information Processing Systems (NIPS), 2017.
[2] M. Vecerik et al., "Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards," arXiv preprint arXiv:1707.08817, 2017.
Conclusion
Combining reinforcement learning with a small number of demonstrations overcomes the exploration problem in complex multi-step tasks with sparse rewards, speeding up learning significantly.