Understanding the impact of entropy on policy optimization
Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans
zafarali.ahmed@mail.mcgill.ca
bit.ly/2HQvGoQ
Why should we understand policy optimization?
What is policy
Find a parameterized policy that maximizes rewards.
(1) Collect data + calculate objective
(2) Take gradient + update policy parameters
Bad gradient estimates? Difficult geometry? Not enough “Exploration”? Poor conditioning?
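The two-step loop can be sketched on a toy problem. The example below is a minimal illustration, not the authors' setup: it assumes a two-armed bandit with hypothetical deterministic rewards, a softmax policy over logits `theta`, and a score-function (REINFORCE) gradient estimate. The names `policy_gradient_step` and `train`, the batch size, and the learning rate are all made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def policy_gradient_step(theta, arm_rewards, rng, batch_size=64, lr=0.5):
    """One iteration of the two-step loop on a toy bandit (illustrative only)."""
    probs = softmax(theta)
    # (1) Collect data + calculate objective (sample mean reward).
    actions = rng.choice(len(theta), size=batch_size, p=probs)
    rewards = arm_rewards[actions]
    objective = rewards.mean()
    # (2) Take gradient + update policy parameters.
    # For a softmax policy, grad log pi(a) = one_hot(a) - probs.
    one_hot = np.eye(len(theta))[actions]
    grad = ((one_hot - probs) * rewards[:, None]).mean(axis=0)
    return theta + lr * grad, objective

def train(num_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    arm_rewards = np.array([0.0, 1.0])  # hypothetical rewards: arm 1 is better
    theta = np.zeros(2)                 # start from the uniform policy
    for _ in range(num_steps):
        theta, _ = policy_gradient_step(theta, arm_rewards, rng)
    return softmax(theta)
```

Calling `train()` returns the final action probabilities. In this toy setting the gradient estimate is sampled, so a small `batch_size` is one simple way to see the effect of noisy gradient estimates.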
STEP 1: Collect random perturbations of
STEP 2: How does the objective change along random perturbations?
[Figure: panels "At a Saddle Point" and "At a Local Optimum"; labels (+, +), (+, -), (-, -)]
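The two steps above can be sketched as follows. This is a hedged reading rather than the authors' code: it assumes the perturbations are random unit directions `d` applied to the parameters, and that the sign pairs record whether the objective rises or falls when moving along `+d` and `-d`. The function name `perturbation_signs`, the step size, and the toy objectives are hypothetical.

```python
import numpy as np

def perturbation_signs(objective, theta, num_directions=500, step=0.1, seed=0):
    """Count sign patterns of the objective change along random directions."""
    rng = np.random.default_rng(seed)
    base = objective(theta)
    counts = {"(+, +)": 0, "(+, -)": 0, "(-, -)": 0}
    labels = ["(-, -)", "(+, -)", "(+, +)"]  # indexed by number of increases
    for _ in range(num_directions):
        # STEP 1: draw a random unit-norm perturbation direction.
        d = rng.normal(size=theta.shape)
        d /= np.linalg.norm(d)
        # STEP 2: measure how the objective changes along +d and -d.
        delta_plus = objective(theta + step * d) - base
        delta_minus = objective(theta - step * d) - base
        # Mixed pairs are grouped under "(+, -)" regardless of order.
        num_increases = int(delta_plus > 0) + int(delta_minus > 0)
        counts[labels[num_increases]] += 1
    return counts

# Toy objectives (to be maximized), both evaluated at the origin.
def local_optimum(theta):
    return -theta[0] ** 2 - theta[1] ** 2

def saddle(theta):
    return theta[0] ** 2 - theta[1] ** 2

at_optimum = perturbation_signs(local_optimum, np.zeros(2))
at_saddle = perturbation_signs(saddle, np.zeros(2))
```

Under this reading, every direction at the toy local optimum lands in `(-, -)`, while the toy saddle yields a mix of `(+, +)` and `(-, -)` directions.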
Conclusion: Even in the absence of gradient estimation error, policy entropy helps by smoothing the
Read the paper! Come see poster!
Chat with me!