Understanding the impact of entropy on policy optimization



SLIDE 1

Understanding the impact of entropy on policy optimization

Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

zafarali.ahmed@mail.mcgill.ca

bit.ly/2HQvGoQ

SLIDE 2

Why should we understand policy optimization?

What is policy optimization?

Find parameterized policy that maximizes rewards.

(1) Collect data + calculate objective
(2) Take gradient + update policy parameters

Why is it difficult?

Bad gradient estimates?
Difficult geometry?
Not enough “Exploration”?
Poor conditioning?
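The two-step loop above can be sketched on a toy problem. This is a minimal illustration, not the authors' setup. It assumes a three-armed bandit with made-up deterministic rewards, a softmax policy over logits theta, and a score-function (REINFORCE-style) gradient estimate with the batch mean as baseline. The names policy_gradient_step, rewards, batch_size and lr are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(theta, rewards, rng, batch_size=64, lr=0.5):
    """One pass through the two-step loop on a toy bandit (illustrative only)."""
    n = len(theta)
    probs = softmax(theta)

    # (1) Collect data + calculate objective (Monte Carlo estimate).
    actions = rng.choices(range(n), weights=probs, k=batch_size)
    returns = [rewards[a] for a in actions]
    objective = sum(returns) / batch_size

    # (2) Take gradient + update policy parameters.
    # For a softmax policy, d log pi(a) / d theta_i = 1[i == a] - probs[i].
    grad = [0.0] * n
    for a, r in zip(actions, returns):
        advantage = r - objective  # batch mean used as a simple baseline
        for i in range(n):
            indicator = 1.0 if i == a else 0.0
            grad[i] += advantage * (indicator - probs[i]) / batch_size

    new_theta = [t + lr * g for t, g in zip(theta, grad)]
    return new_theta, objective


rng = random.Random(0)
rewards = [0.0, 0.5, 1.0]  # hypothetical per-arm rewards
theta = [0.0, 0.0, 0.0]
for _ in range(200):
    theta, objective = policy_gradient_step(theta, rewards, rng)
final_probs = softmax(theta)
```

Because the gradient is estimated from a finite batch, each update is noisy, which is one place the "bad gradient estimates" question enters.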

SLIDE 3

Contribution 1: How do we study high dim objective functions?

STEP 1: Collect random perturbations of θ0

STEP 2: How does objective change along random perturbations?

Figure labels: objective; (+, +) (+, -) (-, -)
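The two steps can be sketched as follows. This is an assumption-laden illustration rather than the authors' code. It draws random unit directions d around theta0, evaluates the objective at theta0 + alpha*d and theta0 - alpha*d, and records the sign of each change relative to the objective at theta0. Reading the sign pairs as (change along +d, change along -d) is an interpretation of the slide's labels. The function name perturbation_signs, the step size alpha and the two test objectives are hypothetical.

```python
import math
import random


def perturbation_signs(objective, theta0, alpha=0.1, n_directions=200, seed=0):
    """Count sign pairs of the objective change along +d and -d."""
    rng = random.Random(seed)
    base = objective(theta0)
    counts = {}
    for _ in range(n_directions):
        # STEP 1: draw a random unit-norm perturbation direction.
        d = [rng.gauss(0.0, 1.0) for _ in theta0]
        norm = math.sqrt(sum(x * x for x in d))
        d = [x / norm for x in d]

        # STEP 2: measure how the objective changes along +d and -d.
        plus = objective([t + alpha * x for t, x in zip(theta0, d)]) - base
        minus = objective([t - alpha * x for t, x in zip(theta0, d)]) - base
        key = ("+" if plus > 0 else "-", "+" if minus > 0 else "-")
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical test objectives, both probed at the origin.
local_max_counts = perturbation_signs(
    lambda th: -sum(x * x for x in th), [0.0, 0.0, 0.0]
)
saddle_counts = perturbation_signs(
    lambda th: th[0] ** 2 - th[1] ** 2, [0.0, 0.0]
)
```

At the maximum of the concave quadratic every direction lowers the objective, so only the (-, -) pair appears. At the saddle some directions raise it on both sides and others lower it on both sides, so both (+, +) and (-, -) show up.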

SLIDE 4

Contribution 1: How do we study high dim objective functions?

Examples

@ A Saddle Point
@ A Local Optimum

SLIDE 5

Contribution 2: Why does entropy regularization help?

Experiments on exact grid worlds and MuJoCo

Conclusion: Even in the absence of gradient estimation error, policy entropy helps by smoothing the objective function.
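For readers unfamiliar with entropy regularization, one common form adds an entropy bonus to the expected reward: J(theta) + tau * H(pi_theta). The slides do not state the formula, so the sketch below is an assumption. It uses a softmax policy on a toy bandit with made-up rewards, and tau is a hypothetical regularization weight. All quantities are computed exactly, with no sampling, which mirrors the "no gradient estimation error" setting in spirit only.

```python
import math


def softmax(logits):
    """Turn logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_objective(theta, rewards, tau):
    """Exact expected reward plus tau times the policy entropy."""
    probs = softmax(theta)
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return expected_reward + tau * entropy(probs)


rewards = [0.0, 0.5, 1.0]  # hypothetical per-arm rewards
plain = regularized_objective([0.0, 0.0, 0.0], rewards, tau=0.0)
with_bonus = regularized_objective([0.0, 0.0, 0.0], rewards, tau=1.0)
```

With tau = 0 this reduces to the plain expected reward. Larger tau rewards policies that keep probability spread across actions.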

SLIDE 6

Read the paper! Come see poster!

Poster # 29 TODAY - 6.30 PM Pacific Ballroom

Chat with me!

zafarali.ahmed@mail.mcgill.ca

bit.ly/2HQvGoQ
