Understanding the impact of entropy on policy optimization

Feb 05, 2023

SLIDE 1

Understanding the impact of entropy on policy optimization

Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

zafarali.ahmed@mail.mcgill.ca

bit.ly/2HQvGoQ

SLIDE 2

Why should we understand policy optimization?

What is policy optimization? Find a parameterized policy that maximizes expected reward:

(1) Collect data and calculate the objective.

(2) Take the gradient and update the policy parameters.

Why is it difficult? Bad gradient estimates? Difficult geometry? Not enough “Exploration”? Poor conditioning?
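The two-step loop above can be sketched as REINFORCE on a toy problem. This is a minimal illustration, not the authors' setup: the 3-armed bandit, its rewards, the softmax policy, the batch size, and the learning rate are all hypothetical choices. Step (1) samples actions from the current policy and estimates the objective by the mean reward; step (2) forms the score-function gradient estimate and takes an ascent step.

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = np.exp(logits - logits.max())
    return z / z.sum()


def reinforce_bandit(rewards, steps=2000, batch=32, lr=0.1, seed=0):
    """Hypothetical toy: REINFORCE on a bandit with a softmax policy over arms."""
    rng = np.random.default_rng(seed)
    num_arms = len(rewards)
    theta = np.zeros(num_arms)  # policy parameters: one logit per arm
    objective_estimate = None
    for _ in range(steps):
        probs = softmax(theta)
        # (1) Collect data: sample a batch of actions from the current policy
        #     and estimate the objective (expected reward) by the batch mean.
        actions = rng.choice(num_arms, size=batch, p=probs)
        batch_rewards = rewards[actions]
        objective_estimate = batch_rewards.mean()
        # (2) Gradient: for a softmax policy, grad log pi(a) = onehot(a) - probs,
        #     so the REINFORCE estimate is the batch mean of r * (onehot(a) - probs).
        score = np.eye(num_arms)[actions] - probs
        grad = (batch_rewards[:, None] * score).mean(axis=0)
        theta = theta + lr * grad  # gradient ascent on expected reward
    return softmax(theta), objective_estimate


probs, objective_estimate = reinforce_bandit(np.array([1.0, 0.5, 0.2]))
print(probs, objective_estimate)
```

No baseline is subtracted here, so the estimate is unbiased but noisier than it needs to be; sampling noise of this kind is one source of the "bad gradient estimates" listed above.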

SLIDE 3

Contribution 1: How do we study high-dimensional objective functions?

STEP 1: Collect random perturbations of θ0.

STEP 2: How does the objective change along the random perturbations?

[Figure: change in the objective along random perturbations of θ0, with regions labeled (+, +), (+, -), (-, -)]
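A minimal sketch of the two steps, assuming the objective can be evaluated at arbitrary parameters (for example, exactly in a small grid world). The function names, the step size alpha, and the use of unit-norm Gaussian directions are illustrative assumptions, not details taken from the slides. For each direction d it records ΔO+ = O(θ0 + αd) - O(θ0) and ΔO- = O(θ0 - αd) - O(θ0), and the sign pair places that direction in one of the labeled regions.

```python
import numpy as np


def perturbation_scatter(objective, theta0, num_directions=1000, alpha=0.1, seed=0):
    """Hypothetical helper: objective changes along random +/- perturbations of theta0."""
    rng = np.random.default_rng(seed)
    base = objective(theta0)
    # STEP 1: random directions, normalized to unit length.
    d = rng.standard_normal((num_directions, theta0.size))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    # STEP 2: how the objective changes when stepping forward and backward along each d.
    delta_plus = np.array([objective(theta0 + alpha * di) - base for di in d])
    delta_minus = np.array([objective(theta0 - alpha * di) - base for di in d])
    return delta_plus, delta_minus


def quadrant_counts(delta_plus, delta_minus):
    """Count directions by sign pattern; '(+, -)' lumps together both mixed-sign cases."""
    counts = {"(+, +)": 0, "(+, -)": 0, "(-, -)": 0}
    for dp, dm in zip(delta_plus, delta_minus):
        if dp > 0 and dm > 0:
            counts["(+, +)"] += 1
        elif dp < 0 and dm < 0:
            counts["(-, -)"] += 1
        else:
            counts["(+, -)"] += 1  # mixed signs (or an exact zero)
    return counts


# Usage: at the maximum of O(theta) = -||theta||^2, every direction goes downhill both ways.
dp, dm = perturbation_scatter(lambda th: -np.sum(th ** 2), np.zeros(5))
print(quadrant_counts(dp, dm))
```

Plotting delta_plus against delta_minus as a scatter gives the two-dimensional picture; the counts are just a compact summary of which regions the points fall in.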

SLIDE 4

Contribution 1: How do we study high-dimensional objective functions? Examples

[Figure panels: at a saddle point; at a local optimum]
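Why the sign pattern separates these two cases follows from a standard second-order Taylor expansion around θ0. The notation is introduced here and is not taken from the slides: O is the objective, d a unit direction, α a small step size, and H = ∇²O(θ0) the Hessian.

```latex
\begin{aligned}
\Delta O_{+} &= O(\theta_0 + \alpha d) - O(\theta_0) \approx \alpha\, \nabla O(\theta_0)^\top d + \tfrac{\alpha^2}{2}\, d^\top H d \\
\Delta O_{-} &= O(\theta_0 - \alpha d) - O(\theta_0) \approx -\alpha\, \nabla O(\theta_0)^\top d + \tfrac{\alpha^2}{2}\, d^\top H d \\
\Delta O_{+} - \Delta O_{-} &\approx 2\alpha\, \nabla O(\theta_0)^\top d \qquad \text{(slope along } d\text{)} \\
\Delta O_{+} + \Delta O_{-} &\approx \alpha^2\, d^\top H d \qquad \text{(curvature along } d\text{)}
\end{aligned}
```

At a critical point the gradient term vanishes, so ΔO+ and ΔO- share the sign of dᵀHd. At a strict local maximum (H negative definite) every direction lands in (-, -); at a saddle point (H indefinite) some directions land in (+, +) and others in (-, -). Directions with mixed signs indicate a nonzero slope along d.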

SLIDE 5

Contribution 2: Why does entropy regularization help?

Experiments on grid worlds (with exact gradients) and MuJoCo

Conclusion: Even in the absence of gradient estimation error, policy entropy helps by smoothing the objective function.
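The smoothing effect itself is an empirical finding reported in the paper. The sketch below only shows one common form of the entropy-regularized objective, O_τ(θ) = E_{a∼π_θ}[r(a)] + τ H(π_θ), for a hypothetical softmax bandit, together with its exact gradient. The bandit, the rewards, the function names, and the temperature τ are assumptions for illustration. An objective like this could be handed to a perturbation analysis such as the one in Contribution 1 to compare the landscape with and without the entropy term.

```python
import numpy as np


def log_softmax(theta):
    # Numerically stable log pi(a) for a softmax policy with logits theta.
    shifted = theta - theta.max()
    return shifted - np.log(np.sum(np.exp(shifted)))


def softmax(theta):
    return np.exp(log_softmax(theta))


def entropy_regularized_objective(theta, rewards, tau):
    """O_tau(theta) = E_{a ~ pi}[r(a)] + tau * H(pi) for a softmax policy pi (toy bandit)."""
    log_p = log_softmax(theta)
    p = np.exp(log_p)
    expected_reward = p @ rewards
    entropy = -np.sum(p * log_p)
    return expected_reward + tau * entropy


def entropy_regularized_gradient(theta, rewards, tau):
    """Exact gradient of O_tau with respect to the logits theta."""
    log_p = log_softmax(theta)
    p = np.exp(log_p)
    expected_reward = p @ rewards
    entropy = -np.sum(p * log_p)
    grad_reward = p * (rewards - expected_reward)  # d E[r] / d theta_i
    grad_entropy = -p * (log_p + entropy)          # d H / d theta_i
    return grad_reward + tau * grad_entropy


rewards = np.array([1.0, 0.5, 0.2])
theta = np.array([0.3, -1.2, 2.0])
print(entropy_regularized_objective(theta, rewards, tau=0.5))
print(entropy_regularized_gradient(theta, rewards, tau=0.5))
```

One way to see what τ does: over the simplex, the maximizer of O_τ is π(a) ∝ exp(r(a)/τ), so the gradient vanishes at θ = r/τ, and a larger τ pulls the optimum toward the uniform policy.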

SLIDE 6

Read the paper! Come see the poster!

Poster #29, TODAY, 6:30 PM, Pacific Ballroom

Chat with me!