Projections for Approximate Policy Iteration Algorithms, Riad Akrour

Oct 09, 2022

SLIDE 1

ICML19

Projections for Approximate Policy Iteration Algorithms

Riad Akrour, Joni Pajarinen, Gerhard Neumann, Jan Peters IAS, TU Darmstadt, Germany

SLIDE 2

Entropy Regularization in RL

Widespread with actor-critic methods

SLIDE 3

Hard vs Soft Constraints

Soft constraint (bonus term)
Hard constraint

– Harder to optimize, easier to interpret and tune

[Equation labels: Policy return, Entropy reg.]
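The equations on this slide did not survive extraction. As a hedged reconstruction of the two formulations being contrasted, the standard forms are sketched below. The symbols are assumptions rather than the authors' notation: J(π) stands for the policy return, H(π) for the Shannon entropy of the policy, and β for either the bonus weight (soft case) or the entropy lower bound (hard case).

```latex
% Soft constraint: entropy enters the objective as a bonus term
\max_{\pi} \; J(\pi) + \beta \, H(\pi)

% Hard constraint: entropy is bounded from below, the objective is the return alone
\max_{\pi} \; J(\pi) \quad \text{s.t.} \quad H(\pi) \ge \beta
```

In the hard form, β is read directly as a minimum entropy level, which is one way to see why it is easier to interpret and tune than a bonus weight.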

SLIDE 4

Contributions

Projections that hard-constrain the Shannon entropy of Gaussian or soft-max policies

Projections that outperform other KL-constrained optimizers used in deep RL
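To make the idea of an entropy projection concrete, here is a minimal sketch of one simple way to map a policy's parameters onto the set of policies with entropy at least β. It is an illustration under stated assumptions, not necessarily the construction used by the authors: the function names, the uniform shift of the log standard deviations for a diagonal Gaussian, and the mixing with a uniform distribution for a soft-max (categorical) policy are all hypothetical choices made for this example.

```python
import math

import numpy as np


def gaussian_entropy(log_std):
    """Shannon entropy of a diagonal Gaussian given a 1-D array of log std values."""
    d = log_std.size
    return float(np.sum(log_std) + 0.5 * d * math.log(2.0 * math.pi * math.e))


def project_gaussian_entropy(log_std, beta):
    """Shift every log std by the same amount so that the entropy is at least beta.

    Assumption: a uniform shift is used; policies that already satisfy the
    constraint are returned unchanged.
    """
    log_std = np.asarray(log_std, dtype=float)
    deficit = max(beta - gaussian_entropy(log_std), 0.0)
    return log_std + deficit / log_std.size


def categorical_entropy(p):
    """Shannon entropy of a probability vector, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def project_softmax_entropy(p, beta, iters=60):
    """Mix p with the uniform distribution until the entropy is at least beta.

    Entropy is concave and maximal at the uniform distribution, so it is
    non-decreasing in the mixing weight and bisection finds the smallest
    weight that meets the bound.
    """
    p = np.asarray(p, dtype=float)
    k = p.size
    if beta > math.log(k):
        raise ValueError("beta exceeds the maximum entropy log(k)")
    if categorical_entropy(p) >= beta:
        return p
    uniform = np.full(k, 1.0 / k)
    lo, hi = 0.0, 1.0  # hi always satisfies the constraint
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if categorical_entropy((1.0 - mid) * p + mid * uniform) >= beta:
            hi = mid
        else:
            lo = mid
    return (1.0 - hi) * p + hi * uniform


if __name__ == "__main__":
    log_std = project_gaussian_entropy(np.array([-3.0, -2.0]), beta=0.0)
    print("projected log std:", log_std, "entropy:", gaussian_entropy(log_std))

    probs = project_softmax_entropy(np.array([0.97, 0.01, 0.01, 0.01]), beta=1.0)
    print("projected probs:", probs, "entropy:", categorical_entropy(probs))
```

Because both maps leave feasible policies untouched and move infeasible ones onto the constraint boundary, they can be applied after each unconstrained update, which is the general pattern behind projected-gradient style optimization.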

SLIDE 5

Results

Optimizing vs

– Deep RL
– Projected gradient
– Direct policy search

SLIDE 6

Results

Optimizing vs

– Deep RL
– Projected gradient
– Direct policy search

Poster #34