Projections for Approximate Policy Iteration Algorithms (Riad Akrour)



SLIDE 1

ICML19

Projections for Approximate Policy Iteration Algorithms

Riad Akrour, Joni Pajarinen, Gerhard Neumann, Jan Peters IAS, TU Darmstadt, Germany

SLIDE 2

Entropy Regularization in RL

Widespread with actor-critic methods

SLIDE 3

Hard vs Soft Constraints

  • Soft constraint (bonus term)
  • Hard constraint

    – Harder to optimize, easier to interpret and tune

[Objective annotations: policy return, entropy regularization]
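The objectives on this slide did not survive extraction. As a rough sketch only, the two formulations are commonly written as below. The notation is an assumption, not taken from the slides: J(π) stands for the policy return, H(π) for the Shannon entropy of the policy, α for the bonus weight, and β for the entropy lower bound.

```latex
% Soft constraint: entropy enters the objective as a weighted bonus term
\max_{\pi} \; J(\pi) + \alpha \, H(\pi)

% Hard constraint: entropy is bounded from below explicitly
\max_{\pi} \; J(\pi) \quad \text{subject to} \quad H(\pi) \ge \beta
```

In the soft form, α trades return against entropy and has no direct meaning in entropy units. In the hard form, β is itself an entropy value, which is one way to read the "easier to interpret and tune" remark.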

SLIDE 4

Contributions

  • Projections that hard-constrain the Shannon entropy of Gaussian or soft-max policies
  • Projections that outperform other KL-constrained optimizers used in deep RL
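
To make the idea of an entropy projection concrete, here is a minimal sketch under stated assumptions. It is not necessarily the construction used by the authors. For a diagonal Gaussian it shifts all log standard deviations by the same amount until the entropy reaches a lower bound `beta`. For a soft-max (categorical) policy it mixes the distribution with the uniform one. All function names are hypothetical.

```python
import math


def gaussian_entropy(log_std):
    """Shannon entropy of a diagonal Gaussian given per-dimension log std."""
    d = len(log_std)
    return sum(log_std) + 0.5 * d * math.log(2.0 * math.pi * math.e)


def project_gaussian_entropy(log_std, beta):
    """Return log stds with entropy >= beta (assumed scheme: uniform shift)."""
    h = gaussian_entropy(log_std)
    if h >= beta:
        return list(log_std)  # constraint already satisfied, nothing to do
    # Adding c to every log std raises the entropy by exactly d * c.
    shift = (beta - h) / len(log_std)
    return [s + shift for s in log_std]


def categorical_entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def project_softmax_entropy(p, beta, iters=60):
    """Return probabilities with entropy >= beta (assumed scheme: uniform mixing)."""
    n = len(p)
    if categorical_entropy(p) >= beta:
        return list(p)
    if beta > math.log(n):
        raise ValueError("beta exceeds the maximum entropy log(n)")
    # Entropy is concave and maximal at the uniform distribution, so it is
    # non-decreasing along the segment from p to uniform: bisect on the weight.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mixed = [(1.0 - mid) * q + mid / n for q in p]
        if categorical_entropy(mixed) >= beta:
            hi = mid
        else:
            lo = mid
    return [(1.0 - hi) * q + hi / n for q in p]
```

Both helpers leave a policy untouched when it already satisfies the bound and otherwise return a policy whose entropy sits at the bound, which is the behavior a hard constraint asks for.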
SLIDE 5

Results

  • Optimizing vs

    – Deep RL
    – Projected gradient
    – Direct policy search

SLIDE 6

Results

  • Optimizing vs

    – Deep RL
    – Projected gradient
    – Direct policy search

Poster #34