Kernel-based Reinforcement Learning in Robust Markov Decision Processes

SLIDE 1

Kernel-based Reinforcement Learning in Robust Markov Decision Processes

Shiau Hong Lim, Arnaud Autef

SLIDE 2

Motivation

  • Robust Markov Decision Process (MDP) framework

– Tackle model mismatch and parameter uncertainty

– Previously, for state aggregation, the performance bound was improved via robust policies

12/6/2019 Arnaud Autef - ICML 2019 2

SLIDE 3

Contribution

1. Robust performance bound improvement extended to the general kernel averager setting

2. Formulation of a practical kernel-based robust algorithm, with empirical results on benchmark tasks

SLIDE 4

Kernel-based approach

1. MDP to solve

2. Kernel averager and representative states to approximate the value function
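The formulas on this slide did not survive extraction, so the following is only a minimal sketch of what a kernel averager over representative states can look like. It assumes a normalized Gaussian kernel; the function names, the `bandwidth` parameter, and the choice of kernel are illustrative assumptions, not the authors' definitions.

```python
import math


def kernel_weights(state, rep_states, bandwidth=1.0):
    """Return normalized Gaussian kernel weights of `state` over `rep_states`.

    Assumption: a Gaussian kernel is used here purely for illustration.
    """
    sq_dists = [sum((a - b) ** 2 for a, b in zip(state, rep)) for rep in rep_states]
    # Subtract the smallest distance before exponentiating so that at least
    # one raw weight equals exp(0) = 1 and the normalizer cannot underflow to 0.
    closest = min(sq_dists)
    raw = [math.exp(-(d - closest) / (2.0 * bandwidth ** 2)) for d in sq_dists]
    total = sum(raw)
    return [w / total for w in raw]


def averaged_value(state, rep_states, rep_values, bandwidth=1.0):
    """Approximate the value at `state` as a convex combination of the
    values stored at the representative states."""
    weights = kernel_weights(state, rep_states, bandwidth)
    return sum(w * v for w, v in zip(weights, rep_values))
```

Because the weights are non-negative and sum to one, the approximated value always lies between the smallest and largest representative value, which is the averaging property this kind of approximator relies on.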

SLIDE 5

Kernel-based approach

3. Define a non-trivial robust MDP with states = representative states

4. Obtain the optimal robust value in the robust MDP

5. Derive a policy in the MDP to solve, greedy w.r.t. the optimal robust value
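The exact greedy rule from this slide is not recoverable from the transcript. As a hedged illustration of what "greedy w.r.t. a value function" usually means, here is a one-step lookahead sketch. The `model` callable, its return shape, and the name `greedy_action` are hypothetical and chosen only for this example.

```python
def greedy_action(state, actions, model, value_fn, gamma=0.95):
    """Pick the action that maximizes a one-step lookahead on `value_fn`.

    Assumption: `model(state, action)` is a hypothetical callable returning
    `(reward, [(probability, next_state), ...])`.
    """
    def lookahead(action):
        reward, outcomes = model(state, action)
        # Expected discounted value of the successor states.
        return reward + gamma * sum(p * value_fn(nxt) for p, nxt in outcomes)

    return max(actions, key=lookahead)
```

In the setting of the slides, `value_fn` would be the value obtained from the robust MDP and extended to arbitrary states; here it is simply any function from states to numbers.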

SLIDE 6

Theoretical Result

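Theorem: for the optimal robust value in the robust MDP, the greedy policy w.r.t. that value, and the optimal value in the MDP to solve, the performance bound depends on:

  • Function approximator limitations

  • Smoothness of the optimal value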

SLIDE 7

Practical algorithm

1. Second kernel averager to approximate the MDP model from data

2. Solve the robust MDP with the approximate robust Bellman operator, which takes a robustness parameter
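The operator shown on this slide was lost in extraction. To give a feel for how a robustness parameter can enter a Bellman backup, the sketch below uses a simple contamination-style operator that mixes the nominal expectation with the worst reachable successor value. This is a swapped-in, generic construction and not the authors' operator; `beta`, `robust_backup`, and the data layout are assumptions made for the example.

```python
def robust_backup(values, rewards, transitions, gamma, beta):
    """Apply one sweep of a contamination-style robust Bellman backup.

    rewards[s][a] is a float; transitions[s][a] maps next_state -> probability.
    beta in [0, 1] is an illustrative robustness parameter: 0 gives the
    standard backup, 1 gives a pure worst-case backup over reachable states.
    """
    new_values = []
    for s in range(len(values)):
        action_values = []
        for a in range(len(rewards[s])):
            probs = transitions[s][a]
            nominal = sum(p * values[j] for j, p in probs.items())
            worst = min(values[j] for j, p in probs.items() if p > 0.0)
            target = (1.0 - beta) * nominal + beta * worst
            action_values.append(rewards[s][a] + gamma * target)
        new_values.append(max(action_values))
    return new_values


def robust_value_iteration(rewards, transitions, gamma=0.9, beta=0.2,
                           tol=1e-8, max_iters=10000):
    """Iterate the backup until the largest change falls below `tol`."""
    values = [0.0] * len(rewards)
    for _ in range(max_iters):
        updated = robust_backup(values, rewards, transitions, gamma, beta)
        if max(abs(u - v) for u, v in zip(updated, values)) < tol:
            return updated
        values = updated
    return values
```

With `beta = 0` this reduces to ordinary value iteration, so comparing the two settings on a small model shows how much value the pessimistic mixing gives up in exchange for protection against model error.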

SLIDE 8

Experiments: Acrobot

SLIDE 9

Acrobot

SLIDE 10

Experiments: Double Pole Balancing

SLIDE 11

Double Pole Balancing

SLIDE 12

Conclusion

  • Theoretical performance guarantees for robust kernel-based reinforcement learning

  • Significant empirical benefits from robustness, even stronger with model mismatch (real-world settings)

SLIDE 13

Shiau Hong Lim, Arnaud Autef

Thank you! Please come to see our poster tonight