Kernel-based Reinforcement Learning in Robust Markov Decision Processes

Dec 23, 2023

SLIDE 1

Kernel-based Reinforcement Learning in Robust Markov Decision Processes

Shiau Hong Lim, Arnaud Autef

SLIDE 2

Motivation

Robust Markov Decision Process (MDP) framework

– Tackles model mismatch and parameter uncertainty
– Previously, for state aggregation, performance bound on […] improved via robust policies: […]

12/6/2019 Arnaud Autef - ICML 2019 2

SLIDE 3

Contribution

1. Robust performance bound improvement on […] extended to the general kernel averager setting
2. Formulation of a practical kernel-based robust algorithm, with empirical results on benchmark tasks


SLIDE 4

Kernel-based approach

1. MDP […] to solve
2. Kernel averager and representative states to approximate the value function: […] and […]
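As an illustration of the kernel averager idea above, here is a minimal sketch in which the value at an arbitrary state is a convex combination of values stored at representative states. The Gaussian kernel, the `bandwidth` parameter, and all function names are assumptions made for this example; the slides do not preserve the authors' kernel or notation.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Unnormalized Gaussian similarity between two state vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def averager_weights(state, rep_states, bandwidth=1.0):
    """Non-negative weights over the representative states that sum to 1."""
    raw = [gaussian_kernel(state, r, bandwidth) for r in rep_states]
    total = sum(raw)
    return [w / total for w in raw]

def approx_value(state, rep_states, rep_values, bandwidth=1.0):
    """Value estimate at `state` as a weighted average of representative values."""
    weights = averager_weights(state, rep_states, bandwidth)
    return sum(w * v for w, v in zip(weights, rep_values))

# Hypothetical one-dimensional example with three representative states.
rep_states = [(0.0,), (1.0,), (2.0,)]
rep_values = [0.0, 1.0, 4.0]
estimate = approx_value((1.0,), rep_states, rep_values, bandwidth=0.5)
```

Because the weights are non-negative and sum to one, the estimate always lies between the smallest and largest representative value, which is the averaging property this family of approximators relies on.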


SLIDE 5

Kernel-based approach

2. Define a non-trivial robust MDP […] with states = representative states
3. Obtain […], the optimal robust value in […]
4. Derive […] in […], greedy w.r.t. […], with: […]
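The steps above can be sketched on a toy problem. The example below runs generic robust value iteration on a small finite robust MDP and then extracts a greedy policy. It assumes, purely for illustration, that each state-action pair has a finite list of candidate next-state distributions and that the adversary picks the worst one; the slides do not preserve the authors' uncertainty sets, and here the greedy step is taken in the small robust MDP itself rather than in the original MDP. All names and numbers are hypothetical.

```python
def worst_case_q(s, a, rewards, uncertainty, values, gamma):
    """Reward plus discounted value under the least favourable candidate model."""
    worst_next = min(
        sum(p * v for p, v in zip(dist, values)) for dist in uncertainty[s][a]
    )
    return rewards[s][a] + gamma * worst_next

def robust_value_iteration(rewards, uncertainty, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate the robust Bellman backup until successive values differ by < tol."""
    values = [0.0] * len(rewards)
    for _ in range(max_iter):
        new_values = [
            max(worst_case_q(s, a, rewards, uncertainty, values, gamma)
                for a in range(len(rewards[s])))
            for s in range(len(rewards))
        ]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values

def greedy_action(s, rewards, uncertainty, values, gamma=0.9):
    """Action maximizing the worst-case one-step lookahead at state s."""
    return max(range(len(rewards[s])),
               key=lambda a: worst_case_q(s, a, rewards, uncertainty, values, gamma))

# Two states, two actions. uncertainty[s][a] lists candidate distributions.
rewards = [[0.0, 0.0], [1.0, 1.0]]
uncertainty = [
    [[[1.0, 0.0]], [[0.1, 0.9], [0.5, 0.5]]],  # state 0: stay, or try to reach state 1
    [[[0.0, 1.0]], [[0.0, 1.0]]],              # state 1: absorbing, reward 1 per step
]
robust_values = robust_value_iteration(rewards, uncertainty)
policy = [greedy_action(s, rewards, uncertainty, robust_values) for s in range(2)]
```

In this toy instance the adversary always selects the distribution that reaches the rewarding state with probability 0.5 rather than 0.9, so the robust value of state 0 is lower than its value under the optimistic model, yet moving toward state 1 remains the greedy choice.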


SLIDE 6

Theoretical Result

Theorem: with […] the optimal robust value in […], […] the greedy policy w.r.t. […], and […] the optimal value in […]:

[…]

– Function approximator limitations
– Smoothness


SLIDE 7

Practical algorithm

1. Second kernel averager to approximate the MDP model from data
2. Solve […] with the approximate robust Bellman operator: […]

With robustness parameter […]
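To make the two-averager structure concrete, the sketch below performs a single backup at one state from sampled transitions. One averager weights the sampled transitions by how close their start state is to the query state (the model from data); the other evaluates each sampled next state from the representative values. The operator on the slide is not preserved, so the way robustness enters here is an assumption: a hypothetical parameter `theta` in [0, 1] interpolates between the kernel-weighted (nominal) target and the worst sampled target. The Gaussian kernel and every name are likewise illustrative.

```python
import math

def kernel_weights(x, centers, bandwidth):
    """Normalized Gaussian weights of x with respect to a list of centers."""
    raw = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * bandwidth ** 2))
        for c in centers
    ]
    total = sum(raw)
    return [w / total for w in raw]

def robust_backup(x, samples_by_action, rep_states, rep_values,
                  theta, gamma=0.95, bandwidth=0.5):
    """One backup at state x; samples_by_action[a] is a list of (s, r, s_next)."""
    action_values = []
    for samples in samples_by_action:
        # Model averager: weight each sampled transition by closeness of its start state.
        starts = [s for s, _, _ in samples]
        weights = kernel_weights(x, starts, bandwidth)
        targets = []
        for _, r, s_next in samples:
            # Value averager: evaluate the next state from the representative values.
            v_weights = kernel_weights(s_next, rep_states, bandwidth)
            targets.append(r + gamma * sum(w * v for w, v in zip(v_weights, rep_values)))
        nominal = sum(w * t for w, t in zip(weights, targets))
        worst = min(targets)
        # Assumed form of robustness: theta = 0 is nominal, theta = 1 is worst case.
        action_values.append((1.0 - theta) * nominal + theta * worst)
    return max(action_values)

# Hypothetical data: two representative states, two actions, two samples per action.
rep_states = [(0.0,), (1.0,)]
rep_values = [0.0, 10.0]
samples_by_action = [
    [((0.0,), 0.0, (0.0,)), ((0.1,), 0.0, (0.2,))],
    [((0.0,), 0.0, (1.0,)), ((0.1,), 0.0, (0.3,))],
]
nominal_value = robust_backup((0.05,), samples_by_action, rep_states, rep_values, theta=0.0)
mixed_value = robust_backup((0.05,), samples_by_action, rep_states, rep_values, theta=0.5)
worst_value = robust_backup((0.05,), samples_by_action, rep_states, rep_values, theta=1.0)
```

Under this assumed form, increasing `theta` can only lower the backed-up value, since the nominal target is a convex combination of the sampled targets and therefore never falls below their minimum.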


SLIDE 8

Experiments: Acrobot


SLIDE 9

Acrobot


SLIDE 10

Experiments: Double Pole Balancing


SLIDE 11

Double Pole Balancing


SLIDE 12

Conclusion

Theoretical performance guarantees for robust kernel-based reinforcement learning in […]

Significant empirical benefits from robustness, even stronger with model mismatch (real-world settings)


SLIDE 13


Thank you! Please come to see our poster tonight