Kernel-based Reinforcement Learning in Robust Markov Decision Processes
Shiau Hong Lim, Arnaud Autef
SLIDE 1
SLIDE 2
Motivation
- Robust Markov Decision Process (MDP) framework
– Tackle model mismatch and parameter uncertainty
– Previously, for state aggregation, the performance bound was improved via robust policies [formula missing]
12/6/2019 Arnaud Autef - ICML 2019 2
SLIDE 3
Contribution
1. Robust performance bound improvement extended to the general kernel averager setting
2. Formulation of a practical kernel-based robust algorithm, with empirical results on benchmark tasks
SLIDE 4
Kernel-based approach
1. MDP to solve
2. Kernel averager and representative states to approximate the value function [formulas missing]
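The kernel averager step above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a normalized Gaussian kernel, and the names `kernel_weights`, `averaged_value`, `rep_states`, `rep_values` and `bandwidth` are hypothetical. The one property it relies on is that an averager uses non-negative weights that sum to 1, so the approximate value at any state is a convex combination of the values stored at the representative states.

```python
import math


def kernel_weights(state, rep_states, bandwidth=1.0):
    """Normalized Gaussian kernel weights of `state` over the representative states.

    The weights are non-negative and sum to 1 (the averager property).
    The Gaussian kernel is an assumption made for this sketch.
    """
    sq_dists = [sum((x - y) ** 2 for x, y in zip(state, rep)) for rep in rep_states]
    closest = min(sq_dists)  # shift the exponents for numerical stability
    raw = [math.exp(-(d - closest) / (2.0 * bandwidth ** 2)) for d in sq_dists]
    total = sum(raw)
    return [r / total for r in raw]


def averaged_value(state, rep_states, rep_values, bandwidth=1.0):
    """Approximate value at `state` as a kernel-weighted average of `rep_values`."""
    weights = kernel_weights(state, rep_states, bandwidth)
    return sum(w * v for w, v in zip(weights, rep_values))


# Toy 1-D example: three representative states with made-up values.
rep_states = [(0.0,), (1.0,), (2.0,)]
rep_values = [0.0, 1.0, 4.0]
print(averaged_value((0.5,), rep_states, rep_values, bandwidth=0.1))
```

With a narrow bandwidth, the query state halfway between the first two representative states gets a value of roughly 0.5, the mean of their stored values. Because the weights form a convex combination, the approximation can never leave the range of the representative values.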
SLIDE 5
Kernel-based approach
3. Define a non-trivial robust MDP with states = representative states
4. Obtain the optimal robust value in this robust MDP
5. Derive a policy in the original MDP that is greedy w.r.t. this value [formula missing]
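The last step, deriving a greedy policy from a value function, can be illustrated with a generic one-step lookahead. The slide's own formula did not survive extraction, so this is only the textbook construction under simplifying assumptions: a deterministic model and a finite action set. The names `greedy_action`, `model` and `value_fn` are hypothetical.

```python
def greedy_action(state, actions, model, value_fn, gamma=0.95):
    """Return the action maximizing reward + gamma * value of the next state.

    `model(state, action)` is assumed to return `(reward, next_state)`
    deterministically; a stochastic model would need an expectation here.
    """
    def lookahead(action):
        reward, next_state = model(state, action)
        return reward + gamma * value_fn(next_state)

    return max(actions, key=lookahead)


# Toy 1-D chain: actions shift the state, and the value peaks at state 3.
def toy_model(state, action):
    return 0.0, state + action


def toy_value(state):
    return -abs(state - 3.0)


print(greedy_action(0.0, (-1.0, 1.0), toy_model, toy_value))
```

In the kernel-based setting, `value_fn` would be the kernel-averaged value built from the robust values at the representative states.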
SLIDE 6
Theoretical Result
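Theorem: for the optimal robust value in the robust MDP, the greedy policy w.r.t. it, and the optimal value in the original MDP, a performance bound holds [formula missing]. The surviving annotations on its terms are:
- Function approximator limitations
- Smoothness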
SLIDE 7
Practical algorithm
1. Second kernel averager to approximate the MDP model from data
2. Solve with the approximate robust Bellman operator [formula missing], with a robustness parameter [symbol missing]
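To make the idea of solving with a robust Bellman operator concrete, here is a minimal sketch of robust value iteration over a finite set of representative states. The slide's operator and its robustness parameter were lost in extraction, so this sketch swaps in a standard stand-in: the epsilon-contamination uncertainty set, where an adversary may move a fraction `eps` of the next-state probability mass onto the worst state. The function names, the parameter `eps` and the toy MDP are all assumptions made for illustration.

```python
def robust_backup(values, transitions, rewards, gamma, eps):
    """Apply the contamination-robust Bellman backup once and return Q[a][i].

    transitions[a][i][j]: nominal probability of going from state i to j under action a.
    rewards[a][i]: reward for taking action a in state i.
    """
    worst = min(values)  # the adversary sends its share of mass to the worst state
    q = []
    for p_a, r_a in zip(transitions, rewards):
        row = []
        for i in range(len(values)):
            nominal = sum(p * v for p, v in zip(p_a[i], values))
            row.append(r_a[i] + gamma * ((1.0 - eps) * nominal + eps * worst))
        q.append(row)
    return q


def robust_value_iteration(transitions, rewards, gamma=0.9, eps=0.1,
                           tol=1e-10, max_iter=100_000):
    """Iterate the robust backup to its fixed point; return (values, greedy policy)."""
    n_states = len(rewards[0])
    values = [0.0] * n_states
    for _ in range(max_iter):
        q = robust_backup(values, transitions, rewards, gamma, eps)
        new_values = [max(q_a[i] for q_a in q) for i in range(n_states)]
        delta = max(abs(n - v) for n, v in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    q = robust_backup(values, transitions, rewards, gamma, eps)
    policy = [max(range(len(q)), key=lambda a: q[a][i]) for i in range(n_states)]
    return values, policy


# Toy MDP with two states: action 0 stays in place, action 1 switches state.
# Only staying in state 0 is rewarded.
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
rewards = [
    [1.0, 0.0],  # action 0
    [0.0, 0.0],  # action 1
]

nominal_values, nominal_policy = robust_value_iteration(transitions, rewards, eps=0.0)
robust_values, robust_policy = robust_value_iteration(transitions, rewards, eps=0.1)
print(nominal_values, nominal_policy)
print(robust_values, robust_policy)
```

Setting `eps=0.0` recovers ordinary value iteration. Any positive `eps` can only lower the values, since the worst-case term is never larger than the nominal expectation. In a kernel-based pipeline, `transitions` and `rewards` would themselves come from the second kernel averager fitted to sampled data rather than being written by hand.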
SLIDE 8
Experiments: Acrobot
SLIDE 9
Acrobot
SLIDE 10
Experiments: Double Pole Balancing
SLIDE 11
Double Pole Balancing
SLIDE 12
Conclusion
- Theoretical performance guarantees for robust kernel-based reinforcement learning
- Significant empirical benefits from robustness, even stronger with model mismatch (real-world settings)
SLIDE 13