Lipschitz Continuity in Model-based Reinforcement Learning
Kavosh Asadi*, Dipendra Misra*, Michael L. Littman
* denotes equal contribution
Model-based RL loop: acting in the environment produces experience; model learning fits an approximate model T̂ of the true transition function T to that experience; planning with the model improves the value function/policy, which in turn drives acting.
[Video: rollouts from the ground-truth environment vs. a learned model, illustrating how one-step prediction errors compound over long rollouts.]
credit to Matt Cooper for the video github.com/dyelax
[Talvitie 2014, Venkatraman et al. 2015]
Definition (Lipschitz continuity): given metric spaces (M1, d1) and (M2, d2), a function f: M1 → M2 is Lipschitz continuous if its Lipschitz constant

    K(f) := sup_{s1 ∈ M1, s2 ∈ M1} d2(f(s1), f(s2)) / d1(s1, s2)

is finite.
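A minimal numpy sketch (ours, not from the paper) of lower-bounding K(f) empirically: take the supremum of the ratio d2(f(s1), f(s2)) / d1(s1, s2) over sampled pairs, using Euclidean distance for both metrics.

```python
import numpy as np

def empirical_lipschitz(f, samples):
    """Lower-bound K(f) = sup d2(f(s1), f(s2)) / d1(s1, s2)
    by maximizing the ratio over all sampled pairs."""
    K = 0.0
    n = len(samples)
    for i in range(n):
        for j in range(i + 1, n):
            num = np.linalg.norm(f(samples[i]) - f(samples[j]))
            den = np.linalg.norm(samples[i] - samples[j])
            if den > 0:
                K = max(K, num / den)
    return K

# Sanity check: a linear map f(s) = A s has Lipschitz constant equal to
# the largest singular value of A (here 3.0); the pairwise estimate
# approaches it from below.
A = np.diag([3.0, 1.0])
rng = np.random.default_rng(0)
pts = rng.standard_normal((200, 2))
est = empirical_lipschitz(lambda s: A @ s, pts)
```

Pairwise sampling only ever gives a lower bound on the true constant, which is why the theory side of the talk works with an analytic bound instead.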
Definition (Wasserstein metric): for distributions µ1 and µ2 over (M1, d1),

    W(µ1, µ2) := inf_{j ∈ Λ} ∫∫ j(s1, s2) d1(s1, s2) ds2 ds1,

where Λ is the set of couplings, i.e., joint distributions j over M1 × M1 whose marginals are µ1 and µ2. [Villani, 2008]
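In one dimension the Wasserstein metric has a closed form: the optimal coupling matches sorted samples, so W1 between equal-size empirical distributions is the mean absolute difference of order statistics. A minimal numpy sketch (names are ours, not the paper's):

```python
import numpy as np

def wasserstein_1d(x, y):
    """W1 between two equal-size empirical 1-D distributions.

    In 1-D the optimal coupling pairs sorted samples, so
    W1 = mean |x_(i) - y_(i)| over the order statistics."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert len(x) == len(y), "equal-size empirical distributions only"
    return np.mean(np.abs(x - y))

# Shifting a distribution by c moves it Wasserstein distance exactly c,
# which is the "earth mover" intuition.
x = np.array([0.0, 1.0, 2.0, 3.0])
print(wasserstein_1d(x, x + 0.5))  # → 0.5
```

This metric-aware behavior (distance scales with how far mass moves, not just whether supports overlap) is exactly why the paper measures model error in Wasserstein rather than, say, total variation.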
Compounding model error over an n-step rollout: if the one-step model error is Δ = sup_s W(T̂(·|s), T(·|s)) and the model has Lipschitz constant K̄, then

    W(T̂ⁿ(·|s), Tⁿ(·|s)) ≤ Δ · Σ_{i=0}^{n−1} K̄^i,

so the rollout error is governed by a geometric sum in K̄ rather than growing without structure in n.
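The n-step compounding-error bound is, up to the paper's constants, a geometric sum Δ · Σ_{i=0}^{n−1} K̄^i in the model's Lipschitz constant K̄; a tiny sketch showing how it behaves:

```python
def rollout_error_bound(delta, K, n):
    """Geometric-sum bound delta * sum_{i=0}^{n-1} K^i on n-step
    rollout error: linear in n when K = 1, capped at delta/(1-K)
    for K < 1."""
    return delta * sum(K**i for i in range(n))

print(rollout_error_bound(0.1, 1.0, 10))  # → 1.0 (linear growth)
print(rollout_error_bound(0.1, 0.5, 10))  # bounded by 0.1/(1-0.5) = 0.2
```

This is the quantitative reason the paper wants models with a small Lipschitz constant: for K̄ < 1 the bound stays finite no matter how long the rollout.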
Value-error bound: let K(R) be the Lipschitz constant of the reward function, K̄ the Lipschitz constant of the learned model T̂(s), and Δ the one-step Wasserstein model error. Then, for γK̄ < 1, the value function/policy computed with the learned model satisfies

    |V(s) − V̂(s)| ≤ γ K(R) Δ / ((1 − γ)(1 − γK̄)).
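Assuming a value-error bound of the form γ·K(R)·Δ / ((1 − γ)(1 − γK̄)) (our reading of the paper's theorem; the exact constants may differ), plugging in numbers shows the bound blowing up as γK̄ approaches 1:

```python
def value_error_bound(gamma, K_R, delta, K_bar):
    """Bound on |V - V_hat| under the assumed form above; only valid
    while gamma * K_bar < 1."""
    assert gamma * K_bar < 1, "bound requires gamma * K_bar < 1"
    return gamma * K_R * delta / ((1 - gamma) * (1 - gamma * K_bar))

print(value_error_bound(0.9, 1.0, 0.1, 0.5))   # modest error
print(value_error_bound(0.9, 1.0, 0.1, 1.05))  # gamma*K_bar = 0.945: far larger
```

Note K̄ itself can exceed 1 as long as γK̄ < 1, so a moderately discounted agent tolerates a mildly expansive model.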
Planning with the learned model can be cast as generalized value iteration, which converges when the backup is a Lipschitz operator with constant less than one, i.e., a contraction [Littman and Szepesvári, 1996].
To control the Lipschitz constant of a neural model: for each layer, constrain the weights to lie in a desired norm ball. The Lipschitz constant of the entire network is then bounded by the product of the Lipschitz constants of its layers.
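One way to realize a per-layer norm-ball constraint (a sketch, not necessarily the paper's exact training procedure) is to rescale each weight matrix so its spectral norm, i.e. its Lipschitz constant as a linear map, stays below a chosen k; with 1-Lipschitz activations such as ReLU, the whole network's constant is then at most the product of the per-layer k's:

```python
import numpy as np

def project_to_norm_ball(W, k):
    """Rescale W so its spectral norm (largest singular value, its
    Lipschitz constant as a linear map) is at most k."""
    norm = np.linalg.norm(W, 2)  # ord=2 on a matrix: largest singular value
    return W if norm <= k else W * (k / norm)

# Two layers each projected into the k = 1.5 ball; with 1-Lipschitz
# activations between them the net's Lipschitz constant is at most
# 1.5 * 1.5 = 2.25.
rng = np.random.default_rng(0)
W1 = project_to_norm_ball(rng.standard_normal((16, 8)), 1.5)
W2 = project_to_norm_ball(rng.standard_normal((4, 16)), 1.5)
net_bound = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)  # ≤ 2.25
```

In training, the projection step would be applied after each gradient update so the constraint holds throughout optimization.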
[Plot: average return per episode vs. number of samples, comparing models trained under different Lipschitz constraints; the best performance is obtained by an intermediate Lipschitz value.] More experiments (including on stochastic domains) are in the paper.
References
★ Littman and Szepesvári, "A Generalized Reinforcement-Learning Model: Convergence and Applications", 1996
★ Villani, "Optimal Transport: Old and New", 2008
★ Talvitie, "Model Regularization for Stable Sample Rollouts", 2014
★ Venkatraman, Hebert, and Bagnell, "Improving Multi-Step Prediction of Learned Time Series Models", 2015