SLIDE 6: Balancing short- vs. long-term accuracy
$$
\underbrace{\bigl|R_\pi - \hat{R}_\pi\bigr|}_{\text{total return error}}
\;\le\;
\underbrace{K_r \sum_{t=0}^{T} \gamma^{t} \sum_{t'=0}^{t-1} K_T^{\,t-t'-1}\, \epsilon_T(t')}_{\text{error due to state estimation}}
\;+\;
\underbrace{\sum_{t=0}^{T} \gamma^{t}\, \epsilon_r(t)}_{\text{error due to reward estimation}}
$$

- $K_{T/r}$ — Lipschitz constants of the transition/reward functions
- $\epsilon_{T/r}(t)$ — bound on the model error for the transition/reward at time $t$
- $T$ — time horizon
- $\gamma$ — reward discount factor
- $R_\pi \equiv \sum_{t=0}^{T} \gamma^{t} r(t)$ — return over the entire trajectory
Closely related to the bound in: Asadi, Misra, Littman, "Lipschitz Continuity in Model-based Reinforcement Learning" (ICML 2018).
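The bound above can be evaluated numerically for given constants and per-step error sequences. The sketch below is illustrative only: the function name and the constant-error example values are assumptions, not from the source; the symbols (`K_T`, `K_r`, `gamma`, `eps_T`, `eps_r`) follow the slide's legend.

```python
def return_error_bound(K_T, K_r, gamma, eps_T, eps_r):
    """Upper bound on the total return error |R - R_hat|.

    First term: reward-Lipschitz constant K_r times discounted,
    compounded state-estimation errors (each transition error at time
    t' is amplified by K_T for every subsequent step up to t).
    Second term: discounted sum of reward-estimation errors.
    """
    T = len(eps_r)  # time horizon
    state_term = 0.0
    for t in range(T):
        # Inner sum is empty at t = 0: no state error has accumulated yet.
        inner = sum(K_T ** (t - tp - 1) * eps_T[tp] for tp in range(t))
        state_term += gamma ** t * inner
    state_term *= K_r
    reward_term = sum(gamma ** t * eps_r[t] for t in range(T))
    return state_term + reward_term

# Hypothetical example: constant per-step model errors over a 10-step horizon.
bound = return_error_bound(K_T=1.1, K_r=2.0, gamma=0.9,
                           eps_T=[0.01] * 10, eps_r=[0.01] * 10)
print(bound)
```

Note how the trade-off in the slide title shows up here: with $K_T > 1$, early transition errors compound geometrically over the horizon, so long-horizon accuracy is dominated by the state-estimation term even when per-step errors are small.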