SLIDE 1
A Basic RL Model: Markov Decision Process
- States: ; Actions:
- Reward:
- State transition:
- Policy:
- Optimal policy & value:
- optimal policy :
Effective Horizon: random
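As a rough illustration of the ingredients listed on this slide (states, actions, reward, state transition, policy, optimal policy and value), the following minimal sketch runs value iteration on a two-state toy problem. Everything concrete here is an assumption made for the example: the discounted setting, the discount factor `GAMMA`, the reward and transition tables, and the function names. None of these numbers come from the slides.

```python
# Toy MDP for illustration only; all numbers are made up.
GAMMA = 0.9                      # assumed discount factor
STATES = [0, 1]
ACTIONS = [0, 1]

# REWARD[s][a]: immediate reward for taking action a in state s.
REWARD = [[0.0, 0.5],
          [1.0, 0.0]]

# TRANSITION[s][a][t]: probability of moving to state t from (s, a).
TRANSITION = [
    [[1.0, 0.0], [0.2, 0.8]],
    [[0.0, 1.0], [0.9, 0.1]],
]


def q_from_v(v):
    """One-step lookahead: Q(s, a) = r(s, a) + gamma * E[v(next state)]."""
    return [
        [
            REWARD[s][a]
            + GAMMA * sum(TRANSITION[s][a][t] * v[t] for t in STATES)
            for a in ACTIONS
        ]
        for s in STATES
    ]


def value_iteration(tol=1e-10, max_iters=10_000):
    """Iterate v <- max_a Q(s, a) until successive iterates differ by < tol."""
    v = [0.0 for _ in STATES]
    for _ in range(max_iters):
        q = q_from_v(v)
        new_v = [max(q[s]) for s in STATES]
        converged = max(abs(new_v[s] - v[s]) for s in STATES) < tol
        v = new_v
        if converged:
            break
    q = q_from_v(v)
    # Greedy policy with respect to the final Q-values.
    policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in STATES]
    return v, policy


values, policy = value_iteration()
print(values, policy)
```

In this toy problem the greedy policy moves from state 0 toward state 1 and then stays there, because state 1 pays reward 1 on every step. With a discount factor of 0.9, rewards more than roughly 1/(1 - 0.9) = 10 steps ahead contribute little, which is one common way to read an "effective horizon" in the discounted setting.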
Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
Lin F. Yang, Mengdi Wang
Too many states for most cases …
|S| = 3^361; |S| ≥ 256^(256×240)
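To get a feel for these state counts, the short sketch below counts their decimal digits. It reads the two figures as 3^361 (three possible values at each of 19×19 = 361 board points) and 256^(256×240) (256 possible values at each of 256×240 pixels). That reading is an assumption about the flattened superscripts, and the variable names are illustrative.

```python
import math

# Assumed reading: 3 values per point on a 19 x 19 board.
board_points = 19 * 19
board_states = 3 ** board_points
board_digits = len(str(board_states))

# Assumed reading: 256 values per pixel on a 256 x 240 screen.
# The number itself is huge, so count its digits via logarithms.
pixel_count = 256 * 240
screen_digits = math.floor(pixel_count * math.log10(256)) + 1

print("digits in 3^361:", board_digits)
print("digits in 256^(256*240):", screen_digits)
```

Either count is far too large to enumerate state by state, which is the motivation for reducing dimensions by exploiting structure.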
How to optimally reduce dimensions? Exploiting structures!
[Figure labels: Known, Unknown; values: 0.2, 0.11, 0.3, 0.5, 0.01]
V_w(s) ≔ max_{a∈A} Q_w(s, a)
P(⋅|s_1, a_1), P(⋅|s_2, a_2), P(⋅|s_3, a_3), P(⋅|s_4, a_4), P(⋅|s_5, a_5), P(⋅|s_6, a_6), P(⋅|s, a)
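The definition of a value as a maximum over actions of a parametric Q-function can be sketched in a few lines. The specific form used here, Q(s, a) = r(s, a) + γ·⟨φ(s, a), w⟩ with a feature vector φ and parameter vector w, is an assumption chosen to match the "linearly additive features" in the talk title. The feature map, rewards, weights, and function names below are all hypothetical.

```python
GAMMA = 0.9  # assumed discount factor


def q_value(reward, features, w, gamma=GAMMA):
    """Assumed parametric form: Q(s, a) = r(s, a) + gamma * <phi(s, a), w>."""
    return reward + gamma * sum(f * wk for f, wk in zip(features, w))


def greedy_value(state, actions, reward_fn, feature_fn, w):
    """Value of a state: the largest parametric Q-value over all actions."""
    return max(
        q_value(reward_fn(state, a), feature_fn(state, a), w)
        for a in actions
    )


# Hypothetical two-action example with two-dimensional features.
def toy_reward(state, action):
    return 0.5 if action == 0 else 0.0


def toy_features(state, action):
    return [1.0, 0.0] if action == 0 else [0.0, 1.0]


w = [1.0, 2.0]
value = greedy_value(0, [0, 1], toy_reward, toy_features, w)
print(value)
```

The point of a form like this is that only the low-dimensional vector `w` has to be learned, not one value per state, so the amount to estimate scales with the number of features and not with the size of the state space.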