Sample-Optimal Parametric Q-Learning Using Linearly Additive Features - PowerPoint PPT Presentation



SLIDE 1

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features

Lin F. Yang, Mengdi Wang

SLIDE 2

A Basic RL Model: Markov Decision Process

  • States: S; Actions: A
  • Reward: r(s, a) ∈ [0, 1]
  • State transition: s′ ∼ P(· | s, a)
  • Policy: π : S → A
  • Optimal policy & value: V*(s) = max_π V^π(s)
  • Optimal policy: π*(s) = argmax_a Q*(s, a)

Effective horizon: the discount γ acts like a random horizon H ∼ Geometric(1 − γ), so E[H] = 1/(1 − γ).
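The MDP ingredients above can be made concrete with a small numerical sketch; the two-state chain and all of its numbers below are invented for illustration, and value iteration (a standard method, not specific to this talk) computes V* and π*:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; all numbers are invented.
gamma = 0.9                      # discount; effective horizon ~ 1/(1-gamma) = 10
P = np.array([                   # P[a, s, s'] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],    # action 0
    [[0.5, 0.5], [0.0, 1.0]],    # action 1
])
r = np.array([[0.0, 1.0],        # r[a, s] = reward for taking action a in state s
              [0.2, 0.5]])

# Value iteration: V(s) <- max_a [ r(s,a) + gamma * E_{s'~P(.|s,a)} V(s') ]
V = np.zeros(2)
for _ in range(500):
    Q = r + gamma * (P @ V)      # Q[a, s]
    V = Q.max(axis=0)
pi_star = Q.argmax(axis=0)       # greedy/optimal policy: one action per state
```

After convergence V satisfies the Bellman optimality equation to numerical precision, and π* is the greedy policy with respect to it.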

SLIDE 3
  • Optimal sample complexity (with a generative model): Θ̃(|S||A| / ((1 − γ)³ ε²))

Too many states for most cases …

Go: |S| = 3^361    Atari: |S| ≥ 256^(256×240)

How to optimally reduce dimensions? Exploiting structures!

Curse of Dimensionality
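For scale, the two state counts above can be checked in a couple of lines (using the usual interpretations: 3 markings on 361 Go board points, and 256 pixel values on a 256×240 Atari screen):

```python
import math

log10_go = 361 * math.log10(3)              # |S| = 3^361: about 172 digits
log10_atari = 256 * 240 * math.log10(256)   # |S| = 256^(256*240)
print(round(log10_go), round(log10_atari))  # number of decimal digits, roughly
```

Both counts dwarf anything a tabular method could enumerate, which is the point of the slide.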

SLIDE 4

Parametric Q-Learning On Feature-Based MDP

  • Transition is decomposable: P(s′ | s, a) = Σ_{k=1..L} φ_k(s, a) ψ_k(s′)

In matrix form, P ∈ ℝ^{(S×A)×S} factors as P = Φ Ψ

Φ: known (features)    Ψ: unknown
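A minimal numpy sketch of this factorization (all dimensions and random features are invented toy choices): rows of Ψ are L basis distributions over next states, and φ(s, a) mixes them; drawing φ from the probability simplex is just one convenient way to keep P stochastic.

```python
import numpy as np

rng = np.random.default_rng(0)
L, nS, nA = 3, 5, 2                             # feature dim, states, actions (toy)

# Unknown part Psi: L basis distributions over next states (rows sum to 1).
Psi = rng.dirichlet(np.ones(nS), size=L)        # shape (L, nS)
# Known part Phi: one feature vector phi(s,a) per state-action pair.
Phi = rng.dirichlet(np.ones(L), size=nS * nA)   # shape (nS*nA, L)

P = Phi @ Psi                                   # decomposable transition matrix
assert np.allclose(P.sum(axis=1), 1.0)          # each P(.|s,a) is a distribution
assert np.linalg.matrix_rank(P) <= L            # rank bounded by feature dimension
```

The rank bound is what the algorithm exploits: the |S|-dimensional next-state distribution is determined by only L numbers per state-action pair.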

SLIDE 5

Parametric Q-Learning On Feature-Based MDP

  • Transition is decomposable
SLIDE 6

Parametric Q-Learning On Feature-Based MDP

(Figure: example next-state probabilities 0.2, 0.11, 0.3, 0.5, 0.01 illustrating a feature-based transition.)

SLIDE 7

A Simple Regression Based Algorithm

  • Generative model: we are able to sample s′ ∼ P(· | s, a) from any (s, a)
  • Learn 𝑥 with modified Q-learning

Represent the Q-function with parameter x ∈ ℝᴸ:

Q_x(s, a) := r(s, a) + γ · φ(s, a)ᵀ x

V_x(s) := max_{a∈A} Q_x(s, a)

π_x(s) := argmax_{a∈A} Q_x(s, a)

Sample complexity (L: feature dimension): Õ(L / (ε² (1 − γ)⁷))
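The regression-based idea can be sketched as follows. This is my own simplified Monte-Carlo version, not the paper's exact procedure (the sample count, iteration count, and plain least-squares solve are invented): estimate the Bellman backup of V_x at every (s, a) from the generative model, then regress the targets onto the features to update x.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, L, nS, nA = 0.9, 3, 5, 2

# Ground-truth decomposable model to sample from (invented toy instance).
Psi = rng.dirichlet(np.ones(nS), size=L)        # unknown basis distributions
Phi = rng.dirichlet(np.ones(L), size=nS * nA)   # known features, row = phi(s,a)
P = Phi @ Psi                                   # true transitions, (nS*nA, nS)
r = rng.uniform(0, 1, size=nS * nA)             # rewards, one per (s,a) pair

def V_of(x):
    """V_x(s) = max_a Q_x(s,a), with Q_x(s,a) = r(s,a) + gamma * phi(s,a)^T x."""
    return (r + gamma * Phi @ x).reshape(nS, nA).max(axis=1)

x = np.zeros(L)
n = 2000                                        # generative-model samples per (s,a)
for _ in range(50):
    V = V_of(x)
    # Monte-Carlo estimate of E[V_x(s') | s, a] for every state-action pair.
    targets = np.array([V[rng.choice(nS, size=n, p=P[i])].mean()
                        for i in range(nS * nA)])
    # Regression step: fit phi(s,a)^T x to the estimated backup targets.
    x, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
```

At convergence, φ(s, a)ᵀx ≈ E[V_x(s′) | s, a] up to sampling noise, which is exactly what makes Q_x an approximate parametric fixed point of the Bellman operator under the linear model.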

SLIDE 8

Sample Optimality?

  • Anchor condition: every transition distribution lies in the convex hull of the distributions at a set of anchor state-action pairs:

P(· | s, a) ∈ conv{ P(· | s₁, a₁), …, P(· | s_L, a_L) }

Sample complexity: Θ̃(L / (ε² (1 − γ)³))

ArXiv: 1902.04779. Poster: 117
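The anchor condition is a convex-hull statement, so it can be checked numerically. A minimal sketch using nonnegative least squares (`scipy.optimize.nnls`), where the anchor distributions and the mixing weights are invented toy values:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
nS, L = 6, 3

# L anchor transition distributions P(. | s_k, a_k) (invented toy numbers).
anchors = rng.dirichlet(np.ones(nS), size=L)    # shape (L, nS)

# Anchor condition: every P(.|s,a) is a convex combination of the anchors.
lam = np.array([0.5, 0.3, 0.2])                 # convex weights, sum to 1
p_sa = lam @ anchors                            # transition row built from anchors

# Membership check: nonnegative weights, near-zero residual, weights sum to 1.
w, resid = nnls(anchors.T, p_sa)
assert resid < 1e-8
assert abs(w.sum() - 1.0) < 1e-6
```

Since the anchors and p_sa are all probability distributions, any exact nonnegative combination automatically has weights summing to one, so a near-zero residual alone certifies convex-hull membership.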