SLIDE 1

Experiments in Value Function Approximation with Sparse Support Vector Regression

Tobias Jung and Thomas Uthmann

{tjung,uthmann}@informatik.uni-mainz.de

Fachbereich Mathematik & Informatik, Johannes Gutenberg-Universität Mainz, Germany

SLIDE 2

Why SVR?

• Why it might be a good idea: SVR is a current state-of-the-art regression technique; it promises good generalization and no "forgetting", compared to local instance-based methods (e.g. LWR).

• Why it might be a bad idea: conceptual and implementation issues. SVR is a batch learner, but on-line RL needs to add new samples to the current training sequence and modify (update) existing ones.

SLIDE 3

Contents

What is this talk about?

1. Value Function Approximation: Reinforcement Learning, Temporal-Difference Learning, Function Approximation and TD(0)

2. Support Vector Regression: formulating the QP, sparse approximation, the reduced problem, on-line selection of the subset (based on Engel, Mannor and Meir (2002))

3. Experiments: Gridworld, Mountain Car

4. Summary and Future Ideas

[Diagram: RL produces a list of (states, values), a very big training set; it is reduced to a very small sparsified set, which is used to update (add to) the sparse SVR regressor.]

SLIDE 4

Reinforcement Learning I

A Markov Decision Process consists of:

States S = {s1, . . . , sN}

Actions A = {a1, . . . , aM}

Reward model: R^a(s, s′)

Transition probabilities (Markov): P^a(s, s′)

Agent-environment loop: at each step t = 0, 1, 2, . . . the agent in state s_t takes action a_t, receives reward r_t, and the environment moves to state s_{t+1}.

Catch: usually these are not given. In RL the learner does not know P^a(s, s′), R^a(s, s′).

Objective: choose actions to maximize long-term reward.
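To make the objects above concrete, here is a minimal sketch (toy sizes and random numbers are assumptions, not taken from the talk) of an MDP stored as explicit arrays, together with one simulated agent-environment step.

```python
import numpy as np

N, M = 5, 2                                    # |S| = N states, |A| = M actions
rng = np.random.default_rng(0)
P = rng.random((M, N, N))
P /= P.sum(axis=2, keepdims=True)              # P[a, s, s'] = P^a(s, s'), rows sum to 1
R = rng.standard_normal((M, N, N))             # R[a, s, s'] = R^a(s, s')

s, a = 0, 1                                    # current state s_t and chosen action a_t
s_next = rng.choice(N, p=P[a, s])              # environment draws s_{t+1} ~ P^a(s, .)
r = R[a, s, s_next]                            # reward r_t for the observed transition
```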

SLIDE 5

Reinforcement Learning II

Criterion: infinite-horizon expected total discounted reward. How do we get there?

Policy (deterministic, stationary): π : S → A

Value function (γ = discount rate):

V^π(s) = E_π{ Σ_{k=0}^∞ γ^k r_k | s_0 = s }, ∀s

Bellman says:

V^π(s) = Σ_{s′} P^{π(s)}(s, s′) [ R^{π(s)}(s, s′) + γ V^π(s′) ], ∀s

Goal: the optimal policy π* = argmax_π V^π and the optimal value function V*(s) = max_π V^π(s), ∀s.

Many ways to solve it: methods based on Policy Iteration (e.g. Optimistic PI, Actor-Critic) and methods based on Value Iteration (e.g. Q-learning).
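Since the Bellman equation above is linear in V^π once π is fixed, a small MDP can be evaluated exactly by solving the corresponding linear system. The sketch below is a hedged illustration on an assumed toy MDP, not the authors' code.

```python
import numpy as np

def evaluate_policy(P, R, pi, gamma=0.95):
    """P[a, s, s'] = P^a(s, s'), R[a, s, s'] = R^a(s, s'), pi[s] = action index."""
    N = P.shape[1]
    P_pi = P[pi, np.arange(N), :]                        # P^{pi(s)}(s, .)
    r_pi = (P_pi * R[pi, np.arange(N), :]).sum(axis=1)   # expected one-step reward
    # Bellman: V = r_pi + gamma * P_pi V  <=>  (I - gamma * P_pi) V = r_pi
    return np.linalg.solve(np.eye(N) - gamma * P_pi, r_pi)

rng = np.random.default_rng(0)
P = rng.random((2, 5, 5)); P /= P.sum(axis=2, keepdims=True)
R = rng.standard_normal((2, 5, 5))
V_pi = evaluate_policy(P, R, pi=np.zeros(5, dtype=int))  # value of "always action 0"
```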

SLIDE 6

Reinforcement Learning III

Two algorithms for policy evaluation:

Dynamic Programming style (model-based, use fixed policy π):

V_{t+1}(s) = V_t(s) + [ Σ_{s′} P^{π(s)}(s, s′) ( R^{π(s)}(s, s′) + γ V_t(s′) ) − V_t(s) ]

where the sum is the target.

Temporal-Difference style (model-free, use observed reward r_t and next state s′ obtained by following π):

V_{t+1}(s) = V_t(s) + α [ r_t + γ V_t(s′) − V_t(s) ]

where r_t + γ V_t(s′) is the target (an unbiased estimate).

Memory-based Function Approximation: basically, works by storing (state, value) samples in a list. Update: add a new instance whenever the current state is far from the rest, else adjust the target of a nearby state. Query: build a (local) approximation.
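A minimal TD(0) sketch for the temporal-difference update shown above (tabular values and an assumed toy transition list; the talk itself stores samples and fits a regressor instead of a table):

```python
import numpy as np

def td0(V, transitions, alpha=0.1, gamma=0.95):
    """V: array of value estimates; transitions: iterable of (s, r, s_next) under pi."""
    for s, r, s_next in transitions:
        target = r + gamma * V[s_next]          # unbiased sample of the backup
        V[s] += alpha * (target - V[s])         # move V(s) a step toward the target
    return V

V = np.zeros(5)
V = td0(V, [(0, 1.0, 1), (1, 0.0, 2), (2, -1.0, 0)])
```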

SLIDE 7

Recall SVR ...

Objective: given data {(x_i, y_i)}_{i=1}^ℓ, in ε-SVR we solve (bias absorbed):

min_{α, α* ∈ ℝ^ℓ}  (1/2) (α* − α)ᵀ K (α* − α) + ε (α* + α)ᵀ e − (α* − α)ᵀ y
s.t.  0 ≤ α, α* ≤ Ce

Final regressor:

f(x) = (α* − α)ᵀ k(x)

where k(·, ·) is a symmetric positive definite function (kernel), K ∈ ℝ^{ℓ×ℓ} is the kernel matrix with [K]_{ij} = k(x_i, x_j), k(x) ∈ ℝ^ℓ with k(x) = ( k(x_1, x), . . . , k(x_ℓ, x) )ᵀ, and C ∈ ℝ_{≥0} is the regularization parameter.

Our problem: complexity scales superlinearly with the number of data.
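The talk solves this QP with its own machinery; as a hedged, off-the-shelf stand-in, scikit-learn's SVR fits the same ε-insensitive model on toy data (the data and hyperparameters below are assumptions, not the talk's settings):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(200, 1))
y = np.sinc(X[:, 0] / np.pi) + 0.05 * rng.standard_normal(200)   # sin(x)/x plus noise

# RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2); C and epsilon play the roles above
model = SVR(kernel="rbf", gamma=0.5, C=10.0, epsilon=0.01).fit(X, y)
print("support vectors:", model.support_vectors_.shape[0], "of", len(X))
```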

SLIDE 8

Support Vector Regression II

Recall the Representer Theorem: every solution f ∈ H (an RKHS) to

min_{f ∈ H}  (1/ℓ) Σ_i c(x_i, y_i, f(x_i)) + Λ ‖f‖_H

admits a representation

f(x) = Σ_{i=1}^ℓ β_i k(x_i, x)

⇒ the solution lies in a subspace spanned by the k(x_i, ·) (the data!).

Observation: K's eigenvalues decay rapidly; many of them are very small.

⇒ this subspace can be approximated by just picking some of the k(x_i, ·).

Goal: reduce the number of coefficients β_i that we have to determine.
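A quick numerical illustration of the observation about K (assumed toy data and kernel width): for an RBF kernel, the trace of K tends to concentrate in a few leading eigenvalues, so a low-dimensional subspace already captures most of it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 0.5 ** 2))                        # RBF kernel matrix

eig = np.sort(np.linalg.eigvalsh(K))[::-1]              # eigenvalues, largest first
print("share of trace in top 20 eigenvalues:", eig[:20].sum() / eig.sum())
```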

SLIDE 9

Eliminate linear dependence

Assume we have picked the first m samples (for convenience marked by ˜·). Approximate the remaining ℓ − m ones (in H):

min_{a_i ∈ ℝ^m}  ‖ k(x_i, ·) − Σ_{j=1}^m a_{ij} k(x̃_j, ·) ‖²_H ,   i = m + 1, . . . , ℓ

. . . we obtain the coefficients:

a_i = K̃⁻¹ k̃(x_i)

where K̃ ∈ ℝ^{m×m} is the reduced kernel matrix with [K̃]_{ij} = k(x̃_i, x̃_j), and k̃(x_i) ∈ ℝ^m with k̃(x_i) = ( k(x̃_1, x_i), . . . , k(x̃_m, x_i) )ᵀ.

Define A ∈ ℝ^{ℓ×m} to be the matrix consisting of the rows a_iᵀ. Then

K ≈ A K̃ Aᵀ

Goal: we want to use the much smaller K̃ instead of the big K in our QP . . .
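A numpy sketch of this projection (assumed toy data and kernel width): row i of A holds a_i = K̃⁻¹ k̃(x_i), and the product A K̃ Aᵀ is then checked against the full K.

```python
import numpy as np

def rbf(Xa, Xb, sigma=0.5):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
Xr = X[:15]                                   # the first m samples x~_j

K     = rbf(X, X)                             # full l x l kernel matrix
K_red = rbf(Xr, Xr)                           # reduced m x m matrix K~
A     = rbf(X, Xr) @ np.linalg.inv(K_red)     # row i is a_i = K~^{-1} k~(x_i)

print("max |K - A K~ A^T|:", np.abs(K - A @ K_red @ A.T).max())
```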

SLIDE 10

Define a reduced problem

Consider: reduced variables α̃ = Aᵀα, α̃* = Aᵀα* (each in ℝ^m) and transformed target values ỹ = A†y.

Solving the QP in the reduced variables α̃, α̃* with the reduced set {(x̃_i, ỹ_i)}_{i=1}^m, we obtain the solution to the reduced problem

f̃(·) = Σ_{i=1}^m (α̃*_i − α̃_i) k(x̃_i, ·) = Σ_{i=1}^ℓ (α*_i − α_i) Σ_{j=1}^m a_{ij} k(x̃_j, ·) ≈ Σ_{i=1}^ℓ (α*_i − α_i) k(x_i, ·) = f(·)

which is approximately the one we would have obtained from the full problem.

Consequence: instead of {(x_i, y_i)}_{i=1}^ℓ, use the reduced data {(x̃_i, ỹ_i)}_{i=1}^m (usually m ≪ ℓ).
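A self-contained, hedged sketch of the reduced fit (toy targets assumed): form ỹ = A†y, determine only m coefficients on the reduced set, and evaluate the resulting f̃ on the full data. A plain kernel ridge solve stands in for the reduced SVR here.

```python
import numpy as np

def rbf(Xa, Xb, sigma=0.5):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])           # toy targets y_i
Xr = X[:20]                                             # reduced inputs x~_j

K_red = rbf(Xr, Xr)                                     # K~
A = rbf(X, Xr) @ np.linalg.inv(K_red)                   # rows a_i = K~^{-1} k~(x_i)
y_red = np.linalg.pinv(A) @ y                           # transformed targets y~ = A^+ y

beta = np.linalg.solve(K_red + 1e-8 * np.eye(len(Xr)), y_red)   # fit on the reduced set
f_tilde = rbf(X, Xr) @ beta                             # f~(x) = sum_j beta_j k(x~_j, x)
print("max |f~ - y| over the full set:", np.abs(f_tilde - y).max())
```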

SLIDE 11

How do we obtain the reduced set?

Goal: build {(x̃_i, ỹ_i)}_{i=1}^m in an on-line fashion (adapted from Engel et al. (2002)).

Parameter: choose TOL (approximation precision).

Bookkeeping: need {(x̃_i, ỹ_i)}_{i=1}^m, K̃⁻¹, (AᵀA)⁻¹, Aᵀy.

Start with an empty basis.

LOOP
  Get current sample (x_t, y_t). Compute distance d_t to the span of the current basis.
  IF d_t < TOL
    then k(x_t, ·) is approximated well enough; the size of the basis is unchanged. Recursively update (AᵀA)⁻¹, Aᵀy.
  ELSE
    Add x_t to the basis. Recursively update K̃⁻¹. Append to (AᵀA)⁻¹, Aᵀy.

How does it scale for large sample sizes? Efficient: memory and computational complexity is O(m²), and m is asymptotically independent of the total number of data.
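A hedged sketch of the selection loop above (in the spirit of Engel et al.'s approximate linear dependence test; the names and the brute-force K̃⁻¹ refresh are illustrative, and the slide's recursive (AᵀA)⁻¹, Aᵀy bookkeeping is omitted):

```python
import numpy as np

def rbf(a, b, sigma=0.5):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def build_dictionary(samples, sigma=0.5, tol=1e-3):
    basis, K_inv = [], None                    # the reduced set {x~_j} and K~^{-1}
    for x in samples:
        if not basis:
            basis.append(x)
            K_inv = np.array([[1.0 / rbf(x, x, sigma)]])
            continue
        k_vec = np.array([rbf(xb, x, sigma) for xb in basis])
        a = K_inv @ k_vec                      # projection coefficients a_t
        d = rbf(x, x, sigma) - k_vec @ a       # squared distance d_t to span of basis
        if d > tol:                            # ELSE branch: grow the basis
            basis.append(x)
            K = np.array([[rbf(bi, bj, sigma) for bj in basis] for bi in basis])
            K_inv = np.linalg.inv(K + 1e-10 * np.eye(len(basis)))  # rank-one update in practice
        # IF branch: basis unchanged, only the remaining bookkeeping would be updated
    return basis

rng = np.random.default_rng(0)
dictionary = build_dictionary(rng.uniform(-10, 10, size=(500, 2)))
print(len(dictionary), "basis points kept out of 500 samples")
```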

SLIDE 12

Toy Example: Sombrero

Approximating: sin x / x on x ∈ [−10, 10]². Training: randomly drawn samples, RBF kernel.
[Figure: sparse approximations of the sombrero function for two kernel widths: Sigma 0.2 with TOL 1e-6, and Sigma 0.04 with TOL 1e-6.]

SLIDE 13

Putting everything together ...

[Diagram: the RL component produces a list of states and values (s_i, Ṽ_t(s_i)) = {(x_i, y_i)}_{i=1}^ℓ (a very big list); sparse greedy approximation (efficient) reduces it to a sparsified training set {(x̃_i, ỹ_i)}_{i=1}^m (a very small list); the SVR is updated (add) on this set, giving the sparse regressor Ṽ_t(·) = Σ_i β_i k(x̃_i, ·).]
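A schematic, hedged sketch of this loop (all names are assumptions and the sparsifier is only a placeholder; it is not the authors' implementation): the RL side produces (state, target) pairs, the training set is reduced, and an SVR fitted on the reduced set becomes the next value estimate Ṽ.

```python
import numpy as np
from sklearn.svm import SVR

def sparsify(states, targets, m=100):
    # placeholder for the on-line selection and the y~ = A^+ y transformation;
    # here we simply keep the first m pairs
    return states[:m], targets[:m]

def value_update(transitions, V_old, gamma=0.99, sigma=0.5):
    """transitions: list of (s, r, s_next) with states as 1-D numpy arrays."""
    states  = np.array([s for s, _, _ in transitions])
    nexts   = np.array([sn for _, _, sn in transitions])
    rewards = np.array([r for _, r, _ in transitions])
    targets = rewards + gamma * V_old(nexts)              # TD-style regression targets
    X_red, y_red = sparsify(states, targets)
    svr = SVR(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), C=100.0, epsilon=0.01)
    svr.fit(X_red, y_red)
    return svr.predict                                    # the new value estimate V~(.)
```

Starting from V_old = lambda X: np.zeros(len(X)) and calling value_update after each batch of experience is one plausible reading of the diagram, not a statement of the talk's exact procedure.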

SLIDE 14

Experiment 1: Gridworld

Goal: test approximation quality in on-line RL (model-based).

SLIDE 15

Experiment 2a: Mountain Car

Goal: test approximation quality in on-line RL (model-based).
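The talk does not restate the task, so for reference here is the standard mountain-car dynamics with the usual constants from Sutton and Barto (a reward of −1 per step is the customary choice; none of this is quoted from the slides):

```python
import numpy as np

def mountain_car_step(pos, vel, action):
    """One step of the standard mountain-car dynamics; action is -1, 0 or +1."""
    vel = np.clip(vel + 0.001 * action - 0.0025 * np.cos(3 * pos), -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.5)
    if pos <= -1.2:                       # inelastic collision with the left wall
        vel = 0.0
    reward = -1.0                         # -1 per step until the goal is reached
    done = pos >= 0.5                     # goal at the top of the right hill
    return pos, vel, reward, done
```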

SLIDE 16

Experiment 2b: Mountain Car

Goal: compare performance with tile coding (CMAC, 10 × 10 × 10).

[Figure: Time-to-goal, CMAC vs. SVR: smoothed time-to-goal over 1000 trials for CMAC and for SVR with sigma 0.20, 0.10 and 0.05, with the optimal level marked.]

[Figure: Greedy policy, CMAC vs. SVR: smoothed reward of the greedy policy over 1000 trials for the same methods, with the optimal level marked.]

[Figure: Path of the greedy policy in the position-velocity plane for kernel widths 0.2 and 0.05, each compared with the DP solution.]

SLIDE 17

Conclusions

Summary: SVR with on-line RL is made possible by
1. memorizing states / values as in instance-based architectures (on-line),
2. building a sparsified training set (on-line),
3. solving a reduced problem.

Future work and some ideas: other learning mechanisms, e.g. policy iteration (batch updates to the value function); more difficult tasks; minor (and major!) algorithmic improvements. The sparsified training set could also be used in regularization networks or for placement of basis functions in RBF networks. Convergence?
