
Experiments in Value Function Approximation with Sparse Support Vector Regression

Experiments in Value Function Approximation with Sparse Support Vector Regression. Tobias Jung and Thomas Uthmann, {tjung,uthmann}@informatik.uni-mainz.de, Fachbereich Mathematik & Informatik, Johannes Gutenberg-Universität Mainz, Germany


  1. Experiments in Value Function Approximation with Sparse Support Vector Regression. Tobias Jung and Thomas Uthmann, {tjung,uthmann}@informatik.uni-mainz.de, Fachbereich Mathematik & Informatik, Johannes Gutenberg-Universität Mainz, Germany. Value Function Approximation with Sparse SVR – ECML 2004 – p. 1/17

  2. Why SVR?

  3. Contents. Diagram labels: RL; a very big list of update (states, values); Reduce; a very small training set; SVR; sparse regressor.

  4. Reinforcement Learning I. States $S = \{s_1, \dots, s_N\}$, actions $A = \{a_1, \dots, a_M\}$, transition probabilities $P_a(s, s')$, rewards $R_a(s, s')$. Diagram: at times $t = 0, 1, 2, \dots$ the Agent, in state $s_t$, sends action $a_t$ to the Environment, which returns reward $r_t$ and next state $s_{t+1}$ according to $P_a(s, s')$ and $R_a(s, s')$.

  5. Reinforcement Learning II. Policy $\pi : S \to A$, discount factor $\gamma$. Value of a policy: $V^\pi(s) = E_\pi\{\sum_{k=0}^{\infty} \gamma^k r_k \mid s_t = s, \pi\}$, $\forall s$. Equivalently: $V^\pi(s) = \sum_{s'} P_{\pi(s)}(s, s') \big[ R_{\pi(s)}(s, s') + \gamma V^\pi(s') \big]$, $\forall s$. Optimal value function and policy: $V^*(s) = \max_\pi V^\pi(s)$, $\forall s$, and $\pi^* = \operatorname{argmax}_\pi V^\pi$.

  6. Reinforcement Learning III. Update with the full expectation as target: $V_{t+1}(s) = V_t(s) + \Big[ \underbrace{\sum_{s'} P_{\pi(s)}(s, s') \big( R_{\pi(s)}(s, s') + \gamma V_t(s') \big)}_{\text{target}} - V_t(s) \Big]$. Update from a single observed transition: $V_{t+1}(s) = V_t(s) + \alpha \Big[ \underbrace{r_t + \gamma V_t(s')}_{\text{target (unbiased estimate)}} - V_t(s) \Big]$.
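
The two update rules on the "Reinforcement Learning III" slide can be compared on a toy problem. The following is a minimal sketch, not the authors' experiment: the 3-state chain, its transition and reward tables, the step size and the episode count are all made-up assumptions chosen only so the example runs quickly. `expected_update` uses the full sum over successor states as the target, while `td0` uses the sampled target r_t + γ V_t(s′) with step size α.

```python
import random

# Hypothetical 3-state chain under a fixed policy pi (illustrative numbers only).
# P[s][s2] stands for P_{pi(s)}(s, s2); R[s][s2] stands for R_{pi(s)}(s, s2).
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]   # state 2 is absorbing
R = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0],   # reward 1 on the transition 1 -> 2
     [0.0, 0.0, 0.0]]
GAMMA = 0.9
TERMINAL = 2


def expected_update(V):
    """One sweep of V_{t+1}(s) = sum_{s'} P(s,s') * (R(s,s') + gamma * V_t(s'))."""
    n = len(V)
    return [sum(P[s][s2] * (R[s][s2] + GAMMA * V[s2]) for s2 in range(n))
            for s in range(n)]


def policy_evaluation(tol=1e-10):
    """Repeat the expectation-based update until the values stop changing."""
    V = [0.0] * len(P)
    while True:
        V_new = expected_update(V)
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def td0(episodes=5000, alpha=0.01, seed=0):
    """Sample-based update V(s) += alpha * (r + gamma * V(s') - V(s))."""
    rng = random.Random(seed)
    V = [0.0] * len(P)
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            s2 = rng.choices(range(len(P)), weights=P[s])[0]  # sample s'
            r = R[s][s2]
            V[s] += alpha * (r + GAMMA * V[s2] - V[s])
            s = s2
    return V


if __name__ == "__main__":
    print("expectation-based:", policy_evaluation())
    print("TD(0) estimate:   ", td0())
```

The expectation-based version needs the model (P and R), whereas the sampled version only needs observed transitions, at the price of a noisy estimate that depends on α and on how many transitions are seen.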
