Function Approximation via Tile Coding: Automating Parameter Choice
Alexander Sherstov and Peter Stone
Department of Computer Sciences, The University of Texas at Austin
About the Authors
Alex Sherstov and Peter Stone
Thanks to Nick Jong for presenting!
Overview
- TD reinforcement learning
– Leading abstraction for decision making
– Uses function approximation to store value function
[Diagram: agent-environment loop; the agent emits an action, the environment applies transition function t and reward function r and returns a reward and new state, and the agent maintains the value function Q(s, a)]
- Existing methods
– Discretization, neural nets, radial basis, case-based, ...
[Santamaria et al., 1997]
– Trade-offs: representational power, time/space req’s, ease of use
Overview, cont.
- "Happy medium": tile coding
– Widely used in RL
[Stone and Sutton, 2001, Santamaria et al., 1997, Sutton, 1996].
– Use in robot soccer:
[Diagram: full soccer state → few continuous state variables (13) → sparse, coarse tile coding → huge binary feature vector F (about 400 1's and 40,000 0's) → linear map → action values]
Our Results
- We show that:
– Tile coding is parameter-sensitive
– Optimal parameterization depends on the problem and elapsed training time
- We contribute:
– An automated parameter-adjustment scheme
– Empirical validation
Background: Reinforcement Learning
- RL problem given by S, A, t, r:
– S, set of states;
– A, set of actions;
– t : S × A → Pr(S), transition function;
– r : S × A → R, reward function.
- Solution:
– policy π∗ : S → A that maximizes the return ∑_{i=0}^∞ γ^i r_i
– Q-learning: find π∗ by approximating optimal value function Q∗ : S × A → R
- Need FA to generalize Q∗ to unseen situations
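For concreteness, a minimal sketch of the tabular Q-learning backup (the standard algorithm; the hyperparameter values and helper names are illustrative assumptions, not from the talk):

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumed values, not from the talk).
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1

Q = defaultdict(float)  # Q[(s, a)] -> estimated return of taking a in s

def choose_action(s, actions):
    """Epsilon-greedy action selection over a finite action set."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_backup(s, a, reward, s_next, actions):
    """One Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = reward + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

With continuous actions, as in the testbed domain later, the table Q must be replaced by a function approximator; that is where tile coding enters.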
Background: Tile Coding
[Figure: two tilings, each a uniform grid partition over State Variable #1 and State Variable #2, offset from one another]
- Maintaining arbitrary f : D → R (often D = S × A):
– D partitioned into tiles, each with a weight
– Each partition is a tiling; several are used
– Given x ∈ D, sum weights of participating tiles ⇒ f(x)
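A minimal univariate tile coder along these lines (a sketch under our own naming and interface assumptions, not the paper's code):

```python
class TileCoder:
    """Univariate tile coding: t uniformly offset tilings of tile width w.

    f(x) is the sum of the weights of the one tile per tiling that
    contains x; an update spreads its error evenly across those tiles.
    """

    def __init__(self, n_tilings, tile_width, lo, hi):
        self.t = n_tilings
        self.w = tile_width
        self.lo = lo
        # One weight table per tiling; +2 tiles of slack for the offsets.
        n_tiles = int((hi - lo) / tile_width) + 2
        self.weights = [[0.0] * n_tiles for _ in range(n_tilings)]

    def _tiles(self, x):
        """Index of the active tile in each tiling (uniform offsets of w/t)."""
        return [int((x - self.lo + i * self.w / self.t) / self.w)
                for i in range(self.t)]

    def value(self, x):
        """f(x): sum of one weight per tiling."""
        return sum(self.weights[i][j] for i, j in enumerate(self._tiles(x)))

    def update(self, x, target, alpha=0.1):
        """Move f(x) toward target; each active tile absorbs 1/t of the step."""
        error = target - self.value(x)
        for i, j in enumerate(self._tiles(x)):
            self.weights[i][j] += alpha * error / self.t
```

Summing one weight per tiling is what makes an update at x generalize to nearby points that share tiles with x.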
Background: Tile Coding Parameters
- We study canonical univariate tile coding:
– w, tile width (same for all tiles)
– t, # of tilings ("generalization breadth")
– r = w/t, resolution
– tilings uniformly offset
- Empirical model:
– Fix resolution r, vary generalization breadth t
– Same resolution ⇒ same representational power, asymptotic performance
– But: t affects intermediate performance
– How to set t?
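Continuing the sketch above: fixing r while varying t means the tile width scales as w = r·t, so all settings share asymptotic representational power but generalize differently along the way (the values below are assumptions for illustration):

```python
r = 0.05  # fixed resolution (assumed value)
for t in (1, 3, 6):  # the generalization breadths compared in the talk
    coder = TileCoder(n_tilings=t, tile_width=r * t, lo=0.0, hi=1.0)
    coder.update(0.5, target=1.0)
    # Larger t -> wider tiles -> an update at 0.5 spills over to 0.55.
    print(t, round(coder.value(0.55), 3))  # prints 0.0, then increasing values
```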
Testbed Domain: Grid World
- Domain and optimal policy:
[Figure: grid world with start and goal cells, a wall, and an abyss; arrows show the optimal policy, with optimal p values per cell ranging from 0.5 to 0.8]
- Episodic task (cliff, goal cells terminal)
- Actions:
(d, p) ∈ {↑, ↓, →, ←} × [0, 1]
Testbed Domain, cont.
- Move succeeds w/ prob. F(p), random o/w;
F varies from cell to cell:
[Plot: success probability F(p) vs. p ∈ [0, 1]; each cell's F curve lies between 0.5 and 1]
- 2 reward functions:
– −100 cliff, +100 goal, −1 o/w ("informative")
– +100 goal, 0 o/w ("uninformative")
- Use of tile coding: generalize over actions (p)
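One way to realize this, reusing the TileCoder sketch from above, is a separate univariate coder over p for each (cell, direction) pair; this per-pair factoring and the parameter values are our assumptions about the setup:

```python
from collections import defaultdict

# One tile coder over p in [0, 1] per (cell, direction) pair (assumed layout).
q_approx = defaultdict(
    lambda: TileCoder(n_tilings=3, tile_width=0.15, lo=0.0, hi=1.0))

def q_value(cell, direction, p):
    return q_approx[(cell, direction)].value(p)

def q_update(cell, direction, p, target, alpha=0.1):
    """Backing up a target generalizes across nearby p values automatically."""
    q_approx[(cell, direction)].update(p, target, alpha)
```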
Generalization Helps Initially
[Plots: % optimal episodes completed vs. training episodes for 1, 3, and 6 tilings; left: informative reward (1,000 episodes), right: uninformative reward (4,000 episodes)]
Generalization improves cliff avoidance.
Generalization Helps Initially, cont.
[Plots: % optimal episodes completed over 50,000 episodes for 1, 3, and 6 tilings, at α = 0.5, α = 0.1, and α = 0.05]
Generalization improves discovery of better actions.
Generalization Hurts Eventually
[Plots: % optimal episodes completed (95–99%) over episodes 40,000–100,000 for 1, 3, and 6 tilings; left: informative reward, right: uninformative reward]
Generalization slows convergence.
Adaptive Generalization
- Best to adjust generalization over time
- Solution: reliability index ρ(s, a) ∈ [0, 1]
– ρ(s, a) ≈ 1 ⇒ Q(s, a) reliable (and vice versa)
– Large backup error on (s, a) decreases ρ(s, a) (and vice versa)
- Use of ρ(s, a):
– An update to Q(s, a) is generalized to the largest nearby region R that is unreliable on average: (1/|R|) ∑_{(s,a)∈R} ρ(s, a) ≤ 1/2
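A sketch of how ρ might be maintained and queried, under our own assumptions about the error-to-reliability mapping and the region shape (the paper's exact rules may differ):

```python
import numpy as np

class ReliabilityIndex:
    """Tracks rho(s, a) in [0, 1] over a discretized (state, action) grid.

    Large backup errors push rho toward 0 (unreliable), small errors
    toward 1; the specific decay rule below is an assumption.
    """

    def __init__(self, shape, error_scale=10.0, step=0.1):
        self.rho = np.zeros(shape)      # start fully unreliable
        self.error_scale = error_scale  # assumed error normalizer
        self.step = step                # assumed adaptation rate

    def record_backup(self, idx, backup_error):
        """Move rho(idx) toward a target that shrinks with |backup error|."""
        target = max(0.0, 1.0 - abs(backup_error) / self.error_scale)
        self.rho[idx] += self.step * (target - self.rho[idx])

    def generalization_radius(self, idx, max_radius):
        """Largest window around idx (along the last, action axis) whose
        mean rho is <= 1/2, i.e., unreliable on average."""
        best = 0
        for radius in range(1, max_radius + 1):
            lo = max(0, idx[-1] - radius)
            hi = min(self.rho.shape[-1], idx[-1] + radius + 1)
            if self.rho[idx[:-1] + (slice(lo, hi),)].mean() <= 0.5:
                best = radius
        return best
```

An update at (s, a) would then be spread over the returned window, e.g. by widening the effective generalization breadth around (s, a) to cover it.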
Effects of Adaptive Generalization
- Time-variant generalization
– Encourages generalization when Q(s, a) changing
– Suppresses generalization near convergence
- Space-variant generalization
– Rarely-visited states benefit from generalization for a longer time
Adaptive Generalization at Work
[Plots: % optimal episodes completed for the adaptive method vs. fixed 1, 3, and 6 tilings; left: episodes 0–1,000, right: later episodes through 100,000]
Adaptive generalization is better than any fixed setting.
Conclusions
- Precise empirical study of parameter choice in tile coding
- No single setting ideal for all problems, or even throughout the learning curve on the same problem
- Contributed an algorithm for adjusting parameters as needed in different regions of S × A (space-variant generalization) and at different learning stages (time-variant generalization)
- Showed superiority of this adaptive technique to any fixed setting
References
[Santamaria et al., 1997] Santamaria, J. C., Sutton, R. S., and Ram, A. (1997). Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior, 6(2):163–217.

[Stone and Sutton, 2001] Stone, P. and Sutton, R. S. (2001). Scaling reinforcement learning toward RoboCup soccer. In Proc. 18th International Conference on Machine Learning (ICML-01), pages 537–544. Morgan Kaufmann, San Francisco, CA.

[Sutton, 1996] Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems 8, pages 1038–1044, Cambridge, MA. MIT Press.