
Class 4: On-Policy Prediction with Approximation (Chapter 9, Sutton)



  1. Class 4: On-Policy Prediction with Approximation. Chapter 9, Sutton; slides after Sutton and Silver.

  2. Forms of approximation functions: a linear approximation, a neural network, a decision tree.

  3. The Prediction Objective. With approximation we can no longer hope to converge to the exact value for each state. We must specify a state weighting or distribution mu(s) >= 0 representing how much we care about the error in each state s. The objective is to minimize the Mean Squared Value Error,

     \overline{VE}(\mathbf{w}) = \sum_{s} \mu(s) \, [v_\pi(s) - \hat{v}(s, \mathbf{w})]^2,

     where mu(s) is the fraction of time spent in s, called the "on-policy distribution". The continuing case and the episodic case differ in how this distribution arises. Two caveats: it is not obvious that this is the right objective for RL (we want the value function only in order to generate a good policy), but it is what we use; and for a general function form there is no guarantee of convergence to the optimal w*.
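The objective is easy to evaluate when its ingredients are known. Below is a minimal sketch, assuming (hypothetically) that the on-policy distribution mu, the true values v_pi, and the approximate values v_hat are available as NumPy arrays indexed by state; in practice v_pi is unknown, and the chapter is about minimizing this objective from samples.

```python
import numpy as np

def mean_squared_value_error(mu, v_pi, v_hat):
    """VE(w) = sum_s mu(s) * [v_pi(s) - v_hat(s, w)]^2."""
    return float(np.sum(mu * (v_pi - v_hat) ** 2))
```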

  4. Stochastic-Gradient and Semi-Gradient Methods.

  5. General Stochastic Gradient Descent.

  6. (equation slide; no transcribed text)

  7. Gradient Monte Carlo Algorithm for Estimating v. We cannot perform the exact update (9.5) because v_pi(S_t) is unknown, but we can approximate it by substituting a target U_t in place of v_pi(S_t). This yields the general SGD method for state-value prediction:

     \mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \, [U_t - \hat{v}(S_t, \mathbf{w}_t)] \, \nabla \hat{v}(S_t, \mathbf{w}_t).

     With the Monte Carlo target U_t = G_t, this general SGD method converges to a locally optimal approximation.
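A minimal sketch of gradient Monte Carlo under stated assumptions: a linear approximator v_hat(s, w) = w @ x(s), so the gradient with respect to w is just x(s), and episodes supplied as lists of (state, return) pairs with the returns G_t already computed. The names and interfaces are illustrative, not from the slides.

```python
import numpy as np

def gradient_mc(episodes, x, d, alpha=0.01):
    """Gradient Monte Carlo for estimating v_pi with linear v_hat(s, w) = w @ x(s).

    episodes: iterable of episodes, each a list of (state, G_t) pairs.
    x:        feature map from a state to a length-d NumPy vector.
    """
    w = np.zeros(d)
    for episode in episodes:
        for s, G in episode:
            # SGD step toward the Monte Carlo target U_t = G_t;
            # for a linear v_hat the gradient w.r.t. w is x(s).
            w += alpha * (G - w @ x(s)) * x(s)
    return w
```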

  8. Semi-Gradient Methods. Replacing G_t with a bootstrapping target such as the TD(0) target or G_{t:t+n} does not guarantee convergence (except in special cases such as linear function approximation). Still, semi-gradient (bootstrapping) methods offer important advantages: they typically enable significantly faster learning without waiting for the end of an episode, which lets them be used online on continuing problems and provides computational advantages. A prototypical semi-gradient method is semi-gradient TD(0).
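A sketch of semi-gradient TD(0) with linear function approximation. The environment and policy interfaces below (reset/step, policy(state) -> action) are placeholder assumptions, not a specific library API.

```python
import numpy as np

def semi_gradient_td0(env, policy, x, d, alpha=0.01, gamma=1.0, n_episodes=100):
    """Semi-gradient TD(0) for estimating v_pi with linear v_hat(s, w) = w @ x(s).

    Assumes env.reset() -> state and env.step(action) ->
    (next_state, reward, done); both are hypothetical interfaces.
    """
    w = np.zeros(d)
    for _ in range(n_episodes):
        s = env.reset()
        done = False
        while not done:
            s2, r, done = env.step(policy(s))
            # Bootstrapped target; the terminal state is given value 0.
            target = r if done else r + gamma * (w @ x(s2))
            # "Semi"-gradient: only v_hat(S_t, w) is differentiated,
            # not the bootstrapped target, hence this update.
            w += alpha * (target - w @ x(s)) * x(s)
            s = s2
    return w
```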

  9. State Aggregation.
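State aggregation is the simplest form of function approximation: states are grouped, and all states in a group share a single estimated value. A sketch for the deck's 1000-state random walk example, binning states into 10 groups of 100 (the group sizes are an assumption):

```python
import numpy as np

def aggregate_features(state, n_states=1000, n_groups=10):
    """One-hot group feature: with a linear v_hat, every state in a
    group shares the single weight component belonging to that group."""
    x = np.zeros(n_groups)
    x[state * n_groups // n_states] = 1.0
    return x
```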

  10.-12. (figure slides; no transcribed text)

  13. Linear Methods. x(s) is a feature vector with the same dimensionality as w, and the approximate value is the inner product v_hat(s, w) = w^T x(s), so the gradient is simply \nabla \hat{v}(s, \mathbf{w}) = \mathbf{x}(s). In the linear case there is only one optimum, so any method guaranteed to converge to or near a local optimum is thereby guaranteed to converge to or near the global optimum. SGD converges to the global optimum if alpha satisfies the usual conditions of decreasing over time; semi-gradient TD(0) converges to a point near the optimum.
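Because the gradient of a linear v_hat is just the feature vector, the SGD step reduces to a few array operations. A minimal sketch (function names are illustrative):

```python
import numpy as np

def v_hat(s, w, x):
    """Linear value estimate: v_hat(s, w) = w^T x(s)."""
    return w @ x(s)

def sgd_step(w, s, target, x, alpha):
    """One SGD step w + alpha * [U_t - v_hat(s, w)] * grad v_hat,
    where grad v_hat = x(s) in the linear case."""
    return w + alpha * (target - w @ x(s)) * x(s)
```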

  14. TD(0) Convergence.

  15. Bootstrapping on the 1000-state random walk.

  16. (figure slide; no transcribed text)

  17. n-Step Semi-Gradient TD for Estimating v.
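A sketch of n-step semi-gradient TD, following the pattern of the n-step methods of Chapter 7 with the update target G_{t:t+n} = R_{t+1} + ... + gamma^{n-1} R_{t+n} + gamma^n v_hat(S_{t+n}, w). It uses the same placeholder environment interface as the TD(0) sketch above.

```python
import numpy as np

def n_step_semi_gradient_td(env, policy, x, d, n=4, alpha=0.01, gamma=1.0,
                            n_episodes=100):
    """n-step semi-gradient TD for estimating v_pi with linear v_hat."""
    w = np.zeros(d)
    for _ in range(n_episodes):
        states = [env.reset()]   # states[t] = S_t
        rewards = [0.0]          # rewards[t] = R_t (R_0 is unused)
        T = float('inf')
        t = 0
        while True:
            if t < T:
                s2, r, done = env.step(policy(states[t]))
                states.append(s2)
                rewards.append(r)
                if done:
                    T = t + 1
            tau = t - n + 1      # time whose state's estimate is updated
            if tau >= 0:
                # n-step return G_{tau:tau+n}, truncated at episode end.
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, int(min(tau + n, T)) + 1))
                if tau + n < T:
                    G += gamma ** n * (w @ x(states[tau + n]))
                w += alpha * (G - w @ x(states[tau])) * x(states[tau])
            if tau == T - 1:
                break
            t += 1
    return w
```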
