SLIDE 16 Experiment 2b: Mountain Car
[Figure residue: four plots; recoverable content below]
CMAC tiling: 10 × 10 × 10
Plot 1: "Time-to-goal: CMAC vs SVR" (x-axis: Trials, 200-1000; y-axis: Time-to-goal, smoothed; curves: CMAC, sigma 0.20, sigma 0.10, sigma 0.05)
Plot 2: "Greedy Policy: CMAC vs. SVR" (x-axis: Trials, 200-1000; y-axis: Reward of greedy policy, smoothed; curves: CMAC, sigma 0.20, sigma 0.10, sigma 0.05)
Plot 3: "Path of greedy policy (Kernel=0.2)" (curves: sigma 0.2, DP-solution)
Plot 4: "Path of greedy policy (Kernel=0.05)" (curves: sigma 0.05, DP-solution)
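For context on the benchmark behind these plots: the slide does not show the task definition, so the sketch below assumes the standard Sutton & Barto Mountain Car dynamics (reward -1 per step, matching the time-to-goal criterion) and a hypothetical interpretation of "10 × 10 × 10" as a CMAC with 10 offset tilings, each a 10 × 10 grid over (position, velocity). All constants and the `tile_indices` coder are illustrative assumptions, not the authors' exact setup.

```python
import math

# Standard Mountain Car constants (Sutton & Barto); assumed, not taken
# from the slide.
X_MIN, X_MAX = -1.2, 0.6
V_MIN, V_MAX = -0.07, 0.07
GOAL = 0.5

def step(x, v, a):
    """One transition; action a in {-1, 0, +1}.
    Returns (x', v', reward, done); reward is -1 per step."""
    v = min(max(v + 0.001 * a - 0.0025 * math.cos(3 * x), V_MIN), V_MAX)
    x = x + v
    if x < X_MIN:          # inelastic collision with the left wall
        x, v = X_MIN, 0.0
    return x, v, -1.0, x >= GOAL

def tile_indices(x, v, n_tilings=10, n_bins=10):
    """Hypothetical 10x10x10 CMAC coder: 10 diagonally offset tilings,
    each a 10x10 grid over (position, velocity).
    Returns the index of the one active tile per tiling."""
    active = []
    for t in range(n_tilings):
        off = t / n_tilings  # fractional offset of this tiling
        ix = min(int((x - X_MIN) / (X_MAX - X_MIN) * n_bins + off), n_bins - 1)
        iv = min(int((v - V_MIN) / (V_MAX - V_MIN) * n_bins + off), n_bins - 1)
        active.append(t * n_bins * n_bins + ix * n_bins + iv)
    return active

if __name__ == "__main__":
    # Simple energy-pumping policy (accelerate along current velocity)
    # from the valley bottom; it reaches the goal without function
    # approximation, illustrating the task the plots measure.
    x, v, done = -0.5, 0.0, False
    for t in range(300):
        a = 1 if v >= 0 else -1
        x, v, r, done = step(x, v, a)
        if done:
            break
    print("reached goal:", done, "after", t + 1, "steps")
    print("active tiles at start:", tile_indices(-0.5, 0.0))
```

The CMAC value estimate for a state is then the sum of one weight per active tile, which is what the SVR with Gaussian kernels of width sigma is being compared against.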
Value Function Approximation with Sparse SVR – ECML 2004 – p. 16/17