SLIDE 15 Results 2D domains
[Figure: four plot panels; tick ranges are as recovered from the axes.]
Mountain car (GP-RMAX): steps to goal (lower is better) vs. episodes (ticks 5 to 20). Curves: GP-RMAX exp, GP-RMAX noexp, GP-RMAX grid5, GP-RMAX grid10.
Mountain car (Sarsa): steps to goal (lower is better) vs. episodes (ticks 200 to 1000). Curves: Sarsa(λ) Tilecoding 10, Sarsa(λ) Tilecoding 20.
Inverted pendulum (GP-RMAX): total reward (higher is better) vs. episodes (ticks 5 to 20). Curves: GP-RMAX exp, GP-RMAX noexp, GP-RMAX grid5, GP-RMAX grid10.
Inverted pendulum (Sarsa): total reward (higher is better) vs. episodes (ticks 100 to 500). Curves: Sarsa(λ) Tilecoding 10, Sarsa(λ) Tilecoding 40.
GP-RMAX – ECML 09/21/10 – p.15/18