Sample-Based Methods for Continuous Action Markov Decision Processes
Chris Mansley, Ari Weinstein, Michael Littman (Rutgers University)
From Learning to Planning
Bellman Equation
Continuous State
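The Bellman Equation slide presumably shows the standard Bellman optimality equation; for a continuous state space $S$, action space $A$, transition density $T$, reward function $R$, and discount $\gamma$, it can be written as:

```latex
V^{*}(s) = \max_{a \in A} \left[ R(s,a) + \gamma \int_{S} T(s' \mid s, a)\, V^{*}(s')\, ds' \right]
```

With continuous states the integral is approximated by sampling; with continuous actions the $\max$ itself becomes an optimization problem, which is the difficulty this work targets.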
Standard machine learning approaches to function approximation have proven successful for continuous state spaces. This talk turns to the harder case of continuous action spaces.
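The UCT baselines in the later experiments handle a continuous action space by uniform discretization ("UCT 5A" means 5 discrete actions). A minimal sketch of that discretization, with hypothetical action bounds, also showing why it scales poorly with action dimension:

```python
import itertools
import numpy as np

def discretize_actions(low, high, n):
    """Uniformly discretize a bounded 1-D action interval into n candidate actions."""
    return np.linspace(low, high, n)

def action_grid(low, high, n, dims):
    """Cartesian product of per-dimension discretizations.

    The number of joint actions grows as n**dims, which is why a
    discretization-based planner degrades as action dimension increases.
    """
    axis = discretize_actions(low, high, n)
    return list(itertools.product(axis, repeat=dims))

# Example: 5 candidate actions on [-1, 1], as in the "UCT 5A" baseline.
actions = discretize_actions(-1.0, 1.0, 5)

# In 3 action dimensions the same resolution already needs 125 joint actions.
grid = action_grid(-1.0, 1.0, 5, 3)
```

This exponential blow-up in the joint action count is the scaling problem that adaptive partitioning approaches such as HOOT are meant to avoid.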
[Diagram: sample-based search tree over states S0–S14. Edges carry transition probabilities and rewards, e.g., probability 1 with reward +1, probability 0.8 with reward −1, probability 0.2 with reward +1.]
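Sample-based planners such as those compared here interact only with a generative model of the MDP: given a state and action, it returns one sampled next state and reward. A minimal Monte Carlo rollout sketch under that assumption (the toy generative model and action set below are hypothetical stand-ins, echoing the stochastic transitions in the diagram above):

```python
import random

random.seed(0)  # for a reproducible estimate

def rollout_value(generative_model, state, actions, depth, gamma=0.95):
    """Estimate the return of a uniformly random rollout from `state`.

    `generative_model(state, action)` is assumed to return one sampled
    (next_state, reward) pair, as a generative MDP model does.
    """
    total, discount = 0.0, 1.0
    for _ in range(depth):
        action = random.choice(actions)
        state, reward = generative_model(state, action)
        total += discount * reward
        discount *= gamma
    return total

# Toy model: reward -1 with probability 0.8, +1 with probability 0.2,
# mirroring the style of the transition diagram above.
def toy_model(state, action):
    if random.random() < 0.8:
        return state + 1, -1.0
    return state + 2, +1.0

# Average many rollouts to estimate the expected discounted return.
estimate = sum(rollout_value(toy_model, 0, [0], 10) for _ in range(1000)) / 1000
```

Tree-search planners like UCT refine this idea by biasing action choices toward promising subtrees rather than sampling uniformly.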
Thanks to Remi Munos
[Figure: Double Integrator (1D). Panels: Total Reward vs. Samples per Planning Step (log scale) for UCT with 5, 11, and 15 discrete actions vs. HOOT; Total Reward vs. Number of Discrete Actions (D-Double Integrator, 1D), HOOT vs. UCT; Total Reward vs. Number of Action Dimensions (D-Double Integrator), HOOT vs. UCT with 5, 10, and 20 actions per dimension.]
[Figure: Bicycle (0.02 cm). Panels: Total Reward vs. Samples per Planning Step (log scale) for UCT with 5, 10, and 20 discrete actions vs. HOOT; Total Reward vs. Number of Discretizations per Action Dimension, HOOT vs. UCT.]