Action Selection for MDPs: Anytime AO* vs. UCT
Blai Bonet¹ and Hector Geffner²
¹Universidad Simón Bolívar
²ICREA & Universitat Pompeu Fabra
AAAI, Toronto, Canada, July 2012

Online MDP Planning and UCT

Offline infinite-horizon MDP
◮ G* is the optimal solution of G on the assumption that the tips of G are terminal, with values given by h(·)
◮ the leaf is expanded
◮ values of the children are set with h(·)
◮ values are propagated upwards while recomputing G*
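The expand-and-propagate loop in the bullets above can be sketched in Python. This is a minimal sketch, not the authors' Anytime AO* implementation. It assumes a cost-minimizing, finite-horizon MDP unfolded as an AND/OR tree, an admissible heuristic h, and at least one action in every non-goal state. The interface names (`actions`, `transitions`, `cost`, `is_goal`, `h`, `horizon`, `max_expansions`) are hypothetical, chosen for illustration. The sketch always expands a tip of G*. Whatever makes the talk's algorithm "anytime" is not reproduced here; the expansion budget only cuts the loop short.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional, Tuple


@dataclass
class Node:
    state: Hashable
    depth: int
    value: float
    parent: Optional["Node"] = None
    # action -> list of (probability, child node); filled in on expansion
    children: Dict[str, List[Tuple[float, "Node"]]] = field(default_factory=dict)
    best: Optional[str] = None
    terminal: bool = False


def ao_star(s0, actions, transitions, cost, is_goal, h, horizon, max_expansions=10_000):
    """AO* sketch over the finite-horizon AND/OR tree rooted at s0 (hypothetical API)."""

    def make_node(s, d, parent):
        term = is_goal(s) or d == horizon
        # Terminal nodes are worth 0; other tips get the heuristic value h(s).
        return Node(s, d, 0.0 if term else h(s), parent, terminal=term)

    def find_tip(n):
        # Depth-first search inside the best partial solution graph G*
        # for a non-terminal node that has not been expanded yet.
        if n.terminal:
            return None
        if not n.children:
            return n
        for _, child in n.children[n.best]:
            tip = find_tip(child)
            if tip is not None:
                return tip
        return None

    def backup(n):
        # Q(n, a) = c(s, a) + sum_s' P(s' | s, a) V(s');  V(n) = min_a Q(n, a)
        q = {a: cost(n.state, a) + sum(p * child.value for p, child in outs)
             for a, outs in n.children.items()}
        n.best = min(q, key=q.get)
        n.value = q[n.best]

    root = make_node(s0, 0, None)
    for _ in range(max_expansions):
        tip = find_tip(root)
        if tip is None:  # G* has no open tips left: it is a complete solution
            break
        # Expand the leaf: one child per outcome of every action, valued with h.
        for a in actions(tip.state):
            tip.children[a] = [(p, make_node(s2, tip.depth + 1, tip))
                               for p, s2 in transitions(tip.state, a)]
        # Propagate values upwards; updating best actions recomputes G*.
        n = tip
        while n is not None:
            backup(n)
            n = n.parent
    return root.best, root.value


# Toy usage (hypothetical MDP): states 0..2 on a line, goal is 2.
def actions(s):
    return ["walk", "jump"]


def transitions(s, a):
    return [(1.0, s + 1)] if a == "walk" else [(1.0, 2)]


def cost(s, a):
    return 1.0 if a == "walk" else 3.0


best, value = ao_star(0, actions, transitions, cost,
                      is_goal=lambda s: s == 2, h=lambda s: 0.0, horizon=5)
print(best, value)  # walk 2.0
```

The search space here is a tree, so only the ancestors of the expanded leaf depend on its new children. That is why the upward pass simply follows `parent` links. On a graph with shared states, every ancestor reachable through any parent would need its value recomputed.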
◮ compared w/ state-of-the-art domain-specific UCT
◮ compared w/ own implementation of UCT and RTDP

◮ compared w/ own implementation of UCT
◮ compared w/ own implementation of RTDP
[Figure: result plots, x axis log-scaled from 0.1 to 1e5. Legends: AOT; AOT(h) vs. LRTDP(h). Panels include barto-big with h_d for d = 2.0, d = 1.0 and d = 0.5, comparing AOT(h) with LRTDP(h).]