ICML 2016 Backup Strategies in MCTS Piyush Khandelwal (UT Austin)
Complex Backup Strategies in Monte Carlo Tree Search
Piyush Khandelwal, Elad Liebman, Scott Niekum, and Peter Stone ICML 2016
University of Texas at Austin
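Planning: starting from a start state, choosing actions produces a trajectory s_t, a_t, r_t, s_{t+1}, a_{t+1}, r_{t+1}, ...
Agent-environment loop: the agent sends action a_t to the environment; the environment returns reward r_t and next state s_{t+1}.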
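We have estimates for all Q values while performing backpropagation, so every n-step return over the rewards r_0, r_1, ..., r_n can be computed.
Shorter returns (small n): more bias. Longer returns (large n): more variance.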
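As an illustration of the point above, here is a minimal sketch of how the n-step returns could be computed during backpropagation. The function name `n_step_returns`, its argument layout, and the convention of passing 0.0 as the bootstrap value at a terminal state are assumptions made for this example; they do not come from the slides.

```python
from typing import List


def n_step_returns(rewards: List[float], q_values: List[float], gamma: float) -> List[float]:
    """Return [R^(1), ..., R^(L)] for a trajectory with L rewards.

    rewards[i]  is r_i, the reward received at step i (i = 0..L-1).
    q_values[i] is the current estimate Q(s_{i+1}, a_{i+1}) used to bootstrap
                after i+1 steps (assumed convention: 0.0 at a terminal state).
    """
    if len(rewards) != len(q_values):
        raise ValueError("need one bootstrap estimate per reward")
    returns = []
    discounted_sum = 0.0  # running sum of gamma^i * r_i
    discount = 1.0        # gamma^i before the update, gamma^(i+1) after it
    for r, q in zip(rewards, q_values):
        discounted_sum += discount * r
        discount *= gamma
        # n-step return: n discounted rewards plus a discounted bootstrap estimate
        returns.append(discounted_sum + discount * q)
    return returns
```

The first entry relies almost entirely on the bootstrap estimate (more bias), while the last entry is the full sampled return (more variance).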
➢ Parameter free.
➢ Assumes n-step return variances are highly correlated.

➢ Easier to implement.
➢ Assumes n-step return variances increase at a rate of λ^-1.
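The λ-weighted combination of n-step returns is commonly computed with a single backward pass over the trajectory. The sketch below uses the standard recursion R_t = r_t + γ[(1 - λ) Q(s_{t+1}, a_{t+1}) + λ R_{t+1}]; the function name `lambda_returns`, its arguments, and the 0.0-at-termination convention are assumptions for illustration and are not necessarily the authors' exact implementation.

```python
from typing import List


def lambda_returns(rewards: List[float], q_values: List[float],
                   gamma: float, lam: float) -> List[float]:
    """Backward pass computing the lambda-return at every step of a trajectory.

    rewards[t]  is r_t.
    q_values[t] is the current estimate Q(s_{t+1}, a_{t+1})
                (assumed convention: 0.0 at a terminal state).
    Entry t of the result is the lambda-return starting from step t.
    """
    targets = [0.0] * len(rewards)
    # Past the end of the trajectory, only the bootstrap estimate is available.
    next_return = q_values[-1] if q_values else 0.0
    for t in reversed(range(len(rewards))):
        # Blend the bootstrap estimate with the longer return from step t+1.
        blended = (1.0 - lam) * q_values[t] + lam * next_return
        targets[t] = rewards[t] + gamma * blended
        next_return = targets[t]
    return targets
```

With `lam=1.0` the result reduces to the full sampled return, and with `lam=0.0` it reduces to the one-step return, so λ moves the backup between the high-variance and high-bias ends.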
Subtree with higher value
Domains: Recon, Skill Teaching, Elevators
Grid-world domain: the agent moves from Start to Goal (reward +100); each step costs -1; the grid contains a variable number of 0-reward terminal states.
#0-Term    3      6      15
λ = 1      90.4   11.3   0.9
λ = 0.8    90.2   28.0   10.7
λ = 0.6    89.5   62.8   45.3   8.5
λ = 0.4    88.7   85.1   77.6   24.1
λ = 0.2    87.7   82.6   78.1   28.4
λ = 0      84.5   79.8   74.1   31.8