Planning and Optimization
G8. Trial-based Heuristic Tree Search
Gabriele Röger and Thomas Keller
Universität Basel
December 17, 2018
Motivation THTS Framework THTS Algorithms Summary
decision (OR) nodes for states
chance (AND) nodes for actions

state-/action-value estimate
visit counter
solved label

action selection
initialization
trial length
backup function
recommendation function
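The pieces listed above can be read as a node annotation plus a set of pluggable functions. The following is a minimal sketch under stated assumptions, not the lecture's or any planner's implementation: the names `Node`, `Ingredients`, `trial`, `max_depth` (standing in for trial length) and the two-action toy problem are all hypothetical, one `Node` class is used for both decision and chance nodes, and the solved label is stored but not yet used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable


@dataclass
class Node:
    """Search node (decision or chance) with the three annotations."""
    value: float = 0.0       # state-/action-value estimate
    visits: int = 0          # visit counter
    solved: bool = False     # solved label (unused in this sketch)
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


@dataclass
class Ingredients:
    """Pluggable parts of a THTS-style algorithm (hypothetical names)."""
    select_action: Callable[[Node], Hashable]
    initialize: Callable[[Hashable, Hashable], float]  # heuristic(state, action)
    max_depth: int                                     # trial length
    backup: Callable[[Node, float], None]
    recommend: Callable[[Node], Hashable]


def trial(state, node, depth, ing, actions, step):
    """Run one trial from a decision node and return the sampled return."""
    if depth == ing.max_depth:
        return 0.0
    if not node.children:  # expand decision node, initialize chance nodes
        for a in actions(state):
            node.children[a] = Node(value=ing.initialize(state, a))
    a = ing.select_action(node)
    chance = node.children[a]
    next_state, reward = step(state, a)  # sample one outcome
    child = chance.children.setdefault(next_state, Node())
    ret = reward + trial(next_state, child, depth + 1, ing, actions, step)
    ing.backup(chance, ret)
    ing.backup(node, ret)
    return ret


# Toy problem (assumption): action "a" pays 1, action "b" pays 0.
def toy_actions(state):
    return ["a", "b"]


def toy_step(state, action):
    return "end", (1.0 if action == "a" else 0.0)


def mc_backup(node, ret):
    """Monte-Carlo style backup: running average of sampled returns."""
    node.visits += 1
    node.value += (ret - node.value) / node.visits


ing = Ingredients(
    select_action=lambda n: min(n.children, key=lambda a: n.children[a].visits),
    initialize=lambda s, a: 0.0,
    max_depth=1,
    backup=mc_backup,
    recommend=lambda n: max(n.children, key=lambda a: n.children[a].value),
)
root = Node()
for _ in range(10):
    trial("s", root, 0, ing, toy_actions, toy_step)
best = ing.recommend(root)
print(best)  # a
```

Swapping a different function into one field of `Ingredients` changes the algorithm without touching `trial`, which is the point of treating the ingredients separately.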
Most played arm [Bubeck et al. 2009, Chaslot et al. 2008]
Empirical distribution of plays [Bubeck et al. 2009]
Secure arm [Chaslot et al. 2008]
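A small sketch of how these three recommendation functions could look on per-arm statistics. The data layout, the function names, and the numbers are hypothetical. In particular, the secure arm is sketched here as the arm maximizing a lower confidence bound of the form value minus c over the square root of the visit count; the exact bound and the constant `c` are assumptions, not taken from the cited papers.

```python
import math
import random
from typing import Dict, Tuple

# arm -> (value estimate, visit count); illustrative numbers only
Stats = Dict[str, Tuple[float, int]]


def most_played_arm(stats: Stats) -> str:
    """Recommend the arm with the highest visit count."""
    return max(stats, key=lambda a: stats[a][1])


def empirical_distribution_of_plays(stats: Stats, rng: random.Random) -> str:
    """Sample an arm with probability proportional to its visit count."""
    arms = list(stats)
    weights = [stats[a][1] for a in arms]
    return rng.choices(arms, weights=weights, k=1)[0]


def secure_arm(stats: Stats, c: float = 1.0) -> str:
    """Recommend the arm with the best lower confidence bound (assumed form).

    Requires every arm to have been visited at least once.
    """
    return max(stats, key=lambda a: stats[a][0] - c / math.sqrt(stats[a][1]))


stats = {"a": (0.60, 90), "b": (0.70, 4), "c": (0.20, 6)}
rng = random.Random(0)

print(most_played_arm(stats))                        # a
print(secure_arm(stats))                             # a
print(empirical_distribution_of_plays(stats, rng))   # one of a, b, c
```

In this example arm `b` has the highest value estimate but only 4 visits, so both the most played arm and the secure arm prefer the well-explored arm `a`.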
Expand decision node and initialize chance nodes with heuristic for state-action pairs [Keller & Eyerich, 2012]
Any classical heuristic on any determinization
Occupation measure heuristic [Trevizan et al., 2017]
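To illustrate initialization with a classical heuristic on a determinization, here is a sketch that assumes one particular choice, the most-likely-outcome determinization, which the slide does not prescribe. The outcome format, the function names, and the toy numbers are hypothetical, and `h` stands for any classical state heuristic (here simply zero).

```python
from typing import Callable, Dict, Hashable, List, Tuple

# (probability, successor state, reward); hypothetical outcome format
Outcome = Tuple[float, Hashable, float]


def most_likely_outcome(outcomes: List[Outcome]) -> Outcome:
    """Determinize an action by keeping only its most probable outcome."""
    return max(outcomes, key=lambda o: o[0])


def initialize_chance_nodes(
    state: Hashable,
    applicable: Callable[[Hashable], List[Hashable]],
    outcomes_of: Callable[[Hashable, Hashable], List[Outcome]],
    h: Callable[[Hashable], float],
) -> Dict[Hashable, float]:
    """Return an initial action-value estimate for each applicable action."""
    estimates = {}
    for action in applicable(state):
        _, succ, reward = most_likely_outcome(outcomes_of(state, action))
        estimates[action] = reward + h(succ)
    return estimates


# Toy task (assumption): "risky" usually pays 5 but sometimes costs 10.
TOY = {
    "safe": [(1.0, "s1", 1.0)],
    "risky": [(0.6, "s2", 5.0), (0.4, "s3", -10.0)],
}

estimates = initialize_chance_nodes(
    "s0",
    applicable=lambda s: list(TOY),
    outcomes_of=lambda s, a: TOY[a],
    h=lambda s: 0.0,
)
print(estimates)  # {'safe': 1.0, 'risky': 5.0}
```

The determinization ignores the unlikely bad outcome, so "risky" starts with an optimistic estimate of 5.0 even though its expected reward is negative; later trials and backups are what correct such initial values.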
εLIN-G [Singh et al., 2000; Auer et al., 2002]
εRT-G [Keller, 2015]
εLOG-G [Keller, 2015]
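The names suggest ε-greedy action selection with an exploration rate that decays with the number of visits. The sketch below is only an illustration of that idea: the three schedules (1/k, 1/√k, 1/ln(k+1)), the constant `c`, and all function names are assumptions about what LIN, RT and LOG might stand for, not definitions taken from the cited works.

```python
import math
import random
from typing import Callable, Dict


def eps_linear(k: int, c: float = 1.0) -> float:
    """Assumed linear decay: epsilon = c / k, capped at 1 (k >= 1)."""
    return min(1.0, c / k)


def eps_root(k: int, c: float = 1.0) -> float:
    """Assumed root decay: epsilon = c / sqrt(k), capped at 1 (k >= 1)."""
    return min(1.0, c / math.sqrt(k))


def eps_log(k: int, c: float = 1.0) -> float:
    """Assumed logarithmic decay: epsilon = c / ln(k + 1), capped at 1 (k >= 1)."""
    return min(1.0, c / math.log(k + 1))


def epsilon_greedy(
    values: Dict[str, float],
    k: int,
    schedule: Callable[[int], float],
    rng: random.Random,
) -> str:
    """Explore uniformly with probability schedule(k), else act greedily."""
    if rng.random() < schedule(k):
        return rng.choice(list(values))
    return max(values, key=values.get)


values = {"a": 0.2, "b": 0.9}
rng = random.Random(0)
for k in (1, 10, 100):
    print(k, eps_linear(k), eps_root(k), eps_log(k))
print(epsilon_greedy(values, 100, eps_linear, rng))
```

Under these assumed schedules the linear variant stops exploring fastest and the logarithmic variant slowest, which is the kind of trade-off the three variants would differ in.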
1 trial length, 1 outcome selection, 1 initialization
2 different recommendation functions
9 different backup functions
9 different action selections
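As a quick worked check of the counts above: if every ingredient could be combined freely with every other, the number of resulting configurations is the product of the counts. Whether all combinations are meaningful or were actually evaluated is not stated here, so the figure below is only that upper bound.

```python
import math

# Number of options per ingredient, as listed above
counts = {
    "trial length": 1,
    "outcome selection": 1,
    "initialization": 1,
    "recommendation function": 2,
    "backup function": 9,
    "action selection": 9,
}

total = math.prod(counts.values())
print(total)  # 162
```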
Domain        MCUCB1 MPA   Prost 2011
Academic              27           26
Crossing              65           62
Elevators             78           49
Game                  86           84
Navigation            45           42
Recon                 92           90
Skill                 77           69
Sysadmin              89           88
Tamarisk              86           83
Traffic               71           60
Triangle              46           49
Wildfire              84           85
Total                 70           66
1 UCB1, 4 RT-UCB, 4 BE, 2 BE-DT, 1 ε-G, 1 εRT-G, 1 εLOG-G, 1 εLIN-G
6 MC, 4 PB, 2 TD, 2 MaxMC, 1 SMC, 1 QL