Planning and Optimization: G8. Trial-based Heuristic Tree Search (PowerPoint PPT Presentation)



SLIDE 1

Planning and Optimization

G8. Trial-based Heuristic Tree Search

Gabriele Röger and Thomas Keller

Universität Basel

December 17, 2018

SLIDE 2

Motivation THTS Framework THTS Algorithms Summary

Content of this Course

Planning:
• Classical: Tasks, Progression/Regression, Complexity, Heuristics
• Probabilistic: MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods

SLIDE 3

Motivation

SLIDE 4

AO∗ & LAO∗: Recap

• Iteratively build explicated graph
• Extend explicated graph by expanding a fringe node in the partial solution graph
• State-value estimates are initialized with an admissible heuristic
• Propagate information with Bellman backups in the partial solution graph
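The Bellman backup used here can be sketched in a few lines (a minimal illustration, not the lecture's code; the helper names `actions`, `cost`, `succ` and the toy SSP are assumptions):

```python
# Sketch of a Bellman backup for an SSP (hypothetical helper names):
#   Q(s, a) = c(s, a) + sum over successors s' of T(s, a, s') * V(s')
#   V(s)    = min over applicable actions a of Q(s, a)

def bellman_backup(s, actions, cost, succ, V):
    """Update V[s] in place and return the new estimate."""
    q_values = []
    for a in actions(s):
        # expected cost-to-go of action a under current estimates
        q = cost(s, a) + sum(p * V[t] for p, t in succ(s, a))
        q_values.append(q)
    V[s] = min(q_values)
    return V[s]

# Toy SSP: from s0, action 'a' reaches goal g (V[g] = 0) with
# probability 0.5 and stays in s0 with probability 0.5, at cost 1.
V = {"s0": 0.0, "g": 0.0}
new_v = bellman_backup(
    "s0",
    actions=lambda s: ["a"],
    cost=lambda s, a: 1.0,
    succ=lambda s, a: [(0.5, "g"), (0.5, "s0")],
    V=V,
)
# first backup: 1 + 0.5*0 + 0.5*0 = 1.0
```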

SLIDE 5

(Labeled) Real-Time Dynamic Programming: Recap

• Iteratively performs trials
• Simulates greedy policy in each trial
• Encountered states are updated with Bellman backup
• Admissible heuristic used if no state-value estimate available
• Labeling procedure marks states that have converged
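A single such trial can be sketched as follows (a toy illustration with hypothetical helper names; real LRTDP adds the labeling procedure on top of this loop):

```python
import random

def rtdp_trial(s0, goal, actions, cost, succ, V, max_steps=100):
    """One RTDP trial: follow the greedy policy, Bellman-update each visited state."""
    s = s0
    for _ in range(max_steps):
        if s in goal:
            return
        # Bellman backup of s using current state-value estimates
        qs = {a: cost(s, a) + sum(p * V[t] for p, t in succ(s, a))
              for a in actions(s)}
        best = min(qs, key=qs.get)
        V[s] = qs[best]
        # simulate the greedy policy: sample a successor of the greedy action
        r, acc = random.random(), 0.0
        for p, t in succ(s, best):
            acc += p
            if r <= acc:
                s = t
                break

# Toy chain: s0 reaches goal g deterministically at cost 2
V = {"s0": 0.0, "g": 0.0}
rtdp_trial("s0", {"g"}, lambda s: ["a"], lambda s, a: 2.0,
           lambda s, a: [(1.0, "g")], V)
# the trial updates V["s0"] to 2.0 and terminates in g
```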

SLIDE 6

Monte-Carlo Tree Search: Recap

• Iteratively explicates search tree in trials
• Uses tree policy to traverse the tree
• First encountered state not yet in tree is added to the search tree
• State-value estimates are initialized with default policy
• Propagates information with Monte-Carlo backups in reverse order through visited states
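A Monte-Carlo backup is a running average over sampled costs; a minimal sketch (the `Node` class and field names are assumptions for illustration):

```python
class Node:
    """Search-tree node annotated with a visit counter and a value estimate."""
    def __init__(self):
        self.visits = 0
        self.value = 0.0

def monte_carlo_backup(node, sampled_cost):
    """Incrementally update node.value to the mean of all sampled costs."""
    node.visits += 1
    node.value += (sampled_cost - node.value) / node.visits

n = Node()
for c in (4.0, 2.0, 6.0):
    monte_carlo_backup(n, c)
# n.value is now the mean of the three samples: 4.0
```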
SLIDE 7

Trial-based Heuristic Tree Search

• All are asymptotically optimal (or an asymptotically optimal version exists)
• In practice, all have complementary strengths
• There are significant differences between these algorithms, but they also have a lot in common
⇒ a common framework that allows us to describe all three: Trial-based Heuristic Tree Search (THTS)

SLIDE 8

Trial-based Heuristic Tree Search Framework

SLIDE 9

Trial-based Heuristic Tree Search

• Perform trials to explicate search tree:
  • decision (OR) nodes for states
  • chance (AND) nodes for actions
• Annotate nodes with:
  • state-/action-value estimate
  • visit counter
  • solved label
• Initialize search nodes with heuristic


SLIDE 13

Trial-based Heuristic Tree Search

• Perform trials to explicate search tree:
  • decision (OR) nodes for states
  • chance (AND) nodes for actions
• Annotate nodes with:
  • state-/action-value estimate
  • visit counter
  • solved label
• Initialize search nodes with heuristic

6 variable ingredients:
• action selection
• outcome selection
• initialization
• trial length
• backup function
• recommendation function

SLIDE 14

Trial-based Heuristic Tree Search

THTS for SSP T = ⟨S, L, c, T, s0, S⋆⟩:
    d0 := create root node associated with s0
    while time allows:
        visit decision node(d0, T)
    return recommend(d0)

SLIDE 15

THTS: Visit a Decision Node

visit decision node for decision node d, SSP T = ⟨S, L, c, T, s0, S⋆⟩:
    if s(d) ∈ S⋆:
        return 0
    a := select action(d)
    if a not explicated:
        cost := expand and initialize(d, a)
    if not trial length reached(d):
        let c be the node in children(d) with a(c) = a
        cost := visit chance node(c, T)
    backup(d, cost)
    return cost

SLIDE 16

THTS: Visit a Chance Node

visit chance node for chance node c, SSP T = ⟨S, L, c, T, s0, S⋆⟩:
    s′ := select outcome(s(c), a(c))
    if s′ not explicated:
        cost := expand and initialize(c, s′)
    if not trial length reached(c):
        let d be the node in children(c) with s(d) = s′
        cost := visit decision node(d, T)
    cost := cost + c(s(c), a(c))
    backup(c, cost)
    return cost
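Taken together, the two visit procedures can be made runnable. The sketch below is one possible instantiation of the framework on a toy SSP, not the pseudocode verbatim: all class and helper names are assumptions, unexplicated actions are expanded first, action selection is ε-greedy, outcomes are sampled, backups are Monte-Carlo, trials run until a goal, and the recommendation is the expected best arm.

```python
import random

class DecisionNode:
    def __init__(self, state):
        self.state, self.children = state, {}   # action -> ChanceNode
        self.visits, self.value = 0, 0.0

class ChanceNode:
    def __init__(self, state, action):
        self.state, self.action = state, action
        self.children = {}                      # successor state -> DecisionNode
        self.visits, self.value = 0, 0.0

def thts(ssp, trials=50, eps=0.2):
    root = DecisionNode(ssp.s0)
    for _ in range(trials):
        visit_decision(root, ssp, eps)
    # recommendation: expected best arm (lowest action-value estimate)
    return min(root.children.values(), key=lambda c: c.value).action

def visit_decision(d, ssp, eps):
    if d.state in ssp.goals:
        backup(d, 0.0)
        return 0.0
    untried = [a for a in ssp.actions(d.state) if a not in d.children]
    if untried:                                 # expand an unexplicated action first
        a = random.choice(untried)
    elif random.random() < eps:                 # epsilon-greedy exploration
        a = random.choice(ssp.actions(d.state))
    else:                                       # greedy action selection
        a = min(d.children.values(), key=lambda c: c.value).action
    c = d.children.setdefault(a, ChanceNode(d.state, a))
    cost = visit_chance(c, ssp, eps)
    backup(d, cost)
    return cost

def visit_chance(c, ssp, eps):
    s2 = ssp.sample(c.state, c.action)          # outcome selection: sample
    d = c.children.setdefault(s2, DecisionNode(s2))
    cost = visit_decision(d, ssp, eps) + ssp.cost(c.state, c.action)
    backup(c, cost)
    return cost

def backup(node, cost):                         # Monte-Carlo backup
    node.visits += 1
    node.value += (cost - node.value) / node.visits

# Toy SSP: two actions from s0, both reach the goal; 'good' is cheaper.
class ToySSP:
    s0, goals = "s0", {"g"}
    def actions(self, s): return ["good", "bad"]
    def cost(self, s, a): return 1.0 if a == "good" else 5.0
    def sample(self, s, a): return "g"

best = thts(ToySSP())
# both actions are deterministic, so their estimates converge to exactly
# 1.0 and 5.0, and the recommendation is "good"
```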

SLIDE 17

THTS Algorithms

SLIDE 18

MCTS in the THTS Framework

• Trial length: terminate trial when a node is explicated
• Action selection: tree policy
• Outcome selection: sample
• Initialization: add single node to the tree and initialize with heuristic that simulates the default policy
• Backup function: Monte-Carlo backups
• Recommendation function: expected best arm

SLIDE 19

AO∗ (Tree Search Version) in the THTS Framework

• Trial length: terminate trial when a node is expanded
• Action selection: greedy
• Outcome selection: depends on AO∗ version
• Initialization: expand decision node and all its chance node successors, then initialize all V̂ with admissible heuristic
• Backup function: Bellman backups & solved labels
• Recommendation function: expected best arm

SLIDE 20

LRTDP (Tree Search Version) in the THTS Framework

• Trial length: finish trials only in goal states
• Action selection: greedy
• Outcome selection: sample unsolved outcome
• Initialization: expand decision node and all its chance node successors, then initialize all V̂ with admissible heuristic
• Backup function: Bellman backups & solved labels
• Recommendation function: expected best arm

SLIDE 21

Further Ingredients from Literature

Recommendation function:
• Most played arm [Bubeck et al., 2009; Chaslot et al., 2008]
• Empirical distribution of plays [Bubeck et al., 2009]
• Secure arm [Chaslot et al., 2008]

Initialization:
• Expand decision node and initialize chance nodes with heuristic for state-action pairs [Keller & Eyerich, 2012]
• Any classical heuristic on any determinization
• Occupation measure heuristic [Trevizan et al., 2017]

SLIDE 22

Further Ingredients from Literature

Backup functions:
• Temporal Differences [Sutton & Barto, 1987]
• Q-Learning [Watkins, 1989]
• Selective Backups [Feldman & Domshlak, 2012; Keller, 2015]
• MaxMonte-Carlo [Keller & Helmert, 2013]
• Partial Bellman [Keller & Helmert, 2013]

SLIDE 23

Further Ingredients from Literature

Action selections:
• Uniform sampling (UNI)
• ε-greedy (ε-G)
• ε-G with decaying ε:
  • εLIN-G [Singh et al., 2000; Auer et al., 2002]
  • εRT-G [Keller, 2015]
  • εLOG-G [Keller, 2015]
• Boltzmann exploration (BE)
• BE with logarithmically decaying τ (BE-DT) [Singh et al., 2000]
• UCB1 [Auer et al., 2002]
• Root-valued UCB (RT-UCB) [Keller, 2015]
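For illustration, the classic UCB1 rule, here adapted to cost minimization as in an SSP setting (a sketch: the child objects with `visits`/`value` fields and the exploration constant are assumptions):

```python
import math
from collections import namedtuple

def ucb1_select(children, c=1.4):
    """UCB1 for cost minimization: prefer low value estimates,
    minus an exploration bonus that shrinks with the visit count."""
    # any never-visited child is tried first
    for ch in children:
        if ch.visits == 0:
            return ch
    total = sum(ch.visits for ch in children)
    def score(ch):
        return ch.value - c * math.sqrt(math.log(total) / ch.visits)
    return min(children, key=score)

Arm = namedtuple("Arm", "visits value")
arms = [Arm(10, 3.0), Arm(2, 3.5), Arm(0, 0.0)]
chosen = ucb1_select(arms)   # the unvisited arm is selected first
```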


SLIDE 25

Experimental Comparison

• THTS allows mixing and matching ingredients
• Not all combinations are asymptotically optimal
• Analysis based on properties of ingredients is possible
• In [Keller, 2015], comparison of:
  • 1 trial length, 1 outcome selection, 1 initialization
  • 2 different recommendation functions
  • 9 different backup functions
  • 9 different action selections

⇒ 162 different THTS algorithms, 115 shown to be asymptotically optimal
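The count of 162 follows directly from the Cartesian product of the ingredient choices; a quick check:

```python
from itertools import product

# number of options per ingredient, as in the comparison above
ingredients = {
    "trial length": 1, "outcome selection": 1, "initialization": 1,
    "recommendation function": 2, "backup function": 9, "action selection": 9,
}
combos = list(product(*(range(n) for n in ingredients.values())))
n_algorithms = len(combos)   # 1 * 1 * 1 * 2 * 9 * 9 = 162
```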

SLIDE 26

Asymptotic Optimality

[Table: asymptotic optimality of each combination of the 9 action selections (UNI, ε-G, εLOG-G, εRT-G, εLIN-G, BE, BE-DT, RT-UCB, UCB1) with the 9 backup functions (MC, LSMC, ESMC, TD, LSTD, ESTD, QL, MaxMC, PB)]

SLIDE 27

Experimental Evaluation

• Most played arm recommendation function often better than same configuration with expected best arm

Scores per domain, MC-UCB1 with most played arm (MPA) vs. Prost 2011:

Domain       MC-UCB1 MPA   Prost 2011
Academic          27            26
Crossing          65            62
Elevators         78            49
Game              86            84
Navigation        45            42
Recon             92            90
Skill             77            69
Sysadmin          89            88
Tamarisk          86            83
Traffic           71            60
Triangle          46            49
Wildfire          84            85
Total             70            66


SLIDE 30

Experimental Evaluation

• Most played arm recommendation function often better than same configuration with expected best arm
• Boltzmann exploration and root-valued UCB perform best in most domains
• Monte-Carlo and Partial Bellman backups perform best in most domains
• Almost all action selections and backup functions perform best in at least one domain

Best action selection per domain: 4 RT-UCB, 4 BE, 2 BE-DT, 1 UCB1, 1 ε-G, 1 εRT-G, 1 εLOG-G, 1 εLIN-G
Best backup function per domain: 6 MC, 4 PB, 2 TD, 2 MaxMC, 1 SMC, 1 QL

SLIDE 31

Implementation: Prost

• The Prost planner implements the THTS framework
• Mixing and matching of ingredients
• Very simple to add new ingredients: just inherit from the corresponding class
• https://bitbucket.org/tkeller/prost/

SLIDE 32

Summary

SLIDE 33

Summary

• MCTS, AO∗ and RTDP have complementary strengths, but also a similar structure
• THTS allows combining ideas from MCTS, heuristic search and dynamic programming
• Mixing and matching ingredients leads to novel and sometimes better algorithms