

SLIDE 1

ICML 2016 Backup Strategies in MCTS Piyush Khandelwal (UT Austin)

Complex Backup Strategies in Monte Carlo Tree Search

Piyush Khandelwal, Elad Liebman, Scott Niekum, and Peter Stone ICML 2016

University of Texas at Austin

SLIDE 2

Monte Carlo Tree Search

[Figure: agent–environment loop for an MDP (action a_t, reward r_t, next state s_{t+1}) alongside the corresponding MCTS planning tree rooted at the start state]

SLIDE 3

Monte Carlo Tree Search


4 stages in MCTS:
➢ Selection
➢ Expansion
➢ Simulation
➢ Backpropagation
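The four stages can be wired together in a few lines. The sketch below is purely illustrative, not the paper's implementation: a toy one-step problem where action "good" always pays 1 stands in for a real simulator, and ε-greedy selection stands in for UCB1; all names are hypothetical.

```python
import random

# Illustrative sketch of the four MCTS stages on a toy one-step problem:
# action "good" always pays 1, action "bad" pays 0.

class Node:
    def __init__(self):
        self.children = {}   # action -> child Node
        self.n = 0           # visit count
        self.q = 0.0         # running mean of sampled returns

def rollout(root, actions, eps=0.2):
    # 1. Selection: pick an action (epsilon-greedy stands in for UCB1).
    if root.children and random.random() > eps:
        a = max(root.children, key=lambda act: root.children[act].q)
    else:
        a = random.choice(actions)
    # 2. Expansion: create the child node if it does not exist yet.
    child = root.children.setdefault(a, Node())
    # 3. Simulation: sample a return from this point (episode ends here).
    ret = 1.0 if a == "good" else 0.0
    # 4. Backpropagation: Monte Carlo backup along the visited path.
    for node in (child, root):
        node.n += 1
        node.q += (ret - node.q) / node.n
    return ret

random.seed(0)
root = Node()
for _ in range(200):
    rollout(root, ["good", "bad"])
print(max(root.children, key=lambda a: root.children[a].q))  # "good"
```

After enough rollouts, the value estimates at the root concentrate on the better action, which is what selection exploits on the next iteration.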

SLIDE 4

MCTS - Backpropagation (Motivation)


Monte Carlo backup for a single trajectory: the sampled return is R = r_t + γ r_{t+1} + γ² r_{t+2} + …
Across all trajectories: Q(s, a) is the mean of the sampled returns, updated incrementally as Q(s, a) ← Q(s, a) + (R − Q(s, a)) / n(s, a).
Can we do better?


SLIDE 5

This talk

Contribution:
➢ Formalize and analyze different on-policy/off-policy complex backup approaches from the RL literature for MCTS planning.

Talk outline:
➢ Review complex backup strategies from RL in the MCTS context.
➢ Empirical evaluation using IPC benchmarks.
➢ Explore the relationship between domain structure and backup strategy performance.


SLIDE 6

n-step return (bias-variance tradeoff)

We can compute the return sample in many different ways:
➢ 1-step: R⁽¹⁾ = r₀ + γ Q(s₁, a₁)
➢ n-step: R⁽ⁿ⁾ = r₀ + γ r₁ + … + γⁿ⁻¹ rₙ₋₁ + γⁿ Q(sₙ, aₙ)
➢ Monte Carlo: the full rollout return R = r₀ + γ r₁ + γ² r₂ + …


We already have estimates for all Q values along the trajectory while performing backpropagation. Shorter returns have more bias; longer returns have more variance.

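As a sketch of these alternatives, the hypothetical helper below computes an n-step return sample from one rollout's rewards and the Q estimates recorded along the trajectory; n = 1 gives the one-step return and n equal to the rollout length gives the Monte Carlo return. The function name and argument layout are illustrative, not from the paper.

```python
# Compute the n-step return sample from a simulated trajectory, given
# rewards r_0..r_{L-1} and the bootstrap estimates Q(s_i, a_i) recorded
# during selection.

def n_step_return(rewards, q_estimates, n, gamma=0.95):
    n = min(n, len(rewards))
    # Discounted sum of the first n rewards.
    ret = sum(gamma**i * rewards[i] for i in range(n))
    # Bootstrap with the stored Q estimate if the trajectory continues.
    if n < len(q_estimates):
        ret += gamma**n * q_estimates[n]
    return ret

rewards = [1.0, 0.0, 2.0]   # r_0, r_1, r_2 from one rollout
qs      = [0.0, 0.5, 1.5]   # Q(s_0,a_0), Q(s_1,a_1), Q(s_2,a_2)
one_step    = n_step_return(rewards, qs, 1)   # r_0 + gamma * Q(s_1, a_1)
monte_carlo = n_step_return(rewards, qs, 3)   # full discounted rollout return
```

Small n leans on the (biased) stored estimates; large n leans on the (noisy) sampled rewards, which is exactly the bias-variance tradeoff above.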

SLIDE 7

MCTS - Complex return


Complex return: a weighted combination of n-step returns.
➢ λ-return / eligibility traces [Rummery 1995], with weights proportional to λⁿ⁻¹ ➡ MCTS(λ)
➢ γ-return weights [Konidaris et al. 2011] ➡ MCTSγ


SLIDE 8

MCTS - Complex return



MCTSγ:
➢ Parameter free.
➢ Assumes n-step return variances are highly correlated.

MCTS(λ):
➢ Easier to implement.
➢ Assumes n-step return variances increase at a rate proportional to λ⁻¹.
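One convenient property of the λ-return is that it can be built in a single backward pass over the trajectory. The sketch below uses illustrative node fields (n, q) rather than the paper's exact data structures; it applies the standard recursion R ← r + γ((1 − λ)Q(next) + λR), which interpolates between the 1-step return (λ = 0) and the Monte Carlo return (λ = 1).

```python
# Sketch of an on-policy MCTS(lambda) backpropagation pass.

class Node:
    def __init__(self):
        self.n = 0      # visit count
        self.q = 0.0    # mean return estimate

def backpropagate(trajectory, lam, gamma=1.0):
    """trajectory: (node, reward) pairs ordered from root to leaf."""
    ret = 0.0
    for node, reward in reversed(trajectory):
        ret = reward + gamma * ret               # accumulate the return
        node.n += 1
        node.q += (ret - node.q) / node.n        # running-mean update
        ret = (1 - lam) * node.q + lam * ret     # blend for the parent

root, leaf = Node(), Node()
backpropagate([(root, 1.0), (leaf, 2.0)], lam=1.0)   # pure Monte Carlo
```

With lam = 1.0 this reduces to the plain Monte Carlo backup (root.q becomes 3.0 here); with lam = 0.0 each node bootstraps on its child's current estimate, giving one-step-style backups.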

SLIDE 9

MaxMCTS - Off-policy style returns


[Figure: tree with the subtree of higher value highlighted]

Backup using the best known action. Intuition:
➢ Don't penalize exploratory actions.
➢ Reinforce previously seen better trajectories instead.
This is equivalent to Peng's Q(λ)-style updates, yielding MaxMCTS(λ) and MaxMCTSγ.

SLIDE 10

Experiments

  • 4 variants:
    ○ On-policy: MCTS(λ) and MCTSγ
    ○ Off-policy: MaxMCTS(λ) and MaxMCTSγ
  • Test performance in IPC domains:
    ○ Limited planning time (10,000 rollouts per step).
  • Grid-world experiments to explore the dependency between domain structure and backup strategy performance.


SLIDE 11

IPC - Random action selection


[Results charts for IPC domains: Recon, Skill Teaching, Elevators]

SLIDE 12

IPC - Random action selection


[Results charts for IPC domains: Recon, Skill Teaching, Elevators]

SLIDE 13

IPC - UCB1 action selection


[Results charts for IPC domains: Recon, Skill Teaching, Elevators]

SLIDE 14

Computational Time Comparison


SLIDE 15

Grid World Domain


[Figure: grid world with a Start cell, a Goal worth +100, a variable number of 0-reward terminal states, and a step cost of −1]

➢ 90% chance of moving in the intended direction.
➢ 10% chance of moving to a random neighboring cell.
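The slide's transition model can be sketched in a few lines. This is an assumed reading of "any neighbor randomly" (uniform over the four directions, including the intended one); boundary clipping is omitted, and all names are illustrative.

```python
import random

# Stochastic grid-world step: 90% intended direction, 10% a uniformly
# random neighboring cell.

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(x, y, action, rng=random):
    if rng.random() < 0.9:
        dx, dy = MOVES[action]                    # intended move
    else:
        dx, dy = rng.choice(list(MOVES.values())) # random neighbor
    return x + dx, y + dy
```

Under this reading, the intended cell is actually reached with probability 0.9 + 0.1/4 = 0.925, since the random slip can also pick the intended direction.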

SLIDE 16

Grid World Domain


Results as a function of λ and the number of 0-reward terminal states (#0-Term):

#0-Term     0      3      6      15
λ = 1.0   90.4   11.3    0.9    2.2
λ = 0.8   90.2   28.0   10.7    1.4
λ = 0.6   89.5   62.8   45.3    8.5
λ = 0.4   88.7   85.1   77.6   24.1
λ = 0.2   87.7   82.6   78.1   28.4
λ = 0.0   84.5   79.8   74.1   31.8

SLIDE 17

Related Work

  • λ-return has been applied previously in planning:
    ○ TEXPLORE used a slightly different version of MaxMCTS(λ) [Hester 2012].
    ○ Dyna-2 used eligibility traces [Silver et al. 2008].
  • Other backpropagation strategies:
    ○ MaxMCTS(λ=0) is equivalent to MaxUCT [Keller and Helmert 2012].
    ○ Coulom analyzed hand-designed backpropagation strategies in 9x9 Computer Go [Coulom 2007].
  • Planning horizon:
    ○ The dependence of performance on the planning horizon [Jiang et al. 2015].


SLIDE 18

Conclusions

➢ In some domains, selecting the right complex backup strategy is important.
➢ MaxMCTSγ is a parameter-free approach that always performs at least as well as the Monte Carlo backup.
➢ MaxMCTS(λ) performs best if λ can be selected appropriately.
➢ Backup strategy performance is related to the number of trajectories with high rewards.


SLIDE 19

Multi-robot coordination

[Khandelwal et al. 2015]


➢ 84 discrete and continuous factors.
➢ 100–500 actions per state (10–50 after heuristic reduction).