  1. MCTS Extensions 2/15/17

  2. The Monte Carlo Tree Search Algorithm

  3. MCTS Pseudocode

     for i = 1 : rollouts
         node = root
         init path containing root
         # selection
         while all children expanded and node not terminal:
             node = UCB_sample(node)
             add node to path
         # expansion
         if node not terminal:
             node = expand(random unexpanded child of node)
             add node to path
         # simulation
         outcome = random_playout(node's state)
         # backpropagation
         for each node in the path:
             update node's value and visits
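A runnable Python sketch of this loop. This is a minimal illustration, not the course's reference implementation: the Node class and the state interface (legal_moves(), apply(), is_terminal(), utility()) are assumptions made for the example.

import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried = list(state.legal_moves())  # moves not yet expanded
        self.visits = 0
        self.value = 0.0                          # running mean of outcomes

def ucb_sample(node, c=5.0):
    # Weight each child by its mean value plus an exploration bonus,
    # then sample in proportion to the weights (as on the slides below).
    weights = [ch.value + c * math.sqrt(math.log(node.visits) / ch.visits)
               for ch in node.children]
    weights = [max(w, 1e-6) for w in weights]  # guard: keep weights positive
    return random.choices(node.children, weights=weights)[0]

def random_playout(state):
    # Default policy: play uniformly at random to a terminal state.
    while not state.is_terminal():
        state = state.apply(random.choice(state.legal_moves()))
    return state.utility()

def mcts(root, rollouts):
    for _ in range(rollouts):
        node, path = root, [root]
        # selection
        while not node.untried and not node.state.is_terminal():
            node = ucb_sample(node)
            path.append(node)
        # expansion
        if not node.state.is_terminal():
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.apply(move), parent=node)
            node.children.append(child)
            node = child
            path.append(node)
        # simulation
        outcome = random_playout(node.state)
        # backpropagation
        for n in path:
            n.visits += 1
            n.value += (outcome - n.value) / n.visits
    # one common choice: return the most-visited child (see slide 10)
    return max(root.children, key=lambda ch: ch.visits)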

  4. Selection Expansion Simulation Backpropagation
     [tree diagram: the first rollout traced through the four phases, with each node's visit count and value]

  5. Selection Expansion Simulation Backpropagation
     [tree diagram: the tree after a second rollout, with updated visit counts and values]

  6. Selection Expansion Simulation Backpropagation
     [tree diagram: the tree after a third rollout, with updated visit counts and values]

  7. Selection Expansion Simulation Backpropagation
     With C = 5.0 and each of the three children visited once, the sampling weights are w_i = v_i + 5 * sqrt(ln(3) / n_i), giving weights = [7.24, 5.24, 6.24] and distribution = [.39, .28, .33].
     [tree diagram with each node's visit count and value]

  8. Selection Expansion Simulation Backpropagation
     After another rollout the weights become w_i = v_i + 5 * sqrt(ln(4) / n_i), giving weights = [7.89, 5.89, 6.45] and distribution = [.39, .29, .32].
     [tree diagram with each node's visit count and value]

  9. Exercise: construct the UCB distribution
     Parent: 19 visits, value .45. Children: (5 visits, value .6), (3 visits, value .5), (8 visits, value .75), (2 visits, value 0).
     Answer (with C = 2): weights = [2.13, 2.48, 1.96, 2.43], probs = [0.24, 0.28, 0.22, 0.27].
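Checking the exercise arithmetic in Python (a sketch; the exploration constant C = 2 is inferred from the answer, since it reproduces all four weights):

import math

C = 2.0
parent_visits = 19
children = [(5, 0.60), (3, 0.50), (8, 0.75), (2, 0.00)]  # (visits, value)

weights = [v + C * math.sqrt(math.log(parent_visits) / n) for n, v in children]
probs = [w / sum(weights) for w in weights]

print([round(w, 2) for w in weights])  # [2.13, 2.48, 1.96, 2.43]
print([round(p, 2) for p in probs])    # [0.24, 0.28, 0.22, 0.27]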

  10. How do we pick a move?
      MCTS builds a tree, with visits and values for each node. How can we use this to pick a move?
      • Pick the highest-value move.
      • Pick the most-visited move.
      • Can we do both?
        • Use some weighted combination.
        • Keep simulating until they agree.
      [two example trees in which the highest-value child and the most-visited child differ]
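A short sketch of these selection rules, reusing the Node fields and mcts() from the earlier example (the until-they-agree loop is illustrative and can in principle run for a long time):

def best_by_value(root):
    return max(root.children, key=lambda ch: ch.value)

def best_by_visits(root):
    return max(root.children, key=lambda ch: ch.visits)

def best_move(root, extra_rollouts=100):
    # Keep simulating until the two criteria agree on the same child.
    while best_by_value(root) is not best_by_visits(root):
        mcts(root, extra_rollouts)
    return best_by_visits(root)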

  11. Generalizing MCTS Beyond UCT
      The tree policy returns a child node in the explored region of the tree. UCT uses a tree policy that draws samples according to UCB.
      The default policy returns a value estimate for a newly expanded node. UCT uses a default policy that completes a uniform random playout.

  12. Alternative tree policies
      Requirement: the tree policy needs to trade off exploration and exploitation.
      • Epsilon-greedy: pick a uniform random child with probability ε and the best child with probability (1 − ε). (Sketched below.)
        • We'll see this again soon.
      • Use UCB, but seed the tree with initial values:
        • from previous runs,
        • using a heuristic.
      • Other ideas?
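A minimal epsilon-greedy tree policy, as a drop-in replacement for UCB_sample in the pseudocode above (using the Node fields assumed in the earlier sketch):

import random

def epsilon_greedy_sample(node, epsilon=0.1):
    # Explore: uniform random child with probability epsilon.
    if random.random() < epsilon:
        return random.choice(node.children)
    # Exploit: otherwise the child with the highest mean value.
    return max(node.children, key=lambda ch: ch.value)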

  13. Alternative default policies
      Requirement: the default policy needs to run quickly and return a value estimate.
      • Use the board evaluation heuristic from bounded minimax.
      • Run multiple random rollouts for each expanded node. (Sketched below.)
      • Other ideas?
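A sketch of the multiple-rollouts option, averaging several random playouts into one value estimate (random_playout is from the pseudocode sketch above; k is an illustrative parameter):

def averaged_playout(state, k=5):
    # Average k independent random playouts for a lower-variance estimate,
    # at the cost of k times the simulation work per expansion.
    return sum(random_playout(state) for _ in range(k)) / k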

  14. Exercise: extend MCTS to these games
      • How can MCTS handle non-zero-sum games?
      • How can MCTS handle games with randomness?

  15. Non-Zero-Sum Games
      Key idea: store a value tuple with the average utility for each player.
      • Each node now stores visits, children, and one value for each player.
      • The agent who's making a decision computes UCB weights using only their component of the value tuple.
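A sketch of the value-tuple bookkeeping (the class and the player_to_move() accessor are illustrative assumptions, extending the Node sketch above):

import math

class MultiPlayerNode:
    def __init__(self, state, num_players, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.values = [0.0] * num_players  # one running mean per player

    def update(self, outcome):
        # outcome is a utility tuple with one entry per player
        self.visits += 1
        for p, u in enumerate(outcome):
            self.values[p] += (u - self.values[p]) / self.visits

def ucb_weight(parent, child, c=5.0):
    # The deciding agent uses only their own component of the value tuple.
    p = parent.state.player_to_move()  # assumed state accessor
    return child.values[p] + c * math.sqrt(math.log(parent.visits) / child.visits)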

  16. Randomness in the Environment
      This is what Monte Carlo simulations were made for!
      • Whenever we hit a move-by-nature in the game tree, sample from nature's move distribution.
      • We still need to track value and visits for the nature node, so that the parent can make its choices.
      [diagram: a nature node N whose branches are sampled with probabilities .4 and .6]
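One way a nature node might fit into the selection step (is_nature and move_probs are assumed fields, not from the slides; ucb_sample is from the earlier sketch):

import random

def select_child(node):
    if node.is_nature:
        # Move-by-nature: sample a child from nature's move distribution
        # rather than from UCB weights. The nature node's value and visits
        # are still updated during backpropagation, so its parent's UCB
        # weights remain well defined.
        return random.choices(node.children, weights=node.move_probs)[0]
    return ucb_sample(node)  # ordinary decision node: UCB as before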
