Monte-Carlo tree search for multi-player, no-limit Texas hold'em poker
Guy Van den Broeck

The challenges of poker:
Deceptive play: Should I bluff?
Opponent modeling: Is he bluffing?
Incomplete information: Who has the Ace?
Game of chance: What are the odds?
Exploitation: I'll bet because he always calls.
Huge state space: What can happen next?
Risk management & continuous action space: Should I bet $5 or $10?

Take-Away Message: We can solve all these problems!
Problem Statement
A bot for Texas hold'em poker
No-Limit & > 2 players
Not done before!
Exploitative, not game theoretic
Game tree search + Opponent modeling
Applies to any problem with incomplete information, non-determinism, or continuous actions
Outline
Overview of the approach
  The poker game tree, opponent model, Monte-Carlo tree search
Research challenges
  Search, opponent model
Conclusion
Poker Game Tree
Minimax trees: deterministic
  Tic-tac-toe, checkers, chess, go, …
Expecti(mini)max trees: chance
  Backgammon, …
Miximax trees: hidden information + opponent model
  [Figure: example trees with max, min, chance, and mix nodes]
Example
[Figure: a miximax tree for one decision, built up over several animation steps. The root "my action" max node has branches fold, call, and raise. Folding leads to a "Resolve" node where the game ends (value -1). "Reveal Cards" chance nodes branch with probabilities 0.5/0.5 over the hidden cards. Opponent nodes ("opp-1 action", "opp-2 action") are mix nodes whose branch probabilities, e.g. fold 0.6 / call 0.3 / raise 0.1, come from the opponent model. Leaf values such as 1, 3, 4, 2, and 3 are backed up through the tree to value the root actions.]
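To make the backup rule concrete, here is a minimal sketch of miximax evaluation. The Node class and the toy tree are illustrative assumptions, not the bot's implementation; the probabilities and payoffs loosely mirror the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str = "leaf"                 # "max", "chance", "mix", or "leaf"
    value: float = 0.0                 # payoff at a resolved leaf
    probs: list = field(default_factory=list)   # branch probabilities
    children: list = field(default_factory=list)

def miximax(node: Node) -> float:
    """Back up expected values through a miximax tree."""
    if not node.children:
        return node.value
    values = [miximax(c) for c in node.children]
    if node.kind == "max":             # my decision: choose the best action
        return max(values)
    # chance nodes use card odds; mix nodes use opponent-model probabilities
    return sum(p * v for p, v in zip(node.probs, values))

# Toy tree: fold resolves to -1; call reveals cards (0.5/0.5 over two
# outcomes worth 1 and 3); raise leads to an opponent mix node predicted
# to fold 0.6 / call 0.3 / raise 0.1 (payoffs 4, 2, 3 are illustrative).
root = Node("max", children=[
    Node(value=-1.0),
    Node("chance", probs=[0.5, 0.5],
         children=[Node(value=1.0), Node(value=3.0)]),
    Node("mix", probs=[0.6, 0.3, 0.1],
         children=[Node(value=4.0), Node(value=2.0), Node(value=3.0)]),
])
print(miximax(root))   # raise is best: 0.6*4 + 0.3*2 + 0.1*3 = 3.3
```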
Short Experiment
Opponent Model
A set of probability trees, learned with Weka's M5'
Separate models for:
  actions
  hand cards at showdown
Fold Probability

nbAllPlayerRaises <= 1.5 :
|   callFrequency <= 0.128 :
|   |   nbActionsThisRound <= 2.5 :
|   |   |   potOdds <= 0.28 :
|   |   |   |   AF <= 2.585 : 0.6904
|   |   |   |   AF > 2.585 :
|   |   |   |   |   potSize <= 3.388 :
|   |   |   |   |   |   round=flop <= 0.5 : 0.8068
|   |   |   |   |   |   round=flop > 0.5 : 0.6896
|   |   |   |   |   potSize > 3.388 : 0.8198
|   |   |   potOdds > 0.28 :
|   |   |   |   stackSize <= 97.238 :
|   |   |   |   |   callFrequency <= 0.038 : 0.8838
|   |   |   |   |   callFrequency > 0.038 :
|   |   |   |   |   |   round=flop <= 0.5 : 0.8316
|   |   |   |   |   |   round=flop > 0.5 :
|   |   |   |   |   |   |   nbSeatedPlayers <= 7.5 : 0.6614
|   |   |   |   |   |   |   nbSeatedPlayers > 7.5 : 0.7793
|   |   |   |   stackSize > 97.238 :
|   |   |   |   |   potSize <= 4.125 :
|   |   |   |   |   |   foldFrequency <= 0.813 : 0.7839
|   |   |   |   |   |   foldFrequency > 0.813 : 0.9037
|   |   |   |   |   potSize > 4.125 : 0.8623
|   |   nbActionsThisRound > 2.5 :
|   |   |   potOdds <= 0.218 :
|   |   |   |   callFrequency <= 0.067 : 0.8753
|   |   |   |   callFrequency > 0.067 : 0.7661
|   |   |   potOdds > 0.218 :
|   |   |   |   AF <= 2.654 : 0.8818
|   |   |   |   AF > 2.654 : 0.921
(Can also be relational)
Tilde probability tree [Ponsen08]
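At play time the bot needs P(fold) for a given game state. A minimal sketch of walking such a probability tree, assuming a hypothetical nested-tuple representation rather than Weka's own API:

```python
def predict(tree, features):
    """tree is ("leaf", p) or (feature_name, threshold, low_branch, high_branch)."""
    while tree[0] != "leaf":
        name, threshold, low, high = tree
        tree = low if features[name] <= threshold else high
    return tree[1]

# Toy fragment mirroring part of the fold-probability tree above.
fold_tree = ("potOdds", 0.28,
             ("AF", 2.585, ("leaf", 0.6904), ("leaf", 0.8068)),
             ("leaf", 0.8838))

print(predict(fold_tree, {"potOdds": 0.22, "AF": 1.9}))   # -> 0.6904
```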
Opponent Ranks
Learn the distribution of hand ranks at showdown
[Figure: histograms of probability per rank bucket and probability per number of raises]
Traversing the tree
Limit Texas hold'em: ~10^18 nodes, fully traversable
No-limit: >10^71 nodes, too large to traverse
  Sampled, not searched: Monte-Carlo Tree Search
Monte-Carlo Tree Search [Chaslot08]
[Figure: the four phases repeated every iteration: selection, expansion, simulation, backpropagation]
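A minimal sketch of the four-phase loop, with hypothetical node fields and callback names (not the author's code):

```python
# Four-phase MCTS loop [Chaslot08]; node layout and callbacks are
# illustrative assumptions.

def mcts(root, iterations, select, expand, simulate, backpropagate):
    for _ in range(iterations):
        # 1. Selection: descend the stored tree using a selection strategy
        path = [root]
        while path[-1]["children"]:
            path.append(select(path[-1]["children"]))
        # 2. Expansion: grow the tree with one new node below the leaf
        new_node = expand(path[-1])
        if new_node is not None:
            path.append(new_node)
        # 3. Simulation: play out the remainder of the game once
        reward = simulate(path[-1])
        # 4. Backpropagation: update the statistics on the sampled path
        backpropagate(path, reward)
    # after the final iteration, play the best-looking root action
    return max(root["children"], key=lambda ch: ch["value"])
```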
Selection
Selection strategies: UCT (Multi-Armed Bandit); CrazyStone
In each node, UCT selects the child $i$ maximizing

  $\bar{v}_i + C \sqrt{\ln n / n_i}$

where $\bar{v}_i$ is an estimate of the reward of child $i$ and $n_i$ is its number of samples ($n$ is the parent's sample count). The first term drives exploitation, the second exploration.
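A sketch of this selection rule (the exploration constant C and the child-statistics layout are assumptions):

```python
import math

def uct_select(children, c=1.4):
    """Pick the child maximizing v_i + C * sqrt(ln(n) / n_i)."""
    n = sum(ch["visits"] for ch in children)          # parent sample count
    def score(ch):
        if ch["visits"] == 0:
            return float("inf")                       # try unvisited children first
        exploitation = ch["value"]
        exploration = c * math.sqrt(math.log(n) / ch["visits"])
        return exploitation + exploration
    return max(children, key=score)

# A rarely sampled child can be selected over a higher-valued one:
print(uct_select([{"value": 2.0, "visits": 90},
                  {"value": 1.5, "visits": 3}]))      # -> the 3-visit child
```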
Expansion & Simulation
[Figure: the tree is expanded with a new node, and the rest of the game is simulated to produce a sampled outcome]
Backpropagation
In every node on the sampled path, update $\bar{v}$ (the estimate of the reward) and $n$ (the number of samples). Two backup operators:
  sample-weighted average
  maximum child
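A sketch of both backup operators, under the same hypothetical node layout as the selection sketch:

```python
def backpropagate(path, reward, backup="sample_weighted"):
    """Update statistics on the sampled root-to-leaf path."""
    for node in reversed(path):
        node["visits"] += 1
        if backup == "sample_weighted" or not node.get("children"):
            # incremental sample-weighted average of the rewards seen here
            node["value"] += (reward - node["value"]) / node["visits"]
        else:
            # maximum-child backup: a node is worth its best child so far
            node["value"] = max(ch["value"] for ch in node["children"])
```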
Initial experiments
One MCTS bot versus two rule-based bots: the MCTS bot is exploitative!
[Figure: winnings of the MCTS bot over time]
Outline
Overview of the approach
  The poker game tree, opponent model, Monte-Carlo tree search
Research challenges
  Search: uncertainty in MCTS, continuous action spaces
  Opponent model: online learning, concept drift
Conclusion
MCTS for games with uncertainty?
  Expected reward distributions (ERD)
  Sample selection using ERD
  Backpropagation of ERD
[VandenBroeck09]
Expected reward distribution
[Figure: distributions of the estimated reward after 10, 100, and ∞ samples. With few samples the variance due to sampling dominates; as samples accumulate the distribution narrows, leaving the uncertainty of estimating, and in the limit the estimate converges to the ExpectiMax/MixiMax value T(P), the analogue of the MiniMax value in deterministic games.]
ERD selection strategy
Objective: find the maximum expected reward.
Sample more in subtrees with
  (1) a high expected reward
  (2) an uncertain estimate
UCT does (1) but not really (2); CrazyStone does (1) and (2), but only for deterministic games (Go).
UCT+ selection combines (1), the "expected value under perfect play", with (2), a "measure of uncertainty due to sampling".
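The slides don't spell out the UCT+ formula. The sketch below assumes, following the "UCT+ (stddev)" label in the experiments, that the exploration term is a multiple of the standard error of the value estimate; the constant and statistics layout are assumptions:

```python
import math

def uct_plus_select(children, c=2.0):
    """Pick the child maximizing (1) expected value + c * (2) sampling uncertainty."""
    def score(ch):
        n = ch["visits"]
        if n == 0:
            return float("inf")                 # sample unvisited children first
        mean = ch["sum"] / n                    # (1) estimate of expected reward
        var = max(ch["sum_sq"] / n - mean * mean, 0.0)
        stderr = math.sqrt(var / n)             # (2) uncertainty due to sampling
        return mean + c * stderr
    return max(children, key=score)
```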
ERD max-distribution backpropagation
Example: a max node P with two children, A (estimated value 3) and B (estimated value 4).
  Sample-weighted average backup: 3.5
  Maximum-child backup: 4 ("When the game reaches P, we'll have more time to find the real maximum")
  Max-distribution backup: 4.5
The maximum-child backup still underestimates P, because either child may turn out to be better than 4. With independent children:

  P(A > 4) = 0.2, P(A < 4) = 0.8
  P(B > 4) = 0.5, P(B < 4) = 0.5
  P(max(A, B) > 4) = 1 - P(A < 4) * P(B < 4) = 1 - 0.8 * 0.5 = 0.6 > 0.5

so the distribution of max(A, B) lies above B's own distribution, giving the higher estimate 4.5.
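A sketch of the max-distribution backup for two independent, discrete ERDs; the toy distributions are chosen to roughly reproduce the numbers above (means 3 and 4, P(A > 4) = 0.2, P(B > 4) = 0.5):

```python
def max_distribution(dist_a, dist_b):
    """ERD of max(A, B) for independent discrete ERDs (value -> probability)."""
    out = {}
    for va, pa in dist_a.items():
        for vb, pb in dist_b.items():
            v = max(va, vb)
            out[v] = out.get(v, 0.0) + pa * pb
    return out

def mean(dist):
    return sum(v * p for v, p in dist.items())

A = {1.5: 0.4, 3.0: 0.4, 6.0: 0.2}    # mean 3, P(A > 4) = 0.2
B = {2.0: 0.5, 6.0: 0.5}              # mean 4, P(B > 4) = 0.5
M = max_distribution(A, B)
print(mean(A), mean(B), mean(M))      # 3.0 4.0 4.6: above the best child's mean
```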
Experiments
Two MCTS bots head-to-head:
  max-distribution backup vs. sample-weighted backup
  UCT+ selection (stddev) vs. UCT
[Figures: winnings over time for each pairing]
Dealing with continuous actions
Sample discrete actions: progressive unpruning [Chaslot08]
  (ignores the smoothness of the EV function)
Tree learning search (work in progress)
[Figure: EV as a function of relative bet size]
Tree learning search
Based on regression tree induction from data streams:
  training examples arrive quickly
  nodes are split when a split yields a significant reduction in stddev
  training examples are immediately forgotten
Edges in the TLS tree are not single actions but sets of actions, e.g. (raise in [2,40]) or (fold or call).
MCTS provides a stream of (action, EV) examples; action sets are split to reduce the stddev of EV, when the reduction is significant (see the sketch below).
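A sketch of the splitting rule for a continuous edge, with illustrative significance thresholds (the real system's test and parameters aren't given in the slides):

```python
import statistics

def best_split(samples, min_reduction=0.2, min_leaf=5):
    """samples: (bet_size, ev) pairs observed for one interval edge."""
    samples = sorted(samples)
    evs = [ev for _, ev in samples]
    base_sd = statistics.pstdev(evs)
    best = None
    for i in range(min_leaf, len(samples) - min_leaf + 1):
        left, right = evs[:i], evs[i:]
        # size-weighted stddev after splitting between samples i-1 and i
        sd = (len(left) * statistics.pstdev(left)
              + len(right) * statistics.pstdev(right)) / len(evs)
        if best is None or sd < best[1]:
            best = (samples[i][0], sd)
    if best and base_sd > 0 and base_sd - best[1] >= min_reduction * base_sd:
        return best[0]      # split point, e.g. the "optimal split at 4" below
    return None             # not significant yet: keep the interval whole
```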
Tree learning search
Example: a max node with edges for the action sets {Fold, Call} and (bet in [0,10]). When the samples reveal an optimal split at 4, the continuous edge is split into (bet in [0,4]) and (bet in [4,10]), each with its own subtree.
[Figure: tree levels alternate between one action of P1 and one action of P2]
Tree learning search
Selection phase: a sampled action (e.g. P1 bets 2.4) descends along the edge whose action set contains it. Each node has an EV estimate, which generalizes over the actions in its set.
Expansion: the selected node is expanded with a node that represents any action of the next player (P3, after P1 and P2 have acted).
Backpropagation
A new sample updates the EV estimates on its path; once the split becomes significant, the node's action set is split.
Online learning of the opponent model
Start from a (safe) model of the general opponent.
Exploit the weaknesses of the specific opponent: start to learn a model of the specific opponent (exploration of opponent behavior).
Multi-agent interaction
Yellow learns a model of Blue and changes strategy.
Yellow doesn't profit! Green profits without changing strategy!!
[Figure: winnings of Yellow, Blue, and Green over time]
Concept drift
While learning from a stream, the concept behind the training examples changes.
In the opponent model: the opponent's changing strategy. "Changing gears is not just about bluffing, it's about changing strategy to achieve a goal."
Learning with concept drift must
  adapt quickly to changes
  yet be robust to noise (and recognize recurrent concepts)
Basic approach to concept drift
Maintain a window of training examples:
  large enough to learn from
  small enough to adapt quickly, without 'old' concepts
Heuristics adjust the window size, based on the FLORA2 framework [Widmer92]; a sketch follows below.
[Figure: accuracy and window size over time]
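A minimal sketch of such a window-adjustment heuristic, loosely in the spirit of FLORA2 [Widmer92]; the thresholds and step sizes are illustrative assumptions, not the paper's values:

```python
from collections import deque

class AdaptiveWindow:
    """Window over a training stream whose size tracks model accuracy."""

    def __init__(self, size=100, min_size=20, max_size=1000):
        self.examples = deque()
        self.size, self.min_size, self.max_size = size, min_size, max_size

    def add(self, example, accuracy):
        """accuracy: recent accuracy of the model trained on the window."""
        if accuracy < 0.6:
            # suspected concept drift: shrink fast to drop stale examples
            self.size = max(self.min_size, self.size // 2)
        elif accuracy > 0.8:
            # stable concept: grow slowly to learn from more data
            self.size = min(self.max_size, self.size + 1)
        self.examples.append(example)
        while len(self.examples) > self.size:
            self.examples.popleft()
```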
4 components of a single opponent model
[Figure: accuracy and window size over time; annotations mark the start of online learning and a concept drift. With bad parameters for the heuristic, the approach is NOT ROBUST.]