Online Knowledge Enhancements for Monte Carlo Tree Search in Probabilistic Planning Bachelor presentation Marcel Neidinger <m.neidinger@unibas.ch> Department of Mathematics and Computer Science, University of Basel 13. February 2017

What is Probabilistic Planning?
Solve planning tasks with probabilistic transitions
Models a Markov Decision Problem (MDP) given by $M = \langle V, s_0, A, T, R \rangle$
  A set of binary variables $V$ inducing states $S = 2^V$
  An initial state $s_0 \in S$
  A set of applicable actions $A$
  A transition model $T : S \times A \times S \to [0; 1]$
  A reward $R(s, a)$
Monte Carlo Tree Search algorithms solve MDPs

Monte Carlo Tree Search Algorithms
Algorithmic framework to solve MDPs
Used especially in computer Go
[Figures: Go board (1), Lee Sedol (2)]
1 Source: https://commons.wikimedia.org/wiki/File:Go_board.jpg
2 Source: https://qz.com/639952/googles-ai-won-the-game-go-by-defying-millennia-of-basic-human-instinct/

Four phases - Two components
[Figure: the four MCTS phases: Selection, Expansion, Simulation, Backpropagation]

Monte Carlo Tree node
MCTS tree for an MDP $M$
Important information in a tree node:
  A state $s \in S$
  A counter $N^{(i)}(s)$ for the number of visits
  A counter $N^{(i)}(s, a)$ $\forall a \in A$ for the number of times $a$ was selected in $s$
  A reward estimate $Q^{(i)}(s, a)$ for action $a$ in state $s$
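The node contents listed above could be held in a small record like the following. This is a minimal sketch, not PROST's data structure: the class name `Node`, its field names, and the choice to keep $Q(s, a)$ as a running mean of trial rewards are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class Node:
    """One MCTS tree node; names are illustrative, not taken from PROST."""
    state: FrozenSet[str]                                         # variables true in s
    visits: int = 0                                               # N(s)
    action_visits: Dict[str, int] = field(default_factory=dict)   # N(s, a)
    q: Dict[str, float] = field(default_factory=dict)             # Q(s, a)

    def update(self, action: str, reward: float) -> None:
        """Record one trial that chose `action` here and returned `reward`."""
        self.visits += 1
        n = self.action_visits.get(action, 0) + 1
        self.action_visits[action] = n
        old = self.q.get(action, 0.0)
        # Assumed backup: incremental mean, Q <- Q + (reward - Q) / n
        self.q[action] = old + (reward - old) / n
```

Calling `update` once per trial during backpropagation keeps both counters and the estimate in step.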

Online Knowledge
AlphaGo used neural networks for the two policies → domain-specific knowledge
We want domain-independent enhancements

Overview
Tree-Policy Enhancements
  All Moves as First
    α-AMAF
    Cutoff-AMAF
  Rapid Action Value Estimation
Default-Policy Enhancements
  Move-Average Sampling Technique
Conclusion

What is a Tree Policy?
Iterate through the known part of the tree and select an action given a node
Use a Q value for a state-action pair to estimate an action's reward

UCT
MCTS implementation first proposed in 2006
[Figure: example search tree with states s_1 to s_5, actions m, m′, m′′, and a trial reward of 10]

UCT
Reward approximation, parent node $v_l$, child node $v_j$:
$$\mathrm{UCT}(v_l, v_j) = Q^{(i)}(s_l, a_j) + 2 C_p \sqrt{\frac{2 \ln N^{(i)}(s_l)}{N^{(i+1)}(s_j)}} \tag{1}$$
From parent $v_l$ select the child node $v^*$ that maximises the score:
$$v^* = \arg\max_{v_j} \mathrm{UCT}(v_l, v_j) \tag{2}$$
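The UCT score and the child selection can be sketched in a few lines. The function names `uct_score` and `select_child` are hypothetical, and returning infinity for an unvisited child is a common convention assumed here, not something the slides specify.

```python
import math
from typing import Dict


def uct_score(q: float, parent_visits: int, child_visits: int, c_p: float) -> float:
    """Q + 2 * C_p * sqrt(2 * ln N(parent) / N(child))."""
    if child_visits == 0:
        return math.inf  # assumption: unvisited children are tried first
    return q + 2.0 * c_p * math.sqrt(2.0 * math.log(parent_visits) / child_visits)


def select_child(q: Dict[str, float], visits: Dict[str, int],
                 parent_visits: int, c_p: float) -> str:
    """Return the action whose child has the highest UCT score."""
    return max(q, key=lambda a: uct_score(q[a], parent_visits, visits[a], c_p))
```

With `c_p = 0` the selection is purely greedy on Q; larger values give more weight to rarely visited children.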

All Moves as First - Idea
UCT score needs several trials to become reliable
Idea: Generalize information extracted from trials
Implementation: Use an additional (node-independent) score that updates unselected actions as well
[Figure: table with columns State, Action, Reward next to the example search tree (states s_1 to s_5, actions m, m′, m′′, trial reward 10)]
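One way to read the node-independent score is as a table indexed by action only, updated for every action that appears anywhere in a trial. The sketch below follows that reading. The class name `AmafTable`, the running-mean update, and crediting each distinct action once per trial are assumptions, not details taken from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable


class AmafTable:
    """Action-indexed running averages shared by all nodes (an assumption)."""

    def __init__(self) -> None:
        self.n: Dict[str, int] = defaultdict(int)      # times an action was credited
        self.q: Dict[str, float] = defaultdict(float)  # mean reward per action

    def update(self, trial_actions: Iterable[str], reward: float) -> None:
        """Credit `reward` once to every distinct action seen in the trial."""
        for action in set(trial_actions):
            self.n[action] += 1
            self.q[action] += (reward - self.q[action]) / self.n[action]
```

Because the table ignores where in the tree an action was taken, a single trial informs many nodes at once, which is the generalization the slide describes.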

All Moves as First - α-AMAF
Idea: Combine UCT and AMAF score
$$\mathrm{SCR} = \alpha \cdot \mathrm{AMAF} + (1 - \alpha) \cdot \mathrm{UCT} \tag{3}$$
Choose action with highest SCR
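The blended score is a convex combination, so it reduces to plain UCT at α = 0 and to plain AMAF at α = 1. A minimal sketch, with hypothetical function names and per-action score dictionaries assumed as inputs:

```python
from typing import Dict


def alpha_amaf_score(amaf: float, uct: float, alpha: float) -> float:
    """SCR = alpha * AMAF + (1 - alpha) * UCT, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * amaf + (1.0 - alpha) * uct


def best_action(amaf: Dict[str, float], uct: Dict[str, float], alpha: float) -> str:
    """Pick the action with the highest blended score."""
    return max(uct, key=lambda a: alpha_amaf_score(amaf[a], uct[a], alpha))
```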

All Moves as First - α-AMAF - Results
[Figure: IPPC score (0 to 1) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for AMAF with α = 0, 0.2, 0.4, 0.6, 0.8, 1.0]

All Moves as First - α-AMAF - Problems
With more trials UCT becomes more reliable
AMAF score has higher variance
We want to discontinue using the AMAF score after some time

All Moves as First - Cutoff-AMAF
Introduce cutoff parameter $K$
$$\mathrm{SCR} = \begin{cases} \alpha \cdot \mathrm{AMAF} + (1 - \alpha) \cdot \mathrm{UCT} & \text{for } i \le K \\ \mathrm{UCT} & \text{else} \end{cases} \tag{4}$$
Use AMAF score only in the first $K$ trials
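The hard cutoff amounts to a single branch on the trial index. In this sketch the function name and the parameters `trial` (the index i) and `cutoff` (the parameter K) are hypothetical:

```python
def cutoff_amaf_score(amaf: float, uct: float, alpha: float,
                      trial: int, cutoff: int) -> float:
    """Blend AMAF and UCT while trial <= cutoff (K), then return plain UCT."""
    if trial <= cutoff:
        return alpha * amaf + (1.0 - alpha) * uct
    return uct
```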

All Moves as First - Cutoff-AMAF - Results
[Figure: total IPPC score (0.5 to 0.75) against the K value (0 to 50); init: IDS, backup: MC; reference curves: Raw UCT, Plain α-AMAF]

All Moves as First - Cutoff-AMAF - Problems
How to choose the parameter $K$?
When is the UCT score reliable enough?

Rapid Action Value Estimation - Idea
First introduced in 2007 for computer Go
Use soft cutoff
$$\alpha = \max\left\{0, \frac{V - v(n)}{V}\right\} \tag{5}$$
Use UCT for often visited nodes and AMAF score for less-visited
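The soft cutoff lets α fall linearly from 1 at an unvisited node to 0 once the node has V visits. The sketch below assumes the resulting α is plugged into the same blend as in α-AMAF. The function names are hypothetical.

```python
def rave_alpha(visits: int, v_max: int) -> float:
    """alpha = max(0, (V - v(n)) / V); V is the soft-cutoff parameter."""
    if v_max <= 0:
        raise ValueError("V must be positive")
    return max(0.0, (v_max - visits) / v_max)


def rave_score(amaf: float, uct: float, visits: int, v_max: int) -> float:
    """Blend AMAF and UCT with the visit-dependent weight (assumed blend)."""
    alpha = rave_alpha(visits, v_max)
    return alpha * amaf + (1.0 - alpha) * uct
```

A node with no visits is scored by AMAF alone, and any node with at least V visits is scored by UCT alone.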

Rapid Action Value Estimation - Results
[Figure: IPPC score (0 to 1) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for UCT, RAVE(5), RAVE(15), RAVE(25), RAVE(50)]

All Moves as First - Conclusion
[Figure: IPPC score (0 to 1) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for UCT, AMAF(α = 0.2), RAVE(25)]

Rapid Action Value Estimation - Problems
PROST uses a problem description with conditional effects
Also no preconditions given
PROST description is more general
  In PROST: Action: move_up
  In e.g. computer chess: Action: move_a2_to_a3
[Figure: grid with player, goal field, and move path]

Predicate Rapid Action Value Estimation
A state has predicates that give some context
Idea: Use predicates to find similar states and use their score
$$Q_{\mathrm{PRAVE}}(s, a) = \frac{1}{N} \sum_{p \in P} Q_{\mathrm{RAVE}}(p, a) \tag{6}$$
and weight with
$$\alpha = \max\left\{0, \frac{V - v(n)}{V}\right\} \tag{7}$$
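The predicate average might be computed as below. The slide does not define $N$ or $P$ further, so this sketch assumes that $P$ is the set of predicates of the state $s$, that $N$ is its size, and that predicate-action pairs never seen before contribute 0. The function name and the predicate strings in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def prave_q(predicates: Iterable[str], action: str,
            q_rave: Dict[Tuple[str, str], float]) -> float:
    """Average Q_RAVE(p, a) over the predicates p of a state."""
    preds = list(predicates)
    if not preds:
        return 0.0  # assumption: no predicates means no information
    # Assumption: N is the number of predicates; unseen pairs count as 0.
    return sum(q_rave.get((p, action), 0.0) for p in preds) / len(preds)
```

For example, with `q_rave = {("at_x1", "move_up"): 4.0, ("at_y2", "move_up"): 2.0}`, a state with predicates `at_x1` and `at_y2` gets the average of the two stored values for `move_up`.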

All Moves as First - Conclusion - Revisited
[Figure: IPPC score (0 to 1) per domain (wildfire, triangle, academic, elevators, tamarisk, sysadmin, recon, game, traffic, crossing, skill, navigation, total) for UCT, AMAF(α = 0.2), RAVE(25), PRAVE]

Overview
Tree-Policy Enhancements
  All Moves as First
    α-AMAF
    Cutoff-AMAF
  Rapid Action Value Estimation
Default-Policy Enhancements
  Move-Average Sampling Technique
Conclusion

What is a Default Policy?
Simulate the outcome of a trial
Basic default policy: random walk
[Figure: Simulation phase]

X-Average Sampling Technique
Use tree knowledge to bias the default policy towards moves that are more goal-oriented

Move-Average Sampling Technique - Idea - Sample Game
Introduce $Q(a)$
Use moves that are good on average
Choose action according to:
$$P(a) = \frac{e^{Q(a)/\tau}}{\sum_{b \in A} e^{Q(b)/\tau}} \tag{8}$$
[Figure: sample game grid with player, goal field, and move path]
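The sampling rule is a softmax over the action averages $Q(a)$ with temperature $\tau$. A minimal sketch follows. The function names are hypothetical, and subtracting the maximum before exponentiating is a numerical-stability step added here that leaves the probabilities unchanged.

```python
import math
import random
from typing import Dict, Optional


def mast_probabilities(q: Dict[str, float], tau: float) -> Dict[str, float]:
    """P(a) = exp(Q(a) / tau) / sum_b exp(Q(b) / tau)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    top = max(q.values())  # shift by the maximum for numerical stability
    weights = {a: math.exp((v - top) / tau) for a, v in q.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def sample_action(q: Dict[str, float], tau: float,
                  rng: Optional[random.Random] = None) -> str:
    """Draw one action from the distribution above."""
    rng = rng or random.Random()
    probs = mast_probabilities(q, tau)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

A small $\tau$ concentrates the probability on the best action on average, while a large $\tau$ approaches the uniform random walk of the basic default policy.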

Move-Average Sampling Technique - Idea - Example
Actions: r,r,u,u,u
  Q(r) = 1; N(r) = 2
  Q(u) = 6; N(u) = 3
Actions: r,r,u,l,l
  Q(r) = 2; N(r) = 4
  Q(u) = 7; N(u) = 4
  Q(l) = 3; N(l) = 2

Move-Average Sampling Technique - Idea - Example (2)
Actions: l,u,u,r,r
  Q(r) = 7; N(r) = 6
  Q(u) = 8; N(u) = 6
  Q(l) = 2; N(l) = 3
Actions: r,r,r,u,u
  Q(r) = 7; N(r) = 9
  Q(u) = 9; N(u) = 8
  Q(l) = 2; N(l) = 3
