Monte Carlo Tree Search


  1. Monte Carlo Tree Search
     Mark Maloof
     Department of Computer Science, Georgetown University, Washington, DC 20057
     1 January 1970

  2. Overview
     MCTS consists of four main steps (Browne et al., 2012):
       1. Selection: Starting at the root, select the best action until reaching a node that has not been fully explored (i.e., a node with untried and therefore unevaluated actions).
       2. Expansion: Choose an untried action, and expand the tree by adding a child node.
       3. Simulation: From the newly added child, select actions uniformly at random until reaching a terminal state and receiving a reward (e.g., +1 for winning, −1 for losing).
       4. Backpropagation: Starting at the new child node, propagate the reward to the root by updating the visit count N(v) and the total simulation reward Q(v) of each node v along the path.

  3. Figure 2, Browne et al. (2012)

  4. Upper-confidence Bound for Trees (UCT)
       function uctSearch(s_0)
         create a root node v_0 with state s_0
         while within computational budget do
           v_l ← treePolicy(v_0)
           ∆ ← defaultPolicy(s(v_l))
           backup(v_l, ∆)
         end while
         return a(bestChild(v_0, 0))
       end function
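
The search loop above can be sketched end to end on a toy game. The following is a minimal illustration, not the author's implementation: it assumes a single-pile Nim variant (take 1 to 3 stones, whoever takes the last stone wins), and the names `Node`, `step`, `uct_search`, the budget, and the constant c = 1/√2 are all choices made for this example. It uses the negamax form of backup, so each node's Q is stored from the point of view of the player who moved into that node.

```python
import math
import random

class Node:
    """Hypothetical tree node for single-pile Nim; state = (stones, player_to_move)."""
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action                      # action that led to this node
        self.children = []
        self.untried = [k for k in (1, 2, 3) if k <= state[0]]
        self.N = 0                                # visit count
        self.Q = 0.0                              # total reward for the player who moved here

def step(state, a):
    """Transition function f(s, a): remove a stones and switch players."""
    return (state[0] - a, 1 - state[1])

def best_child(v, c):
    """UCT selection: mean reward plus exploration bonus."""
    return max(v.children,
               key=lambda w: w.Q / w.N + c * math.sqrt(2 * math.log(v.N) / w.N))

def tree_policy(v, c):
    while v.state[0] > 0:                         # v is non-terminal
        if v.untried:                             # v is not fully expanded
            a = v.untried.pop()
            child = Node(step(v.state, a), v, a)
            v.children.append(child)
            return child
        v = best_child(v, c)
    return v

def default_policy(state, rng):
    """Random playout; +1 if the player who moved into `state` wins, else -1."""
    mover = 1 - state[1]
    while state[0] > 0:
        legal = [k for k in (1, 2, 3) if k <= state[0]]
        state = step(state, rng.choice(legal))
    winner = 1 - state[1]                         # whoever took the last stone
    return 1.0 if winner == mover else -1.0

def backup_negamax(v, delta):
    while v is not None:
        v.N += 1
        v.Q += delta
        delta = -delta                            # flip perspective at each level
        v = v.parent

def uct_search(s0, budget=5000, c=1 / math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node(s0)
    for _ in range(budget):
        leaf = tree_policy(root, c)
        delta = default_policy(leaf.state, rng)
        backup_negamax(leaf, delta)
    return best_child(root, 0).action             # c = 0: pure exploitation

move = uct_search((5, 0))
```

With 5 stones, the game-theoretically best move is to take 1 stone, leaving the opponent a multiple of 4; with a budget of this size the search is expected to settle on that move.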

  5. Tree Policy
       function treePolicy(v)
         while v is non-terminal do
           if v not fully expanded then
             return expand(v)
           else
             v ← bestChild(v, C_p)
           end if
         end while
         return v
       end function

  6. Expand
       function expand(v)
         choose a ∈ untried actions from A(s(v))
         add a new child v′ to v with s(v′) = f(s(v), a) and a(v′) = a
         return v′
       end function

  7. Best Child
       function bestChild(v, c)
         return \operatorname{argmax}_{v' \in \mathrm{children}(v)} \left[ \frac{Q(v')}{N(v')} + c \sqrt{\frac{2 \ln N(v)}{N(v')}} \right]
       end function
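
To make the selection rule concrete, here is a small sketch of bestChild with hand-picked statistics. The `Node` dataclass and the numbers are assumptions for illustration only; the formula is the one on the slide, Q(v′)/N(v′) + c·sqrt(2 ln N(v) / N(v′)). The exploration constant 1/√2 used below is one commonly cited choice for rewards in [0, 1], not something the slides specify.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical node: N is the visit count, Q the total simulation reward."""
    N: int = 0
    Q: float = 0.0
    children: list = field(default_factory=list)

def best_child(v: Node, c: float) -> Node:
    """Return the child maximizing Q/N + c * sqrt(2 ln N(v) / N(child)).

    Assumes every child has N > 0, which holds when nodes are only
    selected after they have been expanded and backed up once.
    """
    def uct(child: Node) -> float:
        exploit = child.Q / child.N
        explore = c * math.sqrt(2.0 * math.log(v.N) / child.N)
        return exploit + explore
    return max(v.children, key=uct)

a = Node(N=10, Q=6.0)   # mean reward 0.6, visited often
b = Node(N=2, Q=1.0)    # mean reward 0.5, visited rarely
root = Node(N=12, Q=7.0, children=[a, b])

greedy = best_child(root, 0.0)                  # picks a: highest mean reward
exploring = best_child(root, 1 / math.sqrt(2))  # picks b: large exploration bonus
```

With c = 0 the rule reduces to picking the highest mean reward, which is why uctSearch calls bestChild(v_0, 0) for the final move; with c > 0 the rarely visited child wins here because its bonus term outweighs the small gap in mean reward.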

  8. Default Policy
       function defaultPolicy(s)
         while s is non-terminal do
           choose a ∈ A(s) uniformly at random
           s ← f(s, a)
         end while
         return reward for state s
       end function
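
As a rough illustration of the default policy, the sketch below plays uniformly random moves to the end of a toy game. The game (single-pile Nim, take 1 to 3 stones, last stone wins), the helper names `actions`, `step`, `is_terminal`, and `reward`, and the convention of scoring from player 0's point of view are all assumptions made for this example.

```python
import random

def actions(state):
    """A(s): legal moves are to take 1, 2, or 3 stones, if available."""
    stones, _ = state
    return [k for k in (1, 2, 3) if k <= stones]

def step(state, a):
    """f(s, a): remove a stones and pass the turn to the other player."""
    stones, player = state
    return (stones - a, 1 - player)

def is_terminal(state):
    return state[0] == 0

def reward(state):
    """+1 if player 0 took the last stone, otherwise -1."""
    _, player_to_move = state
    last_mover = 1 - player_to_move
    return 1 if last_mover == 0 else -1

def default_policy(state, rng):
    """Play uniformly random moves until the game ends; return the reward."""
    while not is_terminal(state):
        a = rng.choice(actions(state))
        state = step(state, a)
    return reward(state)

rng = random.Random(0)
results = [default_policy((5, 0), rng) for _ in range(200)]
forced = default_policy((1, 0), rng)   # only one legal move, so player 0 wins
```

Averaging many such playouts gives a noisy estimate of a state's value, which is what the backup step accumulates in Q(v).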

  9. Backup
       function backup(v, ∆)
         while v is not null do
           N(v) ← N(v) + 1
           Q(v) ← Q(v) + ∆(v, p)    ⊲ p is player
           v ← parent of v
         end while
       end function

  10. Backup Negamax
       function backupNegamax(v, ∆)
         while v is not null do
           N(v) ← N(v) + 1
           Q(v) ← Q(v) + ∆
           ∆ ← −∆
           v ← parent of v
         end while
       end function
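
The negamax backup can be traced on a three-node path. This is a minimal sketch with a hypothetical `Node` class; it shows that each node on the path gains one visit and that the sign of the reward alternates from level to level, as it should in a two-player zero-sum game.

```python
class Node:
    """Hypothetical node holding a parent link, visit count N, and total reward Q."""
    def __init__(self, parent=None):
        self.parent = parent
        self.N = 0
        self.Q = 0.0

def backup_negamax(v, delta):
    """Walk from v to the root, adding delta and flipping its sign at each level."""
    while v is not None:
        v.N += 1
        v.Q += delta
        delta = -delta
        v = v.parent

root = Node()
child = Node(root)
leaf = Node(child)
backup_negamax(leaf, 1.0)
# leaf.Q is +1.0, child.Q is -1.0, root.Q is +1.0; every N is 1
```

The loop condition tests the node v rather than a state, since the walk ends when the root's (null) parent is reached.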

  11. Figure 3, Browne et al. (2012)

  13. References
      C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012. doi: 10.1109/TCIAIG.2012.2186810.
