Monte Carlo Tree Search Mark Maloof Department of Computer Science - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Monte Carlo Tree Search

Mark Maloof Department of Computer Science Georgetown University Washington, DC 20057 1 January 1970

slide-2
SLIDE 2

Overview

◮ MCTS consists of four main steps (Browne et al., 2012):

  • 1. Selection: Starting at the root, repeatedly select the best action until reaching a node that has not been fully explored (i.e., a node with untried and therefore unevaluated actions).

  • 2. Expansion: Choose an untried action, and expand the tree by adding a child node.

  • 3. Simulation: From the newly added child, select actions uniformly at random until reaching a terminal state and receiving a reward (e.g., +1 for winning, −1 for losing).

  • 4. Backpropagation: Starting at the new child node, propagate the reward to the root by updating the visit count N(v) and the total simulation reward Q(v) of each node along the path.

slide-3
SLIDE 3

Figure 2, Browne et al. (2012)

slide-4
SLIDE 4

Upper-confidence Bound for Trees (UCT)

function uctSearch(s0)
    create a root node v0 with state s0
    while within computational budget do
        vl ← treePolicy(v0)
        ∆ ← defaultPolicy(s(vl))
        backup(vl, ∆)
    end while
    return a(bestChild(v0, 0))
end function
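
To make the loop concrete, here is a minimal Python sketch of UCT on a toy game. Everything game-specific is an assumption made for illustration and is not part of the slides: a single-pile Nim variant in which players alternately take 1 to 3 stones and whoever takes the last stone wins, the hypothetical helpers `legal_actions`, `step`, and `is_terminal`, a fixed iteration count as the computational budget, and the constant Cp = 1/√2. The sketch uses the negamax form of backup and stores Q(v) from the viewpoint of the player who moved into v.

```python
import math
import random

def legal_actions(s):            # A(s): take 1, 2, or 3 stones (toy game, assumed)
    return [a for a in (1, 2, 3) if a <= s]

def step(s, a):                  # f(s, a): stones left after taking a
    return s - a

def is_terminal(s):
    return s == 0

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state       # s(v)
        self.parent = parent
        self.action = action     # a(v): action that led to this node
        self.children = []
        self.untried = legal_actions(state)
        self.n = 0               # N(v): visit count
        self.q = 0.0             # Q(v): total reward for the player who moved into v

def expand(v):
    a = v.untried.pop()
    child = Node(step(v.state, a), parent=v, action=a)
    v.children.append(child)
    return child

def best_child(v, c):
    return max(v.children,
               key=lambda ch: ch.q / ch.n + c * math.sqrt(2 * math.log(v.n) / ch.n))

def tree_policy(v, cp):
    while not is_terminal(v.state):
        if v.untried:            # v is not fully expanded
            return expand(v)
        v = best_child(v, cp)
    return v

def default_policy(s, rng):
    """Random playout; reward is for the player who moved into state s."""
    reward = 1.0                 # if s is already terminal, that player took the last stone
    while not is_terminal(s):
        s = step(s, rng.choice(legal_actions(s)))
        reward = -reward         # every further move changes who moved last
    return reward

def backup_negamax(v, delta):
    while v is not None:
        v.n += 1
        v.q += delta
        delta = -delta           # the parent belongs to the other player
        v = v.parent

def uct_search(s0, iterations=2000, cp=1 / math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node(s0)
    for _ in range(iterations):  # computational budget: a fixed number of iterations
        leaf = tree_policy(root, cp)
        delta = default_policy(leaf.state, rng)
        backup_negamax(leaf, delta)
    return best_child(root, 0).action

print(uct_search(5))
```

In this toy game, leaving the opponent a multiple of four stones is the known winning strategy, so from five stones one would hope to see the search prefer taking one stone. The result depends on the budget and the random seed, so the sketch does not guarantee it.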

slide-5
SLIDE 5

Tree Policy

function treePolicy(v)
    while v is non-terminal do
        if v not fully expanded then
            return expand(v)
        else
            v ← bestChild(v, Cp)
        end if
    end while
    return v
end function

slide-6
SLIDE 6

Expand

function expand(v)
    choose a ∈ untried actions from A(s(v))
    add a new child v′ to v with s(v′) = f(s(v), a) and a(v′) = a
    return v′
end function

slide-7
SLIDE 7

Best Child

function bestChild(v, c)
    return $\arg\max_{v' \in \mathrm{Children}(v)} \left[ \frac{Q(v')}{N(v')} + c \sqrt{\frac{2 \ln N(v)}{N(v')}} \right]$
end function
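
A small worked example may help show how the constant c trades off exploitation against exploration in bestChild. The visit counts and rewards below are made-up numbers, the function name `uct_score` is hypothetical, and Cp = 1/√2 is used only as an illustrative choice of exploration constant.

```python
import math

def uct_score(q_child, n_child, n_parent, c):
    """Mean reward Q(v')/N(v') plus the exploration bonus c * sqrt(2 ln N(v) / N(v'))."""
    exploit = q_child / n_child
    explore = c * math.sqrt(2.0 * math.log(n_parent) / n_child)
    return exploit + explore

# Hypothetical statistics (Q, N) for two children of a parent visited 30 times.
children = {"a": (12.0, 20), "b": (5.0, 10)}
n_parent = 30
cp = 1 / math.sqrt(2)  # illustrative exploration constant (an assumption)

for c in (0.0, cp):
    best = max(children, key=lambda k: uct_score(*children[k], n_parent, c))
    print(f"c = {c:.3f}: best child is {best}")
```

With c = 0 the choice is purely greedy and child a wins on its mean reward (0.6 versus 0.5). With c = Cp the less-visited child b receives the larger exploration bonus and is selected instead. This is why the final move is chosen with bestChild(v0, 0), while the tree policy uses a positive constant.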

slide-8
SLIDE 8

Default Policy

function defaultPolicy(s)
    while s is non-terminal do
        choose a ∈ A(s) uniformly at random
        s ← f(s, a)
    end while
    return reward for state s
end function

slide-9
SLIDE 9

Backup

function backup(v, ∆)
    while v is not null do
        N(v) ← N(v) + 1
        Q(v) ← Q(v) + ∆(v, p)    ⊲ p is the current player at v
        v ← parent of v
    end while
end function

slide-10
SLIDE 10

Backup Negamax

function backupNegamax(v, ∆)
    while v is not null do
        N(v) ← N(v) + 1
        Q(v) ← Q(v) + ∆
        ∆ ← −∆
        v ← parent of v
    end while
end function
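
The two backup variants can be compared on a three-node path. The sketch below assumes a two-player, zero-sum game and one particular convention that the slides do not spell out: each node records the player who made the move leading to it, and Q(v) is accumulated from that player's viewpoint. The `Node` fields and the `chain` helper are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    player: int                       # player who moved into this node (assumed convention)
    parent: Optional["Node"] = None
    n: int = 0                        # N(v): visit count
    q: float = 0.0                    # Q(v): total simulation reward

def backup(v, delta):
    """General form: delta maps each player to that player's reward."""
    while v is not None:
        v.n += 1
        v.q += delta[v.player]
        v = v.parent

def backup_negamax(v, delta):
    """Zero-sum form: delta is the reward for the player who moved into v."""
    while v is not None:
        v.n += 1
        v.q += delta
        delta = -delta                # flip the sign for the opposing player one level up
        v = v.parent

def chain():
    """Build root -> child -> leaf with alternating players."""
    root = Node(player=2)             # player 1 is to move at the root
    child = Node(player=1, parent=root)
    leaf = Node(player=2, parent=child)
    return root, child, leaf

# Suppose the simulated game is won by player 2: +1 for player 2, -1 for player 1.
a = chain()
backup(a[2], {1: -1.0, 2: +1.0})
b = chain()
backup_negamax(b[2], +1.0)
print([(v.n, v.q) for v in a])
print([(v.n, v.q) for v in b])
```

Under these assumptions both variants leave identical statistics on the path, which is why the negamax form is a convenient shortcut for two-player zero-sum games. The general form is still needed when rewards are not zero-sum or when there are more than two players.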

slide-11
SLIDE 11

Figure 3, Browne et al. (2012)

slide-13
SLIDE 13

References I

  • C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012. doi: 10.1109/TCIAIG.2012.2186810.