AlphaGo
2/17/17
Video: https://www.youtube.com/watch?v=g-dKXOlsf98

Figure from the AlphaGo paper: AlphaGo keeps the structure of regular MCTS but replaces its two policies, the tree policy and the default policy, with neural networks.

Step 1: learn to predict human moves.
- Used a large database of expert games.
- Trained a policy network to imitate human move selection.
- CS63 topic: neural networks (weeks 8-9).

Step 2: improve the policy network by reinforcement learning through self-play.
- CS63 topics: reinforcement learning (weeks 6-7), stochastic gradient descent (week 3).

Step 3: train a value network to estimate the probability that black wins from a given board.
- Trained on a self-play database.
- CS63 topic: avoiding overfitting (weeks 9-10).
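Step 1 is, at its core, multiclass classification trained by gradient descent. A toy pure-Python sketch of the softmax + cross-entropy update (a hypothetical one-weight-per-move linear model, nothing like the paper's deep convolutional network):

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy linear "policy network": one weight per candidate move (hypothetical).
weights = [0.0, 0.0, 0.0]
features = [1.0, 2.0, 0.5]   # hypothetical per-move features
human_move = 1               # the move the expert actually played

for _ in range(200):         # gradient descent on this one example
    probs = softmax([w * f for w, f in zip(weights, features)])
    for i in range(3):
        # d(cross-entropy)/d(score_i) = p_i - 1[i == human_move]
        grad = probs[i] - (1.0 if i == human_move else 0.0)
        weights[i] -= 0.5 * grad * features[i]

probs = softmax([w * f for w, f in zip(weights, features)])
```

After training, the probability assigned to the expert's move approaches 1; real training averages this gradient over minibatches of many expert positions.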
When expanding a node, its initial value combines the value network's evaluation of the board with the outcome of a rollout.
A rollout according to P𝜌 selects moves at random, weighted by the estimated probability that a human would choose them, rather than uniformly at random.
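A sketch of this idea on a toy game (Nim: take 1-3 stones, taking the last stone wins), with a hand-coded stand-in for the move-probability estimate; everything here is illustrative, not AlphaGo's actual rollout policy:

```python
import random

random.seed(0)

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def toy_policy(stones, moves):
    # Hypothetical stand-in for P_rho: strongly prefer moves that leave
    # a multiple of 4, the winning strategy in this Nim variant.
    return [9.0 if (stones - m) % 4 == 0 else 1.0 for m in moves]

def rollout(stones, player=0):
    # Sample moves in proportion to the policy instead of uniformly.
    while True:
        moves = legal_moves(stones)
        stones -= random.choices(moves, weights=toy_policy(stones, moves))[0]
        if stones == 0:
            return player        # whoever takes the last stone wins
        player = 1 - player

# Average many rollouts to estimate player 0's win rate from 10 stones.
win_rate = sum(rollout(10) == 0 for _ in range(2000)) / 2000
```

Because the sampling is biased toward plausible moves, the averaged rollouts come out well above 0.5, reflecting that 10 stones is a won position for the first player; uniformly random rollouts blur that signal.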
UCT / MCTS
- Estimates values by averaging the outcomes of rollouts.
- Anytime algorithm (can give an answer immediately, improves its answer with more time).
- Heuristics are not required, but can be used if available.
- Handles randomness and hidden information gracefully.
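The tree policy of plain MCTS is usually UCB1 applied to trees (UCT). A minimal sketch of the selection step, assuming per-move (wins, visits) statistics are stored on the node:

```python
import math

def uct_select(children, c=1.4):
    # children: move -> (total_wins, visit_count); pick the move that
    # maximizes mean value plus the UCB1 exploration bonus.
    total_visits = sum(n for _, n in children.values())
    def ucb(stats):
        wins, visits = stats
        if visits == 0:
            return float("inf")   # always try unvisited moves first
        return wins / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(children, key=lambda move: ucb(children[move]))
```

For example, `uct_select({"a": (6, 10), "b": (3, 4), "c": (0, 0)})` returns `"c"`: unvisited moves are explored before any statistics are compared.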
MinMax/Backward Induction
- Computes exact values once the game tree is explored or pruned.
- Given enough time, solves the game.
- Can be made anytime with iterative deepening.
- Requires a heuristic evaluation function unless the game tree is small.
- Designed for deterministic, perfect information games.
- Have to write new heuristics for every game.
- Heuristics can be learned, e.g., by neural networks, but they're still heuristics.
- MCTS handles randomness and hidden information better than Min/Max.
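To make the contrast concrete, here is a minimal depth-limited MinMax sketch on a toy Nim game (take 1-3 stones, taking the last stone wins), where the depth cut-off needs exactly the kind of hand-written, game-specific heuristic the bullets describe; the heuristic here is hypothetical:

```python
def minmax(stones, maximizing, depth):
    # Value is from the maximizing player's view: +1 win, -1 loss.
    if stones == 0:
        # The previous player took the last stone and won.
        return -1 if maximizing else 1
    if depth == 0:
        # Hand-written, game-specific heuristic cut-off (hypothetical):
        # multiples of 4 are losing for the side to move.
        losing = stones % 4 == 0
        return (-1 if losing else 1) if maximizing else (1 if losing else -1)
    values = [minmax(stones - m, not maximizing, depth - 1)
              for m in (1, 2, 3) if m <= stones]
    return max(values) if maximizing else min(values)
```

An iterative-deepening driver would simply call `minmax` with depth 1, 2, 3, ... until time runs out, which is how MinMax is made anytime.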
So why does MCTS make so much sense for Go?