Monte-Carlo Tree Search Parallelisation (International Go Symposium)

Monte-Carlo Tree Search Parallelisation International Go Symposium 2012 Francois van Niekerk francoisvn@ml.sun.ac.za August 2012 Collaborators: Steve Kroon Gert-Jan van Rooyen Cornelia Inggs This work was partially supported by the National


  1. Monte-Carlo Tree Search Parallelisation International Go Symposium 2012 Francois van Niekerk francoisvn@ml.sun.ac.za August 2012

  2. Collaborators: Steve Kroon Gert-Jan van Rooyen Cornelia Inggs This work was partially supported by the National Research Foundation of South Africa.

  3. Outline • Introduction • Background: Computer Go, Monte-Carlo Tree Search • Parallelisation • Implementation • Testing and Results: Multi-Core Parallelisation, Cluster Parallelisation • New Developments • Conclusions

  4. Introduction • Top Go programs are currently about 5 dan on KGS. • Monte-Carlo Tree Search (MCTS) is the dominant Computer Go algorithm. • MCTS parallelisation is possible on multi-core and cluster systems.

  5. Computer Go • Tree for moves and their follow-ups. • Exponential tree growth means brute-force is infeasible. • Evaluation function is used to avoid growing tree too far.

  6. Classical Methods • Emulate humans with expert knowledge. • Difficult to assimilate new knowledge into an already large body. • Top strength in the single-digit kyu (SDK) range, far from professional level.

  7. Monte-Carlo Tree Search • Monte-Carlo methods: stochastic simulations (playouts). • The winrate of playouts starting from a position is taken as the value of that position. • Playouts are used in a tree to form Monte-Carlo Tree Search (MCTS). • MCTS can be broken into four parts: selection, expansion, simulation and backpropagation.

  8. Monte-Carlo Tree Search [Figure: example search tree, each node labelled wins/visits; root 4/9 with children 1/3, 0/1 and 3/5.] Selection

  9. Monte-Carlo Tree Search [Figure: same tree, with a new leaf node added.] Expansion

  10. Monte-Carlo Tree Search [Figure: same tree; the playout from the new node ends in a win (W).] Simulation (playout)

  11. Monte-Carlo Tree Search [Figure: the new node is updated to 1/1.] Backpropagation

  12. Monte-Carlo Tree Search [Figure: the next node on the path is updated from 0/1 to 1/2.] Backpropagation

  13. Monte-Carlo Tree Search [Figure: the next node on the path is updated from 3/5 to 4/6.] Backpropagation

  14. Monte-Carlo Tree Search [Figure: the root is updated from 4/9 to 5/10.] Backpropagation
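
The four phases above can be sketched in a few lines of Python. This is a minimal illustration and not Oakfoam's code: it assumes a toy game (take 1 to 3 stones from a pile, whoever takes the last stone wins) in place of Go, and it assumes UCT as the selection rule with an arbitrary exploration constant `c`. All names (`Node`, `select`, `expand`, `simulate`, `backpropagate`, `mcts`) are hypothetical.

```python
import math
import random

class Node:
    """One tree node; `wins` is counted for the player who moved into it."""
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones              # stones left after `move` was played
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.wins = 0
        self.visits = 0

def select(node, c=1.4):
    # Selection: descend while the node is fully expanded and not terminal.
    while not node.untried and node.children:
        node = max(node.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(node.visits) / ch.visits))
    return node

def expand(node, rng):
    # Expansion: add one child for a move that has not been tried yet.
    if node.untried:
        move = node.untried.pop(rng.randrange(len(node.untried)))
        child = Node(node.stones - move, parent=node, move=move)
        node.children.append(child)
        return child
    return node                           # terminal position, nothing to add

def simulate(stones, rng):
    # Simulation: random playout. Returns True if the player who moved
    # into this position ends up taking the last stone.
    mover_wins = True                     # with 0 stones left, the mover has won
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        mover_wins = not mover_wins
    return mover_wins

def backpropagate(node, mover_wins):
    # Backpropagation: update counts up to the root, flipping the
    # perspective at every ply because the players alternate.
    while node is not None:
        node.visits += 1
        node.wins += mover_wins
        mover_wins = not mover_wins
        node = node.parent

def mcts(stones, playouts, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(playouts):
        leaf = expand(select(root), rng)
        backpropagate(leaf, simulate(leaf.stones, rng))
    best = max(root.children, key=lambda ch: ch.visits)
    return best.move, root

if __name__ == "__main__":
    move, root = mcts(stones=5, playouts=2000)
    print("chosen move:", move)
    for ch in root.children:
        print(f"take {ch.move}: {ch.wins}/{ch.visits}")
```

Unlike the figure, where every node on the path gains a win, this sketch stores wins from the point of view of the player who moved into each node, so the result is flipped at every level during backpropagation. The move returned is the most-visited root child, which is one common choice among several.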

  15. Parallelisation • Improve MCTS: improve algorithm or increase playouts. • Increasing number of playouts increases playing strength. • Increase playouts: increase thinking time or playout rate. • Parallelisation: use parallel hardware to increase playout rate and therefore strength. • Three parallelisation methods for MCTS: tree, leaf, and root.

  16. Tree Parallelisation • Single shared tree. • Well-suited to shared-memory systems, such as multi-core systems.
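
A minimal sketch of the shared-tree idea, under strong simplifying assumptions: the "tree" is only a root with one level of children, a playout is replaced by a coin flip with a hypothetical per-move win probability, and a single global lock protects the statistics. The visit count is incremented at selection time, before the result is known, which is one simple way to realise a virtual loss. None of this is taken from Oakfoam; `SharedTree`, `worker` and `search` are illustrative names.

```python
import math
import random
import threading

class SharedTree:
    """Root plus one level of children, shared by all threads."""
    def __init__(self, win_probs):
        self.win_probs = win_probs            # hypothetical playout win chance per move
        self.wins = [0] * len(win_probs)
        self.visits = [0] * len(win_probs)
        self.lock = threading.Lock()          # one global lock (assumption, for simplicity)

    def select(self):
        with self.lock:
            total = sum(self.visits)

            def ucb(i):
                if self.visits[i] == 0:
                    return float("inf")       # try every move once first
                return (self.wins[i] / self.visits[i]
                        + math.sqrt(2 * math.log(total) / self.visits[i]))

            move = max(range(len(self.visits)), key=ucb)
            # Virtual loss: count the visit now, with no win yet, so that
            # other threads are steered towards different moves.
            self.visits[move] += 1
            return move

    def update(self, move, won):
        with self.lock:
            self.wins[move] += won            # the visit was already counted

def worker(tree, playouts, seed):
    rng = random.Random(seed)
    for _ in range(playouts):
        move = tree.select()
        # Stand-in for a real playout; it runs outside the lock.
        won = rng.random() < tree.win_probs[move]
        tree.update(move, won)

def search(win_probs, threads=4, playouts_per_thread=500):
    tree = SharedTree(win_probs)
    pool = [threading.Thread(target=worker, args=(tree, playouts_per_thread, s))
            for s in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return tree

if __name__ == "__main__":
    tree = search([0.2, 0.5, 0.8])
    print(list(zip(tree.wins, tree.visits)))
```

In standard CPython the global interpreter lock prevents pure-Python threads from running playouts truly in parallel, so this shows the structure of the approach rather than a real speedup.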

  17. Leaf Parallelisation • Master and slave nodes. • Only one tree, on the master. • Slaves are playout workers.
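
The master/worker split can be sketched as follows. This is an assumption-laden toy: the master's tree is a dictionary of per-move statistics, a playout is a coin flip with a hypothetical win probability, and a thread pool stands in for the playout workers. The names `playout`, `select_leaf` and `master` are illustrative only.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def playout(win_prob, seed):
    """Worker task: stand-in for one random game from the leaf position."""
    return random.Random(seed).random() < win_prob

def select_leaf(stats):
    """Master-side selection over the leaves (UCB1 is an assumption here)."""
    total = sum(visits for _, visits in stats.values())

    def ucb(move):
        wins, visits = stats[move]
        if visits == 0:
            return float("inf")
        return wins / visits + math.sqrt(2 * math.log(total) / visits)

    return max(stats, key=ucb)

def master(win_probs, workers=4, steps=50):
    # The only tree lives here, on the master.
    stats = {move: (0, 0) for move in range(len(win_probs))}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for step in range(steps):
            leaf = select_leaf(stats)
            # One playout per worker, all from the same leaf.
            seeds = [step * workers + w for w in range(workers)]
            results = list(pool.map(playout, [win_probs[leaf]] * workers, seeds))
            # The master alone backpropagates the combined result.
            wins, visits = stats[leaf]
            stats[leaf] = (wins + sum(results), visits + workers)
    return stats

if __name__ == "__main__":
    for move, (wins, visits) in master([0.3, 0.7]).items():
        print(f"move {move}: {wins}/{visits}")
```

Because the master waits for all workers before it selects again, each step costs as much as its slowest playout, which is one reason this scheme tends to scale less well than the other two.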

  18. Root Parallelisation • Each execution node maintains a tree. • Each node performs MCTS. • Periodic sharing of information.
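
A rough sketch of independent searchers with periodic sharing. It makes several assumptions that the slides do not specify: each tree is only one level deep, playouts are coin flips with hypothetical win probabilities, the shared information is the win and visit counts of the root's children, and the "cluster" is simulated sequentially in one process. `Searcher`, `share` and `root_parallel` are illustrative names.

```python
import math
import random

class Searcher:
    """One execution node with its own (one-level) tree."""
    def __init__(self, n_moves, seed):
        self.rng = random.Random(seed)
        self.wins = [0] * n_moves          # local results plus results received from others
        self.visits = [0] * n_moves
        self.delta_wins = [0] * n_moves    # local results not yet shared
        self.delta_visits = [0] * n_moves

    def _select(self):
        total = sum(self.visits)

        def ucb(i):
            if self.visits[i] == 0:
                return float("inf")
            return (self.wins[i] / self.visits[i]
                    + math.sqrt(2 * math.log(total) / self.visits[i]))

        return max(range(len(self.visits)), key=ucb)

    def run(self, win_probs, playouts):
        for _ in range(playouts):
            move = self._select()
            won = self.rng.random() < win_probs[move]   # stand-in for a playout
            self.wins[move] += won
            self.visits[move] += 1
            self.delta_wins[move] += won
            self.delta_visits[move] += 1

def share(searchers):
    """Periodic sharing: every node adds every other node's unshared results."""
    deltas = [(s.delta_wins, s.delta_visits) for s in searchers]
    for i, s in enumerate(searchers):
        for j, (dw, dv) in enumerate(deltas):
            if i == j:
                continue                   # own results are already counted
            for m in range(len(dw)):
                s.wins[m] += dw[m]
                s.visits[m] += dv[m]
    for s in searchers:
        s.delta_wins = [0] * len(s.wins)
        s.delta_visits = [0] * len(s.visits)

def root_parallel(win_probs, nodes=4, periods=5, playouts_per_period=100):
    searchers = [Searcher(len(win_probs), seed) for seed in range(nodes)]
    for _ in range(periods):
        for s in searchers:                # on a cluster these would run concurrently
            s.run(win_probs, playouts_per_period)
        share(searchers)
    return searchers

if __name__ == "__main__":
    for s in root_parallel([0.2, 0.5, 0.8]):
        print(list(zip(s.wins, s.visits)))
```

Between two sharing points each node searches on stale information from the others, which illustrates why delayed updates can carry a strength penalty.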

  19. Parallel Effect • Strength penalty for parallelisation. • Due to the change from sequential to parallel execution. • More pronounced if the playout updates are delayed, for example in root (cluster) vs. tree (multi-core) parallelisation.

  20. Implementation • Oakfoam is an open-source cross-platform MCTS engine for Computer Go. • Tree parallelisation for multi-core systems. • Root parallelisation for cluster systems.

  21. Testing and Results • Test for playout rate increase. • If increase found, test for strength penalty. • If strength penalty found, test for overall strength increase.

  22. Multi-Core Parallelisation Results [Figure: two panels, "Speedup on 9x9" and "Speedup on 19x19". Axes: Speedup against Cores (1, 2, 4, 8). Series: Ideal, No additions, Virtual Loss, Lock-free, Both additions.]

  23. Cluster Parallelisation Results [Figure: two panels, "Strength Comparison on 9x9" and "Strength Comparison on 19x19". Axes: Winrate vs. 1-Core [%] (50 to 100) against Cores/Periods (1 to 16 and 1 to 64). Series: Baseline 10s/move, and 10s/move and 2s/move with p = 0.05, 0.1 and 0.2.]

  24. Overview of Results • Multi-core: tree parallelisation showed linear scaling up to eight cores (physical limit in these tests). • Cluster: root parallelisation for 19x19 showed scaling up to eight nodes, where its strength improvement matched that of an ideal four-core system.

  25. New Developments • Pachi uses virtual wins and losses to improve cluster scaling. • Depth-First UCT changes MCTS from a best-first to a depth-first search. • Distributed UCT and Distributed Depth-First UCT use Transposition-table Driven Scheduling to break up the tree across nodes. • UCT-Treesplit uses Transposition-table Driven Scheduling to break up the MCTS work across nodes. • Only virtual wins and losses have been applied to Computer Go so far.

  26. Conclusions • MCTS is the dominant algorithm for Computer Go. • Parallelisation on multi-core systems scales well. • Parallelisation on cluster systems is possible, but there is still room for improvement. • Future of cluster parallelisation holds possibilities.

  27. Thanks Thank you for taking time to listen to this talk. More information about this talk is available at: http://oakfoam.com/igs2012. Please send any questions to: francoisvn@ml.sun.ac.za.
