

  1. Deep Reinforcement Learning
     Philipp Koehn, 18 April 2019

  2. Reinforcement Learning
     ● Sequence of actions
       – moves in chess
       – driving controls in a car
     ● Uncertainty
       – moves by opponent
       – random outcomes (e.g., dice rolls, impact of decisions)
     ● Reward delayed
       – chess: win/loss at end of game
       – Pacman: points scored throughout game
     ● Challenge: find optimal policy for actions

  3. Deep Learning
     ● Mapping input to output through multiple layers
     ● Weight matrices and activation functions

  4. AlphaGo

  5. Book
     ● Lecture based on the book Deep Learning and the Game of Go by Pumperla and Ferguson, 2019
     ● Hands-on introduction to game playing and neural networks
     ● Lots of Python code

  6. Go

  7. Go
     ● Board game with white and black stones
     ● Stones may be placed anywhere
     ● If an opponent's stones are surrounded, you can capture them
     ● Ultimately, you need to claim territory
     ● Player with the most territory and captured stones wins

  8. Go Board
     ● Starting board: the standard board is 19x19, but the game can also be played on 9x9 or 13x13

  9. Move 1
     ● First move: white

  10. Move 2
      ● Second move: black

  11. Move 3
      ● Third move: white

  12. Move 7
      ● Situation after 7 moves, black's turn

  13. Move 8
      ● Move by black: the white stone in the middle is now surrounded

  14. Capture
      ● The white stone in the middle is captured

  15. Final State
      ● Any further moves will not change the outcome

  16. Final State with Territory Marked
      ● Total score: number of squares in territory + number of captured stones

  17. Why is Go Hard for Computers?
      ● Many moves possible
        – 19x19 board
        – 361 moves initially
        – games may last 300 moves
        ⇒ huge branching factor in search space
      ● Hard to evaluate board positions
        – control of board most important
        – number of captured stones less relevant

  18. Game Playing

  19. Game Tree
      ● Recall: game tree to consider all possible moves

  20. Alpha-Beta Search
      ● Explore game tree depth-first
      ● Exploration stops at win or loss
      ● Backtrack to other paths, note best/worst outcome
      ● Ignore paths with worse outcomes
      ● This does not work for a game tree with about 361^300 states
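
      A minimal Python sketch of alpha-beta search, assuming a hypothetical GameState interface with legal_moves(), apply(), is_terminal(), and value(); it illustrates the pruning idea, not the lecture's code:

      # Illustrative alpha-beta search; the GameState interface is an assumption.
      def alpha_beta(state, alpha, beta, maximizing):
          if state.is_terminal():
              return state.value()              # e.g., +1 win, -1 loss, 0 draw
          if maximizing:
              best = float("-inf")
              for move in state.legal_moves():
                  best = max(best, alpha_beta(state.apply(move), alpha, beta, False))
                  alpha = max(alpha, best)
                  if alpha >= beta:             # opponent already has a better option
                      break                     # prune the remaining moves
              return best
          else:
              best = float("inf")
              for move in state.legal_moves():
                  best = min(best, alpha_beta(state.apply(move), alpha, beta, True))
                  beta = min(beta, best)
                  if alpha >= beta:
                      break
              return best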

  21. Evaluation Function for States
      ● Explore game tree up to some specified maximum depth
      ● Evaluate leaf states
        – informed by knowledge of the game
        – e.g., chess: pawn count, control of board
      ● This does not work either, due to
        – high branching factor
        – difficulty of defining the evaluation function

  22. Monte Carlo Tree Search

  23. Monte Carlo Tree Search
      (tree diagram with win/loss counts at each visited node)
      ● Explore depth-first randomly ("roll-out"), record the win on all states along the path

  24. Monte Carlo Tree Search
      ● Pick an existing node as starting point, execute another roll-out, record the loss

  25. Monte Carlo Tree Search
      ● Pick an existing node as starting point, execute another roll-out

  26. Monte Carlo Tree Search
      ● Pick an existing node as starting point, execute another roll-out

  27. Monte Carlo Tree Search
      ● Increasingly, prefer to explore paths with a high win percentage

  28. Monte Carlo Tree Search
      ● Which node to pick? Score each node by
        w + c √(log N / n)
        – N: total number of roll-outs
        – n: number of roll-outs for this node in the game tree
        – w: winning percentage of this node
        – c: hyperparameter to balance exploration
      ● This is an inference algorithm
        – execute, say, 10,000 roll-outs
        – pick the initial action with the best win percentage w
        – can be improved by following rules based on well-known local shapes
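
      A compact Python sketch of the roll-out loop with this selection score, assuming a hypothetical GameState interface (legal_moves(), apply(), is_terminal(), winner(), to_play); it illustrates the algorithm on these slides, not the book's implementation:

      import math
      import random

      class Node:
          def __init__(self, state, parent=None):
              self.state, self.parent = state, parent
              self.children = []
              self.wins, self.visits = 0, 0

          def score(self, c=1.4):
              # w + c * sqrt(log N / n), with N taken from the parent's visit count
              if self.visits == 0:
                  return float("inf")            # always try unvisited nodes first
              w = self.wins / self.visits
              return w + c * math.sqrt(math.log(self.parent.visits) / self.visits)

      def rollout(state):
          # play random moves to the end of the game, return the winner
          while not state.is_terminal():
              state = state.apply(random.choice(state.legal_moves()))
          return state.winner()

      def mcts(root_state, num_rollouts=10000):
          root = Node(root_state)
          for _ in range(num_rollouts):
              node = root
              # selection: follow children with the highest score
              while node.children:
                  node = max(node.children, key=Node.score)
              # expansion: add one child per legal move
              if not node.state.is_terminal():
                  node.children = [Node(node.state.apply(m), parent=node)
                                   for m in node.state.legal_moves()]
                  node = random.choice(node.children)
              # simulation and backpropagation
              winner = rollout(node.state)
              while node is not None:
                  node.visits += 1
                  if winner != node.state.to_play:   # credit the player who moved into this node
                      node.wins += 1
                  node = node.parent
          # pick the initial action with the best win percentage
          best = max(root.children, key=lambda n: n.wins / max(n.visits, 1))
          return best.state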

  29. Action Prediction with Neural Networks

  30. Learning Moves
      ● We would like to learn the actions of a game-playing agent
      ● Input state: board position
      ● Output action: optimal move

  31. Learning Moves
      ● Machine learning problem
      ● Input: 5x5 matrix (board position encoded as 0 / 1 / -1 values)
      ● Output: 5x5 matrix (the move to play)
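
      A sketch of encoding such a board with numpy; the stone lists and the 1 / -1 / 0 convention are illustrative assumptions:

      import numpy as np

      # Encode a board as a matrix: 1 for the player to move, -1 for the
      # opponent, 0 for empty points. Stone coordinates are assumed inputs.
      def encode_board(own_stones, opponent_stones, size=5):
          board = np.zeros((size, size), dtype=np.int8)
          for row, col in own_stones:
              board[row, col] = 1
          for row, col in opponent_stones:
              board[row, col] = -1
          return board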

  32. Neural Networks
      ● First idea: feed-forward neural network
        – encode board position in an n × n sized vector
        – encode correct move in an n × n sized vector
        – add some hidden layers
      ● Many parameters
        – input and output vectors have dimension 361 (19x19 board)
        – if hidden layers have the same size → 361x361 weights for each
      ● Does not generalize well
        – same patterns on various locations of the board
        – has to learn moves for each location
        – consider everything moved one position to the right
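
      A minimal Keras sketch of this first idea for a 19x19 board; layer sizes and training settings are illustrative choices, not the lecture's configuration:

      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense

      board_points = 19 * 19    # 361 input features and 361 possible moves

      model = Sequential([
          Dense(board_points, activation="relu", input_shape=(board_points,)),
          Dense(board_points, activation="relu"),
          Dense(board_points, activation="softmax"),   # one probability per move
      ])
      model.compile(optimizer="sgd", loss="categorical_crossentropy")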

  33. Convolutional Neural Networks
      ● Convolutional kernel: here maps a 3x3 matrix to a 1x1 value
      ● Applied to all 3x3 regions of the original matrix
      ● Learns local features
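
      A plain-numpy sketch of applying one 3x3 kernel to every 3x3 region of the board matrix (no padding, illustrative only):

      import numpy as np

      def convolve(board, kernel):
          # slide the kernel over all 3x3 regions; the output shrinks by 2 per dimension
          k = kernel.shape[0]
          rows = board.shape[0] - k + 1
          cols = board.shape[1] - k + 1
          out = np.zeros((rows, cols))
          for i in range(rows):
              for j in range(cols):
                  out[i, j] = np.sum(board[i:i + k, j:j + k] * kernel)
          return out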

  34. Move Prediction with CNNs
      (architecture figure: convolutional layer → convolutional layer → flatten → feed-forward layer)
      ● May use multiple convolutional kernels (of the same size) → learn different local features
      ● Resulting values may be added, or the maximum value selected (max-pooling)
      ● May have several convolutional neural network layers
      ● Final layer: softmax prediction of the move
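
      A Keras sketch of such a network for a 19x19 board; the number of kernels, kernel size, and optimizer are illustrative assumptions:

      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Conv2D, Flatten, Dense

      model = Sequential([
          Conv2D(32, (3, 3), activation="relu", padding="same",
                 input_shape=(19, 19, 1)),             # board as a single input channel
          Conv2D(32, (3, 3), activation="relu", padding="same"),
          Flatten(),
          Dense(19 * 19, activation="softmax"),         # softmax over all board points
      ])
      model.compile(optimizer="adam", loss="categorical_crossentropy")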

  35. Human Game Play Data

  36. Human Game Play Data
      ● Game records
        – sequence of moves
        – winning player
      ● Convert into training data for move prediction
        – one move at a time
        – prediction target +1 for a move made by the winner
        – prediction target −1 for a move made by the loser
      ⇒ learn winning moves, avoid losing moves
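
      A sketch of this conversion, assuming a hypothetical record format (record.moves as (player, board, move) tuples, record.winner) and an encode() helper:

      # Turn one game record into (encoded board, move, +1/-1 target) examples.
      # The record format and the encode() helper are assumptions for illustration.
      def to_training_examples(record):
          examples = []
          for player, board, move in record.moves:
              target = 1 if player == record.winner else -1
              examples.append((encode(board, player), move, target))
          return examples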

  37. Playing Go with a Neural Move Predictor
      ● Greedy search
      ● Make a prediction at each turn
      ● Select the move with the highest probability
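
      A sketch of this greedy policy, reusing the CNN model sketched above; encode() and the legal-move mask are illustrative assumptions:

      import numpy as np

      def select_move(model, game_state):
          # encode the board, predict move probabilities, play the best legal move
          x = encode(game_state.board, game_state.to_play)
          x = x.reshape(1, 19, 19, 1).astype("float32")
          probs = model.predict(x)[0]
          probs = probs * game_state.legal_move_mask()   # zero out illegal moves
          return int(np.argmax(probs))                   # index of the chosen board point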

  38. Reinforcement Learning
