Deep Reinforcement Learning

Philipp Koehn 18 April 2019

Reinforcement Learning

  • Sequence of actions

– moves in chess
– driving controls in car

  • Uncertainty

– moves by opponent
– random outcomes (e.g., dice rolls, impact of decisions)

  • Reward delayed

– chess: win/loss at end of game
– Pacman: points scored throughout game

  • Challenge: find optimal policy for actions

Deep Learning

  • Mapping input to output through multiple layers
  • Weight matrices and activation functions
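
To make these two bullets concrete, here is a toy two-layer network in plain NumPy (layer sizes and the ReLU activation are arbitrary choices, not from the lecture):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(16, 8))     # weight matrix, input -> hidden
    W2 = rng.normal(size=(8, 4))      # weight matrix, hidden -> output

    def forward(x):
        h = relu(x @ W1)              # layer 1: weights, then activation
        return h @ W2                 # layer 2: output scores

    print(forward(rng.normal(size=(16,))))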

AlphaGo

Book

  • Lecture based on the book

Deep Learning and the Game of Go by Pumperla and Ferguson, 2019

  • Hands-on introduction to game playing and neural networks

  • Lots of Python code

go

Go

  • Board game with white and black stones
  • Stones may be placed anywhere
  • If an opponent's stones are surrounded, you can capture them
  • Ultimately: you need to claim territory
  • Player with most territory and captured stones wins

Go Board

  • Starting board: the standard board is 19x19, but the game can also be played on 9x9 or 13x13

Move 1

  • First move: white

Move 2

  • Second move: black

Move 3

  • Third move: white

Move 7

  • Situation after 7 moves, black’s turn

Move 8

  • Move by black surrounds the white stone in the middle

Capture

  • White stone in middle is captured

Final State

  • Any further moves will not change outcome

Final State with Territory Marked

  • Total score: number of empty points in territory + number of captured stones

Why is Go Hard for Computers?

  • Many moves possible

– 19x19 board
– 361 possible moves initially
– games may last 300 moves
⇒ Huge branching factor in search space

  • Hard to evaluate board positions

– control of board most important
– number of captured stones less relevant

game playing

Game Tree

  • Recall: game tree to consider all possible moves

Alpha-Beta Search

  • Explore game tree depth-first
  • Exploration stops at win or loss
  • Backtrack to other paths, note best/worst outcome
  • Ignore paths with worse outcomes
  • This does not work for a game tree with about 361^300 states
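
For reference, a minimal alpha-beta sketch in Python; the game interface (children, is_terminal, value) is passed in as assumed helper functions:

    import math

    def alphabeta(state, depth, alpha, beta, maximizing,
                  children, is_terminal, value):
        # value() scores a terminal or depth-limited state for the maximizer
        if depth == 0 or is_terminal(state):
            return value(state)
        if maximizing:
            best = -math.inf
            for child in children(state):
                best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                           False, children, is_terminal, value))
                alpha = max(alpha, best)
                if alpha >= beta:          # prune: opponent avoids this branch
                    break
            return best
        best = math.inf
        for child in children(state):
            best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                       True, children, is_terminal, value))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best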

Evaluation Function for States

  • Explore game tree up to some specified maximum depth
  • Evaluate leaf states

– informed by knowledge of game
– e.g., chess: pawn count, control of board

  • This does not work either, due to

– high branching factor
– difficulty of defining evaluation function

monte carlo tree search

Monte Carlo Tree Search

[diagram: game tree; one random roll-out reaches a win, recorded as statistics 1/0 along the path]

  • Explore depth-first randomly ("roll-out"), record win on all states along path

Monte Carlo Tree Search

[diagram: game tree after a second roll-out ending in a loss; node statistics 1/0 and 0/1, root 1/1]

  • Pick existing node as starting point, execute another roll-out, record loss

Monte Carlo Tree Search

[diagram: game tree after a third roll-out ending in a win; statistics updated along each path]

  • Pick existing node as starting point, execute another roll-out

Monte Carlo Tree Search

[diagram: game tree after a fourth roll-out ending in a loss; statistics updated along each path]

  • Pick existing node as starting point, execute another roll-out

Monte Carlo Tree Search

[diagram: game tree after a fifth roll-out; paths with high win percentage are visited more often]

  • Increasingly, prefer to explore paths with high win percentage

Monte Carlo Tree Search

  • Which node to pick?

w + c √(log N / n)

– N: total number of roll-outs
– n: number of roll-outs for this node in the game tree
– w: winning percentage
– c: hyperparameter to balance exploration

(a Python sketch of this selection rule appears below)

  • This is an inference algorithm

– execute, say, 10,000 roll-outs
– pick initial action with best win percentage w
– can be improved by following rules based on well-known local shapes
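
A minimal Python sketch of the node-selection rule; representing children as (win percentage, visit count) pairs and the value c = 1.4 are illustrative assumptions, not from the lecture:

    import math

    def ucb_score(w, n, N, c=1.4):
        # w: win percentage, n: visits to this node,
        # N: total number of roll-outs, c: exploration constant
        if n == 0:
            return math.inf            # always try unvisited nodes first
        return w + c * math.sqrt(math.log(N) / n)

    def pick_child(children, N, c=1.4):
        # children: list of (win_percentage, visit_count) pairs
        return max(range(len(children)),
                   key=lambda i: ucb_score(children[i][0], children[i][1], N, c))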

action prediction with neural networks

Learning Moves

  • We would like to learn the actions of a game-playing agent
  • Input state: board position
  • Output action: optimal move

Learning Moves

[diagram: 5x5 board position and the corresponding move, each encoded as a matrix with 1s marking positions]

  • Machine learning problem
  • Input: 5x5 matrix
  • Output: 5x5 matrix

Neural Networks

[diagram: the same 5x5 board matrix as network input]

  • First idea: feed-forward neural network

– encode board position in n × n sized vector
– encode correct move in n × n sized vector
– add some hidden layers

  • Many parameters

– input and output vectors have dimension 361 (19x19 board)
– if hidden layers have same size → 361x361 weights for each

  • Does not generalize well

– same patterns on various locations of the board
– has to learn moves for each location
– consider everything moved one position to the right
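
For concreteness, a sketch of this first idea in Keras (optimizer and loss are assumptions; this is not the lecture's code):

    from tensorflow import keras

    n = 19 * 19                                      # 361 board points

    model = keras.Sequential([
        keras.Input(shape=(n,)),                     # flattened board encoding
        keras.layers.Dense(n, activation="relu"),    # hidden layer: 361x361 weights
        keras.layers.Dense(n, activation="softmax")  # probability per move
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")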

Convolutional Neural Networks

[diagram: a 3x3 convolutional kernel applied to regions of the board matrix, each region producing a single value]

  • Convolutional kernel: here maps 3x3 matrix to 1x1 value
  • Applied to all 3x3 regions of the original matrix
  • Learns local features

Move Prediction with CNNs

[diagram: board → convolutional layer → convolutional layer → flatten → feed-forward layer → move probabilities]

  • May use multiple convolutional kernels (of same size)

→ learn different local features

  • Resulting values may be added or maximum value selected (max-pooling)
  • May have several convolutional neural network layers
  • Final layer: softmax prediction of move
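
A sketch of such a network in Keras; the number of kernels and the layer sizes are illustrative assumptions:

    from tensorflow import keras

    board_size = 19
    model = keras.Sequential([
        keras.Input(shape=(board_size, board_size, 1)),
        keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        keras.layers.Flatten(),
        # final layer: softmax prediction of the move
        keras.layers.Dense(board_size * board_size, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")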

Human Game Play Data

  • Game records

– sequence of moves
– winning player

  • Convert into training data for move prediction

– one move at a time
– prediction +1 for move if winner
– prediction −1 for move if loser

  • Learn winning moves, avoid losing moves
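
A sketch of this conversion; the record format and the game-specific helpers passed in are assumptions:

    def game_to_training_data(moves, winner, encode, initial_board, apply_move):
        # moves: ordered list of (player, move) pairs; winner: winning player id
        # encode, initial_board, apply_move: game-specific helpers (assumed)
        examples = []
        board = initial_board()
        for player, move in moves:
            target = 1.0 if player == winner else -1.0   # +1 winner, -1 loser
            examples.append((encode(board, player), move, target))
            board = apply_move(board, player, move)
        return examples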

Playing Go with Neural Move Predictor

  • Greedy search
  • Make prediction at each turn
  • Select move with highest probability

reinforcement learning

Self-Play

  • Previously: learn policy from human play data
  • Now: learn policy from self-play
  • Need to have an agent that plays reasonably well to start

→ learn initial policy from human play data

  • Greedy move selection with same policy will result in the same game each time

– stochastic moves: move predicted with 80% confidence → select it 80% of the time
– may have to clip probabilities that are too certain (e.g., 99.9% to 80%)
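
A minimal sketch of stochastic move selection with clipping (the 0.8 cap mirrors the slide's example; the rest is an assumption):

    import numpy as np

    def sample_move(probs, cap=0.8, rng=np.random.default_rng()):
        # probs: move probabilities from the policy network
        p = np.minimum(probs, cap)    # clip over-confident predictions
        p = p / p.sum()               # renormalize to a distribution
        return rng.choice(len(p), p=p)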

Experience from Self-Play

  • Self-play will generate self-play data ("experience")

– sequence of moves
– winner at the end

  • Can be used as training data to improve model

– first train model on human play data
– then, run 1 epoch over self-play data

Policy Search

[diagram: board → convolutional layer → convolutional layer → flatten → feed-forward layer → move prediction]

  • Reminder: policy informs which action to take in each state
  • Learning move predictor = learning policy

Q Learning

[diagram: board → convolutional layer → convolutional layer → flatten → feed-forward layer → utility]

  • Learn utility value for each state = likelihood of winning
  • Training on game play data, utility=1 for win, 0 for loss
  • Game play with utility predictor

– consider all possible actions
– compute utility value for resulting state
– choose action with maximum utility outcome
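
A sketch of game play with a utility predictor; legal_moves, apply_move, and the trained utility function are passed in as assumed helpers:

    def choose_action(state, legal_moves, apply_move, utility):
        # utility: trained network mapping a state to a win-probability estimate
        best_move, best_value = None, float("-inf")
        for move in legal_moves(state):
            value = utility(apply_move(state, move))   # utility of resulting state
            if value > best_value:
                best_move, best_value = move, value
        return best_move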

Q Learning

[diagram: convolutional layers for the board, combined through several feed-forward layers → utility]

  • Alternative architecture
  • Explicitly modeling the last move

actor-critic learning

Credit Assignment Problem

  • Go game lasts many moves (say, 300 moves)

– some of the moves are good
– some of the moves are bad
– some of the moves make no difference

  • We want to learn from the moves that made a difference

– before the move: low chance of winning
– the move is played
– at the end → win

Consider Win Probability

[plot: win probability over the moves of a game, rising from 0.5 toward 1; steep rises mark important moves, flat stretches unimportant moves]

  • Moves that pushed towards win matter more

Consider Win Probability

[plot: win probability over moves; moves that lift the win probability from below 0.5 to above it mark very important moves]

  • Especially important moves: change from losing position to winning position

Advantage

[plot: win probability over moves, with the advantage of a move shown as the gap between the final outcome and the current win probability]

  • Compute utility of state V(s). Definition of advantage: A = R − V(s)

Actor-Critic Learning

  • Combination of policy learning and Q learning

– actor: move predictor (as in policy learning) s → a
– critic: value of state (as in Q learning) V(s)

  • We use this setup to influence how much to boost good moves

– advantage A = R − V(s)
– good moves when advantage is high
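
A sketch of computing per-move advantages for a finished game, using the definition A = R − V(s) above:

    def advantages(values, final_reward):
        # values: the critic's V(s) for each state visited during the game
        # final_reward: R, e.g., 1 for a win and -1 for a loss
        return [final_reward - v for v in values]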

Policy Learning with Advantage

  • Before: predict win
  • Now: predict advantage
[diagram: board position as input matrix; the network outputs an advantage score, e.g., 0.8]

Architecture of Actor-Critic Model

[diagram: board → convolutional layers → flatten → shared feed-forward layer, branching into two heads: move prediction (actor) and utility (critic)]

  • Game play data with advantage scores for each move
  • Training of actor and critic similar

⇒ Share components, train them jointly

  • Multi-task learning helps regularization
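
A sketch of such a two-headed model in Keras (layer sizes, activations, and losses are assumptions):

    from tensorflow import keras

    board_size = 19
    n = board_size * board_size

    inputs = keras.Input(shape=(board_size, board_size, 1))
    x = keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu")(x)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(256, activation="relu")(x)       # shared trunk

    policy = keras.layers.Dense(n, activation="softmax", name="actor")(x)
    value = keras.layers.Dense(1, activation="tanh", name="critic")(x)

    model = keras.Model(inputs=inputs, outputs=[policy, value])
    model.compile(optimizer="adam",
                  loss={"actor": "categorical_crossentropy", "critic": "mse"})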

alpha go

Overview

[diagram: human game records → supervised learning → fast policy network and strong policy network; strong policy network → self-play reinforcement learning → stronger policy network → value network; all used during inference]

Encoding the Board

  • We encoded each board position with an integer (+1=white, –1=black, 0=blank)
  • AlphaGo uses a 48-dimensional vector that encodes knowledge about the game

– 3 booleans for stone color
– 1 boolean for legal and fundamentally sensible move
– 8 booleans to record how far back stone was placed
– 8 booleans to encode liberty
– 8 booleans to encode liberty after move
– 8 booleans to encode capture size
– 8 booleans to encode how many of your own stones will be placed in jeopardy because of move
– 2 booleans for ladder detection
– 3 booleans for technical values

  • Note: ladder, liberty, and capture are basic concepts of the game

Policy and Value Networks

  • Policy network: s → a
  • Value network: s → V(s)
  • These networks are trained as previously described
  • Fairly deep networks

– 13 layers for policy network
– 16 layers for value network

Monte Carlo Tree Search

  • Inference uses a refined version of Monte Carlo Tree Search (MCTS)
  • Roll-out guided by fast policy network (greedy search)
  • When visiting a node with some unexplored children ("leaf")

→ use probability distribution from strong policy network for stochastic choice

  • Combine roll-out statistics with prediction from value network

MCTS with Value Network

  • Estimate value of a leaf node l in the game tree where a roll-out started as

V(l) = 1/2 value(l) + 1/2 roll-out(l)

– value(l) is prediction from value network
– roll-out(l) is win percentage from Monte Carlo Tree Search

  • This is used to compute Q values for any state-action pair, given its leaf nodes l_i

Q(s,a) = ∑_i V(l_i) / N(s,a)

  • Combine with the prediction of the strong policy network P(s,a)

a′ = argmax_a [ Q(s,a) + P(s,a) / (1 + N(s,a)) ]
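
These formulas transcribe directly into Python; the dictionary-based bookkeeping is an assumption:

    def leaf_value(value_net, rollout_win, leaf):
        # V(l) = 1/2 value(l) + 1/2 roll-out(l)
        return 0.5 * value_net(leaf) + 0.5 * rollout_win(leaf)

    def select_action(actions, Q, P, N):
        # a' = argmax_a [ Q(s,a) + P(s,a) / (1 + N(s,a)) ]
        return max(actions, key=lambda a: Q[a] + P[a] / (1 + N[a]))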

alpha go zero

Less and More

  • Less

– no pre-training with human game play data
– no hand-crafted features in board encoding
– no Monte Carlo roll-outs

  • More

– 80 convolutional layers
– tree search also used in self-play

Improved Tree Search

  • Tree search adds one node in each iteration (not full roll-out)
  • When exploring a new node

– compute its Q value
– compute action prediction probability distribution
– pass Q value back up through search tree

  • Each node in search tree keeps record of

– P: prediction for action leading to this node
– Q: average of all terminal Q values from visits passing through node
– N: number of visits of parent
– n: number of visits of node

  • Score of node (c is a hyperparameter to be optimized)

Q + c P √N / (1 + n)
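
The same score as a small Python function (the default c is a placeholder; the slide leaves it to be optimized):

    import math

    def node_score(Q, P, N, n, c=1.0):
        # Q: average value, P: action prior from the network,
        # N: parent visits, n: node visits, c: exploration hyperparameter
        return Q + c * P * math.sqrt(N) / (1 + n)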

Inference and Training

  • Inference

– choose action from most visited branch
– visit count is impacted by both action prediction and success in tree search
→ more reliable than win statistics or raw action prediction

  • Training

– predict visit count

and more...
