
CS885 Reinforcement Learning
Lecture 13c: June 13, 2018
Adversarial Search
[RusNor] Sec. 5.1-5.4
University of Waterloo CS885 Spring 2018 Pascal Poupart

Outline
• Minimax search
• Evaluation functions
• Alpha-beta pruning

Game search challenge
• What makes game search challenging?
  – There is an opponent!
  – The opponent is malicious: it wants to win (i.e. it is trying to make you lose)
  – We need to take this into account when choosing moves
    • Simulate the opponent’s behaviour in our search
• Notation: one player is called MAX (who wants to maximize its utility) and the other is called MIN (who wants to minimize MAX’s utility)

Example: Tic-Tac-Toe
• MAX’s job is to use the search tree to determine the best move

Optimal strategies
• Want to find the optimal strategy
  – One that leads to outcomes at least as good as any other strategy, given that MIN is playing optimally
  – Equilibrium (game theory)
  – Zero-sum game of perfect information

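Minimax Value
$$
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{Utility}(n) & \text{if } n \text{ is a terminal state} \\
\max_{s \in \text{Succ}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Succ}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
$$
[Figure: example game tree; one ply is one move by one player]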
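The recursive definition of MINIMAX-VALUE can be sketched in Python on an explicit game tree. This is a minimal sketch assuming a hypothetical representation: a leaf is a number (its utility for MAX), an internal node is a list of its successors, and MAX and MIN alternate by level. The tree is the one from the alpha-beta example later in the lecture; the second MIN node's last two leaves never appear there, so 4 and 6 are placeholder values.

```python
from typing import Union

# Hypothetical tree representation: a leaf is a number (utility for MAX),
# an internal node is a list of successor subtrees.
Tree = Union[float, list]

def minimax_value(node: Tree, is_max: bool = True) -> float:
    """Utility at terminal nodes, otherwise max/min over the successors."""
    if not isinstance(node, list):  # terminal state
        return node
    child_values = [minimax_value(child, not is_max) for child in node]
    return max(child_values) if is_max else min(child_values)

# MAX root with three MIN children; 4 and 6 are placeholder leaf values.
example_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(example_tree))  # 3
```

The MIN nodes back up 3, 2 and 2, so MAX's value is 3.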

Minimax algorithm
• Returns the action corresponding to the best possible move
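A minimal sketch of the minimax decision for the Tic-Tac-Toe example, assuming a hypothetical encoding of the board as a 9-character string of 'X', 'O' and '.', with X as MAX moving first and utilities +1 / -1 / 0 for an X win, an O win and a draw. `functools.lru_cache` memoizes positions that are reached by different move orders; it is a speed-up, not part of the minimax definition.

```python
from functools import lru_cache
from typing import List, Optional

# The 8 winning lines of a 3x3 board indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board: str) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def to_move(board: str) -> str:
    # X (MAX) moves first, so X is to move whenever the counts are equal.
    return "X" if board.count("X") == board.count("O") else "O"

def moves(board: str) -> List[int]:
    return [i for i, cell in enumerate(board) if cell == "."]

def result(board: str, i: int) -> str:
    return board[:i] + to_move(board) + board[i + 1:]

@lru_cache(maxsize=None)
def minimax_value(board: str) -> int:
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # draw
    values = [minimax_value(result(board, i)) for i in moves(board)]
    return max(values) if to_move(board) == "X" else min(values)

def minimax_decision(board: str) -> int:
    """Index of the move with the best minimax value for the player to move."""
    pick = max if to_move(board) == "X" else min
    return pick(moves(board), key=lambda i: minimax_value(result(board, i)))

print(minimax_value("." * 9))         # 0: perfect play is a draw
print(minimax_decision("XX.OO...."))  # 2: X completes the top row
```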

Properties of Minimax
• Time complexity:
  – $O(b^d)$, where $b$ is the branching factor and $d$ is the depth of the tree
• Space complexity:
  – $O(bd)$: we only need to keep the current branch in memory, together with the children of each node on it
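Both bounds can be checked by instrumenting minimax on a complete tree. This sketch assumes a hypothetical uniform tree with branching factor b and depth d, generated on the fly, with arbitrary leaf utilities; it counts leaf evaluations and the peak number of generated children held in memory along the current branch.

```python
def minimax_uniform(b: int, d: int) -> dict:
    """Minimax on a complete b-ary tree of depth d, with memory/time counters."""
    stats = {"leaves": 0, "live": 0, "peak": 0}

    def leaf_utility(index: int) -> int:
        # Arbitrary deterministic utilities (hypothetical; only the counts matter).
        return (index * 7919) % 100

    def value(depth: int, index: int, is_max: bool) -> int:
        if depth == d:  # terminal state
            stats["leaves"] += 1
            return leaf_utility(index)
        # Expand the node: its b children stay in memory until all are searched.
        children = [index * b + i + 1 for i in range(b)]
        stats["live"] += len(children)
        stats["peak"] = max(stats["peak"], stats["live"])
        best = max if is_max else min
        result = best(value(depth + 1, child, not is_max) for child in children)
        stats["live"] -= len(children)
        return result

    stats["value"] = value(0, 0, True)
    return stats

s = minimax_uniform(b=3, d=4)
print(s["leaves"], s["peak"])  # 81 12, i.e. b**d leaves and b*d stored children
```

Time grows as $b^d$ because every leaf is visited; memory only grows as $b \cdot d$ because each of the $d$ nodes on the current path holds its $b$ children.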

Minimax and multi-player games
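The figure for this slide did not survive extraction. One common generalization, assumed here rather than taken from the slide, replaces each node's scalar value with a vector of utilities (one entry per player): the player to move picks the child whose backed-up vector is best in its own component. With two players and utilities (u, -u), this reduces to minimax. The tree and its utility vectors are hypothetical.

```python
from typing import Tuple, Union

Utilities = Tuple[float, ...]
Tree = Union[Utilities, list]  # leaf: utility vector; internal node: list of children

def multiplayer_value(node: Tree, player: int, num_players: int) -> Utilities:
    """Backed-up utility vector when players move in turn 0, 1, ..., num_players-1."""
    if isinstance(node, tuple):  # terminal state
        return node
    next_player = (player + 1) % num_players
    child_vectors = [multiplayer_value(c, next_player, num_players) for c in node]
    # The player to move maximizes its own component of the utility vector.
    return max(child_vectors, key=lambda v: v[player])

three_player_tree = [[(1, 5, 6), (4, 2, 3)], [(6, 1, 2), (7, 4, 1)]]
print(multiplayer_value(three_player_tree, player=0, num_players=3))  # (7, 4, 1)

# Two-player zero-sum case: behaves like minimax on MAX's utility.
print(multiplayer_value([[(3, -3), (12, -12)], [(2, -2), (8, -8)]], 0, 2))  # (3, -3)
```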

Chess
• Can we write a minimax program that will play chess reasonably well?
  – For chess, $b \approx 35$ and $d \approx 100$
  – Do we really need to look at all those nodes?
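Plugging the slide's chess estimates into the $O(b^d)$ bound shows the scale of the problem. The snippet is only arithmetic on those two rough numbers; Python's arbitrary-precision integers let it compute $35^{100}$ exactly.

```python
import math

b, d = 35, 100  # approximate chess branching factor and game depth from the slide
leaves = b ** d  # exact integer
print(len(str(leaves)))              # 155 digits
print(round(d * math.log10(b), 1))   # 154.4, i.e. roughly 10**154 leaf positions
```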

Alpha-Beta Pruning
• No!
  – If we are smart (and careful), we can prune
    • Eliminate large parts of the tree from consideration
• Alpha-beta pruning applied to a minimax tree
  – Returns the same decision as minimax
  – Prunes branches that cannot influence the final decision

Alpha-Beta Pruning
• Alpha:
  – Value of the best (highest-value) choice found so far on the path for MAX
• Beta:
  – Value of the best (lowest-value) choice found so far on the path for MIN
• Update alpha and beta as the search continues
• Prune as soon as the value of the current node is known to be worse than the current alpha (for MAX) or beta (for MIN)
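The alpha and beta updates can be sketched in Python on the same explicit-tree representation as a minimax sketch: a leaf is a number and an internal node is a list of children, which is a hypothetical encoding. This is a standard textbook formulation and not necessarily the exact pseudocode from the lecture. It also records which leaves were evaluated so the result can be compared with the example that follows. The leaves 4 and 6 are placeholders, since the example prunes them.

```python
import math
from typing import List, Tuple, Union

Tree = Union[float, list]  # leaf: utility for MAX; internal node: list of children

def alpha_beta_search(tree: Tree) -> Tuple[float, List[float]]:
    """Return the minimax value of `tree` and the leaves alpha-beta evaluated."""
    visited: List[float] = []

    def value(node: Tree, alpha: float, beta: float, is_max: bool) -> float:
        if not isinstance(node, list):  # terminal state
            visited.append(node)
            return node
        if is_max:
            v = -math.inf
            for child in node:
                v = max(v, value(child, alpha, beta, False))
                if v >= beta:           # MIN above will never allow this: prune
                    return v
                alpha = max(alpha, v)   # best value found so far for MAX
        else:
            v = math.inf
            for child in node:
                v = min(v, value(child, alpha, beta, True))
                if v <= alpha:          # MAX above already has better: prune
                    return v
                beta = min(beta, v)     # best value found so far for MIN
        return v

    return value(tree, -math.inf, math.inf, True), visited

root_value, evaluated = alpha_beta_search([[3, 12, 8], [2, 4, 6], [14, 5, 2]])
print(root_value, evaluated)  # 3 [3, 12, 8, 2, 14, 5, 2]
```

The two leaves after the 2 in the second MIN node are never evaluated. Replacing them with any other values does not change the root value.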

Alpha-Beta example (MAX root with three MIN children; [low, high] gives the current bounds on a node's value)
• Leaf 3 under the first MIN node: first MIN node [-inf, 3]; root [-inf, inf]
• Leaf 12: first MIN node stays [-inf, 3]; root [-inf, inf]
• Leaf 8: first MIN node [3, 3]; root [3, inf]
• Leaf 2 under the second MIN node: second MIN node [-inf, 2]; root [3, inf]
• The second MIN node is worth at most 2 < 3, so prune its remaining children
• Leaf 14 under the third MIN node: third MIN node [-inf, 14]; root [3, 14]
• Leaf 5: third MIN node [-inf, 5]; root [3, 5]
• Leaf 2: third MIN node [2, 2]; root [3, 3], so MAX's value is 3 (via the first MIN node)

Properties of Alpha-Beta
• Pruning does not affect the final result
  – It prunes parts of the tree that would never be reached in actual play
• The order in which moves are evaluated is important
  – A bad move ordering will prune nothing
  – A perfect move ordering can reduce the time complexity to $O(b^{d/2})$
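The effect of move ordering can be seen by counting leaf evaluations on two reorderings of the leaf values from the alpha-beta example. Both trees are hypothetical rearrangements, including the placeholder leaves 4 and 6. With the best move first everywhere, alpha-beta evaluates 5 leaves. This matches Knuth and Moore's best case of $b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1$ for $b = 3, d = 2$. With the worst move first, it evaluates all 9 leaves.

```python
import math

def count_leaves(tree) -> int:
    """Number of leaves alpha-beta evaluates on `tree` (MAX at the root)."""
    count = 0

    def value(node, alpha, beta, is_max):
        nonlocal count
        if not isinstance(node, list):  # terminal state
            count += 1
            return node
        v = -math.inf if is_max else math.inf
        for child in node:
            child_v = value(child, alpha, beta, not is_max)
            if is_max:
                v = max(v, child_v)
                if v >= beta:
                    return v  # prune remaining children
                alpha = max(alpha, v)
            else:
                v = min(v, child_v)
                if v <= alpha:
                    return v  # prune remaining children
                beta = min(beta, v)
        return v

    value(tree, -math.inf, math.inf, True)
    return count

# Same multiset of leaf values, two different move orderings.
best_first = [[3, 12, 8], [2, 4, 6], [2, 14, 5]]
worst_first = [[14, 5, 2], [6, 4, 2], [12, 8, 3]]
print(count_leaves(best_first), count_leaves(worst_first))  # 5 9
```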

Real-time decisions
• Alpha-beta can be a huge improvement over minimax
  – Still not good enough: it must search all the way to terminal states for at least part of the search space
  – We need to make a decision about a move quickly
• Heuristic evaluation function + cutoff test
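One way to combine a heuristic evaluation function with a cutoff test is to stop the alpha-beta recursion at a fixed depth and return the evaluation instead of searching on. This minimal sketch assumes a hypothetical `Node` type whose non-terminal states carry a precomputed `estimate`. In the example tree the estimates mislead a depth-1 search, which a full-depth search then corrects.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    # Hypothetical game-tree node: terminal states carry `utility`;
    # `estimate` is the evaluation-function value used at a cutoff.
    utility: Optional[float] = None
    estimate: float = 0.0
    children: List["Node"] = field(default_factory=list)

def evaluate(node: Node) -> float:
    # Actual utility for terminal states, heuristic estimate otherwise.
    return node.utility if node.utility is not None else node.estimate

def h_alpha_beta(node: Node, depth_limit: int, depth: int = 0,
                 alpha: float = -math.inf, beta: float = math.inf,
                 is_max: bool = True) -> float:
    """Alpha-beta with the terminal test replaced by a depth cutoff test."""
    if node.utility is not None or depth >= depth_limit:  # cutoff test
        return evaluate(node)
    v = -math.inf if is_max else math.inf
    for child in node.children:
        child_v = h_alpha_beta(child, depth_limit, depth + 1, alpha, beta, not is_max)
        if is_max:
            v = max(v, child_v)
            if v >= beta:
                return v  # prune remaining children
            alpha = max(alpha, v)
        else:
            v = min(v, child_v)
            if v <= alpha:
                return v  # prune remaining children
            beta = min(beta, v)
    return v

# MAX root with two MIN children whose estimates disagree with their true values.
root = Node(estimate=5, children=[
    Node(estimate=6, children=[Node(utility=1), Node(utility=9)]),
    Node(estimate=4, children=[Node(utility=4), Node(utility=5)]),
])
print(h_alpha_beta(root, depth_limit=1))  # 6: trusts the estimates at depth 1
print(h_alpha_beta(root, depth_limit=2))  # 4: searches to the terminal states
```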

Evaluation functions
• Apply an evaluation function to a state
  – If the state is terminal, the function returns the actual utility
  – If it is non-terminal, the function returns an estimate of the expected utility (i.e. the chance of winning from that state)
  – The function must be fast to compute

Evaluation functions
• Evaluation functions can be given by the designer of the program (using expert knowledge) or learned from experience
• If features can be judged independently, a weighted linear function works well
  – $w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$, where $s$ is the board state
• Neural networks are commonly used today
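A weighted linear evaluation function can be sketched with chess material as the features. The weights are assumed to be the conventional piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9). Each feature is defined as $f_i(s)$ = (MAX's count of piece type $i$) minus (MIN's count). The dictionary-of-counts board summary is hypothetical. Treating each piece type's contribution as independent is exactly the assumption that makes a linear form reasonable.

```python
from typing import Dict

# Conventional material values used as the weights w_i (an assumption).
WEIGHTS: Dict[str, int] = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def features(max_pieces: Dict[str, int], min_pieces: Dict[str, int]) -> Dict[str, int]:
    """f_i(s): MAX's count minus MIN's count for each piece type i."""
    return {p: max_pieces.get(p, 0) - min_pieces.get(p, 0) for p in WEIGHTS}

def linear_eval(max_pieces: Dict[str, int], min_pieces: Dict[str, int]) -> int:
    """w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)."""
    f = features(max_pieces, min_pieces)
    return sum(WEIGHTS[p] * f[p] for p in WEIGHTS)

# MAX is up a knight but down a pawn: 3*1 + 1*(-1) = 2.
max_side = {"pawn": 7, "knight": 2, "bishop": 2, "rook": 2, "queen": 1}
min_side = {"pawn": 8, "knight": 1, "bishop": 2, "rook": 2, "queen": 1}
print(linear_eval(max_side, min_side))  # 2
```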

Cutting off search
• Instead of searching until we find a terminal state, we can cut off the search sooner and apply the evaluation function
• When?
  – Arbitrarily (but deeper is better)
  – Quiescent states
    • States that are “stable”: not going to change value (by a lot) in the near future
  – Singular extensions
    • Searching deeper when one move is “clearly better” than the alternatives (e.g. moving the king out of check)
    • Can be used to avoid the horizon effect
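A simplified version of a quiescence-aware cutoff test can be sketched as follows. Past the nominal depth limit, the search only stops at states marked quiet and keeps expanding the others, up to a hard cap. The `Node` type, its `quiet` flag, and the `max_extension` cap are hypothetical. Real quiescence searches usually extend only specific move types, such as captures.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    # Hypothetical node: `quiet` marks a state whose value should not swing
    # soon (e.g. no pending captures); terminal states carry a `utility`.
    utility: Optional[float] = None
    estimate: float = 0.0
    quiet: bool = True
    children: List["Node"] = field(default_factory=list)

def cutoff_test(node: Node, depth: int, depth_limit: int, max_extension: int) -> bool:
    if node.utility is not None:  # always stop at terminal states
        return True
    if depth < depth_limit:
        return False
    # Past the limit, keep expanding non-quiescent states, up to a hard cap.
    return node.quiet or depth >= depth_limit + max_extension

def h_minimax(node: Node, depth_limit: int, max_extension: int = 2,
              depth: int = 0, is_max: bool = True) -> float:
    if cutoff_test(node, depth, depth_limit, max_extension):
        return node.utility if node.utility is not None else node.estimate
    values = [h_minimax(c, depth_limit, max_extension, depth + 1, not is_max)
              for c in node.children]
    return max(values) if is_max else min(values)

root = Node(children=[
    # Looks strong for MAX, but is not quiet (e.g. a capture is pending).
    Node(estimate=8, quiet=False, children=[Node(utility=0), Node(utility=9)]),
    Node(estimate=3, quiet=True, children=[Node(utility=3), Node(utility=4)]),
])
print(h_minimax(root, depth_limit=1, max_extension=0))  # 8: plain depth cutoff
print(h_minimax(root, depth_limit=1))                   # 3: non-quiet state searched deeper
```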

Cutting off search
• How deep do we need to search?
  – Novice human chess player
    • 5-ply (minimax)
  – Master human chess player
    • 10-ply (alpha-beta)
  – Grandmaster human chess player
    • 14-ply + a fantastic evaluation function, opening and endgame databases
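The jump from 5-ply minimax to 10-ply alpha-beta can be understood with rough arithmetic. Assume a uniform tree with the slide's $b \approx 35$ and the best-case (perfectly ordered) leaf count $b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1$ from Knuth and Moore's analysis. Under those assumptions, a 10-ply alpha-beta search costs about twice a 5-ply minimax search. Real chess trees are not uniform, and move ordering is never perfect.

```python
b = 35  # approximate chess branching factor from the earlier slide

def minimax_leaves(d: int) -> int:
    return b ** d  # minimax evaluates every leaf of a depth-d tree

def best_case_alpha_beta_leaves(d: int) -> int:
    # Best case under perfect ordering: b^ceil(d/2) + b^floor(d/2) - 1.
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1

print(f"{minimax_leaves(5):.2e}")                # 5.25e+07
print(f"{best_case_alpha_beta_leaves(10):.2e}")  # 1.05e+08
print(f"{minimax_leaves(10):.2e}")               # 2.76e+15
```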
