
Adversarial Search: Lecture 7

Wentworth Institute of Technology | COMP3770 Artificial Intelligence | Summer 2017 | Derbinsky
Adversarial Search, Lecture 7: How can we use search to plan ahead when other agents are planning against us?
June 10, 2017


  1. Adversarial Search, Lecture 7: How can we use search to plan ahead when other agents are planning against us?

  2. Agenda
     • Games: context, history
     • Searching via Minimax
     • Scaling
       – α-β pruning
       – Depth-limiting
       – Evaluation functions
     • Handling uncertainty with Expectiminimax

  3. Characterizing Games
     • There are many kinds of games, and several ways to classify them
       – Deterministic vs. stochastic
       – [Im]perfect information
       – One, two, multi-player
       – Utility (how agents value outcomes), e.g. zero-sum
     • Algorithmic goal: calculate a strategy (or policy) that decides a move in each state

  4. Utility
     • Zero/Constant-Sum
       – Opposite utilities
       – Adversarial, pure competition
     • General Games
       – Independent utilities
       – Cooperation, indifference, competition, and more are all possible

  5. Examples: Perception vs. Chance
     • Perfect information
       – Deterministic: Chess, Checkers, Go, Othello
       – Stochastic: Backgammon, Monopoly
     • Imperfect information
       – Deterministic: Battleship
       – Stochastic: Bridge, Poker, Scrabble

  6. Checkers
     • 1950: First computer player
     • 1994: First computer champion (Chinook) ended the 40-year reign of human champion Marion Tinsley, using a complete 8-piece endgame database
     • 1995: Chinook defended its title against Don Lafferty
     • 2007: Checkers solved!

  7. Chess
     • 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match
     • Deep Blue examined 200M positions per second, used very sophisticated evaluation, and used undisclosed methods for extending some lines of search up to 40 ply
     • Current programs are even better, if less historic

  8. Go
     • Until recently, AI was not competitive at champion level. AlphaGo changed that:
       – 2015: beat Fan Hui, European champion (2-dan; 5-0)
       – 2016: beat Lee Sedol, one of the best players in the world (9-dan; 4-1)
       – 2017: beat Ke Jie, #1 in the world (9-dan; 2-0)
     • Monte Carlo tree search (MCTS) + artificial neural networks (ANNs) for policy (what to do) and evaluation (how good is a board state)

  9. Poker
     • Libratus beat four top-class human poker players in January 2017
       – 120,000 hands played
     • Novel methods for endgame solving in imperfect-information games
     • 15 million core hours of computation (+4 during competition)

  10. More Progress
      • Othello: 1997, defeated world champion
      • Bridge: 1998, competitive with human champions
      • Scrabble: 2006, defeated world champion

  11. Game Formalism
      • States: $S$ (start at $S_0$)
      • Players: $P = \{1, \dots, N\}$ (typically take turns)
      • Actions: $\mathrm{Action}(s)$, returns legal options
      • Transition function: $S \times A \to S$
      • Terminal test: $\mathrm{Terminal}(s)$, returns T/F
      • Utility: $S \times P \to \mathbb{R}$
      • Solution for a player is a policy: $S \to A$
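The formalism above can be made concrete with a toy game. The sketch below is only an illustration under stated assumptions: the counter-removal game, the `State` class, and every function name are hypothetical and do not come from the lecture. Each function corresponds to one bullet of the formalism.

```python
from dataclasses import dataclass

# Hypothetical toy game (not from the lecture): two players alternately remove
# 1 or 2 counters from a pile; whoever takes the last counter wins.

@dataclass(frozen=True)
class State:
    counters: int  # counters left in the pile
    to_move: int   # index of the player who moves next (0 or 1)

def actions(s):
    """Action(s): the legal options in state s."""
    return [n for n in (1, 2) if n <= s.counters]

def result(s, a):
    """Transition function S x A -> S: remove `a` counters and pass the turn."""
    return State(s.counters - a, 1 - s.to_move)

def terminal(s):
    """Terminal(s): True once the pile is empty."""
    return s.counters == 0

def utility(s, player):
    """Utility S x P -> R for a terminal state (zero-sum: +1 winner, -1 loser)."""
    winner = 1 - s.to_move  # the player who just moved emptied the pile
    return 1.0 if player == winner else -1.0

def first_legal_policy(s):
    """A deliberately naive policy S -> A: always pick the first legal action."""
    return actions(s)[0]

start = State(counters=3, to_move=0)  # plays the role of the start state S_0
```

A search algorithm such as minimax would replace `first_legal_policy` with a policy that looks ahead through `result` until `terminal` holds.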

  12. Game Plan :)
      • Start with deterministic, two-player adversarial games
      • Issues to come
        – Multiple players
        – Resource limits
        – Stochasticity

  13. Single-Agent Game Tree
      [figure: game tree with values 8, 2, 0, …, 2, 6, …, 4, 6]

  14. Value of a State
      • Value of a state: the best achievable outcome (utility) from that state
      • Non-terminal states: [equation not recovered]
      • Terminal states: [equation not recovered]
      [figure: game tree annotated with values 8, 2, 0, …, 2, 6, …, 4, 6]

  15. Adversarial Game Trees
      [figure: game tree with values -20, -8, …, -18, -5, …, -10, +4, -20, +8]

  16. Minimax Values
      • States under agent's control: [equation not recovered]
      • States under opponent's control: [equation not recovered]
      • Terminal states: [equation not recovered]
      [figure: game tree with values -8, -5, -10, +8]
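As a hedged reconstruction, the three cases named on this slide are usually written as the following recursion. The notation is an assumption: $\mathrm{succ}(s)$ stands for the states reachable from $s$ in one move, and $\mathrm{Utility}(s)$ for the agent's utility in a terminal state.

```latex
V(s) =
\begin{cases}
\max_{s' \in \mathrm{succ}(s)} V(s') & \text{if } s \text{ is under the agent's control} \\
\min_{s' \in \mathrm{succ}(s)} V(s') & \text{if } s \text{ is under the opponent's control} \\
\mathrm{Utility}(s) & \text{if } s \text{ is terminal}
\end{cases}
```

The single-agent value of a state is the special case in which every non-terminal state takes the max branch.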

  17. Tic-Tac-Toe Game Tree
      [figure]

  18. Adversarial Search via Minimax
      • Deterministic, zero-sum
        – Tic-tac-toe, chess
        – One player maximizes
        – The other minimizes
      • Minimax search
        – A search tree
        – Players alternate turns
        – Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
      [figure: minimax values are computed recursively (max: 5; min: 5, 2); terminal values are part of the game (8, 2, 5, 6)]

  19. Minimax Implementation

      def value(state):
          if the state is a terminal state: return the state's utility
          if the next agent is MAX: return max-value(state)
          if the next agent is MIN: return min-value(state)

      def max-value(state):
          initialize v = -∞
          for each successor of state:
              v = max(v, value(successor))
          return v

      def min-value(state):
          initialize v = +∞
          for each successor of state:
              v = min(v, value(successor))
          return v
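The pseudocode above can be turned into a small runnable sketch. The tree encoding is an assumption made for illustration: an internal node is a Python list of successors, a terminal state is a bare number (its utility), and the `is_max` flag stands in for "the next agent is MAX". MAX and MIN are assumed to alternate strictly.

```python
import math

def value(node, is_max):
    """Minimax value of `node`; `is_max` says whether MAX moves at this node."""
    if not isinstance(node, list):  # terminal state: the number is its utility
        return node
    return max_value(node) if is_max else min_value(node)

def max_value(node):
    v = -math.inf
    for successor in node:
        v = max(v, value(successor, is_max=False))  # MIN moves next
    return v

def min_value(node):
    v = math.inf
    for successor in node:
        v = min(v, value(successor, is_max=True))  # MAX moves next
    return v

# MAX root over three MIN nodes, each with three terminal utilities.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(value(tree, is_max=True))  # 3
```

The three MIN nodes evaluate to 3, 2, and 2, so MAX picks the first branch and the root value is 3.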

  20. Minimax Evaluation (branching factor $b$, maximum depth $m$)
      • Time: $O(b^m)$
        – For chess: $b \approx 35$, $m \approx 100$
      • Space: $O(bm)$
      • Complete: only if the game tree is finite
      • Optimal: yes, against an optimal opponent
      [figure labels: Minimax-Min, Minimax-Avg]
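To get a feel for why the time bound rules out exhaustive search in chess, the quoted figures can be plugged in directly. This is a rough order-of-magnitude estimate, not a count of actual chess positions:

```latex
b^m \approx 35^{100} = 10^{100 \log_{10} 35} \approx 10^{100 \times 1.544} \approx 10^{154}
```

Even at the 200M positions per second cited for Deep Blue (about $2 \times 10^{8}$), visiting $10^{154}$ nodes would take on the order of $10^{146}$ seconds, which is why pruning and depth limits are needed.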

  21. Multiple Players
      • Add a ply per player
      • Independent utility: use a vector of values; each player maximizes its own utility
      • Zero-sum: each team in turn applies MIN or MAX
      • In Pacman, add one MIN layer per ghost for each Pacman move
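The independent-utility case can be sketched as follows. The encoding is an assumption for illustration: internal nodes are lists, terminal states are tuples holding one utility per player, and players move in fixed rotation (one ply each). The function name `maxn_value` is hypothetical; this style of search is often called max-n in the literature.

```python
def maxn_value(node, player, num_players):
    """Utility vector reached when each player maximizes its own entry."""
    if isinstance(node, tuple):  # terminal state: one utility per player
        return node
    next_player = (player + 1) % num_players  # one ply per player, in turn
    child_values = [maxn_value(child, next_player, num_players) for child in node]
    # The player to move keeps the child vector that is best for itself.
    return max(child_values, key=lambda vec: vec[player])

# Three players; player 0 moves at the root, player 1 one ply below.
tree = [
    [(1, 2, 6), (4, 3, 2)],
    [(6, 1, 2), (7, 4, 1)],
]
print(maxn_value(tree, player=0, num_players=3))  # (7, 4, 1)
```

Player 1 prefers (4, 3, 2) in the first subtree and (7, 4, 1) in the second; player 0 then chooses the vector with the larger first entry.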

  22. Scaling to Larger Games
      • Tree Pruning
      • Depth-Limiting + Evaluation

  23. Minimax Example
      [figure: MAX root with value 3 over three MIN nodes with values 3, 2, 2; terminal utilities 3, 12, 8 | 2, 4, 6 | 14, 5, 2]

  24. Minimax Pruning
      [figure: step-by-step pruning of the example tree; root bounds [−∞, ∞], [3, ∞], [3, 3]; MIN-node bounds [−∞, ∞], [−∞, 3], [3, 3], [−∞, 2], [−∞, 14], [−∞, 5], [2, 2]; terminal utilities shown: 3, 12, 8, 2, 14, 5, 2]

  25. General Case
      • α is the best value (to MAX) found so far off the current path
      • If V is worse than α, MAX will avoid it: prune that branch
      • Define β similarly for MIN

  26. Alpha-Beta Pruning
      α: MAX's best option on path
      β: MIN's best option on path

      def max-value(state, α, β):
          initialize v = -∞
          for each successor of state:
              v = max(v, value(successor, α, β))
              if v ≥ β return v
              α = max(α, v)
          return v

      def min-value(state, α, β):
          initialize v = +∞
          for each successor of state:
              v = min(v, value(successor, α, β))
              if v ≤ α return v
              β = min(β, v)
          return v
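A minimal runnable sketch of the alpha-beta pseudocode follows. As assumptions for illustration, internal nodes are Python lists, terminal states are bare numbers, MAX and MIN alternate strictly, and the `visited` list is an extra added only to show which terminal states are actually examined.

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node`, skipping branches that cannot affect the result."""
    if visited is None:
        visited = []
    if not isinstance(node, list):  # terminal state: record it and return its utility
        visited.append(node)
        return node
    if is_max:
        v = -math.inf
        for successor in node:
            v = max(v, alphabeta(successor, False, alpha, beta, visited))
            if v >= beta:  # MIN already has a better option elsewhere: prune
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for successor in node:
        v = min(v, alphabeta(successor, True, alpha, beta, visited))
        if v <= alpha:  # MAX already has a better option elsewhere: prune
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, True, visited=seen))  # 3
print(seen)                                 # [3, 12, 8, 2, 14, 5, 2]
```

On this tree the terminal states 4 and 6 are never examined: once the second MIN node finds 2, its value can be at most 2, which is already worse for MAX than the 3 guaranteed by the first branch.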
