Adversarial Search: Lecture 7


SLIDE 1

Wentworth Institute of Technology COMP3770 – Artificial Intelligence | Summer 2017 | Derbinsky

Adversarial Search

Lecture 7

How can we use search to plan ahead when other agents are planning against us?

June 10, 2017

SLIDE 2

Agenda

  • Games: context, history
  • Searching via Minimax
  • Scaling
    – α-β pruning
    – Depth-limiting
    – Evaluation functions
  • Handling uncertainty with Expectiminimax

SLIDE 3

Characterizing Games

  • There are many kinds of games, and several ways to classify them
    – Deterministic vs. stochastic
    – Perfect vs. imperfect information
    – One, two, or multi-player
    – Utility (how agents value outcomes)
      • Zero-sum
  • Algorithmic goal: calculate a strategy (or policy) that decides a move in each state

SLIDE 4

Utility

Zero/Constant-Sum Games
  • Opposite utilities
  • Adversarial, pure competition

General Games
  • Independent utilities
  • Cooperation, indifference, competition, and more are all possible

SLIDE 5

Examples: Perception vs. Chance

Information | Deterministic                | Stochastic
Perfect     | Chess, Checkers, Go, Othello | Backgammon, Monopoly
Imperfect   | Battleship                   | Bridge, Poker, Scrabble

SLIDE 6

Checkers

  • 1950: First computer player
  • 1994: First computer champion (Chinook) ended the 40-year reign of human champion Marion Tinsley, using a complete 8-piece endgame database
  • 1995: Chinook defended its title against Don Lafferty
  • 2007: solved!

SLIDE 7

Chess

  • 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match
  • Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply
  • Current programs are even better, if less historic

[Image: Deep Blue]

SLIDE 8

Go

  • Until recently, AI was not competitive at champion level
    – 2015: AlphaGo beat Fan Hui, European champion (2-dan; 5-0)
    – 2016: AlphaGo beat Lee Sedol, one of the best players in the world (9-dan; 4-1)
    – 2017: AlphaGo beat Ke Jie, #1 in the world (9-dan; 2-0)
  • MCTS + ANNs for policy (what to do) and evaluation (how good is a board state)

[Image: AlphaGo]

SLIDE 9

Poker

  • Libratus beat four top-class human poker players in January 2017
    – 120,000 hands played
  • Novel methods for endgame solving in imperfect-information games
  • 15 million core hours of computation (+4 during competition)

[Image: Libratus]

SLIDE 10

More Progress

  • Othello: 1997, defeated world champion
  • Bridge: 1998, competitive with human champions
  • Scrabble: 2006, defeated world champion

SLIDE 11

Game Formalism

  • States: $S$ (start at $s_0$)
  • Players: $P = \{1, \ldots, N\}$ (typically take turns)
  • Actions: $\text{Action}(s)$, returns the legal options in state $s$
  • Transition function: $S \times A \to S$
  • Terminal test: $\text{Terminal}(s)$, returns true/false
  • Utility: $S \times P \to \mathbb{R}$
  • Solution for a player is a policy: $S \to A$
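
To make the formalism concrete, here is a minimal sketch for a toy take-away game that is not part of the lecture: players alternately remove 1 or 2 tokens, and whoever takes the last token wins. The game itself, the state encoding, and the function names `actions`, `result`, `terminal`, and `utility` are all assumptions chosen to mirror the bullets above.

```python
from typing import List, Tuple

State = Tuple[int, int]   # (tokens left, player to move: 1 or 2)
START: State = (5, 1)     # the start state s_0 (5 tokens is an arbitrary choice)

def actions(s: State) -> List[int]:
    """Action(s): a player may take 1 or 2 tokens, if that many remain."""
    tokens, _ = s
    return [n for n in (1, 2) if n <= tokens]

def result(s: State, a: int) -> State:
    """Transition function S x A -> S: remove the tokens and pass the turn."""
    tokens, player = s
    return (tokens - a, 3 - player)

def terminal(s: State) -> bool:
    """Terminal(s): the game ends when no tokens remain."""
    return s[0] == 0

def utility(s: State, p: int) -> float:
    """Utility S x P -> R: +1 for the player who took the last token, else -1."""
    assert terminal(s)
    last_mover = 3 - s[1]  # the player who moved into this terminal state
    return 1.0 if p == last_mover else -1.0
```

Because the two players' utilities always sum to zero, this toy game is zero-sum in the sense used earlier in the deck.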

SLIDE 12

Game Plan :)

  • Start with deterministic, two-player adversarial games
  • Issues to come
    – Multiple players
    – Resource limits
    – Stochasticity

SLIDE 13

Single-Agent Game Tree

[Figure: single-agent game tree; leaf values shown include 8, 2, 2, 6, 4, 6]

SLIDE 14

Value of a State

Value of a state: the best achievable outcome (utility) from that state

[Figure: the single-agent game tree (leaf values include 8, 2, 2, 6, 4, 6), annotated with the values of its non-terminal and terminal states]
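
The definition can be written as a recursion. The notation here is an assumption rather than something recovered from the slide: $V(s)$ is the value of state $s$, and $\text{successors}(s)$ is the set of states reachable from $s$ in one move.

```latex
V(s) =
\begin{cases}
\text{utility}(s) & \text{if } s \text{ is terminal} \\[4pt]
\max\limits_{s' \in \text{successors}(s)} V(s') & \text{otherwise}
\end{cases}
```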

SLIDE 15

Adversarial Game Trees

[Figure: adversarial game tree with signed leaf utilities; values shown include −20, −8, −18, −5, −10, +4, −20, +8]

SLIDE 16

Minimax Values

[Figure: adversarial game tree annotated with minimax values for states under the agent’s control, states under the opponent’s control, and terminal states; values shown include +8, −10, −5, −8]
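
The three kinds of states in the figure correspond to the three cases of the usual minimax recursion. The notation is an assumption rather than something recovered from the slide: $V(s)$ is the minimax value of $s$, and $\text{successors}(s)$ is the set of states one move away.

```latex
V(s) =
\begin{cases}
\text{utility}(s) & \text{if } s \text{ is terminal} \\[4pt]
\max\limits_{s' \in \text{successors}(s)} V(s') & \text{if } s \text{ is under the agent's control} \\[4pt]
\min\limits_{s' \in \text{successors}(s)} V(s') & \text{if } s \text{ is under the opponent's control}
\end{cases}
```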

SLIDE 17

Tic-Tac-Toe Game Tree

SLIDE 18

Adversarial Search via Minimax

  • Deterministic, zero-sum games
    – Tic-tac-toe, chess
    – One player maximizes
    – The other minimizes
  • Minimax search
    – A search tree
    – Players alternate turns
    – Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: two-ply minimax tree. Terminal values (part of the game): 8, 2, 5, 6. Minimax values (computed recursively): 2 and 5 at the min nodes, 5 at the max root.]

SLIDE 19

Minimax Implementation

    import math

    # Assumes a state object offering is_terminal(), utility(), next_agent(),
    # and successors(); these names are placeholders, not a fixed API.

    def value(state):
        if state.is_terminal():
            return state.utility()
        if state.next_agent() == "MAX":
            return max_value(state)
        return min_value(state)  # the next agent is MIN

    def max_value(state):
        v = -math.inf
        for successor in state.successors():
            v = max(v, value(successor))
        return v

    def min_value(state):
        v = math.inf
        for successor in state.successors():
            v = min(v, value(successor))
        return v
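
For a version that runs without a game-specific state class, the same recursion can be sketched on nested lists. The tree encoding (an `int` is a terminal utility, a list holds the successors) and the name `minimax` are assumptions made for this example. It reproduces the two-ply tree from the "Adversarial Search via Minimax" slide.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]  # a leaf utility, or a list of subtrees

def minimax(node: Tree, maximizing: bool = True) -> int:
    """Return the minimax value of node; players alternate at each ply."""
    if isinstance(node, int):        # terminal state: utility is given
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# MAX root over two MIN nodes with terminal values (8, 2) and (5, 6).
tree = [[8, 2], [5, 6]]
root_value = minimax(tree)  # min(8, 2) = 2, min(5, 6) = 5, max(2, 5) = 5
```

Passing `maximizing=False` at the root would instead treat the first mover as MIN.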

SLIDE 20

Minimax Evaluation

Time
  • $O(b^m)$
    – For chess: $b \approx 35$, $m \approx 100$

Space
  • $O(bm)$

Complete
  • Only if the game tree is finite

Optimal
  • Yes, against an optimal opponent

[Demos: Minimax-Min, Minimax-Avg]
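
To get a feel for these bounds, the chess estimates can be plugged in directly. This is only back-of-the-envelope arithmetic on the slide's figures ($b \approx 35$, $m \approx 100$); the variable names are arbitrary.

```python
import math

b, m = 35, 100                        # branching factor and depth from the slide

log10_nodes = m * math.log10(b)       # log10 of b**m, the time bound
exact_digits = len(str(b ** m))       # Python integers have arbitrary precision
frontier_nodes = b * m                # the O(bm) space bound of depth-first search
```

Under these estimates the time bound has over 150 decimal digits, while the space bound is only a few thousand nodes, which is why time rather than memory is the obstacle for minimax.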

SLIDE 21

Multiple Players

Add a ply per player

  • Independent utility: use a vector of values; each player maximizes its own utility
  • Zero-sum: each team sequentially MIN/MAX
  • In Pacman, have multiple MIN layers, one for each ghost, per Pacman move
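
The independent-utility case can be sketched as follows. The tree encoding (a tuple is a terminal utility vector, a list holds the successors), the name `maxn`, and the three-player example values are assumptions for illustration, not taken from the slide.

```python
from typing import List, Tuple, Union

Utilities = Tuple[float, ...]            # one utility per player
Node = Union[Utilities, List["Node"]]    # leaf vector, or list of children

def maxn(node: Node, player: int, num_players: int) -> Utilities:
    """Back up the utility vector that the player to move likes best."""
    if isinstance(node, tuple):          # terminal: utilities are given
        return node
    next_player = (player + 1) % num_players   # one ply per player
    child_vectors = [maxn(child, next_player, num_players) for child in node]
    # The mover compares vectors only by its own component; ties go to the
    # first such child.
    return max(child_vectors, key=lambda u: u[player])

# Players 0, 1, 2 move in turn; leaves hold (u0, u1, u2).
tree = [
    [[(1, 2, 6), (4, 3, 2)], [(6, 1, 2), (7, 4, 1)]],
    [[(5, 1, 1), (1, 5, 2)], [(7, 7, 1), (5, 4, 5)]],
]
best = maxn(tree, player=0, num_players=3)
```

Here player 2 picks by the third component, player 1 by the second, and player 0 by the first, so the root backs up `(1, 2, 6)`.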

SLIDE 22

Scaling to Larger Games

  • Tree Pruning
  • Depth-Limiting + Evaluation

SLIDE 23

Minimax Example

[Figure: two-ply minimax tree; MIN nodes over the leaves (3, 12, 8), (2, 4, 6), and (14, 5, 2) back up 3, 2, and 2, giving a root value of 3]

SLIDE 24

Minimax Pruning

[Figure: the minimax example tree searched with pruning. Node bounds tighten as leaves are examined: the first MIN node goes from [−∞, 3] to [3, 3]; the second stops at [−∞, 2]; the third goes from [−∞, 14] to [−∞, 5] to [2, 2]; the root goes from [−∞, ∞] to [3, ∞] to [3, 3]. Leaves examined: 3, 12, 8, 2, 14, 5, 2.]

SLIDE 25

General Case

  • α is the best value (to MAX) found so far off the current path
  • If V is worse than α, MAX will avoid it, so prune that branch
  • Define β similarly for MIN

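SLIDE 26

Alpha-Beta Pruning

α: MAX’s best option on path
β: MIN’s best option on path

    import math

    # Assumes a state object offering is_terminal(), utility(), next_agent(),
    # and successors(); these names are placeholders, not a fixed API.

    def value(state, alpha=-math.inf, beta=math.inf):
        if state.is_terminal():
            return state.utility()
        if state.next_agent() == "MAX":
            return max_value(state, alpha, beta)
        return min_value(state, alpha, beta)

    def max_value(state, alpha, beta):
        v = -math.inf
        for successor in state.successors():
            v = max(v, value(successor, alpha, beta))
            if v >= beta:
                return v  # MIN already has a better option: prune
            alpha = max(alpha, v)
        return v

    def min_value(state, alpha, beta):
        v = math.inf
        for successor in state.successors():
            v = min(v, value(successor, alpha, beta))
            if v <= alpha:
                return v  # MAX already has a better option: prune
            beta = min(beta, v)
        return v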
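
A runnable sketch of the same procedure on nested lists is given here, recording which leaves are actually examined. The tree encoding and the `visited` bookkeeping are assumptions for illustration. The tree uses the leaf values from the Minimax Example slide, assuming they are grouped as (3, 12, 8), (2, 4, 6), (14, 5, 2).

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]  # a leaf utility, or a list of subtrees

def alphabeta(node: Tree, maximizing: bool, alpha: float, beta: float,
              visited: List[int]) -> float:
    """Alpha-beta search; appends each examined leaf to visited."""
    if isinstance(node, int):
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, visited))
            if v >= beta:          # MIN would never allow this branch
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta, visited))
        if v <= alpha:             # MAX would never choose this branch
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited: List[int] = []
root_value = alphabeta(tree, True, -math.inf, math.inf, visited)
```

With this grouping the search returns 3 after examining seven of the nine leaves; the leaves 4 and 6 are never visited, because the second MIN node is abandoned as soon as it sees 2.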

SLIDE 27

Alpha-Beta Properties

  • Has no effect on the minimax value computed for the root!
  • Good child ordering improves the effectiveness of pruning
  • With “perfect ordering”:
    – Time complexity drops to $O(b^{m/2})$
    – Doubles solvable depth!
    – Full search of, e.g., chess is still hopeless…
  • This is a simple example of metareasoning (computing about what to compute)
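
The effect of child ordering can be checked by counting examined leaves under different orderings of one small tree. This is a sketch: the nested-list encoding, the helper `leaves_examined`, and the three hand-made orderings are assumptions for illustration.

```python
import math

def alphabeta(node, maximizing, alpha, beta, counter):
    """Alpha-beta on nested lists; counter[0] counts examined leaves."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_v = alphabeta(child, not maximizing, alpha, beta, counter)
        if maximizing:
            v = max(v, child_v)
            if v >= beta:
                break              # prune remaining children
            alpha = max(alpha, v)
        else:
            v = min(v, child_v)
            if v <= alpha:
                break              # prune remaining children
            beta = min(beta, v)
    return v

def leaves_examined(tree):
    """Return (root value, number of leaves examined)."""
    counter = [0]
    root = alphabeta(tree, True, -math.inf, math.inf, counter)
    return root, counter[0]

# The same game under three child orderings (same leaves per MIN node).
poor_order = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]
mixed_order = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
best_order = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]

results = [leaves_examined(t) for t in (poor_order, mixed_order, best_order)]
```

All three orderings give the root value 3, but the number of leaves examined falls from 9 to 7 to 5 as the ordering improves. The root value never changes, only the work done.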

SLIDE 28

Checkup #1

[Figure: game tree exercise; leaf values shown: 10, 8, 50, 4]

SLIDE 29

Checkup #2

[Figure: game tree exercise; leaf values shown: 10, 6, 100, 8, 1, 2, 20, 4]

SLIDE 30

Checkup #3

[Figure: game tree exercise; leaf values shown: 5, 6, 4, 3, 6, 7, 6, 7, 5, 6, 9, 5, 9, 8]

SLIDE 31

Resource Limits

  • Problem: in realistic games, cannot search to leaves!
  • Solution: depth-limited search
    1. Search only to a limited depth in the tree
    2. Replace terminal utilities with an evaluation function for non-terminal positions
  • Guarantee of optimal play is gone
  • More plies make a BIG difference
  • Use iterative deepening for an anytime algorithm
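
Both steps, plus iterative deepening, can be sketched on a nested-list tree. Everything named here is an assumption for illustration. In particular, `evaluate` is a deliberately crude stand-in heuristic (it follows first children down to a leaf), not an evaluation function from the lecture.

```python
def evaluate(node):
    """Hypothetical heuristic for a non-terminal position: follow first
    children down to a leaf and report its utility."""
    while not isinstance(node, int):
        node = node[0]
    return node

def depth_limited(node, depth, maximizing=True):
    """Minimax that stops at the depth limit and falls back to evaluate()."""
    if isinstance(node, int):        # true terminal utility
        return node
    if depth == 0:                   # cutoff: estimate instead of searching
        return evaluate(node)
    values = [depth_limited(c, depth - 1, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

def iterative_deepening(node, max_depth):
    """Yield (depth, estimate) pairs; a caller may stop consuming at any time."""
    for depth in range(1, max_depth + 1):
        yield depth, depth_limited(node, depth)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
estimates = list(iterative_deepening(tree, 2))
```

At depth 1 the crude heuristic makes the third branch look best (estimate 14); at depth 2 the search reaches the leaves and returns the true minimax value 3. This is one small instance of more plies making a big difference.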

SLIDE 32

Search Depth Matters

  • Evaluation functions are always imperfect
  • The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
  • An important example of the tradeoff between complexity of features and complexity of computation

[Demos: Depth2, Depth10]

SLIDE 33

Evaluation Functions

  • Evaluation functions score non-terminals in depth-limited search
  • Ideal: returns the actual minimax value of the position
  • In practice: typically a weighted linear sum of features,
    e.g. $f_1(s)$ = (num white queens – num black queens)
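
A weighted linear sum of features has the form $w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)$. The sketch below shows one way to code it. The position encoding (a dictionary of piece counts), the second feature, and the weights 9 and 1 are illustrative assumptions, not values from the lecture.

```python
from typing import Callable, Dict, List, Tuple

Position = Dict[str, int]  # hypothetical encoding: piece counts by name

def queen_difference(s: Position) -> int:
    """The queen feature: num white queens - num black queens."""
    return s.get("white_queens", 0) - s.get("black_queens", 0)

def pawn_difference(s: Position) -> int:
    """A second, assumed feature: num white pawns - num black pawns."""
    return s.get("white_pawns", 0) - s.get("black_pawns", 0)

# (weight, feature) pairs; the weights are illustrative, not tuned.
FEATURES: List[Tuple[float, Callable[[Position], int]]] = [
    (9.0, queen_difference),
    (1.0, pawn_difference),
]

def evaluate(s: Position) -> float:
    """Weighted linear sum of the features."""
    return sum(w * f(s) for w, f in FEATURES)

position = {"white_queens": 1, "black_queens": 0,
            "white_pawns": 5, "black_pawns": 7}
score = evaluate(position)  # 9 * 1 + 1 * (-2) = 7
```

Keeping features and weights in a list makes it easy to add features or retune weights without touching `evaluate`.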

SLIDE 34

Why Pacman Starves/Thrashes

  • A danger of replanning agents!
    – He knows his score will go up by eating a dot now
    – He knows his score will go up just as much by eating a dot later
    – There are no point-scoring opportunities after eating a dot (within the horizon, two here)
    – Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!

SLIDE 35

Pacman/Ghost Evaluation

[Demos: Thrashing, Thrashing-Fixed, SmartGhosts-1, SmartGhosts-2]

SLIDE 36

Nondeterministic Games

SLIDE 37

Worst Case vs. Average Case

In nondeterministic games, chance is introduced by non-opponent stochasticity (e.g. dice, card-shuffling)

[Figure: two-ply tree with a max root, min nodes, and leaf values 10, 10, 9, 100]

SLIDE 38

Expectiminimax Search

  • Why wouldn’t we know what the result of an action will be?
    – Explicit randomness: rolling dice
    – Unpredictable opponents: the ghosts respond randomly
    – Actions can fail: when moving a robot, wheels might slip
  • Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
  • Expectiminimax search: compute the average score under optimal play
    – Max nodes as in minimax search
    – Chance nodes are like min nodes but the outcome is uncertain
    – Calculate their expected utilities

[Figure: tree with a max root, chance nodes, and leaf values 10, 10, 9, 100; the values 10, 4, 5, 7 also appear in the figure]

SLIDE 39

Reminder: Probabilities

  • A random variable represents an event whose outcome is unknown
  • A probability distribution is an assignment of weights to outcomes
  • Example: Traffic on freeway
    – Random variable: T = whether there’s traffic
    – Outcomes: T in {none, light, heavy}
    – Distribution:
      • P(T=none) = 0.25
      • P(T=light) = 0.50
      • P(T=heavy) = 0.25

SLIDE 40

Reminder: Expectations

  • The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
  • Example: How long to get to the airport?

    P(T):  0.25     0.50     0.25
    Time:  20 min   30 min   60 min

    0.25 × 20 min + 0.50 × 30 min + 0.25 × 60 min = 35 min

SLIDE 41

Expectiminimax Implementation

    import math

    # Assumes a state object offering is_terminal(), utility(), next_agent(),
    # successors(), and probability(successor) for chance outcomes; these
    # names are placeholders, not a fixed API.

    def value(state):
        if state.is_terminal():
            return state.utility()
        if state.next_agent() == "MAX":
            return max_value(state)
        return exp_value(state)  # the next agent is EXP (chance)

    def max_value(state):
        v = -math.inf
        for successor in state.successors():
            v = max(v, value(successor))
        return v

    def exp_value(state):
        v = 0
        for successor in state.successors():
            p = state.probability(successor)
            v += p * value(successor)
        return v
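
A runnable sketch of the same recursion follows, using tagged tuples instead of a state class. The node encoding (`("max", children)` and `("chance", [(probability, child), ...])`) is an assumption for illustration. The first tree reuses the leaf values 10, 10, 9, 100 from the "Worst Case vs. Average Case" slide, and it assumes uniform probabilities, which the slide does not state.

```python
from fractions import Fraction

def expectimax(node):
    """Value of a node: a number (terminal utility), ("max", children),
    or ("chance", [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax(child) for child in children)
    # Chance node: probability-weighted average of the successor values.
    return sum(p * expectimax(child) for p, child in children)

# MAX root over two chance nodes; uniform probabilities are an assumption.
tree = ("max", [
    ("chance", [(0.5, 10), (0.5, 10)]),
    ("chance", [(0.5, 9), (0.5, 100)]),
])
average_case = expectimax(tree)

# A single chance node with outcome probabilities 1/2, 1/3, 1/6,
# using exact fractions to avoid rounding.
chance_node = ("chance", [(Fraction(1, 2), 8),
                          (Fraction(1, 3), 24),
                          (Fraction(1, 6), -12)])
chance_value = expectimax(chance_node)
```

With these numbers the average-case value at the root is 54.5, so the right branch is preferred, whereas worst-case (minimax) reasoning over the same leaves would value the root at 10 and prefer the left branch.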

SLIDE 42

Expectiminimax Example

    # state.successors(), state.probability(), and value() are assumed helpers.
    def exp_value(state):
        v = 0
        for successor in state.successors():
            p = state.probability(successor)
            v += p * value(successor)
        return v

[Figure: chance node with outcome probabilities 1/2, 1/3, 1/6 and successor values 8, 24, −12; the values 5 and 7 also appear in the figure]

v = (1/2)(8) + (1/3)(24) + (1/6)(−12) = 10

SLIDE 43

Where Do Probabilities Come From?

  • In expectiminimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
    – Model could be a simple uniform distribution (roll a die)
    – Model could be sophisticated and require a great deal of computation
    – We have a chance node for any outcome out of our control: opponent or environment
    – The model might say that adversarial actions are likely!
  • For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes

SLIDE 44

Summary

  • A game can be formulated as a search problem, with a solution policy ($S \to A$)
  • For deterministic games, the minimax algorithm plays optimally against an optimal opponent (assuming the game tree is finite and small enough to search in full)
  • To help with resource limitations, standard practice is to employ alpha-beta pruning and depth-limited search (with an evaluation function)
  • To model uncertainty, the expectiminimax algorithm introduces chance nodes that employ a probability distribution over outcomes to model expected utility