Adversarial Search. Rob Platt, Northeastern University. Some images and slides are used from: AIMA, CS188 UC Berkeley. PowerPoint presentation.



SLIDE 1

Adversarial Search

Rob Platt, Northeastern University. Some images and slides are used from: AIMA, CS188 UC Berkeley.

SLIDE 2

What is adversarial search?

Adversarial search: planning used to play a game such as chess or checkers. The algorithms are similar to graph search, except that we plan under the assumption that our opponent will maximize his own advantage...

SLIDE 3

Some types of games

Chess: solved or unsolved? Checkers: solved or unsolved? Tic-tac-toe: solved or unsolved? Go: solved or unsolved?

A game is solved when the outcome can be predicted from any initial state, assuming both players play perfectly.

SLIDE 4

Examples of adversarial search

Chess: unsolved. Checkers: solved. Tic-tac-toe: solved. Go: unsolved. (The outcome of a solved game can be predicted from any initial state, assuming both players play perfectly.)

SLIDE 5

Examples of adversarial search

Chess: unsolved, ~10^40 states. Checkers: solved, ~10^20 states. Tic-tac-toe: solved, fewer than 9! = 362,880 states. Go: unsolved, state count ?

SLIDE 6

Different types of games

  • Deterministic / stochastic
  • Two-player / multi-player
  • Zero-sum / non-zero-sum
  • Perfect information / imperfect information

SLIDE 7

Different types of games

  • Deterministic / stochastic
  • Two-player / multi-player
  • Zero-sum / non-zero-sum
  • Perfect information / imperfect information

Zero-sum: the utilities of all players sum to zero; pure competition. Non-zero-sum: the utility function of each player could be arbitrary; optimal strategies could involve cooperation.

SLIDE 8

Formalizing a Game

Given the set of states, the legal actions from each state, the successor function, and the utilities of terminal states, calculate a policy: the action that player p should take from state s.

SLIDE 9

Formalizing a Game

Calculate a policy: the action that player p should take from state s, given the elements of the game.

How?

SLIDE 10

How do we solve for a policy?

Use adversarial search: build a game tree.

SLIDE 11

This is a game tree for tic-tac-toe

[Figure, built over slides 11-16: the tic-tac-toe game tree, with plies alternating between You and Them from the top down, and terminal Utility values at the leaves.]

SLIDE 17

What is Minimax?

Consider a simple game:

  1. you make a move
  2. your opponent makes a move
  3. game ends
SLIDE 18

What is Minimax?

What does the minimax tree look like in this case? [Slides 18-23 build the answer: a tree with layers Max (you), Min (them), Max (you) and terminal utilities 3, 8, 12 | 2, 6, 4 | 14, 2, 5. These are terminal utilities; assume we know what these values are. Min backs up the values 3, 2, and 2, and Max backs up 3 at the root. This is called "backing up" the values.]

SLIDE 24

Minimax

Leaf utilities: 3, 8, 12 | 2, 6, 4 | 14, 2, 5

Okay, so we know how to back up values... but this tree was already built for us. How do we construct the tree?

SLIDE 25

Minimax

Notice that we only get utilities at the bottom of the tree; therefore, DFS makes sense. [Slides 25-34 step through the depth-first expansion: the left subtree's leaves 3, 8, 12 back up the min value 3; the middle subtree's leaves 2, 6, 4 back up 2; the right subtree's leaves 14, 2, 5 back up 2; the root then takes the max, 3.] Since most games have forward progress, the distinction between tree search and graph search is less important.

SLIDE 35

Minimax
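This slide presumably shows the minimax algorithm itself. As a minimal sketch (the list-based tree encoding, with bare numbers as terminal utilities, is our illustration, not the slide's code):

```python
# Depth-first minimax over an explicit game tree.
# A node is either a number (a terminal utility) or a list of child nodes.

def minimax(node, maximizing):
    if isinstance(node, (int, float)):  # terminal state: utility is known
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The running example: leaf utilities 3, 8, 12 | 2, 6, 4 | 14, 2, 5.
tree = [[3, 8, 12], [2, 6, 4], [14, 2, 5]]
print(minimax(tree, maximizing=True))  # min backups 3, 2, 2; max at root: 3
```

The recursion mirrors the slides: minimize at opponent plies, maximize at our own, with utilities available only at the leaves.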

SLIDE 36

Minimax properties

Is it always correct to assume your opponent plays optimally?

[Leaf utilities: 10, 10, 9, 100. Max (you), Min (them), Max (you); the root value is marked '?'.]

SLIDE 37

Minimax properties

Is minimax optimal? Is it complete?

SLIDE 38

Minimax properties

Is minimax optimal? Is it complete? Time complexity = ? Space complexity = ?

SLIDE 39

Minimax properties

Is minimax optimal? Is it complete? Time complexity = O(b^m). Space complexity = O(b·m). (b = branching factor, m = maximum depth of the tree.)

SLIDE 40

Minimax properties

Is minimax optimal? Is it complete? Time complexity = O(b^m). Space complexity = O(b·m). Is it practical? In chess, b ≈ 35, d ≈ 100.

SLIDE 41

Minimax properties

Is minimax optimal? Is it complete? Time complexity = O(b^m). Space complexity = O(b·m). Is it practical? In chess, b ≈ 35, d ≈ 100.

35^100 ≈ 10^154 is a big number...

SLIDE 42

Minimax properties

Is minimax optimal? Is it complete? Time complexity = O(b^m). Space complexity = O(b·m). Is it practical? In chess, b ≈ 35, d ≈ 100.

35^100 ≈ 10^154 is a big number...

So what can we do?

SLIDE 43

Evaluation functions

Key idea: cut off search at a certain depth and give the corresponding nodes an estimated value. The evaluation function makes this estimate.

[Figure, slides 43-44: a game tree with "cut off recursion here" marked at a fixed depth; the numbers at the cut-off nodes are estimated values.]

SLIDE 45

Evaluation functions

How does the evaluation function make the estimate? It depends upon the domain. For example, in chess, the value of a state might equal the sum of piece values: a pawn counts for 1, a knight counts for 3, a rook counts for 5, ...

SLIDE 46

A weighted linear evaluation function

Eval(s) = w1·f1(s) + w2·f2(s) + ..., where the features count material: f1 = number of pawns on the board (a pawn counts for 1), f2 = number of knights on the board (a knight counts for 3), and so on.

[Board examples from the slide: Eval = 3 - 2.5 = 0.5 in one position; Eval = 3 + 2.5 + 1 + 1 - 2.5 = 5 in another.]

SLIDE 47

A weighted linear evaluation function

[Same material-counting examples as the previous slide.] Maybe consider other factors as well?
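A sketch of such a weighted linear evaluator (the dict-based material summary and the helper name are our invention; the weights follow the slide's pawn = 1, knight = 3 convention):

```python
# Weighted linear evaluation: Eval(s) = w1*f1(s) + w2*f2(s) + ...
# Here the features are material counts for each side.
WEIGHTS = {'pawn': 1.0, 'knight': 3.0, 'bishop': 3.0, 'rook': 5.0, 'queen': 9.0}

def evaluate(my_pieces, their_pieces):
    """Each argument is a dict mapping piece name -> count on the board."""
    my_material = sum(WEIGHTS[p] * n for p, n in my_pieces.items())
    their_material = sum(WEIGHTS[p] * n for p, n in their_pieces.items())
    return my_material - their_material

# A knight against two pawns:
print(evaluate({'knight': 1}, {'pawn': 2}))  # 3.0 - 2.0 = 1.0
```

Other factors (mobility, king safety, pawn structure) would simply be more features with their own weights.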

SLIDE 48

Evaluation functions

Problem: in realistic games, we cannot search to the leaves! Solution: depth-limited search.

  • Instead, search only to a limited depth in the tree
  • Replace terminal utilities with an evaluation function for non-terminal positions

Example: suppose we have 100 seconds and can explore 10K nodes/sec, so we can check 1M nodes per move.

  • The guarantee of optimal play is gone
  • More plies makes a BIG difference
  • Use iterative deepening for an anytime algorithm
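The recipe above can be sketched as a depth-limited minimax (the list-tree encoding and the deliberately crude placeholder evaluation function, which just peeks at a leftmost descendant, are ours):

```python
# Depth-limited minimax on an explicit tree: lists are internal nodes,
# numbers are exact terminal utilities. At the cutoff depth we call an
# evaluation function instead of recursing further.

def evaluate(node):
    # Placeholder evaluation function: descend leftmost for a cheap estimate.
    while isinstance(node, list):
        node = node[0]
    return node

def minimax_cutoff(node, maximizing, depth, limit):
    if isinstance(node, (int, float)):
        return node                      # true terminal utility
    if depth == limit:
        return evaluate(node)            # estimated value at the cutoff
    vals = [minimax_cutoff(c, not maximizing, depth + 1, limit) for c in node]
    return max(vals) if maximizing else min(vals)

tree = [[3, 8, 12], [2, 6, 4], [14, 2, 5]]
print(minimax_cutoff(tree, True, 0, 1))   # cutoff at depth 1: estimates give 14
print(minimax_cutoff(tree, True, 0, 2))   # full depth: exact minimax value 3
```

Note how the depth-1 cutoff returns 14 where the full search returns 3: once values are estimated, the guarantee of optimal play is gone.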

SLIDE 49

At what depth do you run the evaluation function?

Option 1: cut off search at a fixed depth. Option 2: cut off search at particular states deeper than a certain threshold. The deeper your threshold, the less the quality of the evaluation function matters...

[Figure: the same cut-off tree with estimated values at the cut-off nodes.]

SLIDE 50

Alpha/Beta pruning

SLIDE 51

Alpha/Beta pruning

[Slides 51-60 run minimax on the running example while pruning. The left Min node backs up min(3, 8, 12) = 3. In the middle Min node, after expanding the leaves 2 and 4, we don't need to expand the remaining leaf. Why? The node's backed-up value can be at most 2, while Max at the root is already guaranteed 3 from the left branch, so this branch can never be chosen. The right Min node backs up 2 from leaves 14, 2, 5. So, we don't need to expand some nodes in order to back up correct values at the root. That's alpha-beta pruning.]

SLIDE 61

Alpha/Beta pruning: algorithm

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v

α: MAX's best option on path to root
β: MIN's best option on path to root
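A self-contained, runnable version of this recursion (the list-tree encoding and the `visited` leaf recorder are our additions for illustration):

```python
import math

# Alpha-beta on an explicit tree: lists are internal nodes, numbers are
# terminal utilities. `visited` records which leaves actually get expanded.

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, visited))
            if v >= beta:
                return v                 # prune remaining children
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta, visited))
            if v <= alpha:
                return v                 # prune remaining children
            beta = min(beta, v)
        return v

tree = [[3, 8, 12], [2, 6, 4], [14, 2, 5]]
leaves = []
print(alphabeta(tree, True, visited=leaves))  # 3, same as plain minimax
print(leaves)  # [3, 8, 12, 2, 14, 2]: the leaves 6, 4 and 5 were pruned
```

On the lecture's tree it returns the same root value as plain minimax, 3, while never expanding the leaves 6, 4, or 5.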

SLIDE 62

Alpha/Beta pruning

[Slides 62-74 repeat the walkthrough while tracking the (α, β) window, initially (-inf, +inf) at the root and passed down. The left Min node expands 3, 8, and 12; after its first leaf its window is (-inf, 3), the best value so far for MIN along the path to the root, and it backs up 3. The root's window becomes (3, +inf), the best value so far for MAX along the path to the root. The middle Min node receives (3, +inf); its first leaf, 2, is out of the alpha-beta range (v ≤ α), so the node is pruned. The right Min node also starts at (3, +inf); expanding 14 narrows the window to (3, 14), expanding 5 narrows it to (3, 5), and expanding 2 backs up the value 2. The root's value is 3.]

SLIDE 75

Alpha/Beta algorithm

SLIDE 76

Alpha/Beta properties

Is it complete?

SLIDE 77

Alpha/Beta properties

Is it complete? How much does alpha/beta help relative to minimax? Minimax time complexity = O(b^m). Alpha/beta time complexity = O(b^(m/2)) with perfect move ordering, and no better than O(b^m) in the worst case. The improvement with alpha/beta depends upon move ordering...

SLIDE 78

Alpha/Beta properties

Is it complete? How much does alpha/beta help relative to minimax? The improvement with alpha/beta depends upon move ordering: the order in which we expand a node's children. [Example tree shown, with leaf utilities 3, 8, 12 | 2, 6, 4 | 14, 2, 5 backing up to 3.]

SLIDE 79

Alpha/Beta properties

How to choose the move ordering? Use iterative deepening (IDS): on each iteration of IDS, use the values from the prior run to inform the ordering of the next run's node expansions.
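A sketch of that idea (the scoring helper and tree encoding below are illustrative, not the lecture's code): use a cheaper, shallower search to score each child, then sort children so the deeper alpha-beta pass expands the most promising moves first and prunes more.

```python
# Move ordering from a shallower search: score each child with a cheap
# depth-limited minimax, then expand children in best-first order.

def shallow_value(node, maximizing, limit):
    if isinstance(node, (int, float)) or limit == 0:
        while isinstance(node, list):   # crude estimate: leftmost descendant
            node = node[0]
        return node
    vals = [shallow_value(c, not maximizing, limit - 1) for c in node]
    return max(vals) if maximizing else min(vals)

def order_children(node, maximizing, limit):
    """Sort children by the previous (shallower) iteration's values."""
    scored = [(shallow_value(c, not maximizing, limit - 1), c) for c in node]
    scored.sort(key=lambda t: t[0], reverse=maximizing)
    return [child for _, child in scored]

tree = [[2, 6, 4], [3, 8, 12], [14, 1, 5]]
# Depth-1 values of the Min children are 2, 3, 1; the deeper search
# should try the branch worth 3 first:
print(order_children(tree, True, 2))  # [[3, 8, 12], [2, 6, 4], [14, 1, 5]]
```

With the best branch first, a Max node raises α immediately, which is exactly what lets alpha-beta approach its O(b^(m/2)) best case.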

SLIDE 80

Expectimax

[Leaf utilities: 10, 10, 9, 100. Max (you), Min (them), Max (you); the root value is marked '?'.]

What if your opponent does not maximize his or her utility? For example, suppose he or she picks moves uniformly at random.

SLIDE 81

Expectimax

[Leaf utilities: 10, 10, 9, 100. Minimax backup for a rational agent: the Min nodes back up 10 and 9. Max (you), Min (them), Max (you).]

SLIDE 82

Expectimax

[Leaf utilities: 10, 10, 9, 100. Backup for an agent who selects actions uniformly at random: the left node averages to (10 + 10)/2 = 10, the right to (9 + 100)/2 = 54.5. Max (you), Min (them), Max (you).]

Instead of backing up min values at min plies, back up the average. We could also account for agents who are somewhere in between rational and uniformly random. How? Later, this idea will be generalized using Markov Decision Processes.

SLIDE 84

Backgammon

[Figure: a backgammon board, points numbered 1-24 with stacks of checkers.]

Mixing these ideas: nondeterministic games

SLIDE 85

Nondeterministic games in general

In nondeterministic games, chance is introduced by dice or card-shuffling. Simplified example with coin-flipping:

[Figure: a tree with max, chance, and min layers; each chance branch has probability 0.5. Leaf utilities 2, 4 | 7, 4 and 6, 0 | 5, -2 back up through min to 2, 4, 0, -2, and through the chance nodes to 0.5·2 + 0.5·4 = 3 and 0.5·0 + 0.5·(-2) = -1.]
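Chance nodes with explicit probabilities fit the same pattern; a sketch of expectiminimax on the coin-flip example (the tuple-based tree encoding is our invention):

```python
# Expectiminimax for games with explicit chance nodes (e.g. coin flips).
# Tree encoding (ours): ('max', children), ('min', children),
# ('chance', [(prob, child), ...]), or a bare number for a terminal utility.

def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of child values
    return sum(p * expectiminimax(c) for p, c in children)

# The coin-flip tree: min values 2, 4, 0, -2 give chance values 3 and -1.
tree = ('max', [
    ('chance', [(0.5, ('min', [2, 4])), (0.5, ('min', [7, 4]))]),
    ('chance', [(0.5, ('min', [6, 0])), (0.5, ('min', [5, -2]))]),
])
print(expectiminimax(tree))  # 3.0
```

Setting every chance node to a single child with probability 1 recovers plain minimax; making the opponent a uniform chance node recovers expectimax.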