Chapter 6: Adversarial Search (2007-04-19)




Game Theory

  • Studied by mathematicians, economists, and finance theorists
  • In AI we limit games to those that are:
      • deterministic
      • turn-taking
      • two-player
      • zero-sum (a win-lose game: one side's gain is exactly the other's loss)
      • of perfect information

This means deterministic, fully observable environments in which there are two agents whose actions must alternate and in which the utility values at the end of the game are always equal and opposite.


Types of Games

  • Game playing was one of the first tasks undertaken in AI.
  • Machines have surpassed humans on checkers and Othello, and have defeated human champions in chess and backgammon.

  • In Go, computers perform at the amateur level.

                          Deterministic             Chance
  Perfect information     Chess, Checkers, Go,      Backgammon, Monopoly
                          Othello
  Imperfect information   Blind tic-tac-toe         Bridge, Poker


Games as Search Problems

  • Games offer pure, abstract competition.
  • A chess-playing computer would be an existence proof of a machine doing something generally thought to require intelligence.
  • Games are idealizations of worlds in which:
      • the world state is fully accessible;
      • the (small number of) actions are well-defined;
      • uncertainty remains, both due to moves by the opponent and due to the complexity of the game.


Games as Search Problems (cont.-1)

  • Games are usually much too hard to solve. For example, in a typical chess game:
      • average branching factor: 35
      • average moves by each player: 50
      • total number of nodes in the search tree: 35^100, or about 10^154 (although the total number of distinct legal positions is only about 10^40)
  • Time limits require making good decisions without complete search.


Games as Search Problems (cont.-2)

  • Initial State
      • How does the game start?
  • Successor Function
      • A list of legal (move, state) pairs for each state
  • Terminal Test
      • Determines when the game is over
  • Utility Function
      • Provides a numeric value for all terminal states
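These four components can be written down directly as code. A minimal sketch for tic-tac-toe (all function names and the board encoding are illustrative, not from the slides):

```python
# Game formulation for tic-tac-toe: initial state, successor function,
# terminal test, and utility function (names are illustrative).
WIN_LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def initial_state():
    return (' ',) * 9, 'X'            # empty board; MAX ('X') moves first

def successors(state):
    # A list of legal (move, state) pairs for the given state.
    board, player = state
    nxt = 'O' if player == 'X' else 'X'
    return [(i, (board[:i] + (player,) + board[i+1:], nxt))
            for i in range(9) if board[i] == ' ']

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal_test(state):
    board, _ = state
    return winner(board) is not None or ' ' not in board

def utility(state):
    # Zero-sum: +1 if MAX ('X') won, -1 if MIN ('O') won, 0 for a draw.
    return {'X': 1, 'O': -1}.get(winner(state[0]), 0)
```

The same four-function interface works for any deterministic, turn-taking, two-player game; only the board representation and the rules change.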

Partial Game Tree


Optimal strategies

  • Find the contingent strategy for MAX assuming an infallible MIN opponent.
  • Assumption: both players play optimally!
  • Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

MinimaxValue(n) =
    Utility(n)                                   if n is a terminal state
    max_{s ∈ Successors(n)} MinimaxValue(s)      if n is a MAX node
    min_{s ∈ Successors(n)} MinimaxValue(s)      if n is a MIN node


Minimax

  • Perfect play for deterministic, perfect-information games
  • Idea: choose the move leading to the position with the highest minimax value, i.e. the best achievable payoff against best play.


Two-Ply Game Tree


Two-Ply Game Tree (cont.-1)


Two-Ply Game Tree (cont.-2)


Two-Ply Game Tree (cont.-3)

The minimax decision

Minimax maximizes the worst-case outcome for MAX.


Minimax Algorithm


The Minimax Algorithm (cont.)

  • Generate the whole game tree.
  • Apply the utility function to each terminal state.
  • Determine the utility of the nodes one level up from the terminal nodes.
  • Continue backing up the values toward the root.
  • At the root, MAX chooses the move leading to the highest utility value.

Utility(n) = max/min(n.1, n.2, …, n.b)
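The backing-up procedure above is the standard recursive minimax. A minimal sketch over an explicit game tree (the representation, where a leaf is its utility value and an internal node is a list of children, is an assumption for illustration):

```python
# Minimax over an explicit game tree: a leaf is a number (its utility);
# an internal node is a list of children; levels alternate MAX / MIN.
def minimax_value(node, is_max=True):
    if isinstance(node, (int, float)):        # terminal state
        return node
    values = [minimax_value(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

def minimax_decision(root):
    # MAX chooses the child (move index) with the highest minimax value.
    values = [minimax_value(child, is_max=False) for child in root]
    return max(range(len(root)), key=values.__getitem__)
```

On a two-ply tree such as [[3, 12, 8], [2, 4, 6], [14, 5, 2]], the MIN-level values back up as 3, 2, 2, so the root value is 3 and MAX picks the first move.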


Analysis of Minimax

Complete?  Yes, if the tree is finite.
Optimal?   Yes, against an optimal opponent. (Otherwise, even better outcomes may be possible.)
Time?      O(b^m) — a complete depth-first search (m: max depth, b: number of legal moves)
Space?     O(bm) if all successors are generated at once, or O(m) if successors are generated one at a time

For chess, b ≈ 35 and m ≈ 100 for "reasonable" games ⇒ exact solution is completely infeasible.


Optimal Decisions in Multiplayer Games

  • Extend the minimax idea to multiplayer games
  • Replace the single value for each node with a vector of values
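The vector-of-values idea can be sketched directly; each leaf holds one utility per player, and the player to move maximizes its own component (the tuple layout and names are assumptions, not from the slides):

```python
# Multiplayer minimax: a leaf is a utility vector (one entry per player,
# as a tuple); an internal node is a list of children.  At an internal
# node, the player to move picks the child maximizing its own component.
def multi_minimax(node, player, num_players):
    if isinstance(node, tuple):               # terminal: utility vector
        return node
    nxt = (player + 1) % num_players          # turns rotate among players
    child_values = [multi_minimax(c, nxt, num_players) for c in node]
    return max(child_values, key=lambda v: v[player])
```

For two-player zero-sum games this collapses to ordinary minimax, since maximizing one component is the same as minimizing the other.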


α-β Pruning

  • The problem with minimax search: the number of states to examine is exponential in the number of moves.
  • α-β pruning returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.
  • α: the value of the best (highest-value) choice found so far for MAX
  • β: the value of the best (lowest-value) choice found so far for MIN
  • The order of considering successors matters (look at step f of Fig 6.5, p. 168).
  • If possible, consider the best successors first.

α-β Pruning (cont.)

If m is better than n for Player, we will never reach n in actual play, so n can simply be pruned.


α-β Pruning Example

(figure: depth-first search until the first leaf; each node starts with the full range of possible values [-∞, +∞])


α-β Pruning Example (cont.-1)

(figure: node value ranges [-∞, 3] and [-∞, +∞])


α-β Pruning Example (cont.-2)

(figure: node value ranges [-∞, 3] and [-∞, +∞])


α-β Pruning Example (cont.-3)

(figure: node value ranges [3, +∞] and [3, 3])


α-β Pruning Example (cont.-4)

(figure: node value ranges [-∞, 2], [3, +∞], [3, 3]; this node is worse for MAX)


α-β Pruning Example (cont.-5)

(figure: node value ranges [-∞, 2], [3, 14], [3, 3], [-∞, 14])


α-β Pruning Example (cont.-6)

(figure: node value ranges [-∞, 2], [3, 5], [-∞, 5])


α-β Pruning Example (cont.-7)

(figure: node value ranges [2, 2], [-∞, 2], [3, 3], [3, 3])


α-β Pruning Example (cont.-8)

(figure: node value ranges [2, 2], [-∞, 2], [3, 3], [3, 3])


The α-β Algorithm


The α-β Algorithm (cont.)
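The algorithm figures are not reproduced here; a minimal sketch over the same explicit-tree representation used for minimax (leaves are numbers, internal nodes are lists — an illustrative assumption):

```python
# Alpha-beta search: same result as minimax, but prunes branches that
# cannot influence the decision.  alpha = best value found so far for
# MAX along the path; beta = best (lowest) value found so far for MIN.
def alphabeta(node, alpha=float('-inf'), beta=float('inf'), is_max=True):
    if isinstance(node, (int, float)):        # terminal state
        return node
    if is_max:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            if value >= beta:                 # MIN above will never allow this
                return value                  # beta cutoff: prune the rest
            alpha = max(alpha, value)
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        if value <= alpha:                    # alpha cutoff
            return value
        beta = min(beta, value)
    return value
```

On the two-ply tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]] it returns the same root value 3 as minimax, while never examining the 4 and 6 under the second MIN node.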


Analysis of α-β Algorithm

  • Pruning does not affect final result.
  • Entire subtrees can be pruned.
  • Good move ordering improves its effectiveness: performance is highly dependent on the order in which successors are examined ⇒ it is worthwhile to examine first the successors that are likely to be best. E.g. in Figure 6.5 (e, f): if the successors of D are ordered 2, 5, 14 (instead of 14, 5, 2), then 5 and 14 can be pruned.


Analysis of α-β Algorithm (cont.)

  • With best-move-first ordering:
      • the total number of nodes examined is O(b^(d/2))
      • the effective branching factor becomes b^(1/2) — for chess, about 6 instead of 35 — i.e. α-β can look ahead roughly twice as far as minimax in the same amount of time.
  • With random ordering:
      • the total number of nodes examined is O(b^(3d/4)) for moderate b
  • Repeated states are again possible.
  • Store their values in memory: the transposition table.
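A transposition table is just a cache of values keyed by position. A sketch layered over plain minimax (the interface, with the game passed in as functions, is an assumption for illustration):

```python
# Transposition table: cache the minimax value of repeated states so
# each distinct (state, player-to-move) pair is evaluated only once.
def minimax_tt(state, successors, utility, is_terminal, is_max, table=None):
    if table is None:
        table = {}
    key = (state, is_max)                     # state must be hashable
    if key in table:                          # transposition: reuse value
        return table[key]
    if is_terminal(state):
        value = utility(state)
    else:
        vals = [minimax_tt(s, successors, utility, is_terminal,
                           not is_max, table) for s in successors(state)]
        value = max(vals) if is_max else min(vals)
    table[key] = value
    return value
```

In a real game program the key is a hash of the board (often a Zobrist hash) rather than the whole position, and entries also record search depth.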

Imperfect, Real-Time Decisions

  • Minimax and alpha-beta pruning require too many leaf-node evaluations.
  • This may be impractical within a reasonable amount of time.
  • Shannon (1950) proposed:
      • apply a heuristic evaluation function EVAL (replacing the utility function of alpha-beta)
      • cut off the search earlier (replacing the terminal test by a cutoff test)


Heuristic Evaluation Functions

  • Produce an estimate of the expected utility of the game from a given position.
  • Performance depends on the quality of EVAL.
  • Requirements:
      • EVAL should order terminal nodes in the same way as UTILITY.
      • The computation must not take too long.
      • For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.
  • Most evaluation functions work by calculating various features of the state. What are features of chess? E.g. the number of pawns possessed, etc.
  • Weighted linear function:

        Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

    The addition assumes the features are independent of one another.


Heuristic Evaluation Functions (cont.-1)

  • Give a material value to each piece (from chess books):

        pawn                1
        knight or bishop    3
        rook                5
        queen               9
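A weighted linear evaluation built from these material values, in code. Each feature is the white-minus-black count of one piece type; the dict-of-counts board encoding is an illustrative assumption:

```python
# Weighted linear evaluation: Eval(s) = w1*f1(s) + ... + wn*fn(s).
# Here feature fi is (white count - black count) for piece type i,
# weighted by the standard material values quoted above.
WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_eval(white_pieces, black_pieces):
    # white_pieces / black_pieces: dicts mapping piece type -> count
    return sum(w * (white_pieces.get(p, 0) - black_pieces.get(p, 0))
               for p, w in WEIGHTS.items())
```

For example, a position where White is up a rook with pawns otherwise equal evaluates to +5 from White's point of view.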


Heuristic Evaluation Functions (cont.-2)

Heuristic difficulties (when the heuristic merely counts material won), e.g. two slightly different chess positions: (a) Black has an advantage of a knight and two pawns and will win the game; (b) Black will lose after White captures the queen.


Cutting Off Search

  • When do you recurse, and when do you use the evaluation function?

    if Cutoff-Test(state, depth) then return Eval(state)

  • The simplest way to control the amount of search is to set a fixed depth limit d: Cutoff-Test(state, depth) returns true for all depths greater than d, at which point the evaluation function is applied.
  • Cutoff beyond a certain depth.
  • Cutoff if the state is stable (more predictable).
  • Cutoff moves you know are bad (forward pruning).
  • Cutting off can have a disastrous effect if the evaluation function is not sophisticated enough.
  • The search should continue until a quiescent position is found (one with no wild swings in value in the near future).
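The substitution of Cutoff-Test for Terminal-Test is mechanical. A sketch over the explicit-tree representation, with a fixed depth limit (the quiescence check is omitted, and all names are illustrative):

```python
# Depth-limited minimax: Cutoff-Test replaces Terminal-Test, and Eval
# replaces Utility whenever the cutoff fires before a true terminal.
def dl_minimax(node, eval_fn, limit, depth=0, is_max=True):
    if isinstance(node, (int, float)):   # true terminal state
        return node                      # exact utility
    if depth >= limit:                   # Cutoff-Test(state, depth)
        return eval_fn(node)             # heuristic estimate instead
    vals = [dl_minimax(c, eval_fn, limit, depth + 1, not is_max)
            for c in node]
    return max(vals) if is_max else min(vals)
```

With a generous limit the result equals full minimax; with a tight limit, the answer is only as good as eval_fn at the frontier.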


Cutting Off Search (cont.)

  • Does it work in practice?

    With a budget of b^m = 10^6 nodes and b = 35, we get m ≈ 4.

    A 4-ply lookahead is a hopeless chess player:
      4-ply  ≈ human novice
      8-ply  ≈ typical PC, human master
      12-ply ≈ Deep Blue, Kasparov


Horizon Effect

  • The horizon effect arises when the program is facing a move by the opponent that causes serious damage and is ultimately unavoidable.
  • At present, no general solution has been found for the horizon problem. (From Russell, 1st ed., p. 129:) a series of checks by the black rook forces the inevitable queening move by White "over the horizon," making the position look like a win for Black when it is really a win for White.


Quiescence

  • A quiescent position is one that is unlikely to exhibit a dramatic change in value in the near future.
  • Quiescence searches are typically applied only to certain types of moves (in chess, e.g., capture moves and moves protecting the king).


Deterministic Games in Practice

  • Checkers (computer wins, 1994)
    Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
  • Chess (Deep Blue wins, 1997)
    Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses a very sophisticated evaluation function, and undisclosed methods for extending some lines of search up to 40 ply.
  • Othello (computer wins)
    Human champions refuse to compete against computers, which are too good.
  • Go (human wins)
    Human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Nondeterministic Games: Backgammon

White moves clockwise toward 25; Black moves counterclockwise toward 0. A piece can move to any position unless multiple opponent pieces are there; if there is exactly one opponent piece, it is captured and must start over. White has rolled 6-5 and must choose among four legal moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16), and (5-11, 11-16).


Nondeterministic Games: Backgammon (cont.)

Chance nodes are included in the game tree.


Nondeterministic Games in General

In nondeterministic games, chance is introduced by dice rolls, card-shuffling, and the like.


Algorithm for Nondeterministic Games

  • Expectiminimax gives perfect play.
  • Expectiminimax(n) =
        Utility(n)                                           if n is a terminal state
        max_{s ∈ Successors(n)} Expectiminimax(s)            if n is a MAX node
        min_{s ∈ Successors(n)} Expectiminimax(s)            if n is a MIN node
        Σ_{s ∈ Successors(n)} P(s) · Expectiminimax(s)       if n is a chance node
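The definition translates directly to code. A sketch over tagged trees (the tagging scheme — ('max', …), ('min', …), ('chance', …) with probability-child pairs — is an illustrative assumption):

```python
# Expectiminimax over tagged trees: ('max', [children]),
# ('min', [children]), ('chance', [(p, child), ...]);
# a bare number is a terminal utility.
def expectiminimax(node):
    if isinstance(node, (int, float)):        # terminal state
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of child values
    return sum(p * expectiminimax(c) for p, c in children)
```

For backgammon the levels alternate MAX, chance, MIN, chance, but the recursion handles any interleaving of node kinds.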


Digression: Exact Values DO Matter

Behavior is preserved only by a positive linear transformation of Eval. Hence Eval should be proportional to the expected payoff.

(figure: the same chance tree evaluated with two different leaf-value scales; under one scale move A1 is best, under the other move A2 is best)
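The point can be checked numerically: with chance nodes, an order-preserving but non-linear transformation of the leaf values can flip the preferred move. This toy example (two moves, each a 50/50 gamble) is illustrative, not the figure's actual numbers:

```python
# With chance nodes, only positive linear transformations of Eval
# preserve behavior.  Squaring is order-preserving but non-linear,
# and it flips the decision below.
def expected(leaves, transform=lambda x: x):
    # Expected value of a 50/50 chance node over the given leaves.
    return sum(0.5 * transform(v) for v in leaves)

a1, a2 = [2, 2], [0, 3]                  # leaf values under each move
best_raw = 'A1' if expected(a1) > expected(a2) else 'A2'          # 2.0 vs 1.5
best_sq = 'A1' if expected(a1, lambda x: x * x) > \
                  expected(a2, lambda x: x * x) else 'A2'         # 4.0 vs 4.5
```

The raw values prefer A1 (2.0 > 1.5); after squaring every leaf, A2 wins (4.5 > 4.0), even though squaring preserves the ordering of individual leaves.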


Games of Imperfect Information

  • E.g. card games, where the opponent's initial cards are unknown.
  • Typically we can calculate a probability for each possible deal.
  • This seems just like having one big dice roll at the beginning of the game.
  • Idea: compute the minimax value of each action in each deal, then choose the action with the highest expected value over all deals.
  • Special case: if an action is optimal for all deals, it is optimal.
  • The GIB program (Ginsberg, 1999), the current best bridge program, approximates this idea by (1) generating 100 deals consistent with the hidden information and (2) picking the action that wins the most tricks on average.
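The averaging-over-deals idea can be sketched generically; the deal generator and per-deal scorer below are stand-ins for illustration, not GIB's actual interface:

```python
import random

# Monte-Carlo idea behind GIB-style play: sample deals consistent with
# the visible information, score each candidate action with a per-deal
# solver, and pick the action with the best average score.
def monte_carlo_action(actions, sample_deal, score_fn, n_deals=100, seed=0):
    rng = random.Random(seed)                 # seeded for reproducibility
    totals = {a: 0.0 for a in actions}
    for _ in range(n_deals):
        deal = sample_deal(rng)               # one deal consistent with info
        for a in actions:
            totals[a] += score_fn(a, deal)    # e.g. tricks won in this deal
    return max(actions, key=lambda a: totals[a])
```

With enough samples, the averages approach the true expected values, so the chosen action approximates the expectation-maximizing one.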


Example


Example (cont.-1)


Example (cont.-2)


Summary

  • Games are fun to work on!
  • They illustrate several important points about AI:
      • perfection is unattainable, so we must approximate;
      • it is a good idea to think about what to think about;
      • uncertainty constrains the assignment of values to states.
  • Games are to AI as Grand Prix racing is to automobile design.