CS 331: Artificial Intelligence, Adversarial Search II





Outline

  • 1. Evaluation Functions

  • 2. 2 player zero-sum finite stochastic games of perfect information
  • 3. State-of-the-art game playing programs



Evaluation Functions


Evaluation Functions

  • Minimax and Alpha-Beta require us to search all the way to the terminal states
  • What if we can’t do this in a reasonable amount of time?
  • Cut off search earlier and apply a heuristic evaluation function to states in the search
  • Effectively turns non-terminal nodes into terminal leaves


Evaluation Functions

  • If at a terminal state after cutting off search, return the actual utility
  • If at a non-terminal state after cutting off search, return an estimate of the expected utility of the game from that state


Example: Evaluation Function for Tic-Tac-Toe

X is the maximizing player.

[Figure: four tic-tac-toe boards with their evaluation values: a won position for X (Eval = +100), a lost position (Eval = -100), and two non-terminal positions with X to move (Eval = 2 and Eval = 1)]
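The boards above suggest a simple heuristic. One common textbook choice (an assumption here, not stated on the slide) is Eval(s) = (number of lines still open for X) minus (number of lines still open for O):

```python
# Hypothetical tic-tac-toe evaluation: (lines still open for X) minus
# (lines still open for O). A line is "open" for a player if the
# opponent occupies none of its squares. Board: 9-element list of
# 'X', 'O', or None.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def evaluate(board):
    def open_lines(player):
        opponent = 'O' if player == 'X' else 'X'
        return sum(1 for line in LINES
                   if all(board[i] != opponent for i in line))
    return open_lines('X') - open_lines('O')

# Empty board: both players have all 8 lines open, so Eval = 0.
print(evaluate([None] * 9))  # 0
```

For example, X taking the center blocks 4 of O's lines while keeping all 8 of X's own lines open, giving Eval = 4.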


Properties of Good Evaluation Functions

1. Orders the terminal states in the same way as the utility function
2. Computation can’t take too long
3. Should be strongly correlated with the actual chances of winning

Exact values don’t matter. It’s the ordering of terminal states that matters. In fact, behavior is preserved under any monotonic transformation of the evaluation function.


Even in a deterministic game like chess, the evaluation function introduces uncertainty because of the lack of computational resources (can’t see all the way to the terminal state so you have to make a guess as to how good your state is).


Coming up with Evaluation Functions

  • Extract features from the game
  • For example, what features from a game of chess indicate that a state will likely lead to a win?

Coming up with Evaluation Functions

Weighted linear function:

  EVAL(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s) = Σ(i=1..n) wi·fi(s)

The wi’s are weights and the fi’s are features of the game state (e.g., # of pawns in chess). The weights and features are ways of encoding human knowledge of game strategies into the adversarial search algorithm.
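The weighted linear function can be sketched directly in code. The chess-flavored features and weights below are illustrative assumptions (classic material values), not a prescribed evaluation:

```python
# Sketch of the weighted linear evaluation EVAL(s) = sum_i w_i * f_i(s).
# The state representation, features, and weights here are hypothetical.

def eval_state(state, weights, features):
    """Weighted linear evaluation: sum of w_i * f_i(state)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical chess-like state: our piece counts minus the opponent's.
state = {"pawn_diff": 2, "knight_diff": -1, "rook_diff": 0}

features = [lambda s: s["pawn_diff"],
            lambda s: s["knight_diff"],
            lambda s: s["rook_diff"]]
weights = [1, 3, 5]   # classic material weights: pawn=1, knight=3, rook=5

print(eval_state(state, weights, features))  # 2*1 + (-1)*3 + 0*5 = -1
```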


Coming up with Evaluation Functions

  • Suppose we use the weighted linear evaluation function for chess. What are two problems with it?
  • 1. Assumes features are independent
  • 2. Need to know if you’re at the beginning, middle, or end of the game

Alpha-Beta with Eval Functions

Replace:

  if TERMINAL-TEST(state) then return UTILITY(state)

with:

  if CUTOFF-TEST(state, depth) then return EVAL(state)

Also, the depth parameter needs to be passed along and incremented with each recursive call.
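A minimal depth-limited alpha-beta sketch showing this replacement; the game-interface names (`cutoff_test`, `eval_fn`, `successors`) and the toy tree are assumptions, not a fixed API:

```python
def alpha_beta(state, depth, alpha, beta, maximizing,
               cutoff_test, eval_fn, successors):
    # was: if TERMINAL-TEST(state) then return UTILITY(state)
    if cutoff_test(state, depth):
        return eval_fn(state)
    if maximizing:
        value = float("-inf")
        for child in successors(state):
            value = max(value, alpha_beta(child, depth + 1, alpha, beta,
                                          False, cutoff_test, eval_fn, successors))
            alpha = max(alpha, value)
            if alpha >= beta:      # beta cutoff: MIN will never allow this line
                break
        return value
    else:
        value = float("inf")
        for child in successors(state):
            value = min(value, alpha_beta(child, depth + 1, alpha, beta,
                                          True, cutoff_test, eval_fn, successors))
            beta = min(beta, value)
            if beta <= alpha:      # alpha cutoff
                break
        return value

# Toy game tree: MAX at "A", MIN at "B"/"C", leaf values below.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
leaf_values = {"D": 3, "E": 5, "F": 2, "G": 9}
best = alpha_beta("A", 0, float("-inf"), float("inf"), True,
                  cutoff_test=lambda s, d: s not in tree,
                  eval_fn=lambda s: leaf_values.get(s, 0),
                  successors=lambda s: tree.get(s, []))
print(best)  # 3
```

Here the cutoff test fires at the leaves; in a real engine it would also fire once `depth` exceeds the depth limit d discussed next.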


The depth parameter

  • CUTOFF-TEST(state, depth) returns:
    – True for all terminal states
    – True for all depths greater than some fixed depth limit d
  • How to pick d?
    – Pick d so that the agent can decide on a move within some time limit
    – Could also use iterative deepening

Quiescence Search

  • Suppose the board at the left is at the depth limit
  • Black is ahead by 2 pawns and a knight
  • The heuristic function says Black is doing well
  • But it can’t see one more move ahead, when White takes Black’s queen


Quiescence Search

  • The evaluation function should only be applied to quiescent positions
  • i.e., positions that don’t exhibit wild swings in value in the near future
  • Quiescence search: nonquiescent positions are expanded further until quiescent positions are reached
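A common way to implement this is a sketch like the following, assuming a negamax convention where `eval_fn` scores a position from the side to move's perspective and `noisy_successors` yields only "non-quiet" continuations such as captures; these names and the toy position are assumptions:

```python
def quiescence(state, alpha, beta, eval_fn, noisy_successors):
    """Negamax quiescence search: keep expanding noisy moves until quiet."""
    stand_pat = eval_fn(state)               # score if we stop searching here
    if stand_pat >= beta:
        return stand_pat                     # already refutes the opponent's bound
    alpha = max(alpha, stand_pat)
    for child in noisy_successors(state):    # expand only captures/checks
        # Flip sign and swap the window for the opponent's reply.
        score = -quiescence(child, -beta, -alpha, eval_fn, noisy_successors)
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha

# Toy example echoing the slide: the static eval thinks the side to move
# is down material (-7), but a pending capture swings the score to +2.
positions = {
    "before_capture": {"eval": -7, "noisy": ["after_capture"]},
    "after_capture":  {"eval": -2, "noisy": []},  # opponent's view after capture
}
score = quiescence("before_capture", float("-inf"), float("inf"),
                   eval_fn=lambda s: positions[s]["eval"],
                   noisy_successors=lambda s: positions[s]["noisy"])
print(score)  # 2
```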

Horizon Effect

  • Stalling moves push an unavoidable and damaging move by the opponent “over the search horizon” to a place where it cannot be detected
  • The agent believes it has avoided the damaging, inevitable move with these stalling moves


Horizon Effect Example

[Figure: chess position illustrating the horizon effect]

Singular Extensions

  • Can be used to avoid the horizon effect
  • Expand only 1 move that is clearly better than all other moves
  • Goes beyond the normal depth limit because the branching factor is 1
  • In the chess example, if Black’s checking moves and White’s king moves are clearly better than the alternatives, then singular extension will expand the search until it picks up the queening


Another Optimization: Forward Pruning

  • Prune moves at a given node immediately
  • Dangerous! Might prune away the best move
  • Best used in special situations, e.g., symmetric or equivalent moves

Chess

  • Branching factor: 35 on average
  • Minimax lookahead: about 5 plies
  • Humans look ahead about 6-8 plies
  • Alpha-Beta lookahead: about 10 plies (roughly expert level of play, if you do all the optimizations discussed so far)


2 player zero-sum finite stochastic games of perfect information

But First… A Mini-Tutorial on Expected Values

What is probability?

  – The relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions

Example: The probability of rolling a 1 on a fair die is about 1/6.


Expected Values

  • Suppose you have an event that can take a finite number of outcomes
    – E.g., rolling a die, you can get 1, 2, 3, 4, 5, or 6
  • Expected value: What is the average value you should get if you roll a fair die?

Expected Values

What if your die isn’t fair? Suppose your probabilities are one of:

  Value  Prob        Value  Prob        Value  Prob
  1      0           1      0.5         1      0.1
  2      0           2      0           2      0.1
  3      0     OR    3      0     OR    3      0.2
  4      0           4      0           4      0.2
  5      0           5      0           5      0.3
  6      1           6      0.5         6      0.1


Expected Values

The expected value is a weighted average: for each outcome, multiply the probability of that outcome by the value of that outcome, and sum:

  Expected Value = Σ Prob(outcome) * value(outcome)

  Value  Prob
  1      0.1
  2      0.1
  3      0.2
  4      0.2
  5      0.3
  6      0.1

Expected Value = (0.1)(1) + (0.1)(2) + (0.2)(3) + (0.2)(4) + (0.3)(5) + (0.1)(6)
               = 0.1 + 0.2 + 0.6 + 0.8 + 1.5 + 0.6 = 3.8
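The same computation in code:

```python
# Expected value of a discrete distribution: sum of value * probability.
def expected_value(dist):
    """dist: list of (value, probability) pairs."""
    return sum(value * prob for value, prob in dist)

# The unfair die from the table above.
die = [(1, 0.1), (2, 0.1), (3, 0.2), (4, 0.2), (5, 0.3), (6, 0.1)]
print(expected_value(die))  # ≈ 3.8 (up to floating-point rounding)
```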

2 player zero-sum finite stochastic games of perfect information

[Figure: game tree. A MAX node chooses between moves A and B, each leading to a chance node, with MIN nodes below. Under move A: p=0.1 gives -50, p=0.9 gives +10. Under move B: p=0.5 gives +10, p=0.5 gives -12.]

  • Need to calculate expected values for chance nodes
  • Calculate the expectiminimax value instead of the minimax value


2 player zero-sum finite stochastic games of perfect information

[Figure: the same tree; the chance node under move B is evaluated: (0.5)(10) + (0.5)(-12) = -1]

2 player zero-sum finite stochastic games of perfect information

[Figure: the same tree; the chance node under move A is evaluated: (0.1)(-50) + (0.9)(10) = 4]


2 player zero-sum finite stochastic games of perfect information

[Figure: the same tree, fully evaluated; MAX chooses move A with expectiminimax value max(4, -1) = 4]

Expectiminimax

EXPECTIMINIMAX(n) =

  UTILITY(n)                                            if n is a terminal state
  max over s in Successors(n) of EXPECTIMINIMAX(s)      if n is a MAX node
  min over s in Successors(n) of EXPECTIMINIMAX(s)      if n is a MIN node
  Σ over s in Successors(n) of P(s) · EXPECTIMINIMAX(s) if n is a chance node
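The recursion transcribes directly into code. The dictionary node representation below is an assumption for illustration, built from the chance-node tree in the earlier slides:

```python
# Direct transcription of the EXPECTIMINIMAX recursion.
def expectiminimax(node):
    kind = node["type"]
    if kind == "terminal":
        return node["utility"]                      # UTILITY(n)
    if kind == "max":
        return max(expectiminimax(s) for s in node["successors"])
    if kind == "min":
        return min(expectiminimax(s) for s in node["successors"])
    if kind == "chance":
        # Probability-weighted sum over successors.
        return sum(p * expectiminimax(s) for p, s in node["successors"])
    raise ValueError(f"unknown node type: {kind}")

leaf = lambda u: {"type": "terminal", "utility": u}
# The tree from the slides: MAX over two chance nodes (moves A and B).
tree = {"type": "max", "successors": [
    {"type": "chance", "successors": [(0.1, leaf(-50)), (0.9, leaf(10))]},  # A
    {"type": "chance", "successors": [(0.5, leaf(10)), (0.5, leaf(-12))]},  # B
]}
print(expectiminimax(tree))  # 4.0
```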


Evaluation Functions

[Figure: two game trees, each a MAX node over moves a1 and a2, with chance nodes (p=0.9, p=0.1) over MIN nodes.
Left tree, eval function [1,2,3,4] on the leaves: a1 = (0.9)(2) + (0.1)(3) = 2.1, a2 = (0.9)(1) + (0.1)(4) = 1.3, so MAX picks a1.
Right tree, eval function [1,20,30,400] on the leaves: a1 = (0.9)(20) + (0.1)(30) = 21, a2 = (0.9)(1) + (0.1)(400) = 40.9, so MAX picks a2.]


Order of evaluation values remains the same but their scale differs. This changes the behavior of the program! To preserve the behavior, you need to do a positive linear transformation on the expected utilities of a position.


Complexity of Expectiminimax

  • Minimax – O(b^m)
  • Expectiminimax – O(b^m · n^m)

n = # of possibilities at a chance node (assuming all chance nodes have the same number of possibilities)

Expectiminimax is computationally expensive, so you can’t look ahead too far! The uncertainty due to randomness accounts for the expense.

Alpha-Beta for Games with Chance Nodes

  • Yes, it can be done!
  • But we need to know the bounds on the utility function
  • If we don’t, we can’t know the bound on the expected value of a node


State-of-the-art Game Playing Programs


State of the Art Game Programs

  • Checkers (Samuels, Chinook)
  • Othello (Logistello)
  • Backgammon (Tesauro’s TD-gammon)
  • Go (Goemate, Go4++)
  • Bridge (Bridge Baron, GIB)
  • Chess

Chess

  • Deep Blue – Campbell, Hsu, Hoane
  • 1997 – Deep Blue defeats Garry Kasparov in a 6-game exhibition match

Chess

  • Deep Blue hardware:
    – Parallel computer with 30 IBM RS/6000 processors running the software search
    – 480 custom VLSI chess processors that performed:
      • Move generation (and move ordering)
      • Hardware search for the last few levels of the tree
      • Evaluation of leaf nodes

Chess

  • Algorithm:
    – Iterative-deepening alpha-beta search with a transposition table
    – Key to success: generating extensions beyond the depth limit for sufficiently interesting lines of forcing/forced moves
    – Reaches depth 14 routinely, depth 40 in some cases
    – Evaluation function:
      • Had over 8000 features
      • Used an opening book of about 4000 positions
      • Database of 700,000 grandmaster games
      • Large endgame database of solved positions (all positions with 5 pieces, many with 6 pieces remaining)

Chess

  • So was it the hardware or software that made the difference?
    – Campbell et al. say the search extensions and evaluation function were critical
    – But recent algorithmic improvements allow programs running on standard PCs to beat opponents running on massively parallel machines


What you should know

  • What evaluation functions are
  • Problems with them, like quiescence and the horizon effect
  • How to calculate the expectiminimax value of a node