Adversarial Search and Game Playing
Russell and Norvig, Chapter 5 (PowerPoint presentation)


SLIDE 1

Adversarial Search and Game Playing

Russell and Norvig, Chapter 5

http://xkcd.com/601/

SLIDE 2

Games

- Games: multi-agent environment
  - What do other agents do, and how do they affect our success?
  - Cooperative vs. competitive multi-agent environments.
  - Competitive multi-agent environments give rise to adversarial search, a.k.a. games.
- Why study games?
  - Fun!
  - They are hard.
  - Easy to represent, and agents are restricted to a small number of actions… sometimes!

SLIDE 3

Relation of Games to Search

- Search - no adversary
  - Solution is a (heuristic) method for finding a goal.
  - Heuristics and CSP techniques can find the optimal solution.
  - Evaluation function: estimate of cost from start to goal through a given node.
  - Examples: path planning, scheduling activities.
- Games - adversary
  - Solution is a strategy (a strategy specifies a move for every possible opponent reply).
  - Time limits force approximate solutions.
  - Examples: chess, checkers, Othello, backgammon.

SLIDE 4

Types of Games

Our focus: deterministic, turn-taking, two-player, zero-sum games of perfect information.

                        Deterministic                    Chance
Perfect information     chess, go, checkers, othello     backgammon
Imperfect information   bridge, hearts                   poker, canasta, scrabble

- zero-sum game: a participant's gain (or loss) is exactly balanced by the losses (or gains) of the other participant.
- perfect information: fully observable.

SLIDE 5

Partial Game Tree for Tic-Tac-Toe

SLIDE 6

http://xkcd.com/832/

SLIDE 7

The Tic-Tac-Toe search space

- Is this search space a tree or a graph?
- What is the minimum search depth?
- What is the maximum search depth?
- What is the branching factor?
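One way to check these answers is brute-force enumeration. The sketch below is my addition, not from the slides: it plays out every game and reports the shallowest and deepest terminal states.

```python
# A brute-force check of the questions above (illustrative sketch).
# Boards are 9-character strings; X moves first.
def moves(board):
    return [i for i in range(9) if board[i] == " "]

def winner(board):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def depth_range(board=" " * 9, player="X", depth=0):
    """Min and max depth (in plies) at which a game from here can end."""
    if winner(board) or not moves(board):
        return depth, depth
    lo, hi = 9, 0
    for m in moves(board):
        child = board[:m] + player + board[m + 1:]
        nxt = "O" if player == "X" else "X"
        clo, chi = depth_range(child, nxt, depth + 1)
        lo, hi = min(lo, clo), max(hi, chi)
    return lo, hi

print(depth_range())  # (5, 9): earliest win at ply 5, longest game 9 plies
```

The space is really a graph, not a tree: the same position is reachable by different move orders. The branching factor starts at 9 and shrinks by one each ply.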

SLIDE 8

Game setup

- Two players: MAX and MIN.
- MAX moves first, and they take turns until the game is over.
- Games as search:
  - initial state: e.g., the starting board configuration
  - PLAYER(s): which player has the move in state s
  - ACTIONS(s): the set of legal moves in state s
  - RESULT(s, a): the state resulting from taking move a in state s
  - TERMINAL-TEST(s): is the game over? (terminal states)
  - UTILITY(s, p): value of terminal state s to player p, e.g., win (+1), lose (-1), and draw (0) in chess.
- Players use the search tree to determine their next move.

SLIDE 9

Optimal strategies

- Find the best strategy for MAX assuming an infallible MIN opponent.
- Assumption: both players play optimally.
- Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

MINIMAX(s) =
  UTILITY(s)                                      if s is a terminal state
  max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))      if PLAYER(s) = MAX
  min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))      if PLAYER(s) = MIN

SLIDE 10

Two-ply game tree

Definition: a ply is one player's turn (a half-move); this tree is two plies deep.

[Figure: MAX root A with moves a1, a2, a3 to MIN nodes B, C, D; MIN replies b1-b3, c1-c3, d1-d3 lead to leaves with utilities 3, 12, 8 under B; 2, 4, 6 under C; 14, 5, 2 under D.]

SLIDE 11

Two-ply game tree

The minimax value at a MIN node is the minimum of the backed-up values, because your opponent will do what's best for them (and worst for you).

[Figure: the same game tree; MIN nodes B, C, D back up the values 3, 2, 2 from their leaves.]

SLIDE 12

Two-ply game tree

The minimax decision: minimax maximizes the worst-case outcome for MAX.

[Figure: the same game tree; MAX picks move a1, whose backed-up value 3 is the largest of 3, 2, 2.]

SLIDE 13

The minimax algorithm

function MINIMAX-DECISION(state) returns an action
  return arg max_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v
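The pseudocode translates almost line for line into Python. In this sketch (a simplification, not the book's state/ACTIONS interface), a game tree is nested lists, players alternate by level, and terminal utilities are plain integers; the test tree is the two-ply example from these slides.

```python
# Minimax on an explicit game tree: nested lists are internal nodes,
# integers are terminal utilities. MAX and MIN alternate by depth.
def minimax(node, maximizing):
    if isinstance(node, int):                  # terminal state: UTILITY(s)
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def minimax_decision(children):
    """Index of MAX's best root move (arg max over backed-up MIN values)."""
    values = [minimax(child, False) for child in children]
    return values.index(max(values))

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]     # MIN nodes B, C, D
print(minimax(tree, True))                     # 3
print(minimax_decision(tree))                  # 0, i.e., move a1
```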

SLIDE 14

Properties of minimax

- Minimax explores the tree using DFS.
- Therefore:
  - Time complexity: O(b^m) (bad)
  - Space complexity: O(bm) (good)

SLIDE 15

The problem with minimax search

- The number of game states is exponential in the number of moves.
  - Solution: do not examine every node.
  - Alpha-beta pruning: remove branches that do not influence the final decision.
- General idea: you can bracket the highest/lowest value at a node even before all its successors have been evaluated.

SLIDE 16

Pruning

minimax(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
              = max(3, min(2, x, y), 2)
              = max(3, z, 2)   where z = min(2, x, y) ≤ 2
              = 3

Because z ≤ 2 < 3 regardless of x and y, those two leaves never need to be evaluated.

[Figure: the same game tree; the second and third leaves under C are the unevaluated x and y.]

SLIDE 17

Alpha-Beta Example

[-∞, +∞] [-∞,+∞]

Range of possible values

SLIDE 18

Alpha-Beta Example (continued)

[-∞,3] [-∞,+∞]

SLIDE 19

Alpha-Beta Example (continued)

[-∞,3] [-∞,+∞]

SLIDE 20

Alpha-Beta Example (continued)

[3,+∞] [3,3]

SLIDE 21

Alpha-Beta Example (continued)

[-∞,2] [3,+∞] [3,3]

This node is worse for MAX

SLIDE 22

Alpha-Beta Example (continued)

[-∞,2] [3,14] [3,3] [-∞,14]


SLIDE 23

Alpha-Beta Example (continued)

[-∞,2] [3,5] [3,3] [-∞,5]


SLIDE 24

Alpha-Beta Example (continued)

[2,2] [-∞,2] [3,3] [3,3]

SLIDE 25

Alpha-Beta Example (continued)

[2,2] [-∞,2] [3,3] [3,3]

SLIDE 26

Alpha-Beta Pruning

- α: the best (i.e., highest) value for MAX along the path from the root.
- β: the best (i.e., lowest) value for MIN along the path from the root.
- Initially, (α, β) = (-∞, +∞).

SLIDE 27

Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

SLIDE 28

Alpha-Beta Algorithm

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
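For concreteness, here is the same pair of value functions in Python, a sketch using the nested-list toy tree from earlier rather than the state/ACTIONS interface. A counter of evaluated leaves shows that pruning really skips work on the two-ply example tree.

```python
import math

# Alpha-beta on an explicit game tree; `seen` records every leaf
# evaluation so we can count how many leaves pruning avoids.
def alphabeta(node, alpha, beta, maximizing, seen):
    if isinstance(node, int):
        seen.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, seen))
            if v >= beta:
                return v           # beta cutoff
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, seen))
        if v <= alpha:
            return v               # alpha cutoff
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, -math.inf, math.inf, True, seen))  # 3
print(len(seen))  # 7 of the 9 leaves: the 4 and 6 under C were pruned
```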

SLIDE 29

Alpha-beta pruning

- When enough is known about a node n, it can be pruned.

SLIDE 30

Final Comments about Alpha-Beta Pruning

- Pruning does not affect the final result.
- Entire subtrees can be pruned, not just leaves.
- Good move ordering improves the effectiveness of pruning.
- With "perfect ordering," time complexity is O(b^(m/2)):
  - effective branching factor of sqrt(b);
  - consequence: alpha-beta pruning can look twice as deep as minimax in the same amount of time.
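A rough sense of what "twice as deep" means, with an assumed chess-like branching factor of 35 (the numbers here are illustrative, not from the slides):

```python
import math

# With a fixed budget of node evaluations, minimax reaches depth
# log_b(budget); perfectly ordered alpha-beta needs only O(b^(m/2))
# nodes for depth m, so the same budget buys about twice the depth.
b = 35                      # assumed chess-like branching factor
budget = b ** 8             # node budget that lets minimax reach 8 plies
minimax_depth = round(math.log(budget, b))
alphabeta_depth = 2 * minimax_depth
print(minimax_depth, alphabeta_depth)  # 8 16
```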

SLIDE 31

Is this practical?

- Minimax and alpha-beta pruning still have exponential complexity.
- They may be impractical within a reasonable amount of time.
- Shannon (1950):
  - Terminate the search at a lower depth.
  - Apply a heuristic evaluation function EVAL instead of the UTILITY function.

SLIDE 32

Cutting off search

- Change:
  - if TERMINAL-TEST(state) then return UTILITY(state)
  into
  - if CUTOFF-TEST(state, depth) then return EVAL(state)
- This introduces a fixed depth limit:
  - selected so that the amount of time used will not exceed what the rules of the game allow.
- When the cutoff occurs, the evaluation is performed.
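As a sketch of this change, using the nested-list toy tree from earlier (the EVAL here is purely illustrative):

```python
# Depth-limited minimax: the terminal test is replaced by a cutoff test,
# and a heuristic EVAL is applied at the frontier. The EVAL below is a
# crude stand-in (mean of a node's leaf children, which only works when
# the cutoff falls one ply above the leaves); real evaluation functions
# estimate the expected utility of a position.
def h_minimax(node, depth, limit, maximizing, evalfn):
    if isinstance(node, int):              # true terminal state
        return node
    if depth >= limit:                     # CUTOFF-TEST(state, depth)
        return evalfn(node)
    values = [h_minimax(c, depth + 1, limit, not maximizing, evalfn)
              for c in node]
    return max(values) if maximizing else min(values)

def avg_of_leaves(node):                   # toy EVAL for this toy tree
    return sum(node) / len(node)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, 0, 1, True, avg_of_leaves))  # cutoff at depth 1
```

With the cutoff at depth 1 the root's estimate is max(23/3, 4, 7) ≈ 7.67, and MAX still prefers move a1; a different EVAL could of course change the decision.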

SLIDE 33

Heuristic EVAL

- Idea: produce an estimate of the expected utility of the game from a given position.
- Performance depends on the quality of EVAL.
- Requirements:
  - EVAL should order terminal nodes in the same way as UTILITY.
  - It should be fast to compute.
  - For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.

SLIDE 34

Heuristic EVAL example

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

In chess: w1·material + w2·mobility + w3·king safety + w4·center control + …
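A weighted linear evaluation like the one above is just a dot product of features and weights. The chess feature values and weights below are made-up numbers for illustration, not tuned values from any real engine.

```python
# Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s), as a dot product.
def linear_eval(features, weights):
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical chess features from MAX's point of view:
# material balance (pawn units), mobility difference, king safety,
# center control. All numbers are illustrative.
features = [2.0, 5.0, -1.0, 3.0]
weights  = [1.0, 0.1, 0.5, 0.2]
print(linear_eval(features, weights))  # 2.0 + 0.5 - 0.5 + 0.6 ≈ 2.6
```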

SLIDE 35

How good are computers…

- Let's look at the state-of-the-art computer programs that play games such as chess, checkers, Othello, and Go.

SLIDE 36

Checkers

- Chinook: the first program to win a world champion title in competition against a human (1994).

SLIDE 37

Chinook

- Components of Chinook:
  - Search (a variant of alpha-beta). The search space has 10^20 states.
  - Evaluation function.
  - Endgame database (for all states with 4 vs. 4 pieces; roughly 444 billion positions).
  - Opening book: a database of opening moves.
- Chinook can determine the final result of the game within the first 10 moves.
- 2007: Checkers is solved. Perfect play leads to a draw.

Jonathan Schaeffer, Neil Burch, Yngvi Bjornsson, Akihiro Kishimoto, Martin Muller, Rob Lake, Paul Lu, and Steve Sutphen. "Checkers Is Solved," Science, 2007. http://www.cs.ualberta.ca/~chinook/publications/solving_checkers.html

SLIDE 38

Chess

- 1997: Deep Blue wins a 6-game match against Garry Kasparov.
- It searched using iterative-deepening alpha-beta; its evaluation function had over 8,000 features; it used an opening book of 4,000 positions and an endgame database.
- FRITZ plays world champion Vladimir Kramnik and wins the 6-game match (2006).

SLIDE 39

Othello

- The best Othello computer programs can easily defeat the best humans (e.g., Logistello, 1997).

SLIDE 40

Go

- Go: humans still much better! (circa 2014)

SLIDE 41

And then came AlphaGo

- AlphaGo: Google's DeepMind created a program that was able to beat top human players.

SLIDE 42

And then came AlphaGo

- AlphaGo: Google's DeepMind created a program that was able to beat top human players.
- It uses a combination of methods: reinforcement learning, deep convolutional networks, and Monte Carlo tree search.

SLIDE 43

AlphaGo Zero

- AlphaGo Zero was trained from scratch just by playing against itself.

ARTICLE doi:10.1038/nature24270
"Mastering the game of Go without human knowledge"
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis

SLIDE 44

Games that include chance

- Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16), and (5-11, 11-16).

SLIDE 45

Games that include chance

- Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16), and (5-11, 11-16).
- Doubles [1,1], …, [6,6] each have probability 1/36; all other rolls have probability 1/18.
- We cannot calculate a definite minimax value, only an expected value.

(chance nodes)

SLIDE 46

Expected minimax value

EXPECTIMINIMAX(s) =
  UTILITY(s)                                  if s is a terminal state
  max_a EXPECTIMINIMAX(RESULT(s, a))          if PLAYER(s) = MAX
  min_a EXPECTIMINIMAX(RESULT(s, a))          if PLAYER(s) = MIN
  Σ_r P(r) · EXPECTIMINIMAX(RESULT(s, r))     if PLAYER(s) = CHANCE

where r is a chance event (e.g., a roll of the dice). These values can be computed recursively, in a similar way to the MINIMAX algorithm.
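The four cases can be implemented directly. The tiny tree below is my own illustration (MAX choosing between a sure payoff and a fair coin flip), not a backgammon position.

```python
# EXPECTIMINIMAX over a toy tree. Internal nodes are (kind, children)
# tuples; a chance node's children are (probability, subtree) pairs;
# plain numbers are terminal utilities.
def expectiminimax(node):
    if isinstance(node, (int, float)):           # terminal: UTILITY(s)
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                         # expected value over events
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(kind)

# MAX chooses between a certain 3 and a fair coin flip over {0, 10}.
tree = ("max", [3, ("chance", [(0.5, 0), (0.5, 10)])])
print(expectiminimax(tree))  # 5.0: the gamble's expected value beats 3
```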

SLIDE 47

TD-Gammon (Tesauro, 1994)

World-class program based on a combination of reinforcement learning, neural networks, and alpha-beta pruning to 3 plies. Move analyses by TD-Gammon have led to some changes in accepted strategies.

[Figure: board position; White's turn, with a roll of 4-4.]

http://www.research.ibm.com/massive/tdl.html

SLIDE 48

Summary

- Games are fun.
- They can be played very well by computers.
- They illustrate important points about AI:
  - Perfection is (usually) unattainable → approximation.