

SLIDE 1

Game playing

Chapter 5

Chapter 5 1

SLIDE 2

Outline

♦ Games
♦ Perfect play
  – minimax decisions
  – α–β pruning
♦ Resource limits and approximate evaluation
♦ Games of chance
♦ Games of imperfect information

SLIDE 3

Games

Reminder: Multi-agent environment is an environment in which each agent needs to consider the actions of other agents and how they affect its own welfare. In AI, the most common games are of a rather specialized kind: deterministic, turn-taking, two-player, zero-sum games of perfect information. For example, if one player wins a game of chess, the other player necessarily loses. Why games? The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by precise rules. With the exception of robot soccer, physical games have not attracted much interest in the AI community.

SLIDE 4

Games formulation

We first consider games with two players, whom we call MAX and MIN. MAX moves first, and then they take turns moving until the game is over. A game can be formally defined as a kind of search problem with the following elements:
♦ S0: The initial state, which specifies how the game is set up at the start.
♦ PLAYER(s): Defines which player has the move in a state.
♦ ACTIONS(s): Returns the set of legal moves in a state.
♦ RESULT(s, a): The transition model, which defines the result of a move.

SLIDE 5

Games formulation

♦ TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
♦ UTILITY(s, p): A utility function (also called an objective function), which defines the final numeric value for a game that ends in terminal state s for a player p.
→ In chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2.
Zero-sum game?? Constant-sum would have been a better term, but zero-sum is traditional.

SLIDE 6

Game tree

A tree where the nodes are game states and the edges are moves.

[Figure: partial game tree for tic-tac-toe. Levels alternate MAX (X) and MIN (O) from the empty board down to TERMINAL states, whose utilities are −1, 0, or +1.]

SLIDE 7

Types of games

                         deterministic                     chance
perfect information      chess, checkers, go, othello      backgammon, monopoly
imperfect information    battleships, blind tictactoe      bridge, poker

SLIDE 8

Minimax

Perfect play for deterministic, perfect-information games Idea: choose move to position with highest minimax value = best achievable payoff against best play E.g., 2-ply game:

[Figure: two-ply game tree. From the MAX root, actions A1, A2, A3 lead to MIN nodes with minimax values 3, 2, 2, whose leaves have utilities (3, 12, 8), (6, 4, 2), (14, 5, 2); the root's minimax value is 3.]

SLIDE 9

Minimax algorithm

function Minimax-Decision(state) returns an action
   inputs: state, current state in game
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do v ← Max(v, Min-Value(s))
   return v

function Min-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← +∞
   for a, s in Successors(state) do v ← Min(v, Max-Value(s))
   return v
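The pseudocode above can be rendered as a short Python sketch. The `game` object, with `actions`, `result`, `terminal_test`, and `utility` methods, is an assumed interface for illustration, not something defined in the slides:

```python
def minimax_decision(state, game):
    """Return the action for MAX with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    """Backed-up value at a MAX node."""
    if game.terminal_test(state):
        return game.utility(state)
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    """Backed-up value at a MIN node."""
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))
```

On the two-ply example tree, `max_value` at the root returns 3 and `minimax_decision` picks the leftmost action.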

SLIDE 10

Properties of minimax

Complete?? Yes, if tree is finite (chess has specific rules for this)
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)

For chess, b ≈ 35, m ≈ 100 for “reasonable” games
⇒ exact solution completely infeasible

But do we need to explore every path?

SLIDE 11

Optimal decisions in multiplayer games

Many popular games allow more than two players. How do we extend the concepts of the minimax algorithm to those? The single value for each node is replaced with a vector of values. For example, in a three-player game with players A, B, and C, a vector ⟨vA, vB, vC⟩ is associated with each node. For terminal states: design a utility function that returns a vector of values. For non-terminal states: how do we compute the value of each parent node from the values of its children?
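The vector backup just described (often called max-n) can be sketched as follows. The node encoding, a `(player, children)` pair for internal nodes and a plain utility tuple for leaves, is invented for illustration:

```python
def maxn(node):
    """Multiplayer backup: the player to move picks the child whose
    backed-up utility vector maximizes that player's own component."""
    if isinstance(node, tuple) and all(isinstance(v, (int, float)) for v in node):
        return node                      # terminal state: utility vector
    player, children = node              # internal node: who moves, and options
    return max((maxn(c) for c in children), key=lambda vec: vec[player])
```

For instance, at a node where player C (index 2) chooses between terminal vectors (1, 2, 6) and (4, 2, 3), the backup is (1, 2, 6), since 6 > 3 in C's component.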

SLIDE 12

Optimal decisions in multiplayer games

[Figure: three-player game tree with move layers A, B, C, A. Each node carries a utility vector such as (1, 2, 6); each player backs up the child vector that maximizes its own component, and the root's backed-up value is (1, 2, 6).]

SLIDE 13

α–β pruning

The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree. The α–β pruning technique cannot eliminate the exponent, but it can effectively cut it in half.

SLIDE 14

α–β pruning example

[Figure: step 1. The left MIN node evaluates leaves 3, 12, 8 and gets value 3, so the MAX root's value is at least 3.]

SLIDE 15

α–β pruning example

[Figure: step 2. The middle MIN node's first leaf is 2 ≤ 3, so it cannot help MAX; its remaining successors are pruned (X X).]

SLIDE 16

α–β pruning example

[Figure: step 3. The right MIN node's first leaf is 14, so its value is at most 14.]

SLIDE 17

α–β pruning example

[Figure: step 4. The next leaf, 5, tightens the right MIN node's bound to at most 5.]

SLIDE 18

α–β pruning example

[Figure: step 5. The final leaf, 2, gives the right MIN node value 2; the MAX root's minimax value is 3.]

SLIDE 19

The α–β algorithm
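The algorithm figure is not reproduced here. As a sketch, α–β search in Python (same assumed `game` interface as the minimax sketch; an illustration, not the book's exact pseudocode):

```python
def alphabeta_max(state, game, alpha=float('-inf'), beta=float('inf')):
    """Value of a MAX node, pruning branches MIN would never allow."""
    if game.terminal_test(state):
        return game.utility(state)
    v = float('-inf')
    for a in game.actions(state):
        v = max(v, alphabeta_min(game.result(state, a), game, alpha, beta))
        if v >= beta:            # MIN above already has a better option
            return v             # prune remaining successors
        alpha = max(alpha, v)
    return v

def alphabeta_min(state, game, alpha=float('-inf'), beta=float('inf')):
    """Value of a MIN node, pruning branches MAX would never allow."""
    if game.terminal_test(state):
        return game.utility(state)
    v = float('inf')
    for a in game.actions(state):
        v = min(v, alphabeta_max(game.result(state, a), game, alpha, beta))
        if v <= alpha:           # MAX above already has a better option
            return v
        beta = min(beta, v)
    return v
```

It returns the same value as plain minimax but skips subtrees, such as the middle MIN node's remaining leaves in the example, that cannot change the decision.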

SLIDE 20

α–β pruning

[Figure: a MIN node with value V deep in the tree, below alternating MAX and MIN levels.]

α is the best value (to max) found so far off the current path If V is worse than α, max will avoid it ⇒ prune that branch Define β similarly for min

SLIDE 21

Properties of α–β

The effectiveness of α–β pruning is highly dependent on the order in which the states are examined.

[Figure: the completed α–β example from the previous slides.]

This suggests that it might be worthwhile to try to examine first the successors that are likely to be best. (Obviously, this cannot be done perfectly; otherwise the ordering function could be used to play a perfect game!) With “perfect ordering,” time complexity = O(b^(m/2)) ⇒ doubles solvable depth

SLIDE 22

Resource limits

Standard approach:

  • Use Cutoff-Test instead of Terminal-Test

e.g., depth limit

  • Use Eval instead of Utility

i.e., evaluation function that estimates desirability of position

Suppose we have 100 seconds, explore 10^4 nodes/second
⇒ 10^6 nodes per move ≈ 35^(8/2)
⇒ α–β reaches depth 8 ⇒ pretty good chess program

SLIDE 23

Evaluation functions

An evaluation function returns an estimate of the expected utility of the game from a given position. The performance of a game-playing program depends strongly on the quality of its evaluation function.

What are the properties of a good evaluation function?
(1) The evaluation function should order the terminal states in the same way as the true utility function.
(2) The computation must not take too long!
(3) For non-terminal states, the evaluation function should be strongly correlated with the actual chances of winning.

SLIDE 24

Evaluation functions

[Figure: two chess positions. In one, Black is to move and White is slightly better; in the other, White is to move and Black is winning.]

For chess, typically linear weighted sum of features Eval(s) = w1f1(s) + w2f2(s) + . . . + wnfn(s) e.g., w1 = 9 with f1(s) = (number of white queens) – (number of black queens), etc.
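A material-only sketch of such an Eval for chess follows. The queen weight w1 = 9 comes from the slide's example; the remaining weights are conventional piece values, and the `counts` representation is invented for illustration:

```python
# Hypothetical linear evaluation: Eval(s) = sum_i w_i * f_i(s), where each
# feature f_i(s) is (White's count of a piece kind) - (Black's count).
WEIGHTS = {'Q': 9, 'R': 5, 'B': 3, 'N': 3, 'P': 1}

def material_eval(counts):
    """counts maps a piece letter to White's count minus Black's count."""
    return sum(w * counts.get(piece, 0) for piece, w in WEIGHTS.items())
```

Being up one queen scores +9; being down two pawns but up a rook scores 5 − 2 = +3.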

SLIDE 25

Cutting off search

[Figure: two chess positions, (a) and (b), both with White to move.]

The evaluation function should be applied only to positions that are quiescent, that is, unlikely to exhibit wild swings in value in the near future. Non-quiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search.
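Combining the two substitutions from the resource-limits slide, Cutoff-Test for Terminal-Test and Eval for Utility, gives depth-limited minimax. A sketch, with the same assumed `game` interface and a caller-supplied `eval_fn`:

```python
def h_minimax(state, game, eval_fn, depth, maximizing=True):
    """Minimax with a depth-limit cutoff and heuristic evaluation."""
    if game.terminal_test(state):
        return game.utility(state)        # true utility when available
    if depth == 0:                        # Cutoff-Test: depth limit reached
        return eval_fn(state)             # Eval estimates desirability
    values = (h_minimax(game.result(state, a), game, eval_fn,
                        depth - 1, not maximizing)
              for a in game.actions(state))
    return max(values) if maximizing else min(values)
```

A quiescence search would simply refuse to apply `eval_fn` at noisy positions, extending the depth there instead.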

SLIDE 26

Some other techniques to improve performance

♦ Using a transposition table: it is worthwhile to store the evaluation of the resulting position in a hash table the first time it is encountered, so that we don't have to recompute it on subsequent occurrences.
♦ Forward pruning: on each turn, consider only a beam of the n best moves (according to the evaluation function) rather than considering all possible moves.
♦ Table lookup: specifically for the opening and ending of games. Use table lookup at the start, then switch to search. Near the end of the game there are again fewer possible positions, and thus more chance to do lookup. In 2016, Bourzutschky solved all pawnless six-piece endgames. There is a KQNKRBN endgame that with best play requires 517 moves until a capture, which then leads to a mate.
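The transposition-table idea can be sketched as memoized minimax, keyed by position (states must be hashable; the `game` interface is the same assumption as in the earlier sketches):

```python
def minimax_tt(state, game, maximizing=True, table=None):
    """Minimax with a transposition table: positions reached by different
    move orders are evaluated once and then looked up."""
    if table is None:
        table = {}
    key = (state, maximizing)
    if key in table:                      # position seen before: reuse value
        return table[key]
    if game.terminal_test(state):
        v = game.utility(state)
    else:
        children = (minimax_tt(game.result(state, a), game,
                               not maximizing, table)
                    for a in game.actions(state))
        v = max(children) if maximizing else min(children)
    table[key] = v                        # cache for later transpositions
    return v
```

In a real engine the key would be a compact position hash (e.g. Zobrist hashing) rather than the raw state.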

SLIDE 27

Digression: Exact values don’t matter

[Figure: a MIN/MAX tree and the same tree after a monotonic transformation of its leaf values; MAX's chosen move is unchanged.]

Behaviour is preserved under any monotonic transformation of Eval. Only the order matters...
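A quick check of this claim, under an assumed monotonic transform (cubing, which preserves order for these values):

```python
def minimax_choice(leaf_groups):
    """Each group is one MIN node's leaves; MAX picks the group whose
    minimum is largest. Only the ordering of leaf values matters."""
    mins = [min(g) for g in leaf_groups]
    return mins.index(max(mins))

tree = [[3, 12, 8], [6, 4, 2], [14, 5, 2]]
transformed = [[x ** 3 for x in g] for g in tree]   # monotonic transform
```

Both `tree` and `transformed` lead MAX to the same move, even though the values themselves change wildly.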

SLIDE 28

Nondeterministic games: backgammon

[Figure: a backgammon board with points numbered 1–24 and the bar (25).]

SLIDE 29

Nondeterministic games in general

In nondeterministic games, chance is introduced by dice or card-shuffling. Simplified example with coin-flipping:

[Figure: a game tree with MAX, CHANCE, and MIN levels. Each coin flip has probability 0.5, and each chance node's value is the probability-weighted average of its children.]

SLIDE 30

Algorithm for nondeterministic games

Expectiminimax gives perfect play. Just like Minimax, except we must also handle chance nodes:

. . .
if state is a Max node then
   return the highest ExpectiMinimax-Value of Successors(state)
if state is a Min node then
   return the lowest ExpectiMinimax-Value of Successors(state)
if state is a chance node then
   return average of ExpectiMinimax-Value of Successors(state)
. . .
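The chance-node case rendered as Python over a tiny explicit tree. The node encoding, `('max', children)`, `('min', children)`, `('chance', [(prob, child), ...])`, or a numeric leaf, is invented for illustration:

```python
def expectiminimax(node):
    """Backed-up value of a node in a max/min/chance tree."""
    if isinstance(node, (int, float)):    # terminal: utility value
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    if kind == 'chance':                  # probability-weighted average
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(kind)
```

For example, a fair coin flip between outcomes 4 and 2 is worth their average, 3.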

SLIDE 31

Nondeterministic games in practice

Time complexity: O(b^m n^m), where n is the number of distinct rolls.
Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves (can be 6,000 with a 1-1 roll)
depth 4 = 20 × (21 × 20)^3 ≈ 1.5 × 10^9
As depth increases, probability of reaching a given node shrinks
⇒ value of lookahead is diminished
α–β pruning is much less effective
TD-Gammon uses depth-2 search + very good Eval ≈ world-champion level

SLIDE 32

Digression: Exact values DO matter

DICE MIN MAX

[Figure: two DICE/MIN/MAX trees with roll probabilities .9 and .1. With MIN values (2, 3, 1, 4) the chance nodes are worth 2.1 and 1.3; after transforming the values to (20, 30, 1, 400), which preserves their order, the chance nodes become 21 and 40.9, reversing MAX's choice.]

Behaviour is preserved only by positive linear transformation of Eval
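A quick check with the figure's numbers: because chance nodes average values, an order-preserving but non-linear transform (1→1, 2→20, 3→30, 4→400) flips the comparison:

```python
def chance_value(leaves, probs=(0.9, 0.1)):
    """Expected value of a chance node over its children's values."""
    return sum(p * v for p, v in zip(probs, leaves))

# Original MIN values: left option beats right (2.1 vs 1.3).
left, right = chance_value([2, 3]), chance_value([1, 4])
# After the monotonic (non-linear) transform: right wins (21 vs 40.9).
tleft, tright = chance_value([20, 30]), chance_value([1, 400])
```

Only positive linear transforms of Eval commute with the averaging at chance nodes, which is why exact magnitudes matter here.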

SLIDE 33

Pruning in nondeterministic game trees

A version of α-β pruning is possible:

[Figure: step 1. All chance and MIN nodes start with value bounds [−∞, +∞].]

SLIDE 34

Pruning in nondeterministic game trees

A version of α-β pruning is possible:

[Figure: step 2. The first leaf, 2, bounds the leftmost MIN node to [−∞, 2].]

SLIDE 35

Pruning in nondeterministic game trees

A version of α-β pruning is possible:

[Figure: step 3. The second leaf, 2, fixes the leftmost MIN node at [2, 2].]

SLIDE 36

Pruning in nondeterministic game trees

A version of α-β pruning is possible:

[Figure: step 4. The left chance node's bounds tighten to [−∞, 2] using the known MIN value 2.]

SLIDE 37

Pruning in nondeterministic game trees

A version of α-β pruning is possible:

[Figure: step 5. The next MIN node evaluates to [1, 1], so the left chance node's value is [1.5, 1.5].]

SLIDE 38

Pruning in nondeterministic game trees

A version of α-β pruning is possible:

[Figure: step 6. The first leaf of the right subtree bounds its first MIN node to [−∞, 0].]

SLIDE 39

Pruning in nondeterministic game trees

A version of α-β pruning is possible:

[Figure: step 7. That MIN node is fixed at [0, 0].]

SLIDE 40

Pruning in nondeterministic game trees

A version of α-β pruning is possible:

[Figure: step 8. The right chance node is bounded to [−∞, 0.5], below the left branch's 1.5, so the rest of the right subtree is pruned.]

SLIDE 41

Pruning contd.

More pruning occurs if we can bound the leaf values

[Figure: all leaf values are known to lie in [−2, 2], so every unexplored node starts with bounds [−2, 2] instead of [−∞, +∞].]

SLIDE 42

Pruning contd.

More pruning occurs if we can bound the leaf values

[Figure: the first leaf of the left subtree, 2, is examined.]

SLIDE 43

Pruning contd.

More pruning occurs if we can bound the leaf values

[Figure: the first MIN node is fixed at [2, 2], and the left chance node's bounds tighten to [0, 2].]

SLIDE 44

Pruning contd.

More pruning occurs if we can bound the leaf values

[Figure: the next leaf, 2, is examined.]

SLIDE 45

Pruning contd.

More pruning occurs if we can bound the leaf values

[Figure: the second MIN node evaluates to [1, 1], so the left chance node's value is [1.5, 1.5].]

SLIDE 46

Pruning contd.

More pruning occurs if we can bound the leaf values

[Figure: with bounded leaves, the right chance node's bounds, at best [−2, 1], cannot reach the left branch's 1.5, so it is pruned after fewer leaf evaluations.]

SLIDE 47

Partially observable games

Kriegspiel: a partially observable variant of chess in which pieces can move but are completely invisible to the opponent. White and Black each see a board containing only their own pieces. A referee, who can see all the pieces, adjudicates the game and periodically makes announcements that are heard by both players (e.g., that an attempted move is illegal, that a capture has occurred, that a player is in check and from which direction, or that the game ends in checkmate).

SLIDE 48

Partially observable games

Recall belief state: the set of all logically possible board states given the complete history of percepts to date. A winning strategy, or guaranteed checkmate, is one that, for each possible percept sequence, leads to an actual checkmate for every possible board state in the current belief state, regardless of how the opponent moves. If a guaranteed checkmate is found, the opponent will lose even if he or she can see all the pieces.

SLIDE 49

Example of guaranteed checkmate

SLIDE 50

Probabilistic checkmate

A probabilistic checkmate is still required to work in every board state in the belief state; it is probabilistic only with respect to randomization of the winning player's moves. To get the basic idea, consider the problem of finding a lone black king using just the white king. Simply by moving randomly, the white king will eventually bump into the black king even if the latter tries to avoid this fate, since Black cannot keep guessing the right evasive moves indefinitely. In the terminology of probability theory, detection occurs with probability 1. Example: the KBNK endgame. What about the KBBK endgame?

SLIDE 51

Summary

Games are fun to work on! (and dangerous)

They illustrate several important points about AI
♦ perfection is unattainable ⇒ must approximate
♦ good idea to think about what to think about
♦ uncertainty constrains the assignment of values to states
♦ optimal decisions depend on information state, not real state

Games are to AI as grand prix racing is to automobile design
