 
Game playing
Chapter 5
Outline
♦ Games
♦ Perfect play
  – minimax decisions
  – α–β pruning
♦ Resource limits and approximate evaluation
♦ Games of chance
♦ Games of imperfect information
Games
Reminder: a multi-agent environment is one in which each agent needs to consider the actions of other agents and how they affect its own welfare.
In AI, the most common games are of a rather specialized kind: deterministic, turn-taking, two-player, zero-sum games of perfect information. For example, if one player wins a game of chess, the other player necessarily loses.
Why games? The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by precise rules.
With the exception of robot soccer, physical games have not attracted much interest in the AI community.
Games formulation
We first consider games with two players, whom we call MAX and MIN. MAX moves first, and then they take turns moving until the game is over.
A game can be formally defined as a kind of search problem with the following elements:
♦ S_0: the initial state, which specifies how the game is set up at the start.
♦ PLAYER(s): defines which player has the move in state s.
♦ ACTIONS(s): returns the set of legal moves in state s.
♦ RESULT(s, a): the transition model, which defines the result of move a in state s.
Games formulation (continued)
♦ TERMINAL-TEST(s): a terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
♦ UTILITY(s, p): a utility function (also called an objective function), which defines the final numeric value for a game that ends in terminal state s for player p.
→ In chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2.
Zero-sum game?? The total payoff to all players is the same for every instance of the game: in chess it is 0 + 1, 1 + 0, or 1/2 + 1/2. Constant-sum would have been a better term, but zero-sum is traditional.
Game tree
A tree where the nodes are game states and the edges are moves.
[Figure: partial game tree for tic-tac-toe. Levels alternate between MAX (X) and MIN (O), from the empty board down to TERMINAL states with utility −1, 0, or +1.]
Types of games

                      | deterministic                | chance
perfect information   | chess, checkers, go, othello | backgammon, monopoly
imperfect information | battleships, blind tictactoe | bridge, poker
Minimax
Perfect play for deterministic, perfect-information games.
Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play.
E.g., 2-ply game:
[Figure: 2-ply game tree. The MAX root has value 3 and moves A1, A2, A3 leading to MIN nodes with values 3, 2, 2. Leaf utilities: A11 = 3, A12 = 12, A13 = 8; A21 = 2, A22 = 4, A23 = 6; A31 = 14, A32 = 5, A33 = 2.]
Minimax algorithm

function Minimax-Decision(state) returns an action
    inputs: state, current state in game
    return the a in Actions(state) maximizing Min-Value(Result(state, a))

function Max-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v ← −∞
    for a, s in Successors(state) do v ← Max(v, Min-Value(s))
    return v

function Min-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v ← ∞
    for a, s in Successors(state) do v ← Min(v, Max-Value(s))
    return v
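The pseudocode can be tried out on the 2-ply example with a short Python sketch. The tree encoding (nested dicts keyed by action name, with numbers as terminal utilities) and the helper `is_terminal` are assumptions made for this illustration.

```python
import math

# Hypothetical encoding of the 2-ply example: a leaf is a number (its utility),
# an internal node is a dict mapping action names to child nodes.
TREE = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}

def is_terminal(node):
    """Stand-in for Terminal-Test: leaves are plain numbers."""
    return not isinstance(node, dict)

def max_value(node):
    """Value of a node where MAX is to move."""
    if is_terminal(node):
        return node
    v = -math.inf
    for child in node.values():
        v = max(v, min_value(child))
    return v

def min_value(node):
    """Value of a node where MIN is to move."""
    if is_terminal(node):
        return node
    v = math.inf
    for child in node.values():
        v = min(v, max_value(child))
    return v

def minimax_decision(node):
    """The action at a MAX node whose resulting MIN node has the highest value."""
    return max(node, key=lambda a: min_value(node[a]))
```

On this tree `minimax_decision(TREE)` picks A1, whose MIN node is worth 3, against 2 for A2 and A3.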
Properties of minimax
Complete?? Yes, if the tree is finite (chess has specific rules for this)
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for “reasonable” games
⇒ exact solution completely infeasible
But do we need to explore every path?
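To get a feel for why the exact solution is infeasible, the size of b^m for the chess figures above can be computed directly. This is only back-of-the-envelope arithmetic; the variable names are chosen for illustration.

```python
import math

b, m = 35, 100                      # branching factor and depth quoted for chess
log10_nodes = m * math.log10(b)     # log10 of b**m, roughly 154.4
digits = len(str(b ** m))           # exact digit count using Python big integers
```

So b^m is on the order of 10^154 nodes, a number with 155 decimal digits.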
Optimal decisions in multiplayer games
Many popular games allow more than two players. How can the concepts of the minimax algorithm be extended to them?
The single value for each node is replaced with a vector of values. For example, in a three-player game with players A, B, and C, a vector <v_A, v_B, v_C> is associated with each node.
For terminal states: design a utility function that returns a vector of values.
For non-terminal states: how do we compute the value of each parent node from the values of its children?
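Optimal decisions in multiplayer games
[Figure: three-player game tree. Players A, B, C move in turn; each node is labelled with a vector (v_A, v_B, v_C), and one node is marked X.]
to move
A:  (1, 2, 6)
B:  (1, 2, 6)   (1, 5, 2)
C:  (1, 2, 6)   (6, 1, 2)   (1, 5, 2)   (5, 4, 5)
A:  (1, 2, 6) (4, 2, 3) | (6, 1, 2) (7, 4, 1) | (5, 1, 1) (1, 5, 2) | (7, 7, 1) (5, 4, 5)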
α–β pruning
The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree.
α–β pruning cannot eliminate the exponent, but it can effectively cut it in half.
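α–β pruning example
[Figure sequence: α–β search on the 2-ply example tree, one step per slide; X marks a pruned leaf.]
Step 1: the first MIN node sees leaves 3, 12, 8 and has value 3, so the MAX root is worth at least 3.
Step 2: the second MIN node sees leaf 2, so it is worth at most 2; its two remaining leaves are pruned (X X).
Step 3: the third MIN node sees leaf 14, so it is worth at most 14.
Step 4: the third MIN node sees leaf 5, so it is worth at most 5.
Step 5: the third MIN node sees leaf 2 and has value 2; the MAX root has value 3.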
The α–β algorithm
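A minimal Python sketch of α–β search, assuming a game tree given as nested lists with numbers as terminal utilities. Here `alpha` is the best value found so far for MAX along the current path and `beta` the best found so far for MIN. The optional `visited` list is a hypothetical addition that records which leaves actually get evaluated.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of node, skipping branches that cannot affect the result."""
    if not isinstance(node, list):           # terminal state
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:                    # MIN above will never allow this
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                       # MAX above already has something better
            return v
        beta = min(beta, v)
    return v

# The 2-ply example tree: three MIN nodes under a MAX root.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

Calling `alphabeta(TREE, visited=seen)` with `seen = []` returns 3 and leaves `seen` as `[3, 12, 8, 2, 14, 5, 2]`: the leaves 4 and 6 are never evaluated.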
α–β pruning
[Figure: a path of alternating MAX and MIN nodes leading down to a MIN node with value V.]
α is the best value (to MAX) found so far off the current path.
If V is worse than α, MAX will avoid it ⇒ prune that branch.
Define β similarly for MIN.
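Properties of α–β
The effectiveness of α–β pruning is highly dependent on the order in which the states are examined.
[Figure: the completed α–β example tree, with root value 3 and a third MIN node whose leaves are examined in the order 14, 5, 2.]
In the third MIN node nothing could be pruned, because the leaf 2 that causes the cutoff was examined last; had it been examined first, the other two leaves would have been pruned.
This suggests that it might be worthwhile to try to examine first the successors that are likely to be best. (Obviously, this cannot be done perfectly.)
With “perfect ordering,” time complexity = O(b^(m/2))
⇒ doubles solvable depth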
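The dependence on move ordering can be checked by counting leaf evaluations for different orderings of the same 2-ply example. The sketch below is an illustration under assumptions: the nested-list encoding, the function name `alphabeta_count`, and the three orderings are chosen here and are not taken from the slides.

```python
import math

def alphabeta_count(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Return (value, number of leaves evaluated) for an alpha-beta search."""
    if not isinstance(node, list):           # terminal state
        return node, 1
    best = -math.inf if maximizing else math.inf
    leaves = 0
    for child in node:
        value, n = alphabeta_count(child, alpha, beta, not maximizing)
        leaves += n
        if maximizing:
            best = max(best, value)
            if best >= beta:                 # cutoff: skip the remaining children
                break
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            if best <= alpha:                # cutoff: skip the remaining children
                break
            beta = min(beta, best)
    return best, leaves

as_drawn = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]    # order used in the example
reordered = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]   # third MIN node tries 2 first
best_last = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]   # best MAX move examined last
```

All three orderings give the value 3, but they evaluate 7, 5, and 9 leaves respectively.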
Resource limits
Standard approach:
• Use Cutoff-Test instead of Terminal-Test, e.g., depth limit
• Use Eval instead of Utility, i.e., evaluation function that estimates desirability of position
Suppose we have 100 seconds and explore 10^4 nodes/second
⇒ 10^6 nodes per move ≈ 35^(8/2)
⇒ α–β reaches depth 8 ⇒ pretty good chess program
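The two substitutions can be sketched in Python as a minimax that takes the cutoff test and the evaluation function as parameters. Everything below is a toy stand-in under stated assumptions: states are nested lists, `depth_limit` builds a depth-based Cutoff-Test, and `evaluate` is a deliberately crude Eval (the mean of the leaves below a state), not a recommended heuristic.

```python
def h_minimax(state, depth, maximizing, successors, cutoff_test, evaluate):
    """Minimax with Cutoff-Test in place of Terminal-Test and Eval in place of Utility."""
    if cutoff_test(state, depth):
        return evaluate(state)
    values = [h_minimax(s, depth + 1, not maximizing, successors, cutoff_test, evaluate)
              for s in successors(state)]
    return max(values) if maximizing else min(values)

def successors(state):
    """Toy successor function: the children of a nested-list node."""
    return state

def leaves(state):
    """All terminal values below a state (a terminal state is a plain number)."""
    if not isinstance(state, list):
        return [state]
    return [x for child in state for x in leaves(child)]

def evaluate(state):
    """Crude hypothetical Eval: mean of the leaves below; exact on terminal states."""
    values = leaves(state)
    return sum(values) / len(values)

def depth_limit(limit):
    """Cutoff-Test that stops at terminal states or at the given depth."""
    return lambda state, depth: depth >= limit or not isinstance(state, list)

TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
exact = h_minimax(TREE, 0, True, successors, depth_limit(2), evaluate)    # full depth
shallow = h_minimax(TREE, 0, True, successors, depth_limit(1), evaluate)  # cut at depth 1
```

With the full depth the search returns the true minimax value 3. Cut off at depth 1 it returns the estimate 23/3 for the first MIN node, which shows how much the result depends on the quality of Eval.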
Evaluation functions
An evaluation function returns an estimate of the expected utility of the game from a given position.
The performance of a game-playing program depends strongly on the quality of its evaluation function.
What are the properties of a good evaluation function?
(1) The evaluation function should order the terminal states in the same way as the true utility function.
(2) The computation must not take too long!
(3) For non-terminal states, the evaluation function should be strongly correlated with the actual chances of winning.
Evaluation functions
[Figure: two chess positions. Left: Black to move, White slightly better. Right: White to move, Black winning.]
For chess, typically a linear weighted sum of features:
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
e.g., w_1 = 9 with f_1(s) = (number of white queens) – (number of black queens), etc.
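A weighted sum of this kind is a one-liner in Python. In the sketch below only the queen weight 9 comes from the slide; the other weights are the conventional material values and are an assumption here, as are the function name `material_eval` and the piece-count dictionaries.

```python
# Hypothetical weights w_i: only Q = 9 is given above; the rest are the
# conventional material values, assumed for illustration.
WEIGHTS = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_eval(white_counts, black_counts):
    """Eval(s) = sum of w_i * f_i(s), with f_i = (white count) - (black count) of piece i."""
    return sum(w * (white_counts.get(piece, 0) - black_counts.get(piece, 0))
               for piece, w in WEIGHTS.items())
```

For example, `material_eval({"Q": 1, "P": 6}, {"R": 1, "P": 7})` is 9 − 5 − 1 = 3, a small material edge for White.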
Cutting off search
[Figure: two chess positions, (a) White to move and (b) White to move.]
The evaluation function should be applied only to positions that are quiescent, that is, unlikely to exhibit wild swings in value in the near future.
Non-quiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search.
Some other techniques to improve performance
♦ Using a transposition table: it is worthwhile to store the evaluation of a position in a hash table the first time it is encountered, so that we don't have to recompute it on subsequent occurrences.
♦ Forward pruning: on each turn, consider only a beam of the n best moves (according to the evaluation function) rather than considering all possible moves.
♦ Table lookup: specifically for the opening and ending of games. Use table lookup at first, and then switch to search to continue. Near the end of the game there are again fewer possible positions, and thus more chance to do lookup.
In 2006, Bourzutschky solved all pawnless six-piece endgames; there is a KQNKRBN endgame that with best play requires 517 moves until a capture, which then leads to a mate.
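The transposition-table idea can be sketched in Python with a dict as the hash table. The game used below is a hypothetical toy chosen because different move orders reach the same position: players alternately take 1 or 2 objects from a pile, and whoever takes the last object wins. The function name `solve` and the `stats` counter are also assumptions of this sketch.

```python
def solve(pile, max_to_move, table, stats):
    """Minimax value for MAX (+1 win, -1 loss) in a toy take-1-or-2 game.

    table is a dict used as a transposition table, or None to disable it.
    stats["expanded"] counts the positions that actually had to be computed.
    """
    key = (pile, max_to_move)
    if table is not None and key in table:
        return table[key]                    # seen before: reuse the stored value
    stats["expanded"] += 1
    if pile == 0:
        # The previous player took the last object and therefore won.
        value = -1 if max_to_move else 1
    else:
        children = [solve(pile - k, not max_to_move, table, stats)
                    for k in (1, 2) if k <= pile]
        value = max(children) if max_to_move else min(children)
    if table is not None:
        table[key] = value                   # store it the first time it is encountered
    return value
```

From a pile of 10 with MAX to move, both variants return +1. Without a table the search computes 232 positions; with a table it computes each distinct (pile, player) pair only once, and at most 22 such pairs exist.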
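Digression: Exact values don’t matter
[Figure: two 2-ply trees. Left: MIN nodes with values 1 and 2 over leaves 1, 2 and 2, 4. Right: the same tree after a monotonic transformation, with MIN nodes 1 and 20 over leaves 1, 20 and 20, 400.]
Behaviour is preserved under any monotonic transformation of Eval.
Only the order matters.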
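The claim can be checked on the two trees from the figure with a few lines of Python. The list encoding, the helper `best_move`, and writing the transformation as a lookup table are assumptions of this sketch.

```python
def best_move(tree):
    """Index of the MAX move whose MIN node has the highest value (2-ply tree)."""
    return max(range(len(tree)), key=lambda i: min(tree[i]))

original = [[1, 2], [2, 4]]
transform = {1: 1, 2: 20, 4: 400}    # order-preserving map read off the figure
transformed = [[transform[x] for x in branch] for branch in original]
```

`best_move` returns the second move (index 1) for both `original` and `transformed`: the backed-up values change from 1 and 2 to 1 and 20, but their order, and hence the decision, does not.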
Nondeterministic games: backgammon
[Figure: backgammon board with points numbered 0 to 25.]
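Nondeterministic games in general
In nondeterministic games, chance is introduced by dice or card shuffling.
Simplified example with coin-flipping:
[Figure: MAX root over two CHANCE nodes with values 3 and −1. Each chance node has two branches of probability 0.5 leading to MIN nodes, with values 2, 4 and 0, −2. Leaves under the MIN nodes: 2, 4 | 7, 4 | 6, 0 | 5, −2.]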
Algorithm for nondeterministic games
Expectiminimax gives perfect play.
Just like Minimax, except we must also handle chance nodes:
    . . .
    if state is a Max node then
        return the highest ExpectiMinimax-Value of Successors(state)
    if state is a Min node then
        return the lowest ExpectiMinimax-Value of Successors(state)
    if state is a chance node then
        return the probability-weighted average of ExpectiMinimax-Value of Successors(state)
    . . .