Adversarial Search
R&N 5.1–5.5

Jacques Fleuriot
University of Edinburgh, School of Informatics
jdf@ed.ac.uk

Overview

Perfect play
α–β pruning
Resource limits
Games of chance

Games vs. search problems

A game can be formally defined as a kind of search problem:
  S_0: the initial state, which specifies how the game is set up at the start.
  PLAYER(s): defines which player has the move in state s.
  ACTIONS(s): returns the set of legal moves in state s.
  RESULT(s, a): the transition model, which defines the result of move a in state s.
  TERMINAL-TEST(s): true when the game is over and false otherwise. States where the game has ended are called terminal states.
  UTILITY(s, p): a utility function (also called an objective or payoff function), which defines the final numeric value for a game that ends in terminal state s for player p. In chess, the outcome is a win (1), a loss (0), or a draw (1/2).

“Unpredictable” opponent ⇒ the solution is a strategy
Time limits ⇒ unlikely to find the goal, so we must approximate

Types of games

[Figure not recovered from the slide.]

Game tree (2-player, deterministic, turns)

[Figure: partial game tree for noughts and crosses (tic-tac-toe). Levels alternate between MAX (X) and MIN (O), starting with MAX (X) at the root, and end in TERMINAL states with utilities –1, 0, +1.]

The utility of each terminal state is given from MAX’s point of view.

Optimal Decisions

Normal search: the optimal decision is a sequence of actions leading to a goal state (i.e. a winning terminal state).
Adversarial search: MIN has a say in the game, so MAX needs to find a contingent strategy that specifies:
  MAX’s move in the initial state, then...
  MAX’s moves in the states resulting from every response by MIN to that move, then...
  MAX’s moves in the states resulting from every response by MIN to all those moves, etc.

The minimax value of a node is the utility for MAX of being in the corresponding state:

\[
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
\]

Minimax

Perfect play for deterministic, perfect-information games.
Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play.

Example: 2-ply game

  MAX root: value 3
  MAX move   MIN node value   Leaf utilities (MIN's replies)
  A1         3                A11 = 3,  A12 = 12, A13 = 8
  A2         2                A21 = 2,  A22 = 4,  A23 = 6
  A3         2                A31 = 14, A32 = 5,  A33 = 2

Idea: proceed all the way down to the leaves of the tree, then back the minimax values up through the tree.

Minimax algorithm

  function MINIMAX-DECISION(state) returns an action
    return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

  function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
      v ← MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

  function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← ∞
    for each a in ACTIONS(state) do
      v ← MIN(v, MAX-VALUE(RESULT(state, a)))
    return v

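The pseudocode translates almost line for line into Python. The following is a sketch under stated assumptions, not a reference implementation: the game is the 2-ply example stored as nested dicts, the mapping of action labels to leaves assumes left-to-right order, and the helper names (`actions`, `result`, `terminal_test`, `utility`) simply mirror the functions named in the formal game definition.

```python
import math

# Hypothetical game: keys are action names, ints are terminal utilities for MAX.
GAME = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}

def terminal_test(state):
    return isinstance(state, int)

def utility(state):
    return state

def actions(state):
    return list(state.keys())

def result(state, action):
    return state[action]

def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for a in actions(state):
        v = max(v, min_value(result(state, a)))
    return v

def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for a in actions(state):
        v = min(v, max_value(result(state, a)))
    return v

def minimax_decision(state):
    # MAX picks the action whose resulting state has the highest MIN-VALUE.
    return max(actions(state), key=lambda a: min_value(result(state, a)))

best_action = minimax_decision(GAME)
print(best_action)
```

Because `max` returns the first maximal element, ties between equally good actions are broken in favour of the earlier one; that tie-breaking rule is a choice of this sketch, not something the pseudocode prescribes.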
Properties of minimax

(b is the branching factor and m is the maximum depth of the tree.)

Complete? Yes, if the tree is finite.
Optimal? Yes, against an optimal opponent. Otherwise?
Time complexity? O(b^m)
Space complexity? O(bm) (depth-first exploration)

For chess, b ≈ 35 and m ≈ 100 for “reasonable” games
⇒ an exact solution is completely infeasible!
⇒ we would like to eliminate (large) parts of the game tree

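To get a feel for why b ≈ 35, m ≈ 100 is infeasible, the size of b^m can be computed directly. This is only an order-of-magnitude illustration using the figures quoted above; the variable names are arbitrary.

```python
import math

b, m = 35, 100                 # chess figures quoted above
nodes = b ** m                 # exact value; Python ints have arbitrary precision
digits = len(str(nodes))       # number of decimal digits in b^m
exponent = m * math.log10(b)   # log10(b^m)
print(digits, round(exponent, 1))
```

The result is a number on the order of 10^154, far beyond anything that can be enumerated.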
A Prolog implementation of Minimax [1]

  % minimax(Pos, BestNextPos, Val): Val is the minimax value of Pos and
  % BestNextPos is the best successor position.
  minimax(Pos, BestNextPos, Val) :-
      bagof(NextPos, move(Pos, NextPos), NextPosList),
      bestmove(NextPosList, BestNextPos, Val), !.
  % bagof/3 fails when Pos has no legal moves, so Pos is scored by its utility.
  minimax(Pos, _, Val) :-
      utility(Pos, Val).

  % bestmove(PosList, BestPos, BestVal): pick the best position in PosList.
  bestmove([Pos], Pos, Val) :-
      minimax(Pos, _, Val), !.
  bestmove([Pos1 | PosList], BestPos, BestVal) :-
      minimax(Pos1, _, Val1),
      bestmove(PosList, Pos2, Val2),
      betterOf(Pos1, Val1, Pos2, Val2, BestPos, BestVal).

  % Pos0 is preferred if MIN is to move in Pos0 (so MAX made the choice) and
  % Val0 is larger, or MAX is to move in Pos0 (so MIN made the choice) and
  % Val0 is smaller. Otherwise Pos1 is preferred.
  betterOf(Pos0, Val0, _, Val1, Pos0, Val0) :-
      min_to_move(Pos0), Val0 > Val1, !
      ;
      max_to_move(Pos0), Val0 < Val1, !.
  betterOf(_, _, Pos1, Val1, Pos1, Val1).

[1] Algorithm adapted from Prolog Programming for Artificial Intelligence by Bratko.

α–β pruning

It is possible to compute the correct minimax decision without looking at every node in the game tree.
When applied to a standard minimax tree, α–β pruning returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.

α–β pruning example

[Figure sequence: α–β pruning applied step by step to a 2-ply tree whose leaves, left to right, are 3, 12, 8 | 2, X, X | 14, 5, 2, where X marks a pruned leaf.]

Step 1: the first MIN node sees leaves 3, 12, 8, so its value is 3; the MAX root is now worth at least 3.
Step 2: the second MIN node sees leaf 2, so its value is at most 2; its two remaining leaves (marked X) are pruned.
Step 3: the third MIN node sees leaf 14, so its value is at most 14.
Step 4: it then sees leaf 5, so its value is at most 5.
Step 5: it finally sees leaf 2, so its value is 2; the value of the MAX root is 3.

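The pruning steps of the example can be traced with a short Python sketch. It assumes the example uses the same tree as the earlier 2-ply minimax example (so the two pruned leaves are 4 and 6); the nested-list encoding and the names `alphabeta` and `visited` are illustrative choices, not taken from the slides.

```python
import math

TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # assumed leaf values of the example
visited = []                                  # leaves actually evaluated, in order

def alphabeta(node, alpha, beta, max_to_move):
    """Minimax value of `node`, skipping branches that cannot matter."""
    if isinstance(node, int):
        visited.append(node)
        return node
    if max_to_move:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:          # MIN above would never allow this
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        if v <= alpha:             # MAX above already has something at least as good
            return v
        beta = min(beta, v)
    return v

root_value = alphabeta(TREE, -math.inf, math.inf, True)
print(root_value, visited)
```

With this left-to-right ordering, the leaves 4 and 6 never appear in `visited`, which matches the two pruned leaves in the figure, and the root value is the same as plain minimax would give.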
Properties of α–β

Pruning does not affect the final result (as we saw for the example).
Good move ordering improves the effectiveness of pruning (how could the ordering of the tree in the example be better?).
With “perfect ordering”, time complexity = O(b^{m/2})
  the effective branching factor goes from b to √b
  (alternative view) the search can go twice as deep as minimax in the same time
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning).

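The effect of move ordering can be checked by counting leaf evaluations for different orderings of the same leaves. The sketch below is an illustration under assumptions: the tree is the 2-ply example encoded as nested lists, `alphabeta_count` is a hypothetical helper, and the two reordered trees are made up for the comparison.

```python
import math

def alphabeta_count(node, alpha=-math.inf, beta=math.inf, max_to_move=True):
    """Return (minimax value, number of leaves evaluated) under alpha-beta pruning."""
    if isinstance(node, int):
        return node, 1
    best = -math.inf if max_to_move else math.inf
    leaves = 0
    for child in node:
        value, n = alphabeta_count(child, alpha, beta, not max_to_move)
        leaves += n
        if max_to_move:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:      # remaining siblings cannot change the decision
            break
    return best, leaves

original = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
better = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]    # third MIN node tries its lowest leaf first
worst = [[14, 5, 2], [6, 4, 2], [12, 8, 3]]     # best MAX move last, lowest leaves last

for name, tree in [("original", original), ("better", better), ("worst", worst)]:
    print(name, alphabeta_count(tree))
```

Under these assumptions the original ordering evaluates 7 of the 9 leaves, the reordered tree 5, and the worst ordering all 9, while the root value is 3 in every case.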
Why is it called α–β?

[Figure: a path through alternating MAX and MIN levels, from a MAX choice point near the root down to a MIN node with value v.]

α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX.
If v is worse than α, MAX will avoid it ⇒ prune that branch.
Define β similarly for MIN.

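Spelled out symmetrically, the two bounds and their cutoff tests might be written as follows. This is a sketch of the usual convention rather than a quotation from the slides; in particular, cutting off on equality (≤, ≥) is an assumption commonly made in implementations.

```latex
\begin{align*}
\alpha &= \text{value of the best (highest-value) choice found so far for MAX along the path},\\
\beta  &= \text{value of the best (lowest-value) choice found so far for MIN along the path},\\
\text{at a MIN node with current value } v:\quad & v \le \alpha \;\Rightarrow\; \text{prune the remaining successors},\\
\text{at a MAX node with current value } v:\quad & v \ge \beta \;\Rightarrow\; \text{prune the remaining successors}.
\end{align*}
```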