Adversarial Search R&N 5.1–5.5 Jacques Fleuriot University of Edinburgh – PowerPoint PPT Presentation



SLIDE 1

THE UNIVERSITY OF EDINBURGH

Adversarial Search

R&N 5.1–5.5 Jacques Fleuriot

University of Edinburgh, School of Informatics jdf@ed.ac.uk

Jacques Fleuriot Adversarial Search, R&N 5.1–5.5 1/25

SLIDE 2

Overview

• Perfect play
• α–β pruning
• Resource limits
• Games of chance

SLIDE 3

Games vs. search problems

A game can be formally defined as a kind of search problem:

• S0: the initial state, which specifies how the game is set up at the start.
• PLAYER(s): defines which player has the move in a state.
• ACTIONS(s): returns the set of legal moves in a state.
• RESULT(s, a): the transition model, which defines the result of a move.
• TERMINAL-TEST(s): true when the game is over and false otherwise. States where the game has ended are called terminal states.
• UTILITY(s, p): a utility function (objective or payoff) that defines the final numeric value for a game that ends in terminal state s for a player p. In chess, the outcome is a win (1), loss (0), or draw (1/2).

• “Unpredictable” opponent ⇒ solution is a strategy
• Time limits ⇒ unlikely to find goal, must approximate
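As a concrete (and purely illustrative) sketch, the six components above can be written down for a toy game. The Nim variant, the state encoding and all names below are our own choices, not part of the slides:

```python
# Formal game interface sketch, using a toy Nim game: a pile of stones,
# each player removes 1 or 2 stones, and taking the last stone wins.
# States are (pile, player-to-move) tuples -- our own encoding.

S0 = (3, "MAX")                      # initial state: 3 stones, MAX to move

def player(s):                       # PLAYER(s): who has the move?
    return s[1]

def actions(s):                      # ACTIONS(s): legal moves in s
    pile, _ = s
    return [n for n in (1, 2) if n <= pile]

def result(s, a):                    # RESULT(s, a): transition model
    pile, p = s
    return (pile - a, "MIN" if p == "MAX" else "MAX")

def terminal_test(s):                # TERMINAL-TEST(s): is the game over?
    return s[0] == 0

def utility(s, p):                   # UTILITY(s, p): 1 = win, 0 = loss for p
    # The player to move at an empty pile has just lost: the opponent
    # took the last stone.
    loser = s[1]
    return 0 if p == loser else 1
```

For instance, `actions(S0)` is `[1, 2]`, and any state with an empty pile is terminal, with the player who took the last stone winning.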

SLIDE 4

Types of games

SLIDE 5

Game tree (2-player, deterministic, turns)

[Figure: partial game tree for tic-tac-toe. Levels alternate MAX (X), MIN (O), MAX (X), MIN (O), … down to TERMINAL states, whose utilities (−1, 0, +1) are shown at the bottom.]

Utility for each terminal state is from MAX’s point of view.

SLIDE 6

Optimal Decisions

Normal search: an optimal decision is a sequence of actions leading to a goal state (i.e. a winning terminal state).
Adversarial search: MIN has a say in the game, so MAX needs to find a contingent strategy that specifies:

• MAX’s move in the initial state, then...
• MAX’s moves in the states resulting from every response by MIN to that move, then...
• MAX’s moves in the states resulting from every response by MIN to all those moves, etc.

minimax value of a node = utility for MAX of being in corresponding state:

MINIMAX(s) =
    UTILITY(s)                                    if TERMINAL-TEST(s)
    max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))    if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))    if PLAYER(s) = MIN

SLIDE 7

Minimax

Perfect play for deterministic, perfect-information games.
Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play.
Example: 2-ply game:

[Figure: 2-ply game tree. MAX’s moves a1, a2, a3 lead to three MIN nodes whose leaves have utilities (3, 12, 8), (2, 4, 6) and (14, 5, 2); the backed-up MIN values are 3, 2 and 2, so the minimax value of the root is 3.]

Idea: proceed all the way down to the leaves of the tree, then back the minimax values up through the tree.
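The backed-up values can be checked directly from the leaf utilities in the 2-ply example:

```python
# Back up the value of each MIN node, then take the MAX at the root.
# Leaf utilities are those of the 2-ply example on this slide.
min_values = [min(3, 12, 8), min(2, 4, 6), min(14, 5, 2)]
root_value = max(min_values)
```

This gives MIN values 3, 2, 2 and a root minimax value of 3, matching the slide.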

SLIDE 8

Minimax algorithm

function MINIMAX-DECISION(state) returns an action
    return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a)))
    return v
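The pseudocode translates almost line-for-line into Python. The nested-list tree encoding below (an int is a terminal utility, a list holds the successor subtrees) is an illustrative assumption, not from the slides:

```python
# Minimax on game trees given as nested lists -- our own toy encoding.

def minimax(node, maximizing=True):
    if isinstance(node, int):            # TERMINAL-TEST / UTILITY
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def minimax_decision(tree):
    # Index of the best root move for MAX (the argmax in the pseudocode).
    values = [minimax(child, maximizing=False) for child in tree]
    return max(range(len(values)), key=values.__getitem__)

# The 2-ply example from the previous slide:
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

On the example tree, `minimax(tree)` is 3 and the best root move is the first one (a1).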

SLIDE 9

Properties of minimax

Complete? Optimal? Time complexity? Space complexity?

SLIDE 10

Properties of minimax

Complete? Yes, if tree is finite Optimal? Time complexity? Space complexity?

SLIDE 11

Properties of minimax

Complete? Yes, if tree is finite Optimal? Yes, against an optimal opponent. Otherwise? Time complexity? Space complexity?

SLIDE 12

Properties of minimax

Complete? Yes, if tree is finite Optimal? Yes, against an optimal opponent. Otherwise? Time complexity? O(b^m) Space complexity?

SLIDE 13

Properties of minimax

Complete? Yes, if tree is finite Optimal? Yes, against an optimal opponent. Otherwise? Time complexity? O(b^m) Space complexity? O(bm) (depth-first exploration)

SLIDE 14

Properties of minimax

Complete? Yes, if tree is finite Optimal? Yes, against an optimal opponent. Otherwise? Time complexity? O(b^m) Space complexity? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for “reasonable” games
⇒ exact solution completely infeasible!
⇒ would like to eliminate (large) parts of the game tree

SLIDE 15

A Prolog implementation of Minimax [1]

minimax(Pos, BestNextPos, Val) :-
    bagof(NextPos, move(Pos, NextPos), NextPosList),
    bestmove(NextPosList, BestNextPos, Val), !.
minimax(Pos, _, Val) :-
    utility(Pos, Val).

bestmove([Pos], Pos, Val) :-
    minimax(Pos, _, Val), !.
bestmove([Pos1 | PosList], BestPos, BestVal) :-
    minimax(Pos1, _, Val1),
    bestmove(PosList, Pos2, Val2),
    betterOf(Pos1, Val1, Pos2, Val2, BestPos, BestVal).

betterOf(Pos0, Val0, _, Val1, Pos0, Val0) :-
    min_to_move(Pos0), Val0 > Val1, !
    ;
    max_to_move(Pos0), Val0 < Val1, !.
betterOf(_, _, Pos1, Val1, Pos1, Val1).

[1] Algorithm adapted from Prolog Programming for Artificial Intelligence by Bratko

SLIDE 16

α–β pruning

It is possible to compute the correct minimax decision without looking at every node in the game tree. When applied to a standard minimax tree, α–β pruning returns the same move as minimax would, but it prunes away branches that cannot possibly influence the final decision.

SLIDE 17

α–β pruning example

[Figure: α–β example, step 1 — the first MIN node’s leaves 3, 12 and 8 are evaluated, so its value is 3 and MAX’s root value is at least 3.]

SLIDE 18

α–β pruning example

[Figure: α–β example, step 2 — the second MIN node’s first leaf is 2, so its value is at most 2 ≤ 3; its remaining successors are pruned (marked X X).]

SLIDE 19

α–β pruning example

[Figure: α–β example, step 3 — the third MIN node’s first leaf is 14, so its value is at most 14; the search must continue.]

SLIDE 20

α–β pruning example

[Figure: α–β example, step 4 — the next leaf is 5, lowering the third MIN node’s bound to at most 5; still above 3, so the search continues.]

SLIDE 21

α–β pruning example

[Figure: α–β example, step 5 — the last leaf is 2, so the third MIN node’s value is 2 and the root’s minimax value is 3, exactly as with full minimax.]

SLIDE 22

Properties of α–β

• Pruning does not affect the final result (as we saw in the example)
• Good move ordering improves the effectiveness of pruning (how could the tree in the example be better?)
• With “perfect ordering,” time complexity = O(b^{m/2})

    • branching factor effectively goes from b to √b
    • (alternative view) doubles the depth of search compared to minimax

A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
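The “doubles the depth” claim is just exponent arithmetic; a quick sketch, with b = 35 as quoted for chess and illustrative depths:

```python
# With perfect ordering, alpha-beta searches to depth m in about b**(m/2)
# nodes, so a fixed node budget reaches roughly twice the depth.

b = 35                                # chess branching factor (from slide 14)

def minimax_nodes(m):
    return b ** m                     # plain minimax, depth m

def alphabeta_nodes(m):
    return b ** (m // 2)              # perfect-ordering alpha-beta, even m

budget = minimax_nodes(4)             # nodes minimax needs at depth 4
# The same budget lets alpha-beta reach depth 8:
```

Here `alphabeta_nodes(8)` equals `minimax_nodes(4)`: depth 8 with pruning costs what depth 4 costs without.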

SLIDE 23

Why is it called α–β?

[Figure: a path in the game tree alternating MAX and MIN nodes, ending at a node with value v.]

α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX
If v is worse than α, MAX will avoid it ⇒ prune that branch
Define β similarly for MIN

SLIDE 24

The α–β algorithm I

Basically Minimax + keep track of α, β + prune

function ALPHA-BETA-SEARCH(state) returns an action
    v ← MAX-VALUE(state, −∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

SLIDE 25

The α–β algorithm II

function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
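Putting MAX-VALUE and MIN-VALUE together, a Python sketch of α–β on the same nested-list tree encoding as before (our own choice, not from the slides) also lets us count how many leaves are actually evaluated:

```python
# Alpha-beta pruning on nested-list game trees (ints are leaf utilities).

import math

visited = []                              # leaves actually evaluated

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    if isinstance(node, int):
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:
                return v                  # beta-cutoff
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            if v <= alpha:
                return v                  # alpha-cutoff
            beta = min(beta, v)
        return v

# The 2-ply tree from the earlier example: 9 leaves in total.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

On this tree `alphabeta(tree)` returns 3, the same value as full minimax, while evaluating only 7 of the 9 leaves (4 and 6 are pruned, matching the X X markers in the example slides).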

SLIDE 26

Resource limits

Suppose we have 100 seconds and explore 10^4 nodes/second ⇒ 10^6 nodes per move
Standard approach:
• Evaluation function = estimated desirability of position
• Cutoff test, e.g. a depth limit (perhaps add quiescence search, which tries to search interesting positions to a greater depth than quiet ones)

SLIDE 27

Evaluation functions

For chess, typically a linear weighted sum of features:
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s)
e.g., w_1 = 9 with f_1(s) = (number of white queens) − (number of black queens), etc.
Assumes that the contribution of each feature is independent of the values of the other features
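A minimal sketch of such a weighted-sum evaluator; the piece-count representation and helper name are our own, with the classical material weights (queen = 9, rook = 5, bishop/knight = 3, pawn = 1):

```python
# Eval(s) = sum_i w_i * f_i(s), where each feature f_i is
# (# white pieces of type i) - (# black pieces of type i).

WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(white_counts, black_counts):
    # Positive scores favour White; missing piece types count as zero.
    return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
               for p, w in WEIGHTS.items())

# White is a queen up, Black has one extra pawn:
score = material_eval({"Q": 1, "P": 7}, {"Q": 0, "P": 8})
```

Here `score` is 9·(1−0) + 1·(7−8) = 8, illustrating the independence assumption: each feature contributes on its own, regardless of the others.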

SLIDE 28

Cutting off search

Standard approach:
• Use CUTOFF-TEST instead of TERMINAL-TEST
• Use EVAL instead of UTILITY, i.e. an evaluation function that estimates the desirability of the position
Heuristic minimax:

H-MINIMAX(s, d) =
    EVAL(s)                                             if CUTOFF-TEST(s, d)
    max_{a ∈ ACTIONS(s)} H-MINIMAX(RESULT(s, a), d + 1) if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} H-MINIMAX(RESULT(s, a), d + 1) if PLAYER(s) = MIN
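H-MINIMAX can be sketched by adding a depth argument and a cutoff to plain minimax. The nested-list tree encoding and the trivial stand-in `eval_fn` below are illustrative assumptions, not from the slides:

```python
# Depth-limited minimax: exact utilities at true terminals (ints),
# heuristic estimates at the cutoff frontier.

def h_minimax(node, d, depth_limit, maximizing=True, eval_fn=lambda n: 0):
    if isinstance(node, int):            # true terminal: exact utility
        return node
    if d >= depth_limit:                 # CUTOFF-TEST(s, d): estimate instead
        return eval_fn(node)
    values = [h_minimax(c, d + 1, depth_limit, not maximizing, eval_fn)
              for c in node]
    return max(values) if maximizing else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

With `depth_limit=2` the whole 2-ply example tree is searched and the exact minimax value 3 comes back; with `depth_limit=0` the root is immediately cut off and the (here trivial) evaluation estimate is returned instead.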

SLIDE 29

Deterministic games in practice

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997.
Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

SLIDE 30

Nondeterministic (Stochastic) games

Example: in backgammon, the dice rolls determine the legal moves.
Simplified example with coin-flipping instead of dice-rolling:

[Figure: simplified game tree with CHANCE nodes between the MAX and MIN levels; each coin flip has probability 0.5, and expected values (e.g. 3 and −1) are backed up through the chance nodes.]

SLIDE 31

Algorithm for nondeterministic games

Expectiminimax gives perfect play.
Just like Minimax, except we must also handle chance nodes:
. . . if state is a chance node then return the expected ExpectiMinimax value of the successors of state . . .
(Recall:) The expected value is the sum of the value over all outcomes, weighted by the probability of each chance action

SLIDE 32

Algorithm for nondeterministic games

Just like Minimax, except we must also handle chance nodes:

EXPECTIMINIMAX(s) =
    UTILITY(s)                                          if TERMINAL-TEST(s)
    max_{a ∈ ACTIONS(s)} EXPECTIMINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} EXPECTIMINIMAX(RESULT(s, a))   if PLAYER(s) = MIN
    Σ_r P(r) EXPECTIMINIMAX(RESULT(s, r))               if PLAYER(s) = CHANCE

where r is a chance event, e.g. a possible dice roll.

Eval for a position in games of chance: use Monte Carlo simulation to evaluate the position: the algorithm plays thousands of games (against itself) with random dice rolls. The resulting win percentage is a good approximation of the value of the position. This type of random simulation is often known as a rollout. Some of these ideas, e.g. rollouts, are central to Monte Carlo Tree Search.
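A sketch of EXPECTIMINIMAX on small hand-written trees. The node encoding and the coin-flip example below are our own, merely in the spirit of the slide's figure, not a transcription of it:

```python
# Node encoding (our own): an int is a terminal utility; otherwise a node
# is ("max", [children]), ("min", [children]) or
# ("chance", [(prob, child), ...]).

def expectiminimax(node):
    if isinstance(node, int):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Sum over chance events r, weighted by P(r).
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# A coin-flip chance node: heads and tails each with probability 0.5,
# leading to MIN nodes.
chance_node = ("chance", [(0.5, ("min", [2, 4])),
                          (0.5, ("min", [6, 5]))])
```

Here the chance node's value is 0.5 · min(2, 4) + 0.5 · min(6, 5) = 0.5 · 2 + 0.5 · 5 = 3.5: exactly the probability-weighted average from the equation above.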

SLIDE 33

Summary I

• A game can be defined by the initial state, the legal actions in each state, the result of each action, a terminal test and a utility function that applies to terminal states.
• In two-player zero-sum games with perfect information, the minimax algorithm can select optimal moves by a depth-first enumeration of the game tree.
• The alpha–beta search algorithm computes the same optimal move as minimax, but achieves much greater efficiency by eliminating subtrees that are provably irrelevant.
• Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we need to cut the search off at some point and apply a heuristic evaluation function that estimates the utility of a state.
• Games of chance can be handled by an extension to the minimax algorithm that evaluates a chance node by taking the average utility of all its children, weighted by the probability of each child.

SLIDE 34

Summary II

Games are fun to work on! They illustrate several important points about AI:
• perfection is unattainable ⇒ must approximate
• good idea to think about what to think about
• uncertainty constrains the assignment of values to states
“Games are to AI as grand prix racing is to automobile design”
