Minimax strategies, alpha-beta pruning (Lirong Xia)



slide-1
SLIDE 1

Lirong Xia

Minimax strategies, alpha beta pruning

slide-2
SLIDE 2

Reminder

ØProject 1 due tonight

§ Make sure you DO NOT SEE “ERROR: Summation of parsed points does not match”

ØProject 2 due in two weeks

slide-3
SLIDE 3

How to find good heuristics?

ØNo truly mechanical way

§ more art than science

ØGeneral guideline: relax constraints

§ e.g. Pacman can pass through the walls

ØMimic what you would do
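One standard instance of relaxing constraints: if Pacman could pass through walls, the cost of reaching a goal square would be the Manhattan distance, which never overestimates the true maze distance. The sketch below is illustrative only; the function name and the (x, y) tuples are assumptions, not the project's API.

```python
def manhattan_heuristic(position, goal):
    """Cost of reaching the goal if walls could be ignored (relaxed problem)."""
    (x1, y1), (x2, y2) = position, goal
    return abs(x1 - x2) + abs(y1 - y2)

# Hypothetical coordinates: 3 steps horizontally plus 4 steps vertically.
estimate = manhattan_heuristic((1, 1), (4, 5))
```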

slide-4
SLIDE 4

Arc Consistency of a CSP

Ø A simple form of propagation makes sure all arcs are consistent
Ø If V loses a value, neighbors of V need to be rechecked!
Ø Arc consistency detects failure earlier than forward checking
Ø Can be run as a preprocessor or after each assignment
Ø Might be time-consuming

Delete from the tail!
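The propagation described above can be sketched with a queue of arcs, in the style of the AC-3 algorithm. This is a minimal sketch under assumptions: the names `ac3`, `domains`, `neighbors`, and `consistent` are chosen for illustration, and the three-variable coloring problem is a made-up example.

```python
from collections import deque

def ac3(domains, neighbors, consistent):
    """Enforce arc consistency in place; return False if a domain becomes empty."""
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        tail, head = queue.popleft()
        # Delete from the tail: drop tail values with no compatible head value.
        removed = {a for a in domains[tail]
                   if not any(consistent(tail, a, head, b) for b in domains[head])}
        if removed:
            domains[tail] -= removed
            if not domains[tail]:
                return False          # failure detected
            # The tail lost a value, so arcs pointing into it must be rechecked.
            queue.extend((z, tail) for z in neighbors[tail] if z != head)
    return True

# Hypothetical example: three mutually adjacent regions that need different colors.
domains = {"A": {"red"}, "B": {"red", "green"}, "C": {"red", "green", "blue"}}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
different = lambda x, a, y, b: a != b
ok = ac3(domains, neighbors, different)
```

In this example, propagation alone reduces every domain to a single value.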

slide-5
SLIDE 5

Limitations of Arc Consistency

ØAfter running arc consistency:

§ Can have one solution left
§ Can have multiple solutions left
§ Can have no solutions left (and not know it)

slide-6
SLIDE 6

“Sum to 2” game

Ø Player 1 moves, then player 2, finally player 1 again
Ø Move = 0 or 1
Ø Player 1 wins if and only if all moves together sum to 2

[Figure: game tree with a level for Player 1, then Player 2, then Player 1 again, ending in leaf utilities]

Player 1’s utility is in the leaves; player 2’s utility is the negative of this
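The game is small enough to solve by brute force. The sketch below assumes, for illustration only, that a win is worth +1 and a loss -1 to player 1; the function name `value` and the tuple-of-moves representation are likewise assumptions.

```python
def value(moves):
    """Minimax value, from player 1's view, of the position after `moves`."""
    if len(moves) == 3:
        return 1 if sum(moves) == 2 else -1   # assumed payoffs: +1 win, -1 loss
    children = [value(moves + (m,)) for m in (0, 1)]
    # Player 1 (MAX) moves at depths 0 and 2; player 2 (MIN) moves at depth 1.
    return max(children) if len(moves) % 2 == 0 else min(children)

root_value = value(())
first_moves = {m: value((m,)) for m in (0, 1)}
```

Under these payoffs, player 1 can force a win by opening with 1: whatever player 2 answers, player 1's last move brings the sum to exactly 2.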

slide-7
SLIDE 7

Today’s schedule

ØAdversarial game
ØMinimax search
ØAlpha-beta pruning algorithm

slide-8
SLIDE 8

Adversarial Games

Ø Deterministic, zero-sum games:

§ Tic-tac-toe, chess, checkers
§ The MAX player maximizes result
§ The MIN player minimizes result

Ø Minimax search:

§ A search tree
§ Players alternate turns
§ Each node has a minimax value: best achievable utility against a rational adversary

slide-9
SLIDE 9

Computing Minimax Values

Ø This is DFS
Ø Two recursive functions:

§ max-value maxes the values of successors
§ min-value mins the values of successors

Ø Def value(state):

    If the state is a terminal state: return the state’s utility
    If the agent at the state is MAX: return max-value(state)
    If the agent at the state is MIN: return min-value(state)

Ø Def max-value(state):

    Initialize max = -∞
    For each successor of state:
        Compute value(successor)
        Update max accordingly
    return max

Ø Def min-value(state): similar to max-value
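The three functions above translate almost line for line into Python. In this minimal sketch a state is assumed to be either a number (a terminal utility) or a list of successor states, and the player to move is passed as a flag; both choices, and the leaf values in the example tree, are illustrative assumptions.

```python
import math

def value(state, is_max):
    if not isinstance(state, list):      # terminal state: the number is its utility
        return state
    return max_value(state) if is_max else min_value(state)

def max_value(state):
    best = -math.inf
    for successor in state:
        best = max(best, value(successor, False))   # MIN moves next
    return best

def min_value(state):
    best = math.inf
    for successor in state:
        best = min(best, value(successor, True))    # MAX moves next
    return best

# Hypothetical two-ply tree: MAX at the root, three MIN nodes below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
min_values = [value(child, False) for child in tree]
root = value(tree, True)
```

The MIN nodes evaluate to 3, 2, and 2, so the root's minimax value is 3.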

slide-10
SLIDE 10

Minimax Example

[Figure: minimax example tree with node values 3, 2, 2, 3]

slide-11
SLIDE 11

Tic-tac-toe Game Tree


slide-12
SLIDE 12

Renju

  • 15×15 board
  • 5 in a row (horizontal, vertical, or diagonal) wins
  • no double-3 or double-4 moves for black
  • otherwise, black’s winning strategy was computed

– L. Victor Allis 1994 (PhD thesis)

slide-13
SLIDE 13

Minimax Properties

Ø Time complexity?

§ $O(b^m)$

Ø Space complexity?

§ $O(bm)$

Ø For chess, $b \approx 35$, $m \approx 100$

§ Exact solution is completely infeasible
§ But, do we need to explore the whole tree?
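To get a feel for why the exact solution is infeasible, the rough chess figures above can be plugged into the two bounds. This is a back-of-the-envelope sketch; the variable names are illustrative.

```python
import math

b, m = 35, 100                     # rough branching factor and depth for chess
log10_nodes = m * math.log10(b)    # log10 of b**m, the time bound
digit_count = len(str(b ** m))     # number of decimal digits in b**m
stack_size = b * m                 # the space bound O(bm): states kept by DFS
```

The time bound is a number with 155 digits, while the space bound is only 3500 states: time, not memory, is the obstacle.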

slide-14
SLIDE 14

Resource Limits

Ø Cannot search to leaves
Ø Depth-limited search

§ Instead, search a limited depth of tree
§ Replace terminal utilities with an evaluation function for non-terminal positions

Ø Guarantee of optimal play is gone

slide-15
SLIDE 15

Evaluation Functions

Ø Functions that score non-terminal positions
Ø Ideal function: returns the minimax utility of the position
Ø In practice: typically a weighted linear sum of features:

$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)$

Ø e.g. $f_1(s) = (\#\text{ white queens} - \#\text{ black queens})$, etc.
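A weighted linear evaluation function is a few lines of code. In the sketch below the state representation (a dictionary of piece counts), the second feature, and the weights are all hypothetical, chosen only to make the formula concrete.

```python
def evaluate(state, weights, features):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

def queen_difference(state):
    return state["white_queens"] - state["black_queens"]

def pawn_difference(state):          # hypothetical second feature
    return state["white_pawns"] - state["black_pawns"]

# Hypothetical position and weights.
state = {"white_queens": 1, "black_queens": 0, "white_pawns": 5, "black_pawns": 6}
score = evaluate(state, [9.0, 1.0], [queen_difference, pawn_difference])
```

Here the score is 9.0 * 1 + 1.0 * (-1) = 8.0.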

slide-16
SLIDE 16

Minimax with limited depth

ØSuppose you are the MAX player
ØGiven a depth d and current state
ØCompute value(state,d) that reaches depth d

§ at depth d, use an evaluation function to estimate the value if it is non-terminal
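A minimal sketch of value(state, d), assuming a state is either a number (terminal utility) or a list of successors. The evaluation function `mean_of_leaves` is a deliberately crude, hypothetical stand-in, and the tree is a made-up example.

```python
def leaves(state):
    """Yield every leaf utility below a state (a number or a nested list)."""
    if isinstance(state, list):
        for child in state:
            yield from leaves(child)
    else:
        yield state

def mean_of_leaves(state):
    """Hypothetical evaluation function: average utility of the leaves below."""
    values = list(leaves(state))
    return sum(values) / len(values)

def value(state, depth, is_max, evaluate):
    if not isinstance(state, list):      # terminal: return its utility
        return state
    if depth == 0:                       # depth limit hit at a non-terminal state
        return evaluate(state)
    children = [value(c, depth - 1, not is_max, evaluate) for c in state]
    return max(children) if is_max else min(children)

tree = [[3, 12, 8], [2, 4, 6], [14, 15, 1]]
full = value(tree, 2, True, mean_of_leaves)      # deep enough to reach the leaves
shallow = value(tree, 1, True, mean_of_leaves)   # estimates each MIN node instead
```

With the full search the root is worth 3 (the first move), but at depth 1 the estimates favor the third move, whose true minimax value is only 1. This is the sense in which depth-limited search loses the guarantee of optimal play.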

slide-17
SLIDE 17

Improving minimax: pruning

slide-18
SLIDE 18

Pruning in Minimax Search

ØAn ancestor is a MAX node

§ it already has an option better than my current solution
§ my future solution can only be smaller
§ so MAX will never choose this branch, and my remaining children can be skipped

slide-19
SLIDE 19

Alpha-beta pruning

ØPruning = cutting off parts of the search tree (because you realize you don’t need to look at them)

§ When we considered A* we also pruned large parts of the search tree

ØMaintain

§ α = value of the best option for the MAX player along the path so far
§ β = value of the best option for the MIN player along the path so far
§ Initialized to be α = -∞ and β = +∞

ØMaintain and update α and β for each node

§ α is updated at MAX player’s nodes
§ β is updated at MIN player’s nodes

slide-20
SLIDE 20

Alpha-Beta Pruning

Ø General configuration

§ We’re computing the MIN-VALUE at n
§ We’re looping over n’s children
§ n’s value estimate is dropping
§ α is the best value that MAX can get at any choice point along the current path
§ If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children
§ Define β similarly for MIN
§ α is usually smaller than β

  • Once α >= β, return to the upper layer

slide-21
SLIDE 21

Alpha-Beta Pruning Example

α is MAX’s best alternative here or above

β is MIN’s best alternative here or above

slide-22
SLIDE 22

Alpha-Beta Pruning Example

[Figure: trace of α and β through the example tree. Legend: starting α/β, raising α, lowering β. Each node starts with α = -∞ and β = +∞; α is raised at MAX nodes and β is lowered at MIN nodes as values are returned.]

slide-23
SLIDE 23

Alpha-Beta Pseudocode

slide-24
SLIDE 24

Alpha-Beta Pruning Properties

Ø This pruning has no effect on the final result at the root
Ø Values of intermediate nodes might be wrong!

§ Important: children of the root may have the wrong value

Ø Good ordering of children improves the effectiveness of pruning
Ø With “perfect ordering”:

§ Time complexity drops to $O(b^{m/2})$
§ Doubles solvable depth!
§ Your action looks smarter: more forward-looking with a good evaluation function
§ Full search of, e.g. chess, is still hopeless…
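The effect of child ordering can be seen by counting how many leaves alpha-beta examines on the same tree under two orderings. This is an illustrative sketch: the trees, the counter argument, and the helper `run` are assumptions made for the demonstration.

```python
import math

def alphabeta(state, is_max, alpha, beta, counter):
    if not isinstance(state, list):
        counter[0] += 1                  # count every leaf that gets examined
        return state
    best = -math.inf if is_max else math.inf
    for child in state:
        v = alphabeta(child, not is_max, alpha, beta, counter)
        if is_max:
            best = max(best, v)
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            beta = min(beta, best)
        if alpha >= beta:                # prune the remaining children
            break
    return best

def run(tree):
    counter = [0]
    result = alphabeta(tree, True, -math.inf, math.inf, counter)
    return result, counter[0]

# The same hypothetical game tree with children in a good and a bad order.
good = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]
bad = [[14, 5, 2], [6, 4, 2], [12, 8, 3]]
good_result, good_leaves = run(good)
bad_result, bad_leaves = run(bad)
```

Both orderings return the same root value, 3, but the good ordering examines 5 of the 9 leaves while the bad ordering examines all 9.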

slide-25
SLIDE 25

Project 2

ØQ1: write an evaluation function for (state,action) pairs

§ the evaluation function is for this question only

ØQ2: minimax search with arbitrary depth and multiple MIN players (ghosts)

§ evaluation function on states has been implemented for you

ØQ3: alpha-beta pruning with arbitrary depth and multiple MIN players (ghosts)
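With several MIN players, one common way to organize the recursion is to cycle through agent indices and count one unit of depth only after every agent has moved. The sketch below uses a toy nested-list tree and does not use the project's API; the names `value`, `agent`, `num_agents`, and the tree itself are assumptions for illustration.

```python
def value(state, agent, num_agents, depth, evaluate):
    """Agent 0 is MAX; agents 1..num_agents-1 are MIN players."""
    if not isinstance(state, list):      # terminal state: return its utility
        return state
    if depth == 0:                       # depth limit reached: estimate instead
        return evaluate(state)
    next_agent = (agent + 1) % num_agents
    # One unit of depth is used up once every agent has moved.
    next_depth = depth - 1 if next_agent == 0 else depth
    children = [value(c, next_agent, num_agents, next_depth, evaluate)
                for c in state]
    return max(children) if agent == 0 else min(children)

# Hypothetical tree: one MAX move, then two MIN players move in turn.
tree = [[[3, 5], [4, 6]], [[1, 9], [7, 8]]]
root = value(tree, 0, 3, 1, lambda s: 0)
```

Here the second MIN player reduces the four inner nodes to 3, 4, 1, and 7, the first MIN player reduces those to 3 and 1, and MAX picks 3.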

slide-26
SLIDE 26

Recap

ØMinimax search

§ with limited depth
§ evaluation function

ØAlpha-beta pruning
ØProject 1 due midnight today
ØProject 2 due in two weeks