Adversarial Search
George Konidaris gdk@cs.brown.edu
Fall 2019
Games
“Chess is the Drosophila of Artificial Intelligence” (Kronrod, c. 1966)
TuroChamp, 1948
Programming a Computer for Playing Chess - Claude Shannon, 1950.
“The chess machine is an ideal one to start with, since: (1) the problem is sharply defined both in allowed operations (the moves) and in the ultimate goal (checkmate); (2) it is neither so simple as to be trivial nor too difficult for satisfactory solution; (3) chess is generally considered to require "thinking" for skillful play; a solution of this problem will force us either to admit the possibility of a mechanized thinking or to further restrict our concept of "thinking"; (4) the discrete structure of chess fits well into the digital nature of modern computers.”
A game is solved if an optimal strategy is known. Strongly solved: optimal play is known from all positions. Weakly solved: optimal play is known from some positions (typically the start position).
Games are usually:
Very much like search:
The key difference is alternating control.
[Figure: game tree with alternating layers, nodes s0 to s6. Player 1 moves (you select to max score), then player 2 moves (they select to min score), and so on down to the terminal scores.]
Propagate value backwards through the tree:
$$V(s_5) = \max(V(g_1), V(g_2), V(g_3))$$
$$V(s_2) = \min(V(s_4), V(s_5), V(s_6))$$
$$V(s_0) = \max(V(s_1), V(s_2), V(s_3))$$
Compute the value of each node, working backwards from the terminal nodes. Max (min) player: select the action that maximizes (minimizes) return. Optimal for both players (if the game is zero-sum). Assumes perfect play by the opponent, i.e. a worst-case analysis. Can run as depth-first search.
Requires the agent to evaluate the whole tree.
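The backward computation above can be sketched as a short depth-first recursion. This is a minimal sketch, assuming a hypothetical encoding of the game tree as nested Python lists (a leaf holds its terminal score from the max player's point of view) with the max player at the root; the leaf values are made up for illustration.

```python
from typing import Union

# Hypothetical encoding: a leaf is its terminal score (from max's point of
# view); an internal node is a list of child subtrees.
Tree = Union[float, list]


def minimax(node: Tree, maximizing: bool = True) -> float:
    """Back up terminal scores depth-first, alternating max and min layers."""
    if not isinstance(node, list):      # terminal node: return its score
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


# Root is a max node over two min nodes: min(10, 5) = 5, min(20, 2) = 2.
print(minimax([[10, 5], [20, 2]]))  # → 5
```

Note that every leaf is visited, which is exactly why full minimax does not scale.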
[Figure: worked example, backing leaf values (10, 5, 20, 2) up through max and min layers.]
What if there is a chance element?
An outcome is called stochastic when it is determined at random.
[Figure: a fair die; each of the six outcomes has p = 1/6, and the probabilities sum to 1.]
How do we factor in stochasticity? The agent does not get to choose the outcome.
The computation must be probability-aware, and must track who is choosing at each level.
Insert a randomization layer:
[Figure: game tree with a stochastic (dice) layer after each player's move: you select (max), dice, they select (min), dice.]
How do we compute the value of a stochastic layer? What is the average die value?
(1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5
This factors in both the probability and the value of each event. In general, given a random event x and a function f(x):
$$E[f(x)] = \sum_x P(x) f(x)$$
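The expectation formula translates directly into code. A minimal sketch, assuming outcomes are given as hypothetical (probability, value) pairs; exact fractions are used so the fair-die result comes out as exactly 3.5.

```python
from fractions import Fraction


def expectation(outcomes):
    """E[f(x)] = sum over x of P(x) * f(x), given (P(x), f(x)) pairs."""
    return sum(p * value for p, value in outcomes)


# A fair six-sided die: each face x has P(x) = 1/6 and f(x) = x.
die = [(Fraction(1, 6), face) for face in range(1, 7)]
print(expectation(die))  # → 7/2 (that is, 3.5)
```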
[Figure: the same tree, with each stochastic (dice) layer now valued as an expectation between the max (you select) and min (they select) layers.]
Can run as depth-first search.
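Adding expectation layers to the depth-first recursion gives a value computation over max, min, and chance nodes. A minimal sketch under stated assumptions: the tuple encoding of nodes is hypothetical, and a fair coin (p = 0.5 per outcome) stands in for the die only to keep the example tree small.

```python
from typing import Union

# Hypothetical encoding: a leaf is a terminal score; an internal node is a
# tuple (kind, children) with kind in {"max", "min", "chance"}. Children of a
# chance node are (probability, subtree) pairs.
Node = Union[float, tuple]


def expectimax(node: Node) -> float:
    """Depth-first value computation with max, min, and chance layers."""
    if not isinstance(node, tuple):
        return node                                   # terminal score
    kind, children = node
    if kind == "max":
        return max(expectimax(c) for c in children)
    if kind == "min":
        return min(expectimax(c) for c in children)
    if kind == "chance":
        # Expectation: sum of P(outcome) * value(outcome).
        return sum(p * expectimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# You select (max), then a coin flip, then they select (min).
tree = ("max", [
    ("chance", [(0.5, ("min", [4, 8])), (0.5, ("min", [6, 2]))]),
    ("chance", [(0.5, ("min", [3, 5])), (0.5, ("min", [7, 9]))]),
])
print(expectimax(tree))  # → 5.0
```

The first move is worth 0.5 * 4 + 0.5 * 2 = 3.0 and the second 0.5 * 3 + 0.5 * 7 = 5.0, so max picks the second.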
Depth is too deep.
Breadth is too broad.
For non-trivial games, full search cannot finish in any practical amount of time.
Terminate early. Branch less often.
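[Figure: small game trees with root S and nodes A and B, illustrating pruning.]
At a min layer: if V(B) ≤ V(A), then prune B’s siblings.
At a max layer: if V(A) ≥ V(B), then prune A’s siblings.
More generally, let α be the best value found so far for max along the current path, and β the best value found so far for min:
If max node: stop expanding its children once its value is ≥ β.
If min node: stop expanding its children once its value is ≤ α.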
(from Russell and Norvig)
Single most useful search control method:
Resulting algorithm: alpha-beta pruning. Empirically, it reduces the effective branching factor to roughly its square root.
Alpha-beta makes the difference between novice and expert computer game players. Most successful players use alpha-beta.
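The pruning rules above can be sketched by threading α and β through the depth-first recursion. This is a minimal sketch, assuming the same hypothetical nested-list tree encoding as plain minimax (max at the root); the optional `visited` list is an illustrative addition that records which leaves actually get evaluated.

```python
import math
from typing import List, Optional, Union

# Hypothetical encoding: a leaf is a terminal score, an internal node a list
# of children; max moves at the root.
Tree = Union[float, list]


def alphabeta(node: Tree, alpha: float = -math.inf, beta: float = math.inf,
              maximizing: bool = True,
              visited: Optional[List[float]] = None) -> float:
    """Minimax value with alpha-beta pruning.

    alpha: best value max can already guarantee on the path to this node.
    beta:  best value min can already guarantee on the path to this node.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)        # record that this leaf was evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            if value >= beta:           # min above will never allow this line
                return value            # prune the remaining children
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        if value <= alpha:              # max above already has something as good
            return value                # prune the remaining children
        beta = min(beta, value)
    return value


seen: List[float] = []
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, visited=seen), seen)  # → 3 [3, 12, 8, 2, 14, 5, 2]
```

The answer matches plain minimax, but the leaves 4 and 6 are never evaluated: once the second min node finds a 2, it cannot beat the 3 max already has.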
Solution: at a cutoff depth, substitute an evaluation function (a heuristic estimate of the position's value) for the true backed-up value.
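Terminating early with an evaluation function changes only the base case of minimax. A minimal sketch, assuming a hypothetical take-away game (remove 1, 2, or 3 stones; whoever takes the last stone wins) and a deliberately crude evaluation function that scores every non-terminal cutoff position as 0. It shows a too-shallow cutoff returning only the heuristic guess, while a deeper one finds the true value.

```python
from typing import Callable, List


def h_minimax(state, depth: int, maximizing: bool,
              successors: Callable, is_terminal: Callable,
              utility: Callable, evaluate: Callable) -> float:
    """Depth-limited minimax: at the cutoff, an evaluation function stands in
    for the true backed-up value."""
    if is_terminal(state):
        return utility(state, maximizing)
    if depth == 0:
        return evaluate(state, maximizing)     # cutoff: estimate, don't search
    values = [h_minimax(s, depth - 1, not maximizing, successors,
                        is_terminal, utility, evaluate)
              for s in successors(state)]
    return max(values) if maximizing else min(values)


# Hypothetical toy game; scores are from max's point of view.
def successors(n: int) -> List[int]:
    return [n - k for k in (1, 2, 3) if k <= n]


def is_terminal(n: int) -> bool:
    return n == 0


def utility(n: int, maximizing: bool) -> float:
    # The previous player took the last stone, so the player to move has lost.
    return -1.0 if maximizing else 1.0


def evaluate(n: int, maximizing: bool) -> float:
    # Deliberately crude heuristic: no opinion about non-terminal positions.
    return 0.0


for depth in (1, 3, 5):
    print(depth, h_minimax(5, depth, True, successors, is_terminal,
                           utility, evaluate))
# → 1 0.0
# → 3 1.0
# → 5 1.0
```

With 5 stones the player to move wins by taking 1; a depth-1 search cannot see that far and falls back on the heuristic's 0.0.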
480 special-purpose chips; 200 million positions/sec; search depth 6-8 moves (up to 20).
Horizon Effects
More sophisticated strategies: continually estimate value, adaptively explore, and use random rollouts to evaluate.
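Step 1: path selection.
Starting at the root, repeatedly descend to the child i that maximizes the UCT score:
$$\frac{w_i}{n_i} + c\sqrt{\frac{\log n}{n_i}}$$
where w_i is the number of wins recorded through child i, n_i is the number of visits to child i, n is the number of visits to its parent, and c is a constant trading off exploration against exploitation.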
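The UCT score can be sketched as a one-line function plus an argmax over children. A minimal sketch, assuming each child is summarized by a (w_i, n_i) pair; the default c = √2 is a common choice for rewards in [0, 1], not a value taken from these slides.

```python
import math
from typing import List, Tuple


def uct_score(wins: float, visits: int, parent_visits: int,
              c: float = math.sqrt(2)) -> float:
    """UCT score w_i/n_i + c*sqrt(log(n)/n_i) for one child."""
    if visits == 0:
        return math.inf          # unvisited children are tried first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(stats: List[Tuple[float, int]], parent_visits: int,
                 c: float = math.sqrt(2)) -> int:
    """Index of the child with the highest UCT score; stats holds (w_i, n_i)."""
    return max(range(len(stats)),
               key=lambda i: uct_score(stats[i][0], stats[i][1],
                                       parent_visits, c))


children = [(3, 4), (5, 6)]      # (wins, visits) of two children; parent n = 10
print(select_child(children, 10))         # → 0
print(select_child(children, 10, c=0.0))  # → 1
```

With the exploration bonus, the less-visited child (win rate 0.75) is chosen over the better-looking one (about 0.83); with c = 0 selection becomes purely greedy.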
Step 2: expansion. Add a new child node below the selected node.
Step 3: rollout. From the new node, play (e.g. random) moves until a terminal state is reached.
Step 4: update. Propagate the terminal result back up the path, updating the win and visit counts (w_i, n_i) of every node on it.
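The four steps together form the Monte Carlo tree search loop with UCT selection. A minimal sketch under stated assumptions: the take-away game (remove 1, 2, or 3 stones; taking the last stone wins) is a hypothetical stand-in for a real game, rollouts are uniformly random, c = √2, and returning the most-visited root move is a common convention rather than something the slides specify.

```python
import math
import random


# Hypothetical toy game: remove 1, 2, or 3 stones; taking the last stone wins.
def legal_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones                  # stones left in this position
        self.parent = parent
        self.move = move                      # move that led here from the parent
        self.children = []
        self.untried = legal_moves(stones)    # moves not yet expanded
        self.wins = 0.0                       # wins for the player who moved INTO this node
        self.visits = 0

    def uct_child(self, c):
        # UCT: mean win rate plus exploration bonus (every child has visits >= 1).
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))


def rollout(stones, rng):
    """Uniformly random playout; returns 1.0 if the player to move at
    `stones` ends up taking the last stone, else 0.0."""
    mover = 0                                 # 0 = player to move at the start
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1.0 if mover == 0 else 0.0  # `mover` just took the last stone
        mover = 1 - mover
    return 0.0                                # no stones left: player to move has lost


def mcts(stones, iterations, c=math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # Step 1: selection. Descend by UCT while the node is fully expanded.
        while not node.untried and node.children:
            node = node.uct_child(c)
        # Step 2: expansion. Add one untried child (skipped at terminal nodes).
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # Step 3: rollout. The result is for the player to move at `node`,
        # so the player who moved into `node` gets the complement.
        reward = 1.0 - rollout(node.stones, rng)
        # Step 4: update. Back up along the path, flipping perspective each level.
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward
            node = node.parent
    # Recommend the most-visited root move.
    return max(root.children, key=lambda ch: ch.visits).move


print(mcts(5, 3000))
```

From 5 stones the winning move is to take 1 (leaving 4); with enough iterations the search should concentrate its visits on that move.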
World champion level:
“Heads-up Limit Hold’em Poker is Solved”, Bowling et al., Science, January 2015.
Perform well:
Far off: Go
Lee Sedol vs. AlphaGo (Google DeepMind)
“ … board games are more or less done and it's time to move