ARTIFICIAL INTELLIGENCE
Lecturer: Silja Renooij
Decision making: opponent based
Utrecht University The Netherlands
These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
INFOB2KI 2019-2020
2
– Perfect information games
– Mini‐max algorithm, Alpha‐beta pruning
– Best response, Nash equilibrium
– Imperfect information games
– Incomplete information games
3
Developed to explain the optimal strategy in two‐person (nowadays n ≥ 2) interactions.
– Zero‐sum games
(your win == opponent's loss)
– Nonzero‐sum games
– Incomplete information (Bayesian games)
4
Better term: constant‐sum game
Examples of zero‐sum games:
total payoff is 0, regardless of the outcome
total payoff is 1, regardless of the outcome
depending on performance
Example of non‐zero‐sum game:
total payoff is either 3 or 2, depending on the outcome.
5
Complete information games:
upon making a move, players know full history of the game, all moves by all players, all payoffs, etc
players know all outcomes/payoffs, types of other players & their strategies, but are unaware of (or unsure about) possible actions of other players
‐ simultaneous moves: what action will others choose?
‐ (temporarily) shielded attributes: who has which cards?
Complete information games can be deterministic or involve chance.
6
Incomplete information games:
Uncertainty about game being played: factors outside the rules
the outcome of the game
strategies, payoffs or preferences
Incomplete information games can be deterministic or involve chance.
7
                        Deterministic                                 Chance
Perfect information     Chess, checkers, go, othello, Tic‐tac‐toe     Backgammon, monopoly
Imperfect information   Battleships, Minesweeper                      Bridge, poker, scrabble
8
NB(!) textbook says randomness is the difference between perfect and imperfect. Other sources state imperfect == incomplete …. be aware of this!
9
[Figure: game tree with alternating levels me / you / me / you; leaf values are my rewards]
Zero‐sum or Non‐zero sum?
10
information games
– Traverse game tree in DFS‐like manner
– 'bubble up' values of evaluation function: maximise if my turn, minimise for opponent's turn
with highest minimax value = best achievable payoff against best play
finish my best move, for every move of opponent.
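The depth-first 'bubble up' idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the nested-list tree encoding, the function names and the leaf values are assumptions chosen for the example.

```python
from typing import List, Union

# Assumed encoding: a leaf is an int (payoff for MAX), an internal node is a list of children.
Tree = Union[int, List["Tree"]]

def minimax(node: Tree, maximizing: bool) -> int:
    """Return the minimax value of `node` using a depth-first traversal."""
    if isinstance(node, int):            # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    # Maximise on my turn, minimise on the opponent's turn.
    return max(values) if maximizing else min(values)

def best_move(root: List[Tree]) -> int:
    """Index of the root move with the highest minimax value (MAX to move)."""
    values = [minimax(child, False) for child in root]
    return values.index(max(values))

# Hypothetical two-ply tree: MAX chooses a branch, MIN then chooses a leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True), best_move(tree))   # prints: 3 0
```

The first branch is chosen because its worst case (3) is better than the worst case of the other two branches (2 and 2).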
11
12
NB book uses circles and squares instead of triangles!
13
depth m
(in general, worst case, we cannot improve on this, so O(bm) suggested in textbook AI4G must be a typo)
(if depth‐first exploration; can be reduced to O(m) with backtracking variant of DFS which generates one successor at a time, rather than all )
For chess, b ≈ 35 and m ≈ 100 for "reasonable" games → exact solution completely infeasible
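To get a feeling for these numbers, the short sketch below evaluates the worst-case node count b^m and the depth-first memory requirement b·m for the chess figures quoted above. The helper name `minimax_costs` is hypothetical, and a uniform tree is assumed.

```python
import math

def minimax_costs(b: int, m: int) -> tuple:
    """Worst-case node count b^m and DFS memory b*m for a uniform tree (assumption)."""
    return b ** m, b * m

nodes, memory = minimax_costs(35, 100)
# Print only the order of magnitude of the node count; the exact integer is huge.
print(f"about 10^{math.log10(nodes):.0f} nodes to visit, about {memory} nodes in memory")
```

Memory is not the problem here; the number of nodes to visit is.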
14
to n in play (MAX will avoid it) prune that leaf/subtree
current path
m n
15
Minimax, augmented with upper‐ and lowerbounds
α = −∞ lowerbound on achievable score
β = ∞ upperbound on achievable score
16
α = −∞, β = ∞ α = −∞, β = ∞
function MIN-VALUE is similarly extended:
if v ≤ α then return v
β ← MIN(β, v)
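A minimal Python sketch of the two mutually recursive functions follows. It assumes the same nested-list tree encoding as a plain minimax (leaf = int, internal node = list of children); the `visited` list is an addition for illustration only, so that the pruned leaves can be inspected afterwards.

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]   # assumed encoding: leaf = payoff for MAX

def max_value(node: Tree, alpha: float, beta: float, visited: list) -> float:
    if isinstance(node, int):
        visited.append(node)          # record every leaf that is actually evaluated
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta, visited))
        if v >= beta:                 # MIN higher up will never allow this branch
            return v
        alpha = max(alpha, v)         # alpha is updated in MAX's own move
    return v

def min_value(node: Tree, alpha: float, beta: float, visited: list) -> float:
    if isinstance(node, int):
        visited.append(node)
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta, visited))
        if v <= alpha:                # MAX higher up already has something better
            return v
        beta = min(beta, v)           # beta is updated in the opponent's (MIN) move
    return v

# Hypothetical two-ply tree; the leaf values are chosen for illustration.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = max_value(tree, -math.inf, math.inf, visited)
print(value, visited)   # prints: 3 [3, 12, 8, 2, 14, 5, 2]
```

In the second branch the leaves 4 and 6 are never evaluated: after seeing 2, MIN can hold MAX to at most 2 there, which is worse than the 3 that MAX already has.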
17
α = −∞ β = 3 α = 3 β = ∞
α = best score till now (lowerbound), updated in own (MAX) move
β = upperbound on achievable score, updated in opponent's (MIN) move
18
α = −∞ β = 3 α = 3 β = ∞ α = −∞ β = 2
19
Prune or continue?
α = −∞ β = 3 α = 3 β = ∞ α = −∞ β = 2 α = −∞ β = 14
20
Prune or continue?
α = −∞ β = 3 α = 3 β = ∞ α = −∞ β = 2 α = −∞ β = 5
21
α = −∞ β = 3 α = −∞ β = 2 α = 3 β = ∞ α = −∞ β = 2
22
effective branching factor is √b, which allows the search depth to double for the same cost
which computations are relevant (a form of meta‐reasoning)
23
Suppose we have 100 secs and explore 10^4 nodes/sec
→ 10^6 nodes can be explored per move
What if we have too little time to reach terminal states (=utility function)?
Standard approach combines:
e.g., depth limit (perhaps add quiescence search: disregard positions that
are unlikely to exhibit wild swings in value in near future)
= estimated desirability of position
24
Evaluation function (cf heuristic with A*):
given position
For chess, typically linear weighted sum of features
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s)
e.g., w_1 = 9 with f_1(s) = (# white queens) – (# black queens), etc.
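A linear weighted sum of this kind is easy to write down. In the sketch below only the queen weight of 9 comes from the text above; the other weights, the piece-count dictionaries and the function name `evaluate` are assumptions for illustration.

```python
from typing import Dict

# Feature weights w_i. Only "queen": 9 is taken from the slide; the rest are assumed values.
WEIGHTS: Dict[str, float] = {"queen": 9.0, "rook": 5.0, "bishop": 3.0, "knight": 3.0, "pawn": 1.0}

def evaluate(white: Dict[str, int], black: Dict[str, int]) -> float:
    """Eval(s) = sum_i w_i * f_i(s), with f_i(s) = (# white pieces i) - (# black pieces i)."""
    return sum(w * (white.get(piece, 0) - black.get(piece, 0))
               for piece, w in WEIGHTS.items())

# Hypothetical position: white is a queen up, black is two pawns up.
white = {"queen": 1, "rook": 2, "pawn": 6}
black = {"queen": 0, "rook": 2, "pawn": 8}
print(evaluate(white, black))   # 9*1 + 5*0 + 1*(-2) = 7.0
```

A positive value means the position is estimated to favour white, a negative value black.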
25
MinimaxCutoff is identical to MinimaxValue except:
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?
b^m = 10^6, b = 35 → m = 4
4‐ply lookahead is a hopeless chess player!
– 4‐ply ≈ human novice
– 8‐ply ≈ typical PC, human master
– 12‐ply ≈ Deep Blue, Kasparov
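The arithmetic behind the 4-ply figure can be checked by solving b^m = budget for m. The sketch below does this for the node budget of 100 s at 10^4 nodes/s, once with the full branching factor b = 35 and once with the effective branching factor √b that alpha-beta achieves in the best case. The helper name `reachable_ply` is hypothetical.

```python
import math

def reachable_ply(node_budget: float, branching: float) -> float:
    """Depth m such that branching**m equals the node budget."""
    return math.log(node_budget) / math.log(branching)

budget = 100 * 10**4                              # 100 s at 10^4 nodes/s = 10^6 nodes
plain = reachable_ply(budget, 35)                 # plain minimax
pruned = reachable_ply(budget, math.sqrt(35))     # best-case alpha-beta (assumption)
print(f"minimax: {plain:.1f} ply, alpha-beta (best case): {pruned:.1f} ply")
```

Halving the exponent of the branching factor doubles the reachable depth, which moves the search from roughly novice level towards the 8-ply range.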
26
Marion Tinsley in 1994. Used a pre‐computed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
six‐game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.
are too good.
suggest plausible moves.
2016/2017: AlphaGo beats world's number 1 and 2 using deep learning.
27
28
loose a coconut (1 per tree)
the tree, shaking and climbing down.
exercise.
29
LM eats some before BM gets down → BM gets 6 C, LM gets 4 C
BM eats almost all before LM gets down → BM gets 9 C, LM gets 1 C
BM is first to hog coconut → BM gets 7 C, LM gets 3 C
How should the monkeys each act so as to maximize their own calorie gain?
30
Strategies are determined prior to 'playing the game'
Assume BM will be allowed to move first.
BM has two (single action) strategies:
– wait (w), or
– climb (c)
LM has four strategies:
– If BM waits, then wait; if BM climbs then wait (xw)
– If BM waits, then wait; if BM climbs then climb (xx)
– If BM waits, then climb; if BM climbs then wait (x¬x)
– If BM waits, then climb; if BM climbs then climb (xc)
31
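Game tree, BM moves first, payoffs (BM,LM):
Big monkey w → Little monkey w: 0,0
Big monkey w → Little monkey c: 9,1
Big monkey c → Little monkey w: 6-2,4
Big monkey c → Little monkey c: 7-2,3
What should Big Monkey do?
If BM waits, will outcome be at least that of climbing, regardless of what LM does?
No: 0 vs 4, 9 vs 5 ….
What if we believe LM will act rationally?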
32
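Game tree, BM moves first, payoffs (BM,LM):
Big monkey w → Little monkey w: 0,0
Big monkey w → Little monkey c: 9,1
Big monkey c → Little monkey w: 6-2,4
Big monkey c → Little monkey c: 7-2,3
What should Big Monkey do? BM should wait (w)
What about Little Monkey? Opposite of BM (x¬x)
(even though we'll never get to the right side of the game tree unless BM errs)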
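The reasoning "assume LM acts rationally, then pick BM's best move" is backward induction on the game tree. A minimal sketch, using the payoffs of this game; the dictionary layout and the function names are assumptions for illustration.

```python
# Leaf payoffs (BM, LM), indexed by (BM action, LM action); climbing costs BM 2 C.
PAYOFF = {
    ("w", "w"): (0, 0),
    ("w", "c"): (9, 1),
    ("c", "w"): (6 - 2, 4),
    ("c", "c"): (7 - 2, 3),
}
ACTIONS = ("w", "c")

def lm_reply(bm_action: str) -> str:
    """LM moves second and picks the action that maximises its own payoff."""
    return max(ACTIONS, key=lambda a: PAYOFF[(bm_action, a)][1])

def bm_choice() -> str:
    """BM moves first and anticipates LM's rational reply to each of its actions."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, lm_reply(a))][0])

bm = bm_choice()
print(bm, lm_reply("w"), lm_reply("c"), PAYOFF[(bm, lm_reply(bm))])
# prints: w c w (9, 1)
```

LM's replies are "climb if BM waits, wait if BM climbs", i.e. the strategy x¬x, and BM's best first move is to wait.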
33
What should BM do? wait (w)
What about Little Monkey? Opposite of BM (x¬x)
The game‐tree representation of a game is called extensive form (1), as opposed to normal form (2):
Payoffs (BM,LM):
            LM
         xc    xw    xx    x¬x
BM  c    5,3   4,4   5,3   4,4
    w    9,1   0,0   0,0   9,1
(1) game tree that explicitly shows the players' moves and resulting payoffs
(2) table showing payoffs of outcomes of simultaneous 'decisions' (strategies)
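The normal form can be derived mechanically from the extensive form: each LM strategy is a complete plan that fixes a reply to every possible BM move. The sketch below builds the table from the leaf payoffs; the data layout and the function name `normal_form` are assumptions for illustration.

```python
# Leaf payoffs (BM, LM) of the game tree, indexed by (BM action, LM action).
PAYOFF = {("w", "w"): (0, 0), ("w", "c"): (9, 1), ("c", "w"): (4, 4), ("c", "c"): (5, 3)}

# Each LM strategy maps BM's move to LM's reply.
LM_STRATEGIES = {
    "xc":  {"w": "c", "c": "c"},   # always climb
    "xw":  {"w": "w", "c": "w"},   # always wait
    "xx":  {"w": "w", "c": "c"},   # do the same as BM
    "x¬x": {"w": "c", "c": "w"},   # do the opposite of BM
}

def normal_form() -> dict:
    """Return table[BM strategy][LM strategy] = (BM payoff, LM payoff)."""
    return {bm: {name: PAYOFF[(bm, reply[bm])] for name, reply in LM_STRATEGIES.items()}
            for bm in ("c", "w")}

table = normal_form()
for bm, row in table.items():
    print(bm, row)
```

Different LM strategies can lead to the same leaf for a given BM move, which is why entries repeat within a row.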
34
For Little Monkey x¬x is a weakly dominant strategy; BM does not have a dominant strategy:
Payoffs (BM,LM):
            LM
         xc    xw    xx    x¬x
BM  c    5,3   4,4   5,3   4,4
    w    9,1   0,0   0,0   9,1
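The dominance claims can be checked directly on the table. The sketch below uses the standard definition of weak dominance (never worse against any opponent strategy, strictly better against at least one), which is assumed here; the function names are hypothetical.

```python
# Normal form, TABLE[BM strategy][LM strategy] = (BM payoff, LM payoff).
TABLE = {
    "c": {"xc": (5, 3), "xw": (4, 4), "xx": (5, 3), "x¬x": (4, 4)},
    "w": {"xc": (9, 1), "xw": (0, 0), "xx": (0, 0), "x¬x": (9, 1)},
}
LM_STRATS = list(TABLE["c"])
BM_STRATS = list(TABLE)

def weakly_dominates(diffs) -> bool:
    """True if all payoff differences are >= 0 and at least one is > 0."""
    diffs = list(diffs)
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

def lm_dominates(s: str, t: str) -> bool:
    return weakly_dominates(TABLE[bm][s][1] - TABLE[bm][t][1] for bm in BM_STRATS)

def bm_dominates(s: str, t: str) -> bool:
    return weakly_dominates(TABLE[s][lm][0] - TABLE[t][lm][0] for lm in LM_STRATS)

lm_dominant = [s for s in LM_STRATS if all(lm_dominates(s, t) for t in LM_STRATS if t != s)]
bm_dominant = [s for s in BM_STRATS if all(bm_dominates(s, t) for t in BM_STRATS if t != s)]
print(lm_dominant, bm_dominant)   # prints: ['x¬x'] []
```

BM has no dominant strategy because waiting is better against xc and x¬x, while climbing is better against xw and xx.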
35
Consider a player’s strategies s1 and s2. If, regardless of the other players’ strategy:
A player has a dominant strategy s if s dominates all the player’s
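Game tree, LM moves first, payoffs (LM,BM):
Little monkey w → Big monkey w: 0,0
Little monkey w → Big monkey c: 4,4
Little monkey c → Big monkey w: 1,9
Little monkey c → Big monkey c: 3,5
What should Little Monkey do? LM should wait (w)
What about Big Monkey? Opposite of LM (x¬x)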
36
– given what the other player does, this is the best thing to do.
response is called a Nash equilibrium.
– No one can unilaterally change and improve things.
– but not necessarily in terms of pure strategies!
1 finite in #players and #pure strategies;
pure = not mixed (see imperfect information games)
37
For each strategy of one player there is a best response of the
BM moves first the following Nash equilibria (BM, LM):
Why isn't (c, x¬x) a Nash equilibrium?
What if the monkeys have to move simultaneously?
Payoffs (BM,LM):
            LM
         xc    xw    xx    x¬x
BM  c    5,3   4,4   5,3   4,4
    w    9,1   0,0   0,0   9,1
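Pure-strategy Nash equilibria can be found by brute force: a cell is an equilibrium if neither player can gain by changing only their own strategy. A minimal sketch on the table above; the function name `is_nash` is hypothetical.

```python
# Normal form for "BM moves first", TABLE[BM][LM] = (BM payoff, LM payoff).
TABLE = {
    "c": {"xc": (5, 3), "xw": (4, 4), "xx": (5, 3), "x¬x": (4, 4)},
    "w": {"xc": (9, 1), "xw": (0, 0), "xx": (0, 0), "x¬x": (9, 1)},
}

def is_nash(bm: str, lm: str) -> bool:
    """True if neither player can improve by a unilateral change of strategy."""
    bm_pay, lm_pay = TABLE[bm][lm]
    bm_ok = all(TABLE[alt][lm][0] <= bm_pay for alt in TABLE)        # BM's alternatives
    lm_ok = all(TABLE[bm][alt][1] <= lm_pay for alt in TABLE[bm])    # LM's alternatives
    return bm_ok and lm_ok

equilibria = [(bm, lm) for bm in TABLE for lm in TABLE[bm] if is_nash(bm, lm)]
print(equilibria)   # prints: [('c', 'xw'), ('w', 'xc'), ('w', 'x¬x')]
```

The pair (c, x¬x) is missing from the result: given x¬x, BM gets 4 by climbing but 9 by waiting, so BM would deviate.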
38
39
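LM/BM has to choose before he sees BM/LM move….
two obvious Nash equilibria: (c,w), (w,c)
A third Nash equilibrium, if both use a mixed strategy: "choose between c & w with p=0.5"
→ each outcome has p=0.25
Expected payoff (BM,LM) = (4.5, 2)
Game tree (LM does not observe BM's move), payoffs (BM,LM):
Big monkey w → Little monkey w: 0,0
Big monkey w → Little monkey c: 9,1
Big monkey c → Little monkey w: 4,4
Big monkey c → Little monkey c: 5,3
Normal form, payoffs (BM,LM):
            LM
         c     w
BM  c    5,3   4,4
    w    9,1   0,0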
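The mixed equilibrium can be verified with the usual indifference argument: each player mixes so that the other player is indifferent between climbing and waiting. The sketch below solves the two indifference equations for this 2×2 game with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction

# Simultaneous-move game, PAYOFF[BM action][LM action] = (BM payoff, LM payoff).
PAYOFF = {"c": {"c": (5, 3), "w": (4, 4)},
          "w": {"c": (9, 1), "w": (0, 0)}}

def lm_climb_prob() -> Fraction:
    """q = P(LM climbs) that makes BM indifferent between c and w."""
    a = {bm: {lm: PAYOFF[bm][lm][0] for lm in "cw"} for bm in "cw"}
    # Solve a[c][c]*q + a[c][w]*(1-q) = a[w][c]*q + a[w][w]*(1-q) for q.
    return Fraction(a["w"]["w"] - a["c"]["w"],
                    a["c"]["c"] - a["c"]["w"] - a["w"]["c"] + a["w"]["w"])

def bm_climb_prob() -> Fraction:
    """p = P(BM climbs) that makes LM indifferent between c and w."""
    b = {bm: {lm: PAYOFF[bm][lm][1] for lm in "cw"} for bm in "cw"}
    # Solve b[c][c]*p + b[w][c]*(1-p) = b[c][w]*p + b[w][w]*(1-p) for p.
    return Fraction(b["w"]["w"] - b["w"]["c"],
                    b["c"]["c"] - b["w"]["c"] - b["c"]["w"] + b["w"]["w"])

def expected_payoff(p: Fraction, q: Fraction) -> tuple:
    """Expected (BM, LM) payoff when BM climbs with prob. p and LM with prob. q."""
    bm_prob = {"c": p, "w": 1 - p}
    lm_prob = {"c": q, "w": 1 - q}
    return tuple(sum(bm_prob[x] * lm_prob[y] * PAYOFF[x][y][i] for x in "cw" for y in "cw")
                 for i in (0, 1))

p, q = bm_climb_prob(), lm_climb_prob()
print(p, q, expected_payoff(p, q))
```

Both probabilities come out as 1/2, and the expected payoff is (9/2, 2), matching the (4.5, 2) stated above.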
40
is preferred by all players
is required
each monkey should do
– Mixed strategy is optimal
number of possible actions:
– E.g. using dominance
41
42
? ?
Each player can cooperate or defect
cooperate defect defect 0,-10
Rob Carl cooperate Rob,Carl
43
Each player can cooperate or defect
cooperate defect defect 0,-10
Rob Carl cooperate
Defecting is a (strictly) dominant strategy for Rob
Rob,Carl
44
Each player can cooperate or defect
cooperate defect defect 0,-10
Rob Carl cooperate
Defecting is also a dominant strategy for Carl
Result is not optimal!
Rob,Carl
45
dominant strategy…
– One‐shot game
– Inability to trust your opponent (incomplete information: is your opponent selfish or nice?)
– Perfect rationality
46
illustrate several important points about AI
approximate
about
47