SLIDE 1

ARTIFICIAL INTELLIGENCE

Lecturer: Silja Renooij

Decision making: opponent based

Utrecht University The Netherlands

These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

INFOB2KI 2019-2020

SLIDE 2

Game theory

SLIDE 3

Outline

  • Game theory: rules for defeating opponents
    – Perfect information games
      • Deterministic turn-taking games
        – Minimax algorithm, Alpha-beta pruning
        – Best response, Nash equilibrium
    – Imperfect information games
      • Simultaneous-move games: Mixed strategy
    – Incomplete information games
      • Prisoner’s dilemma

SLIDE 4

Game Theory

Developed to explain the optimal strategy in two-person (nowadays n ≥ 2) interactions.

  • Initially, von Neumann and Morgenstern
    – Zero-sum games (your win == opponent’s loss)
  • John Nash
    – Nonzero-sum games
  • Harsanyi, Selten
    – Incomplete information (Bayesian games)

SLIDE 5

Zero-sum games

Better term: constant-sum game.

Examples of zero-sum games:

  • 2-player; payoffs: 1 for win, −1 for loss, 0 for draw
    ⇒ total payoff is 0, regardless of the outcome
  • 2-player; payoffs: 1 for win, 0 for loss, ½ for draw
    ⇒ total payoff is 1, regardless of the outcome
  • 3-player; payoffs: distribute 3 points over players, depending on performance

Example of a non-zero-sum game:

  • 2-player; payoffs: 3 for win, 0 for loss, 1 for draw
    ⇒ total payoff is either 3 or 2, depending on the outcome.

SLIDE 6

Game types

Complete information games:

  • Perfect information games
    Upon making a move, players know the full history of the game: all moves by all players, all payoffs, etc.
  • Imperfect information games
    Players know all outcomes/payoffs, the types of other players and their strategies, but are unaware of (or unsure about) possible actions of other players:
    – simultaneous moves: what action will others choose?
    – (temporarily) shielded attributes: who has which cards?

Complete information games can be deterministic or involve chance.

SLIDE 7

Game types

Incomplete information games:

Uncertainty about the game being played: factors outside the rules of the game, not known to one or more players, may affect the outcome of the game.

  • E.g. players may not know other players’ "type", their strategies, payoffs or preferences.

Incomplete information games can be deterministic or involve chance.

SLIDE 8

Complete information games

                        Deterministic                               Chance
Perfect information     Chess, checkers, go, othello, tic-tac-toe   Backgammon, monopoly
Imperfect information   Battleships, Minesweeper                    Bridge, poker, scrabble

NB(!) the textbook says randomness is the difference between perfect and imperfect information; other sources state imperfect == incomplete … be aware of this!

SLIDE 9

Deterministic, two-player, turn-taking, perfect-information games

SLIDE 10

Game tree

  • alternates between two players (levels labelled me / you / me / you …)
  • represents all possibilities from the perspective of one player (‘me’) from the current root; the leaves show my rewards.

Zero-sum or non-zero-sum?

SLIDE 11
Minimax

  • Compute value of perfect play for deterministic, perfect information games:
    – traverse the game tree in a DFS-like manner
    – ‘bubble up’ values of the evaluation function: maximise if my turn, minimise for the opponent’s turn
  • Serves for selecting the next move: choose the move to the position with the highest minimax value = best achievable payoff against best play
  • May also serve for finding an optimal strategy = from start to finish my best move, for every move of the opponent.

SLIDE 12

Minimax: example


NB: the book uses circles and squares instead of triangles!

SLIDE 13

Minimax algorithm
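The algorithm itself appeared as a figure on this slide. Below is a minimal sketch of the same recursion, assuming the game tree is given explicitly as nested lists with numeric leaves (leaf values are utilities from MAX's perspective; this representation and the example tree are illustrative assumptions, not from the course):

```python
def minimax(node, maximizing=True):
    """Minimax value of `node`: nested lists are internal nodes,
    numbers are terminal utilities (from MAX's point of view)."""
    if isinstance(node, (int, float)):
        return node                        # terminal state: return its utility
    # value of each child, with the turn alternating at every level
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The classic three-branch example: a MAX root over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert minimax(tree) == 3
```

Each level of nesting alternates between MAX and MIN, matching the ‘bubble up’ description on the previous slide.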

SLIDE 14

Properties of minimax

  • Complete? Yes (if the tree is finite)
  • Optimal? Yes (against an optimal opponent)
  • Time complexity? O(b^m) for branching factor b and max depth m
    (in general, worst case, we cannot improve on this, so the O(bm) suggested in textbook AI4G must be a typo)
  • Space complexity? O(bm)
    (for depth-first exploration; can be reduced to O(m) with a backtracking variant of DFS which generates one successor at a time, rather than all)

For chess, with b ≈ 35, m ≈ 100 for "reasonable" games ⇒ exact solution completely infeasible

SLIDE 15

Solution: pruning

  • Idea: if position m is better for me than position n, we will never actually get to n in play (MAX will avoid it) ⇒ prune that leaf/subtree
  • Let α be the best (= highest) value (to MAX) found so far on the current path
  • Define β similarly for MIN: best (= lowest) value found so far

SLIDE 16

Alpha-Beta (α-β) pruning

Minimax, augmented with upper and lower bounds

  • Init: for all non-leaf nodes set
    α = −∞  (lower bound on achievable score)
    β = +∞  (upper bound on achievable score)
  • Upwards: update α in a MAX move; update β in a MIN move

SLIDE 17

α-β Pruning

function MIN-VALUE is similarly extended: if v ≤ α then return v; β ← MIN(β, v)
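As a sketch, the extended MAX-VALUE / MIN-VALUE scheme can be written over the same kind of nested-list game tree as before (the list representation and function names are my own, not the course's):

```python
import math

def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning; leaves are MAX utilities."""
    if isinstance(node, (int, float)):
        return node                      # terminal state: return its utility
    if maximizing:                       # MAX-VALUE
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            if v >= beta:                # MIN above will never allow this
                return v                 # prune remaining children
            alpha = max(alpha, v)
        return v
    else:                                # MIN-VALUE
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta))
            if v <= alpha:               # MAX above will never allow this
                return v                 # prune remaining children
            beta = min(beta, v)
        return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert alphabeta(tree) == 3              # same value as plain minimax
```

Note that pruning changes only how much of the tree is explored, not the value returned.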

SLIDE 18

α-β pruning example

Node labels in figure: (α = −∞, β = 3), (α = 3, β = ∞)

α = best score till now (lower bound), updated in own (MAX) move
β = upper bound on achievable score, updated in opponent’s (MIN) move

SLIDE 19

α-β pruning example

Node labels in figure: (α = −∞, β = 3), (α = 3, β = ∞), (α = −∞, β = 2)

SLIDE 20

α-β pruning example

Prune or continue?

Node labels in figure: (α = −∞, β = 3), (α = 3, β = ∞), (α = −∞, β = 2), (α = −∞, β = 14)

SLIDE 21

α-β pruning example

Prune or continue?

Node labels in figure: (α = −∞, β = 3), (α = 3, β = ∞), (α = −∞, β = 2), (α = −∞, β = 5)

SLIDE 22

α-β pruning example

Node labels in figure: (α = −∞, β = 3), (α = −∞, β = 2), (α = 3, β = ∞), (α = −∞, β = 2)

SLIDE 23

Properties of α-β

  • Pruning does not affect the final result!
  • Good move ordering improves effectiveness of pruning
  • With “perfect ordering”, time complexity = O(b^(m/2))
    ⇒ effective branching factor is √b ⇒ allows search depth to double for the same cost
  • A simple example of the value of reasoning about which computations are relevant (a form of meta-reasoning)

SLIDE 24

Practical feasibility: resource limits

Suppose we have 100 secs and explore 10^4 nodes/sec
⇒ 10^6 nodes can be explored per move

What if we have too little time to reach terminal states (= utility function)? The standard approach combines:

  • cutoff test:
    e.g., a depth limit (perhaps add quiescence search: disregard positions that are unlikely to exhibit wild swings in value in the near future)
  • evaluation function
    = estimated desirability of a position

SLIDE 25

Evaluation functions

Evaluation function (cf. heuristic with A*):

  • Returns an estimate of the expected utility of the game from a given position
  • Must agree with the utility function on terminal nodes
  • Must not take too long

For chess, typically a linear weighted sum of features:
Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
e.g., w1 = 9 with f1(s) = (# white queens) − (# black queens), etc.
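As an illustration, such a weighted sum is trivial to compute; the features and weights below are toy material-count values of my own choosing, not a real chess evaluator:

```python
def eval_position(features, weights):
    """Eval(s) = w1*f1(s) + ... + wn*fn(s): linear weighted feature sum."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical features: (queen diff, rook diff, pawn diff), white minus black.
weights  = (9, 5, 1)
features = (1, -1, 2)          # one extra queen, one rook down, two pawns up
assert eval_position(features, weights) == 6   # 9*1 + 5*(-1) + 1*2
```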

SLIDE 26

Cutting off search

MinimaxCutoff is identical to MinimaxValue except:
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval

Does it work in practice? b^m = 10^6, b = 35 ⇒ m ≈ 4
4-ply lookahead is a hopeless chess player!

– 4-ply ≈ human novice
– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue, Kasparov
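A depth-limited variant might be sketched as follows; the tree representation and the average-of-leaves evaluation function are toy stand-ins of my own, not the course's Eval:

```python
def minimax_cutoff(node, depth, maximizing, cutoff, evalfn):
    """Minimax where Cutoff? replaces Terminal? and Eval replaces Utility."""
    if isinstance(node, (int, float)):
        return node                              # true terminal: real utility
    if depth >= cutoff:
        return evalfn(node)                      # cutoff: estimate instead
    values = [minimax_cutoff(c, depth + 1, not maximizing, cutoff, evalfn)
              for c in node]
    return max(values) if maximizing else min(values)

def avg_eval(node):
    """Crude illustrative Eval: average of all leaves below the node."""
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, (int, float)):
            leaves.append(n)
        else:
            stack.extend(n)
    return sum(leaves) / len(leaves)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
# Cutting off at depth 1 evaluates the three MIN nodes by their leaf average.
assert minimax_cutoff(tree, 0, True, 1, avg_eval) == sum([3, 12, 8]) / 3
```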

SLIDE 27

Deterministic games in practice

  • Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a pre-computed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.

  • Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses a very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.

  • Othello: human champions refuse to compete against computers, who are too good.

  • Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves. 2016/2017: AlphaGo beats the world’s number 1 and 2 using deep learning.

SLIDE 28

Strategies and equilibria

SLIDE 29

Big Monkey, Little Monkey example

  • Monkeys usually eat ground-level fruit
  • Occasionally they climb a tree to shake loose a coconut (1 per tree)
  • A coconut yields 10 Calories
  • Big Monkey expends 2 Calories climbing up the tree, shaking and climbing down
  • Little Monkey expends 0 Calories on this exercise

SLIDE 30
BM and LM utilities

  • If only BM climbs the tree:
    LM eats some before BM gets down ⇒ BM gets 6 C, LM gets 4 C
  • If only LM climbs the tree:
    BM eats almost all before LM gets down ⇒ BM gets 9 C, LM gets 1 C
  • If both climb the tree:
    BM is first to hog the coconut ⇒ BM gets 7 C, LM gets 3 C

How should the monkeys each act so as to maximize their own calorie gain?

SLIDE 31

BM and LM: strategies

Strategies are determined prior to ‘playing the game’. Assume BM will be allowed to move first. BM has two (single-action) strategies:

  – wait (w), or
  – climb (c)

LM has four strategies:

  – If BM waits, then wait; if BM climbs, then wait (xw)
  – If BM waits, then wait; if BM climbs, then climb (xx)
  – If BM waits, then climb; if BM climbs, then wait (x¬x)
  – If BM waits, then climb; if BM climbs, then climb (xc)

SLIDE 32

BM and LM: BM moves first

Game tree (payoffs (BM, LM)): BM chooses w or c, then LM chooses w or c:
(w,w) ⇒ 0,0   (w,c) ⇒ 9,1   (c,w) ⇒ 6−2,4   (c,c) ⇒ 7−2,3

What should Big Monkey do? If BM waits, will the outcome be at least that of climbing, regardless of what LM does? No: 0 vs 4, 9 vs 5 … What if we believe LM will act rationally?

SLIDE 33

BM and LM: BM moves first

Game tree (payoffs (BM, LM)): (w,w) ⇒ 0,0; (w,c) ⇒ 9,1; (c,w) ⇒ 6−2,4; (c,c) ⇒ 7−2,3

What should Big Monkey do?

  • If BM waits, LM will climb ⇒ BM gets 9
  • If BM climbs, LM will wait ⇒ BM gets 4

⇒ BM should wait (w). What about Little Monkey? ⇒ Opposite of BM (x¬x)
(even though we’ll never get to the right side of the game tree unless BM errs)

SLIDE 34

BM and LM: BM moves first

What should BM do? ⇒ wait (w). What about Little Monkey? ⇒ Opposite of BM (x¬x)

The game-tree representation of a game is called extensive form¹, as opposed to normal form²:

¹ a game tree that explicitly shows the players’ moves and resulting payoffs
² a table showing payoffs of outcomes of simultaneous ‘decisions’ (strategies)

            LM: xc    xw    xx    x¬x
  BM: c      5,3     4,4    5,3    4,4
  BM: w      9,1     0,0    0,0    9,1

SLIDE 35

Dominant strategies

Consider a player’s strategies s1 and s2. If, regardless of the other players’ strategy:

  • payoff for s1 ≥ payoff for s2, then s1 (weakly) dominates s2
  • payoff for s1 > payoff for s2, then s1 strictly dominates s2

A player has a dominant strategy s if s dominates all the player’s other strategies.

For Little Monkey x¬x is a weakly dominant strategy; BM does not have a dominant strategy:

            LM: xc    xw    xx    x¬x
  BM: c      5,3     4,4    5,3    4,4
  BM: w      9,1     0,0    0,0    9,1
SLIDE 36

BM and LM: LM moves first

Game tree (payoffs (LM, BM)): (w,w) ⇒ 0,0; (w,c) ⇒ 4,4; (c,w) ⇒ 1,9; (c,c) ⇒ 3,5

What should Little Monkey do?

  • If LM waits, BM will climb ⇒ LM gets 4
  • If LM climbs, BM will wait ⇒ LM gets 1

⇒ LM should wait (w). What about Big Monkey? ⇒ Opposite of LM (x¬x)

SLIDE 37
Responses and equilibria

  • Strategies w and x¬x are called best responses:
    – given what the other player does, this is the best thing to do.
  • A solution where everyone is playing a best response is called a Nash equilibrium.
    – No one can unilaterally change and improve things.
  • Every finite¹ game has a Nash equilibrium
    – but not necessarily in terms of pure strategies!

¹ finite in #players and #pure strategies; pure = not mixed (see imperfect information games)

SLIDE 38

BM and LM: equilibria

For each strategy of one player there is a best response of the other ⇒ multiple Nash equilibria.

BM moves first ⇒ the following Nash equilibria (BM, LM):

  • (w, x¬x); (w, xc); (c, xw)

Why isn’t (c, x¬x) a Nash equilibrium? What if the monkeys have to move simultaneously?

            LM: xc    xw    xx    x¬x
  BM: c      5,3     4,4    5,3    4,4
  BM: w      9,1     0,0    0,0    9,1
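The equilibria listed on this slide can be checked mechanically. A sketch over the 2×4 normal-form payoffs from the slide (strategy names as on the slide; the helper function is my own):

```python
# Payoffs (BM, LM) for BM row strategies vs LM column strategies.
payoffs = {
    ('c', 'xc'): (5, 3), ('c', 'xw'): (4, 4), ('c', 'xx'): (5, 3), ('c', 'x¬x'): (4, 4),
    ('w', 'xc'): (9, 1), ('w', 'xw'): (0, 0), ('w', 'xx'): (0, 0), ('w', 'x¬x'): (9, 1),
}
bm_moves = ('c', 'w')
lm_moves = ('xc', 'xw', 'xx', 'x¬x')

def is_nash(bm, lm):
    """Nash equilibrium: neither player can unilaterally improve."""
    bm_ok = all(payoffs[(bm, lm)][0] >= payoffs[(b, lm)][0] for b in bm_moves)
    lm_ok = all(payoffs[(bm, lm)][1] >= payoffs[(bm, l)][1] for l in lm_moves)
    return bm_ok and lm_ok

equilibria = [(b, l) for b in bm_moves for l in lm_moves if is_nash(b, l)]
assert set(equilibria) == {('w', 'x¬x'), ('w', 'xc'), ('c', 'xw')}
assert ('c', 'x¬x') not in equilibria   # BM would deviate to w and get 9 > 4
```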

SLIDE 39

Imperfect information

SLIDE 40

BM and LM move together

LM/BM has to choose before he sees BM/LM move … two obvious Nash equilibria: (c,w), (w,c).
A third Nash equilibrium arises if both use the mixed strategy “choose between c & w with p = 0.5”:
⇒ each outcome has p = 0.25 ⇒ expected payoff (BM, LM) = (4.5, 2)

Normal form of the simultaneous game (payoffs (BM, LM)):

            LM: c     LM: w
  BM: c      5,3      4,4
  BM: w      9,1      0,0
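A quick sanity check of the expected-payoff claim, using the payoffs of the simultaneous game (the dictionary layout is my own):

```python
# (BM action, LM action) -> (BM calories, LM calories)
payoffs = {
    ('c', 'c'): (5, 3),
    ('c', 'w'): (4, 4),
    ('w', 'c'): (9, 1),
    ('w', 'w'): (0, 0),
}
# Both monkeys mix c and w with probability 0.5 each.
p_bm = {'c': 0.5, 'w': 0.5}
p_lm = {'c': 0.5, 'w': 0.5}

exp_bm = sum(p_bm[a] * p_lm[b] * payoffs[(a, b)][0] for a in p_bm for b in p_lm)
exp_lm = sum(p_bm[a] * p_lm[b] * payoffs[(a, b)][1] for a in p_bm for b in p_lm)
assert (exp_bm, exp_lm) == (4.5, 2.0)   # matches the slide's (4.5, 2)
```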

SLIDE 41

Choosing Strategies

  • A strategy is optimal if no other strategy/outcome is preferred by all players
  • In zero-sum games a pure strategy can be optimal; in non-zero-sum games a mixed strategy is required
  • In the simultaneous game, it’s harder to see what each monkey should do
    – a mixed strategy is optimal
  • Often, other techniques can be used to prune the number of possible actions:
    – e.g. using dominance

SLIDE 42

Incomplete information

SLIDE 43

Prisoner’s Dilemma

Each player can cooperate or defect. Payoffs (Rob, Carl):

                   Carl: cooperate   Carl: defect
  Rob: cooperate       −1,−1           −10,0
  Rob: defect           0,−10           −8,−8

SLIDE 44

Prisoner’s Dilemma

Each player can cooperate or defect. Payoffs (Rob, Carl):

                   Carl: cooperate   Carl: defect
  Rob: cooperate       −1,−1           −10,0
  Rob: defect           0,−10           −8,−8

Defecting is a (strictly) dominant strategy for Rob.

SLIDE 45

Prisoner’s Dilemma

Each player can cooperate or defect. Payoffs (Rob, Carl):

                   Carl: cooperate   Carl: defect
  Rob: cooperate       −1,−1           −10,0
  Rob: defect           0,−10           −8,−8

Defecting is also a dominant strategy for Carl ⇒ the result (−8,−8) is not optimal!
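The dominance claims can be verified mechanically. The payoff numbers below follow my reading of the slide's table (negative values = years in prison), so treat them as a reconstruction:

```python
# (Rob action, Carl action) -> (Rob payoff, Carl payoff)
payoff = {
    ('cooperate', 'cooperate'): (-1, -1),
    ('cooperate', 'defect'):    (-10, 0),
    ('defect',    'cooperate'): (0, -10),
    ('defect',    'defect'):    (-8, -8),
}
MOVES = ('cooperate', 'defect')

def strictly_dominates(player, s1, s2):
    """True if s1 beats s2 for `player` (0=Rob, 1=Carl) against every opponent move."""
    def pay(own, opp):
        key = (own, opp) if player == 0 else (opp, own)
        return payoff[key][player]
    return all(pay(s1, opp) > pay(s2, opp) for opp in MOVES)

assert strictly_dominates(0, 'defect', 'cooperate')   # Rob: defect dominates
assert strictly_dominates(1, 'defect', 'cooperate')   # Carl: defect dominates
# ... yet mutual defection (-8,-8) is worse for both than mutual cooperation (-1,-1).
```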

SLIDE 46

Prisoner’s Dilemma

  • Even though both players would be better off cooperating, mutual defection is the dominant strategy…
  • What drives this?
    – One-shot game
    – Inability to trust your opponent (incomplete information: is your opponent selfish or nice?)
    – Perfect rationality

SLIDE 47

Summary

  • (“Board”) games are fun to work on and illustrate several important points about AI
  • Perfection is unattainable ⇒ must approximate
  • It is a good idea to think about what to think about
