

Previously in Game Theory

◮ decision makers:
  ◮ choices
  ◮ preferences
◮ solution concepts:
  ◮ best response
  ◮ Nash equilibrium

Rock, paper, scissors

        R        P        S
R     0, 0    −1, 1    1, −1
P    1, −1     0, 0    −1, 1
S    −1, 1    1, −1     0, 0

Learning in games
Repeated games


Learning in games

Best Response learning

1. Guess what the opponent(s) will play
2. Play a Best Response to that guess
3. Observe the play
4. Update the guess
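The four steps above can be sketched as a simple loop. A minimal sketch; the Prisoner's-Dilemma-style payoffs, the fixed opponent behaviour, and the tie-breaking rule are illustrative assumptions:

```python
# Generic best-response learning loop for a two-player matrix game.
# Payoffs of the row player: actions 0 = C, 1 = D (illustrative).
PAYOFF = [[2, -1],
          [3, 0]]

def best_response(guess):
    """Step 2: best response of the row player to a guessed column action."""
    return max(range(2), key=lambda a: PAYOFF[a][guess])

guess = 0                          # Step 1: initial guess about the opponent
history = []
for t in range(5):
    action = best_response(guess)  # Step 2: play a best response
    opponent = 1                   # Step 3: observe the play (fixed here)
    history.append(action)
    guess = opponent               # Step 4: update the guess
```

With these payoffs D dominates, so the learner plays action 1 every round regardless of the guess.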
BR learning: Cournot dynamics

Guess = last action played

        C        D
C     2, 2    −1, 3
D    3, −1     0, 0

        R        P        S
R     0, 0    −1, 1    1, −1
P    1, −1     0, 0    −1, 1
S    −1, 1    1, −1     0, 0
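Under Cournot dynamics these two matrices behave very differently: in the C/D game mutual best responses settle on (D, D), while in Rock-paper-scissors they cycle forever. A minimal self-play sketch (action encodings, starting points, and tie-breaking are illustrative assumptions):

```python
# Cournot best-response dynamics: best-respond to the opponent's last action.
# C/D game from the slide (0 = C, 1 = D); the game is symmetric, so both
# players share the payoff matrix.
CD = [[2, -1],
      [3, 0]]

def br(payoff, opp):
    """Best response to the opponent's last action (lowest index on ties)."""
    return max(range(len(payoff)), key=lambda a: payoff[a][opp])

a, b = 0, 0                        # start from (C, C)
for _ in range(10):
    a, b = br(CD, b), br(CD, a)    # simultaneous best responses
# (a, b) settles on (D, D)

# Rock-paper-scissors (0 = R, 1 = P, 2 = S): the same dynamics cycle.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
x, y, cycle = 0, 0, []
for _ in range(6):
    x, y = br(RPS, y), br(RPS, x)
    cycle.append((x, y))
# cycle repeats (P, P) -> (S, S) -> (R, R) -> ...
```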

BR learning: Fictitious play

Guess = empirical distribution of play

        R        P        S
R     0, 0    −1, 1    1, −1
P    1, −1     0, 0    −1, 1
S    −1, 1    1, −1     0, 0

        L        C        R
U     0, 0     0, 1     1, 0
M     1, 0     0, 0     0, 1
D     0, 1     1, 0     0, 0
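In Rock-paper-scissors (zero-sum, where fictitious play is known to converge) the empirical frequencies drift toward the mixed equilibrium (1/3, 1/3, 1/3); the second 3×3 matrix is Shapley's classic example in which fictitious play cycles instead. A minimal self-play sketch; the initial counts and lowest-index tie-breaking are illustrative assumptions:

```python
# Fictitious play on Rock-paper-scissors: best-respond to the empirical
# distribution of the opponent's past play. Initial counts of (1, 0, 0)
# seed the beliefs (illustrative assumption).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def best_response(counts):
    """Best response to the empirical mixture given by action counts."""
    value = lambda a: sum(RPS[a][b] * counts[b] for b in range(3))
    return max(range(3), key=value)

counts = [[1, 0, 0], [1, 0, 0]]    # counts[i] = observed play of player i
for _ in range(3000):
    a0 = best_response(counts[1])  # player 0 responds to player 1's history
    a1 = best_response(counts[0])
    counts[0][a0] += 1
    counts[1][a1] += 1

freq = [c / sum(counts[0]) for c in counts[0]]
# freq is close to the mixed Nash equilibrium (1/3, 1/3, 1/3)
```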

Evolutionary learning

Action set: A
Utility function: u

For p ∈ ∆(A) and k ∈ A, the replicator dynamics:

ṗ_k = p_k ( u(k, p) − u(p, p) )
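A forward-Euler discretization makes the dynamics concrete. A minimal sketch; the step size, horizon, and starting mixture are illustrative assumptions, and note that on zero-sum Rock-paper-scissors the exact flow orbits the uniform mixture rather than converging to it:

```python
# Forward-Euler integration of the replicator dynamics
#   dp_k/dt = p_k * (u(k, p) - u(p, p))
# on Rock-paper-scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def u(k, p):
    """Payoff of pure action k against the population mixture p."""
    return sum(RPS[k][j] * p[j] for j in range(3))

p = [0.5, 0.3, 0.2]                # starting mixture (illustrative)
dt = 0.01                          # step size (illustrative)
for _ in range(1000):
    avg = sum(p[k] * u(k, p) for k in range(3))          # u(p, p)
    p = [p[k] + dt * p[k] * (u(k, p) - avg) for k in range(3)]
# p stays on the simplex: the increments sum to zero by construction,
# and the trajectory circles the interior rest point (1/3, 1/3, 1/3)
```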

Battle of the Sexes

        O        F
O     3, 2     0, 0
F     0, 0     2, 3

Correlated equilibrium (CE)

a* ∈ A = ∏_i A_i is a NE:

∀i, ∀a'_i:   u_i(a*_i, a*_{-i}) ≥ u_i(a'_i, a*_{-i})

α ∈ ∏_i ∆(A_i) is a NE:

∀i, ∀a_i, ∀a'_i:   Σ_{a_{-i}} u_i(a_i, a_{-i}) α(a) ≥ Σ_{a_{-i}} u_i(a'_i, a_{-i}) α(a)

π ∈ ∆(A) is a CE:

∀i, ∀a_i, ∀a'_i:   Σ_{a_{-i}} u_i(a_i, a_{-i}) π(a) ≥ Σ_{a_{-i}} u_i(a'_i, a_{-i}) π(a)
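The CE inequalities can be checked directly for any candidate π. A sketch on the Battle of the Sexes from the earlier slide, where a public 50/50 coin flip between (O, O) and (F, F) is a classic correlated equilibrium; the encodings O = 0, F = 1 are illustrative assumptions:

```python
# Checking the CE inequalities for a distribution pi over action profiles.
U1 = [[3, 0], [0, 2]]              # row player's payoffs (O = 0, F = 1)
U2 = [[2, 0], [0, 3]]              # column player's payoffs

pi = {(0, 0): 0.5, (1, 1): 0.5}    # candidate: coin flip between (O,O), (F,F)

def is_ce(pi, U1, U2):
    """pi is a CE iff no player gains by deviating from any recommendation."""
    for i, U in ((0, U1), (1, U2)):
        for a_i in range(2):           # recommended action
            for dev in range(2):       # candidate deviation a'_i
                gain = 0.0
                for (a, b), prob in pi.items():
                    mine = a if i == 0 else b
                    if mine != a_i:
                        continue       # sum only over profiles recommending a_i
                    other = b if i == 0 else a
                    own = U[a_i][other] if i == 0 else U[other][a_i]
                    alt = U[dev][other] if i == 0 else U[other][dev]
                    gain += prob * (alt - own)
                if gain > 1e-12:
                    return False
    return True
```

With this π, a player told to play O knows the other was told O as well, so deviating to F yields 0 instead of a positive payoff; `is_ce(pi, U1, U2)` confirms the inequalities hold.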

No regret learning

Regret of player i for having played j rather than k: u_i(k, a_{-i}) − u_i(j, a_{-i})

R^i_{jk}(t) = Σ_{τ ≤ t : a_i(τ) = j} [ u_i(k, a_{-i}(τ)) − u_i(j, a_{-i}(τ)) ]

Regret matching converges to the correlated equilibria set.
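A sketch of regret matching built on the conditional regrets R^i_{jk} above (Hart and Mas-Colell's rule: after playing j, switch to k with probability proportional to the positive average regret). The payoff matrix, the constant MU, the horizon, and the random seed are illustrative assumptions:

```python
# Regret matching in Rock-paper-scissors self-play. After playing j,
# action k is chosen with probability max(S_jk / t, 0) / MU, staying
# at j with the leftover mass. MU > 2 * max|u| * (|A| - 1) = 4 keeps
# these numbers a valid distribution.
import random
random.seed(0)

RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
MU = 5.0
S = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]  # S[i][j][k]: summed regrets
last = [0, 0]                                          # previous actions
counts = [0, 0, 0]                                     # player 0's play

T = 20000
for t in range(1, T + 1):
    a = []
    for i in range(2):
        j = last[i]
        probs = [max(S[i][j][k] / t, 0.0) / MU for k in range(3)]
        probs[j] = 0.0
        probs[j] = 1.0 - sum(probs)    # stay with the remaining probability
        a.append(random.choices(range(3), weights=probs)[0])
    for i in range(2):
        j, opp = a[i], a[1 - i]
        for k in range(3):             # update R^i_jk only at times a_i(t) = j
            S[i][j][k] += RPS[k][opp] - RPS[j][opp]
    last = a
    counts[a[0]] += 1

freq = [c / T for c in counts]
# freq approaches the uniform marginal (1/3, 1/3, 1/3)
```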

Learning in games

◮ Best response
◮ Replicator dynamics
◮ No regret


Repeated games

Markov Decision Process (MDP)

state space X
action space U
transition P : X × U → ∆(X)
reward r : X × U → ℝ
discount factor δ ∈ [0, 1]

U(x(·), u(·)) = Σ_{t=0}^{+∞} δ^t r(x(t), u(t))

MDP (continued)

history h ∈ H = (X × U)*
policy π : H → ∆(U)

V^π(x_0) = E_π [U(x(·), u(·))]

V(x_0) = max_π V^π(x_0)

Principle of Optimality

Bellman's equation:

V(x_0) = max_{u_0} [ r(x_0, u_0) + δ E_{x_1 ∼ P(x_0, u_0)} V(x_1) ]

Dynamic Programming

Solving the MDP:

◮ knowing P: value iteration
◮ not knowing P: online learning
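When P is known, value iteration simply iterates Bellman's equation to its fixed point. A minimal sketch on a hypothetical two-state, two-action MDP (all numbers below are illustrative assumptions):

```python
# Value iteration on a toy MDP: states {0, 1}, actions {0 = stay, 1 = go}.
# P[x][u] lists next-state probabilities; r[x][u] is the reward.
P = {0: {0: [1.0, 0.0], 1: [0.2, 0.8]},
     1: {0: [0.0, 1.0], 1: [0.7, 0.3]}}
r = {0: {0: 0.0, 1: 1.0},
     1: {0: 2.0, 1: 0.0}}
delta = 0.9                        # discount factor

def bellman(V):
    """One application of the Bellman optimality operator."""
    return [max(r[x][u] + delta * sum(p * V[y] for y, p in enumerate(P[x][u]))
                for u in (0, 1))
            for x in (0, 1)]

V = [0.0, 0.0]
for _ in range(500):               # delta-contraction: geometric convergence
    V = bellman(V)
# V is (numerically) the fixed point of the Bellman operator
```

Here the fixed point can be checked by hand: staying in state 1 yields V(1) = 2 + 0.9 V(1) = 20, and V(0) = (1 + 0.72 · 20) / 0.82 ≈ 18.78.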

Repeated game

Game (I, (A_i)_{i∈I}, (u_i)_{i∈I})

Discount factor δ

U_i(a(·)) = Σ_{t=0}^{+∞} δ^t u_i(a(t))

Strategy σ : H → ∏_i ∆(A_i)

V_i(σ) = E_σ [U_i(a(·))]

Nash equilibrium

Player i:

◮ choices σ_i
◮ utility V_i

Nash equilibrium is not strong enough! (Explanation on the whiteboard ⇒)

Information structure

◮ perfect
◮ imperfect
◮ public
◮ private (beliefs)

Folk theorem

Any feasible, strictly individually rational payoff can be sustained by a sequentially rational equilibrium.

Holy grail for repeated games.
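The standard construction behind the folk theorem uses trigger strategies. A back-of-the-envelope sketch on the C/D game from the Cournot slide: mutual cooperation pays 2 per period, a one-shot deviation pays 3, and the punishment phase pays 0, so grim trigger sustains cooperation exactly when 2/(1 − δ) ≥ 3, i.e. δ ≥ 1/3. The function below is an illustrative assumption, not the lecture's notation:

```python
# Grim-trigger check: cooperating forever is worth coop / (1 - delta);
# deviating once is worth deviate now, then punish forever after.
def cooperation_sustainable(delta, coop=2.0, deviate=3.0, punish=0.0):
    """True iff the discounted cooperation payoff beats the deviation payoff."""
    assert 0 <= delta < 1
    return coop / (1 - delta) >= deviate + delta * punish / (1 - delta)

# With the slide's payoffs the threshold is delta = 1/3:
# cooperation_sustainable(0.2) is False, cooperation_sustainable(0.5) is True.
```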

[Figure: the feasible payoff set in the (u_1, u_2) plane, spanned by the pure payoff points CC, DD, DC and CD.]

Research

Weakly belief-free equilibria

Characterization of repeated games with correlated equilibria.

Repeated games

◮ Dynamic programming
◮ Repeated games
◮ Folk theorem

Learning in games
Repeated games

Questions, Comments