
Multi-agent learning

Repeated games

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

Friday 3rd May, 2019

Repeated games: motivation


1. Much interaction in MASs can be modelled through games.

2. Much learning in MASs can therefore be modelled through learning in games.

3. Learning in games usually takes place through the (gradual) adaptation of strategies (hence, behaviour) in a repeated game.

4. In most repeated games, one game (a.k.a. stage game) is played repeatedly. Possibilities:

   ■ A finite number of times.
   ■ An indefinite (same: indeterminate) number of times.
   ■ An infinite number of times.

5. Therefore, familiarity with the basic concepts and results from the theory of repeated games is essential to understand MAL.

Plan for today


■ NE in normal form games that are repeated a finite number of times.

   • Principle of backward induction.

■ NE in normal form games that are repeated an indefinite number of times.

   • Discount factor. Models the probability of continuation.
   • Folk theorem. (Actually many FT’s.) Repeated games generally do have infinitely many Nash equilibria.
   • Trigger strategy. “on-path” vs. “off-path” play, “minmax” as a threat.

This presentation draws heavily on (Peters, 2008).

* H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4. Ch. 8: Repeated games.

Part I: Nash equilibria in normal form games that are repeated a finite number of times


Nash equilibria in playing the PD twice


The Prisoners’ Dilemma (You are the row player, the Other the column player):

                     Other: Cooperate    Other: Defect
   You: Cooperate    (3, 3)              (0, 5)
   You: Defect       (5, 0)              (1, 1)

■ Even if mixed strategies are allowed, the PD possesses one Nash equilibrium, viz. (D, D) with payoffs (1, 1).

■ This equilibrium is Pareto sub-optimal.

■ Does the situation change if two parties get to play the Prisoners’ Dilemma two times in succession?

■ The following diagram (hopefully) shows that playing the PD two times in succession does not yield an essentially new NE.

Nash equilibria in playing the PD twice


[Payoff tree for two successive plays of the PD: the root branches into the four first-round outcomes CC, CD, DC, DD, and each of these branches into the four second-round outcomes. The leaves carry the cumulative payoffs over both rounds: for example, CC followed by CC gives (6, 6), CC followed by DD gives (4, 4), and DD followed by DD gives (2, 2).]

P.S. This is just a payoff tree, not a game in extensive form!

Nash equilibria in playing the PD twice


In normal form (You are the row player, the Other the column player; entries are cumulative payoffs over both rounds):

          CC         CD         DC         DD
   CC     (6, 6)     (3, 8)     (3, 8)     (0, 10)
   CD     (8, 3)     (4, 4)     (5, 5)     (1, 6)
   DC     (8, 3)     (5, 5)     (4, 4)     (1, 6)
   DD     (10, 0)    (6, 1)     (6, 1)     (2, 2)

■ The action profile (DD, DD) is the only Nash equilibrium.

■ With 3 successive games, we obtain a 2³ × 2³ matrix, where the action profile (DDD, DDD) still would be the only Nash equilibrium.

■ Generalise to N repetitions: (D^N, D^N), i.e. D in all N rounds, still is the only Nash equilibrium in a repeated game where the PD is played N times in succession.
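The uniqueness claim for the 4 × 4 game can be checked mechanically. Below is a minimal Python sketch (not part of the slides) that rebuilds this matrix from the stage game and searches the pure, unconditional two-round plans for Nash equilibria; helper names such as payoff and is_nash are illustrative.

```python
# Sketch: enumerate the pure Nash equilibria of the twice-repeated PD,
# restricted (as in the slide) to unconditional two-round plans like 'CD'.
from itertools import product

PD = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),   # stage-game payoffs (row, column)
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

plans = [''.join(p) for p in product('CD', repeat=2)]   # 'CC', 'CD', 'DC', 'DD'

def payoff(plan_row, plan_col):
    """Cumulative payoff over the two rounds."""
    u = [PD[(a, b)] for a, b in zip(plan_row, plan_col)]
    return (u[0][0] + u[1][0], u[0][1] + u[1][1])

def is_nash(plan_row, plan_col):
    """No player gains by unilaterally switching to another plan."""
    u_row, u_col = payoff(plan_row, plan_col)
    no_row_gain = all(payoff(alt, plan_col)[0] <= u_row for alt in plans)
    no_col_gain = all(payoff(plan_row, alt)[1] <= u_col for alt in plans)
    return no_row_gain and no_col_gain

print([pair for pair in product(plans, repeat=2) if is_nash(*pair)])
# Prints [('DD', 'DD')]: the only pure Nash equilibrium of the twice-repeated PD.
```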

Part II: Nash equilibria in normal form games that are repeated an indefinite number of times


Terminology: finite, indefinite, infinite


To repeat an experiment . . .

■ . . . ten times. That’s hopefully clear.

■ . . . a finite number of times. May mean: a fixed number of times, where the number of repetitions is determined beforehand. Or it may mean: an indefinite number of times. Depends on the context!

■ . . . an indefinite number of times. Means: a finite number of times, but nothing is known beforehand about the number of repetitions.

■ . . . an infinite number of times. When throwing a die this must mean a countably infinite number of times.
Indefinite number of repetitions


■ A Pareto-suboptimal outcome can be avoided in case the following three conditions are met.

   1. The Prisoners’ Dilemma is repeated an indefinite number of times (rounds).
   2. A so-called discount factor δ ∈ [0, 1] determines the probability of continuing the game after each round.
   3. The probability to continue, δ, is large enough.

■ Under these conditions suddenly infinitely many Nash equilibria exist. This is sometimes called an embarrassment of richness (Peters, 2008).

■ Various Folk theorems state the existence of multiple equilibria in games that are repeated an indefinite number of times.

■ Here we discuss one version of “the” Folk Theorem.

Family of Folk Theorems


There actually exist many Folk Theorems.

■ Horizon. The game may be repeated indefinitely (present case) or there may be an upper bound to the number of plays.

■ Information. Players may act on the basis of CKR (present case), or certain parts of the history may be hidden.

■ Reward. Players may collect their payoff through a discount factor (present case) or through average rewards.

■ Subgame perfectness. Subgame perfect equilibria (present case) or plain Nash equilibria.

■ Equilibrium. We may be interested in Nash equilibria (present case), or other types of equilibria, such as so-called ε-Nash equilibria or so-called correlated equilibria.

The concept of a repeated game


■ Let G be a game in normal form.

■ The repeated game G∗(δ) is G, played an indefinite number of times, where δ represents the probability that the game will be played another time. Exercise: give P{ G∗(δ) lasts at least t rounds }. Answer: δ^t.

■ G is called the stage game.

■ A history h of length t of a repeated game is a sequence of action profiles of length t. Example (for the Prisoner’s Dilemma):

      Round:           1  2  3  4  5  6  7  8  9  10
      Row player:      C  D  D  D  C  C  D  D  D  D
      Column player:   C  D  D  D  D  D  D  C  D  D
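The exercise can also be checked by simulation. Below is a small Python sketch (not from the slides); it adopts the convention, used in the payoff formula later on, that rounds are indexed t = 0, 1, 2, . . . and that round 0 is always played, so that δ^t is the probability that round t is reached.

```python
# Sketch: estimate P{ round t of G*(delta) is played } and compare it with delta**t.
import random

def rounds_played(delta, rng):
    """Round 0 is always played; after each round the game continues with probability delta."""
    n = 1
    while rng.random() < delta:
        n += 1
    return n

rng = random.Random(0)
delta, t, trials = 0.9, 5, 100_000
# Round t (0-indexed) is played iff at least t + 1 rounds are played in total.
estimate = sum(rounds_played(delta, rng) >= t + 1 for _ in range(trials)) / trials
print(round(estimate, 3), delta ** t)   # the estimate should be close to 0.9**5 = 0.59049
```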


Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 13

■ The set of all possible histories (of any length) is denoted by H. ■ A

strategy for Player i is a function si : H → ∆{C, D} such that

Pr( Player i plays C in round |h| + 1 | h ) = si(h) (C).

■ A

strategy p role s is a combination of strategies, one for each player.

■ The

exp e ted pa y
  • for player i given s can be computed. It is

Expected payoffi(s) =

t=0

δt Expected payoffi,t(s). Example on next page.
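To make the definition concrete, here is a small Python sketch (not from the slides) of strategies as functions from histories to mixed actions, matching the type s_i : H → ∆{C, D}; the two example strategies (a grim trigger and an 80% cooperator) are illustrative.

```python
# Sketch: a strategy maps a history (a list of action profiles) to a distribution over {C, D}.
def grim_trigger(history):
    """Cooperate until any player has defected; defect forever afterwards."""
    defected_before = any('D' in profile for profile in history)
    return {'C': 0.0, 'D': 1.0} if defected_before else {'C': 1.0, 'D': 0.0}

def cooperate_80_percent(history):
    """Play C with probability 0.8 in every round, regardless of the history."""
    return {'C': 0.8, 'D': 0.2}

print(grim_trigger([('C', 'C'), ('C', 'C')]))   # {'C': 1.0, 'D': 0.0}
print(grim_trigger([('C', 'C'), ('D', 'C')]))   # {'C': 0.0, 'D': 1.0}
```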

The concept of a repeated game (II)


Repeated from the previous slide: the expected payoff for player i given s can be computed. It is

      Expected payoff_i(s) = ∑_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).

Example: Prisoner’s Dilemma, where the strategy of Player 1 is s_1 = “always cooperate 80%”, the strategy of Player 2 is s_2 = “always cooperate 70%”, and δ = 1/2. Then

      Expected payoff_1(s) = ∑_{t=0}^{∞} (1/2)^t · [ 0.8 (0.7 · 3 + 0.3 · 0) + 0.2 (0.7 · 5 + 0.3 · 1) ]
                           = (1 / (1 − 1/2)) · 2.44 = 2 × 2.44 = 4.88.
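The arithmetic of this example is easy to reproduce. A minimal Python sketch (not from the slides), assuming both players mix independently and identically in every round:

```python
# Sketch: expected discounted payoff of Player 1 in the example above.
PD = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

p1_c, p2_c, delta = 0.8, 0.7, 0.5        # cooperation probabilities and continuation factor

# Expected stage payoff of Player 1 when both players mix independently.
stage = sum((p1_c if a1 == 'C' else 1 - p1_c) *
            (p2_c if a2 == 'C' else 1 - p2_c) * PD[(a1, a2)][0]
            for a1 in 'CD' for a2 in 'CD')

# Summing delta**t over t = 0, 1, 2, ... gives the geometric factor 1 / (1 - delta).
print(round(stage, 2), round(stage / (1 - delta), 2))   # 2.44 and 4.88
```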

Subgame perfect equilibria of G∗(δ): D∗


Definition. A strategy profile s for G∗(δ) is a subgame-perfect Nash equilibrium if (1) it is a Nash equilibrium of this repeated game, and (2) for every subgame (i.e., tail game) of this repeated game, s restricted to that subgame is a Nash equilibrium as well.

Consider the strategy of iterated defection D∗: “always defect, no matter what”.¹

Claim. The strategy profile (D∗, D∗) is a subgame perfect equilibrium in G∗(δ).

Proof. Consider any tail game starting at round t ≥ 0. We are done if we can show that (D∗, D∗) is a NE for this subgame. This is true: given that one player always defects, it never pays off for the other player to play C at any time. Therefore, everyone sticks to D∗.

¹ A notation like D∗ or (worse) D∞ is suggestive. Mathematically it makes no sense, but intuitively it does.
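The key step of the proof (it never pays off to play C against a player who always defects) can be illustrated numerically. A minimal Python sketch (not from the slides); the finite horizon only truncates the infinite discounted sum.

```python
# Sketch: discounted payoff against an always-defecting opponent.
def payoff_vs_always_defect(my_action, delta, horizon=200):
    """my_action(t) returns 'C' or 'D'; the opponent plays D in every round.
    In the PD above, playing C against D yields 0 and playing D against D yields 1."""
    stage = {'C': 0, 'D': 1}
    return sum(delta ** t * stage[my_action(t)] for t in range(horizon))

delta = 0.9
always_defect = lambda t: 'D'
cooperate_once = lambda t: 'C' if t == 3 else 'D'   # deviate to C in a single round
print(round(payoff_vs_always_defect(always_defect, delta), 2))   # ~10.0 (= 1 / (1 - delta))
print(round(payoff_vs_always_defect(cooperate_once, delta), 2))  # ~9.27: strictly worse
```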

Part III: Trigger strategies


Cost of deviating in Round N


Consider the so-called trigger strategy T: “always play C unless D has been played at least once. In that case play D forever”.

  • Claim. The strategy profile (T, T) is a subgame perfect equilibrium in G∗(δ),

provided the probability of continuation, δ, is sufficiently large.

  • Proof. Suppose one player starts to defect at Round N. By doing so he expects a payoff of

  ∑_{t=0}^{N−1} δ^t·3 + δ^N·5 + ∑_{t=N+1}^{∞} δ^t·1.

By playing C throughout, he could have earned ∑_{t=0}^{∞} δ^t·3, which means he forfeited

  ∑_{t=0}^{∞} δ^t·3 − ( ∑_{t=0}^{N−1} δ^t·3 + δ^N·5 + ∑_{t=N+1}^{∞} δ^t·1 ) = −2δ^N + 2 ∑_{t=N+1}^{∞} δ^t.
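
A small numerical check of this bookkeeping (my own sketch, not part of the original slides): it truncates the infinite sums at a large horizon T and compares the deviator's discounted payoff, the cooperator's discounted payoff, and the closed-form forfeit −2δ^N + 2∑_{t=N+1}^{∞} δ^t. The stage payoffs 3, 5 and 1 are the Prisoner's Dilemma values used throughout these slides; the values of delta, N and T are arbitrary choices.

    # Sketch: compare "always C" with "defect at round N" under discounting.
    # Truncating the infinite sums at horizon T is accurate for delta < 1.
    delta, N, T = 0.9, 4, 10_000

    coop    = sum(delta**t * 3 for t in range(T))              # (C, C) forever
    deviate = (sum(delta**t * 3 for t in range(N))             # (C, C) up to round N - 1
               + delta**N * 5                                   # defect once in round N
               + sum(delta**t * 1 for t in range(N + 1, T)))    # (D, D) forever after
    forfeit_direct = coop - deviate
    forfeit_closed = -2 * delta**N + 2 * sum(delta**t for t in range(N + 1, T))

    print(forfeit_direct, forfeit_closed)   # both approx. 10.5 for delta = 0.9, N = 4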

slide-123
SLIDE 123

Cost of deviating in Round N

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 18

So if

  −2δ^N + 2 ∑_{t=N+1}^{∞} δ^t > 0,

payoff is forfeited by deviating. This is the case when

  ∑_{t=N+1}^{∞} δ^t > δ^N
  δ^{N+1} ∑_{t=0}^{∞} δ^t > δ^N
  δ^{N+1} · 1/(1 − δ) > δ^N      (using ∑_{t=0}^{∞} δ^t = 1/(1 − δ), δ ≠ 1)
  δ · 1/(1 − δ) > 1              (dividing by δ^N, δ ≠ 0)
  δ > 1 − δ
  δ > 1/2.

Therefore, if δ > 1/2 every player forfeits payoff by deviating from T.
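
A quick numerical illustration of this threshold (again a sketch of my own, not from the slides): the forfeit −2δ^N + 2∑_{t=N+1}^{∞} δ^t changes sign at δ = 1/2, regardless of the deviation round N.

    # Sketch: sign of the forfeit as a function of delta, for several deviation rounds N.
    T = 10_000   # truncation horizon for the infinite sum

    def forfeit(delta, N):
        return -2 * delta**N + 2 * sum(delta**t for t in range(N + 1, T))

    for delta in (0.3, 0.49, 0.51, 0.7):
        print(delta, [round(forfeit(delta, N), 3) for N in (0, 3, 10)])
    # negative for delta < 1/2, positive for delta > 1/2, whatever N is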

slide-131
SLIDE 131

Example: an alternating trigger strategy

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 19

Yet another subgame perfect equilibrium: Informal definition of strategies. A and B tacitly agree to alternate actions, i.e.,

(C, D), (D, C), (C, D), (D, C) . . . .

If one of them deviates, the other party plays D forever. (The party who originally deviated plays D forever thereafter as well.) CKR is at work here! Let A be the strategy that plays C in Round 1. Let B be the other strategy.

  • Claim. The strategy profile (A, B) is a subgame perfect equilibrium in G∗(δ),

provided the probability of continuation, δ, is sufficiently large. An analysis of this situation and a proof of this claim can be found in (Peters, 2008), pp. 104-105.*

*H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4.
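
As a rough numerical illustration of the alternating path (my own sketch, not taken from Peters, 2008; it only uses the stage payoffs 3, 5, 0 and 1 of these slides): the discounted values of the alternating path for A and B, compared with the value of eternal (D, D).

    # Sketch: discounted value of (C,D),(D,C),(C,D),... versus eternal (D,D).
    # A starts with C (stage payoff 0), B starts with D (stage payoff 5).
    def discounted(stream, delta, T=10_000):
        return sum(delta**t * stream(t) for t in range(T))

    delta = 0.9
    value_A  = discounted(lambda t: 0 if t % 2 == 0 else 5, delta)   # = 5*delta/(1 - delta**2)
    value_B  = discounted(lambda t: 5 if t % 2 == 0 else 0, delta)   # = 5/(1 - delta**2)
    value_DD = discounted(lambda t: 1, delta)                        # = 1/(1 - delta)

    print(value_A, value_B, value_DD)   # approx. 23.7, 26.3 and 10.0 for delta = 0.9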

slide-137
SLIDE 137

Generalisation of trigger strategies

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 20

■ Every convex combination² of payoffs

  α1·(3, 3) + α2·(0, 5) + α3·(5, 0) + α4·(1, 1)

  can be established by smartly picking appropriate strategy patterns. E.g.: “We play 4 times (C, C), then 7 times (C, D), (D, C), then 4 times (C, C), and so on”.

■ Ensure that, in the long run, (C, C) occurs in a fraction α1 of the stages, (C, D) in a fraction α2, (D, C) in α3, and (D, D) in α4 (see the sketch below).

■ As long as these limiting average payoffs exceed the payoff of (D, D) for each player (which is 1), associated trigger strategies can be formulated that lead to these payoffs and trigger eternal play of (D, D) after a deviation.

■ For δ high enough, these strategies again form an SGP NE.

² Meaning αi ≥ 0 and α1 + α2 + α3 + α4 = 1.
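
A minimal sketch (my own, using the stage payoffs of these slides) of how the limiting average payoff of a periodic pattern is exactly such a convex combination; the pattern below is the “4 times (C, C), then 7 times (C, D), (D, C)” example.

    # Stage-game payoffs (row, col) of the PD used in these slides.
    PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    # One period of the example pattern: 4x (C,C), then 7x (C,D),(D,C).
    pattern = [('C', 'C')] * 4 + [('C', 'D'), ('D', 'C')] * 7

    n = len(pattern)                                            # 18 stages per period
    avg_row = sum(PAYOFF[p][0] for p in pattern) / n            # limiting average, row
    avg_col = sum(PAYOFF[p][1] for p in pattern) / n            # limiting average, col
    alphas  = {p: pattern.count(p) / n for p in set(pattern)}   # the convex weights

    print(avg_row, avg_col, alphas)
    # approx. 2.61 for both players; this exceeds the punishment payoff 1, so a trigger
    # strategy sustaining this pattern exists for delta large enough.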

slide-144
SLIDE 144

Folk theorem for SGP NE in a repeated PD

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 21

[Figure: the feasible (striped) and enforceable (shaded) payoff regions in the payoff plane; axis ticks 1–5 on both axes; the point (3, 3) is marked.]

  • 1. Feasible payoffs (striped): payoff combos that can be obtained by jointly repeating patterns of actions (more accurately: patterns of action profiles).

  • 2. Enforceable payoffs (shaded): everyone resides above the punishment minmax.

For every payoff pair (x, y) in (1) ∩ (2), there is a δ(x, y) ∈ (0, 1) such that for all δ ≥ δ(x, y) the payoff (x, y) can be obtained as the limiting average in a subgame perfect equilibrium of G∗(δ).
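
One way to check the two conditions mechanically (a sketch under the assumption that SciPy is available; the names STAGE, feasible and enforceable are mine, not the slides'): feasibility is membership in the convex hull of the four stage payoffs, which a tiny linear programme decides, and enforceability is being strictly above the minmax payoff 1 for both players.

    import numpy as np
    from scipy.optimize import linprog

    STAGE = np.array([[3, 3], [0, 5], [5, 0], [1, 1]], dtype=float)   # stage payoffs of the PD

    def feasible(x, y):
        # Is there alpha >= 0 with sum(alpha) = 1 and alpha @ STAGE = (x, y)?
        A_eq = np.vstack([STAGE.T, np.ones(4)])        # 3 x 4 system of equalities
        b_eq = np.array([x, y, 1.0])
        res = linprog(c=np.zeros(4), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
        return res.success

    def enforceable(x, y):
        return x > 1 and y > 1                         # both strictly above the minmax payoff

    for pair in [(3, 3), (2, 3.5), (0, 5), (4.5, 4.5)]:
        print(pair, feasible(*pair) and enforceable(*pair))
    # (3, 3) and (2, 3.5) qualify; (0, 5) is feasible but not enforceable;
    # (4.5, 4.5) is not feasible.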

slide-145
SLIDE 145

Part IV: non-SGP Nash equilibria

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 22

slide-150
SLIDE 150

Existence of non-SGP Nash equilibria

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 23

■ We have seen that many subgame perfect equilibria exist for repeated games.

■ What about the existence of non-SGP Nash equilibria in repeated games, i.e., equilibria that are not necessarily subgame perfect?

■ Without the requirement of subgame perfection, deviations can be punished more severely: the equilibrium strategies need not induce an equilibrium in every subgame.

■ However, a non-SGP equilibrium involves threats that are not credible.
slide-151
SLIDE 151

Game Theory: A Critical [what?]

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 24

slide-152
SLIDE 152

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

Col: Some game Left ( L ) Right ( R )

Ro w:

Up ( U )

(1, 1) (0, 0)

Down ( D )

(0, 0) (−1, 4)

slide-153
SLIDE 153

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

Col: Some game Left ( L ) Right ( R )

Ro w:

Up ( U )

(1, 1) (0, 0)

Down ( D )

(0, 0) (−1, 4)

  • 1. For row, U is a dominating strategy.
slide-154
SLIDE 154

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

Col: Some game Left ( L ) Right ( R )

Ro w:

Up ( U )

(1, 1) (0, 0)

Down ( D )

(0, 0) (−1, 4)

  • 1. For row, U is a dominating strategy.
  • 2. The pure profile (U, L) is the only mixed strategy profile that is a NE.
slide-155
SLIDE 155

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

Col: Some game Left ( L ) Right ( R )

Ro w:

Up ( U )

(1, 1) (0, 0)

Down ( D )

(0, 0) (−1, 4)

  • 1. For row, U is a dominating strategy.
  • 2. The pure profile (U, L) is the only mixed strategy profile that is a NE.
  • 3. Define trigger-strategies (T1, T2) such that the pattern

[(D, R), (U, L)3]∗ is played indefinitely.

slide-156
SLIDE 156

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

Col: Some game Left ( L ) Right ( R )

Ro w:

Up ( U )

(1, 1) (0, 0)

Down ( D )

(0, 0) (−1, 4)

  • 1. For row, U is a dominating strategy.
  • 2. The pure profile (U, L) is the only mixed strategy profile that is a NE.
  • 3. Define trigger-strategies (T1, T2) such that the pattern

[(D, R), (U, L)3]∗ is played indefinitely.

If this pattern is violated, both parties fall back to punishment strategies:

slide-157
SLIDE 157

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

Col: Some game Left ( L ) Right ( R )

Ro w:

Up ( U )

(1, 1) (0, 0)

Down ( D )

(0, 0) (−1, 4)

  • 1. For row, U is a dominating strategy.
  • 2. The pure profile (U, L) is the only mixed strategy profile that is a NE.
  • 3. Define trigger-strategies (T1, T2) such that the pattern

[(D, R), (U, L)3]∗ is played indefinitely.

If this pattern is violated, both parties fall back to punishment strategies:

■ The punishment strategy of row is mixed (0.8, 0.2)∗.

slide-158
SLIDE 158

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

Col: Some game Left ( L ) Right ( R )

Ro w:

Up ( U )

(1, 1) (0, 0)

Down ( D )

(0, 0) (−1, 4)

  • 1. For row, U is a dominating strategy.
  • 2. The pure profile (U, L) is the only mixed strategy profile that is a NE.
  • 3. Define trigger-strategies (T1, T2) such that the pattern

[(D, R), (U, L)3]∗ is played indefinitely.

If this pattern is violated, both parties fall back to punishment strategies:

■ The punishment strategy of row is mixed (0.8, 0.2)∗. ■ The punishment strategy of col is pure R∗.

slide-159
SLIDE 159

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

Col: Some game Left ( L ) Right ( R )

Ro w:

Up ( U )

(1, 1) (0, 0)

Down ( D )

(0, 0) (−1, 4)

  • 1. For row, U is a dominating strategy.
  • 2. The pure profile (U, L) is the only mixed strategy profile that is a NE.
  • 3. Define trigger-strategies (T1, T2) such that the pattern

[(D, R), (U, L)3]∗ is played indefinitely.

If this pattern is violated, both parties fall back to punishment strategies:

■ The punishment strategy of row is mixed (0.8, 0.2)∗. ■ The punishment strategy of col is pure R∗.

This combination of strategies is not a NE.

slide-160
SLIDE 160

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 25

              Col:
Some game         Left (L)    Right (R)
Row:  Up (U)       (1, 1)      (0, 0)
      Down (D)     (0, 0)      (−1, 4)

  • 1. For row, U is a dominant strategy.
  • 2. The pure profile (U, L) is the only (mixed) strategy profile that is a NE.
  • 3. Define trigger strategies (T1, T2) such that the pattern [(D, R), (U, L)³]∗ is played indefinitely.

If this pattern is violated, both parties fall back to punishment strategies:

■ The punishment strategy of row is the mixed strategy (0.8, 0.2)∗.
■ The punishment strategy of col is the pure strategy R∗.

This combination of punishment strategies is not itself a NE. (For R∗ induces U∗ as row's best response.)
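
A small check of this point (my own sketch, using the matrix above): against col's punishment R∗, row's stage best response is U rather than the mix (0.8, 0.2); at the same time the mix (0.8, 0.2) holds col down to an expected stage payoff of 4/5 no matter what col plays.

    # Stage payoffs (row, col) of the game above.
    PAYOFF = {('U', 'L'): (1, 1), ('U', 'R'): (0, 0),
              ('D', 'L'): (0, 0), ('D', 'R'): (-1, 4)}

    # Row's stage payoff against col playing R:
    print({a: PAYOFF[(a, 'R')][0] for a in ('U', 'D')})         # {'U': 0, 'D': -1} -> U is best

    # Col's expected stage payoff against row's punishment mix (0.8 U, 0.2 D):
    mix = {'U': 0.8, 'D': 0.2}
    for b in ('L', 'R'):
        print(b, sum(mix[a] * PAYOFF[(a, b)][1] for a in mix))  # 0.8 against both L and R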

slide-166
SLIDE 166

Example: a repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 26

              Col:
Some game         Left (L)    Right (R)
Row:  Up (U)       (1, 1)      (0, 0)
      Down (D)     (0, 0)      (−1, 4)

  • Claim. The combination of trigger strategies (T1, T2) is a Nash equilibrium for large enough δ ∈ [0, 1].

■ T1 ⇒ T2. If row plays (the non-degenerate part of) T1, then col must play T2, for T2 is a best response to T1.

■ T2 ⇒ T1. If at all, the best moment for row to deviate is at D, for that would give row an incidental advantage of 1 (playing U instead of D yields 0 rather than −1). After that, row's opponent falls back to R∗. Total payoff for row: 0 (for cheating) + 0 + · · · + 0 (for being punished by col).

slide-176
SLIDE 176

Example: a repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 27

              Col:
Some game         Left (L)    Right (R)
Row:  Up (U)       (1, 1)      (0, 0)
      Down (D)     (0, 0)      (−1, 4)

■ T2 ⇒ T1 (continued). Total payoff for row player: 0 (for cheating) + 0 + · · · + 0 (for being punished by the column player). Payoff for row player if he was loyal:

  (−1 + 1·δ + 1·δ² + 1·δ³) + (−1·δ⁴ + 1·δ⁵ + 1·δ⁶ + 1·δ⁷) + · · ·
    = ∑_{k=0}^{∞} δ^k − 2 ∑_{k=0}^{∞} δ^{4k}
    = 1/(1 − δ) − 2/(1 − δ⁴).

This expression is positive only if δ ≥ 0.54 (approximately: δ must exceed the root of the 3rd-degree equation δ³ + δ² + δ = 1).
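
A quick numerical sketch (not in the original slides) that locates this threshold: on (0, 1) the loyalty payoff 1/(1 − δ) − 2/(1 − δ⁴) has the same sign as the cubic δ³ + δ² + δ − 1, so a bisection on that cubic gives the critical δ.

    # Bisection on f(d) = d**3 + d**2 + d - 1; its unique root in (0, 1) is the
    # smallest continuation probability for which loyalty beats cheating.
    def f(d):
        return d**3 + d**2 + d - 1

    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)

    print(round(lo, 4))   # approx. 0.5437, i.e. the slide's delta >= 0.54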

slide-184
SLIDE 184

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 28

              Col:
                   L           R
Row:  U         (1, 1)      (0, 0)
      D         (0, 0)      (−1, 4)

■ Col can punish row maximally by playing R∗. How row can punish col is less obvious.

■ If row plays D∗, then col will play R∗. If row plays U∗, then col will play L∗.

■ Row can punish col even more by playing a minmax strategy. Given row's mixed strategy (u, d), col maximises his expected payoff by choosing the right mix (l, r):

  max_{l,r}  u·l·1 + d·r·4
    = max_l  u·l + 4(1 − u)(1 − l)
    = max_l  (5u − 4)·l + 4 − 4u.

■ If 5u − 4 = 0, it does not matter what col chooses for l: his expected payoff is always 4 − 4·(4/5) = 4/5 (see the sketch below).
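
The same calculation by grid search (a sketch of my own): for each mixing probability u of row, col's best-response payoff is max(u, 4 − 4u), because the maximum of the linear function (5u − 4)·l + 4 − 4u over l ∈ [0, 1] is attained at l = 1 or l = 0. This is minimised at u = 4/5, where col is held to 4/5.

    # Col's best-response payoff against row's mix (u, 1-u): l = 1 earns u, l = 0 earns 4 - 4u.
    grid   = [i / 1000 for i in range(1001)]
    value  = {u: max(u, 4 - 4 * u) for u in grid}
    u_star = min(grid, key=value.get)

    print(u_star, value[u_star])   # 0.8 0.8 -> row's minmax strategy is (0.8, 0.2), value 4/5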

slide-189
SLIDE 189

Example: A repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 29

Draw a picture of the payoff surface of col. [Figure: col's expected payoff as a surface over u and l (both in [0, 1]), with 4/5 marked on the u-axis and 1 and 4 marked on the payoff axis.]

■ If 5u − 4 = 0, it does not matter what col chooses for l. He expects 4/5.

■ If 5u − 4 > 0, col will play l = 1, and expects to earn u, with u > 4/5.

■ If 5u − 4 < 0, col will play l = 0, and expects to earn 4 − 4u > 4 − 4·(4/5) = 4/5.

These calculations are done by hand, and do not easily generalise to higher dimensions.
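
The generalisation the slide hints at is a standard linear programme: minimise an upper bound v on col's best-response payoff over row's mixed strategies. Below is a sketch, assuming SciPy is available; COL_PAYOFF and the variable names are mine, not the slides'. The same formulation works for any number of actions.

    import numpy as np
    from scipy.optimize import linprog

    COL_PAYOFF = np.array([[1, 0],    # col's payoff when row plays U and col plays L / R
                           [0, 4]])   # col's payoff when row plays D and col plays L / R

    # Variables x = (u, v): u = P(row plays U), v = bound on col's payoff.
    # For every col action j:  u*COL_PAYOFF[0, j] + (1-u)*COL_PAYOFF[1, j] <= v.
    A_ub = np.column_stack([COL_PAYOFF[0] - COL_PAYOFF[1], -np.ones(2)])
    b_ub = -COL_PAYOFF[1]
    res = linprog(c=[0, 1], A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1), (None, None)])

    print(res.x)   # approx. [0.8, 0.8]: row mixes (0.8, 0.2) and holds col to the minmax value 4/5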

slide-194
SLIDE 194

What next?

Author: Gerard Vreeswijk. Slides last modified on May 3rd, 2019 at 12:39 Multi-agent learning: Repeated games, slide 30

Now that we know that infinitely many equilibria exist in repeated games (an embarrassment of riches), there are a number of ways in which we may proceed.

■ Reinforcement Learning. Agents simply execute the action(s) with maximal rewards in the past.

■ No-regret learning. Agents execute the action(s) with maximal virtual rewards in the past.

■ Fictitious Play. Sample the actions of opponent(s) and play a best response (a small sketch follows below).

■ Gradient Dynamics. Approximate NE of single-shot games (stage games) through gradient ascent (hill-climbing).
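
As a foretaste of the fictitious-play bullet, a minimal sketch of my own (using the PD payoffs from earlier slides): each agent keeps empirical counts of the opponent's past actions and plays a stage best response to that empirical mixture. The priors and the number of rounds are arbitrary choices.

    # Fictitious play in the repeated PD (the game is symmetric, so one payoff table suffices).
    ROW = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}   # payoff of the first action against the second

    def best_response(opponent_counts):
        total = sum(opponent_counts.values())
        return max('CD', key=lambda a: sum(ROW[(a, b)] * opponent_counts[b] / total
                                           for b in 'CD'))

    counts = [{'C': 1, 'D': 0}, {'C': 1, 'D': 0}]    # arbitrary optimistic priors
    for _ in range(50):
        a = best_response(counts[1])                 # player 0 responds to 1's history
        b = best_response(counts[0])                 # player 1 responds to 0's history
        counts[0][a] += 1
        counts[1][b] += 1

    print(counts)   # D dominates C, so play settles on (D, D): {'C': 1, 'D': 50} for both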