SLIDE 1

Multi-agent learning

Satisficing play

Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

Tuesday 16th June, 2020

SLIDE 2

Assumptions in game playing

■ Players know the structure of the game, such as:

  • Other players.
  • Other players' possible actions.
  • The relationship between actions and payoffs.

■ Players can observe other players' actions.

■ . . . other players' payoffs.

■ Players are aware that they are in a game.

SLIDE 3

What if none of these assumptions holds?

■ Players don't know the structure of the game.

  • Players don't know who's playing.
  • Players don't know the arsenal of other players.

■ Players can't observe other players' actions.

■ Players can't observe other players' payoffs.

■ Players aren't aware that they are in a game.

This takes the problem out of game theory into machine learning. What can we do?

■ Reinforcement learning. Disadvantages:

  • No reference to past average payoffs.
  • Difficult theory.

Alternative:

■ Satisficing learning.

SLIDE 4

Herbert A. Simon on maximising vs. satisficing

"A decision maker who chooses the best available alternative according to some criteria is said to optimise; one who chooses an alternative that meets or exceeds specified criteria, but that is not guaranteed to be either unique or in any sense the best, is said to satisfice." [1]

"Decision makers can satisfice either by finding optimum solutions for a simplified world, or by finding satisfactory solutions for a more realistic world. Neither approach, in general, dominates the other, and both have continued to co-exist in the world of management science." [2]

[1] H. Simon, Models of Bounded Rationality, Vol. 3: Empirically Grounded Economic Reason. MIT Press, 1997.
[2] H. Simon, "Rational decision making in business organizations", The American Economic Review, Vol. 69(4), pp. 493-513.

SLIDE 5

Karandikar et al.'s algorithm for satisficing play (1998)

SLIDE 6

Satisficing algorithm

■ At any time t, the agent's state is a tuple (At, αt).

  • At is the current action.
  • αt is the current aspiration level. It is updated as

        αt+1 =Def λ·αt + (1 − λ)·πt,

    where λ is the persistence rate, and πt is the payoff in round t.
  • It is up to the programmer to choose an initial action A0 and an initial aspiration level α0.

■ Satisficing algorithm:

        At+1 = At if πt ≥ αt; any other action otherwise.

  This also works if "any other action" is replaced by "any action".
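
To make the action rule and the aspiration update concrete, here is a minimal Python sketch of a single satisficing agent. It is an illustration under assumptions: the class name is ours, and the uniform random choice among the remaining actions is one way to implement "any other action", which the slides leave open.

```python
import random

class SatisficingAgent:
    """One satisficing player: the state is the pair (action, aspiration)."""

    def __init__(self, actions, initial_action, initial_aspiration, persistence):
        self.actions = list(actions)          # e.g. ["C", "D"]
        self.action = initial_action          # A0
        self.aspiration = initial_aspiration  # alpha0
        self.persistence = persistence        # lambda, the persistence rate

    def observe_payoff(self, payoff):
        """Process the round-t payoff: apply the action rule, then update the aspiration."""
        if payoff < self.aspiration:
            # Dissatisfied (payoff below aspiration): switch to "any other action",
            # here chosen uniformly at random (an assumption; the slides leave it open).
            self.action = random.choice([a for a in self.actions if a != self.action])
        # Aspiration update: alpha_{t+1} = lambda * alpha_t + (1 - lambda) * pi_t.
        self.aspiration = (self.persistence * self.aspiration
                           + (1 - self.persistence) * payoff)
```

In a two-action game the "other action" is unique, so the dissatisfied switch is deterministic; SatisficingAgent(["C", "D"], "C", 5.0, 0.5) matches player 2 on the next slide.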

SLIDE 7

Example of satisficing play

Game: prisoner's dilemma. Strategy player 1: tit-for-tat. Strategy player 2: satisficing with initial state (A0, α0) = (C, 5). Persistence rate: λ = 0.5. Each row shows the aspiration αt at the start of round t; αt+1 = λ·αt + (1 − λ)·πt.

     t   TFT   At   πt   αt
     0    C    C    3    5
     1    C    D    5    4
     2    D    D    1    4.5
     3    D    C    0    2.75
     4    C    D    5    1.375
     5    D    D    1    3.1875
     6    D    C    0    2.09375
     7    C    D    5    1.046875
     8    D    D    1    3.0234375
     9    D    C    0    2.01171875
    10    C    D    5    1.005859375
     .    .    .    .    .

[Figure: progress of aspirations, αt (vertical axis, 1 to 5) plotted against t (horizontal axis, 2 to 10).]
Progress of aspirations.
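
The table can be reproduced with a short script. This is a sketch under the payoffs read off from the πt column (mutual cooperation 3, temptation 5, punishment 1, and sucker payoff 0, the last implied by the aspiration updates); with only two actions the dissatisfied switch is a deterministic flip, so no randomness is involved:

```python
# Player 2's payoff as a function of (player 2's action, player 1's action).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def flip(a):
    return "D" if a == "C" else "C"

tft, action, aspiration, lam = "C", "C", 5.0, 0.5  # TFT opens with C; (A0, alpha0) = (C, 5)
for t in range(11):
    payoff = PAYOFF[(action, tft)]
    print(f"t={t:2d}  TFT={tft}  A={action}  pi={payoff}  alpha={aspiration:.10g}")
    tft = action                      # tit-for-tat copies player 2's previous action
    if payoff < aspiration:           # dissatisfied: switch to the other action
        action = flip(action)
    aspiration = lam * aspiration + (1 - lam) * payoff
```

The printed rows match the table: from t = 1 on, player 2 cycles through D, D, C while the aspiration level is dragged back and forth between the temptation payoff 5 and the low payoffs 1 and 0.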

SLIDE 8

Demo: Satisficing play in general 2-player 3 × 3 matrix games

SLIDE 9

Approach

■ Take a 2-player 3 × 3 game in normal form.

■ Plot all 9 pure payoff profiles in 2D.

■ Initialize, say, 100 profiles. One profile looks like ((At, αt), (Bt, βt)). Plot the corresponding 100 aspiration profiles (αt, βt) on the same canvas.

■ Execute satisficing play for all player profiles simultaneously (see the sketch below).
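
A minimal sketch of that loop, under our own assumptions (a random 3 × 3 game with payoffs in [0, 1], optimistic initial aspirations of 1.5, and λ = 0.5); the plotting itself is omitted:

```python
import random

ACTIONS, LAM = [0, 1, 2], 0.5

def step(action, aspiration, payoff):
    """One satisficing update for one player; returns the new (action, aspiration)."""
    if payoff < aspiration:
        action = random.choice([a for a in ACTIONS if a != action])
    return action, LAM * aspiration + (1 - LAM) * payoff

# Random 2-player 3 x 3 game in normal form: payoffs[a][b] = (row payoff, column payoff).
payoffs = [[(random.random(), random.random()) for _ in ACTIONS] for _ in ACTIONS]

# 100 profiles ((A, alpha), (B, beta)) with random actions and optimistic aspirations.
profiles = [((random.choice(ACTIONS), 1.5), (random.choice(ACTIONS), 1.5))
            for _ in range(100)]

for t in range(500):  # satisficing play for all profiles simultaneously
    for k, ((A, alpha), (B, beta)) in enumerate(profiles):
        pi_A, pi_B = payoffs[A][B]
        profiles[k] = (step(A, alpha, pi_A), step(B, beta, pi_B))

# Here the demo would replot the aspiration profiles (alpha, beta)
# against the 9 pure payoff pairs; we just report the surviving action pairs.
print({(A, B) for (A, _), (B, _) in profiles})
```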

SLIDE 10

Satisficing play in a 2-player matrix game

SLIDE 11

Satisficing play in a generalised prisoner's dilemma with self-play (Stimpson et al., 2001)

SLIDE 12

The generalised prisoner's dilemma (GPD)

■ Generalised payoff matrix:

           C       D
      C  σ, σ    0, 1
      D  1, 0    δ, δ

  Reward payoff: σ. Sucker payoff: 0. Temptation payoff: 1. Punishment payoff: δ.

  Constraints: 0 < δ < σ < 1 and 1/2 < σ. (Why?)

■ Use Karandikar et al.'s algorithm.

  • States for satisficing play:

    ◆ (At, αt) for the row player.
    ◆ (Bt, βt) for the column player.

  • The initial states are denoted by (A0, α0) and (B0, β0), respectively.

SLIDE 13

Self-play: possible dynamics

1. Stability. Convergence to a fixed action profile. This happens if and only if

       α^A_t ≤ π^A_t   and   α^B_t ≤ π^B_t

   for all t ≥ T, for some T ≥ 0.

2. Periodicity. Convergence to a cycle of action profiles, e.g.

       (D, D), (D, C), (C, D), (D, D), (D, C), (C, D), . . .

3. Chaos. Deterministic but non-periodic behaviour.

SLIDE 14

Experiments throughout the parameter space

SLIDE 15

Parameter space

                          Symbol    Min     Max
  Reward payoff           σ         0.51    1.0
  Punishment payoff       δ         0.1     σ
  Initial aspirations     α0, β0    0.5     2.0
  Initial actions         A0, B0    50% C, 50% D
  Persistence rate        λ         0.1     0.9

Table 1: Distribution of parameters for simulations.
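
A sketch of how such a run could look: sample the parameters as in Table 1, play GPD self-play for a fixed horizon, and count how often play settles on mutual cooperation. The 1,000-round horizon and the "last 10 rounds all (C, C)" convergence test are our own assumptions; the slides do not state Stimpson et al.'s exact stopping criterion.

```python
import random

def run_trial(sigma, delta, alpha0, beta0, A0, B0, lam, rounds=1000):
    """One GPD self-play trial; True if play ends in mutual cooperation."""
    payoff = {("C", "C"): (sigma, sigma), ("C", "D"): (0.0, 1.0),
              ("D", "C"): (1.0, 0.0), ("D", "D"): (delta, delta)}
    A, B, alpha, beta = A0, B0, alpha0, beta0
    history = []
    for _ in range(rounds):
        pi_A, pi_B = payoff[(A, B)]
        history.append((A, B))
        if pi_A < alpha:                        # dissatisfied row player flips
            A = "D" if A == "C" else "C"
        if pi_B < beta:                         # dissatisfied column player flips
            B = "D" if B == "C" else "C"
        alpha = lam * alpha + (1 - lam) * pi_A  # aspiration updates
        beta = lam * beta + (1 - lam) * pi_B
    return all(pair == ("C", "C") for pair in history[-10:])

TRIALS = 5000
wins = 0
for _ in range(TRIALS):
    sigma = random.uniform(0.51, 1.0)       # reward payoff
    delta = random.uniform(0.1, sigma)      # punishment payoff, below sigma
    wins += run_trial(sigma, delta,
                      random.uniform(0.5, 2.0), random.uniform(0.5, 2.0),  # alpha0, beta0
                      random.choice("CD"), random.choice("CD"),            # A0, B0
                      random.uniform(0.1, 0.9))                            # lambda
print(f"mutual cooperation in {100 * wins / TRIALS:.1f}% of trials")
```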

SLIDE 16

Frequencies of each of the possible outcomes

Frequencies of each of the possible outcomes from 5,000 trials. Parameters were randomly selected as described in Table 1. (From: "Satisficing and Learning Cooperation in the Prisoner's Dilemma", Stimpson et al., 2001.)

SLIDE 17

Mutual cooperation as a result of initial aspirations

A contour plot of the percentage of trials out of 1,000 that converged to mutual cooperation as a function of initial aspirations. Light colors indicate that in most of the trials with the given initial aspirations, the agents learned to cooperate. Parameters other than α0 and β0 were randomly selected from Table 1. (From: Stimpson et al., 2001.)

SLIDE 18

Same experiment with NetLogo

A NetLogo plot of the percentage of trials out of 100 that converged to mutual cooperation as a function of initial aspirations. Light colors indicate that in most of the trials the agents learned to cooperate. Parameters other than α0 and β0 were randomly selected from Table 1.

SLIDE 19

Mutual cooperation as a result of reward and punishment

A contour plot of the percentage of trials out of 1,000 that converged to mutual cooperation as a function of each (δ, σ) pair. Light colors indicate that most of the trials converged to mutual cooperation. Parameters other than δ and σ were randomly selected from Table 1. (From: Stimpson et al., 2001.)

slide-136
SLIDE 136

Effects of the initial actions

Initial actions   Cooperation
Random            73.7%
CC                81.6%
DD                81.6%
CD or DC          66.7%

Table 2: Percentage of cooperation out of 1,000 trials as a function of initial actions. Parameters other than A0 and B0 were randomly selected from Table 1. (From: "Satisficing and Learning Cooperation . . . ", Stimpson et al., 2001.)

slide-137
SLIDE 137

Effect of the persistence rate


Percentage of trials out of 1,000 that converged to mutual cooperation as a function of the persistence rate, λ. Parameters other than λ were selected randomly as described in Table 1. (From: “Satisficing and Learning Cooperation . . . ”, Stimpson et al., 2001.)

slide-138
SLIDE 138

Experiments with specific parameters


slide-143
SLIDE 143

Final outcome as a result of initial aspirations

■ Initial aspiration of player A on the x-axis; initial aspiration of player B on the y-axis.

■ White: convergence to (C, C); black: convergence to (D, D); grey: periodic or chaotic behaviour.

■ (A0, B0) = (D, D), σ = 0.8, δ = 0.7, λ = 0.9. (From: "Satisficing and Learning Cooperation in the Prisoner's Dilemma", Stimpson et al., 2001.) (A sketch that generates such a plot follows below.)
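Plots like this one can be approximated by sweeping a grid of initial aspirations with the sketches above. The grid resolution and the mapping of outcomes to colours are choices of this sketch, not of the paper.

```python
def aspiration_grid(sigma=0.8, delta=0.7, lam=0.9, a0='D', b0='D',
                    n=100, lo=0.5, hi=2.0):
    """Label each (alpha0, beta0) cell by the outcome of one trial:
    'CC' (white), 'DD' (black), anything else '??' (grey)."""
    grid = []
    for i in range(n):
        row = []
        for j in range(n):
            alpha0 = lo + (hi - lo) * i / (n - 1)
            beta0 = lo + (hi - lo) * j / (n - 1)
            outcome = classify(
                satisficing_selfplay(sigma, delta, alpha0, beta0, lam,
                                     a0, b0, T=500))
            row.append(outcome if outcome in ('CC', 'DD') else '??')
        grid.append(row)
    return grid
```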

slide-144
SLIDE 144

Final outcome as a result of initial aspirations (demo)


slide-145
SLIDE 145

Final outcome as a result of initial aspirations (buildup)


slide-146
SLIDE 146

Final outcome as a result of initial aspirations

■ Initial aspiration of player A on the x-axis; initial aspiration of player B on the y-axis.

■ White: convergence to (C, C); black: convergence to (D, D); grey: periodic or chaotic behaviour.

■ (A0, B0) = (C, C), σ = 0.8, δ = 0.5, λ = 0.5. (From: "Satisficing and Learning Cooperation in the Prisoner's Dilemma", Stimpson et al., 2001.)

slide-147
SLIDE 147

Final outcome as a result of initial aspirations

■ Initial aspiration of player A on the x-axis; initial aspiration of player B on the y-axis.

■ White: convergence to (C, C); black: convergence to (D, D); grey: periodic or chaotic behaviour.

■ (A0, B0) = (D, C), σ = 0.6, δ = 0.5, λ = 0.8. (From: "Satisficing and Learning Cooperation in the Prisoner's Dilemma", Stimpson et al., 2001.)

slide-148
SLIDE 148

Difficult games for satisficing play


slide-149
SLIDE 149

Difficult games for satisficing play (RPSc)


slide-150
SLIDE 150

Difficult games for satisficing play (Shapley)


slide-151
SLIDE 151

Difficult games for satisficing play (Curve)


slide-152
SLIDE 152

Regret matching as a form of satisficing play


slide-159
SLIDE 159

Regret matching as a form of satisficing play

■ Regret matching can be cast in a reinforcement rule with an aspiration level ū_t (cf. Strategic Learning, H. Peyton Young, Ch. 2, p. 22).

■ Define the reinforcement increment for every action x in round t as

∆r_x^t =Def u(x, y_t) − ū_t.

■ Define the propensities in round t + 1 as

θ_x^{t+1} =Def [ ∑_{s=1}^{t} ∆r_x^s ]^+

■ This is like standard reinforcement, but now all actions in a given period are reinforced, whether or not they are actually played. Hypothetical reinforcement takes into account virtual payoffs (payoffs that never materialised). The vector ∆r^t is a vector of virtual reinforcements: gains or losses relative to the current average that would have materialised if a given action x had been played at time t. (A sketch of this rule in code follows below.)
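A minimal sketch of this rule, reading ū_t as the payoff actually realised in round t; under that reading the summed increments are exactly the cumulative regrets of Hart and Mas-Colell, so this is one interpretation, not necessarily Young's. The play rule (proportional to positive propensities) follows the slide; the function names and the tie-breaking when no propensity is positive are assumptions.

```python
import random

def regret_matching(payoff, actions, rounds, opponent):
    """Regret matching as hypothetical reinforcement: every action x gets
    increment u(x, y_t) - u_bar_t each round, played or not; the next move
    is drawn proportionally to the positive propensities."""
    propensity = {x: 0.0 for x in actions}
    play = []
    for t in range(rounds):
        pos = [max(propensity[x], 0.0) for x in actions]
        if sum(pos) > 0:
            my = random.choices(actions, weights=pos)[0]
        else:
            my = random.choice(actions)  # no positive propensity yet
        y = opponent(t)
        u_bar = payoff(my, y)            # aspiration: realised payoff (assumed reading)
        for x in actions:                # reinforce ALL actions, played or not
            propensity[x] += payoff(x, y) - u_bar
        play.append(my)
    return play

# Example: matching pennies against an opponent that plays H 70% of the time;
# the learner should come to play H most of the time as well.
pennies = lambda a, b: 1.0 if a == b else -1.0
biased = lambda t: 'H' if random.random() < 0.7 else 'T'
print(regret_matching(pennies, ['H', 'T'], 1000, biased).count('H'))
```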

slide-160
SLIDE 160

Conclusions


slide-171
SLIDE 171

Conclusions

■ Agents should have high enough initial aspirations.

■ Agents should learn, but slowly.

■ The difference between payoffs for mutual defection and mutual cooperation should be maximized.

■ Agents should start out with similar behavior.

As a test, a final set of 5,000 simulations was run within the following confined parameter space:

Parameter   Min     Max
α0, β0      σ       2.0
λ           0.8     1.0
σ           0.51    1.0
δ           0.1     σ − 0.4
A0, B0      A0 = B0

Result: 100% mutual cooperation. (A sketch of this confined test follows below.)
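The confined test can be sketched by restricting the sampler to the table above (uniform draws again assumed) and reusing the earlier simulation and classifier:

```python
def sample_confined():
    """One draw from the confined parameter space of the Conclusions slide."""
    sigma = random.uniform(0.51, 1.0)
    delta = random.uniform(0.1, sigma - 0.4)  # large cooperation premium
    alpha0 = random.uniform(sigma, 2.0)       # aspirations start high ...
    beta0 = random.uniform(sigma, 2.0)        # ... for both players
    lam = random.uniform(0.8, 1.0)            # slow learning
    a0 = b0 = random.choice('CD')             # identical initial behaviour
    return sigma, delta, alpha0, beta0, lam, a0, b0

runs = [classify(satisficing_selfplay(*sample_confined(), T=1000))
        for _ in range(5000)]
print(runs.count('CC') / len(runs))  # the paper reports 100% mutual cooperation
```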

slide-180
SLIDE 180

What next?

Bayesian play:

■ With fictitious play, the behaviour of opponents is modelled by a single mixed strategy.

■ With Bayesian play, opponents are modelled by a probability distribution over (a possibly binned set of) mixed strategies.

Gradient dynamics:

■ Like fictitious play, players model (or assess) each other through mixed strategies.

■ Strategies are not played, only maintained.

■ Due to CKR (common knowledge of rationality, cf. Hargreaves Heap & Varoufakis, 2004), all models of mixed strategies are correct. (I.e., q−i = s−i, for all i.)

■ Players gradually adapt their mixed strategies through hill-climbing in the payoff space. (A small gradient-ascent sketch follows below.)
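To make "hill-climbing in the payoff space" concrete, here is a small sketch in the style of infinitesimal gradient ascent for a 2×2 game. The step size η, the projection onto [0, 1], and the example payoff matrices are assumptions of this preview, not content of the slides.

```python
def gradient_dynamics(A, B, p, q, eta=0.01, rounds=10000):
    """IGA-style hill-climbing in a 2x2 game. p is the row player's
    probability of action 0, q the column player's. Each player takes a
    step along the partial derivative of its own expected payoff."""
    for _ in range(rounds):
        # Gradients of the bilinear expected payoffs:
        # u_row(p, q) = [p, 1-p] A [q, 1-q]^T, u_col likewise with B.
        dp = q * (A[0][0] - A[1][0]) + (1 - q) * (A[0][1] - A[1][1])
        dq = p * (B[0][0] - B[0][1]) + (1 - p) * (B[1][0] - B[1][1])
        # Simultaneous ascent steps, projected back onto [0, 1].
        p = min(1.0, max(0.0, p + eta * dp))
        q = min(1.0, max(0.0, q + eta * dq))
    return p, q

# Prisoner's Dilemma with sigma = 0.8, delta = 0.3 (action 0 = C): both
# gradients point towards defection, so (p, q) drifts to (0, 0).
A = [[0.8, 0.0], [1.0, 0.3]]   # row player's payoffs
B = [[0.8, 1.0], [0.0, 0.3]]   # column player's payoffs
print(gradient_dynamics(A, B, p=0.9, q=0.9))
```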