SLIDE 1

Multi-agent learning: Comparing algorithms empirically

Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

Sunday 21st June, 2020

SLIDE 2: Motivation

■ Until 2005, say, MAL research was typically done this way:

  1. Invent a nifty MAL algorithm.

  2. Compare the nifty algorithm with a handful of trendy MAL algorithms on a handful of well-known games.

  3. Show that the new algorithm performs much better.

■ Later, MAL algorithms were compared more systematically:

  • by pitting them against each other in games;

  • through elimination ("knock-out");

  • through evolutionary approaches.

■ We will look into the methodology (= procedures) and methods (= tools) of the various approaches. The algorithms themselves and the outcomes of the comparison are of secondary interest in our review today!

SLIDE 3: Work discussed

■ Axelrod: organising tournaments to let algorithms play the IPD.

  Axelrod, Robert. The Evolution of Cooperation. New York: Basic Books (1984).

■ Zawadzki et al.: straight but thorough.

  Zawadzki, Erik, Asher Lipson, and Kevin Leyton-Brown. "Empirically evaluating multiagent learning algorithms." arXiv preprint arXiv:1401.8074 (2014).

■ Bouzy et al.: elimination ("knock-out").

  Bouzy, Bruno, and Marc Métivier. "Multi-agent learning experiments on repeated matrix games." In: Proc. of the 27th Int. Conf. on Machine Learning (2010).

■ Airiau et al.: evolutionary dynamics.

  Airiau, Stéphane, Sabyasachi Saha, and Sandip Sen. "Evolutionary tournament-based comparison of learning and non-learning algorithms for iterated games." Journal of Artificial Societies and Social Simulation 10.3 (2007).

SLIDE 4: Round robin tournament

Given a pool of games G to test on, all approaches have in common that they build a grand table of head-to-head scores:

                A1    A2   ...   A12  |  avg
          A1    3.2   5.1  ...   4.7  |  4.1
          A2    2.4   1.2  ...   2.2  |  1.3
          ...   ...   ...  ...   ...  |  ...
          A12   3.1   6.1  ...   3.8  |  4.2

■ Entries are performance measures for the protagonist (row player), which almost always means average payoff (alternatives: no-regret, ...).

■ Often each entry is computed multiple times to even out randomness in the algorithms (which are implementations of response rules).

■ Sometimes there is a settling-in phase (a.k.a. burn-in phase) in which payoffs are not yet recorded. (A sketch of the whole procedure follows below.)
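
To make this concrete, here is a minimal sketch (not any of the reviewed authors' code) of how such a grand table could be computed. The player interface (act/observe) and game.payoffs are assumptions of this illustration.

```python
# Hedged sketch: `algos` maps names to factories that produce fresh player
# instances; players expose act() and observe(payoff); game.payoffs(a, b)
# returns the payoff pair of a joint action.  All interfaces are assumed.
ROUNDS, REPEATS, BURN_IN = 1000, 5, 100

def head_to_head(make_row, make_col, game):
    """Average payoff of the row player, ignoring the settling-in phase."""
    total = 0.0
    for _ in range(REPEATS):                     # even out randomness
        row, col = make_row(), make_col()
        for t in range(ROUNDS):
            a, b = row.act(), col.act()
            pa, pb = game.payoffs(a, b)
            row.observe(pa); col.observe(pb)
            if t >= BURN_IN:                     # burn-in rounds not recorded
                total += pa
    return total / (REPEATS * (ROUNDS - BURN_IN))

def grand_table(algos, game):
    return {(i, j): head_to_head(algos[i], algos[j], game)
            for i in algos for j in algos}
```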

SLIDE 5: Work of Axelrod (1980, 1984)

SLIDE 6: Axelrod receiving the National Medal of Science (2014)

[Photo.]

SLIDE 7: Axelrod: tournament for the repeated prisoner's dilemma

■ One game to test: the prisoner's dilemma.

■ Contestants: 14 constructed algorithms + 1 random = 15: Tit-for-tat, Shubik, Nydegger, Joss, ..., Random. Response rules (algorithms) were mostly reactive; one could hardly speak of learning.

■ Grand table: all pairs play 200 rounds. This was repeated 5 times to even out randomness.

■ Winner: Tit-for-tat.

  Axelrod, Robert. "Effective choice in the prisoner's dilemma." Journal of Conflict Resolution 24.1 (1980): 3-25.

■ Second tournament: 64 contestants, all of whom were informed about the results of the first tournament. Winner: Tit-for-tat.

■ In 2012, Alexander Stewart and Joshua Plotkin ran a variant of Axelrod's tournament with 19 strategies to test the effectiveness of the then newly discovered Zero-Determinant strategies.
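
As an illustration of the tournament's basic unit, here is a minimal sketch of one 200-round IPD match, assuming the standard payoff values Axelrod used (T = 5, R = 3, P = 1, S = 0).

```python
import random

# Assumed standard IPD payoffs: (row payoff, column payoff).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opponent_moves):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_moves[-1] if opponent_moves else 'C'

def random_player(opponent_moves):
    return random.choice('CD')

def match(strat_row, strat_col, rounds=200):
    hist_row, hist_col = [], []
    score_row = score_col = 0
    for _ in range(rounds):
        a, b = strat_row(hist_col), strat_col(hist_row)
        pa, pb = PAYOFF[(a, b)]
        score_row += pa; score_col += pb
        hist_row.append(a); hist_col.append(b)
    return score_row, score_col

# Tit-for-tat cooperates with itself throughout: match(...) -> (600, 600).
```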
SLIDE 8: Work of Zawadzki et al. (2014)

SLIDE 9: Zawadzki et al.

■ Contestants: FP, Determinate, Awesome, Meta, WoLF-IGA, GSA, RVS, QL, Minmax-Q, Minmax-Q-IDR, Random. A motivation for this set of 11 algorithms, other than "state-of-the-art", wasn't given.

■ Games: a suite of 13 interesting families, D = D1, ..., D13: D1 = games with normal covariant random payoffs; D2 = Bertrand oligopoly; D3 = Cournot duopoly; D4 = dispersion games; D5 = grab-the-dollar type games; D6 = guess-two-thirds-of-the-average games; ...

■ Game pool: 600 games: 100 games for each size 2^2, 4^2, 6^2, 8^2, 10^2, randomly selected from D, and 100 games of dimension 2 × 2 from Rapoport's catalogue.

■ Grand table: each algorithm pair plays all 600 games for 10^4 rounds.

■ Evaluation: through non-parametric tests and squared heat plots.

■ Conclusion: Q-learning is the overall winner.
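
Of the families above, D1 is the easiest to reproduce. A minimal sketch, assuming it corresponds to GAMUT-style covariant games in which the two players' payoffs in each cell are jointly normal with correlation rho (the value of rho below is illustrative):

```python
import numpy as np

def covariant_game(n, rho, rng=None):
    """One random n x n game; each cell's two payoffs are jointly normal
    with correlation rho (rho = 1: common payoffs; rho = -1: zero-sum-like)."""
    rng = rng if rng is not None else np.random.default_rng()
    cov = [[1.0, rho], [rho, 1.0]]
    cells = rng.multivariate_normal([0.0, 0.0], cov, size=(n, n))
    return cells[..., 0], cells[..., 1]    # payoff matrices of both players

row_payoffs, col_payoffs = covariant_game(4, rho=-0.5)
```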
SLIDE 10: Zawadzki et al.

[Figures: mean reward over all opponents and games; mean regret over all opponents and games.]

SLIDE 11: Zawadzki et al.

[Figures: mean reward against different game suites; mean reward against different opponents.]

SLIDE 12: Parametric test: paired t-test

               A1              A2              A3        ...
          G1   G2   G3    G1   G2   G3    G1   G2   G3   ...
    A1    2.1  3.1  4.7   5.1  1.1  1.2   3.5  4.2  3.8  ...
    A2    2.7  3.5  4.1   4.9  0.9  1.9   3.7  4.7  4.5  ...

Paired t-test:

■ Compute the average difference X̄_D and the standard deviation of the differences s_D over all n pairs (we see nine here).

■ If the two series are generated by the same random process, the test statistic t = X̄_D / (s_D / √n) should follow Student's t-distribution with mean 0 and n − 1 degrees of freedom.

■ If t is too eccentric, then we'll have to reject that possibility, since eccentric values of t are unlikely ("have a low p-value").
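
A minimal sketch of this test on the nine hypothetical scores in the table above; SciPy's paired t-test computes the same statistic as the formula:

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats

a1 = [2.1, 3.1, 4.7, 5.1, 1.1, 1.2, 3.5, 4.2, 3.8]  # row A1 of the table
a2 = [2.7, 3.5, 4.1, 4.9, 0.9, 1.9, 3.7, 4.7, 4.5]  # row A2, same games

# By hand: t = mean(D) / (stdev(D) / sqrt(n)), D the paired differences.
d = [x - y for x, y in zip(a1, a2)]
t_manual = mean(d) / (stdev(d) / sqrt(len(d)))

t, p = stats.ttest_rel(a1, a2)   # same t, plus the p-value
print(t_manual, t, p)            # low p => reject "same random process"
```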
SLIDE 13: Parametric test: paired t-test

[Figure.]

SLIDE 14: Non-parametric test: the Kolmogorov-Smirnov test

■ Test whether two distributions are generated by the same random process. H0: yes. H1: no.

■ The test statistic is the maximum distance between the empirical cumulative distribution functions of the two samples.

■ The p-value is the probability of seeing a test statistic (i.e., max distance) as high as the one observed, under the assumption that both samples were drawn from the same distribution.
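
A minimal sketch on the same two hypothetical samples, using SciPy's two-sample KS test:

```python
from scipy import stats

a1 = [2.1, 3.1, 4.7, 5.1, 1.1, 1.2, 3.5, 4.2, 3.8]
a2 = [2.7, 3.5, 4.1, 4.9, 0.9, 1.9, 3.7, 4.7, 4.5]

# D = the maximum vertical distance between the two empirical CDFs.
d, p = stats.ks_2samp(a1, a2)
print(d, p)   # a high p-value gives no reason to reject H0
```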

SLIDE 15: Zawadzki et al.

Variations / extensions. Investigate:

■ The relation between game sizes and rewards. Outcome: no relation.

■ The correlation between regret and average reward.

■ The correlation between distance to nearest Nash and average reward.

■ Which algorithms probabilistically dominate which other algorithms. (Cf. the article for a definition of this concept; one possible reading is sketched after this list.) Outcome: Q-learning is the only algorithm that is not probabilistically dominated by other algorithms.

■ The difference between average reward and maxmin value ("enforceable payoff"). Outcome: Q-learning attained an enforceable payoff more frequently than any other algorithm.
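
The slides defer to the article for the exact definition of probabilistic dominance; one natural reading is first-order stochastic dominance of the reward samples. A sketch under that assumption:

```python
def dominates(sample_a, sample_b):
    """True if sample_a (weakly) first-order stochastically dominates
    sample_b: A's empirical CDF never lies above B's, i.e. A puts at
    least as much probability mass on high rewards everywhere."""
    points = sorted(set(sample_a) | set(sample_b))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return all(ecdf(sample_a, x) <= ecdf(sample_b, x) for x in points)
```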

SLIDE 16: Work of Bouzy et al. (2010)

SLIDE 17: Bouzy et al. (2010)

■ Contestants: Minimax, FP, QL, JR, Sat, M3, UCB, Exp3, HMC, Bully, Optimistic, Random (12 algorithms).

■ Games: random 2-player, 3 × 3-action games, with payoffs in Z ∩ [−9, 9] (if I understand correctly; else it's [−9, 9]).

■ Grand table: each pair plays 3 × 10^6 rounds (!) on a random game. Restart 100 times to even out randomness.

■ Final ranking: UCB, M3, Sat, JR, ...

■ Evaluation: plot with x-axis = log rounds and y-axis = the rank of that algorithm w.r.t. performance.

■ Bouzy et al. are familiar with the work of Airiau et al. and Zawadzki et al. Contrary to Airiau et al. and Zawadzki et al., the ranking still fluctuates after 3 × 10^6 rounds ... ?!

SLIDE 18: Bouzy et al. (2010)

Variation: eliminate by rank.

■ Algorithm: Repeat: rank, eliminate the worst, and subtract all payoffs earned against that player from the revenues of all survivors. (A sketch follows below.)

■ Final ranking: M3, Sat, UCB, JR, ...

Variation: eliminate by lag.

■ Algorithm: Repeat: organise a tournament. If the difference between the cumulative returns of the two worst performers is larger than 600/n_T (n_T the number of tournaments performed), then the laggard is removed. The laggard is also removed if the global ranking has not changed during 100·n_p²·(n_p − 1)² tournaments since the last elimination (n_p is the current number of players).

■ Final ranking: M3, Sat, UCB, FP, ...
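
A minimal sketch of the eliminate-by-rank loop, where score[(i, j)] holds the grand-table payoff that i earned against j; discarding payoffs earned against eliminated players is implicit in summing over survivors only:

```python
def eliminate_by_rank(players, score):
    alive, ranking = set(players), []
    while alive:
        # Rank survivors by total payoff against surviving opponents.
        totals = {i: sum(score[i, j] for j in alive) for i in alive}
        worst = min(alive, key=totals.get)
        ranking.append(worst)   # eliminated first = ranked last
        alive.remove(worst)     # its payoffs no longer count for anyone
    return ranking[::-1]        # best performer first
```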

SLIDE 19: Bouzy et al.

[Figure: ranking evolution according to the number of steps played in games (log scale). The key is ordered according to the final ranking.]

SLIDE 20: Bouzy et al.

[Figure: ranking based on eliminations (log scale). The key is ordered according to the final ranking.]

SLIDE 21: Bouzy et al. (2010)

Variation: select sub-classes of games.

■ Only cooperative games (shared payoffs): Exp3, M3, Bully, JR, ...

■ Only competitive games (zero-sum payoffs): Exp3, M3, Minimax, JR, ...

■ Specific matrix games: penalty game, climbing game, coordination game, ...

■ Different numbers of actions (n × n games).

Conclusion: M3, Sat, and UCB perform best. Do not maintain plain averages but geometric (decaying) averages of payoffs (illustrated below). Another interesting direction is exploring why Exp3 is the best MAL player on both cooperative and competitive games, but not on general-sum games, and exploiting this fact to design a new MAL algorithm.
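
A one-line illustration of that recommendation; the decay rate gamma is a free parameter of this sketch, not a value from the paper:

```python
def decayed_average(avg, payoff, gamma=0.99):
    """Geometric (exponentially decaying) average: recent payoffs weigh
    more, so the estimate can track non-stationary opponents."""
    return gamma * avg + (1 - gamma) * payoff
```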

SLIDE 22: Work of Airiau et al. (2007)

SLIDE 23: Airiau et al.

■ Contestants: Maxmin, Nash, Generalised TFT, BR, FP, BRFP, Bully, Saby (unpublished!?), Random (9). Motivation: "We chose well-known (...) algorithms."

■ Games: 57 distinct 2 × 2 normal form games (Brams, 1994), which "provides a sufficiently rich environment for our purpose".

■ Grand table: each algorithm pair plays a random game for 1000 rounds. Restart 20 times.

■ Methodology = evolutionary dynamics (a sketch follows below):

  1. Start with a population of algorithms. The initial distribution is reciprocal to performance in previous experiments.

  2. Apply, e.g., fitness-proportionate selection through p_select = (1 + δ/δ_max)/2, where δ is a normalised performance measure.

■ Final ranking: BRFP, FP, Saby, ...
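
A minimal sketch of one generation of such dynamics, using the slide's p_select = (1 + δ/δ_max)/2; exactly how δ is normalised is a detail of the paper, so the mean-centering below is an assumption:

```python
import random

def next_generation(population, fitness):
    """population: list of algorithm names; fitness: name -> avg payoff."""
    mean = sum(fitness[a] for a in population) / len(population)
    delta = {a: fitness[a] - mean for a in set(population)}  # assumed
    d_max = max(abs(d) for d in delta.values()) or 1.0
    offspring = []
    for a in population:
        p_select = (1 + delta[a] / d_max) / 2     # lies in [0, 1]
        # Keep the individual with probability p_select, otherwise replace
        # it with a copy of a random member (fitter types thereby spread).
        offspring.append(a if random.random() < p_select
                         else random.choice(population))
    return offspring
```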

SLIDE 24: Airiau et al. results

[Figure.]

slide-112
SLIDE 112

Airiau et al.

Author: Gerard Vreeswijk. Slides last modified on June 21st, 2020 at 21:18 Multi-agent learning: Comparing algorithms empirically, slide 25

Evolutionary tournament with six algorithms: 1% FP and equiproportion

  • f R, GTFT, BR, MaxMin, Nash each.

With tournament selection. With modified tournament selection.a

aModified tournament selection is a hybrid of fitness-proportionate

selection and 2-sample tournament selection. Cf. Sec. 2.7. of Airiau et al.’ 2007 paper.
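A minimal sketch contrasting the two operators. It assumes a fixed fitness value per algorithm (illustrative numbers; in the actual tournament, fitness depends on the current population mix) and reads pselect = (1 + δ/δmax)/2 as the probability that the fitter of two sampled individuals is kept.

```python
import random

# Illustrative per-algorithm fitness values (not from the paper).
FITNESS = {"FP": 4.0, "R": 2.0, "GTFT": 3.0, "BR": 3.4, "MaxMin": 2.8, "Nash": 3.1}
DELTA_MAX = max(FITNESS.values()) - min(FITNESS.values())

def tournament(pop):
    """Plain 2-sample tournament: the fitter of two random draws always wins."""
    a, b = random.sample(pop, 2)
    return a if FITNESS[a] >= FITNESS[b] else b

def modified_tournament(pop):
    """Hybrid operator: the fitter draw wins only with probability
    p_select = (1 + delta / delta_max) / 2, so near-ties stay nearly fair."""
    a, b = random.sample(pop, 2)
    better, worse = (a, b) if FITNESS[a] >= FITNESS[b] else (b, a)
    p_select = (1 + (FITNESS[better] - FITNESS[worse]) / DELTA_MAX) / 2
    return better if random.random() < p_select else worse

# Seed roughly as on the slide: about 1% FP, the rest split over the other five.
pop = ["FP"] * 2 + [n for n in ("R", "GTFT", "BR", "MaxMin", "Nash") for _ in range(40)]
for _ in range(100):                                   # 100 generations
    pop = [modified_tournament(pop) for _ in range(len(pop))]
print({n: pop.count(n) for n in FITNESS})
```

Swapping `modified_tournament` for `tournament` in the loop shows the practical difference: the deterministic operator eliminates weak algorithms much faster, while the hybrid keeps near-equals in the population longer.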

SLIDE 113

Implications (green concerns the replicator dynamic)

[Implication diagram between equilibrium and stability concepts; only the legend is reproduced here.] SN = strict Nash, ESS = evolutionarily stable strategy, NSS = neutrally stable strategy, GSS = globally stable state, ASS = asymptotically stable state, LSS = Lyapunov stable state, LP = limit point, NE = Nash equilibrium, FP = fixed point, * = only if fully mixed, i = isolated Nash equilibrium. Dotted lines are indirect implications.

SLIDE 119

Evaluation by computing NE of the grand table

Reconsider the grand table:

        A1    A2   . . .  A12  |  avg
  A1    3.1   5.1  . . .  4.7  |  4.1
  A2    2.4   1.2  . . .  2.2  |  1.3
  ...   ...   ...  . . .  ...  |  ...
  A12   3.1   6.1  . . .  3.8  |  4.2

See this as a game in normal form (drop the average column):

        A1    A2   . . .  A12
  A1    3.1   5.1  . . .  4.7
  A2    2.4   1.2  . . .  2.2
  ...   ...   ...  . . .  ...
  A12   3.1   6.1  . . .  3.8

Compute all Nash equilibria.

• Facts. (1) Every fully mixed limit point of the replicator dynamic is a Nash equilibrium. (2) Every Nash equilibrium is a fixed point of the replicator dynamic.

The converses of (1) and (2) are not true (absent species violate the converse implications), but (1) and (2) certainly help to isolate interesting “response profiles”. It is interesting to interpret Nash equilibria among reply rules on the grand table (a numerical sketch follows below).
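A sketch of this evaluation route under stated assumptions: the illustrative 3 × 3 table from earlier stands in for the real grand table, and one Euler-discretised replicator trajectory stands in for a full Nash-equilibrium computation (finding all equilibria of the full table needs a dedicated solver, e.g. support enumeration).

```python
import numpy as np

GRAND = np.array([[3.1, 5.1, 4.7],    # illustrative grand table: entry [i, j]
                  [2.4, 1.2, 2.2],    # is algorithm i's average score vs j
                  [3.1, 6.1, 3.8]])

def replicator_step(x, A, dt=0.01):
    """Euler step of the replicator dynamic dx_i/dt = x_i ((Ax)_i - x·Ax)."""
    f = A @ x                         # fitness of each algorithm vs the mix x
    return x + dt * x * (f - x @ f)

x = np.full(3, 1 / 3)                 # start fully mixed
for _ in range(100_000):
    x = replicator_step(x, GRAND)

print("state after long run:", np.round(x, 4))
print("payoffs vs that state:", np.round(GRAND @ x, 4))
# Fact (1): a fully mixed limit point must equalise the payoffs of all
# algorithms (a Nash condition).  Fact (2): every Nash equilibrium is a
# fixed point; but so is every vertex, even when the absent algorithms
# would earn more, which is exactly why the converses fail.
```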

SLIDE 131

General evaluation

• 1. Selection of learning algorithms to test on. All approaches demonstrate arbitrariness and a lack of method.

• 2. Conditions on algorithms: coupled, uncoupled, completely uncoupled. All methods give contestants full information ⇒ an unequal playing field. Further, access to computationally expensive game properties (mixed Nash equilibria) is taken for granted. We should have different leagues: coupled (Nash equilibria at the algorithms' disposal), uncoupled, completely uncoupled.

• 3. Parametrisation of algorithms. The parameter search space is often of the order of [0, 1]^k per algorithm, with k ≥ 1 the number of parameters, which gives uncountably many sub-algorithms.

• 4. Selection of games. The approaches demonstrate poor method; the justification of choices is weak and subjective.

SLIDE 147

General evaluation (continued)

• 5. All approaches evaluate learning in 2-player normal-form games. Why not more than 2 players? Why not games in extensive form? Too much work? Then say so.

• 6. The selection of “sub-contestants” yields other rankings. How to pit? Suppose algorithm A performs best in the presence of algorithms B and C, thanks to C, but A performs worse than B in the absence of C. How to deal with that systematically?

• 7. Evolutionary selection: how to select algorithms? Fitness-proportionate selection, tournament selection, reward-based selection, stochastic universal sampling, the replicator dynamic? (One of these is sketched below.)

• 8. Knock-out tournament: how to decide when and where to drop the worst performer? The order of elimination puts a (plausible but arbitrary) bias on the ranking.
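As a concrete instance of one scheme from point 7, here is a minimal sketch of stochastic universal sampling, a standard operator from the evolutionary-computation literature rather than something the surveyed papers prescribe; the names and fitness values are illustrative.

```python
import random

def stochastic_universal_sampling(pop, fitness, n):
    """Select n individuals with a single spin of n equally spaced pointers,
    so each individual's number of copies deviates from its expected
    fitness-proportionate count by less than one."""
    total = sum(fitness[p] for p in pop)
    step = total / n
    start = random.uniform(0, step)
    pointers = [start + i * step for i in range(n)]
    chosen, cumulative, i = [], 0.0, 0
    for p in pop:
        cumulative += fitness[p]
        while i < n and pointers[i] <= cumulative:
            chosen.append(p)
            i += 1
    while i < n:                       # guard against floating-point shortfall
        chosen.append(pop[-1])
        i += 1
    return chosen

fitness = {"A": 5.0, "B": 3.0, "C": 1.0}       # illustrative values
print(stochastic_universal_sampling(list(fitness), fitness, 9))
# every spin yields exactly 5 copies of A, 3 of B and 1 of C here, whereas
# repeated roulette-wheel draws would fluctuate around those counts
```

That low variance is relevant to the critique above: the noisier the selection operator, the more the final ranking depends on luck rather than on the grand table.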

SLIDE 150

Now you know enough . . . to design your own MAL algorithm . . . and MAL comparison methods . . .

SLIDE 151

The end

Good luck and au revoir . . .