Multi-agent learning
Teaching strategies
Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.
Thursday 18th June, 2020
Bully, Godfather {lenient, strict}
Author: Gerard Vreeswijk. Slides last modified on June 18th, 2020 at 20:55 Multi-agent learning: Teaching strategies,
Michael L. Littman and Peter Stone (2001). “Leading best-response strategies in repeated games”. Research note.
One of the first papers, if not the first paper, that mentions Bully and Godfather.
Michael L. Littman and Peter Stone (2005). “A polynomial-time Nash equilibrium algorithm for repeated games”. In Decision Support Systems Vol. 39, pp. 55-66.
Paper that describes Godfather++.
Jacob W. Crandall and Michael A. Goodrich (2005). “Learning to teach and follow in repeated games”. In AAAI Workshop on Multiagent Learning, Pittsburgh, PA.
Paper that attempts to combine Fictitious Play and a modified Godfather++ to define an algorithm that “knows” when to teach and when to follow.
Doran Chakraborty and Peter Stone (2008). “Online Multiagent Learning against Memory Bounded Adversaries”. In Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Artificial Intelligence Vol. 5212, pp. 211-226.
Adversaries
  Joint-action based
    k-Markov
      1. Best response
      2. Godfather
      3. Bully
    Dependent on entire history
      1. Fictitious play
      2. Grim opponent
      3. WoLF-PHC
  Joint-strategy based
    Previous step joint-strategy
      1. IGA
      2. WoLF-IGA
      3. ReDVaLer
    Entire history of joint strategies
      1. No-regret learners
Play against the computer. At the outset, the computer commits to either Bully (with a probability of 50%) or pure fictitious play; you cannot see which. After that, the computer does not change strategy. Try to push your regret down in as few rounds as possible.
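The two candidate opponents in this exercise can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the payoff matrices (a game of Chicken), the function names and the choice of average external regret are all hypothetical and do not come from the slides. Bully is taken in the usual sense of committing to the action that pays best once the opponent best-responds to it.

```python
from collections import Counter

# Hypothetical 2x2 payoffs (game of Chicken); these numbers are illustrative only.
# Actions: 0 = swerve, 1 = straight. Each matrix is indexed [own action][other's action].
COMPUTER_PAYOFF = [[3, 2], [4, 0]]
HUMAN_PAYOFF = [[3, 2], [4, 0]]


def best_response(payoff, opponent_action):
    """Action maximising payoff[own][opponent_action] (ties go to the lowest index)."""
    return max(range(len(payoff)), key=lambda a: payoff[a][opponent_action])


def bully_action(own_payoff, follower_payoff):
    """Bully: commit to the action that pays best if the opponent best-responds to it."""
    return max(
        range(len(own_payoff)),
        key=lambda a: own_payoff[a][best_response(follower_payoff, a)],
    )


def fictitious_play_action(own_payoff, opponent_history, default=0):
    """Pure fictitious play: best-respond to the opponent's empirical action counts."""
    if not opponent_history:
        return default
    counts = Counter(opponent_history)

    def expected(a):
        return sum(own_payoff[a][b] * n for b, n in counts.items())

    return max(range(len(own_payoff)), key=expected)


def average_regret(human_actions, computer_actions, payoff):
    """Payoff of the best fixed action in hindsight minus realised payoff, per round."""
    realised = sum(payoff[h][c] for h, c in zip(human_actions, computer_actions))
    best_fixed = max(
        sum(payoff[a][c] for c in computer_actions) for a in range(len(payoff))
    )
    return (best_fixed - realised) / len(human_actions)
```

In this assumed game Bully always goes straight, so a player who keeps swerving against it has zero regret, while the same player facing fictitious play can still steer the computer towards a different outcome.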
targetable pair
■ Feasible payoffs (striped): payoff combinations that can be obtained by jointly repeating patterns of actions (more accurately: patterns of action profiles).
■ Enforceable payoffs (shaded): no player receives less than their minmax value.
Theorem. If (x, y) is both feasible and enforceable, then (x, y) is the payoff in a Nash equilibrium of the infinitely repeated G with average payoffs. Conversely, if (x, y) is the payoff in any Nash equilibrium of the infinitely repeated G with average payoffs, then (x, y) is enforceable.
[Figure: feasible (striped) and enforceable (shaded) payoff regions; both axes run from 1 to 5, with the point (3, 3) marked.]
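As a rough illustration of the enforceability condition, the sketch below takes "enforceable" in its usual folk-theorem sense (each player gets at least their minmax value) and checks it for a Prisoner's Dilemma. The payoff numbers and function names are hypothetical, since the slide's own matrix is not shown, and the punisher is restricted to pure actions, which is a simplification: in general the minmax value is taken over mixed strategies and can be lower.

```python
# Hypothetical Prisoner's Dilemma payoffs (assumed, not taken from the slides).
# Actions: 0 = cooperate, 1 = defect. PAYOFF[i][row action][column action]
# is the payoff of player i (0 = row player, 1 = column player).
PAYOFF = [
    [[3, 0], [5, 1]],  # row player
    [[3, 5], [0, 1]],  # column player
]
ACTIONS = range(2)


def payoff_of(player, own, opp):
    """Payoff of `player` when they play `own` and the other player plays `opp`."""
    if player == 0:
        return PAYOFF[0][own][opp]
    return PAYOFF[1][opp][own]


def pure_minmax(player):
    """Lowest payoff a pure-action punisher can hold `player` to,
    given that `player` best-responds to the punishment."""
    return min(
        max(payoff_of(player, own, opp) for own in ACTIONS) for opp in ACTIONS
    )


def is_enforceable(x, y):
    """True if neither player receives less than their (pure) minmax value."""
    return x >= pure_minmax(0) and y >= pure_minmax(1)
```

With these assumed payoffs both minmax values equal 1, so `is_enforceable(3, 3)` holds while `is_enforceable(5, 0)` does not. Because defecting is dominant in this game, the pure and mixed minmax values coincide here.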
■ Godfather-lenient plays its part
■ Godfather-strict plays its part of
finite state machine
Michael L. Littman and Peter Stone (2005). “A polynomial-time Nash equilibrium algorithm for repeated games”. In Decision Support Systems Vol. 39, pp. 55-66.
■ Finite state machine for the Prisoners’ dilemma.
[Figure: finite state machine diagram, with its states and transitions labelled.]
[Figure: finite state machine notation. A node labelled with a count c and the action a_i stands for a chain of c nodes that each play a_i ("a_i c times"); edges are labelled with the joint action (a_i, a_{-i}) or with ∗.]
[Figure: two finite state machines, one per player. Row machine: node actions a1, a2, αrow, a2. Column machine: node actions b1, b2, αcol, b2. In both machines the edges are labelled (a1, b1), (a2, b2) and ∗, and the node counts are r1, r2, r2 and max{βrow, βcol}.]