Multi-agent learning

Teaching strategies

Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

Thursday 18th June, 2020

Plan for Today


Part I: Preliminaries

1. Teacher possesses memory of k = 0 rounds: Bully.
2. Teacher possesses memory of k = 1 round: Godfather.
3. Teacher possesses memory of k > 1 rounds: {lenient, strict} Godfather.
4. Teacher is represented by a finite machine: Godfather++.

Part II: Crandall & Goodrich (2005): SPaM, an algorithm that claims to integrate follower and teacher algorithms.

1. Three points of criticism of Godfather++.
2. Core idea of SPaM: combine teacher and follower capabilities.
3. Notion of guilt to trigger switches between teaching and following.
Literature


Michael L. Littman and Peter Stone (2001). “Leading best-response strategies in repeated games”. Research note.

One of the first papers, if not the first paper, that mentions Bully and Godfather.

Michael L. Littman and Peter Stone (2005). “A polynomial-time Nash equilibrium algorithm for repeated games”. Decision Support Systems, Vol. 39, pp. 55-66.

Paper that describes Godfather++.

Jacob W. Crandall and Michael A. Goodrich (2005). “Learning to teach and follow in repeated games”. In AAAI Workshop on Multiagent Learning, Pittsburgh, PA.

Paper that attempts to combine Fictitious Play and a modified Godfather++ to define an algorithm that “knows” when to teach and when to follow.

Doran Chakraborty and Peter Stone (2008). “Online Multiagent Learning against Memory Bounded Adversaries”. Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Artificial Intelligence Vol. 5212, pp. 211-226.


Taxonomy of possible adversaries


(Taken from Chakraborty and Stone, 2008):

Adversaries
  • Joint-action based
      • k-Markov: 1. Best response; 2. Godfather; 3. Bully
      • Dependent on entire history: 1. Fictitious play; 2. Grim opponent; 3. WoLF-PHC
  • Joint-strategy based
      • Previous-step joint strategy: 1. IGA; 2. WoLF-IGA; 3. ReDVaLeR
      • Entire history of joint strategies: 1. No-regret learners

Bully


Play any strategy that gives you the highest payoff, assuming that your opponent is a mindless follower.

Example of finding a pure Bully strategy (row player's payoff listed first):

         L     M     R
    T  3, 6  8, 6  7, 3
    C  8, 1  6, 3  7, 3
    B  3, 5  9, 2  7, 5

1. Find, for every action of yourself, the best response(s) of your opponent. This yields L and M for T, M and R for C, and L and R for B.
2. For these opponent actions, you will receive 3 and 8 for T, 6 and 7 for C, and 3 and 7 for B. Now choose one, and only one, of the actions with the highest security value. Here that would be C, with security value 6.
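To make the recipe concrete, here is a minimal Python sketch (mine, not from the slides) that computes a pure Bully action for a bimatrix game; the arrays encode the example above, and the names `U_ROW`, `U_COL` and `pure_bully` are invented for illustration.

```python
import numpy as np

# Payoffs of the example above: rows = {T, C, B}, columns = {L, M, R}.
U_ROW = np.array([[3, 8, 7],
                  [8, 6, 7],
                  [3, 9, 7]])
U_COL = np.array([[6, 6, 3],
                  [1, 3, 3],
                  [5, 2, 5]])

def pure_bully(u_me, u_opp, action_names):
    """Return the action maximising the payoff guaranteed against a
    best-responding ("mindless follower") opponent, with its security value."""
    best = None
    for i, name in enumerate(action_names):
        br = np.flatnonzero(u_opp[i] == u_opp[i].max())  # opponent's best responses (ties included)
        security = u_me[i, br].min()                     # worst case over those responses
        if best is None or security > best[1]:
            best = (name, security)
    return best

print(pure_bully(U_ROW, U_COL, "TCB"))  # ('C', 6), as in the example
```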


Idea for an app to learn to play {against} Bully


Play against the computer. At the outset, the computer initialises to either Bully (with a probability of 50%) or pure fictitious play; you cannot see which. After that, the computer does not change strategy. Try to press your regret down within as few rounds as possible.
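A sketch (mine) of how such an app's opponent might be wired up, assuming actions are integer indices; the 50% coin flip and the fixed-for-the-match choice are from the slide, while the class name and interface are invented.

```python
import random
import numpy as np

class AppOpponent:
    """Hidden opponent for the proposed app: Bully or pure fictitious play,
    chosen once by a fair coin and then fixed for the whole match."""

    def __init__(self, u_opp, bully_action):
        self.is_bully = random.random() < 0.5   # hidden 50% coin flip at the outset
        self.bully_action = bully_action        # precomputed pure Bully action
        self.u_opp = u_opp                      # opponent's payoffs: rows = your actions
        self.counts = np.zeros(u_opp.shape[0])  # empirical counts of your actions

    def play(self):
        if self.is_bully:
            return self.bully_action
        if self.counts.sum() == 0:              # fictitious play: arbitrary first move
            return 0
        freq = self.counts / self.counts.sum()
        return int(np.argmax(freq @ self.u_opp))  # best response to your empirical mix

    def observe(self, your_action):
        self.counts[your_action] += 1
```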

Bully: precise definition


Play any strategy that gives you the highest payoff, assuming that your opponent is a mindless follower.

Surprisingly difficult to capture in an exact definition. The notion of best response helps us out:

$$\mathrm{Bully}_i =_{\mathrm{Def}} \arg\max_{s_i \in S_i} \min\{\, u_i(s_i, s_{-i}) \mid s_{-i} \in \mathrm{BR}(s_i) \,\}.$$

  • Rightmost inner part: the best response(s) of the opponent to s_i.
  • Middle inner part (from the min onwards): the payoff guaranteed for bullying the opponent with s_i.
  • Entire formula: choose the s_i that maximises this guaranteed payoff. Recognise the maxmin, i.e. the security value, in this formula!
Bully: precise definition (in parts)


Let BR(s_i) be the set of all best responses to i's strategy s_i:

$$\mathrm{BR}(s_i) =_{\mathrm{Def}} \arg\max_{s_{-i}} \{\, u_{-i}(s_i, s_{-i}) \mid s_{-i} \in S_{-i} \,\}.$$

Let Bully_i(s_i) be the payoff guaranteed for playing s_i against mindless followers (i.e., best responders):

$$\mathrm{Bully}_i(s_i) =_{\mathrm{Def}} \min\{\, u_i(s_i, s_{-i}) \mid s_{-i} \in \mathrm{BR}(s_i) \,\}.$$

The set of bully strategies is formed by:

$$\mathrm{Bully}_i =_{\mathrm{Def}} \arg\max_{s_i \in S_i} \mathrm{Bully}_i(s_i).$$

Bully is stateless (a.k.a. memoryless, i.e., memory of k = 0 rounds), hence keeps playing the same action throughout.

Godfather (Littman and Stone, 2001)


A strategy [a function H → ∆(A) from histories to mixed strategies] that makes its opponent an offer that it cannot refuse.

Capitalises on the Folk theorem for repeated games with (not necessarily SGP) Nash equilibria.

A pair of strategies (s_i, s_{-i}) is called a targetable pair if each player, by playing its half of the pair, gets more than its security value (maxmin).

Godfather chooses a targetable pair.

1. If the opponent plays its half of the targetable pair in one stage, Godfather plays its half in the next stage.
2. Otherwise, Godfather falls back forever to the (mixed) strategy that forces the opponent to achieve at most its security value.

Godfather needs a memory of k = 1 (one round).
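For the Prisoners' dilemma, a minimal sketch (mine, not from the slides) of this teacher, taking mutual cooperation as the targetable pair and defection as the punishment that holds the opponent to its security value; the `play`/`observe` interface is an assumption.

```python
class Godfather:
    """Teacher for the iterated Prisoners' dilemma: offer the targetable
    pair (C, C); punish any deviation forever by defecting, which holds
    the opponent to its security value."""

    def __init__(self, my_half="C", opp_half="C", punishment="D"):
        self.my_half, self.opp_half = my_half, opp_half
        self.punishment = punishment
        self.triggered = False  # set after the first deviation, never reset

    def play(self):
        return self.punishment if self.triggered else self.my_half

    def observe(self, opp_action):
        if opp_action != self.opp_half:
            self.triggered = True  # fall back forever
```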

Folk theorem for NE in repeated games


■ Feasible payoffs (striped): payoff combos that can be obtained by jointly repeating patterns of actions (more accurately: patterns of action profiles).

■ Enforceable payoffs (shaded): no one goes below their minmax.

Theorem. If (x, y) is both feasible and enforceable, then (x, y) is the payoff in a Nash equilibrium of the infinitely repeated G with average payoffs. Conversely, if (x, y) is the payoff in any Nash equilibrium of the infinitely repeated G with average payoffs, then (x, y) is enforceable.

[Figure: payoff space with both axes running from 1 to 5, showing the feasible (striped) and enforceable (shaded) regions and the point (3, 3).]
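As a small illustration (mine, not from the slides), enforceability can be checked directly from the payoff matrices if we restrict the punisher to pure strategies; the general definition allows mixed minmax strategies, which would require a linear program.

```python
import numpy as np

def pure_minmax(u_me):
    """My minmax value when the punisher is restricted to pure strategies:
    for each opponent action (column) take my best response, then let the
    opponent pick the worst such column."""
    return u_me.max(axis=0).min()

def enforceable(x, y, u_row, u_col):
    """(x, y) is enforceable if neither player is pushed below minmax."""
    return x >= pure_minmax(u_row) and y >= pure_minmax(u_col.T)

# Prisoners' dilemma payoffs; both minmax values are 1, so the
# mutual-cooperation payoff (3, 3) from the figure is enforceable.
u_row = np.array([[3, 0],
                  [5, 1]])
print(enforceable(3, 3, u_row, u_row.T))  # True
```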

Variations on Godfather with memory k > 1


(Taken from Chakraborty and Stone, 2008):

Godfather-lenient plays its part of a targetable pair if, within the last k actions, the opponent played its own half of the pair at least once. Otherwise it executes the threat (but no longer forever).

Godfather-strict plays its part of a targetable pair if, within the last k actions, the opponent always played its own half of the pair.
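A minimal sketch (mine) of the two variants; the only difference is an `any` versus an `all` over the last-k window. Class and parameter names are invented.

```python
from collections import deque

class GodfatherK:
    """Godfather with a sliding window over the opponent's last k actions.
    lenient=True: cooperate if the opponent played its half at least once;
    lenient=False (strict): only if it played its half every time."""

    def __init__(self, k, lenient, my_half="C", opp_half="C", punishment="D"):
        self.window = deque(maxlen=k)
        self.lenient = lenient
        self.my_half, self.opp_half = my_half, opp_half
        self.punishment = punishment

    def play(self):
        if not self.window:  # nothing observed yet: open with the offer
            return self.my_half
        hits = [a == self.opp_half for a in self.window]
        ok = any(hits) if self.lenient else all(hits)
        return self.my_half if ok else self.punishment

    def observe(self, opp_action):
        self.window.append(opp_action)
```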

Godfather++ (Littman & Stone, 2005)


The name “Godfather++” is due to Crandall (2005).

Capitalises on the Folk theorem for repeated games with (not necessarily SGP) Nash equilibria.

Godfather++ is a polynomial-time algorithm for constructing a finite state machine. This FSM represents a strategy which plays a Nash equilibrium of a repeated 2-player game with averaged payoffs.

  • Not for finitely repeated games.
  • Not for infinitely repeated games with discounted payoffs.
  • Not for n-player games, n > 2.

Michael L. Littman and Peter Stone (2005). “A polynomial-time Nash equilibrium algorithm for repeated games”. Decision Support Systems, Vol. 39, pp. 55-66.

Finite machine for “two tits for tat”


[Figure: a finite state machine with a start state playing C and two further states playing D; transitions are labelled with action profiles such as (C, C) and (D, C), with “∗” edges for all other profiles.]

A finite state machine for the Prisoners' dilemma.

Personal actions determine states. Action profiles determine transitions between states. The “∗” represents an “else”, in the sense of “all other action profiles”.
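The machine can be coded as a transition table keyed by action profiles, with “∗” as the default edge. The sketch below is my reconstruction of two tits for tat; the state names and exact edge labels are assumptions.

```python
# Transition table: state -> {action profile: next state}, with "*" as "else".
# States are named by the action the machine plays in them (D1, D2 both play D).
TWO_TITS_FOR_TAT = {
    "C":  {("C", "C"): "C", "*": "D1"},  # any defection starts the punishment
    "D1": {"*": "D2"},                   # first retaliatory defection
    "D2": {("D", "C"): "C", "*": "D1"},  # forgive a cooperator, else re-punish
}

def step(state, my_action, opp_action, machine=TWO_TITS_FOR_TAT):
    edges = machine[state]
    return edges.get((my_action, opp_action), edges["*"])

# One run: the opponent defects once and then cooperates.
state = "C"
for opp in ["D", "C", "C", "C"]:
    mine = state[0]        # the first letter of the state name is its action
    print(mine, opp)
    state = step(state, mine, opp)
# Plays C, then exactly two D's, then returns to C.
```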

The use of counting nodes


[Figure: a counting node, drawn as a single node labelled “c, a_i” with an (a_i, a_{-i}) exit above and a “∗” exit below; it abbreviates a chain of c ordinary nodes that each play a_i, linked by (a_i, a_{-i}) transitions, with “∗” transitions collecting deviations.]

Upon entry:

  • If the action profile (a_i, a_{-i}) is played exactly c times, then take the exit above.
  • If the column player deviates in round d, keep playing a_i for the remaining c − (d + 1) rounds. Finally, take the exit below.

Because integers up to c can be expressed in (roughly) log c bits, the size of the finite machine is polynomial in log c.
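A counting node can be simulated with a counter instead of c unrolled states, which is exactly the log c compression. A sketch (mine; the exit and interface names are invented):

```python
class CountingNode:
    """Simulate a counting node with a log-c-bit counter instead of c
    unrolled states. Plays a_i for c rounds, then reports which exit to take:
    'above' if the profile (a_i, a_opp) occurred every round, else 'below'."""

    def __init__(self, c, a_i, a_opp):
        self.remaining = c
        self.a_i, self.a_opp = a_i, a_opp
        self.deviated = False

    def play(self):
        return self.a_i

    def observe(self, opp_action):
        """Return 'above' or 'below' when the count runs out, else None."""
        if opp_action != self.a_opp:
            self.deviated = True   # keep playing a_i for the remaining rounds
        self.remaining -= 1
        if self.remaining == 0:
            return "below" if self.deviated else "above"
        return None
```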
Pair of strategies that is a Nash equilibrium in a repeated game


[Figure: two mirrored finite state machines, one per player. Row's machine: counting nodes a1 (r1 rounds) and a2 (r2 rounds) linked by (a1, b1) and (a2, b2) transitions, with “∗” transitions into a punishment node αrow held for max{βrow, βcol} rounds. Col's machine is the mirror image with b1, b2 and αcol.]

Nodes a1 and a2 are the actions that row must play (in sync with col): first r1 × a1, then r2 × a2, then r1 × a1, etc.

If the opponent deviates, then retaliate with αrow for max{βrow, βcol} rounds.

The two automata always run in sync, no matter who deviates first. It can (easily) be deduced that, for each player, deviating at any node is detrimental ⇒ a Nash equilibrium of the repeated game.
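A sketch (mine) of one player's side of the pair: cycle through the coordinated blocks and, on deviation, hold the minmax action for the punishment period. The convention that the cycle restarts after punishment is my assumption.

```python
class FolkMachine:
    """One player's half of the equilibrium pair: cycle through the
    coordinated blocks; on any deviation, hold the minmax action for
    punish_rounds = max(beta_row, beta_col) rounds, then restart."""

    def __init__(self, my_blocks, opp_blocks, punish_action, punish_rounds):
        # my_blocks = [(a1, r1), (a2, r2)]; opp_blocks = [(b1, r1), (b2, r2)].
        self.my_seq = [a for a, r in my_blocks for _ in range(r)]
        self.opp_seq = [b for b, r in opp_blocks for _ in range(r)]
        self.t = 0
        self.punish_action = punish_action
        self.punish_rounds = punish_rounds
        self.punish_left = 0

    def play(self):
        if self.punish_left > 0:
            return self.punish_action
        return self.my_seq[self.t % len(self.my_seq)]

    def observe(self, opp_action):
        if self.punish_left > 0:
            self.punish_left -= 1
            if self.punish_left == 0:
                self.t = 0  # restart the coordinated cycle after punishing
        elif opp_action != self.opp_seq[self.t % len(self.opp_seq)]:
            self.punish_left = self.punish_rounds
        else:
            self.t += 1
```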

The devil and the details...


The point is that all parameters can be determined analytically, in polynomial time.

1. The coordinated action profiles (a1, b1), (a2, b2) and their durations of play r1, r2. Nash says: take the strategy pair (s1, s2) that maximises the product of the players' advantages. This pair can be obtained (or at least approximated) by playing the convex combination

$$\frac{r_1}{r_1 + r_2}\,(a_1, b_1) + \frac{r_2}{r_1 + r_2}\,(a_2, b_2)$$

for r1, r2 not too large. The pair (s1, s2) is obtained by looping through (A²)² (all pairs of pairs of actions).

2. The strategy and duration of punishment (αrow, αcol and βrow, βcol, respectively).

  • αrow and αcol are the minmax strategies of the stage game.
  • βrow and βcol depend on turning points to “get even”. These are determined by (i) the average payoff for cooperating and (ii) an upper bound on the largest possible payoff from a single round of freeriding.
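A brute-force sketch (mine) of the parameter search: loop through (A²)² and small durations, maximising the product of advantages over the minmax values. The paper derives r1, r2 analytically; the small grid over durations here is only for illustration.

```python
import itertools
import numpy as np

def best_targetable_alternation(u_row, u_col, mm_row, mm_col, r_max=5):
    """Loop through (A^2)^2 and small durations (r1, r2), maximising the
    product of the players' advantages over their minmax values."""
    profiles = list(itertools.product(range(u_row.shape[0]),
                                      range(u_row.shape[1])))
    best, best_val = None, -np.inf
    for p1, p2, r1, r2 in itertools.product(profiles, profiles,
                                            range(1, r_max + 1),
                                            range(1, r_max + 1)):
        w = r1 / (r1 + r2)
        x = w * u_row[p1] + (1 - w) * u_row[p2]  # row's average payoff
        y = w * u_col[p1] + (1 - w) * u_col[p2]  # col's average payoff
        val = (x - mm_row) * (y - mm_col)        # product of advantages
        if x > mm_row and y > mm_col and val > best_val:
            best, best_val = ((p1, r1), (p2, r2)), val
    return best, best_val

# Prisoners' dilemma: the search settles on repeating (C, C) = profile (0, 0).
u_row = np.array([[3, 0], [5, 1]])
print(best_targetable_alternation(u_row, u_row.T, 1, 1))
```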


Part II: Crandall & Goodrich (2005)
