

SLIDE 1


Multi-agent learning

Teaching strategies

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

SLIDE 2


Plan for Today

Part I: Preliminaries

  • 1. Teacher possesses memory of k = 0 rounds: Bully
  • 2. Teacher possesses memory of k = 1 round: Godfather
  • 3. Teacher possesses memory of k > 1 rounds: {lenient, strict} Godfather
  • 4. Teacher is represented by a finite machine: Godfather++

Part II: Crandall & Goodrich (2005) SPaM: an algorithm that claims to integrate follower and teacher algorithms.

  • a. Three points of criticism of Godfather++.
  • b. Core idea of SPaM: combine teacher and follower capabilities.
  • c. Notion of guilt to trigger switches between teaching and following.


SLIDE 3


Literature

Michael L. Littman and Peter Stone (2001). “Leading best-response strategies in repeated games”. Research note.

One of the first papers, if not the first paper, that mentions Bully and Godfather.

Michael L. Littman and Peter Stone (2005). “A polynomial-time Nash equilibrium algorithm for repeated games”. In Decision Support Systems Vol. 39, pp. 55-66.

Paper that describes Godfather++.

Jacob W. Crandall and Michael A. Goodrich (2005). “Learning to teach and follow in repeated games”. In AAAI Workshop on Multiagent Learning, Pittsburgh, PA.

Paper that attempts to combine Fictitious Play and a modified Godfather++ to define an algorithm that “knows” when to teach and when to follow.

Doran Chakraborty and Peter Stone (2008). “Online Multiagent Learning against Memory Bounded Adversaries”. In Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Artificial Intelligence Vol. 5212, pp. 211-226.


SLIDE 4


Taxonomy of possible adversaries

(Taken from Chakraborty and Stone, 2008):

Adversaries

  • Joint-action based
      – k-Markov: 1. Best response, 2. Godfather, 3. Bully
      – Dependent on entire history: 1. Fictitious play, 2. Grim opponent, 3. WoLF-PHC
  • Joint-strategy based
      – Previous step joint-strategy: 1. IGA, 2. WoLF-IGA, 3. ReDVaLer
      – Entire history of joint strategies: 1. No-regret learners


SLIDE 5


Bully

Play any strategy that gives you the highest payoff, assuming that your opponent is a mindless follower.

Example of finding a pure Bully strategy:

        L     M     R
  T   3, 6  8, 1  7, 3
  C   8, 1  6, 3  9, 4
  B   3, 2  9, 5  8, 5

  • 1. Find, for every action of yourself, the best response of your opponent. This yields (T, L(6)), (C, R(4)), (B, M(5)), (B, R(5)).
  • 2. Now change perspective: (T(3), L), (C(9), R), (B(9), M), (B(8), R), and choose the action with the highest guaranteed payoff. That would be C.


SLIDE 6


Bully: precise definition

Play any strategy that gives you the highest payoff, assuming that your opponent is a mindless follower.

Surprisingly difficult to capture in an exact definition. Would be something like:

$$\mathrm{Bully}_i =_{\mathrm{Def}} \operatorname*{argmax}_{s_i \in S_i} \min\{\, u_i(s_i, s_{-i}) \mid s_{-i} \in \operatorname*{argmax}_{s_{-i} \in S_{-i}} u_{-i}(s_i, s_{-i}) \,\}$$

  • Innermost argmax: the best response of the opponent to si.
  • Middle part (the min): the guaranteed payoff for bullying the opponent with si.
  • Entire formula: choose the si that maximises own payoff, given the guaranteed payoff for bullying the opponent with si.

SLIDE 7


Bully: precise definition (in parts)

  • Let BR(si) be the set of all best responses to strategy si:

    $$\mathrm{BR}(s_i) =_{\mathrm{Def}} \operatorname*{argmax}_{s_{-i} \in S_{-i}} u_{-i}(s_i, s_{-i})$$

  • Let Bullyi(si) be the payoff guaranteed for playing si against mindless followers (i.e., best responders):

    $$\mathrm{Bully}_i(s_i) =_{\mathrm{Def}} \min\{\, u_i(s_i, s_{-i}) \mid s_{-i} \in \mathrm{BR}(s_i) \,\}$$

  • The set of Bully strategies is formed by:

    $$\mathrm{Bully}_i =_{\mathrm{Def}} \operatorname*{argmax}_{s_i \in S_i} \mathrm{Bully}_i(s_i)$$

  • Bully is stateless (a.k.a. memoryless, i.e., memory of k = 0 rounds), and thus keeps playing the same action throughout.
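These definitions translate directly into code. A minimal Python sketch, run on the 3×3 example game from the previous slide (the dictionary and function names are my own, illustrative choices):

```python
# Sketch of the pure Bully computation from the definitions above,
# on the 3x3 bimatrix example: rows T, C, B against columns L, M, R.

row_payoff = {('T','L'): 3, ('T','M'): 8, ('T','R'): 7,
              ('C','L'): 8, ('C','M'): 6, ('C','R'): 9,
              ('B','L'): 3, ('B','M'): 9, ('B','R'): 8}
col_payoff = {('T','L'): 6, ('T','M'): 1, ('T','R'): 3,
              ('C','L'): 1, ('C','M'): 3, ('C','R'): 4,
              ('B','L'): 2, ('B','M'): 5, ('B','R'): 5}
rows, cols = ['T', 'C', 'B'], ['L', 'M', 'R']

def best_responses(r):
    """BR(s_i): all column actions maximising the opponent's payoff."""
    best = max(col_payoff[(r, c)] for c in cols)
    return [c for c in cols if col_payoff[(r, c)] == best]

def bully_value(r):
    """Bully_i(s_i): payoff guaranteed against a best-responding follower."""
    return min(row_payoff[(r, c)] for c in best_responses(r))

bully = max(rows, key=bully_value)
print(bully, bully_value(bully))   # -> C 9, as on the example slide
```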


SLIDE 8


Godfather (Littman and Stone, 2001)

  • A strategy [a function H → ∆(A) from histories to mixed strategies] that makes its opponent an offer it cannot refuse.
  • Capitalises on the Folk theorem for repeated games with (not necessarily SGP) Nash equilibria.
  • A pair of strategies (si, s−i) is called a targetable pair if playing them results in each player getting more than its safety (maxmin) value when each plays its half of the pair.
  • Godfather chooses a targetable pair.
    1. If the opponent plays its half of the targetable pair in one stage, Godfather plays its half in the next stage.
    2. Otherwise it falls back forever to the (mixed) strategy that forces the opponent to achieve at most its safety value.
  • Godfather needs a memory of k = 1 (one round).
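A minimal sketch of this behaviour, assuming the targetable pair and the punishment mix are given; the class interface is an illustrative choice, not taken from Littman and Stone (2001):

```python
# Sketch of Godfather with memory k = 1.

import random

class Godfather:
    def __init__(self, my_target, opp_target, punish_mix):
        self.my_target = my_target      # own half of the targetable pair
        self.opp_target = opp_target    # opponent's half of the pair
        self.punish_mix = punish_mix    # mixed strategy {action: prob} that
                                        # holds the opponent to its safety value
        self.punishing = False          # once triggered, stays on forever

    def act(self, last_opp_action):
        if last_opp_action is None:     # first round: offer cooperation
            return self.my_target
        if last_opp_action != self.opp_target:
            self.punishing = True       # offer refused: retaliate forever
        if self.punishing:
            actions, probs = zip(*self.punish_mix.items())
            return random.choices(actions, weights=probs)[0]
        return self.my_target

# Usage in the Prisoners' dilemma: target pair (C, C), punish with pure D.
gf = Godfather('C', 'C', {'D': 1.0})
print(gf.act(None), gf.act('C'), gf.act('D'), gf.act('C'))  # C C D D
```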


SLIDE 9


Folk theorem for NE in repeated games with average payoffs

  • Feasible payoffs (striped): payoff combos that can be obtained by jointly repeating patterns of actions (more accurately: patterns of action profiles).
  • Enforceable payoffs (shaded): no one goes below their minmax.
  • Theorem. If (x, y) is both feasible and enforceable, then (x, y) is the payoff in a Nash equilibrium of the infinitely repeated G with average payoffs. Conversely, if (x, y) is the payoff in any Nash equilibrium of the infinitely repeated G with average payoffs, then (x, y) is enforceable.

[Figure: feasible (striped) and enforceable (shaded) payoff regions on axes running from 1 to 5; the point (3, 3) lies in both regions.]
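A tiny numeric illustration (all payoffs and minmax values made up): the average payoff of a jointly repeated pattern of action profiles is a feasible point, and checking enforceability is a componentwise comparison against the minmax values.

```python
# Made-up stage payoffs of two alternating action profiles.
pattern = [(4, 1), (1, 4)]
avg = tuple(sum(xs) / len(pattern) for xs in zip(*pattern))
print(avg)                   # (2.5, 2.5): a feasible payoff combo

minmax = (1, 1)              # assumed minmax (safety) values of both players
print(all(a >= m for a, m in zip(avg, minmax)))
# True: also enforceable, hence a NE payoff of the infinitely repeated game
```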


SLIDE 10


Variations on Godfather with memory k > 1

(Taken from Chakraborty and Stone, 2008):

  • Godfather-lenient plays its part of a targetable pair if, within the last k actions, the opponent played its own half of the pair at least once. Otherwise it executes the threat. (But no longer forever.)
  • Godfather-strict plays its part of a targetable pair if, within the last k actions, the opponent always played its own half of the pair.
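The only difference between the variants is the test applied to the opponent's last k actions. A sketch (names are illustrative; `history` is the list of the opponent's past actions, most recent last):

```python
def lenient_cooperates(history, opp_target, k):
    """Lenient: opponent played its half at least once in the last k rounds."""
    return opp_target in history[-k:]

def strict_cooperates(history, opp_target, k):
    """Strict: opponent played its half in every one of the last k rounds."""
    return all(a == opp_target for a in history[-k:])

history = ['C', 'D', 'C']
print(lenient_cooperates(history, 'C', 2))  # True: one 'C' in the last 2
print(strict_cooperates(history, 'C', 2))   # False: a 'D' in the last 2
```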


SLIDE 11


Godfather++ (Littman & Stone, 2005)

  • The name “Godfather++” is due to Crandall (2005).
  • Capitalises on the Folk theorem for repeated games with (not necessarily SGP) Nash equilibria.
  • Godfather++ is a polynomial-time algorithm for constructing a finite state machine. This FSM represents a strategy which plays a Nash equilibrium for a repeated 2-player game with averaged payoffs.
    – Not for finitely repeated games.
    – Not for infinitely repeated games with discounted payoffs.
    – Not for n-player games, n > 2.

Michael L. Littman and Peter Stone (2005). “A polynomial-time Nash equilibrium algorithm for repeated games”. In Decision Support Systems Vol. 39, pp. 55-66.


SLIDE 12


Finite machine for “two tits for tat”

[Diagram: a finite state machine with three states playing C, D, D; "Start" points to the C state. Edges are labelled with action profiles such as (D, C) and (C, C), with "∗" edges for all other profiles.]

  • Finite state machine for the Prisoners’ dilemma.
  • Personal actions determine states.
  • Action profiles determine transitions between states.

The “∗” represents an “else,” in the sense of “all other action profiles”.
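A sketch of this machine as a transition function. The exact transition table is one plausible reading of the diagram (punish each defection with two defections), not a verbatim transcription:

```python
# States carry the personal action; action profiles (own, opponent) drive
# the transitions; the diagram's '*' is the final else-branch here.

ACTION = {'C': 'C', 'D1': 'D', 'D2': 'D'}   # state -> personal action

def step(state, profile):
    _, opp = profile
    if state == 'C':
        return 'C' if opp == 'C' else 'D1'  # defection triggers two tits
    if state == 'D1':
        return 'D2'                         # '*': second tit regardless
    # state 'D2': resume cooperation only if the opponent cooperated
    return 'C' if opp == 'C' else 'D1'

state = 'C'                                 # the Start state
for profile in [('C', 'C'), ('C', 'D'), ('D', 'C'), ('D', 'C')]:
    state = step(state, profile)
print(ACTION[state])                        # back to 'C' after two D's
```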


SLIDE 13


The use of counting nodes

[Diagram: a counting node labelled "c, a_i" is equivalent to a chain of c states that each play a_i; transitions on the profile (a_i, a_−i) run along the chain to an exit above, while "∗" transitions lead to an exit below.]

Upon entry:

  • If the action profile (a_i, a_−i) is played exactly c times, take the exit above.
  • If the column player deviates in round d, keep playing a_i for the remaining c − (d + 1) rounds. Finally, take the exit below.
  • Because integers up to c can be expressed in (roughly) log c bits, the size of the finite machine is polynomial in log c.
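A sketch of the same behaviour with an explicit counter instead of c unrolled states, which is exactly what makes the machine polynomial in log c (interface names are illustrative):

```python
class CountingNode:
    def __init__(self, a_i, target_profile, c):
        self.a_i = a_i                  # action played in every round
        self.target = target_profile    # the profile (a_i, a_-i) being counted
        self.c = c
        self.count = 0                  # fits in ~log2(c) bits
        self.deviated = False

    def play(self):
        return self.a_i                 # keep playing a_i, deviation or not

    def observe(self, profile):
        """Returns 'exit_above', 'exit_below', or None while still counting."""
        self.count += 1
        if profile != self.target:
            self.deviated = True        # deviation in round d: finish the
                                        # remaining c - (d + 1) rounds anyway
        if self.count == self.c:
            return 'exit_below' if self.deviated else 'exit_above'
        return None
```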


SLIDE 14


Pair of strategies that is a Nash equilibrium in a repeated game

[Diagram: two synchronised automata, one for the row player (nodes a1, a2) and one for the column player (nodes b1, b2). The nodes play the coordinated profiles (a1, b1) and (a2, b2) for r1 and r2 rounds respectively; "∗" edges lead to punishment nodes αrow / αcol, played for max{βrow, βcol} rounds.]

  • Nodes a1 and a2 are the actions that row must play (in sync with col): first r1 × a1, then r2 × a2, then r1 × a1, etc.
  • If the opponent deviates, then retaliate with αrow for max{βrow, βcol} rounds.
  • The two automata always run in sync, no matter who deviates first. It can (easily) be deduced that, for each player, deviating at any node is detrimental ⇒ Nash equilibrium in the repeated game.
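A sketch of the row player's automaton. One detail the diagram leaves open, whether play restarts at a1 after punishment, is filled in here as an explicit assumption:

```python
class RowMachine:
    def __init__(self, a1, a2, b1, b2, r1, r2, alpha_row, beta):
        # The cooperation cycle: r1 rounds of (a1, b1), then r2 of (a2, b2).
        self.plan = [(a1, (a1, b1))] * r1 + [(a2, (a2, b2))] * r2
        self.alpha_row = alpha_row      # minmax strategy against col
        self.beta = beta                # beta = max(beta_row, beta_col)
        self.pos, self.punish_left = 0, 0

    def act(self):
        if self.punish_left > 0:
            return self.alpha_row       # retaliation phase
        return self.plan[self.pos][0]

    def observe(self, profile):
        if self.punish_left > 0:
            self.punish_left -= 1
        elif profile != self.plan[self.pos][1]:
            self.punish_left = self.beta   # any deviation: retaliate
            self.pos = 0                   # assumption: restart the cycle
        else:
            self.pos = (self.pos + 1) % len(self.plan)
```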


SLIDE 15


The devil and the details...

The point is that all parameters can be determined analytically, in polynomial time.

  • 1. The coordinated action profiles (a1, b1), (a2, b2) and their durations of play r1, r2.

    Nash says: take the strategy pair (s1, s2) that maximises the product of the players' advantages. This pair can be obtained (or at least approximated) by playing the convex combination

    $$\frac{r_1}{r_1 + r_2}(a_1, b_1) + \frac{r_2}{r_1 + r_2}(a_2, b_2)$$

    for r1, r2 not too large. Pair (s1, s2) is obtained by looping through $(A^2)^2$ (all pairs of pairs of actions).

  • 2. The strategy and duration of punishment (αrow, αcol and βrow, βcol, respectively).

    – αrow and αcol are the minmax strategies of the stage game.
    – βrow and βcol depend on turning points to “get even”. These are determined by (i) the average payoff for cooperating and (ii) an upper bound on the largest possible value of a single round of freeriding.
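A sketch of the search in step 1: loop through all pairs of action profiles and small r1, r2, and keep the convex combination with the largest product of advantages over the minmax values (function and argument names are illustrative):

```python
from itertools import product

def nash_target(payoff, rows, cols, minmax, max_r=10):
    """payoff[(a, b)] = (u_row, u_col); minmax = (minmax_row, minmax_col)."""
    profiles = list(product(rows, cols))
    best, best_product = None, float('-inf')
    for p1, p2 in product(profiles, repeat=2):        # loop through (A^2)^2
        for r1, r2 in product(range(1, max_r + 1), repeat=2):
            w = r1 / (r1 + r2)                        # convex weight of p1
            u = tuple(w * payoff[p1][k] + (1 - w) * payoff[p2][k]
                      for k in (0, 1))
            if u[0] <= minmax[0] or u[1] <= minmax[1]:
                continue                              # not targetable
            nash_product = (u[0] - minmax[0]) * (u[1] - minmax[1])
            if nash_product > best_product:
                best_product, best = nash_product, (p1, p2, r1, r2)
    return best                                       # (p1, p2, r1, r2)
```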


SLIDE 16


Part II: Crandall & Goodrich (2005)
