
Multi-agent learning: Repeated games. Gerard Vreeswijk, Intelligent Systems Group (PowerPoint presentation)



  1. Multi-agent learning: Repeated games. Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Last modified on February 9th, 2012 at 17:15.

  2. Repeated games: motivation
     1. Much interaction in multi-agent systems can be modelled through games.
     2. Much learning in multi-agent systems can therefore be modelled through learning in games.
     3. Learning in games usually takes place through the (gradual) adaptation of strategies (hence, behaviour) in a repeated game.
     4. In most repeated games, one game (a.k.a. the stage game) is played repeatedly. Possibilities:
        • A finite number of times.
        • An indefinite (same: indeterminate) number of times.
        • An infinite number of times.
     5. Therefore, familiarity with the basic concepts and results from the theory of repeated games is essential to understand multi-agent learning.

  3. Plan for today
     • NE in normal form games that are repeated a finite number of times.
       – Principle of backward induction.
     • NE in normal form games that are repeated an indefinite number of times.
       – Discount factor: models the probability of continuation.
       – Folk theorem (actually many folk theorems): repeated games generally do have infinitely many Nash equilibria.
       – Trigger strategy, on-path vs. off-path play, the threat to "minmax" an opponent.
     This presentation draws heavily on Peters (2008): H. Peters, Game Theory: A Multi-Leveled Approach, Springer, ISBN 978-3-540-69290-4, Ch. 8: Repeated games.

  4. Example 1: Nash equilibria in playing the PD twice
     Prisoners' Dilemma (row: You, column: Other):

                         Cooperate   Defect
        Cooperate        (3, 3)      (0, 5)
        Defect           (5, 0)      (1, 1)

     • Even if mixed strategies are allowed, the PD possesses one Nash equilibrium, viz. (D, D) with payoffs (1, 1). (A small verification sketch follows this slide.)
     • This equilibrium is Pareto sub-optimal, because (3, 3) makes both players better off.
     • Does the situation change if two parties get to play the Prisoners' Dilemma two times in succession?
     • The following diagram (hopefully) shows that playing the PD two times in succession does not yield an essentially new NE.
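A minimal sketch, not part of the slides: a brute-force check that (D, D) is the only pure-strategy Nash equilibrium of the stage-game PD, using the payoffs from the table above.

```python
# Minimal sketch: check every pure action profile of the stage-game PD for the
# Nash property (no player gains by deviating unilaterally).
ACTIONS = ["C", "D"]
PAYOFF = {  # (row action, column action) -> (row payoff, column payoff), as on the slide
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def is_nash(row, col):
    row_ok = all(PAYOFF[(row, col)][0] >= PAYOFF[(dev, col)][0] for dev in ACTIONS)
    col_ok = all(PAYOFF[(row, col)][1] >= PAYOFF[(row, dev)][1] for dev in ACTIONS)
    return row_ok and col_ok

print([(r, c) for r in ACTIONS for c in ACTIONS if is_nash(r, c)])  # -> [('D', 'D')]
```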

  5. Example 1: Nash equilibria in playing the PD twice (2)
     [Game tree: from the root (0, 0), the first round branches into CC, CD, DC and DD with stage payoffs (3, 3), (0, 5), (5, 0) and (1, 1); each branch is followed by a second round with the same four choices, yielding the sixteen cumulative payoff pairs shown in the normal form on the next slide, from (6, 6) for CC–CC down to (2, 2) for DD–DD.]

  6. Example 1: Nash equilibria in playing the PD twice (3)
     In normal form (row: You, column: Other):

               CC         CD         DC         DD
        CC     (6, 6)     (3, 8)     (3, 8)     (0, 10)
        CD     (8, 3)     (4, 4)     (5, 5)     (1, 6)
        DC     (8, 3)     (5, 5)     (4, 4)     (1, 6)
        DD     (10, 0)    (6, 1)     (6, 1)     (2, 2)

     • The action profile (DD, DD) is the only Nash equilibrium.
     • With 3 successive games, we obtain a 2^3 × 2^3 matrix, where the action profile (DDD, DDD) still would be the only Nash equilibrium.
     • Generalise to N repetitions: (D^N, D^N) still is the only Nash equilibrium in a repeated game where the PD is played N times in succession (see the sketch below).
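A minimal sketch of how the matrix above can be generated and checked. It assumes, as the slide's normal form does, that strategies are fixed action sequences rather than history-dependent plans.

```python
# Minimal sketch: build the normal form of the n-fold repeated PD over fixed
# action sequences and list its pure-strategy Nash equilibria.
from itertools import product

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def repeated_matrix(n):
    """Sum stage payoffs over n rounds for every pair of action sequences."""
    seqs = ["".join(s) for s in product("CD", repeat=n)]
    matrix = {(r, c): tuple(sum(PAYOFF[(a, b)][i] for a, b in zip(r, c)) for i in (0, 1))
              for r in seqs for c in seqs}
    return matrix, seqs

def pure_nash(matrix, seqs):
    eqs = []
    for r, c in product(seqs, seqs):
        if all(matrix[(r, c)][0] >= matrix[(d, c)][0] for d in seqs) and \
           all(matrix[(r, c)][1] >= matrix[(r, d)][1] for d in seqs):
            eqs.append((r, c))
    return eqs

m2, s2 = repeated_matrix(2)
print(pure_nash(m2, s2))  # -> [('DD', 'DD')]
m3, s3 = repeated_matrix(3)
print(pure_nash(m3, s3))  # -> [('DDD', 'DDD')]
```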

  7. Backward induction (version for repeated games)
     • Suppose G is a game in normal form for p players, where all players possess the same arsenal of possible actions A = { a_1, ..., a_m }.
     • The game G^n arises by playing the stage game G a number of n times in succession.
     • A history h of length k is an element of (A^p)^k. E.g., for p = 3 and k = 10,
       a_7 a_5 a_3   a_6 a_1 a_9   a_2 a_7 a_7   a_3 a_6 a_9   a_2 a_4 a_2   a_9 a_9 a_1   a_1 a_4 a_1   a_2 a_7 a_9   a_6 a_1 a_1   a_8 a_2 a_4
       is a history of length ten in a game with three players. The set of all possible histories is denoted by H. (Hence, |H_k| = m^{kp}.)
     • A (possibly mixed) strategy for one player is a function H → Pr(A).
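A minimal sketch of the last bullet for the two-player PD: a history as a list of action profiles, and a strategy as a function from histories to a distribution over own actions. Tit-for-tat is used here purely as an illustration; it is not introduced on this slide.

```python
# Minimal sketch: strategies as functions History -> Pr(A) for the two-player PD.
from typing import Dict, List, Tuple

History = List[Tuple[str, str]]   # one (own action, opponent action) pair per round

def tit_for_tat(h: History) -> Dict[str, float]:
    """Cooperate in round 1, then copy the opponent's previous action."""
    if not h:
        return {"C": 1.0, "D": 0.0}
    last_opponent_action = h[-1][1]
    return {"C": 1.0, "D": 0.0} if last_opponent_action == "C" else {"C": 0.0, "D": 1.0}

def random_cooperator(h: History) -> Dict[str, float]:
    """A genuinely mixed strategy: cooperate with probability 0.8, regardless of history."""
    return {"C": 0.8, "D": 0.2}

print(tit_for_tat([("C", "C"), ("C", "D")]))  # opponent defected last round -> {'C': 0.0, 'D': 1.0}
```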

  8. Backward induction (version for repeated games)
     • For some repeated games of length n, the dominating (read: "clearly best") strategy for all players in round n (the last round) does not depend on the history of play. E.g., for the Prisoners' Dilemma in the last round: "No matter what happened in rounds 1 ... n − 1, I am better off playing D."
     • Fixed strategies (D, D) in round n determine play after round n − 1.
     • Independence of history, plus a determined future, leads to the following justification for playing D in round n − 1: "No matter what happened in rounds 1 ... n − 2 (the past), and given that I will receive a payoff of 1 in round n (the future), I am better off playing D now."
     • Per induction, in round k, where k ≥ 1: "No matter what happened in rounds 1 ... k − 1, and given that I will receive a payoff of (n − k) · 1 in rounds (k + 1) ... n, I am better off playing D in round k."
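A minimal sketch of the induction step, not from the slides: once the future payoff is fixed (both players defect in all later rounds), it cancels from the comparison, so D is the better current action no matter what the opponent plays and no matter the history.

```python
# Minimal sketch: in every round of the finitely repeated PD, given a fixed
# continuation payoff, D beats C against either current opponent action.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def best_current_action(rounds_left):
    # Future payoff is (rounds_left - 1) * 1, regardless of the current action,
    # so it cancels on both sides of the comparison below.
    future = (rounds_left - 1) * PAYOFF[("D", "D")][0]
    for opponent in ("C", "D"):
        assert PAYOFF[("D", opponent)][0] + future > PAYOFF[("C", opponent)][0] + future
    return "D"

n = 10
print([best_current_action(k) for k in range(n, 0, -1)])  # -> ['D', 'D', ..., 'D']
```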

  9. Indefinite number of repetitions
     • A Pareto-suboptimal outcome can be avoided in case the following three conditions are met:
       1. The Prisoners' Dilemma is repeated an indefinite number of times (rounds).
       2. A so-called discount factor δ ∈ [0, 1] determines the probability of continuing the game after each round.
       3. The probability to continue, δ, must be large enough. (A worked check of "large enough" follows below.)
     • Under these conditions suddenly infinitely many Nash equilibria exist. This is sometimes called an embarrassment of richness (Peters, 2008).
     • Various Folk theorems state the existence of multiple equilibria in infinitely repeated games. (Folk Theorems are named such because their exact origin cannot be traced.)
     • We now informally discuss one version of "the" Folk Theorem.
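The slide only states that δ must be "large enough". As a hedged sketch, and assuming the trigger strategy mentioned in the plan (cooperate until the opponent defects, then defect forever), the threshold for the PD payoffs used throughout these slides can be computed as follows; this particular calculation is not worked out on the slide itself.

```python
# Hedged sketch: compare cooperating forever against a one-shot deviation
# followed by permanent mutual defection, for the PD payoffs (3,3)/(0,5)/(5,0)/(1,1).
def cooperate_forever(delta):
    return 3 / (1 - delta)              # 3 + 3*delta + 3*delta**2 + ...

def deviate_once(delta):
    return 5 + delta * 1 / (1 - delta)  # 5 now, then payoff 1 in every later round

# Cooperation is a best response iff cooperate_forever(delta) >= deviate_once(delta).
# Solving 3/(1-d) >= 5 + d/(1-d) gives d >= 1/2; a quick numerical check:
for delta in (0.3, 0.5, 0.7):
    print(delta, cooperate_forever(delta) >= deviate_once(delta))
# -> 0.3 False, 0.5 True, 0.7 True
```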

  10. Example 2: Prisoners' Dilemma repeated indefinitely
     • Consider the game G*(δ) where the PD is played a number of times in succession. We write G*(δ) : G_0, G_1, G_2, ...
     • The number of times the stage game is played is determined by a parameter 0 ≤ δ ≤ 1. The probability that the next stage (and the stages thereafter) will be played is δ. Thus, the probability that stage game G_t will be played is δ^t. (What if t = 0?) (See the simulation sketch below.)
     • The PD (of which every G_t is an incarnation) is called the stage game, as opposed to the overall game G*(δ).
     • A history h of length t of a repeated game is a sequence of action profiles of length t.
     • A realisation h is a countably infinite sequence of action profiles.
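A minimal simulation sketch, not from the slides: flip the continuation coin after every round and check empirically that stage t is reached with probability δ^t (and that stage 0, with δ^0 = 1, is always played).

```python
# Minimal sketch: the number of stages played is geometric in the discount factor.
import random

def sample_game_length(delta, rng):
    """Stages actually played: stage 0 always, each subsequent stage with probability delta."""
    t = 1
    while rng.random() < delta:
        t += 1
    return t

rng = random.Random(0)
delta = 0.9
lengths = [sample_game_length(delta, rng) for _ in range(100_000)]
for t in (0, 1, 5):
    empirical = sum(length > t for length in lengths) / len(lengths)
    print(f"P(stage {t} is played): empirical ~ {empirical:.3f}, theoretical = {delta**t:.3f}")
```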

  11. Example 2: Prisoners' Dilemma repeated indefinitely (2)
     • Example of a history of length t = 10:

        Round:           0  1  2  3  4  5  6  7  8  9
        Row player:      C  D  D  D  C  C  D  D  D  D
        Column player:   C  D  D  D  D  D  D  C  D  D

     • The set of all possible histories (of any length) is denoted by H.
     • A (mixed) strategy for Player i is a function s_i : H → Pr({C, D}) such that Pr(Player i plays C in round |h| + 1 | h) = s_i(h)(C).
     • A strategy profile s is a combination of strategies, one for each player.
     • The expected payoff for player i given s can be computed. It is

        Expected payoff_i(s) = Σ_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).
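A minimal sketch, not from the slides, that evaluates this discounted sum for two simple deterministic strategy profiles by truncating the series, and compares the result with the closed-form geometric sums.

```python
# Minimal sketch: discounted expected payoff sum, truncated once the tail is negligible.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def discounted_payoff(profile_at, delta, player=0, horizon=2_000):
    """profile_at(t) returns the action profile played in round t."""
    return sum(delta**t * PAYOFF[profile_at(t)][player] for t in range(horizon))

delta = 0.9
always_defect = lambda t: ("D", "D")
always_cooperate = lambda t: ("C", "C")

print(discounted_payoff(always_defect, delta))     # ~ 1 / (1 - delta) = 10.0
print(discounted_payoff(always_cooperate, delta))  # ~ 3 / (1 - delta) = 30.0
```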

  12. Example: The expected payoff of a stage game
     Prisoners' Dilemma (row: You, column: Other):

                         Cooperate   Defect
        Cooperate        (3, 3)      (0, 5)
        Defect           (5, 0)      (1, 1)

     • Suppose the following strategy profile for one game:
       – Row player (you) plays with mixed strategy 0.8 on C (hence, 0.2 on D).
       – Column player (other) plays with mixed strategy 0.7 on C.
     • Your expected payoff is 0.8 (0.7 · 3 + 0.3 · 0) + 0.2 (0.7 · 5 + 0.3 · 1) = 2.44.
     • General formula (cf., e.g., Leyton-Brown et al., 2008), where s_{k, i_k} is the probability that player k plays action i_k:

        Expected payoff_{i,t}(s) = Σ_{(i_1, ..., i_n) ∈ A^n} ( Π_{k=1}^{n} s_{k, i_k} ) · payoff_i(i_1, ..., i_n)
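A minimal sketch, not from the slides, that evaluates this formula by summing over all action profiles and reproduces the 2.44 computed above.

```python
# Minimal sketch: expected stage-game payoff under mixed strategies,
# summing probability-weighted payoffs over every action profile.
from itertools import product

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def expected_stage_payoff(mixed_strategies, player):
    """mixed_strategies[k][a] = probability that player k plays action a."""
    total = 0.0
    for profile in product("CD", repeat=len(mixed_strategies)):
        prob = 1.0
        for k, action in enumerate(profile):
            prob *= mixed_strategies[k][action]
        total += prob * PAYOFF[profile][player]
    return total

row = {"C": 0.8, "D": 0.2}   # you
col = {"C": 0.7, "D": 0.3}   # other
print(expected_stage_payoff([row, col], player=0))  # ~ 2.44, matching the slide
```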

  13. Expected payoffs for P1 and P2 in the stage PD with mixed strategies
     [Figure: expected payoff surfaces for both players as functions of their mixing probabilities. Player 1 may only move "back – front"; Player 2 may only move "left – right".]
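A small sketch of the quantity the figure presumably plots, assuming the two axes are the players' cooperation probabilities p and q; the original surfaces are not reproducible from the text, so this only tabulates Player 1's expected payoff on a coarse grid.

```python
# Hedged sketch: expected stage-PD payoff as a function of the cooperation
# probabilities p (player 1) and q (player 2).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def expected(p, q, player):
    """p = P(player 1 plays C), q = P(player 2 plays C)."""
    probs = {("C", "C"): p * q,       ("C", "D"): p * (1 - q),
             ("D", "C"): (1 - p) * q, ("D", "D"): (1 - p) * (1 - q)}
    return sum(probs[a] * PAYOFF[a][player] for a in probs)

grid = [i / 4 for i in range(5)]   # 0, 0.25, 0.5, 0.75, 1
for p in grid:
    print([round(expected(p, q, player=0), 2) for q in grid])
```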
