
Multi-agent learning: Repeated games. Gerard Vreeswijk, Intelligent Systems Group (PowerPoint presentation)



  1. Multi-agent learning: Repeated games. Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Last modified on February 9th, 2012 at 17:15.

  2. Repeated games: motivation
     1. Much interaction in multi-agent systems can be modelled through games.
     2. Much learning in multi-agent systems can therefore be modelled through learning in games.
     3. Learning in games usually takes place through the (gradual) adaptation of strategies (hence, behaviour) in a repeated game.
     4. In most repeated games, one game (a.k.a. the stage game) is played repeatedly. Possibilities:
        • A finite number of times.
        • An indefinite (same: indeterminate) number of times.
        • An infinite number of times.
     5. Therefore, familiarity with the basic concepts and results from the theory of repeated games is essential to understand multi-agent learning.

  3. Plan for today
     • NE in normal form games that are repeated a finite number of times.
       – Principle of backward induction.
     • NE in normal form games that are repeated an indefinite number of times.
       – Discount factor: models the probability of continuation.
       – Folk theorem (actually many folk theorems): repeated games generally do have infinitely many Nash equilibria.
       – Trigger strategy, on-path vs. off-path play, the threat to "minmax" an opponent.
     This presentation draws heavily on Peters (2008): H. Peters, Game Theory: A Multi-Leveled Approach, Springer, ISBN 978-3-540-69290-4, Ch. 8: Repeated games.

  4. Example 1: Nash equilibria in playing the PD twice
     Prisoners' Dilemma (row: You, column: Other):

                         Cooperate   Defect
        Cooperate        (3, 3)      (0, 5)
        Defect           (5, 0)      (1, 1)

     • Even if mixed strategies are allowed, the PD possesses one Nash equilibrium, viz. (D, D) with payoffs (1, 1). (A small verification sketch follows this slide.)
     • This equilibrium is Pareto sub-optimal, because (3, 3) makes both players better off.
     • Does the situation change if two parties get to play the Prisoners' Dilemma two times in succession?
     • The following diagram (hopefully) shows that playing the PD two times in succession does not yield an essentially new NE.
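A minimal sketch, not part of the slides: a brute-force check that (D, D) is the only pure-strategy Nash equilibrium of the stage-game PD, using the payoffs from the table above.

```python
# Minimal sketch: check every pure action profile of the stage-game PD for the
# Nash property (no player gains by deviating unilaterally).
ACTIONS = ["C", "D"]
PAYOFF = {  # (row action, column action) -> (row payoff, column payoff), as on the slide
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def is_nash(row, col):
    row_ok = all(PAYOFF[(row, col)][0] >= PAYOFF[(dev, col)][0] for dev in ACTIONS)
    col_ok = all(PAYOFF[(row, col)][1] >= PAYOFF[(row, dev)][1] for dev in ACTIONS)
    return row_ok and col_ok

print([(r, c) for r in ACTIONS for c in ACTIONS if is_nash(r, c)])  # -> [('D', 'D')]
```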

  5. Example 1: Nash equilibria in playing the PD twice (2)
     [Game tree: from the root (0, 0), the first round branches into CC, CD, DC and DD with stage payoffs (3, 3), (0, 5), (5, 0) and (1, 1); each branch is followed by a second round with the same four choices, yielding the sixteen cumulative payoff pairs shown in the normal form on the next slide, from (6, 6) for CC–CC down to (2, 2) for DD–DD.]

  6. Example 1: Nash equilibria in playing the PD twice (3)
     In normal form (row: You, column: Other):

               CC         CD         DC         DD
        CC     (6, 6)     (3, 8)     (3, 8)     (0, 10)
        CD     (8, 3)     (4, 4)     (5, 5)     (1, 6)
        DC     (8, 3)     (5, 5)     (4, 4)     (1, 6)
        DD     (10, 0)    (6, 1)     (6, 1)     (2, 2)

     • The action profile (DD, DD) is the only Nash equilibrium.
     • With 3 successive games, we obtain a 2^3 × 2^3 matrix, where the action profile (DDD, DDD) still would be the only Nash equilibrium.
     • Generalise to N repetitions: (D^N, D^N) still is the only Nash equilibrium in a repeated game where the PD is played N times in succession (see the sketch below).
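A minimal sketch of how the matrix above can be generated and checked. It assumes, as the slide's normal form does, that strategies are fixed action sequences rather than history-dependent plans.

```python
# Minimal sketch: build the normal form of the n-fold repeated PD over fixed
# action sequences and list its pure-strategy Nash equilibria.
from itertools import product

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def repeated_matrix(n):
    """Sum stage payoffs over n rounds for every pair of action sequences."""
    seqs = ["".join(s) for s in product("CD", repeat=n)]
    matrix = {(r, c): tuple(sum(PAYOFF[(a, b)][i] for a, b in zip(r, c)) for i in (0, 1))
              for r in seqs for c in seqs}
    return matrix, seqs

def pure_nash(matrix, seqs):
    eqs = []
    for r, c in product(seqs, seqs):
        if all(matrix[(r, c)][0] >= matrix[(d, c)][0] for d in seqs) and \
           all(matrix[(r, c)][1] >= matrix[(r, d)][1] for d in seqs):
            eqs.append((r, c))
    return eqs

m2, s2 = repeated_matrix(2)
print(pure_nash(m2, s2))  # -> [('DD', 'DD')]
m3, s3 = repeated_matrix(3)
print(pure_nash(m3, s3))  # -> [('DDD', 'DDD')]
```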

  7. Backward induction (version for repeated games)
     • Suppose G is a game in normal form for p players, where all players possess the same arsenal of possible actions A = { a_1, ..., a_m }.
     • The game G^n arises by playing the stage game G a number of n times in succession.
     • A history h of length k is an element of (A^p)^k. E.g., for p = 3 and k = 10,
       a_7 a_5 a_3   a_6 a_1 a_9   a_2 a_7 a_7   a_3 a_6 a_9   a_2 a_4 a_2   a_9 a_9 a_1   a_1 a_4 a_1   a_2 a_7 a_9   a_6 a_1 a_1   a_8 a_2 a_4
       is a history of length ten in a game with three players. The set of all possible histories is denoted by H. (Hence, |H_k| = m^{kp}.)
     • A (possibly mixed) strategy for one player is a function H → Pr(A).
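A minimal sketch of the last bullet for the two-player PD: a history as a list of action profiles, and a strategy as a function from histories to a distribution over own actions. Tit-for-tat is used here purely as an illustration; it is not introduced on this slide.

```python
# Minimal sketch: strategies as functions History -> Pr(A) for the two-player PD.
from typing import Dict, List, Tuple

History = List[Tuple[str, str]]   # one (own action, opponent action) pair per round

def tit_for_tat(h: History) -> Dict[str, float]:
    """Cooperate in round 1, then copy the opponent's previous action."""
    if not h:
        return {"C": 1.0, "D": 0.0}
    last_opponent_action = h[-1][1]
    return {"C": 1.0, "D": 0.0} if last_opponent_action == "C" else {"C": 0.0, "D": 1.0}

def random_cooperator(h: History) -> Dict[str, float]:
    """A genuinely mixed strategy: cooperate with probability 0.8, regardless of history."""
    return {"C": 0.8, "D": 0.2}

print(tit_for_tat([("C", "C"), ("C", "D")]))  # opponent defected last round -> {'C': 0.0, 'D': 1.0}
```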

  8. Backward induction (version for repeated games)
     • For some repeated games of length n, the dominating (read: "clearly best") strategy for all players in round n (the last round) does not depend on the history of play. E.g., for the Prisoners' Dilemma in the last round: "No matter what happened in rounds 1 ... n − 1, I am better off playing D."
     • Fixed strategies (D, D) in round n determine play after round n − 1.
     • Independence of history, plus a determined future, leads to the following justification for playing D in round n − 1: "No matter what happened in rounds 1 ... n − 2 (the past), and given that I will receive a payoff of 1 in round n (the future), I am better off playing D now."
     • Per induction, in round k, where k ≥ 1: "No matter what happened in rounds 1 ... k − 1, and given that I will receive a payoff of (n − k) · 1 in rounds (k + 1) ... n, I am better off playing D in round k."
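A minimal sketch of the induction step, not from the slides: once the future payoff is fixed (both players defect in all later rounds), it cancels from the comparison, so D is the better current action no matter what the opponent plays and no matter the history.

```python
# Minimal sketch: in every round of the finitely repeated PD, given a fixed
# continuation payoff, D beats C against either current opponent action.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def best_current_action(rounds_left):
    # Future payoff is (rounds_left - 1) * 1, regardless of the current action,
    # so it cancels on both sides of the comparison below.
    future = (rounds_left - 1) * PAYOFF[("D", "D")][0]
    for opponent in ("C", "D"):
        assert PAYOFF[("D", opponent)][0] + future > PAYOFF[("C", opponent)][0] + future
    return "D"

n = 10
print([best_current_action(k) for k in range(n, 0, -1)])  # -> ['D', 'D', ..., 'D']
```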

  9. Indefinite number of repetitions
     • A Pareto-suboptimal outcome can be avoided in case the following three conditions are met:
       1. The Prisoners' Dilemma is repeated an indefinite number of times (rounds).
       2. A so-called discount factor δ ∈ [0, 1] determines the probability of continuing the game after each round.
       3. The probability to continue, δ, must be large enough. (A worked check of "large enough" follows below.)
     • Under these conditions suddenly infinitely many Nash equilibria exist. This is sometimes called an embarrassment of richness (Peters, 2008).
     • Various Folk theorems state the existence of multiple equilibria in infinitely repeated games. (Folk Theorems are named such because their exact origin cannot be traced.)
     • We now informally discuss one version of "the" Folk Theorem.
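The slide only states that δ must be "large enough". As a hedged sketch, and assuming the trigger strategy mentioned in the plan (cooperate until the opponent defects, then defect forever), the threshold for the PD payoffs used throughout these slides can be computed as follows; this particular calculation is not worked out on the slide itself.

```python
# Hedged sketch: compare cooperating forever against a one-shot deviation
# followed by permanent mutual defection, for the PD payoffs (3,3)/(0,5)/(5,0)/(1,1).
def cooperate_forever(delta):
    return 3 / (1 - delta)              # 3 + 3*delta + 3*delta**2 + ...

def deviate_once(delta):
    return 5 + delta * 1 / (1 - delta)  # 5 now, then payoff 1 in every later round

# Cooperation is a best response iff cooperate_forever(delta) >= deviate_once(delta).
# Solving 3/(1-d) >= 5 + d/(1-d) gives d >= 1/2; a quick numerical check:
for delta in (0.3, 0.5, 0.7):
    print(delta, cooperate_forever(delta) >= deviate_once(delta))
# -> 0.3 False, 0.5 True, 0.7 True
```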

  10. Example 2: Prisoners' Dilemma repeated indefinitely
     • Consider the game G*(δ) where the PD is played a number of times in succession. We write G*(δ) : G_0, G_1, G_2, ...
     • The number of times the stage game is played is determined by a parameter 0 ≤ δ ≤ 1. The probability that the next stage (and the stages thereafter) will be played is δ. Thus, the probability that stage game G_t will be played is δ^t. (What if t = 0?) (See the simulation sketch below.)
     • The PD (of which every G_t is an incarnation) is called the stage game, as opposed to the overall game G*(δ).
     • A history h of length t of a repeated game is a sequence of action profiles of length t.
     • A realisation h is a countably infinite sequence of action profiles.
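A minimal simulation sketch, not from the slides: flip the continuation coin after every round and check empirically that stage t is reached with probability δ^t (and that stage 0, with δ^0 = 1, is always played).

```python
# Minimal sketch: the number of stages played is geometric in the discount factor.
import random

def sample_game_length(delta, rng):
    """Stages actually played: stage 0 always, each subsequent stage with probability delta."""
    t = 1
    while rng.random() < delta:
        t += 1
    return t

rng = random.Random(0)
delta = 0.9
lengths = [sample_game_length(delta, rng) for _ in range(100_000)]
for t in (0, 1, 5):
    empirical = sum(length > t for length in lengths) / len(lengths)
    print(f"P(stage {t} is played): empirical ~ {empirical:.3f}, theoretical = {delta**t:.3f}")
```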

  11. Example 2: Prisoners' Dilemma repeated indefinitely (2)
     • Example of a history of length t = 10:

        Round:           0  1  2  3  4  5  6  7  8  9
        Row player:      C  D  D  D  C  C  D  D  D  D
        Column player:   C  D  D  D  D  D  D  C  D  D

     • The set of all possible histories (of any length) is denoted by H.
     • A (mixed) strategy for Player i is a function s_i : H → Pr({C, D}) such that Pr(Player i plays C in round |h| + 1 | h) = s_i(h)(C).
     • A strategy profile s is a combination of strategies, one for each player.
     • The expected payoff for player i given s can be computed. It is

        Expected payoff_i(s) = Σ_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).
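A minimal sketch, not from the slides, that evaluates this discounted sum for two simple deterministic strategy profiles by truncating the series, and compares the result with the closed-form geometric sums.

```python
# Minimal sketch: discounted expected payoff sum, truncated once the tail is negligible.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def discounted_payoff(profile_at, delta, player=0, horizon=2_000):
    """profile_at(t) returns the action profile played in round t."""
    return sum(delta**t * PAYOFF[profile_at(t)][player] for t in range(horizon))

delta = 0.9
always_defect = lambda t: ("D", "D")
always_cooperate = lambda t: ("C", "C")

print(discounted_payoff(always_defect, delta))     # ~ 1 / (1 - delta) = 10.0
print(discounted_payoff(always_cooperate, delta))  # ~ 3 / (1 - delta) = 30.0
```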

  12. Example: The expected payoff of a stage game
     Prisoners' Dilemma (row: You, column: Other):

                         Cooperate   Defect
        Cooperate        (3, 3)      (0, 5)
        Defect           (5, 0)      (1, 1)

     • Suppose the following strategy profile for one game:
       – Row player (you) plays with mixed strategy 0.8 on C (hence, 0.2 on D).
       – Column player (other) plays with mixed strategy 0.7 on C.
     • Your expected payoff is 0.8 (0.7 · 3 + 0.3 · 0) + 0.2 (0.7 · 5 + 0.3 · 1) = 2.44.
     • General formula (cf., e.g., Leyton-Brown et al., 2008), where s_{k, i_k} is the probability that player k plays action i_k:

        Expected payoff_{i,t}(s) = Σ_{(i_1, ..., i_n) ∈ A^n} ( Π_{k=1}^{n} s_{k, i_k} ) · payoff_i(i_1, ..., i_n)
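A minimal sketch, not from the slides, that evaluates this formula by summing over all action profiles and reproduces the 2.44 computed above.

```python
# Minimal sketch: expected stage-game payoff under mixed strategies,
# summing probability-weighted payoffs over every action profile.
from itertools import product

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def expected_stage_payoff(mixed_strategies, player):
    """mixed_strategies[k][a] = probability that player k plays action a."""
    total = 0.0
    for profile in product("CD", repeat=len(mixed_strategies)):
        prob = 1.0
        for k, action in enumerate(profile):
            prob *= mixed_strategies[k][action]
        total += prob * PAYOFF[profile][player]
    return total

row = {"C": 0.8, "D": 0.2}   # you
col = {"C": 0.7, "D": 0.3}   # other
print(expected_stage_payoff([row, col], player=0))  # ~ 2.44, matching the slide
```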

  13. Expected payoffs for P1 and P2 in the stage PD with mixed strategies
     [Figure: expected payoff surfaces for both players as functions of their mixing probabilities. Player 1 may only move "back – front"; Player 2 may only move "left – right".]
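A small sketch of the quantity the figure presumably plots, assuming the two axes are the players' cooperation probabilities p and q; the original surfaces are not reproducible from the text, so this only tabulates Player 1's expected payoff on a coarse grid.

```python
# Hedged sketch: expected stage-PD payoff as a function of the cooperation
# probabilities p (player 1) and q (player 2).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def expected(p, q, player):
    """p = P(player 1 plays C), q = P(player 2 plays C)."""
    probs = {("C", "C"): p * q,       ("C", "D"): p * (1 - q),
             ("D", "C"): (1 - p) * q, ("D", "D"): (1 - p) * (1 - q)}
    return sum(probs[a] * PAYOFF[a][player] for a in probs)

grid = [i / 4 for i in range(5)]   # 0, 0.25, 0.5, 0.75, 1
for p in grid:
    print([round(expected(p, q, player=0), 2) for q in grid])
```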
