Multi-agent learning: Repeated games. Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Friday 3rd May, 2019.


Plan for today
■ NE in normal form games that are repeated a finite number of times.
  ● Principle of backward induction.
■ NE in normal form games that are repeated an indefinite number of times.
  ● Discount factor. Models the probability of continuation.
  ● Folk theorem. (Actually many FT's.) Repeated games generally do have infinitely many Nash equilibria.
  ● Trigger strategy. "On-path" vs. "off-path" play, "minmax" as a threat.
This presentation draws heavily on (Peters, 2008).
* H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4. Ch. 8: Repeated games.

Part I: Nash equilibria in normal form games that are repeated a finite number of times

Nash equilibria in playing the PD twice

Prisoners' Dilemma (You: rows, Other: columns):

              Cooperate   Defect
  Cooperate   (3, 3)      (0, 5)
  Defect      (5, 0)      (1, 1)

■ Even if mixed strategies are allowed, the PD possesses one Nash equilibrium, viz. (D, D) with payoffs (1, 1).
■ This equilibrium is Pareto sub-optimal.
■ Does the situation change if two parties get to play the Prisoners' Dilemma two times in succession?
■ The following diagram (hopefully) shows that playing the PD two times in succession does not yield an essentially new NE.

Nash equilibria in playing the PD twice

Payoff tree for two successive rounds of the PD, shown here as cumulative payoffs, arranged by the round-1 action profile (rows) and the round-2 action profile (columns):

              CC        CD        DC        DD
  after CC    (6, 6)    (3, 8)    (8, 3)    (4, 4)
  after CD    (3, 8)    (0, 10)   (5, 5)    (1, 6)
  after DC    (8, 3)    (5, 5)    (10, 0)   (6, 1)
  after DD    (4, 4)    (1, 6)    (6, 1)    (2, 2)

P.S. This is just a payoff tree, not a game in extensive form!

Nash equilibria in playing the PD twice

In normal form (You: rows, Other: columns):

          CC         CD         DC         DD
  CC      (6, 6)     (3, 8)     (3, 8)     (0, 10)
  CD      (8, 3)     (4, 4)     (5, 5)     (1, 6)
  DC      (8, 3)     (5, 5)     (4, 4)     (1, 6)
  DD      (10, 0)    (6, 1)     (6, 1)     (2, 2)

■ The action profile (DD, DD) is the only Nash equilibrium (see the sketch below for a computational check).
■ With 3 successive games, we obtain a 2^3 × 2^3 matrix, where the action profile (DDD, DDD) still would be the only Nash equilibrium.
■ Generalise to N repetitions: the all-defect profile (DD⋯D, DD⋯D) still is the only Nash equilibrium in a repeated game where the PD is played N times in succession.
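As an illustration (not part of the original slides), the following minimal Python sketch rebuilds the payoff matrix above from the stage-game payoffs and enumerates the pure Nash equilibria by brute force; it reports the all-defect profile as the only one for 2 and 3 repetitions. The payoff numbers come from the slides; the function names and structure are illustrative.

```python
from itertools import product

# Stage-game payoffs (You, Other) from the slides:
# (C, C) -> (3, 3), (C, D) -> (0, 5), (D, C) -> (5, 0), (D, D) -> (1, 1)
STAGE = {
    ('C', 'C'): (3, 3),
    ('C', 'D'): (0, 5),
    ('D', 'C'): (5, 0),
    ('D', 'D'): (1, 1),
}

def repeated_payoff(you, other):
    """Sum the stage payoffs over the rounds of two fixed action sequences."""
    totals = [0, 0]
    for a, b in zip(you, other):
        p = STAGE[(a, b)]
        totals[0] += p[0]
        totals[1] += p[1]
    return tuple(totals)

def pure_nash_equilibria(n_rounds):
    """Pure NE of the n-round PD, with strategies taken (as on the slide)
    to be unconditional action sequences such as CC, CD, DC, DD."""
    strategies = list(product('CD', repeat=n_rounds))
    payoff = {(s, t): repeated_payoff(s, t) for s in strategies for t in strategies}
    equilibria = []
    for s, t in payoff:
        u_you, u_other = payoff[(s, t)]
        best_you = all(payoff[(s2, t)][0] <= u_you for s2 in strategies)
        best_other = all(payoff[(s, t2)][1] <= u_other for t2 in strategies)
        if best_you and best_other:
            equilibria.append((''.join(s), ''.join(t)))
    return equilibria

print(pure_nash_equilibria(2))  # [('DD', 'DD')]
print(pure_nash_equilibria(3))  # [('DDD', 'DDD')]
```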

Part II: Nash equilibria in normal form games that are repeated an indefinite number of times

Terminology: finite, indefinite, infinite

To repeat an experiment . . .
■ . . . ten times. That's hopefully clear.
■ . . . a finite number of times. May mean: a fixed number of times, where the number of repetitions is determined beforehand. Or it may mean: an indefinite number of times. Depends on the context!
■ . . . an indefinite number of times. Means: a finite number of times, but nothing is known beforehand about the number of repetitions.
■ . . . an infinite number of times. When throwing a die, this must mean a countably infinite number of times.

Indefinite number of repetitions

■ A Pareto sub-optimal outcome can be avoided in case the following three conditions are met.
  1. The Prisoners' Dilemma is repeated an indefinite number of times (rounds).
  2. A so-called discount factor δ ∈ [0, 1] determines the probability of continuing the game after each round.
  3. The probability to continue, δ, is large enough.
■ Under these conditions suddenly infinitely many Nash equilibria exist. This is sometimes called an embarrassment of richness (Peters, 2008).
■ Various Folk Theorems state the existence of multiple equilibria in games that are repeated an indefinite number of times.
■ Here we discuss one version of "the" Folk Theorem.

Family of Folk Theorems

There actually exist many Folk Theorems.
■ Horizon. The game may be repeated indefinitely (present case), or there may be an upper bound to the number of plays.
■ Information. Players may act on the basis of CKR, common knowledge of rationality (present case), or certain parts of the history may be hidden.
■ Reward. Players may collect their payoff through a discount factor (present case) or through average rewards.
■ Subgame perfectness. Subgame perfect equilibria (present case) or plain Nash equilibria.
■ Equilibrium. We may be interested in Nash equilibria (present case), or other types of equilibria, such as so-called ε-Nash equilibria or so-called correlated equilibria.

The concept of a repeated game

■ Let G be a game in normal form.
■ The repeated game G*(δ) is G, played an indefinite number of times, where δ represents the probability that the game will be played another time.
  Exercise: give P{G*(δ) lasts at least t rounds}. Answer: δ^t.
■ G is called the stage game.
■ A history h of length t of a repeated game is a sequence of action profiles of length t. Example (for the prisoner's dilemma):

  Round:           0  1  2  3  4  5  6  7  8  9
  Row player:      C  D  D  D  C  C  D  D  D  D
  Column player:   C  D  D  D  D  D  D  C  D  D
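To make the continuation probability concrete, here is a small Monte Carlo sketch (not from the slides). One way to read the exercise's answer δ^t is as the probability that the game reaches round index t, i.e. that at least t continuations occur after round 0; that reading of the convention is an assumption, and all names below are illustrative.

```python
import random

def simulate_length(delta, rng):
    """Number of rounds actually played: round 0 always happens,
    and after each round the game continues with probability delta."""
    rounds = 1
    while rng.random() < delta:
        rounds += 1
    return rounds

def prob_reaches_round(delta, t, trials=100_000, seed=0):
    """Monte Carlo estimate of P{the game reaches round index t},
    i.e. at least t+1 rounds are played."""
    rng = random.Random(seed)
    hits = sum(simulate_length(delta, rng) > t for _ in range(trials))
    return hits / trials

delta = 0.9
for t in (1, 5, 10):
    # Estimate vs. the closed form delta**t
    print(t, prob_reaches_round(delta, t), delta ** t)
```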

The concept of a repeated game (II)

■ The set of all possible histories (of any length) is denoted by H.
■ A strategy for Player i is a function s_i : H → Δ{C, D} such that
  Pr(Player i plays C in round |h| + 1 | h) = s_i(h)(C).
■ A strategy profile s is a combination of strategies, one for each player.
■ The expected payoff for player i given s can be computed. It is

  Expected payoff_i(s) = Σ_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).

Example on the next slide.
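A strategy in this sense is simply a map from an observed history to a distribution over {C, D}. The Python sketch below (illustrative only, not from the slides) encodes the two stationary strategies used in the example that follows, "always cooperate 80%" and "always cooperate 70%", in exactly that shape, and samples one random history of a given length.

```python
import random

# A history is a list of action profiles, e.g. [('C', 'C'), ('D', 'C'), ...].
# A strategy maps a history to a probability distribution over {'C', 'D'}.

def always_cooperate(p):
    """Stationary strategy: cooperate with probability p, regardless of the history."""
    def strategy(history):
        return {'C': p, 'D': 1.0 - p}
    return strategy

def sample_action(distribution, rng):
    return 'C' if rng.random() < distribution['C'] else 'D'

def sample_history(s1, s2, length, seed=0):
    """Play `length` rounds with strategies s1 (row player) and s2 (column player)."""
    rng = random.Random(seed)
    history = []
    for _ in range(length):
        a1 = sample_action(s1(history), rng)
        a2 = sample_action(s2(history), rng)
        history.append((a1, a2))
    return history

s1 = always_cooperate(0.8)   # Player 1: "always cooperate 80%"
s2 = always_cooperate(0.7)   # Player 2: "always cooperate 70%"
print(sample_history(s1, s2, 10))
```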

The concept of a repeated game (II)

Repeated from the previous slide: the expected payoff for player i given s can be computed. It is

  Expected payoff_i(s) = Σ_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).

Example: prisoner's dilemma; the strategy of Player 1 is s_1 = "always cooperate 80%"; the strategy of Player 2 is s_2 = "always cooperate 70%"; δ = 1/2. Then

  Expected payoff_1(s) = Σ_{t=0}^{∞} (1/2)^t · [0.8 · (0.7 · 3 + 0.3 · 0) + 0.2 · (0.7 · 5 + 0.3 · 1)]
                       = 1/(1 − 1/2) · 2.44 = 2 × 2.44 = 4.88.
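A minimal sketch (illustrative, not from the slides) that reproduces this number: the per-round expected payoff of Player 1 against the two stationary strategies, multiplied by the geometric factor 1/(1 − δ).

```python
# Stage-game payoffs for Player 1 (row player), taken from the earlier slides.
PAYOFF_1 = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def per_round_expected_payoff(p_coop_1, p_coop_2):
    """Expected stage payoff of Player 1 when the players cooperate
    independently with probabilities p_coop_1 and p_coop_2."""
    dist_1 = {'C': p_coop_1, 'D': 1 - p_coop_1}
    dist_2 = {'C': p_coop_2, 'D': 1 - p_coop_2}
    return sum(dist_1[a] * dist_2[b] * PAYOFF_1[(a, b)]
               for a in 'CD' for b in 'CD')

def discounted_expected_payoff(p_coop_1, p_coop_2, delta):
    """Sum over t >= 0 of delta**t times the per-round payoff,
    i.e. per-round payoff / (1 - delta)."""
    return per_round_expected_payoff(p_coop_1, p_coop_2) / (1 - delta)

print(per_round_expected_payoff(0.8, 0.7))        # ≈ 2.44
print(discounted_expected_payoff(0.8, 0.7, 0.5))  # ≈ 4.88
```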

Subgame perfect equilibria of G*(δ): D*

Definition. A strategy profile s for G*(δ) is a subgame-perfect Nash equilibrium if (1) it is a Nash equilibrium of this repeated game, and (2) for every subgame (i.e., tail game) of this repeated game, s restricted to that subgame is a Nash equilibrium as well.

Consider the strategy of iterated defection D*: "always defect, no matter what". [1]

Claim. The strategy profile (D*, D*) is a subgame perfect equilibrium in G*(δ).

Proof. Consider any tail game starting at round t ≥ 0. We are done if we can show that (D*, D*) is a NE for this subgame. This is true: given that one player always defects, it never pays off for the other player to play C at any time. Therefore, everyone sticks to D*.

[1] A notation like D* or (worse) D^∞ is suggestive. Mathematically it makes no sense, but intuitively it does.

Part III: Trigger strategies

Cost of deviating in Round N

Consider the so-called trigger strategy T: "always play C unless D has been played at least once. In that case play D forever".

Claim. The strategy profile (T, T) is a subgame perfect equilibrium in G*(δ), provided the probability of continuation, δ, is sufficiently large.

Proof. Suppose one player starts to defect at Round N.
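The transcript breaks off here. Purely as an illustration of the comparison this proof sets up, and not as the slides' own derivation, the sketch below uses the stage payoffs from the earlier slides to compare the discounted continuation payoff of sticking with T against T (mutual cooperation forever) with that of defecting at Round N and then facing permanent mutual defection. The resulting cutoff δ ≥ 1/2 is specific to these payoffs and to the assumption that the opponent plays T; all names are illustrative.

```python
# Stage payoffs for the deviating player, from the earlier slides:
# mutual cooperation 3, unilateral defection 5, mutual defection 1.
REWARD, TEMPTATION, PUNISHMENT = 3, 5, 1

def follow_trigger(delta):
    """Discounted continuation value of cooperating forever against T."""
    return REWARD / (1 - delta)

def deviate_once(delta):
    """Defect now, then face mutual defection forever under T's punishment."""
    return TEMPTATION + delta * PUNISHMENT / (1 - delta)

for delta in (0.3, 0.5, 0.7, 0.9):
    verdict = "deviating pays" if deviate_once(delta) > follow_trigger(delta) else "cooperating pays"
    print(delta, round(follow_trigger(delta), 2), round(deviate_once(delta), 2), verdict)
# With these payoffs, deviating stops paying once delta >= 1/2.
```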
