Multi-agent learning: Gradient ascent

Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Saturday 16th May, 2020.


  1. Two-player, two-action, general sum games

In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by

                 L           R
    M  =   T   r11, c11   r12, c12
           B   r21, c21   r22, c22

Row plays the mixed strategy ( α, 1 − α ); Column plays the mixed strategy ( β, 1 − β ). Expected payoffs:

    u1( α, β ) = α [ β r11 + ( 1 − β ) r12 ] + ( 1 − α ) [ β r21 + ( 1 − β ) r22 ]
               = u αβ + α ( r12 − r22 ) + β ( r21 − r22 ) + r22,

    u2( α, β ) = β [ α c11 + ( 1 − α ) c21 ] + ( 1 − β ) [ α c12 + ( 1 − α ) c22 ]
               = u′ αβ + α ( c12 − c22 ) + β ( c21 − c22 ) + c22,

where u = ( r11 − r12 ) − ( r21 − r22 ) and u′ = ( c11 − c21 ) − ( c12 − c22 ).
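The expected payoffs are bilinear in α and β, so the closed form above is easy to check numerically. The following Python sketch (not part of the original slides; function names are illustrative) evaluates u1 and u2 both directly from the payoff matrices and via the αβ-form, using the Prisoners' Dilemma payoffs that appear later in the deck:

```python
import numpy as np

# Row payoffs r_ij and Column payoffs c_ij for a 2x2 general-sum game
# (here: the Prisoners' Dilemma used later in the slides).
R = np.array([[3.0, 0.0],
              [5.0, 1.0]])
C = np.array([[3.0, 5.0],
              [0.0, 1.0]])

def expected_payoffs(alpha, beta, R, C):
    """u1, u2 computed directly: Row mixes (alpha, 1-alpha), Column mixes (beta, 1-beta)."""
    x = np.array([alpha, 1.0 - alpha])   # Row's mixed strategy
    y = np.array([beta, 1.0 - beta])     # Column's mixed strategy
    return x @ R @ y, x @ C @ y

def expected_payoffs_affine(alpha, beta, R, C):
    """Same payoffs via the u*alpha*beta + ... form derived on the slide."""
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    u1 = u  * alpha * beta + alpha * (R[0, 1] - R[1, 1]) + beta * (R[1, 0] - R[1, 1]) + R[1, 1]
    u2 = up * alpha * beta + alpha * (C[0, 1] - C[1, 1]) + beta * (C[1, 0] - C[1, 1]) + C[1, 1]
    return u1, u2

# The two forms agree for any mixed strategies:
assert np.allclose(expected_payoffs(0.3, 0.7, R, C),
                   expected_payoffs_affine(0.3, 0.7, R, C))
```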

  2. Gradient of expected payoff

Gradient:

    ∂u1( α, β ) / ∂α = β u  + ( r12 − r22 )
    ∂u2( α, β ) / ∂β = α u′ + ( c21 − c22 )

As an affine map:

    ( ∂u1/∂α, ∂u2/∂β )ᵀ = U ( α, β )ᵀ + C,   with U = [ 0  u ; u′  0 ] and C = ( r12 − r22, c21 − c22 )ᵀ.

Stationary point:

    ( α∗, β∗ ) = ( ( c22 − c21 ) / u′ , ( r22 − r12 ) / u )

Remarks:

■ There is at most one stationary point.
■ If a stationary point exists, it may lie outside [0, 1]².
■ If there is a stationary point inside [0, 1]², it is a weak (i.e., non-strict) Nash equilibrium.
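In the affine form, finding the stationary point is a small linear-algebra exercise. A minimal sketch (illustrative code, not from the slides) that builds U and C for a given 2×2 game and solves for the stationary point when U is invertible:

```python
import numpy as np

def gradient_field(R, C):
    """Return (U, c) such that the payoff gradient at (alpha, beta) is U @ [alpha, beta] + c."""
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    U = np.array([[0.0, u],
                  [up, 0.0]])
    c = np.array([R[0, 1] - R[1, 1],
                  C[1, 0] - C[1, 1]])
    return U, c

def stationary_point(R, C):
    """Solve U @ x + c = 0; returns None when U is singular (u = 0 or u' = 0)."""
    U, c = gradient_field(R, C)
    if np.isclose(np.linalg.det(U), 0.0):
        return None
    return np.linalg.solve(U, -c)   # equals ((c22 - c21)/u', (r22 - r12)/u)

# Stag Hunt from a later slide: stationary point at (1/2, 1/2).
R = np.array([[5.0, 0.0], [3.0, 2.0]])
C = np.array([[5.0, 3.0], [0.0, 2.0]])
print(stationary_point(R, C))       # -> [0.5 0.5]
```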

  3. Example: payoffs in Stag Hunt ( r = 4, t = 3, s = 1, p = 3 )

Player 1 may only move "back – front"; Player 2 may only move "left – right".

  4. Part 2: IGA

  5. Gradient ascent: affine differential map

    ( α_{t+1}, β_{t+1} ) = ( α_t, β_t ) + η ( ∂u1/∂α, ∂u2/∂β )_t

■ Because α, β ∈ [0, 1], the dynamics must be confined to [0, 1]².
■ Suppose the state ( α, β ) is on the boundary of the probability space [0, 1]² and the gradient vector points outwards. Intuition: one of the players has an incentive to improve, but cannot improve further.
■ To keep the dynamics within [0, 1]², the gradient is projected back onto [0, 1]². Intuition: if one of the players has an incentive to improve but cannot improve, then he will not improve.
■ If nonzero, the projected gradient is parallel to the (closest) boundary of [0, 1]². A sketch of one projected update step follows after this list.
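A minimal Python sketch of one such update (step size and example game are my own choices; for the unit square, clipping the stepped point coincides with following the projected gradient at the boundary). The example starts on the boundary where the α-component of the gradient points outward, so the projected step leaves α in place and only moves β:

```python
import numpy as np

def iga_step(alpha, beta, R, C, eta=0.01):
    """One (projected) gradient-ascent step for a 2x2 general-sum game.

    Each player ascends its own expected payoff; clipping the result back
    into the unit square plays the role of the boundary projection.
    """
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    d_alpha = beta * u + (R[0, 1] - R[1, 1])     # d u1 / d alpha
    d_beta  = alpha * up + (C[1, 0] - C[1, 1])   # d u2 / d beta
    alpha = float(np.clip(alpha + eta * d_alpha, 0.0, 1.0))
    beta  = float(np.clip(beta  + eta * d_beta,  0.0, 1.0))
    return alpha, beta

# One step on the Coordination game of a later slide, starting on the boundary:
R = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0], [0.0, 1.0]])
print(iga_step(1.0, 0.75, R, C))   # alpha stays clipped at 1, beta moves towards 1
```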

  6. Infinitesimal Gradient Ascent: IGA (Singh et al., 2000)

Affine differential map:

    ( α_{t+1}, β_{t+1} ) = ( α_t, β_t ) + η [ U ( α_t, β_t )ᵀ + ( r12 − r22, c21 − c22 )ᵀ ],   with U = [ 0  u ; u′  0 ].

Theorem (Singh, Kearns and Mansour, 2000). If players follow IGA, where η → 0, their average payoffs will converge to the (expected) payoffs of a Nash equilibrium. If their strategies converge, they will converge to that same Nash equilibrium.

The proof is based on a qualitative result from the theory of differential equations, which says that the behaviour of an affine differential map is determined by the multiplicative matrix U (the case distinction is sketched in code below):

1. If U is invertible and its eigenvalues λ (solutions of Ux = λx, equivalently of Det[ U − λI ] = 0) are real, there is a stationary point, which is a saddle point.
2. If U is invertible and its eigenvalues λ are imaginary, there is a stationary point, which is a centric point (a centre).
3. If U is not invertible (i.e., u = 0 or u′ = 0), there is no stationary point.
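The case distinction can be checked mechanically. The helper below (my own naming; not code from Singh et al.) builds U for a 2×2 game and reports which case applies; it reproduces the classifications on the following slides, e.g. saddle points for the Coordination game, Prisoners' Dilemma, Stag Hunt, Chicken and Battle of the Sexes, and a centric point for Matching pennies:

```python
import numpy as np

def classify_iga_dynamics(R, C):
    """Classify the unconstrained IGA flow of a 2x2 game by the matrix U."""
    u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])
    up = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])
    U = np.array([[0.0, u], [up, 0.0]])
    if np.isclose(u * up, 0.0):                  # det U = -u*u' = 0
        return "U not invertible: no stationary point"
    eigvals = np.linalg.eigvals(U)               # solutions of det(U - lambda*I) = 0
    stat = ((C[1, 1] - C[1, 0]) / up, (R[1, 1] - R[0, 1]) / u)
    if np.all(np.isreal(eigvals)):
        return f"real eigenvalues {eigvals}: saddle point at {stat}"
    return f"imaginary eigenvalues {eigvals}: centric point (centre) at {stat}"

# Matching pennies (see the slide below): imaginary eigenvalues, centre at (1/2, 1/2).
R = np.array([[1.0, -1.0], [-1.0, 1.0]])
C = -R
print(classify_iga_dynamics(R, C))
```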

  7. Saddle point

  8. Gradient ascent: Coordination game

■ Symmetric, but not zero sum:

            L       R
      T   1, 1    0, 0
      B   0, 0    1, 1

■ Gradient: ( 2·β − 1, 2·α − 1 ).
■ Stationary at ( 1/2, 1/2 ).
■ Matrix U = [ 0  2 ; 2  0 ] has real eigenvalues: λ² − 4 = 0. Saddle point inside [0, 1]².

  9. Gradient ascent: Prisoners' Dilemma

■ Symmetric, but not zero sum:

            L       R
      T   3, 3    0, 5
      B   5, 0    1, 1

■ Gradient: ( −1·β − 1, −1·α − 1 ).
■ Stationary at ( −1, −1 ).
■ Matrix U = [ 0  −1 ; −1  0 ] has real eigenvalues: λ² − 1 = 0. Saddle point outside [0, 1]² (see the simulation sketch below).
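Because the saddle lies outside the square, the projected dynamics inside [0, 1]² flow to the corner ( 0, 0 ), i.e. mutual defection. An illustrative simulation (step size and starting point are arbitrary choices of mine):

```python
import numpy as np

R = np.array([[3.0, 0.0], [5.0, 1.0]])   # Prisoners' Dilemma, Row payoffs
C = np.array([[3.0, 5.0], [0.0, 1.0]])   # Column payoffs

alpha, beta, eta = 0.9, 0.8, 0.01        # start near mutual cooperation
for _ in range(2000):
    d_alpha = beta * (-1.0) + (R[0, 1] - R[1, 1])    # = -beta - 1   (u  = -1)
    d_beta  = alpha * (-1.0) + (C[1, 0] - C[1, 1])   # = -alpha - 1  (u' = -1)
    alpha = float(np.clip(alpha + eta * d_alpha, 0.0, 1.0))
    beta  = float(np.clip(beta  + eta * d_beta,  0.0, 1.0))

print(alpha, beta)   # -> 0.0 0.0: both players end up playing the second action (Defect)
```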

  10. Gradient ascent: Stag hunt

■ Symmetric, but not zero sum:

            L       R
      T   5, 5    0, 3
      B   3, 0    2, 2

■ Gradient: ( 4·β − 2, 4·α − 2 ).
■ Stationary at ( 1/2, 1/2 ).
■ Matrix U = [ 0  4 ; 4  0 ] has real eigenvalues: λ² − 16 = 0. Saddle point inside [0, 1]² (the two basins are illustrated below).
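With the saddle at ( 1/2, 1/2 ) inside the square, starting points on opposite sides of it flow to different pure equilibria. An illustrative sketch of the two basins (parameters are my own):

```python
import numpy as np

def run_iga(alpha, beta, steps=5000, eta=0.005):
    """Projected IGA on the Stag Hunt of this slide (u = u' = 4)."""
    for _ in range(steps):
        d_alpha = 4.0 * beta - 2.0    # d u1 / d alpha
        d_beta  = 4.0 * alpha - 2.0   # d u2 / d beta
        alpha = float(np.clip(alpha + eta * d_alpha, 0.0, 1.0))
        beta  = float(np.clip(beta  + eta * d_beta,  0.0, 1.0))
    return alpha, beta

print(run_iga(0.6, 0.6))   # -> (1.0, 1.0): both hunt the stag (T, L)
print(run_iga(0.4, 0.4))   # -> (0.0, 0.0): both settle for the hare (B, R)
```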

  11. Gradient ascent: Game of Chicken

■ Symmetric, but not zero sum:

            L         R
      T   0, 0     −1, 1
      B   1, −1   −3, −3

■ Gradient: ( −3·β + 2, −3·α + 2 ).
■ Stationary at ( 2/3, 2/3 ).
■ Matrix U = [ 0  −3 ; −3  0 ] has real eigenvalues: λ² − 9 = 0. Saddle point inside [0, 1]².

  12. Gradient ascent: Battle of the Sexes

■ Symmetric, but not zero sum:

            L       R
      T   0, 0    2, 3
      B   3, 2    1, 1

■ Gradient: ( −4·β + 1, −4·α + 1 ).
■ Stationary at ( 1/4, 1/4 ).
■ Matrix U = [ 0  −4 ; −4  0 ] has real eigenvalues: λ² − 16 = 0. Saddle point inside [0, 1]².

  13. Gradient ascent: Matching pennies

■ Symmetric, zero sum:

             L        R
      T   1, −1    −1, 1
      B   −1, 1    1, −1

■ Gradient: ( 4·β − 2, −4·α + 2 ).
■ Stationary at ( 1/2, 1/2 ).
■ Matrix U = [ 0  4 ; −4  0 ] has imaginary eigenvalues: λ² + 16 = 0. Centric point inside [0, 1]² (see the simulation below).
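With purely imaginary eigenvalues the unconstrained flow orbits the stationary point instead of approaching it: strategies keep cycling around ( 1/2, 1/2 ), while time-averaged strategies stay close to the mixed equilibrium, consistent with the IGA theorem. An illustrative simulation (step size and iteration count are my own choices; with a finite η the discrete orbit slowly spirals outward until it is clipped at the boundary):

```python
import numpy as np

R = np.array([[1.0, -1.0], [-1.0, 1.0]])   # Matching pennies, Row payoffs
C = -R                                      # zero sum: Column payoffs

alpha, beta, eta = 0.6, 0.6, 0.01
trajectory = []
for _ in range(4000):
    d_alpha =  4.0 * beta  - 2.0    # d u1 / d alpha
    d_beta  = -4.0 * alpha + 2.0    # d u2 / d beta
    alpha = float(np.clip(alpha + eta * d_alpha, 0.0, 1.0))
    beta  = float(np.clip(beta  + eta * d_beta,  0.0, 1.0))
    trajectory.append((alpha, beta))

traj = np.array(trajectory)
print(traj[-5:])          # recent strategies: still moving, orbiting (1/2, 1/2)
print(traj.mean(axis=0))  # the time-averaged strategy stays close to (0.5, 0.5)
```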

  14. Gradient ascent: another game with a centric point

■ Symmetric, zero sum:

             L        R
      T   −2, 2     1, 1
      B   3, −3    −2, 1
