Multi-agent learning Gradient asent Gerard Vreeswijk , Intelligent - PowerPoint PPT Presentation

Two-player, two-action, general sum games In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by L R � r 11 , c 11 � T r 12 , c 12 M = B r 21 , c 21 r 22 , c 22 Row plays mixed ( α , 1 − α ) . Column plays mixed ( β , 1 − β ) . Expected payoff: Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 5

Two-player, two-action, general sum games In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by L R � r 11 , c 11 � T r 12 , c 12 M = B r 21 , c 21 r 22 , c 22 Row plays mixed ( α , 1 − α ) . Column plays mixed ( β , 1 − β ) . Expected payoff: u 1 ( α , β ) Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 5

Two-player, two-action, general sum games In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by L R � r 11 , c 11 � T r 12 , c 12 M = B r 21 , c 21 r 22 , c 22 Row plays mixed ( α , 1 − α ) . Column plays mixed ( β , 1 − β ) . Expected payoff: u 1 ( α , β ) = α [ β r 11 + ( 1 − β ) r 12 ] + ( 1 − α )[ β r 21 + ( 1 − β ) r 22 ] Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 5

Two-player, two-action, general sum games In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by L R � r 11 , c 11 � T r 12 , c 12 M = B r 21 , c 21 r 22 , c 22 Row plays mixed ( α , 1 − α ) . Column plays mixed ( β , 1 − β ) . Expected payoff: u 1 ( α , β ) = α [ β r 11 + ( 1 − β ) r 12 ] + ( 1 − α )[ β r 21 + ( 1 − β ) r 22 ] = u αβ + α ( r 12 − r 22 ) + β ( r 21 − r 22 ) + r 22 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 5

Two-player, two-action, general sum games In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by L R � r 11 , c 11 � T r 12 , c 12 M = B r 21 , c 21 r 22 , c 22 Row plays mixed ( α , 1 − α ) . Column plays mixed ( β , 1 − β ) . Expected payoff: u 1 ( α , β ) = α [ β r 11 + ( 1 − β ) r 12 ] + ( 1 − α )[ β r 21 + ( 1 − β ) r 22 ] = u αβ + α ( r 12 − r 22 ) + β ( r 21 − r 22 ) + r 22 . u 2 ( α , β ) Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 5

Two-player, two-action, general sum games In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by L R � r 11 , c 11 � T r 12 , c 12 M = B r 21 , c 21 r 22 , c 22 Row plays mixed ( α , 1 − α ) . Column plays mixed ( β , 1 − β ) . Expected payoff: u 1 ( α , β ) = α [ β r 11 + ( 1 − β ) r 12 ] + ( 1 − α )[ β r 21 + ( 1 − β ) r 22 ] = u αβ + α ( r 12 − r 22 ) + β ( r 21 − r 22 ) + r 22 . u 2 ( α , β ) = β [ α c 11 + ( 1 − α ) c 21 ] + ( 1 − β )[ α c 12 + ( 1 − α ) c 22 ] = u ′ αβ + α ( c 21 − c 22 ) + β ( c 12 − c 22 ) + c 22 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 5

Two-player, two-action, general sum games In its most general form, a two-player, two-action game in normal form with real-valued payoffs can be represented by L R � r 11 , c 11 � T r 12 , c 12 M = B r 21 , c 21 r 22 , c 22 Row plays mixed ( α , 1 − α ) . Column plays mixed ( β , 1 − β ) . Expected payoff: u 1 ( α , β ) = α [ β r 11 + ( 1 − β ) r 12 ] + ( 1 − α )[ β r 21 + ( 1 − β ) r 22 ] = u αβ + α ( r 12 − r 22 ) + β ( r 21 − r 22 ) + r 22 . u 2 ( α , β ) = β [ α c 11 + ( 1 − α ) c 21 ] + ( 1 − β )[ α c 12 + ( 1 − α ) c 22 ] = u ′ αβ + α ( c 21 − c 22 ) + β ( c 12 − c 22 ) + c 22 . where u = ( r 11 − r 12 ) − ( r 21 − r 22 ) and u ′ = ( c 11 − c 21 ) − ( c 12 − c 22 ) . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 5

a�ne map Gradient of expected payoff Gradient: ∂ u 1 ( α , β ) = β u + ( r 12 − r 22 ) ∂α ∂ u 2 ( α , β ) = α u ′ + ( c 21 − c 22 ) ∂β Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 6

Gradient of expected payoff Gradient: ∂ u 1 ( α , β ) = β u + ( r 12 − r 22 ) ∂α ∂ u 2 ( α , β ) = α u ′ + ( c 21 − c 22 ) ∂β As an map : a�ne � ∂ u 1 / ∂α � � α � 0 � � u = u ′ ∂ u 2 / ∂β 0 β � r 12 − r 22 � + c 21 − c 22 � α � = U + C β Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 6

Gradient of expected payoff Gradient: Stationary point: ∂ u 1 ( α , β ) ( α ∗ , β ∗ ) = ( c 22 − c 21 , r 22 − r 12 = β u + ( r 12 − r 22 ) ) ∂α u ′ u ∂ u 2 ( α , β ) = α u ′ + ( c 21 − c 22 ) ∂β As an map : a�ne � ∂ u 1 / ∂α � � α � 0 � � u = u ′ ∂ u 2 / ∂β 0 β � r 12 − r 22 � + c 21 − c 22 � α � = U + C β Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 6

Gradient of expected payoff Gradient: Stationary point: ∂ u 1 ( α , β ) ( α ∗ , β ∗ ) = ( c 22 − c 21 , r 22 − r 12 = β u + ( r 12 − r 22 ) ) ∂α u ′ u ∂ u 2 ( α , β ) = α u ′ + ( c 21 − c 22 ) ∂β Remarks: As an map : a�ne � ∂ u 1 / ∂α � � α � 0 � � u = u ′ ∂ u 2 / ∂β 0 β � r 12 − r 22 � + c 21 − c 22 � α � = U + C β Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 6

Gradient of expected payoff Gradient: Stationary point: ∂ u 1 ( α , β ) ( α ∗ , β ∗ ) = ( c 22 − c 21 , r 22 − r 12 = β u + ( r 12 − r 22 ) ) ∂α u ′ u ∂ u 2 ( α , β ) = α u ′ + ( c 21 − c 22 ) ∂β Remarks: ■ There is at most one stationary As an map : a�ne point. � ∂ u 1 / ∂α � � α � 0 � � u = u ′ ∂ u 2 / ∂β 0 β � r 12 − r 22 � + c 21 − c 22 � α � = U + C β Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 6

Gradient of expected payoff Gradient: Stationary point: ∂ u 1 ( α , β ) ( α ∗ , β ∗ ) = ( c 22 − c 21 , r 22 − r 12 = β u + ( r 12 − r 22 ) ) ∂α u ′ u ∂ u 2 ( α , β ) = α u ′ + ( c 21 − c 22 ) ∂β Remarks: ■ There is at most one stationary As an map : a�ne point. � ∂ u 1 / ∂α � � α � 0 � � u = ■ If a stationary point exists, it u ′ ∂ u 2 / ∂β 0 β � r 12 − r 22 may lie outside [ 0, 1 ] 2 . � + c 21 − c 22 � α � = U + C β Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 6

Gradient of expected payoff Gradient: Stationary point: ∂ u 1 ( α , β ) ( α ∗ , β ∗ ) = ( c 22 − c 21 , r 22 − r 12 = β u + ( r 12 − r 22 ) ) ∂α u ′ u ∂ u 2 ( α , β ) = α u ′ + ( c 21 − c 22 ) ∂β Remarks: ■ There is at most one stationary As an map : a�ne point. � ∂ u 1 / ∂α � � α � 0 � � u = ■ If a stationary point exists, it u ′ ∂ u 2 / ∂β 0 β � r 12 − r 22 may lie outside [ 0, 1 ] 2 . � + c 21 − c 22 ■ If there is a stationary point � α inside [ 0, 1 ] 2 , it is a weak (i.e., � = U + C non-strict) Nash equilibrium. β Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 6

Example: payoffs in Stag Hunt (r=4, t=3, s=1, p=3) Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 7

Example: payoffs in Stag Hunt (r=4, t=3, s=1, p=3) Player 1 may only move “back – front”; Player 2 may only move “left – right”. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 7

Part 2: IGA Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 8

Gradient ascent map : A�ne di�erential � α � α � ∂ u 1 / ∂α � � � = + η β β ∂ u 2 / ∂β t + 1 t t Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 9

Gradient ascent map : A�ne di�erential � α � α � ∂ u 1 / ∂α � � � = + η β β ∂ u 2 / ∂β t + 1 t t ■ Because α , β ∈ [ 0, 1 ] , the dynamics must be confined to [ 0, 1 ] 2 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 9

Gradient ascent map : A�ne di�erential � α � α � ∂ u 1 / ∂α � � � = + η β β ∂ u 2 / ∂β t + 1 t t ■ Because α , β ∈ [ 0, 1 ] , the dynamics must be confined to [ 0, 1 ] 2 . ■ Suppose the state ( α , β ) is on the boundary of the probability space [ 0, 1 ] 2 , and the gradient vector points outwards. Intuition: one of the players has an incentive to improve, but cannot improve further. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 9

Gradient ascent map : A�ne di�erential � α � α � ∂ u 1 / ∂α � � � = + η β β ∂ u 2 / ∂β t + 1 t t ■ Because α , β ∈ [ 0, 1 ] , the ■ To maintain dynamics within [ 0, 1 ] 2 , the gradient is projected dynamics must be confined to [ 0, 1 ] 2 . back on to [ 0, 1 ] 2 . Intuition: if one of the players ■ Suppose the state ( α , β ) is on has an incentive to improve, but the boundary of the probability cannot improve, then he will not space [ 0, 1 ] 2 , and the gradient improve. vector points outwards. Intuition: one of the players has an incentive to improve, but cannot improve further. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 9

Gradient ascent map : A�ne di�erential � α � α � ∂ u 1 / ∂α � � � = + η β β ∂ u 2 / ∂β t + 1 t t ■ Because α , β ∈ [ 0, 1 ] , the ■ To maintain dynamics within [ 0, 1 ] 2 , the gradient is projected dynamics must be confined to [ 0, 1 ] 2 . back on to [ 0, 1 ] 2 . Intuition: if one of the players ■ Suppose the state ( α , β ) is on has an incentive to improve, but the boundary of the probability cannot improve, then he will not space [ 0, 1 ] 2 , and the gradient improve. vector points outwards. ■ If nonzero, the projected Intuition: one of the players has an incentive to improve, but gradient is parallel to the (closest) boundary of [ 0, 1 ] 2 . cannot improve further. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 9

invertible eigenvalue invertible eigenvalue invertible Infinitesimal Gradient Ascent : IGA (Singh et al. , 2000) map : A�ne di�erential � α � α �� 0 � � α � r 12 − r 22 � � � �� u = + η + u ′ c 21 − c 22 β β 0 β t + 1 t Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 10

invertible eigenvalue invertible eigenvalue invertible Infinitesimal Gradient Ascent : IGA (Singh et al. , 2000) map : A�ne di�erential � α � α �� 0 � � α � r 12 − r 22 � � � �� u = + η + u ′ c 21 − c 22 β β 0 β t + 1 t Theorem (Singh, Kearns and Mansour, 2000) If players follow IGA, where η → 0 , their average payoffs will converge to the (expected) payoffs of a NE. If their strategies converge, they will converge to that same NE. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 10

invertible eigenvalue invertible eigenvalue invertible Infinitesimal Gradient Ascent : IGA (Singh et al. , 2000) map : A�ne di�erential � α � α �� 0 � � α � r 12 − r 22 � � � �� u = + η + u ′ c 21 − c 22 β β 0 β t + 1 t Theorem (Singh, Kearns and Mansour, 2000) If players follow IGA, where η → 0 , their average payoffs will converge to the (expected) payoffs of a NE. If their strategies converge, they will converge to that same NE. The proof is based on a qualitative result in the the theory of differential equations, which says that the behaviour of an affine differential map is determined by the multiplicative matrix U : Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 10

invertible eigenvalue invertible Infinitesimal Gradient Ascent : IGA (Singh et al. , 2000) map : A�ne di�erential � α � α �� 0 � � α � r 12 − r 22 � � � �� u = + η + u ′ c 21 − c 22 β β 0 β t + 1 t Theorem (Singh, Kearns and Mansour, 2000) If players follow IGA, where η → 0 , their average payoffs will converge to the (expected) payoffs of a NE. If their strategies converge, they will converge to that same NE. The proof is based on a qualitative result in the the theory of differential equations, which says that the behaviour of an affine differential map is determined by the multiplicative matrix U : eigenvalue λ (solution of Ux = λ x ⇔ solution 1. If U is invertible , and its of Det [ U − λ I ] = 0 ) is real, ∃ stationary point Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 10

invertible eigenvalue invertible Infinitesimal Gradient Ascent : IGA (Singh et al. , 2000) map : A�ne di�erential � α � α �� 0 � � α � r 12 − r 22 � � � �� u = + η + u ′ c 21 − c 22 β β 0 β t + 1 t Theorem (Singh, Kearns and Mansour, 2000) If players follow IGA, where η → 0 , their average payoffs will converge to the (expected) payoffs of a NE. If their strategies converge, they will converge to that same NE. The proof is based on a qualitative result in the the theory of differential equations, which says that the behaviour of an affine differential map is determined by the multiplicative matrix U : eigenvalue λ (solution of Ux = λ x ⇔ solution 1. If U is invertible , and its of Det [ U − λ I ] = 0 ) is real, ∃ stationary point, w.i. is a saddle point. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 10

invertible Infinitesimal Gradient Ascent : IGA (Singh et al. , 2000) map : A�ne di�erential � α � α �� 0 � � α � r 12 − r 22 � � � �� u = + η + u ′ c 21 − c 22 β β 0 β t + 1 t Theorem (Singh, Kearns and Mansour, 2000) If players follow IGA, where η → 0 , their average payoffs will converge to the (expected) payoffs of a NE. If their strategies converge, they will converge to that same NE. The proof is based on a qualitative result in the the theory of differential equations, which says that the behaviour of an affine differential map is determined by the multiplicative matrix U : eigenvalue λ (solution of Ux = λ x ⇔ solution 1. If U is invertible , and its of Det [ U − λ I ] = 0 ) is real, ∃ stationary point, w.i. is a saddle point. 2. If U is invertible , and its eigenvalue λ is imaginary, there is a stationary point Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 10

invertible Infinitesimal Gradient Ascent : IGA (Singh et al. , 2000) map : A�ne di�erential � α � α �� 0 � � α � r 12 − r 22 � � � �� u = + η + u ′ c 21 − c 22 β β 0 β t + 1 t Theorem (Singh, Kearns and Mansour, 2000) If players follow IGA, where η → 0 , their average payoffs will converge to the (expected) payoffs of a NE. If their strategies converge, they will converge to that same NE. The proof is based on a qualitative result in the the theory of differential equations, which says that the behaviour of an affine differential map is determined by the multiplicative matrix U : eigenvalue λ (solution of Ux = λ x ⇔ solution 1. If U is invertible , and its of Det [ U − λ I ] = 0 ) is real, ∃ stationary point, w.i. is a saddle point. 2. If U is invertible , and its eigenvalue λ is imaginary, there is a stationary point, which, in particular, is a centric point. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 10

Infinitesimal Gradient Ascent : IGA (Singh et al. , 2000) map : A�ne di�erential � α � α �� 0 � � α � r 12 − r 22 � � � �� u = + η + u ′ c 21 − c 22 β β 0 β t + 1 t Theorem (Singh, Kearns and Mansour, 2000) If players follow IGA, where η → 0 , their average payoffs will converge to the (expected) payoffs of a NE. If their strategies converge, they will converge to that same NE. The proof is based on a qualitative result in the the theory of differential equations, which says that the behaviour of an affine differential map is determined by the multiplicative matrix U : eigenvalue λ (solution of Ux = λ x ⇔ solution 1. If U is invertible , and its of Det [ U − λ I ] = 0 ) is real, ∃ stationary point, w.i. is a saddle point. 2. If U is invertible , and its eigenvalue λ is imaginary, there is a stationary point, which, in particular, is a centric point. invertible (iff u = 0 or u ′ = 0), there is no stationary point. 3. If U is not Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 10

Saddle point Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 11

Gradient ascent: Coordination game Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 12

Gradient ascent: Coordination game ■ Symmetric, but not zero sum: L R � 1, 1 � T 0, 0 B 0, 0 1, 1 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 12

Gradient ascent: Coordination game ■ Symmetric, but not zero sum: L R � 1, 1 � T 0, 0 B 0, 0 1, 1 ■ Gradient: � 2 · β − 1 � 2 · α − 1 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 12

Gradient ascent: Coordination game ■ Symmetric, but not zero sum: L R � 1, 1 � T 0, 0 B 0, 0 1, 1 ■ Gradient: � 2 · β − 1 � 2 · α − 1 ■ Stationary at ( 1/2, 1/2 ) . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 12

Gradient ascent: Coordination game ■ Symmetric, but not zero sum: L R � 1, 1 � T 0, 0 B 0, 0 1, 1 ■ Gradient: � 2 · β − 1 � 2 · α − 1 ■ Stationary at ( 1/2, 1/2 ) . ■ Matrix � 0 � 2 U = 2 0 has real eigenvalues: λ 2 − 4 = 0. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 12

Gradient ascent: Coordination game ■ Symmetric, but not zero sum: L R � 1, 1 � T 0, 0 B 0, 0 1, 1 ■ Gradient: � 2 · β − 1 � 2 · α − 1 ■ Stationary at ( 1/2, 1/2 ) . ■ Matrix � 0 � 2 U = 2 0 has real eigenvalues: λ 2 − 4 = 0. Saddle point inside [ 0, 1 ] 2 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 12

Gradient ascent: Prisoners’ Dilemma Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 13

Gradient ascent: Prisoners’ Dilemma ■ Symmetric, but not zero sum: L R � 3, 3 � T 0, 5 B 5, 0 1, 1 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 13

Gradient ascent: Prisoners’ Dilemma ■ Symmetric, but not zero sum: L R � 3, 3 � T 0, 5 B 5, 0 1, 1 ■ Gradient: � − 1 · β − 1 � − 1 · α − 1 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 13

Gradient ascent: Prisoners’ Dilemma ■ Symmetric, but not zero sum: L R � 3, 3 � T 0, 5 B 5, 0 1, 1 ■ Gradient: � − 1 · β − 1 � − 1 · α − 1 ■ Stationary at ( − 1, − 1 ) . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 13

Gradient ascent: Prisoners’ Dilemma ■ Symmetric, but not zero sum: L R � 3, 3 � T 0, 5 B 5, 0 1, 1 ■ Gradient: � − 1 · β − 1 � − 1 · α − 1 ■ Stationary at ( − 1, − 1 ) . ■ Matrix � � − 1 0 U = − 1 0 has real eigenvalues: λ 2 − 1 = 0. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 13

Gradient ascent: Prisoners’ Dilemma ■ Symmetric, but not zero sum: L R � 3, 3 � T 0, 5 B 5, 0 1, 1 ■ Gradient: � − 1 · β − 1 � − 1 · α − 1 ■ Stationary at ( − 1, − 1 ) . ■ Matrix � � − 1 0 U = − 1 0 has real eigenvalues: λ 2 − 1 = 0. Saddle point outside [ 0, 1 ] 2 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 13

Gradient ascent: Stag hunt Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 14

Gradient ascent: Stag hunt ■ Symmetric, but not zero sum: L R � 5, 5 � T 0, 3 B 3, 0 2, 2 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 14

Gradient ascent: Stag hunt ■ Symmetric, but not zero sum: L R � 5, 5 � T 0, 3 B 3, 0 2, 2 ■ Gradient: � 4 · β − 2 � 4 · α − 2 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 14

Gradient ascent: Stag hunt ■ Symmetric, but not zero sum: L R � 5, 5 � T 0, 3 B 3, 0 2, 2 ■ Gradient: � 4 · β − 2 � 4 · α − 2 ■ Stationary at ( 1/2, 1/2 ) . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 14

Gradient ascent: Stag hunt ■ Symmetric, but not zero sum: L R � 5, 5 � T 0, 3 B 3, 0 2, 2 ■ Gradient: � 4 · β − 2 � 4 · α − 2 ■ Stationary at ( 1/2, 1/2 ) . ■ Matrix � 0 � 4 U = 4 0 has real eigenvalues: λ 2 − 16 = 0. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 14

Gradient ascent: Stag hunt ■ Symmetric, but not zero sum: L R � 5, 5 � T 0, 3 B 3, 0 2, 2 ■ Gradient: � 4 · β − 2 � 4 · α − 2 ■ Stationary at ( 1/2, 1/2 ) . ■ Matrix � 0 � 4 U = 4 0 has real eigenvalues: λ 2 − 16 = 0. Saddle point inside [ 0, 1 ] 2 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 14

Gradient ascent: Game of Chicken Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 15

Gradient ascent: Game of Chicken ■ Symmetric, but not zero sum: L R � � − 1, 1 T 0, 0 1, − 1 − 3, − 3 B Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 15

Gradient ascent: Game of Chicken ■ Symmetric, but not zero sum: L R � � − 1, 1 T 0, 0 1, − 1 − 3, − 3 B ■ Gradient: � − 3 · β + 2 � − 3 · α + 2 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 15

Gradient ascent: Game of Chicken ■ Symmetric, but not zero sum: L R � � − 1, 1 T 0, 0 1, − 1 − 3, − 3 B ■ Gradient: � − 3 · β + 2 � − 3 · α + 2 ■ Stationary at ( 2/3, 2/3 ) . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 15

Gradient ascent: Game of Chicken ■ Symmetric, but not zero sum: L R � � − 1, 1 T 0, 0 1, − 1 − 3, − 3 B ■ Gradient: � − 3 · β + 2 � − 3 · α + 2 ■ Stationary at ( 2/3, 2/3 ) . ■ Matrix � � − 3 0 U = − 3 0 has real eigenvalues: λ 2 − 9 = 0. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 15

Gradient ascent: Game of Chicken ■ Symmetric, but not zero sum: L R � � − 1, 1 T 0, 0 1, − 1 − 3, − 3 B ■ Gradient: � − 3 · β + 2 � − 3 · α + 2 ■ Stationary at ( 2/3, 2/3 ) . ■ Matrix � � − 3 0 U = − 3 0 has real eigenvalues: λ 2 − 9 = 0. Saddle point inside [ 0, 1 ] 2 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 15

Gradient ascent: Battle of the Sexes Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 16

Gradient ascent: Battle of the Sexes ■ Symmetric, but not zero sum: L R � 0, 0 � T 2, 3 B 3, 2 1, 1 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 16

Gradient ascent: Battle of the Sexes ■ Symmetric, but not zero sum: L R � 0, 0 � T 2, 3 B 3, 2 1, 1 ■ Gradient: � − 4 · β + 1 � − 4 · α + 1 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 16

Gradient ascent: Battle of the Sexes ■ Symmetric, but not zero sum: L R � 0, 0 � T 2, 3 B 3, 2 1, 1 ■ Gradient: � − 4 · β + 1 � − 4 · α + 1 ■ Stationary at ( 1/4, 1/4 ) . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 16

Gradient ascent: Battle of the Sexes ■ Symmetric, but not zero sum: L R � 0, 0 � T 2, 3 B 3, 2 1, 1 ■ Gradient: � − 4 · β + 1 � − 4 · α + 1 ■ Stationary at ( 1/4, 1/4 ) . ■ Matrix � � − 4 0 U = − 4 0 has real eigenvalues: λ 2 − 16 = 0. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 16

Gradient ascent: Battle of the Sexes ■ Symmetric, but not zero sum: L R � 0, 0 � T 2, 3 B 3, 2 1, 1 ■ Gradient: � − 4 · β + 1 � − 4 · α + 1 ■ Stationary at ( 1/4, 1/4 ) . ■ Matrix � � − 4 0 U = − 4 0 has real eigenvalues: λ 2 − 16 = 0. Saddle point inside [ 0, 1 ] 2 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 16

Gradient ascent: Matching pennies Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 17

Gradient ascent: Matching pennies ■ Symmetric, zero sum: L R � 1, − 1 � − 1, 1 T − 1, 1 1, − 1 B Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 17

Gradient ascent: Matching pennies ■ Symmetric, zero sum: L R � 1, − 1 � − 1, 1 T − 1, 1 1, − 1 B ■ Gradient: � � 4 · β − 2 − 4 · α + 2 Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 17

Gradient ascent: Matching pennies ■ Symmetric, zero sum: L R � 1, − 1 � − 1, 1 T − 1, 1 1, − 1 B ■ Gradient: � � 4 · β − 2 − 4 · α + 2 ■ Stationary at ( 1/2, 1/2 ) . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 17

Gradient ascent: Matching pennies ■ Symmetric, zero sum: L R � 1, − 1 � − 1, 1 T − 1, 1 1, − 1 B ■ Gradient: � � 4 · β − 2 − 4 · α + 2 ■ Stationary at ( 1/2, 1/2 ) . ■ Matrix � � 0 4 U = − 4 0 has imaginary eigenvalues: λ 2 + 16 = 0. Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 17

Gradient ascent: Matching pennies ■ Symmetric, zero sum: L R � 1, − 1 � − 1, 1 T − 1, 1 1, − 1 B ■ Gradient: � � 4 · β − 2 − 4 · α + 2 ■ Stationary at ( 1/2, 1/2 ) . ■ Matrix � � 0 4 U = − 4 0 has imaginary eigenvalues: λ 2 + 16 = 0. Centric point inside [ 0, 1 ] 2 . Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 17

Gradient ascent: other game with centric Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 18

Gradient ascent: other game with centric ■ Symmetric, zero sum: L R � − 2, 2 � T 1, 1 3, − 3 − 2, 1 B Author: Gerard Vreeswijk. Slides last modified on May 16 th , 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 18

Multi-agent learning Gradient asent Gerard Vreeswijk , Intelligent - PowerPoint PPT Presentation

Multi-agent learning Gradient asent Gerard Vreeswijk , Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Saturday 16 th May, 2020 Gradient ascent: idea Idea Author: Gerard

Multi-agent learning Multi-agent reinforcement learning Gerard Vreeswijk , Intelligent Systems

Overview Multi-Agent Systems Introduction to multi-agent systems and agent societies Agent

Multi-agent learning Gerard Vreeswijk , Intelligent Systems Group, Computer Science Department,

REINFORCEMENT LEARNING IN MULTI-AGENT SYSTEMS MACHINE LEARNING MEETUP DR. ANA PELETEIRO

An Agent Architecture An Agent Architecture An Agent Architecture An Agent Architecture for

S S S S calable calable Agent calable calable Agent Agent Plat forms Agent Plat forms

Agent-Based Systems Agent communication Speech act theory Michael Rovatsos Agent

Multi-agent learning Simplied Poker Yannick Bitane , April 14th, 2011. Yannick Bitane. Slides

Learning Agent Learning Agents An Agent that observes its performance and adapts its

ROMA: Multi-Agent Reinforcement Learning with Emerging Roles Tonghan Wang, Heng Dong, Victor

The Player Agent The Player Agent Are they the most important league official right now? right

Rational Agents (Ch. 2) Rational agent An agent/robot must be able to perceive and interact with

Agent-Based Systems Michael Rovatsos mrovatso@inf.ed.ac.uk Lecture 6 Agent Communication 1

Agent Training Welcome Blues Agent Portal Training e-Learning on the BCBSM Agent Portal

W HAT S AN A GENT ? Weiss, p. 29 [after Wooldridge and Jennings]: An agent is a

M ULTI -A GENT S YSTEMS Overview and Research Directions Whats an agent? AI Class 12 (C H .

Brief description of the LHC and ATLAS. The LHC has reached a new milestone in number of

Challenges in Nuclear Astrophysics A.B. Balantekin 5 th Interna*onal Conference on

Compromise, Consensus, and System-ness: Developing a Community Crisis Standards of

A road map to more complex dynamic models Y Y Y discrete discrete continuous A A A X X

sin(2 /2 1 ) Measurements Riccardo de Sangro INFN - Frascati FPCP May 19-24, 2013 Outline

Harnessing the power of Spark for Enterprise data engineering and analytics Vickye Jain,

EE 457 Unit 8 Exceptions What Happens When Things Go Wrong 2 What are Exceptions?

Direct Data Placement (DDP) over Reliable Transports 55 th IETF Atlanta 20 th November 2002