

SLIDE 1

Multi-agent learning: Gradient Dynamics

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Last modified on March 1st, 2012.

SLIDE 2

Gradient dynamics: motivation

  • Every player “identifies itself” with a single mixed strategy.
  • As in fictitious play, players project each other onto a mixed strategy.
  • CKR is in order. CKR (common knowledge of rationality, cf. Hargreaves Heap & Varoufakis, 2004) implies that players know everything. In this case, however, players even know the mixed strategies of their opponent. (Hence, q−i = s−i, for all i.)
    – Fictitious play assesses strategies, and plays a best response to that assessment.
    – Gradient dynamics does not assess, and it does not play a best response.
  • With gradient dynamics, players don’t actually (need to) play to learn. Rather, players gradually adapt their strategy through hill-climbing in the payoff space.


SLIDE 3

Plan for today

  1. Two-player, two-action, general sum games with real payoffs.
  2. Dynamics of (mixed) strategies in such games. Examples:
     (a) Coordination game
     (b) Prisoners’ dilemma
     (c) Other examples
  3. IGA: Infinitesimal Gradient Ascent. Singh, Kearns and Mansour (2000).
     – Convergence of IGA.
  4. IGA-WoLF: Win or Learn Fast. Bowling and Veloso (2001, 2002).
     – Convergence of IGA-WoLF.
     – Analysis of the proof of convergence of IGA-WoLF.


SLIDE 4

Two-player, two-action, general sum games with real payoffs

In its most general form, a two-player, two-action, general sum game in normal form with real-valued payoffs can be represented by

             L          R
  M =  T  r11, c11   r12, c12
       B  r21, c21   r22, c22

Row plays the mixed strategy (α, 1 − α); Column plays the mixed strategy (β, 1 − β). Expected payoffs:

  Vr(α, β) = α[βr11 + (1 − β)r12] + (1 − α)[βr21 + (1 − β)r22]
           = uαβ + α(r12 − r22) + β(r21 − r22) + r22,

  Vc(α, β) = β[αc11 + (1 − α)c21] + (1 − β)[αc12 + (1 − α)c22]
           = u′αβ + α(c12 − c22) + β(c21 − c22) + c22,

where u = (r11 − r12) − (r21 − r22) and u′ = (c11 − c21) − (c12 − c22).
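A minimal Python sketch (not from the slides) of these formulas; the Prisoners’ Dilemma payoffs below are just an example choice:

```python
import numpy as np

# Example payoffs (Prisoners' Dilemma; any 2x2 real payoffs work).
# Rows/columns are indexed as (T, B) x (L, R).
R = np.array([[3.0, 0.0],
              [5.0, 1.0]])   # row player's payoffs: r11 r12 / r21 r22
C = np.array([[3.0, 5.0],
              [0.0, 1.0]])   # column player's payoffs: c11 c12 / c21 c22

u  = (R[0, 0] - R[0, 1]) - (R[1, 0] - R[1, 1])   # u  = (r11 - r12) - (r21 - r22)
u_ = (C[0, 0] - C[1, 0]) - (C[0, 1] - C[1, 1])   # u' = (c11 - c21) - (c12 - c22)

def V_r(alpha, beta):
    """Row's expected payoff when Row plays (alpha, 1-alpha), Col plays (beta, 1-beta)."""
    return u * alpha * beta + alpha * (R[0, 1] - R[1, 1]) + beta * (R[1, 0] - R[1, 1]) + R[1, 1]

def V_c(alpha, beta):
    """Column's expected payoff for the same strategy pair."""
    return u_ * alpha * beta + alpha * (C[0, 1] - C[1, 1]) + beta * (C[1, 0] - C[1, 1]) + C[1, 1]

# Sanity check against the direct bilinear form p^T M q:
p, q = np.array([0.3, 0.7]), np.array([0.6, 0.4])
assert np.isclose(V_r(0.3, 0.6), p @ R @ q)
assert np.isclose(V_c(0.3, 0.6), p @ C @ q)
```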


SLIDE 5

Example: payoffs for Player 1 and Player 2 in Stag Hunt

Player 1 may only move “back – front”; Player 2 may only move “left – right”.


SLIDE 6

Gradient of expected payoff

Gradient:

  ∂Vr(α, β)/∂α = βu + (r12 − r22)
  ∂Vc(α, β)/∂β = αu′ + (c21 − c22)

As an affine dynamical system:

  [ ∂Vr/∂α ]   [ 0   u ] [ α ]   [ r12 − r22 ]
  [ ∂Vc/∂β ] = [ u′  0 ] [ β ] + [ c21 − c22 ]

Stationary point:

  (α∗, β∗) = ( (c22 − c21) / u′ , (r22 − r12) / u )

Remarks:

  • There is at most one stationary point.
  • If a stationary point exists, it may lie outside [0, 1]².
  • If there is a stationary point inside [0, 1]², it is a non-strict Nash equilibrium.

A small code sketch of the gradient and the stationary point follows.
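Building on the sketch after Slide 4 (same R, C, u, u_):

```python
def gradient(alpha, beta):
    """(dV_r/d_alpha, dV_c/d_beta) at the strategy pair (alpha, beta)."""
    dV_r = beta * u + (R[0, 1] - R[1, 1])     # beta*u   + (r12 - r22)
    dV_c = alpha * u_ + (C[1, 0] - C[1, 1])   # alpha*u' + (c21 - c22)
    return np.array([dV_r, dV_c])

def stationary_point():
    """(alpha*, beta*), or None when U = [[0, u], [u', 0]] is singular."""
    if u == 0.0 or u_ == 0.0:
        return None
    return np.array([(C[1, 1] - C[1, 0]) / u_, (R[1, 1] - R[0, 1]) / u])

print(stationary_point())  # Prisoners' Dilemma above: [-1. -1.], outside [0, 1]^2
```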


SLIDE 7

Gradient dynamics: Coordination game

  • Symmetric, but not zero sum:

         L       R
    T  1, 1    0, 0
    B  0, 0    1, 1

  • Gradient:

      ∂Vr/∂α = 2β − 1
      ∂Vc/∂β = 2α − 1

  • Stationary at (1/2, 1/2). The multiplicative matrix

      U = [ 0  2 ]
          [ 2  0 ]

    has real eigenvalues (±2).


SLIDE 8

Saddle point


SLIDE 9

Gradient dynamics: Stag hunt

  • Symmetric, but not zero sum:

         L       R
    T  5, 5    0, 3
    B  3, 0    2, 2

  • Gradient:

      ∂Vr/∂α = 4β − 2
      ∂Vc/∂β = 4α − 2

  • Stationary at (1/2, 1/2). The multiplicative matrix

      U = [ 0  4 ]
          [ 4  0 ]

    has real eigenvalues (±4).


SLIDE 10

Gradient dynamics: Prisoners’ Dilemma

  • Symmetric, but not zero sum:

         L       R
    T  3, 3    0, 5
    B  5, 0    1, 1

  • Gradient:

      ∂Vr/∂α = −β − 1
      ∂Vc/∂β = −α − 1

  • Stationary at (−1, −1), outside [0, 1]². The multiplicative matrix

      U = [  0  −1 ]
          [ −1   0 ]

    has real eigenvalues (±1).


SLIDE 11

Gradient dynamics: Game of Chicken

  • Symmetric, but not zero sum:

          L         R
    T  0, 0     −1, 1
    B  1, −1   −3, −3

  • Gradient:

      ∂Vr/∂α = −3β + 2
      ∂Vc/∂β = −3α + 2

  • Stationary at (2/3, 2/3). The multiplicative matrix

      U = [  0  −3 ]
          [ −3   0 ]

    has real eigenvalues (±3).


SLIDE 12

Gradient dynamics: Battle of the Sexes

  • Symmetric, but not zero sum:

         L       R
    T  0, 0    2, 3
    B  3, 2    1, 1

  • Gradient:

      ∂Vr/∂α = −4β + 1
      ∂Vc/∂β = −4α + 1

  • Stationary at (1/4, 1/4). The multiplicative matrix

      U = [  0  −4 ]
          [ −4   0 ]

    has real eigenvalues (±4).


SLIDE 13

Gradient dynamics: Matching pennies

  • Zero sum (and, unlike the previous examples, not symmetric):

          L        R
    T  1, −1    −1, 1
    B  −1, 1    1, −1

  • Gradient:

      ∂Vr/∂α = 4β − 2
      ∂Vc/∂β = −4α + 2

  • Stationary at (1/2, 1/2). The multiplicative matrix

      U = [  0  4 ]
          [ −4  0 ]

    has purely imaginary eigenvalues (±4i).


SLIDE 14

Gradient dynamics of expected payoff

Discrete dynamics with step size η:

  [ α ]        [ α ]       [ ∂Vr/∂α ]
  [ β ]t+1  =  [ β ]t + η  [ ∂Vc/∂β ]t

  • Because α, β ∈ [0, 1], the dynamics must be confined to [0, 1]².
  • Suppose the state (α, β) is on the boundary of the probability space [0, 1]², and the gradient vector points outwards. Intuition: one of the players has an incentive to improve, but cannot improve further.
  • To maintain the dynamics within [0, 1]², the gradient is projected back onto [0, 1]². Intuition: if one of the players has an incentive to improve, but cannot improve, then he will not improve.
  • If nonzero, the projected gradient is parallel to the (closest part of the) boundary.

A sketch of one such projected step appears below.
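A minimal sketch, reusing gradient() from the Slide 6 sketch and assuming the projection is implemented by clipping the updated point back to the unit square (for box constraints this is exactly the Euclidean projection, and it leaves any residual motion parallel to the nearest boundary):

```python
def projected_step(alpha, beta, eta=0.01):
    """One discrete gradient step, projected back onto [0, 1]^2."""
    dV_r, dV_c = gradient(alpha, beta)
    alpha_new = min(max(alpha + eta * dV_r, 0.0), 1.0)
    beta_new  = min(max(beta  + eta * dV_c, 0.0), 1.0)
    return alpha_new, beta_new
```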


SLIDE 15

Infinitesimal Gradient Ascent: IGA (Singh et al., 2000)

IGA: discrete dynamics with step size η → 0:

  [ α ]        [ α ]        (  [ 0   u ] [ α ]    [ r12 − r22 ]  )
  [ β ]t+1  =  [ β ]t + η · (  [ u′  0 ] [ β ]t + [ c21 − c22 ]  )

  • Theorem (Singh, Kearns and Mansour, 2000). If players follow IGA, where η → 0, then their strategies will converge to a Nash equilibrium. If not, then at least their average payoffs will converge to the expected payoffs of a Nash equilibrium.

  • The proof is based on a qualitative result from the theory of differential equations: the behaviour of an affine differential map is determined by its multiplicative matrix U. Three cases:

    1. U is not invertible.
    2. U is invertible, and its eigenvalues (Ux = λx) are real.
    3. U is invertible, and its eigenvalues are purely imaginary.

The case distinction is sketched in code below.
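A sketch of the case distinction, with u and u_ as in the Slide 4 sketch. The eigenvalues of U = [[0, u], [u′, 0]] are ±√(uu′), so their character is decided by the sign of uu′:

```python
U = np.array([[0.0, u],
              [u_,  0.0]])          # eigenvalues: +/- sqrt(u * u')

if np.isclose(u * u_, 0.0):
    case = "1: U is not invertible"
elif u * u_ > 0:
    case = "2: U invertible, real eigenvalues"
else:
    case = "3: U invertible, purely imaginary eigenvalues"
print(case, np.linalg.eigvals(U))
```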


SLIDE 16

Convergence of IGA (Singh et al., 2000)

Proof outline. There are two main cases:

  1. There is no stationary point, or the stationary point lies outside [0, 1]². Then there is movement everywhere in [0, 1]². Since the movement is caused by an affine differential map, the flow is in one direction, hence gets stuck somewhere at the boundary.

  2. There is a stationary point inside [0, 1]².
     (a) The stationary point is an attractor. Then it attracts movement, which then becomes stationary.
     (b) The stationary point is a repellor. Then it repels movement towards the boundary.
     (c) Both (2a) and (2b): saddle point.
     (d) None of the above. Then plain IGA does not converge.

In three out of four cases the dynamics ends, hence ends in a Nash equilibrium. Cases (2a) and (2b) actually do not occur in isolation.


SLIDE 17

IGA-WoLF (Bowling et al., 2001)

Bowling and Veloso modify IGA so as to ensure convergence in Case 2d. Idea: Win or Learn Fast (WoLF). To this end, IGA-WoLF uses a variable step:

  [ α ]        [ α ]        [ l^r_t · ∂Vr/∂α ]
  [ β ]t+1  =  [ β ]t + η · [ l^c_t · ∂Vc/∂β ]t

where l^r_t, l^c_t ∈ {lmin, lmax}, all positive. (Bowling et al. use the interval [lmin, lmax].)

  l^r_t =Def  lmin  if Vr(αt, βt) > Vr(αe, βt)   (winning)
              lmax  otherwise                    (losing)

  l^c_t =Def  lmin  if Vc(αt, βt) > Vc(αt, βe)   (winning)
              lmax  otherwise                    (losing)

where αe is a row strategy belonging to some Nash equilibrium, chosen by the row player; similarly for βe and the column player. Thus, (αe, βe) itself need not be a Nash equilibrium. A sketch of one such step in code follows.
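A minimal sketch of one IGA-WoLF step, reusing V_r, V_c and gradient() from the earlier sketches. The values of l_min and l_max below are hypothetical; (alpha_e, beta_e) are the players’ chosen equilibrium strategies:

```python
def wolf_step(alpha, beta, alpha_e, beta_e, eta=0.01, l_min=0.5, l_max=2.0):
    """One variable-learning-rate step: learn slowly when winning, fast when losing."""
    dV_r, dV_c = gradient(alpha, beta)
    l_r = l_min if V_r(alpha, beta) > V_r(alpha_e, beta) else l_max   # row winning?
    l_c = l_min if V_c(alpha, beta) > V_c(alpha, beta_e) else l_max   # col winning?
    alpha_new = min(max(alpha + eta * l_r * dV_r, 0.0), 1.0)
    beta_new  = min(max(beta  + eta * l_c * dV_c, 0.0), 1.0)
    return alpha_new, beta_new
```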


SLIDE 18

Case 2d: revolution around Nash equilibrium

  • Lemma. If the learning rates lr and lc remain constant, then the trajectory of the strategy pair is an elliptical orbit around the center (α∗, β∗), with the vertical (β) axis and the horizontal (α) axis in the ratio

      √( lc|u′| / (lr|u|) ) : 1

  • Remarks:
    – For ellipses with center (α∗, β∗) there are four possibilities, depending on whether √( lc|u′| / (lr|u|) ) > 1 or < 1 and on the size of the axes:
      1. Lies flat, axes < 1.
      2. Is standing, axes < 1.
      3. Lies flat, axes > 1.
      4. Is standing, axes > 1.
    – Bowling et al. do not prove this result but refer to Singh et al., who in turn refer to a work on differential equations by Reinhard (1987).

A numerical check of this orbit follows.
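A numerical check of the lemma (a sketch, using matching pennies where u = 4, u′ = −4 and the center is (1/2, 1/2)): with constant learning rates the quantity E = lc·|u′|·(α − α∗)² + lr·|u|·(β − β∗)² is conserved along the flow, which is exactly the equation of an ellipse with the axis ratio above. Forward-Euler integration drifts slightly, so E is only approximately constant for small steps:

```python
u_mp, u_mp_ = 4.0, -4.0          # matching pennies: u = 4, u' = -4
l_r, l_c = 0.5, 2.0              # hypothetical constant learning rates
a, b = 0.6, 0.55                 # start near the center (1/2, 1/2)

def E(a, b):
    return l_c * abs(u_mp_) * (a - 0.5) ** 2 + l_r * abs(u_mp) * (b - 0.5) ** 2

E0 = E(a, b)
for _ in range(100_000):         # integrate the flow with a tiny Euler step
    da = l_r * (u_mp * b - 2.0)  # l_r * dV_r/d_alpha = l_r * (4*beta - 2)
    db = l_c * (u_mp_ * a + 2.0) # l_c * dV_c/d_beta  = l_c * (-4*alpha + 2)
    a, b = a + 1e-5 * da, b + 1e-5 * db
print(E0, E(a, b))               # nearly equal: the orbit stays on the ellipse
```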


SLIDE 19

Case 2d: revolution around Nash equilibrium

  • Lemma. A player is “winning” if and only if that player’s strategy is moving away from the center.

  • Proof. When play revolves around the center, there can be only one equilibrium; hence (αe, βe) is that equilibrium: (αe, βe) = (α∗, β∗). Consider the row player, who wins if and only if

      Vr(αt, βt) − Vr(αe, βt) > 0.

    Since Vr is affine in α for fixed β, simplifying yields

      (α − αe) · ∂Vr/∂α > 0.

    Thus, the row player “wins” iff either α > αe and α increases, or else α < αe and α decreases, i.e., iff α is moving away from the center. (A numerical check of this simplification follows the corollary.)

  • Corollary. The learning rate is constant throughout any one quadrant.
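A quick numerical check of the simplification step, reusing V_r and gradient() from the earlier sketches; the identity holds because Vr is affine in α:

```python
a, b, a_e = 0.7, 0.3, 0.5
lhs = V_r(a, b) - V_r(a_e, b)
rhs = (a - a_e) * gradient(a, b)[0]   # (alpha - alpha_e) * dV_r/d_alpha
assert np.isclose(lhs, rhs)           # "winning" = "moving away from alpha_e"
```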


SLIDE 20

Case 2d: revolution around Nash equilibrium

  • Lemma. Let C be the center. For every initial strategy pair (α, β) that is sufficiently close to C, the lmin-lmax dynamics will bring that pair to C.

  • Proof. Let (α, β) be a strategy pair. According to the previous lemma, its trajectory forms an ellipse with center C. If (α, β) is sufficiently close to C, this ellipse lies entirely within [0, 1]² and the trajectory is not disrupted. There are two cases.

    1. The strategy pair moves clockwise.
       (a) We must ensure that the learning parameters are set such that the ellipse that forms the trajectory “is standing” when (α, β) is in Q1 and Q3.
       (b) Similarly, we must ensure that the learning parameters are such that the ellipse “lies flat” when (α, β) is in Q2 and Q4.
    2. The strategy pair moves counter-clockwise. Similar reasoning.


SLIDE 21

Trajectory in different quadrants


SLIDE 22

Trajectory in different quadrants


SLIDE 23

Trajectory in different quadrants


SLIDE 24

Compound trajectory


SLIDE 25

Trajectory in different quadrants

  • Claim. The learning parameters l^r_t and l^c_t alternate in such a way that the ellipse that forms the trajectory in clockwise movement “is standing” when (α, β) is in Q1 or Q3 of the ellipse, and “lies flat” otherwise.

  • Proof. Suppose movement is clockwise.
    1. Suppose (α, β) is in Q1 (upper right) of the ellipse. Then row “wins” and col “loses”. Hence, horizontal velocity < vertical velocity, and the ellipse “is standing”.
    2. Suppose (α, β) is in Q2 (lower right) of the ellipse. Then row “loses” and col “wins”. Hence, horizontal velocity > vertical velocity, and the ellipse “lies flat”.
    Clearly, the reasoning is similar when the strategy pair (α, β) is in the other two quadrants, or when movement is counter-clockwise.

A small simulation of the resulting inward spiral follows.
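A self-contained simulation sketch on matching pennies (u = 4, u′ = −4, unique Nash equilibrium at (1/2, 1/2)). By the lemma of Slide 19, “winning” can be tested as “moving away from the center”; with lmin < lmax (hypothetical values below) the quarter-ellipses shrink and the strategies spiral inwards:

```python
l_min, l_max = 0.5, 2.0
eta = 1e-4
a, b = 0.9, 0.5                          # initial strategy pair
for _ in range(1_000_000):
    dV_r = 4.0 * b - 2.0                 # dV_r/d_alpha
    dV_c = -4.0 * a + 2.0                # dV_c/d_beta
    l_r = l_min if (a - 0.5) * dV_r > 0 else l_max   # row winning iff moving away
    l_c = l_min if (b - 0.5) * dV_c > 0 else l_max   # col winning iff moving away
    a = min(max(a + eta * l_r * dV_r, 0.0), 1.0)
    b = min(max(b + eta * l_c * dV_c, 0.0), 1.0)
print(a, b)                              # approaches (0.5, 0.5)
```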


SLIDE 26

Why not utilise Singh et al.’s result on empirical frequencies?

  • Theorem (Singh, Kearns and Mansour, 2000). If players follow IGA, where η → 0, then their strategies will converge to a Nash equilibrium. If not, then at least their average payoffs will converge to the expected payoffs of a Nash equilibrium.

  • Idea: use the average payoffs to correct the gradient.
  • So the gradient points slightly more in the direction of the average payoffs.
  • At least, this works empirically in NetLogo :-) One possible reading is sketched below.
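The slide only states the idea, so the following is a loudly hypothetical sketch of one possible reading, reusing gradient() from the Slide 6 sketch: blend the IGA gradient with a small pull towards the running average strategy (whose payoffs, by the theorem, converge to Nash payoffs). The blending weight mix is an invented parameter:

```python
def corrected_step(alpha, beta, avg_alpha, avg_beta, eta=0.01, mix=0.1):
    """One gradient step, nudged towards the historical average strategy (hypothetical)."""
    g = gradient(alpha, beta)
    pull = np.array([avg_alpha - alpha, avg_beta - beta])
    d = (1.0 - mix) * g + mix * pull       # mostly gradient, slightly corrected
    return (min(max(alpha + eta * d[0], 0.0), 1.0),
            min(max(beta  + eta * d[1], 0.0), 1.0))
```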


SLIDE 27

Literature

  • Original work on gradient dynamics in general-sum games:

    Singh, Kearns, and Mansour (2000). “Nash convergence of gradient dynamics in general-sum games”. In: Proc. of the Sixteenth Conf. on Uncertainty in Artificial Intelligence, pp. 541-548.

  • Today’s presentation was mainly based on this conference publication:

    Bowling and Veloso (2001). “Convergence of Gradient Dynamics with a Variable Learning Rate”. In: Proc. of the Eighteenth Int. Conf. on Machine Learning (ICML), pp. 27-34, June 2001.

  • The conference publication was elaborated (and accepted) as a journal article:

    Bowling and Veloso (2002). “Multiagent Learning Using a Variable Learning Rate”. In: Artificial Intelligence 136, pp. 215-250, 2002.


SLIDE 28

What next?

Bayesian play:

  • With fictitious play or gradient dynamics, the behaviour of opponents is modelled by a single mixed strategy.
  • With Bayesian play, opponents are modelled by a probability distribution over a (possibly binned) set of mixed strategies.
