SLIDE 1


Multi-agent learning

Multi-agent reinforcement learning

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Last modified on April 3rd, 2014 at 13:17.

SLIDE 2

Research questions

• 1. Are there differences between

  (a) Independent Learners (IL): agents that attempt to learn
      i. the values of single actions (single-action RL);

  (b) Joint Action Learners (JAL): agents that attempt to learn both
      i. the values of joint actions (multi-action RL), and
      ii. the behaviour employed by other agents (fictitious play)?

• 2. Are RL algorithms guaranteed to converge in multi-agent settings? If so, do they converge to equilibria? Are these equilibria optimal?

• 3. How are rates of convergence and limit points influenced by the system structure and action selection strategies?

Claus et al. address some of these questions in a limited setting, namely, a repeated cooperative two-player multiple-action game in strategic form.


SLIDE 3

Cited work

Claus and Boutilier (1998). “The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems”. In: Proc. of the Fifteenth National Conf. on Artificial Intelligence, pp. 746-752.

The paper on which this presentation is mostly based.

Watkins and Dayan (1992). “Q-learning”. Machine Learning, Vol. 8, pp. 279-292.

Mainly the result that Q-learning converges to the optimum action-values with probability one as long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

Fudenberg and Kreps (1993). “Learning Mixed Equilibria”. Games and Economic Behavior, Vol. 5, pp. 320-367.

Mainly Proposition 6.1 and its proof, pp. 342-344.

SLIDE 4

Q-learning

• The general version of Q-learning is multi-state and amounts to continuously updating the various Q(s, a) with

  r(s, a, s′) + γ · max_a Q(s′, a)    (1)

• In the present setting, there is only one state (namely, the stage game G), so that (1) reduces to r(s, a, s), which may be abbreviated to r(a) or even r.

• Single-state reinforcement learning rule:

  Qnew(a) = (1 − λ) · Qold(a) + λ · r

• Two sufficient conditions for convergence in Q-learning (Watkins and Dayan, 1992):

  1. Parameter λ decreases through time such that ∑t λ is divergent and ∑t λ² is convergent.
  2. All actions are sampled infinitely often.
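To make the single-state rule concrete, here is a minimal Python sketch (not from the slides; the two-action noisy reward function and the 1/n learning-rate schedule are illustrative assumptions that satisfy the two Watkins-Dayan conditions):

```python
import random

def single_state_q_learning(actions, reward, episodes=10000):
    """Single-state Q-learning: Q(a) <- (1 - lam) * Q(a) + lam * r."""
    Q = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(episodes):
        a = random.choice(actions)     # every action keeps being sampled (condition 2)
        counts[a] += 1
        lam = 1.0 / counts[a]          # sum of lam diverges, sum of lam^2 converges (condition 1)
        Q[a] = (1 - lam) * Q[a] + lam * reward(a)
    return Q

# Hypothetical noisy rewards, purely for illustration.
payoffs = {"T": 10.0, "B": 2.0}
Q = single_state_q_learning(["T", "B"], lambda a: payoffs[a] + random.gauss(0, 1))
print(Q)  # the Q-values approach the expected rewards 10 and 2
```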


SLIDE 5

Exploitive vs. non-exploitive exploration

• Convergence of Q-learning does not depend on the exploration strategy used. (It is just that all actions must be sampled infinitely often.)

Non-exploitive exploration: this is like what happens in the ε-part of ε-greedy learning.

Exploitive exploration: even during exploration, there is a probabilistic bias towards exploring optimal actions.

• Example. Boltzmann exploration (a.k.a. softmax, mixed logit, or quantal response function):

  Pr(a) = e^(Q(a)/T) / ∑_a′ e^(Q(a′)/T),  with T > 0.

  Letting T → 0 establishes convergence conditions (1) and (2) as mentioned above (Watkins and Dayan, 1992).
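A small sketch of Boltzmann (softmax) selection over current Q-values; the Q-table and the temperatures used in the demo loop are illustrative assumptions:

```python
import math
import random

def boltzmann_choice(Q, T):
    """Pick action a with probability proportional to exp(Q(a)/T), T > 0."""
    actions = list(Q)
    weights = [math.exp(Q[a] / T) for a in actions]
    return random.choices(actions, weights=weights)[0]

# As T decreases, the choice concentrates on the highest-valued action,
# while every action keeps a nonzero probability for any fixed T > 0.
Q = {"T": 10.0, "B": 2.0}
for T in (10.0, 1.0, 0.1):
    picks = [boltzmann_choice(Q, T) for _ in range(1000)]
    print(T, picks.count("T") / 1000)
```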


SLIDE 6

Independent Learning (IL)

• A MARL algorithm is an independent learner (IL) algorithm if the agents learn Q-values for their individual actions.

• Experiences for agent i take the form ⟨ai, r(ai)⟩, where ai is the action performed by i and r(ai) is the reward for action ai.

• Learning is based on

  Qnew(a) = (1 − λ) · Qold(a) + λ · r(a)

  ILs perform their actions, obtain a reward, and update their Q-values without regard to the actions performed by other agents.

• Typical conditions for Independent Learning:
  – An agent is unaware of the existence of other agents.
  – It cannot identify other agents' actions, or has no reason to believe that other agents are acting strategically.

  Of course, even if an agent can learn through joint actions, it may still choose to ignore information about the other agents' behaviour.
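A minimal sketch of two independent learners in a repeated stage game: each agent sees only its own action and the reward, never the other agent's action. The coordination-game payoffs and ε-greedy exploration are illustrative assumptions:

```python
import random

def play_repeated_game(payoff, rows, cols, rounds=5000, lam=0.1, eps=0.1):
    """Two independent learners: each keeps Q-values over its OWN actions only."""
    Q_row = {a: 0.0 for a in rows}
    Q_col = {b: 0.0 for b in cols}

    def eps_greedy(Q):
        return random.choice(list(Q)) if random.random() < eps else max(Q, key=Q.get)

    for _ in range(rounds):
        a, b = eps_greedy(Q_row), eps_greedy(Q_col)
        r = payoff[(a, b)]                          # common payoff (cooperative game)
        Q_row[a] = (1 - lam) * Q_row[a] + lam * r   # update ignores b
        Q_col[b] = (1 - lam) * Q_col[b] + lam * r   # update ignores a
    return Q_row, Q_col

# Illustrative coordination game: 10 for coordinating, 0 otherwise.
payoff = {("T", "L"): 10, ("T", "R"): 0, ("B", "L"): 0, ("B", "R"): 10}
print(play_repeated_game(payoff, ["T", "B"], ["L", "R"]))
```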


SLIDE 7

Joint-Action Learning (JAL)

• Joint Q-values are estimated rewards for joint actions. For a 2 × 2 game, an agent would have to maintain Q(T, L), Q(T, R), Q(B, L), and Q(B, R).

• Row can only influence T, B but not the opponent's actions L, R. Let ai be an action of player i. A complementary joint action profile is a set of actions a−i of the other agents such that a = ai ∪ a−i is a complete joint action profile.

• Opponents' actions can be estimated through a forecast, e.g. fictitious play:

  fi(a−i) =Def ∏_{j≠i} φj(aj),

  where φj(aj) is i's empirical distribution of j's actions (aj being j's component of a−i).

• The expected value of an individual action is the sum of joint Q-values, weighted by the estimated probability of the associated complementary joint action profiles:

  EV(ai) = ∑_{a−i ∈ A−i} Q(ai ∪ a−i) · fi(a−i)
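A minimal sketch of the JAL bookkeeping for the two-player case: joint Q-values, empirical (fictitious-play) beliefs about the opponent, and the expected value EV(ai) of each own action. The class name, smoothing of counts, and the example payoffs are illustrative assumptions:

```python
from collections import Counter

class JointActionLearner:
    """Two-player JAL: learns Q over joint actions plus beliefs over the opponent."""

    def __init__(self, my_actions, opp_actions, lam=0.1):
        self.lam = lam
        self.Q = {(a, b): 0.0 for a in my_actions for b in opp_actions}
        self.opp_counts = Counter({b: 1 for b in opp_actions})  # smoothed counts

    def update(self, a, b, reward):
        self.Q[(a, b)] = (1 - self.lam) * self.Q[(a, b)] + self.lam * reward
        self.opp_counts[b] += 1                     # fictitious-play statistics

    def belief(self, b):
        return self.opp_counts[b] / sum(self.opp_counts.values())

    def EV(self, a):
        # EV(a_i) = sum over a_-i of Q(a_i, a_-i) * f_i(a_-i)
        return sum(self.Q[(a, b)] * self.belief(b) for b in self.opp_counts)

# Illustrative use: after some observed play, rank own actions by expected value.
jal = JointActionLearner(["T", "B"], ["L", "R"])
jal.update("T", "L", 10)
jal.update("B", "R", 10)
print({a: jal.EV(a) for a in ["T", "B"]})
```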


SLIDE 8

Comparing Independent and Joint-Action Learners

Case 1: the coordination game

         L     R
   T    10     0
   B     0    10

• A JAL is able to distinguish Q-values of different joint actions a = ai ∪ a−i.

• However, its ability to use this information is circumscribed by the limited freedom of its own actions ai ∈ Ai.

• A JAL maintains beliefs fi(a−i) about the strategy being played by other agents through fictitious play, and plays a softmax best response. A JAL computes single Q-values by means of explicit belief distributions on joint Q-values. Thus,

  EV(ai) = ∑_{a−i ∈ A−i} Q(ai ∪ a−i) · fi(a−i)

  is more or less the same as the Q-values learned by ILs.

• Thus, even though a JAL may be fairly sure of the relative Q-values of its joint actions, it seems it cannot really benefit from this.


SLIDE 9

Figure 1: Convergence of coordination for ILs and JALs (averaged over 100 trials).



SLIDE 11

Case 2: Penalty game

The penalty game (rows T, C, B for Row; columns L, M, R for Column):

         L     M     R
   T    10     0     k
   C     0     2     0
   B     k     0    10

Suppose penalty k = −100. The following stories are entirely symmetrical for Row and Column.

IL
  1. Initially, Column explores.
  2. Therefore, Row will find T and B on average very unattractive, and will converge to C.
  3. Therefore, Column will find L and R slightly less attractive, and will converge to M as well.

JAL
  1. Initially, Column explores.
  2. Therefore, Row gives low EV to T and B, and plays C the most.
  3. Convergence to (C, M).
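As an illustration of the IL story, a self-contained sketch of two independent learners on this penalty matrix with k = −100; the exploration schedule and learning rate are illustrative assumptions, not from the slides:

```python
import random

# Penalty game payoffs with k = -100 (rows: T, C, B; columns: L, M, R).
k = -100
payoff = {("T", "L"): 10, ("T", "M"): 0, ("T", "R"): k,
          ("C", "L"): 0,  ("C", "M"): 2, ("C", "R"): 0,
          ("B", "L"): k,  ("B", "M"): 0, ("B", "R"): 10}

def eps_greedy(Q, eps):
    return random.choice(list(Q)) if random.random() < eps else max(Q, key=Q.get)

Q_row = {a: 0.0 for a in "TCB"}
Q_col = {b: 0.0 for b in "LMR"}
lam = 0.1
for t in range(20000):
    eps = max(0.01, 1.0 - t / 5000)          # illustrative decaying exploration
    a, b = eps_greedy(Q_row, eps), eps_greedy(Q_col, eps)
    r = payoff[(a, b)]
    Q_row[a] = (1 - lam) * Q_row[a] + lam * r
    Q_col[b] = (1 - lam) * Q_col[b] + lam * r

# With a large penalty, independent learners typically end up preferring (C, M).
print(max(Q_row, key=Q_row.get), max(Q_col, key=Q_col.get))
```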


SLIDE 12

Figure 2: Likelihood of convergence to the optimal equilibrium as a function of penalty k (100 trials).


SLIDE 13

Case 3: Climbing game

The climbing game (rows T, C, B; columns L, M, R):

         L     M     R
   T    11   −30     0
   C   −30     7     6
   B     0     0     5

[Show NetLogo assignments, Inl. Adaptieve Systemen.]

• Initially, the two learners are almost always going to begin by playing the non-equilibrium strategy profile (B, R).

• Once they settle at (B, R), and as long as exploration continues, Row will soon find C to be more attractive, so long as Col continues to primarily choose R.

• Once the non-equilibrium point (C, R) is attained, Col tracks Row's move and begins to perform action M. Once this equilibrium (C, M) is reached, the agents remain there.

• This phenomenon will obtain in general, allowing one to conclude that the multiagent Q-learning schemes we have proposed will converge to equilibria almost surely.


SLIDE 14

Figure 3: A's strategy in the climbing game.


SLIDE 15

Figure 4: B's strategy in the climbing game.


SLIDE 16

Figure 5: Joint actions in the climbing game.


SLIDE 17

Being asymptotically myopic

Recall from fictitious and Bayesian play the notion of a predictive strategy with forecast function fi : H → ∆(A−i) and behaviour rule gi : H → ∆(Ai).

• A forecast fi is said to be asymptotically empirical if it converges to the empirical frequencies of play with probability one.

• A behaviour rule gi is said to be asymptotically myopic if the loss from player i's choice of action at every history given gi^t goes to zero as t proceeds:

  ui(fi^t, gi^t) ↗ max{ ui(fi^t, ai) | ai ∈ Ai }  as t → ∞,

  where ui denotes expected payoff.

  – Being asymptotically myopic is less demanding than a behaviour rule that assigns positive probability only to pure strategies that eventually come close to maximising expected payoff.


SLIDE 18

Asymptotic empiricism and myopia imply convergence to Nash

Being asymptotically myopic includes:

• Strategies that incur a large loss with a small probability (regardless of opponents' play).
• Strategies that incur a small loss with a large probability (regardless).
• A combination of both.

• Definition. A joint action profile a is called stable if a is a limit point of joint behaviour with probability one.

• Proposition (Fudenberg and Kreps, 1993, p. 343). Let forecast fi be asymptotically empirical, and let behaviour rule gi be asymptotically myopic. Then every stable joint action profile is a Nash equilibrium.

Proof (outline). Suppose a is stable. Then the empirical frequencies eventually converge (!) to a. Because each fi is asymptotically empirical, the fi converge as well, for all i, with probability one. Convergence of the fi, together with asymptotic myopia of the gi, implies that the gi converge as well. This situation is in effect a Nash equilibrium.


SLIDE 19

Sufficient conditions for asymptotic behaviour

• 1. The learning rate λ decreases over time such that ∑t λ is divergent and ∑t λ² is convergent.
     – Required for convergence in Q-learning.

• 2. Each agent samples each of its actions infinitely often.
     – Required for convergence in Q-learning.

• 3. The probability P_t^i(a) of agent i choosing action a is nonzero.
     – Ensures (2), and ensures that agents explore with positive probability at all times.

• 4. Agents eventually become full exploiters with probability one:

     lim_{t→∞} P_t^i(Xt) = 0,

     where Xt is a random variable denoting the event that (fi, gi) prescribe a sub-optimal action.
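A minimal sketch of schedules compatible with conditions 1-4: a learning rate λ_t = 1/t (divergent sum, convergent sum of squares) and a Boltzmann temperature that decays slowly, so every action keeps a nonzero probability while the probability of sub-optimal play vanishes in the limit. The exact decay rates are illustrative assumptions:

```python
import math

def learning_rate(t):
    """lambda_t = 1/t: sum diverges, sum of squares converges (condition 1)."""
    return 1.0 / t

def temperature(t):
    """Slowly decaying Boltzmann temperature: every action keeps a nonzero
    probability (condition 3), yet exploitation becomes full in the limit
    (condition 4)."""
    return 5.0 / math.log(t + 2)

def boltzmann_probs(Q, t):
    T = temperature(t)
    weights = {a: math.exp(q / T) for a, q in Q.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Example: the probability of the sub-optimal action shrinks towards 0 over time.
Q = {"good": 10.0, "bad": 2.0}
for t in (10, 1000, 100000):
    print(t, boltzmann_probs(Q, t)["bad"])
```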


SLIDE 20

Myopic heuristics

Optimistic Boltzmann (OB): For agent i and action ai ∈ Ai, let MaxQ(ai) =Def max_{Π−i} Q(Π−i, ai). Choose actions with Boltzmann exploration (another exploitive strategy would suffice), using MaxQ(ai) as the value of ai.

Weighted OB (WOB): Explore using Boltzmann with factors MaxQ(ai) · Pr(optimal match Π−i for ai).

Combined: Let C(ai) = ρ · MaxQ(ai) + (1 − ρ) · EV(ai), for some 0 ≤ ρ ≤ 1. Choose actions using Boltzmann exploration with C(ai) as the value of ai.
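A minimal sketch of the combined heuristic C(ai) = ρ · MaxQ(ai) + (1 − ρ) · EV(ai) on top of joint Q-values and opponent beliefs; ρ, the belief distribution, and the Q-table are illustrative assumptions (ρ = 1 recovers OB, ρ = 0 recovers plain EV):

```python
import math
import random

def combined_values(Q, beliefs, rho):
    """C(a_i) = rho * MaxQ(a_i) + (1 - rho) * EV(a_i) for each own action a_i."""
    my_actions = {a for a, _ in Q}
    values = {}
    for a in my_actions:
        max_q = max(Q[(a, b)] for b in beliefs)                 # optimistic part
        ev = sum(Q[(a, b)] * p for b, p in beliefs.items())     # belief-weighted part
        values[a] = rho * max_q + (1 - rho) * ev
    return values

def boltzmann_choice(values, T=0.5):
    actions = list(values)
    weights = [math.exp(values[a] / T) for a in actions]
    return random.choices(actions, weights=weights)[0]

# Illustrative joint Q-values and beliefs in the 2x2 coordination game.
Q = {("T", "L"): 10, ("T", "R"): 0, ("B", "L"): 0, ("B", "R"): 10}
beliefs = {"L": 0.7, "R": 0.3}
print(combined_values(Q, beliefs, rho=0.5))
print(boltzmann_choice(combined_values(Q, beliefs, rho=0.5)))
```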


SLIDE 21

Figure 6: Sliding average reward in the penalty game.
