
Multi-agent learning: Multi-agent reinforcement learning
Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.


  1. Multi-agent learning: Multi-agent reinforcement learning
     Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.
     Slides last processed on Thursday 25th March 2010 at 17:32h.

  2. Research questions
     1. Are there differences between
        (a) Independent Learners (IL): agents that attempt to learn
            i. the values of single actions (single-action RL);
        (b) Joint Action Learners (JAL): agents that attempt to learn both
            i. the values of joint actions (multi-action RL), and
            ii. the behaviour employed by other agents (fictitious play).
     2. Are RL algorithms guaranteed to converge in multi-agent settings? If so, do they converge to equilibria? Are these equilibria optimal?
     3. How are rates of convergence and limit points influenced by the system structure and action-selection strategies?
     Claus et al. address some of these questions in a limited setting, namely a repeated cooperative two-player multiple-action game in strategic form.

  3. Cited work
     Claus and Boutilier (1998). "The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems". Proc. of the Fifteenth National Conf. on Artificial Intelligence, pp. 746-752. The paper on which this presentation is mostly based.
     Watkins and Dayan (1992). "Q-learning". Machine Learning, Vol. 8, pp. 279-292. Mainly the result that Q-learning converges to the optimum action-values with probability one, as long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
     Fudenberg, D. and D. Kreps (1993). "Learning Mixed Equilibria". Games and Economic Behavior, Vol. 5, pp. 320-367. Mainly Proposition 6.1 and its proof, pp. 342-344.

  4. Q-learning
     • Single-state reinforcement learning rule:
           Q_new(a) = (1 − λ) Q_old(a) + λ · r
     • The general version of Q-learning is multi-state and amounts to continuously updating the various Q(s, a) with
           r(s, a, s′) + γ · max_a Q(s′, a)          (1)
     • In the present setting there is only one state (namely, the stage game G), so that (1) reduces to r(s, a, s), which may be abbreviated to r(a) or even r.
     • Two sufficient conditions for convergence in Q-learning (Watkins and Dayan, 1992):
       1. The parameter λ decreases through time such that ∑_t λ_t is divergent and ∑_t λ_t² is convergent.
       2. All actions are sampled infinitely often.
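As an illustration of the single-state update rule above, here is a minimal Python sketch (not from the slides): a per-action learning rate λ_t = 1/t satisfies the two Watkins-Dayan conditions (∑_t λ_t diverges, ∑_t λ_t² converges), and every action keeps being sampled. The action set and reward function are hypothetical.

```python
import random

def decayed_q_learning(actions, reward, episodes=10000):
    """Single-state Q-learning: Q_new(a) = (1 - lam) * Q_old(a) + lam * r."""
    q = {a: 0.0 for a in actions}   # Q-value estimates
    n = {a: 0 for a in actions}     # per-action sample counts

    for _ in range(episodes):
        a = random.choice(actions)  # sample every action infinitely often
        n[a] += 1
        lam = 1.0 / n[a]            # lambda_t = 1/t: sum diverges, sum of squares converges
        q[a] = (1.0 - lam) * q[a] + lam * reward(a)
    return q

# Hypothetical one-state game with noisy rewards: action 'b' is better on average.
payoff = {"a": 1.0, "b": 3.0}
q = decayed_q_learning(["a", "b"], lambda a: payoff[a] + random.gauss(0, 0.5))
print(q)  # estimates approach the expected rewards, roughly {'a': 1.0, 'b': 3.0}
```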

  5. Exploitive vs. non-exploitive exploration
     Convergence of Q-learning does not depend on the exploration strategy used. (It is just that all actions must be sampled infinitely often.)
     Non-exploitive exploration. This is like what happens in the ε-part of ε-greedy learning.
     Exploitive exploration. Even during exploration, there is a probabilistic bias towards exploring optimal actions. Example: Boltzmann exploration (a.k.a. softmax, mixed logit, or quantal response function):
           Pr(a) = e^(Q(a)/T) / ∑_{a′} e^(Q(a′)/T),   with T > 0.
     Letting T → 0 establishes convergence conditions (1) and (2) as mentioned above (Watkins and Dayan, 1992).
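A minimal sketch of Boltzmann (softmax) action selection over current Q-values. The Q-values and temperatures below are illustrative; subtracting the maximum Q-value is only a standard numerical-stability trick, not something prescribed by the slides.

```python
import math
import random

def boltzmann_action(q_values, temperature):
    """Pick an action with probability proportional to exp(Q(a) / T), T > 0."""
    actions = list(q_values)
    # Subtracting the max Q-value leaves the distribution unchanged but avoids overflow.
    m = max(q_values.values())
    weights = [math.exp((q_values[a] - m) / temperature) for a in actions]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(actions, weights=probs, k=1)[0]

q = {"T": 8.0, "B": 2.0}
print(boltzmann_action(q, temperature=5.0))   # high T: close to uniform exploration
print(boltzmann_action(q, temperature=0.1))   # low T: almost always the greedy action "T"
```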

  6. Independent Learning (IL)
     • A MARL algorithm is an independent learner (IL) algorithm if the agents learn Q-values for their individual actions.
     • Experiences for agent i take the form ⟨a_i, r(a_i)⟩, where a_i is the action performed by i and r(a_i) is the reward for action a_i.
     • Learning is based on
           Q_new(a) = (1 − λ) Q_old(a) + λ · r(a)
       ILs perform their actions, obtain a reward and update their Q-values without regard to the actions performed by other agents.
     • Typical conditions for Independent Learning:
       – An agent is unaware of the existence of other agents.
       – It cannot identify other agents' actions, or has no reason to believe that other agents are acting strategically.
     • Of course, even if an agent can learn through joint actions, it may still choose to ignore information about the other agents' behaviour.
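The sketch below (hypothetical, not from the paper) shows two independent learners playing the repeated coordination game used later in the slides: each agent keeps Q-values only for its own actions and applies Q_new(a) = (1 − λ) Q_old(a) + λ · r(a), ignoring what the other agent did. The fixed λ and the ε-greedy exploration are arbitrary choices for the sketch.

```python
import random

class IndependentLearner:
    """Learns Q-values for its own actions only, with a fixed learning rate."""
    def __init__(self, actions, lam=0.1, epsilon=0.1):
        self.q = {a: 0.0 for a in actions}
        self.lam, self.epsilon = lam, epsilon

    def act(self):
        if random.random() < self.epsilon:           # explore
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)           # exploit current estimates

    def update(self, action, reward):
        self.q[action] = (1 - self.lam) * self.q[action] + self.lam * reward

# Coordination game from the slides: common payoff 10 on (T,L) and (B,R), 0 otherwise.
payoff = {("T", "L"): 10, ("T", "R"): 0, ("B", "L"): 0, ("B", "R"): 10}
row, col = IndependentLearner(["T", "B"]), IndependentLearner(["L", "R"])
for _ in range(2000):
    a_row, a_col = row.act(), col.act()
    r = payoff[(a_row, a_col)]   # both agents receive the common reward
    row.update(a_row, r)         # each agent ignores the other's action
    col.update(a_col, r)
print(row.q, col.q)
```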

  7. Joint-Action Learning (JAL)
     • Joint Q-values are estimated rewards for joint actions. For a 2 × 2 game an agent would have to maintain Q(T, L), Q(T, R), Q(B, L), and Q(B, R).
     • Row can only influence T, B but not the opponent's actions L, R.
     • Let a_i be an action of player i. A complementary joint action profile is a set of joint actions a_{−i} such that a = a_i ∪ a_{−i} is a complete joint action profile.
     • Opponent actions can be estimated through a forecast, e.g., fictitious play:
           f_i(a_{−i}) =def ∏_{j ≠ i} φ_j(a_{−i})
       where φ_j(a_{−i}) is i's empirical distribution of j's actions in a_{−i}.
     • The expected value of an individual action is the sum of joint Q-values, weighted by the estimated probability of the associated complementary joint action profiles:
           EV(a_i) = ∑_{a_{−i} ∈ A_{−i}} Q(a_i ∪ a_{−i}) f_i(a_{−i})
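A minimal sketch of the JAL bookkeeping for a two-player game: joint Q-values are stored per joint action, the opponent's observed actions give the empirical (fictitious-play) distribution f_i(a_{−i}), and EV(a_i) is the belief-weighted sum of joint Q-values. The variable names and example numbers are illustrative.

```python
from collections import Counter

def expected_values(joint_q, opponent_counts):
    """EV(a_i) = sum over a_-i of Q(a_i, a_-i) * f_i(a_-i), with f_i the empirical frequency."""
    total = sum(opponent_counts.values())
    beliefs = {a: c / total for a, c in opponent_counts.items()}   # fictitious-play beliefs
    ev = {}
    for (a_i, a_other), q in joint_q.items():
        ev[a_i] = ev.get(a_i, 0.0) + q * beliefs.get(a_other, 0.0)
    return ev

# Joint Q-values of the coordination game, assumed already learned.
joint_q = {("T", "L"): 10.0, ("T", "R"): 0.0, ("B", "L"): 0.0, ("B", "R"): 10.0}
# Row has observed Column play L 7 times and R 3 times.
ev = expected_values(joint_q, Counter({"L": 7, "R": 3}))
print(ev)  # {'T': 7.0, 'B': 3.0}: Row's best response is T
```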

  8. Comparing Independent and Joint-Action Learners
     Case 1: the coordination game

              L    R
        T  [ 10    0 ]
        B  [  0   10 ]

     • A JAL is able to distinguish the Q-values of different joint actions a = a_i ∪ a_{−i}.
     • However, its ability to use this information is circumscribed by the limited freedom of its own actions a_i ∈ A_i.
     • A JAL maintains beliefs f_i(a_{−i}) about the strategy being played by the other agents through fictitious play, and plays a softmax best response.
     • A JAL computes singular Q-values by means of explicit belief distributions on joint Q-values. Thus,
           EV(a_i) = ∑_{a_{−i} ∈ A_{−i}} Q(a_i ∪ a_{−i}) f_i(a_{−i})
       is more or less the same as the Q-values learned by ILs.
     • Thus, even though a JAL may be fairly sure of the relative Q-values of its joint actions, it seems it cannot really benefit from this.
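To make the last point concrete, a small illustrative computation (not from the paper): as long as Row believes Column mixes uniformly, the belief-weighted EVs of T and B in the coordination game are both 5, which is exactly the tie an IL would face under the same play, so the joint Q-values give Row no extra leverage.

```python
joint_q = {("T", "L"): 10.0, ("T", "R"): 0.0, ("B", "L"): 0.0, ("B", "R"): 10.0}
beliefs = {"L": 0.5, "R": 0.5}   # Row's fictitious-play estimate of a uniformly mixing Column

for a_row in ("T", "B"):
    ev = sum(joint_q[(a_row, a_col)] * p for a_col, p in beliefs.items())
    print(a_row, ev)   # T 5.0 and B 5.0: the same tie an IL would face
```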

  9. [Figure 1: Convergence of coordination for ILs and JALs (averaged over 100 trials).]

  10. (This slide repeats slide 8: Comparing Independent and Joint-Action Learners, Case 1: the coordination game.)

  11. Case 2: the penalty game

              L    M    R
        T  [ 10    0    k ]
        C  [  0    2    0 ]
        B  [  k    0   10 ]

      Suppose penalty k = −100. The following stories are entirely symmetrical for Row and Column.

      IL  1. Initially, Column explores.
          2. Therefore, Row will find T and B on average very unattractive, and will converge to C.
          3. Therefore, Column will find T and B slightly less attractive, and will converge to C as well.

      JAL 1. Initially, Column explores.
          2. Therefore, Row gives low EV to T and B, and plays C the most.
          3. Convergence to (C, M).
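A small illustrative computation (hypothetical, not from the paper) of the expected values Row faces in the penalty game while Column still explores uniformly: with k = −100, T and B look disastrous on average and C is clearly safest, which is the convergence-to-C story told on the slide.

```python
k = -100
# Penalty game payoffs from Row's perspective: rows T, C, B against columns L, M, R.
payoff = {
    ("T", "L"): 10, ("T", "M"): 0, ("T", "R"): k,
    ("C", "L"): 0,  ("C", "M"): 2, ("C", "R"): 0,
    ("B", "L"): k,  ("B", "M"): 0, ("B", "R"): 10,
}
beliefs = {"L": 1/3, "M": 1/3, "R": 1/3}   # Column explores uniformly at the start

for a_row in ("T", "C", "B"):
    ev = sum(payoff[(a_row, a_col)] * p for a_col, p in beliefs.items())
    print(a_row, round(ev, 2))   # T -30.0, C 0.67, B -30.0: C dominates while Column explores
```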

  12. [Figure 2: Likelihood of convergence to the optimal equilibrium as a function of penalty k (100 trials).]
