Multi-agent learning: Methodology of MAL research
Erik Berbee & Bas van Gijzel


SLIDE 1

Multi-agent learning

Methodology of MAL research

Erik Berbee & Bas van Gijzel, Master Students AT, Utrecht University

SLIDE 2

Overview

Today we will talk about...

  • Formal setting
  • Characteristics of multi-agent learning
  • Classes of techniques
  • Types of results
  • Agendas and criticism
  • Loose ends and questions


SLIDE 3

The Problem

  • No unified goals/agendas

– No unified formal setting


SLIDE 4

Formal setting: Stochastic games

  • Represented as a tuple (N, S, A, R, T)

    – N is the set of agents
    – S is the set of n-agent stage games
    – A = A1, ..., An, with Ai the set of actions (pure strategies) of agent i
    – R = R1, ..., Rn, with Ri : S × A → ℝ the reward function of agent i
    – T : S × A → Π(S), a stochastic transition function
    – Restricted versions: repeated game, MDP
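As a rough illustration, the tuple can be carried around as a small data structure. The sketch below is ours, not from the slides; all names (StochasticGame, reward, transition) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
JointAction = Tuple[int, ...]          # one action index per agent

@dataclass
class StochasticGame:
    n_agents: int                      # N, taken here as {0, ..., n_agents - 1}
    states: List[State]                # S, the stage games
    actions: List[List[int]]           # actions[i] = Ai, the pure strategies of agent i
    # Ri: reward(i, s, a) is agent i's reward for joint action a in state s
    reward: Callable[[int, State, JointAction], float]
    # T: transition(s, a) is a probability distribution over successor states
    transition: Callable[[State, JointAction], Dict[State, float]]

# Restrictions: a repeated game is a StochasticGame with a single state that
# always transitions to itself; an MDP is the single-agent case (n_agents == 1).
```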


SLIDE 5

Sidetrack: Replicator Dynamics

  • Represented as a tuple (A, P0, R)
  • A: the set of possible pure strategies/actions for the agents, indexed 1, ..., m
  • P0: the initial distribution of agents across possible strategies, with ∑_{i=1}^{m} P0(i) = 1
  • R : A × A → ℝ, the immediate reward function for each agent
  • Each Pt(a) is adjusted according to how the reward of a compares to the population-average reward
  • Can be seen as a repeated game between two agents playing the same mixed strategy
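A discrete-time version of the Pt update can be sketched as follows. This is our own minimal illustration, assuming a positive reward matrix (needed so the normalization is well defined); the Rock-Paper-Scissors payoffs are an arbitrary example.

```python
import numpy as np

def replicator_step(p, R):
    """One discrete-time replicator update of the strategy distribution p.

    p: current distribution P_t over the m pure strategies (sums to 1)
    R: m x m reward matrix with positive entries, R[i, j] = reward of i against j
    """
    fitness = R @ p            # expected reward of each pure strategy vs. the population
    avg = p @ fitness          # population-average reward
    return p * fitness / avg   # above-average strategies grow, below-average ones shrink

# Rock-Paper-Scissors with positive payoffs (win = 2, tie = 1, loss = 0):
R = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0]])
p = np.array([0.5, 0.3, 0.2])
for _ in range(100):
    p = replicator_step(p, R)
```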


SLIDE 6

Formal setting: Available Information

What information does an agent have?

  • Play is fully observable
  • Game is known
  • Opponent's strategy is not known a priori


SLIDE 7

Sidetrack: Consequences of Restrictions on Information

  • fi(z) maps each state z to a probability distribution over i's actions next period
  • fi(z) is uncoupled if it does not depend on opponents' payoffs

Theorem 3. Given a finite action space A and positive integer s, there exist no uncoupled rules fi(z) whose state variable z is the last s plays, such that, for every game G on A, the period-by-period behaviors converge almost surely to a Nash equilibrium of G, or even to an ε-equilibrium of G, for all sufficiently small ε > 0.

* H.P. Young (2007): The possible and the impossible in multi-agent learning. In: Artificial Intelligence 171, pp. 429-433, 2007.


SLIDE 8

Characteristics of multi-agent learning

  • Learning and Teaching

– Teaching assumes learning

  • Equilibrium play is not always best. Example: a Stackelberg game

                Left      Right
    Up         (1, 0)    (3, 2)
    Down       (2, 1)    (4, 0)

  • An agent can either learn the opponent's strategy, or learn a strategy that does well without learning the opponent's strategy.
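To see the teaching point concretely: Down strictly dominates Up for the row player, so myopic play converges to the stage-game equilibrium (Down, Left) with payoffs (2, 1), yet a row player who stubbornly "teaches" Up induces Right and earns 3. A tiny sketch (our own encoding, assuming the column player learns to best-respond to a fixed row action):

```python
# Row payoffs u1 and column payoffs u2 (rows: Up = 0, Down = 1; cols: Left = 0, Right = 1)
u1 = [[1, 3], [2, 4]]
u2 = [[0, 2], [1, 0]]

def col_best_response(r):
    # a learning column player ends up best-responding to the row player's fixed action
    return max((0, 1), key=lambda c: u2[r][c])

for r, name in [(0, "Up"), (1, "Down")]:
    c = col_best_response(r)
    print(f"teach {name}: column learns to play {['Left', 'Right'][c]}, row earns {u1[r][c]}")
# teach Up: column learns to play Right, row earns 3
# teach Down: column learns to play Left, row earns 2
```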


SLIDE 9

Model-based Learning

  • Learn the opponent's strategy and play a best response to it
  • General scheme (a skeleton follows below):
    1. Start with some model of the opponent's strategy.
    2. Compute and play the best response.
    3. Observe the opponent's play and update your model of her strategy.
    4. Go to step 2.
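The four steps could be phrased as the following skeleton; every argument here (init_model, best_response, update, observe) is a placeholder to be supplied by a concrete scheme such as fictitious play on the next slide.

```python
def model_based_play(init_model, best_response, update, observe, rounds=1000):
    """Generic model-based learning loop; the four callables are placeholders."""
    model = init_model                     # 1. start with some model of the opponent
    for _ in range(rounds):
        action = best_response(model)      # 2. compute and play the best response
        opp_action = observe(action)       # 3. observe the opponent's play ...
        model = update(model, opp_action)  #    ... and update the model
    return model                           # 4. the loop is the "go to step 2"
```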


SLIDE 10

Model-based Learning: Fictitious Play

  • The model is a count of the opponent's past plays
  • The model after (R, S, P, R, P) is (R = 0.4, P = 0.4, S = 0.2)
  • Other examples are: smooth fictitious play, exponential fictitious play, rational learning
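A minimal sketch of the counting model and the induced best response, assuming a payoff table indexed as payoff[(mine, theirs)]; the function names are ours.

```python
from collections import Counter

def empirical_model(history):
    """Fictitious play's opponent model: normalized counts of past plays."""
    counts = Counter(history)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}

print(empirical_model(["R", "S", "P", "R", "P"]))
# {'R': 0.4, 'S': 0.2, 'P': 0.4}

def best_response(model, payoff):
    """Maximize expected payoff against the empirical model."""
    my_actions = {mine for mine, _ in payoff}
    return max(my_actions,
               key=lambda a: sum(p * payoff[(a, o)] for o, p in model.items()))
```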


SLIDE 11

Model-free Learning

  • Learns how well the agent's own actions do
  • Most are based on the Bellman equation
  • Basic algorithm (value iteration):
    – Initial value function V0 : S → ℝ
    – Vk+1(s) ← R(s) + γ max_a ∑_{s′} T(s, a, s′) Vk(s′)
    – Optimal policy: in each s, select the a that maximizes ∑_{s′} T(s, a, s′) Vk(s′)

  • Q-learning: computes an optimal policy with unknown reward and transition functions
  • MAL variants: minimax-Q (zero-sum games), joint-action learners and Friend-or-Foe Q (team games), Nash-Q and CE-Q (general-sum games)
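For reference, single-agent tabular Q-learning in a few lines. The env interface (reset, step, actions) is an assumption of this sketch, not anything from the slides.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning: needs only sampled (s, a, r, s') transitions,
    not the reward or transition model. `env` is assumed to provide
    reset() -> s, step(a) -> (s2, r, done), and a list env.actions."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            # move Q(s, a) toward the sampled Bellman target
            target = r + gamma * max(Q[(s2, x)] for x in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```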


SLIDE 12

Regret Minimization: No-Regret Learning

  • No-regret learning
  • Regret of action aj for agent i after t rounds: r_i^t(a_j, s_i | s_{−i}) = ∑_{k=1}^{t} [R(a_j, s_{−i}^k) − R(s_i^k, s_{−i}^k)]
  • If its regret is positive, the agent selects each of its actions with probability proportional to that regret
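A sketch of that selection rule (regret matching, in our own naming), with cum_regret mapping each action to its cumulative regret so far:

```python
def regret_matching_policy(cum_regret):
    """Play each action with probability proportional to its positive
    cumulative regret; fall back to uniform if no regret is positive."""
    pos = {a: max(r, 0.0) for a, r in cum_regret.items()}
    total = sum(pos.values())
    if total == 0:
        return {a: 1.0 / len(cum_regret) for a in cum_regret}
    return {a: r / total for a, r in pos.items()}

def update_regrets(cum_regret, payoff, my_action, opp_action):
    """One round's contribution R(a, s_-i) - R(s_i, s_-i) for every action a."""
    realized = payoff[(my_action, opp_action)]
    for a in cum_regret:
        cum_regret[a] += payoff[(a, opp_action)] - realized
```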


SLIDE 13

Types of results from Learning Algorithms

  1. Convergence of the strategy profile to an (e.g., Nash) equilibrium of the stage game in self-play (that is, when all agents adopt the learning procedure under consideration).
  2. Successful learning of an opponent's strategy (or opponents' strategies).
  3. Obtaining payoffs that exceed a specified threshold:
     – Safe: at least the minimax payoff
     – Consistent: at least as well as the best response to the empirical distribution of play


SLIDE 14

Discussion

  1. Convergence in self-play to equilibrium play of the stage game:
     – Is a Nash equilibrium of the stage game useful?
     – Convergence of play vs. convergence of payoff
     – Is convergence in self-play necessary?
  2. Most work assumes 2-player, 2-action games; why?
  3. Obtaining payoffs that exceed a specified threshold (safe/consistent):
     – Does this exclude teaching?


SLIDE 15

Agendas of MAL

Shoham et al. try to make a classification of the agendas in multi-agent learning. What are possible purposes of the current (and possibly future) research done in MAL?


SLIDE 16

Introducing the Agenda (Shoham et al.)

  • Computational
  • Descriptive
  • Normative
  • Prescriptive - cooperative
  • Prescriptive - non-cooperative


SLIDE 17

Computational Agenda

  • Learning algorithms as an iterative way to compute certain properties, such as Nash equilibria, for a certain class of games:
    – Fictitious play calculates Nash equilibria for zero-sum games.
    – Replicator dynamics calculates Nash equilibria for symmetric games.
  • Quick and dirty


SLIDE 18

Computational Agenda: Addenda by Sandholm

  • Computing properties of games by using direct algorithms.
  • Quick-and-dirty MAL algorithms as a last resort, when there is no good direct algorithm available.
  • (MAL algorithms can be easier to program, though.)

* T. Sandholm (2007): Perspectives on multiagent learning. In: Artificial Intelligence 171, pp. 382-391, 2007.


SLIDE 19

Descriptive Agenda

  • MAL as a description of social and economic behaviour.
  • Formal models of learning possibly correspond to people's behaviours.
  • Can be extended to the modelling of large populations.
  • The descriptive agenda corresponds to (most) usage of MAL in the social sciences.

SLIDE 20

Descriptive Agenda: Addenda by Sandholm

Problem: Humans might not have the required rationality to act according to game-theoretic equilibrium. But this is exactly what the descriptive agenda wants to model!


SLIDE 21

Descriptive Agenda: Addenda by Sandholm (2)

Solution:

  • Find "human" MAL techniques/learning algorithms that lead to equilibrium play.
  • Give a clear definition of what constitutes a MAL algorithm. (He does not get to a clear definition himself.)


SLIDE 22

Descriptive Agenda: Addenda by Sandholm (2)

Solution:

  • Find "human" MAL techniques/learning algorithms that lead to equilibrium play.
  • Give a clear definition of what constitutes a MAL algorithm. (He does not get to a clear definition himself.)

Conclusion by Sandholm: more work to do!


SLIDE 23

Descriptive Agenda: More criticism

More authors, especially from the social sciences, criticize game theory's descriptive adequacy for human populations.


SLIDE 24

Descriptive Agenda: More criticism(2)

  • Empirical results gained from experiments (on human populations) contradict intuitions gained from game theory:
    – People do not play the Prisoner's Dilemma according to game theory (Guala). Again, this can be criticized (also by Guala himself).
    – Backward induction is not always supported by results from evolutionary game theory (for example Mailath).
  • Complete rationality of humans (or even bounded rationality) is not an obvious fact.
  • Is the common knowledge of rationality (CKR) assumption a correct one?

* F. Guala (2006): Has Game Theory Been Refuted? In: Journal of Philosophy 103, pp. 239-263, 2006.
* G. Mailath (1998): Do People Play Nash Equilibrium? Lessons from Evolutionary Game Theory. In: Journal of Economic Literature 36, pp. 1347-1374, 1998.


SLIDE 25

Normative Agenda

"Normative for lack of a better term".

  • Determines sets of learning rules that are in
equilib rium with each other.

– Q-Learning and fictitious play if appropriately initialised. – A lot of learning algorithms when doing self-play.

  • Most research is originated in AI.


SLIDE 26

Normative Agenda: Addenda by Sandholm

Sandholm considers the normative agenda as a special case of the non-cooperative prescriptive agenda. More on this later.


SLIDE 27

Prescriptive Agenda

  • Prescribes how an agent should learn.
  • Can be split into two different types:
    – Cooperative
    – Non-cooperative


SLIDE 28

Prescriptive Agenda - cooperative

  • Multiple agents playing the same game in a team setting.
  • One setting is distributed control of a dynamic system:
    – Naturally modelled as a stochastic game with common payoffs
    – Can be seen as a distributed system with a common purpose
    – The distributed agents all run the same (learning) algorithm
  • Close to distributed computing.

This can actually be argued not to be a learning setting, because Shoham et al. assume a full-information stochastic game. (See next slide.)


SLIDE 29

Prescriptive Agenda - cooperative: Addenda by Sandholm

First the criticism:

  • A team setting is not really a learning setting. If the full game is known, agents can compute the optimal policy and apply it.
  • The setting is not necessarily even multi-agent:
    – Incentives of agents are aligned and payoffs are identical.
    – Considering that all agents run the same (learning) algorithm, why not use one agent?


SLIDE 30

Prescriptive Agenda - cooperative: Addenda by Sandholm

Now the constructive part:

  • Learning:
    – Instead, consider unknown games (with unknown payoffs), thus requiring agents to learn the game and to learn to play the game.
  • Multi-agent:
    – A problem might be considered multi-agent because it is inherently distributed (e.g., robot soccer).
    – More agents can give fault-tolerance to a system.


SLIDE 31

Prescriptive Agenda - non-cooperative

  • Convergence to an equilibrium is not a goal in itself.
  • The environment is determined by the classes of opponent agents (with possibly different strategies).
  • Agents should obtain "high reward" payoffs in a non-cooperative (stochastic game) setting.


SLIDE 32

Prescriptive Agenda - non-cooperative (2)

Multiple demands are possible on the learning algorithms or the rewards gained:

  • Learning algorithms should:
    – converge to a stationary policy
    – converge to a best response if the opponent is stationary
  • Possible categories of high reward:
    – a small amount of regret
    – payoffs within a certain ε of the optimal payoff possible against an opponent
    – payoffs within a certain ε of the security level (minimax)
    – performing "well" in self-play


SLIDE 33

Prescriptive Agenda - non-cooperative: Addenda by Sandholm

  • Convergence to an equilibrium in a non-cooperative setting is just one of the possible goals. This special case of the non-cooperative prescriptive agenda replaces the normative agenda. (No further motivation is given.)
  • Suggestions for properties of learning algorithms, which are actually similar to those suggested in Shoham et al.
  • AWESOME: Adapt When Everybody is Stationary, Otherwise Move to Equilibrium.


SLIDE 34

Discussion (if time permits)

  • Game theory is almost immediately assumed to be the tool to use for MAL. Are there any good reasons to use, or not to use, game theory?
  • Are all agendas really necessary, or should some of them be considered inappropriate for MAL?
  • Are the "high rewards" defined by Shoham et al. good approximations?
  • Do you miss a category? Consider for example modelling and design by Gordon (also homework).

* G. Gordon (2007): Agendas for multi-agent learning. In: Artificial Intelligence 171, 2007.


SLIDE 35

Questions?
