

SLIDE 1

Multi-agent learning

Emergence of Conventions

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Last modified on March 29th, 2011.

SLIDE 2

Motivation

SLIDE 3

Simple example of a Markov process

  • Return probabilities are usually omitted in diagrams.
  • In this case it can be derived that, on average, P(Sun) = 6/7 and P(Rain) = 1/7.
  • How? We’ll see . . .
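
The transition probabilities live in the (not reproduced) diagram; a minimal sketch that assumes P(Rain | Sun) = 0.1 and P(Sun | Rain) = 0.6, one choice that yields the 6 : 1 ratio, and recovers the stationary distribution numerically:

```python
import numpy as np

# Assumed transition matrix (rows = current state, columns = next state).
# States: 0 = Sun, 1 = Rain.  These numbers are NOT from the slide; they are
# one choice that reproduces the 6/7 vs 1/7 stationary distribution.
P = np.array([[0.9, 0.1],
              [0.6, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalised.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()

print(dict(zip(["Sun", "Rain"], pi)))   # approximately {'Sun': 0.857, 'Rain': 0.143}
```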

SLIDE 4

Plan for today

  • 1. Markov processes. (Ergodic process, communicating states/class, transient state/class, recurrent state/class, periodic state/class, absorbing state, irreducible process, stationary distribution.) Compute stationary distributions:
    • Solve n linear equations.
    • Compare n so-called z-trees (Freidlin and Wentzell, 1984).
  • 2. Perturbed Markov processes. (Regular perturbed Markov process, punctuated equilibrium, stochastically stable state.) Compute stochastically stable states:
    • Compare k so-called z-trees, where k is the number of so-called recurrent classes (Peyton Young, 1993).

SLIDE 5

Plan for today

  • 3. Applications.
    • Emergence of a currency standard.
    • Competing technologies: operating system A vs. operating system B.
    • Competing technologies: cell phone company A vs. cell phone company B. (If time allows.)
    • Schelling’s model of segregation (1969).

SLIDE 6

Part 1: Markov processes

SLIDE 7

State transitions

SLIDE 8

Communication classes

SLIDE 9

Start state matters

SLIDE 10

Start state matters. . . but here it does not

SLIDE 11

The stationary distribution (and computing one)

P(A) = P(A|A′)P(A′) + P(A|B′)P(B′) + P(A|C′)P(C′) + P(A|D′)P(D′)

Let us assume that visiting probabilities are stationary (A = A′, B = B′, . . . ):

    P(A) = P(A|A)P(A) + P(A|B)P(B) + P(A|C)P(C) + P(A|D)P(D)
         = 0 · P(A) + 0 · P(B) + 1 · P(C) + 0 · P(D)
         = P(C)

Let us write this as A = C. Similarly, B = 0.8A, C = D, and D = 0.2A + B. Four equations with four unknowns. (Always regular, i.e. Det ≠ 0?)
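
With the transition matrix read off from these four equations and the normalisation P(A) + P(B) + P(C) + P(D) = 1 added, the system can be solved directly; a minimal sketch:

```python
import numpy as np

# Transition matrix reconstructed from the equations on the slide
# (rows = current state A, B, C, D; columns = next state A, B, C, D).
P = np.array([
    [0.0, 0.8, 0.0, 0.2],   # A -> B (0.8), A -> D (0.2)
    [0.0, 0.0, 0.0, 1.0],   # B -> D
    [1.0, 0.0, 0.0, 0.0],   # C -> A
    [0.0, 0.0, 1.0, 0.0],   # D -> C
])

# Solve pi (P - I) = 0 together with sum(pi) = 1 as one overdetermined system.
n = P.shape[0]
A_sys = np.vstack([P.T - np.eye(n), np.ones(n)])
b_sys = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A_sys, b_sys, rcond=None)

print(dict(zip("ABCD", pi.round(4))))   # A = C = D = 5/19, B = 4/19
```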

SLIDE 12

Theory of discrete Markov processes

Definitions:

  • Stationary distribution: fixed point of the transition probabilities.
  • Empirical distribution: long-run normalised frequency of visits.
  • Limit distribution: long-run probability to visit a node.
  • Process is path-dependent: empirical distribution depends on start state. Ergodic otherwise.
  • Class is recurrent: process cannot escape. Transient otherwise.
  • Process is irreducible: all states can reach each other.

Facts:

  • Node is recurrent: process will return to it a.s.
  • If finite number of states:
    – At least one recurrence class.
    – If precisely one recurrence class then ergodic, and conversely.
  • Stationary distribution always exists. Unique iff ergodic. In that case, stationary distr. ≡ empirical distr.
  • If ergodic and a-periodic, then stationary distr. ≡ limit distr.
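
A small sketch of the last fact, using a two-state period-2 chain (an assumed example, not from the slides): the stationary distribution exists, but the limit distribution does not.

```python
import numpy as np

# Period-2 chain: the process alternates deterministically between two states.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

pi = np.array([0.5, 0.5])          # stationary: pi @ P == pi
print(pi @ P)                      # [0.5 0.5]

# But the limit distribution does not exist: starting in state 0,
# the t-step distribution keeps oscillating instead of converging.
d = np.array([1.0, 0.0])
for t in range(4):
    d = d @ P
    print(t + 1, d)                # (0,1) -> (1,0) -> (0,1) -> ...
```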

SLIDE 13

Finding stationary distributions with many states is difficult

  • Solve n equations in n unknowns. What if S is large? For example:

        0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
        0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
        0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
        0.0 0.1 0.1 0.2 0.0 0.1 0.0 0.3 0.0 0.2
        0.5 0.2 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.2
        0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
        0.0 0.1 0.1 0.2 0.0 0.1 0.0 0.3 0.0 0.2
        0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
        0.3 0.1 0.2 0.0 0.1 0.0 0.0 0.0 0.3 0.0
        0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2

  • Freidlin & Wentzell (1984): only look at so-called state trees.
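
Before turning to state trees: numerically, one can also avoid the n × n linear system with plain power iteration, a different (purely numerical) route than the tree construction named above. A minimal sketch using the 10 × 10 matrix above:

```python
import numpy as np

P = np.array([
    [0.1, 0.2, 0.0, 0.1, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],
    [0.1, 0.2, 0.0, 0.1, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],
    [0.1, 0.2, 0.0, 0.1, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],
    [0.0, 0.1, 0.1, 0.2, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],
    [0.5, 0.2, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2],
    [0.1, 0.2, 0.0, 0.1, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],
    [0.0, 0.1, 0.1, 0.2, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],
    [0.1, 0.2, 0.0, 0.1, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],
    [0.3, 0.1, 0.2, 0.0, 0.1, 0.0, 0.0, 0.0, 0.3, 0.0],
    [0.1, 0.2, 0.0, 0.1, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],
])

# Power iteration: push a distribution through P until it stabilises.
pi = np.full(P.shape[0], 1.0 / P.shape[0])
for _ in range(1000):
    new = pi @ P
    if np.allclose(new, pi, atol=1e-12):
        break
    pi = new

print(pi.round(4))   # converges here because this chain has a single aperiodic recurrent class
```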

SLIDE 14

An irreducible (and finite) Markov process

SLIDE 15

One possible A-tree

SLIDE 16

Another possible A-tree

SLIDE 17

A perhaps easier way to compute the stationary distribution

  • An s-tree, Ts, is a complete collection of disjoint paths from states ≠ s to s.
  • The likelihood of an s-tree Ts, written ℓ(Ts), =Def the product of its edge probabilities.
  • The likelihood of a state s, written ℓ(s), =Def the sum of the likelihoods of all s-trees.

Theorem (Freidlin & Wentzell, 1984). Let P be an irreducible finite Markov process. Then, for all states, the likelihood of that state is proportional to the stationary probability of that state.
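
A brute-force reading of the theorem (exponential in the number of states, so toy sizes only; the Sun/Rain numbers are the assumed ones from the sketch on slide 3):

```python
import itertools
import numpy as np

def tree_likelihoods(P):
    """Freidlin & Wentzell / Markov chain tree theorem, brute force:
    for every root s, sum the products of edge probabilities over all
    spanning in-trees directed towards s."""
    n = len(P)
    likelihood = np.zeros(n)
    for root in range(n):
        others = [v for v in range(n) if v != root]
        # every non-root state gets exactly one outgoing edge ...
        for succ in itertools.product(range(n), repeat=len(others)):
            edge = dict(zip(others, succ))
            # ... and the choice is an s-tree iff every state reaches the root
            def reaches(v):
                seen = set()
                while v != root:
                    if v in seen:
                        return False
                    seen.add(v)
                    v = edge[v]
                return True
            if all(reaches(v) for v in others):
                likelihood[root] += np.prod([P[v][edge[v]] for v in others])
    return likelihood

# Example: the assumed Sun/Rain chain from the earlier sketch.
P = np.array([[0.9, 0.1],
              [0.6, 0.4]])
ell = tree_likelihoods(P)
print(ell / ell.sum())     # proportional to the stationary distribution: [6/7, 1/7]
```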

SLIDE 18

Counting s-trees with Freidlin & Wentzell: example

Freidlin & Wentzell (1984):

    µ(s) = v(s) / ∑_{t∈S} v(t),   where   v(t) =Def ∑_{T∈T_t} ℓ(T)

The unique C-tree is coloured red. Computing ℓ(T_C) = 10ǫ · 1/4 · . . . = 5ǫ³/12. Similarly:

    State:         A       B      C       D       E      F     G
    Distribution:  ǫ²/24   5ǫ³/9  5ǫ³/12  5ǫ²/24  ǫ²/24  ǫ/48  ǫ/32

Note what happens if ǫ → 0.

SLIDE 19

Part 2: Perturbed Markov processes

SLIDE 20

Motivation

SLIDE 21

Most Markov processes are path-dependent (non-ergodic)

SLIDE 22

Make them ergodic by perturbing with ǫ^{r(s,s′)} here and there

SLIDE 23

Compute s-trees from P0-recurrent classes only (!)

SLIDE 24

Compute s-trees from P0-recurrent classes only (!)

SLIDE 25

Class {B, D, E} possesses lowest stochastic potential, viz. 4.

SLIDE 26

Example of P0 and Pǫ

lim_{ǫ→0} Pǫ = P0:

    Pǫ:
        0.0   0.2          0.2   0.1   0.5
        0.3   ǫ⁷           0.1   0.1   0.5 − ǫ⁷
        0.1   0.2          0.2   0.0   0.5
        0.7   0.1          0.2   0.0   0.0
        0.1   0.2 − ǫ²/2   0.2   ǫ²    0.5 − ǫ²/2
        0.0   0.0          0.1   0.0   0.9

    P0:
        0.0   0.2   0.2   0.1   0.5
        0.3   0.0   0.1   0.1   0.5
        0.1   0.2   0.2   0.0   0.5
        0.7   0.1   0.2   0.0   0.0
        0.1   0.2   0.2   0.0   0.5
        0.0   0.0   0.1   0.0   0.9

  • Notice that some P0-positive probabilities “have to give way” when P0-zero probabilities are perturbed with ǫ. (Because row probabilities must add up to 1.)

SLIDE 27

Perturbed Markov processes

  • P0 is a Markov process on a finite state space S.
  • Let, for each ǫ ∈ (0, ǫ∗], Pǫ be a Markov process on the same state space.
  • The collection {Pǫ | ǫ ∈ (0, ǫ∗]} is a regular perturbation of P0 if
    1. Each Pǫ is ergodic.
    2. It holds that lim_{ǫ→0} Pǫ = P0.
    3. If Pǫ_{s,s′} > 0 for some ǫ > 0, then 0 < lim_{ǫ→0} Pǫ_{s,s′} / ǫ^{r(s,s′)} < ∞ for some r(s, s′) ≥ 0. This number is called the resistance from s to s′.
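
In practice the resistance of an entry can be read off as the order of ǫ at which it vanishes; a minimal numerical sketch (the entries are taken from the example Pǫ on the previous slide):

```python
import numpy as np

def resistance(p_eps, eps=1e-4):
    """Estimate r(s, s') as the exponent with which p_eps(eps) vanishes:
    r = lim log p_eps(eps) / log eps, and 0 for entries that stay positive."""
    value = p_eps(eps)
    if value <= 0.0:
        return float("inf")            # transition impossible for every eps
    return round(np.log(value) / np.log(eps))

# Entries taken from the example P-epsilon above.
print(resistance(lambda e: 0.3))             # 0  (P0-positive entry)
print(resistance(lambda e: e**7))            # 7
print(resistance(lambda e: e**2))            # 2
print(resistance(lambda e: 0.2 - e**2 / 2))  # 0
```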

SLIDE 28

Resistance

  1. Each Pǫ is ergodic.
  2. It holds that lim_{ǫ→0} Pǫ = P0.
  3. If Pǫ_{s,s′} > 0 for some ǫ > 0, then 0 < lim_{ǫ→0} Pǫ_{s,s′} / ǫ^{r(s,s′)} < ∞ for some r(s, s′) ≥ 0.
  4. For transitions s → s′ where P0_{s,s′} = Pǫ_{s,s′} = 0 the resistance is defined to be ∞.

Note:

  • The number r(s, s′) is well-defined!
  • If P0_{s,s′} > 0 then r(s, s′) = 0.
  • If r(s, s′) = 0 then P0_{s,s′} > 0.

SLIDE 29

Stochastic stability

  • Because each Pǫ is ergodic, the stationary distribution µǫ is uniquely defined, for every ǫ ∈ (0, ǫ∗].
  • It can be shown that lim_{ǫ→0} µǫ(s) exists for every s. Let us call this distribution µ0.
  • A state s is said to be stochastically stable if µ0(s) > 0.

Remarks:

  • It can be shown that µ0 is a stationary distribution of P0.
  • It follows that every regular perturbed Markov process possesses at least one stochastically stable state.

SLIDE 30

A way to compute stochastically stable states

  • Recurrent classes 1, . . . , K.
  • The resistance of a path from i to j =Def the sum of edge resistances. (Why the sum?)
  • Construct edges r_ij (between classes) with the minimum resistance from i to j.
  • The resistance of a j-tree Tj, written r(Tj), =Def the sum of edge resistances (in the class graph).
  • The stochastic potential of recurrence class j, written p(j), =Def the minimum resistance over all j-trees. (A brute-force sketch follows after the theorem below.)

Theorem (Young, 1993). Let {Pǫ | ǫ ∈ (0, ǫ∗]} be a regular perturbed Markov process, and let µǫ be the unique stationary distribution of Pǫ, ǫ > 0. Then

  • lim_{ǫ→0} µǫ = µ0 exists.
  • µ0 is a stationary distribution of P0.
  • The stochastically stable states are precisely those that are contained in the recurrent class(es) of P0 with minimum stochastic potential.
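
A minimal brute-force sketch of this recipe; the two-class resistance matrix uses the values 1 and 2 that appear later in the “Revisit earlier example” slide:

```python
import itertools

def stochastic_potentials(r):
    """Brute force over all j-trees on the class graph (Young, 1993):
    the potential of class j is the minimum, over in-trees directed
    towards j, of the summed edge resistances."""
    K = len(r)
    potential = [float("inf")] * K
    for root in range(K):
        others = [i for i in range(K) if i != root]
        for succ in itertools.product(range(K), repeat=len(others)):
            edge = dict(zip(others, succ))
            def reaches(i):
                seen = set()
                while i != root:
                    if i in seen:
                        return False
                    seen.add(i)
                    i = edge[i]
                return True
            if all(reaches(i) for i in others):
                cost = sum(r[i][edge[i]] for i in others)
                potential[root] = min(potential[root], cost)
    return potential

# Two recurrent classes with minimum resistances r(E1 -> E2) = 1 and r(E2 -> E1) = 2.
r = [[0, 1],
     [2, 0]]
print(stochastic_potentials(r))   # [2, 1]: E2 has minimum stochastic potential
```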

SLIDE 31

Minimum path resistance: example

  • Compute path resistance between all K recurrent classes.
  • With K recurrent classes there are always K(K − 1) minimum path resistances to be computed. (We work on K_K [unfortunate notation].) See the sketch below.

Example:

  • Suppose there are three recurrent classes E1, E2, and E3.
  • Minimum path resistances here are 1, 5, 6, 7, 8, 9.
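
Minimum path resistances are ordinary shortest paths once edge weights are resistances; a minimal Floyd-Warshall sketch (the 3 × 3 edge-resistance matrix is illustrative, not the one behind the slide’s numbers):

```python
def min_path_resistances(r):
    """Floyd-Warshall: minimum total resistance of a path between every pair."""
    n = len(r)
    d = [row[:] for row in r]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# Assumed single-edge resistances (inf = no direct perturbed transition).
INF = float("inf")
r = [
    [0,   2,   INF],
    [1,   0,   3],
    [INF, 1,   0],
]
for row in min_path_resistances(r):
    print(row)
```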

SLIDE 32

Nine j-trees generated by three recurrence classes

SLIDE 33

Revisit earlier example

  1. The unperturbed Markov process P0 possesses two recurrent classes, viz. E1 = {A} and E2 = {F, G}.
  2. Least resistance from E1 to E2 is 10ǫ · . . . = ǫ¹/32. Resistance 1.
  3. Least resistance from E2 to E1 is 1/3 · ǫ · . . . = ǫ²/24. Resistance 2.
  4. There is only one resistance tree to either side, hence one minimum resistance tree.
  5. Stochastic potential of E1 is 2; stochastic potential of E2 is 1.
  6. Conclusion: E2 is stochastically stable, E1 is not.

SLIDE 34

Part 3: Applications

SLIDE 35

Technology adoption

                              Other: Operating system A    Operating system B
    You: Operating system A          (a, a)                      (0, 0)
         Operating system B          (0, 0)                      (b, b)

Total number of players: n, for example n = 5
Sample size: s, for example s = 3
Total number of players currently playing A: m, for example m = 2

    P( individual chooses A | AABBB ) = 3 / (5 choose 3) = 3/10
    P( #A′s = k | AABBB ) = (5 choose k) · (3/10)^k · (7/10)^(5−k)

This process is path-dependent (non-ergodic): for example always BABBB, BABBB, etc. → BBBBB. With b ≫ a even BAABB, etc. → BBBBB.
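
A minimal simulation sketch of this revision process; it assumes that every player revises each round, samples s = 3 of the n = 5 current choices (whole population as pool), best-responds, and that a = b. The function name and revision scheme are illustrative.

```python
import random

def revise(profile, s=3, a=1.0, b=1.0):
    """All players revise at once: each samples s current choices and
    best-responds (choose A if a * #A in sample >= b * #B in sample)."""
    new = []
    for _ in profile:
        sample = random.sample(profile, s)
        n_a = sample.count("A")
        new.append("A" if a * n_a >= b * (s - n_a) else "B")
    return "".join(new)

random.seed(1)
profile = "AABBB"
for _ in range(10_000):
    if profile in ("AAAAA", "BBBBB"):
        break
    profile = revise(profile)
print(profile)   # locks into one convention; which one depends on the sample path
```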

SLIDE 36

Idiosyncratic play in technology adoption

“How, then, might institutional change occur? Because best-response play renders both conventions absorbing states, it is clear that in order to understand institutional change, some kind of nonbest-response play must be introduced. Suppose there is a probability ǫ that when individuals are in the process of updating, each may switch their type for idiosyncratic reasons. Thus, 1 − ǫ represents the probability that the individual pursues the best-response updating process described above. The idiosyncratic play accounting for nonbest responses need not be irrational or odd; it simply represents actions whose reasons are not explicitly modeled. Included is experimentation, whim, error, and intentional acts seeking to affect game outcomes but whose motivations are not captured by the above game.”

From Microeconomics: Behavior, Institutions, and Evolution (Bowles, 2003).

SLIDE 37

The tipping effect

Total number of players: n, for example n = 5
Sample size: s, for example s = 3
Total number of players currently playing A: m, for example m = 2

Suppose a = b. Let

    E1 = { s ∈ S | s ∼ AAAAA } = {AAAAA}
    T1 = { s ∈ S | s ∼ AAAAB } = {AAAAB, AAABA, . . . , BAAAA}
    T2 = { s ∈ S | s ∼ AAABB } = {AAABB, AABAB, . . . , BBBAA}
    T3 = { s ∈ S | s ∼ ABBBB }
    E2 = { s ∈ S | s ∼ BBBBB }

  • How many idiosyncratic transitions must be made to move from E1 to E2?
  • What is the resistance from E1 to E2? From E2 to E1?
  • What is (are) the stochastically stable state(s)?

SLIDE 38

Tipping point (general case)

  • Suppose we’re in all-B.
  • Generally, an individual will choose A when a·k ≥ b(s − k), i.e. when k ≥ bs/(a + b). (Why “≥” instead of “>”?)
  • Thus, ⌈bs/(a + b)⌉ idiosyncratic choices (ok, errors) must be made to move from BBBBB . . . into the first transient class that, without further idiosyncrasies, leads to AAAAA (see the numeric sketch below).
  • With probability ǫ of idiosyncratic choice, the probability of this happening is (ǫ/2)^⌈bs/(a+b)⌉. Indeed ǫ/2, if we assume that idiosyncrasy is uniformly distributed among A and B: in that case, half of the idiosyncratic choices are counter-productive again!
  • With this payoff matrix, the Pareto-optimal outcome is favoured, provided s is large enough.
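
A small numeric sketch of the threshold (the payoff values a, b and the sample size s are illustrative):

```python
from math import ceil

def tipping_threshold(a, b, s):
    """Idiosyncratic choices needed before a best-responder leaves all-B for A:
    the smallest k with a*k >= b*(s - k), i.e. ceil(b*s / (a + b))."""
    return ceil(b * s / (a + b))

a, b, s = 2.0, 1.0, 9                        # illustrative payoffs: A is Pareto-superior
escape_all_B = tipping_threshold(a, b, s)    # errors needed to tip all-B towards A
escape_all_A = tipping_threshold(b, a, s)    # errors needed to tip all-A towards B
print(escape_all_B, escape_all_A)            # 3 6: leaving all-A takes more errors,
                                             # so the all-A convention is stochastically stable
```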

SLIDE 39

Part 4: Schelling’s model of segregation

SLIDE 40

Schelling’s model in 2D (torus)

SLIDE 41

Schelling’s model in 1D (circle)

  • Schelling (1969, 1971, 1978).
  • Isolated people are discontent. (Other people are content.)
  • Possible swaps:

        Trade      Profit
        DD → CC      2
        DC → CC      1
        CD → DC      0
        CC → CD     −1
        CC → DD     −2

  • This “problem” can be “solved” in “hundreds” of ways. (Analytically, stochastically, whatever.)

SLIDE 42

Young’s take on Schelling’s model

  • Possible trades:

        Trade      Profit      Probability
        DD → CC    2 − 2m      high
        DC → CC    1 − 2m      high
        CD → DC    0 − 2m      low: ǫ^a
        CC → CD   −1 − 2m      lower: ǫ^b
        CC → DD   −2 − 2m      lowest: ǫ^c

    where 0 < a < b < c, and m are the moving costs.

  • The resulting Markov process is ergodic and regular.

SLIDE 43

Recurrent classes are just {{a} | a ∈ Absorbing }

  1. To determine all recurrent classes of P0:
     • All absorbing states A are recurrent.
     • If not in an absorbing state, then a mutually advantageous swap is possible. Thus, if not in an absorbing state, then transient state. Therefore, all and only recurrent classes are singletons of absorbing states: R = {{a} | a ∈ A}.

SLIDE 44

Completely segregated vs. dispersed states

  • Absorbing states are either completely segregated or dispersed: A = S ∪ D.
  • For each s, s′ ∈ A, let r(s, s′) be defined as usual.

Claims:

  1. If s ∈ D, there does not exist an s-tree from A\{s} with only a-edges.
  2. If s ∈ S, there does exist an s-tree from A\{s} with only a-edges.
  3. The classes with lowest potential are L = {{s} | s ∈ S}.

SLIDE 45

Claim 1

If s ∈ D, there does not exist an s-tree from A\{s} with only a-edges.

SLIDE 46

Claim 2: a resistance

If s ∈ S, there does exist an s-tree from A\{s} with only a-edges.

SLIDE 47

Claim 2: a resistance, discontent individual (no problem)

If s ∈ S, there does exist an s-tree from A\{s} with only a-edges.

SLIDE 48

Absorbing state ≠ state with low potential

Claim 1: If s ∈ D, there does not exist an s-tree from A\{s} with only a-edges.

  • Let s ∈ D. We must show that some edges from A to s have resistance > a.
  • Well, edges from S to s, at least, necessarily involve moves that create at least one discontent (= isolated) individual.
  • Therefore, all j-trees from A to D have resistance b > a or c > a.

Claim 2: If s ∈ S, there does exist an s-tree from A\{s} with only a-edges.

  • Let s ∈ S. We must show that all edges from A to s have resistance a.
    i) From elements in S to other elements in S: ok! Put head to tail repeatedly.
    ii) From elements in D to elements in S: ok! Put head to tail of small groups repeatedly. If one large cluster, continue as in i).
