Signaling Games and the Emergence of Linguistic Meaning
PENG 2012/2013


Slide 1: Introduction

Signaling Games, Replicator Dynamics, Reinforcement Learning

Slide 2

LANGUAGE AS CONVENTION?

"A name is a spoken sound significant by convention... I say 'by convention' because no name is a name naturally but only when it has become a symbol." (Aristotle, De Interpretatione)

"[L]anguages [are] gradually establish'd by human conventions without any explicit promise. In like manner do gold and silver become the common measures of exchange, and are esteem'd sufficient payment for what is of a hundred times their value." (Hume, A Treatise of Human Nature)

Slide 3

LANGUAGE AS CONVENTION?

"[W]e can hardly suppose a parliament of hitherto speechless elders meeting together and agreeing to call a cow a cow and a wolf a wolf." (Russell, The Analysis of Mind)

"Conventions are like fires: under favourable conditions, a sufficient concentration of heat spreads and perpetuates itself. The nature of the fire does not depend on the original source of heat. Matches may be the best fire starters, but that is no reason to think of fires started otherwise as any the less fires." (Lewis, Convention)

Slide 4

COORDINATION & SIGNALING

Figure: the lantern coordination problem. States tL, tS; actions aL, aS; payoffs:

        aL   aS
  tL     1    0
  tS     0    1

Messages: one or two lanterns?

Sender strategies (T → M):
  s1: tL → m1, tS → m2
  s2: tL → m2, tS → m1
  s3: tL → m1, tS → m1
  s4: tL → m2, tS → m2

Receiver strategies (M → A):
  r1: m1 → aL, m2 → aS
  r2: m1 → aS, m2 → aL
  r3: m1 → aL, m2 → aL
  r4: m1 → aS, m2 → aS
Slide 5

◮ a signaling game is a tuple SG = ⟨{S, R}, T, Pr, M, A, U⟩
◮ the Lewis game is defined by:
  ◮ T = {tL, tS}
  ◮ M = {m1, m2}
  ◮ A = {aL, aS}
  ◮ Pr(tL) = Pr(tS) = .5
  ◮ U(ti, aj) = 1 if i = j, else 0

Figure: extensive-form tree of the Lewis game: Nature (N) draws tL or tS with probability .5 each, the sender (S) chooses m1 or m2, and the receiver (R) chooses aL or aS.
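The components of the Lewis game above can be written down directly; a minimal Python sketch (identifier names are my own, not from the slides):

```python
# Minimal encoding of the 2x2 Lewis game defined above.
T = ["tL", "tS"]             # states
M = ["m1", "m2"]             # messages
A = ["aL", "aS"]             # actions
Pr = {"tL": 0.5, "tS": 0.5}  # uniform prior over states

def U(t, a):
    """U(t_i, a_j) = 1 if i = j, else 0."""
    return 1 if T.index(t) == A.index(a) else 0
```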

Slide 6

PURE STRATEGIES

Pure strategies are contingency plans according to which players act.

◮ sender strategy: s : T → M ◮ receiver strategy: r : M → A

s1: tL → m1, tS → m2
s2: tL → m2, tS → m1
s3: tL → m1, tS → m1
s4: tL → m2, tS → m2

r1: m1 → aL, m2 → aS
r2: m1 → aS, m2 → aL
r3: m1 → aL, m2 → aL
r4: m1 → aS, m2 → aS
Slide 7

SIGNALING SYSTEMS

◮ signaling systems are combinations of pure strategies; the Lewis game has two: L1 = ⟨s1, r1⟩ and L2 = ⟨s2, r2⟩

  L1: tL → m1 → aL, tS → m2 → aS
  L2: tL → m2 → aL, tS → m1 → aS

◮ signaling systems are strict Nash equilibria of the EU-table:

        r1   r2   r3   r4
  s1     1    0   .5   .5
  s2     0    1   .5   .5
  s3    .5   .5   .5   .5
  s4    .5   .5   .5   .5

◮ in signaling systems, messages associate states and actions uniquely
◮ signaling systems constitute evolutionarily stable states
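The claim that exactly the two signaling systems are strict Nash equilibria can be checked by brute force. A sketch, with states, messages, and actions encoded as indices 0 and 1 (the encoding is my own):

```python
from itertools import product

# Enumerate all pure strategies of the 2x2 Lewis game, build the
# expected-utility table, and collect its strict Nash equilibria.

STATES = [0, 1]

# s[t] is the message sent in state t; r[m] the action taken on message m
senders = list(product([0, 1], repeat=2))
receivers = list(product([0, 1], repeat=2))

def eu(s, r, pr=(0.5, 0.5)):
    """EU(s, r) = sum_t Pr(t) * U(t, r(s(t))), U = 1 iff action == state."""
    return sum(p * (1 if r[s[t]] == t else 0) for t, p in zip(STATES, pr))

table = {(s, r): eu(s, r) for s in senders for r in receivers}

def is_strict_nash(s, r):
    # every unilateral deviation must do strictly worse
    no_better_s = all(table[(s2, r)] < table[(s, r)] for s2 in senders if s2 != s)
    no_better_r = all(table[(s, r2)] < table[(s, r)] for r2 in receivers if r2 != r)
    return no_better_s and no_better_r

signaling_systems = [(s, r) for s in senders for r in receivers
                     if is_strict_nash(s, r)]
```

The two strategy profiles found correspond to ⟨s1, r1⟩ and ⟨s2, r2⟩ above.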

Slide 8

SIGNALING CONVENTION

"Given the definition of signaling systems, we can define a signaling convention as any convention whereby members of a population P who are involved as communicators or audience in a certain signaling problem S do their parts of a certain signaling system ⟨Fc, Fa⟩ by acting according to their respective contingency plans. If such a convention exists, we also call ⟨Fc, Fa⟩ a conventional signaling system." (Lewis, Convention)

Slide 9

ASYMMETRIC STATIC SIGNALING GAME

Given a signaling game SG = ⟨{S, R}, T, Pr, M, A, U′⟩ as initially defined, the corresponding asymmetric static signaling game SSGa = ⟨{S, R}, S, R, U⟩ is defined as follows:

◮ S is the sender, R is the receiver
◮ S = {s | s ∈ [T → M]} is the set of the sender's strategies
◮ R = {r | r ∈ [M → A]} is the set of the receiver's strategies
◮ U : S × R → ℝ is the utility function, defined as

  U(s, r) = Σt Pr(t) · U′(t, s(t), r(s(t)))

An SSGa is asymmetric because sender and receiver have different strategy sets.
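The expected-utility formula can be made concrete for the 2×2 Lewis game. A sketch with strategies represented as dicts (names are illustrative):

```python
# Sketch of U(s, r) = sum_t Pr(t) * U'(t, s(t), r(s(t)))
# for the 2x2 Lewis game.

Pr = {"tL": 0.5, "tS": 0.5}

def u_prime(t, m, a):
    # U' ignores the message here: 1 iff the action matches the state
    return 1 if (t, a) in (("tL", "aL"), ("tS", "aS")) else 0

def U(s, r):
    """Expected utility of a sender/receiver pure-strategy pair."""
    return sum(Pr[t] * u_prime(t, s[t], r[s[t]]) for t in Pr)

s1 = {"tL": "m1", "tS": "m2"}   # sender strategy s1
r1 = {"m1": "aL", "m2": "aS"}   # receiver strategy r1
r2 = {"m1": "aS", "m2": "aL"}   # receiver strategy r2
```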

Slide 10

REPLICATOR DYNAMICS

Given a very large (effectively infinite) population of agents playing a symmetric static game ⟨{P1, P2}, S, U : S × S → ℝ⟩ randomly against each other, we can define:

◮ p(si): the proportion of agents in the population playing strategy si
◮ U(si) = Σsj∈S p(sj) U(si, sj): the expected utility for agents playing si
◮ Ū = Σsi∈S p(si) U(si): the average fitness of the whole population

Replicator Dynamics
The RD is defined by the following differential equation:

  dp(si)/dt = p(si) [U(si) − Ū]
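The differential equation can be simulated with simple Euler steps. A sketch for a symmetric 2-strategy game; the coordination payoff matrix, start point, and step size are illustrative choices, not taken from the slides:

```python
# Euler discretization of dp(s_i)/dt = p(s_i) * (U(s_i) - U_bar).

def replicator_step(p, U, dt=0.1):
    """One Euler step of the replicator dynamics."""
    n = len(p)
    fitness = [sum(p[j] * U[i][j] for j in range(n)) for i in range(n)]
    avg = sum(p[i] * fitness[i] for i in range(n))
    return [p[i] + dt * p[i] * (fitness[i] - avg) for i in range(n)]

U = [[1, 0],   # pure coordination: matching strategies earn 1
     [0, 1]]
p = [0.6, 0.4]
for _ in range(1000):
    p = replicator_step(p, U)
```

Starting from p = (.6, .4), the majority strategy takes over the population; note that each Euler step preserves Σ p(si) = 1, since the flow terms sum to zero.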

Slide 11

REPLICATOR DYNAMICS FOR ASYMMETRIC GAMES?

"In an evolutionary setting, we can either model a situation where senders and receivers belong to different populations or model the case where individuals of the same population at different times assume the role of sender and receiver." (Skyrms, Evolution of the Social Contract)

◮ the replicator dynamics is defined for symmetric static games
◮ there are two possible ways to apply the replicator dynamics to a signaling game:
  1. use a 'two-population' model (sender population & receiver population)
  2. symmetrize the asymmetric static signaling game into a symmetric static signaling game

Slide 12

RESULT FOR A ’TWO-POPULATION’ MODEL

Figure: phase plot of the two-population replicator dynamics over p(S2) and p(R2); the points labelled ⟨S1, R1⟩ and ⟨S2, R2⟩ mark the two signaling systems.

Slide 13

SYMMETRIC STATIC SIGNALING GAME

Given an asymmetric static signaling game SSGa = ⟨{S, R}, S, R, U′⟩ as defined before, the corresponding symmetric static signaling game SSGs = ⟨{S, R}, L, U⟩ is defined as follows:

◮ S is the sender, R is the receiver
◮ L = {Lij | Lij = (si, rj), si ∈ S, rj ∈ R} is the set of languages
◮ U : L × L → ℝ is the utility function over languages, defined as

  U(Lij, Lkl) = ½ (U′(si, rl) + U′(sk, rj))

        r1    r2    r3    r4
  s1    L1    L12   L13   L14
  s2    L21   L2    L23   L24
  s3    L31   L32   L3    L34
  s4    L41   L42   L43   L4

Slide 14

Table: the full 16 × 16 expected-utility table U(Lij, Lkl) over all sixteen languages (L1, L12, ..., L4); all entries lie in {0, .25, .5, .75, 1}.
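Rather than reading off the 16 × 16 table, it can be regenerated from the symmetrization formula. A sketch (the index encoding is my own):

```python
from itertools import product

# Regenerate the symmetrized utility table for the Lewis game.
# A language L_ij pairs sender strategy s_i with receiver strategy r_j;
# U(L_ij, L_kl) = 0.5 * (U'(s_i, r_l) + U'(s_k, r_j)).

senders = list(product([0, 1], repeat=2))    # s: state index -> message index
receivers = list(product([0, 1], repeat=2))  # r: message index -> action index

def u_prime(s, r):
    # expected utility in the asymmetric game: uniform Pr, success iff a == t
    return sum(0.5 * (1 if r[s[t]] == t else 0) for t in (0, 1))

languages = list(product(senders, receivers))

def u_sym(lang1, lang2):
    (s_i, r_j), (s_k, r_l) = lang1, lang2
    return 0.5 * (u_prime(s_i, r_l) + u_prime(s_k, r_j))

table = {(p, q): u_sym(p, q) for p in languages for q in languages}
```

On the diagonal, U(Lij, Lij) = U′(si, rj), so only the two signaling systems L1 and L2 score 1 against themselves.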
Slide 15

RESULT FOR A ’ONE-POPULATION’ MODEL

Slide 16

BEHAVIORAL STRATEGIES

Behavioral strategies are functions that map choice points to probability distributions over the actions available at that choice point.

◮ behavioral sender strategy: σ : T → Δ(M)
◮ behavioral receiver strategy: ρ : M → Δ(A)

Example:

  σ: t1 → [m1 : .9, m2 : .1], t2 → [m1 : .5, m2 : .5]
  ρ: m1 → [a1 : .33, a2 : .67], m2 → [a1 : 1, a2 : 0]
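The example strategies can be written as explicit probability tables and sampled from. A small sketch (the helper name is mine):

```python
import random

# The behavioral strategies from the example above as probability tables.
sigma = {"t1": {"m1": 0.9, "m2": 0.1},    # sender: state -> Delta(M)
         "t2": {"m1": 0.5, "m2": 0.5}}
rho = {"m1": {"a1": 0.33, "a2": 0.67},    # receiver: message -> Delta(A)
       "m2": {"a1": 1.0, "a2": 0.0}}

def sample(dist, rng):
    """Draw one outcome from a probability table."""
    outcomes, weights = zip(*dist.items())
    return rng.choices(outcomes, weights=weights)[0]

rng = random.Random(7)
m = sample(sigma["t1"], rng)   # sender observes t1 and sends a message
a = sample(rho[m], rng)        # receiver reacts to that message
```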

Slide 17

REINFORCEMENT LEARNING

Reinforcement learning via Pólya urns:

◮ the sender has an urn ℧t for each t ∈ T, filled with balls of a type m ∈ M
◮ the receiver has an urn ℧m for each m ∈ M, filled with balls of a type a ∈ A

  σ(m|t) = m(℧t) / |℧t|        ρ(a|m) = a(℧m) / |℧m|

where m(℧t) is the number of balls of type m in ℧t and |℧t| is the total number of balls in ℧t. After each round, successful communication is reinforced: 10 balls of the appropriate type are added and 4 balls of the other types are removed.
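A sketch of the urn dynamics. The +10/−4 update follows the slide; flooring counts at 1 (so every option stays drawable), the initial urn contents, and all names are my assumptions:

```python
import random

STATES, MESSAGES, ACTIONS = ["tL", "tS"], ["m1", "m2"], ["aL", "aS"]

def make_urns(keys, ball_types, init=10):
    """One urn per key, initially `init` balls of every type (assumed)."""
    return {k: {b: init for b in ball_types} for k in keys}

def draw(urn, rng):
    """Draw a ball type with probability proportional to its count."""
    balls, counts = zip(*urn.items())
    return rng.choices(balls, weights=counts)[0]

def reinforce(urn, chosen, reward=10, penalty=4):
    """Add 10 balls of the chosen type; remove 4 of each other type,
    floored at 1 (the floor is my assumption)."""
    for b in urn:
        if b == chosen:
            urn[b] += reward
        else:
            urn[b] = max(1, urn[b] - penalty)

def play_round(sender_urns, receiver_urns, rng):
    t = rng.choice(STATES)
    m = draw(sender_urns[t], rng)
    a = draw(receiver_urns[m], rng)
    if STATES.index(t) == ACTIONS.index(a):   # successful communication
        reinforce(sender_urns[t], m)
        reinforce(receiver_urns[m], a)

rng = random.Random(1)
sender_urns = make_urns(STATES, MESSAGES)
receiver_urns = make_urns(MESSAGES, ACTIONS)
for _ in range(2000):
    play_round(sender_urns, receiver_urns, rng)
```

Over many rounds the draw probabilities σ(m|t) = m(℧t)/|℧t| typically concentrate on one of the two signaling systems, though which one emerges depends on the random history.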
Slide 18

REINFORCEMENT LEARNING

Figure: sender S with one urn (℧) per state (ts, tg) containing message balls (m1, m2); receiver R with one urn per message containing action balls (as, ag).

◮ the sender has an urn for each state t ∈ T
◮ each urn contains balls for each message m ∈ M
◮ the sender decides by drawing from urn ℧t
◮ the receiver has an urn for each message m ∈ M
◮ each urn contains balls for each action a ∈ A
◮ the receiver decides by drawing from urn ℧m
◮ successful communication → urn update
◮ in general, a signaling system emerges over time

Slide 19

WHAT WE DID LAST YEAR

Extensions in time and space:

◮ agents are placed in a network structure
◮ agents play the game with direct neighbors
◮ agents play both as sender and receiver
◮ agents play the game repeatedly
◮ agents' decisions are influenced by previous encounters: implementation of reinforcement learning

Related work:

◮ "Talking to Neighbours" (Zollman, 2005)
◮ "Communication and Structured Correlation" (Wagner, 2009)

Slide 20

HOMEWORK

1. Programming task: compute the population change for the Prisoner's Dilemma for 3 steps, with a start population of p(C) = .8 and p(D) = .2.
2. Questions on Skyrms's Chapter 1, "Signals":
  2.1 What are the main differences (conceptual and in perspective) between replicator dynamics and reinforcement learning applied to signaling games?
  2.2 In the first session we considered a game with 2 states, 2 messages, and 2 actions, so the numbers of states/actions and messages are equal. It can be shown that various update dynamics then produce perfect signaling systems. What might happen if we consider a game with more or fewer messages than states/actions?
  2.3 When does it make sense to use directed graphs, and when undirected graphs, to model interaction structures among multiple agents?