Signaling Games and the Emergence of Linguistic Meaning
PENG 2012/2013, Introduction

Outline:
◮ Signaling Games
◮ Replicator Dynamics
◮ Reinforcement Learning
◮ Language as Convention?
LANGUAGE AS CONVENTION?
"A name is a spoken sound significant by convention... I say 'by convention' because no name is a name naturally but only when it has become a symbol." Aristotle, De Interpretatione

"[L]anguages [are] gradually establish'd by human conventions without any explicit promise. In like manner do gold and silver become the common measures of exchange, and are esteem'd sufficient payment for what is of a hundred times their value." Hume, Treatise of Human Nature
"[W]e can hardly suppose a parliament of hitherto speechless elders meeting together and agreeing to call a cow a cow and a wolf a wolf." Russell, The Analysis of Mind

"Conventions are like fires: under favourable conditions, a sufficient concentration of heat spreads and perpetuates itself. The nature of the fire does not depend on the original source of heat. Matches may be the best fire starters, but that is no reason to think of fires started otherwise as any the less fires." Lewis, Convention
COORDINATION & SIGNALING
Messages: one or two lanterns?

◮ a signaling game is a tuple SG = ⟨{S, R}, T, Pr, M, A, U⟩
◮ a Lewis game is defined by:
  ◮ T = {tL, tS}
  ◮ M = {m1, m2}
  ◮ A = {aL, aS}
  ◮ Pr(tL) = Pr(tS) = .5
  ◮ U(ti, aj) = 1 if i = j, else 0

[Figure: extensive-form tree of the Lewis game — nature N draws tL or tS with probability .5 each, the sender chooses m1 or m2, the receiver chooses aL or aS; an action matching the state pays 1.]
PURE STRATEGIES
Pure strategies are contingency plans according to which the players act.

◮ sender strategy: s : T → M
◮ receiver strategy: r : M → A
s1: tL → m1, tS → m2        r1: m1 → aL, m2 → aS
s2: tL → m2, tS → m1        r2: m1 → aS, m2 → aL
s3: tL → m1, tS → m1        r3: m1 → aL, m2 → aL
s4: tL → m2, tS → m2        r4: m1 → aS, m2 → aS

SIGNALING SYSTEMS
◮ signaling systems are combinations of pure strategies; the Lewis game has two: L1 = ⟨s1, r1⟩ and L2 = ⟨s2, r2⟩

  L1: tL → m1 → aL, tS → m2 → aS
  L2: tL → m2 → aL, tS → m1 → aS
◮ signaling systems are strict Nash equilibria of the EU-table:

        r1    r2    r3    r4
  s1     1     0    .5    .5
  s2     0     1    .5    .5
  s3    .5    .5    .5    .5
  s4    .5    .5    .5    .5
◮ in signaling systems, messages associate states and actions uniquely
◮ signaling systems constitute evolutionarily stable states
SIGNALING CONVENTION
"Given the definition of signaling systems, we can define a signaling convention as any convention whereby members [...] audience in a certain signaling problem S do their parts of a certain signaling system ⟨Fc, Fa⟩ by acting according to their respective contingency plans. If such a convention exists, we also call ⟨Fc, Fa⟩ a conventional signaling system." Lewis, Convention
ASYMMETRIC STATIC SIGNALING GAME
Given a signaling game SG = ⟨{S, R}, T, Pr, M, A, U′⟩ as initially defined. The corresponding asymmetric static signaling game SSGa = ⟨{S, R}, S, R, U⟩ is defined as follows:
◮ S is the sender, R is the receiver
◮ S = {s | s ∈ [T → M]} is the set of the sender's strategies
◮ R = {r | r ∈ [M → A]} is the set of the receiver's strategies
◮ U : S × R → ℝ is the utility function, defined as

  U(s, r) = Σt∈T Pr(t) · U′(t, s(t), r(s(t)))
An SSGa is asymmetric because sender and receiver have different sets of strategies.
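As an illustrative sketch (not from the slides; all names are mine), the definition of U can be spelled out for the Lewis game in Python — the two strategy pairs with expected utility 1 are exactly the signaling systems:

```python
# Lewis game: 2 states, 2 messages, 2 actions, uniform prior.
# Sketch of U(s, r) = sum_t Pr(t) * U'(t, s(t), r(s(t))).
from itertools import product

T, M, A = ["tL", "tS"], ["m1", "m2"], ["aL", "aS"]
Pr = {"tL": 0.5, "tS": 0.5}

def U_prime(t, m, a):
    """Base utility: 1 if the action matches the state, else 0."""
    return 1.0 if T.index(t) == A.index(a) else 0.0

# all pure sender strategies s : T -> M and receiver strategies r : M -> A
senders = [dict(zip(T, ms)) for ms in product(M, repeat=len(T))]
receivers = [dict(zip(M, acts)) for acts in product(A, repeat=len(M))]

def EU(s, r):
    """Expected utility U(s, r) of a sender/receiver strategy pair."""
    return sum(Pr[t] * U_prime(t, s[t], r[s[t]]) for t in T)

# the two signaling systems reach EU 1; all other pairs get at most .5
systems = [(s, r) for s in senders for r in receivers if EU(s, r) == 1.0]
```

Enumerating all 4 × 4 pairs this way reproduces the EU-table of the previous slide.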
REPLICATOR DYNAMICS
Given a very large (effectively infinite) population of agents playing a symmetric static game ⟨{P1, P2}, S, U : S × S → ℝ⟩ randomly against each other, we can define:

◮ p(si): the proportion of agents in the population playing strategy si
◮ U(si) = Σsj∈S p(sj) U(si, sj): the expected utility for agents playing si
◮ Ū = Σsi∈S p(si) U(si): the average fitness of the whole population

Replicator Dynamics: The RD is defined by the following differential equation:

  dp(si)/dt = p(si) [U(si) − Ū]
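As a minimal sketch (not from the slides), the RD can be integrated numerically with Euler steps; here for a simple two-strategy coordination game, where a slight initial bias towards one strategy carries the whole population there:

```python
# Euler integration of the replicator dynamics
# dp(si)/dt = p(si) * [U(si) - U_bar] for a 2-strategy coordination game.

# symmetric payoff matrix: coordinating on the same strategy pays 1
payoff = [[1.0, 0.0], [0.0, 1.0]]

def rd_step(p, dt=0.1):
    """One Euler step of the replicator dynamics on the distribution p."""
    fitness = [sum(p[j] * payoff[i][j] for j in range(len(p)))
               for i in range(len(p))]
    avg = sum(p[i] * fitness[i] for i in range(len(p)))  # U_bar
    return [p[i] + dt * p[i] * (fitness[i] - avg) for i in range(len(p))]

p = [0.6, 0.4]            # slight initial bias towards strategy 1
for _ in range(1000):
    p = rd_step(p)
# the population converges to the state where everyone plays strategy 1
```

Note that each Euler step leaves the total proportion at 1, since the increments p(si)[U(si) − Ū] sum to zero.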
REPLICATOR DYNAMICS FOR ASYMMETRIC GAMES?
"In an evolutionary setting, we can either model a situation where senders and receivers belong to different populations [or one where the members of a single population] at different times assume the role of sender and receiver." Skyrms, Evolution of the Social Contract

◮ the replicator dynamics is defined for symmetric static games
◮ there are two possible ways to apply the replicator dynamics to a signaling game:
  1. use two populations (a sender population and a receiver population)
  2. symmetrize the game into a symmetric static signaling game
RESULT FOR A ’TWO-POPULATION’ MODEL
[Figure: phase plot of the two-population model with axes p(S2) and p(R2); the dynamics converge to one of the signaling systems ⟨S1, R1⟩ or ⟨S2, R2⟩.]
SYMMETRIC STATIC SIGNALING GAME
Given an asymmetric static signaling game SSGa = ⟨{S, R}, S, R, U′⟩ as defined before. The corresponding symmetric static signaling game SSGs = ⟨{S, R}, L, U⟩ is defined as follows:

◮ S is the sender, R is the receiver
◮ L = {Lij | Lij = (si, rj), si ∈ S, rj ∈ R} is the set of languages
◮ U : L × L → ℝ is the utility function over languages, defined as

  U(Lij, Lkl) = 1/2 (U′(si, rl) + U′(sk, rj))

        r1    r2    r3    r4
  s1   L11   L12   L13   L14
  s2   L21   L22   L23   L24
  s3   L31   L32   L33   L34
  s4   L41   L42   L43   L44
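The symmetrization can be sketched directly from the definition (Python; the table of asymmetric utilities and all names are illustrative, restricted to the two separating strategies):

```python
# Symmetrized utility over languages L_ij = (s_i, r_j):
# U(L_ij, L_kl) = 1/2 * (U'(s_i, r_l) + U'(s_k, r_j)),
# i.e. each agent plays sender and receiver half of the time.

# asymmetric expected utilities U'(s, r) for the Lewis game
# (s1/s2 separating senders, r1/r2 the matching receivers)
EU_table = {("s1", "r1"): 1.0, ("s1", "r2"): 0.0,
            ("s2", "r1"): 0.0, ("s2", "r2"): 1.0}

def sym_U(Lij, Lkl):
    """Symmetrized utility of language Lij played against Lkl."""
    si, rj = Lij
    sk, rl = Lkl
    return 0.5 * (EU_table[(si, rl)] + EU_table[(sk, rj)])

L11, L22 = ("s1", "r1"), ("s2", "r2")
```

For instance, sym_U(L11, L11) = 1 and sym_U(L11, L22) = 0, as expected of the two signaling systems.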
RESULT FOR A ’ONE-POPULATION’ MODEL
BEHAVIORAL STRATEGIES
Behavioral strategies are functions that map choice points to probability distributions over the actions available at that choice point.
◮ behavioral sender strategy: σ : T → Δ(M)
◮ behavioral receiver strategy: ρ : M → Δ(A)

Example:
  σ = t1 ↦ [m1 → .9, m2 → .1], t2 ↦ [m1 → .5, m2 → .5]
  ρ = m1 ↦ [a1 → .33, a2 → .67], m2 ↦ [a1 → 1, a2 → 0]
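A behavioral strategy can be sketched as a map from choice points to probability distributions; a minimal Python version of the example above (the `play` helper is mine, not from the slides):

```python
import random

# behavioral sender strategy sigma : T -> Delta(M), as nested dicts
sigma = {"t1": {"m1": 0.9, "m2": 0.1},
         "t2": {"m1": 0.5, "m2": 0.5}}
# behavioral receiver strategy rho : M -> Delta(A)
rho = {"m1": {"a1": 0.33, "a2": 0.67},
       "m2": {"a1": 1.0, "a2": 0.0}}

def play(dist, rng=random):
    """Sample one option from a probability distribution {option: prob}."""
    options, probs = zip(*dist.items())
    return rng.choices(options, weights=probs, k=1)[0]

message = play(sigma["t1"])   # "m1" with probability .9
action = play(rho[message])
```

A pure strategy is the special case where each distribution puts all its mass on one option, as ρ does for m2.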
REINFORCEMENT LEARNING
Reinforcement learning via Polya urns:

◮ the sender has an urn ℧t for each t ∈ T, filled with balls of a type m ∈ M
◮ the receiver has an urn ℧m for each m ∈ M, filled with balls of a type a ∈ A

  σ(m|t) = m(℧t) / |℧t|        ρ(a|m) = a(℧m) / |℧m|

where m(℧t) is the number of balls of type m in ℧t and |℧t| is the total number of balls. After a round is played, successful communication is reinforced (by adding 10 appropriate balls and reducing 4 balls …).
[Figure: urn model — one sender urn ℧ per state, one receiver urn ℧ per message.]
◮ the sender has an urn for each state t ∈ T
◮ each sender urn contains balls for each message m ∈ M
◮ the sender decides by drawing from urn ℧t
◮ the receiver has an urn for each message m ∈ M
◮ each receiver urn contains balls for each action a ∈ A
◮ the receiver decides by drawing from urn ℧m
◮ successful communication → urn update
◮ in general, a signaling system emerges over time
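The urn dynamics can be sketched as a simple simulation (Python; as a simplification of the 10/−4 update mentioned above, this version only adds one ball on success and never removes balls):

```python
import random

random.seed(1)
T, M, A = ["tL", "tS"], ["m1", "m2"], ["aL", "aS"]
correct = {"tL": "aL", "tS": "aS"}

# one urn per state / per message, starting with one ball of each type
sender_urns = {t: {m: 1.0 for m in M} for t in T}
receiver_urns = {m: {a: 1.0 for a in A} for m in M}

def draw(urn, rng=random):
    """Draw a ball type with probability proportional to its count."""
    types, counts = zip(*urn.items())
    return rng.choices(types, weights=counts, k=1)[0]

successes = 0
for _ in range(5000):
    t = random.choice(T)            # nature picks a state
    m = draw(sender_urns[t])        # sender draws a message from urn for t
    a = draw(receiver_urns[m])      # receiver draws an action from urn for m
    if a == correct[t]:             # success: reinforce the drawn balls
        sender_urns[t][m] += 1.0
        receiver_urns[m][a] += 1.0
        successes += 1
```

After many rounds, most of the weight in each urn typically sits on one ball type, i.e. the agents have settled on one of the two signaling systems.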
WHAT WE DID LAST YEAR
Extensions in time and space:
◮ agents are placed in a network structure
◮ agents play the game with their direct neighbors
◮ agents play both as sender and as receiver
◮ agents play the game repeatedly
◮ agents' decisions are influenced by previous encounters: implementation of reinforcement learning

Related work:
◮ "Talking to Neighbours" (Zollman, 2005)
◮ "Communication and Structured Correlation" (Wagner, 2009)
HOMEWORK
1. … the Prisoner's Dilemma for 3 steps with a start population …
2.1 What are the main differences (conceptual and with respect to perspective) between the replicator dynamics and reinforcement learning applied to signaling games?
2.2 In the first session we considered a game with 2 states, 2 messages and 2 actions; the numbers of states/actions and of messages were thus the same. It can be shown that various update dynamics produce perfect signaling systems. What might happen if we consider a game with more or fewer messages than states/actions?
2.3 When does it make sense to use directed graphs, and when undirected graphs, to model interaction structures among multiple agents?