SIGNALING GAMES: BEHAVIOURAL STRATEGIES & UPDATE DYNAMICS, MODELING PRAGMATIC PHENOMENA
Models of Language Evolution, Session 09: Evolution of Pragmatic Strategies
Roland Mühlenbernd, University of Tübingen
SIGNALING GAME: DEFINITION
A signaling game is a tuple SG = ⟨{S, R}, T, Pr, M, A, U_S, U_R⟩ with
◮ {S, R}: set of players (sender and receiver)
◮ T: set of states
◮ Pr ∈ Δ(T): prior beliefs over states
◮ M: set of messages
◮ A: set of receiver actions
◮ U_S, U_R: utility functions T × M × A → ℝ
SIGNALING GAME: EXAMPLE
A standard Lewis game is defined as:
◮ Set of players {S, R}
◮ Set of states T = {t1, t2}
◮ Equiprobable prior beliefs: Pr(t1) = Pr(t2) = .5
◮ Set of messages M = {m1, m2} (no message costs)
◮ Set of actions A = {a1, a2}
◮ Utility function U_{S,R}(ti, mj, ak) = 1 if i = k, else 0

     a1    a2
t1   1,1   0,0
t2   0,0   1,1
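The Lewis game above can be written down directly. A minimal Python sketch (the encoding, with states, messages and actions as strings, is an illustrative assumption):

```python
# Minimal encoding of the standard Lewis game described above
# (state/message/action names are illustrative strings).
STATES = ["t1", "t2"]
MESSAGES = ["m1", "m2"]         # costless
ACTIONS = ["a1", "a2"]
PRIOR = {"t1": 0.5, "t2": 0.5}  # equiprobable prior beliefs

def utility(state, message, action):
    """Both players earn 1 iff the receiver's action matches the state;
    the (costless) message does not influence the payoff."""
    return 1 if STATES.index(state) == ACTIONS.index(action) else 0
```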
STRATEGIES
The players' "actions" can be represented as pure strategies. For the Lewis game there are 4 strategies for each player:

S1: t1 → m1, t2 → m2
S2: t1 → m2, t2 → m1
S3: t1 → m1, t2 → m1 (pooling on m1)
S4: t1 → m2, t2 → m2 (pooling on m2)

R1: m1 → a1, m2 → a2
R2: m1 → a2, m2 → a1
R3: m1 → a1, m2 → a1 (pooling on a1)
R4: m1 → a2, m2 → a2 (pooling on a2)
EXPECTED UTILITIES
The expected utility for a combination of strategies is given as:

EU(S_i, R_j) = Σ_{t∈T} Pr(t) × U(t, S_i(t), R_j(S_i(t)))   (1)

      R1   R2   R3   R4
S1    1    0    .5   .5
S2    0    1    .5   .5
S3    .5   .5   .5   .5
S4    .5   .5   .5   .5

Table: Expected utilities for all strategy combinations of the Lewis game
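Equation (1) can be checked mechanically. A sketch in Python, encoding pure strategies as dicts (the encoding, and reading S3 as the pooling strategy, are assumptions consistent with the table above):

```python
# Expected utility EU(S_i, R_j) = sum_t Pr(t) * U(t, S_i(t), R_j(S_i(t)))
# for the Lewis game; pure strategies are encoded as dicts.
PRIOR = {"t1": 0.5, "t2": 0.5}

def utility(t, m, a):
    # Success iff the state index matches the action index (payoff table above).
    return 1 if t[1] == a[1] else 0

def expected_utility(sender, receiver):
    return sum(PRIOR[t] * utility(t, sender[t], receiver[sender[t]])
               for t in PRIOR)

S1 = {"t1": "m1", "t2": "m2"}   # separating strategy
S3 = {"t1": "m1", "t2": "m1"}   # pooling strategy (always m1)
R1 = {"m1": "a1", "m2": "a2"}
R2 = {"m1": "a2", "m2": "a1"}

# expected_utility(S1, R1) == 1.0 (a signaling system),
# expected_utility(S1, R2) == 0.0, expected_utility(S3, R1) == 0.5
```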
SIGNALING GAMES AS STATIC GAMES
◮ static games: agents choose simultaneously
◮ SG as static game: agents choose strategies
◮ a strategy represents a "contingency plan": what an agent would do in each state
SIGNALING GAMES AS STATIC GAMES
Extensions:
1. multi-agent system
   ◮ graph G = ⟨N, V⟩ as interaction structure
   ◮ nodes represent agents, edges represent connections for interaction
   ◮ different structures: grid, small world, ...
2. update rules
   ◮ imitate the majority
   ◮ imitate the best
   ◮ conditional imitation
   ◮ best response
     ◮ against a mixed strategy over all neighbours
     ◮ against a mixed strategy over preplayed rounds (fictitious play)
SIGNALING GAMES AS DYNAMIC GAMES
◮ dynamic games: agents choose in sequence
◮ SG as dynamic game: agents play behavioural strategies
◮ a behavioural strategy represents a behaviour: what an agent would do in a given situation
BEHAVIOURAL STRATEGIES
Behavioural strategies are functions that map choice points to probability distributions over the actions available at that choice point.

◮ behavioural sender strategy: σ ∈ S = (Δ(M))^T
◮ behavioural receiver strategy: ρ ∈ R = (Δ(A))^M

σ: t1 → (m1: .9, m2: .1)
   t2 → (m1: .5, m2: .5)

ρ: m1 → (a1: .33, a2: .67)
   m2 → (a1: 1, a2: 0)
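The example strategies σ and ρ can be represented as nested dictionaries mapping each choice point to a distribution over the available actions (a sketch; the encoding is an assumption):

```python
# The behavioural strategies from the example, as nested dicts:
# sigma maps states to distributions over messages,
# rho maps messages to distributions over actions.
sigma = {"t1": {"m1": 0.9, "m2": 0.1},
         "t2": {"m1": 0.5, "m2": 0.5}}
rho   = {"m1": {"a1": 0.33, "a2": 0.67},
         "m2": {"a1": 1.0,  "a2": 0.0}}

def is_behavioural_strategy(strategy):
    """Check that every conditional distribution sums to 1 (up to rounding)."""
    return all(abs(sum(dist.values()) - 1.0) < 1e-9
               for dist in strategy.values())
```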
SIGNALING GAMES AS DYNAMIC GAMES
Extensions:
1. multi-agent system
   ◮ graph G = ⟨N, V⟩ as interaction structure
   ◮ nodes represent agents, edges represent connections for interaction
   ◮ different structures: grid, small world, ...
2. update rules
   ◮ imitate the majority
   ◮ imitate the best
   ◮ conditional imitation
   ◮ best response
     ◮ against a mixed strategy over all neighbours
     ◮ against a mixed strategy over preplayed rounds (fictitious play)
BEHAVIOURAL STRATEGIES
◮ Behavioural strategies represent probabilistic behaviour.
  Example: σ(m1|t2) = .5, i.e. in state t2 the sender sends message m1 with probability .5
◮ Behavioural strategies represent beliefs.
  Example: ρ(a1|m1) = .33, i.e. the sender believes that the receiver construes message m1 as a1 with probability .33

σ: t1 → (m1: .9, m2: .1)
   t2 → (m1: .5, m2: .5)

ρ: m1 → (a1: .33, a2: .67)
   m2 → (a1: 1, a2: 0)
EXPECTED UTILITY & BEST RESPONSE
◮ While for a static SG the expected utility EU(S_i, R_j) was defined for a strategy pair ⟨S_i, R_j⟩, for a dynamic SG it is defined as follows:

EU_S(m|t, ρ) = Σ_{a∈A} ρ(a|m) × U(t, m, a)   (2)

EU_R(a|m, σ) = Σ_{t∈T} σ(m|t) × U(t, m, a)   (3)

◮ The behavioural strategies σ and ρ represent beliefs about the interlocutor
◮ Just as for static games, to play best response means to maximize expected utility
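Equations (2) and (3), together with best response, can be sketched as follows. The beliefs plugged in below are the ones from the belief-learning example; the encoding is an assumption:

```python
# Expected utilities (2) and (3) for the Lewis game, plus best response.
def utility(t, m, a):
    return 1 if t[1] == a[1] else 0  # success iff indices match

def eu_sender(m, t, rho, actions=("a1", "a2")):
    # (2): EU_S(m | t, rho) = sum_a rho(a|m) * U(t, m, a)
    return sum(rho[m][a] * utility(t, m, a) for a in actions)

def eu_receiver(a, m, sigma, states=("t1", "t2")):
    # (3): EU_R(a | m, sigma) = sum_t sigma(m|t) * U(t, m, a)
    return sum(sigma[t][m] * utility(t, m, a) for t in states)

def best_response_sender(t, rho, messages=("m1", "m2")):
    # Best response: the message maximising expected utility in state t.
    return max(messages, key=lambda m: eu_sender(m, t, rho))

rho   = {"m1": {"a1": 0.8, "a2": 0.2}, "m2": {"a1": 0.35, "a2": 0.65}}
sigma = {"t1": {"m1": 0.6, "m2": 0.4}, "t2": {"m1": 0.0, "m2": 1.0}}
# eu_sender("m1", "t1", rho) == 0.8, so m1 is the sender's best response in t1.
```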
BELIEF LEARNING
◮ Behavioural strategies represent beliefs about the interlocutor
◮ The belief is the result of observation
◮ Example:

Sender's observations (SO):      derived belief:
      a1   a2                    ρ: m1 → (a1: .8,  a2: .2)
m1    8    2                        m2 → (a1: .35, a2: .65)
m2    7    13

Receiver's observations (RO):    derived belief:
      t1   t2                    σ: t1 → (m1: .6, m2: .4)
m1    6    0                        t2 → (m1: 0,  m2: 1)
m2    4    4
BELIEF LEARNING & BEST RESPONSE
◮ After a played game both interlocutors can observe the resulting game path and update their beliefs accordingly
◮ Example:
◮ Given the following observations and the corresponding beliefs:

SO    a1    a2         RO    t1    t2
m1    8+1   2          m1    6+1   0
m2    7     13         m2    4     4

◮ the sender is faced with state t1
◮ EU(m1|t1, ρ) = .8, EU(m2|t1, ρ) = .35
◮ m1 maximises EU, thus sending m1 is best response
◮ the receiver has to construe message m1:
◮ EU(a1|m1, σ) = .6, EU(a2|m1, σ) = 0
◮ a1 maximises EU, thus playing a1 is best response
◮ players observe the resulting game path ⟨t1, m1, a1⟩ and update observation counts and beliefs accordingly (the +1 entries above)
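The belief-learning loop, from counts to normalised beliefs to an update after the observed game path, might look like this (a sketch; names are illustrative):

```python
from collections import Counter

# The sender's observation counts of receiver behaviour (slide example).
sender_obs = {"m1": Counter({"a1": 8, "a2": 2}),
              "m2": Counter({"a1": 7, "a2": 13})}

def belief(observations):
    """Turn counts into a belief: relative frequencies per choice point."""
    return {cp: {x: n / sum(counts.values()) for x, n in counts.items()}
            for cp, counts in observations.items()}

def update(observations, choice_point, observed):
    """After a played game, record the observed move on the game path."""
    observations[choice_point][observed] += 1

rho = belief(sender_obs)        # rho["m1"]["a1"] == 0.8
update(sender_obs, "m1", "a1")  # game path <t1, m1, a1> was observed
rho = belief(sender_obs)        # now rho["m1"]["a1"] == 9/11
```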
EXAMPLE: RESULT IN A SW NETWORK
Figure: Resulting structure after 30 simulation steps of 100 BL agents playing the Lewis game on a SW network. The colours blue and green represent the two signaling systems as target strategies.
REINFORCEMENT LEARNING
For a given signaling game SG
◮ the sender has an urn ℧t for each t ∈ T, filled with balls of the types m ∈ M
◮ the receiver has an urn ℧m for each m ∈ M, filled with balls of the types a ∈ A
◮ Example:

      m1   m2            a1   a2
℧t1   8    2       ℧m1   2    3
℧t2   7    13      ℧m2   7    1
REINFORCEMENT LEARNING
◮ If the sender is faced with state t, she draws a ball from urn ℧t and sends the message m matching the ball type.
◮ If the receiver receives message m, he draws a ball from urn ℧m and plays the action a matching the ball type.
◮ Behavioural strategies represent probabilistic behaviour
◮ After a played round, successful communication is reinforced

The urns above induce the behavioural strategies:

σ: t1 → (m1: .8, m2: .2)
   t2 → (m1: .35, m2: .65)

ρ: m1 → (a1: .4, a2: .6)
   m2 → (a1: .875, a2: .125)
REINFORCEMENT LEARNING
Example: Given the following urn settings:

      m1   m2            a1    a2
℧t1   8    2+1     ℧m1   2     3
℧t2   7    13      ℧m2   7+1   1

Play 1:
◮ the sender is faced with t1
◮ she draws ball type m2 from urn ℧t1
◮ the receiver has to construe m2
◮ he draws ball type a1 from urn ℧m2
◮ communication via ⟨t1, m2, a1⟩ is successful: reinforcement (the +1 entries above)

Play 2:
◮ the sender is faced with t2
◮ she draws ball type m1 from urn ℧t2
◮ the receiver has to construe m1
◮ he draws ball type a1 from urn ℧m1
◮ communication via ⟨t2, m1, a1⟩ isn't successful: no reinforcement
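The urn dynamics can be simulated directly. A sketch of the basic positive-reinforcement-only dynamics, starting from the example urns (encoding and simulation loop are illustrative assumptions):

```python
import random

# Urn-based reinforcement learning for the Lewis game: draw a ball with
# probability proportional to its count; on success, add one ball of each
# drawn type back into its urn (positive reinforcement only).
sender_urns   = {"t1": {"m1": 8, "m2": 2}, "t2": {"m1": 7, "m2": 13}}
receiver_urns = {"m1": {"a1": 2, "a2": 3}, "m2": {"a1": 7, "a2": 1}}

def draw(urn, rng):
    balls, counts = zip(*urn.items())
    return rng.choices(balls, weights=counts)[0]

def play_round(state, rng):
    m = draw(sender_urns[state], rng)   # sender draws from the urn for state t
    a = draw(receiver_urns[m], rng)     # receiver draws from the urn for message m
    if state[1] == a[1]:                # successful communication
        sender_urns[state][m] += 1      # reinforce both drawn ball types
        receiver_urns[m][a] += 1
    return state, m, a

rng = random.Random(0)
for _ in range(1000):
    play_round(rng.choice(["t1", "t2"]), rng)
```

Over many rounds the urn contents typically drift toward one of the two signaling systems.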
REINFORCEMENT LEARNING
Possible extensions:
◮ negative reinforcement: decrease the number of appropriate balls if communication is not successful
◮ lateral inhibition: for successful communication, not only increase the number of appropriate balls but also decrease the number of all other balls in the same urn
◮ limited memory: consider only the last n observations
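The first two extensions only change the urn update rule. A sketch combining negative reinforcement and lateral inhibition, with a floor of one ball per type so that no option disappears entirely (the floor is an assumption, not from the slides):

```python
def update_urn(urn, drawn, success):
    """Update one urn after a round in which ball type `drawn` was used."""
    if success:
        urn[drawn] += 1                        # positive reinforcement
        for ball in urn:                       # lateral inhibition
            if ball != drawn:
                urn[ball] = max(1, urn[ball] - 1)
    else:
        urn[drawn] = max(1, urn[drawn] - 1)    # negative reinforcement

urn = {"m1": 8, "m2": 2}
update_urn(urn, "m1", True)    # -> {"m1": 9, "m2": 1}
update_urn(urn, "m2", False)   # -> {"m1": 9, "m2": 1} (floor of 1 reached)
```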
EXAMPLE: RESULT IN A SW NETWORK
Figure: Resulting structure after 300 simulation steps of 100 RL agents playing the Lewis game (with lateral inhibition) on a SW network. The colours blue and green represent the two signaling systems as target strategies.
BELIEF LEARNING VS. REINFORCEMENT LEARNING
          behavioural   rational   learning speed
BL + BR   ✓             ✓          fast
RL        ✓                        slow
NEO-GRICEAN PRAGMATICS
◮ A conversational implicature is a pragmatic phenomenon where an utterance's intended meaning differs from its literal meaning.
◮ Interlocutors can resolve the difference between the intended pragmatic interpretation (PI) and the literal interpretation (LI) via cooperation principles. Levinson (2000) subdivides generalized conversational implicatures (GCIs) into:
  ◮ Q-Implicature
  ◮ I-Implicature
  ◮ M-Implicature
Q-IMPLICATURE
(1) "Some boys came to the party."
LI: Some, maybe all, boys came. ∃ = ∃¬∀ ∨ ∀
PI: Some but not all boys came. ∃¬∀

[Diagrams: strategies for LI and for PI over states {t∀, t∃¬∀}, messages {mall, msome, msbna}, actions {a∀, a∃¬∀}]
MODELLING Q-IMPLICATURE
Parameter settings:
◮ T = {t∀, t∃¬∀}
◮ M = {mall, msome, msbna}
◮ A = {a∀, a∃¬∀}
◮ Pr(t∀) = Pr(t∃¬∀) = .5
◮ κ(msbna) = 1, κ(mall) = κ(msome) > 1
◮ Initial LI strategy:

σ: t∀ → (mall: .5, msome: .5, msbna: 0)
   t∃¬∀ → (mall: 0, msome: .5, msbna: .5)

ρ: mall → (a∀: 1, a∃¬∀: 0)
   msome → (a∀: .5, a∃¬∀: .5)
   msbna → (a∀: 0, a∃¬∀: 1)

Initial urns:

        mall   msome   msbna             a∀    a∃¬∀
℧t∀     50     50      0       ℧mall    100   0
℧t∃¬∀   0      50      50      ℧msome   50    50
                               ℧msbna   0     100
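The initial urn setting for LI can be written out explicitly; blank cells in the table are zero counts. A sketch (ASCII names such as t_all stand for t∀, m_sbna for "some but not all", and so on):

```python
# Initial urns encoding the literal interpretation (LI): senders never use a
# literally false message, the receiver reads m_all and m_sbna literally,
# and m_some is ambiguous between the two states.
sender_urns = {
    "t_all":  {"m_all": 50, "m_some": 50, "m_sbna": 0},
    "t_sbna": {"m_all": 0,  "m_some": 50, "m_sbna": 50},
}
receiver_urns = {
    "m_all":  {"a_all": 100, "a_sbna": 0},
    "m_some": {"a_all": 50,  "a_sbna": 50},
    "m_sbna": {"a_all": 0,   "a_sbna": 100},
}
```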
SIMULATION & RESULTS
◮ 200 RL agents play the Q-Implicature game repeatedly on a total network with random partners
◮ all agents start with the initial urn setting that represents LI
◮ the simulation ends when all agents have learned a pure strategy

Results:
[Plot: percentage of agents ending up with the LI vs. the PI target strategy, as a function of κ(msome) = κ(mall) ∈ {1, 2, 3, 4, 5}]
I-IMPLICATURE
"What is expressed simply is stereotypically exemplified."

(2) "Billy drank a glass of milk."
LI: A glass of any kind of milk. tc, tg
PI: A glass of cow's milk. tc

[Diagrams: strategies for LI and for PI over states {tc, tg}, messages {mcm, mm, mgm}, actions {ac, ag}]
MODELLING I-IMPLICATURE
Parameter settings:
◮ T = {tc, tg} ◮ M = {mm, mcm, mgm} ◮ A = {ac, ag} ◮ Pr(tc) = .8 > Pr(tg) = .2 ◮ κ(mm) = 2
κ(mcm) = κ(mgm) = 1
◮ Initial LI strategy
tc tg mcm mm mgm ac ag
.5 .5 1 − p p 1 − p p
mcm mm mgm ℧tc 100 − n n ℧tg n 100 − n for n = ⌊100 × p⌋ ac ag ℧mcm 100 ℧mm 50 50 ℧mgm 100
SIMULATION & RESULTS
◮ 200 RL agents play the I-Implicature game repeatedly on a total network with random partners
◮ all agents start with the initial urn setting that represents LI
◮ the simulation ends when all agents have learned a pure strategy

Results:
[Plot: percentage of agents ending up with the LI vs. the PI target strategy, as a function of p ∈ {.3, .35, .4, .45, .5, .55, .6}]
M-IMPLICATURE
"What's said in an abnormal way isn't normal."

(3) "Billy caused the sheriff to die."
LI: Billy killed the sheriff in any way. tp, tr
PI: Billy killed the sheriff in an abnormal way. tr

[Diagrams: strategies for LI and for PI over states {tp, tr}, messages {mk, mctd}, actions {ap, ar}]
MODELLING THE M-IMPLICATURE
Parameter settings:
◮ T = {tp, tr}
◮ M = {mk, mctd}
◮ A = {ap, ar}
◮ κ(mk) = 2, κ(mctd) = 1
◮ Pr(tp) > Pr(tr)
◮ Initial LI strategy (uniform urns):

      mk   mctd             ap   ar
℧tp   50   50       ℧mk     50   50
℧tr   50   50       ℧mctd   50   50
SIMULATION & RESULTS
◮ 200 RL agents play the M-Implicature game repeatedly on a total network with random partners
◮ all agents start with the initial urn setting that represents LI
◮ the simulation ends when all agents have learned a pure strategy

Results:
[Plot: percentage of agents ending up with the LI vs. the PI target strategy, as a function of Pr(tp) ∈ {.51, .52, .53, .54, .55, .56, .57}]
SUMMARY
◮ the difference between
  ◮ a static SG (agents play pure strategies)
  ◮ a dynamic SG (agents play behavioural strategies)
◮ update dynamics for dynamic signaling games
  ◮ belief learning + best response
  ◮ reinforcement learning