A Framework for Representing
Language Acquisition in a Population Setting
Jordan Kodner Christopher Cerezo Falco University of Pennsylvania ACL - July 16, 2018 Melbourne
Language Change
Languages change over time, through both an internal process (within individual speakers) and an external process (across the community)

Modelling Change
The Cot-Caught Merger
Unmerged speakers produce cot as /ɒ/ and caught as /ɔ/; for merged speakers, /ɒ/~/ɔ/ minimal pairs become homophones:

/ɒ/      /ɔ/
cot      caught
Don      Dawn
collar   caller
knotty   naughty
odd      awed
pond     pawned
The Merger in American English
(Map: merged vs. unmerged regions)
Merged:
○ Eastern New England
○ Western Pennsylvania
○ Lower Midwest
○ West
○ Canada (all)
Johnson (2007): on the Massachusetts–Rhode Island border, some families have unmerged parents and older siblings but merged younger siblings
1. Swarm models
○ Individual agents on a grid moving randomly and interacting (ABM)
○ e.g., Harrison et al. 2002, Satterfield 2001, Schulze et al. 2008, Stanford & Kenny 2013
+ Bloomfield (1933)’s Principle of Density for free
+ Diffusion is straightforward
2. Network models
○ Speakers are nodes in a graph; edges are the possibility of interaction
○ e.g., Baxter et al. 2006, Baxter et al. 2009, Blythe & Croft 2012, Fagyal et al.
+ Much more control over network structure
+ Easy to model concepts from the sociolinguistic lit. (e.g., Milroy & Milroy)
- Same problem as #1: outcomes require stochastic simulation
3. Algebraic models
○ Expected outcome of interactions is calculated analytically
○ e.g., Abrams & Strogatz 2003, Baxter et al. 2006, Minett & Wang 2008, Niyogi & Berwick 1997, Yang 2000, Niyogi & Berwick 2009
+ Closed-form solution rather than simulation → faster and more direct
- Limited ability to represent structured populations
This proliferation of “boutique” frameworks is a problem
Impose density effects on a network structure and calculate the outcome of each iteration analytically

Swarm
+ Captures the Principle of Density
Network
+ Models key facts about social networks
Algebraic
+ No random process in the core algorithm
Language change as a two-step loop
Language (L) ≈ Variant ≈ Sample
Grammar (G) ≈ Variety ≈ Latent Variable: that which is learned/influenced by L
G: {Merged grammar, Non-merged grammar}
L: merged or non-merged instances of cot and caught words

G: {Dived-generating grammar, Dove-generating grammar}
L: instances of the past tense of dive as dived or dove

G: {have+NEG = haven’t got grammar, have+NEG = don’t have grammar}
L: instances of haven’t got and instances of don’t have
Language change as a two-step loop
If this were a linear chain, each learner would receive input from a single predecessor; in a population, many individuals contribute to each learner’s input
For T iterations:
    For the individual at each node:
        Begin travelling;
        While travelling:
            Randomly select an outgoing edge by weight and follow it, OR stop;
            Increase the chance of stopping next time;
        End
        Interact with the individual at the current node;
    End
End

○ Nodes are not individuals; individuals “stand on” nodes
○ Edges may be weighted or unweighted, directed or undirected
○ Individuals “travel” along edges and find someone to interact with
○ Individuals connected by shorter or higher-weighted paths are more likely to interact
○ Rather than simulating interactions in a loop, calculate a closed-form solution
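The travel loop can be sketched directly. The initial stopping probability and the rule for increasing it (halving the continue probability each step) are illustrative assumptions, not the talk's exact parameters:

```python
import random

def walk(start, neighbors, weights, p_stop=0.3, rng=random):
    """Simulate one 'travel': follow weighted outgoing edges until stopping.

    neighbors[v]: nodes reachable from v; weights[v]: matching edge weights.
    p_stop and its growth rule are illustrative assumptions.
    """
    v = start
    while rng.random() > p_stop:
        # Randomly select an outgoing edge by weight and follow it
        v = rng.choices(neighbors[v], weights=weights[v])[0]
        # Increase the chance of stopping next time
        p_stop = 1 - (1 - p_stop) * 0.5
    return v  # interact with the individual at this node

# Toy triangle network
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
wts = {0: [1, 1], 1: [1, 1], 2: [1, 1]}
end = walk(0, nbrs, wts, rng=random.Random(0))
```

Because the stopping chance grows every step, long walks are exponentially unlikely, which is what makes nearby individuals the most frequent interaction partners.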
The Linguistic Environment
n individuals, g possible grammars
○ Distribution of Grammars
○ Interaction Probabilities: the probability of travelling k steps declines by a geometric distribution
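Since the number of steps travelled falls off geometrically, expected interaction probabilities can be computed in closed form from the adjacency matrix. A minimal sketch, assuming a fixed continuation factor `r` and truncating the series at `max_steps` (both illustrative simplifications, not the authors' exact formulation):

```python
import numpy as np

def interaction_probs(A, r=0.5, max_steps=10):
    """Probability that the individual at node i ends its walk at node j.

    A: edge-weight (adjacency) matrix; r: per-step continuation factor,
    so the chance of a walk of length k falls off geometrically in k.
    """
    # Row-normalize so each row of W gives one-step transition probabilities.
    W = A / A.sum(axis=1, keepdims=True)
    P = np.zeros_like(W, dtype=float)
    step = np.eye(len(A))          # k = 0: still at the start node
    weight_sum = 0.0
    for k in range(max_steps + 1):
        w = (1 - r) * r**k         # geometric weight on walks of length k
        P += w * step
        weight_sum += w
        step = step @ W            # distribution after one more step
    return P / weight_sum          # renormalize over the truncated series

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
P = interaction_probs(A)           # each row sums to 1
```

This replaces the simulated random walks with a single matrix computation, which is the sense in which the framework is algebraic rather than agent-based.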
Learners will acquire the merged grammar iff more than ~17% of their environment is merged
+ Accounts for mergers’ tendency to spread (Labov 1994) + 17% is close to the merged rate estimated in Johnson 2007
Claim: The merged grammar has a processing advantage
Claim: Merged listeners have a lower rate of initial misinterpretation
Claim: Only minimal pairs are relevant

○ ε = rate of mishearing one vowel for the other (A said /ɒ/ but B heard /ɔ/)
○ A listener may mishear /ɒ/ when expecting /ɔ/, and vice versa
○ Misunderstandings come down to lexical access: initial misinterpretation occurs when the intended meaning is not the most frequent one (Caramazza et al. 2001)
Probability of initial misunderstanding depends on minimal pair frequencies and mishearing rates.
Using minimal pair frequencies estimated from SUBTLEXus and a variational learner, learners will acquire the merged grammar iff more than ~17% of their environment is merged (Yang 2009)
Two Grammars: merged grammar g+, non-merged grammar g-
Precomputed Acquisition Function: an individual acquires 100% g+ if >17% of its environment is generated by g+, else acquires 100% g-
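The precomputed acquisition function and one propagation step can be sketched as follows; the interaction matrix `P` and grammar vector `G` are illustrative stand-ins for the framework's actual data structures:

```python
import numpy as np

def acquire(env_merged_share, threshold=0.17):
    """Categorical threshold learner: acquire g+ outright iff more than
    ~17% of the environment is generated by g+."""
    return (env_merged_share > threshold).astype(float)

def iterate(G, P, threshold=0.17):
    """One acquisition-propagation step.

    G[i]: probability that individual i uses g+ (here 0 or 1).
    P[i, j]: probability that i interacts with j (rows sum to 1).
    A learner's environment is the interaction-weighted average of
    the grammars around it.
    """
    env = P @ G
    return acquire(env, threshold)

# Toy example: one merged individual among four, all interacting uniformly
G = np.array([1.0, 0.0, 0.0, 0.0])
P = np.full((4, 4), 0.25)
G_next = iterate(G, P)   # env = 0.25 everywhere, above the 17% threshold
```

Precomputing the acquisition function means the population update is a deterministic map on the grammar distribution, with no random process in the core algorithm.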
Simulation: the Massachusetts–Rhode Island border
○ MA (Massachusetts) clusters start merged; RI (Rhode Island) clusters start non-merged
○ Some community members are better connected than others
○ Each cluster, including the border cluster (the “Frontier”), is connected to other clusters
(Plot: cluster merger rates over iterations, with Rhode Island average)
○ The average over Rhode Island clusters follows an S-shape
○ Individual cluster curves are also S-shaped
○ Staggered in time
○ Steep slopes = rapid change
The Propagation Function
The Cot-Caught Application
Acknowledgements: Implementation:
The Variational Learner
○ The learner entertains grammars g1, g2 simultaneously, with probabilities p and q = 1-p
○ A grammar is rewarded when it parses an input and penalized when it cannot parse an input

p ← p + γq, if g1 parses the input
p ← (1-γ)p, if g1 fails

○ In the long run, the grammar with the smaller penalty probability has the advantage
○ If learners acquire one grammar categorically, the one with smaller C wins:

p → 1, if C1 < C2
p → 0, if C2 < C1
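A minimal simulation of these reward-penalty dynamics; the penalty probabilities `c1`, `c2`, the learning rate `gamma`, and the input count are illustrative values, not the talk's:

```python
import random

def variational_learner(c1, c2, gamma=0.01, n_inputs=20000, seed=0):
    """Linear reward-penalty learner over two grammars.

    c1, c2: penalty probabilities (chance that g1 / g2 fails to parse
    a random input). Returns the final probability p of using g1.
    """
    rng = random.Random(seed)
    p = 0.5
    for _ in range(n_inputs):
        if rng.random() < p:                 # learner uses g1
            if rng.random() < c1:            # g1 fails: penalize g1
                p = (1 - gamma) * p
            else:                            # g1 parses: reward g1
                p = p + gamma * (1 - p)
        else:                                # learner uses g2
            if rng.random() < c2:            # g2 fails: penalize g2
                p = gamma + (1 - gamma) * p
            else:                            # g2 parses: reward g2
                p = (1 - gamma) * p
    return p

# The grammar with the smaller penalty probability wins out
p_g1_better = variational_learner(c1=0.01, c2=0.10)
p_g2_better = variational_learner(c1=0.10, c2=0.01)
```

With a small learning rate, p hovers near the equilibrium c2/(c1+c2), so as c1 → 0 relative to c2 the learner converges on g1 categorically.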
Penalty probabilities depend on minimal pair frequencies and mishearing rates:

mi, ni = frequencies of each member of minimal pair i
H = Σi (mi + ni)
εm, εn = probabilities of mishearing one vowel for the other
p+, p- = proportions of merged and non-merged speakers in the environment

C+ = (1/H) Σi min(mi, ni)    (hearing the less frequent word)
C- = (1/H) Σi [ p+((1-εm)mi + εnni)    (mishearing g+ input)
              + p-(εmmi + εnni) ]      (misinterpreting g- input)
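The two penalty terms can be computed directly from a minimal-pair frequency list. The frequencies and mishearing rates below are made-up illustrations (the talk estimates the real ones from SUBTLEXus):

```python
def penalty_probs(pairs, p_plus, eps_m=0.01, eps_n=0.01):
    """Penalty probabilities for the merged (C+) and non-merged (C-) grammars.

    pairs: (m_i, n_i) token frequencies for each /ɒ/~/ɔ/ minimal pair.
    p_plus: proportion of merged (g+) speakers in the environment.
    eps_m, eps_n: mishearing rates (illustrative values).
    """
    H = sum(m + n for m, n in pairs)
    p_minus = 1.0 - p_plus
    # C+: a merged listener can only misinterpret the less frequent member.
    c_plus = sum(min(m, n) for m, n in pairs) / H
    # C-: a non-merged listener is penalized by misheard g+ input and by
    # mishearing-driven misinterpretation of g- input.
    c_minus = sum(p_plus * ((1 - eps_m) * m + eps_n * n)
                  + p_minus * (eps_m * m + eps_n * n)
                  for m, n in pairs) / H
    return c_plus, c_minus

# Made-up frequencies for (cot, caught)-style minimal pairs
pairs = [(120, 300), (40, 15), (75, 75)]
c_plus, c_minus = penalty_probs(pairs, p_plus=0.2)
```

Sweeping `p_plus` from 0 to 1 and finding where C+ and C- cross is how a threshold like the ~17% figure falls out of a frequency list.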
(Plot: cluster merger rates per iteration, with Rhode Island average)
○ Tipping points are temporally closer together for clusters that have been connected longer
○ The model can distinguish older and younger siblings: every individual has the correct age at any moment
○ Cluster “tipping points” are visible in the curves
(Plots: cluster merger rates, Rhode Island average, and trial averages)
○ The outcome is more sensitive to random connections