SLIDE 1

Outline: Introduction | Previous work: Interactive Learning Model | Iterated Learning Model

Investigating the consequences of iterated learning in phonological typology

Coral Hughto

University of Massachusetts Amherst

Society for Computation in Linguistics (SCiL) 6 January 2018

Coral Hughto UMass Amherst SCiL 2018 Investigating the consequences of iterated learning in phonological typology 1 / 32

slide-2
SLIDE 2

Introduction Previous work: Interactive Learning Model Iterated Learning Model

Introduction

Traditional goal of typology: predict divide between attested and unattested patterns

Grammar should be able to represent all and only attested patterns

Some recent work combines a theory of grammar with a theory of learning to generate probabilistic typological predictions (Pater 2012, Staubs 2014, Stanton 2016, O'Hara 2018, among others)

This approach draws on differences in learnability to explain differences in frequency of attestation

SLIDE 3

In This Talk

I examine the predictions of combining Maximum Entropy (MaxEnt; Goldwater & Johnson 2003) grammar with one of two agent-based learning models

Reviewing previous work with the Interactive learning model
Introducing follow-up work with the Iterated learning model

Emergent learning biases from both learning models:

Bias away from constraint cumulativity (gang effects)
Bias away from variability (such that agents accumulate probability on one output per input)

See Zuraw (2016) on Polarized Variation

With the Iterated learning model, the bias away from variability only occurs at longer learning times

SLIDE 4

MaxEnt

My (and much other) work assumes a weighted-constraint grammatical theory as its base (but see Stanton 2016)

weights: X = 3, Y = 2

/In1/   X   Y   H   p
→A          1   2   0.73
 B      1       3   0.27

/In2/   X   Y   H   p
→C      1       3   0.73
 D          2   4   0.27

Harmony score (H) = weighted sum of constraint violations

H(x) = Σᵢ w(Cᵢ) · Cᵢ(x)

Probability (p) = proportion of the exponentiated (negated) Harmony out of the sum over the competing candidate set

p(x) = e^(-H(x)) / (e^(-H(x)) + e^(-H(y)) + e^(-H(z)) + ...)
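As a sanity check on the tableau above, here is a minimal sketch of the MaxEnt computation (the function name `maxent_probs` and the dictionary layout are mine; the weights and violations are the slide's):

```python
import math

def maxent_probs(weights, violations):
    """MaxEnt candidate probabilities: H(x) is the weighted sum of
    constraint violations, and p(x) = exp(-H(x)) / Z over the candidate set."""
    harmonies = {cand: sum(weights[c] * n for c, n in viols.items())
                 for cand, viols in violations.items()}
    z = sum(math.exp(-h) for h in harmonies.values())
    return {cand: math.exp(-h) / z for cand, h in harmonies.items()}

# /In1/ from the slide: w(X) = 3, w(Y) = 2;
# A violates Y once (H = 2), B violates X once (H = 3).
p = maxent_probs({"X": 3, "Y": 2}, {"A": {"Y": 1}, "B": {"X": 1}})
# p["A"] ≈ 0.73, p["B"] ≈ 0.27, matching the tableau
```

The lower-harmony candidate A gets the larger share of probability, but B keeps a nonzero share, which is what lets MaxEnt grammars represent variable outputs.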

SLIDE 5

Gang Effects

weights: X = 3, Y = 2

/In1/   X   Y   H   p
→A          1   2   0.73
 B      1       3   0.27

/In2/   X   Y   H   p
→C      1       3   0.73
 D          2   4   0.27

Weighted-constraint grammars allow for cumulative constraint interaction (a.k.a. gang effects): multiple violations of (a) lower-weighted constraint(s) can cumulatively outweigh one violation of a higher-weighted constraint

SLIDE 6

Gang Effects

weights: X = 3, Y = 2

/In1/   X   Y   H   p
→A          1   2   0.73
 B      1       3   0.27

/In2/   X   Y   H   p
→C      1       3   0.73
 D          2   4   0.27

This property of weighted-constraint grammars has been criticized for overpredicting the space of typological possibilities (e.g. Legendre et al. 2006, but see Pater 2009)

Despite the overprediction, the extra representational power may be desirable, e.g.:

stress windows (Staubs 2014)
"general-case" neutralization (Hughto and Pater 2017)

SLIDE 7

Previous work: Hughto and Pater 2017

How to limit overprediction of gang effects with weighted constraints?

Perhaps considerations of learnability

Gang effect patterns require a particular balance between the constraint weights

Paired MaxEnt with an agent-based, interactive learning model to generate gradient typological predictions

Interactive learning model: simulated learning agents play a kind of imitation game

SLIDE 8

Previous work: Hughto and Pater 2017

In the interactive learning model, two agents take turns in the roles of teacher and learner

Agents know: constraints, initial weights, inputs and corresponding output candidates

There is no target grammar

In each run of the simulation, the agents exchange data for some number of learning steps: A1 ↔ A2

Agents' final grammars are categorized as belonging to a pattern in the typology

The distribution of languages learned across multiple runs is taken as the predicted typology

SLIDE 9

Palatalization Typology

Palatalization typology: possible contrast patterns between /s/ and /S/ (before [i] vs. other vowels, e.g. [a]) (Carroll 2012)

Constraints: No[S], No[si], Ident

With these constraints, 5 possible patterns:

(44%) Total Neutralization

[si], [sa]

(37%) Full Contrast

[si], [Si], [sa], [Sa]

(10.3%) Complementary Distribution

[Si], [sa]

(8.2%) Special-Case Neutralization

[Si], [sa], [Sa]

(0.5%) General-Case Neutralization (gang effect)

[si], [Si], [sa]

SLIDE 10

General-Case Neutralization (GCN; gang effect)

weights: No[S] = 3, No[si] = 2, Ident = 2

/sa/    No[S]   No[si]   Ident   H
→sa                              0
 Sa     1                1       5

/Sa/    No[S]   No[si]   Ident   H
→sa                      1       2
 Sa     1                        3

/si/    No[S]   No[si]   Ident   H
→si             1                2
 Si     1                1       5

/Si/    No[S]   No[si]   Ident   H
 si             1        1       4
→Si     1                        3

SLIDE 11

Results: Avoids gang effect

Zero: Agents initialized with constraint weights at zero
Random: Agents initialized with sampled weights, 0-10
Sampling: Just sampling constraint weights, no interaction

Type                 Observed   Zero    Random   Sampling
Total Neut.          44%        46.6%   25.7%    16.8%
Full Contrast        37%        48%     47.5%    41.3%
Comp. Dist.          10.3%      2.6%    7.7%     8.3%
Contextual Neut.     8.2%       2.7%    8%       8.4%
General-case Neut.   0.5%       0.1%    11.1%    25%
r²                              0.96    0.63     0.17

SLIDE 12

Discussion

Combining MaxEnt + learning model:

Keeps the representational power of weighted constraints

Restricts typological overprediction by assigning low probability to typologically rare or unobserved patterns, including gang effects

The Interactive learning model additionally tends towards accumulating probability on one output candidate over its competitors

Effects are robust across the different parameter settings tested

Potential issue: in the interactive learning model, agents are not working towards a target grammar

Do these biases still emerge in a model where agents are tasked with learning a target grammar?

SLIDE 13

Iterated Learning Model

Staubs 2014: Iterated learning reduced the predicted probability of gang effects in stress window systems

The Iterated learning model approximates the transmission of a language across generations

One agent serves as the "teacher" (the target grammar) for a "learner" agent

After a period of learning, the learner becomes the teacher for a new learner, and the process repeats for some number of generations: A1 → A2, then A2 → A3, then A3 → A4 ...

SLIDE 14

How it works

A1 → A2, then A2 → A3, then A3 → A4 ...

Each agent begins with a set of initial constraint weights (e.g. zero, or randomly sampled)

In each learning step:

An input is randomly selected, and each agent samples an output according to its current grammar

If the outputs are different, the learner updates its constraint weights using the Perceptron update rule (see also Stochastic Gradient Descent, HG-GLA)

New Weights = Old Weights + (Teacher's Violations − Learner's Violations) × Learning Rate

From initial teacher to final learner = 1 run of the simulation

The distribution of languages learned across multiple runs of the simulation is taken as the predicted typology
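The learning step described above can be sketched in Python. This is my reconstruction, not the author's code: the tableaux are the minimal working example from later slides, the names (`sample_output`, `perceptron_update`, `generation`) are mine, weights are clipped at zero (an assumption), and because violations here are positive counts the update's sign is flipped relative to the slide's formula (the two agree if violations are recorded as negative scores, a common Harmonic Grammar convention):

```python
import math
import random

rng = random.Random(1)

# Minimal-example tableaux (from later slides): violation counts of (X, Y).
TABLEAUX = {
    "In1": {"A": (0, 1), "B": (1, 0)},
    "In2": {"C": (1, 0), "D": (0, 2)},
}

def sample_output(weights, cands):
    """Sample a candidate with probability proportional to exp(-H)."""
    names = list(cands)
    ps = [math.exp(-sum(w * v for w, v in zip(weights, cands[n]))) for n in names]
    z = sum(ps)
    return rng.choices(names, weights=[p / z for p in ps])[0]

def perceptron_update(weights, teacher_viols, learner_viols, rate=0.1):
    """Perceptron update with violations as positive counts: raise the
    weights the learner's form violated, lower those the teacher's did."""
    return [max(0.0, w + rate * (lv - tv))
            for w, tv, lv in zip(weights, teacher_viols, learner_viols)]

def generation(teacher_w, steps=1000, init=(0.0, 0.0), rate=0.1):
    """One teacher -> learner generation of the iterated model."""
    learner_w = list(init)
    for _ in range(steps):
        cands = TABLEAUX[rng.choice(sorted(TABLEAUX))]
        t_out = sample_output(teacher_w, cands)
        l_out = sample_output(learner_w, cands)
        if t_out != l_out:
            learner_w = perceptron_update(learner_w, cands[t_out], cands[l_out], rate)
    return learner_w
```

A full run chains generations from an initial teacher, e.g. `w = [3.0, 2.0]`, then `for _ in range(50): w = generation(w)`.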

SLIDE 15

Minimal Working Example

/In1/   X   Y
 A          1
 B      1

/In2/   X   Y
 C      1
 D          2

Three possible patterns:

BC : w(Y) > w(X)
AD : w(X) > 2w(Y)
AC : 2w(Y) > w(X) > w(Y) (Gang effect)
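These win conditions follow directly from comparing harmonies in the tableaux; a small sketch (the function name `pattern` is mine) that classifies a weight pair:

```python
def pattern(w_x, w_y):
    """Classify a weight pair by the winning candidate for each input.

    /In1/: A violates Y once (H = w_y), B violates X once (H = w_x).
    /In2/: C violates X once (H = w_x), D violates Y twice (H = 2 * w_y).
    The lower-harmony candidate wins (ties ignored).
    """
    win1 = "A" if w_y < w_x else "B"
    win2 = "C" if w_x < 2 * w_y else "D"
    return win1 + win2
```

For instance, `pattern(3, 2)` gives the gang-effect pattern AC: X outweighs Y, so A wins for /In1/, but two Y violations outweigh one X violation, so C wins for /In2/.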

SLIDE 16

Minimal Working Example

weights: X = 1, Y = 3 (pattern BC)

/In1/   X   Y   H
 A          1   3
→B      1       1

/In2/   X   Y   H
→C      1       1
 D          2   6

weights: X = 3, Y = 1 (pattern AD)

/In1/   X   Y   H
→A          1   1
 B      1       3

/In2/   X   Y   H
 C      1       3
→D          2   2

weights: X = 3, Y = 2 (pattern AC, gang effect)

/In1/   X   Y   H
→A          1   2
 B      1       3

/In2/   X   Y   H
→C      1       3
 D          2   4

Three possible patterns:

BC : w(Y) > w(X)
AD : w(X) > 2w(Y)
AC : 2w(Y) > w(X) > w(Y) (Gang effect)

SLIDE 17

Iterated Learning Simulations

The iterated learning model was run 1,000 times

Two initial constraint weight conditions were tested:

(Zero-Init) Agents' initial constraint weights were zero
(Rand-Init) Agents' initial constraint weights were randomly sampled from a uniform distribution between 0 and 10

Each learner agent learned from its teacher for 1,000 learning steps, and there were 50 generations in each run

Baseline prediction is the proportion of possible weights that generate each pattern type

BC: 0.5, AD: 0.25, AC: 0.25
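The baseline can be checked by Monte Carlo: sample weight pairs uniformly from [0, 10]² and count which pattern each generates. A sketch under the minimal example's win conditions (the names `pattern` and `baseline` are mine):

```python
import random

def pattern(w_x, w_y):
    """Winning candidates for the minimal example (lower harmony wins)."""
    win1 = "A" if w_y < w_x else "B"      # /In1/: H(A) = w_y, H(B) = w_x
    win2 = "C" if w_x < 2 * w_y else "D"  # /In2/: H(C) = w_x, H(D) = 2 * w_y
    return win1 + win2

def baseline(n=100_000, seed=0):
    """Estimate the proportion of uniform weight pairs in [0, 10]^2
    generating each pattern; BD requires w_y <= 0, so it has measure zero."""
    rng = random.Random(seed)
    counts = {"BC": 0, "AD": 0, "AC": 0, "BD": 0}
    for _ in range(n):
        counts[pattern(rng.uniform(0, 10), rng.uniform(0, 10))] += 1
    return {k: v / n for k, v in counts.items()}
```

The estimates come out near the stated 0.5 / 0.25 / 0.25 split, which can also be read off the geometry: the region w(Y) > w(X) is half the square, and the wedge w(Y) < w(X) < 2w(Y) splits the remaining half evenly with w(X) > 2w(Y).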

SLIDE 18

Simulation Results (1,000 learning steps)

Like the interactive learning model, the iterated learning model shows a bias away from the gang effect pattern

In both the Zero-Init and Rand-Init initial weighting conditions, the model reduced the predicted probability of the gang effect AC pattern, relative to the sampled baseline estimate

Pattern   Sampling   Zero-Init   Rand-Init
BC        0.50       0.55        0.55
AD        0.25       0.43        0.30
AC        0.25       0.03        0.15

SLIDE 19

Simulation Results (1,000 learning steps)

Results show a bias away from variation (graph shows Rand-Init)

[Figure: Average Probability of Winning Candidate by Generation (Rand-Init); y-axis 0.5-1.0, x-axis generations 5-50]

SLIDE 20

Simulation Results (200 learning steps)

In the iterated learning model, the bias away from variation is sensitive to the learning step parameter

With a shorter learning time of 200 learning steps per generation, the bias away from the gang effect AC pattern still emerges:

Language   Sampling   Rand-Init
BC         0.50       0.60
AD         0.25       0.24
AC         0.25       0.16

SLIDE 21

Simulation Results (200 learning steps)

However, the bias away from variation does not visibly emerge:

[Figure: Average Probability of Winning Candidate by Generation (200 learning steps); y-axis 0.5-1.0, x-axis generations 5-50]

SLIDE 22

Variation depends on number of learning steps?

Not sure why more learning steps correlate with decreasing variability across generations

More learning steps are expected to correlate with higher accuracy to the target distribution

So, more learning steps should mean less deviation from the initial distribution, not more

But the difference in accuracy achieved between 200 and 1,000 learning steps in this system isn't that large anyway

SLIDE 23

Target Accuracy

[Figure: Average Accuracy to Target by Learning Step; y-axis 0.00-1.00, x-axis learning steps 100-1,000]

SLIDE 24

Summary

Traditional goal of typology: predict the divide between attested and unattested patterns

A growing line of research additionally investigates the role of learning biases in shaping typology

Probabilistic typological predictions can be generated by combining a theory of grammar with a theory of learning

e.g. MaxEnt and an agent-based learning model

Both Interactive and Iterated learning models demonstrate:

Bias away from gang effects (cumulative constraint interaction)
Bias away from variation

The iterated learning model only produces a bias away from variation at higher learning step values

SLIDE 25

Thanks!

Thanks to Joe Pater, Gaja Jarosz, audiences at UMass, PhoNE, NECPhon, mfm, CLS, and everyone here.

SLIDE 26

Avoid variability (Interactive)

[Figure: Interactive - Average Probability of Winning Candidate by Learning Step; y-axis 0.5-1.0, x-axis learning steps 1,000-10,000]

SLIDE 27

Learning Simulations

The Interactive learning model was run 1,000 times

Agents were initialized with random constraint weights sampled from a uniform distribution ranging from 0 to 10

Agents interacted for 10,000 learning steps

Baseline prediction is the proportion of possible weights that generate each language type

BC: 0.5, AD: 0.25, AC: 0.25

SLIDE 28

Simulation Start

[Figure: scatter plot of agents' weights at simulation start, w(X) by w(Y), colored by pattern (AC, AD, BC)]

SLIDE 29

Simulation End

[Figure: scatter plot of agents' weights at simulation end, w(X) by w(Y), colored by pattern (AC, AD, BC)]

SLIDE 30

Simulation Results

[Figures: scatter plots of w(X) by w(Y) at simulation start and end, colored by pattern (AC, AD, BC)]

SLIDE 31

Simulation Results (Interactive)

[Figure: Language Proportions by Learning Step (Interactive); proportion of runs for BC, AD, AC over learning steps 2,500-10,000]

SLIDE 32

Simulation Results (Interactive)

[Figure: Language Proportions by Learning Step (Interactive); proportion of runs for BC, AD, AC over learning steps 2,500-10,000]
