SLIDE 1

Feature economy and iterated grammar learning

Joe Pater Robert Staubs

University of Massachusetts Amherst

21st Manchester Phonology Meeting

Joe Pater, Robert Staubs UMass Amherst 21mfm Feature economy and iterated grammar learning 1 / 32

SLIDE 2–4

Overview

  • Feature economy: an unsolvable problem in standard phonological theory
  • Featural simplicity and ease of learning with an incremental MaxEnt model
  • Feature economy and contrast in the output of iterated learning

SLIDE 5–8

The challenge of feature economy

First, a simple example (J. Kingston p.c., based on Maddieson and Precoda 1992):

            [b]   no [b]
  [g]       244     11
  no [g]     43    153

  χ² = 260, d.f. = 1, p < 0.01

  • Languages tend to have either both [b] and [g] or neither: it is especially unlikely for a language to have [g] without [b].
  • More generally, a segment is more likely if its feature values are shared by other segments.
  • In other words, languages tend toward feature economy (Martinet 1968; Clements 2003).
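The χ² statistic on this slide can be checked directly from the 2×2 table; a minimal sketch in plain Python (no stats library assumed):

```python
# Chi-square test of independence for the slide's 2x2 table of
# languages with/without [b] and [g] (Maddieson and Precoda's counts).
table = [[244, 11],    # has [g]:  with [b], without [b]
         [43, 153]]    # no  [g]:  with [b], without [b]

rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
n = sum(rows)

# Expected count under independence for cell (i, j) is rows[i] * cols[j] / n.
chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
           for i in range(2) for j in range(2))

print(round(chi2))  # 260, matching the slide (d.f. = 1, p < 0.01)
```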

SLIDE 9–11

The challenge of feature economy, cont'd

  • First difficulty: feature economy is a property of systems, not of individual representations or derivations.
  • How do we express the dependency of [b] on [g] and vice versa?
  • Standard phonological theories, be they rule- or constraint-based, do not provide a formal mechanism to express such systemic dependencies.

SLIDE 12–14

The challenge of feature economy, cont'd

  • Second difficulty: feature economy is a tendency.
  • Languages with [p k g] or [p k b] are rare, not unattested.
  • Standard phonological theories deal only with typological absolutes, not probabilities.

SLIDE 15–18

The challenge of feature economy, cont'd

  • Our claim: we do not need a new kind of phonological grammar to deal with feature economy.
  • Instead, we incorporate learning into typological explanation (as in fact suggested by Martinet 1968).
  • We'll first show how featurally simple systems are learned more quickly by an incremental MaxEnt learner with a conjunctive constraint schema.
  • We'll then show how this learning bias can probabilistically affect typology, using iterated learning / agent-based modeling.

SLIDE 19–22

Featural simplicity and learning bias

Simplicity bias in the learning of inventories in this space, a small representational universe:

            Voiced   Voiceless   Aspirated
  Labial      b          p          pʰ
  Coronal     d          t          tʰ
  Dorsal      g          k          kʰ

  • One-laryngeal-feature language: [b d g]
  • Two-laryngeal-feature language: [b t g]
  • Three-laryngeal-feature language: [b t kʰ]

SLIDE 23–28

Featural simplicity and learning bias, cont'd

  • "General" constraints target each feature (lab, cor, dor, vce, vcl, asp).
  • "Specific" constraints target each conjunction (e.g. lab ∧ vce = [b]).
  • Constraints can have negative or positive weight.
  • Probability of a representation is proportional to exp(Harmony) (as in Hayes and Wilson 2008).
  • Weights are adjusted incrementally to move probability onto the observed forms.
  • All as in Pater and Moreton 2012.
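The probability model just described can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the constraint dictionary and the single nonzero weight are assumptions chosen to show how a general constraint behaves.

```python
import math
from itertools import product

places = ["lab", "cor", "dor"]
laryngeals = ["vce", "vcl", "asp"]

# Illustrative weights only: a positive weight on the general vce constraint
# favors the entire voiced series [b d g] at once; a specific (conjunctive)
# constraint like ("lab", "vce") would instead target the single segment [b].
weights = {"vce": 1.5}   # every unlisted constraint has weight 0

def harmony(place, lar):
    return (weights.get(place, 0.0)            # general place constraint
            + weights.get(lar, 0.0)            # general laryngeal constraint
            + weights.get((place, lar), 0.0))  # specific conjunction

segments = list(product(places, laryngeals))
z = sum(math.exp(harmony(*s)) for s in segments)          # normalizer
prob = {s: math.exp(harmony(*s)) / z for s in segments}   # P ∝ exp(Harmony)

# One general constraint raises all three voiced segments together:
print(prob[("lab", "vce")] > prob[("lab", "vcl")])  # True
```

This is why featurally economical inventories are easier to learn here: one general weight does the work of three specific ones.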

SLIDE 29

[Figure: probability of the observed data by epoch, rising from 0.33 toward 0.93 over ~145 epochs, for the three languages [b d g], [b t g], and [b t kʰ].]

SLIDE 30

[Figure: [b d g] constraint weights by epoch, roughly −1.5 to 2 over 150 epochs; traces for vce, g, b, d, k, K, p, P, t, T, asp, vcl.]

SLIDE 31

[Figure: [b t g] constraint weights by epoch, roughly −1.5 to 2 over 150 epochs; traces for vce, t, g, b, vcl, K, P, T, k, p, d, asp.]

SLIDE 32

Iterated learning

  • Iterated learning models give an idealized view of language change: agents learn languages from one another in succession (Hare and Elman 1995; see overviews in Zuraw 2003, Wedel 2011).
  • Probabilistic biases of learning are reflected in the predicted frequency of resulting languages.

SLIDE 33

  • We use just two agents in our iterated learning simulations.
  • Both agents learn from random data for 100 iterations.

"Production"

  1 An agent is chosen at random.
  2 That agent chooses a meaning and produces a pronunciation according to its (Maximum Entropy) grammar.

SLIDE 34

Pairs of words with three potential pronunciations each:

  Three pairs
  Meanings               Pronunciations
  Meaning 1, Meaning 2   bi  pi  pʰi
  Meaning 3, Meaning 4   di  ti  tʰi
  Meaning 5, Meaning 6   gi  ki  kʰi

  • General and specific constraints on consonantal place and laryngeal features, as in the earlier simulation.
  • Constraints demanding mappings between meaning and sound (e.g. M1 → [pi]).

SLIDE 35

  • The agent that is "listening" does not know the intended meaning; it must be inferred.
  • We use a version of Robust Interpretive Parsing (Tesar and Smolensky 2000): the agent chooses the meaning that would be most likely to yield the given pronunciation according to its own grammar.
  • The learner then uses the perceived meaning to produce its own pronunciation.

SLIDE 36

The learner's grammar is updated if its pronunciation does not match its teacher's.

One step of iterated learning:

  Teacher production:       M1 → a
  Learner interpretation:   a → M2
  Learner production:       M2 → b
  Update (output is not a): M2 → a ↑, M2 → b ↓
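The production / interpretation / update cycle above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it keeps only the meaning-to-sound mapping constraints (the featural constraints are omitted), and the names `step` and `probs` are made up for the sketch.

```python
import math
import random

# Two meanings share three candidate pronunciations; each agent's grammar is
# a weight for every meaning->sound mapping constraint (assumed simplification).
SOUNDS = ["a", "b", "c"]
MEANINGS = ["M1", "M2"]

def probs(w, meaning):
    """MaxEnt production probabilities for one meaning."""
    exps = {s: math.exp(w[(meaning, s)]) for s in SOUNDS}
    z = sum(exps.values())
    return {s: exps[s] / z for s in SOUNDS}

def step(teacher_w, learner_w, meaning, rate=0.1, rng=random):
    # Teacher production: sample a pronunciation for the chosen meaning.
    p = probs(teacher_w, meaning)
    heard = rng.choices(SOUNDS, weights=[p[s] for s in SOUNDS])[0]
    # Interpretation (RIP): the meaning most likely to yield what was heard.
    guess = max(MEANINGS, key=lambda m: probs(learner_w, m)[heard])
    # Learner production for that meaning; perceptron-style update on mismatch.
    out = max(SOUNDS, key=lambda s: probs(learner_w, guess)[s])
    if out != heard:
        learner_w[(guess, heard)] += rate   # reward:   guess -> heard
        learner_w[(guess, out)] -= rate     # penalize: guess -> out
    return heard, guess, out

teacher = {(m, s): 0.0 for m in MEANINGS for s in SOUNDS}
teacher[("M1", "a")] = 5.0                  # teacher strongly prefers M1 -> a
learner = {(m, s): 0.0 for m in MEANINGS for s in SOUNDS}
heard, guess, out = step(teacher, learner, "M1")
```

Note that the update touches only the meaning the learner inferred, which is exactly why the interpretation step matters for the contrast effects discussed later.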

SLIDE 37

Emergent effects

Resulting languages:

  1 Have contrasting pronunciations for meanings.
  2 Use near-categorical pronunciations.
  3 Distinguish meanings in an economical way.

SLIDE 38

Homophony avoidance: one meaning per sound

  • Out of 10,000 runs, 9,979 distinguished all 6 meanings.
  • There are three meaning pairs, with three pronunciations available to each. Full contrast is thus expected by chance only (2/3)³ = 8/27 ≈ 29.63% of the time.
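The 8/27 chance baseline can be confirmed by brute-force enumeration of the random pronunciation choices:

```python
from itertools import product

# Each of the 3 meaning pairs independently picks two pronunciations from its
# own 3-candidate set; full contrast means the two members of every pair differ.
pair_outcomes = list(product(range(3), repeat=2))           # 9 outcomes per pair
contrastive = sum(1 for x, y in pair_outcomes if x != y)    # 6 of the 9 differ

p_full_contrast = (contrastive / len(pair_outcomes)) ** 3   # 3 independent pairs
print(p_full_contrast)  # 0.2962... = 8/27, as on the slide
```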

SLIDE 39

Categorical results: one sound per meaning

  • With three candidates and initial zero weights, starting probabilities are 33.33%.
  • In 9,979 of the runs, all 6 meanings had a candidate with probability > 50%.

SLIDE 40–44

Economy

  • One way to measure economy is the number of distinct patterns of contrast used.
  • For example: voiced vs. voiceless alone suffices to distinguish all 3 pairs. (Maximally economical.)
  • Alternatively, voiced vs. voiceless, voiceless vs. aspirated, and voiced vs. aspirated could each be used. (Minimally economical.)

SLIDE 45

Results compared with chance (n = 10,000):

              1 contrast   2 contrasts   3 contrasts
  Simulation    32.14%       57.96%        9.69%
  Chance        11.11%       66.66%       22.22%
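The "Chance" row can be derived by enumerating the 27 equiprobable assignments of one of the three contrast patterns to each of the three pairs:

```python
from collections import Counter
from itertools import product

# Each of the 3 meaning pairs picks one of 3 contrast patterns at random;
# count how many distinct patterns each of the 27 assignments uses.
counts = Counter(len(set(assign)) for assign in product(range(3), repeat=3))
total = sum(counts.values())  # 27 assignments in all

for k in (1, 2, 3):
    print(k, "contrasts:", f"{100 * counts[k] / total:.2f}%")
```

This yields 3/27 ≈ 11.11%, 18/27 ≈ 66.67%, and 6/27 ≈ 22.22%; the slide's 66.66% reflects truncation of the repeating decimal.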

SLIDE 46

Explaining contrast and categorical results

  • A tendency toward a categorical choice of sound for a meaning occurs when agents learn from one another.
  • The interpretation step of iterated learning is important to both effects: a learner must decide which meaning was intended in order to update its grammar.
  • Use of Robust Interpretive Parsing (or something similar) pushes learners to choose categorical, distinct pronunciations.

SLIDE 47

RIP is based on preferences of the following type: pronunciation x is more likely to originate from meaning y.

Two logically possible situations:

  Case I   Meaning 1 preferred for pronunciation a; Meaning 2 preferred for b and c.
  Case II  Meaning 1 not preferred for any pronunciation; Meaning 2 preferred for a, b, and c.

SLIDE 48

Case I: prefers 1: a; 2: b, c

  Hear        a                 b                 c
  Interpret   1                 2                 2
  Rewards     1→a               2→b               2→c
  Penalizes   1→b, 1→c          2→a, 2→c          2→a, 2→b
  Prefers     1: a↑; 2: b↑, c↑  1: a↑; 2: b↑, c↓  1: a↑; 2: b↓, c↑

  • a is interpreted as Meaning 1. This is reinforced, and the other pronunciations become more preferred for Meaning 2.
  • b or c is interpreted as Meaning 2. This is reinforced.
  • ⇒ The pronunciation of Meaning 1 becomes fixed and distinct from Meaning 2's pronunciations.

SLIDE 49

Case II: prefers 1: ∅; 2: a, b, c

  Hear        a              b              c
  Interpret   2              2              2
  Rewards     2→a            2→b            2→c
  Penalizes   2→b, 2→c       2→a, 2→c       2→a, 2→b
  Prefers     2: a↑, b↓, c↓  2: a↓, b↑, c↓  2: a↓, b↓, c↑

The preference for interpreting one pronunciation as Meaning 2 decreases to the point where that pronunciation is better interpreted as Meaning 1. This reduces to Case I.

SLIDE 50

  • If homophony is not possible across candidate sets, meanings are distinguished much less often.
  • For example, if the pronunciations of two words differ in their vowels, their consonants are not under pressure to be distinct.
  • In such a simulation only 1,364 pairs were distinguished by their consonants (compare with 29,937 in the main simulation).

SLIDE 51–54

Feature economy in iterated learning

  • More economical systems rely on more general constraints, as shown earlier.
  • These constraints allow faster updates in probability in iterated learning.
  • A pronunciation is more likely to gain advantage – or to capitalize on chance – if it is helped by a general constraint.
  • The perception/production process can work across meaning pairs through general constraints.

SLIDE 55

Reduction to chance without general constraints (n = 10,000):

                            1 contrast   2 contrasts   3 contrasts
  General constraints         32.14%       57.96%        9.69%
  No general constraints      10.93%       65.99%       22.67%
  Chance                      11.11%       66.66%       22.22%

SLIDE 56–58

  • Feature economy and contrast emerge in the output of iterated learning without any principles specifically demanding economy or contrast.
  • Boersma and Hamann (2008) demonstrate that dispersion can also emerge from agent interaction.
  • This suggests that phonological grammars might not need to evaluate whole systems (cf. especially Flemming's 1995 constraints on dispersion and contrast).

SLIDE 59–60

Further work

  • Mackie and Mielke (2011) show that some feature economy effects emerge without distinctive features; what is the relationship between our models?
  • An account of non-economical features: e.g. Kingston 1995 points out that place is not particularly economical across phonation types.

SLIDE 61

Acknowledgments

This research was supported by grant BCS-0813829 from the National Science Foundation to the University of Massachusetts Amherst and by an NSF Graduate Research Fellowship to Robert Staubs. Thanks to John Kingston and Elliott Moreton for discussion.

SLIDE 62

  • Use of the same grammar for production and perception means that "ties" are broken sooner.
  • In Case I the ties are the ambiguous updates from b and c; in Case II this holds for all three pronunciations.
  • Eventually one pronunciation will dominate by random chance; biasing production probabilities helps this happen faster.