A Joint Learning Model of Word Segmentation, Lexical Acquisition and Phonetic Variability
Micha Elsner Sharon Goldwater Naomi Feldman Frank Wood
The Ohio State University, University of Edinburgh, University of Maryland and Oxford University
◮ The infant learner hears an unsegmented stream of speech
◮ And is sensitive to repeated sequences
◮ We follow (Goldwater, Griffiths, Johnson ‘09) (GGJ)
◮ Basic idea since (Brent ‘99)
◮ Many extensions since
◮ Word boundaries from phonotactics: (Fleck ‘08, ...)
◮ Word-like units from acoustics: (Park+al ‘08, ...)
◮ “Intended form” /want/ ends up as surface variants like [wan]
◮ Lowers overall performance of GGJ...
◮ And changes qualitative results
◮ Learn syllables or morphemes instead of words
◮ “youlike”, “wantto”
◮ Production evidence: early words show up as collocations
◮ Infants don’t produce subwords
◮ Segments words
◮ Clusters word tokens into lexical entries
◮ Infers a model of phonetic variation
◮ ...on a broad-coverage corpus
◮ (Feldman+al ‘09, ‘13): vowel learning (fixed lexicon)
◮ (Driesen+al ‘09, Rasanen ‘11): words and sounds
◮ (Börschinger+al ‘13): segmentation and phonetics
◮ (Neubig+al ‘10): LM from phone lattices (eval ...)
◮ (Elsner+al ‘12): two-stage pipeline
◮ Standard problem with pipelines: errors propagate from stage to stage
◮ Not a good cognitive model: doesn’t capture interaction between the two tasks
◮ Type-level inference doesn’t scale to ...
◮ Infants form collocations ...and have trouble with vowel-initial words
◮ Infants learn consonants better ...and underestimate variation
◮ Short, frequent words are hard
◮ Generator for possible words: a, b, ..., ju, ..., want, ..., juwant, ...
◮ Probabilities for each word (sparse): p(ði) = .1, p(a) = .05, p(want) = .01, ...
◮ Conditional probabilities for each word after each word (∞ contexts): p(ði | want) = .3, p(a | want) = .1, p(want | want) = .0001, ...
◮ Intended forms (n utterances): ju want ə kuki / ju want ɪt / ...
◮ Surface forms: jə wan ə kuki / ju wand ɪt / ... (generative sketch below)
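As a toy illustration of this generative story, the sketch below samples intended word sequences from a hand-set bigram table and corrupts each character through a substitution-only noise channel. The lexicon, probabilities, and variant sets are all invented for illustration; the actual model places Dirichlet-process priors over an open vocabulary.

```python
import random

random.seed(0)

# Bigram conditional probabilities p(word | previous word); "<s>" starts
# an utterance, "</s>" ends one.  All numbers are invented.
bigram = {
    "<s>":  {"ju": 0.9, "want": 0.1},
    "ju":   {"want": 0.8, "</s>": 0.2},
    "want": {"@": 0.5, "It": 0.5},
    "@":    {"kuki": 0.9, "</s>": 0.1},
    "kuki": {"</s>": 1.0},
    "It":   {"</s>": 1.0},
}

# Substitution-only noise channel: each intended character may surface
# as one of a few variants (invented for illustration).
variants = {"u": ["u", "@"], "t": ["t", "d"], "a": ["a", "@"]}

def sample_next(prev):
    """Draw the next word from p(. | prev)."""
    words, probs = zip(*bigram[prev].items())
    return random.choices(words, weights=probs)[0]

def sample_utterance():
    """Sample an intended word sequence, then corrupt it per character."""
    words, prev = [], "<s>"
    while True:
        w = sample_next(prev)
        if w == "</s>":
            break
        words.append(w)
        prev = w
    intended = " ".join(words)
    surface = "".join(random.choice(variants.get(c, [c])) for c in intended)
    return intended, surface

for _ in range(3):
    print(sample_utterance())
```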
◮ Independently rewrites each character
◮ Log-linear features based on articulation
◮ Can insert (→ h) but not delete (h →)
◮ Similar to (Neubig, Elsner, Börschinger) but simpler
◮ Initialize with simple model (a → a)
◮ Learn via EM (sketch below)
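A minimal sketch of such a log-linear rewrite distribution, assuming an invented feature set ("identity", "same place", "same voicing") in place of the paper's articulatory features; the hand-set weights stand in for what EM would estimate from expected intended/surface counts.

```python
import math

# Invented articulatory feature table for a few phones.
PHONES = {
    "t": {"place": "alveolar", "voiced": False},
    "d": {"place": "alveolar", "voiced": True},
    "p": {"place": "labial", "voiced": False},
    "b": {"place": "labial", "voiced": True},
}

def features(intended, surface):
    """Binary features comparing intended and surface characters."""
    fi, fs = PHONES[intended], PHONES[surface]
    return {
        "identity": intended == surface,
        "same_place": fi["place"] == fs["place"],
        "same_voicing": fi["voiced"] == fs["voiced"],
    }

def rewrite_prob(intended, weights):
    """Softmax over candidate surface characters: p(surface | intended)."""
    scores = {
        s: math.exp(sum(w for f, w in weights.items() if features(intended, s)[f]))
        for s in PHONES
    }
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}

# Hand-set weights favouring faithful realisations; EM would re-estimate
# these from expected counts of (intended, surface) character pairs.
weights = {"identity": 3.0, "same_place": 1.0, "same_voicing": 0.5}
print(rewrite_prob("t", weights))
```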
◮ Character-by-character Gibbs likely to get stuck
◮ Following previous work, block-sample whole utterances
◮ Semi-Markov formulation of GGJ (Mochihashi+al ‘09)
◮ Composition with transducer yields a large state space (sketch below)
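The semi-Markov dynamic program has the forward-filtering, backward-sampling shape sketched below, shown here with a toy unigram word score (the real system scores words with the bigram model composed with the transducer); word_prob and MAX_LEN are invented stand-ins.

```python
import math, random

random.seed(0)
MAX_LEN = 5  # maximum word length considered

def word_prob(w, lexicon):
    """Toy word score: known words are likely, unknown ones penalised."""
    return lexicon.get(w, 1e-3 * 0.5 ** len(w))

def sample_segmentation(s, lexicon):
    n = len(s)
    # alpha[i] = total probability of all segmentations of s[:i]
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):
            alpha[i] += alpha[j] * word_prob(s[j:i], lexicon)
    # Backward pass: sample each boundary proportional to forward mass.
    words, i = [], n
    while i > 0:
        cands = list(range(max(0, i - MAX_LEN), i))
        wts = [alpha[j] * word_prob(s[j:i], lexicon) for j in cands]
        j = random.choices(cands, weights=wts)[0]
        words.append(s[j:i])
        i = j
    return list(reversed(words))

lexicon = {"ju": 0.2, "want": 0.2, "it": 0.2}
print(sample_segmentation("juwantit", lexicon))
```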
◮ Infants form collocations ...and have trouble with vowel-initial words
◮ Infants learn consonants better ...and underestimate variation
◮ Short, frequent words are hard
◮ Use: Bernstein-Ratner (child-directed) (Bernstein-Ratner ‘87)
◮ Buckeye (closely transcribed) (Pitt+al ‘07)
◮ Sample pronunciation for each BR word from Buckeye (sketch below)
◮ No coarticulation between words
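A sketch of this corpus construction, with a toy pronunciation dictionary in place of the real Buckeye transcriptions; sampling each token independently is what "no coarticulation between words" amounts to.

```python
import random

random.seed(0)

# word -> list of transcribed pronunciations (toy stand-in for Buckeye).
pron_dict = {
    "you":  ["ju", "j@"],
    "want": ["want", "wan", "wand"],
    "it":   ["It", "Id"],
}

def make_variable_utterance(words):
    """Concatenate sampled pronunciations with no word boundaries;
    each token is corrupted independently (no cross-word effects)."""
    return "".join(random.choice(pron_dict[w]) for w in words)

print(make_variable_utterance(["you", "want", "it"]))
```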
◮ GGJ on clean data has high precision, low recall
◮ On variable data, the tradeoff flips (as in Fleck ‘08) (metric sketch below)
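For reference, boundary precision and recall (the quantities traded off here) can be computed as in this sketch; the definitions are standard and the utterance strings are toy examples.

```python
def boundary_prf(gold, pred):
    """gold/pred are segmentations of the same character string,
    e.g. 'ju want it'; boundaries are positions between characters."""
    def boundaries(seg):
        pos, out = 0, set()
        for w in seg.split()[:-1]:
            pos += len(w)
            out.add(pos)
        return out
    g, p = boundaries(gold), boundaries(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 1.0
    rec = tp / len(g) if g else 1.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

print(boundary_prf("ju want it", "juwant it"))  # (1.0, 0.5, 0.667)
```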
◮ Our inference scheme works
◮ Confidence intervals overlap
◮ Segmentation with transducer trades recall for precision
◮ Moving closer to original qualitative results
◮ Correct boundaries and lexical item
◮ Correct boundaries, wrong lexical item: ju analyzed as jEs
◮ Collocation: boundaries are real but too wide: real ju•want
◮ Split: dOgiz as dO•giz
◮ One boundary: ju•wa...
◮ Just plain wrong (classifier sketch below)
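Read as a span-matching procedure, the taxonomy might be implemented as in the sketch below, which classifies each predicted token against the gold spans. The category tests are reconstructed from the slide's descriptions, not taken from the paper's scoring code; "wrong lexical item" cannot be detected from spans alone.

```python
def spans(seg):
    """Character-offset spans of each word in a segmented string."""
    out, pos = [], 0
    for w in seg.split():
        out.append((pos, pos + len(w)))
        pos += len(w)
    return out

def classify(gold_seg, pred_seg):
    """Label each predicted token with an error category from the slide."""
    gold = spans(gold_seg)
    starts = {s for s, _ in gold}
    ends = {e for _, e in gold}
    cats = []
    for s, e in spans(pred_seg):
        inside = any(gs <= s and e <= ge for gs, ge in gold)
        if (s, e) in gold:
            cats.append("correct boundaries")  # form may still be wrong
        elif s in starts and e in ends:
            cats.append("collocation")         # real boundaries, too wide
        elif inside:
            cats.append("split")               # a piece of one gold word
        elif s in starts or e in ends:
            cats.append("one boundary")
        else:
            cats.append("wrong")
    return cats

print(classify("ju want it", "juwant it"))  # ['collocation', 'correct boundaries']
print(classify("dOgiz", "dO giz"))          # ['split', 'split']
```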
◮ “Wrong form” errors could be repaired later, since the boundaries are correct
◮ ...but collocation and split errors cannot be
◮ Infants are slow to segment vowel-initial words
◮ Initial vowels often variable, resyllabified
◮ Transducer system has trouble with vowels...
◮ More likely to find a collocation, less likely to find the correct word
◮ Infants learn consonant categories slower
◮ Non-native vowel contrasts lost by 8 months (Kuhl+al)
◮ Consonant contrasts by 10-12 months (Werker+Tees)
◮ Generalize across talkers/dialects slowly (Houston+Jusczyk, Singh)
◮ u and D are around equally variable (entropy sketch below)
◮ But model learns variants of u better
◮ In general, model underestimates true variability
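One simple way to quantify "equally variable" is the entropy of each phone's surface-variant distribution, as in this sketch; the counts here are invented, whereas the paper's measurements would come from aligned intended/surface transcriptions.

```python
import math

def variant_entropy(counts):
    """Entropy (bits) of the surface-variant distribution for one phone."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Toy counts of surface realisations for intended /u/ and /D/: same
# distribution shape, hence equal variability by this measure.
u_variants = {"u": 70, "@": 20, "U": 10}
D_variants = {"D": 70, "d": 20, "n": 10}
print(variant_entropy(u_variants), variant_entropy(D_variants))
```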
◮ Little known about infant misrecognitions
◮ Adults misrecognize things... (Butterfield+Cutler)
◮ Incorrect hypothesis contains frequent words
◮ Indefinite article is hard (Kim+al, Dilley+Pitt)
◮ two/to, can/can’t, and/an, his/is
◮ it/it’s/is, a/is, who/who’s/whose, that’s/what’s, there/there’s
◮ Replicates several experimental results
◮ Broad-coverage, naturalistic corpus
◮ Token-based sampling can extend to ...
◮ Cross-linguistic?
◮ Paper: ACL archive
◮ Code: bitbucket.org/melsner/beamseg