A Joint Learning Model of Word Segmentation, Lexical Acquisition and Phonetic Variability
Micha Elsner, Sharon Goldwater, Naomi Feldman, Frank Wood
The Ohio State University, University of Edinburgh, University of Maryland and Oxford University


1. A Joint Learning Model of Word Segmentation, Lexical Acquisition and Phonetic Variability
Micha Elsner, Sharon Goldwater, Naomi Feldman, Frank Wood
The Ohio State University, University of Edinburgh, University of Maryland and Oxford University
October 18, 2013

2. Infant word learning

youwanttoseethebook lookthere’saboywithhishat andadoggie youwanttolookatthis lookatthis haveadrink takeitout youwantitin putthaton that yes okay openitup takethedoggieout ithinkitwillcomeout what daddy wherediditgo youwantthatone daddy i’llgogetyourblock what’sthatalice what’sthatablock that that’satelephone that’sthephone say hello youwanttospeaktoalice sayhello what’s youhavetotellme block youwanttheblocks

◮ The infant learner hears a stream of utterances...

3. Infant word learning

you want toseethebook lookthere’saboywithhishat andadoggie you want tolookatthis lookatthis haveadrink takeitout you want itin putthaton that yes okay openitup takethedoggieout ithinkitwillcomeout what daddy wherediditgo you want thatone daddy i’llgogetyourblock what’sthatalice what’sthatablock that that’satelephone that’sthephone say hello you want tospeaktoalice sayhello what’s youhavetotellme block you want theblocks

◮ The infant learner hears a stream of utterances...
◮ And is sensitive to repeated sequences (here, the recurring “you want”)

4. Models have been very successful...

Lexical models (goal: learn lexicon and LM)
◮ We follow: (Goldwater, Griffiths, Johnson ‘09) (GGJ)
◮ Basic idea since (Brent ‘99)
◮ Many extensions since

Non-lexical models
◮ Word boundaries from phonotactics: (Fleck ‘08, Rytting ‘07, Daland+al ‘10)
◮ Word-like units from acoustics: (Park+al ‘08, Aimetti ‘09, Jansen+al ‘10)

5. But lexical models handle phonetics poorly
◮ “Intended form” /want/ ends up as [wan] or [wãʔ]
◮ Lowers overall performance of GGJ...
◮ And changes qualitative results
◮ Learns syllables or morphemes instead of words (Fleck ‘08)

Real infants learn collocations
◮ Sequences learned as words (Peters, Tomasello): “youlike”, “wantto”
◮ Production evidence: early words show up in fixed multi-word contexts
◮ Infants don’t produce subwords

6. Our work: this paper

Model jointly:
◮ Segments words
◮ Clusters word tokens into lexical entries
◮ Infers a model of phonetic variation
◮ ...on a broad-coverage corpus

7. Research context

Previous models integrate lexical/phonetic learning...
◮ (Feldman+al ‘09, ‘13): vowel learning (fixed lexicon)
◮ (Driesen+al ‘09, Räsänen ‘11): words and sounds (tiny datasets)
◮ (Börschinger+al ‘13): segmentation and phonetics (only t-deletion)
◮ (Neubig+al ‘10): LM from phone lattices (eval phone recognition only)
◮ (Elsner+al ‘12): two-stage pipeline

8. Last year... Elsner+al ‘12

(Pipeline figure:)
Messy data: jəwãʔwʌn  wanəkʊki
↓ GGJ segmentation
Segmented: jə • wãʔ • wʌn , wanək • ʊki
↓ Cluster word types
Clustering: { /wan/ : wãʔ, wanək, wan }
↓
Normalized: ju • wan • wʌn , wan • ʊki

9. Last year... Elsner+al ‘12

(Same pipeline figure as the previous slide.)

◮ Standard problem with pipelines: errors propagate
◮ Not a good cognitive model: doesn’t capture interactions between levels
◮ Type-level inference doesn’t scale to acoustics

10. In this paper...

Technical details
◮ GGJ: Bayesian word segmentation
◮ Our noisy-channel model
◮ Joint inference without types: beam sampling

Cognitive modeling results
◮ Words, collocations and morphemes: infants form collocations ...and have trouble with vowel-initial words
◮ Phonetic learning: infants learn consonants better ...and underestimate variation
◮ Missegmentations and misrecognitions: short, frequent words are hard

11. GGJ: a non-parametric bigram language model

(Plate diagram:)
◮ Generator for possible words (geometric length): a, b, ..., ju, ..., want, ..., juwant, ...
◮ G0 (concentration α0): probabilities for each word (sparse), e.g. p(ði) = .1, p(a) = .05, p(want) = .01, ...
◮ Gx (concentration α1, ∞ contexts): conditional probabilities for each word after each word, e.g. p(ði | want) = .3, p(a | want) = .1, p(want | want) = .0001, ...
◮ Intended forms x1, x2, ... (n utterances): ju want ə kuki ; ju want ɪt ; ...
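A rough picture of this generative story in code. This is a minimal sketch, not the authors' implementation: the phone inventory, geometric stop probability, and concentration parameters are illustrative, and the hierarchical backoff is approximated by Chinese-restaurant processes whose base distribution is the parent restaurant.

```python
import random

PHONES = list("abdefgijklmnopstuvwz")   # illustrative phone inventory (assumption)
P_STOP = 0.5                            # geometric word-length parameter (assumption)
ALPHA0, ALPHA1 = 20.0, 3.0              # DP concentrations (illustrative)

def base_word():
    """Generator for possible words: geometric length, uniform phones."""
    w = random.choice(PHONES)
    while random.random() > P_STOP:
        w += random.choice(PHONES)
    return w

class CRP:
    """Chinese-restaurant-process view of a Dirichlet process draw."""
    def __init__(self, alpha, base):
        self.alpha, self.base, self.counts, self.n = alpha, base, {}, 0
    def sample(self):
        if random.random() < self.alpha / (self.alpha + self.n):
            w = self.base()                    # new table: draw from the base
        else:
            r = random.uniform(0, self.n)      # old table: pick proportional to count
            for w, c in self.counts.items():
                r -= c
                if r <= 0:
                    break
        self.counts[w] = self.counts.get(w, 0) + 1
        self.n += 1
        return w

unigram = CRP(ALPHA0, base_word)               # G0: sparse word probabilities
bigram = {}                                    # one restaurant per conditioning word

def next_word(prev):
    """Bigram draw: each context's restaurant backs off to the unigram one."""
    if prev not in bigram:
        bigram[prev] = CRP(ALPHA1, unigram.sample)
    return bigram[prev].sample()

# Generate one "intended" utterance of four words.
utt, prev = [], "<s>"
for _ in range(4):
    prev = next_word(prev)
    utt.append(prev)
print(" ".join(utt))
```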

12. Noisy channel component

(Same plate diagram as before, with one addition:)
◮ Surface forms s1, s2, ... generated from the intended forms through a transducer T: jə wan ə kuki ; ju wand ɪt ; ...
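To make the decomposition concrete, here is a minimal sketch of how a candidate analysis would be scored under a noisy-channel model of this shape: the language-model probability of the intended words plus a channel probability for each surface word given its intended form. The per-character channel, the dummy LM, and the equal-length restriction are simplifying assumptions for illustration only; the real channel is the transducer on the next slide, which also allows insertions.

```python
import math

def channel_logprob(surface, intended, rewrite_logp):
    """log P(surface word | intended word), each character rewritten independently.
    Insertions are ignored here, so the two strings must have equal length."""
    assert len(surface) == len(intended)
    return sum(rewrite_logp(x, y) for x, y in zip(intended, surface))

def analysis_logprob(surface_words, intended_words, lm_logprob, rewrite_logp):
    """Noisy-channel score of one candidate analysis:
    log P(intended words) + log P(surface words | intended words)."""
    return lm_logprob(intended_words) + sum(
        channel_logprob(s, x, rewrite_logp)
        for s, x in zip(surface_words, intended_words))

# Toy plug-ins (assumptions), just to make the sketch runnable.
toy_lm = lambda words: -2.0 * len(words)                 # stand-in for the GGJ LM
toy_rw = lambda x, y: math.log(0.9 if x == y else 0.01)  # stand-in channel
print(analysis_logprob(["jə", "wand"], ["ju", "want"], toy_lm, toy_rw))
```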

13. The transducer

◮ Independently rewrites each character (e.g. a → u)
◮ Log-linear features based on articulation (Hayes+Wilson, Dreyer+Eisner)

Constrained by efficiency issues:
◮ Can insert (ε → h) but not delete (h → ε)
◮ Similar to (Neubig, Elsner, Börschinger) but simpler

Learning phonetics
◮ Initialize with simple model (a → a)
◮ Learn via EM
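A minimal sketch of what a per-character log-linear rewrite model looks like. The feature set (voicing, place, manner) and the weights below are illustrative assumptions; in the talk's model the weights are learned with EM, and the transducer additionally allows insertions, which this sketch omits.

```python
import math

# Toy articulatory features (assumption): (voiced, place, manner) per segment.
FEATS = {
    "t": (0, "alveolar", "stop"), "d": (1, "alveolar", "stop"),
    "p": (0, "labial",   "stop"), "b": (1, "labial",   "stop"),
    "a": (1, "low",      "vowel"), "@": (1, "mid",     "vowel"),
}
SEGMENTS = list(FEATS)

# Hand-set feature weights for illustration; these would be learned via EM.
W_SAME, W_VOICE, W_PLACE, W_MANNER = 3.0, 1.0, 1.0, 2.0

def score(x, y):
    """Log-linear score for rewriting intended segment x as surface segment y."""
    fx, fy = FEATS[x], FEATS[y]
    s = 0.0
    if x == y:          s += W_SAME
    if fx[0] == fy[0]:  s += W_VOICE
    if fx[1] == fy[1]:  s += W_PLACE
    if fx[2] == fy[2]:  s += W_MANNER
    return s

def rewrite_dist(x):
    """P(surface | intended = x): each character is rewritten independently,
    and every intended segment surfaces as something (no deletion)."""
    exps = [math.exp(score(x, y)) for y in SEGMENTS]
    z = sum(exps)
    return {y: e / z for y, e in zip(SEGMENTS, exps)}

print(rewrite_dist("t"))   # 't' mostly stays 't', sometimes surfaces as 'd' or 'p'
```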

14. Inference

Intended forms vary from surface forms: large search space!
◮ Character-by-character Gibbs likely to get stuck

Forward-backward style sampling method:
◮ Following previous work
◮ Semi-Markov formulation of GGJ (Mochihashi+al ‘09)
◮ Composition with transducer yields large FSM (Neubig ‘10)
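For intuition, here is a minimal forward-filtering, backward-sampling sketch in the style of Mochihashi+al ‘09, but for a simplified unigram model over a single clean string; the actual sampler works on the lattice produced by composing the bigram model with the transducer. The toy lexicon counts and hyperparameters are illustrative assumptions.

```python
import random

def word_prob(w, lexicon, alpha=1.0, p_char=1 / 26, p_stop=0.5):
    """Unigram word probability: a CRP-style mix of lexicon counts and a
    geometric-length base distribution (much simpler than the full GGJ bigram
    model composed with the transducer)."""
    total = sum(lexicon.values())
    base = p_stop * (1 - p_stop) ** (len(w) - 1) * p_char ** len(w)
    return (lexicon.get(w, 0) + alpha * base) / (total + alpha)

def sample_segmentation(chars, lexicon, max_len=8):
    n = len(chars)
    # Forward pass: fwd[i] = total probability of analysing chars[:i]
    # as some sequence of words.
    fwd = [0.0] * (n + 1)
    fwd[0] = 1.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            fwd[i] += fwd[j] * word_prob(chars[j:i], lexicon)
    # Backward pass: sample the final word, then recurse toward the start.
    words, i = [], n
    while i > 0:
        cands = list(range(max(0, i - max_len), i))
        weights = [fwd[j] * word_prob(chars[j:i], lexicon) for j in cands]
        j = random.choices(cands, weights=weights)[0]
        words.append(chars[j:i])
        i = j
    return list(reversed(words))

toy_lexicon = {"ju": 5, "want": 4, "a": 6, "kuki": 2}   # illustrative counts
print(sample_segmentation("juwantakuki", toy_lexicon))
```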

15. Finite-state encoding

(FSM figure: the lattice spells out the surface string while hypothesizing intended words such as j, u, ju, jə. Arcs carry character-rewrite labels (j/j, u/u, ə/u, d/j, ...) and bigram word probabilities (p(j | [s]), p(u | j), p(ju | [s]), p(jə | [s]), ...).)

16. Sampling from huge transducers (beam sampling)

(FSM fragment: from the start state [s], competing arcs j/j, j/d, j/k, then u/u, ə/u, ...; one path hypothesizes the word jə with probability p(jə | [s]).)

(van Gael+al ‘08), (Huggins+Wood ‘13)

17. Sampling from huge transducers (beam sampling)

(Same FSM fragment as the previous slide; animation step.)

(van Gael+al ‘08), (Huggins+Wood ‘13)

18. Sampling from huge transducers (beam sampling)

(Same FSM fragment, now with slice intervals drawn on the arcs: thresholds ~ [0, p(j/j)] and ~ [0, p(u/u)]; arcs whose probability falls below the sampled threshold are pruned from the search.)

(van Gael+al ‘08), (Huggins+Wood ‘13)
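The beam-sampling idea in miniature: at each step, draw an auxiliary slice variable below the probability of the arc used in the previous sweep, and keep only arcs heavier than the slice, so the sampler never has to enumerate the full transducer. The toy lattice and its probabilities below are illustrative assumptions; a full sampler would then run forward-backward over the pruned lattice rather than just printing the surviving beams.

```python
import random

# Toy lattice: arcs[t] lists the arcs available at step t as
# (source state, rewrite label, target state, probability).  Illustrative
# numbers only; the real lattice comes from composing the segmentation model
# with the phonetic transducer and is far too large to enumerate.
lattice = [
    [("s", "j/j", "A", 0.7), ("s", "j/d", "B", 0.3)],
    [("A", "u/u", "C", 0.8), ("A", "ə/u", "D", 0.2),
     ("B", "u/u", "C", 0.5), ("B", "ə/u", "D", 0.5)],
]
prev_path = [("s", "j/j", "A", 0.7), ("A", "u/u", "C", 0.8)]  # last sweep's sample

# Beam sampling (van Gael+al '08): at each step, draw a slice below the
# probability of the arc used in the previous sweep and keep only arcs
# heavier than the slice.  Forward-backward sampling then runs over this
# much smaller pruned lattice instead of the whole transducer.
for t, prev_arc in enumerate(prev_path):
    u = random.uniform(0.0, prev_arc[3])
    beam = [arc for arc in lattice[t] if arc[3] > u]
    print(f"step {t}: slice={u:.2f}, beam={[arc[1] for arc in beam]}")
```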

19. Overview

Technical details
◮ GGJ: Bayesian word segmentation
◮ Our noisy-channel model
◮ Joint inference without types: beam sampling

Cognitive modeling results
◮ Words, collocations and morphemes: infants form collocations ...and have trouble with vowel-initial words
◮ Phonetic learning: infants learn consonants better ...and underestimate variation
◮ Missegmentations and misrecognitions: short, frequent words are hard

20. Synthetic dataset from (Elsner+al ‘12)

Simulate child-directed speech in close phonetic transcription:
◮ Use Bernstein-Ratner (child-directed) (Bernstein-Ratner ‘87)
◮ and Buckeye (closely transcribed) (Pitt+al ‘07)
◮ Sample a pronunciation for each BR word from Buckeye
◮ No coarticulation between words

“about”: ahbawt:15, bawt:9, ihbawt:4, ahbawd:4, ihbawd:4, ahbaat:2, baw:1, ahbaht:1, erbawd:1, bawd:1, ahbaad:1, ahpaat:1, bah:1, baht:1
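A minimal sketch of that sampling step, using the pronunciation counts for “about” shown above. Each token of a word is drawn independently from its empirical pronunciation distribution, which is exactly what removes cross-word coarticulation; the count table here covers only this one word and is copied from the slide.

```python
import random

# Pronunciation counts for one word, copied from the slide's "about" example.
PRONS = {
    "about": {"ahbawt": 15, "bawt": 9, "ihbawt": 4, "ahbawd": 4, "ihbawd": 4,
              "ahbaat": 2, "baw": 1, "ahbaht": 1, "erbawd": 1, "bawd": 1,
              "ahbaad": 1, "ahpaat": 1, "bah": 1, "baht": 1},
}

def sample_pron(word):
    """Draw a surface pronunciation in proportion to its Buckeye count.
    Each token is sampled independently, so there is no coarticulation
    between neighbouring words."""
    dist = PRONS[word]
    return random.choices(list(dist), weights=list(dist.values()))[0]

# One noisy token per occurrence of the word in a BR utterance.
print(" ".join(sample_pron(w) for w in ["about", "about", "about"]))
```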

21. Word segmentation results

                         Prec   Rec    F-score
GGJ, clean data          90.1   80.3   84.9
GGJ segmentation         70.4   93.5   80.3

◮ GGJ on clean data has high precision, low recall...
◮ On variable data, the tradeoff flips (as in (Fleck ‘08))

22. Word segmentation results

                         Prec   Rec    F-score
GGJ, clean data          90.1   80.3   84.9
GGJ segmentation         70.4   93.5   80.3
GGJ, our beam inference  73.9   91.0   81.6

◮ Our inference scheme works
◮ Confidence intervals overlap

23. Word segmentation results

                         Prec   Rec    F-score
GGJ, clean data          90.1   80.3   84.9
GGJ segmentation         70.4   93.5   80.3
GGJ, our beam inference  73.9   91.0   81.6
EM transducer            80.1   83.0   81.5

◮ Segmentation with transducer trades recall for precision
◮ Moving closer to original qualitative results
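For reference, word-token precision, recall, and F-score are typically computed by matching predicted token spans against gold token spans, as in this minimal sketch (a standard scoring scheme, not necessarily the exact evaluation script behind the table above).

```python
def word_spans(words):
    """Turn a segmented utterance into a set of (start, end) character spans."""
    spans, i = set(), 0
    for w in words:
        spans.add((i, i + len(w)))
        i += len(w)
    return spans

def token_prf(pred_utts, gold_utts):
    """Word-token precision, recall and F-score over a corpus."""
    tp = fp = fn = 0
    for pred, gold in zip(pred_utts, gold_utts):
        p, g = word_spans(pred), word_spans(gold)
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return prec, rec, 2 * prec * rec / (prec + rec)

# Example: merging "want a" into one token hurts both precision and recall.
print(token_prf([["ju", "wanta", "kuki"]], [["ju", "want", "a", "kuki"]]))
```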

24. A closer look

Where do gold-standard word tokens end up?
◮ Correct boundaries and lexical item
◮ Correct boundaries, wrong lexical item: ju analyzed as jɛs
◮ Collocation: boundaries are real but too wide (real ju • want analyzed as juwant)
◮ Split: dɔgiz as dɔ • giz
◮ One boundary: ju • wa...
◮ Just plain wrong

25. Analysis

              EM-learned   GGJ
Correct       49.88        47.61
Wrong form    17.96        23.73
Collocation   15.60         7.59
Split          8.69        15.84
One bound      7.11        15.18
Wrong          0.75         0.22

◮ “Wrong form” errors could be repaired in a pipeline
◮ ...but collocation vs. split errors cannot be

26. Vowel-initial words

◮ Infants are slow to segment vowel-initial words (Mattys+Jusczyk, Nazzi+al, Seidl+Johnson)
◮ Initial vowels often variable, resyllabified (Seidl+Johnson)

EM transducer   Vow. init   Cons. init
Correct         41.5        52.1
Wrong form      20.4        17.3
Collocation     19.2        12.5

◮ Transducer system has trouble with vowels...
◮ More likely to find collocation, less likely to get left boundary correct

27. Phonetic learning

◮ Infants learn consonant categories slower than vowels
◮ Non-native vowel contrasts lost by 8 months (Kuhl, Bosch+Sebastian-Galles)
◮ Consonant contrasts by 10-12 months (Werker+Tees)
◮ Generalize across talkers/dialects slowly (Houston+Jusczyk, Singh)

What about the model?
