Synergies in learning syllables and words
or
Adaptor grammars: a class of nonparametric Bayesian models
Mark Johnson, Brown University
Joint work with Sharon Goldwater and Tom Griffiths
NECPHON, November 2008
Research goals
◮ Non-parametric Bayesian inference
◮ Adaptor grammars
◮ Learning words, collocations and syllables simultaneously
A tree can be generated either (see the sketch below):
◮ by picking a rule and recursively expanding its children, or
◮ by generating a previously generated tree (with probability proportional to the number of times that tree was generated before)
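A minimal sketch of this two-way choice as a Chinese restaurant process over trees. Here expand_by_rules is a hypothetical stand-in for sampling a fresh tree from the base PCFG, and trees must be hashable (e.g., nested tuples); this is illustrative, not Johnson's implementation:

    import random
    from collections import Counter

    def generate(A, cache, alpha, expand_by_rules):
        # cache: Counter mapping previously generated trees rooted in A
        # (hashable, e.g. nested tuples) to how often each was generated.
        n = sum(cache.values())
        # Reuse an old tree with probability n / (n + alpha) ...
        if n > 0 and random.random() < n / (n + alpha):
            trees, counts = zip(*cache.items())
            tree = random.choices(trees, weights=counts)[0]
        else:
            # ... otherwise expand A using the base PCFG rules
            # (expand_by_rules is a hypothetical helper).
            tree = expand_by_rules(A)
        cache[tree] += 1  # the grammar learns from its own output
        return tree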
An adapted nonterminal A expands (in symbols below):
◮ to a subtree τ rooted in A, with probability proportional to the number of times τ was generated before, or
◮ using a rule A → β, with probability proportional to α_A θ_{A→β}
Further refinements:
◮ also learn the base grammar's PCFG rule probabilities θ_{A→β}
◮ use Pitman-Yor adaptors (which discount the frequency of previously generated subtrees)
◮ learn the parameters (e.g., α_A) associated with the adaptors
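In symbols (the CRP case, restated on the summary slide near the end):

\[
P(\text{reuse cached tree } \tau) \propto n_\tau,
\qquad
P(\text{fresh expansion using } A \to \beta) \propto \alpha_A\,\theta_{A\to\beta},
\]

so overall \(P(\tau) \propto n_\tau + \alpha_A\,P_{\text{PCFG}}(\tau)\), where \(n_\tau\) is the number of times \(\tau\) has been generated before.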
◮ Certain structures (words, syllables) are adapted, i.e., cached or memoized
◮ Algorithm counts how often each adapted structure appears in the current parses
◮ Chooses a parse for the next sentence with probability proportional to that parse's probability given the current counts
◮ Probability of an adapted structure is proportional to: the number of times it was seen before, plus α_A times the probability of generating it via PCFG expansion
◮ The base PCFG specifies the prior in adaptor grammars
◮ The prior expresses uncertainty about which grammar is correct
◮ Sampling is a natural way to characterize the posterior
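For reference (standard Bayes' rule, not text from the slide): the posterior over grammars \(g\) given data \(d\) is

\[ P(g \mid d) \;\propto\; P(d \mid g)\,P(g), \]

and samples from this posterior are what the learning algorithms below produce.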
◮ Online learning: sample a parse for the next sentence, then count how often each adapted structure appears in that parse
◮ Batch learning (Gibbs sampling, sketched below): assign every sentence a (random) parse, then repeatedly cycle through the training sentences, resampling each sentence's parse
◮ Particle filtering: learn different versions (“particles”) of the grammar at once; for each particle, sample a parse of the next sentence; keep/replicate particles with high-probability parses
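A minimal sketch of the batch Gibbs loop, assuming two hypothetical helpers: sample_parse(sentence, counts) samples a parse given the current cache counts, and adapted_structures(parse) lists the adapted subtrees a parse uses. Illustrative only, not Johnson's implementation:

    import random
    from collections import Counter

    def gibbs(sentences, sample_parse, adapted_structures, sweeps=100):
        # Assign every sentence a (random) parse and cache its structures.
        counts = Counter()
        parses = []
        for s in sentences:
            p = sample_parse(s, counts)
            parses.append(p)
            counts.update(adapted_structures(p))
        # Repeatedly cycle through the training sentences,
        # resampling each sentence's parse given all the others.
        for _ in range(sweeps):
            for i in random.sample(range(len(sentences)), len(sentences)):
                counts.subtract(adapted_structures(parses[i]))  # withdraw
                parses[i] = sample_parse(sentences[i], counts)  # resample
                counts.update(adapted_structures(parses[i]))    # re-add
        return parses, counts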
◮ A unigram model of words is a poor approximation to syntactic/semantic inter-word dependencies
◮ A collocation adaptor grammar learns collocations without being told what the words are (an example grammar follows)
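For concreteness, a small collocation adaptor grammar in the spirit of the one behind the sample parse below; this is a reconstruction (the grammar slide itself did not survive the scrape), with adapted nonterminals starred and X+ abbreviating one-or-more recursion:

    Sentence → Colloc+
    Colloc*  → Word+
    Word*    → Syllable+
    Syllable → (Onset) Nucleus (Coda)

Because Colloc and Word are adapted, frequently occurring words and word sequences are cached and reused as units, which is how collocations can be learned without the words being given in advance.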
Sample parse from the collocation grammar (bracketing reconstructed; the tree is truncated in the source):

(Sentence
  (Colloc2
    (Colloc (Word (OnsetI g) (Nucleus I) (CodaF v))
            (Word (OnsetI h) (Nucleus I) (CodaF m)))
    (Colloc (Word (Nucleus 6))
            (Word (OnsetI k) (Nucleus I) (CodaF s))))
  (Colloc2 (Colloc (Word … (OnsetI k) (Nucleus e) …))))

i.e., a segmentation of “g I v h I m 6 k I s …” (“give him a kiss …”) into collocations, words, and syllables.
◮ … even if their mathematical justification is really cool …
◮ Preferring Onsets dramatically improves syllabification
◮ Learning inter-word dependencies improves word segmentation
◮ Learning syllabification improves word segmentation
◮ Learning word segmentation improves syllabification
The probability of generating an adapted subtree τ is proportional to:
◮ the number of times τ was seen before,
◮ plus α_A times the probability of generating it via PCFG expansion
◮ so an adaptor grammar learns from its previous output
When generating a Word, the grammar can either:
◮ reuse an old Word, or
◮ generate a fresh one from the base PCFG rules (an example grammar follows the parse below)

Example parse of “talking#” (the recursive Chars spine flattened for readability):

(Word (Stem t a l k) (Suffix i n g #))

i.e., stem “talk” plus suffix “ing” and the boundary symbol #.
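A base grammar consistent with this parse might look as follows; this is a reconstruction (adapted nonterminals starred), not the slide's exact grammar:

    Word*   → Stem Suffix
    Stem*   → Chars
    Suffix* → Chars
    Chars   → Char Chars
    Chars   → Char

Because Word, Stem and Suffix are all adapted, the grammar caches whole words as well as the stems and suffixes they share, so recurring stems and suffixes are reused across words.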
◮ An adaptor grammar has one CRP or PYP for each adapted nonterminal
◮ In adaptor grammars, B is given by the PCFG rules expanding the adapted nonterminal
◮ The standard Pitman-Yor predictive distribution, with discount a and concentration b:

\[
P(X_{n+1}=x \mid X_1,\dots,X_n)
 = \frac{n_x - a\,m(x)}{n+b} + \frac{m a + b}{n+b}\,B(x),
\qquad m=\sum_x m(x),
\]

where \(n_x\) is the number of previous draws equal to \(x\) and \(m(x)\) is the number of tables labelled \(x\).