Weakly-Supervised Bayesian Learning of a CCG Supertagger
Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith
Type-Level Supervision
Unannotated text
Incomplete tag dictionary: word → {tags}
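For concreteness, a minimal sketch of what this supervision looks like in code (the entries below are hypothetical illustrations, not from any real dictionary):

```python
# Type-level supervision: an incomplete mapping from word types to allowed
# supertags, plus raw text. All entries here are hypothetical examples.
tag_dictionary = {
    "the": {"np/n"},
    "dog": {"n", "np"},
    "sleeps": {"s\\np"},
    # most word types are missing entirely; their categories must be induced
}
unannotated_text = [
    "the dog sleeps",
    "dogs sleep",
]
```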
This type of supervision has been used for POS tagging for 20+ years
[Kupiec, 1992] [Merialdo, 1994]
Good POS tagger performance even with low supervision
[Das & Petrov 2011] [Garrette & Baldridge 2013] [Garrette et al. 2013]
Every word token is associated with a category. Categories combine into the categories of constituents.
[Steedman, 2000] [Steedman and Baldridge, 2011]
the dog: np/n n ⇒ np
dogs sleep: np s\np ⇒ s
the dog sleeps: np/n n s\np ⇒ np s\np ⇒ s
(compare POS tags: the dog sleeps / DT NN VBZ, with constituents NP, VP, S)
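A minimal sketch (not the authors' code) of CCG categories and the two application combinators behind these derivations:

```python
# Categories and function application, sketched in plain Python.
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Atom:
    name: str                      # e.g. "np", "n", "s"

@dataclass(frozen=True)
class Slash:
    result: "Cat"                  # category produced after combining
    direction: str                 # "/" seeks its argument to the right, "\" to the left
    arg: "Cat"                     # category of the expected argument

Cat = Union[Atom, Slash]

def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward application (X/Y Y => X) and backward application (Y X\\Y => X)."""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    if isinstance(right, Slash) and right.direction == "\\" and right.arg == left:
        return right.result
    return None

np_, n, s = Atom("np"), Atom("n"), Atom("s")
the, dog, sleeps = Slash(np_, "/", n), n, Slash(s, "\\", np_)

the_dog = combine(the, dog)           # np/n n   => np
assert combine(the_dog, sleeps) == s  # np  s\np => s
```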
Type-supervised learning for supertagging is much more difficult than for POS tagging:
Penn Treebank POS: 48 tags
CCGbank supertags: 1,239 tags
The grammar formalism itself can be used to guide learning
“almost parsing”
[Bangalore and Joshi, 1999]
the lazy dog sleeps
np/n n/n n s\np
n/n n ⇒ n; np/n n ⇒ np; np s\np ⇒ s

Which categories could an unknown word take? The grammar rules candidates in or out:
the lazy dog
np/n ? n
An assignment whose categories cannot connect, e.g. np/n n np, is ruled out: X
the dog sleeps
CCG: np/n n s\np ⇒ s. The categories themselves encode universal, intrinsic grammar properties.
POS: DT NN VBZ (NP, VP, S). All relationships between tags must be learned.
the lazy dog
np/n (np\(np/n))/n n
Another category for "lazy" that also connects, though far more complex.
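This filtering can be made concrete with a small search: represent categories as nested tuples and keep only those candidate categories for "lazy" that let the phrase reduce to np. A sketch under those assumptions, not the authors' code:

```python
# Categories as nested tuples: ("/"|"\\", result, arg); atoms as strings.

def apply_(x, y):
    """Combine two adjacent categories by function application, or return None."""
    if isinstance(x, tuple) and x[0] == "/" and x[2] == y:
        return x[1]                                   # forward:  X/Y Y   => X
    if isinstance(y, tuple) and y[0] == "\\" and y[2] == x:
        return y[1]                                   # backward: Y   X\Y => X
    return None

def parses_to(cats, goal):
    """Can this category sequence be reduced to `goal` by application alone?"""
    if len(cats) == 1:
        return cats[0] == goal
    for i in range(len(cats) - 1):
        r = apply_(cats[i], cats[i + 1])
        if r is not None and parses_to(cats[:i] + [r] + cats[i + 2:], goal):
            return True
    return False

the, dog = ("/", "np", "n"), "n"
candidates = {
    "n/n":            ("/", "n", "n"),
    "(np\\(np/n))/n": ("/", ("\\", "np", ("/", "np", "n")), "n"),
    "np":             "np",
}
for name, cat in candidates.items():
    print(name, parses_to([the, cat, dog], "np"))
# n/n and (np\(np/n))/n connect; a bare np does not.
```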
Category frequencies are highly skewed. In CCGbank, buy := (s[b]\np)/np appears 342 times, e.g. "Opponents don't buy such arguments." But buy := (((s[b]\np)/pp)/pp)/np, with its two pp arguments, appears once: "Tele-Communications agreed to buy half of Showtime Networks from Viacom for $225 million."
A prior over categories, generated recursively:
with probability p_term: an atomic category a ∈ {s, np, n, …}, chosen with probability p_atom(a)
otherwise, a complex category:
  a forward slash (A/B) with probability p_fwd, a backward slash (A\B) with 1 − p_fwd
  a modifier category (B/B or B\B) with probability p_mod, recursively generating B
  otherwise, recursively generating both A and B
the lazy dog
np/n n/n n
The prior prefers the simple modifier n/n over (np\(np/n))/n for "lazy".
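A sketch of that prior as code, reusing the tuple representation from the earlier sketch; the parameter values are illustrative, not the paper's:

```python
# Recursive category prior: atoms stop the recursion; complex categories pay
# for their slash and subcategories, with a bonus for modifiers (X/X, X\X).
P_TERM = 0.7                                   # illustrative hyperparameters,
P_FWD = 0.5                                    # not values from the paper
P_MOD = 0.8
P_ATOM = {"s": 0.3, "np": 0.3, "n": 0.3, "pp": 0.1}

def prior(cat):
    if isinstance(cat, str):                   # atomic category
        return P_TERM * P_ATOM[cat]
    slash, result, arg = cat
    p = (1 - P_TERM) * (P_FWD if slash == "/" else 1 - P_FWD)
    if result == arg:                          # modifier: generate the subcategory once
        return p * P_MOD * prior(arg)
    return p * (1 - P_MOD) * prior(result) * prior(arg)

simple = ("/", "n", "n")                                 # n/n
complex_ = ("/", ("\\", "np", ("/", "np", "n")), "n")    # (np\(np/n))/n
print(prior(simple), prior(complex_))   # the simple modifier gets a far higher prior
```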
Simple is good. Connecting is good.
Inputs: unlabeled corpus + tag dictionary (the same supervision as type-supervised POS tagging) + universal properties of the CCG formalism
Forward-Filter Backward-Sample (FFBS)
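A minimal FFBS sketch for an HMM with dense parameters (the actual model additionally restricts each position to its dictionary-licensed supertags, omitted here for brevity):

```python
import numpy as np

def ffbs(pi, trans, emit, obs, rng):
    """Sample one tag sequence from p(tags | obs) under the current HMM."""
    T, K = len(obs), len(pi)
    alpha = np.empty((T, K))
    alpha[0] = pi * emit[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                        # forward filter
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    tags = np.empty(T, dtype=int)
    tags[-1] = rng.choice(K, p=alpha[-1])        # backward sample
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, tags[t + 1]]
        tags[t] = rng.choice(K, p=w / w.sum())
    return tags

# Toy usage: 2 hidden tags, 2 word types, one short "sentence".
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
print(ffbs(pi, trans, emit, [0, 1, 1, 0], rng))
```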
[Lattice: each word of "the lazy dogs wander" is topped by its dictionary-licensed candidate supertags, e.g. np/n, n, np, n/n, (s\np)/np, s\np, …]
[Figure: the two resources, a tag dictionary (word : tag, tag, tag, …) and unlabeled data]
[Animation: one sampling iteration. The CCG-informed priors and the current HMM parameters together define a distribution over taggings of the lattice for "the lazy dogs wander"; a complete supertag sequence (np/n n n/n s\np) is sampled from it, the counts update the model, and the cycle repeats.]
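A hedged sketch of the full iteration depicted above: sample a tagging for each sentence with FFBS (as sketched earlier), then resample the HMM parameters from Dirichlet posteriors whose hyperparameters (here the hypothetical names alpha_trans, alpha_emit) carry the CCG-informed priors:

```python
import numpy as np

def gibbs_iteration(corpus, pi, trans, emit, alpha_trans, alpha_emit, rng):
    """One sweep: sample taggings given parameters, then parameters given taggings.
    `ffbs` is the sampler sketched earlier; alpha_* encode the priors."""
    K, V = emit.shape
    t_counts = np.zeros((K, K))
    e_counts = np.zeros((K, V))
    for obs in corpus:
        tags = ffbs(pi, trans, emit, obs, rng)            # sample one tagging
        for t, (k, w) in enumerate(zip(tags, obs)):
            e_counts[k, w] += 1
            if t > 0:
                t_counts[tags[t - 1], k] += 1
    trans = np.stack([rng.dirichlet(alpha_trans[k] + t_counts[k]) for k in range(K)])
    emit = np.stack([rng.dirichlet(alpha_emit[k] + e_counts[k]) for k in range(K)])
    return trans, emit
```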
Use universal properties of CCG to initialize EM
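One illustrative way to realize this (an assumption for exposition, not the paper's exact scheme): bias the initial transition distributions toward target categories with high prior probability and toward category pairs that can combine:

```python
# Hypothetical initializer: weight tag1 -> tag2 by tag2's prior, with a bonus
# when the two categories can combine (prior/apply_ as sketched earlier).
import numpy as np

def init_transitions(tags, prior, apply_, bonus=2.0):
    K = len(tags)
    w = np.empty((K, K))
    for i, t1 in enumerate(tags):
        for j, t2 in enumerate(tags):
            w[i, j] = prior(t2) * (bonus if apply_(t1, t2) is not None else 1.0)
    return w / w.sum(axis=1, keepdims=True)      # each row is a distribution
```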
Results: tagging accuracy vs. tag dictionary pruning cutoff (0.1, 0.01, 0.001, none), Baldridge '08 vs. ours, across three result charts:
[Chart 1 data labels: 78, 80, 80, 67, 55, 73, 41, 51]
[Chart 2 data labels: 66, 69, 62, 43, 33, 56, 28, 49]
[Chart 3 data labels: 54, 53, 47, 36, 33, 45, 32, 46]
GitHub repository linked from my website
Combining annotation exploitation with universal grammatical knowledge yields good models from weak supervision