A Supertag-Context Model for Weakly-Supervised CCG Parser Learning
Dan Garrette (U. Washington), Chris Dyer (CMU), Jason Baldridge (UT-Austin), Noah A. Smith (CMU)
Contributions:
1. A new generative model for learning CCG parsers from weak supervision
2. Priors that capture universal, intrinsic properties of CCG
3. A procedure to learn the parameters of our model
Running example: "the lazy dogs wander". A tag dictionary licenses several candidate supertags for each word (np/n, n/n, np, n, (s\np)/np, s\np, …), so the learner must choose among many possible supertag sequences. Our model scores each supertag by the categories appearing in its left (L) and right (R) context. (This makes inference tricky; we'll come back to that.)
CCG categories encode universal, intrinsic grammar properties that we can use to guide learning: given any two categories, we know whether they are combinable before we even look at the data. In "buy the book", for example, np/n combines with n to yield np, and s/np combines with np to yield s. With atomic labels (VB, DT, NN, NP, S, VP), all such relationships must be learned from data.
[Figure: step-by-step CCG derivations of "the lazy dog sleeps" (supertags np/n, n/n, n, s\np), built with forward application (FA) and forward composition (FC) up to the root category s.]
Prior work exploits context with the Constituent Context Model (CCM) [Klein & Manning 2002], which rests on "substitutability": the span "lazy dog" appears in the context DT _ VBZ, and other constituents, such as "dog" or "big lazy dog", can be substituted into that same context.
With CCG supertags as the context, we get more than substitutability: the span "lazy dog" (category n) sits between the supertags np/n and s\np, and we know np/n can combine with an n to its right, before even looking at the data.
Next: a generative model that captures these properties of CCG.
Standard PCFG: a tree over words w1…w4 (with supertags t1…t4) is scored by P(A_root) for the root and, for each production, P(A → A_left A_right), or P(A → w_i) for lexical rules.

With context: each nonterminal A spanning w_i…w_j additionally generates the supertags adjacent to its span, via P(A → t_left) for t_{i-1} and P(A → t_right) for t_{j+1} (with boundary markers <s> and <e> at the sentence edges), alongside P(A_root) and P(A → A_left A_right or w_i).
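Putting these pieces together, the score of a tree y under the context-augmented model can be sketched as follows (notation assumed from the fragments above; P_L and P_R denote the left- and right-context distributions):

```latex
P(y) \;=\; P(A_{\mathrm{root}})
  \prod_{\langle A \to A_\ell\, A_r \rangle \in y} P(A_\ell\, A_r \mid A)\;
  \prod_{\langle A \to w_i \rangle \in y} P(w_i \mid A)\;
  \prod_{A \in y} P_L(t_{\mathrm{left}} \mid A)\, P_R(t_{\mathrm{right}} \mid A)
```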
[Garrette, Dyer, Baldridge, and Smith, 2015]: without a bias from context, learning can prefer implausible categories for "the lazy dog", e.g. np\(np/n) and (np\(np/n))/n, over the simple np/n, n/n, n analysis.
The context distributions are given priors that prefer combinable neighbors. For a nonterminal A (e.g. the n spanning "lazy dog" in "the lazy dog sleeps", with neighboring supertags t_left and t_right):

P_L-prior(t_left | A) ∝ 10^5 if t_left can combine with A, 1 otherwise
P_R-prior(t_right | A) ∝ 10^5 if A can combine with t_right, 1 otherwise
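As a concrete illustration, a minimal combinability check for these priors might look like the sketch below. The category notation and the 10^5-vs-1 weighting follow the slide; the function names are mine, and restricting the check to forward/backward application is an assumption (the actual model considers further combinators such as composition).

```python
# Hypothetical sketch: combinability-based context prior weights.
# Categories are strings like "np/n" or "s\\np"; only forward and
# backward application are checked here.

def parse_cat(cat):
    """Split a category at its top-level slash; None if atomic."""
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        c = cat[i]
        if c == ')':
            depth += 1
        elif c == '(':
            depth -= 1
        elif depth == 0 and c in '/\\':
            return cat[:i], c, cat[i + 1:]
    return None

def strip_parens(cat):
    """Drop one pair of outer parentheses, e.g. "(s\\np)" -> "s\\np"."""
    if cat.startswith('(') and cat.endswith(')'):
        return cat[1:-1]
    return cat

def combinable(left, right):
    """True if the two adjacent categories can combine by application."""
    p = parse_cat(left)       # forward application: X/Y  Y  =>  X
    if p and p[1] == '/' and strip_parens(p[2]) == right:
        return True
    p = parse_cat(right)      # backward application: Y  X\Y  =>  X
    if p and p[1] == '\\' and strip_parens(p[2]) == left:
        return True
    return False

def left_context_prior_weight(t_left, A):
    """Prior weight from the slide: 10^5 if combinable, else 1."""
    return 1e5 if combinable(t_left, A) else 1.0
```

For example, `combinable("np/n", "n")` holds (forward application) and `combinable("np", "s\\np")` holds (backward application), so those contexts receive the large prior weight.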
Next: learning the parameters of our model.
The learner's inputs are an unlabeled corpus, a tag dictionary, and universal properties of the CCG formalism, brought in through our linguistically-informed priors. Exact inference under the context model is hard, but the idea is to do as much of the work as possible with dynamic programming.
For each sentence (e.g. "the lazy dogs wander" with its candidate supertags), we combine the model with the priors, which prefer connected, combinable analyses; run the inside algorithm over all candidate supertags and spans; and sample a tree from the resulting chart.
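The "sample from the chart" step can be illustrated with a toy sketch: here the chart is flattened to a list of candidate analyses with inside scores, and we draw one in proportion to its score. (This flattening is my simplification; the real procedure samples top-down through the dynamic-programming chart.)

```python
import random

def sample_analysis(candidates, rng=random.random):
    """Draw one (analysis, inside_score) entry proportional to its score.

    candidates: non-empty list of (analysis, inside_score) pairs.
    """
    total = sum(score for _, score in candidates)
    r = rng() * total
    for analysis, score in candidates:
        r -= score
        if r <= 0:
            return analysis
    return candidates[-1][0]  # guard against floating-point leftovers
```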
Each sampled tree is then compared against the existing tree for its sentence and accepted or rejected, so the model and the connection-preferring priors jointly determine which trees survive across sampling iterations.
Evaluation: do modeling supertag context, and biasing toward combinable contexts, help learn better parsing models?

[Figure: parsing accuracy vs. the size of the corpus from which the tag dictionary is drawn (25k–250k tokens), comparing "no context", "+context", and "combinability".]

[Figure: parsing accuracy on English, Italian, and Chinese with a 25k-token tag-dictionary corpus, comparing "no context", "+context", and "combinability"; accuracy improves with each addition in all three languages.]
Under weak supervision, we can use universal grammatical knowledge about context to find trees with a better global structure.
Appendix: why accept/reject? The context-generated nonterminals need not match the tree, so the model as written puts mass on ill-formed structures; we correct for this by conditioning on well-formed structures with a Metropolis-Hastings step. With current tree y and a new tree y′ proposed from P_pcfg, draw z ∼ uniform(0,1) and accept y′ if

z < (P_full(y′) / P_pcfg(y′)) / (P_full(y) / P_pcfg(y)) = P_context(y′) / P_context(y)

since P_full = P_pcfg · P_context.
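Because the acceptance ratio reduces to a ratio of context scores, the step is cheap to implement. A minimal sketch (function names and the `p_context` scoring interface are my assumptions):

```python
import random

def mh_accept(p_context, current_tree, new_tree, rng=random.random):
    """Metropolis-Hastings test: since the proposal is P_pcfg and the
    target is P_full = P_pcfg * P_context, the acceptance ratio is
    P_context(new) / P_context(current)."""
    ratio = p_context(new_tree) / p_context(current_tree)
    return rng() < ratio  # accepts with probability min(1, ratio)

def resample(p_context, current_tree, propose):
    """Propose a tree from the PCFG chart; keep it only if accepted."""
    new_tree = propose()
    if mh_accept(p_context, current_tree, new_tree):
        return new_tree
    return current_tree
```

Note that a proposal whose context score improves on the current tree (ratio > 1) is always accepted, while worse proposals are accepted only probabilistically.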