CS11-747 Neural Networks for NLP
Transition-based Parsing with Neural Nets
Graham Neubig
Site https://phontron.com/class/nn4nlp2017/
Two Types of Linguistic Structure
Dependency: focus on relations between words
Phrase structure: focus on the structure of the sentence
[Figure: phrase-structure tree (POS tags PRP VBD DT NN IN DT NN; constituents NP, PP, VP, S) and dependency tree from ROOT, both for "I saw a girl with a telescope"]
Graph-based parsing finds the best tree with some sort of dynamic programming; transition-based parsing instead makes a sequence of local decisions
Why dependencies? Related words are close in the tree, and arcs carry labels that explicitly show the relationship between words
[Figure: labeled dependency tree for "I saw a girl with a telescope" with arcs nsubj, dobj, det, prep, pobj, det]
Shift-reduce dependency parsing (Yamada & Matsumoto 2003, Nivre 2003)
Example: parsing "I saw a girl"
Initial: Stack: ROOT | Buffer: I saw a girl
shift → Stack: ROOT I | Buffer: saw a girl
shift → Stack: ROOT I saw | Buffer: a girl
left (I ← saw) → Stack: ROOT saw | Buffer: a girl
shift → Stack: ROOT saw a | Buffer: girl
shift → Stack: ROOT saw a girl | Buffer: ∅
left (a ← girl) → Stack: ROOT saw girl | Buffer: ∅
right (saw → girl) → Stack: ROOT saw | Buffer: ∅
right (ROOT → saw) → Stack: ROOT | Buffer: ∅
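A minimal sketch of this arc-standard transition system in plain Python (illustrative only, not from the slides; the action sequence is supplied by hand rather than predicted by a model):

# Arc-standard transitions: "shift" moves the next word onto the stack;
# "left"/"right" attach the top two stack items and pop the dependent.
def parse(words, actions):
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in actions:
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "left":    # head = top, dependent = second from top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "right":   # head = second from top, dependent = top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

print(parse("I saw a girl".split(),
            ["shift", "shift", "left", "shift", "shift",
             "left", "right", "right"]))
# [('saw', 'I'), ('girl', 'a'), ('saw', 'girl'), ('ROOT', 'saw')]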
Classification for shift-reduce: given the current configuration, predict which action is most likely
Many ways to learn this prediction (e.g. Nivre 2011); now we can use neural nets!
A feed-forward neural shift-reduce parser (Chen and Manning 2014)
Features: the top three words of the stack and buffer: s1, s2, s3, b1, b2, b3
The first and second leftmost/rightmost children of the top two words on the stack (8 features): lc1(si), lc2(si), rc1(si), rc2(si), i = 1, 2
The leftmost child of the leftmost child, and rightmost child of the rightmost child: lc1(lc1(si)), rc1(rc1(si)), i = 1, 2
Cube activation function g(x) = x³: combines feature conjunctions up to degree three (similar to a polynomial kernel in SVMs)
Fast (1000 words/second): pre-compute the hidden-layer matrix multiplies of common words
More accurate than most existing transition-based parsers at the time
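A rough numpy sketch of a scorer in this style (toy sizes and random weights are my assumptions; the actual model also embeds POS tags and arc labels):

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 18 word features, 50-dim embeddings,
# 200 hidden units, 3 unlabeled actions, vocabulary of 1000.
n_feats, d_emb, d_hid, n_act, vocab = 18, 50, 200, 3, 1000

E  = rng.normal(0, 0.01, (vocab, d_emb))            # word embeddings
W1 = rng.normal(0, 0.01, (d_hid, n_feats * d_emb))  # input -> hidden
b1 = np.zeros(d_hid)
W2 = rng.normal(0, 0.01, (n_act, d_hid))            # hidden -> actions

def score_actions(feature_word_ids):
    x = E[feature_word_ids].reshape(-1)  # concatenate feature embeddings
    h = (W1 @ x + b1) ** 3               # cube activation g(x) = x^3
    return W2 @ h                        # one score per action

print(score_actions(rng.integers(0, vocab, n_feats)))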
Tree-RNN: compose phrase representations bottom-up along the tree
[Figure: composing "I hate this movie" through a binary tree of tree-rnn nodes]
tree-rnn(h1, h2) = tanh(W[h1; h2] + b)
Can also parameterize by constituent type → different composition behavior for NP, VP, etc.
The network can learn to "ignore" children (e.g. give less weight to non-head nodes)
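A small numpy sketch of the composition rule above (toy dimensions and random embeddings are my assumptions; a real model learns W, b, and the word vectors):

import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # toy hidden size
W = rng.normal(0, 0.5, (d, 2 * d))     # composition weights
b = np.zeros(d)

def tree_rnn(h1, h2):
    # tree-rnn(h1, h2) = tanh(W[h1; h2] + b)
    return np.tanh(W @ np.concatenate([h1, h2]) + b)

# Compose "I hate this movie" bottom-up along one possible binary tree
emb = {w: rng.normal(0, 0.5, d) for w in "I hate this movie".split()}
h = tree_rnn(emb["I"], tree_rnn(emb["hate"],
                                tree_rnn(emb["this"], emb["movie"])))
print(h)

A per-constituent-type variant would simply select a different W (e.g. one matrix for NP, another for VP) at each composition.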
[Figure: in contrast, a BiLSTM encodes "I hate this movie" as a linear sequence]
Stack LSTMs (Dyer et al. 2015)
The feed-forward feature set sees only a limited part of the configuration (the leftmost and rightmost grandchildren only?!)
Why not encode the entire configuration with an RNN?
Model the full configuration with three stack LSTMs: the stack S, the buffer B, and the list A of actions taken so far (e.g. SHIFT, REDUCE-LEFT(amod), …), so the parser remembers how each decision was made
[Figure: the TOP outputs of the S, B, and A LSTMs (each built up from ∅ / root) are combined into a parser state pt, which scores the possible actions REDUCE_L, REDUCE_R, SHIFT]
(Slide credits: Chris Dyer)
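In equations (following Dyer et al. 2015), the parser state is pt = max(0, W[st; bt; at] + d), and the next action is drawn from a softmax over G pt + g. A toy numpy sketch with assumed sizes:

import numpy as np

rng = np.random.default_rng(0)
d_s = d_b = d_a = 8               # toy sizes of the three stack LSTM tops
d_p, n_act = 16, 3                # parser state; REDUCE_L, REDUCE_R, SHIFT

W = rng.normal(0, 0.1, (d_p, d_s + d_b + d_a))
d = np.zeros(d_p)
G = rng.normal(0, 0.1, (n_act, d_p))
g = np.zeros(n_act)

def action_probs(s_top, b_top, a_top):
    # Rectified parser state from the TOPs of stack, buffer, action LSTMs
    p = np.maximum(0.0, W @ np.concatenate([s_top, b_top, a_top]) + d)
    z = G @ p + g
    z -= z.max()                  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

print(action_probs(rng.normal(size=d_s), rng.normal(size=d_b),
                   rng.normal(size=d_a)))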
Syntactic composition: when reducing, compose the head and dependent representations. Does this give better representations?
Stack RNNs: an RNN with push and pop operations, whose top state always encodes the stack's current contents
DyNet:
s = [rnn.initial_state()]
s.append(s[-1].add_input(x1))
s.pop()
s.append(s[-1].add_input(x2))
s.pop()
s.append(s[-1].add_input(x3))
(Slide credits: Chris Dyer)
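Filled out into a runnable snippet (my completion, assuming DyNet is installed; the LSTM sizes and the inputs x1, x2, x3 are arbitrary):

import dynet as dy

pc = dy.ParameterCollection()
rnn = dy.LSTMBuilder(1, 4, 8, pc)    # 1 layer, input dim 4, hidden dim 8
dy.renew_cg()
x1, x2, x3 = (dy.inputVector([0.1 * i] * 4) for i in (1, 2, 3))

s = [rnn.initial_state()]
s.append(s[-1].add_input(x1))        # push x1
s.pop()                              # pop: revert to the previous state
s.append(s[-1].add_input(x2))        # push x2
s.pop()
s.append(s[-1].add_input(x3))        # push x3
print(s[-1].output().npvalue())      # embedding of the current contents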
Transition-based phrase structure parsing (Sagae and Lavie 2005, Watanabe 2015)
[Figure: phrase-structure tree with nodes S, VP, NP, SBAR, WHNP over words including "people saw the girl", "tall", and "that" (POS tags NNS, VBD, DT, JJ, WDT)]
First, binarize
[Figure: the flat NP (DT the) (JJ tall) (NP girl) becomes NP → (DT the) NP', with NP' → (JJ tall) (NP girl); intermediate NP' nodes make every node binary]
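A tiny sketch of such binarization (a hypothetical helper, not from the slides):

# Introduce X' nodes so that every node has at most two children.
def binarize(label, children):
    if len(children) <= 2:
        return (label,) + tuple(children)
    first, rest = children[0], children[1:]
    return (label, first, binarize(label + "'", rest))

print(binarize("NP", ["the", "tall", "girl"]))
# ('NP', 'the', ("NP'", 'tall', 'girl'))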
Then parse with shift and reduce actions:
shift: move the next word ("the", "tall", "girl") from the buffer onto the stack
reduce-NP': combine the top two stack items (JJ tall, NP girl) into NP'; reduce-NP then completes the NP
unary actions (e.g. unary-S) add a single-child constituent on top of the stack; reducing NP … VP around "saw" eventually yields the full S
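The same stack/buffer machinery as the dependency case, sketched in Python with reduce actions that build labeled constituents instead of arcs (hypothetical code; action names follow the slide):

def parse(words, actions):
    stack, buffer = [], list(words)
    for act in actions:
        if act == "shift":
            stack.append(buffer.pop(0))
        elif act.startswith("reduce-"):       # binary reduce
            right, left = stack.pop(), stack.pop()
            stack.append((act[len("reduce-"):], left, right))
        elif act.startswith("unary-"):        # single-child constituent
            stack.append((act[len("unary-"):], stack.pop()))
    return stack

print(parse("the tall girl".split(),
            ["shift", "shift", "shift", "reduce-NP'", "reduce-NP"]))
# [('NP', 'the', ("NP'", 'tall', 'girl'))]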
Recurrent Neural Network Grammars (Dyer et al. 2016): a generative model that builds the tree and the sentence jointly
To parse: sample candidates from a discriminative model then rerank; importance sampling for LM evaluation
Strong results as both parsers and language models
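For concreteness, a generative RNNG derives a tree with NT(X), GEN(w), and REDUCE actions; a toy interpreter of such an action sequence (hypothetical code, for illustration only):

def derive(actions):
    stack = []
    for act in actions:
        if act.startswith("NT("):
            stack.append([act[3:-1]])        # open a nonterminal
        elif act.startswith("GEN("):
            stack[-1].append(act[4:-1])      # generate a word
        elif act == "REDUCE":
            done = tuple(stack.pop())        # close the open constituent
            if not stack:
                return done
            stack[-1].append(done)

print(derive(["NT(S)", "NT(NP)", "GEN(the)", "GEN(girl)", "REDUCE",
              "NT(VP)", "GEN(saw)", "REDUCE", "REDUCE"]))
# ('S', ('NP', 'the', 'girl'), ('VP', 'saw'))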