CS11-747 Neural Networks for NLP
Parsing with Dynamic Programming
Graham Neubig
Site https://phontron.com/class/nn4nlp2020/
Two Types of Linguistic Structure

Dependency: focus on relations between words
[Dependency tree over "I saw a girl with a telescope", with arcs from ROOT]

Phrase structure: focus on the grouping of words into nested phrases
[Constituency tree over "I saw a girl with a telescope": POS tags PRP VBD DT NN IN DT NN; constituents NP, NP, PP, VP, S]
Exact inference over the exponential space of trees requires some sort of dynamic programming.

Parsing as Hypergraph Search
- The algorithms used for sequences over graphs carry over to parsing, but over hyper-graphs: forward-backward is over graphs, inside-outside is over hypergraphs.
- The complexity of inference on a hypergraph depends on the degree of its edges.
- Given a binarized CFG grammar, we can do hypergraph search with the CKY algorithm, deciding for each span whether it is a constituent in the tree or not.
- Ingredients: a model (how scores are structured), inference algorithms (CKY, top down), and efficient inference.
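A minimal sketch of Viterbi CKY over a binarized grammar (the function name and the toy lexicon/rule format below are illustrative, not from the lecture):

```python
import math
from collections import defaultdict

def cky(words, lexicon, rules):
    """Viterbi CKY over a binarized grammar.

    lexicon: dict word -> list of (tag, log_prob)
    rules:   list of (parent, left, right, log_prob) binary rules
    Returns chart[(i, j)]: dict symbol -> best log score for span [i, j),
    and backpointers for tree recovery.
    """
    n = len(words)
    chart = defaultdict(dict)
    back = {}
    # Initialize length-1 spans from the lexicon.
    for i, w in enumerate(words):
        for tag, lp in lexicon[w]:
            chart[(i, i + 1)][tag] = lp
    # Build longer spans bottom-up; each hyperedge combines two sub-spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for parent, left, right, lp in rules:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        score = lp + chart[(i, k)][left] + chart[(k, j)][right]
                        if score > chart[(i, j)].get(parent, -math.inf):
                            chart[(i, j)][parent] = score
                            back[(i, j, parent)] = (k, left, right)
    return chart, back
```

The triple loop over spans, split points, and rules is what gives the O(n³|G|) complexity mentioned in connection with hypergraph edge degree.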
Code: github.com/nikitakit/self-attentive-parser
[Span-scoring example over the sentence "this is an example", with example span scores 3, 4, 5, 7]
Chu-Liu-Edmonds: finding the highest-scoring tree (first-order prediction)
- Greedily select the best incoming edge to each node (and subtract its score from all incoming edges).
- If this forms a cycle, contract the cycle into a single node.
- Solve the contracted graph recursively, then expand the contracted node, resolving its edges appropriately.
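These steps can be sketched in a compact recursive form (not the efficient version; all names below are illustrative, and score adjustment is folded into the contraction step):

```python
def chu_liu_edmonds(scores, root=0):
    """Maximum spanning arborescence rooted at `root`.

    scores: dict (head, dep) -> score, over integer nodes.
    Returns dict dep -> head.
    """
    nodes = {h for h, d in scores} | {d for h, d in scores}
    # 1. Greedily pick the best incoming edge for every non-root node.
    best = {}
    for d in nodes:
        if d != root:
            _, best[d] = max((s, h) for (h, dd), s in scores.items() if dd == d)
    # 2. Look for a cycle among the chosen edges.
    cycle = None
    for start in best:
        seen, v = [], start
        while v in best and v not in seen:
            seen.append(v)
            v = best[v]
        if v in seen:
            cycle = seen[seen.index(v):]
            break
    if cycle is None:
        return best
    # 3. Contract the cycle into a new node c; edges entering the cycle are
    #    rescored by subtracting the cycle edge they would displace.
    c = max(nodes) + 1
    cyc = set(cycle)
    cyc_score = {d: scores[(best[d], d)] for d in cycle}
    new_scores, orig = {}, {}
    for (h, d), s in scores.items():
        if h in cyc and d in cyc:
            continue
        key, adj = ((h, c), s - cyc_score[d]) if d in cyc else \
                   ((c, d), s) if h in cyc else ((h, d), s)
        if new_scores.get(key, float("-inf")) < adj:
            new_scores[key] = adj
            orig[key] = (h, d)
    # 4. Recurse, then expand the contracted node: keep all cycle edges
    #    except the one displaced by the edge entering the cycle.
    tree = chu_liu_edmonds(new_scores, root)
    result, broken = {}, None
    for d, h in tree.items():
        oh, od = orig[(h, d)]
        result[od] = oh
        if d == c:
            broken = od
    for d in cycle:
        if d != broken:
            result[d] = best[d]
    return result
```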
(Figure Credit: Jurafsky and Martin)
Complexity: the Eisner algorithm finds the best projective trees in O(n3) (in contrast, Chu-Liu-Edmonds is non-projective); efficient implementations of Chu-Liu-Edmonds run in O(m + n log n).
Training (covered in more detail in the structured prediction class): require that incorrect edges be outscored by the correct edge by the margin, and update parameters using hinge loss (e.g. Zhang and McDonald 2012).
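A hypothetical instantiation of such a margin-based update, simplified to an arc-factored linear model where each dependent's head is predicted independently (no tree constraint; all names are illustrative):

```python
import numpy as np

def arc_hinge_step(W, feats, gold_heads, lr=0.1, margin=1.0):
    """One hinge-loss update over the arcs of a sentence.

    W:          weight vector
    feats:      feats[d][h] = feature vector for the arc h -> d
    gold_heads: gold_heads[d] = index of the correct head of dependent d
    """
    for d, gold in enumerate(gold_heads):
        # Cost-augmented scores: every wrong head gets a +margin bonus,
        # so the gold head must win by at least the margin.
        scores = np.array([W @ feats[d][h] + (margin if h != gold else 0.0)
                           for h in range(len(feats[d]))])
        pred = int(np.argmax(scores))
        if pred != gold:  # hinge loss is positive: subgradient update
            W += lr * (feats[d][gold] - feats[d][pred])
    return W
```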
Higher-order factorizations: First Order, Second Order, Third Order
[Figure: arcs over "I saw a girl with a telescope" illustrating first-, second-, and third-order arc factorizations]
Higher-order scores consider the combination of multiple arcs rather than one arc at a time. In a neural graph-based model, the words between the head and dependent are important clues (Pei et al. 2015); results are compared by looking at UAS (unlabeled attachment score) on the dependency parsing task.
Learn specific representations for head/dependent for each word, then calculate the score of each arc from these representations; the arc scores are used both for training and decoding.
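One way to realize head/dependent-specific scoring is a biaffine transform, sketched here with plain NumPy and randomly initialized (untrained) parameters; in practice the projections and interaction matrix are learned:

```python
import numpy as np

def biaffine_arc_scores(H, rng=None):
    """Score every (head, dependent) arc with a biaffine transform.

    H: (n, d) matrix of word representations from some encoder
       (a hypothetical stand-in for the model's encoder output).
    Returns an (n, n) matrix: scores[i, j] = score of arc i -> j.
    """
    n, d = H.shape
    rng = np.random.default_rng(0) if rng is None else rng
    W_head = rng.normal(size=(d, d))  # projection for the head role
    W_dep = rng.normal(size=(d, d))   # projection for the dependent role
    U = rng.normal(size=(d, d))       # biaffine interaction matrix
    b = rng.normal(size=(d,))         # bias on the head representation
    R_head = H @ W_head               # word i as a potential head
    R_dep = H @ W_dep                 # word j as a potential dependent
    # scores[i, j] = r_head[i]^T U r_dep[j] + b^T r_head[i]
    return R_head @ U @ R_dep.T + (R_head @ b)[:, None]
```

The two projections give each word separate "head" and "dependent" views, which is the point of learning role-specific representations.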
Dynamic programming over dependency trees:
- A CKY-style algorithm for dependencies (Eisner 1996).
- The matrix-tree theorem computes marginals over directed graphs (Koo et al. 2007).
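A toy sketch of the matrix-tree construction from Koo et al. (2007): build the graph Laplacian from exponentiated arc scores, replace its first row with root-attachment weights, and take the determinant to get the partition function over single-root non-projective trees (function and variable names are illustrative):

```python
import numpy as np

def tree_partition(A, r):
    """Partition function over single-root dependency trees.

    A: (n, n) arc weights, A[h, d] = exp(score) of arc h -> d between
       words (the diagonal is ignored).
    r: (n,) root weights, r[d] = exp(score) of attaching d to ROOT.
    """
    A = np.array(A, dtype=float).copy()
    np.fill_diagonal(A, 0.0)
    # Laplacian: diagonal holds each word's total incoming weight,
    # off-diagonal entries are negated arc weights.
    L = np.diag(A.sum(axis=0)) - A
    L[0, :] = r  # replace the first row with the root weights
    return float(np.linalg.det(L))
```

Marginals then follow by differentiating log Z with respect to each arc score, which is what makes this useful for training.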
The globally normalized model:

$$P(Y \mid X) = \frac{\exp\left(\sum_{j=1}^{|Y|} S(y_j \mid X, y_1, \ldots, y_{j-1})\right)}{\sum_{\tilde{Y} \in V^*} \exp\left(\sum_{j=1}^{|\tilde{Y}|} S(\tilde{y}_j \mid X, \tilde{y}_1, \ldots, \tilde{y}_{j-1})\right)}$$
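The denominator sums over every output sequence in V*, which exact inference must handle with dynamic programming; on a toy vocabulary it can be approximated by brute force, truncating V* to a maximum length (all names here are illustrative):

```python
import math
from itertools import product

def global_prob(Y, score, vocab, max_len):
    """Globally normalized sequence probability, by brute force.

    score(y_j, prefix): local score S(y_j | X, y_1..y_{j-1}); the input X
    is baked into `score` in this sketch. The normalizer enumerates all
    sequences over `vocab` up to `max_len` as a stand-in for V*.
    """
    def total(seq):
        return sum(score(y, seq[:j]) for j, y in enumerate(seq))
    num = math.exp(total(Y))
    denom = sum(math.exp(total(seq))
                for L in range(1, max_len + 1)
                for seq in product(vocab, repeat=L))
    return num / denom
```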
Such scores can be combined with a dynamic programming decoding algorithm (Le and Zuidema 2014).