Advanced Dependency Parsing
Joakim Nivre
Uppsala University, Linguistics and Philology
Based on tutorials with Ryan McDonald
Advanced Dependency Parsing 1(36)
Introduction: Plan for the Lecture
1. Graph-based vs. transition-based dependency parsing
2. Graph-based parsing
◮ Higher order models
◮ Non-projective parsing
3. Transition-based parsing
◮ Beam search
◮ Dynamic oracles
◮ Non-projective parsing
Graph-based parsing
◮ Define a space of candidate dependency trees for a sentence
◮ Learning: Induce a model for scoring an entire dependency tree
◮ Parsing: Find the highest-scoring dependency tree, given the induced model
◮ Global learning of a model for optimal dependency trees
◮ Exhaustive search during parsing (exact)
Advantages:
◮ Decoding is guaranteed to find the highest-scoring tree
◮ Training algorithms use global structure learning
Drawbacks:
◮ The context that the statistical model can look at must be limited
◮ This results in bad ‘easy’ decisions
◮ For example, first-order models often predict two subjects
◮ No parameter exists to discourage this
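The two-subject problem follows from the arc-factored form of a first-order model. As a sketch, with a weight vector w and an arc feature function f (both symbols are notation introduced here for illustration, not taken from the slides), the score of a tree T for a sentence x decomposes over single arcs (h, m):

```latex
\mathrm{score}(x, T) = \sum_{(h, m) \in T} \mathbf{w} \cdot \mathbf{f}(x, h, m),
\qquad
T^{*} = \operatorname*{arg\,max}_{T \in \mathcal{T}(x)} \mathrm{score}(x, T)
```

Every term sees exactly one arc, so if two different words each score well as the subject of the same verb, no term in the sum can penalize attaching both of them.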
Transition-based parsing
◮ Define a transition system (state machine) for mapping a sentence to its dependency tree
◮ Learning: Induce a model for predicting the next state transition, given the transition history
◮ Parsing: Construct the optimal transition sequence, given the induced model
◮ Local learning of a model for optimal transitions
◮ Greedy best-first search (heuristic)
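The idea of a transition system driven by greedy search can be sketched as follows. This is a minimal illustration that assumes the arc-eager system (one common choice) with transitions SH, LA, RA, RE; the function names and the hand-written `toy_score` are hypothetical stand-ins for a learned model.

```python
# Minimal arc-eager sketch. Node 0 is ROOT; words are numbered 1..n.
# A configuration is (stack, buffer, heads), where heads maps dependent -> head.
TRANSITIONS = ("SH", "LA", "RA", "RE")

def legal(config, t):
    """Standard arc-eager preconditions (an assumption of this sketch)."""
    stack, buffer, heads = config
    if t == "SH":
        return bool(buffer)
    if not stack:
        return False
    s = stack[-1]
    if t == "RE":
        return s in heads                  # only pop nodes that have a head
    if not buffer:
        return False
    if t == "LA":
        return s != 0 and s not in heads   # ROOT and headed nodes cannot be attached
    return t == "RA"

def apply(config, t):
    """Return the successor configuration after transition t."""
    stack, buffer, heads = list(config[0]), list(config[1]), dict(config[2])
    if t == "SH":
        stack.append(buffer.pop(0))
    elif t == "LA":                        # arc b -> s, pop s
        heads[stack.pop()] = buffer[0]
    elif t == "RA":                        # arc s -> b, push b
        heads[buffer[0]] = stack[-1]
        stack.append(buffer.pop(0))
    elif t == "RE":
        stack.pop()
    return stack, buffer, heads

def greedy_parse(n_words, score):
    """Pick the best-scoring legal transition until the buffer is empty."""
    config = ([0], list(range(1, n_words + 1)), {})
    while config[1]:
        options = [t for t in TRANSITIONS if legal(config, t)]
        config = apply(config, max(options, key=lambda t: score(config, t)))
    return config[2]

# Hypothetical scorer for "John saw Mary": prefers three hand-picked arcs.
GOOD_ARCS = {(2, 1), (0, 2), (2, 3)}

def toy_score(config, t):
    stack, buffer, _ = config
    s, b = stack[-1], buffer[0]
    if t == "LA":
        return 1 if (b, s) in GOOD_ARCS else -1
    if t == "RA":
        return 1 if (s, b) in GOOD_ARCS else -1
    return 0 if t == "SH" else -0.5

heads = greedy_parse(3, toy_score)   # {1: 2, 2: 0, 3: 2}
```

Each word is shifted or right-attached exactly once and popped at most once, which is where the linear running time of greedy transition-based parsing comes from.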
Advantages:
◮ Highly efficient parsing – linear time complexity
◮ Rich history-based feature representations – no rigid constraints from the parsing algorithm
Drawbacks:
◮ Sensitive to search errors and error propagation due to greedy search and local learning
[Figure: MSTParser vs. MaltParser; x-axis bins 10, 20, 30, 40, 50, 50+; y-axis from 0.70 to 0.84]
[Figure: dependency precision (left panel) and dependency recall (right panel) as a function of dependency length, MSTParser vs. MaltParser]
◮ Vertical: e.g., “remain” is the grandparent of “emeritus”
◮ Horizontal: e.g., “remain” is the first child of “will”
◮ McDonald and Pereira [2006] (2nd-order sibling)
◮ Carreras [2007] (2nd-order sibling and grandparent)
◮ Koo and Collins [2010] (3rd-order grand-sibling and tri-sibling)
◮ Ma and Zhao [2012] (4th-order grand-tri-sibling+)
[Figure, from the Koo et al. 2010 presentation: factorizations over head h, modifier m, siblings s and s′ (horizontal context) and grandparent g (vertical context), annotated with parsing complexities O(n^3), O(n^3), O(n^4), O(n^4), O(n^4), O(n^5)]
◮ Specialized chart items and combination rules
◮ Time complexity increases for every added order
◮ Anything beyond 2nd-order is too slow in practice
◮ Global training and exact inference – local feature scope
◮ Increasing feature scope makes exact inference harder
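To make "feature scope" concrete, the sketch below enumerates the parts that a first-order model and a second-order (sibling and grandparent) model would score for a given tree. The function name `parts` and the tuple layouts are assumptions made for this illustration; real parsers differ in how they define and index parts.

```python
from collections import defaultdict

def parts(heads):
    """heads maps each modifier to its head (0 = ROOT).

    Returns (arcs, siblings, grandparents):
      arcs          (h, m)     first-order parts
      siblings      (h, s, m)  adjacent modifiers s, m on the same side of h
      grandparents  (g, h, m)  g is the head of h, h is the head of m
    """
    arcs = sorted((h, m) for m, h in heads.items())
    children = defaultdict(list)
    for m, h in heads.items():
        children[h].append(m)
    siblings = []
    for h, ms in children.items():
        left = sorted((m for m in ms if m < h), reverse=True)   # from h outwards
        right = sorted(m for m in ms if m > h)
        for side in (left, right):
            siblings += [(h, s, m) for s, m in zip(side, side[1:])]
    grandparents = [(heads[h], h, m) for m, h in heads.items() if h in heads]
    return arcs, sorted(siblings), sorted(grandparents)

# Words 1 and 2 are both left modifiers of word 3 (e.g. two candidate subjects).
arcs, sibs, grands = parts({1: 3, 2: 3, 3: 0, 4: 3})
# sibs == [(3, 2, 1)]: a part on which a weight against "two subjects" can live
```

A first-order model only sees the entries of `arcs`; the sibling part `(3, 2, 1)` is exactly the kind of context that is needed to discourage two modifiers with the same role.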
◮ Construct a graph with the highest-scoring head for each word
◮ If this is a tree, it must be the MST
◮ If not, contract a cycle and recurse on the smaller graph
[Figure: example graph over ROOT, John, saw, Mary with arc scores 9, 10, 20, 9, 30, 11, 3, 30, and the resulting tree with arc scores 10, 30, 30]
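The three steps above correspond to the procedure commonly known as the Chu-Liu-Edmonds algorithm. The following is a compact recursive sketch, not an optimized implementation. The assignment of scores to specific arcs in the example is an assumption (the figure's layout is not recoverable from the text); it is chosen so that the result matches the final scores 10, 30, 30.

```python
def find_cycle(head):
    """Return the nodes of one cycle in the dependent -> head map, or None."""
    for start in head:
        path, v = [], start
        while v in head and v not in path:
            path.append(v)
            v = head[v]
        if v in path:
            return path[path.index(v):]
    return None

def chu_liu_edmonds(nodes, scores, root):
    """scores maps (head, dependent) -> score; no arc may enter root.
    Returns a dependent -> head map for the highest-scoring tree."""
    # Step 1: pick the best head for every word independently.
    head = {}
    for d in nodes:
        if d != root:
            cands = [(s, h) for (h, dd), s in scores.items() if dd == d]
            head[d] = max(cands, key=lambda c: c[0])[1]
    cycle = find_cycle(head)
    if cycle is None:                      # Step 2: already a tree
        return head
    # Step 3: contract the cycle into a single node c and recurse.
    cyc, c = set(cycle), ("cycle", tuple(cycle))
    cycle_score = sum(scores[head[v], v] for v in cycle)
    new_scores, enter, leave = {}, {}, {}
    for (h, d), s in scores.items():
        if h in cyc and d in cyc:
            continue
        if d in cyc:                       # arc entering the cycle: breaks head[d] -> d
            adj = s - scores[head[d], d] + cycle_score
            if (h, c) not in new_scores or adj > new_scores[h, c]:
                new_scores[h, c], enter[h] = adj, d
        elif h in cyc:                     # arc leaving the cycle
            if (c, d) not in new_scores or s > new_scores[c, d]:
                new_scores[c, d], leave[d] = s, h
        else:
            new_scores[h, d] = s
    sub = chu_liu_edmonds([n for n in nodes if n not in cyc] + [c], new_scores, root)
    # Expand the contracted node back into the original words.
    result = {}
    for d, h in sub.items():
        if d == c:
            result[enter[h]] = h
            result.update({v: head[v] for v in cycle if v != enter[h]})
        else:
            result[d] = leave[d] if h == c else h
    return result

scores = {("ROOT", "John"): 9, ("ROOT", "saw"): 10, ("ROOT", "Mary"): 9,
          ("John", "saw"): 20, ("saw", "John"): 30, ("saw", "Mary"): 30,
          ("Mary", "saw"): 0, ("John", "Mary"): 3, ("Mary", "John"): 11}
tree = chu_liu_edmonds(["ROOT", "John", "saw", "Mary"], scores, "ROOT")
# tree == {"John": "saw", "saw": "ROOT", "Mary": "saw"}
```

In this example the greedy step selects saw → John and John → saw, which form a cycle. After contraction, ROOT enters the cycle at "saw", which breaks the arc John → saw and yields a tree with total score 70.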
Beam search
◮ Pruning the beam requires that we score transition sequences
◮ Global learning to maximize the score of the entire sequence
◮ Global learning – minimize loss over the entire sentence
◮ Non-greedy search – accuracy increases with beam size
◮ Highly efficient – complexity still linear for fixed beam size
◮ Rich features – no constraints from the parsing algorithm
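A generic beam search over action sequences can be sketched as below. The callback names (`actions`, `step`, `score`, `is_final`) and the toy score table are hypothetical; in a parser, states would be configurations and `score` would come from a globally trained model.

```python
def beam_search(start, actions, step, score, is_final, beam_size):
    """Keep the beam_size best partial sequences by total score.

    Returns (total_score, final_state, action_sequence) for the best item.
    """
    beam = [(0, start, [])]
    while not all(is_final(state) for _, state, _ in beam):
        candidates = []
        for total, state, seq in beam:
            if is_final(state):
                candidates.append((total, state, seq))
                continue
            for a in actions(state):
                candidates.append((total + score(state, a), step(state, a), seq + [a]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]       # prune to the beam width
    return max(beam, key=lambda c: c[0])

# Toy problem: two decisions; the locally best first action leads to a dead end.
TABLE = {((), "a"): 10, ((), "b"): 9,
         (("a",), "c"): 1, (("a",), "d"): 0,
         (("b",), "c"): 10, (("b",), "d"): 0}

def run(beam_size):
    return beam_search(
        start=(),
        actions=lambda st: ["a", "b"] if not st else ["c", "d"],
        step=lambda st, a: st + (a,),
        score=lambda st, a: TABLE[st, a],
        is_final=lambda st: len(st) == 2,
        beam_size=beam_size,
    )

greedy = run(1)   # (11, ("a", "c"), ["a", "c"])
beamed = run(2)   # (19, ("b", "c"), ["b", "c"])
```

With a beam of size 1 the search is greedy and commits to "a"; with size 2 it keeps "b" alive long enough to find the higher-scoring sequence. For a fixed beam size the number of scored candidates grows linearly with sequence length.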
[Figure: MST vs. Malt vs. ZPar; x-axis from 2 to 14, y-axis from 0.4 to 0.9]
◮ At parsing time, the parser can recover from early bad decisions
◮ At training time, the parser can learn to avoid costly mistakes
◮ Yes – but we need dynamic oracles for training
◮ Then we can improve greedy parsing for maximum speed
A traditional (static) oracle o for a gold tree T:
◮ Derives T in a configuration sequence C_{o,T} = c_0, ..., c_m
◮ Deterministic: Ignores other derivations of T
◮ Incomplete: Valid only for configurations in C_{o,T}
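As an illustration, here is a sketch of a static oracle, assuming the arc-eager transition system (SH, LA, RA, RE) and a projective gold tree. The rule ordering is one common formulation and the example sentence is an assumption chosen for this sketch; the function names are hypothetical.

```python
def apply(stack, buffer, heads, t):
    """Apply one arc-eager transition in place."""
    if t == "SH":
        stack.append(buffer.pop(0))
    elif t == "LA":                      # arc b -> s, pop s
        heads[stack.pop()] = buffer[0]
    elif t == "RA":                      # arc s -> b, push b
        heads[buffer[0]] = stack[-1]
        stack.append(buffer.pop(0))
    elif t == "RE":
        stack.pop()

def static_oracle(stack, buffer, heads, gold):
    """Return the single transition prescribed for this configuration."""
    s, b = stack[-1], buffer[0]
    if gold.get(s) == b:
        return "LA"
    if gold.get(b) == s:
        return "RA"
    # Reduce when s already has its head and b still needs a node below s.
    below = stack[:-1]
    if s in heads and any(gold.get(b) == w or gold.get(w) == b for w in below):
        return "RE"
    return "SH"

def derive(gold):
    """Run the oracle from the initial configuration; gold maps dependent -> head."""
    stack, buffer, heads, seq = [0], sorted(gold), {}, []
    while buffer:
        t = static_oracle(stack, buffer, heads, gold)
        apply(stack, buffer, heads, t)
        seq.append(t)
    return seq, heads

# Assumed example: 1 He, 2 wrote, 3 her, 4 a, 5 letter, 6 "." (0 = ROOT)
GOLD = {1: 2, 2: 0, 3: 2, 4: 5, 5: 2, 6: 2}
seq, heads = derive(GOLD)
# seq == ["SH", "LA", "RA", "RA", "SH", "LA", "RE", "RA", "RE", "RA"]
```

The oracle returns exactly one transition per configuration. After "her" is attached, for example, it shifts "a", although reducing "her" first would derive the same tree; and it says nothing useful about configurations that lie off this one sequence.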
[Figure: step-by-step derivation of a labeled dependency tree; arcs are added with the labels sbj, root, iobj, det, dobj, p]
[Figure: a second derivation that starts from the arcs sbj, root, iobj and then adds det, dobj, p]
[Figure: a third derivation with the arcs sbj, root, det, an arc marked "?", dobj, p]
Dynamic oracles
◮ A transition is optimal if the best tree remains reachable
◮ Best tree = argmin_{T′} L(T, T′)
◮ Boolean function o(c, t, T) = true if t is optimal for c and T
◮ Non-deterministic: More than one transition can be optimal
◮ Complete: Correct for all configurations
◮ How do we know which trees are reachable?
◮ Easy for some transition systems (called arc-decomposable)
o(c, LA, T) = false if ∃w ∈ B_c : s ↔ w ∈ T (except s ← b), true otherwise
o(c, RA, T) = false if ∃w ∈ S_c : w ↔ b ∈ T (except s → b), true otherwise
o(c, RE, T) = false if ∃w ∈ B_c : s → w ∈ T, true otherwise
o(c, SH, T) = false if ∃w ∈ S_c : w ↔ b ∈ T, true otherwise
Notation: s = node on top of the stack S, b = first node in the buffer B; LA = Left-Arc, RA = Right-Arc, RE = Reduce, SH = Shift
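The four conditions can be turned into a small check, sketched here under the assumption of the arc-eager system with its usual preconditions (which the conditions themselves do not restate). The function names and the example sentence are hypothetical. One line in the RA case is an addition of this sketch, marked in the code: attaching b to s also loses b's gold head if that head is still in the buffer.

```python
def legal(stack, buffer, heads, t):
    """Usual arc-eager preconditions (assumed; stack and buffer non-empty)."""
    s = stack[-1]
    if t == "LA":
        return s != 0 and s not in heads
    if t == "RE":
        return s in heads
    return t in ("RA", "SH")

def optimal(stack, buffer, gold, t):
    """True if t loses no gold arc that is still reachable from this configuration."""
    s, b = stack[-1], buffer[0]
    if t == "LA":   # adds b -> s and pops s
        return not any((gold.get(s) == w and w != b) or gold.get(w) == s
                       for w in buffer)
    if t == "RA":   # adds s -> b and pushes b
        if gold.get(b) in buffer:        # addition of this sketch, see lead-in
            return False
        return not any((gold.get(b) == w and w != s) or gold.get(w) == b
                       for w in stack)
    if t == "RE":   # pops s: s can take no more dependents
        return not any(gold.get(w) == s for w in buffer)
    if t == "SH":   # pushes b: b can no longer connect to the stack
        return not any(gold.get(b) == w or gold.get(w) == b for w in stack)
    raise ValueError(t)

def optimal_transitions(stack, buffer, heads, gold):
    return {t for t in ("LA", "RA", "RE", "SH")
            if legal(stack, buffer, heads, t) and optimal(stack, buffer, gold, t)}

# Assumed example: 1 He, 2 wrote, 3 her, 4 a, 5 letter, 6 "." (0 = ROOT)
GOLD = {1: 2, 2: 0, 3: 2, 4: 5, 5: 2, 6: 2}

# On the gold path, after attaching "her": two transitions are optimal.
on_path = optimal_transitions([0, 2, 3], [4, 5, 6], {1: 2, 2: 0, 3: 2}, GOLD)
# on_path == {"RE", "SH"}

# Off the gold path ("wrote" was shifted by mistake): the oracle still answers.
off_path = optimal_transitions([0, 1, 2], [3, 4, 5, 6], {}, GOLD)
# off_path == {"RA"}
```

The first call shows non-determinism (both RE and SH keep the best tree reachable), the second shows completeness: the configuration cannot lead to the gold tree any more, yet the oracle still identifies the transition that loses nothing further.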
◮ Pseudo-projective parsing [Nivre and Nilsson 2005]
◮ Non-adjacent arc transitions
◮ Online reordering [Nivre 2009, Nivre et al. 2009]
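◮ Words can always be reordered to make the tree projective
◮ Given a dependency tree T = (V, A, <), let the projective order <_p be the order given by an inorder traversal of T that respects <
[Figure: a non-projective tree over ROOT and its projective reordering; arc labels: root, det, aux, nsubj, prep, pobj, det, tmod, p]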
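One way to compute such a reordering is an inorder traversal of the tree: visit a node's left dependents, then the node, then its right dependents. The sketch below assumes this definition; the example sentence is an assumption chosen to fit the arc labels listed for the figure, and the function name is hypothetical.

```python
def projective_order(gold):
    """gold maps dependent -> head (0 = ROOT). Returns the nodes in projective order."""
    n = max(gold)
    children = {h: sorted(m for m in gold if gold[m] == h) for h in range(n + 1)}
    order = []

    def visit(h):
        for m in children[h]:
            if m < h:                 # left dependents first, in original order
                visit(m)
        order.append(h)
        for m in children[h]:
            if m > h:                 # then right dependents
                visit(m)

    visit(0)
    return order

# Assumed example (arc "hearing -> on" crosses "is" and "scheduled"):
WORDS = ["ROOT", "A", "hearing", "is", "scheduled", "on", "the", "issue", "today", "."]
GOLD = {1: 2, 2: 4, 3: 4, 4: 0, 5: 2, 6: 7, 7: 5, 8: 4, 9: 4}

order = projective_order(GOLD)
reordered = " ".join(WORDS[i] for i in order)
# reordered == "ROOT A hearing on the issue is scheduled today ."
```

Because every subtree is emitted as one contiguous block, the tree is projective with respect to the new order.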
[Figure: step-by-step derivation with online reordering; arcs are added in the order det, aux, det, pobj, prep, nsubj, tmod, p, root]
◮ Sound and complete for the class of non-projective trees
◮ Quadratic running time in the worst case
◮ Linear running time in the average case
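Online reordering can be sketched as an arc-standard style system extended with a SWAP transition, driven by an oracle that swaps two stack nodes whenever they are in the wrong projective order. The transition definitions, the oracle rules, and the example sentence are assumptions of this sketch, stated in the spirit of the cited approach rather than copied from it.

```python
def swap_parse(gold):
    """Derive gold (dependent -> head, 0 = ROOT) with SH, LA, RA and SWAP."""
    n = max(gold)
    children = {h: sorted(m for m in gold if gold[m] == h) for h in range(n + 1)}
    order = []

    def visit(h):                          # inorder traversal = projective order
        for m in children[h]:
            if m < h:
                visit(m)
        order.append(h)
        for m in children[h]:
            if m > h:
                visit(m)

    visit(0)
    rank = {w: k for k, w in enumerate(order)}
    stack, buffer, heads, seq = [0], list(range(1, n + 1)), {}, []

    def complete(w):                       # all gold dependents of w are attached
        return all(m in heads for m in children[w])

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            i, j = stack[-2], stack[-1]
            if gold.get(i) == j and complete(i):       # LA: arc j -> i
                heads[i] = j
                del stack[-2]
                seq.append("LA")
                continue
            if gold.get(j) == i and complete(j):       # RA: arc i -> j
                heads[j] = i
                stack.pop()
                seq.append("RA")
                continue
            if 0 < i < j and rank[j] < rank[i]:        # SWAP: move i back to buffer
                buffer.insert(0, stack.pop(-2))
                seq.append("SWAP")
                continue
        stack.append(buffer.pop(0))
        seq.append("SH")
    return seq, heads

# Assumed example: ROOT A hearing is scheduled on the issue today .
GOLD = {1: 2, 2: 4, 3: 4, 4: 0, 5: 2, 6: 7, 7: 5, 8: 4, 9: 4}
seq, heads = swap_parse(GOLD)
# heads == GOLD, with 3 SWAP transitions among 24 transitions in total
```

Every SWAP sends one word back to the buffer, so it costs one extra SH. A sentence of n words needs 2n transitions plus two per swap, which is linear when swaps are rare and quadratic only when many pairs of words are out of projective order.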
◮ Graph-based: Increase feature scope (higher order models)
◮ Transition-based: Improve learning and inference (beam search, dynamic oracles)