Dependency Parsing
- Dr. Besnik Fetahu
Parsing so far
- Use context-free grammars (CFGs) to determine the constituents in a clause or sentence.
- Use CFGs to parse entire sentences into constituency-based parse trees, e.g. syntactic parse trees.
- The constituents in a sentence are "latent": lexicalized rules are added for each of the different positions to capture specific phrases (e.g. the rules proposed by Collins).
- Relations among words are represented through directed, labelled arcs (typed dependencies).
- Relations are drawn from a fixed set of relations that are linguistically motivated (e.g. nsubj describes a nominal subject in a sentence).
- Each word has exactly one incoming arc (except the root node).
- Dependency parse trees are useful for coreference resolution, question answering, etc.
Advantages over the CFGs that are used in constituent-based parsing:
- Head-dependent relations hold directly between any parent-child nodes, with no intermediate phrase structure.
- They are well suited to languages with free word order (e.g. Czech).
- Dependency parses can be directly used in other NLP applications, since we can directly extract the verbs and their arguments (e.g. the case of prefer).
Selected dependency relations from the Universal Dependency set. (de Marneffe et al., 2014)
In languages with fixed word order, heads and dependents are correlated in the positions they appear; however, this is not the case for languages with a free word order (e.g. Czech). The relations fall into two groups: (i) clausal relations that describe syntactic roles w.r.t. the predicate, and (ii) modifier relations that categorize the ways the words modify their heads (e.g. nmod or amod).
[Figure: example dependency tree, with the root "canceled" heading dependents including "flights" and "Houston".]
Formally, a dependency tree is a graph G = (V, E), where V is the set of words (in some cases the stems of words), and E are the arcs which represent the word relations.
An arc from a head to a dependent is projective if there is a path from the head word to every word that lies between the head and the dependent word in the sentence. A parse tree is projective if all its arcs are projective; equivalently, we can check whether any arcs cross each other when drawn above the sentence.
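The crossing-arcs check can be sketched as follows (a minimal sketch; the head-list encoding and the function name are assumptions, not from the slides):

```python
def is_projective(heads):
    """heads[i] is the head index of token i (0 = ROOT);
    heads[0] is a placeholder for ROOT itself."""
    # Each arc is stored as an (earlier, later) position pair.
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if i > 0]
    for a in arcs:
        for b in arcs:
            if a[0] < b[0] < a[1] < b[1]:   # endpoints interleave -> crossing
                return False
    return True

# A projective tree: 2 -> 1, ROOT -> 2, 2 -> 3
print(is_projective([0, 2, 0, 2]))      # True
# A non-projective tree: arcs (1,3) and (2,4) cross
print(is_projective([0, 3, 4, 0, 3]))   # False
```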
Recall shift-reduce parsing for CFGs: it uses a stack and a list of tokens that need to be parsed. Tokens are shifted onto the stack, and the top elements in the stack are matched against the rules in the CFG; when matched, the matched words are replaced with the non-terminal from the CFG rule.
Transition-based dependency parsing adapts this idea using a configuration which consists of: (i) a stack, (ii) an input buffer of words, and (iii) a set of relations representing a dependency tree. Parsing continues until all words have been accounted for and an appropriate dependency tree has been synthesized.
Initially the stack contains only ROOT, the word list (buffer) is initialized with the word list from the sentence, and an empty set of relations is created to represent the parse. Three operators are then applied:
- LEFTARC: assert a head-dependent relation between the word at the top of the stack and the word below it; the dependent is removed from the stack if a relation is found.
- RIGHTARC: assert a head-dependent relation in the opposite direction; the word at the top of the stack is removed if a relation is found.
- SHIFT: remove a word from the front of the buffer and push it onto the stack.
This is the arc-standard approach to transition-based parsing. Its operators only examine the top two words in the stack. Once an element has its head word, it is removed from the stack and is not available for further processing. Since ROOT cannot have an incoming arc, LEFTARC cannot be applied when ROOT is the second element in the stack. The operators can also be parameterized with the dependency relation labels; for head-dependent word pairs we assume one choice as a relation between them.
“Book me the morning flight”
The parser assumes there is one single parse for any two words from an input sentence:
1. Due to ambiguity there may be different transition sequences that lead to valid parses.
2. We assume that our oracle provides us with correct parses for each word pair. This assumption is unlikely to hold in reality.
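The arc-standard transitions can be sketched as follows, replaying a plausible gold transition sequence for "Book me the morning flight" (the function name, the hard-coded action list, and the unlabeled arcs are assumptions for illustration):

```python
def parse(words, actions):
    """Apply a given sequence of arc-standard transitions.
    Returns a list of (head, dependent) arcs (labels omitted)."""
    stack, buffer, relations = ['ROOT'], list(words), []
    for action in actions:
        if action == 'SHIFT':
            stack.append(buffer.pop(0))
        elif action == 'LEFTARC':    # top of stack heads the second element
            dependent = stack.pop(-2)
            relations.append((stack[-1], dependent))
        elif action == 'RIGHTARC':   # second element heads the top
            dependent = stack.pop()
            relations.append((stack[-1], dependent))
    return relations

seq = ['SHIFT', 'SHIFT', 'RIGHTARC', 'SHIFT', 'SHIFT', 'SHIFT',
       'LEFTARC', 'LEFTARC', 'RIGHTARC', 'RIGHTARC']
rels = parse("Book me the morning flight".split(), seq)
# rels: (Book, me), (flight, morning), (flight, the),
#       (Book, flight), (ROOT, Book)
```

Note how each RIGHTARC removes the dependent from the stack, so "flight" is attached to "Book" only after its own dependents "the" and "morning" have been collected.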
The oracle is trained as a classifier that predicts transition operators: training instances pair the configuration (the top words in the stack, the front of the buffer) with the correct transition operator. The correct operator is derived from the reference parse and the current configuration: choose LEFTARC if it produces a correct head-dependent relation given the reference parse; choose RIGHTARC if (i) it produces a correct relation given the reference parse, and (ii) all of the dependents of the word at the top of the stack have already been assigned; otherwise choose SHIFT.
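These three rules can be sketched as a driver that derives the gold action sequence from a reference parse (the helper name `oracle_sequence` and the `ref_heads` encoding are assumptions for illustration):

```python
def oracle_sequence(words, ref_heads):
    """ref_heads: dict mapping each dependent word to its gold head
    ('ROOT' for the sentence root). Returns the list of arc-standard
    actions that reproduces the reference parse."""
    stack, buffer, assigned, actions = ['ROOT'], list(words), set(), []
    while len(stack) > 1 or buffer:
        # LEFTARC if the top word heads the second (never applied to ROOT)
        if len(stack) >= 2 and stack[-2] != 'ROOT' \
                and ref_heads[stack[-2]] == stack[-1]:
            assigned.add(stack.pop(-2))
            actions.append('LEFTARC')
        # RIGHTARC only once all of the top word's dependents are attached
        elif len(stack) >= 2 and ref_heads[stack[-1]] == stack[-2] \
                and all(d in assigned
                        for d, h in ref_heads.items() if h == stack[-1]):
            assigned.add(stack.pop())
            actions.append('RIGHTARC')
        else:
            stack.append(buffer.pop(0))
            actions.append('SHIFT')
    return actions

ref = {'Book': 'ROOT', 'me': 'Book', 'the': 'flight',
       'morning': 'flight', 'flight': 'Book'}
actions = oracle_sequence("Book me the morning flight".split(), ref)
# ['SHIFT', 'SHIFT', 'RIGHTARC', 'SHIFT', 'SHIFT', 'SHIFT',
#  'LEFTARC', 'LEFTARC', 'RIGHTARC', 'RIGHTARC']
```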
Training data for training the dependency parsing oracle.
Extract features based on feature templates from the configurations, e.g.:
⟨s1.w, op⟩, ⟨s2.w, op⟩, ⟨s1.t, op⟩, ⟨s2.t, op⟩, ⟨b1.w, op⟩, ⟨s1.wt, op⟩
where s1 and s2 are the top two elements of the stack, b1 is the front of the buffer, w is the word form, t the POS tag, wt their combination, and op the transition operator.
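A sketch of instantiating these templates for one configuration (the dict encoding and function name are assumptions; in training, each feature would additionally be paired with the gold operator op):

```python
def extract_features(stack, buffer, tags):
    """Instantiate the configuration side of the feature templates;
    `tags` maps words to POS tags (an assumed input)."""
    feats = {}
    if stack:
        s1 = stack[-1]
        feats['s1.w'] = s1                     # top-of-stack word form
        feats['s1.t'] = tags.get(s1)           # its POS tag
        feats['s1.wt'] = (s1, tags.get(s1))    # combined word + tag
    if len(stack) >= 2:
        s2 = stack[-2]
        feats['s2.w'] = s2
        feats['s2.t'] = tags.get(s2)
    if buffer:
        feats['b1.w'] = buffer[0]              # front-of-buffer word form
    return feats

f = extract_features(['ROOT', 'Book'], ['me', 'the'],
                     {'Book': 'VB', 'me': 'PRP'})
# f['s1.w'] == 'Book', f['s1.t'] == 'VB', f['b1.w'] == 'me'
```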
The arc-standard algorithm delays the removal of dependent words whose head has been assigned until all of their subsequent dependents have been found, which postpones attachment decisions even for otherwise straightforward parses. In the arc-eager approach, a word's head is assigned as early as possible, before all of its dependent words have been encountered.
- LEFTARC: assert a head-dependent relation between the word at the front of the input buffer and the word at the top of the stack; pop the stack.
- RIGHTARC: assert a head-dependent relation between the word on top of the stack and the word at the front of the input buffer; shift the buffer word onto the stack.
- SHIFT: remove the word at the front of the input buffer and push it onto the stack.
- REDUCE: pop the stack; applicable once the popped word has already been assigned its head.
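The arc-eager transitions can be sketched like this (the function name and unlabeled arcs are assumptions; the REDUCE precondition, that the popped word already has a head, is not checked here):

```python
def step(action, stack, buffer, relations):
    """One arc-eager transition; arcs connect stack top and buffer front."""
    if action == 'SHIFT':
        stack.append(buffer.pop(0))
    elif action == 'LEFTARC':        # head = buffer front, dep = stack top
        relations.append((buffer[0], stack.pop()))
    elif action == 'RIGHTARC':       # head = stack top, dep = buffer front
        relations.append((stack[-1], buffer[0]))
        stack.append(buffer.pop(0))  # dependent is pushed, not discarded
    elif action == 'REDUCE':         # drop a word whose head is assigned
        stack.pop()                  # (precondition not checked here)

stack, buffer, relations = ['ROOT'], "Book the flight".split(), []
for act in ['RIGHTARC', 'SHIFT', 'LEFTARC', 'RIGHTARC', 'REDUCE', 'REDUCE']:
    step(act, stack, buffer, relations)
# relations: (ROOT, Book), (flight, the), (Book, flight)
```

Notice that (ROOT, Book) is asserted in the very first step, while "Book" is still at the front of the buffer; arc-standard would only produce this arc last.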
Graph-based parsing searches for the tree that maximizes some score (similar to constituent parsing). Unlike transition-based parsing, it can recover non-projective dependencies. We start from a fully connected graph with vertices being the words and directed edges being all the possible head-dependent relations, and then apply a maximum spanning tree algorithm to find the best parse tree.
T̂(S) = argmax_{t ∈ T(S)} score(t, S)
score(t, S) = Σ_{e ∈ t} score(e)

where T(S) is the set of possible dependency trees for sentence S, and the score of a tree decomposes over its edges e.
[Figure: weighted graph over root, "Book", "that", "flight"; maximum spanning tree shown in blue.]
Walkthrough of the maximum spanning tree (Chu-Liu-Edmonds) algorithm on "Book that flight":
- Step v='Book': select the highest-scoring incoming edge of the vertex 'Book' and subtract its score from all of 'Book''s incoming edges.
- Step v='that': repeat for the vertex 'that'.
- Step v='flight': repeat for the vertex 'flight'.
- Step: contract cycles — the selected edges form a cycle between 'that' and 'flight'; contract it into a single node 'tf' and recursively apply the algorithm to the smaller graph.
- Step: expand — expand the contracted node in the resulting tree T' to determine which edge of the cycle to delete.
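The contract-and-expand procedure can be sketched compactly (nested-dict score encoding and function names are assumptions; rescoring an arc that enters a cycle relative to the cycle arc it replaces is equivalent to the per-vertex normalization in the walkthrough):

```python
def find_cycle(heads):
    """Return a list of nodes forming a cycle in the head map, or None."""
    for start in heads:
        seen, v = set(), start
        while v in heads:
            if v in seen:                    # first repeated node is on the cycle
                cycle, u = [v], heads[v]
                while u != v:
                    cycle.append(u)
                    u = heads[u]
                return cycle
            seen.add(v)
            v = heads[v]
    return None

def mst(scores, root):
    """Chu-Liu-Edmonds sketch. scores[h][d] = weight of arc h -> d.
    Returns {dependent: head} for the maximum spanning tree."""
    nodes = set(scores)
    for ds in scores.values():
        nodes |= set(ds)
    # 1) greedily pick the highest-scoring incoming arc for each node
    heads = {}
    for d in nodes:
        if d == root:
            continue
        heads[d] = max(((ds[d], h) for h, ds in scores.items()
                        if d in ds and h != d), key=lambda sh: sh[0])[1]
    cycle = find_cycle(heads)
    if cycle is None:
        return heads
    # 2) contract the cycle into a single placeholder node
    cyc, c = set(cycle), ('CONTRACTED', id(cycle))
    new_scores, mapping = {}, {}
    for h, ds in scores.items():
        for d, s in ds.items():
            nh = c if h in cyc else h
            nd = c if d in cyc else d
            if nh == nd:
                continue
            # arcs entering the cycle are rescored relative to the
            # cycle arc they would replace
            w = s - scores[heads[d]][d] if nd == c else s
            if w > new_scores.setdefault(nh, {}).get(nd, float('-inf')):
                new_scores[nh][nd] = w
                mapping[(nh, nd)] = (h, d)
    # 3) recurse on the smaller graph, then 4) expand the cycle,
    # breaking it at the node where the chosen entering arc arrives
    sub = mst(new_scores, root)
    result = {}
    for d, h in sub.items():
        oh, od = mapping[(h, d)]
        result[od] = oh
    for v in cyc:
        if v not in result:
            result[v] = heads[v]
    return result

# Tiny example with a cycle between 'Book' and 'flight':
scores = {'root': {'Book': 5, 'flight': 1},
          'Book': {'flight': 10},
          'flight': {'Book': 10}}
tree = mst(scores, 'root')
# tree == {'Book': 'root', 'flight': 'Book'}  (total score 15)
```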
Evaluating parsers by exact match of whole trees results in very strict criteria, under which most algorithms will fail to produce entirely correct parse trees. Instead, we measure per-word performance: the labeled attachment score (LAS) is the fraction of words assigned the correct head, along with the correct relation; the unlabeled attachment score (UAS) only requires the correct head.
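Both attachment scores can be computed directly (a minimal sketch; the per-token (head, relation) encoding is an assumption):

```python
def attachment_scores(gold, pred):
    """gold, pred: per-token lists of (head, relation) pairs.
    UAS counts correct heads; LAS additionally requires the relation."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

gold = [('ROOT', 'root'), ('Book', 'iobj'), ('flight', 'det'), ('Book', 'dobj')]
pred = [('ROOT', 'root'), ('Book', 'dobj'), ('flight', 'det'), ('Book', 'dobj')]
uas, las = attachment_scores(gold, pred)
# uas == 1.0 (all heads correct), las == 0.75 (one relation wrong)
```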