Graph Based Dependency Parsing
Wei Qiu December 15, 2011
Outline
1 Introduction to dependency parsing
2 Graph based dependency parsing
3 Parsing algorithms
    Projective dependency parsing
    Non-projective dependency parsing (MST)
4 Learning framework
5 Evaluation
Introduction to dependency parsing
What is dependency parsing?
Input: a sentence, e.g. "John saw Mary yesterday."
Output: a dependency tree.
[Figure: dependency tree over "John saw Mary yesterday ." with arc labels poss, nsubj, xcomp, dobj]
Dependency tree
[Figure: dependency tree over Root, John, saw, Mary, yesterday and "." with arcs labeled PRED, SBJ, OBJ, TMP, PU]
A dependency tree is a directed tree with a root.
The arcs indicate certain grammatical relations between words.
Properties:
Single-headedness: each word depends on exactly one parent.
Connectivity.
Acyclicity.
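The three properties can be checked mechanically. The following is a minimal sketch, assuming a tree is stored as a list `heads` in which `heads[i]` is the head position of word `i` and position 0 is the artificial root (this encoding and the function name are illustrative, not taken from the slides). Single-headedness is built into the encoding, so the check only has to confirm that every word reaches the root without revisiting a node.

```python
def is_dependency_tree(heads):
    """Check that every word reaches the root (index 0) without a cycle.

    heads[i] is the head index of word i; heads[0] is None for the root.
    One head slot per word already enforces single-headedness.
    """
    n = len(heads)
    for dep in range(1, n):
        seen = set()
        node = dep
        while node != 0:
            head = heads[node]
            # A repeated node means a cycle; a missing or out-of-range head
            # means the word is not attached to the tree.
            if node in seen or head is None or not 0 <= head < n:
                return False
            seen.add(node)
            node = head
    return True


# Root=0, John=1, saw=2, Mary=3, yesterday=4, "."=5
good = [None, 2, 0, 2, 2, 2]
cyclic = [None, 2, 1, 2]  # John and saw head each other
print(is_dependency_tree(good), is_dependency_tree(cyclic))  # True False
```

If every word reaches the root, the structure is connected and acyclic at the same time, which is why one loop covers both properties.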
Projective dependency tree
[Figure: dependency tree over Root, John, saw, Mary, yesterday and "." with arcs labeled PRED, SBJ, OBJ, TMP, PU]
The edges drawn above the words do not cross.
Each word and its descendants form a contiguous substring.
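The "no crossing edges" condition can be tested directly on pairs of arcs. The sketch below assumes the same illustrative `heads` encoding (position 0 is the root); the head assignment used for the longer sentence is an assumed analysis, since the figure for that sentence is not preserved.

```python
def is_projective(heads):
    """Return True if no two arcs cross when drawn above the sentence.

    heads[i] is the head index of word i; heads[0] is None for the root.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if h is not None]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross when exactly one endpoint of the second arc
            # lies strictly inside the first.
            if a < c < b < d:
                return False
    return True


# Root John saw Mary yesterday .
print(is_projective([None, 2, 0, 2, 2, 2]))  # True

# Root John saw Mary yesterday who was a young lady .
# Assumed analysis: "was" attaches to "Mary", crossing the arc saw -> yesterday.
print(is_projective([None, 2, 0, 2, 2, 6, 3, 9, 9, 6, 2]))  # False
```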
Non-projective dependency tree
Example: "John saw Mary yesterday who was a young lady."
[Figure: non-projective dependency tree over Root, John, saw, Mary, yesterday, who, was, a, young, lady and "."]
Data-driven parsing framework
Graph based dependency parsing
What is graph based dependency parsing?
Graph-based models: define a space of candidate dependency trees.
Learning: induce a model for scoring a candidate tree.
Parsing: find the tree with the highest score given the model.
Candidate trees for "John saw Mary yesterday":
[Figure: four candidate dependency trees over Root, John, saw, Mary, yesterday and ".", each with arcs labeled PRED, SBJ, OBJ, TMP, PU]
Arc-factored model
[Figure: dependency tree over Root, John, saw, Mary, yesterday and "." with arcs labeled PRED, SBJ, OBJ, TMP, PU]
$X$: an input sentence
$Y$: a candidate dependency tree
$x_i \to x_j$: a dependency edge from word $i$ to word $j$
$\Phi(X)$: the set of possible dependency trees over $X$
\[ Y^* = \arg\max_{Y \in \Phi(X)} \mathrm{score}(Y \mid X) = \arg\max_{Y \in \Phi(X)} \sum_{(x_i \to x_j) \in Y} \mathrm{score}(x_i \to x_j) \]
$\mathrm{score}(x_i \to x_j)$ may or may not be a probability.
McDonald et al. (2005): $\mathrm{score}(x_i \to x_j) = \vec{w} \cdot \vec{f}(x_i \to x_j)$
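As a rough illustration of the arc-factored decomposition, the sketch below scores a tree as the sum of per-arc scores, each a dot product between a sparse weight vector and binary arc features. The feature templates, the weight values and the function names are assumptions made for this example; they are not the feature set used in the cited work.

```python
def arc_features(words, head, dep):
    """Binary features of a single arc head -> dep (illustrative templates)."""
    return [
        f"head={words[head]}",
        f"dep={words[dep]}",
        f"pair={words[head]}_{words[dep]}",
        "dir=" + ("R" if head < dep else "L"),
    ]


def arc_score(weights, words, head, dep):
    """w . f(head -> dep) with a sparse weight dictionary."""
    return sum(weights.get(f, 0.0) for f in arc_features(words, head, dep))


def tree_score(weights, words, heads):
    """Arc-factored score: the sum of the scores of all arcs in the tree."""
    return sum(
        arc_score(weights, words, h, d) for d, h in enumerate(heads) if h is not None
    )


words = ["Root", "John", "saw", "Mary"]
weights = {"pair=saw_John": 2.0, "pair=saw_Mary": 1.5, "head=Root": 0.5, "dir=L": -0.25}
heads = [None, 2, 0, 2]  # saw -> John, Root -> saw, saw -> Mary
print(tree_score(weights, words, heads))  # 3.75
```

Because the score of a tree is a plain sum over arcs, the parsing algorithms only ever need a table of arc scores, not the features themselves.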
[Figure: graph over the nodes Root, John, saw, Mary, yesterday]
Parsing algorithms
Projective dependency parsing
Naive CYK-like parsing
[Figure: the dependency tree for "John saw Mary yesterday ." (arcs PRED, SBJ, OBJ, TMP, PU) and its step-by-step construction from subtrees headed by "saw"]
Ideas:
A legal subtree spans a contiguous substring.
A subtree can be built from smaller subtrees step by step.
Each step combines exactly two subtrees, exactly as in CYK.
Naive CYK-like parsing: example
Example: "John saw Mary"
[Figure: CYK-style chart for "John saw Mary". The full span is split either as (John) (saw Mary) or as (John saw) (Mary), and each split yields candidate subtrees headed by John, saw or Mary.]
Naive CYK-like parsing
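\[ C[s][t][i] = \max_{s \le q < t,\; s \le j \le t} \begin{cases} C[s][q][i] + C[q+1][t][j] + \lambda(w_i, w_j) & \text{if } j > i \\ C[s][q][j] + C[q+1][t][i] + \lambda(w_i, w_j) & \text{if } j < i \end{cases} \]
Here $C[s][t][i]$ is the score of the best subtree that spans words $s$ to $t$ and is headed by $w_i$, and $\lambda(w_i, w_j)$ is the weight of the arc from head $w_i$ to dependent $w_j$.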
Naive CYK-like parsing: why is it slow?
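The time complexity is $O(n^5)$.
[Figure: combining two adjacent spans involves the five indices s, q, t, i, j; the resulting item is indexed by s, t, i]
Heads lie in the middle of the spans, so extra indices are needed to keep track of them.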
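The five nested loops are easy to see in code. The following is a sketch of the naive chart parser under stated assumptions: `scores[h][d]` is the weight of the arc from head `h` to dependent `d`, index 0 is an artificial root, and the function returns only the best score. The weight matrix is illustrative.

```python
def naive_cyk_score(scores):
    """Best projective tree score with the naive O(n^5) chart.

    C[s][t][i] is the best score of a subtree spanning s..t headed by word i.
    """
    n = len(scores)
    neg_inf = float("-inf")
    C = [[[neg_inf] * n for _ in range(n)] for _ in range(n)]
    for s in range(n):
        C[s][s][s] = 0.0  # a single word is a subtree headed by itself
    for m in range(1, n):                          # span length
        for s in range(n - m):                     # span start
            t = s + m
            for q in range(s, t):                  # split point
                for i in range(s, q + 1):          # head of the left part
                    for j in range(q + 1, t + 1):  # head of the right part
                        joined = C[s][q][i] + C[q + 1][t][j]
                        # Either i becomes the head of j, or j the head of i.
                        C[s][t][i] = max(C[s][t][i], joined + scores[i][j])
                        C[s][t][j] = max(C[s][t][j], joined + scores[j][i])
    return C[0][n - 1][0]  # whole sentence, headed by the root


# Rows are heads, columns are dependents: Root, John, saw, Mary (illustrative weights).
scores = [
    [0, 9, 10, 9],
    [0, 0, 20, 3],
    [0, 30, 0, 30],
    [0, 11, 0, 0],
]
print(naive_cyk_score(scores))  # 70.0
```

The loops over `m`, `s`, `q`, `i` and `j` each range over up to n values, which is where the fifth power comes from.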
Extension: Eisner's algorithm
Intuition:
For each word, building its left dependents is independent of building its right dependents.
This makes it possible to get rid of the inner head indices.
Eisner’s algorithm
[Figure: the four chart item types E[s][t][0][1], E[s][t][1][1], E[s][t][0][0] and E[s][t][1][0]]
$E[s][t][d][c]$ is the score of the best item spanning words $s$ to $t$. The third index $d$ gives the direction (0: the head is the right end $w_t$, 1: the head is the left end $w_s$). The fourth index $c$ is 1 for an item that has just received the arc between $w_s$ and $w_t$ and 0 for a completed item.
Adding an arc between $w_s$ and $w_t$ (illustrated on "John saw Mary"):
\[ E[s][t][0][1] = \max_{s \le q < t} \bigl( E[s][q][1][0] + E[q+1][t][0][0] + \lambda(w_t, w_s) \bigr) \]
\[ E[s][t][1][1] = \max_{s \le q < t} \bigl( E[s][q][1][0] + E[q+1][t][0][0] + \lambda(w_s, w_t) \bigr) \]
Completing an item (illustrated on "John saw Mary"):
\[ E[s][t][1][0] = \max_{s < q \le t} \bigl( E[s][q][1][1] + E[q][t][1][0] \bigr) \]
\[ E[s][t][0][0] = \max_{s \le q < t} \bigl( E[s][q][0][0] + E[q][t][0][1] \bigr) \]
Non-projective dependency parsing (MST)
Maximum spanning tree
[Figure: directed graph over the nodes Root, John, saw, Mary, yesterday]
Chu-Liu-Edmonds algorithm
Ideas:
Greedy: always try to select the edges with the highest weight.
Contract: if cycles occur, break each cycle where the least weight is lost.
Recursive: repeat this procedure until the MST is obtained.
Chu-Liu-Edmonds algorithm: example
[Figure: weighted directed graph over root, John, saw and Mary; arc weights 9, 9, 10, 30, 30, 11, 20, 3]
For each node, select the incoming arc with the highest weight. If the selected arcs contain no cycle, we are done.
[Figure: the cycle is contracted into a single new node w_js]
Outgoing arcs of the new node need no special treatment: select the arc with the highest weight.
[Figure: the arcs entering w_js are rescored step by step; weights shown during the steps: 9, 10, 30, 20, then 9, 40, 20, then 29, 10, 30; the contracted graph has arc weights 40, 30, 31]
[Figure: resulting tree with arc weights 10, 30, 30]
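The greedy, contract and recurse steps can be sketched compactly. The code below is an illustrative recursive implementation, not the author's. The slides preserve only the arc weights of the example, so the assignment of weights to arcs in `scores` is an assumption, chosen so that it reproduces the intermediate values 29, 40 and 31 and the final weights 10, 30, 30. All function names are hypothetical.

```python
def find_cycle(head):
    """Return the nodes of one cycle in the map dependent -> head, or None."""
    for start in head:
        path, node = [], start
        while node in head and node not in path:
            path.append(node)
            node = head[node]
        if node in path:
            return path[path.index(node):]
    return None


def chu_liu_edmonds(scores, root):
    """scores maps (head, dependent) to a weight; returns dependent -> head."""
    # Greedy step: best incoming arc for every node except the root.
    best = {}
    for (h, d), w in scores.items():
        if d != root and h != d and (d not in best or w > scores[(best[d], d)]):
            best[d] = h
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # Contract the cycle into one fresh node and rescore arcs that touch it.
    merged, inside = tuple(cycle), set(cycle)
    cycle_weight = sum(scores[(best[v], v)] for v in cycle)
    new_scores, origin = {}, {}
    for (h, d), w in scores.items():
        if d == root or h == d or (h in inside and d in inside):
            continue
        if h in inside:    # arc leaving the cycle keeps its weight
            key, val = (merged, d), w
        elif d in inside:  # arc entering the cycle replaces one cycle arc
            key, val = (h, merged), w + cycle_weight - scores[(best[d], d)]
        else:
            key, val = (h, d), w
        if key not in new_scores or val > new_scores[key]:
            new_scores[key], origin[key] = val, (h, d)
    sub = chu_liu_edmonds(new_scores, root)
    # Expand: keep the cycle arcs, then let the chosen entering arc break it.
    heads = {v: best[v] for v in cycle}
    for d, h in sub.items():
        orig_head, orig_dep = origin[(h, d)]
        heads[orig_dep] = orig_head
    return heads


scores = {  # assumed arc assignment for the weights shown in the figure
    ("root", "John"): 9, ("root", "saw"): 10, ("root", "Mary"): 9,
    ("saw", "John"): 30, ("John", "saw"): 20, ("saw", "Mary"): 30,
    ("Mary", "John"): 11, ("John", "Mary"): 3,
}
print(chu_liu_edmonds(scores, "root"))
```

Under these assumed weights the greedy step selects saw -> John and John -> saw, which form the cycle that is contracted; the entering arc root -> saw then scores 10 + 50 - 20 = 40 and wins over root -> John (29) and Mary -> John (31).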
Learning framework
Local learning
Given training data (X, Y):
[Figure: dependency tree over Root, John, saw, Mary, yesterday and "." with arcs labeled PRED, SBJ, OBJ, TMP, PU]
Word pair   Link label   Instance weight   Features
John-saw    L            1                 W1 John, W2 saw, W1W2 John saw, Pos1 noun, Pos2 verb, ...
saw-Mary    R            1                 W1 saw, W2 Mary, W1W2 saw Mary, Pos1 verb, Pos2 noun, ...
John-Mary   N            1                 W1 John, W2 Mary, W1W2 John Mary, Pos1 noun, Pos2 noun, ...
...
Table: Training instances
Linear classifier → link classifier.
For each word pair in a sentence: no arc, left arc, or right arc?
Each arc is scored separately, without knowledge of the other arcs.
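A sketch of how such training instances could be generated from one gold tree is given below. The reading of the labels (L: the right word heads the left word, R: the left word heads the right word, N: no arc) is inferred from the rows of the training-instance table, and the feature strings and the function name are illustrative.

```python
def link_instances(words, pos, heads):
    """Turn one gold tree into (word pair, link label, features) instances.

    heads[i] is the index of the head of word i, or None if the word
    attaches to the root.
    """
    instances = []
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if heads[i] == j:
                label = "L"   # arc points left: word j heads word i
            elif heads[j] == i:
                label = "R"   # arc points right: word i heads word j
            else:
                label = "N"   # no arc between the two words
            feats = [
                f"W1_{words[i]}", f"W2_{words[j]}",
                f"W1W2_{words[i]}_{words[j]}",
                f"Pos1_{pos[i]}", f"Pos2_{pos[j]}",
            ]
            instances.append(((words[i], words[j]), label, feats))
    return instances


words = ["John", "saw", "Mary"]
pos = ["noun", "verb", "noun"]
heads = [1, None, 1]  # saw heads John and Mary
for pair, label, feats in link_instances(words, pos, heads):
    print(pair, label, feats)
```

Each instance can then be fed to any three-way linear classifier; because every pair is classified on its own, nothing guarantees that the predicted links form a tree.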
Online large-margin training
Intuition:
Do not feed in all of the training data at once.
Update $\vec{w}$ step by step instead.
Average over the resulting sequence of weight vectors.
Evaluation
Evaluation methods
Simply use (labeled) dependency accuracy.
[Figure: dependency tree over Root, John, saw, Mary, yesterday and "." with arcs labeled PRED, SBJ, OBJ, TMP, PU]
Positions count Root as 1.
Head   Dependent   Head word   Label
1      3           Root        PRED
3      2           saw         SBJ
3      4           saw         OBJ
3      5           saw         TMP
3      6           saw         PU
Table: Gold
Head   Dependent   Head word   Label
1      3           Root        PRED
3      2           saw         SBJ
3      4           saw         OBJ
4      5           Mary        TMP
3      6           saw         PU
Table: Parsed result
accuracy = 4/5 = 0.8
There is no need for an F-measure: every word receives exactly one head, so precision and recall coincide.
Other metrics: complete match (the whole tree is correct), ...
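The accuracy computation for the gold and parsed analyses of this sentence can be sketched as follows. The dictionary encoding (dependent position mapped to a head position and a label) and the function name are assumptions made for the example.

```python
def attachment_scores(gold, parsed):
    """Return (unlabeled, labeled) dependency accuracy.

    gold and parsed map each dependent position to a (head position, label) pair.
    """
    n = len(gold)
    unlabeled = sum(parsed[d][0] == gold[d][0] for d in gold) / n
    labeled = sum(parsed[d] == gold[d] for d in gold) / n
    return unlabeled, labeled


gold = {3: (1, "PRED"), 2: (3, "SBJ"), 4: (3, "OBJ"), 5: (3, "TMP"), 6: (3, "PU")}
parsed = {3: (1, "PRED"), 2: (3, "SBJ"), 4: (3, "OBJ"), 5: (4, "TMP"), 6: (3, "PU")}
print(attachment_scores(gold, parsed))  # (0.8, 0.8)
```

Only the head of position 5 differs, so four of the five dependents are correct under both the unlabeled and the labeled measure.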
Evaluation results
                   Czech-A                Czech-B
                   Accuracy   Complete    Accuracy   Complete
COLL1999           82.8       -           -          -
[unlabeled]        80.0       31.8        -          -
[unlabeled]        83.3       31.3        74.8       0.0
Single-best MIRA   84.1       32.2        81.0       14.9
Factored MIRA      84.4       32.3        81.5       14.3
Table: Dependency parsing results for Czech. Czech-B is the subset of Czech-A containing only sentences with at least one non-projective dependency.
                   English
                   Accuracy   Complete
McD2005            90.9       37.5
Single-best MIRA   90.2       33.2
Factored MIRA      90.2       32.3
Table: Dependency parsing results for English using spanning tree algorithms.
Appendix
Eisner’s algorithm
S = w0 w1 ... wn
Arc weight parameters λ(wi, wj) ∈ Λ
Instantiate E[n][n][2][2] ∈ R
Initialization: E[s][s][d][c] = 0
for m: 1 ... n do
    for s: 1 ... n do
        t = s + m
        if t > n then
            break
        end if
        E[s][t][0][1] = max_{s≤q<t} (E[s][q][1][0] + E[q+1][t][0][0] + λ(wt, ws))
        E[s][t][1][1] = max_{s≤q<t} (E[s][q][1][0] + E[q+1][t][0][0] + λ(ws, wt))
        E[s][t][0][0] = max_{s≤q<t} (E[s][q][0][0] + E[q][t][0][1])
        E[s][t][1][0] = max_{s<q≤t} (E[s][q][1][1] + E[q][t][1][0])
    end for
end for
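The pseudocode can be turned into a short runnable sketch. The version below keeps the E[s][t][d][c] indexing of the slides but makes three assumptions of its own: spans start at index 0, which is treated as an artificial root; arc weights come from a matrix `scores[h][d]`; and a second table `B` of split points is added so that the tree can be read back. The weights are illustrative.

```python
def eisner(scores):
    """Best projective tree under arc-factored scores; returns (score, heads).

    scores[h][d] is the weight of the arc h -> d; index 0 is the root.
    E[s][t][d][c]: d = 0 means the head is t, d = 1 means the head is s;
    c = 1 marks an item that just received the arc s-t, c = 0 a completed item.
    """
    n = len(scores)
    E = [[[[0.0, 0.0], [0.0, 0.0]] for _ in range(n)] for _ in range(n)]
    B = [[[[None, None], [None, None]] for _ in range(n)] for _ in range(n)]
    for m in range(1, n):
        for s in range(n - m):
            t = s + m
            # Join two completed halves and add an arc between s and t.
            q = max(range(s, t), key=lambda q: E[s][q][1][0] + E[q + 1][t][0][0])
            joined = E[s][q][1][0] + E[q + 1][t][0][0]
            E[s][t][0][1] = joined + scores[t][s]
            E[s][t][1][1] = joined + scores[s][t]
            B[s][t][0][1] = B[s][t][1][1] = q
            # Complete the item whose head is t.
            q = max(range(s, t), key=lambda q: E[s][q][0][0] + E[q][t][0][1])
            E[s][t][0][0], B[s][t][0][0] = E[s][q][0][0] + E[q][t][0][1], q
            # Complete the item whose head is s.
            q = max(range(s + 1, t + 1), key=lambda q: E[s][q][1][1] + E[q][t][1][0])
            E[s][t][1][0], B[s][t][1][0] = E[s][q][1][1] + E[q][t][1][0], q

    heads = [None] * n

    def backtrack(s, t, d, c):
        if s == t:
            return
        q = B[s][t][d][c]
        if c == 1:
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, q, 1, 0)
            backtrack(q + 1, t, 0, 0)
        elif d == 0:
            backtrack(s, q, 0, 0)
            backtrack(q, t, 0, 1)
        else:
            backtrack(s, q, 1, 1)
            backtrack(q, t, 1, 0)

    backtrack(0, n - 1, 1, 0)
    return E[0][n - 1][1][0], heads


# Rows are heads, columns are dependents: Root, John, saw, Mary (illustrative weights).
scores = [
    [0, 9, 10, 9],
    [0, 0, 20, 3],
    [0, 30, 0, 30],
    [0, 11, 0, 0],
]
print(eisner(scores))  # (70.0, [None, 2, 0, 2])
```

Only the three indices m, s and q are looped over, so the running time is cubic in the sentence length.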
MIRA learning algorithm
Training data: $\mathcal{T} = \{(x_t, y_t)\}_{t=1}^{T}$
$\vec{w}_0 = 0$; $\vec{v} = 0$; $i = 0$
for n : 1 ... N do
    for t : 1 ... T do
        $\min \lVert \vec{w}_{i+1} - \vec{w}_i \rVert$
            s.t. $s(x_t, y_t) - s(x_t, y') \ge L(y_t, y'), \ \forall y' \in \Phi(x_t)$
        $\vec{v} = \vec{v} + \vec{w}_{i+1}$
        $i = i + 1$
    end for
end for
$\vec{w} = \vec{v} / (N \cdot T)$
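The constrained minimization has a closed-form solution when only one constraint is kept, namely the one for the currently highest-scoring candidate. The sketch below implements that single-constraint simplification together with the averaging step; it does not handle the full constraint set over Φ(x_t). Trees are represented directly by dense feature vectors, and the data format and function names are assumptions made for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mira_update(w, gold_feats, pred_feats, loss):
    """Smallest change to w such that score(gold) - score(pred) >= loss."""
    diff = [g - p for g, p in zip(gold_feats, pred_feats)]
    norm_sq = dot(diff, diff)
    if norm_sq == 0.0:
        return list(w)  # the prediction has the same features as the gold tree
    tau = max(0.0, (loss - dot(w, diff)) / norm_sq)
    return [wi + tau * di for wi, di in zip(w, diff)]


def averaged_mira(data, dim, epochs):
    """data: list of (gold_feats, candidates); candidates are (feats, loss) pairs."""
    w = [0.0] * dim
    v = [0.0] * dim
    for _ in range(epochs):
        for gold_feats, candidates in data:
            # Single-best decoding: the highest-scoring candidate under w.
            pred_feats, loss = max(candidates, key=lambda c: dot(w, c[0]))
            w = mira_update(w, gold_feats, pred_feats, loss)
            v = [vi + wi for vi, wi in zip(v, w)]
    return [vi / (epochs * len(data)) for vi in v]


gold = [1.0, 0.0, 1.0]
candidates = [([0.0, 1.0, 1.0], 1.0), (gold, 0.0)]  # (features, loss against gold)
print(mira_update([0.0, 0.0, 0.0], gold, candidates[0][0], 1.0))  # [0.5, -0.5, 0.0]
print(averaged_mira([(gold, candidates)], dim=3, epochs=2))  # [0.5, -0.5, 0.0]
```

After the first update the gold candidate outscores the wrong one by exactly the required margin of 1, so the second pass leaves the weights unchanged and the average equals the updated vector.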