Emma Strubell
Algorithms for NLP
CS 11-711 · Fall 2020
Lecture 14: Graph-based dependency parsing

Announcements
■ No recitation on Friday (Tartan Community Day).

Dependency parsing
■ Transition-based (shift-reduce) parsing:
    ■ Greedy choice of local transitions guided by a good classifier.
    ■ Examples: MaltParser [Nivre et al. 2008], Stack LSTM [Dyer et al. 2015]
■ Graph-based dependency parsing:
    ■ Given scores for every pair of words, find the (globally) highest scoring set of edges.
    ■ Examples: MSTParser [McDonald et al. 2005], TurboParser [Martins et al. 2009], Deep Biaffine [Dozat et al. 2017]

[Figure: transition-based parser. A parser guided by an oracle reads from an input buffer (w1, w2, ..., wn) and a stack (s1, s2, ..., sn) and outputs dependency relations.]
[Figure: graph-based parsing. Directed graph over root, Book, that, flight with edge scores 12, 4, 4, 5, 6, 8, 7, 5, 7.]
$$\Psi(y, w; \theta) = \sum_{i \xrightarrow{r} j \,\in\, y} \psi(i \xrightarrow{r} j, w, \theta)$$

[Figure: directed graph over root, Book, that, flight with edge scores 12, 4, 4, 5, 6, 8, 7, 5, 7.]

Chu and Liu (1965) and Edmonds (1967) algorithm
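To make the decomposition concrete, here is a minimal Python sketch that scores a tree as the sum of its edge scores. The names `tree_score`, `edge_score`, and `heads` are illustrative and not from the lecture, edge labels are omitted, and the assignment of the figure's scores to particular edges is an assumption, since the extracted figure preserves only the list of numbers.

```python
def tree_score(heads, edge_score):
    """Sum of edge scores for a tree given as {dependent: head}."""
    return sum(edge_score[(head, dep)] for dep, head in heads.items())


# Hypothetical assignment of the figure's scores to (head, dependent) edges.
edge_score = {
    ("root", "Book"): 12, ("root", "that"): 4, ("root", "flight"): 4,
    ("Book", "that"): 5, ("Book", "flight"): 7,
    ("that", "Book"): 6, ("that", "flight"): 8,
    ("flight", "Book"): 5, ("flight", "that"): 7,
}

# Two candidate trees over the same sentence.
tree_a = {"Book": "root", "flight": "Book", "that": "flight"}
tree_b = {"Book": "root", "that": "Book", "flight": "that"}

print(tree_score(tree_a, edge_score))  # 12 + 7 + 7
print(tree_score(tree_b, edge_score))  # 12 + 5 + 8
```

Because the score is a plain sum over edges, comparing two trees only requires looking at the edges on which they differ.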
function MAXSPANNINGTREE(G=(V,E), root, score) returns spanning tree
    F ← []
    T' ← []
    score' ← []
    for each v ∈ V do
        bestInEdge ← argmax_{e=(u,v) ∈ E} score[e]          // select best incoming edge for each node
        F ← F ∪ bestInEdge
        for each e=(u,v) ∈ E do
            score'[e] ← score[e] − score[bestInEdge]         // subtract its score from all incoming edges
    if T=(V,F) is a spanning tree then                       // stopping condition
        return it
    else
        C ← a cycle in F
        G' ← CONTRACT(G, C)                                  // contract nodes if there are cycles
        T' ← MAXSPANNINGTREE(G', root, score')               // recursively compute MST
        T ← EXPAND(T', C)                                    // expand contracted nodes
        return T

function CONTRACT(G, C) returns contracted graph
function EXPAND(T, C) returns expanded graph
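The pseudocode can be sketched in runnable Python as follows. This is one possible rendering, not the lecture's implementation: the function names, the dictionary-based graph representation, and the integer node ids are assumptions, and CONTRACT and EXPAND are inlined rather than written as separate functions. The example scores reuse the numbers from the figure, but which edge carries which score is assumed.

```python
def find_cycle(head):
    """Return the nodes of one cycle in {dependent: head}, or None."""
    for start in head:
        path = []
        v = start
        while v in head and v not in path:
            path.append(v)
            v = head[v]
        if v in path:  # walked back onto the current path
            return path[path.index(v):]
    return None


def max_spanning_tree(nodes, root, score):
    """Chu-Liu/Edmonds sketch; returns {dependent: head}.

    nodes: list of int ids; score: {(head, dep): number}.
    Assumes every non-root node is reachable from root.
    """
    # Select the best incoming edge for each non-root node.
    head = {}
    for v in nodes:
        if v == root:
            continue
        incoming = [(s, u) for (u, w), s in score.items() if w == v and u != v]
        head[v] = max(incoming, key=lambda pair: pair[0])[1]

    cycle = find_cycle(head)
    if cycle is None:  # stopping condition: the greedy edges form a tree
        return head

    # Contract the cycle into a fresh node c.
    in_cycle = set(cycle)
    c = max(nodes) + 1
    new_nodes = [v for v in nodes if v not in in_cycle] + [c]
    new_score, origin = {}, {}
    for (u, v), s in score.items():
        if v == root or (u in in_cycle and v in in_cycle):
            continue
        if v in in_cycle:  # entering edge: subtract the cycle edge it would replace
            key, s = (u, c), s - score[(head[v], v)]
        elif u in in_cycle:  # leaving edge
            key = (c, v)
        else:
            key = (u, v)
        if key not in new_score or s > new_score[key]:
            new_score[key], origin[key] = s, (u, v)

    # Recursively solve the contracted problem, then expand.
    contracted = max_spanning_tree(new_nodes, root, new_score)
    tree = {}
    for v, u in contracted.items():
        orig_head, orig_dep = origin[(u, v)]
        tree[orig_dep] = orig_head
    for v in cycle:  # keep cycle edges except the one displaced above
        tree.setdefault(v, head[v])
    return tree


words = ["root", "Book", "that", "flight"]
# Hypothetical assignment of the figure's scores to (head, dependent) edges.
score = {(0, 1): 12, (0, 2): 4, (0, 3): 4, (1, 2): 5, (1, 3): 7,
         (2, 1): 6, (2, 3): 8, (3, 1): 5, (3, 2): 7}
tree = max_spanning_tree([0, 1, 2, 3], 0, score)
for dep, h in sorted(tree.items()):
    print(f"{words[h]} -> {words[dep]}")
```

Under the assumed scores, the greedy step selects a cycle between that and flight, so the sketch exercises the contract, recurse, and expand path rather than only the stopping condition.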
[Figures: worked example on the root, Book, that, flight graph (edge scores 12, 4, 4, 5, 6, 8, 7, 5, 7). The graph is annotated with Book 12, that 7, flight 8; a contracted graph over root, Book, and a merged node tf follows; in the final graph over root, Book, that, flight, one edge is marked "Deleted from cycle".]
■ Runtime?
    ■ Naive: O(n^3)
    ■ Fancy: O(n^2 + n log n)
■ What about labeled parsing?
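One common way to handle labels in an arc-factored model, offered here as an assumption rather than as the lecture's answer: because the tree score is a sum over labeled edges, the best label for each (head, dependent) pair can be chosen independently, and the unlabeled maximum spanning tree is then computed over those best scores. The function name, the label names, and the scores below are hypothetical.

```python
def collapse_labels(labeled_score):
    """Map {(head, dep, label): score} to the best score and label per (head, dep)."""
    best_score, best_label = {}, {}
    for (head, dep, label), s in labeled_score.items():
        if (head, dep) not in best_score or s > best_score[(head, dep)]:
            best_score[(head, dep)] = s
            best_label[(head, dep)] = label
    return best_score, best_label


# Made-up labeled edge scores for illustration.
labeled = {
    (0, 1, "root"): 12.0, (0, 1, "dep"): 3.0,
    (1, 2, "det"): 2.0, (1, 2, "obj"): 5.0,
}
scores, labels = collapse_labels(labeled)
print(scores)
print(labels)
```

The collapsed `scores` dictionary has the shape an unlabeled spanning-tree decoder expects, and `labels` recovers the label of each chosen edge afterwards.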
$$\Psi(y, w; \theta) = \sum_{i \xrightarrow{r} j \,\in\, y} \psi(i \xrightarrow{r} j, w, \theta)$$

… as sum of edge scores.

[Figure: parts used by each factorization. First order: (h, m). Second order: (h, s, m) and (g, h, m). Third order: (g, h, s, m) and (h, t, s, m).]

… third order [Koo and Collins, 2010] projective dependency parsing.
… [McDonald and Pereira, 2006]!
[Figure: first-order part (h, m), scored by \psi.]

$$\Psi(y, w; \theta) = \sum_{i \xrightarrow{r} j \,\in\, y} \psi(i \xrightarrow{r} j, w, \theta)$$

… as sum of edge scores.
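[Figure: scoring edges with a neural network.]
■ Input tokens with POS tags: Janet/NNP will/MD back/VB the/DT bill/NN
■ Word embeddings + POS embeddings → neural network → per-token features
■ Per-token features → edge features, one per candidate edge: root → Janet, will → Janet, back → Janet, the → Janet, bill → Janet, Janet → bill, …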
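[Figure: BiLSTM encoder and MLP scorer for the input "the brown fox jumped *". Forward LSTMs (LSTM^f) read the embeddings x_the, x_brown, x_fox, x_jumped, x_*; backward LSTMs (LSTM^b, labeled s0 ... s4) read them in the other direction; the two outputs are concatenated per token into V_the, V_brown, V_fox, V_jumped, V_*. MLPs score pairs of these vectors and the scores are summed (+).]
■ Stacked LSTM token encoder
■ Concat head, dependent as input to MLP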
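Margin-based (hinge) training loss, where y is the gold tree and ∘ denotes concatenation of the head and dependent vectors:

$$\max\Big(0,\; 1 + \max_{y' \neq y} \sum_{(h,m) \in y'} \mathrm{MLP}(v_h \circ v_m) - \sum_{(h,m) \in y} \mathrm{MLP}(v_h \circ v_m)\Big)$$

The same loss with a per-part cost added to the competing tree:

$$\max\Big(0,\; 1 + \max_{y' \neq y} \sum_{\mathrm{part} \in y'} \big(\mathrm{score}_{\mathrm{local}}(x, \mathrm{part}) + \mathbb{1}_{\mathrm{part} \notin y}\big) - \mathrm{score}(x, y)\Big)$$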
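The scoring and loss on this slide can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the cited parser: the one-hidden-layer tanh MLP, the weight values, and the names `mlp_score`, `tree_score`, and `hinge_loss` are all hypothetical, and the competing tree is passed in by hand, whereas in training it would come from maximum spanning tree decoding.

```python
import math


def mlp_score(v_h, v_m, W1, b1, w2):
    """One-hidden-layer MLP applied to the concatenation [v_h; v_m]."""
    x = v_h + v_m  # list concatenation of head and dependent vectors
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden))


def tree_score(heads, S):
    """Sum S[h][m] over the edges of a tree given as {m: h}."""
    return sum(S[h][m] for m, h in heads.items())


def hinge_loss(S, gold, best_other):
    """max(0, 1 + score(best competing tree) - score(gold tree))."""
    return max(0.0, 1.0 + tree_score(best_other, S) - tree_score(gold, S))


# Toy encoder outputs for root (index 0) and two words; all values are made up.
V = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
W1 = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]  # hidden size 2, input size 4
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]

# S[h][m] is the score of the edge h -> m.
S = [[mlp_score(V[h], V[m], W1, b1, w2) for m in range(3)] for h in range(3)]

gold = {1: 0, 2: 1}   # root -> word 1 -> word 2
other = {1: 2, 2: 0}  # a competing tree, chosen by hand here
print(hinge_loss(S, gold, other))
```

The loss is zero exactly when the gold tree outscores the competing tree by at least the margin of 1.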
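[Figure: deep biaffine arc scorer for the input "root/ROOT Kim/NNP ...". Embeddings x_i feed a BiLSTM that produces r_i; two MLPs produce h_i^{(arc-dep)} and h_i^{(arc-head)}; a biaffine product gives the arc score matrix:]

$$S^{(arc)} = \big(H^{(arc\text{-}dep)} \oplus 1\big)\, U^{(arc)}\, {H^{(arc\text{-}head)}}^{\top}$$

■ Stacked LSTM token encoder
■ MLP_head, MLP_dep
■ Biaffine classifier

Fixed-class affine classifier:

$$s_i = W r_i + b$$

Variable-class biaffine classifier, where R stacks the BiLSTM states of all tokens:

$$s_i^{(arc)} = \big(R\,U^{(1)}\big)\, r_i + \big(R\,u^{(2)}\big)$$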
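The biaffine product in the figure can be sketched in plain Python as follows. This is a minimal illustration with made-up numbers; the function name `biaffine_scores` and the convention that row i of the result scores every candidate head for dependent i are assumptions made for this sketch.

```python
def biaffine_scores(H_dep, H_head, U):
    """S[i][j] = [h_dep_i ; 1]^T U h_head_j.

    H_dep, H_head: one vector of size d per token.
    U: (d + 1) x d matrix; its last row acts as a bias on the head vector.
    """
    d = len(U[0])
    S = []
    for h_dep in H_dep:
        aug = h_dep + [1.0]  # append the constant 1 (the "⊕ 1" in the figure)
        proj = [sum(aug[k] * U[k][c] for k in range(len(aug))) for c in range(d)]
        S.append([sum(p * x for p, x in zip(proj, h_head)) for h_head in H_head])
    return S


# Toy MLP outputs for two tokens with d = 2; all values are made up.
H_dep = [[1.0, 0.0], [0.0, 1.0]]
H_head = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]

S = biaffine_scores(H_dep, H_head, U)
print(S)
```

The number of candidate heads equals the number of tokens, so the number of "classes" varies with sentence length, which is what distinguishes this from the fixed-class affine classifier.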
[Figure: first-order part (h, m), scored by \psi.]

$$\Psi(y, w; \theta) = \sum_{i \xrightarrow{r} j \,\in\, y} \psi(i \xrightarrow{r} j, w, \theta)$$

… as sum of edge scores.
… tree (with a margin).
… variant of Kirchhoff's Matrix-Tree Theorem [Tutte, 1984; Koo et al. 2007].
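As a rough illustration of how a Matrix-Tree construction is used, the sketch below sums the product of edge weights over all dependency trees by taking the determinant of a Laplacian-style matrix. It is a sketch under stated assumptions, not the lecture's formulation: it assumes non-negative edge weights (for example exponentiated edge scores), allows the root to have several children, and uses hypothetical names (`determinant`, `partition_function`, `root_w`, `edge_w`).

```python
def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, det = len(A), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # a row swap flips the sign
        det *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return det


def partition_function(root_w, edge_w):
    """Sum over all dependency trees of the product of their edge weights.

    root_w[m]: weight of root -> m; edge_w[h][m]: weight of h -> m,
    with words indexed 0..n-1 and the diagonal of edge_w ignored.
    """
    n = len(root_w)
    L = [[0.0] * n for _ in range(n)]
    for m in range(n):
        # Diagonal: total weight of all edges entering m, including from root.
        L[m][m] = root_w[m] + sum(edge_w[h][m] for h in range(n) if h != m)
        for h in range(n):
            if h != m:
                L[h][m] = -edge_w[h][m]
    return determinant(L)


# Two words: the trees are {root->1, root->2}, {root->1, 1->2}, {root->2, 2->1}.
root_w = [2.0, 3.0]
edge_w = [[0.0, 5.0],
          [7.0, 0.0]]
Z = partition_function(root_w, edge_w)
print(Z)  # compare with 2*3 + 2*5 + 3*7
```

For two words the three trees can be enumerated by hand, which gives a direct check on the determinant.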
Dependency parsing: accuracy vs. speed

[Figure: plot of accuracy (UAS) against speed (sentences/sec), with the following groups of systems.]
■ Graph-based: [Zhang & McDonald 2014], TurboParser [Martins et al. 2010]
■ Transition-based: MaltParser [Nivre et al. 2009]
■ Structured prediction cascades: [Rush & Petrov 2012; Weiss & Taskar 2010]
■ FF neural nets: [Chen & Manning 2014]
■ Transition-based w/ dynamic feature selection: [Strubell et al. 2015]
Dependency parsing: accuracy vs. speed

[Figure: plot of accuracy (UAS, axis ticks at 94, 95, 96) against speed (sentences/sec), showing the groups Graph-based, Transition-based, FF neural nets, Structured prediction cascades, and transition-based w/ dynamic feature selection.]
■ Stacked LSTMs [Dozat and Manning 2017]*

*on GPU!!