SLIDE 1

Graph-based Dependency Parsing
(Chu-Liu-Edmonds algorithm)

Sam Thomson (with thanks to Swabha Swayamdipta)

University of Washington, CSE 490u

February 22, 2017

SLIDE 2

Outline

◮ Dependency trees
◮ Three main approaches to parsing
◮ Chu-Liu-Edmonds algorithm
◮ Arc scoring / Learning

SLIDE 3

Dependency Parsing - Output

SLIDE 4

Dependency Parsing

TurboParser output from http://demo.ark.cs.cmu.edu/parse?sentence=I%20ate%20the%20fish%20with%20a%20fork.

SLIDE 5

Dependency Parsing - Output Structure

A parse is an arborescence (a.k.a. directed rooted tree):

◮ Directed [Labeled] Graph
◮ Acyclic
◮ Single Root
◮ Connected and Spanning: ∃ directed path from root to every other word
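The conditions above can be checked mechanically. A minimal sketch, assuming a parse is stored as a head list where `heads[i]` is the parent of word i and index 0 is ROOT (the representation and the name `is_arborescence` are ours, not from the slides). Since each non-ROOT node has exactly one parent by construction, acyclicity, a single root and spanning all reduce to one test: every word's chain of parents reaches ROOT without revisiting a node.

```python
def is_arborescence(heads):
    """heads[i] is the parent of node i; node 0 is ROOT, so heads[0] must be None.

    Each non-ROOT node has exactly one parent by construction, so the graph is
    acyclic, single-rooted and spanning iff every parent chain reaches ROOT.
    """
    n = len(heads)
    if n == 0 or heads[0] is not None:
        return False  # ROOT must exist and must not have a parent
    for start in range(1, n):
        seen, v = set(), start
        while v != 0:
            if v in seen or not 0 <= heads[v] < n:
                return False  # a cycle, or a parent index that does not exist
            seen.add(v)
            v = heads[v]
    return True


# "I ate the fish" with hypothetical heads: I <- ate, ate <- ROOT, the <- fish, fish <- ate
print(is_arborescence([None, 2, 0, 4, 2]))  # True
print(is_arborescence([None, 2, 1, 4, 2]))  # False: 1 -> 2 -> 1 is a cycle
```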
SLIDE 6

Projective / Non-projective

◮ Some parses are projective: edges don’t cross
◮ Most English sentences are projective, but non-projectivity is common in other languages (e.g. Czech, Hindi)

[Figures: a non-projective sentence in English, and one in Czech]

Examples from Non-projective Dependency Parsing using Spanning Tree Algorithms, McDonald et al., EMNLP ’05
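Projectivity can be tested on a head list (our own representation: `heads[i]` is the parent of word i, and ROOT sits at position 0). In this sketch two arcs cross exactly when their endpoints interleave; the function name and the toy head lists are hypothetical.

```python
from itertools import combinations


def is_projective(heads):
    """True iff no two arcs cross when drawn above the sentence.

    heads[i] is the parent of word i, with ROOT at position 0 (heads[0] is None).
    """
    spans = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if h is not None]
    for (l1, r1), (l2, r2) in combinations(spans, 2):
        # Arcs cross when their endpoints interleave; shared endpoints are fine.
        if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
            return False
    return True


print(is_projective([None, 2, 0, 4, 2]))  # True
print(is_projective([None, 3, 0, 2, 1]))  # False: arcs 3->1 and 0->2 cross
```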

SLIDE 7

Dependency Parsing - Approaches

SLIDE 10

Dependency Parsing Approaches

◮ Chart (Eisner, CKY)
  ◮ $O(n^3)$
  ◮ Only produces projective parses
◮ Shift-reduce
  ◮ $O(n)$ (fast!), but inexact
  ◮ “Pseudo-projective” trick can capture some non-projectivity
◮ Graph-based (MST)
  ◮ $O(n^2)$ for arc-factored
  ◮ Can produce projective and non-projective parses

SLIDE 11

Graph-based Dependency Parsing

SLIDE 13

Arc-Factored Model

Every possible labeled directed edge e between every pair of nodes gets a score, score(e). $G = \langle V, E \rangle$ ($O(n^2)$ edges)

Example from Non-projective Dependency Parsing using Spanning Tree Algorithms, McDonald et al., EMNLP ’05
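As a rough illustration of the $O(n^2)$ count, this sketch enumerates the unlabeled candidate arcs for n words plus ROOT, and scores a parse as the sum of its arc scores. It ignores labels (with a label set L, each ordered pair would get |L| arcs) and assumes a dense score matrix `score[h][d]`; both function names are our own.

```python
def candidate_arcs(n):
    """All unlabeled arcs (head, dependent) for n words, with ROOT at index 0.

    ROOT may head any word and any word may head any other word, but nothing
    heads ROOT: n possible heads for each of n words, i.e. n**2 arcs.
    """
    return [(h, d) for d in range(1, n + 1) for h in range(n + 1) if h != d]


def tree_score(heads, score):
    """Arc-factored score of a parse: the sum of score[h][d] over its arcs."""
    return sum(score[h][d] for d, h in enumerate(heads) if h is not None)


print(len(candidate_arcs(5)))  # 25
```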

SLIDE 14

Arc-Factored Model

Best parse is:

$$A^* = \arg\max_{A \subseteq G \text{ s.t. } A \text{ an arborescence}} \sum_{e \in A} \mathrm{score}(e)$$

The Chu-Liu-Edmonds algorithm finds this argmax.

Example from Non-projective Dependency Parsing using Spanning Tree Algorithms, McDonald et al., EMNLP ’05
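For tiny inputs the arg max can simply be found by enumeration, which is handy for checking a CLE implementation. This is a brute-force sketch (not CLE) that tries all $n^n$ head assignments. The matrix `S` is our reading of the worked example later in the deck, whose figure did not survive extraction: the endpoints of a, b, c, d, g, h, i follow from its bestInEdge and kicksOut tables, but we assume e leaves V1 and f leaves V2 (the optimum is 21 either way).

```python
from itertools import product


def reaches_root(heads):
    """True iff every word's chain of heads ends at ROOT (index 0) without a cycle."""
    for start in range(1, len(heads)):
        seen, v = set(), start
        while v != 0:
            if v in seen:
                return False
            seen.add(v)
            v = heads[v]
    return True


def brute_force_best_parse(score):
    """Exhaustive arg max over arborescences; score[h][d] scores the arc h -> d.

    Tries all n**n head assignments for n words, so it is only a reference
    for tiny inputs.
    """
    n = len(score) - 1
    best, best_heads = float("-inf"), None
    choices = [[h for h in range(n + 1) if h != d] for d in range(1, n + 1)]
    for assignment in product(*choices):
        heads = [None, *assignment]
        if reaches_root(heads):
            total = sum(score[h][d] for d, h in enumerate(heads) if h is not None)
            if total > best:
                best, best_heads = total, heads
    return best, best_heads


NEG = float("-inf")  # no such arc
# Rows are heads, columns dependents: 0 = ROOT, 1 = V1, 2 = V2, 3 = V3.
S = [[NEG, 5, 1, 1],     # a: ROOT->V1, b: ROOT->V2, c: ROOT->V3
     [NEG, NEG, 11, 4],  # d: V1->V2, e: V1->V3 (e's source is assumed)
     [NEG, 10, NEG, 5],  # g: V2->V1, f: V2->V3 (f's source is assumed)
     [NEG, 9, 8, NEG]]   # h: V3->V1, i: V3->V2
print(brute_force_best_parse(S)[0])  # 21
```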

SLIDE 17

Chu-Liu-Edmonds

Chu and Liu ’65, On the Shortest Arborescence of a Directed Graph, Scientia Sinica
Edmonds ’67, Optimum Branchings, JRNBS

SLIDE 25

Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge. In fact, every connected component that doesn’t contain ROOT needs exactly 1 incoming edge from outside the component.

◮ Greedily pick the highest-scoring incoming edge for each node.
◮ If this forms an arborescence, great!
◮ Otherwise, it will contain a cycle C.
◮ Arborescences can’t have cycles, so we can’t keep every edge in C. One edge in C must get kicked out.
◮ C also needs an incoming edge.
◮ Choosing an incoming edge for C determines which edge to kick out.
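The greedy step and the cycle check can be sketched as follows, assuming a dense score matrix `score[h][d]` with ROOT at index 0; `greedy_heads` and `find_cycle` are hypothetical names, and ties are broken by taking the first maximal head.

```python
def greedy_heads(score):
    """For every non-ROOT node d, pick the head h maximizing score[h][d] (0 is ROOT)."""
    n = len(score)
    heads = [None] * n
    for d in range(1, n):
        heads[d] = max((h for h in range(n) if h != d), key=lambda h: score[h][d])
    return heads


def find_cycle(heads):
    """Return the nodes of one cycle among the greedy choices, or None if there is none."""
    for start in range(1, len(heads)):
        path, v = [], start
        while v != 0 and v not in path:
            path.append(v)
            v = heads[v]
        if v != 0:  # we stopped because v repeated, so the cycle starts at v
            return path[path.index(v):]
    return None


scores = [[0, 5, 1], [0, 0, 11], [0, 10, 0]]  # hypothetical scores, 0 = ROOT
heads = greedy_heads(scores)
print(heads, find_cycle(heads))  # [None, 2, 1] [1, 2]
```

If `find_cycle` returns None, the greedy choice is already the best arborescence; otherwise the returned nodes form the cycle C to contract.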

SLIDE 26

Chu-Liu-Edmonds - Recursive (Inefficient) Definition

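from dataclasses import dataclass

@dataclass(frozen=True, eq=False)  # eq=False: edges compare by identity
class Edge:
    src: object
    dest: object
    score: float
    original: "Edge | None" = None   # the edge (one level up) this edge copies
    kicks_out: "Edge | None" = None  # cycle edge removed if this edge is chosen

def find_cycle(best_in_edge):
    """Follows parent pointers; returns the edges of one cycle, or None."""
    for start in best_in_edge:
        path, v = [], start
        while v in best_in_edge and v not in path:
            path.append(v)
            v = best_in_edge[v].src
        if v in path:  # walked back onto the current path
            return {best_in_edge[u] for u in path[path.index(v):]}
    return None

def max_arborescence(V, E, root):
    """Returns the best arborescence as a map from each node to its incoming edge.
    Assumes every non-ROOT node has at least one incoming edge."""
    best_in_edge = {v: max((e for e in E if e.dest == v), key=lambda e: e.score)
                    for v in V if v != root}
    C = find_cycle(best_in_edge)
    if C is None:
        return best_in_edge  # each node got a parent without creating any cycles
    # build a new graph where C is contracted into a single node v_C
    c_nodes = {e.dest for e in C}
    v_C = ("contracted", frozenset(c_nodes))
    new_E = []
    for e in E:
        if e.src in c_nodes and e.dest in c_nodes:
            continue  # every edge inside C (not only the cycle edges) disappears
        if e.dest in c_nodes:  # entering C: score it relative to what it kicks out
            kicked = best_in_edge[e.dest]
            new_E.append(Edge(e.src, v_C, e.score - kicked.score, e, kicked))
        else:  # edges leaving C get source v_C; all others are copied unchanged
            src = v_C if e.src in c_nodes else e.src
            new_E.append(Edge(src, e.dest, e.score, e))
    A = max_arborescence((set(V) - c_nodes) | {v_C}, new_E, root)
    # expand: map chosen edges back down, then keep C minus the kicked-out edge
    chosen = {e.original for e in A.values()} | (C - {A[v_C].kicks_out})
    return {e.dest: e for e in chosen}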

SLIDE 27

Chu-Liu-Edmonds

Consists of two stages:

◮ Contracting (everything before the recursive call)
◮ Expanding (everything after the recursive call)

SLIDE 28

Chu-Liu-Edmonds - Preprocessing

◮ Remove every edge incoming to ROOT
  ◮ This ensures that ROOT is in fact the root of any solution
◮ For every ordered pair of nodes, $v_i$, $v_j$, remove all but the highest-scoring edge from $v_i$ to $v_j$
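A sketch of both preprocessing steps, assuming edges are `(src, dest, score)` tuples that may include several parallel edges per ordered pair (for instance one per label); the function name and toy edges are hypothetical. Keeping only the best parallel edge is safe because an arborescence gives each node a single parent, so it never uses two edges from $v_i$ to $v_j$, and the best one dominates the rest.

```python
def preprocess(edges, root):
    """Drop edges into ROOT; keep only the best edge for each ordered (src, dest) pair.

    edges: iterable of (src, dest, score) tuples, e.g. one per arc label.
    """
    best = {}
    for src, dest, s in edges:
        if dest == root:
            continue  # ROOT must end up as the root of any solution
        if (src, dest) not in best or s > best[(src, dest)][2]:
            best[(src, dest)] = (src, dest, s)
    return list(best.values())


edges = [("ROOT", "A", 1), ("A", "ROOT", 5), ("A", "B", 2), ("A", "B", 7), ("B", "A", 3)]
print(preprocess(edges, "ROOT"))  # [('ROOT', 'A', 1), ('A', 'B', 7), ('B', 'A', 3)]
```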

SLIDE 29

Chu-Liu-Edmonds - Contracting Stage

◮ For each non-ROOT node v, set bestInEdge[v] to be its highest-scoring incoming edge.
◮ If a cycle C is formed:
  ◮ contract the nodes in C into a new node $v_C$
  ◮ edges outgoing from any node in C now get source $v_C$
  ◮ edges incoming to any node in C now get destination $v_C$
  ◮ edges between two nodes of C are dropped
  ◮ For each node u in C, and for each edge e incoming to u from outside of C:
    ◮ set e.kicksOut to bestInEdge[u], and
    ◮ set e.score to be e.score − e.kicksOut.score.
◮ Repeat until every non-ROOT node has an incoming edge and no cycles are formed
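One contraction step, sketched under the assumption that edges are `(name, src, dest, score)` tuples and that `best_in_edge` maps nodes to edge names (our choices; the slides keep this bookkeeping in tables). The edge list is our reading of the example that follows, and the output reproduces the contracted scores shown there (a: −5, b: −10, h: −1, i: −3). Repeated contractions would append to each edge's kicksOut list, as the example's tables show for a, which kicks out g and then h.

```python
def contract(edges, best_in_edge, cycle, new_node):
    """One contraction step.

    edges: list of (name, src, dest, score) tuples.
    best_in_edge: node -> name of its current best incoming edge.
    cycle: set of nodes that form a cycle under best_in_edge.
    Returns (contracted edge list, {edge name: name of the edge it kicks out}).
    """
    score_of = {name: s for name, _, _, s in edges}
    new_edges, kicks_out = [], {}
    for name, src, dest, s in edges:
        if src in cycle and dest in cycle:
            continue  # edges inside the cycle disappear into new_node
        if dest in cycle:  # entering the cycle at dest
            kicked = best_in_edge[dest]
            kicks_out[name] = kicked
            new_edges.append((name, src, new_node, s - score_of[kicked]))
        elif src in cycle:  # leaving the cycle
            new_edges.append((name, new_node, dest, s))
        else:
            new_edges.append((name, src, dest, s))
    return new_edges, kicks_out


edges = [("a", "ROOT", "V1", 5), ("b", "ROOT", "V2", 1), ("c", "ROOT", "V3", 1),
         ("d", "V1", "V2", 11), ("e", "V1", "V3", 4), ("f", "V2", "V3", 5),
         ("g", "V2", "V1", 10), ("h", "V3", "V1", 9), ("i", "V3", "V2", 8)]
new_edges, kicks_out = contract(edges, {"V1": "g", "V2": "d"}, {"V1", "V2"}, "V4")
print({name: s for name, _, _, s in new_edges})
# {'a': -5, 'b': -10, 'c': 1, 'e': 4, 'f': 5, 'h': -1, 'i': -3}
print(kicks_out)  # {'a': 'g', 'b': 'd', 'h': 'g', 'i': 'd'}
```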

SLIDE 30

An Example - Contracting Stage

[Figure: graph over ROOT, V1, V2, V3 with edges a: 5, b: 1, c: 1, d: 11, e: 4, f: 5, g: 10, h: 9, i: 8]

bestInEdge: V1 = –, V2 = –, V3 = –
kicksOut: – (no entries yet)

SLIDE 31

An Example - Contracting Stage

[Figure: same graph as on the previous slide]

bestInEdge: V1 = g, V2 = –, V3 = –
kicksOut: – (no entries yet)

SLIDE 32

An Example - Contracting Stage

[Figure: same graph as on the previous slide]

bestInEdge: V1 = g, V2 = d, V3 = –
kicksOut: – (no entries yet)

SLIDE 33

An Example - Contracting Stage

[Figure: V1 and V2 contracted into V4; edge scores a: 5 − 10, b: 1 − 11, c: 1, d: 11, e: 4, f: 5, g: 10, h: 9 − 10, i: 8 − 11]

bestInEdge: V1 = g, V2 = d, V3 = –
kicksOut: a → g, b → d, h → g, i → d

SLIDE 34

An Example - Contracting Stage

[Figure: contracted graph over ROOT, V3, V4 with edges a: −5, b: −10, c: 1, e: 4, f: 5, h: −1, i: −3]

bestInEdge: V1 = g, V2 = d, V3 = –, V4 = –
kicksOut: a → g, b → d, h → g, i → d

SLIDE 35

An Example - Contracting Stage

[Figure: same graph as on the previous slide]

bestInEdge: V1 = g, V2 = d, V3 = f, V4 = –
kicksOut: a → g, b → d, h → g, i → d

SLIDE 36

An Example - Contracting Stage

[Figure: same graph as on the previous slide]

bestInEdge: V1 = g, V2 = d, V3 = f, V4 = h
kicksOut: a → g, b → d, h → g, i → d

SLIDE 37

An Example - Contracting Stage

[Figure: V4 and V3 contracted into V5; edge scores a: −5 − (−1), b: −10 − (−1), c: 1 − 5, e: 4, f: 5, h: −1, i: −3]

bestInEdge: V1 = g, V2 = d, V3 = f, V4 = h, V5 = –
kicksOut: a → g, h; b → d, h; c → f; h → g; i → d

SLIDE 38

An Example - Contracting Stage

[Figure: contracted graph over ROOT and V5 with edges a: −4, b: −9, c: −4]

bestInEdge: V1 = g, V2 = d, V3 = f, V4 = h, V5 = –
kicksOut: a → g, h; b → d, h; c → f; e → f; h → g; i → d

SLIDE 39

An Example - Contracting Stage

[Figure: same graph as on the previous slide]

bestInEdge: V1 = g, V2 = d, V3 = f, V4 = h, V5 = a
kicksOut: a → g, h; b → d, h; c → f; e → f; h → g; i → d

SLIDE 40

Chu-Liu-Edmonds - Expanding Stage

After the contracting stage, every contracted node will have exactly one bestInEdge. This edge will kick out one edge inside the contracted node, breaking the cycle.

◮ Go through each bestInEdge e in the reverse order that we added them
  ◮ lock down e, and remove every edge in kicksOut(e) from bestInEdge (e takes over the place of each edge it removes).
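The expanding stage only needs the tables. In this hypothetical sketch, `order` lists nodes in the order their bestInEdge entry was filled in, and locking an edge makes it replace each edge it kicks out, which is how the example slides show V1 going from g to a. The tables in the usage lines are copied from the end of the contracting example.

```python
def expand(order, best_in_edge, kicks_out):
    """Expanding stage over the bookkeeping tables.

    order: nodes in the order their bestInEdge was filled in.
    best_in_edge: node -> edge name; kicks_out: edge name -> list of edge names.
    """
    best = dict(best_in_edge)
    for v in reversed(order):
        e = best[v]  # lock down v's (possibly already replaced) incoming edge
        for k in kicks_out.get(e, []):
            for u in best:
                if best[u] == k:
                    best[u] = e  # e takes the place of the edge it kicks out
    return best


kicks_out = {"a": ["g", "h"], "b": ["d", "h"], "c": ["f"], "e": ["f"], "h": ["g"], "i": ["d"]}
best = expand(["V1", "V2", "V3", "V4", "V5"],
              {"V1": "g", "V2": "d", "V3": "f", "V4": "h", "V5": "a"}, kicks_out)
print({v: best[v] for v in ("V1", "V2", "V3")})  # {'V1': 'a', 'V2': 'd', 'V3': 'f'}
```

Only the original nodes (V1, V2, V3) matter in the result; the entries for the contracted nodes V4 and V5 are bookkeeping.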

SLIDE 41

An Example - Expanding Stage

[Figure: contracted graph over ROOT and V5 with edges a: −4, b: −9, c: −4]

bestInEdge: V1 = g, V2 = d, V3 = f, V4 = h, V5 = a
kicksOut: a → g, h; b → d, h; c → f; e → f; h → g; i → d

SLIDE 42

An Example - Expanding Stage

[Figure: same graph as on the previous slide]

bestInEdge: V1 = a (g kicked out), V2 = d, V3 = f, V4 = a (h kicked out), V5 = a
kicksOut: a → g, h; b → d, h; c → f; e → f; h → g; i → d

SLIDE 43

An Example - Expanding Stage

[Figure: graph over ROOT, V3, V4 with edges a: −5, b: −10, c: 1, e: 4, f: 5, h: −1, i: −3]

bestInEdge: V1 = a (g kicked out), V2 = d, V3 = f, V4 = a (h kicked out), V5 = a
kicksOut: a → g, h; b → d, h; c → f; e → f; h → g; i → d

SLIDE 45

An Example - Expanding Stage

[Figure: original graph over ROOT, V1, V2, V3 with edges a: 5, b: 1, c: 1, d: 11, e: 4, f: 5, g: 10, h: 9, i: 8]

bestInEdge: V1 = a (g kicked out), V2 = d, V3 = f, V4 = a (h kicked out), V5 = a
kicksOut: a → g, h; b → d, h; c → f; e → f; h → g; i → d

SLIDE 47

Chu-Liu-Edmonds - Notes

◮ This is a greedy algorithm with a clever form of delayed back-tracking to recover from inconsistent decisions (cycles).

◮ CLE is exact: it always recovers the optimal arborescence.

SLIDE 50

Chu-Liu-Edmonds - Notes

◮ Efficient (wrong) implementation:

Tarjan ’77, Finding Optimum Branchings*, Networks
*corrected in Camerini et al. ’79, A note on finding optimum branchings, Networks

Not recursive. Uses a union-find (a.k.a. disjoint-set) data structure to keep track of collapsed nodes.

◮ Even more efficient:

Gabow et al. ’86, Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and Directed Graphs, Combinatorica

Uses a Fibonacci heap to keep incoming edges sorted. Finds cycles by following bestInEdge instead of randomly visiting nodes. Describes how to constrain ROOT to have only one outgoing edge.
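The non-recursive versions track collapsed nodes with a union-find structure. The sketch below is only a generic disjoint-set with union by rank and path halving, not Tarjan's or Gabow et al.'s algorithm; it shows how "which contracted node does v belong to now?" becomes a single `find` call. The class name and the node names (taken from the worked example) are illustrative.

```python
class DisjointSet:
    """Union-find: maps every original node to the contracted node that now contains it."""

    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}
        self.rank = {v: 0 for v in nodes}

    def find(self, v):
        # Path halving: point each visited node at its grandparent as we walk up.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, a, b):
        """Merge the sets containing a and b; return the new representative."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the lower-rank tree under the higher-rank one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra


# Contract the cycle {V1, V2}, then merge the result with V3:
ds = DisjointSet(["ROOT", "V1", "V2", "V3"])
ds.union("V1", "V2")
ds.union("V2", "V3")
print(ds.find("V3") == ds.find("V1"), ds.find("ROOT"))  # True ROOT
```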

SLIDE 51

Arc Scoring / Learning

SLIDE 52

Arc Scoring

Features can look at the source (head), destination (child), and arc label. For example:

◮ number of words between head and child,
◮ sequence of POS tags between head and child,
◮ is head to the left or right of child?
◮ vector state of a recurrent neural net at head and child,
◮ vector embedding of label,
◮ etc.
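A minimal sketch of sparse indicator features for a few of the templates above, combined into a linear arc score. The template strings, the toy sentence, the weights and the label name `dobj` are hypothetical, and the neural features in the list are not covered.

```python
def arc_features(words, tags, head, child, label):
    """Sparse indicator features for the arc head -> child (index 0 is ROOT)."""
    lo, hi = sorted((head, child))
    side = "left" if head < child else "right"
    return {
        f"dist={hi - lo - 1}": 1.0,                      # words strictly between
        f"between={'_'.join(tags[lo + 1:hi])}": 1.0,     # POS tags strictly between
        f"head_is_{side}|label={label}": 1.0,            # direction, conjoined with label
        f"head={words[head]}|child={words[child]}": 1.0, # lexical pair
    }


def arc_score(weights, features):
    """Linear arc score: dot product of a weight dict with a sparse feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())


words = ["<ROOT>", "I", "ate", "the", "fish"]
tags = ["ROOT", "PRP", "VBD", "DT", "NN"]
feats = arc_features(words, tags, 2, 4, "dobj")
weights = {"dist=1": 0.5, "head_is_left|label=dobj": 2.0}
print(arc_score(weights, feats))  # 2.5
```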

SLIDE 54

Learning

Recall that when we have a parameterized model and a decoder that can make predictions given that model, we can train with the structured perceptron or the structured hinge loss:

$$L_\theta(x_i, y_i) = \max_{y \in \mathcal{Y}} \{\mathrm{score}_\theta(y) + \mathrm{cost}(y, y_i)\} - \mathrm{score}_\theta(y_i)$$
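Hamming cost (the number of words with the wrong head) decomposes over arcs, so cost-augmented decoding can reuse an arc-factored decoder such as CLE after adding 1 to the score of every non-gold arc. The sketch below keeps things short by enumerating all trees instead, so it only works for a few words; the names, the Hamming cost choice and the toy scores are assumptions for illustration.

```python
from itertools import product


def arborescences(n):
    """Yield every head list [None, h1, ..., hn] that forms a tree rooted at 0."""
    for assignment in product(range(n + 1), repeat=n):
        heads = [None, *assignment]
        if all(_reaches_root(heads, d) for d in range(1, n + 1)):
            yield heads


def _reaches_root(heads, d):
    seen = set()
    while d != 0:
        if d in seen:
            return False  # cycle (including a word heading itself)
        seen.add(d)
        d = heads[d]
    return True


def hinge_loss(score, gold):
    """Structured hinge loss with Hamming cost; score[h][d] scores the arc h -> d."""
    def total(heads):
        return sum(score[h][d] for d, h in enumerate(heads) if h is not None)

    def cost(heads):
        return sum(h != g for h, g in zip(heads[1:], gold[1:]))

    n = len(gold) - 1
    # The max ranges over gold too (cost 0), so the loss is never negative.
    return max(total(y) + cost(y) for y in arborescences(n)) - total(gold)


print(hinge_loss([[0, 1, 1], [0, 0, 1], [0, 0, 0]], [None, 0, 1]))    # 1
print(hinge_loss([[0, 10, 0], [0, 0, 10], [0, 0, 0]], [None, 0, 1]))  # 0
```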