End-to-end Neural Coreference Resolution
Kenton Lee Luheng He Mike Lewis Luke Zettlemoyer
UWNLP
University of Washington Facebook AI Research Allen Institute for Artificial Intelligence
Input document:
A fire in a Bangladeshi garment factory has left at least 37 people dead and 100 hospitalized. Most of the deceased were killed in the crush as workers tried to flee the blaze in the four-story building.

Cluster #1: A fire in a Bangladeshi garment factory; the blaze in the four-story building
Cluster #2: a Bangladeshi garment factory; the four-story building
Cluster #3: at least 37 people; the deceased
Mention detection: A fire in a Bangladeshi garment factory; at least 37 people; …; the four-story building
Mention clustering: Cluster #1 (A fire in a Bangladeshi garment factory; the blaze in the four-story building), Cluster #2 (a Bangladeshi garment factory; the four-story building), Cluster #3 (at least 37 people; the deceased)
Input document: A fire in a Bangladeshi garment factory has left at least 37 people dead and 100 hospitalized.
Candidate mentions (from hand-engineered rules and a syntactic parser): A fire in a Bangladeshi garment factory; garment factory; at least 37 people dead and 100 hospitalized; …
Each pair of candidate mentions (Mention #1, Mention #2) then receives a coreferent ✓/✗ decision.
Mention clustering: main source of improvement for many years!
Relies on parser for:
Input document (N words):
A fire in a Bangladeshi garment factory has left at least 37 people dead and 100 hospitalized. Most of the deceased were killed in the crush as workers tried to flee the blaze in the four-story building. Witnesses say the only exit door was on the ground floor, and that it was locked when the fire broke out.

Naive joint model is $O(N^4)$:
$O(N^2)$ spans in every document: A; A fire; A fire in; …
$O(N^4)$ pairwise decisions:
Span #1 | Span #2 | Coreferent?
A | A fire | ✓/✗
A fire | A fire in | ✓/✗
A fire in | A fire in a | ✓/✗
… | … | ✓/✗
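The growth rates quoted for the naive joint model can be checked with a small counting sketch. The function names below are illustrative and not from the talk; spans are taken to be every contiguous word range, with no width limit.

```python
def enumerate_spans(num_words):
    """Return every contiguous span as (start, end), with end inclusive."""
    return [(start, end)
            for start in range(num_words)
            for end in range(start, num_words)]


def count_decisions(num_words):
    """Return (number of spans, number of unordered span pairs)."""
    num_spans = len(enumerate_spans(num_words))
    return num_spans, num_spans * (num_spans - 1) // 2


if __name__ == "__main__":
    for n in (10, 20, 40):
        spans, pairs = count_decisions(n)
        print(f"N={n}: {spans} spans, {pairs} span pairs")
```

The span count is N(N+1)/2, so it grows quadratically in N, and the number of span pairs grows with its square.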
Every span independently chooses an antecedent:
Span | Antecedent
1: A | $y_1$
2: A fire | $y_2$
3: A fire in | $y_3$
… | …
M: … | $y_M$

For example, $y_3$ can be:
$\epsilon$: no coreference link
1: coreference link from span 1 to span 3
2: coreference link from span 2 to span 3
Span | Antecedent ($y_i$)
A | $\epsilon$
A fire | $\epsilon$
… | …
a Bangladeshi garment factory | $\epsilon$
… | …
the four-story building | a Bangladeshi garment factory
… | …

$\epsilon$ covers two cases: the span is not a mention, or it has no link with a previously occurring span. Any other value is a predicted coreference link.
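One way to read clusters off the per-span antecedent choices is sketched below. It assumes, which the slides do not state explicitly, that spans connected by predicted links are grouped transitively. The function name and the use of `None` for $\epsilon$ are illustrative.

```python
def clusters_from_antecedents(antecedents):
    """Group spans into clusters by following antecedent links.

    antecedents[i] is the index of the antecedent chosen by span i,
    or None when span i chose the dummy antecedent (epsilon).
    An antecedent must precede its span, so antecedents[i] < i.
    """
    cluster_of = {}   # span index -> cluster id
    clusters = []     # cluster id -> list of span indices
    for i, j in enumerate(antecedents):
        if j is None:
            continue  # no link: span i starts no cluster by itself
        if j >= i:
            raise ValueError("an antecedent must precede its span")
        if j not in cluster_of:
            # The antecedent had no link of its own, so open a new cluster.
            cluster_of[j] = len(clusters)
            clusters.append([j])
        cluster_of[i] = cluster_of[j]
        clusters[cluster_of[j]].append(i)
    return clusters


spans = [
    "A fire in a Bangladeshi garment factory",
    "a Bangladeshi garment factory",
    "at least 37 people",
    "the deceased",
    "the blaze in the four-story building",
    "the four-story building",
]
links = [None, None, None, 2, 0, 1]
for cluster in clusters_from_antecedents(links):
    print([spans[k] for k in cluster])
```

Spans that choose $\epsilon$ and are never chosen as an antecedent end up in no cluster, which matches the annotation scheme where singletons are not recorded.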
$$P(y_1, \ldots, y_M \mid D) = \prod_{i=1}^{M} P(y_i \mid D) = \prod_{i=1}^{M} \frac{e^{s(i, y_i)}}{\sum_{y' \in \mathcal{Y}(i)} e^{s(i, y')}}$$

Independent decision for every span.
$s(i, j)$: pairwise coreference score between span $i$ and span $j$.

Factor coreference score to enable span pruning:

$$s(i, j) = \begin{cases} s_m(i) + s_m(j) + s_a(i, j) & j \neq \epsilon \\ 0 & j = \epsilon \end{cases}$$

$s_m(i)$: is this span a mention?
$s_a(i, j)$: is span $j$ an antecedent of span $i$?
Dummy antecedent $\epsilon$ has a fixed zero score.
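A minimal sketch of the per-span distribution $P(y_i \mid D)$ with the factored score follows. It assumes the mention scores and antecedent scores are already given as plain numbers; how they are produced is outside this sketch. The function name and the convention that output index 0 stands for $\epsilon$ are illustrative.

```python
import math


def antecedent_distribution(i, mention_scores, antecedent_scores):
    """Softmax over the candidates [epsilon, span 0, ..., span i-1] for span i.

    mention_scores[k] plays the role of s_m(k);
    antecedent_scores[i][j] plays the role of s_a(i, j) for j < i.
    Index 0 of the result is the dummy antecedent, index j + 1 is span j.
    """
    scores = [0.0]  # the dummy antecedent has a fixed zero score
    for j in range(i):
        scores.append(mention_scores[i] + mention_scores[j]
                      + antecedent_scores[i][j])
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


mention_scores = [1.0, -2.0, 3.0]
antecedent_scores = [[], [0.0], [2.0, 0.5]]
probs = antecedent_distribution(2, mention_scores, antecedent_scores)
print(probs)
```

Because $\epsilon$ always scores 0, a span only links to an antecedent whose total score is positive; a span whose candidates all score below zero prefers $\epsilon$.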
Mention scores for spans of the input document (N words):
Span | $s_m$
A |
A fire | 4
… | …
a Bangladeshi garment factory | 6
… | …
the four-story building | 10
… | …

Spans with low mention scores are likely to have a negative overall score.

Keep the top $\lambda N$ spans:
Span | $s_m$
A fire | 4
… | …
a Bangladeshi garment factory | 6
… | …
the four-story building | 10
… | …
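The pruning step can be sketched as a simple top-k selection by mention score. The function name, the rounding down of $\lambda N$, and the default `lam=0.4` are assumptions made for illustration.

```python
def prune_spans(mention_scores, num_words, lam=0.4):
    """Return indices of the top floor(lam * num_words) spans by mention score.

    The kept indices are returned in their original (document) order.
    """
    budget = int(lam * num_words)
    ranked = sorted(range(len(mention_scores)),
                    key=lambda k: mention_scores[k],
                    reverse=True)
    return sorted(ranked[:budget])


# Hypothetical scores for five candidate spans in a six-word document.
scores = [-1.0, 4.0, 6.0, 10.0, 0.5]
print(prune_spans(scores, num_words=6, lam=0.5))
```

Only the surviving spans need pairwise antecedent scores, so the expensive pairwise step runs over $O(N)$ spans instead of $O(N^2)$.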
[Table: target spans (rows) by antecedents (columns) over the spans A fire, …, a Bangladeshi garment factory, …, the four-story building, …, with example entries 2 and 10.]
Only clusters with multiple mentions are annotated: singleton mentions are missing from the data.
Marginal log-likelihood objective:

$$\log \prod_{i=1}^{M} \sum_{\hat{y} \in \mathcal{Y}(i) \cap \text{gold}(i)} P(\hat{y} \mid D)$$
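The objective can be sketched directly from the formula: for each span, sum the probability assigned to every correct antecedent, then add up the logs. The sketch assumes each per-span distribution is already computed, and that the gold set of a span with no gold antecedent contains only the dummy antecedent; that second assumption is not stated on the slides. Index 0 stands for $\epsilon$ and index j + 1 for span j, an illustrative convention.

```python
import math


def marginal_log_likelihood(distributions, gold_antecedents):
    """Sum over spans of log(sum of P(y | D) over gold antecedents y).

    distributions[i] is P(y_i | D) as a list over [epsilon, span 0, ...].
    gold_antecedents[i] is a non-empty set of indices into distributions[i].
    """
    total = 0.0
    for probs, gold in zip(distributions, gold_antecedents):
        total += math.log(sum(probs[y] for y in gold))
    return total


distributions = [
    [1.0],              # span 0 can only choose epsilon
    [0.25, 0.75],       # span 1: epsilon or span 0
    [0.2, 0.3, 0.5],    # span 2: epsilon, span 0 or span 1
]
gold = [{0}, {1}, {1, 2}]  # span 2 has two correct antecedents
print(marginal_log_likelihood(distributions, gold))
```

Marginalizing over all correct antecedents means the model is not forced to pick one particular antecedent when several would yield the same cluster.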
With the factored score $s(i, j) = s_m(i) + s_m(j) + s_a(i, j)$ for $j \neq \epsilon$ (and $0$ for $j = \epsilon$), an absent link can be blamed on the appropriate factor:
Spans "left at" and "37 people": "left at" is a bad mention, so blame the mention factor for the absent link.
Spans "a Bangladeshi garment factory" and "left at": "left at" is a bad mention, so blame the mention factor for the absent link.
Spans "a Bangladeshi garment factory" and "37 people": incompatible mentions, so blame the antecedent factor for the absent link.
Lexical and contextual cues are useful, e.g. paraphrased head words.
Span representation for the sentence "General Electric said the Postal Service contacted the company" (example span: the Postal Service).
Layers, bottom to top: word & character embeddings; bidirectional LSTM; boundary representations and head-finding attention (an attention mechanism to learn headedness); span representation.
Compute all span representations: General Electric; Electric said the; the Postal Service; Service contacted the; the company
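A rough sketch of combining boundary representations with head-finding attention is given below. It is a simplification under stated assumptions: the per-word vectors stand in for whatever the bidirectional LSTM produces, the per-word attention logits are given rather than learned, and the three parts are simply concatenated. None of these details are confirmed by the slides.

```python
import math


def span_representation(states, attn_logits, start, end):
    """Build a span vector from boundary states and an attention-weighted sum.

    states: one vector (list of floats) per word.
    attn_logits: one unnormalized attention score per word.
    start, end: span boundaries, end inclusive.
    Returns (representation, attention weights over the span's words).
    """
    logits = attn_logits[start:end + 1]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - top) for a in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    # Soft head: weighted sum of the word vectors inside the span.
    head = [sum(w * states[start + k][d] for k, w in enumerate(weights))
            for d in range(dim)]
    return states[start] + states[end] + head, weights


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rep, weights = span_representation(states, [0.0, 0.0, 0.0], start=0, end=1)
print(rep, weights)
```

The attention weights are normalized within each span, so the same word can carry a different weight in different spans.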
From span representations to $P(y_i \mid D)$: each span representation gives a mention score $s_m(i)$, each pair of span representations gives an antecedent score $s_a(i, j)$, these combine into $s(i, j)$, and the scores for span $i$ are normalized into $P(y_i \mid D)$.
Example for span $i$ = the company, with preceding spans General Electric and the Postal Service:
s(the company, General Electric)
s(the company, the Postal Service)
s(the company, $\epsilon$) = 0
Dataset: English OntoNotes (CoNLL-2012)
Genres: telephone conversations, newswire, newsgroups, broadcast conversation, broadcast news, weblogs
Documents: 2802 training, 343 development, 348 test
Longest document has 4009 words!
Additional pruning: maximum span width, maximum number of sentences during training, suppress spans with inconsistent bracketing, maximum number of antecedents
Features: distance between spans, span width
Metadata: speaker information, genre
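Two of the additional pruning steps, maximum span width and maximum number of antecedents, can be sketched as follows. The function names are illustrative, and choosing the closest preceding spans as antecedent candidates is an assumption; the slides only name the limits.

```python
def candidate_spans(num_words, max_width):
    """Return spans (start, end), end inclusive, with at most max_width words."""
    return [(start, end)
            for start in range(num_words)
            for end in range(start, min(start + max_width, num_words))]


def candidate_antecedents(i, max_antecedents):
    """Return indices of up to max_antecedents spans directly preceding span i."""
    return list(range(max(0, i - max_antecedents), i))


print(len(candidate_spans(4009, 10)))
print(candidate_antecedents(5, 3))
```

With a fixed width limit the number of candidate spans grows linearly in the document length, which matters for documents with thousands of words.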
Test Avg. F1 (%):
Model | Type | Avg. F1
Durrett & Klein (2013) | linear, pipelined | 60.3
Björkelund & Kuhn (2014) | linear, pipelined | 61.6
Martschat & Strube (2015) | linear, pipelined | 62.5
Wiseman et al. (2016) | neural, pipelined | 64.2
Clark & Manning (2016) | neural, pipelined | 65.7
Our model (single) | end-to-end | 67.2
Our model (ensemble) | end-to-end | 68.8
[Figure: % mention recall (20 to 100) against spans per word (0.1 to 0.5). Series: Raghunathan et al. (2010); our model (actual threshold); our model (various thresholds).]
Recall of gold mentions increases as we keep more spans.
92.7% @ 0.4 spans per word
89.2% @ 0.26 spans per word
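The two quantities on the plot axes can be computed as in the sketch below. Spans are represented as (start, end) word offsets, and the example spans are made up for illustration; they are not data from the talk.

```python
def mention_recall(kept_spans, gold_mentions):
    """Percentage of gold mentions that survive among the kept spans."""
    gold = set(gold_mentions)
    if not gold:
        return 0.0
    return 100.0 * len(gold & set(kept_spans)) / len(gold)


def spans_per_word(kept_spans, num_words):
    """Number of kept spans divided by the document length in words."""
    return len(kept_spans) / num_words


# Hypothetical kept spans and gold mentions for a ten-word document.
kept = [(0, 6), (3, 6), (8, 9)]
gold = [(0, 6), (3, 6), (7, 7), (5, 6)]
print(mention_recall(kept, gold), spans_per_word(kept, 10))
```

Sweeping the pruning threshold and recomputing both values traces out a recall curve like the one summarized in the figure.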
[Figure: the input document with mentions in predicted clusters marked and head-finding attention weights shown.]
Attention-based head finder facilitates soft similarity cues.
Good head-finding requires word-order information!
[Figure: % of constituent spans with predicted heads that agree with syntactic heads. y-axis: % agreement (25 to 100); x-axis: span width (1 to 10).]
The flight attendants have until 6:00 today to ratify labor concessions. The pilots' union and ground crew did so yesterday.
Conflating relatedness with paraphrasing.