Design Challenges for Entity Linking
Xiao Ling, Sameer Singh, Daniel S. Weld
Entity Linking
Seattle beat Portland yesterday.
Seattle (city), Seattle Sounders, Sea-Tac (airport)
~3-4 M entries
(e.g. Koch et al. 2014)
(e.g. Hajishirzi et al. 2013, Durrett & Klein 2014)
(e.g. Sun et al. 2015)
(e.g. Knowledge Graph)
(see Shen et al. 2014; Roth et al. 2014)
hubs.
Antibiotic Resolution Seattle Sounders Seattle (city)
Joint Inference, Deep Neural Networks, Learning to rank
                        Dataset   # of Mentions   Knowledge Base
UIUC                    ACE       244             Wikipedia
UIUC                    MSNBC     654             Wikipedia
AIDA (Hoffart et al.)   AIDA-D    5917            Yago
AIDA (Hoffart et al.)   AIDA-T    5616            Yago
TAC KBP                 TAC09     3904            Wikipedia 2008
TAC KBP                 TAC10     2250            Wikipedia 2008
TAC KBP                 TAC10T    1500            Wikipedia 2008
TAC KBP                 TAC11     2250            Wikipedia 2008
TAC KBP                 TAC12     2226            Wikipedia 2008
ACE MSNBC AIDA-D AIDA-T KBP09 KBP10 KBP10T KBP11 KBP12
Cucerzan (2007)
Milne & Witten (2008)
Kulkarni et al. (2009)
Ratinov et al. (2011) ⎷
Hoffart et al. (2011)
Han & Sun (2012)
He et al. (2013a)
He et al. (2013b)
Cheng & Roth (2013) ⎷
Sil & Yates (2013) ⎷
Li et al. (2013)
Cornolti et al. (2013)
TAC-KBP participants
… Moscow’s as yet undisclosed proposals …
Moscow (city), Russia (country), Government of Russia
… Florida Green Party …
Green Party of the US, Green Party of Florida
Mention Extraction → Candidate Generation → Entity Type → Coreference → Coherence
Input:
Seattle beat Portland yesterday.
Mention Extraction
Seattle beat Portland yesterday.
Candidate Entities
Seattle (city), Seattle Sounders, Sea-Tac (airport)
… capital of the state of Washington.
In 1990, Washington starred as Bleek Gilliam …
Washington refused to run for a third term …
… Washington …
p(e | m) = #[m -> e] / #[m]
e.g. m = “W”: #[“W” -> e] / #[“W”]
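The count ratio #[m -> e] / #m can be sketched in a few lines. This is a minimal illustration, assuming link data is available as (mention, entity) pairs; the toy counts and the name `conditional_prob` are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict

def conditional_prob(pairs):
    """Estimate p(e | m) = #[m -> e] / #[m] from (mention, entity) link pairs."""
    link_counts = defaultdict(Counter)
    for mention, entity in pairs:
        link_counts[mention][entity] += 1
    table = {}
    for mention, counts in link_counts.items():
        total = sum(counts.values())  # #[m]: all links whose anchor text is m
        table[mention] = {e: c / total for e, c in counts.items()}
    return table

# Hypothetical toy counts for the mention "Washington".
pairs = (
    [("Washington", "Washington (state)")] * 5
    + [("Washington", "George Washington")] * 3
    + [("Washington", "Denzel Washington")] * 2
)
probs = conditional_prob(pairs)
```

Here `probs["Washington"]` maps each candidate entity to its share of the links whose anchor text is "Washington".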
Candidate Entities (“Seattle”), with cond. prob. p(e | m):
Seattle (city)      0.6
Seattle Sounders    0.2
Sea-Tac (airport)   0.1
Entity Type Prediction
0.1 0.4 0.1
p(e | t, m): re-normalization of cond. prob.
e.g. t = LOC
p(Seattle-city | LOC, “Seattle”) = 0.6 / 0.7
p(Sea-Tac | LOC, “Seattle”) = 0.1 / 0.7
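The re-normalization in the t = LOC example can be reproduced with a short sketch. It assumes each candidate carries a set of types; the ORG label for Seattle Sounders and the name `renormalize_by_type` are assumptions for illustration, while the probabilities 0.6, 0.2 and 0.1 come from the slide.

```python
def renormalize_by_type(cond_prob, entity_types, t):
    """p(e | t, m): keep candidates compatible with type t and rescale to sum to 1."""
    kept = {e: p for e, p in cond_prob.items() if t in entity_types.get(e, set())}
    total = sum(kept.values())
    if total == 0:
        return {}  # no candidate has type t
    return {e: p / total for e, p in kept.items()}

# p(e | m) for the mention "Seattle", as on the slide.
cond_prob = {
    "Seattle (city)": 0.6,
    "Seattle Sounders": 0.2,
    "Sea-Tac (airport)": 0.1,
}
# Type assignments: LOC follows the slide's example; ORG is an assumption.
entity_types = {
    "Seattle (city)": {"LOC"},
    "Seattle Sounders": {"ORG"},
    "Sea-Tac (airport)": {"LOC"},
}
loc = renormalize_by_type(cond_prob, entity_types, "LOC")
```

With t = LOC only the city and the airport survive, and their probabilities are divided by 0.6 + 0.1 = 0.7.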
Candidate Entities       Candidate Generation   Entity Type
Seattle (city)           0.6                    0.2
Seattle Sounders         0.2                    0.4
Sea-Tac (airport)        0.1                    0.1
Coreference
Seattle Sounders head coach Sigi Schmid has some ideas … Seattle beat Portland yesterday.
Candidate Entities
Sounders
(airport)
Candidate Entities
Portland
Timbers
Seattle beat Portland yesterday.
0.2 0.4
0.1
0.2 0.2
0.1
(Milne & Witten, 2008)
George Washington: President, US Constitution, American Revolutionary War
Denzel Washington: Tony Award, Golden Globe, Training Day
George Washington: President, US Constitution, American Revolutionary War
John Adams: Quasi-War, President, US Constitution
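The slides cite Milne & Witten (2008) here and later list "NGD" as the coherence measure. The sketch below uses the Normalized Google Distance formulation commonly attributed to that paper, computed over sets of linked pages. The page sets are the toy examples from the slide, `total_pages = 1000` is an arbitrary assumption, and `ngd_relatedness` is a hypothetical name.

```python
import math

def ngd_relatedness(links_a, links_b, total_pages):
    """Link-based relatedness in [0, 1]; 1.0 means identical link sets."""
    a, b = set(links_a), set(links_b)
    overlap = len(a & b)
    if overlap == 0:
        return 0.0  # no shared links: treat as unrelated
    larger, smaller = max(len(a), len(b)), min(len(a), len(b))
    if total_pages <= larger:
        raise ValueError("total_pages must exceed the size of both link sets")
    distance = (math.log(larger) - math.log(overlap)) / (
        math.log(total_pages) - math.log(smaller)
    )
    return max(0.0, 1.0 - distance)

george = ["President", "US Constitution", "American Revolutionary War"]
adams = ["Quasi-War", "President", "US Constitution"]
denzel = ["Tony Award", "Golden Globe", "Training Day"]

total_pages = 1000  # assumed knowledge-base size, for illustration only
rel_george_adams = ngd_relatedness(george, adams, total_pages)
rel_george_denzel = ngd_relatedness(george, denzel, total_pages)
```

On this toy data George Washington and John Adams share two linked pages and score high, while George Washington and Denzel Washington share none and score 0.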
=> r(Barack Obama, United States) = 1
(Cheng & Roth, 2013)
ACE MSNBC AIDA-D AIDA-T KBP09 KBP10 KBP10T KBP11 KBP12
Cucerzan (2007) ⎷
Milne & Witten (2008)
Kulkarni et al. (2009) ⎷
Ratinov et al. (2011) ⎷ ⎷
Hoffart et al. (2011) ⎷
Han & Sun (2012) ⎷
He et al. (2013a) ⎷ ⎷
He et al. (2013b) ⎷ ⎷
Cheng & Roth (2013) ⎷ ⎷ ⎷
Sil & Yates (2013) ⎷ ⎷ ⎷
Li et al. (2013) ⎷ ⎷
Cornolti et al. (2013) ⎷ ⎷
TAC-KBP participants ⎷ ⎷ ⎷ ⎷ ⎷
Dataset   # of Mentions   Knowledge Base
ACE       244             Wikipedia
MSNBC     654             Wikipedia
AIDA-D    5917            Yago
AIDA-T    5616            Yago
TAC09     3904            Wikipedia 2008
TAC10     2250            Wikipedia 2008
TAC10T    1500            Wikipedia 2008
TAC11     2250            Wikipedia 2008
TAC12     2226            Wikipedia 2008
Mention based F1 Official Eval.
(Spitkovsky & Chang, 2012)
Conditional Probability p(e | m)
[Figure: Recall@k of candidate generation for k = 1, 3, 5, 10, 20, 30, 50, 100, Inf (y-axis 0.5 to 1); series: CrossWikis, Intra-Wikipedia, Freebase Search]
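Recall@k for candidate generation can be computed as below. This is a minimal sketch, assuming each mention has a ranked candidate list and one gold entity; the mention ids, the entity names beyond those on the slides, and the name `recall_at_k` are hypothetical.

```python
def recall_at_k(ranked_candidates, gold, k=None):
    """Fraction of mentions whose gold entity is in the top-k candidates.

    k=None means no cutoff (the "Inf" point on the plot).
    """
    if not gold:
        return 0.0
    hits = 0
    for mention_id, gold_entity in gold.items():
        candidates = ranked_candidates.get(mention_id, [])
        top = candidates if k is None else candidates[:k]
        if gold_entity in top:
            hits += 1
    return hits / len(gold)

# Hypothetical toy data: candidates sorted by decreasing p(e | m).
ranked = {
    "m1": ["Seattle (city)", "Seattle Sounders", "Sea-Tac (airport)"],
    "m2": ["Portland (city)", "Portland Timbers"],
    "m3": ["Washington (state)"],
}
gold = {
    "m1": "Seattle Sounders",
    "m2": "Portland Timbers",
    "m3": "George Washington",  # never generated, so it caps recall
}
recall_1 = recall_at_k(ranked, gold, k=1)
recall_2 = recall_at_k(ranked, gold, k=2)
recall_inf = recall_at_k(ranked, gold)
```

A gold entity that the generator never proposes, such as the one for `m3`, bounds recall even with no cutoff, which is why the "Inf" point matters when comparing candidate sources.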
(Stanford NER)
(Ling & Weld, 2012)
[Figure: F1 (axis 50 to 100) on ACE, MSNBC, AIDA-D, AIDA-T, TAC09, TAC10, TAC10T, TAC11, TAC12, Overall; series: +NER, +FIGER]
[Figure: F1 (axis 50 to 100) on ACE, MSNBC, AIDA-D, AIDA-T, TAC09, TAC10, TAC10T, TAC11, TAC12; series: Candidate Generation, +Entity Types, +Coref., +Coherence]
[Figure: Average F1 (axis 50 to 100); series: Cand, +Entity Type, +Coref, +Coherence, AIDA, Wikifier; value labels: 79.6, 72.2, 79.0, 78.0, 76.7, 75.0]
Misc 10%, Specific Labels 14%, Context 33%, Coreference 10%, Types 14%, Metonymy 19%
Vinculum
[Figure: F1 (axis 50 to 100) on ACE, MSNBC, AIDA-D, AIDA-T, TAC09, TAC10, TAC10T, TAC11, TAC12, Overall; series: +NER (Gold), +FIGER (Gold)]
Component                Implementation
Mention Extraction       Stanford NER
Candidate Generation     CrossWikis
Entity Type Prediction   Fine-grained Entity Types
Coreference              Stanford Coreference
Coherence                NGD + relational triples
                       VINCULUM                 AIDA              WIKIFIER
Mention Extraction     NER                      NER               NER, noun phrases
Candidate Generation   CrossWikis               intra-Wikipedia   intra-Wikipedia
Entity Types           FIGER                    NER               NER
Coreference            representative mention   …                 … candidates
Coherence              NGD, relational          NGD               NGD, relational
Learning               deterministic            trained on AIDA   trained on Wiki
defeat in 1996 at the hands of the All Blacks …
consider flying into Burbank or John Wayne Airport ...
Obama, an imperious if alluring voice gone distant and then missing.
Green never really stirred the passions of former Walker supporters, nor did he garner outsized support “outstate”.