Design Challenges for Entity Linking Xiao Ling , Sameer Singh, - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Design Challenges for Entity Linking

Xiao Ling, Sameer Singh, Daniel S. Weld

slide-2
SLIDE 2

Entity Linking

Seattle beat Portland yesterday.

slide-4
SLIDE 4

Entity Linking

Seattle beat Portland yesterday.

  • Seattle (city)
  • Seattle Sounders
  • Sea-Tac (airport)

slide-5
SLIDE 5

Entity Linking

Seattle beat Portland yesterday.

  • Seattle (city)
  • Seattle Sounders
  • Sea-Tac (airport)

~3-4 M entries

slide-6
SLIDE 6

Applications

  • Relation Extraction (e.g. Koch et al. 2014)
  • Coreference Resolution (e.g. Hajishirzi et al. 2013, Durrett & Klein 2014)
  • Question Answering (e.g. Sun et al. 2015)
  • Web Search (e.g. Knowledge Graph)
  • many others… (see Shen et al. 2014; Roth et al. 2014)

slide-7
SLIDE 7

Ambiguity

  • Seattle beat Portland yesterday.
  • Seattle scores high in the latest report of startup hubs.
  • The Emerald City Council To Make Decision on Antibiotic Resolution

slide-8
SLIDE 8

Ambiguity

  • Seattle beat Portland yesterday. → Seattle Sounders
  • Seattle scores high in the latest report of startup hubs.
  • The Emerald City Council To Make Decision on Antibiotic Resolution

slide-9
SLIDE 9

Ambiguity

  • Seattle beat Portland yesterday. → Seattle Sounders
  • Seattle scores high in the latest report of startup hubs. → Seattle (city)
  • The Emerald City Council To Make Decision on Antibiotic Resolution

slide-10
SLIDE 10

Variability

  • Seattle scores high in the latest report of startup hubs.
  • The Emerald City Council To Make Decision on Antibiotic Resolution

slide-11
SLIDE 11

Variability

  • Seattle scores high in the latest report of startup hubs.
  • The Emerald City Council To Make Decision on Antibiotic Resolution

→ Seattle (city)

slide-12
SLIDE 12

Related Work

  • Cucerzan (2007)
  • Milne and Witten (2008)
  • Kulkarni et al. (2009)
  • Ratinov et al. (2011)
  • Hoffart et al. (2011)
  • Han and Sun (2012)
  • He et al. (2013a)
  • He et al. (2013b)
  • Cheng and Roth (2013)
  • Sil and Yates (2013)
  • Li et al. (2013)
  • Cornolti et al. (2013)
  • … many others
slide-13
SLIDE 13

Related Work

  • Cucerzan (2007)
  • Milne and Witten (2008)
  • Kulkarni et al. (2009)
  • Ratinov et al. (2011)
  • Hoffart et al. (2011)
  • Han and Sun (2012)
  • He et al. (2013a)
  • He et al. (2013b)
  • Cheng and Roth (2013)
  • Sil and Yates (2013)
  • Li et al. (2013)
  • Cornolti et al. (2013)
  • … many others

Joint Inference

slide-14
SLIDE 14

Related Work

  • Cucerzan (2007)
  • Milne and Witten (2008)
  • Kulkarni et al. (2009)
  • Ratinov et al. (2011)
  • Hoffart et al. (2011)
  • Han and Sun (2012)
  • He et al. (2013a)
  • He et al. (2013b)
  • Cheng and Roth (2013)
  • Sil and Yates (2013)
  • Li et al. (2013)
  • Cornolti et al. (2013)
  • … many others

Joint Inference, Learning to rank

slide-15
SLIDE 15

Related Work

  • Cucerzan (2007)
  • Milne and Witten (2008)
  • Kulkarni et al. (2009)
  • Ratinov et al. (2011)
  • Hoffart et al. (2011)
  • Han and Sun (2012)
  • He et al. (2013a)
  • He et al. (2013b)
  • Cheng and Roth (2013)
  • Sil and Yates (2013)
  • Li et al. (2013)
  • Cornolti et al. (2013)
  • … many others

Joint Inference, Deep Neural Networks, Learning to rank

slide-16
SLIDE 16

Popular Data Sets

Dataset                      # of Mentions   Knowledge Base

UIUC
  ACE                        244             Wikipedia
  MSNBC                      654             Wikipedia

AIDA (Hoffart et al. 2011)
  AIDA-D                     5917            Yago
  AIDA-T                     5616            Yago

TAC KBP
  TAC09                      3904            Wikipedia 2008
  TAC10                      2250            Wikipedia 2008
  TAC10T                     1500            Wikipedia 2008
  TAC11                      2250            Wikipedia 2008
  TAC12                      2226            Wikipedia 2008

slide-17
SLIDE 17

Unfortunately…

[Table: data sets on which each system reports results; the column positions of the check marks were lost in extraction]

Data sets: ACE, MSNBC, AIDA-D, AIDA-T, KBP09, KBP10, KBP10T, KBP11, KBP12

Systems:

  • Cucerzan (2007)
  • Milne & Witten (2008)
  • Kulkarni et al. (2009)
  • Ratinov et al. (2011)
  • Hoffart et al. (2011)
  • Han & Sun (2012)
  • He et al. (2013a)
  • He et al. (2013b)
  • Cheng & Roth (2013)
  • Sil & Yates (2013)
  • Li et al. (2013)
  • Cornolti et al. (2013)
  • TAC-KBP participants

slide-18
SLIDE 18

Unfortunately…

[Table of systems × data sets, as on the first “Unfortunately…” slide]

Joint Inference, Deep Neural Networks, Learning to rank

slide-19
SLIDE 19

Metonymy

[Background: table of systems × data sets from the “Unfortunately…” slide]

… Moscow’s as yet undisclosed proposals …

  • Moscow (city)
  • Russia (country)
  • Government of Russia

slide-20
SLIDE 20

Nested Entities

[Background: table of systems × data sets from the “Unfortunately…” slide]

… Florida Green Party …

  • Green Party of the US
  • Green Party of Florida

slide-21
SLIDE 21

Contributions

  • Vinculum: a simple, deterministic, modular EL system
  • comprehensive evaluation over nine data sets
  • candidate conditional probability can work quite well
  • entity types are important to the final performance
  • comparable results with two state-of-the-art systems

slide-22
SLIDE 22

Agenda

  • Introduction
  • Vinculum
  • Experiments
  • Conclusion

Mention Extraction → Candidate Generation → Entity Type → Coreference → Coherence

slide-23
SLIDE 23

Vinculum Architecture

Input: Seattle beat Portland yesterday.

Mention Extraction → Candidate Generation → Entity Type → Coreference → Coherence

slide-24
SLIDE 24

Mention Extraction

Seattle beat Portland yesterday.


slide-25
SLIDE 25

Candidate Generation

Seattle beat Portland yesterday.

Candidate Entities

  • Seattle (city)
  • Seattle Sounders
  • Seattle-Tacoma (airport)

slide-26
SLIDE 26

Conditional probability

… capital of the state of Washington.
In 1990, Washington starred as Bleek Gilliam …
Washington refused to run for a third term …
… Washington …

p(e | m) = #[m → e] / #[m]

slide-27
SLIDE 27

Conditional probability

… capital of the state of Washington.
In 1990, Washington starred as Bleek Gilliam …
Washington refused to run for a third term …
… Washington …

p([entity] | “Washington”) = #[“W” → [entity]] / #[“W”]
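
The count-based estimate p(e | m) = #[m → e] / #[m] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Vinculum's code: the `anchors` list is made-up data, the entity names are hypothetical labels for the senses of “Washington” in the snippets above, and #[m] is taken to be the number of linked occurrences of the mention string.

```python
from collections import Counter, defaultdict


def conditional_probabilities(anchors):
    """Estimate p(e | m) = #[m -> e] / #[m] from (mention, entity) link pairs."""
    link_counts = defaultdict(Counter)
    for mention, entity in anchors:
        link_counts[mention][entity] += 1

    probs = {}
    for mention, counts in link_counts.items():
        total = sum(counts.values())  # #[m]: linked occurrences of the mention
        probs[mention] = {entity: n / total for entity, n in counts.items()}
    return probs


# Toy anchor data (hypothetical): four links whose anchor text is "Washington".
anchors = [
    ("Washington", "Washington (state)"),
    ("Washington", "Washington (state)"),
    ("Washington", "Denzel Washington"),
    ("Washington", "George Washington"),
]
probs = conditional_probabilities(anchors)
```

With this toy data, half of the “Washington” links point to the state, so its estimated probability is 0.5; real estimates would come from a large anchor-text corpus.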

slide-29
SLIDE 29

Candidate Generation

Seattle beat Portland yesterday.

Candidate Entities

  • Seattle (city)            0.6
  • Seattle Sounders          0.2
  • Seattle-Tacoma (airport)  0.1

slide-30
SLIDE 30

Entity Types

Seattle beat Portland yesterday.

Candidate Entities

  • Seattle (city)            0.6
  • Seattle Sounders          0.2
  • Seattle-Tacoma (airport)  0.1

Entity Type Prediction

  • city              0.1
  • sports_team       0.4
  • facility/airport  0.1

slide-31
SLIDE 31

Entity Types

Seattle beat Portland yesterday.

Candidate Entities

  • Seattle (city)            0.6
  • Seattle Sounders          0.2
  • Seattle-Tacoma (airport)  0.1

Entity Type Prediction

  • city              0.1
  • sports_team       0.4
  • facility/airport  0.1

p(e | m) = ∑t p(e,t | m)
         = ∑t p(e | t,m) p(t | m)

slide-32
SLIDE 32

Entity Types

Seattle beat Portland yesterday.

Candidate Entities

  • Seattle (city)            0.6
  • Seattle Sounders          0.2
  • Seattle-Tacoma (airport)  0.1

p(e | m) = ∑t p(e,t | m)
         = ∑t p(e | t,m) p(t | m)

p(e | t,m): re-normalization of cond. prob.

slide-33
SLIDE 33

Entity Types

Seattle beat Portland yesterday.

Candidate Entities

  • Seattle (city)            0.6
  • Seattle Sounders          0.2
  • Seattle-Tacoma (airport)  0.1

p(e | m) = ∑t p(e,t | m)
         = ∑t p(e | t,m) p(t | m)

p(e | t,m): re-normalization of cond. prob.

e.g. t = LOC:
  p(Seattle-city | LOC, “Seattle”) = 0.6 / 0.7
  p(Sea-Tac | LOC, “Seattle”) = 0.1 / 0.7
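
The type-aware score p(e | m) = ∑t p(e | t,m) p(t | m) can be sketched as follows. This is only an illustration under assumptions: the candidate priors 0.6 / 0.2 / 0.1 come from the slide, but the coarse type assigned to each candidate and the `type_probs` values are hypothetical, and the function name is made up for this example.

```python
from collections import defaultdict


def rerank_with_types(prior, entity_types, type_probs):
    """Return p(e | m) = sum over t of p(e | t, m) * p(t | m)."""
    # Prior mass of all candidates compatible with each type.
    mass = defaultdict(float)
    for entity, p in prior.items():
        for t in entity_types[entity]:
            mass[t] += p

    scores = defaultdict(float)
    for entity, p in prior.items():
        for t in entity_types[entity]:
            if mass[t] > 0:
                # p(e | t, m): the prior re-normalized among candidates of type t.
                scores[entity] += (p / mass[t]) * type_probs.get(t, 0.0)
    return dict(scores)


prior = {
    "Seattle (city)": 0.6,
    "Seattle Sounders": 0.2,
    "Seattle-Tacoma (airport)": 0.1,
}
# Hypothetical coarse types for each candidate.
entity_types = {
    "Seattle (city)": ["LOC"],
    "Seattle Sounders": ["ORG"],
    "Seattle-Tacoma (airport)": ["LOC"],
}
# Hypothetical type prediction p(t | m) for "Seattle" in "Seattle beat Portland".
type_probs = {"LOC": 0.3, "ORG": 0.7}

scores = rerank_with_types(prior, entity_types, type_probs)
```

For t = LOC the re-normalization reproduces the slide's 0.6 / 0.7 and 0.1 / 0.7; with the assumed type prediction favoring ORG, Seattle Sounders moves to the top of the ranking.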

slide-34
SLIDE 34

Entity Types

Seattle beat Portland yesterday.

Candidate Entities

  • Seattle (city)            0.6
  • Seattle Sounders          0.2
  • Seattle-Tacoma (airport)  0.1

Entity Type Prediction

  • city              0.1
  • sports_team       0.4
  • facility/airport  0.1

p(e | m) = ∑t p(e,t | m)
         = ∑t p(e | t,m) p(t | m)

slide-35
SLIDE 35

Entity Types

Seattle beat Portland yesterday.

Entity Type Prediction

  • city              0.1
  • sports_team       0.4
  • facility/airport  0.1

Candidate Entities (after Candidate Generation)

  • Seattle (city)            0.6
  • Seattle Sounders          0.2
  • Seattle-Tacoma (airport)  0.1

Candidate Entities (after Entity Type)

  • Seattle (city)            0.2
  • Seattle Sounders          0.4
  • Seattle-Tacoma (airport)  0.1

slide-36
SLIDE 36

Entity Types

Seattle beat Portland yesterday.

Entity Type Prediction

  • city              0.1
  • sports_team       0.4
  • facility/airport  0.1

Candidate Entities (after Candidate Generation)

  • Seattle (city)            0.6
  • Seattle Sounders          0.2
  • Seattle-Tacoma (airport)  0.1

Candidate Entities (after Entity Type, re-ranked)

  • Seattle Sounders          0.4
  • Seattle (city)            0.2
  • Seattle-Tacoma (airport)  0.1

slide-37
SLIDE 37

Coreference

Seattle Sounders head coach Sigi Schmid has some ideas …
Seattle beat Portland yesterday.

slide-38
SLIDE 38

Coherence

Seattle beat Portland yesterday.

Candidate Entities for “Seattle”

  • Seattle (city)            0.2
  • Seattle Sounders          0.4
  • Seattle-Tacoma (airport)  0.1

Candidate Entities for “Portland”

  • Portland, OR              0.2
  • Univ. of Portland         0.2
  • Portland Timbers          0.1

slide-39
SLIDE 39

Normalized Google Distance (NGD)

(Milne & Witten, 2008)

slide-40
SLIDE 40

Normalized Google Distance (NGD)

(Milne & Witten, 2008)

George Washington: President of the US, US Constitution, American Revolutionary War
Denzel Washington: Tony Award, Golden Globe, Training Day

slide-42
SLIDE 42

Normalized Google Distance (NGD)

(Milne & Witten, 2008)

George Washington: President of the US, US Constitution, American Revolutionary War
John Adams: Quasi-War, President of the US, US Constitution
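
The NGD formula itself did not survive extraction from these slides. The sketch below uses the link-based form usually attributed to Milne & Witten (2008), (log max(|A|,|B|) − log |A∩B|) / (log |W| − log min(|A|,|B|)), where A and B are the sets of articles linked with each entity and |W| is the total number of articles. Treat the exact form, the toy link sets (taken from the slide's examples), the article count, and the handling of disjoint sets as assumptions.

```python
import math


def ngd(links_a, links_b, num_articles):
    """Normalized Google Distance over two sets of linked articles."""
    common = len(links_a & links_b)
    if common == 0:
        # No shared links: treated here as maximally distant (a choice made for this sketch).
        return math.inf
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    return (math.log(larger) - math.log(common)) / (
        math.log(num_articles) - math.log(smaller)
    )


george = {"President of the US", "US Constitution", "American Revolutionary War"}
denzel = {"Tony Award", "Golden Globe", "Training Day"}
adams = {"President of the US", "US Constitution", "Quasi-War"}
NUM_ARTICLES = 1000  # hypothetical knowledge-base size

close = ngd(george, adams, NUM_ARTICLES)   # two shared links: small distance
far = ngd(george, denzel, NUM_ARTICLES)    # no shared links: infinite distance
```

A smaller distance means the two entities are more related, so George Washington is judged closer to John Adams than to Denzel Washington.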

slide-43
SLIDE 43

Relational Score

  • Relation triples from Freebase
  • A binary score:
      • 1, if two entities appear in a triple
      • 0, otherwise
  • E.g. (Barack Obama, birthplace, United States) => r(Barack Obama, United States) = 1

(Cheng & Roth, 2013)
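
The binary relational score can be sketched as a lookup over triples. This is a minimal illustration: the one-element `triples` list stands in for Freebase, and treating the score as symmetric in its two arguments is an assumption of this sketch.

```python
def relational_score(e1, e2, triples):
    """Return 1 if e1 and e2 appear together in some (subject, relation, object) triple, else 0."""
    return int(any({subj, obj} == {e1, e2} for subj, _rel, obj in triples))


# Toy stand-in for Freebase, using the slide's example triple.
triples = [("Barack Obama", "birthplace", "United States")]

r = relational_score("Barack Obama", "United States", triples)
```

A linear scan is enough for illustration; over a full knowledge base one would index the entity pairs in a set for constant-time lookup.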

slide-44
SLIDE 44

Context


slide-45
SLIDE 45

Agenda

  • Introduction
  • Vinculum
  • Experiments
  • Conclusion

[Table of systems × data sets (ACE, MSNBC, AIDA-D, AIDA-T, KBP09, KBP10, KBP10T, KBP11, KBP12); the column positions of the check marks were lost in extraction]

Systems: Cucerzan (2007), Milne & Witten (2008), Kulkarni et al. (2009), Ratinov et al. (2011), Hoffart et al. (2011), Han & Sun (2012), He et al. (2013a), He et al. (2013b), Cheng & Roth (2013), Sil & Yates (2013), Li et al. (2013), Cornolti et al. (2013), TAC-KBP participants

slide-46
SLIDE 46

Data Sets

Dataset   # of Mentions   Knowledge Base
ACE       244             Wikipedia
MSNBC     654             Wikipedia
AIDA-D    5917            Yago
AIDA-T    5616            Yago
TAC09     3904            Wikipedia 2008
TAC10     2250            Wikipedia 2008
TAC10T    1500            Wikipedia 2008
TAC11     2250            Wikipedia 2008
TAC12     2226            Wikipedia 2008

Mention-based F1; Official Eval.

slide-47
SLIDE 47

Candidate Generation

  • intra-Wikipedia
  • CrossWikis (Spitkovsky & Chang, 2012)
  • Freebase Search API

Conditional Probability p(e | m), e.g. p([entity] | “Washington”)

slide-48
SLIDE 48

Aggregate Recall

[Plot: Recall@k versus k (1, 3, 5, 10, 20, 30, 50, 100, Inf), recall axis from 0.5 to 1, for CrossWikis, Intra-Wikipedia, and Freebase Search]
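
Recall@k for a candidate generator can be computed as the fraction of mentions whose gold entity appears among the top-k candidates. The sketch below is illustrative only: the candidate lists and gold labels are toy data built from the running example, and `k=None` plays the role of “Inf” on the plot's axis.

```python
def recall_at_k(candidate_lists, gold_entities, k=None):
    """Fraction of mentions whose gold entity is in the top-k candidates (k=None: all candidates)."""
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(candidate_lists, gold_entities)
    )
    return hits / len(gold_entities)


# Toy ranked candidates for the mentions "Seattle" and "Portland".
candidate_lists = [
    ["Seattle (city)", "Seattle Sounders", "Seattle-Tacoma (airport)"],
    ["Portland, OR", "Univ. of Portland", "Portland Timbers"],
]
gold_entities = ["Seattle Sounders", "Portland Timbers"]

recall_1 = recall_at_k(candidate_lists, gold_entities, k=1)
recall_2 = recall_at_k(candidate_lists, gold_entities, k=2)
recall_all = recall_at_k(candidate_lists, gold_entities)
```

Sweeping k over the values on the plot's axis and repeating this for each candidate source would give curves of the kind shown on the slide.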

slide-49
SLIDE 49

Effect of Entity Types

  • Coarse-grained NER (Stanford NER)
  • Fine-grained Entity Types (Ling & Weld, 2012)

p(e | m) = ∑t p(e,t | m)
         = ∑t p(e | t,m) p(t | m)

p(t | m): Entity Type Probability

slide-50
SLIDE 50

FIGER

  • 112 entity types
  • multi-label multi-class

(Ling & Weld, 2012)

slide-51
SLIDE 51

Predicted Entity Types

[Bar chart: F1 (axis from 50 to 100) on ACE, MSNBC, AIDA-D, AIDA-T, TAC09, TAC10, TAC10T, TAC11, TAC12, and Overall, comparing +NER and +FIGER]

slide-52
SLIDE 52

Overall Performance

[Bar chart: F1 (axis from 50 to 100) on ACE, MSNBC, AIDA-D, AIDA-T, TAC09, TAC10, TAC10T, TAC11, TAC12 for Candidate Generation, +Entity Types, +Coref., +Coherence]

slide-53
SLIDE 53

Overall Performance

[Bar chart: average F1 (axis from 50 to 100) for Cand, +Entity Type, +Coref, +Coherence; bar labels as extracted: 79.0, 78.0, 76.7, 75.0]

slide-54
SLIDE 54

Overall Performance

  • AIDA (Hoffart et al. 2011)
  • Illinois Wikifier 2.0 (Cheng & Roth, 2013)

[Bar chart: average F1 (axis from 50 to 100) for Cand, +Entity Type, +Coref, +Coherence, AIDA, Wikifier; bar labels as extracted: 79.6, 72.2, 79.0, 78.0, 76.7, 75.0]

slide-55
SLIDE 55

Error Analysis

  • Misc: 10%
  • Specific Labels: 14%
  • Context: 33%
  • Coreference: 10%
  • Types: 14%
  • Metonymy: 19%

slide-56
SLIDE 56

Conclusion

  • a modular deterministic system achieves good performance

Vinculum

slide-57
SLIDE 57

Conclusion

  • a modular deterministic system achieves good performance
  • a comprehensive evaluation over nine data sets

Vinculum

slide-58
SLIDE 58

Conclusion

  • a modular deterministic system achieves good performance
  • a comprehensive evaluation over nine data sets
  • CrossWikis provides better cond. prob.

Vinculum

slide-59
SLIDE 59

Conclusion

  • a modular deterministic system achieves good performance
  • a comprehensive evaluation over nine data sets
  • CrossWikis provides better cond. prob.
  • Fine-grained entity types are very useful

Vinculum

slide-60
SLIDE 60

Conclusion

  • a modular deterministic system achieves good performance
  • a comprehensive evaluation over nine data sets
  • CrossWikis provides better cond. prob.
  • Fine-grained entity types are very useful
  • Coreference and Coherence also improve the performance

Vinculum

slide-61
SLIDE 61

Conclusion

  • a modular deterministic system achieves good performance
  • a comprehensive evaluation over nine data sets
  • CrossWikis provides better cond. prob.
  • Fine-grained entity types are very useful
  • Coreference and Coherence also improve the performance
  • http://github.com/xiaoling/vinculum

Vinculum

slide-62
SLIDE 62

Conclusion

  • a modular deterministic system achieves good performance
  • a comprehensive evaluation over nine data sets
  • CrossWikis provides better cond. prob.
  • Fine-grained entity types are very useful
  • Coreference and Coherence also improve the performance
  • http://github.com/xiaoling/vinculum

Thanks!
Questions?

Vinculum

slide-63
SLIDE 63

Oracle Entity Types

[Bar chart: F1 (axis from 50 to 100) on ACE, MSNBC, AIDA-D, AIDA-T, TAC09, TAC10, TAC10T, TAC11, TAC12, and Overall, comparing +NER (Gold) and +FIGER (Gold)]

slide-64
SLIDE 64

Implementation Details

Component                Implementation
Mention Extraction       Stanford NER
Candidate Generation     CrossWikis
Entity Type Prediction   Fine-grained Entity Types
Coreference              Stanford Coreference
Coherence                NGD + relational triples

slide-65
SLIDE 65

System Comparison

                       VINCULUM                 AIDA              WIKIFIER
Mention Extraction     NER                      NER               NER, noun phrases
Candidate Generation   CrossWikis               intra-Wikipedia   intra-Wikipedia
Entity Types           FIGER                    NER               NER
Coreference            representative mention   -                 re-rank the candidates
Coherence              NGD, relational          NGD               NGD, relational
Learning               deterministic            trained on AIDA   trained on Wiki

slide-66
SLIDE 66

Error Analysis: Metonymy

  • South Africa managed to avoid a fifth successive defeat in 1996 at the hands of the All Blacks …
  • Prediction: South Africa
  • Label: South Africa national rugby union team

slide-67
SLIDE 67

Error Analysis: Entity Types

  • Instead of Los Angeles International, for example, consider flying into Burbank or John Wayne Airport ...
  • Prediction: Burbank, California
  • Label: Bob Hope Airport

slide-68
SLIDE 68

Error Analysis: Coreference

  • It is about his mysterious father, Barack Hussein Obama, an imperious if alluring voice gone distant and then missing.
  • Prediction: Barack Obama
  • Label: Barack Obama Sr.

slide-69
SLIDE 69

Error Analysis: Context

  • Scott Walker removed himself from the race, but Green never really stirred the passions of former Walker supporters, nor did he garner outsized support “outstate”.
  • Prediction: Scott Walker (singer)
  • Label: Scott Walker (politician)