Multilingual Entity Linking: Comparing English and Spanish

SLIDE 1

Multilingual Entity Linking: Comparing English and Spanish †

Henry Rosales-Méndez, Barbara Poblete and Aidan Hogan

University of Chile {hrosales,bpoblete,ahogan}@dcc.uchile.cl

October 22nd, 2017

† LD4IE - Linked Data for Information Extraction Workshop.

SLIDE 2

Example

In November 1983 Michael Jackson and his brothers partnered with PepsiCo in a $5 million promotional deal that broke records for a celebrity endorsement. The first Pepsi Cola campaign, which ran in the United States from 1983 to 1984 and launched its iconic "New Generation" theme, included tour sponsorship, public relations events, and in-store displays.

SLIDE 4

Example - DBpedia Spotlight

In November 1983 Michael Jackson and his brothers partnered with PepsiCo in a $5 million promotional deal that broke records for a celebrity endorsement. The first Pepsi Cola campaign, which ran in the United States from 1983 to 1984 and launched its iconic "New Generation" theme, included tour sponsorship, public relations events, and in-store displays.

http://dbpedia.org/resource/Indium
http://dbpedia.org/resource/November
http://dbpedia.org/resource/Michael_Jackson
http://dbpedia.org/resource/The_Miami_Herald
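Annotations like the ones above can be requested programmatically. The sketch below is a minimal illustration, not part of the talk: the endpoint pattern, the `confidence` parameter and the `Resources` / `@URI` / `@surfaceForm` / `@offset` field names are assumptions about the public DBpedia Spotlight web service, and the helper names `build_request` and `parse_annotations` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint pattern of the public DBpedia Spotlight service (not from the slides).
BASE_URL = "https://api.dbpedia-spotlight.org/{lang}/annotate"


def build_request(text, lang="en", confidence=0.5):
    """Prepare (but do not send) an annotation request for the given language."""
    query = urlencode({"text": text, "confidence": confidence})
    return Request(
        BASE_URL.format(lang=lang) + "?" + query,
        headers={"Accept": "application/json"},
    )


def parse_annotations(payload):
    """Extract (surface form, offset, URI) triples from a JSON response body."""
    resources = json.loads(payload).get("Resources", [])
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"]) for r in resources]
```

Passing the prepared request to `urllib.request.urlopen` would send it; switching `lang` from `"en"` to `"es"` is how the same text would be routed to a Spanish-configured model under these assumptions.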

SLIDE 5

Phases in Entity Linking

  1. Entity Recognition.
  2. Entity Disambiguation.
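The two phases can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the candidate dictionary, its prior scores and the function names are hypothetical, and real systems use far richer recognition and disambiguation models.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: surface form -> candidate URIs with made-up priors.
CANDIDATES = {
    "michael jackson": {
        "http://dbpedia.org/resource/Michael_Jackson": 0.9,
        "http://dbpedia.org/resource/Michael_Jackson_(writer)": 0.1,
    },
    "pepsico": {"http://dbpedia.org/resource/PepsiCo": 1.0},
}


@dataclass(frozen=True)
class Mention:
    start: int
    end: int
    surface: str


def recognize(text):
    """Phase 1 (Entity Recognition): find spans that match a known surface form."""
    lowered = text.lower()
    mentions = []
    for surface in CANDIDATES:
        pos = lowered.find(surface)
        while pos != -1:
            end = pos + len(surface)
            mentions.append(Mention(pos, end, text[pos:end]))
            pos = lowered.find(surface, end)
    return sorted(mentions, key=lambda m: m.start)


def disambiguate(mention):
    """Phase 2 (Entity Disambiguation): pick the candidate with the highest prior."""
    candidates = CANDIDATES[mention.surface.lower()]
    return max(candidates, key=candidates.get)


def link(text):
    """Run both phases and return (surface form, URI) pairs in text order."""
    return [(m.surface, disambiguate(m)) for m in recognize(text)]
```

Keeping the phases as separate functions mirrors the slide: an error in recognition (a missed or spurious span) cannot be repaired by disambiguation.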

SLIDE 6

Name Variations in Entity Linking

Michael Jackson

  • Michael J. Jackson
  • King of Pop
  • Michael Joseph Jackson
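One simple way to handle such variations is an alias table that maps every normalized surface form to the same target. The sketch below assumes a hand-written table containing only the names on this slide; the `normalize` and `lookup` helpers are hypothetical.

```python
# Target entity for all variants on the slide.
MICHAEL_JACKSON = "http://dbpedia.org/resource/Michael_Jackson"


def normalize(surface):
    """Case-fold, drop periods and collapse whitespace so variants compare equal."""
    return " ".join(surface.replace(".", " ").casefold().split())


# Hypothetical alias table built from the name variations listed above.
ALIASES = {
    normalize(name): MICHAEL_JACKSON
    for name in (
        "Michael Jackson",
        "Michael J. Jackson",
        "King of Pop",
        "Michael Joseph Jackson",
    )
}


def lookup(surface):
    """Return the entity URI for a surface form, or None if it is unknown."""
    return ALIASES.get(normalize(surface))
```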

SLIDE 7

Overview of multilingual EL approaches

SLIDE 8

Research Questions

  • How does Entity Linking performance differ between English and Spanish?

SLIDE 9

Research Questions

  • Do multilingual systems configured for the language perform much better for Spanish than monolingual systems not configured for that language?

SLIDE 10

Research Questions

  • What might be the possible reasons for the observed results?

SLIDE 11

Dataset of SemEval 2015 Task 13

  • Composed of 4 documents, each available in English, Spanish and Italian.

                Doc1   Doc2   Doc3   Doc4
    Sentences     38     53     22     24

  • Contains 769 entity mentions with their corresponding links to DBpedia, WordNet and BabelNet.

SLIDE 12

Gold Standard

Key speakers at the social exclusion conference were (left to right) Cristina Louro, Employment, Industrial Relations and Social Affairs Directorate, European Commission Fernando Gomes, a high degree of Committee of the Regions Barbara Weiler, Member of the European Parliament José Maria Gil-Robles, Vice-President, European Parliament and John Carroll, Economic and Social Committee.

SLIDE 14

Gold Standard

Key speakers at the social exclusion conference were (left to right) Cristina Louro, Employment, Industrial Relations and Social Affairs Directorate, European Commission Fernando Gomes, a high degree of Committee of the Regions Barbara Weiler, Member of the European Parliament José Maria Gil-Robles, Vice-President, European Parliament and John Carroll, Economic and Social Committee.

http://babelnet.org/synset?word=bn:00001739n (affairs)
https://en.wikipedia.org/wiki/Affair
http://dbpedia.org/page/Affair
wn:affairs%1:04:00::
https://pt.wikipedia.org/wiki/Assembleia_Nacional
https://en.wikipedia.org/wiki/Majlis
https://en.wikipedia.org/wiki/Parlamentet
https://fr.wikipedia.org/wiki/Parlimentaire
https://en.wikipedia.org/wiki/Parliament
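Because the gold standard lists several equivalent identifiers per mention, a system's output has to be normalized before it can be compared. The sketch below is one possible approach, not the evaluation code used in the talk: it assumes that `dbpedia.org` identifiers correspond to English Wikipedia titles, and the helper names are hypothetical.

```python
from urllib.parse import unquote, urlparse


def entity_key(url):
    """Map a Wikipedia or DBpedia URL to a (language, title) key, else None."""
    parsed = urlparse(url)
    host, path = parsed.netloc.lower(), unquote(parsed.path)
    if host.endswith(".wikipedia.org") and path.startswith("/wiki/"):
        return (host.split(".")[0], path[len("/wiki/"):])
    if host == "dbpedia.org":
        # Assumption: the main DBpedia namespace mirrors English Wikipedia titles.
        for prefix in ("/resource/", "/page/"):
            if path.startswith(prefix):
                return ("en", path[len(prefix):])
    return None


def matches_gold(predicted_url, gold_urls):
    """True if the predicted link denotes the same page as any gold link."""
    gold_keys = {entity_key(u) for u in gold_urls} - {None}
    return entity_key(predicted_url) in gold_keys
```

Identifiers that are not Wikipedia or DBpedia URLs (the BabelNet and WordNet entries above) are simply ignored by this sketch.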

SLIDE 15

Overview of multilingual EL approaches

SLIDE 16

Selection criteria

  1. The system must support Spanish.
  2. Details of the system must be published.
  3. A public demo or API must be available for the system.
  4. The system must be a complete EL system including both ER and ED phases.
  5. The system must perform linking to Wikipedia or a related resource, such as DBpedia, YAGO or BabelNet.

SLIDE 17

Overview of multilingual EL approaches

SLIDE 18

Example annotations

Key speakers at the social exclusion conference were (left to right) Cristina Louro, Employment, Industrial Relations and Social Affairs Directorate, European Commission Fernando Gomes, a high degree of Committee of the Regions Barbara Weiler, Member of the European Parliament José Maria Gil-Robles, Vice-President, European Parliament and John Carroll, Economic and Social Committee.

SLIDE 20

Example annotations - DBpedia Spotlight

English

Key speakers at the social exclusion conference were (left to right) Cristina Louro, Employment, Industrial Relations and Social Affairs Directorate, European Commission Fernando Gomes, a high degree of Committee of the Regions Barbara Weiler, Member of the European Parliament José Maria Gil-Robles, Vice-President, European Parliament and John Carroll, Economic and Social Committee.

Spanish

Los principales oradores en la conferencia sobre exclusión social fueron (de izquierda a derecha) Cristina Louro, Dirección General de Empleo, Relaciones Laborales y Asuntos Sociales de la Comisión Europea Fernando Gomes, Comité de las Regiones Barbara Weiler, diputada al Parlamento Europeo José María Gil-Robles Gil-Delgado, Vicepresidente, Parlamento Europeo, John Carroll Comité Económico y Social.

SLIDE 21

Entity Linking approaches for English

  • AIDA
  • THD
  • TAGME

SLIDE 22

Example annotations - TAGME

Key speakers at the social exclusion conference were (left to right) Cristina Louro, Employment, Industrial Relations and Social Affairs Directorate, European Commission Fernando Gomes, a high degree of Committee of the Regions Barbara Weiler, Member of the European Parliament José Maria Gil-Robles, Vice-President, European Parliament and John Carroll, Economic and Social Committee.

SLIDE 24

Example annotations - TAGME

Spanish

Los principales oradores en la conferencia sobre exclusión social fueron (de izquierda a derecha) Cristina Louro, Dirección General de Empleo, Relaciones Laborales y Asuntos Sociales de la Comisión Europea Fernando Gomes, Comité de las Regiones Barbara Weiler, diputada al Parlamento Europeo José María Gil-Robles Gil-Delgado, Vicepresidente, Parlamento Europeo, John Carroll Comité Económico y Social.

English

Key speakers at the social exclusion conference were (left to right) Cristina Louro, Employment, Industrial Relations and Social Affairs Directorate, European Commission Fernando Gomes, a high degree of Committee of the Regions Barbara Weiler, Member of the European Parliament José Maria Gil-Robles, Vice-President, European Parliament and John Carroll, Economic and Social Committee.

SLIDE 25

Overall Evaluation of Entity Linking

F-measure (scale 0 to 1) per Entity Linking approach:

    Approach   English documents   Spanish documents
    Babelfy           .56                 .42
    DB-sp             .41                 .36
    WIKIME            .05                 .03
    TAGME             .46                 .17
    THD               .12                 .06
    AIDA              .04                 .01

SLIDE 26

Overall Evaluation of Entity Linking

F-measure (scale 0 to 1) per Entity Linking approach:

    Approach   Supports Spanish   English documents   Spanish documents
    Babelfy          yes                 .56                 .42
    DB-sp            yes                 .41                 .36
    WIKIME           yes                 .05                 .03
    TAGME            no                  .46                 .17
    THD              no                  .12                 .06
    AIDA             no                  .04                 .01
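For reference, an F-measure of this kind can be computed from a system's annotations and the gold standard as sketched below. The slides do not state the matching rule, so this example assumes exact matching on (start, end, entity) triples; the function name and the toy data are hypothetical.

```python
def f_measure(predicted, gold):
    """Return (precision, recall, F1) over sets of (start, end, entity) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: four gold annotations, three predictions, two of them correct.
gold = {(0, 5, "A"), (10, 15, "B"), (20, 25, "C"), (30, 35, "D")}
predicted = {(0, 5, "A"), (10, 15, "B"), (20, 25, "X")}
precision, recall, f1 = f_measure(predicted, gold)
```

With this toy data, precision is 2/3 and recall is 2/4, so F1 is 4/7. A prediction with the right span but the wrong entity counts as an error under the exact-match assumption.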

SLIDE 28

Results and Discussion

  • All approaches obtain their best results for English.
  • On Spanish, monolingual approaches perform much worse than multilingual approaches that support Spanish.
  • The best score is obtained by Babelfy.

SLIDE 29

Results and Discussion

We propose that this result may be due to one (or more) of the following issues faced by multilingual systems:

  • The knowledge base contains different information for the two languages (DBpedia-Spotlight, WIKIME).
  • Models/techniques change according to the target language (Babelfy).
  • Variations in the languages themselves.

SLIDE 30

Results and Discussion

Issues faced by monolingual systems:

  • TAGME is based on the analysis of anchor text of the English Wikipedia pages.
  • THD selects candidates using the Search API of English Wikipedia.
  • AIDA is based on an English part-of-speech tagger.

SLIDE 31

Conclusion

  • We surveyed the main multilingual Entity Linking approaches.
  • We performed experiments to compare selected approaches in Spanish and English.
  • We proposed some potential explanations for the observed results.

SLIDE 32

Future Work

  • Incorporate more languages.
  • Incorporate more quality measures: Accuracy, BCubed, etc.
  • Incorporate approaches that have no demo or API available, but do have source code.
  • Run experiments to validate the proposed explanations for the differences.

SLIDE 34

Evaluation of the Entity Recognition Phase
