Lightweight Multilingual Entity Extraction and Linking
Speaker: Shih-Han Lo
Advisor: Professor Jia-Ling Koh
Authors: Aasish Pappu, Roi Blanco, Yashar Mehdad, Amanda Stent, Kapil Thadani
Date: 2017/09/19
Source: WSDM ’17


SLIDE 1

Lightweight Multilingual Entity Extraction and Linking

Speaker: Shih-Han Lo
Advisor: Professor Jia-Ling Koh
Authors: Aasish Pappu, Roi Blanco, Yashar Mehdad, Amanda Stent, Kapil Thadani
Date: 2017/09/19
Source: WSDM ’17

SLIDE 2

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 3

Introduction

• Key tasks for text analytic systems:
  ◦ Named Entity Recognition (NER)
  ◦ Named Entity Linking (NEL)
• Some systems perform NER and NEL jointly.

SLIDE 4

Introduction

• Most approaches involve (some of) the following steps:
  ◦ Mention detection
  ◦ Mention normalization
  ◦ Candidate entity retrieval for each mention
  ◦ Entity disambiguation for mentions with multiple candidate entities
  ◦ Mention clustering for mentions that do not link to any entity

Motivation
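The five steps listed on this slide can be sketched end to end as a toy pipeline. This is only an illustration of how the stages fit together, not the system described in the talk: the alias table, the capitalization-based mention detector, and the prior-based disambiguation are all hypothetical stand-ins.

```python
import re
from collections import defaultdict

# Hypothetical toy knowledge base: normalized alias -> [(entity_id, prior)].
ALIASES = {
    "paris": [("Paris_(France)", 0.9), ("Paris_(Texas)", 0.1)],
    "yahoo": [("Yahoo!", 1.0)],
}

def detect_mentions(text):
    """Mention detection: runs of capitalized tokens (a crude stand-in for NER)."""
    return re.findall(r"[A-Z][\w!]*(?:\s+[A-Z][\w!]*)*", text)

def normalize(mention):
    """Mention normalization: drop punctuation and lowercase."""
    return re.sub(r"[^\w\s]", "", mention).lower().strip()

def retrieve_candidates(norm):
    """Candidate entity retrieval: plain dictionary lookup."""
    return ALIASES.get(norm, [])

def disambiguate(candidates):
    """Entity disambiguation: pick the candidate with the highest prior."""
    return max(candidates, key=lambda c: c[1])[0]

def link(text):
    """Run all steps; unlinkable mentions are clustered by normalized surface form."""
    links, nil_clusters = {}, defaultdict(list)
    for mention in detect_mentions(text):
        norm = normalize(mention)
        candidates = retrieve_candidates(norm)
        if candidates:
            links[mention] = disambiguate(candidates)
        else:
            nil_clusters[norm].append(mention)
    return links, dict(nil_clusters)
```

Calling `link("Yahoo! opened an office in Paris near Zorblax.")` links the two known mentions and places "Zorblax" in its own unlinked cluster.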

SLIDE 5

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 6

Mention Detection

• Typically consists of running an NER system over input text.
• We use simple CRFs and only a few lexical, syntactic and semantic features.
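The slide does not list the features, so the following is only a sketch of what a small lexical feature set for a CRF tagger might look like. Every feature name here is an assumption chosen for illustration; syntactic and semantic features (for example part-of-speech tags or gazetteer matches) would need external resources and are left out.

```python
def token_features(tokens, i):
    """Feature dict for token i; all features are illustrative guesses."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),        # lowercased surface form
        "is_title": tok.istitle(),   # starts with a capital letter
        "is_upper": tok.isupper(),   # all caps, e.g. acronyms
        "is_digit": tok.isdigit(),
        "suffix3": tok[-3:],         # last three characters
    }
    # Context features from the neighboring tokens.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True          # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True          # end of sentence
    return feats

def sentence_features(tokens):
    """One feature dict per token, the per-sentence input shape many CRF toolkits expect."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence of such dicts, paired with BIO-style labels, is the usual training input for a linear-chain CRF.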

SLIDE 7

System Description

SLIDE 8

Candidate Entity Retrieval

• Entity Embeddings
  ◦ We aim to simultaneously learn D-dimensional representations of entities (Ent) and words (W) in a common vector space.
  ◦ Training our embedding model: continuous skip-grams with 300 dimensions and a window size of 10.
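To make the window size concrete, here is a minimal sketch of how skip-gram training pairs are generated. It assumes, purely for illustration, that entity identifiers are placed in the same token stream as ordinary words (the `ENT:` prefix is hypothetical), which is one simple way to end up with both in a common vector space. The embedding training itself is not shown.

```python
def skipgram_pairs(tokens, window=10):
    """Return (target, context) pairs within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs

# Hypothetical mixed stream of words and one entity token.
stream = ["the", "ENT:Yahoo!", "search", "engine"]
pairs = skipgram_pairs(stream, window=1)
```

With `window=10`, as on the slide, each token is paired with up to 20 neighbors; the model then learns 300-dimensional vectors that predict those contexts.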

SLIDE 9

Candidate Entity Retrieval

• Entity Embeddings

SLIDE 10

Candidate Entity Retrieval

• Fast Entity Linking
  ◦ Fast Entity Linker (FEL) is an unsupervised approach.
  ◦ FEL imposes contextual dependencies by calculating the cosine distance between two entities.
  ◦ Candidates: from the substrings of the input string
  ◦ Minimal perfect hash function
  ◦ Elias-Fano integer coding
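Two ideas from this slide can be sketched briefly: drawing candidates from substrings of the input, and comparing two entity vectors by cosine distance. This is not FEL's implementation. The alias table is a hypothetical plain dictionary standing in for the compressed structures named above (minimal perfect hashing, Elias-Fano coding), substrings are taken at token level, and the `max_len` cutoff is an assumption.

```python
import math

def token_ngrams(text, max_len=3):
    """All contiguous token spans of the input, up to max_len tokens long."""
    tokens = text.split()
    return [
        " ".join(tokens[i:j])
        for i in range(len(tokens))
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1)
    ]

def candidates(text, alias_table, max_len=3):
    """Map each span found in the (hypothetical) alias table to its candidate entities."""
    return {
        span: alias_table[span.lower()]
        for span in token_ngrams(text, max_len)
        if span.lower() in alias_table
    }

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cosine_distance(u, v):
    """Cosine distance, defined here as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(u, v)
```

For the input "I love New York", both "New York" and "York" are found as spans, so overlapping candidates must be resolved later, during disambiguation.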

SLIDE 11

Entity Disambiguation

• Task of figuring out to which candidate entity a mention refers.
• The task is complex because a mention may refer to different entities depending on local context.

SLIDE 12

Entity Disambiguation

• Forward-Backward Algorithm (FwBw)
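The slide names only the algorithm, so the following is a generic sketch of forward-backward over a chain of mentions, not the authors' formulation. It assumes each mention has a list of candidates with positive local scores and that a hypothetical `pairwise` function scores the compatibility of candidates at adjacent mentions. The result is a posterior distribution over candidates for each mention.

```python
def forward_backward(local, pairwise):
    """
    local[t][i]: positive score of candidate i for mention t.
    pairwise(t, i, j): positive compatibility of candidate i at mention t
                       with candidate j at mention t + 1.
    Returns one normalized posterior list per mention.
    """
    T = len(local)
    # Forward pass: alpha[t][j] sums the scores of all paths ending in candidate j.
    alpha = [list(local[0])]
    for t in range(1, T):
        alpha.append([
            local[t][j] * sum(alpha[t - 1][i] * pairwise(t - 1, i, j)
                              for i in range(len(local[t - 1])))
            for j in range(len(local[t]))
        ])
    # Backward pass: beta[t][i] sums the scores of all continuations after candidate i.
    beta = [[1.0] * len(row) for row in local]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(pairwise(t, i, j) * local[t + 1][j] * beta[t + 1][j]
                for j in range(len(local[t + 1])))
            for i in range(len(local[t]))
        ]
    # Combine both passes and normalize per mention.
    marginals = []
    for t in range(T):
        unnorm = [a * b for a, b in zip(alpha[t], beta[t])]
        z = sum(unnorm)
        marginals.append([u / z for u in unnorm])
    return marginals
```

In the example below, the first mention is ambiguous on its own (0.5 versus 0.5), but a confident neighbor pulls its posterior toward the compatible candidate:

```python
compat = [[1.0, 0.1], [0.1, 1.0]]  # hypothetical compatibility matrix
posteriors = forward_backward([[0.5, 0.5], [0.9, 0.1]],
                              lambda t, i, j: compat[i][j])
```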

SLIDE 13

Entity Disambiguation

• Exemplar (Clustering)

SLIDE 14

Entity Disambiguation

• Label Propagation (LabelProp)
  ◦ Modified adsorption (MAD)
  ◦ For […], we inject seed labels L on a few nodes.
  ◦ For nodes V’, we assign a label distribution: […]
  ◦ Along with […], MAD takes three hyper-parameters as input.
• We pick the highest ranked label for each node in V as the final candidate.
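The idea of injecting seed labels on a few nodes and reading off the top label everywhere else can be illustrated with plain iterative label propagation. This is a deliberately simplified stand-in, not modified adsorption: it has no hyper-parameters, keeps seed nodes clamped, and simply averages neighbor distributions. The graph, the labels, and the iteration count are hypothetical.

```python
def propagate_labels(edges, seeds, labels, iterations=50):
    """
    edges: {node: {neighbor: weight}}, every neighbor must also be a key.
    seeds: {node: label} for the few nodes with injected labels.
    Returns {node: {label: score}} after clamped iterative averaging.
    """
    def one_hot(label):
        return {l: 1.0 if l == label else 0.0 for l in labels}

    uniform = {l: 1.0 / len(labels) for l in labels}
    dist = {n: one_hot(seeds[n]) if n in seeds else dict(uniform) for n in edges}
    for _ in range(iterations):
        new = {}
        for node, nbrs in edges.items():
            if node in seeds:
                new[node] = dist[node]  # seed labels stay clamped
                continue
            total = sum(nbrs.values())
            if total:
                # Weighted average of the neighbors' current distributions.
                new[node] = {
                    l: sum(w * dist[m][l] for m, w in nbrs.items()) / total
                    for l in labels
                }
            else:
                new[node] = dist[node]  # isolated node keeps its distribution
        dist = new
    return dist

def top_label(distribution):
    """Pick the highest-ranked label for a node."""
    return max(distribution, key=distribution.get)
```

On a four-node chain a-b-c-d with seed X on a and seed Y on d, node b ends up favoring X and node c favoring Y, since each is closer to one seed.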

SLIDE 15

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 16

Experiment

• Datasets:
  ◦ Cross-lingual TAC KBP 2013
  ◦ Mono-lingual AIDA-CoNLL 2003

SLIDE 17

Experiment

• Setup
  ◦ N-best: N = 10
  ◦ FwBw: λ = 0.5
  ◦ Exemplar: max_iterations = 300, λ = 0.5
  ◦ LabelProp: μ1 = 1, μ2 = 1e-2, μ3 = 1e-2

SLIDE 18

Experiment

• TAC KBP Evaluation Results

SLIDE 19

Experiment

• Analysis

SLIDE 20

Experiment

• Analysis

SLIDE 21

Experiment

• AIDA Evaluation

SLIDE 22

Experiment

• Runtime Performance

SLIDE 23

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 24

Conclusion

• Our NER implementation is outperformed only by NER systems that use much more complex feature engineering and/or modeling methods.
• In future work, we plan to improve the performance of our system for other languages by expanding the pool of entities for which we have information.
• Candidate retrieval in Spanish is relatively poor compared to English and Chinese.