

SLIDE 1

Avi Sil

Joint work with: Georgiana Dinu, Gourab Kundu and Radu Florian

IBM Research AI

SLIDE 2

¡ Architecture for the IBM Entity Discovery & Linking (EDL) System

§ Model & Results

▪ Mention Detection
▪ In-document Coreference Resolution
▪ Entity Linking & Clustering

SLIDE 3

¡ Architecture for the IBM Entity Discovery & Linking (EDL) System

§ Model & Results

▪ Mention Detection
▪ In-document Coreference Resolution
▪ Entity Linking & Clustering

Neural & Traditional Models

SLIDE 4

¡ Standard IOB sequence classifier, trained on the task
¡ Two main classifiers: CRF-based and neural network-based
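Whichever classifier produces the tags, the IOB sequence still has to be decoded into mention spans. A minimal sketch (the function name and boundary conventions here are illustrative, not the system's actual code):

```python
def iob_to_spans(tags):
    """Decode an IOB tag sequence into (start, end, type) mention spans
    (end exclusive). A stray I- tag starts a new span, as in most decoders."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:          # close the previous span (B- right after a span)
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("I-"):       # a type switch also starts a new span
                start, etype = i, tag[2:]
    if start is not None:                  # span running to the end of the sentence
        spans.append((start, len(tags), etype))
    return spans

print(iob_to_spans(["B-PER", "I-PER", "O", "B-ORG"]))  # [(0, 2, 'PER'), (3, 4, 'ORG')]
```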


MD Coref EL Experiments Conclusion

SLIDE 5

• Model probability: P(y_t | X, y_{t−1})
• Additional features: gazetteers, character-level LSTMs
• Recurrence: the previous two labels are embedded and added as input

SLIDE 6

¡ Both systems (CRF, NN) have high precision
¡ We combine them as follows:
§ Start with the “best” system
§ For each subsequent system:
▪ Add any mentions that do not overlap with the current output
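That precision-oriented merge can be sketched as follows (a minimal sketch, with mentions reduced to (start, end) token spans and systems ordered best first):

```python
def combine_mentions(systems):
    """Precision-oriented combination: keep the best system's mentions,
    then add mentions from each subsequent system only if they do not
    overlap anything already kept. Spans are (start, end), end exclusive."""
    combined = []
    for mentions in systems:                       # ordered best -> worst
        for start, end in mentions:
            if not any(start < e and s < end for s, e in combined):
                combined.append((start, end))
    return sorted(combined)

# best system first; (1, 3) overlaps (0, 2) and is dropped, (8, 9) is kept
print(combine_mentions([[(0, 2), (5, 7)], [(1, 3), (8, 9)]]))  # [(0, 2), (5, 7), (8, 9)]
```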

Language   CRF (dev)   NN (dev)   NN+CRF (test)
English    0.803       0.843      0.806
Spanish    0.785       0.809      0.785
Chinese    0.811       0.843      0.699

Note: CharCNNs scored 0.75; the Lample model did not produce better results on our dev data.

SLIDE 7

¡ Train monolingual embeddings in English and the foreign language
¡ Use a small dictionary to train a map from the foreign language into the English embedding space (Mikolov, 2013)
¡ Train an English mention detection model
¡ Decode new languages using the English model and the mapped embeddings
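The dictionary-based mapping step amounts to fitting a linear transform between the two embedding spaces. A toy least-squares sketch in numpy (the talk does not give the actual training details, so the dimensions and the solver here are assumptions):

```python
import numpy as np

def fit_embedding_map(X_foreign, Y_english):
    """Fit W so that W @ x_foreign ~ y_english, from rows of dictionary
    word pairs (shape: n_pairs x dim). Ordinary least squares."""
    W, *_ = np.linalg.lstsq(X_foreign, Y_english, rcond=None)
    return W.T

# sanity check: an exact linear map is recovered from enough word pairs
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                      # "foreign" embeddings
R = np.linalg.qr(rng.normal(size=(4, 4)))[0]      # ground-truth map (orthogonal)
Y = X @ R.T                                       # mapped "English" embeddings
W = fit_embedding_map(X, Y)
assert np.allclose(W, R)
```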


SLIDE 8

¡ Weak classifiers:
§ NN models trained on silver data (Pan et al., 2016)
§ Cross-lingual transfer of models trained on: 1. TAC data and 2. in-house mention detection data
¡ Train an NN classifier to combine all the weak classifier outputs
¡ Use Spanish as a test case, then apply to all other languages

          Silver-trained   Best transfer   Combination   Supervised
Spanish   0.335            0.609           0.704         0.809

(Silver data: Pan et al., ACL 2016)

SLIDE 9

¡ Architecture for the IBM Entity Discovery & Linking (EDL) System

§ Model & Results

▪ Mention Detection
▪ In-document Coreference Resolution
▪ Entity Linking & Clustering

SLIDE 10

¡ All mentions in a document are clustered into entities using an in-document coreference system
¡ The canonical mention of each entity is linked using the EL system
¡ The canonical mention’s link is assigned to all mentions in the entity
¡ We use two different coreference systems in this evaluation:
§ A MaxEnt model
§ A neural network-based model
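The linking-through-coreference scheme can be sketched as below; picking the first mention of each cluster as canonical is an illustrative simplification (the slides do not say how the canonical mention is chosen):

```python
def link_entities(entity_clusters, link_fn):
    """Link only the canonical mention of each in-document entity, then
    assign that link to every mention in the entity."""
    links = {}
    for cluster in entity_clusters:
        canonical = cluster[0]              # simplification: first mention is canonical
        kb_id = link_fn(canonical)          # run the EL system once per entity
        for mention in cluster:
            links[mention] = kb_id          # propagate to all coreferent mentions
    return links

clusters = [["Tom Cruise", "Cruise", "he"], ["Arsenal F.C."]]
el = {"Tom Cruise": "en/Tom_Cruise", "Arsenal F.C.": "en/Arsenal_F.C."}
print(link_entities(clusters, el.get)["he"])   # en/Tom_Cruise
```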


SLIDE 11

¡ This model is used for languages without any gold-standard training data
§ e.g. low-resource languages like Nepali
¡ The model is trained on English coreference data using multilingual embeddings
¡ Subsequently, the model is tested on data from a new language without any retraining

SLIDE 12

[Architecture diagram: mentions m1, m2 (entity E1) and m3, m4 (entity E2); general features ϕ(m1,m3), ϕ(m2,m3), ϕ(m1,m4), ϕ(m2,m4) and embedding features v(m1,m3), v(m2,m3), v(m1,m4), v(m2,m4) feed a hidden layer; a weighted average layer combines the mention-pair scores and a softmax layer outputs P(y=1|E1,E2) and P(y=0|E1,E2).]

SLIDE 13

¡ The model is trained with multilingual embeddings on:
§ the TAC 15 training portion of the English coreference data
§ the TAC 16 test portion of the English coreference data
¡ The model is tested on:
§ the TAC 15 test portion of 3 languages

Language          MUC    B3     CEAF
TAC 15-test-Eng   0.90   0.89   0.84
TAC 15-test-Spa   0.91   0.92   0.88
TAC 15-test-Cmn   0.97   0.96   0.91

SLIDE 14

¡ Architecture for the IBM Entity Discovery & Linking (EDL) System

§ Model & Results

▪ Mention Detection
▪ In-document Coreference Resolution
▪ Entity Linking & Clustering

SLIDE 15

¡ Language-independent EL system: LIEL (Sil & Florian, 2016)
§ Collective disambiguation model based on Maximum Entropy
¡ State-of-the-art performance on the TAC evaluation & other benchmarks

Example (Chinese): [查理周刊]记者[洛朗·莱热]捍卫杂志的时候,他说的漫画并不是要挑起愤怒或暴力行为。
(“When [Charlie Hebdo] journalist [Laurent Léger] defended the magazine, he said the cartoons were not meant to provoke anger or acts of violence.”)

Linked against the Chinese Wikipedia: 查理周刊 / NIL009
Linked against the English Wikipedia: Charlie_Hebdo / NIL009

SLIDE 16

¡ New system: Neural Cross-lingual Entity Linking
§ Zero-shot model
§ Avi Sil, Gourab Kundu, Radu Florian, Wael Hamza
§ AAAI 2018

SLIDE 17

¡ Given: a query mention m and a document d in a foreign language, plus the English Wikipedia KB_en
¡ Step 1 (Fast Search): extract the most likely list of candidate links l_1, …, l_k for m in d
¡ Step 2 (Ranking): estimate the best link, where D is the consistency measure for matching contexts between:
§ the pair (m, d) and a Wikipedia link l_k

SLIDE 18

¡ Given: a query mention m and a document d in the target language (tr), plus the English Wikipedia KB_en
¡ Step 1 (Fast Search): extract the most likely list of candidate links l_1, …, l_k for m in d
¡ Step 2 (Ranking): estimate the best link, where D is the consistency measure for matching contexts between:
§ the pair (m, d) and a Wikipedia link l_k

SLIDE 19

Tayvan, ABD ve İngiltere'de hukuk okuması, Tsai'ye bir LL.B. kazandırdı …
(“Studying law in Taiwan, the US, and the UK earned Tsai an LL.B. …”)

  • Challenges:
    • Linking to the English Wikipedia
    • Comparing non-English words to English Wikipedia titles

Example from Tsai & Roth, 2016

SLIDE 20

¡ Problem Formulation
§ Fast Search
¡ Word Embeddings
¡ Modeling Contexts
¡ Cross-Lingual Entity Linking
§ Model
§ Feature Abstraction layer
¡ Experiments

SLIDE 21

“On June 29, 2012, Holmes had filed for divorce from Cruise in New York after five years of marriage.” (Tom Cruise)
“Ethan Hunt (Cruise), while vacationing, is alerted…” (Tom Cruise)
“Cruise joined in and made his debut for Arsenal F.C. Reserves…” (Thomas Cruise (footballer))

Cruise:
  • en/Tom_Cruise (probability: 0.66)
  • en/Thomas_Cruise_(footballer) (probability: 0.33)
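Candidate lists with prior probabilities like the ones above are commonly estimated from Wikipedia anchor-link statistics. A sketch (the slides do not spell out the exact estimator used):

```python
from collections import Counter, defaultdict

def build_link_priors(anchor_pairs):
    """Estimate P(link | mention text) from (anchor_text, target_page)
    pairs harvested from Wikipedia hyperlinks."""
    counts = defaultdict(Counter)
    for anchor, target in anchor_pairs:
        counts[anchor][target] += 1
    return {anchor: {t: c / sum(targets.values()) for t, c in targets.items()}
            for anchor, targets in counts.items()}

# toy anchor statistics: "Cruise" links to the actor twice, the footballer once
pairs = [("Cruise", "en/Tom_Cruise")] * 2 + [("Cruise", "en/Thomas_Cruise_(footballer)")]
priors = build_link_priors(pairs)
print(priors["Cruise"]["en/Tom_Cruise"])   # ~0.67
```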


SLIDE 22

“..a los Premios Óscar y en cuatro a los Premios Globo de Oro, su significativa presencia..”
(“..to the Academy Awards and four times to the Golden Globe Awards, its significant presence..”)

Premios Óscar → en/Academy_Awards (probability: 1.0)
Premios Globo de Oro → en/Golden_Globe_Awards (probability: 1.0)
(via interlanguage links)

SLIDE 23

¡ Problem Formulation
§ Fast Search
¡ Word Embeddings
¡ Modeling Contexts
¡ Cross-Lingual Entity Linking
§ Model
§ Feature Abstraction layer
¡ Experiments

SLIDE 24

¡ Monolingual (English)
§ CBOW word2vec
¡ Multilingual
§ Canonical Correlation Analysis (CCA) (Faruqui & Dyer, 2014; Tsai & Roth, 2016):
▪ Alignment using a Wikipedia title mapping obtained from interlanguage links
§ Multi-CCA (Ammar et al., 2016):
▪ Project pre-trained monolingual embeddings in each language (except English) into the vector space of the pre-trained English word embeddings
§ Weighted Least Squares (LS) (Mikolov et al., 2013)

SLIDE 25

¡ Problem Formulation
§ Fast Search
¡ Word Embeddings
¡ Modeling Contexts
¡ Cross-Lingual Entity Linking
§ Model
§ Feature Abstraction layer
¡ Experiments

SLIDE 26

¡ Get all sentences from the entity’s coreference chain
¡ Concatenate them together
§ This gives a variable-length representation

“ [Broad] catapulted [England] to a 74-run win over [Australia]… [Broad] sent captain [Michael Clarke]'s off stump cart-wheeling before [Steve Smith].. [Broad] and [Bresnan] found their stride in the evening session..”
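A sketch of that context-construction step (sentence splitting and coreference resolution are assumed to have been done upstream):

```python
def entity_context(sentences, mention_sentence_ids):
    """Concatenate, in document order, every sentence that contains a
    mention from the entity's coreference chain."""
    return " ".join(sentences[i] for i in sorted(set(mention_sentence_ids)))

sents = ["Broad catapulted England to a 74-run win.",
         "Rain delayed play.",
         "Broad and Bresnan found their stride."]
# the entity "Broad" is mentioned in sentences 0 and 2
print(entity_context(sents, [0, 2, 2]))
# Broad catapulted England to a 74-run win. Broad and Bresnan found their stride.
```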


SLIDE 27

[Diagram: context from the source document → convolution layer → mean pool → tanh]
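The layer pictured here (convolution over the context, mean pooling, then tanh) can be sketched in numpy; every dimension below is made up for illustration:

```python
import numpy as np

def cnn_context_encoder(word_embs, W_conv, width=3):
    """word_embs: (seq_len, d_in); W_conv: (d_out, width * d_in).
    Convolve over word windows, mean-pool over positions, apply tanh."""
    seq_len, _ = word_embs.shape
    windows = np.stack([word_embs[i:i + width].reshape(-1)
                        for i in range(seq_len - width + 1)])
    return np.tanh((windows @ W_conv.T).mean(axis=0))   # fixed-size context vector

rng = np.random.default_rng(1)
vec = cnn_context_encoder(rng.normal(size=(10, 8)), rng.normal(size=(16, 24)))
assert vec.shape == (16,)   # variable-length context -> fixed-size vector
```

The point of the mean pool is that contexts of any length come out as the same fixed-size vector, which the later similarity features require.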


SLIDE 28

¡ Get all possible links of the mention from the KB


“ [Broad] catapulted [England] to a 74-run win over [Australia]…


SLIDE 29

¡ Extract the first paragraph of the current link’s page
¡ Run CNNs on it


SLIDE 30

¡ Objective: model the whole Wikipedia page for an entity
¡ We compute the embedding f_q of the page q:

SLIDE 31

[Diagram: fine-grained context modeling. An LSTM runs over each left context (Left Context 1 … n, words w1 w2 … up to the mention m) and each right context (Right Context 1 … n, from m to w21 w22 …); the per-context hidden states are mean-pooled into an overall left context vector and an overall right context vector, which slices of a neural tensor network (NTN) combine into the final context vector.]

SLIDE 32

¡ Problem Formulation
§ Fast Search
¡ Word Embeddings
¡ Modeling Contexts
¡ Cross-Lingual Entity Linking
§ Model
§ Feature Abstraction layer
¡ Experiments

SLIDE 33

[Diagram: feature abstraction layer. LIEL features, cosine features, LDC vectors, an RBF layer, and an MPCM layer are computed over the query context and the Wiki context, and feed a classifier that separates the correct link (C=1) from incorrect links (C=0).]

SLIDE 34

¡ Similarity features from comparing context representations:
§ “Sentence context - Wiki link” similarity
§ “Sentence context - Wiki first paragraph” similarity
§ “Fine-grained context - Wiki link” similarity
§ Within-language features (LIEL; Sil & Florian, ACL 2016)
¡ Semantic similarities and dissimilarities:
§ Lexical Decomposition and Composition (LDC) (Wang et al., 2016a)
§ Multi-Perspective Context Matching (MPCM) (Wang et al., 2016b)

SLIDE 35

[Diagram: two parallel context encoders (convolution layer → mean pool → tanh), one over the context from the query document and one over the context from the target Wiki page, joined by a similarity (sim) node.]

SLIDE 36

¡ Cosine similarity-based features:
§ sim(Conv(sentence), Wiki page embedding)
§ sim(Conv(sentence), Conv(Wiki first paragraph))
¡ These values are mapped to a 100-D vector using an RBF node
§ A smooth binning process
§ More parameters than a single cosine value
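The RBF node can be sketched as radial-basis “bins” over the cosine range. The 100-D size comes from the slide, while the center placement and the gamma width below are assumptions:

```python
import numpy as np

def rbf_expand(sim, n_bins=100, gamma=50.0):
    """Map a scalar similarity in [-1, 1] to an n_bins-dim feature vector:
    one Gaussian bump per evenly spaced center (smooth binning)."""
    centers = np.linspace(-1.0, 1.0, n_bins)
    return np.exp(-gamma * (sim - centers) ** 2)

feats = rbf_expand(0.8)
assert feats.shape == (100,)                  # 100-D vector, as on the slide
# the bin whose center is nearest 0.8 responds most strongly
assert feats.argmax() == np.abs(np.linspace(-1, 1, 100) - 0.8).argmin()
```

This gives the downstream classifier many weights over the similarity range instead of a single weight on the raw cosine value.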


SLIDE 37

¡ Problem Formulation
§ Fast Search
¡ Word Embeddings
¡ Modeling Contexts
¡ Cross-Lingual Entity Linking
§ Model
§ Feature Abstraction layer
¡ Experiments

SLIDE 38

¡ Datasets:
§ English:
▪ CoNLL 2003
▪ TAC 2010
§ Cross-lingual (Spanish & Chinese):
▪ TAC 2015

SLIDE 39

[Results chart]

SLIDE 40

[Results chart]

SLIDE 41

[Results chart]

SLIDE 42

  • Second in Mention Detection (English)
  • Top score in the end-to-end metric (English)
  • Third in Spanish Mention Detection and EL

Trained once on English!

SLIDE 43

  • Models:
    • Mention detection: system combination
    • Coref & EL: purely neural network-based
  • Second position overall on the end-to-end metric
  • Transfer of knowledge from English helps

Lang           NERC    NERLC
Kikuyu         0.803   0.797
Swahili        0.664   0.510
Nepali         0.319   0.312
All 10 langs   0.488   0.401

SLIDE 44

¡ The model performs zero-shot learning for cross-lingual EL
§ It can be applied to any language for which we have multilingual embeddings
§ It makes effective use of deep NNs:
▪ mixing CNNs and LSTMs to produce contextual representations
▪ capturing similarities and dissimilarities for the task (AAAI 2018 paper)
¡ Obtained the top score in the English EL task
§ Competitive performance in the other languages, e.g. Spanish

SLIDE 45

¡ Questions?
