
SLIDE 1

Entity Linking via Low-rank Subspaces

Akhil Arora, Alberto García-Durán, and Bob West

SMLD, November 13, 2019

SLIDE 6

What is Entity Linking?

“Michael Jordan is one of the leading figures in machine learning, and in 2016 Science reported him as the world’s most influential computer scientist.”

Michael Jordan → en.wikipedia.org/wiki/Michael_I._Jordan
Science → en.wikipedia.org/wiki/Science_(journal)


SLIDE 14

How to perform Entity Linking?

  • Use dictionaries / alias tables / probability maps
    – High-quality candidate generation
    – Prior information: a strong feature
  • Other features:
    – Local/global context
    – Coherence among disambiguated entities
  • Sophisticated supervised models
    – XGBoost
    – Deep neural networks

  Mention “Michael Jordan”:
    Candidate Entity              Prior P(e|m)
    Michael_Jordan                0.997521
    Michael_I._Jordan             0.000826
    Michael_Jordan_statue         0.000826
    Michael_Jordan_(footballer)   0.000826

  Mention “Science”:
    Candidate Entity              Prior P(e|m)
    Science                       0.737955
    Science_(journal)             0.207151
    Science_Channel               0.005036

Sky is the limit ☺!

[NAACL’18] SOTA P@1 = 95.9
“NLP Progress: Entity Linking”, http://nlpprogress.com/english/entity_linking.html
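The dictionary-based approach above reduces to a lookup that picks the candidate with the highest prior P(e|m). A minimal sketch, using the prior values from the tables on this slide; the dictionary structure and function name are illustrative assumptions, not the authors' code:

```python
# Prior values copied from the slide; the dict layout is an assumption.
PRIORS = {
    "Michael Jordan": {
        "Michael_Jordan": 0.997521,
        "Michael_I._Jordan": 0.000826,
        "Michael_Jordan_statue": 0.000826,
        "Michael_Jordan_(footballer)": 0.000826,
    },
    "Science": {
        "Science": 0.737955,
        "Science_(journal)": 0.207151,
        "Science_Channel": 0.005036,
    },
}

def link_by_prior(mention: str) -> str:
    """Return the candidate entity with the highest prior for a mention."""
    candidates = PRIORS.get(mention, {})
    return max(candidates, key=candidates.get) if candidates else ""

print(link_by_prior("Michael Jordan"))  # -> Michael_Jordan
```

Note that for the running example this links “Michael Jordan” to the basketball player rather than the gold entity Michael_I._Jordan, which is exactly why the prior alone is a strong but insufficient feature.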

  • Some DL methods take more than 1 day
SLIDE 21

“Unaddressed” Research Questions

  • Are dictionaries naturally available across use cases?
    – Lack of annotated data
      • Specialized domains: medical, scientific, legal, and enterprise-specific corpora
    – Noisy and rapidly evolving annotated data
      • Web queries
  • Can existing SOTA methods operate at Web scale?
    – We can only hope!
      • The NAACL’18 SOTA takes 9 hours to train using 16 threads on the CoNLL benchmark of only 18K entity mentions
      • Some deep-learning methods take more than 1 day

Goal: scalable EL without annotated data

SLIDE 22

Entity Linking without Annotated Data

  • Candidate generator
  • Entity embeddings
    – Learned from the underlying graph
    – Learned from textual descriptions of entities
  • Collective disambiguation
    – Ensures “topical coherence” among entities in a document


SLIDE 24

Candidate Generation

  • Simple yet practical
    – Candidates must contain all tokens of the mention
    – Example: for the mention “Michael Jordan”
      • Michael Jordan (basketball player) and Michael Jordan (computer scientist) are candidates
      • Michael Jackson is not
    – Rank candidates by entity degree (a popularity proxy)
  • Aliases of entity names boost recall

[Figure: oracle recall vs. number of candidates per mention (1–10,000), with and without aliases]
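The token-containment generator described above can be sketched in a few lines. The entity names and degree values here are illustrative placeholders, and the tokenization is a simplifying assumption:

```python
def generate_candidates(mention, entity_degrees):
    """Return entities whose name contains every token of the mention,
    ranked by entity degree (a popularity proxy)."""
    tokens = set(mention.lower().split())
    candidates = [
        e for e in entity_degrees
        # Normalize the Wikipedia-style name into a bag of tokens.
        if tokens <= set(
            e.lower().replace("_", " ").replace("(", "").replace(")", "").split()
        )
    ]
    return sorted(candidates, key=lambda e: -entity_degrees[e])

# Toy degree table (values are made up for illustration).
degrees = {"Michael_Jordan": 5000, "Michael_I._Jordan": 300, "Michael_Jackson": 8000}
print(generate_candidates("Michael Jordan", degrees))
# -> ['Michael_Jordan', 'Michael_I._Jordan']  (Michael_Jackson is filtered out)
```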

SLIDE 25

Eigenthemes for Entity Disambiguation

[Pipeline diagram: Collection of Documents → Subspace Learning → Similarity Function → Mention-Wise Ranking]

SLIDE 30

Subspace Learning: Intuition

  • The subspace captures the main “theme” of a document
    – Top-k d-dimensional eigenvectors of the covariance matrix of the candidate-entity embeddings in a document
  • External signals enrich subspace learning
    – Eigendecomposition of the weighted covariance matrix
    – Entity embeddings with high weights act as “anchor embeddings”
      • Prioritized in subspace learning
    – Weighting scheme: inverse of the rank computed from entity-degree information

  Example candidates: for “Michael Jordan” — Michael_Jordan, Michael_I._Jordan, Michael_Jordan_statue, Michael_Jordan_(footballer); for “Science” — Science, Science_(journal), Science_Channel
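A minimal NumPy sketch of this idea: build the weighted covariance of all candidate embeddings in a document, take its top-k eigenvectors as the theme subspace, and rank candidates by how close they lie to that subspace. The exact weighting and scoring details are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def eigentheme(embeddings, weights, k=2):
    """Top-k eigenvectors (d x k) of the weighted covariance of the
    candidate-entity embeddings; anchor embeddings dominate via weights."""
    X = embeddings * weights[:, None]
    cov = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return vecs[:, -k:]                # keep the top-k eigenvectors

def score(candidate_emb, V):
    """Cosine similarity between a candidate and its projection onto the
    subspace spanned by V; higher means closer to the document theme."""
    proj = V @ (V.T @ candidate_emb)
    denom = np.linalg.norm(candidate_emb) * np.linalg.norm(proj)
    return float(proj @ candidate_emb / denom) if denom else 0.0

# Toy document: 10 candidate embeddings in 50 dimensions,
# weighted by inverse degree rank (1, 1/2, 1/3, ...).
rng = np.random.default_rng(0)
E = rng.normal(size=(10, 50))
w = 1.0 / np.arange(1, 11)
V = eigentheme(E, w, k=3)
ranked = sorted(range(10), key=lambda i: -score(E[i], V))
```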

SLIDE 32

Setup

  • Datasets
    – CoNLL: the most popular EL benchmark, based on the CoNLL 2003 shared task
    – More in the paper:
      • WNED (Wiki and Clueweb): benchmarks from the English Wikipedia and Clueweb corpora
      • Wikilinks-Random: tables extracted from English Wikipedia
  • Referent KB: Wikidata
  • Embeddings
    – Words: pre-trained Word2vec
    – Entities:
      • DeepWalk trained on Wikidata
      • Average of the Word2vec vectors of an entity’s description words
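The description-based entity embedding in the last bullet is simply an average of word vectors. A minimal sketch, where the tiny 4-dimensional toy vectors stand in for pre-trained Word2vec:

```python
import numpy as np

# Toy stand-in for a pre-trained Word2vec vocabulary (assumption).
word_vec = {
    "machine":    np.array([1.0, 0.0, 0.0, 0.0]),
    "learning":   np.array([0.0, 1.0, 0.0, 0.0]),
    "basketball": np.array([0.0, 0.0, 1.0, 0.0]),
}
DIM = 4

def entity_embedding(description):
    """Average the word vectors of the in-vocabulary description words."""
    vecs = [word_vec[w] for w in description.lower().split() if w in word_vec]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

emb = entity_embedding("Machine learning researcher")
# "researcher" is out of vocabulary, so emb = mean of the other two vectors.
```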
SLIDE 33

Tuning on CoNLL-Val

[Figure, left: impact of the entity-embedding technique on EL — Precision@1 of AVG vs. EIGEN with Word2vec and DeepWalk embeddings]
[Figure, right: tuning the number of components — Precision@1 over 5–30 components for DeepWalk and Word2vec]


SLIDE 36

Baselines

  • NameMatch:
    – Retrieves all entities whose names exactly match the mention string
    – Ties are broken by entity degree
  • Degree:
    – Candidates are ranked by entity degree
    – The highest-degree candidate is the prediction for a given mention
  • Avg and WAvg:
    – The (weighted) average of all candidate embeddings in a document serves as its representation
    – The candidate most similar (cosine similarity) to the document representation is the prediction
  • Le and Titov: weak supervision / distant learning
    – Candidate entities of a mention (which may miss the true entity) are scored higher than a number of randomly sampled entities
    – Ranks by similarity between candidates and the mention context
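The Avg baseline above can be sketched as follows; the data layout and function name are illustrative assumptions, and WAvg would differ only in computing a weighted mean (e.g. by degree):

```python
import numpy as np

def avg_baseline(doc_candidates):
    """doc_candidates: {mention: {entity: embedding}}.
    Returns {mention: predicted_entity} using the Avg baseline."""
    # Document representation: mean of all candidate embeddings.
    all_embs = [e for cands in doc_candidates.values() for e in cands.values()]
    doc_rep = np.mean(all_embs, axis=0)

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    # For each mention, pick the candidate closest to the document rep.
    return {m: max(c, key=lambda e: cos(c[e], doc_rep))
            for m, c in doc_candidates.items()}

# Toy document with two mentions and hand-made 2-D embeddings.
doc = {
    "m1": {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])},
    "m2": {"C": np.array([0.9, 0.1])},
}
links = avg_baseline(doc)  # m1 -> A, since A is closer to the mean direction
```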


SLIDE 40

Is Eigenthemes Effective?

[Figure: Precision@1 of NameMatch, Degree, Avg, WAvg, Eigen, and WEigen against the ceiling, on easy and hard mentions of Le and Titov’s CoNLL test set]

  • Easy mentions: Degree ranks the gold entity at the top
  • Hard mentions: the gold entity is not ranked at the top by degree
  • Using the Eigenthemes score as a feature in supervised models yields significant performance improvements


SLIDE 43

Takeaways

  • A single hyperparameter (#components): easy to tune without annotated data
  • Lightweight and scalable: < 10 min on CoNLL, roughly 20× faster than the existing SOTA
  • Language independent
  • Can incorporate external signals as weights

Early work that only scratches the surface:
  – Candidate generation is too simplistic
  – The quality of the entity embeddings can be improved
  – Other tricks to boost performance …

SLIDE 44

THANK YOU

Questions?