Entity Linking. Laura Dietz, dietz@cs.umass.edu. PowerPoint PPT Presentation



SLIDE 1

Entity Linking


Laura Dietz dietz@cs.umass.edu University of Massachusetts

SLIDE 2

Problem: Entity Linking

Given a query mention in a source document, identify which Wikipedia entity it represents (or NIL, if the entity is not in the KB).

SLIDE 3

Problem: Example

Example Query: Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having 'a Protestant Parliament for a Protestant people.' The state effectively discriminated against Catholics in housing, jobs, and political representation. (http://cain.ulst.ac.uk/othelem/incorepaper09.htm)

Search for: "Northern Ireland"

SLIDE 4

SLIDE 5

Problem: Example

Example Query: Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having 'a Protestant Parliament for a Protestant people.' The state effectively discriminated against Catholics in housing, jobs, and political representation. (http://cain.ulst.ac.uk/othelem/incorepaper09.htm)

Search for: "James Craig"

SLIDE 6

SLIDE 7

near miss! :(

SLIDE 8

Overview

  • M1: Popularity Method
  • M2: Machine-Learned Similarity
  • M3: Context with IR
  • M4: Joint Assignment Model
  • M5: Joint Retrieval Model
  • Experimental Results
  • Online Demos

SLIDE 9

Challenges

SLIDE 10

Problem: Example

Example Query: Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having 'a Protestant Parliament for a Protestant people.' The state effectively discriminated against Catholics in housing, jobs, and political representation. (http://cain.ulst.ac.uk/othelem/incorepaper09.htm)

Query mention: James Craig

SLIDE 11

Document Analysis

James Craig

  • Name Variants: within-doc coreference
  • Neighbor Mentions: NER tagger (alternative mention detection)
  • Sentence: term models

Symbol Notation: Q: Query String, V: Name Variants, M: Neighbor Mentions, S: Sentence

SLIDE 12

Method 1: Popularity of Links

  Step 1: Build a dictionary of names for each entity.
  Step 2: Inspect all KB entities that have the query mention as a name variant.
  Step 3: Choose the entity with the most inlinks through this name.
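The three steps can be sketched in a few lines; the name dictionary and inlink counts below are invented for illustration, not real Wikipedia statistics:

```python
# Step 1: a name dictionary built offline from Wikipedia anchor text,
# mapping name -> {entity: number of inlinks that use this name}.
# All counts here are made up for illustration.
name_dictionary = {
    "Northern Ireland": {"Northern Ireland": 52000},
    "James Craig": {
        "James Craig, 1st Viscount Craigavon": 210,
        "James Craig (actor)": 340,
    },
}

def link_by_popularity(mention):
    # Step 2: all KB entities that have this mention as a name variant.
    candidates = name_dictionary.get(mention)
    if not candidates:
        return "NIL"
    # Step 3: choose the entity with the most inlinks through this name.
    return max(candidates, key=candidates.get)
```

On these toy counts the method links "James Craig" to the more popular actor, which is exactly the failure mode discussed on the Pros & Cons slide.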

SLIDE 13

Names and Links on Wikipedia

SLIDE 14

Ulster Unionists Northern Ireland Prime Minister of Northern Ireland Sir James Craig 1st Viscount Craigavon Northern Ireland James Craig, 1st Viscount Craigavon Irish Unionist Unionism in Ireland Ulster

Mining Name Variants and Neighbors

SLIDE 15

Pros & Cons: Popularity of Links

  • Works for very popular entities such as "Northern Ireland"
  • Fails for entities with confusable names: "James Craig", "Springfield", "Jaguar"

SLIDE 16

Method 1: Popularity of Links

  Step 1: Build a dictionary of names for each entity.
  Step 2: Inspect all KB entities that have the query mention as a name variant.
  Step 3: Choose the entity with the most inlinks through this name.

SLIDE 17

Method 2: Machine-Learned Similarity

  Step 1: Collect different similarity features of the query mention and entities.
  Step 2: Machine-learn the feature weights on training data (e.g. learning to rank).
  Step 3: Apply the similarity to the query and each entity; select the most similar entity.
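A minimal sketch of the learned-similarity linker: the hand-set weights below stand in for weights a learning-to-rank model would fit on training data, and the features mirror those on the following slides. The candidate dictionaries and the NIL threshold are illustrative assumptions.

```python
import math

# Hand-set weights standing in for learned (e.g. learning-to-rank) weights.
WEIGHTS = {"exact_title": 2.0, "disambig_match": 1.0,
           "log_inlinks": 0.5, "tfidf_sim": 1.5}

def features(mention, entity):
    return {
        "exact_title": 1.0 if mention == entity["title"] else 0.0,
        "disambig_match": 1.0 if mention in entity["disambiguations"] else 0.0,
        "log_inlinks": math.log1p(entity["inlinks"]),
        "tfidf_sim": entity["tfidf_sim"],  # precomputed context similarity
    }

def similarity(mention, entity):
    f = features(mention, entity)
    return sum(WEIGHTS[k] * f[k] for k in WEIGHTS)

def link(mention, candidates, nil_threshold=1.0):
    # Step 3: pick the most similar candidate; predict NIL if even the
    # best candidate is not similar enough to be a match.
    best = max(candidates, key=lambda e: similarity(mention, e))
    return best["title"] if similarity(mention, best) >= nil_threshold else "NIL"
```

With a strong enough context-similarity feature, the less popular but correct entity can outscore the actor's larger inlink count.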

SLIDE 18

Method 2: Similarity Features

James Craig JC, 1st Viscount Craigavon

title: James Craig, 1st Viscount Craigavon
anchor text: Sir James Craig's; Craig Administration
disambiguation: James Craig
freebase name: Lord Craigavon

James Craig James Craig (actor)

title: James Craig (actor)
anchor text: James Craig; James Craig in
disambiguation: James Craig
freebase name: James Craig (actor)

James Craig

  • is exact title match?
  • is disambiguation match?
  • inlinks through this name
  • is approx match?
  • TF-IDF similarity score

SLIDE 19

Features: Name variants, Document Terms, Links, Popularity ...

Query → feature vector for supervised re-ranking and classification
Re-ranking
NIL classification: Is it similar enough to be a match? If not: NIL.

Learn Similarity and NIL

Candidate Entities

Q: Query String V: Name Variants M: Neighbor Mentions S: Sentence

SLIDE 20

Pros & Cons: Machine-Learned Similarity

  • Pro: Combination of different indicators of similarity; option to predict "NILs".
  • Pro: Can incorporate name variants found in the text (coreference tools).
  • Con: Requires selection of a pool of candidate entities, which can be large ("John Smith").
  • Con: Will still fail on "James Craig", because the wrong James has more anchor-text matches.

SLIDE 21

Method 3: Context Disambiguation

  Step 1: Identify the surrounding text, entities, etc.
  Step 2: Issue a search query containing all of it.
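The two steps can be sketched as follows. The article snippets and the plain term-overlap score are toy stand-ins for a real KB and a real search engine's relevance function:

```python
def build_context_query(mention, name_variants, neighbor_mentions, sentence):
    # Step 2: one big search query out of the mention plus everything
    # surrounding it in the source document.
    return " ".join([mention] + name_variants + neighbor_mentions
                    + sentence.split())

def context_score(query, article_text):
    # Crude term-overlap stand-in for a search engine's relevance score.
    return len(set(query.lower().split()) & set(article_text.lower().split()))

query = build_context_query(
    "James Craig",
    name_variants=["Sir James Craig"],
    neighbor_mentions=["Northern Ireland", "Prime Minister of Northern Ireland"],
    sentence="The first Prime Minister of Northern Ireland was Sir James Craig")

# Toy article snippets standing in for the KB's entity pages.
articles = {
    "James Craig, 1st Viscount Craigavon":
        "first prime minister of northern ireland unionist politician",
    "James Craig (actor)":
        "american film and television actor known for b movies",
}
best = max(articles, key=lambda title: context_score(query, articles[title]))
```

Here the context terms ("prime minister", "northern ireland") match only the Viscount's article, so the correct entity wins despite the actor's popularity.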

SLIDE 22

Different Kinds of Context

Example Query: Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having 'a Protestant Parliament for a Protestant people.' The state effectively discriminated against Catholics in housing, jobs, and political representation. (http://cain.ulst.ac.uk/othelem/incorepaper09.htm)

Search for: "James Craig" + Name Variants + Neighbors + Sentence

SLIDE 23

SLIDE 24

Method 3: Pros and Cons

  • Works for "James Craig"!
  • Problematic when neighbors are ambiguous: "Lisa witnessed a shooting at Springfield high school." (Unclear which "Lisa" and which "Springfield".)

SLIDE 25

Method 3: Pros and Cons

  • Also problematic when neighbors don't provide enough disambiguation power. Example: all the other, less popular James Craigs of Ireland.

SLIDE 26

SLIDE 27

Method 4: Joint Assignment Models

  Step 1: Identify all entity mentions in the text.
  Step 2: For each mention, retrieve candidates.
  Step 3: Select the entities that jointly maximize mention-entity similarity plus entity-entity compatibility across all neighbor entities.

James Craig
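The joint objective of Step 3 can be sketched as a brute-force search over the candidate pools; the pools, scores, and compatibility set below are toy stand-ins, and real systems need approximate inference instead of enumeration:

```python
from itertools import product

def joint_assignment(mentions, candidates, local_score, coherence):
    # Step 3: search over all combinations from the candidate pools for
    # the assignment maximizing mention-entity similarity plus pairwise
    # entity-entity compatibility.
    best, best_score = None, float("-inf")
    for assignment in product(*(candidates[m] for m in mentions)):
        s = sum(local_score(m, e) for m, e in zip(mentions, assignment))
        s += sum(coherence(a, b)
                 for i, a in enumerate(assignment) for b in assignment[i + 1:])
        if s > best_score:
            best, best_score = assignment, s
    return dict(zip(mentions, best))

# Toy pools and scores: the actor is more popular, but only the Viscount
# is compatible with "Northern Ireland".
candidates = {
    "James Craig": ["James Craig, 1st Viscount Craigavon",
                    "James Craig (actor)"],
    "Northern Ireland": ["Northern Ireland"],
}
compatible = {("James Craig, 1st Viscount Craigavon", "Northern Ireland")}

def local(mention, entity):
    return 1.0 if "actor" in entity else 0.5

def coherent(a, b):
    return 2.0 if (a, b) in compatible or (b, a) in compatible else 0.0

result = joint_assignment(list(candidates), candidates, local, coherent)
```

The compatibility bonus outweighs the actor's higher local popularity, so the neighbors mutually resolve each other, as on the "Correct Selection" slide.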

SLIDE 28

James Craig Northern Ireland Catholics

American Catholic Church

Method 4 Example: Candidates

SLIDE 29

James Craig Northern Ireland Catholics

American Catholic Church

Method 4 Example: Correct Selection

SLIDE 30

James Craig Northern Ireland Catholics

American Catholic Church

Method 4 Example: Scoring

SLIDE 31

James Craig Northern Ireland Catholics

American Catholic Church

Method 4 Example: Wrong Selection

not compatible

SLIDE 32

Method 4: Learn Similarities

As in Method 2, learn feature-based similarity:

  • mention-entity similarity
  • entity-entity similarity (features: mutual links, same categories, RDF relations)

SLIDE 33

Method 4: Joint Assignment Models

  Step 1: Identify all entity mentions in the text.
  Step 2: For each mention, retrieve candidates.
  Step 3: Select the entities that jointly maximize mention-entity similarity plus entity-entity compatibility across all neighbor entities.

James Craig

SLIDE 34

Method 4: Pros and Cons

  • Pro: Can mutually resolve uncertainty.
  • Con: Requires a pool of candidates (trade-off: runtime versus recall).
  • Con: Expensive inference problem.
  • May still fail on less popular James Craigs, or when context does not resolve ambiguities.

SLIDE 35

Method 5: Joint Retrieval Model

  Step 1: Identify all entity mentions in the text.
  Step 2: For each query mention, issue a search query including the query, neighboring mentions, and the sentence, weighting each "ingredient" differently.

Intuition: structured matching of text to the KB.
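The weighted "ingredients" can be sketched as a structured match against per-field entries of a KB index. The ingredient weights and field contents below are illustrative stand-ins, not the tuned values from the experiments:

```python
# Weights for the query "ingredients": Q (query string), V (name variants),
# M (neighbor mentions), S (sentence terms). Illustrative values only.
INGREDIENT_WEIGHTS = {"Q": 0.6, "V": 0.2, "M": 0.15, "S": 0.05}

def fraction_matched(terms, field_text):
    # What fraction of the ingredient's terms appear in the entity's field?
    field = field_text.lower()
    return sum(1 for t in terms if t.lower() in field) / max(len(terms), 1)

def joint_retrieval_score(ingredients, entity_fields):
    # Structured matching: each ingredient is scored against the
    # corresponding field of the entity's KB index entry, then the
    # field scores are combined with their weights.
    return sum(w * fraction_matched(ingredients[k], entity_fields[k])
               for k, w in INGREDIENT_WEIGHTS.items())
```

In a real system this weighted combination would be expressed as a structured query to a search engine over the special KB index, rather than computed in application code.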

SLIDE 36

Names and Links on Wikipedia

SLIDE 37

Ulster Unionists Northern Ireland Prime Minister of Northern Ireland Sir James Craig 1st Viscount Craigavon Northern Ireland James Craig, 1st Viscount Craigavon Irish Unionist Unionism in Ireland Ulster

Mining Name Variants and Neighbors

SLIDE 38

James Craig Northern Ireland Catholics

Ulster Unionists Northern Ireland Prime Minister of Northern Ireland Nashville, Tennessee B-Movies

Method 5 Example: Scoring

SLIDE 39

Connection between Methods 4 and 5

Method 4: requires iterative optimization.
Method 5: integrates over the neighbor assignments; can be solved inside a search engine.

Symbol Notation: Q: Query String, V: Name Variants, M: Neighbor Mentions, S: Sentence

SLIDE 40

Preprocessing: Build a Special KB Index

  • Identify the context of the query mention.
  • Need a search index for the KB.
  • Neighbor-entity similarity features: neighbor occurs in entity's text; neighbor is a title of inlinks/outlinks.
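One way to picture the special KB index: one entry per entity with separate fields for article text, inlink titles, and outlink titles, so the neighbor-entity features can be computed by simple lookups. The entry and field names below are illustrative:

```python
# Toy "special KB index" with per-entity fields; a real system would use
# a search engine's fielded index over a full Wikipedia dump.
kb_index = {
    "James Craig, 1st Viscount Craigavon": {
        "text": "first prime minister of northern ireland, ulster unionist",
        "inlink_titles": {"Northern Ireland", "Ulster Unionists"},
        "outlink_titles": {"Prime Minister of Northern Ireland"},
    },
}

def neighbor_entity_features(neighbor, entity_id):
    # Neighbor-entity similarity features from the preprocessing slide.
    entry = kb_index[entity_id]
    return {
        "occurs_in_text": neighbor.lower() in entry["text"],
        "in_inlink_titles": neighbor in entry["inlink_titles"],
        "in_outlink_titles": neighbor in entry["outlink_titles"],
    }
```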

SLIDE 41

Special Wikipedia Index

Search Index with special Fields

Ulster Unionists Northern Ireland Prime Minister of Northern Ireland Ulster Unionists Northern Ireland

SLIDE 42

Neighbor-Entity Features

Ulster Unionists Northern Ireland

James Craig

  • neighbor occurs in text?
  • neighbor in inlink titles?
  • neighbor in outlink titles?
  • is approx match?
  • TF-IDF similarity score

Northern Ireland

Machine-learn the feature weights on training data (e.g. learning to rank).
SLIDE 43

Query Mention-Entity Features

Ulster Unionists Northern Ireland

James Craig

Machine-learn the feature weights on training data (e.g. learning to rank).

  • is exact title match?
  • is disambiguation match?
  • inlinks through this name
  • is approx match?
  • TF-IDF similarity score

SLIDE 44

Method 5: Joint Retrieval Model

Issue the entity-linking IR query (actually: structured matching) against the special KB index, and select the entity that maximizes the retrieval score.

SLIDE 45

Method 5: Pros and Cons

  • Pro: Similar to joint assignment, but cheaper.
  • Pro: Does not require pools (optimize in IR).
  • Pro: Can be combined with machine learning (Method 2) to improve precision.
  • Con: Fails when context is misleading.

SLIDE 46

Really Difficult Example

Example Query: ABC shot "Lost" in Australia

  • Query mention: ABC
  • True entity: American Broadcasting Company
  • Context "Australia" and mention similarity will point instead to Australian Broadcasting Corporation
  • Approach: Identify misleading neighbors (a variant of M5)

SLIDE 47

[Figure: average recall vs. cutoff rank k (k = 5, 10, 15, 20) on the TAC KBP Entity Linking Task, one panel per year 2009-2012. Runs: Q, QV, QVM_nrm, QVM_nrm LTR, i.e. M1 (Popularity), a variant of M5 (Joint Retrieval), and M5 + M2 (JR + ML).]

Symbol Notation: Q: Query String, V: Name Variants, M: Neighbor Mentions, S: Sentence

SLIDE 48

References

  • M1 Popularity / Keyphraseness: Mihalcea et al. "Wikify!: Linking documents to encyclopedic knowledge." CIKM 2007.
  • M2 Machine-Learned Mention-to-Entity Similarity: Bunescu et al. "Using encyclopedic knowledge for named entity disambiguation." EACL 2006; M. Dredze et al. "Entity disambiguation for knowledge base population." ACL 2010.
  • M4 Joint Assignment: Silviu Cucerzan. "Large-scale named entity disambiguation based on Wikipedia data." EMNLP-CoNLL 2007; Ratinov et al. "Local and global algorithms for disambiguation to Wikipedia." ACL 2011.
  • Entity-to-Entity Features: Ceccarelli et al. "Learning relatedness measures for entity linking." CIKM 2013.
  • M5 Joint Retrieval Model: Dalton et al. "A neighborhood relevance model for entity linking." OAIR 2013.
  • More: http://nlp.cs.rpi.edu/kbp/2014/elreading.html and http://www.mendeley.com/groups/3339761/entity-linking-and-retrieval/

SLIDE 49

Toolkits & Online Demos

  • List of toolkits: http://nlp.cs.rpi.edu/kbp/2014/tools.html
  • Online demos:
      UIUC Wikifier: http://cogcomp.cs.illinois.edu/demo/wikify/
      TagMe!: http://tagme.di.unipi.it/
      AIDA: https://gate.d5.mpi-inf.mpg.de/webaida/

SLIDE 50

UIUC Wikifier

SLIDE 51

TagMe!

SLIDE 52

AIDA (prior+sim+coherence)

SLIDE 53

AIDA (prior only)

SLIDE 54

Another Example: Lisa Fletcher

SLIDE 55

UIUC Wikifier

SLIDE 56

TagMe!

SLIDE 57

AIDA

SLIDE 58

Search Engine (DuckDuckGo)

SLIDE 59

Participate!

  • TAC KBP Entity Linking Task: http://nlp.cs.rpi.edu/kbp/2014/
  • SIGIR Entity Recognition and Disambiguation Challenge: http://web-ngram.research.microsoft.com/erd2014/
  • INEX 2014 Tweet Contextualization Track: https://inex.mmci.uni-saarland.de/tracks/qa/

Questions? email: dietz@cs.umass.edu / web: http://ciir.cs.umass.edu/~dietz/