
Entity Linking Enityt Linking Laura Dietz dietz@cs.umass.edu - PowerPoint PPT Presentation



  1. Entity Linking Enityt Linking. Laura Dietz, dietz@cs.umass.edu, University of Massachusetts.

  2. Problem: Entity Linking. Given a query mention in a source document, identify which Wikipedia entity it represents, or NIL. (Diagram: Query, Entity, NIL.)

  3. Problem: Example. Example Query: Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having ‘a Protestant Parliament for a Protestant people.’ The state effectively discriminated against Catholics in housing, jobs, and political representation. http://cain.ulst.ac.uk/othelem/incorepaper09.htm Search for: Northern Ireland

  4. Problem: Example. Example Query: Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having ‘a Protestant Parliament for a Protestant people.’ The state effectively discriminated against Catholics in housing, jobs, and political representation. http://cain.ulst.ac.uk/othelem/incorepaper09.htm Search for: James Craig

  5. near miss! :(

  6. Overview. M1: Popularity Method; M2: Machine Learned Similarity; M3: Context with IR; M4: Joint Assignment Model; M5: Joint Retrieval Model; Experimental Results; Online Demos.

  7. Challenges

  8. Problem: Example. Example Query: Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having ‘a Protestant Parliament for a Protestant people.’ The state effectively discriminated against Catholics in housing, jobs, and political representation. http://cain.ulst.ac.uk/othelem/incorepaper09.htm

  9. Document Analysis. Query: James Craig. Name Variants: Within-doc Coreference. Neighbor Mentions: NER Tagger (Alternative: Mention Detection). Sentence: Term models. Symbol Notation: Q: Query String; V: Name Variants; M: Neighbor Mentions; S: Sentence.

  10. Method 1: Popularity of Links Step 1: Build a dictionary of names for each entity. Step 2: Inspect all KB entities that have the query mention as a name variant. Step 3: Choose the entity with the most inlinks through this name.
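The three steps above can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the `LINKS` list and its counts are invented for the example, and a real name dictionary would be mined from Wikipedia anchor text, redirects, and titles.

```python
from collections import defaultdict

# Hypothetical anchor-text observations: (anchor text, linked entity title).
# The counts are made up for illustration only.
LINKS = [
    ("James Craig", "James Craig (actor)"),
    ("James Craig", "James Craig (actor)"),
    ("James Craig", "James Craig, 1st Viscount Craigavon"),
    ("Sir James Craig", "James Craig, 1st Viscount Craigavon"),
    ("Northern Ireland", "Northern Ireland"),
]

def build_name_dictionary(links):
    """Step 1: map each name to the entities it links to, with inlink counts."""
    names = defaultdict(lambda: defaultdict(int))
    for anchor, entity in links:
        names[anchor.lower()][entity] += 1
    return names

def link_by_popularity(mention, names):
    """Steps 2 and 3: among entities carrying this name, pick the one
    with the most inlinks through the name."""
    candidates = names.get(mention.lower())
    if not candidates:
        return None  # no KB entity has this name variant
    return max(candidates, key=candidates.get)

names = build_name_dictionary(LINKS)
print(link_by_popularity("James Craig", names))
```

With these invented counts the actor wins for "James Craig", which mirrors the near miss the slides describe for confusable names.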

  11. Names and Links on Wikipedia

  12. Mining Name Variants and Neighbors Sir James Craig 1st Viscount Craigavon Ulster Unionists Northern Ireland Prime Minister of Northern Ireland Irish Unionist Unionism in Ireland Ulster Northern Ireland James Craig, 1st Viscount Craigavon

  13. Pros & Cons: Popularity of Links. Works for very popular entities such as "Northern Ireland". Fails for entities with confusable names: "James Craig", "Springfield", "Jaguar".

  14. Method 1: Popularity of Links Step 1: Build a dictionary of names for each entity. Step 2: Inspect all KB entities that have the query mention as a name variant. Step 3: Choose the entity with the most inlinks through this name.

  15. Method 2: Machine Learned Similarity. Step 1: Collect different similarity features of the query mention and entities. Step 2: Machine learn the feature weights on training data (e.g. learning to rank). Step 3: Apply the similarity to the query and each entity; select the most similar entity.

  16. Method 2: Similarity Features. Query mention: James Craig. Candidate "James Craig, 1st Viscount Craigavon": title: James Craig, 1st Viscount Craigavon; anchor text: James Craig, Sir James Craig's, Craig Administration; disambiguation: James Craig; freebase name: Lord Craigavon. Candidate "James Craig (actor)": title: James Craig (actor); anchor text: James Craig, James Craig in; disambiguation: James Craig; freebase name: James Craig (actor). Features: is exact title match? is disambiguation match? inlinks through this name; is approx match? TF-IDF similarity score.
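A few of the listed features could be computed along these lines. This is a hedged sketch: the entity records, their field names (`title`, `disambiguation`, `anchor_counts`), and the counts are assumptions made for the example, the substring test is only a crude stand-in for approximate matching, and the TF-IDF feature is left out.

```python
def mention_entity_features(mention, entity):
    """Return some mention-entity features for one candidate.

    `entity` is a hypothetical record with the keys "title",
    "disambiguation" (list of names) and "anchor_counts" (name -> inlinks).
    """
    m = mention.lower()
    title = entity["title"].lower()
    disambiguation_names = [d.lower() for d in entity["disambiguation"]]
    return {
        "exact_title_match": float(m == title),
        "disambiguation_match": float(m in disambiguation_names),
        "inlinks_through_name": float(entity["anchor_counts"].get(mention, 0)),
        # Crude stand-in for approximate matching: substring of the title.
        "approx_match": float(m in title),
    }

# Illustrative records; the inlink counts are invented.
viscount = {
    "title": "James Craig, 1st Viscount Craigavon",
    "disambiguation": ["James Craig"],
    "anchor_counts": {"James Craig": 1, "Sir James Craig": 1},
}
actor = {
    "title": "James Craig (actor)",
    "disambiguation": ["James Craig"],
    "anchor_counts": {"James Craig": 2},
}

for candidate in (viscount, actor):
    print(candidate["title"], mention_entity_features("James Craig", candidate))
```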

  17. Learn Similarity and NIL. Query (Q: Query String; V: Name Variants; M: Neighbor Mentions; S: Sentence) and Candidate Entities: feature vector for supervised re-ranking and classification. Re-ranking. NIL classification: Is it similar enough to be a match? Features: Name variants, Document Terms, Links, Popularity, ...
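Re-ranking followed by a NIL decision might look like the sketch below. The linear scoring function, the feature names, the hand-set weights, and the fixed threshold are all assumptions for illustration; the slides only say that weights are learned from training data and that NIL is a classification step.

```python
def score(features, weights):
    """Weighted sum of feature values (a simple linear ranking function)."""
    return sum(weights[name] * value for name, value in features.items())

def link_or_nil(candidates, weights, nil_threshold):
    """Re-rank candidates by score; return "NIL" if even the best
    candidate is not similar enough.

    candidates: dict mapping entity title -> feature dict.
    """
    if not candidates:
        return "NIL"
    best = max(candidates, key=lambda e: score(candidates[e], weights))
    if score(candidates[best], weights) < nil_threshold:
        return "NIL"
    return best

# Hypothetical weights (these would be learned in practice) and features.
weights = {"name_match": 1.0, "context_overlap": 2.0}
candidates = {
    "James Craig (actor)": {"name_match": 1.0, "context_overlap": 0.1},
    "James Craig, 1st Viscount Craigavon": {"name_match": 1.0, "context_overlap": 0.6},
}

print(link_or_nil(candidates, weights, nil_threshold=1.5))
print(link_or_nil(candidates, weights, nil_threshold=3.0))
```

A learned NIL classifier could replace the fixed threshold; the threshold is used here only to keep the example short.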

  18. Pros & Cons: Machine Learned Similarity. Pro: Combination of different indicators of similarity; option to predict "NILs". Pro: Can incorporate name variants found in the text (coreference tools). Con: Requires selection of a pool of candidate entities, which can be large ("John Smith"). Will still fail on "James Craig", because the wrong James has more anchor text matches.

  19. Method 3: Context Disambiguation Step 1: Identify surrounding text, entities, etc. Step 2: Issue search query containing all of it.

  20. Different Kinds of Context. Example Query: Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having ‘a Protestant Parliament for a Protestant people.’ The state effectively discriminated against Catholics in housing, jobs, and political representation. http://cain.ulst.ac.uk/othelem/incorepaper09.htm Search for: James Craig + Name Variants + Neighbors + Sentence
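The idea of searching for the query together with its name variants, neighbors, and sentence can be illustrated with a toy term-overlap ranker. This is only a sketch: the `KB` texts are invented one-line stand-ins for Wikipedia articles, and plain term overlap replaces whatever retrieval model a real search engine would apply.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_context_query(query, variants, neighbors, sentence):
    """Pool the terms of Q, V, M and S into one bag-of-words query."""
    terms = Counter()
    for text in [query, *variants, *neighbors, sentence]:
        terms.update(tokenize(text))
    return terms

def overlap_score(query_terms, entity_text):
    """Count query terms that also occur in the entity text (clipped counts)."""
    entity_terms = Counter(tokenize(entity_text))
    return sum(min(count, entity_terms[t]) for t, count in query_terms.items())

# Invented entity descriptions standing in for article text.
KB = {
    "James Craig (actor)": "James Craig was an American actor in B-movies",
    "James Craig, 1st Viscount Craigavon":
        "James Craig was the first Prime Minister of Northern Ireland "
        "and an Ulster Unionist",
}

query_terms = build_context_query(
    query="James Craig",
    variants=["Sir James Craig"],
    neighbors=["Northern Ireland", "Catholics"],
    sentence="The first Prime Minister of Northern Ireland, Sir James Craig, "
             "described the state",
)
best = max(KB, key=lambda title: overlap_score(query_terms, KB[title]))
print(best)
```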

  21. Method 3: Pros and Cons. Works for "James Craig"! Problematic when neighbors are ambiguous: "Lisa witnessed a shooting at Springfield high school". (Unclear which "Lisa" and which "Springfield".)

  22. Method 3: Pros and Cons. Also problematic when neighbors don't provide enough disambiguation power. Example: all the other, less popular James Craigs of Ireland.

  23. Method 4: Joint Assignment Models. Step 1: Identify all entity mentions in text. Step 2: For each mention (e.g. "James Craig") retrieve candidates. Step 3: Select the entity that maximizes: [objective not recoverable from the slide text] across all neighbor entities.

  24. Method 4 Example: Candidates Northern Ireland James Craig American Catholics Catholic Church

  25. Method 4 Example: Correct Selection Northern Ireland James Craig American Catholics Catholic Church

  26. Method 4 Example: Scoring Northern Ireland James Craig American Catholics Catholic Church

  27. Method 4 Example: Wrong Selection Northern Ireland not compatible James Craig American Catholics Catholic Church

  28. Method 4: Learn Similarities. As in Method 2, learn feature-based similarities: mention-entity similarity and entity-entity similarity. Entity-entity similarity features: mutual links, same categories, RDF relations.

  29. Method 4: Joint Assignment Models. Step 1: Identify all entity mentions in text. Step 2: For each mention (e.g. "James Craig") retrieve candidates. Step 3: Select the entity that maximizes: [objective not recoverable from the slide text] across all neighbor entities.
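One way to picture joint assignment is an exhaustive search over candidate combinations. The objective used here, the sum of mention-entity similarities plus the sum of pairwise entity-entity similarities, is an assumption: the slide's formula did not survive extraction. All similarity values are invented, and brute force is only feasible for tiny candidate pools.

```python
from itertools import combinations, product

def joint_assign(candidates, mention_sim, entity_sim):
    """Pick one entity per mention so that the assumed objective
    (mention-entity similarity + pairwise entity-entity similarity)
    is maximal. Exhaustive search over all combinations."""
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[m] for m in mentions)):
        s = sum(mention_sim[(m, e)] for m, e in zip(mentions, choice))
        s += sum(entity_sim.get(frozenset(pair), 0.0)
                 for pair in combinations(choice, 2))
        if s > best_score:
            best, best_score = dict(zip(mentions, choice)), s
    return best

VISCOUNT = "James Craig, 1st Viscount Craigavon"
ACTOR = "James Craig (actor)"

candidates = {
    "James Craig": [ACTOR, VISCOUNT],
    "Northern Ireland": ["Northern Ireland"],
    "Catholics": ["American Catholics", "Catholic Church"],
}
# Invented similarities: the actor looks better in isolation...
mention_sim = {
    ("James Craig", ACTOR): 0.6,
    ("James Craig", VISCOUNT): 0.4,
    ("Northern Ireland", "Northern Ireland"): 1.0,
    ("Catholics", "American Catholics"): 0.5,
    ("Catholics", "Catholic Church"): 0.5,
}
# ...but the viscount is compatible with the neighboring entities.
entity_sim = {
    frozenset((VISCOUNT, "Northern Ireland")): 0.9,
    frozenset(("Catholic Church", "Northern Ireland")): 0.3,
}

assignment = joint_assign(candidates, mention_sim, entity_sim)
print(assignment)
```

The number of combinations grows multiplicatively with the candidate pools, which is why this kind of inference gets expensive and is usually approximated.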

  30. Method 4: Pros and Cons. Pro: Can mutually resolve uncertainty. Con: Requires a pool of candidates (trade-off runtime versus recall). Con: Expensive inference problem. May still fail on less popular James Craigs or when context does not resolve ambiguities.

  31. Method 5: Joint Retrieval Model. Step 1: Identify all entity mentions in text. Step 2: For each query mention, issue a search query including the query, neighboring mentions, and sentence, weighting each "ingredient" differently. Intuition: structured matching of text to KB.
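Weighting each "ingredient" differently can be sketched as a weighted sum of per-ingredient match scores. Everything concrete here is an assumption: the weights are hand-picked rather than learned, the match score is a simple token-coverage fraction, and the `KB` texts are invented stand-ins for indexed articles.

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_fraction(ingredient_texts, entity_text):
    """Fraction of the ingredient's distinct tokens found in the entity text."""
    wanted = set().union(*(tokens(t) for t in ingredient_texts))
    if not wanted:
        return 0.0
    return len(wanted & tokens(entity_text)) / len(wanted)

def retrieval_score(ingredients, weights, entity_text):
    """Weighted sum over the ingredients Q, V, M and S."""
    return sum(weights[name] * match_fraction(texts, entity_text)
               for name, texts in ingredients.items())

ingredients = {
    "Q": ["James Craig"],
    "V": ["Sir James Craig"],
    "M": ["Northern Ireland", "Prime Minister of Northern Ireland"],
    "S": ["described the state as having a Protestant Parliament"],
}
weights = {"Q": 1.0, "V": 0.5, "M": 2.0, "S": 0.5}  # hypothetical values

KB = {
    "James Craig (actor)": "James Craig was an American actor known for B-movies",
    "James Craig, 1st Viscount Craigavon":
        "James Craig was the first Prime Minister of Northern Ireland",
}

ranking = sorted(KB, key=lambda t: retrieval_score(ingredients, weights, KB[t]),
                 reverse=True)
print(ranking[0])
```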

  32. Names and Links on Wikipedia

  33. Mining Name Variants and Neighbors Sir James Craig 1st Viscount Craigavon Ulster Unionists Northern Ireland Prime Minister of Northern Ireland Irish Unionist Unionism in Ireland Ulster Northern Ireland James Craig, 1st Viscount Craigavon

  34. Method 5 Example: Scoring Northern Ireland James Craig Ulster Unionists Nashville, Tennessee Northern Ireland B-Movies Prime Minister of Northern Ireland Catholics

  35. Connection between 4 and 5. Integrate over [expression not recoverable from the slide text]. Method 4: Requires iterative optimization. Method 5: Can be solved inside a search engine. Notation: Q: Query String; V: Name Variants; M: Neighbor Mentions; S: Sentence.

  36. Need a Search Index for the KB. Preprocessing: Identify context of query mention; build a special KB Index. Neighbor-entity similarity features: neighbor occurs in entity's text; neighbor is title of inlinks/outlinks.

  37. Special Wikipedia Index. Search Index with special Fields. (Figure labels: Ulster Unionists, Northern Ireland, Prime Minister of Northern Ireland.)

  38. Neighbor-Entity Features. Neighbors: Northern Ireland, Ulster Unionists. Query: James Craig. Features: neighbor occurs in text? neighbor in inlink titles? neighbor in outlink titles? is approx match? TF-IDF similarity score. Machine learn the feature weights on training data (e.g. learning to rank).
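The first three neighbor-entity features could be read off an entity record with separate fields, roughly as below. The record layout (`text`, `inlinks`, `outlinks`) and its contents are assumptions for the example; in the approach described they would correspond to special fields of the KB search index.

```python
def neighbor_entity_features(neighbor, entity):
    """Binary features relating one neighbor mention to one candidate entity.

    `entity` is a hypothetical record with "text" (article text) and
    "inlinks" / "outlinks" (titles of linking and linked articles).
    """
    n = neighbor.lower()
    return {
        "in_text": n in entity["text"].lower(),
        "in_inlink_titles": n in {t.lower() for t in entity["inlinks"]},
        "in_outlink_titles": n in {t.lower() for t in entity["outlinks"]},
    }

# Invented record for illustration.
viscount = {
    "text": "James Craig was the first Prime Minister of Northern Ireland "
            "and leader of the Ulster Unionists.",
    "inlinks": ["Prime Minister of Northern Ireland", "Ulster Unionists"],
    "outlinks": ["Northern Ireland", "Ulster Unionists"],
}

for neighbor in ("Northern Ireland", "Ulster Unionists", "Nashville, Tennessee"):
    print(neighbor, neighbor_entity_features(neighbor, viscount))
```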

  39. Query mention-Entity Features. Query: James Craig. Neighbors: Ulster Unionists, Northern Ireland. Features: is exact title match? is disambiguation match? inlinks through this name; is approx match? TF-IDF similarity score. Machine learn the feature weights on training data (e.g. learning to rank).
