Lecture 24: NER & Entity Linking


  1. Lecture 24: NER & Entity Linking. Kai-Wei Chang, CS @ University of Virginia, kw@kwchang.net. Course webpage: http://kwchang.net/teaching/NLP16. CS6501-NLP

  2. Organizing knowledge
     • It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
     • Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
     • Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
     Slides are adapted from Dan Roth.

  3. Cross-document co-reference resolution
     • It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
     • Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
     • Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

  4. Reference resolution (disambiguation to Wikipedia)
     • It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
     • Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
     • Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

  5. The "Reference" Collection has Structure
     • It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
     • Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
     • Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
     Relation labels on the slide: Is_a, Is_a, Used_In, Released, Succeeded.

  6. Analysis of Information Networks
     • It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
     • Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
     • Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

  7. Wikipedia as a knowledge resource …
     Relation labels on the slide: Is_a, Is_a, Used_In, Released, Succeeded.

  8. Wikification: The Reference Problem
     Cycles of Knowledge: Grounding for/using Knowledge
     Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

  9. Challenges
     • Dealing with ambiguity of natural language
       - Mentions of entities and concepts could have multiple meanings
     • Dealing with variability of natural language
       - A given concept could be expressed in many ways
     • Wikification addresses these two issues in a specific way:
     • The Reference Problem
       - What is meant by this concept? (WSD + Grounding)
       - More than just co-reference (within and across documents)

  10. General Challenges
      Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
      • Ambiguity: "Times" could be The New York Times or The Times
      • Variability: Connecticut, CT, The Nutmeg State
      • Concepts outside of Wikipedia (NIL): Blumenthal?
      • Scale: millions of labels

  11. Wikification: Subtasks
      Wikification and Entity Linking require addressing several sub-tasks:
      • Identifying Target Mentions
        - Mentions in the input text that should be Wikified
      • Identifying Candidate Titles
        - Candidate Wikipedia titles that could correspond to each mention
      • Candidate Title Ranking
        - Rank the candidate titles for a given mention
      • NIL Detection and Clustering
        - Identify mentions that do not correspond to a Wikipedia title
        - Entity Linking: cluster NIL mentions that represent the same entity
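
The subtasks on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the alias table `ALIASES`, the function names, and the naive substring matching are all assumptions made for the example. Candidate ranking and NIL clustering are left out.

```python
# Hypothetical alias table: lowercased surface form -> candidate Wikipedia titles.
ALIASES = {
    "times": ["The New York Times", "The Times"],
    "nutmeg state": ["Connecticut"],
}

def find_mentions(text, names):
    """Identify target mentions: keep the names that occur in the text.

    Naive substring matching, used only to keep the sketch short.
    """
    lowered = text.lower()
    return [name for name in names if name.lower() in lowered]

def candidate_titles(mention):
    """Identify candidate titles; an empty list means no known title."""
    return ALIASES.get(mention.lower(), [])

def split_nil(mentions):
    """NIL detection: separate linkable mentions from NIL mentions."""
    linked = {m: candidate_titles(m) for m in mentions if candidate_titles(m)}
    nil = [m for m in mentions if not candidate_titles(m)]
    return linked, nil

text = ("Blumenthal (D) is a candidate for the U.S. Senate seat. But the Times "
        "report has the potential to reshape the contest in the Nutmeg State.")
mentions = find_mentions(text, ["Blumenthal", "Times", "Nutmeg State"])
linked, nil = split_nil(mentions)
```

Here "Times" keeps two candidates (ambiguity), "Nutmeg State" maps to Connecticut (variability), and "Blumenthal" falls into the NIL list because the toy table has no entry for it.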

  12. High-level Algorithmic Approach
      Input: a text document d. Output: a set of pairs (m_i, t_i), where the m_i are mentions in d and each t_i is the corresponding Wikipedia title, or NIL.
      (1) Identify mentions m_i in d.
      (2) Local inference. For each m_i in d:
          • Identify a set of relevant titles T(m_i).
          • Rank titles t_i ∈ T(m_i). [E.g., consider local statistics of the occurrences of the edges (m_i, t_i), (m_i, *), and (*, t_i) in the Wikipedia graph.]
      (3) Global inference. For each document d:
          • Consider all m_i ∈ d and all t_i ∈ T(m_i).
          • Re-rank titles t_i ∈ T(m_i). [E.g., if m, m' are related by virtue of being in d, their corresponding titles t, t' may also be related.]
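
One simple way to use the edge statistics mentioned in step (2) is to rank each candidate title by count(m, t) / count(m, *), that is, how often the mention text links to that title among all links with the same text. The sketch below assumes this particular statistic; the slide does not commit to it, and the counts in `ANCHOR_COUNTS` are made-up numbers for illustration.

```python
from collections import Counter

# Hypothetical (mention, title) link counts; the numbers are invented.
ANCHOR_COUNTS = Counter({
    ("Chicago", "Chicago"): 900,
    ("Chicago", "Chicago (band)"): 80,
    ("Chicago", "Chicago (typeface)"): 20,
})

def rank_titles(mention, counts):
    """Rank candidate titles for `mention` by count(m, t) / count(m, *)."""
    total = sum(c for (m, _), c in counts.items() if m == mention)
    if total == 0:
        return []  # no known candidates: treat the mention as NIL
    scored = [(t, c / total) for (m, t), c in counts.items() if m == mention]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_titles("Chicago", ANCHOR_COUNTS)
```

With these counts the city ranks first for every occurrence of "Chicago", including the font and the band. That is the kind of error the global inference step is meant to correct.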

  13. Local approach
      Diagram: a text document with identified mentions, linked to Wikipedia articles by the local score of matching each mention to a title.
      • Γ is a solution to the problem (decomposed by m_i): a set of pairs (m, t)
      • m: a mention in the document
      • t: the matched Wikipedia title
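
The formula on slide 13 did not survive in this transcript. One plausible way to write a local objective that decomposes by mention is shown below. The symbol φ for the local matching score and the count N of mentions are assumed names, not taken from the slide text.

```latex
\Gamma^{*}_{\text{local}}
  = \arg\max_{\Gamma} \sum_{i=1}^{N} \phi(m_i, t_i),
\qquad
t_i^{*} = \arg\max_{t \in T(m_i)} \phi(m_i, t)
```

Because the sum has one independent term per mention, maximizing it reduces to picking the best-scoring title for each mention separately.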

  14. Global Approach: Using Additional Structure
      Diagram: text document(s) (news, blogs, …) linked to Wikipedia articles.
      Adding a "global" term to evaluate how good the structure of the solution is.
      • Use the local solutions Γ' (each mention considered independently).
      • Evaluate the structure based on pairwise coherence scores Ψ(t_i, t_j).
      • Choose those that satisfy document coherence conditions.
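
The re-ranking idea on slide 14 can be sketched as follows: take the local solution Γ' as context, then re-score each candidate title by its local score plus its coherence Ψ with the titles chosen for the other mentions. This is one reading of the slide under stated assumptions. The function names, the additive combination of the two scores, and all numbers are hypothetical.

```python
def local_solution(local_scores):
    """Gamma': the best title for each mention, considered independently."""
    return {m: max(cands, key=cands.get) for m, cands in local_scores.items()}

def global_rerank(local_scores, coherence):
    """Re-rank each mention's titles using coherence with Gamma'.

    local_scores: {mention: {title: local score}}
    coherence: {frozenset({t, u}): pairwise score}; missing pairs score 0.
    """
    gamma_prime = local_solution(local_scores)
    result = {}
    for mention, cands in local_scores.items():
        # Titles chosen locally for all the other mentions in the document.
        context = [t for other, t in gamma_prime.items() if other != mention]

        def total(title):
            pairwise = sum(coherence.get(frozenset((title, u)), 0.0) for u in context)
            return cands[title] + pairwise

        result[mention] = max(cands, key=total)
    return result

# Invented scores for the "Chicago" example from the earlier slides.
local_scores = {
    "Chicago": {"Chicago": 0.9, "Chicago (band)": 0.08, "Chicago (typeface)": 0.02},
    "OS 8": {"Mac OS 8": 1.0},
}
coherence = {frozenset(("Chicago (typeface)", "Mac OS 8")): 1.5}
linked = global_rerank(local_scores, coherence)
```

With these numbers the local solution picks the city for "Chicago", while the coherence term with "Mac OS 8" moves the final choice to the typeface.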
