Cross-Lingual Cross-Document Coreference with Entity Linking

Sean Monahan, John Lehmann, Timothy Nyberg, Jesse Plymale, and Arnold Jung
Language Computer Corporation
2435 North Central Expressway
Richardson, TX, USA
sean@languagecomputer.com

Abstract

This paper describes our approach to the 2011 Text Analysis Conference (TAC) Knowledge Base Population (KBP) cross-lingual entity linking problem. We recast the problem of entity linking as one of cross-document entity coreference. We compare an approach where deductive entity linking informs cross-document coreference to an inductive approach where coreference and linking judgements are mutually beneficial. We also describe our approach to cross-lingual entity linking, comparing a native linking approach with an approach utilizing machine translation. Our results show that inductive linking to a native language knowledge base offers the best performance.

1 Introduction

Entity linking is the task of associating entity mentions in text with entries in a knowledge base (KB). For example, when seeing the text “movie star Tom Cruise”, the text “Tom Cruise” should be linked to the Wikipedia page http://en.wikipedia.org/wiki/Tom_cruise. This is useful because it enables the automatic population of a KB with new facts about that entity extracted from the text. Conversely, existing information stored in the KB can be used to aid in more accurate text extraction. Correlation of entities between documents also benefits other cross-document natural language processing tasks like question answering and event coreference.

Entity linking is challenging for three primary reasons. First, names are often polysemous in that they are shared by different entities. Given a name in text, it must be disambiguated among the possible meanings. Wikipedia contains over 100 people with the name “John Williams”.

Second, entities are often characterized by synonymy, being referred to by different name variants or aliases. Recognizing all instances or mentions of an entity in text requires identifying all of its variants. Both “Cassius Clay” and “Muhammad Ali” refer to the same entity.

A third problem is identifying when an entity mentioned in text is not contained in the KB at all. Such a reference is said to be a NIL mention. Detecting NIL mentions is important not only to avoid creating spurious links, but also for identifying new candidates for addition to the KB. As many people as there are in Wikipedia, there are billions that are not. To create new KB entries, a system also needs to correctly generate links between the co-referring NIL entities. This would enable not only the automatic growth of a KB in terms of knowledge about known entities, but also in terms of previously unknown entities. This extension to the base problem has been described as entity linking with NIL clustering.

Entity linking with NIL clustering can be recast as a cross-document coreference approach where the cross-document and linking components are mutually beneficial. In both approaches, the challenges of polysemy and synonymy must be resolved. The difference is that entity linking uses a set of pre-existing identifiers supplied by the KB, thus facilitating integration of different knowledge stores. In cross-document coreference the identifiers created are implied by the cluster membership and are relative to the corpus.

We take an inductive approach which treats the problem as cross-document coreference with entity linking. Rather than only clustering the detected NIL mentions, we cluster all entities while using output from our entity linker as suggestions but not fact. This is counter to the deductive approach which first links all of the entities and then clusters the remaining NIL mentions. The inductive approach is illustrated in Figure 1.

Figure 1: Inductive Entity Linking

In doing so, we effectively use clustering to improve our entity linker performance and attain a better end-to-end score. The difference between these two approaches is described in Algorithms 1 and 2.

Algorithm 1 Deductive Approach
1. Link each entity mention to KB or assign NIL.
2. Cluster NIL mentions.
3. Assign each NIL cluster a unique NIL id.

Algorithm 2 Inductive Approach
1. Link each entity mention to KB or assign NIL.
2. Cluster ALL mentions with links as features.
3. Vote in each cluster to assign KB id or NIL.

We demonstrate our inductive approach for the TAC 2011 Knowledge Base Population (KBP) Entity Linking evaluation. In 2011, the entity linking task gained the additional requirement of NIL clustering. We participated both in the English monolingual as well as the English-Chinese cross-lingual tasks using a system which is largely language-independent. In the cross-lingual task, entity mentions from Chinese documents must also be mapped back to an English KB. We also report improvements to our entity linking system originally used in 2010 and show how those enhancements affected our end-to-end score.

2 Related Work

Over the past few years, TAC’s Knowledge Base Population task has been at the forefront of development in the area of entity linking. State-of-the-art approaches have recently been summarized by Ji and Grishman (2011). Several entity linking efforts preceded TAC, and used Wikipedia as a KB as well. Cucerzan (2007) formed an extensive mapping of surface text to Wikipedia pages and used it to maximize agreement between context and candidates being disambiguated. Milne and Witten (2008) used Wikipedia concepts as context terms to cross-link documents with Wikipedia articles. Lehmann et al. (2010) utilized a similar contextual model along with a number of other features in a system which achieved top entity linking performance at TAC 2010 KBP.

Our approach to cross-document coreference was shaped in part by the challenge of implementing supervised learning with highly imbalanced data sets [1]. A variety of techniques including under-sampling negative examples and over-sampling positive examples have been proposed to handle skewed distributions, e.g. Akbani et al. (2004). We chose to implement supervised learning over tractable subsets of mentions; in this case, we limited supervised learning to pairs of mentions that share the same text.

There are still relatively few examples of supervised cross-document coreferencing in the literature. Mayfield et al. (2009) implemented an SVM classifier for pairs of entity mentions in their cross-document coreference system. Entity mention clusters were formed by the transitive closure of the positive mention pairings classified by their model.

[1] Consider a set of 2,000 mentions with an average of four mentions per cluster and where a cluster is taken to represent an entity. In this case, there are 1,999,000 unique pairwise combinations of mentions. In a random draw of two mentions, there is only a 0.0002% chance that the pair will belong to the same cluster.
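The control flow of Algorithms 1 and 2 can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the system described in this paper: the `Mention` type, the function names, the KB id `E1`, and the `NIL0001`-style identifiers are hypothetical, and clustering is reduced to grouping mentions with identical surface text. In particular, the toy clustering does not use links as features (step 2 of Algorithm 2); linker output enters only through the vote in step 3.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Mention:
    mention_id: str
    text: str
    linker_guess: Optional[str]  # KB id proposed by the linker; None means NIL


def cluster_by_text(mentions: List[Mention]) -> List[List[Mention]]:
    """Toy stand-in for clustering: group mentions sharing the same surface text."""
    groups: Dict[str, List[Mention]] = defaultdict(list)
    for m in mentions:
        groups[m.text.lower()].append(m)
    return list(groups.values())


def deductive(mentions: List[Mention]) -> Dict[str, str]:
    # Step 1 is assumed done: each mention carries the linker's decision.
    # Linked mentions keep that decision as fact.
    result = {m.mention_id: m.linker_guess
              for m in mentions if m.linker_guess is not None}
    # Step 2: cluster only the mentions the linker marked NIL.
    nil_mentions = [m for m in mentions if m.linker_guess is None]
    # Step 3: give each NIL cluster its own identifier.
    for i, cluster in enumerate(cluster_by_text(nil_mentions), start=1):
        for m in cluster:
            result[m.mention_id] = f"NIL{i:04d}"
    return result


def inductive(mentions: List[Mention]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    nil_count = 0
    # Step 2: cluster ALL mentions, linked or not.
    for cluster in cluster_by_text(mentions):
        # Step 3: majority vote over linker suggestions (None counts as a NIL vote).
        votes = Counter(m.linker_guess for m in cluster)
        winner, _ = votes.most_common(1)[0]
        if winner is None:
            nil_count += 1
            winner = f"NIL{nil_count:04d}"
        for m in cluster:
            result[m.mention_id] = winner
    return result


example = [
    Mention("m1", "Tom Cruise", "E1"),
    Mention("m2", "Tom Cruise", "E1"),
    Mention("m3", "Tom Cruise", None),  # the linker missed this one
    Mention("m4", "Jane Roe", None),
    Mention("m5", "Jane Roe", None),
]

print(deductive(example))
print(inductive(example))
```

In this toy input, the deductive pass leaves `m3` in a NIL cluster of its own because the linker's NIL decision is final, whereas the inductive pass lets the two linked mentions in the same cluster outvote it.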

