Cross-lingual named entity disambiguation for concept translation

Cross-lingual named entity disambiguation for concept translation
Tadej Štajner, Jožef Stefan Institute
15 March 2012, Luxembourg
W3C Workshop: The Multilingual Web – The Way Ahead
ailab.ijs.si

Motivation
Translating proper names … can be problematic for statistical MT systems.
The HTML5 translate attribute helps, but someone still needs to do the actual mark-up.

Motivation (2)
How a name is handled depends on the source and target language:
  There are specific rules for translating (or transliterating) particular proper names or concepts.
  Sometimes they should not be translated at all.
Possible solution: figure out which entity is actually being mentioned and check whether a translated expression already exists for that entity, using a background knowledge base.
This turns the problem into named entity disambiguation.

Named entity disambiguation
[Figure: diagram with the labels Document, Entity, Label, and Mention]

Knowledge bases
Doing this requires good coverage of entities in the knowledge base (KB).
The usual choice is DBpedia.
It works well for the bigger languages (en).
What about languages with less coverage? As of Jan 2012, English has 3.9M articles, Slovene has 132k*.
*http://stats.wikimedia.org/EN/TablesArticlesTotal.htm

Cross-lingual named entity disambiguation
What if the input document and the knowledge base are in different languages?
  … there is no knowledge base for a particular language
  … the proper knowledge base is too sparse
Can we share these knowledge bases across languages, given that they have different coverage?

Important ranking features
Mention popularity: P(entity|mention)
  "Kashmir" … Kashmir_(song) = 0.05
  "Kashmir" … Kashmir_(region) = 0.91
  Captures the most likely entity behind the mention.
Context similarity: sim(ctx(mention), ctx(entity))
  Context of a mention: the surrounding sentences.
  Context of an entity: the description of the entity.
  Captures the entity that best fits the lexical context.
Coherence
  Entities that appear together tend to be related to one another.
  Usually solved by a greedy graph pruning algorithm.
  Collectively captures the entities that make sense appearing together.
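
The first two features can be sketched in a few lines. This is a minimal illustration, not the system described in the talk: the link counts, the entity descriptions, the example context, and the linear weighting in `rank` are all assumptions made for the example (the counts are chosen only so that the ratios match the Kashmir figures quoted above). The coherence feature is left out.

```python
import math
from collections import Counter

# Hypothetical anchor-text link counts, chosen only so that the ratios
# reproduce the probabilities quoted on the slide.
ANCHOR_COUNTS = {
    "Kashmir": {"Kashmir_(region)": 91, "Kashmir_(song)": 5, "(other senses)": 4},
}

# Hypothetical one-line entity descriptions standing in for KB abstracts.
DESCRIPTIONS = {
    "Kashmir_(region)": "region in the northern part of the indian subcontinent",
    "Kashmir_(song)": "song by the rock band led zeppelin from the album physical graffiti",
}

# Hypothetical context surrounding the mention "Kashmir".
CONTEXT = "the band played the song kashmir from their album"


def mention_popularity(mention):
    """Estimate P(entity | mention) as a relative link frequency."""
    by_entity = ANCHOR_COUNTS.get(mention, {})
    total = sum(by_entity.values())
    return {entity: n / total for entity, n in by_entity.items()} if total else {}


def bag_of_words(text):
    """Represent a text as word counts; each distinct word is one dimension."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(value * b.get(word, 0) for word, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(mention, context, weight=0.5):
    """Rank candidates by an assumed linear mix of popularity and context similarity."""
    prior = mention_popularity(mention)
    ctx = bag_of_words(context)
    scores = {
        entity: weight * prior.get(entity, 0.0)
        + (1 - weight) * cosine(ctx, bag_of_words(description))
        for entity, description in DESCRIPTIONS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank("Kashmir", CONTEXT, weight=1.0))  # popularity only
print(rank("Kashmir", CONTEXT, weight=0.0))  # context similarity only
```

With popularity alone the region ranks first; with context similarity alone the song does, which is why the two features are usually combined.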

What breaks when going cross-lingual?
Gathering candidate entities for a label
  Only works reliably for proper names, and even then only when there is no transliteration or the KB has the concept name in the local language.
Mention popularity (same problem)
Context similarity
  Similarity operates in a vector space that treats distinct words as dimensions.
  Across different languages the words don't line up, so the similarity is almost meaningless!
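
The context similarity failure is easy to reproduce. The sketch below compares an English entity description with an English context and with a Slovene context describing the same thing; all three sentences are made up for illustration. Note that even the name differs ("kašmir" vs. "kashmir"), which is the transliteration problem mentioned for candidate gathering.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as word counts; each distinct word is one dimension."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(value * b.get(word, 0) for word, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical English KB description of the entity.
entity_en = bag_of_words("kashmir is a region in the northern part of the indian subcontinent")

# Hypothetical mention contexts with the same meaning in two languages.
context_en = bag_of_words("troops moved into the northern part of the kashmir region")
context_sl = bag_of_words("vojaki so vstopili v severni del regije kašmir")

same_language = cosine(context_en, entity_en)
cross_language = cosine(context_sl, entity_en)

print(f"English context vs. English description: {same_language:.2f}")
print(f"Slovene context vs. English description: {cross_language:.2f}")
```

The Slovene context shares no dimensions with the English description, so its similarity carries no information about whether the entity fits.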

Cross-lingual context similarity
Instead of computing similarity directly, map the input document into the target language via a cross-lingual mapping, and compute similarity in that space.
[Figure: diagram with the labels Source document, Cross-lingual mapping, Mapped document, Cross-lingual similarity, Direct similarity, Knowledge base, and Entity]
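
A minimal sketch of the idea, assuming the mapping is already available as a sparse word-to-word weight table. The table `MAPPING` below is written by hand purely for illustration; in the talk the mapping is learned from data, and the example sentences are invented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-written mapping from Slovene words to weighted English words.
MAPPING = {
    "regije": {"region": 1.0},
    "kašmir": {"kashmir": 1.0},
    "severni": {"northern": 1.0},
    "del": {"part": 1.0},
}


def bag_of_words(text):
    """Represent a text as word counts; each distinct word is one dimension."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0) for word, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def map_document(source_vec, mapping):
    """Apply the linear mapping: project a source-language vector into target-language space."""
    mapped = defaultdict(float)
    for source_word, count in source_vec.items():
        for target_word, weight in mapping.get(source_word, {}).items():
            mapped[target_word] += count * weight
    return dict(mapped)


source_doc = bag_of_words("vojaki so vstopili v severni del regije kašmir")
entity = bag_of_words("kashmir is a region in the northern part of the indian subcontinent")

direct = cosine(source_doc, entity)
cross_lingual = cosine(map_document(source_doc, MAPPING), entity)

print(f"direct similarity:        {direct:.2f}")
print(f"cross-lingual similarity: {cross_lingual:.2f}")
```

Words the mapping does not know are simply dropped from the mapped document, so the quality of the result depends on how well the mapping covers the input.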

How do we obtain the mapping?
We train it on a parallel (or comparable) corpus.
This is not statistical MT: it only provides a linear mapping from one language space to another, which is an easier problem to solve.
CLIR technique: Canonical Correlation Analysis
Our implementation: EuroParl
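
To make "training a linear mapping from a parallel corpus" concrete, here is a deliberately simplified stand-in. It is not Canonical Correlation Analysis and it does not use EuroParl: it counts how often a source word and a target word occur in aligned sentence pairs and normalizes each row, which still yields a linear map between the two word spaces. The four sentence pairs are invented for the example.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus (Slovene, English).
PARALLEL = [
    ("rdeča hiša", "red house"),
    ("rdeča knjiga", "red book"),
    ("velika hiša", "big house"),
    ("velika knjiga", "big book"),
]


def train_mapping(pairs):
    """Learn a word-to-word weight table from aligned sentence pairs.

    Co-occurrence counting is used here as a simple substitute for CCA.
    """
    cooccurrence = defaultdict(lambda: defaultdict(float))
    for source_sentence, target_sentence in pairs:
        for source_word in source_sentence.lower().split():
            for target_word in target_sentence.lower().split():
                cooccurrence[source_word][target_word] += 1.0

    # Normalize each row so the weights for one source word sum to 1.
    mapping = {}
    for source_word, row in cooccurrence.items():
        total = sum(row.values())
        mapping[source_word] = {target: count / total for target, count in row.items()}
    return mapping


mapping = train_mapping(PARALLEL)
for source_word, row in sorted(mapping.items()):
    best = max(row, key=row.get)
    print(f"{source_word} -> {best} ({row[best]:.2f})")
```

The point of the slide carries over: no word order, grammar, or fluent output is modelled, only a projection from one vocabulary space into the other.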

Potential issues
If the mapping is weak because of low domain overlap, back off to direct similarity.
[Figure: diagram with the labels Source document, Cross-lingual mapping, Mapped document, Cross-lingual similarity, Direct similarity, Knowledge base, and Entity]
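
One way the back-off could look in code is shown below. The slide does not say how a weak mapping is detected, so the coverage measure (the share of input tokens the mapping knows) and the `min_coverage` threshold are assumptions, as are the hand-written mapping and the example texts.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-written mapping from Slovene words to weighted English words.
MAPPING = {
    "regije": {"region": 1.0},
    "kašmir": {"kashmir": 1.0},
    "severni": {"northern": 1.0},
    "del": {"part": 1.0},
}


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(value * b.get(word, 0) for word, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def map_document(source_vec, mapping):
    mapped = defaultdict(float)
    for source_word, count in source_vec.items():
        for target_word, weight in mapping.get(source_word, {}).items():
            mapped[target_word] += count * weight
    return dict(mapped)


def coverage(source_vec, mapping):
    """Share of input tokens that the mapping has an entry for (assumed weakness signal)."""
    total = sum(source_vec.values())
    covered = sum(count for word, count in source_vec.items() if word in mapping)
    return covered / total if total else 0.0


def context_similarity(source_text, entity_text, mapping, min_coverage=0.5):
    """Use the mapped similarity when coverage is sufficient, else back off to direct similarity."""
    source_vec = bag_of_words(source_text)
    entity_vec = bag_of_words(entity_text)
    if coverage(source_vec, mapping) >= min_coverage:
        return cosine(map_document(source_vec, mapping), entity_vec), "mapped"
    return cosine(source_vec, entity_vec), "direct"


in_domain = context_similarity(
    "severni del regije kašmir",
    "kashmir is a region in the northern part of the indian subcontinent",
    MAPPING,
)
out_of_domain = context_similarity(
    "koncert skupine led zeppelin",
    "led zeppelin were an english rock band",
    MAPPING,
)
print(in_domain)
print(out_of_domain)
```

In the second case the mapping knows none of the input words, but the untranslated proper name still gives the direct similarity something to match on.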

Future work
Re-use language and semantic resources to improve performance on NLP tasks across different languages
  FP7 - XLike
Lower the barrier for using this technology for enriching content within a CMS
  Standardization work in the W3C Multilingual Web – LT WG

How to make this technology useful?
Use these annotations within HTML.
Transparent to:
  Normal CMS operation
  Web browser rendering
Readable to:
  Localization workflow (terminology management - ITS)
  Downstream NLP processing (OLiA, NIF)
  Metadata crawlers (knowledge management)
  Training of MT systems

Demo
Example (RDFa Lite)
enrycher.ijs.si
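
The markup shown in the demo is not part of this transcript. As a rough idea of what such an annotation might look like, the sketch below wraps a disambiguated mention in a span that carries RDFa Lite attributes (`vocab`, `typeof`, `resource`) together with the HTML5 `translate` attribute. The helper name `annotate`, the choice of the schema.org vocabulary, and the example entity are assumptions, not the output format of enrycher.ijs.si.

```python
from html import escape


def annotate(surface, entity_iri, entity_type, translate=True):
    """Wrap a mention in a span with RDFa Lite attributes and a translate flag.

    Hypothetical helper: the attribute selection is an assumption for illustration.
    """
    flag = "yes" if translate else "no"
    return (
        f'<span translate="{flag}" vocab="http://schema.org/" '
        f'typeof="{escape(entity_type, quote=True)}" '
        f'resource="{escape(entity_iri, quote=True)}">'
        f"{escape(surface)}</span>"
    )


html = annotate(
    "Kashmir",
    "http://dbpedia.org/resource/Kashmir",
    "Place",
    translate=False,
)
print(html)
```

Because everything sits in attributes, the page renders the same in a browser, while localization tools, crawlers, and MT training pipelines can read the entity link and the translate flag.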
