SLIDE 1

Combining Probabilistic and Translation- Based Models for Information Retrieval based on Word Sense Annotations

Elisabeth Wolf, Delphine Bernhard, Iryna Gurevych
Ubiquitous Knowledge Processing (UKP) Lab
Prof. Dr. Iryna Gurevych
Fachbereich Informatik, Technische Universität Darmstadt

SLIDE 2

Motivation: monolingual task

[Pipeline diagram]
INDEXING: heuristic-based combinations of both annotations (UBC + NUS → Comb)
  • 1. Increase precision of WSD
RETRIEVAL: reranking of retrieved documents
  • 2. Apply translation-based model, in combination with the probabilistic model

02.10.09 | Computer Science Department | Ubiquitous Knowledge Processing Lab

SLIDE 3

  • 1. Increase precision of WSD
  • Four different index types: UBCBest, NUSBest, CombBest, CombBest+

UBC annotations:
<SYNSET SCORE="0.82" CODE="00735486-n"/>
<SYNSET SCORE="0.18" CODE="03857483-n"/>

NUS annotations:
<SYNSET SCORE="0.32" CODE="00735486-n"/>
<SYNSET SCORE="0.21" CODE="03857483-n"/>
<SYNSET SCORE="0.47" CODE="01252343-n"/>

CombBest sums the scores of matching sense annotations from both systems:
0.82 + 0.32 = 1.14
0.18 + 0.21 = 0.39
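The score-summation heuristic above can be sketched in a few lines of Python. The function and variable names are ours, not from the paper's Terrier setup, and the `keep_unmatched` flag is our guess at how CombBest+ handles single-system senses:

```python
# Heuristic combination of two WSD systems' sense annotations for one token.
# Each system maps a WordNet sense code to a confidence score.

def combine_senses(ubc, nus, keep_unmatched=False):
    """CombBest: sum the scores of sense codes assigned by both systems.
    With keep_unmatched=True (CombBest+-style), senses assigned by only
    one system are kept with their original score."""
    combined = {}
    for code in set(ubc) | set(nus):
        if code in ubc and code in nus:
            combined[code] = round(ubc[code] + nus[code], 2)
        elif keep_unmatched:
            combined[code] = ubc.get(code, nus.get(code))
    return combined

ubc = {"00735486-n": 0.82, "03857483-n": 0.18}
nus = {"00735486-n": 0.32, "03857483-n": 0.21, "01252343-n": 0.47}

print(combine_senses(ubc, nus))
# matching codes are summed: 0.82 + 0.32 = 1.14 and 0.18 + 0.21 = 0.39
```

The slide only shows the summation for matching codes; what exactly happens to unmatched senses in CombBest+ is not spelled out there.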

SLIDE 10

  • 1. Increase precision of WSD
  • CombBest+: example where the systems assign different sense codes

UBC annotations:
<SYNSET SCORE="0.82" CODE="00735486-n"/>
<SYNSET SCORE="0.18" CODE="03857483-n"/>

NUS annotations:
<SYNSET SCORE="0.32" CODE="0111222-n"/>
<SYNSET SCORE="0.21" CODE="0333444-n"/>
<SYNSET SCORE="0.47" CODE="01252343-n"/>

  • Terrier, version 2.1
  • Multi-field indices: token, lemma, sense (UBCBest, NUSBest, CombBest, CombBest+)
SLIDE 12

  • 2. Combination of Retrieval Models

Monolingual translation-based model (TM):

  • Motivation:
  • address the lexical gap problem
  • learn translation probabilities between terms, trained on a parallel dataset: dictionary and encyclopedic definitions
  • "the translation probability reflects the association between query term and document term"
  • Usage:
  • trained model recently successfully applied by Bernhard & Gurevych (2009) for answer finding
  • trained on tokens

Probabilistic model: Divergence From Randomness BM25 model (DFR_BM25)
Query expansion: Kullback-Leibler model (KL), using 10 terms out of the 3 top-ranked docs
Translation-based model: TM
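The KL query-expansion step ("10 terms out of 3 top ranked docs") is handled inside Terrier; the sketch below only illustrates the idea with a hand-rolled KL term weight. All names are ours, and the exact weighting Terrier 2.1 applies may differ in detail:

```python
import math
from collections import Counter

def kl_expansion_terms(top_docs, collection_tf, collection_len, n_terms=10):
    """Rank candidate expansion terms from the pseudo-relevant documents by the
    KL term weight P(t|feedback) * log2(P(t|feedback) / P(t|collection)) and
    return the n_terms best ones."""
    feedback = Counter()
    for doc in top_docs:                     # e.g. the 3 top-ranked documents
        feedback.update(doc)
    fb_len = sum(feedback.values())
    weights = {}
    for term, tf in feedback.items():
        p_fb = tf / fb_len
        p_coll = collection_tf.get(term, 1) / collection_len
        if p_fb > p_coll:                    # only terms over-represented in the feedback set
            weights[term] = p_fb * math.log2(p_fb / p_coll)
    return sorted(weights, key=weights.get, reverse=True)[:n_terms]

top_docs = [["bus", "transport", "city"], ["bus", "transport"], ["bus", "ticket"]]
collection_tf = {"bus": 10, "transport": 5, "city": 500, "ticket": 50}
print(kl_expansion_terms(top_docs, collection_tf, collection_len=100_000, n_terms=2))
# terms frequent in the feedback docs but rare in the collection rank first
```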

SLIDE 15

  • 2. Combination of Retrieval Models
  • Hypothesis: probabilistic and translation-based models retrieve different sets of relevant documents

DFR_BM25 + KL is run on the token, lemma, and sense indices; TM is run on the token index. Each run produces a ranked list, and the lists are merged:

A) normalization: r_norm(i) = (r_orig(i) - r_min) / (r_max - r_min)
B) CombSUM by Fox & Shaw (1994): r_comb(i) = sum of the individual r_norm(i)
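Steps A) and B) translate directly into code; the doc IDs and scores below are made up for illustration:

```python
def minmax_normalize(run):
    """A) r_norm(i) = (r_orig(i) - r_min) / (r_max - r_min) for one ranked run."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:                                   # degenerate run: all scores equal
        return {doc: 0.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}

def comb_sum(*runs):
    """B) CombSUM (Fox & Shaw, 1994): sum the normalized scores per document."""
    fused = {}
    for run in map(minmax_normalize, runs):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + score
    return fused

dfr_kl = {"d1": 12.0, "d2": 8.0, "d3": 4.0}        # e.g. a DFR_BM25 + KL run
tm     = {"d2": 0.9,  "d3": 0.6, "d4": 0.3}        # e.g. a translation-model run
fused = comb_sum(dfr_kl, tm)
print(sorted(fused, key=fused.get, reverse=True))
# a document retrieved by both runs ("d2") rises to the top
```

Normalizing first matters because DFR_BM25 and TM scores live on different scales; without step A) the run with the larger raw scores would dominate the sum.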

SLIDE 17

Extrinsic evaluation: sense index types

  • Retrieval based on indexed senses (DFR_BM25 + KL):
  • CombBest+ outperforms CombBest
  • Focus on "combined" indices

Index type | MAP (training) | MAP (test)
UBCBest    | 0.2514         | 0.2636
NUSBest    | 0.2930         | 0.3473
CombBest   | 0.2921         | 0.3313
CombBest+  | 0.3011         | 0.3551
SLIDE 21

Retrieval results on test data

  • highest MAP on lemmas
  • weights learnt on training data
  • improvement when combined with TM
  • in combination, CombBest and CombBest+ show similar performance
  • in most cases no improvement when utilizing word senses
  • best performance: combination of lemma and token (translation)

* officially submitted

SLIDE 27

Discussion & Conclusions

  • Translation-based model
  • performance of stand-alone method lower than probabilistic model
  • straightforward use
  • different tokenization schemes of collection and translation-based model: "public_transport" vs. "public" and "transport"
  • 61 out of 160 test topics contain at least one multi-word expression
  • combination always improves performance
  • promising for future work
  • train translation model with multi-word expressions
  • train translation model on lemmas or senses

SLIDE 29

Discussion & Conclusions

  • Utilizing word sense annotations
  • heuristic-based combination of both sense annotations
  • extrinsic evaluation shows improvement
  • however, no overall improvement: how can word senses annotated at 65% accuracy be used to achieve improvements in IR?

SLIDE 30

Thank you!

http://www.ukp.tu-darmstadt.de

Ubiquitous Knowledge Processing Lab

SLIDE 31

References

  • Adam Berger and John Lafferty. Information Retrieval as Statistical Translation. In Proceedings of the 1999 ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pages 222-229, 1999.
  • Delphine Bernhard and Iryna Gurevych. Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 728-736, Suntec, Singapore, August 2009.
  • Edward A. Fox and Joseph A. Shaw. Combination of Multiple Searches. In Proceedings of the 2nd Text REtrieval Conference (TREC-2), pages 243-252, 1994.
  • Iadh Ounis, Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald, and Christina Lioma. Terrier: A High Performance and Scalable Information Retrieval Platform. In Proceedings of the ACM SIGIR'06 Workshop on Open Source Information Retrieval (OSIR 2006), 2006.
  • Stephen E. Robertson, Steve Walker, Micheline Hancock-Beaulieu, Mike Gatford, and A. Payne. Okapi at TREC-4. In NIST Special Publication 500-236: The Fourth Text REtrieval Conference (TREC-4), pages 73-96, 1995.
  • Xiaobing Xue, Jiwoon Jeon, and W. Bruce Croft. Retrieval Models for Question and Answer Archives. In SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 475-482, New York, NY, USA, 2008.

SLIDE 32

Index types: statistics

Index type | # tokens  | # senses  | # tokens without sense
UBCBest    | 40.7 Mil. | 34.1 Mil. | 6.6 Mil.
NUSBest    | 40.7 Mil. | 34.5 Mil. | 6.2 Mil.
CombBest   | 40.7 Mil. | 31.7 Mil. | 9.0 Mil.
CombBest+  | 40.7 Mil. | 35.1 Mil. | 5.6 Mil.

(The total token count, 40.7 Mil., is the same for all index types; each row's sense counts add up to it.)

SLIDE 33

Official results

Without WSD:              With WSD:
ukp          0.4509       ukp          0.4500
reina        0.4452       uniba        0.4346
uniba        0.4250       know-center  0.4222
geneva       0.4171       reina        0.4123
know-center  0.4170       jaen         0.3819

SLIDE 34

Official results: R-Precision

Without WSD:               With WSD:
reina         0.4354       ukp           0.4243
ukp           0.4301       uniba*        0.4153
uniba*        0.4128       know-center*  0.4061
geneva*       0.4043       reina         0.4024
know-center*  0.4013       jaen*         0.3651

SLIDE 35

Translation Models in IR

  • The translation probabilities have to be trained
  • Training data: parallel texts, aligned at the sentence level
  • Use of IBM translation models implemented in GIZA++

This is a very small house. C'est une maison minuscule.
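The slides rely on GIZA++ for this training step. As a self-contained illustration of what the IBM Model 1 EM loop estimates (a toy sketch on an invented three-pair bitext, not a reimplementation of GIZA++):

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """Estimate word translation probabilities T(f|e) with IBM Model 1 EM
    from sentence-aligned pairs (english_tokens, foreign_tokens)."""
    t = defaultdict(lambda: 1.0)                     # uniform-ish initialization
    for _ in range(iterations):
        count = defaultdict(float)                   # expected counts c(f, e)
        total = defaultdict(float)                   # expected counts c(e)
        for es, fs in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)       # normalize over possible alignments
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        t = defaultdict(float,
                        {(f, e): count[(f, e)] / total[e] for (f, e) in count})
    return t

bitext = [(["small", "house"], ["petite", "maison"]),
          (["small", "dog"],   ["petit", "chien"]),
          (["the", "house"],   ["la", "maison"])]
t = ibm_model1(bitext)
# co-occurrence across sentence pairs pulls "maison" towards "house"
```

GIZA++ additionally trains the higher IBM models (distortion, fertility); Model 1 alone already yields the lexical table T(q|w) that the retrieval model needs.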


SLIDE 36

Parallel dataset: dictionary and encyclopaedic definitions (LSR)

* based on slides of Delphine Bernhard

SLIDE 37

Translation Models in IR

  • Retrieval:

sim(Q, D) = Π_{q ∈ Q} P(q | M_D)
P(q | M_D) = Σ_{w ∈ D} T(q | w) · P(w | M_D)

where T(q|w) is the probability that a query term q is the translation of a document term w

  • The translation probability reflects the association between query term and document term
  • Translation models allow inexact matching to address the issues of synonymy and polysemy
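A direct reading of the two formulas in Python; the toy translation table and document language models below are invented for illustration:

```python
import math

def tm_score(query, doc_terms, trans, doc_lm, eps=1e-12):
    """sim(Q,D) = prod_{q in Q} P(q|M_D), with
    P(q|M_D) = sum_{w in D} T(q|w) * P(w|M_D).
    Returned in log space to avoid numerical underflow on long queries."""
    log_sim = 0.0
    for q in query:
        p_q = sum(trans.get((q, w), 0.0) * doc_lm.get(w, 0.0) for w in doc_terms)
        log_sim += math.log(p_q + eps)       # eps keeps log finite for unseen terms
    return log_sim

# Toy translation table: "house" can "translate" to its synonym "home".
trans = {("house", "house"): 0.8, ("house", "home"): 0.4, ("car", "car"): 0.9}
doc_a = ["home", "garden"]; lm_a = {"home": 0.5, "garden": 0.5}
doc_b = ["car", "engine"];  lm_b = {"car": 0.5, "engine": 0.5}

# The query "house" scores higher on doc_a through the synonym "home",
# even though doc_a contains no exact match for "house".
print(tm_score(["house"], doc_a, trans, lm_a) > tm_score(["house"], doc_b, trans, lm_b))
```

This is exactly the inexact-matching behaviour the bullet above describes: a nonzero T(q|w) lets a synonym in the document contribute to the query term's probability.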


SLIDE 38

Extrinsic evaluation: WSD combination

  • Retrieval based on indexed senses (DFR_BM25 + KL):
  • NUS WSD system seems to perform better
  • SemEval-2007 all-words WSD subtask: NUS 0.587; UBC 0.544
  • CombBest+ outperforms CombBest
  • Focus on "combined" indices

Index type | MAP (training) | MAP (test)
UBCBest    | 0.2514         | 0.2636
NUSBest    | 0.2930         | 0.3473
CombBest   | 0.2921         | 0.3313
CombBest+  | 0.3011         | 0.3551