Combining Probabilistic and Translation-Based Models for Information Retrieval based on Word Sense Annotations

  1. Combining Probabilistic and Translation-Based Models for Information Retrieval based on Word Sense Annotations
     Elisabeth Wolf, Delphine Bernhard, Iryna Gurevych
     Ubiquitous Knowledge Processing (UKP) Lab, Prof. Dr. Iryna Gurevych
     Department of Computer Science (Fachbereich Informatik), Technische Universität Darmstadt

  2. Motivation: monolingual task
     1. Increase the precision of WSD: heuristic-based combinations of both annotations (UBC and NUS) into a combined annotation (Comb), at indexing time
     2. Apply a translation-based model and combine it with a probabilistic model: reranking of the retrieved documents, at retrieval time
     [Diagram: INDEXING: ranked lists labelled UBC and NUS merged into a list labelled Comb; RETRIEVAL: reranking of retrieved documents]
     02.10.09 | Computer Science Department | Ubiquitous Knowledge Processing Lab

  9. 1. Increase precision of WSD
     • Four different index types:
       • UBCBest: best-scoring synset of the UBC annotation
       • NUSBest: best-scoring synset of the NUS annotation
       • CombBest: synset with the highest summed score over both annotations
     • Example annotations of one word (synset code and WSD score):
       UBC: <SYNSET SCORE="0.32" CODE="00735486-n"/>
            <SYNSET SCORE="0.21" CODE="03857483-n"/>
            <SYNSET SCORE="0.47" CODE="01252343-n"/>
       NUS: <SYNSET SCORE="0.82" CODE="00735486-n"/>
            <SYNSET SCORE="0.18" CODE="03857483-n"/>
     • CombBest:
       00735486-n: 0.82 + 0.32 = 1.14  (CombBest)
       03857483-n: 0.18 + 0.21 = 0.39
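
The selection behind UBCBest, NUSBest and CombBest can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that the first column of the slide's example is the UBC annotation and the second the NUS annotation, and that CombBest keeps the synset with the highest summed score. The function names are hypothetical.

```python
from collections import defaultdict

# Toy annotation of one word, taken from the slide: synset code -> WSD score.
# Assumption: first slide column = UBC, second slide column = NUS.
UBC = {"00735486-n": 0.32, "03857483-n": 0.21, "01252343-n": 0.47}
NUS = {"00735486-n": 0.82, "03857483-n": 0.18}


def best_sense(annotation):
    """UBCBest / NUSBest: the single highest-scoring synset of one system."""
    return max(annotation, key=annotation.get)


def comb_best(*annotations):
    """CombBest: add up each synset's scores over all systems, keep the top one."""
    totals = defaultdict(float)
    for annotation in annotations:
        for synset, score in annotation.items():
            totals[synset] += score
    return max(totals, key=totals.get), dict(totals)


best, totals = comb_best(UBC, NUS)
print(best_sense(UBC))                # 01252343-n
print(best_sense(NUS))                # 00735486-n
print(best, round(totals[best], 2))   # 00735486-n 1.14
```

With these numbers the two systems disagree on their best sense, and the summed scores side with 00735486-n, the synset to which both systems assign a score.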

  11. 1. Increase precision of WSD
      • Four different index types: UBCBest, NUSBest, CombBest, CombBest+
      • Modified example, in which the two annotations share no synset:
        UBC: <SYNSET SCORE="0.32" CODE="0111222-n"/>
             <SYNSET SCORE="0.21" CODE="0333444-n"/>
             <SYNSET SCORE="0.47" CODE="01252343-n"/>
        NUS: <SYNSET SCORE="0.82" CODE="00735486-n"/>
             <SYNSET SCORE="0.18" CODE="03857483-n"/>
      • Terrier, version 2.1
      • Multi-field indices: token, lemma, sense (UBCBest, NUSBest, CombBest, CombBest+)

  12. 2. Combination of Retrieval Models
      Probabilistic model + query expansion:
      • Divergence From Randomness BM25 model (DFR_BM25)
      • Query expansion: Kullback-Leibler model (KL), 10 terms from the 3 top-ranked documents
      Translation-based model:
      • Monolingual translation-based model (TM)
      • Motivation: address the lexical gap problem
      • Learns translation probabilities between terms, trained on a parallel dataset of dictionary and encyclopedic definitions
      • "The translation probability reflects the association between query term and document term"
      • Usage: the trained model was recently applied successfully to answer finding by Bernhard & Gurevych (2009)
      • Trained on tokens
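
The KL query expansion step (10 terms from the 3 top-ranked documents) can be illustrated with the usual Kullback-Leibler term weight w(t) = P_R(t) · log2(P_R(t) / P_C(t)), where P_R is the term distribution in the top-ranked (pseudo-relevant) documents and P_C the distribution in the whole collection. This is a sketch of the general technique, not Terrier's implementation: Terrier's KL model may normalize differently, and the reweighting of the expanded query is left out. The toy documents and counts are made up.

```python
import math
from collections import Counter


def kl_expansion_terms(top_docs, collection_tf, collection_len, n_terms=10):
    """Rank expansion terms from pseudo-relevant documents by KL weight.

    top_docs       -- token lists of the top-ranked documents (e.g. the top 3)
    collection_tf  -- Counter with each term's frequency in the whole collection
    collection_len -- total number of tokens in the collection
    """
    feedback_tf = Counter(t for doc in top_docs for t in doc)
    feedback_len = sum(feedback_tf.values())
    weights = {}
    for term, tf in feedback_tf.items():
        if collection_tf[term] == 0:
            continue  # inconsistent statistics: the term cannot be scored
        p_r = tf / feedback_len                     # P(t | top documents)
        p_c = collection_tf[term] / collection_len  # P(t | collection)
        if p_r > p_c:  # keep only terms over-represented in the top documents
            weights[term] = p_r * math.log2(p_r / p_c)
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n_terms]


# Hypothetical top-3 documents and collection statistics (1000 tokens).
top3 = [["bank", "river", "water"],
        ["river", "flood", "water"],
        ["river", "bank", "erosion"]]
coll = Counter({"river": 5, "water": 20, "bank": 30,
                "flood": 3, "erosion": 2, "the": 940})
top2 = [t for t, _ in kl_expansion_terms(top3, coll, sum(coll.values()), n_terms=2)]
print(top2)  # ['river', 'water']
```

Frequent collection words such as "bank" are damped by P_C, so a term ranks high only if it is both frequent in the top documents and comparatively rare in the collection.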

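
The slides do not give the scoring formula of the translation-based model. A common formulation, assumed here, scores a document D for a query q as log P(q|D) = Σ_w log[(1 − λ) Σ_t P(w|t) P_ml(t|D) + λ P_ml(w|C)], where P(w|t) is the learned probability that document term t "translates" into query term w, P_ml are maximum-likelihood estimates, and λ mixes in the collection model C. The translation table, λ = 0.2 and the toy data below are hypothetical.

```python
import math
from collections import Counter


def tm_log_score(query, doc, trans_prob, coll_tf, coll_len, lam=0.2):
    """log P(query | doc) under a translation-based document model.

    trans_prob[(w, t)] -- P(w | t): probability that document term t
                          "translates" into query term w (hypothetical table)
    lam                -- weight of the collection (smoothing) model
    """
    doc_tf = Counter(doc)
    score = 0.0
    for w in query:
        # Sum over document terms t of P(w | t) * P_ml(t | D).
        p_trans = sum(trans_prob.get((w, t), 0.0) * tf / len(doc)
                      for t, tf in doc_tf.items())
        p = (1 - lam) * p_trans + lam * coll_tf[w] / coll_len
        if p == 0.0:
            return float("-inf")  # w unexplained by document and collection
        score += math.log(p)
    return score


# Hypothetical translation probabilities, e.g. learned from parallel definitions.
trans = {("car", "car"): 0.6, ("car", "automobile"): 0.3, ("car", "engine"): 0.05}
coll = Counter({"car": 10, "automobile": 5, "engine": 8,
                "banana": 4, "fruit": 6, "the": 967})
coll_len = sum(coll.values())
d1 = ["automobile", "engine"]  # no literal occurrence of "car"
d2 = ["banana", "fruit"]
s1 = tm_log_score(["car"], d1, trans, coll, coll_len)
s2 = tm_log_score(["car"], d2, trans, coll, coll_len)
print(s1 > s2)  # True
```

Because P(car | automobile) > 0, the first document gets a much higher probability for the query term "car" even though it never contains that word, which is how a translation model addresses the lexical gap.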

  16. 2. Combination of Retrieval Models
      • Hypothesis: probabilistic and translation-based models retrieve different sets of relevant documents
      • [Diagram: ranked lists from TM (token field) and from DFR_BM25 + KL (token, lemma and sense fields) merged into one final ranked list]
      A) Normalization, per run, with $r_{min}$ and $r_{max}$ the lowest and highest original scores in that run:
         $r_{norm}(i) = \frac{r_{orig}(i) - r_{min}}{r_{max} - r_{min}}$
      B) CombSUM by Fox & Shaw (1994), summing over the individual runs $s$:
         $r_{comb}(i) = \sum_{s} r^{(s)}_{norm}(i)$
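
Normalization (A) and CombSUM (B) map directly onto code. In this sketch each run is a non-empty dict from document IDs to original scores; a document missing from a run receives nothing from that run, and a run whose scores are all equal is mapped to 1.0 (an assumption, since the formula is undefined when r_max = r_min). Run names and scores are made up.

```python
def min_max_normalize(run):
    """(A) r_norm(i) = (r_orig(i) - r_min) / (r_max - r_min) within one run."""
    r_min, r_max = min(run.values()), max(run.values())
    if r_max == r_min:
        return {doc: 1.0 for doc in run}  # assumption: formula undefined here
    return {doc: (s - r_min) / (r_max - r_min) for doc, s in run.items()}


def comb_sum(runs):
    """(B) CombSUM: sum each document's normalized scores over all runs.

    A document absent from a run simply receives nothing from that run.
    Returns (doc, score) pairs, best first.
    """
    combined = {}
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            combined[doc] = combined.get(doc, 0.0) + s
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical runs: TM on tokens, DFR_BM25 + KL on senses.
tm_token = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
bm25_sense = {"d2": 30.0, "d3": 25.0, "d4": 10.0}
print(comb_sum([tm_token, bm25_sense]))
# [('d2', 1.5), ('d1', 1.0), ('d3', 0.75), ('d4', 0.0)]
```

Normalizing first matters because the two models produce scores on unrelated scales; without it, the run with the larger raw scores would dominate the sum.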

  17. Extrinsic evaluation: sense index types
      • Retrieval based on indexed senses (DFR_BM25 + KL):

        Index type | MAP (training) | MAP (test)
        -----------|----------------|-----------
        UBCBest    | 0.2514         | 0.2636
        NUSBest    | 0.2930         | 0.3473
        CombBest   | 0.2921         | 0.3313
        CombBest+  | 0.3011         | 0.3551

      • CombBest+ outperforms CombBest and achieves the highest MAP of the four index types on both training and test data
      • Focus on "combined" indices
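
MAP, the measure in the table, is the mean over queries of average precision (AP). The slides do not say which evaluation tool produced these figures, so the sketch below only illustrates the standard definition, in which AP is divided by the total number of relevant documents, including those never retrieved. The rankings are hypothetical.

```python
def average_precision(ranking, relevant):
    """AP for one query: sum of precision@k over the ranks k holding a
    relevant document, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """MAP: mean AP over (ranking, relevant_set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical rankings for two queries.
queries = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6", "d7"}),              # AP = (1/2) / 2; d7 never retrieved
]
print(round(mean_average_precision(queries), 4))  # 0.5417
```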
