
From IR → WSD to IR ← WSD, by Julio Gonzalo



  1. From IR → WSD to IR ← WSD
     Julio Gonzalo, UNED

  2. IR → WSD @ UNED: initial motivation
     • 1997: EuroWordNet: let’s use it!
     • 1998: (manual) indexing with synsets +29%
     • 1999: Sanderson pseudo-senses vs. WordNet synsets (EMNLP)
     • 1999: WSD versus first sense heuristic (SIGLEX)
     • 2000: ITEM conceptual search engine

  3. WSD strategy Conceptual versus textual indexing

  4. ITEM search engine
     • Scalable to several languages
     • Conceptual query expansion
     • Translations via hyperonym relations (e.g. governor’s race)
     but
     • Granularity
     • Indexing units versus translation units
       – Words are not good for translation
         » (té cargado/strong tea)
       – Phrases are not good for indexing
         » “word+sense+disambiguation”/“sense tagging”
     → Is Word Sense Disambiguation an issue for the semantic web?

  5. Website Term Browser
     Interaction steps: QUERY, EXPLORE DOCUMENT, EXPLORE PHRASE, RECONSULT WITH PHRASE

  6. Website Term Browser: WTB Evaluation
     • 1523 sessions with interaction
     • average 5.11 actions per session
     • explore phrase used in 65.13% of sessions

     First action after QUERY     All queries   1 word queries   >1 word queries
     DOC                          40.70%        45.49%           37.30%
     PHRASE                       51.14%        45.65%           55.05%
     RECONSULT                    8.141%        8.846%           7.640%

     Last action before ending
     session with explore DOC     All queries   1 word queries   >1 word queries
     QUERY                        48.74%        53.38%           45.15%
     PHRASE                       42.95%        40.85%           44.57%
     RECONSULT                    8.306%        5.764%           10.27%

  7. Is WSD easier than MT/CLIR?
     abortion: aborto
     issue: tema, asunto, número, edición, emisión
     Corpus evidence for "abortion issue":
     • tema del aborto
     • asunto del aborto
     • asuntos como el aborto
     • asuntos del aborto
     • temas como el aborto
     • asunto aborto
     Alignment without parallel corpora: abortion issue / tema del aborto

  8. Results on CLEF comparable corpus

     Spanish
     Size   # Phrases    # Aligned
     2      6,577,763    2,004,760
     3      7,623,168    252,795

     English
     Size   # Phrases    # Aligned
     2      3,830,663    1,456,140
     3      3,058,698    198,956

  9. Results on CLEF corpus

     2 lemmas       Algorithm        Random selection
                    EN      ES       EN      ES
     + frequent     .83     .80      .02     .02
     - frequent     .66     .54      .02     .02

     3 lemmas       Algorithm        Random selection
                    EN      ES       EN      ES
     + frequent     .94     .80      .004    .005
     - frequent     .81     .62      .004    .004

  10. Noun Phrase translation
      1) Select aligned sub-phrase with most frequent translation
      2) Discard overlapping sub-phrases
      3) Iterate.
      Example: advances in treatment of a wide variety of diseases
      Sub-phrases: advances in treatment; treatment of a wide; wide variety; variety of diseases

  11. advances in treatment of a wide variety of diseases
      Sub-phrases: advances in treatment; treatment of a wide; wide variety; variety of diseases
      Translated fragments: tipo de enfermedades

  12. advances in treatment of a wide variety of diseases
      Sub-phrases: advances in treatment; treatment of a wide; wide variety (amplio); variety of diseases
      Translated fragments: tipo de enfermedades

  13. advances in treatment of a wide variety of diseases
      Sub-phrases: advances in treatment; treatment of a wide; wide variety (amplio); variety of diseases
      Translated fragments: avances en el tratamiento; tipo de enfermedades

  14. advances in treatment of a wide variety of diseases
      Sub-phrases: advances in treatment; treatment of a wide (amplio); wide variety (amplio); variety of diseases
      Translated fragments: avances en el tratamiento; tipo de enfermedades

  15. advances in treatment of a wide variety of diseases
      Sub-phrases: advances in treatment; treatment of a wide; wide variety; variety of diseases
      Translation: avances en el tratamiento amplio tipo de enfermedades
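
The three selection steps on slide 10 (pick the aligned sub-phrase with the most frequent translation, discard the sub-phrases that overlap it, iterate) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Candidate` class, the word-index spans, and all frequency counts are hypothetical, because the slides do not give the actual counts. The sketch also leaves out how leftover words such as "wide" (amplio) end up in the final translation on slide 15.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one aligned sub-phrase."""
    start: int          # index of first word in the source phrase
    end: int            # index of last word (inclusive)
    translation: str    # most frequent aligned translation
    frequency: int      # made-up count, for illustration only


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two word spans share at least one position."""
    return a.start <= b.end and b.start <= a.end


def select_subphrases(candidates: List[Candidate]) -> List[Candidate]:
    """Greedy selection: take the most frequent, drop overlaps, repeat."""
    remaining = list(candidates)
    selected = []
    while remaining:
        best = max(remaining, key=lambda c: c.frequency)
        selected.append(best)
        # 'best' overlaps itself, so it is removed here as well.
        remaining = [c for c in remaining if not overlaps(c, best)]
    return sorted(selected, key=lambda c: c.start)


# Words: advances(0) in(1) treatment(2) of(3) a(4) wide(5) variety(6) of(7) diseases(8)
# Frequencies are invented; the slides only show "(amplio)" next to the
# two overlapping sub-phrases.
candidates = [
    Candidate(0, 2, "avances en el tratamiento", 20),
    Candidate(2, 5, "(amplio)", 5),
    Candidate(5, 6, "(amplio)", 10),
    Candidate(6, 8, "tipo de enfermedades", 50),
]

chosen = select_subphrases(candidates)
print(" ".join(c.translation for c in chosen))
```

With these invented counts the sketch keeps "advances in treatment" and "variety of diseases" and discards the two sub-phrases that overlap them.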

  16. Is this document relevant? Source: Oard 2000

  17. UNED @ iCLEF’2001: Systran

  18. UNED @ iCLEF’2001: Noun phrases

  19. UNED @ iCLEF’2001 Results

      System       Precision     Recall
      Systran MT   0.48          0.22
      UNED NPs     0.47 (-2%)    0.34 (+52%)

      cf. U. Maryland experiment: word-by-word translation substantially worse than Systran.

  20. UNED @ iCLEF’2002 CLIR Query formulation
      Reference system
      • Assisted word-by-word translation.
      UNED system
      • Assisted formulation by phrases.
      • Automatic translation using alignment.

  21. UNED @ iCLEF’2002: UNED query formulation

  22. UNED relevance feedback

  23. UNED relevance feedback

  24. UNED @ iCLEF’2002 Results

      System      F (α = 0.8)
      Reference   .23
      UNED        .37 (+65%)

      Statistical significance: p < 0.05, linear mixed-effects model + ANOVA.
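
The F α=0.8 figure on slide 24 presumably refers to van Rijsbergen's F measure with α = 0.8, a weighting that favors precision over recall. The slide does not spell out the formula, so the definition below is an assumption, and the input values in the usage lines are arbitrary rather than taken from the slides.

```python
def f_alpha(precision: float, recall: float, alpha: float = 0.8) -> float:
    """Assumed definition: F_alpha = 1 / (alpha / P + (1 - alpha) / R).

    alpha = 0.8 weights precision more heavily than recall;
    alpha = 0.5 gives the usual balanced F1.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Arbitrary illustrative values (not from the slides): swapping precision
# and recall changes the score because alpha is not 0.5.
high_precision = f_alpha(0.5, 0.25)
high_recall = f_alpha(0.25, 0.5)
print(high_precision > high_recall)
```

Under this definition a system gains more from a point of precision than from a point of recall, which is worth keeping in mind when reading the comparison between the Reference and UNED systems.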

  25. UNED @ iCLEF’2002 Initial query formulation

      System      Average time
      Reference   286 s.
      UNED        44 s.

  26. UNED @ iCLEF’2002 Initial query formulation

      System      P @ 20
      Reference   .19
      UNED        .29

  27. And what about WSD?
      • Supervised systems have little to be supervised with...
        – Research on unsupervised systems (Senseval 2)
        – Solve the acquisition bottleneck of supervised systems: obtain training instances automatically.
      • Better understanding of the problem: sense inventories, test suites, polysemy.

  28. WSD is harder than the applications
      • IR → WSD: automatic assignment of web directories to word senses (Computational Linguistics, to appear)
      • MT → WSD: use aligned phrases for partial disambiguation (no need for parallel corpora!) (work in progress)
      • WSD: go to the basics: study sense inventories, and polysemy distinctions for clustering (SIGLEX 00, 02)
