
Cross-Lingual Word Sense Disambiguation using WordNets and Context



  1. Cross-Lingual Word Sense Disambiguation using WordNets and Context Mapping
     Priyank Jaini, Ankit Agrawal ({pjaini,ankitag}@iitk.ac.in)
     Department of Mathematics and Statistics, IIT Kanpur
     Advisor: Prof. Amitabha Mukerjee
     Date: March 21, 2013

  2. What is Word Sense Disambiguation (WSD)?
     - Assigning the correct sense (meaning in context) to a word in a sentence when the word can have multiple meanings
     - Examples:
       - I was standing on the bank of river Ganga.
       - Mr. Bank owns a bank.
     Importance and Motivation
     - Applications: machine translation, lexicography, semantic interpretation, information retrieval, etc.
     - Hindi lacks resources
     - Can be used to create or enrich sense-tagged data

  3. Our Approach
     ● Parallel Corpus and Alignment
     ● English WSD
     ● Synset Mapping
     ● Transfer to Hindi

  4. Methodology and Algorithms Used
     1) Parallel corpus for Hindi-English (EMILLE)
     2) Alignment of text using the Church and Gale algorithm
     3) English WSD on the English text
     4) Synset mapping using [10]
     5) Transfer senses to the Hindi text
     Figure taken from [1]

  5. The English Word Sense Disambiguation (Step 3)
     - We shall use “WordNet::SenseRelate::AllWords”
     - It uses the Lesk algorithm for disambiguation
     - The English WordNet will be used for English WSD
     - After this step, we will have a sense-tagged English text

  6. Synset Mapping (Step 4)
     - Takes an English synset as input and outputs the best-matching Hindi synset
     - Relies on the fact that, in WordNet, the first word in a synset best represents the sense of that synset
     - The hypernymy relation is the basis for finding the best match
     - Within the hypernymy hierarchies, a weighted formula given in [10] is used to determine the best synset

  7. Synset Mapping
     - Candidate synsets: find the Hindi translations of the first word in the input synset, then collect the Hindi synsets that contain one or more of these translations
     - The hypernymy hierarchies of these candidate synsets are found; they are called candidate hierarchies
     - The hypernymy hierarchy of the input English synset is also obtained
     - For each synset in the English hypernymy hierarchy, the Hindi translations of all the words occurring in it are found
     - These Hindi words are looked up in the candidate hierarchies. Each candidate synset starts with a weight of zero; whenever a match is found, the weight of that candidate synset is increased
     - The total weight of each candidate hierarchy is computed, and the candidate with the highest weight is mapped to the English synset
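The matching procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `EN_TO_HI`, the synset tables `HI_SYNSETS` and `HI_HYPERNYMS`, and all synset ids are hypothetical toy data (with transliterated Hindi words), each match adds a unit weight rather than the weighted formula of [10], and a candidate hierarchy is taken to mean the ancestors of the candidate synset only.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy bilingual dictionary: English word -> Hindi translations.
EN_TO_HI: Dict[str, Set[str]] = {
    "bank": {"kinara", "baink"},
    "slope": {"dhalan"},
    "landform": {"bhu-akriti"},
    "institution": {"sanstha"},
}

# Hypothetical Hindi synsets (id -> member words).
HI_SYNSETS: Dict[str, Set[str]] = {
    "hi_riverbank": {"kinara", "tat"},
    "hi_bank_fin": {"baink"},
    "hi_slope": {"dhalan"},
    "hi_landform": {"bhu-akriti"},
    "hi_institution": {"sanstha"},
}

# Hypothetical hypernym chains (synset id -> ancestor synset ids).
HI_HYPERNYMS: Dict[str, List[str]] = {
    "hi_riverbank": ["hi_slope", "hi_landform"],
    "hi_bank_fin": ["hi_institution"],
}


def candidate_synsets(first_word: str) -> List[str]:
    """Hindi synsets containing at least one translation of the first word."""
    translations = EN_TO_HI.get(first_word, set())
    return [sid for sid, words in HI_SYNSETS.items() if words & translations]


def hierarchy_words(sid: str) -> Set[str]:
    """All words found in the ancestors of a candidate synset."""
    words: Set[str] = set()
    for ancestor in HI_HYPERNYMS.get(sid, []):
        words |= HI_SYNSETS[ancestor]
    return words


def map_synset(
    en_synset: List[str], en_hypernym_chain: List[List[str]]
) -> Tuple[Optional[str], Dict[str, int]]:
    """Return the best-matching Hindi synset id and all candidate weights."""
    weights = {sid: 0 for sid in candidate_synsets(en_synset[0])}
    for en_ancestor in en_hypernym_chain:
        for en_word in en_ancestor:
            for hi_word in EN_TO_HI.get(en_word, set()):
                for sid in weights:
                    if hi_word in hierarchy_words(sid):
                        weights[sid] += 1  # unit weight: an assumption
    best = max(weights, key=weights.get) if weights else None
    return best, weights


best, weights = map_synset(
    ["bank", "riverbank"], [["slope", "incline"], ["landform"]]
)
print(best, weights)
```

With this toy data the river sense of "bank" collects two matches through its ancestors and the financial sense collects none, so the river synset is selected.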

  8. We are expecting:
     - Since a parallel aligned corpus is used, we should achieve better accuracy
     - Better results in scenarios where:
       - An English word is polysemous and its Hindi equivalent is also polysemous
       - An English word is monosemous and its Hindi equivalent is polysemous
     Limitations
     - Valid only for nouns
     - Not trained to handle morphology

  9. References
     1) Debasri Chakrabarti, Dipak Kumar Narayan, Prabhakar Pandey, Pushpak Bhattacharyya. Experiences in building the Indo WordNet - A WordNet for Hindi.
     2) Bahareh Sarrafzadeh, Nikolay Yakovets, Nick Cercone, Aijun An. Cross Lingual Word Sense Disambiguation for Languages with Scarce Resources.
     3) Els Lefever and Veronique Hoste. SemEval-2010 Task 3: Cross-Lingual Word Sense Disambiguation.
     4) Michael Lesk. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In SIGDOC '86: Proceedings of the 5th Annual International Conference on Systems Documentation, pages 24-26, New York, NY, USA, 1986. ACM.
     5) Satanjeev Banerjee and Ted Pedersen. Extended gloss overlaps as a measure of semantic relatedness. In IJCAI '03, pages 805-810, 2003.
     6) Els Lefever and Veronique Hoste. Examining the validity of Cross-Lingual Word Sense Disambiguation.
     7) http://wordnet.princeton.edu/
     8) Roberto Navigli. Word Sense Disambiguation - A Survey.
     9) Roberto Navigli and Simone Paolo Ponzetto. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network.
     10) Ramanand, Akshay Ukey, Brahm Kiran Singh, and Pushpak Bhattacharyya. Mapping and structural analysis of multi-lingual wordnets.

  10. Thank You!!

  11. Church and Gale Algorithm
     - A method for aligning sentences based on a statistical model of character lengths
     - Relies on the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and shorter sentences into shorter ones
     - The algorithm is a two-step process: 1) paragraph alignment, then 2) sentence alignment
     - Based on a probabilistic model
     - It is language independent, though it would have to be tested on Hindi-English
     Ref: A Program for Aligning Sentences in Bilingual Corpora, William A. Gale and Kenneth W. Church
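A minimal sketch of length-based sentence alignment in the spirit of this slide follows. It covers only the sentence-alignment step (paragraph alignment is skipped) and works on character lengths. The constants `C`, `S2`, and `PRIORS` follow the values reported by Gale and Church for European language pairs; whether they suit Hindi-English is an assumption that would need testing, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

C = 1.0    # expected target characters per source character (assumed)
S2 = 6.8   # variance of the length ratio (assumed)
# Prior probability of each bead type: (source sentences, target sentences).
PRIORS = {
    (1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011,
}


def match_cost(len_src: int, len_tgt: int, prior: float) -> float:
    """Negative log probability of aligning two spans of the given lengths."""
    mean = (len_src + len_tgt / C) / 2.0
    if mean == 0:
        return -math.log(prior)
    z = (len_tgt - len_src * C) / math.sqrt(S2 * mean)
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return -math.log(max(p, 1e-300)) - math.log(prior)


def align(src_lens: List[int], tgt_lens: List[int]) -> List[Tuple[List[int], List[int]]]:
    """Dynamic programming over bead types; returns aligned index groups."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                if i + di > n or j + dj > m:
                    continue
                total = cost[i][j] + match_cost(
                    sum(src_lens[i:i + di]), sum(tgt_lens[j:j + dj]), prior
                )
                if total < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = total
                    back[i + di][j + dj] = (di, dj)
    # Walk back from the end to recover the cheapest sequence of beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


# Character lengths of three source and four target sentences (toy values).
beads = align([20, 50, 30], [22, 25, 27, 31])
print(beads)
```

In this toy input the second source sentence (50 characters) is paired with the second and third target sentences together (52 characters), because one 1-2 bead is cheaper than a badly matched 1-1 bead plus an unaligned sentence.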

  12. English WSD: WordNet::SenseRelate::AllWords
     - Each target word is centered in a balanced window whose size is chosen by the user
     - The possible senses of the word are compared pairwise, for similarity, with the senses of the surrounding words in the window
     - The sense with the highest total score, after summing the pairwise scores, is taken as the sense of the word
     - For similarity, it uses the 10 measures of relatedness provided by WordNet::Similarity (http://wn-similarity.sourceforge.net)
     Lesk Algorithm
     - Assigns a sense to a word by comparing the glosses of the surrounding words with the glosses of the various senses of the target word
     - The sense whose gloss has the largest number of overlaps is assigned
     - The Extended Lesk algorithm also uses the WordNet hierarchy (glosses of related synsets) to improve accuracy
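The gloss-overlap idea behind the Lesk algorithm can be sketched as below. This is a simplified illustration, not the WordNet::SenseRelate::AllWords implementation: the sense inventory `GLOSSES`, the sense ids, the glosses themselves, and the `STOPWORDS` list are hypothetical toy data, and the score is a plain count of shared gloss words.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mini sense inventory: word -> {sense id: gloss}.
GLOSSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank#river": "sloping land beside a river or other body of water",
        "bank#finance": "financial institution that accepts deposits of money",
    },
    "river": {"river#1": "large natural stream of water flowing to the sea"},
    "money": {"money#1": "medium of exchange accepted by a financial institution"},
}
STOPWORDS = {"a", "an", "the", "of", "or", "to", "that", "by", "other", "beside"}


def gloss_words(gloss: str) -> Set[str]:
    """Content words of a gloss, lowercased, without stopwords."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def lesk(
    tokens: List[str], index: int, window: int = 2
) -> Tuple[Optional[str], Dict[str, int]]:
    """Pick the sense of tokens[index] whose gloss overlaps most with the
    glosses of the words inside a balanced window around it."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    context = [t for k, t in enumerate(tokens[lo:hi], start=lo) if k != index]

    # Pool the gloss words of every sense of every context word.
    context_bag: Set[str] = set()
    for word in context:
        for gloss in GLOSSES.get(word, {}).values():
            context_bag |= gloss_words(gloss)

    scores = {
        sense: len(gloss_words(gloss) & context_bag)
        for sense, gloss in GLOSSES.get(tokens[index], {}).items()
    }
    best = max(scores, key=scores.get) if scores else None
    return best, scores


river_tokens = "i was standing on the bank of river ganga".split()
river_sense, river_scores = lesk(river_tokens, river_tokens.index("bank"))

money_tokens = "he deposited money in the bank".split()
money_sense, money_scores = lesk(money_tokens, money_tokens.index("bank"), window=3)
print(river_sense, money_sense)
```

The window size matters here: in the second sentence "money" lies three tokens away from "bank", so a window of 2 would leave both senses with a score of zero.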
