
WSD: Word Sense Disambiguation



WSD

• Word Sense Disambiguation:
  – Determine from context (or otherwise) what sense is intended for a particular word
  – Useful, in particular, in:
    • Query and Search
    • IR-related tasks

WSD (continued)

• Several types of methods for WSD:
  – Dictionary-based
  – Unsupervised
  – Supervised

Dictionary-Based

• WordNet, Longman’s, Roget’s, other MRDs and MRTs
• We’ve seen some with WordNet
• Chen et al. 1998 discusses the use of MRDs and, to a lesser extent, MRTs
  – (More on the creation of MRTs from MRDs …)

Unsupervised

• Cluster vectors representing ambiguous words into groups (see the sketch below)
• Methods usually involve starting with a pre-determined number of senses
• Clusters are “merged” with each iteration until the desired number is achieved
• A similarity metric is used to discern the senses
• Some difficulty in discerning senses based on clusters (not necessarily one-to-one):
  – when do you decide that a cluster constitutes a sense?
  – when should you stop “merging” clusters?
• Schütze 1998 shows, however, that unsupervised methods can achieve a high degree of success (compared to supervised)

Unsupervised: Feature Vectors

• If each word is represented by a feature vector:
  – What constitutes a feature?
• Features in some way represent the target lexical item and its surrounding context
• Can be:
  – Collocational
  – Co-occurrence

Collocational

• Position-specific information regarding the target lexical item and its neighbors
• A “window” surrounding the target:
  he sat on the bank of the river and watched the currents
• POS and words surrounding the target are encoded:
  [on, IN, the, DT, of, IN, the, DT]
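A minimal sketch of the collocational encoding just shown, assuming the sentence is already POS-tagged; the function name and the tags for the words outside the slide's example are illustrative, not from the slides:

```python
# Minimal sketch: build the collocational feature vector from the
# slides' example, using a +/-2 window of (word, POS) pairs around
# the target. Assumes the input is already POS-tagged.

def collocational_features(tagged, target_index, window=2):
    """Return [w-2, pos-2, w-1, pos-1, w+1, pos+1, w+2, pos+2]."""
    features = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(tagged):
            features.extend(tagged[i])  # append (word, POS) pair
    return features

tagged = [("he", "PRP"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"),
          ("bank", "NN"), ("of", "IN"), ("the", "DT"), ("river", "NN")]
print(collocational_features(tagged, target_index=4))
# -> ['on', 'IN', 'the', 'DT', 'of', 'IN', 'the', 'DT']
```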

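Looking back at the Unsupervised slides above, here is a hypothetical sketch of the clustering idea: contexts of an ambiguous word start as singleton clusters, and the two most similar clusters are merged on each iteration until a pre-determined number of senses remains. The toy vectors and all names are illustrative only:

```python
# Hypothetical sketch of the unsupervised approach: agglomeratively
# merge context vectors for an ambiguous word until a pre-determined
# number of sense clusters remains.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(cluster):
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def cluster_senses(vectors, num_senses):
    clusters = [[v] for v in vectors]   # start: one cluster per context
    while len(clusters) > num_senses:   # merge until desired # achieved
        # find the two most similar clusters (centroid cosine similarity)
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: cosine(centroid(clusters[ij[0]]),
                                         centroid(clusters[ij[1]])))
        clusters[i] += clusters.pop(j)
    return clusters

# Toy co-occurrence vectors for contexts of "bank" (illustrative only):
contexts = [[1, 1, 0, 0], [1, 0, 1, 0],   # river-like contexts
            [0, 0, 1, 1], [0, 0, 0, 1]]   # money-like contexts
print(cluster_senses(contexts, num_senses=2))
```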
Collocational (continued)

• Example from J&M:
  An electric guitar and bass player stand off to one side, not really part of the…
• Encoded: [guitar, NN, and, CJC, player, NN, stand, VB]

Co-occurrence

• Co-occurrence of the target and other content words in context:
  – Larger window, includes content words
  – Fixed set of content words (could be determined automatically by frequency)
  – Each content word mapped to a component in a vector
• he sat on the bank of the river and watched the currents
  – Vector could contain components for “river” and “currents”

Co-occurrence (continued)

• Example from J&M:
  An electric guitar and bass player stand off to one side, not really part of the…
• Most common words co-occurring with bass are: fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band
• Vector for the above context (a sketch of this encoding appears below):
  [0,0,0,1,0,0,0,0,0,0,1,0]

Supervised

• In a training corpus, each occurrence of a potentially ambiguous word is hand-tagged with the appropriate sense
• The tagged sense is appropriate to the context
• A machine learning approach is used to discern what sense is most appropriate for a given context:
  sense = argmax P(sense|context)
• The trained model is then run over raw text
• Bayesian classifiers are a frequent approach

Supervised: Bayesian WSD

• Bayes decision rule:
  Decide s′ if P(s′|c) > P(s_k|c) for all s_k ≠ s′
• Minimizes the probability of error, since it chooses the sense with the highest conditional probability
• The probability of error over the sequence of decisions thus made will also be quite low
• Bayesian WSD: look at the context, the content words in a large window, to try to determine the most appropriate sense for the target word
• Problem: we don’t necessarily know P(s_k|c). More likely to know: P(c|s_k). Why?

Bayesian WSD

• Apply Bayes’ Rule:
  P(s_k|c) = P(c|s_k) P(s_k) / P(c)
• P(s_k) = prior probability of sense s_k
  – what’s the probability that we have s_k, not knowing the context?
• P(c|s_k) = given a sense s_k, what’s the probability of this context?
• P(c) = probability of this context (the same for every sense, so it can be ignored when comparing senses)
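To make Bayes' Rule concrete, here is a hypothetical worked example for two senses of "bass"; the priors and likelihoods below are made up purely for illustration:

```python
# Hypothetical worked example of Bayes' Rule above; all numbers are
# illustrative, not estimated from any real corpus.
priors = {"fish": 0.7, "music": 0.3}          # P(s_k)
likelihoods = {"fish": 0.001, "music": 0.02}  # P(c | s_k) for this context

# P(c) = sum over senses of P(c|s_k) P(s_k); the same for every sense.
p_c = sum(likelihoods[s] * priors[s] for s in priors)

for sense in priors:
    posterior = likelihoods[sense] * priors[sense] / p_c  # P(s_k | c)
    print(sense, round(posterior, 3))
# "music" wins (0.02 * 0.3 > 0.001 * 0.7) even though "fish" has the
# higher prior, because the context is far more likely under "music".
```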

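Returning to the co-occurrence slides above, a minimal sketch of how the J&M "bass" vector could be computed; this is also the bag-of-words representation the next slide asks about. The function name is illustrative:

```python
# Minimal sketch: encode a context as a co-occurrence vector over a
# fixed list of content words, reproducing the slides' J&M "bass" example.
CONTENT_WORDS = ["fishing", "big", "sound", "player", "fly", "rod",
                 "pound", "double", "runs", "playing", "guitar", "band"]

def cooccurrence_vector(context_words, vocabulary=CONTENT_WORDS):
    """One component per content word: 1 if it appears in the context."""
    context = set(context_words)
    return [1 if w in context else 0 for w in vocabulary]

context = "an electric guitar and bass player stand off to one side".split()
print(cooccurrence_vector(context))
# -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]   ("player" and "guitar" present)
```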
Bayesian WSD

• How do we represent c?
• As a bag of words!
• Each word is a feature used to represent part of the context
• Gale et al. 1992 discuss a Bayes classifier, namely a Naïve Bayes classifier
• Naïve Bayes Assumption: the attributes of the context are independent
• Is this a valid assumption?

Bayesian WSD (continued)

• For Bayesian classification, we want to maximize P(c|s_k) P(s_k)
• Thus, choose the s′ where:
  s′ = argmax_{s_k} P(c|s_k) P(s_k)
     = argmax_{s_k} [log P(c|s_k) + log P(s_k)]

Bayesian WSD (continued)

• The Naïve Bayes Assumption makes processing easier
• Decision rule for Naïve Bayes (a sketch follows below):
  Decide s′ where s′ = argmax_{s_k} [log P(s_k) + Σ_{v_j in c} log P(v_j|s_k)]
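A minimal end-to-end sketch of this decision rule, assuming hand-tagged training contexts as in the Supervised slide. The tiny corpus is illustrative, and add-one smoothing (not mentioned in the slides) is added to avoid zero probabilities:

```python
# Minimal sketch of the Naive Bayes decision rule above: estimate sense
# priors P(s_k) and word likelihoods P(v_j | s_k) from hand-tagged
# contexts, then pick argmax of log P(s_k) + sum_j log P(v_j | s_k).
import math
from collections import Counter, defaultdict

def train(tagged_contexts):
    """tagged_contexts: list of (sense, [context words])."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in tagged_contexts:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(context, sense_counts, word_counts, vocab):
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)        # log P(s_k)
        n = sum(word_counts[sense].values())
        for v in context:
            # add-one smoothed log P(v_j | s_k)
            score += math.log((word_counts[sense][v] + 1) / (n + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Toy hand-tagged corpus (illustrative only):
corpus = [("fish", "sat on the bank of the river".split()),
          ("fish", "caught a bass with his rod".split()),
          ("music", "guitar and bass player on stage".split())]
model = train(corpus)
print(disambiguate("bass guitar player".split(), *model))  # -> 'music'
```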
