
Word Sense Disambiguation (continued)

WSD

  • Word Sense Disambiguation:

– Determine from context (or otherwise) what sense is intended for a particular word
– Useful, in particular, in:

  • Query and Search
  • IR-related tasks

– Several types of methods for WSD:

  • Dictionary-based
  • Unsupervised
  • Supervised

Dictionary-Based

  • WordNet, Longman’s, Roget’s, and other MRDs and MRTs (machine-readable dictionaries and thesauri)
  • We’ve seen some of this with WordNet
  • Chen et al 1998 discusses the use of MRDs and, to a lesser extent, MRTs

– (More on the creation of MRTs from MRDs)

Unsupervised

  • Cluster vectors representing ambiguous words into groups
  • Methods usually involve starting with a pre-determined number of senses
  • Clusters are “merged” with each iteration until the desired number is achieved
  • A similarity metric is used to discern the senses
  • Some difficulty in discerning senses based on clusters (not necessarily one-to-one)
  • When do you decide:

– when a cluster constitutes a sense
– when you should stop “merging” clusters

  • Schuetze 98 shows, however, that unsupervised methods can achieve a high degree of success (compared to supervised)
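The merge-until-done procedure above can be sketched in a few lines of Python. Everything here is illustrative: the toy context vectors, the choice of cosine similarity, and the target of two senses are assumptions for the example, not part of any particular system.

```python
# Sketch: agglomerative merging of context vectors for sense induction.
# Clusters are repeatedly merged (most-similar pair first, by centroid
# cosine similarity) until the desired number of senses remains.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_to_k(vectors, k):
    clusters = [[v] for v in vectors]

    def centroid(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)  # j > i, safe to pop
    return clusters

# Toy contexts of an ambiguous word: two "river"-like contexts, one other.
ctxs = [[1, 1, 0, 0], [1, 0.8, 0, 0], [0, 0, 1, 1]]
senses = merge_to_k(ctxs, 2)
```

As the slide notes, the hard part is not the merging itself but deciding when a cluster actually constitutes a sense.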

Unsupervised: Feature Vectors

  • If each word is represented by a feature vector:

– What constitutes a feature?

  • Features in some way represent the surrounding context
  • Can be:

– Collocational
– Co-occurrence

Collocational

  • Position-specific information regarding the target lexical item and its neighbors
  • A “window” surrounding the target:

he sat on the bank of the river and watched the currents

  • POS and words surrounding the target encoded:

[on, IN, the, DT, of, IN, the, DT]
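The window encoding above can be reproduced with a small helper. The POS tags are supplied by hand here (a real system would run a tagger first), and the function name is just for illustration.

```python
# Sketch: position-specific (collocational) features for a target word.
# Returns [w-2, t-2, w-1, t-1, w+1, t+1, w+2, t+2] around the target.
def collocational_features(tokens, tags, target_index, window=2):
    feats = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_index + offset
        if 0 <= i < len(tokens):  # skip positions outside the sentence
            feats.extend([tokens[i], tags[i]])
    return feats

tokens = "he sat on the bank of the river and watched the currents".split()
tags = ["PRP", "VBD", "IN", "DT", "NN", "IN",
        "DT", "NN", "CC", "VBD", "DT", "NNS"]  # hand-assigned for the example
feats = collocational_features(tokens, tags, tokens.index("bank"))
# feats == ['on', 'IN', 'the', 'DT', 'of', 'IN', 'the', 'DT']
```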

slide-2
SLIDE 2

2

Collocational

  • Example from J&M:

An electric guitar and bass player stand off to one side, not really part of the…

  • [guitar, NN, and, CJC, player, NN, stand, VB]

Co-occurrence

  • Co-occurrence of the target and other content-bearing words in the context:

– Larger window, includes content words
– Fixed set of content words (could be determined automatically by frequency)
– Each content word mapped to a component in a vector

he sat on the bank of the river and watched the currents

– Vector could contain components for “river” and “currents”

Co-occurrence

  • Example from J&M:

An electric guitar and bass player stand off to one side, not really part of the…

  • Most common words co-occurring with bass are: fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band
  • Vector for the above context:

[0,0,0,1,0,0,0,0,0,0,1,0]
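A minimal sketch of building that vector, using the fixed content-word list for bass given above and treating the context as a bag of words:

```python
# Sketch: co-occurrence feature vector over a fixed list of content words.
# One vector component per content word: 1 if it appears in the context.
CONTENT_WORDS = ["fishing", "big", "sound", "player", "fly", "rod",
                 "pound", "double", "runs", "playing", "guitar", "band"]

def cooccurrence_vector(context_tokens, vocab=CONTENT_WORDS):
    present = set(context_tokens)
    return [1 if w in present else 0 for w in vocab]

context = "an electric guitar and bass player stand off to one side".split()
vec = cooccurrence_vector(context)
# vec == [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]  (hits: "player", "guitar")
```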

Supervised

  • In a training corpus, each occurrence of a potentially ambiguous word is hand-tagged with the appropriate sense
  • The tagged sense is appropriate to the context
  • A machine learning approach is used to discern what sense is most appropriate for a given context:

sense = argmax P(sense|context)

  • The trained model is then run over raw text
  • Bayesian classifiers are a frequent approach

Supervised: Bayesian WSD

  • Bayes decision rule:

Decide s’ if P(s’|c) > P(sk|c) for sk ≠ s’

  • This minimizes the probability of error, since it chooses the sense with the highest conditional probability
  • The error rate for a sequence of decisions made this way will therefore also be quite low
  • Bayesian WSD: look at the context (the content words in a large window) to try to determine the most appropriate sense for the target word
  • Problem: we don’t necessarily know P(sk|c). We are more likely to know P(c|sk). Why?

Bayesian WSD

  • Apply Bayes Rule:

P(sk|c) = P(c|sk) P(sk) / P(c)

  • P(sk) = prior probability of sense sk

– what’s the probability that we have sk, not knowing the context?

  • P(c|sk) = given a sense sk, what’s the probability of this context?
  • P(c) = probability of this context
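A quick numeric check of Bayes Rule, with entirely hypothetical probabilities (not estimated from any corpus):

```python
# Hypothetical numbers: sense s1 has prior 0.7, s2 has prior 0.3,
# and the observed context c is twice as likely under s1 as under s2.
p_s = {"s1": 0.7, "s2": 0.3}
p_c_given_s = {"s1": 0.02, "s2": 0.01}

# P(c) marginalizes over the senses: sum_k P(c|sk) P(sk)
p_c = sum(p_c_given_s[s] * p_s[s] for s in p_s)

# Bayes Rule: P(sk|c) = P(c|sk) P(sk) / P(c)
p_s_given_c = {s: p_c_given_s[s] * p_s[s] / p_c for s in p_s}
# p_s_given_c["s1"] == 0.014 / 0.017, about 0.824
```

Note that the posteriors sum to 1 over the senses, which is why P(c) can be treated as a normalizing constant.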


Bayesian WSD

  • For Bayesian classification, we want to maximize P(c|sk) P(sk); the denominator P(c) is the same for every sense, so it can be dropped
  • Thus, choose the s’ where:

s’ = argmax_sk P(c|sk) P(sk)
   = argmax_sk [log P(c|sk) + log P(sk)]

Bayesian WSD

  • How do we represent c?
  • As a bag of words!
  • Each word is a feature used to represent part of the context
  • Gale et al 1992 discuss a Bayes classifier, namely a Naïve Bayes classifier
  • Naïve Bayes assumption: the attributes of the context are independent of one another (given the sense)
  • Is this a valid assumption?

Bayesian WSD

  • The Naïve Bayes assumption makes processing easier
  • Decision rule for Naïve Bayes:

Decide s’ if

s’ = argmax_sk [log P(sk) + Σ_{vj in c} log P(vj|sk)]
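The decision rule can be sketched as a tiny Naïve Bayes disambiguator. The hand-tagged corpus below is invented, and add-one smoothing is an addition not shown on the slide (it avoids log 0 for words unseen with a given sense):

```python
# Sketch: Naive Bayes WSD implementing
#   s' = argmax_sk [ log P(sk) + sum_{vj in c} log P(vj|sk) ]
import math
from collections import Counter, defaultdict

def train(tagged_contexts):
    """tagged_contexts: list of (sense, bag-of-words context)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in tagged_contexts:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(context, sense_counts, word_counts, vocab):
    total = sum(sense_counts.values())
    best_sense, best_score = None, None
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log P(sk)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            # add-one smoothed log P(vj|sk)
            score += math.log((word_counts[sense][w] + 1) / denom)
        if best_score is None or score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Invented hand-tagged contexts for "bass" (fish sense vs. music sense).
corpus = [
    ("fish",  ["river", "fishing", "rod", "pound"]),
    ("fish",  ["fly", "fishing", "big", "runs"]),
    ("music", ["guitar", "player", "band", "sound"]),
    ("music", ["playing", "guitar", "double", "band"]),
]
model = train(corpus)
sense = disambiguate(["guitar", "player"], *model)
# sense == "music"
```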