
Word Sense Disambiguation (continued)

WSD

  • Word Sense Disambiguation:

– Determine from context (or otherwise) what sense is intended for a particular word
– Useful, in particular, in:

  • Query and Search
  • IR-related tasks

– Several types of methods for WSD:

  • Dictionary-based
  • Unsupervised
  • Supervised

Dictionary-Based

  • WordNet, Longman’s, Roget’s, and other MRDs and MRTs (machine-readable dictionaries and thesauri)
  • We’ve seen some of this with WordNet
  • Chen et al 1998 discusses the use of MRDs and, to a lesser extent, MRTs

– (More on the creation of MRTs from MRDs)

Unsupervised

  • Cluster vectors representing ambiguous words into groups
  • Methods usually involve starting with a pre-determined number of senses
  • Clusters are “merged” with each iteration until the desired number is achieved
  • A similarity metric is used to discern the senses
  • Some difficulty in discerning senses based on clusters (not necessarily one-to-one)
  • When do you decide:

– when a cluster constitutes a sense
– when you should stop “merging” clusters

  • Schuetze 98 shows, however, that unsupervised methods can achieve a high degree of success (compared to supervised)
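The merge-until-done procedure above can be sketched in a few lines of Python. Everything here is illustrative: the toy context vectors, the choice of cosine similarity, and the target of two senses are assumptions for the example, not part of any particular system.

```python
# Sketch: agglomerative merging of context vectors for sense induction.
# Clusters are repeatedly merged (most-similar pair first, by centroid
# cosine similarity) until the desired number of senses remains.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_to_k(vectors, k):
    clusters = [[v] for v in vectors]

    def centroid(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)  # j > i, safe to pop
    return clusters

# Toy contexts of an ambiguous word: two "river"-like contexts, one other.
ctxs = [[1, 1, 0, 0], [1, 0.8, 0, 0], [0, 0, 1, 1]]
senses = merge_to_k(ctxs, 2)
```

As the slide notes, the hard part is not the merging itself but deciding when a cluster actually constitutes a sense.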

Unsupervised: Feature Vectors

  • If each word is represented by a feature vector:

– What constitutes a feature?

  • Features in some way represent the surrounding context
  • Can be:

– Collocational
– Co-occurrence

Collocational

  • Position-specific information regarding the target lexical item and its neighbors
  • A “window” surrounding the target:

he sat on the bank of the river and watched the currents

  • POS and words surrounding the target encoded:

[on, IN, the, DT, of, IN, the, DT]
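The window encoding above can be reproduced with a small helper. The POS tags are supplied by hand here (a real system would run a tagger first), and the function name is just for illustration.

```python
# Sketch: position-specific (collocational) features for a target word.
# Returns [w-2, t-2, w-1, t-1, w+1, t+1, w+2, t+2] around the target.
def collocational_features(tokens, tags, target_index, window=2):
    feats = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_index + offset
        if 0 <= i < len(tokens):  # skip positions outside the sentence
            feats.extend([tokens[i], tags[i]])
    return feats

tokens = "he sat on the bank of the river and watched the currents".split()
tags = ["PRP", "VBD", "IN", "DT", "NN", "IN",
        "DT", "NN", "CC", "VBD", "DT", "NNS"]  # hand-assigned for the example
feats = collocational_features(tokens, tags, tokens.index("bank"))
# feats == ['on', 'IN', 'the', 'DT', 'of', 'IN', 'the', 'DT']
```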

slide-2
SLIDE 2

2

Collocational

  • Example from J&M:

An electric guitar and bass player stand off to one side, not really part of the…

  • [guitar, NN, and, CJC, player, NN, stand, VB]

Co-occurrence

  • Co-occurrence of the target and other content-bearing words in the context:

– Larger window, includes content words
– Fixed set of content words (could be determined automatically by frequency)
– Each content word mapped to a component in a vector

he sat on the bank of the river and watched the currents

– Vector could contain components for “river” and “currents”

Co-occurrence

  • Example from J&M:

An electric guitar and bass player stand off to one side, not really part of the…

  • Most common words co-occurring with bass are: fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band
  • Vector for the above context:

[0,0,0,1,0,0,0,0,0,0,1,0]
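A minimal sketch of building that vector, using the fixed content-word list for bass given above and treating the context as a bag of words:

```python
# Sketch: co-occurrence feature vector over a fixed list of content words.
# One vector component per content word: 1 if it appears in the context.
CONTENT_WORDS = ["fishing", "big", "sound", "player", "fly", "rod",
                 "pound", "double", "runs", "playing", "guitar", "band"]

def cooccurrence_vector(context_tokens, vocab=CONTENT_WORDS):
    present = set(context_tokens)
    return [1 if w in present else 0 for w in vocab]

context = "an electric guitar and bass player stand off to one side".split()
vec = cooccurrence_vector(context)
# vec == [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]  (hits: "player", "guitar")
```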

Supervised

  • In a training corpus, each occurrence of a potentially ambiguous word is hand-tagged with the appropriate sense
  • The tagged sense is appropriate to the context
  • A machine learning approach is used to discern what sense is most appropriate for a given context:

sense = argmax P(sense|context)

  • The trained model is then run over raw text
  • Bayesian classifiers are a frequent approach

Supervised: Bayesian WSD

  • Bayes decision rule:

Decide s’ if P(s’|c) > P(sk|c) for sk ≠ s’

  • This minimizes the probability of error, since it chooses the sense with the highest conditional probability
  • The error rate for a sequence of decisions made this way will therefore also be quite low
  • Bayesian WSD: look at the context (the content words in a large window) to try to determine the most appropriate sense for the target word
  • Problem: we don’t necessarily know P(sk|c). We are more likely to know P(c|sk). Why?

Bayesian WSD

  • Apply Bayes Rule:

P(sk|c) = P(c|sk) P(sk) / P(c)

  • P(sk) = prior probability of sense sk

– what’s the probability that we have sk, not knowing the context?

  • P(c|sk) = given a sense sk, what’s the probability of this context?
  • P(c) = probability of this context
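A quick numeric check of Bayes Rule, with entirely hypothetical probabilities (not estimated from any corpus):

```python
# Hypothetical numbers: sense s1 has prior 0.7, s2 has prior 0.3,
# and the observed context c is twice as likely under s1 as under s2.
p_s = {"s1": 0.7, "s2": 0.3}
p_c_given_s = {"s1": 0.02, "s2": 0.01}

# P(c) marginalizes over the senses: sum_k P(c|sk) P(sk)
p_c = sum(p_c_given_s[s] * p_s[s] for s in p_s)

# Bayes Rule: P(sk|c) = P(c|sk) P(sk) / P(c)
p_s_given_c = {s: p_c_given_s[s] * p_s[s] / p_c for s in p_s}
# p_s_given_c["s1"] == 0.014 / 0.017, about 0.824
```

Note that the posteriors sum to 1 over the senses, which is why P(c) can be treated as a normalizing constant.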


Bayesian WSD

  • For Bayesian classification, we want to maximize P(c|sk) P(sk); the denominator P(c) is the same for every sense, so it can be dropped
  • Thus, choose the s’ where:

s’ = argmax_sk P(c|sk) P(sk)
   = argmax_sk [log P(c|sk) + log P(sk)]

Bayesian WSD

  • How do we represent c?
  • As a bag of words!
  • Each word is a feature used to represent part of the context
  • Gale et al 1992 discuss a Bayes classifier, namely a Naïve Bayes classifier
  • Naïve Bayes assumption: the attributes of the context are independent of one another (given the sense)
  • Is this a valid assumption?

Bayesian WSD

  • The Naïve Bayes assumption makes processing easier
  • Decision rule for Naïve Bayes:

Decide s’ if

s’ = argmax_sk [log P(sk) + Σ_{vj in c} log P(vj|sk)]
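The decision rule can be sketched as a tiny Naïve Bayes disambiguator. The hand-tagged corpus below is invented, and add-one smoothing is an addition not shown on the slide (it avoids log 0 for words unseen with a given sense):

```python
# Sketch: Naive Bayes WSD implementing
#   s' = argmax_sk [ log P(sk) + sum_{vj in c} log P(vj|sk) ]
import math
from collections import Counter, defaultdict

def train(tagged_contexts):
    """tagged_contexts: list of (sense, bag-of-words context)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in tagged_contexts:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(context, sense_counts, word_counts, vocab):
    total = sum(sense_counts.values())
    best_sense, best_score = None, None
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log P(sk)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            # add-one smoothed log P(vj|sk)
            score += math.log((word_counts[sense][w] + 1) / denom)
        if best_score is None or score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Invented hand-tagged contexts for "bass" (fish sense vs. music sense).
corpus = [
    ("fish",  ["river", "fishing", "rod", "pound"]),
    ("fish",  ["fly", "fishing", "big", "runs"]),
    ("music", ["guitar", "player", "band", "sound"]),
    ("music", ["playing", "guitar", "double", "band"]),
]
model = train(corpus)
sense = disambiguate(["guitar", "player"], *model)
# sense == "music"
```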