Word Meaning & Word Sense Disambiguation (CMSC 723 / LING 723 / INST 725)



SLIDE 1

Word Meaning & Word Sense Disambiguation

CMSC 723 / LING 723 / INST 725
Marine Carpuat

marine@cs.umd.edu

SLIDE 2

Today

  • Representing word meaning
  • Word sense disambiguation as supervised classification
  • Word sense disambiguation without annotated examples

SLIDE 3

Drunk gets nine months in violin case.

http://www.ling.upenn.edu/~beatrice/humor/headlines.html

SLIDE 4

How do we know that a word (lemma) has distinct senses?

  • Linguists often design tests for this purpose
  • e.g., zeugma combines distinct senses in an uncomfortable way

Which flight serves breakfast?
Which flights serve Tucson?
*Which flights serve breakfast and Tucson?

SLIDE 5

Where can we look up the meaning of words?

  • Dictionary?
SLIDE 6

Word Senses

  • “Word sense” = distinct meaning of a word
  • Same word, different senses

– Homonyms (homonymy): unrelated senses; identical orthographic form is coincidental
– Polysemes (polysemy): related, but distinct senses
– Metonyms (metonymy): “stand in”; technically, a sub-case of polysemy

  • Different word, same sense

– Synonyms (synonymy)

SLIDE 7
  • Homophones: same pronunciation, different orthography, different meaning

– Examples: would/wood, to/too/two

  • Homographs: distinct senses, same orthographic form, different pronunciation

– Examples: bass (fish) vs. bass (instrument)

SLIDE 8

Relationship Between Senses

  • IS-A relationships

– From specific to general (up): hypernym (hypernymy)
– From general to specific (down): hyponym (hyponymy)

  • Part-Whole relationships

– wheel is a meronym of car (meronymy)
– car is a holonym of wheel (holonymy)

SLIDE 9

WordNet: a lexical database for English

https://wordnet.princeton.edu/

  • Includes most English nouns, verbs, adjectives, and adverbs
  • Electronic format makes it amenable to automatic manipulation: used in many NLP applications
  • “WordNets” generically refers to similar resources in other languages

SLIDE 10

WordNet: History

  • Research in artificial intelligence:

– How do humans store and access knowledge about concepts?
– Hypothesis: concepts are interconnected via meaningful relations
– Useful for reasoning

  • The WordNet project started in 1986

– Can most (all?) of the words in a language be represented as a semantic network where words are interlinked by meaning?
– If so, the result would be a large semantic network…

SLIDE 11

Synonymy in WordNet

  • WordNet is organized in terms of “synsets”

– Unordered set of (roughly) synonymous “words” (or multi-word phrases)

  • Each synset expresses a distinct meaning/concept

SLIDE 12

WordNet: Example

Noun
{pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
{pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
{pipe, tube} (a hollow cylindrical shape)
{pipe} (a tubular wind instrument)
{organ pipe, pipe, pipework} (the flues and stops on a pipe organ)

Verb
{shriek, shrill, pipe up, pipe} (utter a shrill cry)
{pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
{pipe} (play on a pipe) “pipe a tune”
{pipe} (trim with piping) “pipe the skirt”

What do you think of the sense granularity?
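
You can inspect this granularity yourself by querying WordNet programmatically. A minimal sketch using NLTK's WordNet interface (assumes NLTK is installed and the 'wordnet' data has been downloaded via nltk.download('wordnet')):

    from nltk.corpus import wordnet as wn

    # Print every WordNet synset for "pipe" with its lemmas and gloss,
    # reproducing the sense list shown on this slide.
    for syn in wn.synsets('pipe'):
        print(syn.name(), syn.lemma_names(), '-', syn.definition())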

SLIDE 13

The “Net” Part of WordNet

[Figure: a fragment of the WordNet noun graph. Hypernym edges link {cab; taxi; hack; taxicab} and {cruiser; squad car; patrol car; police car; prowl car} up to {car; auto; automobile; machine; motorcar}, which in turn links to {motor vehicle; automotive vehicle}, {conveyance; transport}, and {vehicle}. Meronym edges link parts such as {bumper}, {car door}, {car window}, {car mirror}, {hinge; flexible joint}, {door lock}, and {armrest} to the wholes that contain them.]
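
These relations can also be traversed programmatically. A small sketch with NLTK, assuming 'car.n.01' is the automobile sense (check with wn.synsets('car') if unsure):

    from nltk.corpus import wordnet as wn

    car = wn.synset('car.n.01')     # {car, auto, automobile, machine, motorcar}
    print(car.hypernyms())          # IS-A, going up:   e.g. motor_vehicle
    print(car.hyponyms()[:5])       # IS-A, going down: e.g. cab, cruiser, ...
    print(car.part_meronyms()[:8])  # part-whole:       e.g. bumper, car_door, ...

    # Follow hypernym edges all the way to the top of the noun hierarchy.
    print([s.name() for s in car.hypernym_paths()[0]])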

SLIDE 14

WordNet 3.0: Size

Part of speech   Word forms   Synsets
Noun             117,798      82,115
Verb             11,529       13,767
Adjective        21,479       18,156
Adverb           4,481        3,621
Total            155,287      117,659

http://wordnet.princeton.edu/

SLIDE 15

Word Sense

From WordNet:

Noun
{pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
{pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
{pipe, tube} (a hollow cylindrical shape)
{pipe} (a tubular wind instrument)
{organ pipe, pipe, pipework} (the flues and stops on a pipe organ)

Verb
{shriek, shrill, pipe up, pipe} (utter a shrill cry)
{pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
{pipe} (play on a pipe) “pipe a tune”
{pipe} (trim with piping) “pipe the skirt”

SLIDE 16

WORD SENSE DISAMBIGUATION

SLIDE 17

Word Sense Disambiguation

  • Task: automatically select the correct sense of a word

– Input: a word in context
– Output: sense of the word
– Can be framed as lexical sample (focus on one word type at a time) or all-words (disambiguate all content words in a document)

  • Motivated by many applications:

– Information retrieval
– Machine translation
– …

SLIDE 18

How big is the problem?

  • Most words in English have only one sense

– 62% in Longman’s Dictionary of Contemporary English
– 79% in WordNet

  • But the others tend to have several senses

– Average of 3.83 in LDOCE
– Average of 2.96 in WordNet

  • Ambiguous words are more frequently used

– In the British National Corpus, 84% of instances have more than one sense

  • Some senses are more frequent than others
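
Statistics like these can be re-derived from WordNet itself. A rough sketch (exact numbers depend on the WordNet version bundled with NLTK, and the loop over all lemmas takes a little while):

    from nltk.corpus import wordnet as wn

    # How many WordNet lemmas have exactly one sense, and how polysemous
    # are the remaining ones on average?
    counts = [len(wn.synsets(lemma)) for lemma in wn.all_lemma_names()]
    monosemous = sum(1 for c in counts if c == 1)
    polysemous = [c for c in counts if c > 1]

    print('monosemous fraction:', monosemous / len(counts))   # compare with the 79% figure above
    print('avg senses if polysemous:', sum(polysemous) / len(polysemous))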
SLIDE 19

Ground Truth

  • Which sense inventory do we use?
SLIDE 20

Existing Corpora

  • Lexical sample

– line-hard-serve corpus (4k sense-tagged examples)
– interest corpus (2,369 sense-tagged examples)
– …

  • All-words

– SemCor (234k words, subset of Brown Corpus)
– Senseval/SemEval (2,081 tagged content words from 5k total words)
– …

SLIDE 21

Evaluation

  • Intrinsic

– Measure accuracy of sense selection wrt ground truth

  • Extrinsic

– Integrate WSD as part of a bigger end-to-end system, e.g., machine translation or information retrieval
– Compare end-to-end performance with and without WSD

SLIDE 22

Baseline Performance

  • Baseline: most frequent sense

– Equivalent to “take first sense” in WordNet
– Does surprisingly well!

62% accuracy in this case!
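
In code, this baseline amounts to taking WordNet's first-listed synset, since WordNet orders senses by (approximate) frequency in tagged corpora. A minimal sketch with NLTK:

    from nltk.corpus import wordnet as wn

    def most_frequent_sense(word, pos=None):
        """Most-frequent-sense baseline: WordNet's first-listed synset."""
        synsets = wn.synsets(word, pos=pos)
        return synsets[0] if synsets else None

    print(most_frequent_sense('pipe', pos=wn.NOUN))   # e.g. Synset('pipe.n.01')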

SLIDE 23

Upper Bound Performance

  • Upper bound

– Fine-grained WordNet senses: 75-80% human agreement
– Coarser-grained inventories: 90% human agreement possible

SLIDE 24

WSD as Supervised Classification

[Diagram: training data (documents annotated with label1-label4) is converted by feature functions into input for a supervised machine learning algorithm, which produces a classifier; at test time, the classifier assigns one of the labels to an unlabeled document.]
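
As a concrete, purely illustrative instance of this pipeline, here is a hypothetical lexical-sample classifier for the word "bass" using scikit-learn; the toy contexts, the sense labels, and the choice of bag-of-words features plus logistic regression are assumptions made for the example, not a prescribed setup:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy training data: contexts of "bass" labeled with their sense.
    contexts = ["he plays bass in a jazz band",
                "turn down the bass on the amplifier",
                "they caught a large bass in the lake",
                "bass fishing season opens in June"]
    senses = ["bass-music", "bass-music", "bass-fish", "bass-fish"]

    # Training: feature functions (bag of words) feed a supervised learner.
    classifier = make_pipeline(CountVectorizer(), LogisticRegression())
    classifier.fit(contexts, senses)

    # Testing: label an unseen context.
    print(classifier.predict(["they went bass fishing on the lake"]))  # likely 'bass-fish'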

SLIDE 25

WSD Approaches

  • Depending on use of manually created knowledge sources

– Knowledge-lean
– Knowledge-rich

  • Depending on use of labeled data

– Supervised
– Semi- or minimally supervised
– Unsupervised

SLIDE 26

Simplest WSD algorithm: Lesk’s Algorithm

  • Intuition: note word overlap between context and dictionary entries

– Unsupervised, but knowledge-rich

Example context: “The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.”
[The slide shows this sentence alongside the WordNet entries for “bank”.]

SLIDE 27

Lesk’s Algorithm

  • Simplest implementation:

– Count overlapping content words between glosses and context

  • Lots of variants:

– Include the examples in dictionary definitions
– Include hypernyms and hyponyms
– Give more weight to larger overlaps (e.g., bigrams)
– Give extra weight to infrequent words
– …
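
A minimal sketch of the simplest variant, counting overlap between each sense's gloss (plus examples) and the context; NLTK also ships a ready-made implementation (nltk.wsd.lesk). A real implementation would at least filter stopwords, as the variants above suggest:

    from nltk.corpus import wordnet as wn

    def simplified_lesk(word, context, pos=None):
        """Return the synset whose gloss/examples overlap most with the context."""
        context_words = set(context.lower().split())
        best_sense, best_overlap = None, -1
        for sense in wn.synsets(word, pos=pos):
            signature = set(sense.definition().lower().split())
            for example in sense.examples():
                signature |= set(example.lower().split())
            overlap = len(signature & context_words)
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        return best_sense

    sentence = ("The bank can guarantee deposits will eventually cover future "
                "tuition costs because it invests in adjustable-rate mortgage securities.")
    print(simplified_lesk("bank", sentence, pos=wn.NOUN))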

SLIDE 28

WSD Accuracy

  • Generally

– Supervised approaches yield ~70-80% accuracy

  • However

– Depends on the actual word, sense inventory, amount of training data, number of features, etc.

SLIDE 29

WORD SENSE DISAMBIGUATION: MINIMIZING SUPERVISION

SLIDE 30

Minimally Supervised WSD

  • Problem: annotations are expensive!
  • Solution 1: “Bootstrapping” or co-training (Yarowsky 1995)

– Start with (small) seed, learn classifier
– Use classifier to label rest of corpus
– Retain “confident” labels, add to training set
– Learn new classifier
– Repeat… (a schematic loop is sketched below)

Heuristics (derived from observation):

– One sense per discourse
– One sense per collocation
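
The loop itself is simple. A schematic sketch, where the classifier interface, confidence scores, and threshold are hypothetical placeholders rather than Yarowsky's exact formulation:

    def bootstrap(seed_examples, unlabeled, train, predict, threshold=0.95, max_iters=10):
        """Self-training loop.

        seed_examples: list of (context, sense) pairs
        unlabeled:     list of contexts
        train(labeled) -> model            (user-supplied stand-in)
        predict(model, context) -> (sense, confidence)
        """
        labeled = list(seed_examples)
        for _ in range(max_iters):
            model = train(labeled)                  # learn classifier from current labels
            newly_labeled = []
            for context in unlabeled:
                sense, confidence = predict(model, context)
                if confidence >= threshold:         # retain only confident labels
                    newly_labeled.append((context, sense))
            if not newly_labeled:                   # no more data covered: stop
                break
            labeled.extend(newly_labeled)           # add to training set and repeat
            covered = {context for context, _ in newly_labeled}
            unlabeled = [c for c in unlabeled if c not in covered]
        return train(labeled)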

SLIDE 31

One Sense per Discourse

A word tends to preserve its meaning across all its occurrences in a given discourse

  • Gale et al. 1992

– 8 words with two-way ambiguity, e.g. plant, crane, etc.
– 98% of the two-word occurrences in the same discourse carry the same meaning

  • Krovetz 1998

– Heuristic holds mostly for coarse-grained senses and for homonymy rather than polysemy
– Performance of “one sense per discourse” measured on SemCor is approximately 70%

SLIDE 32

One Sense per Collocation

A word tends to preserve its meaning when used in the same collocation

– Strong for adjacent collocations
– Weaker as the distance between words increases

  • Evaluation:

– 97% precision on words with two-way ambiguity
– Again, accuracy depends on granularity:

  • 70% precision on SemCor words
SLIDE 33
SLIDE 34
SLIDE 35
SLIDE 36
SLIDE 37

Yarowsky’s Method: Stopping

  • Stop when:

– Error on training data is less than a threshold
– No more training data is covered

SLIDE 38

Yarowsky’s Method: Discussion

  • Advantages

– Accuracy is about as good as a supervised algorithm
– Bootstrapping: far less manual effort

  • Disadvantages

– Seeds may be tricky to construct
– Works only for coarse-grained sense distinctions
– Snowballing error with co-training

SLIDE 39

WSD with 2nd language as supervision

  • Problem: annotations are expensive!
  • What’s the “proper” sense inventory?

– How fine- or coarse-grained?
– Application-specific?

  • Observation: multiple senses translate to different words in other languages

– Use the foreign language as the sense inventory
– Added bonus: annotations for free! (using machine translation data)
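
A hedged sketch of the idea: in word-aligned parallel text, the foreign word aligned to the English target acts as its sense label, so sense-annotated examples come for free. The toy data and the alignment format below (lists of (English index, French index) pairs) are invented for illustration:

    def harvest_sense_labels(target, aligned_corpus):
        """Collect (English context, foreign translation) pairs for `target`."""
        examples = []
        for en_tokens, fr_tokens, alignment in aligned_corpus:
            for en_i, fr_i in alignment:
                if en_tokens[en_i] == target:
                    # the aligned French word serves as the sense label
                    examples.append((" ".join(en_tokens), fr_tokens[fr_i]))
        return examples

    corpus = [
        (["the", "bank", "of", "the", "river"],
         ["la", "rive", "du", "fleuve"],
         [(0, 0), (1, 1), (2, 2), (4, 3)]),
        (["the", "bank", "approved", "the", "loan"],
         ["la", "banque", "a", "approuvé", "le", "prêt"],
         [(0, 0), (1, 1), (2, 3), (3, 4), (4, 5)]),
    ]
    # "rive" vs. "banque" end up as the two sense labels for "bank".
    print(harvest_sense_labels("bank", corpus))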

SLIDE 40

Today

  • Representing word senses & word relations

– WordNet

  • Word Sense Disambiguation

– Lesk Algorithm
– Supervised classification
– Minimizing supervision

  • Co-training: combine 2 views of data
  • Use parallel text
  • Next: general techniques for learning without annotated training examples

– If needed, review probability and expectations