Semantic Distance, Jimmy Lin, The iSchool, University of Maryland (PowerPoint PPT Presentation)



slide-1
SLIDE 1

Semantic Distance

CMSC 723: Computational Linguistics I ― Session #10

Jimmy Lin, The iSchool, University of Maryland, Wednesday, November 4, 2009

Material drawn from slides by Saif Mohammad and Bonnie Dorr

slide-2
SLIDE 2

Progression of the Course

Words

Finite-state morphology
Part-of-speech tagging (TBL + HMM)

Structure

CFGs + parsing (CKY, Earley)
N-gram language models

Meaning!

slide-3
SLIDE 3

Today’s Agenda

Lexical semantic relations
WordNet

Computational approaches to word similarity

slide-4
SLIDE 4

Lexical Semantic Relations

slide-5
SLIDE 5

What’s meaning?

Let’s start at the word level… How do you define the meaning of a word?

Look it up in the dictionary!

Well, that really doesn’t help…

slide-6
SLIDE 6

Approaches to meaning

Truth conditional
Semantic network

slide-7
SLIDE 7

Word Senses

“Word sense” = distinct meaning of a word

Same word, different senses:

Homonyms (homonymy): unrelated senses; identical orthographic form is coincidental

  • Example: “financial institution” vs. “side of river” for bank

Polysemes (polysemy): related, but distinct senses

  • Example: “financial institution” vs. “sperm bank”

Metonyms (metonymy): “stand in”; technically, a sub-case of polysemy

  • Examples: author for works of the author, building for organization, capital city for government

Different word, same sense:

Synonyms (synonymy)

slide-8
SLIDE 8

Just to confuse you…

Homophones: same pronunciation, different orthography, different meaning

  • Examples: would/wood, to/too/two

Homographs: distinct senses, same orthographic form, different pronunciation

  • Examples: bass (fish) vs. bass (instrument)
slide-9
SLIDE 9

Relationship Between Senses

IS-A relationships

From specific to general (up): hypernym (hypernymy)

  • Example: bird is a hypernym of robin

From general to specific (down): hyponym (hyponymy)

  • Example: robin is a hyponym of bird

Part-Whole relationships

wheel is a meronym of car (meronymy)
car is a holonym of wheel (holonymy)

slide-10
SLIDE 10

WordNet Tour

Material drawn from slides by Christiane Fellbaum

slide-11
SLIDE 11

What is WordNet?

A large lexical database developed and maintained at Princeton University

Includes most English nouns, verbs, adjectives, and adverbs
Electronic format makes it amenable to automatic manipulation: used in many NLP applications

“WordNets” generically refers to similar resources in other languages

slide-12
SLIDE 12

WordNet: History

Research in artificial intelligence:

How do humans store and access knowledge about concepts?
Hypothesis: concepts are interconnected via meaningful relations
Useful for reasoning

The WordNet project started in 1986

Can most (all?) of the words in a language be represented as a semantic network where words are interlinked by meaning?

If so, the result would be a large semantic network

slide-13
SLIDE 13

Synonymy in WordNet

WordNet is organized in terms of “synsets”

Unordered set of (roughly) synonymous “words” (or multi-word phrases)

Each synset expresses a distinct meaning/concept

slide-14
SLIDE 14

WordNet: Example

Noun

  • {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
  • {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
  • {pipe, tube} (a hollow cylindrical shape)
  • {pipe} (a tubular wind instrument)
  • {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)

Verb

  • {shriek, shrill, pipe up, pipe} (utter a shrill cry)
  • {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
  • {pipe} (play on a pipe) “pipe a tune”
  • {pipe} (trim with piping) “pipe the skirt”

Observations about sense granularity?

slide-15
SLIDE 15

The “Net” Part of WordNet

[Diagram: a fragment of the WordNet noun network. Hypernym links run upward: {cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab} → {car; auto; automobile; machine; motorcar} → {motor vehicle; automotive vehicle} → {vehicle} → {conveyance; transport}. Meronym links attach parts: {bumper}, {car door}, {car window}, {car mirror}, and {armrest} to {car; auto; automobile; machine; motorcar}, with {hinge; flexible joint} and {doorlock} as meronyms of {car door}.]
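Chains of hypernym links like the ones in this diagram are easy to sketch with plain dicts. A minimal Python illustration (synset names are abbreviated and the dicts are toy stand-ins, not the real WordNet API):

```python
# Toy stand-in for a fragment of the WordNet noun network.
HYPERNYM = {            # synset -> its hypernym (IS-A, pointing upward)
    "cruiser": "car",
    "cab": "car",
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
    "vehicle": "conveyance",
}
MERONYM = {             # part -> whole
    "bumper": "car",
    "car door": "car",
    "doorlock": "car door",
}

def hypernym_chain(synset):
    """Follow IS-A links upward until we reach a root."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

print(hypernym_chain("cruiser"))
# ['cruiser', 'car', 'motor vehicle', 'vehicle', 'conveyance']
```

The same upward walk is what the path-based similarity metrics later in the lecture rely on.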

slide-16
SLIDE 16

WordNet: Size

Part of speech   Word forms   Synsets
Noun                117,798    82,115
Verb                 11,529    13,767
Adjective            21,479    18,156
Adverb                4,481     3,621
Total               155,287   117,659

http://wordnet.princeton.edu/

slide-17
SLIDE 17

MeSH

Medical Subject Headings: another example of a thesaurus

http://www.nlm.nih.gov/mesh/MBrowser.html

Thesauri, ontologies, taxonomies, etc.

slide-18
SLIDE 18

Word Similarity

slide-19
SLIDE 19

Intuition of Semantic Similarity

Semantically close:

bank–money, apple–fruit, tree–forest, bank–river, pen–paper, run–walk, mistake–error, car–wheel

Semantically distant:

doctor–beer, painting–January, money–river, apple–penguin, nurse–fruit, pen–river, clown–tramway, car–algebra

slide-20
SLIDE 20

Why?

Meaning

The two concepts are close in terms of their meaning

World knowledge

The two concepts have similar properties, often occur together, or occur in similar contexts

Psychology

We often think of the two concepts together

slide-21
SLIDE 21

Two Types of Relations

Synonymy: two words are (roughly) interchangeable
Semantic similarity (distance): somehow “related”

Sometimes an explicit lexical semantic relationship, often not

slide-22
SLIDE 22

Validity of Semantic Similarity

Is semantic distance a valid linguistic phenomenon?

Experiment (Rubenstein and Goodenough, 1965):

Compiled a list of word pairs
Subjects asked to judge semantic distance (from 0 to 4) for each of the word pairs

Results:

Rank correlation between subjects is ~0.9
People are consistent!
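The inter-subject consistency check boils down to a rank correlation. A self-contained sketch using Spearman's formula (the two subjects' judgment values below are invented for illustration):

```python
def rank(values):
    """Rank positions (1 = smallest); no tie handling, fine for a sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(xs), rank(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical 0-4 distance judgments from two subjects over six word pairs:
subj1 = [0.1, 0.5, 1.2, 2.4, 3.0, 3.9]
subj2 = [0.3, 0.4, 1.5, 2.1, 3.3, 3.8]
print(spearman(subj1, subj2))  # 1.0 -- the two subjects rank the pairs identically
```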

slide-23
SLIDE 23

Why do this?

Task: automatically compute semantic similarity between words

Theoretically useful for many applications:

Detecting paraphrases (e.g., automatic essay grading, plagiarism detection)
Information retrieval
Machine translation
…

Solution in search of a problem?

slide-24
SLIDE 24

Types of Evaluations

Intrinsic

Internal to the task itself
With respect to some pre-defined criteria

Extrinsic

Impact on end-to-end task

Analogy with cooking…

slide-25
SLIDE 25

Evaluation: Correlation with Humans

Ask the automatic method to rank word pairs in order of semantic distance

Compare this ranking with a human-created ranking
Measure correlation

slide-26
SLIDE 26

Evaluation: Word-Choice Problems

Identify the alternative which is closest in meaning to the target:

  accidental: wheedle, ferment, inadvertent, abominate
  imprison: incarcerate, writhe, meander, inhibit

slide-27
SLIDE 27

Evaluation: Malapropisms

Jack withdrew money from the ATM next to the band.

band is unrelated to all of the other words in its context…

slide-28
SLIDE 28

Evaluation: Malapropisms

Jack withdrew money from the ATM next to the bank.

Wait, you mean bank?

slide-29
SLIDE 29

Evaluation: Malapropisms

Actually, semantic distance is a poor technique… What’s a simple, better solution?

Even still, the task can be used for a fair comparison

slide-30
SLIDE 30

Word Similarity: Two Approaches

Thesaurus-based

We’ve invested in all these resources… let’s exploit them!

Distributional

Count words in context

slide-31
SLIDE 31

Word Similarity:

Thesaurus-Based Approaches

Note: In theory, applicable to any hierarchically-arranged lexical semantic resource, but most commonly applied to WordNet

slide-32
SLIDE 32

Path-Length Similarity

Similarity based on length of path between concepts:

sim_path(c1, c2) = −log pathlen(c1, c2)
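A minimal sketch of this metric over an invented IS-A fragment (the concept names and tiny hierarchy below are made up for illustration; pathlen counts edges, so sim_path is 0 for immediate neighbors and negative for anything farther apart):

```python
import math
from collections import defaultdict, deque

# Invented IS-A fragment (child -> parent); not real WordNet data.
PARENT = {
    "nickel": "coin", "dime": "coin", "coin": "money",
    "money": "medium of exchange", "credit card": "medium of exchange",
}

def pathlen(c1, c2):
    """Number of edges on the shortest path between two concepts (BFS)."""
    graph = defaultdict(set)
    for child, parent in PARENT.items():
        graph[child].add(parent)
        graph[parent].add(child)
    seen, frontier = {c1}, deque([(c1, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == c2:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            frontier.append((nxt, dist + 1))
    return None  # concepts not connected

def sim_path(c1, c2):
    return -math.log(pathlen(c1, c2))

print(pathlen("nickel", "dime"))             # 2  (nickel - coin - dime)
print(pathlen("nickel", "credit card"))      # 4
print(round(sim_path("nickel", "dime"), 3))  # -0.693
```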

slide-33
SLIDE 33

Concepts vs. Words

Similarity based on length of path between concepts:

sim_path(c1, c2) = −log pathlen(c1, c2)

But which sense? Pick the closest pair:

sim(w1, w2) = max_{c1 ∈ senses(w1), c2 ∈ senses(w2)} sim(c1, c2)

Similar techniques applied to all concept-based metrics
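The max-over-senses lift from concept similarity to word similarity can be sketched directly. Both the sense inventory and the precomputed concept-level scores below are invented for illustration:

```python
# Hypothetical sense inventory: each word maps to its candidate concepts.
SENSES = {
    "bank": ["financial_institution", "river_bank"],
    "money": ["currency"],
}

# Hypothetical precomputed concept-level similarities (symmetric).
CONCEPT_SIM = {
    ("financial_institution", "currency"): 2.1,
    ("river_bank", "currency"): 0.3,
}

def concept_sim(c1, c2):
    return CONCEPT_SIM.get((c1, c2), CONCEPT_SIM.get((c2, c1), 0.0))

def word_sim(w1, w2):
    """sim(w1, w2) = max over all sense pairs of the concept similarity."""
    return max(concept_sim(c1, c2)
               for c1 in SENSES[w1] for c2 in SENSES[w2])

print(word_sim("bank", "money"))  # 2.1 -- the financial senses win
```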

slide-34
SLIDE 34

Wu-Palmer Method

Similarity based on depth of nodes:

sim_Wu-Palmer(c1, c2) = 2 × depth(LCS(c1, c2)) / (depth(c1) + depth(c2))

LCS is the lowest common subsumer
depth(c) is the depth of node c in the hierarchy

Explain the behavior of this similarity metric…

What if the LCS is close? Far? What if c1 and c2 are at different levels in the hierarchy?
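A minimal Wu-Palmer sketch over an invented hierarchy (names are illustrative, and the root is assigned depth 1 here, a convention this sketch assumes):

```python
# Invented IS-A hierarchy (child -> parent) with a single root "entity".
PARENT = {
    "nickel": "coin", "dime": "coin", "coin": "money",
    "money": "medium of exchange", "credit card": "medium of exchange",
    "medium of exchange": "entity",
}

def depth(c):
    """Depth of a node; the root has depth 1."""
    d = 1
    while c in PARENT:
        c = PARENT[c]
        d += 1
    return d

def ancestors(c):
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: first shared concept walking up from c1."""
    shared = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in shared)

def sim_wu_palmer(c1, c2):
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))

print(sim_wu_palmer("nickel", "dime"))         # 0.8  (LCS = coin, deep)
print(sim_wu_palmer("nickel", "credit card"))  # 0.5  (LCS = medium of exchange, shallow)
```

A deep LCS (close to both concepts) pushes the score toward 1; a shallow LCS pulls it toward 0, which is exactly the behavior the slide asks about.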

slide-35
SLIDE 35

Edge-Counting Methods: Discussion

Advantages

Simple, intuitive
Easy to implement

Major disadvantage:

Assumes each edge has same semantic distance… not the case?


slide-36
SLIDE 36

Resnik Method

Probability that a randomly selected word in a corpus is an instance of concept c:

P(c) = [ Σ_{w ∈ words(c)} count(w) ] / N

words(c) is the set of words subsumed by concept c
N is the total number of words in the corpus that are also in the thesaurus

Define “information content”:

IC(c) = −log P(c)

Define similarity:

sim_Resnik(c1, c2) = −log P(LCS(c1, c2))
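A sketch of the information-content computation over an invented hierarchy with made-up corpus counts (nothing below comes from a real corpus):

```python
import math

# Invented IS-A fragment and made-up corpus counts, for illustration only.
PARENT = {"nickel": "coin", "dime": "coin", "coin": "money"}
COUNT = {"nickel": 10, "dime": 10, "coin": 30, "money": 50}
N = sum(COUNT.values())  # words in the corpus that are also in the thesaurus

def ancestors(c):
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def P(c):
    """Probability mass of all words subsumed by concept c."""
    return sum(n for w, n in COUNT.items() if c in ancestors(w)) / N

def IC(c):
    return -math.log(P(c))

def lcs(c1, c2):
    """Lowest common subsumer: first shared concept walking up from c1."""
    shared = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in shared)

def sim_resnik(c1, c2):
    return -math.log(P(lcs(c1, c2)))

print(P("coin"))                     # (10 + 10 + 30) / 100 = 0.5
print(sim_resnik("nickel", "dime"))  # IC(coin) = -log 0.5 ~= 0.693
```

Note that the root subsumes everything, so P(root) = 1 and IC(root) = 0: sharing only the root gives zero similarity.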

slide-37
SLIDE 37

Resnik Method: Example

sim_Resnik(c1, c2) = −log P(LCS(c1, c2))

Explain its behavior…

slide-38
SLIDE 38

Jiang-Conrath Distance

Can we do better than the Resnik method?

Intuition (duh?):

Commonality: the more A and B have in common, the more similar they are
Difference: the more differences between A and B, the less similar they are

Jiang-Conrath Distance:

dist_JC(c1, c2) = 2 × log P(LCS(c1, c2)) − (log P(c1) + log P(c2))

Note: distance, not similarity!

Explain its behavior…

Generally works well
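The same kind of toy setup as the Resnik sketch, extended to the Jiang-Conrath distance (hierarchy and counts are again invented):

```python
import math

# Invented hierarchy (child -> parent) and made-up corpus counts.
PARENT = {"nickel": "coin", "dime": "coin", "coin": "money",
          "credit card": "money"}
COUNT = {"nickel": 10, "dime": 10, "coin": 30,
         "money": 40, "credit card": 10}
N = sum(COUNT.values())

def ancestors(c):
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def P(c):
    return sum(n for w, n in COUNT.items() if c in ancestors(w)) / N

def lcs(c1, c2):
    shared = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in shared)

def dist_jc(c1, c2):
    """2 log P(LCS) - (log P(c1) + log P(c2)) = IC(c1) + IC(c2) - 2 IC(LCS)."""
    return 2 * math.log(P(lcs(c1, c2))) - (math.log(P(c1)) + math.log(P(c2)))

# Siblings under "coin" are much closer than concepts that only share the root:
print(dist_jc("nickel", "dime"))         # ~= 3.22
print(dist_jc("nickel", "credit card"))  # ~= 4.61
```

Unlike Resnik, the distance also grows with the information content of c1 and c2 themselves, so two very specific concepts under a generic LCS come out farther apart.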

slide-39
SLIDE 39

Thesaurus Methods: Limitations

Measure is only as good as the resource
Limited in scope

Assumes IS-A relations
Works mostly for nouns

Role of context not accounted for
Not easily domain-adaptable
Resources not available in many languages

slide-40
SLIDE 40

Quick Aside: Thesauri Induction

Building thesauri automatically? Pattern-based techniques work really well!

Co-training between patterns and relations
Useful for augmenting/adapting existing resources

slide-41
SLIDE 41

Word Similarity:

Distributional Approaches

slide-42
SLIDE 42

Distributional Approaches: Intuition

“You shall know a word by the company it keeps!”

(Firth, 1957)

Intuition:

If two words appear in the same context, then they must be similar
Watch out for antonymy!

Basic idea: represent a word w as a feature vector:

w⃗ = (f1, f2, f3, …, fN)

Features represent the context…

So what’s the context?

slide-43
SLIDE 43

Context Features

Word co-occurrence within a window
Grammatical relations

slide-44
SLIDE 44

Context Features

Feature values:

Boolean
Raw counts
Some other weighting scheme (e.g., idf, tf.idf)
Association values (next slide)

Is anything from last week applicable here?

slide-45
SLIDE 45

Association Metric

Commonly-used metric: Pointwise Mutual Information

association_PMI(w, f) = log2 [ P(w, f) / (P(w) × P(f)) ]

What’s the interpretation?

Can be used as a feature value or by itself
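A tiny PMI computation from co-occurrence counts (all counts below are invented for illustration):

```python
import math

# Hypothetical counts: word w, context feature f, over TOTAL observations.
TOTAL = 1000
count_w = 50    # occurrences of w
count_f = 40    # occurrences of f
count_wf = 20   # co-occurrences of w and f

def pmi(c_wf, c_w, c_f, total):
    """PMI(w, f) = log2 [ P(w, f) / (P(w) * P(f)) ]."""
    p_wf = c_wf / total
    p_w, p_f = c_w / total, c_f / total
    return math.log2(p_wf / (p_w * p_f))

# w and f co-occur 10x more than chance would predict:
print(pmi(count_wf, count_w, count_f, TOTAL))  # log2(0.02 / 0.002) = log2(10)
```

Positive PMI means the pair co-occurs more than independence predicts; 0 means exactly at chance; negative means less than chance.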

slide-46
SLIDE 46

Cosine Distance

Semantic similarity boils down to computing some measure on context vectors

Cosine distance: borrowed from information retrieval

sim_cosine(v⃗, w⃗) = (v⃗ · w⃗) / (|v⃗| × |w⃗|) = [ Σ_{i=1..N} v_i × w_i ] / [ √(Σ_{i=1..N} v_i²) × √(Σ_{i=1..N} w_i²) ]

Interpretation?
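A direct implementation of the formula above; the two context-count vectors are invented, but must be indexed over the same features:

```python
import math

def cosine(v, w):
    """sim_cosine(v, w) = (v . w) / (|v| * |w|)."""
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm

# Hypothetical context-count vectors for two words over the same 4 features:
v = [2, 0, 1, 3]
w = [1, 1, 0, 2]
print(round(cosine(v, w), 3))  # 0.873
print(round(cosine(v, v), 3))  # 1.0 -- identical vectors
```

Cosine is length-invariant: it measures the angle between the vectors, so a frequent word and a rare word with proportional context counts come out identical.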

slide-47
SLIDE 47

Jaccard and Dice

Jaccard:

sim_Jaccard(v⃗, w⃗) = Σ_{i=1..N} min(v_i, w_i) / Σ_{i=1..N} max(v_i, w_i)

Dice:

sim_Dice(v⃗, w⃗) = 2 × Σ_{i=1..N} min(v_i, w_i) / Σ_{i=1..N} (v_i + w_i)
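Both measures as one-liners over the same invented vectors used for the cosine sketch:

```python
def sim_jaccard(v, w):
    """Sum of elementwise min over sum of elementwise max."""
    return (sum(min(a, b) for a, b in zip(v, w))
            / sum(max(a, b) for a, b in zip(v, w)))

def sim_dice(v, w):
    """Twice the elementwise-min sum over the sum of all components."""
    return (2 * sum(min(a, b) for a, b in zip(v, w))
            / sum(a + b for a, b in zip(v, w)))

v = [2, 0, 1, 3]
w = [1, 1, 0, 2]
print(sim_jaccard(v, w))  # (1 + 0 + 0 + 2) / (2 + 1 + 1 + 3) = 3/7
print(sim_dice(v, w))     # 2 * 3 / (3 + 1 + 1 + 5) = 0.6
```

On 0/1 vectors these reduce to the familiar set-overlap versions of Jaccard and Dice.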

slide-48
SLIDE 48

Information-Theoretic Measures

Kullback-Leibler divergence (aka relative entropy):

D(P || Q) = Σ_x P(x) log [ P(x) / Q(x) ]

See any issues? Note: asymmetric

Jensen-Shannon divergence:

JS(P || Q) = D(P || (P + Q)/2) + D(Q || (P + Q)/2)
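A sketch of both divergences over invented distributions. The "issue" the slide hints at shows up directly: KL blows up when Q(x) = 0 while P(x) > 0, which JS avoids by comparing each distribution against their average:

```python
import math

def kl(p, q):
    """D(P || Q) = sum_x P(x) log [P(x)/Q(x)]; terms with P(x)=0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """The slide's form: D(P || (P+Q)/2) + D(Q || (P+Q)/2) (often halved elsewhere)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)

p = [0.5, 0.5, 0.0]
q = [0.1, 0.4, 0.5]
print(js(p, q) == js(q, p))  # True -- symmetric, unlike KL
print(kl(p, p))              # 0.0 -- zero divergence from itself
```

Since the average (P+Q)/2 is nonzero wherever either distribution is, JS is always finite even when the supports differ.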

slide-49
SLIDE 49

Distributional Approaches: Evaluation

Same as thesaurus-based approaches
One additional method: use a thesaurus as ground truth!

slide-50
SLIDE 50

Distributional Approaches: Discussion

No thesauri needed: data driven
Can be applied to any pair of words
Can be adapted to different domains

slide-51
SLIDE 51

Distributional Profiles: Example


slide-52
SLIDE 52

Distributional Profiles: Example


slide-53
SLIDE 53

What’s the problem?


slide-54
SLIDE 54

Distributional Profiles of Concepts


slide-55
SLIDE 55

Semantic Similarity: “Celebrity”

Semantically distant…

slide-56
SLIDE 56

Semantic Similarity: “Celestial body”

Semantically close!

slide-57
SLIDE 57

Solution?

We need word sense disambiguation! Stay tuned for next week…

slide-58
SLIDE 58

Recap: Today’s Agenda

Lexical semantic relations
WordNet

Computational approaches to word similarity