Distributional Semantics. CMSC 723 / LING 723 / INST 725. Marine Carpuat. PowerPoint presentation.



SLIDE 1

Word Similarity & Distributional Semantics

CMSC 723 / LING 723 / INST 725 MARINE CARPUAT

marine@cs.umd.edu

SLIDE 2

Last week…

  • Q: what is understanding meaning?
  • A: knowing the sense of words in context

– Requires a word sense inventory
– Requires a word sense disambiguation algorithm

SLIDE 3

Last week… WordNet

Noun
– {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
– {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
– {pipe, tube} (a hollow cylindrical shape)
– {pipe} (a tubular wind instrument)
– {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
– {shriek, shrill, pipe up, pipe} (utter a shrill cry)
– {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
– {pipe} (play on a pipe) “pipe a tune”
– {pipe} (trim with piping) “pipe the skirt”

SLIDE 4

Last week… WordNet

[Figure: WordNet fragment around “car”. Hypernym links run from {cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab} up through {car; auto; automobile; machine; motorcar}, {motor vehicle; automotive vehicle}, {conveyance; transport}, to {vehicle}. Meronym links connect {car; auto; automobile; machine; motorcar} to its parts {bumper}, {car door}, {car window}, {car mirror}, {armrest}, and {car door} to {hinge; flexible joint}, {doorlock}.]

SLIDE 5

Today…

  • Q: what is understanding meaning?
  • A: knowing when words are similar or not
  • Topics

– Word similarity
– Thesaurus-based methods
– Distributional word representations
– Dimensionality reduction

SLIDE 6

WORD SIMILARITY

SLIDE 7

Intuition of Semantic Similarity

Semantically close

– bank–money
– apple–fruit
– tree–forest
– bank–river
– pen–paper
– run–walk
– mistake–error
– car–wheel

Semantically distant

– doctor–beer
– painting–January
– money–river
– apple–penguin
– nurse–fruit
– pen–river
– clown–tramway
– car–algebra

SLIDE 8

Why are 2 words similar?

  • Meaning

– The two concepts are close in terms of their meaning

  • World knowledge

– The two concepts have similar properties, often occur together, or occur in similar contexts

  • Psychology

– We often think of the two concepts together

SLIDE 9

Two Types of Relations

  • Synonymy: two words are (roughly) interchangeable

  • Semantic similarity (distance): somehow “related”

– Sometimes an explicit lexical semantic relationship; often, not
SLIDE 10

Validity of Semantic Similarity

  • Is semantic distance a valid linguistic phenomenon?

  • Experiment (Rubenstein and Goodenough, 1965)

– Compiled a list of word pairs
– Subjects asked to judge semantic distance (from 0 to 4) for each of the word pairs

  • Results:

– Rank correlation between subjects is ~0.9
– People are consistent!

SLIDE 11

Why do this?

  • Task: automatically compute semantic similarity between words

  • Can be useful for many applications:

– Detecting paraphrases (e.g., automatic essay grading, plagiarism detection)
– Information retrieval
– Machine translation

  • Why? Because similarity gives us a way to generalize beyond word identities

SLIDE 12

Evaluation: Correlation with Humans

  • Ask the automatic method to rank word pairs in order of semantic distance
  • Compare this ranking with a human-created ranking

  • Measure correlation
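This evaluation amounts to computing a rank correlation. A minimal sketch in Python; the score lists below are invented for illustration, and ties are ignored for simplicity:

```python
def rank(values):
    # Rank of each item (1 = smallest); assumes no ties for simplicity.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    # Spearman rank correlation (no ties): 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(xs), rank(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical distance judgments for the same word pairs,
# from human judges and from an automatic method.
human = [3.9, 3.5, 1.2, 0.3]
system = [0.92, 0.85, 0.41, 0.05]
```

Here the two rankings agree exactly, so the correlation is 1.0; in practice, correlations against datasets like Rubenstein and Goodenough's are well below that.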
SLIDE 13

Evaluation: Word-Choice Problems

Identify the alternative that is closest in meaning to the target:
  • accidental: wheedle, ferment, inadvertent, abominate
  • imprison: incarcerate, writhe, meander, inhibit

SLIDE 14

Evaluation: Malapropisms

Jack withdrew money from the ATM next to the band.
“band” (the speaker meant “bank”) is unrelated to all of the other words in its context…

SLIDE 15

Word Similarity: Two Approaches

  • Thesaurus-based

– We’ve invested in all these resources… let’s exploit them!

  • Distributional

– Count words in context

SLIDE 16

THESAURUS-BASED SIMILARITY MODELS

SLIDE 17

Path-Length Similarity

  • Similarity based on the length of the path between concepts:

$\text{sim}_{\text{path}}(c_1, c_2) = -\log \text{pathlen}(c_1, c_2)$

How would you deal with ambiguous words?
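The formula above can be sketched over a toy taxonomy; the IS-A links below are a small hypothetical WordNet-style fragment, and pathlen is counted in nodes (edges + 1) so the log is always defined:

```python
import math

# Toy IS-A taxonomy (child -> parent); a hypothetical WordNet fragment.
parent = {"taxi": "car", "car": "motor vehicle",
          "motor vehicle": "vehicle", "bicycle": "vehicle"}

def ancestors(c):
    # Map each ancestor of c (including c itself) to its depth above c.
    chain, d = {}, 0
    while c is not None:
        chain[c] = d
        c = parent.get(c)
        d += 1
    return chain

def pathlen(c1, c2):
    # Shortest path through the taxonomy, counted in nodes (edges + 1).
    a1, a2 = ancestors(c1), ancestors(c2)
    return 1 + min(a1[c] + a2[c] for c in a1 if c in a2)

def sim_path(c1, c2):
    # sim_path(c1, c2) = -log pathlen(c1, c2)
    return -math.log(pathlen(c1, c2))
```

With this fragment, sim_path("taxi", "car") exceeds sim_path("taxi", "bicycle"), since taxi and bicycle only meet at the distant {vehicle} node. For ambiguous words, one common move is to take the maximum similarity over all sense pairs.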

SLIDE 18

Path-Length Similarity Pros and Cons

  • Advantages

– Simple, intuitive – Easy to implement

  • Major disadvantage:

– Assumes each edge has same semantic distance

SLIDE 19

Resnik Method

  • Probability that a randomly selected word in a corpus is an instance of concept c:

– words(c) is the set of words subsumed by concept c
– N is the total number of words in the corpus that are also in the thesaurus

  • Define “information content”:
  • Define similarity:

$P(c) = \frac{\sum_{w \in \text{words}(c)} \text{count}(w)}{N}$

$\text{IC}(c) = -\log P(c)$

$\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$
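A minimal sketch of the Resnik measure over a toy hierarchy; all counts and IS-A links below are invented for illustration:

```python
import math

# Toy corpus counts and IS-A links; all numbers are invented.
counts = {"taxi": 10, "car": 40, "bicycle": 20, "vehicle": 5}
parent = {"taxi": "car", "car": "vehicle", "bicycle": "vehicle"}
N = sum(counts.values())

def ancestors_of(w):
    # w itself plus every concept above it in the IS-A tree.
    chain = []
    while w is not None:
        chain.append(w)
        w = parent.get(w)
    return chain

def P(c):
    # Probability that a random word token is an instance of concept c.
    subsumed = [w for w in counts if c in ancestors_of(w)]
    return sum(counts[w] for w in subsumed) / N

def IC(c):
    # Information content: -log P(c)
    return -math.log(P(c))

def lcs(c1, c2):
    # Lowest common subsumer: the first shared ancestor.
    a2 = set(ancestors_of(c2))
    return next(c for c in ancestors_of(c1) if c in a2)

def sim_resnik(c1, c2):
    # sim_Resnik(c1, c2) = -log P(LCS(c1, c2)) = IC of the LCS
    return IC(lcs(c1, c2))
```

Note that the root concept subsumes every word, so P(root) = 1 and any pair whose LCS is the root gets similarity 0, as happens for taxi and bicycle here.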

SLIDE 20

Resnik Method: Example

$\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$

SLIDE 21

Thesaurus Methods: Limitations

  • Measure is only as good as the resource
  • Limited in scope

– Assumes IS-A relations
– Works mostly for nouns

  • Role of context not accounted for
  • Not easily domain-adaptable
  • Resources not available in many languages
SLIDE 22

Quick Aside: Thesauri Induction

  • Building thesauri automatically?
  • Pattern-based techniques work really well!

– Co-training between patterns and relations
– Useful for augmenting/adapting existing resources

SLIDE 23

DISTRIBUTIONAL WORD SIMILARITY MODELS

SLIDE 24

Distributional Approaches: Intuition

“You shall know a word by the company it keeps!” (Firth, 1957)
“Differences of meaning correlate with differences of distribution” (Harris, 1970)

  • Intuition:

– If two words appear in the same context, then they must be similar

  • Basic idea: represent a word w as a feature vector

$\vec{w} = (f_1, f_2, f_3, \ldots, f_N)$

SLIDE 25

SLIDE 26

Context Features

  • Word co-occurrence within a window:
  • Grammatical relations:
SLIDE 27

Context Features

  • Feature values

– Boolean
– Raw counts
– Some other weighting scheme (e.g., idf, tf.idf)
– Association values (next slide)
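Window-based context features with raw counts can be sketched as follows; the toy sentence below is invented for illustration:

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    # For each word, count the words within +/- `window` positions of it.
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][tokens[j]] += 1
    return vectors

toy = "the cat sat on the mat the dog sat on the rug".split()
vecs = cooccurrence_vectors(toy, window=2)
```

Each resulting Counter is one word's context vector; swapping raw counts for Boolean values or association scores only changes the values, not the structure.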

SLIDE 28

Association Metric

  • Commonly-used metric: Pointwise Mutual

Information

  • Can be used as a feature value or by itself

$\text{association}_{\text{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)}$

SLIDE 29

Computing Similarity

  • Semantic similarity boils down to computing some measure on context vectors
  • Cosine similarity: borrowed from information retrieval

$\text{sim}_{\text{cosine}}(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$
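The cosine measure can be sketched directly from its definition:

```python
import math

def sim_cosine(v, w):
    # cos(v, w) = (v . w) / (|v| |w|)
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)
```

Because the dot product is divided by both norms, only the direction of the vectors matters: a word and a scaled copy of its vector get similarity 1, and vectors with no shared features get 0.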

SLIDE 30

Distributional Approaches: Discussion

  • No thesauri needed: data driven
  • Can be applied to any pair of words
  • Can be adapted to different domains
SLIDE 31

Distributional Profiles: Example

SLIDE 32

Distributional Profiles: Example

SLIDE 33

Problem?

SLIDE 34

Distributional Profiles of Concepts

SLIDE 35

Semantic Similarity: “Celebrity”

Semantically distant…

SLIDE 36

Semantic Similarity: “Celestial body”

Semantically close!

SLIDE 37

DIMENSIONALITY REDUCTION

Slides based on presentation by Christopher Potts

SLIDE 38

Why dimensionality reduction?

  • So far, we’ve defined word representations as rows in F, an m × n matrix

– m = vocabulary size
– n = number of context dimensions / features

  • Problems: n is very large, F is very sparse
  • Solution: find a low-rank approximation of F

– Matrix of size m × d where d << n

SLIDE 39

Methods

  • Latent Semantic Analysis
  • Also:

– Principal component analysis
– Probabilistic LSA
– Latent Dirichlet Allocation
– Word2vec
– …

SLIDE 40

Latent Semantic Analysis

  • Based on Singular Value Decomposition
SLIDE 41

LSA illustrated: SVD + select top k dimensions
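The SVD + top-k step can be sketched with NumPy (assumed available); the 4-word × 3-context count matrix below is invented for illustration:

```python
import numpy as np

def lsa(F, k):
    # Truncated SVD: F ~ U_k S_k V_k^T; rows of U_k S_k are the
    # k-dimensional word representations.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return U[:, :k] * s[:k]

# Hypothetical 4-word x 3-context count matrix.
F = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 2.0, 1.0]])

W = lsa(F, k=2)  # each word now lives in 2 dimensions
```

Keeping only the top k singular values gives the best rank-k approximation of F in the least-squares sense, which is what makes the reduced rows usable as dense word vectors.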

SLIDE 42

Before & After LSA (k=100)

SLIDE 43

Methods

  • Latent Semantic Analysis
  • Also:

– Principal component analysis
– Probabilistic LSA
– Latent Dirichlet Allocation
– Word2vec
– …

SLIDE 44

Recap: Today

  • Q: what is understanding meaning?
  • A: meaning is knowing when words are similar or not

  • Topics

– Word similarity
– Thesaurus-based methods
– Distributional word representations
– Dimensionality reduction

SLIDE 45

Bonus…

  • Let’s try our hand at annotating word similarity