Word Similarity & Distributional Semantics
CMSC 723 / LING 723 / INST 725 MARINE CARPUAT
marine@cs.umd.edu
Last week
Q: What is understanding meaning?
A: Knowing the sense of words in context
– Requires a word sense inventory
– Requires a word sense disambiguation algorithm
Noun
– {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
– {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
– {pipe, tube} (a hollow cylindrical shape)
– {pipe} (a tubular wind instrument)
– {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
– {shriek, shrill, pipe up, pipe} (utter a shrill cry)
– {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
– {pipe} (play on a pipe) “pipe a tune”
– {pipe} (trim with piping) “pipe the skirt”
[WordNet fragment: hypernym (IS-A) and meronym (part-of) links around “car”; each child below is linked to its parent by a hypernym edge]
{vehicle}
  {conveyance; transport}
    {motor vehicle; automotive vehicle}
      {car; auto; automobile; machine; motorcar}
        {cruiser; squad car; patrol car; police car; prowl car}
        {cab; taxi; hack; taxicab}
Meronyms (parts) of {car; auto; automobile; machine; motorcar}:
  {bumper}, {car door}, {car window}, {car mirror}, {hinge; flexible joint}, {doorlock}, {armrest}
– Word similarity
– Thesaurus-based methods
– Distributional word representations
– Dimensionality reduction
Semantically close
– bank–money
– apple–fruit
– tree–forest
– bank–river
– pen–paper
– run–walk
– mistake–error
– car–wheel
Semantically distant
– doctor–beer
– painting–January
– money–river
– apple–penguin
– nurse–fruit
– pen–river
– clown–tramway
– car–algebra
– The two concepts are close in terms of their meaning
– The two concepts have similar properties, and occur in similar contexts
– We often think of the two concepts together
– The two words are often interchangeable in context
– Sometimes there is an explicit lexical semantic relationship; but is similarity a single phenomenon?
– Compiled a list of word pairs
– Subjects asked to judge semantic distance (from 0 to 4) for each of the word pairs
– Rank correlation between subjects is ~0.9
– People are consistent!
– Detecting paraphrases (e.g., automatic essay grading, plagiarism detection)
– Information retrieval
– Machine translation
Identify the alternative that is closest in meaning to the target:
– accidental: wheedle, ferment, inadvertent, abominate
– imprison: incarcerate, writhe, meander, inhibit
“Jack withdrew money from the ATM next to the band.”
“band” is unrelated to all of the other words in its context…
– We’ve invested in all these resources… let’s exploit them!
– Count words in context
Path-based similarity between two concepts c1 and c2:

  sim_path(c1, c2) = -log pathlen(c1, c2)

where pathlen(c1, c2) is the number of edges in the shortest path between c1 and c2 in the thesaurus hierarchy.
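A minimal sketch of path-based similarity over a toy IS-A hierarchy. The concept names and parent links below are made up for illustration; a real system would walk a thesaurus such as WordNet.

```python
import math

# Toy IS-A hierarchy (child -> parent); all entries are illustrative only
parent = {
    "taxi": "car",
    "cruiser": "car",
    "car": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bicycle": "vehicle",
}

def ancestors(c):
    """Chain [c, parent(c), grandparent(c), ...] up to the root."""
    chain = [c]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def pathlen(c1, c2):
    """Edges on the shortest path between c1 and c2 through the hierarchy."""
    a1, a2 = ancestors(c1), ancestors(c2)
    for i, node in enumerate(a1):
        if node in a2:
            return i + a2.index(node)  # up from c1, then down to c2
    return None  # no common ancestor

def sim_path(c1, c2):
    """sim_path(c1, c2) = -log pathlen(c1, c2); larger means more similar."""
    return -math.log(pathlen(c1, c2))
```

Here "taxi" and "cruiser" (2 edges apart via "car") come out more similar than "taxi" and "bicycle" (4 edges apart via "vehicle"), matching the intuition that shorter paths mean closer meanings.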
How would you deal with ambiguous words?
– Simple, intuitive
– Easy to implement
– Assumes each edge has same semantic distance
P(c): probability that a randomly selected word in a corpus is an instance of concept c:

  P(c) = ( Σ_{w ∈ words(c)} count(w) ) / N

– words(c) is the set of words subsumed by concept c
– N is total number of words in corpus also in thesaurus

Information content: IC(c) = -log P(c)

Resnik similarity: sim_Resnik(c1, c2) = -log P(LCS(c1, c2))
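The definitions above can be sketched in a few lines over a toy taxonomy. The parent links and corpus counts are invented for illustration; in practice counts come from a sense-tagged or heuristically tagged corpus.

```python
import math

# Toy taxonomy (child -> parent) and made-up corpus counts per word
parent = {"taxi": "car", "cruiser": "car", "car": "vehicle", "bicycle": "vehicle"}
count = {"taxi": 10, "cruiser": 5, "car": 50, "bicycle": 20, "vehicle": 15}
N = sum(count.values())  # total words in corpus that are also in the thesaurus

def ancestors(c):
    chain = [c]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def P(c):
    """Probability that a randomly selected word is an instance of concept c."""
    return sum(n for w, n in count.items() if c in ancestors(w)) / N

def IC(c):
    """Information content: IC(c) = -log P(c)."""
    return -math.log(P(c))

def LCS(c1, c2):
    """Lowest common subsumer: the first shared ancestor of c1 and c2."""
    a2 = ancestors(c2)
    return next(a for a in ancestors(c1) if a in a2)

def sim_resnik(c1, c2):
    """sim_Resnik(c1, c2) = -log P(LCS(c1, c2)) = IC(LCS(c1, c2))."""
    return IC(LCS(c1, c2))
```

Note that the root concept subsumes every word, so P(root) = 1 and IC(root) = 0: sharing only the root gives zero Resnik similarity, while a more specific (lower-probability) common subsumer gives a higher score.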
– Assumes IS-A relations
– Works mostly for nouns
– Co-training between patterns and relations
– Useful for augmenting/adapting existing resources
“You shall know a word by the company it keeps!” (Firth, 1957)
“Differences of meaning correlate with differences of distribution.” (Harris, 1954)
– If two words appear in the same context, then they must be similar
Represent each word as a vector of N context features: w = (f1, f2, f3, …, fN)
– Boolean
– Raw counts
– Some other weighting scheme (e.g., idf, tf.idf)
– Association values (next slide)
association_PMI(w, f) = log2 [ P(w, f) / ( P(w) P(f) ) ]
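A small sketch of PMI estimated from co-occurrence counts. The (word, feature) counts below are made up for illustration; real systems estimate these probabilities from a large corpus.

```python
import math

# Made-up (word, context-feature) co-occurrence counts
pair_count = {
    ("apple", "eat"): 10, ("apple", "drive"): 1,
    ("car", "eat"): 1, ("car", "drive"): 8,
}
total = sum(pair_count.values())

def P_joint(w, f):
    return pair_count.get((w, f), 0) / total

def P_word(w):
    return sum(n for (ww, _), n in pair_count.items() if ww == w) / total

def P_feat(f):
    return sum(n for (_, ff), n in pair_count.items() if ff == f) / total

def pmi(w, f):
    """association_PMI(w, f) = log2( P(w, f) / (P(w) * P(f)) )."""
    return math.log2(P_joint(w, f) / (P_word(w) * P_feat(f)))
```

PMI is positive when a word and feature co-occur more often than chance ("apple" with "eat") and negative when they co-occur less often than chance ("apple" with "drive").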
sim_cosine(v, w) = (v · w) / (|v| |w|) = Σ_{i=1}^{N} v_i w_i / ( √(Σ_{i=1}^{N} v_i²) √(Σ_{i=1}^{N} w_i²) )
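The cosine formula translates directly into code; a minimal pure-Python version over plain lists:

```python
import math

def sim_cosine(v, w):
    """Cosine similarity: dot(v, w) / (|v| * |w|)."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(x * x for x in v))
    norm_w = math.sqrt(sum(x * x for x in w))
    return dot / (norm_v * norm_w)
```

Parallel vectors score 1.0 regardless of length (cosine ignores magnitude, which is why it is preferred over raw dot product for word vectors), and orthogonal vectors score 0.0.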
Semantically distant…
Semantically close!
Slides based on presentation by Christopher Potts
Words as rows in F, an m × n matrix
– m = vocab size
– n = number of context dimensions / features
– Matrix of size m × d where d << n
– Principal component analysis
– Probabilistic LSA
– Latent Dirichlet Allocation
– Word2vec
– …
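One concrete way to get the m × d matrix is a truncated SVD of F (the LSA-style reduction). The count matrix below is invented for illustration: rows 0–1 and rows 2–3 use disjoint context features, so the reduced vectors should keep the first pair close to each other and far from the second.

```python
import numpy as np

# Hypothetical word-context count matrix F (m=4 words, n=5 features)
F = np.array([[2., 0., 1., 0., 0.],
              [1., 0., 2., 0., 0.],
              [0., 3., 0., 2., 1.],
              [0., 2., 0., 3., 1.]])

d = 2  # number of dimensions to keep, d << n
U, S, Vt = np.linalg.svd(F, full_matrices=False)
W = U[:, :d] * S[:d]  # m x d matrix of reduced word vectors

def cos(a, b):
    """Cosine similarity between two reduced word vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

Truncating to the top d singular values gives the best rank-d approximation of F in the least-squares sense, so similar rows of F map to nearby rows of W.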
– Word similarity
– Thesaurus-based methods
– Distributional word representations
– Dimensionality reduction