

  1. Word Similarity & Distributional Semantics
CMSC 723 / LING 723 / INST 725
Marine Carpuat, marine@cs.umd.edu

  2. Last week… • Q: what is understanding meaning? • A: knowing the sense of words in context – Requires word sense inventory – Requires a word sense disambiguation algorithm

  3. Last week… WordNet
Noun
– {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
– {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
– {pipe, tube} (a hollow cylindrical shape)
– {pipe} (a tubular wind instrument)
– {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
– {shriek, shrill, pipe up, pipe} (utter a shrill cry)
– {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
– {pipe} (play on a pipe) “pipe a tune”
– {pipe} (trim with piping) “pipe the skirt”

  4. Last week… WordNet
[Figure: a fragment of the WordNet noun hierarchy around {car; auto; automobile; machine; motorcar}. Hypernym links run up through {motor vehicle; automotive vehicle} and {vehicle} to {conveyance; transport}, and down to the hyponyms {cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab}. Meronym links point to parts: {bumper}, {car door} (itself with meronyms {hinge; flexible joint} and {door lock}), {car window}, {armrest}, and {car mirror}.]

  5. Today • Q: what is understanding meaning? • A: knowing when words are similar or not • Topics – Word similarity – Thesaurus-based methods – Distributional word representations – Dimensionality reduction

  6. WORD SIMILARITY

  7. Intuition of Semantic Similarity
Semantically close: bank – money, apple – fruit, tree – forest, bank – river, pen – paper, run – walk, mistake – error, car – wheel
Semantically distant: doctor – beer, painting – January, money – river, apple – penguin, nurse – fruit, pen – river, clown – tramway, car – algebra

  8. Why are 2 words similar? • Meaning – The two concepts are close in terms of their meaning • World knowledge – The two concepts have similar properties, often occur together, or occur in similar contexts • Psychology – We often think of the two concepts together

  9. Two Types of Relations • Synonymy: two words are (roughly) interchangeable • Semantic similarity (distance): somehow “related” – Sometimes an explicit lexical semantic relationship; often not

  10. Validity of Semantic Similarity • Is semantic distance a valid linguistic phenomenon? • Experiment (Rubenstein and Goodenough, 1965) – Compiled a list of word pairs – Subjects asked to judge semantic distance (from 0 to 4) for each of the word pairs • Results: – Rank correlation between subjects is ~0.9 – People are consistent!

  11. Why do this? • Task: automatically compute semantic similarity between words • Can be useful for many applications: – Detecting paraphrases (e.g., automatic essay grading, plagiarism detection) – Information retrieval – Machine translation • Why? Because similarity gives us a way to generalize beyond word identities

  12. Evaluation: Correlation with Humans • Ask automatic method to rank word pairs in order of semantic distance • Compare this ranking with human-created ranking • Measure correlation
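A minimal sketch of this evaluation, assuming SciPy is available; the scores below are made-up illustrations, not data from the slides:

```python
# Hypothetical illustration of rank-correlation evaluation: compare
# human similarity judgments against automatic scores for the same
# word pairs (pip install scipy).
from scipy.stats import spearmanr

# Made-up numbers for four word pairs, in the same pair order.
human_scores = [3.9, 3.5, 1.2, 0.4]       # e.g., 0-4 judgments, R&G style
system_scores = [0.92, 0.71, 0.35, 0.10]  # automatic similarity scores

rho, p_value = spearmanr(human_scores, system_scores)
print(f"Spearman rank correlation: {rho:.2f}")
```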

  13. Evaluation: Word-Choice Problems
Identify the alternative that is closest in meaning to the target:
Target: accidental. Alternatives: wheedle, ferment, inadvertent, abominate
Target: imprison. Alternatives: incarcerate, writhe, meander, inhibit

  14. Evaluation: Malapropisms Jack withdrew money from the ATM next to the band. band is unrelated to all of the other words in its context, flagging it as a likely malapropism (for bank)…

  15. Word Similarity: Two Approaches • Thesaurus-based – We’ve invested in all these resources… let’s exploit them! • Distributional – Count words in context

  16. THESAURUS-BASED SIMILARITY MODELS

  17. Path-Length Similarity • Similarity based on length of path between concepts:
$\text{sim}_{\text{path}}(c_1, c_2) = -\log \text{pathlen}(c_1, c_2)$
How would you deal with ambiguous words?
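A minimal sketch of this measure using NLTK's WordNet interface (an assumption; the slides do not prescribe an implementation). Taking the max over all sense pairs is one common answer to the ambiguity question:

```python
# Sketch of path-length similarity over WordNet with NLTK
# (pip install nltk; then nltk.download("wordnet")).
import math
from nltk.corpus import wordnet as wn

def sim_path(c1, c2):
    # shortest_path_distance counts edges; add 1 so identical
    # concepts get pathlen 1 instead of log(0).
    dist = c1.shortest_path_distance(c2)
    if dist is None:  # no connecting path in the hierarchy
        return float("-inf")
    return -math.log(dist + 1)

def word_sim_path(w1, w2):
    # Handle ambiguous words by scoring the most similar sense pair.
    return max(
        (sim_path(s1, s2) for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)),
        default=float("-inf"),
    )

print(word_sim_path("car", "automobile"))  # 0.0: they share a synset
print(word_sim_path("car", "fruit"))       # much lower
```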

  18. Path-Length Similarity Pros and Cons • Advantages – Simple, intuitive – Easy to implement • Major disadvantage: – Assumes each edge has same semantic distance

  19. Resnik Method • Probability that a randomly selected word in a corpus is an instance of concept c:
$P(c) = \frac{\sum_{w \in \text{words}(c)} \text{count}(w)}{N}$
– words(c) is the set of words subsumed by concept c
– N is total number of words in corpus also in thesaurus
• Define “information content”: $IC(c) = -\log P(c)$
• Define similarity: $\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$
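A short sketch using NLTK, whose precomputed information-content tables implement the P(c) and IC(c) definitions above (the Brown-corpus IC file is an assumption; any corpus-derived table would do):

```python
# Sketch of Resnik similarity with NLTK's Brown-corpus IC table
# (nltk.download("wordnet"); nltk.download("wordnet_ic")).
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # tables of IC(c) = -log P(c)

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
# res_similarity returns the IC of the lowest common subsumer (LCS).
print(dog.res_similarity(cat, brown_ic))
```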

  20. Resnik Method: Example $\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$

  21. Thesaurus Methods: Limitations • Measure is only as good as the resource • Limited in scope – Assumes IS-A relations – Works mostly for nouns • Role of context not accounted for • Not easily domain-adaptable • Resources not available in many languages

  22. Quick Aside: Thesauri Induction • Building thesauri automatically? • Pattern-based techniques work really well! – Co-training between patterns and relations – Useful for augmenting/adapting existing resources

  23. DISTRIBUTIONAL WORD SIMILARITY MODELS

  24. Distributional Approaches: Intuition “You shall know a word by the company it keeps!” (Firth, 1957) “Differences of meaning correlates with differences of distribution” (Harris, 1970) • Intuition: – If two words appear in the same context, then they must be similar • Basic idea: represent a word w as a feature vector $\vec{w} = (f_1, f_2, f_3, \ldots, f_N)$

  25. Context Features • Word co-occurrence within a window: • Grammatical relations:
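A toy sketch of the window-based variant (an illustration only; real systems add tokenization, filtering, and the weighting schemes on the next slide):

```python
# Toy sketch: word co-occurrence vectors from tokenized text,
# using a symmetric window of +/- 2 words.
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

vecs = cooccurrence_vectors("the cat sat on the mat".split())
print(vecs["cat"])  # Counter({'the': 1, 'sat': 1, 'on': 1})
```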

  26. Context Features • Feature values – Boolean – Raw counts – Some other weighting scheme (e.g., idf, tf.idf ) – Association values (next slide)

  27. Association Metric • Commonly-used metric: Pointwise Mutual Information
$\text{association}_{\text{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)}$
• Can be used as a feature value or by itself
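A sketch of computing PMI from raw counts; the count tables here are hypothetical stand-ins for the context-feature counts above:

```python
# PMI(w, f) = log2( P(w, f) / (P(w) * P(f)) ), computed from raw counts.
import math

def pmi(w, f, pair_counts, word_counts, feat_counts, total):
    p_wf = pair_counts[(w, f)] / total
    p_w = word_counts[w] / total
    p_f = feat_counts[f] / total
    return math.log2(p_wf / (p_w * p_f))

# Toy counts out of 100 observed (word, feature) events.
pair_counts = {("drink", "beer"): 3}
word_counts = {"drink": 10}
feat_counts = {"beer": 5}
print(pmi("drink", "beer", pair_counts, word_counts, feat_counts, 100))
# log2(0.03 / (0.1 * 0.05)) = log2(6), roughly 2.58
```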

  28. Computing Similarity • Semantic similarity boils down to computing some measure on context vectors • Cosine distance: borrowed from information retrieval
$\text{sim}_{\text{cosine}}(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$
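The cosine measure in a few lines of plain Python (a sketch; NumPy would do the same thing faster on large vectors):

```python
import math

def cosine(v, w):
    # sim(v, w) = (v . w) / (|v| |w|)
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / (norm_v * norm_w)

print(cosine([1, 2, 0], [2, 4, 1]))  # about 0.98: nearly parallel vectors
```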

  29. Distributional Approaches: Discussion • No thesauri needed: data driven • Can be applied to any pair of words • Can be adapted to different domains

  30. Distributional Profiles: Example

  31. Distributional Profiles: Example

  32. Problem?

  33. Distributional Profiles of Concepts

  34. Semantic Similarity: “Celebrity” Semantically distant…

  35. Semantic Similarity: “Celestial body” Semantically close!

  36. DIMENSIONALITY REDUCTION Slides based on presentation by Christopher Potts

  37. Why dimensionality reduction? • So far, we’ve defined word representations as rows in F, an m × n matrix – m = vocab size – n = number of context dimensions / features • Problems: n is very large, F is very sparse • Solution: find a low-rank approximation of F – Matrix of size m × d where d ≪ n

  38. Methods • Latent Semantic Analysis • Also: – Principal component analysis – Probabilistic LSA – Latent Dirichlet Allocation – Word2vec – …

  39. Latent Semantic Analysis • Based on Singular Value Decomposition

  40. LSA illustrated: SVD + select top k dimensions
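A sketch of the SVD-and-truncate step with NumPy, assuming F is a dense m × n count matrix (real systems use sparse SVD routines):

```python
# Rank-k LSA word representations from an m x n matrix F.
import numpy as np

def lsa_embed(F, k):
    # Full SVD F = U S V^T, then keep the top-k singular dimensions.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return U[:, :k] * s[:k]  # rows of this m x k matrix are word vectors

F = np.random.rand(500, 2000)  # toy stand-in for a word-context matrix
W = lsa_embed(F, k=100)
print(W.shape)  # (500, 100)
```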

  41. Before & After LSA (k=100)

  42. Methods • Latent Semantic Analysis • Also: – Principal component analysis – Probabilistic LSA – Latent Dirichlet Allocation – Word2vec – …

  43. Recap: Today • Q: what is understanding meaning? • A: meaning is knowing when words are similar or not • Topics – Word similarity – Thesaurus-based methods – Distributional word representations – Dimensionality reduction

  44. Bonus… • Let’s try our hand at annotating word similarity
