Natural Language Processing (CSEP 517): Distributional Semantics - PowerPoint PPT Presentation

Natural Language Processing (CSEP 517): Distributional Semantics Roy Schwartz � 2017 c University of Washington roysch@cs.washington.edu May 15, 2017 1 / 59

To-Do List ◮ Read: (Jurafsky and Martin, 2016a,b) 2 / 59

Distributional Semantics Models Aka, Vector Space Models, Word Embeddings  -0.23   -0.72  -0.21 -00.2         -0.15 -0.71         -0.61 -0.13 v mountain = , v lion =      .   .  . .     . .         -0.02 0-0.1     -0.12 -0.11 3 / 59

Distributional Semantics Models Aka, Vector Space Models, Word Embeddings  -0.23   -0.72  -0.21 -00.2         -0.15 -0.71     mountain     -0.61 -0.13 v mountain = , v lion =      .   .  . .     . .     lion     -0.02 0-0.1     -0.12 -0.11 4 / 59

Distributional Semantics Models Aka, Vector Space Models, Word Embeddings  -0.23   -0.72  -0.21 -00.2         -0.15 -0.71     mountain     -0.61 -0.13 v mountain = , v lion =      .   .  . .     . .     lion     -0.02 0-0.1     -0.12 -0.11 θ 5 / 59

Distributional Semantics Models Aka, Vector Space Models, Word Embeddings  -0.23   -0.72  mountain lion -0.21 -00.2         -0.15 -0.71     mountain     -0.61 -0.13 v mountain = , v lion =      .   .  . .     . .     lion     -0.02 0-0.1     -0.12 -0.11 6 / 59

Distributional Semantics Models Aka, Vector Space Models, Word Embeddings Applications Linguistic Study Deep learning models: Lexical Semantics Machine Translation Multilingual Studies Question Answering Evolution of Language Syntactic Parsing . . . . . . 7 / 59

Outline Vector Space Models Lexical Semantic Applications Word Embeddings Compositionality Current Research Problems 8 / 59

Distributional Semantics Hypothesis Harris (1954) Words that have similar contexts are likely to have similar meaning 10 / 59

Distributional Semantics Hypothesis Harris (1954) Words that have similar contexts are likely to have similar meaning 11 / 59

Vector Space Models ◮ Representation of words by vectors of real numbers ◮ ∀ w ∈ V , v w is function of the contexts in which w occurs ◮ Vectors are computed using a large text corpus ◮ No requirement for any sort of annotation in the general case 12 / 59

V 1 . 0 : Count Models Salton (1971) ◮ Each element v w i ∈ v w represents the co-occurrence of w with another word i ◮ v dog = (cat: 10, leash: 15, loyal: 27, bone: 8, piano: 0, cloud: 0, . . . ) ◮ Vector dimension is typically very large (vocabulary size) ◮ Main motivation: lexical semantics 13 / 59

Count Models Example  0   0  0 2         15 11         17 13 v dog = , v cat =      .   .  . .     . .         0 20     102 11 14 / 59

Count Models Example dog  0   0  0 2         15 11 cat         17 13 v dog = , v cat =      .   .  . .     . .         0 20     102 11 15 / 59

Count Models Example dog  0   0  0 2         15 11 cat         17 13 v dog = , v cat =      .   .  . .     . .         0 20     θ 102 11 16 / 59

Variants of Count Models ◮ Reduce the effect of high frequency words by applying a weighting scheme ◮ Pointwise mutual information (PMI), TF-IDF 17 / 59

Variants of Count Models ◮ Reduce the effect of high frequency words by applying a weighting scheme ◮ Pointwise mutual information (PMI), TF-IDF ◮ Smoothing by dimensionality reduction ◮ Singular value decomposition (SVD), principal component analysis (PCA), matrix factorization methods 18 / 59

Variants of Count Models ◮ Reduce the effect of high frequency words by applying a weighting scheme ◮ Pointwise mutual information (PMI), TF-IDF ◮ Smoothing by dimensionality reduction ◮ Singular value decomposition (SVD), principal component analysis (PCA), matrix factorization methods ◮ What is a context? ◮ Bag-of-words context, document context (Latent Semantic Analysis (LSA)), dependency contexts, pattern contexts 19 / 59

Vector Space Models Evaluation ◮ Vector space models as features ◮ Synonym detection ◮ TOEFL (Landauer and Dumais, 1997) ◮ Word clustering ◮ CLUTO (Karypis, 2002) 21 / 59

Vector Space Models Evaluation ◮ Vector space models as features ◮ Synonym detection ◮ TOEFL (Landauer and Dumais, 1997) ◮ Word clustering ◮ CLUTO (Karypis, 2002) ◮ Vector operations ◮ Semantic Similarity ◮ RG-65 (Rubenstein and Goodenough, 1965), wordsim353 (Finkelstein et al., 2001), MEN (Bruni et al., 2014), SimLex999 (Hill et al., 2015) ◮ Word Analogies ◮ Mikolov et al. (2013) 22 / 59

Semantic Similarity w 1 w 2 human score model score tiger cat 7.35 0.8 computer keyboard 7.62 0.54 . . . . . . . . . . . . architecture century 3.78 0.03 book paper 7.46 0.66 king cabbage 0.23 -0.42 Table: Human scores taken from wordsim353 (Finkelstein et al., 2001) ◮ Model scores are cosine similarity scores between vectors ◮ Model’s performance is the Spearman/Pearson correlation between human ranking and model ranking 23 / 59

Word Analogy Mikolov et al. (2013) France woman Italy queen man Paris Rome king 24 / 59

V 2 . 0 : Predict Models (Aka Word Embeddings) ◮ A new generation of vector space models ◮ Instead of representing vectors as cooccurrence counts, train a supervised machine learning algorithm to predict p ( word | context ) ◮ Models learn a latent vector representation of each word ◮ These representations turn out to be quite effective vector space representations ◮ Word embeddings 26 / 59

Word Embeddings ◮ Vector size is typically a few dozens to a few hundreds ◮ Vector elements are generally uninterpretable ◮ Developed to initialize feature vectors in deep learning models ◮ Initially language models, nowadays virtually every sequence level NLP task ◮ Bengio et al. (2003); Collobert and Weston (2008); Collobert et al. (2011); word2vec (Mikolov et al., 2013); GloVe (Pennington et al., 2014) 27 / 59

Word Embeddings ◮ Vector size is typically a few dozens to a few hundreds ◮ Vector elements are generally uninterpretable ◮ Developed to initialize feature vectors in deep learning models ◮ Initially language models, nowadays virtually every sequence level NLP task ◮ Bengio et al. (2003); Collobert and Weston (2008); Collobert et al. (2011); word2vec (Mikolov et al., 2013); GloVe (Pennington et al., 2014) 28 / 59

word2vec Mikolov et al. (2013) ◮ A software toolkit for running various word embedding algorithms Based on (Goldberg and Levy, 2014) 29 / 59

word2vec Mikolov et al. (2013) ◮ A software toolkit for running various word embedding algorithms � ◮ Continuous bag-of-words: argmax p ( w | C ( w ); θ ) θ w ∈ corpus Based on (Goldberg and Levy, 2014) 30 / 59

word2vec Mikolov et al. (2013) ◮ A software toolkit for running various word embedding algorithms � ◮ Continuous bag-of-words: argmax p ( w | C ( w ); θ ) θ w ∈ corpus � ◮ Skip-gram: argmax p ( c | w ; θ ) θ ( w,c ) ∈ corpus Based on (Goldberg and Levy, 2014) 31 / 59

word2vec Mikolov et al. (2013) ◮ A software toolkit for running various word embedding algorithms � ◮ Continuous bag-of-words: argmax p ( w | C ( w ); θ ) θ w ∈ corpus � ◮ Skip-gram: argmax p ( c | w ; θ ) θ ( w,c ) ∈ corpus ◮ Negative sampling: randomly sample negative ( word,context ) pairs, then: � � (1 − p ( c ′ | w ; θ )) argmax p ( c | w ; θ ) · θ ( w,c ) ∈ corpus ( w,c ′ ) Based on (Goldberg and Levy, 2014) 32 / 59

Skip-gram with Negative Sampling (SGNS) ◮ Obtained significant improvements on a range of lexical semantic tasks ◮ Is very fast to train, even on large corpora ◮ Nowadays, by far the most popular word embedding approach 1 1 Along with GloVe (Pennington et al., 2014) 33 / 59

Natural Language Processing (CSEP 517): Distributional Semantics - PowerPoint PPT Presentation

Natural Language Processing (CSEP 517): Distributional Semantics Roy Schwartz 2017 c University of Washington roysch@cs.washington.edu May 15, 2017 1 / 59 To-Do List Read: (Jurafsky and Martin, 2016a,b) 2 / 59 Distributional

CSEP 517 Natural Language Processing Language Models Luke Zettlemoyer Slides adapted from Dan

CSEP 517 Natural Language Processing Introduction Luke Zettlemoyer Slides adapted from Dan

CSEP 517: Natural Language Processing New PMP Course! Instructor: Luke Zettlemoyer Autumn 2013

CSEP 517 Natural Language Processing Autumn 2018 Introduction Luke Zettlemoyer Slides adapted

Natural Language Processing (CSEP 517): Computational Pragmatics Chenhao Tan 2017 c

Natural Language Processing (CSEP 517): Introduction & Language Models Noah Smith c 2017

CSEP 517 Natural Language Processing Autumn 2015 Parsing (Trees) Yejin Choi - University of

CSEP 517 Natural Language Processing Frame Semantics Luke Zettlemoyer Slides adapted from Yejin

CSEP 517: Natural Language Processing Recurrent Neural Networks Autumn 2018 Luke Zettlemoyer

CSEP 517 Natural Language Processing Autumn 2015 Introduction Yejin Choi Slides adapted

CSEP 517 Natural Language Processing Luke Zettlemoyer Machine Translation, Sequence-to-sequence

Natural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, &

CSEP 517 Natural Language Processing Coreference Resolution Luke Zettlemoyer University of

Natural Language Processing (CSEP 517): Dependency Syntax and Parsing Noah Smith 2017 c

CSEP 517 Natural Language Processing Text Classification Linear Models Luke Zettlemoyer -

CSEP 517 Natural Language Processing Autumn 2018 Distributed Semantics & Embeddings Luke

Retention Office of Academic and Student Affairs College of Agriculture and Natural Resources

2 THESSALONIANS 2:3A Apostasy or Rapture ? Andrew Marshall Woods, Th.M., JD., PhD. Sr. Pastor,

Inequality, Living Standards and Growth: Two Centuries of Economic Development in Mexico Ingrid

I see it, but because by it I see everything else. C. S. Lewis, Is Theology Poetry? The

Comparing Textual and Block Interfaces in a Novice Programming Environment Thomas Price Tiffany

Leadership Updates Brian Montgomery has been Confirmed as the New Assistant Secretary for

Prize Prize 2007 2007 Gnther Laukien Prize Gnther Laukien Prize Established in 1999 to

Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input Cory Shain 1 ,

Natural Language Processing (CSEP 517): Distributional Semantics - PowerPoint PPT Presentation

Natural Language Processing (CSEP 517): Distributional Semantics Roy Schwartz 2017 c University of Washington roysch@cs.washington.edu May 15, 2017 1 / 59 To-Do List Read: (Jurafsky and Martin, 2016a,b) 2 / 59 Distributional

CSEP 517 Natural Language Processing Language Models Luke Zettlemoyer Slides adapted from Dan

CSEP 517 Natural Language Processing Introduction Luke Zettlemoyer Slides adapted from Dan

CSEP 517: Natural Language Processing New PMP Course! Instructor: Luke Zettlemoyer Autumn 2013

CSEP 517 Natural Language Processing Autumn 2018 Introduction Luke Zettlemoyer Slides adapted

Natural Language Processing (CSEP 517): Computational Pragmatics Chenhao Tan 2017 c

Natural Language Processing (CSEP 517): Introduction &amp; Language Models Noah Smith c 2017

CSEP 517 Natural Language Processing Autumn 2015 Parsing (Trees) Yejin Choi - University of

CSEP 517 Natural Language Processing Frame Semantics Luke Zettlemoyer Slides adapted from Yejin

CSEP 517: Natural Language Processing Recurrent Neural Networks Autumn 2018 Luke Zettlemoyer

CSEP 517 Natural Language Processing Autumn 2015 Introduction Yejin Choi Slides adapted

CSEP 517 Natural Language Processing Luke Zettlemoyer Machine Translation, Sequence-to-sequence

Natural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, &amp;

CSEP 517 Natural Language Processing Coreference Resolution Luke Zettlemoyer University of

Natural Language Processing (CSEP 517): Dependency Syntax and Parsing Noah Smith 2017 c

CSEP 517 Natural Language Processing Text Classification Linear Models Luke Zettlemoyer -

CSEP 517 Natural Language Processing Autumn 2018 Distributed Semantics &amp; Embeddings Luke

Retention Office of Academic and Student Affairs College of Agriculture and Natural Resources

2 THESSALONIANS 2:3A Apostasy or Rapture ? Andrew Marshall Woods, Th.M., JD., PhD. Sr. Pastor,

Inequality, Living Standards and Growth: Two Centuries of Economic Development in Mexico Ingrid

I see it, but because by it I see everything else. C. S. Lewis, Is Theology Poetry? The

Comparing Textual and Block Interfaces in a Novice Programming Environment Thomas Price Tiffany

Leadership Updates Brian Montgomery has been Confirmed as the New Assistant Secretary for

Prize Prize 2007 2007 Gnther Laukien Prize Gnther Laukien Prize Established in 1999 to

Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input Cory Shain 1 ,

Natural Language Processing (CSEP 517): Introduction & Language Models Noah Smith c 2017

Natural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, &

CSEP 517 Natural Language Processing Autumn 2018 Distributed Semantics & Embeddings Luke