

SLIDE 1

CS 6956: Deep Learning for NLP

Word Embeddings

SLIDE 2

Overview

  • Representing meaning
  • Word embeddings: Early work
  • Word embeddings via language models
  • Word2vec and GloVe
  • Evaluating embeddings
  • Design choices and open questions

SLIDE 4

Vector space representations of words

Historically, a diverse collection of ideas and methods

  • 1980s/1990s/2000s

– Latent semantic analysis (LSA)
– Probabilistic LSA, topic models

  • 2000s/2010s

– Word embeddings via neural language models
– word2vec
– GloVe

SLIDE 5

What defines the context of a word?

Several answers possible

  • 1. Entire documents: Words that occur in the same documents are related

– Example: soccer and referee may show up in the same document often because they share a topic

  • 2. Neighboring words: Words that occur in the context of the same words carry similar meanings

– Example: USA and America may be used interchangeably in certain contexts

SLIDE 7

What defines the context of a word?

Several answers possible

  • 1. Entire documents: Words that occur in the same documents are related

– Example: soccer and referee may show up in the same document often because they share a topic

  • 2. Neighboring words: Words that occur in the context of the same words carry similar meanings

– Example: NYC and Yankees may be used interchangeably in certain contexts, but NYC and baseball may not

SLIDE 8

Documents as context

  • Arose in the information retrieval world
  • Led to latent semantic analysis (LSA), topic models, and latent Dirichlet allocation (LDA)
  • Captures relatedness between words (see the sketch below)
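
Below is a minimal sketch of this idea: each word is represented by the vector of documents it occurs in, so words that share documents (like soccer and referee from the earlier example) get similar vectors. The toy documents here are made up for illustration.

```python
import numpy as np

# A minimal sketch of "documents as context" (the toy documents are made up):
# each word is represented by the vector of documents it appears in.
docs = [
    "the referee stopped the soccer match",
    "the soccer fans cheered the referee",
    "the court reviewed the new tax law",
]

vocab = sorted({w for d in docs for w in d.split()})
word_doc = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for w in doc.split():
        word_doc[vocab.index(w), j] += 1

# Words that co-occur in the same documents (soccer, referee) end up with
# similar rows; words that do not (soccer, law) end up with dissimilar rows.
for w in ("soccer", "referee", "law"):
    print(w, word_doc[vocab.index(w)])
```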

SLIDE 9

Neighboring words as context

  • Typically uses a window around a word
  • For example, suppose we consider a window of size 2 to the left and right

John sleeps during the day and works at night. Mary starts her day with a cup of coffee. John starts his day with an angry look at his inbox.

SLIDE 11

Neighboring words as context

  • Typically uses a window around a word
  • For example, suppose we consider a window of size 2 to the left and right
  • We have a co-occurrence vector

John sleeps during the day and works at night. Mary starts her day with a cup of coffee. John starts his day with an angry look at his inbox.

Co-occurrence counts (for the word day, window of size 2 to each side):

  during  the  and  works  starts  her  his  with  a  an
  1       1    1    1      2       1    1    2     1  1

Not showing entries with zeros, which would include all other words
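
The sketch below reproduces this counting for the three example sentences. It is a minimal illustration; the whitespace tokenization and the choice of day as the center word are assumptions made for the example.

```python
from collections import Counter

# A minimal sketch of window-based co-occurrence counting
# (whitespace tokenization; the corpus is the slide's example).
corpus = (
    "John sleeps during the day and works at night. "
    "Mary starts her day with a cup of coffee. "
    "John starts his day with an angry look at his inbox."
)

def context_counts(text, target, window=2):
    """Count words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in text.split("."):
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == target:
                counts.update(tokens[max(0, i - window):i])   # left context
                counts.update(tokens[i + 1:i + 1 + window])   # right context
    return counts

# "starts" and "with" occur twice near "day"; the other context words occur once
print(context_counts(corpus, "day", window=2))
```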

SLIDE 12

Neighboring words as features

Commonly seen in NLP, especially with linear models

– Standard features before neural networks became common

However:

  • 1. Sparsity can cause problems
  • 2. High dimensionality can cause problems

In both cases, the problems affect generalization and memory use

SLIDE 13

Addressing sparsity and dimensionality

  • Dimensionality reduction
  • Project the word-word co-occurrence matrix to a lower dimensional space

– Perform singular value decomposition
– Suppose D is the co-occurrence matrix, then

  • V, Σ, Wᵀ = SVD(D), i.e., D = V Σ Wᵀ
  • Treat the rows of V as word embeddings (see the sketch below)
  • Key idea: Word embeddings as dense, low-dimensional vectors
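
A minimal numpy sketch of this step follows. The co-occurrence matrix here is a random stand-in, and treating the rows of V (optionally scaled by the singular values) as embeddings follows the slide; everything else is an illustrative choice.

```python
import numpy as np

# A minimal sketch: word embeddings from the SVD of a co-occurrence matrix D.
# D is a small dense random stand-in here; a real vocabulary would be far
# larger and would call for sparse/truncated SVD (a point the later slides return to).
rng = np.random.default_rng(0)
D = rng.poisson(1.0, size=(1000, 1000)).astype(float)  # stand-in counts

# numpy returns (U, S, Vh); named here to match the slide's D = V Σ Wᵀ
V, sigma, Wt = np.linalg.svd(D, full_matrices=False)

k = 50                       # embedding dimension (a design choice)
embeddings = V[:, :k]        # rows of V, truncated to k dimensions
# A common variant scales by the singular values: V[:, :k] * sigma[:k]

print(embeddings.shape)      # (1000, 50)
```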

SLIDE 14

Variants on this theme

1. Frequent words can dominate counts

– Words like a, the, is, in, etc. will occur in the context of nearly every word
– Control for this by putting an upper limit on the count. For example: if a word occurs more than 100 times in a context, restrict its count to 100.

2. Instead of counts, we can use other properties of words in contexts

– E.g.: log frequencies, correlation coefficients, etc.
– All of these will give us different embeddings (a small sketch of such transforms follows below)
– We will revisit this idea soon
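
The sketch below shows two of these transforms on a toy count matrix; the cap of 100 and the log transform come from the slide, while the matrix values are made up.

```python
import numpy as np

# A minimal sketch of two count transforms mentioned on the slide.
counts = np.array([[0.,   3., 250.],
                   [3.,   0.,   7.],
                   [250., 7.,   0.]])        # toy co-occurrence counts

clipped = np.minimum(counts, 100.0)          # cap every count at 100
log_counts = np.log1p(counts)                # log(1 + count); zeros stay zero

print(clipped)
print(log_counts)
```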

SLIDE 16

Good news: The embeddings capture meaningful regularities

[Figure omitted; from Rohde, Douglas L. T., Laura M. Gonnerman, and David C. Plaut. "An improved model of semantic similarity based on lexical co-occurrence." Communications of the ACM 8 (2006): 627-633.]

Both syntactic and semantic regularities

SLIDE 17

Bad news: SVD is slow

Rohde, Douglas L. T., Laura M. Gonnerman, and David C. Plaut. "An improved model of semantic similarity based on lexical co-occurrence." Communications of the ACM 8 (2006): 627-633.

  • The matrix at hand is huge

– Rows/columns = Number of words

  • Time complexity of SVD is cubic in this number

– However, various incremental SVD algorithms exist

  • But do we need to perform this computation at all?
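
One common way to avoid the full cubic-time decomposition is to compute only the top-k singular vectors of a sparse co-occurrence matrix. The sketch below uses scipy's truncated svds on a random sparse stand-in; this is a related, widely used workaround, not necessarily the specific incremental algorithms the slide refers to.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# A minimal sketch: truncated SVD of a sparse co-occurrence matrix.
# The matrix is a random stand-in; only the top-k singular triplets are
# computed, avoiding the full |vocab| x |vocab| decomposition.
vocab_size, k = 20_000, 100
D = sp.random(vocab_size, vocab_size, density=1e-4, format="csr", random_state=0)

V, sigma, Wt = svds(D, k=k)      # k largest singular values/vectors only
embeddings = V * sigma           # one k-dimensional row per word

print(embeddings.shape)          # (20000, 100)
```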