
Word Embeddings - Word2Vec, Fall 2020 (2020-09-30). Adapted from slides.



  1. SFU NatLangLab, CMPT 825: Natural Language Processing. Word Embeddings - Word2Vec, Fall 2020 (2020-09-30). Adapted from slides from Dan Jurafsky, Chris Manning, Danqi Chen and Karthik Narasimhan

  2. Announcements • Homework 1 due today • Both parts are due • Programming component has 2 grace days, but something must be turned in by tonight • Single-person groups are highly encouraged to team up with each other • Video lectures: • Summary of logistic regression (optional) • Word vectors (required): covers PPMI • Word vectors TF-IDF (required, not yet posted): covers TF-IDF • Word vectors summary (optional, not yet posted): using SVD to get dense word vectors, and connections to word2vec • TA video summarizing key points about word vectors

  3. Representing words by their context. Distributional hypothesis: words that occur in similar contexts tend to have similar meanings • J. R. Firth (1957): "You shall know a word by the company it keeps" • One of the most successful ideas of modern statistical NLP! These context words will represent banking.

  4. Word Vectors • One-hot vectors: hotel = [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0], motel = [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0] • Represent words by their context: a word-word (term-context) co-occurrence matrix counts the other words in a span around each target word; the matrix is |V| × |V| • Example context windows: "sugar, a sliced lemon, a tablespoonful of apricot jam, a pinch each of ..." / "... their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened ..." / "... well suited to programming on the digital computer. In finding the optimal R-stage policy from ..." / "... for the purpose of gathering data and information necessary for the study authorized in the ..."
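A minimal sketch (not from the slides) of how such a word-word co-occurrence matrix can be built: slide over the corpus and, for each target word, count every word inside a small window around it. The toy corpus and window size are illustrative assumptions.

```python
# Minimal sketch: build a |V| x |V| word-word co-occurrence matrix
# from a toy corpus with a +/-2-word context window (both illustrative).
corpus = "she sampled her first apricot and another fruit".split()
window = 2

vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

counts = [[0] * len(vocab) for _ in vocab]
for i, target in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            counts[idx[target]][idx[corpus[j]]] += 1

# Row for "apricot": its co-occurrence counts with every vocabulary word.
print(vocab)
print(counts[idx["apricot"]])
```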

  5. Sparse vs. dense vectors. The vectors we get from a word-word (term-context) co-occurrence matrix are • long (length |V| = 20,000 to 50,000) • sparse (most elements are zero) This is true for one-hot, tf-idf, and PPMI vectors alike. Alternative: represent words as • short (50-300 dimensional) • dense (real-valued) vectors. These are the focus of this lecture and the basis of all modern NLP systems.

  6. Dense vectors: employees = [0.286, 0.792, −0.177, −0.107, 0.109, −0.542, 0.349, 0.271, 0.487] (short + dense)

  7. Why dense vectors? • Short vectors are easier to use as features in ML systems • Dense vectors may generalize better than storing explicit counts • They do better at capturing synonymy, e.g. w1 co-occurs with "car" and w2 co-occurs with "automobile" • Different methods for getting dense vectors: • Singular value decomposition (SVD), sketched below • word2vec and friends: "learn" the vectors!
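To make the SVD route concrete, here is a hedged numpy sketch (not from the slides) that truncates the SVD of the co-occurrence matrix from the earlier sketch into short dense vectors. The choice of k and the use of raw counts are simplifying assumptions; the lecture's video material suggests weighting with PPMI first.

```python
# Hedged sketch: truncated SVD of the co-occurrence matrix `counts`
# (from the earlier sketch) to obtain short, dense word vectors.
import numpy as np

X = np.array(counts, dtype=float)   # |V| x |V| co-occurrence counts
U, S, Vt = np.linalg.svd(X)         # X = U @ diag(S) @ Vt

k = 3                               # keep only the top-k dimensions (assumption)
dense = U[:, :k] * S[:k]            # each row: a k-dimensional word vector

print(dense[idx["apricot"]])        # short, dense vector for "apricot"
```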

  8. Word2vec and friends

  9. Download pretrained word embeddings • word2vec (Mikolov et al.): https://code.google.com/archive/p/word2vec/ • fastText: http://www.fasttext.cc/ • GloVe (Pennington, Socher, Manning): http://nlp.stanford.edu/projects/glove/
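As a hedged usage example, pretrained vectors like these can be loaded through gensim's downloader API; the library choice and dataset name are assumptions, since the slide only lists the project download pages.

```python
# Hedged example: load pretrained GloVe vectors through gensim's
# downloader API (the library choice is an assumption, not the slide's).
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")   # 100-dimensional GloVe vectors

print(wv["hotel"][:5])                     # first 5 dimensions
print(wv.most_similar("hotel", topn=3))    # nearest neighbours by cosine
```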

  10. Word2Vec • Popular embedding method • Very fast to train • Idea: predict rather than count (Mikolov et al., 2013: Distributed Representations of Words and Phrases and their Compositionality)

  11. Word2Vec • Instead of counting how often each word w occurs near "apricot" • Train a classifier on a binary prediction task: is w likely to show up near "apricot"? • We don't actually care about this task • But we'll take the learned classifier weights as the word embeddings (a sketch of the classifier follows below)
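A minimal sketch of that binary classifier (an illustration, not the full word2vec training loop): the probability that a word shows up near the target is the sigmoid of the dot product of their embedding vectors, which here are random stand-ins rather than trained vectors.

```python
# Minimal sketch of word2vec's binary classifier:
# P(+ | target, context) = sigmoid(context . target).
# The embeddings below are random stand-ins, not trained vectors.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
apricot = rng.normal(size=dim)   # target-word embedding
w = rng.normal(size=dim)         # candidate context-word embedding

def p_near(target, context):
    return 1.0 / (1.0 + np.exp(-np.dot(context, target)))

print(p_near(apricot, w))   # high if the classifier thinks w occurs near "apricot"
```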

  12. Word2Vec. Insight: use running text as implicitly supervised training data! • A word s near "apricot" acts as the gold "correct answer" to the question "Is word w likely to show up near apricot?" • No need for hand-labeled supervision • The idea comes from neural language modeling: Bengio et al. (2003), A Neural Probabilistic Language Model; Collobert et al. (2011)
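A hedged sketch of how running text becomes implicit supervision: every (target, nearby-word) pair is a positive example, and randomly drawn vocabulary words serve as negatives. The uniform negative sampling and window size here are simplifying assumptions; word2vec actually samples negatives from a smoothed unigram distribution.

```python
# Hedged sketch: turn running text into (target, context, label) training
# examples. Uniform negative sampling is a simplification; word2vec uses
# a smoothed unigram distribution.
import random

text = "she spread the apricot jam on her toast".split()
window, n_neg = 2, 2
vocab = sorted(set(text))
random.seed(0)

for i, target in enumerate(text):
    for j in range(max(0, i - window), min(len(text), i + window + 1)):
        if j == i:
            continue
        print((target, text[j], 1))                # positive example
        for _ in range(n_neg):                     # sampled negatives
            print((target, random.choice(vocab), 0))
```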
