SLIDE 1

Word Embeddings Tutorial

Hila Gonen, PhD student at Yoav Goldberg's lab, Bar Ilan University, 5/3/18

SLIDE 2

Outline

  • NLP Intro
  • Word representations and word embeddings
  • Word2vec models
  • Visualizing word embeddings
  • Word2vec in Hebrew
  • Similarity
  • Analogies
  • Evaluation
  • A simple classification example

SLIDE 3

NLP - Natural Language Processing

NLP is the field that deals with natural languages. We aim to create applied models that understand, process, analyze, and generate language as similarly as possible to humans.

SLIDE 4

NLP

Applications in NLP:

  • Translation
  • Information Extraction
  • Summarization
  • Parsing
  • Question Answering
  • Sentiment Analysis
  • Text Classification

And many more…

SLIDE 5

NLP challenges

This field encounters numerous challenges:

  • Polysemy
  • Syntactic ambiguity
  • Variability
  • Co-reference resolution
  • Lack of data / huge amounts of data

SLIDE 6

NLP challenges - Polysemy

Book
  • Verb: Book a flight
  • Noun: He says it's a very good book

Bank
  • The edge of a river: He was strolling near the river bank
  • A financial institution: He works at the bank

Solution
  • An answer to a problem: Work out the solution in your head
  • From chemistry: Heat the solution to 75° Celsius

SLIDE 7

NLP challenges – Polysemy

Kids make nutritious snacks

  • Kids know how to prepare nutritious snacks

Kids make nutritious snacks

  • Kids, when cooked well, can make nutritious snacks

SLIDE 8

NLP challenges – Syntactic Ambiguity

12 on their way to cruise among dead in plane crash

same words – different meanings

SLIDE 9

NLP challenges – Syntactic Ambiguity

The cotton clothing is usually made of grows in Mississippi

same words – different meanings

SLIDE 10

NLP challenges – Syntactic Ambiguity

Fat people eat accumulates

same words – different meanings

SLIDE 11

NLP challenges – Variability

They allowed him to…
They let him…
He was allowed to…
He was permitted to…

Different words – same meaning

SLIDE 12

NLP challenges – Co-Reference Resolution

This is a simple case: "Rachel had to wait for Dan because he said he wanted her advice." There are more complex ones: "Dan called Bob to tell him about his surprising experience last week: 'you won't believe it, I myself could not believe it'."

SLIDE 13

NLP challenges – Data-related issues

A lot of data

  • In some cases, we deal with huge amounts of data
  • Need to come up with models that can process a lot of data efficiently

Lack of data

  • Many problems in NLP suffer from lack of data:
  • Non-standard platforms (code-switching)
  • Expensive annotation (word-sense disambiguation, named-entity recognition)
  • Need to use methods to overcome this challenge (semi-supervised learning, multi-task learning…)

SLIDE 14

Representation

We can represent objects in different hierarchy levels:

  • Documents
  • Sentences
  • Phrases
  • Words

We want the representation to be interpretable and easy to use. Vector representations meet those requirements. We will focus on word representations.

SLIDE 15

The Distributional Hypothesis

The Distributional Hypothesis:

  • Words that occur in the same contexts tend to have similar meanings (Harris, 1954)
  • “You shall know a word by the company it keeps” (Firth, 1957)

Examples:

  • Cucumber, sauce, pizza, ketchup → tomato
  • Soundtrack, lyrics, sang, duet → song

SLIDE 16

Vector Representation

We can define a word by a vector of counts over contexts. For example:

  • Each word is associated with a vector of dimension |V| (the size of the vocabulary)
  • We expect similar words to have similar vectors
  • Given the vectors of two words, we can determine their similarity (more about that later)

We can use different granularities of contexts: documents, sentences, phrases, n-grams

[Example table of co-occurrence counts: rows are the target words tomato, book, and pizza; columns are the context words song, cucumber, meal, and black.]
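To make the count-based representation concrete, here is a minimal sketch that builds such count vectors from a toy corpus; the corpus, the window size of 2, and the variable names are illustrative assumptions, not from the slides:

```python
from collections import Counter, defaultdict

# Toy corpus: each sentence is a list of tokens.
corpus = [
    ["tomato", "sauce", "on", "pizza"],
    ["cucumber", "and", "tomato", "salad"],
    ["the", "soundtrack", "of", "the", "movie"],
]

window = 2  # symmetric context window size
counts = defaultdict(Counter)  # counts[word][context] = co-occurrence count

for sentence in corpus:
    for i, word in enumerate(sentence):
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[word][sentence[j]] += 1

# The sparse count vector of "tomato" over the contexts seen so far:
print(counts["tomato"])
```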

SLIDE 17

Vector Representation

Raw counts are problematic:

  • Frequent words will characterize most words -> not informative

Instead of raw counts, we can use other functions:

  • TF-IDF (for term t and document d): tfidf(t, d) = tf(t, d) · log( |D| / |{d ∈ D : t appears in d}| )
  • Pointwise Mutual Information: PMI(w, c) = log( P(w, c) / (P(w) · P(c)) )

D – the set of all documents
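A small sketch of computing PMI from raw co-occurrence counts; the toy count matrix below is assumed for illustration, and the positive-PMI clipping at the end is a common practical variant rather than something stated on the slide:

```python
import numpy as np

# Toy co-occurrence matrix: rows = target words, columns = context words.
words = ["tomato", "book", "pizza"]
contexts = ["song", "cucumber", "meal", "black"]
C = np.array([[0.0, 6.0, 5.0, 1.0],
              [2.0, 2.0, 3.0, 1.0],
              [2.0, 4.0, 1.0, 1.0]])

total = C.sum()
p_wc = C / total                              # joint probabilities P(w, c)
p_w = C.sum(axis=1, keepdims=True) / total    # marginals P(w)
p_c = C.sum(axis=0, keepdims=True) / total    # marginals P(c)

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0)                     # positive PMI (zero out negative/undefined cells)

print(np.round(ppmi, 2))
```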

SLIDE 18

From Sparse to Dense

These vectors are:

  • huge – each of dimension |V| (the size of the vocabulary, typically on the order of 100K)
  • sparse – most entries will be 0

We want our vectors to be small and dense. Two options:

  • 1. Use a dimensionality-reduction algorithm such as SVD over a matrix of sparse vectors
  • 2. Learn low-dimensional word vectors directly – usually referred to as “word embeddings”

We will focus on the second option
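For the first option, a sketch of dimensionality reduction with scikit-learn's TruncatedSVD over a sparse matrix; the random matrix here is a stand-in for a real word-context count matrix, and the dimensions are illustrative:

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Stand-in for a sparse |V| x |V| word-context count matrix.
X = sparse_random(1000, 1000, density=0.01, random_state=0)

# Reduce every word to a dense 50-dimensional vector.
svd = TruncatedSVD(n_components=50, random_state=0)
dense_vectors = svd.fit_transform(X)

print(dense_vectors.shape)  # (1000, 50)
```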

SLIDE 19

Word Embeddings

Each word in the vocabulary is represented by a low-dimensional vector (typically a few hundred dimensions). All words are embedded into the same space. Similar words have similar vectors (= their vectors are close to each other in the vector space). Word embeddings are successfully used for various NLP applications.

SLIDE 20

Uses of word embeddings

Word embeddings are successfully used for various NLP applications (usually simply for initialization)

  • Semantic Similarity
  • Word Sense Disambiguation
  • Semantic Role Labeling
  • Named Entity Recognition
  • Summarization
  • Question Answering
  • Textual Entailment
  • Coreference Resolution
  • Sentiment Analysis
  • etc.

SLIDE 21

Word2Vec

Models for efficiently creating word embeddings. Remember: our assumption is that similar words appear with similar contexts. Intuition: two words that share similar contexts are associated with vectors that are close to each other in the vector space.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean, 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

SLIDE 22

Word2Vec

Let x and y be similar words. By the distributional hypothesis, the context of x is similar to the context of y. Since the model objective ties each word's vector to its context, similar contexts lead to similar vectors: this is the resulting similarity between x and y.

SLIDE 23

Word2Vec

Example sentence: "Every monkey likes bananas" (vocabulary size |V| = 4)

The input: one-hot vectors

  • bananas: (1,0,0,0)
  • monkey: (0,1,0,0)
  • likes: (0,0,1,0)
  • every: (0,0,0,1)

We are going to look at pairs of neighboring words, e.g. with "monkey" as the middle word: (every, monkey), (likes, monkey), (bananas, monkey)
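A minimal sketch of turning the example sentence into (context, target) training pairs and one-hot vectors; the window size of 2 is chosen here so the pairs around "monkey" match the ones listed above:

```python
import numpy as np

sentence = ["every", "monkey", "likes", "bananas"]
vocab = sorted(set(sentence))            # ['bananas', 'every', 'likes', 'monkey']
word2idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[word2idx[word]] = 1.0
    return v

window = 2
pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((sentence[j], target))  # (context, target)

print(pairs)               # includes ('every', 'monkey'), ('likes', 'monkey'), ('bananas', 'monkey')
print(one_hot("bananas"))  # [1. 0. 0. 0.]
```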

SLIDE 24

CBOW – high level

Goal: Predict the middle word given the words of the context

[Architecture diagram: the one-hot vectors of the context words (dimension |V|, ~100K) are each multiplied by the projection matrix (|V| × d, with d = 300); the resulting context vectors are summed; the sum is multiplied by the output matrix (d × |V|) and passed through a softmax layer; the cross-entropy loss compares the prediction with the one-hot vector of the middle word.]

The resulting projection matrix is the embedding matrix
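A numpy sketch of a single CBOW forward pass and loss, following the description above; the tiny dimensions, random matrices, and word indices are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 4, 3                      # vocabulary size and embedding dimension (toy values)
P = rng.normal(size=(V, d))      # projection matrix (the embeddings we keep at the end)
M = rng.normal(size=(d, V))      # output matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Context words "every", "likes", "bananas" (indices 3, 2, 0); middle word "monkey" (index 1).
context_idx = [3, 2, 0]
target_idx = 1

# Indexing rows of P is equivalent to multiplying the one-hot vectors by P.
h = P[context_idx].sum(axis=0)       # sum of the context word vectors
scores = h @ M                       # scores over the vocabulary
probs = softmax(scores)              # predicted distribution over the middle word
loss = -np.log(probs[target_idx])    # cross-entropy loss against the one-hot target

print(round(float(loss), 3))
```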

SLIDE 25

Skip-gram – high level

Goal: Predict the context words given the middle word

[Architecture diagram: the one-hot vector of the middle word (dimension |V|, ~100K) is multiplied by the projection matrix (|V| × d, with d = 300) to obtain the representation of the word; this representation is multiplied by the output matrix (d × |V|) and passed through a softmax layer; the cross-entropy loss compares the predictions with the one-hot vectors of the context words.]

The resulting projection matrix is the embedding matrix

SLIDE 26

Skip-gram – details

Vector representations will be useful for predicting the surrounding words. Formally: given a sequence of training words w_1, w_2, …, w_T, the objective of the Skip-gram model is to maximize the average log probability:

(1/T) · Σ_{t=1..T} Σ_{-c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)

The basic Skip-gram formulation defines p(w_O | w_I) using the softmax function:

p(w_O | w_I) = exp(v'_{w_O} · v_{w_I}) / Σ_{w=1..|V|} exp(v'_w · v_{w_I})

where v_w and v'_w are the “input” and “output” vector representations of w.

SLIDE 27

Negative Sampling

Recall that for Skip-gram we want to maximize the average log probability, which is equivalent to minimizing the cross-entropy loss. This is extremely computationally expensive, as we need to update all the parameters of the model (a softmax over the entire vocabulary) for each training example…

SLIDE 28

Negative Sampling

When looking at the loss obtained from a single training example, the softmax term requires going over the entire vocabulary. When using negative sampling, instead of going through all the words in the vocabulary as negative pairs, we sample a small number k of words (around 5-20). The exact objective used for a single (word, context) pair is:

log σ(v'_{w_O} · v_{w_I}) + Σ_{i=1..k} E_{w_i ~ P_n(w)} [ log σ(−v'_{w_i} · v_{w_I}) ]

The first term corresponds to the “positive” pair and the second to the sampled “negative” pairs; it replaces the full softmax term for each word in the training data.
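A numpy sketch of the negative-sampling objective for one (word, context) pair with k sampled negatives; the vectors are random toys, and the sampling distribution is simplified to uniform (the paper uses the unigram distribution raised to the 3/4 power):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k = 1000, 100, 5                       # vocabulary size, dimension, number of negatives
W_in = rng.normal(scale=0.1, size=(V, d))    # "input" word vectors
W_out = rng.normal(scale=0.1, size=(V, d))   # "output" (context) word vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

target, context = 10, 42                     # indices of the positive (word, context) pair
negatives = rng.integers(0, V, size=k)       # uniform negative sampling (simplification)

pos_score = sigmoid(W_out[context] @ W_in[target])
neg_scores = sigmoid(-W_out[negatives] @ W_in[target])

# Negative-sampling objective (to maximize); the training loss is its negation.
objective = np.log(pos_score) + np.log(neg_scores).sum()
loss = -objective
print(round(float(loss), 3))
```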

SLIDE 29

Context Sampling

We want to give more weight to words closer to our target word. For a given window size C, we sample R in [1, C] and try to predict only the R words before and after our target word. For each word in the training data we need to perform 2·R word classifications (R is not fixed).
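A short sketch of this dynamic window: for each target word, R is drawn uniformly from [1, C] and only the R words on each side are used. This is an illustrative Python version, not the exact code from the C package:

```python
import random

def sample_context(sentence, i, C=5):
    """Return the context words of sentence[i] using a randomly shrunk window."""
    R = random.randint(1, C)                 # effective window size for this occurrence
    left = sentence[max(0, i - R):i]
    right = sentence[i + 1:i + 1 + R]
    return left + right                      # up to 2*R context words

sentence = "the quick brown fox jumps over the lazy dog".split()
print(sample_context(sentence, 4, C=3))      # context of "jumps"
```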

SLIDE 30

Subsampling of Frequent Words

In order to eliminate the negative effect of very frequent words such as “in”, “the”, etc. (which are usually not informative), a simple subsampling approach is used: each word w in the training set is discarded with probability

P(w) = 1 − sqrt(t / f(w))

where f(w) is the word's frequency and t is a small chosen threshold (around 10^-5). This way frequent words are discarded more often. This method improves the training speed and makes the word representations significantly more accurate.
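A sketch of the subsampling rule in code; the relative frequencies below are made up for illustration:

```python
import math
import random

t = 1e-5  # threshold used in the word2vec paper

def keep(word, freq):
    """Return True if this occurrence of `word` should be kept."""
    discard_prob = max(0.0, 1.0 - math.sqrt(t / freq[word]))
    return random.random() > discard_prob

freq = {"the": 0.05, "walk": 0.0002, "labrador": 0.000001}  # toy relative frequencies
for w in freq:
    kept = sum(keep(w, freq) for _ in range(10000))
    print(w, kept / 10000.0)   # frequent words like "the" are kept far less often
```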

SLIDE 31

Phrases

These models are unable to represent idiomatic phrases that are not compositions of their individual words: “New York” != “New” + “York”, “Boston Globe” != “Boston” + “Globe”. The extension is simple (a small sketch follows the list below):

  • Find words that appear frequently together, and infrequently in other contexts
  • phrases are formed based on the unigram and bigram counts
  • The bigrams with score above the chosen threshold are then used as phrases
  • “New York Times” will be replaced with a unique token, “this is” will remain unchanged
  • Train word2vec as usual
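A hedged sketch of this phrase-detection step using gensim's Phrases model, which scores bigrams from unigram and bigram counts in the same spirit; gensim itself is not mentioned in the slides, and the toy sentences are assumptions:

```python
from gensim.models.phrases import Phrases, Phraser

sentences = [
    ["new", "york", "times", "reported", "the", "story"],
    ["she", "reads", "the", "new", "york", "times", "every", "morning"],
    ["this", "is", "a", "normal", "sentence"],
]

# Learn which bigrams occur together often enough to be merged into one token.
phrases = Phrases(sentences, min_count=1, threshold=1)
bigram = Phraser(phrases)

print(bigram[sentences[0]])   # e.g. ['new_york', 'times', 'reported', ...]
# The transformed sentences can then be fed to word2vec training as usual.
```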

SLIDE 32

Use word2vec package

Using this package is extremely simple:

  • Download the code from Mikolov’s git repository:
  • https://github.com/tmikolov/word2vec
  • Compile the package
  • Download the default corpus (wget http://mattmahoney.net/dc/text8.zip) or another corpus of your choice

  • Train the model using the desired parameters

Jupyter: code for downloading, compiling, and training
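The slides use Mikolov's C package; as an alternative sketch in Python, an equivalent model can be trained on the text8 corpus with gensim (the parameter names below are gensim's, not the C tool's, and the output path is an assumption):

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import Text8Corpus

# Assumes text8 has been downloaded and unzipped to the working directory.
sentences = Text8Corpus("text8")

model = Word2Vec(
    sentences,
    vector_size=300,   # dimension of the embeddings ("size" in older gensim versions)
    window=5,          # context window size
    negative=5,        # number of negative samples
    sg=1,              # 1 = skip-gram, 0 = CBOW
    min_count=5,       # ignore rare words
    epochs=5,          # number of iterations over the corpus
)

model.wv.save_word2vec_format("vectors.txt", binary=False)
```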

SLIDE 33

Importance of Parameters – window size

  • Window size = 3
  • Window size = 30

Word: walk

SLIDE 34

Importance of Parameters – iterations

  • No. of iterations = 1
  • No. of iterations = 100

Word: walk

SLIDE 35

Importance of Parameters – dimensions

  • No. of dimensions = 5
  • No. of dimensions = 1000

Word: walk

SLIDE 36

What does the file usually look like?

Word-embedding files in readable text format usually have a row for each word in the vocabulary. In each row, the word is followed by the values of its vector. There may be some additional information in the first rows (e.g. the vocabulary size and the vector dimension).
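A small sketch of reading such a text-format file into a dictionary of numpy vectors; it assumes one word plus its values per row and skips a possible header line:

```python
import numpy as np

def load_embeddings(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == 2:          # header line: "<vocab size> <dimension>"
                continue
            word, values = parts[0], parts[1:]
            vectors[word] = np.array(values, dtype=float)
    return vectors

# vectors = load_embeddings("vectors.txt")
# print(vectors["walk"][:5])
```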

SLIDE 37

Visualizing word embeddings

Using t-SNE

  • Visualizing Data using t-SNE, Maaten and Hinton, 2008

Jupyter: Loading and Visualizing word vectors
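A sketch of the t-SNE visualization step with scikit-learn and matplotlib; the random "word vectors" and dummy labels below stand in for the real loaded embeddings:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data: 100 random 300-dimensional "word vectors" with dummy labels.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 300))
words = [f"word{i}" for i in range(100)]

# Project the vectors down to 2 dimensions for plotting.
coords = TSNE(n_components=2, random_state=0, init="random", perplexity=30).fit_transform(vectors)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=5)
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y), fontsize=6)
plt.show()
```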

SLIDE 38

Visualizing word embeddings– Word2Vec

SLIDE 39

Visualizing word embeddings - GloVe

SLIDE 40

Wevi: word embedding visual inspector

A tool that visualizes the basic working mechanism of word2vec https://ronxin.github.io/wevi/

SLIDE 42

Word2Vec - Hebrew

Link: http://u.cs.biu.ac.il/~yogo/tw2v/similar/ (By Ron Shemesh and Yoav Goldberg) Based on tweets in Hebrew

SLIDE 44

Word2Vec - Hebrew

Ok… Nice. But what about יעד vs. מטרה (“destination” vs. “goal”), which are synonyms? Noun genders dramatically affect the results – we do not want that, or at least not for an arbitrary gender.

SLIDE 46

Word2Vec - Hebrew

Another example: בצל vs. בצרה. Prefixes and suffixes are not always handled correctly. Also, it is not always clear what the wanted behavior is.

SLIDE 48

Word2Vec – Hebrew

Word embeddings include inherent biases, as a result of biases in the corpus (not only in Hebrew…). For example, רופא (“doctor”, masculine) vs. רופאה (“doctor”, feminine).

SLIDE 50

Other pre-trained word embeddings

GloVe (Pennington et al.):

  • Based on ratios of co-occurrence probabilities
  • https://nlp.stanford.edu/projects/glove/

fastText (Bojanowski et al.):

  • Each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations

  • https://fasttext.cc/

SLIDE 51

Similarity

In order to evaluate word embeddings on similarity tasks, we first need to define “similarity”. There are many different ways to define “similarity” and “correlation” between words…

  • walk – walking, walk – run, walk – stroll
  • Germany – Berlin, Germany – England
  • dog – cat, dog – Labrador, dog – leash

This is still an open issue…

SLIDE 52

Similarity measure

The distance between two vectors is not a good measure

  • We do not want to take the length of the vector into account

The most popular similarity measure is cosine similarity:

  • The similarity between two vectors u and v is: cos(u, v) = (u · v) / (‖u‖ · ‖v‖)
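A minimal numpy sketch of cosine similarity between two word vectors (the example vectors are arbitrary):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: the dot product of the two vectors, normalized by their lengths."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([0.2, 0.9, 0.4])
v = np.array([0.25, 0.8, 0.5])
print(round(cosine(u, v), 3))   # close to 1 for vectors pointing in similar directions
```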

SLIDE 53

Similarity

Jupyter: find top-K
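The top-K query can be run directly over the trained vectors; here is a hedged sketch with gensim's KeyedVectors, assuming the vectors.txt file produced earlier (the actual notebook code may differ):

```python
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)

# The 5 words with the highest cosine similarity to "walk".
for word, score in kv.most_similar("walk", topn=5):
    print(f"{word}\t{score:.3f}")
```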

SLIDE 54

Similarity

You can't always get what you want…

walk – top-5 similar words: walked, walks, walking, climbs, ride
book – top-5 similar words: books, chapter, novel, abridged, autobiography

SLIDE 55

Analogies

These models are capable of learning linguistic regularities. For example: vector(“king”) - vector(“man”) + vector(“woman”) ≈ vector(“queen”), and vector(“mice”) - vector(“mouse”) + vector(“door”) ≈ vector(“doors”). Jupyter: Analogies
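A sketch of the analogy query via vector arithmetic, again with gensim's KeyedVectors; this is an assumption about tooling, and the slides' notebook may implement it directly with numpy:

```python
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)

# vector("king") - vector("man") + vector("woman") ~ vector("queen")
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# vector("mice") - vector("mouse") + vector("door") ~ vector("doors")
print(kv.most_similar(positive=["mice", "door"], negative=["mouse"], topn=3))
```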

SLIDE 56

Analogies

How does it work? Given the analogy a : a* :: b : b*, where the word b* is to be found, we try to maximize the following objective:

argmax_{b*} cos(b*, b - a + a*)

When the vectors are normalized, this is equivalent to:

argmax_{b*} ( cos(b*, b) - cos(b*, a) + cos(b*, a*) )

We actually search for a word that is similar to b and to a*, but different from a.

Linguistic Regularities in Sparse and Explicit Word Representations, Levy and Goldberg, 2014

SLIDE 57

Analogies

This does not always work that well… Naturally, it depends on the corpus and the hyper-parameters. Additionally, in some cases a specific aspect of the relation might dominate the others:

London : England , Baghdad : ?  →  Mosul (instead of Iraq)

Here, even though Iraq is more similar to England than Mosul is, the similarity of Mosul to Baghdad dominates, making Mosul the best candidate. A possible solution – use multiplication instead of summation (equivalent to using log values):

argmax_{b*} ( cos(b*, b) · cos(b*, a*) ) / ( cos(b*, a) + ε )

Linguistic Regularities in Sparse and Explicit Word Representations, Levy and Goldberg, 2014

SLIDE 58

Evaluation

Intrinsic Evaluation:

  • 1. Syntactic and semantic analogies:
  • Athens : Greece ; Oslo : ?
  • think : thinking ; read : ?
  • mouse : mice ; door : ?
  • 2. Word correlation benchmarks with human scores (WordSim-353, SimLex-999), for example:

        word1    word2    human score
        train    car      6.31
        drink    ear      1.31
        gem      jewel    8.96

Extrinsic evaluation:

  • Show improvement on downstream tasks when using word embeddings
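A sketch of this intrinsic evaluation: compute the cosine similarity for each word pair and report the Spearman correlation with the human judgments. The three pairs are the ones from the table above; a real benchmark has hundreds, and the `load_embeddings` helper is the hypothetical loader sketched earlier:

```python
import numpy as np
from scipy.stats import spearmanr

# (word1, word2, human score) pairs, as in the table above.
pairs = [("train", "car", 6.31), ("drink", "ear", 1.31), ("gem", "jewel", 8.96)]

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def evaluate(vectors):
    """`vectors` maps each word to its embedding (e.g. loaded from vectors.txt)."""
    model_scores = [cosine(vectors[w1], vectors[w2]) for w1, w2, _ in pairs]
    human_scores = [h for _, _, h in pairs]
    correlation, _ = spearmanr(model_scores, human_scores)
    return correlation

# print(evaluate(load_embeddings("vectors.txt")))
```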

SLIDE 59

Classification with word embedding

An optional use of word embeddings is simple classification. Say we have a short list of professions, and we want to expand it. We can run a simple classification model with sklearn:

  • Use the short list as positive examples
  • Add random negative example
  • Learn a classification model
  • Predict True/False for new words from the vocabulary

Jupyter: Classification example
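A hedged sketch of this idea with scikit-learn's LogisticRegression: the vectors of a short profession list are the positive examples, random vocabulary words are the negatives, and the classifier then scores new words. The word lists, helper names, and thresholds are illustrative, not the notebook's code:

```python
import random
import numpy as np
from sklearn.linear_model import LogisticRegression

def expand_word_list(vectors, positive_words, n_negative=200, n_new=20):
    """`vectors` maps words to embeddings; returns the most promising new candidate words."""
    vocab = list(vectors)
    negative_words = random.sample(vocab, n_negative)        # random negative examples

    X = np.array([vectors[w] for w in positive_words + negative_words])
    y = np.array([1] * len(positive_words) + [0] * len(negative_words))

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Score every other word in the vocabulary and return the most "positive" new ones.
    candidates = [w for w in vocab if w not in positive_words]
    probs = clf.predict_proba(np.array([vectors[w] for w in candidates]))[:, 1]
    best = np.argsort(-probs)[:n_new]
    return [candidates[i] for i in best]

# professions = ["doctor", "teacher", "lawyer", "nurse", "engineer"]
# print(expand_word_list(load_embeddings("vectors.txt"), professions))
```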

SLIDE 60

Biases in word embeddings

A very nice tutorial about word embedding biases: How to make a racist AI without really trying https://gist.github.com/rspeer/ef750e7e407e04894cb3b78a82d66aed

SLIDE 61

Conclusion


Word embeddings:

  • A powerful word representation
  • Easy to incorporate into different models
  • Can capture word similarities and linguistic regularities
  • Existing models have their limitations
  • Training parameters need to be tuned according to the desired properties and similarities

SLIDE 62

Questions?

SLIDE 63

Thank you!
