word embeddings - PowerPoint PPT Presentation

What we will cover today
▪ The basics of word embeddings
▪ Questions, exercises
▪ Collection statistics (and possibly compression)

Word embeddings II (basics)

Vector representation of words: a distributed representation
Embedding: word -> vector (dense, low-dimensional)
Goal: similar words -> similar vectors

Input: texts, sentences (Google News, …)

Similarity/distance
[Slide example: numeric vectors with entries 1, 2, 3 and 4, 5, 6, and the norm ||(1, 2, 3)||]
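A minimal sketch of two common ways to compare word vectors. It assumes that cosine similarity and Euclidean distance are the measures meant here (the slide itself only shows a norm), and the vectors (1, 2, 3) and (4, 5, 6) are purely illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Length of the difference vector a - b."""
    return float(np.linalg.norm(a - b))

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

norm_u = float(np.linalg.norm(u))   # sqrt(1 + 4 + 9) = sqrt(14)
sim = cosine_similarity(u, v)       # 32 / (sqrt(14) * sqrt(77))
dist = euclidean_distance(u, v)     # sqrt(9 + 9 + 9) = sqrt(27)
```

Cosine similarity ignores vector length, so a vector and any positive multiple of it have similarity 1; Euclidean distance does not.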

How do we find the words most similar to "dog"?

TIP: Wherever we can, we use matrix operations. Why?
The correct shape: |V| x 1
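As a sketch of the tip above: one matrix-vector product scores the query word against the whole vocabulary at once, giving |V| similarity values instead of a Python loop over words. The vocabulary, the matrix `E` and its values are hypothetical, and cosine similarity is an assumed choice of measure.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding matrix E of shape |V| x d
# (one row per word); the values are made up for illustration.
vocab = ["dog", "puppy", "cat", "car"]
E = np.array([
    [1.0, 0.9, 0.0],   # dog
    [0.9, 1.0, 0.1],   # puppy
    [0.7, 0.2, 0.6],   # cat
    [0.0, 0.1, 1.0],   # car
])

def most_similar(word, k=2):
    """Return the k words with the highest cosine similarity to `word`."""
    # Normalise rows so that a dot product equals cosine similarity.
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    q = unit[vocab.index(word)]
    scores = unit @ q            # one matrix-vector product -> |V| scores
    order = np.argsort(-scores)  # indices sorted by decreasing score
    return [vocab[i] for i in order if vocab[i] != word][:k]

neighbours = most_similar("dog")
```

The vectorised product is delegated to optimised linear-algebra routines, which is the usual reason to prefer it over an explicit loop.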

Which word is most similar to several other words?

In previous lectures we saw
Lemmatization
Stemming
Semantically close words

How do we obtain this matrix?

Basic idea
A word is characterized by the words that surround it (its context).
Example sentence: "The professor teaches the course to his students in the classroom."
Window = 3. One word is the center word; the words inside the window around it are the context words.
Each word has two representations: (1) center and (2) context. That is, we have two |V| x d matrices.
▪ The center vector of the center word should be similar to the context vectors of the context words (that is, to the sum of the context vectors)
▪ And, of course, the symmetric statement
Learning: we take text examples and try to "learn" these vectors (weights).
Training examples: fix the matrices so that they work for them.
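The window idea can be sketched as a small generator that yields each center word together with its context words. This assumes "window = 3" means up to three words on each side, and it uses an English rendering of the example sentence; the function name `window_pairs` is hypothetical.

```python
def window_pairs(tokens, window):
    """Yield (center, context_words) for every position in `tokens`."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        yield center, left + right

# English rendering of the example sentence, lower-cased and split on spaces.
sentence = "the professor teaches the course to his students in the classroom".split()
pairs = list(window_pairs(sentence, window=3))
center, context = pairs[2]
# center  -> "teaches"
# context -> ["the", "professor", "the", "course", "to"]
```

Words near the sentence boundary simply get a shorter context, which is one of several possible conventions.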

w: center representation; c: context representation

w: center representation; c: context representation
Negative sampling (negative examples)
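The formula that accompanied this slide is not in the text. As a hedged sketch, the block below evaluates the negative-sampling objective in the form given in the Mikolov et al. (2013) paper: log σ(w·c) for an observed (center, context) pair plus the sum of log σ(−w·c_neg) over k sampled negative contexts. The vectors and the choice k = 2 are made up, and how negatives are sampled is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_objective(w, c_pos, c_negs):
    """log s(w.c_pos) + sum over negatives of log s(-w.c_neg); to be maximised."""
    positive = np.log(sigmoid(w @ c_pos))
    negative = np.sum(np.log(sigmoid(-(c_negs @ w))))
    return float(positive + negative)

w = np.array([1.0, 0.5])            # center vector of the center word
c_pos = np.array([0.8, 0.4])        # context vector of an observed context word
c_negs = np.array([[-0.9, 0.1],     # context vectors of two sampled negative words
                   [-0.2, -0.7]])

good = sgns_objective(w, c_pos, c_negs)
bad = sgns_objective(w, -c_pos, -c_negs)   # flipped vectors: a worse fit
```

The objective is higher when the observed pair has a large dot product and the sampled pairs have small (negative) ones, so `good` exceeds `bad` here.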

Word2Vec
Two algorithms
1. Continuous Bag of Words (CBOW): predict the center word from a bag-of-words context
2. Skip-gram (SG): predict the context words given the center word
Both are position independent (they do not account for distance from the center).
Two training methods
1. Hierarchical softmax
2. Negative sampling
Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013: 3111-3119

Basic idea
CBOW (fill in the center word from its context):
The dog __ its tail
The cat __ the mouse
The sun __ in the morning
The moon __ every night
Skip-gram (fill in the context from the center word):
__ __ wags __ __
__ __ chases __ __
__ __ rises __ __
__ __ sets __ __

Let us look again at some of the details

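One-hot vectors
Suppose there are |V| distinct words (terms) in our vocabulary.
▪ We sort the words alphabetically
▪ We represent each word by a vector in $\mathbb{R}^{|V| \times 1}$ that is 0 everywhere, with a single 1 at the position the word occupies in the ordering

$$w^{a} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad w^{aardvark} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad w^{at} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad w^{zebra} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$

▪ No information about similarity
▪ Many dimensions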
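A short sketch of the construction, using a toy four-word vocabulary (an assumption for illustration). It also shows why one-hot vectors carry no similarity information: the dot product of any two different words is 0.

```python
import numpy as np

# Toy vocabulary, sorted alphabetically.
vocab = sorted(["zebra", "a", "at", "aardvark"])
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return the |V|-dimensional indicator vector of `word`."""
    x = np.zeros(len(vocab))
    x[index[word]] = 1.0
    return x

x_at = one_hot("at")            # [0, 0, 1, 0]
x_zebra = one_hot("zebra")      # [0, 0, 0, 1]
similarity = x_at @ x_zebra     # 0.0 for every pair of distinct words
```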

Given the matrix W, how do we get the embedding of the i-th word?
Lookup/project: $\mathrm{ENC}(i) = W I_i$
$I_i$ is the one-hot (indicator) vector of word $w_i$: all 0s except a 1 at position i, so the product selects column i of W.
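A quick check of the lookup, assuming W is stored with one column per word (N x |V|) and filled with random toy values: multiplying by the one-hot vector gives the same result as indexing the column directly, which is how the lookup is usually done in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
N, V = 3, 5                      # embedding size and vocabulary size (toy values)
W = rng.normal(size=(N, V))      # column i holds the embedding of word i

i = 3
I_i = np.zeros(V)
I_i[i] = 1.0                     # one-hot (indicator) vector for position i

by_product = W @ I_i             # lookup written as a matrix-vector product
by_index = W[:, i]               # the same lookup as plain column indexing
```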

CBOW
|V|: number of words
N: size of the embedding
m: size of the window (context)
Use a window of context words to predict the center word
Input: 2m context words
Output: center word
Each word is represented as a one-hot vector

CBOW
Use a window of context words to predict the center word
Learns two matrices (two embeddings per word: one for when it is a context word, one for when it is the center word)
▪ W: N x |V|, context embeddings (the word as input); column i is the embedding of the i-th word when it is a context word
▪ W': |V| x N, center embeddings (the word as output); row i is the embedding of the i-th word when it is the center word

CBOW Intuition
The W'-embedding of the center word should be similar to the (sum of the) W-embeddings of its context words
We want similarity close to 1 for the center word and close to 0 for all other words

CBOW
Given window size m, let $x^{(c)}$ be the one-hot vector of a context word and $y$ the one-hot vector of the center word.
1. INPUT: the one-hot vectors of the 2m context words: $x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
2. GET THE EMBEDDINGS of the context words: $v_{c-m} = W x^{(c-m)}, \ldots, v_{c-1} = W x^{(c-1)}, v_{c+1} = W x^{(c+1)}, \ldots, v_{c+m} = W x^{(c+m)}$
3. TAKE THE SUM of these vectors, divided by 2m: $\hat{v} = \frac{v_{c-m} + v_{c-m+1} + \cdots + v_{c+m}}{2m}$, with $\hat{v} \in \mathbb{R}^{N}$
4. COMPUTE SIMILARITY: dot product of W' (all center vectors) with the context vector $\hat{v}$: $z = W' \hat{v}$
5. TURN THE SCORE VECTOR INTO PROBABILITIES: $\hat{y} = \mathrm{softmax}(z)$
We want $\hat{y}$ to be close to 1 at the position of the center word.
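The five steps can be sketched as one forward pass in NumPy. The shapes follow the CBOW slides (W is N x |V|, W' is |V| x N); the matrices are random toy values rather than trained weights, and the function name `cbow_forward` is hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cbow_forward(W, W_prime, context_ids):
    """Steps 1-5: context word ids -> probability of each word being the center."""
    V = W.shape[1]
    X = np.eye(V)[:, context_ids]   # step 1: one-hot columns, |V| x 2m
    vs = W @ X                      # step 2: context embeddings, N x 2m
    v_hat = vs.mean(axis=1)         # step 3: sum divided by 2m
    z = W_prime @ v_hat             # step 4: one score per vocabulary word
    return softmax(z)               # step 5: scores -> probabilities

rng = np.random.default_rng(0)
V, N = 6, 4                         # toy vocabulary and embedding sizes
W = rng.normal(size=(N, V))         # context (input) embeddings
W_prime = rng.normal(size=(V, N))   # center (output) embeddings

y_hat = cbow_forward(W, W_prime, context_ids=[0, 1, 3, 4])  # m = 2
```

Training would adjust W and W' so that the entry of `y_hat` at the true center word moves toward 1; that update step is not shown here.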

[Figure: CBOW as a network. Input layer: one-hot vectors of the context words "cat" and "on" (the 1 marks the index of the word in the vocabulary). Hidden layer. Output layer: one-hot vector of the center word "sat".]

We must learn W and W'
[Figure: the same network. Each V-dim one-hot input ("cat", "on") is mapped by $W_{V \times N}$ to the N-dim hidden layer, which is mapped by $W'_{N \times V}$ to the V-dim output layer ("sat").]
N will be the size of the word vector

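$W^{\top}_{V \times N} \times x_{cat} = v_{cat}$

$$\begin{bmatrix} 0.1 & 2.4 & 1.6 & 1.8 & 0.5 & 0.9 & \cdots & 3.2 \\ 0.5 & 2.6 & 1.4 & 2.9 & 1.5 & 3.6 & \cdots & 6.1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0.6 & 1.8 & 2.7 & 1.9 & 2.4 & 2.0 & \cdots & 1.2 \end{bmatrix} \times \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} 2.4 \\ 2.6 \\ \vdots \\ 1.8 \end{bmatrix}$$

Hidden layer (N-dim): $\hat{v} = \frac{v_{cat} + v_{on}}{2}$
[Figure: the network with V-dim one-hot inputs "cat" and "on", the N-dim hidden layer, and the V-dim output "sat".]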

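$W^{\top}_{V \times N} \times x_{on} = v_{on}$

$$\begin{bmatrix} 0.1 & 2.4 & 1.6 & 1.8 & 0.5 & 0.9 & \cdots & 3.2 \\ 0.5 & 2.6 & 1.4 & 2.9 & 1.5 & 3.6 & \cdots & 6.1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0.6 & 1.8 & 2.7 & 1.9 & 2.4 & 2.0 & \cdots & 1.2 \end{bmatrix} \times \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} 1.8 \\ 2.9 \\ \vdots \\ 1.9 \end{bmatrix}$$

Hidden layer (N-dim): $\hat{v} = \frac{v_{cat} + v_{on}}{2}$
[Figure: the network with V-dim one-hot inputs "cat" and "on", the N-dim hidden layer, and the V-dim output "sat".]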
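The two products and the averaging step can be reproduced numerically. This is a toy version that keeps only the legible part of the matrix (three rows, first six columns); the elided rows and columns are left out rather than guessed.

```python
import numpy as np

# Legible part of W^T from the slides: 3 visible rows, first 6 columns.
# The elided rows and columns ("…") are simply omitted in this toy version.
W_T = np.array([
    [0.1, 2.4, 1.6, 1.8, 0.5, 0.9],
    [0.5, 2.6, 1.4, 2.9, 1.5, 3.6],
    [0.6, 1.8, 2.7, 1.9, 2.4, 2.0],
])

x_cat = np.array([0, 1, 0, 0, 0, 0])   # "cat" is the 2nd vocabulary entry
x_on = np.array([0, 0, 0, 1, 0, 0])    # "on" is the 4th vocabulary entry

v_cat = W_T @ x_cat                    # picks column 2: [2.4, 2.6, 1.8]
v_on = W_T @ x_on                      # picks column 4: [1.8, 2.9, 1.9]
v_hat = (v_cat + v_on) / 2             # hidden layer: [2.1, 2.75, 1.85]
```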

Output layer: $W'^{\top}_{N \times V} \times \hat{v} = z$, then $\hat{y} = \mathrm{softmax}(z)$
[Figure: the same network. The V-dim one-hot inputs "cat" and "on" pass through $W_{V \times N}$ to the N-dim hidden vector $\hat{v}$; the output $\hat{y}$ is V-dim and is compared with "sat".]
N will be the size of the word vector

We would prefer $\hat{y}$ close to $\hat{y}_{sat}$
Output layer: $W'^{\top}_{N \times V} \times \hat{v} = z$, then $\hat{y} = \mathrm{softmax}(z)$
[Figure: the same network with example output probabilities $\hat{y}$ = (0.01, 0.02, 0.00, 0.02, 0.01, 0.02, 0.01, 0.7, …, 0.00), shown next to the one-hot vector $\hat{y}_{sat}$.]
N will be the size of the word vector

$W^{\top}_{V \times N}$ contains the words' vectors:

$$W^{\top}_{V \times N} = \begin{bmatrix} 0.1 & 2.4 & 1.6 & 1.8 & 0.5 & 0.9 & \cdots & 3.2 \\ 0.5 & 2.6 & 1.4 & 2.9 & 1.5 & 3.6 & \cdots & 6.1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0.6 & 1.8 & 2.7 & 1.9 & 2.4 & 2.0 & \cdots & 1.2 \end{bmatrix}$$

[Figure: the network with V-dim one-hot inputs "cat" and "on", $W_{V \times N}$, the N-dim hidden layer, $W'_{N \times V}$, and the V-dim output "sat".]
We can consider either W (context) or W' (center) as the word's representation, or even take the average of the two.
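A small sketch of the three options for the final word representation, assuming the shapes from the network figures (W is V x N, W' is N x V) and random toy values in place of trained weights.

```python
import numpy as np

rng = np.random.default_rng(1)
V, N = 5, 3                          # toy vocabulary and embedding sizes
W = rng.normal(size=(V, N))          # row i: context (input) vector of word i
W_prime = rng.normal(size=(N, V))    # column i: center (output) vector of word i

emb_context = W                      # option 1: use W
emb_center = W_prime.T               # option 2: use W' (transposed to V x N)
emb_average = (W + W_prime.T) / 2    # option 3: average the two
```

In each case row i of the resulting V x N array is the vector used for word i.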

Skip-gram
Given the center word, predict (or generate) the context words
Input: center word
Output: 2m context words, each represented as a one-hot vector
Learn two matrices
▪ W: N x |V|, the input matrix (word representation as center word)
▪ W': |V| x N, the output matrix (word representation as context word)
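A sketch of the skip-gram forward pass with the shapes given above. It assumes the scores are turned into probabilities with a softmax, mirroring the CBOW steps; the weights are random toy values and the function name `skipgram_forward` is hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def skipgram_forward(W, W_prime, center_id):
    """Return a distribution over the vocabulary for a context word."""
    v_c = W[:, center_id]        # center embedding: a column of the input matrix
    z = W_prime @ v_c            # score of every word as a context word
    return softmax(z)

rng = np.random.default_rng(0)
V, N = 6, 4                        # toy vocabulary and embedding sizes
W = rng.normal(size=(N, V))        # input matrix, N x |V|
W_prime = rng.normal(size=(V, N))  # output matrix, |V| x N

y_hat = skipgram_forward(W, W_prime, center_id=2)
```

Because the model is position independent, the same distribution `y_hat` is compared against each of the 2m context words.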
