Books on Wikipedia colored by genre in two dimensions (word embeddings). Source: Koehrsen 2018

What we will see today

▪ Vector representation of words (word embeddings)
▪ Some information about the assignment

Word embeddings


Problem: vector representation of terms — term → vector (embedding)
Goal: similar terms -> similar vectors

Example: 2-dimensional embeddings (source: http://suriyadeepan.github.io)

Apple: fruit and company

https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/

Embeddings: why?

Machine learning lifecycle: raw data → structured data (feature engineering) → learning algorithm → model → downstream prediction task (classification, learning to rank, clustering).

Feature engineering examples:
  • For graphs: degree, PageRank, motifs, degrees of neighbors, PageRank of neighbors, etc.
  • For words: document occurrences, k-grams, etc.
  • For documents: length, words, etc.

Embeddings: why?

Same machine learning lifecycle, but instead of hand-engineering the features we automatically learn the features (embeddings).

One-hot vectors

Suppose there are |V| different words (terms) in our vocabulary
▪ We order the words alphabetically
▪ We represent each word with an R^{|V|×1} vector that is 0 everywhere and has a single 1 at the position corresponding to the word's position in the ordering

x_aardvark = [1, 0, 0, …, 0]^T,  x_a = [0, 1, 0, …, 0]^T,  x_at = [0, 0, 1, …, 0]^T,  …,  x_zebra = [0, 0, …, 0, 1]^T

▪ No information about similarity
▪ Many dimensions

Term-Document co-occurrence matrix

Suppose there are |V| different words (terms) in our vocabulary and M documents
▪ We build a |V|×M matrix with the occurrences of the words in the documents
▪ We represent each word with an R^{|M|×1} vector

d1: a b c   d2: a d a b   d3: a c d e c a f   d4: b e a b   d5: a b d c a

     d1  d2  d3  d4  d5
a     1   1   1   1   1
b     1   1       1   1
c     1       1       1
d         1   1       1
e             1   1
f             1

Example: word vector for c. |V| = 6, |M| = 5
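A minimal sketch (not from the slides) of how such a matrix could be built for the toy collection above; variable names are illustrative:

```python
# Build the binary term-document matrix for the toy collection d1..d5.
import numpy as np

docs = {
    "d1": "a b c".split(),
    "d2": "a d a b".split(),
    "d3": "a c d e c a f".split(),
    "d4": "b e a b".split(),
    "d5": "a b d c a".split(),
}

vocab = sorted({w for words in docs.values() for w in words})   # ['a', ..., 'f']
doc_ids = sorted(docs)                                          # ['d1', ..., 'd5']

# |V| x M matrix: entry [i, j] = 1 if word i occurs in document j
A = np.zeros((len(vocab), len(doc_ids)), dtype=int)
for j, d in enumerate(doc_ids):
    for w in docs[d]:
        A[vocab.index(w), j] = 1   # use += 1 instead to get raw counts (tf)

print(A[vocab.index("c")])   # word vector for 'c' -> [1 0 1 0 1]
```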

Term-Document co-occurrence matrix (with counts)

Instead of 0-1 entries we can use the tf or the tf-idf weight

d1: a b c   d2: a d a b   d3: a c d e c a f   d4: b e a b   d5: a b d c a

     d1  d2  d3  d4  d5
a     1   2   2   1   2
b     1   1       2   1
c     1       2       1
d         1   1       1
e             1   1
f             1

Example: word vector for c
▪ Many dimensions
▪ Scaling problem with the number of documents

Window-based co-occurrence matrix

▪ We build a |V|×|V| affinity matrix for the words: for two words, we count the number of times the two words appear together in documents
▪ Specifically, we count the number of times each word appears within a window of a given size around the word of interest

Example: W = 1 (at distance 1)

d1: a b c   d2: a d a b   d3: a c d e c a f   d4: b e a b   d5: a b d c a

     a   b   c   d   e   f
a         4   3   1   1   1
b     4       1   1   1
c     3   1       2   1
d     1   1   2       1
e     1   1   1   1
f     1

Window-based co-occurrence matrix

▪ We build a |V|×|V| affinity matrix for the words: we count the number of times two words appear within a window of a given size

Example: W = 1

d1: I enjoy flying. d2: I like NLP. d3: I like deep learning.

Words such as apple, orange, mango, etc. end up together with words such as eat, grow, cultivate, slice, etc., and vice versa
▪ Many dimensions
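A minimal sketch (not from the slides) of computing a window-based co-occurrence matrix with window size W = 1 for the three example sentences:

```python
# Count co-occurrences of words within a window of size W around each position.
from collections import defaultdict

sentences = [
    "I enjoy flying".split(),
    "I like NLP".split(),
    "I like deep learning".split(),
]

W = 1
counts = defaultdict(int)
for sent in sentences:
    for i, w in enumerate(sent):
        for j in range(max(0, i - W), min(len(sent), i + W + 1)):
            if j != i:
                counts[(w, sent[j])] += 1

vocab = sorted({w for s in sentences for w in s})
for w in vocab:
    print(w, [counts[(w, v)] for v in vocab])
# e.g. the row for 'I' has a 2 in the column for 'like' ("I like" appears twice)
```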

We could use a technique to reduce the number of dimensions (dimensionality reduction), e.g., PCA

Singular Value Decomposition

A = U Σ V^T, with U = [u1 u2 ⋯ un] (n×n), Σ = diag(σ1, σ2, …, σn) (n×n), V = [v1 v2 ⋯ vn] (n×n)

  • σ1 ≥ σ2 ≥ … ≥ σn : singular values (square roots of the eigenvalues of AA^T, A^TA)
  • u1, u2, ⋯, un : left singular vectors (eigenvectors of AA^T)
  • v1, v2, ⋯, vn : right singular vectors (eigenvectors of A^TA)

▪ Cut the singular values at some index r (keep the r largest values)
▪ Take the first r columns of U to get the r-dimensional vectors

From dimension d to dimension r


Singular Value Decomposition

  • r : rank of matrix A
  • σ1 ≥ σ2 ≥ … ≥ σr : singular values (square roots of the eigenvalues of AA^T, A^TA)
  • u1, u2, ⋯, ur : left singular vectors (eigenvectors of AA^T)
  • v1, v2, ⋯, vr : right singular vectors (eigenvectors of A^TA)

A = U Σ V^T, with U = [u1 u2 ⋯ ur] (n×r), Σ = diag(σ1, …, σr) (r×r), V^T (r×n)

A_r = σ1 u1 v1^T + σ2 u2 v2^T + ⋯ + σr ur vr^T

A_r is the best rank-r approximation of A (Frobenius norm)
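A minimal sketch (not from the slides) of reducing a term-document matrix to r-dimensional word vectors with a truncated SVD, using the toy matrix from the earlier example:

```python
import numpy as np

A = np.array([[1, 1, 1, 1, 1],      # toy term-document matrix (rows a..f)
              [1, 1, 0, 1, 1],
              [1, 0, 1, 0, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0],
              [0, 0, 1, 0, 0]], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)

r = 2
word_vectors = U[:, :r] * S[:r]               # one r-dimensional vector per word (row)
A_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]   # best rank-r approximation of A
print(word_vectors.shape)                     # (6, 2)
```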


But:
▪ Hard to update, e.g., when the dimensions change often
▪ Sparse matrix
▪ Very large dimensions
We will see a technique based on iterative methods

word2vec

Basic Idea

  • You can get a lot of value by representing a word by means of its neighbors
  • "You shall know a word by the company it keeps" (J. R. Firth 1957: 11)
  • One of the most successful ideas of modern statistical NLP

…government debt problems turning into banking crises as has happened in…
…saying that Europe needs unified banking regulation to replace the hodgepodge…

These context words will represent "banking"

Basic idea

Define a model that aims to predict between a center word w_c and context words in some window of length m, in terms of word vectors:

P(w_c | w_{c−m}, …, w_{c−1}, w_{c+1}, …, w_{c+m})

Loss function 1 − P that we want to minimize


Basic idea

Define a model that aims to predict between a center word w_c and context words in some window of length m, in terms of word vectors:

P(w_c | w_{c−m}, …, w_{c−1}, w_{c+1}, …, w_{c+m})

Loss function 1 − P that we want to minimize

Pairwise probabilities, independence assumption (bigram model):

P(w_1, w_2, …, w_n) = ∏_{i=2}^{n} P(w_i | w_{i−1})

Word2Vec

Predict between every word and its context words

Two algorithms
  • 1. Skip-grams (SG): predict context words given the center word
  • 2. Continuous Bag of Words (CBOW): predict the center word from a bag-of-words context; position independent (does not account for distance from the center)

Two training methods
  • 1. Hierarchical softmax
  • 2. Negative sampling

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013: 3111-3119
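As a rough illustration (assuming the gensim library, version ≥ 4.0; the parameter names are gensim's, not the slides'), both the algorithm and the training-method choices map to constructor flags:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "floor"],
             ["the", "dog", "sat", "on", "the", "mat"]]

model = Word2Vec(
    sentences,
    vector_size=100,   # N: size of the embedding
    window=5,          # m: size of the context window
    sg=1,              # 1 = skip-gram, 0 = CBOW
    hs=0,              # 1 = hierarchical softmax
    negative=5,        # > 0 = negative sampling with this many negative samples
    min_count=1,
)
print(model.wv["cat"].shape)   # (100,)
```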

Note

Each word is assigned a single N-dimensional vector
Learn an embedding matrix Z (N × |V|): column i is the embedding z_i of word i

Dimensions/sizes:
|V| number of words
N size of the embedding
m size of the window (context)

Note

ENC(i) = Z · I_i

where I_i is the one-hot (indicator) vector of word i: all 0s except a 1 at position i

The encoder is an embedding lookup
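A minimal sketch (not from the slides) of why the encoder is just a lookup — multiplying the embedding matrix by a one-hot vector selects the corresponding column:

```python
import numpy as np

V, N = 6, 4                      # vocabulary size, embedding size
Z = np.random.rand(N, V)         # embedding matrix, one column per word

i = 2                            # index of some word
I_i = np.zeros(V)
I_i[i] = 1.0                     # one-hot / indicator vector

assert np.allclose(Z @ I_i, Z[:, i])   # ENC(i) = Z · I_i == column i of Z
```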

CBOW

Use a window of context words to predict the center word

|V| number of words, N size of the embedding, m size of the window (context)

Input: the 2m context words; output: the center word; each represented as a one-hot vector

CBOW

Use a window of context words to predict the center word

Learns two matrices (two embeddings per word: one when it acts as a context word, one when it acts as the center word):
  • W, |V| × N: the context embeddings, used for the input — the embedding of the i-th word when it is a context word
  • W', N × |V|: the center embeddings, used for the output — the embedding of the i-th word when it is the center word

CBOW

Use a window of context words to predict the center word

Intuition: the W'-embedding of the center word should be similar to the W-embeddings of its context words

▪ For similarity, we will use the dot product (cosine)
▪ We will take the average of the W-embeddings of the context words
We want similarity close to 1 for the center word and close to 0 for all other words

CBOW

Given window size m: x^{(i)} denotes the one-hot vector of the i-th context word, y the one-hot vector of the center word

  • 1. Input: the one-hot vectors of the 2m context words
      x^{(c−m)}, …, x^{(c−1)}, x^{(c+1)}, …, x^{(c+m)}
  • 2. Compute the embeddings of the context words
      v_{c−m} = W x^{(c−m)}, …, v_{c−1} = W x^{(c−1)}, v_{c+1} = W x^{(c+1)}, …, v_{c+m} = W x^{(c+m)}
  • 3. Average these vectors
      v̂ = (v_{c−m} + v_{c−m+1} + ⋯ + v_{c+m}) / 2m,  v̂ ∈ R^N
  • 4. Generate a score vector
      z = W' v̂  (dot products with the center-word embeddings; similar vectors give high scores)
  • 5. Turn the score vector into probabilities
      ŷ = softmax(z)

We want ŷ to be close to 1 for the center word
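A minimal sketch (not from the slides) of the CBOW forward pass above, with random matrices standing in for the learned W and W':

```python
import numpy as np

V, N, m = 6, 4, 2                     # vocab size, embedding size, window size
W  = np.random.rand(V, N)             # context embeddings (input), |V| x N
Wp = np.random.rand(N, V)             # center embeddings (output), N x |V|

def one_hot(i, V):
    x = np.zeros(V)
    x[i] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())           # subtract max for numerical stability
    return e / e.sum()

context_ids = [0, 1, 3, 4]            # indices of the 2m context words
vs = [W.T @ one_hot(i, V) for i in context_ids]   # step 2: embedding lookups (N-dim each)
v_hat = sum(vs) / (2 * m)             # step 3: average
z = Wp.T @ v_hat                      # step 4: one score per vocabulary word
y_hat = softmax(z)                    # step 5: probabilities over the vocabulary
print(y_hat.round(3))
```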


Softmax: exponentiate to make the scores positive, then normalize to give a probability
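Written out (the standard softmax definition, not shown explicitly in the extracted slide):

    ŷ_i = softmax(z)_i = e^{z_i} / Σ_{j=1…|V|} e^{z_j}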

  • E.g. "The cat sat on floor", window size = 2
    – context words: the, cat, on, floor; center word: sat

Diagram: the input layer holds the one-hot vectors of the context words "cat" and "on" (a single 1 at the index of each word in the vocabulary), followed by a hidden layer and an output layer for the center word "sat".

Diagram: the V-dim one-hot inputs for "cat" and "on" are mapped through W_{V×N} to the N-dim hidden layer, and from there through W'_{N×V} to the V-dim output layer for "sat". N will be the size of the word vector; we must learn W and W'.

Diagram: W^T × x_cat = v_cat — multiplying (the transpose of) W by the one-hot vector of "cat" simply looks up the embedding of "cat". The context embeddings are then averaged: v̂ = (v_cat + v_on) / 2.

Diagram: similarly, W^T × x_on = v_on, and the hidden layer holds the average v̂ = (v_cat + v_on) / 2.

Diagram: the hidden vector v̂ is multiplied by the output matrix W'_{N×V}: z = W' v̂, and ŷ = softmax(z) gives a V-dim probability vector for the center word "sat".

Diagram: the output ŷ = softmax(z) is a probability vector over the vocabulary (e.g. 0.01, 0.02, …, 0.7, …); we would prefer ŷ to be close to y_sat, the one-hot vector of the true center word "sat".

Both W and W' contain word vectors. We can consider either W (context) or W' (center) as the word's representation, or even take the average of the two.

Skipgram

Given the center word, predict (or, generate) the context words
Input: the center word; output: the 2m context words, each represented as a one-hot vector
Learn two matrices:
  • W: N × |V|, input matrix, word representation as center word
  • W': |V| × N, output matrix, word representation as context word


Skipgram

Given the center word, predict (or, generate) the context words. Let y^{(j)} be the one-hot vector of context word j.

  • 1. Input: the one-hot vector x of the center word
  • 2. Get the embedding of the center word
      v_c = W x
  • 3. Generate a score vector for the context words
      z = W' v_c
  • 4. Turn the score vector into probabilities
      ŷ = softmax(z)

We want ŷ to be close to 1 for the context words
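A minimal sketch (not from the slides) of the skip-gram forward pass above, with random matrices in place of the learned W and W':

```python
import numpy as np

V, N = 6, 4
W  = np.random.rand(N, V)             # center-word embeddings (input matrix, N x |V|)
Wp = np.random.rand(V, N)             # context-word embeddings (output matrix, |V| x N)

center_id = 2
x = np.zeros(V); x[center_id] = 1.0   # one-hot vector of the center word

v_c = W @ x                           # step 2: embedding of the center word (N-dim)
z   = Wp @ v_c                        # step 3: one score per vocabulary word
e   = np.exp(z - z.max())
y_hat = e / e.sum()                   # step 4: softmax -> probabilities of context words
print(y_hat.round(3))
```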

Skipgram

  • For each word t = 1 … T, predict the surrounding words in a window of "radius" m around it.
  • Objective function: maximize the probability of any context word given the current center word, where θ represents all the variables we will optimize (the objective is written out below).
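The slide refers to the objective without spelling it out; the standard skip-gram objective (e.g., as stated in the word2vec papers) is to maximize the log-likelihood, i.e. minimize

    J(θ) = − (1/T) Σ_{t=1…T} Σ_{−m ≤ j ≤ m, j ≠ 0} log p(w_{t+j} | w_t)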


An example

Word2Vec (recap): two algorithms — skip-grams (SG) and continuous bag of words (CBOW) — and two training methods — hierarchical softmax and negative sampling (Mikolov et al., NIPS 2013).

Training methods: hierarchical softmax

Goal: instead of learning one vector per word, i.e. |V| vectors, learn log₂|V| vectors
How? A binary tree with one leaf per word
We learn the representations of the internal nodes
A word's representation is built from the representations of the nodes on the path from the root to the word

Training methods: negative sampling

Goal: improve the quality of the representations by using negative samples
▪ For each positive example, K negative samples
▪ A unigram model is used to construct them

These representations are very good at encoding similarity and dimensions of similarity!

  • Analogies testing dimensions of similarity can be solved quite well just by doing vector subtraction in the embedding space

Syntactically
  – x_apple − x_apples ≈ x_car − x_cars ≈ x_family − x_families
  – similarly for verb and adjective morphological forms
Semantically
  – x_shirt − x_clothing ≈ x_chair − x_furniture
  – x_king − x_man ≈ x_queen − x_woman

Test for linear relationships, examined by Mikolov et al.

a : b :: c : ?   →   man : woman :: king : ?

Example 2-d vectors: man = [0.20, 0.20], woman = [0.60, 0.30], king = [0.30, 0.70]

king − man + woman = [0.30 − 0.20 + 0.60, 0.70 − 0.20 + 0.30] = [0.70, 0.80] ≈ queen
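A minimal sketch (not from the slides) of this analogy arithmetic with the toy 2-d vectors above, picking the nearest remaining word by cosine similarity:

```python
import numpy as np

vec = {
    "man":   np.array([0.20, 0.20]),
    "woman": np.array([0.60, 0.30]),
    "king":  np.array([0.30, 0.70]),
    "queen": np.array([0.70, 0.80]),
}

target = vec["king"] - vec["man"] + vec["woman"]        # [0.70, 0.80]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

exclude = {"king", "man", "woman"}                      # ignore the query words
answer = max((w for w in vec if w not in exclude), key=lambda w: cos(vec[w], target))
print(answer)   # queen
```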

▪ In the assignment, you are asked to use word embeddings
▪ You must choose how
▪ The appropriateness/originality/usefulness of your choice will also be evaluated

▪ Next we will look at
  ▪ pretrained embeddings
  ▪ some applications

Brief description of the assignment

  • 1. You will collect a number of Wikipedia articles. This will be your collection.
  • 2. You will implement a search system over these articles: the user gives one or more keywords and the system returns the most relevant articles, ranked by their relevance to the query. To implement the system you will use Lucene. More in the next hour.
Global vs. local embedding [Diaz 2016]

On which collection (corpus) do we build the embeddings? Sentences from which documents should we use?

https://code.google.com/archive/p/word2vec/

  • 1. Train and create embeddings based on a local collection
      Python implementation in gensim: https://radimrehurek.com/gensim/models/word2vec.html
  • 2. Use pretrained embeddings
      https://fasttext.cc/docs/en/crawl-vectors.html (pretrained embeddings for 157 languages)
      Google / TensorFlow: https://www.tensorflow.org/tutorials/text/word_embeddings
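A rough sketch of the two options, assuming gensim; the corpus, file names and parameters are illustrative, not prescribed by the assignment:

```python
# Option 1: train word2vec on a local collection (assuming gensim >= 4.0).
from gensim.models import Word2Vec

corpus = [["a", "b", "c"], ["a", "d", "a", "b"]]      # your tokenized documents
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1)
model.save("local_word2vec.model")                    # illustrative file name

# Option 2: load pretrained embeddings in word2vec text format
# (e.g. a fastText .vec file downloaded from the link above).
from gensim.models import KeyedVectors
# wv = KeyedVectors.load_word2vec_format("cc.el.300.vec", binary=False)
# print(wv.most_similar("word", topn=5))
```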

Finding the degree of similarity between two words:

model.similarity('woman', 'man')
0.73723527

Finding the odd one out:

model.doesnt_match('breakfast cereal dinner lunch'.split())
'cereal'

Amazing things like woman + king − man = queen:

model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
queen: 0.508

Probability of a text under the model:

model.score(['The fox jumped over the lazy dog'.split()])
0.21

Tolerant retrieval: (1) query expansion and/or (2) context-dependent spelling correction, where we could also use the query log and, more generally, query suggestions

Bilingual embeddings (Chinese in green, English in yellow)

Improve language translation by aligning the word embeddings of the two languages

Use in ranking the documents in the result of a query: can we use embeddings? There are many alternatives, for example (aboutness):

We saw ranking documents with a score of the form Σ_{w∈q} w^T d'

where w is the embedding of each query word and d' is the embedding of the document (e.g., the average of the embeddings of the document's words)

▪ Goal: relate the query to the whole content of the document
▪ Input (center word) embedding or output (context word) embedding? in-embedding for the query, out-embedding for the document
▪ In combination with other criteria

End of lecture

Material used from:
▪ CS276: Information Retrieval and Web Search, Christopher Manning and Pandu Nayak, Lecture 14: Distributed Word Representations for Information Retrieval
▪ https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
▪ A description of skipgram by Chris McCormick: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

Extra slides

Hierarchical softmax and negative sampling

Hierarchical softmax

Instead of learning O(|V|) vectors, learn O(log|V|) vectors. How?
▪ Build a binary tree with the words as leaves, and learn one vector for each internal node.
▪ The value for each word w is the product of the values of the internal nodes on the path from the root to w.

The probability of a word w being the context word, given input word w_I, is defined as:

p(w | w_I) = ∏_{j=1}^{L(w)−1} σ( [[ n(w, j+1) = ch(n(w, j)) ]] · v_{n(w,j)}^T v_{w_I} )

where:
  – n(w, j) is the j-th node on the path from the root to w; n(w, 1) = root, n(w, L(w)) = parent of w (e.g., L(w2) = 3)
  – L(w) is the length of the path from the root to w
  – ch(n) is the left child of node n
  – [[x]] = 1 if x is true, −1 otherwise: it returns 1 if the path goes left, −1 if it goes right
  – σ(x) = 1 / (1 + e^{−x}): it compares the similarity of the input vector v_{w_I} to each internal node vector v_{n(w,j)}

Suppose we want to compute the probability of w2 being the output word.

  • The probabilities of going left/right at a node n are:
    – p(n, left) = σ(v_n^T v_{w_I})
    – p(n, right) = 1 − σ(v_n^T v_{w_I}) = σ(−v_n^T v_{w_I})

p(w2 = c) = p(n(w2,1), left) · p(n(w2,2), left) · p(n(w2,3), right)
          = σ(v_{n(w2,1)}^T v_{w_I}) · σ(v_{n(w2,2)}^T v_{w_I}) · σ(−v_{n(w2,3)}^T v_{w_I})

Complexity is improved even further using a Huffman tree:
▪ Designed to compress the binary code of a given text.
▪ A full binary tree that guarantees a minimal average weighted path length when some words are frequently used.

Negative Sampling

▪ For each positive example we draw K negative examples.
▪ The negative examples are drawn according to the unigram distribution of the data.

p(D = 1 | w, c) is the probability that (w, c) ∈ D.
p(D = 0 | w, c) = 1 − p(D = 1 | w, c) is the probability that (w, c) ∉ D.
For negative samples, p(D = 1 | w, c) must be low ⇒ p(D = 0 | w, c) will be high.

For one sample:
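The per-sample objective is not shown in the extracted slide; a standard formulation (following Mikolov et al. 2013), where w is the center word, c the observed context word and c_1, …, c_K the K negative samples, is to maximize

    log σ(v_c^T v_w) + Σ_{k=1…K} log σ(−v_{c_k}^T v_w)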

Extra slides

Neural nets (from our graduate class with P. Tsaparas)

INTRODUCTION TO NEURAL NETWORKS

(Thanks to Philipp Koehn for the material borrowed from his slides)


Classification

  • Classification is the task of learning a target function f that maps an attribute set x to one of the predefined class labels y

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

One of the attributes is the class attribute; in this case: Cheat
Two class labels (or classes): Yes (1), No (0)

Illustrating Classification Task

Diagram: a learning algorithm performs induction on a training set (Tid 1–10 with attributes Attrib1, Attrib2, Attrib3 and known Class labels) to learn a model; the model is then applied (deduction) to a test set with unknown class labels:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Example of a Model

Training data: the Refund / Marital Status / Taxable Income / Cheat table shown earlier.

Model: a decision tree — split on Refund (Yes → No), then on Marital Status (Married → No; Single, Divorced → test Taxable Income: < 80K → No, > 80K → Yes).


Classification in Networks

  • There are various problems in network analysis that can be mapped to a classification problem:
    – Link prediction: predict 0/1 for missing edges, i.e., whether they will appear or not in the future.
    – Node classification: classify nodes as democrat/republican, spammers/legitimate, or other categories
      • Use node features but also neighborhood and structural features
      • Label propagation
    – Edge classification: classify edges according to type (professional/family relationships), or according to strength.
    – More…
  • Recently all of this is done using Neural Networks.


Linear Classification

  • A simple model for classification is to take a linear combination of the feature values and compute a score.
  • Input: feature vector x = (x_1, …, x_n)
  • Model: weights w = (w_1, …, w_n)
  • Output: score(w, x) = Σ_i w_i x_i
  • Make a decision depending on the output score, e.g., decide "Yes" if score(w, x) > 0 and "No" if score(w, x) < 0 (a small sketch follows below).
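A minimal sketch (not from the slides) of this linear classifier; weights and features are illustrative values:

```python
def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.5, -1.0, 2.0]        # illustrative weights
x = [1.0, 0.3, 0.2]         # illustrative feature vector

print("Yes" if score(w, x) > 0 else "No")   # decision based on the sign of the score
```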


Linear Classification

  • We can represent this as a network: input nodes correspond to the features x_1, …, x_5, the edges correspond to the weights w_1, …, w_5, and an "output" node with incoming edges computes the score(w, x).


Linear models

  • Linear models partition the space according to a hyperplane
  • But they cannot model everything


Multiple layers

  • We can add more layers:
    – Each arrow has a weight
    – Nodes compute scores from incoming edges and give input to outgoing edges

Did we gain anything?


Non-linearity

  • Instead of computing a linear combination
      score(w, x) = Σ_i w_i x_i
  • Apply a non-linear function on top:
      score(w, x) = g( Σ_i w_i x_i )
  • Popular functions (see examples below):

These functions play the role of a soft “switch” (threshold function)
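The slide shows the functions only as a figure; typical choices (an assumption on our part, consistent with the sigmoid used later in the backpropagation slides) are:

    σ(x) = 1 / (1 + e^{−x})  (sigmoid),   tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}),   ReLU(x) = max(0, x)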


Side note

  • Logistic regression classifier:
    – A single layer with a logistic function


Deep learning

  • Networks with multiple layers
  • Each layer can be thought of as a processing step
  • Multiple layers allow for the computation of more complex functions


Example

  • A network that implements XOR (a sketch follows below)

Hidden node h_0 computes OR, hidden node h_1 computes AND (each with a bias term); the output node computes h_0 − h_1, which is XOR.
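A minimal sketch (not from the slides) of such a network with hard-threshold units; the bias values (−0.5, −1.5) are illustrative:

```python
# h0 = OR(x1, x2), h1 = AND(x1, x2), output = h0 - h1 = XOR(x1, x2)
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h0 = step(x1 + x2 - 0.5)        # OR: fires if at least one input is 1
    h1 = step(x1 + x2 - 1.5)        # AND: fires only if both inputs are 1
    return h0 - h1                  # output node: OR minus AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```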


Error

  • The computed value is 0.76 but the correct value is 1
    – There is an error in the computation
    – How do we set the weights so as to minimize this error?


Gradient Descent

  • The error is a function of the weights
  • We want to find the weights that minimize the error
  • Compute the gradient: it gives the direction to the minimum
  • Adjust the weights, moving in the direction of the (negative) gradient.


Backpropagation

  • How can we compute the gradients? Backpropagation!
  • Main idea:
    – Start from the final layer: compute the gradients for the weights of the final layer.
    – Use these gradients to compute the gradients of the previous layers using the chain rule.
    – Propagate the error backwards.
  • Backpropagation is essentially an application of the chain rule for differentiation.

Worked example: a 2-2-2 network with inputs x_1, x_2, hidden nodes h_1, h_2, outputs y_1, y_2, targets t_1, t_2, weights a_{jk} (input → hidden) and b_{lj} (hidden → output), and activation function g.

Error: E = ‖y − t‖² = (y_1 − t_1)² + (y_2 − t_2)²

Forward pass:
  s_{y1} = b_{11} h_1 + b_{12} h_2,  y_1 = g(s_{y1})
  s_{y2} = b_{21} h_1 + b_{22} h_2,  y_2 = g(s_{y2})
  s_{h1} = a_{11} x_1 + a_{12} x_2,  h_1 = g(s_{h1})
  s_{h2} = a_{21} x_1 + a_{22} x_2,  h_2 = g(s_{h2})

Output-layer deltas:
  δ_{y1} = ∂E/∂s_{y1} = (∂E/∂y_1)(∂y_1/∂s_{y1}) = 2 (y_1 − t_1) g'(s_{y1})
  δ_{y2} = ∂E/∂s_{y2} = 2 (y_2 − t_2) g'(s_{y2})

Output-layer gradients:
  ∂E/∂b_{11} = (∂E/∂s_{y1})(∂s_{y1}/∂b_{11}) = δ_{y1} h_1,   ∂E/∂b_{12} = δ_{y1} h_2
  ∂E/∂b_{21} = δ_{y2} h_1,   ∂E/∂b_{22} = δ_{y2} h_2

Hidden-layer deltas (chain rule over both outputs):
  δ_{h1} = ∂E/∂s_{h1} = ( (∂E/∂s_{y1})(∂s_{y1}/∂h_1) + (∂E/∂s_{y2})(∂s_{y2}/∂h_1) ) g'(s_{h1}) = (δ_{y1} b_{11} + δ_{y2} b_{21}) g'(s_{h1})
  δ_{h2} = (δ_{y1} b_{12} + δ_{y2} b_{22}) g'(s_{h2})

Hidden-layer gradients:
  ∂E/∂a_{11} = (∂E/∂s_{h1})(∂s_{h1}/∂a_{11}) = δ_{h1} x_1,   ∂E/∂a_{12} = δ_{h1} x_2
  ∂E/∂a_{21} = δ_{h2} x_1,   ∂E/∂a_{22} = δ_{h2} x_2

Backpropagation (general layer)

For input x_k, hidden node h_j with pre-activation s_{hj}, outputs y_1, …, y_n with pre-activations s_{y1}, …, s_{yn}, weights a_{jk} (input → hidden) and b_{lj} (hidden → output):

  δ_{yl} = ∂E/∂s_{yl}

  ∂E/∂a_{jk} = ( Σ_{l=1…n} δ_{yl} b_{lj} ) g'(s_{hj}) x_k

For the sigmoid function g(x) = 1 / (1 + e^{−x}) the derivative is g'(x) = g(x)(1 − g(x)), which makes it easy to compute: g'(s_{hj}) = h_j (1 − h_j).
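A minimal sketch (not from the slides) of one forward and backward pass for the 2-2-2 network above, with sigmoid activations and illustrative input/target values:

```python
import numpy as np

def g(s):                       # sigmoid activation
    return 1.0 / (1.0 + np.exp(-s))

x = np.array([1.0, 0.0])        # input (illustrative values)
t = np.array([1.0, 0.0])        # target
A = np.random.rand(2, 2)        # weights a_jk: input -> hidden
B = np.random.rand(2, 2)        # weights b_lj: hidden -> output

# forward pass
s_h = A @ x;  h = g(s_h)
s_y = B @ h;  y = g(s_y)
E = np.sum((y - t) ** 2)        # squared error

# backward pass (the formulas from the slides)
delta_y = 2 * (y - t) * y * (1 - y)          # delta_yl = 2(y_l - t_l) g'(s_yl)
grad_B  = np.outer(delta_y, h)               # dE/db_lj = delta_yl * h_j
delta_h = (B.T @ delta_y) * h * (1 - h)      # delta_hj = (sum_l delta_yl b_lj) g'(s_hj)
grad_A  = np.outer(delta_h, x)               # dE/da_jk = delta_hj * x_k

eta = 0.1                                    # learning rate
A -= eta * grad_A
B -= eta * grad_B
print(E)
```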

Stochastic gradient descent

  • Ideally the loss should be the average loss over all training data.
  • We would then need to compute the loss over all training data every time we update the gradients.
    – However, this is expensive.
  • Stochastic gradient descent: consider one input point at a time. Each point is considered only once.
  • Intermediate solution: use mini-batches of data points.


End of extra slides