Distributed Representations of Sentences and Documents

Quoc Le, Tomas Mikolov. Presenters: Amin and Ali (2017-11-27)

Outline

▶ Introduction
▶ Algorithm
  • Learning Vector Representation of Words
  • Paragraph Vector: A Distributed Memory Model
  • Paragraph Vector without Word Ordering: Distributed Bag of Words
▶ Experiments
▶ Conclusion
▶ Demo


Introduction

▶ Many machine learning algorithms require the input to be represented as a fixed-length feature vector.

▶ When it comes to texts, one of the most common fixed-length features is bag-of-words.


Bag of Words



Bag of Words Disadvantages

▶ The word order is lost, and thus different sentences can have exactly the same representation as long as the same words are used (see the sketch below).

▶ Even though bag-of-n-grams considers the word order in short contexts, it suffers from data sparsity and high dimensionality.

▶ Bag-of-words and bag-of-n-grams have very little sense of the semantics of the words or, more formally, of the distances between the words (powerful, Paris, strong).
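As a side illustration (not from the original slides), here is a minimal Python sketch showing that bag-of-words drops word order, so two sentences built from the same words get identical representations:

```python
from collections import Counter

# Two sentences with different meanings but exactly the same words.
s1 = "the dog bit the man".split()
s2 = "the man bit the dog".split()

# Bag-of-words keeps only word counts, so the order is lost and both
# sentences receive exactly the same representation.
print(Counter(s1))                 # Counter({'the': 2, 'dog': 1, 'bit': 1, 'man': 1})
print(Counter(s1) == Counter(s2))  # True
```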


Word Embedding

[Figure: the CBOW and Skipgram architectures. CBOW sums the projections of the context words word(i-k) … word(i+k) to predict word(i); Skipgram projects word(i) to predict the surrounding context words.]
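To make the two architectures concrete, here is a small illustrative sketch (helper names are assumptions, not from the slides) of how training pairs would be generated from a sentence with window size k: CBOW predicts the centre word from its surrounding words, while Skipgram predicts each surrounding word from the centre word.

```python
def cbow_pairs(tokens, k=2):
    # CBOW: (context words, centre word) training pairs.
    return [([tokens[j] for j in range(max(0, i - k), min(len(tokens), i + k + 1)) if j != i],
             tokens[i])
            for i in range(len(tokens))]

def skipgram_pairs(tokens, k=2):
    # Skipgram: (centre word, one context word) training pairs.
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(max(0, i - k), min(len(tokens), i + k + 1)) if j != i]

sentence = "the cat sat on the mat".split()
print(cbow_pairs(sentence)[2])       # (['the', 'cat', 'on', 'the'], 'sat')
print(skipgram_pairs(sentence)[:3])  # [('the', 'cat'), ('the', 'sat'), ('cat', 'the')]
```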



Word Embedding


Proposed Method

▶ The Distributed Representations of Sentences and Documents model was proposed.

▶ Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text.

▶ The proposed algorithm represents each document by a dense vector which is trained to predict words in the document.



Learning Vector Representation of Words

▶ The task is to predict a word given the other words in a context.


Paragraph Vector: A Distributed Memory Model (PV-DM)


▶ Paragraph vectors are used for prediction.

▶ Every paragraph is mapped to a unique vector.

▶ Every word is also mapped to a unique vector.


Paragraph Vector: A Distributed Memory Model (PV-DM)

▶ The contexts are sampled from a sliding window over the paragraph.

▶ The paragraph vector is shared across all contexts from the same paragraph.

▶ Word vectors are shared across paragraphs (see the sketch below).
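A minimal numerical sketch of one PV-DM prediction step (toy dimensions and random weights, not the authors' implementation; the paper allows concatenating or averaging the paragraph and word vectors, and this sketch averages):

```python
import numpy as np

rng = np.random.default_rng(0)
V, N, P = 1000, 50, 10              # toy vocabulary size, vector size, number of paragraphs
W = rng.normal(0, 0.01, (V, N))     # word vectors, shared across all paragraphs
D = rng.normal(0, 0.01, (P, N))     # one vector per paragraph
U = rng.normal(0, 0.01, (N, V))     # softmax weights

def pv_dm_predict(paragraph_id, context_word_ids):
    # Average the paragraph vector with the context word vectors...
    h = (D[paragraph_id] + W[context_word_ids].sum(axis=0)) / (1 + len(context_word_ids))
    # ...then score every vocabulary word and normalise with a softmax.
    scores = h @ U
    e = np.exp(scores - scores.max())
    return e / e.sum()               # probability of each word being the predicted word

probs = pv_dm_predict(paragraph_id=3, context_word_ids=[17, 42, 256])
print(probs.shape)                   # (1000,)
```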


Advantages over BOW

  • Semantics of the words: in this space, “powerful” is closer to “strong” than to “Paris”.
  • Takes the word order into consideration.



Paragraph Vector: Distributed Bag of Words (PV-DBOW)

▶ In this version, the paragraph vector is trained to predict the words in a small window (see the sketch below).
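Both variants are available in the gensim library's Doc2Vec class; a minimal usage sketch (assuming gensim 4.x and a toy corpus) is shown below, where dm=0 selects PV-DBOW and dm=1 selects PV-DM.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: each document is tagged so it gets its own paragraph vector.
docs = [
    TaggedDocument(words="the cat sat on the mat".split(), tags=["doc0"]),
    TaggedDocument(words="dogs bark at the mailman".split(), tags=["doc1"]),
]

# dm=0 -> PV-DBOW: the paragraph vector alone is trained to predict words from the paragraph.
model = Doc2Vec(docs, dm=0, vector_size=100, window=5, min_count=1, epochs=40)

vec = model.dv["doc0"]                                   # learned paragraph vector, shape (100,)
new_vec = model.infer_vector("a cat on a mat".split())   # vector for an unseen paragraph
```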


Experiments

▶ Each paragraph vector is a combination of two vectors: one learned by PV-DM and one learned by PV-DBOW.

▶ Sentiment Analysis
  • Stanford Sentiment Treebank: 11,855 sentences
  • IMDB: 100,000 movie reviews

▶ Information Retrieval



Stanford Sentiment Treebank

▶ Learn the representations for all the sentences.

▶ The paragraph vector is the concatenation of two vectors, one from PV-DBOW and one from PV-DM (see the sketch below).

▶ Logistic regression was used for prediction.

▶ Every sentence has a label that ranges from 0.0 to 1.0.
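A self-contained sketch of this pipeline (random placeholder arrays stand in for the learned PV-DM and PV-DBOW representations and the sentiment labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_sentences = 200

# Placeholders for the representations learned by the two models and for the labels.
vecs_dm = rng.normal(size=(n_sentences, 400))     # PV-DM paragraph vectors
vecs_dbow = rng.normal(size=(n_sentences, 400))   # PV-DBOW paragraph vectors
labels = rng.integers(0, 2, size=n_sentences)     # sentiment labels

# The final paragraph vector is the concatenation of the two.
features = np.concatenate([vecs_dm, vecs_dbow], axis=1)  # 800 dimensions per sentence

clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.score(features, labels))
```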


Stanford Sentiment Treebank



IMDB

▶ Neural networks and logistic regression were used for prediction.

▶ The paragraph vector is the concatenation of two vectors, one from PV-DBOW and one from PV-DM.


IMDB



Information Retrieval

Example snippets from the information retrieval experiment (the first two paragraphs come from results of the same query, the third from a different query; the task is to identify the paragraph that does not belong):

  • “calls from ( 000 ) 000 - 0000 . 3913 calls reported from this number . according to 4 reports the identity of this caller is american airlines .”

  • “do you want to find out who called you from +1 000 - 000 - 0000 , +1 0000000000 or ( 000) 000 - 0000 ? see reports and share information you have about this caller”

  • “allina health clinic patients for your convenience , you can pay your allina health clinic bill online . pay your clinic bill now , question and answers...”


Observations

▶ PV-DM is consistently better than PV-DBOW.

▶ PV-DM alone can achieve good results.

▶ The combination of PV-DM and PV-DBOW achieves the best results.

▶ A good guess for the window size is between 5 and 12.

▶ The proposed method must be run in parallel (see the sketch below).
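For reference, the window-size and parallelism observations map directly onto gensim Doc2Vec hyperparameters; a hedged sketch with a toy corpus, assuming gensim 4.x:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Tiny toy corpus; in practice this would be the full training set.
train_docs = [
    TaggedDocument(words="the cat sat on the mat".split(), tags=[0]),
    TaggedDocument(words="dogs bark at the mailman".split(), tags=[1]),
]

# window chosen from the 5-12 range suggested above; workers > 1 trains in parallel threads.
model = Doc2Vec(train_docs, dm=1, vector_size=400, window=8,
                min_count=1, workers=8, epochs=20)
```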



Advantages and Disadvantages

Advantages:

  • The proposed method is competitive with state-of-the-art methods.
  • The good performance demonstrates the merits of Paragraph Vector in capturing the semantics of paragraphs.
  • It is scalable (sentences, paragraphs, and documents).
  • Paragraph vectors have the potential to overcome many weaknesses of bag-of-words (word order, word meaning, …).

Disadvantages:

  • Paragraph Vector can be expensive.
  • Too many parameters.
  • If the input corpus is one with lots of misspellings, like tweets, this algorithm may not be a good choice.


Demo



[Slides 23-24: CBOW walkthrough. The input layer takes one-hot vectors (V-dim) for the context words “cat” and “on”; a hidden layer of size N (N will be the size of the word vector) feeds a V-dim output layer that predicts “sat”. We must learn the weight matrices W and W’.]


Note on slide 23 (Vagelis Hristidis, 2016-11-06): one-hot encoding is used to encode categorical integer features using a one-hot, a.k.a. one-of-K, scheme. Suppose you have a ‘color’ feature which can take the values ‘green’, ‘red’, and ‘blue’. One-hot encoding converts this ‘color’ feature into three features, namely ‘is_green’, ‘is_red’, and ‘is_blue’, which are all binary.
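A tiny sketch of the ‘color’ example from the note (illustrative only, not from the slides):

```python
import numpy as np

values = ["green", "red", "blue", "red"]   # the raw 'color' feature
categories = ["green", "red", "blue"]      # the K possible values

# One-of-K encoding: each value becomes a binary indicator vector whose
# columns play the role of is_green, is_red, is_blue.
one_hot = np.array([[1 if v == c else 0 for c in categories] for v in values])
print(one_hot)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]]
```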


[Slides 25-26: the one-hot inputs x_cat and x_on each select a row of the V×N weight matrix W (shown with illustrative numeric entries); the selected N-dim word vectors are added in the hidden layer.]


[Slides 27-28: the N-dim hidden layer is multiplied by W’ to produce a V-dim output; after the softmax, the output distribution (e.g. 0.01, 0.02, 0.00, …, 0.7, …) should be close to the one-hot vector of the target word “sat”.]


[Slide 29: the same network with the trained weight matrix W shown.]

  • W and W’ contain the words’ vectors.
  • We can consider either W or W’ as the word’s representation, or even take the average (see the sketch below).
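A minimal numerical sketch of the forward pass these slides walk through (toy sizes and random weights, not the original slide code):

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 10, 4                             # toy vocabulary size and word-vector size
W = rng.normal(0, 0.01, (V, N))          # input -> hidden weights; row i is word i's vector
W_prime = rng.normal(0, 0.01, (N, V))    # hidden -> output weights

def cbow_forward(context_ids):
    # Multiplying a one-hot vector by W just selects a row, so the hidden
    # layer is the average of the context words' rows of W.
    h = W[context_ids].mean(axis=0)
    scores = h @ W_prime                 # one score per vocabulary word
    e = np.exp(scores - scores.max())
    return e / e.sum()                   # softmax; training pushes this toward the target's one-hot

# e.g. predict the middle word given the contexts "cat" (id 2) and "on" (id 5).
probs = cbow_forward([2, 5])
print(probs.argmax(), probs.max())
```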