

SLIDE 1


Distributed Representations of Sentences and Documents

Quoc Le and Tomas Mikolov

(ICML 2014)

Discussion by: Chunyuan Li

April 17, 2015


SLIDE 2


Outline

1. Word Vector
   • Background
   • Neural Language Model
   • Continuous Bag-of-Words
   • Skip-gram Model

2. Paragraph Vector
   • Distributed Memory Model of Paragraph Vectors (PV-DM)
   • Distributed Bag of Words of Paragraph Vector (PV-DBOW)

3. Experiments
   • Sentiment Analysis
   • Information Retrieval


SLIDE 3


Background in text representation

• One-hot representation / one-of-N coding
• Bag-of-words
• N-gram model


SLIDE 4


Neural Language Model

• A mapping C from any element i of V to a real vector C(i); it represents the distributed feature vectors.
• Learning in context, e.g., "The cat is walking in the bedroom".
• Maximize the average (regularized) log-likelihood:

L = \frac{1}{T} \sum_t \log f(w_t, w_{t-1}, \cdots, w_{t-(n-1)}; \theta)

A neural probabilistic language model (Bengio et al., JMLR 2003)

SLIDE 5


Neural Language Model

A conditional probability distribution over words in V for the next word w_t:

p(w_t \mid w_{t-1}, \cdots, w_{t-n+1}) = \frac{\exp(y_{w_t})}{\sum_i \exp(y_i)}

where

y = b + Wx + U \tanh(d + Hx)
x = (C(w_{t-1}), C(w_{t-2}), \cdots, C(w_{t-(n-1)}))
\theta = (b, d, W, U, H, C)

Here b, d, W, U, H are the model parameters and C holds the vector representations.
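To make the shapes concrete, here is a minimal NumPy sketch of this forward pass; the dimensions, initialization, and example word ids are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Minimal sketch of the neural language model forward pass above.
V, m, h, n = 10000, 50, 100, 4        # |V|, embedding dim, hidden dim, n-gram order

rng = np.random.default_rng(0)
C = rng.normal(scale=0.01, size=(V, m))             # word feature vectors C(i)
H = rng.normal(scale=0.01, size=(h, (n - 1) * m))   # input-to-hidden weights
d = np.zeros(h)                                     # hidden bias
U = rng.normal(scale=0.01, size=(V, h))             # hidden-to-output weights
W = np.zeros((V, (n - 1) * m))                      # direct input-to-output weights
b = np.zeros(V)                                     # output bias

def next_word_probs(context_ids):
    """p(w_t | w_{t-1}, ..., w_{t-(n-1)}) over the whole vocabulary."""
    x = C[context_ids].ravel()              # x = (C(w_{t-1}), ..., C(w_{t-(n-1)}))
    y = b + W @ x + U @ np.tanh(d + H @ x)  # y = b + Wx + U tanh(d + Hx)
    e = np.exp(y - y.max())                 # numerically stable softmax
    return e / e.sum()

probs = next_word_probs([12, 7, 3])         # ids of the n-1 = 3 preceding words
```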

SLIDE 6


Continuous Bag-of-Words (Mikolov et al., 2013)

• Predict the current word based on the context.
• The nonlinear hidden layer is removed:

y = b + Wx, \quad \theta = (b, W, C)

Efficient estimation of word representations in vector space (Mikolov et al., 2013)
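A matching CBOW sketch under the same assumed dimensions: the context vectors are averaged and scored by a single linear layer, with no tanh hidden layer.

```python
import numpy as np

# Minimal CBOW sketch: predict the current word from averaged context vectors.
V, m = 10000, 50
rng = np.random.default_rng(0)
C = rng.normal(scale=0.01, size=(V, m))   # word vectors
W = rng.normal(scale=0.01, size=(V, m))   # output weights
b = np.zeros(V)                           # output bias

def cbow_probs(context_ids):
    """p(current word | context), with the nonlinear hidden layer removed."""
    x = C[context_ids].mean(axis=0)       # averaged context representation
    y = b + W @ x                         # y = b + Wx
    e = np.exp(y - y.max())
    return e / e.sum()
```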

SLIDE 7


Skip-gram Model

Predict the surrounding words:

f = \sum_{-\ell \le j \le \ell,\ j \ne 0} \log p(w_{t+j} \mid w_t),
\qquad
p(w_{t+j} \mid w_t) = \frac{\exp(y_{w_{t+j}}^\top y_{w_t})}{\sum_i \exp(y_i^\top y_{w_t})}

where y_i = C(w_i) and \theta = C.

Distributed representations of words and phrases and their compositionality (Mikolov et al., NIPS 2013)
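A minimal skip-gram sketch under assumed dimensions: with y_i = C(w_i), the probability of a surrounding word is a softmax over dot products with the center word.

```python
import numpy as np

# Minimal skip-gram sketch; theta = C is the only parameter table.
V, m = 10000, 50
rng = np.random.default_rng(0)
C = rng.normal(scale=0.01, size=(V, m))

def context_probs(center_id):
    """p(w_{t+j} | w_t) for every candidate word i."""
    scores = C @ C[center_id]             # y_i^T y_{w_t}
    e = np.exp(scores - scores.max())
    return e / e.sum()

def window_log_prob(word_ids, t, ell=2):
    """Contribution of position t to f: sum over -ell <= j <= ell, j != 0."""
    probs = context_probs(word_ids[t])
    return sum(np.log(probs[word_ids[t + j]])
               for j in range(-ell, ell + 1)
               if j != 0 and 0 <= t + j < len(word_ids))
```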

SLIDE 8


Word Vector - Linguistic Regularities

One can do a nearest-neighbor search around the result of the vector operation "King − man + woman" and obtain "Queen".

Linguistic regularities in continuous space word representations (Mikolov et al., 2013)
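A hedged sketch of this analogy query by cosine nearest-neighbor search; the `vectors` dict is an assumed stand-in for any trained embedding table, such as the skip-gram sketch above.

```python
import numpy as np

# Nearest-neighbor analogy query: vec(a) - vec(b) + vec(c).
def analogy(vectors, a, b, c, topn=1):
    """Words nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = vectors[a] - vectors[b] + vectors[c]
    target = target / np.linalg.norm(target)
    sims = {w: (v @ target) / np.linalg.norm(v)
            for w, v in vectors.items() if w not in (a, b, c)}
    return sorted(sims, key=sims.get, reverse=True)[:topn]

# analogy(vectors, "king", "man", "woman")  ->  ["queen"], ideally
```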

SLIDE 9


Distributed Memory Model of Paragraph Vectors (PV-DM)

• D: paragraph vectors; W: word vectors.
• x is constructed from both W and D.
• The paragraph vector acts as a memory that remembers what is missing from the current context.
• A paragraph vector is shared only across contexts generated from the same paragraph; word vectors are shared across all paragraphs.
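As one concrete (assumed) way to train PV-DM, gensim's Doc2Vec implements this model when dm=1; the toy corpus and hyperparameters below are placeholders, and gensim >= 4.0 is assumed for the `dv` accessor.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = ["the cat is walking in the bedroom",
          "a dog was running in a room"]
docs = [TaggedDocument(words=text.split(), tags=[i])
        for i, text in enumerate(corpus)]

# dm=1 selects the distributed-memory (PV-DM) model.
model = Doc2Vec(docs, dm=1, vector_size=100, window=3, min_count=1, epochs=40)

d0 = model.dv[0]                                         # learned paragraph vector for doc 0
new = model.infer_vector("the cat sat quietly".split())  # vector for an unseen paragraph
```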


SLIDE 10


Distributed Bag of Words of Paragraph Vector (PV-DBOW)

In practice:

1. Sample a text window.
2. Sample a random word from the text window.
3. Form a classification task given the paragraph vector.

PV-DM alone usually works well for most tasks; the final paragraph vector is a combination of the two (PV-DM and PV-DBOW) vectors, as sketched below.
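A matching PV-DBOW sketch (dm=0 in gensim's Doc2Vec), with the combination step done here by concatenation; the corpus and hyperparameters are again placeholders.

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = ["the cat is walking in the bedroom",
          "a dog was running in a room"]
docs = [TaggedDocument(text.split(), [i]) for i, text in enumerate(corpus)]

dbow = Doc2Vec(docs, dm=0, vector_size=100, min_count=1, epochs=40)            # PV-DBOW
dm   = Doc2Vec(docs, dm=1, vector_size=100, window=3, min_count=1, epochs=40)  # PV-DM

paragraph_vec = np.concatenate([dm.dv[0], dbow.dv[0]])  # combined representation of doc 0
```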


SLIDE 11


Experiment I: Sentiment Analysis

Datasets:

• Stanford Sentiment Treebank (Socher et al., 2013b)
• IMDB (Maas et al., 2011)

Evaluation:

• Fine-grained: {Very Negative, Negative, Neutral, Positive, Very Positive}
• Coarse-grained: {Negative, Positive}

Methods to compare (see the pipeline sketch after this list):

• Bag-of-Words
• Word Vector Averaging (Socher et al., 2013b)
• Recursive Neural Network (Socher et al., 2011)
• Matrix-Vector RNN (Socher et al., 2012)
• Recursive Neural Tensor Network (Socher et al., 2013)
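A hedged sketch of the overall recipe: learn paragraph vectors, then fit a linear classifier on them. The two toy reviews and the choice of logistic regression are illustrative assumptions, not the paper's exact setup.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

train_texts = ["a gorgeous , witty , seductive movie",
               "painfully dull and entirely predictable"]
train_labels = [1, 0]                       # coarse-grained: positive / negative

docs = [TaggedDocument(t.split(), [i]) for i, t in enumerate(train_texts)]
pv = Doc2Vec(docs, dm=1, vector_size=50, min_count=1, epochs=40)

# Paragraph vectors become the features of a linear classifier.
X = [pv.dv[i] for i in range(len(train_texts))]
clf = LogisticRegression().fit(X, train_labels)
pred = clf.predict([pv.infer_vector("witty and gorgeous".split())])
```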


SLIDE 12


Recursive Neural Network (RNN)

Each node has three items attached:

• A score s that determines whether neighboring words/phrases should be merged into a larger phrase: s = W_{score}\, p
• A new vector representation p for the larger phrase: p = f(W [p_L; p_R] + b), where [p_L; p_R] stacks the two children's vectors
• Its class label, e.g., the phrase type

W is used recursively at every node of the tree. Other models can be obtained by augmenting the recursive composition function.
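A minimal sketch of one composition step under assumed dimensions: merge the children's vectors p_L, p_R into a parent vector and score the merge.

```python
import numpy as np

# One recursive composition step of the RNN described above.
m = 50
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(m, 2 * m))  # shared composition matrix
b = np.zeros(m)
W_score = rng.normal(scale=0.01, size=m)     # scoring vector

def compose(p_left, p_right):
    """Return the parent phrase vector p and its merge score s."""
    p = np.tanh(W @ np.concatenate([p_left, p_right]) + b)  # p = f(W [p_L; p_R] + b)
    s = W_score @ p                                         # s = W_score p
    return p, s
```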


SLIDE 13


Experiment I: Sentiment Analysis

[Figure: results on the Stanford Sentiment Treebank dataset.]
[Figure: results on the IMDB dataset.]


SLIDE 14


Experiment II: Information Retrieval

Dataset: 1,000,000 triplets of paragraphs; in each triplet, two paragraphs are results of the same query, whereas the third comes from a different query.

Performance: a representation is scored by how often it places the two same-query paragraphs closer to each other than to the third, as in the sketch below.
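A small sketch of this triplet check, assuming cosine similarity as the closeness measure.

```python
import numpy as np

# Triplet test: correct when the two same-query paragraph vectors are
# closer to each other than either is to the odd one out.
def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def triplet_correct(v_same1, v_same2, v_other):
    s = cosine(v_same1, v_same2)
    return s > cosine(v_same1, v_other) and s > cosine(v_same2, v_other)
```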


SLIDE 15


References

• Le, Quoc V., and Tomas Mikolov. Distributed representations of sentences and documents. ICML 2014.
• Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. NIPS 2013.
• Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 2003.
• Socher, Richard. Recursive deep learning for natural language processing and computer vision. PhD thesis, Computer Science Department, Stanford University, 2014.