 
Outline Mikolov,ICML’14 Kalchbrenner,ACL’14 Hermann,ACL’14 Summary Appendix
Distributed Representation of Sentences
LU Yangyang
luyy11@sei.pku.edu.cn
July 16, 2014 @ KERE Seminar
Outline
• Distributed Representation of Sentences and Documents. ICML’14
  • Word Vector
  • Paragraph Vector
  • Experiments of NLP Tasks
• A Convolutional Neural Network for Modelling Sentences. ACL’14
  • DCNN: Convolutional Neural Networks
  • Experiments of NLP Tasks
• Multilingual Models for Compositional Distributed Semantics. ACL’14
  • Composition Models
  • Experiments
• Summary
Authors
• Distributed Representation of Sentences and Documents.
  • ICML’14 [1]
  • Quoc Le, Tomas Mikolov
  • Google Inc, Mountain View
• A Convolutional Neural Network for Modelling Sentences.
  • ACL’14 [2]
  • Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom
  • University of Oxford
• Multilingual Models for Compositional Distributed Semantics.
  • ACL’14
  • Karl Moritz Hermann, Phil Blunsom
  • University of Oxford
[1] http://icml.cc/2014/index/article/15.htm
[2] http://acl2014.org/acl2014/index.html
Recall: Word Vector [3]
Every word:
• A unique vector, represented by a column in a matrix $W$
Given a sequence of training words $w_1, w_2, w_3, \dots, w_T$:
• Predicting a word given the other words in a context (CBOW)
• Predicting the surrounding words given a word (Skip-gram)
[3] Mikolov T, et al. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.
Recall: Word Vector
The Skip-gram Model [4]
• Predicting the surrounding words given a word in a sentence
• The objective:
$$\text{maximize} \quad \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)$$
where $c$ is the size of the training context.
[4] Mikolov T, et al. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 2013.
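The Skip-gram objective can be illustrated with a minimal Python sketch. It enumerates the (center, context) pairs inside a window of size c and averages their log-probabilities over the T training words. The function names, the toy sentence, and the uniform probability model are assumptions made for illustration only, and the window is simply clipped at the sequence boundaries.

```python
import math

def skipgram_pairs(tokens, c):
    """Yield (center, context) pairs for every offset j with -c <= j <= c, j != 0."""
    for t, center in enumerate(tokens):
        for j in range(-c, c + 1):
            # Assumption: offsets that fall outside the sequence are skipped.
            if j != 0 and 0 <= t + j < len(tokens):
                yield center, tokens[t + j]

def skipgram_objective(tokens, c, prob):
    """Average log-probability: (1/T) * sum_t sum_j log p(w_{t+j} | w_t)."""
    total = sum(math.log(prob(ctx, center))
                for center, ctx in skipgram_pairs(tokens, c))
    return total / len(tokens)

tokens = ["the", "cat", "sat", "on", "the", "mat"]
pairs = list(skipgram_pairs(tokens, c=1))

# Hypothetical stand-in for the model: a uniform distribution over the vocabulary.
vocab = sorted(set(tokens))
uniform = lambda ctx, center: 1.0 / len(vocab)
objective = skipgram_objective(tokens, 1, uniform)
```

In a real model, `prob` would be the softmax over learned word vectors; training adjusts those vectors to increase `objective`.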
Recall: Word Vector
Continuous Bag-of-Words Model (CBOW) [5]
• Predicting a word given the other words in a context
• The projection layer: shared for all words (not just the projection matrix)
• The objective:
$$\text{maximize} \quad \frac{1}{T} \sum_{t=k}^{T-k} \log p(w_t \mid w_{t-k}, \dots, w_{t+k})$$
[5] Mikolov T, et al. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.
Word Vector
• The objective:
$$\text{maximize} \quad \frac{1}{T} \sum_{t=k}^{T-k} \log p(w_t \mid w_{t-k}, \dots, w_{t+k})$$
• The prediction task: via a multiclass classifier (e.g. softmax)
$$p(w_t \mid w_{t-k}, \dots, w_{t+k}) = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}}$$
$$y = b + U\,h(w_{t-k}, \dots, w_{t+k}; W)$$
where
• $U, b$: the softmax parameters
• $h$: a concatenation or average of word vectors extracted from $W$
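A minimal Python sketch of this prediction step, assuming h is the average of the context word vectors (the slide also allows concatenation). The toy values of W, U and b, and the helper names, are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def average(vectors):
    """h: element-wise average of the context word vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(y):
    """p_i = exp(y_i) / sum_j exp(y_j), shifted by max(y) for numerical stability."""
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    z = sum(exps)
    return [e / z for e in exps]

def predict(context, W, U, b):
    """Return p(w | context) for every vocabulary word w, with y = b + U h."""
    h = average([W[w] for w in context])
    y = [b_i + sum(u * x for u, x in zip(row, h)) for row, b_i in zip(U, b)]
    return softmax(y)

# Toy parameters (hypothetical): 3-word vocabulary, 2-dimensional word vectors.
W = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}  # one vector per word id
U = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]           # one softmax row per output word
b = [0.0, 0.0, 0.0]

probs = predict([0, 2], W, U, b)  # context consists of word ids 0 and 2
```

Here h = [1.0, 0.5] and y = [2.0, 1.0, 0.0], so word 0 receives the highest probability.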
Paragraph Vector
PV-DM: A Distributed Memory Model
• The paragraph vectors are asked to contribute to the prediction task of the next word given many contexts sampled from the paragraph.
• The paragraph vector acts as a memory that remembers what is missing from the current context, or the topic of the paragraph.
PV-DM
• Every paragraph: a column in matrix $D$
  • Shared across all contexts generated from the same paragraph but not across paragraphs
• Every word: a column in matrix $W$
  • Shared across paragraphs
  • Sampled from a fixed-length context over the paragraph
• Concatenate paragraph and word vectors
The only change compared to the word vector model:
$$y = b + U\,h(w_{t-k}, \dots, w_{t+k}, d; W, D)$$
where
• $h$: constructed from $W$ and $D$
• $d$: the vector of the paragraph from which the context is sampled
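The change from the word vector model can be sketched in a few lines of Python, assuming h is built by concatenation as the slide states. The helper names and the toy matrices D, W, U and b are hypothetical; only the shape of the computation follows the slide.

```python
def pvdm_hidden(paragraph_id, context, D, W):
    """h: the paragraph vector d followed by the context word vectors, concatenated."""
    h = list(D[paragraph_id])
    for w in context:
        h.extend(W[w])
    return h

def logits(h, U, b):
    """y = b + U h, one score per vocabulary word."""
    return [b_i + sum(u * x for u, x in zip(row, h)) for row, b_i in zip(U, b)]

# Toy parameters (hypothetical): 2 paragraphs, 2 words, 2-dimensional vectors.
D = {0: [0.5, -0.5], 1: [1.0, 1.0]}   # one vector per paragraph, not shared
W = {0: [1.0, 0.0], 1: [0.0, 1.0]}    # one vector per word, shared by all paragraphs
U = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # row 0 reads only the paragraph part of h
     [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]]  # row 1 reads only the word part of h
b = [0.0, 0.0]

context = [0, 1]
h0 = pvdm_hidden(0, context, D, W)
h1 = pvdm_hidden(1, context, D, W)
y0 = logits(h0, U, b)
y1 = logits(h1, U, b)
```

The same context yields different scores for the two paragraphs only through the paragraph part of h, which is how d can carry information that the context window lacks.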
Paragraph Vector without word ordering
PV-DBOW: Distributed Bag-Of-Words [7]
• Ignore the context words in the input
• Force the model to predict words randomly sampled from the paragraph in the output
  • Sample a text window
  • Sample a random word from the text window
  • Form a classification task given the Paragraph Vector
[7] Cf. the Skip-gram model.
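The sampling steps listed for PV-DBOW can be sketched as follows. This is only an illustration of how the classification examples might be drawn: the function name, the toy paragraphs, the window size and the uniform sampling choices are all assumptions, not details given on the slide.

```python
import random

def pvdbow_examples(paragraphs, window, n_samples, rng):
    """Draw (paragraph_id, target_word) classification examples.

    Each example samples a text window from a paragraph, then one random word
    from that window. The context words are never used as input.
    """
    examples = []
    ids = list(paragraphs)
    for _ in range(n_samples):
        pid = rng.choice(ids)
        tokens = paragraphs[pid]
        # Assumption: paragraphs shorter than the window use all of their tokens.
        start = rng.randrange(max(1, len(tokens) - window + 1))
        target = rng.choice(tokens[start:start + window])
        examples.append((pid, target))
    return examples

# Hypothetical toy paragraphs keyed by paragraph id.
paragraphs = {
    0: ["a", "gripping", "and", "moving", "film"],
    1: ["dull", "plot"],
}
examples = pvdbow_examples(paragraphs, window=3, n_samples=20, rng=random.Random(0))
```

Each pair would then be used to train a softmax classifier whose only input is the paragraph vector for `paragraph_id`.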
Sentiment Analysis
Stanford Sentiment Treebank Dataset [8]
Dataset:
• 11,855 sentences taken from the movie review site Rotten Tomatoes
• train/test/development: 8544 / 2210 / 1101 sentences
• sentence/subphrase labels: 5-way fine-grained (++ / + / 0 / − / −−), binary coarse-grained (pos/neg)
• here only labeling of the full sentences is considered
• a sentence is treated as a paragraph
Experimental protocols:
• Paragraph Vector: a concatenation of PV-DM and PV-DBOW
• PV-DM: 400 dimensions, PV-DBOW: 400 dimensions
• The optimal window size: 8
• Predictor of the movie rating: a logistic regression
[8] Socher R, et al. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP, 2013.
Sentiment Analysis
IMDB Dataset [9]
Dataset:
• 100,000 movie reviews taken from IMDB
• each movie review: several sentences
• labeled train/unlabeled train/labeled test: 25,000 / 50,000 / 25,000
• labels: binary (pos/neg)
Experimental protocols:
• PV-DM: 400 dimensions, PV-DBOW: 400 dimensions
• Learning word vectors and paragraph vectors: 25,000 labeled + 50,000 unlabeled
• The predictor: a neural network with one hidden layer with 50 units and a logistic classifier
• The optimal window size: 10
[9] Maas, et al. Learning word vectors for sentiment analysis. ACL, 2011.