SLIDE 1

Distributed Representation of Sentences

LU Yangyang

luyy11@sei.pku.edu.cn

July 16, 2014 @ KERE Seminar

SLIDE 2

Outline

  • Distributed Representation of Sentences and Documents. ICML’14
    • Word Vector
    • Paragraph Vector
    • Experiments on NLP Tasks
  • A Convolutional Neural Network for Modelling Sentences. ACL’14
    • DCNN: Dynamic Convolutional Neural Networks
    • Experiments on NLP Tasks
  • Multilingual Models for Compositional Distributed Semantics. ACL’14
    • Composition Models
    • Experiments
  • Summary

SLIDE 3

Authors

  • Distributed Representation of Sentences and Documents. ICML’14 1
    • Quoc Le, Tomas Mikolov
    • Google Inc, Mountain View
  • A Convolutional Neural Network for Modelling Sentences. ACL’14 2
    • Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom
    • University of Oxford
  • Multilingual Models for Compositional Distributed Semantics. ACL’14
    • Karl Moritz Hermann, Phil Blunsom
    • University of Oxford

1 http://icml.cc/2014/index/article/15.htm
2 http://acl2014.org/acl2014/index.html


SLIDE 7

Recall: Word Vector 3

Every word:

  • A unique vector, represented by a column in a matrix W

Given a sequence of training words w1, w2, w3, ..., wT :

  • Predicting a word given the other words in a context (CBOW)
  • Predicting the surrounding words given a word (Skip-gram)

3Mikolov T, et al. Efficient estimation of word representations in vector space[C]. ICLR workshop, 2013

SLIDE 8

Recall: Word Vector

The Skip-gram Model 4

  • Predicting the surrounding words given a word in a sentence
  • The objective (see the sketch below):

maximize (1/T) ∑_{t=1}^{T} ∑_{−c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)

where c : the size of the training context

4Mikolov T, et al. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013
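
To make the objective concrete, here is a minimal Python sketch of the inner term at one position t; `log_p` is a stand-in for the model's softmax-based log-probability, and the function name is illustrative, not code from the paper.

```python
# Minimal sketch of the skip-gram objective term at position t:
# sum of log p(w_{t+j} | w_t) over offsets -c..c with j != 0,
# clipped at the sentence boundaries. `log_p` is a placeholder
# for the model's log-probability.
def skipgram_term(words, t, c, log_p):
    return sum(log_p(words[t + j], words[t])
               for j in range(-c, c + 1)
               if j != 0 and 0 <= t + j < len(words))
```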

SLIDE 9

Recall: Word Vector

Continuous Bag-of-Words Model (CBOW) 5

  • Predicting a word given the other words in a context
  • The projection layer: shared for all words (not just the projection matrix)
  • The objective:

maximize (1/T) ∑_{t=k}^{T−k} log p(w_t | w_{t−k}, ..., w_{t+k})

5Mikolov T, et al. Efficient estimation of word representations in vector space[C]. ICLR workshop, 2013

SLIDE 10

Word Vector

  • The objective:

maximize (1/T) ∑_{t=k}^{T−k} log p(w_t | w_{t−k}, ..., w_{t+k})

  • The prediction task: via a multi-class classifier, e.g. softmax 6 (see the sketch below):

p(w_t | w_{t−k}, ..., w_{t+k}) = e^{y_{w_t}} / ∑_i e^{y_i}

y = b + U h(w_{t−k}, ..., w_{t+k}; W)

where U, b : the softmax parameters
      h : a concatenation or average of word vectors extracted from W

6 See slide 53.
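
A minimal numpy sketch of this prediction step; the dimensions, the choice of averaging for h, and all names are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Sketch of y = b + U h(w_{t-k}, ..., w_{t+k}; W) followed by a softmax.
V, d = 10000, 100                   # vocabulary size, word vector size (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (d, V))      # word vectors: one column per word
U = rng.normal(0, 0.1, (V, d))      # softmax weights
b = np.zeros(V)                     # softmax bias

def predict(context_ids):
    h = W[:, context_ids].mean(axis=1)   # h: average of the context word vectors
    y = b + U @ h
    e = np.exp(y - y.max())              # numerically stable softmax
    return e / e.sum()                   # p(w_t | w_{t-k}, ..., w_{t+k})

p = predict([3, 17, 42, 8])
```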

SLIDE 12

Paragraph Vector

PV-DM: A Distributed Memory Model

  • The paragraph vectors are asked to contribute to the prediction task of the next word, given many contexts sampled from the paragraph.
  • The paragraph vector acts as a memory that remembers what is missing from the current context, i.e. the topic of the paragraph.

SLIDE 14

PV-DM

  • Every paragraph: a column in matrix D
    • Shared across all contexts generated from the same paragraph, but not across paragraphs
  • Every word: a column in matrix W
    • Shared across paragraphs
  • Contexts: fixed-length, sampled from a sliding window over the paragraph
  • The paragraph vector and the context word vectors are concatenated (see the sketch below)

The only change compared to the word vector model:

y = b + U h(w_{t−k}, ..., w_{t+k}, d; W, D)

where h : constructed from W and D
      d : the vector of the paragraph from which the context is sampled
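
A sketch of this PV-DM prediction step in numpy; all sizes, the averaging of the word vectors before concatenation, and the names are illustrative assumptions.

```python
import numpy as np

# Sketch of the PV-DM prediction: h is built from both the paragraph
# vector (a column of D) and the context word vectors (columns of W).
V, dim, N = 10000, 100, 500            # vocab size, vector size, paragraph count
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (dim, V))       # word vectors, shared across paragraphs
D = rng.normal(0, 0.1, (dim, N))       # paragraph vectors
U = rng.normal(0, 0.1, (V, 2 * dim))   # softmax weights over the concatenated h
b = np.zeros(V)

def predict_pvdm(context_ids, para_id):
    h = np.concatenate([D[:, para_id], W[:, context_ids].mean(axis=1)])
    y = b + U @ h
    e = np.exp(y - y.max())
    return e / e.sum()                 # p(w_t | context words, paragraph)

p = predict_pvdm([3, 17, 42, 8], para_id=7)
```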

SLIDE 15

Paragraph Vector without word ordering

PV-DBOW: Distributed Bag-of-Words 7

  • Ignore the context words in the input
  • Force the model to predict words randomly sampled from the paragraph in the output
  • At each step (see the sketch below):
    • Sample a text window
    • Sample a random word from the text window
    • Form a classification task given the Paragraph Vector

7 Skip-gram model: see slide 7.
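
A minimal sketch of how one PV-DBOW training target could be drawn; the window size and names are illustrative assumptions.

```python
import random

# Sketch of drawing one PV-DBOW training example: sample a text window,
# then sample a target word from it; the classifier sees only the
# paragraph vector as input.
def pvdbow_target(paragraph_tokens, window=8):
    start = random.randrange(max(1, len(paragraph_tokens) - window + 1))
    text_window = paragraph_tokens[start:start + window]
    return random.choice(text_window)   # word to predict from the paragraph vector

target = pvdbow_target("the cat sat on the red mat".split())
```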

SLIDE 18

Sentiment Analysis

Stanford Sentiment Treebank Dataset 8

Dataset:

  • 11855 sentences taken from the movie review site Rotten Tomatoes
  • train/test/development: 8544/2210/1101 sentences
  • sentence/sub-phrase labels: 5-way fine-grained (++ / + / 0 / − / −−) or binary coarse-grained (pos/neg)
  • here only the labeling of full sentences is considered
  • each sentence is treated as a paragraph

Experiment protocols:

  • Paragraph Vector: a concatenation of PV-DM and PV-DBOW
  • PV-DM: 400 dimensions; PV-DBOW: 400 dimensions
  • The optimal window size: 8
  • Predictor of the movie rating: logistic regression

8Socher, R. et al. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP, 2013

SLIDE 20

Sentiment Analysis

IMDB Dataset 9

Dataset:

  • 100,000 movie reviews taken from IMDB
  • each movie review: several sentences
  • labeled train / unlabeled train / labeled test: 25,000 / 50,000 / 25,000
  • labels: binary (pos/neg)

Experimental protocols:

  • PV-DM: 400 dimensions; PV-DBOW: 400 dimensions
  • Learning word vectors and paragraph vectors: 25,000 labeled + 50,000 unlabeled reviews
  • The predictor: a neural network with one hidden layer of 50 units and a logistic classifier
  • The optimal window size: 10

9Maas, et al. Learning word vectors for sentiment analysis. ACL, 2011

SLIDE 21

Sentiment Analysis (cont.)

SLIDE 22

Information Retrieval with Paragraph Vector

Dataset:

  • 1,000,000 most popular queries × top 10 results, from a search engine
  • Constructing triplets of paragraphs:
    • 1st, 2nd: results of the same query
    • 3rd: randomly sampled from the rest of the collection (a different query)
  • Task: identify which two members of the triplet are results of the same query
SLIDE 25

Recall: Max-TDNN Sentence Model 10

  • TDNNs: Time-Delay Neural Networks
    • Model long-distance dependencies
    • "time" refers to the idea that a sequence has a notion of order
  • A TDNN "reads" the sequence in an online fashion: at time t ≥ 1, one sees x_t, the t-th word in the sentence
  • A classical TDNN layer:
    • A convolution on a given sequence x(·)
    • Outputting another sequence o(·)

10 Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML, 2008
SLIDE 26

DCNN: Overview

Convolutional Neural Networks with Dynamic k-Max Pooling

SLIDE 27

Wide Convolution

  • Each word w_i ∈ R^d
  • Sentence matrix s ∈ R^{d×s}
  • Weight matrix of the convolution filter m ∈ R^{d×m}
  • Matrix after (wide) convolution c ∈ R^{d×(s+m−1)} (see the sketch below)
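
A numpy sketch of the wide, row-wise convolution; `np.convolve` with mode="full" gives exactly the s + m − 1 output columns. Sizes are illustrative.

```python
import numpy as np

# Sketch of the wide convolution: each of the d rows of the sentence matrix s
# is convolved with the matching row of the filter m.
d, s_len, m_len = 4, 7, 3
rng = np.random.default_rng(0)
s = rng.normal(size=(d, s_len))    # sentence matrix: one word vector per column
m = rng.normal(size=(d, m_len))    # filter weights

c = np.stack([np.convolve(s[i], m[i], mode="full") for i in range(d)])
assert c.shape == (d, s_len + m_len - 1)
```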
SLIDE 30

(Dynamic) k-Max Pooling

k-Max Pooling:

  • A generalisation of max pooling over the time dimension 11
  • Different from local max pooling operations 12
  • Given a value k and a sequence p ∈ R^p (p ≥ k), k-max pooling selects the subsequence p_max^k of the k highest values of p
  • The order of the values in p_max^k corresponds to their original order in p

Dynamic k-Max Pooling (see the sketch below):

k_l = max(k_top, ⌈((L − l) / L) · s⌉)

where l : the number of the current convolutional layer to which the pooling is applied
      L : the total number of convolutional layers in the network
      k_top : the fixed pooling parameter for the topmost convolutional layer
      s : the length of the input sentence

11 Max-TDNN: Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML, 2008
12 A convolution network for object recognition: Yann LeCun, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998
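
Both operations fit in a few lines of numpy; a sketch with illustrative inputs follows.

```python
import numpy as np

# Sketch of k-max pooling (keep the k largest values of a row, preserving
# their original order) and of the dynamic k used by the DCNN.
def kmax_pool(row, k):
    idx = np.sort(np.argsort(row)[-k:])   # positions of the k largest values
    return row[idx]

def dynamic_k(l, L, ktop, s):
    # k_l = max(ktop, ceil((L - l) / L * s)), s = input sentence length
    return max(ktop, int(np.ceil((L - l) / L * s)))

row = np.array([0.2, 1.5, -0.3, 0.9, 2.1, 0.1])
print(kmax_pool(row, 3))                  # [1.5 0.9 2.1], in original order
print(dynamic_k(l=1, L=3, ktop=3, s=18))  # 12
```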

SLIDE 31

Non-linear Feature Function

  • After the wide convolution, a bias and a non-linear function are applied component-wise 13. Each d-dimensional column a in the resulting matrix can be written in terms of the matrix

M = [diag(m_{:,1}), ..., diag(m_{:,m})]

where m : the weights of the d filters of the wide convolution

  • Input sentence matrix → wide convolution + (dynamic) k-max pooling layer + non-linear function → a first-order feature map

13 Temporarily ignoring the pooling layer

SLIDE 32

Multiple Feature Maps

  • Repeating wide convolution + (dynamic) k-max pooling + non-linear function yields feature maps of increasing order 14 (see the sketch below):

F_j^i = ∑_{k=1}^{n} m_{j,k}^i * F_k^{i−1}

where F_j^i : the j-th feature map of the i-th order
      * : wide convolution
      m_{j,k}^i : a convolving matrix (all the m_{j,k}^i together form an order-4 tensor)

14LeCun Y, Bengio Y. Convolutional networks for images, speech, and time series[J]. The handbook of brain theory and neural networks, 1995
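
A numpy sketch of forming one higher-order feature map as a sum of wide convolutions over all lower-order maps; the shapes and the shared filter width are illustrative assumptions.

```python
import numpy as np

# Sketch of F_j^i = sum_k m_{j,k}^i * F_k^{i-1}, with * the row-wise
# wide convolution from the previous slide.
def next_order_map(prev_maps, filters_j):
    # prev_maps: list of n arrays of shape (d, s)
    # filters_j: list of n filter matrices of shape (d, m), one per lower map
    acc = None
    for F, m in zip(prev_maps, filters_j):
        conv = np.stack([np.convolve(F[i], m[i], mode="full")
                         for i in range(F.shape[0])])
        acc = conv if acc is None else acc + conv
    return acc   # shape (d, s + m - 1)
```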

SLIDE 33

Folding

In the formulation of the network so far:

  • Feature detectors are applied to an individual row
  • This creates complex dependencies across the same row in multiple feature maps
  • Feature detectors in different rows, however, are independent of each other until the top fully connected layer

Folding (sketched below):

  • For a map of d rows, folding returns a map of d/2 rows
  • Halves the size of the representation
  • With a folding layer, a feature detector of the i-th order now depends on two rows of feature values in the lower maps of order i − 1
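
Folding is simple enough to state in a few lines; a numpy sketch with illustrative sizes:

```python
import numpy as np

# Sketch of folding: component-wise sum of every pair of adjacent rows,
# turning a feature map of d rows into one of d/2 rows (d assumed even).
def fold(feature_map):
    d = feature_map.shape[0]
    return feature_map[0:d:2] + feature_map[1:d:2]

fm = np.arange(12, dtype=float).reshape(4, 3)
print(fold(fm).shape)   # (2, 3): rows 0+1 and 2+3 are summed
```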

SLIDE 35

Training

  • The top layer of the network: a fully connected layer followed by a softmax non-linearity
  • The softmax layer predicts the probability distribution over classes given the input sentence
  • The objective (see the sketch below):
    • Minimise the cross-entropy between the predicted and true distributions
    • Including an L2 regularisation term over the parameters
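
A minimal sketch of this objective for a single example; the regularisation coefficient and names are illustrative assumptions.

```python
import numpy as np

# Sketch of the training objective for one example: cross-entropy of the
# predicted class distribution against the true class, plus an L2 penalty
# over all parameter arrays.
def loss(pred_probs, true_class, params, l2=1e-4):
    cross_entropy = -np.log(pred_probs[true_class])
    l2_term = (l2 / 2) * sum(np.sum(p ** 2) for p in params)
    return cross_entropy + l2_term
```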
SLIDE 36

Sentiment Prediction in Movie Reviews

Stanford Sentiment Treebank Dataset

SLIDE 37

Question Type Classification

TREC Dataset

  • Six different question types
  • Train/test: 5452/500
  • DCNN: word dimension d = 32, a single convolutional layer with filters of size 8 and 5 feature maps

SLIDE 38

Twitter Sentiment Prediction with Distant Supervision

  • A tweet is automatically labelled as positive or negative depending on the emoticon that occurs in it
  • Train/test: 1.6 million (emoticon-based labels) / 400 (hand-annotated labels)
  • Preprocessing: a vocabulary of 76643 word types
  • DCNN: word dimension d = 60; other parameters the same as in the binary sentiment prediction task on the Stanford Sentiment Treebank

SLIDE 39

Visualising Feature Detectors

  • A filter in the DCNN is associated with a feature detector (neuron) that learns, during training, to be particularly active when presented with a specific sequence of input words
  • The first layer: continuous n-grams
  • Higher layers: multiple separate n-grams
SLIDE 41

Multilingual Models for Compositional Distributed Semantics

  • Representing meaning across languages in a shared multilingual semantic space
  • Proposing a novel unsupervised technique that:
    • leverages parallel corpora
    • employs semantic transfer through compositional representations
  • Experiments on two corpora:
    • cross-lingual document classification on the Reuters RCV1/RCV2 corpora
    • classification on a massively multilingual corpus derived from the TED corpus

SLIDE 43

Overview

  • Word representation: a continuous vector in R^d
  • Semantic representations of sentences and documents: computed by a compositional vector model (CVM)
  • A multilingual objective function: uses a noise-contrastive update between semantic representations of different languages to learn these word embeddings

Example parallel sentence pairs:
(a) The cat sat on the red mat. (b) 猫坐在红色的垫子上。
(a) The cat sat on the red mat. (b) Die Katze saß auf der roten Matte.

SLIDE 45

Approach

  • Given enough parallel data, a shared representation of two parallel sentences would be forced to capture the common elements between these two sentences.
  • What parallel sentences share, of course, are their semantics.

Define a bilingual energy:

E_bi(a, b) = ‖f(a) − g(b)‖²

where C : a parallel corpus
      x, y : two different languages
      (a, b) ∈ C : a pair of parallel sentences in languages x, y
      f : X → R^d, g : Y → R^d

SLIDE 47

Approach (cont.)

  • The objective: minimise E_bi for all semantically equivalent sentences in the corpus, using a noise-contrastive hinge loss (see the sketch below):

E_hl(a, b, n) = [m + E_bi(a, b) − E_bi(a, n)]_+

where [x]_+ = max(x, 0)
      (a, b) ∈ C : a positive sample
      (a, n) : a negative (noise) sample, with n randomly drawn from the corpus

  • The final objective function:

minimise J(θ) = ∑_{(a,b)∈C} ∑_{i=1}^{k} E_hl(a, b, n_i) + (λ/2) ‖θ‖²

where θ : all the parameters in the model
      n_i : one of k noise samples per positive pair
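
A numpy sketch of the hinge loss, using the ADD composition from the next slide; the margin and the inputs are illustrative assumptions.

```python
import numpy as np

# Sketch of E_hl(a, b, n) = [m + E_bi(a, b) - E_bi(a, n)]_+ with
# E_bi(a, b) = ||f(a) - g(b)||^2 and the ADD composition f(x) = sum of
# word vectors. Sentences are (n_words, dim) arrays of word vectors.
def compose_add(word_vecs):
    return np.sum(word_vecs, axis=0)

def e_bi(sent_a, sent_b):
    diff = compose_add(sent_a) - compose_add(sent_b)
    return np.sum(diff ** 2)            # squared Euclidean distance

def e_hl(a, b, n, margin=1.0):
    return max(0.0, margin + e_bi(a, b) - e_bi(a, n))
```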

SLIDE 49

Composition Models: CVM

Focus on composition functions that do not require any syntactic information (both models are sketched below):

  • ADD model:

f(x) = ∑_{i=1}^{n} x_i

    • A sentence is represented by the sum of its word vectors
    • A distributed bag-of-words approach: ignores word order

  • BI model:

f(x) = ∑_{i=2}^{n} tanh(x_{i−1} + x_i)

    • Captures bigram information
    • A non-linear composition function
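
Both CVMs fit in a couple of numpy lines; a sketch with x as an (n, d) matrix of word vectors.

```python
import numpy as np

# Sketch of the two composition functions over x, an (n, d) matrix
# of word vectors (one row per word).
def add_model(x):
    return x.sum(axis=0)                        # f(x) = sum_i x_i

def bi_model(x):
    return np.tanh(x[:-1] + x[1:]).sum(axis=0)  # f(x) = sum_i tanh(x_{i-1} + x_i)
```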
SLIDE 50

Document-level Semantics

  • For a number of tasks, such as topic modelling, representations of objects beyond the sentence level are required.
  • Extension to document-level learning: recursively apply the composition and objective function

SLIDE 52

Experiment settings

Dataset:

  • The Europarl corpus v7 15 (RCV)
    • used for the Cross-Lingual Document Classification (CLDC) task
    • the English-German and English-French language pairs are considered
  • A massively multilingual corpus (TED)
    • based on the TED corpus 16 for IWSLT 2013
    • training: 12,078 parallel documents (12 languages)
    • used for the topic classification task, with the 15 most frequent keywords as topics

Experiment protocols:

  • All model weights were randomly initialised using a Gaussian distribution (µ = 0, σ² = 0.1)
  • The number of noise samples for each positive sample: {1, 10, 50}
  • The dimension of all embeddings: d = 128
  • Iterations: 100 for RCV, 500 for TED, 5 for joint training

15 http://www.statmt.org/europarl/
16 https://wit3.fbk.eu/

SLIDE 53

RCV1/RCV2 Document Classification

  • ADD: trained on 500k sentence pairs of the English-German parallel section
  • ADD+: uses an additional 500k parallel sentences from the English-French corpus
  • Training the document classifier: on varying training-set sizes between 100 and 10,000 documents

SLIDE 54

TED Corpus Experiments

  • Using the training data of the corpus to learn distributed representations across 12 languages
  • In the single mode: vectors are learnt from a single language pair (en-X)
  • In the joint mode: vector learning is performed on all parallel sub-corpora simultaneously

SLIDE 55

Linguistic Analysis

SLIDE 57

Summary

Mikolov, ICML’14

  • Unsupervised learning of paragraph vectors
    • PV-DM
    • PV-DBOW
  • Learning to predict the surrounding words in contexts sampled from the paragraph
  • PV-DBOW loses the word order information
  • NLP tasks:
    • Sentiment prediction (Stanford, IMDB)
    • Information retrieval (computing similarity between snippets)

SLIDE 58

Summary (cont.)

Kalchbrenner, ACL’14

  • A dynamic convolutional neural network (DCNN)
    • Wide convolution + folding + (dynamic) k-max pooling + non-linearity
  • NLP tasks:
    • Sentiment prediction (Stanford, Twitter)
    • Question type classification
    • Visualising feature detectors
SLIDE 59

Summary (cont.)

Hermann, ACL’14

  • A novel method for learning multilingual word embeddings
    • Leveraging parallel data
    • Defining a multilingual objective function
    • Coupled with simple composition functions
    • CVM & DocCVM: ADD, BI
  • NLP tasks:
    • Cross-lingual document classification (Reuters RCV1/RCV2)
    • Topic classification (TED)

ALL three (Mikolov’14, Kalchbrenner’14, Hermann’14): without requiring external features as provided by parsers or other resources.

SLIDE 60

Related Neural Sentence Models

Neural Bag-of-Words (NBoW) models

  • Mikolov T. et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013
  • Bengio Y. et al. A Neural Probabilistic Language Model. JMLR, 2003

Models that adopt a more general structure

  • Socher R. et al. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP, 2013
  • Socher R. et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences. TACL, 2013
  • Jordan B. Pollack. Recursive distributed representations. Artificial Intelligence, 1990

Models based on convolution and TDNN architecture

  • Kalchbrenner N. and Blunsom P. Recurrent Convolutional Neural Networks for Discourse Compositionality. ACL, 2013
  • Collobert R. and Weston J. A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML, 2008

SLIDE 61

Thank You for Listening! Q & A

SLIDE 62

A Neural Probabilistic Language Model 17 18

y = b + W x + U tanh(d + H x)
x = (C(w_{t−1}), C(w_{t−2}), ..., C(w_{t−n+1}))

17 Bengio Y. et al. A Neural Probabilistic Language Model. JMLR, 2003
18 Word Vector: see slide 9
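
A numpy sketch of the NPLM scoring equation; all sizes are illustrative assumptions, and `dvec` names the bias vector d to avoid clashing with other symbols.

```python
import numpy as np

# Sketch of y = b + W x + U tanh(d + H x), where x concatenates the
# embeddings of the previous n-1 words.
V, emb, n, h = 1000, 30, 4, 50
rng = np.random.default_rng(0)
C = rng.normal(0, 0.1, (V, emb))            # embedding table
x_dim = (n - 1) * emb
W = rng.normal(0, 0.1, (V, x_dim))          # direct (linear) connections
U = rng.normal(0, 0.1, (V, h))
H = rng.normal(0, 0.1, (h, x_dim))
b, dvec = np.zeros(V), np.zeros(h)          # dvec plays the role of d

context = [5, 77, 311]                      # w_{t-1}, w_{t-2}, w_{t-3}
x = np.concatenate([C[w] for w in context])
y = b + W @ x + U @ np.tanh(dvec + H @ x)   # unnormalised scores over V words
```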

SLIDE 63

Stanford Sentiment Treebank 19

19Socher R. et al. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP, 2013