RECURSIVE DEEP MODELS FOR SEMANTIC COMPOSITIONALITY OVER A SENTIMENT TREEBANK
Presented By: Dwayne Campbell
Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts
Overview
- Introduction
- Problem
- Stanford Sentiment Treebank
- Models
- Experiments
Introduction
Sentiment analysis is about detecting:
- Attitude
- Emotions
- Opinions
Problem
- There is a lack of a large labeled compositionality corpus
- Semantic vector spaces are very useful, but they cannot express the meaning of longer phrases in a principled way
Stanford Sentiment Treebank
- First corpus with fully labeled parse trees
- 10,662 single sentences extracted from movie reviews
- Sentences parsed into 215,154 unique phrases using the Stanford Parser
- Each phrase annotated by 3 human judges; annotators labeled all phrases with a slider of 25 different values, initially set to neutral
Models
All models share the following:
- They compute compositional vector representations for phrases of variable length
- The representations derived above are used as features to classify each phrase
- When an n-gram is given to the compositional models, it is parsed into a binary tree in which each leaf node (a word) is represented as a vector
- Parent vectors are computed in a bottom-up fashion using different types of composition functions g(...); a minimal sketch of this shared recursion follows below
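A minimal sketch of the shared bottom-up recursion, assuming NumPy and hypothetical names (Node, compose_tree); the composition function g is supplied by the specific model:

import numpy as np

class Node:
    # A node in the binarized parse tree of a sentence.
    def __init__(self, word=None, left=None, right=None):
        self.word = word    # set only for leaf nodes
        self.left = left    # left child (None for leaves)
        self.right = right  # right child (None for leaves)
        self.vec = None     # d-dimensional vector filled in bottom-up

def compose_tree(node, word_vecs, g):
    # Fill in node.vec for every node of the tree, bottom-up.
    # word_vecs: dict mapping a word to its d-dimensional vector
    # g: composition function (left_vec, right_vec) -> parent_vec
    if node.word is not None:                        # leaf: look up the word vector
        node.vec = np.asarray(word_vecs[node.word])
    else:                                            # internal node: compose the children
        compose_tree(node.left, word_vecs, g)
        compose_tree(node.right, word_vecs, g)
        node.vec = g(node.left.vec, node.right.vec)
    return node.vec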
RNN (Recursive Neural Network)
- A parent vector is computed from its two children b and c as p = f(W[b; c]), where f is the tanh nonlinearity; parent vectors are computed bottom-up until the root is reached (see the sketch below)
- Each parent vector is given to the same softmax classifier to compute its label probability
- Disadvantage: not enough interaction, since the input vectors only interact implicitly through the nonlinearity (squashing) function
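A minimal sketch of the standard RNN composition and the per-node softmax, assuming NumPy; the parameter names (W, Ws) and the random initialization are illustrative, with W of size d x 2d and the classifier Ws of size (number of classes) x d:

import numpy as np

d, n_classes = 25, 5                                # word vector size, sentiment classes
rng = np.random.default_rng(0)
W  = rng.normal(scale=0.01, size=(d, 2 * d))        # composition matrix
Ws = rng.normal(scale=0.01, size=(n_classes, d))    # shared softmax classifier

def rnn_compose(b, c):
    # Standard RNN composition: p = tanh(W [b; c])
    return np.tanh(W @ np.concatenate([b, c]))

def node_label_probs(p):
    # Softmax over sentiment classes for a single node vector p
    scores = Ws @ p
    scores -= scores.max()                          # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

With compose_tree from the earlier sketch, rnn_compose would be passed in as g, and node_label_probs applied to every node's vector.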
MV-RNN (Matrix-Vector RNN)
- Represents each word (and each phrase) as both a vector and a matrix (see the composition sketch below)
- Disadvantage: the number of parameters becomes very large and depends on the size of the vocabulary
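A hedged sketch of the MV-RNN composition under the same assumptions, following the paper's formulation in which each child contributes a vector (b, c) and a matrix (B, C): the matrix of one child first transforms the vector of the other, and the two child matrices are combined into a parent matrix:

import numpy as np

d = 25
rng = np.random.default_rng(0)
W   = rng.normal(scale=0.01, size=(d, 2 * d))   # combines the two transformed vectors
W_M = rng.normal(scale=0.01, size=(d, 2 * d))   # combines the two child matrices

def mv_rnn_compose(b, B, c, C):
    # Parent vector: p = tanh(W [C b; B c])
    p = np.tanh(W @ np.concatenate([C @ b, B @ c]))
    # Parent matrix: P = W_M [B; C]  (stacking the two d x d child matrices)
    P = W_M @ np.vstack([B, C])
    return p, P

Every word therefore needs its own d x d matrix in addition to its vector, which is where the large, vocabulary-dependent parameter count comes from.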
Experiments
- Sentences are split into train (8,544), dev (1,101) and test (2,210) sets
- Cross-validated over regularization of the weights, word vector size, learning rate and mini-batch size for AdaGrad (an AdaGrad sketch follows below)
- Optimal performance when:
  - word vector sizes are in the range 25-30
  - mini-batch sizes are around 30
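For reference, a generic AdaGrad parameter update (not the authors' training code; the learning rate and epsilon values are illustrative):

import numpy as np

def adagrad_update(theta, grad, grad_hist, lr=0.01, eps=1e-8):
    # Accumulate squared gradients, then scale the step per parameter
    grad_hist += grad ** 2
    theta -= lr * grad / (np.sqrt(grad_hist) + eps)
    return theta, grad_hist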
- On sentences containing a contrastive "X but Y" conjunction, accuracy is: RNTN (41%), MV-RNN (37%), RNN (36%) and biNB (27%)