SLIDE 1

Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis

July 31, 2014

SLIDE 2

Semantic Composition

▪ Principle of Compositionality
▪ The meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them
▪ Compositional nature of natural language
▪ Go beyond words towards sentences
▪ Examples
  ▪ red car -> red + car
  ▪ not very good -> not + (very + good)
  ▪ eat food -> eat + food
  ▪ …

SLIDE 3

Recursive Neural Models (RNMs)

▪ Utilize the recursive structures of sentences to obtain the semantic representations
▪ The vector representations are used as features and fed into a softmax classifier to predict their labels
▪ Learn to recursively perform semantic composition in vector space
▪ One family of the popular deep learning models

[Figure: binarized parse tree of "not very good" — "very" and "good" compose into "very good", which composes with "not" into "not very good"; the root vector feeds a softmax classifier that predicts Negative]
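The recursive scheme above can be sketched in a few lines of Python. This is a minimal illustration only: the dimensions, the `tanh` nonlinearity, and the random untrained weights are assumptions for the sketch, not the authors' trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 4, 5                             # hypothetical vector size and label count
X = rng.normal(scale=0.1, size=(d, 2 * d))      # global composition matrix
V = rng.normal(scale=0.1, size=(n_classes, d))  # softmax classifier weights

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def compose(left, right):
    # one composition step: w = g(X [w_l; w_r]), with g = tanh
    return np.tanh(X @ np.concatenate([left, right]))

def encode(tree, embed):
    # recursively compose a binarized parse tree into a single vector
    if isinstance(tree, str):                   # leaf: word embedding lookup
        return embed[tree]
    left, right = tree
    return compose(encode(left, embed), encode(right, embed))

embed = {w: rng.normal(scale=0.1, size=d) for w in ["not", "very", "good"]}
vec = encode(("not", ("very", "good")), embed)  # composes not (very good)
probs = softmax(V @ vec)                        # sentiment distribution for the phrase
```

Every internal node reuses the same `compose`, which is exactly the property the later slides identify as the problem.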

SLIDE 4

Semantic Composition with Matrix/Tensor

▪ Problem: RNN and RNTN employ the same global composition function for all pairs of input vectors

▪ RNN (Socher et al. 2011):
w = g( X [w_m; w_s] + c )

▪ RNTN (Yu et al. 2013; Socher et al. 2013):
w = g( [w_m; w_s]^T U^{[1:E]} [w_m; w_s] + X [w_m; w_s] + c )

▪ The main difference among the recursive neural models (RNMs) lies in their semantic composition methods
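The two composition functions above can be sketched as follows, assuming `tanh` for g, illustrative dimensions, and the tensor U stored as one 2d×2d slice per output dimension (all assumptions of this sketch, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3                                              # illustrative vector size
X = rng.normal(scale=0.1, size=(d, 2 * d))         # linear composition matrix
c = np.zeros(d)                                    # bias
U = rng.normal(scale=0.1, size=(d, 2 * d, 2 * d))  # tensor: one slice per output dim

def compose_rnn(wm, ws):
    # RNN: w = g(X [wm; ws] + c)
    v = np.concatenate([wm, ws])
    return np.tanh(X @ v + c)

def compose_rntn(wm, ws):
    # RNTN: w = g([wm; ws]^T U^[1:E] [wm; ws] + X [wm; ws] + c)
    v = np.concatenate([wm, ws])
    bilinear = np.einsum("i,eij,j->e", v, U, v)    # one bilinear form per output dim
    return np.tanh(bilinear + X @ v + c)

wm, ws = rng.normal(size=d), rng.normal(size=d)
w_rnn, w_rntn = compose_rnn(wm, ws), compose_rntn(wm, ws)
```

The RNTN simply adds the bilinear tensor term on top of the RNN's linear term; both still apply one global function to every pair.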

SLIDE 5

Motivation of This Work

▪ Use different composition functions for different types of compositions

▪ Negation: not good, not bad
▪ Intensification: very good, pretty bad
▪ Contrast: the movie is good, but I love it
▪ Sentiment word + target/aspect: good movie, low price
▪ …

▪ Model the composition as a distribution over multiple composition functions, and adaptively select them

SLIDE 6

[Figure: from one global composition function to adaptive multi-compositionality]

SLIDE 7

Adaptive Compositionality

▪ Use more than one composition function and adaptively select them depending on the input vectors

w = g( Σ_{h=1}^{D} Q(g_h | w_m, w_s) · g_h(w_m, w_s) )

[Figure: the input vectors w_m, w_s feed a softmax classifier that outputs a distribution over the composition functions g_1 … g_4; the output vector is their weighted combination]

SLIDE 8

Adaptive Compositionality

▪ Use more than one composition function and adaptively select them depending on the input vectors

w = g( Σ_{h=1}^{D} Q(g_h | w_m, w_s) · g_h(w_m, w_s) )

g_h(·, ·) is the h-th composition function (both the matrices and the tensors can be used)

SLIDE 9

Adaptive Compositionality

▪ Use more than one composition function and adaptively select them depending on the input vectors

w = g( Σ_{h=1}^{D} Q(g_h | w_m, w_s) · g_h(w_m, w_s) )

▪ Avg-AdaMC: Q(g_h | w_m, w_s) = 1 / D
▪ Weighted-AdaMC: [Q(g_1 | w_m, w_s); …; Q(g_D | w_m, w_s)] = softmax( β S [w_m; w_s] )
▪ Max-AdaMC: Q(g_h | w_m, w_s) = 1 if g_h has the maximum score, 0 otherwise

The Boltzmann distribution (a softmax with parameter β) is used to adaptively select g_h.
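The three selection schemes can be sketched as below. The sizes and weights are hypothetical and untrained; the g_h are taken to be linear maps so that the nonlinearity wraps the weighted sum, matching the slide's formula (one reasonable reading of it, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)
d, D = 3, 4                                     # vector size, number of composition functions
Xs = rng.normal(scale=0.1, size=(D, d, 2 * d))  # one composition matrix per function
S = rng.normal(scale=0.1, size=(D, 2 * d))      # selector weights
beta = 2.0                                      # Boltzmann parameter (slides report beta = 2 works best)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def adamc(wm, ws, variant="weighted"):
    v = np.concatenate([wm, ws])
    candidates = Xs @ v                         # g_h(wm, ws) for every h, shape (D, d)
    if variant == "avg":                        # Avg-AdaMC: uniform weights
        Q = np.full(D, 1.0 / D)
    elif variant == "weighted":                 # Weighted-AdaMC: Boltzmann/softmax weights
        Q = softmax(beta * (S @ v))
    else:                                       # Max-AdaMC: hard selection of the top-scoring g_h
        Q = np.zeros(D)
        Q[np.argmax(S @ v)] = 1.0
    return np.tanh(Q @ candidates)              # w = g(sum_h Q(g_h | wm, ws) g_h(wm, ws))

wm, ws = rng.normal(size=d), rng.normal(size=d)
```

Max-AdaMC is the hard limit of Weighted-AdaMC as β grows, which is why a single β knob interpolates between soft mixing and hard selection.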

SLIDE 10

Objective Function

▪ Minimize the cross-entropy error with L2 regularization

min_Θ F(Θ) = − Σ_j Σ_k u_k^j log z_k^j + Σ_{θ∈Θ} λ_θ ‖θ‖₂²

▪ Target vector u^j = [0 … 1 … 0]
▪ Predicted distribution z^j = [0.07 … 0.69 … 0.15]
▪ Optimized with AdaGrad (Duchi, Hazan, and Singer 2011):

θ_t = θ_{t−1} − (α / √H_t) · ∂F/∂θ |_{θ=θ_{t−1}},   H_t = H_{t−1} + ( ∂F/∂θ |_{θ=θ_{t−1}} )²
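The AdaGrad update can be sketched on a toy objective; F(θ) = θ² here stands in for the cross-entropy loss, and the step size is an arbitrary choice for the sketch:

```python
import numpy as np

def adagrad_step(theta, grad, H, alpha=0.5, eps=1e-8):
    # H_t = H_{t-1} + (dF/dtheta)^2 ; theta_t = theta_{t-1} - alpha / sqrt(H_t) * dF/dtheta
    H = H + grad ** 2
    theta = theta - alpha * grad / (np.sqrt(H) + eps)
    return theta, H

theta, H = np.array([5.0]), np.zeros(1)  # toy parameter and accumulator
for _ in range(500):
    grad = 2 * theta                     # gradient of F(theta) = theta^2
    theta, H = adagrad_step(theta, grad, H)
```

The accumulator H grows monotonically, so the effective step size shrinks per coordinate; frequently-updated parameters get smaller steps.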

SLIDE 11

Parameter Estimation

▪ All gradients are computed with the back-propagation algorithm over the tree; ε^{j←s} denotes the error vector back-propagated to node j from node s, and y^j = [w_m^j; w_s^j] stacks the two input vectors of node j.

▪ Classification (softmax weights V):
∂F/∂V = Σ_j (z^j − u^j) (w^j)^T

▪ Linear composition (matrix X_h of the h-th composition function):
∂F/∂X_h = Σ_j Σ_s Q(g_h | w_m^j, w_s^j) ε^{j←s} (y^j)^T

▪ Tensor composition (slice e of tensor U_h):
∂F/∂U_h^{[e]} = Σ_j Σ_s Q(g_h | w_m^j, w_s^j) ε_e^{j←s} y^j (y^j)^T

▪ Word embedding (embedding vector of word x):
∂F/∂L_x = Σ_{j: word(j)=x} Σ_s ε^{j←s}

▪ Error vectors (s ranges over j itself and its ancestors):
ε^{j←s} = (V^T (z^j − u^j)) ⊙ g′(a^j)   if s = j
ε^{j←s} = ((∂a^{par(j)} / ∂w^j)^T ε^{par(j)←s}) ⊙ g′(a^j)   if s ∈ anc(j)

▪ Composition selection (row n of the selector matrix S, with Boltzmann parameter β):
∂F/∂S_n = Σ_j Σ_s Σ_l ε_l^{j←s} Σ_h [g_h(w_m^j, w_s^j)]_l · β Q(g_h | w_m^j, w_s^j) ( Q(g_n | w_m^j, w_s^j) − 1[h = n] ) · (y^j)^T

SLIDE 12

Stanford Sentiment Treebank

▪ 10,662 critic reviews from Rotten Tomatoes
▪ 215,154 phrases produced by the Stanford Parser
▪ Workers on Amazon Mechanical Turk annotated polarity levels for all these phrases
▪ The sentiment scales are merged into five categories (very negative, negative, neutral, positive, very positive)

SLIDE 13

[Table: results of the evaluation on the Sentiment Treebank; the top three methods are in bold. Our methods achieve the best performance when β is set to 2.]

SLIDE 14

w = g( Σ_{h=1}^{D} Q(g_h | w_m, w_s) · g_h(w_m, w_s) )

▪ Avg-AdaMC: Q(g_h | w_m, w_s) = 1 / D
▪ Weighted-AdaMC: [Q(g_1 | w_m, w_s); …; Q(g_D | w_m, w_s)] = softmax( β S [w_m; w_s] )
▪ Max-AdaMC: Q(g_h | w_m, w_s) = 1 if g_h has the maximum score, 0 otherwise

SLIDE 15

Vector Representations

Word/Phrase → Neighboring Words/Phrases in the Vector Space
good → cool, fantasy, classic, watchable, attractive
boring → dull, bad, disappointing, horrible, annoying
ingenious → extraordinary, inspirational, imaginative, thoughtful, creative
soundtrack → execution, animation, cast, colors, scene
good actors → good ideas, good acting, good looks, good sense, great cast
thought-provoking film → beautiful film, engaging film, lovely film, remarkable film, riveting story
painfully bad → how bad, too bad, really bad, so bad, very bad
not a good movie → isn’t much fun, isn’t very funny, nothing new, isn’t as funny of clichés

SLIDE 16

t-SNE

[Figure: t-SNE projection of the learned word vectors; words cluster by sentiment label]
▪ Very positive: creative, great, perfect, superb, amazing
▪ Positive: fancy, good, cool, promising, interested
▪ Objective: plot, near, buy, surface, them, version
▪ Negative: problem, slow, sick, mess, poor, wrong
▪ Very negative: failure, worst, disaster, horrible

SLIDE 17

Composition Pairs in the Composition Space

Composition Pair → Neighboring Composition Pairs
really bad → very bad / only dull / much bad / extremely bad / (all that) bad
(is n’t) (necessarily bad) → (is n’t) (painfully bad) / not mean-spirited / not (too slow) / not well-acted / (have otherwise) (been bland)
great (Broadway play) → great (cinematic innovation) / great subject / great performance / energetic entertainment / great (comedy filmmaker)
(arty and) jazzy → (Smart and) fun / (verve and) fun / (unique and) entertaining / (gentle and) engrossing / (warmth and) humor

▪ For the composition pair (w_m, w_s), we use its distribution over composition functions [Q(g_1 | w_m, w_s); …; Q(g_D | w_m, w_s)] to query its neighboring pairs
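That neighbor query can be sketched as follows. The distributions below are made-up illustrative values (not numbers from the paper), and cosine similarity is one plausible choice of distance between distributions:

```python
import numpy as np

# hypothetical distributions Q(g_h | wm, ws) over D = 4 composition functions
pairs = {
    "really bad": np.array([0.70, 0.10, 0.10, 0.10]),
    "very bad":   np.array([0.65, 0.15, 0.10, 0.10]),
    "not good":   np.array([0.10, 0.75, 0.05, 0.10]),
    "great cast": np.array([0.10, 0.10, 0.70, 0.10]),
}

def neighbors(query, k=2):
    # rank the other pairs by cosine similarity of their distributions
    q = pairs[query]
    sims = {p: float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))
            for p, v in pairs.items() if p != query}
    return sorted(sims, key=sims.get, reverse=True)[:k]
```

Pairs that put their mass on the same composition function (e.g. both intensified negatives) end up as each other's nearest neighbors, which is what the table above illustrates.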

SLIDE 18

t-SNE Visualization: Composition Pairs

[Figure: t-SNE projection of composition pairs, each embedded by its distribution over composition functions [Q(g_1 | w_m, w_s); …; Q(g_D | w_m, w_s)]; clusters include Negation, Intensification, "adj noun", "* and *", "* and Entity", "these/this/the *", "a/an/two *", "for/with *", "verb *", and "* ’s"]

SLIDE 19

[Figure: the "adj noun" cluster highlighted in the composition-pair t-SNE visualization]
▪ Best films
▪ Riveting story
▪ Solid cast
▪ Talented director
▪ Gorgeous visuals

SLIDE 20

[Figure: the Intensification cluster highlighted in the composition-pair t-SNE visualization]
▪ Really good
▪ Quite funny
▪ Damn fine
▪ Very good
▪ Particularly funny

SLIDE 21

[Figure: the Negation cluster highlighted in the composition-pair t-SNE visualization]
▪ Is never dull
▪ Not smart
▪ Not a good movie
▪ Is n’t much fun
▪ Wo n’t be disappointed

SLIDE 22

[Figure: the named-entity cluster highlighted in the composition-pair t-SNE visualization]
▪ Roberto Alagna
▪ Pearl Harbor
▪ Elizabeth Hurley
▪ Diane Lane
▪ Pauly Shore

SLIDE 23

Future Work

▪ Use AdaMC for other NLP tasks
▪ Utilize external information to adaptively select the composition functions
  ▪ Part-of-speech tags
  ▪ Syntactic parsing results
▪ Mix different composition types together
  ▪ Linear combination approach (RNN)
  ▪ Tensor-based approach (RNTN)
  ▪ Multiplication approach
  ▪ …

SLIDE 24

THANKS!