SLIDE 1

Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis

July 31, 2014

SLIDE 2

Semantic Composition

▪ Principle of Compositionality
▪ The meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them
▪ Compositional nature of natural language
▪ Go beyond words towards sentences
▪ Examples
  ▪ red car -> red + car
  ▪ not very good -> not + (very + good)
  ▪ eat food -> eat + food
  ▪ …

SLIDE 3

Recursive Neural Models (RNMs)

▪ Utilize the recursive structures of sentences to obtain the semantic representations
▪ The vector representations are used as features and fed into a softmax classifier to predict their labels
▪ Learn to recursively perform semantic composition in vector space
▪ One family of the popular deep learning models

[Figure: binarized parse tree of "not very good" — "very" and "good" compose into "very good", which composes with "not" into "not very good"; the root vector feeds a softmax classifier that predicts Negative]
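The recursive scheme above can be sketched in a few lines of Python. This is a minimal illustration only: the dimensions, the `tanh` nonlinearity, and the random untrained weights are assumptions for the sketch, not the authors' trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 4, 5                             # hypothetical vector size and label count
X = rng.normal(scale=0.1, size=(d, 2 * d))      # global composition matrix
V = rng.normal(scale=0.1, size=(n_classes, d))  # softmax classifier weights

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def compose(left, right):
    # one composition step: w = g(X [w_l; w_r]), with g = tanh
    return np.tanh(X @ np.concatenate([left, right]))

def encode(tree, embed):
    # recursively compose a binarized parse tree into a single vector
    if isinstance(tree, str):                   # leaf: word embedding lookup
        return embed[tree]
    left, right = tree
    return compose(encode(left, embed), encode(right, embed))

embed = {w: rng.normal(scale=0.1, size=d) for w in ["not", "very", "good"]}
vec = encode(("not", ("very", "good")), embed)  # composes not (very good)
probs = softmax(V @ vec)                        # sentiment distribution for the phrase
```

Every internal node reuses the same `compose`, which is exactly the property the later slides identify as the problem.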

SLIDE 4

Semantic Composition with Matrix/Tensor

▪ Problem: RNN and RNTN employ the same global composition function for all pairs of input vectors

▪ RNN (Socher et al. 2011):
w = g( X [w_m; w_s] + c )

▪ RNTN (Yu et al. 2013; Socher et al. 2013):
w = g( [w_m; w_s]^T U^{[1:E]} [w_m; w_s] + X [w_m; w_s] + c )

▪ The main difference among the recursive neural models (RNMs) lies in their semantic composition methods
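The two composition functions above can be sketched as follows, assuming `tanh` for g, illustrative dimensions, and the tensor U stored as one 2d×2d slice per output dimension (all assumptions of this sketch, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3                                              # illustrative vector size
X = rng.normal(scale=0.1, size=(d, 2 * d))         # linear composition matrix
c = np.zeros(d)                                    # bias
U = rng.normal(scale=0.1, size=(d, 2 * d, 2 * d))  # tensor: one slice per output dim

def compose_rnn(wm, ws):
    # RNN: w = g(X [wm; ws] + c)
    v = np.concatenate([wm, ws])
    return np.tanh(X @ v + c)

def compose_rntn(wm, ws):
    # RNTN: w = g([wm; ws]^T U^[1:E] [wm; ws] + X [wm; ws] + c)
    v = np.concatenate([wm, ws])
    bilinear = np.einsum("i,eij,j->e", v, U, v)    # one bilinear form per output dim
    return np.tanh(bilinear + X @ v + c)

wm, ws = rng.normal(size=d), rng.normal(size=d)
w_rnn, w_rntn = compose_rnn(wm, ws), compose_rntn(wm, ws)
```

The RNTN simply adds the bilinear tensor term on top of the RNN's linear term; both still apply one global function to every pair.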

SLIDE 5

Motivation of This Work

▪ Use different composition functions for different types of compositions

▪ Negation: not good, not bad
▪ Intensification: very good, pretty bad
▪ Contrast: the movie is good, but I love it
▪ Sentiment word + target/aspect: good movie, low price
▪ …

▪ Model the composition as a distribution over multiple composition functions, and adaptively select them

SLIDE 6

[Figure: from one global composition function to adaptive multi-compositionality]

SLIDE 7

Adaptive Compositionality

▪ Use more than one composition function and adaptively select them depending on the input vectors

w = g( Σ_{h=1}^{D} Q(g_h | w_m, w_s) · g_h(w_m, w_s) )

[Figure: the input vectors w_m, w_s feed a softmax classifier that outputs a distribution over the composition functions g_1 … g_4; the output vector is their weighted combination]

SLIDE 8

Adaptive Compositionality

▪ Use more than one composition function and adaptively select them depending on the input vectors

w = g( Σ_{h=1}^{D} Q(g_h | w_m, w_s) · g_h(w_m, w_s) )

g_h(·, ·) is the h-th composition function (both the matrices and the tensors can be used)

SLIDE 9

Adaptive Compositionality

▪ Use more than one composition function and adaptively select them depending on the input vectors

w = g( Σ_{h=1}^{D} Q(g_h | w_m, w_s) · g_h(w_m, w_s) )

▪ Avg-AdaMC: Q(g_h | w_m, w_s) = 1 / D
▪ Weighted-AdaMC: [Q(g_1 | w_m, w_s); …; Q(g_D | w_m, w_s)] = softmax( β S [w_m; w_s] )
▪ Max-AdaMC: Q(g_h | w_m, w_s) = 1 if g_h has the maximum score, 0 otherwise

The Boltzmann distribution (a softmax with parameter β) is used to adaptively select g_h.
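The three selection schemes can be sketched as below. The sizes and weights are hypothetical and untrained; the g_h are taken to be linear maps so that the nonlinearity wraps the weighted sum, matching the slide's formula (one reasonable reading of it, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)
d, D = 3, 4                                     # vector size, number of composition functions
Xs = rng.normal(scale=0.1, size=(D, d, 2 * d))  # one composition matrix per function
S = rng.normal(scale=0.1, size=(D, 2 * d))      # selector weights
beta = 2.0                                      # Boltzmann parameter (slides report beta = 2 works best)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def adamc(wm, ws, variant="weighted"):
    v = np.concatenate([wm, ws])
    candidates = Xs @ v                         # g_h(wm, ws) for every h, shape (D, d)
    if variant == "avg":                        # Avg-AdaMC: uniform weights
        Q = np.full(D, 1.0 / D)
    elif variant == "weighted":                 # Weighted-AdaMC: Boltzmann/softmax weights
        Q = softmax(beta * (S @ v))
    else:                                       # Max-AdaMC: hard selection of the top-scoring g_h
        Q = np.zeros(D)
        Q[np.argmax(S @ v)] = 1.0
    return np.tanh(Q @ candidates)              # w = g(sum_h Q(g_h | wm, ws) g_h(wm, ws))

wm, ws = rng.normal(size=d), rng.normal(size=d)
```

Max-AdaMC is the hard limit of Weighted-AdaMC as β grows, which is why a single β knob interpolates between soft mixing and hard selection.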

SLIDE 10

Objective Function

▪ Minimize the cross-entropy error with L2 regularization

min_Θ F(Θ) = − Σ_j Σ_k u_k^j log z_k^j + Σ_{θ∈Θ} λ_θ ‖θ‖₂²

▪ Target vector u^j = [0 … 1 … 0]
▪ Predicted distribution z^j = [0.07 … 0.69 … 0.15]
▪ Optimized with AdaGrad (Duchi, Hazan, and Singer 2011):

θ_t = θ_{t−1} − (α / √H_t) · ∂F/∂θ |_{θ=θ_{t−1}},   H_t = H_{t−1} + ( ∂F/∂θ |_{θ=θ_{t−1}} )²
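The AdaGrad update can be sketched on a toy objective; F(θ) = θ² here stands in for the cross-entropy loss, and the step size is an arbitrary choice for the sketch:

```python
import numpy as np

def adagrad_step(theta, grad, H, alpha=0.5, eps=1e-8):
    # H_t = H_{t-1} + (dF/dtheta)^2 ; theta_t = theta_{t-1} - alpha / sqrt(H_t) * dF/dtheta
    H = H + grad ** 2
    theta = theta - alpha * grad / (np.sqrt(H) + eps)
    return theta, H

theta, H = np.array([5.0]), np.zeros(1)  # toy parameter and accumulator
for _ in range(500):
    grad = 2 * theta                     # gradient of F(theta) = theta^2
    theta, H = adagrad_step(theta, grad, H)
```

The accumulator H grows monotonically, so the effective step size shrinks per coordinate; frequently-updated parameters get smaller steps.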

SLIDE 11

Parameter Estimation

▪ All gradients are computed with the back-propagation algorithm over the tree; ε^{j←s} denotes the error vector back-propagated to node j from node s, and y^j = [w_m^j; w_s^j] stacks the two input vectors of node j.

▪ Classification (softmax weights V):
∂F/∂V = Σ_j (z^j − u^j) (w^j)^T

▪ Linear composition (matrix X_h of the h-th composition function):
∂F/∂X_h = Σ_j Σ_s Q(g_h | w_m^j, w_s^j) ε^{j←s} (y^j)^T

▪ Tensor composition (slice e of tensor U_h):
∂F/∂U_h^{[e]} = Σ_j Σ_s Q(g_h | w_m^j, w_s^j) ε_e^{j←s} y^j (y^j)^T

▪ Word embedding (embedding vector of word x):
∂F/∂L_x = Σ_{j: word(j)=x} Σ_s ε^{j←s}

▪ Error vectors (s ranges over j itself and its ancestors):
ε^{j←s} = (V^T (z^j − u^j)) ⊙ g′(a^j)   if s = j
ε^{j←s} = ((∂a^{par(j)} / ∂w^j)^T ε^{par(j)←s}) ⊙ g′(a^j)   if s ∈ anc(j)

▪ Composition selection (row n of the selector matrix S, with Boltzmann parameter β):
∂F/∂S_n = Σ_j Σ_s Σ_l ε_l^{j←s} Σ_h [g_h(w_m^j, w_s^j)]_l · β Q(g_h | w_m^j, w_s^j) ( Q(g_n | w_m^j, w_s^j) − 1[h = n] ) · (y^j)^T

SLIDE 12

Stanford Sentiment Treebank

▪ 10,662 critic reviews from Rotten Tomatoes
▪ 215,154 phrases produced by the Stanford Parser
▪ Workers on Amazon Mechanical Turk annotated polarity levels for all these phrases
▪ The sentiment scales are merged into five categories (very negative, negative, neutral, positive, very positive)

SLIDE 13

[Table: results of the evaluation on the Sentiment Treebank; the top three methods are in bold. Our methods achieve the best performance when β is set to 2.]

SLIDE 14

w = g( Σ_{h=1}^{D} Q(g_h | w_m, w_s) · g_h(w_m, w_s) )

▪ Avg-AdaMC: Q(g_h | w_m, w_s) = 1 / D
▪ Weighted-AdaMC: [Q(g_1 | w_m, w_s); …; Q(g_D | w_m, w_s)] = softmax( β S [w_m; w_s] )
▪ Max-AdaMC: Q(g_h | w_m, w_s) = 1 if g_h has the maximum score, 0 otherwise

SLIDE 15

Vector Representations

Word/Phrase → Neighboring Words/Phrases in the Vector Space
good → cool, fantasy, classic, watchable, attractive
boring → dull, bad, disappointing, horrible, annoying
ingenious → extraordinary, inspirational, imaginative, thoughtful, creative
soundtrack → execution, animation, cast, colors, scene
good actors → good ideas, good acting, good looks, good sense, great cast
thought-provoking film → beautiful film, engaging film, lovely film, remarkable film, riveting story
painfully bad → how bad, too bad, really bad, so bad, very bad
not a good movie → isn’t much fun, isn’t very funny, nothing new, isn’t as funny of clichés

SLIDE 16

t-SNE

[Figure: t-SNE projection of the learned word vectors; words cluster by sentiment label]
▪ Very positive: creative, great, perfect, superb, amazing
▪ Positive: fancy, good, cool, promising, interested
▪ Objective: plot, near, buy, surface, them, version
▪ Negative: problem, slow, sick, mess, poor, wrong
▪ Very negative: failure, worst, disaster, horrible

SLIDE 17

Composition Pairs in the Composition Space

Composition Pair → Neighboring Composition Pairs
really bad → very bad / only dull / much bad / extremely bad / (all that) bad
(is n’t) (necessarily bad) → (is n’t) (painfully bad) / not mean-spirited / not (too slow) / not well-acted / (have otherwise) (been bland)
great (Broadway play) → great (cinematic innovation) / great subject / great performance / energetic entertainment / great (comedy filmmaker)
(arty and) jazzy → (Smart and) fun / (verve and) fun / (unique and) entertaining / (gentle and) engrossing / (warmth and) humor

▪ For the composition pair (w_m, w_s), we use its distribution over composition functions [Q(g_1 | w_m, w_s); …; Q(g_D | w_m, w_s)] to query its neighboring pairs
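That neighbor query can be sketched as follows. The distributions below are made-up illustrative values (not numbers from the paper), and cosine similarity is one plausible choice of distance between distributions:

```python
import numpy as np

# hypothetical distributions Q(g_h | wm, ws) over D = 4 composition functions
pairs = {
    "really bad": np.array([0.70, 0.10, 0.10, 0.10]),
    "very bad":   np.array([0.65, 0.15, 0.10, 0.10]),
    "not good":   np.array([0.10, 0.75, 0.05, 0.10]),
    "great cast": np.array([0.10, 0.10, 0.70, 0.10]),
}

def neighbors(query, k=2):
    # rank the other pairs by cosine similarity of their distributions
    q = pairs[query]
    sims = {p: float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))
            for p, v in pairs.items() if p != query}
    return sorted(sims, key=sims.get, reverse=True)[:k]
```

Pairs that put their mass on the same composition function (e.g. both intensified negatives) end up as each other's nearest neighbors, which is what the table above illustrates.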

SLIDE 18

t-SNE Visualization: Composition Pairs

[Figure: t-SNE projection of composition pairs, each embedded by its distribution over composition functions [Q(g_1 | w_m, w_s); …; Q(g_D | w_m, w_s)]; clusters include Negation, Intensification, "adj noun", "* and *", "* and Entity", "these/this/the *", "a/an/two *", "for/with *", "verb *", and "* ’s"]

SLIDE 19

[Figure: the "adj noun" cluster highlighted in the composition-pair t-SNE visualization]
▪ Best films
▪ Riveting story
▪ Solid cast
▪ Talented director
▪ Gorgeous visuals

SLIDE 20

[Figure: the Intensification cluster highlighted in the composition-pair t-SNE visualization]
▪ Really good
▪ Quite funny
▪ Damn fine
▪ Very good
▪ Particularly funny

SLIDE 21

[Figure: the Negation cluster highlighted in the composition-pair t-SNE visualization]
▪ Is never dull
▪ Not smart
▪ Not a good movie
▪ Is n’t much fun
▪ Wo n’t be disappointed

SLIDE 22

[Figure: the named-entity cluster highlighted in the composition-pair t-SNE visualization]
▪ Roberto Alagna
▪ Pearl Harbor
▪ Elizabeth Hurley
▪ Diane Lane
▪ Pauly Shore

SLIDE 23

Future Work

▪ Use AdaMC for other NLP tasks
▪ Utilize external information to adaptively select the composition functions
  ▪ Part-of-speech tags
  ▪ Syntactic parsing results
▪ Mix different composition types together
  ▪ Linear combination approach (RNN)
  ▪ Tensor-based approach (RNTN)
  ▪ Multiplication approach
  ▪ …

SLIDE 24

THANKS!