Searching for the X-Factor: Exploring Corpus Subjectivity for Word Embeddings
Maksim Tkachenko and Chong Cher Chia and Hady W. Lauw Singapore Management University
Word Embeddings
Dense vectors of words
Unsupervised training: GloVe, …
apple banana Melbourne
Counting contexts: good → (…, 8321, 235, 63444, …) ∈ ℝ^{vocabulary size} (≈ 300k)
Reducing dimensionality: good → (…, 0.0335, −0.1018, 0.2300, …) ∈ ℝ^{300}
This camera is good for high quality …
Target word: good. Context words: camera, is, for, high.
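The "counting contexts" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window of two words on each side, the lowercasing, the helper name `context_count_vector`, and the tiny vocabulary built from the single example sentence are all hypothetical choices, not details taken from the slides.

```python
from collections import Counter


def context_count_vector(tokens, target, vocabulary, window=2):
    """Count how often each vocabulary word appears within `window`
    positions of `target`, and return the counts in vocabulary order."""
    counts = Counter()
    for i, word in enumerate(tokens):
        if word != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Every neighbor inside the window, excluding the target position itself.
        counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return [counts[w] for w in vocabulary]


# Example sentence from the slide (lowercased, elided tail dropped).
tokens = "this camera is good for high quality".split()
vocabulary = sorted(set(tokens))  # toy vocabulary; a real one has ~300k entries
vector = context_count_vector(tokens, "good", vocabulary)
print(vocabulary)
print(vector)
```

On a real corpus the resulting vector has one entry per vocabulary word, which is why a dimensionality reduction step follows.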
Counting contexts: good → (…, ?, ?, ?, …) ∈ ℝ^{vocabulary size} (≈ 300k)
Reducing dimensionality: good → (…, ?, ?, ?, …) ∈ ℝ^{300}
[Figure: corpora ordered from more subjective to more objective]
Objective Embeddings (OE) Subjective Embeddings (SE)
“A very funny movie” vs. “One lousy movie”
“The story needs more dramatic meat” vs. “She's an artist”
[Figure: accuracy of Objective Embeddings (OE) vs. Subjective Embeddings (SE) on Subjectivity Classification, Topic Classification, Amazon Sentiment, and Rotten Tomatoes Sentiment]
Do SE understand sentiment words better than OE?
SE and OE are very similar on “objective” tasks
Word      Similarity
bad       0.68
decent    0.67
nice      0.62
poor      0.61
…         …

Word      Similarity
decent    0.78
great     0.76
nice      0.69
terrific  0.64
…         …
Word A    Word B     Their Similarity
…         …          …
waste     save       0.51
love      hate       0.60
loves     hates      0.68
easy      difficult  0.56
…         …          …
Similar to “good”   Similarity
bad                 0.68
decent              0.67
nice                0.62
poor                0.61
…                   …

Similar to “good”   Similarity
decent              0.79
nice                0.76
perfect             0.75
excellent           0.73
…                   …
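Nearest-neighbor lists like the ones above are typically produced by ranking all other words by their similarity to the query vector. The sketch below assumes cosine similarity, which is the common choice for word embeddings but is not stated on the slides. The 3-dimensional vectors are invented for illustration and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy vectors, made up for this example; real embeddings have ~300 dimensions.
toy = {
    "good": [1.0, 0.5, 0.0],
    "nice": [0.9, 0.6, 0.1],
    "bad": [0.8, 0.4, -0.6],
    "frog": [0.0, 0.1, 1.0],
}
for neighbor, score in most_similar("good", toy):
    print(neighbor, round(score, 2))
```

In the toy data "bad" still scores fairly high against "good", mirroring the pattern in the tables where antonyms that share contexts end up close together.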
Negative: waste, junk, horrible, defective, …
Positive: love, great, recommend, easy, …
Word2Vec Skip-gram
This camera is good for high quality …
Observed pairs: (good, camera) (good, is) (good, for) (good, high)
Random noise: (good, frog) (good, duck) …
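The pair generation on this slide can be sketched as follows. The window of two words on each side matches the four pairs listed for "good". The noise words beyond "frog" and "duck", the uniform sampling, and the function names are assumptions made for illustration. Word2Vec itself draws noise words from a smoothed unigram distribution rather than uniformly.

```python
import random


def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every word and its neighbors."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def noise_pairs(target, noise_vocabulary, k, rng):
    """Pair `target` with k words drawn uniformly at random (a simplification)."""
    return [(target, rng.choice(noise_vocabulary)) for _ in range(k)]


tokens = "this camera is good for high quality".split()
positives = [pair for pair in skipgram_pairs(tokens) if pair[0] == "good"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
negatives = noise_pairs("good", ["frog", "duck", "lamp", "river"], k=2, rng=rng)

print(positives)
print(negatives)
```

Training then pushes the vectors of observed pairs together and the vectors of noise pairs apart.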
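Lexical objective of SentiVec (two classes):
\[
\begin{aligned}
P(\text{good is POSITIVE}) &= \sigma(\mathbf{good} \cdot \boldsymbol{\phi}) \\
P(\text{good is NEGATIVE}) &= 1 - P(\text{good is POSITIVE}) \\
P(\text{good is POSITIVE}) &\to \text{maximize}
\end{aligned}
\]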
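A minimal numeric sketch of the two-class lexical objective is given below. It assumes the parameter in the formula (written `phi` here) is a vector of the same dimension as the word vector, and that the objective is optimized as a log-likelihood. The vector values and function names are hypothetical, and the sketch does not show how this term is combined with the Skip-gram objective.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def p_positive(word_vec, phi):
    """P(word is POSITIVE) = sigmoid(word_vec . phi)."""
    return sigmoid(sum(w * p for w, p in zip(word_vec, phi)))


def lexical_log_likelihood(word_vec, phi, is_positive):
    """Log-probability of the lexicon label; higher is better."""
    p = p_positive(word_vec, phi)
    return math.log(p if is_positive else 1.0 - p)


# Made-up 3-dimensional vectors, for illustration only.
good = [0.4, -0.1, 0.3]
phi = [1.0, 0.0, 2.0]

p = p_positive(good, phi)
print(round(p, 3), round(1.0 - p, 3))
print(lexical_log_likelihood(good, phi, is_positive=True))
```

For a word in the positive lexicon such as "good", training raises the first probability. For a word in the negative lexicon it raises the complement instead.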
[Figure: accuracy on Amazon Sentiment (average over 24 categories) and Rotten Tomatoes Sentiment, for Objective Embeddings and Subjective Embeddings]
SentiVec does not affect “objective” classification tasks
[Figure: percentages (axis from 0% to 0.8%) for Positive Words, Negative Words, and Neutral Words]
Pre-trained Word Embeddings & Code: https://sentivec.preferred.ai/