
Searching for the X-Factor: Exploring Corpus Subjectivity for Word Embeddings (PowerPoint presentation)



  1. Searching for the X-Factor: Exploring Corpus Subjectivity for Word Embeddings • Maksim Tkachenko, Chong Cher Chia, and Hady W. Lauw • Singapore Management University

  2. Word Embeddings • Dense vectors of words, e.g., good → (…, 0.0335, −0.1018, 0.2300, …) $\in \mathbb{R}^{300}$ • Unsupervised training: GloVe, Word2Vec • Words in similar contexts tend to have similar meanings • Words with similar meanings tend to be close in the embedding space • [Figure: example words such as apple, banana, and Melbourne placed in the embedding space]

  3. Training Word Embeddings • Target word and context words, e.g., target word good in “This camera is good for high quality …” • Counting contexts: good → (…, 8321, 235, 63444, …) $\in \mathbb{R}^{\text{vocabulary size}}$ (vocabulary size ≈ 300k) • Reducing dimensionality: good → (…, 0.0335, −0.1018, 0.2300, …) $\in \mathbb{R}^{300}$
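The "counting contexts" step can be sketched in Python. This is a minimal illustration that assumes whitespace tokenization and a symmetric window of two words on each side; the slides specify neither. The dimensionality-reduction step is left out because the slides do not say how it is performed.

```python
from collections import Counter, defaultdict

def count_contexts(tokens, window=2):
    """Map each target word to a Counter of the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                counts[target][tokens[j]] += 1
    return counts

def context_vector(counts, target, vocab):
    """Lay out the target's context counts as a dense row with one column per vocabulary word."""
    return [counts[target][w] for w in vocab]

# Assumption: window=2 and a lowercased, whitespace-split sentence.
tokens = "this camera is good for high quality".split()
counts = count_contexts(tokens, window=2)
vocab = sorted(set(tokens))
print(vocab)
print(context_vector(counts, "good", vocab))
```

On a real corpus, each row would be about 300k entries wide, one per vocabulary word. A dimensionality-reduction step would then compress it to the 300-dimensional vectors shown on the slide.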

  4. Different Input Corpora? • Counting contexts: good → (…, ?, ?, ?, …) $\in \mathbb{R}^{\text{vocabulary size}}$ (vocabulary size ≈ 300k) • Reducing dimensionality: good → (…, ?, ?, ?, …) $\in \mathbb{R}^{300}$

  5. Wikipedia's neutral point of view policy: “An article must be written from a neutral point of view, which among other things means ‘representing fairly, proportionately, and, as far as possible, without editorial bias, all of the significant views that have been published by reliable sources on a topic.’”

  6. Amazon's guidelines state that “Amazon values diverse opinions” and that “content [customer reviews] you submit should be relevant and based on your own honest opinions and experience.”

  7. Subjectivity Scale • More objective ↔ more subjective • Objective Embeddings (OE) at the more objective end • Subjective Embeddings (SE) at the more subjective end

  8. Binary Classification Tasks • Sentiment Classification (positive vs. negative): Amazon Reviews (24 categories) and Rotten Tomatoes Reviews, e.g., “A very funny movie” vs. “One lousy movie” • Subjectivity Classification (subjective vs. objective): Rotten Tomatoes Reviews, e.g., “The story needs more dramatic meat” vs. “She's an artist” • Topic Classification (in-topic vs. out-of-topic): Newsgroups Dataset (6 categories)

  9. Methodology • Cross-validation on balanced samples • Binary logistic regression classifier • Sentence embedding = average of word embeddings • All embeddings trained on the same number of sentences with the same vocabulary
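The evaluation pipeline on slide 9 can be sketched in plain Python. Each sentence is represented by the average of its word vectors. A binary logistic regression is trained on those averages and scored with cross-validation. Several choices here are assumptions, not details from the slides: the 2-d toy vectors, skipping out-of-vocabulary words, batch gradient ascent as the training method, and stratified folds as the way to keep samples balanced. A real experiment would more likely use a library classifier.

```python
import math
import random

def sentence_embedding(tokens, vectors):
    """Average the vectors of the tokens that have one (out-of-vocabulary tokens are skipped)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token in the sentence has an embedding")
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(model, x):
    w, b = model
    return sum(wd * xd for wd, xd in zip(w, x)) + b

def train_logreg(X, y, lr=0.5, epochs=200):
    """Binary logistic regression fit by batch gradient ascent on the mean log-likelihood."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(score((w, b), xi))  # derivative of the log-likelihood w.r.t. the score
            gw = [g + err * xd for g, xd in zip(gw, xi)]
            gb += err
        w = [wd + lr * g / len(X) for wd, g in zip(w, gw)]
        b += lr * gb / len(X)
    return w, b

def stratified_folds(y, k, seed=0):
    """Deal each class round-robin into k folds so every fold keeps the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(members)
        for n, i in enumerate(members):
            folds[n % k].append(i)
    return folds

def cross_val_accuracy(X, y, k=5):
    """Mean held-out accuracy over k stratified folds."""
    accs = []
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = train_logreg([X[i] for i in train], [y[i] for i in train])
        # score >= 0 is the same decision as sigmoid(score) >= 0.5
        hits = sum((score(model, X[i]) >= 0) == (y[i] == 1) for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / len(accs)

# Hypothetical 2-d "embeddings"; the real ones are 300-d vectors trained on a corpus.
vectors = {"great": [1.0, 0.2], "fun": [0.8, 0.1], "lousy": [-1.0, 0.1],
           "boring": [-0.9, 0.3], "movie": [0.0, 1.0]}
positive = ["great movie", "fun movie", "great fun", "a great movie", "fun fun movie"]
negative = ["lousy movie", "boring movie", "lousy boring", "a lousy movie", "boring boring movie"]
X = [sentence_embedding(s.split(), vectors) for s in positive + negative]
y = [1] * len(positive) + [0] * len(negative)
print(cross_val_accuracy(X, y, k=5))
```

The classifier and folds stay fixed, so only the embeddings vary from one run to the next. Training every embedding on the same number of sentences and the same vocabulary, as the slide describes, keeps the comparison focused on corpus subjectivity.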

  10. Empirical Findings • SE and OE are very similar on “objective” tasks • Do SE understand sentiment words better than OE? • [Figure: classification accuracy (axis 70 to 90) of Objective Embeddings (OE) and Subjective Embeddings (SE) on subjectivity classification, topic classification, Amazon sentiment, and Rotten Tomatoes sentiment]

  11. Top Words Similar to “good” • Objective Embeddings: bad (0.68), decent (0.67), nice (0.62), poor (0.61), … • Subjective Embeddings: decent (0.78), great (0.76), nice (0.69), terrific (0.64), …
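A neighbor list like the one on slide 11 can be produced by ranking every other word by its similarity to the target. The sketch below assumes cosine similarity, the usual measure for word embeddings; the slides do not name the measure. The toy vectors are hypothetical and are not taken from the trained OE or SE models.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(word, vectors, k=4):
    """The k other words ranked by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors for illustration only.
toy = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.35],
    "camera": [0.0, 1.0, 0.0],
    "bad": [-0.7, 0.2, 0.3],
}
for other, sim in most_similar("good", toy, k=3):
    print(f"{other}\t{sim:.2f}")
```

Running the same ranking over the OE and SE vocabularies would give the two columns on the slide. The antonym scores on slide 12 would presumably come from single `cosine` calls on word pairs.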

  12. Sentiment Words Still Cause Trouble! • Similarity of opposite-sentiment word pairs in Subjective Embeddings: waste / save (0.51), love / hate (0.60), loves / hates (0.68), easy / difficult (0.56), …

  13. SentiVec Embeddings • Objective Word2Vec embeddings, most similar to “good”: bad (0.68), decent (0.67), nice (0.62), poor (0.61), … • Objective SentiVec embeddings, most similar to “good”: decent (0.79), nice (0.76), perfect (0.75), excellent (0.73), …

  14. SentiVec: Infusing Sentiment • SentiVec = Word2Vec + lexical resource • Lexical resource: Negative: waste, junk, horrible, defective, …; Positive: love, great, recommend, easy, … • Predicts context words as in Word2Vec Skip-gram • Predicts the word category

  15. Logistic SentiVec • Example sentence: “This camera is good for high quality …” • Word2Vec Skip-gram objective: observed pairs (good, camera), (good, is), (good, for), (good, high) vs. random noise (good, frog), (good, duck), … • Lexical objective of SentiVec (two classes): $P(\text{good is POSITIVE}) = \sigma(\mathbf{good} \cdot \phi)$ and $P(\text{good is NEGATIVE}) = 1 - P(\text{good is POSITIVE})$ • Maximize $P(\text{good is POSITIVE})$
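The two parts of Logistic SentiVec can be sketched as loss functions. The lexical term follows the slide: $P(w \text{ is POSITIVE}) = \sigma(\mathbf{w} \cdot \phi)$, and NEGATIVE gets the complement. Several details are assumptions and may differ from the authors' code. The observed-versus-noise contrast uses Skip-gram negative sampling with separate context vectors. The lexical term applies only to words found in the lexicon. The two losses are summed with equal weight. All vectors below are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def skipgram_nll(w, context, noise):
    """Negative-sampling loss: pull the observed context in, push the noise contexts away."""
    loss = -math.log(sigmoid(dot(w, context)))
    for n in noise:
        loss -= math.log(sigmoid(-dot(w, n)))
    return loss

def lexical_nll(w, phi, is_positive):
    """Two-class lexical loss: P(POSITIVE) = sigmoid(w . phi), P(NEGATIVE) = 1 - P(POSITIVE)."""
    p_pos = sigmoid(dot(w, phi))
    return -math.log(p_pos if is_positive else 1.0 - p_pos)

def lexical_step(w, phi, is_positive, lr=0.1):
    """One gradient-ascent step on log P(label | w), updating w and phi together.

    With x = w . phi and y in {0, 1}, d log P / dx = y - sigmoid(x), so the
    gradient w.r.t. w is (y - p) * phi and w.r.t. phi is (y - p) * w.
    """
    y = 1.0 if is_positive else 0.0
    err = y - sigmoid(dot(w, phi))
    new_w = [a + lr * err * b for a, b in zip(w, phi)]
    new_phi = [b + lr * err * a for a, b in zip(w, phi)]
    return new_w, new_phi

# Hypothetical 2-d vectors: "good", its observed context "camera", two noise contexts, and phi.
good, camera = [0.1, 0.2], [0.3, 0.1]
noise = [[-0.2, 0.1], [0.05, -0.3]]
phi = [0.3, -0.1]
# Assumption: equal-weight sum of the two objectives for a lexicon word.
total = skipgram_nll(good, camera, noise) + lexical_nll(good, phi, is_positive=True)
new_good, new_phi = lexical_step(good, phi, is_positive=True)
print(total, lexical_nll(good, phi, True), lexical_nll(new_good, new_phi, True))
```

After one step, the lexical loss for "good" falls. That step is the "maximize $P(\text{good is POSITIVE})$" direction from the slide.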

  16. Spherical SentiVec • [Figure: positive, negative, and neutral words shown with the category vectors $\phi_{\text{POSITIVE}}$, $\phi_{\text{NEGATIVE}}$, and $\phi_{\text{NEUTRAL}}$]

  17. Empirical Findings • SentiVec does not affect “objective” classification tasks • [Figure: sentiment classification accuracy (axis 75 to 90) of objective and subjective embeddings on Amazon sentiment (average over 24 categories) and Rotten Tomatoes sentiment; the bars are annotated 0.3%, 0.8%, 0%, and 0.2%]

  18. Changes in Similarity • [Figure: changes in the similarity of positive, negative, and neutral words to the target words “good” and “bad”]

  19. Conclusion • Explored the effects of corpus subjectivity on word embeddings • SentiVec, a method for infusing lexical information into word embeddings • The sentiment-infused SentiVec embedding space facilitates better sentiment-related similarity • Pre-trained word embeddings and code: https://sentivec.preferred.ai/
