  1. Text Mining Paper Presentation: Determining the Sentiment of Opinions. Authors: Soo-Min Kim, Eduard Hovy. Presenters: Karthik Chinnathambi (kc4bf), Prashant Bhanu Gorthi (pg3bh), Sofia Francis Xavier (sf4uh)

  2. Background
     Opinion: [Topic, Holder, Claim, Sentiment]
     ● Topic: the theme of the text
     ● Holder: the person or organization expressing the opinion
     ● Claim: the statement made about the topic
     ● Sentiment: positive, negative, or neutral

  3. Problem Addressed Given a Topic and a set of texts about the topic, find the Sentiments expressed about the Topic in each text, and identify the people who hold each sentiment.

  4. Algorithm
     Given a topic and a set of texts, the system operates in four steps:
     ● Select sentences containing both the topic phrase and holder candidates
     ● Delimit the regions of opinion based on the holder
     ● Calculate the polarity of all sentiment-bearing words individually using the sentence sentiment classifier
     ● Combine all the polarities to produce the holder's sentiment for the whole sentence
     (A minimal sketch of this pipeline follows below.)
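
The four steps above can be summarized in a minimal Python sketch. The polarity lexicon, the topic token, and the list of holder candidates are hypothetical inputs for illustration, not the authors' resources, and the sum-of-polarities combination stands in for the sentence-level models discussed later in the deck.

def classify_sentence(tokens, topic, holders, lexicon):
    """Return (holder, sentiment) for one sentence, or None if the sentence is skipped."""
    # Step 1: keep only sentences that contain both the topic and a holder candidate.
    if topic not in tokens or not any(h in tokens for h in holders):
        return None
    # If several holders occur, keep the one closest to the topic (see slide 12).
    holder = min((h for h in holders if h in tokens),
                 key=lambda h: abs(tokens.index(h) - tokens.index(topic)))
    # Step 2: delimit the opinion region (here: from the holder to the end of the sentence).
    region = tokens[tokens.index(holder):]
    # Step 3: look up the polarity of each sentiment-bearing word in the region.
    polarities = [lexicon[w] for w in region if w in lexicon]   # values of +1 or -1
    if not polarities:
        return holder, "neutral"
    # Step 4: combine the word polarities into the holder's sentence-level sentiment.
    score = sum(polarities)
    return holder, "positive" if score > 0 else ("negative" if score < 0 else "neutral")

print(classify_sentence(
    "Fenno says term limits really hurt democracy".lower().split(),
    topic="limits", holders=["fenno"],
    lexicon={"hurt": -1}))
# -> ('fenno', 'negative')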

  5. Architecture

  6. Sentiment Classifiers
     ● Word Sentiment Classifier
     ● Sentence Sentiment Classifier

  7. Construction of the sentiment seed list
     ● Sentiment-bearing words: adjectives, verbs, and nouns
     ● Seed lists: randomly selected verbs (23 positive and 21 negative) and adjectives (15 positive and 19 negative), with nouns added later
     ● For each seed word, extract its expansions from WordNet and add them back into the appropriate seed list
     ● The final lists contain 5880 positive adjectives, 6233 negative adjectives, 2840 positive verbs, and 3239 negative verbs
     Challenge: some words are both positive and negative!
     (A sketch of the WordNet expansion step follows below.)
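
A hedged sketch of the seed-list expansion step, using NLTK's WordNet interface as a stand-in for whatever WordNet access the authors used. The seed words below are illustrative, not the paper's actual seed lists.

from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def expand_seeds(seed_words, pos):
    """Add the WordNet synonyms of each seed word back into the seed list."""
    expanded = set(seed_words)
    for word in seed_words:
        for synset in wn.synsets(word, pos=pos):
            expanded.update(lemma.name() for lemma in synset.lemmas())
    return expanded

# Illustrative seeds only; the paper starts from 15 positive and 19 negative adjectives.
positive_adjectives = expand_seeds({"good", "excellent", "happy"}, wn.ADJ)
negative_adjectives = expand_seeds({"bad", "terrible", "awful"}, wn.ADJ)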

  8. Resolving sentiment-ambiguous words
     Given a new word, use WordNet to obtain the synonym set of the unseen word:
     argmax_c P(c | w) ≈ argmax_c P(c | syn_1, syn_2, ..., syn_n)
     ● c is a sentiment category (positive or negative)
     ● w is the unseen word
     ● syn_1, ..., syn_n are the WordNet synonyms of w
     (A small classification sketch follows below.)
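
A small sketch of the idea: the sentiment category whose word list overlaps most with the unseen word's WordNet synonyms wins. This replaces the probabilistic argmax above with simple overlap counts (effectively assuming uniform priors), so it is an approximation for illustration only. The positive_words and negative_words sets could be the expanded seed lists from the slide 7 sketch.

from nltk.corpus import wordnet as wn

def classify_unseen(word, positive_words, negative_words):
    """Assign a sentiment to an unseen word from its WordNet synonym set."""
    synonyms = {lemma.name() for synset in wn.synsets(word)
                for lemma in synset.lemmas()}
    pos_overlap = len(synonyms & positive_words)
    neg_overlap = len(synonyms & negative_words)
    if pos_overlap == neg_overlap:
        return "neutral"    # no evidence either way, or equally ambiguous
    return "positive" if pos_overlap > neg_overlap else "negative"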

  9. Word Sentiment Classifier
     Model 1:
     ● f_k: the k-th feature of sentiment class c, and a member of the synonym set of w
     ● count(f_k, synset(w)): the total number of occurrences of f_k in the synonym set of w
     Model 2:
     ● P(w | c): the probability of word w given a sentiment class c
     (A hedged reconstruction of the two model equations follows below.)
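
The model equations on this slide were shown as images in the original deck. The following is a hedged LaTeX reconstruction from the definitions above, assuming a naive-Bayes-style factorization over the synonym-set features for Model 1 and a direct class-conditional word probability for Model 2:

Model 1:
    P(c \mid w) \approx P(c \mid \mathrm{syn}_1, \ldots, \mathrm{syn}_n)
                \propto P(c) \prod_{k=1}^{m} P(f_k \mid c)^{\mathrm{count}(f_k,\, \mathrm{synset}(w))}

Model 2:
    P(c \mid w) \propto P(c)\, P(w \mid c)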

  10. Sample Output of the Word Sentiment Classifier

  11. Sentence Sentiment Classifier
     ● Holder Identification
     ● Sentiment Region
     ● Sentence Sentiment Classification Models

  12. Holder Identification
     ● BBN's named entity tagger IdentiFinder identifies potential holders of an opinion
     ● PERSON and ORGANIZATION entities are considered the possible opinion holders
     ● For sentences with more than one holder, choose the one closest to the topic
     (A hedged NER-based sketch follows below.)
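
BBN's IdentiFinder is not publicly available, so the sketch below uses spaCy's named entity recognizer as a stand-in; the model name and the character-distance tie-breaking are assumptions, not the authors' setup.

import spacy

nlp = spacy.load("en_core_web_sm")   # assumed model; any English NER model would do

def find_holder(sentence, topic):
    """Return the PERSON/ORG entity closest to the topic phrase, or None."""
    doc = nlp(sentence)
    candidates = [ent for ent in doc.ents if ent.label_ in ("PERSON", "ORG")]
    if not candidates:
        return None
    topic_pos = sentence.find(topic)   # assumes the topic phrase occurs literally
    return min(candidates, key=lambda ent: abs(ent.start_char - topic_pos)).text

print(find_holder("Prof. Fenno says term limits really hit at democracy", "term limits"))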

  13. Sentiment Region

  14. Sentence Sentiment Classification Models

  15. Experiments
     Two sets of experiments examine the performance of:
     ● different word-level classifier models
     ● different sentence-level classifier models
     The classification task is defined as assigning each word / sentence to one of:
     ● Positive
     ● Negative
     ● Neutral or N/A

  16. Experiments: Word Classification
     Training data
     ● Basic English word list for the TOEFL test
     ● Intersected with a list of 19748 adjectives and 8011 verbs
     Methodology
     ● Randomly select 462 adjectives and 502 verbs
     ● 3 humans (in pairs) classify the randomly selected words, providing a baseline for evaluating the models proposed in the paper
     ● Test the word-level classification using 2 models:
       ○ a model that randomly assigns a sentiment category to each word (averaged over 10 iterations; see the sketch below)
       ○ Model 1 from slide 9, a statistical model that takes into account both the polarity and the strength of the sentiment
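
A small sketch of the random baseline mentioned above: each word receives a uniformly random category and agreement with the human labels is averaged over 10 runs. The label set and the human labels here are made up for illustration.

import random

def random_baseline(human_labels, categories=("positive", "negative", "neutral"), runs=10):
    """Average agreement of random category assignment with the human labels."""
    total = 0.0
    for _ in range(runs):
        guesses = [random.choice(categories) for _ in human_labels]
        total += sum(g == h for g, h in zip(guesses, human_labels)) / len(human_labels)
    return total / runs

print(random_baseline(["positive", "negative", "neutral", "negative", "positive"]))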

  17. Experiments: Word Classification
     Testing the models
     ● The model is trained with the initial seed list of 23 positive and 21 negative verbs, 15 positive and 19 negative adjectives
     ● The effect of increasing the seed list to 251 verbs and 231 adjectives is also tested
     Evaluation
     ● Agreement measure (see the sketch below):
       ○ Strict agreement: agreement over all 3 categories
       ○ Lenient agreement: merge positive and neutral into one category; differentiate words with negative sentiment
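
A minimal sketch of the two agreement measures, following the slide's description: strict agreement compares all three categories, while lenient agreement merges positive and neutral into one class and keeps negative separate. The example labels are illustrative.

def agreement(labels_a, labels_b, lenient=False):
    """Fraction of items on which two annotations agree."""
    def norm(label):
        if lenient and label in ("positive", "neutral"):
            return "non-negative"   # merge positive and neutral
        return label
    matches = sum(norm(a) == norm(b) for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

human   = ["positive", "negative", "neutral", "negative"]
machine = ["neutral", "negative", "positive", "positive"]
print(agreement(human, machine))                # strict agreement  -> 0.25
print(agreement(human, machine, lenient=True))  # lenient agreement -> 0.75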

  18. Experiments: Word Classification
     Results
     ● Model 1 achieved lower agreement than the human annotators, but performed better than the random baseline
     ● The algorithm classified 93.07% of verbs and 83.27% of adjectives as carrying either positive or negative sentiment
     ● Increasing the seed list improved the agreement between human and machine classification

  19. Experiments: Sentence Classification
     Training data
     ● 100 sentences selected from the DUC 2001 corpus
     ● Topics include "illegal alien", "term limits", "gun control", and "NAFTA"
     ● Two humans annotated the sentences with an overall sentiment
     Testing and evaluation
     ● Experimented with combinations of:
       ○ 3 models for sentence classification
       ○ 4 different window definitions
       ○ 4 variations of word-level classifiers
     ● Tested the models with both manually annotated and automatically identified sentiment holders
     ● Evaluation metric: classification accuracy

  20. Experiments: Sentence Classification
     Observations
     ● Correctness is defined as matching both the holder and the sentiment
     ● Best model performance:
       ○ 81% accuracy with a manually annotated holder
       ○ 67% accuracy with automatic holder identification

  21. Experiment Results
     Best performance achieved using:
     ● Model 0 (sentence level), which considers only sentiment polarity
     ● Manually annotated topic and holder
     ● Window variation 4 (words from the topic/holder to the end of the sentence)
     Effect of sentiment categories:
     ● The presence of negative words matters more than sentiment strength
     ● Neutral sentiment words are classified as non-opinion-bearing words in most cases
     (A hedged sketch of this configuration follows below.)
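
A hedged sketch of the best configuration described above: the window runs from the topic/holder position to the end of the sentence (variation 4), and only the sign of each word's polarity is used. The sign-product combination is an assumption about Model 0's exact form, not the paper's stated formula.

def model0_sentiment(tokens, start_index, lexicon):
    """Combine the signs of word polarities inside the window into one label."""
    window = tokens[start_index:]                 # window variation 4
    signs = [1 if lexicon[w] > 0 else -1 for w in window if w in lexicon]
    if not signs:
        return "neutral"
    product = 1
    for sign in signs:
        product *= sign
    return "positive" if product > 0 else "negative"

print(model0_sentiment("she thinks term limits hurt democracy".split(), 2, {"hurt": -1}))
# -> 'negative'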

  22. Experiment Results

  23. Problems with the methodology
     ● Some words have both strong positive and strong negative sentiment: ambiguity
     ● A unigram model is not sufficient
       ○ E.g., 'Term limits really hit at democracy,' says Prof. Fenno
     ● The holder in a sentence can express multiple opinions
     ● The models cannot infer sentiment from facts
       ○ E.g., She thinks term limits will give women more opportunities in politics
     ● Detecting the holder of the sentiment is challenging when multiple holders are detected in the sentence

  24. Conclusion
     Future work identified by the authors:
     ● Sentences with weak opinion-bearing words
     ● Sentences with multiple opinions about a topic
     ● Improved sentence parsers to reliably detect the holder for a sentiment region
     ● Exploring other learning techniques such as SVMs and decision lists

  25. Q & A?

  26. Thank you!
