Text Mining Paper Presentation: Determining the Sentiment of Opinions



SLIDE 1

Text Mining Paper Presentation: Determining the Sentiment of Opinions

Soo-Min Kim Eduard Hovy Presenters: Karthik Chinnathambi (kc4bf) Prashant Bhanu Gorthi (pg3bh) Sofia Francis Xavier (sf4uh)

SLIDE 2

Background

  • Opinion: [Topic, Holder, Claim, Sentiment]
  • Topic: theme of the text
  • Holder: the person or organization expressing the opinion
  • Claim: the statement made about the topic
  • Sentiment: Positive, Negative, or Neutral
SLIDE 3

Problem Addressed

Given a Topic and a set of texts about the topic, find the Sentiments expressed about the Topic in each text, and identify the people who hold each sentiment.

SLIDE 4

Algorithm

Given a topic and a set of texts, the system operates in four steps.

  • Select sentences that contain both the topic phrase and holder candidates
  • Delimit the region of opinion around each holder
  • Calculate the polarity of all sentiment-bearing words individually using the word sentiment classifier
  • Combine the word polarities to produce the holder's sentiment for the whole sentence
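The four steps could be sketched as a small pipeline like the one below. This is an illustrative assumption, not the authors' implementation: the function names, the naive period-based sentence splitting, the "everything after the holder" region, and the sum-of-polarities combination are all placeholders for the system's actual components.

```python
# Illustrative sketch of the four-step pipeline (all names and
# simplifications here are assumptions, not the paper's code).

def find_sentiment(topic, texts, word_polarity, find_holders):
    """For each text, return (holder, sentiment) pairs for the given topic."""
    results = []
    for text in texts:
        for sentence in text.split("."):           # naive sentence splitting
            if topic not in sentence:
                continue
            holders = find_holders(sentence)       # step 1: topic + holder candidates
            if not holders:
                continue
            holder = holders[0]
            region = opinion_region(sentence, holder)   # step 2: delimit opinion region
            polarities = [word_polarity(w)              # step 3: per-word polarity
                          for w in region.split()
                          if word_polarity(w) != 0]
            score = sum(polarities)                     # step 4: combine polarities
            sentiment = ("positive" if score > 0
                         else "negative" if score < 0
                         else "neutral")
            results.append((holder, sentiment))
    return results

def opinion_region(sentence, holder):
    # Simplest possible region: everything after the holder mention.
    idx = sentence.find(holder)
    return sentence[idx + len(holder):] if idx >= 0 else sentence
```

In the real system the holder finder is a named-entity tagger and the region and combination strategies are the model variants discussed on the later slides.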
SLIDE 5

Architecture

SLIDE 6

Sentiment Classifiers

  • Word Sentiment Classifier
  • Sentence Sentiment Classifier
SLIDE 7

Construction of the Sentiment Seed List

  • Sentiment-bearing words: adjectives, verbs, and nouns
  • Seed lists: randomly selected verbs (23 positive and 21 negative) and adjectives (15 positive and 19 negative); nouns were added later
  • For each seed word, extract its expansions from WordNet and add them back into the appropriate seed list
  • Final lists: 5880 positive adjectives, 6233 negative adjectives, 2840 positive verbs, and 3239 negative verbs
  • Challenge: some words are both positive and negative!
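The expansion loop might look like the sketch below. Here `synonyms` stands in for a WordNet lookup (e.g. NLTK's `wordnet.synsets` in practice), and the bounded, round-by-round expansion is an assumption about how "add them back into the seed lists" iterates; the paper does not spell out the stopping rule.

```python
def expand_seed_list(seeds, synonyms, max_rounds=3):
    """Grow a seed list by repeatedly adding synonyms of its members.

    `synonyms(word)` should return an iterable of synonym strings
    (a stand-in for a real WordNet lookup).
    """
    expanded = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        new_words = set()
        for word in frontier:
            new_words.update(synonyms(word))
        new_words -= expanded          # keep only genuinely new words
        if not new_words:
            break                      # reached a fixed point
        expanded |= new_words
        frontier = new_words           # only expand the newly added words
    return expanded
```

The ambiguity challenge on this slide follows directly: a word reachable from both a positive and a negative seed would end up in both lists, which is what the classifier on the next slides has to resolve.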

SLIDE 8

Resolving sentiment ambiguous words

Given a new word, use WordNet to obtain the synonym set of the unseen word:

argmax_c P(c | w) ≈ argmax_c P(c | syn_1, syn_2, ..., syn_n)

  • c is a sentiment category (positive or negative)
  • w is the unseen word
  • syn_1 ... syn_n are the WordNet synonyms of w
SLIDE 9

Word Sentiment Classifier

Model 1 and Model 2 (the equations appear as figures on the slide):

  • f_k: the k-th feature of sentiment class c, and a member of the synonym set of w
  • count(f_k, synset(w)): total number of occurrences of f_k in the synonym set of w
  • P(w|c): probability of word w given a sentiment class c
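A Model-1-style scorer could look like the following sketch: score each class by its prior times the probability of each synset member under that class, in log space. The add-one smoothing, the data structures, and the function signature are assumptions for illustration, not the paper's implementation.

```python
import math

def classify_word(word, synset, class_counts, vocab, prior):
    """Model-1-style scoring: argmax_c P(c) * prod_k P(f_k|c)^count.

    class_counts: {class: {feature_word: count}} from the seed lists.
    synset: list of WordNet synonyms of `word` (repeats encode counts).
    """
    best_class, best_score = None, float("-inf")
    for c, counts in class_counts.items():
        total = sum(counts.values())
        score = math.log(prior[c])
        for f in synset:
            # Add-one smoothing over the vocabulary (an assumption).
            p = (counts.get(f, 0) + 1) / (total + len(vocab))
            score += math.log(p)   # repeated synonyms add their count's worth
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Working in log space avoids underflow when the synonym set is large, which matters given the thousands of expanded seed words on the previous slide.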

SLIDE 10

Sample Output of the Word Sentiment Classifier

SLIDE 11

Sentence Sentiment Classifier

  • Holder Identification
  • Sentiment Region
  • Sentence Sentiment Classification Models
SLIDE 12

Holder Identification

  • BBN’s named entity tagger IdentiFinder identifies potential holders of an opinion
  • PERSON and ORGANIZATION entities are considered possible opinion holders
  • For sentences with more than one holder, choose the one closest to the topic
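Choosing the candidate nearest the topic might be sketched as below; token distance is an assumed proximity measure (the slide does not say how "closest" is computed), and the helper name is hypothetical.

```python
def closest_holder(sentence, topic, holders):
    """Among candidate holders, return the one nearest the topic mention
    (distance measured in tokens, an assumption for illustration)."""
    tokens = sentence.split()
    topic_pos = next(i for i, t in enumerate(tokens) if topic in t)

    def distance(holder):
        positions = [i for i, t in enumerate(tokens) if holder in t]
        return min(abs(i - topic_pos) for i in positions)

    return min(holders, key=distance)
```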
SLIDE 13

Sentiment Region

SLIDE 14

Sentence Sentiment Classification Models

SLIDE 15

Experiments

Two sets of experiments to examine the performance of:

  • Different word level classifier models
  • Different sentence level classifier models

The classification task is defined as labeling each word or sentence as:

  • Positive
  • Negative
  • Neutral or N/A
SLIDE 16

Experiments: Word Classification

Training Data

  • Basic English word list for the TOEFL test
  • Intersected with a list of 19,748 adjectives and 8,011 verbs

Methodology

  • Randomly select 462 adjectives and 502 verbs
  • 3 humans (in pairs) classify the list of randomly selected words
    ○ Serves as the baseline for evaluating the models proposed in the paper
  • Test word-level classification using 2 models:
    ○ A model that randomly assigns a sentiment category to each word (averaged over 10 iterations)
    ○ Model 1 proposed in slide 9: a statistical model that takes into account both polarity and strength of the sentiment

SLIDE 17

Experiments: Word Classification

Testing the models

  • Model trained with the initial seed list of 23 positive and 21 negative verbs, 15 positive and 19 negative adjectives
  • Tested the effect of increasing the seed list to 251 verbs and 231 adjectives

Evaluation

  • Agreement measure
    ○ Strict agreement: agree over all 3 categories
    ○ Lenient agreement: merge positive and neutral into one category, differentiating words with negative sentiment
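The two agreement measures could be computed as in this sketch. The lenient merging follows the bullet above (positive and neutral collapsed into one category); treating agreement as a simple match fraction is an assumption, since the slide does not name the exact statistic.

```python
def agreement(labels_a, labels_b, lenient=False):
    """Fraction of items on which two annotators (or a model and an
    annotator) assign the same sentiment label.

    Strict:  positive / negative / neutral are all distinct.
    Lenient: positive and neutral are merged, isolating negative.
    """
    def norm(label):
        if lenient and label in ("positive", "neutral"):
            return "non-negative"
        return label

    matches = sum(norm(a) == norm(b) for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)
```

Lenient agreement can only be higher than or equal to strict agreement, since every strict match is also a lenient match.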

SLIDE 18

Experiments: Word Classification

Results

  • Model 1 achieved lower agreement than the human annotators, but performed better than the random baseline.
  • The algorithm was able to classify 93.07% of verbs and 83.27% of adjectives as carrying either positive or negative sentiment.
  • Increasing the seed list improved agreement between human and machine classification.
SLIDE 19

Experiments: Sentence Classification

Training Data

  • 100 sentences selected from DUC 2001 corpus
  • Topics include “illegal alien”, “term limits”, “gun control” and “NAFTA”
  • Two humans annotated the sentences with overall sentiment

Testing and Evaluation

  • Experimented with combinations of:
    ○ 3 models for sentence classification
    ○ 4 different window definitions
    ○ 4 variations of word-level classifiers
  • Tested the models using both manually annotated and automatically identified sentiment holders
  • Evaluation metric: classification accuracy
SLIDE 20

Experiments: Sentence Classification

Observations

  • Correctness defined as matching both the holder and the sentiment
  • Best model performance:
    ○ 81% accuracy with a manually annotated holder
    ○ 67% accuracy with automatic holder identification

SLIDE 21

Experiment Results

Best performance achieved using:

  • Model 0 (sentence level), considering only sentiment polarity
  • Manually annotated topic and holder
  • Variation 4 for the sentiment window (words from the topic/holder to the end of the sentence)

Effect of sentiment categories:

  • The presence of negative words matters more than sentiment strength
  • Neutral sentiment words were classified as non-opinion-bearing in most cases
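A polarity-only sentence classifier over a topic/holder-to-end-of-sentence window might look like the sketch below. The sign-product combination is an assumption chosen to be consistent with "only sentiment polarity" and with the observation that a single negative word matters more than strength; the slide does not show Model 0's actual formula.

```python
def model0_sentence_sentiment(tokens, start, word_sign):
    """Polarity-only sentence sentiment over a window.

    tokens:    tokenized sentence.
    start:     index of the topic/holder; the window runs from there to
               the end of the sentence (window variation 4).
    word_sign: maps a token to +1, -1, or 0 (non-sentiment-bearing).
    The sign-product combination is an illustrative assumption.
    """
    window = tokens[start:]
    signs = [word_sign(t) for t in window if word_sign(t) != 0]
    if not signs:
        return "neutral"        # no opinion-bearing words in the window
    product = 1
    for s in signs:
        product *= s            # any odd number of negatives flips the result
    return "positive" if product > 0 else "negative"
```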
SLIDE 22

Experiment Results

SLIDE 23

Problems with the Methodology

  • Some words carry both strong positive and strong negative sentiment: ambiguity
  • The unigram model is not sufficient
    ○ E.g., “‘Term limits really hit at democracy,’ says Prof. Fenno”
  • A holder in the sentence can express multiple opinions
  • The models cannot infer sentiment from facts
    ○ E.g., “She thinks term limits will give women more opportunities in politics”
  • Detecting the holder of the sentiment is challenging when multiple holders are detected in the sentence

SLIDE 24

Conclusion

Future work identified by the authors

  • Sentences with weak-opinion-bearing words
  • Sentences with multiple opinions about a topic
  • Improved sentence parsers to reliably detect holder for a sentiment region
  • Explore other learning techniques such as SVMs and decision lists
SLIDE 25

Q & A?

SLIDE 26

Thank you!