Opinion Mining in GATE
Horacio Saggion & Adam Funk
- Opinion mining is interested in the opinion a particular piece of discourse expresses
– Opinions are subjective statements reflecting people’s sentiments or perceptions of entities or events
- There are various problems associated with opinion mining
– Identify whether a piece of text is opinionated or not (factual news vs. editorial)
– Identify the entity expressing the opinion
– Identify the polarity and degree of the opinion (in favour vs. against)
– Identify the theme of the opinion (opinion about what?)
- Extract factual data with information extraction from the company Web site
- Extract opinions using opinion mining from Web fora
- Combine information extraction from the company Web site with OM findings
– Given a review, find the company’s web pages and extract factual information from them, including products and services
– Associate the opinion with the found information
- Use information extraction to identify positive/negative phrases and the “object” of the opinion
– Positive: correctly packed bulb, a totally free service, a very efficient management…
– Negative: the same disappointing experience, unscrupulous double glazing sales, do not buy a sofa from DFS Poole or DFS anywhere, the utter inefficiency…
- [Slide figure: example review texts annotated with “sentiment” and “opinion” labels: positive opinions, negative opinions, and a negative opinion that is less evident]
- Because we have access to documents which already have an associated class, we see OM as a classification problem
– we consider our data “opinionated”
- We are interested in:
– differentiating between positive and negative opinions
- “customer service is diabolical”
- “I have always been impressed with this company”
– recognising fine grained evaluative texts (1-star to 5-star classification)
- “one of the easiest companies to order with” (5-stars)
- “STAY AWAY FROM THIS SUPPLIER!!!” (1-star)
- We use a supervised learning approach (Support Vector Machines) that uses
linguistic features; the system decides which features are most valuable for classification
- We use precision, recall, and F-score to assess classification accuracy
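- As an illustrative sketch only (not the evaluation code used here), these metrics can be computed from gold and predicted class labels, e.g. with scikit-learn; the labels below are invented:

```python
from sklearn.metrics import precision_recall_fscore_support

# Invented gold-standard and predicted thumbs-up/down labels
gold = ["up", "down", "down", "up", "down", "down"]
pred = ["up", "down", "up", "up", "down", "down"]

# Macro-averaged precision, recall and F-score over the two classes
p, r, f, _ = precision_recall_fscore_support(gold, pred, average="macro")
print(f"P={p:.2f}  R={r:.2f}  F={f:.2f}")
```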
- We have a customisable crawling process to collect all texts from Web fora
- 92 texts from a Web consumer forum
– Each text contains a review about a particular company/service/product and a thumbs up/down
– Texts are short (one or two paragraphs)
– 67% negative and 33% positive
- 600 texts from another Web forum containing reviews on companies or products
– Each text is short and is associated with a 1- to 5-star review
– * ~ 8%; ** ~ 2%; *** ~ 3%; **** ~ 20%; ***** ~ 67%
- Each document is analysed to separate the commentary/review from the rest of the document and to associate a class with each review
- After this, the documents are processed with GATE processing resources:
– tokenisation; sentence identification; part-of-speech tagging; morphological analysis; named entity recognition; and sentence parsing
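- GATE runs these steps as Java processing resources; purely as an analogous sketch in Python (spaCy standing in for GATE’s PRs; assumes the en_core_web_sm model is installed):

```python
import spacy

# Tokenisation, sentence splitting, POS tagging, lemmatisation
# (morphological analysis) and named entity recognition in one pipeline
nlp = spacy.load("en_core_web_sm")
doc = nlp("Do not buy a sofa from DFS Poole. The service was diabolical.")

for sent in doc.sents:                                  # sentences
    for tok in sent:                                    # tokens
        print(tok.text, tok.lemma_, tok.pos_, tok.ent_type_ or "-")
```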
- Support Vector Machines (SVMs) are effective algorithms for classification and have also been used in information extraction
- Learning in SVM is treated as a binary classification problem; a multiclass problem is transformed into a set of n binary classification problems
- Given a set of training examples, each is represented as a vector in a space of features, and SVM tries to find a hyperplane which separates positive from negative instances
- Given a new instance, SVM identifies on which side of the hyperplane the instance lies and produces the classification accordingly
- The distance from the hyperplane to the closest positive and negative instances is the margin; we use the SVM with uneven margins available in GATE
- In order to use SVMs, we need to specify how instances are represented and decide on a number of parameters, usually adjusted experimentally over training data
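- The uneven-margins SVM itself ships with GATE; as a hedged sketch of the general set-up in Python (scikit-learn’s LinearSVC, with class weighting as a rough stand-in for the uneven-margins bias; the training texts are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny invented training set: review texts with thumbs-up/down classes
texts = ["customer service is diabolical",
         "I have always been impressed with this company",
         "one of the easiest companies to order with",
         "do not buy a sofa from this supplier"]
labels = ["down", "up", "up", "down"]

# Bag-of-words vectors + linear SVM; class_weight shifts the decision
# boundary towards the rarer class, loosely analogous to uneven margins
clf = make_pipeline(CountVectorizer(), LinearSVC(class_weight="balanced"))
clf.fit(texts, labels)
print(clf.predict(["a very efficient company"]))
```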
- We decided to start investigating a very simple approach: a word-based or bag-of-words approach (which usually works very well in text classification), using:
– the original word
– the root or lemma of the word (for “running” we use “run”)
– the part-of-speech category of the word (determiner, noun, verb, etc.)
– the orthography of the word (all uppercase, lowercase, etc.)
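- GATE exposes these as features on its Token annotations; as an illustrative stand-in, a minimal NLTK sketch (assumes the punkt and POS-tagger models and WordNet are downloaded; the default WordNet lemmatiser only roughly approximates GATE’s morphological analyser):

```python
import nltk
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def orthography(tok):
    # Simplified orthography classes (all caps / initial cap / lowercase)
    if tok.isupper():
        return "allCaps"
    if tok[:1].isupper():
        return "upperInitial"
    return "lowercase"

def token_features(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return [{"string": w,
             "root": lemmatizer.lemmatize(w.lower()),  # crude lemma
             "category": pos,
             "orth": orthography(w)}
            for w, pos in tagged]

print(token_features("STAY AWAY from this supplier"))
```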
- Each sentence/text is represented as a vector of features and values
– we carried out different combinations of features (different n-grams)
– 10-fold cross-validation experiments were run over the corpus with binary classifications (up/down)
– the combination of root and orthography (unigram) provides the best classifier
- around 80% F-score
– use of higher n-grams decreases the performance of the classifier
– use of more features does not necessarily improve performance
– an uninformed classifier would have 67% accuracy (always predicting the majority class)
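- A sketch of the 10-fold set-up, again with scikit-learn standing in for GATE’s learning framework (load_reviews is a hypothetical loader for the thumbs-up/down corpus):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts, labels = load_reviews()  # hypothetical corpus loader

clf = make_pipeline(CountVectorizer(), LinearSVC(class_weight="balanced"))
# 10-fold cross-validation, scored with a macro-averaged F-score
scores = cross_val_score(clf, texts, labels, cv=10, scoring="f1_macro")
print(scores.mean())
```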
- The same learning system was used to produce the 5-star classification over the fine-grained dataset
- Same feature combinations were studied:
– 74% overall classification accuracy using word root only
– other combinations degrade performance
– 1* classification accuracy = 80%; 5* classification accuracy = 75%
– 2* = 2%; 3* = 3%; 4* = 19%
– 2*, 3*, 4* are difficult to classify because they either share vocabulary with the extreme cases or are vague
- word-based binary classification
– thumbs-down: !, not, that, will, …
– thumbs-up: excellent, good, www, com, site, …
- word-based fine-grained classification
– 1*: worst, not, cancelled, avoid, …
– 2*: shirt, ball, waited, …
– 3*: another, didn’t, improve, fine, wrong, …
– 4*: ok, test, wasn’t, but, however, …
– 5*: very, excellent, future, experience, always, great, …
- Engineered features based on “linguistic” and sentiment information associated with words
- Linguistic features
– word-based features are restricted to adjectives and adverbs and their bigram combinations
– “good”, “bad”, “rather”, “quite”, “not”, etc.
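- One plausible reading of this restriction, sketched with NLTK’s Penn Treebank tags (JJ* for adjectives, RB* for adverbs), pairing consecutive kept words into bigrams:

```python
import nltk

def adj_adv_features(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    # Keep only adjectives (JJ, JJR, JJS) and adverbs (RB, RBR, RBS)
    kept = [w.lower() for w, t in tagged if t.startswith(("JJ", "RB"))]
    bigrams = ["%s_%s" % (a, b) for a, b in zip(kept, kept[1:])]
    return kept + bigrams

print(adj_adv_features("The service was not very good"))
# ['not', 'very', 'good', 'not_very', 'very_good']
```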
- Sentiment information
– WordNet lexical database where words appear with their senses and synonyms
- chair = the furniture
- chair, professorship = the position
- chair, president, chairman, … = the officer
- chair, electric chair, … = execution instrument
– SentiWordNet adds sentiment information to WordNet and has been used in opinion mining and sentiment analysis
- SentiWordNet (cont.)
– each word sense has three numerical scores (between 0 and 1): obj, pos, and neg, with obj + pos + neg = 1
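- For illustration, the scores can be inspected through NLTK’s SentiWordNet reader (assumes the sentiwordnet and wordnet corpora are downloaded):

```python
from nltk.corpus import sentiwordnet as swn

# Each sense of "good" carries pos/neg/obj scores that sum to 1
for entry in swn.senti_synsets("good"):
    print(entry.synset.name(),
          entry.pos_score(), entry.neg_score(), entry.obj_score())
```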
- Features deduced from SentiWordNet
– word analysis:
- countP(w): the word positivity score (the number of senses of w for which pos(w) > neg(w))
- countN(w): the word negativity score (the number of senses of w for which neg(w) > pos(w))
- countF(w): the number of entries (senses) of w in SentiWordNet
– sentence analysis
- sentiP: number of positive words in sentence
– a word is positive if countP(w)>½countF(w)
- sentiN: number of negative words in sentence
– a word is negative if countN(w)>½countF(w)
- senti: pos (if sentiP > sentiN), neg (if sentiN > sentiP), neutral otherwise (sentence feature)
– text analysis:
- count_pos: number of pos sentences in text
- count_neg: number of neg sentences in text
- count_neutral: number of neutral sentences in text
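- A minimal sketch of these derived features, building on NLTK’s SentiWordNet reader (sentences are assumed to arrive as token lists from the preprocessing stage):

```python
from nltk.corpus import sentiwordnet as swn

def word_counts(w):
    senses = list(swn.senti_synsets(w))
    countF = len(senses)
    countP = sum(1 for s in senses if s.pos_score() > s.neg_score())
    countN = sum(1 for s in senses if s.neg_score() > s.pos_score())
    return countP, countN, countF

def sentence_senti(tokens):
    sentiP = sentiN = 0
    for w in tokens:
        countP, countN, countF = word_counts(w.lower())
        if countP > countF / 2:        # word counts as positive
            sentiP += 1
        elif countN > countF / 2:      # word counts as negative
            sentiN += 1
    if sentiP > sentiN:
        return "pos"
    if sentiN > sentiP:
        return "neg"
    return "neutral"

def text_features(sentences):
    labels = [sentence_senti(s) for s in sentences]
    return {"count_pos": labels.count("pos"),
            "count_neg": labels.count("neg"),
            "count_neutral": labels.count("neutral")}
```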
- Each text is represented as a vector of features and values
– combining the linguistic features (adjectives, adverbs, and their combinations) and the senti, count_pos, count_neg, count_neutral features
– 10-fold cross-validation experiments were run over the corpus with binary classifications (up/down)
- overall F-score 76%
– 10-fold cross validation over the fine-grained corpus
- overall F-score 72%
- 1* = 58%, 2* = 24%, 3* = 20%, 4* = 19%, 5* = 83% (these features do a better job on the less extreme categories)
- sentiment-based binary classification
– thumbs-down: 8 neutral, never, 1 neutral, negative sentiment (senti feature), very late
– thumbs-up: 1 negative, 0 negative, good, original, 0 neutral, fast
- sentiment-based fine-grained classification
– 1*: still not, cancelled, incorrect, …
– 2*: 9 neutral, disappointing, fine, down, …
– 3*: likely, expensive, wrong, not able, …
– 4*: competitive, positive, ok, …
– 5*: happily, always, 0 negative, so simple, very positive, …
- "
" " "
- Hatzivassiloglou&McKeown’97 note that conjunctions (and, or,
but,…) help in classifying the semantic orientation of adjectives (excellent and useful; good but expensive;…); not used in classification experiments
- Riloff&al’03 create a list of subjective words by bootstrapping an
- Riloff&al’03 create a list of subjective words by bootstrapping an
initial set of 20 subjective words over a corpus; using the induced list and other features achieves 76% classification accuracy (objective vs subjective distinction)
- Turney ’02 uses pair-wise mutual information to detect the polarity of words (mutual information with respect to “excellent” and “poor”); using the list in a classifier he achieves 74% classification accuracy
- Devitt&Ahmad’07 use SentiWordNet for detecting the polarity of a
piece of news (7-point scale) achieving 55% accuracy
- The corpus for the exercises consists of 11 documents (already
preprocessed and saved as GATE XML), which contain 81 reviews. (The original corpus contained 600 documents and 7300 reviews.)
- Each review is marked with a comment annotation that has a rating
feature with a value from 1_Star_Review to 5_Star_Review (these
are the 5 classes for ML). These annotations result from preprocessing the HTML mark-up.
- The machine-learning task is to use linguistic features from ANNIE
to classify each comment with the appropriate rating.
- Create an empty corpus and populate it with the training data files;
create another one with the test data file.
- Load ANNIE and modify the Document Reset PR so it will not delete
the “Key” AnnotationSet.
- Load Tools and create an Annotation Set Transfer PR to copy all the “comment” annotations from “Key” to the default AS. Put this PR in ANNIE just after the Document Reset. Run the modified ANNIE over the training corpus.
- Create a JAPE PR from the copy_comment_without_rating
- grammar. In ANNIE, substitute it for the AS Transfer PR, with Key
as the inputAS. Run this pipeline over the test corpus.
- Load the “learning” plug-in and create a Batch Learning PR from the
sample config.xml file. Create a pipeline for this PR.
- In the training corpus, each document's default AS should contain
comment annotations with the rating feature. In the test corpus, each one should contain comment annotations without this feature.
- Run the learning pipeline in “TRAINING” mode over the training
corpus, then in “APPLICATION” mode over the test corpus.
#$%&' #$%&' #$%&' #$%&'
corpus, then in “APPLICATION” mode over the test corpus.
- Examine the default AS in the test corpus now.
- The ML configuration file in Exercise 1 uses unigrams of
Token.string.
- Modify the configuration to use a different feature, such as
Token.root. (Edit and save the config.xml file, then re-initialize the learning PR to reload the configuration.)
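- Hedged sketch: in the Batch Learning PR’s configuration format, the unigram feature is declared by an NGRAM element inside DATASET, so switching from Token.string to Token.root amounts to changing the FEATURE value. This excerpt follows the format documented for GATE’s learning plugin, but check it against your sample config.xml:

```xml
<NGRAM>
  <NAME>ngram</NAME>
  <NUMBER>1</NUMBER>          <!-- unigram; 2 would give bigrams -->
  <CONSNUM>1</CONSNUM>
  <CONS-1>
    <TYPE>Token</TYPE>
    <FEATURE>root</FEATURE>   <!-- was: string -->
  </CONS-1>
</NGRAM>
```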
- Modify the configuration to use more than one feature.
- Modify the configuration to use bigrams of a Token feature.
- Modify the classification probabilities and margins in the
configuration file, and observe their effects on the results.
- Create an empty corpus and populate it from the “full” set of files.
- Modify ANNIE as in the first part of Exercise 1 (so the Document
Reset PR does not delete the Key AS, and the AS Transfer copies the comment annotations from Key to default AS).
- Run the modified ANNIE pipeline, then the learning pipeline in
“EVALUATION” mode. This carries out 5-fold cross-validation over the corpus and produces an averaged set of results.
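- The number of folds is set in config.xml; in the learning plugin’s documented format this is an EVALUATION element, e.g. `<EVALUATION method="kfold" runs="5"/>` (an assumption to verify against the sample file).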
- Examine the document annotations in the full corpus.