SEMI-SUPERVISED STANCE DETECTION IN TWEETS BASED ON SENTIMENT RULES
Marcelo Dias and Karin Becker Instituto de Informática – UFRGS – Porto Alegre - Brazil marcelo.dias@inf.ufrgs.br and karin.becker@inf.ufrgs.br
Introduction
Opinion Analysis
Detect sentiment polarity (negative or positive) T
Stance Detection
Detect Stance (against or favor) T
In favor stance can be expressed through
Related Work
Structured text or discussion threads (congress vote,
wider textual context to interpret content [Thomas et al. 2006] [Anand et al. 2011] [Somasundaran and Wiebe 2009]
Tweets: short text and poorly written content
rely more on inferences from static/dynamic properties of the platform [Rajadesingan and Liu 2014]
Less focus on properties extracted from textual contents only
Most works adopt supervised methods
Often address a binary problem (Favor/Against)
Stance Detection based only on tweets
Rule-based, semi-supervised method
3-class problem (Favor, Against and None)
Improvements on our early work
Third place in SemEval 2016 T
Evaluate generality using several distinct
SemEval 2016 T
Key n-grams: terms/phrases that denote a stance
Target n-grams: combined with polarity to denote a stance
May be Favor or Against
Main target: Hillary Clinton
N-GRAMS   FAVOR                          AGAINST
KEY       ReadyForHillary, Hillary2016   StopHillary, MakeAmericaGreatAgain
TARGET    Hillary, Democrats             Trump, Republicans
Input: domain corpus
Current selection
N-Gram frequency ranking
Manual selection of top frequent n-grams
Output: selected Key and
Currently evaluating automatic
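The frequency ranking step can be illustrated with a minimal sketch. It assumes whitespace tokenization and lowercasing, neither of which is stated in the slides. The function name `rank_ngrams` and its parameters are hypothetical, and the manual selection that follows the ranking is not modelled.

```python
from collections import Counter


def rank_ngrams(tweets, n_max=2, top_k=10):
    """Rank word n-grams (sizes 1..n_max) by their frequency in the corpus."""
    counts = Counter()
    for tweet in tweets:
        # Assumption: naive whitespace tokenization, case-insensitive.
        tokens = tweet.lower().split()
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    # Most frequent first; a person would then pick Key/Target n-grams by hand.
    return counts.most_common(top_k)


# Toy domain corpus (invented for illustration only).
corpus = [
    "#StopHillary now",
    "I am #ReadyForHillary",
    "#StopHillary please",
]
ranking = rank_ngrams(corpus, n_max=1, top_k=3)
```

The ranking only proposes candidates; deciding whether a frequent n-gram is a Key or a Target n-gram, and for which stance, remains a manual step.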
FEATURES
Presence of at least one Favor/Against Key N-grams
Presence of at least one Favor/Against Target N-grams
Presence of at least one hashtag
Tweet Polarity
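A minimal sketch of how these features could be computed for a single tweet. It assumes unigram matching on whitespace tokens and takes the tweet polarity as an input from an external sentiment tool, since the slides do not name one. The function `extract_features` and the dictionary keys are hypothetical names.

```python
import string


def extract_features(tweet, key_favor, key_against,
                     target_favor, target_against, polarity):
    """Build the four feature groups for one tweet (unigram matching only)."""
    raw_tokens = tweet.split()
    # Normalize tokens: drop surrounding punctuation (including '#'), lowercase.
    tokens = {t.strip(string.punctuation).lower() for t in raw_tokens}

    def has_any(ngrams):
        return any(g.lower() in tokens for g in ngrams)

    return {
        "key_favor": has_any(key_favor),
        "key_against": has_any(key_against),
        "target_favor": has_any(target_favor),
        "target_against": has_any(target_against),
        "has_hashtag": any(t.startswith("#") for t in raw_tokens),
        "polarity": polarity,  # assumed to come from an external sentiment tool
    }


# N-gram lists taken from the Hillary Clinton example in the slides.
features = extract_features(
    "Vote #ReadyForHillary and the Democrats!",
    key_favor=["ReadyForHillary", "Hillary2016"],
    key_against=["StopHillary", "MakeAmericaGreatAgain"],
    target_favor=["Hillary", "Democrats"],
    target_against=["Trump", "Republicans"],
    polarity="positive",
)
```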
Input: selected n-grams
Tweet features extraction
Tweet polarity detection
Rules application
Output: Filtered labeled
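The labeling pipeline can be sketched as follows. The rules in this example are purely illustrative and are NOT the seven rules of the method, which this text does not spell out. They only reflect the general idea that Key n-grams denote a stance on their own while Target n-grams need polarity. The function name `apply_rules` and the feature keys are hypothetical.

```python
def apply_rules(features):
    """Return 'FAVOR', 'AGAINST', or None when no illustrative rule fires."""
    key_favor = features["key_favor"]
    key_against = features["key_against"]
    target_favor = features["target_favor"]
    target_against = features["target_against"]
    polarity = features["polarity"]

    # Key n-grams denote a stance on their own; conflicting key evidence
    # falls through to the target rules below.
    if key_favor and not key_against:
        return "FAVOR"
    if key_against and not key_favor:
        return "AGAINST"

    # Target n-grams only denote a stance when combined with polarity.
    if polarity in ("positive", "negative"):
        if target_favor and not target_against:
            return "FAVOR" if polarity == "positive" else "AGAINST"
        if target_against and not target_favor:
            return "AGAINST" if polarity == "positive" else "FAVOR"
    return None


# Toy feature dictionaries (invented for illustration only).
tweets = [
    {"key_favor": True, "key_against": False, "target_favor": False,
     "target_against": False, "polarity": "neutral"},
    {"key_favor": False, "key_against": False, "target_favor": False,
     "target_against": True, "polarity": "positive"},
    {"key_favor": False, "key_against": False, "target_favor": False,
     "target_against": False, "polarity": "negative"},
]
# Keep only the tweets for which some rule fired (the filtered, labeled output).
labeled = [(f, apply_rules(f)) for f in tweets if apply_rules(f) is not None]
```

Tweets that match no rule are dropped, so the output is a filtered, automatically labeled subset of the input corpus.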
Goal:
Generality of the method for stance
6 datasets on various domains
Rules coverage
Rules precision
Stance prediction
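Rule coverage and rule precision can be computed as in the sketch below, assuming the usual definitions: coverage is the share of the corpus labeled by a rule, and precision is the share of a rule's labels that agree with the gold stance. The function names and the `(rule_id, predicted, gold)` record layout are hypothetical.

```python
from collections import Counter


def rule_coverage(assignments, corpus_size):
    """Fraction of the whole corpus labeled by each rule."""
    fired = Counter(rule_id for rule_id, _, _ in assignments)
    return {rule_id: n / corpus_size for rule_id, n in fired.items()}


def rule_precision(assignments):
    """Fraction of each rule's labels that match the gold stance."""
    fired = Counter()
    correct = Counter()
    for rule_id, predicted, gold in assignments:
        fired[rule_id] += 1
        if predicted == gold:
            correct[rule_id] += 1
    return {rule_id: correct[rule_id] / fired[rule_id] for rule_id in fired}


# Toy data: (rule that fired, predicted stance, gold stance); invented values.
assignments = [
    (2, "FAVOR", "FAVOR"),
    (2, "FAVOR", "AGAINST"),
    (3, "AGAINST", "AGAINST"),
]
coverage = rule_coverage(assignments, corpus_size=10)
precision = rule_precision(assignments)
```

Summing the per-rule coverage values gives the overall share of the corpus that the rules label, provided each tweet is labeled by at most one rule.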
Stance: Against, Favor or None
Subtask A – Supervised
5 targets with 2 datasets each (training and test)
Atheism, Climate change is a real concern, Feminism, Hillary Clinton and Legalization of Abortion
Subtask B – Semi-
1 target with 2 datasets (domain and test)
Donald Trump
Source: http://www.saifmohammad.com/WebPages/StanceDataset.htm
Average corpus coverage: 75%
In general, Rules 2, 3, 4 and 7 were representative: 13% to 17%
Rules 5 and 6 are representative only for Atheism
Rule 1 is representative only for Feminism
[Figure: corpus coverage per rule (Rule 1 to Rule 7), scale 0% to 100%]
Weighted average precision
[Figure: weighted average precision per target (Abortion, Atheism, Climate, Feminism, Hillary, Trump); series: Automatic Labelling, Predictive Model; values as extracted: 69 75 77 62 69 58 48 41 63 42 56 35]
[Figure: Our Result vs. SemEval Winner; values as extracted: 0.48 0.54 0.54 0.48 0.63 0.51 0.57 0.61 0.42 0.62 0.58 0.56]
Except for Trump, all the baselines were developed using a supervised method
Strengths
Simplicity of the method
May be applied to different domains/targets
Simplifies the manual corpus annotation effort
Restricted to n-grams
Weaknesses
Dependent on the appropriate selection of n-grams
Requires domain knowledge
Some rules do not perform well
Performance depends on the prevalence of the class
Key and target N-grams automatic
Revised set of rules
Neutral stance identification improvement
Improvement of supervised-learning
Predictive model features
Automatic extraction of training instances from
Classification algorithms or committees