SEMI-SUPERVISED STANCE DETECTION IN TWEETS BASED ON SENTIMENT RULES



SLIDE 1

SEMI-SUPERVISED STANCE DETECTION IN TWEETS BASED ON SENTIMENT RULES

Marcelo Dias and Karin Becker
Instituto de Informática, UFRGS, Porto Alegre, Brazil
marcelo.dias@inf.ufrgs.br and karin.becker@inf.ufrgs.br

SLIDE 2

Introduction

• Opinion Analysis
    • Detect sentiment polarity (negative or positive)
    • Target (often mentioned in the text)
• Stance Detection
    • Detect stance (against or favor)
    • Towards a given target (main target vs. indirect targets)
    • A favor stance can be expressed through positive or negative sentiments (and vice versa)

SLIDE 3

Introduction

• Related Work
    • Structured text or discussion threads (congress votes, on-line debates, ...)
        • Wider textual context to interpret content
        • [Thomas et al. 2006] [Anand et al. 2011] [Somasundaran and Wiebe 2009]
    • Tweets: short text and poorly written content
        • Rely more on inferences from static/dynamic properties of the platform
    • [Rajadesingan and Liu 2014]
        • Less focus on properties extracted from textual content only
    • Most works adopt supervised methods
    • Often address a binary problem (Favor/Against)

SLIDE 4

Goal

• Stance detection based only on the textual content of tweets
• Rule-based, semi-supervised method
• 3-class problem (Favor, Against and None)
• Improvements on our early work
    • Third place in SemEval 2016 Task 6-B (unsupervised, Trump target)
• Evaluate generality using several distinct domains
    • SemEval 2016 Task 6-A targets (supervised)

SLIDES 5-6

Process Overview

SLIDE 7

Process Overview: Automatic Labeling

SLIDE 8

Key and Target N-grams

• Key n-grams: terms/phrases that denote a stance
• Target n-grams: identify a target directly or indirectly related to the main target
    • Combined with polarity to denote a stance
• May be Favor or Against

Main target: Hillary Clinton

N-GRAMS   FAVOR                          AGAINST
KEY       ReadyForHillary, Hillary2016   StopHillary, MakeAmericaGreatAgain
TARGET    Hillary, Democrats             Trump, Republicans
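The table above can be held in a small lookup structure. The sketch below is only an illustration: the dictionary layout, the name `match_ngrams`, and the whitespace tokenization are assumptions, and it matches single-token entries only (multi-word n-grams would need a sliding window).

```python
# Lexicon copied from the Hillary Clinton example on the slide.
# The nested-dict layout is an assumption made for this sketch.
NGRAMS = {
    "key": {
        "favor": {"readyforhillary", "hillary2016"},
        "against": {"stophillary", "makeamericagreatagain"},
    },
    "target": {
        "favor": {"hillary", "democrats"},
        "against": {"trump", "republicans"},
    },
}


def match_ngrams(tweet: str) -> dict:
    """Return the lexicon entries found in the tweet, grouped by kind and side."""
    # Strip hashtag/mention markers and punctuation so "#StopHillary!" matches.
    tokens = {tok.strip("#@.,!?:;\"'").lower() for tok in tweet.split()}
    return {
        kind: {side: sorted(tokens & terms) for side, terms in sides.items()}
        for kind, sides in NGRAMS.items()
    }
```

For example, `match_ngrams("I am #ReadyForHillary, go Democrats!")` reports one favor key n-gram and one favor target n-gram, and nothing on the against side.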

SLIDES 9-10

Key and Target N-grams Identification

• Input: domain corpus
• Current selection
    • N-gram frequency ranking
    • Manual selection of the most frequent n-grams
• Output: selected Key and Target n-grams
• Currently evaluating automatic n-gram selection methods
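The frequency-ranking step could look roughly like the sketch below. It only produces the ranked candidate list; the manual selection still happens afterwards. The function names, the whitespace tokenization, and the default of unigrams plus bigrams are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield contiguous n-grams from a token list as space-joined strings."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def rank_ngrams(corpus, max_n=2, top_k=10):
    """Count all 1..max_n-grams in the corpus and return the top_k most frequent."""
    counts = Counter()
    for tweet in corpus:
        tokens = tweet.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts.most_common(top_k)
```

A person would then read the returned list and mark which entries are Favor or Against key n-grams and which are target n-grams.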

SLIDE 11

Process Overview: Automatic Labeling

SLIDES 12-16

Rules x Stance

FEATURES
• Presence of at least one Favor/Against Key N-gram
• Presence of at least one Favor/Against Target N-gram
• Presence of at least one hashtag
• Tweet Polarity
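The four features listed above can be gathered into one record per tweet. This is a minimal sketch under stated assumptions: the `TweetFeatures` class, the `extract_features` function, and the toy lexicon are hypothetical, and polarity is passed in because it is computed by an external tool. The rules that map these features to a stance are not reproduced here.

```python
from dataclasses import dataclass

# Toy lexicon from the Hillary Clinton example; the layout is an assumption.
LEXICON = {
    "key": {"favor": {"readyforhillary", "hillary2016"},
            "against": {"stophillary", "makeamericagreatagain"}},
    "target": {"favor": {"hillary", "democrats"},
               "against": {"trump", "republicans"}},
}


@dataclass(frozen=True)
class TweetFeatures:
    has_favor_key: bool
    has_against_key: bool
    has_favor_target: bool
    has_against_target: bool
    has_hashtag: bool
    polarity: str  # e.g. "positive" or "negative", computed elsewhere


def extract_features(tweet: str, polarity: str, lexicon=LEXICON) -> TweetFeatures:
    """Build the feature record for one tweet using single-token matching."""
    words = tweet.split()
    tokens = {w.strip("#@.,!?:;\"'").lower() for w in words}
    return TweetFeatures(
        has_favor_key=bool(tokens & lexicon["key"]["favor"]),
        has_against_key=bool(tokens & lexicon["key"]["against"]),
        has_favor_target=bool(tokens & lexicon["target"]["favor"]),
        has_against_target=bool(tokens & lexicon["target"]["against"]),
        has_hashtag=any(w.startswith("#") and len(w) > 1 for w in words),
        polarity=polarity,
    )
```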

SLIDE 17

Automatic Labeling

• Input: selected n-grams and a dataset
• Tweet pre-processing
    • Feature extraction
    • Tweet polarity detection (combination of off-the-shelf APIs)
• Rules application
• Output: filtered labeled tweets and discarded tweets
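The labeling steps above can be sketched end to end as follows. Everything specific here is an assumption made for illustration: the cleanup in `preprocess`, the majority vote used to combine polarity tools, the first-match rule order, and in particular `toy_favor_rule`, which is a made-up rule and not one of the seven rules evaluated in this work.

```python
from collections import Counter


def preprocess(tweet):
    """Assumed minimal cleanup: drop URLs and @mentions, then lowercase."""
    kept = [w for w in tweet.split() if not w.startswith(("http://", "https://", "@"))]
    return " ".join(kept).lower()


def combine_polarity(votes):
    """Majority vote over labels from several tools; ties give 'neutral'."""
    ranked = Counter(votes).most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return "neutral"
    return ranked[0][0]


def label_tweets(tweets, rules, polarity_tools):
    """Return (labeled, discarded); a tweet is discarded when no rule fires."""
    labeled, discarded = [], []
    for raw in tweets:
        text = preprocess(raw)
        polarity = combine_polarity([tool(text) for tool in polarity_tools])
        label = None
        for rule in rules:  # first matching rule wins (an assumption)
            label = rule(text, polarity)
            if label is not None:
                break
        if label is None:
            discarded.append(raw)
        else:
            labeled.append((raw, label))
    return labeled, discarded


# Hypothetical rule for demonstration only, NOT a rule from the slides.
def toy_favor_rule(text, polarity):
    if "#readyforhillary" in text and polarity != "negative":
        return "FAVOR"
    return None
```

Each entry in `polarity_tools` is any callable that takes the cleaned text and returns a polarity label, which is where calls to real sentiment services would be wrapped.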

SLIDE 18

Predictive Model Generation

SLIDE 19

Method Overview: Stance Detection

SLIDE 20

Experiments

• Goal:
    • Generality of the method for stance detection
• 6 datasets on various domains
    • Rules coverage
    • Rules precision
    • Stance prediction

SLIDE 21

Datasets: SemEval 2016 – Task 6

• Stance: Against, Favor or None
• Subtask A – Supervised
    • 5 targets with 2 datasets each (training and test)
    • Atheism, Climate change is a real concern, Feminism, Hillary Clinton and Legalization of Abortion
• Subtask B – Semi-supervised/Unsupervised
    • 1 target with 2 datasets (domain and test)
    • Donald Trump

Source: http://www.saifmohammad.com/WebPages/StanceDataset.htm

SLIDE 22

Rules Coverage

• Average corpus coverage: 75%
• In general, Rules 2, 3, 4 and 7 were representative
    • 13% to 17%
• Rules 5 and 6 are representative only for Atheism
• Rule 1 is representative only for Feminism
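One plausible way to compute such coverage figures is sketched below: overall coverage as the fraction of tweets labeled by any rule, and a per-rule share over the whole corpus. Whether the percentages on this slide were computed exactly this way is an assumption, and `rule_coverage` is a hypothetical name.

```python
from collections import Counter


def rule_coverage(fired_rules):
    """fired_rules holds one entry per tweet: the id of the rule that
    labeled it, or None when the tweet was discarded."""
    total = len(fired_rules)
    if total == 0:
        return 0.0, {}
    per_rule = Counter(r for r in fired_rules if r is not None)
    overall = sum(per_rule.values()) / total
    shares = {rule: n / total for rule, n in per_rule.items()}
    return overall, shares
```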

SLIDE 23

Rules Precision

[Bar chart: precision of Rule 1 to Rule 7; axis from 0% to 100%]
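Per-rule precision can be read as the fraction of tweets labeled by a rule whose assigned stance agrees with the gold annotation. The sketch below follows that reading; the record layout and the name `rule_precision` are assumptions.

```python
from collections import defaultdict


def rule_precision(records):
    """records: iterable of (rule_id, predicted_label, gold_label) tuples,
    one per tweet that some rule labeled."""
    fired = defaultdict(int)
    hits = defaultdict(int)
    for rule_id, predicted, gold in records:
        fired[rule_id] += 1
        if predicted == gold:
            hits[rule_id] += 1
    return {rule_id: hits[rule_id] / fired[rule_id] for rule_id in fired}
```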

SLIDE 24

Automatic Labeling x Predictive Model

[Bar chart: weighted average precision (%) for Abortion, Atheism, Climate, Feminism, Hillary and Trump; series: Automatic Labeling, Predictive Model; data labels in extraction order: 69, 75, 77, 62, 69, 58, 48, 41, 63, 42, 56, 35]
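A weighted average precision is commonly computed by weighting each class precision by the number of gold instances of that class. The sketch below uses that convention; that the chart used exactly this weighting is an assumption, and `weighted_precision` is a hypothetical name.

```python
from collections import Counter


def weighted_precision(gold, predicted):
    """Per-class precision averaged with weights equal to class support."""
    support = Counter(gold)               # gold instances per class
    predicted_count = Counter(predicted)  # predictions per class
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    total = len(gold)
    score = 0.0
    for label, n in support.items():
        if predicted_count[label]:
            precision = true_pos[label] / predicted_count[label]
        else:
            precision = 0.0  # class never predicted
        score += precision * n / total
    return score
```

Because the weights follow class support, a class that dominates the dataset also dominates the score.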

SLIDE 25

Results x Baseline

[Bar chart: axis from 0.1 to 0.7; series: Our Result, SemEval Winner; data labels in extraction order: 0.48, 0.54, 0.54, 0.48, 0.63, 0.51, 0.57, 0.61, 0.42, 0.62, 0.58, 0.56]

Except for Trump, all the baselines were developed using a supervised method

SLIDE 26

Strengths and Weaknesses

• Strengths
    • Simplicity of the method
    • May be applied to different domains/targets
    • Simplifies the manual corpus annotation effort
    • Restricted to n-grams
• Weaknesses
    • Dependent on the appropriate selection of n-grams
    • Requires domain knowledge
    • Some rules do not perform well
    • Performance depends on the prevalence of the class

SLIDE 27

Future Work

• Automatic identification of key and target n-grams
• Revised set of rules
• Improved identification of the neutral stance
• Improvement of supervised-learning predictive models
    • Predictive model features
    • Automatic extraction of training instances from authority Twitter profiles
    • Classification algorithms or committees