SEMI-SUPERVISED STANCE DETECTION IN TWEETS BASED ON SENTIMENT RULES

  1. SEMI-SUPERVISED STANCE DETECTION IN TWEETS BASED ON SENTIMENT RULES
     Marcelo Dias and Karin Becker
     Instituto de Informática – UFRGS – Porto Alegre - Brazil
     marcelo.dias@inf.ufrgs.br and karin.becker@inf.ufrgs.br

  2. Introduction
     - Opinion Analysis
       - Detect sentiment polarity (negative or positive)
       - Target (often mentioned in the text)
     - Stance Detection
       - Detect stance (against or favor)
       - Towards a given target (main target vs. indirect targets)
       - A favor stance can be expressed through positive or negative sentiment (and vice versa)

  3. Introduction
     - Related Work
       - Structured text or discussion threads (congressional votes, online debates, ...)
         - Wider textual context to interpret content
         - [Thomas et al. 2006] [Anand et al. 2011] [Somasundaran and Wiebe 2009]
       - Tweets: short text and poorly written content
         - Rely more on inferences from static/dynamic properties of the platform
         - [Rajadesingan and Liu 2014]
         - Less focus on properties extracted from textual content only
       - Most works adopt supervised methods
       - Often address a binary problem (Favor/Against)

  4. Goal
     - Stance detection based only on the textual content of tweets
       - Rule-based, semi-supervised method
       - Three-class problem (Favor, Against and None)
     - Improvements on our earlier work
       - Third place in SemEval 2016 Task 6-B (unsupervised, Trump target)
     - Evaluate generality using several distinct domains
       - SemEval 2016 Task 6-A targets (supervised)

  5. Process Overview

  6. Process Overview

  7. Process Overview: Automatic Labeling

  8. Key and Target N-grams
     - Key n-grams: terms/phrases that denote a stance
     - Target n-grams: identify a target directly or indirectly related to the main target
       - Combined with polarity to denote a stance
     - May be Favor or Against

     Main target: Hillary Clinton

     | N-grams | Favor                        | Against                            |
     |---------|------------------------------|------------------------------------|
     | Key     | ReadyForHillary, Hillary2016 | StopHillary, MakeAmericaGreatAgain |
     | Target  | Hillary, Democrats           | Trump, Republicans                 |
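
A minimal sketch of how the Hillary Clinton n-grams on this slide could be stored and matched against a tweet. The `NGRAMS` layout and the `tokenize` and `has_ngram` helpers are hypothetical names, and whole-token, case-insensitive matching is an assumption; the slides do not say how matching is done.

```python
import re

# N-grams copied from the slide; the dictionary layout is an assumption.
NGRAMS = {
    ("key", "favor"): ["ReadyForHillary", "Hillary2016"],
    ("key", "against"): ["StopHillary", "MakeAmericaGreatAgain"],
    ("target", "favor"): ["Hillary", "Democrats"],
    ("target", "against"): ["Trump", "Republicans"],
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens ('#' and '@' are dropped)."""
    return re.findall(r"\w+", text.lower())


def has_ngram(tweet: str, kind: str, side: str) -> bool:
    """True if the tweet contains at least one n-gram of the given kind and side.

    Matching is on whole tokens, so 'Hillary' does not match inside 'StopHillary'.
    """
    toks = tokenize(tweet)
    for ngram in NGRAMS[(kind, side)]:
        parts = tokenize(ngram)
        n = len(parts)
        if any(toks[i:i + n] == parts for i in range(len(toks) - n + 1)):
            return True
    return False
```

Whole-token matching is chosen here so that a Key n-gram such as `StopHillary` does not also count as the Target n-gram `Hillary`; a substring match would conflate the two.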

  9. Key and Target N-grams Identification

  10. Key and Target N-grams Identification
      - Input: domain corpus
      - Current selection
        - N-gram frequency ranking
        - Manual selection of the most frequent n-grams
      - Output: selected Key and Target n-grams
      - Currently evaluating automatic n-gram selection methods
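
The frequency-ranking step can be sketched as follows. This is only an illustration: the function name `rank_ngrams`, its parameters, and the tokenization are assumptions, and the manual selection that follows the ranking is not modelled.

```python
import re
from collections import Counter


def rank_ngrams(corpus, n_max=2, top_k=10):
    """Count all word n-grams up to length n_max and return the top_k most frequent.

    Assumption: tweets are lowercased and split on non-word characters.
    """
    counts = Counter()
    for tweet in corpus:
        toks = re.findall(r"\w+", tweet.lower())
        for n in range(1, n_max + 1):
            for i in range(len(toks) - n + 1):
                counts[" ".join(toks[i:i + n])] += 1
    return counts.most_common(top_k)
```

An annotator would then read the returned list from the top and pick the entries that work as Key or Target n-grams for the domain.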

  11. Process Overview: Automatic Labeling

  12. Rules x Stance
      Features:
      - Presence of at least one Favor/Against Key n-gram
      - Presence of at least one Favor/Against Target n-gram
      - Presence of at least one hashtag
      - Tweet polarity
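
A sketch of how rules over these four features could assign a stance. The transcript lists the features but not the rule definitions, so the rules in `RULES` are invented for illustration and are not the seven rules evaluated later in the deck; `Features` and `label` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Features:
    """The feature set named on the slide, one instance per tweet."""
    key_favor: bool = False
    key_against: bool = False
    target_favor: bool = False
    target_against: bool = False
    has_hashtag: bool = False  # listed on the slide; unused by the toy rules below
    polarity: str = "neutral"  # "positive", "negative" or "neutral"


# Hypothetical rules, checked in order: (name, condition, stance).
RULES = [
    ("A", lambda f: f.key_favor and not f.key_against, "FAVOR"),
    ("B", lambda f: f.key_against and not f.key_favor, "AGAINST"),
    ("C", lambda f: f.target_favor and f.polarity == "positive", "FAVOR"),
    ("D", lambda f: f.target_favor and f.polarity == "negative", "AGAINST"),
    ("E", lambda f: f.target_against and f.polarity == "positive", "AGAINST"),
    ("F", lambda f: f.target_against and f.polarity == "negative", "FAVOR"),
]


def label(f: Features) -> Optional[str]:
    """Return the stance of the first matching rule, or None if no rule fires."""
    for _name, condition, stance in RULES:
        if condition(f):
            return stance
    return None
```

Rules E and F illustrate the point made in the introduction: a negative tweet about an opposing target can express a favorable stance towards the main target.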

  17. Automatic Labeling
      - Input: selected n-grams and a dataset
      - Tweet pre-processing
        - Feature extraction
        - Tweet polarity detection (combination of off-the-shelf APIs)
      - Rules application
      - Output: filtered labeled tweets and discarded tweets
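
The slide says tweet polarity comes from a combination of off-the-shelf APIs but does not say how they are combined. One plausible scheme, assumed here, is a majority vote with ties resolved to neutral. `combine_polarity` is a hypothetical name, and the scorers are plain callables standing in for real sentiment services.

```python
from collections import Counter
from typing import Callable, Sequence


def combine_polarity(tweet: str, scorers: Sequence[Callable[[str], str]]) -> str:
    """Majority vote over the polarity labels returned by each scorer.

    Assumptions: each scorer returns "positive", "negative" or "neutral",
    and a tie for first place (or an empty scorer list) yields "neutral".
    """
    votes = Counter(scorer(tweet) for scorer in scorers)
    ranked = votes.most_common()
    if not ranked:
        return "neutral"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]
```

Using an odd number of scorers reduces the number of ties, though a three-way split among three scorers still falls back to neutral.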

  18. Predictive Model Generation

  19. Method Overview: Stance Detection

  20. Experiments
      - Goal:
        - Generality of the method for stance detection
        - 6 datasets on various domains
      - Rules coverage
      - Rules precision
      - Stance prediction

  21. Datasets: SemEval 2016 – Task 6
      - Stance: Against, Favor or None
      - Subtask A – Supervised
        - 5 targets with 2 datasets each (training and test)
        - Atheism, Climate Change is a Real Concern, Feminism, Hillary Clinton and Legalization of Abortion
      - Subtask B – Semi-supervised/Unsupervised
        - 1 target with 2 datasets (domain and test)
        - Donald Trump
      - Source: http://www.saifmohammad.com/WebPages/StanceDataset.htm

  22. Rules Coverage
      - Average corpus coverage: 75%
      - In general, Rules 2, 3, 4 and 7 were representative
        - 13% to 17%
      - Rules 5 and 6 are representative only for Atheism
      - Rule 1 is representative only for Feminism
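
Coverage figures of this kind could be computed as below, assuming that coverage means the share of corpus tweets labeled by some rule, and that a rule's share is the fraction of the corpus it labels. The slides do not give the definition, and `rule_coverage` is a hypothetical helper.

```python
from collections import Counter


def rule_coverage(fired):
    """Return (overall coverage, per-rule share of the corpus).

    `fired` holds one entry per tweet: the id of the rule that labeled it,
    or None if no rule applied and the tweet was discarded.
    """
    total = len(fired)
    if total == 0:
        return 0.0, {}
    per_rule = Counter(rule for rule in fired if rule is not None)
    covered = sum(per_rule.values())
    return covered / total, {rule: n / total for rule, n in per_rule.items()}
```

For example, `rule_coverage([2, 3, None, 2])` reports that three of four tweets are covered, with Rule 2 accounting for half of the corpus.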

  23. Rules Precision
      [Bar chart: precision (0% to 100%) for Rule 1 through Rule 7]

  24. Automatic Labeling x Predictive Model
      [Bar chart: weighted-average precision (axis 0 to 80) for Abortion, Atheism, Climate, Feminism, Hillary and Trump; series: Automatic Labelling and Predictive Model. Data labels, not attributable to individual bars: 77, 75, 69, 69, 63, 62, 58, 56, 48, 42, 41, 35]
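
The chart reports a weighted-average precision. A common reading, assumed here, is per-class precision weighted by the number of true instances of each class. The sketch below is a plain-Python illustration with a hypothetical function name, not the authors' evaluation script.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Per-class precision averaged with weights equal to each class's support.

    A class that is never predicted contributes a precision of 0.
    """
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for stance, n in support.items():
        precision = correct[stance] / predicted[stance] if predicted[stance] else 0.0
        score += precision * n / total
    return score
```

With true labels `FAVOR, FAVOR, AGAINST, NONE` and predictions `FAVOR, AGAINST, AGAINST, NONE`, the per-class precisions are 1.0, 0.5 and 1.0, and the weighted average is (2 × 1.0 + 1 × 0.5 + 1 × 1.0) / 4 = 0.875.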

  25. Results x Baseline
      [Bar chart (axis 0 to 0.7); series: Our Result and SemEval Winner. Data labels, not attributable to individual bars: 0.63, 0.62, 0.61, 0.58, 0.57, 0.56, 0.54, 0.54, 0.51, 0.48, 0.48, 0.42]
      Except for Trump, all the baselines were developed using a supervised method.

  26. Strengths and Weaknesses
      - Strengths
        - Simplicity of the method
        - May be applied to different domains/targets
        - Simplifies the manual corpus annotation effort
          - Restricted to n-grams
      - Weaknesses
        - Dependent on the appropriate selection of n-grams
          - Requires domain knowledge
        - Some rules do not perform well
        - Performance depends on the prevalence of the class

  27. Future Work
      - Automatic identification of Key and Target n-grams
      - Revised set of rules
        - Improved identification of the neutral stance
      - Improvement of supervised-learning predictive models
        - Predictive model features
        - Automatic extraction of training instances from authority Twitter profiles
        - Classification algorithms or committees
