
Center for Computational Analysis of Social and Organizational Systems http://www.casos.cs.cmu.edu/

Case Study: Social Media Analytics for Stance Mining

With Examples From COVID-19 Twitter Analysis

Sumeet Kumar

sumeetku@andrew.cmu.edu

7 June 2020 2 Sumeet Kumar

Let’s Define the Terms

  • Stance is defined as a mental or emotional position adopted with respect to a proposition, a person, an idea, etc. [1].
  • Users’ Stance is categorized as:
    – Pro (Favor)
    – Con (Anti)
    – Neutral (or unknown)

1. https://www.thefreedictionary.com/stance

How to Learn Users’ Stance (Pro/Anti)?

Prior research on stance mining has appeared in two flavors:

  • 1. Language (Text) based Approach [1]
  • 2. Network based Approach [2]

1. Mohammad et al. SemEval-2016 Task 6: Detecting Stance in Tweets. 2016.
2. Conover, Michael, Jacob Ratkiewicz, Matthew R. Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. "Political polarization on twitter." ICWSM 133 (2011): 89-96.


Figure columns: Tweet, Target/Topic (e.g. Gun Control), Stance (Pro/Anti)

Prior Work on Language Based Stance Learning is Mostly Supervised, which Requires Labeled Data. Labeling Data is Expensive.

SemEval-2016 Task 6: Detecting Stance in Tweets. Mohammad et al., 2016


Stance could also be learned from other multi-modal interactions (Networks)


Network Based Stance Learning Methods are Often Semi-Supervised, so Require Less Labeled Data. However, they can’t handle isolates

2011, Conover, Michael, Jacob Ratkiewicz, Matthew R. Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. "Political polarization on twitter." ICWSM 133 (2011): 89-96

Figure labels: Right Leaning Users, Left Leaning Users


In a Real (un-processed) Network, the Isolates in the Network form a Good Fraction of the Dataset

Unprocessed gun-control conversations on Twitter, collected by searching gun-control related terms. Nodes are Twitter users and links are based on retweets.

A retweets-based network after removing the isolates.

Conover et al. "Political polarization on twitter." ICWSM 133 (2011): 89-96


Three Main Challenges in Existing Approaches to Stance Mining

  • 1. Most language-based stance mining models use supervised machine learning, which is expensive
  • 2. Network based semi-supervised approaches require less labeled data but cannot handle isolates
  • 3. Topics change fast and new topics emerge, which makes the problem more challenging


Goal of this New Methodology: Can we Combine the Strengths of Text based Methods and Networks based Methods?

Text based Stance Learner + Network based Stance Learner → Predict the Stance of All Users in a Realistic Network


Co-Training on Social Networks: A Joint Network Label Propagation and Text Classification Approach for Stance Mining [2]

Input: Gun-control users’ network. Links represent retweets-based interactions.

  • Step 1: Extract Data
  • Step 2: Label seed hashtags (#GunControlNow: Pro, #2ndAmendment: Anti)
  • Step 3: Model Training

Red nodes are `Pro’ and Green nodes are `Anti’ users.

2. Sumeet Kumar, Tom Mitchell, Kathleen M. Carley. Co-Training on Social Networks. Currently under review.


Proposed Idea: A Three Step Process

  • Step 1: Extract Text Features and Users Networks (Users ‐ Text, Co‐Hashtags Graph, Retweets Graph)
  • Step 2: Label 2 to 4 hashtags (#GunControlNow: Pro, #2ndAmendment: Anti) to obtain Seed Labeled Users
  • Step 3: Derive stance of other users from seed users

Step 3 loop on the Network with Text features: Label Propagation to Unlabeled Nodes and Text Classifiers’ Predictions of Unlabeled Nodes feed Label Mixing, which adds new `Confident’ Node Labels; the New Node Labels are the updates for the next iteration.


Step 1: Extract users’ text features and users’ networks from data

Interactions → Extracted text-data and Networks:

  • Users ‐ Text
  • Users ‐ Hashtags Graph
  • Users ‐ Retweets Graph


Step 1: Extract text features and users’ networks from data

  • 1. Extract users’ text data
  • 2. Extract networks

Users’ Text

Users-Hashtags (Networks)

User       Tag            Weight
cenkuygur  #IowaCaucuses  1
cenkuygur  #NotMeUS       1

Users-Retweets (Networks)

User       Retweet    Weight
spthursby  cenkuygur  1
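
A minimal sketch of how Step 1 might be done outside ORA, assuming each tweet is a dict shaped like the classic Twitter JSON (`user.screen_name`, `entities.hashtags[].text`, `retweeted_status`). The function name `extract_networks` and the toy tweets are hypothetical, chosen only to reproduce the rows listed above.

```python
from collections import Counter

def extract_networks(tweets):
    """Build users' text plus weighted user-hashtag and user-retweet edge lists."""
    user_text = {}            # user -> list of tweet texts
    user_tag = Counter()      # (user, hashtag) -> number of uses
    user_retweet = Counter()  # (retweeting user, retweeted user) -> count
    for tweet in tweets:
        user = tweet["user"]["screen_name"]
        user_text.setdefault(user, []).append(tweet.get("text", ""))
        for tag in tweet.get("entities", {}).get("hashtags", []):
            user_tag[(user, "#" + tag["text"])] += 1
        original = tweet.get("retweeted_status")
        if original:
            user_retweet[(user, original["user"]["screen_name"])] += 1
    return user_text, user_tag, user_retweet

# Toy tweets (made-up text) that yield the table rows shown on the slide.
tweets = [
    {"user": {"screen_name": "cenkuygur"},
     "text": "caucus night #IowaCaucuses #NotMeUS",
     "entities": {"hashtags": [{"text": "IowaCaucuses"}, {"text": "NotMeUS"}]}},
    {"user": {"screen_name": "spthursby"},
     "text": "RT @cenkuygur: caucus night",
     "entities": {"hashtags": []},
     "retweeted_status": {"user": {"screen_name": "cenkuygur"}}},
]
user_text, user_tag, user_retweet = extract_networks(tweets)
for (user, tag), weight in user_tag.items():
    print(user, tag, weight)
```

The edge weights are simple counts here; any other weighting would be an equally valid choice for this illustration.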


Step 2: Label 2 to 4 popular hashtags with clear stance

Steps:

  • 1. Use hashtags that appear at the end of tweets
  • 2. Sort hashtags by their popularity
  • 3. Label a few popular hashtags that have a clear stance, e.g. #GunControlNow
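
The first two steps can be sketched in a few lines of Python. This is an illustration under assumptions: "at the end of tweets" is read as the unbroken run of hashtags that closes a tweet, tags are lowercased before counting, and the helper names and sample texts are hypothetical.

```python
import re
from collections import Counter

# One or more hashtags, separated by whitespace, that run to the end of the text.
TRAILING_TAGS = re.compile(r"(?:\s*#\w+)+\s*$")

def trailing_hashtags(text):
    """Return the hashtags in the run of hashtags that ends the tweet."""
    match = TRAILING_TAGS.search(text)
    return re.findall(r"#\w+", match.group(0)) if match else []

def hashtag_popularity(texts):
    """Count trailing hashtags (case-insensitive), most popular first."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in trailing_hashtags(text))
    return counts.most_common()

texts = [
    "Enough is enough #GunControlNow",
    "Call your senator today #guncontrolnow #Enough",
    "Defend the #2ndAmendment at the rally",   # mid-tweet tag is ignored
    "Shall not be infringed #2ndAmendment",
]
ranked = hashtag_popularity(texts)
for tag, count in ranked:
    print(tag, count)
```

Step 3, choosing which of the top-ranked hashtags have a clear stance, remains a manual judgment.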


Step 3: A Semi-supervised Approach (Co-Training + Label Propagation)

  • Semi-supervised approaches to machine learning are suitable for partially labeled data
  • We use a co-training setting


What is Co-Training?

  • Co-training requires two independent views to train two separate classifiers (weak learners) iteratively [1]
  • In the training process, more confident predictions are used as new training data [1]

Image Source https://www.slideshare.net/butest/semisupervised-learning

1. Blum, Avrim, and Tom Mitchell. "Combining labeled and unlabeled data with co-training." Proceedings of the eleventh annual conference on Computational learning theory. ACM, 1998.


What is Co-Training? Applied to Website Classification

Blum, Avrim, and Tom Mitchell. "Combining labeled and unlabeled data with co-training." Proceedings of the eleventh annual conference on Computational learning theory. ACM, 1998.

Academic / Non-Academic Webpage Classification

View 1 (website):

  • My advisor is Tom Mitchell and I work on…..

View 2 (Text on the Links to the website):

  • Prof. Mitchell’s work on never ending learning …
  • Prof. Mitchell, an expert in machine learning, mentioned …


Co-Training could be useful if each data point has two (or more) views

Blum, Avrim, and Tom Mitchell. "Combining labeled and unlabeled data with co-training." Proceedings of the eleventh annual conference on Computational learning theory. ACM, 1998.

Figure: classifiers trained on View 1 and View 2 turn unlabeled examples into new labeled examples.


Co-Training on Social-Networks.. What could be the multiple views?

Social Networks Data offers candidate views (View 1, View 2):

  • Users‐Hashtags Matrix (rows are users, columns are hashtags #1 … #n)
  • Stance from Users’ Interaction Networks

Unlabeled examples become new labeled examples.


Co-Training on Social Networks - Texts and Networks Could be Considered as Different Views

Diagram: starting from Seed Labeled Users on the Network with Text features, Label Propagation to Unlabeled Nodes (View 1 – Network based) and Text Classifiers’ Predictions of Unlabeled Nodes (View 2 – Text based) feed Label Mixing, which adds new `Confident’ Node Labels as updates for the next iteration.


Co-Training on Social Networks. Texts and Networks form Different Views

Proposed Algorithm: Label Propagation to Unlabeled Nodes (1 – Network based) and Text Classifiers’ Predictions of Unlabeled Nodes (2 – Text based) start from Seed Labeled Users; Label Mixing adds new `Confident’ Node Labels as updates for the next iteration.


Classifier 1: Network Classifier – A Label Propagation Model

Step 1 Initialize Step 2


Classifier 1: Label propagation on user-user networks has shortcomings

  • Many Social-Media Networks are bipartite, i.e. users relate to other entities
  • Often entities on Social Media follow a power-law distribution
  • Converting a user-posts network to a user-user network explodes the size
    – For example, 100,000 users and 200 hashtags get converted to a 100,000 x 100,000 user-user network
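
The size argument can be checked with simple arithmetic. The snippet below counts matrix cells, which is an upper bound that assumes dense storage; a sparse representation would hold fewer entries, but the gap between the two shapes stays the same.

```python
n_users, n_hashtags = 100_000, 200

bipartite_cells = n_users * n_hashtags   # user-hashtag matrix
projected_cells = n_users * n_users      # user-user matrix after projection

ratio = projected_cells // bipartite_cells
print(f"user-hashtag cells: {bipartite_cells:,}")
print(f"user-user cells:    {projected_cells:,}")
print(f"growth factor:      {ratio}x")
```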


Label Propagation Model on Bipartite Networks

  • New users are labeled by propagating hashtag stance to users

Figure: hashtags with Stance = +1 and Stance = -1 propagate to a user; in the example W′43 > W′23


Label Propagation Model on Bipartite Networks With Influence Functions

  • Influence functions are used to filter less confident predictions
  • In a Linear Threshold function, if a user gets more than a certain level of influence from the influencers, the user gets influenced
  • New users are labeled by propagating hashtag stance to users

Figure: hashtags with Stance = +1 and Stance = -1 propagate to a user; the user is labeled only if (W′43 - W′23) > K
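
One propagation hop from hashtags to users, with a linear-threshold filter, might look like the sketch below. It is an illustration rather than the authors' implementation: normalizing each user's edge weights so they sum to 1 is an assumption, and the function name, sample users and value of `k` are hypothetical.

```python
def propagate_to_users(user_tag_weights, tag_stance, k):
    """Label users from hashtag stances, keeping only confident ones.

    user_tag_weights: {user: {hashtag: weight}}
    tag_stance: {hashtag: +1 or -1} for the labeled hashtags
    k: minimum margin between pro and anti influence (K on the slide)
    """
    labels = {}
    for user, tags in user_tag_weights.items():
        total = sum(tags.values())
        if total == 0:
            continue
        # Share of the user's hashtag usage that points to each stance.
        pro = sum(w for t, w in tags.items() if tag_stance.get(t) == 1) / total
        anti = sum(w for t, w in tags.items() if tag_stance.get(t) == -1) / total
        if pro - anti > k:
            labels[user] = 1
        elif anti - pro > k:
            labels[user] = -1
        # Otherwise the margin is too small and the user stays unlabeled.
    return labels

user_tag_weights = {
    "u1": {"#guncontrolnow": 3, "#2ndamendment": 1},
    "u2": {"#guncontrolnow": 1, "#2ndamendment": 1},
    "u3": {"#2ndamendment": 2, "#news": 2},
}
tag_stance = {"#guncontrolnow": 1, "#2ndamendment": -1}
labels = propagate_to_users(user_tag_weights, tag_stance, k=0.3)
print(labels)
```

Here `u2` uses both seed hashtags equally, so the margin is zero and the threshold leaves that user unlabeled for a later iteration.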


Classifier 1: Label Propagation Model on Bipartite Networks Better Suits our Needs

  • Influence functions are used to filter less confident predictions
  • The two influence functions are threshold functions, used to filter out low-confidence hashtags and users respectively


Classifier 2: Learn Stance from Text in Users’ Tweets


Classifier 2: A Typical Text Based Classifier

  • A simple text classifier (e.g. Support Vector Machine) uses labeled data to train a model
  • The trained model is used to predict labels of unlabeled data

Figure: users S1 to S5, of which S2 (+1) and S4 (‐1) are labeled and S1, S3 and S5 are unlabeled. Diagram labels: Initialize (θ, $C^T$), Text Classifier, Classifier Predictions, E‐Step, M‐Step, Update (θ, $C^T$).


Classifier 2: A Text Classifier with Self-Training

  • When plenty of unlabeled data is available, models’ predictions could be used to train a better model… also called self-training [1]
  • Self-training exploits unlabeled data
  • In self-training, in every iteration, new ‘confident’ predictions are used as new training examples

Figure: users S1 to S5, of which S2 (+1) and S4 (‐1) are labeled and S1, S3 and S5 are unlabeled. Diagram labels: Initialize (θ, $C^T$), Text Classifier, Classifier Predictions, E‐Step, M‐Step, Update (θ, $C^T$), Users with high confidence ($C^T_j$ > Threshold).

1. Nigam, Kamal, and Rayid Ghani. "Analyzing the effectiveness and applicability of co-training." In Proceedings of the ninth international conference on Information and knowledge management, pp. 86-93. 2000.


Classifier 2: An SVM Text Classifier with Confidence Estimate and a Decreasing Threshold Function

Stance and confidence estimate of user j based on his/her tweets’ text:

  • $s^T_j$ = stance of jth user
  • $s_k$ = stance of kth text message of user j
  • $f_t$ = uniformly decreasing function
  • $T$ = text threshold
  • $c^T_j$ = user text-based confidence estimate
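
The exact formulas are not given in the text above, so the following is only one plausible instance of the idea, with every specific choice an assumption: the user stance is the sign of the mean message stance, the confidence is the absolute value of that mean, and the threshold drops by a fixed step per iteration down to a floor. Function names and default values are hypothetical.

```python
def user_text_stance(message_stances):
    """Aggregate per-message stances (+1/-1) into a user stance and confidence."""
    mean = sum(message_stances) / len(message_stances)
    stance = 1 if mean >= 0 else -1
    confidence = abs(mean)   # 1.0 when all messages agree, 0.0 when evenly split
    return stance, confidence

def text_threshold(iteration, start=1.0, step=0.25, floor=0.5):
    """Uniformly decreasing threshold: drops by `step` each iteration, never below `floor`."""
    return max(floor, start - step * iteration)

stance, confidence = user_text_stance([1, 1, -1, 1])
for iteration in range(4):
    accepted = confidence >= text_threshold(iteration)
    print(iteration, text_threshold(iteration), accepted)
```

Lowering the threshold over iterations lets the classifier start with only its most certain users and admit less certain ones once more training data has accumulated.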


In ‘Label Mixing’, add the top 5% confident predictions as new training examples in the next iteration

  • In co-training, more confident predictions (of both classifiers) are added as new training data in each iteration
  • In each iteration, we use the top 5% predictions of both classifiers as new training examples. In case of a conflict among classifiers, we use the more confident prediction

Diagram: Classifier 1 is Label Propagation to Unlabeled Nodes and Classifier 2 is Text Classifiers’ Predictions of Unlabeled Nodes; Label Mixing adds new `Confident’ Node Labels to the Seed Labeled Users as updates for the next iteration.
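
A minimal sketch of the label-mixing step, assuming each classifier returns a dict of `{user: (stance, confidence)}` for the unlabeled users. The function names are hypothetical, and the example uses a fraction of 0.5 instead of 0.05 only so that four users are enough to show the behavior.

```python
import math

def top_fraction(predictions, fraction=0.05):
    """Keep the most confident `fraction` of {user: (stance, confidence)}."""
    if not predictions:
        return {}
    keep = max(1, math.ceil(len(predictions) * fraction))
    ranked = sorted(predictions.items(), key=lambda item: item[1][1], reverse=True)
    return dict(ranked[:keep])

def mix_labels(network_preds, text_preds, fraction=0.05):
    """Merge the confident predictions of both classifiers.

    When both classifiers label the same user, the more confident
    prediction wins, which also resolves conflicting stances.
    """
    merged = top_fraction(network_preds, fraction)
    for user, (stance, conf) in top_fraction(text_preds, fraction).items():
        if user not in merged or conf > merged[user][1]:
            merged[user] = (stance, conf)
    return {user: stance for user, (stance, _) in merged.items()}

network_preds = {"a": (1, 0.9), "b": (-1, 0.8), "c": (1, 0.2), "d": (-1, 0.1)}
text_preds = {"a": (-1, 0.6), "b": (-1, 0.3), "c": (1, 0.7), "d": (1, 0.2)}
new_labels = mix_labels(network_preds, text_preds, fraction=0.5)
print(new_labels)
```

User `a` is labeled by both classifiers with opposite stances; the network prediction (confidence 0.9) beats the text prediction (0.6), so `a` is added as +1.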


Joint Model: It Combines the Predictions of Both the Text and the Network Classifier

The joint model uses the predictions of the more confident of the two classifiers (text and network) to predict the final stance.

  • $s_j$ = stance of jth user (joint model)
  • $s^T_j$ = stance of jth user based on text
  • $s^I_j$ = stance of jth user based on interaction
  • $c^T_j$ = user text based confidence estimate
  • $c^I_j$ = user interaction based confidence
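
Read literally, "use the more confident of the two classifiers" corresponds to a case split on the two confidence estimates, where $s^T_j, c^T_j$ are the text-based stance and confidence of user j and $s^I_j, c^I_j$ the interaction-based ones. The tie-breaking rule (preferring text when the confidences are equal) is an assumption, not something stated above.

```latex
s_j =
\begin{cases}
  s^T_j & \text{if } c^T_j \ge c^I_j \\
  s^I_j & \text{otherwise}
\end{cases}
```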


Summary - Three Steps to Train Two Stance Classifiers

  • Step 1: Extract Text Features and Users Networks (Users ‐ Text, Co‐Hashtags Graph, Retweets Graph)
  • Step 2: Label 2 to 4 hashtags (#GunControlNow: Pro, #2ndAmendment: Anti) to obtain Seed Labeled Users
  • Step 3: Derive stance of other users from seed users

Step 3 loop on the Network with Text features: Label Propagation to Unlabeled Nodes and Text Classifiers’ Predictions of Unlabeled Nodes feed Label Mixing, which adds new `Confident’ Node Labels; the New Node Labels are the updates for the next iteration.


Experiment: Users’ Stance Dataset on Three Controversial Topics

Dataset Labeled Users in the Dataset

Haokai Lu, James Caverlee, and Wei Niu. 2015. BiasWatch: A Lightweight System for Discovering and Tracking Topic-Sensitive Opinion Bias in Social Media. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM '15). ACM, New York, NY, USA, 213-222. DOI: https://doi.org/10.1145/2806416.2806573


Experiment: Manually Labeled Four Hashtags in Each Dataset

Labeled two pro and two anti hashtags in each dataset


Experiment Details

  • 3-hop label propagation is used by the network classifier
  • An SVM classifier is used as the text classifier
    – TF-IDF features
    – Unigrams and bigrams are used
  • Hyper-parameters were determined by evaluating them on the gun-control dataset
    – Top 250 hashtags are used
    – Top 5000 retweets are used


Result: Co-Trained Classifiers Perform Better than Self-Trained

Figure: Test Accuracy Trend for the Gun-Control dataset


Result: Co-trained Models Outperform Self-Trained Models

Gun-Control Dataset (in both network plots, red nodes are `Pro’ and green nodes are `Anti’):

  • Co-trained Joint Model: 85% Accurate
  • Text based Self-Trained Model: 68% Accurate


Result: Co-Trained Classifiers Perform Better than Self-Trained on All Datasets

  • Text classifier improves by more than 17% on all three datasets
  • LP in the figure denotes bipartite label propagation

Figure panels: Gun-control, Abortion, Obamacare


Results: Comparing Different Seed Hashtags

Comparison of Seed Hashtags: Some Seed Hashtags May lead to Poor Models


Any Questions So Far?


Case Study: Twitter COVID-19 Data

Analyze Users’ Stance on `Fire Dr. Fauci’

Topic: Fire Dr. Fauci


Case Study: Twitter COVID-19 Data

Analyze Users’ Stance on `Fire Dr. Fauci’

Input:

  • 1. Twitter data as a JSON file
  • 2. Labeled Hashtags

Output:

  • 1. Users’ Stance Labels
  • 2. Other Hashtags’ Stance Labels
  • 3. URL Stance Labels


Model for Propagating Stance from Users to Other Entities

E.g. From Users’ Stance to Stance Given by Hashtags

  • Users’ Stance → Hashtags Stance
  • Users’ Stance → URLs (Websites) Stance
  • Users’ Stance → Media URLs (Pictures) Stance
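
One simple way to propagate from users to hashtags is a weighted average of the stances of the users who used each hashtag. The sketch below assumes that scoring rule, which may differ from what ORA computes; the function name and sample data are hypothetical. The same function would apply to URLs or media URLs by swapping the user-hashtag weights for user-URL weights.

```python
def hashtag_stance(user_tag_weights, user_stance):
    """Score each hashtag by the weighted mean stance of the users who used it."""
    totals, weights = {}, {}
    for user, tags in user_tag_weights.items():
        if user not in user_stance:
            continue  # users without a stance label contribute nothing
        for tag, w in tags.items():
            totals[tag] = totals.get(tag, 0.0) + w * user_stance[user]
            weights[tag] = weights.get(tag, 0.0) + w
    return {tag: totals[tag] / weights[tag] for tag in totals if weights[tag] > 0}

user_tag_weights = {
    "u1": {"#firefauci": 2},
    "u2": {"#firefauci": 1, "#savefauci": 3},
    "u3": {"#savefauci": 1},
}
user_stance = {"u1": 1, "u2": -1, "u3": -1}
scores = hashtag_stance(user_tag_weights, user_stance)
for tag, score in sorted(scores.items()):
    print(tag, round(score, 3))
```

Scores near +1 or -1 indicate hashtags used almost exclusively by one side, while scores near 0 indicate hashtags shared by both.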


Stance Mining Applied to ‘Fire Dr. Fauci’ in Covid Data

– Fire Dr. Fauci (vs Save Dr. Fauci)
– Tags used for data filtering: 'fauci', 'firing fauci', '#firefauci', '#firetrump', '#savefauci'
– Labeled seed hashtags for stance analysis:

firefauci:1,firedrfauci:1,faucithefraud:1,savefauci:-1,fauciisahero:-1,keepfauci:-1,firetrumpkeepfauci:-1
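
For scripted use outside ORA, the seed list can be parsed into a dictionary. This is a small sketch with a hypothetical helper name; it tolerates stray spaces around the values and an optional leading `#`.

```python
def parse_seed_hashtags(spec):
    """Parse 'tag:stance,tag:stance,...' into {tag: +1 or -1}."""
    seeds = {}
    for item in spec.split(","):
        if not item.strip():
            continue
        tag, _, value = item.partition(":")
        stance = int(value.strip())
        if stance not in (1, -1):
            raise ValueError(f"stance must be 1 or -1, got {value!r}")
        seeds[tag.strip().lstrip("#").lower()] = stance
    return seeds

seeds = parse_seed_hashtags(
    "firefauci:1,firedrfauci:1,faucithefraud:1,"
    "savefauci:-1,fauciisahero: -1,keepfauci: -1,firetrumpkeepfauci: -1"
)
pro = sorted(tag for tag, stance in seeds.items() if stance == 1)
anti = sorted(tag for tag, stance in seeds.items() if stance == -1)
print("pro:", pro)
print("anti:", anti)
```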


Agenda - I Try to Answer Two Questions in This Part of the Talk

  • 1. How to identify the users that are pro (or anti) a given topic?
  • 2. How do the users differ in their usage of hashtags?


Start ORA

  • Start ORA and Import Data


Import Twitter Data

  • Pick Twitter Data
  • Select Import Options
  • Import Data

Start Stance Detection Analysis

  • Pick the option shown below

Stance Detection Analysis

  • Pick the option shown below

Stance Detection Analysis

  • Assign stance values to a selected set of hashtags
  • You can copy and paste the values from the slide (or enter them manually)


Stance Detection Analysis

  • Select save option
  • Stance detection report will be generated


Stance Detection Analysis

  • Stance detection report – shows selected options

Stance Detection Analysis

  • Stance detection report – shows Pro/Con Users


Stance Detection Analysis

  • Stance detection report – shows Pro/Con hashtags by confidence


Stance Detection Analysis

  • Stance detection report – shows Pro/Con hashtags by usage


Thank You

Please feel free to ask/send your questions