Fang Jin Assistant Professor Department of Computer Science, Texas - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Mass Movements and Their Adoption in Social Media

Fang Jin

Assistant Professor Department of Computer Science, Texas Tech University

slide-2
SLIDE 2


Ubiquity of social media

Facebook Twitter users Flickr tags LinkedIn network

slide-3
SLIDE 3

Big data research on social networks

  • 1. How do we identify group anomalies?
  • 2. How do we detect civil unrest events in social networks?
  • 3. How do we distinguish rumors from real news?

slide-4
SLIDE 4

Group Absenteeism as a basis for Event Detection

slide-5
SLIDE 5

Motivation

Student absenteeism vs. information absenteeism

How to detect group absenteeism on Twitter?

slide-6
SLIDE 6

(a) (b)

Caracas, Venezuela power cut on 2013-12-02, 8:00 PM

Why study absenteeism?

slide-7
SLIDE 7

Natal, Brazil protest on 2013-06-17, 18:00 – 20:00

(a) (b)

Why study absenteeism?

slide-8
SLIDE 8

Natal, Brazil protest on 2013-06-17, 18:00 – 20:00

(a) (b)

Why study absenteeism?

slide-9
SLIDE 9

(a) (b)

Why study absenteeism?

Protests in Brazil against the World Cup, 2014

slide-10
SLIDE 10

Chile Iquique earthquake on 2014-04-01

Why study absenteeism?

slide-11
SLIDE 11

(a) (b)

Argentina, Christmas holiday on 2015-12-25

Why study absenteeism?

slide-12
SLIDE 12

Absenteeism score

slide-13
SLIDE 13

Motivation

  • Absenteeism score (normalization of Twitter volumes).
  • Absenteeism score vector f(n) on graph G.

How do we find a group of cities with a uniform anomaly?

Absenteeism score distribution vector f(n) on April 1, 2014 in Chile. Natal, Brazil protest began at 18:00 on June 17, 2013.

slide-14
SLIDE 14

Our approach

  • 1. Graph wavelet based approach, considering both the graph structure and the vector f;
  • 2. Define an anomaly index of f’s distribution on G;
  • 3. Identify abnormal locations using graph wavelets.
slide-15
SLIDE 15

Graph spectrum

Shuman, David I., Benjamin Ricaud, and Pierre Vandergheynst. "A windowed graph Fourier transform." Statistical Signal Processing Workshop (SSP), 2012 IEEE. IEEE, 2012.

slide-16
SLIDE 16

Eigenvalue and eigenvector property (1)

Shuman, David I., Benjamin Ricaud, and Pierre Vandergheynst. "A windowed graph Fourier transform." Statistical Signal Processing Workshop (SSP), 2012 IEEE. IEEE, 2012.

The set of eigenvectors represents N types of patterns on graph G. A larger eigenvalue corresponds to a more severe fluctuation.
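The link between eigenvalue and fluctuation can be checked numerically. The sketch below is only an illustration: it assumes the combinatorial Laplacian L = D - A (the slides do not say which Laplacian is used) and a hypothetical 6-node path graph. For a unit-norm eigenvector x, the quantity x^T L x equals the sum of squared differences across edges, and it comes out equal to the eigenvalue.

```python
import numpy as np

def laplacian_spectrum(adjacency):
    """Return L = D - A with its eigenvalues (ascending) and eigenvectors (columns)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigenvalues, eigenvectors = np.linalg.eigh(L)  # eigh: L is symmetric
    return L, eigenvalues, eigenvectors

def dirichlet_energy(L, x):
    """x^T L x, i.e. the sum over edges of w_ij * (x_i - x_j)^2."""
    return float(x @ L @ x)

# Hypothetical example graph: a path with 6 nodes and unit edge weights.
N = 6
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

L, eigenvalues, eigenvectors = laplacian_spectrum(A)
energies = [dirichlet_energy(L, eigenvectors[:, k]) for k in range(N)]

for lam, energy in zip(eigenvalues, energies):
    print(f"eigenvalue {lam:.3f} -> fluctuation energy {energy:.3f}")
```

The smallest eigenvalue is 0 and belongs to the constant vector (no fluctuation at all); each later eigenvector varies more strongly from node to node.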

slide-17
SLIDE 17

Eigenvalue and eigenvector property (2)

slide-18
SLIDE 18

Anomaly index on graph

  • 1. Define the eigenvector anomaly index:
  • 2. Define the global anomaly index of f on G:
slide-19
SLIDE 19

Graph wavelet construction

slide-20
SLIDE 20

Graph wavelet property (1)

Figure: wavelet centered at node a, at small scale and at large scale (panels A, B, C, D).

slide-21
SLIDE 21

Graph wavelet coefficient

The wavelet coefficients for f can be defined as: f(n) can be recovered by the wavelet coefficients:
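The formulas on this slide did not survive extraction. As a hedged stand-in, the sketch below uses the standard spectral graph wavelet definition, W_f(s, a) = Σ_ℓ g(s λ_ℓ) f̂(ℓ) χ_ℓ(a), where λ_ℓ and χ_ℓ are the Laplacian eigenvalues and eigenvectors and f̂ is the graph Fourier transform of f. The kernel g(x) = x·exp(-x), the combinatorial Laplacian, and the 7-node path graph are assumptions made for illustration, not choices confirmed by the slides.

```python
import numpy as np

def wavelet_coefficients(adjacency, f, scale, kernel=lambda x: x * np.exp(-x)):
    """Spectral graph wavelet coefficients W_f(scale, a) for every center node a.

    The band-pass kernel default is an assumed example; it satisfies g(0) = 0.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # combinatorial Laplacian (assumed)
    lam, U = np.linalg.eigh(L)               # eigenvalues, eigenvectors (columns)
    f_hat = U.T @ np.asarray(f, dtype=float) # graph Fourier transform of f
    return U @ (kernel(scale * lam) * f_hat) # entry a is W_f(scale, a)

# Hypothetical 7-node path graph with a single spike at the middle node.
n_nodes = 7
A = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

spike = np.zeros(n_nodes)
spike[3] = 1.0

coeffs = wavelet_coefficients(A, spike, scale=1.31)
flat = wavelet_coefficients(A, np.ones(n_nodes), scale=1.31)
print(np.round(coeffs, 3))
print(np.round(flat, 3))
```

Because g(0) = 0, a perfectly uniform signal produces coefficients of zero, so only local departures from the neighborhood show up. Recovering f from the coefficients requires several scales plus a low-pass term, which this sketch leaves out.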

slide-22
SLIDE 22

Graph wavelet property (2)

slide-23
SLIDE 23

Graph wavelet scale example

Spectral graph wavelet on South America graph. (a) Center node (b) scale at 8 (c) scale at 18 (d) scale at 26 (e) scale at 80 (f) scale at 400

slide-24
SLIDE 24

Experiment design

Implementation
Ø Build graph G for each country, based on KNN
Ø Compute f(n) based on each city’s absenteeism score (Zscore30)
Ø Calculate anomaly index of f on G
Ø Set the wavelet coefficient threshold, find the central node and its kernel cities.

Data Source
Ø Gold standard report (GSR) protests in Latin American countries
Ø 10% randomly sampled Twitter data, from Jul. 2012 to Dec. 2014

Comparison criteria
Ø Event date
Ø Location (city)
Ø Group size (group anomaly cities)
Ø Protest or not
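The slides name the score "Zscore30" without defining it. One plausible reading, used here purely as an assumption, is a z-score of a city's tweet count for the current day against the preceding 30 days. The function name and the sample counts are hypothetical.

```python
from statistics import mean, pstdev

def zscore30(daily_counts, window=30):
    """Z-score of the latest daily count against the preceding `window` days.

    Assumed reading of "Zscore30"; the slides do not give the exact formula.
    """
    if len(daily_counts) < window + 1:
        raise ValueError("need at least window + 1 daily counts")
    history = daily_counts[-(window + 1):-1]
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # flat history: treat the latest day as not anomalous
    return (daily_counts[-1] - mu) / sigma

# Hypothetical city: 30 ordinary days, then a sharp drop in tweet volume.
history = [90, 110] * 15
score = zscore30(history + [40])
print(score)  # about -6.0: far fewer tweets than usual
```

A strongly negative score marks absenteeism (unusually low volume); collecting one score per city gives the vector f(n).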

slide-25
SLIDE 25

Experiment dataset

slide-26
SLIDE 26

Experiment implementation (1)

Brazil absenteeism score distribution on June 1st, 2013

  • 1. Build graph G, based on KNN, set K = 5.
  • 2. Compute f(n) based on each city’s absenteeism score (Zscore30)

Brazil 5-nearest-neighbor graph: 1276 cities, all edge weights set to 1.
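Step 1 can be sketched as follows. The slides do not state the distance metric or how the neighbor relation is made symmetric, so this example assumes great-circle (haversine) distance and adds an undirected edge whenever either city lists the other among its K nearest. The coordinates are hypothetical.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def knn_graph(coords, k=5):
    """Unweighted, symmetric K-nearest-neighbor adjacency matrix (weights are 1)."""
    n = len(coords)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: haversine_km(coords[i], coords[j]))
        for j in others[:k]:
            adjacency[i][j] = adjacency[j][i] = 1  # undirected edge
    return adjacency

# Hypothetical (lat, lon) positions for seven cities.
cities = [(-5.8, -35.2), (-8.0, -34.9), (-12.9, -38.5), (-15.8, -47.9),
          (-19.9, -43.9), (-22.9, -43.2), (-23.5, -46.6)]
graph = knn_graph(cities, k=2)
for row in graph:
    print(row)
```

With this symmetrization a city can end up with more than K neighbors, which is one common convention but not necessarily the one used in the talk.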

slide-27
SLIDE 27

Experiment implementation (2)

  • 3. Calculate anomaly index of f on G.

slide-28
SLIDE 28

Experiment implementation (3)

Two graph wavelets with different scales s: s = 1.31 and s = 0.68

  • 4. Calculate wavelet coefficient Wf(s,a) for each node a with different scales s.
  • 5. Select top wavelet coefficient with scale s, and center a.
slide-29
SLIDE 29

Experimental results: Mexico protests

Mexico protest detection performance

slide-30
SLIDE 30

Experimental results: Brazil protests

Brazil protest detection

slide-31
SLIDE 31

Experimental results: Venezuela protests

Venezuela protest detection performance

slide-32
SLIDE 32

Case study: Chile Earthquake

Iquique Earthquake, Chile. April 1, 2014.

(a) absenteeism score (b) wavelet coefficient

slide-33
SLIDE 33

Case study: Venezuela Power Outage

Venezuela power outage. Dec 2, 2013.

(a) absenteeism score (b) wavelet coefficient

slide-34
SLIDE 34

Civil Unrest Forecasting

slide-35
SLIDE 35

Twitter and the rioting

slide-36
SLIDE 36

Protest forecasting

Ø Focus on 10 Middle and South American countries

Ø Forecast who, where, when and why

Distribution of civil unrest events in Latin America (Nov'12 -- Aug'14) as per Gold Standard Report*

In June 2013 countrywide protests erupted in Brazil, also known as the Vinegar Movement

Reasons: Increase in bus fares, corruption, health & education costs

slide-37
SLIDE 37

How to forecast protests? #Yosoy132 Protest – Mexico, 2012

slide-38
SLIDE 38

How to forecast protest?

Objective:

Ø Model the recruitment of protest participants within social networks
Ø Capture the underlying social network and structural dynamics
Ø Forecast the speed and scale of civil unrest events

slide-39
SLIDE 39

Approach: Bi-space model

(SEED QUERY) Protest, march, demonstration …

#YoSoy132 movement

#megamarch (transparent, election)

#granmarcha132

#yosoy13

Latent space / Mentions network

We consider the mentions network to be stable

slide-40
SLIDE 40

Brownian Distance:

Propagation in the mentions network (1)

slide-41
SLIDE 41

Geometric Brownian motion (GBM)

Propagation in the mentions network (2)
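For readers unfamiliar with geometric Brownian motion, the sketch below simulates one GBM path using the exact update S(t+dt) = S(t)·exp((μ - σ²/2)·dt + σ·√dt·Z) with Z drawn from a standard normal. It is a generic simulator, not the propagation model from the talk; the drift, volatility, and starting value are hypothetical.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, n_steps, dt=1.0, seed=None):
    """Simulate one geometric Brownian motion path with n_steps + 1 points."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal increment
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        path.append(path[-1] * growth)
    return path

# Hypothetical parameters: start at 100, drift 0.05, volatility 0.2, 50 steps.
noisy_path = simulate_gbm(100.0, mu=0.05, sigma=0.2, n_steps=50, seed=7)
# With zero volatility the path reduces to plain exponential growth.
smooth_path = simulate_gbm(1.0, mu=0.1, sigma=0.0, n_steps=10)
print(len(noisy_path), min(noisy_path) > 0)
```

A GBM path stays strictly positive because each step multiplies the previous value by an exponential factor.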

slide-42
SLIDE 42

Figure legend: inactive node, active node, Brownian distance, trust function.

Propagation in the mentions network (3)

slide-43
SLIDE 43

#granmarcha132

#yosoy13

Infected nodes in latent space Poisson distribution fit (λ = 4.18)

Latent space: Poisson distribution
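A Poisson fit like the one reported here (λ = 4.18) is typically obtained by maximum likelihood, where the estimate of λ is simply the sample mean. The sketch below shows that step on hypothetical counts; the data behind the slide's fit are not available, so the sample is made up for illustration.

```python
import math

def fit_poisson(counts):
    """Maximum-likelihood estimate of the Poisson rate: the sample mean."""
    if not counts:
        raise ValueError("counts must be non-empty")
    return sum(counts) / len(counts)

def poisson_pmf(k, lam):
    """Probability of observing exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical counts of newly infected nodes per time step.
observed = [2, 4, 6, 3, 5, 4, 4]
lam_hat = fit_poisson(observed)
print(lam_hat)

# Probabilities under the rate reported on the slide.
slide_lambda = 4.18
probabilities = [poisson_pmf(k, slide_lambda) for k in range(61)]
print(round(sum(probabilities), 6))
```

Comparing the fitted probabilities with the observed histogram is the usual way to judge whether the Poisson assumption is reasonable.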

slide-44
SLIDE 44

Assumptions:
Ø Each community has its own parameters
Ø Propagation among communities uses the source community’s parameters

Community level propagation

slide-45
SLIDE 45

Protest forecasting

Geographical Map Top Keywords for all three clusters

Protest example

Word Cloud Relevant Tweets

Twitter – data source

slide-46
SLIDE 46

Case study: misinformation campaigns


Protest detection False rumors

How can we distinguish real movements from rumors?

Sept 5, 2012, Mexico

slide-47
SLIDE 47

Distinguish rumors from real news

slide-48
SLIDE 48

Difference between rumor and news propagation


Amuay refinery explosion cascade Castro rumor cascade

Retweet cascade Rumor Real News

slide-49
SLIDE 49

Model intuition (comparing disease vs rumor propagation)

Similarities:
Ø susceptible, using S status
Ø infected, using I status
Ø may take time to accept, exposed E status
Ø with transmission route

Differences:
Ø Idea: can be skeptics, introduce skeptics Z
Ø Idea: no immune system, no recovered “R”

slide-50
SLIDE 50

SEIZ Model

State: disease term; meaning for ideas
S: Susceptible; Twitter accounts
E: Exposed; be exposed but not yet believe
I: Infected; believe news / rumor, post a tweet
Z: Skeptics; do not tweet

Diagram: transitions among S, E, I and Z, labeled with β, b, ρ, ε, p, (1-p), l, (1-l).
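The diagram's equations are not reproduced in the transcript. The sketch below follows the SEIZ formulation as it is usually written in the epidemiological literature on idea adoption, which is an assumption about what the slide shows: β and b are contact rates of S with I and with Z, ρ is the contact rate of E with I, ε is the incubation rate, and p and l are the probabilities of going straight to I or Z on contact. All parameter values and the forward-Euler integrator are hypothetical choices for illustration.

```python
def seiz_derivatives(state, beta, b, rho, eps, p, l):
    """Right-hand side of the SEIZ equations (assumed standard form)."""
    S, E, I, Z = state
    N = S + E + I + Z
    contact_with_adopters = beta * S * I / N  # S meets I
    contact_with_skeptics = b * S * Z / N     # S meets Z
    dS = -contact_with_adopters - contact_with_skeptics
    dE = ((1 - p) * contact_with_adopters + (1 - l) * contact_with_skeptics
          - rho * E * I / N - eps * E)
    dI = p * contact_with_adopters + rho * E * I / N + eps * E
    dZ = l * contact_with_skeptics
    return dS, dE, dI, dZ

def simulate_seiz(initial, params, t_end, dt=0.01):
    """Integrate the SEIZ system with simple forward-Euler steps."""
    state = tuple(float(x) for x in initial)
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        derivs = seiz_derivatives(state, **params)
        state = tuple(x + dt * dx for x, dx in zip(state, derivs))
        trajectory.append(state)
    return trajectory

# Hypothetical parameters and starting population (S, E, I, Z).
params = {"beta": 0.8, "b": 0.4, "rho": 0.3, "eps": 0.1, "p": 0.3, "l": 0.5}
trajectory = simulate_seiz((9900, 0, 50, 50), params, t_end=50)
S_end, E_end, I_end, Z_end = trajectory[-1]
print(f"S={S_end:.0f} E={E_end:.0f} I={I_end:.0f} Z={Z_end:.0f}")
```

There is no recovered compartment, so the four derivatives sum to zero and the total population stays constant. In practice the parameters would be fitted to the observed tweet volume (the I compartment) rather than chosen by hand.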

slide-51
SLIDE 51

Capturing people’s acceptance of ideas

RSI, a kind of flux ratio, the ratio of effects entering E to those leaving E.

Diagram: S, E, I and Z compartments with transition parameters β, b, ρ, ε, p, (1-p), l, (1-l).

RSI = (Inflow to Exposed) / (Outflow from Exposed)

Response ratio: Compare the speed of adding to the Exposed compartment with removing from the Exposed compartment.
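In terms of the SEIZ parameters, the ratio of flux entering E to flux leaving E is commonly written as RSI = ((1-p)·β + (1-l)·b) / (ρ + ε). The slide's own formula did not survive extraction, so treat this closed form as an assumption consistent with the description above. The parameter values are hypothetical.

```python
def response_ratio(beta, b, rho, eps, p, l):
    """RSI: rate of entering the Exposed compartment over rate of leaving it."""
    inflow = (1 - p) * beta + (1 - l) * b  # S -> E via contact with I or Z
    outflow = rho + eps                    # E -> I via contact or incubation
    if outflow == 0:
        raise ValueError("rho + eps must be positive")
    return inflow / outflow

# Hypothetical fitted parameters for one news story or rumor.
rsi = response_ratio(beta=0.8, b=0.4, rho=0.3, eps=0.1, p=0.3, l=0.5)
print(round(rsi, 3))  # roughly 1.9
```

A single number per cascade makes it possible to compare stories side by side, which is how the response ratio is used in the Ebola results later in the deck.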

slide-52
SLIDE 52

Dataset: Ebola related rumors

Table 2: Ebola related news stories
1. Dallas: The first Ebola patient (Duncan) identified in the US (Dallas).
2. Spencer: The specific symptoms and travel activities of Spencer in the days before he was diagnosed.
3. NYC: The first confirmation of an Ebola patient in New York City.

Can you believe?

slide-53
SLIDE 53

Ebola related rumor distribution

slide-54
SLIDE 54

09/30/2014 10/01/2014 10/02/2014

Patent rumor First US patient news

Difference between rumor and news propagation

slide-55
SLIDE 55

Ebola rumors cluster

Rumors are color coded consistently across the two frames. 09/29/2014 10/06/2014

slide-56
SLIDE 56

SEIZ results of Ebola rumors

White Zombie Airborne Patent Response ratio of 3 real news and 10 rumors

slide-57
SLIDE 57

SEIZ results of Ebola rumors


White Zombie Airborne Patent Response ratio of 3 real news and 10 rumors

slide-58
SLIDE 58

Reference

  • 1. Liang Zhao, Feng Chen, Jing Dai, Ting Hua, Chang-Tien Lu, and Naren Ramakrishnan. "Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling." PLOS ONE, vol. 9, no. 10 (2014): e110206.

  • 2. Fang Jin, Feng Chen, Rupinder Paul Khandpur, Chang-Tien Lu, Naren Ramakrishnan. Absenteeism Detection in Social Media, in Proceedings of the SIAM International Conference on Data Mining (SDM'17), Houston, TX, April 2017.

  • 3. Fang Jin, Rupinder Paul Khandpur, Nathan Self, Edward Dougherty, Sheng Guo, Feng Chen, B. Aditya Prakash, Naren Ramakrishnan. Modeling Mass Protest Adoption in Social Network Communities using Geometric Brownian Motion, in Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'14), Aug 2014.

  • 4. Fang Jin, Edward Dougherty, Parang Saraf, Peng Mi, Yang Cao, and Naren Ramakrishnan. Epidemiological modeling of news and rumors on Twitter, in Proceedings of the 7th ACM SIGKDD Workshop on Social Network Mining and Analysis (SNA-KDD 2013), Chicago, IL, 2013, pages 8:1-8:9.

  • 5. Fang Jin, Wei Wang, Liang Zhao, Edward Dougherty, Yang Cao, Chang-Tien Lu, Naren Ramakrishnan. Misinformation Propagation in the age of Twitter, IEEE Computer, Volume 47, Issue 12, pages 90-94, Dec 2014.

  • 6. Fang Jin, Nathan Self, Parang Saraf, Patrick Butler, Wei Wang, Naren Ramakrishnan. Forex-Foreteller: Currency Trend Modeling using News Articles, in Proceedings of the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining - Demo Track, pages 1470-1473, Aug 2013.

slide-59
SLIDE 59

Response ratio time series

Ø Response ratio is dynamically changing
Ø Need to train a classifier to dynamically classify the two classes.

slide-60
SLIDE 60

Response ratio time series

Ø Response ratio is dynamically changing
Ø Need to train a classifier to dynamically classify the two classes.

slide-61
SLIDE 61

Thank you

Fang Jin: fang.jin@ttu.edu