Outtwitting the Twitterers: Predicting Information Cascades in Microblogs - PowerPoint PPT Presentation

Outtwitting the Twitterers: Predicting Information Cascades in Microblogs
Wojciech Galuba, Karl Aberer (EPFL, Switzerland)
Dipanjan Chakraborty (IBM Research India)
Zoran Despotovic, Wolfgang Kellerer (Docomo Euro-Labs, Munich, Germany)

Why study information flows in OSNs?
Examples of flows: casual link sharing, breaking news, activism, viral marketing, emergencies, PR campaigns
Modeling these flows yields:
- improve how information flows
- new applications
- insights into underlying sociology

Information overload?
- Full-time job: reading tweets 40 h a week at 150 WPM
- Median: 23 tweets/h, 552 tweets/day (Sep 2009 data)

OSN information spread modeling
- Related work: generative models
  - reproduce statistical properties of information spread
  - predict coarse-grained aggregates, e.g. the number of nodes reached by the spread
- Our approach:
  - look at URL diffusion on Twitter
  - can we predict which user will mention which URL, and with what probability?

Why predict URL tweets?
- Protect from information overload: sort incoming URLs by probability of retweeting
- Viral marketing: select a subset of users that ensures successful URL propagation
- Spam detection: mispredictions are a sign of anomalous activity

Data
- 300-hour window in Sep '09
- 22M tweets
- 2.7M unique users
- 15M unique URLs
- 700M connections in the follower graph
- Approx. 1/15th of the Twitter traffic

Follower graph*
* Active users only: those that have sent at least one URL in the 300 h window.

Follower graph*
Mean (directed): 3.61
* Active users only: those that have sent at least one URL in the 300 h window.

User activity

Per-URL activity

Information cascades
- Nodes: users that mentioned a given URL
- Arcs: information flow

Re-tweeting

RT-cascade
Example tweets: @alice: http://url.com; @bob: RT @alice http://url.com; @charlie: http://url.com
- Arcs: who retweets whom
- Irrespective of whether users follow one another
- Single parent: only the user name immediately after "RT" is taken into account
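A minimal sketch of how RT-cascade arcs could be extracted from tweet text, assuming tweets arrive as (author, text) pairs and that a retweet is marked by "RT @name". The names `RT_PATTERN`, `rt_parent` and `rt_cascade_arcs` are hypothetical and not taken from the slides.

```python
import re

# Assumption: a retweet credits its source as "RT @name".
# Only the first user name right after "RT" is used (single parent).
RT_PATTERN = re.compile(r"\bRT\s+@(\w+)")

def rt_parent(text):
    """Return the user name credited right after the first 'RT', or None."""
    match = RT_PATTERN.search(text)
    return match.group(1) if match else None

def rt_cascade_arcs(tweets):
    """tweets: iterable of (author, text). Returns a set of (parent, child) arcs.

    The follower graph is deliberately not consulted: arcs exist
    irrespective of whether the users follow one another.
    """
    arcs = set()
    for author, text in tweets:
        parent = rt_parent(text)
        if parent is not None and parent != author:
            arcs.add((parent, author))
    return arcs

tweets = [
    ("alice", "http://url.com"),
    ("bob", "RT @alice http://url.com"),
    ("charlie", "http://url.com"),
]
print(rt_cascade_arcs(tweets))  # {('alice', 'bob')}
```

Because each tweet contributes at most one parent, the arcs of one URL form a tree (or a forest), and @charlie stays disconnected in this example.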

F-cascade
Example tweets: @alice: http://url.com; @bob: http://url.com; @charlie: http://url.com
- Arc @a -> @b exists if:
  - user @a mentioned the URL before user @b
  - user @b follows user @a
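The two arc conditions can be sketched as follows. This is an illustration under assumptions: mentions are (timestamp, user) pairs for a single URL, the follower graph is a dictionary from a user to the set of users they follow, and only each user's earliest mention counts. The function name `f_cascade_arcs` is hypothetical.

```python
def f_cascade_arcs(mentions, follows):
    """Build F-cascade arcs for one URL.

    mentions: list of (timestamp, user) pairs for that URL.
    follows:  dict mapping a user to the set of users they follow.
    Returns arcs (a, b) where a mentioned the URL before b and b follows a.
    """
    # Assumption: only the earliest mention per user matters.
    first = {}
    for ts, user in mentions:
        if user not in first or ts < first[user]:
            first[user] = ts

    arcs = set()
    for b, t_b in first.items():
        for a in follows.get(b, set()):
            if a in first and first[a] < t_b:
                arcs.add((a, b))
    return arcs

mentions = [(1, "alice"), (2, "bob"), (3, "charlie")]
follows = {"bob": {"alice"}, "charlie": {"alice", "bob"}}
arcs = f_cascade_arcs(mentions, follows)
print(sorted(arcs))
# [('alice', 'bob'), ('alice', 'charlie'), ('bob', 'charlie')]
```

Here @charlie has two parents, so the result is a DAG rather than a tree. Arcs always point forward in time, which rules out cycles.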

RT-cascades vs. F-cascades
- RT-cascades are trees
- F-cascades are DAGs
- 33% of the retweets credit a source that the user does not directly follow

Cascade and subcascade

Subcascade size

Cascade fragmentation

Cascade depth

Influence of the root

Information diffusion rate
Median: 50 mins

URL tweeting prediction
- Based on the past URL retweets by users, predict the future ones
- Find the probability $p_i^u$ that user $i$ mentions URL $u$

Influence: $\alpha_{ij}$

External influence: $\beta_i$

URL virality: $\gamma_u$ (example URL: http://cnn.com/)

Per-user diffusion delay: $\mu_i, \sigma_i^2$
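The slide gives only the two per-user delay parameters. As a hedged illustration, the sketch below assumes the delay follows a log-normal distribution with log-mean $\mu_i$ and log-variance $\sigma_i^2$, and evaluates its cumulative distribution function. The function name `lognormal_cdf` and the use of the CDF as the time-dependent term are assumptions, not the authors' stated formula.

```python
import math

def lognormal_cdf(delay, mu, sigma):
    """P(D <= delay) for a log-normal delay D.

    mu and sigma are the mean and standard deviation of log(D).
    Assumption: this is one plausible way to turn (mu_i, sigma_i^2)
    into a time-dependent factor; the slides do not spell it out.
    """
    if delay <= 0:
        return 0.0
    z = (math.log(delay) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# A user whose median delay is 50 minutes has mu = log(50),
# since the median of a log-normal distribution is exp(mu).
mu_i = math.log(50.0)
sigma_i = 1.0  # hypothetical value
print(lognormal_cdf(50.0, mu_i, sigma_i))   # 0.5
print(lognormal_cdf(200.0, mu_i, sigma_i))  # larger: more time has passed
```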

Model
Parameters: $\alpha_{ij}$, $\beta_i$, $\mu_i, \sigma_i^2$, $\gamma_u$ (example URL: http://cnn.com/)

At-Least-One (ALO) model
$p_i^u = P(\text{at least one event happens}) \times \text{temporal component}$
- Inputs to the event term: $\alpha_{ij}\,\gamma_u\,p_j^u$, $\beta_i$
- Inputs to the temporal component: $\mu_i, \sigma_i^2$
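A minimal sketch of the "at least one event happens" term, assuming the events are independent, that each influencing user $j$ contributes an event with probability equal to the product of the influence weight, $\gamma_u$ and $p_j^u$, and that external influence contributes an event with probability $\beta_i$. The temporal component is passed in as a plain number. The function names and this exact decomposition are assumptions for illustration.

```python
def at_least_one(probabilities):
    """P(at least one of several independent events happens)."""
    none_happen = 1.0
    for p in probabilities:
        none_happen *= 1.0 - p
    return 1.0 - none_happen

def alo_probability(influences, gamma_u, beta_i, temporal=1.0):
    """Sketch of an At-Least-One style prediction for user i and URL u.

    influences: list of (alpha, p_j) pairs, one per influencing user j,
                where alpha is the influence weight between j and i and
                p_j is the probability that j mentions the URL.
    gamma_u:    virality of the URL.
    beta_i:     external influence on user i.
    temporal:   value of the temporal component, supplied by the caller.
    """
    events = [alpha * gamma_u * p_j for alpha, p_j in influences]
    events.append(beta_i)
    return at_least_one(events) * temporal

p = alo_probability([(0.5, 1.0), (0.2, 0.5)], gamma_u=0.5, beta_i=0.1)
print(round(p, 5))  # 0.35875
```

In the example the three event probabilities are 0.25, 0.05 and 0.1, so the chance that none happens is 0.75 * 0.95 * 0.9 = 0.64125.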

Linear threshold (LT) model
$p_i^u = \text{thresholding function (sigmoid)} \times \text{temporal component}$
- Inputs to the thresholding function: $\alpha_{ij}\,\gamma_u\,p_j^u$, $\beta_i$
- Inputs to the temporal component: $\mu_i, \sigma_i^2$
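A hedged sketch of a linear-threshold style prediction: the influence terms are summed, external influence is added, and the total is passed through a sigmoid. The exact linear combination and the `steepness` and `threshold` parameters are assumptions, since the slide only names a sigmoid thresholding function.

```python
import math

def sigmoid(x, steepness=10.0, threshold=0.5):
    """Smooth threshold: close to 0 below `threshold`, close to 1 above it.

    steepness and threshold are hypothetical tuning parameters.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))

def lt_probability(influences, gamma_u, beta_i, temporal=1.0):
    """Sketch of a Linear Threshold style prediction for user i and URL u.

    influences: list of (alpha, p_j) pairs, one per influencing user j.
    Assumption: contributions add up linearly before thresholding.
    """
    total = sum(alpha * gamma_u * p_j for alpha, p_j in influences) + beta_i
    return sigmoid(total) * temporal

weak = lt_probability([(0.1, 1.0)], gamma_u=1.0, beta_i=0.0)
strong = lt_probability([(0.6, 1.0), (0.4, 1.0)], gamma_u=1.0, beta_i=0.0)
print(weak < 0.5 < strong)  # True
```

Unlike the at-least-one combination, several weak influences here can add up and cross the threshold together.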

Performance metrics
- Recall: fraction of tweets predicted out of all tweets that happened
- Precision: fraction of true positives out of all tweets predicted
- F-score: harmonic mean of recall and precision
- F-score is the optimization goal
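The three metrics can be computed as in the sketch below, assuming predictions and observations are both represented as sets of (user, URL) pairs. That representation and the function name `precision_recall_f` are assumptions; the slides do not say how predicted probabilities are turned into yes/no predictions.

```python
def precision_recall_f(predicted, actual):
    """Return (precision, recall, F-score) for two collections of (user, URL) pairs."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score

predicted = {("alice", "u1"), ("bob", "u1"), ("carol", "u2")}
actual = {("alice", "u1"), ("bob", "u1"), ("dave", "u1"), ("erin", "u2")}
precision, recall, f_score = precision_recall_f(predicted, actual)
print(round(precision, 3), round(recall, 3), round(f_score, 3))  # 0.667 0.5 0.571
```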

Learning
- Input: a time window of tweets
- Computation: gradient ascent method
- Parameter space: $\alpha_{ji}, \beta_i, \gamma_u, \mu_i, \sigma_i^2$
- Goal: maximize F-score
- Output: $p_i^u$

Lineup
- LT: Linear Threshold model
- LTr: Linear Threshold model with $\alpha_{ij}$ instead of $\alpha_{ji}$
- ALO: At-Least-One model
- RND: baseline, makes random guesses about $p_i^u$

* Training data: first 150 h; test data: next 150 h; results for 100 random URLs.

Summary
- Log-normal degree distribution
- Small-world: 3.6 hops from user to user
- Power laws in the user activity and URL mentions
- Cascades are shallow: exponential depth falloff
- Log-normally distributed diffusion delay
- The LT model:
  - predicts more than half of the URL tweets
  - with less than 15% false positive rate

Ongoing work
- Investigating mispredictions: URLs, users
- Scaling up the real-time data mining: continuous MapReduce, crawler farm
- Website: personalized URL rankings for Twitter users
- Apply to other systems
