How Predictable is Information Diffusion?
Travis Martin, Jake Hofman, Amit Sharma, Ashton Anderson, and Duncan Watts
How Predictable is Information Diffusion? 1 / 36
How far will this spread?
unified framework
(What we know and how we got here)
Katz & Lazarsfeld (1955)
Rogers (1962), Bass (1969)
$p > \frac{(1 + \epsilon)\ln n}{n}$
Erdős & Rényi (1959)
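The inequality above is the Erdős–Rényi connectivity threshold: in a random graph G(n, p), if p > (1 + ε) ln n / n for a fixed ε > 0, the graph is connected with probability tending to 1 as n grows, while for p < (1 − ε) ln n / n it almost surely has isolated nodes. A minimal Python sketch (the function names are hypothetical, chosen for illustration) samples G(n, p) just below and just above the threshold and checks connectivity with breadth-first search.

```python
import math
import random
from collections import deque


def connectivity_threshold(n: int, eps: float = 0.0) -> float:
    """Edge probability (1 + eps) * ln(n) / n for G(n, p)."""
    return (1.0 + eps) * math.log(n) / n


def sample_gnp(n: int, p: float, rng: random.Random):
    """Sample an Erdős–Rényi graph G(n, p) as an adjacency list."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # each pair is linked independently with prob. p
                adj[i].append(j)
                adj[j].append(i)
    return adj


def is_connected(adj) -> bool:
    """Breadth-first search from node 0; connected iff every node is reached."""
    if not adj:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


if __name__ == "__main__":
    rng = random.Random(0)
    n, trials = 200, 30
    for eps in (-0.5, 0.5):
        p = connectivity_threshold(n, eps)
        hits = sum(is_connected(sample_gnp(n, p, rng)) for _ in range(trials))
        print(f"eps={eps:+.1f}  p={p:.4f}  connected in {hits}/{trials} samples")
```

At small n the transition is gradual, so the counts near the threshold vary from run to run; the contrast between the two settings sharpens as n grows.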
Newman, Barabasi, Watts (2006)
Liben-Nowell & Kleinberg (2007)
Category of Twitter users: B receives tweets from A (% of the tweets B receives that come from each category of A)
B \ A    Celeb   Media   Org     Blog
Celeb    38.27    6.23   1.55    3.98
Media     3.91   26.22   1.66    5.69
Org       4.64    6.41   8.05    8.70
Blog      4.94    3.89   1.58   22.55
Wu, Hofman, Mason, Watts (2011)
[Figure, three panels across seven datasets (Y! Kindness, Zync, Secretary Game, Twitter News, Twitter Videos, Friendsense, Y! Voice): (A) density, on a scale from 0.03% to 100% plus "All Else"; (B) tree size CCDF; (C) tree depth]
Goel, Goldstein, Watts (2012)
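Tree size and tree depth, two of the cascade summaries in the Goel, Goldstein, Watts (2012) figure, can be computed directly from who-adopted-from-whom links. A minimal sketch, assuming each adopter is linked to exactly one parent and the seed maps to None; the function name is hypothetical. Size counts every node in the tree, and depth counts edges on the longest root-to-leaf path, so a lone seed has size 1 and depth 0.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def tree_size_and_depth(parent: Dict[str, Optional[str]]) -> Tuple[int, int]:
    """Return (size, depth) of one cascade tree given child -> parent links."""
    children = defaultdict(list)
    roots = []
    for node, par in parent.items():
        if par is None:
            roots.append(node)  # the seed of the cascade
        else:
            children[par].append(node)
    if len(roots) != 1:
        raise ValueError("expected exactly one root per cascade")

    size, depth = 0, 0
    stack = [(roots[0], 0)]  # iterative DFS avoids recursion limits on deep trees
    while stack:
        node, d = stack.pop()
        size += 1
        depth = max(depth, d)
        stack.extend((child, d + 1) for child in children[node])
    return size, depth
```

A star-shaped "broadcast" with five direct adopters gives (6, 1), while a chain of the same size would give depth 5, which is why the two measures are plotted separately.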
[Figure: six example cascades, each plotting size against time]
Goel, Anderson, Hofman, Watts (2015)
support of the two-step flow of information
deal of diversity in diffusion patterns
how far they spread
(Evaluating the state-of-the-art under a unified framework)
Bakshy, Hofman, Mason, Watts (2011)
events across 1M users
correlation (R² ∼ 30%) between predicted and actual cascade sizes
comes from examining past performance of a user or piece of content
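The result above is summarized as a correlation-based R². Two related quantities go by that name: the squared Pearson correlation between predicted and actual values, and the coefficient of determination 1 − SS_res / SS_tot. They coincide for an ordinary least-squares fit with an intercept evaluated on its own training data, but can differ for out-of-sample or miscalibrated predictions, which is one reason numbers reported under different metrics are hard to compare. A small Python sketch of both (function names are illustrative, not from the paper):

```python
from statistics import fmean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def squared_correlation(actual, predicted):
    """Square of the Pearson correlation between actual and predicted values."""
    ma, mp = fmean(actual), fmean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)
```

With predictions perfectly correlated with the truth but off by a factor of two (actual = [1, 2, 3, 4], predicted = [2, 4, 6, 8]), the squared correlation is 1 while the coefficient of determination is −5.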
How much better can we do?
Topic model features outperform baselines (F1 = 0.47)
Social and content features beat humans (F1 = 0.46)
Content features lead to good performance (F1 = 0.90)
Detailed wording features are informative (Accuracy = 0.65)
Temporal features provide good performance (AUC = 0.88)
Each of these studies examines a different question with a different measure of success, evaluated on a different subset of data, which makes it difficult to assess overall progress.¹
¹ http://hunch.net/?p=22
We focus on predictions made prior to the events of interest: “X will succeed because of properties A, B, and C” rather than “X will succeed tomorrow because it is successful today”
skill Q and luck ε: $S = f(Q) + \epsilon$
variance remaining after conditioning on skill: $F = \frac{\mathbb{E}[\mathrm{Var}(S \mid Q)]}{\mathrm{Var}(S)} = 1 - R^2$
$R^2 = 0$ in a pure luck world
[Figure: the empirically observed P[Success], compared with P[Success | skill] in a “Luck World” and in a “Skill World”]
² Formalizes Mauboussin (2012)
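By the law of total variance, Var(S) = E[Var(S | Q)] + Var(E[S | Q]). The best possible predictor of S given Q is E[S | Q], whose R² is Var(E[S | Q]) / Var(S); substituting gives F = 1 − R². A minimal sketch estimates F from (skill, outcome) pairs, assuming skill takes a small number of discrete values so records can be grouped by exact match; continuous skill would need binning, which this sketch does not do. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def unexplained_fraction(records):
    """Estimate F = E[Var(S|Q)] / Var(S) from (q, s) pairs.

    Records sharing the same skill value q form one group. Each group's
    population variance estimates Var(S | Q = q), and weighting groups by
    their size gives the empirical expectation over Q.
    """
    groups = defaultdict(list)
    for q, s in records:
        groups[q].append(s)
    outcomes = [s for _, s in records]
    total_var = pvariance(outcomes)
    if total_var == 0:
        raise ValueError("outcomes have zero variance; F is undefined")
    within = sum(len(g) * pvariance(g) for g in groups.values()) / len(outcomes)
    return within / total_var
```

For records [(0, 0.0), (0, 2.0), (1, 4.0), (1, 6.0)], the within-group variance is 1 and the total variance is 5, so F = 0.2 and the best achievable R² is 0.8. In a pure luck world every group has the same spread as the whole population, F = 1, and R² = 0.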
2015
100 English-speaking domains with the most unique adopters
news, entertainment, videos, images, and products
Most users in our dataset have relatively few followers, although low-degree users are under-represented
[Figure: CCDF of the number of followers of a user]
Most cascades are small: fewer than 3% reach 10 or more users
[Figure: CCDF of cascade size]
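The distributions above are plotted as complementary cumulative distribution functions (CCDFs). A minimal sketch, assuming the convention P[X ≥ x] (some papers use P[X > x]) and a plain list of cascade sizes as input; the statement that fewer than 3% of cascades reach 10 or more users corresponds to one point of this curve, P[size ≥ 10]. Function names are hypothetical.

```python
from collections import Counter


def empirical_ccdf(values):
    """Return sorted (x, P[X >= x]) pairs for each distinct observed value."""
    n = len(values)
    counts = Counter(values)
    remaining = n  # number of observations >= the current x
    points = []
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]
    return points


def fraction_at_least(values, threshold):
    """Share of observations (e.g. cascade sizes) that are >= threshold."""
    return sum(v >= threshold for v in values) / len(values)
```

Plotting the returned points on log-log axes gives curves of the kind shown in the follower-count and cascade-size figures.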
Most cascades are started by low-degree users
[Figure: number of cascades vs. number of followers of a user; legend: number of users]
Cascades initiated by high-degree users tend to have larger reach
[Figure: mean cascade size for a typical user vs. number of followers of a user; legend: number of users]
Used a random forest to estimate success (cascade size) given skill (available features)
score, ODP category
number of posts, account creation time
topic for each user and tweet, along with an interaction term
each URL and user in the past
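The feature set includes the past performance of each URL and user, and a user's past success turns out to carry most of the predictive power. The exact definition is not given here; a minimal sketch, assuming a log of (timestamp, user, size) tuples and using the mean size of a user's strictly earlier cascades (with 0.0 for users with no history, an arbitrary choice), shows how to build such a feature without leaking the outcome being predicted. The function name is hypothetical, and the same code could be keyed on URL instead of user.

```python
from collections import defaultdict


def past_success_features(cascades):
    """For each cascade, summarize its seed user's *earlier* cascades.

    `cascades` is a list of (timestamp, user, size) tuples. Returns one
    (mean_past_size, num_past_cascades) pair per cascade, in input order,
    computed only from cascades that user started strictly before it.
    """
    order = sorted(range(len(cascades)), key=lambda i: cascades[i][0])
    history = defaultdict(list)  # user -> sizes of cascades seen so far
    features = [None] * len(cascades)
    i = 0
    while i < len(order):
        # Batch cascades with identical timestamps so none sees another's outcome.
        j = i
        t = cascades[order[i]][0]
        while j < len(order) and cascades[order[j]][0] == t:
            j += 1
        batch = order[i:j]
        for k in batch:
            _, user, _ = cascades[k]
            past = history[user]
            mean_past = sum(past) / len(past) if past else 0.0
            features[k] = (mean_past, len(past))
        for k in batch:
            _, user, size = cascades[k]
            history[user].append(size)  # reveal outcomes only after the batch
        i = j
    return features
```

Computing features this way keeps training and evaluation honest: each prediction uses only information that was available before the cascade started.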
Our best model explains roughly half of the variance in outcomes
Content features alone perform poorly
Basic user features provide a reasonable boost in performance
Past user success alone accounts for almost all of predictive power
performance from R² ∼ 30% to R² ∼ 50%
simple feature: a user’s past success
possible limit to the predictability of diffusion outcomes
(Exploring the limits to predicting success)
models, so we turn to numerical simulations where we have full access to and control of all relevant information
the same user with the same content
estimation error
similar to Twitter but smaller in size
standard SIR model
each combination of 10,000 different seed users and 800 different infectiousness values
to our empirical data
[Figure: panels (a) to (d), an example network with nodes r, s, t, u, v, w, x, y, z]
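The simulation setup above can be sketched with one common discrete-time variant of SIR. In this hypothetical version, each newly infected node tries once to infect each susceptible neighbor with probability beta (standing in for infectiousness) and then recovers; the graph, seeds, beta values, and number of runs are placeholders, not the authors' Twitter-like network, 10,000 seed users, or 800 infectiousness values. Function names are illustrative.

```python
import random
from collections import deque


def sir_cascade_size(adj, seed, beta, rng):
    """Final size of one SIR cascade started from `seed`.

    Each infected node gets a single chance to infect each susceptible
    neighbor with probability `beta`, then recovers, so the final size is
    the number of nodes ever infected.
    """
    infected = {seed}
    frontier = deque([seed])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in infected and rng.random() < beta:
                infected.add(v)
                frontier.append(v)
    return len(infected)


def simulate_outcomes(adj, seeds, betas, runs, rng_seed=0):
    """Repeat the cascade `runs` times for every (seed, beta) pair.

    Returns a dict mapping (seed, beta) -> list of final sizes.
    """
    rng = random.Random(rng_seed)
    return {
        (s, b): [sir_cascade_size(adj, s, b, rng) for _ in range(runs)]
        for s in seeds
        for b in betas
    }
```

Because each (seed, beta) pair is repeated with identical inputs, the spread of sizes within one list reflects randomness that no predictor given the seed and beta can remove.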
Outcomes are highly predictable when all content is identical
[Figure: theoretical limit on predictability vs. average content quality, with perfect knowledge, across levels of content heterogeneity]
Predictive performance decreases sharply with content diversity (e.g., a 15% variation around $R_0^* = 0.2$ gives an R² of 60%)
Outcomes are highly predictable assuming exact quality estimates
[Figure: theoretical limit on predictability vs. average content quality, with perfect knowledge, across levels of error in estimating quality]
Predictive performance decreases sharply with estimation error (e.g., R² < 60% with 30% error in estimating $R_0^* = 0.3$)
[Figure: theoretical limit on predictability vs. average content quality, with imperfect knowledge, across levels of error in estimating quality]
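Why estimation error can only lower predictability can be made precise in the skill/luck notation (S, Q, F) introduced earlier. Assume, for the sake of the argument rather than as a description of the authors' simulation, that the estimate $\hat Q$ depends on the outcome S only through the true quality Q. Then the best predictor given $\hat Q$ averages the best predictor given Q, and by the law of total variance averaging cannot increase variance:

$$
\hat Q \perp S \mid Q \;\Longrightarrow\; \mathbb{E}[S \mid \hat Q] = \mathbb{E}\big[\,\mathbb{E}[S \mid Q] \,\big|\, \hat Q\,\big]
$$

$$
R^2(\hat Q) = \frac{\mathrm{Var}\big(\mathbb{E}[S \mid \hat Q]\big)}{\mathrm{Var}(S)} \;\le\; \frac{\mathrm{Var}\big(\mathbb{E}[S \mid Q]\big)}{\mathrm{Var}(S)} = R^2(Q) = 1 - F
$$

So the perfect-knowledge curve is an upper bound, and any error in estimating quality pushes achievable R² below it.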
itself that is unpredictable, rather than our ability to estimate
predictability
approach to assessing predictability, rather than the specific numerical outcomes presented here
Most things don’t spread, but when they do, it’s difficult to predict success
Despite a great deal of research on the topic, it’s difficult to assess long-term progress in predicting success
State-of-the-art models explain roughly half of the variance in outcomes
This is likely due to randomness in the diffusion process itself, rather than to limits on our ability to estimate or model it