SLIDE 1

Modelling Cascades Over Time in Microblogs

Wei Xie, Feida Zhu, Siyuan Liu and Ke Wang* Living Analytics Research Centre Singapore Management University

* Ke Wang is from Simon Fraser University, and this work was done when the author was visiting Living Analytics Research Centre in Singapore Management University.

SLIDE 2

Motivation

Business applications such as viral marketing have driven a lot of research effort into predicting whether a cascade will go viral.

In real life, there are very few truly viral cascades.

Previous research work* shows that temporal features are the key predictor of cascade size.

* Justin Cheng, Lada A. Adamic, P. Alex Dow, Jon M. Kleinberg, Jure Leskovec: Can cascades be predicted? WWW 2014: 925-936

SLIDE 3

Time-aware Cascade Model

[Figure: an example cascade over users u0 to u5, shown at time t with re-share times t0 to t3, and again at time t + dt with an additional re-share time t4.]

SLIDE 5

Time-aware Cascade Model

$$
\begin{cases}
P(C(t + dt)) = P(C(t + dt) \mid C(t)) \cdot P(C(t)) \\
P(C(t_0)) = 1 \\
P(C(t + dt) \mid C(t)) = \prod_{u_i \in X^{(1)}(t)} P_i(t) \cdot \prod_{u_{i'} \in X^{(2)}(t)} \left(1 - P_{i'}(t)\right) \\
P_i(t) = h_i\left(t, \{t_j\}_{u_j \in \mathrm{Followee}_{(i)}(t)}; \Theta\right) \cdot dt
\end{cases}
$$
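As a rough illustration of the one-step factorisation above, the following Python sketch multiplies the per-user probabilities for a single step of length dt. It assumes (this reading is not stated on the slide) that X(1)(t) holds the users who re-share during [t, t + dt) and X(2)(t) the exposed users who do not. The function name and the example hazard values are hypothetical.

```python
def step_probability(resharing_hazards, silent_hazards, dt):
    """Sketch of P(C(t + dt) | C(t)) for one small time step.

    resharing_hazards: hazard rates h_i(t) of users assumed to re-share in the step
    silent_hazards:    hazard rates h_i'(t) of exposed users assumed to stay silent
    dt:                step length, small enough that every h * dt stays below 1
    """
    prob = 1.0
    for h in resharing_hazards:
        prob *= h * dt          # P_i(t) = h_i(t) * dt
    for h in silent_hazards:
        prob *= 1.0 - h * dt    # 1 - P_i'(t)
    return prob


# Hypothetical example: one user re-shares, two exposed users stay silent.
example = step_probability([0.5], [0.2, 0.1], dt=0.1)
```

Chaining such steps from t0, where P(C(t0)) = 1, gives the probability of the whole cascade under this reading.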

SLIDE 7

Time-aware Cascade Model

[Figure: the cascade diagram from the previous slides, annotated to distinguish users who have re-shared from users who haven't yet.]

SLIDE 9

Observations in Twitter

Observation 1. Only the first re-sharer matters.

$$P_i(t) = h_i(t, t_{j^\star}; \Theta) \cdot dt$$

where

$$j^\star = \arg\min_j \{\, t_j \mid u_j \in \mathrm{Followee}_{(i)}(t) \,\}$$

SLIDE 10

Observations in Twitter

Observation 2. The chance of a tweet being retweeted decreases as time goes by.

$$P_i(t) = h_i(\tau; \Theta) \cdot dt$$

where $\tau = t - t_{j^\star}$ and $h_i(\tau)$ is a decreasing function.
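The two observations reduce the hazard's input to a single number, the time elapsed since a user's earliest re-sharing followee. A minimal sketch of that computation, with a hypothetical function name and made-up timestamps, could look like this:

```python
def elapsed_since_first_exposure(t, followee_reshare_times):
    """Return tau = t - t_{j*}, where t_{j*} is the earliest re-share time
    among the user's followees that has happened by time t.

    Returns None if no followee has re-shared by time t (user not yet exposed).
    """
    observed = [tj for tj in followee_reshare_times if tj <= t]
    if not observed:
        return None
    return t - min(observed)


# Hypothetical example: followees re-shared at times 7, 3 and 12; we look at t = 10.
tau = elapsed_since_first_exposure(10.0, [7.0, 3.0, 12.0])
```

Only the re-share at time 3 matters here; the later ones at 7 and 12 do not change tau.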

SLIDE 11

Hazard Function Design

$$h(t) = \lim_{dt \to 0} \frac{P(t < T \le t + dt \mid T > t)}{dt} = \frac{f(t)}{1 - F(t)}$$

SLIDE 12

Hazard Function Design

$$H(t) = \int_0^t h(u)\,du = \int_0^t \frac{F'(u)}{1 - F(u)}\,du = -\log(1 - F(u))\Big|_0^t = -\log(1 - F(t))$$

SLIDE 13

Hazard Function Design

$$F(t) = 1 - e^{-H(t)}$$

SLIDE 14

Hazard Function Design

Exponential distribution:

$$H(t) = \frac{t}{\lambda} \;\Rightarrow\; F(t) = 1 - e^{-t/\lambda}$$

SLIDE 15

Hazard Function Design

Weibull distribution:

$$H(t) = \left(\frac{t}{\alpha}\right)^{\beta} \;\Rightarrow\; F(t) = 1 - e^{-(t/\alpha)^{\beta}}$$

SLIDE 17

Hazard Function Design

For both the exponential and the Weibull distribution:

$$H(\infty) = \infty \;\Rightarrow\; F(\infty) = 1 - e^{-\infty} \;\Rightarrow\; F(\infty) = 1$$
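The limit can also be checked numerically. The sketch below builds F(t) = 1 − e^{−H(t)} from a cumulative hazard and evaluates it far out in time for both distributions. It reads the exponential case as H(t) = t/λ; the helper names and parameter values are hypothetical.

```python
import math


def cdf_from_cumulative_hazard(H, t):
    """F(t) = 1 - exp(-H(t)) for any cumulative hazard function H."""
    return 1.0 - math.exp(-H(t))


def exponential_H(lam):
    """Cumulative hazard of the exponential distribution, read as H(t) = t / lam."""
    return lambda t: t / lam


def weibull_H(alpha, beta):
    """Cumulative hazard of the Weibull distribution, H(t) = (t / alpha) ** beta."""
    return lambda t: (t / alpha) ** beta


# Both cumulative hazards grow without bound, so F(t) approaches 1 for large t.
far_exponential = cdf_from_cumulative_hazard(exponential_H(2.0), 1000.0)
far_weibull = cdf_from_cumulative_hazard(weibull_H(2.0, 0.5), 1000.0)
```

Under either distribution every exposed user would eventually re-share, because the cumulative hazard is unbounded.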

SLIDE 20

Hazard Function Design

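$$H(\tau) = \lambda \cdot \left(1 - \left(\frac{\tau}{\alpha} + 1\right)^{-\beta}\right)$$

$$h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot \frac{\beta}{\alpha} \cdot \left(\frac{\tau}{\alpha} + 1\right)^{-(\beta + 1)}$$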
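To make the relationship between the two formulas concrete, here is a small sketch that implements both and compares h(τ) against a finite-difference derivative of H(τ). The function names and the parameter values (λ = 0.05, α = 10, β = 1.5) are illustrative assumptions, not values from the talk.

```python
def long_tail_H(tau, lam, alpha, beta):
    """Cumulative hazard H(tau) = lam * (1 - (tau / alpha + 1) ** (-beta))."""
    return lam * (1.0 - (tau / alpha + 1.0) ** (-beta))


def long_tail_h(tau, lam, alpha, beta):
    """Hazard rate h(tau) = lam * (beta / alpha) * (tau / alpha + 1) ** (-(beta + 1))."""
    return lam * (beta / alpha) * (tau / alpha + 1.0) ** (-(beta + 1.0))


def numeric_derivative(f, x, eps=1e-5):
    """Central finite difference, used only to sanity-check h = dH/dtau."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


# Hypothetical parameters for illustration.
lam, alpha, beta = 0.05, 10.0, 1.5
approx = numeric_derivative(lambda x: long_tail_H(x, lam, alpha, beta), 5.0)
exact = long_tail_h(5.0, lam, alpha, beta)
```

The hazard decreases with τ, and H(τ) starts at 0 and levels off at λ rather than growing without bound.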

SLIDE 25

Hazard Function Design

$$H(\tau) = \lambda \cdot \left(1 - \left(\frac{\tau}{\alpha} + 1\right)^{-\beta}\right)$$

α: scale parameter

β: shape parameter

λ: describes the eventual re-tweeting probability, since $F(\infty) \approx H(\infty) = \lambda$
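One way to see why F(∞) is close to H(∞), assuming λ is small (plausible for a re-tweeting probability, but an assumption here), is to combine F(t) = 1 − e^{−H(t)} from the earlier slides with a series expansion of the exponential:

$$F(\infty) = 1 - e^{-H(\infty)} = 1 - e^{-\lambda} = \lambda - \frac{\lambda^2}{2} + \cdots \approx \lambda \qquad (\lambda \ll 1)$$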

SLIDE 28

Hazard Rate Illustration

[Figure: retweeting rate versus time (minutes); tC and 60 are marked on the plot.]

[Figure: hazard rate versus time (minutes, 10 to 60), comparing the empirical rate with the estimated rate.]

SLIDE 29

Dataset

From a Singapore-based Twitter data set, we extract all retweets to construct retweeting cascades. In total we obtain 2,425,348 cascades.

SLIDE 30

Probabilistic Model Fitting

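TMt: Threshold Model

$$h_i(t) = \lambda \cdot s\left(|\mathrm{Followee}_{(i)}(t)|\right), \quad \text{where } s(x) = \frac{1}{1 + e^{-a(x - b)}}$$

TCM-CH: Constant Hazard

$$H(\tau) = \lambda \cdot \tau, \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda$$

TCM-EH: Exponential Hazard

$$H(\tau) = \lambda \cdot \left(1 - e^{-k \cdot \tau}\right), \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot k \cdot e^{-k \cdot \tau}$$

TCM-LH: Long tail Hazard (our proposed model)

$$H(\tau) = \lambda \cdot \left(1 - \left(\frac{\tau}{\alpha} + 1\right)^{-\beta}\right), \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot \frac{\beta}{\alpha} \cdot \left(\frac{\tau}{\alpha} + 1\right)^{-(\beta + 1)}$$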
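The slides do not spell out the fitting objective. A common choice in survival analysis, shown here purely as an assumption, is the log-likelihood in which a user who re-shares after delay τ contributes log h(τ) − H(τ) and a user still silent after τ contributes −H(τ). The sketch uses the TCM-EH formulas; the function name and inputs are hypothetical.

```python
import math


def eh_log_likelihood(event_taus, censored_taus, lam, k):
    """Survival-style log-likelihood under the exponential hazard (TCM-EH).

    event_taus:    delays tau after which exposed users re-shared
    censored_taus: delays tau after which exposed users were still silent
    Uses H(tau) = lam * (1 - exp(-k * tau)) and h(tau) = lam * k * exp(-k * tau).
    """
    def H(tau):
        return lam * (1.0 - math.exp(-k * tau))

    ll = 0.0
    for tau in event_taus:
        # log h(tau) written out analytically to avoid log(0) for large tau
        ll += math.log(lam) + math.log(k) - k * tau - H(tau)
    for tau in censored_taus:
        ll -= H(tau)
    return ll


# Hypothetical data: two re-shares after 1 and 4 minutes, three users silent after 30.
value = eh_log_likelihood([1.0, 4.0], [30.0, 30.0, 30.0], lam=0.1, k=0.5)
```

Maximising such a function over (λ, k) with any numerical optimiser would give parameter estimates; the same pattern applies to the other hazards by swapping in their h and H.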

SLIDE 31

Probabilistic Model Fitting

For each cascade, observe its development in the first $T_0$ for training, and the next $\Delta T$ for testing.
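A minimal sketch of this split, assuming the training window is the first T0 after the cascade starts and the test window is the ΔT that follows, is given below. The function name, the half-open window boundaries, and the example timestamps are assumptions for illustration.

```python
def split_cascade(reshare_times, t_start, T0, delta_T):
    """Split one cascade's re-share times into training and testing windows.

    Training: [t_start, t_start + T0)
    Testing:  [t_start + T0, t_start + T0 + delta_T)
    Re-shares after the testing window are ignored.
    """
    train_end = t_start + T0
    test_end = train_end + delta_T
    train = [t for t in reshare_times if t_start <= t < train_end]
    test = [t for t in reshare_times if train_end <= t < test_end]
    return train, test


# Hypothetical cascade with re-shares at minutes 0, 5, 12, 30 and 70.
train, test = split_cascade([0, 5, 12, 30, 70], t_start=0, T0=10, delta_T=50)
```

The model would be fitted on the training part and its predictions compared with what happens in the testing part.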

SLIDE 32

Probabilistic Model Fitting

SLIDE 33

Predicting Cascade Growth

SLIDE 34

Virality Prediction

SLIDE 35

Thanks

SLIDE 36

Our work is based on previous cascade models

J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12(3):211–223, 2001.

M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 561–568, 2011.

S. A. Myers, C. Zhu, and J. Leskovec. Information diffusion and external influence in networks. In The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12-16, 2012, pages 33–41, 2012.

M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. Modeling information propagation with survival theory. In ICML (3), pages 666–674, 2013.

N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 3147–3155, 2013.