Modelling Cascades Over Time in Microblogs Wei Xie , Feida Zhu, - - PowerPoint PPT Presentation

SLIDE 1

Modelling Cascades Over Time in Microblogs

Wei Xie, Feida Zhu, Siyuan Liu and Ke Wang*
Living Analytics Research Centre, Singapore Management University

* Ke Wang is from Simon Fraser University; this work was done while the author was visiting the Living Analytics Research Centre at Singapore Management University.

SLIDE 2

Motivation

  • Business applications such as viral marketing have driven much research effort on predicting whether a cascade will go viral.

  • In real life, very few cascades are truly viral.

  • Previous work* shows that temporal features are the key predictor of cascade size.

* Justin Cheng, Lada A. Adamic, P. Alex Dow, Jon M. Kleinberg, Jure Leskovec: Can cascades be predicted? WWW 2014: 925-936

SLIDE 8

Time-aware Cascade Model

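[Figure: a retweeting cascade over users u0 to u5, shown at time t (re-share times t0 to t3) and at time t + dt (one new re-share at t4)]

$$P(C(t + dt)) = P(C(t + dt) \mid C(t)) \cdot P(C(t)), \qquad P(C(t_0)) = 1$$

$$P(C(t + dt) \mid C(t)) = \prod_{u_i \in X^{(1)}(t)} P_i(t) \cdot \prod_{u_{i'} \in X^{(2)}(t)} \bigl(1 - P_{i'}(t)\bigr)$$

$$P_i(t) = h_i\bigl(t, \{t_j\}_{u_j \in \mathrm{Followee}^{(i)}(t)}; \Theta\bigr) \cdot dt$$

where $X^{(1)}(t)$ are the users who have re-shared (between $t$ and $t + dt$), $X^{(2)}(t)$ are the users who haven't yet, and $\mathrm{Followee}^{(i)}(t)$ are the followees of $u_i$ who have re-shared by time $t$.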
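The recursion for $P(C(t + dt))$ can be unrolled into a log-likelihood over a time grid. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a fixed step `dt`, treats each step as the interval (t, t + dt], and takes a caller-supplied hazard `hazard(t, followee_times)`. The names `cascade_log_likelihood`, `reshare_time` and `followees` are hypothetical. A user whose re-share falls inside the step contributes $\log P_i(t)$; every other user who hasn't re-shared yet contributes $\log(1 - P_{i'}(t))$.

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical hazard signature: hazard(t, followee_times) -> rate h_i.
# followee_times are the re-share times of u_i's followees up to time t.
Hazard = Callable[[float, List[float]], float]


def cascade_log_likelihood(
    reshare_time: Dict[str, Optional[float]],
    followees: Dict[str, List[str]],
    hazard: Hazard,
    t0: float,
    t_end: float,
    dt: float,
) -> float:
    """Sketch of log P(C(t_end)), unrolling
    P(C(t + dt)) = P(C(t + dt) | C(t)) * P(C(t)) with P(C(t0)) = 1.

    reshare_time maps every candidate user to a re-share time, or None if
    the user never re-shared. Users re-sharing at or before t0 (the seed)
    are taken as given. Each step covers the interval (t, t + dt].
    """
    log_p = 0.0  # log P(C(t0)) = log 1
    n_steps = int(round((t_end - t0) / dt))
    for k in range(n_steps):
        t = t0 + k * dt
        for user, t_user in reshare_time.items():
            if t_user is not None and t_user <= t:
                continue  # already in C(t): no longer at risk
            # Re-share times of this user's followees observed by time t.
            times = [
                reshare_time[f]
                for f in followees.get(user, [])
                if reshare_time.get(f) is not None and reshare_time[f] <= t
            ]
            p = hazard(t, times) * dt  # P_i(t) = h_i(...) * dt
            if t_user is not None and t < t_user <= t + dt:
                log_p += math.log(p)  # u_i in X^(1)(t): re-shares in this step
            else:
                log_p += math.log1p(-p)  # u_i in X^(2)(t): hasn't yet
    return log_p
```

The discretization only makes sense when `hazard * dt` stays well below 1, and the hazard must be positive for any user who actually re-shares inside a step (otherwise `math.log(0)` raises). For a chain u0 → u1 → u2 with a toy hazard of 0.1 for exposed users, the result is a sum of one `log(0.1)` term for u1's re-share step and `log(0.9)` terms for every other exposed step.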

SLIDE 10

Observations in Twitter

Observation 1. Only the first re-sharer matters.

$$P_i(t) = h_i(t, t_{j^\star}; \Theta) \cdot dt, \quad \text{where } j^\star = \arg\min_j \{t_j \mid u_j \in \mathrm{Followee}^{(i)}(t)\}$$

Observation 2. The chance that a tweet is retweeted decreases over time.

$$P_i(t) = h_i(\tau; \Theta) \cdot dt, \quad \text{where } \tau = t - t_{j^\star} \text{ and } h_i(\tau) \text{ is a decreasing function.}$$
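Observations 1 and 2 reduce the hazard to a function of a single delay. A minimal sketch, assuming hypothetical `reshare_time`/`followees` dictionaries and reading $\mathrm{Followee}^{(i)}(t)$ as the followees of $u_i$ who have re-shared by time $t$: find $j^\star$, then evaluate a user-supplied decreasing hazard `h(tau)` at $\tau = t - t_{j^\star}$. The helpers `first_resharer` and `reshare_prob` are illustrative names.

```python
from typing import Callable, Dict, List, Optional, Tuple


def first_resharer(
    i: str,
    t: float,
    reshare_time: Dict[str, Optional[float]],
    followees: Dict[str, List[str]],
) -> Optional[Tuple[str, float]]:
    """Return (j*, t_j*) with j* = argmin_j {t_j | u_j in Followee^(i)(t)},
    or None if none of u_i's followees has re-shared by time t."""
    candidates = [
        (reshare_time[j], j)
        for j in followees.get(i, [])
        if reshare_time.get(j) is not None and reshare_time[j] <= t
    ]
    if not candidates:
        return None
    t_star, j_star = min(candidates)  # earliest re-share time wins
    return j_star, t_star


def reshare_prob(
    i: str,
    t: float,
    dt: float,
    reshare_time: Dict[str, Optional[float]],
    followees: Dict[str, List[str]],
    h: Callable[[float], float],
) -> float:
    """P_i(t) = h_i(tau; Theta) * dt with tau = t - t_j* (Observations 1 and 2)."""
    first = first_resharer(i, t, reshare_time, followees)
    if first is None:
        return 0.0  # assumption: a user with no re-sharing followee is not exposed
    _, t_star = first
    return h(t - t_star) * dt
```

Because only $t_{j^\star}$ enters the hazard, later re-shares by other followees do not change $P_i(t)$; this is what makes the model cheap to evaluate per user.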

SLIDE 15

Hazard Function Design

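$$h(t) = \lim_{dt \to 0} \frac{P(t < T \le t + dt \mid T > t)}{dt} = \frac{f(t)}{1 - F(t)}$$

$$H(t) = \int_0^t h(u)\,du = \int_0^t \frac{F'(u)}{1 - F(u)}\,du = -\log\bigl(1 - F(u)\bigr)\Big|_0^t = -\log\bigl(1 - F(t)\bigr)$$

$$F(t) = 1 - e^{-H(t)}$$

Exponential distribution: $H(t) = t/\lambda \;\Rightarrow\; F(t) = 1 - e^{-t/\lambda}$

Weibull distribution: $H(t) = (t/\alpha)^{\beta} \;\Rightarrow\; F(t) = 1 - e^{-(t/\alpha)^{\beta}}$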
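The derivation above can be sanity-checked numerically. The sketch below uses hypothetical helper names, a plain trapezoid rule and a central difference (chosen for simplicity, not taken from the slides), and the Weibull case with α = 3 and β = 2: integrating h recovers H, and with F = 1 − e^(−H), the ratio f/(1 − F) recovers h.

```python
import math


def exponential_cum_hazard(t: float, lam: float) -> float:
    return t / lam  # H(t) = t / lambda


def weibull_cum_hazard(t: float, alpha: float, beta: float) -> float:
    return (t / alpha) ** beta  # H(t) = (t / alpha)^beta


def weibull_hazard(t: float, alpha: float, beta: float) -> float:
    # h(t) = dH/dt = (beta / alpha) * (t / alpha)^(beta - 1); needs t > 0 if beta < 1
    return (beta / alpha) * (t / alpha) ** (beta - 1)


def cdf_from_cum_hazard(H: float) -> float:
    return 1.0 - math.exp(-H)  # F(t) = 1 - e^{-H(t)}


def trapezoid(fn, a: float, b: float, n: int = 10_000) -> float:
    """Plain trapezoid rule for the integral of fn over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (fn(a) + fn(b))
    for k in range(1, n):
        total += fn(a + k * step)
    return total * step


alpha, beta = 3.0, 2.0

# H(5) = integral of h(u) from 0 to 5
H_numeric = trapezoid(lambda u: weibull_hazard(u, alpha, beta), 0.0, 5.0)


def weibull_cdf(x: float) -> float:
    return cdf_from_cum_hazard(weibull_cum_hazard(x, alpha, beta))


eps = 1e-5
f_at_2 = (weibull_cdf(2.0 + eps) - weibull_cdf(2.0 - eps)) / (2 * eps)  # f = F'
h_from_F = f_at_2 / (1.0 - weibull_cdf(2.0))  # h = f / (1 - F)
```

Both checks rely only on the identity $F = 1 - e^{-H}$, so the same helpers work for the exponential case by swapping in `exponential_cum_hazard`.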

SLIDE 17

Hazard Function Design

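Exponential distribution: $H(t) = t/\lambda \;\Rightarrow\; F(t) = 1 - e^{-t/\lambda}$

Weibull distribution: $H(t) = (t/\alpha)^{\beta} \;\Rightarrow\; F(t) = 1 - e^{-(t/\alpha)^{\beta}}$

In both cases $H(\infty) = \infty \;\Rightarrow\; F(\infty) = 1 - e^{-\infty} = 1$, so every exposed user would eventually re-tweet.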

SLIDE 20

Hazard Function Design

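$$H(\tau) = \lambda \cdot \left(1 - \left(\frac{\tau}{\alpha} + 1\right)^{-\beta}\right)$$

$$h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot \frac{\beta}{\alpha} \cdot \left(\frac{\tau}{\alpha} + 1\right)^{-(\beta + 1)}$$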
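The long-tail cumulative hazard and its derivative can be transcribed directly and checked against each other. This is a sketch with hypothetical function names and illustrative parameter values (λ = 0.01, α = 5, β = 1.5, not fitted to any data); it confirms by central difference that h is the derivative of H, and that H saturates at λ instead of diverging.

```python
def long_tail_cum_hazard(tau: float, lam: float, alpha: float, beta: float) -> float:
    """H(tau) = lam * (1 - (tau / alpha + 1)^(-beta))."""
    return lam * (1.0 - (tau / alpha + 1.0) ** (-beta))


def long_tail_hazard(tau: float, lam: float, alpha: float, beta: float) -> float:
    """h(tau) = dH/dtau = lam * (beta / alpha) * (tau / alpha + 1)^(-(beta + 1))."""
    return lam * (beta / alpha) * (tau / alpha + 1.0) ** (-(beta + 1.0))


# Illustrative parameter values (assumed, not fitted).
lam, alpha, beta = 0.01, 5.0, 1.5
tau, eps = 10.0, 1e-4

# Central-difference estimate of dH/dtau at tau.
finite_diff = (
    long_tail_cum_hazard(tau + eps, lam, alpha, beta)
    - long_tail_cum_hazard(tau - eps, lam, alpha, beta)
) / (2 * eps)
```

Unlike the exponential and Weibull cases, $H(\infty) = \lambda$ is finite, and $h(\tau)$ decays polynomially in $\tau$ rather than exponentially, which is what gives the hazard its long tail.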

SLIDE 25

Hazard Function Design

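$$H(\tau) = \lambda \cdot \left(1 - \left(\frac{\tau}{\alpha} + 1\right)^{-\beta}\right)$$

$$F(\infty) = 1 - e^{-H(\infty)} \approx H(\infty) = \lambda \quad \text{(for small } \lambda\text{)}$$

α: scale parameter; β: shape parameter; λ: describes the eventual re-tweeting probability.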
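The approximation $F(\infty) \approx \lambda$ follows from $1 - e^{-\lambda} = \lambda - \lambda^2/2 + \dots$, so its relative error is roughly $\lambda/2$. A minimal sketch (hypothetical function name) that evaluates the exact value with `math.expm1`, which stays accurate for small arguments:

```python
import math


def eventual_reshare_prob(lam: float) -> float:
    """F(inf) = 1 - exp(-H(inf)) = 1 - exp(-lam) under the long-tail hazard."""
    return -math.expm1(-lam)  # expm1 avoids cancellation when lam is small


for lam in (0.001, 0.01, 0.1, 1.0):
    print(f"lambda={lam:<6} F(inf)={eventual_reshare_prob(lam):.6f}")
```

For λ = 0.01 the gap between F(∞) and λ is about 5 × 10⁻⁵; at λ = 1 the approximation breaks down, since F(∞) = 1 − 1/e ≈ 0.63.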

SLIDE 28

Hazard Rate Illustration

[Figure: retweeting rate vs. time (minutes)]

[Figure: hazard rate vs. time (minutes, 0 to 60); empirical rate compared with estimated rate]
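The slide compares an empirical hazard rate with the estimated one, but does not say how the empirical curve was computed. One standard option, sketched here purely as an assumption, is the life-table estimate: for each time bin, the number of re-shares in the bin divided by the number of users still at risk at the start of the bin, times the bin width. `empirical_hazard` and `delays` are hypothetical names, and the sketch assumes every user is observed for the whole horizon (no early censoring).

```python
from typing import List, Optional


def empirical_hazard(
    delays: List[Optional[float]], horizon: float, width: float
) -> List[float]:
    """Life-table hazard estimate per bin [lo, lo + width).

    delays[k] is the delay (e.g. minutes since exposure) until user k
    re-shared, or None if the user had not re-shared by `horizon`.
    Assumes every user is observed for the full horizon (no early censoring).
    """
    n_bins = int(round(horizon / width))
    rates = []
    for b in range(n_bins):
        lo, hi = b * width, (b + 1) * width
        # Users still at risk at the start of the bin.
        at_risk = sum(1 for d in delays if d is None or d >= lo)
        # Re-shares that happen inside the bin.
        events = sum(1 for d in delays if d is not None and lo <= d < hi)
        rates.append(events / (at_risk * width) if at_risk else 0.0)
    return rates
```

Dividing by the at-risk count, rather than by all users, is what turns raw re-share counts into a hazard: users who already re-shared are no longer in the denominator.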

SLIDE 29

Dataset

From a Singapore-based Twitter dataset, we collect all retweets to construct retweeting cascades, obtaining 2,425,348 cascades in total.

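SLIDE 30

Probabilistic Model Fitting

  • TM: Threshold Model

$$h_i(t) = \lambda \cdot s\bigl(|\mathrm{Followee}^{(i)}(t)|\bigr), \quad \text{where } s(x) = \frac{1}{1 + e^{-a(x - b)}}$$

  • TCM-CH: Constant Hazard

$$H(\tau) = \lambda \cdot \tau, \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda$$

  • TCM-EH: Exponential Hazard

$$H(\tau) = \lambda \cdot \bigl(1 - e^{-k\tau}\bigr), \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot k \cdot e^{-k\tau}$$

  • TCM-LH: Long-tail Hazard (our proposed)

$$H(\tau) = \lambda \cdot \left(1 - \left(\frac{\tau}{\alpha} + 1\right)^{-\beta}\right), \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot \frac{\beta}{\alpha} \cdot \left(\frac{\tau}{\alpha} + 1\right)^{-(\beta + 1)}$$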
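The four models compared on this slide can be written side by side. The sketch below transcribes the slide formulas directly with hypothetical function names; it does not reproduce the authors' fitting procedure. Each TCM variant returns the pair (H(τ), h(τ)) so the cumulative and instantaneous forms can be checked against each other, while the threshold model depends on a count of re-shared followees rather than on τ.

```python
import math
from typing import Tuple


def sigmoid(x: float, a: float, b: float) -> float:
    """s(x) = 1 / (1 + e^{-a (x - b)})."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))


def tm_hazard(n_followees_reshared: int, lam: float, a: float, b: float) -> float:
    """TM: h_i(t) = lam * s(|Followee^(i)(t)|); depends on a count, not on tau."""
    return lam * sigmoid(n_followees_reshared, a, b)


def tcm_ch(tau: float, lam: float) -> Tuple[float, float]:
    """Constant hazard: returns (H(tau), h(tau)) = (lam * tau, lam)."""
    return lam * tau, lam


def tcm_eh(tau: float, lam: float, k: float) -> Tuple[float, float]:
    """Exponential hazard: H = lam (1 - e^{-k tau}), h = lam k e^{-k tau}."""
    decay = math.exp(-k * tau)
    return lam * (1.0 - decay), lam * k * decay


def tcm_lh(tau: float, lam: float, alpha: float, beta: float) -> Tuple[float, float]:
    """Long-tail hazard: H = lam (1 - (tau/alpha + 1)^-beta),
    h = lam (beta/alpha) (tau/alpha + 1)^-(beta + 1)."""
    base = tau / alpha + 1.0
    return lam * (1.0 - base ** -beta), lam * (beta / alpha) * base ** -(beta + 1.0)
```

TCM-EH and TCM-LH both saturate at $H(\infty) = \lambda$, whereas TCM-CH grows without bound; EH and LH differ in how fast $h(\tau)$ decays (exponentially versus polynomially).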

SLIDE 31

Probabilistic Model Fitting

For each cascade, we observe its development during the first $T_0$ for training and during the next $\Delta T$ for testing.
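The evaluation protocol splits each cascade in time: a training window of length $T_0$ followed by a test window of length $\Delta T$. A minimal sketch, assuming times are measured from the cascade's first event and that the windows are closed on the right; `split_cascade` and the sample times are hypothetical.

```python
from typing import List, Tuple


def split_cascade(
    times: List[float], T0: float, delta_T: float
) -> Tuple[List[float], List[float]]:
    """Split re-share times into a training window [start, start + T0]
    and a test window (start + T0, start + T0 + delta_T].

    Times are measured from the cascade's first event (an assumption).
    """
    start = min(times)
    train = [t for t in times if t - start <= T0]
    test = [t for t in times if T0 < t - start <= T0 + delta_T]
    return train, test


# Hypothetical cascade, times in minutes.
train, test = split_cascade([0.0, 5.0, 12.0, 30.0, 61.0, 200.0], T0=30.0, delta_T=60.0)
```

Events after $T_0 + \Delta T$ fall in neither window, so the size of `test` is the growth a model fitted on `train` would be asked to predict.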

SLIDE 32

Probabilistic Model Fitting

SLIDE 33

Predicting Cascade Growth

SLIDE 34

Virality Prediction

SLIDE 35

Thanks

SLIDE 36

Our work is based on previous cascade models:

  • J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12(3):211–223, 2001.

  • M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 561–568, 2011.

  • S. A. Myers, C. Zhu, and J. Leskovec. Information diffusion and external influence in networks. In The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12-16, 2012, pages 33–41, 2012.

  • M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. Modeling information propagation with survival theory. In ICML (3), pages 666–674, 2013.

  • N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 3147–3155, 2013.