Why Do Cascade Sizes Follow a Power-Law? Andrzej Pacuk, Piotr - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Why Do Cascade Sizes Follow a Power-Law?

Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Piotr Wygocki

University of Warsaw

Paper: bit.ly/why-cascades

WWW 2017

slide-2
SLIDE 2

Information Cascade

slide-3
SLIDE 3

Cascades as Graphs

◮ Given a social network,
◮ the process of spreading the information generates a graph (a DAG).

Figure: (a) social network, (b) cascade

slide-5
SLIDE 5

Cascades as Graphs

◮ Cascade ⇔ propagation graph
◮ Cascade size ⇔ rumour popularity

slide-6
SLIDE 6

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {a}, informed = {}

Figure: example social network with nodes a to g

slide-7
SLIDE 7

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {a, b}, informed = {}

slide-8
SLIDE 8

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {a, b, d}, informed = {}

slide-9
SLIDE 9

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {b, d}, informed = {a}

slide-10
SLIDE 10

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {b, d, c}, informed = {a}

slide-11
SLIDE 11

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {b, d, c}, informed = {a}

slide-12
SLIDE 12

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {b, d, c, f}, informed = {a}

slide-13
SLIDE 13

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {c, f}, informed = {a, b, d}

slide-14
SLIDE 14

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {c, f}, informed = {a, b, d}

slide-15
SLIDE 15

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {c, f, g}, informed = {a, b, d}

slide-16
SLIDE 16

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {g}, informed = {a, b, d, c, f}

slide-17
SLIDE 17

Cascade Generation Model (Leskovec et al. 2007)

newly informed = {}, informed = {a, b, d, c, f, g}
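The walkthrough above can be sketched in Python. This is a minimal sketch, not the original implementation, assuming the social network is given as a dictionary of neighbour lists and that every newly informed node gets one chance to pass the information to each neighbour with probability α (`alpha`). The function name `cgm_cascade` and the graph `EXAMPLE_GRAPH` are hypothetical: the edges of the slides' network are not recoverable, so only the node names a to g are reused.

```python
import random
from collections import deque


def cgm_cascade(graph, seed, alpha, rng=None):
    """Sketch of one run of the Cascade Generation Model.

    graph: dict mapping each node to a list of its neighbours (assumed format)
    seed:  node where the information starts
    alpha: probability that an informed node passes the information along an edge
    Returns the set of informed nodes once no newly informed nodes remain.
    """
    rng = rng or random.Random()
    reached = {seed}                 # every node that has heard the information
    newly_informed = deque([seed])   # nodes that have not yet tried their neighbours
    informed = set()                 # nodes that have finished spreading
    while newly_informed:
        node = newly_informed.popleft()
        # Each newly informed node gets exactly one attempt per uninformed neighbour.
        for neighbour in graph.get(node, []):
            if neighbour not in reached and rng.random() < alpha:
                reached.add(neighbour)
                newly_informed.append(neighbour)
        informed.add(node)
    return informed


# Hypothetical 7-node network; only the node names come from the slides.
EXAMPLE_GRAPH = {
    "a": ["b", "d", "e"],
    "b": ["a", "c"],
    "c": ["b", "f", "g"],
    "d": ["a", "f"],
    "e": ["a"],
    "f": ["c", "d", "g"],
    "g": ["c", "f"],
}

if __name__ == "__main__":
    print(sorted(cgm_cascade(EXAMPLE_GRAPH, "a", alpha=0.5, rng=random.Random(1))))
```

With `alpha=1.0` the cascade is simply every node reachable from the seed; with `alpha=0.0` it is the seed alone.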

slide-18
SLIDE 18

Cascade Generation Model (Leskovec et al. 2007)

Successfully predicts the sizes of small cascades

◮ Over 650 citations
◮ Many practical applications

slide-22
SLIDE 22

Problem

Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014])

◮ Log-log plots
◮ X-axis: cascade size
◮ Y-axis: probability of observing a cascade of that size
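To make the axes concrete, here is a small Python sketch (not the authors' code) that turns a list of observed cascade sizes into the empirical probability of each size and fits a straight line in log-log space. The function names and the toy data are hypothetical, and the least-squares slope is only a rough diagnostic; for real data, maximum-likelihood estimators of the power-law exponent are generally preferred.

```python
import math
from collections import Counter


def size_distribution(sizes):
    """Empirical probability of observing a cascade of each size."""
    counts = Counter(sizes)
    total = len(sizes)
    return {k: c / total for k, c in sorted(counts.items())}


def loglog_slope(distribution):
    """Least-squares slope of log P(k) against log k (a crude power-law check).

    Needs at least two distinct cascade sizes, otherwise the variance is zero.
    """
    xs = [math.log(k) for k in distribution]
    ys = [math.log(p) for p in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    toy_sizes = [1, 1, 1, 1, 2, 2, 4]   # hypothetical cascade sizes
    dist = size_distribution(toy_sizes)
    print(dist)
    print(loglog_slope(dist))
```

The toy sizes are chosen so that P(k) = (4/7)/k exactly, so the fitted slope is −1 up to floating-point rounding.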

slide-25
SLIDE 25

Problem

Possible explanations for why this problem occurs:

◮ Large number of Strongly Connected Components in a graph (Lee et al. PAKDD 2014)
◮ Burstiness of human behaviour (Mathews et al. WWW 2017)
◮ Time and space effects (Cui et al. CIKM 2014)
◮ Preferential attachment
◮ Many others!

These explanations are only empirical: you need a social network to generate a random cascade.

slide-28
SLIDE 28

What do we model?

◮ We propose a model that outputs the probability of occurrence of each cascade.
◮ For example, the first cascade occurs with probability 2%, the second with 1%, etc.
◮ The model has parameters that can be adjusted to the social network.

slide-31
SLIDE 31

What do we model? (a small analogy)

Recap

◮ Power-law graphs: the number of nodes is fixed; the degree distribution follows a power law.
◮ Power-law cascades: the number of nodes follows a power law; the degree distribution is fixed.

Empirical vs. theoretical sources:

◮ Random power-law graph. Empirical: data from social networks. Theoretical: Barabási-Albert, Watts-Strogatz, Bianconi-Barabási (and many others).
◮ Random power-law cascade. Empirical: data from social networks. Theoretical: this paper.

slide-45
SLIDE 45

Model (this paper)

Figure: five ordered nodes, numbered 1 to 5

◮ Start with n ordered nodes.
◮ Yellow: the selected node.
◮ Black edge with probability p.
◮ Red edge otherwise.
◮ Remove red edges.
◮ Generate a cascade using CGM (Leskovec et al. 2007).
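One possible reading of this construction, written as a minimal Python sketch. It assumes that edges only go from a node to nodes later in the order (so the graph is a DAG, like a cascade), that each such edge is kept (black) with probability p and dropped (red) otherwise, and that the cascade is then generated with the CGM transmission probability α from a node chosen uniformly at random. The helper names `random_ordered_dag` and `cascade_size` are hypothetical, and the quadratic loop is only meant for small n.

```python
import random


def random_ordered_dag(n, p, rng):
    """Keep each forward edge i -> j (i < j) with probability p ("black"),
    drop it otherwise ("red"). Forward-only edges are an assumption."""
    return {i: [j for j in range(i + 1, n) if rng.random() < p] for i in range(n)}


def cascade_size(n, p, alpha, start=None, rng=None):
    """Size of one CGM cascade (transmission probability alpha) on a fresh graph."""
    rng = rng or random.Random()
    graph = random_ordered_dag(n, p, rng)
    if start is None:
        start = rng.randrange(n)          # start at a uniformly random node
    informed = {start}
    to_process = [start]                  # informed nodes that have not spread yet
    while to_process:
        node = to_process.pop()
        for nxt in graph[node]:
            # At most one transmission attempt per edge, succeeding with probability alpha.
            if nxt not in informed and rng.random() < alpha:
                informed.add(nxt)
                to_process.append(nxt)
    return len(informed)


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [cascade_size(100, 0.05, 0.5, rng=rng) for _ in range(200)]
    print(max(sizes), sum(sizes) / len(sizes))
```

Collecting many such sizes and plotting their empirical distribution on log-log axes is one way to compare the sketch with the theorem in the Analysis slide; the sketch has not been checked against the released implementation.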

slide-46
SLIDE 46

Model (this paper)

Figure: Sample result for a large n

slide-47
SLIDE 47

Analysis

Theorem (this paper). If you start the Cascade Generation Model (with transmission probability α) at a random node, then

$P[|\mathrm{Informed}| = k] \sim \frac{1}{n(1 - \beta^{k})}$

◮ n: number of nodes
◮ k: cascade size
◮ β: model parameter, usually close to 1, because 1 − β is the probability that there is an edge (parameter p) and that the information is transferred along it (α), i.e., 1 − β = pα.
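The step from the theorem to a power law can be checked numerically. Informally, under the ordered-DAG reading of the model, a node further along the order joins a cascade that currently has m informed nodes unless all m possible transmissions fail, which happens with probability β^m; this is where the 1 − β^k term comes from. The Python sketch below (function names hypothetical; the parameter `q` stands for 1 − β, and n and q are illustrative values) compares the theorem's expression with the pure 1/k approximation: the ratio stays close to 1 while k is well below 1/(1 − β) and drifts away as k approaches it.

```python
import math


def theorem_value(k, n, q):
    """1 / (n * (1 - beta**k)) from the theorem, written with q = 1 - beta.

    -expm1(k * log1p(-q)) computes 1 - (1 - q)**k accurately when q is tiny.
    """
    return 1.0 / (n * -math.expm1(k * math.log1p(-q)))


def power_law_value(k, n, q):
    """Small-k approximation: 1 - beta**k is roughly k * (1 - beta), giving a 1/k law."""
    return 1.0 / (n * k * q)


if __name__ == "__main__":
    n, q = 10**6, 1e-4   # illustrative: 1/q = 10**4 is much smaller than n
    for k in (1, 10, 100, 1_000, 10_000):
        ratio = theorem_value(k, n, q) / power_law_value(k, n, q)
        print(f"k={k:>6}  theorem / power law = {ratio:.4f}")
```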

slide-50
SLIDE 50

Takeaway Message

The model obeys a power law when:

◮ The number of nodes is large (n is large).
◮ The probability that a node informs a possibly unrelated node is small (1 − β is small).
◮ The largest cascade size k is small compared with 1/(1 − β) (e.g., in our data the largest cascade had 70 000 nodes).

slide-52
SLIDE 52

Approximation

Approximation (this paper). When $k \ll 1/(1-\beta) \ll n$, the distribution of cascade sizes follows a power law:

$P[|\mathrm{Informed}| = k] \sim k^{-\gamma}$

Observations (2 weeks of publicly available Twitter data):

◮ largest cascade: $k \approx 7 \cdot 10^{4}$
◮ $1/(1-\beta) \approx 5 \cdot 10^{6}$
◮ $n \approx 3 \cdot 10^{8}$
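Plugging the Twitter estimates into the conditions of the approximation is simple arithmetic; the Python snippet below (variable names are illustrative) does it explicitly. Both ratios come out at roughly 10^{-2} (about 0.014 and 0.017), and at the largest cascade the theorem's expression differs from the pure power law by less than 1%.

```python
import math

k_max = 7e4                  # largest observed cascade
inv_one_minus_beta = 5e6     # estimate of 1 / (1 - beta)
n = 3e8                      # number of nodes

q = 1.0 / inv_one_minus_beta                 # 1 - beta
k_over_scale = k_max * q                     # should be << 1 for k << 1/(1 - beta)
scale_over_n = inv_one_minus_beta / n        # should be << 1 for 1/(1 - beta) << n
# Ratio of the theorem's 1 / (n (1 - beta^k)) to the power law 1 / (n k (1 - beta)).
theorem_over_power_law = k_over_scale / -math.expm1(k_max * math.log1p(-q))

print(f"k * (1 - beta)       = {k_over_scale:.4f}")
print(f"(1 / (1 - beta)) / n = {scale_over_n:.4f}")
print(f"theorem / power law at k_max = {theorem_over_power_law:.4f}")
```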

slide-53
SLIDE 53

Conclusion

We propose the first model for generating cascades with theoretical guarantees (more guarantees in the paper).

Thank You!

slide-54
SLIDE 54

Question: Ground Truth

◮ Open-sourced data from Twitter
◮ The data were preprocessed and cleaned to extract cascades
◮ New hashtags were treated as information cascades
◮ Connections come from retweets and replies (and usage of hashtags)
◮ See the paper (bit.ly/why-cascade) for the tedious statistical analysis

slide-55
SLIDE 55

Question: Open-source data

◮ Open-sourced anonymized data (Hypertext 2016): http://social-networks.mimuw.edu.pl/
◮ All source code is available
◮ Our experiments (plots for the data and the implementation) are also available (see bit.ly/why-cascades)

slide-56
SLIDE 56

Question: Problems with the model

For a concrete social network, the following can differ:

◮ the distribution of edges
◮ the hierarchical structure of the social network
◮ the power-law exponent: in our case the log-log slope is −1 ($P \sim k^{-1}$), but this is not always true

◮ When information spreads to almost all nodes, we do not model it properly (people probably learn about it from other sources, e.g., TV); in such situations our model does not reflect the real world.
◮ The same problem occurs in randomly generated power-law graphs.
◮ We encourage you to propose more models of random cascades with guarantees (just as there are many models of random power-law graphs).

slide-57
SLIDE 57

Question: The proof

◮ Meticulously write out the formulas for the propagation probabilities
◮ It turns out that they are recursive in this model
◮ To solve the recursion you could use generating functions
◮ But we simplified the proofs to avoid that technique