Why Do Cascade Sizes Follow a Power-Law?
Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Piotr Wygocki
University of Warsaw
Paper: bit.ly/why-cascades
WWW 2017
Cascades as Graphs
◮ Given a social network
◮ The process of spreading information generates a graph (a DAG)
(a) Social Network (b) Cascade
Cascade ⇔ Propagation Graph
Cascade size ⇔ Rumour Popularity
Figure: step-by-step example of a cascade on a social network with nodes a, b, c, d, e, f, g
newly informed = {a}, informed = {}
newly informed = {a, b}, informed = {}
newly informed = {a, b, d}, informed = {}
newly informed = {b, d}, informed = {a}
newly informed = {b, d, c}, informed = {a}
newly informed = {b, d, c, f}, informed = {a}
newly informed = {c, f}, informed = {a, b, d}
newly informed = {c, f, g}, informed = {a, b, d}
newly informed = {g}, informed = {a, b, d, c, f}
newly informed = {}, informed = {a, b, d, c, f, g}
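The trace above can be reproduced with a small simulation. This is a minimal sketch, not the authors' code: it assumes that every newly informed node tries each outgoing edge once and succeeds with probability `alpha`. The edge list in `example_graph` is hypothetical, because the real edges of the slide's figure are not recoverable from the text.

```python
import random


def generate_cascade(graph, start, alpha, rng=None):
    """Spread information wave by wave and return the set of informed nodes.

    graph: dict mapping a node to the list of nodes it can inform.
    alpha: assumed probability that a single transfer attempt succeeds.
    """
    rng = rng or random.Random()
    informed = set()
    newly_informed = {start}
    while newly_informed:
        next_wave = set()
        # sorted() keeps the order of random draws reproducible for a seeded rng
        for node in sorted(newly_informed):
            for neighbour in graph.get(node, ()):
                already_reached = (
                    neighbour in informed
                    or neighbour in newly_informed
                    or neighbour in next_wave
                )
                if not already_reached and rng.random() < alpha:
                    next_wave.add(neighbour)
        # nodes that finished spreading move from "newly informed" to "informed"
        informed |= newly_informed
        newly_informed = next_wave
    return informed


# Hypothetical edges chosen for illustration only.
example_graph = {
    "a": ["b", "d", "e"],
    "b": ["c"],
    "d": ["f"],
    "c": ["g"],
    "f": ["g"],
}

cascade = generate_cascade(example_graph, "a", alpha=0.5, rng=random.Random(0))
print(len(cascade))  # the cascade size, i.e. the "rumour popularity"
```

With `alpha=1.0` every reachable node is informed, and with `alpha=0.0` only the start node is, which gives two easy sanity checks.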
Successfully predicts small cascade sizes
◮ Over 650 citations
◮ Lots of practical applications
Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014])
◮ Log-log plots
◮ X-axis: cascade size
◮ Y-axis: probability of occurrence of a cascade of that size
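The points of such a plot can be computed as follows. This is a minimal sketch with made-up cascade sizes, not the data behind the slide's plots. The function names are hypothetical. On log-log axes an exact power law P(k) = C · k^(−γ) appears as a straight line with slope −γ.

```python
import math
from collections import Counter


def size_distribution(cascade_sizes):
    """Empirical probability of each cascade size (the Y-axis values)."""
    counts = Counter(cascade_sizes)
    total = len(cascade_sizes)
    return {size: count / total for size, count in sorted(counts.items())}


def log_log_points(distribution):
    """(log10 size, log10 probability) pairs, i.e. the plotted coordinates."""
    return [
        (math.log10(size), math.log10(prob))
        for size, prob in distribution.items()
    ]


def slope(point_a, point_b):
    """Slope of the line through two log-log points."""
    return (point_b[1] - point_a[1]) / (point_b[0] - point_a[0])


# Made-up sizes for illustration only.
sizes = [1, 1, 1, 1, 2, 2, 4, 10]
dist = size_distribution(sizes)
points = log_log_points(dist)
print(slope(points[0], points[1]))  # slope between sizes 1 and 2
```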
Possible explanations why this problem occurs:
◮ Large number of Strongly Connected Components in a graph (Lee et al. PAKDD 2014)
◮ Burstiness of human behaviour (Mathews et al. WWW 2017)
◮ Time and space effects (Cui et al. CIKM 2014)
◮ Preferential attachment
◮ Many others!
You need a social network to generate a random cascade
◮ We propose a model that outputs the probability of occurrence of each cascade
◮ For example, the first with 2%, the second with 1%, etc.
◮ With parameters that can be adjusted to the social network
Recap
◮ Power-Law Graphs – the number of nodes is fixed; the degree distribution follows a power law
◮ Power-Law Cascades – the number of nodes follows a power law; the degree distribution is fixed
                           Empirical                  Theoretical
Random Power-Law Graph     Data from social networks  Barabási-Albert, Watts-Strogatz, Bianconi-Barabási (and many others)
Random Power-Law Cascade   Data from social networks  This paper
◮ Start with n ordered nodes (numbered 1, 2, 3, 4, 5 in the example)
◮ Yellow – selected node
◮ Black edge with probability p
◮ Red edge otherwise
◮ Remove red edges
◮ Generate a cascade using the CGM (Leskovec et al. 2007)
Figure: Sample result for a large n
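The construction can be sketched in a few lines. This is an illustration under stated assumptions, not the released implementation: it assumes each selected node i gets a candidate edge to every later node j > i, and that a fresh graph is drawn for every sampled cascade. All function names are hypothetical.

```python
import random


def random_ordered_dag(n, p, rng):
    """Random DAG on nodes 1..n; each edge i -> j (i < j) is kept with probability p."""
    graph = {node: [] for node in range(1, n + 1)}
    for i in range(1, n + 1):          # the currently selected ("yellow") node
        for j in range(i + 1, n + 1):  # assumed: candidate edges go to later nodes
            if rng.random() < p:       # "black" edge: kept
                graph[i].append(j)
            # otherwise the edge is "red" and is simply not stored
    return graph


def run_cgm(graph, start, alpha, rng):
    """Wave-by-wave spreading; each transfer attempt succeeds with probability alpha."""
    informed, newly_informed = set(), {start}
    while newly_informed:
        next_wave = set()
        for node in sorted(newly_informed):
            for neighbour in graph[node]:
                reached = (
                    neighbour in informed
                    or neighbour in newly_informed
                    or neighbour in next_wave
                )
                if not reached and rng.random() < alpha:
                    next_wave.add(neighbour)
        informed |= newly_informed
        newly_informed = next_wave
    return informed


def sample_cascade_sizes(n, p, alpha, trials, seed=0):
    """Sizes of cascades started at a uniformly random node."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(trials):
        graph = random_ordered_dag(n, p, rng)
        start = rng.randint(1, n)
        sizes.append(len(run_cgm(graph, start, alpha, rng)))
    return sizes


print(sample_cascade_sizes(n=200, p=0.02, alpha=0.5, trials=10))
```

Building the graph costs O(n²) per sample, so this sketch is only practical for small n. It is meant to show the structure of the process, not to reproduce the large-n figure.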
Theorem (this paper)
If you start the Cascade Generation Model (with transfer probability α) at a random node, then:
$P[|\mathrm{Informed}| = k] \sim \frac{1}{n(1 - \beta^k)}$
◮ n – number of nodes
◮ k – cascade size
◮ β – model parameter, usually close to 1, because 1 − β is the probability that there is an edge (parameter p) and that the information is transferred along it (parameter α)
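To get a feel for the theorem's expression 1/(n(1 − β^k)), it can be evaluated numerically. The sketch below treats it as an unnormalised weight, since the slide states it only up to "∼". The function name is hypothetical. The values of n and 1/(1 − β) are the ones the deck reports for the Twitter data.

```python
import math


def cascade_size_weight(k, n, one_minus_beta):
    """Unnormalised 1 / (n * (1 - beta**k)) for k >= 1, with beta = 1 - one_minus_beta."""
    # 1 - beta**k is computed as -expm1(k * log1p(-q)) so that a tiny
    # q = 1 - beta does not lose precision through cancellation.
    one_minus_beta_pow_k = -math.expm1(k * math.log1p(-one_minus_beta))
    return 1.0 / (n * one_minus_beta_pow_k)


n = 3e8        # number of nodes reported in the deck
q = 1 / 5e6    # 1 - beta, from the reported 1/(1 - beta)

# For k far below 1/(1 - beta), doubling k roughly halves the weight (k^-1 behaviour).
ratio_small = cascade_size_weight(100, n, q) / cascade_size_weight(200, n, q)

# For k far above 1/(1 - beta), the weight flattens out and the power law is lost.
ratio_large = cascade_size_weight(5e7, n, q) / cascade_size_weight(1e8, n, q)

print(ratio_small, ratio_large)
```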
The model obeys a power law when:
◮ The number of nodes is large (n is large)
◮ The probability that a node informs a possibly unrelated node is small (1 − β is small)
◮ The largest cascade size k is small (e.g., in our data the largest cascade had 70 000 nodes)
Approximation (this paper)
When $k \ll 1/(1 - \beta) \ll n$, the distribution of cascade sizes follows a power law:
$P[|\mathrm{Informed}| = k] \sim k^{-\gamma}$
Observations (2 weeks of publicly available Twitter data):
◮ $k \approx 7 \cdot 10^4$
◮ $1/(1 - \beta) \approx 5 \cdot 10^6$
◮ $n \approx 3 \cdot 10^8$
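A short sketch of why the condition k ≪ 1/(1 − β) yields a power law, starting from the theorem's expression 1/(n(1 − β^k)). The first-order expansion used here is a standard approximation and is not taken from the paper's proof:

```latex
\begin{align*}
\beta^k &= \bigl(1 - (1 - \beta)\bigr)^k \approx 1 - k(1 - \beta)
  && \text{when } k(1 - \beta) \ll 1, \\
P\bigl[|\mathrm{Informed}| = k\bigr] &\sim \frac{1}{n\,(1 - \beta^k)}
  \approx \frac{1}{n\,(1 - \beta)} \cdot k^{-1}.
\end{align*}
```

With the Twitter values above, k(1 − β) ≈ 7 · 10^4 / (5 · 10^6) = 0.014, so the expansion applies even to the largest observed cascade, and the size distribution decays with an exponent of −1.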
We propose the first model of generating cascades with theoretical guarantees (more guarantees in the paper).
◮ Open-sourced data from Twitter
◮ Data were preprocessed and cleaned to get cascades
◮ New hashtags were treated as information cascades
◮ Connections through retweets and replies (and usage of hashtags)
◮ See the paper (bit.ly/why-cascades) for tedious statistical analysis
◮ Open sourced anonymized data (Hypertext 2016)
http://social-networks.mimuw.edu.pl/
◮ All source code is available
◮ Our experiments (plots for the data and the implementation) are also available (see bit.ly/why-cascades)
For a concrete social network, the following can differ:
◮ Distribution of edges
◮ Hierarchical structure of the social network
◮ The exponent: in our case it is −1 (γ = 1 in k^(−γ)), but this is not always true
◮ When information spreads to almost all nodes, we do not model the world properly (in that situation people probably learn it from other sources, e.g., TV)
◮ The same problem occurs in randomly generated power-law graphs
◮ We encourage you to propose more models of random cascades with guarantees (just as there are many models of random power-law graphs)
◮ Meticulously write out the math formulas for the propagation probabilities
◮ It turns out that they are recursive in this model
◮ To solve them you could use generating functions
◮ But we simplified the proofs to avoid that technique