  1. Why Do Cascade Sizes Follow a Power-Law? Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki , Piotr Wygocki University of Warsaw Paper: bit.ly/why-cascades WWW 2017

  2. Information Cascade

  3. Cascades as Graphs ◮ Given a Social Network ◮ The process of spreading the information generates a graph (a DAG) (a) Social Network (b) Cascade

  4. Cascade as Graphs Cascade ⇐ ⇒ Propagation Graph

  5. Cascade as Graphs Cascade ⇐ ⇒ Propagation Graph Cascade size ⇐ ⇒ Rumour Popularity

  6. Cascade Generation Model (Leskovec et al. 2007) [figure: the cascade spreading over an example network with nodes a–g] newly informed = { a }, informed = {}

  7. Cascade Generation Model (Leskovec et al. 2007) newly informed = { a, b }, informed = {}

  8. Cascade Generation Model (Leskovec et al. 2007) newly informed = { a, b, d }, informed = {}

  9. Cascade Generation Model (Leskovec et al. 2007) newly informed = { b, d }, informed = { a }

  10. Cascade Generation Model (Leskovec et al. 2007) newly informed = { b, d, c }, informed = { a }

  11. Cascade Generation Model (Leskovec et al. 2007) newly informed = { b, d, c }, informed = { a }

  12. Cascade Generation Model (Leskovec et al. 2007) newly informed = { b, d, c, f }, informed = { a }

  13. Cascade Generation Model (Leskovec et al. 2007) newly informed = { c, f }, informed = { a, b, d }

  14. Cascade Generation Model (Leskovec et al. 2007) newly informed = { c, f }, informed = { a, b, d }

  15. Cascade Generation Model (Leskovec et al. 2007) newly informed = { c, f, g }, informed = { a, b, d }

  16. Cascade Generation Model (Leskovec et al. 2007) newly informed = { g }, informed = { a, b, d, c, f }

  17. Cascade Generation Model (Leskovec et al. 2007) newly informed = {}, informed = { a, b, d, c, f, g }
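The walkthrough above is a breadth-first spreading process: each newly informed node tries to pass the information to each of its uninformed neighbours. A minimal Python sketch follows; the adjacency structure and the name of the transmission probability (alpha) are illustrative assumptions for this note, not the paper's exact notation.

    import random

    def cascade_generation_model(graph, start, alpha, rng=random):
        """Sketch of the Cascade Generation Model (Leskovec et al. 2007).

        graph maps each node to a list of neighbours; alpha is the assumed
        probability that an informed node passes the information on to a
        given uninformed neighbour. Returns the set of informed nodes.
        """
        informed = set()          # nodes that have already tried to spread
        newly_informed = {start}  # nodes that will try to spread this round
        while newly_informed:
            next_round = set()
            for u in newly_informed:
                for v in graph[u]:
                    if v in informed or v in newly_informed or v in next_round:
                        continue
                    if rng.random() < alpha:  # u informs v with probability alpha
                        next_round.add(v)
            informed |= newly_informed
            newly_informed = next_round
        return informed

    # Toy network loosely matching the slides' example (nodes a-g); edges are illustrative.
    network = {"a": ["b", "c", "d"], "b": ["a", "d", "e"], "c": ["a"],
               "d": ["a", "b", "f"], "e": ["b"], "f": ["d", "g"], "g": ["f"]}
    print(len(cascade_generation_model(network, "a", alpha=0.7)))

The cascade size is simply the number of informed nodes at the end.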

  18. Cascade Generation Model (Leskovec et al. 2007) ◮ Successfully predicts small cascade sizes ◮ Over 650 citations ◮ Lots of practical applications

  19. Problem Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014])

  20. Problem Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014]) ◮ Log-log plots

  21. Problem Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014]) ◮ Log-log plots ◮ X-axis - cascade size

  22. Problem Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014]) ◮ Log-log plots ◮ X-axis - cascade size ◮ Y-axis - probability of occurrence of a cascade of that size
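To make the comparison concrete, a plot like the one on these slides can be produced from a list of observed (or simulated) cascade sizes. The sketch below assumes matplotlib and uses placeholder variable names (real_sizes, simulated_sizes) rather than the authors' actual data.

    from collections import Counter

    import matplotlib.pyplot as plt

    def plot_cascade_size_distribution(sizes, label):
        """Empirical P[cascade size = k] on log-log axes."""
        counts = Counter(sizes)
        total = len(sizes)
        ks = sorted(counts)
        probs = [counts[k] / total for k in ks]
        plt.loglog(ks, probs, marker="o", linestyle="none", label=label)

    # real_sizes / simulated_sizes are placeholders for your own cascade data.
    # plot_cascade_size_distribution(real_sizes, "real")
    # plot_cascade_size_distribution(simulated_sizes, "simulated (CGM)")
    # plt.xlabel("cascade size"); plt.ylabel("probability"); plt.legend(); plt.show()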

  23. Problem Possible explanations why this problem occurs:

  24. Problem Possible explanations why this problem occurs: ◮ Large number of Strongly Connected Components in a graph (Lee et al. PAKDD 2014) ◮ Burstiness of human behaviour (Mathews et al. WWW 2017) ◮ Time and space effects (Cui et al. CIKM 2014) ◮ Preferential attachment ◮ Many others!

  25. Problem Possible explanations why this problem occurs: ◮ Large number of Strongly Connected Components in a graph (Lee et al. PAKDD 2014) ◮ Burstiness of human behaviour (Mathews et al. WWW 2017) ◮ Time and space effects (Cui et al. CIKM 2014) ◮ Preferential attachment ◮ Many others! These explanations are only empirical: you need a social network to generate a random cascade.

  26. What do we model? ◮ We propose a model that outputs the probability of occurrence of a cascade.

  27. What do we model? ◮ We propose a model that outputs the probability of occurrence of a cascade. ◮ For example, the first cascade with probability 2%, the second with 1%, etc.

  28. What do we model? ◮ We propose a model that outputs the probability of occurrence of a cascade. ◮ For example, the first cascade with probability 2%, the second with 1%, etc. ◮ And parameters to adjust it to the social network

  29. What do we model? (small analogy) Recap ◮ Power-Law Graphs – number of nodes fixed. Degree distribution follows power-law ◮ Power-Law Cascade – number of nodes follows power-law. Degree distribution is fixed

  30. What do we model? (small analogy) Recap ◮ Power-Law Graphs – number of nodes fixed. Degree distribution follows power-law ◮ Power-Law Cascade – number of nodes follows power-law. Degree distribution is fixed
      Random Power-Law Graph: empirical – data from social networks; theoretical – Barabási-Albert, Watts-Strogatz, Bianconi-Barabási (and many others)
      Random Power-Law Cascade: empirical – data from social networks; theoretical – ?

  31. What do we model? (small analogy) Recap ◮ Power-Law Graphs – number of nodes fixed. Degree distribution follows power-law ◮ Power-Law Cascade – number of nodes follows power-law. Degree distribution is fixed
      Random Power-Law Graph: empirical – data from social networks; theoretical – Barabási-Albert, Watts-Strogatz, Bianconi-Barabási (and many others)
      Random Power-Law Cascade: empirical – data from social networks; theoretical – this paper

  32. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes

  33. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node

  34. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p

  35. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p

  36. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  37. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  38. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  39. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  40. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  41. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  42. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  43. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  44. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise ◮ Remove Red edges

  45. Model (this paper) 1 2 3 4 5 ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise ◮ Remove Red edges ◮ Generate Cascade using CGM (Leskovec et al. 2007)
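Putting slides 32-45 together: sample a propagation graph by keeping each candidate edge ("black") with probability p and dropping it ("red") otherwise, then run the Cascade Generation Model on the result from a random start node. The sketch below reuses the cascade_generation_model function from the earlier note; which pairs of the n ordered nodes are candidate edges is an assumption here (all pairs i < j), since the slides only specify the keep/drop rule.

    import random

    def sample_propagation_graph(n, p, rng=random):
        """Keep ("black") each candidate edge independently with probability p,
        drop ("red") it otherwise. Candidate edges = all pairs i < j of the
        n ordered nodes (an assumption; the slides do not pin this down)."""
        graph = {i: [] for i in range(1, n + 1)}
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                if rng.random() < p:   # black edge: kept
                    graph[i].append(j)
                    graph[j].append(i)
                # red edge: removed, i.e. never added
        return graph

    def sample_cascade_size(n, p, alpha, rng=random):
        """Build a propagation graph, then run CGM from a random start node."""
        graph = sample_propagation_graph(n, p, rng)
        start = rng.randint(1, n)
        return len(cascade_generation_model(graph, start, alpha, rng))

    # Example: an empirical size distribution from many simulated cascades.
    sizes = [sample_cascade_size(n=200, p=0.01, alpha=0.5) for _ in range(1000)]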

  46. Model (this paper) Figure: Sample result for a large n

  47. Analysis Theorem (this paper) If you start the Cascade Generation Model (with transmission probability α) at a random node, then P[ | Informed | = k ] ∼ 1 / ( n ( 1 − β^k ) ) ◮ n – number of nodes ◮ k – cascade size ◮ β – a model parameter, usually close to 1, because 1 − β is the probability that an edge exists (parameter p) and the information is transferred along it (probability α)

  48. Takeaway Message The model obeys a power-law when: ◮ The number of nodes is large (n is large)

  49. Takeaway Message The model obeys a power-law when: ◮ The number of nodes is large (n is large) ◮ The probability that a node informs a possibly unrelated node is small (1 − β is small)

  50. Takeaway Message The model obeys a power-law when: ◮ The number of nodes is large (n is large) ◮ The probability that a node informs a possibly unrelated node is small (1 − β is small) ◮ The largest cascade size k is small (e.g., in our data the largest cascade had 70,000 nodes)

  51. Approximation Approximation (this paper) When k ≪ 1 / ( 1 − β ) ≪ n, the distribution of cascade sizes follows a power law: P[ | Informed | = k ] ∼ k^(−γ)

  52. Approximation Approximation (this paper) When k ≪ 1 / ( 1 − β ) ≪ n, the distribution of cascade sizes follows a power law: P[ | Informed | = k ] ∼ k^(−γ) Observations (2 weeks of publicly available Twitter data): ◮ k ≈ 7 · 10^4 ◮ 1 / ( 1 − β ) ≈ 5 · 10^6 ◮ n ≈ 3 · 10^8
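For intuition, here is a one-step sketch of why the approximation follows from the theorem, assuming the formula on slide 47 is P[ | Informed | = k ] ∼ 1 / ( n ( 1 − β^k ) ) as reconstructed above; this is editorial shorthand, not the paper's proof.

    % Since \beta is close to 1, \ln\beta \approx -(1-\beta), so for k(1-\beta) \ll 1:
    %   \beta^{k} = e^{k\ln\beta} \approx e^{-k(1-\beta)} \approx 1 - k(1-\beta).
    \Pr\bigl[\,|\mathrm{Informed}| = k\,\bigr]
      \;\sim\; \frac{1}{n\,(1-\beta^{k})}
      \;\approx\; \frac{1}{n\,(1-\beta)}\cdot k^{-1}
      \qquad\text{for } k \ll \tfrac{1}{1-\beta} \ll n.

Under this reconstruction the exponent comes out as 1; the slide states the result generically as k^(−γ).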

  53. Conclusion We propose the first model for generating cascades with theoretical guarantees (more guarantees in the paper). Thank You!

  54. Question: Ground Truth ◮ Open-source data from Twitter ◮ The data were preprocessed and cleaned to obtain cascades ◮ New hashtags were treated as information cascades ◮ Connections through retweets and replies (and usage of hashtags) ◮ See the paper (bit.ly/why-cascades) for the tedious statistical analysis
