SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

Qingyuan Zhao¹, Murat A. Erdogdu¹, Hera Y. He¹, Anand Rajaraman², Jure Leskovec². ¹Department of Statistics and ²Department of Computer Science, Stanford University.


  1. SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
     Qingyuan Zhao¹, Murat A. Erdogdu¹, Hera Y. He¹, Anand Rajaraman², Jure Leskovec²
     Department of Statistics¹ and Computer Science², Stanford University
     KDD'15, Aug 12, 2015

  2. Information cascade
     An information cascade occurs when people engage in the same actions.
     Image sources: wikimedia.org, adweek.com.

  3. Twitter
     Twitter provides the ideal playground to study information cascades.
     Start: a Twitter user posts a 140-character message which can be seen by his/her followers.
     Spread: a tweet is forwarded in Twitter by another user.

  5. Predicting cascades in real time
     Goal: given the tweet and its retweets up to time $T$, predict its final popularity.
     Applications: ranking content; detecting viral/breakout tweets; understanding human social behavior.

  7. Mathematical definitions
     Data:
     Relative retweet times: $t_0 = 0, t_1, t_2, \ldots$
     Number of retweets by time $t$: $R_t = \sum_{t_i \le t} 1$.
     Number of followers of each retweeter: $n_0, n_1, n_2, \ldots$
     Number of exposed users by time $t$: $N_t = \sum_{t_i \le t} n_i$.
     Problem statement: given $(R_t, N_t)$ for $0 \le t \le T$, predict $R_\infty$.
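
The two counting quantities above can be sketched in a few lines of Python. This is only an illustration on made-up data: the function name `cascade_counts` and the sample cascade are hypothetical. The slide's sums do not say whether the original post (index 0) is counted, so the sketch assumes that $R_t$ counts retweets only ($i \ge 1$) while $N_t$ includes the original poster's followers $n_0$.

```python
from bisect import bisect_right

def cascade_counts(times, followers, t):
    """Return (R_t, N_t) for a cascade with sorted event times.

    Assumption: times[0] is the original post, so it is excluded from the
    retweet count R_t but its followers are included in the exposure N_t.
    """
    k = bisect_right(times, t)            # number of events with t_i <= t
    retweets = max(k - 1, 0)              # drop the original post
    exposed = sum(followers[:k])          # n_0 + n_1 + ... over observed events
    return retweets, exposed

# Hypothetical cascade (seconds since the original tweet, follower counts).
times = [0.0, 30.0, 45.0, 200.0, 900.0]
followers = [1000, 50, 200, 10, 3000]

print(cascade_counts(times, followers, 60.0))
```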

  12. Approaches to cascade prediction
      Broadly categorized into two groups:
      Feature-based methods (the majority). Feature engineering: temporal, network structure, content, user, ... Supervised learning: linear regression, collaborative filtering, regression trees, topic modeling, ...
      Point-process-based methods. Dynamic Poisson process, reinforced Poisson process. Our model (SEISMIC): self-exciting point process.

  14. Example
      [Figure, two panels plotted against time since original tweet (hours, 0 to 6). Top: "Histogram of Retweet Times" (y-axis: retweet count). Bottom: "Prediction by SEISMIC" (y-axis: retweets), with curves labeled Final, SEISMIC, and Cumulative.]

  15. SEISMIC
      SEISMIC (Self-Exciting Model of Information Cascades) is a flexible model of information cascades.
      Highlights: generative model; easy interpretation; scalable (prediction takes $O(\#\text{retweets})$); state-of-the-art performance.

  18. Background: point processes
      Point process models: $R_t$ is characterized by its intensity
      $$\lambda_t = \lim_{\Delta \downarrow 0} \frac{P(R_{t+\Delta} - R_t = 1)}{\Delta}.$$
      Examples:
      Poisson process: $\lambda_t = \lambda$;
      Reinforced Poisson process [1]: $\lambda_t = p \cdot \phi(t) \cdot g(R_t)$.
      They are not suitable for modeling viral tweets.
      [1] S. Gao, J. Ma, and Z. Chen. Modeling and predicting retweeting dynamics on microblogging platforms. In WSDM '15, 2015.

  22. SEISMIC
      Key steps of retweeting:
      How often does a user check Twitter? Memory kernel (power-law distribution).
      What is the user's probability of retweeting a given tweet? Tweet infectiousness.
      Self-exciting point process:
      $$\lambda_t = p \cdot \sum_{t_i \le t} n_i \, \phi(t - t_i), \quad t \ge t_0.$$
      Here $p$ is the infectiousness (the "probability" of retweeting), and the self-exciting sum $\sum_{t_i \le t} n_i \, \phi(t - t_i)$ is the "rate" of viewing.
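
As a rough illustration of the intensity $\lambda_t = p \cdot \sum_{t_i \le t} n_i \, \phi(t - t_i)$, the sketch below evaluates it for a hypothetical cascade. The slides only say that the memory kernel follows a power law; the specific shape used here (constant up to `s0`, then decaying as $s^{-(1+\theta)}$), the parameter values, and the function names are assumptions made for the example, not the authors' implementation.

```python
def memory_kernel(s, s0=300.0, theta=0.242):
    """Illustrative power-law memory kernel phi(s), s in seconds.

    Assumed shape: flat on [0, s0], then power-law decay. The constant c
    is chosen so that the kernel integrates to 1 over [0, infinity).
    """
    if s < 0:
        return 0.0
    c = theta / (s0 * (1.0 + theta))
    return c if s <= s0 else c * (s / s0) ** -(1.0 + theta)

def intensity(t, times, followers, p):
    """lambda_t = p * sum over events with t_i <= t of n_i * phi(t - t_i)."""
    return p * sum(n * memory_kernel(t - ti)
                   for ti, n in zip(times, followers) if ti <= t)

# Hypothetical cascade: each new retweet adds its followers to the viewing rate.
times = [0.0, 30.0, 45.0, 200.0]
followers = [1000, 50, 200, 10]
for t in (10.0, 100.0, 1000.0):
    print(t, intensity(t, times, followers, p=0.01))
```

Every retweet contributes a new term to the sum, which is what makes the process self-exciting: popular retweeters (large $n_i$) raise the rate of further retweets until the kernel decays.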

  23. Time-varying infectiousness
      A fixed $p$ is not enough to model viral tweets.
      [Figure, two panels over hours 0 to 6. Top: "Histogram of Retweet Times" (y-axis: retweet count). Bottom: "Infectiousness Estimated by SEISMIC" (y-axis: infectiousness, 0.00 to 0.06).]
      SEISMIC replaces $p$ by a smooth process $p_t$.

  24. Estimate infectiousness
      We estimate $p_t$ by locally smoothing the maximum likelihood estimator (MLE):
      $$\hat{p}_t = \frac{\sum_{i=1}^{R_t} K_t(t - t_i)}{\sum_{i=0}^{R_t} n_i \int_{t_i}^{t} K_t(t - s)\, \phi(s - t_i)\, ds}.$$
      The numerator is labeled "number of retweets" and the denominator "number of views".
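
The estimator above can be approximated numerically. The following is a minimal sketch, not the authors' code: the slides do not define the smoothing kernel $K_t$ or the memory kernel $\phi$, so the triangular weight, the power-law kernel shape, its parameters, and all function names are assumptions. The integral in the denominator is computed with a simple midpoint rule.

```python
def memory_kernel(s, s0=300.0, theta=0.242):
    """Assumed power-law memory kernel phi(s): flat up to s0, then decaying."""
    if s < 0:
        return 0.0
    c = theta / (s0 * (1.0 + theta))
    return c if s <= s0 else c * (s / s0) ** -(1.0 + theta)

def triangular_weight(u, t):
    """Assumed smoothing kernel K_t(u): favors recent events, zero beyond t/2."""
    return max(1.0 - 2.0 * u / t, 0.0)

def estimate_infectiousness(t, times, followers, weight=triangular_weight, steps=2000):
    """Kernel-smoothed estimate of p_t; times[0] is the original post."""
    if t <= 0:
        return 0.0
    seen = [(ti, n) for ti, n in zip(times, followers) if ti <= t]
    # Numerator: weighted count of retweets (i >= 1).
    numerator = sum(weight(t - ti, t) for ti, _ in seen[1:])
    # Denominator: weighted exposure, one midpoint-rule integral per event (i >= 0).
    denominator = 0.0
    for ti, n in seen:
        h = (t - ti) / steps
        integral = h * sum(
            weight(t - (ti + (k + 0.5) * h), t) * memory_kernel((k + 0.5) * h)
            for k in range(steps))
        denominator += n * integral
    return numerator / denominator if denominator > 0 else 0.0

# Hypothetical cascade observed for 10 minutes.
times = [0.0, 120.0, 400.0, 550.0]
followers = [1000, 80, 150, 40]
print(estimate_infectiousness(600.0, times, followers))
```

Passing a constant weight (`lambda u, t: 1.0`) removes the smoothing, in which case the ratio reduces to the number of retweets divided by the unweighted exposure.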

  25. Predict popularity
      SEISMIC prediction formula: assume the out-degrees in the network have mean $n_*$ and the infectiousness parameter $p_t \equiv p$ for $t \ge T$. Then
      $$E[R_\infty \mid \mathcal{F}_T] = \begin{cases} R_T + \dfrac{p\,(N_T - N_T^e)}{1 - p\,n_*}, & \text{if } p < \dfrac{1}{n_*}, \\ \infty, & \text{if } p \ge \dfrac{1}{n_*}, \end{cases}$$
      where $N_T^e = \sum_{i=0}^{R_T} n_i \int_{t_i}^{T} \phi(t - t_i)\, dt$.
      See our paper for derivation.
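
The prediction formula can be evaluated directly once $p$, $n_*$, and the cascade history are known. The sketch below is illustrative only: the memory kernel shape and parameters are assumptions (the slides state only that it is a power law), the function names are hypothetical, and $R_T$ is taken to count retweets without the original post. The integral of the assumed kernel has a closed form, which is what `kernel_cdf` implements.

```python
def kernel_cdf(t, s0=300.0, theta=0.242):
    """Integral of the assumed memory kernel from 0 to t (closed form).

    Kernel: c on [0, s0], c * (s / s0) ** -(1 + theta) afterwards,
    with c chosen so the total integral is 1.
    """
    if t <= 0:
        return 0.0
    c = theta / (s0 * (1.0 + theta))
    if t <= s0:
        return c * t
    return c * s0 + (c * s0 / theta) * (1.0 - (t / s0) ** -theta)

def predict_final_retweets(T, times, followers, p, n_star):
    """E[R_inf | F_T] from the slide's formula; times[0] is the original post."""
    seen = [(ti, n) for ti, n in zip(times, followers) if ti <= T]
    r_T = max(len(seen) - 1, 0)                              # retweets so far
    n_T = sum(n for _, n in seen)                            # exposed users so far
    n_e_T = sum(n * kernel_cdf(T - ti) for ti, n in seen)    # exposure already "used up"
    if p >= 1.0 / n_star:
        return float("inf")                                  # supercritical regime
    return r_T + p * (n_T - n_e_T) / (1.0 - p * n_star)

# Hypothetical cascade observed for one hour, with made-up p and n_star.
times = [0.0, 120.0, 400.0, 550.0]
followers = [1000, 80, 150, 40]
print(predict_final_retweets(3600.0, times, followers, p=0.002, n_star=100.0))
```

The term $N_T - N_T^e$ is the exposure that has not yet had its chance to produce retweets, and dividing by $1 - p\,n_*$ accounts for the retweets of retweets that follow.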

  27. Experiments: dataset
      Raw dataset: all tweet and retweet activities from October 7 to November 7, 2011.
      Filter by: posted in the first 15 days; English tweets; no hashtag; at least 50 retweets.
      End up with 166076 cascades (over 34 million tweets/retweets in total).

  28. Baselines
      We compare SEISMIC to four different baselines:
      1. LR: linear regression
      2. LR-D: linear regression with degree
      3. DPM: dynamic Poisson model
      4. RPS: reinforced Poisson model

  29. Comparison: Absolute Percentage Error (APE)
      $\mathrm{APE} = |\hat{R}_\infty - R_\infty| / R_\infty$.
      15% vs. 25% percentage error after observing a cascade for 1 hour.
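
For concreteness, the APE metric can be computed per cascade as in this small sketch. The function name and the sample numbers are hypothetical and are not results from the experiments.

```python
def absolute_percentage_error(predicted, actual):
    """APE = |predicted - actual| / actual for one cascade."""
    return abs(predicted - actual) / actual

# Made-up (predicted, true) final retweet counts for three cascades.
cascades = [(1150.0, 1000.0), (80.0, 100.0), (510.0, 500.0)]
errors = [absolute_percentage_error(pred, true) for pred, true in cascades]
print(errors)
```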

  30. Comparison: Coverage of breakouts
      A list of the true top 500 tweets with the most retweets.
      Lists of the predicted top 500 tweets at all time points.
      70% vs. 55% coverage after observing 25% of retweets.
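
One plausible reading of the coverage metric is the fraction of the true top-$k$ tweets that also appear in the predicted top-$k$ list. The sketch below implements that reading on made-up data; the function name, the dictionary layout, and the numbers are assumptions for illustration.

```python
def top_k_coverage(predicted, actual, k):
    """Fraction of the true top-k cascades that are also in the predicted top-k.

    Both arguments map a cascade id to a (predicted or true) final retweet count.
    """
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(predicted) & top(actual)) / k

# Hypothetical final counts and predictions for four cascades.
actual = {"a": 500, "b": 400, "c": 300, "d": 10}
predicted = {"a": 450, "b": 20, "c": 350, "d": 100}
print(top_k_coverage(predicted, actual, k=2))
```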

  31. Summary
      In conclusion, SEISMIC:
      Effectively models information cascades by self-exciting point processes;
      Efficiently updates parameters and makes predictions;
      Outperforms several baselines and the state of the art.
      Code and data available online at http://snap.stanford.edu/seismic .

  32. Estimation of memory kernel $\phi(t)$
