Scaling choice models of relational social data

Scaling choice models of relational social data. Jan Overgoor, Stanford University. SIAM-NS, July 09, 2020. Slides: bit.ly/c2g-venmo. Joint work with George Pakapol Supaniratisai (Stanford) & Johan Ugander (Stanford).


  1. Scaling choice models of relational social data Jan Overgoor · Stanford University SIAM-NS July 09, 2020 Slides: bit.ly/c2g-venmo Joint work with George Pakapol Supaniratisai (Stanford) & Johan Ugander (Stanford)

  2. Events on networks

  3. Observed data

  4. "Choosing to Grow a Graph" [Overgoor, Benson & Ugander, WWW’19] • Model edges as choices • Conditional on i initiating an edge, which j to pick from choice set C? • Conditional Logit model
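The formula on this slide did not survive extraction. As a hedged reconstruction, the standard conditional logit form is given below. The symbols $\theta$ (coefficient vector) and $x_j$ (features of candidate $j$ as seen by chooser $i$) are notation assumed here, not taken from the slide.

```latex
P(j \mid i, C) = \frac{\exp(\theta^\top x_j)}{\sum_{k \in C} \exp(\theta^\top x_k)}
```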

  5. Conditional Logit choice process
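A minimal sketch of the conditional logit choice process, assuming linear utilities over per-candidate feature vectors. The function names, the feature layout, and the toy numbers are all hypothetical and only illustrate how choice probabilities and the log-likelihood are computed.

```python
import math

def choice_probabilities(features, theta):
    """Return P(j | i, C) for every candidate j in the choice set C.

    features: one feature vector x_j per candidate (hypothetical layout).
    theta: model coefficients, same length as each feature vector.
    """
    utilities = [sum(t * x for t, x in zip(theta, x_j)) for x_j in features]
    u_max = max(utilities)  # shift by the max for numerical stability
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(events, theta):
    """Sum of log P(chosen | C) over events given as (features, chosen_index)."""
    return sum(math.log(choice_probabilities(x, theta)[j]) for x, j in events)

# Toy choice set: three candidates with made-up features
# (log in-degree, friend-of-friend indicator).
candidates = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]
theta = [0.5, 1.0]
probs = choice_probabilities(candidates, theta)
ll = log_likelihood([(candidates, 2)], theta)
```

Maximum likelihood estimation then amounts to maximizing `log_likelihood` over `theta`, which standard optimizers can handle.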

  6. "Choosing to Grow a Graph" [Overgoor, Benson & Ugander, WWW’19] • Generalizes multiple known formation models and dynamics: preferential attachment, local search, fitness, homophily, … • Efficient maximum likelihood estimation of model parameters with existing tools

  7. "Choosing to Grow a Graph" [Overgoor, Benson & Ugander, WWW’19] • Generalizes multiple known formation models and dynamics: preferential attachment, local search, fitness, homophily, … • Efficient maximum likelihood estimation of model parameters with existing tools • Straightforward extension to events

  8. Two problems at scale: 1. Estimation on large networks is infeasible, as there are n options for each of the m choices, and features change at each event

  9. Two problems at scale: 1. Estimation on large networks is infeasible, as there are n options for each of the m choices 2. The conditional logit model class is less realistic: it carries an availability assumption of complete information

  10. Solution to Problem #1 – Negative sampling • Sample non-chosen alternatives and do estimation on the reduced choice set, also called case-control sampling (see Vu 2015, Lerner 2019) • Update the likelihood with the sampling probabilities of the data points • Estimates on data with reduced choice sets generated with importance sampling are consistent for the estimates using complete choice sets [McFadden 1977]
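The likelihood update on this slide was lost in extraction. A common way to write the sampling-corrected conditional logit, offered here as a hedged reconstruction, shifts each utility by the log of the item's sampling probability. The symbols $\tilde{C}$ (reduced choice set), $q_k$ (probability that item $k$ is sampled), $\theta$, and $x_k$ are assumed notation.

```latex
P(j \mid i, \tilde{C}) =
  \frac{\exp(\theta^\top x_j - \log q_j)}
       {\sum_{k \in \tilde{C}} \exp(\theta^\top x_k - \log q_k)}
```

When every $q_k$ is equal, the correction terms cancel between numerator and denominator.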

  11. Negative sampling strategies • Uniform sampling: + no adjustment necessary, weights cancel out; − inefficient for rare (but important) features

  12. Negative sampling strategies • Uniform sampling: + no adjustment necessary, weights cancel out; − inefficient for rare (but important) features • Stratified sampling: sample according to strata, adjust with sampling weights

  13. Negative sampling strategies • Uniform sampling: + no adjustment necessary, weights cancel out; − inefficient for rare (but important) features • Stratified sampling: sample according to strata, adjust with sampling weights • Importance sampling: sample according to likelihood of being chosen; − optimal weights are what we’re trying to estimate
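The uniform case above can be sketched in a few lines, assuming the correction takes the form of subtracting log q from each utility. The helper names, the utilities, and the sampling rate of 0.1 are invented for illustration. The sketch shows that a constant sampling probability leaves the log-probability unchanged.

```python
import math
import random

def corrected_log_prob(utilities, log_q, chosen=0):
    """log P(chosen | sampled set), with each utility shifted by -log q_k."""
    adjusted = [u - lq for u, lq in zip(utilities, log_q)]
    a_max = max(adjusted)  # log-sum-exp with a max shift for stability
    log_norm = a_max + math.log(sum(math.exp(a - a_max) for a in adjusted))
    return adjusted[chosen] - log_norm

def sample_choice_set(chosen, candidates, s, rng):
    """Chosen item first, followed by s uniformly drawn non-chosen items."""
    others = [c for c in candidates if c != chosen]
    return [chosen] + rng.sample(others, s)

rng = random.Random(0)
utility = {j: 0.1 * j for j in range(100)}  # made-up utilities, 100 candidates
reduced = sample_choice_set(chosen=42, candidates=list(utility), s=10, rng=rng)
u = [utility[j] for j in reduced]

# Uniform sampling: every item has the same q, so the correction cancels.
uniform = corrected_log_prob(u, [math.log(0.1)] * len(u))
uncorrected = corrected_log_prob(u, [0.0] * len(u))
```

For stratified or importance sampling, `log_q` would differ per item, and the correction would no longer cancel.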

  14. Sampling with synthetic data • Simulate 160k events with 5k nodes • Utility function with popularity, repetition, reciprocity, and FoFs • Estimate known parameter values • Number of data points n held constant at 10k, vary s • Stratification requires fewer negative samples, by factors, for comparable MSE [Figure: "n Constant"; MSE (0.01 to 1.00) vs. number of samples (s), 3 to 768; series: Uniform, Importance]

  15. Run time is linear in n and s [Figure: runtime (sec) vs. number of data points (n), one line per number of samples (s)]

  16. Sampling with synthetic data • Simulate 160k events with 5k nodes • Utility function with popularity, repetition, reciprocity, and FoFs • Estimate known parameter values • Value of n and s at constant n*s budget • More choice samples (n) are better, but diminishing returns below s = 24 [Figure: "n*s Constant"; MSE (.003 to .300) vs. number of samples (s), 3 to 768; series: Uniform, Importance]

  17. Back to problem #2: conditional logit model class less realistic

  18. Mixed Logit • Combines multiple latent logits • Each “mode” has its own utility function and choice set, for example: social neighborhood • Problems: log-likelihood not convex in general, need much slower EM; no sampling guarantees
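One way to write a mixture of latent logits with mode-specific choice sets is sketched below. This is a hedged illustration, not a formula taken from the slide: the mode weights $\pi_m$, the mode index $m = 1, \dots, M$, the per-mode coefficients $\theta_m$, and the per-mode choice sets $C_m$ are all assumed notation.

```latex
P(j \mid i) = \sum_{m=1}^{M} \pi_m \,
  \frac{\mathbf{1}[j \in C_m] \, \exp(\theta_m^\top x_j)}
       {\sum_{k \in C_m} \exp(\theta_m^\top x_k)}
```

Because the sum over modes sits inside the logarithm of the likelihood, the per-mode terms do not separate, which is why EM-style estimation is typically needed.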

  19. Solution to Problem #2 – De-mixed logit • Simplify: assume that each mode has a disjoint choice set • Reduces to m individual conditional logits, simple to estimate • The chosen item indicates the mode [Figure labels: Friends, FoFs, Rest]
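A minimal sketch of the de-mixing step, assuming three disjoint modes (friends, friends-of-friends, rest) defined on a static friendship graph. The function names and the toy graph are hypothetical, and a real event stream would update the graph after each event. Each event is routed to the mode of its chosen item, and its choice set is restricted to candidates in that same mode.

```python
from collections import defaultdict

def mode_of(chooser, candidate, friends):
    """Assign a candidate to one of three disjoint modes relative to the chooser."""
    direct = friends.get(chooser, set())
    if candidate in direct:
        return "friends"
    fofs = set().union(*(friends.get(f, set()) for f in direct)) - direct - {chooser}
    return "fofs" if candidate in fofs else "rest"

def demix(events, nodes, friends):
    """Split (chooser, chosen) events into per-mode datasets.

    Each record keeps only the candidates that share the chosen item's
    mode, so every mode can be fit as its own conditional logit.
    """
    per_mode = defaultdict(list)
    for chooser, chosen in events:
        mode = mode_of(chooser, chosen, friends)
        choice_set = [c for c in nodes
                      if c != chooser and mode_of(chooser, c, friends) == mode]
        per_mode[mode].append((chooser, chosen, choice_set))
    return dict(per_mode)

# Toy graph: a-b and b-c are friends; d and e are isolated.
nodes = ["a", "b", "c", "d", "e"]
friends = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
events = [("a", "b"), ("a", "c"), ("a", "d")]
result = demix(events, nodes, friends)
```

Each entry of `result` can then be passed to an ordinary conditional logit estimator, one per mode.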

  20. De-mixed logit choice process [Figure labels: chooser, neighborhood]

  21. De-mixing with synthetic data • Simulate 80k events with 5k nodes • “local” and “rest” mode with different utility functions (… = 0.75)

  22. De-mixing with synthetic data • Simulate 80k events with 5k nodes • “local” and “rest” mode with different utility functions (… = 0.75) • Conditional logit: estimates in between the two modes (true values are 0.5 and 1.0) • Importance sampling doesn’t help accuracy [Figure: CL estimates of log Degree vs. s, 16 to 1024; series: Uniform, Importance]

  23. De-mixing with synthetic data • Simulate 80k events with 5k nodes • “local” and “rest” mode with different utility functions (… = 0.75) • Conditional logit: estimates not stable for different values of s (!! outside the model class) [Figure: CL estimates of Reciprocity (ind) vs. s, 16 to 1024; series: Uniform, Importance]

  24. De-mixing with synthetic data • Simulate 80k events with 5k nodes • “local” and “rest” mode with different utility functions (… = 0.75) • De-mixed logit: estimates accurate and stable [Figure: Demixed ML estimates of Reciprocity (ind) vs. s, 16 to 1024; series: Uniform, Importance]

  25. Venmo Data ● Scraped public transactions ● 25M users and 501M transactions ● 80% of transactions are “local” ● Analyze stratified CL and de-mixed CL [Figure: transactions per week (up to 3M) by week, 2012 to 2018]

  26. Venmo Non-parametric estimates ● Easy to test hypotheses over different modes ● Degree is number of incoming transactions ● Degree is less important within social neighborhood, super-linear outside [Figure: relative probability (log scale) vs. in-degree, 0 to 300; series: Local, Non-local]

  27. Discussion ● Leverage existing results from sampling and econometrics literatures ● Make it feasible to estimate complex models on very large graphs ● Think carefully about limitations of model class. Future work ● Theory on “to sample or to negatively sample?” ● Sampling guarantees for mixed logit ● Empirical comparison with similar modeling frameworks (SAOM, REM) ● More applications. THANKS! bit.ly/c2g-code overgoor@stanford.edu
