Scalable statistical estimation methods for large, time-varying networks
Duy Vu (1), Arthur Asuncion (2), David Hunter (1), Padhraic Smyth (3)
(1) Department of Statistics, Penn State; (2) Google Inc.; (3) Department of Computer Science, UC-Irvine
Supported by ONR
◮ Egocentric: The counting process N_i(t) = cumulative number
◮ Relational: The counting process N_ij(t) = cumulative number
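The two kinds of counting process can be sketched in Python from a list of timestamped events. This is a minimal illustration, not the authors' code: the function names, the toy events, and the choice to credit each event to its receiving node in the egocentric case are all assumptions made here.

```python
from bisect import bisect_right
from collections import defaultdict


def build_counting_processes(events):
    """Group event times by node (egocentric) and by ordered pair (relational).

    events: iterable of (time, sender, receiver) tuples (hypothetical layout).
    Returns two dicts whose values are sorted lists of event times.
    """
    ego = defaultdict(list)
    rel = defaultdict(list)
    for t, sender, receiver in sorted(events):
        # Assumption: an egocentric event is credited to the receiving node.
        ego[receiver].append(t)
        rel[(sender, receiver)].append(t)
    return ego, rel


def count_at(times, t):
    """N(t): number of events with time <= t (a right-continuous step function)."""
    return bisect_right(times, t)


# Toy data, invented for illustration only.
events = [(1.0, "a", "b"), (2.5, "c", "b"), (2.5, "a", "b"), (4.0, "b", "a")]
ego, rel = build_counting_processes(events)
n_b = count_at(ego["b"], 3.0)          # events received by node "b" up to t = 3.0
n_ab = count_at(rel[("a", "b")], 2.0)  # events on the pair (a, b) up to t = 2.0
```

Storing sorted event times and using binary search keeps each evaluation of N(t) logarithmic in the number of events for that node or pair.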
[Figure: counting processes N(t) and N_1(t), ..., N_4(t) plotted against time t]
contactee  contacter  date
1          14155      2004-06-15 12:00:00.000
1          2238       2004-06-15 12:00:00.000
1          14275      2004-06-15 12:00:00.000
...
13099      7683       2004-06-17 16:31:51.040
15231      14752      2004-06-17 16:31:51.040
...
45087      7610       2007-10-31 12:23:15.683
16719      61         2007-10-31 13:28:38.670
48758      1          2007-10-31 13:47:16.843
◮ R_i(t) = I(t > t_i^arr)
◮ H_{t−} is the past of the network up to but not including time t
◮ α_0(t) or β_0(t) is the baseline hazard function
◮ β is the vector of coefficients to estimate
◮ s_i(t) = (s_i1(t), ..., s_ip(t)) is a p-vector of statistics for paper i
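To show how these quantities fit together, here is a hedged Python sketch of a Cox log partial likelihood. It assumes the standard proportional-hazards form λ_i(t | H_{t−}) = R_i(t) α_0(t) exp(β^⊤ s_i(t)), which the text above does not spell out; the function name, the input layout, and the toy numbers are hypothetical.

```python
import math


def cox_log_partial_likelihood(beta, events):
    """Sum of log partial-likelihood terms over observed events.

    events: list of (stats, at_risk, i) tuples, one per event time, where
      stats   is a list of p-vectors s_j(t), one row per node j,
      at_risk is a list of booleans R_j(t),
      i       is the index of the node that experienced the event.
    The baseline hazard alpha_0(t) cancels and never appears.
    """
    total = 0.0
    for stats, at_risk, i in events:
        scores = [sum(b * s for b, s in zip(beta, row)) for row in stats]
        risk_scores = [sc for sc, r in zip(scores, at_risk) if r]
        # Log-sum-exp over the risk set, shifted by the max for stability.
        m = max(risk_scores)
        log_denominator = m + math.log(sum(math.exp(sc - m) for sc in risk_scores))
        total += scores[i] - log_denominator
    return total


# Toy input, invented for illustration: 3 nodes, p = 2 statistics, 2 events.
events = [
    ([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]], [True, True, True], 2),
    ([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]], [True, False, True], 0),
]
ll_zero = cox_log_partial_likelihood([0.0, 0.0], events)
```

With β = 0 every node at risk is equally likely, so each event contributes minus the log of the risk-set size; this gives a quick sanity check on any implementation.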
◮ W(t) is N(N − 1) × p with (i, j)th row R_ij(t) s(i, j, t)^⊤;
◮ J(t) is the indicator that W(t) has full column rank.
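The roles of W(t) and J(t) can be illustrated with a NumPy sketch of one least-squares increment, assuming the usual additive Aalen estimator dB(t) = J(t) (W^⊤ W)^{-1} W^⊤ dN(t) for the cumulative coefficients B(t). That formula is not stated above, and the function name and toy matrices are hypothetical.

```python
import numpy as np


def aalen_increment(W, dN):
    """One increment of the cumulative coefficient estimate at an event time.

    W:  (m, p) array, one row R_ij(t) * s(i, j, t) per ordered pair.
    dN: length-m array with a 1 for the pair that had the event, else 0.
    Returns a length-p array; all zeros when W lacks full column rank (J = 0).
    """
    p = W.shape[1]
    full_rank = np.linalg.matrix_rank(W) == p  # the indicator J(t)
    if not full_rank:
        return np.zeros(p)
    # Solve the normal equations (W^T W) x = W^T dN.
    return np.linalg.solve(W.T @ W, W.T @ dN)


# Toy design matrix, invented for illustration; the zero row is a pair not at risk.
W = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [0.0, 0.0]])
dN = np.array([0.0, 1.0, 0.0, 0.0])
increment = aalen_increment(W, dN)

# A rank-deficient W gives J(t) = 0 and therefore a zero increment.
W_deficient = np.array([[1.0, 2.0], [2.0, 4.0]])
skipped = aalen_increment(W_deficient, np.array([1.0, 0.0]))
```

Summing such increments over event times up to t would give the estimate of B(t). For large networks a direct solve on the full N(N − 1) × p matrix is costly, so this sketch is only meant to make the definitions concrete.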
[Figure: average normalized rank vs. paper batches, panels APS, arXiv-PH, arXiv-TH; methods shown: PA, P2PT, P2PTR180, LDA, LDA+P2PTR180]
[Figure: average partial likelihood vs. paper batches, panels APS, arXiv-PH, arXiv-TH; methods shown: PA, P2PT, P2PTR180, LDA, LDA+P2PTR180]
[Figure: diagram with nodes Seller, Broker, Buyer]
Dataset     Nodes   Edges   Stats-Building Phase  Training Phase  Test Phase
Irvine      1,899   20,296  7,073                 7,646           5,507
MetaFilter  51,362  76,791  60,376                8,763           7,620
◮ Irvine: an online social network at UC Irvine
◮ MetaFilter: a community weblog contact network
◮ SIM-1: Constant coefficients
◮ SIM-2: Varying coefficients
[Figure panels: Ground-truth, Estimate]
◮ Aalen coefficients suggest two distinct phases of network evolution,
◮ On prediction experiments, Aalen/Cox outperforms logistic regression.
◮ Network effects continuously change over time.
◮ Time-varying Aalen model outperforms Cox model.
Aalen, O. O., Borgan, O., and Gjessing, H. K. Survival and Event History Analysis: A Process Point of View. Springer, 2008.
Blei, D. M., Ng, A. Y., and Jordan, M. I. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
Butts, C. T. A relational event framework for social action. Sociological Methodology, 38(1):155–200, 2008.
Cox, D. R. Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34:187–220, 1972.
Perry, P. O. and Wolfe, P. J. Point process modeling for directed interaction networks. arXiv:1011.1703v1 [stat.ME], 8 Nov 2010.
Salathé, M. and Khandelwal, S. Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control. PLoS Computational Biology, 7(10): e1002199. doi:10.1371/journal.pcbi.1002199, 2011.
Vu, D. Q., Asuncion, A. U., Hunter, D. R., and Smyth, P. Dynamic Egocentric Models for Citation Networks. Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 857–864, 2011.
Vu, D. Q., Asuncion, A. U., Hunter, D. R., and Smyth, P. Continuous-Time Regression Models for Longitudinal Networks. Advances in Neural Information Processing Systems 24 (NIPS 2011), to appear.