Scalable statistical estimation methods for large, time-varying - PowerPoint PPT Presentation

Scalable statistical estimation methods for large, time-varying networks

Duy Vu (1), Arthur Asuncion (2), David Hunter (1), Padhraic Smyth (3)
(1) Department of Statistics, Penn State
(2) Google Inc.
(3) Department of Computer Science, UC-Irvine

Supported by ONR MURI Award Number N00014-08-1-1015
MURI grant meeting, January 10, 2012

Outline
◮ Counting processes for evolving networks
  ◮ Egocentric Models vs. Relational Models
◮ Egocentric Network Models
  ◮ Model Structure
  ◮ Application: Citation Networks
  ◮ Refer to Vu et al (ICML 2011) for further details
◮ Relational Network Models
  ◮ Refer to Vu et al (NIPS 2011) for further details
  ◮ See also Perry and Wolfe (2010)

Counting Processes for networks
◮ Goal: Model a dynamically evolving network using counting processes.
◮ Two possibilities (using terminology of Butts, 2008):
  ◮ Egocentric: The counting process $N_i(t)$ = cumulative number of "events" involving the $i$th node by time $t$.
  ◮ Relational: The counting process $N_{ij}(t)$ = cumulative number of "events" involving the $(i, j)$th node pair by time $t$.
[Figure: four-node network (nodes 1 to 4) with edges labeled by event times t=3, t=11, t=16, t=20.]
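The two definitions can be illustrated with a small sketch. The event list below is hypothetical: the event times match the figure, but the sender/receiver pairs are assumptions made only for illustration. It also assumes that "involving node i" means "received by node i", as in the citation example.

```python
# Hypothetical timestamped events (time, sender, receiver). The node pairs
# are assumptions chosen only to illustrate the two definitions.
events = [(3, 1, 2), (11, 1, 4), (16, 2, 3), (20, 4, 2)]

def egocentric_count(events, node, t):
    """N_i(t): number of events received by `node` at or before time t."""
    return sum(1 for (te, _, recv) in events if recv == node and te <= t)

def relational_count(events, i, j, t):
    """N_ij(t): number of events on the ordered pair (i, j) at or before time t."""
    return sum(1 for (te, s, r) in events if (s, r) == (i, j) and te <= t)
```

Both functions are nondecreasing step functions of `t`, which is the property the counting-process framework relies on.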

Counting Process approach: Egocentric example
◮ Combine the $N_i(t)$ to give a multivariate counting process $N(t) = (N_1(t), \ldots, N_n(t))$.
◮ Genuinely multivariate; no assumption about the independence of the $N_i(t)$.
[Figure: the four-node example network (event times t=3, 11, 16, 20) and a plot of $N_1(t), \ldots, N_4(t)$ against $t$ for $0 \le t \le 20$.]

Egocentric Example: Modeling of Citation Networks
◮ New papers join the network over time.
◮ At arrival, a paper cites others that are already in the network.
◮ Main dynamic development: Number of citations received.
◮ $N_i(t)$: Number of citations to paper $i$ by time $t$.
◮ "At-risk" indicator $R_i(t)$: Equal to $I\{t_i^{arr} < t\}$.
[Figure; axis label: Time.]

Relational Example: Modeling a network of contacts
◮ Metafilter: Community weblog for sharing links and discussing content among its users.
◮ Pattern of contacts: Dynamically evolving network
◮ Links are non-recurrent; i.e., $N_{ij}(t)$ is either 0 or 1.
◮ "At-risk" indicator $R_{ij}(t) = I\{\max(t_i^{arr}, t_j^{arr}) < t < t_{e_{ij}}\}$.

contactee   contacter   date
1           14155       2004-06-15 12:00:00.000
1           2238        2004-06-15 12:00:00.000
1           14275       2004-06-15 12:00:00.000
...
13099       7683        2004-06-17 16:31:51.040
15231       14752       2004-06-17 16:31:51.040
...
45087       7610        2007-10-31 12:23:15.683
16719       61          2007-10-31 13:28:38.670
48758       1           2007-10-31 13:47:16.843

Submartingales: Egocentric Case
Each $N_i(t)$ is nondecreasing in time, so $N(t)$ may be considered a submartingale; i.e., it satisfies
$$E[N(t) \mid \text{past up to time } s] \ge N(s) \quad \text{for all } t > s.$$
[Figure: plot of $N_1(t), \ldots, N_4(t)$ against $t$ for $0 \le t \le 20$.]

Theory: The Doob-Meyer Decomposition
Any submartingale may be uniquely decomposed as
$$N(t) = \int_0^t \lambda(s)\,ds + M(t):$$
◮ $\lambda(t)$ is the "signal" at time $t$, called the intensity function.
◮ $M(t)$ is the "noise," a continuous-time martingale.
◮ We will model each $\lambda_i(t)$ or $\lambda_{ij}(t)$.

Modeling the Intensity Process, Part I: Egocentric Case
The intensity process for node $i$ is given by
◮ Cox Proportional Hazard Model, fixed coefficients:
  $$\lambda_i(t \mid H_{t^-}) = R_i(t)\,\alpha_0(t)\exp\left(\beta^\top s_i(t)\right),$$
◮ Aalen additive model, time-varying coefficients:
  $$\lambda_i(t \mid H_{t^-}) = R_i(t)\left(\beta_0(t) + \beta(t)^\top s_i(t)\right),$$
where
◮ $R_i(t) = I(t > t_i^{arr})$ is the "at-risk indicator"
◮ $H_{t^-}$ is the past of the network up to but not including time $t$
◮ $\alpha_0(t)$ or $\beta_0(t)$ is the baseline hazard function
◮ $\beta$ is the vector of coefficients to estimate
◮ $s_i(t) = (s_{i1}(t), \ldots, s_{ip}(t))$ is a $p$-vector of statistics for paper $i$
Let us consider the citation network examples...
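As a minimal sketch of the two intensity forms above, evaluated at a single time point: the function and argument names are hypothetical, and the baseline values `alpha0_t` and `beta0_t` are assumed to be supplied by the caller.

```python
import math

def cox_intensity(at_risk, alpha0_t, beta, s_it):
    """Cox form: R_i(t) * alpha_0(t) * exp(beta^T s_i(t))."""
    if not at_risk:
        return 0.0
    return alpha0_t * math.exp(sum(b * s for b, s in zip(beta, s_it)))

def aalen_intensity(at_risk, beta0_t, beta_t, s_it):
    """Aalen form: R_i(t) * (beta_0(t) + beta(t)^T s_i(t))."""
    if not at_risk:
        return 0.0
    return beta0_t + sum(b * s for b, s in zip(beta_t, s_it))
```

In the Cox form the statistics act multiplicatively on the baseline, so the intensity is always nonnegative. The additive form carries no such guarantee.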

Preferential Attachment Statistics
For each cited paper $j$ already in the network...
◮ First-order PA: $s_{j1}(t) = \sum_{i=1}^{N} y_{ij}(t^-)$. "Rich get richer" effect
◮ Second-order PA: $s_{j2}(t) = \sum_{i \ne k} y_{ki}(t^-)\, y_{ij}(t^-)$. Effect due to being cited by well-cited papers
[Figure: diagram labeled j.]
Statistics in red are time-dependent. Others are fixed once $j$ joins the network.
NB: $y(t^-)$ is the network just prior to time $t$.
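A small sketch of the two preferential attachment statistics, computed on a snapshot of the network just before time t. The four-paper adjacency matrix is a made-up example, and the convention `y[i][j] == 1` meaning "paper i cites paper j" is an assumption.

```python
# Hypothetical snapshot y(t-): paper 1 cites 0, paper 2 cites 0 and 1,
# paper 3 cites 2. Rows are citing papers, columns are cited papers.
y = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]

def first_order_pa(y, j):
    """s_j1: number of papers citing j (in-degree of j)."""
    return sum(y[i][j] for i in range(len(y)))

def second_order_pa(y, j):
    """s_j2: sum over i != k of y_ki * y_ij (citations to the papers citing j)."""
    n = len(y)
    return sum(y[k][i] * y[i][j] for i in range(n) for k in range(n) if i != k)
```

For this snapshot, paper 0 has two citers (papers 1 and 2), and those citers have received one citation each, so both statistics equal 2 for j = 0.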

Recency PA Statistic
For each cited paper $j$ already in the network...
◮ Recency-based first-order PA (we take $T_w = 180$ days): $s_{j3}(t) = \sum_{i=1}^{N} y_{ij}(t^-)\, I(t - t_i^{arr} < T_w)$. Temporary elevation of citation intensity after recent citations
[Figure: diagram labeled j.]
Statistics in red are time-dependent. Others are fixed once $j$ joins the network.
NB: $y(t^-)$ is the network just prior to time $t$.
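The recency statistic can be sketched as follows, assuming time is measured in days and that each citation from paper i happens at i's arrival time (as stated for citation networks earlier in the deck). The data and function name are hypothetical.

```python
T_W = 180  # window length in days, as on the slide

# Hypothetical data: arrival day of each paper and (citing, cited) pairs.
arrival = {0: 0, 1: 100, 2: 250, 3: 400}
citations = [(1, 0), (2, 0), (2, 1), (3, 2)]

def recency_pa(citations, arrival, j, t, window=T_W):
    """s_j3(t): citations to j made strictly before t and within `window` of t."""
    return sum(
        1
        for (i, cited) in citations
        if cited == j and arrival[i] < t and t - arrival[i] < window
    )
```

At t = 260 both citations to paper 0 fall inside the window. By t = 300 the citation from paper 1 (day 100) has aged out, so the statistic drops from 2 to 1.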

Triangle Statistics
For each cited paper $j$ already in the network...
◮ "Seller" statistic: $s_{j4}(t) = \sum_{i \ne k} y_{ki}(t^-)\, y_{ij}(t^-)\, y_{kj}(t^-)$.
◮ "Broker" statistic: $s_{j5}(t) = \sum_{i \ne k} y_{kj}(t^-)\, y_{ji}(t^-)\, y_{ki}(t^-)$.
◮ "Buyer" statistic: $s_{j6}(t) = \sum_{i \ne k} y_{jk}(t^-)\, y_{ki}(t^-)\, y_{ji}(t^-)$.
[Figure: triangle with nodes A (Seller), B (Broker), C (Buyer).]
Statistics in red are time-dependent. Others are fixed once $j$ joins the network.
NB: $y(t^-)$ is the network just prior to time $t$.
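A sketch of the three triangle counts on a single snapshot of the network just before time t. The adjacency matrix is a made-up example containing one citation triangle (2 cites 1, 1 cites 0, 2 cites 0), and `y[i][j] == 1` is assumed to mean "paper i cites paper j".

```python
# Hypothetical snapshot: paper 1 cites 0, paper 2 cites 0 and 1, paper 3 cites 2.
y = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]

def triangle_stats(y, j):
    """Return (seller, broker, buyer) counts for node j, summing over i != k."""
    n = len(y)
    pairs = [(i, k) for i in range(n) for k in range(n) if i != k]
    seller = sum(y[k][i] * y[i][j] * y[k][j] for i, k in pairs)
    broker = sum(y[k][j] * y[j][i] * y[k][i] for i, k in pairs)
    buyer = sum(y[j][k] * y[k][i] * y[j][i] for i, k in pairs)
    return seller, broker, buyer
```

In the example triangle, paper 0 (cited by both others) plays the seller role, paper 1 the broker role, and paper 2 (citing both others) the buyer role.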

Out-Path Statistics
For each cited paper $j$ already in the network...
◮ First-order out-degree (OD): $s_{j7}(t) = \sum_{i=1}^{N} y_{ji}(t^-)$.
◮ Second-order OD: $s_{j8}(t) = \sum_{i \ne k} y_{jk}(t^-)\, y_{ki}(t^-)$.
[Figure: diagram labeled j.]
Statistics in red are time-dependent. Others are fixed once $j$ joins the network.
NB: $y(t^-)$ is the network just prior to time $t$.

Topic Modeling Statistics
Additional statistics, using abstract text if available, as follows:
◮ An LDA model (Blei et al, 2003) is learned on the training set.
◮ Topic proportions $\theta$ generated for each training node.
◮ LDA model also used to estimate topic proportions $\theta$ for each node in the test set.
◮ We construct a vector of similarity statistics:
  $$s_j^{LDA}(t_i^{arr}) = \theta_i \circ \theta_j,$$
  where $\circ$ denotes the element-wise product of two vectors.
◮ We use 50 topics; each $s_j$ component has a corresponding $\beta$.
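The similarity vector is simply an element-wise product of two topic-proportion vectors. A minimal sketch follows; the three-topic vectors are made up (the slides use 50 topics), and the function name is hypothetical.

```python
def lda_similarity(theta_i, theta_j):
    """Element-wise product of the topic proportions of papers i and j."""
    if len(theta_i) != len(theta_j):
        raise ValueError("topic vectors must have the same length")
    return [a * b for a, b in zip(theta_i, theta_j)]

# Hypothetical topic proportions for a citing paper i and a candidate paper j.
theta_i = [0.5, 0.5, 0.0]
theta_j = [0.2, 0.3, 0.5]
similarity = lda_similarity(theta_i, theta_j)
```

Each component is large only when both papers put weight on the same topic. Giving every component its own coefficient lets the model weight topic agreement differently per topic.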

Partial Likelihood (how to fit the Cox PH Model)
Recall: The intensity process for node $i$ is
$$\lambda_i(t \mid H_{t^-}) = R_i(t)\,\alpha_0(t)\exp\left(\beta^\top s_i(t)\right).$$
If $\alpha_0(t) \equiv \alpha_0(t, \gamma)$, we may use the "local Poisson-ness" of the multivariate counting process to obtain (and maximize) a likelihood function (details omitted).
However, we treat $\alpha_0$ as a nuisance parameter and take a partial likelihood approach as in Cox (1972): Maximize
$$L(\beta) = \prod_{e=1}^{m} \frac{\exp\left(\beta^\top s_{i_e}(t_e)\right)}{\sum_{i=1}^{n} R_i(t_e)\exp\left(\beta^\top s_i(t_e)\right)} = \prod_{e=1}^{m} \frac{\exp\left(\beta^\top s_{i_e}(t_e)\right)}{\kappa(t_e)}.$$
Computational Trick: Write $\kappa(t_e) = \kappa(t_{e-1}) + \Delta\kappa(t_e)$, then optimize the calculation of $\Delta\kappa(t_e)$.
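The computational trick can be sketched as follows. This is not the authors' implementation: it assumes that between consecutive events only a few nodes change their statistics (or newly arrive), so the denominator is updated by the difference in those nodes' terms instead of being recomputed. All names, and the `(cited_node, updates)` event structure, are hypothetical. A naive version is included for comparison.

```python
import math

def score(beta, s):
    """exp(beta^T s) for one node."""
    return math.exp(sum(b * x for b, x in zip(beta, s)))

def log_partial_likelihood(beta, initial_stats, events):
    """Log partial likelihood with an incrementally updated denominator kappa.

    initial_stats: node -> statistic vector for nodes at risk at the first event.
    events: time-ordered list of (cited_node, updates); `updates` maps node ->
            new statistic vector taking effect after the event (new nodes allowed).
    """
    stats = dict(initial_stats)
    kappa = sum(score(beta, s) for s in stats.values())
    loglik = 0.0
    for cited, updates in events:
        loglik += math.log(score(beta, stats[cited])) - math.log(kappa)
        for node, new_s in updates.items():
            old = score(beta, stats[node]) if node in stats else 0.0
            kappa += score(beta, new_s) - old  # delta-kappa for this node
            stats[node] = new_s
    return loglik

def log_partial_likelihood_naive(beta, initial_stats, events):
    """Same quantity, recomputing kappa from scratch at every event."""
    stats = dict(initial_stats)
    loglik = 0.0
    for cited, updates in events:
        kappa = sum(score(beta, s) for s in stats.values())
        loglik += math.log(score(beta, stats[cited])) - math.log(kappa)
        stats.update(updates)
    return loglik

# Hypothetical toy history with a single statistic per node.
initial_stats = {"a": [0.0], "b": [0.0]}
events = [("a", {"a": [1.0], "c": [0.0]}), ("a", {"a": [2.0]}), ("c", {})]
```

The incremental version costs time proportional to the number of changed nodes per event, while the naive version costs time proportional to the number of at-risk nodes per event. Over long histories, accumulated floating-point drift in `kappa` may call for an occasional full recomputation.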

Least Squares (How to fit the Aalen Additive Model)
Recall: The intensity process for node $i$ is
$$\lambda_i(t \mid H_{t^-}) = R_i(t)\left(\beta_0(t) + \beta(t)^\top s_i(t)\right).$$
◮ We do inference not for the $\beta_k$ but rather for their time-integrals
  $$B_k(t) = \int_0^t \beta_k(s)\,ds. \qquad (1)$$
◮ Then
  $$\hat{B}(t) = \sum_{t_e \le t} J(t_e)\left(W(t_e)^\top W(t_e)\right)^{-1} W(t_e)^\top \Delta N(t_e), \qquad (2)$$
  where
  ◮ $W(t)$ is $N(N-1) \times p$ with $(i, j)$th row $R_{ij}(t)\, s(i, j, t)^\top$;
  ◮ $J(t)$ is the indicator that $W(t)$ has full column rank.
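Equation (2) is a running sum of ordinary least-squares solutions, one per event time. The sketch below uses only the standard library and solves the normal equations by Gauss-Jordan elimination. The function names, the tolerance, and the tiny design matrices are assumptions for illustration; a real implementation would more likely use a linear-algebra library.

```python
def solve_normal_equations(W, dN, tol=1e-10):
    """Return (W^T W)^{-1} W^T dN, or None if W^T W is numerically singular."""
    p = len(W[0])
    # Augmented matrix [W^T W | W^T dN].
    A = [
        [sum(row[a] * row[b] for row in W) for b in range(p)]
        + [sum(row[a] * d for row, d in zip(W, dN))]
        for a in range(p)
    ]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < tol:
            return None  # corresponds to J(t_e) = 0
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[c][p] / A[c][c] for c in range(p)]

def aalen_cumulative(event_data, p):
    """B_hat(t): sum of least-squares increments over the events up to t.

    event_data: time-ordered list of (W, dN) pairs, one per event time.
    """
    B = [0.0] * p
    for W, dN in event_data:
        inc = solve_normal_equations(W, dN)
        if inc is not None:  # skip event times where W lacks full column rank
            B = [b + x for b, x in zip(B, inc)]
    return B

# Hypothetical design: intercept column plus one statistic, three at-risk rows.
W_ok = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
dN_ok = [0.0, 0.0, 1.0]
W_deficient = [[1.0, 1.0], [1.0, 1.0]]  # identical columns: not full rank
```

The estimate only changes at event times, so $\hat{B}(t)$ is a step function. The slope of a coordinate over a time window gives a rough picture of the corresponding time-varying coefficient.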

Data Sets We Analyzed
Three citation network datasets from the physics literature:
1. APS: Articles in Physical Review Letters, Physical Review, and Reviews of Modern Physics from 1893 through 2009. Timestamps are monthly for older, daily for more recent.
2. arXiv-PH: arXiv high-energy physics phenomenology articles from Jan. 1993 to Mar. 2002. Timestamps are daily.
3. arXiv-TH: High-energy physics theory articles spanning from January 1993 to April 2003. Timestamps are continuous-time (millisecond resolution). Also includes text of paper abstracts.

            Papers     Citations   Unique Times
APS         463,348    4,708,819   5,134
arXiv-PH    38,557     345,603     3,209
arXiv-TH    29,557     352,807     25,004

Three Phases
1. Statistics-building phase: Construct network history and build up network statistics.
2. Training phase: Construct partial likelihood and estimate model coefficients.
3. Test phase: Evaluate predictive capability of the learned model.
Statistics-building is ongoing even through the training and test phases. The phases are split along citation event times.

Number of unique citation event times in the three phases:

            Building   Training   Test
APS         4,934      100        100
arXiv-PH    2,209      500        500
arXiv-TH    19,004     1,000      5,000
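Splitting along event times can be sketched as below. It assumes the last `n_test` unique times form the test phase, the `n_train` unique times before them form the training phase, and everything earlier is statistics-building. The function name and toy timestamps are hypothetical.

```python
def split_phases(event_times, n_train, n_test):
    """Split the sorted unique event times into (building, training, test)."""
    unique = sorted(set(event_times))
    n_build = len(unique) - n_train - n_test
    if n_build < 0:
        raise ValueError("not enough unique event times for the requested split")
    return (
        unique[:n_build],
        unique[n_build:n_build + n_train],
        unique[n_build + n_train:],
    )

# Toy timestamps with repeats; several citations can share one event time.
times = list(range(10)) + [3, 3]
building, training, test = split_phases(times, n_train=3, n_test=2)
```

The phase sizes in the table add up to the unique-time counts of each data set; for example, 4,934 + 100 + 100 = 5,134 for APS.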
