network topology inference

Network Topology Inference Gonzalo Mateos Dept. of ECE and Goergen - PowerPoint PPT Presentation

Network Topology Inference Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ April 9, 2019 Network Science Analytics Network Topology


  1. Network Topology Inference
      Gonzalo Mateos
      Dept. of ECE and Goergen Institute for Data Science, University of Rochester
      gmateosb@ece.rochester.edu
      http://www.ece.rochester.edu/~gmateosb/
      April 9, 2019
      Network Science Analytics, Network Topology Inference

  2. Network topology inference
      Network topology inference problems
      Link prediction
        Case study: Predicting lawyer collaboration
      Inference of association networks
        Case study: Inferring genetic regulatory interactions
      Tomographic network topology inference
        Case study: Computer network topology identification

  3. Network topology inference
      ◮ So far dealt with modeling and inference of observed network graphs
        ⇒ Q: If a portion of $G$ is unobserved, can we infer it from data?
      ◮ Discussed construction of representations $G(V, E)$ for network mapping
        ⇒ Largely informal methodology, lacking an element of validation
      ◮ Formulate instead as a statistical inference task, i.e., given
        ◮ Measurements $x_i$ of attributes at some or all vertices $i \in V$
        ◮ Indicators $y_{ij}$ of edge status for some vertex pairs $\{i, j\} \in V^{(2)}$
        ◮ A collection $\mathcal{G}$ of candidate graphs $G$
        Goal: infer the topology of the network graph $G(V, E)$
      ◮ Three canonical network topology inference problems
        (i) Link prediction
        (ii) Association network inference
        (iii) Tomographic network topology inference

  4. Link prediction
      [Figure panels: Original graph; Link prediction]
      ◮ Suppose we observe vertex attributes $x = [x_1, \ldots, x_{N_v}]^\top$; and
      ◮ Edge status is only observed for some subset of pairs $V^{(2)}_{obs} \subset V^{(2)}$
      ◮ Goal: predict edge status for all other pairs, i.e., $V^{(2)}_{miss} = V^{(2)} \setminus V^{(2)}_{obs}$

  5. Association network inference
      [Figure panels: Original graph; Association network inference]
      ◮ Suppose we only observe vertex attributes $x = [x_1, \ldots, x_{N_v}]^\top$; and
      ◮ Assume an edge $(i, j)$ is defined by a nontrivial ‘level of association’ between $x_i$ and $x_j$
      ◮ Goal: predict edge status for all vertex pairs $V^{(2)}$

  6. Tomographic network topology inference
      [Figure panels: Original graph; Tomographic inference]
      ◮ Suppose we only observe $x_i$ for vertices $i \in V$ in the ‘perimeter’ of $G$
      ◮ Goal: predict edge and vertex status in the ‘interior’ of $G$

  7. Link prediction

  8. Link prediction
      ◮ Let $G(V, E)$ be a random graph, with adjacency matrix $Y \in \{0, 1\}^{N_v \times N_v}$
        ⇒ $Y^{obs}$ and $Y^{miss}$ denote entries in $V^{(2)}_{obs}$ and $V^{(2)}_{miss}$
      Link prediction: predict entries in $Y^{miss}$, given observations $Y^{obs} = y^{obs}$ and possibly various vertex attributes $X = x \in \mathbb{R}^{N_v}$
      ◮ Edge status information may be missing due to:
        ⇒ Difficulty in observation, issues of sampling
        ⇒ Edge is not yet present, wish to predict future status
      ◮ Given a model for $X$ and $(Y^{obs}, Y^{miss})$, jointly predict $Y^{miss}$ based on
            $P\left(Y^{miss} \mid Y^{obs} = y^{obs}, X = x\right)$
        ⇒ More manageable to predict the variables $Y^{miss}_{ij}$ individually

  9. Informal scoring methods
      ◮ Idea: compute score $s(i, j)$ for missing ‘potential edges’ $\{i, j\} \in V^{(2)}_{miss}$
        ⇒ Predicted edges returned by retaining the top $n^*$ scores
      ◮ Scores designed to assess certain local structural properties of $G^{obs}$
        ⇒ Distance-based, inspired by the small-world principle
            $s(i, j) = -\mathrm{dist}_{G^{obs}}(i, j)$
        ⇒ Neighborhood-based, e.g., the number of common neighbors
            $s(i, j) = |\mathcal{N}_i^{obs} \cap \mathcal{N}_j^{obs}|$ or $s(i, j) = \frac{|\mathcal{N}_i^{obs} \cap \mathcal{N}_j^{obs}|}{|\mathcal{N}_i^{obs} \cup \mathcal{N}_j^{obs}|}$
        ⇒ Favor loosely-connected common neighbors [Adamic-Adar’03]
            $s(i, j) = \sum_{k \in \mathcal{N}_i^{obs} \cap \mathcal{N}_j^{obs}} \frac{1}{\log |\mathcal{N}_k^{obs}|}$
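The four scores on slide 9 can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the helper names (`build_neighbors`, `top_n`, etc.) and the adjacency-set representation of $G^{obs}$ are assumptions made here for the example.

```python
import math
from collections import deque


def build_neighbors(nodes, edges):
    """Adjacency sets N_i^obs of the observed graph (hypothetical helper)."""
    nbrs = {v: set() for v in nodes}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs


def neg_distance(nbrs, i, j):
    """s(i,j) = -dist(i,j), via breadth-first search; -inf if disconnected."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return -dist[u]
        for w in nbrs[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return -math.inf


def common_neighbors(nbrs, i, j):
    """s(i,j) = |N_i ∩ N_j|."""
    return len(nbrs[i] & nbrs[j])


def jaccard(nbrs, i, j):
    """s(i,j) = |N_i ∩ N_j| / |N_i ∪ N_j| (0 if both neighborhoods are empty)."""
    union = nbrs[i] | nbrs[j]
    return len(nbrs[i] & nbrs[j]) / len(union) if union else 0.0


def adamic_adar(nbrs, i, j):
    """s(i,j) = sum over common neighbors k of 1 / log|N_k|.

    A common neighbor of two distinct nodes has degree >= 2, so log|N_k| > 0.
    """
    return sum(1.0 / math.log(len(nbrs[k])) for k in nbrs[i] & nbrs[j])


def top_n(nbrs, candidate_pairs, score, n):
    """Return the n candidate pairs with the highest score."""
    ranked = sorted(candidate_pairs, key=lambda p: score(nbrs, *p), reverse=True)
    return ranked[:n]
```

For example, `top_n(nbrs, missing_pairs, adamic_adar, n)` returns the `n` predicted edges among the pairs in $V^{(2)}_{miss}$; how to choose `n` (the slide's $n^*$) is left open here.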

  10. Tests on co-authorship networks
      ◮ Results from a link prediction study in [Liben-Nowell-Kleinberg’03]

  11. Classification methods
      ◮ Idea: use training data $y^{obs}$ and $x$ to build a binary classifier
        ⇒ Classifier is in turn used to predict the entries in $Y^{miss}$
      ◮ Logistic regression classifiers most popular, based on the model
            $\log \left[ \frac{P_\beta(Y_{ij} = 1 \mid Z_{ij} = z)}{P_\beta(Y_{ij} = 0 \mid Z_{ij} = z)} \right] = \beta^\top z$, where
        (i) $\beta \in \mathbb{R}^K$ is a vector of regression coefficients; and
        (ii) $Z_{ij}$ is a vector of explanatory variables indexed by $\{i, j\}$
            $Z_{ij} = [g_1(Y^{obs}_{(-ij)}, X), \ldots, g_K(Y^{obs}_{(-ij)}, X)]^\top$
      ◮ Functions $g_k(\cdot)$ encode useful predictive information in $y^{obs}_{(-ij)}$ and $x$
        Ex: vertex attributes, score functions, network statistics in ERGMs

  12. Logistic regression classifier
      ◮ Train: Obtain MLE $\hat{\beta}$ via iteratively-reweighted LS
      ◮ Test: Potential edges $(i, j)$ declared present based on probabilities
            $P_{\hat{\beta}}(Y_{ij} = 1 \mid Z_{ij} = z) = \frac{\exp(\hat{\beta}^\top z)}{1 + \exp(\hat{\beta}^\top z)}$
      ◮ Logistic regression assumes $Y_{ij}$ conditionally independent given $z$
        ⇒ Seldom the case with relational network data
      ◮ Underlying mechanism of data missingness is important
        ⇒ Classification for link prediction reminiscent of cross-validation
        ⇒ Assumption that data are missing at random is fundamental
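The train/test steps on slide 12 can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the lecture's implementation: the function names, the fixed iteration cap, and the tiny `ridge` term added for numerical stability are all choices made here. Rows of `Z` are the explanatory vectors $Z_{ij}$ for observed pairs and `y` holds the corresponding edge indicators.

```python
import numpy as np


def sigmoid(t):
    """Logistic function exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic_irls(Z, y, n_iter=25, ridge=1e-8, tol=1e-10):
    """Approximate the MLE of beta by iteratively reweighted least squares.

    Z : (n_pairs, K) array, one row of explanatory variables per observed pair
    y : (n_pairs,) array of 0/1 edge indicators
    ridge : small stabilizer for the linear solve (an assumption of this sketch)
    """
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Z @ beta)            # current fitted probabilities
        w = p * (1.0 - p)                # IRLS weights
        hessian = Z.T @ (w[:, None] * Z) + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hessian, Z.T @ (y - p))  # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_edge_prob(Z, beta):
    """P(Y_ij = 1 | Z_ij = z) for each row z of Z under coefficients beta."""
    return sigmoid(Z @ beta)
```

A potential edge would then be declared present when `predict_edge_prob` exceeds a chosen threshold; the slide does not fix that threshold, so it is left to the user.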

  13. Latent variable models
      ◮ In addition to a linear predictor $\beta^\top z$, latent models describe $Y_{ij}$
        ⇒ As a function of vertex-specific latent variables $u_i$ and $u_j$
      [Figure panels: Homophily; Stochastic equivalence]
      ◮ Latent models are flexible enough to capture underlying social mechanisms
        Ex: homophily (transitivity) and stochastic equivalence (groups)

  14. Latent class and distance models
      ◮ Latent distance model: node $i$ has unobserved position $U_i \in \mathbb{R}^d$
        ◮ Positions $U_i$ in latent space assumed i.i.d., e.g., Gaussian distributed
        ◮ Model cond. probability of edge $Y_{ij}$ as function of $\beta^\top z - \|u_i - u_j\|_2$
        ◮ Homophily: Nearby nodes in latent space more likely to link
      ◮ Latent class model: node $i$ belongs to unobserved class $U_i \in \{1, \ldots, k\}$
        ◮ Classes $U_i$ assumed i.i.d., e.g., multinomial distributed
        ◮ Model cond. probability of edge $Y_{ij}$ as function of $\beta^\top z - \theta_{u_i, u_j}$
        ◮ Stochastic equivalence: Nodes in same class equally likely to link
      ◮ P. D. Hoff, “Modeling homophily and stochastic equivalence in symmetric relational data,” NIPS, 2008
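As a small illustration of slide 14, the two predictors can be turned into edge probabilities. The slide only says the probability is "a function of" each predictor; passing it through the logistic function, as done below, is an assumption of this sketch (chosen to match the logistic model used on the neighboring slides). Function names and the 0-based class indexing are also hypothetical.

```python
import numpy as np


def sigmoid(t):
    """Logistic function exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + np.exp(-t))


def distance_model_prob(beta, z, u_i, u_j):
    """Edge probability from beta^T z - ||u_i - u_j||_2 (logistic link assumed)."""
    return sigmoid(beta @ z - np.linalg.norm(u_i - u_j))


def class_model_prob(beta, z, theta, c_i, c_j):
    """Edge probability from beta^T z - theta[c_i, c_j] (logistic link assumed).

    theta is a symmetric (k, k) array; c_i, c_j are 0-based class labels.
    """
    return sigmoid(beta @ z - theta[c_i, c_j])
```

Under the distance model, two nodes at the same latent position attain the largest probability for a given $z$, which is the homophily effect; under the class model, the probability depends on the nodes only through their classes, which is stochastic equivalence.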

  15. Logistic regression with latent variables
      ◮ Let $M \in \mathbb{R}^{N_v \times N_v}$ be unknown, random, and symmetric of the form
            $M = U^\top \Lambda U + E$, where
        (i) $U = [u_1, \ldots, u_{N_v}]$ is a random orthonormal matrix of latent variables;
        (ii) $\Lambda$ is a random diagonal matrix; and
        (iii) $E$ is a symmetric matrix of i.i.d. noise entries $\epsilon_{ij}$
      ◮ Latent eigenmodel subsumes the class and distance variants [Hoff’08]
        ⇒ Notice that $M_{ij} = u_i^\top \Lambda u_j + \epsilon_{ij}$
      ◮ The logistic regression model with latent variables is
            $\log \left[ \frac{P_\beta(Y_{ij} = 1 \mid Z_{ij} = z, M_{ij} = m)}{P_\beta(Y_{ij} = 0 \mid Z_{ij} = z, M_{ij} = m)} \right] = \beta^\top z + m$
      ◮ $Y_{ij}$ still assumed conditionally independent given $Z_{ij}$ and $M_{ij}$
        ⇒ But they are conditionally dependent given only $Z_{ij}$
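To make the structure of $M$ on slide 15 concrete, the following sketch draws one realization and evaluates the resulting edge probability. The sampling choices are assumptions made only for illustration (orthonormal $U$ from a QR factorization of a Gaussian matrix, Gaussian diagonal $\Lambda$, Gaussian noise); the slide does not specify these distributions.

```python
import numpy as np


def simulate_eigenmodel(n, rng, noise_sd=0.1):
    """Draw U, Lambda, E and form M = U^T Lambda U + E (illustrative priors)."""
    # Orthonormal U: Q factor of a random Gaussian matrix (columns u_1..u_n).
    U, _ = np.linalg.qr(rng.normal(size=(n, n)))
    Lam = np.diag(rng.normal(size=n))
    # Symmetric noise: i.i.d. entries on and above the diagonal, then mirrored.
    A = rng.normal(scale=noise_sd, size=(n, n))
    E = np.triu(A) + np.triu(A, 1).T
    M = U.T @ Lam @ U + E
    return U, Lam, E, M


def edge_prob(beta, z, m):
    """P(Y_ij = 1 | Z_ij = z, M_ij = m) under the log-odds beta^T z + m."""
    return 1.0 / (1.0 + np.exp(-(beta @ z + m)))
```

Entry `M[i, j]` equals `U[:, i] @ Lam @ U[:, j] + E[i, j]`, matching $M_{ij} = u_i^\top \Lambda u_j + \epsilon_{ij}$ with $u_i$ the $i$-th column of $U$.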

  16. Bayesian link prediction
      ◮ Specify distributions for $U$, $\Lambda$, $E$ to make statistical link predictions
      ◮ Bayesian inference natural ⇒ Specify a prior for $\beta$ as well
      ◮ To predict those entries in $Y^{miss}$, threshold the posterior mean
            $E\left[ \frac{\exp(\beta^\top Z_{ij} + M_{ij})}{1 + \exp(\beta^\top Z_{ij} + M_{ij})} \,\middle|\, Y^{obs} = y^{obs}, Z_{ij} = z \right]$
      ◮ Use MCMC algorithms to approximate the posterior distribution
      ◮ Gaussian distributions attractive for their conjugacy properties
      ◮ Higher complexity than MLE for standard logistic regression
        ⇒ Need to generate draws for $N_v^2$ unobserved variables $\{U_{ij}\}$
        ⇒ Major cost reduction with reduced $\mathrm{rank}(U) = k \ll N_v$ models
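One common way to read the "threshold the posterior mean" step on slide 16 is as a Monte Carlo average over MCMC output. The notation below is an assumption introduced for this note: $T$ is the number of retained draws, $(\beta^{(t)}, M_{ij}^{(t)})$ is the $t$-th posterior draw, and $\tau$ is a user-chosen threshold; none of these symbols appear on the slide.

```latex
\hat{p}_{ij}
  \;=\; E\!\left[ \frac{\exp(\beta^\top Z_{ij} + M_{ij})}{1 + \exp(\beta^\top Z_{ij} + M_{ij})}
        \,\middle|\, Y^{obs} = y^{obs},\, Z_{ij} = z \right]
  \;\approx\; \frac{1}{T} \sum_{t=1}^{T}
        \frac{\exp\!\big(\beta^{(t)\top} z + M_{ij}^{(t)}\big)}{1 + \exp\!\big(\beta^{(t)\top} z + M_{ij}^{(t)}\big)},
\qquad
\hat{Y}_{ij} \;=\; \mathbb{1}\{\hat{p}_{ij} > \tau\}
```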

  17. Case study: Predicting lawyer collaboration

  18. Lawyer collaboration network
      ◮ Network $G^{obs}$ of working relationships among lawyers [Lazega’01]
      ◮ Nodes are $N_v = 36$ partners, edges indicate partners worked together
      [Figure: lawyer collaboration network with nodes labeled 1 to 36]
      ◮ Data includes various node-level attributes:
        ◮ Seniority (node labels indicate rank ordering)
        ◮ Office location (triangle, square or pentagon)
        ◮ Type of practice, i.e., litigation (red) and corporate (cyan)
        ◮ Gender (three partners are female, labeled 27, 29 and 34)
      ◮ Goal: predict cooperation among social actors in an organization
