the hierarchical structure of networks

The Hierarchical Structure of Networks, by Aaron Clauset

The Hierarchical Structure of Networks. Aaron Clauset, Santa Fe Institute. 4 August 2008, SFI / CAIDA Workshop, Networks and Navigation.


  1. The Hierarchical Structure of Networks. Aaron Clauset, Santa Fe Institute. 4 August 2008, SFI / CAIDA Workshop, Networks and Navigation

  2. First, Some Pictures

  3. social groups or communities: teenage friendships*, research collaborations* (*image stolen from elsewhere)

  4. functional(?) clusters, hierarchies: metabolites*, proteins* (*image stolen from elsewhere)

  5. co-purchasing (topical?) groups, communities: amazon.com books on politics* (*image stolen from elsewhere)

  6. A Question: How can we extract structural patterns, at many scales, in a rigorous fashion, from complex networks?

  7. What is Structure? some stylized ideas

  8. no structure

  9. no structure; modular structure (one scale)

  10. no structure; modular structure (one scale); hierarchical structure (multi-scale)

  11. A Question: How can we extract hierarchical structure, in a rigorous fashion, from complex networks? (network data → ? → hierarchy)

  12. One Approach: Model-based inference
      1. describe how to generate hierarchies (a model)
      2. "fit" model to empirical data
      3. test "fitted" model
      4. extract predictions + insight

  13. A Model of Hierarchy

  14. A Model of Hierarchy: $D, \{p_r\}$; assortative modules → probability $p_r$

  15. model: "inhomogeneous" random graph; model → instance. $\Pr(i, j \text{ connected}) = p_r$, where $r$ is the lowest common ancestor of $i, j$
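
The generative step on this slide can be sketched in a few lines. This is a minimal illustration, assuming the hierarchy is stored as nested tuples `(p_r, left, right)` with integer leaves; that representation and the names `leaves` and `sample_graph` are hypothetical, not taken from the talk.

```python
import random
from itertools import product

def leaves(tree):
    """Return the leaf labels under a node of the hierarchy."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)

def sample_graph(tree, rng=random):
    """Draw one graph: each pair (i, j) is joined independently with
    probability p_r, where r is the lowest common ancestor of i and j."""
    if isinstance(tree, int):
        return set()
    p_r, left, right = tree
    # Pairs split across the left/right subtrees have this node as their LCA.
    edges = {(min(i, j), max(i, j))
             for i, j in product(leaves(left), leaves(right))
             if rng.random() < p_r}
    return edges | sample_graph(left, rng) | sample_graph(right, rng)

# Hypothetical 4-vertex hierarchy: two tight pairs, loosely joined (assortative).
D = (0.1, (0.9, 0, 1), (0.9, 2, 3))
G = sample_graph(D, random.Random(0))
```

Setting a large `p_r` near the root and small values below it would instead give a disassortative mixture, which is the flexibility the next slide refers to.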

  16. Model Features
      • explicit model = explicit assumptions
      • very flexible (many parameters)
      • captures structure at all scales
      • arbitrary mixtures of assortativity, disassortativity
      • learnable directly from data

  17. Learning From Data
      • We use a Bayesian approach:
      • likelihood function $L = \Pr(\text{data} \mid \text{model})$ scores quality of model
      • sample high quality models via MCMC
      • technical details in arXiv:physics/0610051 and Nature 453, p98 (2008)
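
Because the model joins each pair independently with the probability stored at its lowest common ancestor, the likelihood factorizes over internal nodes. The sketch below computes its logarithm under that assumption; the nested-tuple representation `(p_r, left, right)` and the function names are illustrative choices, not the authors' code.

```python
import math

def leaves(tree):
    """Return the leaf labels under a node of the hierarchy."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)

def _xlogy(x, y):
    """x * log(y), with 0 * log(0) taken as 0 and x * log(0) as -inf."""
    if x == 0:
        return 0.0
    return x * math.log(y) if y > 0 else float("-inf")

def log_likelihood(tree, edges):
    """log Pr(graph | hierarchy): at each internal node r, the e_r observed
    edges contribute log p_r and the remaining pairs contribute log(1 - p_r)."""
    if isinstance(tree, int):
        return 0.0
    p_r, left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    e_r = sum((min(i, j), max(i, j)) in edges
              for i in left_leaves for j in right_leaves)
    n_pairs = len(left_leaves) * len(right_leaves)
    here = _xlogy(e_r, p_r) + _xlogy(n_pairs - e_r, 1.0 - p_r)
    return here + log_likelihood(left, edges) + log_likelihood(right, edges)

# Hypothetical example: 4 vertices, edges stored as (smaller, larger) pairs.
D = (0.25, (0.5, 0, 1), (0.5, 2, 3))
ll = log_likelihood(D, {(0, 1), (1, 2)})
```

Working in log space avoids underflow, since the likelihood of even a modest graph is a product of many numbers below one.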

  18. From Graph to Ensemble

  19. From Graph to Ensemble
      • Given graph G
      • run MCMC to equilibrium
      • then, for each sampled D, draw a resampled graph from the ensemble
      A test: do resampled graphs look like the original?

  20. Grassland species* (figure; vertex labels: herbivore, plant, parasite) *thank you: Jennifer Dunne

  21. Degree Distribution (figure, panel a: fraction of vertices with degree k vs. degree k, log-log axes; original vs. resampled)

  22. Clustering Coefficient (figure: fraction of graphs with clustering coefficient c vs. clustering coefficient c; original vs. resampled)

  23. Distance Distribution (figure, panel b: fraction of vertex pairs at distance d vs. distance d, logarithmic vertical axis; original vs. resampled)

  24. Missing Links A test: can model predict missing links?

  25. Predicting is Hard
      • remove $k$ edges from $G$
      • how easy to guess a missing link?
      $p_{\text{guess}} \approx \dfrac{k}{\binom{n}{2} - m + k} = O(n^{-2})$
      For $n = 75$, $m = 113$: $p_{\text{guess}} = k / (2662 + k)$
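
The arithmetic on this slide is easy to check directly. The short sketch below recomputes the baseline from the slide's values of n and m; the function name `p_guess` is just a label for this illustration.

```python
from math import comb

def p_guess(n, m, k):
    """Chance that a uniformly random unconnected pair is one of the k
    removed edges, in a graph with n vertices and m original edges."""
    non_edges = comb(n, 2) - m + k  # pairs with no observed edge after removal
    return k / non_edges

# Grassland species network values from the slide: n = 75, m = 113.
denominator_offset = comb(75, 2) - 113  # the constant 2662 in the slide
baseline = p_guess(75, 113, 10)
```

With only 113 edges among 2775 possible pairs, almost every unconnected pair is a true non-edge, which is why guessing at random performs so poorly.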

  26. Predicting Missing Links
      • Given incomplete graph $G$
      • run MCMC to equilibrium
      • then, over sampled $D$, compute average $\langle p_r \rangle$ for links $(i, j) \notin G$
      • predict that pairs with high $\langle p_r \rangle$ values are missing links
      Test idea via leave-$k$-out cross-validation
      perfect accuracy: AUC = 1
      no better than chance: AUC = 1/2
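
The AUC used to score this test can be read as the probability that a truly missing link receives a higher score than a true non-edge. A minimal sketch of that calculation follows; it takes the two lists of scores as given and is an illustration only, not the evaluation code used for the results in the talk.

```python
def auc(scores_missing, scores_nonedge):
    """Fraction of (missing link, non-edge) pairs in which the missing link
    scores higher; ties count one half."""
    wins = 0.0
    for s in scores_missing:
        for t in scores_nonedge:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(scores_missing) * len(scores_nonedge))

# Hypothetical scores: held-out edges vs. pairs that were never connected.
perfect = auc([0.9, 0.8], [0.1, 0.2])   # every missing link outranks every non-edge
chance = auc([0.5], [0.5])              # uninformative scores
```

This pairwise form is quadratic in the number of candidates, which is acceptable for networks of the size discussed here.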

  27. Missing Structure: Grassland species network (figure: area under ROC curve, AUC, vs. fraction of edges observed, k/m; curves: pure chance, simple predictors [common neighbors, Jaccard coeff., degree product, shortest paths], hierarchical structure)

  28. Other Networks (figures: AUC vs. fraction of edges observed; panel a: terrorist association network; panel b: T. pallidum metabolic network; curves: pure chance, common neighbors, Jaccard coefficient, degree product, shortest paths, hierarchical structure)
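
For reference, three of the simple predictors named in the figure legend can be written down in a few lines each. The definitions below are the standard ones, assumed here because the slide does not spell them out; the helper `neighbors` and the example graph are hypothetical.

```python
def neighbors(edges):
    """Build an adjacency map {vertex: set of neighbors} from an edge list."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj

def common_neighbors(adj, i, j):
    """Number of vertices adjacent to both i and j."""
    return len(adj.get(i, set()) & adj.get(j, set()))

def jaccard(adj, i, j):
    """Shared neighbors divided by total distinct neighbors of i and j."""
    a, b = adj.get(i, set()), adj.get(j, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def degree_product(adj, i, j):
    """Product of the two degrees."""
    return len(adj.get(i, set())) * len(adj.get(j, set()))

# Hypothetical graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = neighbors([(0, 1), (0, 2), (1, 2), (2, 3)])
scores = (common_neighbors(adj, 0, 3), jaccard(adj, 0, 3), degree_product(adj, 0, 3))
```

Each predictor scores an unconnected pair from local structure alone; higher scores are read as "more likely to be a missing link".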

  29. Summary
      • Many real networks are hierarchically modular
      • Hierarchies can
        • model multi-scale structure
        • generalize a single network
        • predict missing links
      • Model-based inference is very powerful
      Acknowledgments: C. Moore, M.E.J. Newman, C.H. Wiggins, and C.R. Shalizi

  30. Fin

  31. Markov chain Monte Carlo (MCMC)
      • Given $D$, choose random internal node
      • Choose random reconfiguration of subtrees [ergodicity]
      • Recompute probabilities $\{p_r\}$ and likelihood $L$
      • Sampling states according to their likelihood [detailed balance]
      (figure: three subtree configurations, up to relabeling)

  32. Grassland species (figure; vertex labels: herbivore, plant, parasite)

  33. (figure, panel c)
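
One common way to sample states in proportion to their likelihood is the Metropolis rule: always accept a move that raises the likelihood, and accept a move that lowers it with probability equal to the likelihood ratio. The sketch below assumes that rule and a symmetric proposal; it uses a two-state toy problem in place of the subtree rearrangement, and all names are hypothetical.

```python
import math
import random

def metropolis_accept(loglik_current, loglik_proposed, rng=random):
    """Accept uphill moves always, downhill moves with probability L'/L."""
    delta = loglik_proposed - loglik_current
    return delta >= 0 or rng.random() < math.exp(delta)

def run_chain(state, propose, loglik, steps, rng=random):
    """Generic Metropolis chain; `propose` must be symmetric so that the
    acceptance rule above satisfies detailed balance."""
    samples = []
    current_ll = loglik(state)
    for _ in range(steps):
        candidate = propose(state, rng)
        candidate_ll = loglik(candidate)
        if metropolis_accept(current_ll, candidate_ll, rng):
            state, current_ll = candidate, candidate_ll
        samples.append(state)
    return samples

# Toy stand-in for the space of hierarchies: two states with likelihoods 0.8 and 0.2.
toy_loglik = lambda s: math.log(0.8 if s == 0 else 0.2)
flip = lambda s, rng: 1 - s
samples = run_chain(0, flip, toy_loglik, 20000, random.Random(1))
frac_state0 = samples.count(0) / len(samples)
```

In the long run the chain should visit state 0 about 80% of the time, matching its share of the total likelihood.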

  34. Graph Resampling

  35. 1. Summary Statistics: degree distribution, distance distribution, rich-club distribution, short-loop distribution, betweenness function, degree-degree correlations, ... etc. (figures: P(x) vs. x on log-log axes; p(d) vs. distance d)

  36. 1. Summary Statistics
      The good
      • good for exploratory analysis
      • often quick calculations
      The bad
      • throw away important information
      • can make different networks appear similar
      • what are right statistics to measure?
      • different statistics often highly correlated
      • indirect measures of large-scale structure, function

  37. 2. Algorithmic Analysis: global modularity Q, local modularity R, network motifs, box covering, clique covering, ... etc.

  38. 2. Algorithmic Analysis
      The good
      • good for exploratory analysis
      • illustrate large-scale structure, heterogeneity
      The bad
      • often (NP-)hard optimizations
      • can be sensitive to noise, uncertainty
      • ad hoc or heuristic measures of structure, function
      • algorithm = theory
      • implied physics often unclear

  39. 3. Statistical Inference: hierarchical random graphs, latent space models, correlation reconstruction, community mixtures, information bottlenecks, network classification. $I(X;Y) = H(X) - H(X \mid Y)$
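
The mutual-information identity on this slide can be evaluated numerically for any discrete joint distribution. The sketch below is a small illustration, assuming the joint distribution is given as a nested list `joint[x][y]`; it uses the chain rule $H(X \mid Y) = H(X, Y) - H(Y)$ to obtain the conditional entropy.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), with joint[x][y] = Pr(X = x, Y = y)."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    h_xy = entropy(p for row in joint for p in row)
    h_x_given_y = h_xy - entropy(py)          # chain rule
    return entropy(px) - h_x_given_y

# Two hypothetical extremes for a pair of binary variables.
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
identical = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

Independent variables share no information, while two copies of the same fair coin share exactly one bit.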

  40. 3. Statistical Inference
      The good
      • model-based measures of structure
      • concrete, testable predictions
      • better robustness to noise, uncertainty
      • well-grounded in computer science, statistics
      The bad
      • models must be explicit, precise
      • often hard computations
      • data intensive

  41. Two Case Studies: NCAA Schedule 2000 ($n = 115$, $m = 613$); Zachary's Karate Club ($n = 34$, $m = 78$) (figures: network drawings)

  42. Mixing Times: MCMC mixes relatively quickly; equilibrium in $O(n^2)$ steps (figures: traces reaching equilibrium)

  43. Hierarchies: point estimate; consensus hierarchy (figures)
