The Hierarchical Structure of Networks (Aaron Clauset, Santa Fe Institute)


SLIDE 1

The Hierarchical Structure of Networks

Aaron Clauset
Santa Fe Institute
4 August 2008
SFI / CAIDA Workshop
Networks and Navigation

SLIDE 2

First, Some Pictures

SLIDE 3

research collaborations*, teenage friendships*

social groups or communities

*image stolen from elsewhere

SLIDE 4

metabolites*, proteins*

functional(?) clusters, hierarchies

*image stolen from elsewhere

SLIDE 5

amazon.com communities: books on politics*

co-purchasing (topical?) groups

*image stolen from elsewhere

SLIDE 6

A Question

How can we extract

  • structural patterns
  • at many scales
  • in a rigorous fashion

from complex networks?

SLIDE 7

What is Structure?

some stylized ideas

SLIDE 8

no structure

SLIDE 9

no structure, modular structure (one scale)
SLIDE 10

no structure, modular structure (one scale), hierarchical structure (multi-scale)

SLIDE 11

A Question

How can we extract

  • hierarchical structure
  • in a rigorous fashion

from complex networks?

network data → ? → hierarchy

SLIDE 12

One Approach

Model-based inference

  • 1. describe how to generate hierarchies (a model)
  • 2. “fit” model to empirical data
  • 3. test “fitted” model
  • 4. extract predictions + insight
SLIDE 13

A Model of Hierarchy

SLIDE 14

A Model of Hierarchy

probability p_r at each internal node r

assortative modules

D, {p_r}

SLIDE 15

“inhomogeneous” random graph → model instance

$$\Pr(i, j \text{ connected}) = p_{r_{ij}} = p(\text{lowest common ancestor of } i, j)$$
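Drawing a model instance from (D, {p_r}) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: it uses a hypothetical nested-tuple dendrogram in which a leaf is an integer vertex label and an internal node r is `(left, right, p_r)`, and the helper names `leaves` and `sample_hrg` are ours. Each vertex pair whose lowest common ancestor is r is connected independently with probability p_r.

```python
import itertools
import random
from typing import Union

# Hypothetical dendrogram encoding: a leaf is an int vertex label and an
# internal node r is a tuple (left_subtree, right_subtree, p_r).
Tree = Union[int, tuple]


def leaves(t: Tree) -> list[int]:
    """Vertex labels in the subtree rooted at t."""
    if isinstance(t, int):
        return [t]
    left, right, _ = t
    return leaves(left) + leaves(right)


def sample_hrg(t: Tree, rng: random.Random) -> set[tuple[int, int]]:
    """Draw one graph from the ensemble defined by the dendrogram t.

    Every pair (i, j) with i in the left subtree of r and j in the right
    subtree has r as its lowest common ancestor, so it is connected
    independently with probability p_r.
    """
    if isinstance(t, int):
        return set()
    left, right, p_r = t
    edges = {
        (min(i, j), max(i, j))
        for i, j in itertools.product(leaves(left), leaves(right))
        if rng.random() < p_r
    }
    return edges | sample_hrg(left, rng) | sample_hrg(right, rng)


if __name__ == "__main__":
    # Two tight pairs {0, 1} and {2, 3}, never joined at the root.
    D = ((0, 1, 1.0), (2, 3, 1.0), 0.0)
    print(sorted(sample_hrg(D, random.Random(0))))  # [(0, 1), (2, 3)]
```

Setting every p_r to 0 or 1 makes the draw deterministic, which is handy for checking the encoding; intermediate values give a different random graph on each call.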

SLIDE 16

SLIDE 17

Model Features

  • explicit model = explicit assumptions
  • very flexible (many parameters)
  • captures structure at all scales
  • arbitrary mixtures of assortativity, disassortativity
  • learnable directly from data
SLIDE 18

Learning From Data

  • We use a Bayesian approach:
      • likelihood function $L = \Pr(\text{data} \mid \text{model})$ scores quality of model
      • sample high-quality models via MCMC
  • technical details in arXiv:physics/0610051 and Nature 453, p. 98 (2008)
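The slide defers the likelihood itself to the references. As we understand the hierarchical random graph model published there, a dendrogram D with probabilities {p_r} has likelihood $L(D, \{p_r\}) = \prod_r p_r^{E_r} (1 - p_r)^{L_r R_r - E_r}$, where E_r is the number of observed edges whose lowest common ancestor is r, and L_r, R_r are the numbers of leaves in r's left and right subtrees; for a fixed D this is maximized by p_r = E_r / (L_r R_r). The Python sketch below evaluates that maximum log-likelihood. The nested-pair dendrogram encoding and the function names are hypothetical.

```python
import math
from typing import Union

# Hypothetical dendrogram encoding: a leaf is an int vertex label and an
# internal node is a pair (left_subtree, right_subtree).
Tree = Union[int, tuple]


def leaves(t: Tree) -> set[int]:
    """Vertex labels in the subtree rooted at t."""
    if isinstance(t, int):
        return {t}
    return leaves(t[0]) | leaves(t[1])


def log_likelihood(t: Tree, edges: set[frozenset[int]]) -> float:
    """Log-likelihood of dendrogram t with every p_r at its maximum-likelihood value.

    At internal node r: E_r = observed edges between its two subtrees,
    L_r * R_r = all vertex pairs between them, p_r = E_r / (L_r * R_r).
    Node r contributes E_r log p_r + (L_r R_r - E_r) log(1 - p_r), using
    the convention 0 log 0 = 0.
    """
    if isinstance(t, int):
        return 0.0
    left, right = leaves(t[0]), leaves(t[1])
    pairs = len(left) * len(right)
    e_r = sum(1 for i in left for j in right if frozenset((i, j)) in edges)
    p_r = e_r / pairs
    node = 0.0
    if e_r > 0:
        node += e_r * math.log(p_r)
    if pairs > e_r:
        node += (pairs - e_r) * math.log(1 - p_r)
    return node + log_likelihood(t[0], edges) + log_likelihood(t[1], edges)


if __name__ == "__main__":
    E = {frozenset(e) for e in [(0, 1), (2, 3)]}
    print(log_likelihood(((0, 1), (2, 3)), E))  # 0.0: this split explains G perfectly
    print(log_likelihood(((0, 2), (1, 3)), E))  # 4 log(1/2), a worse fit
```

Working in log space avoids underflow when the product runs over many internal nodes; an MCMC sampler only ever needs differences of these values.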

SLIDE 19

From Graph to Ensemble

SLIDE 20

From Graph to Ensemble

  • Given graph G
  • run MCMC to equilibrium
  • then, for each sampled D, draw a resampled graph from the ensemble

A test: do resampled graphs look like the original?

SLIDE 21

Grassland species*

plant → herbivore → parasite

*thank you: Jennifer Dunne

SLIDE 22

Degree Distribution

[Figure (a): fraction of vertices with degree k vs. degree k, log scale, original vs. resampled]

SLIDE 23

Clustering Coefficient

[Figure: fraction of graphs with clustering coefficient c vs. clustering coefficient c, original vs. resampled]

SLIDE 24

Distance Distribution

[Figure (b): fraction of vertex pairs at distance d vs. distance d, original vs. resampled]
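The three statistics compared on these slides can be computed for the original graph and for each resampled graph with a short Python sketch. Assumptions: graphs are adjacency sets keyed by vertex; "clustering coefficient" is taken to be global transitivity (3 × triangles / connected triples), since the slides do not say which definition was used; and distances are tallied over connected vertex pairs only. The function names are hypothetical.

```python
from collections import Counter, deque

Adj = dict[int, set[int]]  # hypothetical format: vertex -> set of neighbours


def degree_distribution(adj: Adj) -> dict[int, float]:
    """Fraction of vertices with degree k, for each observed k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def clustering_coefficient(adj: Adj) -> float:
    """Global transitivity: 3 x (number of triangles) / (connected triples)."""
    closed = triples = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        # Each adjacent pair of neighbours closes a triangle through this vertex.
        closed += sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return closed / triples if triples else 0.0


def distance_distribution(adj: Adj) -> dict[int, float]:
    """Fraction of connected vertex pairs at shortest-path distance d (BFS)."""
    counts: Counter = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        counts.update(d for v, d in dist.items() if v != source)
    total = sum(counts.values())  # each unordered pair is counted twice
    return {d: c / total for d, c in sorted(counts.items())}
```

Averaging each statistic over many resampled graphs and overlaying it on the original's values gives the kind of comparison these slides make.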

SLIDE 25

Missing Links

A test: can model predict missing links?

SLIDE 26

Predicting is Hard

  • remove k edges from G
  • how easy to guess a missing link?

n = 75, m = 113

$p_{\text{guess}} \approx \frac{k}{\binom{n}{2} - m + k} = O(n^{-2})$, so here $p_{\text{guess}} = \frac{k}{2662 + k}$
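The arithmetic behind this estimate: the observed graph keeps m − k of its m edges, so it has $\binom{n}{2} - m + k$ non-adjacent pairs, of which exactly k are removed edges. A quick Python check, using a hypothetical helper `p_guess`:

```python
from math import comb


def p_guess(n: int, m: int, k: int) -> float:
    """Probability that one uniformly random non-adjacent pair is a removed edge."""
    non_edges = comb(n, 2) - (m - k)  # the observed graph keeps m - k edges
    return k / non_edges


# Grassland species network: n = 75, m = 113.
print(comb(75, 2) - 113)  # 2662, so p_guess = k / (2662 + k)
print(p_guess(75, 113, 10))
```

Because the denominator grows like n²/2 while k stays small, blind guessing gets rapidly worse as networks grow.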

SLIDE 27
Predicting Missing Links

  • Given incomplete graph G
  • run MCMC to equilibrium
  • then, over sampled D, compute average ⟨p_r⟩ for links (i, j) ∉ G
  • predict that links with high ⟨p_r⟩ values are missing

Test idea via leave-k-out cross-validation:
  • perfect accuracy: AUC = 1
  • no better than chance: AUC = 1/2
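The prediction and scoring steps above can be sketched as follows. The sketch assumes (hypothetically) that each dendrogram sampled by the MCMC has already been reduced to a dict mapping every candidate pair (i, j) ∉ G to its $p_{r_{ij}}$. Scores are averaged across samples, and AUC is computed exactly as the fraction of (removed edge, true non-edge) comparisons in which the removed edge scores higher, counting ties as 1/2; under that usual definition a perfect ranking gives 1 and a random one about 1/2. The function names are ours.

```python
from statistics import fmean

Pair = tuple[int, int]


def averaged_scores(samples: list[dict[Pair, float]]) -> dict[Pair, float]:
    """Mean of p_{r_ij} over sampled dendrograms for each candidate pair."""
    return {pair: fmean(s[pair] for s in samples) for pair in samples[0]}


def auc(scores: dict[Pair, float], removed: set[Pair]) -> float:
    """Chance that a removed edge outscores a true non-edge (ties count 1/2)."""
    hits = [v for p, v in scores.items() if p in removed]
    misses = [v for p, v in scores.items() if p not in removed]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in hits for b in misses)
    return wins / (len(hits) * len(misses))


if __name__ == "__main__":
    # Two hypothetical MCMC samples over three candidate pairs.
    samples = [{(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.5},
               {(0, 1): 0.7, (0, 2): 0.3, (1, 2): 0.5}]
    scores = averaged_scores(samples)
    print(auc(scores, removed={(0, 1)}))  # 1.0: the removed edge ranks first
```

The exact pairwise count is quadratic in the number of candidates; for large graphs one would instead sample random (removed, non-edge) pairs and average the same comparison.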

SLIDE 28

Missing Structure

[Figure: AUC (area under ROC curve) vs. fraction of edges observed, k/m, for the grassland species network; predictors: pure chance, simple predictors (common neighbors, Jaccard coefficient, degree product, shortest paths), and hierarchical structure]
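The "simple predictors" in this comparison are standard link-prediction scores. The slide gives only their names, so the definitions below are the common textbook ones, an assumption on our part: common neighbors |Γ(i) ∩ Γ(j)|, the Jaccard coefficient |Γ(i) ∩ Γ(j)| / |Γ(i) ∪ Γ(j)|, the degree product k_i k_j, and the negated shortest-path distance so that closer pairs rank higher. The adjacency format and function names are hypothetical.

```python
from collections import deque

Adj = dict[int, set[int]]  # hypothetical format: vertex -> set of neighbours


def common_neighbors(adj: Adj, i: int, j: int) -> int:
    return len(adj[i] & adj[j])


def jaccard(adj: Adj, i: int, j: int) -> float:
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0


def degree_product(adj: Adj, i: int, j: int) -> int:
    return len(adj[i]) * len(adj[j])


def shortest_path_score(adj: Adj, i: int, j: int) -> float:
    """Negated BFS distance from i to j; -inf if j is unreachable."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return -float(dist[u])
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return float("-inf")


if __name__ == "__main__":
    adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    pair = (1, 3)  # a non-adjacent pair
    print(common_neighbors(adj, *pair), jaccard(adj, *pair),
          degree_product(adj, *pair), shortest_path_score(adj, *pair))
    # 1 0.5 2 -2.0
```

Each score can replace the averaged hierarchy probability in the same AUC evaluation, which is how a comparison like this one is set up.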

SLIDE 29

Other Networks

[Figure (a): AUC vs. fraction of edges observed, terrorist association network]

[Figure (b): AUC vs. fraction of edges observed, T. pallidum metabolic network]

Predictors in both panels: pure chance, common neighbors, Jaccard coefficient, degree product, shortest paths, hierarchical structure

SLIDE 30

Summary

  • Many real networks are hierarchically modular
  • Hierarchies can
      • model multi-scale structure
      • generalize a single network
      • predict missing links
  • Model-based inference is very powerful

Acknowledgments:

  • C. Moore, M.E.J. Newman, C.H. Wiggins, and C.R. Shalizi
SLIDE 31

Fin

SLIDE 32

Markov chain Monte Carlo (MCMC)

  • Given D, choose a random internal node
  • Choose a random reconfiguration of its subtrees: three subtree configurations (up to relabeling)
  • Recompute probabilities {p_r} and likelihood L
  • Sample states according to their likelihood [ergodicity] [detailed balance]
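One move of this chain can be sketched as below. The slide does not spell out the acceptance rule; this sketch assumes the standard Metropolis rule, which, combined with a symmetric proposal, gives detailed balance with respect to L. The subtrees s, t, u stand for the two children of r's internal child plus r's other child; `rearrangements`, `propose` and `metropolis_accept` are hypothetical names, and recomputing {p_r} and L for the proposed tree is left to the caller.

```python
import math
import random


def rearrangements(s, t, u):
    """The three configurations, up to relabeling, of subtrees s, t, u under an
    internal node r whose child is internal: ((s, t), u), ((s, u), t), ((t, u), s)."""
    return [((s, t), u), ((s, u), t), ((t, u), s)]


def propose(s, t, u, rng: random.Random):
    """Starting from ((s, t), u), pick one of the two other configurations uniformly."""
    return rng.choice(rearrangements(s, t, u)[1:])


def metropolis_accept(log_l_old: float, log_l_new: float, rng: random.Random) -> bool:
    """Accept with probability min(1, L_new / L_old), computed in log space."""
    if log_l_new >= log_l_old:
        return True
    return rng.random() < math.exp(log_l_new - log_l_old)


if __name__ == "__main__":
    rng = random.Random(42)
    print(propose("s", "t", "u", rng))  # one of the two alternative configurations
    print(metropolis_accept(-10.0, -12.0, rng))  # True with probability e**-2
```

Comparing log-likelihoods rather than raw likelihoods keeps the acceptance test numerically stable even when L itself is astronomically small.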

SLIDE 33

Grassland species

plant → herbivore → parasite

SLIDE 34


SLIDE 35

Graph Resampling

SLIDE 36
  • 1. Summary Statistics
[Figure: degree distribution P(x) vs. x, log-log axes; distance distribution p(d) vs. distance d]

degree distribution, distance distribution, rich-club distribution, short-loop distribution, betweenness function, degree-degree correlations, ... etc.

SLIDE 37
  • 1. Summary Statistics

The good

  • good for exploratory analysis
  • often quick calculations

The bad

  • throw away important information
  • can make different networks appear similar
  • what are right statistics to measure?
  • different statistics often highly correlated
  • indirect measures of large-scale structure, function
SLIDE 38
  • 2. Algorithmic Analysis

global modularity Q, local modularity R, network motifs, box covering, clique covering, ... etc.
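Global modularity Q is usually defined (Newman and Girvan) as $Q = \sum_c \left[ e_c/m - (d_c/2m)^2 \right]$, where for each community c, e_c is the number of edges inside c, d_c is the total degree of its vertices, and m is the number of edges. The slide only names Q, so this standard form is our assumption. A minimal Python evaluation for a given partition, with a hypothetical edge-list and community-map input:

```python
from collections import defaultdict


def modularity(edges: list[tuple[int, int]], community: dict[int, int]) -> float:
    """Newman-Girvan modularity Q = sum_c [e_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    inside = defaultdict(int)      # e_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of vertices in c
    for i, j in edges:
        degree_sum[community[i]] += 1
        degree_sum[community[j]] += 1
        if community[i] == community[j]:
            inside[community[i]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single edge, split into their natural groups.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(modularity(edges, community))  # 5/14, about 0.357
```

Evaluating Q for one partition is cheap; the hard part noted on the next slide is searching over partitions to maximize it.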

SLIDE 39
  • 2. Algorithmic Analysis

The good

  • good for exploratory analysis
  • illustrate large-scale structure, heterogeneity

The bad

  • often (NP-)hard optimizations
  • can be sensitive to noise, uncertainty
  • ad hoc or heuristic measures of structure, function
  • algorithm = theory
  • implied physics often unclear
SLIDE 40
  • 3. Statistical Inference

hierarchical random graphs, community mixtures, latent space models, information bottlenecks, correlation reconstruction, network classification

$I(X; Y) = H(X) - H(X \mid Y)$
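The identity on this slide can be evaluated directly for a discrete joint distribution. This small Python sketch uses base-2 logarithms, so the result is in bits; the list-of-rows input format (`joint[x][y]`) and the function names are hypothetical. It computes H(X) from the row sums and H(X|Y) as the p(y)-weighted entropy of each normalized column.

```python
import math


def entropy(probs) -> float:
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint: list[list[float]]) -> float:
    """I(X;Y) = H(X) - H(X|Y) for a joint table joint[x][y]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    # H(X|Y) = sum_y p(y) H(X | Y = y)
    h_x_given_y = sum(
        py * entropy([joint[x][y] / py for x in range(len(joint))])
        for y, py in enumerate(p_y) if py > 0
    )
    return entropy(p_x) - h_x_given_y


if __name__ == "__main__":
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: Y determines X
    print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: independent
```

The two cases bracket the range: knowing Y either removes all uncertainty about X or none of it.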

SLIDE 41
  • 3. Statistical Inference

The good

  • model-based measures of structure
  • concrete, testable predictions
  • better robustness to noise, uncertainty
  • well-grounded in computer science, statistics

The bad

  • models must be explicit, precise
  • often hard computations
  • data intensive
SLIDE 42

Two Case Studies

Zachary’s Karate Club: n = 34, m = 78

NCAA Schedule 2000: n = 115, m = 613

[Figure: both networks drawn with numbered vertices]

SLIDE 43

Mixing Times

MCMC mixes relatively quickly: equilibrium in $O(n^2)$ steps

[Figure: MCMC trace plots showing convergence to equilibrium]

SLIDE 44

Hierarchies

[Figure: point estimate and consensus hierarchy for Zachary’s karate club, leaves labeled by vertex number]

SLIDE 45

Hierarchies

[Figure: point estimate and consensus hierarchy for the NCAA Schedule 2000 network, leaves labeled by team name and index]