sampling & community structure
in densely connected networks
Sune Lehmann, YY Ahn and JP Bagrow Technical University of Denmark
Yong-Yeol Ahn, Jim Bagrow
pervasive overlap, sampling, metadata
Illustration: Newman & Girvan. PRE 69, 026113 (2004)
[Figure: network visualizations whose node labels are personal names]
Clauset, Moore, Newman. Nature 453, 98 (2008)
Link similarity, two examples on nodes a, b, c: $S(e_{ac}, e_{bc}) = 1/3$ and $S(e_{ac}, e_{bc}) = 1$
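The two similarity values can be reproduced with a small sketch. It assumes the similarity of two links sharing a node is the Jaccard index of the inclusive neighbor sets of their two unshared endpoints. It also assumes the two pictured graphs are a path a-c-b and a triangle a-b-c. The function name `link_similarity` and both example graphs are illustrative, not taken from the slides.

```python
from fractions import Fraction

def link_similarity(adj, i, j, k):
    """Jaccard similarity of links e_ik and e_jk, which share node k.

    adj maps each node to the set of its neighbors (undirected graph).
    Inclusive neighbor sets (a node plus its neighbors) are compared.
    """
    assert k in adj[i] and k in adj[j], "both links must touch the shared node k"
    n_i = adj[i] | {i}
    n_j = adj[j] | {j}
    return Fraction(len(n_i & n_j), len(n_i | n_j))

# Assumed example 1: path a - c - b (a and b are not connected)
path = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}
# Assumed example 2: triangle a - b - c
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

s_path = link_similarity(path, "a", "b", "c")
s_triangle = link_similarity(triangle, "a", "b", "c")
```

On the path, the two inclusive neighborhoods share only c out of three nodes. On the triangle, the neighborhoods coincide.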
LinkComm R Package by Alex T. Kalinka
$\frac{n_c(n_c-1)}{2}$
[Figure: word network; node labels: BRUSH, HAIR, GROOM, COMB, HAIRSPRAY, TOOTHPASTE, TOOTHBRUSH, PAINTER, PAINTING, PAINT, BROOM, SWEEP, BEAUTIFUL, SUNSHINE, AFTERNOON, EVENING]
LinkComm R Package by Alex T. Kalinka (Pavel Tomancak's group)
Four evaluation measures: community quality, overlap quality, community coverage, overlap coverage
[Figure legend: community memberships vs. no membership; high coverage vs. low coverage; high overlap coverage vs. low overlap coverage; high overlap vs. low overlap]
Subjects
HIV / AIDS Medical Nonfiction / General Infectious Diseases
Subjects
Africa - General Africa History
Amazon.com
Subjects
HIV / AIDS Medical Africa
Acetyl-CoA
Many pathway Memberships
IDP (Inosine diphosphate)
Few pathway Memberships
Metabolic network
Network | Description | N | ⟨k⟩ | Metadata (community) | Metadata (overlap)
PPI (Y2H) | PPI network of S. cerevisiae from a yeast two-hybrid (Y2H) experiment [41] | 1647 | 3.06 | Set of each protein's known functions (GO terms)^a | The number of GO terms
PPI (AP/MS) | Affinity purification mass spectrometry (AP/MS) experiment | 1004 | 16.57 | GO terms | GO terms
PPI (LC) | Literature curated (LC) | 1213 | 4.21 | GO terms | GO terms
PPI (all) | Union of Y2H, AP/MS, and LC PPI networks | 2729 | 8.92 | GO terms | GO terms
Metabolic | Metabolic network (metabolites connected by reactions) of E. coli | 1042 | 16.81 | Set of each metabolite's pathway annotations (KEGG)^b | The number of KEGG pathway annotations
Phone | Social contacts between mobile phone users [44, 45, 46] | 885989 | 6.34 | Each user's most likely geographic location | Call activity (number of phone calls)
Actor | Film actors that appear in the same movies during 2000–2009 [47] | 67411 | 8.90 | Set of plot keywords for all of the actor's films | Length of career (year of first role)
US Congress | Congressmen who co-sponsor bills during the 108th US Congress [48, 49] | 390 | 38.95 | Political ideology, from the common space score [50, 51] | Seniority (number … served)
Philosopher | Philosophers and their philosophical influences, from the English Wikipedia^c | 1219 | 9.80 | Set of (Wikipedia) hyperlinks in the philosopher's page | Number of Wikipedia subject categories
Word Assoc. | English words that are often mentally associated [52] | 5018 | 22.02 | Set of each word's senses, as documented by WordNet^d | Number of senses
Amazon.com | Products that users frequently buy together | 18142 | 5.09^e | Set of each product's user tags (annotations) | Number of product categories
Finding community structure in very large
nity structure. Proceedings of the National Academy of Sciences 105, 1118–1123 (2008).
eny, I., Farkas, I. & Vicsek, T. Uncovering the overlapping community structure
[Figure: composite performance (axis ticks 1 to 4) of four methods on each network, with N and ⟨k⟩ listed per network. Biological networks: Metabolic, PPI (Y2H), PPI (AP/MS), PPI (LC), PPI (all). Social networks: Phone, Actor, US Congress. Other networks: Philosopher, Word Assoc., Amazon.com.]
Methods: L = Links, C = Clique Percolation, G = Greedy Modularity, I = Infomap
A pervasively overlapping network is characterized by two degree distributions, $r_m$ and $s_n$.
These determine the fraction of elements that belong to $m$ modules and the fraction of modules that contain $n$ elements, with averages $\mu = \sum_m m r_m$ and $\nu = \sum_n n s_n$.
Projection provides the module and element networks, respectively.
Failures occur on the element network.
Before projection, elements fail with probability $(1 - p)$ and are removed from the network.
We say that a module fails when fewer than some critical fraction $f_c$ of the nodes in the module remain.
Failed modules are removed from the module network, but their elements remain in the element network.
We wish to determine $S(p)$, the fraction of remaining nodes within the giant component as a function of $p$, for both the element and module networks.
The giant component in the element network disappears when the network loses global connectivity. In the module network the giant connected component vanishes when the modules become uncoupled (non-…).
Could we end up in a situation where the element network remains globally connected but the module network has undergone a percolation transition?
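The failure model described above can be explored with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: the function names, the stub-matching construction of a random membership structure with uniform $\mu$ and $\nu$, and the choice to report the largest cluster as a fraction of the original nodes (the S' normalization) rather than of the remaining nodes.

```python
import random
from collections import defaultdict

def build_memberships(n_modules, mu, nu, rng):
    """Random memberships: each element gets mu stubs, each module takes nu stubs.
    An element may land twice in one module; the set then holds it once."""
    n_elements = n_modules * nu // mu
    stubs = [e for e in range(n_elements) for _ in range(mu)]
    rng.shuffle(stubs)
    modules = [set(stubs[c * nu:(c + 1) * nu]) for c in range(n_modules)]
    return n_elements, modules

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def largest_cluster_fraction(n_nodes, groups, alive):
    """Largest cluster size divided by the ORIGINAL node count.
    Alive nodes that share a group are considered connected."""
    parent = list(range(n_nodes))
    for group in groups:
        members = [v for v in group if v in alive]
        for v in members[1:]:
            parent[find(parent, v)] = find(parent, members[0])
    sizes = defaultdict(int)
    for v in alive:
        sizes[find(parent, v)] += 1
    return max(sizes.values(), default=0) / n_nodes

def simulate(p, fc, n_modules=3000, mu=3, nu=3, seed=0):
    """Return (element, module) largest-cluster fractions after random failures."""
    rng = random.Random(seed)
    n_elements, modules = build_memberships(n_modules, mu, nu, rng)
    # Each element survives with probability p (fails with probability 1 - p).
    alive_elements = {e for e in range(n_elements) if rng.random() < p}
    # A module fails when fewer than a fraction fc of its elements remain.
    alive_modules = {c for c, members in enumerate(modules)
                     if len(members & alive_elements) >= fc * len(members)}
    # Surviving modules are linked through the surviving elements they share.
    shared = defaultdict(list)
    for c in alive_modules:
        for e in modules[c] & alive_elements:
            shared[e].append(c)
    s_elements = largest_cluster_fraction(n_elements, modules, alive_elements)
    s_modules = largest_cluster_fraction(n_modules, shared.values(), alive_modules)
    return s_elements, s_modules
```

Calling `simulate(0.4, 0.7)` gives one point where the element network can stay largely connected while the module network has fragmented. Sweeping `p` traces out both curves.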
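Generating functions for the two distributions:
$$f_0(z) = \sum_{m=0}^{\infty} r_m z^m, \quad f_1(z) = \frac{1}{\mu}\sum_{m=0}^{\infty} m\, r_m z^{m-1}, \quad g_0(z) = \sum_{n=0}^{\infty} s_n z^n, \quad g_1(z) = \frac{1}{\nu}\sum_{n=0}^{\infty} n\, s_n z^{n-1}. \tag{1}$$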
1 Element network
Consider a randomly chosen element A that belongs to a group of size $n$. Let $P(k|n)$ be the probability that A still belongs to a connected cluster of $k$ nodes (including itself) in this group after failures occur:
$$P(k|n) = \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k}. \tag{2}$$
The generating function for the number of other elements connected to A within this group is
$$h_n(z) = \sum_{k=1}^{n} P(k|n)\, z^{k-1} = (zp + 1 - p)^{n-1}. \tag{3}$$
Averaging over module size:
$$h(z) = \frac{1}{\nu} \sum_{n=0}^{\infty} n\, s_n h_n(z) = g_1(zp + 1 - p). \tag{4}$$
The total number of elements that A is connected to, from all modules it belongs to, is then generated by
$$G_0(z) = f_0(h(z)). \tag{5}$$
Likewise, the total number of elements that a randomly chosen neighbor of A is connected to is generated by
$$G_1(z) = f_1(h(z)). \tag{6}$$
Before determining $S$, we first identify the critical point $p_c$ where the giant component emerges. This happens when the expected number of elements two steps away from a random element exceeds the number one step away, or
$$\partial_z G_0(G_1(z))\big|_{z=1} - \partial_z G_0(z)\big|_{z=1} > 0. \tag{7}$$
Substituting Eqs. (5) and (6) gives $f_0'(1)\,h'(1)\,[f_1'(1)\,h'(1) - 1] > 0$, or $f_1'(1)\,h'(1) > 1$. Finally, the condition for a giant component to exist, since $h'(1) = p\,g_1'(1)$, is
$$p\, f_1'(1)\, g_1'(1) > 1. \tag{8}$$
For the uniform case, $r_m = \delta(m, \mu)$ and $s_n = \delta(n, \nu)$, this gives $p(\mu - 1)(\nu - 1) > 1$. If $\mu = 3$ and $\nu = 3$, then the transition occurs at $p_c = 1/4$.
To find $S$, consider the probability $u$ for element A to not belong to the giant component. A is not a member of the giant component only if all of A's neighbors are also not members, so $u$ satisfies the self-consistency condition $u = G_1(u)$. The size of the giant component is then $S = 1 - G_0(u)$.
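The element-network result can be evaluated numerically for the uniform case $r_m = \delta(m, \mu)$, $s_n = \delta(n, \nu)$. The sketch below is a minimal illustration under that assumption. The function names are hypothetical, and plain fixed-point iteration from $u = 0$ is one simple way to solve $u = G_1(u)$, not necessarily the method used for the slides.

```python
def element_pc(mu, nu):
    """Critical point from p (mu - 1)(nu - 1) > 1, uniform case only."""
    return 1.0 / ((mu - 1) * (nu - 1))

def element_giant_component(p, mu, nu, tol=1e-12, max_iter=100000):
    """S = 1 - G_0(u), where u is the smallest solution of u = G_1(u)."""
    def h(z):
        # h(z) = g_1(zp + 1 - p), with g_1(z) = z^(nu - 1) in the uniform case
        return (z * p + 1 - p) ** (nu - 1)

    def G0(z):
        # G_0(z) = f_0(h(z)), with f_0(z) = z^mu
        return h(z) ** mu

    def G1(z):
        # G_1(z) = f_1(h(z)), with f_1(z) = z^(mu - 1)
        return h(z) ** (mu - 1)

    u = 0.0
    for _ in range(max_iter):
        u_next = G1(u)
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    return 1.0 - G0(u)
```

With `mu = 3` and `nu = 3`, `element_pc` returns 1/4. `element_giant_component` stays near zero below that value and grows above it. Convergence is slow very close to the critical point, so the iteration cap matters there.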
2 Module network
Consider a random module C and then a random member element A. Let $Q(\ell|m)$ be the probability that C is connected to $\ell$ modules, including itself, through element A, who was a member of $m$ modules:
$$Q(\ell|m) = \binom{m-1}{\ell-1} q_1^{\ell-1} (1 - q_1)^{m-\ell}, \tag{9}$$
where
$$q_1 = \frac{1}{\nu} \sum_{n=0}^{\infty} n\, s_n \sum_{i=x}^{n} \binom{n-1}{i-1} p^{i-1} (1-p)^{n-i}. \tag{10}$$
(Notice that $q_1 = 1$ when $x(n) \equiv \lceil n f_c \rceil = 1$ for all $n$.) The generating function $j_m$ for the number of modules that C is connected to, including itself, through A is
$$j_m(z) = \sum_{\ell=1}^{m} Q(\ell|m)\, z^{\ell-1} = (z q_1 + 1 - q_1)^{m-1}. \tag{11}$$
Once again, averaging $j_m$ over memberships gives
$$j(z) = \frac{1}{\mu} \sum_{m=0}^{\infty} m\, r_m j_m(z) = f_1(z q_1 + 1 - q_1). \tag{12}$$
The total number of modules that C is connected to is not generated by $g_0(j(z))$ but by $\tilde{g}_0(j(z))$, where the $\tilde{g}_i$ are the generating functions for module size after elements fail:
$$\tilde{g}_0(z) = \sum_{n=0}^{\infty} \tilde{s}_n z^n, \qquad \tilde{g}_1(z) = \frac{\sum_{n=0}^{\infty} n\, \tilde{s}_n z^{n-1}}{\sum_{n=0}^{\infty} n\, \tilde{s}_n}. \tag{13}$$
The probability $\tilde{s}_k$ to have $k$ member elements remaining in a module after percolation is given by
$$\tilde{s}_k = \frac{\sum_{n} s_n \binom{n}{k} p^{k} (1-p)^{n-k}}{\sum_{n} s_n \sum_{k'=x}^{n} \binom{n}{k'} p^{k'} (1-p)^{n-k'}}. \tag{14}$$
The denominator is necessary for normalization since we cannot observe modules with fewer than $\lceil n f_c \rceil$ members. Notice that $\tilde{s}_n = s_n$ when $s_n = \delta(n, \nu)$ and $\lceil n f_c \rceil = n = \nu$.
Finally, the total number of modules connected to C through any member elements is generated by $F_0(z) = \tilde{g}_0(j(z))$ and the total number of modules connected to a random neighbor of C is generated by $F_1(z) = \tilde{g}_1(j(z))$. As before, the module network has a giant component when $\partial_z F_0(F_1(z))|_{z=1} - \partial_z F_0(z)|_{z=1} > 0$, and $S = 1 - F_0(u) = 1 - \tilde{g}_0(j(u))$, where $u$ satisfies $u = F_1(u) = \tilde{g}_1(j(u))$.
For the uniform case with $\mu = 3$, $\nu = 3$, and $f_c > 2/3$, the critical point for the module network is $p_c = 1/2$, a considerably higher threshold than for the element network ($p_c = 1/4$). In Fig. 2 we show $S$ for $\mu = 3$ and $\nu = 6$. The "robustness gap" between the element and module networks widens as the module failure cutoff increases, covering a significant range
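The module-network calculation can be checked numerically in the uniform case. The following is a hedged sketch, not the authors' code. It assumes $r_m = \delta(m, \mu)$ and $s_n = \delta(n, \nu)$, that surviving module sizes are restricted to $k \ge x = \lceil \nu f_c \rceil$, and that the branching factor $F_1'(1)$ increases with $p$ so that bisection can locate $p_c$. All function names are hypothetical.

```python
from math import ceil, comb

def module_quantities(p, nu, fc):
    """Return (q_1, s_tilde) for s_n = delta(n, nu); requires 0 < p <= 1, 0 < fc <= 1."""
    x = ceil(nu * fc)  # minimum number of surviving members for a module to survive
    q1 = sum(comb(nu - 1, i - 1) * p ** (i - 1) * (1 - p) ** (nu - i)
             for i in range(x, nu + 1))
    weights = {k: comb(nu, k) * p ** k * (1 - p) ** (nu - k)
               for k in range(x, nu + 1)}
    norm = sum(weights.values())
    return q1, {k: w / norm for k, w in weights.items()}

def branching_factor(p, mu, nu, fc):
    """F_1'(1) = g~_1'(1) * j'(1); a giant component exists when this exceeds 1."""
    q1, s = module_quantities(p, nu, fc)
    mean_k = sum(k * w for k, w in s.items())
    g1_prime = sum(k * (k - 1) * w for k, w in s.items()) / mean_k
    return g1_prime * (mu - 1) * q1  # j'(1) = (mu - 1) q_1 when f_1(z) = z^(mu - 1)

def module_pc(mu, nu, fc, tol=1e-10):
    """Bisection for the p at which the branching factor crosses 1."""
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if branching_factor(mid, mu, nu, fc) > 1:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def module_giant_component(p, mu, nu, fc, tol=1e-12, max_iter=100000):
    """S = 1 - F_0(u), where u solves u = F_1(u) by fixed-point iteration from 0."""
    q1, s = module_quantities(p, nu, fc)
    mean_k = sum(k * w for k, w in s.items())
    j = lambda z: (z * q1 + 1 - q1) ** (mu - 1)
    F0 = lambda z: sum(w * j(z) ** k for k, w in s.items())
    F1 = lambda z: sum(k * w * j(z) ** (k - 1) for k, w in s.items()) / mean_k
    u = 0.0
    for _ in range(max_iter):
        u_next = F1(u)
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    return 1.0 - F0(u)
```

For `mu = 3`, `nu = 3`, `fc = 0.7`, the branching factor reduces to $4p^2$, so the bisection should land on the $p_c = 1/2$ quoted above.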
[Figure: theory and simulations for elements and for modules with $f_c = 1/2$, $2/3$, $5/6$, $1$]
It is known that scale-free networks are robust to random failures when $2 < \lambda < 3$ (meaning that $p_c \to 0$). (This result requires the max value to be large.)
[Figure: $S$ versus $p$; panels $\lambda = 2.5$, $\lambda = 3.0$, $\lambda = 3.5$; curves for $N = 100$, $N = 500$, $N = 5000$]
Here we take $r_m = \delta(m, \mu)$ as before, but now $s_n \sim n^{-\lambda}$, with $\lambda \ge 2$.
As we lower $\lambda$ (increasing K), the elements become more robust (as expected), but the module network becomes less robust.
For modular networks, it may not be feasible to build extremely large … $N = \max\{n \mid s_n > 0\}$ only improves element robustness.
Figure 3: Robustness of scale-free networks. Here $r_m = \delta(m, 3)$, $s_n \sim n^{-\lambda}$, $f_c = 1/2$, and $N \equiv \max\{n \mid s_n > 0\}$. Increasing $N$ and decreasing $\lambda$, measures known to improve
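The element-side claim can be illustrated with the giant-component condition $p\, f_1'(1)\, g_1'(1) > 1$. The sketch below evaluates the resulting element threshold for a truncated power-law module-size distribution. It assumes $r_m = \delta(m, \mu)$ and $s_n \propto n^{-\lambda}$ on $n_{\min} \le n \le N$. The lower cutoff `n_min = 1` and the function name are assumptions, not taken from the slides.

```python
def scale_free_element_pc(lam, n_max, mu=3, n_min=1):
    """Element-network threshold p_c = 1 / (f_1'(1) g_1'(1)).

    f_1'(1) = mu - 1 for r_m = delta(m, mu).
    g_1'(1) = sum n (n - 1) s_n / sum n s_n; the normalization of s_n cancels.
    A returned value above 1 means no giant component exists for any p.
    """
    sizes = range(n_min, n_max + 1)
    weights = [n ** -lam for n in sizes]  # unnormalized s_n
    first_moment = sum(n * w for n, w in zip(sizes, weights))
    excess = sum(n * (n - 1) * w for n, w in zip(sizes, weights))
    g1_prime = excess / first_moment
    return 1.0 / ((mu - 1) * g1_prime)

# Thresholds for the N values shown in the figure, at lambda = 2.5
pcs = [scale_free_element_pc(2.5, n) for n in (100, 500, 5000)]
```

For $\lambda = 2.5$ the threshold keeps falling as $N$ grows, consistent with $p_c \to 0$ for $2 < \lambda < 3$. For $\lambda = 3.5$ it remains well away from zero.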
[Figure: $S'$ versus $p$ for six networks: Brain (Functional), Metabolic, Protein-Protein Interaction, Word Association, Collaborations, Web Links; curves for elements and for modules with $f_c = 0.5$, $0.6$, $0.7$]
$S'(p)$: the fraction of original nodes in the giant connected component. Shaded regions provide a guide to the eye for the robustness gap ($f_c = 0.7$).