Detecting community structure in networks

  1. Detecting community structure in networks: M.E.J. Newman’s results [1, 2] (presented by Botond Szabo)
     [1] Detecting community structure in networks (2004)
     [2] Finding community structure in networks using eigenvectors of matrices (2006)
     Statistics for Structures Seminar, Amsterdam, 01.04.2015.

  2. Outline
     • Introduction
     • Bisection algorithms
       • Spectral algorithm (Laplacian)
       • The Kernighan-Lin algorithm (greedy)
       • Modularity algorithm
     • Multisection algorithms
       • Girvan and Newman algorithm
       • Generalized modularity algorithm
     • Conclusion

  3. Model
     Model: Graph $G = (V, E)$, with unweighted vertices $V$ and undirected, unweighted edges $E$.
     Goal: Find communities.
     Examples: Social networks, biochemical networks, information networks (parallel computing).

  4. Spectral algorithm I.
     Definition: Laplacian $L = D - A$, where $D$ is the diagonal matrix of vertex degrees and $A$ is the adjacency matrix.
     Properties:
     • Since $D_{i,i} = \sum_j A_{i,j}$, the vector $v_1 = (1, 1, \ldots, 1)$ is an eigenvector of $L$ with eigenvalue $\lambda_1 = 0$.
     • All eigenvalues $\lambda_i$ are non-negative.
     • The number of zero eigenvalues equals the number of connected components.
     • In symmetric matrices, eigenvectors corresponding to different eigenvalues are orthogonal.
     • In connected graphs, every eigenvector except $v_1$ has both positive and negative components.
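
The Laplacian properties listed on this slide can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical toy graph (two triangles joined by one edge) that does not appear in the slides.

```python
import numpy as np

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0          # undirected, unweighted

D = np.diag(A.sum(axis=1))           # vertex degrees on the diagonal
L = D - A

# eigh handles symmetric matrices: ascending eigenvalues, orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(L)

ones_is_null = np.allclose(L @ np.ones(n), 0.0)             # L v_1 = 0
all_nonnegative = bool(eigvals.min() > -1e-10)              # lambda_i >= 0
num_zero = int(np.sum(np.abs(eigvals) < 1e-10))             # number of components
orthonormal = np.allclose(eigvecs.T @ eigvecs, np.eye(n))   # orthogonal eigenvectors
v2 = eigvecs[:, 1]
mixed_signs = bool(v2.min() < 0 < v2.max())                 # v_2 changes sign

print(ones_is_null, all_nonnegative, num_zero, orthonormal, mixed_signs)
```

Because the toy graph is connected, exactly one eigenvalue should be numerically zero.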

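  6. Spectral algorithm II.
     Application: Consider the problem of finding two communities in a connected graph.
     Goal: Minimize the cut size
     $$R = \frac{1}{2} \sum_{i,j \text{ in different groups}} A_{i,j} = \frac{1}{4} s^T L s = \frac{1}{4} \sum_{i=1}^{n} a_i^2 \lambda_i,$$
     where $s_i = \pm 1$ (group indicator) and $s = \sum_{i=1}^{n} a_i v_i$, with $a_i = v_i^T s$ for orthonormal eigenvectors $v_i$ of $L$.
     Problem: The minimum of $R$ (namely $R = 0$) is attained in the trivial case $s = (1, 1, \ldots, 1)$, where all vertices are in one group.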
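
The three expressions for the cut size can be compared on a small example. This is a sketch only, assuming NumPy, a hypothetical two-triangle toy graph, and the normalization $a_i = v_i^T s$ with orthonormal eigenvectors (under which the spectral sum carries the same factor $1/4$ as $s^T L s$).

```python
import numpy as np

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Group indicator: first triangle +1, second triangle -1.
s = np.array([1, 1, 1, -1, -1, -1], dtype=float)

# (a) Direct count of edges whose endpoints lie in different groups.
R_count = sum(1 for i, j in edges if s[i] != s[j])

# (b) Quadratic form (1/4) s^T L s.
R_quadratic = 0.25 * float(s @ L @ s)

# (c) Spectral expansion with a_i = v_i^T s (orthonormal eigenvectors from eigh).
eigvals, eigvecs = np.linalg.eigh(L)
a = eigvecs.T @ s
R_spectral = 0.25 * float(np.sum(a**2 * eigvals))

# The trivial division (all vertices in one group) has cut size zero.
R_trivial = 0.25 * float(np.ones(n) @ L @ np.ones(n))

print(R_count, R_quadratic, R_spectral, R_trivial)
```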

  7. Spectral algorithm III.
     Solution:
     • Fix the sizes of the two groups $(n_1, n_2)$. Then $a_1^2 = (v_1^T s)^2 = (n_1 - n_2)^2 / n$.
     • Ideally $s$ would be proportional to $v_2$, but $s_i \in \{-1, 1\}$.
     • Choose $s$ as close to proportional to $v_2$ as possible:
       $$s_i = \begin{cases} +1 & \text{if } v_i^{(2)} \ge 0, \\ -1 & \text{if } v_i^{(2)} < 0. \end{cases} \qquad (1)$$
     • If $\#\{i : v_i^{(2)} \ge 0\} > n_1$, then assign the vertices with the smallest such components to the other group.
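
One way to read the size-constrained rule is: give the $n_1$ vertices with the largest components of $v_2$ the label $+1$ and the rest $-1$. The sketch below follows that reading, assuming NumPy and a connected graph. The helper names `laplacian`, `cut_size`, and `spectral_bisection` are hypothetical. Because the overall sign of $v_2$ is arbitrary, the sketch tries both orientations and keeps the smaller cut.

```python
import numpy as np

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

def cut_size(A, s):
    # R = (1/4) s^T L s
    return 0.25 * float(s @ laplacian(A) @ s)

def spectral_bisection(A, n1):
    """Split a connected graph into groups of sizes n1 and n - n1 using v_2."""
    n = A.shape[0]
    _, eigvecs = np.linalg.eigh(laplacian(A))
    v2 = eigvecs[:, 1]                    # eigenvector of the second-smallest eigenvalue
    candidates = []
    for order in (np.argsort(-v2), np.argsort(v2)):   # both sign orientations of v_2
        s = -np.ones(n)
        s[order[:n1]] = 1.0               # n1 most extreme vertices form group 1
        candidates.append(s)
    return min(candidates, key=lambda s: cut_size(A, s))

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

s = spectral_bisection(A, n1=3)
R = cut_size(A, s)
print(s, R)
```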

  9. Alternative spectral algorithm
     Approximate algorithm: No size control on communities, using ideas from above:
     $$s_i = \begin{cases} +1 & \text{if } v_i^{(2)} \ge 0, \\ -1 & \text{if } v_i^{(2)} < 0. \end{cases} \qquad (2)$$
     Example: The karate club
     Runtime: $O(n^3)$; for a sparse Laplacian, $m / (\lambda_3 - \lambda_2)$.
     Alternatively: Minimize the ratio cut $R / (n_1 n_2)$ instead of $R$.
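
The sign rule without size control, together with the ratio cut $R / (n_1 n_2)$, might look as follows. This is a sketch assuming NumPy, a connected graph, and a hypothetical two-triangle toy graph. It uses a dense eigensolver rather than a sparse Lanczos-type method.

```python
import numpy as np

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

_, eigvecs = np.linalg.eigh(L)
v2 = eigvecs[:, 1]

# Group membership is read off from the signs of v_2; sizes are not fixed in advance.
s = np.where(v2 >= 0, 1.0, -1.0)
n1 = int(np.sum(s > 0))
n2 = n - n1

R = 0.25 * float(s @ L @ s)      # cut size
ratio_cut = R / (n1 * n2)        # penalizes very unbalanced divisions

print(n1, n2, R, ratio_cut)
```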

  10. Discussion of spectral algorithms
      Problem: Spectral algorithms are satisfactory when the network does not divide easily into groups but one must still find the best available division. However, they do not reflect our intuitive concept of network communities.

  11. Kernighan-Lin algorithm
      Algorithm:
      • Assume that we know the community sizes $|G_1|$, $|G_2|$.
      • Assign a benefit function to every division: $Q$ = (# edges within the two groups) − (# edges between the two groups).
      • Stage 1: Maximize $\Delta Q$ over all pairs $i \in G_1$, $j \in G_2$.
      • Then swap the two vertices and repeat until all vertices of one group have been swapped.
      • Stage 2: From the preceding sequence of divisions, choose the one with the maximum $Q$.
      Runtime: worst case $O(n^2)$.
      Example: Perfect match in the karate club.
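
The two stages can be sketched in a few lines. This is an unoptimized illustration, assuming NumPy. The names `benefit`, `kl_pass`, and `kernighan_lin` are hypothetical, each vertex is swapped at most once per pass, and passes are repeated until $Q$ stops improving (an assumption about how the stages are iterated). It recomputes $Q$ from scratch for every candidate pair, so it does not achieve the running time quoted on the slide.

```python
import numpy as np

def benefit(A, s):
    # s_i s_j is +1 for a within-group edge and -1 for a between-group edge;
    # s @ A @ s counts every edge twice, hence the factor 1/2.
    return 0.5 * float(s @ A @ s)

def kl_pass(A, s):
    """One pass: swap greedily, lock swapped vertices, return the best division seen."""
    s = s.copy()
    locked = np.zeros(len(s), dtype=bool)
    best_s, best_q = s.copy(), benefit(A, s)
    n_swaps = int(min(np.sum(s > 0), np.sum(s < 0)))
    for _ in range(n_swaps):
        free_1 = np.where((s > 0) & ~locked)[0]
        free_2 = np.where((s < 0) & ~locked)[0]

        def q_after_swap(pair):
            t = s.copy()
            t[list(pair)] *= -1
            return benefit(A, t)

        # Stage 1: pair with the largest gain (or smallest loss) in Q.
        i, j = max(((i, j) for i in free_1 for j in free_2), key=q_after_swap)
        s[[i, j]] *= -1
        locked[[i, j]] = True
        q = benefit(A, s)
        if q > best_q:                      # Stage 2: remember the best state
            best_s, best_q = s.copy(), q
    return best_s, best_q

def kernighan_lin(A, s):
    q = benefit(A, s)
    while True:
        new_s, new_q = kl_pass(A, s)
        if new_q <= q:
            return s, q
        s, q = new_s, new_q

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

s0 = np.array([1, 1, -1, 1, -1, -1], dtype=float)   # deliberately poor start
q0 = benefit(A, s0)
s_final, q_final = kernighan_lin(A, s0)
print(q0, q_final, s_final)
```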

  14. Modularity
      Problem:
      • We usually don’t know the sizes of the communities.
      • The number of edges between communities is smaller than expected.
      Definition: modularity, a benefit function (different from, but related to, the previous one):
      $Q$ = (# edges within communities) − (expected # of such edges).
      The second term is rather vague. What do we mean by it?
      Null model: $n$ vertices, with $P_{i,j}$ the probability of an edge between $i$ and $j$. Then
      $$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{i,j} - P_{i,j} \right] \delta(g_i, g_j),$$
      where $g_i$ denotes the community to which vertex $i$ belongs.
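
The definition of $Q$ translates directly into code once a null model $P$ is chosen. The sketch below assumes NumPy. The function name `modularity` is hypothetical, and the constant null model $P_{i,j} = 2m / n^2$ is picked purely for illustration (it includes the diagonal for simplicity).

```python
import numpy as np

def modularity(A, P, g):
    """Q = (1 / 2m) * sum_ij [A_ij - P_ij] * delta(g_i, g_j)."""
    m = A.sum() / 2.0                       # number of edges
    g = np.asarray(g)
    same = g[:, None] == g[None, :]         # delta(g_i, g_j) as a boolean matrix
    return float(np.sum((A - P) * same) / (2.0 * m))

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
m = len(edges)

# Illustrative null model only: the same probability for every pair (i, j).
P = np.full((n, n), 2.0 * m / n**2)

Q_split = modularity(A, P, [0, 0, 0, 1, 1, 1])   # the two triangles
Q_single = modularity(A, P, [0, 0, 0, 0, 0, 0])  # everything in one community
print(Q_split, Q_single)
```

With all vertices in one community the two sums cancel and $Q$ is zero, while the two-triangle division gives a positive value.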

  16. Choice of $P_{i,j}$
      Condition 1:
      $$\sum_{i,j} P_{i,j} = \sum_{i,j} A_{i,j} = 2m.$$
      Example: Bernoulli model $P_{i,j} = p$, which has a binomial degree distribution, not right-skewed like that of most real-world networks.
      Condition 2:
      $$\sum_j P_{i,j} = \sum_j A_{i,j} =: k_i,$$
      which for entirely random edges leads to $P_{i,j} = \frac{k_i k_j}{2m}$. This is closely related to the configuration model (preferential attachment).
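
Both conditions are easy to check numerically for the two null models mentioned here. This is a sketch assuming NumPy and a hypothetical toy graph. The constant model uses $p = 2m / n^2$ over all pairs including the diagonal, which is a simplification.

```python
import numpy as np

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)        # degrees k_i
m = k.sum() / 2.0        # number of edges

P_bernoulli = np.full((n, n), 2.0 * m / n**2)   # constant P_ij = p
P_degree = np.outer(k, k) / (2.0 * m)           # P_ij = k_i k_j / 2m

# Condition 1: total expected weight equals 2m.
cond1_bernoulli = bool(np.isclose(P_bernoulli.sum(), 2.0 * m))
cond1_degree = bool(np.isclose(P_degree.sum(), 2.0 * m))

# Condition 2: expected degree of every vertex equals its actual degree.
cond2_bernoulli = bool(np.allclose(P_bernoulli.sum(axis=1), k))
cond2_degree = bool(np.allclose(P_degree.sum(axis=1), k))

print(cond1_bernoulli, cond1_degree, cond2_bernoulli, cond2_degree)
```

On this graph the constant model satisfies only Condition 1, because the degrees are not all equal.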

  17. Spectral optimization of modularity
      Assumption: we have two communities, but no fixed sizes.
      Definition: Modularity matrix
      • Rewrite the modularity function as
        $$Q = \frac{1}{4m} s^T B s = \frac{1}{4m} \sum_i a_i^2 \beta_i,$$
        where $B = A - P$ and $s = \sum_{i=1}^{n} a_i u_i$ ($\beta_i$ is the eigenvalue corresponding to the eigenvector $u_i$ of $B$).
      • There exists $i$ such that $\beta_i = 0$ and $u_i = (1, 1, \ldots, 1)$.
      • But there can be (and in practice are) both positive and negative eigenvalues.
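
The rewriting of $Q$ can be reconstructed in a few steps. The derivation below is a sketch. It assumes two groups encoded by $s_i = \pm 1$, a null model with $\sum_{i,j} P_{i,j} = \sum_{i,j} A_{i,j} = 2m$ and $\sum_j P_{i,j} = k_i$, and orthonormal eigenvectors $u_i$ with $a_i = u_i^T s$.

```latex
% For two groups, delta(g_i, g_j) = (1 + s_i s_j) / 2.
\begin{align*}
Q &= \frac{1}{2m} \sum_{i,j} B_{i,j}\, \delta(g_i, g_j)
   = \frac{1}{4m} \sum_{i,j} B_{i,j}\, (1 + s_i s_j) \\
  &= \frac{1}{4m} \sum_{i,j} B_{i,j}\, s_i s_j
     + \frac{1}{4m} \underbrace{\sum_{i,j} B_{i,j}}_{=\,2m - 2m\,=\,0}
   = \frac{1}{4m}\, s^T B s . \\
\intertext{Expanding $s = \sum_i a_i u_i$ in orthonormal eigenvectors of $B$:}
s^T B s &= \sum_{i,j} a_i a_j\, u_i^T B u_j
         = \sum_{i,j} a_i a_j\, \beta_j\, u_i^T u_j
         = \sum_i a_i^2 \beta_i . \\
\intertext{The zero eigenvalue follows from the row sums of $B$:}
\sum_j B_{i,j} &= \sum_j A_{i,j} - \sum_j P_{i,j} = k_i - k_i = 0
\quad\Longrightarrow\quad B\,(1, 1, \ldots, 1)^T = 0 .
\end{align*}
```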

  18. Spectral optimization of modularity II
      Solution: similar to the spectral algorithm
      • Ideally $s$ would be proportional to $u_1$ (the eigenvector with the largest eigenvalue $\beta_1$).
      • But $s_i = \pm 1$.
      • Therefore take
        $$s_i = \begin{cases} +1 & \text{if } u_i^{(1)} \ge 0, \\ -1 & \text{if } u_i^{(1)} < 0. \end{cases} \qquad (3)$$
      Runtime: $O(n^2)$ (using the Lanczos method or its variants).
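
A compact version of the leading-eigenvector division is sketched below, assuming NumPy, the null model $P_{i,j} = k_i k_j / 2m$, and a hypothetical toy graph. For brevity it uses the dense solver `numpy.linalg.eigh` instead of a Lanczos-type method, so it does not reflect the quoted running time.

```python
import numpy as np

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)
m = k.sum() / 2.0
B = A - np.outer(k, k) / (2.0 * m)      # modularity matrix B = A - P

betas, U = np.linalg.eigh(B)            # ascending eigenvalues
u1 = U[:, -1]                           # eigenvector of the largest eigenvalue

s = np.where(u1 >= 0, 1.0, -1.0)        # sign rule
Q = float(s @ B @ s / (4.0 * m))

print(betas[-1], s, Q)
```

Unlike the Laplacian-based division, no group sizes are supplied: the sizes come out of the signs of the leading eigenvector.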

  19. Example: Modularity

  22. Negative eigenvalues
      Question: What information is stored in the negative eigenvalues?
      Answer: “Anti-community structure”, i.e. the numbers of edges within groups are smaller than expected.
      Procedure:
      • Minimize modularity: take $s$ almost parallel to $u_n$ (with corresponding eigenvalue $\beta_n$):
        $$s_i = \begin{cases} +1 & \text{if } u_i^{(n)} \ge 0, \\ -1 & \text{if } u_i^{(n)} < 0. \end{cases} \qquad (4)$$
      • Refinement step: move single vertices between groups to minimize modularity.
      Other uses:
      • Network correlation: Adjacent vertices have similar properties.
      • Community centrality: How central vertices are in their community.
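
A bipartite graph is a simple case where anti-community structure is visible: every edge runs between the two groups. The sketch below assumes NumPy, the null model $P_{i,j} = k_i k_j / 2m$, and a hypothetical 6-cycle as the example. It applies only the sign rule and omits the refinement step.

```python
import numpy as np

# Hypothetical example (not from the slides): a cycle on 6 vertices, which is bipartite.
n = 6
A = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)
m = k.sum() / 2.0
B = A - np.outer(k, k) / (2.0 * m)

betas, U = np.linalg.eigh(B)            # ascending eigenvalues
un = U[:, 0]                            # eigenvector of the most negative eigenvalue

s = np.where(un >= 0, 1.0, -1.0)        # sign rule applied to u_n
Q = float(s @ B @ s / (4.0 * m))        # strongly negative modularity

print(betas[0], s, Q)
```

The resulting groups are the even and odd vertices of the cycle, with no edges inside either group.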

  23. Example: Anti-community structure

  24. Example: Community centrality

  26. Multiple communities
      Problem: In many real-world examples we don’t know the number of communities.
      Approach: Repeated division into two; not ideal.

  28. Girvan and Newman algorithm
      Idea: Iteratively remove edges with a high “betweenness score” from the network.
      Motivation: The few edges between communities are bottlenecks: traffic has to travel through them.
      Algorithm:
      • Edge betweenness: # of geodesic paths between vertex pairs that contain the edge.
      • Repeatedly remove the edge with the highest betweenness until no edges remain.
      • Progress is represented in a dendrogram.
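
The edge-removal loop can be sketched in pure Python. The functions `edge_betweenness`, `components`, and `girvan_newman` are hypothetical names. Betweenness is computed with a breadth-first accumulation in the style of Brandes' algorithm (a swapped-in technique, not taken from the slides), where a pair with several geodesics contributes a fractional share to each, and scores are recomputed after every removal.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge in an unweighted, undirected graph."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for src in adj:
        dist, sigma = {src: 0}, {src: 1.0}
        preds = {v: [] for v in adj}
        order, queue = [], deque([src])
        while queue:                                  # BFS from src
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w] = dist[v] + 1, 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:            # v precedes w on a geodesic
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):                     # accumulate from the leaves up
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    return {e: b / 2.0 for e, b in bet.items()}       # each pair was counted twice

def components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Return the successive divisions (levels of the dendrogram, top down)."""
    adj = {v: set(nb) for v, nb in adj.items()}
    levels = [components(adj)]
    while any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)                  # edge with highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):
            levels.append(comps)
    return levels

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bridge_score = edge_betweenness(adj)[frozenset((2, 3))]
levels = girvan_newman(adj)
print(bridge_score, levels[1])
```

The bridge carries all geodesics between the two triangles, so it is removed first and the first split separates them.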

  29. Example: Girvan and Newman algorithm

  32. Girvan and Newman algorithm II.
      Problem: No guidance on how many communities to have.
      Solution:
      • Introduce modularity again: $Q$ = (fraction of edges within communities) − (expected value of the same quantity).
      • If $Q = 0$, the community structure is no stronger than expected by random chance.
      • Local peaks of $Q$ during the algorithm indicate good divisions.
      Runtime: Slow, $O(m^2 n)$, or $O(n^3)$ on sparse graphs.
      Extensions:
      • Monte Carlo estimate of betweenness (Tyler et al.).
      • Local measure of betweenness (short loops), $O(m^4 / n^2)$ (Radicchi et al.).
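
Choosing a level of the dendrogram by its modularity might look like the sketch below. It is pure Python. The function name `modularity` and the hand-written list of divisions are hypothetical, and the expected fraction is computed under the degree-based null model $P_{i,j} = k_i k_j / 2m$, which is an assumption here.

```python
def modularity(edges, communities):
    """Q = sum over communities of (fraction of edges inside) - (degree fraction)^2."""
    m = len(edges)
    label = {v: c for c, comm in enumerate(communities) for v in comm}
    within = [0] * len(communities)      # edges with both ends in the community
    degree = [0] * len(communities)      # total degree of the community
    for i, j in edges:
        degree[label[i]] += 1
        degree[label[j]] += 1
        if label[i] == label[j]:
            within[label[i]] += 1
    return sum(w / m - (d / (2 * m)) ** 2 for w, d in zip(within, degree))

# Hypothetical toy graph (not from the slides): two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Hand-written divisions standing in for selected levels of a dendrogram.
levels = [
    [{0, 1, 2, 3, 4, 5}],
    [{0, 1, 2}, {3, 4, 5}],
    [{0}, {1}, {2}, {3}, {4}, {5}],
]

qs = [modularity(edges, level) for level in levels]
best_level = max(range(len(levels)), key=lambda idx: qs[idx])
print(qs, best_level)
```

The undivided graph has $Q = 0$, the fully split graph has negative $Q$, and the peak identifies the two-triangle division.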

  33. Modularity: multiple communities
      Shortcomings of the method so far: it finds only two communities and uses only the leading eigenvector.
