  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. Networks of tightly connected groups.
     Network communities: sets of nodes with lots of connections inside and few to the outside (the rest of the network). Also called communities, clusters, groups, or modules.
     11/3/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

  3. [Onnela et al. ‘07] Two panels: edge strengths (call volume) in a real network; edge betweenness in a real network. [figure]

  4. [Girvan-Newman PNAS ‘02] Divisive hierarchical clustering based on edge betweenness: the number of shortest paths passing through an edge.
     Girvan-Newman algorithm:
     - Repeat until no edges are left:
       - Calculate the betweenness of all edges.
       - Remove the edges with the highest betweenness.
     - Connected components are communities.
     - Gives a hierarchical decomposition of the network.
     Example: [figure]
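The loop above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation. It assumes an undirected, unweighted graph given as an edge list with integer node labels, and the helper names (`bfs_counts`, `edge_betweenness`, `components`, `girvan_newman`) are hypothetical. To keep it short, edge betweenness here comes from a direct pair formula rather than the tree-based method of slides 7-9: for a pair (s, t), the directed edge a→b lies on $\sigma_s(a)\,\sigma_t(b)$ of the $\sigma_s(t)$ shortest paths whenever $d(s,a) + 1 + d(b,t) = d(s,t)$.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Return BFS distances and shortest-path counts from source s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(adj):
    """Sum over unordered pairs (s, t) of the fraction of shortest s-t
    paths that use each edge (pair formula, O(n^2 m))."""
    info = {s: bfs_counts(adj, s) for s in adj}
    eb = {(u, v): 0.0 for u in adj for v in adj[u] if u < v}
    for s, t in combinations(adj, 2):
        ds, ss = info[s]
        dt, st = info[t]
        if t not in ds:
            continue  # s and t lie in different components
        for u, v in eb:
            for a, b in ((u, v), (v, u)):
                if a in ds and b in dt and ds[a] + 1 + dt[b] == ds[t]:
                    eb[(u, v)] += ss[a] * st[b] / ss[t]
    return eb


def components(adj):
    """Connected components as a list of sets (iterative DFS)."""
    seen, comps = set(), []
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        comp, stack = set(), [root]
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def girvan_newman(edges):
    """Repeatedly remove the highest-betweenness edges; record the
    partition each time the number of components grows."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels, n_comp = [], len(components(adj))
    while any(adj.values()):
        eb = edge_betweenness(adj)
        top = max(eb.values())
        for (u, v), b in eb.items():
            if abs(b - top) < 1e-9:  # remove every edge tied for the max
                adj[u].discard(v)
                adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comp:
            n_comp = len(comps)
            levels.append(sorted(sorted(c) for c in comps))
    return levels


# Two triangles joined by the bridge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(girvan_newman(edges)[0])  # [[0, 1, 2], [3, 4, 5]]
```

The bridge carries all 3 × 3 shortest paths between the triangles, so it is removed first and the two triangles separate. Recomputing betweenness after every removal is what makes the method slow on large graphs.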

  5. [Newman-Girvan PhysRevE ‘03] Zachary’s Karate club: hierarchical decomposition. [figure]

  6. [Newman-Girvan PhysRevE ‘03] Communities in physics collaborations. [figure]

  7. Breadth-first search starting from A: we want to compute the betweenness of paths starting at node A.

  8. Count the number of shortest paths from A to all other nodes of the network. [figure]
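Steps 7 and 8 amount to one BFS that also counts shortest paths: a node's count is the sum of the counts of its parents one level up. A minimal sketch, assuming an undirected, unweighted graph stored as an adjacency dict; the function name `count_shortest_paths` and the five-node example graph are hypothetical (the slide's figure is not reproduced).

```python
from collections import deque


def count_shortest_paths(adj, root):
    """BFS from root; sigma[v] = number of shortest root-v paths."""
    dist = {root: 0}
    sigma = {root: 1}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u is a parent of v in the BFS DAG
                sigma[v] += sigma[u]
    return dist, sigma


# Hypothetical example graph (not the one in the slide's figure).
adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
dist, sigma = count_shortest_paths(adj, "A")
print(sigma)  # {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
```

D is reached through both B and C, so it has 2 shortest paths, and E inherits both of them.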

  9. Compute betweenness by working up the tree: if there are multiple paths, count them fractionally.
     - Repeat the BFS procedure for each node of the network.
     - Add edge scores.
     - Runtime (all-pairs shortest paths): weighted graphs O(N^3); unweighted graphs O(N^2).
     [Figure annotations: 1+1 paths to H, split evenly; 1+0.5 paths to J, split 1:2; 1 path to K, split evenly.]
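Slides 7-9 together can be sketched as follows, assuming an undirected, unweighted graph stored as an adjacency dict; `edge_betweenness` is a hypothetical name. For each root the sketch counts shortest paths by BFS, then walks back up the BFS tree from the deepest nodes: each node holds 1 unit of credit plus whatever its children pass up, and splits that credit among its parents in proportion to their path counts (the fractional counting on slide 9). Summing over all roots counts every shortest path once from each endpoint, so the total is halved.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness via per-root BFS plus a bottom-up credit pass."""
    eb = {}
    for root in adj:
        # BFS: distances, shortest-path counts, and visiting order.
        dist, sigma, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
        # Work up the tree: deepest nodes first, so a node's credit is
        # complete before it is passed to its parents.
        credit = {v: 1.0 for v in order}
        for v in reversed(order):
            for u in adj[v]:
                if dist.get(u) == dist[v] - 1:  # u is a BFS parent of v
                    share = credit[v] * sigma[u] / sigma[v]
                    key = frozenset((u, v))
                    eb[key] = eb.get(key, 0.0) + share
                    credit[u] += share
    # Each shortest path was counted once from each of its two endpoints.
    return {e: b / 2 for e, b in eb.items()}


# Two triangles joined by the bridge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
eb = edge_betweenness(adj)
print(eb[frozenset((2, 3))])  # 9.0: all 3 x 3 cross pairs use the bridge
```

Each root costs one BFS plus one backward pass, so an unweighted graph needs N such passes in total.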

  10. Define modularity to be Q = (number of edges within groups) − (expected number of edges within groups).
     - Actual number of edges between i and j: $A_{ij}$.
     - Expected number of edges between i and j: $\frac{k_i k_j}{2m}$, where m is the number of edges.

  11. Q = (number of edges within groups) − (expected number within groups). Then:
     $$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$$
     where m is the number of edges, $A_{ij}$ is 1 if (i,j) is an edge and 0 otherwise, $k_i$ is the degree of node i, $c_i$ is the group id of node i, and $\delta(a, b)$ is 1 if a = b and 0 otherwise.
     - Modularity lies in the range [−1, 1].
     - It is positive if the number of edges within groups exceeds the expected number.
     - 0.3 < Q < 0.7 means significant community structure.
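The formula can be evaluated literally as a double sum over node pairs. A minimal sketch, assuming a simple undirected graph (no self-loops or multi-edges) given as an edge list and a dict mapping each node to a group id; the function name `modularity` is hypothetical.

```python
def modularity(edges, community):
    """Q = 1/(2m) * sum_{i,j} [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    m = len(edges)
    adjacent = set()
    degree = {}
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for i in degree:
        for j in degree:
            if community[i] == community[j]:  # delta(c_i, c_j) = 1
                a_ij = 1 if (i, j) in adjacent else 0
                q += a_ij - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)


# Two triangles joined by the bridge (2, 3), split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, groups))  # ≈ 0.357 (= 5/14)
```

The double sum is O(n^2) and is meant only to mirror the formula. Putting every node in one group gives Q = 0, since the actual and expected counts then coincide.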

  12. Modularity is useful for selecting the number of clusters. [figure] Why not optimize modularity directly?

  13. Consider splitting the graph into two communities. Modularity Q is:
     $$Q = \frac{1}{2m} \sum_{i,j \text{ in same group}} \left( A_{ij} - \frac{k_i k_j}{2m} \right)$$
     Or we can write it in matrix form (using $\delta(c_i, c_j) = \tfrac{1}{2}(s_i s_j + 1)$ and the zero row sums of B) as
     $$Q = \frac{1}{4m} s^T B s, \qquad B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$$
     - s … vector of group memberships, $s_i \in \{+1, -1\}$
     - B … modularity matrix. Note: each row (column) of B sums to 0.
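A small numerical check of the matrix form, assuming nodes labelled 0..n−1 and a simple undirected edge list; `modularity_matrix` and `q_from_signs` are hypothetical names. It builds B, confirms that its rows sum to 0, and evaluates Q = sᵀBs / 4m for a ±1 membership vector.

```python
def modularity_matrix(n, edges):
    """Return (B, m) with B_ij = A_ij - k_i k_j / (2m) for nodes 0..n-1."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    k = [sum(row) for row in A]   # degrees
    two_m = sum(k)                # 2m = sum of degrees
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    return B, two_m / 2


def q_from_signs(B, m, s):
    """Q = (1 / 4m) * s^T B s for a +/-1 membership vector s."""
    n = len(s)
    return sum(s[i] * B[i][j] * s[j] for i in range(n) for j in range(n)) / (4 * m)


# Two triangles joined by the bridge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
B, m = modularity_matrix(6, edges)
print([round(sum(row), 12) for row in B])         # each row sums to (numerically) 0
print(q_from_signs(B, m, [1, 1, 1, -1, -1, -1]))  # ≈ 0.357 (= 5/14)
```

The same split scored with the pairwise formula of slide 11 gives the same value, which is the point of the rewrite.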

  14. Task: find $s \in \{-1, +1\}^n$ that maximizes Q.
     Rewrite Q in terms of the eigenvalues $\beta_i$ and orthonormal eigenvectors $u_i$ of B:
     $$Q = \frac{1}{4m} s^T \Big( \sum_{i=1}^{n} \beta_i u_i u_i^T \Big) s = \frac{1}{4m} \sum_{i=1}^{n} \beta_i \left( u_i^T s \right)^2$$
     - To maximize Q, the easiest way is to make s parallel to $u_1$. This assigns all weight in the sum to $\beta_1$ (the largest eigenvalue); all other $u_i^T s$ terms are zero because of orthonormality.
     - Unfortunately, the elements of s must be ±1.
     - In general, finding the optimal s is NP-hard.

  15. $$Q = \frac{1}{4m} \sum_{i=1}^{n} \beta_i \left( u_i^T s \right)^2$$
     - Heuristic: try to maximize only the $\beta_1$ term.
     - Similar in spirit to the spectral partitioning algorithm (we will explore it next time).
     - Continue the bisection hierarchically.

  16. Fast modularity optimization algorithm:
     - Find the leading eigenvector $u_1$ of the modularity matrix B.
     - Divide the nodes by the signs of the elements of $u_1$.
     - Repeat hierarchically until:
       - If a proposed split does not cause modularity to increase, declare the community indivisible and do not split it.
       - If all communities are indivisible, stop.
     How to find $u_1$? The power method: iterative multiplication and normalization. Start with a random $v_0$ and iterate until convergence:
     $$v_{k+1} = \frac{B v_k}{\lVert B v_k \rVert}$$
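One bisection step of this algorithm can be sketched as follows, assuming nodes labelled 0..n−1 and a simple undirected edge list; `leading_eigenvector` and `spectral_split` are hypothetical names, and only the first split is shown (the hierarchical repetition is not). One deliberate change from the plain power method above: B can have negative eigenvalues of larger magnitude than β₁, and plain iteration converges to the largest-*magnitude* eigenvalue. The sketch therefore iterates on B + cI, where c is the largest absolute row sum of B, which shifts every eigenvalue to be non-negative without changing the eigenvectors.

```python
import math
import random


def modularity_matrix(n, edges):
    """Return (B, m) with B_ij = A_ij - k_i k_j / (2m)."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    k = [sum(row) for row in A]
    two_m = sum(k)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)], two_m / 2


def leading_eigenvector(B, iters=10_000, tol=1e-12, seed=0):
    """Power method on the shifted matrix B + cI (see the lead-in)."""
    n = len(B)
    c = max(sum(abs(x) for x in row) for row in B)  # bound on |eigenvalue|
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # random start vector
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) + c * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]                   # normalize each step
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def spectral_split(n, edges):
    """Split by the signs of u_1; return None if Q does not increase
    (the unsplit graph has Q = 0), i.e. the community is indivisible."""
    B, m = modularity_matrix(n, edges)
    u1 = leading_eigenvector(B)
    s = [1 if x > 0 else -1 for x in u1]
    q = sum(s[i] * B[i][j] * s[j] for i in range(n) for j in range(n)) / (4 * m)
    if q <= 1e-12:
        return None
    return [i for i in range(n) if s[i] > 0], [i for i in range(n) if s[i] < 0], q


# Two triangles joined by the bridge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
left, right, q = spectral_split(6, edges)
print(sorted([left, right]), q)  # the two triangles, q ≈ 0.357
```

Because the eigenvector's overall sign is arbitrary, the two groups may come back in either order. On a complete graph the leading eigenvector is the all-ones direction, every node gets the same sign, and `spectral_split` returns `None`.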

  17. Also, we can combine with other methods:
     - Randomly divide the nodes into two groups.
     - Move the node that, if moved, will increase Q the most.
     - Repeat for all nodes, with each node moved only once.
     - Once complete, find the intermediate state with the highest Q.
     - Start from this state and repeat until Q stops increasing.
     - Good results for "fine-tuning" the spectral method.
     CNM algorithm (Clauset-Newman-Moore ‘04):
     - (1) Put each vertex in its own community (n communities).
     - (2) Calculate ΔQ for all possible community pairs.
     - (3) Merge the pair with the largest increase in Q.
     - Repeat (2) and (3) until one community remains.
     - Cut the dendrogram where Q is maximum.
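Steps (1)-(3) of the CNM algorithm can be sketched naively as below. This is a simplified illustration, not the CNM data structure: it rescans all connected community pairs at each step instead of using heaps, so it does not reach the O(n log² n) bound quoted on the next slide. It assumes nodes labelled 0..n−1 and a simple undirected edge list; `cnm_greedy` is a hypothetical name. With $e_{ij}$ the fraction of edge endpoints joining communities i and j and $a_i$ the fraction attached to community i, merging i and j changes Q by $\Delta Q = 2(e_{ij} - a_i a_j)$.

```python
def cnm_greedy(n, edges):
    """Naive greedy agglomeration in the spirit of CNM (no heaps).

    e[i][j]: fraction of edge endpoints joining communities i and j (i != j)
    a[i]:    fraction of all edge endpoints attached to community i
    """
    m = len(edges)
    e = {i: {} for i in range(n)}
    a = {i: 0.0 for i in range(n)}
    for u, v in edges:
        e[u][v] = e[u].get(v, 0.0) + 1 / (2 * m)
        e[v][u] = e[v].get(u, 0.0) + 1 / (2 * m)
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
    members = {i: {i} for i in range(n)}
    q = -sum(x * x for x in a.values())  # (1) singletons: no internal edges
    best_q, best = q, [set(c) for c in members.values()]
    while len(members) > 1:
        # (2) Delta Q for every pair of communities joined by an edge
        pairs = [(2 * (e[i][j] - a[i] * a[j]), i, j)
                 for i in e for j in e[i] if i < j]
        if not pairs:
            break  # remaining communities are disconnected from each other
        dq, i, j = max(pairs)
        # (3) merge community j into community i
        for k, w in e.pop(j).items():
            del e[k][j]
            if k != i:
                e[i][k] = e[i].get(k, 0.0) + w
                e[k][i] = e[k].get(i, 0.0) + w
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
        if q > best_q:  # remember the dendrogram level with maximum Q
            best_q, best = q, [set(c) for c in members.values()]
    return best_q, best


# Two triangles joined by the bridge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
best_q, best = cnm_greedy(6, edges)
print(sorted(sorted(c) for c in best), best_q)  # the two triangles, Q ≈ 0.357
```

Merging continues even when ΔQ turns negative, as in the slide, and the best level seen is returned. That is the "cut the dendrogram where Q is maximum" step.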

  18. Fast modularity. [figure comparing methods]
     - GN = Girvan-Newman, O(n^3)
     - CNM = greedy merging, O(n log^2 n)
     - DA = extremal optimization, O(n^2 log^2 n)
     Issues with modularity:
     - May not find communities with fewer than $\sqrt{m}$ links.
     - NP-hard to optimize exactly [Brandes et al. ‘07].

  20. [Kumar et al. ‘99] Searching for small communities in a Web graph.
     (1) The signature of a community/discussion in the context of a Web graph: a dense 2-layer graph. [figure]
     Intuition: a bunch of people all talking about the same things.

  21. (2) A more well-defined problem: enumerate complete bipartite subgraphs $K_{s,t}$, where $K_{s,t}$ = s nodes that each link to the same t other nodes.
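The definition of $K_{s,t}$ can be made concrete with a brute-force search over a directed link graph. This is a naive sketch for tiny graphs only, not the method of Kumar et al. (which relies on pruning to scale to the Web). The name `complete_bipartite_subgraphs` and the toy link data are hypothetical. For every set of s source nodes, it intersects their out-link sets; if at least t targets are shared, those sources and any t of the shared targets form a $K_{s,t}$.

```python
from itertools import combinations


def complete_bipartite_subgraphs(out_links, s, t):
    """Yield (sources, shared_targets) for every s-subset of source nodes
    whose out-links share at least t targets (each contains a K_{s,t})."""
    for sources in combinations(sorted(out_links), s):
        shared = set.intersection(*(set(out_links[u]) for u in sources))
        if len(shared) >= t:
            yield sources, sorted(shared)


# Hypothetical toy link graph: page -> pages it links to.
out_links = {
    "a": ["x", "y", "z"],
    "b": ["x", "y"],
    "c": ["x", "y", "w"],
    "d": ["z"],
}
print(list(complete_bipartite_subgraphs(out_links, 3, 2)))
# [(('a', 'b', 'c'), ['x', 'y'])]
```

The search examines every s-subset of nodes, which is exponential in s. That cost is why a more clever enumeration is needed for real Web graphs.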
