
Non-overlapping vs. overlapping communities (CS224W, http://cs224w.stanford.edu)

CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu. Non-overlapping vs. overlapping communities.


  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. Non-overlapping vs. overlapping communities. 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

  3. [Palla et al., ‘05] A node belongs to many social circles.

  4. [Palla et al., ‘05] Two nodes belong to the same community if they can be connected through adjacent k-cliques:
     - k-clique: a fully connected graph on k nodes (figure: a 4-clique)
     - Adjacent k-cliques: k-cliques that overlap in k-1 nodes (figure: adjacent 3-cliques)
     - k-clique community: the set of nodes that can be reached through a sequence of adjacent k-cliques

  5. [Palla et al., ‘05] Clique Percolation Method:
     - Find maximal cliques (not k-cliques!)
     - Clique overlap matrix: each clique is a node; connect two cliques if they overlap in at least k-1 nodes
     - Communities: connected components of the clique overlap matrix
     - How to set k? Set k so that we get the “richest” community structure (the most widely distributed cluster sizes)

  6. [Palla et al., ‘05]
     - Start with the graph and find maximal cliques
     - Create the clique overlap matrix
     - Threshold the matrix at value k-1: if a_ij < k-1, set the entry to 0
     - Communities are the connected components of the thresholded matrix
     - (Figure panels: (1) Graph, (2) Clique overlap matrix, (3) Thresholded matrix at k=4, (4) Communities (connected components))
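The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`build_adj`, `maximal_cliques`, `clique_percolation`) are hypothetical, maximal cliques are enumerated with plain Bron-Kerbosch rather than the clique search described in the slides, and maximal cliques smaller than k are dropped before thresholding (an assumption, since such cliques cannot contain a k-clique).

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency map: node -> set of neighbours (no self-loops assumed)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with plain Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored nodes
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(adj, k):
    """Return k-clique communities as a list of node sets."""
    # Assumption: cliques with fewer than k nodes are discarded up front.
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]

    # Union-find over cliques: merging plays the role of taking connected
    # components of the thresholded clique overlap matrix.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:  # threshold at k-1
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(groups.values(), key=sorted)


# Two triangles sharing an edge, plus a third triangle touching at node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
communities = clique_percolation(build_adj(edges), k=3)
# Node 3 ends up in both communities, i.e. the communities overlap.
```

Comparing every pair of cliques is quadratic in the number of maximal cliques, which is acceptable for a toy graph but not for large networks.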

  7. [Palla et al., ‘07] Communities in a “tiny” part of a phone-call network of 4 million users [Barabasi-Palla, 2007]

  8. Each node is a community:
     - Nodes are weighted for community size
     - Links are weighted for overlap size
     - Data: DIP “core” database of protein interactions (S. cerevisiae, yeast)

  9. No nice way to find the cliques: it is an NP-hard combinatorial problem. Simple algorithm:
     - Start with the maximum clique size s
     - Choose a node u and extract the cliques of size s that u is a member of
     - Delete u and its edges
     - When the graph is empty, set s=s-1 and restart on the original graph

  10. [Palla et al., ‘05] Finding cliques around u of size s:
     - Keep 2 sets, A and B, such that each node in B links to all nodes in A
     - Set A grows by moving nodes from B to it
     - Start with A={u}, B={v: (u,v) ∈ E}
     - Recursively move each possible v ∈ B to A and prune B
     - If B runs out of nodes before A reaches size s, backtrack the recursion and try a different v
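The A/B recursion above can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: the function name `cliques_of_size` is hypothetical, the graph is assumed to be a dict mapping each node to the set of its neighbours, and nodes are assumed to be sortable so that the order of the search is deterministic. The sketch returns every clique of exactly s nodes that contains u, without checking whether the clique is maximal in the whole graph.

```python
def cliques_of_size(adj, u, s):
    """Return all cliques with exactly s nodes that contain node u."""
    found = []

    def grow(a, b):
        # Invariant: every node in b is adjacent to every node in a.
        if len(a) == s:
            found.append(frozenset(a))
            return
        if len(a) + len(b) < s:
            return  # B runs out before A can reach size s: backtrack
        b = set(b)
        for v in sorted(b):
            b.discard(v)  # v is not reconsidered in later branches
            # Move v into A and prune B to the nodes that also link to v.
            grow(a | {v}, b & adj[v])

    grow({u}, set(adj[u]))
    return found


# Two triangles sharing an edge, plus a third triangle touching at node 3.
adj = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {1, 2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
triangles_at_3 = cliques_of_size(adj, 3, 3)  # the triangles {1, 2, 3} and {3, 4, 5}
```

Removing v from B after its branch is explored ensures that each clique is reported once rather than once per ordering of its nodes.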

  11. Let’s rethink what we are doing…
     - Given a network, we want to find clusters!
     - Need to formalize the notion of a cluster
     - Need to design an algorithm that will find sets of nodes that are “good” clusters
     - More generally: how to think about clusters in large networks?

  12. What is a good cluster?
     - Many edges internally
     - Few pointing outside
     - Formally, conductance Φ(S), where A(S) is the volume of S
     - Small Φ(S) corresponds to good clusters
     - (Figure: example clusters S and S’)
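A small sketch of how a conductance score can be computed. It assumes the standard definition, Φ(S) = cut(S) / min(vol(S), vol(V \ S)), where cut(S) counts edges with exactly one endpoint in S and vol(S) is the sum of the degrees of the nodes in S; whether the slide uses exactly this normalization is an assumption. The function name `conductance` and the adjacency-dict input format are illustrative choices.

```python
def conductance(adj, nodes):
    """Conductance of a node set in an undirected graph.

    adj maps each node to the set of its neighbours.
    Assumed definition: cut(S) / min(vol(S), vol(V \\ S)).
    """
    s = set(nodes)
    # Edges leaving S: one endpoint inside, the other outside.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    denom = min(vol_s, vol_total - vol_s)
    if denom == 0:
        raise ValueError("conductance is undefined for a set with zero volume on either side")
    return cut / denom


# Two triangles joined by the single edge (2, 3).
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
phi_triangle = conductance(adj, {0, 1, 2})  # 1 cut edge, volume 7
phi_pair = conductance(adj, {0, 1})         # 2 cut edges, volume 4
```

In this toy graph the triangle scores 1/7 and the pair scores 0.5, so the triangle is the better cluster under this measure: many internal edges, a single edge pointing outside.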

  13. [WWW ‘08] Define: network community profile (NCP) plot. Plot the score of the best community of size k. (Figure: example graph with k=5, Φ(5)=0.25 and k=7, Φ(7)=0.18; axes: community size, log k, against log Φ(k))
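For a tiny graph, the NCP can be computed by brute force, which makes the definition concrete. The sketch below is an illustration only: it assumes conductance is defined as cut(S) / min(vol(S), vol(V \ S)), it enumerates every node subset of size k (exponential cost, so it is not how an NCP is obtained for real networks), and the names `conductance` and `ncp_brute_force` are hypothetical.

```python
from itertools import combinations


def conductance(adj, nodes):
    """Assumed definition: cut(S) / min(vol(S), vol(V \\ S))."""
    s = set(nodes)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    return cut / min(vol_s, vol_total - vol_s)


def ncp_brute_force(adj):
    """Map each size k (1 <= k <= n // 2) to the lowest conductance of any k-node set.

    Assumes every node has at least one edge, so no volume is zero.
    """
    profile = {}
    for k in range(1, len(adj) // 2 + 1):
        profile[k] = min(conductance(adj, subset) for subset in combinations(adj, k))
    return profile


# Two triangles joined by the single edge (2, 3).
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
profile = ncp_brute_force(adj)
# profile[1] is 1.0, profile[2] is 0.5, profile[3] is 1/7:
# the best community here is a whole triangle.
```

Plotting `profile` on log-log axes gives the NCP plot for this toy graph.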

  14. [WWW ‘08] Meshes, grids, dense random graphs. (Figure panels: California road network; d-dimensional meshes)

  15. [WWW ‘08] Collaborations between scientists in networks [Newman, 2005]. (Figure axes: conductance, log Φ(k), against community size, log k)

  16. [Internet Mathematics ‘09] Natural hypothesis about NCP:
     - NCP of real networks slopes downward
     - Slope of the NCP corresponds to the dimensionality of the network
     - What about large networks? Examine more than 100 large networks

  17. [Internet Mathematics ‘09] Typical example: General Relativity collaborations (n=4,158, m=13,422)

  18. [Internet Mathematics ‘09]

  19. [Internet Mathematics ‘09] (Figure annotations: better and better communities; communities get worse and worse; best community has ~100 nodes. Axes: Φ(k) (conductance) against k (cluster size))

  20. [Internet Mathematics ‘09] Each successive edge inside the community costs more cut-edges. (Figure: NCP plot of a tree in which each node has twice as many children, with Φ=1/3=0.33, Φ=2/4=0.5, Φ=8/6=1.3, Φ=64/14=4.5)

  21. [Internet Mathematics ‘09] Empirically, we note that the best clusters (call them whiskers) are barely connected to the network:
     - Core-periphery structure
     - If we remove the whiskers, what does the NCP look like?

  22. [Internet Mathematics ‘09] Nothing happens! Nestedness of the core-periphery structure.

  23. (Figure annotations: denser and denser network core; small good communities; nested core-periphery)

  24. [Internet Mathematics ‘09] Practically constant! Each dot is a different network.

  25. [Internet Mathematics ‘09] (Figure panels: LiveJournal, DBLP, Amazon, IMDB; legend: Rewired Network, Ground truth)

  26. Some issues with community detection:
     - Many different formalizations of clustering objective functions
     - Objectives are NP-hard to optimize exactly
     - Methods can find clusters that are systematically “biased”
     - Methods can perform well or poorly on some kinds of graphs
     - Questions: How well do algorithms optimize objectives? What clusters do different methods find?
