
Personalized PageRank based Community Detection


  1. Personalized PageRank based Community Detection. Code: bit.ly/dgleich-codes. David F. Gleich, Purdue University. Joint work with C. Seshadhri, Joyce Jiyoung Whang, and Inderjit S. Dhillon; supported by NSF CAREER 1149756-CCF.

  2. Today’s talk
     1. Personalized PageRank based community detection
     2. Conductance, Egonets, and Network Community Profiles
     3. Egonet seeding
     4. Improved seeding
     David Gleich · Purdue MLG2013

  3. A community is a set of vertices that is denser inside than out.

  4. 250 node GEOP network in 2 dimensions

  5. 250 node GEOP network in 2 dimensions

  6. We can find communities using Personalized PageRank (PPR) [Andersen et al. 2006]. PPR is a Markov chain on nodes: 1. with probability α, follow a random edge; 2. with probability 1 − α, restart at a seed. It is also known as the random surfer or random walk with restart, and it has a unique stationary distribution.
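
As a minimal sketch of the chain described on this slide, the following Python approximates the stationary distribution by plain power iteration. The function name `ppr_power_iteration`, the toy graph, and the follow-edge probability `alpha = 0.85` are illustrative assumptions rather than anything from the talk, and the sketch assumes an undirected graph with no isolated vertices. It touches every vertex on every iteration, unlike the local computation the talk goes on to present.

```python
def ppr_power_iteration(G, seeds, alpha=0.85, iters=200):
    """Approximate the PPR stationary distribution on an undirected graph.

    G maps each vertex to the set of its neighbors. With probability alpha
    the walk follows a random edge; with probability 1 - alpha it restarts
    at a uniformly chosen seed.
    """
    restart = {v: 0.0 for v in G}
    for s in seeds:
        restart[s] = 1.0 / len(seeds)
    x = dict(restart)  # start the walk at the seeds
    for _ in range(iters):
        nxt = {v: (1.0 - alpha) * restart[v] for v in G}
        for v in G:
            share = alpha * x[v] / len(G[v])  # mass sent along each edge of v
            for u in G[v]:
                nxt[u] += share
        x = nxt
    return x


# Hypothetical toy graph: two triangles joined by the edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = ppr_power_iteration(G, seeds=[0])
```

The entries of `x` sum to one, and the triangle containing the seed holds more of the mass than the other triangle, which is the signal the community extraction step relies on.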

  7. Personalized PageRank community detection: 1. Given a seed, approximate the stationary distribution. 2. Extract the community. Both are local operations.

  8. Demo!

  9. Conductance communities. Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices is the ratio of edges leaving the set to total edges in the set: $\phi(S) = \mathrm{cut}(S) / \min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))$. Equivalently, it’s the probability that a random edge leaves the set. Example: $\mathrm{cut}(S) = 7$, $\mathrm{vol}(S) = 33$, $\mathrm{vol}(\bar{S}) = 11$, so $\phi(S) = 7/11$. Small conductance ⇔ good community.
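
The definition on this slide can be sketched in a few lines of Python, using the same dictionary-of-sets graph representation as the code later in the talk. The function name `conductance` and the toy graph are assumptions made for illustration, and the choice to return infinity when one side has zero volume is a convention of this sketch.

```python
def conductance(G, S):
    """Conductance of vertex set S in an undirected graph G (dict of sets)."""
    S = set(S)
    cut = sum(1 for v in S for u in G[v] if u not in S)  # edges leaving S
    vol_S = sum(len(G[v]) for v in S)                    # sum of degrees in S
    vol_rest = sum(len(G[v]) for v in G) - vol_S         # sum of degrees outside S
    denom = min(vol_S, vol_rest)
    # Conductance is undefined when one side has zero volume; this sketch
    # treats that case as the worst possible score.
    return cut / denom if denom > 0 else float("inf")


# Hypothetical toy graph: two triangles joined by the edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi_triangle = conductance(G, {0, 1, 2})  # one cut edge, volume 7 on each side
phi_single = conductance(G, {0})          # both edges of vertex 0 leave the set
```

On this toy graph one triangle has conductance 1/7, while a single vertex has conductance 1, matching the slide's reading that small conductance indicates a good community.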

  10. Andersen-Chung-Lang personalized PageRank community theorem. Informally: suppose the seeds are in a set of good conductance; then the personalized PageRank method will find a set with conductance that’s nearly as good. … also, it’s really fast. [Andersen et al. 2006]

  11. Python code:

        import collections

        # G is the graph as a dictionary of sets (vertex -> set of neighbors);
        # seed is a list of seed vertices. Both are assumed to be given.
        alpha = 0.99
        tol = 1e-4
        x = {}  # PPR estimate, stored as a dictionary
        r = {}  # residual, stored as a dictionary
        Q = collections.deque()  # queue of vertices with a large residual
        for s in seed:
            r[s] = 1 / len(seed)
            Q.append(s)
        while len(Q) > 0:
            v = Q.popleft()  # v has r[v] > tol*deg(v)
            if v not in x: x[v] = 0.
            x[v] += (1 - alpha) * r[v]
            mass = alpha * r[v] / (2 * len(G[v]))
            for u in G[v]:  # for neighbors of v
                if u not in r: r[u] = 0.
                if r[u] < len(G[u]) * tol and \
                   r[u] + mass >= len(G[u]) * tol:
                    Q.append(u)  # add u to queue if large
                r[u] = r[u] + mass
            r[v] = mass * len(G[v])
            if r[v] >= len(G[v]) * tol: Q.append(v)  # re-queue v if its residual is still large
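
The slide's loop covers the first step, approximating the distribution. The talk does not show the second step, extracting the community, so the sketch below assumes it is the standard sweep cut from Andersen et al. 2006: order vertices by score divided by degree and keep the prefix with the smallest conductance. The function names `ppr_push` and `sweep_cut` and the toy graph are hypothetical. `ppr_push` wraps the slide's loop in a function and re-queues a vertex whose residual is still large after a push. The sketch assumes no self-loops.

```python
import collections


def ppr_push(G, seed, alpha=0.99, tol=1e-4):
    """The slide's push loop as a function; returns the PPR estimate x."""
    x, r = {}, {}
    Q = collections.deque()
    for s in seed:
        r[s] = 1.0 / len(seed)
        Q.append(s)
    while Q:
        v = Q.popleft()
        x[v] = x.get(v, 0.0) + (1 - alpha) * r[v]
        mass = alpha * r[v] / (2 * len(G[v]))
        for u in G[v]:
            if u not in r:
                r[u] = 0.0
            if r[u] < len(G[u]) * tol <= r[u] + mass:
                Q.append(u)  # u's residual just crossed the threshold
            r[u] += mass
        r[v] = mass * len(G[v])
        if r[v] >= len(G[v]) * tol:
            Q.append(v)  # v still holds a large residual
    return x


def sweep_cut(G, x):
    """Return (best_set, best_conductance) over prefixes of the sweep order."""
    total_vol = sum(len(G[v]) for v in G)
    order = sorted(x, key=lambda v: x[v] / len(G[v]), reverse=True)
    S, vol, cut = set(), 0, 0
    best_set, best_cond = set(), float("inf")
    for v in order:
        inside = sum(1 for u in G[v] if u in S)
        S.add(v)
        vol += len(G[v])
        # Edges from v to S stop being cut edges; v's other edges become cut edges.
        cut += len(G[v]) - 2 * inside
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_cond:
            best_cond, best_set = cut / denom, set(S)
    return best_set, best_cond


# Hypothetical toy graph: two triangles joined by the edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community, phi = sweep_cut(G, ppr_push(G, seed=[0]))
```

Seeded at vertex 0, the sweep recovers the seed's triangle. The cut and volume are updated incrementally, so the sweep only inspects edges of vertices that received PageRank mass.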

  12. Demo 2!

  13. Problem 1, which seeds?

  14. Problem 2, not fast enough.

  15. Gleich-Seshadhri, KDD 2012. Neighborhoods are good communities.

  16. Gleich-Seshadhri, KDD 2012. Egonets and Conductance. Vertex neighborhoods (egonets?) are good conductance communities … in graphs that look like social and information networks.

  17. Vertex neighborhoods or Egonets: the induced subgraph of the set of a vertex and its neighbors. Prior research on egonets of social networks from the “structural holes” perspective [Burt95, Kleinberg08]. Used for anomaly detection [Akoglu10], community seeds [Huang11, Schaeffer11], overlapping communities [Schaeffer07, Rees10].

  18. Simple version of theorem: if global clustering coefficient = 1, then the graph is a disjoint union of cliques. Vertex neighborhoods are optimal communities!

  19. Theorem. Condition: let graph G have clustering coefficient $\kappa$ and have vertex degrees bounded by a power-law function with exponent $\gamma$ less than 3. [Figure: log probability vs. log degree, with the degree distribution bounded between $\alpha_1 n / d^{\gamma}$ and $\alpha_2 n / d^{\gamma}$.] Theorem: then there exists a vertex neighborhood with conductance $\le 4(1 - \kappa) / (3 - 2\kappa)$.

  20. Confession: the theory is weak. This bound is useless unless $\kappa \ge 1/2$: $\phi(S) \le 4(1 - \kappa) / (3 - 2\kappa)$.

        Graph              Verts      Edges       κ       C̄
        Collaboration networks, κ ~ [0.1 – 0.5]
        ca-AstroPh         17903      196972      0.318   0.633
        email-Enron        33696      180811      0.085   0.509
        cond-mat-2005      36458      171735      0.243   0.657
        arxiv              86376      517563      0.560   0.678
        dblp               226413     716460      0.383   0.635
        hollywood-2009     1069126    56306653    0.310   0.766
        Social networks, κ ~ [0.05 – 0.1]
        fb-Penn94          41536      1362220     0.098   0.212
        fb-A-oneyear       1138557    4404989     0.038   0.060
        fb-A               3097165    23667394    0.048   0.097
        soc-LiveJournal1   4843953    42845684    0.118   0.274
        Tech. networks, κ ~ [0.005 – 0.05]
        oregon2-010526     11461      32730       0.037   0.352
        p2p-Gnutella25     22663      54693       0.005   0.005
        as-22july06        22963      48436       0.011   0.230
        itdk0304           190914     607610      0.061   0.158
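
To see why the bound needs a clustering coefficient of at least 1/2, note that conductance never exceeds 1, so the bound only says something when 4(1 − κ) < 3 − 2κ, which rearranges to κ > 1/2. A small sketch that evaluates the bound follows. The function name `neighborhood_conductance_bound` and the sample values of κ are illustrative assumptions and are not taken from the table.

```python
def neighborhood_conductance_bound(kappa):
    """Upper bound 4(1 - kappa) / (3 - 2 kappa) from the theorem."""
    return 4.0 * (1.0 - kappa) / (3.0 - 2.0 * kappa)


# Illustrative values of the clustering coefficient, not taken from the table.
for kappa in (0.25, 0.5, 0.75, 1.0):
    bound = neighborhood_conductance_bound(kappa)
    status = "informative" if bound < 1.0 else "trivial"
    print(f"kappa = {kappa:.2f}  bound = {bound:.3f}  ({status})")
```

At κ = 1 the bound is 0, which agrees with the simple version of the theorem: a disjoint union of cliques has neighborhoods with no edges leaving them.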

  21. We view this theory as “intuition for the truth”.

  22. Empirical Evaluation using Network Community Profiles. [Figure: network community profile of fb-A-oneyear, log-log; x-axis: community size; y-axis: minimum conductance for any community of the given size; max deg marked.] Approximate canonical shape found by Leskovec, Lang, Dasgupta, and Mahoney. Holds for a variety of approximations to conductance.

  23. Empirical Evaluation using Network Community Profiles. Facebook data from Wilson et al. 2009: fb-A-oneyear, 1.1M verts, 4M edges. [Figure: egonet community profile, log-log; x-axis: community size (degree + 1); y-axis: minimum conductance for any neighborhood of the given size; max deg marked.] The “egonet community profile” shows the same shape, 3 secs to compute. The Fiedler community computed from the normalized Laplacian is a neighborhood!

  24. Not just one graph. arXiv: 86k verts, 500k edges. soc-LiveJournal: 5M verts, 42M edges. [Figure: one community profile per graph, log-log; max deg and verts marked.] 15 more graphs available: www.cs.purdue.edu/~dgleich/codes/neighborhoods

  25. Filling in the Network Community Profile. fb-A-oneyear, Facebook Sample: 1.1M verts, 4M edges. [Figure: egonet community profile, log-log; x-axis: community size (degree + 1); y-axis: minimum conductance for any neighborhood of the given size; max deg marked.] We are missing a region of the NCP when we just look at neighborhoods.

  26. Filling in the Network Community Profile. fb-A-oneyear, Facebook Sample: 1.1M verts, 4M edges. [Figure: network community profile, log-log; x-axis: community size; y-axis: minimum conductance for any community of the given size; max deg marked.] This region fills when using the PPR method (like now!). 7807 seconds.

  27. Am I a good seed? Locally Minimal Communities: “My conductance is the best locally.” $\phi(N(v)) \le \phi(N(w))$ for all $w$ adjacent to $v$. In Zachary’s Karate Club network, there are four locally minimal communities: the two leaders and two peripheral nodes.
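
The locally minimal test on this slide can be sketched directly: compute the conductance of every egonet, then keep the vertices whose egonet conductance is no larger than that of any neighbor's egonet. The function names, the convention for zero-volume sets, and the toy graph below are assumptions for illustration. This is not the Karate Club data, so the sketch does not reproduce the count quoted on the slide.

```python
def conductance(G, S):
    """Conductance of vertex set S in an undirected graph G (dict of sets)."""
    S = set(S)
    cut = sum(1 for v in S for u in G[v] if u not in S)
    vol_S = sum(len(G[v]) for v in S)
    vol_rest = sum(len(G[v]) for v in G) - vol_S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def egonet(G, v):
    """Vertex set of the egonet of v: v together with its neighbors."""
    return {v} | G[v]


def locally_minimal_seeds(G):
    """Vertices v with phi(N(v)) <= phi(N(w)) for every neighbor w of v."""
    phi = {v: conductance(G, egonet(G, v)) for v in G}
    return [v for v in G if all(phi[v] <= phi[w] for w in G[v])]


# Hypothetical toy graph: two triangles joined by the edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
seeds = locally_minimal_seeds(G)
```

On this toy graph the bridge endpoints 2 and 3 are rejected because their egonets spill across the bridge, while the four remaining vertices are locally minimal. Ties are accepted because the slide's condition uses ≤.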

  28. Locally minimal communities capture extremal neighborhoods. fb-A-oneyear, Facebook Sample: 1.1M verts, 4M edges. [Figure: egonet community profile with locally minimal communities overlaid, log-log; x-axis: community size; max deg marked.] Red dots are the conductance and size of a locally minimal community. The red circles, the best local mins, find the extremes in the egonet profile. Usually about 1% of # of vertices.

  29. Filling in the NCP: growing locally minimal comm. [Figure: three community profiles of fb-A-oneyear, log-log; x-axis: community size; max deg marked.] Original egonet profile: 3 seconds. Locally min NCP (PPR growing only locally min communities, seeded from entire egonet): 283 seconds. Full NCP: 7807 seconds.

  30. But there’s a small problem. Most people want to cover a network with communities! We just looked at the best.
