CS224W: Analysis of Networks (http://cs224w.stanford.edu) - PowerPoint PPT Presentation


  1. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. [Figure: a network and its adjacency matrix, with rows and columns indexed by nodes] 11/30/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

  3. Non-overlapping vs. overlapping communities

  4. [Palla et al., ‘05] A node can belong to many social “circles”

  5. [Figure: overlapping social circles labeled High school, Company, Stanford (Basketball), Stanford (Squash)]

  6. (No text on this slide.)

  7. [Palla et al., ‘05] Two nodes belong to the same community if they can be connected through adjacent k-cliques:
     - k-clique: a fully connected graph on k nodes (for example, a 3-clique is a triangle)
     - Adjacent k-cliques: two k-cliques that overlap in k-1 nodes
     - k-clique community: the set of nodes that can be reached through a sequence of adjacent k-cliques
     [Figure: a 3-clique; adjacent vs. non-adjacent 3-cliques; two overlapping 3-clique communities]

  8. [Palla et al., ‘05] Two nodes belong to the same community if they can be connected through adjacent k-cliques.
     [Figure: a 4-clique; adjacent vs. non-adjacent 4-cliques; communities for k=4]
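
The two definitions above (k-clique, adjacent k-cliques) can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary of neighbor sets; the names `is_clique`, `are_adjacent`, and the example graph `adj` are illustrative and not part of the lecture.

```python
def is_clique(nodes, adj):
    """Return True if every pair of distinct nodes is joined by an edge."""
    nodes = list(nodes)
    return all(v in adj[u] for i, u in enumerate(nodes) for v in nodes[i + 1:])


def are_adjacent(c1, c2, k):
    """Two k-cliques are adjacent if they overlap in exactly k-1 nodes."""
    c1, c2 = set(c1), set(c2)
    return len(c1) == k and len(c2) == k and len(c1 & c2) == k - 1


# Hypothetical example: triangles {1,2,3} and {2,3,4} share an edge,
# triangle {4,5,6} touches {2,3,4} in a single node only.
adj = {
    1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4},
    4: {2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
shares_edge = are_adjacent({1, 2, 3}, {2, 3, 4}, k=3)
shares_node = are_adjacent({2, 3, 4}, {4, 5, 6}, k=3)
```

With k=3, nodes 1 to 4 would fall into one community and nodes 4 to 6 into another, so node 4 belongs to both.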

  9. Clique Percolation Method:
     - Find maximal cliques. Def: a clique is maximal if no superset of it is a clique.
     - Clique overlap super-graph: each clique is a super-node; connect two cliques if they overlap in at least k-1 nodes.
     - Communities: connected components of the clique overlap matrix.
     How to set k? Set k so that we get the “richest” community structure (the most widely distributed cluster sizes).
     [Figure: example with k=3 on nodes A, B, C, D, showing the cliques and the resulting communities]

  10. - Start with the graph.
      - Find maximal cliques.
      - Create the clique overlap matrix A: rows and columns are maximal cliques, and each entry is the number of nodes the two cliques have in common.
      - Threshold the matrix at value k-1: if $a_{ij} < k - 1$, set the entry to 0.
      - Communities are the connected components of the thresholded matrix.
      [Figure: (1) graph, (2) clique overlap matrix, (3) matrix thresholded at 3, (4) communities (connected components)]
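
The clique percolation steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the lecture's implementation: it assumes the maximal cliques are already known, and it also drops cliques with fewer than k nodes (thresholding the diagonal of the overlap matrix at k, which is an assumption taken from the original Palla et al. formulation and is not stated on the slide). The function name `clique_percolation` and the example cliques are hypothetical.

```python
from itertools import combinations


def clique_percolation(cliques, k):
    """Return communities as node sets, given a list of maximal cliques."""
    # Assumption: cliques smaller than k cannot contain a k-clique, so drop them.
    big = [frozenset(c) for c in cliques if len(c) >= k]
    n = len(big)

    # Thresholded overlap matrix, stored as adjacency lists between cliques:
    # cliques i and j stay connected only if they share at least k-1 nodes.
    links = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if len(big[i] & big[j]) >= k - 1:
            links[i].add(j)
            links[j].add(i)

    # Connected components of the clique graph; each component's node union
    # is one (possibly overlapping) community.
    seen, communities = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, nodes = [start], set()
        while stack:
            i = stack.pop()
            nodes |= big[i]
            for j in links[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        communities.append(nodes)
    return communities


# Hypothetical input: three triangles and one edge-sized clique.
example_cliques = [{1, 2, 3}, {2, 3, 4}, {4, 5, 6}, {6, 7}]
communities = clique_percolation(example_cliques, k=3)
```

In this example the first two triangles share two nodes and merge, while the third triangle shares only node 4 with them, so node 4 ends up in two communities.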

  11. [Palla et al., ‘07] Communities in a “tiny” part of a phone call network of 4 million users

  12. [Farkas et al., ‘07]

  13. Finding maximal cliques: no nice way, hard combinatorial problem
      - Maximal clique: a clique that can’t be extended
        - {a, b, c} is a clique but not a maximal clique
        - {a, b, c, d} is a maximal clique
      - Algorithm sketch:
        - Start with a seed node
        - Expand the clique around the seed
        - Once the clique cannot be further expanded, we have found a maximal clique
      - Note: this method will generate the same clique multiple times
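
The seed-and-expand sketch above can be written as a short greedy loop. The code below is one possible reading of it, assuming a dictionary of neighbor sets; `grow_clique` and the example graph are illustrative names, not the lecture's. It returns one maximal clique containing the seed, not necessarily the largest one.

```python
def grow_clique(seed, adj):
    """Greedily grow a maximal clique around `seed`."""
    clique = {seed}
    # Candidates are the vertices adjacent to every member of the clique so far.
    candidates = set(adj[seed])
    while candidates:
        v = min(candidates)      # any choice works; min keeps the run deterministic
        clique.add(v)
        candidates &= adj[v]     # keep only common neighbors (this also drops v)
    return clique


# Hypothetical graph: a 4-clique {a, b, c, d} with an extra node e attached to d.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}
from_a = grow_clique("a", adj)
from_b = grow_clique("b", adj)
from_e = grow_clique("e", adj)
```

Seeds `a` and `b` lead to the same clique, which illustrates the note that this approach produces duplicates.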

  14. Start with a seed vertex a. Goal: find the max clique Q that a belongs to.
      - Observation: if some x belongs to Q, then it is a neighbor of a. Why? If a, x ∈ Q but edge (a, x) does not exist, Q is not a clique!
      - Recursive algorithm:
        - Q … current clique
        - R … candidate vertices to expand the clique to
      - Example: start with a and expand around it (Γ(u) … neighbor set of u). Steps of the recursive algorithm:
        Q = {a},      R = {b,c,d}
        Q = {a,b},    R = {b,c,d} ∩ Γ(b) = {c,d}
        Q = {a,b,c},  R = {c,d} ∩ Γ(c) = {}
        backtrack
        Q = {a,b,d},  R = {c} ∩ Γ(d) = {}

  15. Repeats slide 14, except that in the example the candidate set for Q = {a,b,c} is written {d} ∩ Γ(c) = {} instead of {c,d} ∩ Γ(c) = {}.

  16. Q … current clique, R … candidate vertices
      Expand(R, Q):
        while R ≠ {}:
          p = vertex in R
          Q_p = Q ∪ {p}
          R_p = R ∩ Γ(p)
          if R_p ≠ {}: Expand(R_p, Q_p)
          else: output Q_p
          R = R - {p}

  17. Example run of Expand(R, Q), with Q … current clique and R … candidate vertices. Start: Expand(V, {})
      R = {a,…f}, Q = {}
      p = {b}: Q_p = {b}, R_p = {a,c,d}
      Expand(R_p, Q_p): R = {a,c,d}, Q = {b}
        p = {a}: Q_p = {b,a}, R_p = {d}
        Expand(R_p, Q_p): R = {d}, Q = {b,a}
          p = {d}: Q_p = {b,a,d}, R_p = {}: output {b,a,d}
        p = {c}: Q_p = {b,c}, R_p = {d}
        Expand(R_p, Q_p): R = {d}, Q = {b,c}
          p = {d}: Q_p = {b,c,d}, R_p = {}: output {b,c,d}
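
The Expand(R, Q) recursion can be transcribed into Python almost line by line. The sketch below makes two assumptions that are not on the slides: p is always the smallest remaining candidate (so each clique is built in one fixed order), and a set is only reported after an explicit maximality check, because the bare recursion can also reach non-maximal sets such as a single edge inside a triangle. The names `maximal_cliques`, `is_maximal`, and the example graph are illustrative.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of a graph given as {node: set_of_neighbors}."""
    found = []

    def is_maximal(q):
        # q is maximal if no vertex outside q is adjacent to every member of q.
        return not any(q <= adj[v] for v in adj if v not in q)

    def expand(r, q):
        r = set(r)                   # local copy of the candidate set R
        while r:
            p = min(r)               # "p = vertex in R" (assumed: smallest first)
            q_p = q | {p}            # Q_p = Q ∪ {p}
            r_p = r & adj[p]         # R_p = R ∩ Γ(p)
            if r_p:
                expand(r_p, q_p)
            elif is_maximal(q_p):    # added check, not part of the slide pseudocode
                found.append(q_p)
            r.remove(p)              # R = R - {p}

    expand(set(adj), frozenset())
    return found


# Hypothetical graph with edges a-b, a-d, b-c, b-d, c-d.
adj = {
    "a": {"b", "d"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"a", "b", "c"},
}
cliques = maximal_cliques(adj)
```

On this graph the run reports the two triangles {a, b, d} and {b, c, d}, the same two cliques that appear as outputs in the trace above.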

  18. How to prevent maximal cliques from being generated multiple times?
      - Only output cliques that are lexicographically minimum: {a, b, c} < {b, a, c}
      - Even better: only expand to the nodes higher in the lexicographical order

  19. How should we think about large scale organization of clusters in networks? Finding: Community structure

  20. How should we think about large scale organization of clusters in networks? Finding: Core-periphery structure (nested core-periphery)

  21. How do we reconcile these two views (and still do community detection)? Community structure vs. core-periphery

  22. How community-like is a set of nodes?
      - A good cluster S has many edges internally and few edges pointing outside.
      - What’s a good metric? Conductance:
        $$\phi(S) = \frac{|\{(i,j) \in E;\ i \in S,\ j \notin S\}|}{\sum_{s \in S} d_s}$$
        where $d_s$ is the degree of node $s$. Small conductance corresponds to good clusters.
      - Note: we are assuming $|S| < |V|/2$.
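
As a small worked example of the conductance definition, the sketch below counts the edges leaving S and divides by the total degree of the nodes in S. It assumes an undirected graph stored as a dictionary of neighbor sets; the function name `conductance` and the two-triangle graph are illustrative.

```python
def conductance(S, adj):
    """phi(S) = (#edges with one endpoint in S and one outside) / (sum of degrees in S)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    volume = sum(len(adj[u]) for u in S)
    return cut / volume


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi_triangle = conductance({0, 1, 2}, adj)   # 1 cut edge, degrees 2 + 2 + 3
phi_pair = conductance({0, 1}, adj)          # 2 cut edges, degrees 2 + 2
```

The triangle scores 1/7 while the pair {0, 1} scores 1/2, which matches the intuition that the triangle is the better cluster.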

  23. [WWW ‘08] Define: network community profile (NCP) plot. Plot the score Φ(k) of the best community of size k (note |S| < |V|/2).
      [Figure: example communities of size k=5, k=7, k=10; axes: log Φ(k) vs. community size, log k]

  24. - Run the favorite clustering method(s)
      - Each dot represents a cluster
      - For each size k, find the “best” cluster (min Φ(k))
      [Figure: scatter plot with legend Spectral, Graclus, Metis; axes: cluster score, log Φ(k) vs. cluster size, log k]
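
To make the definition of Φ(k) concrete, the sketch below computes the profile by brute force, trying every node set of each size. This is only feasible for tiny graphs and is not the procedure described above, which takes the best clusters returned by clustering methods. The names `conductance`, `ncp`, and the example graph are illustrative assumptions.

```python
from itertools import combinations


def conductance(S, adj):
    """Cut edges leaving S divided by the sum of degrees inside S."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    volume = sum(len(adj[u]) for u in S)
    return cut / volume


def ncp(adj, max_size):
    """Phi(k) = minimum conductance over all node sets of size k, for k = 1..max_size."""
    return {
        k: min(conductance(S, adj) for S in combinations(adj, k))
        for k in range(1, max_size + 1)
    }


# Hypothetical graph: triangle {0,1,2} attached by edge 2-3 to the 4-clique {3,4,5,6}.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5, 6}, 4: {3, 5, 6}, 5: {3, 4, 6}, 6: {3, 4, 5},
}
# Sizes are limited to k < |V|/2, following the note |S| < |V|/2.
profile = ncp(adj, max_size=3)
```

Here the profile drops from 1 at k=1 to 1/2 at k=2 and 1/7 at k=3, where the triangle is the best cluster.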

  25. [WWW ‘08] Meshes, grids, dense random graphs. [Figure panels: d-dimensional meshes; California road network]

  26. [WWW ‘08] Collaborations between scientists in networks [Newman, 2005]. Dips in the conductance graph correspond to the "good" clusters we can visually detect. [Figure axes: conductance, log Φ(k) vs. community size, log k]

  27. [Internet Mathematics ‘09] Natural hypothesis about NCP:
      - NCP of real networks slopes downward
      - Slope of the NCP corresponds to the “dimensionality” of the network
      What about large networks?

  28. [Internet Mathematics ‘09] Typical example: General Relativity collaborations (n=4,158, m=13,422)

  29. [Internet Mathematics ‘09] [Figure legend: rewired graph vs. real graph]

  30. [Figure annotations: “Better and better clusters”, “Clusters get worse and worse”, “Best cluster has ~100 nodes”; axes: Φ(k) (score) vs. k (cluster size)]
