
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

Networks of tightly connected groups
• Network communities: sets of nodes with lots of connections inside and few to outside (the rest of the network)
• Also called: communities, clusters, groups, modules
11/3/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

[Onnela et al. ‘07]
[Figure, two panels: edge strengths (call volume) in real network; edge betweenness in real network]

[Girvan-Newman PNAS ‘02]
• Divisive hierarchical clustering based on edge betweenness: the number of shortest paths passing through the edge
• Girvan-Newman algorithm:
  • Repeat until no edges are left:
    • Calculate betweenness of edges
    • Remove edges with highest betweenness
  • Connected components are communities
• Gives a hierarchical decomposition of the network
• Example: [figure]
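The loop above can be sketched in plain Python. This is a minimal illustration, not the lecture's own code: the function names (`edge_betweenness`, `components`, `girvan_newman`), the adjacency-dict input format, and the small two-triangle test graph are all assumptions made for the example. It assumes an unweighted, undirected graph without self-loops, and it recomputes betweenness from scratch after every removal.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (adj: node -> set)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:                      # BFS, counting shortest paths (sigma)
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
        credit = {u: 0.0 for u in order}
        for v in reversed(order):         # work back up the BFS levels
            for u in adj[v]:
                if dist.get(u) == dist[v] - 1:
                    flow = (1.0 + credit[v]) * sigma[u] / sigma[v]
                    score[frozenset((u, v))] += flow
                    credit[u] += flow
    # every node pair was counted once from each endpoint
    return {e: b / 2.0 for e, b in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Return the list of partitions seen as edges are removed."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    levels = [components(adj)]
    while any(adj.values()):                      # until no edges are left
        bet = edge_betweenness(adj)
        top = max(bet.values())
        for e, b in bet.items():
            if abs(b - top) < 1e-9:               # remove all tied top edges
                u, v = tuple(e)
                adj[u].discard(v)
                adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):
            levels.append(comps)
    return levels

# Hypothetical test graph: two triangles joined by the bridge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
for partition in girvan_newman(graph):
    print(sorted(sorted(c) for c in partition))
```

On this toy graph the bridge carries the most shortest paths, so it is removed first and the two triangles appear as the first split of the hierarchy.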

[Newman-Girvan PhysRevE ‘03]
• Zachary’s Karate club: hierarchical decomposition [figure]

[Newman-Girvan PhysRevE ‘03]
Communities in physics collaborations [figure]

• Breadth-first search starting from A:
• Want to compute betweenness of paths starting at node A

• Count the number of shortest paths from A to all other nodes of the network: [figure]

• Compute betweenness by working up the tree: if there are multiple paths, count them fractionally
  [Figure annotations: 1+1 paths to H, split evenly; 1+0.5 paths to J, split 1:2; 1 path to K, split evenly]
• Repeat the BFS procedure for each node of the network
• Add edge scores
• Runtime (all pairs shortest path):
  ‐‐ Weighted graphs: $O(N^3)$
  ‐‐ Unweighted graphs: $O(N^2)$
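The three steps for a single start node (BFS, counting shortest paths, then splitting credit while working up the tree) might look like the sketch below. The function name `bfs_edge_credit` and the five-node graph are hypothetical; the graph is not the one drawn on the slides, whose figure did not survive extraction.

```python
from collections import deque

def bfs_edge_credit(adj, root):
    """Path counts and edge credits for shortest paths starting at root."""
    # Step 1: breadth-first search, recording each node's depth.
    depth, order = {root: 0}, []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)

    # Step 2: number of shortest paths from root to each node (top-down).
    paths = {root: 1}
    for v in order[1:]:
        paths[v] = sum(paths[u] for u in adj[v] if depth[u] == depth[v] - 1)

    # Step 3: work up the tree. Each node starts with credit 1 (the path
    # ending at it) and passes its credit to its parents, split in
    # proportion to the number of shortest paths through each parent.
    # The root's own entry is never passed on.
    node_credit = {v: 1.0 for v in order}
    edge_credit = {}
    for v in reversed(order[1:]):
        for u in adj[v]:
            if depth[u] == depth[v] - 1:
                share = node_credit[v] * paths[u] / paths[v]
                edge_credit[(u, v)] = share     # key is (parent, child)
                node_credit[u] += share
    return paths, edge_credit

# Hypothetical graph: A reaches D by two equally short routes.
adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
paths, credit = bfs_edge_credit(adj, "A")
print(paths)
print(credit)
```

Here D has two shortest paths from A, so the credit arriving at D (its own path plus the one passed up from E) is split evenly between edges B-D and C-D. Summing these per-root credits over all roots, and halving for undirected graphs, gives the edge betweenness.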

Define modularity to be
Q = (number of edges within groups) – (expected number within groups)
• Actual number of edges between i and j is $A_{ij}$
• Expected number of edges between i and j is $\frac{k_i k_j}{2m}$, where $k_i$ is the degree of node i
• m … number of edges

• Q = (number of edges within groups) – (expected number within groups)
• Then:
  $$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
  • m … number of edges
  • $A_{ij}$ … 1 if (i,j) is an edge, else 0
  • $k_i$ … degree of node i
  • $c_i$ … group id of node i
  • $\delta(a, b)$ … 1 if a=b, else 0
• Modularity lies in the range [−1,1]
• It is positive if the number of edges within groups exceeds the expected number
• 0.3<Q<0.7 means significant community structure
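As a small worked check of the definition, the sketch below evaluates Q for an undirected edge list and a node-to-group mapping, using the standard normalisation $Q = \frac{1}{2m}\sum_{i,j}[A_{ij} - k_i k_j/2m]\,\delta(c_i,c_j)$. The function name `modularity` and the two-triangle graph are assumptions made for the example.

```python
def modularity(edges, group):
    """Modularity of a partition of a simple undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    group: dict mapping every node to its group id.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Sum of A_ij over ordered pairs (i, j) in the same group:
    # each within-group edge contributes twice, as (i, j) and (j, i).
    inside = sum(2 for u, v in edges if group[u] == group[v])

    # Sum of k_i k_j / 2m over ordered pairs in the same group
    # equals sum over groups of (total degree of the group)^2 / 2m.
    total_degree = {}
    for node, k in degree.items():
        g = group[node]
        total_degree[g] = total_degree.get(g, 0) + k
    expected = sum(K * K for K in total_degree.values()) / (2.0 * m)

    return (inside - expected) / (2.0 * m)

# Hypothetical graph: two triangles joined by one edge (m = 7).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}
print(modularity(edges, by_triangle))  # 5/14, about 0.357
print(modularity(edges, one_group))    # 0.0
```

Splitting along the bridge gives 12 within-group adjacency entries against an expected 7, so Q = 5/14, which falls in the 0.3 to 0.7 band. Putting everything in one group gives exactly 0.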

• Modularity is useful for selecting the number of clusters: [figure]
• Why not optimize modularity directly?

• Consider splitting the graph into two communities
• Modularity Q is:
  $$Q = \frac{1}{2m} \sum_{i,j \text{ in same group}} \left[ A_{ij} - \frac{k_i k_j}{2m} \right]$$
• Or we can write in matrix form as
  $$Q = \frac{1}{4m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] s_i s_j = \frac{1}{4m} s^T B s$$
  • s … vector of group memberships, $s_i \in \{+1, -1\}$
  • B … modularity matrix, $B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$
• Note: each row (column) of B sums to 0, so replacing the same-group indicator $\frac{1}{2}(s_i s_j + 1)$ by $\frac{1}{2} s_i s_j$ does not change the sum
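The matrix form can be checked numerically. The sketch below builds B from a dense adjacency matrix and evaluates $\frac{1}{4m} s^T B s$; the function names and the two-triangle example graph are illustrative assumptions, and plain nested lists are used instead of a linear-algebra library to keep it dependency-free.

```python
def modularity_matrix(A):
    """B_ij = A_ij - k_i k_j / 2m for a symmetric 0/1 adjacency matrix A."""
    n = len(A)
    k = [sum(row) for row in A]          # node degrees
    two_m = float(sum(k))                # sum of degrees = 2m
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)]
            for i in range(n)]

def two_way_modularity(A, s):
    """Q = s^T B s / 4m for a membership vector s with entries +1 or -1."""
    B = modularity_matrix(A)
    n = len(s)
    four_m = 2.0 * sum(sum(row) for row in A)
    quad = sum(s[i] * B[i][j] * s[j] for i in range(n) for j in range(n))
    return quad / four_m

# Hypothetical graph: two triangles joined by the edge 2-3.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for u, v in edge_list:
    A[u][v] = A[v][u] = 1

B = modularity_matrix(A)
print([round(sum(row), 12) for row in B])               # every row sums to 0
print(two_way_modularity(A, [1, 1, 1, -1, -1, -1]))     # 5/14
```

The value 5/14 matches what the sum over same-group pairs gives for the same split, which is the point of the zero-row-sum note.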

• Task: find $s \in \{-1,+1\}^n$ that maximizes Q
• Rewrite Q in terms of the eigenvalues $\beta_i$ of B (with orthonormal eigenvectors $u_i$):
  $$Q = \frac{1}{4m} s^T \left[ \sum_{i=1}^{n} u_i \beta_i u_i^T \right] s = \frac{1}{4m} \sum_{i=1}^{n} s^T u_i \beta_i u_i^T s = \frac{1}{4m} \sum_{i=1}^{n} \left( s^T u_i \right)^2 \beta_i$$
• To maximize Q, the easiest way is to make s proportional to $u_1$
  • Assigns all weight in the sum to $\beta_1$ (largest eigenvalue)
  • (all other $s^T u_i$ terms are zero because of orthonormality)
• Unfortunately, elements of s must be ±1
• In general, finding the optimal s is NP-hard

$$Q = \frac{1}{4m} \sum_{i=1}^{n} \left( s^T u_i \right)^2 \beta_i$$
• Heuristic: try to maximize only the $\beta_1$ term
• Similar in spirit to the spectral partitioning algorithm (we will explore it next time)
• Continue the bisection hierarchically

• Fast Modularity Optimization Algorithm:
  • Find leading eigenvector $u_1$ of modularity matrix B
  • Divide the nodes by the signs of the elements of $u_1$
  • Repeat hierarchically until:
    • If a proposed split does not cause modularity to increase, declare community indivisible and do not split it
    • If all communities are indivisible, stop
• How to find $u_1$? Power method!
  • Iterative multiplication, normalization
  • Start with random v, until convergence:
    $$v_{k+1} = \frac{B v_k}{\lVert B v_k \rVert}$$
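A single bisection step can be sketched as follows. One detail is an assumption added here rather than taken from the slides: plain power iteration converges to the eigenvalue of largest magnitude, which for B may be negative, so the sketch iterates on B + cI with c equal to the largest absolute row sum of B. This shift leaves the eigenvectors unchanged and makes the largest eigenvalue dominant. The function names and the test graph are hypothetical, and only the first split of the whole graph is shown, not the hierarchical repetition.

```python
import random

def modularity_matrix(A):
    k = [sum(row) for row in A]
    two_m = float(sum(k))
    n = len(A)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)]
            for i in range(n)]

def leading_eigenvector(B, iters=1000, tol=1e-10, seed=0):
    """Power method on B + shift*I (the shift is an assumption, see lead-in)."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        # multiply, then normalize
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v

def spectral_bisect(A):
    """Split nodes by the signs of u_1; refuse the split if Q does not increase."""
    n = len(A)
    B = modularity_matrix(A)
    u1 = leading_eigenvector(B)
    s = [1 if x > 0 else -1 for x in u1]
    four_m = 2.0 * sum(sum(row) for row in A)
    q = sum(s[i] * B[i][j] * s[j] for i in range(n) for j in range(n)) / four_m
    if q <= 1e-12:                       # indivisible: keep one community
        return [list(range(n))], 0.0
    return ([i for i in range(n) if s[i] > 0],
            [i for i in range(n) if s[i] < 0]), q

# Hypothetical graph: two triangles joined by the edge 2-3.
A = [[0] * 6 for _ in range(6)]
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[u][v] = A[v][u] = 1
groups, q = spectral_bisect(A)
print(groups, q)
```

On this graph the signs of the leading eigenvector separate the two triangles. Which triangle gets the positive sign depends on the random start, so only the grouping is meaningful.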

• Also, can combine with other methods:
  • Randomly divide the nodes into two groups
  • Move the node that, if moved, will increase Q the most
  • Repeat for all nodes, with each node only moved once
  • Once complete, find the intermediate state with highest Q
  • Start from this state and repeat until Q stops increasing
  • Good results for “fine-tuning” the spectral method
• CNM Algorithm (Clauset-Newman-Moore ‘04):
  • (1) Start with each vertex in its own community (n communities)
  • (2) Calculate ΔQ for all possible community pairs
  • (3) Merge the pair with the largest increase in Q
  • Repeat (2)&(3) until one community remains
  • Cross-cut the dendrogram where Q is maximum
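The greedy merging steps of CNM can be illustrated with a deliberately naive sketch. It rescans all connected community pairs on every round instead of using the efficient data structures of the published algorithm, and it only considers pairs joined by at least one edge, since merging unconnected communities can never increase Q. The merge gain used is $\Delta Q = \frac{E_{ab}}{m} - \frac{K_a K_b}{2m^2}$, with $E_{ab}$ the number of edges between communities a and b and $K_a$ the total degree of a. The function name `greedy_merge` and the example graph are assumptions, and the input is assumed to be a simple undirected graph.

```python
def greedy_merge(edges):
    """Naive CNM-style agglomeration; returns (best Q, best partition)."""
    m = float(len(edges))
    nodes = sorted({x for e in edges for x in e})
    members = {v: {v} for v in nodes}    # community id -> node set
    deg = {v: 0 for v in nodes}          # community id -> total degree
    between = {}                         # {a, b} -> edges between a and b
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        key = frozenset((u, v))
        between[key] = between.get(key, 0) + 1

    def gain(key):
        a, b = tuple(key)
        return between[key] / m - deg[a] * deg[b] / (2.0 * m * m)

    # Q when every vertex is alone: no internal edges, only the expected term.
    q = -sum(k * k for k in deg.values()) / (4.0 * m * m)
    best_q, best = q, [set(c) for c in members.values()]

    while between:                       # stops when no connected pair is left
        key = max(between, key=gain)     # pair with the largest delta Q
        q += gain(key)
        a, b = tuple(key)
        members[a] |= members.pop(b)     # merge b into a
        deg[a] += deg.pop(b)
        del between[key]
        for other in [k for k in between if b in k]:
            (c,) = other - {b}           # the community on the other side
            count = between.pop(other)
            merged = frozenset((a, c))
            between[merged] = between.get(merged, 0) + count
        if q > best_q:                   # remember the best dendrogram cut
            best_q, best = q, [set(c) for c in members.values()]
    return best_q, best

# Hypothetical graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
best_q, best = greedy_merge(edges)
print(best_q, best)
```

For a connected graph the loop ends with a single community, and the returned partition corresponds to cutting the dendrogram where Q peaked, here at the two triangles.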

Fast modularity
[Figure legend: GN = Girvan-Newman, $O(n^3)$; CNM = greedy merging, $O(n \log^2 n)$; DA = extremal optimization, $O(n^2 \log^2 n)$]
• Issues with modularity:
  • May not find communities with less than $\sqrt{m}$ links
  • NP-hard to optimize exactly [Brandes et al. ‘07]


[Kumar et al. ‘99]
• Searching for small communities in a Web graph
• (1) The signature of a community/discussion in the context of a Web graph: a dense 2-layer graph
  • Intuition: a bunch of people all talking about the same things

• (2) A more well-defined problem: enumerate complete bipartite subgraphs $K_{s,t}$
  • Where $K_{s,t}$ = s nodes where each links to the same t other nodes
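To make the definition concrete, here is a brute-force sketch that lists every $K_{s,t}$ in a small directed graph. It is not the method of Kumar et al., which relies on pruning to scale to the Web graph; it simply tries every set of t targets and collects the nodes that link to all of them. The function name `enumerate_kst` and the toy link data are hypothetical.

```python
from itertools import combinations

def enumerate_kst(out_links, s, t):
    """List all (left, right) pairs where each of the s left nodes
    links to every one of the t right nodes.

    out_links: dict mapping a node to the set of nodes it links to.
    """
    targets = sorted({v for vs in out_links.values() for v in vs})
    found = []
    for right in combinations(targets, t):
        wanted = set(right)
        # nodes that point at every target in this candidate right side
        fans = sorted(u for u, vs in out_links.items()
                      if wanted <= vs and u not in wanted)
        for left in combinations(fans, s):
            found.append((left, right))
    return found

# Hypothetical link data: pages a, b, c all link to both x and y.
out_links = {
    "a": {"x", "y", "z"},
    "b": {"x", "y"},
    "c": {"x", "y"},
    "d": {"z"},
}
print(enumerate_kst(out_links, s=3, t=2))
```

The cost grows combinatorially with the number of targets, so this only illustrates what is being enumerated, not how to do it at scale.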
