
Lecture 17: Community Detection (Jan-Willem van de Meent)



  1. Unsupervised Machine Learning and Data Mining, DS 5230 / DS 4420, Fall 2018. Lecture 17, Jan-Willem van de Meent

  2. Community Detection. Problem: Can we identify groups of densely connected nodes? (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  3. Communities: Football Conferences. Nodes: football teams, edges: matches, communities: conferences (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  4. Communities: Academic Citations. Source: Citation networks and Maps of science [Börner et al., 2012]. Nodes: journals, edges: citations, communities: academic disciplines (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  5. Communities: Protein-Protein Interactions. Nodes: proteins, edges: physical interactions, communities: functional modules (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  6. Community Detection [Figure panels: Graph Partitioning; Overlapping Communities] We will work with undirected (unweighted) networks. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  7. Centrality Measures [Figure: (a) centrality illustration on an example graph with nodes 1-17, marking the nodes of highest betweenness, closeness, and degree centrality] • Betweenness: number of shortest paths passing through a node • Closeness: average distance to other nodes • Degree: number of connections to other nodes
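As a rough illustration of two of these measures, here is a minimal stdlib Python sketch on a small hypothetical undirected graph (the edge list `EDGES` and all function names are assumptions, not from the slides). Closeness is computed literally as the average BFS distance to the other nodes, so a smaller value means a more central node; many tools report the reciprocal instead. Betweenness needs the shortest-path counting described in the following slides, so it is left out here.

```python
from collections import deque

# Hypothetical example graph: undirected, unweighted edge list.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def build_graph(edges):
    """Adjacency dict {node: set of neighbours} for an undirected graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_centrality(graph):
    """Degree: number of connections to other nodes."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def average_distance(graph):
    """Closeness as average distance to the other reachable nodes (lower = more central)."""
    result = {}
    for v in graph:
        others = [d for u, d in bfs_distances(graph, v).items() if u != v]
        result[v] = sum(others) / len(others) if others else float("inf")
    return result


g = build_graph(EDGES)
print(degree_centrality(g)["D"])   # 4
avg = average_distance(g)
print(min(avg, key=avg.get))       # D
```

In this toy graph node D has both the highest degree and the smallest average distance (8/6); on the slide's figure the three measures pick out different nodes, which is the point of the illustration.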

  8. Betweenness [Figure panels: Edge Strength (call volume); Edge Betweenness] • Betweenness: number of shortest paths passing through a node or edge (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  9. Edge Betweenness [Figure: example graph on nodes A-G, each edge labeled with its betweenness] • Count the number of shortest paths passing through each edge (this can also be done with weighted edges) • If there are multiple shortest paths of equal length, split the count equally among them
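A brute-force sketch of this definition in stdlib Python, assuming the figure shows the 7-node graph below (the edge list is our reconstruction from the node labels and values left in the slide text). For every pair of nodes it enumerates all shortest paths and gives each edge on a path a share of 1 / (number of shortest paths). It is unweighted and exponential in the worst case, so it is only meant for checking small examples; `EDGES`, `all_shortest_paths`, and `edge_betweenness_bruteforce` are hypothetical names.

```python
from collections import deque
from itertools import combinations

# Assumed to be the graph in the figure (edges inferred from its labels).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def build_graph(edges):
    """Adjacency dict {node: set of neighbours} for an undirected graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def all_shortest_paths(graph, s, t):
    """Every shortest s->t path, found by BFS from s and backtracking from t."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if t not in dist:
        return []

    def backtrack(v):
        if v == s:
            return [[s]]
        # Predecessors on a shortest path are neighbours one hop closer to s.
        return [p + [v] for u in graph[v] if dist.get(u) == dist[v] - 1
                for p in backtrack(u)]

    return backtrack(t)


def edge_betweenness_bruteforce(graph):
    """Sum over node pairs of the fraction of shortest paths using each edge."""
    scores = {}
    for s, t in combinations(sorted(graph), 2):
        paths = all_shortest_paths(graph, s, t)
        if not paths:
            continue
        share = 1.0 / len(paths)  # equal-length paths split the count
        for path in paths:
            for u, v in zip(path, path[1:]):
                key = frozenset((u, v))
                scores[key] = scores.get(key, 0.0) + share
    return scores


eb = edge_betweenness_bruteforce(build_graph(EDGES))
print(eb[frozenset(("B", "D"))])   # 12.0
```

On this graph the result matches the labels in the figure residue: 12 for B-D, 5 for A-B and B-C, 4.5 for D-E and D-G, 4 for D-F, 1.5 for E-F and F-G, and 1 for A-C. The 0.5 contributions come from the pair E, G, which has two shortest paths (via D and via F).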

  10. Girvan-Newman Algorithm (hierarchical divisive clustering according to betweenness) [Figure: example graph with edges labeled by betweenness] Repeat until k clusters are found: 1. Calculate betweenness 2. Remove the edge(s) with highest betweenness (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
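A minimal stdlib Python sketch of this loop, assuming a small 7-node example graph (our reconstruction of the slide-9 figure). Here `edge_betweenness` uses one BFS per root with credit propagation, the method of slides 14-15; `girvan_newman`, `connected_components`, and the tie tolerance are illustrative choices, not a reference implementation. All edges tied for the highest betweenness are removed together, so one step can overshoot k; the loop stops once at least k components exist.

```python
from collections import deque

# Assumed example graph (reconstructed from the slide-9 figure labels).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def edge_betweenness(graph):
    """One BFS per root: count shortest paths, push credit upward, halve the sum."""
    scores = {}
    for root in graph:
        dist, paths, parents, order = {root: 0}, {root: 1}, {root: []}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in graph[u]:
                if w not in dist:
                    dist[w], paths[w], parents[w] = dist[u] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    paths[w] += paths[u]
                    parents[w].append(u)
        credit = {v: 1.0 for v in order}
        for w in reversed(order):
            for u in parents[w]:
                share = credit[w] * paths[u] / paths[w]
                key = frozenset((u, w))
                scores[key] = scores.get(key, 0.0) + share
                credit[u] += share
    return {e: s / 2 for e, s in scores.items()}


def connected_components(graph):
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(graph[v] - component)
        seen |= component
        components.append(component)
    return components


def girvan_newman(graph, k):
    """Remove top-betweenness edge(s) until there are at least k components."""
    graph = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    while len(connected_components(graph)) < k:
        scores = edge_betweenness(graph)  # recomputed after every removal
        if not scores:                    # no edges left to remove
            break
        top = max(scores.values())
        for edge, score in scores.items():
            if abs(score - top) < 1e-9:   # remove every edge tied for the maximum
                u, v = tuple(edge)
                graph[u].discard(v)
                graph[v].discard(u)
    return connected_components(graph)


communities = girvan_newman(build_graph(EDGES), 2)
print(sorted(sorted(c) for c in communities))
# [['A', 'B', 'C'], ['D', 'E', 'F', 'G']]
```

With k = 2 the first removal is B-D (betweenness 12). Asking for k = 3 on this graph removes four edges tied at 1.5 in one step and jumps to four components, which is one way the choice of k becomes awkward.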

  11. Girvan-Newman Algorithm (hierarchical divisive clustering according to betweenness) [Figure panels: successive steps of edge removal; resulting hierarchical network] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  12. Girvan-Newman: Physics Citations (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  13. Girvan-Newman: Two problems 1. How can we compute the betweenness for all edges? 2. How can we choose the number of components k? (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  14. Calculating Betweenness How can we count all shortest paths? • Loop over all nodes in the graph • From each node, perform a breadth-first search to find shortest paths to all other nodes • Increment the counts of the edges traversed by these shortest paths • Divide the final betweenness by 2 (since every path is counted twice, once from each endpoint)
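The four bullets map onto code roughly as follows. This is a sketch under stated assumptions: stdlib Python, an unweighted undirected graph, a 7-node example edge list reconstructed from the slide figures, and hypothetical names `single_source_credit` and `edge_betweenness`. One BFS per root counts shortest paths and propagates credit upward; the per-root credits are summed and the total is halved.

```python
from collections import deque

# Assumed example graph (reconstructed from the slide figures).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def single_source_credit(graph, root):
    """BFS from root; return (paths, edge_credit).

    paths[v]: number of shortest root->v paths.
    edge_credit[frozenset((u, w))]: credit edge u-w receives from this root.
    """
    # Breadth-first search: levels, parents and shortest-path counts.
    dist, paths, parents, order = {root: 0}, {root: 1}, {root: []}, []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in graph[u]:
            if w not in dist:
                dist[w], paths[w], parents[w] = dist[u] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[u] + 1:          # u is a parent of w
                paths[w] += paths[u]
                parents[w].append(u)
    # Every node starts with credit 1; from the deepest level upward, split a
    # node's credit over its parents in proportion to their path counts.
    credit = {v: 1.0 for v in order}
    edge_credit = {}
    for w in reversed(order):
        for u in parents[w]:
            share = credit[w] * paths[u] / paths[w]
            edge_credit[frozenset((u, w))] = share
            credit[u] += share
    return paths, edge_credit


def edge_betweenness(graph):
    totals = {}
    for root in graph:                                       # loop over nodes
        _, edge_credit = single_source_credit(graph, root)   # BFS from root
        for edge, c in edge_credit.items():                  # increment edge counts
            totals[edge] = totals.get(edge, 0.0) + c
    return {edge: c / 2 for edge, c in totals.items()}       # each path counted twice


g = build_graph(EDGES)
paths, credit_from_e = single_source_credit(g, "E")
print(paths["G"], credit_from_e[frozenset(("D", "E"))])   # 2 4.5
print(edge_betweenness(g)[frozenset(("B", "D"))])         # 12.0
```

Each BFS costs O(m) on a graph with m edges, so the whole computation is O(nm) for n nodes, which is far cheaper than enumerating shortest paths pair by pair.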

  15. Counting Shortest Paths [Figure: two copies of the breadth-first tree rooted at E over nodes A-G] • Left: count the number of shortest paths from the root (E) to each node • Right: accumulate credit upwards, dividing it across shortest paths (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
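Worked through by hand, and assuming the figure is the BFS tree rooted at E over the 7-node graph with edges A-B, A-C, B-C, B-D, D-E, D-F, D-G, E-F, F-G (our reconstruction), the numbers in both panels follow from two rules: $\sigma(v)$, the number of shortest paths from E to $v$, is the sum of $\sigma$ over $v$'s parents; and each node's credit (1 plus whatever its children pass up) is split among its parents in proportion to $\sigma$.

```latex
\begin{aligned}
&\text{Step 1 (path counts):}\quad \sigma(E)=\sigma(D)=\sigma(F)=\sigma(B)=1,\quad
  \sigma(G)=\sigma(D)+\sigma(F)=2,\quad \sigma(A)=\sigma(C)=1\\[4pt]
&\text{Step 2 (credit):}\quad c(v) = 1 + \sum_{w:\,v\in\mathrm{par}(w)} c(w)\,\frac{\sigma(v)}{\sigma(w)},
  \qquad \text{edge }(u,w)\text{ receives } c(w)\,\frac{\sigma(u)}{\sigma(w)}\\[4pt]
&c(A)=c(C)=1 \;\Rightarrow\; (B,A)=1,\ (B,C)=1\\
&c(G)=1 \;\Rightarrow\; (D,G)=1\cdot\tfrac{1}{2}=0.5,\ (F,G)=0.5\\
&c(B)=1+1+1=3 \;\Rightarrow\; (D,B)=3\\
&c(D)=1+3+0.5=4.5 \;\Rightarrow\; (E,D)=4.5\\
&c(F)=1+0.5=1.5 \;\Rightarrow\; (E,F)=1.5
\end{aligned}
```

These are the values 4.5, 1.5, 3, 0.5, 0.5, 1, 1 that survive in the slide text; repeating the computation from every other root and halving the sum gives the totals on the edge betweenness slide.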

  16. Counting Paths: Larger Example [Figure panels: Original Graph; Breadth-first Ordering from A] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  17. Counting Paths: Larger Example Step 1. Count the number of shortest paths from A to each node (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  18. Counting Paths: Larger Example [Figure annotation: 1 path to K. Split in ratio 3:3] Step 2. Propagate credit upwards, splitting it according to the number of paths to each parent (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  19-22. Counting Paths: Larger Example [Figure annotations: 1+0.5 paths to J. Split 1:2; 1 path to K. Split in ratio 3:3] Step 2. Propagate credit upwards, splitting it according to the number of paths to each parent (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
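The splitting rule in Step 2 is plain proportional arithmetic. The sketch below applies it to one plausible reading of the figure annotations; the reading itself is an assumption, since the larger example graph did not survive extraction. We take K as carrying credit 1 with two parents reached by 3 shortest paths each, and J as carrying its own credit 1 plus 0.5 passed up from K, with parents reached by 1 and 2 paths. `split_credit` is a hypothetical helper.

```python
def split_credit(credit, parent_paths):
    """Split a node's credit among its parents in proportion to the number of
    shortest paths from the root to each parent."""
    total = sum(parent_paths)
    return [credit * p / total for p in parent_paths]


# Hypothetical reading of the annotations on these slides:
# K carries credit 1 and has two parents with 3 shortest paths each (3:3).
k_shares = split_credit(1.0, [3, 3])
# J carries its own credit 1 plus 0.5 passed up from K, split 1:2.
j_shares = split_credit(1.0 + 0.5, [1, 2])
print(k_shares, j_shares)   # [0.5, 0.5] [0.5, 1.0]
```

Because the shares of each node always sum to its credit, the total credit leaving a BFS tree equals the number of non-root nodes, which is a quick sanity check when propagating by hand.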

  23. Determining the Number of Communities [Figure panels: Hierarchical decomposition; Choosing a cut-off] This is analogous to deciding on the number of clusters in hierarchical clustering (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

  24. Modularity Idea: Compare the fraction of edges within a module to the fraction that would be observed for random connections. [Equation with labeled terms: Adjacency Matrix, Node Degree, Node Assignment] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
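The equation itself did not survive extraction. Its three labeled terms match the standard Newman-Girvan modularity, which we assume is the form on the slide: $A_{ij}$ is the adjacency matrix, $k_i$ the degree of node $i$, $c_i$ the module node $i$ is assigned to, $m$ the number of edges, and $\delta(c_i, c_j) = 1$ when $i$ and $j$ share a module (0 otherwise).

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
\qquad m = \frac{1}{2} \sum_{i,j} A_{ij}
```

The term $k_i k_j / 2m$ is the expected number of edges between $i$ and $j$ if edges were placed at random while keeping every node's degree, so $Q > 0$ means the modules contain more internal edges than random wiring would give.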

  25. Modularity Use modularity to optimize connectivity within modules (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
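One simple way to use modularity for choosing the number of communities is to score each cut-off of the hierarchy and keep the best. The stdlib Python sketch below computes Q directly from the double-sum formula, assuming a 7-node example graph (reconstructed from the slide figures) and three partitions that we derived by hand by running Girvan-Newman on it; `modularity`, `to_assignment`, and `levels` are hypothetical names.

```python
# Assumed example graph (7 nodes, m = 9 edges).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def modularity(edges, assignment):
    """Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    nodes = sorted(assignment)
    m = len(edges)
    adjacency = {(i, j): 0 for i in nodes for j in nodes}
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        adjacency[(u, v)] += 1
        adjacency[(v, u)] += 1
        degree[u] += 1
        degree[v] += 1
    q = 0.0
    for i in nodes:
        for j in nodes:
            if assignment[i] == assignment[j]:      # delta(c_i, c_j)
                q += adjacency[(i, j)] - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)


def to_assignment(communities):
    """Map a list of node groups to {node: community index}."""
    return {v: idx for idx, group in enumerate(communities) for v in group}


# Candidate cut-offs of a Girvan-Newman hierarchy on this graph (hand-derived).
levels = [
    [{"A", "B", "C", "D", "E", "F", "G"}],
    [{"A", "B", "C"}, {"D", "E", "F", "G"}],
    [{"A", "B", "C"}, {"D", "F"}, {"E"}, {"G"}],
]
scores = [modularity(EDGES, to_assignment(level)) for level in levels]
best = levels[scores.index(max(scores))]
print(len(best))   # 2
```

Q is 0 for the whole graph as one module, 59/162 (about 0.364) for the two-community split, and 19/162 (about 0.117) for the finer split, so the cut-off with two communities is preferred here.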
