Lecture 17, Jan-Willem van de Meent: Community Detection



slide-1
SLIDE 1

Unsupervised Machine Learning 
 and Data Mining

DS 5230 / DS 4420 - Fall 2018

Lecture 17

Jan-Willem van de Meent

slide-2
SLIDE 2

Community Detection

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Problem: Can we identify groups of densely connected nodes?
slide-3
SLIDE 3

Communities: Football Conferences

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Nodes: Football Teams, Edges: Matches, Communities: Conferences

slide-4
SLIDE 4

Communities: Academic Citations

Source: Citation networks and Maps of science [Börner et al., 2012] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Nodes: Journals, Edges: Citations, Communities: Academic Disciplines

slide-5
SLIDE 5

Communities: Protein-Protein Interactions

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Nodes: Proteins, Edges: Physical interactions, Communities: Functional Modules

slide-6
SLIDE 6

Community Detection

We will work with undirected (unweighted) networks

[Figure panels: Graph Partitioning | Overlapping Communities]

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

slide-7
SLIDE 7

Centrality Measures

[Figure: example graph on nodes 1 to 17, marking the nodes with the highest betweenness centrality, the highest closeness centrality, and the highest degree centrality]

  • Betweenness: Number of shortest paths passing through a node
  • Closeness: Average distance to other nodes (a smaller average distance means higher closeness)
  • Degree: Number of connections to other nodes
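A minimal sketch of two of these measures (degree, and the average distance that closeness is based on), using breadth-first search for hop distances. The graph is an assumption: it is our reading of the A to G example graph from the Edge Betweenness slide, not the 17-node figure above. Betweenness is left to the later slides.

```python
from collections import deque

# Hypothetical example graph (undirected, unweighted): our reading of the
# A-G graph used in the Edge Betweenness example, not the 17-node figure.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    """Build an adjacency dict {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Degree: number of connections to other nodes."""
    return {u: len(nbrs) for u, nbrs in adj.items()}


def average_distance(adj):
    """Average distance to the other nodes (assumes a connected graph).
    Closeness is high when this average is low."""
    result = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        others = [d for v, d in dist.items() if v != u]
        result[u] = sum(others) / len(others)
    return result


adj = adjacency(EDGES)
print(degree_centrality(adj))   # D has the highest degree (4)
print(average_distance(adj))    # D has the lowest average distance (8/6)
```

Closeness is often reported as the reciprocal of this average, so that larger values mean more central nodes; the ranking is the same either way.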
slide-8
SLIDE 8

Betweenness

[Figure panels: Edge Strength (call volume) | Edge Betweenness]

  • Betweenness: Number of shortest paths passing through a node or edge

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

slide-9
SLIDE 9

Edge Betweenness

[Figure: example graph on nodes A to G with edge betweenness values A-B 5, A-C 1, B-C 5, B-D 12, D-E 4.5, D-F 4, D-G 4.5, E-F 1.5, F-G 1.5]

  • Count the number of shortest paths passing through each edge (this can also be done with weighted edges)
  • If there are multiple shortest paths of equal length, split the count between them
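The definition can be checked directly by brute force: enumerate every shortest path between each pair of nodes and give each path an equal share of that pair's count of 1. This is a sketch for small graphs only (enumerating paths can blow up on large graphs), and the edge list is our assumed reading of the A to G example figure.

```python
from itertools import combinations

# Assumed edge list for the A-G example graph (our reading of the figure).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def all_shortest_paths(adj, s, t):
    """Enumerate every shortest path from s to t with a layered BFS."""
    dist, preds, frontier = {s: 0}, {s: []}, [s]
    while frontier and t not in dist:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:                  # first time v is reached
                    dist[v], preds[v] = dist[u] + 1, [u]
                    next_frontier.append(v)
                elif dist[v] == dist[u] + 1:       # another shortest route
                    preds[v].append(u)
        frontier = next_frontier
    if t not in dist:
        return []

    def walk(v):                                   # backtrack via predecessors
        if v == s:
            return [[s]]
        return [path + [v] for u in preds[v] for path in walk(u)]

    return walk(t)


def edge_betweenness_bruteforce(adj):
    """Each pair of nodes contributes 1, split evenly over its shortest paths."""
    bet = {}
    for s, t in combinations(sorted(adj), 2):      # each unordered pair once
        paths = all_shortest_paths(adj, s, t)
        for path in paths:
            for u, v in zip(path, path[1:]):
                edge = tuple(sorted((u, v)))
                bet[edge] = bet.get(edge, 0.0) + 1.0 / len(paths)
    return bet


bet = edge_betweenness_bruteforce(adjacency(EDGES))
print(bet[("B", "D")])   # 12.0: every A/B/C to D/E/F/G path crosses B-D
```

Because each unordered pair is visited once, no division by 2 is needed here. The pair E, G has two shortest paths (via D and via F), which is where the half counts on D-E, D-G, E-F and F-G come from.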

slide-10
SLIDE 10

Girvan-Newman Algorithm

Repeat until k clusters found

  • 1. Calculate betweenness
  • 2. Remove edge(s) with highest betweenness

(hierarchical divisive clustering according to betweenness)

[Figure: example graph annotated with edge betweenness values 49, 33, 12 and 1] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
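The loop above can be sketched as follows. This is a minimal illustration, not the course's reference code: betweenness is recomputed after every removal using one breadth-first search per node, and all edges tied for the highest betweenness are removed together. The edge list is our assumed reading of the A to G example graph, and `girvan_newman(edges, k)` is a hypothetical helper name.

```python
from collections import deque

# Assumed edge list for the A-G example graph (our reading of the figure).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Edge betweenness: one BFS per node, credit pushed upwards, then halved."""
    total = {}
    for root in adj:
        dist, sigma, order = {root: 0}, {root: 1}, [root]
        queue = deque([root])
        while queue:                               # count shortest paths
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v] = dist[u] + 1, 0
                    order.append(v)
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
        credit = {v: 1.0 for v in order}
        for v in reversed(order):                  # deepest nodes first
            for p in adj[v]:
                if dist[p] == dist[v] - 1:         # p is a parent of v
                    share = credit[v] * sigma[p] / sigma[v]
                    edge = tuple(sorted((p, v)))
                    total[edge] = total.get(edge, 0.0) + share
                    credit[p] += share
    return {edge: c / 2 for edge, c in total.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(edges, k):
    """Remove the highest-betweenness edge(s) until there are >= k components."""
    adj = adjacency(edges)
    while len(components(adj)) < k:
        bet = edge_betweenness(adj)                # recomputed every round
        if not bet:                                # no edges left
            break
        top = max(bet.values())
        for (u, v), b in bet.items():
            if abs(b - top) < 1e-9:                # remove all tied edges
                adj[u].discard(v)
                adj[v].discard(u)
    return components(adj)


print(girvan_newman(EDGES, 2))   # B-D is removed first
```

Removing every tied edge in one round means a round can overshoot and return more than k components; removing a single edge per round is the other common choice.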

slide-11
SLIDE 11

Girvan-Newman Algorithm

(hierarchical divisive clustering according to betweenness)

[Figure: successive steps of edge removal and the resulting hierarchical network]

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

slide-12
SLIDE 12

Girvan-Newman: Physics Citations

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

slide-13
SLIDE 13

Girvan-Newman

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Two problems

  • 1. How can we compute the betweenness for all edges?
  • 2. How can we choose the number of components k?

slide-14
SLIDE 14

Calculating Betweenness

How can we count all shortest paths?

  • Loop over nodes in graph
  • Perform breadth-first search to find shortest paths to other nodes
  • Increment counts for edges traversed by shortest paths
  • Divide final betweenness by 2 (since all paths are counted twice)
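The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the per-root credit propagation follows the splitting rule on the next slides, the helper names `credits_from` and `edge_betweenness` are hypothetical, and the edge list is our reading of the A to G example graph.

```python
from collections import deque

# Assumed edge list for the A-G example graph (our reading of the figure).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def credits_from(adj, root):
    """BFS from root; return the credit each edge receives from this root."""
    # Count shortest paths (sigma) from root, recording BFS order.
    dist, sigma, order = {root: 0}, {root: 1}, [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                order.append(v)
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    # Walk back up: each node holds 1 + credit from its children and splits
    # it among its parents in proportion to their shortest-path counts.
    node_credit = {v: 1.0 for v in order}
    edge_credit = {}
    for v in reversed(order):
        for p in adj[v]:
            if dist[p] == dist[v] - 1:
                share = node_credit[v] * sigma[p] / sigma[v]
                edge_credit[tuple(sorted((p, v)))] = share
                node_credit[p] += share
    return edge_credit


def edge_betweenness(adj):
    total = {}
    for root in adj:                       # loop over nodes in the graph
        for edge, c in credits_from(adj, root).items():
            total[edge] = total.get(edge, 0.0) + c
    # Each shortest path is found once from each endpoint, so halve.
    return {edge: c / 2 for edge, c in total.items()}


bet = edge_betweenness(adjacency(EDGES))
print(max(bet, key=bet.get))               # ('B', 'D')
```

Each BFS costs O(m) on an unweighted graph, so computing all edge betweenness values this way takes O(nm) time.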

slide-15
SLIDE 15

Counting Shortest Paths

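[Figure: breadth-first search from E over the A to G example graph]

Step 1. Count the number of shortest paths from E to each node: D, F, B, A and C are each reached by 1 shortest path, and G by 2 (via D and via F).

Step 2. Accumulate credit upwards, dividing it across shortest paths:
  • A and C (leaves) each pass credit 1 to B, so B holds 1 + 1 + 1 = 3, all on edge B-D
  • G holds credit 1, split 0.5 to edge D-G and 0.5 to edge F-G
  • D holds 1 + 3 + 0.5 = 4.5 and F holds 1 + 0.5 = 1.5, giving credit 4.5 on edge D-E and 1.5 on edge E-F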

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)
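A sketch of the two steps for a single root, E, on the A to G example graph (edge list assumed from our reading of the figure). It returns the number of shortest paths to each node and the credit on each edge, keyed as "parent-child"; `bfs_credit` is a hypothetical helper name.

```python
from collections import deque

# Assumed edge list for the A-G example graph (our reading of the figure).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_credit(adj, root):
    """Step 1: count shortest paths from root. Step 2: push credit upwards."""
    dist, sigma, order = {root: 0}, {root: 1}, [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:                 # first time v is reached
                dist[v], sigma[v] = dist[u] + 1, 0
                order.append(v)
                queue.append(v)
            if dist[v] == dist[u] + 1:        # u is a parent of v
                sigma[v] += sigma[u]
    credit = {v: 1.0 for v in order}          # every node starts with credit 1
    edge_credit = {}
    for v in reversed(order):                 # deepest nodes first
        for p in adj[v]:
            if dist[p] == dist[v] - 1:
                share = credit[v] * sigma[p] / sigma[v]
                edge_credit[p + "-" + v] = share
                credit[p] += share
    return sigma, edge_credit


sigma, edge_credit = bfs_credit(adjacency(EDGES), "E")
print(sigma["G"])           # 2 shortest paths to G
print(edge_credit["E-D"])   # 4.5
```

Running this from every node and halving the summed edge credits gives the edge betweenness values of the full graph.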

slide-16
SLIDE 16

Counting Paths: Larger Example

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

[Figure panels: Original Graph | Breadth-first Ordering from A]

slide-17
SLIDE 17

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Step 1. Count the number of shortest paths from A to each node

Counting Paths: Larger Example

slide-18
SLIDE 18

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Step 2. Propagate credit upwards, splitting 
 according to number of paths to parents

1 path to K. Split in ratio 3:3

Counting Paths: Larger Example
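The splitting rule in Step 2 can be written compactly. The notation is ours, not the slides': $\sigma(v)$ is the number of shortest paths from the BFS root to $v$ (with $\sigma = 1$ at the root), $C(v)$ is the credit held by node $v$, and $C(p, v)$ is the credit passed along the edge from child $v$ to its parent $p$.

```latex
\sigma(v) = \sum_{p \in \mathrm{parents}(v)} \sigma(p), \qquad
C(p, v) = C(v)\,\frac{\sigma(p)}{\sigma(v)}, \qquad
C(v) = 1 + \sum_{c \in \mathrm{children}(v)} C(v, c)
```

For instance, in a breadth-first search from E over the A to G graph, G has $\sigma(G) = 2$ (parents D and F, each with $\sigma = 1$), so the edges D-G and F-G each receive $1 \cdot 1/2 = 0.5$.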

slide-19
SLIDE 19

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

1+0.5 paths to J Split 1:2 1 path to K. Split in ratio 3:3

Step 2. Propagate credit upwards, splitting 
 according to number of paths to parents

Counting Paths: Larger Example

slide-20
SLIDE 20

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

1+0.5 paths to J Split 1:2 1 path to K. Split in ratio 3:3

Step 2. Propagate credit upwards, splitting 
 according to number of paths to parents

Counting Paths: Larger Example

slide-21
SLIDE 21

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

1+0.5 paths to J Split 1:2 1 path to K. Split in ratio 3:3

Step 2. Propagate credit upwards, splitting 
 according to number of paths to parents

Counting Paths: Larger Example

slide-22
SLIDE 22

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

1+0.5 paths to J Split 1:2 1 path to K. Split in ratio 3:3

Step 2. Propagate credit upwards, splitting 
 according to number of paths to parents

Counting Paths: Larger Example

slide-23
SLIDE 23

Determining the Number of Communities

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

[Figure panels: Hierarchical decomposition | Choosing a cut-off]

Analogous to deciding on the number of clusters in hierarchical clustering
slide-24
SLIDE 24

Modularity

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

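Idea: Compare the fraction of edges within each module to the fraction that would be expected if edges were placed at random (preserving node degrees in expectation):

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where $A_{ij}$ is the adjacency matrix, $k_i$ is the degree of node $i$, $c_i$ is the module assignment of node $i$, $m$ is the number of edges, and $\delta(c_i, c_j) = 1$ if $c_i = c_j$ and 0 otherwise.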
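A minimal sketch of modularity for an undirected, unweighted graph without self-loops or duplicate edges. Grouping the double sum by module gives, for each module, (edges inside the module)/m minus (sum of degrees in the module / 2m) squared. The edge list and the two-module split are assumptions chosen for illustration, using our reading of the A to G example graph.

```python
# Assumed edge list for the A-G example graph (our reading of the figure).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def modularity(edges, communities):
    """Q = (1/2m) sum_ij [A_ij - k_i k_j / 2m] delta(c_i, c_j), evaluated
    per community as  L_c / m - (d_c / 2m)^2  (L_c: internal edges,
    d_c: total degree of the community)."""
    m = len(edges)
    label = {v: i for i, comm in enumerate(communities) for v in comm}
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for i, comm in enumerate(communities):
        internal = sum(1 for u, v in edges if label[u] == i and label[v] == i)
        deg_sum = sum(degree[v] for v in comm)
        q += internal / m - (deg_sum / (2 * m)) ** 2
    return q


print(modularity(EDGES, [{"A", "B", "C"}, {"D", "E", "F", "G"}]))  # ~0.364
```

Putting every node in one module always gives Q = 0, since all edges are then "within module" both in the graph and under the random baseline.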

slide-25
SLIDE 25

Modularity

(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

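Use modularity to choose the cut-off: select the partition with the highest modularity, i.e. the most connectivity within modules relative to random connections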
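A sketch of choosing a cut-off by modularity, evaluating the formula as a literal double sum over node pairs. The candidate partitions are hypothetical cut-offs of a divisive hierarchy on our assumed reading of the A to G example graph (keyed by number of modules); in practice they would come from successive levels of Girvan-Newman.

```python
from itertools import product

# Assumed edge list for the A-G example graph (our reading of the figure).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

# Hypothetical candidate cut-offs, keyed by the number of modules.
CANDIDATES = {
    1: [{"A", "B", "C", "D", "E", "F", "G"}],
    2: [{"A", "B", "C"}, {"D", "E", "F", "G"}],
    4: [{"A", "B", "C"}, {"D", "F"}, {"E"}, {"G"}],
}


def modularity(edges, communities):
    """Literal double sum: (1/2m) sum_ij [A_ij - k_i k_j / 2m] delta(c_i, c_j).
    Assumes no self-loops and no duplicate edges."""
    m = len(edges)
    adjacent = {frozenset(e) for e in edges}
    label = {v: i for i, comm in enumerate(communities) for v in comm}
    degree = {v: sum(v in e for e in adjacent) for v in label}
    q = 0.0
    for i, j in product(label, repeat=2):          # ordered pairs, i == j included
        if label[i] == label[j]:                   # delta(c_i, c_j) = 1
            a_ij = 1.0 if frozenset((i, j)) in adjacent else 0.0
            q += a_ij - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)


scores = {k: modularity(EDGES, parts) for k, parts in CANDIDATES.items()}
best_k = max(scores, key=scores.get)
print(best_k, scores)   # the 2-module split scores highest here
```

The literal double sum is O(n^2) and only meant to mirror the formula; the per-module form (internal edges over m minus squared degree share) gives the same value more cheaply.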