IRDM ‘15/16
Jilles Vreeken
Chapter 8-2: Community Detection
3 Dec 2015
IRDM Chapter 8, overview
1. The basics
2. Properties of Graphs
3. Frequent Subgraphs
4. Graph Clustering

You'll find this covered in: Aggarwal, Ch. 17, 19; Zaki & Meira, Ch. 4, 11, 16
1. The basics
2. Properties of Graphs
3. Frequent Subgraphs
4. Community Detection

You'll find this covered in: Aggarwal, Ch. 17, 19; Zaki & Meira, Ch. 4, 11, 16
Aggarwal Ch. 19.3, 17.5 Zaki & Meira Ch. 16
Searching for small communities in the Web graph: what is the signature of a community in a Web graph?
Intuition: many people all talking about the same things, i.e. a dense 2-layer (bipartite) graph.
Use this to define "topics": what the same people on the left talk about on the right.
A more well-defined problem: enumerate complete bipartite subgraphs $K_{s,t}$, where $K_{s,t}$ has $s$ nodes on the "left" and every such node links to the same set of $t$ nodes on the "right".
Example: a fully connected bipartite subgraph with left node set $X$ and right node set $Y$, where $|X| = s = 3$ and $|Y| = t = 4$.
Recall market basket analysis:
market: a universe $U$ of $n$ items
baskets: $m$ transactions, subsets of $U$: $t_1, t_2, \dots, t_m \subseteq U$, where each $t_i$ is the set of items one person bought
support: a frequency threshold $\sigma$
Goal: find all subsets $X \subseteq U$ such that $X \subseteq t_i$ for at least $\sigma$ of the sets $t_i$.

What's the connection between itemsets and complete bipartite graphs?
Frequent itemsets = complete bipartite subgraphs! How?
View each node $i$ as the set $t_i$ of the nodes $i$ points to, e.g. $t_i = \{a, b, c, d\}$.
A $K_{s,t}$ is then a set $Y$ of size $t$ that occurs in (at least) $s$ of the sets $t_i$.
Looking for $K_{s,t}$ → set the frequency threshold to $s$ and find all frequent itemsets of size $t$:
$s$ … minimum support ($|X| = s$), $t$ … itemset size ($|Y| = t$).
(Kumar et al. '99)
1) View each node $i$ as the set $t_i$ of nodes it points to, e.g. $t_i = \{a, b, c, d\}$.
2) Find frequent itemsets, with $s$ … minimum support and $t$ … itemset size.
3) Say we find a frequent itemset $Y = \{a, b, c\}$ of support $s$: this means there are $s$ nodes that each link to all of $\{a, b, c\}$.
4) We found a $K_{s,t}$!
Example, with a support threshold of 2:
itemsets per node: $t_a = \{b, c, d\}$, $t_b = \{d\}$, $t_c = \{b, d, e, f\}$, $t_d = \{e, f\}$, $t_e = \{b, d\}$, $t_f = \{\}$
frequent itemsets: $\{b, d\}$ with support 3, and $\{e, f\}$ with support 2
i.e. we found 2 bipartite subgraphs: one between $\{a, c, e\}$ and $\{b, d\}$, and one between $\{c, d\}$ and $\{e, f\}$.
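To make the itemset-bipartite connection concrete, here is a minimal sketch (not from the slides): it brute-forces all $K_{s,t}$ subgraphs of a small directed graph by checking every candidate right-hand set of size $t$ against the out-neighbour sets. The name `find_kst` and the toy graph are my own, and a real implementation would plug in a proper frequent-itemset miner such as Apriori instead of the exhaustive loop.

```python
from itertools import combinations

def find_kst(out_neighbours, s, t):
    """Enumerate complete bipartite subgraphs K_{s,t}: every candidate
    right-hand set Y of size t that is contained in the out-neighbour
    sets of at least s nodes (which then form the left-hand set X)."""
    items = set().union(*out_neighbours.values())        # all nodes that are pointed to
    results = []
    for Y in combinations(sorted(items), t):             # candidate right-hand sets
        X = [i for i, ti in out_neighbours.items() if set(Y) <= ti]
        if len(X) >= s:                                  # support >= s  =>  a K_{s,t}
            results.append((X, Y))
    return results

# the toy graph of the example above, as out-neighbour sets
graph = {'a': {'b', 'c', 'd'}, 'b': {'d'}, 'c': {'b', 'd', 'e', 'f'},
         'd': {'e', 'f'}, 'e': {'b', 'd'}, 'f': set()}
print(find_kst(graph, s=2, t=2))
# [(['a', 'c', 'e'], ('b', 'd')), (['c', 'd'], ('e', 'f'))]
```

Run on the slide's example it reports exactly the two subgraphs listed above.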
Aggarwal Ch. 17.5, 19.3 Zaki & Meira Ch. 16
We can have data in graph form
e.g. finding the clusters of our social networks
Or, we map existing data to a graph
data points become vertices
add an edge if two data points are similar
edge weights can also express the degree of similarity
A similarity matrix is an $n$-by-$n$ non-negative, symmetric matrix
the dual of the distance matrix: high similarity corresponds to low distance
Recall that a weighted adjacency matrix is also an $n$-by-$n$ non-negative, symmetric matrix
for weighted, undirected graphs
So, we can think of every similarity matrix as the adjacency matrix of some weighted, undirected graph
this graph will be complete (a clique)
Further, we can use any similarity measure between two points as an edge weight.
Using complete graphs can be a waste of resources
for clustering, we don't really care about very dissimilar pairs
We can remove the edges between dissimilar vertices (give them zero weight)
Or, we adjust the weights to diminish the influence of dissimilar points; the Gaussian kernel is popular for this:
$$w_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$
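As a small illustration of turning points into a weighted similarity graph with the Gaussian kernel, here is a numpy sketch; the function name and the choice of $\sigma = 0.5$ are only for the example.

```python
import numpy as np

def gaussian_similarity(points, sigma=1.0):
    """Dense similarity matrix with Gaussian-kernel weights
    w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    diff = points[:, None, :] - points[None, :, :]   # pairwise difference vectors
    sq_dist = (diff ** 2).sum(axis=-1)               # squared Euclidean distances
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                         # no self-loops
    return W

W = gaussian_similarity(np.random.rand(5, 2), sigma=0.5)   # e.g. five random 2-d points
```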
How to decide when vertices are too dissimilar?
In $\varepsilon$-neighbourhood graphs we add an edge between two vertices that are within distance $\varepsilon$ of each other
usually the resulting graph is considered unweighted, as all remaining weights would be roughly similar
In $k$-nearest-neighbour graphs we connect two vertices if one of them is among the $k$ nearest neighbours of the other
in the mutual $k$-nearest-neighbour graph we only connect two vertices if each is among the other's $k$ nearest neighbours.
With $\varepsilon$-graphs, choosing the parameter is hard
there is no single correct answer if different clusters have different internal similarities
$k$-nearest neighbours can connect points with different similarities
but far-away high-density regions become unconnected
The mutual $k$-nearest-neighbour graph is somewhat in between
good for detecting clusters with different densities
General recommendation: start with $k$-NN (a small construction sketch follows below).
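A possible sketch of the mutual $k$-NN construction on top of such a similarity matrix $W$; the function name is mine and this is illustrative only.

```python
import numpy as np

def mutual_knn_graph(W, k):
    """Sparsify a similarity matrix: keep w_ij only if j is among the k most
    similar neighbours of i AND i is among the k most similar neighbours of j."""
    n = W.shape[0]
    keep = np.zeros_like(W, dtype=bool)
    for i in range(n):
        nearest = np.argsort(W[i])[::-1][:k]   # k largest similarities in row i
        keep[i, nearest] = True
    mutual = keep & keep.T                     # require both directions
    return np.where(mutual, W, 0.0)
```

For the plain (non-mutual) $k$-NN graph one would use the disjunction `keep | keep.T` instead of the conjunction.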
(Figure: example similarity graphs; Zaki & Meira, Fig. 16.1)
Given an undirected graph, the bi-partitioning task is to divide the vertices into two disjoint groups $A$ and $B$.
Questions:
how can we define a "good" partition?
how can we efficiently identify such a partition?
What makes a good partition?
maximise the number of within-group connections
minimise the number of between-group connections
A cut of a connected graph $G = (V, E)$ divides the set of vertices into two partitions $S$ and $V \setminus S$ and removes the edges between them
the cut can be expressed by giving the set of cut edges $\{(u, v) \in E : |\{u, v\} \cap S| = 1\}$
A graph cut groups the vertices of a graph into two clusters
subsequent cuts in the components give us a hierarchical clustering
A $k$-way cut cuts the graph into $k$ disjoint sets of vertices $C_1, C_2, \dots, C_k$ and removes the edges between them.
Not every cut will cut it. In minimum cut the goal is to find any set of vertices such that cutting them off from the rest of the graph requires removing the least number of edges
the least sum of weights for weighted graphs
the extension to multiway cuts is straightforward
The minimum cut can be found in polynomial time, via the max-flow min-cut theorem.
But minimum cut isn't very good for clustering purposes.
The minimum cut usually separates only a single vertex from the rest
not a very appealing clustering; we want to penalise the cut for imbalanced cluster sizes
In ratio cut, the goal is to minimise the ratio between the weight of the edges in the cut set and the size of the clusters $C_i$.
Let $W(S, T) = \sum_{i \in S,\, j \in T} w_{ij}$, where $w_{ij}$ is the weight of edge $(i, j)$. Then
$$\mathrm{RatioCut} = \sum_{i=1}^{k} \frac{W(C_i, V \setminus C_i)}{|C_i|}$$
The volume of a set of vertices $S$ is the weight of all edges connected to $S$:
$$\mathrm{vol}(S) = W(S, V) = \sum_{i \in S,\, j \in V} w_{ij}$$
In normalised cut we measure the size of $C_i$ by $\mathrm{vol}(C_i)$ instead of $|C_i|$:
$$\mathrm{NormalisedCut} = \sum_{i=1}^{k} \frac{W(C_i, V \setminus C_i)}{\mathrm{vol}(C_i)}$$
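The cut objectives are easy to compute directly from these definitions; a small sketch, with helper names of my own and clusters given as lists of vertex indices:

```python
import numpy as np

def cut_weight(W, S, T):
    """W(S, T): total edge weight between vertex sets S and T (index arrays)."""
    return W[np.ix_(S, T)].sum()

def ratio_cut(W, clusters):
    """RatioCut: sum over clusters of W(C_i, rest) / |C_i|."""
    V = np.arange(W.shape[0])
    return sum(cut_weight(W, C, np.setdiff1d(V, C)) / len(C) for C in clusters)

def normalised_cut(W, clusters):
    """NormalisedCut: sum over clusters of W(C_i, rest) / vol(C_i), with vol(C_i) = W(C_i, V)."""
    V = np.arange(W.shape[0])
    return sum(cut_weight(W, C, np.setdiff1d(V, C)) / cut_weight(W, C, V)
               for C in clusters)
```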
$A$: the adjacency matrix of an undirected graph $G$, with $A_{ij} = 1$ if $(i, j)$ is an edge, else 0.
$x$ is a vector in $\mathbb{R}^n$ with components $(x_1, \dots, x_n)$; think of it as a label/value of each node.
What is the meaning of $A \cdot x$?
$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
Entry $y_i$ is the sum of the labels $x_j$ of the neighbours of $i$:
$$y_i = \sum_{j=1}^{n} A_{ij}\, x_j = \sum_{(i,j) \in E} x_j$$
The $j$-th coordinate of $A \cdot x$ is the sum of the $x$-values of the neighbours of $j$; make this the new value at node $j$.
Spectral graph theory: analyse the spectrum of the matrix representing the graph.
The spectrum is the set of eigenvectors of the graph, ordered by the magnitude (strength) of their corresponding eigenvalues
$$\Lambda = \{\lambda_1, \lambda_2, \dots, \lambda_n\} \quad \text{with} \quad \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$$
Suppose all nodes in a connected graph $G$ have degree $d$. What are some eigenvalues and eigenvectors of $G$, i.e. which $\lambda$ and $x$ satisfy $A \cdot x = \lambda \cdot x$?
Let's try $x = (1, 1, \dots, 1)$: then $A \cdot x = (d, d, \dots, d) = \lambda \cdot x$, so $\lambda = d$.
We found an eigenpair of $G$: $x = (1, 1, \dots, 1)$, $\lambda = d$.
(Remember the meaning of $y = A \cdot x$: $y_i = \sum_{j=1}^{n} A_{ij} x_j = \sum_{(i,j) \in E} x_j$.)
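A quick numerical check of this eigenpair, using $K_{3,3}$ as an arbitrary 3-regular example graph of my choosing:

```python
import numpy as np

# K_{3,3} as an arbitrary 3-regular example: every node has degree d = 3
A = np.zeros((6, 6))
A[:3, 3:] = 1
A[3:, :3] = 1

x = np.ones(6)
print(A @ x)   # [3. 3. 3. 3. 3. 3.]  ->  A.x = d.x, so x = (1,...,1) is an eigenvector with eigenvalue d
```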
What if $G$ is not connected? Say $G$ has 2 components $C_1$ and $C_2$, each $d$-regular. What are some eigenvectors?
Put all 1s on $C_1$ and 0s on $C_2$, or vice versa:
$x' = (1, \dots, 1, 0, \dots, 0)$, then $A \cdot x' = (d, \dots, d, 0, \dots, 0)$
$x'' = (0, \dots, 0, 1, \dots, 1)$, then $A \cdot x'' = (0, \dots, 0, d, \dots, d)$
so in both cases the corresponding eigenvalue is $\lambda = d$.
A bit of intuition: if $G$ is disconnected, $\lambda_n = \lambda_{n-1}$; if $G$ is "almost" disconnected into two components, $\lambda_n - \lambda_{n-1} \approx 0$, i.e. the 2nd largest eigenvalue $\lambda_{n-1}$ is very close to $\lambda_n$.
If the graph is connected, then we already know that $x_n = (1, \dots, 1)$ is an eigenvector. Since eigenvectors are orthogonal, the components of $x_{n-1}$ sum to 0
why? Because $x_n \cdot x_{n-1} = \sum_j x_n[j] \cdot x_{n-1}[j] = \sum_j x_{n-1}[j] = 0$.
So, we can look at the eigenvector of the 2nd largest eigenvalue and declare nodes with a positive label to be in $C_1$ and nodes with a negative label to be in $C_2$.
Still, lots to sort out.
The (weighted) adjacency matrix $A$ has the weight of edge $(i, j)$ at position $a_{ij}$:
an $n \times n$ matrix $A = [a_{ij}]$ with $a_{ij} = 1$ if there is an edge between nodes $i$ and $j$, and $a_{ij} = 0$ otherwise.
Important properties:
symmetric matrix
eigenvectors are real and orthogonal
(Figure: an example graph on vertices 1–6 and its adjacency matrix; the vertex degrees are 3, 2, 3, 3, 3, 2.)
The degree matrix of a graph is a diagonal $n \times n$ matrix with the degree of vertex $i$ at position $(i, i)$:
$$\Delta_{ii} = d_i = \sum_j a_{ij} = \text{degree of node } i$$
(Figure: the example graph and its degree matrix $\Delta = \mathrm{diag}(3, 2, 3, 3, 3, 2)$.)
The normalised adjacency matrix $M$ is the adjacency matrix where in every row $i$ all values are divided by the degree $d_i$, so that every row sums to 1: $M = \Delta^{-1} A$.
(picture is on vacation)
The Laplacian matrix $L$ of a graph is the adjacency matrix subtracted from the degree matrix, $L = \Delta - A$: an $n \times n$ symmetric matrix.
Important properties:
eigenvalues are non-negative real numbers
eigenvectors are real and orthogonal
(Figure: the example graph and its Laplacian matrix $L = \Delta - A$; the diagonal entries are the degrees 3, 2, 3, 3, 3, 2.)
Written out:
$$L = \Delta - A = \begin{pmatrix} \sum_{j \ne 1} a_{1j} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \sum_{j \ne 2} a_{2j} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \sum_{j \ne n} a_{nj} \end{pmatrix}$$
The Laplacian is symmetric and positive semi-definite: for undirected graphs it has $n$ real, non-negative eigenvalues
$$0 \le \lambda_1 \le \lambda_2 \le \lambda_3 \le \dots \le \lambda_n$$
with orthogonal eigenvectors.
The normalised, symmetric Laplacian matrix $L^s$ of a graph is defined as
$$L^s = \Delta^{-1/2} L\, \Delta^{-1/2} = I - \Delta^{-1/2} A\, \Delta^{-1/2} = \begin{pmatrix} \frac{\sum_{k \ne 1} a_{1k}}{\sqrt{d_1 d_1}} & -\frac{a_{12}}{\sqrt{d_1 d_2}} & \cdots & -\frac{a_{1n}}{\sqrt{d_1 d_n}} \\ -\frac{a_{21}}{\sqrt{d_2 d_1}} & \frac{\sum_{k \ne 2} a_{2k}}{\sqrt{d_2 d_2}} & \cdots & -\frac{a_{2n}}{\sqrt{d_2 d_n}} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{a_{n1}}{\sqrt{d_n d_1}} & -\frac{a_{n2}}{\sqrt{d_n d_2}} & \cdots & \frac{\sum_{k \ne n} a_{nk}}{\sqrt{d_n d_n}} \end{pmatrix}$$
and is also positive semi-definite.
The normalised, asymmetric Laplacian $L^a$ is $L^a = \Delta^{-1} L$.
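A compact numpy sketch of the three Laplacians, assuming no isolated vertices so that all degrees are positive; the function name is mine.

```python
import numpy as np

def laplacians(A):
    """Laplacian L = D - A, symmetric normalised L^s = D^{-1/2} L D^{-1/2},
    asymmetric normalised L^a = D^{-1} L (assumes no isolated vertices)."""
    d = A.sum(axis=1)                        # vertex degrees
    L = np.diag(d) - A
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = d_inv_sqrt @ L @ d_inv_sqrt
    L_asym = np.diag(1.0 / d) @ L
    return L, L_sym, L_asym

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)   # a triangle
L, L_sym, L_asym = laplacians(A)
print(np.linalg.eigvalsh(L))   # [0. 3. 3.]: real and non-negative, as claimed
```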
Recall that we can express a clustering using a binary cluster assignment matrix
each row has exactly one non-zero entry
Let the $i$-th column of this matrix be $c_i$
clusters are disjoint, so $c_i^T c_j = 0$ for $i \ne j$
cluster $C_i$ has $c_i^T c_i = \lVert c_i \rVert^2$ elements
We can get $\mathrm{vol}(C_i)$ and $W(C_i, V \setminus C_i)$ using the $c_i$:
$$\mathrm{vol}(C_i) = \sum_{j \in C_i} d_j = \sum_{j=1}^{n} \sum_{l=1}^{n} c_{ij}\, \Delta_{jl}\, c_{il} = c_i^T \Delta\, c_i$$
$$W(C_i, C_i) = \sum_{j \in C_i} \sum_{l \in C_i} a_{jl} = c_i^T A\, c_i$$
$$W(C_i, V \setminus C_i) = W(C_i, V) - W(C_i, C_i) = c_i^T (\Delta - A)\, c_i = c_i^T L\, c_i$$
$$\mathrm{RatioCut} = \sum_{i=1}^{k} \frac{W(C_i, V \setminus C_i)}{|C_i|} = \sum_{i=1}^{k} \frac{c_i^T L\, c_i}{\lVert c_i \rVert^2}$$
$$\mathrm{NormalisedCut} = \sum_{i=1}^{k} \frac{W(C_i, V \setminus C_i)}{\mathrm{vol}(C_i)} = \sum_{i=1}^{k} \frac{c_i^T L\, c_i}{c_i^T \Delta\, c_i}$$
Fact: for a symmetric matrix $M$,
$$\lambda_2 = \min_{x \perp x_1} \frac{x^T M x}{x^T x}$$
where $x_1$ is the eigenvector of the smallest eigenvalue $\lambda_1$.
What is the meaning of $\min x^T L x$ on graph $G$?
$$x^T L x = \sum_{i,j=1}^{n} L_{ij}\, x_i x_j = \sum_{i,j=1}^{n} (\Delta_{ij} - A_{ij})\, x_i x_j = \sum_{i=1}^{n} \Delta_{ii}\, x_i^2 - \sum_{(i,j) \in E} 2 x_i x_j = \sum_{(i,j) \in E} (x_i^2 + x_j^2 - 2 x_i x_j) = \sum_{(i,j) \in E} (x_i - x_j)^2$$
Why does $\sum_i \Delta_{ii} x_i^2$ turn into $\sum_{(i,j) \in E} (x_i^2 + x_j^2)$? Node $i$ has degree $d_i$, so the value $x_i^2$ is summed up $d_i$ times; and each edge $(i, j)$ has two endpoints, so per edge we get $x_i^2 + x_j^2$.
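The identity $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$ is easy to verify numerically; a tiny sketch on an arbitrary example graph of my choosing:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]    # an arbitrary small example graph
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

x = np.random.randn(n)                       # any labelling of the nodes
lhs = x @ L @ x
rhs = sum((x[i] - x[j]) ** 2 for i, j in edges)
print(np.isclose(lhs, rhs))                  # True
```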
What else do we know about $x$?
$x$ is a unit vector: $\sum_i x_i^2 = 1$
$x$ is orthogonal to the first eigenvector $(1, \dots, 1)$, thus $\sum_i x_i \cdot 1 = \sum_i x_i = 0$
Remember:
$$\lambda_2 = \min_{\text{all labelings } x \text{ with } \sum_i x_i = 0} \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}$$
We want to assign values $x_i$ to the nodes $i$ such that few edges cross 0 (we want $x_i$ and $x_j$ of adjacent nodes to cancel each other out).
Back to finding the optimal cut. Express the partition $(C_1, C_2)$ as a vector
$$c_i = \begin{cases} +1 & \text{if } i \in C_1 \\ -1 & \text{if } i \in C_2 \end{cases}$$
We can minimise the cut of the partition by finding a non-trivial vector $c$ that minimises
$$\arg\min_{c \in \{-1,+1\}^n} g(c) = \sum_{(i,j) \in E} (c_i - c_j)^2$$
This is NP-hard… so, let's relax! We let the $c_i$ take any real value.
(Fiedler, 1973)
$$\min_{c \in \mathbb{R}^n} g(c) = \sum_{(i,j) \in E} (c_i - c_j)^2 = c^T L\, c$$
Under the unit-norm and orthogonality constraints from the previous slide:
$\lambda_2 = \min_c g(c)$: the minimum value of $g(c)$ is given by the 2nd smallest eigenvalue $\lambda_2$ of the Laplacian matrix $L$
$x = \arg\min_c g(c)$: the optimal solution for $c$ is given by the corresponding eigenvector $x$, referred to as the Fiedler vector.
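Computing the Fiedler value and vector is a one-liner with a symmetric eigensolver; a minimal sketch, with a function name of my own:

```python
import numpy as np

def fiedler(L):
    """Second-smallest eigenvalue of L and its eigenvector (the Fiedler vector)."""
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return eigvals[1], eigvecs[:, 1]
```

For a connected graph the returned vector is orthogonal to $(1, \dots, 1)$, so its entries sum to zero up to numerical noise; splitting its entries by sign gives the bi-partition discussed on the following slides.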
How do we define a "good" partition? By minimising a given graph cut criterion.
How do we efficiently identify such a partition? By approximating it using the information provided by the eigenvalues and eigenvectors of the graph.
This is spectral clustering.
Three basic stages:
1. Pre-processing: construct a matrix representation of the graph
2. Decomposition: compute eigenvalues and eigenvectors of the matrix; map each point to a lower-dimensional representation based on one or more eigenvectors
3. Grouping: assign points to two or more clusters, based on the new representation
1) Pre-processing: build the Laplacian matrix $L$ of the graph
2) Decomposition: find the eigenvalues $\lambda$ and eigenvectors $x$ of $L$; map the vertices to the corresponding components of the eigenvector of $\lambda_2$
(Figure: the Laplacian of the example graph, its eigenvalues $\lambda = (0.0, 1.0, 3.0, 3.0, 4.0, 5.0)$, and the matrix $X$ of eigenvectors; each vertex is mapped to its component in the eigenvector of $\lambda_2$.)
How do we now find the clusters?
3) Grouping: sort the components of the reduced 1-dimensional vector; identify clusters by splitting the sorted vector in two.
How to choose a splitting point?
naïve approaches: split at 0, or at the median value
more expensive approaches: attempt to minimise the normalised cut in 1 dimension (sweep over the ordering of the nodes induced by the eigenvector), as sketched below
Split at 0: cluster A gets the positive points, cluster B the negative points.
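The "more expensive" strategy can be sketched as a sweep over the ordering induced by the Fiedler vector, picking the prefix that minimises the 2-way normalised cut; the function name is mine and the sketch assumes a connected graph so that no volume is zero.

```python
import numpy as np

def sweep_cut(W, fiedler_vec):
    """Order vertices by their Fiedler-vector value and pick the prefix/suffix
    split that minimises the 2-way normalised cut; splitting at 0 would be the
    naive alternative."""
    order = np.argsort(fiedler_vec)
    d = W.sum(axis=1)
    best_m, best_ncut = 1, np.inf
    for m in range(1, len(order)):                   # try every split point
        S, T = order[:m], order[m:]
        cut = W[np.ix_(S, T)].sum()
        ncut = cut / d[S].sum() + cut / d[T].sum()   # W(S,T)/vol(S) + W(S,T)/vol(T)
        if ncut < best_ncut:
            best_m, best_ncut = m, ncut
    return order[:best_m], order[best_m:]
```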
(Figure: the value of $x_2$ plotted against the rank in $x_2$.)
(Figure: another example of the value of $x_2$ plotted against the rank in $x_2$.)
The eigenvector corresponding to $\lambda_2$ is useful: it shows the communities!
The eigenvector corresponding to $\lambda_1$ is useless, it doesn't show anything. The eigenvector corresponding to $\lambda_3$ is useless by itself, but useful when considered together with the eigenvector of $\lambda_2$.
How do we partition a graph into $k$ clusters? There are two basic approaches.
Recursive bi-partitioning (Hagen et al., '92): recursively apply a bi-partitioning algorithm in a hierarchical, divisive manner; inefficient and unstable.
Cluster multiple eigenvectors (Shi & Malik, '00): build a reduced space from multiple eigenvectors; commonly used in recent papers; the preferable approach…
Approximates the optimal cut
can be used to approximate optimal k-way normalized cut
Emphasizes cohesive clusters
increases the unevenness in the distribution of the data
associations between similar points are amplified, associations between dissimilar points are attenuated
the data begins to “approximate a clustering”
Well-separated space
transforms data to a new “embedded space”, consisting of k orthogonal basis vectors
Multiple eigenvectors prevent instability due to information loss
(Shi & Malik, 2000)
Spectral clustering is not always a good approximation of the optimal graph cut:
in so-called cockroach graphs, spectral clustering always cuts horizontally, while the vertical cut is optimal
this gives an approximation ratio of $O(n)$
(Figure: a cockroach graph on vertices $v_1, \dots, v_{4k}$, with the optimal and the spectral cut.)
To do the clustering, we need to turn our real-valued eigenvectors $u_i$ into binary cluster indicator vectors.
First, create a matrix $U$ with the $u_i$ as its columns, then cluster the rows of this matrix using $k$-means.
Solving for the eigenvectors is $O(n^3)$ in general, or $O(n^2)$ if the similarity graph has roughly as many edges as vertices
the $k$-means on the $U$ matrix takes $O(tnk^2)$, where $t$ is the number of iterations of $k$-means.
Allowing real-valued cluster assignment vectors $c_i$ makes the relaxed RatioCut look like
$$J_{rc}(\mathcal{C}) = \sum_{i=1}^{k} \frac{c_i^T L\, c_i}{\lVert c_i \rVert^2} = \sum_{i=1}^{k} \left(\frac{c_i}{\lVert c_i \rVert}\right)^{\!T}\! L \left(\frac{c_i}{\lVert c_i \rVert}\right) = \sum_{i=1}^{k} u_i^T L\, u_i$$
where $u_i = c_i / \lVert c_i \rVert$, i.e. the unit vector in the direction of $c_i$.
We want to minimise the function $J_{rc}$ over the $u_i$, under the constraint that $u_i^T u_i = 1$.
To solve this, take the derivative w.r.t. the $u_i$ and find the roots, adding Lagrange multipliers to incorporate the constraints:
$$\frac{\partial}{\partial u_i} \left( \sum_{j=1}^{k} u_j^T L\, u_j + \sum_{j=1}^{k} \lambda_j \left(1 - u_j^T u_j\right) \right) = \mathbf{0}$$
Hence $L u_i = \lambda_i u_i$, i.e. $u_i$ is an eigenvector of $L$ corresponding to the eigenvalue $\lambda_i$.
We know that $L u_i = \lambda_i u_i$, hence $\lambda_i = u_i^T L\, u_i$.
As we are minimising the sum of the $u_i^T L\, u_i$, we should choose the $u_i$ corresponding to the $k$ smallest eigenvalues: these are our relaxed cluster indicators.
But we also know that $\lambda_1 = 0$, and that the corresponding eigenvector is $(n^{-1/2}, n^{-1/2}, \dots, n^{-1/2})$
hmm, that doesn't help with clustering...
For normalised cut, a similar procedure shows that we should select the eigenvectors of the $k$ smallest eigenvalues of $L^s$ instead of $L$; or we can use the asymmetric Laplacian $L^a$.
Which one should we choose?
both ratio cut and normalised cut aim at minimising the similarity between clusters, but only normalised cut also takes the within-cluster similarity into account → prefer $L^s$ or $L^a$
The asymmetric Laplacian is the better choice: with the symmetric one, a further normalisation step is needed.
Algorithm SPECTRALCLUSTERING(connected graph $G$, $k$):
  compute the similarity matrix $A \in \mathbb{R}^{n \times n}$ for $G$
  if ratio cut then $B \leftarrow L$
  else if normalised cut then $B \leftarrow L^s$ or $L^a$
  solve $B u_i = \lambda_i u_i$ and take the eigenvectors of the $k$ smallest eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_k$
  $U \leftarrow (u_1, u_2, \dots, u_k)$
  $Y \leftarrow$ normalise the rows of $U$
  $\mathcal{C} \leftarrow \{C_1, \dots, C_k\}$ via $k$-means on the rows of $Y$
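A runnable sketch of the algorithm above in numpy, illustrative only: it uses the symmetric Laplacian $L^s$ for the normalised-cut variant and a bare-bones $k$-means; the function name and default parameters are mine.

```python
import numpy as np

def spectral_clustering(W, k, normalised=True, iters=100, seed=0):
    """Build a Laplacian, take the eigenvectors of its k smallest eigenvalues,
    normalise the rows, and run a bare-bones k-means on them."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    if normalised:                                       # symmetric normalised Laplacian
        d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L = d_inv_sqrt @ L @ d_inv_sqrt
    _, vecs = np.linalg.eigh(L)                          # ascending eigenvalues
    U = vecs[:, :k]                                      # eigenvectors of the k smallest eigenvalues
    Y = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    rng = np.random.default_rng(seed)                    # tiny k-means on the rows of Y
    centres = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centres[None, :, :]) ** 2).sum(-1), axis=1)
        centres = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
    return labels
```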
Frequent subgraph mining finds recurring patterns in graph data
an enormously complex problem → exact algorithms cannot be fast; but graphs are usually not very big, even if there are many of them
Graph clustering is much like other clustering
any clusterable data can be turned into a similarity graph
spectral clustering uses well-known linear algebra, though this doesn't necessarily make it a good clustering algorithm