
SLIDE 1

Chapter 8-2: Community Detection
Jilles Vreeken
IRDM ‘15/16, 3 Dec 2015

SLIDE 2

IRDM Chapter 8, overview
1. The basics
2. Properties of Graphs
3. Frequent Subgraphs
4. Graph Clustering

You’ll find this covered in: Aggarwal, Ch. 17, 19; Zaki & Meira, Ch. 4, 11, 16

SLIDE 3

IRDM Chapter 8, today
1. The basics
2. Properties of Graphs
3. Frequent Subgraphs
4. Community Detection

You’ll find this covered in: Aggarwal, Ch. 17, 19; Zaki & Meira, Ch. 4, 11, 16

SLIDE 4

Chapter 7.4: Community Detection

Aggarwal Ch. 19.3, 17.5; Zaki & Meira Ch. 16

SLIDE 5

Chapter 7.4.1: Detecting Small Communities
SLIDE 6

Trawling

Searching for small communities in the Web graph. What is the signature of a community in a Web graph?
• intuition: many people all talking about the same things
• that is, a dense 2-layer (bipartite) graph

Use this to define “topics”: what the same people on the left talk about on the right.

SLIDE 7

Searching for small communities

A more well-defined problem:
• enumerate complete bipartite subgraphs L_{β,γ}
• where L_{β,γ} has β nodes on the “left”, and every such node links to the same set of γ nodes on the “right”

Example: L_{3,4} with left nodes Y and right nodes Z, |Y| = β = 3 and |Z| = γ = 4, fully connected between Y and Z.

SLIDE 8

Frequent itemset mining

Recall market basket analysis:
• market: a universe V of items
• baskets: n transactions, subsets of V: u_1, u_2, …, u_n ⊆ V, where each u_j is the set of items one person bought
• support: a frequency threshold τ

Goal:
• find all subsets Y ⊆ V s.t. Y ⊆ u_j for at least τ of the sets u_j

What’s the connection between itemsets and complete bipartite graphs?

SLIDE 9

From itemsets to bipartite L_{β,γ}

Frequent itemsets = complete bipartite graphs! How?
• view each node j as the set u_j of the nodes j points to, e.g. a node i linking to b, c, d, e gives u_i = {b, c, d, e}
• L_{β,γ} = a set Z of size γ that occurs in β sets u_j
• looking for L_{β,γ} → set the frequency threshold to β and find all frequent sets of size γ

β … minimum support (|Y| = β); γ … itemset size (|Z| = γ)

SLIDE 10

From itemsets to bipartite L_{β,γ} (Kumar et al. ’99)

1) View each node j as the set u_j of nodes j points to, e.g. u_i = {b, c, d, e}
2) Find frequent itemsets, with β … minimum support and γ … itemset size
3) Say we find a frequent itemset Z = {b, c, d} of support β: this means there are β nodes that all link to each of {b, c, d}
4) We found L_{β,γ}! (a set Z of size γ that occurs in β sets u_j)

SLIDE 11

Example

Support threshold β = τ = 2

Out-link itemsets: u_b = {c, d, e}, u_c = {e}, u_d = {c, e, f, g}, u_e = {f, g}, u_f = {c, e}, u_g = {}

Frequent itemsets:
• {c, e}: support 3
• {f, g}: support 2
• i.e. we found 2 bipartite subgraphs
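The itemset view of trawling can be sketched in a few lines. This is only a brute-force illustration under the slides' small example, not Kumar et al.'s actual algorithm (which relies on Apriori-style pruning to scale to the Web graph); the `out_links` dictionary is an assumed encoding of the example slide's toy graph.

```python
from itertools import combinations

# Out-link sets of the example slide: node -> set of nodes it points to
out_links = {
    "b": {"c", "d", "e"},
    "c": {"e"},
    "d": {"c", "e", "f", "g"},
    "e": {"f", "g"},
    "f": {"c", "e"},
    "g": set(),
}

def bipartite_cores(out_links, beta, gamma):
    """Find all L_{beta,gamma} cores: sets Z of gamma right-nodes such that
    at least beta left-nodes all point to every node of Z."""
    items = sorted(set().union(*out_links.values()))
    cores = []
    for Z in combinations(items, gamma):
        Y = sorted(v for v, links in out_links.items() if set(Z) <= links)
        if len(Y) >= beta:
            cores.append((Y, list(Z)))
    return cores

print(bipartite_cores(out_links, beta=2, gamma=2))
# [(['b', 'd', 'f'], ['c', 'e']), (['d', 'e'], ['f', 'g'])]
```

The output reproduces the slide: {c, e} with support 3 (linked from b, d, f) and {f, g} with support 2 (linked from d, e).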

SLIDE 12

Chapter 7.4.2: Community Detection by Graph Clustering

Aggarwal Ch. 17.5, 19.3; Zaki & Meira Ch. 16

SLIDE 13

Where do graphs come from?

We can have data in graph form
• e.g. our social networks

Or, we map existing data to a graph
• data points become vertices
• add an edge if two data points are similar
• edge weights can also express the degree of similarity

SLIDE 14

Similarity and adjacency matrices

A similarity matrix is an n-by-n non-negative, symmetric matrix
• the opposite of a distance matrix

Recall that a weighted adjacency matrix is also an n-by-n non-negative, symmetric matrix
• for weighted, undirected graphs

So, we can think of every similarity matrix as the adjacency matrix of some weighted, undirected graph
• this graph will be complete (a clique)

Further, we can use any similarity measure between two points as an edge weight

SLIDE 15

Getting non-complete graphs

Using complete graphs can be a waste of resources
• for clustering, we don’t really care about very dissimilar pairs

We can remove edges between dissimilar vertices
• zero weight

Or, we adjust the weights to diminish dissimilar points
• the Gaussian kernel is popular for this:

  w_ij = exp( −‖x_i − x_j‖² / (2σ²) )
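The Gaussian-kernel weighting above is easy to sketch; the data matrix `X` below is an invented toy example, not from the slides.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for every pair of rows of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
W = gaussian_similarity(X, sigma=1.0)
# W is symmetric with ones on the diagonal; the far-away third point
# gets near-zero similarity to the first two, which is the point of
# the kernel: dissimilar pairs are diminished rather than cut outright
```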

SLIDE 16

Getting non-complete graphs (2)

How to decide when vertices are too dissimilar?

In ε-neighbour graphs we add an edge between two vertices that are within distance ε of each other
• usually the resulting graph is considered unweighted, as all weights would be roughly similar

In k-nearest-neighbour graphs we connect two vertices if one is within the k nearest neighbours of the other
• in a mutual k-nearest-neighbour graph we only connect two vertices if they are both in each other’s k nearest neighbours
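The difference between the plain and the mutual k-NN graph is just "or" versus "and" over the directed neighbour relation. A minimal sketch, with an invented 1-d toy data set:

```python
import numpy as np

def knn_graph(X, k, mutual=False):
    """0/1 adjacency matrix of the (mutual) k-nearest-neighbour graph."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # k nearest neighbours per point
    A = np.zeros_like(d, dtype=int)
    for i, neigh in enumerate(nn):
        A[i, neigh] = 1
    # mutual: edge only if both directions agree; otherwise: either direction
    return (A & A.T) if mutual else (A | A.T)

X = np.array([[0.0], [1.0], [2.0], [10.0]])
# the outlier at 10 picks 2 as its neighbour, so the plain 1-NN graph
# connects them, while the mutual 1-NN graph does not
```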

SLIDE 17

Which similarity graph?

With ε-graphs, choosing the parameter is hard
• no single correct answer if different clusters have different internal similarities

k-nearest neighbours can connect points with different similarities
• but far-away high-density regions become unconnected

The mutual k-nearest-neighbour graph is somewhat in between
• good for detecting clusters with different densities

General recommendation: start with k-NN, try the others if the data supports that

SLIDE 18

Example graph (Zaki & Meira, Fig 16.1)

SLIDE 19

Graph partitioning

Undirected graph. Bi-partitioning task:
• divide the vertices into two disjoint groups A and B

Questions:
• how can we define a “good partition”?
• how can we efficiently identify such a partition?

(figure: a 6-node example graph, split into groups A and B)

SLIDE 20

Graph partitioning

What makes a good partition?
• maximize the number of within-group connections
• minimize the number of between-group connections

SLIDE 21

Clustering as graph cuts

A cut of a connected graph G = (V, E) divides the set of vertices into two partitions S and V ∖ S and removes the edges between them
• a cut can be expressed by giving the set S, or by giving the cut set, i.e. the edges with exactly one end in S:

  C = { (u, v) ∈ E : |{u, v} ∩ S| = 1 }

A graph cut groups the vertices of a graph into two clusters
• subsequent cuts in the components give us a hierarchical clustering

A k-way cut cuts the graph into k disjoint sets of vertices C_1, C_2, …, C_k and removes the edges between them

SLIDE 22

What is a good cut?

Not every cut will cut it. In minimum cut, the goal is to find a set of vertices such that cutting it from the rest of the graph requires removing the least number of edges
• least sum of weights for weighted graphs
• the extension to multiway cuts is straightforward

The minimum cut can be found in polynomial time
• via the max-flow min-cut theorem

But the minimum cut isn’t very good for clustering purposes

SLIDE 23

Cuts that cut it

The minimum cut usually cuts off only one vertex
• not a very appealing clustering
• we want to penalize the cut for imbalanced cluster sizes

In ratio cut, the goal is to minimize the ratio between the weight of the edges in the cut set and the size of the clusters C_i
• let W(A, B) = Σ_{i∈A, j∈B} w_ij, where w_ij is the weight of edge (i, j)

  RatioCut = Σ_{i=1}^{k} W(C_i, V ∖ C_i) / |C_i|

SLIDE 24

Cuts that cut it

The volume of a set of vertices S is the weight of all edges connected to S:

  vol(S) = W(S, V) = Σ_{i∈S, j∈V} w_ij

In normalized cut we measure the size of C_i by vol(C_i) instead of |C_i|:

  NormalisedCut = Σ_{i=1}^{k} W(C_i, V ∖ C_i) / vol(C_i)

SLIDE 25

Cuts that cut it

The volume of a set of vertices S is the weight of all edges connected to S:

  vol(S) = W(S, V) = Σ_{i∈S, j∈V} w_ij

In normalized cut we measure the size of C_i by vol(C_i) instead of |C_i|:

  NormalisedCut = Σ_{i=1}^{k} W(C_i, V ∖ C_i) / vol(C_i)

Finding the optimal RatioCut or NormalisedCut is NP-hard
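Both objectives are cheap to evaluate for a given partition. A sketch on the 6-node example graph used on the surrounding slides (its edge list, reconstructed here 0-indexed, is an assumption based on the adjacency and Laplacian tables shown later):

```python
import numpy as np

# assumed edges of the slides' example graph: 1-2,1-3,1-5,2-3,3-4,4-5,4-6,5-6
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

def cut_weight(A, S):
    """W(S, V \\ S): total weight of edges with exactly one end in S."""
    S = sorted(S)
    T = [v for v in range(len(A)) if v not in S]
    return A[np.ix_(S, T)].sum()

def ratio_cut(A, clusters):
    return sum(cut_weight(A, C) / len(C) for C in clusters)

def normalised_cut(A, clusters):
    deg = A.sum(axis=1)
    return sum(cut_weight(A, C) / deg[sorted(C)].sum() for C in clusters)

clusters = [{0, 1, 2}, {3, 4, 5}]
# two edges cross this cut, so RatioCut = 2/3 + 2/3 and, with both
# cluster volumes equal to 8, NormalisedCut = 2/8 + 2/8
```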

SLIDE 26

Spectral graph partitioning

• A: adjacency matrix of an undirected graph G
  • A_ij = 1 if (i, j) is an edge, else 0
• x is a vector in ℝ^n with one component per node
  • think of it as a label/value of each node

What is the meaning of A · x?

  (A x)_i = Σ_{j=1}^{n} A_ij x_j = Σ_{(i,j)∈E} x_j

Entry i of A x is the sum of the labels x_j of the neighbours of i
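The neighbour-sum interpretation of A · x is easy to check numerically on the same assumed 6-node example graph:

```python
import numpy as np

# assumed edges of the slides' example graph (0-indexed)
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # one label per node
z = A @ x
# z[i] is the sum of the labels of i's neighbours; e.g. node 2
# (index 1) has neighbours 1 and 3, so z[1] = x[0] + x[2]
```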

SLIDE 27

What is the meaning of A x?

The i-th coordinate of A · x:
• the sum of the x-values of the neighbours of i
• make this the new value at node i

Spectral graph theory:
• analyse the spectrum of the matrix
• the spectrum is the set of eigenvectors of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues, Λ = (λ_1, λ_2, …, λ_n) with λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_n

An eigenpair satisfies A · x = λ · x

SLIDE 28

Example: d-regular graph

Suppose all nodes in a connected graph G have degree d. What are some eigenvalues and eigenvectors of G? In A · x = λ · x, what is λ, and what is x?
• let’s try x = (1, 1, …, 1)
• then A · x = (d, d, …, d) = λ · x, so λ = d

We found an eigenpair of G: x = (1, 1, …, 1), λ = d

Remember the meaning of z = A · x: z_i = Σ_j A_ij x_j = Σ_{(i,j)∈E} x_j

SLIDE 29

Example: graph of 2 components

What if G is not connected?
• G has 2 components C_1 and C_2, each d-regular

What are some eigenvectors?
• x = put all 1s on C_1 and 0s on C_2, or vice versa
  • x′ = (1, …, 1, 0, …, 0), then A · x′ = (d, …, d, 0, …, 0)
  • x′′ = (0, …, 0, 1, …, 1), then A · x′′ = (0, …, 0, d, …, d)
  • so in both cases the corresponding λ = d

A bit of intuition: with two components the two largest eigenvalues coincide, λ_n = λ_{n−1}; with two loosely connected components, λ_n − λ_{n−1} ≈ 0, i.e. the 2nd-largest eigenvalue λ_{n−1} has a value very close to λ_n

SLIDE 30

More intuition

If the graph is connected, then we already know that x_n = (1, …, 1) is an eigenvector. Since eigenvectors are orthogonal, the components of x_{n−1} sum to 0
• why? Because x_n · x_{n−1} = Σ_i x_n[i] · x_{n−1}[i] = Σ_i x_{n−1}[i] = 0

So, we can look at the eigenvector of the 2nd-largest eigenvalue and declare the nodes with a positive label to be in C_1 and those with a negative label in C_2. Still, lots to sort out.

SLIDE 31

Matrix representations

The (weighted) adjacency matrix A has the weight of edge (i, j) at position a_ij
• n×n matrix
• A = [a_ij], with a_ij = 1 if there is an edge between nodes i and j, else 0

Important properties:
• symmetric matrix
• eigenvectors are real and orthogonal

Example (6-node graph with edges 1-2, 1-3, 1-5, 2-3, 3-4, 4-5, 4-6, 5-6):

        1  2  3  4  5  6
    1   0  1  1  0  1  0
    2   1  0  1  0  0  0
    3   1  1  0  1  0  0
    4   0  0  1  0  1  1
    5   1  0  0  1  0  1
    6   0  0  0  1  1  0
slide-32
SLIDE 32

IRDM ‘15/16

Matrix representations (2)

The degree m ee matri rix of a graph is a diagonal 𝑜-by-𝑜 matrix with the degree of vertex 𝑗 at position 𝚬𝑗𝑗 = 𝑒𝑗

 𝚬𝑗𝑗 = 𝑒𝑗 = ∑ 𝑏𝑗𝑗

𝑗

= degree of node i

 n× n diagonal matrix

VIII-2: 32

1 3 2 5 4 6

1 2 3 4 5 6 1

3

2

2

3

3

4

3

5

3

6

2

SLIDE 33

Matrix representations (3)

The normalized adjacency matrix N is the adjacency matrix where in every row i all values are divided by d_i
• every row sums up to 1
• N = D⁻¹A

(picture is on vacation)

SLIDE 34

Matrix representations (4)

The Laplacian matrix L of a graph is the adjacency matrix subtracted from the degree matrix:

  L = D − A

• n×n symmetric matrix

Important properties:
• eigenvalues are non-negative real numbers
• eigenvectors are real and orthogonal

For the example graph:

        1  2  3  4  5  6
    1   3 −1 −1  0 −1  0
    2  −1  2 −1  0  0  0
    3  −1 −1  3 −1  0  0
    4   0  0 −1  3 −1 −1
    5  −1  0  0 −1  3 −1
    6   0  0  0 −1 −1  2

SLIDE 35

Matrix representations (4)

The Laplacian matrix L of a graph is the adjacency matrix subtracted from the degree matrix
• n×n symmetric matrix

  L = D − A =
    [ Σ_{j≠1} a_1j    −a_12     …    −a_1n         ]
    [ −a_21    Σ_{j≠2} a_2j    …    −a_2n          ]
    [   ⋮                      ⋱      ⋮            ]
    [ −a_n1    −a_n2    …    Σ_{j≠n} a_nj          ]

The Laplacian is symmetric and positive semi-definite
• (for undirected graphs)
• has n real, orthogonal eigenvectors, with non-negative eigenvalues 0 ≤ λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_n
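The stated properties can be verified directly on the assumed 6-node example graph; its Laplacian spectrum is the one shown later on the spectral-partitioning slide, (0, 1, 3, 3, 4, 5).

```python
import numpy as np

# assumed edges of the slides' example graph (0-indexed)
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
L = D - A

lam = np.linalg.eigvalsh(L)  # eigenvalues in ascending order
# L is symmetric, every row sums to 0 (so (1,...,1) is a 0-eigenvector),
# and all eigenvalues are non-negative: here (0, 1, 3, 3, 4, 5)
```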

SLIDE 36

The normalised, symmetric Laplacian

The normalised, symmetric Laplacian matrix L_s of a graph is defined as

  L_s = D^{−1/2} L D^{−1/2} = I − D^{−1/2} A D^{−1/2}

with entries (L_s)_ii = (Σ_{k≠i} a_ik) / d_i on the diagonal and (L_s)_ij = −a_ij / √(d_i d_j) off the diagonal, and is also positive semi-definite.

The normalised, asymmetric Laplacian L_a is L_a = D^{−1} L

SLIDE 37

Clusterings and matrices, redux

Recall that we can express a clustering using a binary cluster assignment matrix
• each row has exactly one non-zero entry

Let the i-th column of this matrix be c_i
• clusters are disjoint, so c_i^T c_j = 0 for i ≠ j
• cluster C_i has c_i^T c_i = ‖c_i‖² elements

We can express vol(C_i) and W(C_i, V ∖ C_i) using the c_i’s:
• vol(C_i) = Σ_{j∈C_i} d_j = Σ_j Σ_k c_i[j] D_jk c_i[k] = c_i^T D c_i
• W(C_i, C_i) = Σ_{j∈C_i} Σ_{k∈C_i} a_jk = c_i^T A c_i
• W(C_i, V ∖ C_i) = W(C_i, V) − W(C_i, C_i) = c_i^T (D − A) c_i = c_i^T L c_i

SLIDE 38

Cuts using matrices

  RatioCut = Σ_{i=1}^{k} W(C_i, V ∖ C_i) / |C_i| = Σ_{i=1}^{k} (c_i^T L c_i) / ‖c_i‖²

  NormalisedCut = Σ_{i=1}^{k} W(C_i, V ∖ C_i) / vol(C_i) = Σ_{i=1}^{k} (c_i^T L c_i) / (c_i^T D c_i)

SLIDE 39

The second eigenvalue, λ_2, as an optimization problem

Fact: for a symmetric matrix M,

  λ_2 = min_{x ⊥ x_1} (x^T M x) / (x^T x)

where x_1 is the eigenvector of the smallest eigenvalue λ_1.

What is the meaning of min x^T L x on a graph G?

  x^T L x = Σ_{i,j} L_ij x_i x_j = Σ_{i,j} (D_ij − A_ij) x_i x_j
          = Σ_i d_i x_i² − Σ_{(i,j)∈E} 2 x_i x_j
          = Σ_{(i,j)∈E} (x_i² + x_j² − 2 x_i x_j)
          = Σ_{(i,j)∈E} (x_i − x_j)²

Node i has degree d_i, so the value x_i² is summed up d_i times; but each edge (i, j) has two endpoints, so for every edge we need x_i² + x_j².
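The identity x^T L x = Σ_{(i,j)∈E} (x_i − x_j)² is easy to check numerically for an arbitrary labelling, again on the assumed 6-node example graph:

```python
import numpy as np

# assumed edges of the slides' example graph (0-indexed)
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
x = rng.normal(size=6)                  # an arbitrary labelling of the nodes
lhs = x @ L @ x                         # quadratic form of the Laplacian
rhs = sum((x[i] - x[j]) ** 2 for i, j in edges)  # sum over edges
# the two agree up to floating-point error
```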

SLIDE 40

λ_2 as an optimization problem

What else do we know about x?
• x is a unit vector: Σ_i x_i² = 1
• x is orthogonal to the 1st eigenvector (1, …, 1), thus Σ_i x_i · 1 = Σ_i x_i = 0

Remember:

  λ_2 = min over all labelings x of the nodes with Σ_i x_i = 0 of
        ( Σ_{(i,j)∈E} (x_i − x_j)² ) / ( Σ_i x_i² )

We want to assign values x_i to the nodes such that few edges cross 0 (for an edge (i, j), we want x_i and x_j to cancel each other out)

SLIDE 41

Finding an optimal cut

Back to finding the optimal cut. Express the partition (C_1, C_2) as a vector c with

  c_i = +1 if i ∈ C_1, and c_i = −1 if i ∈ C_2

We can minimise the cut of the partition by finding a non-trivial vector c that minimises

  f(c) = Σ_{(i,j)∈E} (c_i − c_j)²  over c ∈ {−1, +1}^n

NP-hard… so, let’s relax!
• let the c_i’s take any real value

(Fiedler, 1973)

SLIDE 42

Rayleigh theorem

  min_{c ∈ ℝ^n} f(c) = Σ_{(i,j)∈E} (c_i − c_j)² = c^T L c

• λ_2 = min_c f(c): the minimum value of f(c) is given by the 2nd smallest eigenvalue λ_2 of the Laplacian matrix L
• x = arg min_c f(c): the optimal solution for c is given by the corresponding eigenvector x, referred to as the Fiedler vector

SLIDE 43

So far…

How to define a good partition of a graph?
• minimise a given graph cut criterion

How to efficiently identify such a partition?
• approximate, using information provided by the eigenvalues and eigenvectors of a graph

→ Spectral clustering

SLIDE 44

Spectral clustering algorithms

Three basic stages:
1. Pre-processing
   • construct a matrix representation of the graph
2. Decomposition
   • compute eigenvalues and eigenvectors of the matrix
   • map each point to a lower-dimensional representation based on one or more eigenvectors
3. Grouping
   • assign points to two or more clusters, based on the new representation

SLIDE 45

Spectral partitioning algorithm

1) Pre-processing:
• build the Laplacian matrix L of the graph

2) Decomposition:
• find the eigenvalues λ and eigenvectors x of the matrix L
• map the vertices to the corresponding components of the eigenvector of λ_2

For the example graph, λ = (0, 1, 3, 3, 4, 5), and the eigenvector of λ_2 = 1 assigns nodes 1…6 the values (0.3, 0.6, 0.3, −0.3, −0.3, −0.6).

How do we now find the clusters?

SLIDE 46

Spectral partitioning

3) Grouping:
• sort the components of the reduced 1-dimensional vector
• identify clusters by splitting the sorted vector in two

How to choose a splitting point?
• naïve approaches: split at 0, or at the median value
• more expensive approaches: minimise the normalised cut in 1 dimension (sweep over the ordering of nodes induced by the eigenvector)

Split at 0 in the example: cluster A gets the positive points (nodes 1, 2, 3 with values 0.3, 0.6, 0.3), cluster B the negative points (nodes 4, 5, 6 with values −0.3, −0.3, −0.6).
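The whole pipeline above fits in a few lines of numpy. This sketch again assumes the 6-node example graph reconstructed from the earlier matrix slides, and uses the naive split-at-0 grouping:

```python
import numpy as np

# assumed edges of the slides' example graph (0-indexed)
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A        # 1) pre-processing: Laplacian

lam, X = np.linalg.eigh(L)            # 2) decomposition: eigenvalues ascending
fiedler = X[:, 1]                     #    eigenvector of the 2nd-smallest one
# 3) grouping: nodes on the same side of 0 form one cluster
cluster_of_node0 = {i for i in range(6) if fiedler[i] * fiedler[0] > 0}
# this recovers the slide's partition {1, 2, 3} vs {4, 5, 6}
```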

SLIDE 47

Example: spectral partitioning

(plot: value of x_2 against rank in x_2)

SLIDE 48

Example: spectral partitioning

(plot: value of x_2 against rank in x_2)

The eigenvector corresponding to λ_2 is useful: it shows the communities!

SLIDE 49

Example: spectral partitioning

The eigenvector corresponding to λ_1 is useless: it doesn’t show anything. The eigenvector corresponding to λ_3 is useless by itself, but useful when considered together with the eigenvector of λ_2.

SLIDE 50

k-way spectral clustering

How do we partition a graph into k clusters? There are two basic approaches:
• recursive bi-partitioning (Hagen et al., ’92)
  • recursively apply a bi-partitioning algorithm in a hierarchical, divisive manner
  • inefficient and unstable
• clustering multiple eigenvectors (Shi & Malik, ’00)
  • build a reduced space from multiple eigenvectors
  • commonly used in recent papers
  • a preferable approach

SLIDE 51

Why use multiple eigenvectors?

Approximates the optimal cut
• can be used to approximate the optimal k-way normalized cut

Emphasizes cohesive clusters
• increases the unevenness in the distribution of the data
• associations between similar points are amplified, associations between dissimilar points are attenuated
• the data begins to “approximate a clustering”

Well-separated space
• transforms the data to a new “embedded space”, consisting of k orthogonal basis vectors

Multiple eigenvectors prevent instability due to information loss

(Shi & Malik, 2000)

SLIDE 52

Is spectral clustering optimal?

Spectral clustering is not always a good approximation of the optimal graph cut
• in so-called cockroach graphs, spectral clustering always cuts horizontally, while cutting vertically is optimal
• approximation ratio of O(n)

(figure: a cockroach graph on nodes v_1 … v_{4k}; the optimal cut is vertical, the spectral cut horizontal)

SLIDE 53

Spectral clustering

To do the clustering, we need to move from our real-valued eigenvectors v_i to binary cluster indicator vectors. First, create a matrix V with the v_i’s as its columns
• optionally, normalise the rows (especially when using L_s)

Then cluster the rows of this matrix using k-means
• or, in principle, any other clustering algorithm

Solving the eigenvectors is O(n³) in general, or O(n²) if the similarity graph has only as many edges as vertices
• the k-means step on the V matrix takes O(tnk²), where t is the number of iterations of k-means

SLIDE 54

Another look at approximate cuts

Allowing real-valued cluster assignment vectors c_i makes the relaxed RatioCut look like

  J = Σ_{i=1}^{k} (c_i^T L c_i) / ‖c_i‖²
    = Σ_{i=1}^{k} (c_i/‖c_i‖)^T L (c_i/‖c_i‖)
    = Σ_{i=1}^{k} v_i^T L v_i

• where v_i = c_i/‖c_i‖, i.e. the unit vector in the direction of c_i

SLIDE 55

Solving the relaxed version

We want to minimise the objective J over the v_i’s
• under the constraint that v_i^T v_i = 1

To solve, take the derivative w.r.t. the v_i’s and find the roots
• add Lagrange multipliers to incorporate the constraints:

  ∂/∂v_i ( Σ_{j=1}^{k} v_j^T L v_j + Σ_{j=1}^{k} λ_j (1 − v_j^T v_j) ) = 0

Hence L v_i = λ_i v_i
• v_i is an eigenvector of L corresponding to the eigenvalue λ_i

SLIDE 56

Which eigenvectors to choose?

We know that L v_i = λ_i v_i
• hence λ_i = v_i^T L v_i

As we’re minimising the sum of the v_i^T L v_i’s, we should choose the v_i’s corresponding to the k smallest eigenvalues
• these are our relaxed cluster indicators

But we also know that λ_1 = 0, and that the corresponding eigenvector is (n^{−1/2}, n^{−1/2}, …, n^{−1/2})
• hmm, that doesn’t help with clustering…

SLIDE 57

Normalised cut and the choice of Laplacians

For the normalised cut, a similar procedure shows that we should select the eigenvectors of the k smallest eigenvalues of L_s instead of L
• or, we can use the asymmetric Laplacian L_a

Which one should we choose?
• both ratio cut and normalised cut aim at minimising the inter-cluster similarity
• only normalised cut also takes the intra-cluster similarity into account → either L_s or L_a

The asymmetric Laplacian is preferable
• with the symmetric one, a further normalisation step is needed

SLIDE 58

Pseudo-code

Algorithm SpectralClustering(connected graph G, k):
  compute the similarity matrix A ∈ ℝ^{n×n} for G
  if ratio cut then B ← L
  else if normalised cut then B ← L_s or L_a
  solve B v_i = λ_i v_i for i = 2, …, k + 1, where λ_2 ≤ λ_3 ≤ ⋯ ≤ λ_{k+1}
  V ← (v_2, v_3, …, v_{k+1})
  Y ← normalise the rows of V
  {C_1, …, C_k} ← k-means on the rows of Y
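A runnable sketch of this procedure is below. Two deliberate deviations from the pseudo-code, both assumptions of this sketch rather than the slides' method: it keeps the k smallest eigenvectors including the trivial first one (a common variant that also behaves sensibly when the graph is not connected), and it uses a tiny deterministic k-means instead of a library call.

```python
import numpy as np

def kmeans(Z, k, iters=50):
    """Tiny Lloyd's k-means with deterministic farthest-first initialisation."""
    cent = [Z[0]]
    for _ in range(k - 1):                    # pick spread-out initial centroids
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in cent], axis=0)
        cent.append(Z[d.argmax()])
    cent = np.array(cent)
    for _ in range(iters):                    # standard Lloyd iterations
        lab = ((Z[:, None] - cent[None]) ** 2).sum(axis=-1).argmin(axis=1)
        cent = np.array([Z[lab == j].mean(axis=0) if (lab == j).any() else cent[j]
                         for j in range(k)])
    return lab

def spectral_clustering(A, k, normalised=False):
    """Sketch of the slides' SpectralClustering, keeping the k smallest
    eigenvectors (including the trivial one) rather than v_2 ... v_{k+1}."""
    d = A.sum(axis=1)
    L = np.diag(d) - A
    if normalised:                            # asymmetric Laplacian L_a = D^-1 L
        lam, U = np.linalg.eig(L / d[:, None])
        lam, U = lam.real, U.real
    else:                                     # ratio cut: plain Laplacian L
        lam, U = np.linalg.eigh(L)
    V = U[:, np.argsort(lam)[:k]]
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    Z = V / np.maximum(norms, 1e-12)          # normalise the rows of V
    return kmeans(Z, k)

# three disconnected triangles must come out as three clusters
A = np.zeros((9, 9))
for base in (0, 3, 6):
    for i in range(3):
        for j in range(i + 1, 3):
            A[base + i, base + j] = A[base + j, base + i] = 1.0
labels = spectral_clustering(A, 3)
```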

SLIDE 59

Conclusions

Frequent subgraph mining finds recurring patterns in graph data
• an enormously complex problem → exact algorithms can’t be fast
• but the graphs are usually not very big, even if there are many of them

Graph clustering is much like other clustering
• any clusterable data can be turned into a similarity graph
• spectral clustering uses well-known linear algebra
• though this doesn’t necessarily make it a good clustering algorithm
