Axioms for graph clustering
Twan van Laarhoven and Elena Marchiori
Institute for Computing and Information Sciences Radboud University Nijmegen, The Netherlands
27th September 2013
1 / 49
Outline: Introduction · Axioms for data clustering · Axioms for graph clustering · Modularity · Conclusion
2 / 49
Introduction
3 / 49
4 / 49
Data repository.
5 / 49
Clustering: divide objects into groups such that objects in each group are more similar to each other than to objects in other groups.
One formalization: a quality function whose optimization yields a division of objects into (disjoint) groups. The k-means clustering objective:
Q(C) = Σ_{c∈C} Σ_{x∈c} ‖x − µc‖², where µc = Σ_{x∈c} x / |c|.
6 / 49
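The k-means objective above translates directly to a few lines of Python. This is an illustrative sketch (not part of the slides), assuming NumPy and clusters given as lists of row indices:

```python
import numpy as np

def kmeans_objective(X, clusters):
    """Sum over clusters c of sum_{x in c} ||x - mu_c||^2,
    where mu_c is the mean of the points assigned to c."""
    total = 0.0
    for c in clusters:                 # c is a list of row indices into X
        pts = X[c]                     # points assigned to this cluster
        mu = pts.mean(axis=0)          # centroid mu_c = sum_{x in c} x / |c|
        total += ((pts - mu) ** 2).sum()
    return total

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
print(kmeans_objective(X, [[0, 1], [2]]))  # → 0.5 (only the first cluster spreads)
```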
Optimizing such objectives is NP-hard.
Many heuristic algorithms have been developed to find suboptimal solutions.
7 / 49
Pairwise similarities: a function that quantifies the similarity between each pair of patterns,
i.e. a weighted graph describing a relation over patterns.
8 / 49
This transforms a data clustering problem into a graph clustering one:
Distance matrix → kNN graph → Graph clustering
9 / 49
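The Distance matrix → kNN graph step can be sketched as follows. This is an illustration, not from the slides; in particular the edge weight 1/(1 + d) is my own assumption (any decreasing function of distance would do):

```python
import numpy as np

def knn_graph(D, k):
    """Build a symmetric kNN graph from a distance matrix D:
    connect i and j if j is among i's k nearest neighbors (or vice versa),
    with edge weight 1/(1 + d(i, j)) so closer points get heavier edges."""
    n = D.shape[0]
    E = np.zeros((n, n))
    for i in range(n):
        # indices of the k nearest neighbors of i, excluding i itself
        nn = [j for j in np.argsort(D[i]) if j != i][:k]
        for j in nn:
            w = 1.0 / (1.0 + D[i, j])
            E[i, j] = max(E[i, j], w)
            E[j, i] = E[i, j]          # keep the graph symmetric
    return E
```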
Axioms for data clustering
10 / 49
What properties (axioms) should hold for all reasonable clustering functions?
11 / 49
Kleinberg proved an impossibility result concerning the axiomatization of the notion of data clustering. He focused on clustering functions Ĉ : D → C, from distance functions over a dataset S to clusterings of S, d ↦ C.
Theorem (Kleinberg): there is no clustering function that is scale invariant, consistent and rich.
12 / 49
Scale Invariance: ∀d ∈ D, α > 0. Ĉ(d) = Ĉ(αd).
Richness: range(Ĉ) is equal to the set of all partitions of S.
Consistency: ∀d, d′ ∈ D. Ĉ(d) = C and d′ is a C-transformation of d ⇒ Ĉ(d′) = C.
Here d′ is a C-transformation of d if ∀i, j ∈ S: d′(i, j) ≤ d(i, j) when i ∼C j, and d′(i, j) ≥ d(i, j) when i ≁C j.
13 / 49
C′ is a refinement of C (C′ ⊑ C) if ∀c′ ∈ C′ ∃c ∈ C s.t. c′ ⊆ c. {C1, . . . , Cn} ⊆ C is an antichain if ∀i, j: i ≠ j ⇒ Ci ⋢ Cj.
If Ĉ is Scale Invariant and Consistent then range(Ĉ) is an antichain.
Proof (sketch): suppose Ĉ is Consistent and Scale Invariant, and let C0 ⊑ C1 with C0, C1 ∈ range(Ĉ). Using Consistency, construct d such that Ĉ(d) = C1 and choose α such that Ĉ(αd) = C0. Since Scale Invariance requires Ĉ(αd) = Ĉ(d), it follows that C0 = C1.
14 / 49
Ackerman and Ben-David used quality functions Q instead of clustering functions: Q : D × C → R≥0, mapping a distance function and a clustering to a non-negative real number, (d, C) ↦ r.
There is a clustering quality function that is permutation invariant, scale invariant, monotonic and rich.
C-index = (s − smin)/(smax − smin), where s = Σ_{i ∼C j} d(i, j),
smin is the sum of the n minimal (over all pairs of patterns) distances, smax is the sum of the n maximal distances, and n = |{(i, j) | i ∼C j}|.
15 / 49
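The C-index above is easy to compute directly from its definition. A minimal sketch (not from the slides; the dict-of-pairs distance representation is my own choice, and it assumes smax > smin):

```python
from itertools import combinations

def c_index(d, C):
    """C-index = (s - s_min) / (s_max - s_min), where s is the sum of
    within-cluster distances and s_min / s_max are the sums of the n
    smallest / largest distances over ALL pairs, with n the number of
    within-cluster pairs.  d maps frozenset({i, j}) to the distance."""
    within = [frozenset(p) for c in C for p in combinations(sorted(c), 2)]
    n = len(within)
    s = sum(d[p] for p in within)
    all_d = sorted(d.values())
    return (s - sum(all_d[:n])) / (sum(all_d[-n:]) - sum(all_d[:n]))

d = {frozenset(p): w for p, w in {
    ('a', 'b'): 1, ('c', 'd'): 1,
    ('a', 'c'): 5, ('a', 'd'): 5, ('b', 'c'): 5, ('b', 'd'): 5}.items()}
print(c_index(d, [{'a', 'b'}, {'c', 'd'}]))  # → 0.0, the ideal clustering
```

A value of 0 means the within-cluster pairs are exactly the closest pairs; 1 means they are the farthest.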
Both of these axiomatizations are framed in terms of distance functions.
They concern the axiomatization of data clustering.
Graph clustering is a different - although related - story ...
16 / 49
Axioms for graph clustering
17 / 49
From distance functions to graphs: the distance d(i, j) is replaced by an edge weight E(i, j).
18 / 49
A symmetric weighted graph (or network) is a pair (V, E) of a set V of nodes and a function E : V × V → R≥0 of edge weights,
such that E(i, j) = E(j, i) for all i, j ∈ V.
19 / 49
A clustering C of a graph G = (V , E) is a partition of its nodes.
19 / 49
Three ways to formalize graph clustering:
Ĉ : Graph → Clustering
Q : Graph × Clustering → R
· G · ⊆ Clustering × Clustering
20 / 49
Examples of quality functions:
Q(G, C) = Σ_{c∈C} wc
Q(G, C) = Σ_{c∈C} −wc log(vc/vV)
· · ·
21 / 49
Quality functions with a parameter:
Q(G, C) = Σ_{c∈C} wc − α|C|
Q^γ_RB(G, C) = Σ_{c∈C} (wc/vV − γ(vc/vV)²)
Q(G, C) = Σ_{c∈C} −wc log(vc/α)
· · ·
22 / 49
Intuition: The magnitude of the edge weights shouldn't matter.
A quality function Q is scale invariant if for all graphs G, all clusterings C1, C2 and all constants α > 0:
Q(G, C1) ≥ Q(G, C2) if and only if Q(αG, C1) ≥ Q(αG, C2).
23 / 49
Intuition: Only the edge weights should matter, not the identity of the nodes.
A quality function Q is permutation invariant if Q(G, C) = Q(f(G), f(C)) for all graphs G, clusterings C and bijections f between node sets,
where f is extended to graphs and clusterings in the obvious way.
24 / 49
Intuition: No clustering should be ruled out a priori.
So, every clustering should be optimal for some graph.
A quality function Q is rich if
for every set of nodes V and every partition C of V there is a graph G = (V, E) such that C is the Q-optimal clustering of G.
25 / 49
Intuition: Adding edges inside a cluster or removing edges between clusters does not make the clustering worse.
Let G = (V, E) and G′ = (V, E′) be graphs on the same nodes. Then G′ is a C-consistent improvement of G if E′(i, j) ≥ E(i, j) whenever i ∼C j and E′(i, j) ≤ E(i, j) whenever i ≁C j.
A quality function Q is monotonic if Q(G′, C) ≥ Q(G, C) for all graphs G, clusterings C and C-consistent improvements G′ of G.
26 / 49
Intuition: Local changes should have local effects.
Two graphs G1 and G2 agree on the neighborhood of Va ⊆ V1 ∩ V2 if E1(i, j) = E2(i, j) for all i ∈ Va, j ∈ V1 ∩ V2, and E1(i, j) = 0 for all i ∈ Va, j ∈ V1 \ V2, and E2(i, j) = 0 for all i ∈ Va, j ∈ V2 \ V1. So, for nodes in Va, all incident edges are the same.
A quality function Q is local if for all graphs G1, G2 that agree on a set Va and its neighborhood, and for all clusterings C1 of V1 \ Va, C2 of V2 \ Va and Ca, Da of Va:
if Q(G1, Ca ∪ C1) ≥ Q(G1, Da ∪ C1) then Q(G2, Ca ∪ C2) ≥ Q(G2, Da ∪ C2).
27 / 49
There is a graph clustering function that is scale invariant, permutation invariant, monotonic, rich and local.
Ĉcoco(G) = the connected components of G. Qcoco(G, C) = 1[C are the connected components of G].
28 / 49
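Ĉcoco and Qcoco are straightforward to implement. A sketch (illustrative, not the authors' code; the dict-of-edges graph representation is my own choice):

```python
def connected_components(V, E):
    """C_coco: cluster a graph into its connected components.
    V is a list of nodes, E a dict {(i, j): weight} with symmetric weights."""
    adj = {v: set() for v in V}        # adjacency lists from positive weights
    for (i, j), w in E.items():
        if w > 0 and i != j:
            adj[i].add(j)
            adj[j].add(i)
    seen, components = set(), []
    for v in V:
        if v in seen:
            continue
        stack, comp = [v], set()       # depth-first search from v
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        components.append(comp)
    return components

def q_coco(V, E, C):
    """Q_coco(G, C) = 1 iff C is exactly the set of connected components."""
    cc = connected_components(V, E)
    return 1 if sorted(map(sorted, C)) == sorted(map(sorted, cc)) else 0
```

This quality function is a degenerate but valid witness for the five axioms: it only rewards one clustering per graph.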
Intuition: A small change in the graph should lead to a small change in quality.
A quality function Q is continuous if for every ε > 0 and every graph G = (V, E)
there exists a δ > 0 such that for all graphs G′ = (V, E′) and clusterings C
we have ‖E′ − E‖max < δ ⇒ |Q(G′, C) − Q(G, C)| < ε.
29 / 49
Modularity
30 / 49
Intuition: Compare the weight within clusters to the weight expected in a random graph with the same node volumes.
Qmodularity(G, C) = Σ_{c∈C} Σ_{i,j∈c} (E(i, j)/vV − (vi/vV)(vj/vV))
= Σ_{c∈C} (wc/vV − (vc/vV)²),
where vc = Σ_{i∈c} Σ_{j∈V} E(i, j) is the volume of cluster c and wc = Σ_{i,j∈c} E(i, j) the within-cluster weight.
31 / 49
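Modularity in the wc, vc, vV form above translates directly to code. An illustrative sketch (not the authors' code): E is a symmetric weight matrix, so each undirected edge contributes twice to both wc and vV, consistently:

```python
import numpy as np

def modularity(E, C):
    """Q_mod(G, C) = sum_c [ w_c/v_V - (v_c/v_V)^2 ], with v_c the volume of
    cluster c, w_c the edge weight inside c, and v_V the total volume.
    E is a symmetric weight matrix; C a list of lists of node indices."""
    v_V = E.sum()                          # total volume
    Q = 0.0
    for c in C:
        idx = np.array(c)
        w_c = E[np.ix_(idx, idx)].sum()    # within-cluster weight
        v_c = E[idx, :].sum()              # cluster volume
        Q += w_c / v_V - (v_c / v_V) ** 2
    return Q

# Two disconnected edges: the natural 2-clustering attains the maximum 0.5.
E = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], float)
print(modularity(E, [[0, 1], [2, 3]]))  # → 0.5
```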
The obvious: modularity is permutation invariant and scale invariant.
The less obvious: modularity is continuous.
The bad: modularity is not local and not monotonic.
32 / 49
[Example: non-locality of modularity. Qmodularity is evaluated on a small graph b–c–d with edge weights 2, 1, 2; adding a distant component x–y with weight 20 changes which clustering of b, c, d modularity prefers.]
33 / 49
[Example: the clustering preferred by Qmodularity on the same graph b–c–d depends on the edge weights (1 and 1, 0.1 and 1, 1 and 10).]
34 / 49
Fix the scale by replacing vV with a constant M:
QM-fixed(G, C) = Σ_{c∈C} (wc/M − (vc/M)²).
Is it monotonic? Take vc = wc + bc (within + between).
∂QM-fixed(G, C)/∂wc = 1/M − (2wc + 2bc)/M².
This is negative when 2vc > M, so QM-fixed is not monotonic.
35 / 49
Adaptive scale modularity:
QM,γ(G, C) = Σ_{c∈C} (wc/(M + γvc) − (vc/(M + γvc))²).
Adaptive scale modularity is monotonic when γ ≥ 2 (next slide).
36 / 49
Take partial derivatives (with vc = wc + bc):
QM,γ(G, C) = Σ_{c∈C} (wc/(M + γ(wc + bc)) − ((wc + bc)/(M + γ(wc + bc)))²).
∂QM,γ(G, C)/∂wc = (M² + (γ − 2)Mwc + (2γ − 2)Mbc + γ²vcbc) / (M + γvc)³.
∂QM,γ(G, C)/∂bc = −2Mvc/(M + γvc)³ − γwc/(M + γvc)² ≤ 0.
When γ ≥ 2, Q is a monotonically increasing function of wc and a decreasing function of bc for all c, so the quality function is monotonic.
37 / 49
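Adaptive scale modularity is equally direct to compute; the usage below also spot-checks the monotonicity claim by increasing a within-cluster weight with γ = 2. An illustrative sketch, not the authors' code:

```python
import numpy as np

def adaptive_modularity(E, C, M, gamma):
    """Q_{M,gamma}(G, C) = sum_c [ w_c/(M + gamma*v_c)
                                   - (v_c/(M + gamma*v_c))^2 ]."""
    Q = 0.0
    for c in C:
        idx = np.array(c)
        w_c = E[np.ix_(idx, idx)].sum()    # within-cluster weight
        v_c = E[idx, :].sum()              # cluster volume
        s = M + gamma * v_c
        Q += w_c / s - (v_c / s) ** 2
    return Q

# Increasing an edge inside a cluster (a C-consistent improvement)
# should not decrease the quality when gamma >= 2.
E = np.zeros((4, 4))
E[0, 1] = E[1, 0] = 1.0
E[2, 3] = E[3, 2] = 1.0
E[1, 2] = E[2, 1] = 0.5
C = [[0, 1], [2, 3]]
E2 = E.copy()
E2[0, 1] = E2[1, 0] = 2.0                  # heavier within-cluster edge
print(adaptive_modularity(E2, C, 1.0, 2.0) >= adaptive_modularity(E, C, 1.0, 2.0))
```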
On clique graphs with large edge weight k, the effect of M becomes insignificant:
Q(G, D) ≈ Σ_{d∈D} … /(γvd) … (where ε depends on k and M).
The clique graph with edge weight k of a partition C of V is (V, E) where E(i, j) = k · 1[i ∼C j].
38 / 49
In the limits of M, adaptive scale modularity is equivalent to other modularity variants:
Q0,γ(G, C) ∝ Σ_{c∈C} (wc/vc − 1/γ), i.e. normalized cut.
Q∞,γ(G, C) ∝ Σ_{c∈C} wc, i.e. unnormalized cut.
39 / 49
Conclusion
40 / 49
notions.
41 / 49
problems.
clustering.
42 / 49
43 / 49
44 / 49
Take a simple graph: two clusters with within-cluster weight w, connected by an edge with weight b.
45 / 49
[Phase plots: the number of clusters found (1, 2 or 3) as a function of w and b (each ranging 10–50), for M ∈ {0, 10, 100, 1000} and γ ∈ {0, 1, 2, 10}.]
46 / 49
Optimizing quality functions in practice: divisive methods find the best cut and repeat;
agglomerative methods group nodes together.
47 / 49
Fast unfolding of communities in large networks (the Louvain method):
1. Move nodes into neighboring clusters to improve quality.
2. Repeat until a local maximum is reached.
3. Now cluster the clusters.
48 / 49
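Step 1 of the algorithm above, the local-moving pass, can be sketched as follows, using plain modularity as the quality function. This is a deliberately naive illustration (not the authors' code): it recomputes Q from scratch for every candidate move, whereas real implementations update Q incrementally:

```python
import numpy as np

def local_moving(E, labels):
    """Louvain-style local moving: greedily reassign each node to the
    neighboring cluster that most improves modularity, until no single
    move helps.  E is a symmetric weight matrix; labels an int array."""
    def Q(lab):                              # modularity of a labeling
        v_V = E.sum()
        q = 0.0
        for c in set(lab):
            idx = np.flatnonzero(lab == c)
            q += (E[np.ix_(idx, idx)].sum() / v_V
                  - (E[idx, :].sum() / v_V) ** 2)
        return q

    labels = labels.copy()
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            best, best_q = labels[i], Q(labels)
            # candidate clusters: those of i's neighbors
            for c in set(labels[np.flatnonzero(E[i] > 0)]):
                old = labels[i]
                labels[i] = c
                if Q(labels) > best_q + 1e-12:   # strictly better move
                    best, best_q = c, Q(labels)
                labels[i] = old
            if best != labels[i]:
                labels[i] = best
                improved = True
    return labels
```

Starting from singleton clusters on two triangles joined by one edge, the pass merges each triangle into a single cluster; step 3 of the full algorithm would then contract these clusters and repeat.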
49 / 49