An Impossibility Theorem
Seminar Algorithms
Kevin Chang
Eindhoven University of Technology
June 19, 2018
Table of Contents
◮ Motivation
◮ Definitions
◮ The Impossibility Theorem
◮ Centroid-Based Clustering and Consistency
◮ Relaxing the Properties
Motivation
◮ The Mapper algorithm needs a 'good' clustering algorithm.
Clustering:
◮ 'Clustering' cannot be precisely defined.
◮ Intuitively, it groups a set of objects that are 'similar'.
◮ It is unsupervised.
◮ There exists no universally good clustering algorithm.
◮ Every clustering algorithm assumes a certain model.
◮ e.g. k-means tends to generate hyperspherical clusters.
Example: [figures: two data sets, each shown clustered by k-means and by single-link]
The idea that there is no universal clustering algorithm is partially captured by the impossibility theorem:
◮ No single clustering algorithm simultaneously satisfies a set of basic, intuitive axioms of data clustering.
Definitions
◮ S is a set of n points.
◮ A distance function is any function d : S × S → R such that:
  ◮ For distinct i, j ∈ S, we have d(i, j) ≥ 0.
  ◮ d(i, j) = 0 iff i = j.
  ◮ d(i, j) = d(j, i).
◮ A clustering function is any function f that takes a distance function d on S and returns a partition Γ of S.
◮ Points are not assumed to belong to any ambient space.
◮ The sets in Γ will be called its clusters.
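The definitions above can be checked mechanically on a small instance. The following is a minimal sketch, not part of the original slides: the helper names `is_distance_function` and `is_partition`, the four-point set, and the distance table are all illustrative assumptions.

```python
from itertools import combinations

def is_distance_function(points, d):
    """Check the conditions from the definition on a finite point set."""
    for i in points:
        if d(i, i) != 0:
            return False
    for i, j in combinations(points, 2):
        # distinct points: strictly positive (d = 0 only when i == j), symmetric
        if d(i, j) <= 0 or d(i, j) != d(j, i):
            return False
    return True

def is_partition(points, clusters):
    """A partition: non-empty, pairwise disjoint sets whose union is S."""
    seen = set()
    for c in clusters:
        if not c or seen & c:
            return False
        seen |= c
    return seen == set(points)

# Hypothetical example data: no coordinates, only pairwise distances.
S = ["a", "b", "c", "d"]
table = {
    frozenset(pair): w
    for pair, w in [(("a", "b"), 1), (("a", "c"), 4), (("a", "d"), 5),
                    (("b", "c"), 4), (("b", "d"), 5), (("c", "d"), 2)]
}

def d(i, j):
    return 0 if i == j else table[frozenset((i, j))]

gamma = [{"a", "b"}, {"c", "d"}]          # a candidate output f(d)
not_a_partition = [{"a", "b"}, {"b", "c", "d"}]  # "b" appears twice
```

Because d is given only as a table of pairwise values, the sketch also illustrates that the points need not live in any ambient space.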
Example: [figures: a set of points S; a distance function on S; a partition of S consisting of 3 clusters]
Scale-Invariance
Axiom 1: Scale-Invariance
For any distance function d and any α > 0, we have f(d) = f(α · d).
◮ i.e. clustering functions should not have a built-in 'length scale'.
Richness
Let Range(f) denote the set of all partitions Γ such that f(d) = Γ for some distance function d.
Axiom 2: Richness
Range(f) is equal to the set of all partitions of S.
◮ i.e. every partition of S is a possible output.
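To get a feeling for how demanding Richness is, one can enumerate all partitions of a small set. The sketch below is an illustration only; the generator name `all_partitions` is an assumption. The counts it produces are the Bell numbers.

```python
def all_partitions(points):
    """Yield every partition of the list `points` as a list of lists."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for smaller in all_partitions(rest):
        # put `first` into each existing cluster in turn ...
        for idx in range(len(smaller)):
            yield smaller[:idx] + [[first] + smaller[idx]] + smaller[idx + 1:]
        # ... or into a new cluster of its own
        yield [[first]] + smaller

# Number of partitions of an n-point set for n = 1..5.
counts = [sum(1 for _ in all_partitions(list(range(n)))) for n in range(1, 6)]
```

A clustering function satisfying Richness must be able to produce every one of these partitions for a suitable choice of d.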
Consistency
◮ Let Γ be a partition of S, and d and d′ two distance functions on S.
◮ d′ is a Γ-transformation of d if
  1. for all i, j ∈ S belonging to the same cluster of Γ, we have d′(i, j) ≤ d(i, j);
  2. for all i, j ∈ S belonging to different clusters of Γ, we have d′(i, j) ≥ d(i, j).
Axiom 3: Consistency
Let d and d′ be two distance functions. If f(d) = Γ, and d′ is a Γ-transformation of d, then f(d′) = Γ.
◮ i.e. the clustering stays the same after reducing distances within clusters and enlarging distances between clusters.
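The two conditions of a Γ-transformation translate directly into a pairwise check. This is a minimal sketch under assumed names (`is_gamma_transformation`, `line_metric`); the points are placed on a line purely for convenience.

```python
from itertools import combinations

def cluster_of(partition):
    """Map each point to the index of its cluster."""
    return {p: idx for idx, c in enumerate(partition) for p in c}

def is_gamma_transformation(points, partition, d, d_new):
    """True if d_new shrinks within-cluster and grows between-cluster distances."""
    label = cluster_of(partition)
    for i, j in combinations(points, 2):
        if label[i] == label[j]:
            if d_new(i, j) > d(i, j):   # within-cluster distance grew
                return False
        elif d_new(i, j) < d(i, j):     # between-cluster distance shrank
            return False
    return True

def line_metric(pos):
    """Hypothetical helper: distance between points placed on a line."""
    return lambda i, j: abs(pos[i] - pos[j])

S = [0, 1, 2, 3]
gamma = [{0, 1}, {2, 3}]
d = line_metric([0, 1, 5, 6])
tighter = line_metric([0, 0.5, 6, 6.5])  # clusters tighter and further apart
looser = line_metric([0, 2, 5, 6])       # first cluster stretched
```

Consistency then says: if f(d) = Γ, the function must also return Γ on `tighter`; it makes no claim about `looser`.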
The Impossibility Theorem
Theorem 2.1
For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness, and Consistency.
Single-linkage
◮ Single-linkage is a family of clustering functions.
◮ Initialize each point as its own cluster.
◮ Repeatedly merge the pair of clusters whose distance to one another is minimum, until a stopping condition is reached.
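The merge loop described above can be sketched as follows. This is a naive illustration rather than an efficient implementation; the callback `should_stop` and the particular stopping rule used in the example (stop at two clusters) are assumptions made for the demo.

```python
from itertools import combinations

def single_linkage(points, d, should_stop):
    """Agglomerative single-linkage with a pluggable stopping condition.

    `should_stop(clusters, gap)` is a hypothetical callback that sees the
    current clusters and the distance of the next candidate merge.
    """
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > 1:
        # single-link distance of two clusters: their closest pair of members
        (c1, c2), gap = min(
            (((x, y), min(d(i, j) for i in x for j in y))
             for x, y in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        if should_stop(clusters, gap):
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)

pos = [0, 1, 2, 10, 11]                     # points on a line (example data)
S = list(range(len(pos)))
d = lambda i, j: abs(pos[i] - pos[j])
result = single_linkage(S, d, lambda clusters, gap: len(clusters) <= 2)
```

Different choices of `should_stop` give different members of the single-linkage family.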
Examples of Impossibility
◮ Equivalently, single-linkage adds edges between pairs of points in order of increasing distance; the clusters are the connected components of the resulting graph.
◮ k-cluster stopping condition: stop adding edges when there are k connected components.
◮ For any k ≥ 1 and n ≥ k, this stopping condition satisfies Scale-Invariance and Consistency.
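A small sketch of the k-cluster stopping condition, using the edge-adding view. The function names and the example data are assumptions; the code recomputes components after every edge, which is wasteful but keeps the logic visible. The example checks Scale-Invariance on one instance only; it is not a proof.

```python
from itertools import combinations

def components(points, edges):
    """Connected components of the graph (points, edges) via union-find."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for p in points:
        groups.setdefault(find(p), set()).add(p)
    return {frozenset(g) for g in groups.values()}

def k_cluster(points, d, k):
    """Add edges by increasing weight; stop once k components remain."""
    chosen = []
    for edge in sorted(combinations(points, 2), key=lambda e: d(*e)):
        if len(components(points, chosen)) <= k:
            break
        chosen.append(edge)
    return components(points, chosen)

pos = [0, 1, 3, 7, 8]                       # example data on a line
S = list(range(len(pos)))
d = lambda i, j: abs(pos[i] - pos[j])
d_scaled = lambda i, j: 10 * d(i, j)        # alpha = 10
result = k_cluster(S, d, 2)
result_scaled = k_cluster(S, d_scaled, 2)
```

Since the output always has exactly k clusters, partitions with any other number of clusters are never produced, so Richness is the axiom that fails here.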
◮ distance-r stopping condition: only add edges of weight at most r.
◮ For any r > 0 and any n ≥ 2, this stopping condition satisfies Richness and Consistency.
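The distance-r condition amounts to taking connected components of the graph whose edges are the pairs at distance at most r. The sketch below (function name and data are assumptions) shows on one instance why Scale-Invariance fails: the fixed threshold r acts as a built-in length scale.

```python
def threshold_clusters(points, d, r):
    """Components of the graph whose edges are the pairs with d(i, j) <= r."""
    unvisited, clusters = set(points), set()
    while unvisited:
        stack = [unvisited.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if d(i, j) <= r}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.add(frozenset(cluster))
    return clusters

pos = [0, 1, 5, 6]                          # example data on a line
S = list(range(len(pos)))
d = lambda i, j: abs(pos[i] - pos[j])
d_scaled = lambda i, j: 3 * d(i, j)         # alpha = 3

original = threshold_clusters(S, d, r=2)
scaled = threshold_clusters(S, d_scaled, r=2)
```

With r = 2 the original distances give two clusters, while after scaling by 3 every pair is further apart than r and each point ends up alone.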
◮ scale-α stopping condition: let p∗ denote the maximum pairwise distance; only add edges of weight at most αp∗.
◮ For any positive α < 1 and n ≥ 3, this stopping condition satisfies Scale-Invariance and Richness.
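For the scale-α condition the threshold moves with the data, which restores Scale-Invariance but breaks Consistency. The following sketch uses an assumed three-point example with distances given as a table (the function name and all numbers are illustrative): enlarging one between-cluster distance raises p∗, which raises the threshold and pulls another pair below it.

```python
from itertools import combinations

def scale_alpha_clusters(points, dist, alpha):
    """Link every pair with distance <= alpha * (maximum pairwise distance).

    `dist` maps frozenset({i, j}) to the distance between i and j.
    """
    threshold = alpha * max(dist.values())
    clusters = [{p} for p in points]
    for i, j in combinations(points, 2):
        if dist[frozenset((i, j))] <= threshold:
            ci = next(c for c in clusters if i in c)
            cj = next(c for c in clusters if j in c)
            if ci is not cj:          # merge the two components
                clusters.remove(cj)
                ci |= cj
    return {frozenset(c) for c in clusters}

S = ["A", "B", "C"]
AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
d = {AB: 1, AC: 3, BC: 3}
d_scaled = {pair: 10 * w for pair, w in d.items()}
# A Gamma-transformation of d for Gamma = {{A, B}, {C}}: only d(B, C) grows.
d_transformed = {AB: 1, AC: 3, BC: 10}

gamma = scale_alpha_clusters(S, d, alpha=0.5)
gamma_scaled = scale_alpha_clusters(S, d_scaled, alpha=0.5)
gamma_transformed = scale_alpha_clusters(S, d_transformed, alpha=0.5)
```

Here the threshold goes from 1.5 to 5 after the transformation, so the pair (A, C) at distance 3 becomes an edge and the two clusters collapse into one.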
The Impossibility Theorem Proof
First some notions.
[figure: a partition Γ and a refinement Γ′]
◮ A partition Γ′ is a refinement of a partition Γ if for every set C′ ∈ Γ′, there is a set C ∈ Γ such that C′ ⊆ C.
◮ A collection of partitions is an antichain if it does not contain two distinct partitions such that one is a refinement of the other.
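Both notions are easy to state as predicates on sets of sets. A minimal sketch, with partitions represented as sets of frozensets (a representation chosen here for illustration):

```python
from itertools import permutations

def is_refinement(fine, coarse):
    """True if every cluster of `fine` lies inside some cluster of `coarse`."""
    return all(any(c_fine <= c for c in coarse) for c_fine in fine)

def is_antichain(partitions):
    """True if no partition in the collection refines a different one."""
    return not any(
        p != q and is_refinement(p, q) for p, q in permutations(partitions, 2)
    )

singletons = {frozenset({1}), frozenset({2}), frozenset({3})}
whole = {frozenset({1, 2, 3})}
p1 = {frozenset({1, 2}), frozenset({3})}
p2 = {frozenset({1}), frozenset({2, 3})}
```

In this example `singletons` refines every other partition, while `p1` and `p2` are incomparable and therefore form an antichain.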
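The impossibility result follows from:
Theorem 3.1
If a clustering function f satisfies Scale-Invariance and Consistency, then Range(f) is an antichain.
◮ For n ≥ 2 the set of all partitions of S is not an antichain (the partition into singletons is a refinement of the partition with a single cluster), so such an f cannot also satisfy Richness.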
Some more notions needed to prove Theorem 3.1:
◮ For a partition Γ, a distance function d (a, b)-conforms to Γ if
  ◮ for all pairs of points i, j that belong to the same cluster of Γ, we have d(i, j) ≤ a,
  ◮ while for all pairs of points i, j that belong to different clusters of Γ, we have d(i, j) ≥ b.
◮ Example: [figure: a partition Γ and a distance function d that (3, 5)-conforms to Γ]
◮ The pair (a, b) is Γ-forcing if, for all distance functions d that (a, b)-conform to Γ, we have f(d) = Γ.
An example of a pair (a, b) that is not Γ-forcing:
◮ [figure: a partition Γ and a distance function d that (a, b)-conforms to Γ]
◮ But f outputs a different partition Θ, so the pair (a, b) is not Γ-forcing.
Proof of Theorem 3.1:
◮ If a clustering function f satisfies Consistency, then for any partition Γ ∈ Range(f), there exist positive real numbers a < b such that the pair (a, b) is Γ-forcing:
  ◮ Since Γ ∈ Range(f), there is a distance function d with f(d) = Γ.
  ◮ Let $a_{\min}$ be the minimum distance between a pair of distinct points under d, and let $b_{\max}$ be the maximum distance between a pair of points.
  ◮ Choose a < b with $a \le a_{\min}$ and $b \ge b_{\max}$.
  ◮ Then any distance function d′ that (a, b)-conforms to Γ must be a Γ-transformation of d.
  ◮ And so, by the Consistency property, f(d′) = Γ.
[figure: partitions $\Gamma_1$ and $\Gamma_0$, annotated with $a_0$, $a_1$, $b_1$, $b_0$, $a_2$, $\epsilon$]
◮ Assume there exist distinct partitions $\Gamma_0, \Gamma_1 \in \mathrm{Range}(f)$ such that $\Gamma_0$ is a refinement of $\Gamma_1$.
◮ Let $(a_0, b_0)$ be a $\Gamma_0$-forcing pair, and let $(a_1, b_1)$ be a $\Gamma_1$-forcing pair. (Existence proved above.)
◮ Let $0 < a_2 \le a_1$, and let $\epsilon$ satisfy $0 < \epsilon < a_0 a_2 b_0^{-1}$.
◮ Construct a distance function $d$ as follows:
  ◮ For i, j in the same cluster of $\Gamma_0$: $d(i, j) \le \epsilon$
  ◮ For i, j in different clusters of $\Gamma_0$: $d(i, j) \ge a_2$
  ◮ For i, j in the same cluster of $\Gamma_1$: $d(i, j) \le a_1$
  ◮ For i, j in different clusters of $\Gamma_1$: $d(i, j) \ge b_1$
◮ $d$ $(a_1, b_1)$-conforms to $\Gamma_1$, and thus $f(d) = \Gamma_1$.
◮ Now set $\alpha = b_0 a_2^{-1}$ and $d' = \alpha \cdot d$.
◮ By Scale-Invariance we have $f(d') = f(d) = \Gamma_1$.
◮ But for points i, j in the same cluster of $\Gamma_0$: $d'(i, j) \le \epsilon \alpha = \epsilon b_0 a_2^{-1} < a_0$.
◮ And for points i, j not in the same cluster of $\Gamma_0$: $d'(i, j) \ge a_2 \alpha = a_2 b_0 a_2^{-1} = b_0$.
◮ Thus $d'$ $(a_0, b_0)$-conforms to $\Gamma_0$, and so $f(d') = \Gamma_0$.
◮ Contradiction, as $\Gamma_0 \ne \Gamma_1$.
Centroid-Based Clustering and Consistency
(k, g)-centroid clustering is a widely used approach to clustering:
◮ Choose the set of k centroid points T ⊆ S such that the objective function $\Lambda_d^g(T) = \sum_{i \in S} g(d(i, T))$ is minimized, where
  ◮ $d(i, T) = \min_{j \in T} d(i, j)$;
  ◮ g is any continuous, non-decreasing, unbounded function $g : \mathbb{R}^+ \to \mathbb{R}^+$.
◮ Clusters: assign each point in S to its nearest centroid.
◮ e.g.:
  ◮ k-median is obtained by setting g(d) = d.
  ◮ k-means is obtained by setting $g(d) = d^2$.
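On small inputs the objective can be minimized by brute force over all k-subsets of S. This is a sketch for illustration only (exponential in k, assumed function name and example data), not a practical k-median or k-means solver; note that centroids are restricted to points of S, as in the definition above.

```python
from itertools import combinations

def centroid_clustering(points, d, k, g):
    """Brute-force (k, g)-centroid clustering: try every k-subset T of S."""
    def cost(T):
        # objective: sum over all points of g(distance to nearest centroid)
        return sum(g(min(d(i, j) for j in T)) for i in points)

    best = min(combinations(points, k), key=cost)
    clusters = {t: set() for t in best}
    for i in points:
        nearest = min(best, key=lambda t: d(i, t))
        clusters[nearest].add(i)
    return set(best), {frozenset(c) for c in clusters.values()}

pos = [0, 1, 2, 10, 11, 12]                 # example data on a line
S = list(range(len(pos)))
d = lambda i, j: abs(pos[i] - pos[j])

centers_median, clusters_median = centroid_clustering(S, d, 2, lambda x: x)
centers_means, clusters_means = centroid_clustering(S, d, 2, lambda x: x ** 2)
```

On this well-separated example both choices of g pick the middle point of each group as centroid and return the same two clusters.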
Theorem 4.1
For every k ≥ 2 and every function g chosen as above, and for n sufficiently large relative to k, the (k, g)-centroid clustering function does not satisfy the Consistency property.
Proof sketch: [figure: a Γ-transformation]
Relaxing the Properties
You can relax properties according to your problem.
◮ k-Richness
◮ Refinement-Consistency
Summary
◮ The Mapper algorithm needs a clustering algorithm.
◮ For n ≥ 2, there exists no clustering function that simultaneously satisfies:
  ◮ Scale-Invariance
  ◮ Richness
  ◮ Consistency