An Impossibility Theorem Seminar Algorithms Kevin Chang Eindhoven - - PowerPoint PPT Presentation



slide-1
SLIDE 1

An Impossibility Theorem

Seminar Algorithms Kevin Chang

Eindhoven University of Technology

June 19, 2018

slide-2
SLIDE 2

Contents

◮ Motivation
◮ Definitions
◮ The Impossibility Theorem
◮ Centroid-Based Clustering and Consistency
◮ Relaxing the Properties

slide-4
SLIDE 4

Motivation

slide-5
SLIDE 5

Motivation

◮ The Mapper algorithm needs a ‘good’ clustering algorithm.

slide-8
SLIDE 8

Motivation

Clustering:

◮ ‘Clustering’ cannot be precisely defined.
◮ Intuitively, clustering groups a set of objects so that objects in the same group are ‘similar’.
◮ Clustering is unsupervised: no labels are given.

Example:

slide-9
SLIDE 9

Motivation

◮ There exists no universally good clustering algorithm.
◮ Every clustering algorithm assumes a certain model.
◮ e.g. k-means tends to produce hyperspherical clusters.

slide-10
SLIDE 10

Motivation

Example:

slide-11
SLIDE 11

Motivation

Example:

k-means

slide-12
SLIDE 12

Motivation

Example:

Single-link

slide-13
SLIDE 13

Motivation

Example:

slide-14
SLIDE 14

Motivation

Example:

k-means

slide-15
SLIDE 15

Motivation

Example:

Single-link

slide-16
SLIDE 16

Motivation

The idea that there is no universal clustering algorithm is partially captured by the impossibility theorem:

◮ There is no single clustering algorithm that simultaneously satisfies a set of basic, intuitive axioms of data clustering.

slide-20
SLIDE 20

Definitions

◮ S is a set of n points.
◮ A distance function is any function d : S × S → ℝ such that:
  ◮ for distinct i, j ∈ S, we have d(i, j) ≥ 0;
  ◮ d(i, j) = 0 iff i = j;
  ◮ d(i, j) = d(j, i).
◮ A clustering function is any function f that takes a distance function d and returns a partition Γ of S.
◮ Points are not assumed to belong to any ambient space.
◮ The sets in Γ are called its clusters.
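A minimal Python sketch of these definitions, assuming points are labelled 0, …, n−1 and a distance function is stored as a symmetric n × n matrix (the slides fix no representation; the names `Distance`, `Partition`, `is_distance_function` and `partition` are hypothetical). The three conditions above do not include the triangle inequality, so the check below does not test it.

```python
from typing import Callable, FrozenSet, Sequence

# Assumed representation: d[i][j] is the distance between points i and j.
Distance = Sequence[Sequence[float]]
# A partition of S: disjoint, non-empty clusters covering S.
Partition = FrozenSet[FrozenSet[int]]
# A clustering function maps a distance function to a partition.
ClusteringFunction = Callable[[Distance], Partition]


def is_distance_function(d: Distance) -> bool:
    """Check the three conditions from the slide (no triangle inequality)."""
    n = len(d)
    if any(len(row) != n for row in d):
        return False
    for i in range(n):
        for j in range(n):
            if d[i][j] != d[j][i]:        # symmetry
                return False
            if i == j and d[i][j] != 0:   # d(i, i) = 0
                return False
            if i != j and d[i][j] <= 0:   # d(i, j) = 0 only if i = j
                return False
    return True


def partition(*clusters: Sequence[int]) -> Partition:
    """Convenience constructor, e.g. partition([0, 1], [2])."""
    return frozenset(frozenset(c) for c in clusters)
```

For example, `[[0, 1, 10], [1, 0, 1], [10, 1, 0]]` passes the check even though it violates the triangle inequality.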

slide-21
SLIDE 21

Definitions

Example:

Set of points

slide-22
SLIDE 22

Definitions

Example:

Distance function

slide-23
SLIDE 23

Definitions

Example:

A partition of S consisting of 3 clusters

slide-25
SLIDE 25

Scale-Invariance

Axiom 1: Scale-Invariance

For any distance function d and any α > 0, we have f(d) = f(α · d).

◮ i.e. clustering functions should not have a built-in ‘length scale’.

slide-26
SLIDE 26

Scale-Invariance

slide-27
SLIDE 27

Richness

Let Range(f) denote the set of all partitions Γ such that f(d) = Γ for some distance function d.

Axiom 2: Richness

Range(f) is equal to the set of all partitions of S.

◮ i.e. every partition of S is a possible output.
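Richness asks Range(f) to contain every partition of S, and there are many of them: the number of partitions of an n-point set is the Bell number B_n (1, 2, 5, 15, 52 for n = 1, …, 5). The hypothetical helper below enumerates them recursively, placing the first point either into an existing cluster of a partition of the remaining points or into a new singleton cluster.

```python
from typing import FrozenSet, Iterator, List


def all_partitions(points: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `points` as a list of clusters."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for smaller in all_partitions(rest):
        # Put `first` into each existing cluster in turn...
        for k in range(len(smaller)):
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]
        # ...or into a new singleton cluster.
        yield [[first]] + smaller


def as_frozen(p: List[List[int]]) -> FrozenSet[FrozenSet[int]]:
    """Order-independent form of a partition, for comparisons."""
    return frozenset(frozenset(c) for c in p)
```

A rich clustering function on 5 points must therefore be able to produce 52 different outputs.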

slide-28
SLIDE 28

Richness

slide-31
SLIDE 31

Consistency

◮ Let Γ be a partition of S, and let d and d′ be two distance functions on S.

◮ d′ is a Γ-transformation of d if
  1. for all i, j ∈ S belonging to the same cluster of Γ, we have d′(i, j) ≤ d(i, j);
  2. for all i, j ∈ S belonging to different clusters of Γ, we have d′(i, j) ≥ d(i, j).

Axiom 3: Consistency

Let d and d′ be two distance functions. If f(d) = Γ and d′ is a Γ-transformation of d, then f(d′) = Γ.

◮ i.e. the clustering stays the same after shrinking distances within clusters and enlarging distances between clusters.
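The two conditions of a Γ-transformation can be checked pair by pair. A sketch, assuming distance functions stored as symmetric n × n matrices over points 0, …, n−1 and partitions as frozensets of frozensets (the names `is_gamma_transformation` and `consistency_counterexample` are hypothetical). A pair (d, d′) on which f changes its output although d′ is an f(d)-transformation of d witnesses a violation of Consistency.

```python
from typing import Callable, Dict, FrozenSet, Sequence

Distance = Sequence[Sequence[float]]
Partition = FrozenSet[FrozenSet[int]]


def cluster_of(gamma: Partition) -> Dict[int, FrozenSet[int]]:
    """Map each point to the cluster of gamma that contains it."""
    return {p: c for c in gamma for p in c}


def is_gamma_transformation(d: Distance, d2: Distance, gamma: Partition) -> bool:
    """True if d2 is a gamma-transformation of d."""
    owner = cluster_of(gamma)
    n = len(d)
    for i in range(n):
        for j in range(i + 1, n):          # symmetric, so check each pair once
            if owner[i] == owner[j]:
                if d2[i][j] > d[i][j]:     # condition 1: within a cluster, may only shrink
                    return False
            elif d2[i][j] < d[i][j]:       # condition 2: across clusters, may only grow
                return False
    return True


def consistency_counterexample(f: Callable[[Distance], Partition],
                               d: Distance, d2: Distance) -> bool:
    """True if (d, d2) shows that f violates Consistency."""
    gamma = f(d)
    return is_gamma_transformation(d, d2, gamma) and f(d2) != gamma
```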

slide-32
SLIDE 32

Consistency

slide-33
SLIDE 33

The Impossibility Theorem

Theorem 2.1

For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness, and Consistency.

slide-34
SLIDE 34

Single-linkage

◮ Single-linkage is a family of clustering functions.
◮ Initialize each point as its own cluster.
◮ Repeatedly merge the pair of clusters whose distance to one another is minimal (the distance between two clusters being the minimum distance between their points), until a stopping condition is reached.
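Merging the two closest clusters under the single-link distance is equivalent to scanning all pairs in order of increasing distance and joining their endpoints, as in Kruskal's algorithm; the stopping condition decides when to stop. A sketch under that view, where the stopping condition is a hypothetical callback that receives the current number of clusters and the weight of the next edge (ties between equal distances are broken by point index, an arbitrary choice):

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Set

Distance = Sequence[Sequence[float]]
Partition = FrozenSet[FrozenSet[int]]
# Hypothetical stopping-condition interface:
# (number of clusters, weight of next edge) -> True to stop merging.
StopRule = Callable[[int, float], bool]


def single_linkage(d: Distance, stop: StopRule) -> Partition:
    """Single-linkage clustering via Kruskal-style edge additions."""
    n = len(d)
    parent = list(range(n))                 # union-find forest, one tree per cluster

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n                          # initially every point is its own cluster
    edges = sorted((d[i][j], i, j) for i, j in combinations(range(n), 2))
    for w, i, j in edges:
        if stop(components, w):
            break
        ri, rj = find(i), find(j)
        if ri != rj:                        # the edge joins two clusters: merge them
            parent[ri] = rj
            components -= 1
    clusters: Dict[int, Set[int]] = {}
    for p in range(n):
        clusters.setdefault(find(p), set()).add(p)
    return frozenset(frozenset(c) for c in clusters.values())
```

For example, `lambda comps, w: comps == k` gives the k-cluster stopping condition and `lambda comps, w: w > r` the distance-r condition. For points at positions 0, 1, 3, 10 on a line, k = 2 yields {{0, 1, 2}, {3}}.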

slide-40
SLIDE 40

Single-linkage

Example:

slide-44
SLIDE 44

Examples of Impossibility

◮ k-cluster stopping condition: stop adding edges when there are k connected components.

◮ For any k ≥ 1 and n ≥ k, this stopping condition satisfies Scale-Invariance and Consistency, but not Richness (for n ≥ 2): every output has exactly k clusters.

slide-48
SLIDE 48

Examples of Impossibility

◮ distance-r stopping condition: only add edges of weight at most r.

◮ For any r > 0 and any n ≥ 2, this stopping condition satisfies Richness and Consistency, but not Scale-Invariance: scaling d can push edge weights above or below r.
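With the distance-r stopping condition, single-linkage returns exactly the connected components of the graph that keeps only the pairs at distance at most r. The sketch below (hypothetical names `threshold_clusters` and `scale`) computes these components by depth-first search and illustrates the failure of Scale-Invariance: for points at positions 0, 1, 3, 10 on a line and r = 1.5 the output is {{0, 1}, {2}, {3}}, but after doubling every distance all points become singletons.

```python
from typing import FrozenSet, List, Sequence

Distance = Sequence[Sequence[float]]
Partition = FrozenSet[FrozenSet[int]]


def threshold_clusters(d: Distance, r: float) -> Partition:
    """Connected components of the graph with an edge for every pair with d(i, j) <= r."""
    n = len(d)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:                      # depth-first search from `start`
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if not seen[j] and d[i][j] <= r:
                    seen[j] = True
                    stack.append(j)
        clusters.append(frozenset(comp))
    return frozenset(clusters)


def scale(d: Distance, alpha: float) -> List[List[float]]:
    """The distance function alpha * d."""
    return [[alpha * x for x in row] for row in d]
```

Richness, on the other hand, is easy to witness: for any target partition, put within-cluster distances at 1 and between-cluster distances at 2, and r = 1.5 returns exactly that partition.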

slide-52
SLIDE 52

Examples of Impossibility

◮ scale-α stopping condition: let p∗ denote the maximum pairwise distance. Only add edges of weight at most αp∗.

◮ For any positive α < 1 and any n ≥ 3, this stopping condition satisfies Scale-Invariance and Richness, but not Consistency: enlarging a between-cluster distance can increase p∗ and hence the threshold αp∗.
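A sketch of the scale-α condition (hypothetical name `scale_alpha_clusters`), using union-find on the pairs at distance at most αp∗. With α = 0.5 and d(0, 1) = 1, d(0, 2) = d(1, 2) = 4, the output is {{0, 1}, {2}}, and scaling d leaves it unchanged. Raising d(0, 2) to 10 is a Γ-transformation for that output (only a between-cluster distance grows), yet p∗ becomes 10, the threshold becomes 5, the pair (1, 2) at distance 4 is now added, and all three points end up in one cluster, so Consistency fails.

```python
from typing import Dict, FrozenSet, Sequence, Set

Distance = Sequence[Sequence[float]]
Partition = FrozenSet[FrozenSet[int]]


def scale_alpha_clusters(d: Distance, alpha: float) -> Partition:
    """Single-linkage with the scale-alpha stopping condition: join every pair at
    distance <= alpha * p*, where p* is the maximum pairwise distance."""
    n = len(d)
    threshold = alpha * max(max(row) for row in d)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] <= threshold:
                parent[find(i)] = find(j)   # union the two clusters
    groups: Dict[int, Set[int]] = {}
    for p in range(n):
        groups.setdefault(find(p), set()).add(p)
    return frozenset(frozenset(g) for g in groups.values())
```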

slide-53
SLIDE 53

The Impossibility Theorem Proof Intuition

slide-55
SLIDE 55

The Impossibility Theorem Proof

First, some notions.

Figure: partition Γ and partition Γ′.

◮ A partition Γ′ is a refinement of a partition Γ if for every set C′ ∈ Γ′ there is a set C ∈ Γ such that C′ ⊆ C.

◮ A collection of partitions is an antichain if it does not contain two distinct partitions such that one is a refinement of the other.
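Both notions are easy to test for small collections. A sketch with partitions as frozensets of frozensets (the names `is_refinement` and `is_antichain` are hypothetical):

```python
from itertools import permutations
from typing import FrozenSet, Iterable

Partition = FrozenSet[FrozenSet[int]]


def is_refinement(fine: Partition, coarse: Partition) -> bool:
    """True if every cluster of `fine` lies inside some cluster of `coarse`."""
    return all(any(c <= big for big in coarse) for c in fine)


def is_antichain(partitions: Iterable[Partition]) -> bool:
    """True if no partition in the collection refines a different one."""
    ps = list(set(partitions))            # distinct partitions only
    return not any(is_refinement(p, q) for p, q in permutations(ps, 2))
```

For instance, the partition into singletons refines every partition, so a collection containing it and any other partition is never an antichain, while the three 2-cluster partitions of {0, 1, 2} form one.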

slide-56
SLIDE 56

The Impossibility Theorem Proof

The impossibility result follows from:

Theorem 3.1

If a clustering function f satisfies Scale-Invariance and Consistency, then Range(f) is an antichain.

slide-58
SLIDE 58

The Impossibility Theorem Proof

Some more notions needed to prove Theorem 3.1:

◮ For a partition Γ, a distance function d (a, b)-conforms to Γ if
  ◮ for all pairs of points i, j that belong to the same cluster of Γ, we have d(i, j) ≤ a;
  ◮ for all pairs of points i, j that belong to different clusters of Γ, we have d(i, j) ≥ b.

Example:

A partition Γ and a distance function d that (3, 5)-conforms to Γ: within-cluster distances are at most 3, between-cluster distances at least 5.

slide-59
SLIDE 59

The Impossibility Theorem Proof

◮ The pair (a, b) is Γ-forcing if, for all distance functions d that (a, b)-conform to Γ, we have f(d) = Γ.
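Checking (a, b)-conformance is direct; Γ-forcing quantifies over infinitely many distance functions, so code can only search for counterexamples. The sketch below (hypothetical names; it assumes Γ covers the points 0, …, n−1, and the sampling ranges [a/10, a] and [b, 10b] are arbitrary choices) samples conforming distance functions and reports whether f ever returns something other than Γ. Finding one shows that (a, b) is not Γ-forcing; finding none proves nothing.

```python
import random
from typing import Callable, FrozenSet, List, Sequence

Distance = Sequence[Sequence[float]]
Partition = FrozenSet[FrozenSet[int]]


def conforms(d: Distance, gamma: Partition, a: float, b: float) -> bool:
    """True if d (a, b)-conforms to gamma."""
    owner = {p: c for c in gamma for p in c}
    n = len(d)
    for i in range(n):
        for j in range(i + 1, n):
            same = owner[i] == owner[j]
            if same and d[i][j] > a:           # within a cluster: at most a
                return False
            if not same and d[i][j] < b:       # across clusters: at least b
                return False
    return True


def random_conforming(gamma: Partition, a: float, b: float,
                      rng: random.Random) -> List[List[float]]:
    """Sample one distance function that (a, b)-conforms to gamma."""
    owner = {p: c for c in gamma for p in c}
    n = len(owner)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if owner[i] == owner[j]:
                x = rng.uniform(a / 10, a)
            else:
                x = rng.uniform(b, 10 * b)
            d[i][j] = d[j][i] = x
    return d


def refutes_forcing(f: Callable[[Distance], Partition], gamma: Partition,
                    a: float, b: float, trials: int = 200, seed: int = 0) -> bool:
    """Randomized search for a conforming d with f(d) != gamma.
    Returning False does NOT prove that (a, b) is gamma-forcing."""
    rng = random.Random(seed)
    return any(f(random_conforming(gamma, a, b, rng)) != gamma for _ in range(trials))
```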

slide-61
SLIDE 61

The Impossibility Theorem Proof

An example that is not Γ-forcing for (a,b):

Figure: a partition Γ and a distance function d that (a, b)-conforms to Γ.

slide-62
SLIDE 62

The Impossibility Theorem Proof

An example that is not Γ-forcing for (a,b):

But f outputs a different partition Θ, so the pair (a, b) is not Γ-forcing.

slide-63
SLIDE 63

The Impossibility Theorem Proof

Theorem 3.1

If a clustering function f satisfies Scale-Invariance and Consistency, then Range(f) is an antichain.

slide-66
SLIDE 66

The Impossibility Theorem Proof

Proof:

◮ If a clustering function f satisfies Consistency, then for any partition Γ ∈ Range(f), there exist positive real numbers a < b such that the pair (a, b) is Γ-forcing:

Since Γ ∈ Range(f), there is a distance function d with f(d) = Γ. Let a_min be the minimum distance between any pair of points under d, and let b_max be the maximum distance between any pair of points under d.

slide-67
SLIDE 67

The Impossibility Theorem Proof

Choose positive numbers a ≤ a_min and b ≥ b_max with a < b.

slide-70
SLIDE 70

The Impossibility Theorem Proof

Then any distance function d′ that (a, b)-conforms to Γ is a Γ-transformation of d: for i, j in the same cluster of Γ, d′(i, j) ≤ a ≤ a_min ≤ d(i, j), and for i, j in different clusters of Γ, d′(i, j) ≥ b ≥ b_max ≥ d(i, j). Hence, by Consistency, f(d′) = Γ.

slide-74
SLIDE 74

The Impossibility Theorem Proof

Figure: partitions Γ1 and Γ0, with the values a0, a1, b1, b0, a2 and ε marked.

◮ Assume there exist distinct partitions Γ0, Γ1 ∈ Range(f) such that Γ0 is a refinement of Γ1.

◮ Let (a0, b0) be a Γ0-forcing pair, and let (a1, b1) be a Γ1-forcing pair (their existence was proved above).

◮ Let 0 < a2 ≤ a1, and choose ε with 0 < ε < a0 · a2 / b0.

◮ Construct a distance function d as follows (these requirements are compatible, since ε < a2 ≤ a1 < b1):
  ◮ for i, j in the same cluster of Γ0: d(i, j) ≤ ε;
  ◮ for i, j in different clusters of Γ0: d(i, j) ≥ a2;
  ◮ for i, j in the same cluster of Γ1: d(i, j) ≤ a1;
  ◮ for i, j in different clusters of Γ1: d(i, j) ≥ b1.

slide-81
SLIDE 81

The Impossibility Theorem Proof

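◮ d (a1, b1)-conforms to Γ1, and thus f(d) = Γ1.
◮ Now set α = b0 / a2 and d′ = α · d.
◮ By Scale-Invariance, f(d′) = f(d) = Γ1.
◮ But for points i, j in the same cluster of Γ0:
  d′(i, j) ≤ εα = ε · b0 / a2 < a0.
◮ And for points i, j not in the same cluster of Γ0:
  d′(i, j) ≥ a2 · α = a2 · b0 / a2 = b0.
◮ Thus d′ (a0, b0)-conforms to Γ0, and so f(d′) = Γ0.
◮ Contradiction, as Γ0 ≠ Γ1. Hence Range(f) is an antichain.
◮ Theorem 2.1 follows: for n ≥ 2 the set of all partitions of S is not an antichain (the partition into singletons refines the partition into a single cluster), so Richness cannot hold together with Scale-Invariance and Consistency.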
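The construction can be checked numerically on a small instance. The forcing pairs below, (a0, b0) = (1, 3) and (a1, b1) = (2, 5), are hypothetical values chosen only for illustration (for an actual f they would come from the forcing-pair lemma), with Γ0 = {{0, 1}, {2}, {3}} refining Γ1 = {{0, 1, 2}, {3}}, a2 = a1 = 2 and ε = 1/2 < a0 · a2 / b0 = 2/3. The hypothetical helper `proof_distance` uses the boundary values ε, a2 and b1 allowed by the four conditions. Exact fractions avoid rounding at the boundary d′(i, j) = b0.

```python
from fractions import Fraction as F
from typing import FrozenSet, List

Partition = FrozenSet[FrozenSet[int]]


def conforms(d, gamma: Partition, a, b) -> bool:
    """True if d (a, b)-conforms to gamma."""
    owner = {p: c for c in gamma for p in c}
    n = len(d)
    return all((d[i][j] <= a) if owner[i] == owner[j] else (d[i][j] >= b)
               for i in range(n) for j in range(i + 1, n))


def proof_distance(g0: Partition, g1: Partition, eps, a2, b1) -> List[List[F]]:
    """eps inside Gamma0-clusters, a2 between Gamma0-clusters inside one
    Gamma1-cluster, b1 across Gamma1-clusters."""
    o0 = {p: c for c in g0 for p in c}
    o1 = {p: c for c in g1 for p in c}
    n = len(o0)
    d = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if o0[i] == o0[j]:
                d[i][j] = eps
            elif o1[i] == o1[j]:
                d[i][j] = a2
            else:
                d[i][j] = b1
    return d


# Hypothetical instance: Gamma0 refines Gamma1.
g0 = frozenset({frozenset({0, 1}), frozenset({2}), frozenset({3})})
g1 = frozenset({frozenset({0, 1, 2}), frozenset({3})})
a0, b0, a1, b1 = F(1), F(3), F(2), F(5)    # assumed forcing pairs
a2 = a1
eps = F(1, 2)                              # 0 < eps < a0 * a2 / b0 = 2/3

d = proof_distance(g0, g1, eps, a2, b1)
alpha = b0 / a2                            # = 3/2
d_scaled = [[alpha * x for x in row] for row in d]
print(conforms(d, g1, a1, b1), conforms(d_scaled, g0, a0, b0))  # True True
```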

slide-86
SLIDE 86

Centroid-Based Clustering and Consistency

(k, g)-centroid-based clustering is a widely used approach to clustering:

◮ Choose a set T ⊆ S of k centroid points such that the objective function
  Λ_d^g(T) = Σ_{i∈S} g(d(i, T))
  is minimized, where
  ◮ d(i, T) = min_{j∈T} d(i, j);
  ◮ g : ℝ⁺ → ℝ⁺ is any continuous, non-decreasing, unbounded function.
◮ Clusters: assign each point in S to its nearest centroid.
◮ e.g.:
  ◮ k-median is obtained by setting g(d) = d;
  ◮ k-means is obtained by setting g(d) = d².
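An exhaustive sketch of (k, g)-centroid clustering for tiny inputs (hypothetical name `centroid_clustering`): it tries every k-subset T ⊆ S, so it is exponential in k and only illustrates the definition. As on the slide, centroids are points of S; ties are broken in favour of the first subset and the lowest-indexed centroid, an arbitrary choice.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Set, Tuple

Distance = Sequence[Sequence[float]]
Partition = FrozenSet[FrozenSet[int]]


def centroid_clustering(d: Distance, k: int,
                        g: Callable[[float], float]) -> Tuple[Tuple[int, ...], Partition]:
    """Minimise sum_i g(d(i, T)) over all k-subsets T, then assign each point
    to its nearest centroid."""
    n = len(d)

    def cost(T: Tuple[int, ...]) -> float:
        # Lambda_d^g(T) = sum over i of g(min_{t in T} d(i, t))
        return sum(g(min(d[i][t] for t in T)) for i in range(n))

    best = min(combinations(range(n), k), key=cost)
    groups: Dict[int, Set[int]] = {t: {t} for t in best}
    for i in range(n):
        nearest = min(best, key=lambda t: d[i][t])
        groups[nearest].add(i)
    return best, frozenset(frozenset(c) for c in groups.values())
```

On points at positions 0, 1, 2, 10, 11 of a line with k = 2, both g(d) = d (k-median) and g(d) = d² (k-means) choose the centroids with indices 1 and 3 and give the clusters {0, 1, 2} and {3, 4}.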

slide-89
SLIDE 89

Centroid-Based Clustering and Consistency

Example:

slide-92
SLIDE 92

Centroid-Based Clustering and Consistency

Theorem 4.1

For every k ≥ 2 and every function g chosen as above, and for n sufficiently large relative to k, the (k, g)-centroid clustering function does not satisfy Consistency. Proof sketch:

Figure: a Γ-transformation of d after which the output of the (k, g)-centroid clustering function changes.

slide-94
SLIDE 94

Relaxing the Properties

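The properties can be relaxed depending on the problem at hand, for example:

◮ k-Richness: every partition of S into exactly k clusters is in Range(f).
◮ Refinement-Consistency: if d′ is an f(d)-transformation of d, then f(d′) is a refinement of f(d).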

slide-95
SLIDE 95

Summary

◮ The Mapper algorithm needs a clustering algorithm.
◮ There exists no clustering function that simultaneously satisfies
  ◮ Scale-Invariance
  ◮ Richness
  ◮ Consistency
◮ Examples: single-linkage and centroid-based clustering algorithms.
◮ The properties can be relaxed.

slide-96
SLIDE 96

Sources

◮ J. Kleinberg. An Impossibility Theorem for Clustering. In Proceedings of the 15th International Conference on Neural Information Processing Systems (NIPS 2002), pp. 463-470, 2002.
◮ V. Estivill-Castro. Why so many clustering algorithms: a position paper. ACM SIGKDD Explorations Newsletter, 4:65-75, 2002.
◮ ‘Distances between Clustering, Hierarchical Clustering’
◮ ‘Tutorial of topological data analysis part 3’
◮ ‘Is clustering mathematically impossible?’