SLIDE 1

A Sober look at Clustering Stability

Shai Ben-David (1), Ulrike von Luxburg (2), Dávid Pál (1)

(1) School of Computer Science, University of Waterloo

(2) Fraunhofer IPSI, Darmstadt, Germany

COLT 2006

Shai Ben-David, Ulrike von Luxburg, Dávid Pál A Sober look at Clustering Stability

SLIDE 2

What is clustering?

By clustering we mean grouping data according to some distance/similarity measure.

[Figure sequence: the data; clusters found by a linkage algorithm; clusters found by a center-based algorithm]

SLIDE 5

Correctness of clustering

Q: Clustering is not a well-defined problem. How do we know that we have clustered correctly?

A: A common solution is stability.

SLIDE 7

Stability: Idea of our definition

Pick your favorite clustering algorithm A. Generate two independent samples S1 and S2 from the same distribution.

Stability: how much do the clusterings A(S1) and A(S2) differ?

If, for large sample sizes, the clusterings A(S1) and A(S2) are almost identical, we say that A is stable. Otherwise A is unstable.
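The definition can be sketched in code. This is only an illustration under stated assumptions, not the authors' formal definition: the data is one-dimensional, a clustering is represented by its centers and extended to any point by nearest-center assignment, and the distance between two clusterings is taken to be the fraction of point pairs on which they disagree. The names `assign`, `clustering_distance`, and `instability` are hypothetical.

```python
import numpy as np


def assign(centers, points):
    """Label each 1-D point with the index of its nearest center."""
    centers = np.asarray(centers, dtype=float)
    points = np.asarray(points, dtype=float)
    return np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)


def clustering_distance(centers_a, centers_b, points):
    """Fraction of ordered point pairs (i != j) that one clustering
    puts in the same cluster and the other separates.
    This pair-counting distance is an assumption of the sketch."""
    labels_a = assign(centers_a, points)
    labels_b = assign(centers_b, points)
    same_a = labels_a[:, None] == labels_a[None, :]
    same_b = labels_b[:, None] == labels_b[None, :]
    off_diagonal = ~np.eye(len(labels_a), dtype=bool)
    return float(np.mean(same_a[off_diagonal] != same_b[off_diagonal]))


def instability(algorithm, draw_sample, m, eval_points, trials=20):
    """Average distance between A(S1) and A(S2) over independent
    pairs of samples of size m.

    algorithm:   maps a sample to a sequence of centers (the algorithm A)
    draw_sample: maps m to a fresh sample of size m
    eval_points: fixed points on which the two clusterings are compared
    """
    distances = []
    for _ in range(trials):
        s1, s2 = draw_sample(m), draw_sample(m)
        distances.append(
            clustering_distance(algorithm(s1), algorithm(s2), eval_points)
        )
    return float(np.mean(distances))
```

In this sketch, a value near 0 for large m corresponds to what the slide calls stable. Relabeling the clusters does not change the distance, because only co-membership of pairs is compared.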

SLIDE 10

Example of stability

[Figure sequence: probability distribution; sample S1; clustering A(S1); sample S2; clustering A(S2)]

Clusterings A(S1) and A(S2) are equivalent.

SLIDE 16

Example of instability

[Figure sequence: probability distribution; sample S1; clustering A(S1); sample S2; clustering A(S2)]

Clusterings A(S1) and A(S2) are different.

SLIDE 22

Motivation

Why do people think stability is important?

For tuning the parameters of clustering algorithms, such as the number of clusters.

For verifying the meaningfulness of the clustering output by an algorithm.

SLIDE 24

Motivation

Our intention: provide a theoretical justification.

We discovered: the popular belief is false.

SLIDE 26

First example

1D probability distribution

[Figure: probability density over x; two regions labeled 50% and 50%]

2 centers: stable

3 centers: solution #1 and solution #2 ⟹ unstable

Slightly asymmetric distribution

[Figure: probability density over x; two regions labeled (50 + ε)% and (50 − ε)%]

2 centers: stable

3 centers: stable

SLIDE 33

Second example

1D probability distribution

[Figure: probability density over x; two regions labeled ∼90% and ∼10%]

2 centers: unstable

3 centers: stable

SLIDE 36

Our results

Theorem. For a cost-based algorithm (e.g. k-means, k-medians):

If the optimization problem has a unique optimum, then the algorithm is stable.

If the underlying probability distribution is symmetric and the optimization problem has multiple symmetric optima, then the algorithm is unstable.

SLIDE 39

Conclusion

Stability, contrary to common belief, does not measure the validity of a clustering or the meaningfulness of the chosen number of clusters. Instead, it measures the number of solutions to the clustering optimization problem for the underlying probability distribution.

SLIDE 40

Open problems

Q: Is symmetry really needed for instability?

A: No! (Work in progress, together with Shai Ben-David & Hans Ulrich Simon.)

Analyze finite sample sizes, and give explicit bounds.

Analyze other types of algorithms, e.g. linkage algorithms.

SLIDE 45

Concrete demonstration of our analysis: k-means

Consider k-means in a metric space (X, ℓ). Given a sample S = {x_1, x_2, ..., x_m}, we search for centers c_1, c_2, ..., c_k. The k-means algorithm minimizes the empirical cost

\[ \mathrm{cost}(S; c_1, c_2, \dots, c_k) = \frac{1}{m} \sum_{x \in S} \min_{1 \le i \le k} \bigl(\ell(c_i, x)\bigr)^2 \]

As m → ∞ this converges to the true cost [Ben-David, COLT04]

\[ \mathrm{cost}(P; c_1, c_2, \dots, c_k) = \mathbb{E}_{x \sim P} \, \min_{1 \le i \le k} \bigl(\ell(c_i, x)\bigr)^2 \]

Minimizing cost(S; ·) is, for large samples, almost the same as minimizing cost(P; ·).
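The empirical cost is straightforward to compute. The sketch below assumes one-dimensional data with ℓ(c, x) = |c − x|, which is a simplification of the general metric space on the slide. The function names are hypothetical, and `monte_carlo_true_cost` only approximates cost(P; ·) by evaluating the empirical cost on one large sample.

```python
import numpy as np


def empirical_cost(sample, centers):
    """cost(S; c_1..c_k): mean over x in S of the squared distance
    from x to its nearest center (1-D, distance |c - x|)."""
    sample = np.asarray(sample, dtype=float)
    centers = np.asarray(centers, dtype=float)
    squared = (sample[:, None] - centers[None, :]) ** 2
    return float(squared.min(axis=1).mean())


def monte_carlo_true_cost(draw_sample, centers, m=100_000):
    """Approximate cost(P; c_1..c_k) by the empirical cost of a single
    large sample; draw_sample(m) must return m draws from P."""
    return empirical_cost(draw_sample(m), centers)
```

For example, `empirical_cost([0, 1, 10, 11], [0.5, 10.5])` evaluates to 0.25, since every point lies at distance 0.5 from its nearest center.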

SLIDE 47

Concrete demonstration of our analysis: k-means

What happens if the function cost(P; c1, c2, . . . , ck) has more than one k-tuple of centers minimizing it?

Instability!
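A toy case makes the multiple-minimizer situation concrete. The sketch below is an illustration, not taken from the slides: it assumes a discrete distribution that is uniform on a handful of points on the line, and it finds every optimal k-tuple of centers by brute force. It relies on the fact that in one dimension an optimal k-means clustering splits the sorted points into k contiguous blocks, each with its mean as center. The function name `optimal_center_sets` is hypothetical.

```python
from itertools import combinations

import numpy as np


def optimal_center_sets(points, k, tol=1e-12):
    """Return every k-tuple of centers that minimizes the k-means cost
    for the uniform distribution on `points` (1-D, brute force)."""
    xs = np.sort(np.asarray(points, dtype=float))
    n = len(xs)
    scored = []
    # Each choice of k - 1 cut positions splits the sorted points
    # into k non-empty contiguous blocks.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        blocks = [xs[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
        centers = tuple(float(block.mean()) for block in blocks)
        cost = sum(float(((block - block.mean()) ** 2).sum()) for block in blocks) / n
        scored.append((cost, centers))
    best = min(cost for cost, _ in scored)
    return [centers for cost, centers in scored if cost <= best + tol]


# Symmetric distribution: two mirror-image optima for k = 3.
symmetric = [-2.0, -1.0, 1.0, 2.0]
print(optimal_center_sets(symmetric, 3))
# Slightly asymmetric distribution: the optimum for k = 3 is unique.
asymmetric = [-2.0, -1.0, 1.0, 2.1]
print(optimal_center_sets(asymmetric, 3))
```

On the symmetric points, k = 3 has two optimal solutions that mirror each other, so which one an algorithm returns depends on small fluctuations in the sample. Moving a single point slightly breaks the tie and leaves one optimum, and k = 2 on the symmetric points also has a single optimum.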

SLIDE 50

Example of instability

Searching for 2 centers

[Figure sequence: probability distribution (perfectly symmetric); optimal solution #1; optimal solution #2; optimal solution #3]