Stability of Clustering Methods, Sasha Rakhlin, Ph.D. candidate, MIT



slide-1
SLIDE 1

Stability of Clustering Methods

Sasha Rakhlin

Ph.D. candidate, MIT

1

slide-2
SLIDE 2

A procedure is stable if

$P(\|\text{solution} - \text{perturbed solution}\| > \varepsilon) \to 0$

2

slide-3
SLIDE 3

This talk:

A tool for theoretical analysis of stability of clustering algorithms. Idea: phrase clustering as empirical risk minimization and use stability of ERM.

Based on work with A. Caponnetto: “Some properties of ERM over Donsker classes,” submitted to JMLR.

3

slide-4
SLIDE 4

Stability for model selection

4

slide-7
SLIDE 7

Stability for model selection

If the 2-cluster solution is in our hypothesis space (“realizable” case), we get stability with respect to perturbations of the whole dataset.

4

slide-8
SLIDE 8

Stability for model selection

Instability (w.r.t. a complete change of the dataset) arises in the “non-realizable” case, when there are two or more clusterings at a similar “distance” to the underlying density. What can we say about the “non-realizable” case? We will show that natural algorithms are stable w.r.t. changes of o(√n) points.

5

slide-9
SLIDE 9

Toy example

Choose, according to majority, either the left or the right half as the cluster. The probability that changing one point changes the chosen cluster is $\Omega(n^{-1/2})$. This procedure is stable with respect to changes of o(√n) points.
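The toy example can be checked numerically. The sketch below is only an illustration under stated assumptions: points are drawn uniformly from [0, 1], ties go to the right half, and the names `majority_half` and `flip_rate` are hypothetical helpers, not part of the talk. It estimates how often replacing m points changes the chosen half.

```python
import random

def majority_half(points):
    """Return 'left' if most points fall in [0, 0.5), else 'right' (ties go right)."""
    left = sum(1 for x in points if x < 0.5)
    return "left" if 2 * left > len(points) else "right"

def flip_rate(n, m, trials=2000, seed=0):
    """Monte Carlo estimate of P(chosen half changes) when m of n points are redrawn.

    Assumption: points are i.i.d. uniform on [0, 1], the hardest case for this
    procedure because both halves carry equal mass.
    """
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        s = [rng.random() for _ in range(n)]
        t = list(s)
        # Replace m distinct points of the sample with fresh draws.
        for i in rng.sample(range(n), m):
            t[i] = rng.random()
        flips += majority_half(s) != majority_half(t)
    return flips / trials
```

Calling `flip_rate(n, 1)` for increasing n should show the single-point flip frequency shrinking, in line with the n^(-1/2) rate stated on the slide; the estimate is noisy for small `trials`.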

6

slide-10
SLIDE 10

Much harder

Choose, according to majority, a cluster of fixed size. Does the probability of jumps by ε decrease as n → ∞?

7

slide-11
SLIDE 11

Much harder

Choose, according to majority, a cluster of fixed size. Does the probability of jumps by ε decrease as n → ∞? Yes, this procedure is stable w.r.t. changes of o(√n) points, no matter what P is.
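A minimal sketch of this fixed-size-cluster procedure, under assumptions that are not from the talk: data live on [0, 1], the "cluster" is an interval of fixed width whose left endpoint is searched over a finite grid, the distance between two solutions is the Lebesgue measure of their symmetric difference (the L1 distance between indicator functions), and the sample is drawn from a Beta(2, 5) distribution purely for illustration. All function names are hypothetical.

```python
import bisect
import random

def best_interval(points, width, grid=200):
    """Left endpoint a (on a grid over [0, 1 - width]) of the interval
    [a, a + width) that contains the most sample points."""
    xs = sorted(points)
    best_a, best_count = 0.0, -1
    for j in range(grid + 1):
        a = (1.0 - width) * j / grid
        # Number of points with a <= x < a + width.
        count = bisect.bisect_left(xs, a + width) - bisect.bisect_left(xs, a)
        if count > best_count:
            best_a, best_count = a, count
    return best_a

def l1_distance(a, b, width):
    """Lebesgue measure of the symmetric difference of two width-`width` intervals."""
    return 2.0 * min(abs(a - b), width)

def jump_rate(n, m, width=0.2, eps=0.1, trials=200, seed=0):
    """Fraction of trials in which redrawing m of n points moves the chosen
    interval by at least eps in L1 distance (illustrative Beta(2, 5) data)."""
    rng = random.Random(seed)
    jumps = 0
    for _ in range(trials):
        s = [rng.betavariate(2, 5) for _ in range(n)]
        t = list(s)
        for i in rng.sample(range(n), m):
            t[i] = rng.betavariate(2, 5)
        a, b = best_interval(s, width), best_interval(t, width)
        jumps += l1_distance(a, b, width) >= eps
    return jumps / trials
```

Comparing `jump_rate(n, m)` for growing n with m held well below √n gives an empirical feel for the claim; the simulation illustrates the statement and does not prove it.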

7

slide-12
SLIDE 12

Similar problem

Choose, according to majority, k clusters of fixed size. Does the probability of jumps (in L1 distance) by ε decrease as n → ∞?

8

slide-13
SLIDE 13

Similar problem

Choose, according to majority, k clusters of fixed size. Does the probability of jumps (in L1 distance) by ε decrease as n → ∞? Yes, this procedure is stable w.r.t. changes of o(√n) points.

8

slide-14
SLIDE 14

Empirical Risk Minimization

  • A. Caponnetto and A. Rakhlin “Some properties of ERM over Donsker classes,” submitted to JMLR.

9

slide-15
SLIDE 15

Empirical Risk Minimization

  • A. Caponnetto and A. Rakhlin “Some properties of ERM over Donsker classes,” submitted to JMLR.

These examples are instances of empirical risk minimization. The following general result holds for any fixed distribution P: $\forall \varepsilon > 0$, $P(\|f_S - f_T\|_{L_1} \ge \varepsilon) \to 0$, where S and T differ on o(√n) points, and $f_S$, $f_T$ are the respective almost-minimizers over a P-Donsker class.

9

slide-16
SLIDE 16

Empirical Risk Minimization

  • A. Caponnetto and A. Rakhlin “Some properties of ERM over Donsker classes,” submitted to JMLR.

These examples are instances of empirical risk minimization. The following general result holds for any fixed distribution P: $\forall \varepsilon > 0$, $P(\|f_S - f_T\|_{L_1} \ge \varepsilon) \to 0$, where S and T differ on o(√n) points, and $f_S$, $f_T$ are the respective almost-minimizers over a P-Donsker class. For binary functions, Donsker = VC.

9

slide-17
SLIDE 17

k-means clustering

We can now study stability of other clustering procedures which optimize an objective function.

k-means clustering is

$$\min_{C} \sum_{i=1}^{n} \|x_i - m_C(x_i)\|^2,$$

which is empirical risk minimization over the class

$$F = \{\|x - m_C(x)\|^2 : C \text{ is a } k\text{-partition and } m_C(x) \text{ are centers}\}$$

10

slide-18
SLIDE 18

k-means clustering

11

slide-20
SLIDE 20

k-means clustering

$$F = \{\|x - m_C(x)\|^2 : C \text{ is a } k\text{-partition and } m_C(x) \text{ are centers}\}$$

If F is Donsker (e.g. the domain is compact), then $L_1$ stability implies stability of the centers $m_C(x)$.
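To make the ERM view of k-means concrete, here is a small sketch restricted to one-dimensional data. It is an illustration under assumptions: `kmeans_risk` computes the empirical risk of a set of centers, and `lloyd_1d` runs Lloyd's iterations, a local heuristic that only approximates the exact minimizer the slides refer to. Both names, and the helper `center_shift`, are hypothetical.

```python
def kmeans_risk(points, centers):
    """Empirical risk: average squared distance from each point to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in points) / len(points)

def lloyd_1d(points, centers, iters=50):
    """Lloyd's iterations in 1D starting from `centers`.

    Heuristic only: it never increases the empirical risk but may stop at a
    local minimum rather than the global empirical risk minimizer.
    """
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in points:
            # Assign x to its nearest center.
            j = min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            groups[j].append(x)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)

def center_shift(s, t, init):
    """Largest movement of any (sorted) center between samples s and t."""
    cs, ct = lloyd_1d(s, init), lloyd_1d(t, init)
    return max(abs(a - b) for a, b in zip(cs, ct))
```

Running `center_shift` on a sample and on a copy with a few points replaced gives a direct, if informal, view of the stability of centers discussed above.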

11

slide-21
SLIDE 21

MLE density estimation

12

slide-23
SLIDE 23

MLE density estimation

$$\max_{f \in F} \sum_{i=1}^{n} \log f(x_i)$$

Under some assumptions on the class F of densities, this should imply stability of modes/clusters.
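One way to connect this to the ERM framing used earlier, offered as a sketch and not as a statement from the talk: maximizing the log-likelihood is the same as minimizing the empirical average of the loss $-\log f$, so MLE is empirical risk minimization over the loss class $\{-\log f : f \in F\}$. The loss-class notation is an assumption introduced here for illustration.

```latex
\[
\arg\max_{f \in F} \sum_{i=1}^{n} \log f(x_i)
\;=\;
\arg\min_{f \in F} \frac{1}{n} \sum_{i=1}^{n} \bigl(-\log f(x_i)\bigr)
\]
```

Whether the stability result applies then depends on whether this loss class satisfies the required conditions, which is the role of the "some assumptions" mentioned above.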

12

slide-24
SLIDE 24

That’s all

13