On the Worst-Case Complexity of the k-Means Method - PowerPoint PPT Presentation



SLIDE 1

On the Worst-Case Complexity of the k-Means Method

Sergei Vassilvitskii, David Arthur (Stanford University)

SLIDE 2

Clustering

Given n points in R^d, split them into k similar groups.

SLIDE 3

Clustering Objectives

k-Center: Let C(x) be the closest cluster center to x. Minimize the largest distance:

min max_{x∈X} ‖x − C(x)‖

SLIDE 4

Clustering Objectives

k-Median: Let C(x) be the closest cluster center to x. Minimize the sum of distances:

min Σ_{x∈X} ‖x − C(x)‖

SLIDE 5

Clustering Objectives

k-Median Squared: Let C(x) be the closest cluster center to x. Minimize the sum of squared distances:

min Σ_{x∈X} ‖x − C(x)‖²

Much more sensitive to outliers.
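The three objectives can be compared directly in code. A minimal sketch (the function names and the toy data are my own, not from the talk); note how a single far-away point dominates the squared objective:

```python
import numpy as np

def nearest_dist(X, centers):
    """Distance from each point to its closest center."""
    # pairwise distances, shape (n_points, n_centers)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1)

def k_center_cost(X, centers):
    return nearest_dist(X, centers).max()          # min-max objective

def k_median_cost(X, centers):
    return nearest_dist(X, centers).sum()          # sum of distances

def k_means_cost(X, centers):
    return (nearest_dist(X, centers) ** 2).sum()   # sum of squared distances

# The outlier at (10, 0) affects the squared objective far more:
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
centers = np.array([[0.5, 0.0]])
print(k_center_cost(X, centers))   # 9.5
print(k_median_cost(X, centers))   # 0.5 + 0.5 + 9.5 = 10.5
print(k_means_cost(X, centers))    # 0.25 + 0.25 + 90.25 = 90.75
```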

SLIDE 6

Lloyd's Method: k-Means

Initialize with random clusters.

SLIDE 7

Lloyd's Method: k-Means

Assign each point to the nearest center.

SLIDE 8

Lloyd's Method: k-Means

Recompute the optimum centers (means).

SLIDE 9

Lloyd's Method: k-Means

Repeat: assign points to the nearest center.

SLIDE 10

Lloyd's Method: k-Means

Repeat: recompute centers.

SLIDE 11

Lloyd's Method: k-Means

Repeat...

SLIDE 12

Lloyd's Method: k-Means

Repeat... until the clustering does not change.
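The whole loop is short. A minimal NumPy sketch (initializing centers by sampling data points is my own choice here; the slides initialize with random clusters):

```python
import numpy as np

def lloyd(X, k, iters=100, seed=0):
    """Lloyd's method: alternate nearest-center assignment and mean recomputation."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    labels = None
    for _ in range(iters):
        # Step 1: assign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # the clustering did not change: terminate
        labels = new_labels
        # Step 2: recompute each center as the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centers, labels = lloyd(X, 2)
```

On termination the pair (centers, labels) is a fixed point: reassigning against the returned centers reproduces the returned labels.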

SLIDE 13

Analysis

How good is this algorithm? It finds a local optimum, which can be arbitrarily worse than the optimal solution.
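A concrete instance of that gap (this four-point rectangle is my own illustration, not taken from the talk): place points at the corners of a 1 x M rectangle. Splitting left/right costs 1; splitting top/bottom costs M², yet the top/bottom split is a fixed point of Lloyd's method, so the ratio M² is unbounded:

```python
import numpy as np

M = 1000.0  # make M large: the local optimum gets arbitrarily worse
X = np.array([[0.0, 0.0], [0.0, 1.0], [M, 0.0], [M, 1.0]])

def step(X, centers):
    """One Lloyd iteration: assign, then recompute means."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    return np.array([X[labels == j].mean(axis=0) for j in range(len(centers))])

def cost(X, centers):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).sum()

bad = np.array([[M / 2, 0.0], [M / 2, 1.0]])   # top/bottom split: a local optimum
good = np.array([[0.0, 0.5], [M, 0.5]])        # left/right split: the optimum

after = step(X, bad)
print(np.allclose(after, bad))        # True: Lloyd's method is stuck here
print(cost(X, bad) / cost(X, good))   # M**2 = 1e6
```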

SLIDE 14

Analysis

How fast is this algorithm? In practice: VERY fast. E.g., on a digit-recognition dataset with n = 60,000 and d = 700, it converges after 60 iterations. In theory: stay tuned.

SLIDE 15

Previous Work

Lower Bounds:
  • Ω(n) on the line
  • Ω(n²) in the plane

Upper Bounds:
  • O(nΔ²) on the line, where the spread Δ = max_{x,y} ‖x − y‖ / min_{x≠y} ‖x − y‖

Exponential bounds: O(k^n), O(n^{kd})

SLIDE 16

Our Results

Lower Bound: 2^{Ω(√n)}

Smoothed Upper Bounds (σ is the smoothness factor, D is the diameter of the point set):
  • O(n^{2+2/d} (D/σ)² 2^{4n/d})
  • O(n^{k+2/d} (D/σ)²)

SLIDE 17

Rest of the Talk

  • Lower Bound Sketch
  • Upper Bound Sketch
  • Open Problems

SLIDE 18

Lower Bound

General Idea: make a "Reset Widget". If k-Means takes time t on a point set X, create a new point set X′ on which k-Means takes time 2t to terminate.

SLIDE 19

Lower Bound: Sketch

Initial Clustering: t steps.

SLIDE 20

Lower Bound: Sketch

With Widget: t steps, reset, t steps.

SLIDE 21

Lower Bound Details

Three Main Ideas:
  • Signaling - recognizing when to start flipping the switch
  • Resetting - setting the cluster centers back to their original positions
  • Odds & Ends - clean-up to make the process recursive

SLIDE 22

Signaling

Suppose that when k-Means terminates, there is one cluster center p that has never appeared before. We use this as a signal to start the reset sequence.

SLIDE 23

Signaling

Suppose that when k-Means terminates, there is one cluster center p that has never appeared before. We use this as a signal to start the reset sequence.

SLIDE 24

Signaling

Suppose that when k-Means terminates, there is one cluster center p that has never appeared before. We use this as a signal to start the reset sequence. By setting the difference between the two distances, we can control exactly when p will switch.

SLIDE 25

Signaling

Suppose that when k-Means terminates, there is one cluster center p that has never appeared before. We use this as a signal to start the reset sequence.

SLIDE 26

Resetting

k properly placed points can reset the positions of the k current centers. It is easy to compute the locations of the reset points so that the new cluster centers are placed correctly.

(Figure: current center, intended center.)

SLIDE 27

Resetting

k properly placed points can reset the positions of the k current centers. It is easy to compute the locations of the reset points so that the new cluster centers are placed correctly.

(Figure: current center, intended center; add a point to the cluster to reset its mean.)

SLIDE 28

Resetting

k properly placed points can reset the positions of the k current centers. It is easy to compute the locations of the reset points so that the new cluster centers are placed correctly.

(Figure: new center.)
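The computation behind these slides is just mean arithmetic: if a cluster of m points currently has mean c, adding one point r moves the mean to (m·c + r)/(m + 1), so forcing an intended center c′ requires r = (m + 1)·c′ − m·c. A sketch (names and data are mine):

```python
import numpy as np

def reset_point(cluster, intended):
    """Point to add so the cluster's new mean lands exactly on `intended`."""
    m = len(cluster)
    c = cluster.mean(axis=0)           # current center
    return (m + 1) * intended - m * c  # solves (m*c + r)/(m+1) = intended

cluster = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])  # current mean (1, 1)
intended = np.array([4.0, 4.0])
r = reset_point(cluster, intended)
new_mean = np.vstack([cluster, r]).mean(axis=0)
print(r)         # [13. 13.]
print(new_mean)  # [4. 4.]
```

As the following slides note, the construction must also ensure the added point is not accidentally grabbed by another cluster.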

SLIDE 29

Resetting

It is easy to compute the locations of the reset points so that the new cluster centers are placed correctly, but we must avoid accidentally grabbing other points.

SLIDE 30

Resetting

Solution: add two new dimensions.

(Figure: axes x, y, w, z; intended new position; center to reset.)

SLIDE 31

Resetting

Solution: add two new dimensions.

(Figure: axes x, y, w, z; new point added.)

SLIDE 32

Resetting

Solution: add two new dimensions.

(Figure: axes x, y, w, z; new point added.)

SLIDE 33

Multi-Signaling

So far we have shown how to signal and reset a single cluster. We can use one signal (from p) to induce a signal from all clusters.

SLIDE 34

Multi-Signaling

All centers are stable before the main signaling has taken place.

SLIDE 35

Multi-Signaling

All centers are stable before the main signaling has taken place.

SLIDE 36

Multi-Signaling

Due to the signaling, the center moves away; now all centers absorb the points above.

SLIDE 37

Multi-Signaling

Due to the signaling, the center q moves away; now all centers absorb the points above. All clusters have previously unseen centers.

SLIDE 38

Put All the Pieces Together

  • Start with a signaling configuration
  • Transform it so that all clusters signal
  • Use the new signal to reset the cluster centers (and therefore double the running time of k-Means)
  • Ensure the new configuration is signaling
  • Repeat...

SLIDE 39

Construction in Pictures

Construction: (Figure: reflected points.)

SLIDE 40

Construction in Pictures

After t steps: a signal by all clusters.

SLIDE 41

Construction in Pictures

Main clusters absorb the “catalyst” points; the yellow centers move away.
slide-42
SLIDE 42

Construction in Pictures

The new points added are “reset” points - resetting the

  • riginal cluster centers.

42

SLIDE 43

Construction in Pictures

Can ensure the “catalyst” points leave the main clusters.

SLIDE 44

Construction in Pictures

k-Means runs for another t steps. The original centers will be signaling.

SLIDE 45

Construction Results

If we repeat the resetting widget construction r times, we get O(r²) points in O(r) dimensions, with O(r) clusters. Total running time: 2^{Ω(r)}, i.e., 2^{Ω(√n)} for n points.
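The bookkeeping behind 2^{Ω(√n)} is elementary and can be checked directly (a sketch under the slide's accounting: each widget doubles the iteration count, and the r-fold construction uses n = O(r²) points; the constant c below is my own placeholder):

```python
import math

def iterations(r, t0=1):
    """Each reset widget doubles the running time: t -> 2t, applied r times."""
    t = t0
    for _ in range(r):
        t *= 2
    return t  # = t0 * 2**r

# Suppose the r-fold construction uses n = c * r**2 points (c some constant).
c = 4
for r in [5, 10, 20]:
    n = c * r * r
    # iterations = 2**r = 2**sqrt(n/c): exponential in sqrt(n)
    assert iterations(r) == 2 ** round(math.sqrt(n / c))
```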

SLIDE 46

Construction Remarks

  • Currently the construction has very large spread. More trickery can decrease the spread to a constant, albeit with a blow-up in the dimension.
  • As presented, the construction requires a specific placement of the initial cluster centers; in practice, centers are chosen randomly from the points. The construction can be made to work even in this case.
  • Open question: can we decrease the dimensionality to a constant d?

SLIDE 47

Outline

  • k-Means Intuition
  • Lower Bound Sketch
  • Upper Bound Sketch
  • Open Problems

SLIDE 48

Smoothed Analysis

Assume each point came from a Gaussian distribution with variance σ². (Data collection is inherently noisy; alternatively, add some Gaussian noise yourself - the effect on the final clustering is minimal.)

Key Fact: the probability mass inside any ball of radius ε is at most (ε/σ)^d.
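The Key Fact follows from bounding density times volume: a d-dimensional Gaussian with covariance σ²I has density at most (2πσ²)^{−d/2} everywhere, and a radius-ε ball has volume π^{d/2} ε^d / Γ(d/2 + 1), so the mass in any such ball is at most (ε/σ)^d / (2^{d/2} Γ(d/2 + 1)) ≤ (ε/σ)^d. A numeric sanity check of that chain (code is mine):

```python
import math

def ball_mass_bound(eps, sigma, d):
    """Upper bound on Gaussian mass in any radius-eps ball: max density * volume."""
    max_density = (2 * math.pi * sigma**2) ** (-d / 2)
    ball_volume = math.pi ** (d / 2) * eps**d / math.gamma(d / 2 + 1)
    return max_density * ball_volume

for d in [1, 2, 5, 10]:
    for eps, sigma in [(0.1, 1.0), (0.5, 2.0)]:
        # the slide's (eps/sigma)^d dominates the exact density-times-volume bound
        assert ball_mass_bound(eps, sigma, d) <= (eps / sigma) ** d
```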
slide-49
SLIDE 49

Potential Function

Use a potential function: Original Potential at most Potential decreases every step. Reassignment reduces Center recomputation finds optimal for the given partition

49

Φ(C) =

  • x∈X

x − C(x)2 nD2 x − C(X) Φ
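Both monotonicity claims are easy to observe numerically. A minimal sketch (random data and the step structure are mine) that checks Φ after the reassignment step and after the recomputation step, and checks the nD² bound:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
centers = X[rng.choice(50, size=4, replace=False)].copy()

def potential(X, centers, labels):
    return ((X - centers[labels]) ** 2).sum()

labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)

for _ in range(10):
    phi0 = potential(X, centers, labels)
    # Step 1: reassignment can only reduce each ||x - C(x)||.
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    phi1 = potential(X, centers, labels)
    # Step 2: the mean is the optimal center for a fixed partition.
    for j in range(len(centers)):
        if np.any(labels == j):
            centers[j] = X[labels == j].mean(axis=0)
    phi2 = potential(X, centers, labels)
    assert phi1 <= phi0 + 1e-9 and phi2 <= phi1 + 1e-9  # Phi never increases

# Phi is at most n * D**2, with D the diameter of the point set.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2).max()
assert phi2 <= len(X) * D**2
```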

SLIDE 50

Potential Decrease

Lemma: Let S be a point set with optimal center c*, and let c be any other point. Then:

Φ(c) − Φ(c*) = |S| ‖c − c*‖²

Proof:

Φ(c) = Σ_{x∈S} (x − c) · (x − c)
     = Σ_{x∈S} (x − c* + c* − c) · (x − c* + c* − c)
     = Σ_{x∈S} [(x − c*) · (x − c*) + (c* − c) · (c* − c) + 2(c* − c) · (x − c*)]
     = Φ(c*) + |S| ‖c − c*‖² + 2(c* − c) · Σ_{x∈S} (x − c*)
     = Φ(c*) + |S| ‖c − c*‖²,

since Σ_{x∈S} (x − c*) = 0 (the optimal center is the mean of S).
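The lemma is easy to verify numerically (random data; the code is my own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(20, 4))     # a cluster of |S| = 20 points in R^4

def phi(S, c):
    return ((S - c) ** 2).sum()  # potential of S with center c

c_star = S.mean(axis=0)          # the optimal center is the mean
c = rng.normal(size=4)           # any other candidate center

lhs = phi(S, c) - phi(S, c_star)
rhs = len(S) * ((c - c_star) ** 2).sum()
assert np.isclose(lhs, rhs)      # Phi(c) - Phi(c*) = |S| * ||c - c*||^2
```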

SLIDE 51

Main Lemma

In a smoothed point set, fix an ε > 0. Then with probability at least 1 − 2^{2n} (ε/σ)^d, for any two clusters S and T with optimal centers c(S) and c(T), we have:

‖c(S) − c(T)‖ ≥ ε / (2 min(|S|, |T|))

SLIDE 52

Proof Sketch

Suppose |S| < |T|, and pick a point x that lies in S but not in T. Fix all points except x.

  • To ensure ‖c(S) − c(T)‖ ≤ ε/(2|S|), x must lie in a ball of diameter |S| · (ε/|S|) = ε, since moving x moves c(S) by only a 1/|S| fraction.
  • Since x came from a Gaussian of variance σ², this probability is at most (ε/σ)^d.
  • Finally, union bound the total error probability over all 2^{2n} possible pairs of sets: 2^{2n} (ε/σ)^d.

SLIDE 53

Potential Drop

At each iteration, examine a cluster S whose center changed from c to c′:

‖c − c′‖ ≥ ε / (2|S|)

Therefore, the potential drops by at least

|S| ‖c − c′‖² ≥ |S| · ε²/(4|S|²) = ε²/(4|S|) ≥ ε²/(4n).

After m = 4n²D²/ε² iterations, the algorithm must terminate.

SLIDE 54

To Finish Up

Choose ε = σ n^{−1/d} 2^{−2n/d}. Then the total probability of failure is:

2^{2n} (ε/σ)^d = 1/n.

The total running time is

m = 4n²D²/ε² = O(n^{2+2/d} (D/σ)² 2^{4n/d}).

Remark: this is polynomial for d = Ω(n / log n).
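The choice of ε can be checked in log space, so the 2^{2n} factor never overflows (a sketch; the function name is mine):

```python
import math

def log2_failure_prob(n, d):
    """log2 of 2**(2n) * (eps/sigma)**d for eps = sigma * n**(-1/d) * 2**(-2n/d)."""
    log2_eps_over_sigma = -math.log2(n) / d - 2 * n / d
    return 2 * n + d * log2_eps_over_sigma

for n in [10, 100, 1000]:
    for d in [2, 5, 50]:
        # the 2**(2n) and 2**(-2n) factors cancel, leaving exactly 1/n
        assert abs(log2_failure_prob(n, d) - (-math.log2(n))) < 1e-6
```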

SLIDE 55

Upper Bound (2)

We used the union bound over all 2^{2n} possible pairs of sets. However, due to the geometry, not all sets arise: the total number of distinct clusters that can appear is O(n^{kd}). Carrying the same calculations through, we can bound the total number of iterations by:

O(n^{k+2/d} (D/σ)²)

SLIDE 56

Remarks

The noise need not be Gaussian; we only need to avoid large probabilistic point masses. E.g., Lipschitz conditions are enough.

SLIDE 57

Outline

  • k-Means Intuition
  • Lower Bound Sketch
  • Upper Bound Sketch
  • Open Problems

SLIDE 58

Conclusion - Lower Bounds

Showed a super-polynomial lower bound on the execution time of k-Means. However, the construction requires many dimensions, and does not preclude an O(n^d)-style upper bound.

SLIDE 59

Conclusion - Upper Bounds

Can use smoothed analysis to reduce the best known upper bounds for k-Means. But the result is not polynomial for small values of d or large values of k. Even with smoothness there is an Ω(n) lower bound, which is never observed in practice.

SLIDE 60

Thank you

Any Questions?