  1. K-means++: The Advantages of Careful Seeding. Sergei Vassilvitskii and David Arthur (Stanford University)

  2. Clustering: Given $n$ points in $\mathbb{R}^d$, split them into $k$ similar groups.

  3. Clustering: Given $n$ points in $\mathbb{R}^d$, split them into $k$ similar groups. This talk: k-means clustering. Find $k$ centers $C$ that minimize $\sum_{x \in X} \min_{c \in C} \|x - c\|^2$.
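
A minimal NumPy sketch of this objective, for concreteness (the function name and array shapes are my own, not from the talk):

    import numpy as np

    def kmeans_cost(X, C):
        # X: (n, d) array of points; C: (k, d) array of centers.
        # Squared distance from every point to every center, then
        # charge each point to its nearest center.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        return d2.min(axis=1).sum()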

  4. Why Means? Objective: find $k$ centers $C$ that minimize $\sum_{x \in X} \min_{c \in C} \|x - c\|^2$. For one cluster: find $y$ that minimizes $\sum_{x \in X} \|x - y\|^2$. Easy! $y = \frac{1}{|X|} \sum_{x \in X} x$.
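
A quick numerical check of that claim on synthetic data (a sketch of mine, not from the talk): perturbing the mean can only increase the single-cluster cost.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))

    mean = X.mean(axis=0)
    cost_at_mean = ((X - mean) ** 2).sum()

    # Any point other than the mean has strictly larger cost.
    assert ((X - (mean + 0.1)) ** 2).sum() > cost_at_mean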

  5. Lloyd’s Method: k-means Initialize with random clusters

  6. Lloyd’s Method: k-means Assign each point to nearest center

  7. Lloyd’s Method: k-means Recompute optimum centers (means)

  8. Lloyd’s Method: k-means Repeat: Assign points to nearest center

  9. Lloyd’s Method: k-means Repeat: Recompute centers

  10. Lloyd’s Method: k-means Repeat...

  11. Lloyd’s Method: k-means Repeat...Until clustering does not change
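
A compact sketch of the loop illustrated in slides 5 through 11, with the classical uniformly random initialization (NumPy; the helper name and the empty-cluster handling are my own choices):

    import numpy as np

    def lloyd(X, k, rng, max_iter=100):
        # Initialize with k distinct points chosen uniformly at random.
        C = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Assign each point to its nearest center.
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute each center as the mean of its cluster
            # (keep the old center if a cluster went empty).
            newC = np.array([X[labels == j].mean(axis=0)
                             if np.any(labels == j) else C[j]
                             for j in range(k)])
            if np.allclose(newC, C):  # clustering no longer changes
                break
            C = newC
        return C, labels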

  12. Analysis: How good is this algorithm? It finds a local optimum, which is potentially arbitrarily worse than the optimal solution.

  13. Approximating k-means. Mount et al.: $(9+\epsilon)$-approximation in time $O(n^3/\epsilon^d)$. Har-Peled et al.: $(1+\epsilon)$-approximation in time $O(n + k^{k+2}\,\epsilon^{-2dk}\,\log^k(n/\epsilon))$. Kumar et al.: $(1+\epsilon)$-approximation in time $2^{(k/\epsilon)^{O(1)}}\, nd$.

  14. Approximating k-means. Mount et al.: $(9+\epsilon)$-approximation in time $O(n^3/\epsilon^d)$. Har-Peled et al.: $(1+\epsilon)$-approximation in time $O(n + k^{k+2}\,\epsilon^{-2dk}\,\log^k(n/\epsilon))$. Kumar et al.: $(1+\epsilon)$-approximation in time $2^{(k/\epsilon)^{O(1)}}\, nd$. Lloyd's method: worst-case time complexity $2^{\Omega(\sqrt{n})}$; smoothed complexity $n^{O(k)}$.

  15. Approximating k-means. Mount et al.: $(9+\epsilon)$-approximation in time $O(n^3/\epsilon^d)$. Har-Peled et al.: $(1+\epsilon)$-approximation in time $O(n + k^{k+2}\,\epsilon^{-2dk}\,\log^k(n/\epsilon))$. Kumar et al.: $(1+\epsilon)$-approximation in time $2^{(k/\epsilon)^{O(1)}}\, nd$. Lloyd's method in practice: for example, on the Digit Recognition dataset (UCI), with $n = 60{,}000$ and $d = 600$, it converges to a local optimum in 60 iterations.

  16. Challenge: Develop an approximation algorithm for k-means clustering that is competitive with the k-means method in speed and solution quality. Easiest line of attack: focus on the initial center positions. Classical k-means: pick $k$ points at random.

  17. k-means on Gaussians

  18. k-means on Gaussians

  19. Easy Fix Select centers using a furthest point algorithm (2-approximation to k-Center clustering).

  20. Easy Fix Select centers using a furthest point algorithm (2-approximation to k-Center clustering).

  21. Easy Fix Select centers using a furthest point algorithm (2-approximation to k-Center clustering).

  22. Easy Fix Select centers using a furthest point algorithm (2-approximation to k-Center clustering).

  23. Easy Fix Select centers using a furthest point algorithm (2-approximation to k-Center clustering).
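
A sketch of this greedy furthest-point seeding (NumPy; the function name is mine, the rule is the standard 2-approximation for k-Center):

    import numpy as np

    def furthest_point_seeding(X, k, rng):
        # First center uniformly at random, then repeatedly take the
        # point furthest from all centers chosen so far.
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            C = np.array(centers)
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            centers.append(X[d2.argmax()])
        return np.array(centers)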

  24. Sensitive to Outliers

  25. Sensitive to Outliers

  26. Sensitive to Outliers

  27. k-means++: Interpolate between the two methods. Let $D(x)$ be the distance between $x$ and the nearest cluster center. Sample $x$ proportionally to $D(x)^\alpha$. Original Lloyd's: $\alpha = 0$. Furthest Point: $\alpha = \infty$. k-means++: $\alpha = 2$, so each point is weighted by its contribution $D^2(x)$ to the overall error.
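
A sketch of this $D^\alpha$ sampling (NumPy; the function name and defaults are mine, the sampling rule is the one on the slide). With $\alpha = 2$, each point is drawn with probability exactly proportional to its current contribution $D^2(x)$ to the error.

    import numpy as np

    def d_alpha_seeding(X, k, rng, alpha=2):
        # alpha = 0: uniform sampling (classical initialization);
        # alpha = 2: k-means++; alpha -> infinity: furthest point.
        centers = [X[rng.integers(len(X))]]  # first center: uniform
        for _ in range(k - 1):
            C = np.array(centers)
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            w = d2 ** (alpha / 2)            # D(x)^alpha, since d2 = D(x)^2
            centers.append(X[rng.choice(len(X), p=w / w.sum())])
        return np.array(centers)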

  28. k-Means++

  29. k-Means++. Theorem: k-means++ is $\Theta(\log k)$-approximate in expectation. Ostrovsky et al. [06]: a similar method is $O(1)$-approximate under some data-distribution assumptions.

  30. Proof - 1st cluster. Fix an optimal clustering $C^*$. Pick the first center uniformly at random. Bound the total error of that cluster.

  31. Proof - 1st cluster. Let $A$ be the cluster. Each point $a_0 \in A$ is equally likely to be the chosen center. Expected error: $E[\phi(A)] = \frac{1}{|A|} \sum_{a_0 \in A} \sum_{a \in A} \|a - a_0\|^2 = 2 \sum_{a \in A} \|a - \bar{A}\|^2 = 2\,\phi^*(A)$.
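
The second equality here is the standard bias-variance identity; expanding each term around the mean $\bar{A}$ makes it explicit (a filled-in step, not verbatim from the deck):

    \begin{aligned}
    \frac{1}{|A|} \sum_{a_0 \in A} \sum_{a \in A} \|a - a_0\|^2
      &= \frac{1}{|A|} \sum_{a_0 \in A} \sum_{a \in A}
         \Big( \|a - \bar{A}\|^2
               + 2 \langle a - \bar{A},\, \bar{A} - a_0 \rangle
               + \|\bar{A} - a_0\|^2 \Big) \\
      &= \sum_{a \in A} \|a - \bar{A}\|^2
         + \sum_{a_0 \in A} \|\bar{A} - a_0\|^2
       = 2\, \phi^*(A),
    \end{aligned}

since $\sum_{a \in A} (a - \bar{A}) = 0$ kills the cross term.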

  32. Proof - Other Clusters. Suppose the next center came from a new cluster in OPT. Bound the total error of that cluster.

  33. Other Clusters. Let $B$ be this cluster, and $b_0$ the point selected. Then $E[\phi(B)] = \sum_{b_0 \in B} \frac{D^2(b_0)}{\sum_{b \in B} D^2(b)} \cdot \sum_{b \in B} \min(D(b), \|b - b_0\|)^2$. Key step: $D(b_0) \le D(b) + \|b - b_0\|$.

  34. Cont. For any $b$: $D^2(b_0) \le 2\,D^2(b) + 2\,\|b - b_0\|^2$. Averaging over all $b$: $D^2(b_0) \le \frac{2}{|B|} \sum_{b \in B} D^2(b) + \frac{2}{|B|} \sum_{b \in B} \|b - b_0\|^2$. The first term is the same for all $b_0$; the second is the cost under uniform sampling.

  35. Cont. For any $b$: $D^2(b_0) \le 2\,D^2(b) + 2\,\|b - b_0\|^2$; averaging over all $b$: $D^2(b_0) \le \frac{2}{|B|} \sum_{b \in B} D^2(b) + \frac{2}{|B|} \sum_{b \in B} \|b - b_0\|^2$. Recall: $E[\phi(B)] = \sum_{b_0 \in B} \frac{D^2(b_0)}{\sum_{b \in B} D^2(b)} \cdot \sum_{b \in B} \min(D(b), \|b - b_0\|)^2 \le \frac{4}{|B|} \sum_{b_0 \in B} \sum_{b \in B} \|b - b_0\|^2 = 8\,\phi^*(B)$.
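
Spelling out the inequality on this slide (a filled-in step under my reading, not verbatim from the deck): substitute the averaged bound for $D^2(b_0)$, then use the min against a different argument in each resulting term.

    \begin{aligned}
    E[\phi(B)]
      &\le \sum_{b_0 \in B}
           \frac{\tfrac{2}{|B|} \sum_{b \in B} D^2(b)}{\sum_{b \in B} D^2(b)}
           \sum_{b \in B} \|b - b_0\|^2
         + \sum_{b_0 \in B}
           \frac{\tfrac{2}{|B|} \sum_{b \in B} \|b - b_0\|^2}{\sum_{b \in B} D^2(b)}
           \sum_{b \in B} D^2(b) \\
      &= \frac{4}{|B|} \sum_{b_0 \in B} \sum_{b \in B} \|b - b_0\|^2
       = 8\, \phi^*(B),
    \end{aligned}

using $\min(D(b), \|b - b_0\|)^2 \le \|b - b_0\|^2$ in the first term, $\min(D(b), \|b - b_0\|)^2 \le D^2(b)$ in the second, and the slide-31 identity for the final equality.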

  36. Wrap Up. If clusters are well separated, and we always pick a center from a new optimal cluster, the algorithm is 8-competitive.

  37. Wrap Up. If clusters are well separated, and we always pick a center from a new optimal cluster, the algorithm is 8-competitive. Intuition: if no points from a cluster are picked, then it probably does not contribute much to the overall error.

  38. Wrap Up. If clusters are well separated, and we always pick a center from a new optimal cluster, the algorithm is 8-competitive. Intuition: if no points from a cluster are picked, then it probably does not contribute much to the overall error. Formally, an inductive proof shows this method is $\Theta(\log k)$-competitive.

  39. Experiments. Tested on several datasets: Synthetic (10k points, 3 dimensions); Cloud Cover (UCI Repository; 10k points, 54 dimensions); Color Quantization (16k points, 16 dimensions); Intrusion Detection (KDD Cup; 500k points, 35 dimensions).

  40. Typical Run: KM++ vs. KM vs. KM-Hybrid. [Plot: error (600 to 1300) versus stage (0 to 500) for the LLOYD, HYBRID, and KM++ methods.]

  41. Experiments. Total error:

      Dataset        k-means       km-Hybrid     k-means++
      Synthetic      0.016         0.015         0.014
      Cloud Cover    6.06 × 10^5   6.02 × 10^5   5.95 × 10^5
      Color          741           712           670
      Intrusion      32.9 × 10^3   −             3.4 × 10^3

      Time: k-means++ is 1% slower, due to initialization.
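
For reproducing numbers like these today, this seeding is built into standard libraries; for example, scikit-learn's KMeans defaults to it (a usage sketch assuming scikit-learn is installed; the data here is synthetic, not one of the talk's datasets):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.default_rng(0).normal(size=(10_000, 3))
    km = KMeans(n_clusters=10, init="k-means++", n_init=10).fit(X)
    print(km.inertia_)  # total squared error, i.e. the k-means objective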

  42. Final Message: Friends don’t let friends use k-means.

  43. Thank You. Any Questions?
