

  1. k-means++: few more steps yield constant approximation
  Davin Choo, Christoph Grunau, Julian Portmann, Václav Rozhoň (ETH Zürich)
  ICML 2020

  2. Clustering
  Given unlabelled d-dimensional data points P = {p_1, …, p_n}, group similar ones together into k clusters.
  Which is a better clustering into k = 3 groups?

  3. k-means metric
  ◮ Centers C = {c_1, …, c_k}
  ◮ cost(p, C) = min_{c ∈ C} d(p, c)² and cost(P, C) = Σ_{p ∈ P} cost(p, C)
  (figure: a point p and its distance cost(p, c_3) to its closest center c_3)
  ◮ Restricting C ⊆ P only loses a factor of 2 in cost(P, C)
  ◮ NP-hard to find an optimal solution [ADHP09, MNV09]

  4. k-means metric
  ◮ Centers C = {c_1, …, c_k}
  ◮ cost(p, C) = min_{c ∈ C} d(p, c)² and cost(P, C) = Σ_{p ∈ P} cost(p, C)
  ◮ Given k clusters, optimal centers are the means/centroids

  5. k-means metric
  ◮ Centers C = {c_1, …, c_k}
  ◮ cost(p, C) = min_{c ∈ C} d(p, c)² and cost(P, C) = Σ_{p ∈ P} cost(p, C)
  (figure: four points p_1, …, p_4 and their centroid c_1)
  ◮ Given k clusters, optimal centers are the means/centroids, e.g. c_1 = (1/4)(p_1 + p_2 + p_3 + p_4)

  6. k-means metric
  ◮ Centers C = {c_1, …, c_k}
  ◮ cost(p, C) = min_{c ∈ C} d(p, c)² and cost(P, C) = Σ_{p ∈ P} cost(p, C)
  ◮ Given k clusters, optimal centers are the means/centroids
  ◮ Given k centers, optimal cluster assignment is the closest center

  7. k-means metric
  ◮ Centers C = {c_1, …, c_k}
  ◮ cost(p, C) = min_{c ∈ C} d(p, c)² and cost(P, C) = Σ_{p ∈ P} cost(p, C)
  (figure: a point p and its distances cost(p, c_1), cost(p, c_2), cost(p, c_3) to the three centers)
  ◮ Given k clusters, optimal centers are the means/centroids
  ◮ Given k centers, optimal cluster assignment is the closest center
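To make the metric concrete, here is a minimal NumPy sketch of the cost function and the centroid fact from the slides above (the name kmeans_cost and the array layout are my own choices, not from the talk):

```python
import numpy as np

def kmeans_cost(P, C):
    """cost(P, C) = sum over p in P of min over c in C of d(p, c)^2."""
    # Pairwise squared distances, shape (n_points, n_centers).
    sq_dists = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Each point pays its squared distance to the closest center.
    return sq_dists.min(axis=1).sum()

# For a fixed cluster, the cost-minimizing center is its centroid (mean),
# matching the slide's example c_1 = (p_1 + p_2 + p_3 + p_4) / 4.
cluster = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
centroid = cluster.mean(axis=0)  # -> array([1., 1.])
```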

  8. Lloyd's algo. [Llo82]: Heuristic alternating minimization
  Given k initial centers (remark: centers are not necessarily from P)
  Optimal assignment ←→ Optimal clustering

  9. Lloyd's algo. [Llo82]: Heuristic alternating minimization
  Given k initial centers (remark: centers are not necessarily from P)
  Optimal assignment ←→ Optimal clustering
  ◮ Lloyd's algorithm never worsens cost(P, C) but has no performance guarantees (local minima)
  ◮ One way to get theoretical guarantees: seed with provably good initial centers
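A minimal sketch of Lloyd's alternating minimization, assuming points and centers are NumPy arrays of shape (n, d) and (k, d); the fixed iteration count and the empty-cluster handling are my own simplifications, not from the talk:

```python
import numpy as np

def lloyd(P, C, n_iter=100):
    """Lloyd's heuristic: alternate optimal assignment and optimal centers."""
    C = np.array(C, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each point is assigned to its closest center.
        sq_dists = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Update step: each center moves to the centroid of its cluster.
        for j in range(len(C)):
            members = P[labels == j]
            if len(members) > 0:  # keep the old center if its cluster is empty
                C[j] = members.mean(axis=0)
    return C, labels
```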

  10. k-means++ initialization [AV07]
  ◮ Chooses k points from P: O(log k) apx. (in expectation)
  ◮ 1st center chosen uniformly at random from P


  12. k-means++ initialization [AV07]
  ◮ Chooses k points from P: O(log k) apx. (in expectation)
  ◮ 1st center chosen uniformly at random from P
  ◮ D²-sampling: Pr[p] = cost(p, C) / Σ_{p′ ∈ P} cost(p′, C), updated at each step
  (figure: points with costs 100, 90, 40 to the current centers C; the marked p has cost(p, C) = 90)


  14. k-means++ initialization [AV07]
  ◮ Chooses k points from P: O(log k) apx. (in expectation)
  ◮ 1st center chosen uniformly at random from P
  ◮ D²-sampling: Pr[p] = cost(p, C) / Σ_{p′ ∈ P} cost(p′, C), updated at each step
  (figure: after a new center is added, the costs are updated, e.g. cost(p, C) drops from 90 to 40)

  15. k-means++ initialization [AV07]
  ◮ Chooses k points from P: O(log k) apx. (in expectation)
  ◮ 1st center chosen uniformly at random from P
  ◮ D²-sampling: Pr[p] = cost(p, C) / Σ_{p′ ∈ P} cost(p′, C), updated at each step
  ◮ Practically efficient: O(dnk) running time
  ◮ There exist instances where running k-means++ yields an Ω(log k) apx. with high probability in k [BR13, BJA16]
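A sketch of the k-means++ seeding described above; maintaining cost(p, C) incrementally for every point is what gives the O(dnk) running time claimed on the slide. Function and variable names are mine:

```python
import numpy as np

def kmeans_pp(P, k, rng=None):
    """k-means++ seeding [AV07]: uniform first center, then D^2-sampling."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(P)
    centers = [P[rng.integers(n)]]  # 1st center: uniform at random from P
    # cost(p, C) for every p, maintained incrementally -> O(dnk) overall.
    costs = ((P - centers[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        # D^2-sampling: Pr[p] = cost(p, C) / sum of all costs.
        p = P[rng.choice(n, p=costs / costs.sum())]
        centers.append(p)
        # A new center can only lower each point's cost.
        costs = np.minimum(costs, ((P - p) ** 2).sum(axis=1))
    return np.array(centers)
```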

  16. What is known?
  Practice:
  • Lloyd's algorithm [Llo82]

  17. What is known?
  Practice:
  • Lloyd's algorithm [Llo82]
  Theory:
  • Best known approximation factor [ANFSW19]: 6.357
  • PTAS for fixed k [KSS10]
  • PTAS for fixed d [CAKM19, FRS19]
  • Local search [KMN+04]: (9 + ε)-approximation in poly-time

  18. What is known?
  Practice:
  • Lloyd's algorithm [Llo82]
  • k-means++ [AV07]: O(log k) apx. in O(dnk) time
  • LocalSearch++ [LS19]: O(1) apx. in O(dnk² log log k) time
  Theory:
  • Best known approximation factor [ANFSW19]: 6.357
  • PTAS for fixed k [KSS10]
  • PTAS for fixed d [CAKM19, FRS19]
  • Local search [KMN+04]: (9 + ε)-approximation in poly-time

  19. What is known?
  Practice:
  • Lloyd's algorithm [Llo82]
  • k-means++ [AV07]: O(log k) apx. in O(dnk) time
  • LocalSearch++ [LS19]: O(1) apx. in O(dnk² log log k) time
  Theory:
  • Best known approximation factor [ANFSW19]: 6.357
  • PTAS for fixed k [KSS10]
  • PTAS for fixed d [CAKM19, FRS19]
  • Local search [KMN+04]: (9 + ε)-approximation in poly-time
  ◮ Bi-criteria approximation [Wei16, ADK09]: O(1)-approximation with O(k) cluster centers

  20. What is known?
  Practice:
  • Lloyd's algorithm [Llo82]
  • k-means++ [AV07]: O(log k) apx. in O(dnk) time
  • LocalSearch++ [LS19]: O(1) apx. in O(dnk² log log k) time
  Theory:
  • Best known approximation factor [ANFSW19]: 6.357
  • PTAS for fixed k [KSS10]
  • PTAS for fixed d [CAKM19, FRS19]
  • Local search [KMN+04]: (9 + ε)-approximation in poly-time
  ◮ Bi-criteria approximation [Wei16, ADK09]: O(1)-approximation with O(k) cluster centers
  ◮ This work: O(dnk²) running time, O(1) approximation

  21. Outline of talk
  ◮ What we have discussed:
    ◮ Clustering as a motivation
    ◮ Lloyd's heuristic and k-means++ initialization
    ◮ Prior work

  22. Outline of talk
  ◮ What we have discussed:
    ◮ Clustering as a motivation
    ◮ Lloyd's heuristic and k-means++ initialization
    ◮ Prior work
  ◮ What's next:
    ◮ Idea of the bi-criteria algorithm and the notion of settledness
    ◮ Idea of local search
    ◮ LocalSearch++: combining k-means++ with local search
    ◮ Key idea behind how we tighten the analysis of LocalSearch++

  23. Bi-criteria [Wei16, ADK09] and settledness
  ◮ "Balls into bins" process:
    ◮ k bins: the optimal k-clustering of the points, defined by OPT_k
    ◮ O(k) balls: the sampled points in C
  ◮ A cluster Q is settled if cost(Q, C) ≤ 10 · cost(Q, OPT_k)

  24. Bi-criteria [Wei16, ADK09] and settledness
  ◮ "Balls into bins" process:
    ◮ k bins: the optimal k-clustering of the points, defined by OPT_k
    ◮ O(k) balls: the sampled points in C
  ◮ A cluster Q is settled if cost(Q, C) ≤ 10 · cost(Q, OPT_k)
  ◮ Can show (each with constant success probability):
    ◮ If C is not yet a 20-apx., D²-sampling chooses a point from an unsettled cluster
    ◮ If we sample p from an unsettled cluster Q, adding p makes Q settled

  25. Bi-criteria [Wei16, ADK09] and settledness
  ◮ "Balls into bins" process:
    ◮ k bins: the optimal k-clustering of the points, defined by OPT_k
    ◮ O(k) balls: the sampled points in C
  ◮ A cluster Q is settled if cost(Q, C) ≤ 10 · cost(Q, OPT_k)
  ◮ Can show (each with constant success probability):
    ◮ If C is not yet a 20-apx., D²-sampling chooses a point from an unsettled cluster
    ◮ If we sample p from an unsettled cluster Q, adding p makes Q settled
  ◮ After O(k) samples, cost(P, C) ≤ 20 · cost(P, OPT_k)
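Settledness is an analysis notion rather than something the algorithm computes (it compares against OPT_k, which is never known), but the definition itself is easy to state in code; this hypothetical is_settled reuses kmeans_cost from the earlier sketch:

```python
def is_settled(Q, C, opt_centers, factor=10.0):
    """A cluster Q (the points of one optimal cluster) is settled if
    cost(Q, C) <= factor * cost(Q, OPT_k), with factor = 10 as on the slide."""
    return kmeans_cost(Q, C) <= factor * kmeans_cost(Q, opt_centers)
```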

  26. Local search [KMN+04]
  ◮ Initialize C with k arbitrary points
  ◮ Repeat:
    ◮ Pick an arbitrary point p ∈ P
    ◮ If ∃ q ∈ C such that cost(P, C \ {q} ∪ {p}) improves the cost, swap


  30. Local search [KMN+04]
  ◮ Initialize C with k arbitrary points
  ◮ Repeat:
    ◮ Pick an arbitrary point p ∈ P
    ◮ If ∃ q ∈ C such that cost(P, C \ {q} ∪ {p}) improves the cost, swap
  ◮ Polynomial number of iterations → O(1) approximation

  31. LocalSearch++ [LS19]
  ◮ Initialize C with the output of k-means++ (instead of k arbitrary points)
  ◮ Repeat:
    ◮ Pick a point p ∈ P using D²-sampling (instead of arbitrarily)
    ◮ If ∃ q ∈ C such that cost(P, {p} ∪ C \ {q}) improves the cost, swap
  ◮ O(k log log k) iterations (instead of polynomially many) → O(1) approximation
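A sketch of a single LocalSearch++ step, reusing kmeans_cost from the earlier sketch. This naive version re-evaluates each of the k possible swaps from scratch, so it is slower per step than the paper's stated running time; it is only meant to show the logic:

```python
import numpy as np

def local_search_pp_step(P, C, rng=None):
    """One LocalSearch++ step: D^2-sample a candidate, try all swaps, keep the best."""
    if rng is None:
        rng = np.random.default_rng()
    # cost(p, C) for every point under the current centers.
    costs = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    p = P[rng.choice(len(P), p=costs / costs.sum())]  # D^2-sampled candidate
    best_C, best_cost = C, costs.sum()  # current cost(P, C)
    for i in range(len(C)):
        # Candidate solution C \ {q_i} with p added.
        swapped = np.vstack([np.delete(C, i, axis=0), p])
        c = kmeans_cost(P, swapped)
        if c < best_cost:
            best_C, best_cost = swapped, c
    return best_C
```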

  32. LocalSearch++ [LS19]: One step of analysis
  ◮ Lemma: In each step, the cost decreases by a factor of 1 − Θ(1/k) with constant probability

  33. LocalSearch++ [LS19]: One step of analysis
  ◮ Lemma: In each step, the cost decreases by a factor of 1 − Θ(1/k) with constant probability
  ◮ Implication: After O(k) steps, the approximation factor halves
  ◮ k-means++ is an O(log k) apx. in expectation, so:
    O(log k)-apx. → [O(k) steps] → O(log k / 2)-apx. → [O(k) steps] → … → O(log k / 2^r) = O(1)-apx.
    after r = O(log log k) phases, totaling O(k log log k) steps
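For intuition, here is the calculation behind the phase argument in LaTeX (my own rendering of the slide's diagram, not taken from the paper):

```latex
% One phase = Theta(k) successful steps, each multiplying the cost by (1 - c/k):
\left(1 - \frac{c}{k}\right)^{\Theta(k)} \;\le\; e^{-\Theta(c)} \;\le\; \frac{1}{2}
\quad \text{for a suitable constant } c > 0,
% so each phase at least halves the approximation factor. Starting from O(log k),
\frac{O(\log k)}{2^{r}} = O(1) \quad \text{once } r = O(\log\log k),
% giving O(k) steps per phase times O(log log k) phases = O(k log log k) steps.
```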

  34. LocalSearch++ [LS19]: Bounding cost decrease
  ◮ Match optimal centers c* ∈ C* to candidate centers c ∈ C
  (figure: clusters of C* matched to clusters of C, with sets labelled M and L)
