k-means and k-medians under dimension reduction


  1. k-means and k-medians under dimension reduction. Yury Makarychev, TTIC; Konstantin Makarychev, Northwestern; Ilya Razenshteyn, Microsoft Research. Simons Institute, November 2, 2018

  2. Euclidean k-means and k-medians. Given a set of points X in ℝ^m, partition X into k clusters C_1, …, C_k and find a "center" c_j for each C_j so as to minimize the cost:
     Σ_{j=1}^k Σ_{u∈C_j} ‖u − c_j‖    (k-median)
     Σ_{j=1}^k Σ_{u∈C_j} ‖u − c_j‖²   (k-means)
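
A direct transcription of the two objectives may help fix notation. This is an illustrative numpy sketch; the function names and data layout are mine, not the talk's:

```python
import numpy as np

# `clusters` is a list of (|C_j| x m) arrays, `centers` the chosen c_j.
def kmedian_cost(clusters, centers):
    # sum of distances ||u - c_j|| over all clusters
    return sum(np.linalg.norm(C - c, axis=1).sum()
               for C, c in zip(clusters, centers))

def kmeans_cost(clusters, centers):
    # sum of squared distances ||u - c_j||^2 over all clusters
    return sum((np.linalg.norm(C - c, axis=1) ** 2).sum()
               for C, c in zip(clusters, centers))
```

For k-means the optimal center of a cluster is its centroid; for k-median it is the geometric median, which in general has no closed form.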

  3. Dimension Reduction. A dimension reduction φ: ℝ^m → ℝ^d is a random map that preserves distances within a factor of 1 + ε with probability at least 1 − δ:
     ‖u − v‖ / (1 + ε) ≤ ‖φ(u) − φ(v)‖ ≤ (1 + ε) ‖u − v‖
     [Johnson-Lindenstrauss '84] There exists a random linear dimension reduction with d = O(log(1/δ) / ε²).
     [Larsen, Nelson '17] This dependence of d on ε and δ is optimal.
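
As a quick numerical sanity check of the JL guarantee, here is a minimal numpy sketch using a random Gaussian map (one standard construction; see the next slide). The leading constant 8 in the choice of d is an illustrative guess, not the sharp one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 2000                            # n points in R^m
eps, delta = 0.3, 0.01
X = rng.normal(size=(n, m))

# d ~ log(n^2/delta) / eps^2: per-pair failure probability ~delta/n^2,
# then a union bound over all n(n-1)/2 pairs.
d = int(np.ceil(8 * np.log(n * n / delta) / eps ** 2))
G = rng.normal(size=(d, m)) / np.sqrt(d)   # scaled so E||Gx||^2 = ||x||^2
Y = X @ G.T

iu = np.triu_indices(n, k=1)               # all unordered pairs of points
def pdists(P):
    return np.sqrt(((P[:, None] - P[None]) ** 2).sum(-1))[iu]

ratios = pdists(Y) / pdists(X)
print(d, ratios.min(), ratios.max())       # whp within [1/(1+eps), 1+eps]
```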

  4. Dimension Reduction. JL preserves all distances between points in X whp when d = Ω(log |X| / ε²). Numerous applications in computer science. Constructions:
     • [JL '84] Project on a random d-dimensional subspace
     • [Indyk, Motwani '98] Apply a random Gaussian matrix
     • [Achlioptas '03] Apply a random matrix with ±1 entries
     • [Ailon, Chazelle '06] Fast JL transform

  5. k-means under dimension reduction [Boutsidis, Zouzias, Drineas '10]: apply a dimension reduction φ to the dataset X, then cluster φ(X) in dimension d.
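
A hypothetical end-to-end sketch of this pipeline, with a bare-bones Lloyd's iteration written here for illustration (`lloyd` and all parameters are mine, not from the talk):

```python
import numpy as np

def lloyd(P, k, iters, rng):
    # Bare-bones k-means (Lloyd's algorithm) on the rows of P.
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest current center
        labels = ((P[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        # move each center to its cluster's mean (keep it if the cluster is empty)
        centers = np.array([P[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 500))               # dataset in R^500
G = rng.normal(size=(40, 500)) / np.sqrt(40)  # random Gaussian reduction, d = 40
labels = lloyd(X @ G.T, k=5, iters=20, rng=rng)
# `labels` is a clustering of the original X; its cost can then be
# evaluated back in R^500 (e.g., against the centroids of the label groups).
```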

  6. k-means under dimension reduction. Want: the optimal clusterings of X and φ(X) have approximately the same cost. Even better: the cost of every clustering is approximately preserved. For what dimension d can we get this?

  7. k-means under dimension reduction
     Folklore:                               distortion 1 + ε,  d ~ log n / ε²
     Boutsidis, Zouzias, Drineas '10:        distortion 2 + ε,  d ~ k / ε²
     Cohen, Elder, Musco, Musco, Persu '15:  distortion 1 + ε,  d ~ k / ε²
                                             distortion 9 + ε,  d ~ log k / ε²
     MMR '18:                                distortion 1 + ε,  d ~ log(k/ε) / ε²
     Lower bound:                            distortion 1 + ε,  d ~ log k / ε²

  8. k-medians under dimension reduction
     Prior work:                             none
     Kirszbraun theorem ⇒:                   distortion 1 + ε,  d ~ log n / ε²
     MMR '18:                                distortion 1 + ε,  d ~ log(k/ε) / ε²
     Lower bound:                            distortion 1 + ε,  d ~ log k / ε²

  9. Plan. k-means:
     • Challenges
     • Warm-up: d ~ log n / ε²
     • Special case: "distortions" are everywhere sparse
     • Removing outliers: the general case → the special case
     • Outliers
     k-medians:
     • Overview of our approach

  10. Our result for k-means. Let X ⊂ ℝ^m and let φ: ℝ^m → ℝ^d be a random dimension reduction with d ≥ c log(k / (εδ)) / ε². Then with probability at least 1 − δ,
     (1 − ε) cost(𝒞) ≤ cost(φ(𝒞)) ≤ (1 + ε) cost(𝒞)
     for every clustering 𝒞 = (C_1, …, C_k) of X.

  11. Challenges. Let 𝒞* be the optimal k-means clustering. Easy: cost(𝒞*) ≈ cost(φ(𝒞*)) with probability 1 − δ. Hard: prove that there is no other clustering 𝒞′ s.t. cost(φ(𝒞′)) < (1 − ε) cost(𝒞*), since there are exponentially many clusterings 𝒞′ (can't use the union bound).

  12. Warm-up. Consider a clustering 𝒞 = (C_1, …, C_k). Write the cost in terms of pairwise distances:
     cost(𝒞) = Σ_{j=1}^k (1 / (2|C_j|)) Σ_{u,v∈C_j} ‖u − v‖²
     If all distances ‖u − v‖ are preserved within 1 + ε, then every term, and hence cost(𝒞), is preserved within (1 + ε)². So it is sufficient to have d ~ log n / ε².
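
The pairwise-distance identity (with the optimal k-means center of a cluster, its centroid) is easy to verify numerically; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.normal(size=(30, 8))                     # one cluster of 30 points
center_cost = ((C - C.mean(axis=0)) ** 2).sum()  # sum of ||u - centroid||^2
pair_cost = ((C[:, None] - C[None]) ** 2).sum() / (2 * len(C))
print(np.allclose(center_cost, pair_cost))       # True
```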

  13. Problem & Notation. Now assume 𝒞 = (C_1, …, C_k) is a random clustering that depends on φ. Want to prove: cost(𝒞) ≈ cost(φ(𝒞)) whp. The distance between u and v is (1 + ε)-preserved or distorted depending on whether ‖φ(u) − φ(v)‖ is within a factor 1 + ε of ‖u − v‖. Think of δ = poly(1/k, ε) as sufficiently small.

  14. Distortion graph. Connect u and v with an edge if the distance between them is distorted; let E be the set of these edges.
     + Every edge is present with probability at most δ.
     − Edges are not independent.
     − 𝒞 depends on the set of edges.
     − There may be high-degree vertices.
     − All distances within a cluster may be distorted.

  15. Cost of a cluster. The cost of C_j is (1 / (2|C_j|)) Σ_{u,v∈C_j} ‖u − v‖².
     + Terms for non-edges (u, v) are (1 + ε)-preserved: ‖u − v‖ ≈ ‖φ(u) − φ(v)‖.
     − Need to prove that
       Σ_{u,v∈C_j, (u,v)∈E} ‖u − v‖² = Σ_{u,v∈C_j, (u,v)∈E} ‖φ(u) − φ(v)‖² ± ε′ cost(𝒞)

  16. Everywhere-sparse edges. Assume every u ∈ C_j is connected to at most a θ fraction of all v in C_j (where θ ≪ ε).

  17. Everywhere-sparse edges.
     + Terms for non-edges (u, v) are (1 + ε)-preserved.
     + The contribution of the terms for edges is small: for an edge (u, v) and any w ∈ C_j,
       ‖u − v‖ ≤ ‖u − w‖ + ‖w − v‖
       ‖u − v‖² ≤ 2(‖u − w‖² + ‖w − v‖²)
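
The squared form is the standard relaxed triangle inequality; for completeness:

```latex
\|u-v\|^2 \le \big(\|u-w\| + \|w-v\|\big)^2 \le 2\|u-w\|^2 + 2\|w-v\|^2,
```

using (a + b)² ≤ 2a² + 2b², which follows from 2ab ≤ a² + b².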

  18. Everywhere-sparse edges. Since ‖u − v‖² ≤ 2(‖u − w‖² + ‖w − v‖²):
     • Replace the term for every edge (u, v) with the two terms ‖u − w‖², ‖w − v‖² for a random w ∈ C_j.
     • Each term is used at most 2θ times in expectation, so
       Σ_{(u,v)∈E} ‖u − v‖² ≤ 4θ Σ_{u,v∈C_j} ‖u − v‖²
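
A reconstruction of the expectation computation the slide compresses (my bookkeeping, under the sparsity assumption of slide 16):

```latex
% Each vertex lies on at most \theta |C_j| edges and w ~ Uniform(C_j), so a
% fixed term \|a-b\|^2 is drawn at most 2\theta times in expectation (edges
% at a with w = b, plus edges at b with w = a), each time with weight 2:
\sum_{(u,v)\in E}\|u-v\|^2
  \le \sum_{(u,v)\in E}\mathbb{E}_{w}\big[\,2\|u-w\|^2+2\|w-v\|^2\,\big]
  \le 4\theta \sum_{a,b\in C_j}\|a-b\|^2 .
```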

  19. Everywhere-sparse edges.
     Σ_{u,v∈C_j} ‖u − v‖² ≈ Σ_{u,v∈C_j, (u,v)∉E} ‖u − v‖²
                          ≈ Σ_{u,v∈C_j, (u,v)∉E} ‖φ(u) − φ(v)‖²
                          ≈ Σ_{u,v∈C_j} ‖φ(u) − φ(v)‖²

  20. Everywhere-sparse edges.
     Σ_{u,v∈C_j} ‖u − v‖² ≈ Σ_{u,v∈C_j, (u,v)∉E} ‖u − v‖²
                          ≈ Σ_{u,v∈C_j, (u,v)∉E} ‖φ(u) − φ(v)‖²
                          ≈ Σ_{u,v∈C_j} ‖φ(u) − φ(v)‖²
     But edges are not necessarily everywhere sparse!

  21. Outliers. Want: remove "outliers" so that in the remaining set X′ the edges are everywhere sparse in every cluster.

  22. (1 − θ) non-distorted core. Want: remove "outliers" so that in the remaining set X′ the edges are everywhere sparse in every cluster.

  23. (1 − θ) non-distorted core. Want: remove "outliers" so that in the remaining set X′ the edges are everywhere sparse in every cluster. Find a subset X′ ⊂ X (which depends on 𝒞) s.t.:
     • Edges are sparse in the obtained clusters: every u ∈ C_j ∩ X′ is connected to at most a θ fraction of all v in C_j ∩ X′.
     • Outliers are rare: for every u, Pr[u ∉ X′] ≤ θ.

  24. All clusters are large. Assume all clusters have size ~ n/k, and let θ = δ^{1/4}. Outliers = all vertices of degree at least ~ θn/k. Every vertex has degree at most δn in expectation, so by Markov, Pr(u is an outlier) ≤ δk/θ ≤ θ. We remove ~ θn ≪ n/k vertices in total, so all clusters still have size ~ n/k. This crucially uses that all clusters are large!
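
Spelled out, the Markov step reads as follows (assuming δ ≤ 1/k², which is consistent with δ = poly(1/k, ε) being sufficiently small):

```latex
\Pr[\,u \text{ is an outlier}\,]
  = \Pr\big[\deg(u) \ge \theta n / k\big]
  \le \frac{\mathbb{E}[\deg(u)]}{\theta n / k}
  \le \frac{\delta n}{\theta n / k}
  = \frac{\delta k}{\theta}
  = \delta^{3/4} k
  \le \delta^{1/4} = \theta .
```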

  25. Main Combinatorial Lemma. Idea: assign "weights" to the vertices so that all clusters have large weight. There is a measure μ on X and a random set R s.t.:
     • μ(x) ≥ 1 / |C_j ∖ R| for every x ∈ C_j ∖ R (always)
     • μ(X) ≤ 4k³ / θ²
     • Pr(x ∈ R) ≤ θ
     All clusters C_j ∖ R are "large" w.r.t. the measure μ, so we can apply a variant of the previous argument.

  26. Edges incident on outliers. We still need to take care of edges incident on outliers. Say u is an outlier and v is not. Consider a fixed optimal clustering C*_1, …, C*_k of X, and let c* be the optimal center for u.

  27. Edges incident on outliers. By the triangle inequality (writing a = b ± c for b − c ≤ a ≤ b + c),
     ‖u − v‖ = ‖v − c*‖ ± ‖c* − u‖
     ‖φ(u) − φ(v)‖ = ‖φ(v) − φ(c*)‖ ± ‖φ(c*) − φ(u)‖
     We may assume that the distances between non-outliers and the optimal centers are (1 + ε)-preserved, so ‖φ(v) − φ(c*)‖ ≈ ‖v − c*‖.

  28. Edges incident on outliers. The total contribution of the center-outlier terms is small in expectation:
     𝔼[Σ_{u∉X′} ‖c*_u − u‖²] ≤ θ Σ_{u∈X} ‖c*_u − u‖² = θ · OPT
     where c*_u is the optimal center for u.
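
The inequality is linearity of expectation combined with the core property Pr[u ∉ X′] ≤ θ:

```latex
\mathbb{E}\Big[\sum_{u \notin X'} \|c^*_u - u\|^2\Big]
  = \sum_{u \in X} \Pr[\,u \notin X'\,]\,\|c^*_u - u\|^2
  \le \theta \sum_{u \in X} \|c^*_u - u\|^2
  = \theta \cdot \mathrm{OPT} .
```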

  29. Edges incident on outliers. Taking care of the term ‖φ(c*) − φ(u)‖ is a bit more difficult. QED

  30. k-medians under dimension reduction

  31. k-medians.
     − There is no formula for the cost of a clustering in terms of pairwise distances.
     − The statement is not obvious even when d ~ log n, i.e., when all pairwise distances are approximately preserved. [This was asked by Ravi Kannan in a tutorial at Simons.]
     + The Kirszbraun theorem ⇒ the d ~ log n case.
     + We prove a Robust Kirszbraun theorem. Our methods for k-means + Robust Kirszbraun ⇒ d ~ log k for k-medians.
