SLIDE 1 k-means and k-medians under dimension reduction
Yury Makarychev (TTIC), Konstantin Makarychev (Northwestern), Ilya Razenshteyn (Microsoft Research)
Simons Institute, November 2, 2018
SLIDE 2 Euclidean k-means and k-medians
Given a set of points X in ℝ^d. Partition X into k clusters D_1, …, D_k and find a "center" c_i for each D_i so as to minimize the cost

  Σ_{i=1}^k Σ_{v∈D_i} d(v, c_i)²   (k-means)
  Σ_{i=1}^k Σ_{v∈D_i} d(v, c_i)    (k-median)
SLIDE 3 Dimension Reduction
A dimension reduction f: ℝ^d → ℝ^m is a random map that preserves distances within a factor of 1 + ε with probability at least 1 − δ:

  (1/(1 + ε)) ‖v − w‖ ≤ ‖f(v) − f(w)‖ ≤ (1 + ε) ‖v − w‖

[Johnson-Lindenstrauss '84] There exists a random linear dimension reduction with m = O(log(1/δ)/ε²).
[Larsen, Nelson '17] The dependence of m on ε and δ is optimal.
SLIDE 4 Dimension Reduction
JL preserves all distances between points in X whp when m = Ω(log|X|/ε²). Numerous applications in computer science. Dimension Reduction Constructions:
- [JL '84] Project on a random m-dimensional subspace
- [Indyk, Motwani '98] Apply a random Gaussian matrix
- [Achlioptas '03] Apply a random matrix with ±1 entries
- [Ailon, Chazelle '06] Fast JL-transform
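The random-Gaussian-matrix construction in the list above fits in a few lines of numpy. This is a minimal sketch: the sizes n, d, the seed, and the constant 8 in the target dimension are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 200, 1000, 0.25
m = int(np.ceil(8 * np.log(n) / eps**2))   # m ~ log n / eps^2 (constant 8 is illustrative)

X = rng.normal(size=(n, d))                # n points in R^d
G = rng.normal(size=(m, d)) / np.sqrt(m)   # random Gaussian map [Indyk, Motwani '98]
Y = X @ G.T                                # reduced points in R^m

def sq_dists(A):
    """Squared Euclidean distance matrix, computed via the Gram matrix."""
    g = A @ A.T
    s = np.diag(g)
    return s[:, None] + s[None, :] - 2 * g

iu = np.triu_indices(n, k=1)               # all unordered pairs of points
ratios = np.sqrt(np.maximum(sq_dists(Y), 0)[iu] / sq_dists(X)[iu])
print(ratios.min(), ratios.max())          # typically within [1/(1+eps), 1+eps]
```

The check at the end measures the worst distortion over all pairs; with m on the order of log n / ε², all ratios concentrate around 1.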
SLIDE 5
k-means under dimension reduction
[Boutsidis, Zouzias, Drineas '10] Apply a dimension reduction f to our dataset X. Cluster f(X) in dimension m.
SLIDE 6
k-means under dimension reduction
Want: optimal clusterings of X and f(X) have approximately the same cost.
Even better: the cost of every clustering is approximately preserved.
For what dimension m can we get this?
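A quick numerical illustration of the "even better" property: the cost of one fixed clustering, evaluated before and after a Gaussian dimension reduction, comes out nearly the same. All sizes here are illustrative, and the clusterings are arbitrary random ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m, k = 300, 800, 200, 5              # illustrative sizes, not from the talk
X = rng.normal(size=(n, d))
G = rng.normal(size=(m, d)) / np.sqrt(m)   # Gaussian dimension reduction
Y = X @ G.T

def kmeans_cost(P, labels):
    """k-means cost of a fixed clustering: squared distances to cluster centroids."""
    return sum(((P[labels == i] - P[labels == i].mean(axis=0)) ** 2).sum()
               for i in np.unique(labels))

# evaluate the same clusterings before and after the reduction
ratios = []
for _ in range(3):
    labels = rng.integers(k, size=n)       # an arbitrary clustering of X
    ratios.append(kmeans_cost(Y, labels) / kmeans_cost(X, labels))
print(ratios)                              # each close to 1
```

Of course, checking a few clusterings is exactly what the union bound can handle; the content of the talk is that the guarantee holds for all exponentially many clusterings simultaneously.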
SLIDE 7
k-means under dimension reduction

                                         m               distortion
  Folklore                               ~log n / ε²     1 + ε
  Boutsidis, Zouzias, Drineas '10        ~k / ε²         2 + ε
  Cohen, Elder, Musco, Musco, Persu '15  ~k / ε²         1 + ε
                                         ~log k / ε²     9 + ε
  MMR '18                                ~log(k/ε) / ε²  1 + ε
  Lower bound                            ~log k / ε²     1 + ε
SLIDE 8
π-medians under dimension reduction
π distortion Prior work β β Kirszsbraun Thm β ~ log π /π2 1 + π MMR β18 ~ log(π/π) /π2 1 + π Lower bound ~ log π /π2 1 + π
SLIDE 9 Plan
k-means
- Challenges
- Warm up: m ~ log n / ε²
- Special case: "distortions" are everywhere sparse
- Remove outliers: the general case → the special case
- Outliers
k-medians
SLIDE 10
Our result for k-means
Let X ⊂ ℝ^d and let f: ℝ^d → ℝ^m be a random dimension reduction with m ≥ O(log(k/(εδ))/ε²). With probability at least 1 − δ:

  (1 − ε) cost(P) ≤ cost(f(P)) ≤ (1 + ε) cost(P)

for every clustering P = (D_1, …, D_k) of X.
SLIDE 11
Challenges
Let P* be the optimal k-means clustering.
Easy: cost(P*) ≈ cost(f(P*)) with probability 1 − δ.
Hard: prove that there is no other clustering P′ s.t. cost(f(P′)) < (1 − ε) cost(P*), since there are exponentially many clusterings P′ (can't use the union bound).
SLIDE 12 Warm-up
Consider a clustering P = (D_1, …, D_k). Write the cost in terms of pairwise distances:

  cost(P) = Σ_{i=1}^k (1/(2|D_i|)) Σ_{v,w∈D_i} ‖v − w‖²

All distances ‖v − w‖ are preserved within 1 + ε
⇒ cost(P) is preserved within 1 + ε. Sufficient to have m ~ log n / ε².
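The pairwise-distance identity behind this warm-up — for a single cluster, the sum of squared distances to the centroid equals 1/(2·|cluster|) times the sum of squared distances over all ordered pairs — can be checked numerically. A sketch with an arbitrary random cluster:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(50, 10))     # one cluster: 50 points in R^10
mu = C.mean(axis=0)               # the optimal k-means center is the centroid

center_form = ((C - mu) ** 2).sum()

# pairwise form: (1 / (2|C|)) * sum over all ordered pairs ||v - w||^2
diff = C[:, None, :] - C[None, :, :]
pairwise_form = (diff ** 2).sum() / (2 * len(C))

print(np.isclose(center_form, pairwise_form))   # True
```

This identity is what lets the argument talk only about pairwise distances, which is exactly what JL controls.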
SLIDE 13 Problem & Notation
Assume that P = (D_1, …, D_k) is a random clustering that depends on f. Want to prove: cost(P) ≈ cost(f(P)) whp.
The distance between v and w is (1 + ε)-preserved or distorted depending on whether

  ‖f(v) − f(w)‖ ≈_{1+ε} ‖v − w‖

Think of δ = poly(ε, 1/k) as sufficiently small.
SLIDE 14
Distortion graph
Connect v and w with an edge if the distance between them is distorted.
+ Every edge is present with probability at most δ.
− Edges are not independent.
− P depends on the set of edges.
− May have high-degree vertices.
− All distances in a cluster may be distorted.
SLIDE 15 Cost of a cluster
The cost of D_i is

  (1/(2|D_i|)) Σ_{v,w∈D_i} ‖v − w‖²

+ Terms for non-edges (v, w) are (1 + ε)-preserved: ‖v − w‖ ≈ ‖f(v) − f(w)‖.
− Need to prove that

  Σ_{v,w∈D_i, (v,w)∈E} ‖v − w‖² = Σ_{v,w∈D_i, (v,w)∈E} ‖f(v) − f(w)‖² ± ε′ cost(P)
SLIDE 16
Everywhere-sparse edges
Assume every v ∈ D_i is connected to at most an η fraction of all w in D_i (where η ≪ ε).
SLIDE 17
Everywhere-sparse edges
+ Terms for non-edges (v, w) are (1 + ε)-preserved.
+ The contribution of terms for edges is small: for an edge (v, w) and any x ∈ D_i

  ‖v − w‖ ≤ ‖v − x‖ + ‖x − w‖
  ‖v − w‖² ≤ 2(‖v − x‖² + ‖x − w‖²)
SLIDE 18 Everywhere-sparse edges
  ‖v − w‖² ≤ 2(‖v − x‖² + ‖x − w‖²)

- Replace the term for every edge with two terms ‖v − x‖², ‖x − w‖² for random x ∈ D_i.
- Each term is used at most 2η times, in expectation.

  Σ_{(v,w)∈E, v,w∈D_i} ‖v − w‖² ≤ 4η Σ_{v,w∈D_i} ‖v − w‖²
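This charging step — routing each edge term through points x of the cluster — is in fact deterministic once the degree cap holds: for any edge set in which every vertex touches at most an η fraction of the cluster, the edge terms total at most 4η times the sum over all pairs. A sketch with illustrative sizes (the value η = 0.1 and the greedy edge-set construction are arbitrary):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, d, eta = 60, 5, 0.1
C = rng.normal(size=(n, d))              # one cluster of n points
deg_cap = int(eta * n)                   # everywhere-sparse: degree <= eta * n

# greedily build a random edge set respecting the degree cap
pairs = np.array(list(combinations(range(n), 2)))
rng.shuffle(pairs)
deg = np.zeros(n, dtype=int)
E = []
for v, w in pairs:
    if deg[v] < deg_cap and deg[w] < deg_cap:
        E.append((v, w))
        deg[v] += 1
        deg[w] += 1

sq = lambda a, b: float(((C[a] - C[b]) ** 2).sum())
edge_sum = sum(sq(v, w) for v, w in E)
all_pairs_sum = sum(sq(v, w) for v, w in combinations(range(n), 2))

print(edge_sum <= 4 * eta * all_pairs_sum)   # True: edges carry <= 4*eta of the pairwise mass
```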
SLIDE 19 Everywhere-sparse edges
  Σ_{v,w∈D_i} ‖v − w‖²  ≈  Σ_{v,w∈D_i, (v,w)∉E} ‖v − w‖²  ≈  Σ_{v,w∈D_i, (v,w)∉E} ‖f(v) − f(w)‖²  ≈  Σ_{v,w∈D_i} ‖f(v) − f(w)‖²
SLIDE 20 Everywhere-sparse edges
  Σ_{v,w∈D_i} ‖v − w‖²  ≈  Σ_{v,w∈D_i, (v,w)∉E} ‖v − w‖²  ≈  Σ_{v,w∈D_i, (v,w)∉E} ‖f(v) − f(w)‖²  ≈  Σ_{v,w∈D_i} ‖f(v) − f(w)‖²
Edges are not necessarily everywhere sparse!
SLIDE 21
Outliers
Want: remove "outliers" so that in the remaining set X′ edges are everywhere sparse in every cluster.
SLIDE 22
(1 − δ) non-distorted core
Want: remove "outliers" so that in the remaining set X′ edges are everywhere sparse in every cluster.
SLIDE 23 (1 − δ) non-distorted core
Want: remove "outliers" so that in the remaining set X′ edges are everywhere sparse in every cluster. Find a subset X′ ⊂ X (which depends on f) s.t.
- Edges are sparse in the obtained clusters:
  every v ∈ D_i ∩ X′ is connected to at most an η fraction of all w in D_i ∩ X′.
- For every v, Pr(v ∉ X′) ≤ δ
SLIDE 24 All clusters are large
Assume all clusters are of size ~n/k. Let η = δ^{1/4}.
- Outliers = all vertices of degree at least ~ηn/k.
Every vertex has degree at most δn in expectation. By Markov, Pr(v is an outlier) ≤ δn / (ηn/k) = δk/η ≤ η. Remove ≈ ηn ≪ n/k vertices in total, so all clusters still have size ~n/k. Crucially use that all clusters are large!
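A toy version of this degree counting, with the important caveat that here the edges are sampled independently, which the real distortion graph does not satisfy (Slide 14); the sizes, δ, and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, delta = 2000, 4, 1e-3           # illustrative sizes
eta = delta ** 0.25                    # eta = delta^(1/4), as on the slide

# toy distortion graph: each pair is an edge independently with probability delta
# (only to illustrate the counting; real distortion edges are dependent)
upper = np.triu(rng.random((n, n)) < delta, 1)
deg = upper.sum(axis=0) + upper.sum(axis=1)   # degree of each vertex

threshold = eta * n / k                # outlier = degree at least ~eta * n/k
outliers = deg >= threshold
print(outliers.sum())                  # very few vertices get removed
```

With these parameters the expected degree (≈ δn) is far below the outlier threshold, so almost nothing is removed and every cluster keeps size ~n/k.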
SLIDE 25 Main Combinatorial Lemma
Idea: assign "weights" to vertices so that all clusters have a large weight.
- There is a measure μ on X and a random set B s.t.

  μ(y) ≥ 1/|D_i ∖ B| for y ∈ D_i ∖ B (always)

- μ(X) ≤ 4k³/ε²
- Pr(y ∈ B) ≤ δ
All clusters D_i ∖ B are "large" w.r.t. measure μ. Can apply a variant of the previous argument.
SLIDE 26 Edges Incident on Outliers
Need to take care of edges incident on outliers. Say, v is an outlier and w is not. Consider a fixed optimal clustering D*_1, …, D*_k for X. Let c* be the optimal center for v.
SLIDE 27
Edges Incident on Outliers
  ‖v − w‖ = ‖w − c*‖ ± ‖c* − v‖
  ‖f(v) − f(w)‖ = ‖f(w) − f(c*)‖ ± ‖f(c*) − f(v)‖

May assume that the distances between non-outliers and the optimal centers are (1 + ε)-preserved.
SLIDE 28 Edges Incident on Outliers
  ‖v − w‖ = ‖w − c*‖ ± ‖c* − v‖
  ‖f(v) − f(w)‖ = ‖f(w) − f(c*)‖ ± ‖f(c*) − f(v)‖

  E[ Σ_{v∉X′} ‖c*_v − v‖² ] ≤ δ Σ_{v∈X} ‖c*_v − v‖² = δ · OPT
SLIDE 29
Edges Incident on Outliers
  ‖v − w‖ = ‖w − c*‖ ± ‖c* − v‖
  ‖f(v) − f(w)‖ = ‖f(w) − f(c*)‖ ± ‖f(c*) − f(v)‖

Taking care of ‖f(c*) − f(v)‖ is a bit more difficult.
QED
SLIDE 30
k-medians under dimension reduction
SLIDE 31 k-medians
− No formula for the cost of the clustering in terms of pairwise distances.
− Not obvious when m ~ log n (then all pairwise distances are approximately preserved). [was asked by Ravi Kannan in a tutorial @ Simons]
+ Kirszbraun Theorem ⇒ the m ~ log n case
+ Prove a Robust Kirszbraun Theorem
Our methods for k-means + Robust Kirszbraun ⇒ m ~ log k for k-medians
SLIDE 32 Summary
- Prove that the cost of every k-means and k-medians clustering is preserved up to (1 + ε) under dimension reduction, when m ≥ O(log(k/(εδ))/ε²).
- The bound on m almost matches the lower bound.
- k-means: improves the bound m ≥ O(k/ε²) by Cohen et al.
- k-medians: no results were known.
- Applies to ℓ_p-clustering with the ℓ_p-objective when m ≥ O(p⁴ log(k/(εδ))/ε²).