SLIDE 1
Probabilistic clustering of high dimensional norms
Assaf Naor Princeton University SODA’17
SLIDE 2 Partitions of metric spaces
Let $(X, d_X)$ be a metric space and $\mathcal{P}$ a partition of $X$.
SLIDE 3
Given a point $x \in X$, $\mathcal{P}(x)$ is the unique cluster in $\mathcal{P}$ which contains $x$.
SLIDE 8
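Given $\Delta > 0$, the partition $\mathcal{P}$ is $\Delta$-bounded if all the clusters of $\mathcal{P}$ have diameter at most $\Delta$:
$$\forall x \in X, \quad \operatorname{diam}_X\big(\mathcal{P}(x)\big) := \max_{u,v \in \mathcal{P}(x)} d_X(u,v) \le \Delta.$$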
SLIDE 13 Random partitions
- Key goal in several areas of computer science and mathematics: use a $\Delta$-bounded partition to “simplify” the metric space.
- The partition should “mimic” the coarse geometric structure (at distance scale $\Delta$) in some meaningful way.
- Regions near boundaries should be “thin.”
- Quite paradoxical, but randomness helps here…
SLIDE 14
Separating random partitions
Definition (Bartal, 1996): Suppose that $(X, d_X)$ is a metric space and $\sigma, \Delta > 0$. A distribution over $\Delta$-bounded random partitions $\mathcal{P}$ of $X$ is said to be $\sigma$-separating if
$$\forall x, y \in X, \quad \Pr\big[\mathcal{P}(x) \ne \mathcal{P}(y)\big] \le \frac{\sigma}{\Delta}\, d_X(x,y).$$
(Implicit in several early works, with a variety of applications: Leighton-Rao [1988], Awerbuch-Peleg [1990], Linial-Saks [1991], Alon-Karp-Peleg-West [1991], Klein-Plotkin-Rao [1993], Rao [1999].)
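As a toy illustration of the definition (not an example from the talk), consider the real line with a grid of cells of width delta shifted by a uniformly random offset. Each cell has diameter at most delta, and two points x, y with |x - y| <= delta are separated with probability exactly |x - y| / delta, so this distribution is 1-separating. The sketch below estimates that probability by simulation; the function names are hypothetical.

```python
import random

def shifted_grid_cluster(x, delta, shift):
    """Index of the half-open cell [shift + j*delta, shift + (j+1)*delta) containing x."""
    return int((x - shift) // delta)

def estimate_separation_probability(x, y, delta, trials=100_000, seed=0):
    """Monte Carlo estimate of the probability that x and y land in different cells."""
    rng = random.Random(seed)
    separated = 0
    for _ in range(trials):
        shift = rng.uniform(0.0, delta)  # one sample of the random partition
        if shifted_grid_cluster(x, delta, shift) != shifted_grid_cluster(y, delta, shift):
            separated += 1
    return separated / trials

if __name__ == "__main__":
    delta = 1.0
    for gap in (0.1, 0.25, 0.5):
        est = estimate_separation_probability(0.3, 0.3 + gap, delta)
        # For this toy partition sigma = 1, so the bound is gap / delta.
        print(f"gap={gap}: estimated {est:.3f}, bound {gap / delta:.3f}")
```

In one dimension the separation modulus is a constant; the rest of the talk concerns how it must grow with the dimension.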
SLIDE 15
Modulus of separated decomposability
Denote by $\mathsf{SEP}(X)$ the minimum $\sigma > 0$ such that for every $\Delta > 0$ there is a $\sigma$-separating distribution over $\Delta$-bounded random partitions of $(X, d_X)$.
Note: we are ignoring here technical measurability issues that are important for mathematical applications in the infinite setting. For TCS purposes, it suffices to deal with random partitions of finite subsets of $X$.
SLIDE 16
Theorem (Bartal, 1996): If $|X| = n$ then $\mathsf{SEP}(X) \lesssim \log n$.
Goal of present work: to study $\mathsf{SEP}(X)$ for finite dimensional normed spaces $X$ (and subsets thereof). Originated in Peleg-Reshef [1998], followed by important work of Charikar-Chekuri-Goel-Guha-Plotkin [1998].
SLIDE 17
Sharp a priori bounds
Theorem: Suppose that $X$ is an $n$-dimensional normed space. Then
$$\sqrt{n} \lesssim \mathsf{SEP}(X) \lesssim n.$$
The upper bound follows from [CCGGP98]. The lower bound hasn’t been noticed before: it follows from a theorem of Bourgain-Szarek (1988) that is a consequence of the Bourgain-Tzafriri restricted invertibility principle (1987).
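SLIDE 18
Both bounds in $\sqrt{n} \lesssim \mathsf{SEP}(X) \lesssim n$ are asymptotically sharp, as shown in [CCGGP98]. In fact, it is proved there that
$$\mathsf{SEP}(\ell_2^n) \asymp \sqrt{n} \quad \text{and} \quad \mathsf{SEP}(\ell_1^n) \asymp n.$$
For $p \in [1, \infty)$ and $x = (x_1, \dots, x_n) \in \mathbb{R}^n$,
$$\|x\|_{\ell_p^n} := \Big( \sum_{j=1}^{n} |x_j|^p \Big)^{1/p} \quad \text{and} \quad \|x\|_{\ell_\infty^n} := \max_{j \in \{1, \dots, n\}} |x_j|.$$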
SLIDE 21
In [CCGGP98], Charikar-Chekuri-Goel-Guha-Plotkin asserted that
$$\mathsf{SEP}(\ell_p^n) \asymp \begin{cases} n^{1/p} & \text{if } 1 \le p \le 2, \\ n^{1 - 1/p} & \text{if } 2 \le p \le \infty. \end{cases}$$
The upper bound on $\mathsf{SEP}(\ell_p^n)$ in the above equivalence is valid as stated for all $1 \le p \le \infty$, but we show here that the matching lower bound is incorrect when $2 < p \le \infty$. Thus, in particular, we obtain an asymptotically better probabilistic clustering of, say, $\ell_\infty^n$.
SLIDE 22
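Theorem: For every $p \in [2, \infty]$,
$$\mathsf{SEP}(\ell_p^n) \lesssim \sqrt{n \min\{p, \log n\}}.$$
In particular, the previous best known bound when $p = \infty$ was $\mathsf{SEP}(\ell_\infty^n) \lesssim n$ (and this was asserted in [CCGGP98] to be sharp), but here we show that actually
$$\sqrt{n} \lesssim \mathsf{SEP}(\ell_\infty^n) \lesssim \sqrt{n \log n}.$$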
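To get a feel for the size of the improvement at p = infinity, the following sketch tabulates the two growth rates, n versus sqrt(n log n). It ignores the implied constants hidden in the asymptotic notation and assumes the natural logarithm, so the numbers are only indicative; the function names are hypothetical.

```python
import math

def asserted_sharp_bound(n, p):
    """Growth rate asserted in [CCGGP98]: n^(1/p) on [1, 2], n^(1 - 1/p) on [2, inf]."""
    if p <= 2:
        return n ** (1.0 / p)
    if math.isinf(p):
        return float(n)
    return n ** (1.0 - 1.0 / p)

def new_upper_bound(n, p):
    """sqrt(n * min{p, log n}) for p in [2, inf]; implied constants are dropped."""
    return math.sqrt(n * min(p, math.log(n)))

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        old = asserted_sharp_bound(n, math.inf)
        new = new_upper_bound(n, math.inf)
        print(f"n={n}: n = {old:.0f}, sqrt(n log n) = {new:.1f}")
```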
SLIDE 23
The source of the error in [CCGGP98] was that it relied on unpublished work of Indyk (1998) that has not been published since; we confirmed with Indyk as well as with some of the authors of [CCGGP98] that there is indeed a flaw in the (unpublished) work of Indyk that was cited. There is no flaw in the proof of [CCGGP98] in the range $p \in [1, 2]$, i.e.,
$$p \in [1, 2] \implies \mathsf{SEP}(\ell_p^n) \asymp n^{1/p}.$$
SLIDE 24 Refined probabilistic partitions for sparse or rapidly decaying vectors
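For $n \in \mathbb{N}$ and $k \in \{1, \dots, n\}$ denote by $(\ell_p^n)_{\le k}$ the subset of $\mathbb{R}^n$ consisting of all of those vectors with at most $k$ nonzero entries, equipped with the $\ell_p^n$ metric.
Theorem: For every $p > 1$ we have
$$\mathsf{SEP}\big((\ell_p^n)_{\le k}\big) \lesssim k^{\max\{\frac{1}{p}, \frac{1}{2}\}} \sqrt{\log\Big(\frac{n}{k}\Big) + \min\{p, \log n\}}.$$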
SLIDE 25
The special case $p = 2$ becomes
$$\forall k \in \{1, \dots, n\}, \quad \mathsf{SEP}\big((\ell_2^n)_{\le k}\big) \lesssim \sqrt{k \log\Big(\frac{en}{k}\Big)}.$$
A curious aspect of this bound is that despite the fact that it is a statement about Euclidean geometry, our proof involves non-Euclidean geometric considerations. Specifically, the ubiquitous “iterative ball partitioning method” is applied to balls in $\ell_p^n$ with $p = 1 + \log(n/k)$.
SLIDE 26 Mixed-metric random partitions
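Theorem: For every $p \in [1, \infty]$ and $\Delta > 0$ there exists a distribution over random partitions $\mathcal{P}$ of $\mathbb{R}^n$ with the following properties.
1) $\forall x \in \mathbb{R}^n, \quad \operatorname{diam}_{\ell_p^n}\big(\mathcal{P}(x)\big) \le \Delta.$
2) For every $x, y \in \mathbb{R}^n$,
$$\Pr\big[\mathcal{P}(x) \ne \mathcal{P}(y)\big] \lesssim \frac{n^{1/p} \sqrt{\min\{p, \log n\}}}{\Delta} \cdot \|x - y\|_{\ell_2^n}.$$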
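SLIDE 27
In particular, the special case $p = \infty$ shows that one can obtain a random partition of $\mathbb{R}^n$ into clusters of $\ell_\infty^n$ diameter at most $\Delta$ yet with the exponentially stronger Euclidean separation property
$$\forall x, y \in \mathbb{R}^n, \quad \Pr\big[\mathcal{P}(x) \ne \mathcal{P}(y)\big] \lesssim \frac{\sqrt{\log n}}{\Delta} \cdot \|x - y\|_{\ell_2^n}.$$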
SLIDE 28
Iterative ball partitioning method
Karger-Motwani-Sudan (1998), Charikar-Chekuri-Goel-Guha-Plotkin (1998), Calinescu-Karloff-Rabani (2001).
Iteratively remove balls of radius $\Delta/2$ centered at i.i.d. points in the normed space $X$, where $B_X = \{x \in X : \|x\|_X \le 1\}$ denotes the unit ball.
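The procedure can be sketched on a finite point set as follows. This is a minimal illustration rather than the construction analyzed in the talk: it assumes an $\ell_p$ norm, and it draws centers uniformly from the bounding box of the points enlarged by delta/2 so that sampling is well defined (both are assumptions made here, as are all function names). Each new ball of radius delta/2 claims the points that no earlier ball claimed, so every cluster has diameter at most delta by the triangle inequality.

```python
import random

def lp_norm(v, p):
    """l_p norm of a vector; p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def iterative_ball_partition(points, delta, p=2.0, seed=0):
    """Partition `points` (tuples in R^n) into clusters of l_p diameter <= delta.

    Returns a list of clusters, each a sorted list of indices into `points`.
    The loop ends once every point is claimed, which happens with probability 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Hypothetical sampling region: bounding box enlarged by delta / 2.
    lows = [min(pt[i] for pt in points) - delta / 2 for i in range(dim)]
    highs = [max(pt[i] for pt in points) + delta / 2 for i in range(dim)]
    unassigned = set(range(len(points)))
    clusters = []
    while unassigned:
        center = [rng.uniform(lows[i], highs[i]) for i in range(dim)]
        claimed = [j for j in unassigned
                   if lp_norm([a - c for a, c in zip(points[j], center)], p) <= delta / 2]
        if claimed:  # balls that hit no remaining point are simply skipped
            clusters.append(sorted(claimed))
            unassigned.difference_update(claimed)
    return clusters

if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    parts = iterative_ball_partition(pts, delta=0.3, p=2.0)
    print(f"{len(parts)} clusters, largest has {max(len(c) for c in parts)} points")
```

Changing `p` changes only the shape of the balls, which is the degree of freedom the talk exploits.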
SLIDE 29
[Figure: the first removed ball, $x_1 + \frac{\Delta}{2} B_X$.]
SLIDE 30
[Figure: the second removed ball, $x_2 + \frac{\Delta}{2} B_X$.]
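SLIDE 38
Theorem: Let $\|\cdot\|_X$ be a norm on $\mathbb{R}^n$ and let $\mathcal{P}$ be the random partition that is obtained using iterative ball partitioning where the underlying balls are balls of radius $\Delta/2$ in the norm $\|\cdot\|_X$. Then (by design) $\operatorname{diam}_X\big(\mathcal{P}(x)\big) \le \Delta$ for all $x \in \mathbb{R}^n$, and for every $x, y \in \mathbb{R}^n$ we have
$$\Pr\big[\mathcal{P}(x) \ne \mathcal{P}(y)\big] \lesssim \frac{\operatorname{vol}_{n-1}\big(\operatorname{Proj}_{(x-y)^\perp}(B_X)\big)}{\Delta \operatorname{vol}_n(B_X)} \cdot \|x - y\|_{\ell_2^n}.$$
Sharp when the right hand side is $< 1$ (using Schmuckenschläger [1992]).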
SLIDE 39 Extremal hyperplane projections
The previously stated theorems about random partitions of $\ell_p^n$ follow from this general theorem in combination with the evaluation of the extremal volumes of hyperplane projections of the unit ball of $\ell_p^n$ that were obtained by Barthe-N. (2002).
SLIDE 40 Extremal hyperplane projections
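Theorem (Barthe-N., 2002): For every $a \in \mathbb{R}^n \setminus \{0\}$, the following function is increasing in $p$:
$$p \mapsto \frac{\operatorname{vol}_{n-1}\big(\operatorname{Proj}_{a^\perp}(B_{\ell_p^n})\big)}{\operatorname{vol}_{n-1}\big(B_{\ell_p^{n-1}}\big)}.$$
When $p > 2$, the above ratio attains its maximum when $a = (1, 1, \dots, 1)$.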
SLIDE 41
The need to use an auxiliary metric
In the special case of $\ell_\infty^n$, if one applies iterative ball partitioning using balls in the intrinsic metric (which are in this case simply axis-parallel hypercubes $[-\Delta/2, \Delta/2]^n$), then one obtains a separation modulus of $n$. In other words, one cannot obtain our better estimate $\mathsf{SEP}(\ell_\infty^n) \lesssim \sqrt{n \log n}$ using the intrinsic metric of the space that we wish to partition!
SLIDE 42
The need to use an auxiliary metric
Our bound follows by applying this procedure using balls in the metric that is induced from $\ell_{\log n}^n$. The metrics on $\ell_\infty^n$ and $\ell_{\log n}^n$ are $O(1)$-equivalent (the balls in $\ell_{\log n}^n$ are “rounded cubes”). But the corresponding volumes change drastically, which allows our theorem to yield a better (almost sharp) bound on $\mathsf{SEP}(\ell_\infty^n)$.
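The O(1)-equivalence follows from the standard comparison: for every vector x in R^n and p >= 1, the max norm of x is at most its l_p norm, which is at most n^(1/p) times the max norm. With p = log n (taking the natural logarithm, an assumption about the base), n^(1/p) = e, so the two norms differ by a factor of at most e. The sketch below checks this numerically; the helper names are hypothetical.

```python
import math
import random

def lp_norm(v, p):
    """l_p norm of a vector for finite p."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def equivalence_ratio(v):
    """Ratio of the l_p norm to the max norm of a nonzero vector, with p = log n."""
    n = len(v)
    p = math.log(n)  # natural logarithm
    return lp_norm(v, p) / max(abs(t) for t in v)

if __name__ == "__main__":
    n = 1000
    rng = random.Random(0)
    ratios = [equivalence_ratio([rng.uniform(-1, 1) for _ in range(n)]) for _ in range(5)]
    print("random vectors:", [round(r, 3) for r in ratios])
    # The all-ones vector is the extreme case: its ratio is n^(1/log n) = e.
    print("all-ones vector:", equivalence_ratio([1.0] * n), "vs e =", math.e)
```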
SLIDE 43 Further applications
- Solution of longstanding open problems on the extension of Lipschitz functions.
- Improved probabilistic partitions of the Schatten-von Neumann trace classes $S_p^n$ and their subset consisting of all the matrices of rank at most $k$ (N.-Schechtman, forthcoming); improved Lipschitz extension theorems for $S_p^n$.
- New volumetric stability theorems.
- Several additional results in full journal version.