Dimensionality Reduction: Theoretical Analysis of Practical Measures - PowerPoint PPT Presentation

SLIDE 1

Dimensionality Reduction: Theoretical Analysis of Practical Measures

Nova Fandina, Hebrew University
Joint work with Yair Bartal (Hebrew University) and Ofer Neiman (Ben Gurion University)

SLIDE 2

Outline

  • Measuring the Quality of Embedding (for dimensionality reduction methods)
    • in theory: worst-case distortion analysis
    • in practice: average-case distortion measures
    • in between: theoretical analysis of practical measures
  • Our Results
    • upper bounds
    • lower bounds
    • approximating the optimal embedding

SLIDE 3

Measuring the Quality of Embedding: in theory

Basic question in metric embedding theory (informally): given metric spaces $X$ and $Y$, embed $X$ into $Y$ with small error on the distances. How well can it be done? In theory, "well" traditionally means minimizing the distortion of the worst pair.

Definition. For an embedding $f: X \to Y$ and a pair of points $u \neq v \in X$:

  • $\mathrm{expans}_f(u,v) = \frac{d_Y(f(u), f(v))}{d_X(u,v)}$, $\quad \mathrm{contr}_f(u,v) = \frac{d_X(u,v)}{d_Y(f(u), f(v))}$
  • $\mathrm{distortion}(f) = \max_{u \neq v \in X} \mathrm{expans}_f(u,v) \cdot \max_{u \neq v \in X} \mathrm{contr}_f(u,v)$

SLIDE 4

Measuring the Quality of Embedding: in practice

The demand for a worst-case guarantee is too strong: the quality of a method in practical applications is its average performance over all pairs.

  • Y. Shavitt and T. Tankel. Big-bang simulation for embedding network distances in Euclidean space. IEEE/ACM Trans. Netw., 12(6), 2004.
  • P. Sharma, Z. Xu, S. Banerjee, and S. Lee. Estimating network proximity and latency. Computer Communication Review, 36(3), 2006.
  • P. J. F. Groenen, R. Mathar, and W. J. Heiser. The majorization approach to multidimensional scaling for Minkowski distances. Journal of Classification, 12(1), 1995.
  • J. F. Vera, W. J. Heiser, and A. Murillo. Global optimization in any Minkowski metric: A permutation-translation simulated annealing algorithm for multidimensional scaling. Journal of Classification, 24(2), 2007.
  • A. Censi and D. Scaramuzza. Calibration by correlation using metric embedding from nonmetric similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(10), 2013.
  • C. Lumezanu and N. Spring. Measurement manipulation and space selection in network coordinates. The 28th International Conference on Distributed Computing Systems, 2008.
  • S. Chatterjee, B. Neff, and P. Kumar. Instant approximate 1-center on road networks via embeddings. In Proceedings of the 19th International Conference on Advances in Geographic Information Systems, GIS '11, 2011.
  • S. Lee, Z. Zhang, S. Sahu, and D. Saha. On suitability of Euclidean embedding for host-based network coordinate systems. IEEE/ACM Trans. Netw., 18(1), 2010.
  • L. Chennuru Vankadara and U. von Luxburg. Measures of distortion for machine learning. Advances in Neural Information Processing Systems, Curran Associates, Inc., 2018.

This is just a small sample from a googolplex of such studies.
SLIDE 5

Measuring the Quality of Embedding: in practice

Moments of Distortion and Relative Error

For $f: X \to Y$ and a pair $u \neq v \in X$: $\mathrm{dist}_f(u,v) := \max\{\mathrm{expans}_f(u,v), \mathrm{contr}_f(u,v)\}$.

$\ell_q$-distortion [ABN06]
For $f: X \to Y$, a distribution $\Pi$ over pairs of $X$, and $q \ge 1$:
$\ell_q^{(\Pi)}\text{-dist}(f) = \left(\mathbb{E}_{\Pi}\left[\mathrm{dist}_f(u,v)^q\right]\right)^{1/q}$

Relative Error Measure [in many papers]
$\mathrm{REM}_q^{(\Pi)}(f) = \left(\mathbb{E}_{\Pi}\left[|\mathrm{dist}_f(u,v) - 1|^q\right]\right)^{1/q}$
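A Python sketch of the two average-case measures, working directly on lists of original distances `d` and embedded distances `d_hat` for the pairs in the support of $\Pi$ (uniform unless `probs` is given). The function names are our own; the example shows how averaging forgives bad pairs that would dominate the worst case.

```python
import math


def pair_dist(d, d_hat):
    """dist_f(u, v) = max(expans, contr) for one pair with original distance d > 0."""
    if d_hat == 0:
        return math.inf
    return max(d_hat / d, d / d_hat)


def lq_dist(d, d_hat, q, probs=None):
    """(E_Pi[dist_f^q])^(1/q); Pi defaults to uniform over the listed pairs."""
    probs = probs or [1 / len(d)] * len(d)
    return sum(p * pair_dist(a, b) ** q for p, a, b in zip(probs, d, d_hat)) ** (1 / q)


def rem(d, d_hat, q, probs=None):
    """REM_q = (E_Pi[|dist_f - 1|^q])^(1/q)."""
    probs = probs or [1 / len(d)] * len(d)
    return sum(p * abs(pair_dist(a, b) - 1) ** q for p, a, b in zip(probs, d, d_hat)) ** (1 / q)


d = [1.0, 1.0, 1.0, 1.0]        # original distances of four pairs
d_hat = [1.0, 2.0, 0.5, 1.0]    # one pair stretched by 2, one shrunk by 2
print(lq_dist(d, d_hat, 1), rem(d, d_hat, 1))   # prints 1.5 0.5
```

With the same data the worst case ($\ell_\infty$) would be 2, while $\ell_1$-dist is 1.5; a single collapsed pair still makes every $\ell_q$-dist infinite.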

SLIDE 6

Measuring the Quality of Embedding: in practice

Additive Distortion Measures [MDS: optimally embed a given finite $X$ into a $k$-dim Euclidean space, for a given $k$]. For a pair $u \neq v \in X$, let $d_{uv} = d_X(u,v)$ and $\hat d_{uv} = d_Y(f(u), f(v))$.

  • $\mathrm{Energy}_q(f) = \left(\mathbb{E}_{\Pi}\left[\left(\frac{|\hat d_{uv} - d_{uv}|}{d_{uv}}\right)^q\right]\right)^{1/q}$
  • $\mathrm{Stress}_q(f) = \left(\frac{\mathbb{E}_{\Pi}\left[|d_{uv} - \hat d_{uv}|^q\right]}{\mathbb{E}_{\Pi}\left[d_{uv}^q\right]}\right)^{1/q}$
  • $\mathrm{Stress}^*_q(f) = \left(\frac{\mathbb{E}_{\Pi}\left[|d_{uv} - \hat d_{uv}|^q\right]}{\mathbb{E}_{\Pi}\left[\hat d_{uv}^q\right]}\right)^{1/q}$
  • $\mathrm{REM}_q(f) = \left(\mathbb{E}_{\Pi}\left[\left(\frac{|\hat d_{uv} - d_{uv}|}{\min\{d_{uv}, \hat d_{uv}\}}\right)^q\right]\right)^{1/q}$
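The four additive measures in a Python sketch, on parallel lists `d` and `d_hat` and, for brevity, with $\Pi$ uniform over the listed pairs (an assumption). The tests also check the identity $|\hat d - d| / \min\{d, \hat d\} = \mathrm{dist}_f(u,v) - 1$, which is why the last formula agrees with the REM defined on the previous slide.

```python
def _mean(values):
    return sum(values) / len(values)


def energy(d, d_hat, q):
    return _mean([(abs(b - a) / a) ** q for a, b in zip(d, d_hat)]) ** (1 / q)


def stress(d, d_hat, q):
    num = _mean([abs(a - b) ** q for a, b in zip(d, d_hat)])
    return (num / _mean([a ** q for a in d])) ** (1 / q)


def stress_star(d, d_hat, q):
    # Same numerator as stress, normalized by the embedded distances instead.
    num = _mean([abs(a - b) ** q for a, b in zip(d, d_hat)])
    return (num / _mean([b ** q for b in d_hat])) ** (1 / q)


def rem_additive(d, d_hat, q):
    return _mean([(abs(b - a) / min(a, b)) ** q for a, b in zip(d, d_hat)]) ** (1 / q)


d, d_hat = [1.0, 2.0], [2.0, 2.0]
print(energy(d, d_hat, 1), stress(d, d_hat, 1), stress_star(d, d_hat, 1), rem_additive(d, d_hat, 1))
# 0.5, 1/3, 0.25, 0.5
```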

SLIDE 7

Measuring the Quality of Embedding: in practice

$\sigma$-distortion: ML-motivated, introduced in [VvL18]

"Necessary properties for ML applications":

  • translation invariance
  • scale invariance
  • monotonicity
  • robustness (outliers, noise)
  • incorporation of probability

$\sigma\text{-dist}^{(\Pi)}_{q,r}(f) = \left(\mathbb{E}_{\Pi}\left[\left|\frac{\mathrm{expans}_f(u,v)}{\ell_r^{(U)}\text{-expans}(f)} - 1\right|^q\right]\right)^{1/q}$, where $U$ is the uniform distribution over pairs and
  • $\ell_r^{(U)}\text{-expans}(f) = \left(\mathbb{E}_U\left[\mathrm{expans}_f(u,v)^r\right]\right)^{1/r}$
  • $\ell_r^{(U)}\text{-contr}(f) = \left(\mathbb{E}_U\left[\mathrm{contr}_f(u,v)^r\right]\right)^{1/r}$

➢ Almost nothing is known in terms of rigorous analysis.
➢ Many heuristics exist for optimizing these measures.
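A Python sketch of $\sigma$-distortion, taking both $\Pi$ and $U$ uniform over the listed pairs (an assumption made here for brevity; the function name is ours). It illustrates one of the listed properties, scale invariance: multiplying every embedded distance by 10 leaves the value unchanged, because the normalization by $\ell_r$-expans absorbs the factor.

```python
def sigma_dist(d, d_hat, q, r):
    """sigma-dist_{q,r}: normalize each expansion by the l_r-expansion,
    then take the l_q average of |ratio - 1| (Pi and U uniform here)."""
    expansions = [b / a for a, b in zip(d, d_hat)]
    lr_expans = (sum(e ** r for e in expansions) / len(expansions)) ** (1 / r)
    return (sum(abs(e / lr_expans - 1) ** q for e in expansions) / len(expansions)) ** (1 / q)


d = [1.0, 2.0, 3.0]
d_hat = [1.1, 1.9, 3.3]
base = sigma_dist(d, d_hat, q=2, r=2)
scaled = sigma_dist(d, [10 * x for x in d_hat], q=2, r=2)
print(base, scaled)   # equal up to floating-point rounding
```

A map that merely rescales all distances has every expansion equal to its $\ell_r$-expansion, so its $\sigma$-distortion is 0.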

SLIDE 8

Measuring the Quality of Embedding: in between

Bridging the gap between theory and practice (outlook). [CD06]: optimizing $\mathrm{Stress}_q$ is NP-hard for $k = 1$.

$\alpha(k, q)$-Dimension Reduction: Given a dimension bound $k \ge 1$ and $q \ge 1$, what is the least $\alpha(k,q)$ such that every finite subset of Euclidean space embeds into $k$ dimensions with $\mathrm{Measure}_q \le \alpha(k,q)$?

General Metrics (MDS): For a given finite $X$ and $k \ge 1$, compute the optimal embedding of $X$ into $k$-dim Euclidean space, minimizing a particular $\mathrm{Measure}_q$.

General Metrics, Approximating the Optimal Embedding: For a given finite $X$ and $k \ge 1$, compute an embedding of $X$ into $k$-dim Euclidean space that approximates the best possible embedding for a given $\mathrm{Measure}_q$.

SLIDE 9

Our Results: upper bounds, previous results

Previous results (worst-case distortion):
  • JL[84]: every $n$-point $X \subset \ell_2^d$ embeds into $\ell_2^k$ with distortion $O\left(n^{2/k}\sqrt{(\log n)/k}\right)$.
  • Mat[90]: there is $V \subset \ell_2^{k+1}$ such that any $f: V \to \ell_2^k$ must have distortion $n^{\Omega(1/k)}$.

▪ $\mathrm{distortion}(f) \le (\ell_\infty\text{-dist}(f))^2$.
▪ For every $f: X \to Y$ ($Y$ scalable) there is $g: X \to Y$ with $\ell_\infty\text{-dist}(g) = \sqrt{\mathrm{distortion}(f)}$.

What about $\mathrm{Measure}_q$ guarantees for $q < \infty$?
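The second ▪ follows by rescaling: multiplying all image distances by $c = \sqrt{\max \mathrm{contr} / \max \mathrm{expans}}$ makes both maxima equal to $\sqrt{\mathrm{distortion}(f)}$. A short Python check on lists of pair distances (function names are illustrative):

```python
import math


def worst_case_stats(d, d_hat):
    """Return (max expansion, max contraction) over the listed pairs."""
    return max(b / a for a, b in zip(d, d_hat)), max(a / b for a, b in zip(d, d_hat))


def l_inf_dist(d, d_hat):
    """l_inf-dist = max over pairs of max(expans, contr)."""
    return max(worst_case_stats(d, d_hat))


def balanced_rescale(d, d_hat):
    """Scale the embedding so that max expansion == max contraction."""
    max_exp, max_con = worst_case_stats(d, d_hat)
    c = math.sqrt(max_con / max_exp)
    return [c * b for b in d_hat]


d = [1.0, 1.0, 1.0]
d_hat = [4.0, 2.0, 1.0]               # distortion = 4 * 1 = 4, l_inf-dist = 4
g_hat = balanced_rescale(d, d_hat)    # c = 1/2 gives [2.0, 1.0, 0.5]
print(l_inf_dist(d, d_hat), l_inf_dist(d, g_hat))   # prints 4.0 2.0
```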

SLIDE 10

Our Results: upper bounds, JL transform: IM implementation

The answer to the $\alpha(k,q)$-Dimension Reduction question is, essentially, the JL transform.

[JL84] Projection onto a random subspace of dimension $k = O(\log n/\epsilon^2)$ [tight, LN16] has, with constant probability, $\mathrm{distortion}(f) = 1 + \epsilon$.

[IM98] $T$ is a $k \times d$ matrix with independent entries sampled from $N(0,1)$. The embedding $f: X \to \ell_2^k$ is
$f(x) = \frac{1}{\sqrt{k}} \cdot T(x)$

  • The JL transform of IM98 provides constant upper bounds for all $\mathrm{Measure}_q$; the bounds are almost optimal.
  • Other popular implementations of JL do not work for $\ell_q$-dist and $\mathrm{REM}_q$.
  • PCA may produce an embedding of extremely poor quality for all the measures (this does not happen with JL).
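A minimal pure-Python sketch of the IM98-style map $f(x) = T x/\sqrt{k}$, written for clarity rather than speed (in practice one would use a numerical library). `gaussian_jl` and `embed` are illustrative names. The example checks the basic fact behind the analysis: for a unit vector $z$, $\mathbb{E}\|f(z)\|^2 = 1$.

```python
import math
import random


def gaussian_jl(k, d, rng):
    """k x d matrix T with i.i.d. N(0, 1) entries, as in the IM98 construction."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def embed(T, x):
    """f(x) = T x / sqrt(k)."""
    k = len(T)
    return [sum(t * xi for t, xi in zip(row, x)) / math.sqrt(k) for row in T]


rng = random.Random(0)
d, k = 10, 20
z = [1.0] + [0.0] * (d - 1)            # a unit vector
trials = 1000
mean_sq_norm = sum(
    sum(c * c for c in embed(gaussian_jl(k, d, rng), z)) for _ in range(trials)
) / trials
print(round(mean_sq_norm, 2))          # close to 1, since E||f(z)||^2 = ||z||^2
```

Because $f$ is linear, $f(u) - f(v) = f(u - v)$, which is what reduces the pairwise analysis to a single unit vector.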

SLIDE 11

Our Results: upper bounds, other implementations of JL

  • [Achl03]: the entries of $T$ are independent and uniform over $\{\pm 1\}$.
  • [DKS10, KN10, AL10] (sparse/fast JL): the entries follow a particular distribution over $\{\pm 1, 0\}$.

Constant bounds cannot be achieved using these implementations.

➢ $T(e_1), \dots, T(e_d)$ are the columns of $T$. If the entries take $s$ distinct values, there are at most $s^k$ distinct columns, so when $s^k < d$ two standard basis vectors $e_i \neq e_j$ are mapped to the same point and $\mathrm{contr}_f(e_i, e_j) = \infty$.

Observation: If a linear transformation $T: \mathbb{R}^d \to \mathbb{R}^k$ samples its entries from a discrete set of values of size $s \le d^{1/k}$, then applying it to the standard basis of $\mathbb{R}^d$ results in $\ell_q$-dist $= \mathrm{REM}_q = \infty$.
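The pigeonhole argument behind the Observation, sketched in Python for an Achlioptas-style $\{\pm 1\}$ matrix (so $s = 2$). Names are illustrative, and the $1/\sqrt{k}$ scaling is omitted since it does not affect collisions. With $k = 3$ there are only $2^3 = 8$ possible columns, so for $d = 9$ some $T(e_i) = T(e_j)$ with $i \neq j$ for every random draw.

```python
import random


def sign_matrix(k, d, rng):
    """k x d matrix with independent uniform {+1, -1} entries (Achlioptas-style)."""
    return [[rng.choice((1, -1)) for _ in range(d)] for _ in range(k)]


def images_of_basis(T):
    """T(e_j) is simply the j-th column of T."""
    return [tuple(row[j] for row in T) for j in range(len(T[0]))]


def has_collision(T):
    """True if two distinct basis vectors are mapped to the same point."""
    cols = images_of_basis(T)
    return len(set(cols)) < len(cols)


k, d = 3, 9                      # s**k = 8 < d = 9
T = sign_matrix(k, d, random.Random(42))
print(has_collision(T))          # True for every seed, by pigeonhole
```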

SLIDE 12

Our Results: upper bounds, limitation of PCA

PCA/c-MDS: For a given finite $X \subset \ell_2^d$ and a given integer $k \ge 1$, computes the best rank-$k$ approximation to $X$: a projection $P$ onto the $k$-dim subspace spanned by the eigenvectors of the covariance matrix with the largest eigenvalues, with the smallest $\sum_{u \in X} \|u - P(u)\|^2$.

▪ $f: X \to \ell_2^k$ with optimal $\sum_{u \neq v \in X} \left(d_{uv}^2 - \hat d_{uv}^2\right)$ over all projections.
▪ Often misused: "minimizing $\mathrm{Stress}_2$ over all embeddings into $k$ dimensions".
▪ Actually, PCA does not minimize any of the mentioned measures.

SLIDE 13

Our Results: upper bounds, bad metric for PCA

  • The metric lies in $d$-dimensional Euclidean space, for any $d$ large enough.
  • Fix some $\alpha < 1$ and $q \ge 1$.
  • Consider the standard basis vectors $e_1, \dots, e_d$.
  • For each vector $e_i$, let $X_i$ be a set of $(1/\alpha^i)^q$ copies of the vector $\alpha^i \cdot e_i$, and let $Y_i$ be a set of the same size of copies of the antipodal vector $-\alpha^i \cdot e_i$.

[Figure: the points $\pm\alpha^i e_i$ placed on the coordinate axes $x$, $y$, $z$]

  • The $\mathrm{Stress}_2^2$ measure: $\frac{\sum_{u \neq v \in X} (d_{uv} - \hat d_{uv})^2}{\sum_{u \neq v \in X} d_{uv}^2}$
  • $\sum_{u \neq v} d_{uv}^2 \approx d \cdot (1/\alpha^d)^2$
  • For $q = 2$ (so $|X_i| = 1/\alpha^{2i}$), pairs between $X_i$ and $X_j$, $i < j$, contribute $\frac{1}{\alpha^{2i}} \cdot \frac{1}{\alpha^{2j}} \cdot (\alpha^{2i} + \alpha^{2j}) \le \frac{1}{\alpha^{2i}} \cdot \frac{1}{\alpha^{2j}} \cdot 2\alpha^{2i} = \frac{2}{\alpha^{2j}}$, and at least $\frac{1}{\alpha^{2j}}$; summing over $i < j \le d$ gives the estimate above.

SLIDE 14

Our Results: upper bounds, limitation of PCA

$\sum_{u \neq v} d_{uv}^2 \approx d \cdot (1/\alpha^d)^2$

➢ PCA projects onto $\mathrm{span}\{e_1, e_2, \dots, e_k\}$; taking $k < 0.99d$ ⟹ $\ell_q$-dist $= \mathrm{REM}_q = \infty$, since all copies of $\pm\alpha^j e_j$ with $j > k$ collapse to $0$.
➢ The JL embedding has bounded measures: $\alpha_k \to 0$ as $k$ increases.

A 1-dimensional embedding (e.g., $\pm\alpha^i e_i \mapsto \pm\alpha^i$ on a line):
  • Error contribution of pairs $i < j$: $\approx \frac{1}{\alpha^{2i}} \cdot \frac{1}{\alpha^{2j}} \cdot \left(\sqrt{\alpha^{2i} + \alpha^{2j}} - |\alpha^i - \alpha^j|\right)^2 \approx 1/\alpha^{2i}$; in total $\approx (1/\alpha^{d-1})^2$.
  • $\mathrm{Stress}_2 \le \alpha / d^{1/2}$, since $\sqrt{(1/\alpha^{d-1})^2 / \left(d \cdot (1/\alpha^d)^2\right)} = \alpha/\sqrt{d}$.

PCA:
  • Error contribution: $\ge (d - k) \cdot (1/\alpha^d)^2$.
  • $\mathrm{Stress}_2 \ge \Omega(1)$, since the ratio is at least $(d - k)/d > 0.01$.
  • This is no better than a naïve algorithm: any non-expansive embedding has $\mathrm{Stress}_2 \le 1$.
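A small numerical check of the bad instance, in Python. It assumes $q = 2$, $\alpha = 1/2$, $d = 4$, $k = 2$ (tiny, so that the $(1/\alpha^i)^q = 4^i$ copies can be handled as integer multiplicities), the uniform distribution over pairs of copies, and, as stated on the slide, that PCA projects onto $\mathrm{span}\{e_1, \dots, e_k\}$. For comparison it also evaluates the 1-dimensional map $x \mapsto \sum_i x_i$, which sends $\pm\alpha^i e_i$ to $\pm\alpha^i$; this is our illustrative choice. All names are hypothetical.

```python
import math


def bad_pca_instance(d, alpha, q):
    """Distinct locations +/- alpha^i e_i (i = 1..d), each with (1/alpha^i)^q copies."""
    points = []
    for i in range(1, d + 1):
        copies = round((1 / alpha ** i) ** q)
        for sign in (1.0, -1.0):
            x = [0.0] * d
            x[i - 1] = sign * alpha ** i
            points.append((tuple(x), copies))
    return points


def stress2(points, f):
    """Stress_2 under the uniform distribution over pairs of copies.

    Two copies of the same location are at distance 0 before and after f,
    so they add nothing to either sum and are skipped.
    """
    num = den = 0.0
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            (x, mx), (y, my) = points[a], points[b]
            dist, dist_hat = math.dist(x, y), math.dist(f(x), f(y))
            num += mx * my * (dist - dist_hat) ** 2
            den += mx * my * dist ** 2
    return math.sqrt(num / den)


def pca_projection(x):
    return x[:2]            # projection onto span{e_1, e_2}


def line_map(x):
    return (sum(x),)        # +/- alpha^i e_i  ->  +/- alpha^i


points = bad_pca_instance(d=4, alpha=0.5, q=2)
print(stress2(points, pca_projection), stress2(points, line_map))
```

The projection's $\mathrm{Stress}_2$ comes out noticeably larger than the line map's, and it sends the distinct points $\alpha^3 e_3$ and $\alpha^4 e_4$ to the same point, so their contraction is infinite.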

SLIDE 15

Our Results: upper bounds, moment analysis of JL transform

Theorem (Moment analysis of JL transform): There is a map (JL or normalized JL) $f: X \to \ell_2^k$ such that, for a given $q \ge 1$, with constant probability $\ell_q$-dist$(f)$ is:
  • $1 + O(1/\sqrt{k})$ for $1 \le q < \sqrt{k}$
  • $1 + O\left(q/(k-q)\right)$ for $\sqrt{k} \le q \le k/4$
  • $\left(k/(k-q)\right)^{O(1/q)}$ for $k/4 \le q < k$
  • $O\left((\log n)^{1/k}\right)$ for $q = k$
  • $n^{O(1/k - 1/q)}$ for $k < q \le \infty$

The bounds are almost tight in most of the ranges of values of $q$.

Proof (for $q < k$): For a given distribution $\Pi$ over pairs of $X$ and a given $q \ge 1$,
$$\mathbb{E}_f\left[\left(\ell_q^{(\Pi)}\text{-dist}(f)\right)^q\right] = \mathbb{E}_f\left[\mathbb{E}_{(u,v)\sim\Pi}\left[\mathrm{dist}_f(u,v)^q\right]\right] = \mathbb{E}_{(u,v)\sim\Pi}\left[\mathbb{E}_f\left[\mathrm{dist}_f(u,v)^q\right]\right].$$

SLIDE 16

For every $u \neq v \in X$, estimate
$$\mathbb{E}_f\left[\mathrm{dist}_f(u,v)^q\right] = \mathbb{E}_f\left[\max\left(\frac{\|f(u) - f(v)\|}{\|u - v\|}, \frac{\|u - v\|}{\|f(u) - f(v)\|}\right)^q\right].$$
Since $f$ is a linear map, it suffices to estimate, for any $z \in \mathbb{R}^d$ with $\|z\| = 1$,
$$\mathbb{E}_f\left[\max\left(\|f(z)\|^q, \frac{1}{\|f(z)\|^q}\right)\right].$$

  • $f(z) = \frac{1}{\sqrt{k}} \cdot T(z) = \frac{1}{\sqrt{k}} \cdot \left(\langle z, T_1\rangle, \dots, \langle z, T_k\rangle\right) = \frac{1}{\sqrt{k}}(r_1, \dots, r_k)$, for $r_i \sim N(0,1)$, independent since the rows $T_i$ of $T$ are independent standard Gaussian vectors and $\|z\| = 1$.
  • $\|f(z)\|^q = \left(\|f(z)\|^2\right)^{q/2} = (X/k)^{q/2}$, where $X \sim \chi_k^2$.

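SLIDE 17
  • $\mathbb{E}_f\left[\max\left(\|f(z)\|^q, \frac{1}{\|f(z)\|^q}\right)\right] \le \mathbb{E}_{X \sim \chi_k^2}\left[(X/k)^{q/2}\right] + \mathbb{E}_{X \sim \chi_k^2}\left[(k/X)^{q/2}\right]$
  • Using the Gamma function $\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\,dx$, $\Gamma(t+1) = t\,\Gamma(t)$: $\mathbb{E}_{X \sim \chi_k^2}\left[(k/X)^{q/2}\right] = \left(\frac{k}{2}\right)^{q/2} \frac{\Gamma\left(\frac{k-q}{2}\right)}{\Gamma\left(\frac{k}{2}\right)}$ for $q < k$, which goes to $\infty$ as $q \to k$.
  • $\mathbb{E}_f\left[(\ell_q\text{-contr}(f))^q\right] = \left(1 + O\left(\frac{q}{k-q}\right)\right)^q$, for all $q < k$
  • $\mathbb{E}_f\left[(\ell_q\text{-expans}(f))^q\right] = \left(1 + O\left(\frac{q}{k}\right)\right)^q$, for all $q \ge 1$
  • Since $\mathrm{dist}_f^q \le \mathrm{expans}_f^q + \mathrm{contr}_f^q$, Jensen's inequality gives $\mathbb{E}_f\left[\ell_q\text{-dist}(f)\right] \le 2^{1/q}\left(1 + O\left(\frac{q}{k-q}\right)\right) \le \left(1 + \frac{1}{q}\right)\left(1 + O\left(\frac{q}{k-q}\right)\right) = 1 + O(1/\sqrt{k})$, taking $q = \sqrt{k}$. ◼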
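The Gamma-function moments of $\chi_k^2$ can be checked numerically. This Python sketch (stdlib only; function names are our own) evaluates the closed forms $\mathbb{E}[(X/k)^{q/2}] = (2/k)^{q/2}\,\Gamma\left(\frac{k+q}{2}\right)/\Gamma\left(\frac{k}{2}\right)$ and $\mathbb{E}[(k/X)^{q/2}] = (k/2)^{q/2}\,\Gamma\left(\frac{k-q}{2}\right)/\Gamma\left(\frac{k}{2}\right)$ with `math.lgamma`, and compares them with Monte Carlo samples drawn via `random.gammavariate(k/2, 2)`, since $\chi_k^2$ is a Gamma distribution with shape $k/2$ and scale 2.

```python
import math
import random


def expans_moment(k, q):
    """E[(X/k)^{q/2}] for X ~ chi^2_k, via the Gamma function."""
    return math.exp((q / 2) * math.log(2 / k) + math.lgamma((k + q) / 2) - math.lgamma(k / 2))


def contr_moment(k, q):
    """E[(k/X)^{q/2}] for X ~ chi^2_k; finite only for q < k."""
    if q >= k:
        return math.inf
    return math.exp((q / 2) * math.log(k / 2) + math.lgamma((k - q) / 2) - math.lgamma(k / 2))


def monte_carlo(k, q, samples, rng):
    # chi^2_k is Gamma(shape=k/2, scale=2); random.gammavariate takes (shape, scale).
    xs = [rng.gammavariate(k / 2, 2.0) for _ in range(samples)]
    return (sum((x / k) ** (q / 2) for x in xs) / samples,
            sum((k / x) ** (q / 2) for x in xs) / samples)


k, q = 64, 4
print(expans_moment(k, q), contr_moment(k, q))   # 1 + 2/k and k^2/((k-2)(k-4))
print(monte_carlo(k, q, 20000, random.Random(0)))
```

For $k = 64$, $q = 4$ the closed forms reduce to $1 + 2/k$ and $k^2/((k-2)(k-4))$; `contr_moment` grows without bound as $q \to k$, matching the remark on the slide.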

SLIDE 18

Our Results: upper bounds, estimates for large values of $q$

Theorem (Moment analysis of JL transform): There is a map (normalized JL) $f: X \to \ell_2^k$ such that, for a given $q \ge 1$, with constant probability $\ell_q$-dist$(f)$ is:
  • $1 + O(1/\sqrt{k})$ for $1 \le q < \sqrt{k}$
  • $1 + O\left(q/(k-q)\right)$ for $\sqrt{k} \le q \le k/4$
  • $\left(k/(k-q)\right)^{O(1/q)}$ for $k/4 \le q < k$
  • $O\left((\log n)^{1/k}\right)$ for $q = k$
  • $n^{O(1/k - 1/q)}$ for $k < q \le \infty$

Proof: The same as before, estimating the expectation conditioned on the event that $\mathrm{contr}_f(u,v) \le n^{2/k}$ for all $u \neq v \in X$ (which holds with constant probability), and normalizing by an appropriate factor that depends on $q$ and $k$. ◼

SLIDE 19

Our Results: upper bounds, REM and additive measures

Theorem (REM and additive measures analysis of JL): There is a map (JL) $f: X \to \mathbb{R}^k$, for $k \ge 2$, such that with constant probability, for all $1 \le q \le k - 1$ simultaneously,
$$\sigma\text{-dist}(f),\ \mathrm{Stress}_q(f),\ \mathrm{Stress}^*_q(f),\ \mathrm{Energy}_q(f) \le \mathrm{REM}_q(f) = O\left(\sqrt{q/k}\right), \quad \text{where } \mathrm{REM}_q = \left(\mathbb{E}_\Pi\left[|\mathrm{dist}_f(u,v) - 1|^q\right]\right)^{1/q}.$$

▪ All the additive measures are bounded by REM.
▪ As before, estimate the appropriate integral to get the bound.
▪ Tight for $q \ge 2$ (the equilateral space on $n$ points, $E_n$), based on [Alon90]: a lower bound for embeddings with $1 + \epsilon$ worst-case distortion.
▪ For $1 \le q < 2$ the lower bound is $\Omega(1/k^{1/q})$.
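An empirical look at how $\mathrm{REM}_q$ of the Gaussian JL map shrinks as $k$ grows, in Python. The point set (20 random Gaussian points in 50 dimensions), the uniform $\Pi$, and the function names are assumptions for illustration; the printout puts the measured $\mathrm{REM}_2$ next to $\sqrt{2/k}$ for comparison, without claiming any particular constant.

```python
import itertools
import math
import random


def gaussian_jl_embed(points, k, rng):
    """IM98-style map f(x) = T x / sqrt(k), T a k x d matrix of i.i.d. N(0, 1) entries."""
    d = len(points[0])
    T = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    return [[sum(t * xi for t, xi in zip(row, x)) / math.sqrt(k) for row in T] for x in points]


def rem_q(points, images, q):
    """REM_q under the uniform distribution over pairs: (E|dist_f - 1|^q)^(1/q)."""
    terms = []
    for a, b in itertools.combinations(range(len(points)), 2):
        d = math.dist(points[a], points[b])
        d_hat = math.dist(images[a], images[b])
        terms.append(abs(max(d_hat / d, d / d_hat) - 1) ** q)
    return (sum(terms) / len(terms)) ** (1 / q)


rng = random.Random(7)
points = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
for k in (16, 64, 256):
    images = gaussian_jl_embed(points, k, rng)
    print(k, round(rem_q(points, images, q=2), 3), round(math.sqrt(2 / k), 3))
```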

SLIDE 20

Our Results: optimal embedding for $E_n$, small $q$

  • There is a gap with respect to the upper bounds provided by the JL embedding.
  • This is the best we can do for the space $E_n$:

Theorem
➢ Any embedding $f: E_n \to \ell_2^k$ must have $\ell_q$-dist$(f) = 1 + \Omega(q/k)$ for $1 \le q \le k$, and $\mathrm{Energy}_q(f) = \Omega\left((1/k)^{1/q}\right)$ for $1 \le q < 2$.
➢ For every $k \ge 3$, for all $1 \le q \le k$, and for every distribution $\Pi$ over pairs of $E_n$, there is a random map $f: E_n \to \ell_2^k$ such that with constant probability $\ell_q$-dist$(f) = 1 + O(q/k)$.

SLIDE 21

Our Results: optimal embedding for $E_n$, small $q < k$

Intuition

[Figure: $E_n = \{v_1, v_2, \dots, v_n\}$, the uniform metric in $n$ dimensions, mapped to the uniform metric $\{e_1, e_2, \dots, e_k\}$ in $k$ dimensions; each $v_i \mapsto$ a uniformly chosen $e_j$]

$$\mathbb{E}\left[(\ell_q\text{-dist}(f))^q\right] = \mathbb{E}\left[\mathrm{dist}_f(u,v)^q\right] = \Pr[f(u) = f(v)] \cdot \underbrace{\mathbb{E}\left[\mathrm{dist}_f(u,v)^q \mid f(u) = f(v)\right]}_{=\,\infty} + \Pr[f(u) \neq f(v)] \cdot 1$$

SLIDE 22

Our Results: optimal embedding for $E_n$, small $q < k$

  • There are $k/q^2$ spheres $S[1], S[2], \dots, S[k/q^2]$.
  • Each sphere $S[j]$ has radius $r_j = 1/\sqrt{2}$.
  • Each sphere is $d$-dimensional, $S[j] \subset \ell_2^{d}$, with $d = q^2$.

Algorithm: map $F: E_n \to \ell_2^k$. For every $v \in E_n$:
  ⦁ independently and uniformly choose a sphere $S[j]$;
  ⦁ place $v$ independently and uniformly on $S[j]$, giving $F_j(v) \in \ell_2^{q^2}$;
  ⦁ set $F(v) = (0, \dots, 0, F_j(v), 0, \dots, 0)$, with $F_j(v)$ in the $j$-th block of $q^2$ coordinates.

▪ $\mathbb{E}\left[(\ell_q\text{-dist}(F))^q\right] = \Pr[\text{same sphere}] \cdot \mathbb{E}\left[\mathrm{dist}_F(u,v)^q \mid \text{same sphere}\right] + \Pr[\text{different spheres}] \cdot 1$
▪ If $v \in S[j]$ and $u \in S[i]$ with $i \neq j$, then $\|F(v) - F(u)\|^2 = \|F_j(v)\|^2 + \|F_i(u)\|^2 = 1/2 + 1/2 = 1$, so $\mathrm{dist}_F(u,v) = 1$.
▪ Since $\Pr[\text{same sphere}] = q^2/k$: $\mathbb{E}\left[(\ell_q\text{-dist}(F))^q\right] \le \frac{q^2}{k} \cdot 2\left(\frac{d}{d-q}\right)^{q/2} + 1 \overset{[d = q^2]}{\le} 1 + \frac{q^2}{k} \cdot \mathrm{const}$.
▪ Taking the $1/q$-th power of both sides (using $(1 + x)^{1/q} \le 1 + x/q$), $\ell_q$-dist$(F) = 1 + O(q/k)$.
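A Python sketch of the random-sphere algorithm for $E_n$ (stdlib only; function names hypothetical). Uniform points on each sphere are drawn by normalizing a Gaussian vector, and $\Pi$ is taken uniform over pairs. The tests confirm the key step: points on different spheres end up at distance exactly 1 (up to rounding), so those pairs have $\mathrm{dist}_F = 1$.

```python
import itertools
import math
import random


def random_sphere_point(dim, radius, rng):
    """Uniform point on a sphere: normalize a standard Gaussian vector, then rescale."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [radius * v / norm for v in g]


def embed_equilateral(n, k, q, rng):
    """Map the n points of E_n into l_2^k using k // q^2 disjoint blocks of q^2 coordinates."""
    block = q * q
    spheres = k // block                  # assumes k >= q^2
    images = []
    for _ in range(n):
        j = rng.randrange(spheres)        # choose sphere S[j] uniformly
        x = [0.0] * k
        x[j * block:(j + 1) * block] = random_sphere_point(block, 1 / math.sqrt(2), rng)
        images.append((j, x))
    return images


def lq_dist_equilateral(images, q):
    """All distances in E_n equal 1, so dist_F(u, v) = max(D, 1/D) for image distance D."""
    dists = (math.dist(x, y) for (_, x), (_, y) in itertools.combinations(images, 2))
    terms = [max(D, 1 / D) ** q for D in dists]
    return (sum(terms) / len(terms)) ** (1 / q)


images = embed_equilateral(n=100, k=64, q=2, rng=random.Random(1))
print(lq_dist_equilateral(images, q=2))
```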

SLIDE 23

Our Results: lower bounds, phase transition at $q = k$

Theorem
  • For any $k \ge 1$, any embedding $f: E_n \to \ell_2^k$ has $\ell_k$-dist$(f) = \Omega\left(\sqrt{(\log n)^{1/k}} \,/\, k^{1/4}\right)$.
  • For any $q > k \ge 1$, any embedding $f: E_n \to \ell_2^k$ has $\ell_q$-dist$(f) = \Omega\left(n^{\frac{1}{2k} - \frac{1}{2q}}\right)$.

Proof
  • It is enough to show that any non-expansive $f: E_n \to \ell_2^k$ has $\ell_k$-dist$(f) \ge \Omega\left((\log n)^{1/k} / \sqrt{k}\right)$.
  • Claim: if every non-expansive $F: E_n \to Y$ has $\ell_k$-dist$(F) \ge D(k,n)$, then every $f: E_n \to Y$ has $\ell_k$-dist$(f) \ge \mathrm{const} \cdot \sqrt{D(k,n)}$.
  • Since $\ell_2^k$ embeds into $\ell_\infty^k$ with distortion $\sqrt{k}$, it is enough to prove that any non-expansive $f: E_n \to \ell_\infty^k$ has $\ell_k$-dist$(f) \ge \Omega\left((\log n)^{1/k}\right)$.
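The step that passes from $\ell_2^k$ to $\ell_\infty^k$ uses only the standard comparison of the two norms. A short derivation, written out by us rather than taken from the slides, with $\iota$ the identity map and using that for non-expansive maps $\mathrm{dist}_f = \mathrm{contr}_f$:

```latex
\begin{align*}
&\|x\|_\infty \le \|x\|_2 \le \sqrt{k}\,\|x\|_\infty \qquad (x \in \mathbb{R}^k),\\
&\text{so for non-expansive } f\colon E_n \to \ell_2^k \text{ and } \iota = \mathrm{id}\colon \ell_2^k \to \ell_\infty^k:\\
&\|\iota f(u) - \iota f(v)\|_\infty \le \|f(u) - f(v)\|_2 \le 1 = d_{E_n}(u,v)
  \quad (\iota \circ f \text{ is non-expansive}),\\
&\mathrm{contr}_{\iota \circ f}(u,v) = \frac{1}{\|f(u) - f(v)\|_\infty}
  \le \frac{\sqrt{k}}{\|f(u) - f(v)\|_2} = \sqrt{k}\,\mathrm{contr}_f(u,v),\\
&\text{hence } \ell_k\text{-dist}(f) \ge \frac{\ell_k\text{-dist}(\iota \circ f)}{\sqrt{k}}
  \ge \frac{\Omega\big((\log n)^{1/k}\big)}{\sqrt{k}}.
\end{align*}
```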

SLIDE 24

Our Results: (almost) optimal lower bound for $q > k$

Claim: For every non-expansive $f: E_n \to \ell_\infty^k$, $\ell_k$-dist$(f) \ge \Omega\left((\log n)^{1/k}\right)$.

Proof
  • Basically, embedding $E_n$ (non-expansively) into $\ell_\infty^k$ is like embedding it (non-expansively) into a family of certain tree metrics, the 2-HST metrics of degree $k$: the family of all rooted trees on $n$ leaves in which each node has at most $2^k$ children. The nodes have labels, decreasing by a factor of 2 along every path from the root to a leaf; the root's label is 1. Each tree defines a metric over the set of its leaves: $\mathrm{dist}(u,v) = \mathrm{label}(\mathrm{lca}(u,v))$.

[Figure: an example 2-HST with node labels $1$, $1/2$, $1/4$]

SLIDE 25

Our Results: (almost) optimal lower bound for $q > k$

  • Every non-expansive embedding $f: E_n \to \ell_\infty^k$ can be modified into one that embeds $E_n$ into a 2-HST from the family of degree $k$, with better $\ell_k$-distortion. For $k = 2$, recursively construct a 2-HST of degree 4.

[Figure: recursive construction of a 2-HST of degree 4, with node labels $1$, $1/2$, $1/4$]
SLIDE 26

Our Results: (almost) optimal lower bound for $q = k$

So it is enough to prove the following.

Claim: Any non-expansive embedding $f$ of $E_n$ into the family of 2-HSTs of degree $k$ has $(\ell_k\text{-dist}(f))^k \ge \Omega(\log n)$.

Proof: By induction on $n$, showing that the best tree is the perfectly balanced one (each node has exactly $2^k$ children). Computing its weight completes the proof. ◼
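A Python sketch of "computing its weight" for the perfectly balanced tree (our own calculation, not code from the talk; names are illustrative). The $n$ points of $E_n$ sit at the leaves, $\Pi$ is uniform over pairs, and a pair whose lca is at depth $t$ is mapped to distance $2^{-t}$, i.e. contraction $2^t$. With branching $2^k$, every level contributes the same amount $(1 - 2^{-k}) \cdot n/(n-1)$, so $(\ell_k\text{-dist})^k = \mathrm{depth} \cdot (1 - 2^{-k}) \cdot n/(n-1)$ with $\mathrm{depth} = \log_{2^k} n$; for fixed $k$ this grows linearly in $\log n$.

```python
from math import comb


def balanced_hst_moment(branching, depth, k):
    """Average of contr^k over pairs of leaves of a perfectly balanced 2-HST.

    The n = branching**depth points of E_n (all at distance 1) are the leaves.
    A node at depth t has label 2**-t, so a pair whose lca is at depth t is
    mapped to distance 2**-t and has contraction 2**t. The map is non-expansive
    because the root label is 1.
    """
    n = branching ** depth
    total = 0
    for t in range(depth):
        leaves = n // branching ** t                   # leaves below one depth-t node
        # pairs below a depth-t node that split there (lca exactly at depth t)
        split_pairs = comb(leaves, 2) - branching * comb(leaves // branching, 2)
        total += branching ** t * split_pairs * (2 ** t) ** k
    return total / comb(n, 2)


k = 2
for depth in range(1, 6):                               # n = 4**depth leaves
    print(depth, balanced_hst_moment(2 ** k, depth, k))
```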
SLIDE 27

Our Results: approximating the optimal embedding

  • [ABN06] Every finite $X$ embeds into $\ell_p^{O_p(\log n)}$ with $\ell_q$-distortion $O(q/p)$. This gives an $O(q)$ approximation to the optimum under $\ell_q$-distortion.
  • [HIL03] 2-approximation to $\mathrm{Stress}_\infty$, for embedding into $k = 1$ dimension.
  • [Bado03] $O(1)$-approximation to $\mathrm{Stress}_\infty$, for embedding into $k = 2$ dimensions, under the $\ell_1$ norm.
  • [Dham04] $O(\log^{1/q} n)$-approximation to $\mathrm{Stress}_q$, for embedding into $k = 1$ dimension.

SLIDE 28

Our Results: approximating the optimal embedding

For a given $X$ and $q, k \ge 1$, and an objective measure $\mathrm{Obj}_q$: $\mathrm{OPT}: X \to \ell_2^k$ denotes an optimal embedding for the measure $\mathrm{Obj}_q$.

Theorem: For a given finite $X$, a given $k \ge 3$ and $2 \le q \le k - 1$, there is a randomized polynomial-time algorithm that computes an embedding $F: X \to \ell_2^k$ such that, with constant probability,
  • $\ell_q\text{-dist}(F) = \left(1 + O\left(\frac{1}{\sqrt{k}} + \frac{q}{k-q}\right)\right) \cdot \mathrm{OPT}$
  • $\mathrm{Obj}_q^{(\Pi)}(F) = O\left(\mathrm{Obj}_q^{(\Pi)}(\mathrm{OPT})\right) + O\left(\sqrt{q/k}\right)$, for all the other objective measures.

Proof outline (for Stress):
  • Find an optimal embedding $f$ without constraining the dimension.
  • Reduce the dimension with JL (IM implementation) and show that this works.

SLIDE 29

Our Results: approximating the optimal embedding, proof outline

Find an optimal embedding $f$ without constraining the dimension:
  • [LLR95]: an SDP computes a map with optimal worst-case distortion.

For $X = \{x_0, \dots, x_{n-1}\}$, introduce a variable $z_{ij} = \|f(x_i) - f(x_j)\|^2$ for each pair, and set $g_{ij} = \frac{1}{2}\left(z_{0i} + z_{0j} - z_{ij}\right)$, which represents $\langle f(x_i), f(x_j)\rangle$ when $f(x_0) = 0$. The $z_{ij}$ are squared Euclidean distances iff the matrix $G(i,j) := g_{ij}$ is PSD.

  • Convex objective function for $\mathrm{Stress}_q$, for $q \ge 2$:
$$\min \sum_{0 \le i < j < n} \left|\frac{z_{ij}}{d_{ij}^2} - 1\right|^{q/2} \quad \text{s.t. } z_{ij} \ge 0,\ G \text{ is PSD}$$
(each term is convex in $z_{ij}$ because $q/2 \ge 1$).
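The Gram-matrix test can be sketched in Python without any solver: build $G$ from squared distances with $x_0$ at the origin, and test positive semidefiniteness by symmetric Gaussian elimination. Names and the tolerance are our own choices; solving the full SDP would additionally require a convex-optimization solver, which is not shown.

```python
def gram_from_sq_dists(Z):
    """g_ij = (z_0i + z_0j - z_ij) / 2 for i, j >= 1: inner products with x_0 at the origin."""
    n = len(Z)
    return [[(Z[0][i] + Z[0][j] - Z[i][j]) / 2 for j in range(1, n)] for i in range(1, n)]


def is_psd(G, tol=1e-9):
    """Symmetric Gaussian elimination (Schur complements); a negative pivot means not PSD."""
    A = [row[:] for row in G]
    n = len(A)
    for p in range(n):
        pivot = A[p][p]
        if pivot < -tol:
            return False
        if pivot <= tol:
            # A PSD matrix with a zero diagonal entry must have a zero row there.
            if any(abs(A[p][j]) > tol for j in range(p + 1, n)):
                return False
            continue
        for i in range(p + 1, n):
            factor = A[i][p] / pivot
            for j in range(p + 1, n):
                A[i][j] -= factor * A[p][j]
    return True


def sq_dists(points):
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in points] for x in points]


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(is_psd(gram_from_sq_dists(sq_dists(square))))      # True: Euclidean

# Star metric K_{1,3}: center at distance 1 from three leaves, leaves pairwise 2 apart.
star = [[0, 1, 1, 1], [1, 0, 4, 4], [1, 4, 0, 4], [1, 4, 4, 0]]   # squared distances
print(is_psd(gram_from_sq_dists(star)))                    # False: not Euclidean
```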

SLIDE 30

Our Results: approximating the optimal embedding, proof outline

Claim: If $f: X \to Y$ is some embedding and $h: Y \to Z$ is a random map with $\mathbb{E}[\mathrm{expans}_h(u,v)] = A$ and $\mathbb{E}[\mathrm{contr}_h(u,v)] = B$ for all pairs in $X$, then
$$\mathbb{E}\left[\mathrm{Energy}_q(h \circ f)\right] \le 4 \cdot \mathrm{Energy}_q(f) \cdot \left(\mathbb{E}\left[\mathrm{Energy}_q(h)^q\right]\right)^{1/q} + 4 \cdot \left(\mathbb{E}\left[\mathrm{Energy}_q(h)^q\right]\right)^{1/q}.$$

  • Apply JL $h: f(X) \to \ell_2^k$.
  • Set $F := h \circ f$.
  • The JL embedding satisfies the conditions of the claim, thus $\mathbb{E}\left[\mathrm{Energy}_q(F)\right] \le c \cdot \mathrm{OPT} + O\left(\sqrt{q/k}\right)$. ◼

SLIDE 31

Conclusions / Thank you

➢ We initiate a theoretical study of measurement criteria widely used in practical applications.
➢ We give theoretical bounds for these criteria, showing that the JL random projection is, essentially, the optimal tool for dimensionality reduction.
➢ Our bounds yield an approximation algorithm for embedding any finite metric into $k$-dim space, with proven approximation-ratio guarantees.
➢ One of the central open questions arising from our work: close the gap for values of $q \le \sqrt{k}$ and $q < 2$.
➢ Thank you!