
Dimensionality Reduction

Anil Maheshwari
anil@scs.carleton.ca
School of Computer Science, Carleton University, Canada

Outline: Metric Space; Isometric embedding; Distortion; $L_\infty$ Norm; Corollaries; Euclidean Norm

Metric Space $\langle X, d \rangle$
Let $X$ be a set of $n$ points and let $d$ be a distance measure associated with pairs of elements in $X$. We say that $\langle X, d \rangle$ is a finite metric space if the function $d$ satisfies the metric properties, i.e.,
(a) $\forall x \in X$, $d(x, x) = 0$,
(b) $\forall x, y \in X$, $x \neq y$, $d(x, y) > 0$,
(c) $\forall x, y \in X$, $d(x, y) = d(y, x)$ (symmetry), and
(d) $\forall x, y, z \in X$, $d(x, y) \le d(x, z) + d(z, y)$ (triangle inequality).
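The four properties can be checked mechanically on a small point set. The following is a minimal sketch, not part of the slides; the function names `is_finite_metric`, `l1`, and `squared_gap` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

def is_finite_metric(points, d, tol=1e-12):
    """Return True if d satisfies properties (a)-(d) on the given points."""
    for x in points:
        if abs(d(x, x)) > tol:                      # (a) d(x, x) = 0
            return False
    for x, y in product(points, repeat=2):
        if x != y and d(x, y) <= 0:                 # (b) positivity
            return False
        if abs(d(x, y) - d(y, x)) > tol:            # (c) symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:       # (d) triangle inequality
            return False
    return True

def l1(x, y):
    """L_1 distance between two coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(x, y))

def squared_gap(x, y):
    """Squared distance on the line: symmetric and positive, but violates (d)."""
    return (x[0] - y[0]) ** 2

pts = [(0, 0), (1, 2), (3, 1), (2, 2)]
line = [(0,), (1,), (2,)]
print(is_finite_metric(pts, l1))            # L_1 passes all four checks
print(is_finite_metric(line, squared_gap))  # fails: 4 > 1 + 1
```

The cubic loop over triples is only meant for small examples; it mirrors the definition directly rather than aiming for speed.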

Embeddings
Let $\langle X, d \rangle$ and $\langle X', d' \rangle$ be two metric spaces.
Embedding: A map $f : X \to X'$ is called an embedding.
Isometric embedding (i.e., distance preserving): $f$ is isometric if for all $x, y \in X$, $d(x, y) = d'(f(x), f(y))$.

Motivating Problem
Input: $X$ = set of $n$ points in $k$-dimensional space, where $n \gg 2^k$.
Output: A pair of points that maximizes the $L_1$-distance.
Naive solution: $O(k \binom{n}{2}) = O(kn^2)$ time.
Better algorithm via the isometric embedding $L_1^k \to L_\infty^{2^k}$, running in $O(2^k n)$ time.
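The slide does not spell out the embedding. One standard choice, assumed here, maps $x$ to the $2^k$ inner products $\langle s, x \rangle$ over all sign vectors $s \in \{-1, +1\}^k$; then $\|x - y\|_1 = \max_s \langle s, x - y \rangle$, so the farthest pair is found by taking, per coordinate, the largest and smallest value. The sketch below (function names are hypothetical) spends $O(k)$ per inner product, so as written it runs in $O(k 2^k n)$ rather than the $O(2^k n)$ quoted on the slide.

```python
from itertools import combinations, product
import random

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def max_l1_naive(points):
    """O(k n^2): try every pair."""
    return max(l1(x, y) for x, y in combinations(points, 2))

def max_l1_via_embedding(points):
    """Return (distance, x, y) for a pair maximizing the L_1 distance."""
    k = len(points[0])
    best = (-1, None, None)
    for signs in product((-1, 1), repeat=k):
        # One coordinate of the embedding into L_infinity^(2^k).
        coords = [sum(s * a for s, a in zip(signs, x)) for x in points]
        hi = max(range(len(points)), key=coords.__getitem__)
        lo = min(range(len(points)), key=coords.__getitem__)
        gap = coords[hi] - coords[lo]
        if gap > best[0]:
            best = (gap, points[lo], points[hi])
    return best

random.seed(0)
pts = [tuple(random.randint(-50, 50) for _ in range(3)) for _ in range(200)]
dist, a, b = max_l1_via_embedding(pts)
print(dist == max_l1_naive(pts), l1(a, b) == dist)
```

Integer coordinates are used in the example so that the two methods can be compared with exact equality.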

Universality of $L_\infty$-metric
Let $\langle X, d \rangle$ be any finite metric space, where $n = |X|$. $X$ can be isometrically embedded into an $L_\infty$-metric space of appropriate dimension.
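The slide states the result without a construction. A common way to realize it, assumed here, is the Fréchet map that sends each point to the vector of its distances to all $n$ points, measured under the sup-norm ($L_\infty$). The sketch below checks isometry on a small example; `frechet_embedding`, `linf`, and `d_cycle` are hypothetical names.

```python
from itertools import combinations

def frechet_embedding(points, d):
    """Map each x to the vector of its distances to every point of X."""
    return {x: [d(x, z) for z in points] for x in points}

def linf(u, v):
    """Sup-norm distance between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

# Example: shortest-path metric of the unweighted 4-cycle 0-1-2-3-0.
X = [0, 1, 2, 3]

def d_cycle(x, y):
    gap = abs(x - y)
    return min(gap, 4 - gap)

f = frechet_embedding(X, d_cycle)
isometric = all(linf(f[x], f[y]) == d_cycle(x, y) for x, y in combinations(X, 2))
print(isometric)
```

The coordinate indexed by $y$ contributes exactly $d(x, y)$ to the sup-norm, and the triangle inequality keeps every other coordinate from exceeding it, which is why $n$ coordinates suffice in this construction.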

Euclidean Metric
Input: Metric spaces defined by $K_4$, $C_4$, and the star $Y$ with respect to unweighted shortest-path (SP) distances.
Question: Can one embed these 4 points in Euclidean space isometrically?
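One way to explore the question numerically, not taken from the slides, is Schoenberg's criterion: a finite metric embeds isometrically in some Euclidean space exactly when the Gram matrix built from squared distances is positive semidefinite. The sketch below tests this through principal minors, which is practical only for tiny matrices; all function and variable names are hypothetical, and the star is assumed to be the 4-vertex star with one centre and three leaves.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def embeds_in_euclidean(dist, tol=1e-9):
    """dist is an n x n matrix of distances; point 0 is used as the origin."""
    n = len(dist)
    # Gram matrix of the vectors x_i - x_0 implied by the distances.
    g = [[(dist[0][i] ** 2 + dist[0][j] ** 2 - dist[i][j] ** 2) / 2
          for j in range(1, n)] for i in range(1, n)]
    # A symmetric matrix is positive semidefinite iff every principal
    # minor is non-negative.
    for size in range(1, n):
        for rows in combinations(range(n - 1), size):
            sub = [[g[i][j] for j in rows] for i in rows]
            if det(sub) < -tol:
                return False
    return True

K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
C4 = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
STAR = [[0, 1, 1, 1], [1, 0, 2, 2], [1, 2, 0, 2], [1, 2, 2, 0]]  # vertex 0 is the centre

for name, m in (("K4", K4), ("C4", C4), ("star", STAR)):
    print(name, embeds_in_euclidean(m))
```

Under these assumptions the check accepts $K_4$ (a regular tetrahedron) and rejects $C_4$ and the star, where a vertex would have to be the midpoint of two different pairs at once.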

Distortion
Contraction: the maximum factor by which distances shrink; it equals $\max_{x, y \in X} \frac{d(x, y)}{d'(f(x), f(y))}$.
Expansion: the maximum factor by which distances are stretched; it equals $\max_{x, y \in X} \frac{d'(f(x), f(y))}{d(x, y)}$.
Distortion: the distortion of an embedding is the product of its expansion and contraction factors.
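The three quantities can be computed directly from their definitions. The following sketch assumes the embedding is injective (so no image distance is zero); the example metric, the placement on the unit square, and all names are illustrative assumptions rather than content from the slides.

```python
from itertools import combinations

def distortion(points, d, f, d_prime):
    """Return (contraction, expansion, distortion) of the embedding f."""
    pairs = list(combinations(points, 2))
    contraction = max(d(x, y) / d_prime(f(x), f(y)) for x, y in pairs)
    expansion = max(d_prime(f(x), f(y)) / d(x, y) for x, y in pairs)
    return contraction, expansion, contraction * expansion

# Example: the 4-cycle 0-1-2-3-0 with shortest-path distances, placed on the
# corners of the unit square and measured with the Euclidean norm.
def d_cycle(x, y):
    gap = abs(x - y)
    return min(gap, 4 - gap)

corners = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}

def euclid(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

c, e, dist = distortion([0, 1, 2, 3], d_cycle, corners.get, euclid)
print(c, e, dist)
```

In this example the edges keep length 1 while the two diagonals shrink from 2 to $\sqrt{2}$, so the contraction is $\sqrt{2}$, the expansion is 1, and the distortion is $\sqrt{2}$.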

$\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$
Input: A metric space $\langle X, d \rangle$, where $X$ is a set of $n$ points and $d$ satisfies the metric properties.
Output: An embedding of $X$ in a $k = O(D n^{2/D} \log n)$-dimensional space such that the distances get distorted (actually contracted) by a factor of at most $D$ under the $L_\infty$ norm.

$\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$ (contd.)
Let $x, y \in X$ and let $f(x), f(y)$ be their respective images in the $k$-dimensional space.
Property: The distances get contracted by a factor of at most $D \ge 1$. Formally, $\max_{x, y \in X} \frac{d(x, y)}{\|f(x) - f(y)\|_\infty} \le D$.
Example: If $D = O(\log n)$, then $k = O(\log^2 n)$, i.e., $\langle X, d \rangle \xrightarrow{O(\log n)} L_\infty^{O(\log^2 n)}$.
Meaning: Any metric space $\langle X, d \rangle$ can be embedded in an $O(\log^2 n)$-dimensional space, and the distances may distort (contract) by a factor of at most $O(\log n)$.
Applications?

Proof of $\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$
Constructive proof via a randomized algorithm.
Definition: Let $S \subseteq X$. For $x \in X$, define the distance of $x$ from $S$ as $d(x, S) = \min_{z \in S} d(x, z)$.
Claim: Let $x, y \in X$. For all $S \subseteq X$, $|d(x, S) - d(y, S)| \le d(x, y)$.
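The slide states the claim without proof. A short derivation, assuming $S$ is non-empty and writing $z^*$ (a symbol introduced here) for a point of $S$ closest to $y$, follows from the triangle inequality:

```latex
\begin{align*}
d(x, S) &\le d(x, z^*) \le d(x, y) + d(y, z^*) = d(x, y) + d(y, S), \\
d(y, S) &\le d(x, y) + d(x, S) \quad \text{(same argument with $x$ and $y$ swapped)}, \\
\text{hence}\quad |d(x, S) - d(y, S)| &\le d(x, y).
\end{align*}
```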

Proof Contd.
Definition (Mapping): Let $x \in X$. Let $S_1, S_2, \dots, S_k \subseteq X$. The mapping $f$ maps $x$ to the point $f(x) = [d(x, S_1), d(x, S_2), \dots, d(x, S_k)]$.
Observation: Let $S_1, S_2, \dots, S_k \subseteq X$. For $x, y \in X$, $\|f(x) - f(y)\|_\infty \le d(x, y)$.

Proof Contd. (notes page, 2020-10-19)
Proof. Follows from the above claim, since for each $1 \le i \le k$, $|d(x, S_i) - d(y, S_i)| \le d(x, y)$, and $\|f(x) - f(y)\|_\infty$ is the largest of these $k$ coordinate differences.

Randomized Algorithm
Input: Metric space $X$ and parameter $D$.
Output: A set of $O(Dm)$ subsets of $X$.
1. $p \leftarrow \min(1/2,\; n^{-2/D})$
2. $m \leftarrow O(n^{2/D} \log n)$
3. For $j \leftarrow 1$ to $\lceil D/2 \rceil$ and for $i \leftarrow 1$ to $m$: choose set $S_{ij}$ by sampling each element of $X$ independently with probability $p^j$.
4. For each $x \in X$ return $f(x) = [\,d(x, S_{11}), \dots, d(x, S_{m1}),\; d(x, S_{12}), \dots, d(x, S_{m2}),\; \dots,\; d(x, S_{1\lceil D/2 \rceil}), \dots, d(x, S_{m\lceil D/2 \rceil})\,]$
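The four steps can be sketched in a few lines. Several details below are assumptions not fixed by the slide: the constant hidden in $m = O(n^{2/D} \log n)$ is exposed as a hypothetical parameter `c`, the logarithm is taken to be natural, and an empty sample $S$ is given distance 0 from every point (which makes that coordinate constant). The function names are illustrative.

```python
import math
import random
from itertools import combinations

def dist_to_set(x, S, d):
    """d(x, S) = min over z in S of d(x, z); 0.0 for an empty S (assumption)."""
    return min((d(x, z) for z in S), default=0.0)

def random_subset_embedding(points, d, D, c=1.0, rng=None):
    """Return {x: f(x)} following steps 1-4 of the Randomized Algorithm."""
    rng = rng or random.Random()
    n = len(points)
    p = min(0.5, n ** (-2.0 / D))                               # step 1
    m = max(1, math.ceil(c * n ** (2.0 / D) * math.log(n)))     # step 2
    subsets = []
    for j in range(1, math.ceil(D / 2) + 1):                    # step 3
        for _ in range(m):
            subsets.append([z for z in points if rng.random() < p ** j])
    # Step 4: coordinates ordered S_11..S_m1, S_12..S_m2, ...
    return {x: [dist_to_set(x, S, d) for S in subsets] for x in points}

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def linf(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

rng = random.Random(1)
grid = [(a, b) for a in range(20) for b in range(20)]
pts = rng.sample(grid, 30)                       # 30 distinct points
f = random_subset_embedding(pts, l1, D=4, rng=rng)
never_expands = all(linf(f[x], f[y]) <= l1(x, y)
                    for x, y in combinations(pts, 2))
print(len(f[pts[0]]), never_expands)
```

The non-expansion check holds for every random outcome, since each coordinate has the form $d(\cdot, S)$; the contraction bound of $D$ is only guaranteed with high probability and depends on the choice of the constant in $m$, so it is not asserted here.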

An Observation
Let $x, y$ be two distinct points of $X$. Let $B(x, r)$ be the set of points of $X$ that are within a distance of $r$ from $x$ (think of $B(x, r)$ as a ball of radius $r$ centred at $x$). Similarly, let $B(y, r + \Delta)$ be the set of points of $X$ that are within a distance of $r + \Delta$ from $y$. Consider a subset $S \subset X$ such that $S \cap B(x, r) \neq \emptyset$ and $S \cap B(y, r + \Delta) = \emptyset$. Then $|d(x, S) - d(y, S)| \ge \Delta$, since $d(x, S) \le r$ while $d(y, S) \ge r + \Delta$.

Key Lemma
Lemma: Let $x, y$ be two distinct points of $X$. There exists an index $j \in \{1, \dots, \lceil D/2 \rceil\}$ such that if $S_{ij}$ is as chosen in the Algorithm, then $\Pr\left[\, \|f(x) - f(y)\|_\infty \ge \frac{d(x, y)}{D} \,\right] \ge \frac{p}{12}$.

Ball Properties
Set $\Delta = \frac{d(x, y)}{D}$.
For $i = 0, \dots, \lceil D/2 \rceil$, define balls of radius $i\Delta$ as follows.
Let $B_0 = \{x\}$.
$B_1$ is the ball of radius $\Delta$ centred at $y$.
$B_2$ is the ball of radius $2\Delta$ centred at $x$.
$B_3$ is the ball of radius $3\Delta$ centred at $y$, and so on.
Property I: No even ball overlaps with an odd ball.

Ball Properties (contd.)
For even (odd) $i$, let $|B_i|$ denote the number of points of $X$ that are within a distance of at most $i\Delta$ from $x$ (respectively, $y$).
Property II: There is an index $t \in \{0, \dots, \lceil D/2 \rceil - 1\}$ such that $|B_t| \ge n^{2t/D}$ and $|B_{t+1}| \le n^{2(t+1)/D}$.
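Property II can be seen as a search: $|B_0| \ge 1 = n^0$, and if the second inequality kept failing, the last ball would contain more than $n$ points. The sketch below computes the ball sizes and returns the first index that satisfies both inequalities; the function names and the example point set are assumptions made for illustration.

```python
import math
import random
from itertools import permutations

def ball_sizes(points, d, x, y, D):
    """|B_0|, ..., |B_q| with q = ceil(D/2): even balls centred at x, odd at y."""
    delta = d(x, y) / D
    q = math.ceil(D / 2)
    sizes = []
    for i in range(q + 1):
        centre = x if i % 2 == 0 else y
        sizes.append(sum(1 for z in points if d(centre, z) <= i * delta))
    return sizes

def find_index_t(sizes, n, D):
    """Smallest t with |B_t| >= n^(2t/D) and |B_{t+1}| <= n^(2(t+1)/D)."""
    for t in range(len(sizes) - 1):
        if sizes[t] >= n ** (2 * t / D) and sizes[t + 1] <= n ** (2 * (t + 1) / D):
            return t
    return None

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

rng = random.Random(7)
grid = [(a, b) for a in range(20) for b in range(20)]
pts = rng.sample(grid, 40)                       # 40 distinct points
D = 4
found = [find_index_t(ball_sizes(pts, l1, x, y, D), len(pts), D)
         for x, y in permutations(pts, 2)]
print(all(t is not None for t in found))
```

With $D = 4$ there are only the balls $B_0$, $B_1$, $B_2$, so the returned index is always 0 or 1.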

Ball Properties (contd.)
Let $t$ be the index such that $|B_t| \ge n^{2t/D}$ and $|B_{t+1}| \le n^{2(t+1)/D}$.
Consider the case $j = t + 1$ in the Algorithm.
Property III: The set $S_{ij}$ chosen by the algorithm has non-empty intersection with $B_t$ with probability at least $p/3$, and it avoids $B_{t+1}$ with probability at least $1/4$.
Define:
Event $E_1$: $S_{ij} \cap B_t \neq \emptyset$.
Event $E_2$: $S_{ij} \cap B_{t+1} = \emptyset$.

Event $E_1$
$\Pr(S_{ij} \cap B_t \neq \emptyset) \ge p/3$

Event $E_2$
$\Pr(S_{ij} \cap B_{t+1} = \emptyset) \ge 1/4$
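The slides give the two bounds without the calculation. One way to obtain them, sketched here under the assumption that $p = n^{-2/D} \le 1/2$ and that each element joins $S_{ij}$ independently with probability $p^{j}$ where $j = t + 1$, is the following:

```latex
\begin{align*}
\Pr(E_2) &= (1 - p^{j})^{|B_{t+1}|} \ge (1 - p^{j})^{n^{2(t+1)/D}}
          = (1 - p^{j})^{1/p^{j}} \ge \tfrac{1}{4}
          && \text{since } (1 - q)^{1/q} \ge \tfrac14 \text{ for } 0 < q \le \tfrac12, \\
\Pr(E_1) &= 1 - (1 - p^{j})^{|B_t|} \ge 1 - e^{-p^{j} |B_t|}
          \ge 1 - e^{-p} \ge p - \tfrac{p^2}{2} \ge \tfrac{p}{3}
          && \text{since } p^{j} |B_t| \ge n^{-2(t+1)/D}\, n^{2t/D} = p .
\end{align*}
```

Because $B_t$ and $B_{t+1}$ are disjoint (Property I), the two events depend on disjoint sets of independent coin flips, so $\Pr(E_1 \cap E_2) \ge \frac{p}{3} \cdot \frac{1}{4} = \frac{p}{12}$. When both events occur, the Observation gives $|d(x, S_{ij}) - d(y, S_{ij})| \ge \Delta = d(x, y)/D$, which is the bound in the Key Lemma.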

Main Theorem
$\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$

Corollary 1: $\langle X, d \rangle \xrightarrow{\Theta(\log n)} L_\infty^{O(\log^2 n)}$
Set $D = \Theta(\log n)$ in the Theorem $\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$ and we obtain $\langle X, d \rangle \xrightarrow{\Theta(\log n)} L_\infty^{O(\log^2 n)}$.
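The substitution can be made concrete with one specific choice, $D = 2 \log_2 n$ (picked here only for illustration; any $D = \Theta(\log n)$ works the same way up to constants):

```latex
\begin{align*}
n^{2/D} &= n^{1/\log_2 n} = 2, \\
k &= O\!\left(D \, n^{2/D} \log n\right)
   = O\!\left(2 \log_2 n \cdot 2 \cdot \log n\right)
   = O(\log^2 n).
\end{align*}
```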
