CS 498ABD: Algorithms for Big Data
JL Lemma, Dimensionality Reduction, and Subspace Embeddings
Lecture 11
September 29, 2020
$F_2$ estimation in turnstile setting
AMS-$\ell_2$-Estimate: Let $Y_1, Y_2, \ldots, Y_n$ be $\{-1, +1\}$ random variables that are 4-wise independent.
  $z \leftarrow 0$
  While (stream is not empty) do
    $a_j = (i_j, \Delta_j)$ is the current update
    $z \leftarrow z + \Delta_j Y_{i_j}$
  endWhile
  Output $z^2$

Claim: The output estimates $\|x\|_2^2$, where $x$ is the vector at the end of the stream of updates.
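To make the update rule concrete, here is a minimal Python sketch of AMS-$\ell_2$-Estimate. The degree-3 polynomial hash over a prime field is one standard way to obtain (approximately) 4-wise independent $\pm 1$ values; the specific field, seeding, and names below are illustrative choices, not part of the slides.

```python
import random

# Illustrative prime field for the hash; any prime larger than the coordinate
# universe works.
P = 2**31 - 1

class FourWiseSigns:
    """Y_1, ..., Y_n as (approximately) 4-wise independent {-1,+1} values,
    generated from a random degree-3 polynomial over Z_P. (Taking the parity
    of a value mod an odd prime is only near-unbiased, fine for a demo.)"""
    def __init__(self, seed=None):
        rng = random.Random(seed)
        self.a, self.b, self.c, self.d = (rng.randrange(P) for _ in range(4))

    def sign(self, i):
        h = (self.a + self.b * i + self.c * i * i + self.d * i * i * i) % P
        return 1 if h & 1 else -1

def ams_l2_estimate(stream, seed=None):
    """stream: iterable of turnstile updates (i, delta). Returns z^2,
    an unbiased estimate of ||x||_2^2 for the final vector x."""
    Y = FourWiseSigns(seed)
    z = 0
    for i, delta in stream:
        z += delta * Y.sign(i)
    return z * z

# Final vector: x[3] = 5, x[7] = -2, so ||x||_2^2 = 29.
# A single estimator is noisy; averaging/median over independent copies
# (the sketch on the next slides) reduces the error.
updates = [(3, 2), (7, -2), (3, 3)]
print(ams_l2_estimate(updates, seed=0))
```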
Analysis
$Z = \sum_{i=1}^{n} x_i Y_i$ and the output is $Z^2$.

$$Z^2 = \sum_i x_i^2 Y_i^2 + 2\sum_{i<j} x_i x_j Y_i Y_j$$

and hence $\mathbb{E}[Z^2] = \sum_i x_i^2 = \|x\|_2^2$. One can show that $\mathrm{Var}(Z^2) \le 2\big(\mathbb{E}[Z^2]\big)^2$.

Linear Sketching View
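For completeness, a short worked version of the expectation step (my phrasing; it uses only pairwise independence of the $Y_i$, while the variance bound is where 4-wise independence is needed):

$$\mathbb{E}[Z^2] = \sum_i x_i^2\,\mathbb{E}[Y_i^2] + 2\sum_{i<j} x_i x_j\,\mathbb{E}[Y_i Y_j] = \sum_i x_i^2 = \|x\|_2^2,$$

since $Y_i^2 = 1$ and, by pairwise independence, $\mathbb{E}[Y_i Y_j] = \mathbb{E}[Y_i]\mathbb{E}[Y_j] = 0$ for $i \ne j$. Expanding $Z^4$ only involves products of at most four distinct $Y_i$'s, which is why 4-wise independence suffices to bound $\mathrm{Var}(Z^2)$.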
Recall that we take the average of independent estimators and then the median to reduce error. Can we view all this as a sketch?

AMS-$\ell_2$-Sketch: $k = c \log(1/\delta)/\varepsilon^2$
  Let $M$ be a $k \times n$ matrix with entries in $\{-1, +1\}$ such that (i) rows are independent and (ii) within each row the entries are 4-wise independent
  $z$ is a $k \times 1$ vector initialized to $0$
  While (stream is not empty) do
    $a_j = (i_j, \Delta_j)$ is the current update
    $z \leftarrow z + \Delta_j M e_{i_j}$
  endWhile
  Output the vector $z$ as the sketch.

$M$ is compactly represented via $k$ hash functions, one per row, independently chosen from a 4-wise independent hash family.
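A minimal Python sketch of the linear-sketch view, assuming fully independent $\pm 1$ entries for brevity (the slides only require independent rows with 4-wise independent entries within a row, which would be realized with $k$ hash functions as described above):

```python
import numpy as np

def make_sketch_matrix(k, n, rng):
    """k x n matrix with +/-1 entries; stands in for M from the slides."""
    return rng.choice([-1.0, 1.0], size=(k, n))

def process_stream(M, stream):
    """Maintain the sketch z = M x under turnstile updates (i, delta)."""
    z = np.zeros(M.shape[0])
    for i, delta in stream:
        z += delta * M[:, i]          # z <- z + delta * M e_i
    return z

rng = np.random.default_rng(0)
n, k = 1000, 400
M = make_sketch_matrix(k, n, rng)
updates = [(3, 2.0), (7, -2.0), (3, 3.0), (11, 4.0)]
z = process_stream(M, updates)
# Each z_i^2 has expectation ||x||_2^2; averaging over rows (or taking a
# median of averages over groups of rows) reduces the error.
print(np.mean(z**2), "vs true", 5.0**2 + (-2.0)**2 + 4.0**2)
```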
Geometric Interpretation

Given a vector $x \in \mathbb{R}^n$ and the random map $M$, the sketch $z = Mx$ has the following features:
- $\mathbb{E}[z_i] = 0$ and $\mathbb{E}[z_i^2] = \|x\|_2^2$ for each $1 \le i \le k$, where $k$ is the number of rows of $M$
- Thus each $z_i^2$ is an estimate of the (squared) Euclidean length of $x$
- When $k = \Theta(\frac{1}{\varepsilon^2}\log(1/\delta))$ one can obtain a $(1 \pm \varepsilon)$ estimate
"" Is
¥ is;
THEHellxdI
Distributional JL Lemma
Lemma (Distributional JL Lemma). Fix a vector $x \in \mathbb{R}^d$ and let $\Pi \in \mathbb{R}^{k \times d}$ be a matrix where each entry $\Pi_{ij}$ is chosen independently according to the standard normal distribution $N(0, 1)$. If $k = \Omega(\frac{1}{\varepsilon^2}\log(1/\delta))$, then with probability $(1 - \delta)$,
$$\Big\|\tfrac{1}{\sqrt{k}}\,\Pi x\Big\|_2 = (1 \pm \varepsilon)\|x\|_2.$$

Can choose entries from $\{-1, +1\}$ as well.

Note: unlike $\ell_2$ estimation, the entries of $\Pi$ are independent.

Letting $z = \tfrac{1}{\sqrt{k}}\Pi x$, we have projected $x$ from $d$ dimensions to $k = O(\frac{1}{\varepsilon^2}\log(1/\delta))$ dimensions while preserving its length to within a $(1 \pm \varepsilon)$ factor.
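A quick empirical illustration of the lemma; the dimensions, trial count, and $\varepsilon$ below are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, eps, trials = 5000, 500, 0.2, 20

x = rng.standard_normal(d)             # any fixed vector works
ratios = []
for _ in range(trials):
    Pi = rng.standard_normal((k, d))   # entries ~ N(0, 1), independent
    z = (Pi @ x) / np.sqrt(k)
    ratios.append(np.linalg.norm(z) / np.linalg.norm(x))
# With k = Theta(eps^-2 log(1/delta)), each ratio lies in [1-eps, 1+eps]
# except with small probability.
print("min ratio:", min(ratios), "max ratio:", max(ratios))
```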
Dimensionality reduction
Theorem (Metric JL Lemma). Let $v_1, v_2, \ldots, v_n$ be any $n$ points/vectors in $\mathbb{R}^d$. For any $\varepsilon \in (0, 1/2)$, there is a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ with $k \le 8 \ln n/\varepsilon^2$ such that for all $1 \le i < j \le n$,
$$(1 - \varepsilon)\|v_i - v_j\|_2 \le \|f(v_i) - f(v_j)\|_2 \le \|v_i - v_j\|_2.$$
Moreover $f$ can be obtained in randomized polynomial time.

The linear map $f$ is simply given by a random matrix $\Pi$: $f(v) = \Pi v$.
Proof. Apply DJL with $\delta = 1/n^2$ to each of the $\binom{n}{2}$ difference vectors $v_i - v_j$ and apply a union bound: the probability that some pair is not preserved is at most $\binom{n}{2} \cdot \frac{1}{n^2} < \frac{1}{2}$.

DJL and Metric JL
Key advantage: the mapping is oblivious to the data!

Normal Distribution
Density function: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Standard normal: $N(0, 1)$ is the case $\mu = 0$, $\sigma = 1$.
Normal Distribution
Cumulative distribution function for the standard normal: $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$ (no closed form).
Sum of independent Normally distributed variables
Lemma. Let $X$ and $Y$ be independent random variables. Suppose $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$. Let $Z = X + Y$. Then $Z \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
Corollary. Let $X$ and $Y$ be independent random variables with $X \sim N(0, 1)$ and $Y \sim N(0, 1)$. Let $Z = aX + bY$. Then $Z \sim N(0, a^2 + b^2)$.
The Normal distribution is a stable distribution: adding two independent random variables from the class gives a distribution inside the same class.

Concentration of sum of squares of normally distributed variables
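A worked instance of the corollary, applied inductively to a full linear combination; this is exactly the step used in the proof of the DJL Lemma below (here $x \in \mathbb{R}^d$ is a fixed vector and $\Pi_{i1}, \ldots, \Pi_{id}$ are the iid $N(0,1)$ entries of one row of $\Pi$):

$$Z_i = \sum_{j=1}^{d} \Pi_{ij}\, x_j \;\sim\; N\Big(0, \sum_{j=1}^{d} x_j^2\Big) = N\big(0, \|x\|_2^2\big),$$

so for a unit vector $x$, each coordinate of $\Pi x$ is a standard normal.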
$\chi^2(k)$ distribution: the distribution of the sum of squares of $k$ independent standard normal random variables, $Y = \sum_{i=1}^{k} Z_i^2$ where each $Z_i \sim N(0, 1)$.

$\mathbb{E}[Z_i^2] = 1$, hence $\mathbb{E}[Y] = k$.

Lemma. Let $Z_1, Z_2, \ldots, Z_k$ be independent $N(0, 1)$ random variables and let $Y = \sum_i Z_i^2$. Then, for $\varepsilon \in (0, 1/2)$, there is a constant $c$ such that
$$\Pr[(1 - \varepsilon)^2 k \le Y \le (1 + \varepsilon)^2 k] \ge 1 - 2e^{-c\varepsilon^2 k}.$$
$\chi^2$ distribution

[Figure: density function of the $\chi^2(k)$ distribution]

[Figure: cumulative distribution function of the $\chi^2(k)$ distribution]

Concentration of sum of squares of normally distributed variables
Recall the Chernoff-Hoeffding bound for bounded independent non-negative random variables. $Z_i^2$ is not bounded; however, Chernoff-Hoeffding bounds extend to sums of random variables with exponentially decaying tails.
Proof of DJL Lemma
Without loss of generality assume $\|x\|_2 = 1$ (a unit vector).

$Z_i = \sum_{j=1}^{d} \Pi_{ij} x_j$, and by the stability of the normal distribution $Z_i \sim N(0, 1)$.

Let $Y = \sum_{i=1}^{k} Z_i^2$. $Y$'s distribution is $\chi^2(k)$ since $Z_1, \ldots, Z_k$ are iid.

Hence $\Pr[(1 - \varepsilon)^2 k \le Y \le (1 + \varepsilon)^2 k] \ge 1 - 2e^{-c\varepsilon^2 k}$.

Since $k = \Omega(\frac{1}{\varepsilon^2}\log(1/\delta))$ we have $\Pr[(1 - \varepsilon)^2 k \le Y \le (1 + \varepsilon)^2 k] \ge 1 - \delta$.

Therefore $\|z\|_2 = \sqrt{Y/k}$ has the property that with probability $(1 - \delta)$, $\|z\|_2 = (1 \pm \varepsilon)\|x\|_2$.
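A one-line version of the parameter calculation in the last step (with $c$ the constant from the concentration lemma):

$$2e^{-c\varepsilon^2 k} \le \delta \iff k \ge \frac{1}{c\,\varepsilon^2}\ln\frac{2}{\delta},$$

so taking $k = \Omega\big(\tfrac{1}{\varepsilon^2}\log(1/\delta)\big)$ with a large enough constant makes the failure probability at most $\delta$.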
JL lower bounds

Question: Are the bounds achieved by the lemmas tight, or can we do better? How about non-linear maps?

Essentially optimal modulo constant factors for worst-case point sets.
Fast JL and Sparse JL
The projection matrix $\Pi$ is dense, and hence computing $\Pi x$ takes $\Theta(kd)$ time.

Question: Can we choose $\Pi$ to improve the time bound? Two scenarios: $x$ is dense and $x$ is sparse.
Known results:
- Choose $\Pi_{ij}$ from $\{+1, 0, -1\}$ with probabilities $1/6$, $2/3$, $1/6$ (suitably scaled). This also works, and roughly $2/3$ of the entries are $0$.
- Fast JL: Choose $\Pi$ in a dependent way to ensure $\Pi x$ can be computed in $O(d \log d + k^2)$ time. For dense $x$.
- Sparse JL: Choose $\Pi$ such that each column is $s$-sparse. The best known is $s = O(\frac{1}{\varepsilon}\log(1/\delta))$. Helps when $x$ is sparse.
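A small sketch of the sparse $\{+1, 0, -1\}$ construction from the first bullet, assuming the usual $\sqrt{3}$ rescaling so that each entry has unit variance (the slides do not spell out the scaling, so that is my assumption):

```python
import numpy as np

def sparse_jl_matrix(k, d, rng):
    """Entries are +sqrt(3), 0, -sqrt(3) with probabilities 1/6, 2/3, 1/6,
    then the whole matrix is scaled by 1/sqrt(k)."""
    vals = rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)],
                      size=(k, d), p=[1/6, 2/3, 1/6])
    return vals / np.sqrt(k)

rng = np.random.default_rng(2)
k, d = 400, 5000
Pi = sparse_jl_matrix(k, d, rng)
x = rng.standard_normal(d)
print("fraction of zero entries:", np.mean(Pi == 0.0))
print("norm ratio ||Pi x|| / ||x||:", np.linalg.norm(Pi @ x) / np.linalg.norm(x))
```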
Part I (Oblivious) Subspace Embeddings
Subspace Embedding
Question: Suppose we have a linear subspace $E$ of $\mathbb{R}^n$ of dimension $d \ll n$. Can we find a single matrix $\Pi \in \mathbb{R}^{k \times n}$ with small $k$ such that $\|\Pi x\|_2 = (1 \pm \varepsilon)\|x\|_2$ for every $x \in E$?
Oblivious Subspace Embedding
Theorem. Suppose $E$ is a linear subspace of $\mathbb{R}^n$ of dimension $d$. Let $\Pi \in \mathbb{R}^{k \times n}$ be a DJL matrix with $k = O(\frac{d}{\varepsilon^2}\log(1/\delta))$ rows. Then with probability $(1 - \delta)$, for every $x \in E$,
$$\Big\|\tfrac{1}{\sqrt{k}}\,\Pi x\Big\|_2 = (1 \pm \varepsilon)\|x\|_2.$$

In other words, the JL Lemma extends from one dimension to an arbitrary number of dimensions in a graceful way.
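An empirical illustration of the theorem: sketch the column space of a tall matrix $A$ and compare norms for a few vectors in that subspace. All sizes below are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 10_000, 10, 600

A = rng.standard_normal((n, d))            # columns span a d-dimensional subspace E
Pi = rng.standard_normal((k, n)) / np.sqrt(k)
SA = Pi @ A                                # the k x d sketch of A

ratios = []
for _ in range(10):
    y = rng.standard_normal(d)
    x = A @ y                              # an arbitrary vector of E
    ratios.append(np.linalg.norm(SA @ y) / np.linalg.norm(x))
print("min/max ratio over sampled x in E:", min(ratios), max(ratios))
```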
Proof Idea

How do we prove that $\Pi$ works for all $x \in E$, which is an infinite set? There are several proofs, but one useful argument that is often a starting hammer is the "net argument":
- Choose a large but finite set of vectors $T$ carefully (the net)
- Prove that $\Pi$ preserves the lengths of vectors in $T$ (via a naive union bound)
- Argue that any vector $x \in E$ is sufficiently close to a vector in $T$, and hence $\Pi$ also preserves the length of $x$

Net argument
Sufficient to focus on unit vectors in $E$. Why?

Also assume, wlog and for ease of notation, that $E$ is the subspace formed by the first $d$ coordinates in the standard basis.

Claim: There is a net $T$ of size $e^{O(d)}$ such that preserving the lengths of all vectors in $T$ suffices to preserve the lengths of all vectors in $E$.

A weaker net: Consider the box $[-1, 1]^d$ and make a grid with side length $\varepsilon/d$.
- The number of grid vertices is $(2d/\varepsilon)^d$
- It suffices to take $T$ to be the grid vertices
- This gives a weaker bound of $O(\frac{1}{\varepsilon^2}\, d \log(d/\varepsilon))$ dimensions
- A more careful net argument gives the tight bound
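A sketch of the union-bound calculation behind the weaker bound, using the per-vector failure probability $2e^{-c\varepsilon^2 k}$ from the concentration lemma (constants not optimized):

$$|T| = \Big(\frac{2d}{\varepsilon}\Big)^{d}, \qquad |T|\cdot 2e^{-c\varepsilon^{2}k} \le \delta \quad\text{once}\quad k = \Omega\!\Big(\frac{1}{\varepsilon^{2}}\big(d\log(d/\varepsilon) + \log(1/\delta)\big)\Big),$$

which for constant $\delta$ is the stated $O\big(\tfrac{1}{\varepsilon^{2}}\, d\log(d/\varepsilon)\big)$ bound on the number of dimensions.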
Net argument: analysis

Fix any $x \in E$ such that $\|x\|_2 = 1$ (a unit vector).

There is a grid point $y$ such that $\|y\|_2 \le 1$ and $x$ is close to $y$.

Let $z = x - y$. We have $|z_i| \le \varepsilon/d$ for $1 \le i \le d$ and $z_i = 0$ for $i > d$.
$$\|\Pi x\| = \|\Pi y + \Pi z\| \le \|\Pi y\| + \|\Pi z\| \le (1 + \varepsilon) + (1 + \varepsilon)\sum_{i=1}^{d} |z_i| \le (1 + \varepsilon) + \varepsilon(1 + \varepsilon) \le 1 + 3\varepsilon.$$
Similarly $\|\Pi x\| \ge 1 - O(\varepsilon)$.

Application of Subspace Embeddings
Faster algorithms for approximate:
- matrix multiplication
- regression
- SVD

Basic idea: We want to perform operations on a matrix $A$ with $n$ data columns (say in a large ambient dimension $\mathbb{R}^h$) with small effective rank $d$. We want to reduce to a matrix of size roughly $d \times d$ by spending time proportional to $\mathrm{nnz}(A)$. Later in the course.
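As a preview of the regression application, a minimal sketch-and-solve example with a dense Gaussian sketch (the algorithms later in the course use sketches that can be applied in time proportional to nnz(A); the problem sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 20_000, 20, 500

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

Pi = rng.standard_normal((k, n)) / np.sqrt(k)
x_sketch, *_ = np.linalg.lstsq(Pi @ A, Pi @ b, rcond=None)   # small k x d problem
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)              # full n x d problem

print("cost ratio ||A x_sketch - b|| / ||A x_exact - b||:",
      np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))
```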