JL Lemma, Dimensionality Reduction, and Subspace Embeddings

  1. CS 498ABD: Algorithms for Big Data. JL Lemma, Dimensionality Reduction, and Subspace Embeddings. Lecture 11, September 29, 2020. Chandra (UIUC), Fall 2020.

  2. F2 estimation in turnstile setting

  AMS-$\ell_2$-Estimate: Let $Y_1, Y_2, \ldots, Y_n$ be $\{-1,+1\}$ random variables that are 4-wise independent.

      z ← 0
      While (stream is not empty) do
          a_j = (i_j, Δ_j) is the current update
          z ← z + Δ_j Y_{i_j}
      endWhile
      Output z²

  Claim: The output $z^2$ estimates $\|x\|_2^2$, where $x$ is the vector at the end of the stream of updates.
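
  The following is a minimal Python sketch of this estimator (illustrative, not from the lecture). It assumes the standard construction of a 4-wise independent sign family from a random degree-3 polynomial over a prime field; the function names are hypothetical.

      import random

      P = 2_147_483_647  # prime modulus (Mersenne prime 2^31 - 1)

      # A random degree-3 polynomial over GF(P) gives a 4-wise independent hash;
      # its low-order bit yields a {-1,+1} sign (with negligible O(1/P) bias).
      coeffs = [random.randrange(P) for _ in range(4)]

      def sign(i):
          h = 0
          for c in coeffs:          # Horner evaluation of the polynomial at i
              h = (h * i + c) % P
          return 1 if h & 1 else -1

      def ams_l2_estimate(stream):
          """stream yields turnstile updates (i, delta)."""
          z = 0
          for i, delta in stream:
              z += delta * sign(i)
          return z * z              # one estimate of ||x||_2^2

      # The updates below build x = (4, -2); the output equals 20 in expectation.
      print(ams_l2_estimate([(0, 3), (1, -2), (0, 1)]))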

  3. Analysis

  $Z = \sum_{i=1}^{n} x_i Y_i$ and the output is $Z^2$. Expanding,

  $Z^2 = \sum_i x_i^2 Y_i^2 + 2 \sum_{i \neq j} x_i x_j Y_i Y_j$,

  and hence $E[Z^2] = \sum_i x_i^2 = \|x\|_2^2$, since the cross terms vanish in expectation. One can show that $\mathrm{Var}(Z^2) \le 2 (E[Z^2])^2$ using 4-wise independence.
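
  A quick Monte Carlo check of both claims (a sketch with illustrative parameters; fully independent signs are used, which are in particular 4-wise independent):

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.standard_normal(50)            # an arbitrary fixed vector
      norm2 = float(np.dot(x, x))            # ||x||_2^2

      # Z = sum_i x_i Y_i for random signs Y_i, over many independent trials.
      trials = 100_000
      Y = rng.choice([-1.0, 1.0], size=(trials, x.size))
      Z2 = (Y @ x) ** 2

      print(Z2.mean() / norm2)               # close to 1:  E[Z^2] = ||x||_2^2
      print(Z2.var() / (2 * norm2**2))       # at most ~1:  Var(Z^2) <= 2 E[Z^2]^2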

  4. Linear Sketching View

  Recall that we average independent estimators and then take a median to reduce error. Can we view all this as a sketch?

  AMS-$\ell_2$-Sketch: $k = c \log(1/\delta)/\epsilon^2$. Let $M$ be a $k \times n$ matrix with entries in $\{-1,+1\}$ such that (i) the rows are independent and (ii) within each row the entries are 4-wise independent.

      z is a k × 1 vector initialized to 0
      While (stream is not empty) do
          a_j = (i_j, Δ_j) is the current update
          z ← z + Δ_j M e_{i_j}
      endWhile
      Output vector z as the sketch.

  $M$ is compactly represented via $k$ hash functions, one per row, chosen independently from a 4-wise independent hash family; a sketch implementation follows below.
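
  A compact Python version of this sketch (a sketch under assumptions: the class name AMSSketch and the median-of-means estimator are illustrative; the row hashes reuse the degree-3 polynomial construction above):

      import random
      import numpy as np

      P = 2_147_483_647

      def make_row_sign():
          """One row of M: a 4-wise independent {-1,+1} hash."""
          coeffs = [random.randrange(P) for _ in range(4)]
          def sign(i):
              h = 0
              for c in coeffs:
                  h = (h * i + c) % P
              return 1 if h & 1 else -1
          return sign

      class AMSSketch:
          def __init__(self, k):
              self.rows = [make_row_sign() for _ in range(k)]  # implicit k x n matrix M
              self.z = np.zeros(k)                             # sketch vector z = M x

          def update(self, i, delta):                          # z <- z + delta * (M e_i)
              for r, sign in enumerate(self.rows):
                  self.z[r] += delta * sign(i)

          def estimate(self, groups=9):
              # Median of means over groups of the z_r^2 values gives a
              # (1 +/- eps) estimate of ||x||_2^2 when k = c log(1/delta)/eps^2.
              means = [float(g.mean()) for g in np.array_split(self.z**2, groups)]
              return float(np.median(means))

      # Usage: sk = AMSSketch(900); sk.update(i, delta) per update; sk.estimate().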

  5-6. Geometric Interpretation

  Given a vector $x \in \mathbb{R}^n$, the random map $z = Mx$ has the following features: $E[z_i] = 0$ and $E[z_i^2] = \|x\|_2^2$ for each $1 \le i \le k$, where $k$ is the number of rows of $M$.

  Thus each $z_i^2$ is an estimate of the squared Euclidean length of $x$. When $k = \Theta(\frac{1}{\epsilon^2} \log(1/\delta))$ one can obtain a $(1 \pm \epsilon)$ estimate of $\|x\|_2^2$ by the averaging and median ideas.

  Thus we are able to compress $x$ into a $k$-dimensional vector $z$ such that $z$ contains enough information to estimate $\|x\|_2^2$ accurately.

  Question: Do we need the median trick? Will averaging alone do?

  7. Distributional JL Lemma

  Lemma (Distributional JL Lemma). Fix a vector $x \in \mathbb{R}^d$ and let $\Pi \in \mathbb{R}^{k \times d}$ be a matrix where each entry $\Pi_{ij}$ is chosen independently from the standard normal distribution $N(0,1)$. If $k = \Omega(\frac{1}{\epsilon^2} \log(1/\delta))$, then with probability $(1-\delta)$,

  $\frac{1}{\sqrt{k}} \|\Pi x\|_2 = (1 \pm \epsilon) \|x\|_2$.

  One can choose the entries from $\{-1,+1\}$ as well. Note: unlike $\ell_2$ estimation, the entries of $\Pi$ are fully independent.

  Letting $z = \frac{1}{\sqrt{k}} \Pi x$, we have projected $x$ from $d$ dimensions down to $k = O(\frac{1}{\epsilon^2} \log(1/\delta))$ dimensions while preserving its length to within a $(1 \pm \epsilon)$ factor.
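
  A short numerical illustration of the lemma (assumptions: numpy's Gaussian sampler; the constant 8 in the choice of k is illustrative, since the lemma fixes k only up to a constant):

      import numpy as np

      rng = np.random.default_rng(1)
      d, eps, delta = 5_000, 0.2, 0.01
      k = int(np.ceil(8 * np.log(1 / delta) / eps**2))

      x = rng.standard_normal(d)
      Pi = rng.standard_normal((k, d))        # Pi_ij ~ N(0,1), fully independent
      z = (Pi @ x) / np.sqrt(k)

      # The ratio below is within (1 - eps, 1 + eps) with probability >= 1 - delta.
      print(np.linalg.norm(z) / np.linalg.norm(x))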


  9-11. Dimensionality reduction

  Theorem (Metric JL Lemma). Let $v_1, v_2, \ldots, v_n$ be any $n$ points/vectors in $\mathbb{R}^d$. For any $\epsilon \in (0, 1/2)$, there is a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ with $k \le 8 \ln n / \epsilon^2$ such that for all $1 \le i < j \le n$,

  $(1 - \epsilon) \|v_i - v_j\|_2 \le \|f(v_i) - f(v_j)\|_2 \le \|v_i - v_j\|_2$.

  Moreover, $f$ can be obtained in randomized polynomial time. The linear map $f$ is simply given by a random matrix $\Pi$: $f(v) = \Pi v$.

  Proof. Apply DJL with $\delta = 1/n^2$ and take a union bound over the $\binom{n}{2}$ vectors $(v_i - v_j)$, $i \neq j$.
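
  A small experiment matching the proof (illustrative parameters; the demo checks the two-sided $(1 \pm \epsilon)$ distortion of the scaled Gaussian map $f(v) = \Pi v / \sqrt{k}$, which is the guarantee DJL supplies before any rescaling):

      import numpy as np
      from itertools import combinations

      rng = np.random.default_rng(2)
      n, d, eps = 100, 2_000, 0.25
      k = int(np.ceil(8 * np.log(n) / eps**2))     # k <= 8 ln n / eps^2

      V = rng.standard_normal((n, d))              # n points in R^d
      Pi = rng.standard_normal((k, d)) / np.sqrt(k)
      W = V @ Pi.T                                 # f(v_i) for all points at once

      ratios = [np.linalg.norm(W[i] - W[j]) / np.linalg.norm(V[i] - V[j])
                for i, j in combinations(range(n), 2)]
      print(min(ratios), max(ratios))              # all pairwise distortions near 1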

  12. DJL and Metric JL

  Key advantage: the mapping is oblivious to the data!

  13. Normal Distribution

  Density function: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

  Standard normal: $N(0,1)$ is the case $\mu = 0$, $\sigma = 1$.

  14. Normal Distribution

  Cumulative distribution function for the standard normal: $\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-s^2/2}\, ds$ (no closed form).

  15-17. Sum of independent normally distributed variables

  Lemma. Let $X$ and $Y$ be independent random variables with $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$. Let $Z = X + Y$. Then $Z \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.

  Corollary. Let $X$ and $Y$ be independent $N(0,1)$ random variables and let $Z = aX + bY$. Then $Z \sim N(0, a^2 + b^2)$.

  The normal distribution is a stable distribution: adding two independent random variables from the class gives a distribution inside the class. Other stable distributions exist and are useful in $F_p$ estimation for $p \in (0, 2)$.
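
  A quick empirical check of the corollary (a sketch; it compares sample moments only):

      import numpy as np

      rng = np.random.default_rng(3)
      a, b = 2.0, 3.0
      X = rng.standard_normal(1_000_000)
      Y = rng.standard_normal(1_000_000)
      Z = a * X + b * Y

      # Z should behave like N(0, a^2 + b^2) = N(0, 13).
      print(Z.mean(), Z.var())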

  18-20. Concentration of the sum of squares of normally distributed variables

  $\chi^2(k)$ distribution: the distribution of the sum of squares of $k$ independent standard normal variables, $Y = \sum_{i=1}^{k} Z_i^2$ where each $Z_i \sim N(0,1)$.

  $E[Z_i^2] = 1$, hence $E[Y] = k$.

  Lemma. Let $Z_1, Z_2, \ldots, Z_k$ be independent $N(0,1)$ random variables and let $Y = \sum_i Z_i^2$. Then, for $\epsilon \in (0, 1/2)$, there is a constant $c$ such that

  $\Pr[(1-\epsilon)^2 k \le Y \le (1+\epsilon)^2 k] \ge 1 - 2e^{-c\epsilon^2 k}$.
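
  The lemma can be seen empirically (illustrative parameters; at k = 500 and ε = 0.1 the window spans roughly three standard deviations on each side of E[Y] = k):

      import numpy as np

      rng = np.random.default_rng(4)
      k, eps, trials = 500, 0.1, 5_000

      Y = (rng.standard_normal((trials, k)) ** 2).sum(axis=1)   # chi^2(k) samples
      inside = np.mean(((1 - eps)**2 * k <= Y) & (Y <= (1 + eps)**2 * k))
      print(Y.mean() / k, inside)   # mean ~1; miss probability ~ 2 exp(-c eps^2 k)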

  21. $\chi^2$ distribution: density function (figure).

  22. $\chi^2$ distribution: cumulative distribution function (figure).

  23. Concentration of the sum of squares of normally distributed variables (continued)

  How is the concentration lemma above proved? Recall the Chernoff-Hoeffding bound for bounded independent non-negative random variables. $Z_i^2$ is not bounded; however, Chernoff-Hoeffding bounds extend to sums of random variables with exponentially decaying tails.


  25. Proof of DJL Lemma

  Without loss of generality assume $\|x\|_2 = 1$ (a unit vector). Let $z = \Pi x$, so $z_i = \sum_{j=1}^{d} \Pi_{ij} x_j$. Since the $\Pi_{ij}$ are independent $N(0,1)$ and $\sum_j x_j^2 = 1$, each $z_i \sim N(0,1)$ by the stability corollary. Hence $\|\Pi x\|_2^2 = \sum_{i=1}^{k} z_i^2$ has the $\chi^2(k)$ distribution, and by the concentration lemma, with probability $1 - \delta$ for $k = \Omega(\frac{1}{\epsilon^2} \log(1/\delta))$, $\frac{1}{\sqrt{k}} \|\Pi x\|_2 = (1 \pm \epsilon) \|x\|_2$.
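
  A numerical check of the key step (illustrative; it verifies that each coordinate of $\Pi x$ is standard normal for a unit vector $x$, and that $\|\Pi x\|_2^2 / k$ concentrates near 1):

      import numpy as np

      rng = np.random.default_rng(5)
      d, k = 1_000, 400
      x = rng.standard_normal(d)
      x /= np.linalg.norm(x)            # WLOG a unit vector

      Pi = rng.standard_normal((k, d))
      z = Pi @ x                        # z_i = sum_j Pi_ij x_j

      # Each z_i is a Gaussian combination with squared coefficients summing
      # to 1, so z_i ~ N(0,1) and ||z||_2^2 ~ chi^2(k).
      print(z.mean(), z.var())          # near 0 and 1
      print((z**2).sum() / k)           # near 1 = ||x||_2^2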
