
Subspace Embeddings for Regression, Lecture 12, October 1, 2020 (PowerPoint presentation)



  1. CS 498ABD: Algorithms for Big Data. Subspace Embeddings for Regression. Lecture 12, October 1, 2020. Chandra (UIUC), Fall 2020.

  2. Subspace Embedding. Question: Suppose we have a linear subspace $E$ of $\mathbb{R}^n$ of dimension $d$. Can we find a projection $\Pi : \mathbb{R}^n \to \mathbb{R}^k$ such that for every $x \in E$, $\|\Pi x\|_2 = (1 \pm \varepsilon)\|x\|_2$? Not possible if $k < d$. Possible if $k = d$: pick $\Pi$ to be an orthonormal basis for $E$. Disadvantage: this requires knowing $E$ and computing an orthonormal basis, which is slow. What we really want: an oblivious subspace embedding, a la JL, based on random projections.

  3. Oblivious Subspace Embedding. Theorem: Suppose $E$ is a linear subspace of $\mathbb{R}^n$ of dimension $d$. Let $\Pi \in \mathbb{R}^{k \times n}$ be a DJL matrix with $k = O\!\left(\frac{d}{\varepsilon^2}\log(1/\delta)\right)$ rows. Then with probability $1 - \delta$, for every $x \in E$, $\left\|\frac{1}{\sqrt{k}}\Pi x\right\|_2 = (1 \pm \varepsilon)\|x\|_2$. In other words, the JL Lemma extends from one dimension to an arbitrary number of dimensions in a graceful way.
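A minimal numerical sketch of this statement, assuming a dense Gaussian sketch and dimensions of my own choosing (the constant in $k$ and all variable names are illustrative, not from the lecture):

```python
# Empirical check: a scaled random Gaussian matrix approximately preserves the
# norms of all vectors in a fixed d-dimensional subspace of R^n.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 2000, 10, 0.25
k = int(4 * d / eps**2)            # k = O(d / eps^2); the constant 4 is a demo choice

# A random d-dimensional subspace E of R^n, represented by an orthonormal basis U.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))

# Dense Gaussian sketch with entries N(0, 1), scaled by 1/sqrt(k) as in the theorem.
Pi = rng.standard_normal((k, n)) / np.sqrt(k)

# Measure the distortion on many random vectors x in E.
X = U @ rng.standard_normal((d, 100))                  # columns are vectors in E
ratios = np.linalg.norm(Pi @ X, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.min(), ratios.max())                      # should lie in roughly [1 - eps, 1 + eps]
```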

  4. Part I: Faster algorithms via subspace embeddings.

  5. Linear model fitting. An important problem in data analysis: $n$ data points, where each data point is $a_i \in \mathbb{R}^d$ with an associated real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \dots, a_{i,d})$. An interesting special case is $d = 1$. What model should one use to explain the data?

  6. Linear model fitting. An important problem in data analysis: $n$ data points, where each data point is $a_i \in \mathbb{R}^d$ with an associated real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \dots, a_{i,d})$. An interesting special case is $d = 1$. What model should one use to explain the data? Simplest model? Affine fitting: $b_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j a_{i,j}$ for some real numbers $\alpha_0, \alpha_1, \dots, \alpha_d$. Can restrict to $\alpha_0 = 0$ by lifting to $d + 1$ dimensions, hence a linear model.

  7. Linear model fitting. An important problem in data analysis: $n$ data points, where each data point is $a_i \in \mathbb{R}^d$ with an associated real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \dots, a_{i,d})$. An interesting special case is $d = 1$. What model should one use to explain the data? Simplest model? Affine fitting: $b_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j a_{i,j}$ for some real numbers $\alpha_0, \alpha_1, \dots, \alpha_d$. Can restrict to $\alpha_0 = 0$ by lifting to $d + 1$ dimensions, hence a linear model. But data is noisy, so we won't be able to satisfy all data points even if the true model is linear. How do we find a good linear model?

  8. Regression. $n$ data points; each data point is $a_i \in \mathbb{R}^d$ with an associated real value $b_i$, where $a_i = (a_{i,1}, a_{i,2}, \dots, a_{i,d})$. Linear model fitting: find real numbers $\alpha_1, \dots, \alpha_d$ such that $b_i \approx \sum_{j=1}^{d} \alpha_j a_{i,j}$ for all points. Let $A$ be the matrix with one row per data point $a_i$. We write $x_1, x_2, \dots, x_d$ as variables standing in for $\alpha_1, \dots, \alpha_d$. Ideally: find $x \in \mathbb{R}^d$ such that $Ax = b$. Best fit: find $x \in \mathbb{R}^d$ to minimize $Ax - b$ under some norm: $\|Ax - b\|_1$, $\|Ax - b\|_2$, or $\|Ax - b\|_\infty$.
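A short sketch of the setup on slides 5-8, assuming synthetic data and illustrative names: the affine fit $b_i \approx \alpha_0 + \sum_j \alpha_j a_{i,j}$ becomes a purely linear fit after lifting to $d+1$ dimensions by appending a constant-1 coordinate, and the best $\ell_2$ fit is then a least-squares problem on the matrix $A$.

```python
# Lifting trick: append a column of ones so the intercept alpha_0 becomes an
# ordinary coefficient, then solve the least-squares fit min_x ||Ax - b||_2.
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 3
A = rng.standard_normal((n, d))
true_alpha = np.array([1.0, -0.5, 3.0])
b = 2.0 + A @ true_alpha + 0.1 * rng.standard_normal(n)   # noisy affine data, intercept 2.0

A_lift = np.hstack([np.ones((n, 1)), A])                   # lift to d + 1 dimensions
alpha, *_ = np.linalg.lstsq(A_lift, b, rcond=None)
print(alpha)   # alpha[0] close to 2.0 (intercept), alpha[1:] close to (1, -0.5, 3)
```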

  9. Linear least squares / Regression. Linear least squares: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$. This is the optimal estimator for certain noise models. It is interesting when $n \gg d$, the over-constrained case, when there is no solution to $Ax = b$ and we want the best fit. [Figure from Wikipedia.]

  10. Linear least squares / Regression. Linear least squares: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$. Interesting when $n \gg d$, the over-constrained case, when there is no solution to $Ax = b$ and we want the best fit. Geometrically, $Ax$ is a linear combination of the columns of $A$, so we are asking: which vector $z$ in the column space of $A$ is closest to $b$ in the $\ell_2$ norm? The closest vector to $b$ is the projection of $b$ onto the column space of $A$, so the answer is "obvious" geometrically. How do we find it?

  11. [Handwritten board work: expanding $\|Ax - b\|_2^2$ and observing that $Ax$ ranges over the span of the columns of $A$, so the minimizer of $\|Ax - b\|_2^2$ is the projection of $b$ onto the column space of $A$.]

  12. Linear least squares / Regression. Linear least squares: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$. Geometrically, $Ax$ is a linear combination of the columns of $A$, so we are asking which vector $z$ in the column space of $A$ is closest to $b$ in the $\ell_2$ norm. The closest vector to $b$ is the projection of $b$ onto the column space of $A$, so the answer is "obvious" geometrically. How do we find it? Find an orthonormal basis $z_1, z_2, \dots, z_r$ for the columns of $A$. Compute the projection $c$ of $b$ onto the column space of $A$ as $c = \sum_{j=1}^{r} \langle b, z_j \rangle z_j$ and output the answer $\|b - c\|_2$. What is $x$? $x$ is obtained by expressing $c$ as $Ax = c$.

  13. Linear least squares / Regression. Linear least squares: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$. Geometrically, $Ax$ is a linear combination of the columns of $A$, so we are asking which vector $z$ in the column space of $A$ is closest to $b$ in the $\ell_2$ norm. The closest vector to $b$ is the projection of $b$ onto the column space of $A$, so the answer is "obvious" geometrically. How do we find it? Find an orthonormal basis $z_1, z_2, \dots, z_r$ for the columns of $A$. Compute the projection $c$ of $b$ onto the column space of $A$ as $c = \sum_{j=1}^{r} \langle b, z_j \rangle z_j$ and output the answer $\|b - c\|_2$. What is $x$? We know that $Ax = c$; solve this linear system. Both steps can be combined via the SVD and other methods.
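A minimal sketch of this orthonormal-basis approach, using a reduced QR factorization as one standard way to get the basis (dimensions and variable names are my own):

```python
# Project b onto the column space of A via an orthonormal basis, then recover x.
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

Z, _ = np.linalg.qr(A)                 # columns of Z: orthonormal basis z_1, ..., z_r
c = Z @ (Z.T @ b)                      # c = sum_j <b, z_j> z_j, the projection of b
print("optimal residual:", np.linalg.norm(b - c))

# Recover x from the (consistent) system Ax = c.
x = np.linalg.lstsq(A, c, rcond=None)[0]
print(np.allclose(A @ x, c))           # True: c lies in the column space of A
```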

  14. Linear least squares: optimization perspective. Linear least squares: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$. Optimization view: find $x \in \mathbb{R}^d$ to minimize $\|Ax - b\|_2^2 = x^T A^T A x - 2 b^T A x + b^T b$. The quadratic function $f(x) = x^T A^T A x - 2 b^T A x + b^T b$ is convex since the matrix $A^T A$ is positive semi-definite. $\nabla f(x) = 2 A^T A x - 2 A^T b$, and hence the optimum solution is $x^* = (A^T A)^{-1} A^T b$.
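A small sketch of this closed form, solving the normal equations $A^T A x = A^T b$ and comparing against a library least-squares solver (in practice QR or SVD-based solvers are preferred for numerical stability; the data here is synthetic):

```python
# Closed-form optimum x* = (A^T A)^{-1} A^T b via the normal equations.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # solve A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference solution
print(np.allclose(x_normal, x_lstsq))            # the two solutions agree
```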

  15. Computational perspective. $n$ is large (the number of data points) and $d$ is smaller, so $A$ is tall and skinny. An exact solution requires the SVD or other methods, with worst-case time $O(nd^2)$. Can we speed up the computation with some potential approximation?

  16. Linear least squares via subspace embeddings. Let $A^{(1)}, A^{(2)}, \dots, A^{(d)}$ be the columns of $A$ and let $E$ be the subspace spanned by $\{A^{(1)}, A^{(2)}, \dots, A^{(d)}, b\}$. Note that these vectors are in $\mathbb{R}^n$, corresponding to the $n$ data points. $E$ has dimension at most $d + 1$. Use a subspace embedding on $E$: applying a JL matrix $\Pi$ with $k = O(d/\varepsilon^2)$ rows, we reduce $\{A^{(1)}, A^{(2)}, \dots, A^{(d)}, b\}$ to $\{A'^{(1)}, A'^{(2)}, \dots, A'^{(d)}, b'\}$, which are vectors in $\mathbb{R}^k$. Solve $\min_{x' \in \mathbb{R}^d} \|A' x' - b'\|_2$.
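A sketch-and-solve implementation of this idea under the same assumptions as before: a dense Gaussian sketch with $k = O((d+1)/\varepsilon^2)$ rows, with the constant and all names chosen for illustration.

```python
# Sketch A and b with a JL-style random matrix Pi, then solve the small problem.
import numpy as np

def sketched_least_squares(A, b, eps=0.25, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    k = int(4 * (d + 1) / eps**2)              # embed the (d+1)-dim span of the columns of A and b
    Pi = rng.standard_normal((k, n)) / np.sqrt(k)
    A_s, b_s = Pi @ A, Pi @ b                  # A' = Pi A, b' = Pi b
    x_s, *_ = np.linalg.lstsq(A_s, b_s, rcond=None)
    return x_s

rng = np.random.default_rng(5)
A = rng.standard_normal((5000, 20))
b = rng.standard_normal(5000)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, rng=rng)
# Residuals on the ORIGINAL problem: the sketched solution is nearly as good.
print(np.linalg.norm(A @ x_exact - b), np.linalg.norm(A @ x_sketch - b))
```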

  17. " d " by ft - - k :# bust ) " kilt IT e R' " n . " ' s xx # - ft k -

  18. Analysis. Lemma: With probability $1 - \delta$, $(1 - \varepsilon) \min_{x \in \mathbb{R}^d} \|Ax - b\|_2 \le \min_{x' \in \mathbb{R}^d} \|A' x' - b'\|_2 \le (1 + \varepsilon) \min_{x \in \mathbb{R}^d} \|Ax - b\|_2$.

  19. Analysis. Lemma: With probability $1 - \delta$, $(1 - \varepsilon) \min_{x \in \mathbb{R}^d} \|Ax - b\|_2 \le \min_{x' \in \mathbb{R}^d} \|A' x' - b'\|_2 \le (1 + \varepsilon) \min_{x \in \mathbb{R}^d} \|Ax - b\|_2$. With probability $1 - \delta$, via the subspace embedding guarantee, for all $z \in E$: $(1 - \varepsilon)\|z\|_2 \le \|\Pi z\|_2 \le (1 + \varepsilon)\|z\|_2$. Now prove the two inequalities in the lemma separately using the above.

  20. Analysis. Suppose $x^*$ is an optimum solution to $\min_x \|Ax - b\|_2$. Let $z = A x^* - b$. We have $\|\Pi z\|_2 \le (1 + \varepsilon)\|z\|_2$ since $z \in E$.

  21. Analysis. Suppose $x^*$ is an optimum solution to $\min_x \|Ax - b\|_2$. Let $z = A x^* - b$. We have $\|\Pi z\|_2 \le (1 + \varepsilon)\|z\|_2$ since $z \in E$. Since $x^*$ is a feasible solution to $\min_{x'} \|A' x' - b'\|_2$, we get $\min_{x'} \|A' x' - b'\|_2 \le \|A' x^* - b'\|_2 = \|\Pi (A x^* - b)\|_2 \le (1 + \varepsilon) \|A x^* - b\|_2$.

  22. Analysis. For any $y \in \mathbb{R}^d$, $\|\Pi A y - \Pi b\|_2 \ge (1 - \varepsilon) \|A y - b\|_2$, because $A y - b$ is a vector in $E$ and $\Pi$ preserves the norms of all vectors in $E$.

  23. Analysis. For any $y \in \mathbb{R}^d$, $\|\Pi A y - \Pi b\|_2 \ge (1 - \varepsilon) \|A y - b\|_2$, because $A y - b$ is a vector in $E$ and $\Pi$ preserves the norms of all vectors in $E$. Let $y^*$ be an optimum solution to $\min_{x'} \|A' x' - b'\|_2$. Then $\|\Pi (A y^* - b)\|_2 \ge (1 - \varepsilon) \|A y^* - b\|_2 \ge (1 - \varepsilon) \|A x^* - b\|_2$.
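A numerical check of the lemma, under the same illustrative Gaussian-sketch assumptions as above (this is a sanity check on random data, not a proof):

```python
# Compare the optimum value of the sketched problem with the true optimum.
import numpy as np

rng = np.random.default_rng(6)
n, d, eps = 5000, 20, 0.25
k = int(4 * (d + 1) / eps**2)

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)    # noisy linear data

Pi = rng.standard_normal((k, n)) / np.sqrt(k)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)             # optimum of the original problem
y_star, *_ = np.linalg.lstsq(Pi @ A, Pi @ b, rcond=None)   # optimum of the sketched problem

opt = np.linalg.norm(A @ x_star - b)
sketched_opt = np.linalg.norm(Pi @ A @ y_star - Pi @ b)
print(sketched_opt / opt)                   # should fall in roughly [1 - eps, 1 + eps]
print(np.linalg.norm(A @ y_star - b) / opt) # y_star is also a near-optimal fit for the original problem
```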
