SVD and Low-rank Approximation

CS 498ABD: Algorithms for Big Data, Spring 2019
Lecture 23, April 18, 2019
Chandra (UIUC)

  1. Singular Value Decomposition (SVD)

     Let A be an m × n real-valued matrix, and let a_i denote the vector corresponding to row i. There are m rows; think of each row as a data point in R^n. In data applications, m ≫ n. (Other notation: A is an n × d matrix.)

     SVD theorem: A can be written as A = U D V^T where
     - V is an n × n orthonormal matrix,
     - D is an m × n diagonal matrix with ≤ min{m, n} non-zero entries, called the singular values of A,
     - U is an m × m orthonormal matrix.
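The factorization can be illustrated numerically. The following is a minimal sketch (my addition, not from the lecture) assuming NumPy; `numpy.linalg.svd` returns U, the singular values, and V^T directly:

```python
import numpy as np

# A small "tall" data matrix: m = 5 data points in R^3 (m > n).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Full SVD: U is m x m, s holds the singular values, Vt is V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Embed the singular values into an m x n diagonal matrix D.
D = np.zeros_like(A)
D[:len(s), :len(s)] = np.diag(s)

# Verify A = U D V^T (up to floating-point error).
assert np.allclose(A, U @ D @ Vt)
```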

  2. SVD

     Let d = min{m, n}.
     - u_1, u_2, ..., u_m: the columns of U, the left singular vectors of A
     - v_1, v_2, ..., v_n: the columns of V (rows of V^T), the right singular vectors of A
     - σ_1 ≥ σ_2 ≥ ... ≥ σ_d: the singular values of A, where σ_i = D_{i,i}

     Then A = ∑_{i=1}^{d} σ_i u_i v_i^T.

     We can in fact restrict attention to r, the rank of A:

     A = ∑_{i=1}^{r} σ_i u_i v_i^T.
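To illustrate the rank-1 expansion, here is a small NumPy sketch (again my addition) that rebuilds A as ∑ σ_i u_i v_i^T:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A)

# A equals the sum of rank-1 terms sigma_i * u_i * v_i^T.
# np.outer(u, v) forms the m x n rank-1 matrix u v^T.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_sum)

# For a rank-r matrix only the first r singular values are non-zero,
# so the sum can stop at r = rank(A).
```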

  3. SVD: interpreting A as a linear operator A : R^n → R^m

     The columns of V form an orthonormal basis, so V^T x for x ∈ R^n expresses x in the V basis. Note that x ↦ V^T x is a rigid transformation (it does not change the length of x). Let y = V^T x. The diagonal matrix D only stretches y along the coordinate axes; it also adjusts the dimension from n to m by padding with the right number of zeroes. Let z = D y. Then U z is a rigid transformation that expresses z in the basis given by the columns of U. Thus any linear operator can be broken up into a sequence of three simpler/basic transformations; the sketch below traces these steps numerically.
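A sanity check of the three-step view (my NumPy sketch, not from the lecture): apply the rotation, stretch, and rotation in sequence and confirm the result equals A x.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A)
D = np.zeros((5, 3))
D[:3, :3] = np.diag(s)

x = rng.standard_normal(3)
y = Vt @ x   # rigid: expresses x in the V basis, length preserved
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
z = D @ y    # stretch along the axes; pads dimension from n up to m
assert np.allclose(A @ x, U @ z)  # rigid: back out through the U basis
```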

  4. Low-rank approximation property of SVD

     Question: Given A ∈ R^{m×n} and an integer k, find a matrix B of rank at most k such that ‖A − B‖ is minimized.

     Fact: For the Frobenius norm, the optimum for every k is captured by the SVD. That is, A_k = ∑_{i=1}^{k} σ_i u_i v_i^T is the best rank-k approximation to A:

     ‖A − A_k‖_F = min_{B : rank(B) ≤ k} ‖A − B‖_F.

     Why this magic? The Frobenius norm and basic properties of vector projections.
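The fact can be probed numerically. A sketch assuming NumPy (my addition), where the truncated SVD plays the role of A_k and arbitrary rank-k matrices B = X Y^T serve as competitors:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 10))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
# A_k: keep only the k largest singular values.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
err_svd = np.linalg.norm(A - A_k, 'fro')

# Any rank-k matrix X Y^T has at least this much Frobenius error.
for _ in range(100):
    X = rng.standard_normal((50, k))
    Y = rng.standard_normal((10, k))
    assert np.linalg.norm(A - X @ Y.T, 'fro') >= err_svd

# The truncation error is sqrt of the sum of the discarded sigma_i^2.
assert np.isclose(err_svd, np.sqrt(np.sum(s[k:] ** 2)))
```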

  5. Geometric meaning

     Consider k = 1. What is the best rank-1 matrix B that minimizes ‖A − B‖_F?

     Since B is rank 1, B = u v^T where v ∈ R^n and u ∈ R^m; wlog v is a unit vector. Then

     ‖A − u v^T‖_F^2 = ∑_{i=1}^{m} ‖a_i − u(i) v‖^2.

     If we know v, then the best u to minimize the above is determined. Why? For fixed v, u(i) = ⟨a_i, v⟩, and ‖a_i − ⟨a_i, v⟩ v‖ is the distance of a_i from the line described by v. (A numerical check follows.)
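A minimal NumPy check of this claim (my addition): for a fixed unit vector v, the choice u(i) = ⟨a_i, v⟩, i.e. u = A v, minimizes the Frobenius error, and any perturbation of u only makes it worse:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))
v = rng.standard_normal(5)
v /= np.linalg.norm(v)  # arbitrary fixed unit direction

u = A @ v  # u(i) = <a_i, v>, the projection coefficient of each row
base = np.linalg.norm(A - np.outer(u, v), 'fro')

# Since each residual a_i - u(i) v is orthogonal to v, perturbing u
# can only add error.
for _ in range(100):
    u_pert = u + 0.1 * rng.standard_normal(20)
    assert np.linalg.norm(A - np.outer(u_pert, v), 'fro') >= base
```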

  6. Geometric meaning

     What is the best rank-1 matrix B that minimizes ‖A − B‖_F? It is found by choosing the unit vector/direction v that minimizes

     ∑_{i=1}^{m} ‖a_i − ⟨a_i, v⟩ v‖^2,

     which is the same as finding the unit vector v that maximizes

     ∑_{i=1}^{m} ⟨a_i, v⟩^2.

     How to find the best v? Not obvious: we will come to it a bit later.
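Although the lecture defers how to find the best v, a numerical preview is possible (my sketch, assuming NumPy): since ∑_i ⟨a_i, v⟩^2 = ‖A v‖^2, the top right singular vector v_1 attains the maximum value σ_1^2, and random unit vectors never beat it:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 6))

def objective(v):
    # sum_i <a_i, v>^2 equals ||A v||^2
    return np.linalg.norm(A @ v) ** 2

_, s, Vt = np.linalg.svd(A)
v1 = Vt[0]  # top right singular vector
assert np.isclose(objective(v1), s[0] ** 2)

for _ in range(100):
    v = rng.standard_normal(6)
    v /= np.linalg.norm(v)
    assert objective(v) <= objective(v1) + 1e-9
```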

  7. Best rank-two approximation

     Consider k = 2. What is the best rank-2 matrix B that minimizes ‖A − B‖_F?

     Since B has rank 2, we can assume without loss of generality that B = u_1 v_1^T + u_2 v_2^T where v_1, v_2 are orthogonal unit vectors (they span a space of dimension 2).

     Minimizing ‖A − B‖_F^2 is the same as finding orthogonal unit vectors v_1, v_2 that maximize

     ∑_{i=1}^{m} (⟨a_i, v_1⟩^2 + ⟨a_i, v_2⟩^2),

     in other words the best-fit 2-dimensional subspace.

  8. Greedy algorithm

     Find v_1 as the best rank-1 direction. That is,

     v_1 = argmax_{v : ‖v‖_2 = 1} ∑_{i=1}^{m} ⟨a_i, v⟩^2.

     For v_2 solve

     v_2 = argmax_{v ⊥ v_1, ‖v‖_2 = 1} ∑_{i=1}^{m} ⟨a_i, v⟩^2.

     Alternatively: let a'_i = a_i − ⟨a_i, v_1⟩ v_1, and let

     v_2 = argmax_{v : ‖v‖_2 = 1} ∑_{i=1}^{m} ⟨a'_i, v⟩^2.

     The greedy algorithm works! (A code sketch follows.)
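Here is a sketch of the greedy/deflation scheme (my implementation, assuming NumPy; solving each round's maximization via the top eigenvector of AᵀA is one possible choice, since ∑_i ⟨a_i, v⟩^2 = vᵀAᵀA v):

```python
import numpy as np

def greedy_directions(A, k):
    """Greedily find k orthonormal directions maximizing sum_i <a_i, v>^2.

    Each round takes the top eigenvector of A^T A (which maximizes
    ||A v||^2 over unit vectors), then deflates: a_i <- a_i - <a_i, v> v,
    i.e. A <- A - (A v) v^T, so later rounds work orthogonally to v.
    """
    A = A.astype(float)
    vs = []
    for _ in range(k):
        eigvals, eigvecs = np.linalg.eigh(A.T @ A)
        v = eigvecs[:, -1]          # eigenvector of the largest eigenvalue
        vs.append(v)
        A = A - np.outer(A @ v, v)  # remove each row's component along v
    return np.array(vs)

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 8))
V_greedy = greedy_directions(A, 2)

# The greedy directions match the top-2 right singular vectors (up to sign).
_, _, Vt = np.linalg.svd(A)
for v, v_svd in zip(V_greedy, Vt[:2]):
    assert np.isclose(abs(v @ v_svd), 1.0)
```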

  9. Greedy algorithm correctness

     Proof that greedy works for k = 2. Suppose w_1, w_2 are orthogonal unit vectors that form the best-fit 2-dimensional space, and let H be the space spanned by w_1, w_2. It suffices to prove that

     ∑_{i=1}^{m} (⟨a_i, v_1⟩^2 + ⟨a_i, v_2⟩^2) ≥ ∑_{i=1}^{m} (⟨a_i, w_1⟩^2 + ⟨a_i, w_2⟩^2).

     If v_1 ∈ H then we are done, because we can assume wlog that w_1 = v_1, and v_2 is at least as good as w_2.

  10. Greedy algorithm correctness

      Suppose v_1 ∉ H. Let v'_1 be the projection of v_1 onto H and v''_1 = v_1 − v'_1 the component of v_1 orthogonal to H. Note that

      ‖v'_1‖_2^2 + ‖v''_1‖_2^2 = ‖v_1‖_2^2 = 1.

      Wlog we can assume, by rotation, that w_1 = v'_1 / ‖v'_1‖_2 and that w_2 is orthogonal to v'_1. Hence w_2 is orthogonal to v_1.
