  1. Low Rank Approximation, Lecture 1. Daniel Kressner, Chair for Numerical Algorithms and HPC, Institute of Mathematics, EPFL. daniel.kressner@epfl.ch

  2. Organizational aspects
◮ Lectures: Tuesday 8-10, MA A110. First: September 25, Last: December 18.
◮ Exercises: Tuesday 8-10, MA A110. First: September 25, Last: December 18.
◮ Exam: Miniproject + oral exam.
◮ Webpage: https://anchp.epfl.ch/lowrank
◮ daniel.kressner@epfl.ch, lana.perisa@epfl.ch

  3. From http://www.niemanlab.org: ... his [Aleksandr Kogan's] message went on to confirm that his approach was indeed similar to SVD or other matrix factorization methods, like in the Netflix Prize competition, and the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model.

  4. Rank and basic properties
For a field F, let A ∈ F^{m×n}. Then rank(A) := dim(range(A)). For simplicity, F = R throughout the lecture and often m ≥ n.
Lemma. Let A ∈ R^{m×n}. Then
  1. rank(A^T) = rank(A);
  2. rank(PAQ) = rank(A) for invertible matrices P ∈ R^{m×m}, Q ∈ R^{n×n};
  3. rank(AB) ≤ min{rank(A), rank(B)} for any matrix B ∈ R^{n×p};
  4. rank([A11, A12; 0, A22]) ≥ rank(A11) + rank(A22) for A11 ∈ R^{m1×n1}, A12 ∈ R^{m1×n2}, A22 ∈ R^{m2×n2}.
Proof: See Linear Algebra 1 / Exercises.
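
These properties are easy to sanity-check numerically. A minimal MATLAB sketch (all sizes and variable names chosen here for illustration, not taken from the slides):

    % Sanity check of the rank properties on random test matrices (illustrative only).
    m = 8; n = 6; p = 5; r = 3;
    A = randn(m, r) * randn(r, n);           % random matrix of rank r
    P = randn(m);  Q = randn(n);             % generically invertible
    B = randn(n, p);
    disp([rank(A), rank(A')])                % property 1: both equal r
    disp([rank(P*A*Q), rank(A)])             % property 2: rank unchanged by P, Q
    disp(rank(A*B) <= min(rank(A), rank(B))) % property 3: prints 1 (true)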

  5. Rank and matrix factorizations
Let B = {b1, ..., br} ⊂ R^m with r = rank(A) be a basis of range(A). Then each of the columns of A = [a1, a2, ..., an] can be expressed as a linear combination of B:
  a_i = b1 c_{i1} + b2 c_{i2} + ... + br c_{ir} = [b1, ..., br] [c_{i1}; ...; c_{ir}],
for some coefficients c_{ij} ∈ R with i = 1, ..., n, j = 1, ..., r. Stacking these relations column by column gives
  [a1, ..., an] = [b1, ..., br] [c_{11}, ..., c_{n1}; ...; c_{1r}, ..., c_{nr}].

  6. Rank and matrix factorizations
Lemma. A matrix A ∈ R^{m×n} of rank r admits a factorization of the form
  A = B C^T,  B ∈ R^{m×r},  C ∈ R^{n×r}.
We say that A has low rank if rank(A) ≪ m, n.
Illustration of the low-rank factorization A = B C^T: storing A requires mn entries, whereas storing the factors B and C requires only mr + nr entries.
◮ Generically (and in most applications), A has full rank, that is, rank(A) = min{m, n}.
◮ Aim instead at approximating A by a low-rank matrix.
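
A minimal MATLAB sketch of the storage and work savings (sizes chosen arbitrarily, not from the slides):

    % Work with a rank-r matrix through its factors instead of forming it (illustrative sketch).
    m = 2000; n = 1000; r = 20;
    B = randn(m, r);  C = randn(n, r);
    % A = B*C' would need m*n = 2e6 entries; the factors need only (m+n)*r = 6e4.
    fprintf('full: %d entries, factored: %d entries\n', m*n, (m+n)*r);
    x = randn(n, 1);
    y = B * (C' * x);   % y = A*x in O((m+n)r) operations instead of O(mn)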

  7. Questions addressed in lecture series
What? Theoretical foundations of low-rank approximation.
When? A priori and a posteriori estimates for low-rank approximation. Situations that allow for low-rank approximation techniques.
Why? Applications in engineering, scientific computing, data analysis, ... where low-rank approximation plays a central role.
How? State-of-the-art algorithms for performing and working with low-rank approximations.
Will cover both matrices and tensors.

  8. Literature for Lecture 1
Golub/Van Loan'2013: Golub, Gene H.; Van Loan, Charles F. Matrix Computations. Fourth edition. Johns Hopkins University Press, Baltimore, MD, 2013.
Horn/Johnson'2013: Horn, Roger A.; Johnson, Charles R. Matrix Analysis. Second edition. Cambridge University Press, 2013.
+ References on slides.

  9. 1. Fundamental tools
◮ SVD
◮ Relation to eigenvalues
◮ Norms
◮ Best low-rank approximation

  10. The singular value decomposition
Theorem (SVD). Let A ∈ R^{m×n} with m ≥ n. Then there are orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} such that
  A = U Σ V^T,  Σ = [diag(σ1, ..., σn); 0] ∈ R^{m×n},
and σ1 ≥ σ2 ≥ ... ≥ σn ≥ 0.
◮ σ1, ..., σn are called singular values.
◮ u1, ..., un are called left singular vectors.
◮ v1, ..., vn are called right singular vectors.
◮ A v_i = σ_i u_i and A^T u_i = σ_i v_i for i = 1, ..., n.
◮ The singular values are always uniquely defined by A.
◮ The singular vectors are never unique. If σ1 > σ2 > ... > σn > 0, then they are unique up to the signs u_i ← ±u_i, v_i ← ±v_i.
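
The defining relations are easy to verify numerically. A small MATLAB check (dimensions picked here for illustration):

    % Verify A*v_i = sigma_i*u_i and A'*u_i = sigma_i*v_i for a random matrix.
    m = 7; n = 4;
    A = randn(m, n);
    [U, S, V] = svd(A);                 % full SVD: U is m-by-m, V is n-by-n
    sigma = diag(S);
    for i = 1:n
        fprintf('%g  %g\n', norm(A*V(:,i) - sigma(i)*U(:,i)), ...
                            norm(A'*U(:,i) - sigma(i)*V(:,i)));
    end
    % Both residuals are on the order of machine precision.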

  11. SVD: Sketch of proof
Induction over n; the case n = 1 is trivial. For general n, let v1 solve max{‖Av‖_2 : ‖v‖_2 = 1} =: ‖A‖_2. Set σ1 := ‖A‖_2 and u1 := A v1 / σ1.¹ By definition, A v1 = σ1 u1. After completion to orthogonal matrices U1 = [u1, U⊥] ∈ R^{m×m} and V1 = [v1, V⊥] ∈ R^{n×n}:
  U1^T A V1 = [u1^T A v1, u1^T A V⊥; U⊥^T A v1, U⊥^T A V⊥] = [σ1, w^T; 0, A1],
with w := V⊥^T A^T u1 and A1 = U⊥^T A V⊥. Since ‖·‖_2 is invariant under orthogonal transformations,
  σ1 = ‖A‖_2 = ‖U1^T A V1‖_2 = ‖[σ1, w^T; 0, A1]‖_2 ≥ sqrt(σ1^2 + ‖w‖_2^2).
Hence, w = 0. The proof is completed by applying the induction hypothesis to A1.
¹ If σ1 = 0, choose an arbitrary u1.

  12. Very basic properties of the SVD
◮ r = rank(A) is the number of nonzero singular values of A.
◮ kernel(A) = span{v_{r+1}, ..., v_n}
◮ range(A) = span{u_1, ..., u_r}
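
In MATLAB these quantities can be read off directly from the computed SVD; a rough sketch (the truncation tolerance below is a common heuristic, not prescribed by the slides):

    % Numerical rank, kernel, and range of a matrix via its SVD (illustrative sketch).
    A = randn(6, 3) * randn(3, 5);      % rank-3 matrix of size 6-by-5
    [U, S, V] = svd(A);
    s = diag(S);
    tol = max(size(A)) * eps(s(1));     % heuristic threshold for "nonzero"
    r = sum(s > tol);
    range_basis  = U(:, 1:r);           % orthonormal basis of range(A)
    kernel_basis = V(:, r+1:end);       % orthonormal basis of kernel(A)
    disp(norm(A * kernel_basis))        % ~ machine precision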

  13. SVD: Computation (for small dense matrices)
Computation of the SVD proceeds in two steps:
1. Reduction to bidiagonal form: By applying n Householder reflectors from the left and n - 2 Householder reflectors from the right, compute orthogonal matrices U1, V1 such that
  U1^T A V1 = B = [B1; 0],
that is, B1 ∈ R^{n×n} is an upper bidiagonal matrix.
2. Reduction to diagonal form: Use divide & conquer to compute orthogonal matrices U2, V2 such that Σ = U2^T B1 V2 is diagonal.
Set U = U1 U2 and V = V1 V2. Step 1 is usually the most expensive.
Remarks on Step 1:
◮ If m is significantly larger than n, say m ≥ 3n/2, first computing a QR decomposition of A reduces the cost.
◮ Most modern implementations reduce A successively via banded form to bidiagonal form.²
² Bischof, C. H.; Lang, B.; Sun, X. A framework for symmetric band reduction. ACM Trans. Math. Software 26 (2000), no. 4, 581-601.
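
The QR-first remark can be illustrated in a few lines of MATLAB (a simplified sketch, not the actual LAPACK implementation; sizes chosen here):

    % SVD of a tall matrix via an initial QR decomposition (simplified sketch).
    m = 10000; n = 50;
    A = randn(m, n);
    [Q, R] = qr(A, 0);        % economy-size QR: Q is m-by-n, R is n-by-n
    [UR, S, V] = svd(R);      % SVD of the small n-by-n factor
    U = Q * UR;               % left singular vectors of A
    disp(norm(A - U*S*V'))    % ~ machine precision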

  14. SVD: Computation (for small dense matrices)
In most applications, the vectors u_{n+1}, ..., u_m are not of interest. By omitting these vectors one obtains the following variant of the SVD.
Theorem (Economy size SVD). Let A ∈ R^{m×n} with m ≥ n. Then there is a matrix U ∈ R^{m×n} with orthonormal columns and an orthogonal matrix V ∈ R^{n×n} such that
  A = U Σ V^T,  Σ = diag(σ1, ..., σn) ∈ R^{n×n},
and σ1 ≥ σ2 ≥ ... ≥ σn ≥ 0.
Computed by MATLAB's [U,S,V] = svd(A,'econ').
Complexity:
                        memory        operations
  singular values only  O(mn)         O(mn^2)
  economy size SVD      O(mn)         O(mn^2)
  (full) SVD            O(m^2 + mn)   O(m^2 n + mn^2)
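
A short MATLAB comparison of the three variants (matrix size chosen here for illustration):

    % Full SVD, economy-size SVD, and singular values only for a tall matrix.
    A = randn(1000, 20);
    [U, S, V]    = svd(A);          % U: 1000x1000, S: 1000x20, V: 20x20
    [Ue, Se, Ve] = svd(A, 'econ');  % Ue: 1000x20,  Se: 20x20,  Ve: 20x20
    s = svd(A);                     % singular values only, the cheapest option
    disp(norm(A - Ue*Se*Ve'))       % the economy-size SVD still reproduces A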

  15. SVD: Computation (for small dense matrices)
Beware of roundoff error when interpreting singular value plots. Example: semilogy(svd(hilb(100)))
[Figure: computed singular values of the 100x100 Hilbert matrix on a semilogarithmic scale (y-axis from 10^-20 to 10^0, x-axis from 0 to 100); they decay rapidly and then level off in a kink.]
◮ The kink is caused by roundoff error and does not reflect the true behavior of the singular values.
◮ The exact singular values are known to decay exponentially.³
◮ Sometimes more accuracy is possible.⁴
³ Beckermann, B. The condition number of real Vandermonde, Krylov and positive definite Hankel matrices. Numer. Math. 85 (2000), no. 4, 553-577.
⁴ Drmač, Z.; Veselić, K. New fast and accurate Jacobi SVD algorithm. I. SIAM J. Matrix Anal. Appl. 29 (2007), no. 4, 1322-1342.
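
A quick way to see that the kink is a roundoff artifact (a rough check, not from the slides): the smallest computed singular values stall roughly at the level of the unit roundoff times ‖A‖_2, whereas the exact values keep decaying far below that level.

    % The computed tail of svd(hilb(100)) stalls near eps*sigma_1, a roundoff artifact.
    A = hilb(100);
    s = svd(A);
    semilogy(s), grid on
    fprintf('smallest computed sigma: %g,  eps*sigma_1: %g\n', s(end), eps*s(1));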

  16. Singular/eigenvalue relations: symmetric matrices
A symmetric matrix A = A^T ∈ R^{n×n} admits a spectral decomposition A = U diag(λ1, λ2, ..., λn) U^T with an orthogonal matrix U. After reordering we may assume |λ1| ≥ |λ2| ≥ ... ≥ |λn|. The spectral decomposition can be turned into an SVD A = U Σ V^T by defining
  Σ = diag(|λ1|, ..., |λn|),  V = U diag(sign(λ1), ..., sign(λn)).
Remark: This extends to the more general case of normal matrices (e.g., orthogonal or symmetric) via complex spectral or real Schur decompositions.
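
A minimal MATLAB sketch of this construction (random symmetric test matrix chosen here; sign(0) would need special treatment but does not occur generically):

    % Turn a spectral decomposition of a symmetric matrix into an SVD (illustrative sketch).
    n = 5;
    A = randn(n);  A = (A + A') / 2;           % random symmetric matrix
    [U, L] = eig(A);  lambda = diag(L);
    [~, idx] = sort(abs(lambda), 'descend');   % reorder by decreasing |lambda|
    U = U(:, idx);  lambda = lambda(idx);
    Sigma = diag(abs(lambda));
    V = U * diag(sign(lambda));
    disp(norm(A - U*Sigma*V'))                 % ~ machine precision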

  17. Singular/eigenvalue relations: general matrices
Consider the SVD A = U Σ V^T of A ∈ R^{m×n} with m ≥ n. We then have:
1. Spectral decomposition of the Gramian A^T A = V Σ^T Σ V^T = V diag(σ1^2, ..., σn^2) V^T
⇒ A^T A has eigenvalues σ1^2, ..., σn^2, and the right singular vectors of A are eigenvectors of A^T A.
2. Spectral decomposition of the Gramian A A^T = U Σ Σ^T U^T = U diag(σ1^2, ..., σn^2, 0, ..., 0) U^T
⇒ A A^T has eigenvalues σ1^2, ..., σn^2 and, additionally, m - n zero eigenvalues; the first n left singular vectors of A are eigenvectors of A A^T.
3. Decomposition of the Golub-Kahan matrix
  [0, A; A^T, 0] = [U, 0; 0, V] [0, Σ; Σ^T, 0] [U, 0; 0, V]^T
⇒ the eigenvalues of the Golub-Kahan matrix are ±σ_j, j = 1, ..., n, and zero (m - n times).
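
These relations can be checked in a few lines of MATLAB (small random example, variable names chosen here):

    % Eigenvalues of A'*A and of the Golub-Kahan matrix versus the singular values of A.
    m = 6; n = 4;
    A = randn(m, n);
    s = svd(A);
    G = A'*A;  G = (G + G')/2;            % symmetrize to guard against tiny roundoff asymmetry
    ev_gram = sort(eig(G), 'descend');
    disp([s.^2, ev_gram])                 % eigenvalues of A'*A equal sigma_i^2
    GK = [zeros(m), A; A', zeros(n)];     % Golub-Kahan matrix, size (m+n)x(m+n)
    ev_gk = sort(eig(GK), 'descend');
    disp(ev_gk')                          % +sigma_i, then m-n zeros, then -sigma_i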

  18. Norms: Spectral and Frobenius norm
Given the SVD A = U Σ V^T, one defines:
◮ Spectral norm: ‖A‖_2 = σ1.
◮ Frobenius norm: ‖A‖_F = sqrt(σ1^2 + ... + σn^2).
Basic properties:
◮ ‖A‖_2 = max{‖Av‖_2 : ‖v‖_2 = 1} (see proof of SVD).
◮ ‖·‖_2 and ‖·‖_F are both (submultiplicative) matrix norms.
◮ ‖·‖_2 and ‖·‖_F are both unitarily invariant, that is, ‖QAZ‖_2 = ‖A‖_2 and ‖QAZ‖_F = ‖A‖_F for any orthogonal matrices Q, Z.
◮ ‖A‖_2 ≤ ‖A‖_F ≤ sqrt(r) ‖A‖_2, where r = rank(A).
◮ ‖AB‖_F ≤ min{‖A‖_2 ‖B‖_F, ‖A‖_F ‖B‖_2}.
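
A quick MATLAB check of these identities and inequalities on a random rank-r matrix (parameters chosen here for illustration):

    % Spectral and Frobenius norms from the singular values, and the norm inequalities.
    m = 8; n = 6; r = 3;
    A = randn(m, r) * randn(r, n);       % rank(A) = r generically
    s = svd(A);
    fprintf('%g %g\n', norm(A),        s(1));          % spectral norm = sigma_1
    fprintf('%g %g\n', norm(A, 'fro'), norm(s(1:r)));  % Frobenius norm = sqrt(sum sigma_i^2)
    disp([norm(A), norm(A, 'fro'), sqrt(r)*norm(A)])   % ||A||_2 <= ||A||_F <= sqrt(r)*||A||_2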
