Low Rank Approximation Lecture 1
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch
1
Organizational aspects
◮ Lectures: Tuesday 8-10, MA A110. First: September 25, Last: December 18.
◮ Exercises: Tuesday 8-10, MA A110. First: September 25, Last: December 18.
◮ Exam: Miniproject + oral exam.
◮ Webpage: https://anchp.epfl.ch/lowrank.
◮ Contact: daniel.kressner@epfl.ch, lana.perisa@epfl.ch
2
... his [Aleksandr Kogan's] message went on to confirm that his approach was indeed similar to SVD or other matrix factorization methods, like those used in the Netflix Prize competition, and the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model.
3
For a field F, let A ∈ F^{m×n}. Then rank(A) := dim(range(A)). For simplicity, F = R throughout the lecture and often m ≥ n.
Lemma
Let A ∈ R^{m×n}. Then:
◮ rank(PAQ) = rank(A) for invertible matrices P ∈ R^{m×m}, Q ∈ R^{n×n};
◮ rank([A11, A12; 0, A22]) ≥ rank(A11) + rank(A22) for any partition with A11 ∈ R^{m1×n1}, A12 ∈ R^{m1×n2}, A22 ∈ R^{m2×n2}.
Proof: See Linear Algebra 1 / Exercises.
4
Let {b1, ..., br} ⊂ R^m with r = rank(A) be a basis of range(A). Then each of the columns of A = [a1, a2, ..., an] can be written as a linear combination of this basis:
ai = b1 ci1 + b2 ci2 + ··· + br cir = [b1, ..., br] [ci1; ...; cir],
for some coefficients cij ∈ R with i = 1, ..., n, j = 1, ..., r. Stacking these relations column by column gives
A = [b1, ..., br] [c11 ··· cn1; ... ; c1r ··· cnr].
5
This is a factorization of the form A = BC^T with B ∈ R^{m×r}, C ∈ R^{n×r}. We say that A has low rank if rank(A) ≪ m, n.
Illustration of low-rank factorization:
            A     BC^T
#entries    mn    mr + nr
◮ Generically (and in most applications), A has full rank, that is, rank(A) = min{m, n}.
◮ Aim instead at approximating A by a low-rank matrix; see the sketch below.
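To make the savings concrete, a minimal MATLAB sketch (the sizes m, n, r are illustrative) comparing a matrix-vector product via the factors with one via the full matrix:

% Compare storing/applying a rank-r factorization A = B*C' with the full A.
m = 4000; n = 3000; r = 20;
B = randn(m, r); C = randn(n, r);   % factors: (m + n)*r entries
A = B*C';                           % full matrix: m*n entries
x = randn(n, 1);
y1 = A*x;                           % costs O(m*n) operations
y2 = B*(C'*x);                      % costs O((m + n)*r) operations
norm(y1 - y2)/norm(y1)              % agreement up to roundoff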
6
What? Theoretical foundations of low-rank approximation.
When? A priori and a posteriori estimates for low-rank approximation techniques.
Why? Applications in engineering, scientific computing, data analysis, ... where low-rank approximation plays a central role.
How? State-of-the-art algorithms for performing and working with low-rank approximations.
Will cover both matrices and tensors.
7
Golub/Van Loan’2013 Golub, Gene H.; Van Loan, Charles F. Matrix Computations. Fourth edition. Johns Hopkins University Press, Baltimore, MD, 2013.
Horn/Johnson’2013 Horn, Roger A.; Johnson, Charles R. Matrix Analysis. Second edition. Cambridge University Press, Cambridge, 2013.
+ References on slides.
8
◮ SVD
◮ Relation to eigenvalues
◮ Norms
◮ Best low-rank approximation
9
Theorem (SVD). Let A ∈ R^{m×n} with m ≥ n. Then there are orthogonal matrices U = [u1, ..., um] ∈ R^{m×m} and V = [v1, ..., vn] ∈ R^{n×n} such that
A = UΣV^T, with Σ = [diag(σ1, ..., σn); 0] ∈ R^{m×n} and σ1 ≥ σ2 ≥ ··· ≥ σn ≥ 0.
◮ σ1, ..., σn are called singular values
◮ u1, ..., un are called left singular vectors
◮ v1, ..., vn are called right singular vectors
◮ Avi = σi ui, A^T ui = σi vi for i = 1, ..., n; see the check below.
◮ Singular values are always uniquely defined by A.
◮ Singular vectors are never unique. If σ1 > σ2 > ··· > σn > 0, then they are unique up to signs: ui ← ±ui, vi ← ±vi.
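These relations are easy to verify numerically; a small MATLAB check on random data (any index 1 ≤ i ≤ n works):

% Check A*v_i = sigma_i*u_i and A'*u_i = sigma_i*v_i.
A = randn(6, 4);
[U, S, V] = svd(A);
i = 2;
norm(A*V(:,i)  - S(i,i)*U(:,i))     % ~1e-15
norm(A'*U(:,i) - S(i,i)*V(:,i))     % ~1e-15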
10
Proof: Induction over n. n = 1 trivial. For general n, let v1 solve max{‖Av‖2 : ‖v‖2 = 1} =: ‖A‖2. Set σ1 := ‖A‖2 and u1 := Av1/σ1.¹ By definition, Av1 = σ1 u1. After completion to orthogonal matrices U1 = [u1, U⊥], V1 = [v1, V⊥]:
U1^T A V1 = [u1^T A v1, u1^T A V⊥; U⊥^T A v1, U⊥^T A V⊥] = [σ1, w^T; 0, A1]
with w := V⊥^T A^T u1 and A1 := U⊥^T A V⊥. Because ‖·‖2 is invariant under orthogonal transformations,
σ1 = ‖A‖2 = ‖U1^T A V1‖2 ≥ ‖[σ1, w^T; 0, A1] [σ1; w]‖2 / ‖[σ1; w]‖2 ≥ √(σ1² + ‖w‖2²).
Hence, w = 0. Proof completed by applying induction to A1.

¹If σ1 = 0, choose an arbitrary u1 with ‖u1‖2 = 1.
11
◮ r = rank(A) is the number of nonzero singular values of A.
◮ kernel(A) = span{v_{r+1}, ..., vn}
◮ range(A) = span{u1, ..., ur}
12
Computation of the SVD proceeds in two steps:
1. Using Householder reflectors applied from the left and n − 1 Householder reflectors applied from the right, compute orthogonal matrices U1, V1 such that
U1^T A V1 = B = [B1; 0], that is, B1 ∈ R^{n×n} is an upper bidiagonal matrix.
2. Compute orthogonal matrices U2, V2 such that U2^T B1 V2 is diagonal.
Set U = U1U2 and V = V1V2. Step 1 is usually the most expensive. Remarks on Step 1:
◮ If m is significantly larger than n, say m ≥ 3n/2, first computing a QR decomposition of A reduces the cost; see the check below.
◮ Most modern implementations reduce A successively via banded form to bidiagonal form.²

²Bischof, C. H.; Lang, B.; Sun, X. A framework for symmetric band reduction. ACM Trans. Math. Software 26 (2000), no. 4, 581–601.
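The first remark rests on the fact that A = QR with orthonormal Q implies that A and R have the same singular values; a quick MATLAB check:

% For tall matrices, the singular values of A and of its QR factor agree.
A = randn(3000, 200);
[~, R] = qr(A, 0);                  % economy size QR decomposition
norm(svd(A) - svd(R))               % ~1e-12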
13
In most applications, the vectors u_{n+1}, ..., um are not of interest. By discarding them, one obtains:
Theorem (Economy size SVD). Let A ∈ R^{m×n} with m ≥ n. Then there is a matrix U ∈ R^{m×n} with orthonormal columns and an orthogonal matrix V ∈ R^{n×n} such that
A = UΣV^T, with Σ = diag(σ1, ..., σn) ∈ R^{n×n} and σ1 ≥ σ2 ≥ ··· ≥ σn ≥ 0.
Computed by MATLAB's [U,S,V] = svd(A,'econ').
Complexity:
                       memory         operations
singular values only   O(mn)          O(mn²)
economy size SVD       O(mn)          O(mn²)
(full) SVD             O(m² + mn)     O(m²n + mn²)
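A small MATLAB sketch (illustrative sizes) of the three variants and their output dimensions:

% Compare outputs of full and economy size SVD.
A = randn(1000, 50);
[U , S , V ] = svd(A);          % U: 1000x1000, S: 1000x50, V: 50x50
[Ue, Se, Ve] = svd(A, 'econ');  % Ue: 1000x50,  Se: 50x50,  Ve: 50x50
s = svd(A);                     % singular values only
norm(A - Ue*Se*Ve')             % economy size SVD still reproduces A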
14
Beware of roundoff error when interpreting singular value plots. Example: semilogy(svd(hilb(100)))
[Plot: computed singular values of hilb(100) on a semilog scale, from 10^0 down to about 10^{-20}; the decay flattens into a kink at roundoff level.]
◮ The kink is caused by roundoff error and does not reflect the true behavior of the singular values.
◮ The exact singular values are known to decay exponentially.³
◮ Sometimes more accuracy is possible.⁴
³Beckermann, B. The condition number of real Vandermonde, Krylov and positive definite Hankel matrices. Numer. Math. 85 (2000), no. 4, 553–577.
⁴Drmač, Z.; Veselić, K. New fast and accurate Jacobi SVD algorithm. I. SIAM J. Matrix Anal. Appl. 29 (2007), no. 4, 1322–1342.
15
Symmetric A = A^T ∈ R^{n×n} admits a spectral decomposition A = U diag(λ1, λ2, ..., λn) U^T with an orthogonal matrix U. After reordering, we may assume |λ1| ≥ |λ2| ≥ ··· ≥ |λn|. The spectral decomposition can be turned into an SVD A = UΣV^T by defining Σ = diag(|λ1|, ..., |λn|) and V = U diag(sign(λ1), ..., sign(λn)).
Remark: This extends to the more general case of normal matrices (e.g., orthogonal or skew-symmetric) via complex spectral or real Schur decompositions.
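A minimal MATLAB sketch of this construction (random symmetric test matrix; assumes all eigenvalues are nonzero, so that V is orthogonal):

% Turn a spectral decomposition of a symmetric matrix into an SVD.
n = 5;
A = randn(n); A = A + A';                 % symmetric test matrix
[U, D] = eig(A);
[~, p] = sort(abs(diag(D)), 'descend');   % reorder by |lambda|
U = U(:, p); lambda = diag(D); lambda = lambda(p);
Sigma = diag(abs(lambda));
V = U*diag(sign(lambda));                 % assumes all lambda ~= 0
norm(A - U*Sigma*V')                      % ~1e-15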
16
Consider the SVD A = UΣV^T of A ∈ R^{m×n} with m ≥ n. We then have:
A^T A = V Σ^T Σ V^T = V diag(σ1², ..., σn²) V^T
⟹ A^T A has eigenvalues σ1², ..., σn²; the right singular vectors of A are eigenvectors of A^T A.
A A^T = U Σ Σ^T U^T = U diag(σ1², ..., σn², 0, ..., 0) U^T
⟹ A A^T has eigenvalues σ1², ..., σn² and, additionally, m − n zero eigenvalues; the first n left singular vectors of A are eigenvectors of A A^T.
The symmetric embedding
[0, A; A^T, 0] ∈ R^{(m+n)×(m+n)}
has eigenvalues ±σj, j = 1, ..., n, and zero (m − n times); see the check below.
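A quick numerical check of these relations in MATLAB (illustrative sizes m = 7, n = 4):

% Eigenvalue/singular value relations.
m = 7; n = 4;
A = randn(m, n);
s = svd(A);
norm(sort(eig(A'*A), 'descend') - s.^2)          % ~1e-13
% Symmetric embedding: eigenvalues +-sigma_j and m-n zeros.
sort(eig([zeros(m) A; A' zeros(n)]), 'descend')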
17
Given the SVD A = UΣV^T, one defines:
◮ Spectral norm: ‖A‖2 = σ1.
◮ Frobenius norm: ‖A‖F = √(σ1² + ··· + σn²).
Basic properties:
◮ ‖A‖2 = max{‖Av‖2 : ‖v‖2 = 1} (see proof of SVD).
◮ ‖·‖2 and ‖·‖F are both (submultiplicative) matrix norms.
◮ ‖·‖2 and ‖·‖F are both unitarily invariant, that is,
‖QAZ‖2 = ‖A‖2, ‖QAZ‖F = ‖A‖F for any orthogonal matrices Q, Z.
◮ ‖A‖2 ≤ ‖A‖F ≤ √r ‖A‖2 with r = rank(A).
◮ ‖AB‖F ≤ min{‖A‖2 ‖B‖F, ‖A‖F ‖B‖2}; see the checks below.
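The identities and the norm inequality can be checked directly in MATLAB:

% Norm identities and ||A||_2 <= ||A||_F <= sqrt(r)*||A||_2.
A = randn(6, 5);
s = svd(A);
abs(norm(A) - s(1))                 % spectral norm equals sigma_1
abs(norm(A, 'fro') - norm(s))       % Frobenius norm equals ||s(A)||_2
r = rank(A);
norm(A, 'fro') <= sqrt(r)*norm(A)   % logical 1 (true)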
18
Let B ∈ R^{n×n} have eigenvalues λ1, ..., λn ∈ C. Then trace(B) := b11 + ··· + bnn = λ1 + ··· + λn. In turn,
‖A‖F² = trace(A^T A) = trace(A A^T) = Σ_{i,j} a_{ij}².
Two simple consequences:
◮ ‖·‖F is the norm induced by the matrix inner product ⟨A, B⟩ := trace(AB^T), A, B ∈ R^{m×n}.
◮ Partition A = [a1, a2, ..., an] column-wise and set vec(A) = [a1; ...; an] ∈ R^{mn}. Then ⟨A, B⟩ = ⟨vec(A), vec(B)⟩ and ‖A‖F = ‖vec(A)‖2; see the check below.
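In MATLAB, vec(A) is A(:), so both consequences reduce to one-liners:

% Frobenius inner product via trace and via vec.
A = randn(4, 3); B = randn(4, 3);
abs(trace(A*B') - A(:)'*B(:))       % ~1e-15
abs(norm(A, 'fro') - norm(A(:)))    % ~0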
19
Theorem
For m ≥ n, let A, B ∈ R^{m×n} have singular values σ1(A) ≥ ··· ≥ σn(A) and σ1(B) ≥ ··· ≥ σn(B), respectively. Then
|⟨A, B⟩| ≤ σ1(A)σ1(B) + ··· + σn(A)σn(B).
Proof of von Neumann's trace inequality in lecture notes.⁵
Consequence:
‖A − B‖F² = ⟨A − B, A − B⟩ = ‖A‖F² − 2⟨A, B⟩ + ‖B‖F²
≥ ‖A‖F² − 2 Σ_{i=1}^n σi(A)σi(B) + ‖B‖F²
= Σ_{i=1}^n (σi(A) − σi(B))².
⁵This proof follows [Grigorieff, R. D. Note on von Neumann's trace inequality. Math. Nachr. 151 (1991), 327–328]. The inequality also holds for complex matrices; see Theorem 8.7.6 in [Horn/Johnson’2013].
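Both the inequality and its consequence are easy to test numerically; a quick MATLAB check with random matrices:

% Von Neumann's trace inequality and the resulting Frobenius bound.
A = randn(6, 4); B = randn(6, 4);
abs(trace(A*B')) <= sum(svd(A).*svd(B))            % logical 1 (true)
norm(A - B, 'fro')^2 >= sum((svd(A) - svd(B)).^2)  % logical 1 (true)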
20
There are other unitarily invariant matrix norms.⁶ Let s(A) = (σ1, ..., σn). The p-Schatten norm defined by ‖A‖(p) := ‖s(A)‖p is a matrix norm for any 1 ≤ p ≤ ∞.
p = ∞: spectral norm, p = 2: Frobenius norm, p = 1: nuclear norm.
Definition
The dual of a matrix norm ‖·‖ on R^{m×n} is defined by ‖A‖^D := max{⟨A, B⟩ : ‖B‖ = 1}.
Lemma
Let p, q ∈ [1, ∞] such that p⁻¹ + q⁻¹ = 1. Then ‖A‖(p)^D = ‖A‖(q).
⁶Complete characterization via symmetric gauge functions in [Horn/Johnson’2013].
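In MATLAB, the Schatten norms can be evaluated as vector norms of the singular values; a small sketch:

% Schatten norms from the vector of singular values.
A = randn(5, 4);
s = svd(A);
[norm(s, Inf), norm(A)]         % p = Inf: spectral norm
[norm(s, 2),   norm(A, 'fro')]  % p = 2:   Frobenius norm
norm(s, 1)                      % p = 1:   nuclear norm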
21
Consider k < n and let
Uk := [u1, ..., uk], Σk := diag(σ1, ..., σk), Vk := [v1, ..., vk].
Then Tk(A) := Uk Σk Vk^T has rank at most k. For any unitarily invariant norm ‖·‖:
‖Tk(A) − A‖ = ‖diag(0, ..., 0, σ_{k+1}, ..., σn)‖.
In particular,
‖A − Tk(A)‖2 = σ_{k+1},  ‖A − Tk(A)‖F = √(σ_{k+1}² + ··· + σn²);
see the check below. A and Tk(A) are nearly equal if and only if the singular values decay sufficiently quickly.
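A MATLAB check of both error formulas on a matrix with rapidly decaying singular values:

% Truncated SVD T_k(A) and its approximation errors.
A = hilb(50); k = 5;
[U, S, V] = svd(A);
Tk = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';
s = diag(S);
abs(norm(A - Tk) - s(k+1))                    % spectral norm error
abs(norm(A - Tk, 'fro') - norm(s(k+1:end)))   % Frobenius norm error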
22
Theorem (Schmidt-Mirsky). Let A ∈ R^{m×n}. Then
‖A − Tk(A)‖ = min{‖A − B‖ : B ∈ R^{m×n}, rank(B) ≤ k}
for any unitarily invariant norm ‖·‖.
Proof⁷ for ‖·‖F: Follows directly from the consequence of von Neumann's trace inequality.
Proof for ‖·‖2: For any B ∈ R^{m×n} of rank ≤ k, kernel(B) has dimension ≥ n − k. Hence, there exists w ∈ kernel(B) ∩ range(V_{k+1}) with ‖w‖2 = 1. Then
‖A − B‖2² ≥ ‖(A − B)w‖2² = ‖Aw‖2² = ‖A V_{k+1} V_{k+1}^T w‖2² = ‖U_{k+1} Σ_{k+1} V_{k+1}^T w‖2²
= Σ_{j=1}^{k+1} σj² |vj^T w|² ≥ σ_{k+1}² Σ_{j=1}^{k+1} |vj^T w|² = σ_{k+1}².
Hence ‖A − B‖2 ≥ σ_{k+1} = ‖A − Tk(A)‖2.

⁷See Section 7.4.9 in [Horn/Johnson’2013] for the general case.
23
Uniqueness:
◮ If σk > σ_{k+1}, the best rank-k approximation with respect to the Frobenius norm is unique.
◮ If σk = σ_{k+1}, the best rank-k approximation is never unique. For example, I3 has several best rank-two approximations:
diag(1, 1, 0), diag(1, 0, 1), diag(0, 1, 1).
◮ With respect to the spectral norm, the best rank-k approximation is only unique if σ_{k+1} = 0. For example, diag(2, 1, ε) with 0 < ε < 1 has infinitely many best rank-two approximations:
diag(2, 1, 0), diag(2 − ε/2, 1 − ε/2, 0), diag(2 − ε/3, 1 − ε/3, 0), ...
24
Aim at finding a matrix Q ∈ R^{m×k} with orthonormal columns such that range(Q) ≈ range(A). I − QQ^T is the orthogonal projector onto range(Q)^⊥. Aim at minimizing ‖(I − QQ^T)A‖ = ‖A − QQ^T A‖ for a unitarily invariant norm ‖·‖. Because rank(QQ^T A) ≤ k,
‖A − QQ^T A‖ ≥ ‖A − Tk(A)‖.
Setting Q = Uk one obtains
Uk Uk^T A = Uk Uk^T U Σ V^T = Uk Σk Vk^T = Tk(A).
⟹ Q = Uk is optimal.
25
Variation: max{‖Q^T A‖F : Q^T Q = Ik}. Equivalent to max{|⟨AA^T, QQ^T⟩| : Q^T Q = Ik}. By von Neumann's trace inequality and the correspondence between eigenvectors of AA^T and left singular vectors of A, the optimal Q is again given by Uk; see the check below.
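A MATLAB sketch comparing the optimal choice Q = Uk with an arbitrary orthonormal basis (sizes are illustrative):

% Q = U_k minimizes ||A - Q*Q'*A||_F over Q with k orthonormal columns.
A = randn(100, 40); k = 10;
[U, S, ~] = svd(A, 'econ');
Qopt = U(:, 1:k);
[Qrnd, ~] = qr(randn(100, k), 0);   % some other orthonormal basis
norm(A - Qopt*(Qopt'*A), 'fro')     % equals sqrt(sum_{j>k} sigma_j^2)
norm(A - Qrnd*(Qrnd'*A), 'fro')     % always at least as large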
26