

  1. Singular Value Decomposition CS3220 - Summer 2008 Jonathan Kaldor

  2. Another Factorization? • We’ve already looked at A = LU (for n x n matrices) and A = QR (for m x n matrices) • Both of them are exceedingly useful, but somewhat specialized

  3. Extending QR • We factored A = QR because we wanted an “easy” system to solve for the least squares problem (namely, upper triangular system) • Recall also that when solving n x n systems, we observed that diagonal systems were even easier to solve • Can we come up with a factorization where we only have orthogonal and diagonal matrices?

  4. The SVD • For any m x n matrix A, we can factor it into A = U Σ V^T, where: U is an m x m orthogonal matrix, V is an n x n orthogonal matrix, and Σ is an m x n diagonal matrix with Σ_i,i = σ_i ≥ 0 • The σ_i are typically ordered so that σ_i ≥ σ_{i+1} for i = 1...n

  5. Terminology • The σ_i’s are called the singular values of the matrix A • The columns u_i of U are called the left singular vectors of A • The columns v_i of V are called the right singular vectors of A

  6. Existence • Note: we have not made any qualifications about A • In particular, A doesn’t need to be full rank, and m can be less than, equal to, or greater than n (it works for matrices A of any size). Essentially, every matrix has an SVD factorization

  7. Uniqueness • The SVD is “mostly” unique. The singular values σ_i are unique, and the singular vectors u_i and v_i are unique up to choice of sign if the σ_i are distinct. If σ_i = σ_{i+1} for some i, then the SVD is not unique • Example: the identity matrix

  8. How to Compute? • Short answer: in MATLAB, with [U, S, V] = svd(A); • Longer answer: the algorithm is beyond the scope of this course • It is sufficient to know that the factorization exists, that we can compute it, and that computing it is more expensive than an LU or QR factorization.
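As a quick illustration of the factorization (a sketch, not part of the original slides; the matrix below is an arbitrary made-up example):

    A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];  % any m x n matrix works (here 4 x 3)
    [U, S, V] = svd(A);                    % U is 4 x 4, S is 4 x 3, V is 3 x 3
    norm(U'*U - eye(4))                    % ~0: U is orthogonal
    norm(V'*V - eye(3))                    % ~0: V is orthogonal
    norm(U*S*V' - A)                       % ~0: A = U*S*V'
    diag(S)'                               % singular values, in decreasing order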

  9. What does it mean? • Let A = U Σ V^T. What is A v_1 (A multiplied by the first column of V)? • V^T v_1 = e_1 = [1; 0; 0; ... 0] • Σ e_1 = σ_1 e_1 = [σ_1; 0; 0; ... 0] • U (σ_1 e_1) = σ_1 U e_1 = σ_1 u_1 • Extending this shows that A v_i = σ_i u_i for all i
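This is easy to check numerically (a sketch; the example matrix is an assumption):

    A = [3 1; 1 3; 0 2];
    [U, S, V] = svd(A);
    norm(A*V(:,1) - S(1,1)*U(:,1))   % ~0: A v_1 = sigma_1 u_1
    norm(A*V(:,2) - S(2,2)*U(:,2))   % ~0: A v_2 = sigma_2 u_2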

  10-19. What does it mean? • Take A to be a 2 x 2 matrix. Then U and V are both 2 x 2 matrices • [Figure sequence: the unit circle, with the right singular vectors v_1 and v_2 marked on it] • Multiply by V^T (rotates v_1 and v_2 onto the coordinate axes) • Multiply by Σ (scales the 1st axis by σ_1 and the 2nd by σ_2, stretching the circle into an ellipse) • Multiply by U (rotates the ellipse axes onto u_1 and u_2) • Multiplying by U Σ V^T carries out all three steps at once

  20. What does it mean? • In 2D, this says that A takes the directions v_1 and v_2, scales them by σ_1 and σ_2, and then rotates them to u_1 and u_2 • It maps the unit circle defined by v_1 and v_2 to the ellipse defined by the axes u_1 and u_2 • This geometric argument is true in general (but best not to try and imagine it for n > 3!)

  21. Rank Deficiency • Suppose A is 2 x 2, not the zero matrix, and not full rank (i.e. it is singular). This means that it maps the plane onto a proper subspace of the 2D plane (i.e. a line). This can also be seen as mapping the unit circle to a degenerate ellipse in which one axis has length 0 • In terms of our rotation and scaling operations, this is equivalent to having σ_2 = 0 (why σ_2 ?)

  22. Rank Deficiency • Thus, we can use the SVD to determine the rank of a matrix: compute the SVD and count the number of singular values > 0 (in practice, we need to count the number of singular values greater than some small epsilon to account for floating-point error)
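A minimal sketch of this rank computation (the matrix and the exact tolerance are assumptions for illustration; MATLAB's built-in rank uses a tolerance of this flavor):

    A = [1 2 3; 2 4 6; 1 1 1];         % rank 2 by construction (row 2 = 2 * row 1)
    s = svd(A);
    tol = max(size(A)) * eps(s(1));    % "small epsilon" scaled to the problem
    r = sum(s > tol)                   % numerical rank: 2
    rank(A)                            % built-in check, same answer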

  23. Rank Deficiency • If the matrix has rank r, the SVD can be partitioned into blocks: A = [U_1 U_2] [Σ_1 0; 0 0] [V_1^T; V_2^T], where Σ_1 = diag(σ_1, ..., σ_r) is r x r, U_1 is m x r, U_2 is m x (m-r), V_1 is n x r, and V_2 is n x (n-r)

  24. Range and Null Space • Suppose v_i is a column of V_2. What is A v_i ? • If v_i is in V_2, then σ_i = 0. That means that A v_i = σ_i u_i = 0 • Thus, v_i is in the null space of A • This means that V_2 is an orthonormal basis for the null space of A (or null(A))

  25. Range and Null Space • Similarly, let x be any vector. Then Ax = U_1 Σ_1 (V_1^T x), a linear combination of the columns of U_1 • This means that U_1 is an orthonormal basis for range(A)

  26. Range and Null Space • Since the columns of V_1 are orthonormal to those of V_2, and V_2 is an orthonormal basis for null(A), it follows that V_1 is an orthonormal basis for the orthogonal complement of null(A) (or null(A)^⊥) • Similarly, the columns of U_2 are orthonormal to those of U_1, so U_2 is an orthonormal basis for range(A)^⊥
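A small sketch pulling these bases out of the partitioned SVD (the example matrix is assumed; r is its numerical rank):

    A = [1 2 3; 2 4 6; 1 1 1];               % rank 2
    [U, S, V] = svd(A);
    s = diag(S);
    r = sum(s > max(size(A)) * eps(s(1)));
    U1 = U(:, 1:r);                           % orthonormal basis for range(A)
    V2 = V(:, r+1:end);                       % orthonormal basis for null(A)
    norm(A * V2)                              % ~0: columns of V2 are sent to zero
    norm(A - U1 * (U1' * A))                  % ~0: every column of A lies in span(U1)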

  27. Skinny SVD • Just like QR, there are some columns of U and V that we don’t need • Namely, any column of V_2 is zeroed out when multiplying by Σ, and any column of U_2 only ever gets multiplied by the zero entries of Σ • We can reconstruct A using only U_1, Σ(1:r, 1:r), and V_1, where U_1 is m x r and V_1 is n x r
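A sketch of the skinny reconstruction (the example matrix is assumed; MATLAB's svd(A, 'econ') gives a related economy-size factorization that keeps min(m, n) columns):

    A = [1 2 3; 2 4 6; 1 1 1];
    [U, S, V] = svd(A);
    r = rank(A);
    U1 = U(:, 1:r);  S1 = S(1:r, 1:r);  V1 = V(:, 1:r);
    norm(A - U1 * S1 * V1')    % ~0: A is recovered from the skinny factors alone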

  28. Generalizing Least Squares • We know how to solve least squares as long as A is of full rank • Suppose A is instead of rank r (σ_{r+1} = ... = σ_n = 0). Can we still find an acceptable least squares solution? • Note: we have to combine both the overdetermined and underdetermined strategies (there may not be an exact solution, and we can add any vector in the null space without changing the answer)

  29. Rank-Deficient Least Squares • ‖Ax - b‖_2^2 = ‖U Σ V^T x - b‖_2^2 = ‖ [U_1 U_2] [Σ_1 0; 0 0] [V_1^T; V_2^T] x - b ‖_2^2 • Multiplying inside the norm by the orthogonal matrix [U_1 U_2]^T leaves the 2-norm unchanged, so this equals ‖ [Σ_1 0; 0 0] [V_1^T; V_2^T] x - [U_1^T b; U_2^T b] ‖_2^2

  30. Rank-Deficient Least Squares • Let y = [y_1; y_2] = [V_1^T x; V_2^T x] and let a = U_1^T b • Then ‖Ax - b‖_2^2 = ‖ Σ_1 y_1 - a ‖_2^2 + ‖ U_2^T b ‖_2^2, and only the first term depends on x

  31. Rank-Deficient Least Squares • We can choose y_2 to be anything and still minimize the residual; let it be 0 to minimize the norm of the solution • Then our solution is simply Σ_1 y_1 = a and y_2 = 0 • Recall that y_1 = V_1^T x and a = U_1^T b. Substituting everything back, we get x = V_1 Σ_1^{-1} U_1^T b

  32. Rank-Deficient Least Squares • Note that this combines both of the observations we made while solving overdetermined and underdetermined systems: we ignore ‖U_2^T b‖ since there is no way to minimize it, and we can set part of our solution vector (V_2^T x) arbitrarily without changing the residual
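A sketch of this minimum-norm least-squares solution in MATLAB (the rank-deficient matrix and right-hand side are made-up examples):

    A = [1 2; 2 4; 3 6];                      % rank 1, so the least squares problem is rank-deficient
    b = [1; 2; 2];
    [U, S, V] = svd(A);
    s = diag(S);
    r = sum(s > max(size(A)) * eps(s(1)));
    U1 = U(:, 1:r);  S1 = S(1:r, 1:r);  V1 = V(:, 1:r);
    x = V1 * (S1 \ (U1' * b));                % x = V1 * inv(S1) * U1' * b
    norm(x - pinv(A) * b)                     % ~0: matches the pseudoinverse solution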

  33. Matrix Inverse • Let A = U Σ V^T be an n x n matrix of full rank. Then A^{-1} = V Σ^{-1} U^T • What can we observe about this? • The roles of U and V are exchanged • The singular values of the inverse are the reciprocals of the singular values of A: 1/σ_i (note that they are in reverse order because of the way we order singular values)
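A quick numerical check (a sketch, assuming an arbitrary 2 x 2 full-rank matrix):

    A = [2 1; 1 3];
    [U, S, V] = svd(A);
    norm(inv(A) - V * (S \ U'))   % ~0: inv(A) = V * inv(S) * U'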

  34. Matrix (Pseudo)Inverse • We can generalize this notion of the matrix inverse to come up with the pseudoinverse, which exists for m x n matrices of rank r: A^+ = V_1 Σ_1^{-1} U_1^T, where V_1, Σ_1, and U_1 are defined from the skinny SVD • This is in a sense the closest matrix to the inverse for matrices that don’t have an inverse

  35. Matrix (Pseudo)Inverse • Note that when A is n x n and full rank, U = U_1, V = V_1, and Σ = Σ_1, so the pseudoinverse is the inverse. • Note that the pseudoinverse is just what we came up with in the general rank-deficient least squares case: the pseudoinverse solution A^+ b solves the least squares problem Ax = b
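A sketch comparing this construction to MATLAB's built-in pinv (the example matrix is an assumption):

    A = [1 2; 2 4; 3 6];                              % rank-1 example
    [U, S, V] = svd(A);
    r = rank(A);
    Aplus = V(:, 1:r) * (S(1:r, 1:r) \ U(:, 1:r)');   % V1 * inv(S1) * U1'
    norm(Aplus - pinv(A))                             % ~0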

  36. Sidestep: Matrix Norms • When introducing least squares, we introduced the concept of vector norms, which measure size • There is a corresponding notion of norms for matrices as well, with properties similar to those of vector norms: ‖A‖ > 0 if A ≠ 0, ‖cA‖ = |c| ‖A‖, ‖A + B‖ ≤ ‖A‖ + ‖B‖

  37. Sidestep: Matrix Norms • Frobenius norm of a matrix: ‖A‖_F = sqrt(sum(sum(A_i,j^2))), the square root of the sum of the squares of all entries
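In MATLAB this is a one-liner (a sketch; the random matrix is an assumption):

    A = randn(4, 3);
    norm(A, 'fro') - sqrt(sum(sum(A.^2)))   % ~0: Frobenius norm = sqrt of sum of squared entries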

  38. Sidestep: Matrix Norms • If ‖·‖_n is some vector norm, then there is a corresponding matrix norm defined as ‖A‖_n = max over x ≠ 0 of ‖Ax‖_n / ‖x‖_n • Note: this does not necessarily give us a rule for how to compute the matrix norm

  39. Sidestep: Matrix Norms • ‖A‖_1 happens to be easy to compute directly: it’s max_i ‖A(:, i)‖_1, i.e. the maximum one-norm of the columns of A • ‖A‖_2 unfortunately is rather difficult to compute directly... at least, without our fancy new SVD. Namely, ‖A‖_2 = σ_1
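Both facts are easy to check numerically (a sketch; the random matrix is an assumption):

    A = randn(5, 3);
    norm(A, 1) - max(sum(abs(A), 1))   % ~0: 1-norm is the largest column sum of |A|
    s = svd(A);
    norm(A, 2) - s(1)                  % ~0: 2-norm is the largest singular value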

  40. Sidestep: Matrix Norms • Matrix norms are useful for measuring the maximum amount any vector x is scaled by when multiplying by A • We’ll see on Wednesday that they are also useful for measuring error propagation in linear systems

  41. Rank-k Approximation • A = U Σ V^T can be expressed as a sum of rank-one terms: A = σ_1 u_1 v_1^T + ... + σ_n u_n v_n^T • Idea: cut off the summation at some value k < n • This gives us the rank-k approximation
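A sketch of forming the rank-k approximation in MATLAB (the matrix and the choice k = 2 are assumptions for illustration):

    A = randn(6, 4);
    [U, S, V] = svd(A);
    k = 2;
    Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';   % keep only the k largest terms
    rank(Ak)                                     % k
    norm(A - Ak, 2)                              % the approximation error in the 2-norm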
