SLIDE 1

Singular Value Decomposition

CS3220 - Summer 2008 Jonathan Kaldor

SLIDE 2

Another Factorization?

  • We’ve already looked at A = LU (for n x n matrices) and A = QR (for m x n matrices)
  • Both of them are exceedingly useful, but somewhat specialized

SLIDE 3

Extending QR

  • We factored A = QR because we wanted an “easy” system to solve for the least squares problem (namely, an upper triangular system)
  • Recall also that when solving n x n systems, we observed that diagonal systems were even easier to solve
  • Can we come up with a factorization where we only have orthogonal and diagonal matrices?

SLIDE 4

The SVD

  • For any m x n matrix A, we can factor it into A = UΣVT, where:
    U: m x m orthogonal matrix
    V: n x n orthogonal matrix
    Σ: m x n diagonal matrix, with Σi,i = σi ≥ 0
  • The σi’s are typically ordered so that σi ≥ σi+1 for i = 1 ... n
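
A quick numerical check in MATLAB (a minimal sketch; the 4 x 3 matrix A is an arbitrary example):

  A = magic(4);          % 4 x 4 magic square
  A = A(:, 1:3);         % keep 3 columns, so A is 4 x 3
  [U, S, V] = svd(A);    % U is 4 x 4, S is 4 x 3, V is 3 x 3
  norm(U'*U - eye(4))    % ~0: U is orthogonal
  norm(V'*V - eye(3))    % ~0: V is orthogonal
  diag(S)'               % nonnegative, in decreasing order
  norm(U*S*V' - A)       % ~0: the factorization reconstructs A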

SLIDE 5

Terminology

  • The σi’s are called the singular values of the matrix A
  • The columns ui of U are called the left singular vectors of A
  • The columns vi of V are called the right singular vectors of A

SLIDE 6

Existence

  • Note: we have not made any qualifications about A
  • In particular, A doesn’t need to be full rank, and m can be less than, equal to, or greater than n (it works for all sizes of matrix A). Essentially, every matrix has an SVD factorization

SLIDE 7

Uniqueness

  • The SVD is “mostly” unique. The singular values σi are unique, and the singular vectors ui and vi are unique up to choice of sign if the σi’s are distinct. If σi = σi+1 for some i, then the SVD is not unique
  • Example: the identity matrix (I = QIQT for any orthogonal Q, so U = V = Q works for any choice of Q)

SLIDE 8

How to Compute?

  • Short answer: in MATLAB, with [U, S, V] = svd(A);
  • Longer answer: the algorithm is beyond the scope of this course
  • Sufficient to know that the factorization exists, we can compute it, and that computing it is more expensive than LU or QR factorization

SLIDE 9

What does it mean?

  • Let A = UΣVT. What is Av1 (A multiplied by the first column of V)?
  • VTv1 = e1 = [1; 0; 0; ... 0]
  • Σe1 = σ1e1 = [σ1; 0; 0; ... 0]
  • U(σ1e1) = σ1Ue1 = σ1u1
  • Extending this shows that Avi = σiui for all i
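
We can confirm this identity numerically (a minimal sketch; A is an arbitrary random matrix):

  A = rand(5, 3);
  [U, S, V] = svd(A);
  for i = 1:3
      norm(A*V(:,i) - S(i,i)*U(:,i))   % ~0 for each i: A*vi = sigma_i*ui
  end
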
SLIDES 10-19

What does it mean?

  • Take A to be a 2 x 2 matrix. Then U and V are both 2 x 2 matrices
  • [Animation frames: the unit circle, with the orthonormal vectors v1 and v2 marked, transformed one factor at a time]
  • Multiply by VT: rotates the v vectors to the coordinate axes
  • Multiply by Σ: scales the 1st axis by σ1 and the 2nd by σ2, turning the circle into an ellipse
  • Multiply by U: rotates the ellipse so its axes point along u1 and u2
  • Net effect of multiplying by UΣVT: the unit circle (with v1, v2) is carried onto an ellipse with axes along u1 and u2

SLIDE 20

What does it mean?

  • In 2D, this says that A takes directions v1 and v2, scales them by σ1 and σ2, and then rotates them to u1 and u2
  • It maps the unit circle defined by v1 and v2 to the ellipse defined by axes u1 and u2
  • This geometric argument is true in general (but best not to try and imagine it for n > 3!)
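
A short MATLAB sketch of this picture (the particular A is an arbitrary choice):

  A = [2 1; 1 3];
  t = linspace(0, 2*pi, 200);
  circle = [cos(t); sin(t)];       % points on the unit circle
  ellipse = A * circle;            % their images under A
  [U, S, V] = svd(A);              % V unused below
  axesUS = U * S;                  % columns are sigma_i * u_i, the ellipse axes
  plot(circle(1,:), circle(2,:), 'b', ellipse(1,:), ellipse(2,:), 'r')
  hold on
  quiver([0 0], [0 0], axesUS(1,:), axesUS(2,:), 0)   % draw the axes, unscaled
  axis equal; hold off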

SLIDE 21

Rank Deficiency

  • Suppose A is 2 x 2, not the zero matrix, and not full rank (i.e. it’s singular). This means that it maps onto a subspace of the 2D plane (i.e. a line). This can also be seen as mapping the unit circle to a degenerate ellipse where one axis has length 0
  • In terms of our rotation and scaling operations, this is equivalent to having σ2 = 0 (why σ2?)

SLIDE 22

Rank Deficiency

  • Thus, we can use the SVD to determine the rank of a matrix: compute the SVD, and count the number of singular values > 0 (in practice, count the number of singular values > some small epsilon to account for floating point issues)
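
In MATLAB this looks like the sketch below; the tolerance is the kind of size-scaled epsilon the slide alludes to (the exact choice here is an assumption, though it matches what the rank() documentation describes):

  A = [1 2; 2 4];                  % rank 1: the second row is twice the first
  s = svd(A);
  tol = max(size(A)) * eps(s(1));  % scale epsilon by matrix size and norm
  r = sum(s > tol)                 % returns 1, matching rank(A)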

SLIDE 23

Rank Deficiency

  • If the matrix is rank r, we can partition the SVD into blocks:

    A = [U1 U2] [Σ1 0; 0 0] [V1T; V2T]

    where U1 is m x r, U2 is m x (m-r), V1 is n x r, V2 is n x (n-r), and Σ1 is the r x r diagonal matrix with entries σ1, ..., σr (all > 0)

SLIDE 24

Range and Null Space

  • Suppose vi is a column of V2. What is Avi?
  • If vi is in V2, then σi = 0. That means that Avi = σiui = 0
  • Thus, vi is in the null space of A
  • This means that V2 is an orthonormal basis for the null space of A (or null(A))

SLIDE 25

Range and Null Space

  • Similarly, let x be any vector. Then Ax is a linear combination of the columns of U1 (Ax = UΣ(VTx), and only the first r entries of Σ(VTx) can be nonzero)
  • This means that U1 is an orthonormal basis for range(A)
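
A quick MATLAB illustration (a minimal sketch; the rank-1 matrix A is an arbitrary example):

  A = [1 2 3; 2 4 6];        % 2 x 3, rank 1
  [U, S, V] = svd(A);
  r = rank(A);               % = 1
  U1 = U(:, 1:r);            % orthonormal basis for range(A)
  V2 = V(:, r+1:end);        % orthonormal basis for null(A)
  norm(A * V2)               % ~0: V2 really lies in the null space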

SLIDE 26

Range and Null Space

  • Since the columns of V1 are orthogonal to those of V2, and V2 is an orthonormal basis for null(A), it follows that V1 is an orthonormal basis for the orthogonal complement of null(A) (or null(A)⊥)
  • Similarly, the columns of U2 are orthogonal to those of U1, so U2 is an orthonormal basis for range(A)⊥

SLIDE 27

Skinny SVD

  • Just like QR, there are some columns of U and V that we don’t need
  • Namely, the columns of V2 are matched with zero singular values, and the columns of U2 multiply the zero rows of Σ, so neither contributes to the product
  • We can reconstruct A using only U1, Σ(1:r, 1:r), and V1, where U1 is m x r and V1 is n x r: A = U1Σ1V1T
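
MATLAB computes the economy-size factorization directly via svd(A, 'econ'); below is a minimal sketch that also trims down to the rank r by hand:

  A = rand(6, 3);
  [U, S, V] = svd(A, 'econ');               % U is 6 x 3, S and V are 3 x 3
  norm(U*S*V' - A)                          % ~0: the skinny factors suffice
  r = rank(A);
  Ar = U(:,1:r) * S(1:r,1:r) * V(:,1:r)';   % U1 * Sigma1 * V1'
  norm(Ar - A)                              % ~0 when r = rank(A)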

SLIDE 28

Generalizing Least Squares

  • We know how to solve least squares as long as A is of full rank
  • Suppose A is instead of rank r (σr+1 = ... = σn = 0). Can we still find an acceptable least squares solution?
  • Note: we have to combine both the overdetermined and underdetermined strategies (there may not be an exact solution, and we can add any vector in the null space without changing Ax)

SLIDE 29

Rank-Deficient Least Squares

  • ‖Ax - b‖2² = ‖UΣVTx - b‖2²
  • = ‖ΣVTx - UTb‖2² (multiplying by UT does not change the 2-norm)
  • = ‖[Σ1 0; 0 0][V1T; V2T]x - [U1T; U2T]b‖2²
  • = ‖Σ1V1Tx - U1Tb‖2² + ‖U2Tb‖2²

SLIDE 30

Rank-Deficient Least Squares

  • Let y = VTx = [y1; y2], where y1 = V1Tx and y2 = V2Tx, and let a = U1Tb
  • The problem then reduces to making [Σ1 0; 0 0][y1; y2] as close as possible to [a; U2Tb], i.e. to solving Σ1y1 = a

SLIDE 31

Rank-Deficient Least Squares

  • We can choose y2 to be anything and still satisfy the equation; let it be 0 to minimize the norm of the solution
  • Then our solution is simply Σ1y1 = a and y2 = 0
  • Recall that y1 = V1Tx and a = U1Tb. Substituting everything, we get x = V1Σ1⁻¹U1Tb
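
Putting the formula to work (a minimal sketch; A and b are arbitrary, with A deliberately rank deficient):

  A = [1 2; 2 4; 3 6];            % 3 x 2, rank 1
  b = [1; 1; 2];
  [U, S, V] = svd(A);
  r = rank(A);
  x = V(:,1:r) * (S(1:r,1:r) \ (U(:,1:r)' * b))   % x = V1 * inv(Sigma1) * U1' * b
  norm(x - pinv(A)*b)             % ~0: agrees with the pseudoinverse solution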

SLIDE 32

Rank-Deficient Least Squares

  • Note that this combines both of the observations we made while solving overdetermined and underdetermined systems: we ignore ‖U2Tb‖ since there is no way to minimize it, and we can set part of our solution vector (V2Tx) arbitrarily without changing the residual

SLIDE 33

Matrix Inverse

  • Let A = UΣVT be an n x n matrix of full rank. Then A⁻¹ = VΣ⁻¹UT
  • What can we observe about this?
  • The roles of U and V are exchanged
  • The singular values of the inverse are the reciprocals of the singular values of A: 1/σi (note that they end up in reverse order because of the way we order singular values)

SLIDE 34

Matrix (Pseudo)Inverse

  • We can generalize this notion of the matrix inverse to come up with the pseudoinverse, which exists for any m x n matrix of rank r: A⁺ = V1Σ1⁻¹U1T, where V1, Σ1, and U1 are defined from the skinny SVD
  • This is, in a sense, the closest thing to an inverse for matrices that don’t have one
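
Building A⁺ from the skinny SVD and comparing against MATLAB's pinv (a minimal sketch):

  A = [1 2 3; 2 4 6];                            % 2 x 3, rank 1
  [U, S, V] = svd(A);
  r = rank(A);
  Aplus = V(:,1:r) * (S(1:r,1:r) \ U(:,1:r)');   % V1 * inv(Sigma1) * U1'
  norm(Aplus - pinv(A))                          % ~0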

SLIDE 35

Matrix (Pseudo)Inverse

  • Note that when A is n x n and full rank, U = U1, V = V1, and Σ = Σ1, so the pseudoinverse is the inverse
  • Note also that the pseudoinverse is just what we came up with in the general rank-deficient least squares case: x = A⁺b solves the least squares problem Ax = b

SLIDE 36

Sidestep: Matrix Norms

  • When introducing least squares, we introduced the concept of vector norms, which measure size
  • There is a corresponding notion of norms for matrices as well, with properties similar to those of vector norms:
    ‖A‖ > 0 if A ≠ 0
    ‖cA‖ = |c| ‖A‖
    ‖A + B‖ ≤ ‖A‖ + ‖B‖

SLIDE 37

Sidestep: Matrix Norms

  • Frobenius norm of a matrix: ‖A‖F = sqrt(∑i,j Ai,j²), the square root of the sum of the squares of all entries (in MATLAB: sqrt(sum(sum(A.^2))))

SLIDE 38

Sidestep: Matrix Norms

  • If ‖·‖n is some vector norm, then there is a corresponding matrix norm defined as ‖A‖n = max over x ≠ 0 of ‖Ax‖n / ‖x‖n
  • Note: this does not necessarily give us a rule for how to compute the matrix norm

SLIDE 39

Sidestep: Matrix Norms

  • ‖A‖1 happens to be easy to compute directly: it’s max over i of ‖A(:,i)‖1, i.e. the maximum one-norm of the columns of A
  • ‖A‖2 unfortunately is rather difficult to compute directly... at least, without our fancy new SVD. Namely, ‖A‖2 = σ1
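
Both facts are easy to check numerically (a minimal sketch; A is arbitrary):

  A = rand(4, 3);
  s = svd(A);
  norm(A, 2) - s(1)               % ~0: the 2-norm is the largest singular value
  norm(A, 1) - max(sum(abs(A)))   % ~0: the 1-norm is the max column 1-norm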

SLIDE 40

Sidestep: Matrix Norms

  • Matrix norms are useful for measuring the maximum amount any vector x is scaled by when multiplying by A
  • We’ll see on Wednesday that they are also useful for measuring error propagation in linear systems

SLIDE 41

Rank-k Approximation

  • A = UΣVT can be expressed as a sum of rank-one matrices: A = ∑ σiuiviT for i = 1 ... n
  • Idea: cut off the summation at some value k < n
  • This gives us the rank-k approximation Ak

SLIDE 42

Rank-k Approximation

  • The rank-k approximation Ak is the closest rank-k matrix to the original matrix A when measured using the Frobenius matrix norm
  • That is, ‖A - Ak‖F is minimal over all rank-k matrices
  • This is the same as setting the singular values σk+1, ..., σn to 0
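
Truncating the SVD in MATLAB (a minimal sketch; the last line checks the known identity ‖A - Ak‖F = sqrt(σk+1² + ... + σn²)):

  A = rand(8, 6);
  [U, S, V] = svd(A);
  k = 2;
  Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';          % best rank-2 approximation
  s = diag(S);
  norm(A - Ak, 'fro') - sqrt(sum(s(k+1:end).^2))   % ~0
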
SLIDE 43

Principal Component Analysis (PCA)

  • A powerful tool for data analysis
  • General idea: we have some data in a high-dimensional space, but it actually comes from some low-dimensional model, maybe corrupted by some noise

SLIDE 44

Principal Component Analysis

  • Example: suppose we have a zip line (a rope between two trees) that we can slide down. We set up a bunch of cameras to record our motion as we slide down. Each camera records an (x, y) position for us at each point in time - our position in camera space. With n cameras, we have 2n measurements per instant. However, our motion is primarily in one direction (plus some error)

SLIDES 45-46

Principal Component Analysis

  • [Figures]

SLIDE 47

Variance and Covariance

  • Variance of a set of m measurements: how much the data spreads from its mean: var(X) = ∑i (Xi - X̄)² / m, where X̄ is the mean of the data
  • Covariance: how much a pair of measurements change together (dependence between the two variables): cov(X, Y) = ∑i (Xi - X̄)(Yi - Ȳ) / m

SLIDE 48

Variance and Covariance

  • Idea: choose a basis that maximizes the variance and minimizes the covariance of the measurements in that basis
  • Maximizing variance: finding the important dimensions
  • Minimizing covariance: reducing dependence between pairs of measurements

SLIDE 49

Variance and Covariance

  • If we have experimental data for measurement X in an m x 1 vector x (adjusted to have mean 0), then the variance is simply xTx/m
  • If we have experimental data for two mean-0 measurements X and Y in m x 1 vectors x and y, then the covariance is simply xTy/m

SLIDE 50

Variance and Covariance

  • Let X be our m x n matrix of experimental data (m trials, n measurements per trial, adjusted so the mean of each measurement is 0)
  • The covariance matrix can then be expressed as XTX/m (the 1/m factor doesn’t affect the basis we find, so we often drop it)
  • We want to find an orthonormal basis that diagonalizes the covariance matrix: this will minimize covariance and maximize variance (in that basis)

SLIDE 51

PCA

  • We can use the SVD to find a good basis: if X = UΣVT, then XTX = VΣTΣVT, which is diagonalized by V with diagonal entries σi²
  • The singular values σi give us the relative importances of each dimension
  • The columns of V give the orthonormal basis we’re interested in
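
An end-to-end PCA sketch on synthetic data (everything here, including the hidden direction [3 1] and the noise level, is made up for illustration):

  m = 200;
  t = randn(m, 1);                       % motion along one hidden direction
  X = t * [3 1] + 0.1 * randn(m, 2);     % two measurements, mostly 1-D plus noise
  X = X - repmat(mean(X), m, 1);         % center each measurement
  [U, S, V] = svd(X, 'econ');
  diag(S).^2 / m                         % variance captured by each direction
  V(:, 1)                                % principal direction, ~ ±[3; 1]/sqrt(10)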