
NORMS, INNER PRODUCTS, AND ORTHOGONALITY

Vector norms generalize the familiar concept of length from $\mathbb{R}^2$ and $\mathbb{R}^3$ to higher dimensions. We can't really visualize in higher dimensions, so we must trust the algebra. Generalize the concept of …


  1. Advantage of Orthonormal Bases
  • Recall that writing a vector in a basis $B$ in general requires solving a linear system
  • When $B = \{u_1, u_2, \ldots, u_n\}$ is orthonormal, the coordinates are immediately available
  • Simply write $x = \xi_1 u_1 + \xi_2 u_2 + \cdots + \xi_n u_n$ and notice that
    $$\langle x \mid u_i \rangle = \Big\langle \sum_{j=1}^n \xi_j u_j \;\Big|\; u_i \Big\rangle = \sum_{j=1}^n \xi_j \langle u_j \mid u_i \rangle = \xi_i \langle u_i \mid u_i \rangle = \xi_i$$

  2. Fourier Expansion
  • If $B = \{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for an inner product space $V$, then each $x$ in $V$ can be expressed as
    $$x = \langle x \mid u_1 \rangle u_1 + \langle x \mid u_2 \rangle u_2 + \cdots + \langle x \mid u_n \rangle u_n$$
  • This is the Fourier expansion of $x$
  • The scalars $\langle x \mid u_i \rangle$ are the coordinates of $x$ with respect to $B$, its Fourier coefficients
  • Each $\langle x \mid u_i \rangle u_i$ represents the projection of $x$ onto the line spanned by $u_i$
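
A quick numerical check of the two slides above (not part of the original deck; the orthonormal basis and test vector below are made-up example data): with the basis as columns of $Q$, the Fourier coefficients are just inner products.

```python
# Fourier expansion with an orthonormal basis: coordinates come from inner products.
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # columns form an orthonormal basis of R^4
x = rng.standard_normal(4)

xi = Q.T @ x                       # Fourier coefficients  xi_i = <x | u_i>
x_rebuilt = Q @ xi                 # x = sum_i <x | u_i> u_i
print(np.allclose(x, x_rebuilt))   # True: no linear system had to be solved
```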

  3. Idea of Gram-Schmidt Procedure
  • Orthonormal bases are often desirable
  • Given a general basis $B = \{x_1, x_2, \ldots, x_n\}$ for a vector space $V$, can we construct an orthonormal basis spanning $V$?
  • Yes: use the Gram-Schmidt procedure
  • At each step $k$ we complete an orthonormal basis $S_k$ for $\mathrm{span}\{x_1, x_2, \ldots, x_k\}$
  • I.e., proceed incrementally

  4. Gram-Schmidt Procedure Steps
  • The first step is easy: $u_1 = \dfrac{x_1}{\|x_1\|}$, so $S_1 = \{u_1\}$
  • Now assume we have $S_k$ and want to find $u_{k+1}$ such that
    • $u_{k+1} \perp u_i$ for $i < k+1$
    • $\mathrm{span}\{x_1, x_2, \ldots, x_k, x_{k+1}\} = \mathrm{span}\big(S_k \cup \{u_{k+1}\}\big)$
    • $\|u_{k+1}\| = 1$
  • Use the Fourier expansion of $x_{k+1}$ to solve for $u_{k+1}$ (see the sketch below):
    $$x_{k+1} = \alpha_{k+1} u_{k+1} + \sum_{i=1}^k \langle u_i \mid x_{k+1} \rangle u_i
      \;\Rightarrow\;
      \alpha_{k+1} u_{k+1} = x_{k+1} - \sum_{i=1}^k \langle u_i \mid x_{k+1} \rangle u_i,
      \qquad
      u_{k+1} = \frac{\alpha_{k+1} u_{k+1}}{\|\alpha_{k+1} u_{k+1}\|}$$
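
A minimal sketch of the procedure described above (the function name `gram_schmidt` and the example basis are ours, not the slides'):

```python
# Classical Gram-Schmidt: subtract the Fourier expansion in the vectors found so far,
# then normalize what remains.
import numpy as np

def gram_schmidt(X):
    """Columns of X are a basis; return U with orthonormal columns and the same span."""
    n = X.shape[1]
    U = np.zeros_like(X, dtype=float)
    for k in range(n):
        v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])   # remove components along u_1..u_{k-1}
        U[:, k] = v / np.linalg.norm(v)
    return U

X = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
U = gram_schmidt(X)
print(np.allclose(U.T @ U, np.eye(3)))   # True: orthonormal columns
```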

  5. Fourier Expansion in Matrix Form
  • Let $B = \{u_1, u_2, \ldots, u_n\}$ be an orthonormal basis for $V$, and put all $u_i$ as columns in $U$
  • Now, using the standard inner product,
    $$x = \sum_{j=1}^n u_j (u_j^* x)
        = U \begin{pmatrix} u_1^* x \\ u_2^* x \\ \vdots \\ u_n^* x \end{pmatrix}
        = U \begin{pmatrix} [U^*]_{1*}\, x \\ [U^*]_{2*}\, x \\ \vdots \\ [U^*]_{n*}\, x \end{pmatrix}
        = U U^* x$$
  • So $U^* = U^{-1}$?
    $$[U^* U]_{ij} = u_i^* u_j = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$$

  6. Gram-Schmidt in Matrix Form
  • The first step is $u_1 = \dfrac{x_1}{\|x_1\|}$
  • To produce $u_{k+1}$, compute
    $$u_{k+1} = \frac{(I - U_k U_k^*)\, x_{k+1}}{\|(I - U_k U_k^*)\, x_{k+1}\|}$$
    where $U_k$ is the matrix with orthonormal columns
    $$U_k = \begin{pmatrix} u_1 & u_2 & \cdots & u_k \end{pmatrix}$$

  7. Unitary and Orthogonal Matrices
  • A unitary matrix is a complex matrix $U_{n \times n}$ whose columns form an orthonormal basis for $\mathbb{C}^n$
  • An orthogonal matrix is a real matrix $P_{n \times n}$ whose columns form an orthonormal basis for $\mathbb{R}^n$
  • The following characterizations are equivalent:
    • $P$ (or $U$) has orthonormal columns
    • $P$ (or $U$) has orthonormal rows
    • $P^{-1} = P^T$ (or $U^{-1} = U^*$)
    • $\|Px\|_2 = \|x\|_2$ for all $x$ in $\mathbb{R}^{n \times 1}$ (or $\|Ux\|_2 = \|x\|_2$ for all $x$ in $\mathbb{C}^{n \times 1}$)

  8. The QR Factorization
  • The Gram-Schmidt procedure can be seen as a matrix factorization
  • Put each $x_k$ of the basis $B = \{x_1, x_2, \ldots, x_n\}$ as a column of matrix $A$
  • Put each $u_k$ in $S_n$ as a column of an orthonormal or unitary matrix $Q$
  • Pick column $i$ of matrix $R$ to hold the coefficients of the linear combination of the $u_k$ that results in $x_i$
  • This results in $R$ being upper-triangular!

  9. The QR factorization
  • Every matrix $A_{m \times n}$ with linearly independent columns can be uniquely factored as $A = QR$, where the columns of $Q_{m \times n}$ form an orthonormal basis for $R(A)$ and $R_{n \times n}$ is an upper-triangular matrix with positive diagonal entries
  • From Gram-Schmidt,
    $$a_k = \alpha_k q_k + \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i,
      \qquad
      \alpha_k = \Big\| a_k - \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i \Big\|$$
    so that
    $$R = \begin{pmatrix}
      \alpha_1 & q_1^* a_2 & q_1^* a_3 & \cdots & q_1^* a_n \\
      0 & \alpha_2 & q_2^* a_3 & \cdots & q_2^* a_n \\
      0 & 0 & \alpha_3 & \cdots & q_3^* a_n \\
      \vdots & \vdots & \vdots & \ddots & \vdots \\
      0 & 0 & 0 & \cdots & \alpha_n
      \end{pmatrix}$$
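
A small sketch, using numpy's QR as the reference, of the structure of $R$ claimed above; the example matrix and the sign-flipping step (to obtain the unique positive-diagonal factorization) are our additions.

```python
# Check that R[i,k] = q_i^T a_k above the diagonal once the diagonal is made positive.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))           # linearly independent columns (almost surely)
Q, R = np.linalg.qr(A)                    # thin QR: Q is 5x3, R is 3x3 upper triangular

s = np.sign(np.diag(R))                   # numpy may return negative diagonal entries;
Q, R = Q * s, (R.T * s).T                 # flipping signs gives the positive-diagonal QR

print(np.allclose(A, Q @ R))              # True
print(np.allclose(R, np.triu(Q.T @ A)))   # entries of R are the inner products q_i^T a_k
```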

  10. Least-squares application of QR
  • Start from the normal equations $A^T A x = A^T b$
  • Decomposing $A^T A$ into $LU$ is both inefficient and leads to information loss
  • From the QR factorization, we have
    $$(QR)^T QR\,x = R^T (Q^T Q) R\,x = R^T R\,x = R^T Q^T b$$
  • If $R^T$ is non-singular (what if it isn't?),
    $$Rx = Q^T b \quad\Rightarrow\quad x = R^{-1} Q^T b$$
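
A hedged sketch of this least-squares recipe (the data is made up; `np.linalg.solve` stands in for a dedicated triangular solver just to keep the example numpy-only, even though a back-substitution routine would exploit the structure of $R$):

```python
# Least squares via QR: solve Rx = Q^T b instead of forming the normal equations.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

Q, R = np.linalg.qr(A)                     # thin QR of A
x = np.linalg.solve(R, Q.T @ b)            # Rx = Q^T b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches numpy's lstsq
```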

  11. General orthogonal projectors
  • Given an $m \times n$ matrix $U$ with orthonormal columns, find the orthogonal projection $z$ in $R(U)$ of a vector $b$ not necessarily in $R(U)$
  • Require $z = Ux$ with $(z - b) \perp Uy$ for all $y$:
    $$(Uy)^*(Ux - b) = 0 \;\Rightarrow\; y^* U^* (Ux - b) = 0 \;\Rightarrow\; y^* x = y^* U^* b$$
  • Choose $y = e_i$ to prove that $x = U^* b$, hence $z = U U^* b$
  • This is just like the Fourier expansion, for an incomplete basis

  12. Elementary orthogonal projectors
  • For a vector $u \in \mathbb{C}^{n \times 1}$, $u \ne 0$, the matrix
    $$Q = I - \frac{u u^*}{u^* u} \qquad \text{(not a unitary matrix!)}$$
    is called an elementary orthogonal projector
  • Note that $Q^2 = Q$ and $Q^* = Q$
  • $x = Qx + (I - Q)x$ with $Qx \perp (I - Q)x$
  • $(I - Q)x = \dfrac{u (u^* x)}{u^* u}$ is the orthogonal projection of $x$ onto $\mathrm{span}\{u\}$
  • $Qx$ is the orthogonal projection of $x$ onto the orthogonal complement $u^\perp$

  13. Elementary reflectors (Householder transformation)
  • For a vector $u \in \mathbb{C}^{n \times 1}$, $u \ne 0$, the matrix
    $$R = I - 2\,\frac{u u^*}{u^* u} \qquad \text{(a unitary matrix!)}$$
    is the elementary reflector about $u^\perp$
  • Note that $R^2 = I$, $R^{-1} = R$, and $R^* = R$
  • $QR = RQ = Q$
  • $R = 2Q - I$
  • $\|x - Qx\| = \|Rx - Qx\| = \left\| \dfrac{u u^* x}{u^* u} \right\|$

  14. Geometry of reflectors and projectors
  • [Figure: with $Q = I - \frac{u u^*}{u^* u}$ and $R = I - 2\frac{u u^*}{u^* u}$, the sketch shows $x$, its projections $Qx$ (onto $u^\perp$) and $(I - Q)x$ (onto $u$), and its reflection $Rx$, with $\|x - Qx\| = \|Rx - Qx\|$]

  15. Reflector into given vector
  • Given $v$ and a unit vector $e$ ($\|e\| = 1$), we can find $u$ such that the reflector sends $v$ onto the line spanned by $e$:
    $$u = v \pm \mu e, \qquad R = I - 2\,\frac{u u^*}{u^* u} \;\Rightarrow\; Rv = \mp \mu e,
      \qquad \mu = \|v\|\,\frac{e^* v}{|e^* v|}$$
  • Proof:
    $$Rv = v - 2\,\frac{u (u^* v)}{u^* u} = v - \frac{2 u^* v}{u^* u}\, u = \mp \mu e
      \;\Leftrightarrow\; 2 u^* v = u^* u$$
    and indeed
    $$2 u^* v - u^* u = v^* v - |\mu|^2 \pm \bar{\mu}\, e^* v \mp \mu\, v^* e = 0
      \quad\text{when}\quad \mu = \|v\|\, \frac{e^* v}{|e^* v|}$$
  • [Figure: $v$, the unit vector $e$, and the two choices $u = v + \|v\| e$ and $u' = v - \|v\| e$]
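
A sketch of this construction for the real case with $e = e_1$, so $\mu = \pm\|v\|$; the helper name and the sign choice that avoids cancellation are ours, not the slides'.

```python
# Householder reflector sending v onto the e_1 axis: Rv = -sign(v_1) * ||v|| * e_1.
import numpy as np

def reflector_to_e1(v):
    mu = np.linalg.norm(v)
    u = v.copy()
    u[0] += np.sign(v[0]) * mu if v[0] != 0 else mu    # u = v + sign(v_1)*mu*e_1
    return np.eye(len(v)) - 2.0 * np.outer(u, u) / (u @ u)

v = np.array([3.0, 4.0, 0.0])
R = reflector_to_e1(v)
print(np.round(R @ v, 10))               # [-5, 0, 0]
print(np.allclose(R @ R, np.eye(3)))     # R^2 = I: an involution (and orthogonal)
```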

  16. Plane rotations (Givens rotations)
  • An orthogonal matrix $P_{ij}(c, s)$ that agrees with the identity except in four entries,
    $$[P_{ij}]_{ii} = [P_{ij}]_{jj} = c, \qquad [P_{ij}]_{ij} = s, \qquad [P_{ij}]_{ji} = -s,
      \qquad c^2 + s^2 = 1$$
  • Performs a rotation in the $(i, j)$ plane of $\mathbb{R}^n$

  17. Pivoting with plane rotations
  • Kill the element below the pivot, or "exchange" rows:
    $$P_{ij}(c, s)\, x = \begin{pmatrix} x_1 \\ \vdots \\ c x_i + s x_j \\ \vdots \\ -s x_i + c x_j \\ \vdots \\ x_n \end{pmatrix},
      \qquad
      P_{ij}(0, 1)\, x = \begin{pmatrix} x_1 \\ \vdots \\ x_j \\ \vdots \\ -x_i \\ \vdots \\ x_n \end{pmatrix}$$
  • Choosing
    $$c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \qquad s = \frac{x_j}{\sqrt{x_i^2 + x_j^2}},
      \qquad x_i^2 + x_j^2 \ne 0$$
    puts $\sqrt{x_i^2 + x_j^2}$ in position $i$ and $0$ in position $j$ (see the sketch below)
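
A sketch of a plane rotation used to annihilate $x_j$ against the pivot $x_i$ (the helper name and example vector are ours; it assumes $x_i^2 + x_j^2 \ne 0$):

```python
# Givens rotation P_ij(c, s) with c = x_i / r, s = x_j / r, r = sqrt(x_i^2 + x_j^2).
import numpy as np

def givens(x, i, j):
    r = np.hypot(x[i], x[j])
    c, s = x[i] / r, x[j] / r
    P = np.eye(len(x))
    P[i, i], P[i, j], P[j, i], P[j, j] = c, s, -s, c
    return P

x = np.array([3.0, 0.0, 4.0])
P = givens(x, 0, 2)
print(np.round(P @ x, 10))               # [5, 0, 0]: entry j is annihilated
print(np.allclose(P.T @ P, np.eye(3)))   # P is orthogonal
```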

  18. Combined rotations into given vector
  • Rotations take any vector $x$ into $\|x\|\, e_1$:
    $$P_{12}\,x = \begin{pmatrix} \sqrt{x_1^2 + x_2^2} \\ 0 \\ x_3 \\ x_4 \\ \vdots \\ x_n \end{pmatrix},
      \quad
      P_{13} P_{12}\,x = \begin{pmatrix} \sqrt{x_1^2 + x_2^2 + x_3^2} \\ 0 \\ 0 \\ x_4 \\ \vdots \\ x_n \end{pmatrix},
      \quad \ldots, \quad
      P_{1n} \cdots P_{13} P_{12}\,x = \begin{pmatrix} \|x\| \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
  • Use this as a building block for rotations into an arbitrary vector $\|x\|\, e$, with $\|e\| = 1$
  • The result of the combined rotations is not itself a rotation, but it is an orthogonal transformation

  19. Orthogonal reduction
  • Matrix $A$ can be reduced to echelon form by elementary row operations
    • Eliminate all entries below the pivot
  • Matrix $A$ can also be reduced using elementary reflectors or plane rotations
    • Using elementary reflectors leads to Householder reduction
    • Using plane rotations leads to Givens reduction

  20. Orthogonal reduction
  • Start by finding $R_1$ such that $R_1 [A]_{*1} = \mu_1 e_1$:
    $$R_1 A = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & A_2 \end{pmatrix}$$
  • Then find $R_2$ such that $R_2 [A_2]_{*1} = \mu_2 e_1$, i.e.
    $$R_2 A_2 = \begin{pmatrix} \mu_2 & t_2^T \\ 0 & A_3 \end{pmatrix},
      \qquad
      \begin{pmatrix} 1 & 0 \\ 0 & R_2 \end{pmatrix} R_1 A
      = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & R_2 A_2 \end{pmatrix}$$
  • And so on… (a sketch follows below)
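
A sketch of this reduction loop, accumulating elementary reflectors (the code layout and the helper name are ours; it reuses the reflector idea from the earlier slides):

```python
# Householder reduction: apply reflectors R_1, R_2, ... to push A to trapezoidal form.
import numpy as np

def householder_reduce(A):
    A = A.astype(float).copy()
    m, n = A.shape
    P = np.eye(m)                                  # accumulates the product of reflectors
    for k in range(min(m, n)):
        v = A[k:, k].copy()
        if np.linalg.norm(v) == 0:
            continue
        u = v.copy()
        u[0] += np.sign(v[0]) * np.linalg.norm(v) if v[0] != 0 else np.linalg.norm(v)
        Rk = np.eye(m)
        Rk[k:, k:] -= 2.0 * np.outer(u, u) / (u @ u)   # reflector acting on rows k..m-1
        A, P = Rk @ A, Rk @ P
    return P, A                                    # P is orthogonal and P @ A_original = A

A = np.random.default_rng(3).standard_normal((4, 3))
P, T = householder_reduce(A)
print(np.allclose(np.tril(T, -1), 0))              # entries below the diagonal are ~0
```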

  21. Reduces to trapezoidal form
  • For $m > n$ (tall matrix):
    $$PA = \begin{pmatrix}
      * & * & \cdots & * \\
      0 & * & \cdots & * \\
      \vdots & & \ddots & \vdots \\
      0 & 0 & \cdots & * \\
      0 & 0 & \cdots & 0 \\
      \vdots & \vdots & & \vdots \\
      0 & 0 & \cdots & 0
      \end{pmatrix}$$
  • For $m < n$ (wide matrix):
    $$PA = \begin{pmatrix}
      * & * & \cdots & * & * & \cdots & * \\
      0 & * & \cdots & * & * & \cdots & * \\
      \vdots & & \ddots & \vdots & \vdots & & \vdots \\
      0 & 0 & \cdots & * & * & \cdots & *
      \end{pmatrix}$$

  22. Unicity of QR factorization
  • When the matrix is square and non-singular, Gram-Schmidt, Householder, and Givens all produce $A = QR$ with $R$ triangular
  • Gram-Schmidt yields a positive diagonal in $R$; Householder and Givens don't necessarily, but can be made to
  • With a positive diagonal, the factorization is unique

  23. We now have two factorizations
  • The LU factorization and the QR factorization
  • LU gives the roadmap to Gaussian elimination applied to a square, non-singular matrix
  • QR gives the roadmap to Gram-Schmidt applied to a matrix with independent columns
  • Both LU and QR can be used to solve linear systems efficiently:
    $$QRx = b \;\Rightarrow\; Rx = Q^T b$$
  • LU is about twice as efficient
  • QR is useful for least squares as well, and is more stable

  24. Practical considerations
  • Gram-Schmidt as presented is numerically unstable in floating-point
    • For example, when applied to a set of vectors that is not even close to being orthogonal
  • Even modified Gram-Schmidt is stable only for least squares
  • Householder/Givens are stable and faster
  • When dealing with sparse problems:
    • Gram-Schmidt and Householder can transform them into dense problems
    • Not enough memory? Maybe Givens can help

  25. COMPLEMENTARY SUBSPACES

  26. Complementary Subspaces
  • Subspaces $X$, $Y$ of $V$ are said to be complementary whenever $V = X + Y$ and $X \cap Y = \{0\}$, in which case $V$ is said to be the direct sum of $X$ and $Y$, denoted by $V = X \oplus Y$
  • Given bases $B_X$ and $B_Y$ for $X$ and $Y$, the following statements are equivalent:
    • $V = X \oplus Y$
    • For each $v$ in $V$, there are unique vectors $x$ in $X$ and $y$ in $Y$ such that $v = x + y$
    • $B_X \cup B_Y$ is a basis for $V$ and $B_X \cap B_Y = \emptyset$

  27. Projection
  • Suppose that $V = X \oplus Y$, so that for each $v$ in $V$ there are unique vectors $x$ in $X$ and $y$ in $Y$ such that $v = x + y$
  • The vector $x$ is the projection of $v$ onto $X$ along $Y$
  • The vector $y$ is the projection of $v$ onto $Y$ along $X$
  • When $X$ and $Y$ are not orthogonal subspaces, this is an "oblique projection", as opposed to the orthogonal projection

  28. Visually
  • [Figure: the decomposition $v = x + y$, with $x$ on the subspace $X$ and $y$ along the subspace $Y$, drawn from the origin]

  29. Computing Oblique Projections
  • Assume we have bases $B_X$ and $B_Y$ for complementary subspaces $X$ and $Y$ of $\mathbb{R}^n$
  • This means that $B = B_X \cup B_Y = \{x_1, x_2, \ldots, x_r, y_1, y_2, \ldots, y_{n-r}\}$ is a basis for $\mathbb{R}^n$
  • It is easier to find the coordinate matrix for the projection $P$ onto $X$ along $Y$ in basis $B$:
    $$[P]_B = \begin{pmatrix} [P(x_1)]_B & \cdots & [P(x_r)]_B & [P(y_1)]_B & \cdots & [P(y_{n-r})]_B \end{pmatrix}
            = \begin{pmatrix} e_1 & \cdots & e_r & 0 & \cdots & 0 \end{pmatrix}
            = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$$
  • Placing the elements of $B$ (the $x_i$ and $y_i$) as columns in a matrix $B$ gives (see the sketch below)
    $$P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$$
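
A tiny numerical sketch of this construction (the example bases for $X$ and $Y$ are ours):

```python
# Oblique projector onto X along Y built from the basis matrix B = [B_X | B_Y].
import numpy as np

BX = np.array([[1.0], [1.0]])            # basis of X (r = 1)
BY = np.array([[1.0], [2.0]])            # basis of Y (complementary, not orthogonal to X)
B = np.hstack([BX, BY])

r, n = BX.shape[1], B.shape[0]
D = np.zeros((n, n)); D[:r, :r] = np.eye(r)
P = B @ D @ np.linalg.inv(B)             # P = B [[I_r, 0], [0, 0]] B^{-1}

v = np.array([3.0, 5.0])
x = P @ v                                # projection of v onto X along Y
print(np.allclose(P @ P, P), x)          # P is idempotent; x = [1, 1] lies in span(BX)
```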

  30. Summary of Projections
  • Let $X$ and $Y$ be complementary subspaces of $V$. Each $v$ in $V$ can be uniquely expressed as $x + y$, with $x$ in $X$ and $y$ in $Y$
  • The unique linear operator $P$ defined by $Pv = x$ is the projector onto $X$ along $Y$
  • The following properties hold:
    • $P^2 = P$ (i.e., $P$ is idempotent)
    • $I - P$ is the complementary projector onto $Y$ along $X$
    • $R(P) = \{x \mid Px = x\}$
    • $R(P) = N(I - P) = X$ and $R(I - P) = N(P) = Y$
    • If $V = \mathbb{R}^n$ or $\mathbb{C}^n$, then $P$ is given by
      $$P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$$
      where $B = [\,X \mid Y\,]$ and the columns of $X$, $Y$ are respective bases for $X$, $Y$

  31. Projectors and Idempotents
  • A linear operator $P$ on $V$ is a projector if and only if it is idempotent
  • Proof:
    • We have just seen that projector $\Rightarrow$ idempotent
    • Need to prove that idempotent $\Rightarrow$ projector, i.e., that $P^2 = P$ implies $R(P)$ and $N(P)$ are complementary
    • $R(P) \cap N(P) = \{0\}$: if $x = Py$ and $Px = 0$, then $0 = Px = P^2 y = Py = x$
    • $R(P) + N(P) = V$: $v = Pv + (I - P)v$ with $Pv \in R(P)$ and $(I - P)v \in N(P)$

  32. RANGE-NULLSPACE DECOMPOSITION

  33. Complementary subspaces associated to a square matrix
  • Let's go back to a matrix $A_{n \times n}$
  • The Rank+Nullity theorem guarantees that $\dim R(A) + \dim N(A) = n$
  • But are $R(A)$ and $N(A)$ complementary?
  • They certainly are if $A$ is non-singular. Duh.
  • Otherwise, it can happen that $R(A) \cap N(A) \ne \{0\}$:
    $$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
      \;\Rightarrow\;
      \begin{pmatrix} 1 \\ 0 \end{pmatrix} \in R(A) \cap N(A)$$
  • We can find complementary subspaces by looking at the powers of $A$

  34. Range-Nullspace Decomposition
  • For every singular matrix $A_{n \times n}$ there exists a positive integer $k$ such that $R(A^k)$ and $N(A^k)$ are complementary subspaces, i.e.,
    $$\mathbb{R}^n = R(A^k) \oplus N(A^k)$$
  • The smallest positive integer $k$ for which this holds is called the index of $A$
  • For non-singular matrices, we simply define $\mathrm{index}(A) = 0$

  35. Proof
  • The nullspace doesn't shrink and the range doesn't grow with $k$:
    $$N(A) \subseteq N(A^2) \subseteq \cdots \subseteq N(A^k) \subseteq N(A^{k+1}) \subseteq \cdots$$
    $$R(A) \supseteq R(A^2) \supseteq \cdots \supseteq R(A^k) \supseteq R(A^{k+1}) \supseteq \cdots$$
  • At some point there must be equality: the dimensions can't shrink below zero or grow above $n$
  • Once equality is attained, it is preserved forever:
    $$R(A^k) = R(A^{k+1}) \;\Rightarrow\; A\,R(A^k) = A\,R(A^{k+1}) \;\Rightarrow\; R(A^{k+1}) = R(A^{k+2})$$
  • Equality is attained at the same $k$ for the nullspace:
    $$\dim N(A^k) = n - \dim R(A^k) = n - \dim R(A^{k+1}) = \dim N(A^{k+1})$$
  • When equality is attained, $R(A^k) \cap N(A^k) = \{0\}$:
    $$x \in R(A^k) \cap N(A^k) \;\Rightarrow\; x = A^k y,\; A^k x = 0 \;\Rightarrow\; A^{2k} y = 0
      \;\Rightarrow\; y \in N(A^{2k}) = N(A^k) \;\Rightarrow\; x = A^k y = 0$$
  • Finally, when equality is attained, $R(A^k) + N(A^k) = \mathbb{R}^n$:
    $$\dim\big[R(A^k) + N(A^k)\big]
      = \dim R(A^k) + \dim N(A^k) - \dim\big[R(A^k) \cap N(A^k)\big]
      = \dim R(A^k) + \dim N(A^k) = n$$

  36. Summary about the Index of a square matrix
  • The index of a square matrix $A$ is the smallest nonnegative integer $k$ such that any of these holds:
    • $\mathrm{rank}(A^k) = \mathrm{rank}(A^{k+1})$
    • $R(A^k) = R(A^{k+1})$ (the point where the range stops shrinking)
    • $N(A^k) = N(A^{k+1})$ (the point where the nullspace stops growing)
  • For non-singular matrices, $\mathrm{index}(A) = 0$
  • For singular matrices, $\mathrm{index}(A)$ is the smallest positive integer $k$ such that any of these holds:
    • $R(A^k) \cap N(A^k) = \{0\}$
    • $\mathbb{R}^n = R(A^k) \oplus N(A^k)$

  37. Nilpotent Matrices
  • $N_{n \times n}$ is said to be nilpotent whenever $N^k = 0$ for some positive integer $k$
  • $k = \mathrm{index}(N)$ is the smallest positive integer such that $N^k = 0$
  • Proof:
    • As soon as $N^k = 0$, we have $R(N^{k+i}) = \{0\}$ for all $i \ge 0$, so $\mathrm{index}(N) \le k$
    • On the other hand, if we reached the index with $R(N^{\mathrm{index}(N)}) \ne \{0\}$, the range would never shrink further and we could never have $N^k = 0$, so $\mathrm{index}(N) \ge k$

  38. Example of Nilpotent Matrix
  • Start taking powers:
    $$N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
      N^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
      N^3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
  • $\mathrm{index}(N) = 3$, since $N^2 \ne 0$ and $N^3 = 0$
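
A quick numerical check of the index via ranks of powers, matching the example above (the helper name is ours):

```python
# index(A) = smallest k with rank(A^k) = rank(A^{k+1}); 0 for non-singular matrices.
import numpy as np

def index_of(A):
    k, Ak = 0, np.eye(A.shape[0])
    while np.linalg.matrix_rank(Ak) != np.linalg.matrix_rank(Ak @ A):
        Ak, k = Ak @ A, k + 1
    return k

N = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
print(index_of(N))            # 3: N^2 != 0 but N^3 = 0
print(index_of(np.eye(3)))    # 0 for a non-singular matrix
```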

  39. Application for the Range-Nullspace decomposition
  • If $k = \mathrm{index}(A)$, then both $R(A^k)$ and $N(A^k)$ are invariant subspaces of $A$:
    • $A\,R(A^k) = R(A^{k+1}) = R(A^k)$
    • $x \in A\,N(A^k) \;\Rightarrow\; x = Aw$ with $w \in N(A^k) = N(A^{k+1}) \;\Rightarrow\; A^k x = A^{k+1} w = 0 \;\Rightarrow\; x \in N(A^k)$
  • When we write $A$ in a basis $[\,R(A^k) \mid N(A^k)\,]$, we get a block diagonal matrix
  • This will be very useful as a step toward the Jordan form we will see later on

  40. Core-Nilpotent Decomposition
  • Let $A_{n \times n}$ be a singular matrix with index $k$, and let $\mathrm{rank}(A^k) = r$. There exists a non-singular matrix $Q$ such that
    $$Q^{-1} A Q = \begin{pmatrix} C_{r \times r} & 0 \\ 0 & N \end{pmatrix}$$
    where $C$ is non-singular and $N$ is nilpotent of index $k$
  • $A$ is similar to a $2 \times 2$ block diagonal matrix containing a non-singular "core" $C$ and a nilpotent component $N$
  • This block diagonal matrix is known as the core-nilpotent decomposition of $A$
  • When $A$ is non-singular, this is trivial (and useless)

  41. Core-Nilpotent Decomposition
  • Proof:
    • We have already seen the block diagonal property
    • To see that $N$ is nilpotent, write $Q = [\,X \mid Y\,]$ and $Q^{-1} = \begin{pmatrix} U \\ V \end{pmatrix}$, with the columns of $X$, $Y$ bases for $R(A^k)$, $N(A^k)$; since $A^k Y = 0$,
      $$\begin{pmatrix} C^k & 0 \\ 0 & N^k \end{pmatrix}
        = Q^{-1} A^k Q
        = \begin{pmatrix} U \\ V \end{pmatrix} A^k \begin{pmatrix} X & Y \end{pmatrix}
        = \begin{pmatrix} U A^k X & 0 \\ V A^k X & 0 \end{pmatrix}
        \;\Rightarrow\; N^k = 0$$
    • Since $\mathrm{rank}(C^k) = \mathrm{rank}(A^k) = r$, $C$ is non-singular
    • To see that $\mathrm{index}(N) = k$, imagine it were smaller
      • Then $\mathrm{rank}(A^{k-1}) = \mathrm{rank}(C^{k-1}) = r = \mathrm{rank}(A^k)$
      • So $\mathrm{index}(A) < k$, which is a contradiction
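
A sketch of the decomposition, building $Q$ from orthonormal bases of $R(A^k)$ and $N(A^k)$ obtained with an SVD (the example matrix, the tolerance, and the use of the SVD for the bases are our choices, not the slides'):

```python
# Core-nilpotent decomposition: Q^{-1} A Q = [[C, 0], [0, N]] with C non-singular, N nilpotent.
import numpy as np

A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
k = 2                                            # index(A): rank(A)=2, rank(A^2)=rank(A^3)=1

U, s, Vt = np.linalg.svd(np.linalg.matrix_power(A, k))
r = int(np.sum(s > 1e-12))
Q = np.hstack([U[:, :r],                         # orthonormal basis for R(A^k)
               Vt[r:].T])                        # orthonormal basis for N(A^k)

B = np.linalg.inv(Q) @ A @ Q                     # block diagonal: core C (r x r) and N
print(np.round(B, 10))
print(np.allclose(np.linalg.matrix_power(B[r:, r:], k), 0))   # N is nilpotent of index <= k
```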

  42. ORTHOGONAL DECOMPOSITION

  43. Orthogonal Complement
  • For a subset $M$ of an inner-product space $V$, the orthogonal complement $M^\perp$ of $M$ is the set of all vectors in $V$ that are orthogonal to all vectors in $M$:
    $$M^\perp = \{\, x \in V \mid \langle m \mid x \rangle = 0 \text{ for all } m \in M \,\}$$
  • [Figure: a set $M$ and its orthogonal complement $M^\perp$, shown both for $M = \{x\}$ a single vector and for $M$ a plane]

  44. Orthogonal Complementary Subspaces
  • If $M$ is a subspace of a finite-dimensional inner-product space $V$, then $V = M \oplus M^\perp$
  • Furthermore, if $N$ is another subspace such that $V = M \oplus N$ and $N \perp M$, then $N = M^\perp$
  • Finally, if $\dim V = n$, then $\dim M^\perp = n - \dim M$ and $(M^\perp)^\perp = M$

  45. Orthogonal Decomposition Theorem
  • For every $A_{m \times n}$:
    $$R(A)^\perp = N(A^T), \qquad N(A)^\perp = R(A^T)$$
  • Proof:
    $$x \in R(A)^\perp \;\Leftrightarrow\; x^T A y = 0 \text{ for all } y \;\Leftrightarrow\; x^T A = 0 \;\Leftrightarrow\; x \in N(A^T)$$
  • In other words, every matrix in $\mathbb{R}^{m \times n}$ produces orthogonal decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$:
    $$\mathbb{R}^m = R(A) \oplus N(A^T), \qquad \mathbb{R}^n = N(A) \oplus R(A^T)$$

  46. Significance #1
  • The orthogonal decomposition theorem gives us decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$ in terms of the four fundamental subspaces
  • The resulting bases for $\mathbb{R}^m$ and $\mathbb{R}^n$ reveal more about the basic components of matrix $A$ itself
  • Combine orthonormal bases for $R(A)$, $N(A^T)$ and for $R(A^T)$, $N(A)$ into bases of $\mathbb{R}^m$ and $\mathbb{R}^n$:
    $$U_{m \times m} = \big[\, R(A) \mid N(A^T) \,\big], \qquad V_{n \times n} = \big[\, R(A^T) \mid N(A) \,\big]$$

  47. Significance #2
  • What does $A$ look like w.r.t. these bases?
    $$U_{m \times m} = \big[\, R(A) \mid N(A^T) \,\big], \qquad V_{n \times n} = \big[\, R(A^T) \mid N(A) \,\big]$$
  • Multiply on the right by $V$ and on the left by $U^{-1} = U^T$:
    $$R = U^{-1} A V = U^T A V
        = \begin{pmatrix} R(A)^T \\ N(A^T)^T \end{pmatrix} A \,\big[\, R(A^T) \mid N(A) \,\big]
        = \begin{pmatrix} R(A)^T A\, R(A^T) & 0 \\ 0 & 0 \end{pmatrix}
        = \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}$$

  48. The URV Factorization
  • For each matrix $A_{m \times n}$ in $\mathbb{R}^{m \times n}$ of rank $r$, there are orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$ and a non-singular matrix $C_{r \times r}$ such that
    $$A = U R V^T = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}_{m \times n} V^T$$
  • The first $r$ columns of $U$ are an orthonormal basis for $R(A)$
  • The last $m - r$ columns of $U$ are an orthonormal basis for $N(A^T)$
  • The first $r$ columns of $V$ are an orthonormal basis for $R(A^T)$
  • The last $n - r$ columns of $V$ are an orthonormal basis for $N(A)$
  • In the complex case, replace orthogonal by unitary and transposes by conjugate transposes (see the sketch below)
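
One hedged way to realize a URV factorization numerically (not prescribed by the slides): take complete QR factorizations of $A$ and $A^T$ and use their orthogonal factors as $U$ and $V$. This sketch assumes the leading $r$ columns of $A$ (and of $A^T$) are themselves linearly independent, which holds for the random example below.

```python
# URV: U^T A V = [[C, 0], [0, 0]] with U, V orthogonal and C non-singular.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4 matrix of rank 2

U, _ = np.linalg.qr(A, mode='complete')      # first r columns span R(A), the rest N(A^T)
V, _ = np.linalg.qr(A.T, mode='complete')    # first r columns span R(A^T), the rest N(A)
R = U.T @ A @ V                               # block form [[C, 0], [0, 0]] with C 2x2

print(np.round(R, 10))
print(np.allclose(A, U @ R @ V.T))            # A = U R V^T
```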

  49. Four fundamental subspaces
  • $\mathbb{R}^n = R(A^T) \oplus N(A)$ and $\mathbb{R}^m = R(A) \oplus N(A^T)$
  • [Figure: $A$ maps $R(A^T)$ (dimension $r$) onto $R(A)$ (dimension $r$) and sends $N(A)$ (dimension $n - r$) to $0$; $A^T$ maps $R(A)$ back onto $R(A^T)$ and sends $N(A^T)$ (dimension $m - r$) to $0$]

  50. The action of A
  • [Figure: a vector $v = x + z$ with $x \in R(A^T)$ and $z \in N(A)$ is mapped by $A$ to $y = Ax \in R(A)$; the $N(A)$ component $z$ is annihilated]

  51. Inverting what we can
  • [Figure sequence: the pseudoinverse $A^\dagger$ sends $u = y + w$, with $y \in R(A)$ and $w \in N(A^T)$, back to $x \in R(A^T)$; the $N(A^T)$ component $w$ is annihilated, and all vectors $v = x + z$ with $z \in N(A)$ map to the same $y$, so only the $R(A^T)$ part can be recovered]

  54. Finding a matrix representation
  • Start from a URV factorization of $A$
  • The singularity of $A$ has been isolated: the problem comes from $N(A)$ and $N(A^T)$
  • The map from $R(A^T)$ to $R(A)$ is invertible, and it is given by the invertible matrix $C$
  • Simply transpose the block structure and replace $C$ with $C^{-1}$:
    $$A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T
      \quad\Rightarrow\quad
      A^\dagger_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$$

  55. The Moore-Penrose Inverse
  • In terms of the URV factorization, the Moore-Penrose inverse of
    $$A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T
      \quad\text{is}\quad
      A^\dagger_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$$
  • When $Ax = b$ is consistent, $x = A^\dagger b$ is the solution with minimal Euclidean norm
  • When $Ax = b$ is inconsistent, $x = A^\dagger b$ is the least-squares solution of minimal Euclidean norm
  • Note: much like the inverse, the pseudoinverse is useful in theory, not in practical applications (see the sketch below)
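
A small sketch checking the consistent-case claims with numpy's `pinv` (the example data is made up):

```python
# For a consistent wide system, A^+ b solves Ax = b and has minimal Euclidean norm.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 5))                  # wide: Ax = b is consistent, many solutions
b = rng.standard_normal(3)

x = np.linalg.pinv(A) @ b                        # minimum-norm solution
x_other = x + 0.1 * np.linalg.svd(A)[2][-1]      # add a null-space component of A

print(np.allclose(A @ x, b))                               # the system is solved exactly
print(np.linalg.norm(x) < np.linalg.norm(x_other))         # and with smaller norm
```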

  56. The Moore-Penrose Inverse
  • Proof for the consistent case:
    • $x_0 = A^\dagger b$ solves $Ax = b$: using $A A^\dagger A = A$,
      $$A x_0 = A A^\dagger b = A A^\dagger A x = A x = b$$
    • $x_0 = A^\dagger b$ has minimum Euclidean norm: any solution is $x = x_0 + n$ with $n \in N(A)$, and since $R(A^\dagger) = R(A^T)$ and $R(A^T) \perp N(A)$, we have $x_0 \perp n$, so
      $$\|x\|^2 = \|x_0 + n\|^2 = \|x_0\|^2 + \|n\|^2 \ge \|x_0\|^2$$

  57. Matrix decompositions based on four fundamental spaces #1
  • Orthogonal decomposition
    • Used by the URV decomposition
    • Works with any matrix
    • Different orthogonal transformations on each side
    • Specializes to the singular value decomposition
  • Range-Nullspace decomposition
    • Used by the Core-Nilpotent factorization
    • Matrix must be square
    • Based on a similarity transformation
    • Paves the way to the Jordan form

  58. Matrix decompositions based on four fundamental spaces #2
  • Similarities reveal properties of a transformation that are independent of bases or coordinate systems
  • Orthogonality is useful in least squares, or for numerical stability when using floating-point
  • Core-Nilpotent is usually not a special case of URV
    • Not even necessarily when $A$ is a square matrix
    • $URV^T$ is not a similarity unless $U = V$
  • Surprisingly, $U = V$ defines a rather large class of matrices

  59. Range Perpendicular to Nullspace
  • For $\mathrm{rank}(A_{n \times n}) = r$, the following are equivalent:
    • $R(A) \perp N(A)$
    • $R(A) = R(A^T)$
    • $N(A) = N(A^T)$
    • $A = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$ with $U$ orthogonal and $C$ non-singular
  • Such matrices are called RPN matrices, or range-symmetric matrices, or EP matrices
  • In the complex case, replace orthogonal by unitary and transposes by conjugate transposes

  60. A map of matrix types
  • [Figure: a map of matrix classes — RPN, Normal, Hermitian, Real-Symmetric, Non-singular]

  61. SINGULAR VALUE DECOMPOSITION

  62. Can we specialize URV even more?
  • We can certainly make $C$ triangular: factor $C = QR$
  • Note that
    $$\begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix}
      = \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix}$$
    so that
    $$A = U \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix} V^T$$
  • We now do even better, by making $C$ diagonal
  • This leads to the Singular Value Decomposition
  • The SVD is the Swiss-army knife of decompositions

  63. Derivation of SVD #1
  • We proceed incrementally:
    $$D_1 = P_1^T C_1 Q_1 = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix}, \qquad
      D_2 = P_2^T C_2 Q_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix}, \qquad \ldots$$
  • Writing $P_1 = [\,x_1 \mid X_1\,]$ and $Q_1 = [\,y_1 \mid Y_1\,]$ and multiplying by blocks, we have
    $$P_1^T C_1 Q_1
      = \begin{pmatrix} x_1^T \\ X_1^T \end{pmatrix} C_1 \begin{pmatrix} y_1 & Y_1 \end{pmatrix}
      = \begin{pmatrix} x_1^T C_1 y_1 & x_1^T C_1 Y_1 \\ X_1^T C_1 y_1 & X_1^T C_1 Y_1 \end{pmatrix}
      = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix}$$
  • So we have two equations to satisfy: $X_1^T C_1 y_1 = 0$ and $x_1^T C_1 Y_1 = 0$
  • The first requires $C_1 y_1 \perp X_1$, and we know $x_1 \perp X_1$
  • Simply use $C_1 y_1$ to define $x_1$ and normalize:
    $$x_1 = \frac{C_1 y_1}{\|C_1 y_1\|} \;\Rightarrow\; X_1^T C_1 y_1 = X_1^T x_1\, \|C_1 y_1\| = 0$$

  65. Derivation of SVD #2
  • Next, $x_1^T C_1 Y_1 = 0$ requires $C_1^T x_1 \perp Y_1$
  • In other words (since $x_1$ is proportional to $C_1 y_1$), $C_1^T C_1 y_1 \perp Y_1$
  • Problem solved if there is $y_1$ such that $C_1^T C_1 y_1 = \lambda y_1$, i.e., $(C_1^T C_1 - \lambda I)\, y_1 = 0$
  • There is such a $y_1$: take $y_1 = \arg\max_{\|y\| = 1} \|C_1 y\|$
  • Now that we can pick $x_1$ and $y_1$, complete $X_1$ and $Y_1$, and set $C_2 = X_1^T C_1 Y_1$
  • Finally, note that
    $$\sigma_1 = x_1^T C_1 y_1 = \|C_1 y_1\|, \qquad
      C_1^T C_1 y_1 = \lambda_1 y_1 \;\Rightarrow\; \|C_1 y_1\|^2 = \lambda_1
      \;\Rightarrow\; \sigma_1^2 = \lambda_1$$

  66. Derivation of SVD #3
  • Now that we have
    $$D_1 = P_1^T C_1 Q_1 = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix}
      \quad\text{and}\quad
      D_2 = P_2^T C_2 Q_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix},$$
    set
    $$\hat{P}_2 = \begin{pmatrix} 1 & 0 \\ 0 & P_2 \end{pmatrix}
      \quad\text{and}\quad
      \hat{Q}_2 = \begin{pmatrix} 1 & 0 \\ 0 & Q_2 \end{pmatrix}$$
    so that we get
    $$\hat{P}_2^T P_1^T C_1 Q_1 \hat{Q}_2
      = \begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & C_3 \end{pmatrix}$$
  • After $r - 1$ steps, $\hat{P}^T C \hat{Q} = D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ with $\hat{P}$ and $\hat{Q}$ orthogonal
  • To complete the proof, use
    $$\begin{pmatrix} \hat{P}^T & 0 \\ 0 & I \end{pmatrix} U^T A V \begin{pmatrix} \hat{Q} & 0 \\ 0 & I \end{pmatrix}
      = \begin{pmatrix} \hat{P}^T & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \hat{Q} & 0 \\ 0 & I \end{pmatrix}
      = \begin{pmatrix} \hat{P}^T C \hat{Q} & 0 \\ 0 & 0 \end{pmatrix}
      = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$$

  67. The Singular Value Decomposition
  • For each $A$ in $\mathbb{R}^{m \times n}$ of rank $r$, there are orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$ and a diagonal matrix $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ such that
    $$A = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T
      \quad\text{with}\quad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$$
  • The $\sigma_i$ are called the nonzero singular values of $A$
  • When $r < p = \min(m, n)$, $A$ is said to have $p - r$ additional zero singular values
  • This factorization is called the singular value decomposition of $A$
  • The columns of $U$ and $V$ are called the left-hand and right-hand singular vectors of $A$, respectively
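
A short numpy sketch of the statement above (the example matrix is ours):

```python
# SVD: A = U [[D, 0], [0, 0]] V^T with singular values in non-increasing order.
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))   # 4x5, rank 3

U, s, Vt = np.linalg.svd(A)                     # s holds sigma_1 >= sigma_2 >= ... >= 0
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))           # A = U Sigma V^T
print(np.allclose(A @ Vt[0], s[0] * U[:, 0]))   # A v_1 = sigma_1 u_1
```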

  68. Left and right singular vectors
  • The bases for $R(A^T)$ in $V$ and for $R(A)$ in $U$ are chosen in a very special way:
    $$AV = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; A v_i = \sigma_i u_i,
      \qquad
      U^T A = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T \;\Rightarrow\; u_i^T A = \sigma_i v_i^T$$

  69. Image of Unit Sphere
  • For a non-singular matrix $A$ in $\mathbb{R}^{n \times n}$ having singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and an SVD $A = UDV^T$ with $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, the image of the unit 2-sphere is an ellipsoid whose $k$-th semiaxis is given by $\sigma_k u_k$
  • Furthermore, $v_k$ on the unit sphere is such that $A v_k = \sigma_k u_k$
  • [Figure: the unit sphere in $R(A^T)$ is mapped by $A$ to an ellipsoid in $R(A)$, with $A v_1 = \sigma_1 u_1$, $A v_2 = \sigma_2 u_2$, $A v_3 = \sigma_3 u_3$]

  70. Connection to $A^T A$ and $A A^T$
  • The squared singular values $\sigma_i^2$ are the non-zero eigenvalues of both $A^T A$ and $A A^T$
  • Furthermore, the vectors $v_i$ in $V$ and $u_i$ in $U$ are corresponding eigenvectors of $A^T A$ and $A A^T$
  • Proof:
    $$(A^T A)_{n \times n} = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} V^T
      \;\Rightarrow\; A^T A\, V = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix}
      \;\Rightarrow\; A^T A\, v_i = \sigma_i^2 v_i$$
    $$(A A^T)_{m \times m} = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} U^T
      \;\Rightarrow\; A A^T\, U = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix}
      \;\Rightarrow\; A A^T\, u_i = \sigma_i^2 u_i$$

  71. Compact SVDs
  • When matrix $A_{m \times n}$ has many more rows than columns ($n \ll m$), it is better to use the thin SVD: $A = U_n \Sigma_n V^T$
    • Compute only the first $n$ columns of $U$ and of $\Sigma$
    • We usually don't care about the nullspaces
  • When $\mathrm{rank}(A) = r$ is much smaller than the number of rows or columns, use the economy/compact SVD: $A = U_r \Sigma_r V_r^T$
    • Compute and store only the first $r$ columns (see the sketch below)
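
A sketch of the thin and compact SVDs with numpy (the sizes, the rank, and the rank tolerance are our example choices):

```python
# Thin SVD keeps only n columns of U; the compact SVD keeps only the r = rank(A) of them.
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((1000, 20)) @ rng.standard_normal((20, 50))   # 1000x50, rank 20

U, s, Vt = np.linalg.svd(A, full_matrices=False)     # thin SVD: U is 1000x50
r = int(np.sum(s > 1e-10 * s[0]))                    # numerical rank (20 here)
Ur, sr, Vtr = U[:, :r], s[:r], Vt[:r]                # compact SVD pieces
print(r, np.allclose(A, Ur @ np.diag(sr) @ Vtr))     # A reconstructed from rank-r pieces
```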

  72. Dimensionality Reduction
  • Map points in a high-dimensional space to points in a lower-dimensional space
  • Try to preserve structure: pairwise distances, etc.
  • Useful for further processing
    • Less computation, fewer parameters
    • Easier to understand, visualize

  73. Principal Component Analysis
  • Principal Component Analysis (PCA) is a method for approximating a high-dimensional data set using a lower-dimensional affine space
  • [Figure: a cloud of data points with the original axes, the first principal component along the direction of greatest spread, and the second principal component orthogonal to it]

  74. PCA: Formal definition
  • Given a set of points $D = \{x_1, x_2, \ldots, x_N\}$ in $\mathbb{R}^n$, can we find the $m$-dimensional subspace $V$ of $\mathbb{R}^n$ that allows us to best represent the set $D$?
  • For each $x_i$, we want to find a $y_i$ such that
    $$y_i = P x_i + a, \qquad P x_i \in V, \quad a \in \mathbb{R}^n$$
  • Given $m$, find $V$ and $a$ that minimize
    $$\varepsilon = \frac{1}{N} \sum_{i=1}^N \| y_i - x_i \|^2
                 = \frac{1}{N} \sum_{i=1}^N \| a - (I - P) x_i \|^2$$

  75. PCA: Derivation #1
  • First, let us find the value of $a$:
    $$\varepsilon = \frac{1}{N} \sum_{i=1}^N \big(a - (I - P) x_i\big)^T \big(a - (I - P) x_i\big)
                 = \frac{1}{N} \sum_{i=1}^N \Big[ a^T a - 2 a^T (I - P) x_i + x_i^T (I - P)^T (I - P) x_i \Big]$$
  • Using calculus,
    $$\frac{\partial \varepsilon}{\partial [a]_k} = 0
      \;\Rightarrow\; \frac{1}{N} \sum_{i=1}^N \Big[ 2 e_k^T a - 2 e_k^T (I - P) x_i \Big] = 0
      \;\Rightarrow\; a = \frac{1}{N} \sum_{i=1}^N (I - P) x_i = (I - P)\, \frac{1}{N} \sum_{i=1}^N x_i
      \;\Rightarrow\; a = (I - P)\, \bar{x}$$

  76. PCA: Derivation #2
  • Now that we know $a$, we can write
    $$\varepsilon = \frac{1}{N} \sum_{i=1}^N \| (I - P)(x_i - \bar{x}) \|^2$$
  • Take orthonormal bases for $V$ and for its orthogonal complement and put them in matrices $V$ and $U$, so that $I - P = U U^T$:
    $$\varepsilon = \frac{1}{N} \sum_{i=1}^N \| U U^T (x_i - \bar{x}) \|^2
                 = \frac{1}{N} \sum_{i=1}^N \| U^T (x_i - \bar{x}) \|^2
                 = \sum_{j=1}^{n-m} u_j^T \left( \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})(x_i - \bar{x})^T \right) u_j
                 = \sum_{j=1}^{n-m} u_j^T S\, u_j$$
    where
    $$S = \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{N} X X^T,
      \qquad [X]_{*j} = x_j - \bar{x}$$
  • $S$ is the covariance matrix, and it does not depend on $U$ (see the sketch below)
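
A minimal PCA sketch following this derivation: center the data, form the covariance matrix $S$, and take its leading eigenvectors as the principal directions (the data set and the target dimension $m$ are our example choices):

```python
# PCA: project centered data onto the top-m eigenvectors of the covariance matrix.
import numpy as np

rng = np.random.default_rng(8)
D = rng.standard_normal((200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])   # 200 points in R^2
m = 1                                             # target dimension

x_bar = D.mean(axis=0)
Xc = (D - x_bar).T                                # columns are x_i - x_bar
S = Xc @ Xc.T / D.shape[0]                        # covariance matrix S = (1/N) X X^T

w, V = np.linalg.eigh(S)                          # eigenvalues in ascending order
components = V[:, ::-1][:, :m]                    # top-m principal directions
Y = components.T @ Xc                             # coordinates of the projected points
print(components.ravel(), Y.shape)
```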
