NORMS, INNER PRODUCTS, AND ORTHOGONALITY
Vector norms
- Generalize the familiar concept of “length”
from R2 and R3 to higher dimensions
- Can’t really visualize in higher dimensions
- Must trust algebra
- Generalize the concept of length itself
- Find basic properties every “length” should have
- Find different scalar functions of vectors that
share these properties
Euclidean Vector Norm
- In R2 and R3 we have Pythagoras!
- Begs the generalization to higher dimensions
- In $\mathbb{R}^2$, for $v = (x, y)^T$: $\|v\| = \sqrt{x^2 + y^2}$
- In $\mathbb{R}^3$, for $u = (x, y, z)^T$: $\|u\| = \sqrt{\|v\|^2 + z^2} = \sqrt{x^2 + y^2 + z^2} = \sqrt{u^T u}$
- In general, $\|w\| = \left( \sum_{i=1}^n w_i^2 \right)^{1/2} = \sqrt{w^T w}$ for $w \in \mathbb{R}^{n \times 1}$
- In the complex case, $\|w\| = \left( \sum_{i=1}^n |w_i|^2 \right)^{1/2} = \sqrt{w^* w}$ for $w \in \mathbb{C}^{n \times 1}$
A few notes
- The distance between two points u and v in $\mathbb{R}^n$ is simply $\|u - v\|$
- The definition guarantees $\|v\| \geq 0$, $\|v\| = 0 \Rightarrow v = 0$, and $\|\alpha v\| = |\alpha| \|v\|$ for all scalars $\alpha$
- The unit norm vector in the same direction as v is simply $v / \|v\|$
Connection to standard inner-product
- We already saw one connection between the
standard inner-product and Euclidean norm
- There is another important connection in the form of an inequality known as the Cauchy-Bunyakovskii-Schwarz (CBS) Inequality
$\|x\| = \sqrt{x^T x}, \qquad x^T y = \sum_{i=1}^n x_i y_i, \qquad x, y \in \mathbb{R}^{n \times 1}$
$\|x\| = \sqrt{x^* x}, \qquad x^* y = \sum_{i=1}^n \bar{x}_i y_i, \qquad x, y \in \mathbb{C}^{n \times 1}$
The CBS Inequality
- The Cauchy-Bunyakovskii-Schwarz inequality: $|x^* y| \leq \|x\| \|y\|$
- Furthermore, equality holds iff $y = \alpha x$
- Useful, for example, to prove another inequality: the Triangle Inequality
- It is also important when we generalize the concept of angle to higher dimensions
Proof of CBS Inequality
- Equality certainly holds when x = 0
- Equality also holds when $y = \alpha x$, since then
$|x^* y|^2 = (x^* y)(x^* y)^* = (x^* y)(y^* x) = \big(x^*(\alpha x)\big)\big((\bar{\alpha} x^*) x\big) = (x^* x)(\bar{\alpha} x^*)(\alpha x) = (x^* x)(y^* y)$
- Otherwise, look at the norm of $\alpha x - y$:
$0 < \|\alpha x - y\|^2 = (\alpha x - y)^*(\alpha x - y) = \bar{\alpha} x^*(\alpha x - y) - y^*(\alpha x - y)$
- Pick $\alpha = \dfrac{x^* y}{x^* x}$ to kill the first term:
$0 < y^* y - \alpha\, y^* x = \dfrac{(y^* y)(x^* x) - (x^* y)(y^* x)}{x^* x} = \dfrac{\|y\|^2 \|x\|^2 - |x^* y|^2}{\|x\|^2}$
- Hence $|x^* y| \leq \|x\| \|y\|$, with equality only in the cases above
Triangle Inequality
- Important to ensure our concept of length in
higher dimensions makes sense
- Proof
$\|u + v\| \leq \|u\| + \|v\|, \qquad u, v \in \mathbb{C}^n$
$\|u + v\|^2 = (u + v)^*(u + v) = u^* u + u^* v + v^* u + v^* v = \|u\|^2 + u^* v + v^* u + \|v\|^2$
$u^* v + v^* u = 2\,\mathrm{Re}(u^* v) \leq 2|u^* v| \leq 2\|u\|\|v\|$
$\Rightarrow\; \|u + v\|^2 \leq \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = \big(\|u\| + \|v\|\big)^2$
Other notions of length
- There are three commonly used norms
- Special cases of the p-norm, p ≥ 1
- Euclidean norm: $\|x\|_2 = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}$
- Manhattan (grid) norm: $\|x\|_1 = \sum_{i=1}^n |x_i|$
- Max norm: $\|x\|_\infty = \max_i |x_i|$
- In general, for $x \in \mathbb{C}^n$: $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$
General Vector Norms
- All p-norms satisfy the properties
- From now on, we define a norm for a real or
complex vector space V as any function mapping V into R that satisfies these conditions
- No need for coordinates
A norm $\|\cdot\|$ must satisfy: $\|x\| \geq 0$; $\|x\| = 0 \Rightarrow x = 0$; $\|\alpha x\| = |\alpha| \|x\|$ for all scalars $\alpha$; and $\|x + y\| \leq \|x\| + \|y\|$
Matrix norms
- Since Cm×n is a vector space with dimension mn,
we could use any vector norm from Cmn
- The Euclidean norm leads to the Frobenius norm
- Not as useful as we would like
- Motivates useful property related to matrix product
- Compatibility with Euclidean vector norm
- Submultiplicative property
$\|A\|_F^2 = \sum_{i,j} |a_{ij}|^2 = \sum_i \|A_{i*}\|_2^2 = \sum_j \|A_{*j}\|_2^2 = \mathrm{trace}(A^* A)$
Compatibility: $\|Ax\|_2^2 = \sum_i |A_{i*} x|^2 \leq \sum_i \|A_{i*}\|_2^2\, \|x\|_2^2 = \|A\|_F^2\, \|x\|_2^2$
Submultiplicativity: $\|AB\|_F^2 = \sum_j \|[AB]_{*j}\|_2^2 = \sum_j \|A [B]_{*j}\|_2^2 \leq \|A\|_F^2 \sum_j \|B_{*j}\|_2^2 = \|A\|_F^2\, \|B\|_F^2$
General Matrix Norm
- A matrix norm is a function from the set of
all complex matrices (of finite order) into R that satisfies the following properties
- Frobenius norm is one example
- Are there other examples?
A matrix norm $\|\cdot\|$ must satisfy: $\|A\| \geq 0$ and $\|A\| = 0 \Rightarrow A = 0$; $\|\alpha A\| = |\alpha| \|A\|$ for all scalars $\alpha$; $\|A + B\| \leq \|A\| + \|B\|$ (matrices of same size); and $\|AB\| \leq \|A\| \|B\|$ (conformable matrices)
Induced Matrix Norms
- A vector norm defined on Cp for p = m,n
induces a matrix norm on Cm×n by setting
- The induced matrix norm is compatible with
the underlying vector norm
- Additionally, if A is nonsingular
$\|A\| = \max_{\|x\|=1} \|Ax\|, \qquad A \in \mathbb{C}^{m \times n},\ x \in \mathbb{C}^{n \times 1}$
Compatibility: $\|Ax\| \leq \|A\| \|x\|$
If A is nonsingular: $\min_{\|x\|=1} \|Ax\| = 1 / \|A^{-1}\|$
Matrix 2-norm
- Matrix norm induced by Euclidean vector
norm is not the Frobenius norm
- Instead,
$\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \sqrt{\lambda_{\max}}$
where $\lambda_{\max}$ is the largest $\lambda$ such that $A^* A - \lambda I$ is singular
- If A is non-singular, then
$\|A^{-1}\|_2 = \dfrac{1}{\min_{\|x\|_2 = 1} \|Ax\|_2} = \dfrac{1}{\sqrt{\lambda_{\min}}}$
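A minimal NumPy sketch, using a random matrix, that checks the 2-norm against the eigenvalues of $A^T A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Eigenvalues of A^T A (symmetric, so eigvalsh); they are real and non-negative
lam = np.linalg.eigvalsh(A.T @ A)

print(np.sqrt(lam.max()), np.linalg.norm(A, 2))                        # ||A||_2 = sqrt(lambda_max)
print(1.0 / np.sqrt(lam.min()), np.linalg.norm(np.linalg.inv(A), 2))   # ||A^-1||_2 = 1/sqrt(lambda_min)
```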
Why is this useful?
- Assume we solve a linear system Ax = b and find the solution vector $\hat{x}$
- How close are we to the actual solution x?
- The ideal measure of error is relative
- So knowledge of $\|A^{-1}\|$ is very useful!
- But how would you compute it?
$\dfrac{\|x - \hat{x}\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}Ax - A^{-1}A\hat{x}\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}(b - A\hat{x})\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}\hat{r}\|}{\|\hat{x}\|} \leq \|A^{-1}\|\, \dfrac{\|\hat{r}\|}{\|\hat{x}\|}$
General Inner Products
- Euclidean norm came from standard inner-product
- We already generalized vector norm
- Now we want to generalize inner product
- An inner product on a vector space V is a function that
maps each pair of vectors x,y to a scalar such that the following properties hold
$\langle x \mid x \rangle \in \mathbb{R}, \quad \langle x \mid x \rangle \geq 0, \quad \langle x \mid x \rangle = 0 \Leftrightarrow x = 0$
$\langle \alpha x \mid y \rangle = \alpha \langle x \mid y \rangle$
$\langle x + y \mid z \rangle = \langle x \mid z \rangle + \langle y \mid z \rangle$
$\langle x \mid y \rangle = \overline{\langle y \mid x \rangle}$
Examples of inner products
- The standard inner-product: $\langle x \mid y \rangle = y^T x$ in $\mathbb{R}^n$, and $\langle x \mid y \rangle = y^* x$ in $\mathbb{C}^n$
- The elliptical inner product or A-inner product: $\langle x \mid y \rangle = y^* A^* A x$
  - Only for A non-singular (why?)
- The trace inner product, the standard inner-product for matrices (leads to the Frobenius matrix norm): $\langle A \mid B \rangle = \mathrm{trace}(A^T B)$, or $\langle A \mid B \rangle = \mathrm{trace}(A^* B)$ in the complex case
  - Reduces to the standard inner-product for column vectors
- In the space V of functions defined on (a, b): $\langle f \mid g \rangle = \int_a^b f(t)\, g(t)\, dt$
Norms in Inner-Product Spaces
- If V is an inner-product space with an inner product $\langle x \mid y \rangle$, then $\|x\| = \sqrt{\langle x \mid x \rangle}$ defines a norm on V
- Most norm properties are immediately satisfied
from the definition of general inner product
- The proof of the Triangle Inequality comes again
from the CBS inequality
- CBS holds for general inner products
Not all norms define an inner product
- Given a norm $\|x\|$ on V, is there always an inner product such that $\|x\|^2 = \langle x \mid x \rangle$?
- No, not always. And yes iff the norm satisfies
the parallelogram identity
- In which case, the expression below is an inner product on V
- Only holds for p = 2 in p-Norms
- Complex case is a bit more complicated
Parallelogram identity: $\|x + y\|^2 + \|x - y\|^2 = 2\left( \|x\|^2 + \|y\|^2 \right)$
$\langle x \mid y \rangle = \dfrac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right)$
ORTHOGONALITY
- Generalize concept based on Pythagorean theorem and
general inner product spaces
- In an inner-product space V, we say two vectors x, y are orthogonal when $\langle x \mid y \rangle = 0$, and denote this by $x \perp y$
- In $\mathbb{R}^n$, with the standard inner-product, $x \perp y \Leftrightarrow y^T x = 0$
- In $\mathbb{C}^n$, with the standard inner-product, $x \perp y \Leftrightarrow y^* x = 0$
- The Pythagorean relation generalizes: $\|u - v\|^2 = \|u\|^2 + \|v\|^2$, i.e. $\langle u - v \mid u - v \rangle = \langle u \mid u \rangle + \langle v \mid v \rangle$, exactly when $\langle u \mid v \rangle + \langle v \mid u \rangle = 0$, i.e. $\mathrm{Re}\,\langle u \mid v \rangle = 0$
Example of Orthogonality
- Find the solution to a linear least squares
problem Ax = b
- In particular when b is not in R(A), we
wonder about the residual
- Turns out that
$\hat{r} = b - A\hat{x}, \qquad \hat{r} \perp R(A)$
$A^T A \hat{x} = A^T b \;\Rightarrow\; A^T(b - A\hat{x}) = 0 \;\Rightarrow\; A^T \hat{r} = 0 \;\Rightarrow\; y^T A^T \hat{r} = 0 \;\Rightarrow\; (Ay)^T \hat{r} = 0 \ \ \forall y$
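A small NumPy sketch (random data) verifying that the least-squares residual is orthogonal to R(A):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))      # tall matrix: b will generally not lie in R(A)
b = rng.standard_normal(8)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares solution
r_hat = b - A @ x_hat                            # residual

# r_hat is orthogonal to R(A): A^T r_hat = 0 (up to round-off)
print(np.linalg.norm(A.T @ r_hat))               # ~1e-15
```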
Angles between general vectors
- Use the law of cosines, which generalizes the
Pythagorean theorem for all triangles
- CBS inequality guarantees RHS is in [–1, 1]
- So there is a $\theta$ for every u and v
$\cos\theta = \dfrac{\|u\|^2 + \|v\|^2 - \|u - v\|^2}{2\,\|u\|\,\|v\|} = \dfrac{\mathrm{Re}\,\langle u \mid v \rangle}{\|u\|\,\|v\|}$
Orthonormal sets
- $B = \{u_1, u_2, \ldots, u_n\}$ is called an orthonormal set whenever $\|u_i\| = 1$ for each i and $u_i \perp u_j$ whenever $i \neq j$
- Every orthonormal set is l.i.
- Every orthonormal set with n vectors from an n-dimensional vector space V constitutes an orthonormal basis for V
$\langle u_i \mid u_j \rangle = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$
Advantage of Orthonormal Bases
- Recall that writing a vector in basis B in general
requires solving a linear system
- When $B = \{u_1, u_2, \ldots, u_n\}$ is orthonormal, the coordinates are immediately available
- Simply write $x = \xi_1 u_1 + \xi_2 u_2 + \cdots + \xi_n u_n$ and notice that
$\langle x \mid u_i \rangle = \Big\langle \sum_{j=1}^n \xi_j u_j \,\Big|\, u_i \Big\rangle = \sum_{j=1}^n \xi_j \langle u_j \mid u_i \rangle = \xi_i \langle u_i \mid u_i \rangle = \xi_i$
Fourier Expansion
- If $B = \{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for an inner product space V, then each x in V can be expressed as
$x = \langle x \mid u_1 \rangle u_1 + \langle x \mid u_2 \rangle u_2 + \cdots + \langle x \mid u_n \rangle u_n$
- This is the Fourier expansion of x
- The scalars $\langle x \mid u_i \rangle$ are the coordinates of x with respect to B, its Fourier coefficients
- Each $\langle x \mid u_i \rangle u_i$ represents the projection of x onto the line spanned by $u_i$
Idea of Gram-Schmidt Procedure
- Orthonormal bases often desirable
- Given a general basis $B = \{x_1, x_2, \ldots, x_n\}$ for a vector space V, can we construct an orthonormal basis spanning V?
- Yes: Use the Gram-Schmidt procedure
- At each step k we complete an orthonormal basis $S_k$ for $\mathrm{span}\{x_1, x_2, \ldots, x_k\}$
- I.e., proceed incrementally
Gram-Schmidt Procedure Steps
- The first step is easy: $u_1 = \dfrac{x_1}{\|x_1\|}, \quad S_1 = \{u_1\}$
- Now assume we have $S_k$ and want to find $u_{k+1}$ such that
  - $\mathrm{span}\{x_1, x_2, \ldots, x_k, x_{k+1}\} = \mathrm{span}\left(S_k \cup \{u_{k+1}\}\right)$
  - $u_{k+1} \perp u_i,\ i < k + 1$
  - $\|u_{k+1}\| = 1$
- Use the Fourier expansion of $x_{k+1}$ to solve for $u_{k+1}$:
$x_{k+1} = \alpha_{k+1} u_{k+1} + \sum_{i=1}^{k} \langle u_i \mid x_{k+1} \rangle u_i
\;\Rightarrow\;
\alpha_{k+1} u_{k+1} = x_{k+1} - \sum_{i=1}^{k} \langle u_i \mid x_{k+1} \rangle u_i,
\qquad
u_{k+1} = \dfrac{\alpha_{k+1} u_{k+1}}{\|\alpha_{k+1} u_{k+1}\|}$
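A minimal sketch of the procedure in NumPy, assuming the columns of X are linearly independent:

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt on the columns of X (assumed linearly independent).

    Returns U with orthonormal columns spanning the same space, built
    incrementally: u_{k+1} is x_{k+1} minus its Fourier expansion in u_1..u_k,
    then normalized."""
    n, m = X.shape
    U = np.zeros((n, m))
    for k in range(m):
        v = X[:, k].copy()
        for i in range(k):
            v -= (U[:, i] @ X[:, k]) * U[:, i]   # subtract <u_i | x_{k+1}> u_i
        U[:, k] = v / np.linalg.norm(v)          # normalize the remainder
    return U

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U = gram_schmidt(X)
print(np.round(U.T @ U, 12))   # identity: the columns are orthonormal
```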
Fourier Expansion in Matrix Form
- Let $B = \{u_1, u_2, \ldots, u_n\}$ be an orthonormal basis for V, and put all $u_i$ as columns in U
- Now, using the standard inner-product,
$x = \sum_{j=1}^{n} u_j (u_j^* x) = U \begin{pmatrix} u_1^* x \\ u_2^* x \\ \vdots \\ u_n^* x \end{pmatrix} = U U^* x$
$[U^* U]_{ij} = u_i^* u_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$
- So $U^* = U^{-1}$?
Gram-Schmidt in Matrix Form
- The first step is $u_1 = \dfrac{x_1}{\|x_1\|}$
- To produce $u_{k+1}$, compute
$u_{k+1} = \dfrac{(I - U_k U_k^*)\, x_{k+1}}{\|(I - U_k U_k^*)\, x_{k+1}\|}$
where the matrix $U_k = \begin{pmatrix} u_1 & u_2 & \cdots & u_k \end{pmatrix}$ has orthonormal columns
Unitary and Orthogonal Matrices
- A unitary matrix is a complex matrix Un×n whose
columns form an orthonormal basis for Cn
- An orthogonal matrix is a real matrix Pn×n whose
columns form an orthonormal basis for Rn
- The following characterizations are equivalent
- P (or U) has orthonormal columns
- P (or U) has orthonormal rows
- P-1 = PT (or U-1 = U*)
- $\|Px\|_2 = \|x\|_2$ for all x in $\mathbb{R}^{n \times 1}$ (or $\|Ux\|_2 = \|x\|_2$ for all x in $\mathbb{C}^{n \times 1}$)
The QR Factorization
- The Gram-Schmidt procedure can be seen as
a matrix factorization
- Put each $x_k$ of the basis $B = \{x_1, x_2, \ldots, x_n\}$ as a column of matrix A
- Put each $u_k$ in $S_n$ as a column of an orthogonal or unitary matrix Q
- Pick column i of matrix R to hold the linear combination of the $u_k$ that results in $x_i$
- Results in R being upper-triangular!
The QR factorization
- Every matrix Am×n with l.i. columns can be
uniquely factored as A = QR where the columns of Qm×n form an orthonormal basis for R(A) and Rn×n is an upper-triangular matrix with positive diagonal entries
$a_k = \alpha_k q_k + \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i,
\qquad
\alpha_k = \Big\| a_k - \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i \Big\|$
$R = \begin{pmatrix}
\alpha_1 & q_1^* a_2 & q_1^* a_3 & \cdots & q_1^* a_n \\
 & \alpha_2 & q_2^* a_3 & \cdots & q_2^* a_n \\
 & & \alpha_3 & \cdots & q_3^* a_n \\
 & & & \ddots & \vdots \\
 & & & & \alpha_n
\end{pmatrix}$
Least-squares application of QR
- Start from the normal equations
- Decomposing ATA into LU is both inefficient
and leads to information loss
- From the QR factorization, we have
- If RT is non-singular (What if it isn’t?)
$A^T A x = A^T b$
$(QR)^T Q R x = R^T (Q^T Q) R x = R^T R x = R^T Q^T b$
$R x = Q^T b \;\Rightarrow\; x = R^{-1} Q^T b$
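A short NumPy sketch of this least-squares recipe on random data, checked against numpy.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)

Q, R = np.linalg.qr(A)            # thin QR: Q is 10x4 with orthonormal columns, R is 4x4 upper-triangular
x = np.linalg.solve(R, Q.T @ b)   # solve Rx = Q^T b (a triangular solver would normally be used here)

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches the least-squares solution
```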
General orthogonal projectors
- Given an m × n orthogonal or unitary matrix U,
find the orthogonal projection z in R(U) of a vector b not necessarily in R(U)
- This is just like the Fourier expansion, for an
incomplete basis
$z = Ux, \qquad (z - b) \perp Uy\ \ \forall y
\;\Rightarrow\; (Uy)^*(Ux - b) = 0
\;\Rightarrow\; y^* U^*(Ux - b) = 0
\;\Rightarrow\; y^* x = y^* U^* b$
Choosing $y = e_i$ for each i proves $x = U^* b$, so $z = U U^* b$
Elementary orthogonal projectors
- For a vector $u \in \mathbb{C}^{n \times 1}$, $u \neq 0$, the matrix
$Q = I - \dfrac{u u^*}{u^* u}$
is called an elementary orthogonal projector
- Note that $Q^2 = Q$ and $Q^* = Q$ (but Q is not a unitary matrix!)
- $x = Qx + (I - Q)x$ with $Qx \perp (I - Q)x$
- $(I - Q)x = \dfrac{u (u^* x)}{u^* u}$ is the orthogonal projection of x in $\mathrm{span}\{u\}$
- $Qx$ is the orthogonal projection of x in the orthogonal complement $u^\perp$ of u
Elementary reflectors (Householder transformation)
- For a vector $u \in \mathbb{C}^{n \times 1}$, $u \neq 0$, the matrix
$R = I - 2\,\dfrac{u u^*}{u^* u}$
is the elementary reflector about $u^\perp$
- Note that $R = 2Q - I$, with $QR = RQ = Q$
- $R^2 = I$, $R^* = R$, and $R^{-1} = R$ (a unitary matrix!)
- $\|x - Qx\| = \|Rx - Qx\| = \left\| \dfrac{u u^*}{u^* u}\, x \right\|$
Geometry of reflectors and projectors
Figure: for $Q = I - \dfrac{u u^*}{u^* u}$ and $R = I - 2\,\dfrac{u u^*}{u^* u}$, the vector x, its projection Qx onto $u^\perp$, its reflection Rx, and the component $(I - Q)x$ along u, with $\|x - Qx\| = \|Rx - Qx\|$
Reflector into given vector
- Given v, we can find u such that $Rv = \mp\mu e$ for any unit vector e ($\|e\| = 1$): take
$u = v \pm \mu e, \qquad \mu = \|v\|\, \dfrac{e^* v}{|e^* v|}$
- Proof: with $R = I - 2\,\dfrac{u u^*}{u^* u}$,
$Rv = v - \dfrac{2 u^* v}{u^* u}\, u = \mp\mu e \;\Leftrightarrow\; 2 u^* v = u^* u$
and indeed $2 u^* v - u^* u = v^* v - |\mu|^2 \pm \bar{\mu}\, e^* v \mp \mu\, v^* e = 0$, since $|\mu| = \|v\|$ and the choice of $\mu$ makes $\bar{\mu}\, e^* v$ real
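A minimal NumPy sketch that builds a reflector sending a vector onto a multiple of $e_1$ (the sign choice avoids cancellation; the test vector is arbitrary):

```python
import numpy as np

def householder_to_e1(v):
    """Elementary reflector R = I - 2 uu^T/(u^T u) with Rv = -sign(v[0]) * ||v|| e1.

    Choosing u = v + sign(v[0]) * ||v|| e1 avoids cancellation when v is close to e1."""
    n = v.size
    e1 = np.zeros(n); e1[0] = 1.0
    sign = 1.0 if v[0] >= 0 else -1.0
    u = v + sign * np.linalg.norm(v) * e1
    return np.eye(n) - 2.0 * np.outer(u, u) / (u @ u)

v = np.array([3.0, 1.0, 2.0])
R = householder_to_e1(v)
print(np.round(R @ v, 12))                                   # (-||v||, 0, 0): entries below the first vanish
print(np.allclose(R @ R, np.eye(3)), np.allclose(R, R.T))    # R^2 = I and R = R^T (unitary and Hermitian)
```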
Plane rotations (Givens rotations)
- Orthogonal matrix of the form
$P_{ij}(c, s) = I + (c - 1)\left(e_i e_i^T + e_j e_j^T\right) + s\left(e_i e_j^T - e_j e_i^T\right)$
i.e. the identity with the entries $[P]_{ii} = [P]_{jj} = c$, $[P]_{ij} = s$, $[P]_{ji} = -s$, where $c^2 + s^2 = 1$
- Performs a rotation in the i,j plane of $\mathbb{R}^n$
Pivoting with plane rotations
- Kill the element below the pivot: with
$c = \dfrac{x_i}{\sqrt{x_i^2 + x_j^2}}, \qquad s = \dfrac{x_j}{\sqrt{x_i^2 + x_j^2}}, \qquad x_i^2 + x_j^2 \neq 0,$
$P_{ij}(c, s)\, x = \big(x_1, \ldots, c x_i + s x_j, \ldots, -s x_i + c x_j, \ldots, x_n\big)^T = \big(x_1, \ldots, \sqrt{x_i^2 + x_j^2}, \ldots, 0, \ldots, x_n\big)^T$
- "Exchange" rows:
$P_{ij}(0, 1)\, x = \big(x_1, \ldots, x_j, \ldots, -x_i, \ldots, x_n\big)^T$
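A small NumPy sketch of a plane rotation that zeroes one entry against a pivot:

```python
import numpy as np

def givens(x, i, j):
    """Plane rotation P_ij(c, s) chosen to zero x[j] against the pivot x[i]."""
    r = np.hypot(x[i], x[j])          # sqrt(x_i^2 + x_j^2), computed safely
    c, s = x[i] / r, x[j] / r
    P = np.eye(x.size)
    P[i, i] = c;  P[i, j] = s
    P[j, i] = -s; P[j, j] = c
    return P

x = np.array([3.0, 4.0, 5.0])
P = givens(x, 0, 1)
print(np.round(P @ x, 12))               # [5, 0, 5]: entry 1 is rotated into entry 0
print(np.allclose(P.T @ P, np.eye(3)))   # P is orthogonal
```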
Combined rotations into given vector
- Rotations from any vector into $\|x\| e_1$:
$P_{12}\, x = \big(\sqrt{x_1^2 + x_2^2},\, 0,\, x_3,\, x_4, \ldots, x_n\big)^T, \quad
P_{13} P_{12}\, x = \big(\sqrt{x_1^2 + x_2^2 + x_3^2},\, 0,\, 0,\, x_4, \ldots, x_n\big)^T, \quad \ldots, \quad
P_{1n} \cdots P_{13} P_{12}\, x = \big(\|x\|,\, 0, \ldots, 0\big)^T = \|x\| e_1$
- Use as a building block for rotations into an arbitrary vector $\|x\| e$, with $\|e\| = 1$
- The result of combined rotations is not a rotation, but it is an orthogonal transformation
Orthogonal reduction
- Matrix A can be reduced to echelon form by
elementary row operations
- Eliminate all entries below pivot
- Matrix A can also be reduced using
elementary reflectors or plane rotations
- Using elementary reflectors leads to Householder reduction
- Using plane rotations leads to Givens
reduction
Orthogonal reduction
- Start by finding $R_1$ such that $R_1 [A]_{*1} = \mu_1 e_1$, so that
$R_1 A = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & A_2 \end{pmatrix}$
- Then find $R_2$ such that $R_2 [A_2]_{*1} = \mu_2 e_1$, so that
$\begin{pmatrix} 1 & 0 \\ 0 & R_2 \end{pmatrix} R_1 A = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & R_2 A_2 \end{pmatrix},
\qquad
R_2 A_2 = \begin{pmatrix} \mu_2 & t_2^T \\ 0 & A_3 \end{pmatrix}$
- And so on…
Reduces to trapezoidal form
$PA$ is upper trapezoidal: an upper-triangular block in the leading rows, with any remaining rows or columns zero (upper triangular when A is square with full rank)
Unicity of QR factorization
- When matrix is square and non-singular,
Gram-Schmidt, Householder, and Givens produce QR, with R triangular
- Gram-Schmidt produces a positive diagonal; Householder and Givens don't necessarily, but can be made to
- In this case, factorization is unique
We now have two factorizations
- The LU factorization and QR factorization
- LU gives the roadmap to Gaussian elimination
applied to a square, non-singular matrix
- QR gives the roadmap to Gram-Schmidt applied to a
matrix with independent columns
- Both LU and QR can be used to solve linear
systems efficiently
- LU about twice as efficient
- QR is useful for least squares as well, more stable
$QRx = b \;\Rightarrow\; Rx = Q^T b$
Practical considerations
- Gram-Schmidt as presented is numerically
unstable in floating-point
- For example if applied to a set of vectors that is not
even close to being orthogonal
- Even modified Gram-Schmidt is stable only for least squares (the sketch after this slide contrasts classical and modified Gram-Schmidt)
- Householder/Givens are stable and faster
- When dealing with sparse problems
- Gram-Schmidt and Householder can transform it into a dense problem
- Not enough memory?
- Maybe Givens can help
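A minimal sketch contrasting classical and modified Gram-Schmidt on a nearly dependent set (a Läuchli-type matrix, chosen here purely for illustration):

```python
import numpy as np

def cgs(X):
    """Classical Gram-Schmidt (as on the earlier slides)."""
    U = np.zeros_like(X)
    for k in range(X.shape[1]):
        v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])   # project against all u_i using the original column
        U[:, k] = v / np.linalg.norm(v)
    return U

def mgs(X):
    """Modified Gram-Schmidt: orthogonalize the working vector one u_i at a time."""
    U = np.zeros_like(X)
    for k in range(X.shape[1]):
        v = X[:, k].copy()
        for i in range(k):
            v -= (U[:, i] @ v) * U[:, i]     # note: uses the updated v, not the original column
        U[:, k] = v / np.linalg.norm(v)
    return U

eps = 1e-8                                    # nearly dependent columns
X = np.array([[1, 1, 1], [eps, 0, 0], [0, eps, 0], [0, 0, eps]], dtype=float)
for f in (cgs, mgs):
    U = f(X)
    print(f.__name__, np.linalg.norm(U.T @ U - np.eye(3)))   # loss of orthogonality
```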
COMPLEMENTARY SUBSPACES
Complementary Subspaces
- Subspaces X, Y of V are said to be complementary whenever $V = X + Y$ and $X \cap Y = \{0\}$, in which case V is said to be the direct sum of X and Y, denoted by $V = X \oplus Y$
- Given bases $B_X$ and $B_Y$ for X and Y, the following statements are equivalent
  - $V = X \oplus Y$
  - For each v in V, there are unique vectors x in X and y in Y such that v = x + y
  - $B_X \cup B_Y$ is a basis for V and $B_X \cap B_Y = \emptyset$
Projection
- Suppose that $V = X \oplus Y$, so that for each v in V there are unique vectors x in X and y in Y such that v = x + y
- The vector x is the projection of v onto X along Y
- The vector y is the projection of v onto Y along X
- When X and Y are not orthogonal subspaces,
this is an “oblique projection”
- As opposed to the orthogonal projection
V = X ⊕ Y
Visually
Figure: v = x + y, with the component x along subspace X and the component y along subspace Y
Computing Oblique Projections
- Assume we have bases BX and BY for complementary
subspaces X and Y of Rn
- This means that $B = B_X \cup B_Y = \{x_1, \ldots, x_r, y_1, \ldots, y_{n-r}\}$ is a basis for $\mathbb{R}^n$
- It is easier to find the coordinate matrix for the projection P onto X along Y in basis B
- Placing the elements of B ($x_i$ and $y_i$) as columns in a matrix B,
$[P]_B = \begin{pmatrix} [P(x_1)]_B & \cdots & [P(x_r)]_B & [P(y_1)]_B & \cdots & [P(y_{n-r})]_B \end{pmatrix}
= \begin{pmatrix} e_1 & \cdots & e_r & 0 & \cdots & 0 \end{pmatrix}
= \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}
\;\Rightarrow\;
P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$
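A small NumPy sketch of an oblique projector built this way, for hypothetical complementary subspaces of $\mathbb{R}^3$:

```python
import numpy as np

# Hypothetical complementary subspaces of R^3: X spanned by x1, x2 and Y spanned by y1
x1, x2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
y1 = np.array([1.0, 1.0, 1.0])

B = np.column_stack([x1, x2, y1])            # basis B_X ∪ B_Y as columns
r = 2                                        # dim X
J = np.zeros((3, 3)); J[:r, :r] = np.eye(r)  # block matrix [[I_r, 0], [0, 0]]
P = B @ J @ np.linalg.inv(B)                 # projector onto X along Y

v = np.array([2.0, 3.0, 4.0])
print(P @ v)                                 # component of v in X
print((np.eye(3) - P) @ v)                   # component of v in Y (a multiple of y1)
print(np.allclose(P @ P, P))                 # P is idempotent
```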
Summary of Projections
- Let X and Y be complementary subspaces of V. Each v in V can
be uniquely expressed as x + y, with x in X and y in Y
- The unique linear operator P defined by Pv = x is the
projector onto X along Y
- The following properties hold
- P2 = P (i.e., P is idempotent)
- I – P is the complementary projector onto Y along X
- R(P) = {x | Px = x}
- R(P) = N(I – P) = X and R(I – P) = N(P) = Y
- If V = Rn or Cn, then P is given by
where B = [X | Y] and X, Y are respective bases for X, Y
$P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$
Projectors and Idempotents
- A linear operator P on V is a projector if and only if it is idempotent
- Proof
- We have just seen that projectors ⇒ idempotents
- Need to prove that idempotent ⇒ projector
$P^2 = P \Rightarrow R(P)$ and $N(P)$ are complementary:
$v = Pv + (I - P)v$ with $Pv \in R(P)$ and $(I - P)v \in N(P)$, so $V = R(P) + N(P)$
$R(P) \cap N(P) = \{0\}$: if $x = Py$ and $Px = 0$, then $0 = Px = P^2 y = Py = x$
RANGE-NULLSPACE DECOMPOSITION
Complementary subspaces associated to a square matrix
- Let’s go back to a matrix An×n
- The Rank+Nullity theorem guarantees that $\dim R(A) + \dim N(A) = n$
- But are R(A) and N(A) complementary?
- They certainly are if A is non-singular. Duh.
- Otherwise, it can happen that $R(A) \cap N(A) \neq \{0\}$, e.g.
$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \in R(A) \cap N(A)$
- We can find complementary subspaces by looking at the powers of A
Range-Nullspace Decomposition
- For every singular matrix An×n there exists a
positive integer k such that $R(A^k)$ and $N(A^k)$ are complementary subspaces, i.e. $\mathbb{R}^n = R(A^k) \oplus N(A^k)$
- The smallest positive integer k for which this holds is called the index of A
- For non-singular matrices, we simply define index(A) = 0
Proof
- The nullspace doesn't shrink and the range doesn't grow with k:
$R(A) \supseteq R(A^2) \supseteq \cdots \supseteq R(A^k) \supseteq R(A^{k+1}) \supseteq \cdots$
$N(A) \subseteq N(A^2) \subseteq \cdots \subseteq N(A^k) \subseteq N(A^{k+1}) \subseteq \cdots$
- At some point there must be equality: the dimensions can't shrink below zero or grow above n
- Once equality is attained, it is preserved forever:
$R(A^k) = R(A^{k+1}) \Rightarrow A\,R(A^k) = A\,R(A^{k+1}) \Rightarrow R(A^{k+1}) = R(A^{k+2})$
- Equality is attained at the same k for the nullspace:
$\dim N(A^k) = n - \dim R(A^k) = n - \dim R(A^{k+1}) = \dim N(A^{k+1})$
- When equality is attained, $R(A^k) \cap N(A^k) = \{0\}$:
$x \in R(A^k) \cap N(A^k) \Rightarrow x = A^k y,\ A^k x = 0 \Rightarrow A^{2k} y = 0 \Rightarrow y \in N(A^{2k}) = N(A^k) \Rightarrow x = A^k y = 0$
- Finally, when equality is attained, $R(A^k) + N(A^k) = \mathbb{R}^n$:
$\dim\!\big[R(A^k) + N(A^k)\big] = \dim R(A^k) + \dim N(A^k) - \dim\!\big(R(A^k) \cap N(A^k)\big) = \dim R(A^k) + \dim N(A^k) = n$
Summary about the Index of a square matrix
- The index of a square matrix A is the smallest
nonnegative integer k such that any of these holds
- rank(Ak) = rank(Ak+1)
- R(Ak) = R(Ak+1) (the point where it stops shrinking)
- N(Ak) = N(Ak+1) (the point where it stops growing)
- For non-singular matrices, index(A) = 0
- For singular matrices, index(A) is the smallest positive integer k such that any of these holds
- R(Ak) ∩ N(Ak) = {0}
Rn = R(Ak) ⊕ N(Ak)
Nilpotent Matrices
- Nn×n is said to be nilpotent whenever Nk = 0
for some positive integer k
- k = index(N) is the smallest positive integer
such that Nk = 0
- Proof
- As soon as Nk = 0, we have R(Nk+i) = {0}
- index(N) ≤ k
- On the other hand, if we reach index(N) and
R(Nk) ≠ {0}, we could never have Nk = 0
- index(N) ≥ k
Example of Nilpotent Matrix
- Start taking powers
- index(N) = 3, since N2 ≠ 0 and N3 = 0
$N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
N^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
N^3 = 0$
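A quick NumPy check of the index by taking powers (using the 3×3 shift matrix assumed in the example above):

```python
import numpy as np

N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

# index(N) = smallest k with N^k = 0; equivalently where the rank finally drops to 0
P = np.eye(3)
for k in range(1, 4):
    P = P @ N
    print(k, np.linalg.matrix_rank(P), np.allclose(P, 0))
# k=1: rank 2, k=2: rank 1, k=3: rank 0  ->  index(N) = 3
```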
Application for the Range-Nullspace decomposition
- If k = index(A), then both R(Ak) and
N(Ak) are invariant subspaces of A
- When we write A in basis [R(Ak) N(Ak)], we
get a block diagonal matrix
- This will be very useful as a step toward the
Jordan form we will see later on
$A\,R(A^k) = R(A^{k+1}) = R(A^k)$
$x \in A\,N(A^k) \Rightarrow x = Aw,\ w \in N(A^k) = N(A^{k+1}) \Rightarrow A^k x = A^{k+1} w = 0 \Rightarrow x \in N(A^k)$
Core-Nilpotent Decomposition
- Let $A_{n \times n}$ be a singular matrix with index k, and let $\mathrm{rank}(A^k) = r$. There exists a non-singular matrix Q such that the block-diagonal form below holds, where C is nonsingular and N is nilpotent of index k
- A is similar to a 2×2 block diagonal matrix containing a
non-singular “core” C and a nilpotent component N
- This block diagonal matrix is known as the
core-nilpotent decomposition of A
- When A is non-singular, this is trivial (and useless)
$Q^{-1} A Q = \begin{pmatrix} C_{r \times r} & 0 \\ 0 & N \end{pmatrix}$
Core-Nilpotent Decomposition
- Proof
- We have already seen the block diagonal property
- To see that N is nilpotent, write $Q = \begin{pmatrix} X & Y \end{pmatrix}$ and $Q^{-1} = \begin{pmatrix} U \\ V \end{pmatrix}$, with the columns of X, Y bases for $R(A^k)$, $N(A^k)$:
$\begin{pmatrix} C^k & 0 \\ 0 & N^k \end{pmatrix} = Q^{-1} A^k Q
= \begin{pmatrix} U \\ V \end{pmatrix} A^k \begin{pmatrix} X & Y \end{pmatrix}
= \begin{pmatrix} U A^k X & 0 \\ V A^k X & 0 \end{pmatrix}$
since $A^k Y = 0$; hence $N^k = 0$
- Since $\mathrm{rank}(C^k) = \mathrm{rank}(A^k) = r$, C is non-singular
- To see that index(N) = k, imagine it is smaller
  - Then $\mathrm{rank}(A^{k-1}) = \mathrm{rank}(C^{k-1}) = r = \mathrm{rank}(A^k)$
  - So index(A) < k, which is a contradiction
ORTHOGONAL DECOMPOSITION
Orthogonal Complement
- For a subset M of an inner-product space V,
the orthogonal complement M⊥ of M is the set of all vectors in V that are orthogonal to all vectors in M
$M^\perp = \{x \in V \mid \langle m \mid x \rangle = 0 \text{ for all } m \in M\}$
Figure: the orthogonal complement $M^\perp$ of a subspace M, and of a single vector $M = \{x\}$
Orthogonal Complementary Subspaces
- If M is a subspace of a finite-dimensional inner-product space V, then $V = M \oplus M^\perp$
- Furthermore, if N is another subspace such that $V = M \oplus N$ and $N \perp M$, then $N = M^\perp$
- Finally, if dim V = n, then $\dim M^\perp = n - \dim M$ and $\big(M^\perp\big)^\perp = M$
(Taking $N = M^\perp$: $N \oplus M = V$ and $N \oplus N^\perp = V$ together give $N^\perp = M$.)
Orthogonal Decomposition Theorem
- For every $A_{m \times n}$: $R(A)^\perp = N(A^T)$ and $N(A)^\perp = R(A^T)$
- Proof: $x \in R(A)^\perp \Leftrightarrow x^T A y = 0\ \forall y \Leftrightarrow x^T A = 0 \Leftrightarrow x \in N(A^T)$ (the other statement follows by replacing A with $A^T$)
- In other words, every matrix in $\mathbb{R}^{m \times n}$ produces orthogonal decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$:
$\mathbb{R}^m = R(A) \oplus N(A^T), \qquad \mathbb{R}^n = N(A) \oplus R(A^T)$
Significance #1
- The orthogonal decomposition theorem give us
decompositions of Rm and Rn in terms of the four fundamental subspaces
- The resulting bases for Rm and Rn reveal more
about basic components of matrix A itself
- Combine orthonormal bases for R(A), $N(A^T)$ and for $R(A^T)$, N(A) into bases of $\mathbb{R}^m$ and $\mathbb{R}^n$:
$U_{m \times m} = \big(\, \text{basis of } R(A) \;\big|\; \text{basis of } N(A^T) \,\big),
\qquad
V_{n \times n} = \big(\, \text{basis of } R(A^T) \;\big|\; \text{basis of } N(A) \,\big)$
Significance #2
- What does A look like w.r.t. these bases?
- Multiply on the right by V and on the left by U-1
$U_{m \times m} = \big( R(A) \mid N(A^T) \big), \qquad V_{n \times n} = \big( R(A^T) \mid N(A) \big)$
$R = U^{-1} A V = U^T A V = U^T A \big( R(A^T) \mid N(A) \big) = U^T \big( A\,R(A^T) \mid 0 \big)
= \begin{pmatrix} R(A)^T \\ N(A^T)^T \end{pmatrix} \big( A\,R(A^T) \mid 0 \big)
= \begin{pmatrix} R(A)^T A\,R(A^T) & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}$
(here each subspace name stands for the matrix whose columns are its chosen orthonormal basis)
The URV Factorization
- For each matrix $A_{m \times n}$ in $\mathbb{R}^{m \times n}$ of rank r, there are orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$ and a non-singular matrix $C_{r \times r}$ such that
$A = U R V^T = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}_{m \times n} V^T$
- The first r columns of U are an orthonormal basis for R(A)
- The last m – r columns of U are an orthonormal basis for $N(A^T)$
- The first r columns of V are an orthonormal basis for $R(A^T)$
- The last n – r columns of V are an orthonormal basis for N(A)
- In the complex case, replace orthogonal by unitary and conjugate the transposes
Four fundamental subspaces
Figure: the four fundamental subspaces. $\mathbb{R}^n = R(A^T) \oplus N(A)$, with $\dim R(A^T) = r$ and $\dim N(A) = n - r$; vectors $A^T y \in R(A^T)$ and $z \in N(A)$ ($Az = 0$) satisfy $z \perp A^T y$. $\mathbb{R}^m = R(A) \oplus N(A^T)$, with $\dim R(A) = r$ and $\dim N(A^T) = m - r$; vectors $Ax \in R(A)$ and $w \in N(A^T)$ ($A^T w = 0$) satisfy $w \perp Ax$.
The action of A
Figure: A maps $\mathbb{R}^n = R(A^T) \oplus N(A)$ into $\mathbb{R}^m = R(A) \oplus N(A^T)$; a vector $v = x + z$ with $x \in R(A^T)$ and $z \in N(A)$ is sent to $Av = Ax \in R(A)$, so the N(A) component is lost.
Inverting what we can
Figure: the pseudoinverse $A^\dagger$ sends $u = y + w$, with $y \in R(A)$ and $w \in N(A^T)$, back to the unique $x \in R(A^T)$ with $Ax = y$, discarding the $N(A^T)$ component.
Finding a matrix representation
- Start from an URV factorization of A
- Singularity of A has been isolated
- Problem comes from N(A) and N(AT )
- Map from R(AT ) and R(A) is invertible
- It is given by invertible matrix C
- Simply transpose and replace C with C–1
$A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T
\;\Rightarrow\;
A^\dagger_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$
The Moore-Penrose Inverse
- In terms of the URV factorization, the Moore-Penrose inverse of $A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T$ is
$A^\dagger_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$
- When Ax = b is consistent, $x = A^\dagger b$ is the solution with minimal Euclidean norm
- When Ax = b is inconsistent, $x = A^\dagger b$ is the least squares solution of minimal Euclidean norm
- Note: much like the inverse, the pseudoinverse is useful in theory, not in practical applications
The Moore-Penrose Inverse
- Proof for the consistent case
- $x_0 = A^\dagger b$ solves Ax = b: using $A = A A^\dagger A$,
$A x_0 = A A^\dagger b = A A^\dagger A x = A x = b$
- $x_0 = A^\dagger b$ has minimum Euclidean norm: $R(A^\dagger) = R(A^T)$, and every solution has the form $x = x_0 + n$ with $n \in N(A)$; since $R(A^T) \perp N(A)$, $x_0 \perp n$, so
$\|x\|^2 = \|x_0 + n\|^2 = \|x_0\|^2 + \|n\|^2 \geq \|x_0\|^2$
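A minimal NumPy sketch building $A^\dagger$ from the SVD (a URV factorization with a diagonal core) and checking it on a consistent system; the matrix is random and rank-deficient by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))    # 5x4 matrix of rank 3

# Build A† = V C^{-1} U^T from the compact SVD
U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10 * s[0])                                     # numerical rank
A_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

print(np.allclose(A_pinv, np.linalg.pinv(A)))                    # agrees with NumPy's pinv

# Consistent system: x = A† b solves Ax = b (and has minimal norm)
b = A @ rng.standard_normal(4)                                   # b lies in R(A) by construction
x = A_pinv @ b
print(np.allclose(A @ x, b))
```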
Matrix decompositions based on four fundamental spaces #1
- Orthogonal decomposition
- Used by URV decomposition
- Works with any matrix
- Different orthogonal transformations on each side
- Specializes to the singular value decomposition
- Range-Nullspace decomposition
- Used by Core-Nilpotent factorization
- Matrix must be square
- Based on similarity transformation
- Paves the way to the Jordan form
Matrix decompositions based on four fundamental spaces #2
- Similarities reveal properties of a transformation that
are independent of bases or coordinate systems
- Orthogonality is useful in least-squares, or for numerical stability when using floating-point
- Core-Nilpotent is usually not a special case of URV
- Not even necessarily when A is a square matrix
- $URV^T$ is not a similarity transformation unless U = V
- Surprisingly, U = V defines a rather large
class of matrices
Range Perpendicular to Nullspace
- For rank(An×n) = r, the following are equivalent
- Such matrices are called RPN matrices, or range-
symmetric matrices, or EP matrices
- In the complex case, replace orthogonal by unitary
and conjugate the transposes
$R(A) \perp N(A)
\;\Leftrightarrow\; R(A) = R(A^T)
\;\Leftrightarrow\; N(A) = N(A^T)
\;\Leftrightarrow\; A = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$ with U orthogonal and C non-singular
A map of matrix types
Figure: map of matrix types — the RPN matrices contain the Normal matrices (which contain the Hermitian and Real-Symmetric ones) as well as the Non-singular matrices.
SINGULAR VALUE DECOMPOSITION
Can we specialize URV even more?
- We can certainly make C triangular
- Factor C = QR
- Note that $\begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix}$
- So that $A = U \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix} V^T$
- We now do even better, by making C diagonal
- This leads to the Singular Value Decomposition
- SVD is the Swiss-army knife of decompositions
Derivation of SVD #1
- We proceed incrementally, one singular value at a time
- Multiplying by blocks, we want orthogonal $P_1 = \begin{pmatrix} x_1 & X_1 \end{pmatrix}$ and $Q_1 = \begin{pmatrix} y_1 & Y_1 \end{pmatrix}$ with
$P_1^T C_1 Q_1 = \begin{pmatrix} x_1^T \\ X_1^T \end{pmatrix} C_1 \begin{pmatrix} y_1 & Y_1 \end{pmatrix}
= \begin{pmatrix} x_1^T C_1 y_1 & x_1^T C_1 Y_1 \\ X_1^T C_1 y_1 & X_1^T C_1 Y_1 \end{pmatrix}
= \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix} = D_1,$
and then repeat on $C_2$: $D_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix} = P_2^T C_2 Q_2$, …
- So we have two equations to satisfy: $X_1^T C_1 y_1 = 0$ and $x_1^T C_1 Y_1 = 0$
- The first requires $C_1 y_1 \perp X_1$, and we know $x_1 \perp X_1$
- Simply use $C_1 y_1$ to define $x_1$ and normalize:
$x_1 = \dfrac{C_1 y_1}{\|C_1 y_1\|}
\;\Rightarrow\;
X_1^T C_1 y_1 = X_1^T x_1 \|C_1 y_1\| = 0$
Derivation of SVD #2
- The second equation requires $x_1^T C_1 Y_1 = 0$
- In other words, $C_1^T x_1 \perp Y_1$, i.e. $C_1^T C_1 y_1 \perp Y_1$
- Problem solved if there is $y_1$ such that $C_1^T C_1 y_1 = \lambda y_1$
- Same as $(C_1^T C_1 - \lambda I)\, y_1 = 0$
- There is such a $y_1$: take $y_1 = \arg\max_{\|y\|=1} \|C_1 y\|$, and
$C_1^T C_1 y_1 = \lambda_1 y_1 \;\Rightarrow\; \|C_1 y_1\|^2 = \lambda_1$
- Now that we can pick $x_1$ and $y_1$, complete $X_1$ and $Y_1$
- Then set $C_2 = X_1^T C_1 Y_1$
- Finally, note that $x_1^T C_1 y_1 = \sigma_1 = \|C_1 y_1\| \;\Rightarrow\; \sigma_1^2 = \lambda_1$
Derivation of SVD #3
- Now that we have $D_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix} = P_2^T C_2 Q_2$, set
$\hat{P}_2 = \begin{pmatrix} 1 & 0 \\ 0 & P_2 \end{pmatrix}, \qquad \hat{Q}_2 = \begin{pmatrix} 1 & 0 \\ 0 & Q_2 \end{pmatrix}$
so that we get
$\hat{D}_2 = \hat{P}_2^T P_1^T C_1 Q_1 \hat{Q}_2 = \begin{pmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & C_3 \end{pmatrix}$
- After r – 1 steps,
$\hat{P}^T C \hat{Q} = D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$
with $\hat{P}$ and $\hat{Q}$ orthogonal
- To complete the proof, use $U^T A V = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix}$ and
$\begin{pmatrix} \hat{P}^T & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \hat{Q} & 0 \\ 0 & I \end{pmatrix}
= \begin{pmatrix} \hat{P}^T C \hat{Q} & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$
The Singular Value Decomposition
- For each A in $\mathbb{R}^{m \times n}$ of rank r, there are orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$, and a diagonal matrix $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, such that
$A = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T$
- The $\sigma_i$ are called the nonzero singular values of A
- When r < p = min(m,n), A is said to have p – r additional zero singular values
- This factorization is called the singular value decomposition of A
- The columns of U and V are called left-hand and right-hand singular vectors of A, respectively
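A quick NumPy sketch (random matrix) verifying the factorization and the singular-vector relation:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A)           # full SVD: U is 6x6, Vt is 4x4, s holds sigma_1 >= ... >= sigma_r

# Rebuild A = U [D 0; 0 0] V^T
S = np.zeros((6, 4)); S[:4, :4] = np.diag(s)
print(np.allclose(U @ S @ Vt, A))

# Columns of U and rows of Vt are left- and right-hand singular vectors: A v_i = sigma_i u_i
i = 0
print(np.allclose(A @ Vt[i], s[i] * U[:, i]))
```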
Left and right singular vectors
- The bases for R(AT ) in V and for R(A) in U
are chosen in a very special way
$AV = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; A v_i = \sigma_i u_i,
\qquad
U^T A = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T \;\Rightarrow\; u_i^T A = \sigma_i v_i^T$
Image of Unit Sphere
- For a nonsingular matrix A in $\mathbb{R}^{n \times n}$ having singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$ and an SVD $A = U D V^T$, where $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, the image of the unit 2-sphere is an ellipsoid whose kth semiaxis is given by $\sigma_k u_k$
- Furthermore, $v_k$ in the unit sphere is such that $A v_k = \sigma_k u_k$
Figure: the unit sphere (spanned by $v_1, v_2, v_3$) is mapped by A to an ellipsoid with semiaxes $A v_1 = \sigma_1 u_1$, $A v_2 = \sigma_2 u_2$, $A v_3 = \sigma_3 u_3$
Connection to ATA and AAT
- The squared singular values $\sigma_i^2$ are the nonzero eigenvalues of both $A^T A$ and $A A^T$
- Furthermore, the vectors $v_i$ in V and $u_i$ in U are corresponding eigenvectors of $A^T A$ and $A A^T$
- Proof:
$(A^T A)_{n \times n} = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} V^T
\;\Rightarrow\; A^T A\, V = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix}
\;\Rightarrow\; A^T A\, v_i = \sigma_i^2 v_i$
$(A A^T)_{m \times m} = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} U^T
\;\Rightarrow\; A A^T U = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix}
\;\Rightarrow\; A A^T u_i = \sigma_i^2 u_i$
Compact SVDs
- When matrix $A_{m \times n}$ has many more rows than columns, n << m, it is better to use the thin SVD: $A = U_n \Sigma_n V^T$
  - Compute only the first n columns of U and the $n \times n$ block $\Sigma_n$
  - We usually don't care about the nullspaces
- When rank(A) = r is much smaller than the number of rows or columns, use the economy/compact SVD: $A = U_r \Sigma_r V_r^T$
  - Compute and store only the first r columns
Dimensionality Reduction
- Map points in high-dimensional space to points
in a lower-dimensional space
- Try to preserve structure
- Pairwise distances etc
- Useful for further processing
- Less computation, fewer parameters
- Easier to understand, visualize
Principal Component Analysis
- Principal Component Analysis (PCA) is a method
for approximating a high-dimensional data set using a lower-dimensional affine space
Figure: a cloud of data points shown with the original axes and with the first and second principal components.
PCA: Formal definition
- Given a set of points $D = \{x_1, x_2, \ldots, x_N\}$ in $\mathbb{R}^n$, can we find the m-dimensional subspace V of $\mathbb{R}^n$ that allows us to best represent the set D?
- For each $x_i$, we want to find a $y_i$ such that $y_i = P x_i + a$, with $P x_i \in V$ and $a \in \mathbb{R}^n$
- Given m, find V and a that minimize
$\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|y_i - x_i\|^2 = \dfrac{1}{N} \sum_{i=1}^{N} \|a - (I - P) x_i\|^2$
PCA: Derivation #1
- First, let us find the value of a
$\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \big(a - (I - P) x_i\big)^T \big(a - (I - P) x_i\big)
= \dfrac{1}{N} \sum_{i=1}^{N} \Big[ a^T a - 2 a^T (I - P) x_i + x_i^T (I - P)^T (I - P) x_i \Big]$
- Using calculus,
$\dfrac{\partial \varepsilon}{\partial [a]_k} = 0
\;\Rightarrow\; \dfrac{1}{N} \sum_{i=1}^{N} \Big[ 2 e_k^T a - 2 e_k^T (I - P) x_i \Big] = 0
\;\Rightarrow\; a = \dfrac{1}{N} \sum_{i=1}^{N} (I - P) x_i = (I - P) \dfrac{\sum_{i=1}^{N} x_i}{N}
\;\Rightarrow\; a = (I - P)\,\bar{x}$
PCA: Derivation #2
- Now that we know a, we can write
$\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|(I - P)(x_i - \bar{x})\|^2$
- Take orthonormal bases for V and its orthogonal complement and put them in matrices V and U, so that $I - P = U U^T$:
$\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|U U^T (x_i - \bar{x})\|^2
= \dfrac{1}{N} \sum_{i=1}^{N} \|U^T (x_i - \bar{x})\|^2
= \dfrac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{n-m} u_j^T (x_i - \bar{x})(x_i - \bar{x})^T u_j
= \sum_{j=1}^{n-m} u_j^T \Big( \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T \Big) u_j
= \sum_{j=1}^{n-m} u_j^T S\, u_j$
- S is the covariance matrix, and it does not depend on U:
$S = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T = \dfrac{1}{N} X X^T,
\qquad [X]_{*j} = x_j - \bar{x}$
PCA: Derivation #3
- Want to use calculus again to find the $u_i$ minimizing
$\varepsilon = \sum_{j=1}^{n-m} u_j^T S\, u_j$
- Unconstrained minimization gives $u_i = 0$, so we use a Lagrange multiplier to enforce $\|u_i\|_2 = 1$:
$\bar{\varepsilon} = \varepsilon - \sum_{j=1}^{n-m} \lambda_j \big(u_j^T u_j - 1\big)$
- Differentiating,
$\dfrac{\partial \bar{\varepsilon}}{\partial [u_i]_k} = 0
\;\Rightarrow\; S u_i = \lambda_i u_i$, i.e. $u_i$ is an eigenvector of $X X^T$ (a left singular vector of X),
$\qquad \text{with } \bar{\varepsilon} = \sum_{j=1}^{n-m} \lambda_j$
- We simply pick the left singular vectors $u_i$ associated to the n – m smallest singular values of X as a basis for $V^\perp$
- Pick the remaining singular vectors to form the basis of V (see the sketch after this slide)
- Each $v_i$ gives a principal component
- The corresponding $\sigma_i$ gives the associated importance
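A minimal PCA sketch in NumPy on synthetic data: center the points, take the SVD of the centered data matrix, and check the error formula derived above:

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic data: N points in R^3 concentrated near a 2-D plane
X = rng.standard_normal((3, 200)) * np.array([[5.0], [2.0], [0.1]])

x_bar = X.mean(axis=1, keepdims=True)
Xc = X - x_bar                                     # center the data: columns are x_i - x_bar

U, s, _ = np.linalg.svd(Xc, full_matrices=False)   # left singular vectors = principal directions
m = 2
V_basis = U[:, :m]                                 # basis for the best m-dimensional subspace V

# Project the data onto the affine space x_bar + V
Y = V_basis @ (V_basis.T @ Xc) + x_bar
err = np.mean(np.sum((Y - X) ** 2, axis=0))        # average squared reconstruction error
print(err, np.sum(s[m:] ** 2) / X.shape[1])        # equals the discarded sigma_i^2 / N
```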
Comparing PCA to Least Squares
Figure: comparison of the least-squares fit with the first principal component $v_1$ (and offset a) of the same data.
Example uses of SVD
- Image Compression
- Look at image as a matrix
- Store only truncated SVD decomposition
- First few singular values/vectors should be enough
- Recognition
- Use PCA on training set
- Project unknown face on first few principal components
- Compare projection coordinates to other faces in database
SVD for Image Compression
- Look at an image as a matrix A and compute SVD
- Truncate to first p singular values and vectors
$\tilde{A}_{m \times n} = (U)_{m \times p}\, (\Sigma)_{p \times p}\, (V^T)_{p \times n}$
Figure: a 512×512 image reconstructed with p = 1, 2, 5, 10, 15, 20, 25, 30, and 50 singular values, requiring roughly 0.5%, 0.8%, 2%, 4%, 6%, 8%, 10%, 12%, and 20% of the original storage.
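A small NumPy sketch of truncated-SVD compression; a synthetic smooth array stands in for the image:

```python
import numpy as np

rng = np.random.default_rng(6)
# Stand-in "image": a smooth 512x512 array (a real photo would be loaded instead)
t = np.linspace(0, 1, 512)
img = np.outer(np.sin(8 * t), np.cos(5 * t)) + 0.01 * rng.standard_normal((512, 512))

U, s, Vt = np.linalg.svd(img, full_matrices=False)
for p in (1, 2, 5, 10, 25, 50):
    approx = U[:, :p] @ np.diag(s[:p]) @ Vt[:p]           # truncated SVD with p terms
    storage = p * (512 + 512 + 1) / 512**2                # fraction of original storage
    rel_err = np.linalg.norm(img - approx) / np.linalg.norm(img)
    print(f"p={p:3d}  storage={storage:6.2%}  rel. error={rel_err:.3f}")
```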
PCA on Faces: “Eigenfaces”
Figure: training faces in one of 10 different poses, the first 10 principal components ("eigenfaces"), and the average face.
PCA on Faces: “Eigenfaces”
Figure: faces in an unseen pose, their principal-value spectrum, and reconstructions using 5, 10, 15, 20, 25, and all principal components.
SVD connection to Matrix 2-norm
- For a nonsingular matrix A in $\mathbb{R}^{n \times n}$ having singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$ and an SVD $A = U D V^T$, where $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, we have
$\sigma_1 = \|A v_1\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \|A\|_2,
\qquad
\sigma_n = \|A v_n\|_2 = \min_{\|x\|_2 = 1} \|Ax\|_2 = 1 / \|A^{-1}\|_2$
- The degree of distortion of the unit sphere transformed by A is measured by the 2-norm condition number
$\kappa_2 = \dfrac{\sigma_1}{\sigma_n} = \|A\|_2\, \|A^{-1}\|_2 \geq 1$
- Note that $\kappa_2 = 1 \Leftrightarrow A$ is a multiple of an orthogonal matrix
Uncertainty in Linear Systems
- Many sources of uncertainty
- Modeling errors: simplifying assumptions
- Data collection: measurements have finite precision
- Data entry: computers have finite precision
- Round-off errors during computation
- How can we evaluate the accuracy of a solution?
- When A is known exactly, but b has uncertainty
- Can we even check the answer we found?
Uncertainty in b
- Wanted Ax = b, ended up with $A\tilde{x} = b - e = \tilde{b}$
- Compute the relative uncertainty $\|x - \tilde{x}\| / \|x\|$ in x from the relative uncertainty $\|b - \tilde{b}\| / \|b\| = \|e\| / \|b\|$ in b:
$\|x - \tilde{x}\| = \|A^{-1}(Ax - A\tilde{x})\| = \|A^{-1}(b - \tilde{b})\| = \|A^{-1} e\| \leq \|A^{-1}\| \|e\|$
and $\|b\| = \|Ax\| \leq \|A\| \|x\|$, so
$\dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \|A\|\, \|A^{-1}\|\, \dfrac{\|e\|}{\|b\|} = \kappa\, \dfrac{\|e\|}{\|b\|}$
- Conversely, $\|e\| = \|b - \tilde{b}\| = \|A(x - \tilde{x})\| \leq \|A\| \|x - \tilde{x}\|$ and $\|x\| = \|A^{-1} b\| \leq \|A^{-1}\| \|b\|$, so
$\dfrac{\|x - \tilde{x}\|}{\|x\|} \geq \dfrac{\|e\|}{\|A\|\, \|A^{-1}\|\, \|b\|} = \dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|}$
- So
$\dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|} \leq \dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa\, \dfrac{\|e\|}{\|b\|},
\qquad \text{with } \kappa = \|A\|\, \|A^{-1}\| \geq 1$
Ill-conditioning Example
$A = \begin{pmatrix} .2315 & -.3071 \\ -.5525 & .7387 \end{pmatrix},
\qquad Ax = b, \qquad A\tilde{x} = \tilde{b} = b + e,
\qquad
\dfrac{1}{\kappa}\,\dfrac{\|e\|}{\|b\|} \leq \dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa\,\dfrac{\|e\|}{\|b\|}$
Case 1: $b_1 = \begin{pmatrix} .6 \\ -.8 \end{pmatrix}$, $\tilde{b}_1 = \begin{pmatrix} .6008 \\ -.7994 \end{pmatrix}$, $e_1 = \begin{pmatrix} .0008 \\ .0006 \end{pmatrix}$; then
$x_1 = A^{-1} b_1 = \begin{pmatrix} .3846 \\ -.9231 \end{pmatrix}$, $\tilde{x}_1 = A^{-1} \tilde{b}_1 = \begin{pmatrix} 1.3077 \\ -.5385 \end{pmatrix}$, and $\dfrac{\|x_1 - \tilde{x}_1\|}{\|x_1\|} = 1$
Case 2: $b_2 = \begin{pmatrix} .8 \\ .6 \end{pmatrix}$, $\tilde{b}_2 = \begin{pmatrix} .8006 \\ .5992 \end{pmatrix}$, $e_2 = \begin{pmatrix} .0006 \\ -.0008 \end{pmatrix}$; then
$x_2 = A^{-1} b_2 = \begin{pmatrix} 923.0769 \\ 384.6154 \end{pmatrix}$, $\tilde{x}_2 = A^{-1} \tilde{b}_2 = \begin{pmatrix} 923.0773 \\ 384.6145 \end{pmatrix}$, and $\dfrac{\|x_2 - \tilde{x}_2\|}{\|x_2\|} = 10^{-6}$
Worst and Best Cases for Uncertainty
- When κ is small, we have a well-conditioned system
- Small relative uncertainty cannot have large impact on solution
- When κ is large, we have an ill-conditioned system
- Small relative uncertainty may or may not have large impact
- Can we reach either side of the equality?
- Things are easier to analyze when using the 2-norm
- Pick e and b along left singular vectors of A: with $e = \epsilon\, u_i$ and $b = \beta\, u_j$,
$\dfrac{\|x - \tilde{x}\|_2}{\|x\|_2}
= \dfrac{\|A^{-1} e\|_2}{\|A^{-1} b\|_2}
= \dfrac{\|A^{-1} \epsilon\, u_i\|_2}{\|A^{-1} \beta\, u_j\|_2}
= \dfrac{\|\epsilon\, v_i / \sigma_i\|_2}{\|\beta\, v_j / \sigma_j\|_2}
= \dfrac{\sigma_j}{\sigma_i}\, \dfrac{|\epsilon|}{|\beta|}
= \dfrac{\sigma_j}{\sigma_i}\, \dfrac{\|e\|_2}{\|b\|_2}$
- Since $\kappa_2 = \sigma_1 / \sigma_n$, we can reach both sides of the inequality
$\dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|} \leq \dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa\, \dfrac{\|e\|}{\|b\|}$
Ill-conditioning Example Explained
$A = \begin{pmatrix} .2315 & -.3071 \\ -.5525 & .7387 \end{pmatrix}
= \dfrac{1}{5} \begin{pmatrix} 3 & 4 \\ -4 & 3 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 10^{-3} \end{pmatrix}
\left[ \dfrac{1}{13} \begin{pmatrix} 5 & 12 \\ -12 & 5 \end{pmatrix} \right]^T,
\qquad \kappa_2 = \dfrac{\sigma_1}{\sigma_2} = 10^3$
Case 1: $b_1 = \dfrac{1}{5} \begin{pmatrix} 3 \\ -4 \end{pmatrix} = u_1$ and $e_1 = \dfrac{10^{-3}}{5} \begin{pmatrix} 4 \\ 3 \end{pmatrix} = 10^{-3} u_2$, so
$\dfrac{\|x_1 - \tilde{x}_1\|}{\|x_1\|} = \kappa\, \dfrac{\|e_1\|}{\|b_1\|} = 1$
Case 2: $b_2 = \dfrac{1}{5} \begin{pmatrix} 4 \\ 3 \end{pmatrix} = u_2$ and $e_2 = \dfrac{10^{-3}}{5} \begin{pmatrix} 3 \\ -4 \end{pmatrix} = 10^{-3} u_1$, so
$\dfrac{\|x_2 - \tilde{x}_2\|}{\|x_2\|} = \dfrac{1}{\kappa}\, \dfrac{\|e_2\|}{\|b_2\|} = 10^{-6}$
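A short NumPy sketch reproducing this example; because the entries of A are rounded, the computed values are only approximately 1 and $10^{-6}$:

```python
import numpy as np

A = np.array([[0.2315, -0.3071],
              [-0.5525, 0.7387]])

s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[1])                      # kappa_2: on the order of 10^3

def rel_err(b, e):
    x = np.linalg.solve(A, b)
    x_t = np.linalg.solve(A, b + e)     # perturbed right-hand side b~ = b + e
    return np.linalg.norm(x - x_t) / np.linalg.norm(x)

b1, e1 = np.array([0.6, -0.8]), np.array([0.0008, 0.0006])
b2, e2 = np.array([0.8, 0.6]),  np.array([0.0006, -0.0008])
print(rel_err(b1, e1))   # ~1     : e along the direction of the small singular value
print(rel_err(b2, e2))   # ~1e-6  : e along the direction of the large singular value
```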
Checking the solution
- Assume we found a solution $\tilde{x}$ for Ax = b
- Compute the residual $r = b - A\tilde{x}$, so that $A\tilde{x} = b - r$
- If r = 0 exactly, then $\tilde{x}$ must be the exact solution
- What about when r is only close to zero (say, with t digits)?
- Can we say anything about how accurate $\tilde{x}$ is?
- We have already answered this question
- Set $A\tilde{x} = b - r$, so that r plays the role of e in our previous expression:
$\dfrac{1}{\kappa}\, \dfrac{\|r\|}{\|b\|} \leq \dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa\, \dfrac{\|r\|}{\|b\|}$
- Residuals are an indication of solution accuracy only when the matrix is well conditioned