NORMS, INNER PRODUCTS, AND ORTHOGONALITY
Vector norms
- Generalize the familiar concept of “length”
from R2 and R3 to higher dimensions
- Can’t really visualize in higher dimensions
- Must trust algebra
- Generalize the concept of length itself
- Find basic properties every “length” should have
- Find different scalar functions of vectors that
share these properties
Euclidean Vector Norm
- In R2 and R3 we have Pythagoras!
- Begs the generalization to higher dimensions
- In $\mathbb{R}^2$, for $v = (x, y)^T$: $\|v\| = \sqrt{x^2 + y^2}$
- In $\mathbb{R}^3$, for $u = (x, y, z)^T$: $\|u\| = \sqrt{\|v\|^2 + z^2} = \sqrt{x^2 + y^2 + z^2} = \sqrt{u^T u}$
- In general, $\|w\| = \left( \sum_{i=1}^n w_i^2 \right)^{1/2} = \sqrt{w^T w}$ for $w \in \mathbb{R}^{n \times 1}$
- In the complex case, $\|w\| = \left( \sum_{i=1}^n |w_i|^2 \right)^{1/2} = \sqrt{w^* w}$ for $w \in \mathbb{C}^{n \times 1}$
A few notes
- The distance between two points u and v in $\mathbb{R}^n$ is simply $\|u - v\|$
- The definition guarantees $\|v\| \geq 0$, $\|v\| = 0 \Rightarrow v = 0$, and $\|\alpha v\| = |\alpha| \|v\|$ for all scalars $\alpha$
- The unit norm vector in the same direction as v is simply $v / \|v\|$
Connection to standard inner-product
- We already saw one connection between the
standard inner-product and Euclidean norm
- There is another important connection in the form of an inequality known as the Cauchy-Bunyakovskii-Schwarz (CBS) Inequality
$\|x\| = \sqrt{x^T x}, \qquad x^T y = \sum_{i=1}^n x_i y_i, \qquad x, y \in \mathbb{R}^{n \times 1}$
$\|x\| = \sqrt{x^* x}, \qquad x^* y = \sum_{i=1}^n \bar{x}_i y_i, \qquad x, y \in \mathbb{C}^{n \times 1}$
The CBS Inequality
- The Cauchy-Bunyakovskii-Schwarz inequality: $|x^* y| \leq \|x\| \|y\|$
- Furthermore, equality holds iff $y = \alpha x$
- Useful, for example, to prove another inequality: the Triangle Inequality
- It is also important when we generalize the concept of angle to higher dimensions
Proof of CBS Inequality
- Equality certainly holds when x = 0
- Equality also holds when $y = \alpha x$, since then
$|x^* y|^2 = (x^* y)(x^* y)^* = (x^* y)(y^* x) = \big(x^*(\alpha x)\big)\big((\bar{\alpha} x^*) x\big) = (x^* x)(\bar{\alpha} x^*)(\alpha x) = (x^* x)(y^* y)$
- Otherwise, look at the norm of $\alpha x - y$:
$0 < \|\alpha x - y\|^2 = (\alpha x - y)^*(\alpha x - y) = \bar{\alpha} x^*(\alpha x - y) - y^*(\alpha x - y)$
- Pick $\alpha = \dfrac{x^* y}{x^* x}$ to kill the first term:
$0 < y^* y - \alpha\, y^* x = \dfrac{(y^* y)(x^* x) - (x^* y)(y^* x)}{x^* x} = \dfrac{\|y\|^2 \|x\|^2 - |x^* y|^2}{\|x\|^2}$
- Hence $|x^* y| \leq \|x\| \|y\|$, with equality only in the cases above
Triangle Inequality
- Important to ensure our concept of length in
higher dimensions makes sense
- Proof
$\|u + v\| \leq \|u\| + \|v\|, \qquad u, v \in \mathbb{C}^n$
$\|u + v\|^2 = (u + v)^*(u + v) = u^* u + u^* v + v^* u + v^* v = \|u\|^2 + u^* v + v^* u + \|v\|^2$
$u^* v + v^* u = 2\,\mathrm{Re}(u^* v) \leq 2|u^* v| \leq 2\|u\|\|v\|$
$\Rightarrow\; \|u + v\|^2 \leq \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = \big(\|u\| + \|v\|\big)^2$
Other notions of length
- There are three commonly used norms
- Special cases of the p-norm, p ≥ 1
- Euclidean norm: $\|x\|_2 = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}$
- Manhattan (grid) norm: $\|x\|_1 = \sum_{i=1}^n |x_i|$
- Max norm: $\|x\|_\infty = \max_i |x_i|$
- In general, for $x \in \mathbb{C}^n$: $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$
General Vector Norms
- All p-norms satisfy the properties
- From now on, we define a norm for a real or
complex vector space V as any function mapping V into R that satisfies these conditions
- No need for coordinates
A norm $\|\cdot\|$ must satisfy: $\|x\| \geq 0$; $\|x\| = 0 \Rightarrow x = 0$; $\|\alpha x\| = |\alpha| \|x\|$ for all scalars $\alpha$; and $\|x + y\| \leq \|x\| + \|y\|$
Matrix norms
- Since Cm×n is a vector space with dimension mn,
we could use any vector norm from Cmn
- The Euclidean norm leads to the Frobenius norm
- Not as useful as we would like
- Motivates useful property related to matrix product
- Compatibility with Euclidean vector norm
- Submultiplicative property
$\|A\|_F^2 = \sum_{i,j} |a_{ij}|^2 = \sum_i \|A_{i*}\|_2^2 = \sum_j \|A_{*j}\|_2^2 = \mathrm{trace}(A^* A)$
Compatibility: $\|Ax\|_2^2 = \sum_i |A_{i*} x|^2 \leq \sum_i \|A_{i*}\|_2^2\, \|x\|_2^2 = \|A\|_F^2\, \|x\|_2^2$
Submultiplicativity: $\|AB\|_F^2 = \sum_j \|[AB]_{*j}\|_2^2 = \sum_j \|A [B]_{*j}\|_2^2 \leq \|A\|_F^2 \sum_j \|B_{*j}\|_2^2 = \|A\|_F^2\, \|B\|_F^2$
General Matrix Norm
- A matrix norm is a function from the set of
all complex matrices (of finite order) into R that satisfies the following properties
- Frobenius norm is one example
- Are there other examples?
A matrix norm $\|\cdot\|$ must satisfy: $\|A\| \geq 0$ and $\|A\| = 0 \Rightarrow A = 0$; $\|\alpha A\| = |\alpha| \|A\|$ for all scalars $\alpha$; $\|A + B\| \leq \|A\| + \|B\|$ (matrices of same size); and $\|AB\| \leq \|A\| \|B\|$ (conformable matrices)
Induced Matrix Norms
- A vector norm defined on Cp for p = m,n
induces a matrix norm on Cm×n by setting
- The induced matrix norm is compatible with
the underlying vector norm
- Additionally, if A is nonsingular
$\|A\| = \max_{\|x\|=1} \|Ax\|, \qquad A \in \mathbb{C}^{m \times n},\ x \in \mathbb{C}^{n \times 1}$
Compatibility: $\|Ax\| \leq \|A\| \|x\|$
If A is nonsingular: $\min_{\|x\|=1} \|Ax\| = 1 / \|A^{-1}\|$
Matrix 2-norm
- Matrix norm induced by Euclidean vector
norm is not the Frobenius norm
- Instead,
$\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \sqrt{\lambda_{\max}}$
where $\lambda_{\max}$ is the largest $\lambda$ such that $A^* A - \lambda I$ is singular
- If A is non-singular, then
$\|A^{-1}\|_2 = \dfrac{1}{\min_{\|x\|_2 = 1} \|Ax\|_2} = \dfrac{1}{\sqrt{\lambda_{\min}}}$
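A minimal NumPy sketch, using a random matrix, that checks the 2-norm against the eigenvalues of $A^T A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Eigenvalues of A^T A (symmetric, so eigvalsh); they are real and non-negative
lam = np.linalg.eigvalsh(A.T @ A)

print(np.sqrt(lam.max()), np.linalg.norm(A, 2))                        # ||A||_2 = sqrt(lambda_max)
print(1.0 / np.sqrt(lam.min()), np.linalg.norm(np.linalg.inv(A), 2))   # ||A^-1||_2 = 1/sqrt(lambda_min)
```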
Why is this useful?
- Assume we solve a linear system Ax = b and find the solution vector $\hat{x}$
- How close are we to the actual solution x?
- The ideal measure of error is relative
- So knowledge of $\|A^{-1}\|$ is very useful!
- But how would you compute it?
$\dfrac{\|x - \hat{x}\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}Ax - A^{-1}A\hat{x}\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}(b - A\hat{x})\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}\hat{r}\|}{\|\hat{x}\|} \leq \|A^{-1}\|\, \dfrac{\|\hat{r}\|}{\|\hat{x}\|}$
General Inner Products
- Euclidean norm came from standard inner-product
- We already generalized vector norm
- Now we want to generalize inner product
- An inner product on a vector space V is a function that
maps each pair of vectors x,y to a scalar such that the following properties hold
$\langle x \mid x \rangle \in \mathbb{R}, \quad \langle x \mid x \rangle \geq 0, \quad \langle x \mid x \rangle = 0 \Leftrightarrow x = 0$
$\langle \alpha x \mid y \rangle = \alpha \langle x \mid y \rangle$
$\langle x + y \mid z \rangle = \langle x \mid z \rangle + \langle y \mid z \rangle$
$\langle x \mid y \rangle = \overline{\langle y \mid x \rangle}$
Examples of inner products
- The standard inner-product: $\langle x \mid y \rangle = y^T x$ in $\mathbb{R}^n$, and $\langle x \mid y \rangle = y^* x$ in $\mathbb{C}^n$
- The elliptical inner product or A-inner product: $\langle x \mid y \rangle = y^* A^* A x$
  - Only for A non-singular (why?)
- The trace inner product, the standard inner-product for matrices (leads to the Frobenius matrix norm): $\langle A \mid B \rangle = \mathrm{trace}(A^T B)$, or $\langle A \mid B \rangle = \mathrm{trace}(A^* B)$ in the complex case
  - Reduces to the standard inner-product for column vectors
- In the space V of functions defined on (a, b): $\langle f \mid g \rangle = \int_a^b f(t)\, g(t)\, dt$
Norms in Inner-Product Spaces
- If V is an inner-product space with an inner product $\langle x \mid y \rangle$, then $\|x\| = \sqrt{\langle x \mid x \rangle}$ defines a norm on V
- Most norm properties are immediately satisfied
from the definition of general inner product
- The proof of the Triangle Inequality comes again
from the CBS inequality
- CBS holds for general inner products
Not all norms define an inner product
- Given a norm $\|x\|$ on V, is there always an inner product such that $\|x\|^2 = \langle x \mid x \rangle$?
- No, not always. And yes iff the norm satisfies
the parallelogram identity
- In which case, the expression below is an inner product on V
- Only holds for p = 2 in p-Norms
- Complex case is a bit more complicated
Parallelogram identity: $\|x + y\|^2 + \|x - y\|^2 = 2\left( \|x\|^2 + \|y\|^2 \right)$
$\langle x \mid y \rangle = \dfrac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right)$
ORTHOGONALITY
- Generalize concept based on Pythagorean theorem and
general inner product spaces
- In an inner-product space V, we say two vectors x, y are orthogonal when $\langle x \mid y \rangle = 0$, and denote this by $x \perp y$
- In $\mathbb{R}^n$, with the standard inner-product, $x \perp y \Leftrightarrow y^T x = 0$
- In $\mathbb{C}^n$, with the standard inner-product, $x \perp y \Leftrightarrow y^* x = 0$
- The Pythagorean relation generalizes: $\|u - v\|^2 = \|u\|^2 + \|v\|^2$, i.e. $\langle u - v \mid u - v \rangle = \langle u \mid u \rangle + \langle v \mid v \rangle$, exactly when $\langle u \mid v \rangle + \langle v \mid u \rangle = 0$, i.e. $\mathrm{Re}\,\langle u \mid v \rangle = 0$
Example of Orthogonality
- Find the solution to a linear least squares
problem Ax = b
- In particular when b is not in R(A), we
wonder about the residual
- Turns out that
$\hat{r} = b - A\hat{x}, \qquad \hat{r} \perp R(A)$
$A^T A \hat{x} = A^T b \;\Rightarrow\; A^T(b - A\hat{x}) = 0 \;\Rightarrow\; A^T \hat{r} = 0 \;\Rightarrow\; y^T A^T \hat{r} = 0 \;\Rightarrow\; (Ay)^T \hat{r} = 0 \ \ \forall y$
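A small NumPy sketch (random data) verifying that the least-squares residual is orthogonal to R(A):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))      # tall matrix: b will generally not lie in R(A)
b = rng.standard_normal(8)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares solution
r_hat = b - A @ x_hat                            # residual

# r_hat is orthogonal to R(A): A^T r_hat = 0 (up to round-off)
print(np.linalg.norm(A.T @ r_hat))               # ~1e-15
```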
Angles between general vectors
- Use the law of cosines, which generalizes the
Pythagorean theorem for all triangles
- CBS inequality guarantees RHS is in [–1, 1]
- So there is a $\theta$ for every u and v
$\cos\theta = \dfrac{\|u\|^2 + \|v\|^2 - \|u - v\|^2}{2\,\|u\|\,\|v\|} = \dfrac{\mathrm{Re}\,\langle u \mid v \rangle}{\|u\|\,\|v\|}$
Orthonormal sets
- $B = \{u_1, u_2, \ldots, u_n\}$ is called an orthonormal set whenever $\|u_i\| = 1$ for each i and $u_i \perp u_j$ whenever $i \neq j$
- Every orthonormal set is l.i.
- Every orthonormal set with n vectors from an n-dimensional vector space V constitutes an orthonormal basis for V
$\langle u_i \mid u_j \rangle = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$
Advantage of Orthonormal Bases
- Recall that writing a vector in basis B in general
requires solving a linear system
- When $B = \{u_1, u_2, \ldots, u_n\}$ is orthonormal, the coordinates are immediately available
- Simply write $x = \xi_1 u_1 + \xi_2 u_2 + \cdots + \xi_n u_n$ and notice that
$\langle x \mid u_i \rangle = \Big\langle \sum_{j=1}^n \xi_j u_j \,\Big|\, u_i \Big\rangle = \sum_{j=1}^n \xi_j \langle u_j \mid u_i \rangle = \xi_i \langle u_i \mid u_i \rangle = \xi_i$
Fourier Expansion
- If $B = \{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for an inner product space V, then each x in V can be expressed as
$x = \langle x \mid u_1 \rangle u_1 + \langle x \mid u_2 \rangle u_2 + \cdots + \langle x \mid u_n \rangle u_n$
- This is the Fourier expansion of x
- The scalars $\langle x \mid u_i \rangle$ are the coordinates of x with respect to B, its Fourier coefficients
- Each $\langle x \mid u_i \rangle u_i$ represents the projection of x onto the line spanned by $u_i$
Idea of Gram-Schmidt Procedure
- Orthonormal bases often desirable
- Given a general basis $B = \{x_1, x_2, \ldots, x_n\}$ for a vector space V, can we construct an orthonormal basis spanning V?
- Yes: Use the Gram-Schmidt procedure
- At each step k we complete an orthonormal basis $S_k$ for $\mathrm{span}\{x_1, x_2, \ldots, x_k\}$
- I.e., proceed incrementally
Gram-Schmidt Procedure Steps
- The first step is easy: $u_1 = \dfrac{x_1}{\|x_1\|}, \quad S_1 = \{u_1\}$
- Now assume we have $S_k$ and want to find $u_{k+1}$ such that
  - $\mathrm{span}\{x_1, x_2, \ldots, x_k, x_{k+1}\} = \mathrm{span}\left(S_k \cup \{u_{k+1}\}\right)$
  - $u_{k+1} \perp u_i,\ i < k + 1$
  - $\|u_{k+1}\| = 1$
- Use the Fourier expansion of $x_{k+1}$ to solve for $u_{k+1}$:
$x_{k+1} = \alpha_{k+1} u_{k+1} + \sum_{i=1}^{k} \langle u_i \mid x_{k+1} \rangle u_i
\;\Rightarrow\;
\alpha_{k+1} u_{k+1} = x_{k+1} - \sum_{i=1}^{k} \langle u_i \mid x_{k+1} \rangle u_i,
\qquad
u_{k+1} = \dfrac{\alpha_{k+1} u_{k+1}}{\|\alpha_{k+1} u_{k+1}\|}$
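A minimal sketch of the procedure in NumPy, assuming the columns of X are linearly independent:

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt on the columns of X (assumed linearly independent).

    Returns U with orthonormal columns spanning the same space, built
    incrementally: u_{k+1} is x_{k+1} minus its Fourier expansion in u_1..u_k,
    then normalized."""
    n, m = X.shape
    U = np.zeros((n, m))
    for k in range(m):
        v = X[:, k].copy()
        for i in range(k):
            v -= (U[:, i] @ X[:, k]) * U[:, i]   # subtract <u_i | x_{k+1}> u_i
        U[:, k] = v / np.linalg.norm(v)          # normalize the remainder
    return U

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U = gram_schmidt(X)
print(np.round(U.T @ U, 12))   # identity: the columns are orthonormal
```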
Fourier Expansion in Matrix Form
- Let $B = \{u_1, u_2, \ldots, u_n\}$ be an orthonormal basis for V, and put all $u_i$ as columns in U
- Now, using the standard inner-product,
$x = \sum_{j=1}^{n} u_j (u_j^* x) = U \begin{pmatrix} u_1^* x \\ u_2^* x \\ \vdots \\ u_n^* x \end{pmatrix} = U U^* x$
$[U^* U]_{ij} = u_i^* u_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$
- So $U^* = U^{-1}$?
Gram-Schmidt in Matrix Form
- The first step is $u_1 = \dfrac{x_1}{\|x_1\|}$
- To produce $u_{k+1}$, compute
$u_{k+1} = \dfrac{(I - U_k U_k^*)\, x_{k+1}}{\|(I - U_k U_k^*)\, x_{k+1}\|}$
where the matrix $U_k = \begin{pmatrix} u_1 & u_2 & \cdots & u_k \end{pmatrix}$ has orthonormal columns
Unitary and Orthogonal Matrices
- A unitary matrix is a complex matrix Un×n whose
columns form an orthonormal basis for Cn
- An orthogonal matrix is a real matrix Pn×n whose
columns form an orthonormal basis for Rn
- The following characterizations are equivalent
- P (or U) has orthonormal columns
- P (or U) has orthonormal rows
- P-1 = PT (or U-1 = U*)
- $\|Px\|_2 = \|x\|_2$ for all x in $\mathbb{R}^{n \times 1}$ (or $\|Ux\|_2 = \|x\|_2$ for all x in $\mathbb{C}^{n \times 1}$)
The QR Factorization
- The Gram-Schmidt procedure can be seen as
a matrix factorization
- Put each $x_k$ of the basis $B = \{x_1, x_2, \ldots, x_n\}$ as a column of matrix A
- Put each $u_k$ in $S_n$ as a column of an orthogonal or unitary matrix Q
- Pick column i of matrix R to hold the linear combination of the $u_k$ that results in $x_i$
- Results in R being upper-triangular!
The QR factorization
- Every matrix Am×n with l.i. columns can be
uniquely factored as A = QR where the columns of Qm×n form an orthonormal basis for R(A) and Rn×n is an upper-triangular matrix with positive diagonal entries
$a_k = \alpha_k q_k + \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i,
\qquad
\alpha_k = \Big\| a_k - \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i \Big\|$
$R = \begin{pmatrix}
\alpha_1 & q_1^* a_2 & q_1^* a_3 & \cdots & q_1^* a_n \\
 & \alpha_2 & q_2^* a_3 & \cdots & q_2^* a_n \\
 & & \alpha_3 & \cdots & q_3^* a_n \\
 & & & \ddots & \vdots \\
 & & & & \alpha_n
\end{pmatrix}$
Least-squares application of QR
- Start from the normal equations
- Decomposing ATA into LU is both inefficient
and leads to information loss
- From the QR factorization, we have
- If RT is non-singular (What if it isn’t?)
$A^T A x = A^T b$
$(QR)^T Q R x = R^T (Q^T Q) R x = R^T R x = R^T Q^T b$
$R x = Q^T b \;\Rightarrow\; x = R^{-1} Q^T b$
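A short NumPy sketch of this least-squares recipe on random data, checked against numpy.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)

Q, R = np.linalg.qr(A)            # thin QR: Q is 10x4 with orthonormal columns, R is 4x4 upper-triangular
x = np.linalg.solve(R, Q.T @ b)   # solve Rx = Q^T b (a triangular solver would normally be used here)

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches the least-squares solution
```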
General orthogonal projectors
- Given an m × n orthogonal or unitary matrix U,
find the orthogonal projection z in R(U) of a vector b not necessarily in R(U)
- This is just like the Fourier expansion, for an
incomplete basis
$z = Ux, \qquad (z - b) \perp Uy\ \ \forall y
\;\Rightarrow\; (Uy)^*(Ux - b) = 0
\;\Rightarrow\; y^* U^*(Ux - b) = 0
\;\Rightarrow\; y^* x = y^* U^* b$
Choosing $y = e_i$ for each i proves $x = U^* b$, so $z = U U^* b$
Elementary orthogonal projectors
- For a vector $u \in \mathbb{C}^{n \times 1}$, $u \neq 0$, the matrix
$Q = I - \dfrac{u u^*}{u^* u}$
is called an elementary orthogonal projector
- Note that $Q^2 = Q$ and $Q^* = Q$ (but Q is not a unitary matrix!)
- $x = Qx + (I - Q)x$ with $Qx \perp (I - Q)x$
- $(I - Q)x = \dfrac{u (u^* x)}{u^* u}$ is the orthogonal projection of x in $\mathrm{span}\{u\}$
- $Qx$ is the orthogonal projection of x in the orthogonal complement $u^\perp$ of u
Elementary reflectors (Householder transformation)
- For a vector $u \in \mathbb{C}^{n \times 1}$, $u \neq 0$, the matrix
$R = I - 2\,\dfrac{u u^*}{u^* u}$
is the elementary reflector about $u^\perp$
- Note that $R = 2Q - I$, with $QR = RQ = Q$
- $R^2 = I$, $R^* = R$, and $R^{-1} = R$ (a unitary matrix!)
- $\|x - Qx\| = \|Rx - Qx\| = \left\| \dfrac{u u^*}{u^* u}\, x \right\|$
Geometry of reflectors and projectors
Figure: for $Q = I - \dfrac{u u^*}{u^* u}$ and $R = I - 2\,\dfrac{u u^*}{u^* u}$, the vector x, its projection Qx onto $u^\perp$, its reflection Rx, and the component $(I - Q)x$ along u, with $\|x - Qx\| = \|Rx - Qx\|$
Reflector into given vector
- Given v, we can find u such that $Rv = \mp\mu e$ for any unit vector e ($\|e\| = 1$): take
$u = v \pm \mu e, \qquad \mu = \|v\|\, \dfrac{e^* v}{|e^* v|}$
- Proof: with $R = I - 2\,\dfrac{u u^*}{u^* u}$,
$Rv = v - \dfrac{2 u^* v}{u^* u}\, u = \mp\mu e \;\Leftrightarrow\; 2 u^* v = u^* u$
and indeed $2 u^* v - u^* u = v^* v - |\mu|^2 \pm \bar{\mu}\, e^* v \mp \mu\, v^* e = 0$, since $|\mu| = \|v\|$ and the choice of $\mu$ makes $\bar{\mu}\, e^* v$ real
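A minimal NumPy sketch that builds a reflector sending a vector onto a multiple of $e_1$ (the sign choice avoids cancellation; the test vector is arbitrary):

```python
import numpy as np

def householder_to_e1(v):
    """Elementary reflector R = I - 2 uu^T/(u^T u) with Rv = -sign(v[0]) * ||v|| e1.

    Choosing u = v + sign(v[0]) * ||v|| e1 avoids cancellation when v is close to e1."""
    n = v.size
    e1 = np.zeros(n); e1[0] = 1.0
    sign = 1.0 if v[0] >= 0 else -1.0
    u = v + sign * np.linalg.norm(v) * e1
    return np.eye(n) - 2.0 * np.outer(u, u) / (u @ u)

v = np.array([3.0, 1.0, 2.0])
R = householder_to_e1(v)
print(np.round(R @ v, 12))                                   # (-||v||, 0, 0): entries below the first vanish
print(np.allclose(R @ R, np.eye(3)), np.allclose(R, R.T))    # R^2 = I and R = R^T (unitary and Hermitian)
```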
Plane rotations (Givens rotations)
- Orthogonal matrix of the form
$P_{ij}(c, s) = I + (c - 1)\left(e_i e_i^T + e_j e_j^T\right) + s\left(e_i e_j^T - e_j e_i^T\right)$
i.e. the identity with the entries $[P]_{ii} = [P]_{jj} = c$, $[P]_{ij} = s$, $[P]_{ji} = -s$, where $c^2 + s^2 = 1$
- Performs a rotation in the i,j plane of $\mathbb{R}^n$
Pivoting with plane rotations
- Kill the element below the pivot: with
$c = \dfrac{x_i}{\sqrt{x_i^2 + x_j^2}}, \qquad s = \dfrac{x_j}{\sqrt{x_i^2 + x_j^2}}, \qquad x_i^2 + x_j^2 \neq 0,$
$P_{ij}(c, s)\, x = \big(x_1, \ldots, c x_i + s x_j, \ldots, -s x_i + c x_j, \ldots, x_n\big)^T = \big(x_1, \ldots, \sqrt{x_i^2 + x_j^2}, \ldots, 0, \ldots, x_n\big)^T$
- "Exchange" rows:
$P_{ij}(0, 1)\, x = \big(x_1, \ldots, x_j, \ldots, -x_i, \ldots, x_n\big)^T$
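A small NumPy sketch of a plane rotation that zeroes one entry against a pivot:

```python
import numpy as np

def givens(x, i, j):
    """Plane rotation P_ij(c, s) chosen to zero x[j] against the pivot x[i]."""
    r = np.hypot(x[i], x[j])          # sqrt(x_i^2 + x_j^2), computed safely
    c, s = x[i] / r, x[j] / r
    P = np.eye(x.size)
    P[i, i] = c;  P[i, j] = s
    P[j, i] = -s; P[j, j] = c
    return P

x = np.array([3.0, 4.0, 5.0])
P = givens(x, 0, 1)
print(np.round(P @ x, 12))               # [5, 0, 5]: entry 1 is rotated into entry 0
print(np.allclose(P.T @ P, np.eye(3)))   # P is orthogonal
```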
Combined rotations into given vector
- Rotations from any vector into $\|x\| e_1$:
$P_{12}\, x = \big(\sqrt{x_1^2 + x_2^2},\, 0,\, x_3,\, x_4, \ldots, x_n\big)^T, \quad
P_{13} P_{12}\, x = \big(\sqrt{x_1^2 + x_2^2 + x_3^2},\, 0,\, 0,\, x_4, \ldots, x_n\big)^T, \quad \ldots, \quad
P_{1n} \cdots P_{13} P_{12}\, x = \big(\|x\|,\, 0, \ldots, 0\big)^T = \|x\| e_1$
- Use as a building block for rotations into an arbitrary vector $\|x\| e$, with $\|e\| = 1$
- The result of combined rotations is not a rotation, but it is an orthogonal transformation
Orthogonal reduction
- Matrix A can be reduced to echelon form by
elementary row operations
- Eliminate all entries below pivot
- Matrix A can also be reduced using
elementary reflectors or plane rotations
- Using elementary reflectors leads to Householder reduction
- Using plane rotations leads to Givens
reduction
Orthogonal reduction
- Start by finding $R_1$ such that $R_1 [A]_{*1} = \mu_1 e_1$, so that
$R_1 A = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & A_2 \end{pmatrix}$
- Then find $R_2$ such that $R_2 [A_2]_{*1} = \mu_2 e_1$, so that
$\begin{pmatrix} 1 & 0 \\ 0 & R_2 \end{pmatrix} R_1 A = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & R_2 A_2 \end{pmatrix},
\qquad
R_2 A_2 = \begin{pmatrix} \mu_2 & t_2^T \\ 0 & A_3 \end{pmatrix}$
- And so on…
Reduces to trapezoidal form
$PA$ is upper trapezoidal: an upper-triangular block in the leading rows, with any remaining rows or columns zero (upper triangular when A is square with full rank)
Unicity of QR factorization
- When matrix is square and non-singular,
Gram-Schmidt, Householder, and Givens produce QR, with R triangular
- Gram-Schmidt produces a positive diagonal; Householder and Givens don't necessarily, but can be made to
- In this case, factorization is unique
We now have two factorizations
- The LU factorization and QR factorization
- LU gives the roadmap to Gaussian elimination
applied to a square, non-singular matrix
- QR gives the roadmap to Gram-Schmidt applied to a
matrix with independent columns
- Both LU and QR can be used to solve linear
systems efficiently
- LU about twice as efficient
- QR is useful for least squares as well, more stable
$QRx = b \;\Rightarrow\; Rx = Q^T b$
Practical considerations
- Gram-Schmidt as presented is numerically
unstable in floating-point
- For example if applied to a set of vectors that is not
even close to being orthogonal
- Even modified Gram-Schmidt is stable only for least squares (the sketch after this slide contrasts classical and modified Gram-Schmidt)
- Householder/Givens are stable and faster
- When dealing with sparse problems
- Gram-Schmidt and Householder can transform it into a dense problem
- Not enough memory?
- Maybe Givens can help
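A minimal sketch contrasting classical and modified Gram-Schmidt on a nearly dependent set (a Läuchli-type matrix, chosen here purely for illustration):

```python
import numpy as np

def cgs(X):
    """Classical Gram-Schmidt (as on the earlier slides)."""
    U = np.zeros_like(X)
    for k in range(X.shape[1]):
        v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])   # project against all u_i using the original column
        U[:, k] = v / np.linalg.norm(v)
    return U

def mgs(X):
    """Modified Gram-Schmidt: orthogonalize the working vector one u_i at a time."""
    U = np.zeros_like(X)
    for k in range(X.shape[1]):
        v = X[:, k].copy()
        for i in range(k):
            v -= (U[:, i] @ v) * U[:, i]     # note: uses the updated v, not the original column
        U[:, k] = v / np.linalg.norm(v)
    return U

eps = 1e-8                                    # nearly dependent columns
X = np.array([[1, 1, 1], [eps, 0, 0], [0, eps, 0], [0, 0, eps]], dtype=float)
for f in (cgs, mgs):
    U = f(X)
    print(f.__name__, np.linalg.norm(U.T @ U - np.eye(3)))   # loss of orthogonality
```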
COMPLEMENTARY SUBSPACES
Complementary Subspaces
- Subspaces X, Y of V are said to be complementary whenever $V = X + Y$ and $X \cap Y = \{0\}$, in which case V is said to be the direct sum of X and Y, denoted by $V = X \oplus Y$
- Given bases $B_X$ and $B_Y$ for X and Y, the following statements are equivalent
  - $V = X \oplus Y$
  - For each v in V, there are unique vectors x in X and y in Y such that v = x + y
  - $B_X \cup B_Y$ is a basis for V and $B_X \cap B_Y = \emptyset$
Projection
- Suppose that $V = X \oplus Y$, so that for each v in V there are unique vectors x in X and y in Y such that v = x + y
- The vector x is the projection of v onto X along Y
- The vector y is the projection of v onto Y along X
- When X and Y are not orthogonal subspaces,
this is an “oblique projection”
- As opposed to the orthogonal projection
V = X ⊕ Y
Visually
Figure: v = x + y, with the component x along subspace X and the component y along subspace Y
Computing Oblique Projections
- Assume we have bases BX and BY for complementary
subspaces X and Y of Rn
- This means that $B = B_X \cup B_Y = \{x_1, \ldots, x_r, y_1, \ldots, y_{n-r}\}$ is a basis for $\mathbb{R}^n$
- It is easier to find the coordinate matrix for the projection P onto X along Y in basis B
- Placing the elements of B ($x_i$ and $y_i$) as columns in a matrix B,
$[P]_B = \begin{pmatrix} [P(x_1)]_B & \cdots & [P(x_r)]_B & [P(y_1)]_B & \cdots & [P(y_{n-r})]_B \end{pmatrix}
= \begin{pmatrix} e_1 & \cdots & e_r & 0 & \cdots & 0 \end{pmatrix}
= \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}
\;\Rightarrow\;
P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$
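A small NumPy sketch of an oblique projector built this way, for hypothetical complementary subspaces of $\mathbb{R}^3$:

```python
import numpy as np

# Hypothetical complementary subspaces of R^3: X spanned by x1, x2 and Y spanned by y1
x1, x2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
y1 = np.array([1.0, 1.0, 1.0])

B = np.column_stack([x1, x2, y1])            # basis B_X ∪ B_Y as columns
r = 2                                        # dim X
J = np.zeros((3, 3)); J[:r, :r] = np.eye(r)  # block matrix [[I_r, 0], [0, 0]]
P = B @ J @ np.linalg.inv(B)                 # projector onto X along Y

v = np.array([2.0, 3.0, 4.0])
print(P @ v)                                 # component of v in X
print((np.eye(3) - P) @ v)                   # component of v in Y (a multiple of y1)
print(np.allclose(P @ P, P))                 # P is idempotent
```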
Summary of Projections
- Let X and Y be complementary subspaces of V. Each v in V can
be uniquely expressed as x + y, with x in X and y in Y
- The unique linear operator P defined by Pv = x is the
projector onto X along Y
- The following properties hold
- P2 = P (i.e., P is idempotent)
- I – P is the complementary projector onto Y along X
- R(P) = {x | Px = x}
- R(P) = N(I – P) = X and R(I – P) = N(P) = Y
- If V = Rn or Cn, then P is given by
where B = [X | Y] and X, Y are respective bases for X, Y
$P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$
Projectors and Idempotents
- A linear operator P on V is a projector if and only if it is idempotent
- Proof
- We have just seen that projectors ⇒ idempotents
- Need to prove that idempotent ⇒ projector
$P^2 = P \Rightarrow R(P)$ and $N(P)$ are complementary:
$v = Pv + (I - P)v$ with $Pv \in R(P)$ and $(I - P)v \in N(P)$, so $V = R(P) + N(P)$
$R(P) \cap N(P) = \{0\}$: if $x = Py$ and $Px = 0$, then $0 = Px = P^2 y = Py = x$
RANGE-NULLSPACE DECOMPOSITION
Complementary subspaces associated to a square matrix
- Let’s go back to a matrix An×n
- The Rank+Nullity theorem guarantees that $\dim R(A) + \dim N(A) = n$
- But are R(A) and N(A) complementary?
- They certainly are if A is non-singular. Duh.
- Otherwise, it can happen that $R(A) \cap N(A) \neq \{0\}$, e.g.
$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \in R(A) \cap N(A)$
- We can find complementary subspaces by looking at the powers of A
Range-Nullspace Decomposition
- For every singular matrix An×n there exists a
positive integer k such that $R(A^k)$ and $N(A^k)$ are complementary subspaces, i.e. $\mathbb{R}^n = R(A^k) \oplus N(A^k)$
- The smallest positive integer k for which this holds is called the index of A
- For non-singular matrices, we simply define index(A) = 0
Proof
- The nullspace doesn't shrink and the range doesn't grow with k:
$R(A) \supseteq R(A^2) \supseteq \cdots \supseteq R(A^k) \supseteq R(A^{k+1}) \supseteq \cdots$
$N(A) \subseteq N(A^2) \subseteq \cdots \subseteq N(A^k) \subseteq N(A^{k+1}) \subseteq \cdots$
- At some point there must be equality: the dimensions can't shrink below zero or grow above n
- Once equality is attained, it is preserved forever:
$R(A^k) = R(A^{k+1}) \Rightarrow A\,R(A^k) = A\,R(A^{k+1}) \Rightarrow R(A^{k+1}) = R(A^{k+2})$
- Equality is attained at the same k for the nullspace:
$\dim N(A^k) = n - \dim R(A^k) = n - \dim R(A^{k+1}) = \dim N(A^{k+1})$
- When equality is attained, $R(A^k) \cap N(A^k) = \{0\}$:
$x \in R(A^k) \cap N(A^k) \Rightarrow x = A^k y,\ A^k x = 0 \Rightarrow A^{2k} y = 0 \Rightarrow y \in N(A^{2k}) = N(A^k) \Rightarrow x = A^k y = 0$
- Finally, when equality is attained, $R(A^k) + N(A^k) = \mathbb{R}^n$:
$\dim\!\big[R(A^k) + N(A^k)\big] = \dim R(A^k) + \dim N(A^k) - \dim\!\big(R(A^k) \cap N(A^k)\big) = \dim R(A^k) + \dim N(A^k) = n$
Summary about the Index of a square matrix
- The index of a square matrix A is the smallest
nonnegative integer k such that any of these holds
- rank(Ak) = rank(Ak+1)
- R(Ak) = R(Ak+1) (the point where it stops shrinking)
- N(Ak) = N(Ak+1) (the point where it stops growing)
- For non-singular matrices, index(A) = 0
- For singular matrices, index(A) is the smallest positive integer k such that any of these holds
- R(Ak) ∩ N(Ak) = {0}
Rn = R(Ak) ⊕ N(Ak)
Nilpotent Matrices
- Nn×n is said to be nilpotent whenever Nk = 0
for some positive integer k
- k = index(N) is the smallest positive integer
such that Nk = 0
- Proof
- As soon as Nk = 0, we have R(Nk+i) = {0}
- index(N) ≤ k
- On the other hand, if we reach index(N) and
R(Nk) ≠ {0}, we could never have Nk = 0
- index(N) ≥ k
Example of Nilpotent Matrix
- Start taking powers
- index(N) = 3, since N2 ≠ 0 and N3 = 0
$N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
N^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
N^3 = 0$
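A quick NumPy check of the index by taking powers (using the 3×3 shift matrix assumed in the example above):

```python
import numpy as np

N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

# index(N) = smallest k with N^k = 0; equivalently where the rank finally drops to 0
P = np.eye(3)
for k in range(1, 4):
    P = P @ N
    print(k, np.linalg.matrix_rank(P), np.allclose(P, 0))
# k=1: rank 2, k=2: rank 1, k=3: rank 0  ->  index(N) = 3
```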
Application for the Range-Nullspace decomposition
- If k = index(A), then both R(Ak) and
N(Ak) are invariant subspaces of A
- When we write A in basis [R(Ak) N(Ak)], we
get a block diagonal matrix
- This will be very useful as a step toward the
Jordan form we will see later on
$A\,R(A^k) = R(A^{k+1}) = R(A^k)$
$x \in A\,N(A^k) \Rightarrow x = Aw,\ w \in N(A^k) = N(A^{k+1}) \Rightarrow A^k x = A^{k+1} w = 0 \Rightarrow x \in N(A^k)$
Core-Nilpotent Decomposition
- Let $A_{n \times n}$ be a singular matrix with index k, and let $\mathrm{rank}(A^k) = r$. There exists a non-singular matrix Q such that the block-diagonal form below holds, where C is nonsingular and N is nilpotent of index k
- A is similar to a 2×2 block diagonal matrix containing a
non-singular “core” C and a nilpotent component N
- This block diagonal matrix is known as the
core-nilpotent decomposition of A
- When A is non-singular, this is trivial (and useless)
$Q^{-1} A Q = \begin{pmatrix} C_{r \times r} & 0 \\ 0 & N \end{pmatrix}$
Core-Nilpotent Decomposition
- Proof
- We have already seen the block diagonal property
- To see that N is nilpotent, write $Q = \begin{pmatrix} X & Y \end{pmatrix}$ and $Q^{-1} = \begin{pmatrix} U \\ V \end{pmatrix}$, with the columns of X, Y bases for $R(A^k)$, $N(A^k)$:
$\begin{pmatrix} C^k & 0 \\ 0 & N^k \end{pmatrix} = Q^{-1} A^k Q
= \begin{pmatrix} U \\ V \end{pmatrix} A^k \begin{pmatrix} X & Y \end{pmatrix}
= \begin{pmatrix} U A^k X & 0 \\ V A^k X & 0 \end{pmatrix}$
since $A^k Y = 0$; hence $N^k = 0$
- Since $\mathrm{rank}(C^k) = \mathrm{rank}(A^k) = r$, C is non-singular
- To see that index(N) = k, imagine it is smaller
  - Then $\mathrm{rank}(A^{k-1}) = \mathrm{rank}(C^{k-1}) = r = \mathrm{rank}(A^k)$
  - So index(A) < k, which is a contradiction
ORTHOGONAL DECOMPOSITION
Orthogonal Complement
- For a subset M of an inner-product space V,
the orthogonal complement M⊥ of M is the set of all vectors in V that are orthogonal to all vectors in M
$M^\perp = \{x \in V \mid \langle m \mid x \rangle = 0 \text{ for all } m \in M\}$
Figure: the orthogonal complement $M^\perp$ of a subspace M, and of a single vector $M = \{x\}$
Orthogonal Complementary Subspaces
- If M is a subspace of a finite-dimensional inner-product space V, then $V = M \oplus M^\perp$
- Furthermore, if N is another subspace such that $V = M \oplus N$ and $N \perp M$, then $N = M^\perp$
- Finally, if dim V = n, then $\dim M^\perp = n - \dim M$ and $\big(M^\perp\big)^\perp = M$
(Taking $N = M^\perp$: $N \oplus M = V$ and $N \oplus N^\perp = V$ together give $N^\perp = M$.)
Orthogonal Decomposition Theorem
- For every $A_{m \times n}$: $R(A)^\perp = N(A^T)$ and $N(A)^\perp = R(A^T)$
- Proof: $x \in R(A)^\perp \Leftrightarrow x^T A y = 0\ \forall y \Leftrightarrow x^T A = 0 \Leftrightarrow x \in N(A^T)$ (the other statement follows by replacing A with $A^T$)
- In other words, every matrix in $\mathbb{R}^{m \times n}$ produces orthogonal decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$:
$\mathbb{R}^m = R(A) \oplus N(A^T), \qquad \mathbb{R}^n = N(A) \oplus R(A^T)$
Significance #1
- The orthogonal decomposition theorem give us
decompositions of Rm and Rn in terms of the four fundamental subspaces
- The resulting bases for Rm and Rn reveal more
about basic components of matrix A itself
- Combine orthonormal bases for R(A), $N(A^T)$ and for $R(A^T)$, N(A) into bases of $\mathbb{R}^m$ and $\mathbb{R}^n$:
$U_{m \times m} = \big(\, \text{basis of } R(A) \;\big|\; \text{basis of } N(A^T) \,\big),
\qquad
V_{n \times n} = \big(\, \text{basis of } R(A^T) \;\big|\; \text{basis of } N(A) \,\big)$
Significance #2
- What does A look like w.r.t. these bases?
- Multiply on the right by V and on the left by U-1
$U_{m \times m} = \big( R(A) \mid N(A^T) \big), \qquad V_{n \times n} = \big( R(A^T) \mid N(A) \big)$
$R = U^{-1} A V = U^T A V = U^T A \big( R(A^T) \mid N(A) \big) = U^T \big( A\,R(A^T) \mid 0 \big)
= \begin{pmatrix} R(A)^T \\ N(A^T)^T \end{pmatrix} \big( A\,R(A^T) \mid 0 \big)
= \begin{pmatrix} R(A)^T A\,R(A^T) & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}$
(here each subspace name stands for the matrix whose columns are its chosen orthonormal basis)
The URV Factorization
- For each matrix $A_{m \times n}$ in $\mathbb{R}^{m \times n}$ of rank r, there are orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$ and a non-singular matrix $C_{r \times r}$ such that
$A = U R V^T = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}_{m \times n} V^T$
- The first r columns of U are an orthonormal basis for R(A)
- The last m – r columns of U are an orthonormal basis for $N(A^T)$
- The first r columns of V are an orthonormal basis for $R(A^T)$
- The last n – r columns of V are an orthonormal basis for N(A)
- In the complex case, replace orthogonal by unitary and conjugate the transposes
Four fundamental subspaces
Figure: the four fundamental subspaces. $\mathbb{R}^n = R(A^T) \oplus N(A)$, with $\dim R(A^T) = r$ and $\dim N(A) = n - r$; vectors $A^T y \in R(A^T)$ and $z \in N(A)$ ($Az = 0$) satisfy $z \perp A^T y$. $\mathbb{R}^m = R(A) \oplus N(A^T)$, with $\dim R(A) = r$ and $\dim N(A^T) = m - r$; vectors $Ax \in R(A)$ and $w \in N(A^T)$ ($A^T w = 0$) satisfy $w \perp Ax$.
The action of A
Figure: A maps $\mathbb{R}^n = R(A^T) \oplus N(A)$ into $\mathbb{R}^m = R(A) \oplus N(A^T)$; a vector $v = x + z$ with $x \in R(A^T)$ and $z \in N(A)$ is sent to $Av = Ax \in R(A)$, so the N(A) component is lost.
Inverting what we can
Figure: the pseudoinverse $A^\dagger$ sends $u = y + w$, with $y \in R(A)$ and $w \in N(A^T)$, back to the unique $x \in R(A^T)$ with $Ax = y$, discarding the $N(A^T)$ component.
Finding a matrix representation
- Start from an URV factorization of A
- Singularity of A has been isolated
- Problem comes from N(A) and N(AT )
- Map from R(AT ) and R(A) is invertible
- It is given by invertible matrix C
- Simply transpose and replace C with C–1
$A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T
\;\Rightarrow\;
A^\dagger_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$
The Moore-Penrose Inverse
- In terms of the URV factorization, the Moore-Penrose inverse of $A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T$ is
$A^\dagger_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$
- When Ax = b is consistent, $x = A^\dagger b$ is the solution with minimal Euclidean norm
- When Ax = b is inconsistent, $x = A^\dagger b$ is the least squares solution of minimal Euclidean norm
- Note: much like the inverse, the pseudoinverse is useful in theory, not in practical applications
The Moore-Penrose Inverse
- Proof for the consistent case
- $x_0 = A^\dagger b$ solves Ax = b: using $A = A A^\dagger A$,
$A x_0 = A A^\dagger b = A A^\dagger A x = A x = b$
- $x_0 = A^\dagger b$ has minimum Euclidean norm: $R(A^\dagger) = R(A^T)$, and every solution has the form $x = x_0 + n$ with $n \in N(A)$; since $R(A^T) \perp N(A)$, $x_0 \perp n$, so
$\|x\|^2 = \|x_0 + n\|^2 = \|x_0\|^2 + \|n\|^2 \geq \|x_0\|^2$
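A minimal NumPy sketch building $A^\dagger$ from the SVD (a URV factorization with a diagonal core) and checking it on a consistent system; the matrix is random and rank-deficient by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))    # 5x4 matrix of rank 3

# Build A† = V C^{-1} U^T from the compact SVD
U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10 * s[0])                                     # numerical rank
A_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

print(np.allclose(A_pinv, np.linalg.pinv(A)))                    # agrees with NumPy's pinv

# Consistent system: x = A† b solves Ax = b (and has minimal norm)
b = A @ rng.standard_normal(4)                                   # b lies in R(A) by construction
x = A_pinv @ b
print(np.allclose(A @ x, b))
```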
Matrix decompositions based on four fundamental spaces #1
- Orthogonal decomposition
- Used by URV decomposition
- Works with any matrix
- Different orthogonal transformations on each side
- Specializes to the singular value decomposition
- Range-Nullspace decomposition
- Used by Core-Nilpotent factorization
- Matrix must be square
- Based on similarity transformation
- Paves the way to the Jordan form
Matrix decompositions based on four fundamental spaces #2
- Similarities reveal properties of a transformation that
are independent of bases or coordinate systems
- Orthogonality is useful in least-squares, or for numerical stability when using floating-point
- Core-Nilpotent is usually not a special case of URV
- Not even necessarily when A is a square matrix
- $URV^T$ is not a similarity transformation unless U = V
- Surprisingly, U = V defines a rather large
class of matrices
Range Perpendicular to Nullspace
- For rank(An×n) = r, the following are equivalent
- Such matrices are called RPN matrices, or range-
symmetric matrices, or EP matrices
- In the complex case, replace orthogonal by unitary
and conjugate the transposes
$R(A) \perp N(A)
\;\Leftrightarrow\; R(A) = R(A^T)
\;\Leftrightarrow\; N(A) = N(A^T)
\;\Leftrightarrow\; A = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$ with U orthogonal and C non-singular
A map of matrix types
Figure: map of matrix types — the RPN matrices contain the Normal matrices (which contain the Hermitian and Real-Symmetric ones) as well as the Non-singular matrices.
SINGULAR VALUE DECOMPOSITION
Can we specialize URV even more?
- We can certainly make C triangular
- Factor C = QR
- Note that $\begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix}$
- So that $A = U \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix} V^T$
- We now do even better, by making C diagonal
- This leads to the Singular Value Decomposition
- SVD is the Swiss-army knife of decompositions
Derivation of SVD #1
- We proceed incrementally, one singular value at a time
- Multiplying by blocks, we want orthogonal $P_1 = \begin{pmatrix} x_1 & X_1 \end{pmatrix}$ and $Q_1 = \begin{pmatrix} y_1 & Y_1 \end{pmatrix}$ with
$P_1^T C_1 Q_1 = \begin{pmatrix} x_1^T \\ X_1^T \end{pmatrix} C_1 \begin{pmatrix} y_1 & Y_1 \end{pmatrix}
= \begin{pmatrix} x_1^T C_1 y_1 & x_1^T C_1 Y_1 \\ X_1^T C_1 y_1 & X_1^T C_1 Y_1 \end{pmatrix}
= \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix} = D_1,$
and then repeat on $C_2$: $D_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix} = P_2^T C_2 Q_2$, …
- So we have two equations to satisfy: $X_1^T C_1 y_1 = 0$ and $x_1^T C_1 Y_1 = 0$
- The first requires $C_1 y_1 \perp X_1$, and we know $x_1 \perp X_1$
- Simply use $C_1 y_1$ to define $x_1$ and normalize:
$x_1 = \dfrac{C_1 y_1}{\|C_1 y_1\|}
\;\Rightarrow\;
X_1^T C_1 y_1 = X_1^T x_1 \|C_1 y_1\| = 0$
Derivation of SVD #2
- The second equation requires $x_1^T C_1 Y_1 = 0$
- In other words, $C_1^T x_1 \perp Y_1$, i.e. $C_1^T C_1 y_1 \perp Y_1$
- Problem solved if there is $y_1$ such that $C_1^T C_1 y_1 = \lambda y_1$
- Same as $(C_1^T C_1 - \lambda I)\, y_1 = 0$
- There is such a $y_1$: take $y_1 = \arg\max_{\|y\|=1} \|C_1 y\|$, and
$C_1^T C_1 y_1 = \lambda_1 y_1 \;\Rightarrow\; \|C_1 y_1\|^2 = \lambda_1$
- Now that we can pick $x_1$ and $y_1$, complete $X_1$ and $Y_1$
- Then set $C_2 = X_1^T C_1 Y_1$
- Finally, note that $x_1^T C_1 y_1 = \sigma_1 = \|C_1 y_1\| \;\Rightarrow\; \sigma_1^2 = \lambda_1$
Derivation of SVD #3
- Now that we have $D_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix} = P_2^T C_2 Q_2$, set
$\hat{P}_2 = \begin{pmatrix} 1 & 0 \\ 0 & P_2 \end{pmatrix}, \qquad \hat{Q}_2 = \begin{pmatrix} 1 & 0 \\ 0 & Q_2 \end{pmatrix}$
so that we get
$\hat{D}_2 = \hat{P}_2^T P_1^T C_1 Q_1 \hat{Q}_2 = \begin{pmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & C_3 \end{pmatrix}$
- After r – 1 steps,
$\hat{P}^T C \hat{Q} = D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$
with $\hat{P}$ and $\hat{Q}$ orthogonal
- To complete the proof, use $U^T A V = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix}$ and
$\begin{pmatrix} \hat{P}^T & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \hat{Q} & 0 \\ 0 & I \end{pmatrix}
= \begin{pmatrix} \hat{P}^T C \hat{Q} & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$
The Singular Value Decomposition
- For each A in $\mathbb{R}^{m \times n}$ of rank r, there are orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$, and a diagonal matrix $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, such that
$A = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T$
- The $\sigma_i$ are called the nonzero singular values of A
- When r < p = min(m,n), A is said to have p – r additional zero singular values
- This factorization is called the singular value decomposition of A
- The columns of U and V are called left-hand and right-hand singular vectors of A, respectively
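A quick NumPy sketch (random matrix) verifying the factorization and the singular-vector relation:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A)           # full SVD: U is 6x6, Vt is 4x4, s holds sigma_1 >= ... >= sigma_r

# Rebuild A = U [D 0; 0 0] V^T
S = np.zeros((6, 4)); S[:4, :4] = np.diag(s)
print(np.allclose(U @ S @ Vt, A))

# Columns of U and rows of Vt are left- and right-hand singular vectors: A v_i = sigma_i u_i
i = 0
print(np.allclose(A @ Vt[i], s[i] * U[:, i]))
```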
Left and right singular vectors
- The bases for R(AT ) in V and for R(A) in U
are chosen in a very special way
$AV = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; A v_i = \sigma_i u_i,
\qquad
U^T A = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T \;\Rightarrow\; u_i^T A = \sigma_i v_i^T$
Image of Unit Sphere
- For a nonsingular matrix A in $\mathbb{R}^{n \times n}$ having singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$ and an SVD $A = U D V^T$, where $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, the image of the unit 2-sphere is an ellipsoid whose kth semiaxis is given by $\sigma_k u_k$
- Furthermore, $v_k$ in the unit sphere is such that $A v_k = \sigma_k u_k$
Figure: the unit sphere (spanned by $v_1, v_2, v_3$) is mapped by A to an ellipsoid with semiaxes $A v_1 = \sigma_1 u_1$, $A v_2 = \sigma_2 u_2$, $A v_3 = \sigma_3 u_3$
Connection to ATA and AAT
- The squared singular values $\sigma_i^2$ are the nonzero eigenvalues of both $A^T A$ and $A A^T$
- Furthermore, the vectors $v_i$ in V and $u_i$ in U are corresponding eigenvectors of $A^T A$ and $A A^T$
- Proof:
$(A^T A)_{n \times n} = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} V^T
\;\Rightarrow\; A^T A\, V = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix}
\;\Rightarrow\; A^T A\, v_i = \sigma_i^2 v_i$
$(A A^T)_{m \times m} = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} U^T
\;\Rightarrow\; A A^T U = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix}
\;\Rightarrow\; A A^T u_i = \sigma_i^2 u_i$
Compact SVDs
- When matrix $A_{m \times n}$ has many more rows than columns, n << m, it is better to use the thin SVD: $A = U_n \Sigma_n V^T$
  - Compute only the first n columns of U and the $n \times n$ block $\Sigma_n$
  - We usually don't care about the nullspaces
- When rank(A) = r is much smaller than the number of rows or columns, use the economy/compact SVD: $A = U_r \Sigma_r V_r^T$
  - Compute and store only the first r columns
Dimensionality Reduction
- Map points in high-dimensional space to points
in a lower-dimensional space
- Try to preserve structure
- Pairwise distances etc
- Useful for further processing
- Less computation, fewer parameters
- Easier to understand, visualize
Principal Component Analysis
- Principal Component Analysis (PCA) is a method
for approximating a high-dimensional data set using a lower-dimensional affine space
Figure: a cloud of data points shown with the original axes and with the first and second principal components.
PCA: Formal definition
- Given a set of points $D = \{x_1, x_2, \ldots, x_N\}$ in $\mathbb{R}^n$, can we find the m-dimensional subspace V of $\mathbb{R}^n$ that allows us to best represent the set D?
- For each $x_i$, we want to find a $y_i$ such that $y_i = P x_i + a$, with $P x_i \in V$ and $a \in \mathbb{R}^n$
- Given m, find V and a that minimize
$\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|y_i - x_i\|^2 = \dfrac{1}{N} \sum_{i=1}^{N} \|a - (I - P) x_i\|^2$
PCA: Derivation #1
- First, let us find the value of a
$\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \big(a - (I - P) x_i\big)^T \big(a - (I - P) x_i\big)
= \dfrac{1}{N} \sum_{i=1}^{N} \Big[ a^T a - 2 a^T (I - P) x_i + x_i^T (I - P)^T (I - P) x_i \Big]$
- Using calculus,
$\dfrac{\partial \varepsilon}{\partial [a]_k} = 0
\;\Rightarrow\; \dfrac{1}{N} \sum_{i=1}^{N} \Big[ 2 e_k^T a - 2 e_k^T (I - P) x_i \Big] = 0
\;\Rightarrow\; a = \dfrac{1}{N} \sum_{i=1}^{N} (I - P) x_i = (I - P) \dfrac{\sum_{i=1}^{N} x_i}{N}
\;\Rightarrow\; a = (I - P)\,\bar{x}$
PCA: Derivation #2
- Now that we know a, we can write
$\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|(I - P)(x_i - \bar{x})\|^2$
- Take orthonormal bases for V and its orthogonal complement and put them in matrices V and U, so that $I - P = U U^T$:
$\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|U U^T (x_i - \bar{x})\|^2
= \dfrac{1}{N} \sum_{i=1}^{N} \|U^T (x_i - \bar{x})\|^2
= \dfrac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{n-m} u_j^T (x_i - \bar{x})(x_i - \bar{x})^T u_j
= \sum_{j=1}^{n-m} u_j^T \Big( \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T \Big) u_j
= \sum_{j=1}^{n-m} u_j^T S\, u_j$
- S is the covariance matrix, and it does not depend on U:
$S = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T = \dfrac{1}{N} X X^T,
\qquad [X]_{*j} = x_j - \bar{x}$
PCA: Derivation #3
- Want to use calculus again to find the $u_i$ minimizing
$\varepsilon = \sum_{j=1}^{n-m} u_j^T S\, u_j$
- Unconstrained minimization gives $u_i = 0$, so we use a Lagrange multiplier to enforce $\|u_i\|_2 = 1$:
$\bar{\varepsilon} = \varepsilon - \sum_{j=1}^{n-m} \lambda_j \big(u_j^T u_j - 1\big)$
- Differentiating,
$\dfrac{\partial \bar{\varepsilon}}{\partial [u_i]_k} = 0
\;\Rightarrow\; S u_i = \lambda_i u_i$, i.e. $u_i$ is an eigenvector of $X X^T$ (a left singular vector of X),
$\qquad \text{with } \bar{\varepsilon} = \sum_{j=1}^{n-m} \lambda_j$
- We simply pick the left singular vectors $u_i$ associated to the n – m smallest singular values of X as a basis for $V^\perp$
- Pick the remaining singular vectors to form the basis of V (see the sketch after this slide)
- Each $v_i$ gives a principal component
- The corresponding $\sigma_i$ gives the associated importance
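A minimal PCA sketch in NumPy on synthetic data: center the points, take the SVD of the centered data matrix, and check the error formula derived above:

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic data: N points in R^3 concentrated near a 2-D plane
X = rng.standard_normal((3, 200)) * np.array([[5.0], [2.0], [0.1]])

x_bar = X.mean(axis=1, keepdims=True)
Xc = X - x_bar                                     # center the data: columns are x_i - x_bar

U, s, _ = np.linalg.svd(Xc, full_matrices=False)   # left singular vectors = principal directions
m = 2
V_basis = U[:, :m]                                 # basis for the best m-dimensional subspace V

# Project the data onto the affine space x_bar + V
Y = V_basis @ (V_basis.T @ Xc) + x_bar
err = np.mean(np.sum((Y - X) ** 2, axis=0))        # average squared reconstruction error
print(err, np.sum(s[m:] ** 2) / X.shape[1])        # equals the discarded sigma_i^2 / N
```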
Comparing PCA to Least Squares
Figure: comparison of the least-squares fit with the first principal component $v_1$ (and offset a) of the same data.
Example uses of SVD
- Image Compression
- Look at image as a matrix
- Store only truncated SVD decomposition
- First few singular values/vectors should be enough
- Recognition
- Use PCA on training set
- Project unknown face on first few principal components
- Compare projection coordinates to other faces in database
SVD for Image Compression
- Look at an image as a matrix A and compute SVD
- Truncate to first p singular values and vectors
$\tilde{A}_{m \times n} = (U)_{m \times p}\, (\Sigma)_{p \times p}\, (V^T)_{p \times n}$
Figure: a 512×512 image reconstructed with p = 1, 2, 5, 10, 15, 20, 25, 30, and 50 singular values, requiring roughly 0.5%, 0.8%, 2%, 4%, 6%, 8%, 10%, 12%, and 20% of the original storage.
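A small NumPy sketch of truncated-SVD compression; a synthetic smooth array stands in for the image:

```python
import numpy as np

rng = np.random.default_rng(6)
# Stand-in "image": a smooth 512x512 array (a real photo would be loaded instead)
t = np.linspace(0, 1, 512)
img = np.outer(np.sin(8 * t), np.cos(5 * t)) + 0.01 * rng.standard_normal((512, 512))

U, s, Vt = np.linalg.svd(img, full_matrices=False)
for p in (1, 2, 5, 10, 25, 50):
    approx = U[:, :p] @ np.diag(s[:p]) @ Vt[:p]           # truncated SVD with p terms
    storage = p * (512 + 512 + 1) / 512**2                # fraction of original storage
    rel_err = np.linalg.norm(img - approx) / np.linalg.norm(img)
    print(f"p={p:3d}  storage={storage:6.2%}  rel. error={rel_err:.3f}")
```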
PCA on Faces: “Eigenfaces”
Figure: training faces in one of 10 different poses, the first 10 principal components ("eigenfaces"), and the average face.
PCA on Faces: “Eigenfaces”
Figure: faces in an unseen pose, their principal-value spectrum, and reconstructions using 5, 10, 15, 20, 25, and all principal components.
SVD connection to Matrix 2-norm
- For a nonsingular matrix A in $\mathbb{R}^{n \times n}$ having singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$ and an SVD $A = U D V^T$, where $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, we have
$\sigma_1 = \|A v_1\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \|A\|_2,
\qquad
\sigma_n = \|A v_n\|_2 = \min_{\|x\|_2 = 1} \|Ax\|_2 = 1 / \|A^{-1}\|_2$
- The degree of distortion of the unit sphere transformed by A is measured by the 2-norm condition number
$\kappa_2 = \dfrac{\sigma_1}{\sigma_n} = \|A\|_2\, \|A^{-1}\|_2 \geq 1$
- Note that $\kappa_2 = 1 \Leftrightarrow A$ is a multiple of an orthogonal matrix
Uncertainty in Linear Systems
- Many sources of uncertainty
- Modeling errors: simplifying assumptions
- Data collection: measurements have finite precision
- Data entry: computers have finite precision
- Round-off errors during computation
- How can we evaluate the accuracy of a solution?
- When A is known exactly, but b has uncertainty
- Can we even check the answer we found?
Uncertainty in b
- Wanted Ax = b, ended up with $A\tilde{x} = b - e = \tilde{b}$
- Compute the relative uncertainty $\|x - \tilde{x}\| / \|x\|$ in x from the relative uncertainty $\|b - \tilde{b}\| / \|b\| = \|e\| / \|b\|$ in b:
$\|x - \tilde{x}\| = \|A^{-1}(Ax - A\tilde{x})\| = \|A^{-1}(b - \tilde{b})\| = \|A^{-1} e\| \leq \|A^{-1}\| \|e\|$
and $\|b\| = \|Ax\| \leq \|A\| \|x\|$, so
$\dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \|A\|\, \|A^{-1}\|\, \dfrac{\|e\|}{\|b\|} = \kappa\, \dfrac{\|e\|}{\|b\|}$
- Conversely, $\|e\| = \|b - \tilde{b}\| = \|A(x - \tilde{x})\| \leq \|A\| \|x - \tilde{x}\|$ and $\|x\| = \|A^{-1} b\| \leq \|A^{-1}\| \|b\|$, so
$\dfrac{\|x - \tilde{x}\|}{\|x\|} \geq \dfrac{\|e\|}{\|A\|\, \|A^{-1}\|\, \|b\|} = \dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|}$
- So
$\dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|} \leq \dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa\, \dfrac{\|e\|}{\|b\|},
\qquad \text{with } \kappa = \|A\|\, \|A^{-1}\| \geq 1$
Ill-conditioning Example
$A = \begin{pmatrix} .2315 & -.3071 \\ -.5525 & .7387 \end{pmatrix},
\qquad Ax = b, \qquad A\tilde{x} = \tilde{b} = b + e,
\qquad
\dfrac{1}{\kappa}\,\dfrac{\|e\|}{\|b\|} \leq \dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa\,\dfrac{\|e\|}{\|b\|}$
Case 1: $b_1 = \begin{pmatrix} .6 \\ -.8 \end{pmatrix}$, $\tilde{b}_1 = \begin{pmatrix} .6008 \\ -.7994 \end{pmatrix}$, $e_1 = \begin{pmatrix} .0008 \\ .0006 \end{pmatrix}$; then
$x_1 = A^{-1} b_1 = \begin{pmatrix} .3846 \\ -.9231 \end{pmatrix}$, $\tilde{x}_1 = A^{-1} \tilde{b}_1 = \begin{pmatrix} 1.3077 \\ -.5385 \end{pmatrix}$, and $\dfrac{\|x_1 - \tilde{x}_1\|}{\|x_1\|} = 1$
Case 2: $b_2 = \begin{pmatrix} .8 \\ .6 \end{pmatrix}$, $\tilde{b}_2 = \begin{pmatrix} .8006 \\ .5992 \end{pmatrix}$, $e_2 = \begin{pmatrix} .0006 \\ -.0008 \end{pmatrix}$; then
$x_2 = A^{-1} b_2 = \begin{pmatrix} 923.0769 \\ 384.6154 \end{pmatrix}$, $\tilde{x}_2 = A^{-1} \tilde{b}_2 = \begin{pmatrix} 923.0773 \\ 384.6145 \end{pmatrix}$, and $\dfrac{\|x_2 - \tilde{x}_2\|}{\|x_2\|} = 10^{-6}$
Worst and Best Cases for Uncertainty
- When κ is small, we have a well-conditioned system
- Small relative uncertainty cannot have large impact on solution
- When κ is large, we have an ill-conditioned system
- Small relative uncertainty may or may not have large impact
- Can we reach either side of the equality?
- Things are easier to analyze when using the 2-norm
- Pick e and b along left singular vectors of A: with $e = \epsilon\, u_i$ and $b = \beta\, u_j$,
$\dfrac{\|x - \tilde{x}\|_2}{\|x\|_2}
= \dfrac{\|A^{-1} e\|_2}{\|A^{-1} b\|_2}
= \dfrac{\|A^{-1} \epsilon\, u_i\|_2}{\|A^{-1} \beta\, u_j\|_2}
= \dfrac{\|\epsilon\, v_i / \sigma_i\|_2}{\|\beta\, v_j / \sigma_j\|_2}
= \dfrac{\sigma_j}{\sigma_i}\, \dfrac{|\epsilon|}{|\beta|}
= \dfrac{\sigma_j}{\sigma_i}\, \dfrac{\|e\|_2}{\|b\|_2}$
- Since $\kappa_2 = \sigma_1 / \sigma_n$, we can reach both sides of the inequality
$\dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|} \leq \dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa\, \dfrac{\|e\|}{\|b\|}$
Ill-conditioning Example Explained
$A = \begin{pmatrix} .2315 & -.3071 \\ -.5525 & .7387 \end{pmatrix}
= \dfrac{1}{5} \begin{pmatrix} 3 & 4 \\ -4 & 3 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 10^{-3} \end{pmatrix}
\left[ \dfrac{1}{13} \begin{pmatrix} 5 & 12 \\ -12 & 5 \end{pmatrix} \right]^T,
\qquad \kappa_2 = \dfrac{\sigma_1}{\sigma_2} = 10^3$
Case 1: $b_1 = \dfrac{1}{5} \begin{pmatrix} 3 \\ -4 \end{pmatrix} = u_1$ and $e_1 = \dfrac{10^{-3}}{5} \begin{pmatrix} 4 \\ 3 \end{pmatrix} = 10^{-3} u_2$, so
$\dfrac{\|x_1 - \tilde{x}_1\|}{\|x_1\|} = \kappa\, \dfrac{\|e_1\|}{\|b_1\|} = 1$
Case 2: $b_2 = \dfrac{1}{5} \begin{pmatrix} 4 \\ 3 \end{pmatrix} = u_2$ and $e_2 = \dfrac{10^{-3}}{5} \begin{pmatrix} 3 \\ -4 \end{pmatrix} = 10^{-3} u_1$, so
$\dfrac{\|x_2 - \tilde{x}_2\|}{\|x_2\|} = \dfrac{1}{\kappa}\, \dfrac{\|e_2\|}{\|b_2\|} = 10^{-6}$
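A short NumPy sketch reproducing this example; because the entries of A are rounded, the computed values are only approximately 1 and $10^{-6}$:

```python
import numpy as np

A = np.array([[0.2315, -0.3071],
              [-0.5525, 0.7387]])

s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[1])                      # kappa_2: on the order of 10^3

def rel_err(b, e):
    x = np.linalg.solve(A, b)
    x_t = np.linalg.solve(A, b + e)     # perturbed right-hand side b~ = b + e
    return np.linalg.norm(x - x_t) / np.linalg.norm(x)

b1, e1 = np.array([0.6, -0.8]), np.array([0.0008, 0.0006])
b2, e2 = np.array([0.8, 0.6]),  np.array([0.0006, -0.0008])
print(rel_err(b1, e1))   # ~1     : e along the direction of the small singular value
print(rel_err(b2, e2))   # ~1e-6  : e along the direction of the large singular value
```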
Checking the solution
- Assume we found a solution $\tilde{x}$ for Ax = b
- Compute the residual $r = b - A\tilde{x}$, so that $A\tilde{x} = b - r$
- If r = 0 exactly, then $\tilde{x}$ must be the exact solution
- What about when r is only close to zero (say, with t digits)?
- Can we say anything about how accurate $\tilde{x}$ is?
- We have already answered this question
- Set $A\tilde{x} = b - r$, so that r plays the role of e in our previous expression:
$\dfrac{1}{\kappa}\, \dfrac{\|r\|}{\|b\|} \leq \dfrac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa\, \dfrac{\|r\|}{\|b\|}$
- Residuals are an indication of solution accuracy only when the matrix is well conditioned