

slide-1
SLIDE 1

NORMS, INNER PRODUCTS, AND ORTHOGONALITY

slide-2
SLIDE 2

Vector norms

  • Generalize the familiar concept of “length” from $\mathbb{R}^2$ and $\mathbb{R}^3$ to higher dimensions
  • Can’t really visualize in higher dimensions
  • Must trust algebra
  • Generalize the concept of length itself
  • Find basic properties every “length” should have
  • Find different scalar functions of vectors that share these properties

slide-3
SLIDE 3

Euclidean Vector Norm

  • In R2 and R3 we have Pythagoras!
  • Begs the generalization to higher dimensions

For $v = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2$ and $u = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3$:

$\|v\| = \sqrt{x^2 + y^2} = \sqrt{v^T v}, \qquad \|u\| = \sqrt{\|v\|^2 + z^2} = \sqrt{x^2 + y^2 + z^2} = \sqrt{u^T u}$

In general,

$\|w\| = \left( \sum_{i=1}^{n} w_i^2 \right)^{1/2} = \sqrt{w^T w}, \quad w \in \mathbb{R}^{n \times 1}, \qquad \|w\| = \left( \sum_{i=1}^{n} |w_i|^2 \right)^{1/2} = \sqrt{w^* w}, \quad w \in \mathbb{C}^{n \times 1}$

slide-4
SLIDE 4

A few notes

  • The distance between two points u and v in $\mathbb{R}^n$ is simply $\|u - v\|$
  • The definition guarantees $\|v\| \ge 0$, $\|v\| = 0 \Rightarrow v = 0$, and $\|\alpha v\| = |\alpha|\,\|v\|$ for all scalars $\alpha$
  • The unit norm vector in the same direction as v is simply $v / \|v\|$

slide-5
SLIDE 5

Connection to standard inner-product

  • We already saw one connection between the

standard inner-product and Euclidean norm

  • There is another important connection in the

form of an inequality known as the Cauchy- Bunyakovskii-Schwarz (CBS) Inequality

$x^T y = \sum_{i=1}^{n} x_i y_i, \quad \|x\| = \sqrt{x^T x}, \qquad x, y \in \mathbb{R}^{n \times 1}$

$x^* y = \sum_{i=1}^{n} \bar{x}_i y_i, \quad \|x\| = \sqrt{x^* x}, \qquad x, y \in \mathbb{C}^{n \times 1}$

slide-6
SLIDE 6

The CBS Inequality

  • Cauchy-Bunyakovskii-Schwarz inequality: $|x^* y| \le \|x\|\,\|y\|$
  • Furthermore, equality holds iff $y = \alpha x$
  • Useful, for example, to prove another inequality: the Triangle Inequality
  • It is also important when we generalize the concept of angle to higher dimensions

slide-7
SLIDE 7

Proof of CBS Inequality

  • Equality certainly holds when x = 0
  • Equality also holds when $y = \alpha x$:
    $|x^* y|^2 = (x^* y)\overline{(x^* y)} = (x^* y)(y^* x) = \bigl(x^*(\alpha x)\bigr)\bigl((\bar{\alpha} x^*) x\bigr) = (x^* x)\bigl((\bar{\alpha} x^*)(\alpha x)\bigr) = (x^* x)(y^* y)$
  • Otherwise, look at the norm of $\alpha x - y$:
    $0 < \|\alpha x - y\|^2 = (\alpha x - y)^*(\alpha x - y) = \bar{\alpha}\, x^*(\alpha x - y) - y^*(\alpha x - y)$
  • Pick $\alpha = \dfrac{x^* y}{x^* x}$ to kill the first term:
    $0 < y^* y - \alpha\, y^* x = \dfrac{(y^* y)(x^* x) - (x^* y)(y^* x)}{x^* x} = \dfrac{\|x\|^2 \|y\|^2 - |x^* y|^2}{\|x\|^2} \;\Rightarrow\; |x^* y| \le \|x\|\,\|y\|$

slide-8
SLIDE 8

Triangle Inequality

  • Important to ensure our concept of length in

higher dimensions makes sense

  • Proof

$\|u + v\| \le \|u\| + \|v\|, \qquad u, v \in \mathbb{C}^n$

$\|u + v\|^2 = (u + v)^*(u + v) = u^* u + u^* v + v^* u + v^* v = \|u\|^2 + u^* v + v^* u + \|v\|^2$

$u^* v + v^* u = 2\,\mathrm{Re}(u^* v) \le 2|u^* v| \le 2\|u\|\,\|v\|$

$\|u + v\|^2 \le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2 = \bigl(\|u\| + \|v\|\bigr)^2$

slide-9
SLIDE 9

Other notions of length

  • There are three commonly used norms
  • Special cases of the p-norm, p ≥ 1

  • Euclidean norm: $\|x\|_2 = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}$
  • Manhattan (grid) norm: $\|x\|_1 = \sum_{i=1}^{n} |x_i|$
  • Max norm: $\|x\|_\infty = \max_i |x_i|$
  • General p-norm, for $x \in \mathbb{C}^n$: $\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$
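As a quick sanity check of these definitions, here is a minimal sketch in Python (assuming numpy is available; the slides themselves prescribe no software) that computes the 1-, 2-, infinity- and general p-norms by hand and compares them against numpy.linalg.norm:

    import numpy as np

    x = np.array([3.0, -4.0, 12.0])

    # Euclidean (2-), Manhattan (1-), and max (infinity-) norms, straight from the definitions
    norm2 = np.sqrt(np.sum(np.abs(x) ** 2))
    norm1 = np.sum(np.abs(x))
    norm_inf = np.max(np.abs(x))

    def p_norm(x, p):
        """General p-norm, p >= 1."""
        return np.sum(np.abs(x) ** p) ** (1.0 / p)

    assert np.isclose(norm2, np.linalg.norm(x, 2))
    assert np.isclose(norm1, np.linalg.norm(x, 1))
    assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
    assert np.isclose(p_norm(x, 3), np.linalg.norm(x, 3))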

slide-10
SLIDE 10

General Vector Norms

  • All p-norms satisfy the properties
  • From now on, we define a norm for a real or

complex vector space V as any function mapping V into R that satisfies these conditions

  • No need for coordinates

A norm $\|\cdot\|$ must satisfy

$\|x\| \ge 0 \quad\text{and}\quad \|x\| = 0 \Rightarrow x = 0$

$\|\alpha x\| = |\alpha|\,\|x\| \quad\text{for all scalars } \alpha$

$\|x + y\| \le \|x\| + \|y\|$

slide-11
SLIDE 11

Matrix norms

  • Since Cm×n is a vector space with dimension mn,

we could use any vector norm from Cmn

  • The Euclidean norm leads to the Frobenius norm
  • Not as useful as we would like
  • Motivates useful property related to matrix product
  • Compatibility with Euclidean vector norm
  • Submultiplicative property

$\|A\|_F^2 = \sum_{i,j} |a_{ij}|^2 = \sum_i \|A_{i*}\|_2^2 = \sum_j \|A_{*j}\|_2^2 = \mathrm{trace}(A^* A)$

Compatibility with the Euclidean vector norm:

$\|Ax\|_2^2 = \sum_i |A_{i*} x|^2 \le \sum_i \|A_{i*}\|_2^2\, \|x\|_2^2 = \|A\|_F^2\, \|x\|_2^2$

Submultiplicative property:

$\|AB\|_F^2 = \sum_j \|[AB]_{*j}\|_2^2 = \sum_j \|A [B]_{*j}\|_2^2 \le \|A\|_F^2 \sum_j \|B_{*j}\|_2^2 = \|A\|_F^2\, \|B\|_F^2$
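A minimal numerical check of the three Frobenius-norm properties above (a sketch assuming numpy; the random matrices simply stand in for A, B and x):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((3, 5))
    x = rng.standard_normal(3)

    # ||A||_F^2 = sum of squared entries = trace(A^T A)
    frob2 = np.sum(A ** 2)
    assert np.isclose(frob2, np.trace(A.T @ A))

    # Compatibility with the Euclidean vector norm: ||Ax||_2 <= ||A||_F ||x||_2
    assert np.linalg.norm(A @ x) <= np.sqrt(frob2) * np.linalg.norm(x) + 1e-12

    # Submultiplicative property: ||AB||_F <= ||A||_F ||B||_F
    assert np.linalg.norm(A @ B, "fro") <= np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro") + 1e-12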

slide-12
SLIDE 12

General Matrix Norm

  • A matrix norm is a function from the set of

all complex matrices (of finite order) into R that satisfies the following properties

  • Frobenius norm is one example
  • Are there other examples?

A matrix norm $\|\cdot\|$ must satisfy

$\|A\| \ge 0 \quad\text{and}\quad \|A\| = 0 \Rightarrow A = 0$

$\|\alpha A\| = |\alpha|\,\|A\| \quad\text{for all scalars } \alpha$

$\|A + B\| \le \|A\| + \|B\|$ (matrices of the same size)

$\|AB\| \le \|A\|\,\|B\|$ (conformable matrices)

slide-13
SLIDE 13

Induced Matrix Norms

  • A vector norm defined on Cp for p = m,n

induces a matrix norm on Cm×n by setting

  • The induced matrix norm is compatible with

the underlying vector norm

  • Additionally, if A is nonsingular

$\|A\| = \max_{\|x\|=1} \|Ax\|, \qquad A \in \mathbb{C}^{m \times n},\ x \in \mathbb{C}^{n \times 1}$

$\|Ax\| \le \|A\|\,\|x\|$

$\min_{\|x\|=1} \|Ax\| = 1 / \|A^{-1}\|$

slide-14
SLIDE 14

Matrix 2-norm

  • Matrix norm induced by Euclidean vector

norm is not the Frobenius norm

  • Instead,
    $\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \sqrt{\lambda_{\max}}$
    where $\lambda_{\max}$ is the largest $\lambda$ such that $A^* A - \lambda I$ is singular
  • If A is non-singular, then
    $\|A^{-1}\|_2 = \dfrac{1}{\min_{\|x\|_2 = 1} \|Ax\|_2} = \dfrac{1}{\sqrt{\lambda_{\min}}}$
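A small sketch (assuming numpy; the matrix is a stand-in) that computes the 2-norm from the extreme eigenvalues of $A^*A$ and compares against the library routine:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))

    # Extreme eigenvalues of A^T A (A^* A in the complex case)
    eigvals = np.linalg.eigvalsh(A.T @ A)

    two_norm = np.sqrt(eigvals.max())             # ||A||_2    = sqrt(lambda_max)
    inv_two_norm = 1.0 / np.sqrt(eigvals.min())   # ||A^-1||_2 = 1 / sqrt(lambda_min)

    assert np.isclose(two_norm, np.linalg.norm(A, 2))
    assert np.isclose(inv_two_norm, np.linalg.norm(np.linalg.inv(A), 2))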

slide-15
SLIDE 15

Why is this useful?

  • Assume we solve a linear system Ax = b and find the computed solution vector $\hat{x}$
  • How close are we to the actual solution x?
  • Ideal measure of error is relative:
    $\dfrac{\|x - \hat{x}\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}Ax - A^{-1}A\hat{x}\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}(b - A\hat{x})\|}{\|\hat{x}\|} = \dfrac{\|A^{-1}\hat{r}\|}{\|\hat{x}\|} \le \|A^{-1}\|\,\dfrac{\|\hat{r}\|}{\|\hat{x}\|}$
  • So knowledge of $\|A^{-1}\|$ is very useful!
  • But how would you compute it?
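A sketch of this bound in action (assuming numpy; the system and the perturbed solution are made up for illustration): the computable residual $\hat{r}$ together with $\|A^{-1}\|$ bounds the unknown relative error.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 5))
    x = rng.standard_normal(5)
    b = A @ x

    x_hat = x + 1e-6 * rng.standard_normal(5)   # a slightly wrong "computed" solution
    r_hat = b - A @ x_hat                       # residual: the only thing we can actually compute

    bound = np.linalg.norm(np.linalg.inv(A), 2) * np.linalg.norm(r_hat) / np.linalg.norm(x_hat)
    true_rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x_hat)
    assert true_rel_err <= bound + 1e-12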

slide-16
SLIDE 16

General Inner Products

  • Euclidean norm came from standard inner-product
  • We already generalized vector norm
  • Now we want to generalize inner product
  • An inner product on a vector space V is a function that

maps each pair of vectors x,y to a scalar such that the following properties hold

$\langle x \mid x \rangle \in \mathbb{R}, \quad \langle x \mid x \rangle \ge 0, \quad \langle x \mid x \rangle = 0 \Leftrightarrow x = 0$

$\langle \alpha x \mid y \rangle = \alpha \langle x \mid y \rangle$

$\langle x + y \mid z \rangle = \langle x \mid z \rangle + \langle y \mid z \rangle$

$\langle x \mid y \rangle = \overline{\langle y \mid x \rangle}$

slide-17
SLIDE 17

Examples of inner products

  • The standard inner-product: in $\mathbb{R}^{n \times 1}$, $\langle x \mid y \rangle = y^T x$; in $\mathbb{C}^{n \times 1}$, $\langle x \mid y \rangle = y^* x$
  • The elliptical inner product or A-inner product: $\langle x \mid y \rangle = y^* A^* A x$
  • Only for A non-singular (why?)
  • The trace inner product: $\langle A \mid B \rangle = \mathrm{trace}(A^T B)$, or $\langle A \mid B \rangle = \mathrm{trace}(A^* B)$ in the complex case
  • Standard inner-product for matrices (leads to the Frobenius matrix norm)
  • Reduces to the standard inner-product for column vectors
  • In the space V of functions defined on (a, b): $\langle f \mid g \rangle = \int_a^b f(t)\, g(t)\, dt$

slide-18
SLIDE 18

Norms in Inner-Product Spaces

  • If V is an inner-product space with an inner product $\langle x \mid y \rangle$, then $\|x\| = \sqrt{\langle x \mid x \rangle}$ defines a norm on V
  • Most norm properties are immediately satisfied from the definition of a general inner product
  • The proof of the Triangle Inequality comes again from the CBS inequality
  • CBS holds for general inner products

slide-19
SLIDE 19

Not all norms define an inner product

  • Given a norm $\|x\|$ on V, is there always an inner product such that $\|x\|^2 = \langle x \mid x \rangle$?
  • No, not always. And yes iff the norm satisfies the parallelogram identity
    $\|x + y\|^2 + \|x - y\|^2 = 2\bigl(\|x\|^2 + \|y\|^2\bigr)$
  • In which case
    $\langle x \mid y \rangle = \tfrac{1}{4}\bigl(\|x + y\|^2 - \|x - y\|^2\bigr)$
    is an inner product on V
  • Only holds for p = 2 among the p-norms
  • Complex case is a bit more complicated
slide-20
SLIDE 20

ORTHOGONALITY

slide-21
SLIDE 21
Orthogonality

  • Generalize the concept based on the Pythagorean theorem and general inner product spaces:
    $\|u - v\|^2 = \|u\|^2 + \|v\|^2 \iff \langle u - v \mid u - v \rangle = \langle u \mid u \rangle + \langle v \mid v \rangle \iff \langle u \mid v \rangle + \langle v \mid u \rangle = \langle u \mid v \rangle + \overline{\langle u \mid v \rangle} = 0 \iff \mathrm{Re}\,\langle u \mid v \rangle = 0$
  • In an inner-product space V, we say two vectors x, y are orthogonal when $\langle x \mid y \rangle = 0$, and denote this by $x \perp y$
  • In $\mathbb{R}^n$, with the standard inner-product, $x \perp y \Leftrightarrow y^T x = 0$
  • In $\mathbb{C}^n$, with the standard inner-product, $x \perp y \Leftrightarrow y^* x = 0$

slide-22
SLIDE 22

Example of Orthogonality

  • Find the solution $\hat{x}$ to a linear least squares problem Ax = b
  • In particular when b is not in R(A), we wonder about the residual $\hat{r} = b - A\hat{x}$
  • Turns out that $\hat{r} \perp R(A)$:
    $A^T A \hat{x} = A^T b \;\Rightarrow\; A^T (b - A\hat{x}) = 0 \;\Rightarrow\; A^T \hat{r} = 0 \;\Rightarrow\; y^T A^T \hat{r} = 0 \;\Rightarrow\; (Ay)^T \hat{r} = 0$

slide-23
SLIDE 23

Angles between general vectors

  • Use the law of cosines, which generalizes the

Pythagorean theorem for all triangles

  • CBS inequality guarantees the RHS is in [–1, 1]
  • So there is a $\theta$ for every u and v:
    $\cos\theta = \dfrac{\|u\|^2 + \|v\|^2 - \|u - v\|^2}{2\,\|u\|\,\|v\|} = \dfrac{\mathrm{Re}\,\langle u \mid v \rangle}{\|u\|\,\|v\|}$
slide-24
SLIDE 24

Orthonormal sets

  • $B = \{u_1, u_2, \ldots, u_n\}$ is called an orthonormal set whenever $\|u_i\| = 1$ for each i and $u_i \perp u_j$ whenever $i \ne j$, i.e.,
    $\langle u_i \mid u_j \rangle = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$
  • Every orthonormal set is l.i.
  • Every orthonormal set with n vectors from an n-dimensional vector space V constitutes an orthonormal basis for V

slide-25
SLIDE 25

Advantage of Orthonormal Bases

  • Recall that writing a vector in a basis B in general requires solving a linear system
  • When $B = \{u_1, u_2, \ldots, u_n\}$ is orthonormal, the coordinates are immediately available
  • Simply write $x = \xi_1 u_1 + \xi_2 u_2 + \cdots + \xi_n u_n$ and notice that
    $\langle x \mid u_i \rangle = \Bigl\langle \sum_{j=1}^{n} \xi_j u_j \Bigm| u_i \Bigr\rangle = \sum_{j=1}^{n} \xi_j \langle u_j \mid u_i \rangle = \xi_i \langle u_i \mid u_i \rangle = \xi_i$

slide-26
SLIDE 26

Fourier Expansion

  • If $B = \{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for an inner product space V, then each x in V can be expressed as
    $x = \langle x \mid u_1 \rangle u_1 + \langle x \mid u_2 \rangle u_2 + \cdots + \langle x \mid u_n \rangle u_n$
  • This is the Fourier expansion of x
  • The scalars $\langle x \mid u_i \rangle$ are the coordinates of x with respect to B, its Fourier coefficients
  • Each $\langle x \mid u_i \rangle u_i$ represents the projection of x onto the line spanned by $u_i$

slide-27
SLIDE 27

Idea of Gram-Schmidt Procedure

  • Orthonormal bases are often desirable
  • Given a general basis $B = \{x_1, x_2, \ldots, x_n\}$ for a vector space V, can we construct an orthonormal basis spanning V?
  • Yes: use the Gram-Schmidt procedure
  • At each step k we complete an orthonormal basis $S_k$ for $\mathrm{span}\{x_1, x_2, \ldots, x_k\}$
  • I.e., proceed incrementally

slide-28
SLIDE 28

Gram-Schmidt Procedure Steps

  • The first step is easy
  • Now assume we have Sk and want to find uk+1
  • uk+1 ⊥ ui, i < k + 1
  • Use Fourier expansion of xk+1 to solve for uk+1

$u_1 = \dfrac{x_1}{\|x_1\|}, \qquad S_1 = \{u_1\}$

We need $\|u_{k+1}\| = 1$ and $\mathrm{span}\{x_1, x_2, \ldots, x_k, x_{k+1}\} = \mathrm{span}\bigl(S_k \cup \{u_{k+1}\}\bigr)$. Write the Fourier expansion

$x_{k+1} = \alpha_{k+1} u_{k+1} + \sum_{i=1}^{k} \langle u_i \mid x_{k+1} \rangle u_i$

and solve for the new direction:

$\alpha_{k+1} u_{k+1} = x_{k+1} - \sum_{i=1}^{k} \langle u_i \mid x_{k+1} \rangle u_i, \qquad u_{k+1} = \dfrac{\alpha_{k+1} u_{k+1}}{\|\alpha_{k+1} u_{k+1}\|}$
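A minimal sketch of this procedure in Python (assuming numpy, with the columns of X as the starting basis; classical Gram-Schmidt exactly as on this slide, ignoring the numerical-stability caveats discussed later):

    import numpy as np

    def gram_schmidt(X):
        """Classical Gram-Schmidt on the columns of X (assumed linearly independent)."""
        n = X.shape[1]
        U = np.zeros_like(X, dtype=float)
        for k in range(n):
            # Subtract the components <u_i | x_k> u_i along the u_i already found
            v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])
            U[:, k] = v / np.linalg.norm(v)   # normalize to get u_k
        return U

    rng = np.random.default_rng(3)
    X = rng.standard_normal((6, 4))
    U = gram_schmidt(X)
    assert np.allclose(U.T @ U, np.eye(4))    # the columns are orthonormal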

slide-29
SLIDE 29

Fourier Expansion in Matrix Form

  • Let $B = \{u_1, u_2, \ldots, u_n\}$ be an orthonormal basis for V, and put all $u_i$ as columns in U
  • Now, using the standard inner-product,
    $x = \sum_{j=1}^{n} u_j (u_j^* x) = U \begin{pmatrix} u_1^* x \\ u_2^* x \\ \vdots \\ u_n^* x \end{pmatrix} = U \begin{pmatrix} [U^*]_{1*}\, x \\ [U^*]_{2*}\, x \\ \vdots \\ [U^*]_{n*}\, x \end{pmatrix} = U U^* x$
  • So $U^* = U^{-1}$? Indeed,
    $[U^* U]_{ij} = u_i^* u_j = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$

slide-30
SLIDE 30

Gram-Schmidt in Matrix Form

  • First step is $u_1 = \dfrac{x_1}{\|x_1\|}$
  • To produce $u_{k+1}$, compute
    $u_{k+1} = \dfrac{(I - U_k U_k^*)\, x_{k+1}}{\|(I - U_k U_k^*)\, x_{k+1}\|}$
    where the matrix $U_k$, whose columns are orthonormal, is given by $U_k = \bigl( u_1\ u_2\ \cdots\ u_k \bigr)$

slide-31
SLIDE 31

Unitary and Orthogonal Matrices

  • A unitary matrix is a complex matrix Un×n whose

columns form an orthonormal basis for Cn

  • An orthogonal matrix is a real matrix Pn×n whose

columns form an orthonormal basis for Rn

  • The following characterizations are equivalent
  • P (or U) has orthonormal columns
  • P (or U) has orthonormal rows
  • $P^{-1} = P^T$ (or $U^{-1} = U^*$)
  • $\|Px\|_2 = \|x\|_2$ for all x in $\mathbb{R}^{n \times 1}$ (or $\|Ux\|_2 = \|x\|_2$ for all x in $\mathbb{C}^{n \times 1}$)

slide-32
SLIDE 32

The QR Factorization

  • The Gram-Schmidt procedure can be seen as

a matrix factorization

  • Put each $x_k$ of the basis $B = \{x_1, x_2, \ldots, x_n\}$ as a column of matrix A
  • Put each $u_k$ in $S_n$ as a column of an orthogonal or unitary matrix Q
  • Pick column i of matrix R to hold the coefficients of the linear combination of the $u_k$ that results in $x_i$
  • Results in R being upper-triangular!

slide-33
SLIDE 33

The QR factorization

  • Every matrix Am×n with l.i. columns can be

uniquely factored as A = QR where the columns of Qm×n form an orthonormal basis for R(A) and Rn×n is an upper-triangular matrix with positive diagonal entries

$R = \begin{pmatrix}
\alpha_1 & q_1^* a_2 & q_1^* a_3 & \cdots & q_1^* a_n \\
         & \alpha_2  & q_2^* a_3 & \cdots & q_2^* a_n \\
         &           & \alpha_3  & \cdots & q_3^* a_n \\
         &           &           & \ddots & \vdots    \\
         &           &           &        & \alpha_n
\end{pmatrix}$

$a_k = \alpha_k q_k + \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i, \qquad \alpha_k = \Bigl\| a_k - \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i \Bigr\|$
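In practice the factorization is obtained from a library routine rather than from Gram-Schmidt. A sketch assuming numpy (whose reduced QR matches the theorem up to the signs of the diagonal of R):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 4))       # columns are l.i. with probability 1

    Q, R = np.linalg.qr(A)                # "reduced" QR: Q is 6x4, R is 4x4 upper-triangular

    assert np.allclose(Q.T @ Q, np.eye(4))    # orthonormal columns spanning R(A)
    assert np.allclose(R, np.triu(R))         # R is upper-triangular
    assert np.allclose(A, Q @ R)              # A = QR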

slide-34
SLIDE 34

Least-squares application of QR

  • Start from the normal equations
  • Decomposing ATA into LU is both inefficient

and leads to information loss

  • From the QR factorization, we have
  • If RT is non-singular (What if it isn’t?)

$A^T A x = A^T b$

$(QR)^T Q R x = R^T (Q^T Q) R x = R^T R x = R^T Q^T b$

$R x = Q^T b \qquad\Rightarrow\qquad x = R^{-1} Q^T b$
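A small sketch of the QR route to least squares (assuming numpy; the data is a made-up overdetermined system), which gives the same answer as the normal equations without ever forming $A^T A$:

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((8, 3))
    b = rng.standard_normal(8)

    Q, R = np.linalg.qr(A)                     # reduced QR of A
    x_qr = np.linalg.solve(R, Q.T @ b)         # solve R x = Q^T b (R is upper-triangular and non-singular here)

    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)   # reference least-squares solution
    assert np.allclose(x_qr, x_ls)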

slide-35
SLIDE 35

General orthogonal projectors

  • Given an m × n orthogonal or unitary matrix U,

find the orthogonal projection z in R(U) of a vector b not necessarily in R(U)

  • This is just like the Fourier expansion, for an

incomplete basis

We look for $z = Ux \in R(U)$ with $(z - b) \perp Uy$ for all y:

$(Uy)^*(Ux - b) = 0 \;\Rightarrow\; y^* U^* (Ux - b) = 0 \;\Rightarrow\; y^* x = y^* U^* b$

Choosing $y = e_i$ for each i proves $x = U^* b$, so $z = U U^* b$

slide-36
SLIDE 36

Elementary orthogonal projectors

  • For a vector $u \in \mathbb{C}^{n \times 1}$, $u \ne 0$, the matrix
    $Q = I - \dfrac{u u^*}{u^* u}$
    is called an elementary orthogonal projector
  • Note that $Q^2 = Q$ and $Q^* = Q$ (but Q is not a unitary matrix!)
  • $x = Qx + (I - Q)x$ with $Qx \perp (I - Q)x$
  • $(I - Q)x = \dfrac{u (u^* x)}{u^* u}$ is the orthogonal projection of x in $\mathrm{span}\{u\}$
  • $Qx$ is the orthogonal projection of x in $u^\perp$, the orthogonal complement of u

slide-37
SLIDE 37

Elementary reflectors (Householder transformation)

  • For a vector $u \in \mathbb{C}^{n \times 1}$, $u \ne 0$, the matrix
    $R = I - 2\,\dfrac{u u^*}{u^* u}$
    is the elementary reflector about $u^\perp$
  • Note that $R = 2Q - I$ and $QR = RQ = Q$
  • $R^2 = I$, $R^* = R$, and $R^{-1} = R$: a unitary matrix!
  • $\|x - Qx\| = \|Rx - Qx\| = \Bigl\| \dfrac{u u^*}{u^* u}\, x \Bigr\|$
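A quick numerical confirmation of these identities (a sketch assuming numpy; u is an arbitrary nonzero vector):

    import numpy as np

    rng = np.random.default_rng(6)
    u = rng.standard_normal(4)

    Q = np.eye(4) - np.outer(u, u) / (u @ u)        # elementary orthogonal projector onto u-perp
    R = np.eye(4) - 2 * np.outer(u, u) / (u @ u)    # elementary reflector about u-perp

    assert np.allclose(Q @ Q, Q)                    # Q^2 = Q (idempotent, not unitary)
    assert np.allclose(R, 2 * Q - np.eye(4))        # R = 2Q - I
    assert np.allclose(R @ R, np.eye(4))            # R^2 = I, so R^-1 = R
    assert np.allclose(R, R.T)                      # R is symmetric (Hermitian in general)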

slide-38
SLIDE 38

Geometry of reflectors and projectors

[Figure: the vector x, its orthogonal projection Qx onto $u^\perp$, its reflection Rx about $u^\perp$, and the component $(I - Q)x$ along u, with $\|x - Qx\| = \|Rx - Qx\|$.]

$R = I - 2\,\dfrac{u u^*}{u^* u}, \qquad Q = I - \dfrac{u u^*}{u^* u}$

slide-39
SLIDE 39

Reflector into given vector

  • Can find u such that the reflector maps a given v onto a multiple of a given unit vector e ($\|e\| = 1$):
    $u = v \pm \mu e \;\Rightarrow\; Rv = \mp\mu e, \qquad \mu = \|v\|\,\dfrac{e^* v}{|e^* v|}$
  • Proof
    $R = I - 2\,\dfrac{u u^*}{u^* u} \;\Rightarrow\; Rv = v - 2\,\dfrac{u^* v}{u^* u}\, u = \mp\mu e \iff 2 u^* v = u^* u$
    $2 u^* v - u^* u = v^* v - |\mu|^2 \pm \bar{\mu}\, e^* v \mp \mu\, v^* e = 0$
    which holds for the choice of $\mu$ above, since $|\mu| = \|v\|$ and $\bar{\mu}\, e^* v = \|v\|\,|e^* v|$ is real

slide-40
SLIDE 40

Plane rotations (Givens rotations)

  • Orthogonal matrix of the form
    $P_{ij}(c, s) = \begin{pmatrix}
    1 &        &    &        &    &        &   \\
      & \ddots &    &        &    &        &   \\
      &        &  c & \cdots &  s &        &   \\
      &        &    & \ddots &    &        &   \\
      &        & -s & \cdots &  c &        &   \\
      &        &    &        &    & \ddots &   \\
      &        &    &        &    &        & 1
    \end{pmatrix}$
    with c in positions (i, i) and (j, j), s in position (i, j), and −s in position (j, i), where $c^2 + s^2 = 1$
  • Performs a rotation in the i, j plane of $\mathbb{R}^n$

slide-41
SLIDE 41

Pivoting with plane rotations

  • Kill the element below the pivot: with $c = \dfrac{x_i}{\sqrt{x_i^2 + x_j^2}}$ and $s = \dfrac{x_j}{\sqrt{x_i^2 + x_j^2}}$ (assuming $x_i^2 + x_j^2 \ne 0$),
    $P_{ij}(c, s)\, x = \begin{pmatrix} x_1 \\ \vdots \\ c x_i + s x_j \\ \vdots \\ -s x_i + c x_j \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ \sqrt{x_i^2 + x_j^2} \\ \vdots \\ 0 \\ \vdots \\ x_n \end{pmatrix}$
  • “Exchange” rows:
    $P_{ij}(0, 1)\, x = \begin{pmatrix} x_1 \\ \vdots \\ x_j \\ \vdots \\ -x_i \\ \vdots \\ x_n \end{pmatrix}$
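A sketch of one such rotation (assuming numpy; the helper name and the test vector are made up), zeroing x[j] against the pivot x[i]:

    import numpy as np

    def givens(x, i, j):
        """Plane rotation P_ij(c, s) that zeroes x[j] against the pivot x[i]."""
        r = np.hypot(x[i], x[j])            # sqrt(x_i^2 + x_j^2), assumed nonzero
        c, s = x[i] / r, x[j] / r
        P = np.eye(len(x))
        P[i, i], P[i, j] = c, s
        P[j, i], P[j, j] = -s, c
        return P

    x = np.array([3.0, 1.0, 4.0, 1.0])
    P = givens(x, 0, 2)
    y = P @ x
    assert np.isclose(y[0], np.hypot(x[0], x[2])) and np.isclose(y[2], 0.0)
    assert np.allclose(P.T @ P, np.eye(4))  # plane rotations are orthogonal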

slide-42
SLIDE 42

Combined rotations into given vector

  • Rotations take any vector x into $\|x\| e_1$:
    $P_{12}\, x = \begin{pmatrix} \sqrt{x_1^2 + x_2^2} \\ 0 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}, \quad P_{13} P_{12}\, x = \begin{pmatrix} \sqrt{x_1^2 + x_2^2 + x_3^2} \\ 0 \\ 0 \\ x_4 \\ \vdots \\ x_n \end{pmatrix}, \quad \ldots, \quad P_{1n} \cdots P_{13} P_{12}\, x = \begin{pmatrix} \|x\| \\ 0 \\ \vdots \\ 0 \end{pmatrix}$
  • Use this as a building block for rotations into an arbitrary vector $\|x\|\, e$, with $\|e\| = 1$
  • The result of the combined rotations is not a rotation, but it is an orthogonal transformation

slide-43
SLIDE 43

Orthogonal reduction

  • Matrix A can be reduced to echelon form by

elementary row operations

  • Eliminate all entries below pivot
  • Matrix A can also be reduced using

elementary reflectors or plane rotations

  • Using elementary reflectors leads to Householder reduction

  • Using plane rotations leads to Givens

reduction

slide-44
SLIDE 44

Orthogonal reduction

  • Start by finding $R_1$ such that $R_1 [A]_{*1} = \mu_1 e_1$, so that
    $R_1 A = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & A_2 \end{pmatrix}$
  • Then find $R_2$ such that $R_2 [A_2]_{*1} = \mu_2 e_1$, so that
    $\begin{pmatrix} 1 & 0 \\ 0 & R_2 \end{pmatrix} R_1 A = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & R_2 A_2 \end{pmatrix}, \qquad R_2 A_2 = \begin{pmatrix} \mu_2 & t_2^T \\ 0 & A_3 \end{pmatrix}$
  • And so on…

slide-45
SLIDE 45

Reduces to trapezoidal form

$PA = \begin{pmatrix} * & * & \cdots & * \\ & * & \cdots & * \\ & & \ddots & \vdots \\ & & & * \end{pmatrix}$ in the square case, and in general an upper trapezoidal matrix: zeros below the staircase of pivots, with any remaining rows at the bottom entirely zero.

slide-46
SLIDE 46

Unicity of QR factorization

  • When matrix is square and non-singular,

Gram-Schmidt, Householder, and Givens produce QR, with R triangular

  • Gram-Schmidt has positive diagonal.

Householder and Givens don’t, but can

  • In this case, factorization is unique
slide-47
SLIDE 47

We now have two factorizations

  • The LU factorization and QR factorization
  • LU gives the roadmap to Gaussian elimination

applied to a square, non-singular matrix

  • QR gives the roadmap to Gram-Schmidt applied to a

matrix with independent columns

  • Both LU and QR can be used to solve linear

systems efficiently

  • LU about twice as efficient
  • QR is useful for least squares as well, more stable

$QRx = b \;\Rightarrow\; Rx = Q^T b$

slide-48
SLIDE 48

Practical considerations

  • Gram-Schmidt as presented is numerically

unstable in floating-point

  • For example if applied to a set of vectors that is not

even close to being orthogonal

  • Even modified Gram-Schmidt stable only for LS
  • Householder/Givens are stable and faster
  • When dealing with sparse problems
  • Gram-Schmidt and Householder can transform it into a dense problem

  • Not enough memory?
  • Maybe Givens can help
slide-49
SLIDE 49

COMPLEMENTARY SUBSPACES

slide-50
SLIDE 50

Complementary Subspaces

  • Subspaces X, Y of V are said to be complementary whenever $V = X + Y$ and $X \cap Y = \{0\}$, in which case V is said to be the direct sum of X and Y, denoted by $V = X \oplus Y$
  • Given bases $B_X$ and $B_Y$ for X and Y, the following statements are equivalent
  • $V = X \oplus Y$
  • For each v in V, there are unique vectors x in X and y in Y such that v = x + y
  • $B_X \cup B_Y$ is a basis for V and $B_X \cap B_Y = \emptyset$

slide-51
SLIDE 51

Projection

  • Suppose that $V = X \oplus Y$, so that for each v in V there are unique vectors x in X and y in Y such that v = x + y
  • The vector x is the projection of v onto X along Y
  • The vector y is the projection of v onto Y along X
  • When X and Y are not orthogonal subspaces, this is an “oblique projection”
  • As opposed to the orthogonal projection

slide-52
SLIDE 52

Visually

[Figure: an oblique decomposition v = x + y, with x on the subspace X and y on the subspace Y.]

slide-53
SLIDE 53

Computing Oblique Projections

  • Assume we have bases $B_X$ and $B_Y$ for complementary subspaces X and Y of $\mathbb{R}^n$
  • This means that $B = B_X \cup B_Y = \{x_1, x_2, \ldots, x_r, y_1, y_2, \ldots, y_{n-r}\}$ is a basis for $\mathbb{R}^n$
  • It is easier to find the coordinate matrix for the projection P onto X along Y in basis B:
    $[P]_B = \bigl( [P(x_1)]_B \cdots [P(x_r)]_B \;\; [P(y_1)]_B \cdots [P(y_{n-r})]_B \bigr) = \bigl( e_1 \cdots e_r \;\; 0 \cdots 0 \bigr) = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$
  • Placing the elements of B ($x_i$ and $y_i$) as columns of a matrix B then gives
    $P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$
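A sketch of this recipe (assuming numpy; the particular bases for X and Y below are made up for illustration):

    import numpy as np

    # Complementary subspaces of R^3: X spanned by e1, e2 and Y spanned by (1, 1, 1)
    X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # columns x_1, x_2
    Y = np.array([[1.0], [1.0], [1.0]])                  # column y_1

    B = np.hstack([X, Y])                                # basis B_X U B_Y as columns
    r = X.shape[1]
    J = np.zeros((3, 3)); J[:r, :r] = np.eye(r)          # the block matrix [[I_r, 0], [0, 0]]

    P = B @ J @ np.linalg.inv(B)                         # projector onto X along Y

    assert np.allclose(P @ P, P)                         # idempotent
    assert np.allclose(P @ X, X)                         # fixes X
    assert np.allclose(P @ Y, 0)                         # annihilates Y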

slide-54
SLIDE 54

Summary of Projections

  • Let X and Y be complementary subspaces of V. Each v in V can

be uniquely expressed as x + y, with x in X and y in Y

  • The unique linear operator P defined by Pv = x is the

projector onto X along Y

  • The following properties hold
  • P2 = P (i.e., P is idempotent)
  • I – P is the complementary projector onto Y along X
  • R(P) = {x | Px = x}
  • R(P) = N(I – P) = X and R(I – P) = N(P) = Y
  • If V = Rn or Cn, then P is given by

where B = [X | Y] and X, Y are respective bases for X, Y

$P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$

slide-55
SLIDE 55

Projectors and Idempotents

  • A linear operator P on V is a projector if and only if it is idempotent
  • Proof
  • We have just seen that projector ⇒ idempotent
  • Need to prove that idempotent ⇒ projector, i.e., $P^2 = P \Rightarrow R(P)$ and $N(P)$ are complementary:
    $v = Pv + (I - P)v$ with $Pv \in R(P)$ and $(I - P)v \in N(P)$
    $R(P) \cap N(P) = \{0\}$: if $x = Py$ and $Px = 0$, then $0 = Px = P^2 y = Py = x$

slide-56
SLIDE 56

RANGE-NULLSPACE DECOMPOSITION

slide-57
SLIDE 57

Complementary subspaces associated to a square matrix

  • Let’s go back to a matrix $A_{n \times n}$
  • The Rank+Nullity theorem guarantees that $\dim R(A) + \dim N(A) = n$
  • But are R(A) and N(A) complementary, i.e., is $R(A) \cap N(A) = \{0\}$?
  • They certainly are if A is non-singular. Duh.
  • Otherwise, it can happen that they are not; for example,
    $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \in R(A) \cap N(A)$
  • We can find complementary subspaces by looking at the powers of A

slide-58
SLIDE 58

Range-Nullspace Decomposition

  • For every singular matrix $A_{n \times n}$ there exists a positive integer k such that $R(A^k)$ and $N(A^k)$ are complementary subspaces, i.e.,
    $\mathbb{R}^n = R(A^k) \oplus N(A^k)$
  • The smallest positive integer k for which this holds is called the index of A
  • For non-singular matrices, we simply define index(A) = 0

slide-59
SLIDE 59

Proof

  • The nullspace doesn’t shrink and the range doesn’t grow with k:
    $R(A) \supseteq R(A^2) \supseteq \cdots \supseteq R(A^k) \supseteq R(A^{k+1}) \supseteq \cdots$
    $N(A) \subseteq N(A^2) \subseteq \cdots \subseteq N(A^k) \subseteq N(A^{k+1}) \subseteq \cdots$
  • At some point there must be equality: the dimensions can’t shrink below zero or grow above n
  • Once equality is attained, it is preserved forever:
    $R(A^k) = R(A^{k+1}) \;\Rightarrow\; A\,R(A^k) = A\,R(A^{k+1}) \;\Rightarrow\; R(A^{k+1}) = R(A^{k+2})$
  • Equality is attained at the same k for the nullspace:
    $\dim N(A^k) = n - \dim R(A^k) = n - \dim R(A^{k+1}) = \dim N(A^{k+1})$
  • When equality is attained, $R(A^k) \cap N(A^k) = \{0\}$:
    $x \in R(A^k) \cap N(A^k) \;\Rightarrow\; x = A^k y,\ A^k x = 0 \;\Rightarrow\; A^{2k} y = 0 \;\Rightarrow\; y \in N(A^{2k}) = N(A^k) \;\Rightarrow\; x = A^k y = 0$
  • Finally, when equality is attained,
    $\dim\bigl[ R(A^k) + N(A^k) \bigr] = \dim R(A^k) + \dim N(A^k) - \dim\bigl[ R(A^k) \cap N(A^k) \bigr] = \dim R(A^k) + \dim N(A^k) = n \;\Rightarrow\; R(A^k) + N(A^k) = \mathbb{R}^n$

slide-60
SLIDE 60

Summary about the Index of a square matrix

  • The index of a square matrix A is the smallest nonnegative integer k such that any of these holds
  • $\mathrm{rank}(A^k) = \mathrm{rank}(A^{k+1})$
  • $R(A^k) = R(A^{k+1})$ (the point where it stops shrinking)
  • $N(A^k) = N(A^{k+1})$ (the point where it stops growing)
  • For non-singular matrices, index(A) = 0
  • For singular matrices, index(A) is the smallest positive integer k such that any of these holds
  • $R(A^k) \cap N(A^k) = \{0\}$
  • $\mathbb{R}^n = R(A^k) \oplus N(A^k)$

slide-61
SLIDE 61

Nilpotent Matrices

  • Nn×n is said to be nilpotent whenever Nk = 0

for some positive integer k

  • k = index(N) is the smallest positive integer

such that Nk = 0

  • Proof
  • As soon as Nk = 0, we have R(Nk+i) = {0}
  • index(N) ≤ k
  • On the other hand, if we reach index(N) and

R(Nk) ≠ {0}, we could never have Nk = 0

  • index(N) ≥ k
slide-62
SLIDE 62

Example of Nilpotent Matrix

  • Start taking powers
  • index(N) = 3, since N2 ≠ 0 and N3 = 0

$N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad N^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad N^3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

slide-63
SLIDE 63

Application for the Range-Nullspace decomposition

  • If k = index(A), then both $R(A^k)$ and $N(A^k)$ are invariant subspaces of A:
    $A\,R(A^k) = R(A^{k+1}) = R(A^k)$
    $x \in A\,N(A^k) \;\Rightarrow\; x = Aw,\ w \in N(A^k) = N(A^{k+1}) \;\Rightarrow\; A^k x = A^{k+1} w = 0 \;\Rightarrow\; x \in N(A^k)$
  • When we write A in a basis $[\,R(A^k)\ \ N(A^k)\,]$, we get a block diagonal matrix
  • This will be very useful as a step toward the Jordan form we will see later on

slide-64
SLIDE 64

Core-Nilpotent Decomposition

  • Let $A_{n \times n}$ be a singular matrix with index k, and let $\mathrm{rank}(A^k) = r$. There exists a non-singular matrix Q such that
    $Q^{-1} A Q = \begin{pmatrix} C_{r \times r} & 0 \\ 0 & N \end{pmatrix}$
    where C is non-singular and N is nilpotent of index k
  • A is similar to a 2×2 block diagonal matrix containing a non-singular “core” C and a nilpotent component N
  • This block diagonal matrix is known as the core-nilpotent decomposition of A
  • When A is non-singular, this is trivial (and useless)

slide-65
SLIDE 65

Core-Nilpotent Decomposition

  • Proof
  • We have already seen the block diagonal property
  • To see that N is nilpotent, write $Q = \begin{pmatrix} X & Y \end{pmatrix}$ and $Q^{-1} = \begin{pmatrix} U \\ V \end{pmatrix}$, with the columns of X, Y bases for $R(A^k)$, $N(A^k)$:
    $\begin{pmatrix} C^k & 0 \\ 0 & N^k \end{pmatrix} = Q^{-1} A^k Q = \begin{pmatrix} U \\ V \end{pmatrix} A^k \begin{pmatrix} X & Y \end{pmatrix} = \begin{pmatrix} U A^k X & 0 \\ V A^k X & 0 \end{pmatrix}$
    so $N^k = 0$ (the last block column vanishes because $A^k Y = 0$)
  • Since $\mathrm{rank}(C^k) = \mathrm{rank}(A^k) = r$, C is non-singular
  • To see that index(N) = k, imagine it is smaller
  • Then $\mathrm{rank}(A^{k-1}) = \mathrm{rank}(C^{k-1}) = r = \mathrm{rank}(A^k)$
  • So index(A) < k, which is a contradiction

slide-66
SLIDE 66

ORTHOGONAL DECOMPOSITION

slide-67
SLIDE 67

Orthogonal Complement

  • For a subset M of an inner-product space V,

the orthogonal complement M⊥ of M is the set of all vectors in V that are orthogonal to all vectors in M

$M^\perp = \{\, x \in V \mid \langle m \mid x \rangle = 0 \text{ for all } m \in M \,\}$

[Figure: the orthogonal complement $M^\perp$ of a single vector $M = \{x\}$, and of a subspace M.]

slide-68
SLIDE 68

Orthogonal Complementary Subspaces

  • If M is a subspace of a finite-dimensional inner-product space V, then $V = M \oplus M^\perp$
  • Furthermore, if N is another subspace such that $V = M \oplus N$ and $N \perp M$, then $N = M^\perp$
  • Finally, if dim V = n, then $\dim M^\perp = n - \dim M$ and $\bigl( M^\perp \bigr)^\perp = M$
    (apply the previous statement with $N = M^\perp$: from $N \oplus M = V$ and $N \oplus N^\perp = V$ we get $N^\perp = M$)

slide-69
SLIDE 69

Orthogonal Decomposition Theorem

  • For every $A_{m \times n}$:
    $R(A)^\perp = N(A^T), \qquad N(A)^\perp = R(A^T)$
  • Proof
    $x \in R(A)^\perp \iff x^T A y = 0\ \forall y \iff x^T A = 0 \iff x \in N(A^T)$
  • In other words, every matrix in $\mathbb{R}^{m \times n}$ produces orthogonal decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$:
    $\mathbb{R}^m = R(A) \oplus N(A^T), \qquad \mathbb{R}^n = N(A) \oplus R(A^T)$

slide-70
SLIDE 70

Significance #1

  • The orthogonal decomposition theorem gives us decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$ in terms of the four fundamental subspaces
  • The resulting bases for $\mathbb{R}^m$ and $\mathbb{R}^n$ reveal more about the basic components of matrix A itself
  • Combine orthonormal bases for R(A), $N(A^T)$ and for $R(A^T)$, N(A) into bases of $\mathbb{R}^m$ and $\mathbb{R}^n$:
    $U_{m \times m} = \bigl(\, \underbrace{u_1 \cdots u_r}_{\text{basis of } R(A)} \;\; \underbrace{u_{r+1} \cdots u_m}_{\text{basis of } N(A^T)} \,\bigr), \qquad V_{n \times n} = \bigl(\, \underbrace{v_1 \cdots v_r}_{\text{basis of } R(A^T)} \;\; \underbrace{v_{r+1} \cdots v_n}_{\text{basis of } N(A)} \,\bigr)$

slide-71
SLIDE 71

Significance #2

  • What does A look like w.r.t. these bases?
  • Partition $U_{m \times m} = \bigl( U_{R(A)} \;\; U_{N(A^T)} \bigr)$ and $V_{n \times n} = \bigl( V_{R(A^T)} \;\; V_{N(A)} \bigr)$, where each block holds the orthonormal basis of the indicated subspace
  • Multiply on the right by V and on the left by $U^{-1} = U^T$:
    $R = U^{-1} A V = U^T A V = U^T A \bigl( V_{R(A^T)} \;\; V_{N(A)} \bigr) = U^T \bigl( A V_{R(A^T)} \;\; 0 \bigr) = \begin{pmatrix} U_{R(A)}^T A V_{R(A^T)} & 0 \\ U_{N(A^T)}^T A V_{R(A^T)} & 0 \end{pmatrix} = \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}$
    since $A V_{N(A)} = 0$ and every column of $A V_{R(A^T)}$ lies in R(A), hence is orthogonal to the columns of $U_{N(A^T)}$

slide-72
SLIDE 72

The URV Factorization

  • For each matrix $A_{m \times n}$ in $\mathbb{R}^{m \times n}$ of rank r, there are orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$ and a non-singular matrix $C_{r \times r}$ such that
    $A = U R V^T = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}_{m \times n} V^T$
  • The first r columns of U are an orthonormal basis for R(A)
  • The last m – r columns of U are an orthonormal basis for $N(A^T)$
  • The first r columns of V are an orthonormal basis for $R(A^T)$
  • The last n – r columns of V are an orthonormal basis for N(A)
  • In the complex case, replace orthogonal by unitary and conjugate the transposes

slide-73
SLIDE 73

Four fundamental subspaces

[Figure: the four fundamental subspaces. In $\mathbb{R}^n = R(A^T) \oplus N(A)$, $\dim R(A^T) = r$ and $\dim N(A) = n - r$; vectors $A^T y$ span $R(A^T)$, while $Az = 0$ for $z \in N(A)$ and $z \perp A^T y$. In $\mathbb{R}^m = R(A) \oplus N(A^T)$, $\dim R(A) = r$ and $\dim N(A^T) = m - r$; vectors $Ax$ span $R(A)$, while $A^T w = 0$ for $w \in N(A^T)$ and $w \perp Ax$.]

slide-74
SLIDE 74

The action of A

[Figure: the action of A. A vector $v = x + z$, with $x \in R(A^T)$ and $z \in N(A)$, is mapped by A to $y = Av = Ax \in R(A)$; the component z is annihilated. $\mathbb{R}^n = R(A^T) \oplus N(A)$, $\mathbb{R}^m = R(A) \oplus N(A^T)$.]

slide-75
SLIDE 75

Inverting what we can

[Figure: what should $A^\dagger$ do with $u = y + w$, where $y \in R(A)$ and $w \in N(A^T)$? $\mathbb{R}^n = R(A^T) \oplus N(A)$, $\mathbb{R}^m = R(A) \oplus N(A^T)$.]

slide-76
SLIDE 76

Inverting what we can

[Figure: $A^\dagger$ should undo the invertible part of A, sending $y = Ax \in R(A)$ back to $x \in R(A^T)$ rather than to any other preimage $v = x + z$ with $z \in N(A)$.]

slide-77
SLIDE 77

Inverting what we can

[Figure: $A^\dagger$ maps $u = y + w$, with $y \in R(A)$ and $w \in N(A^T)$, to $x \in R(A^T)$: the component w is annihilated and y is pulled back through the invertible map between $R(A^T)$ and $R(A)$.]

slide-78
SLIDE 78

Finding a matrix representation

  • Start from a URV factorization of A: $A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T$
  • Singularity of A has been isolated
  • The problem comes from N(A) and $N(A^T)$
  • The map from $R(A^T)$ to R(A) is invertible
  • It is given by the invertible matrix C
  • Simply transpose the factorization and replace C with $C^{-1}$:
    $A^\dagger_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$

slide-79
SLIDE 79

The Moore-Penrose Inverse

  • In terms of the URV factorization, the Moore-Penrose inverse of $A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T$ is
    $A^\dagger_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$
  • When Ax = b is consistent, $x = A^\dagger b$ is the solution with minimal Euclidean norm
  • When Ax = b is inconsistent, $x = A^\dagger b$ is the least squares solution of minimal Euclidean norm
  • Note: much like the inverse, the pseudoinverse is useful in theory, not in practical applications
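A brief sketch of these claims (assuming numpy, whose pinv routine builds $A^\dagger$ from the SVD, a particular URV factorization; the rank-deficient matrix below is made up):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # a 5x4 matrix of rank 2

    A_pinv = np.linalg.pinv(A)                  # Moore-Penrose pseudoinverse via the SVD

    b = rng.standard_normal(5)                  # generally not in R(A): an inconsistent system
    x = A_pinv @ b                              # least-squares solution of minimal norm

    assert np.allclose(A @ A_pinv @ A, A)       # A A-dagger A = A
    assert np.allclose(A.T @ (b - A @ x), 0)    # residual is orthogonal to R(A): x solves the LS problem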

slide-80
SLIDE 80

The Moore-Penrose Inverse

  • Proof for the consistent case, using $A = A A^\dagger A$ and $R(A^\dagger) = R(A^T)$
  • $x_0 = A^\dagger b$ solves Ax = b:
    $A x_0 = A A^\dagger b = A A^\dagger A x = A x = b$
  • $x_0 = A^\dagger b$ has minimum Euclidean norm: every solution has the form $x = x_0 + n$ with $n \in N(A)$, and since $x_0 \in R(A^\dagger) = R(A^T) \perp N(A)$, we have $x_0 \perp n$ and
    $\|x\|^2 = \|x_0 + n\|^2 = \|x_0\|^2 + \|n\|^2 \ge \|x_0\|^2$

slide-81
SLIDE 81

Matrix decompositions based on four fundamental spaces #1

  • Orthogonal decomposition
  • Used by URV decomposition
  • Works with any matrix
  • Different orthogonal transformations on each side
  • Specializes to the singular value decomposition
  • Range-Nullspace decomposition
  • Used by Core-Nilpotent factorization
  • Matrix must be square
  • Based on similarity transformation
  • Paves the way to the Jordan form
slide-82
SLIDE 82

Matrix decompositions based on four fundamental spaces #2

  • Similarities reveal properties of a transformation that

are independent of bases or coordinate systems

  • Orthogonality is useful in least-squares, or for numerical stability when using floating-point

  • Core-Nilpotent is usually not a special case of URV
  • Not even necessarily when A is a square matrix
  • URVT is not a similarity unless U = V
  • Surprisingly, U = V defines a rather large

class of matrices

slide-83
SLIDE 83

Range Perpendicular to Nullspace

  • For $\mathrm{rank}(A_{n \times n}) = r$, the following are equivalent
  • $R(A) \perp N(A)$
  • $R(A) = R(A^T)$
  • $N(A) = N(A^T)$
  • $A = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$ with U orthogonal and C non-singular
  • Such matrices are called RPN matrices, or range-symmetric matrices, or EP matrices
  • In the complex case, replace orthogonal by unitary and conjugate the transposes

slide-84
SLIDE 84

A map of matrix types

[Figure: a map of matrix types, relating the classes RPN, Normal, Hermitian, Real-Symmetric, and Non-singular.]

slide-85
SLIDE 85

SINGULAR VALUE DECOMPOSITION

slide-86
SLIDE 86

Can we specialize URV even more?

  • We can certainly make C triangular
  • Factor C = QR
  • Note that
    $\begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix}$
  • So that
    $A = U \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix} V^T$
  • We now do even better, by making C diagonal
  • This leads to the Singular Value Decomposition
  • SVD is the Swiss-army knife of decompositions

slide-87
SLIDE 87

Derivation of SVD #1

  • We proceed incrementally: at step k we look for orthogonal $P_k = \begin{pmatrix} x_k & X_k \end{pmatrix}$ and $Q_k = \begin{pmatrix} y_k & Y_k \end{pmatrix}$ such that
    $D_1 = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix} = P_1^T C_1 Q_1, \qquad D_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix} = P_2^T C_2 Q_2, \qquad \ldots$
  • Multiplying by blocks, we have
    $\begin{pmatrix} x_1^T \\ X_1^T \end{pmatrix} C_1 \begin{pmatrix} y_1 & Y_1 \end{pmatrix} = \begin{pmatrix} x_1^T C_1 \\ X_1^T C_1 \end{pmatrix} \begin{pmatrix} y_1 & Y_1 \end{pmatrix} = \begin{pmatrix} x_1^T C_1 y_1 & x_1^T C_1 Y_1 \\ X_1^T C_1 y_1 & X_1^T C_1 Y_1 \end{pmatrix} = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix}$

slide-88
SLIDE 88

Derivation of SVD #1

  • We proceed incrementally
  • Multiplying by blocks, we have
    $\begin{pmatrix} x_1^T \\ X_1^T \end{pmatrix} C_1 \begin{pmatrix} y_1 & Y_1 \end{pmatrix} = \begin{pmatrix} x_1^T C_1 y_1 & x_1^T C_1 Y_1 \\ X_1^T C_1 y_1 & X_1^T C_1 Y_1 \end{pmatrix} = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix} = D_1 = P_1^T C_1 Q_1$
  • So we have two equations to satisfy: $X_1^T C_1 y_1 = 0$ and $x_1^T C_1 Y_1 = 0$
  • The first requires $C_1 y_1 \perp X_1$, and we know $x_1 \perp X_1$
  • Simply use $C_1 y_1$ to define $x_1$ and normalize:
    $x_1 = \dfrac{C_1 y_1}{\|C_1 y_1\|} \;\Rightarrow\; X_1^T C_1 y_1 = X_1^T x_1\, \|C_1 y_1\| = 0$

slide-89
SLIDE 89

Derivation of SVD #2

  • The next equation requires $x_1^T C_1 Y_1 = 0$
  • In other words $C_1^T x_1 \perp Y_1$, i.e., $C_1^T C_1 y_1 \perp Y_1$
  • Problem solved if there is a $y_1$ such that $C_1^T C_1 y_1 = \lambda y_1$
  • Same as $(C_1^T C_1 - \lambda I)\, y_1 = 0$
  • There is such a $y_1$, namely
    $y_1 = \arg\max_{\|y_1\| = 1} \|C_1 y_1\|, \qquad C_1^T C_1 y_1 = \lambda_1 y_1 \;\Rightarrow\; \|C_1 y_1\|^2 = \lambda_1$
  • Now that we can pick $x_1$ and $y_1$, complete $X_1$ and $Y_1$
  • Then set $C_2 = X_1^T C_1 Y_1$
  • Finally, note that $x_1^T C_1 y_1 = \sigma_1 = \|C_1 y_1\| \;\Rightarrow\; \sigma_1^2 = \lambda_1$

slide-90
SLIDE 90

Derivation of SVD #3

  • Now that we have $D_1 = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix} = P_1^T C_1 Q_1$ and $D_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix} = P_2^T C_2 Q_2$, set $\hat{P}_2 = \begin{pmatrix} 1 & 0 \\ 0 & P_2 \end{pmatrix}$ and $\hat{Q}_2 = \begin{pmatrix} 1 & 0 \\ 0 & Q_2 \end{pmatrix}$ so that we get
    $\hat{D}_2 = \hat{P}_2^T P_1^T C_1 Q_1 \hat{Q}_2 = \begin{pmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & C_3 \end{pmatrix}$
  • After r – 1 steps,
    $\hat{P}^T C \hat{Q} = D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$
    with $\hat{P}$ and $\hat{Q}$ orthogonal
  • To complete the proof, use $U^T A V = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix}$ and
    $\begin{pmatrix} \hat{P}^T & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \hat{Q} & 0 \\ 0 & I \end{pmatrix} = \begin{pmatrix} \hat{P}^T C \hat{Q} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$

slide-91
SLIDE 91

The Singular Value Decomposition

  • For each A in $\mathbb{R}^{m \times n}$ of rank r, there are orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$, and a diagonal matrix $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, such that
    $A = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}_{m \times n} V^T$
  • The $\sigma_i$ are called the nonzero singular values of A
  • When r < p = min(m, n), A is said to have p – r additional zero singular values
  • This factorization is called the singular value decomposition of A
  • The columns of U and V are called left-hand and right-hand singular vectors of A, respectively
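A quick sketch of the decomposition with numpy (an assumption; the matrix is a stand-in), checking the defining relations:

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((5, 3))

    U, s, Vt = np.linalg.svd(A)                 # full SVD: U is 5x5, Vt = V^T is 3x3
    D = np.zeros((5, 3)); np.fill_diagonal(D, s)

    assert np.all(np.diff(s) <= 0)              # sigma_1 >= sigma_2 >= ... (returned sorted)
    assert np.allclose(A, U @ D @ Vt)           # A = U [D; 0] V^T
    assert np.allclose(A @ Vt[0], s[0] * U[:, 0])   # A v_1 = sigma_1 u_1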

slide-92
SLIDE 92

Left and right singular vectors

  • The bases for R(AT ) in V and for R(A) in U

are chosen in a very special way

$A V = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; A v_i = \sigma_i u_i, \qquad U^T A = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T \;\Rightarrow\; u_i^T A = \sigma_i v_i^T$

slide-93
SLIDE 93

Image of Unit Sphere

  • For a nonsingular matrix A in $\mathbb{R}^{n \times n}$ having singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and an SVD $A = U D V^T$, where $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, the image of the unit 2-sphere is an ellipsoid whose kth semiaxis is given by $\sigma_k u_k$
  • Furthermore, $v_k$ in the unit sphere is such that $A v_k = \sigma_k u_k$

[Figure: the unit sphere with the right-hand singular vectors $v_1, v_2, v_3$ mapped by A onto the ellipsoid with semiaxes $A v_1 = \sigma_1 u_1$, $A v_2 = \sigma_2 u_2$, $A v_3 = \sigma_3 u_3$.]

slide-94
SLIDE 94

Connection to ATA and AAT

  • The squared singular values $\sigma_i^2$ are the non-zero eigenvalues of both $A^T A$ and $A A^T$
  • Furthermore, the vectors $v_i$ in V and $u_i$ in U are corresponding eigenvectors of $A^T A$ and $A A^T$
  • Proof
    $(A^T A)_{n \times n} = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} V^T \;\Rightarrow\; A^T A\, V = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; A^T A\, v_i = \sigma_i^2 v_i$
    $(A A^T)_{m \times m} = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} U^T \;\Rightarrow\; A A^T\, U = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; A A^T\, u_i = \sigma_i^2 u_i$

slide-95
SLIDE 95

Compact SVDs

  • When matrix $A_{m \times n}$ has many more rows than columns, n << m, it is better to use the
  • Thin SVD: $A = U_n \Sigma_n V^T$
  • Compute only the first n columns of U and of $\Sigma$
  • We usually don’t care about the nullspaces
  • When rank(A) = r is much smaller than the number of rows or columns
  • Economy/Compact SVD: $A = U_r \Sigma_r V_r^T$
  • Compute and store only the first r columns

slide-96
SLIDE 96

Dimensionality Reduction

  • Map points in high-dimensional space to points

in a lower-dimensional space

  • Try to preserve structure
  • Pairwise distances etc
  • Useful for further processing
  • Less computation, fewer parameters
  • Easier to understand, visualize
slide-97
SLIDE 97

Principal Component Analysis

  • Principal Component Analysis (PCA) is a method

for approximating a high-dimensional data set using a lower-dimensional affine space

[Figure: a cloud of data points shown against the original axes, with the first and second principal components drawn through the data.]

slide-98
SLIDE 98

PCA: Formal definition

  • Given a set of points $D = \{x_1, x_2, \ldots, x_N\}$ in $\mathbb{R}^n$, can we find the m-dimensional subspace V of $\mathbb{R}^n$ that allows us to best represent the set D?
  • For each $x_i$, we want to find a $y_i$ such that $y_i = P x_i + a$, with $P x_i \in V$ and $a \in \mathbb{R}^n$
  • Given m, find V and a that minimize
    $\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|y_i - x_i\|^2 = \dfrac{1}{N} \sum_{i=1}^{N} \|a - (I - P) x_i\|^2$

slide-99
SLIDE 99

PCA: Derivation #1

  • First, let us find the value of a
  • Using calculus, set $\dfrac{\partial \varepsilon}{\partial [a]_k} = 0$ for each k:
    $\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \bigl( a - (I - P) x_i \bigr)^T \bigl( a - (I - P) x_i \bigr) = \dfrac{1}{N} \sum_{i=1}^{N} \Bigl( a^T a - 2 a^T (I - P) x_i + x_i^T (I - P)^T (I - P) x_i \Bigr)$
    $\Rightarrow\; \dfrac{1}{N} \sum_{i=1}^{N} \Bigl( 2 e_k^T a - 2 e_k^T (I - P) x_i \Bigr) = 0 \;\Rightarrow\; a = \dfrac{1}{N} \sum_{i=1}^{N} (I - P) x_i = (I - P) \sum_{i=1}^{N} \dfrac{x_i}{N} \;\Rightarrow\; a = (I - P)\, \bar{x}$

slide-100
SLIDE 100

PCA: Derivation #2

  • Now that we know a, we can write
    $\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|(I - P)(x_i - \bar{x})\|^2$
  • Take an orthonormal basis for V and one for its orthogonal complement, and put them in matrices V and U, so that $I - P = U U^T$:
    $\varepsilon = \dfrac{1}{N} \sum_{i=1}^{N} \|U U^T (x_i - \bar{x})\|^2 = \dfrac{1}{N} \sum_{i=1}^{N} \|U^T (x_i - \bar{x})\|^2 = \dfrac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{n-m} u_j^T (x_i - \bar{x})(x_i - \bar{x})^T u_j = \sum_{j=1}^{n-m} u_j^T \Bigl( \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T \Bigr) u_j = \sum_{j=1}^{n-m} u_j^T S\, u_j$
  • S is the covariance matrix, and it does not depend on U:
    $S = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T = \dfrac{1}{N} X X^T, \qquad [X]_{*j} = x_j - \bar{x}$

slide-101
SLIDE 101

PCA: Derivation #3

  • We want to use calculus again to find the $u_i$ minimizing $\varepsilon = \sum_{j=1}^{n-m} u_j^T S u_j$
  • Unconstrained minimization gives $u_i = 0$, so we use a Lagrange multiplier to enforce $\|u_i\|^2 = 1$:
    $\bar{\varepsilon} = \varepsilon - \sum_{j=1}^{n-m} \lambda_j (u_j^T u_j - 1)$
  • Differentiating,
    $\dfrac{\partial \bar{\varepsilon}}{\partial [u_i]_k} = 0 \;\Rightarrow\; S u_i = \lambda_i u_i \;\Rightarrow\; X X^T u_i = \lambda_i u_i, \qquad \text{with } \bar{\varepsilon} = \sum_{j=1}^{n-m} \lambda_j$
  • We simply pick the left singular vectors $u_i$ of X associated to the n – m smallest singular values as a basis for $V^\perp$
  • Pick the remaining singular vectors to form the basis of V
  • Each $v_i$ gives a principal component
  • The corresponding $\sigma_i$ gives the associated importance
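A compact sketch of PCA via the SVD (assuming numpy; the synthetic, nearly planar point cloud is made up to illustrate the method):

    import numpy as np

    rng = np.random.default_rng(9)
    # 200 points in R^3 lying close to a 2-dimensional affine subspace
    pts = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 3)) + np.array([5.0, -2.0, 1.0])
    pts += 0.01 * rng.standard_normal(pts.shape)

    x_bar = pts.mean(axis=0)                    # the centroid (a is determined by it)
    X = (pts - x_bar).T                         # columns are x_i - x_bar, as on the slides

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    m = 2
    principal_axes = U[:, :m]                   # left singular vectors with the largest sigma_i
    print(s)                                    # the last value is much smaller: the cloud is nearly planar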

slide-102
SLIDE 102

Comparing PCA to Least Squares

[Figure: comparison of a least squares fit (offset a) and the first principal component $v_1$ for the same data.]

slide-103
SLIDE 103

Example uses of SVD

  • Image Compression
  • Look at image as a matrix
  • Store only truncated SVD decomposition
  • First few singular values/vectors should be enough
  • Recognition
  • Use PCA on training set
  • Project unknown face on first few principal components
  • Compare projection coordinates to other faces in database
slide-104
SLIDE 104

SVD for Image Compression

  • Look at an image as a matrix A and compute SVD
  • Truncate to first p singular values and vectors

$\tilde{A}_{m \times n} = (U)_{m \times p}\, (\Sigma)_{p \times p}\, (V^T)_{p \times n}$

[Figure: a 512×512 image reconstructed from p = 1, 2, 5, 10, 15, 20, 25, 30, and 50 singular values, requiring roughly 0.5% to 20% of the original storage.]

slide-105
SLIDE 105

PCA on Faces: “Eigenfaces”

[Figure panels: training faces in one of 10 different poses, the first 10 principal components, and the average face.]

slide-106
SLIDE 106

PCA on Faces: “Eigenfaces”

[Figure: faces in an unseen pose, the spectrum of all principal values, and reconstructions using 5, 10, 15, 20, 25, and all principal components.]

slide-107
SLIDE 107

SVD connection to Matrix 2-norm

  • For a nonsingular matrix A in $\mathbb{R}^{n \times n}$ having singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and an SVD $A = U D V^T$, where $D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, we have
    $\sigma_1 = \|A v_1\|_2 = \max_{\|x\|_2 = 1} \|A x\|_2 = \|A\|_2, \qquad \sigma_n = \|A v_n\|_2 = \min_{\|x\|_2 = 1} \|A x\|_2 = 1 / \|A^{-1}\|_2$
  • The degree of distortion of the unit sphere transformed by A is measured by the 2-norm condition number
    $\kappa_2 = \dfrac{\sigma_1}{\sigma_n} = \|A\|_2\, \|A^{-1}\|_2 \ge 1$
  • Note that $\kappa_2 = 1 \iff A$ is a multiple of an orthogonal matrix
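A small sketch computing these quantities with numpy (an assumption), using the 2×2 matrix from the ill-conditioning example on the following slides:

    import numpy as np

    A = np.array([[0.2315, -0.3071],
                  [-0.5525, 0.7387]])

    s = np.linalg.svd(A, compute_uv=False)     # singular values, sorted descending
    kappa2 = s[0] / s[-1]                      # sigma_1 / sigma_n

    assert np.isclose(s[0], np.linalg.norm(A, 2))
    assert np.isclose(kappa2, np.linalg.cond(A, 2))
    print(kappa2)                              # much larger than 1: A is ill-conditioned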

slide-108
SLIDE 108

Uncertainty in Linear Systems

  • Many sources of uncertainty
  • Modeling errors: simplifying assumptions
  • Data collection: measurements have finite precision
  • Data entry: computers have finite precision
  • Round-off errors during computation
  • How can we evaluate the accuracy of a solution?
  • When A is known exactly, but b has uncertainty
  • Can we even check the answer we found?
slide-109
SLIDE 109

Uncertainty in b

  • Wanted Ax = b, ended up with $A \tilde{x} = b - e = \tilde{b}$
  • Compute the relative uncertainty in x, $\|x - \tilde{x}\| / \|x\|$, from the relative uncertainty in b, $\|b - \tilde{b}\| / \|b\| = \|e\| / \|b\|$:
    $\|x - \tilde{x}\| = \|A^{-1}(Ax - A\tilde{x})\| = \|A^{-1}(b - \tilde{b})\| = \|A^{-1} e\| \le \|A^{-1}\|\, \|e\|$ and $\|b\| = \|Ax\| \le \|A\|\, \|x\|$, so
    $\dfrac{\|x - \tilde{x}\|}{\|x\|} \le \|A\|\, \|A^{-1}\|\, \dfrac{\|e\|}{\|b\|} = \kappa\, \dfrac{\|e\|}{\|b\|}$
  • Conversely, $\|e\| = \|b - \tilde{b}\| = \|A(x - \tilde{x})\| \le \|A\|\, \|x - \tilde{x}\|$ and $\|x\| = \|A^{-1} b\| \le \|A^{-1}\|\, \|b\|$, so
    $\dfrac{\|x - \tilde{x}\|}{\|x\|} \ge \dfrac{\|e\|}{\|A\|\, \|A^{-1}\|\, \|b\|} = \dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|}$
  • So
    $\dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|} \le \dfrac{\|x - \tilde{x}\|}{\|x\|} \le \kappa\, \dfrac{\|e\|}{\|b\|}, \qquad \text{with } \kappa = \|A\|\, \|A^{-1}\| \ge 1$

slide-110
SLIDE 110

Ill-conditioning Example

$A = \begin{pmatrix} .2315 & -.3071 \\ -.5525 & .7387 \end{pmatrix}, \qquad Ax = b, \quad A\tilde{x} = \tilde{b} = b + e, \qquad \dfrac{1}{\kappa}\, \dfrac{\|e\|}{\|b\|} \le \dfrac{\|x - \tilde{x}\|}{\|x\|} \le \kappa\, \dfrac{\|e\|}{\|b\|}$

Case 1: $b_1 = \begin{pmatrix} .6 \\ -.8 \end{pmatrix}$, $e_1 = \begin{pmatrix} .0008 \\ .0006 \end{pmatrix}$, $\tilde{b}_1 = \begin{pmatrix} .6008 \\ -.7994 \end{pmatrix}$, $x_1 = A^{-1} b_1 = \begin{pmatrix} .3846 \\ -.9231 \end{pmatrix}$, $\tilde{x}_1 = A^{-1} \tilde{b}_1 = \begin{pmatrix} 1.3077 \\ -.5385 \end{pmatrix}$, $\dfrac{\|x_1 - \tilde{x}_1\|}{\|x_1\|} = 1$

Case 2: $b_2 = \begin{pmatrix} .8 \\ .6 \end{pmatrix}$, $e_2 = \begin{pmatrix} .0006 \\ -.0008 \end{pmatrix}$, $\tilde{b}_2 = \begin{pmatrix} .8006 \\ .5992 \end{pmatrix}$, $x_2 = A^{-1} b_2 = \begin{pmatrix} 923.0769 \\ 384.6154 \end{pmatrix}$, $\tilde{x}_2 = A^{-1} \tilde{b}_2 = \begin{pmatrix} 923.0773 \\ 384.6145 \end{pmatrix}$, $\dfrac{\|x_2 - \tilde{x}_2\|}{\|x_2\|} = 10^{-6}$

slide-111
SLIDE 111

Worst and Best Cases for Uncertainty

  • When κ is small, we have a well-conditioned system
  • Small relative uncertainty cannot have large impact on the solution
  • When κ is large, we have an ill-conditioned system
  • Small relative uncertainty may or may not have large impact
  • Can we reach either side of the inequality?
  • Things are easier to analyze when using the 2-norm
  • Pick e and b along left singular vectors of A, say $e = \epsilon\, u_i$ and $b = \beta\, u_j$:
    $\dfrac{\|x - \tilde{x}\|_2}{\|x\|_2} = \dfrac{\|A^{-1} e\|_2}{\|A^{-1} b\|_2} = \dfrac{\|A^{-1} \epsilon\, u_i\|_2}{\|A^{-1} \beta\, u_j\|_2} = \dfrac{\|\epsilon\, v_i / \sigma_i\|_2}{\|\beta\, v_j / \sigma_j\|_2} = \dfrac{\sigma_j}{\sigma_i}\, \dfrac{|\epsilon|}{|\beta|} = \dfrac{\sigma_j}{\sigma_i}\, \dfrac{\|e\|_2}{\|b\|_2}$
  • Since $\kappa = \sigma_1 / \sigma_n$, we can reach both sides of the inequality

slide-112
SLIDE 112

Ill-conditioning Example Explained

$A = \begin{pmatrix} .2315 & -.3071 \\ -.5525 & .7387 \end{pmatrix} = \underbrace{\dfrac{1}{5} \begin{pmatrix} 3 & 4 \\ -4 & 3 \end{pmatrix}}_{U} \begin{pmatrix} 1 & 0 \\ 0 & 10^{-3} \end{pmatrix} \underbrace{\left[ \dfrac{1}{13} \begin{pmatrix} 5 & 12 \\ -12 & 5 \end{pmatrix} \right]^T}_{V^T}, \qquad \kappa = \dfrac{\sigma_1}{\sigma_2} = 10^3$

Case 1: $b_1 = \begin{pmatrix} .6 \\ -.8 \end{pmatrix} = \dfrac{1}{5} \begin{pmatrix} 3 \\ -4 \end{pmatrix} = u_1$ and $e_1 = \begin{pmatrix} .0008 \\ .0006 \end{pmatrix} = \dfrac{10^{-3}}{5} \begin{pmatrix} 4 \\ 3 \end{pmatrix} = 10^{-3} u_2$, so $\dfrac{\|x_1 - \tilde{x}_1\|}{\|x_1\|} = \kappa\, \dfrac{\|e_1\|}{\|b_1\|} = 1$

Case 2: $b_2 = \begin{pmatrix} .8 \\ .6 \end{pmatrix} = \dfrac{1}{5} \begin{pmatrix} 4 \\ 3 \end{pmatrix} = u_2$ and $e_2 = \begin{pmatrix} .0006 \\ -.0008 \end{pmatrix} = \dfrac{10^{-3}}{5} \begin{pmatrix} 3 \\ -4 \end{pmatrix} = 10^{-3} u_1$, so $\dfrac{\|x_2 - \tilde{x}_2\|}{\|x_2\|} = \dfrac{1}{\kappa}\, \dfrac{\|e_2\|}{\|b_2\|} = 10^{-6}$

slide-113
SLIDE 113

Checking the solution

  • Assume we found a solution $\tilde{x}$ for Ax = b
  • Compute the residual $r = b - A\tilde{x}$
  • If r = 0 exactly, then $\tilde{x}$ must be the exact solution
  • What about when r is only close to zero (say, with t digits)?
  • Can we say anything about how accurate $\tilde{x}$ is?
  • We have already answered this question
  • Set $A\tilde{x} = b - r$, so that r = e in our previous expression:
    $\dfrac{1}{\kappa}\, \dfrac{\|r\|}{\|b\|} \le \dfrac{\|x - \tilde{x}\|}{\|x\|} \le \kappa\, \dfrac{\|r\|}{\|b\|}$
  • Residuals are an indication of solution accuracy only when the matrix is well conditioned