 
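Proof
• Let
  $$P = \begin{pmatrix} P_1 \\ P_2 \end{pmatrix}, \quad PA = U = \begin{pmatrix} U_r \\ 0 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}$$
  $$P^{-1}P = I \Rightarrow Q_1P_1 + Q_2P_2 = I \Rightarrow Q_1P_1 = I - Q_2P_2$$
• Proof ($\Rightarrow$)
  $$PA = \begin{pmatrix} P_1 \\ P_2 \end{pmatrix} A = \begin{pmatrix} P_1A \\ P_2A \end{pmatrix} = \begin{pmatrix} U_r \\ 0 \end{pmatrix} = U \Rightarrow P_2A = 0$$
• Proof ($\Leftarrow$): $y^TA = 0 \Rightarrow y^T = u^TP_2$
  $$y^TA = 0 \Rightarrow y^TP^{-1}U = 0 \Rightarrow y^T\begin{pmatrix} Q_1 & Q_2 \end{pmatrix}\begin{pmatrix} U_r \\ 0 \end{pmatrix} = 0 \Rightarrow y^TQ_1U_r = 0$$
  $$\Rightarrow y^TQ_1 = 0 \quad \text{(the rows of } U_r \text{ are linearly independent)}$$
  $$\Rightarrow y^TQ_1P_1 = 0 \Rightarrow y^T(I - Q_2P_2) = 0 \Rightarrow y^T = y^TQ_2P_2 \Rightarrow y^T = (y^TQ_2)P_2$$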
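Example
• Using Gauss-Jordan
  $$(A \mid I) = \left(\begin{array}{cccc|ccc} 1 & 2 & 2 & 3 & 1 & 0 & 0 \\ 2 & 4 & 1 & 3 & 0 & 1 & 0 \\ 3 & 6 & 1 & 4 & 0 & 0 & 1 \end{array}\right)$$
• Find $E_A$
  $$(E_A \mid P) = \left(\begin{array}{cccc|ccc} 1 & 2 & 0 & 1 & -1/3 & 2/3 & 0 \\ 0 & 0 & 1 & 1 & 2/3 & -1/3 & 0 \\ 0 & 0 & 0 & 0 & 1/3 & -5/3 & 1 \end{array}\right)$$
• So
  $$P = \begin{pmatrix} -1/3 & 2/3 & 0 \\ 2/3 & -1/3 & 0 \\ 1/3 & -5/3 & 1 \end{pmatrix}$$
• From which
  $$N(A^T) = \operatorname{span}\left\{ \begin{pmatrix} 1/3 \\ -5/3 \\ 1 \end{pmatrix} \right\}$$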
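The example above can be reproduced with a short script. This is a minimal sketch, assuming exact rational arithmetic via Python's `fractions` module; the helper name `gauss_jordan` and its `pivot_cols` parameter are hypothetical names chosen here, not something from the slides. Pivots are searched only in the columns of $A$, so the right-hand block of the reduced matrix is a valid $P$ with $PA = E_A$.

```python
from fractions import Fraction

def gauss_jordan(rows, pivot_cols):
    """Row-reduce `rows`, choosing pivots only in the first `pivot_cols` columns."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(pivot_cols):
        # first row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

A = [[1, 2, 2, 3], [2, 4, 1, 3], [3, 6, 1, 4]]
m, n = len(A), len(A[0])

# Build (A | I) and reduce it to (E_A | P)
augmented = [row + [int(i == j) for j in range(m)] for i, row in enumerate(A)]
reduced, pivots = gauss_jordan(augmented, n)
E_A = [row[:n] for row in reduced]
P = [row[n:] for row in reduced]

# The last m - r rows of P span the left-hand nullspace N(A^T)
r = len(pivots)
left_null_basis = P[r:]
print(left_null_basis)
```

Each vector $y$ in `left_null_basis` should satisfy $y^TA = 0$, which is easy to confirm by multiplying it against the columns of `A`.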
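Additional insights
• We have shown that $N(A^T) = R(P_2^T)$
• It turns out that $R(A) = N(P_2)$
• Proof of $R(A) \subseteq N(P_2)$: $y = Ax \Rightarrow P_2y = 0$
  $$P_2y = P_2Ax = (P_2A)x = 0x = 0$$
• Proof of $R(A) \supseteq N(P_2)$: $P_2y = 0 \Rightarrow \exists x \mid Ax = y$
  $$P(A \mid y) = (PA \mid Py), \quad PA = \begin{pmatrix} P_1A \\ P_2A \end{pmatrix} = \begin{pmatrix} U_r \\ 0 \end{pmatrix}, \quad Py = \begin{pmatrix} P_1y \\ P_2y \end{pmatrix} = \begin{pmatrix} P_1y \\ 0 \end{pmatrix}$$
  $$P(A \mid y) = \left(\begin{array}{c|c} U_r & P_1y \\ 0 & 0 \end{array}\right)$$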
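Equal Nullspaces #1
• We already know how to test for equality of range spaces: row and column equivalence
• How do we test for nullspace equality?
• Use equivalence again
• For two matrices $A$ and $B$ of the same shape
  $$N(A) = N(B) \Leftrightarrow A \overset{\text{row}}{\sim} B$$
• Similarly,
  $$N(A^T) = N(B^T) \Leftrightarrow A \overset{\text{col}}{\sim} B$$
Equal Nullspaces #2
• Let's prove one of them: $N(A^T) = N(B^T) \Leftrightarrow A \overset{\text{col}}{\sim} B$
• Proof ($\Rightarrow$): $N(A^T) = N(B^T) \Rightarrow R(A) = R(B) \Leftrightarrow A \overset{\text{col}}{\sim} B$
  $$y^TA = 0 \Leftrightarrow y^TB = 0 \quad \Rightarrow \quad z = Ax_1 \Leftrightarrow z = Bx_2$$
  $$(A \mid Bx_2) \to (PA \mid PBx_2) \to \left(\begin{array}{c|c} P_1A & P_1Bx_2 \\ P_2A & P_2Bx_2 \end{array}\right) \to \left(\begin{array}{c|c} P_1A & P_1Bx_2 \\ 0 & P_2Bx_2 \end{array}\right) \to \left(\begin{array}{c|c} P_1A & P_1Bx_2 \\ 0 & 0 \end{array}\right)$$
Equal Nullspaces #3
• Let's prove one of them: $N(A^T) = N(B^T) \Leftrightarrow A \overset{\text{col}}{\sim} B$
• Proof ($\Leftarrow$): $A = BQ$
  $$P_2A = 0 \Rightarrow P_2BQ = 0 \Rightarrow P_2B = 0 \Rightarrow N(A^T) \subseteq N(B^T)$$
• Conversely, $N(B^T) \subseteq N(A^T)$ (swap the roles of $A$ and $B$ in the proof)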
Summary #1
• The four fundamental subspaces associated with a matrix $A_{m \times n}$ are
  • The range or column space $R(A) = \{Ax\} \subseteq \mathbb{R}^m$
  • The row space or left-hand range $R(A^T) = \{A^Ty\} \subseteq \mathbb{R}^n$
  • The nullspace $N(A) = \{x \mid Ax = 0\} \subseteq \mathbb{R}^n$
  • The left-hand nullspace $N(A^T) = \{y \mid y^TA = 0\} \subseteq \mathbb{R}^m$
Summary #2
• Let $P$ be a nonsingular matrix such that $PA = U$, where $U$ is in echelon form, and let $\operatorname{rank}(A) = r$
• Spanning sets for
  • $R(A)$: the basic columns of $A$
  • $R(A^T)$: the non-zero rows of $U$ (transposed)
  • $N(A)$: the vectors $h_i$ in the general solution of $Ax = 0$
  • $N(A^T)$: the last $m - r$ rows of $P$ (transposed)
Summary #3
• If $A$ and $B$ are matrices of the same shape
  $$A \overset{\text{col}}{\sim} B \Leftrightarrow R(A) = R(B) \Leftrightarrow N(A^T) = N(B^T)$$
  $$A \overset{\text{row}}{\sim} B \Leftrightarrow N(A) = N(B) \Leftrightarrow R(A^T) = R(B^T)$$
LINEAR INDEPENDENCE, BASIS, AND DIMENSION
Linear independence • Matrix dimensions give an incomplete picture of the true size of a linear system • The important number is the rank • Number of pivots • Number of non-zero rows in echelon form • Better interpretation • Number of genuinely independent rows in matrix • Other rows are redundant
Formally
• Take a set of vectors $S = \{v_1, v_2, \ldots, v_r\}$
• Look at linear combinations $\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_rv_r$
• The vectors $v_i$ are linearly independent (l.i.) iff the only linear combination that produces $0$ is the trivial one, $\alpha_i = 0$ for all $i$
• Otherwise they are linearly dependent (l.d.)
  • One of them is a linear combination of the others
Easy to visualize in $\mathbb{R}^3$
• 2 vectors are dependent if they lie on a line
• 3 vectors are dependent if they lie on a plane
  • Or line
• 4 vectors are always dependent
• 3 random vectors should be independent
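Example
• Determine if the set of vectors is l.i.
  $$S = \left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 5 \\ 6 \\ 7 \end{pmatrix} \right\}$$
• Look for a non-trivial solution to
  $$\alpha_1 \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} + \alpha_3 \begin{pmatrix} 5 \\ 6 \\ 7 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
• I.e., a non-trivial solution to the homogeneous linear system
  $$\begin{pmatrix} 1 & 1 & 5 \\ 2 & 0 & 6 \\ 1 & 2 & 7 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
• From Gauss-Jordan
  $$E_A = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$
• So they are l.d. and, e.g., $\alpha_1 = -3$, $\alpha_2 = -2$, $\alpha_3 = 1$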
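As a quick sanity check of the coefficients found above, the combination can be evaluated directly. This is a minimal sketch; the variable names are illustrative only.

```python
# The three vectors of S and the coefficients read off from E_A
vectors = [(1, 2, 1), (1, 0, 2), (5, 6, 7)]
alphas = (-3, -2, 1)

# Component-wise evaluation of alpha_1 v_1 + alpha_2 v_2 + alpha_3 v_3
combo = [sum(a * v[i] for a, v in zip(alphas, vectors)) for i in range(3)]
print(combo)

# A non-trivial combination that gives the zero vector means S is l.d.
is_dependent = any(a != 0 for a in alphas) and all(c == 0 for c in combo)
```

The coefficients come from the free column of $E_A$: setting $\alpha_3 = 1$ forces $\alpha_1 = -3$ and $\alpha_2 = -2$.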
Linear independence and Matrices
• Let $A$ be an $m \times n$ matrix.
• Each of these is equivalent to saying the columns of $A$ form a linearly independent set
  • $N(A) = \{0\}$
  • $\operatorname{rank}(A) = n$
• Each of these is equivalent to saying the rows of $A$ form a linearly independent set
  • $N(A^T) = \{0\}$
  • $\operatorname{rank}(A) = m$
• If $A$ is square, each of these is equivalent to saying $A$ is non-singular
  • Columns of $A$ form a linearly independent set
  • Rows of $A$ form a linearly independent set
Diagonal dominance
• An $n \times n$ matrix $A = [a_{ij}]$ is diagonally dominant whenever
  $$|a_{kk}| > \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{kj}|, \quad k \in \{1, 2, \ldots, n\}$$
• I.e., each diagonal element is larger in magnitude than the sum of the magnitudes of the other elements in its row
• These matrices appear frequently in practical applications. Two important properties are
  • They are never singular
  • Gaussian elimination does not need partial pivoting
Diagonal dominance
• Diagonally dominant matrices are non-singular
• Proof by contradiction
  • Assume there is a non-zero vector in $N(A)$
  • Find a contradiction
• Let $Ax = 0$ with $x \neq 0$, and let $x_k$ be the entry of largest magnitude in $x$
  $$[Ax]_k = 0 = \sum_{j=1}^{n} a_{kj}x_j \Rightarrow a_{kk}x_k = -\sum_{\substack{j=1 \\ j \neq k}}^{n} a_{kj}x_j$$
  $$\Rightarrow |a_{kk}||x_k| = \Big|\sum_{\substack{j=1 \\ j \neq k}}^{n} a_{kj}x_j\Big| \leq \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{kj}||x_j| \leq |x_k| \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{kj}| \Rightarrow |a_{kk}| \leq \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{kj}|$$
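The definition translates directly into a row-by-row test. The sketch below assumes a square matrix given as a list of rows; the function name `is_diagonally_dominant` and the two sample matrices are made up for illustration.

```python
def is_diagonally_dominant(A):
    """True if |a_kk| > sum of |a_kj| over j != k, for every row k."""
    n = len(A)
    return all(
        abs(A[k][k]) > sum(abs(A[k][j]) for j in range(n) if j != k)
        for k in range(n)
    )

# Hypothetical examples: the first satisfies the strict inequality in every row
dominant = [[4, -1, 2], [1, 5, -3], [0, 2, -3]]
not_dominant = [[1, 2], [3, 4]]

print(is_diagonally_dominant(dominant), is_diagonally_dominant(not_dominant))
```

Note that the inequality is strict: a row whose diagonal entry merely equals the sum of the other magnitudes does not qualify, and the non-singularity argument above would not go through for it.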
Polynomial interpolation
• Given a set of $m$ points $S = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ where the $x_i$ are distinct, there is a unique polynomial
  $$\ell(t) = \alpha_0 + \alpha_1t + \cdots + \alpha_{m-1}t^{m-1}$$
  of degree at most $m - 1$ that goes through each point in $S$
  $$\begin{aligned} \alpha_0 + \alpha_1x_1 + \alpha_2x_1^2 + \cdots + \alpha_{m-1}x_1^{m-1} &= \ell(x_1) = y_1 \\ \alpha_0 + \alpha_1x_2 + \alpha_2x_2^2 + \cdots + \alpha_{m-1}x_2^{m-1} &= \ell(x_2) = y_2 \\ &\;\;\vdots \\ \alpha_0 + \alpha_1x_m + \alpha_2x_m^2 + \cdots + \alpha_{m-1}x_m^{m-1} &= \ell(x_m) = y_m \end{aligned}$$
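Polynomial interpolation
• Same as saying the following system has a unique solution for any right-hand side $y_i$
  $$\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{m-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{m-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{m-1} \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{m-1} \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$$
• Is the matrix non-singular whenever the $x_i$ are distinct?
• Such matrices are called Vandermonde matrices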
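Vandermonde Matrices
• An $m \times n$ Vandermonde matrix with distinct $x_i$ has independent columns whenever $n \leq m$, i.e., the following system has only the trivial solution
  $$\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{n-1} \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{n-1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
• Proof: row $i$ of the system says
  $$p(x_i) = \alpha_0 + \alpha_1x_i + \alpha_2x_i^2 + \cdots + \alpha_{n-1}x_i^{n-1} = 0$$
• So $p(x)$ has $m$ distinct roots but degree at most $n - 1 < m$
• The fundamental theorem of algebra (a non-zero polynomial of degree $d$ has at most $d$ roots) implies $\alpha_j = 0$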
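Lagrange interpolator
• In particular, when $n = m$ we have that
  $$\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{m-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{m-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{m-1} \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{m-1} \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$$
  has a unique solution
• The solution is the Lagrange interpolator
  $$\ell(t) = \sum_{i=1}^{m} y_i \, \frac{\prod_{j \neq i}^{m} (t - x_j)}{\prod_{j \neq i}^{m} (x_i - x_j)}$$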
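The Lagrange formula can be evaluated without ever forming the Vandermonde system. The following is a minimal sketch using exact fractions; the function name `lagrange` and the sample points (taken from $y = t^2$) are assumptions made for illustration.

```python
from fractions import Fraction

def lagrange(points, t):
    """Evaluate the Lagrange interpolator through `points` at `t`.

    `points` is a list of (x_i, y_i) pairs with distinct x_i.
    """
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                # factor (t - x_j) / (x_i - x_j) of the i-th basis polynomial
                term *= Fraction(t - xj) / (xi - xj)
        total += term
    return total

# Hypothetical sample: three points on the parabola y = t^2
points = [(1, 1), (2, 4), (3, 9)]
print(lagrange(points, 4))
```

Because three points determine a unique polynomial of degree at most 2, the interpolator here coincides with $t^2$ everywhere, not only at the sample points.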
Example of interpolation
• [Figure: four plots against $x$; two of the panels are titled $P_4(x)$ and $P_5(x)$]
Maximal independent subsets #1
• We know that if $\operatorname{rank}(A_{m \times n}) < n$ then the columns of $A$ must be a dependent set
• In such cases, we often want to extract a maximal independent subset of columns
  • An l.i. set with as many columns of $A$ as possible
  • Such columns are sufficient to span $R(A)$
Maximal independent subsets #2
• If $\operatorname{rank}(A_{m \times n}) = r$, then the following hold
  • Any maximal independent subset of the columns of $A$ contains exactly $r$ columns
  • Any maximal independent subset of the rows of $A$ contains exactly $r$ rows
  • In particular, the $r$ basic columns in $A$ constitute a maximal independent subset of the columns of $A$
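Maximal independent subsets #3
• Any maximal independent subset of the columns of $A$ contains exactly $r$ columns
• Proof
  • Every column of $A$ can be written as a linear combination of the basic columns of $A$
  • Pick $k > r$ columns of $A$ and show they are l.d., i.e., find $\alpha \neq 0$ with
    $$\begin{pmatrix} A_{*s_1} & A_{*s_2} & \cdots & A_{*s_k} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} = 0$$
  • Writing each chosen column in terms of the basic columns $A_{*b_1}, \ldots, A_{*b_r}$ gives
    $$\begin{pmatrix} A_{*b_1} & A_{*b_2} & \cdots & A_{*b_r} \end{pmatrix} \begin{pmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1k} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{r1} & \beta_{r2} & \cdots & \beta_{rk} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} = 0$$
  • The $r \times k$ matrix $[\beta_{ij}]$ has more columns than rows, so a non-trivial $\alpha$ exists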
Basic facts about Independence
• The following hold about a set of vectors $S = \{u_1, u_2, \ldots, u_n\}$ in $V$
  • If $S$ contains an l.d. subset, then $S$ itself must be l.d.
  • If $S$ is l.i., then every subset of $S$ is also l.i.
  • If $S$ is l.i. and $v \in V$, then $S \cup \{v\}$ is l.i. iff $v \notin \operatorname{span}(S)$
    • Proof ($\Leftarrow$)
      $$\alpha_1u_1 + \alpha_2u_2 + \cdots + \alpha_nu_n + \alpha_{n+1}v = 0 \Rightarrow \alpha_{n+1} = 0$$
      $$\Rightarrow \alpha_1u_1 + \alpha_2u_2 + \cdots + \alpha_nu_n = 0 \Rightarrow \alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$$
  • If $S \subseteq \mathbb{R}^m$ and $n > m$, then $S$ is l.d.
BASIS AND DIMENSION
Bases • A basis for a vector space V is a set S that • Spans V • Is linearly independent • Spanning sets can contain redundant vectors • Bases, on the other hand, contain only necessary and sufficient information • Every vector space V has a basis • General proof depends on the axiom of choice • Bases are not unique
Examples
• The unit vectors $S = \{e_1, e_2, \ldots, e_n\}$ in $\mathbb{R}^n$ are a basis for $\mathbb{R}^n$: the standard or canonical basis of $\mathbb{R}^n$
• If $A$ is an $n \times n$ non-singular matrix, then the set of rows and the set of columns of $A$ each constitute a basis of $\mathbb{R}^n$
• What about $Z = \{0\}$?
• The set $S = \{1, x, x^2, \ldots, x^n\}$ is a basis for the space of all polynomials of degree $n$ or less
• What about the vector space of all polynomials?
Characterizations of a Basis
• With $V$ a subspace of $\mathbb{R}^m$ and $B = \{b_1, b_2, \ldots, b_n\}$, the following are equivalent
  • (1) $B$ is a basis for $V$
  • (2) $B$ is a minimal spanning set for $V$
  • (3) $B$ is a maximal l.i. subset of $V$
Proof #1
• (basis) $\Rightarrow$ (minimal spanning set)
  • Assume $B$ is a basis and $X = \{x_1, \ldots, x_k\}$ is a smaller spanning set, $k < n$
    $$\begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & \cdots & x_k \end{pmatrix} \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{k1} & \alpha_{k2} & \cdots & \alpha_{kn} \end{pmatrix}, \quad \text{i.e. } B = XA$$
    $$\operatorname{rank}(A) \leq k < n \Rightarrow \exists\, y \neq 0 \mid Ay = 0 \Rightarrow By = 0 \Rightarrow B \text{ is l.d. (contradiction)}$$
• (minimal spanning set) $\Rightarrow$ (basis)
  • A minimal spanning set must be l.i.
  • Otherwise remove an l.d. vector and reduce the size
  • So it wasn't minimal (contradiction)
Proof #2
• (maximal l.i. set) $\Rightarrow$ (basis)
  • If a maximal l.i. set $B$ of $V$ is not a basis for $V$, then there is $v \in V \mid v \notin \operatorname{span}(B)$
  • So $B \cup \{v\}$ is l.i. and $B$ is not maximal (contradiction)
• (basis) $\Rightarrow$ (maximal l.i. set)
  • If basis $B$ is not maximal l.i., then take a larger set $Y$ that is maximal l.i.
  • We know $Y$ is a basis
  • But a basis is minimal and $B$ is smaller
  • So $B$ must also be maximal
Dimension • We have just proven that, although there are many bases for V , each of them has the same number of vectors • The dimension of a space V , dim V , is the number of vectors in • Any basis of V • Any minimal spanning set for V • Any maximal independent set for V
Examples
• If $Z = \{0\}$, then $\dim Z = 0$
  • The basis is the empty set
• $L$ is a line through the origin in $\mathbb{R}^3$, $\dim L = 1$
  • Any non-zero vector along $L$ forms a basis for $L$
• $P$ is a plane through the origin in $\mathbb{R}^3$, $\dim P = 2$
  • How would we find a basis?
• $\dim \mathbb{R}^n = n$
  • The canonical vectors form a basis
Further insights
• Dimension measures the "amount of stuff" in a subspace
  • Point < Line < Plane < $\mathbb{R}^3$
• Also measures the number of degrees of freedom in the subspace
  • $Z$: no freedom, Line: 1 degree, Plane: 2, etc.
• Do not confuse with the number of components in a vector! Related, but not equal!
Subspace dimension
• Let $M$ and $N$ be vector spaces with $M \subseteq N$
  • (1) $\dim M \leq \dim N$
  • (2) If $\dim M = \dim N$, then $M = N$
• Proof
  • (1) Assume $\dim M > \dim N$
    • A basis of $M$ (all l.i. elements of $N$) would have more vectors than a maximal independent set of $N$
  • (2) Assume $\dim M = \dim N$ but $M \subset N$ strictly
    • Augment a basis of $M$ with $v \in N \setminus M$
    • Independent set with more than $\dim N$ vectors!
Four Fundamental Subspaces: Dimension
• For an $m \times n$ matrix $A$ with $\operatorname{rank}(A) = r$
  • $\dim R(A) = r$
  • $\dim N(A) = n - r$
  • $\dim R(A^T) = r$
  • $\dim N(A^T) = m - r$
Rank Plus Nullity Theorem
• For all $m \times n$ matrices $A$
  $$\dim R(A) + \dim N(A) = n$$
• As the "amount of stuff" in $R(A)$ grows, the "amount of stuff" in $N(A)$ shrinks
• ($\dim N(A)$ was traditionally known as the nullity)
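The theorem can be checked on a concrete matrix by counting pivots (the rank) and building one nullspace vector per free column (the nullity). This is a sketch under stated assumptions: the helpers `rref` and `nullspace_basis` are hypothetical names, and the matrix is the $3 \times 4$ example used earlier in these slides.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form with exact fractions; returns (matrix, pivot columns)."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c is a free column
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def nullspace_basis(A):
    """One vector h per free column, as in the general solution of Ax = 0."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        h = [Fraction(0)] * n
        h[f] = Fraction(1)
        for k, pc in enumerate(pivots):
            h[pc] = -R[k][f]
        basis.append(h)
    return basis

A = [[1, 2, 2, 3], [2, 4, 1, 3], [3, 6, 1, 4]]
_, pivots = rref(A)
H = nullspace_basis(A)
rank, nullity = len(pivots), len(H)
print(rank, nullity, rank + nullity == len(A[0]))
```

Every column is either basic or free, so the two counts always add up to $n$; that is the whole content of the theorem.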
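Completing a Basis
• If $S_r = \{v_1, v_2, \ldots, v_r\}$ is an l.i. subset of an $n$-dimensional space $V$, where $r < n$, show how to extend $S_r$ with $\{v_{r+1}, \ldots, v_n\}$ so that $S_n = \{v_1, \ldots, v_r, v_{r+1}, \ldots, v_n\}$ forms a basis for $V$
• Solution
  • Create a matrix $A$ with the vectors of $S_r$ as columns
  • Augment to $(A \mid I)$ by the identity matrix
  • Reduce to echelon form to find the basic columns
  • Return the $n$ basic columns of $(A \mid I)$
Example
• Take two l.i. vectors in $\mathbb{R}^4$ and augment to a complete basis for $\mathbb{R}^4$
  $$S_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \end{pmatrix} \right\}$$
• Solution
  $$(A \mid I) = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 & 1 & 0 \\ 2 & -2 & 0 & 0 & 0 & 1 \end{pmatrix}, \quad E_{(A \mid I)} = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1/2 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1/2 \end{pmatrix}$$
  $$S_4 = \left\{ \begin{pmatrix} 1 \\ 0 \\ -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \right\}$$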
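The procedure for completing a basis can be sketched in a few lines: reduce $(A \mid I)$ and keep the columns where pivots land. The helper names `rref` and `complete_basis` are hypothetical, and the sketch assumes the input vectors are linearly independent vectors of $\mathbb{R}^n$.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form with exact fractions; returns (matrix, pivot columns)."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def complete_basis(vectors):
    """Extend l.i. `vectors` in R^n to a basis using the basic columns of (A | I)."""
    n = len(vectors[0])
    # Columns of (A | I): the given vectors followed by e_1, ..., e_n
    columns = [list(v) for v in vectors]
    columns += [[int(i == j) for i in range(n)] for j in range(n)]
    rows = [[col[i] for col in columns] for i in range(n)]
    _, pivots = rref(rows)
    return [tuple(columns[c]) for c in pivots]

S2 = [(1, 0, -1, 2), (0, 0, 1, -2)]
S4 = complete_basis(S2)
print(S4)
```

Because the given vectors come first and are independent, they are always selected as basic columns; the identity columns only fill in whatever directions are still missing.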
Graphs
• A graph $G$ is a pair $(V, E)$, where $V$ is a set of vertices and $E$ a set of edges
• Each edge connects two vertices
• So $E \subseteq V \times V$
  $$e_1 = (v_2, v_1), \quad e_2 = (v_1, v_4), \quad \ldots$$
• [Figure: directed graph on vertices $v_1, \ldots, v_4$ with edges $e_1, \ldots, e_6$]
Incidence Matrices
• For a graph $G$ with $m$ vertices and $n$ edges
• Associate an $m \times n$ matrix $E$ such that
  $$[E]_{ij} = \begin{cases} 1, & e_j = (*, v_i) \\ -1, & e_j = (v_i, *) \\ 0, & \text{otherwise} \end{cases}$$
• For the example graph (rows $v_1, \ldots, v_4$, columns $e_1, \ldots, e_6$)
  $$E = \begin{pmatrix} 1 & -1 & 0 & 0 & -1 & 0 \\ -1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & -1 & 0 & -1 \end{pmatrix}$$
Rank and Connectivity
• Each edge is associated with two vertices
  • Each column contains exactly two non-zero entries ($1$ and $-1$)
  • All columns add up to zero
• In other words, if $e^T = (1 \; 1 \; \cdots \; 1)$, then $e^TE = 0$ and therefore $e \in N(E^T)$
• So $\operatorname{rank}(E) = \operatorname{rank}(E^T) = m - \dim N(E^T) \leq m - 1$
• Equality holds iff the graph is connected!
  • I.e., when there is a sequence of edges connecting any pair of vertices
Proof of Rank and Connectivity #1
• Proof ($\Rightarrow$)
  • Assume $G$ is connected, prove $\dim N(E^T) = 1$
  • I.e., prove $e = (1 \; 1 \; \cdots \; 1)^T$ spans $N(E^T)$
• Let $x \in N(E^T)$ and take any $x_i$ and $x_k$ from $x$
• There is a path from $v_i$ to $v_k$
• Take the subset of vertices visited along the way
  $$\{v_{j_1} = v_i, v_{j_2}, \ldots, v_{j_r} = v_k\}$$
• There is an edge $q$ linking $v_{j_p}$ and $v_{j_{p+1}}$
Proof of Rank and Connectivity #2
• There is an edge $q$ linking $v_{j_p}$ and $v_{j_{p+1}}$
• So column $q$ of $E$ is $-1$ at row $j_p$ and $1$ at row $j_{p+1}$
• But $x^TE = 0$ and so
  $$x^TE_{*q} = 0 = x_{j_{p+1}} - x_{j_p}$$
• Since this is true for all $p$, it turns out $x_i = x_k$
• But $i$ and $k$ were arbitrary
• And so finally we reach $x = \alpha e$
• So $\dim N(E^T) = 1$
• Which leads to $\operatorname{rank}(E) = m - 1$
Proof of Rank and Connectivity #3
• Proof ($\Leftarrow$)
• If the graph is not connected, we can partition it into two disconnected subgraphs $G_1$ and $G_2$
• Reorder vertices and edges so those of $G_1$ appear before those of $G_2$ in $E$
  $$E = \begin{pmatrix} E_1 & 0 \\ 0 & E_2 \end{pmatrix}$$
• Now compute the rank
  $$\operatorname{rank}(E) = \operatorname{rank}\begin{pmatrix} E_1 & 0 \\ 0 & E_2 \end{pmatrix} = \operatorname{rank}(E_1) + \operatorname{rank}(E_2) \leq (m_1 - 1) + (m_2 - 1) = m - 2$$
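The rank criterion is easy to try out numerically. The sketch below builds incidence matrices from edge lists and compares their rank with $m - 1$. The helper names are hypothetical; the connected example uses the edges read off the columns of the incidence matrix shown earlier, and the disconnected graph is a made-up two-component example.

```python
from fractions import Fraction

def incidence_matrix(num_vertices, edges):
    """Edges are (source, target) pairs with 1-based vertex labels."""
    E = [[0] * len(edges) for _ in range(num_vertices)]
    for j, (src, dst) in enumerate(edges):
        E[src - 1][j] = -1   # edge leaves src
        E[dst - 1][j] = 1    # edge enters dst
    return E

def rank(rows):
    """Rank by forward elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# Connected graph: edges e_1..e_6 as encoded by the incidence matrix in the slides
connected = incidence_matrix(4, [(2, 1), (1, 4), (2, 3), (4, 2), (1, 3), (4, 3)])
# Hypothetical disconnected graph: components {v1, v2} and {v3, v4}
disconnected = incidence_matrix(4, [(1, 2), (3, 4)])

print(rank(connected), rank(disconnected))
```

With $m = 4$ vertices, the connected graph should reach rank $m - 1$, while the two-component graph can reach at most $m - 2$.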
Application of Rank and Connectivity
• Nodes
  $$\begin{aligned} 1&: \; I_1 - I_2 - I_5 = 0 \\ 2&: \; -I_1 - I_3 + I_4 = 0 \\ 3&: \; I_3 + I_5 + I_6 = 0 \\ 4&: \; I_2 - I_4 - I_6 = 0 \end{aligned}$$
• Loops
  $$\begin{aligned} A&: \; I_1R_1 - I_3R_3 + I_5R_5 = E_1 - E_3 \\ B&: \; I_2R_2 - I_5R_5 + I_6R_6 = E_2 \\ C&: \; I_3R_3 + I_4R_4 - I_6R_6 = E_3 + E_4 \end{aligned}$$
Rank of a product
• Equivalent matrices have the same rank
  • Recall the rank normal form
• Multiplication by invertible matrices preserves rank
• Multiplication by rectangular or singular matrices can reduce the rank
• If $A$ is $m \times n$ and $B$ is $n \times p$ then
  $$\operatorname{rank}(AB) = \operatorname{rank}(B) - \dim N(A) \cap R(B)$$
Proof #1
• Start with a basis for $N(A) \cap R(B)$
  $$S = \{x_1, x_2, \ldots, x_s\}$$
• Augment to form a basis for $R(B)$
  $$S_{\text{ext}} = \{x_1, \ldots, x_s, z_1, \ldots, z_t\}$$
• Let us prove that $\dim R(AB) = t$, so that
  $$\operatorname{rank}(AB) = \operatorname{rank}(B) - \dim N(A) \cap R(B)$$
• Sufficient to prove that $T = \{Az_1, \ldots, Az_t\}$ is a basis for $R(AB)$
Proof #2
• $T$ spans $R(AB)$
  $$b \in R(AB) \Rightarrow b = ABy$$
  $$By \in R(B) \Rightarrow By = \sum \xi_ix_i + \sum \eta_iz_i$$
  $$b = A\Big(\sum \xi_ix_i\Big) + A\Big(\sum \eta_iz_i\Big) = \sum \eta_iAz_i$$
• $T$ is l.i.
  $$\sum \alpha_iAz_i = 0 \Rightarrow A\Big(\sum \alpha_iz_i\Big) = 0 \Rightarrow \sum \alpha_iz_i \in N(A) \cap R(B)$$
  $$\Rightarrow \sum \alpha_iz_i = \sum \beta_ix_i \Rightarrow \sum \alpha_iz_i - \sum \beta_ix_i = 0 \Rightarrow \alpha_i = \beta_i = 0$$
Small perturbations can't reduce rank
• We already know that we can't increase rank by means of a matrix product
  $$\operatorname{rank}(AB) \leq \operatorname{rank}(B)$$
• We now show it is impossible to reduce rank by adding a matrix $E$ that is "small enough"
  $$\operatorname{rank}(A + E) \geq \operatorname{rank}(A)$$
• "Small" in a sense that will be clarified later, but for now here is some intuition
Proof
• Suppose $\operatorname{rank}(A) = r$ and let $P$ and $Q$ reduce $A$ to rank normal form
• Apply $P$ and $Q$ to $A + E$
  $$PAQ = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}, \quad PEQ = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}, \quad P(A + E)Q = \begin{pmatrix} I_r + E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}$$
• But $I_r + E_{11}$ is invertible when $E$ is small enough. Keep eliminating
  $$P_2P(A + E)QQ_2 = \begin{pmatrix} I_r + E_{11} & 0 \\ 0 & S \end{pmatrix}$$
• From which
  $$\operatorname{rank}(A + E) = \operatorname{rank}(A) + \operatorname{rank}(S) \geq \operatorname{rank}(A)$$
Pitfall solving singular systems
• Due to floating-point precision, we do not really solve $Ax = b$
• We solve some perturbed system $(A + E)x = b$
• If $A$ is non-singular, so is $A + E$ and we are fine
• If $A$ is singular, $A + E$ may have higher rank!
  • All we need is for $\operatorname{rank}(S) > 0$!
  • But $S = E_{22} - E_{21}(I + E_{11})^{-1}E_{12}$
  • So the perturbed system has fewer free variables than the actual system
  • Significant loss of information
Products $A^TA$ and $AA^T$
• For $A$ in $\mathbb{R}^{m \times n}$, the following statements hold
  • $\operatorname{rank}(A^TA) = \operatorname{rank}(A) = \operatorname{rank}(AA^T)$
  • $R(A^TA) = R(A^T)$ and $R(AA^T) = R(A)$
  • $N(A^TA) = N(A)$ and $N(AA^T) = N(A^T)$
• For $A$ in $\mathbb{C}^{m \times n}$, replace transposition by the conjugate transpose operation
Proof #1
• $\operatorname{rank}(A^TA) = \operatorname{rank}(A)$
• We know that
  $$\operatorname{rank}(A^TA) = \operatorname{rank}(A) - \dim N(A^T) \cap R(A)$$
• So prove $N(A^T) \cap R(A) = \{0\}$
  $$x \in N(A^T) \cap R(A) \Rightarrow A^Tx = 0 \text{ and } x = Ay \Rightarrow A^TAy = 0 \Rightarrow y^TA^TAy = 0$$
  $$\Rightarrow x^Tx = 0 \Rightarrow \sum x_i^2 = 0 \Rightarrow x = 0$$
Proof #2
• $R(A^TA) = R(A^T)$
  $$R(BC) \subseteq R(B) \Rightarrow R(A^TA) \subseteq R(A^T)$$
  $$\dim R(A^TA) = \operatorname{rank}(A^TA) = \operatorname{rank}(A) = \operatorname{rank}(A^T) = \dim R(A^T)$$
• $N(A^TA) = N(A)$
  $$N(B) \subseteq N(CB) \Rightarrow N(A) \subseteq N(A^TA)$$
  $$\dim N(A) = n - \operatorname{rank}(A) = n - \operatorname{rank}(A^TA) = \dim N(A^TA)$$
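The rank identity for $A^TA$ and $AA^T$ can be spot-checked on a real matrix. This is a minimal sketch with hypothetical helpers (`rank`, `transpose`, `matmul`); the test matrix is the rank-deficient $3 \times 4$ example used earlier in the slides, and exact fractions avoid any floating-point ambiguity about what counts as zero.

```python
from fractions import Fraction

def rank(rows):
    """Rank by forward elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, 2, 2, 3], [2, 4, 1, 3], [3, 6, 1, 4]]
At = transpose(A)
AtA = matmul(At, A)   # 4 x 4
AAt = matmul(A, At)   # 3 x 3

print(rank(A), rank(AtA), rank(AAt))
```

Although $A^TA$ is $4 \times 4$ and $AA^T$ is $3 \times 3$, all three ranks should coincide. The statement relies on real entries; for complex matrices the conjugate transpose is needed, as noted above.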
Application for $A^TA$
• Consider an $m \times n$ system $Ax = b$ that may or may not be consistent
• Multiply on the left by $A^T$ to reach $A^TAx = A^Tb$
• This is known as the associated system of normal equations
• It has many nice properties
Application for $A^TA$
• $A^TAx = A^Tb$ is always consistent!
  $$A^Tb \in R(A^T) = R(A^TA)$$
• If $Ax = b$ is consistent, then both systems have the same solution set
  • Take a particular solution $p$ for $Ax = b$
  • If $Ap = b$, then $A^TAp = A^Tb$
  • The general solution is $p + N(A) = p + N(A^TA)$
• If $Ax = b$ has a unique solution, then it is
  • $x = (A^TA)^{-1}A^Tb$, since $N(A) = \{0\} = N(A^TA)$
    (warning: $A$ may not even be square, so not invertible!)
Normal equations
• For an $m \times n$ system $Ax = b$, the associated system of normal equations is the $n \times n$ system $A^TAx = A^Tb$
• $A^TAx = A^Tb$ is always consistent, even when $Ax = b$ is not
• When both are consistent, the solution sets agree
• Otherwise, $A^TAx = A^Tb$ gives the least-squares solution to $Ax = b$
• When $Ax = b$ is consistent and has a unique solution, so does $A^TAx = A^Tb$, and the solution is $x = (A^TA)^{-1}A^Tb$
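To see the normal equations at work on an inconsistent system, consider fitting a line through three points that are not collinear. The data below is hypothetical and chosen only for illustration; the $2 \times 2$ normal equations are solved by Cramer's rule to keep the sketch short.

```python
from fractions import Fraction

# Hypothetical data: points (1, 1), (2, 2), (3, 2) do not lie on one line,
# so A x = b with rows (1, t_i) has no exact solution.
A = [[1, 1], [1, 2], [1, 3]]
b = [1, 2, 2]

# Form the n x n normal equations A^T A x = A^T b (here n = 2)
AtA = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]

# Solve the 2 x 2 system by Cramer's rule
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [Fraction(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1], det),
     Fraction(AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0], det)]

# Residual of the original system at the least-squares solution
residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
print(x, residual)
```

The residual is non-zero because $Ax = b$ is inconsistent, yet it is orthogonal to every column of $A$, which is exactly what $A^T(Ax - b) = 0$ says.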
LEAST SQUARES
Motivating problem
• Assume we observe a phenomenon that varies with time and record observations
  $$D = \{(t_1, b_1), (t_2, b_2), \ldots, (t_m, b_m)\}$$
• Want to be able to infer the value of an observation at an arbitrary point in time
  $$f(\hat{t}) = \hat{b}$$
• Assume we have a sensible model for $f$, e.g. $f(t) = \alpha + \beta t$
• Find "good" values for $\alpha$ and $\beta$ given $D$
Proposed solution
• Want to find the "best" $f(t) = \alpha + \beta t$
• Find values for $\alpha$ and $\beta$ that minimize
  $$\sum_{i=1}^{m} \varepsilon_i^2 = \sum_{i=1}^{m} \big(f(t_i) - b_i\big)^2$$
• Turns out this reduces to a linear problem
• Let us express it in vector form and generalize
Changing to vector form
• In our example, define
  $$A = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}, \quad x = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \quad \varepsilon = Ax - b$$
• Then $[\varepsilon]_i = \alpha + \beta t_i - b_i = \varepsilon_i$ and
  $$\begin{aligned} \sum_{i=1}^{m} \varepsilon_i^2 = \varepsilon^T\varepsilon &= (Ax - b)^T(Ax - b) \\ &= x^TA^TAx - x^TA^Tb - b^TAx + b^Tb \\ &= x^TA^TAx - 2x^TA^Tb + b^Tb \end{aligned}$$
The minimization problem
• Our goal is to find $\arg\min_x \varepsilon(x)$ where the scalar function
  $$\varepsilon(x) = x^TA^TAx - 2x^TA^Tb + b^Tb$$
• From calculus, at the minimum
  $$\nabla\varepsilon(x) = 0, \quad [\nabla\varepsilon(x)]_i = \frac{\partial \varepsilon(x)}{\partial x_i}$$
• Both $Ax$ and $x^TA^T$ can be seen as matrix functions of each $x_i$
• We can use our rules for differentiation of matrix functions
Finding the minimum
• Differentiating $\varepsilon(x) = x^TA^TAx - 2x^TA^Tb + b^Tb$ w.r.t. each component of $x$ we get
  $$[\nabla\varepsilon(x)]_i = \frac{\partial \varepsilon(x)}{\partial x_i} = \frac{\partial x^T}{\partial x_i}A^TAx + x^TA^TA\frac{\partial x}{\partial x_i} - 2\frac{\partial x^T}{\partial x_i}A^Tb$$
• Since $\dfrac{\partial x}{\partial x_i} = e_i$ and since $e_i^TA^T = [A^T]_{i*}$
  $$\begin{aligned} [\nabla\varepsilon(x)]_i &= e_i^TA^TAx + x^TA^TAe_i - 2e_i^TA^Tb = 2e_i^TA^TAx - 2e_i^TA^Tb \\ &= 2[A^T]_{i*}Ax - 2[A^T]_{i*}b \end{aligned}$$
• Equating to zero and grouping all rows
  $$A^TAx = A^Tb$$
Is there a favorite solution?
• Calculus tells us that the minimum of $\varepsilon(x)$ can only happen at some solution of the normal equations $A^TAx = A^Tb$
• Are all solutions equally good?
  $$\varepsilon(x) = x^TA^TAx - 2x^TA^Tb + b^Tb$$
• Take any two solutions $z_1$ and $z_2 = z_1 + u$, so that $u \in N(A^TA) = N(A)$
  $$\varepsilon(z_1) = b^Tb - z_1^TA^Tb$$
  $$\varepsilon(z_2) = \varepsilon(z_1 + u) = \varepsilon(z_1) + u^TA^TAu = \varepsilon(z_1)$$
• The same argument proves no other vector can produce a lower value for $\varepsilon(x)$: for an arbitrary $u$, $\varepsilon(z_1 + u) = \varepsilon(z_1) + u^TA^TAu \geq \varepsilon(z_1)$
General Least Squares
• For $A$ in $\mathbb{R}^{m \times n}$ and $b$ in $\mathbb{R}^m$, let $\varepsilon = Ax - b$
• The general least squares problem is to find a vector $x$ that minimizes the quantity
  $$\sum_{i=1}^{m} \varepsilon_i^2 = \varepsilon^T\varepsilon = (Ax - b)^T(Ax - b)$$
• Any such vector is a least-squares solution
  • The solution set is the same as that of $A^TAx = A^Tb$
  • Unique iff $\operatorname{rank}(A) = n$, in which case $x = (A^TA)^{-1}A^Tb$
  • If $Ax = b$ is consistent, the solution sets are the same
Example of Linear Regression
• Predict the amount of weight that a pint of ice cream loses when stored at low temperatures
• Assume a linear model for the phenomenon
  $$y = \alpha_0 + \alpha_1t_1 + \alpha_2t_2 + \varepsilon$$
  with $t_1$ (time), $t_2$ (temperature), $\varepsilon$ (random noise)
• Assume random noise "averages out"
• Use measurements to find the least-squares solution for the parameters in
  $$E(t_1, t_2) = \alpha_0 + \alpha_1t_1 + \alpha_2t_2$$
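Result of experiments
• Assume the following measurements

  Time (weeks)     1      1      1      2      2      2      3      3      3
  Temp (°C)      -10     -5      0    -10     -5      0    -10     -5      0
  Loss (grams)   0.15   0.18   0.2    0.17   0.19   0.22   0.2    0.23   0.25

• In vector form, we get
  $$A = \begin{pmatrix} 1 & 1 & -10 \\ 1 & 1 & -5 \\ 1 & 1 & 0 \\ 1 & 2 & -10 \\ 1 & 2 & -5 \\ 1 & 2 & 0 \\ 1 & 3 & -10 \\ 1 & 3 & -5 \\ 1 & 3 & 0 \end{pmatrix}, \quad x = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix}, \quad b = \begin{pmatrix} 0.15 \\ 0.18 \\ 0.2 \\ 0.17 \\ 0.19 \\ 0.22 \\ 0.2 \\ 0.23 \\ 0.25 \end{pmatrix}$$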
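With the measurements in vector form, the least-squares parameters follow from the normal equations $A^TAx = A^Tb$. The sketch below is one way to carry this out with exact fractions; the helper `solve` is a hypothetical name and assumes $A^TA$ is non-singular, which holds here because the columns of $A$ are independent. The fitted values are computed by this script and are not quoted from the slides.

```python
from fractions import Fraction

times = [1, 1, 1, 2, 2, 2, 3, 3, 3]
temps = [-10, -5, 0, -10, -5, 0, -10, -5, 0]
losses = ["0.15", "0.18", "0.2", "0.17", "0.19", "0.22", "0.2", "0.23", "0.25"]

# Rows of A are (1, t1, t2); b holds the measured losses as exact fractions
A = [[Fraction(1), Fraction(t1), Fraction(t2)] for t1, t2 in zip(times, temps)]
b = [Fraction(v) for v in losses]

AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(3)]

def solve(M, rhs):
    """Gauss-Jordan elimination on (M | rhs); assumes M is square and non-singular."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        lead = aug[c][c]
        aug[c] = [v / lead for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * v for a, v in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

alpha = solve(AtA, Atb)
print([float(a) for a in alpha])
```

Under this model the fit comes out as roughly $\alpha_0 \approx 0.174$, $\alpha_1 = 0.025$ grams per week and $\alpha_2 = 0.005$ grams per degree, so predicted loss grows with both storage time and temperature.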