Linear Algebra and Applications (Álgebra Linear e Aplicações)

VECTOR SPACES

Avoid rediscovering the wheel: many mathematical objects that seem to have nothing in common with matrices do in fact share very similar properties. Points in the plane, points in 3-space, polynomials, and more.


  1. Proof
  • Claim: $y^T A = 0 \iff y^T = u^T P_2$ for some $u$, i.e. $N(A^T) = R(P_2^T)$
  • Let $PA = U$ with $P$ nonsingular, and partition
    $P = \begin{pmatrix} P_1 \\ P_2 \end{pmatrix}$, $P^{-1} = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}$, $U = \begin{pmatrix} U_r \\ 0 \end{pmatrix}$,
    where $P_1$ holds the first $r$ rows of $P$ and $U_r$ the $r$ nonzero rows of $U$
  • From $P^{-1} P = I$: $Q_1 P_1 + Q_2 P_2 = I$, hence $Q_1 P_1 = I - Q_2 P_2$
  • Proof (⇐): $PA = \begin{pmatrix} P_1 A \\ P_2 A \end{pmatrix} = \begin{pmatrix} U_r \\ 0 \end{pmatrix}$ gives $P_2 A = 0$, so $y^T = u^T P_2$ implies $y^T A = 0$
  • Proof (⇒): if $y^T A = 0$, then $y^T P^{-1} U = y^T \begin{pmatrix} Q_1 & Q_2 \end{pmatrix} \begin{pmatrix} U_r \\ 0 \end{pmatrix} = y^T Q_1 U_r = 0$
  • Since the rows of $U_r$ are independent, $y^T Q_1 = 0$, so $y^T Q_1 P_1 = 0$, i.e. $y^T (I - Q_2 P_2) = 0$, and finally $y^T = (y^T Q_2) P_2$

  2. Example
  • Find $N(A^T)$ for
    $A = \begin{pmatrix} 1 & 2 & 2 & 3 \\ 2 & 4 & 1 & 3 \\ 3 & 6 & 1 & 4 \end{pmatrix}$
  • Using Gauss-Jordan on the augmented matrix $(A \mid I)$:
    $\left(\begin{array}{cccc|ccc} 1 & 2 & 2 & 3 & 1 & 0 & 0 \\ 2 & 4 & 1 & 3 & 0 & 1 & 0 \\ 3 & 6 & 1 & 4 & 0 & 0 & 1 \end{array}\right) \to \left(\begin{array}{cccc|ccc} 1 & 2 & 0 & 1 & -1/3 & 2/3 & 0 \\ 0 & 0 & 1 & 1 & 2/3 & -1/3 & 0 \\ 0 & 0 & 0 & 0 & 1/3 & -5/3 & 1 \end{array}\right) = (E_A \mid P)$
  • So $P = \begin{pmatrix} -1/3 & 2/3 & 0 \\ 2/3 & -1/3 & 0 \\ 1/3 & -5/3 & 1 \end{pmatrix}$
  • The last $m - r = 1$ row of $P$ (transposed) spans the left-hand nullspace:
    $N(A^T) = \mathrm{span}\left\{ \begin{pmatrix} 1/3 \\ -5/3 \\ 1 \end{pmatrix} \right\}$
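
The row reduction above is easy to reproduce with SymPy; the following is a minimal sketch using the example matrix, assuming SymPy is available (the RREF of $(A \mid I)$ yields $E_A$ on the left and a suitable $P$ on the right):

```python
import sympy as sp

# The matrix A from the example, augmented with the identity
A = sp.Matrix([[1, 2, 2, 3],
               [2, 4, 1, 3],
               [3, 6, 1, 4]])
R, _ = A.row_join(sp.eye(3)).rref()

E_A, P = R[:, :4], R[:, 4:]   # left block: E_A, right block: P with PA = E_A
assert P * A == E_A

print(P.row(2))       # (1/3, -5/3, 1): spans N(A^T)
print(P.row(2) * A)   # the zero row, confirming it annihilates A
```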

  3. Additional insights
  • We have shown that $N(A^T) = R(P_2^T)$
  • It turns out that $R(A) = N(P_2)$
  • Proof of $R(A) \subseteq N(P_2)$: if $y = Ax$, then $P_2 y = P_2 A x = (P_2 A) x = 0x = 0$
  • Proof of $R(A) \supseteq N(P_2)$: if $P_2 y = 0$, then there is $x$ with $Ax = y$, because
    $P(A \mid y) = (PA \mid Py) = \left(\begin{array}{c|c} P_1 A & P_1 y \\ P_2 A & P_2 y \end{array}\right) = \left(\begin{array}{c|c} U_r & P_1 y \\ 0 & 0 \end{array}\right)$
    has no pivot in the last column, so the system $Ax = y$ is consistent

  4. Equal Nullspaces #1
  • We already know how to test for equality of range spaces: row and column equivalence
  • How do we test for nullspace equality?
  • Use equivalence again: for two matrices $A$ and $B$ of the same shape,
    $N(A) = N(B) \iff A \overset{\text{row}}{\sim} B$
  • Similarly, $N(A^T) = N(B^T) \iff A \overset{\text{col}}{\sim} B$

  5. Equal Nullspaces #2
  • Let's prove one of them: $N(A^T) = N(B^T) \iff A \overset{\text{col}}{\sim} B$
  • Proof (⇒): since $N(A^T) = N(B^T)$ means $y^T A = 0 \iff y^T B = 0$, we show $R(A) = R(B)$, which is equivalent to $A \overset{\text{col}}{\sim} B$
  • Take any $z = Bx_2$ and show $z = Ax_1$ for some $x_1$: with $PA = U$ as before, the rows of $P_2$ lie in $N(A^T) = N(B^T)$, so $P_2 B = 0$ and
    $(A \mid Bx_2) \to (PA \mid PBx_2) = \left(\begin{array}{c|c} P_1 A & P_1 B x_2 \\ P_2 A & P_2 B x_2 \end{array}\right) = \left(\begin{array}{c|c} P_1 A & P_1 B x_2 \\ 0 & 0 \end{array}\right)$
    is consistent, hence $R(B) \subseteq R(A)$; swapping $A$ and $B$ gives equality

  6. Equal Nullspaces #3
  • Let's prove one of them: $N(A^T) = N(B^T) \iff A \overset{\text{col}}{\sim} B$
  • Proof (⇐): $A \overset{\text{col}}{\sim} B$ means $A = BQ$ for some nonsingular $Q$
  • With $P_2 A = 0$: $P_2 B Q = 0 \Rightarrow P_2 B = 0 \Rightarrow N(A^T) \subseteq N(B^T)$
  • Conversely, $N(B^T) \subseteq N(A^T)$ (swap the roles of $A$ and $B$ in the proof)

  7. Summary #1
  • The four fundamental subspaces associated with a matrix $A_{m \times n}$ are
  • The range or column space: $R(A) = \{ Ax \} \subseteq \mathbb{R}^m$
  • The row space or left-hand range: $R(A^T) = \{ A^T y \} \subseteq \mathbb{R}^n$
  • The nullspace: $N(A) = \{ x \mid Ax = 0 \} \subseteq \mathbb{R}^n$
  • The left-hand nullspace: $N(A^T) = \{ y \mid y^T A = 0 \} \subseteq \mathbb{R}^m$

  8. Summary #2
  • Let $P$ be a nonsingular matrix such that $PA = U$, where $U$ is in echelon form, and let $\mathrm{rank}(A) = r$
  • Spanning sets for the four subspaces:
  • $R(A)$: the basic columns of $A$
  • $R(A^T)$: the nonzero rows of $U$ (transposed)
  • $N(A)$: the vectors $h_i$ multiplying the free variables in the general solution of $Ax = 0$
  • $N(A^T)$: the last $m - r$ rows of $P$ (transposed)

  9. Summary #3
  • If $A$ and $B$ are matrices of the same shape:
  • $A \overset{\text{col}}{\sim} B \iff R(A) = R(B) \iff N(A^T) = N(B^T)$
  • $A \overset{\text{row}}{\sim} B \iff N(A) = N(B) \iff R(A^T) = R(B^T)$

  10. LINEAR INDEPENDENCE, BASIS, AND DIMENSION

  11. Linear independence
  • Matrix dimensions give an incomplete picture of the true size of a linear system
  • The important number is the rank
  • Number of pivots
  • Number of nonzero rows in echelon form
  • Better interpretation: the number of genuinely independent rows in the matrix
  • The other rows are redundant

  12. Formally
  • Take a set of vectors $S = \{ v_1, v_2, \ldots, v_r \}$
  • Look at linear combinations $\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_r v_r$
  • The vectors $v_i$ are linearly independent (l.i.) iff the only linear combination that produces $0$ is the trivial one, $\alpha_i = 0$ for all $i$
  • Otherwise they are linearly dependent (l.d.): one of them is a linear combination of the others

  13. Easy to visualize in $\mathbb{R}^3$
  • 2 vectors are dependent if they lie on a line
  • 3 vectors are dependent if they lie in a plane (or on a line)
  • 4 vectors are always dependent
  • 3 random vectors should be independent

  14. Example
  • Determine whether the set of vectors
    $S = \left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 5 \\ 6 \\ 7 \end{pmatrix} \right\}$ is l.i.
  • Look for a non-trivial solution to $\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 = 0$, i.e. a non-trivial solution to the homogeneous linear system
    $\begin{pmatrix} 1 & 1 & 5 \\ 2 & 0 & 6 \\ 1 & 2 & 7 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$
  • From Gauss-Jordan, $E_A = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}$
  • So they are l.d., with e.g. $\alpha_1 = -3$, $\alpha_2 = -2$, $\alpha_3 = 1$
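
A minimal NumPy check of this example (the dependence coefficients are the ones found above):

```python
import numpy as np

# Vectors of S as the columns of a matrix
V = np.array([[1., 1., 5.],
              [2., 0., 6.],
              [1., 2., 7.]])

# Rank 2 < 3 vectors, so the set is linearly dependent
print(np.linalg.matrix_rank(V))

# Verify the dependence found by Gauss-Jordan: -3 v1 - 2 v2 + v3 = 0
print(V @ np.array([-3., -2., 1.]))   # [0. 0. 0.]
```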

  15. Linear independence and Matrices
  • Let $A$ be an $m \times n$ matrix
  • The following are equivalent to saying the columns of $A$ form a linearly independent set: $N(A) = \{0\}$, and $\mathrm{rank}(A) = n$
  • The following are equivalent to saying the rows of $A$ form a linearly independent set: $N(A^T) = \{0\}$, and $\mathrm{rank}(A) = m$
  • If $A$ is square, the following are equivalent to saying $A$ is non-singular: the columns of $A$ form a linearly independent set, and the rows of $A$ form a linearly independent set

  16. Diagonal dominance
  • An $n \times n$ matrix $A = [a_{ij}]$ is diagonally dominant whenever
    $|a_{kk}| > \sum_{j \neq k} |a_{kj}|$ for every $k \in \{1, 2, \ldots, n\}$
  • I.e., each diagonal element is larger in magnitude than the sum of the magnitudes of the other elements in its row
  • These matrices appear frequently in practical applications. Two important properties are:
  • They are never singular
  • Gaussian elimination does not need partial pivoting
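
Diagonal dominance is straightforward to test programmatically; here is a small sketch (the function name is illustrative):

```python
import numpy as np

def is_diagonally_dominant(A):
    """True iff |a_kk| > sum of |a_kj| over j != k, for every row k."""
    A = np.abs(np.asarray(A, dtype=float))
    diag = np.diag(A)
    return bool(np.all(diag > A.sum(axis=1) - diag))

print(is_diagonally_dominant([[4, 1, 2],
                              [1, 5, 3],
                              [0, 2, 3]]))   # True
print(is_diagonally_dominant([[1, 2],
                              [3, 4]]))      # False
```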

  17. Diagonal dominance
  • Diagonally dominant matrices are non-singular
  • Proof by contradiction: assume there is a non-zero vector in $N(A)$ and find a contradiction
  • Let $Ax = 0$ with $x \neq 0$, and let $x_k$ be the entry of largest magnitude in $x$
  • $[Ax]_k = 0$ implies $a_{kk} x_k = - \sum_{j \neq k} a_{kj} x_j$, so
    $|a_{kk}| |x_k| = \Big| \sum_{j \neq k} a_{kj} x_j \Big| \leq \sum_{j \neq k} |a_{kj}| |x_j| \leq |x_k| \sum_{j \neq k} |a_{kj}|$
  • Dividing by $|x_k| > 0$ gives $|a_{kk}| \leq \sum_{j \neq k} |a_{kj}|$, contradicting diagonal dominance

  18. Polynomial interpolation
  • Given a set of $m$ points $S = \{ (x_1, y_1), \ldots, (x_m, y_m) \}$ where the $x_i$ are distinct, there is a unique polynomial
    $\ell(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_{m-1} t^{m-1}$
    of degree at most $m - 1$ that goes through each point in $S$
  • The conditions are
    $\alpha_0 + \alpha_1 x_i + \alpha_2 x_i^2 + \cdots + \alpha_{m-1} x_i^{m-1} = \ell(x_i) = y_i, \quad i = 1, \ldots, m$

  19. Polynomial interpolation
  • Same as saying the following system has a unique solution for any right-hand side $y_i$:
    $\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{m-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{m-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{m-1} \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{m-1} \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$
  • The matrix is non-singular whenever the $x_i$ are distinct
  • Such matrices are called Vandermonde matrices
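
In NumPy such a matrix can be built with `np.vander`; a small sketch with made-up data points (the increasing-powers column order matches the slides):

```python
import numpy as np

x = np.array([1., 2., 3., 4.])    # distinct abscissas (made-up data)
y = np.array([2., 3., 5., 4.])    # values to interpolate

V = np.vander(x, increasing=True)       # columns 1, x, x^2, x^3
print(np.linalg.matrix_rank(V))         # 4: non-singular since x_i distinct

alpha = np.linalg.solve(V, y)           # interpolation coefficients
print(np.polyval(alpha[::-1], x))       # reproduces y at each x_i
```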

  20. Vandermonde Matrices
  • An $m \times n$ Vandermonde matrix has independent columns whenever $n \leq m$ and the $x_i$ are distinct:
    $\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{n-1} \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{n-1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$
  • Proof: any solution gives a polynomial $p(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_{n-1} x^{n-1}$ with $p(x_i) = 0$ for all $i$
  • So $p(x)$ would have $m$ distinct roots but degree at most $n - 1 < m$
  • The fundamental theorem of algebra then implies $p \equiv 0$, i.e. every $\alpha_j = 0$

  21. Lagrange interpolator
  • In particular, when $n = m$ the system
    $\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{m-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{m-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{m-1} \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{m-1} \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$
    has a unique solution
  • The solution is the Lagrange interpolator
    $\ell(t) = \sum_{i=1}^{m} y_i \, \frac{ \prod_{j \neq i} (t - x_j) }{ \prod_{j \neq i} (x_i - x_j) }$
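
A direct (unoptimized) sketch of the Lagrange formula above, with made-up sample points:

```python
import numpy as np

def lagrange(t, x, y):
    """Evaluate the Lagrange interpolator l(t) through the points (x_i, y_i)."""
    m, total = len(x), 0.0
    for i in range(m):
        num = np.prod([t - x[j] for j in range(m) if j != i])
        den = np.prod([x[i] - x[j] for j in range(m) if j != i])
        total += y[i] * num / den
    return total

x, y = [1.0, 2.0, 3.0], [2.0, 3.0, 5.0]   # made-up sample points
print([lagrange(t, x, y) for t in x])     # recovers [2.0, 3.0, 5.0]
```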

  22. Example of interpolation
  • (Figure: four plots of interpolating polynomials, including $P_4(x)$ and $P_5(x)$, fitted through sample data points.)

  23. Maximal independent subsets #1
  • We know that if $\mathrm{rank}(A_{m \times n}) < n$, then the columns of $A$ must form a dependent set
  • In such cases, we often want to extract a maximal independent subset of columns
  • An l.i. set containing as many columns of $A$ as possible
  • Such columns are sufficient to span $R(A)$

  24. Maximal independent subsets #2
  • If $\mathrm{rank}(A_{m \times n}) = r$, then the following hold
  • Any maximal independent subset of the columns of $A$ contains exactly $r$ columns
  • Any maximal independent subset of the rows of $A$ contains exactly $r$ rows
  • In particular, the $r$ basic columns of $A$ constitute a maximal independent subset of the columns of $A$ (see the sketch below)
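
One way to extract the basic columns in practice is via the pivot positions of the reduced row echelon form; a SymPy sketch reusing the earlier example matrix:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 2, 3],
               [2, 4, 1, 3],
               [3, 6, 1, 4]])

_, pivots = A.rref()                 # pivot (basic) column indices
print(pivots)                        # (0, 2): columns 1 and 3 of A

basic = sp.Matrix.hstack(*[A[:, j] for j in pivots])
print(basic.rank() == A.rank())      # True: a maximal independent subset
```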

  25. Maximal independent subsets #3
  • Any maximal independent subset of the columns of $A$ contains exactly $r$ columns
  • Proof
  • Every column of $A$ can be written as a linear combination of the $r$ basic columns $A_{*b_1}, \ldots, A_{*b_r}$
  • Pick any $k > r$ columns $A_{*s_1}, \ldots, A_{*s_k}$ of $A$ and show they are l.d.:
    $\begin{pmatrix} A_{*s_1} & A_{*s_2} & \cdots & A_{*s_k} \end{pmatrix} = \begin{pmatrix} A_{*b_1} & A_{*b_2} & \cdots & A_{*b_r} \end{pmatrix} \begin{pmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1k} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{r1} & \beta_{r2} & \cdots & \beta_{rk} \end{pmatrix}$
  • The $r \times k$ coefficient matrix has $k > r$ columns, so there is $\alpha \neq 0$ in its nullspace, and
    $\begin{pmatrix} A_{*s_1} & \cdots & A_{*s_k} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_k \end{pmatrix} = 0$ with not all $\alpha_i$ zero

  26. Basic facts about Independence
  • The following hold for a set of vectors $S = \{ u_1, u_2, \ldots, u_n \}$ in a space $\mathcal{V}$
  • If $S$ contains an l.d. subset, then $S$ itself must be l.d.
  • If $S$ is l.i., then every subset of $S$ is also l.i.
  • If $S$ is l.i. and $v \in \mathcal{V}$, then $S \cup \{v\}$ is l.i. iff $v \notin \mathrm{span}(S)$
  • Proof (⇐): $\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n + \alpha_{n+1} v = 0$ forces $\alpha_{n+1} = 0$ (otherwise $v \in \mathrm{span}(S)$), and then $\alpha_1 u_1 + \cdots + \alpha_n u_n = 0$ forces $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$
  • If $S \subseteq \mathbb{R}^m$ and $n > m$, then $S$ is l.d.

  27. BASIS AND DIMENSION

  28. Bases
  • A basis for a vector space $\mathcal{V}$ is a set $S$ that
  • Spans $\mathcal{V}$
  • Is linearly independent
  • Spanning sets can contain redundant vectors
  • Bases, on the other hand, contain only necessary and sufficient information
  • Every vector space $\mathcal{V}$ has a basis (the general proof depends on the axiom of choice)
  • Bases are not unique

  29. Examples
  • The unit vectors $S = \{ e_1, e_2, \ldots, e_n \}$ form a basis for $\mathbb{R}^n$: the standard or canonical basis of $\mathbb{R}^n$
  • If $A$ is an $n \times n$ non-singular matrix, then the set of rows and the set of columns of $A$ each constitute a basis of $\mathbb{R}^n$
  • What about $Z = \{0\}$?
  • The set $S = \{ 1, x, x^2, \ldots, x^n \}$ is a basis for the polynomials of degree $n$ or less
  • What about the vector space of all polynomials?

  30. Characterizations of a Basis
  • With $\mathcal{V}$ a subspace of $\mathbb{R}^m$ and $B = \{ b_1, b_2, \ldots, b_n \} \subseteq \mathcal{V}$, the following are equivalent
  • (1) $B$ is a basis for $\mathcal{V}$
  • (2) $B$ is a minimal spanning set for $\mathcal{V}$
  • (3) $B$ is a maximal l.i. subset of $\mathcal{V}$

  31. Proof #1
  • (basis) ⇒ (minimal spanning set)
  • Assume $B = \{b_1, \ldots, b_n\}$ is a basis and $X = \{x_1, \ldots, x_k\}$ is a smaller spanning set, $k < n$
  • Each $b_i$ is a combination of the $x_j$, so
    $\begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & \cdots & x_k \end{pmatrix} \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{k1} & \alpha_{k2} & \cdots & \alpha_{kn} \end{pmatrix}$, i.e. $B = XA$
  • Since $\mathrm{rank}(A) \leq k < n$, there is $y \neq 0$ with $Ay = 0$, so $By = XAy = 0$ and $B$ is l.d. (contradiction)
  • (minimal spanning set) ⇒ (basis)
  • A minimal spanning set must be l.i.: otherwise remove a dependent vector and get a smaller spanning set, so it wasn't minimal (contradiction)

  32. Proof #2
  • (maximal l.i. set) ⇒ (basis)
  • If a maximal l.i. set $B$ of $\mathcal{V}$ is not a basis for $\mathcal{V}$, then there is $v \in \mathcal{V}$ with $v \notin \mathrm{span}(B)$
  • So $B \cup \{v\}$ is l.i. and $B$ is not maximal (contradiction)
  • (basis) ⇒ (maximal l.i. set)
  • If basis $B$ is not maximal l.i., then take a larger set $Y$ that is maximal l.i.
  • We know $Y$ is a basis
  • But a basis is a minimal spanning set, and $B$ is a smaller one (contradiction)
  • So $B$ must also be maximal

  33. Dimension
  • We have just proven that, although there are many bases for $\mathcal{V}$, each of them has the same number of vectors
  • The dimension of a space $\mathcal{V}$, $\dim \mathcal{V}$, is the number of vectors in
  • Any basis of $\mathcal{V}$
  • Any minimal spanning set for $\mathcal{V}$
  • Any maximal independent set for $\mathcal{V}$

  34. Examples
  • If $Z = \{0\}$, then $\dim Z = 0$: the basis is the empty set
  • $L$ is a line through the origin in $\mathbb{R}^3$: $\dim L = 1$, and any non-zero vector along $L$ forms a basis for $L$
  • $P$ is a plane through the origin in $\mathbb{R}^3$: $\dim P = 2$. How would we find a basis?
  • $\dim \mathbb{R}^n = n$: the canonical vectors form a basis

  35. Further insights
  • Dimension measures the "amount of stuff" in a subspace
  • Point < Line < Plane < $\mathbb{R}^3$
  • It also measures the number of degrees of freedom in the subspace
  • $Z$: no freedom; Line: 1 degree; Plane: 2 degrees; etc.
  • Do not confuse dimension with the number of components in a vector! Related, but not equal!

  36. Subspace dimension
  • Let $\mathcal{M}$ and $\mathcal{N}$ be vector spaces with $\mathcal{M} \subseteq \mathcal{N}$
  • (1) $\dim \mathcal{M} \leq \dim \mathcal{N}$
  • (2) If $\dim \mathcal{M} = \dim \mathcal{N}$, then $\mathcal{M} = \mathcal{N}$
  • Proof
  • (1) Assume $\dim \mathcal{M} > \dim \mathcal{N}$: a basis of $\mathcal{M}$ (all l.i. elements of $\mathcal{N}$) would have more vectors than a maximal independent set of $\mathcal{N}$
  • (2) Assume $\mathcal{M} \subset \mathcal{N}$: augment a basis of $\mathcal{M}$ with $v \in \mathcal{N} \setminus \mathcal{M}$, giving an independent set with more than $\dim \mathcal{N}$ vectors!

  37. Four Fundamental Subspaces: Dimension
  • For an $m \times n$ matrix $A$ with $\mathrm{rank}(A) = r$
  • $\dim R(A) = r$
  • $\dim N(A) = n - r$
  • $\dim R(A^T) = r$
  • $\dim N(A^T) = m - r$

  38. Rank Plus Nullity Theorem
  • For all $m \times n$ matrices $A$: $\dim R(A) + \dim N(A) = n$
  • As the "amount of stuff" in $R(A)$ grows, the "amount of stuff" in $N(A)$ shrinks
  • ($\dim N(A)$ was traditionally known as the nullity)

  39. Completing a Basis
  • If $S_r = \{ v_1, v_2, \ldots, v_r \}$ is an l.i. subset of an $n$-dimensional space $\mathcal{V}$, where $r < n$, show how to extend $S_r$ with vectors $\{ v_{r+1}, \ldots, v_n \}$ so that $S_n = \{ v_1, \ldots, v_r, v_{r+1}, \ldots, v_n \}$ forms a basis for $\mathcal{V}$
  • Solution
  • Create a matrix $A$ with the vectors of $S_r$ as columns
  • Augment with the identity matrix to form $(A \mid I)$
  • Reduce to echelon form to find the basic columns
  • Return the $n$ basic columns of $(A \mid I)$

  40. Example
  • Take two l.i. vectors in $\mathbb{R}^4$ and complete them to a basis for $\mathbb{R}^4$:
    $S_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \end{pmatrix} \right\}$
  • Solution:
    $(A \mid I) = \left(\begin{array}{cc|cccc} 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 & 1 & 0 \\ 2 & -2 & 0 & 0 & 0 & 1 \end{array}\right) \quad E_{(A \mid I)} = \left(\begin{array}{cc|cccc} 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1/2 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1/2 \end{array}\right)$
  • The basic columns are 1, 2, 4, and 5, so
    $S_4 = \left\{ \begin{pmatrix} 1 \\ 0 \\ -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \right\}$
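
The same example, checked with SymPy (pivot indices are 0-based):

```python
import sympy as sp

# The two vectors of S_2 as the columns of A
A = sp.Matrix([[ 1,  0],
               [ 0,  0],
               [-1,  1],
               [ 2, -2]])

aug = A.row_join(sp.eye(4))          # form (A | I)
_, pivots = aug.rref()
print(pivots)                        # (0, 1, 3, 4): columns 1, 2, 4, 5

basis = sp.Matrix.hstack(*[aug[:, j] for j in pivots])
print(basis.rank())                  # 4: a basis for R^4
```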

  41. Graphs
  • A graph $G$ is a pair $(\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is a set of vertices and $\mathcal{E}$ a set of edges
  • Each edge connects two vertices, so $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$
  • (Figure: an example graph with vertices $v_1, \ldots, v_4$ and directed edges $e_1 = (v_2, v_1)$, $e_2 = (v_1, v_4)$, ..., $e_6$.)

  42. Incidence Matrices
  • For a graph $G$ with $m$ vertices and $n$ edges, associate an $m \times n$ matrix $E$ such that
    $[E]_{ij} = \begin{cases} 1, & e_j = (*, v_i) \\ -1, & e_j = (v_i, *) \\ 0, & \text{otherwise} \end{cases}$
  • For the example graph (rows $v_1, \ldots, v_4$; columns $e_1, \ldots, e_6$):
    $E = \begin{pmatrix} 1 & -1 & 0 & 0 & -1 & 0 \\ -1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & -1 & 0 & -1 \end{pmatrix}$

  43. Rank and Connectivity
  • Each edge is associated with two vertices, so each column contains exactly two non-zero entries ($1$ and $-1$)
  • All columns add up to zero: if $e^T = (1 \; 1 \; \cdots \; 1)$, then $e^T E = 0$ and therefore $e \in N(E^T)$
  • So $\mathrm{rank}(E) = \mathrm{rank}(E^T) = m - \dim N(E^T) \leq m - 1$
  • Equality holds iff the graph is connected!
  • I.e., when there is a sequence of edges connecting any pair of vertices (see the sketch below)
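
A small NumPy sketch of this connectivity test, using the example graph's edge list (vertices are 0-indexed):

```python
import numpy as np

# Edges of the example graph as (tail, head) pairs: e1 = (v2, v1), ...
edges = [(1, 0), (0, 3), (1, 2), (3, 1), (0, 2), (3, 2)]
m = 4

E = np.zeros((m, len(edges)))
for j, (tail, head) in enumerate(edges):
    E[tail, j] = -1.0    # edge j leaves its tail vertex
    E[head, j] = 1.0     # edge j enters its head vertex

print(E.sum(axis=0))                # every column sums to zero
print(np.linalg.matrix_rank(E))     # 3 = m - 1: the graph is connected
```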

  44. Proof of Rank and Connectivity #1
  • Proof (⇒)
  • Assume $G$ is connected; prove $\dim N(E^T) = 1$
  • I.e., prove $e = (1 \; 1 \; \cdots \; 1)^T$ spans $N(E^T)$
  • Let $x \in N(E^T)$ and take any two entries $x_i$ and $x_k$ of $x$
  • There is a path from $v_i$ to $v_k$
  • Take the subset of vertices visited along the way: $\{ v_{j_1} = v_i, v_{j_2}, \ldots, v_{j_r} = v_k \}$
  • There is an edge $q$ linking $v_{j_p}$ and $v_{j_{p+1}}$

  45. Proof of Rank and Connectivity #2
  • There is an edge $q$ linking $v_{j_p}$ and $v_{j_{p+1}}$
  • So column $q$ of $E$ is $-1$ at row $j_p$ and $1$ at row $j_{p+1}$ (or vice versa)
  • But $x^T E = 0$, so $x^T E_{*q} = 0 = x_{j_{p+1}} - x_{j_p}$
  • Since this is true for all $p$, it turns out that $x_i = x_k$
  • But $i$ and $k$ were arbitrary, and so finally we reach $x = \alpha e$
  • So $\dim N(E^T) = 1$, which leads to $\mathrm{rank}(E) = m - 1$

  46. Proof of Rank and Connectivity #3
  • Proof (⇐)
  • If the graph is not connected, we can partition it into two disconnected subgraphs $G_1$ and $G_2$
  • Reorder vertices so the vertices/edges of $G_1$ appear before the vertices/edges of $G_2$ in $E$:
    $E = \begin{pmatrix} E_1 & 0 \\ 0 & E_2 \end{pmatrix}$
  • Now compute the rank:
    $\mathrm{rank}(E) = \mathrm{rank} \begin{pmatrix} E_1 & 0 \\ 0 & E_2 \end{pmatrix} = \mathrm{rank}(E_1) + \mathrm{rank}(E_2) \leq (m_1 - 1) + (m_2 - 1) = m - 2$

  47. Application of Rank and Connectivity
  • Node equations (currents at each vertex sum to zero):
    $1: I_1 - I_2 - I_5 = 0$
    $2: -I_1 - I_3 + I_4 = 0$
    $3: I_3 + I_5 + I_6 = 0$
    $4: I_2 - I_4 - I_6 = 0$
  • Loop equations:
    $A: I_1 R_1 - I_3 R_3 + I_5 R_5 = E_1 - E_3$
    $B: I_2 R_2 - I_5 R_5 + I_6 R_6 = E_2$
    $C: I_3 R_3 + I_4 R_4 - I_6 R_6 = E_3 + E_4$

  48. Rank of a product
  • Equivalent matrices have the same rank (recall the rank normal form)
  • Multiplication by invertible matrices preserves rank
  • Multiplication by rectangular or singular matrices can reduce the rank
  • If $A$ is $m \times n$ and $B$ is $n \times p$, then
    $\mathrm{rank}(AB) = \mathrm{rank}(B) - \dim \big( N(A) \cap R(B) \big)$
    (a numerical check follows below)
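
The formula can be checked numerically. The sketch below builds a $B$ whose range contains a null vector of $A$, and computes $\dim(N(A) \cap R(B))$ via $\dim U + \dim W - \dim(U + W)$ (the construction and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
rank = np.linalg.matrix_rank

A = rng.standard_normal((3, 4))        # generic 3x4, so dim N(A) = 1
_, _, Vt = np.linalg.svd(A)
n = Vt[rank(A):].T                     # basis for N(A) from the SVD

# Make R(B) contain N(A): rank(B) = 2 and dim(N(A) ∩ R(B)) = 1
v = rng.standard_normal((4, 1))
B = np.hstack([n, v, n + v])

dim_int = n.shape[1] + rank(B) - rank(np.hstack([n, B]))
print(rank(A @ B), rank(B) - dim_int)  # both sides equal 1
```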

  49. Proof #1
  • Start with a basis $S = \{ x_1, x_2, \ldots, x_s \}$ for $N(A) \cap R(B)$
  • Augment it to form a basis $S_{\text{ext}} = \{ x_1, \ldots, x_s, z_1, \ldots, z_t \}$ for $R(B)$
  • Let us prove that $\dim R(AB) = t$, so that $\mathrm{rank}(AB) = \mathrm{rank}(B) - \dim \big( N(A) \cap R(B) \big)$
  • It is sufficient to prove that $T = \{ Az_1, \ldots, Az_t \}$ is a basis for $R(AB)$

  50. Proof #2
  • $T$ spans $R(AB)$:
    $b \in R(AB) \Rightarrow b = ABy$, and $By \in R(B) \Rightarrow By = \sum_i \xi_i x_i + \sum_i \eta_i z_i$, so
    $b = A \big( \textstyle\sum_i \xi_i x_i \big) + A \big( \textstyle\sum_i \eta_i z_i \big) = \textstyle\sum_i \eta_i A z_i$ (since $A x_i = 0$)
  • $T$ is l.i.:
    $\sum_i \alpha_i A z_i = 0 \Rightarrow A \big( \sum_i \alpha_i z_i \big) = 0 \Rightarrow \sum_i \alpha_i z_i \in N(A) \cap R(B)$
    $\Rightarrow \sum_i \alpha_i z_i = \sum_i \beta_i x_i \Rightarrow \sum_i \alpha_i z_i - \sum_i \beta_i x_i = 0 \Rightarrow \alpha_i = \beta_i = 0$ (since $S_{\text{ext}}$ is l.i.)

  51. Small perturbations can't reduce rank
  • We already know that the rank cannot increase under matrix products: $\mathrm{rank}(AB) \leq \mathrm{rank}(B)$
  • We now show it is impossible to reduce the rank by adding a matrix that is "small enough":
    $\mathrm{rank}(A + E) \geq \mathrm{rank}(A)$
  • "Small" in a sense that will be clarified later, but for now here is some intuition

  52. Proof
  • Suppose $\mathrm{rank}(A) = r$ and let $P$ and $Q$ reduce $A$ to rank normal form
  • Apply $P$ and $Q$ to $A + E$:
    $PAQ = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$, $PEQ = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}$, $P(A + E)Q = \begin{pmatrix} I_r + E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}$
  • But $I_r + E_{11}$ is invertible when $E$ is small enough. Keep eliminating:
    $P_2 \, P(A + E) Q \, Q_2 = \begin{pmatrix} I_r + E_{11} & 0 \\ 0 & S \end{pmatrix}$
  • From which $\mathrm{rank}(A + E) = \mathrm{rank}(A) + \mathrm{rank}(S) \geq \mathrm{rank}(A)$

  53. Pitfall solving singular systems
  • Due to floating-point precision, we do not really solve $Ax = b$
  • We solve some perturbed system $(A + E)x = b$
  • If $A$ is non-singular, so is $A + E$, and we are fine
  • If $A$ is singular, $A + E$ may have higher rank!
  • All we need is $\mathrm{rank}(S) > 0$, where $S = E_{22} - E_{21} (I + E_{11})^{-1} E_{12}$
  • The perturbed system then has fewer free variables than the actual system: a significant loss of information (see the sketch below)
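
A tiny demonstration of the pitfall: a singular matrix plus a perturbation far below the size of its entries already reports full rank (the exact outcome depends on NumPy's default rank tolerance):

```python
import numpy as np

rank = np.linalg.matrix_rank
A = np.array([[1., 2.],
              [2., 4.]])                   # singular: rank 1

E = 1e-12 * np.random.default_rng(1).standard_normal((2, 2))
print(rank(A), rank(A + E))                # typically 1 and 2: the free
                                           # variable of Ax = b is lost
```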

  54. Products $A^T A$ and $A A^T$
  • For $A$ in $\mathbb{R}^{m \times n}$, the following statements hold
  • $\mathrm{rank}(A^T A) = \mathrm{rank}(A) = \mathrm{rank}(A A^T)$
  • $R(A^T A) = R(A^T)$ and $N(A^T A) = N(A)$
  • $R(A A^T) = R(A)$ and $N(A A^T) = N(A^T)$
  • For $A$ in $\mathbb{C}^{m \times n}$, replace transposition by the conjugate transpose operation

  55. Proof #1
  • $\mathrm{rank}(A^T A) = \mathrm{rank}(A)$
  • We know that $\mathrm{rank}(A^T A) = \mathrm{rank}(A) - \dim \big( N(A^T) \cap R(A) \big)$
  • So it suffices to prove $N(A^T) \cap R(A) = \{0\}$:
    $x \in N(A^T) \cap R(A) \Rightarrow A^T x = 0$ and $x = Ay$ for some $y$
    $\Rightarrow A^T A y = 0 \Rightarrow y^T A^T A y = 0 \Rightarrow x^T x = \textstyle\sum_i x_i^2 = 0 \Rightarrow x = 0$

  56. Proof #2
  • $R(A^T A) = R(A^T)$:
    $R(BC) \subseteq R(B)$ gives $R(A^T A) \subseteq R(A^T)$, and
    $\dim R(A^T A) = \mathrm{rank}(A^T A) = \mathrm{rank}(A) = \mathrm{rank}(A^T) = \dim R(A^T)$
  • $N(A^T A) = N(A)$:
    $N(B) \subseteq N(CB)$ gives $N(A) \subseteq N(A^T A)$, and
    $\dim N(A) = n - \mathrm{rank}(A) = n - \mathrm{rank}(A^T A) = \dim N(A^T A)$

  57. Application for $A^T A$
  • Consider an $m \times n$ system $Ax = b$ that may or may not be consistent
  • Multiply on the left by $A^T$ to reach $A^T A x = A^T b$
  • This is known as the associated system of normal equations
  • It has many nice properties

  58. Application for $A^T A$
  • $A^T A x = A^T b$ is always consistent! $A^T b \in R(A^T) = R(A^T A)$
  • If $Ax = b$ is consistent, then both systems have the same solution set
  • Take a particular solution $p$ of $Ax = b$: if $Ap = b$, then $A^T A p = A^T b$
  • The general solution is $p + N(A) = p + N(A^T A)$
  • If $Ax = b$ has a unique solution, then $N(A) = \{0\} = N(A^T A)$ and the solution is $x = (A^T A)^{-1} A^T b$ (warning: $A$ itself may not even be square, hence not invertible!)

  59. Normal equations
  • For an $m \times n$ system $Ax = b$, the associated system of normal equations is the $n \times n$ system $A^T A x = A^T b$
  • $A^T A x = A^T b$ is always consistent, even when $Ax = b$ is not
  • When both are consistent, the solution sets agree
  • Otherwise, $A^T A x = A^T b$ gives the least-squares solutions to $Ax = b$
  • When $Ax = b$ is consistent and has a unique solution, so does $A^T A x = A^T b$, and the solution is $x = (A^T A)^{-1} A^T b$ (a numerical check follows below)
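
For a full-column-rank $A$, solving the normal equations agrees with a library least-squares solver; a quick NumPy sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))       # full column rank: unique solution
b = rng.standard_normal(10)            # generic b: Ax = b is inconsistent

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solver
print(np.allclose(x_normal, x_lstsq))            # True
```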

  60. LEAST SQUARES

  61. Motivating problem
  • Assume we observe a phenomenon that varies with time and record observations $D = \{ (t_1, b_1), (t_2, b_2), \ldots, (t_m, b_m) \}$
  • We want to be able to infer the value of an observation at an arbitrary point in time: $f(\hat{t}) = \hat{b}$
  • Assume we have a sensible model for $f$, e.g. $f(t) = \alpha + \beta t$
  • Find "good" values for $\alpha$ and $\beta$ given $D$

  62. Proposed solution
  • We want to find the "best" $f(t) = \alpha + \beta t$
  • Find values for $\alpha$ and $\beta$ that minimize
    $\sum_{i=1}^{m} \varepsilon_i^2 = \sum_{i=1}^{m} \big( f(t_i) - b_i \big)^2$
  • It turns out this reduces to a linear problem
  • Let us express it in vector form and generalize

  63. Changing to vector form
  • In our example, define
    $A = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}$, $x = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$, $b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$, $\varepsilon = Ax - b$
  • Then $[\varepsilon]_i = \alpha + \beta t_i - b_i = \varepsilon_i$ and
    $\sum_{i=1}^{m} \varepsilon_i^2 = \varepsilon^T \varepsilon = (Ax - b)^T (Ax - b) = x^T A^T A x - x^T A^T b - b^T A x + b^T b = x^T A^T A x - 2 x^T A^T b + b^T b$

  64. The minimization problem
  • Our goal is to find $\arg\min_x \varepsilon(x)$, where the scalar function is
    $\varepsilon(x) = x^T A^T A x - 2 x^T A^T b + b^T b$
  • From calculus, at the minimum $\nabla \varepsilon(x) = 0$, where $[\nabla \varepsilon(x)]_i = \partial \varepsilon(x) / \partial x_i$
  • Both $Ax$ and $x^T A^T$ can be seen as matrix functions of each $x_i$
  • We can use our rules for differentiation of matrix functions

  65. Finding the minimum
  • Differentiating $\varepsilon(x) = x^T A^T A x - 2 x^T A^T b + b^T b$ w.r.t. each component of $x$ we get
    $[\nabla \varepsilon(x)]_i = \frac{\partial x^T}{\partial x_i} A^T A x + x^T A^T A \frac{\partial x}{\partial x_i} - 2 \frac{\partial x^T}{\partial x_i} A^T b$
  • Since $\partial x / \partial x_i = e_i$ and $x^T A^T A e_i = e_i^T A^T A x$ (a scalar equals its transpose),
    $[\nabla \varepsilon(x)]_i = 2 e_i^T A^T A x - 2 e_i^T A^T b = 2 [A^T A x]_i - 2 [A^T b]_i$
  • Equating to zero and grouping all rows: $A^T A x = A^T b$ (a finite-difference check follows below)
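
The gradient just derived can be sanity-checked against central finite differences; a short sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x = rng.standard_normal(3)

eps = lambda x: x @ A.T @ A @ x - 2 * x @ A.T @ b + b @ b

grad = 2 * (A.T @ A @ x - A.T @ b)     # analytic gradient from the slides

h = 1e-6                               # central finite differences
fd = np.array([(eps(x + h * e) - eps(x - h * e)) / (2 * h)
               for e in np.eye(3)])
print(np.allclose(grad, fd, atol=1e-4))   # True
```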

  66. Is there a favorite solution?
  • Calculus tells us that the minimum of $\varepsilon(x)$ can only happen at some solution of the normal equations $A^T A x = A^T b$
  • Are all solutions equally good?
  • Take any two solutions $z_1$ and $z_2 = z_1 + u$; then $A^T A u = 0$, and expanding
    $\varepsilon(z_2) = \varepsilon(z_1 + u) = \varepsilon(z_1) + 2 u^T (A^T A z_1 - A^T b) + u^T A^T A u = \varepsilon(z_1)$
  • In fact, $\varepsilon(z_1) = b^T b - z_1^T A^T b$
  • The same argument proves no other vector can produce a lower value for $\varepsilon(x)$: for an arbitrary $u$, $\varepsilon(z_1 + u) = \varepsilon(z_1) + u^T A^T A u = \varepsilon(z_1) + \| A u \|^2 \geq \varepsilon(z_1)$

  67. General Least Squares
  • For $A$ in $\mathbb{R}^{m \times n}$ and $b$ in $\mathbb{R}^m$, let $\varepsilon = Ax - b$
  • The general least squares problem is to find a vector $x$ that minimizes the quantity
    $\sum_{i=1}^{m} \varepsilon_i^2 = \varepsilon^T \varepsilon = (Ax - b)^T (Ax - b)$
  • Any such vector is a least-squares solution
  • The solution set is the same as that of $A^T A x = A^T b$
  • The solution is unique iff $\mathrm{rank}(A) = n$, in which case $x = (A^T A)^{-1} A^T b$
  • If $Ax = b$ is consistent, the solution sets are the same

  68. Example of Linear Regression
  • Predict the amount of weight that a pint of ice cream loses when stored at low temperatures
  • Assume a linear model for the phenomenon: $y = \alpha_0 + \alpha_1 t_1 + \alpha_2 t_2 + \varepsilon$, with $t_1$ (time), $t_2$ (temperature), and $\varepsilon$ (random noise)
  • Assume the random noise "averages out"
  • Use measurements to find the least-squares solution for the parameters in $E(t_1, t_2) = \alpha_0 + \alpha_1 t_1 + \alpha_2 t_2$

  69. Result of experiments
  • Assume the following measurements:
    Time (weeks):   1     1     1     2     2     2     3     3     3
    Temp (°C):    -10    -5     0   -10    -5     0   -10    -5     0
    Loss (grams): 0.15  0.18  0.20  0.17  0.19  0.22  0.20  0.23  0.25
  • In vector form, we get
    $A = \begin{pmatrix} 1 & 1 & -10 \\ 1 & 1 & -5 \\ 1 & 1 & 0 \\ 1 & 2 & -10 \\ 1 & 2 & -5 \\ 1 & 2 & 0 \\ 1 & 3 & -10 \\ 1 & 3 & -5 \\ 1 & 3 & 0 \end{pmatrix}$, $x = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix}$, $b = \begin{pmatrix} 0.15 \\ 0.18 \\ 0.20 \\ 0.17 \\ 0.19 \\ 0.22 \\ 0.20 \\ 0.23 \\ 0.25 \end{pmatrix}$
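
A least-squares fit of this data, sketched with NumPy (the printed coefficients are the fitted $\alpha_0, \alpha_1, \alpha_2$):

```python
import numpy as np

t1 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=float)           # weeks
t2 = np.array([-10, -5, 0, -10, -5, 0, -10, -5, 0], dtype=float)  # deg C
b = np.array([0.15, 0.18, 0.20, 0.17, 0.19, 0.22, 0.20, 0.23, 0.25])

A = np.column_stack([np.ones_like(t1), t1, t2])   # columns 1, t1, t2
x, *_ = np.linalg.lstsq(A, b, rcond=None)
a0, a1, a2 = x
print(f"E(t1, t2) = {a0:.4f} + {a1:.4f}*t1 + {a2:.4f}*t2")
```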
