The Geometry of Least Squares
Richard Lockhart, STAT 350 (lecture slides)
  1. The Geometry of Least Squares: Mathematical Basics

  ◮ Inner / dot product: for column vectors $a$ and $b$,
    $$a \cdot b = a^T b = \sum_i a_i b_i, \qquad a \perp b \iff a^T b = 0.$$
  ◮ Matrix product: if $A$ is $r \times s$ and $B$ is $s \times t$, then $AB$ is $r \times t$ with
    $$(AB)_{ij} = \sum_{k=1}^{s} A_{ik} B_{kj}.$$

  Richard Lockhart STAT 350: Geometry of Least Squares
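The two definitions above can be checked numerically. This is an illustrative sketch (the vectors and matrices are made up, not from the slides), using numpy:

```python
# Quick numerical check of the inner-product and matrix-product definitions.
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 1.0, -2.0])

# Inner product: a . b = a^T b = sum_i a_i b_i
dot = a @ b
assert dot == np.sum(a * b)
# Here a^T b = 2 + 2 - 4 = 0, so a is perpendicular to b.
assert dot == 0.0

# Matrix product: A is r x s, B is s x t, so AB is r x t with
# (AB)_ij = sum_k A_ik B_kj.
A = np.arange(6.0).reshape(2, 3)      # r = 2, s = 3
B = np.arange(12.0).reshape(3, 4)     # s = 3, t = 4
C = A @ B
assert C.shape == (2, 4)
# Entry (0, 1) recomputed directly from the definition:
assert C[0, 1] == sum(A[0, k] * B[k, 1] for k in range(3))
```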

  2. Partitioned Matrices

  ◮ Partitioned matrices are like ordinary matrices, but the entries are themselves matrices.
  ◮ They add and multiply (when the dimensions match properly) just like regular matrices, but(!) you must remember that matrix multiplication is not commutative.
  ◮ Here is an example:
    $$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{bmatrix}, \qquad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{bmatrix}$$

  3. ◮ Think of $A$ as a 2 × 3 matrix and $B$ as a 3 × 2 matrix.
  ◮ Multiply them to get $C = AB$, a 2 × 2 matrix, as follows:
    $$AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} + A_{13}B_{31} & A_{11}B_{12} + A_{12}B_{22} + A_{13}B_{32} \\ A_{21}B_{11} + A_{22}B_{21} + A_{23}B_{31} & A_{21}B_{12} + A_{22}B_{22} + A_{23}B_{32} \end{bmatrix}$$
  ◮ BUT: this only works if each of the matrix products in the formulas makes sense.
  ◮ So $A_{11}$ must have the same number of columns as $B_{11}$ has rows, and many other similar restrictions apply.
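The blockwise product formula can be verified with numpy. The block shapes here are invented for illustration; the only requirement is that inner block dimensions conform, as the slide notes:

```python
# Sketch: the ordinary product of two partitioned matrices agrees with the
# blockwise formula, block by block (random conformable blocks).
import numpy as np

rng = np.random.default_rng(0)
# A partitioned 2 x 3, B partitioned 3 x 2; inner dimensions 3, 4, 5 match up.
A11, A12, A13 = rng.normal(size=(2, 3)), rng.normal(size=(2, 4)), rng.normal(size=(2, 5))
A21, A22, A23 = rng.normal(size=(6, 3)), rng.normal(size=(6, 4)), rng.normal(size=(6, 5))
B11, B12 = rng.normal(size=(3, 2)), rng.normal(size=(3, 7))
B21, B22 = rng.normal(size=(4, 2)), rng.normal(size=(4, 7))
B31, B32 = rng.normal(size=(5, 2)), rng.normal(size=(5, 7))

A = np.block([[A11, A12, A13], [A21, A22, A23]])   # 8 x 12
B = np.block([[B11, B12], [B21, B22], [B31, B32]]) # 12 x 9

C = A @ B
C11 = A11 @ B11 + A12 @ B21 + A13 @ B31
C12 = A11 @ B12 + A12 @ B22 + A13 @ B32
C21 = A21 @ B11 + A22 @ B21 + A23 @ B31
C22 = A21 @ B12 + A22 @ B22 + A23 @ B32
assert np.allclose(C, np.block([[C11, C12], [C21, C22]]))
```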

  4. First application: $X = [X_1 \mid X_2 \mid \cdots \mid X_p]$, where each $X_i$ is a column of $X$. Then
    $$X\beta = [X_1 \mid X_2 \mid \cdots \mid X_p] \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} = X_1\beta_1 + X_2\beta_2 + \cdots + X_p\beta_p,$$
  which is a linear combination of the columns of $X$.

  Definition: the column space of $X$, written $\mathrm{col}(X)$, is the vector space of all linear combinations of the columns of $X$, also called the space "spanned" by the columns of $X$.

  SO: $\hat\mu = X\hat\beta$ is in $\mathrm{col}(X)$.

  5. Back to the normal equations:
    $$X^T Y = X^T X \hat\beta \quad\text{or}\quad X^T\left(Y - X\hat\beta\right) = 0$$
  or
    $$\begin{bmatrix} X_1^T \\ \vdots \\ X_p^T \end{bmatrix} \left(Y - X\hat\beta\right) = 0$$
  or
    $$X_i^T \left(Y - X\hat\beta\right) = 0, \qquad i = 1, \dots, p$$
  or
    $$Y - X\hat\beta \perp \text{every vector in } \mathrm{col}(X).$$
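The orthogonality statement above is easy to verify numerically. A minimal sketch, with made-up $X$ and $Y$: solving the normal equations makes the residual perpendicular to every column of $X$.

```python
# Solve the normal equations X^T X beta_hat = X^T Y and check that the
# residual Y - X beta_hat is orthogonal to each column of X.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))          # n = 20 observations, p = 3 columns
Y = rng.normal(size=20)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat

# X_i^T (Y - X beta_hat) = 0 for every column X_i, so the residual is
# perpendicular to all of col(X).
assert np.allclose(X.T @ resid, 0.0)
```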

  6. Definition: $\hat\epsilon = Y - X\hat\beta$ is the fitted residual vector.

  SO: $\hat\epsilon \perp \mathrm{col}(X)$ and $\hat\epsilon \perp \hat\mu$.

  Pythagoras' Theorem: if $a \perp b$ then $\|a\|^2 + \|b\|^2 = \|a + b\|^2$.

  Definition: $\|a\|$ is the "length" or "norm" of $a$:
    $$\|a\| = \sqrt{\sum_i a_i^2} = \sqrt{a^T a}.$$
  Moreover, if $a, b, c, \dots$ are all perpendicular then
    $$\|a\|^2 + \|b\|^2 + \cdots = \|a + b + \cdots\|^2.$$

  7. Application:
    $$Y = \left(Y - X\hat\beta\right) + X\hat\beta = \hat\epsilon + \hat\mu$$
  so
    $$\|Y\|^2 = \|\hat\epsilon\|^2 + \|\hat\mu\|^2 \quad\text{or}\quad \sum Y_i^2 = \sum \hat\epsilon_i^2 + \sum \hat\mu_i^2.$$
  Definitions:
  ◮ $\sum Y_i^2$ = Total Sum of Squares (unadjusted)
  ◮ $\sum \hat\epsilon_i^2$ = Error or Residual Sum of Squares
  ◮ $\sum \hat\mu_i^2$ = Regression Sum of Squares
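Because $\hat\epsilon \perp \hat\mu$, Pythagoras gives the sum-of-squares split above. A numerical sketch on made-up data:

```python
# Check the Pythagorean sum-of-squares decomposition for a least-squares fit.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
Y = rng.normal(size=30)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
mu_hat = X @ beta_hat                 # fitted values, in col(X)
eps_hat = Y - mu_hat                  # residuals, perpendicular to col(X)

total_ss = np.sum(Y**2)               # unadjusted total SS
error_ss = np.sum(eps_hat**2)
regression_ss = np.sum(mu_hat**2)
# eps_hat is perpendicular to mu_hat, so the squared lengths add up.
assert np.isclose(total_ss, error_ss + regression_ss)
# Alternative Regression SS formula: beta_hat^T X^T X beta_hat.
assert np.isclose(regression_ss, beta_hat @ (X.T @ X) @ beta_hat)
```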

  8. Alternative formulas for the Regression SS:
    $$\sum \hat\mu_i^2 = \hat\mu^T \hat\mu = (X\hat\beta)^T (X\hat\beta) = \hat\beta^T X^T X \hat\beta.$$
  Notice the matrix identity, which I will use regularly: $(AB)^T = B^T A^T$.

  9. What is least squares? Choose $\hat\beta$ to minimize
    $$\sum (Y_i - \hat\mu_i)^2 = \|Y - \hat\mu\|^2,$$
  that is, to minimize $\|\hat\epsilon\|^2$. The resulting $\hat\mu$ is called the orthogonal projection of $Y$ onto the column space of $X$.

  Extension:
    $$X = [X_1 \mid X_2], \qquad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}, \qquad p = p_1 + p_2.$$
  Imagine we fit 2 models:
  1. The FULL model: $Y = X\beta + \epsilon$ ($= X_1\beta_1 + X_2\beta_2 + \epsilon$)
  2. The REDUCED model: $Y = X_1\beta_1 + \epsilon$

  10. If we fit the full model we get $\hat\beta_F, \hat\mu_F, \hat\epsilon_F$ with
    $$\hat\epsilon_F \perp \mathrm{col}(X) \tag{1}$$
  If we fit the reduced model we get $\hat\beta_R, \hat\mu_R, \hat\epsilon_R$ with
    $$\hat\mu_R \in \mathrm{col}(X_1) \subset \mathrm{col}(X) \tag{2}$$
  Notice that
    $$\hat\epsilon_F \perp \hat\mu_R. \tag{3}$$
  (The vector $\hat\mu_R$ is in the column space of $X_1$, so it is in the column space of $X$, and $\hat\epsilon_F$ is orthogonal to everything in the column space of $X$.)

  So:
    $$Y = \hat\epsilon_F + \hat\mu_F = \hat\epsilon_F + \hat\mu_R + (\hat\mu_F - \hat\mu_R) = \hat\epsilon_R + \hat\mu_R$$

  11. You know $\hat\epsilon_F \perp \hat\mu_R$ (from (3) above) and $\hat\epsilon_F \perp \hat\mu_F$ (from (1) above). So
    $$\hat\epsilon_F \perp \hat\mu_F - \hat\mu_R.$$
  Also
    $$\hat\mu_R \perp \hat\epsilon_R = \hat\epsilon_F + (\hat\mu_F - \hat\mu_R).$$
  So
    $$0 = \left(\hat\epsilon_F + \hat\mu_F - \hat\mu_R\right)^T \hat\mu_R = \underbrace{\hat\epsilon_F^T \hat\mu_R}_{0} + (\hat\mu_F - \hat\mu_R)^T \hat\mu_R,$$
  so
    $$\hat\mu_F - \hat\mu_R \perp \hat\mu_R.$$

  12. Summary

  We have
    $$Y = \hat\mu_R + (\hat\mu_F - \hat\mu_R) + \hat\epsilon_F.$$
  All three vectors on the right-hand side are perpendicular to each other. This gives
    $$\|Y\|^2 = \|\hat\mu_R\|^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2,$$
  which is an Analysis of Variance (ANOVA) table!
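The full-versus-reduced decomposition above can be checked numerically. A sketch on made-up data, with $X_1$ as the reduced model's design matrix:

```python
# Fit a full and a reduced model and verify the three-way orthogonal
# decomposition Y = mu_R + (mu_F - mu_R) + eps_F and the ANOVA identity.
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal(size=(25, 2))
X2 = rng.normal(size=(25, 3))
X = np.hstack([X1, X2])
Y = rng.normal(size=25)

def fit(M, y):
    """Return least-squares fitted values and residuals for design M."""
    b = np.linalg.solve(M.T @ M, M.T @ y)
    return M @ b, y - M @ b

mu_F, eps_F = fit(X, Y)     # full model
mu_R, eps_R = fit(X1, Y)    # reduced model

# The three pieces are mutually perpendicular ...
assert np.isclose(mu_R @ (mu_F - mu_R), 0.0)
assert np.isclose(mu_R @ eps_F, 0.0)
assert np.isclose((mu_F - mu_R) @ eps_F, 0.0)
# ... so the squared lengths add up: the ANOVA identity.
lhs = np.sum(Y**2)
rhs = np.sum(mu_R**2) + np.sum((mu_F - mu_R)**2) + np.sum(eps_F**2)
assert np.isclose(lhs, rhs)
```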

  13. Here is the most basic version of the above:
    $$X = [\mathbf{1} \mid X_1], \qquad Y_i = \beta_0 + \cdots + \epsilon_i.$$
  The notation here is that
    $$\mathbf{1} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$$
  is a column vector with all entries equal to 1. The coefficient of this column, $\beta_0$, is called the "intercept" term in the model.

  14. To find $\hat\mu_R$ we minimize $\sum (Y_i - \hat\beta_0)^2$ and get simply
    $$\hat\beta_0 = \bar{Y} \qquad\text{and}\qquad \hat\mu_R = \begin{bmatrix} \bar{Y} \\ \vdots \\ \bar{Y} \end{bmatrix}.$$
  Our ANOVA identity is now
    $$\|Y\|^2 = \|\hat\mu_R\|^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2 = n\bar{Y}^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2.$$

  15. This identity is usually rewritten in subtracted form:
    $$\|Y\|^2 - n\bar{Y}^2 = \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2.$$
  Remembering the identity $\sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$, we find
    $$\sum (Y_i - \bar{Y})^2 = \sum (\hat\mu_{F,i} - \bar{Y})^2 + \sum \hat\epsilon_{F,i}^2.$$
  These terms are respectively:
  ◮ the Adjusted or Corrected Total Sum of Squares,
  ◮ the Regression or Model Sum of Squares, and
  ◮ the Error Sum of Squares.
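The subtracted (corrected) form of the identity can be checked the same way. A sketch on made-up data, with an intercept column of 1s so that the reduced fit is just $\bar{Y}$:

```python
# Verify the corrected ANOVA identity:
# sum (Y_i - Ybar)^2 = sum (muF_i - Ybar)^2 + sum epsF_i^2.
import numpy as np

rng = np.random.default_rng(4)
n = 40
X1 = rng.normal(size=(n, 2))
X = np.hstack([np.ones((n, 1)), X1])   # full model: intercept plus X1
Y = rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
mu_F = X @ beta_hat
eps_F = Y - mu_F
Ybar = Y.mean()                        # the reduced (intercept-only) fit

corrected_total_ss = np.sum((Y - Ybar)**2)
model_ss = np.sum((mu_F - Ybar)**2)
error_ss = np.sum(eps_F**2)
assert np.isclose(corrected_total_ss, model_ss + error_ss)
# and sum (Y_i - Ybar)^2 = sum Y_i^2 - n Ybar^2:
assert np.isclose(corrected_total_ss, np.sum(Y**2) - n * Ybar**2)
```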

  16. Simple Linear Regression

  ◮ Filled the gas tank 107 times.
  ◮ Recorded the distance since the last fill and the gas needed to fill.
  ◮ Question for discussion: what is a natural model?
  ◮ Look at the JMP analysis.

  17. The sum of squares decomposition in one example

  ◮ Example discussed in the Introduction.
  ◮ Consider the model $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ with $\alpha_4 = -(\alpha_1 + \alpha_2 + \alpha_3)$.
  ◮ The data consist of blood coagulation times for 24 animals, each fed one of 4 different diets.
  ◮ Now I write the data in a table and decompose the table into a sum of several tables.
  ◮ The 4 columns of the table correspond to Diets A, B, C and D.
  ◮ You should think of the entries in each table as being stacked up into a column vector, but the tables save space.

  18. ◮ The design matrix can be partitioned into a column of 1s and 3 other columns.
  ◮ You should compute the product $X^T X$ and get
    $$X^T X = \begin{bmatrix} 24 & -4 & -2 & -2 \\ -4 & 12 & 8 & 8 \\ -2 & 8 & 14 & 8 \\ -2 & 8 & 8 & 14 \end{bmatrix}.$$
  ◮ The matrix $X^T Y$ is just
    $$X^T Y = \begin{bmatrix} \sum_{ij} Y_{ij} \\ \sum_j Y_{1j} - \sum_j Y_{4j} \\ \sum_j Y_{2j} - \sum_j Y_{4j} \\ \sum_j Y_{3j} - \sum_j Y_{4j} \end{bmatrix}.$$
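The stated $X^T X$ can be reproduced by building the effect-coded design explicitly. One assumption in this sketch: the group sizes 4, 6, 6, 8 for Diets A–D are read off the fitted-value tables on the later slides, not stated here.

```python
# Build the effect-coded design (column of 1s, then indicators for Diets A-C,
# with Diet D coded as -1, -1, -1) and check X^T X against the slide.
import numpy as np

sizes = [4, 6, 6, 8]                   # assumed group sizes for Diets A-D
rows = []
for diet, n_i in enumerate(sizes):
    for _ in range(n_i):
        if diet < 3:
            x = [0.0, 0.0, 0.0]
            x[diet] = 1.0
        else:                          # Diet D: alpha_4 = -(a1 + a2 + a3)
            x = [-1.0, -1.0, -1.0]
        rows.append([1.0] + x)         # leading column of 1s for mu
X = np.array(rows)                     # 24 x 4 design matrix

expected = np.array([[24, -4, -2, -2],
                     [-4, 12,  8,  8],
                     [-2,  8, 14,  8],
                     [-2,  8,  8, 14]], dtype=float)
assert np.array_equal(X.T @ X, expected)
```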

  19. ◮ The matrix $X^T X$ can be inverted using a program like Maple.
  ◮ I found that
    $$384\,(X^T X)^{-1} = \begin{bmatrix} 17 & 7 & -1 & -1 \\ 7 & 65 & -23 & -23 \\ -1 & -23 & 49 & -15 \\ -1 & -23 & -15 & 49 \end{bmatrix}.$$
  ◮ It now takes quite a bit of algebra to verify that the vector of fitted values can be computed by simply averaging the data in each column.
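Instead of Maple, the quoted inverse can be confirmed by multiplication: $(X^T X)$ times the quoted matrix should equal $384\,I$.

```python
# Check the quoted inverse of X^T X by direct multiplication.
import numpy as np

XtX = np.array([[24, -4, -2, -2],
                [-4, 12,  8,  8],
                [-2,  8, 14,  8],
                [-2,  8,  8, 14]], dtype=float)
M = np.array([[17,   7,  -1,  -1],
              [ 7,  65, -23, -23],
              [-1, -23,  49, -15],
              [-1, -23, -15,  49]], dtype=float)   # claimed 384 (X^T X)^{-1}
assert np.allclose(XtX @ M, 384 * np.eye(4))
```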

  20. That is, the fitted value $\hat\mu$ is the table

       61 66 68 61
       61 66 68 61
       61 66 68 61
       61 66 68 61
          66 68 61
          66 68 61
                61
                61

  21. On the other hand, fitting the model with a design matrix consisting only of a column of 1s just leads to $\hat\mu_R$ (notation from the lecture), given by

       64 64 64 64
       64 64 64 64
       64 64 64 64
       64 64 64 64
          64 64 64
          64 64 64
                64
                64

  22. Earlier I gave the identity
    $$Y = \hat\mu_R + (\hat\mu_F - \hat\mu_R) + \hat\epsilon_F,$$
  which corresponds to the following identity between tables:

       62 63 68 56      64 64 64 64      -3  2  4 -3       1 -3  0 -5
       60 67 66 62      64 64 64 64      -3  2  4 -3      -1  1 -2  1
       63 71 71 60      64 64 64 64      -3  2  4 -3       2  5  3 -1
       59 64 67 61  =   64 64 64 64  +   -3  2  4 -3  +   -2 -2 -1  0
          65 68 63         64 64 64          2  4 -3          -1  0  2
          66 68 64         64 64 64          2  4 -3           0  0  3
                63               64                -3                2
                59               64                -3               -2
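The whole decomposition can be reproduced from the data table above, with each diet's column stacked into one long vector (the data values here are read off this slide):

```python
# Reproduce the table identity Y = mu_R + (mu_F - mu_R) + eps_F for the
# coagulation data and check the resulting ANOVA identity.
import numpy as np

diets = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}
Y = np.concatenate([np.array(v, dtype=float) for v in diets.values()])

# Full-model fit: each observation's diet mean.  Reduced fit: the grand mean.
mu_F = np.concatenate([np.full(len(v), np.mean(v)) for v in diets.values()])
mu_R = np.full(len(Y), Y.mean())
eps_F = Y - mu_F

assert Y.mean() == 64.0                                   # grand mean
assert [np.mean(v) for v in diets.values()] == [61.0, 66.0, 68.0, 61.0]
# The three pieces reassemble Y and are mutually perpendicular, so the
# squared lengths add: the ANOVA identity for this example.
assert np.allclose(Y, mu_R + (mu_F - mu_R) + eps_F)
assert np.isclose(np.sum(Y**2),
                  np.sum(mu_R**2) + np.sum((mu_F - mu_R)**2) + np.sum(eps_F**2))
```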
