SLIDE 1

Fitting Linear Models

Requires assumptions about the errors ε_i. Usual assumptions:

  1. ε_1, . . . , ε_n are independent. (Sometimes we assume only that Cov(ε_i, ε_j) = 0 for i ≠ j; that is, we assume the errors are uncorrelated.)

  2. Homoscedastic errors; all variances are equal: Var(ε_1) = Var(ε_2) = · · · = Var(ε_n) = σ².

  3. Normal errors: ε_i ∼ N(0, σ²).

Remember: we have already assumed E(ε_i) = 0.
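One way to make these assumptions concrete is to simulate data that satisfies all three at once; a minimal Python sketch (the parameter values here are hypothetical, chosen only for illustration):

```python
import random

random.seed(350)

# Hypothetical true parameters, for illustration only.
beta1, beta2 = 10.0, 2.0   # intercept and slope
sigma = 1.5                # common error standard deviation (assumption 2)

x = [float(i) for i in range(20)]

# Independent N(0, sigma^2) draws satisfy assumptions 1-3 simultaneously:
# independent, homoscedastic, normal, with E(eps_i) = 0.
eps = [random.gauss(0.0, sigma) for _ in x]

# The linear model: Y_i = beta1 + beta2 * x_i + eps_i.
y = [beta1 + beta2 * xi + ei for xi, ei in zip(x, eps)]
```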

Richard Lockhart STAT 350: Fitting Linear Models

SLIDE 2

Notes

◮ Assumptions 1, 2 and 3 permit Maximum Likelihood Estimation.

◮ Assumptions 1 and 2 justify least squares.

◮ Assumption 3 can be replaced by other error distributions, but not in this course.

◮ With normal errors, maximum likelihood estimates are the same as least squares estimates.

◮ Assumption 2 (homoscedastic errors) can be relaxed; see STAT 402 "Generalized Linear Models" or "weighted least squares".

◮ Assumption 1 can be relaxed; see STAT 804 for Time Series models.

SLIDE 3

Motivation for Least Squares

Choose β̂ to make the fitted values μ̂ = Xβ̂ as close to the Y_i as possible. There are many possible choices of "close":

◮ Mean Absolute Deviation: minimize (1/n) Σ_{i=1}^n |Y_i − μ̂_i|

◮ Least Median of Squares: minimize median{|Y_i − μ̂_i|²}

◮ Least squares: minimize Σ_{i=1}^n (Y_i − μ̂_i)²

WE DO LS = least squares.
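These criteria behave very differently in the presence of outliers; a small Python sketch comparing them for one candidate fit (the toy numbers are mine):

```python
from statistics import median

# Toy responses and fitted values; the last response is an outlier.
Y  = [1.0, 2.0, 3.0, 10.0]
mu = [1.1, 1.9, 3.2, 3.0]

resid = [y - m for y, m in zip(Y, mu)]

mad = sum(abs(r) for r in resid) / len(resid)  # mean absolute deviation
lms = median(r * r for r in resid)             # least-median-of-squares criterion
lsq = sum(r * r for r in resid)                # least-squares criterion

# The outlier dominates the least-squares criterion, but it barely
# moves the median of the squared residuals.
```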

SLIDE 4

To minimize Σ_{i=1}^n (Y_i − μ̂_i)², take derivatives with respect to each β̂_j and set them equal to 0:

  ∂/∂β̂_j Σ_{i=1}^n (Y_i − μ̂_i)² = Σ_{i=1}^n ∂/∂β̂_j (Y_i − μ̂_i)² = Σ_{i=1}^n [∂/∂μ̂_i (Y_i − μ̂_i)²] · ∂μ̂_i/∂β̂_j

But

  ∂/∂μ̂_i (Y_i − μ̂_i)² = −2(Y_i − μ̂_i)

and μ̂_i = Σ_{j=1}^p x_ij β̂_j, so

  ∂μ̂_i/∂β̂_j = x_ij.

SLIDE 5

Thus

  ∂/∂β̂_j Σ_{i=1}^n (Y_i − μ̂_i)² = −2 Σ_{i=1}^n x_ij (Y_i − μ̂_i).
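This derivative formula is easy to verify numerically: for any trial value of β̂, the expression −2 Σ x_ij (Y_i − μ̂_i) should agree with a finite-difference slope of the residual sum of squares. A Python sketch with a toy design matrix (all names and numbers are mine):

```python
# Toy design matrix (n = 4, p = 2) and responses.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 2.9, 5.2, 6.8]
beta = [0.5, 1.5]  # an arbitrary trial value of beta-hat

def rss(b):
    """Residual sum of squares: sum_i (Y_i - mu_i)^2 with mu_i = sum_j x_ij b_j."""
    total = 0.0
    for xi, yi in zip(X, Y):
        mu = sum(x * bj for x, bj in zip(xi, b))
        total += (yi - mu) ** 2
    return total

def grad(b, j):
    """Analytic derivative: -2 sum_i x_ij (Y_i - mu_i)."""
    g = 0.0
    for xi, yi in zip(X, Y):
        mu = sum(x * bj for x, bj in zip(xi, b))
        g += -2.0 * xi[j] * (yi - mu)
    return g

# Central differences should match the analytic gradient closely
# (exactly, up to rounding, since the RSS is quadratic in beta).
h = 1e-6
for j in range(len(beta)):
    bp = list(beta); bp[j] += h
    bm = list(beta); bm[j] -= h
    numeric = (rss(bp) - rss(bm)) / (2 * h)
    assert abs(numeric - grad(beta, j)) < 1e-4
```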

SLIDE 6

Normal Equations

Set this equal to 0; get the so-called normal equations:

  Σ_{i=1}^n Y_i x_ij = Σ_{i=1}^n μ̂_i x_ij,   j = 1, . . . , p.

Finally remember that μ̂_i = Σ_{k=1}^p x_ik β̂_k to get

  Σ_{i=1}^n Y_i x_ij = Σ_{i=1}^n Σ_{k=1}^p x_ij x_ik β̂_k    (1)

SLIDE 7

◮ Formula looks dreadful,

◮ but it's just a bunch of matrix-vector multiplications written out in summation notation.

◮ Note that it is a set of p linear equations in p unknowns β̂_1, . . . , β̂_p.

SLIDE 8

Normal equations in vector matrix form

First,

  Σ_{i=1}^n x_ij x_ik

is the dot product of the jth and kth columns of X. Another way to view this is as the (j,k) entry of the matrix X^T X:

  X^T X = [ x_11 x_21 · · · x_n1 ]   [ x_11 · · · x_1p ]
          [ x_12 x_22 · · · x_n2 ] × [   .         .   ]
          [   .    .  · · ·   .  ]   [   .         .   ]
          [ x_1p x_2p · · · x_np ]   [ x_n1 · · · x_np ]

SLIDE 9

The (j,k) entry of this matrix product is clearly

  x_1j x_1k + x_2j x_2k + · · · + x_nj x_nk,

so the right-hand side of (1) is

  Σ_{k=1}^p (X^T X)_{jk} β̂_k,

which is just the jth entry of the matrix product (X^T X)β̂, i.e. ((X^T X)β̂)_j.

SLIDE 10

Now look at the left-hand side of (1), namely Σ_{i=1}^n Y_i x_ij, which is just the dot product of Y and the jth column of X, or the jth entry of X^T Y:

  [ x_11 x_21 · · · x_n1 ] [ Y_1 ]   [ x_11 Y_1 + x_21 Y_2 + · · · + x_n1 Y_n ]
  [ x_12 x_22 · · · x_n2 ] [ Y_2 ]   [                   .                    ]
  [   .    .  · · ·   .  ] [  .  ] = [                   .                    ]
  [ x_1p x_2p · · · x_np ] [ Y_n ]   [ x_1p Y_1 + x_2p Y_2 + · · · + x_np Y_n ]

So the normal equations are (X^T Y)_j = (X^T X β̂)_j, or just

  X^T Y = X^T X β̂.

The last formula is the usual way to write the normal equations.

SLIDE 11

Solving Normal Equations for ˆ β

Let's look at the dimensions of the matrices first.

◮ X^T is p × n,

◮ Y is n × 1,

◮ X^T X is a p × n matrix multiplied by an n × p matrix, which produces a p × p matrix.

◮ If the matrix X^T X has rank p then X^T X is not singular and its inverse (X^T X)^{−1} exists.

So solve X^T Y = X^T X β̂ for β̂ by multiplying both sides by (X^T X)^{−1} to get

  β̂ = (X^T X)^{−1} X^T Y.

This is the ordinary least squares estimator. See assignment 1 for an example with rank(X) < p. See chapter 5 in the text for a review of matrices and vectors.
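In code, this estimator can be sketched directly from the normal equations. The version below forms X^T X and X^T Y entrywise, as in equation (1), and solves by Gaussian elimination; it is didactic only (numerically, packages prefer a QR decomposition to forming X^T X), and all names are mine:

```python
def ols(X, Y):
    """Solve the normal equations X^T X beta = X^T Y (didactic version)."""
    n, p = len(X), len(X[0])
    # Form X^T X (p x p) and X^T Y (length p) entrywise, as in equation (1).
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    xty = [sum(X[i][j] * Y[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    A = [row[:] + [b] for row, b in zip(xtx, xty)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for j in reversed(range(p)):
        beta[j] = (A[j][p] - sum(A[j][c] * beta[c]
                                 for c in range(j + 1, p))) / A[j][j]
    return beta

# Sanity check: exact data Y = 1 + 2x recovers beta exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [1.0, 3.0, 5.0]
print(ols(X, Y))  # -> [1.0, 2.0]
```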

SLIDE 12

Normal Equations for Simple Linear Regression

Thermoluminescence Example. See the Introduction for the framework. Here I consider two models:

◮ a straight-line model,

  Y_i = β_1 + β_2 D_i + ε_i

◮ a quadratic model,

  Y_i = β_1 + β_2 D_i + β_3 D_i² + ε_i.

SLIDE 13

First, general theoretical formulas, then numbers and arithmetic:

  X = [ 1 D_1 ]
      [ .  .  ]
      [ 1 D_n ]

  X^T X = [  1  · · ·  1  ] [ 1 D_1 ]   [     n         Σ_{i=1}^n D_i  ]
          [ D_1 · · · D_n ] [ .  .  ] = [ Σ_{i=1}^n D_i Σ_{i=1}^n D_i² ]
                            [ 1 D_n ]

  (X^T X)^{−1} = (1 / (n Σ D_i² − (Σ D_i)²)) × [  Σ_{i=1}^n D_i²  −Σ_{i=1}^n D_i ]
                                               [ −Σ_{i=1}^n D_i          n       ]
SLIDE 14

  X^T Y = [  1  · · ·  1  ] [ Y_1 ]   [ Σ Y_i     ]
          [ D_1 · · · D_n ] [  .  ] = [ Σ D_i Y_i ]
                            [ Y_n ]

  (X^T X)^{−1} X^T Y = [ (Σ Y_i Σ D_i² − Σ D_i Σ D_i Y_i) / (n Σ D_i² − (Σ D_i)²) ]
                       [ (n Σ D_i Y_i − (Σ D_i)(Σ Y_i)) / (n Σ D_i² − (Σ D_i)²)   ]

SLIDE 15

So, writing S_DD = Σ(D_i − D̄)² and S_DY = Σ(D_i − D̄)(Y_i − Ȳ),

  (X^T X)^{−1} X^T Y = [ (Ȳ Σ D_i² − D̄ Σ D_i Y_i) / Σ(D_i − D̄)² ]
                       [ Σ(D_i − D̄)(Y_i − Ȳ) / Σ(D_i − D̄)²      ]

                     = [ (Ȳ [Σ D_i² − n D̄²] − D̄ [Σ D_i Y_i − n D̄ Ȳ]) / S_DD ]
                       [ S_DY / S_DD                                          ]

                     = [ Ȳ − (S_DY / S_DD) D̄ ]
                       [ S_DY / S_DD          ]

SLIDE 16

The data are:

  Dose  Count
     0  27043
     0  26902
     0  25959
   150  27700
   150  27530
   150  27460
   420  30650
   420  30150
   420  29480
   900  34790
   900  32020
  1800  42280
  1800  39370
  1800  36200
  3600  53230
  3600  49260
  3600  53030
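The S_DY/S_DD formulas derived above can be checked against these data by hand; a Python sketch (the data are from the table, the variable names are mine):

```python
dose = [0, 0, 0, 150, 150, 150, 420, 420, 420, 900, 900,
        1800, 1800, 1800, 3600, 3600, 3600]
count = [27043, 26902, 25959, 27700, 27530, 27460, 30650, 30150,
         29480, 34790, 32020, 42280, 39370, 36200, 53230, 49260, 53030]

n = len(dose)
dbar = sum(dose) / n
ybar = sum(count) / n

# S_DD = sum (D_i - Dbar)^2 and S_DY = sum (D_i - Dbar)(Y_i - Ybar).
sdd = sum((d - dbar) ** 2 for d in dose)
sdy = sum((d - dbar) * (y - ybar) for d, y in zip(dose, count))

slope = sdy / sdd                 # beta_2-hat = S_DY / S_DD
intercept = ybar - slope * dbar   # beta_1-hat = Ybar - (S_DY/S_DD) Dbar
```

The values should reproduce the solve() output shown later in the S session (26806.734691 and 6.968012).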

SLIDE 17

The design matrix for the linear model has a column of 1s and the column of doses:

  X = [ 1    0 ]
      [ 1    0 ]
      [ 1    0 ]
      [ 1  150 ]
      [ 1  150 ]
      [ 1  150 ]
      [ 1  420 ]
      [ 1  420 ]
      [ 1  420 ]
      [ 1  900 ]
      [ 1  900 ]
      [ 1 1800 ]
      [ 1 1800 ]
      [ 1 1800 ]
      [ 1 3600 ]
      [ 1 3600 ]
      [ 1 3600 ]

SLIDE 18

◮ Compute X^T X in Minitab or Splus or R.

◮ That matrix has 4 numbers, each of which is computed as the dot product of 2 columns of X.

◮ For instance, the first column dotted with itself produces 1² + · · · + 1² = 17.

◮ Here is an example S session which reads in the data, produces the design matrices for the two models and computes X^T X.

SLIDE 19

[36]skekoowahts% S
# The data are in a file called linear. The !
# tells S that what follows is not an S command but a standard
# UNIX (or DOS) command
#
> !more linear
Dose Count
0 27043
0 26902
0 25959
150 27700
150 27530
150 27460
420 30650
420 30150
420 29480
900 34790
900 32020
1800 42280
1800 39370
1800 36200
3600 53230
3600 49260
3600 53030
#
SLIDE 20

# The function help(function) produces help for
# a function such as
#
> help(read.table)
#
# Read in the data from a file. The file has 18 lines:
# 17 lines of data and a first line which has the names
# of the variables. The function read.table reads such
# data and header=T warns S that the first line is
# variable names. The first argument of read.table is
# a character string containing the name of the file
# to read from.
#
> dat <- read.table("linear",header=T)

SLIDE 21

> dat
   Dose Count
1     0 27043
2     0 26902
3     0 25959
4   150 27700
5   150 27530
6   150 27460
7   420 30650
8   420 30150
9   420 29480
10  900 34790
11  900 32020
12 1800 42280
13 1800 39370
14 1800 36200
15 3600 53230
16 3600 49260
17 3600 53030
#

SLIDE 22

# the design matrix has a column of 1s and also
# a column consisting of the first column of dat
# which is just the list of covariate values
# The notation dat[,1] picks out the first column of dat
#
> design.mat <- cbind(rep(1,17),dat[,1])
#
# To print out an object you type its name!
#

SLIDE 23

> design.mat
      [,1] [,2]
 [1,]    1    0
 [2,]    1    0
 [3,]    1    0
 [4,]    1  150
 [5,]    1  150
 [6,]    1  150
 [7,]    1  420
 [8,]    1  420
 [9,]    1  420
[10,]    1  900
[11,]    1  900
[12,]    1 1800
[13,]    1 1800
[14,]    1 1800
[15,]    1 3600
[16,]    1 3600
[17,]    1 3600
#

SLIDE 24

# Compute X^T X
# -- uses %*% to multiply matrices
# and t(x) to compute the transpose of a matrix x.
#
> xprimex <- t(design.mat)%*% design.mat
> xprimex
      [,1]     [,2]
[1,]    17    19710
[2,] 19710 50816700
#
# Compute X^T Y
#
> xprimey <- t(design.mat)%*% dat[,2]
> xprimey
          [,1]
[1,]    593054
[2,] 882452100

SLIDE 25

#
# Next compute least squares estimates by solving
# the normal equations
#
> solve(xprimex,xprimey)
             [,1]
[1,] 26806.734691
[2,]     6.968012
#
# solve(A,b) computes the solution of Ax=b for A a
# square matrix and b a vector. Note x=A^{-1}b.
#

SLIDE 26

#
# The next piece of code regresses the variable
# Count on Dose taking the data from dat.
#
> lm( Count~Dose,data=dat)
Call:
lm(formula = Count ~ Dose, data = dat)

Coefficients:
 (Intercept)     Dose
    26806.73 6.968012

Degrees of freedom: 17 total; 15 residual
Residual standard error: 1521.238
#
# Notice the estimates agree with our calculations.
# Residual standard error is the usual estimate of sigma,
# namely the square root of the Mean Square for Error.
#

SLIDE 27

#
# Now add a third column to fit the quadratic model
#
> design.mat2_cbind(design.mat,design.mat[,2]^2)
#
# Here is X^T X
#
> t(design.mat2)%*% design.mat2
         [,1]         [,2]         [,3]
[1,]       17        19710 5.081670e+07
[2,]    19710     50816700 1.591544e+11
[3,] 50816700 159154389000 5.367847e+14

SLIDE 28

#
# Here is X^T Y
#
> t(design.mat2)%*% dat[,2]
             [,1]
[1,] 5.930540e+05
[2,] 8.824521e+08
[3,] 2.469275e+12
#
# But the following illustrates the dangers
# of doing computations blindly on the computer.
# The trouble is that the design matrix has a
# third column which is so much larger than
# the first two.
#
> solve(t(design.mat2)%*% design.mat2, t(design.mat2)%*% dat[,2])
Error in solve.qr(a, b): apparently singular matrix
Dumped

SLIDE 29

#
# However, good packages know numerical techniques
# which avoid the danger.
#
> lm(Count ~ Dose+Dose^2,data=dat)
Call:
lm(formula = Count ~ Dose + Dose^2, data = dat)

Coefficients:
 (Intercept)     Dose    I(Dose^2)
    26718.11 7.240314 -7.596867e-05

Degrees of freedom: 17 total; 14 residual
Residual standard error: 1571.277
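The apparent singularity above is a scaling problem rather than a true rank deficiency: the Dose² column reaches about 1.3×10⁷ while the intercept column is all 1s, so the entries of X^T X span many orders of magnitude and the solver gives up. Rescaling the covariate before forming the normal equations is one standard remedy; the Python sketch below (rescaling Dose to thousands is my choice, not from the slides) solves the rescaled 3×3 system with ordinary Gaussian elimination:

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (didactic solver)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for j in reversed(range(n)):
        x[j] = (M[j][n] - sum(M[j][c] * x[c] for c in range(j + 1, n))) / M[j][j]
    return x

dose = [0, 0, 0, 150, 150, 150, 420, 420, 420, 900, 900,
        1800, 1800, 1800, 3600, 3600, 3600]
count = [27043, 26902, 25959, 27700, 27530, 27460, 30650, 30150,
         29480, 34790, 32020, 42280, 39370, 36200, 53230, 49260, 53030]

d = [x / 1000.0 for x in dose]          # dose in thousands: columns now comparable
X = [[1.0, di, di * di] for di in d]    # quadratic design matrix

p = 3
xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
xty = [sum(row[j] * y for row, y in zip(X, count)) for j in range(p)]
beta = solve_linear(xtx, xty)

# Undo the rescaling: beta[1] = 1000 * beta_2 and beta[2] = 1e6 * beta_3,
# with the intercept unchanged, so this should match the lm() output above.
```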

SLIDE 30

#
# WARNING: you can't tell from the size of an
# estimate, such as that of beta_3, whether or
# not it is important -- you have to compare it
# to the values of the corresponding covariate
# and to its standard error.
#
> q()
# Used to quit S: pay attention to () --
# that part is essential!
