SLIDE 1
Fitting Linear Statistical Models to Data by Least Squares III: Multivariate
Brian R. Hunt and C. David Levermore
University of Maryland, College Park
Math 420: Mathematical Modeling
January 25, 2012 version

Outline: 1) Introduction to Linear
SLIDE 2
SLIDE 3
6. Multivariate Linear Least Squares Fitting
The least squares method extends to settings with a multivariate dependent variable $y$. Suppose we are given data $\{(x_j, y_j)\}_{j=1}^n$ where the $x_j$ lie within a domain $X \subset \mathbb{R}^p$ and the $y_j$ lie in $\mathbb{R}^q$. The problem we will examine is now the following. How can you use this data set to make a reasonable guess about the value of $y$ when $x$ takes a value in $X$ that is not represented in the data set?
In this setting $x$ is called the independent variable while $y$ is called the dependent variable. We will use weighted least squares to fit the data to a linear statistical model with $m$ parameter $q$-vectors in the form
\[
f(x; \beta_1, \cdots, \beta_m) = \sum_{i=1}^m \beta_i f_i(x)\,,
\]
where each basis function $f_i(x)$ is defined over $X$ and takes values in $\mathbb{R}$.
SLIDE 4
We now define the $j$th residual by the vector-valued formula
\[
r_j(\beta_1, \cdots, \beta_m) = y_j - \sum_{i=1}^m \beta_i f_i(x_j)\,.
\]
Introduce the $m \times q$ matrix $B$, the $n \times q$ matrices $Y$ and $R$, and the $n \times m$ matrix $F$ by
\[
B = \begin{pmatrix} \beta_1^T \\ \vdots \\ \beta_m^T \end{pmatrix}, \qquad
Y = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix}, \qquad
R = \begin{pmatrix} r_1^T \\ \vdots \\ r_n^T \end{pmatrix}, \qquad
F = \begin{pmatrix} f_1(x_1) & \cdots & f_m(x_1) \\ \vdots & \ddots & \vdots \\ f_1(x_n) & \cdots & f_m(x_n) \end{pmatrix}.
\]
We will assume the matrix $F$ has rank $m$. The fitting problem then can be recast as finding $B$ so as to minimize the size of the residual matrix
\[
R(B) = Y - FB\,.
\]
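A minimal sketch of these shapes is given below; it is not part of the original slides, and the data set, the basis functions, and all variable names are hypothetical placeholders chosen only to illustrate how $Y$, $F$, and $R(B)$ are assembled.

```python
import numpy as np

# Hypothetical setup: n = 50 points x_j in R^p (p = 2), responses y_j in R^q (q = 3),
# and m = 3 real-valued basis functions f_1(x) = 1, f_2(x) = x_1, f_3(x) = x_2.
rng = np.random.default_rng(0)
n, p, q = 50, 2, 3
X = rng.uniform(-1.0, 1.0, size=(n, p))     # data points x_1, ..., x_n
Y = rng.normal(size=(n, q))                 # n x q matrix whose rows are y_j^T

basis = [lambda x: 1.0, lambda x: x[0], lambda x: x[1]]
m = len(basis)

# n x m matrix F with entries F[j, i] = f_i(x_j)
F = np.array([[f(x) for f in basis] for x in X])

# Residual matrix R(B) = Y - F B for a candidate m x q parameter matrix B
B = np.zeros((m, q))
R = Y - F @ B
print(F.shape, R.shape)                     # (50, 3) and (50, 3): n x m and n x q
```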
SLIDE 5
As we did for univariate weighted least squares fitting, we will minimize
\[
q(B) = \tfrac{1}{2} \sum_{j=1}^n w_j\, r_j(\beta_1, \cdots, \beta_m)^T r_j(\beta_1, \cdots, \beta_m)\,,
\]
where the $w_j$ are positive weights. If we again let $W$ be the $n \times n$ diagonal matrix whose $j$th diagonal entry is $w_j$ then this can be expressed as
\[
\begin{aligned}
q(B) &= \tfrac{1}{2} \operatorname{tr}\!\left( R(B)^T W R(B) \right)
      = \tfrac{1}{2} \operatorname{tr}\!\left( (Y - FB)^T W (Y - FB) \right) \\
     &= \tfrac{1}{2} \operatorname{tr}\!\left( Y^T W Y \right)
      - \operatorname{tr}\!\left( B^T F^T W Y \right)
      + \tfrac{1}{2} \operatorname{tr}\!\left( B^T F^T W F B \right).
\end{aligned}
\]
Because $F$ has rank $m$ the $m \times m$ matrix $F^T W F$ is positive definite. The function $q(B)$ thereby has a strictly convex structure similar to the one it had in the univariate case. It therefore has a unique global minimizer $B = \widehat{B}$ given by
\[
\widehat{B} = (F^T W F)^{-1} F^T W Y\,.
\]
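A minimal numerical sketch of this formula follows; it is not from the original slides, it reuses the hypothetical data and basis functions introduced above, and it assumes uniform weights. It solves the normal equations directly and then evaluates the resulting fit at a new point.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 2, 3
X = rng.uniform(-1.0, 1.0, size=(n, p))
Y = rng.normal(size=(n, q))
basis = [lambda x: 1.0, lambda x: x[0], lambda x: x[1]]
F = np.array([[f(x) for f in basis] for x in X])     # n x m design matrix
w = np.full(n, 1.0 / n)                              # positive weights w_j (uniform here)
W = np.diag(w)                                       # n x n diagonal weight matrix

# B_hat = (F^T W F)^{-1} F^T W Y, computed without forming the inverse explicitly
B_hat = np.linalg.solve(F.T @ W @ F, F.T @ W @ Y)    # m x q

# Evaluate the fit f_hat(x) = sum_i beta_hat_i f_i(x) at a new point x
x_new = np.array([0.3, -0.7])
f_hat = np.array([f(x_new) for f in basis]) @ B_hat  # vector in R^q
print(B_hat.shape, f_hat.shape)                      # (3, 3) and (3,)
```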
SLIDE 6
The fact that $\widehat{B}$ is a global minimizer again can be seen from the fact that $F^T W F$ is positive definite and the identity
\[
\begin{aligned}
q(B) &= \tfrac{1}{2}\operatorname{tr}\!\left( Y^T W Y \right)
      - \tfrac{1}{2}\operatorname{tr}\!\left( \widehat{B}^T F^T W F \widehat{B} \right)
      + \tfrac{1}{2}\operatorname{tr}\!\left( (B - \widehat{B})^T F^T W F (B - \widehat{B}) \right) \\
     &= q(\widehat{B}) + \tfrac{1}{2}\operatorname{tr}\!\left( (B - \widehat{B})^T F^T W F (B - \widehat{B}) \right).
\end{aligned}
\]
In particular, this shows that $q(B) \geq q(\widehat{B})$ for every $B \in \mathbb{R}^{m \times q}$ and that $q(B) = q(\widehat{B})$ if and only if $B = \widehat{B}$.
If we let $\widehat{\beta}_i^{\,T}$ be the $i$th row of $\widehat{B}$ then the fit is given by
\[
\widehat{f}(x) = \sum_{i=1}^m \widehat{\beta}_i f_i(x)\,.
\]
The geometric interpretation of this fit is similar to that for the univariate weighted least squares fit.
SLIDE 7
Example. Use least squares to fit the affine model $f(x; a, B) = a + Bx$ with $a \in \mathbb{R}^q$ and $B \in \mathbb{R}^{q \times p}$ to the data $\{(x_j, y_j)\}_{j=1}^n$. Begin by setting (writing $\mathsf{B}$ for the stacked parameter matrix of the general framework, to keep it distinct from the model matrix $B$)
\[
\mathsf{B} = \begin{pmatrix} a^T \\ B^T \end{pmatrix}, \qquad
Y = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix}, \qquad
F = \begin{pmatrix} 1 & x_1^T \\ \vdots & \vdots \\ 1 & x_n^T \end{pmatrix}.
\]
Because
\[
F^T W Y = \begin{pmatrix} \overline{y}^{\,T} \\ \overline{x y^T} \end{pmatrix}, \qquad
F^T W F = \begin{pmatrix} 1 & \overline{x}^{\,T} \\ \overline{x} & \overline{x x^T} \end{pmatrix},
\]
where the weights are normalized so that $\sum_{j=1}^n w_j = 1$ and the overbars denote weighted averages (so $\overline{x} = \sum_j w_j x_j$, $\overline{y} = \sum_j w_j y_j$, $\overline{x x^T} = \sum_j w_j x_j x_j^T$, and $\overline{x y^T} = \sum_j w_j x_j y_j^T$), we find that
\[
\widehat{\mathsf{B}} = (F^T W F)^{-1} F^T W Y
= \begin{pmatrix} 1 & \overline{x}^{\,T} \\ \overline{x} & \overline{x x^T} \end{pmatrix}^{-1}
  \begin{pmatrix} \overline{y}^{\,T} \\ \overline{x y^T} \end{pmatrix}
= \begin{pmatrix}
    \overline{y}^{\,T} - \overline{x}^{\,T} \bigl( \overline{x x^T} - \overline{x}\,\overline{x}^{\,T} \bigr)^{-1} \bigl( \overline{x y^T} - \overline{x}\,\overline{y}^{\,T} \bigr) \\[2pt]
    \bigl( \overline{x x^T} - \overline{x}\,\overline{x}^{\,T} \bigr)^{-1} \bigl( \overline{x y^T} - \overline{x}\,\overline{y}^{\,T} \bigr)
  \end{pmatrix}.
\]
SLIDE 8
Because $\widehat{\mathsf{B}}^{\,T} = \begin{pmatrix} \widehat{a} & \widehat{B} \end{pmatrix}$, these formulas for $\widehat{a}$ and $\widehat{B}$ can be expressed in terms of the centered quantities $x - \overline{x}$ and $y - \overline{y}$ simply as
\[
\widehat{B} = \overline{y (x - \overline{x})^T}\, \Bigl( \overline{(x - \overline{x})(x - \overline{x})^T} \Bigr)^{-1}, \qquad
\widehat{a} = \overline{y} - \widehat{B}\, \overline{x}\,.
\]
The affine fit is therefore
\[
\widehat{f}(x) = \overline{y} + \widehat{B}\,(x - \overline{x})\,.
\]
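The sketch below, which is not from the original slides, checks these mean-based formulas numerically against the general normal-equations solution; the data, dimensions, and uniform weights are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 2, 3
Xd = rng.normal(size=(n, p))                       # data points x_j in R^p
Yd = rng.normal(size=(n, q))                       # responses y_j in R^q
w = np.full(n, 1.0 / n)                            # normalized weights, sum w_j = 1

# Weighted means (the overbars on the slides)
x_bar = w @ Xd                                     # p-vector
y_bar = w @ Yd                                     # q-vector
Xc = Xd - x_bar                                    # x_j - x_bar
Yc = Yd - y_bar                                    # y_j - y_bar (equivalent here, since sum w_j (x_j - x_bar) = 0)

# B_hat = ybar-formula: weighted cross-moment times inverse weighted second moment
cov_xx = (Xc * w[:, None]).T @ Xc                  # p x p: sum_j w_j (x_j - x_bar)(x_j - x_bar)^T
cov_yx = (Yc * w[:, None]).T @ Xc                  # q x p: sum_j w_j (y_j - y_bar)(x_j - x_bar)^T
B_hat = np.linalg.solve(cov_xx, cov_yx.T).T        # q x p
a_hat = y_bar - B_hat @ x_bar                      # q-vector

# Cross-check against the general formula (F^T W F)^{-1} F^T W Y
F = np.hstack([np.ones((n, 1)), Xd])               # rows (1, x_j^T)
W = np.diag(w)
B_stack = np.linalg.solve(F.T @ W @ F, F.T @ W @ Yd)   # (1+p) x q, rows a^T then B^T
print(np.allclose(B_stack[0], a_hat), np.allclose(B_stack[1:], B_hat.T))

# The affine fit at a new point: f_hat(x) = y_bar + B_hat (x - x_bar)
x_new = np.array([0.5, -0.2])
print(y_bar + B_hat @ (x_new - x_bar))
```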
Remark. The linear multivariate models considered above have the form
\[
f(x; \beta_1, \cdots, \beta_m) = \sum_{i=1}^m \beta_i f_i(x)\,,
\]
where each parameter vector $\beta_i$ lies in $\mathbb{R}^q$ while each basis function $f_i(x)$ is defined over the bounded domain $X \subset \mathbb{R}^p$ and takes values in $\mathbb{R}$. This assumes that each entry of $f$ is being fit to the same family, namely the family spanned by the basis $\{f_i(x)\}_{i=1}^m$. Such families often are too large to be practical. We will therefore consider more general linear models.
SLIDE 9
7. General Multivariate Linear Least Squares Fitting
We now extend the least squares method to the general multivariate setting. Suppose we are given data $\{(x_j, y_j)\}_{j=1}^n$ where the $x_j$ lie within a bounded domain $X \subset \mathbb{R}^p$ while the $y_j$ lie in $\mathbb{R}^q$. We will use weighted least squares to fit the data to a linear statistical model with $m$ real parameters in the form
\[
f(x; \beta_1, \cdots, \beta_m) = \sum_{i=1}^m \beta_i f_i(x)\,,
\]
where each basis function $f_i(x)$ is defined over $X$ and takes values in $\mathbb{R}^q$. The $j$th residual is again defined by the vector-valued formula
\[
r_j(\beta_1, \cdots, \beta_m) = y_j - \sum_{i=1}^m \beta_i f_i(x_j)\,.
\]
SLIDE 10
Following what was done earlier, introduce the $m$-vector $\beta$, the $nq$-vectors $Y$ and $R$, and the $nq \times m$ matrix $F$ by
\[
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}, \qquad
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad
R = \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix}, \qquad
F = \begin{pmatrix} f_1(x_1) & \cdots & f_m(x_1) \\ \vdots & \ddots & \vdots \\ f_1(x_n) & \cdots & f_m(x_n) \end{pmatrix}.
\]
We will assume the matrix $F$ has rank $m$. The fitting problem then can be recast as finding $\beta$ so as to minimize the size of the vector
\[
R(\beta) = Y - F\beta\,.
\]
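As a concrete illustration of this stacked setup, the sketch below (not from the original slides) assembles the $nq$-vector $Y$ and the $nq \times m$ matrix $F$; the $\mathbb{R}^q$-valued basis functions and the data are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 40, 2, 3
Xd = rng.uniform(-1.0, 1.0, size=(n, p))
Yd = rng.normal(size=(n, q))

# Hypothetical R^q-valued basis functions f_i : X -> R^q, here m = 2 of them
basis = [
    lambda x: np.array([1.0, x[0], x[1]]),
    lambda x: np.array([x[0] * x[1], x[0] ** 2, x[1] ** 2]),
]
m = len(basis)

# Stacked nq-vector Y: the y_j stacked on top of one another
Y = Yd.reshape(n * q)

# nq x m matrix F: the (j, i) block is the q-vector f_i(x_j)
F = np.zeros((n * q, m))
for j, x in enumerate(Xd):
    for i, f in enumerate(basis):
        F[j * q:(j + 1) * q, i] = f(x)

beta = np.zeros(m)                 # candidate parameter m-vector
R = Y - F @ beta                   # stacked residual nq-vector
print(F.shape, R.shape)            # (120, 2) and (120,)
```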
SLIDE 11
We assume that $\mathbb{R}^q$ is endowed with an inner product. Without loss of generality we can assume that this inner product has the form $y^T G z$ where $G$ is a symmetric, positive definite $q \times q$ matrix. We will minimize
\[
q(\beta) = \tfrac{1}{2} \sum_{j=1}^n w_j\, r_j(\beta_1, \cdots, \beta_m)^T G\, r_j(\beta_1, \cdots, \beta_m)\,,
\]
where the $w_j$ are positive weights. If we let $W$ be the symmetric, positive definite $nq \times nq$ block-diagonal matrix
\[
W = \begin{pmatrix} w_1 G & & \\ & \ddots & \\ & & w_n G \end{pmatrix},
\]
then $q(\beta)$ can be expressed in terms of the weight matrix $W$ as
\[
q(\beta) = \tfrac{1}{2} R(\beta)^T W R(\beta) = \tfrac{1}{2} (Y - F\beta)^T W (Y - F\beta)
         = \tfrac{1}{2} Y^T W Y - \beta^T F^T W Y + \tfrac{1}{2} \beta^T F^T W F \beta\,.
\]
SLIDE 12
Because $F$ has rank $m$ the $m \times m$ matrix $F^T W F$ is positive definite. The function $q(\beta)$ thereby has the same strictly convex structure as it had in the univariate case. It therefore has a unique minimizer $\beta = \widehat{\beta}$ where
\[
\widehat{\beta} = (F^T W F)^{-1} F^T W Y\,.
\]
The fact that $\widehat{\beta}$ is a minimizer again follows from the fact that $F^T W F$ is positive definite and the identity
\[
\begin{aligned}
q(\beta) &= \tfrac{1}{2} Y^T W Y - \tfrac{1}{2}\, \widehat{\beta}^{\,T} F^T W F\, \widehat{\beta}
          + \tfrac{1}{2} (\beta - \widehat{\beta})^T F^T W F (\beta - \widehat{\beta}) \\
         &= q(\widehat{\beta}) + \tfrac{1}{2} (\beta - \widehat{\beta})^T F^T W F (\beta - \widehat{\beta})\,.
\end{aligned}
\]
In particular, this shows that $q(\beta) \geq q(\widehat{\beta})$ for every $\beta \in \mathbb{R}^m$ and that $q(\beta) = q(\widehat{\beta})$ if and only if $\beta = \widehat{\beta}$.
Remark. The geometric interpretation of this fit is the same as that for the weighted least squares fit, except here the $W$-inner product on $\mathbb{R}^{nq}$ is
\[
(P \,|\, Q)_W = P^T W Q\,.
\]
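A minimal numerical sketch of this general construction follows; it is not from the original slides, and the data, basis functions, inner-product matrix $G$, and weights are hypothetical. It forms $W = \operatorname{diag}(w_1 G, \ldots, w_n G)$ as a Kronecker product and then solves for $\widehat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 40, 2, 3
Xd = rng.uniform(-1.0, 1.0, size=(n, p))
Yd = rng.normal(size=(n, q))

# Hypothetical R^q-valued basis functions, m = 2
basis = [
    lambda x: np.array([1.0, x[0], x[1]]),
    lambda x: np.array([x[0] * x[1], x[0] ** 2, x[1] ** 2]),
]
m = len(basis)

# Stacked nq-vector Y and nq x m matrix F, as on slide 10
Y = Yd.reshape(n * q)
F = np.zeros((n * q, m))
for j, x in enumerate(Xd):
    for i, f in enumerate(basis):
        F[j * q:(j + 1) * q, i] = f(x)

# Symmetric positive definite inner-product matrix G and positive weights w_j
G = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.5]])
w = np.full(n, 1.0 / n)

# Block-diagonal nq x nq weight matrix W = diag(w_1 G, ..., w_n G)
W = np.kron(np.diag(w), G)

# beta_hat = (F^T W F)^{-1} F^T W Y, solved without forming the inverse
beta_hat = np.linalg.solve(F.T @ W @ F, F.T @ W @ Y)
print(beta_hat)
```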
SLIDE 13
Further Questions
We have seen how to use least squares to fit linear statistical models with $m$ parameters to data sets containing $n$ pairs when $m \ll n$. Among the questions that arise are the following.
- How does one pick a basis that is well suited to the given data?
- How can one avoid overfitting?
- Do these methods extend to nonlinear statistical models?
- Can one use other notions of smallness of the residual?