

SLIDE 1

Linear Least Squares I

Steve Marschner, Cornell CS 322


SLIDE 2

Outline

• Linear fitting
• Examples of linear fitting problems
• Solving linear least squares problems
• Difficulties in least squares fitting
• Summary


SLIDE 3

Linear systems

We have been looking at systems

$$y_i = f_i(x_1, \ldots, x_n) \quad \text{for } i = 1, \ldots, n$$

or

$$y = f(x) \quad \text{where } f : \mathbb{R}^n \to \mathbb{R}^n$$

which, when $f$ is linear, read

$$Ax = b \quad \text{where } A \in \mathbb{R}^{n \times n}$$


SLIDE 4

Square linear systems

The equation $Ax = b$ with $A$ an $n \times n$ matrix is a square linear system. Generally we expect this system to have exactly one solution.

[Figure: a square matrix $A$ times the vector $x$ equals the vector $b$]

(If $A$ is singular, there might be no solution or many solutions.)


SLIDE 5

Non-square systems

If $A$ is $m \times n$ and $m \neq n$, the system is called (surprise!) non-square or rectangular, and is generally either overdetermined or underdetermined.

[Figure: a tall $A$ (overdetermined) and a wide $A$ (underdetermined), each drawn as $Ax = b$]

(If $A$ is singular, you can't necessarily tell whether a system is over- or underdetermined from its shape.)


SLIDE 6

Overdetermined systems

Today, we're interested in the overdetermined case: $m > n$, more knowns than unknowns.

$$Ax \approx b$$

[Figure: a tall matrix $A$ times $x$ approximately equals $b$]

Generally such an equation will have no exact solution, and we are in the business of finding a compromise.


SLIDE 7

Linear regression

Experiment to find the thermal expansion coefficient with a metal bar and a torch:

• measure temperature of bar, record as $T_1$.
• measure length of bar, record as $L_1$.
• crank up heat, wait for a bit.
• measure temperature $T_2$ and length $L_2$.
• repeat for many trials.

The data is $n$ pairs $(T_i, L_i)$. The hypothesis is that $L(T) = L_0(1 + \alpha T)$, where $L_0$ is the bar's nominal length, and we want to estimate $\alpha$.


SLIDE 8

Linear regression

To put this in the standard form, we have a set of given data points $(x_i, y_i)$ and we believe that $y = mx + b$. (Here $x$ is $T$, $y$ is $L$, $m$ is $L_0\alpha$, and $b$ is $L_0$.) We believe that if there were no experimental uncertainty the model would fit the data exactly, but since there is noise the best we can do is minimize error. The problem is

$$\min_{m,b} \sum_i (mx_i + b - y_i)^2$$

To make this look like our standard problem we use the HW2 trick:

$$mx + b = \begin{bmatrix} x & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}$$

SLIDE 9

Linear regression

Stacking the data points into a matrix results in:

$$\min_{m,b} \left\| \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \right\|^2$$

which is a linear least squares problem in the standard form.
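In MATLAB (the language used later in these slides), building this stacked system and solving it takes a few lines. A minimal sketch with made-up data; the variable names are illustrative, not from the slides:

```matlab
% Least squares line fit: minimize sum over i of (m*x_i + b - y_i)^2.
x = [0; 1; 2; 3; 4];
y = [1.1; 2.9; 5.2; 7.1; 8.8];     % roughly y = 2x + 1 plus noise

A = [x, ones(size(x))];            % one row [x_i, 1] per data point
mb = A \ y;                        % mb(1) is the slope m, mb(2) the intercept b
```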


SLIDE 10

Polynomial regression

Suppose the model we expect to fit our data $(x_i, y_i)$ is a cubic polynomial rather than a straight line:

$$p(x) = ax^3 + bx^2 + cx + d$$

We want to find $a$, $b$, $c$, and $d$ to best match the data:

$$\min_{a,b,c,d} \sum_i (ax_i^3 + bx_i^2 + cx_i + d - y_i)^2$$

Thinking of the coefficients as variables and the variables as coefficients, we can write this:

$$\min_{a,b,c,d} \left\| \begin{bmatrix} x_1^3 & x_1^2 & x_1 & 1 \\ \vdots & & & \vdots \\ x_n^3 & x_n^2 & x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} - \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \right\|^2$$

SLIDE 11

Fitting with basis functions

This same approach works for any set of functions you want to add together to approximate some data:

$$y_i \approx \sum_j a_j b_j(x_i)$$

This works for any $b_j$s, such as monomials (which we just saw), sines and cosines, etc.
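For instance, a sketch of fitting with a sine/cosine/constant basis in MATLAB (synthetic data, names mine):

```matlab
% Basis-function fit: y_i ~ a1*sin(x_i) + a2*cos(x_i) + a3.
x = linspace(0, 2*pi, 100)';
y = 2*sin(x) - cos(x) + 0.5 + 0.05*randn(100, 1);
A = [sin(x), cos(x), ones(100, 1)];    % one column per basis function b_j
a = A \ y;                             % coefficients a_j
```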


SLIDE 12

Economic prediction

So far we have looked at a single independent variable, with complexity arising from the type of model. Some problems have many independent variables. Moler's problem 5.11 has an example of an economic application. We would like to be able to predict total employment from a set of other economic measures:

• x1: GNP implicit price deflator
  • x2: Gross National Product
  • x3: Unemployment
  • x4: Size of armed forces
  • x5: Population
  • x6: Year


SLIDE 13

Economic prediction

We'd like to approximate $y$, the total employment, as a linear combination of the others:

$$y \approx \beta_0 + \sum_j \beta_j x_j$$

We have historical data available for many years, and so we can set up a system with a row for each year, each of which reads

$$y = \begin{bmatrix} 1 & x_1 & x_2 & x_3 & x_4 & x_5 & x_6 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_6 \end{bmatrix}$$

With more than 7 years of data, this will be an overdetermined system that can be solved by least squares. Then $y$ can be predicted in future years for which only the $x$s are available.
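A sketch of how this system might be assembled in MATLAB; the slides don't include the historical table, so `data` and `employment` below are placeholders:

```matlab
% Rows = years; columns = the six economic measures x1..x6.
nyears = 20;
data = randn(nyears, 6);         % placeholder for the historical table
employment = randn(nyears, 1);   % placeholder for total employment y

X = [ones(nyears, 1), data];     % prepend a column of ones for beta0
beta = X \ employment;           % [beta0; beta1; ...; beta6]
```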


SLIDE 14

Least squares fitting

The basic approach is to look for an $x$ that makes $Ax$ close to $b$:

$$x^* = \min_x \; \text{distance}(Ax, b).$$

How to measure distance? Usually by the magnitude of the difference:

$$x^* = \min_x \; \text{size}(Ax - b)$$

How we measure size determines what kind of answer we get.


SLIDE 15

Least squares fitting

The default way to measure size is with a vector norm, such as the familiar Euclidean distance (2-norm):

$$x^* = \min_x \|Ax - b\|$$

which expands out to

$$x^* = \min_x \sqrt{\sum_i (a_i \cdot x - b_i)^2}$$

Since we only care about the minimum value, we can drop the square root, and our problem is to minimize the sum of squares:

$$x^* = \min_x \sum_i (a_i \cdot x - b_i)^2$$


SLIDE 16

Why least squares?

Why are we using this sum-of-squares metric for error?

• Because it is the right norm for the problem? (maybe, with some strong assumptions...)
• Because it corresponds to a familiar notion of distance? (getting closer...)
• Because it results in a problem that's really easy to solve? (bingo!)

Don't let its elegance seduce you into thinking that a least squares solution is the Right Answer for every fitting problem.


SLIDE 17

Solving a 2 × 1 least squares system

Let's look at an example for $n = 1$, $m = 2$:

$$\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} x \approx \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \quad \text{or} \quad ax \approx b$$

In this case we are taking a scalar multiple of a single vector $a$ and trying to come close to a point $b$. Here is a picture of the situation:

[Figure: the vector $a$ and the point $b$ in the plane]



SLIDE 18

Solving a 2 × 1 least squares system

What is the closest point on this line to $b$? It is the orthogonal projection of $b$ onto the line.

[Figure: the point $ax^*$ on the line through $a$, with the residual $r$ running to $b$]

If $ax^*$ is the closest point to $b$, then the residual $r = ax^* - b$ must be orthogonal to $a$:

$$a \cdot r = 0 \;\Rightarrow\; a \cdot (ax^* - b) = 0 \;\Rightarrow\; (a \cdot a)\, x^* = a \cdot b$$


SLIDE 19

Solving a 2 × 1 least squares system

So the $2 \times 1$ case boils down to

$$(a \cdot a)\, x^* = a \cdot b$$

Some interpretations of this:

• The residual is orthogonal to $a$.
• The vectors $ax^*$ and $b$ have the same component in the $a$ direction.
• (If $\|a\| = 1$) $x^*$ is the component of $b$ in the $a$ direction.
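A tiny numeric check of $(a \cdot a)\, x^* = a \cdot b$ in MATLAB (the numbers are mine, not from the slides):

```matlab
% Project b onto the line through a.
a = [3; 4];  b = [5; 10];
xstar = dot(a, b) / dot(a, a);   % = 55/25 = 2.2
r = a*xstar - b;                 % residual [1.6; -1.2]
dot(a, r)                        % 0: the residual is orthogonal to a
```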



SLIDE 20

Solving a 3 × 2 least squares system

Now we can graduate to the $3 \times 2$ case:

$$Ax \approx b \quad \text{or} \quad \begin{bmatrix} a_1 & a_2 \end{bmatrix} x \approx b \quad \text{or} \quad a_1 x_1 + a_2 x_2 \approx b$$

Geometrically, this is finding the point on the plane spanned by $a_1$ and $a_2$ that is closest to $b$.

[Figure: $b$ above the plane spanned by $a_1$ and $a_2$, with its projection $Ax$ and the residual $r$]

Now the residual is orthogonal to the plane, which is to say, it is orthogonal to both columns of $A$.


SLIDE 21

Solving a 3 × 2 least squares system

If $A = \begin{bmatrix} a_1 & a_2 \end{bmatrix}$ then being orthogonal to the columns of $A$ is two statements:

$$a_1 \cdot r = 0 \qquad a_2 \cdot r = 0$$

or

$$a_1^T r = 0 \qquad a_2^T r = 0$$

Another way to say this is $A^T r = 0$, and if we expand out $r$ we get

$$A^T (Ax - b) = 0 \;\Rightarrow\; A^T A x = A^T b$$

This statement is known as the normal equations of the least-squares system.


SLIDE 22

Normal equations

For any $n$ and $m$, the residual is orthogonal to all $n$ columns of $A$. So the normal equations

$$A^T A x = A^T b$$

hold for any overdetermined system. This equation is an $n \times n$ system that you can solve using, e.g., the LU decomposition.

Note: we threw out the long dimension! Fitting a plane to a million 3D points is still a $3 \times 3$ system.

Caution: This method is only for easy problems! (And hard fitting problems come along more often than you might think.) More on this later.
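A sketch of the "plane through a million points" example in MATLAB, with synthetic data (the plane model and names are mine): the normal equations reduce the whole cloud to a 3 × 3 solve.

```matlab
% Fit the plane z = c(1)*x + c(2)*y + c(3) to a million 3D points.
m = 1e6;
x = rand(m, 1);  y = rand(m, 1);
z = 2*x - 3*y + 0.5 + 0.01*randn(m, 1);   % synthetic noisy plane

A = [x, y, ones(m, 1)];                   % m-by-3 design matrix
c = (A'*A) \ (A'*z);                      % normal equations: only a 3-by-3 solve
```

In practice `A \ z` is usually preferable, since it uses a more stable factorization; that is exactly the caution above.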


SLIDE 23

Normal equations, alternate derivation

Here is another derivation some find easier to remember:

$$x^* = \min_x \|Ax - b\|^2 = \min_x (Ax - b)^T (Ax - b)$$

At the minimum the derivative of $\|Ax - b\|^2$ with respect to $x$ has to be zero:

$$0 = \nabla \|Ax - b\|^2 = \nabla \left[ (Ax - b)^T (Ax - b) \right] = \nabla \left[ x^T A^T A x - 2 x^T A^T b + b^T b \right]$$

A little sleight of hand is involved in differentiating these matrix expressions as if they were scalars (but write it out to verify):

$$0 = 2 A^T A x - 2 A^T b$$
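The "sleight of hand" can be justified with two standard gradient identities (not stated on the slide, but easy to verify by writing out components):

$$\nabla_x \left( x^T M x \right) = (M + M^T)\, x, \qquad \nabla_x \left( c^T x \right) = c$$

With $M = A^T A$ (which is symmetric, so $(M + M^T)x = 2Mx$) and $c = 2A^T b$, these give $0 = 2A^T A x - 2A^T b$ directly.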


SLIDE 24

LLS in Matlab

Solving an LLS system in Matlab is simple:

    x = A \ b

We've seen that the backslash operator solves square systems. For an overdetermined system it computes the least squares solution (using a method better than the normal equations).
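A quick sanity check (synthetic data, names mine) that backslash agrees with the normal equations on a well-conditioned problem:

```matlab
% Backslash on a tall matrix returns the least squares solution.
A = randn(100, 3);  b = randn(100, 1);
x1 = A \ b;               % stable factorization under the hood
x2 = (A'*A) \ (A'*b);     % normal equations
norm(x1 - x2)             % roundoff-level for well-conditioned A
```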


SLIDE 25

Multi-RHS least squares systems

As with square systems, we can have a system $AX \approx B$, which amounts to a set of separate problems sharing the same matrix.
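In MATLAB this is still a single backslash call; a sketch (the shapes are mine):

```matlab
% A is m-by-n, B is m-by-k: one call solves all k right-hand sides.
A = randn(100, 3);  B = randn(100, 5);
X = A \ B;    % column j of X is the least squares solution for B(:, j)
```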


SLIDE 26

Difficulties in least squares fitting

Fitting using least squares is elegant and efficient. But there are some pitfalls:

• Sensitivity to outliers
  • If uncertainties are truly Gaussian, least squares is optimal
  • One bad point will completely break the result
• Choosing the right size parameter space
  • too few parameters: model is underparameterized (does not fit the data)
  • too many parameters: overfitting (model distorts to accommodate minor variations and noise)


SLIDE 27

Summary

• Overdetermined systems have many applications, including many fitting problems.
• When the problems are linear there is a very clean and simple way to find the optimum, if we adopt the sum-of-squares error metric.
• The normal equations convert an overdetermined system into a square system.
  But only for easy problems! More stable methods later...
