Linear Fitting CS3220 - Summer 2008 Jonathan Kaldor (based on Sp07 Slides)

From N to M • We have been talking about solving linear systems of n equations in n variables • In other words, Ax = b where A is n x n • Usually: a single solution • [Figure: square system Ax = b]

From N to M • What happens if the number of equations is not equal to the number of unknowns? • General case: m linear equations in n unknowns • Still expressible as a matrix times a vector... • ...but no longer a square matrix

Rectangular Systems • Overdetermined (m > n): Ax = b with A taller than it is wide • Underdetermined (m < n): Ax = b with A wider than it is tall

Rectangular Systems • Still well-defined set of matrix equations • May be full rank but have many solutions (or no exact solutions) • Our focus: m > n (overdetermined systems)

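Example • $\begin{bmatrix} 4 & 2 \\ 2 & 10 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$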

Overdetermined Systems • When full rank, extra equations either not necessary or unsatisfiable • Can we even talk about what a solution to this problem is? • Want “best” answer for some definition of “best”

Examples • Model Fitting • We’ve fit our data points exactly • But do we need to? • Error in experimental results • Fewer dimensions in model • High degree polynomial overfitting data

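Examples • Model Fitting • $\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ \vdots \end{bmatrix}$ • Find best equation $ax + b$ to match data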

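Examples • Model Fitting • $\begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \\ \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} c \\ b \\ a \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ \vdots \end{bmatrix}$ • Find best equation $ax^2 + bx + c$ to match data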
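The two model-fitting systems above differ only in how many powers of x appear in each row. A minimal Python sketch of building those matrices follows. The names `design_matrix` and `mat_vec` and the sample x values are illustrative assumptions, not part of the original slides.

```python
def design_matrix(xs, degree):
    """Build rows [1, x, x**2, ..., x**degree], one row per data point."""
    return [[x ** k for k in range(degree + 1)] for x in xs]


def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


# Hypothetical sample points, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0]

A_line = design_matrix(xs, 1)  # columns 1, x      -> unknowns (b, a)
A_quad = design_matrix(xs, 2)  # columns 1, x, x^2 -> unknowns (c, b, a)

# With b = 1 and a = 2, the product A_line * [b, a] evaluates 2x + 1 at each x.
predictions = mat_vec(A_line, [1.0, 2.0])
```

Each row of the matrix evaluates the model at one data point, so fitting amounts to choosing the coefficient vector that brings the product as close as possible to the data d.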

Examples • Hugely applicable in sciences • Fitting model to experimental results • Economics • Predicting economic performance from economic indicators • NBA - predicting future performance of draft picks

Back to the Problem • So this is an important problem • ... but we still don’t know what the answer will look like! • In the square system, solved Ax = b , i.e. Ax - b = 0 • In the rectangular system, Ax - b is not necessarily 0 , but instead we can minimize the distance between Ax and b

Vector Distances • How long is this vector? (3,4) • 5? • 7? • Something else?

Vector Distances • Distances (called ‘norms’, denoted with ‖·‖) • We require four properties: • $\|0\| = 0$ • $\|x\| > 0$ if $x \neq 0$ • $\|cx\| = |c|\,\|x\|$ • $\|x + y\| \le \|x\| + \|y\|$ • Last property: triangle inequality

Vector Distances • Common vector norms: p-norms • $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$ • Common cases: • p = 1 (Manhattan distance) • p = 2 (Euclidean distance) • p = infinity (Chebyshev norm, $\max_i |x_i|$)

Vector Distances • Denote particular p-norm with subscript • $\|x\|_1$, $\|x\|_2$, etc. • Note alternate form of 2-norm • $\|x\|_2 = \sqrt{x^T x}$
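The question of how long (3,4) is has a different answer for each p-norm. A small Python sketch makes this concrete. The function name `p_norm` is an illustrative assumption, and the infinity case is handled separately as the largest absolute entry.

```python
import math


def p_norm(x, p):
    """p-norm of a vector x given as a list; p may be math.inf."""
    if p == math.inf:
        # Limit of the p-norm as p grows: the largest absolute entry.
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


v = [3, 4]
norm_1 = p_norm(v, 1)           # Manhattan distance
norm_2 = p_norm(v, 2)           # Euclidean distance
norm_inf = p_norm(v, math.inf)  # Chebyshev norm

# Alternate form of the 2-norm: sqrt(x^T x).
norm_2_alt = math.sqrt(sum(xi * xi for xi in v))
```

For this vector the 1-norm is 7, the 2-norm is 5, and the infinity norm is 4, so both guesses on the earlier slide are correct for some norm.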

Back to the Problem (Again) • Rectangular systems solved with respect to the 2-norm • $x^* = \arg\min_x \|Ax - b\|_2 = \arg\min_x \sqrt{\sum_i \left( A(i,:)\,x - b(i) \right)^2} = \arg\min_x \sum_i \left( A(i,:)\,x - b(i) \right)^2$ • We say $x^*$ is the least-squares solution to the rectangular system Ax = b, with residual $r = Ax^* - b$

Least Squares • Why the 2-norm? • Intuitive • Sometimes is the ‘proper’ measure • Easy to solve • Of the 3 reasons, third is most important

2x1 Least Squares • Take a 2x1 example • $\begin{bmatrix} 2 \\ 1 \end{bmatrix} x = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$

2x1 Least Squares • [Figure: the point (3,3) and the line through the origin and (2,1)]

2x1 Least Squares • Given line and point p , find closest point on line to p • Perpendicular from p to the line • a.k.a. orthogonal projection

Review of Orthogonality • We say two vectors are orthogonal if their dot product is equal to 0 • $x^T y = \|x\|_2 \|y\|_2 \cos\Theta$ • If $x \neq 0$ and $y \neq 0$, above is zero iff $\cos\Theta = 0$, i.e. $\Theta = \pm\pi/2$, i.e. they are perpendicular

Review of Orthogonality • We say two vectors are orthonormal if they are orthogonal and $\|v_1\|_2 = \|v_2\|_2 = 1$ • Can extend to say sets of n vectors are orthonormal with respect to each other • We say a matrix Q is orthogonal if its columns are all orthonormal with respect to each other • $Q^T Q = I$

Perpendicular Residual • In our 2x1 case, the residual $r = ax - b$ is orthogonal to the vector $a$ • Leads to: • $a^T r = 0$ • $a^T (ax - b) = 0$ • $a^T a\,x - a^T b = 0$ • $a^T a\,x = a^T b$
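Since $a^T a$ is a scalar in the 2x1 case, the last equation gives $x = a^T b / a^T a$ directly. The sketch below works this through in Python, assuming the example values a = (2,1) and b = (3,3) as read from the 2x1 example slide. The helper name `dot` is illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [2.0, 1.0]
b = [3.0, 3.0]

# Solve a^T a x = a^T b for the single unknown x.
x = dot(a, b) / dot(a, a)

# Residual r = a x - b; it should be perpendicular to a.
r = [ai * x - bi for ai, bi in zip(a, b)]
orthogonality_check = dot(a, r)
```

Here $a^T b = 9$ and $a^T a = 5$, so $x = 9/5$, the closest point on the line is (3.6, 1.8), and the residual (0.6, -1.2) has zero dot product with $a$ up to rounding.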

3x2 Case • Trust your geometric intuition • 3x2 case: closest point on plane • Holds in higher dimensions (but best not to try and picture it!)

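3x2 Case • Find closest point on 2D plane defined by vectors $a_1$ and $a_2$ (3D vectors) • Residual must be orthogonal to both $a_1$ and $a_2$ • Two equations: $a_1^T r = 0$, $a_2^T r = 0$ • Rewrite as $\begin{bmatrix} a_1^T \\ a_2^T \end{bmatrix} \begin{bmatrix} a_1 & a_2 \end{bmatrix} x = \begin{bmatrix} a_1^T \\ a_2^T \end{bmatrix} b$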

General Case • Extend this to m equations in n variables • Our residual must be orthogonal to each column in A • Results in n equations, each of the form $A(:,i)^T r = 0$ • Can rewrite as $A^T A x = A^T b$

Normal Equations • This is known as the system of normal equations • The solution $x^*$ to $A^T A x = A^T b$ is the solution to the least squares problem Ax = b • Convert rectangular system into square system, solve using standard techniques (note: can use Cholesky)
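The recipe above (form $A^T A$ and $A^T b$, then solve the square system with Cholesky) can be sketched in plain Python. This is a minimal illustration, not a production solver: it assumes A has full column rank so that $A^T A$ is symmetric positive definite, and the function names are illustrative. The 3x2 matrix and right-hand side are the values from the earlier Example slide as read here.

```python
import math


def normal_equations(A, b):
    """Return (A^T A, A^T b) for A given as a list of rows."""
    n = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    return AtA, Atb


def cholesky_solve(M, y):
    """Solve M x = y for symmetric positive definite M via M = L L^T."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution: L z = y.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = z.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def least_squares(A, b):
    """Least-squares solution of Ax = b through the normal equations."""
    AtA, Atb = normal_equations(A, b)
    return cholesky_solve(AtA, Atb)


A = [[4.0, 2.0], [2.0, 10.0], [2.0, 4.0]]
b = [1.0, 0.0, 3.0]
x_star = least_squares(A, b)

# Residual r = A x* - b; A^T r should vanish up to rounding.
residual = [sum(aij * xj for aij, xj in zip(row, x_star)) - bi for row, bi in zip(A, b)]
At_r = [sum(row[i] * ri for row, ri in zip(A, residual)) for i in range(2)]
```

For this system $A^T A = \begin{bmatrix} 24 & 36 \\ 36 & 120 \end{bmatrix}$ and $A^T b = (10, 14)$, which gives $x^* = (29/66, -1/66)$. The residual is not zero, but it is orthogonal to both columns of A.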

Outliers • Where do they come from? • Error in measurements • User error • Why do they have such an effect? • Least squares

Outliers • How can we handle them? • Toss out worst-fitting points (but need to make sure they really are outliers first!) • Measure error differently
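To see why a single outlier has such an effect under least squares, the sketch below fits a line to five made-up points and then corrupts one of them. The data values and the function name `fit_line` are illustrative assumptions. The fit solves the 2x2 normal equations for the line model with Cramer's rule.

```python
def fit_line(xs, ds):
    """Least-squares fit of d = a*x + b; returns (a, b).

    Solves the normal equations [[m, sx], [sx, sxx]] [b, a]^T = [sd, sxd]^T.
    """
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sd = sum(ds)
    sxd = sum(x * d for x, d in zip(xs, ds))
    det = m * sxx - sx * sx
    a = (m * sxd - sx * sd) / det
    b = (sxx * sd - sx * sxd) / det
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [1.0, 3.0, 5.0, 7.0, 9.0]        # exactly d = 2x + 1
with_outlier = [1.0, 3.0, 5.0, 7.0, 29.0]  # last measurement off by 20

a_clean, b_clean = fit_line(xs, clean)
a_out, b_out = fit_line(xs, with_outlier)
```

With the clean data the fit recovers slope 2 and intercept 1. With one bad point the slope becomes 6 and the intercept -3, because the squared residual of the outlier dominates the sum being minimized.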

Solving Least Squares in MATLAB • Remember \ ? • Solves rectangular as well as square systems • A \ b will solve the rectangular system Ax = b in the least squares sense • Can also specify multiple right hand sides: A \ B solves AX = B for each column of B
