

SLIDE 1

Least Squares and Data Fitting

SLIDE 2

Data fitting

How do we best fit a set of data points?

SLIDE 3

Linear Least Squares

1) Fitting with a line

Given $m$ data points $\{(t_1, y_1), \dots, (t_m, y_m)\}$, we want to find the function

$$y = x_0 + x_1 t$$

that best fits the data (or better, we want to find the coefficients $x_0$, $x_1$). Thinking geometrically, we can ask: "what is the line that most nearly passes through all the points?"

SLIDE 4

Given $m$ data points $\{(t_1, y_1), \dots, (t_m, y_m)\}$, we want to find $x_0$ and $x_1$ such that
$$y_j = x_0 + x_1 t_j \quad \forall j \in [1, m]$$

Or in matrix form:

$$\underbrace{\begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}}_{m \times 2} \underbrace{\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}}_{2 \times 1} = \underbrace{\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}}_{m \times 1} \qquad\Longleftrightarrow\qquad \boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{b}$$

Note that this system of linear equations has more equations than unknowns: an OVERDETERMINED SYSTEM.

We want to find the appropriate linear combination of the columns of $\boldsymbol{A}$ that makes up the vector $\boldsymbol{b}$. If a solution exists that satisfies $\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{b}$, then $\boldsymbol{b} \in \operatorname{span}(\boldsymbol{A})$.
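To make the setup concrete, here is a minimal NumPy sketch that assembles this overdetermined system; the data values are hypothetical, standing in for the points plotted on the slide:

```python
import numpy as np

# Hypothetical data points (t_i, y_i), i = 1..m
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 1.9, 3.2, 3.8, 5.1])   # y-values stacked into the vector b

m = len(t)
# Overdetermined m x 2 system for the line y = x0 + x1 t:
# the first column multiplies x0, the second multiplies x1.
A = np.column_stack([np.ones(m), t])

print(A.shape)   # (5, 2): more equations (5) than unknowns (2)
```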

SLIDE 5

Linear Least Squares

  • In most cases, $\boldsymbol{b} \notin \operatorname{span}(\boldsymbol{A})$ and $\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{b}$ does not have an exact solution!
  • Therefore, an overdetermined system is better expressed as
$$\boldsymbol{A}\,\boldsymbol{x} \cong \boldsymbol{b}$$

SLIDE 6

Linear Least Squares

  • Least Squares: find the solution $\boldsymbol{x}$ that minimizes the residual
$$\boldsymbol{r} = \boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}$$
  • Let's define the function $\varphi$ as the square of the 2-norm of the residual:
$$\varphi(\boldsymbol{x}) = \|\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}\|_2^2$$

SLIDE 7

Linear Least Squares (continued)

  • Then the least squares problem becomes
$$\min_{\boldsymbol{x}} \varphi(\boldsymbol{x})$$
  • Suppose $\varphi: \mathbb{R}^n \rightarrow \mathbb{R}$ is a smooth function. Then $\varphi(\boldsymbol{x})$ reaches a (local) maximum or minimum at a point $\boldsymbol{x}^* \in \mathbb{R}^n$ only if $\nabla \varphi(\boldsymbol{x}^*) = 0$.

SLIDE 8

How to find the minimizer?

  • To minimize the 2-norm of the residual vector:
$$\min_{\boldsymbol{x}} \varphi(\boldsymbol{x}) = \|\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}\|_2^2$$

$$\varphi(\boldsymbol{x}) = (\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x})^T (\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x})$$
$$\nabla \varphi(\boldsymbol{x}) = 2\,(\boldsymbol{A}^T \boldsymbol{b} - \boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x})$$

First order necessary condition:
$$\nabla \varphi(\boldsymbol{x}) = 0 \;\rightarrow\; \boldsymbol{A}^T \boldsymbol{b} - \boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{0} \;\rightarrow\; \boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$$

Second order sufficient condition: $\nabla^2 \varphi(\boldsymbol{x}) = 2\,\boldsymbol{A}^T\boldsymbol{A}$, and $2\,\boldsymbol{A}^T\boldsymbol{A}$ is a positive semi-definite matrix, so the solution is a minimum.

Normal Equations: solve a linear system of equations.

SLIDE 9

Linear Least Squares (another approach)

  • Find $\boldsymbol{z} = \boldsymbol{A}\,\boldsymbol{x}$ which is closest to the vector $\boldsymbol{b}$
  • What is the vector $\boldsymbol{z} = \boldsymbol{A}\,\boldsymbol{x} \in \operatorname{span}(\boldsymbol{A})$ that is closest to the vector $\boldsymbol{b}$ in the Euclidean norm?

When $\boldsymbol{r} = \boldsymbol{b} - \boldsymbol{z} = \boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}$ is orthogonal to all columns of $\boldsymbol{A}$, then $\boldsymbol{z}$ is closest to $\boldsymbol{b}$:
$$\boldsymbol{A}^T \boldsymbol{r} = \boldsymbol{A}^T(\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}) = \boldsymbol{0} \;\;\Rightarrow\;\; \boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$$

SLIDE 10

Summary:

  • $\boldsymbol{A}$ is an $m \times n$ matrix, where $m > n$.
  • $m$ is the number of data points. $n$ is the number of parameters of the "best fit" function.
  • The Linear Least Squares problem $\boldsymbol{A}\,\boldsymbol{x} \cong \boldsymbol{b}$ always has a solution.
  • The Linear Least Squares solution $\boldsymbol{x}$ minimizes the square of the 2-norm of the residual:
$$\min_{\boldsymbol{x}} \|\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}\|_2^2$$
  • One method to solve the minimization problem is to solve the system of Normal Equations: $\boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$
  • Let's see some examples and discuss the limitations of this method.
SLIDE 11

Example:

Solve: $\boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$

(The data set and the fitted-line plot for this example were shown as an image on the slide.)

SLIDE 12

Data fitting - not always a line fit!

  • The fit does not need to be a line! For example, here we are fitting the data using a quadratic curve.

Linear Least Squares: the problem is linear in its coefficients!

SLIDE 13

Another example

We want to find the coefficients of the quadratic function that best fits the data points:

$$y = x_0 + x_1 t + x_2 t^2$$

We would not want our "fit" curve to pass through the data points exactly, since we are looking to model the general trend, not to capture the noise.

SLIDE 14

Data fitting

Each row of the system comes from one data point $(t_j, y_j)$:

$$\begin{pmatrix} 1 & t_1 & t_1^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$$

Solve: $\boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$

SLIDE 15

Which function is not suitable for linear least squares?

A) $y = a + b\,t + c\,t^2 + d\,t^3$
B) $y = t\,(a + b\,t + c\,t^2 + d\,t^3)$
C) $y = a \sin(t) + b \cos(t)$
D) $y = a \sin(t) + t \cos(b\,t)$
E) $y = a\,e^{-2t} + b\,e^{2t}$

SLIDE 16

Computational Cost

$$\boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$$

  • Compute $\boldsymbol{A}^T\boldsymbol{A}$: $O(m n^2)$
  • Factorize $\boldsymbol{A}^T\boldsymbol{A}$: LU $\rightarrow O\!\left(\tfrac{2}{3} n^3\right)$, Cholesky $\rightarrow O\!\left(\tfrac{1}{3} n^3\right)$
  • Solve: $O(n^2)$
  • Since $m > n$, the overall cost is $O(m n^2)$
SLIDE 17

Short questions

Given the data in the table below, which of the plots shows the line of best fit in terms of least squares?

(The data table and the four candidate plots A, B, C, D were shown as images on the slide.)

SLIDE 18

Short questions

Given the data in the table below, and the least squares model
$$y = c_0 + c_1 \sin(\pi t) + c_2 \sin(\pi t / 2) + c_3 \sin(\pi t / 4)$$
written in matrix form as $\boldsymbol{A}\,\boldsymbol{c} \cong \boldsymbol{b}$, determine the entry $A_{23}$ of the matrix $\boldsymbol{A}$. Note that indices start with 1. (The data table was shown as an image on the slide.)

A) $-1.0$ B) $1.0$ C) $-0.7$ D) $0.7$ E) $0.0$

SLIDE 19

Solving Linear Least Squares with SVD

SLIDE 20

What we have learned so far...

$\boldsymbol{A}$ is an $m \times n$ matrix where $m > n$ (more points to fit than coefficients to be determined). Normal Equations: $\boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$

  • The solution of $\boldsymbol{A}\,\boldsymbol{x} \cong \boldsymbol{b}$ is unique if and only if $\operatorname{rank}(\boldsymbol{A}) = n$ ($\boldsymbol{A}$ is full column rank).
  • $\operatorname{rank}(\boldsymbol{A}) = n$ $\rightarrow$ columns of $\boldsymbol{A}$ are linearly independent $\rightarrow$ $n$ non-zero singular values $\rightarrow$ $\boldsymbol{A}^T\boldsymbol{A}$ has only positive eigenvalues $\rightarrow$ $\boldsymbol{A}^T\boldsymbol{A}$ is a symmetric positive definite matrix $\rightarrow$ $\boldsymbol{A}^T\boldsymbol{A}$ is invertible, and
$$\boldsymbol{x} = (\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T\, \boldsymbol{b}$$
  • If $\operatorname{rank}(\boldsymbol{A}) < n$, then $\boldsymbol{A}$ is rank-deficient, and the solution of the linear least squares problem is not unique.
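A small sketch of checking full column rank (and hence uniqueness) numerically, on a hypothetical matrix built to be rank-deficient:

```python
import numpy as np

# Hypothetical matrix whose third column duplicates the second
A = np.column_stack([np.ones(4), np.arange(4.0), np.arange(4.0)])

print(np.linalg.matrix_rank(A))            # 2 < 3 columns: rank-deficient
print(np.linalg.svd(A, compute_uv=False))  # one singular value is (near) zero
```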

SLIDE 21

Condition number for Normal Equations

Finding the least squares solution of $\boldsymbol{A}\,\boldsymbol{x} \cong \boldsymbol{b}$ (where $\boldsymbol{A}$ is a full rank matrix) using the Normal Equations $\boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$ has some advantages: we are solving a square system of linear equations with a symmetric matrix (and hence it is possible to use decompositions such as the Cholesky factorization).

However, the normal equations tend to worsen the conditioning of the matrix:
$$\operatorname{cond}(\boldsymbol{A}^T\boldsymbol{A}) = (\operatorname{cond}(\boldsymbol{A}))^2$$

How can we solve the least squares problem without squaring the condition number of the matrix?
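A quick NumPy illustration of the squaring effect, using 2-norm condition numbers; the matrix here is an arbitrary ill-conditioned example with nearly collinear columns:

```python
import numpy as np

# A deliberately ill-conditioned tall matrix: t varies over a tiny interval,
# so the columns [1, t, t^2] are nearly linearly dependent.
t = np.linspace(1.0, 1.1, 50)
A = np.column_stack([np.ones_like(t), t, t**2])

cond_A = np.linalg.cond(A)           # 2-norm condition number of A
cond_AtA = np.linalg.cond(A.T @ A)   # condition number of the normal-equations matrix

print(f"cond(A)     = {cond_A:.3e}")
print(f"cond(A^T A) = {cond_AtA:.3e}")   # approximately cond(A)**2
print(f"cond(A)**2  = {cond_A**2:.3e}")
```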

SLIDE 22

SVD to solve linear least squares problems

We want to find the least squares solution of $\boldsymbol{A}\,\boldsymbol{x} \cong \boldsymbol{b}$, where $\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^T$,

or better expressed in reduced form: $\boldsymbol{A} = \boldsymbol{U}_R\, \boldsymbol{\Sigma}_R\, \boldsymbol{V}^T$

$\boldsymbol{A}$ is an $m \times n$ rectangular matrix where $m > n$, and hence the reduced SVD is given by:

$$\boldsymbol{A} = \begin{pmatrix} \vdots & & \vdots \\ \boldsymbol{u}_1 & \cdots & \boldsymbol{u}_n \\ \vdots & & \vdots \end{pmatrix} \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{pmatrix} \begin{pmatrix} \cdots & \boldsymbol{v}_1^T & \cdots \\ & \vdots & \\ \cdots & \boldsymbol{v}_n^T & \cdots \end{pmatrix}$$

SLIDE 23

Recall Reduced SVD

$$\underbrace{\boldsymbol{A}}_{m \times n} = \underbrace{\boldsymbol{U}_R}_{m \times n}\, \underbrace{\boldsymbol{\Sigma}_R}_{n \times n}\, \underbrace{\boldsymbol{V}^T}_{n \times n} \qquad (m > n)$$

SLIDE 25

SVD to solve linear least squares problems

We want to find the least squares solution of $\boldsymbol{A}\,\boldsymbol{x} \cong \boldsymbol{b}$, where $\boldsymbol{A} = \boldsymbol{U}_R\, \boldsymbol{\Sigma}_R\, \boldsymbol{V}^T$ (reduced SVD). Substituting into the normal equations $\boldsymbol{A}^T\boldsymbol{A}\,\boldsymbol{x} = \boldsymbol{A}^T \boldsymbol{b}$:

$$(\boldsymbol{U}_R \boldsymbol{\Sigma}_R \boldsymbol{V}^T)^T (\boldsymbol{U}_R \boldsymbol{\Sigma}_R \boldsymbol{V}^T)\,\boldsymbol{x} = (\boldsymbol{U}_R \boldsymbol{\Sigma}_R \boldsymbol{V}^T)^T\, \boldsymbol{b}$$
$$\boldsymbol{V} \boldsymbol{\Sigma}_R \boldsymbol{U}_R^T\, \boldsymbol{U}_R \boldsymbol{\Sigma}_R \boldsymbol{V}^T \boldsymbol{x} = \boldsymbol{V} \boldsymbol{\Sigma}_R \boldsymbol{U}_R^T\, \boldsymbol{b}$$
$$\boldsymbol{V} \boldsymbol{\Sigma}_R \boldsymbol{\Sigma}_R \boldsymbol{V}^T \boldsymbol{x} = \boldsymbol{V} \boldsymbol{\Sigma}_R \boldsymbol{U}_R^T\, \boldsymbol{b} \qquad (\text{since } \boldsymbol{U}_R^T \boldsymbol{U}_R = \boldsymbol{I})$$
$$\boldsymbol{\Sigma}_R^2\, \boldsymbol{V}^T \boldsymbol{x} = \boldsymbol{\Sigma}_R\, \boldsymbol{U}_R^T \boldsymbol{b} \qquad (\text{multiplying on the left by } \boldsymbol{V}^T \text{ and using } \boldsymbol{V}^T\boldsymbol{V} = \boldsymbol{I})$$

When can we take the inverse of the singular value matrix $\boldsymbol{\Sigma}_R$?

SLIDE 26

$$\boldsymbol{\Sigma}_R^2\, \boldsymbol{V}^T \boldsymbol{x} = \boldsymbol{\Sigma}_R\, \boldsymbol{U}_R^T \boldsymbol{b}$$

1) Full rank matrix ($\sigma_j \neq 0 \;\; \forall j$, i.e. $\operatorname{rank}(\boldsymbol{A}) = n$):

$$\boldsymbol{V}^T \boldsymbol{x} = \boldsymbol{\Sigma}_R^{-1}\, \boldsymbol{U}_R^T \boldsymbol{b} \qquad\Rightarrow\qquad \underbrace{\boldsymbol{x}}_{n \times 1} = \underbrace{\boldsymbol{V}}_{n \times n}\, \underbrace{\boldsymbol{\Sigma}_R^{-1}}_{n \times n}\, \underbrace{\boldsymbol{U}_R^T}_{n \times m}\, \underbrace{\boldsymbol{b}}_{m \times 1}$$

Unique solution!

2) Rank deficient matrix ($\operatorname{rank}(\boldsymbol{A}) = r < n$):

The solution is not unique! Find the solution $\boldsymbol{x}$ such that
$$\min_{\boldsymbol{x}} \varphi(\boldsymbol{x}) = \|\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}\|_2^2$$
which also satisfies $\boldsymbol{\Sigma}_R^2\, \boldsymbol{V}^T \boldsymbol{x} = \boldsymbol{\Sigma}_R\, \boldsymbol{U}_R^T \boldsymbol{b}$, and which has minimal norm:
$$\min_{\boldsymbol{x}} \|\boldsymbol{x}\|_2$$

SLIDE 27

2) Rank deficient matrix (continued)

We want to find the solution $\boldsymbol{x}$ that satisfies $\boldsymbol{\Sigma}_R^2\, \boldsymbol{V}^T \boldsymbol{x} = \boldsymbol{\Sigma}_R\, \boldsymbol{U}_R^T \boldsymbol{b}$ and also satisfies $\min_{\boldsymbol{x}} \|\boldsymbol{x}\|_2$.

Change of variables: set $\boldsymbol{z} = \boldsymbol{V}^T \boldsymbol{x}$ and then solve $\boldsymbol{\Sigma}_R\, \boldsymbol{z} = \boldsymbol{U}_R^T \boldsymbol{b}$ for the variable $\boldsymbol{z}$:

$$\begin{pmatrix} \sigma_1 & & & & \\ & \ddots & & & \\ & & \sigma_r & & \\ & & & 0 & \\ & & & & \ddots \end{pmatrix} \begin{pmatrix} z_1 \\ \vdots \\ z_r \\ z_{r+1} \\ \vdots \\ z_n \end{pmatrix} = \begin{pmatrix} \boldsymbol{u}_1^T \boldsymbol{b} \\ \vdots \\ \boldsymbol{u}_r^T \boldsymbol{b} \\ \boldsymbol{u}_{r+1}^T \boldsymbol{b} \\ \vdots \\ \boldsymbol{u}_n^T \boldsymbol{b} \end{pmatrix} \qquad\Rightarrow\qquad z_j = \frac{\boldsymbol{u}_j^T \boldsymbol{b}}{\sigma_j}, \quad j = 1, 2, \dots, r$$

What do we do when $j > r$? Which choice of $z_j$ will minimize $\|\boldsymbol{x}\|_2 = \|\boldsymbol{V}\,\boldsymbol{z}\|_2 = \|\boldsymbol{z}\|_2$? Set
$$z_j = 0, \quad j = r+1, \dots, n$$

Then evaluate:
$$\boldsymbol{x} = \boldsymbol{V}\,\boldsymbol{z} = \begin{pmatrix} \vdots & & \vdots \\ \boldsymbol{v}_1 & \cdots & \boldsymbol{v}_r \\ \vdots & & \vdots \end{pmatrix} \begin{pmatrix} z_1 \\ \vdots \\ z_r \end{pmatrix} \qquad\Rightarrow\qquad \boldsymbol{x} = \sum_{j=1}^{r} z_j\, \boldsymbol{v}_j = \sum_{j=1}^{r} \frac{\boldsymbol{u}_j^T \boldsymbol{b}}{\sigma_j}\, \boldsymbol{v}_j$$

SLIDE 28

Solving Least Squares Problem with SVD (summary)

  • Find $\boldsymbol{x}$ that satisfies $\min_{\boldsymbol{x}} \|\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}\|_2^2$
  • Equivalently, find $\boldsymbol{z}$ that satisfies $\min_{\boldsymbol{z}} \|\boldsymbol{\Sigma}_R\, \boldsymbol{z} - \boldsymbol{U}_R^T \boldsymbol{b}\|_2^2$
  • Propose the $\boldsymbol{z}$ that is a solution of $\boldsymbol{\Sigma}_R\, \boldsymbol{z} = \boldsymbol{U}_R^T \boldsymbol{b}$:
  • Evaluate $\boldsymbol{U}_R^T\, \boldsymbol{b}$ (cost: $O(mn)$)
  • Set:
$$z_j = \begin{cases} \dfrac{\boldsymbol{u}_j^T \boldsymbol{b}}{\sigma_j} & \text{if } \sigma_j \neq 0 \\[4pt] 0 & \text{otherwise} \end{cases} \qquad j = 1, \dots, n \qquad (\text{cost: } O(n))$$
  • Then compute $\boldsymbol{x} = \boldsymbol{V}\,\boldsymbol{z}$ (cost: $O(n^2)$)

Cost of the SVD itself: $O(m n^2)$

SLIDE 29

Solving Least Squares Problem with SVD (summary)

  • If $\sigma_j \neq 0$ for all $j = 1, \dots, n$, then the solution $\boldsymbol{x} = \boldsymbol{V}\, \boldsymbol{\Sigma}_R^{-1}\, \boldsymbol{U}_R^T\, \boldsymbol{b}$ is unique (and not a "choice").
  • If at least one of the singular values is zero, then the proposed solution $\boldsymbol{z}$ is the one with the smallest 2-norm ($\|\boldsymbol{z}\|_2$ is minimal) that minimizes the 2-norm of the residual $\|\boldsymbol{\Sigma}_R\, \boldsymbol{z} - \boldsymbol{U}_R^T \boldsymbol{b}\|_2$.
  • Since $\|\boldsymbol{x}\|_2 = \|\boldsymbol{V}\,\boldsymbol{z}\|_2 = \|\boldsymbol{z}\|_2$ ($\boldsymbol{V}$ is orthogonal), the solution $\boldsymbol{x}$ is also the one with the smallest 2-norm ($\|\boldsymbol{x}\|_2$ is minimal) among all possible $\boldsymbol{x}$ for which $\|\boldsymbol{A}\,\boldsymbol{x} - \boldsymbol{b}\|_2$ is minimal.

SLIDE 30

Solving Least Squares Problem with SVD (summary)

Solve $\boldsymbol{A}\,\boldsymbol{x} \cong \boldsymbol{b}$, i.e. $\boldsymbol{U}_R\, \boldsymbol{\Sigma}_R\, \boldsymbol{V}^T \boldsymbol{x} \cong \boldsymbol{b}$:
$$\boldsymbol{x} = \boldsymbol{V}\, \boldsymbol{\Sigma}_R^{+}\, \boldsymbol{U}_R^T\, \boldsymbol{b}$$
where $\boldsymbol{\Sigma}_R^{+}$ inverts the non-zero singular values and leaves the zero entries as zero.
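This $\boldsymbol{V}\, \boldsymbol{\Sigma}_R^{+}\, \boldsymbol{U}_R^T$ object is the Moore-Penrose pseudoinverse $\boldsymbol{A}^{+}$, which NumPy exposes directly; a quick sketch of the equivalence, reusing the rank-deficient example from above:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])    # same rank-deficient example as before
b = np.array([1.0, 2.0, 2.0, 4.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_plus = np.array([1.0 / si if si > 1e-12 * s[0] else 0.0 for si in s])
x_svd = Vt.T @ (s_plus * (U.T @ b))     # x = V Sigma_R^+ U_R^T b

x_pinv = np.linalg.pinv(A) @ b          # the same solution via the pseudoinverse
print(np.allclose(x_svd, x_pinv))       # True
```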

SLIDE 31

Example:

Consider solving the least squares problem $\boldsymbol{A}\,\boldsymbol{x} \cong \boldsymbol{b}$, where the singular value decomposition $\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^T$ is given. Determine $\|\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}\|_2$.

(The matrices of the decomposition were shown as an image on the slide.)

SLIDE 32

Example

Suppose you already have $\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^T$ calculated. What is the cost of solving $\min_{\boldsymbol{x}} \|\boldsymbol{b} - \boldsymbol{A}\,\boldsymbol{x}\|_2^2$?

A) $O(n)$ B) $O(n^2)$ C) $O(mn)$ D) $O(m)$ E) $O(m^2)$