

SLIDE 1

least-squares

L. Olson
Department of Computer Science, University of Illinois at Urbana-Champaign

SLIDE 2

polling data

Suppose we are given the data {(x_1, y_1), ..., (x_n, y_n)} and we want to find a curve that best fits the data.

SLIDE 3

fitting curves


SLIDE 4

fitting a line

Given n data points {(x_1, y_1), ..., (x_n, y_n)}, find a and b such that y_i = a x_i + b for all i ∈ [1, n]. In matrix form, find a and b that solve

\begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}

Systems with more equations than unknowns are called overdetermined.
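To make the setup concrete, here is a minimal MATLAB sketch (the data values are hypothetical; MATLAB's backslash operator returns the least-squares solution when the system is overdetermined):

% hypothetical data points (x_i, y_i)
x = [0; 1; 2; 3; 4];
y = [1.1; 1.9; 3.2; 3.9; 5.1];

% build the n-by-2 system [x 1][a; b] = y from the slide
A = [x ones(size(x))];

% backslash returns the least-squares solution of the overdetermined system
ab = A \ y;     % ab(1) is the slope a, ab(2) is the intercept b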

SLIDE 5

overdetermined systems

If A is an m × n matrix, then in general, an m × 1 vector b may not lie in the column space of A. Hence Ax = b may not have an exact solution.

Definition

The residual vector is r = b − Ax. The least squares solution is given by minimizing the square of the residual in the 2-norm.


SLIDE 6

normal equations

Writing r = b - Ax and substituting, we want to find an x that minimizes the function

φ(x) = ||r||_2^2 = r^T r = (b - Ax)^T (b - Ax) = b^T b - 2 x^T A^T b + x^T A^T A x

From calculus we know that the minimizer occurs where ∇φ(x) = 0. The derivative is given by

∇φ(x) = -2 A^T b + 2 A^T A x = 0

Definition

The system of normal equations is given by A^T A x = A^T b.
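As a minimal MATLAB sketch (hypothetical data), the normal equations can be formed and solved directly:

% hypothetical overdetermined system
A = [0 1; 1 1; 2 1; 3 1; 4 1];
b = [1.1; 1.9; 3.2; 3.9; 5.1];

% form and solve the normal equations A^T A x = A^T b
x = (A'*A) \ (A'*b);

% the residual r = b - Ax
r = b - A*x;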

SLIDE 7

solving normal equations

Since the normal equations form a symmetric system, we can solve by computing the Cholesky factorization A^T A = L L^T and solving L y = A^T b and L^T x = y.

Consider

A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}

where 0 < ε < √ε_mach. The normal equations for this system are

A^T A = \begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}

since 1 + ε^2 rounds to 1 in floating-point arithmetic, and the resulting matrix is singular.
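A minimal MATLAB sketch of both steps (hypothetical data); the ε example shows the Cholesky factorization failing because 1 + ε^2 rounds to 1:

% Cholesky solve of the normal equations (hypothetical well-behaved data)
A = [0 1; 1 1; 2 1; 3 1; 4 1];  b = [1.1; 1.9; 3.2; 3.9; 5.1];
L = chol(A'*A, 'lower');        % A'A = L L'
y = L  \ (A'*b);
x = L' \ y;

% the epsilon example from the slide
ep = 1e-9;                      % 0 < ep < sqrt(eps), and sqrt(eps) is about 1.5e-8
B  = [1 1; ep 0; 0 ep]' * [1 1; ep 0; 0 ep];   % 1 + ep^2 rounds to 1
[~, p] = chol(B);               % p > 0: B is numerically singular, Cholesky fails
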
SLIDE 8

normal equations: conditioning

The normal equations tend to worsen the condition of the matrix.

Theorem

cond(A^T A) = (cond(A))^2

>> A = rand(10,10);
>> cond(A)
    43.4237
>> cond(A'*A)
    1.8856e+03

How can we solve the least squares problem without squaring the condition number of the matrix?


SLIDE 9
other approaches

  • QR factorization. For A ∈ R^{m×n}, factor A = QR where
    • Q is an m × m orthogonal matrix
    • R is an m × n upper triangular matrix (since R is an m × n upper triangular matrix we can write R = \begin{bmatrix} R' \\ 0 \end{bmatrix}, where R' is n × n upper triangular and 0 is the (m - n) × n matrix of zeros)
  • SVD (singular value decomposition). For A ∈ R^{m×n}, factor A = U S V^T where
    • U is an m × m orthogonal matrix
    • V is an n × n orthogonal matrix
    • S is an m × n diagonal matrix whose elements are the singular values
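In MATLAB both factorizations are available directly; a small sketch with a hypothetical matrix shows the shapes described above:

A = rand(10, 3);        % a hypothetical 10-by-3 matrix (m > n)

[Q, R] = qr(A);         % Q is 10x10 orthogonal, R is 10x3 upper triangular
[U, S, V] = svd(A);     % U is 10x10 and V is 3x3 orthogonal, S is 10x3 diagonal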

SLIDE 10
orthogonal matrices

Definition

A matrix Q is orthogonal if Q^T Q = Q Q^T = I.

Orthogonal matrices preserve the Euclidean norm of any vector v:

||Qv||_2^2 = (Qv)^T (Qv) = v^T Q^T Q v = v^T v = ||v||_2^2.
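A quick numerical check of this property (a sketch; the orthogonal Q is taken from the QR factorization of a random matrix):

[Q, ~] = qr(rand(5));           % Q is a 5x5 orthogonal matrix
v = rand(5,1);
norm(Q*v, 2) - norm(v, 2)       % essentially zero: the norm is preserved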

SLIDE 11

using qr factorization for least squares

Now that we know orthogonal matrices preserve the Euclidean norm, we can apply orthogonal matrices to the residual vector without changing the norm of the residual:

||r||_2^2 = ||b - Ax||_2^2
          = \left\| b - Q \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2
          = \left\| Q^T b - Q^T Q \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2
          = \left\| Q^T b - \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2

If Q^T b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}, then

\left\| Q^T b - \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2
  = \left\| \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} - \begin{bmatrix} Rx \\ 0 \end{bmatrix} \right\|_2^2
  = ||c_1 - Rx||_2^2 + ||c_2||_2^2

Hence the least squares solution is given by solving

\begin{bmatrix} R \\ 0 \end{bmatrix} x = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}

We can solve Rx = c_1 using back substitution, and the residual is ||r||_2 = ||c_2||_2.
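A minimal MATLAB sketch of this procedure (hypothetical data), following the partition of Q^T b into c_1 and c_2 described above:

% hypothetical overdetermined system
A = [0 1; 1 1; 2 1; 3 1; 4 1];
b = [1.1; 1.9; 3.2; 3.9; 5.1];
n = size(A, 2);

[Q, R] = qr(A);                 % full QR factorization
c  = Q' * b;
c1 = c(1:n);  c2 = c(n+1:end);

x   = R(1:n, 1:n) \ c1;         % back substitution with the n-by-n block of R
res = norm(c2);                 % residual norm ||r||_2 = ||c_2||_2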

SLIDE 12

gram-schmidt orthogonalization

One way to obtain the QR factorization of a matrix A is by Gram-Schmidt orthogonalization. We are looking for a set of orthogonal vectors q that span the range of A.

For the simple case of two vectors {a_1, a_2}, first normalize a_1 to obtain q_1 = a_1 / ||a_1||. Now we need q_2 such that q_1^T q_2 = 0 and q_2 = a_2 + c q_1, that is, R(q_1, q_2) = R(a_1, a_2). Enforcing orthogonality gives

q_1^T q_2 = 0 = q_1^T a_2 + c q_1^T q_1

SLIDE 13

gram-schmidt orthogonalization

q_1^T q_2 = 0 = q_1^T a_2 + c q_1^T q_1

Solving for the constant c:

c = - \frac{q_1^T a_2}{q_1^T q_1}

Reformulating q_2 gives

q_2 = a_2 - \frac{q_1^T a_2}{q_1^T q_1} q_1

Adding another vector a_3, we have for q_3

q_3 = a_3 - \frac{q_2^T a_3}{q_2^T q_2} q_2 - \frac{q_1^T a_3}{q_1^T q_1} q_1

Repeating this idea for n columns gives us Gram-Schmidt orthogonalization.
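A small MATLAB sketch of these formulas for two hypothetical vectors:

% hypothetical vectors a1, a2
a1 = [1; 1; 0];
a2 = [1; 0; 1];

q1 = a1 / norm(a1);                     % normalize a1
c  = -(q1' * a2) / (q1' * q1);          % constant from enforcing q1'*q2 = 0
q2 = a2 + c * q1;                       % q2 = a2 - (q1'*a2)/(q1'*q1) * q1

q1' * q2                                % essentially zero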

SLIDE 14

gram-schmidt orthogonalization

Since R is upper triangular and A = QR, we have

a_1 = q_1 r_{11}
a_2 = q_1 r_{12} + q_2 r_{22}
\vdots
a_n = q_1 r_{1n} + q_2 r_{2n} + ... + q_n r_{nn}

From this we see that

r_{ij} = \frac{q_i^T a_j}{q_i^T q_i}, \quad j > i

SLIDE 15
orthogonal projection

The orthogonal projector onto the range of q_1 can be written

\frac{q_1 q_1^T}{q_1^T q_1}

Application of this operator to a vector a orthogonally projects a onto q_1. If we subtract the result from a, we are left with a vector that is orthogonal to q_1:

q_1^T \left( I - \frac{q_1 q_1^T}{q_1^T q_1} \right) a = 0
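A small MATLAB sketch of the projector (hypothetical vectors):

q1 = [1; 2; 2];
a  = [3; 1; 0];

P = (q1 * q1') / (q1' * q1);    % orthogonal projector onto span{q1}
w = (eye(3) - P) * a;           % remove the component of a along q1

q1' * w                         % essentially zero: w is orthogonal to q1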

SLIDE 16

gram-schmidt orthogonalization

function [Q,R] = gs_qr(A)
% QR factorization of A via Gram-Schmidt orthogonalization

m = size(A,1);
n = size(A,2);

for i = 1:n
    R(i,i) = norm(A(:,i),2);              % length of the i-th (updated) column
    Q(:,i) = A(:,i)./R(i,i);              % normalize to obtain q_i
    for j = i+1:n
        R(i,j) = Q(:,i)' * A(:,j);        % component of a_j along q_i
        A(:,j) = A(:,j) - R(i,j)*Q(:,i);  % remove that component from a_j
    end
end

end
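A quick check of the factorization (a sketch, assuming the gs_qr function above is saved on the MATLAB path):

A = rand(6, 3);                 % hypothetical test matrix
[Q, R] = gs_qr(A);

norm(Q'*Q - eye(3))             % columns of Q are orthonormal (near zero)
norm(Q*R - A)                   % Q*R reproduces A (near zero)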

SLIDE 17

using svd for least squares

Recall that a singular value decomposition is given by

A = \begin{bmatrix} \vdots & & \vdots \\ u_1 & \cdots & u_m \\ \vdots & & \vdots \end{bmatrix}
    \begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & \ddots \end{bmatrix}
    \begin{bmatrix} \cdots & v_1^T & \cdots \\ & \vdots & \\ \cdots & v_n^T & \cdots \end{bmatrix}

where the σ_i are the singular values.

SLIDE 18

using svd for least squares

Assume that A has rank k (and hence k nonzero singular values σ_i) and recall that we want to minimize

||r||_2^2 = ||b - Ax||_2^2

Substituting the SVD for A, we find that

||r||_2^2 = ||b - Ax||_2^2 = ||b - U S V^T x||_2^2

where U and V are orthogonal and S is diagonal with k nonzero singular values. Then

||b - U S V^T x||_2^2 = ||U^T b - U^T U S V^T x||_2^2 = ||U^T b - S V^T x||_2^2

SLIDE 19

using svd for least squares

Let c = U^T b and y = V^T x (and hence x = V y) in ||U^T b - S V^T x||_2^2. We now have

||r||_2^2 = ||c - S y||_2^2

Since S has only k nonzero diagonal elements, we have

||r||_2^2 = \sum_{i=1}^{k} (c_i - \sigma_i y_i)^2 + \sum_{i=k+1}^{n} c_i^2

which is minimized when y_i = c_i / \sigma_i for 1 ≤ i ≤ k.

SLIDE 20

using svd for least squares

Theorem

Let A be an m × n matrix of rank r and let A = U S V^T be its singular value decomposition. The least squares solution of the system Ax = b is

x = \sum_{i=1}^{r} \sigma_i^{-1} c_i v_i

where c_i = u_i^T b.
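A minimal MATLAB sketch of the theorem (hypothetical data; the sum over the r nonzero singular values is accumulated in a loop):

% hypothetical overdetermined system
A = [0 1; 1 1; 2 1; 3 1; 4 1];
b = [1.1; 1.9; 3.2; 3.9; 5.1];

[U, S, V] = svd(A);
r = rank(A);                    % number of nonzero singular values

x = zeros(size(A,2), 1);
for i = 1:r
    ci = U(:,i)' * b;           % c_i = u_i^T b
    x  = x + (ci / S(i,i)) * V(:,i);
end

% x agrees with A \ b up to roundoff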