least squares


  1. least-squares. L. Olson, Department of Computer Science, University of Illinois at Urbana-Champaign

  2. polling data. Suppose we are given the data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ and we want to find a curve that best fits the data.

  3. fitting curves

  4. fitting a line. Given $n$ data points $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, find $a$ and $b$ such that $y_i = a x_i + b$ for all $i \in [1, n]$. In matrix form, find $a$ and $b$ that solve

     $$\begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.$$

     Systems with more equations than unknowns are called overdetermined.
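
     A minimal MATLAB sketch of this setup, assuming some made-up sample data (MATLAB's backslash operator returns the least squares solution of an overdetermined system):

     x = [1; 2; 3; 4; 5];            % hypothetical abscissas
     y = [1.1; 1.9; 3.2; 3.9; 5.1];  % hypothetical ordinates
     A = [x, ones(size(x))];         % n-by-2 overdetermined system matrix
     ab = A \ y;                     % least squares solution [a; b]
     a = ab(1);  b = ab(2);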

  5. overdetermined systems. If $A$ is an $m \times n$ matrix, then in general an $m \times 1$ vector $b$ may not lie in the column space of $A$. Hence $Ax = b$ may not have an exact solution. Definition: The residual vector is $r = b - Ax$. The least squares solution is given by minimizing the square of the residual in the 2-norm.

  6. normal equations. Writing $r = b - Ax$ and substituting, we want to find an $x$ that minimizes the following function:

     $$\phi(x) = \|r\|_2^2 = r^T r = (b - Ax)^T (b - Ax) = b^T b - 2 x^T A^T b + x^T A^T A x$$

     From calculus we know that the minimizer occurs where $\nabla \phi(x) = 0$. The derivative is given by

     $$\nabla \phi(x) = -2 A^T b + 2 A^T A x = 0$$

     Definition: The system of normal equations is given by $A^T A x = A^T b$.
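
     A minimal sketch of forming and solving the normal equations in MATLAB (reusing the hypothetical A and y from the line-fitting example above):

     AtA = A' * A;       % symmetric 2-by-2 matrix
     Atb = A' * y;       % right-hand side of the normal equations
     x   = AtA \ Atb;    % solves A'*A x = A'*b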

  7. solving normal equations. Since the normal equations form a symmetric system, we can solve by computing the Cholesky factorization $A^T A = L L^T$ and solving $L y = A^T b$ and $L^T x = y$. Consider

     $$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}$$

     where $0 < \epsilon < \sqrt{\epsilon_{mach}}$. The normal equations for this system are given by

     $$A^T A = \begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{bmatrix}$$

     In floating-point arithmetic $1 + \epsilon^2$ rounds to $1$, so the computed $A^T A$ is singular and the Cholesky factorization breaks down.
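
     A minimal sketch of the Cholesky-based solve in MATLAB (a hypothetical overdetermined A and right-hand side b; the slide writes the factorization as $L L^T$, while MATLAB's chol returns the upper triangular factor $R = L^T$):

     R = chol(A' * A);     % A'*A = R'*R with R upper triangular
     y = R' \ (A' * b);    % forward substitution for L*y = A'*b
     x = R  \ y;           % back substitution for L'*x = y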

  8. normal equations: conditioning. The normal equations tend to worsen the condition of the matrix. Theorem: $\text{cond}(A^T A) = (\text{cond}(A))^2$.

     >> A = rand(10,10);
     >> cond(A)
     43.4237
     >> cond(A'*A)
     1.8856e+03

     How can we solve the least squares problem without squaring the condition of the matrix?

  9. other approaches.
     • QR factorization. For $A \in \mathbb{R}^{m \times n}$, factor $A = QR$ where $Q$ is an $m \times m$ orthogonal matrix and $R$ is an $m \times n$ upper triangular matrix. (Since $R$ is $m \times n$ upper triangular, we can write $R = \begin{bmatrix} R' \\ 0 \end{bmatrix}$ where $R'$ is $n \times n$ upper triangular and $0$ is the $(m - n) \times n$ matrix of zeros.)
     • SVD (singular value decomposition). For $A \in \mathbb{R}^{m \times n}$, factor $A = U S V^T$ where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $S$ is an $m \times n$ diagonal matrix whose elements are the singular values.
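
     A minimal sketch of computing both factorizations in MATLAB (a hypothetical m-by-n matrix with m > n):

     A = rand(10, 3);       % hypothetical overdetermined matrix
     [Q, R] = qr(A);        % Q is 10-by-10 orthogonal, R is 10-by-3 upper triangular
     [U, S, V] = svd(A);    % U is 10-by-10, S is 10-by-3 diagonal, V is 3-by-3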

  10. orthogonal matrices. Definition: A matrix $Q$ is orthogonal if $Q^T Q = Q Q^T = I$. Orthogonal matrices preserve the Euclidean norm of any vector $v$:

     $$\|Qv\|_2^2 = (Qv)^T (Qv) = v^T Q^T Q v = v^T v = \|v\|_2^2$$
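
     A quick numerical check of this property (using an orthogonal Q obtained from the QR factorization of a hypothetical random matrix):

     [Q, R] = qr(rand(5));   % Q is a 5-by-5 orthogonal matrix
     v = rand(5, 1);
     norm(Q*v) - norm(v)     % zero up to roundoff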

  11. using qr factorization for least squares. Now that we know orthogonal matrices preserve the Euclidean norm, we can apply orthogonal matrices to the residual vector without changing the norm of the residual:

     $$\|r\|_2^2 = \|b - Ax\|_2^2 = \left\| b - Q \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 = \left\| Q^T b - Q^T Q \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 = \left\| Q^T b - \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2$$

     If $Q^T b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$ and $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, then

     $$\left\| Q^T b - \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 = \left\| \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} - \begin{bmatrix} R x_1 \\ 0 \end{bmatrix} \right\|_2^2 = \left\| \begin{bmatrix} c_1 - R x_1 \\ c_2 \end{bmatrix} \right\|_2^2 = \|c_1 - R x_1\|_2^2 + \|c_2\|_2^2$$

     Hence the least squares solution is given by solving $\begin{bmatrix} R \\ 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$. We can solve $R x_1 = c_1$ using back substitution, and the residual is $\|r\|_2 = \|c_2\|_2$.
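
     A minimal sketch of this procedure in MATLAB (hypothetical A and b; the economy-size QR returns only the first n columns of Q, which is all that is needed to form c_1):

     A = rand(10, 3);  b = rand(10, 1);   % hypothetical overdetermined system
     [Q1, R1] = qr(A, 0);                 % economy-size QR: Q1 is 10-by-3, R1 is 3-by-3
     c1 = Q1' * b;
     x  = R1 \ c1;                        % back substitution solves R1*x = c1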

  12. gram-schmidt orthogonalization. One way to obtain the QR factorization of a matrix $A$ is by Gram-Schmidt orthogonalization. We are looking for a set of orthogonal vectors $q$ that span the range of $A$. For the simple case of 2 vectors $\{a_1, a_2\}$, first normalize $a_1$ and obtain $q_1 = a_1 / \|a_1\|$. Now we need $q_2$ such that $q_1^T q_2 = 0$ and $q_2 = a_2 + c q_1$. That is, $\mathcal{R}(q_1, q_2) = \mathcal{R}(a_1, a_2)$. Enforcing orthogonality gives:

     $$q_1^T q_2 = 0 = q_1^T a_2 + c \, q_1^T q_1$$

  13. gram-schmidt orthogonalization. From

     $$q_1^T q_2 = 0 = q_1^T a_2 + c \, q_1^T q_1,$$

     solving for the constant $c$ gives

     $$c = -\frac{q_1^T a_2}{q_1^T q_1}.$$

     Reformulating $q_2$ gives

     $$q_2 = a_2 - \frac{q_1^T a_2}{q_1^T q_1} q_1.$$

     Adding another vector $a_3$, we have for $q_3$

     $$q_3 = a_3 - \frac{q_1^T a_3}{q_1^T q_1} q_1 - \frac{q_2^T a_3}{q_2^T q_2} q_2.$$

     Repeating this idea for $n$ columns gives us Gram-Schmidt orthogonalization.

  14. gram-schmidt orthogonalization. Since $R$ is upper triangular and $A = QR$, we have

     $$a_1 = q_1 r_{11}$$
     $$a_2 = q_1 r_{12} + q_2 r_{22}$$
     $$\vdots$$
     $$a_n = q_1 r_{1n} + q_2 r_{2n} + \ldots + q_n r_{nn}$$

     From this we see that

     $$r_{ij} = \frac{q_i^T a_j}{q_i^T q_i}, \qquad j > i.$$

  15. orthogonal projection. The orthogonal projector onto the range of $q_1$ can be written $\frac{q_1 q_1^T}{q_1^T q_1}$. Application of this operator to a vector $a$ orthogonally projects $a$ onto $q_1$. If we subtract the result from $a$, we are left with a vector that is orthogonal to $q_1$:

     $$q_1^T \left( I - \frac{q_1 q_1^T}{q_1^T q_1} \right) a = 0$$
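
     A quick numerical illustration of the projector in MATLAB (hypothetical vectors q1 and a):

     q1 = rand(4, 1);  a = rand(4, 1);   % hypothetical vectors
     P  = (q1 * q1') / (q1' * q1);       % orthogonal projector onto span(q1)
     w  = a - P * a;                     % component of a orthogonal to q1
     q1' * w                             % zero up to roundoff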

  16. gram-schmidt orthogonalization.

     function [Q,R] = gs_qr (A)

     m = size(A,1);
     n = size(A,2);

     for i = 1:n
         % normalize the current column to get q_i and r_ii
         R(i,i) = norm(A(:,i),2);
         Q(:,i) = A(:,i)./R(i,i);
         % remove the component along q_i from the remaining columns
         for j = i+1:n
             R(i,j) = Q(:,i)' * A(:,j);
             A(:,j) = A(:,j) - R(i,j)*Q(:,i);
         end
     end

     end
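
     A quick usage check of this routine (hypothetical test matrix; for a well-conditioned A both quantities below should be near machine precision):

     A = rand(8, 4);           % hypothetical test matrix
     [Q, R] = gs_qr(A);
     norm(A - Q*R)             % reconstruction error
     norm(Q'*Q - eye(4))       % departure from orthogonality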

  17. using svd for least squares. Recall that a singular value decomposition is given by

     $$A = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}$$

     where the $\sigma_i$ are the singular values.

  18. using svd for least squares. Assume that $A$ has rank $k$ (and hence $k$ nonzero singular values $\sigma_i$) and recall that we want to minimize $\|r\|_2^2 = \|b - Ax\|_2^2$. Substituting the SVD for $A$, we find that

     $$\|r\|_2^2 = \|b - Ax\|_2^2 = \|b - U S V^T x\|_2^2$$

     where $U$ and $V$ are orthogonal and $S$ is diagonal with $k$ nonzero singular values. Then

     $$\|b - U S V^T x\|_2^2 = \|U^T b - U^T U S V^T x\|_2^2 = \|U^T b - S V^T x\|_2^2$$

  19. using svd for least squares. Let $c = U^T b$ and $y = V^T x$ (and hence $x = V y$) in $\|U^T b - S V^T x\|_2^2$. We now have

     $$\|r\|_2^2 = \|c - S y\|_2^2$$

     Since $S$ has only $k$ nonzero diagonal elements, we have

     $$\|r\|_2^2 = \sum_{i=1}^{k} (c_i - \sigma_i y_i)^2 + \sum_{i=k+1}^{n} c_i^2,$$

     which is minimized when $y_i = c_i / \sigma_i$ for $1 \le i \le k$.
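
     A minimal sketch of this construction in MATLAB (hypothetical A and b; the final line compares against MATLAB's pseudoinverse solution, which implements the same idea):

     A = rand(10, 3);  b = rand(10, 1);        % hypothetical overdetermined system
     [U, S, V] = svd(A);
     c = U' * b;
     s = diag(S);                              % singular values
     k = sum(s > max(size(A)) * eps(s(1)));    % numerical rank
     y = c(1:k) ./ s(1:k);                     % y_i = c_i / sigma_i
     x = V(:, 1:k) * y;                        % x = V*y
     norm(x - pinv(A) * b)                     % near zero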

  20. using svd for least squares. Theorem: Let $A$ be an $m \times n$ matrix of rank $r$ and let $A = U S V^T$ be its singular value decomposition. The least squares solution of the system $Ax = b$ is

     $$x = \sum_{i=1}^{r} (\sigma_i^{-1} c_i) \, v_i$$

     where $c_i = u_i^T b$.
