8. Least squares


  1. CS/ECE/ISyE 524 Introduction to Optimization, Spring 2017–18
     8. Least squares
     • Review of linear equations
     • Least squares
     • Example: curve-fitting
     • Vector norms
     • Geometrical intuition
     Laurent Lessard (www.laurentlessard.com)

  2. Review of linear equations
     A system of m linear equations in n unknowns:
        a_11 x_1 + ··· + a_1n x_n = b_1
        a_21 x_1 + ··· + a_2n x_n = b_2
            ...
        a_m1 x_1 + ··· + a_mn x_n = b_m
     which is the same as
        A = [a_11 ··· a_1n; a_21 ··· a_2n; ... ; a_m1 ··· a_mn],  x = [x_1; ... ; x_n],  b = [b_1; ... ; b_m].
     Compact representation: Ax = b. Only three possibilities:
        1. exactly one solution (e.g. x_1 + x_2 = 3 and x_1 − x_2 = 1)
        2. infinitely many solutions (e.g. x_1 + x_2 = 0)
        3. no solutions (e.g. x_1 + x_2 = 1 and x_1 + x_2 = 2)
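Not part of the original slides: a small Julia sketch (variable names made up) that checks which of the three cases holds by comparing rank(A) with rank([A b]), assuming the standard LinearAlgebra library is available.

    using LinearAlgebra

    # Case 1: exactly one solution (x1 + x2 = 3, x1 - x2 = 1)
    A1 = [1.0 1.0; 1.0 -1.0];  b1 = [3.0, 1.0]
    rank(A1), rank([A1 b1])     # (2, 2): equal and full, so a unique solution
    A1 \ b1                     # [2.0, 1.0]

    # Case 2: infinitely many solutions (x1 + x2 = 0)
    A2 = [1.0 1.0];  b2 = [0.0]
    rank(A2), rank([A2 b2])     # (1, 1): equal but less than n, infinitely many solutions

    # Case 3: no solutions (x1 + x2 = 1, x1 + x2 = 2)
    A3 = [1.0 1.0; 1.0 1.0];  b3 = [1.0, 2.0]
    rank(A3), rank([A3 b3])     # (1, 2): augmented rank is larger, so the system is inconsistent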

  3. Review of linear equations
     • Column interpretation: the vector b is a linear combination of {a_1, ..., a_n}, the columns of A:
        Ax = [a_1 a_2 ··· a_n] [x_1; x_2; ... ; x_n] = a_1 x_1 + ··· + a_n x_n = b
     The solution x tells us how the vectors a_i can be combined in order to produce b.
     • Can be visualized in the output space R^m.
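A one-line numerical illustration of the column interpretation (added, not from the slides; the matrix and vector are arbitrary):

    A = [1.0 2.0; 3.0 4.0; 5.0 6.0];  x = [2.0, -1.0]
    A*x == A[:,1]*x[1] + A[:,2]*x[2]   # true: Ax is a combination of the columns of A
                                       # (exact here; with general floats prefer ≈)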

  4. Review of linear equations
     • Row interpretation: the intersection of hyperplanes ã_iᵀ x = b_i, where ã_iᵀ is the i-th row of A:
        Ax = [ã_1ᵀ x; ã_2ᵀ x; ... ; ã_mᵀ x] = [b_1; b_2; ... ; b_m]
     The solution x is a point at the intersection of the affine hyperplanes. Each ã_i is a normal vector to a hyperplane.
     • Can be visualized in the input space R^n.

  5. Review of linear equations
     • The set of solutions of Ax = b is an affine subspace.
     • If m > n, there is (usually but not always) no solution. This is the case where A is tall (overdetermined).
       ◦ Can we find x so that Ax ≈ b?
       ◦ One possibility is to use least squares.
     • If m < n, there are infinitely many solutions. This is the case where A is wide (underdetermined).
       ◦ Among all solutions to Ax = b, which one should we pick?
       ◦ One possibility is to use regularization.
     In this lecture, we will discuss least squares.

  6. Least squares
     • Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
     • The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.
     • Equivalently: make ‖Ax − b‖² as small as possible.
     Standard form:
        minimize_x  ‖Ax − b‖²
     It's an unconstrained optimization problem.

  7. Least squares
     • Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
     • The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.
     • Equivalently: make ‖Ax − b‖² as small as possible.
     Properties:
     • ‖x‖ = sqrt(x_1² + ··· + x_n²) = sqrt(xᵀx)
     • In Julia: ‖x‖ = norm(x)
     • In JuMP: ‖x‖² = dot(x,x) = sum(x.^2)
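A quick check of these identities (an added sketch, not from the slides; on recent Julia versions norm and dot come from the LinearAlgebra standard library, on older versions they are available directly):

    using LinearAlgebra

    x = [3.0, 4.0]
    norm(x)        # 5.0, the Euclidean norm sqrt(x'x)
    norm(x)^2      # 25.0
    dot(x, x)      # 25.0, the same quantity written as an inner product
    sum(x.^2)      # 25.0, the form used inside a JuMP @objective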

  8. Least squares
     • Column interpretation: find the linear combination of the columns {a_1, ..., a_n} that is closest to b:
        ‖Ax − b‖² = ‖(a_1 x_1 + ··· + a_n x_n) − b‖²
     [Figure: b compared with its closest combination a_1 x_1 + a_2 x_2 in the plane spanned by a_1 and a_2.]

  9. Least squares
     • Row interpretation: if ã_iᵀ is the i-th row of A, define r_i := ã_iᵀ x − b_i to be the i-th residual component. Then
        ‖Ax − b‖² = (ã_1ᵀ x − b_1)² + ··· + (ã_mᵀ x − b_m)²
     We minimize the sum of squares of the residuals.
     • Solving Ax = b would make all residual components zero. Least squares attempts to make all of them small.

  10. Example: curve-fitting
     • We are given noisy data points (x_i, y_i).
     • We suspect they are related by y = px² + qx + r.
     • Find the p, q, r that best agree with the data.
     Writing all the equations:
        y_1 ≈ p x_1² + q x_1 + r
        y_2 ≈ p x_2² + q x_2 + r
            ...
        y_m ≈ p x_m² + q x_m + r
     or, stacked in matrix form,
        [x_1² x_1 1; x_2² x_2 1; ... ; x_m² x_m 1] [p; q; r] ≈ [y_1; y_2; ... ; y_m]
     • Also called regression.
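A hedged sketch of this setup (synthetic data and made-up variable names, not from the slides): the matrix above can be assembled column by column and the fit computed with the backslash operator that appears later in the lecture.

    # synthetic noisy data from y = 2x^2 - 3x + 1 (coefficients chosen for illustration)
    xdata = collect(0:0.1:4)
    ydata = 2 .* xdata.^2 .- 3 .* xdata .+ 1 .+ 0.1 .* randn(length(xdata))

    # build the m×3 matrix [x.^2  x  1] and solve the least-squares problem
    A = [xdata.^2  xdata  ones(length(xdata))]
    p, q, r = A \ ydata      # should come out close to (2, -3, 1)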

  11. Example: curve-fitting
     • More complicated: y = p e^x + q cos(x) − r √x + s x³
     • Find the p, q, r, s that best agree with the data.
     Writing all the equations in matrix form:
        [e^{x_1}  cos(x_1)  −√x_1  x_1³; ... ; e^{x_m}  cos(x_m)  −√x_m  x_m³] [p; q; r; s] ≈ [y_1; ... ; y_m]
     The model is still linear in the unknowns p, q, r, s, so this is still a least-squares problem.
     • Julia notebook: Regression.ipynb
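The same idea with the richer set of basis functions, again as an illustrative sketch with synthetic data and invented coefficient values (not taken from the slides or from Regression.ipynb):

    # synthetic data; the "true" coefficients p=0.5, q=2, r=1, s=0.01 are arbitrary
    xdata = collect(0.1:0.1:5.0)        # keep x > 0 so sqrt is defined
    ydata = 0.5 .* exp.(xdata) .+ 2 .* cos.(xdata) .- sqrt.(xdata) .+ 0.01 .* xdata.^3
    ydata .+= 0.05 .* randn(length(xdata))

    # one column per basis function, matching the matrix on the slide
    A = [exp.(xdata)  cos.(xdata)  -sqrt.(xdata)  xdata.^3]
    p, q, r, s = A \ ydata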

  12. Vector norms
     We want to solve Ax = b, but there is no solution. Define the residual to be the quantity r := b − Ax. We can't make it zero, so instead we try to make it small. Many options!
     • minimize the largest component (a.k.a. the ∞-norm):
        ‖r‖_∞ = max_i |r_i|
     • minimize the sum of absolute values (a.k.a. the 1-norm):
        ‖r‖_1 = |r_1| + |r_2| + ··· + |r_m|
     • minimize the Euclidean norm (a.k.a. the 2-norm):
        ‖r‖_2 = ‖r‖ = sqrt(r_1² + r_2² + ··· + r_m²)
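All three norms are available directly in Julia; a small added sketch (not from the slides), assuming the LinearAlgebra standard library:

    using LinearAlgebra

    r = [1.0, -2.0, 3.0]
    norm(r, Inf)   # 3.0: largest absolute component (∞-norm)
    norm(r, 1)     # 6.0: sum of absolute values (1-norm)
    norm(r)        # ≈ 3.742: Euclidean norm (2-norm), the default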

  13. Vector norms
     Example: find the point [x; x] that is closest to [1; 2].
     [Figure: the blue line is the set of points with coordinates (x, x); find the one closest to the red point located at (1, 2).]
     The answer depends on your notion of distance!

  14. Vector norms
     Example: find the point [x; x] that is closest to [1; 2].
     Minimize the largest component:
        min_x  max{ |x − 1|, |x − 2| }
     Optimum is at x = 1.5.
     [Figure: plot of f(x) = max{|x − 1|, |x − 2|} over −1 ≤ x ≤ 4.]

  15. Vector norms
     Example: find the point [x; x] that is closest to [1; 2].
     Minimize the sum of absolute values:
        min_x  |x − 1| + |x − 2|
     Optimum is any 1 ≤ x ≤ 2.
     [Figure: plot of f(x) = |x − 1| + |x − 2| over −1 ≤ x ≤ 4.]

  16. Vector norms
     Example: find the point [x; x] that is closest to [1; 2].
     Minimize the sum of squares:
        min_x  (x − 1)² + (x − 2)²
     Optimum is at x = 1.5.
     [Figure: plot of f(x) = (x − 1)² + (x − 2)² over −1 ≤ x ≤ 4.]

  17. Vector norms
     Example: find the point [x; x] that is closest to [1; 2].
     Equivalently, we can minimize the square root of the sum of squares:
        min_x  sqrt((x − 1)² + (x − 2)²)
     Optimum is at x = 1.5.
     [Figure: plot of f(x) = sqrt((x − 1)² + (x − 2)²) over −1 ≤ x ≤ 4.]
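A rough numerical confirmation of the minimizers claimed on slides 14 to 17, obtained by evaluating each objective on a fine grid (an added sketch, not from the slides):

    xs = -1:0.001:4

    f_inf(x) = max(abs(x - 1), abs(x - 2))   # largest component
    f_one(x) = abs(x - 1) + abs(x - 2)       # sum of absolute values
    f_two(x) = (x - 1)^2 + (x - 2)^2         # sum of squares

    xs[argmin(f_inf.(xs))]   # ≈ 1.5
    xs[argmin(f_one.(xs))]   # returns the first grid point attaining the minimum value 1;
                             # any point in [1, 2] is optimal
    xs[argmin(f_two.(xs))]   # ≈ 1.5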

  18. Vector norms
     • Minimizing the largest component is an LP:
        min_x  max_i |ã_iᵀ x − b_i|
     is equivalent to
        min_{x,t}  t   subject to  −t ≤ ã_iᵀ x − b_i ≤ t  for all i.
     • Minimizing the sum of absolute values is an LP:
        min_x  Σ_{i=1}^m |ã_iᵀ x − b_i|
     is equivalent to
        min_{x,t}  t_1 + ··· + t_m   subject to  −t_i ≤ ã_iᵀ x − b_i ≤ t_i  for all i.
     • Minimizing the 2-norm is not an LP!
        min_x  Σ_{i=1}^m (ã_iᵀ x − b_i)²
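A possible JuMP sketch of the first reformulation (the largest-component LP), written in the same JuMP 0.18-era syntax used on slide 23. A and b are assumed to be given, Gurobi is assumed installed and licensed (any LP solver would do), and the helper names are made up.

    using JuMP, Gurobi

    nrows, ncols = size(A)
    m = Model(solver=GurobiSolver(OutputFlag=0))
    @variable(m, x[1:ncols])
    @variable(m, t)
    for i in 1:nrows
        # −t ≤ ã_i'x − b_i ≤ t for every row i
        @constraint(m, sum(A[i,j]*x[j] for j in 1:ncols) - b[i] <= t)
        @constraint(m, -t <= sum(A[i,j]*x[j] for j in 1:ncols) - b[i])
    end
    @objective(m, Min, t)
    solve(m)
    getvalue(x), getvalue(t)   # minimizer and optimal ∞-norm of the residual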

  19. Geometry of LS
     [Figure: b, its projection Ax̂ onto the plane spanned by a_1 and a_2, and the error vector b − Ax̂ perpendicular to that plane.]
     • The set of points {Ax} is a subspace.
     • We want to find x̂ such that Ax̂ is closest to b.
     • Insight: (b − Ax̂) must be orthogonal to all line segments contained in the subspace.

  20. Geometry of LS
     • Must have: (Ax̂ − Az)ᵀ(b − Ax̂) = 0 for all z.
     • Simplifies to: (x̂ − z)ᵀ(Aᵀb − AᵀAx̂) = 0. Since this holds for all z, the normal equations are satisfied:
        AᵀA x̂ = Aᵀ b
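A quick numerical check of the normal equations and of the orthogonality insight (an added sketch with random data, not from the slides):

    using LinearAlgebra

    A = randn(10, 3)          # a random tall matrix
    b = randn(10)
    xhat = A \ b              # least-squares solution

    A'*A*xhat ≈ A'*b          # true: the normal equations hold
    norm(A' * (b - A*xhat))   # ≈ 0: the residual is orthogonal to every column of A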

  21. Normal equations
     Theorem: If x̂ satisfies the normal equations, then x̂ is a solution to the least-squares optimization problem
        minimize_x  ‖Ax − b‖²
     Proof: Suppose AᵀA x̂ = Aᵀ b, and let x be any other point. Then
        ‖Ax − b‖² = ‖A(x − x̂) + (Ax̂ − b)‖²
                  = ‖A(x − x̂)‖² + ‖Ax̂ − b‖² + 2(x − x̂)ᵀAᵀ(Ax̂ − b)
                  = ‖A(x − x̂)‖² + ‖Ax̂ − b‖²        (the cross term vanishes since AᵀAx̂ = Aᵀb)
                  ≥ ‖Ax̂ − b‖²
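A numerical spot-check of the theorem (added, not from the slides): perturbing the least-squares solution can only increase the objective.

    using LinearAlgebra

    A = randn(30, 4);  b = randn(30)
    xhat = A \ b
    x_other = xhat + 0.1*randn(4)
    norm(A*x_other - b)^2 >= norm(A*xhat - b)^2   # true for any other point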

  22. Normal equations
     Least squares problems are easy to solve!
     • Solving a least squares problem amounts to solving the normal equations.
     • The normal equations can be solved in a variety of standard ways: LU (Cholesky) factorization, for example.
     • More specialized methods are available if A is very large, sparse, or has a particular structure that can be exploited.
     • Comparable to LPs in terms of solution difficulty.
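A minimal sketch of the Cholesky route mentioned above, assuming a recent Julia where the factorization is spelled cholesky in the LinearAlgebra standard library (older versions spell it cholfact); the data is random and the names are made up.

    using LinearAlgebra

    A = randn(100, 4);  b = randn(100)

    F = cholesky(Symmetric(A'*A))   # AᵀA is symmetric positive definite when A has full column rank
    xhat = F \ (A'*b)               # solve AᵀA x̂ = Aᵀ b using the factorization

    xhat ≈ A \ b                    # true: agrees with the built-in least-squares solve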

  23. Least squares in Julia
     1. Using JuMP:
           using JuMP, Gurobi
           m = Model(solver=GurobiSolver(OutputFlag=0))
           @variable( m, x[1:size(A,2)] )
           @objective( m, Min, sum((A*x-b).^2) )
           solve(m)
        Note: only Gurobi or Mosek currently support this syntax.
     2. Solving the normal equations directly:
           x = inv(A'*A)*(A'*b)
        Note: requires A to have full column rank (AᵀA invertible).
     3. Using the backslash operator (similar to Matlab):
           x = A\b
        Note: fastest and most reliable option!
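A quick sanity check (added, not from the slides) that options 2 and 3 return the same solution on a small random instance; option 1 is omitted here because it needs a Gurobi or Mosek license.

    A = randn(20, 3);  b = randn(20)
    inv(A'*A)*(A'*b) ≈ A\b   # true: both compute the least-squares solution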
