6. Approximation and fitting


  1. 6. Approximation and fitting
     Prof. Ying Cui, Department of Electrical Engineering, Shanghai Jiao Tong University, 2018

  2. Outline
     - Norm approximation
     - Least-norm problems
     - Regularized approximation
     - Robust approximation

  3. Basic norm approximation problem

         min_x ‖Ax − b‖

     where A ∈ R^(m×n) with m ≥ n and independent columns and b ∈ R^m are given, and ‖·‖ is a norm on R^m
     - a solvable convex problem
     - optimal value is zero iff b ∈ R(A) = {Ax | x ∈ R^n}
     - solution is A⁻¹b when m = n
     - optimal value is nonzero, and the problem is more interesting and useful, if b ∉ R(A)
     - an optimal point x* = argmin_x ‖Ax − b‖ is called an approximate solution of Ax ≈ b in the norm ‖·‖
     - r = Ax − b is called the residual for the problem
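The slides do not prescribe any software; as a minimal sketch, the problem can be posed directly in a modeling package such as cvxpy (an assumption, as are the random A and b standing in for given data):

```python
# Minimal sketch of the basic norm approximation problem in cvxpy.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 10                      # m >= n, as on the slide
A = rng.standard_normal((m, n))    # a generic random A has independent columns
b = rng.standard_normal(m)

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2)))  # any norm: 1, 2, or "inf"
prob.solve()

r = A @ x.value - b                # residual r = Ax - b
print("optimal value:", prob.value, "residual norm:", np.linalg.norm(r))
```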

  4. Interpretations

     approximation interpretation:
     - fit/approximate the vector b by a linear combination of columns of A, as closely as possible, with deviation measured in ‖·‖
     - Ax = x₁a₁ + · · · + xₙaₙ (a₁, . . . , aₙ ∈ R^m: columns of A)
     - Ax* is the best approximation of b
     - the approximation problem is also called a regression problem
       - a₁, . . . , aₙ are called regressors
       - x₁*a₁ + · · · + xₙ*aₙ is called the regression of b

     estimation interpretation:
     - estimate a parameter vector x based on an imperfect linear vector measurement b
     - consider a linear measurement model y = Ax + v, where y ∈ R^m is a vector measurement, x ∈ R^n is the vector of parameters to be estimated, and v ∈ R^m is a measurement error that is unknown but presumed to be small in ‖·‖
     - x* is the best guess of x, given y = b

  5. Interpretations

     geometric interpretation:
     - find a projection of the point b onto the subspace R(A) in ‖·‖, i.e., a point in R(A) that is closest to b, i.e., an optimal point of

           min_u ‖u − b‖   s.t. u ∈ R(A)

     - Ax* is the point in R(A) closest to b

     design interpretation:
     - choose a vector of design variables x that achieves, as closely as possible, the target/desired results b
     - the residual vector r = Ax − b can be interpreted as the deviation between the actual results Ax and the target/desired results b
     - x* is the design that best achieves the desired results b

  6. Examples

     Least-squares approximation (‖·‖₂): equivalent problem (QP, obtained by squaring the objective):

         min_x ‖Ax − b‖₂² = r₁² + · · · + rₘ²

     - objective function f(x) = xᵀAᵀAx − 2xᵀAᵀb + bᵀb is convex quadratic
     - a point x is optimal iff it satisfies ∇f(x) = 2AᵀAx − 2Aᵀb = 0, i.e., AᵀAx = Aᵀb (the normal equations), which always have a solution
     - unique solution x* = (AᵀA)⁻¹Aᵀb if the columns of A are independent (i.e., rank A = n)
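A sketch of the two solution routes in numpy (assumed here; the slide itself is software-agnostic): solving the normal equations directly, and calling a least-squares routine.

```python
# Least-squares via the normal equations vs. numpy's lstsq.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)

# Solve the normal equations A^T A x = A^T b directly...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ...or use the numerically preferred QR/SVD-based routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True: same unique minimizer when rank A = n
```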

  7. Examples

     Chebyshev or minimax approximation (‖·‖∞):

         min_x ‖Ax − b‖∞ = max{|r₁|, . . . , |rₘ|}

     - equivalent problem (LP):

           min t            over x ∈ R^n, t ∈ R
           s.t. −t·1 ⪯ Ax − b ⪯ t·1

     Sum of absolute residuals approximation (‖·‖₁):

         min_x ‖Ax − b‖₁ = |r₁| + · · · + |rₘ|

     - equivalent problem (LP):

           min 1ᵀt          over x ∈ R^n, t ∈ R^m
           s.t. −t ⪯ Ax − b ⪯ t
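As a sketch of the LP reformulation, the Chebyshev problem can be handed to a generic LP solver; scipy's linprog is assumed here, and the stacked constraint matrix encodes −t·1 ⪯ Ax − b ⪯ t·1:

```python
# Chebyshev approximation as an LP over the variables z = (x, t).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 30, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

c = np.zeros(n + 1)                          # minimize t, the last variable
c[-1] = 1.0
ones = np.ones((m, 1))
A_ub = np.vstack([np.hstack([A, -ones]),     #  Ax - t*1 <= b
                  np.hstack([-A, -ones])])   # -Ax - t*1 <= -b
b_ub = np.concatenate([b, -b])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
x, t = res.x[:n], res.x[-1]
print(t, np.max(np.abs(A @ x - b)))          # t equals the minimax residual
```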

  8. Penalty function approximation

         min    φ(r₁) + · · · + φ(rₘ)    over x ∈ R^n, r ∈ R^m
         s.t.   r = Ax − b

     - the (residual) penalty function φ: R → R is a measure of dislike of a residual, and is assumed to be convex
     - in many cases φ is symmetric, nonnegative, and satisfies φ(0) = 0
     - interpretation: minimize the total penalty incurred by the residuals of the approximation Ax of b
     - extends the equivalent problem of ℓ_p-norm (1 ≤ p < ∞) approximation,

           min_x ‖Ax − b‖ₚᵖ = |r₁|ᵖ + · · · + |rₘ|ᵖ,

       whose objective is a separable and symmetric function of the residuals

  9. Common penalty functions

     - ℓ_p-norm penalty function: φ(u) = |u|ᵖ (1 ≤ p < ∞)
       - quadratic penalty function: φ(u) = u²
       - absolute value penalty function: φ(u) = |u|
     - deadzone-linear penalty function with deadzone width a > 0:

           φ(u) = max{0, |u| − a} = 0 for |u| ≤ a, and |u| − a for |u| > a

     - log-barrier penalty function with limit a > 0:

           φ(u) = −a² log(1 − (u/a)²) for |u| < a, and ∞ for |u| ≥ a
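These penalty functions transcribe directly into numpy; the code below is illustrative, with the formulas taken from this slide:

```python
# Direct transcriptions of the slide's penalty functions.
import numpy as np

def quadratic(u):
    return u ** 2

def absolute(u):
    return np.abs(u)

def deadzone_linear(u, a):
    # zero penalty inside the deadzone |u| <= a, linear growth outside
    return np.maximum(0.0, np.abs(u) - a)

def log_barrier(u, a):
    # finite only on the interval (-a, a); infinite at and beyond the limit
    u = np.asarray(u, dtype=float)
    out = np.full_like(u, np.inf)
    inside = np.abs(u) < a
    out[inside] = -a**2 * np.log(1 - (u[inside] / a) ** 2)
    return out

print(deadzone_linear(np.array([-1.0, 0.2, 1.5]), a=0.5))  # [0.5 0.  1. ]
```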

  10. Example

      [figure: histograms of residual amplitudes for four penalty functions
       φ(u) = |u|, φ(u) = u², φ(u) = max{0, |u| − 0.5}, φ(u) = −log(1 − u²)]

      - |u|: many zero or very small residuals, relatively more large ones
      - u²: many modest residuals, relatively fewer large ones
      - deadzone-linear: many residuals right at the edge of the 'free' zone
      - log barrier: residual distribution similar to that of the quadratic penalty, except no residuals larger than 1
      - the shape of the penalty function has a large effect on the distribution of residuals

  11. Sensitivity to outliers or large errors

      - in an estimation or regression context, an outlier is a measurement yᵢ = aᵢᵀx + vᵢ with a relatively large noise vᵢ
        - often associated with a flawed measurement or faulty data
      - ideally we would guess which measurements are outliers, and either remove them or greatly lower their weight
        - but we cannot simply assign zero penalty to very large residuals, since the fit could then make all residuals large and incur zero total penalty
      - sensitivity to outliers depends on the (relative) value of the penalty function for large residuals
      - the least sensitive convex penalty functions are those that grow linearly, i.e., like |u|, for large u; they are called robust (against outliers)

  12. Robust convex penalty functions

      - absolute value penalty function: φ(u) = |u|
      - Huber penalty function (with parameter M > 0):

            φ_hub(u) = u² for |u| ≤ M, and M(2|u| − M) for |u| > M

      - example: use an affine function f(t) = α + βt to fit 42 points (tᵢ, yᵢ) (circles), two of which are obvious outliers; a sketch follows below
        - left plot: Huber penalty with M = 1
        - right plot: the fit using the quadratic penalty (dashed) is rotated away from the non-outlier data, toward the outliers; the fit using the Huber penalty (solid) fits the non-outlier data far better
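A sketch of this example with cvxpy, whose built-in huber atom matches the definition above (u² for |u| ≤ M, and 2M|u| − M² beyond); the synthetic data stand in for the slide's 42 points:

```python
# Robust affine fit: Huber penalty vs. quadratic penalty.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(-1, 1, 42)
y = 1.0 + 2.0 * t + 0.1 * rng.standard_normal(42)
y[[5, 30]] += 5.0                       # inject two obvious outliers

alpha, beta = cp.Variable(), cp.Variable()
residual = alpha + beta * t - y

cp.Problem(cp.Minimize(cp.sum(cp.huber(residual, M=1.0)))).solve()
print("Huber fit:     alpha=%.2f beta=%.2f" % (alpha.value, beta.value))

cp.Problem(cp.Minimize(cp.sum_squares(residual))).solve()
print("quadratic fit: alpha=%.2f beta=%.2f" % (alpha.value, beta.value))
# The quadratic fit is pulled toward the outliers; the Huber fit is not.
```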

  13. Approximation with constraints

      add constraints to the basic norm approximation problem:
      - in an approximation problem, constraints can be used to ensure that the approximation Ax of b satisfies certain properties
      - in an estimation problem, constraints arise from prior knowledge of the vector x to be estimated, or of the estimation error v
      - in a geometric problem, constraints arise in determining the projection of a point b onto a set more complicated than a subspace, e.g., a cone or polyhedron

  14. Examples

      Nonnegativity constraints:

          min_x ‖Ax − b‖   s.t. x ⪰ 0

      - approximate b using a conic combination of columns of A
      - estimate an x known to be nonnegative, e.g., powers, rates
      - determine the projection of b onto the cone generated by the columns of A

      Variable bounds:

          min_x ‖Ax − b‖   s.t. l ⪯ x ⪯ u

      - estimate x with prior knowledge of an interval for each variable
      - determine the projection of b onto the image of a box under the linear mapping induced by A
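A cvxpy sketch of these two constrained variants (cvxpy, the random data, and the bounds l, u are all illustrative assumptions):

```python
# Norm approximation with nonnegativity and box constraints.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x = cp.Variable(10)
obj = cp.Minimize(cp.norm(A @ x - b, 2))

# nonnegativity: projection of b onto the cone generated by A's columns
cp.Problem(obj, [x >= 0]).solve()
print("nonnegative fit:", cp.norm(A @ x - b, 2).value)

# variable bounds l <= x <= u (a unit box here, chosen for illustration)
l, u = -np.ones(10), np.ones(10)
cp.Problem(obj, [l <= x, x <= u]).solve()
print("box-constrained fit:", cp.norm(A @ x - b, 2).value)
```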

  15. Examples

      Probability distribution:

          min_x ‖Ax − b‖   s.t. x ⪰ 0, 1ᵀx = 1

      - approximate b using a convex combination of columns of A
      - estimate proportions or relative frequencies

      Norm ball constraint:

          min_x ‖Ax − b‖   s.t. ‖x − x₀‖ ≤ d

      - estimate x with prior guess x₀ and maximum plausible deviation d
      - approximate b using a linear combination of columns of A within the trust region ‖x − x₀‖ ≤ d
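Continuing the same sketch for the simplex and norm-ball variants (x₀ and d below are illustrative placeholders):

```python
# Norm approximation over the probability simplex and a trust region.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x = cp.Variable(10)
obj = cp.Minimize(cp.norm(A @ x - b, 2))

# probability distribution: x on the simplex {x >= 0, 1^T x = 1}
cp.Problem(obj, [x >= 0, cp.sum(x) == 1]).solve()
print("simplex fit, sum(x) =", np.sum(x.value))

# norm ball / trust region around a prior guess x0
x0, d = np.zeros(10), 0.5
cp.Problem(obj, [cp.norm(x - x0, 2) <= d]).solve()
print("trust-region fit, ||x - x0|| =", np.linalg.norm(x.value - x0))
```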

  16. Least-norm problems

          min_x ‖x‖   s.t. Ax = b

      where A ∈ R^(m×n) with m ≤ n and independent rows, b ∈ R^m, and ‖·‖ is a norm on R^n
      - a solvable convex problem
      - the only feasible point is A⁻¹b when m = n
      - the problem is interesting only when m < n (Ax = b underdetermined)
      - an optimal point x* is called a least-norm solution of Ax = b in the norm ‖·‖
      - reformulation as a norm approximation problem: min_u ‖x₀ + Zu‖, where x₀ + Zu is the general solution of Ax = b, the columns of Z ∈ R^(n×(n−m)) span the null space of A, and u ∈ R^(n−m)
      - extension: least-penalty problem

            min_x φ(x₁) + · · · + φ(xₙ)   s.t. Ax = b

        with φ convex, nonnegative, and φ(0) = 0
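For the Euclidean norm, the least-norm solution has the closed form x* = Aᵀ(AAᵀ)⁻¹b; a numpy sketch, which also illustrates the x₀ + Zu parameterization via a null-space basis (numpy/scipy assumed):

```python
# Least l2-norm solution of an underdetermined system Ax = b.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
m, n = 5, 12                          # m < n: underdetermined
A = rng.standard_normal((m, n))       # a generic random A has independent rows
b = rng.standard_normal(m)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # x* = A^T (A A^T)^{-1} b
print(np.allclose(A @ x_star, b))            # x* is feasible

# any other feasible point x* + Zu (columns of Z span the null space of A)
# has strictly larger Euclidean norm
Z = null_space(A)                            # Z has n - m = 7 columns here
x_other = x_star + Z @ rng.standard_normal(Z.shape[1])
print(np.allclose(A @ x_other, b),
      np.linalg.norm(x_star) < np.linalg.norm(x_other))
```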

  17. Interpretations

      control or design interpretation:
      - x gives n design variables (inputs), b gives m required results (outputs), and Ax = b represents m requirements on the design
      - the design is underspecified, with n − m degrees of freedom (as m < n)
      - choose the smallest ('most efficient') design, measured by the norm ‖·‖, that satisfies the requirements

      estimation interpretation:
      - x gives n parameters and b gives m perfect measurements
      - the measurements do not completely determine the parameters (as m < n); the prior information is that the parameters are small (measured by the norm ‖·‖)
      - choose the smallest ('most plausible') estimate consistent with the measurements

      geometric interpretation:
      - find the point in the affine set {x | Ax = b} with minimum distance (measured by the norm ‖·‖) to 0
