SLIDE 1

6. Approximation and fitting
Prof. Ying Cui
Department of Electrical Engineering, Shanghai Jiao Tong University
2018

SLIDE 2

Outline

◮ Norm approximation
◮ Least-norm problems
◮ Regularized approximation
◮ Robust approximation

SLIDE 3

Basic norm approximation problem

    min_x ‖Ax − b‖

where A ∈ R^{m×n} with m ≥ n and independent columns and b ∈ R^m are given, and ‖·‖ is a norm on R^m

◮ a solvable convex problem
◮ optimal value is zero iff b ∈ R(A) = {Ax | x ∈ R^n}
◮ solution is A^{−1}b when m = n
◮ optimal value is nonzero, and the problem is more interesting and useful, if b ∉ R(A)
◮ an optimal point x* = argmin_x ‖Ax − b‖ is called an approximate solution of Ax ≈ b in the norm ‖·‖
◮ r = Ax − b is called the residual for the problem (see the sketch below)
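
A minimal cvxpy sketch of the basic problem, assuming cvxpy is available and using synthetic data; changing the second argument of cp.norm switches among the norms used throughout this deck:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))   # m = 100 >= n = 10; independent columns w.p. 1
b = rng.standard_normal(100)

x = cp.Variable(10)
for p in (1, 2, np.inf):
    prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, p)))
    prob.solve()
    # x.value is the approximate solution of Ax ~ b in the chosen norm
    print(p, prob.value)
```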

SLIDE 4

Interpretations

approximation interpretation:
◮ fit/approximate the vector b by a linear combination of the columns of A, as closely as possible, with deviation measured in ‖·‖
◮ Ax = x_1 a_1 + · · · + x_n a_n (a_1, · · · , a_n ∈ R^m: columns of A)
◮ Ax* is the best approximation of b
◮ the approximation problem is also called a regression problem
◮ a_1, · · · , a_n are called regressors
◮ x*_1 a_1 + · · · + x*_n a_n is called the regression of b

estimation interpretation:
◮ estimate a parameter vector x based on an imperfect linear vector measurement b
◮ consider a linear measurement model y = Ax + v, where y ∈ R^m is a vector measurement, x ∈ R^n is a vector of parameters to be estimated, and v ∈ R^m is a measurement error that is unknown but presumed to be small in ‖·‖
◮ x* is the best guess of x, given y = b

SLIDE 5

Interpretations

geometric interpretation:
◮ find the projection of the point b onto the subspace R(A) in ‖·‖, i.e., the point in R(A) closest to b, i.e., an optimal point of

    min_u ‖u − b‖  s.t. u ∈ R(A)

◮ Ax* is the point in R(A) closest to b

design interpretation:
◮ choose a vector of design variables x that achieves, as closely as possible, the target/desired results b
◮ the residual vector r = Ax − b can be interpreted as the deviation between the actual results Ax and the target/desired results b
◮ x* is the design that best achieves the desired results b

SLIDE 6

Examples

least-squares approximation (‖·‖_2): equivalent problem (QP, obtained by squaring the objective):

    min_x ‖Ax − b‖_2² = r_1² + · · · + r_m²

◮ the objective function f(x) = x^T A^T A x − 2x^T A^T b + b^T b is convex quadratic
◮ a point x is optimal iff it satisfies ∇f(x) = 2A^T A x − 2A^T b = 0, i.e., the normal equations A^T A x = A^T b, which always have a solution
◮ unique solution x* = (A^T A)^{−1} A^T b if the columns of A are independent (i.e., rank A = n); see the numpy sketch below
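
A numpy sketch of the least-squares solution (the function name solve_ls is illustrative, not from the slides); np.linalg.lstsq uses an SVD rather than forming A^T A explicitly, which is the numerically preferred route:

```python
import numpy as np

def solve_ls(A, b):
    """Least-squares approximate solution, x* = (A^T A)^{-1} A^T b
    when A has independent columns."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)
x_star = solve_ls(A, b)
# optimality condition: A^T (A x* - b) = 0
assert np.allclose(A.T @ (A @ x_star - b), 0, atol=1e-8)
```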

SLIDE 7

Examples

Chebyshev or minimax approximation (‖·‖_∞):

    min_x ‖Ax − b‖_∞ = max{|r_1|, · · · , |r_m|}

◮ equivalent problem (LP):

    min_{x∈R^n, t∈R} t  s.t. −t1 ⪯ Ax − b ⪯ t1

Sum of absolute residuals approximation (‖·‖_1):

    min_x ‖Ax − b‖_1 = |r_1| + · · · + |r_m|

◮ equivalent problem (LP):

    min_{x∈R^n, t∈R^m} 1^T t  s.t. −t ⪯ Ax − b ⪯ t

(a sketch of the Chebyshev LP follows)
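
A sketch of the Chebyshev LP with scipy's linprog, stacking z = (x, t) as one variable vector; this is the standard epigraph trick, and the function name is my own:

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_approx(A, b):
    """Solve min_x ||Ax - b||_inf as an LP over z = (x, t)."""
    m, n = A.shape
    c = np.r_[np.zeros(n), 1.0]          # objective: minimize t
    ones = np.ones((m, 1))
    # Ax - b <= t*1  and  -(Ax - b) <= t*1, rewritten as A_ub z <= b_ub
    A_ub = np.block([[A, -ones], [-A, -ones]])
    b_ub = np.r_[b, -b]
    bounds = [(None, None)] * n + [(0, None)]   # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[n]           # (x*, optimal value ||Ax*-b||_inf)
```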

SLIDE 8

Penalty function approximation

    min_{x∈R^n, r∈R^m} φ(r_1) + · · · + φ(r_m)  s.t. r = Ax − b

◮ the (residual) penalty function φ : R → R is a measure of dislike of a residual, and is assumed to be convex
◮ in many cases, φ is symmetric, nonnegative, and satisfies φ(0) = 0
◮ interpretation: minimize the total penalty incurred by the residuals of the approximation Ax of b
◮ extension of the equivalent problem of l_p-norm (1 ≤ p < ∞) approximation

    min_x ‖Ax − b‖_p^p = |r_1|^p + · · · + |r_m|^p

with a separable and symmetric function of the residuals as objective function

SLIDE 9

Common penalty functions

◮ l_p-norm penalty function: φ(u) = |u|^p (1 ≤ p < ∞)
  ◮ quadratic penalty function: φ(u) = u²
  ◮ absolute value penalty function: φ(u) = |u|
◮ deadzone-linear penalty function with deadzone width a > 0:

    φ(u) = max{0, |u| − a} = { 0,        |u| ≤ a
                             { |u| − a,  |u| > a

◮ log-barrier penalty function with limit a > 0:

    φ(u) = { −a² log(1 − (u/a)²),  |u| < a
           { ∞,                    |u| ≥ a

(numpy implementations of these penalties are sketched below)
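
For concreteness, a numpy sketch of the four penalties, vectorized over a residual array (naming is my own):

```python
import numpy as np

def quad(u):
    return u ** 2

def absval(u):
    return np.abs(u)

def deadzone(u, a=0.5):
    """Zero penalty inside |u| <= a, linear growth outside."""
    return np.maximum(0.0, np.abs(u) - a)

def log_barrier(u, a=1.0):
    """-a^2 log(1 - (u/a)^2) for |u| < a, +inf otherwise."""
    out = np.full_like(u, np.inf, dtype=float)
    inside = np.abs(u) < a
    # log1p(-(u/a)^2) = log(1 - (u/a)^2)
    out[inside] = -a**2 * np.log1p(-(u[inside] / a) ** 2)
    return out

u = np.linspace(-0.95, 0.95, 5)
print(quad(u), absval(u), deadzone(u), log_barrier(u), sep="\n")
```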

SLIDE 10

Example

histogram of residual amplitudes for four penalty functions: φ(u) = |u|, φ(u) = u², φ(u) = max{0, |u| − 0.5}, φ(u) = −log(1 − u²)
◮ |u|: many zero or very small residuals, more large ones
◮ u²: many modest residuals, relatively fewer large ones
◮ deadzone-linear: many residuals right at the edge of the 'free' zone
◮ log barrier: residual distribution similar to that of the quadratic, except no residuals larger than 1
◮ the shape of the penalty function has a large effect on the distribution of residuals

SLIDE 11

Sensitivity to outliers or large errors

◮ in an estimation or regression context, an outlier is a measurement y_i = a_i^T x + v_i with a relatively large noise v_i
◮ often associated with a flawed measurement or faulty data
◮ ideally, guess which measurements are outliers, and either remove them or greatly lower their weight; this cannot be done by assigning zero penalty to very large residuals, since the fit could then make all residuals large to achieve a total penalty of zero
◮ sensitivity to outliers depends on the (relative) value of the penalty function for large residuals
◮ the least sensitive convex penalty functions are those that grow linearly, i.e., like |u|, for large u; such penalties are called robust (against outliers)

SLIDE 12

Robust convex penalty functions

◮ absolute value penalty function: φ(u) = |u|
◮ Huber penalty function (with parameter M > 0):

    φ_hub(u) = { u²,            |u| ≤ M
               { M(2|u| − M),   |u| > M

◮ example: use an affine function f(t) = α + βt to fit 42 points (t_i, y_i) (circles) with two obvious outliers
◮ left: Huber penalty for M = 1
◮ right: the fit using the quadratic penalty (dashed) is rotated away from the non-outlier data, toward the outliers; the fit using the Huber penalty (solid) gives a far better fit to the non-outlier data (a sketch of the Huber fit follows)
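
A minimal sketch of the affine Huber fit in cvxpy (cp.huber matches φ_hub above); the data here is synthetic, with two planted outliers standing in for the slide's example:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(-1, 1, 42)
y = 1.0 + 2.0 * t + 0.1 * rng.standard_normal(42)
y[[5, 30]] += 5.0                                  # two obvious outliers

alpha, beta = cp.Variable(), cp.Variable()
resid = alpha + beta * t - y
# Huber penalty with parameter M = 1
cp.Problem(cp.Minimize(cp.sum(cp.huber(resid, M=1)))).solve()
print("Huber fit:", alpha.value, beta.value)

# quadratic-penalty (least-squares) fit, pulled toward the outliers
a2, b2 = cp.Variable(), cp.Variable()
cp.Problem(cp.Minimize(cp.sum_squares(a2 + b2 * t - y))).solve()
print("LS fit:   ", a2.value, b2.value)
```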

SLIDE 13

Approximation with constraints

add constraints to the basic norm approximation problem
◮ in an approximation problem, constraints can be used to ensure that the approximation Ax of b satisfies certain properties
◮ in an estimation problem, constraints arise from prior knowledge of the vector x to be estimated, or from prior knowledge of the estimation error v
◮ in a geometric problem, constraints arise in determining the projection of a point b onto a set more complicated than a subspace, e.g., a cone or polyhedron

SLIDE 14

Examples

Nonnegativity constraints:

    min_x ‖Ax − b‖  s.t. x ⪰ 0

◮ approximate b using a conic combination of the columns of A
◮ estimate an x known to be nonnegative, e.g., powers, rates
◮ determine the projection of b onto the cone generated by the columns of A

Variable bounds:

    min_x ‖Ax − b‖  s.t. l ⪯ x ⪯ u

◮ estimate x with prior knowledge of an interval for each variable
◮ determine the projection of b onto the image of a box under the linear mapping induced by A
(see the cvxpy sketch below)
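
A cvxpy sketch of these constrained variants, plus the probability-simplex variant from the next slide; the constraint syntax is standard cvxpy, the data synthetic:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
A, b = rng.standard_normal((30, 5)), rng.standard_normal(30)
x = cp.Variable(5)
obj = cp.Minimize(cp.norm(A @ x - b, 2))

# nonnegativity: conic combination of the columns of A
cp.Problem(obj, [x >= 0]).solve()

# variable bounds l <= x <= u
l, u = -np.ones(5), np.ones(5)
cp.Problem(obj, [l <= x, x <= u]).solve()

# probability simplex: convex combination of the columns of A
cp.Problem(obj, [x >= 0, cp.sum(x) == 1]).solve()
print(x.value)
```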

SLIDE 15

Examples

Probability distribution:

    min_x ‖Ax − b‖  s.t. x ⪰ 0, 1^T x = 1

◮ approximate b using a convex combination of the columns of A
◮ estimate proportions or relative frequencies

Norm ball constraint:

    min_x ‖Ax − b‖  s.t. ‖x − x_0‖ ≤ d

◮ estimate x with a prior guess x_0 and maximum plausible deviation d
◮ approximate b using a linear combination of the columns of A within the trust region ‖x − x_0‖ ≤ d

SLIDE 16

Least-norm problems

    min_x ‖x‖  s.t. Ax = b

where A ∈ R^{m×n} with m ≤ n and independent rows, b ∈ R^m, and ‖·‖ is a norm on R^n

◮ a solvable convex problem
◮ the only feasible point is A^{−1}b when m = n
◮ the problem is interesting only when m < n (Ax = b is underdetermined)
◮ an optimal point x* is called a least-norm solution of Ax = b in the norm ‖·‖
◮ reformulation as a norm approximation problem: min_u ‖x_0 + Zu‖, where x_0 is any solution of Ax = b and the columns of Z ∈ R^{n×(n−m)} form a basis for the nullspace of A, so that x_0 + Zu, u ∈ R^{n−m}, parameterizes the general solution of Ax = b
◮ extension: least-penalty problem

    min_x φ(x_1) + · · · + φ(x_n)  s.t. Ax = b

where φ : R → R is convex, nonnegative, and satisfies φ(0) = 0
(see the cvxpy sketch below)
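
A minimal cvxpy sketch of the least-norm problem; swapping the p argument of cp.norm changes the norm, and the nonzero counts hint at the sparsity behavior discussed on slide 18:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
A, b = rng.standard_normal((5, 20)), rng.standard_normal(5)  # m = 5 < n = 20

x = cp.Variable(20)
for p in (1, 2, np.inf):
    cp.Problem(cp.Minimize(cp.norm(x, p)), [A @ x == b]).solve()
    print(p, "nonzeros:", int(np.sum(np.abs(x.value) > 1e-6)))
```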

SLIDE 17

Interpretations

control or design interpretation:
◮ x gives n design variables (inputs), b gives m required results (outputs), and Ax = b represents m requirements on the design
◮ the design is underspecified, with n − m degrees of freedom (as m < n)
◮ choose the smallest ('most efficient') design (measured by the norm ‖·‖) that satisfies the requirements

estimation interpretation:
◮ x gives n parameters, and b gives m perfect measurements
◮ the measurements do not completely determine the parameters (as m < n), and the prior information is that the parameters are small (measured by the norm ‖·‖)
◮ choose the smallest ('most plausible') estimate consistent with the measurements

geometric interpretation:
◮ find the point in the affine set {x | Ax = b} with minimum distance (measured by the norm ‖·‖) to 0

SLIDE 18

Examples

least-squares solution of linear equations (‖·‖_2):
◮ equivalent problem (strictly convex):

    min_x x^T x  s.t. Ax = b

◮ can be solved via the KKT conditions:

    Ax* = b, 2x* + A^T ν* = 0  ⟹  x* = A^T(AA^T)^{−1} b, ν* = −2(AA^T)^{−1} b

Sparse solutions via least l_1-norm (‖·‖_1):
◮ equivalent problem (LP):

    min_{x,y∈R^n} 1^T y  s.t. −y ⪯ x ⪯ y, Ax = b

◮ tends to produce sparse solutions of Ax = b (see the sketch below)
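
A sketch contrasting the closed-form least l_2-norm solution with the l_1 problem (cvxpy handles the LP reformulation internally); data is synthetic:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
A, b = rng.standard_normal((10, 50)), rng.standard_normal(10)

# least l2-norm solution: x* = A^T (A A^T)^{-1} b
x_l2 = A.T @ np.linalg.solve(A @ A.T, b)

# least l1-norm solution
x = cp.Variable(50)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == b]).solve()

print("l2 nonzeros:", int(np.sum(np.abs(x_l2) > 1e-6)))      # typically dense
print("l1 nonzeros:", int(np.sum(np.abs(x.value) > 1e-6)))   # typically sparse
```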

SLIDE 19

Regularized approximation

Bi-criterion formulation (basic form): find x that is small and also makes the residual Ax − b small

    min_x (w.r.t. R²₊) (‖Ax − b‖, ‖x‖)

where A ∈ R^{m×n}, and the two norms (on R^m and R^n) can be different

◮ a (convex) vector optimization problem with two objectives
◮ first norm: measures the size of the residual; second norm: measures the size of x
◮ Pareto optimal points form the optimal trade-off curve between the two objectives
◮ one endpoint: (‖b‖, 0), attained by x = 0; the other endpoint: (min_x ‖Ax − b‖, min_{x∈C} ‖x‖), where C = argmin_x ‖Ax − b‖

SLIDE 20

Regularized approximation

scalarization is a common method for a multi-criterion problem; two forms of scalarized problems:
◮ minimize the weighted sum of objectives:

    min_x ‖Ax − b‖ + γ‖x‖

  ◮ the solution for γ > 0 traces out the optimal trade-off curve
◮ minimize the weighted sum of squared objectives:

    min_x ‖Ax − b‖² + δ‖x‖²

  ◮ the solution for δ > 0 traces out the optimal trade-off curve

Tikhonov regularization (‖·‖_2):

    min_x ‖Ax − b‖_2² + δ‖x‖_2² = x^T(A^T A + δI)x − 2b^T Ax + b^T b

◮ a convex quadratic problem (as A^T A + δI ≻ 0 for any δ > 0)
◮ analytical solution x* = (A^T A + δI)^{−1} A^T b (sketch below)
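
A numpy sketch of the Tikhonov solution; sweeping δ traces out the trade-off between residual size and ‖x‖ (names illustrative):

```python
import numpy as np

def tikhonov(A, b, delta):
    """x* = (A^T A + delta*I)^{-1} A^T b, delta > 0."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)

rng = np.random.default_rng(5)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
for delta in (1e-3, 1e-1, 1e1):
    x = tikhonov(A, b, delta)
    print(delta, np.linalg.norm(A @ x - b), np.linalg.norm(x))
```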

SLIDE 21

Interpretations

◮ estimation interpretation: linear measurement model y = Ax + v, with prior knowledge that x is small
◮ design interpretation: the design cost ‖x‖ should be small
◮ approximation interpretation: the linear model y = Ax is only valid for small x
◮ robust approximation interpretation: variation in A causes large variation in Ax for large x, so a small x is less sensitive to errors in A

SLIDE 22

Example

Optimal input design: consider a linear dynamical system

    y(t) = Σ_{τ=0}^{t} h(τ) u(t − τ),  t = 0, 1, · · · , N

◮ input sequence u(0), u(1), · · · , u(N) ∈ R
◮ output sequence y(0), y(1), · · · , y(N) ∈ R
◮ impulse response h(0), h(1), · · · , h(N) ∈ R

choose the input sequence u to achieve three goals:
◮ output tracking (desired target y_des): J_track = (1/(N+1)) Σ_{t=0}^{N} (y(t) − y_des(t))²
◮ small input: J_mag = (1/(N+1)) Σ_{t=0}^{N} u(t)²
◮ small input variations: J_der = (1/N) Σ_{t=0}^{N−1} (u(t+1) − u(t))²

regularized least-squares formulation: min_u J_track + δ J_der + η J_mag
◮ for fixed δ, η > 0, a least-squares problem in u
◮ vary δ, η > 0 to trade off the three objectives (see the sketch below)
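
A numpy/scipy sketch of this example; the convolution is written as a lower-triangular Toeplitz matrix H so that y = Hu, and the target y_des here is a placeholder of my own (the slide's actual target is not given):

```python
import numpy as np
from scipy.linalg import toeplitz

N = 200
t = np.arange(N + 1)
h = (1 / 9) * 0.9**t * (1 - 0.4 * np.cos(2 * t))  # impulse response from slide 23
H = toeplitz(h, np.zeros(N + 1))                  # y = H u (causal convolution)
y_des = np.sign(np.sin(0.05 * t))                 # placeholder target signal

D = np.diff(np.eye(N + 1), axis=0)                # first-difference matrix for J_der

def input_design(delta, eta):
    """Minimize J_track + delta*J_der + eta*J_mag: a regularized LS problem.
    Setting the gradient to zero gives the linear system solved below."""
    M = H.T @ H / (N + 1) + delta * D.T @ D / N + eta * np.eye(N + 1) / (N + 1)
    return np.linalg.solve(M, H.T @ y_des / (N + 1))

u = input_design(delta=0.3, eta=0.05)
print("tracking MSE:", np.mean((H @ u - y_des) ** 2))
```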

SLIDE 23

Example

optimal inputs (left) and resulting outputs (right) for three values of δ (input variation) and η (input magnitude)
◮ N = 200, h(t) = (1/9)(0.9)^t (1 − 0.4 cos(2t)), dashed line: y_des
◮ top (δ = 0, η = 0.005): tracking is good, but the input required is large and rapidly varying
◮ middle (δ = 0, η = 0.05): the input is smaller, at the cost of a larger tracking error
◮ bottom (δ = 0.3, η = 0.05): the input variation is substantially reduced, with not much increase in the output tracking error

SLIDE 24

Signal reconstruction, de-noising, and smoothing

given a corrupted signal x_cor, form an estimate x̂ of the original signal x, i.e., remove the noise from x_cor; this often amounts to a smoothing operation on x_cor

    min_{x̂} (w.r.t. R²₊) (‖x̂ − x_cor‖_2, φ(x̂))

◮ x ∈ R^n is the unknown signal
◮ x_cor = x + v is the (known) corrupted version of x, with additive noise v
◮ the variable x̂ (reconstructed signal) is the estimate of x
◮ φ : R^n → R is a regularization function or smoothing objective, measuring the roughness or lack of smoothness of the estimate x̂
  ◮ quadratic smoothing function: φ_quad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²
  ◮ total variation smoothing function: φ_tv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|
◮ a convex bi-criterion problem: Pareto optimal points can be found by scalarization, solving a (scalar) convex problem (see the cvxpy sketch below)
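
A cvxpy sketch of scalarized smoothing on synthetic data; cp.tv implements exactly the total-variation sum above, the quadratic version is written with cp.diff, and mu is the scalarization weight (my notation):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
n = 400
x = np.cumsum(rng.standard_normal(n)) / 10     # slowly varying "original" signal
x_cor = x + 0.2 * rng.standard_normal(n)       # corrupted version

xhat = cp.Variable(n)
mu = 5.0   # larger mu => smoother estimate

# quadratic smoothing: phi_quad = sum of squared first differences
cp.Problem(cp.Minimize(
    cp.sum_squares(xhat - x_cor) + mu * cp.sum_squares(cp.diff(xhat))
)).solve()
x_quad = xhat.value

# total variation reconstruction: preserves occasional sharp edges
cp.Problem(cp.Minimize(
    cp.sum_squares(xhat - x_cor) + mu * cp.tv(xhat)
)).solve()
x_tv = xhat.value
```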

SLIDE 25

Examples

quadratic smoothing: φ_quad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)² works well for a very smooth original signal and rapidly varying noise

(a) original signal x and corrupted signal x_cor; (b) three solutions on the optimal trade-off curve ‖x̂ − x_cor‖_2 vs. φ_quad(x̂)

◮ large ‖x̂ − x_cor‖_2 (top): too much smoothing
◮ ‖x̂ − x_cor‖_2 = 3 (middle): best reconstruction
◮ small ‖x̂ − x_cor‖_2 (bottom): too little smoothing

SLIDE 26

Examples

quadratic smoothing: φ_quad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)² is not suitable for an original signal with rapid variations

(c) original signal x and corrupted signal x_cor; (d) three solutions on the optimal trade-off curve ‖x̂ − x_cor‖_2 vs. φ_quad(x̂)

◮ first two: rapid variations in the signal are smoothed out along with the noise
◮ third: steep edges in the signal are preserved, but significant noise is left

SLIDE 27

Examples

total variation reconstruction: φ_tv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i| removes most of the noise while preserving occasional rapid variations in the signal

(e) original signal x and corrupted signal x_cor; (f) three solutions on the optimal trade-off curve ‖x̂ − x_cor‖_2 vs. φ_tv(x̂)

◮ φ_tv(x̂) = 5 (top): eliminates some slow variations in the signal
◮ φ_tv(x̂) = 8 (middle): best reconstruction
◮ φ_tv(x̂) = 10 (bottom): does not remove enough noise

SLIDE 28

Robust approximation

two approaches for minimizing ‖Ax − b‖ with uncertain A:
◮ stochastic robust approximation: assume A is random, and consider

    min_x E ‖Ax − b‖

◮ worst-case robust approximation: assume A lies in a (nonempty and bounded) set A of possible values, and consider

    min_x sup_{A∈A} ‖Ax − b‖

both are always convex problems, but tractable only in special cases: certain norms ‖·‖, distributions, and sets A

SLIDE 29

Stochastic robust least-squares problem

A = Ā + U, where U is random with E U = 0 and E U^T U = P

    min_x E ‖Ax − b‖_2²

◮ explicit expression for the objective:

    E ‖Ax − b‖_2² = E (Āx − b + Ux)^T (Āx − b + Ux) = ‖Āx − b‖_2² + E x^T U^T U x = ‖Āx − b‖_2² + x^T P x

◮ equivalent problem (a regularized least-squares problem):

    min_x ‖Āx − b‖_2² + ‖P^{1/2} x‖_2²

◮ when A is subject to variation, balance making ‖Āx − b‖ small against the desire for a small x (to keep the variation in Ax small)
◮ optimal solution: x* = (Ā^T Ā + P)^{−1} Ā^T b (sketch below)
◮ when P = δI (U_ij: zero-mean, uncorrelated random variables with variance δ/m), we recover the Tikhonov regularization problem

    min_x ‖Āx − b‖_2² + δ‖x‖_2²
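
A numpy sketch checking the closed form against a Monte Carlo estimate of the stochastic objective, under the slide's assumption P = δI (i.e., i.i.d. entries of U with variance δ/m):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, delta = 50, 8, 0.5
A_bar = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# closed-form solution x* = (A^T A + P)^{-1} A^T b with P = delta*I
x_star = np.linalg.solve(A_bar.T @ A_bar + delta * np.eye(n), A_bar.T @ b)

# Monte Carlo: E||(A_bar + U)x - b||^2 should match ||A_bar x - b||^2 + delta*||x||^2
mc = np.mean([
    np.sum(((A_bar + np.sqrt(delta / m) * rng.standard_normal((m, n))) @ x_star - b) ** 2)
    for _ in range(2000)
])
print(mc, np.sum((A_bar @ x_star - b) ** 2) + delta * np.sum(x_star ** 2))
```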

SLIDE 30

Worst-case robust least-squares problem

A ∈ A = {Ā + u_1 A_1 + · · · + u_p A_p | ‖u‖_2 ≤ 1}, where ‖·‖_2 is the Euclidean norm on R^p, and the p + 1 matrices Ā, A_1, · · · , A_p ∈ R^{m×n} are given

    min_x sup_{A∈A} ‖Ax − b‖_2² = min_x sup_{‖u‖_2≤1} ‖P(x)u + q(x)‖_2²

where P(x) = [A_1x  A_2x  · · ·  A_px] ∈ R^{m×p} and q(x) = Āx − b ∈ R^m

◮ strong duality holds (for given x) between

    max_u ‖P(x)u + q(x)‖_2²  s.t. ‖u‖_2² ≤ 1

and

    min_{t,λ} t + λ  s.t. [ I       P(x)  q(x) ]
                          [ P(x)^T  λI    0    ]  ⪰ 0
                          [ q(x)^T  0     t    ]

◮ equivalent problem (SDP):

    min_{x,t,λ} t + λ  s.t. [ I       P(x)  q(x) ]
                            [ P(x)^T  λI    0    ]  ⪰ 0
                            [ q(x)^T  0     t    ]

(a cvxpy sketch follows)
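
A sketch of this SDP in cvxpy, assuming an SDP-capable solver such as the bundled SCS; the block matrix is assembled with cp.bmat, and the data is synthetic:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
m, n, p = 10, 5, 2
A_bar = rng.standard_normal((m, n))
As = [0.1 * rng.standard_normal((m, n)) for _ in range(p)]
b = rng.standard_normal(m)

x = cp.Variable(n)
t, lam = cp.Variable(), cp.Variable()

Px = cp.hstack([cp.reshape(Ai @ x, (m, 1)) for Ai in As])  # P(x), m x p
q = cp.reshape(A_bar @ x - b, (m, 1))                      # q(x), m x 1

# the LMI from the slide, as one symmetric block matrix
M = cp.bmat([
    [np.eye(m), Px,               q],
    [Px.T,      lam * np.eye(p),  np.zeros((p, 1))],
    [q.T,       np.zeros((1, p)), cp.reshape(t, (1, 1))],
])
cp.Problem(cp.Minimize(t + lam), [M >> 0]).solve()
print("worst-case robust x:", x.value)
```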

SLIDE 31

Comparison example

A(u) = A_0 + uA_1 with u ∈ [−1, 1]
◮ nominal optimal: x_nom = argmin_x ‖A_0x − b‖_2²
◮ stochastic robust optimal: x_stoch = argmin_x E ‖A(u)x − b‖_2², with u uniformly distributed on [−1, 1]
◮ worst-case robust optimal: x_wc = argmin_x sup_{u∈[−1,1]} ‖A(u)x − b‖_2²

residual r(u) = ‖A(u)x − b‖_2 versus u for x = x_nom, x_stoch, x_wc:
◮ x_nom: smallest residual at u = 0, but quite sensitive to parameter variation
◮ x_wc: a larger residual at u = 0, but not sensitive to parameter variation
◮ x_stoch: in between
(a sketch computing the three solutions follows)
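
A cvxpy sketch of the three solutions on synthetic data. Two facts carry the reformulations: for u uniform on [−1, 1], E[u] = 0 and E[u²] = 1/3, which gives the stochastic objective in closed form; and the objective is convex quadratic in u, so the sup over [−1, 1] is attained at u = ±1, giving the max-of-two form:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(9)
A0 = rng.standard_normal((20, 10))
A1 = 0.5 * rng.standard_normal((20, 10))
b = rng.standard_normal(20)

def solve(objective_fn):
    x = cp.Variable(10)
    cp.Problem(cp.Minimize(objective_fn(x))).solve()
    return x.value

# nominal: ||A0 x - b||^2
x_nom = solve(lambda x: cp.sum_squares(A0 @ x - b))

# stochastic: E||A(u)x - b||^2 = ||A0 x - b||^2 + (1/3)||A1 x||^2
x_stoch = solve(lambda x: cp.sum_squares(A0 @ x - b) + cp.sum_squares(A1 @ x) / 3)

# worst case: sup over u in [-1, 1] attained at the endpoints u = +/-1
x_wc = solve(lambda x: cp.maximum(cp.sum_squares((A0 + A1) @ x - b),
                                  cp.sum_squares((A0 - A1) @ x - b)))
```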