

Outline: Need for Regularization · Currently Used Regularizations · Why: Remaining Question · Probabilistic Regularization · General Regularization · Scale-Invariance · Shift-Invariance · Why LASSO · Beyond EN and CLOT?

Why LASSO, EN, and CLOT: Invariance-Based Explanation

Hamza Alkhatib1, Ingo Neumann1, Vladik Kreinovich2, and Chon Van Le3

1Geodetic Institute, Leibniz University of Hannover

Hannover, Germany alkhatib@gih.uni-hannover.de neumann@gih.uni-hannover.de

2University of Texas at El Paso

El Paso, Texas 79968, USA, vladik@utep.edu

3International University – VNU HCMC

Ho Chi Minh City, Vietnam, lvchon@hcmiu.edu.vn


1. Need for Solving the Inverse Problem

  • Once we have a model of a system,
– we can use this model to predict the system's behavior,
– in particular, to predict the results of future measurements and observations of this system.
  • The problem of estimating future measurement results based on the model is known as the forward problem.
  • In many practical situations, we do not know the exact model.
  • To be more precise:
– we know the general form of a dependence between physical quantities,
– but the parameters of this dependence need to be determined from the observations.


2. Need for Inverse Problem (cont-d)

  • For example, often, we have a linear model
y = a0 + Σ_{i=1}^n ai · xi.
  • The parameters ai need to be experimentally determined.
  • In general, we need to determine the parameters of the model based on the measurement results.
  • This problem is known as the inverse problem.
  • To actually find the parameters, we can use, e.g., the Maximum Likelihood method.


3. Need for Inverse Problem (cont-d)

  • For example:
– when the errors are normally distributed,
– the Maximum Likelihood procedure results in the usual Least Squares estimates.
  • For example, for a general linear model with parameters ai:
– once we know several tuples of corresponding values (x1^(k), . . . , xn^(k), y^(k)), 1 ≤ k ≤ K,
– then we can find the parameters from the condition that
Σ_{k=1}^K (y^(k) − (a0 + Σ_{i=1}^n ai · xi^(k)))^2 → min over a0, . . . , an.
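The least-squares condition above can be illustrated with a short sketch (not part of the talk; the data values are made up): for a single input x, the minimizing a0 and a1 have the familiar closed form a1 = cov(x, y)/var(x), a0 = mean(y) − a1 · mean(x).

```python
# Least-squares fit of y = a0 + a1 * x from K observed pairs
# (a minimal one-variable illustration of the condition above).
def least_squares_fit(xs, ys):
    K = len(xs)
    mean_x = sum(xs) / K
    mean_y = sum(ys) / K
    # Closed-form minimizer of sum_k (y_k - (a0 + a1 * x_k))^2:
    a1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
         sum((x - mean_x) ** 2 for x in xs)
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Noise-free data generated from y = 2 + 3x: the fit recovers a0 = 2, a1 = 3.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 + 3.0 * x for x in xs]
print(least_squares_fit(xs, ys))  # (2.0, 3.0)
```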


4. Need for Regularization

  • In some practical situations:
– based on the measurement results,
– we can determine all the model's parameters with reasonable accuracy.
  • Often, however, several different combinations of parameters are consistent with all the measurement results.
  • Such inverse problems are called ill-defined.
  • E.g., in dynamical systems, the observations provide a smoothed picture of the system's dynamics.
  • For example, we can be tracing the motion of a mechanical system caused by an external force.


5. Need for Regularization (cont-d)

  • Then:
– a strong but short-time force in one direction followed by
– a similar strong and short-time force in the opposite direction
will (almost) cancel each other.
  • So the same almost-unchanging behavior is consistent both:
– with the absence of forces and
– with the above wildly-oscillating force.
  • A similar phenomenon occurs when:
– based on the observed economic behavior,
– we try to reconstruct the external forces affecting the economic system.


6. Need for Regularization (cont-d)

  • In such situations:
– the only way to narrow down the set of possible solutions
– is to take into account some general a priori information.
  • For example, for forces, we may know – e.g., from experts – the upper bound.
  • The use of such a priori information is known as regularization.


7. Which Regularizations Are Currently Used

  • There are many possible regularizations.
  • Many of them have been tried.
  • Based on the results of these tries, a few techniques turned out to be empirically successful.
  • The most widely used technique of this type is known as the LASSO technique.
  • LASSO is short for Least Absolute Shrinkage and Selection Operator.
  • We require that the sum of the absolute values
‖a‖1 def= Σ_{i=0}^n |ai|
be bounded by some number.


8. Currently Used Regularizations (cont-d)

  • Another widely used method is the ridge regression method, in which we limit the sum of the squares
S def= Σ_{i=0}^n ai^2.
  • This is equivalent to bounding its square root ‖a‖2 def= √S.
  • Very promising are also:
– the Elastic Net (EN) method, in which we limit a linear combination ‖a‖1 + c · S, and
– the Combined L-One and Two (CLOT) method, in which we limit a linear combination ‖a‖1 + c · ‖a‖2.
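The four constraint terms just listed can be sketched in plain Python (a sketch, not part of the talk; the weight c is the method's tuning parameter, and the sample values are made up):

```python
import math

# The four regularization terms: LASSO (l1), ridge-related (l2), EN, and CLOT.
def l1(a):               # LASSO: sum of absolute values
    return sum(abs(ai) for ai in a)

def l2(a):               # square root of the ridge sum S = sum of squares
    return math.sqrt(sum(ai ** 2 for ai in a))

def elastic_net(a, c):   # EN: limit l1(a) + c * S
    return l1(a) + c * sum(ai ** 2 for ai in a)

def clot(a, c):          # CLOT: limit l1(a) + c * l2(a)
    return l1(a) + c * l2(a)

a = [3.0, -4.0]
print(l1(a), l2(a))        # 7.0 5.0
print(elastic_net(a, 0.1)) # 7.0 + 0.1 * 25 = 9.5
print(clot(a, 0.1))        # 7.0 + 0.1 * 5  = 7.5
```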


9. Why: Remaining Question and What We Do in This Talk

  • The above empirical facts prompt a natural question: why do the above regularization techniques work best?
  • We show that the efficiency of these methods can be explained by natural invariance requirements.


10. General Idea of Regularization and Its Possible Probabilistic Background

  • In general, regularization means that we dismiss values ai which are too large or too small.
  • In some cases, this dismissal is based on subjective estimations of what is large and what is small.
  • In other cases, the conclusion about what is large and what is not large is based on past experience.
  • So, it is based on the frequencies (= probabilities) with which different values have been observed in the past.
  • In this talk, we consider both types of regularization.

11. Probabilistic Regularization: Towards a Precise Definition

  • There is no a priori reason to believe that different parameters have different distributions.
  • So, in the first approximation, it makes sense to assume that they have the same probability distribution.
  • Let us denote the probability density function of this common distribution by ρ(a).
  • In other words, the original information is invariant w.r.t. all possible permutations of parameters.
  • Then, the resulting joint distribution should also be invariant with respect to all the permutations.
  • This implies, in particular, that all the marginal distributions are the same.


12. Probabilistic Regularization (cont-d)

  • Similarly, in general, we do not have a priori reasons to prefer positive or negative values of each ai.
  • So, the a priori information is invariant with respect to changing the sign of each of the variables: ai → −ai.
  • It is therefore reasonable to conclude that the marginal distribution should also be invariant.
  • So, we should have ρ(−a) = ρ(a), i.e., ρ(a) = ρ(|a|).
  • Also, there is no reason to believe that different parameters are positively or negatively correlated.
  • So it makes sense to assume that their distributions are statistically independent.
  • This is in line with the general Maximum Entropy (= Laplace Indeterminacy Principle) ideas.


13. Probabilistic Regularization (cont-d)

  • According to these ideas, we should not pretend to be certain.
  • To be more precise, if several different probability distributions are consistent with our knowledge:
– we should not select distributions with small entropy (measure of uncertainty),
– we should select the one for which the entropy is the largest.
  • If all we know are marginal distributions, then this principle leads to independence.
  • Due to independence, the joint distribution of the variables ai is
ρ(a0, a1, . . . , an) = Π_{i=0}^n ρ(|ai|).
  • In applications, it is usually assumed that events with very small probability cannot happen.


14. Probabilistic Regularization (cont-d)

  • This is the basis for all statistical tests.
  • Example: assume that the distribution is normal with given mean and standard deviation.
  • Assume also that the probability that this distribution will lead to the observed data is very small.
  • E.g., we observe a 5-sigma deviation from the mean.
  • Then we can conclude, with high confidence, that experiments disprove our assumption.
  • In other words, we take some threshold t0, and we consider only the tuples a = (a0, a1, . . . , an) for which
ρ(a0, a1, . . . , an) = Π_{i=0}^n ρ(|ai|) ≥ t0.


15. Probabilistic Regularization (cont-d)

  • By taking logarithms of both sides and changing signs, we get an equivalent inequality
Σ_{i=0}^n ψ(|ai|) ≤ p0, where ψ(z) def= − ln(ρ(z)) and p0 def= − ln(t0).
  • The sign is changed for convenience:
– for small t0 ≪ 1, the logarithm is negative, and
– it is more convenient to deal with positive numbers.
  • Our goal is to avoid coefficients ai whose absolute values are too large; thus:
– if the absolute values (|a0|, |a1|, . . . , |an|) satisfy the inequality,
– and we decrease one of the absolute values,
– the result should also satisfy the same inequality.


16. Probabilistic Regularization (cont-d)

  • So, the function ψ(z) must be increasing.
  • We want to find the minimum of the usual least squares (or similar) criterion under this constraint.
  • The minimum is attained:
– either when, in the constraint, we have strict inequality,
– or when we have equality.
  • If we have a strict inequality, then we get a local minimum.
  • For convex criteria like least squares, there is only one local minimum, which is also global.
  • So, this means that we have the solution of the original constraint-free problem.


17. Probabilistic Regularization (cont-d)

  • However, we consider situations in which this straightforward approach does not work.
  • Thus, we conclude that the minimum under the constraint is attained when we have the equality
Σ_{i=0}^n ψ(|ai|) = p0.
  • In practice, most probability distributions are continuous.
  • Step-wise and point-wise distributions are more typically found in textbooks than in practice.
  • Thus, it is reasonable to assume that the probability density ρ(z) is continuous.
  • Then, its negative logarithm ψ(z) = − ln(ρ(z)) is continuous as well.


18. Probabilistic Regularization (cont-d)

  • Thus, we arrive at the following definition.
  • By a probabilistic constraint, we mean the following constraint:
Σ_{i=0}^n ψ(|ai|) = p0.
  • Here, ψ(z) is a continuous increasing function, and p0 is a number.


19. General Regularization

  • In the general case, we do not get any probabilistic justification of our approach.
  • We just deal with the values |ai| themselves, without assigning probabilities to different possible values.
  • Similarly to the probabilistic case, there is no reason to conclude that:
– large positive values of ai are better or worse than
– negative values with similar absolute value.
  • Thus, we can say that a very large value a and its opposite −a are equally impossible.

20. General Regularization (cont-d)

  • The absolute value of each coefficient can thus be used as its “degree of impossibility”:
– the larger the number,
– the less possible it is that this number will appear as the absolute value of a coefficient ai.
  • Based on the degrees of impossibility of a0 and a1, we need to estimate the degree of impossibility of the pair (a0, a1).
  • Let us denote the corresponding estimate by |a0| ∗ |a1|.
  • If a1 = 0, it is reasonable to say that:
– the degree of impossibility of the pair (a0, 0) is the same as
– the degree of impossibility of a0, i.e., equal to |a0|: |a0| ∗ 0 = |a0|.


21. General Regularization (cont-d)

  • If the second coefficient is not 0, the situation becomes slightly worse than when it was 0.
  • So, if a1 ≠ 0, then |a0| ∗ |a1| > |a0| ∗ 0 = |a0|.
  • In general:
– if the absolute value of one of the coefficients increases,
– the overall degree of impossibility should increase.
  • Once we know the degree of impossibility |a0| ∗ |a1| of a pair:
– we can combine it with the degree of impossibility |a2| of the third coefficient a2, and
– get the estimated degree of impossibility (|a0| ∗ |a1|) ∗ |a2| of the triple (a0, a1, a2).


22. General Regularization (cont-d)

  • We can combine again and again, until we get the degree of impossibility of the whole tuple.
  • The result of applying this procedure should not depend on the order in which we consider the coefficients.
  • So, we should have a ∗ b = b ∗ a (commutativity) and (a ∗ b) ∗ c = a ∗ (b ∗ c) (associativity).
  • We should consider only the tuples for which the degree of impossibility does not exceed a certain threshold t0:
|a0| ∗ |a1| ∗ . . . ∗ |an| ≤ t0.
  • Thus, we arrive at the following definitions.

23. General Regularization (cont-d)

  • By a combination operation, we mean a function ∗ : ℝ × ℝ → ℝ which is:
– commutative,
– associative,
– has the property a ∗ 0 = a, and
– monotonic, in the sense that if a < a′, then a ∗ b < a′ ∗ b.
  • By a general constraint, we mean a constraint of the type
|a0| ∗ |a1| ∗ . . . ∗ |an| ≤ t0.
  • Here, ∗ is a combination operation, and t0 > 0 is a number.
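The simplest example of such an operation is addition, a ∗ b = a + b; the sketch below checks the four requirements on a few sample values (the grid of values is arbitrary):

```python
# A sketch checking that a * b := a + b satisfies the four requirements of
# a combination operation (commutative, associative, a * 0 = a, monotonic),
# on a grid of sample nonnegative values.
def combine(a, b):
    return a + b

samples = [0.0, 0.5, 1.0, 2.5]
for a in samples:
    assert combine(a, 0.0) == a                      # a * 0 = a
    for b in samples:
        assert combine(a, b) == combine(b, a)        # commutativity
        for c in samples:
            assert combine(combine(a, b), c) == combine(a, combine(b, c))
# monotonicity: a < a' implies a * b < a' * b
assert combine(1.0, 2.0) < combine(1.5, 2.0)
print("all combination-operation requirements hold on the samples")
```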


24. Scale-Invariance: General Idea

  • The numerical values of physical quantities depend on the selection of a measuring unit.
  • For example, if we previously used meters and now start using centimeters:
– all the physical quantities will remain the same, but
– the numerical values will change – they will all get multiplied by 100.
  • In general:
– if we replace the original measuring unit with a new measuring unit which is λ times smaller,
– then all the numerical values get multiplied by λ: x → x′ = λ · x.


25. Scale-Invariance (cont-d)

  • Similarly, we can change the original measuring unit for y to a new unit which is λ times smaller.
  • Then, all the coefficients ai in the dependence y = a0 + . . . + ai · xi + . . . will also change: ai → λ · ai.


26. Scale-Invariance: Case of Probabilistic Constraints

  • It is reasonable to require that the constraints should not depend on the choice of a measuring unit.
  • Of course, if we change ai to λ · ai, then the value p0 may also need to be accordingly changed.
  • However, overall, the constraint should remain the same.
  • Thus, we arrive at the following definition.
  • We say that probabilistic constraints corresponding to the function ψ(z) are scale-invariant if:
– for every p0 and for every λ > 0,
– there exists a value p0′ such that
Σ_{i=0}^n ψ(|ai|) = p0 ⇔ Σ_{i=0}^n ψ(λ · |ai|) = p0′.


27. Scale-Invariance: Case of General Constraints

  • In general, the degree of impossibility is described in the same units as the coefficients themselves.
  • Thus, invariance would mean that:
– if we replace a and b with λ · a and λ · b,
– then the combined value a ∗ b will be replaced by the similarly re-scaled value λ · (a ∗ b).
  • Thus, we arrive at the following definition.
  • We say that a constraint corresponding to ∗ is scale-invariant if for every a, b, and λ, we have
(λ · a) ∗ (λ · b) = λ · (a ∗ b).


28. Scale-Invariance (cont-d)

  • In this case, the corresponding constraint is naturally scale-invariant:
– if ∗ is a scale-invariant operation,
– then, for all ai and for all λ, we have
|λ · a0| ∗ |λ · a1| ∗ . . . ∗ |λ · an| = λ · (|a0| ∗ |a1| ∗ . . . ∗ |an|);
– so |a0| ∗ . . . ∗ |an| = t0 ⇔ |λ · a0| ∗ . . . ∗ |λ · an| = t0′ def= λ · t0.


29. Shift-Invariance: General Idea

  • Our goal is to minimize the deviations of the coefficients ai from 0.
  • In the ideal case, when the model is exact and when measurement errors are negligible:
– in situations when there is no signal at all (i.e., when ai = 0 for all i),
– we will measure exactly 0s and reconstruct exactly 0 values of ai.
  • In this case, even if we do not measure some of the quantities, we should also return all 0s.
  • In this ideal case, any deviation of the coefficients from 0 is an indication that something is not right.
  • In practice, however, all the models are approximate.

30. Shift-Invariance (cont-d)

  • Because of the model’s imperfection and measurement noise:
– even if we start with a case when ai = 0 for all i,
– we will still get some non-zero values of y and thus, some non-zero values of ai.
  • These values are small, but still non-zero.
  • In such situations, small deviations from 0 are OK; they do not necessarily indicate that something is wrong.
  • To deal with this phenomenon, we can:
– explicitly subtract an appropriate small tolerance level ε > 0
– from the absolute values of all the coefficients.
  • In other words, we can replace the original values |ai| with the new values |ai| − ε.


31. Shift-Invariance (cont-d)

  • This will explicitly take into account that:
– deviations smaller than this tolerance level are OK, and
– only deviations above this level are problematic.
  • It is reasonable to require that the corresponding constraints do not change under this shift |a| → |a| − ε.


32. Shift-Invariance: Case of Probabilistic Constraints

  • If we change |ai| to |ai| − ε, then the coefficient p0 may also need to be accordingly changed.
  • However, overall, the constraint should remain the same.
  • Thus, we arrive at the following definition.
  • We say that probabilistic constraints corresponding to the function ψ(z) are shift-invariant if:
– for every p0 and for every sufficiently small ε > 0,
– there exists a value p0′ such that
Σ_{i=0}^n ψ(|ai|) = p0 ⇔ Σ_{i=0}^n ψ(|ai| − ε) = p0′.


33. Shift-Invariance: Case of General Constraints

  • In general, the degree of impossibility is described in the same units as the coefficients themselves.
  • Thus, invariance would mean that:
– if we replace a and b with a − ε and b − ε,
– then the combined value a ∗ b will be replaced by a similarly shifted value (a ∗ b) − ε′.
  • Here, ε′ may be different from ε, since it represents deleting two small values, not just one.
  • A similar value should exist for all n.
  • Thus, we arrive at the following definition.

34. Shift-Invariance (cont-d)

  • We say that a general constraint corresponding to a combination operation ∗ is shift-invariant if:
– for every n and for all sufficiently small ε > 0,
– there exists a value ε′ > 0 such that for every a0, . . . , an > 0, we have
(a0 − ε) ∗ . . . ∗ (an − ε) = (a0 ∗ . . . ∗ an) − ε′.
  • In this case, the corresponding constraint is naturally shift-invariant:
– if ∗ is a shift-invariant operation,
– then, for all ai and for all sufficiently small ε > 0:
|a0| ∗ |a1| ∗ . . . ∗ |an| = t0 ⇔ (|a0| − ε) ∗ (|a1| − ε) ∗ . . . ∗ (|an| − ε) = t0′ def= t0 − ε′.


35. Why LASSO

  • Let us show that for both types of constraints, natural invariance requirements lead to the LASSO formulas.
  • Proposition 1. Probabilistic constraints corresponding to ψ(z) are shift- and scale-invariant if and only if ψ(z) = k · z + ℓ.
  • For a linear function, the constraint
Σ_{i=0}^n ψ(|ai|) = p0
is equivalent to the LASSO constraint
Σ_{i=0}^n |ai| = t0′, with t0′ def= (p0 − (n + 1) · ℓ)/k.
  • Thus, we explained why probabilistic constraints should be LASSO constraints.
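This equivalence for a linear ψ can be verified with a small numeric sketch (the values k = 2, ℓ = 0.3, and the coefficients are made up for illustration):

```python
# For linear psi(z) = k*z + l, the probabilistic constraint
# sum_i psi(|a_i|) = p0 is the LASSO constraint sum_i |a_i| = (p0 - (n+1)*l)/k.
# Hypothetical values for illustration:
k, l = 2.0, 0.3
a = [1.0, -0.5, 2.0]          # a_0, ..., a_n with n + 1 = 3 coefficients

lhs = sum(k * abs(ai) + l for ai in a)   # sum of psi(|a_i|)
p0 = lhs                                  # choose p0 so the constraint holds
t0_prime = (p0 - len(a) * l) / k          # (p0 - (n+1)*l)/k

print(sum(abs(ai) for ai in a), t0_prime) # both are (approximately) 3.5
```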


36. Why LASSO (cont-d)

  • Proposition 2. General constraints corresponding to ∗ are shift- and scale-invariant if and only if a ∗ b = a + b.
  • For addition, the corresponding constraint
|a0| + |a1| + . . . + |an| ≤ t0
is exactly the LASSO constraint.
  • Thus, we also explained why general constraints should be LASSO constraints.


37. Need to Go Beyond LASSO

  • We showed that:
– if we need to select a single method,
– then natural invariance requirements lead to LASSO,
– i.e., to bounds on the sum of the absolute values of the parameters.
  • In some practical situations, this works, while in others, it does not lead to good results.
  • To deal with such situations:
– instead of fixing a single method,
– a natural idea is to select a family of methods.
  • So, in each practical situation, we should select an appropriate method from this family.
  • Let us analyze how we can do it both for probabilistic and for general constraints.


38. Probabilistic Case

  • Constraints in the probabilistic case are described by the corresponding function ψ(z).
  • The LASSO case corresponds to a 2-parametric family ψ(z) = c0 + c1 · z.
  • In terms of the corresponding constraints, all the functions from this family are equivalent to ψ(z) = z.
  • To get a more general method, a natural idea is to consider a 3-parametric family, i.e., a family of the type ψ(z) = c0 + c1 · z + c2 · f(z).
  • Constraints related to this family are equivalent to using the functions ψ(z) = z + c · f(z) for some f(z).
  • Which family – i.e., which function f(z) – should we choose?


39. Probabilistic Case (cont-d)

  • A natural idea is to again use scale-invariance and shift-

invariance.

  • We say that functions ψ1(z) and ψ2(z) are constraint-equivalent (ψ1 ∼ ψ2) if:

– for each n and for each c1, there exists a value c2 such that ψ1(a0) + . . . + ψ1(an) = c1 ⇔ ψ2(a0) + . . . + ψ2(an) = c2,
– and for each n and for each c2, there exists a value c1 such that ψ2(a0) + . . . + ψ2(an) = c2 ⇔ ψ1(a0) + . . . + ψ1(an) = c1.

  • We say that a family {z + c · f(z)}c is scale-invariant if:

– for each c and λ,
– there exists a value c′ for which λ · z + c · f(λ · z) ∼ z + c′ · f(z).

slide-41
SLIDE 41


40. Probabilistic Case (cont-d)

  • We say that {z + c · f(z)}c is shift-invariant if:

– for each c and for each sufficiently small number ε,
– there exists a value c′ for which z − ε + c · f(z − ε) ∼ z + c′ · f(z).

  • Proposition 3. For smooth f(z), {z + c · f(z)}c is

scale- and shift-invariant ⇔ f(z) is quadratic.

  • Thus, it is sufficient to consider functions ψ(z) = z + c · z².

  • This is exactly the EN approach – which is thus justified by the invariance requirements.
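As a concrete illustration – our own sketch, not part of the original derivation – the EN constraint function ψ(z) = z + c · z², applied to the absolute values of the parameters and summed, can be computed as follows (the name en_penalty is ours):

```python
import numpy as np

def en_penalty(a, c):
    """Sum of psi(|a_i|) with psi(z) = z + c * z**2:
    an L1 term plus c times a squared-L2 term (the EN combination)."""
    z = np.abs(np.asarray(a, dtype=float))
    return z.sum() + c * (z ** 2).sum()

a = [1.0, -2.0, 0.5]
print(en_penalty(a, 0.0))  # c = 0 recovers the pure LASSO (L1) value: 3.5
print(en_penalty(a, 1.0))  # 3.5 + (1 + 4 + 0.25) = 8.75
```

A bound of the form en_penalty(a, c) ≤ t is then exactly an EN-style constraint on the parameter vector a.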

slide-42
SLIDE 42


41. Probabilistic Case (cont-d)

  • The general expression ψ(z) = g0 + g1 · z + g2 · z² is very natural for a different reason as well.

  • Namely, it can be viewed as keeping the first terms in

the Taylor expansion of a general function ψ(z).

slide-43
SLIDE 43


42. Case of General Constraints

  • For the case of probabilistic constraints, we used a linear combination of different functions ψ(z).

  • For the case of general constraints, it is natural to use

a linear combination of combination operations.

  • As we mention in the proof of Proposition 2, scale-invariant combination operations have the form ∥a∥p = (|a0|^p + . . . + |an|^p)^(1/p).

  • According to Proposition 3, it makes sense to use quadratic terms, i.e., ∥a∥2.

  • Thus, it makes sense to consider the combination ∥a∥1 + c · ∥a∥2 – which is exactly CLOT.

slide-44
SLIDE 44


43. Case of General Constraints (cont-d)

  • Another interpretation of CLOT is that:

– we combine ∥a∥1 and c · ∥a∥2 by using a shift- and scale-invariant combination rule,
– since, according to Proposition 2, such a rule is simply addition.

  • An interesting feature of CLOT – as opposed to EN –

is that it is scale-invariant.
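This scale-invariance of CLOT can be checked numerically. The following sketch (our own; the names clot and en are ours) verifies that the CLOT combination ∥a∥1 + c · ∥a∥2 is homogeneous of degree 1, while the EN combination is not:

```python
import numpy as np

def clot(a, c):
    """CLOT combination: L1 norm plus c times the L2 norm."""
    a = np.asarray(a, dtype=float)
    return np.abs(a).sum() + c * np.sqrt((a ** 2).sum())

def en(a, c):
    """EN combination: L1 norm plus c times the *squared* L2 norm."""
    a = np.asarray(a, dtype=float)
    return np.abs(a).sum() + c * (a ** 2).sum()

a, c, lam = np.array([1.0, -2.0, 0.5]), 0.7, 3.0
# CLOT is homogeneous of degree 1: rescaling a rescales the value,
# so a bound clot(a, c) <= t simply becomes clot(lam*a, c) <= lam*t.
assert np.isclose(clot(lam * a, c), lam * clot(a, c))
# EN mixes degree-1 and degree-2 terms, so no such rescaling works
# with the same c.
assert not np.isclose(en(lam * a, c), lam * en(a, c))
```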

  • We have thus obtained a justification of both EN and CLOT.
  • We also gained an understanding of when we should use EN and when CLOT:

– for probabilistic constraints, it is more appropriate to use EN, while
– for general constraints, it is more appropriate to use CLOT.

slide-45
SLIDE 45


44. Beyond EN and CLOT?

  • What if 1-parametric families like EN and CLOT are

not sufficient?

  • In this case, we need to consider families with more parameters {z + c1 · f1(z) + . . . + cm · fm(z)}c1,...,cm.

  • We say that a family {z + c1 · f1(z) + . . . + cm · fm(z)}c1,...,cm is scale-invariant if:

– for each tuple c = (c1, . . . , cm) and each λ,
– there exists a tuple c′ = (c′1, . . . , c′m) for which λ · z + c1 · f1(λ · z) + . . . + cm · fm(λ · z) ∼ z + c′1 · f1(z) + . . . + c′m · fm(z).

slide-46
SLIDE 46


45. Beyond EN and CLOT (cont-d)

  • We say that a family {z + c1 · f1(z) + . . . + cm · fm(z)}c1,...,cm is shift-invariant if:

– for each tuple c and for each sufficiently small number ε,
– there exists a tuple c′ for which z − ε + c1 · f1(z − ε) + . . . + cm · fm(z − ε) ∼ z + c′1 · f1(z) + . . . + c′m · fm(z).

  • Let us consider smooth fi(z).
  • Proposition 4. {z+c1·f1(z)+. . .+cm·fm(z)}c1,...,cm is

scale- and shift-invariant ⇔ all fi(z) are polynomials.

  • These polynomials must be of order ≤ m + 1.
slide-47
SLIDE 47


46. Beyond EN and CLOT (cont-d)

  • So:

– if EN and CLOT are not sufficient,
– our recommendation is to use a constraint ψ(|a0|) + . . . + ψ(|an|) = c for some higher-order polynomial ψ(z).
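A minimal sketch (ours) of evaluating such a polynomial constraint function; passing [1.0] recovers LASSO and [1.0, c] recovers EN, while longer coefficient lists give the higher-order generalization:

```python
import numpy as np

def poly_constraint_value(a, coeffs):
    """Evaluate sum_i psi(|a_i|) for the polynomial
    psi(z) = coeffs[0]*z + coeffs[1]*z**2 + ... (no constant term,
    since a constant only shifts the bound c)."""
    z = np.abs(np.asarray(a, dtype=float))
    return sum(g * (z ** (k + 1)).sum() for k, g in enumerate(coeffs))

a = [1.0, -2.0, 0.5]
print(poly_constraint_value(a, [1.0]))       # LASSO: L1 value 3.5
print(poly_constraint_value(a, [1.0, 0.5]))  # EN-style: 3.5 + 0.5 * 5.25 = 6.125
```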

  • Similarly to the quadratic case:

– the resulting general expression ψ(z) = g0 + g1 · z + . . . + g(m+1) · z^(m+1)
– can be viewed as keeping the first few terms in the Taylor expansion of a general function ψ(z).

slide-48
SLIDE 48


47. Acknowledgments

  • This work was supported by the Institute of Geodesy,

Leibniz University of Hannover.

  • It was also supported in part by the US National Science Foundation grants:

– 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science)
– and HRD-1242122 (Cyber-ShARE Center of Excellence).

  • This paper was written when V. Kreinovich was visiting Leibniz University of Hannover.

slide-49
SLIDE 49


48. Proof of Proposition 1

  • Scale-invariance implies that if ψ(a) + ψ(b) = ψ(c) +

ψ(0), then, for every λ > 0, we should have ψ(λ · a) + ψ(λ · b) = ψ(λ · c) + ψ(0).

  • Let’s subtract 2ψ(0) from both sides of each of these

equalities.

  • Then we can conclude that for the auxiliary function Ψ(z) = ψ(z) − ψ(0):

– if Ψ(a) + Ψ(b) = Ψ(c),
– then Ψ(λ · a) + Ψ(λ · b) = Ψ(λ · c).

  • Let Ψ⁻¹(z) denote the inverse function.
slide-50
SLIDE 50


49. Proof of Proposition 1 (cont-d)

  • Then, for the mapping f(z) = Ψ(λ · Ψ⁻¹(z)) that transforms z = Ψ(a) into f(z) = Ψ(λ · a):

– if z + z′ = z′′,
– then f(z) + f(z′) = f(z′′).

  • In other words, f(z + z′) = f(z) + f(z′).
  • It is known that the only monotonic functions with this

property are linear functions f(z) = c · z.

  • Since z = Ψ(a) and f(z) = Ψ(λ · a):

– for every λ,
– there exists a value c (which, in general, depends on λ) for which Ψ(λ · a) = c(λ) · Ψ(a).

  • Every monotonic solution to this functional equation has the form Ψ(a) = A · a^α for some A and α.
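A quick numerical sanity check (our own sketch) that power-law functions Ψ(a) = A · a^α indeed satisfy Ψ(λ · a) = c(λ) · Ψ(a) with c(λ) = λ^α independent of a:

```python
import numpy as np

A, alpha = 2.0, 1.7

def Psi(a):
    # A monotonic power-law solution Psi(a) = A * a**alpha.
    return A * a ** alpha

a = np.linspace(0.1, 5.0, 50)
for lam in (0.5, 2.0, 3.0):
    # Psi(lam * a) / Psi(a) is the constant c(lam) = lam**alpha,
    # independent of a -- the defining property used in the proof.
    ratio = Psi(lam * a) / Psi(a)
    assert np.allclose(ratio, lam ** alpha)
```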

slide-51
SLIDE 51


50. Proof of Proposition 1 (cont-d)

  • So, ψ(a) = Ψ(a) + ψ(0) = A · a^α + B, where B = ψ(0).

  • Similarly, shift-invariance implies that if ψ(a) + ψ(b) = ψ(c) + ψ(d), then:

– for each sufficiently small ε > 0,
– we should have ψ(a − ε) + ψ(b − ε) = ψ(c − ε) + ψ(d − ε).

  • The converse is also true, so the same property holds for ε = −δ, i.e.:

– if ψ(a) + ψ(b) = ψ(c) + ψ(d),
– then, for each sufficiently small δ > 0: ψ(a + δ) + ψ(b + δ) = ψ(c + δ) + ψ(d + δ).

slide-52
SLIDE 52


51. Proof of Proposition 1 (cont-d)

  • Let us substitute ψ(a) = A · a^α + B, subtract 2B from both sides, and divide both equalities by A; then:

– if a^α + b^α = c^α + d^α,
– then (a + δ)^α + (b + δ)^α = (c + δ)^α + (d + δ)^α.

  • In particular, the first equality is satisfied if we take a = b = 1, c = 2^(1/α), and d = 0.

  • Thus, for all sufficiently small δ, we have 2 · (1 + δ)^α = (2^(1/α) + δ)^α + δ^α.

  • On both sides, we have analytical expressions.
  • When α < 1, then for small δ:

– the left-hand side and the first term on the right-hand side start with a term linear in δ, while
– the term δ^α ≫ δ on the right-hand side is not compensated by anything.

slide-53
SLIDE 53


52. Proof of Proposition 1 (cont-d)

  • If α > 1, then by equating the terms linear in δ in the corresponding expansions:

– we get 2 · α · δ on the left-hand side, and
– we get α · (2^(1/α))^(α−1) · δ = 2^(1−1/α) · α · δ on the right-hand side.

  • The coefficients are different, since the corresponding powers of two are different: 1 ≠ 1 − 1/α.

  • Thus, the only possibility is α = 1.
  • The proposition is proven.
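The key step of this proof can be checked numerically. A small sketch (ours) confirms that the identity 2 · (1 + δ)^α = (2^(1/α) + δ)^α + δ^α holds for α = 1 but fails for α ≠ 1:

```python
import numpy as np

def lhs(delta, alpha):
    # Left-hand side of the identity from the proof.
    return 2.0 * (1.0 + delta) ** alpha

def rhs(delta, alpha):
    # Right-hand side, with a = b = 1, c = 2**(1/alpha), d = 0.
    return (2.0 ** (1.0 / alpha) + delta) ** alpha + delta ** alpha

delta = 0.01
# For alpha = 1 the two sides agree identically:
assert np.isclose(lhs(delta, 1.0), rhs(delta, 1.0))
# For alpha != 1 (here alpha = 2) they already differ at first order in delta:
assert not np.isclose(lhs(delta, 2.0), rhs(delta, 2.0), rtol=1e-6)
```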
slide-54
SLIDE 54


53. Proof of Proposition 2

  • It is known that every scale-invariant combination operation has the form a ∗ b = (a^α + b^α)^(1/α) or a ∗ b = max(a, b).

  • The second case contradicts the requirement that a ∗ b

be strictly increasing in both variables.

  • For the first case, similarly to the proof of Proposition 1, we conclude that α = 1.

  • The proposition is proven.
slide-55
SLIDE 55


54. Proof of Proposition 3

  • Similarly to the proof of Proposition 1, from the shift-invariance, for c = 1, we conclude that z − ε + f(z − ε) = A + B · (z + c′ · f(z)).

  • Here, the values A, B, and c′, in general, depend on ε, so: f(z − ε) = A0(ε) + A1(ε) · z + A2(ε) · f(z).

  • Here, A0(ε) = A + ε, A1(ε) = B − 1, and A2(ε) = B · c′.

  • Let us consider three different values xk (k = 1, 2, 3).
  • Then, we get a system of three linear equations for

three unknowns Ai(ε).

  • Thus, by using Cramer’s rule, we get an explicit formula for each Ai(ε) in terms of the values xk, f(xk), and f(xk − ε).
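A small sketch (ours, with f(z) = z² as a concrete example) of recovering the coefficients Ai(ε) from three sample points by solving the resulting 3 × 3 linear system:

```python
import numpy as np

# Our own illustration: take f(z) = z**2, a small shift eps, and
# recover A0, A1, A2 in
#   f(z - eps) = A0 + A1*z + A2*f(z)
# from three sample points, as in the proof.
f = lambda z: z ** 2
eps = 0.1
xs = np.array([1.0, 2.0, 3.0])

# One linear equation per sample point x_k.
M = np.column_stack([np.ones(3), xs, f(xs)])
b = f(xs - eps)
A0, A1, A2 = np.linalg.solve(M, b)

# For f(z) = z**2: (z - eps)**2 = eps**2 - 2*eps*z + z**2.
assert np.allclose([A0, A1, A2], [eps ** 2, -2 * eps, 1.0])
```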

slide-56
SLIDE 56


55. Proof of Proposition 3 (cont-d)

  • Since the function f(z) is smooth (differentiable), these

expressions are differentiable too.

  • Thus, we can differentiate both sides of the above formula with respect to ε.

  • After taking ε = 0, we get f′(z) = B0 + B1 · z + B2 · f(z), where Bi = −A′i(0).

  • For B2 = 0, we get f′(z) = B0 + B1 · z, so f(z) is a quadratic function.

  • Let us show that the case B2 ≠ 0 is not possible.
  • Indeed, in this case, by moving all the terms containing

f to the left-hand side, we get f ′(z) − B2 · f(z) = B0 + B1 · z.

slide-57
SLIDE 57


56. Proof of Proposition 3 (cont-d)

  • Thus, for the auxiliary function F(z) = exp(−B2 · z) · f(z), we get F′(z) = exp(−B2 · z) · f′(z) − B2 · exp(−B2 · z) · f(z) = exp(−B2 · z) · (f′(z) − B2 · f(z)) = exp(−B2 · z) · (B0 + B1 · z).

  • Integrating both sides, we conclude that F(z) = f(z) · exp(−B2 · z) = (c0 + c1 · z) · exp(−B2 · z) + c2.

  • Thus, f(z) = c0 + c1 · z + c2 · exp(B2 · z).
  • From scale-invariance for c = 1, we similarly get λ · z + f(λ · z) = D + E · (z + c′ · f(z)).

  • Here, the values D, E, and c′, in general, depend on λ; thus: f(λ · z) = D0(λ) + D1(λ) · z + D2(λ) · f(z) for some functions Di(λ).

slide-58
SLIDE 58


57. Proof of Proposition 3 (cont-d)

  • Similarly to the case of shift-invariance, we can conclude that the functions Di are differentiable.

  • Thus, we can differentiate both sides of the above formula with respect to λ.

  • After taking λ = 1, we get:

z · f ′(z) = D0 + D1 · z + D2 · f(z) for some Di.

  • Substituting the expression corresponding to B2 ≠ 0 into this formula, we can see that this equation cannot be satisfied.

  • Thus, the case B2 ≠ 0 is indeed not possible.
  • So the only possible case is B2 = 0, which leads to a quadratic function f(z).

  • The proposition is proven.
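This final step can also be illustrated numerically: a least-squares sketch (ours) showing that z · f′(z) = D0 + D1 · z + D2 · f(z) has an exact solution in the constants Di for a quadratic f(z), but not for an exponential candidate f(z) = exp(z):

```python
import numpy as np

def residual(f, fprime, zs):
    """Least-squares residual of fitting z*f'(z) = D0 + D1*z + D2*f(z)
    over sample points zs; it is near zero iff such constants D_i exist."""
    M = np.column_stack([np.ones_like(zs), zs, f(zs)])
    y = zs * fprime(zs)
    _, res, _, _ = np.linalg.lstsq(M, y, rcond=None)
    return float(res[0]) if res.size else 0.0

zs = np.linspace(0.5, 3.0, 40)
# Quadratic f: z*f'(z) = 2*z**2 - z = -2 + 1*z + 2*f(z) -> exact fit.
quad = residual(lambda z: z ** 2 - z + 1, lambda z: 2 * z - 1, zs)
# Exponential f (the B2 != 0 candidate): no exact fit exists.
expo = residual(np.exp, np.exp, zs)
assert quad < 1e-10 and expo > 1e-3
```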
slide-59
SLIDE 59


58. Proof of Proposition 4

  • This proof is similar to the proof of Proposition 3.
  • The only difference is that:

– instead of a single differential equation,
– we will have a system of linear differential equations.