
Why LASSO, EN, and CLOT: Invariance-Based Explanation



Hamza Alkhatib (1), Ingo Neumann (1), Vladik Kreinovich (2), and Chon Van Le (3)

(1) Geodetic Institute, Leibniz University of Hannover, Hannover, Germany; alkhatib@gih.uni-hannover.de, neumann@gih.uni-hannover.de
(2) University of Texas at El Paso, El Paso, Texas 79968, USA; vladik@utep.edu
(3) International University – VNU HCMC, Ho Chi Minh City, Vietnam; lvchon@hcmiu.edu.vn

1. Need for Solving the Inverse Problem
• Once we have a model of a system,
  – we can use this model to predict the system's behavior,
  – in particular, to predict the results of future measurements and observations of this system.
• The problem of estimating future measurement results based on the model is known as the forward problem.
• In many practical situations, we do not know the exact model.
• To be more precise:
  – we know the general form of a dependence between physical quantities,
  – but the parameters of this dependence need to be determined from the observations.

2. Need for Inverse Problem (cont-d)
• For example, often, we have a linear model
  $$y = a_0 + \sum_{i=1}^{n} a_i \cdot x_i.$$
• The parameters $a_i$ need to be experimentally determined.
• In general, we need to determine the parameters of the model based on the measurement results.
• This problem is known as the inverse problem.
• To actually find the parameters, we can use, e.g., the Maximum Likelihood method.

3. Need for Inverse Problem (cont-d)
• For example:
  – when the errors are normally distributed,
  – the Maximum Likelihood procedure results in the usual Least Squares estimates.
• For example, for a general linear model with parameters $a_i$:
  – once we know several tuples of corresponding values $(x_1^{(k)}, \ldots, x_n^{(k)}, y^{(k)})$, $1 \le k \le K$,
  – then we can find the parameters from the condition that
  $$\sum_{k=1}^{K} \left( y^{(k)} - \left( a_0 + \sum_{i=1}^{n} a_i \cdot x_i^{(k)} \right) \right)^2 \to \min_{a_0, \ldots, a_n}.$$
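The Least Squares condition above can be sketched numerically. This is a minimal illustration, assuming NumPy is available; `fit_linear_model` is a hypothetical helper name, and the data are synthetic, noise-free values generated from chosen parameters $a_0 = 1$, $a_1 = 2$, $a_2 = -3$ so that the recovered parameters can be checked.

```python
import numpy as np


def fit_linear_model(X, y):
    """Least Squares estimate of a_0, ..., a_n for y = a_0 + sum_i a_i * x_i.

    X: array of shape (K, n); row k holds x_1^(k), ..., x_n^(k).
    y: array of shape (K,); entry k holds y^(k).
    Returns the array [a_0, a_1, ..., a_n].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that a_0 acts as the free term.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes sum_k (y^(k) - design[k] @ a)^2 over a.
    a, *_ = np.linalg.lstsq(design, y, rcond=None)
    return a


# Usage: synthetic, noise-free data (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
print(fit_linear_model(X, y))  # close to [1, 2, -3]
```

With normally distributed measurement errors added to `y`, the same call returns the Maximum Likelihood estimates mentioned on the slide.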

4. Need for Regularization
• In some practical situations:
  – based on the measurement results,
  – we can determine all the model's parameters with reasonable accuracy.
• However, often, several different combinations of parameters are consistent with all the measurement results.
• Such inverse problems are called ill-defined (or ill-posed).
• E.g., in dynamical systems, the observations provide a smoothed picture of the system's dynamics.
• For example, we may be tracing the motion of a mechanical system caused by an external force.

5. Need for Regularization (cont-d)
• Then, two forces (almost) cancel each other out:
  – a strong but short-time force in one direction, and
  – a similarly strong and short-time force in the opposite direction that immediately follows it.
• So the same almost-unchanging behavior is consistent both:
  – with the absence of forces and
  – with the above wildly oscillating force.
• A similar phenomenon occurs when:
  – based on the observed economic behavior,
  – we try to reconstruct the external forces affecting the economic system.
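The force-cancellation effect can be checked with a small simulation. This is a sketch under stated assumptions: a hypothetical unit point mass starting at rest, piecewise-constant forces (integrated exactly step by step), and illustrative values for the time step and force magnitude; `trajectory` is a hypothetical helper name.

```python
def trajectory(forces, dt, mass=1.0):
    """Positions of a point mass that starts at rest at x = 0.

    forces[j] is the (constant) force acting during the j-th time step of length dt.
    """
    x, v = 0.0, 0.0
    positions = [x]
    for f in forces:
        acc = f / mass
        # Exact position/velocity update for constant acceleration over one step.
        x += v * dt + 0.5 * acc * dt * dt
        v += acc * dt
        positions.append(x)
    return positions


dt = 0.001          # step length (illustrative)
steps = 1000
F = 100.0           # magnitude of each short, strong force (hypothetical value)

no_force = [0.0] * steps
pulse = [0.0] * steps
pulse[500] = F      # strong push for one step...
pulse[501] = -F     # ...immediately followed by an equally strong pull

quiet = trajectory(no_force, dt)
kicked = trajectory(pulse, dt)
max_diff = max(abs(p - q) for p, q in zip(quiet, kicked))
print(max_diff)     # about 1e-4, i.e. F * dt**2
```

The two force histories differ by 100 units at their peak, yet the positions differ by only about $F \cdot \Delta t^2 = 10^{-4}$. If positions were measured with a (hypothetical) accuracy of $10^{-3}$, the observations could not tell these two forces apart.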

6. Need for Regularization (cont-d)
• In such situations:
  – the only way to narrow down the set of possible solutions
  – is to take into account some general a priori information.
• For example, for forces, we may know (e.g., from experts) an upper bound on their magnitude.
• The use of such a priori information is known as regularization.

7. Which Regularizations Are Currently Used
• There are many possible regularizations.
• Many of them have been tried.
• Based on the results of these tries, a few techniques turned out to be empirically successful.
• The most widely used technique of this type is known as the LASSO technique.
• LASSO is short for Least Absolute Shrinkage and Selection Operator.
• We require that the sum of the absolute values
  $$\|a\|_1 \stackrel{\text{def}}{=} \sum_{i=0}^{n} |a_i|$$
  be bounded by some number.
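The slide states LASSO as a bound on $\|a\|_1$. A common way to compute it is the equivalent penalized form, minimizing $\tfrac12\sum_k (y^{(k)} - \ldots)^2 + \lambda \|a\|_1$; under standard convex-duality conditions, each bound on $\|a\|_1$ corresponds to some $\lambda \ge 0$. The sketch below uses cyclic coordinate descent with soft-thresholding, one standard solver (not necessarily the one used by the authors). It assumes NumPy, no all-zero columns in `X`, and treats every coefficient alike; to penalize $a_0$ as in the slide's sum over $i = 0, \ldots, n$, add a column of ones to `X`. The names `soft_threshold` and `lasso_cd` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward 0 by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X a||^2 + lam * ||a||_1 by cyclic coordinate descent.

    Assumes no column of X is identically zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[1]
    a = np.zeros(n)
    col_sq = (X ** 2).sum(axis=0)      # ||X_j||^2 for each column j
    residual = y - X @ a
    for _ in range(n_iter):
        for j in range(n):
            # Add back coordinate j's contribution to get the partial residual.
            residual += X[:, j] * a[j]
            rho = X[:, j] @ residual
            # Exact minimizer of the objective over a[j] alone.
            a[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * a[j]
    return a


# Usage on synthetic data where only two coefficients are non-zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
true_a = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_a + 0.1 * rng.normal(size=50)
print(lasso_cd(X, y, lam=5.0))  # small coefficients tend to be set exactly to 0
```

Larger `lam` corresponds to a tighter bound on $\|a\|_1$; for `lam = 0` the result coincides with the Least Squares estimate.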

8. Currently Used Regularizations (cont-d)
• Another widely used method is the ridge regression method, in which we limit the sum of the squares
  $$S \stackrel{\text{def}}{=} \sum_{i=0}^{n} a_i^2.$$
• This is equivalent to bounding its square root $\|a\|_2 \stackrel{\text{def}}{=} \sqrt{S}$.
• Also very promising are:
  – the Elastic Net (EN) method, in which we limit a linear combination $\|a\|_1 + c \cdot S$, and
  – the Combined L-One and Two (CLOT) method, in which we limit a linear combination $\|a\|_1 + c \cdot \|a\|_2$.
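To make the quantities that LASSO, ridge regression, EN, and CLOT bound concrete, here is a small sketch that evaluates each of them for a given parameter vector. The function names are hypothetical, and the weight `c = 0.5` is an arbitrary illustrative value (the slides do not fix it); each method then requires the corresponding quantity to stay below some bound.

```python
import math


def l1_norm(a):
    """||a||_1: bounded in LASSO."""
    return sum(abs(ai) for ai in a)


def sum_of_squares(a):
    """S: bounded in ridge regression."""
    return sum(ai * ai for ai in a)


def l2_norm(a):
    """||a||_2 = sqrt(S): bounding it is equivalent to bounding S."""
    return math.sqrt(sum_of_squares(a))


def elastic_net_term(a, c):
    """||a||_1 + c * S: bounded in the Elastic Net (EN) method."""
    return l1_norm(a) + c * sum_of_squares(a)


def clot_term(a, c):
    """||a||_1 + c * ||a||_2: bounded in the CLOT method."""
    return l1_norm(a) + c * l2_norm(a)


a = [3.0, -4.0, 0.0]
print(l1_norm(a), sum_of_squares(a), l2_norm(a))      # 7.0 25.0 5.0
print(elastic_net_term(a, 0.5), clot_term(a, 0.5))    # 19.5 9.5
```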

9. Why: Remaining Question and What We Do in This Talk
• The above empirical facts prompt a natural question: why do the above regularization techniques work the best?
• We show that the efficiency of these methods can be explained by natural invariance requirements.

10. General Idea of Regularization and Its Possible Probabilistic Background
• In general, regularization means that we dismiss values $a_i$ which are too large or too small.
• In some cases, this dismissal is based on subjective estimations of what is large and what is small.
• In other cases, the conclusion about what is large and what is not large is based on past experience.
• So, it is based on the frequencies (= probabilities) with which different values have been observed in the past.
• In this talk, we consider both types of regularization.

11. Probabilistic Regularization: Towards a Precise Definition
• There is no a priori reason to believe that different parameters have different distributions.
• So, in the first approximation, it makes sense to assume that they have the same probability distribution.
• Let us denote the probability density function of this common distribution by $\rho(a)$.
• In other words, the original information is invariant w.r.t. all possible permutations of parameters.
• Then, the resulting joint distribution should also be invariant with respect to all the permutations.
• This implies, in particular, that all the marginal distributions are the same.
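The step from permutation invariance of the joint distribution to equal marginals can be spelled out as follows. Here $\rho_J$ (the joint density of $a_0, \ldots, a_n$), $\rho_i$ (the marginal density of $a_i$), and $\tau$ (the permutation that swaps the indices $i$ and $j$) are notation introduced only for this derivation; invariance means $\rho_J(a_0, \ldots, a_n) = \rho_J(a_{\tau(0)}, \ldots, a_{\tau(n)})$.

```latex
\begin{aligned}
\rho_i(t) &= \int \rho_J(a_0,\ldots,a_n)\Big|_{a_i = t} \prod_{k \ne i} da_k \\
          &= \int \rho_J(a_{\tau(0)},\ldots,a_{\tau(n)})\Big|_{a_i = t} \prod_{k \ne i} da_k
             && \text{(permutation invariance)} \\
          &= \int \rho_J(b_0,\ldots,b_n)\Big|_{b_j = t} \prod_{k \ne j} db_k
             && (b_k := a_{\tau(k)},\ \text{so } b_j = a_i = t) \\
          &= \rho_j(t).
\end{aligned}
```

Since $i$ and $j$ are arbitrary, every marginal equals the same density, which the slide denotes $\rho(a)$.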
