SLIDE 1


Why LASSO, Ridge Regression, and EN: Explanation Based on Soft Computing

Woraphon Yamaka1, Hamza Alkhatib2, Ingo Neumann2, and Vladik Kreinovich3

1Faculty of Economics, Chiang Mai University

Chiang Mai, Thailand, woraphon.econ@gmail.com

2Geodetic Institute, Leibniz University of Hannover

Hannover, Germany, alkhatib@gih.uni-hannover.de, neumann@gih.uni-hannover.de

3Department of Computer Science, University of Texas at El Paso

El Paso, Texas 79968, USA, vladik@utep.edu

SLIDE 2

1. Need for Regularization

• In practice, in addition to measurement results, we often use imprecise expert knowledge.

• For example, physicists usually believe that:

  – when the value of a physical quantity x is small,
  – we expand the dependence y = f(x) of some other quantity y on x in Taylor series, and
  – ignore quadratic and higher order terms in this expansion.

• The usual argument is that:

  – when x is small,
  – its square x² is so much smaller than x that it can safely be ignored.

SLIDE 3

2. Need for Regularization (cont-d)

• This is indeed true:

  – if x = 10% = 0.1, then x² = 0.01 ≪ 0.1;
  – if x = 1% = 0.01, then we can say that x² = 0.0001 ≪ x = 0.01 with even higher confidence.

• However, from the purely mathematical viewpoint, this argument is not fully convincing.

• Indeed, the quadratic term in the Taylor expansion is not x², but a₂ · x² for some coefficient a₂.

• From the purely mathematical viewpoint, this coefficient a₂ can be huge.

• In this case, the product a₂ · x² will also be big, and we will not be able to ignore it.

• From the physicist's viewpoint, however, this argument is valid.

SLIDE 4

3. Need for Regularization (cont-d)

• Indeed, physicists usually assume that the coefficients cannot be too large: they must be reasonably small.

• This imprecise additional assumption underlies many successes of physics.

• It can also be used as a supplement to measurements when we estimate the values of physical quantities.

• This is common sense.

• Sometimes, after applying some mathematical techniques, we get too large values of some parameters.

• This usually means that something is not right:

  – either with our method,
  – or with some measurement results – they may be outliers.
SLIDE 5

4. Need for Regularization (cont-d)

• In simple cases, it is clear that if we have a record of temperature in some area,

  – and we see 17, 18, 19, 18, 17, and then suddenly 42 degrees,
  – we should get very suspicious,
  – especially if the next day, we again have a high of 19.

• Physicists' intuition is great, but we cannot always rely on this intuition.

• There are many problems that need solving.

• It is not realistic to expect to have a skilled physicist for each such problem.

• How can we deal with situations when a professional physicist is not available?

SLIDE 6

5. Need for Regularization (cont-d)

• We need to have a precise description of:

  – what we mean
  – when we say that the coefficients a₀, . . . , aₙ describing a model must be reasonably small.

• Such descriptions are known as regularizations.

SLIDE 7

6. Which Regularizations Are Currently Used

• Out of many possible regularizations, the following three techniques have been the most empirically successful:

  – the LASSO technique, in which we limit the sum of the absolute values |a₁| + . . . + |aₙ|;
  – the ridge regression method, in which we limit the sum of the squares a₀² + . . . + aₙ²; and
  – the Elastic Net (EN) method, in which we limit a linear combination of the above two sums.

• Why?

• In this paper, we show that:

  – a natural formalization of commonsense intuition
  – indeed leads to these three regularization techniques.
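For readers who want to try these three options, here is a minimal sketch (not part of the original slides) using scikit-learn, which implements the penalized (Lagrangian) form of these constraints; the synthetic data and the regularization strengths alpha and l1_ratio are arbitrary illustrative choices.

```python
# Fit the same synthetic data with LASSO (L1 penalty), ridge regression
# (L2 penalty), and Elastic Net (a mix of both).  scikit-learn uses the
# penalized form, which is equivalent to limiting the sums of |a_i| or a_i^2.
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_coef = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ true_coef + 0.1 * rng.normal(size=100)

models = {
    "LASSO": Lasso(alpha=0.1),                     # limits the sum of |a_i|
    "ridge": Ridge(alpha=1.0),                     # limits the sum of a_i^2
    "EN":    ElasticNet(alpha=0.1, l1_ratio=0.5),  # limits a mix of both sums
}
for name, model in models.items():
    model.fit(X, y)
    print(name, np.round(model.coef_, 3))
```

With these penalized forms, the L1-based methods (LASSO and EN) tend to set some coefficients exactly to zero, while ridge regression only shrinks them.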

SLIDE 8

7. Need for Degrees of Confidence

• Precise statements like “x is larger than 5” are either true or false.

• In contrast, imprecise statements like “x is reasonably small” are not well-defined.

• For some values x, for example, for x = 0.0001, the expert is absolutely sure that x is small.

• For other values, like x = 10⁷, the expert is usually absolutely sure that this value is not reasonably small.

• However, for intermediate values x:

  – the expert is usually not 100% sure whether this value is indeed reasonably small;
  – he or she is only sure to some degree.

SLIDE 9

8. Need for Degrees of Confidence (cont-d)

• It is therefore reasonable to ask the expert to assign:

  – to each value x,
  – a degree µ(x) to which this expert believes that x is reasonably small.

• We can use different scales for such degrees.

• In the computer, “absolutely true” is usually described as 1, and “absolutely false” as 0.

• So, it is convenient to use a scale from 0 to 1 for such degrees.

• This assignment is one of the main ideas behind fuzzy logic.

• This technique was specifically developed to deal with such imprecision.

SLIDE 10

9. Need for Degrees of Confidence (cont-d)

• This way, we can assign:

  – to each imprecise statement,
  – a function µ(x) that describes to what degree this statement is satisfied for each value x.

• This function is known as a membership function or a fuzzy set.
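As an illustration (my own addition, not from the slides), here is one possible membership function for “x is reasonably small”; the exponential shape and the scale parameter k are assumptions, since the discussion only requires a function that is near 1 for tiny x, decreases with x, and is near 0 for huge x.

```python
# A possible membership function for "x is reasonably small" (illustrative
# choice; any function decreasing from 1 towards 0 would fit the discussion).
import numpy as np

def mu_reasonably_small(x, k=0.5):
    """Degree, between 0 and 1, to which x is considered reasonably small."""
    return np.exp(-k * np.abs(x))

for x in [0.0001, 1.0, 10.0, 1e7]:
    print(f"mu({x:g}) = {mu_reasonably_small(x):.4f}")
```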

SLIDE 11

10. Need for “And”- and “Or”-Operations

• Often, experts make complex statements.

• For example, they may say that x is reasonably small, but not very small.

• This statement is obtained:

  – from the basic statements “x is reasonably small” and “x is very small”
  – by applying the connectives “not” and “but” (which here means the same as “and”).

• In general:

  – we can use the connectives “and”, “or”, and “not”
  – to combine elementary statements into a composite one.
SLIDE 12

11. “And”- and “Or”-Operations (cont-d)

• Since experts may make such statements, it is desirable to estimate:

  – not only the expert's degrees of confidence in elementary statements,
  – but also the expert's degrees of confidence in different combined statements.

• An ideal solution would be to simply ask the expert to provide such an estimate for all possible combinations.

• However, this is not realistic.

• Even if we simply consider possible “and”-combinations of some of n statements:

  – we have 2ⁿ − 1 − n possible combinations –
  – as many as there are subsets of the set {1, . . . , n} (of which there are 2ⁿ), except for the empty set and the n one-element sets.
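A quick numeric check of this count (an illustrative addition):

```python
# Number of "and"-combinations of two or more statements out of n:
# all subsets of {1, ..., n} except the empty set and the n one-element sets.
n = 30
num_combinations = 2**n - 1 - n
print(num_combinations)  # 1073741793, i.e. over a billion for n = 30
```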

SLIDE 13

12. “And”- and “Or”-Operations (cont-d)

• For n = 30, we already have over a billion such combinations.

• There is no way to ask that many questions of an expert.

• We cannot directly ask the expert for his/her degree of confidence in each combination.

• We therefore need to be able:

  – to estimate the degree of confidence in a complex statement
  – based on whatever information we have,
  – i.e., based on the expert's degree of confidence in each elementary statement.

SLIDE 14

13. “And”- and “Or”-Operations (cont-d)

• This means, in particular, that we need:

  – to estimate the expert's degree of confidence in an “and”-statement A & B
  – based on the known expert's degrees of confidence x and y in each of the two statements A and B.

• We will denote this estimate by f&(x, y).

• The operation that inputs the pair (x, y) and returns f&(x, y) is known as:

  – an “and”-operation
  – or, for historical reasons, a t-norm.
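For illustration (my own addition), here are three standard “and”-operations (t-norms) from the fuzzy-logic literature; the function names are mine.

```python
# Three classical t-norms ("and"-operations); each maps [0,1] x [0,1] to [0,1],
# is commutative, associative, monotonic, and has 1 as the neutral element.
def t_min(x, y):          # Goedel (minimum) t-norm
    return min(x, y)

def t_product(x, y):      # product t-norm (strictly Archimedean)
    return x * y

def t_lukasiewicz(x, y):  # Lukasiewicz t-norm
    return max(0.0, x + y - 1.0)

x, y = 0.8, 0.6
print(t_min(x, y), t_product(x, y), t_lukasiewicz(x, y))  # 0.6 0.48 0.4
```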

SLIDE 15

14. “And”- and “Or”-Operations (cont-d)

• Similarly:

  – a function that maps the pair (x, y) into an estimate for the expert's degree of confidence in A ∨ B
  – is denoted by f∨(x, y) and is known as an “or”-operation or a t-conorm.

• These operations must satisfy several natural requirements.

• For example, since A & B means the same as B & A, it is reasonable to require:

  – that the estimates for these two statements be the same,
  – i.e., that the “and”-operation must be commutative: f&(x, y) = f&(y, x).

SLIDE 16

15. “And”- and “Or”-Operations (cont-d)

• Similarly, since A & (B & C) means the same as (A & B) & C, the “and”-operation must be associative.

• Similarly, the “or”-operation must be commutative and associative.

• Also, both operations should be monotonic in each of the variables, etc.

SLIDE 17

16. Need for Strictly Archimedean Operations

• With all these requirements, there are still many different “and”- and “or”-operations.

• In particular, for each strictly increasing function f(x), the operation f⁻¹(f(x) · f(y)) is an “and”-operation.

• Such “and”-operations are known as strictly Archimedean.

• Let us take into account a known result that:

  – for every “and”-operation f&(x, y) and every ε > 0,
  – there exists a strictly Archimedean “and”-operation that is ε-close to f&(x, y) for all x and y:
    |f&(x, y) − f⁻¹(f(x) · f(y))| ≤ ε.

• From the practical viewpoint, very small differences in degrees of confidence can be ignored.

• Thus, from the practical viewpoint, we can always assume that the “and”-operation is strictly Archimedean.
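A minimal sketch (my own illustration) of how a strictly increasing generator f produces a strictly Archimedean “and”-operation f⁻¹(f(x) · f(y)); the particular generator f(x) = x/(2 − x) is an arbitrary illustrative choice.

```python
# Building a strictly Archimedean "and"-operation from a strictly increasing
# generator f on [0, 1] (with f(0) = 0, f(1) = 1):  f_and(x, y) = f^{-1}(f(x) * f(y)).
def make_and_operation(f, f_inv):
    def f_and(x, y):
        return f_inv(f(x) * f(y))
    return f_and

# Illustrative generator: f(x) = x / (2 - x), with inverse f_inv(t) = 2t / (1 + t).
f = lambda x: x / (2.0 - x)
f_inv = lambda t: 2.0 * t / (1.0 + t)

f_and = make_and_operation(f, f_inv)
print(f_and(0.8, 0.6))   # approx 0.444; commutative, associative, monotonic, and f_and(x, 1) = x
```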

SLIDE 18

17. General Analysis of the Problem

• The main idea behind regularization is that:

  – a tuple a = (a₀, . . . , aₙ) is accepted
  – if the absolute values |aᵢ| of all the coefficients are reasonably small.

• In other words, the value |a₀| must be reasonably small, and the value |a₁| must be reasonably small, etc.

• We must select tuples a for which:

  – our degree of confidence µ₀(a) in this complex statement is sufficiently large,
  – i.e., larger than a certain threshold d₀.

SLIDE 19

18. General Analysis of the Problem (cont-d)

• So, to estimate the degree of confidence µ₀(a) in our complex statement:

  – we need to apply the corresponding “and”-operation f&(x, y)
  – to the degrees to which each |aᵢ| is sufficiently small.

• These degrees, by definition of the membership function, can be obtained:

  – by applying the membership function µ(x) corresponding to “sufficiently small”
  – to the values |aᵢ|.

• In other words, each of these degrees is equal to µ(|aᵢ|).

• Thus, the degree of confidence that the above complex statement is true is equal to
  µ₀(a) = f&(µ(|a₀|), . . . , µ(|aₙ|)).

• So, the tuple of coefficients a = (a₀, . . . , aₙ) is accepted if
  µ₀(a) = f&(µ(|a₀|), . . . , µ(|aₙ|)) ≥ d₀.
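A minimal computational sketch of this acceptance test (my own illustration, assuming an exponential membership function and the product t-norm as the strictly Archimedean “and”-operation; the values of k and d₀ are arbitrary).

```python
# Acceptance test mu_0(a) = f_and(mu(|a_0|), ..., mu(|a_n|)) >= d_0,
# illustrated with the product t-norm and an assumed membership function.
import numpy as np

def mu(x, k=0.5):
    """Assumed degree to which x is 'sufficiently small' (decreasing in x)."""
    return np.exp(-k * x)

def accepted(a, d0=0.1, k=0.5):
    degrees = mu(np.abs(np.asarray(a)), k)   # mu(|a_i|) for each coefficient
    mu0 = np.prod(degrees)                   # product "and"-operation
    return mu0 >= d0

print(accepted([0.5, -1.0, 0.3]))   # True: all coefficients reasonably small
print(accepted([0.5, -8.0, 0.3]))   # False: one coefficient is too large
```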

SLIDE 20

19. General Analysis of the Problem (cont-d)

• Clearly, the larger the value x, the smaller the degree of confidence that this value is reasonably small.

• Thus, the membership function µ(x) that corresponds to “reasonably small” is a decreasing function of x.

• We have agreed to assume that the “and”-operation is strictly Archimedean.

• So, f&(x, y) = f⁻¹(f(x) · f(y)) for some strictly increasing function f(x).

• Thus, the above condition takes the form:
  µ₀(a) = f⁻¹(f(µ(|a₀|)) · . . . · f(µ(|aₙ|))) ≥ d₀.

• By applying the increasing function f(x) to both sides of this inequality, we get an equivalent inequality:
  F₀(a) = F(|a₀|) · . . . · F(|aₙ|) ≥ D₀.

SLIDE 21

20. General Analysis of the Problem (cont-d)

• Reminder: F₀(a) = F(|a₀|) · . . . · F(|aₙ|) ≥ D₀.

• Here we denoted F₀(a) := f(µ₀(a)), F(x) := f(µ(x)), and D₀ := f(d₀).

• The function f(x) is increasing and µ(x) is decreasing.

• Thus, the composition F(x) = f(µ(x)) of these two functions is a decreasing function of x.

• To further analyze this situation, we need to make some additional assumptions reflecting common sense.

• In this paper:

  – we will describe two such natural assumptions, and
  – we will show that they lead, correspondingly, to LASSO and to ridge regression.

SLIDE 22

21. Why LASSO

• A reasonable idea is that if x and y are reasonably small, then their sum x + y is also reasonably small.

• So, it is reasonable to conclude that for the membership function µ(x) corresponding to “reasonably small”:

  – the degree to which x + y is reasonably small is equal to
  – the degree that x is reasonably small and y is reasonably small,

  i.e., that µ(x + y) = f&(µ(x), µ(y)).

• What can we deduce from this idea?

• We have assumed that the “and”-operation is strictly Archimedean, so the above equality has the form
  µ(x + y) = f⁻¹(f(µ(x)) · f(µ(y))).

SLIDE 23

22. Why LASSO (cont-d)

• By applying the function f(x) to both sides of this equality, we conclude that:
  f(µ(x + y)) = f(µ(x)) · f(µ(y)), i.e., F(x + y) = F(x) · F(y).

• It is known that every decreasing solution to this equation has the form F(x) = exp(−k · x) for some k > 0.

• Thus, the above inequality takes the form
  F₀(a) = exp(−k · |a₀|) · . . . · exp(−k · |aₙ|) ≥ D₀, i.e.,
  F₀(a) = exp(−k · (|a₀| + . . . + |aₙ|)) ≥ D₀.
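For completeness, a short sketch (my own addition, under the usual monotonicity assumption) of why the decreasing solutions of this functional equation are exponentials:

```latex
% Sketch: why the decreasing solutions of F(x+y) = F(x) F(y) are exponential.
\begin{align*}
  L(x) := \ln F(x) \quad &\Longrightarrow\quad L(x+y) = L(x) + L(y) \quad\text{(Cauchy's functional equation)};\\
  L \text{ monotonic} \quad &\Longrightarrow\quad L(x) = -k \cdot x \text{ for some constant } k;\\
  F \text{ decreasing} \quad &\Longrightarrow\quad k > 0, \quad F(x) = \exp(-k \cdot x).
\end{align*}
```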
SLIDE 24

23. Why LASSO (cont-d)

• Reminder: F₀(a) = exp(−k · (|a₀| + . . . + |aₙ|)) ≥ D₀.

• By taking the logarithm of both sides and dividing both sides of the resulting inequality by −k, we get:
  |a₀| + . . . + |aₙ| ≤ c₀, where c₀ := −ln(D₀)/k.

• This is exactly the LASSO approach, so we indeed justified the use of LASSO regularization.
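A small numeric check (my own illustration, with arbitrary values of k and D₀) that the acceptance condition exp(−k · (|a₀| + . . . + |aₙ|)) ≥ D₀ is indeed the same as the LASSO constraint:

```python
# Numerical check of the LASSO derivation (illustrative values of k and D0).
import numpy as np

k, D0 = 0.5, 0.1
c0 = -np.log(D0) / k

for a in [np.array([0.5, -1.0, 0.3]), np.array([2.0, -3.0, 1.5])]:
    lhs = np.exp(-k * np.sum(np.abs(a))) >= D0     # acceptance condition
    rhs = np.sum(np.abs(a)) <= c0                  # LASSO constraint
    print(np.sum(np.abs(a)), lhs, rhs)             # the two tests always agree
```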

SLIDE 25

24. Why Ridge Regression

• Another reasonable idea is that:

  – if all the coordinates of a point are reasonably small,
  – then the distance from this point to the origin of the coordinate system is also small.

• In the 2-D case, the distance between the point (x, y) and the origin (0, 0) of the coordinate system is √(x² + y²).

• Thus, we conclude that if x and y are reasonably small, then the value √(x² + y²) is also reasonably small.
SLIDE 26

25. Why Ridge Regression (cont-d)

• So, it is reasonable to conclude that for the membership function µ(x) that corresponds to “reasonably small”:

  – the degree to which √(x² + y²) is reasonably small is equal to
  – the degree that x is reasonably small and y is reasonably small,

  i.e., that µ(√(x² + y²)) = f&(µ(x), µ(y)).

• What can we deduce from this idea?

• We have assumed that the “and”-operation is strictly Archimedean, so the above equality has the form
  µ(√(x² + y²)) = f⁻¹(f(µ(x)) · f(µ(y))).
SLIDE 27

26. Why Ridge Regression (cont-d)

• By applying the function f(x) to both sides of this equality, we conclude that
  f(µ(√(x² + y²))) = f(µ(x)) · f(µ(y)), i.e., that F(√(x² + y²)) = F(x) · F(y).

• Thus, for the auxiliary function G(x) := F(√x), for which F(x) = G(x²), we get G(x² + y²) = G(x²) · G(y²).

• This is true for all possible non-negative values x and y.

• Every non-negative number X can be represented as a square: namely, as X = x² for x = √X.

• Thus, for all possible non-negative numbers X and Y, we have G(X + Y) = G(X) · G(Y).

SLIDE 28

27. Why Ridge Regression (cont-d)

• As we have mentioned in our derivation of LASSO, for a monotonic function G(X), this implies that
  G(X) = exp(−k · X) for some k > 0.

• Thus, we conclude that F(x) = G(x²) = exp(−k · x²).

• So, the above inequality takes the form
  F₀(a) = exp(−k · a₀²) · . . . · exp(−k · aₙ²) ≥ D₀.

• This is equivalent to
  F₀(a) = exp(−k · (a₀² + . . . + aₙ²)) ≥ D₀.
SLIDE 29

28. Why Ridge Regression (cont-d)

• Reminder: F₀(a) = exp(−k · (a₀² + . . . + aₙ²)) ≥ D₀.

• By taking the logarithm of both sides and dividing both sides of the resulting inequality by −k, we get:
  a₀² + . . . + aₙ² ≤ c₀, where c₀ := −ln(D₀)/k.

• This is exactly the ridge regression approach, so we indeed justified the use of ridge regression.
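An analogous numeric check (my own illustration, with arbitrary k and D₀) for the ridge regression case:

```python
# Numerical check of the ridge regression derivation (illustrative k and D0).
import numpy as np

k, D0 = 0.5, 0.1
F = lambda x: np.exp(-k * x**2)

# F(x) = exp(-k x^2) indeed satisfies F(sqrt(x^2 + y^2)) = F(x) * F(y).
x, y = 1.3, 2.1
assert np.isclose(F(np.hypot(x, y)), F(x) * F(y))

# exp(-k * sum a_i^2) >= D0  is equivalent to  sum a_i^2 <= c0 := -ln(D0)/k.
c0 = -np.log(D0) / k
for a in [np.array([0.5, -1.0, 0.3]), np.array([2.0, -3.0, 1.5])]:
    lhs = np.exp(-k * np.sum(a**2)) >= D0          # acceptance condition
    rhs = np.sum(a**2) <= c0                       # ridge constraint
    print(np.sum(a**2), lhs, rhs)                  # the two tests always agree
```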

SLIDE 30

29. Why EN: Idea

• In the previous sections, we considered the case when we have a single expert.

• In practice, we often have several different experts corresponding to different areas of expertise.

• Each expert can dismiss some models if they are not realistic according to his/her area of expertise.

• It is therefore reasonable to conclude that:

  – a tuple a = (a₀, . . . , aₙ) of possible values of the parameters is reasonable
  – if all the experts consider it reasonable.

SLIDE 31

30. Let Us Formalize and Explore This Idea

• Let E denote the number of experts.

• Let µⱼ(a) (j = 1, . . . , E) denote the degree to which the tuple a is reasonable according to the j-th expert.

• The overall degree to which all the experts consider this tuple to be reasonable is thus equal to
  f&(µ₁(a), . . . , µE(a)).

• So, we accept this tuple if this overall degree is greater than or equal to some threshold d₀:
  f&(µ₁(a), . . . , µE(a)) ≥ d₀.

• For a strictly Archimedean “and”-operation, this inequality takes the form
  f⁻¹(f(µ₁(a)) · . . . · f(µE(a))) ≥ d₀.

• By applying the function f(x) to both sides, we get an equivalent inequality
  f(µ₁(a)) · . . . · f(µE(a)) ≥ D₀, i.e., F₁(a) · . . . · FE(a) ≥ D₀, where Fⱼ(a) := f(µⱼ(a)) and D₀ := f(d₀).

SLIDE 32

31. Let Us Explore This Idea (cont-d)

• From the previous sections, we know that for each expert j, the function Fⱼ(a) = f(µⱼ(a)) takes:

  – either the form Fⱼ(a) = exp(−kⱼ · (|a₀| + . . . + |aₙ|)),
  – or the form Fⱼ(a) = exp(−kⱼ · (a₀² + . . . + aₙ²)).

• By grouping together the experts with these two types of functions, we get:

  ∏_{j∈E₁} exp(−kⱼ · (|a₀| + . . . + |aₙ|)) · ∏_{j∈E₂} exp(−kⱼ · (a₀² + . . . + aₙ²)) ≥ D₀.

• Here, E₁ is the set of all LASSO experts and E₂ is the set of all ridge regression experts.

SLIDE 33

32. Let Us Explore This Idea (cont-d)

• The above inequality can be represented in the equivalent form:

  exp(−K₁ · (|a₀| + . . . + |aₙ|) − K₂ · (a₀² + . . . + aₙ²)) ≥ D₀.

• Here K₁ := Σ_{j∈E₁} kⱼ and K₂ := Σ_{j∈E₂} kⱼ.

• By taking logarithms of both sides and dividing the resulting inequality by −K₁, we get:

  (|a₀| + . . . + |aₙ|) + c · (a₀² + . . . + aₙ²) ≤ c₀, where c := K₂/K₁ and c₀ := −ln(D₀)/K₁.

• This is exactly the EN approach – thus, EN regularization is also justified.
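And a final numeric check (my own illustration, with arbitrary K₁, K₂, and D₀) that the combined condition is the same as the EN constraint:

```python
# Numerical check of the Elastic Net derivation (illustrative K1, K2, D0).
import numpy as np

K1, K2, D0 = 0.5, 0.25, 0.1
c, c0 = K2 / K1, -np.log(D0) / K1

for a in [np.array([0.5, -1.0, 0.3]), np.array([2.0, -3.0, 1.5])]:
    l1, l2 = np.sum(np.abs(a)), np.sum(a**2)
    lhs = np.exp(-K1 * l1 - K2 * l2) >= D0     # combined experts' acceptance condition
    rhs = l1 + c * l2 <= c0                    # Elastic Net constraint
    print(lhs, rhs)                            # the two tests always agree
```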

SLIDE 34

33. Acknowledgments

This work was supported:

• by the Center of Excellence in Econometrics, Chiang Mai University, Thailand,

• by the Institute of Geodesy, Leibniz University of Hannover, and

• by the US National Science Foundation grants 1623190 and HRD-1242122.

This paper was written when V. Kreinovich was visiting Leibniz University of Hannover.