SLIDE 1

Robust strategies and model selection
Stefan Van Aelst

Department of Applied Mathematics and Computer Science
Ghent University, Belgium
Stefan.VanAelst@UGent.be
ERCIM09 - COMISEF/COST Tutorial

SLIDE 2

Outline

1. Regression model
2. Least squares
3. Manual variable selection approach
4. Automatic variable selection approach
5. Robustness
6. Robust variable selection: sequencing
7. Robust variable selection: segmentation

SLIDE 3

Regression model

Regression setting

Consider a dataset $Z_n = \{(y_i, x_{i1}, \ldots, x_{id}) = (y_i, \mathbf{x}_i);\ i = 1, \ldots, n\} \subset \mathbb{R}^{d+1}$.
- $Y$ is the response variable
- $X_1, \ldots, X_d$ are the candidate regressors

The corresponding linear model is

$$y_i = \beta_1 x_{i1} + \cdots + \beta_d x_{id} + \epsilon_i = \mathbf{x}_i'\boldsymbol{\beta} + \epsilon_i, \qquad i = 1, \ldots, n,$$

where the errors $\epsilon_i$ are assumed to be iid with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2 > 0$. The goal is to estimate the regression coefficients $\boldsymbol{\beta}$ from the data.

SLIDE 4

Least squares

Least squares solution

$\hat{\boldsymbol{\beta}}_{LS}$ solves

$$\min_{\boldsymbol{\beta}} \sum_{i=1}^n (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2.$$

Write $X = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^t$ and $\mathbf{y} = (y_1, \ldots, y_n)^t$. Then $\hat{\boldsymbol{\beta}}_{LS}$ solves

$$\min_{\boldsymbol{\beta}} (\mathbf{y} - X\boldsymbol{\beta})^t(\mathbf{y} - X\boldsymbol{\beta}) \;\Rightarrow\; \hat{\boldsymbol{\beta}}_{LS} = (X^tX)^{-1}X^t\mathbf{y}, \qquad \hat{\mathbf{y}} = X\hat{\boldsymbol{\beta}} = X(X^tX)^{-1}X^t\mathbf{y} = H\mathbf{y}.$$
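As a quick illustration, a minimal NumPy sketch of the closed-form solution and the hat matrix on simulated data (in practice np.linalg.lstsq is numerically preferable to forming the inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))                    # design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_ls = XtX_inv @ X.T @ y                    # (X'X)^{-1} X'y
H = X @ XtX_inv @ X.T                          # hat matrix H
y_hat = H @ y                                  # fitted values X beta_ls
```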

SLIDE 5

Least squares

Least squares properties

- Unbiased estimator: $E(\hat{\boldsymbol{\beta}}_{LS}) = \boldsymbol{\beta}$
- Gauss-Markov theorem: LS has the smallest variance among all unbiased linear estimators of $\boldsymbol{\beta}$.

Why do variable selection?

SLIDE 6

Least squares

Expected prediction error

Assume the true regression function is linear:

$$Y|\mathbf{x} = f(\mathbf{x}) + \epsilon = \mathbf{x}^t\boldsymbol{\beta} + \epsilon.$$

Predict the response $Y_0$ at $\mathbf{x}_0$: $Y_0 = \mathbf{x}_0^t\boldsymbol{\beta} + \epsilon_0 = f(\mathbf{x}_0) + \epsilon_0$. Using an estimator $\tilde{\boldsymbol{\beta}}$ of the regression coefficients, the estimated prediction is $\tilde{f}(\mathbf{x}_0) = \mathbf{x}_0^t\tilde{\boldsymbol{\beta}}$ and the expected prediction error is $E[(Y_0 - \tilde{f}(\mathbf{x}_0))^2]$.

SLIDE 7

Least squares

Expected prediction error

$$E[(Y_0 - \tilde{f}(\mathbf{x}_0))^2] = E[(f(\mathbf{x}_0) + \epsilon_0 - \tilde{f}(\mathbf{x}_0))^2] = \sigma^2 + E[(f(\mathbf{x}_0) - \tilde{f}(\mathbf{x}_0))^2] = \sigma^2 + \mathrm{MSE}(\tilde{f}(\mathbf{x}_0))$$

- $\sigma^2$: irreducible variance of the new observation $y_0$
- $\mathrm{MSE}(\tilde{f}(\mathbf{x}_0))$: mean squared error of the prediction at $\mathbf{x}_0$ by the estimator $\tilde{f}$

SLIDE 8

Least squares

MSE of a prediction

$$\mathrm{MSE}(\tilde{f}(\mathbf{x}_0)) = E[(f(\mathbf{x}_0) - \tilde{f}(\mathbf{x}_0))^2] = E\{[\mathbf{x}_0^t(\boldsymbol{\beta} - \tilde{\boldsymbol{\beta}})]^2\} = E\{[\mathbf{x}_0^t(\boldsymbol{\beta} - E(\tilde{\boldsymbol{\beta}}) + E(\tilde{\boldsymbol{\beta}}) - \tilde{\boldsymbol{\beta}})]^2\} = \mathrm{bias}(\tilde{f}(\mathbf{x}_0))^2 + \mathrm{Var}(\tilde{f}(\mathbf{x}_0))$$

- LS is unbiased $\Rightarrow \mathrm{bias}(\tilde{f}(\mathbf{x}_0)) = 0$
- LS minimizes $\mathrm{Var}(\tilde{f}(\mathbf{x}_0))$ (Gauss-Markov)
- LS has the smallest MSPE among all linear unbiased estimators

SLIDE 9

Least squares

LS instability

LS becomes unstable, with large MSPE, when $\mathrm{Var}(\tilde{f}(\mathbf{x}_0))$ is high. This can happen if there are
- many noise variables among the candidate regressors
- highly correlated predictors (multicollinearity)

$\Rightarrow$ Improve on the least squares MSPE by trading (a little) bias for (a lot of) variance reduction!

SLIDE 10

Manual variable selection approach

Manual variable selection

Goals:
- Determine the set of the most important regressors
- Remove the noise regressors from the model
- Avoid multicollinearity

Methods (each requires choosing a selection criterion):
- All subsets
- Backward elimination
- Forward selection
- Stepwise selection

SLIDE 11

Manual variable selection approach

Submodels

Dataset $Z_n = \{(y_i, x_{i1}, \ldots, x_{id}) = (y_i, \mathbf{x}_i);\ i = 1, \ldots, n\} \subset \mathbb{R}^{d+1}$. Let $\alpha \subset \{1, \ldots, d\}$ denote the predictors included in a submodel. The corresponding submodel is

$$y_i = \mathbf{x}_{\alpha i}'\boldsymbol{\beta}_\alpha + \epsilon_{\alpha i}, \qquad i = 1, \ldots, n.$$

A selected model is considered a good model if
- it is parsimonious
- it fits the data well
- it yields good predictions for similar data

SLIDE 12

Manual variable selection approach

Some standard selection criteria

- Adjusted $R^2$: $A(\alpha) = 1 - \dfrac{\mathrm{RSS}(\alpha)/(n - d(\alpha))}{\mathrm{RSS}(1)/(n - 1)}$
- Mallows' $C_p$: $C(\alpha) = \dfrac{\mathrm{RSS}(\alpha)}{\hat{\sigma}^2} - (n - 2d(\alpha))$
- Final Prediction Error: $\mathrm{FPE}(\alpha) = \dfrac{\mathrm{RSS}(\alpha)}{\hat{\sigma}^2} + 2d(\alpha)$
- AIC: $\mathrm{AIC}(\alpha) = -2L(\alpha) + 2d(\alpha)$
- BIC: $\mathrm{BIC}(\alpha) = -2L(\alpha) + \log(n)\,d(\alpha)$

where $\hat{\sigma}$ is the residual scale estimate in the "full" model.
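A minimal sketch computing these criteria from residual sums of squares; the Gaussian-likelihood forms of AIC/BIC (up to additive constants) are one common convention and an assumption here:

```python
import numpy as np

def selection_criteria(rss_alpha, d_alpha, rss_intercept, sigma2_full, n):
    """rss_alpha: RSS of the submodel; d_alpha: its number of predictors;
    rss_intercept: RSS of the intercept-only model; sigma2_full: residual
    variance estimate from the full model."""
    adj_r2 = 1 - (rss_alpha / (n - d_alpha)) / (rss_intercept / (n - 1))
    cp = rss_alpha / sigma2_full - (n - 2 * d_alpha)
    fpe = rss_alpha / sigma2_full + 2 * d_alpha
    # Gaussian-likelihood versions of AIC/BIC, up to an additive constant
    aic = n * np.log(rss_alpha / n) + 2 * d_alpha
    bic = n * np.log(rss_alpha / n) + np.log(n) * d_alpha
    return adj_r2, cp, fpe, aic, bic
```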

SLIDE 13

Manual variable selection approach

Resampling based selection criteria

Consider the (conditional) expected prediction error

$$\mathrm{PE}(\alpha) = E\left[\frac{1}{n}\sum_{i=1}^n \left(z_i - \mathbf{x}_{\alpha i}'\hat{\boldsymbol{\beta}}_\alpha\right)^2 \,\middle|\, \mathbf{y}, X\right],$$

where the $z_i$ are new responses at the observed design points. Estimates of the PE can be used as a selection criterion; they can be obtained by cross-validation or the bootstrap. A more advanced selection criterion takes both goodness-of-fit and PE into account:

$$\mathrm{PPE}(\alpha) = \frac{1}{n}\sum_{i=1}^n \left(y_i - \mathbf{x}_{\alpha i}'\hat{\boldsymbol{\beta}}_\alpha\right)^2 + f(n)\,d(\alpha) + \hat{E}\left[\frac{1}{n}\sum_{i=1}^n \left(z_i - \mathbf{x}_{\alpha i}'\hat{\boldsymbol{\beta}}_\alpha\right)^2 \,\middle|\, \mathbf{y}, X\right].$$
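For instance, PE(α) can be estimated by K-fold cross-validation; a minimal sketch (the fold count and helper names are ours):

```python
import numpy as np

def cv_prediction_error(X, y, alpha, K=5, seed=0):
    """Estimate PE(alpha) for the submodel using the predictors in `alpha`."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    sq_err = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # LS fit of the submodel on the training folds
        beta_a, *_ = np.linalg.lstsq(X[np.ix_(train, alpha)], y[train], rcond=None)
        sq_err.extend((y[test] - X[np.ix_(test, alpha)] @ beta_a) ** 2)
    return np.mean(sq_err)
```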
SLIDE 14

Automatic variable selection approach

Automatic variable selection

Try to find a stable model that fits the data well.
Approaches:
- Shrinkage: constrained least squares optimization
- Stagewise forward procedures

Methods:
- Ridge regression
- Lasso
- Least Angle Regression
- L2 Boosting
- Elastic Net

SLIDE 15

Automatic variable selection approach

Lasso

Least Absolute Shrinkage and Selection Operator:

$$\hat{\boldsymbol{\beta}}_{lasso} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^d \beta_j x_{ij}\Big)^2 \quad \text{subject to} \quad \|\boldsymbol{\beta}\|_1 = \sum_{j=1}^d |\beta_j| \le t,$$

where $0 < t < \|\hat{\boldsymbol{\beta}}_{LS}\|_1$ is a tuning parameter.
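In practice one usually solves the equivalent penalized (Lagrangian) form; e.g. scikit-learn's Lasso minimizes $(1/2n)\,\mathrm{RSS} + \lambda\|\boldsymbol{\beta}\|_1$, where each constraint level $t$ corresponds to some penalty $\lambda$. A sketch on simulated data:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X[:, 1] - 2 * X[:, 4] + rng.normal(scale=0.5, size=100)

fit = Lasso(alpha=0.1).fit(X, y)   # alpha is the penalty lambda here
print(fit.coef_)                   # several coefficients are exactly zero
```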

SLIDE 16

Automatic variable selection approach

Example: LASSO fits

[Figure: LASSO coefficient paths — standardized coefficients versus degrees of freedom; path labels: 6, 3, 7, 4, 8, 1.]

SLIDE 17

Automatic variable selection approach

Least angle regression

Standardize the variables.
1. Select $x_1$ such that $|\mathrm{cor}(y, x_1)| = \max_j |\mathrm{cor}(y, x_j)|$.
2. Put $r = y - \gamma x_1$, where $\gamma$ is determined such that $|\mathrm{cor}(r, x_1)| = \max_{j \neq 1} |\mathrm{cor}(r, x_j)|$.
3. Select $x_2$ corresponding to the maximum above. Determine the equiangular direction $b$ such that $\mathbf{x}_1'b = \mathbf{x}_2'b$.
4. Put $r = r - \gamma b$, where $\gamma$ is determined such that $|\mathrm{cor}(r, x_1)| = |\mathrm{cor}(r, x_2)| = \max_{j \neq 1,2} |\mathrm{cor}(r, x_j)|$.
5. Continue the procedure . . .
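scikit-learn's Lars estimator implements this algorithm and records the order in which predictors enter the active set; a small sketch on simulated data:

```python
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X[:, 2] - 2 * X[:, 5] + rng.normal(scale=0.5, size=100)

lar = Lars(n_nonzero_coefs=np.inf).fit(X, y)  # np.inf: no limit on the path
print(lar.active_)   # indices of the predictors in their order of entry
```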

SLIDE 18

Automatic variable selection approach

Properties of LAR

- Least angle regression (LAR) selects the predictors in order of importance.
- LAR changes the contributions of the predictors gradually, as they are needed.
- LAR is very similar to the LASSO and can easily be adjusted to produce the LASSO solution.
- LAR only uses the means, variances and correlations of the variables.
- LAR is computationally as efficient as LS.

SLIDE 19

Automatic variable selection approach

Example: LAR fits

[Figure: LAR coefficient paths — standardized coefficients versus degrees of freedom; path labels: 6, 3, 7, 4, 2, 1.]

SLIDE 20

Automatic variable selection approach

L2 boosting

Standardize the variables.
1. Put $r = y$ and $\hat{F}_0 = 0$.
2. Select $x_1$ such that $|\mathrm{cor}(r, x_1)| = \max_j |\mathrm{cor}(r, x_j)|$.
3. Update $r = y - \nu \hat{f}(x_1)$, where $0 < \nu \le 1$ is the step length and $\hat{f}(x_1)$ are the fitted values from the LS regression of $y$ on $x_1$. Similarly, update $\hat{F}_1 = \hat{F}_0 + \nu \hat{f}(x_1)$.
4. Continue the procedure . . .
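A minimal componentwise L2 boosting sketch (the defaults and names are ours): at each step the current residual is regressed on the most correlated predictor and only a fraction ν of that fit is taken:

```python
import numpy as np

def l2_boost(X, y, nu=0.1, steps=200):
    X = (X - X.mean(0)) / X.std(0)            # standardize the variables
    y = y - y.mean()
    r, coef = y.copy(), np.zeros(X.shape[1])
    for _ in range(steps):
        cors = [abs(np.corrcoef(r, X[:, j])[0, 1]) for j in range(X.shape[1])]
        j = int(np.argmax(cors))              # most correlated predictor
        gamma = (X[:, j] @ r) / (X[:, j] @ X[:, j])  # LS slope of r on x_j
        coef[j] += nu * gamma                 # update the ensemble F
        r -= nu * gamma * X[:, j]             # update the residual
    return coef
```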

SLIDE 21

Automatic variable selection approach

Sequencing variables

Several selection algorithms sequence the predictors in "order of importance" or screen out the most relevant variables:
- Forward/stepwise selection
- Stagewise forward selection
- Penalty methods
- Least angle regression
- L2 boosting

These methods are computationally very efficient because they are based only on means, variances and correlations.

SLIDE 22

Robustness

Robustness: Data with outliers

Question: how many partners do men and women desire to have in the next 30 years?
Men: mean = 64.3, median = 1
→ The mean is sensitive to outliers.
→ The median is robust and thus more reliable.

SLIDE 23

Robustness

Least squares regression

[Figure: log light intensity versus log surface temperature, with the LS fit.]

LS: minimize $\sum_i r_i^2(\boldsymbol{\beta})$

SLIDE 24

Robustness

Outliers

[Figure: log light intensity versus log surface temperature, now with outliers; LS fit shown.]

Outliers attract LS!

SLIDE 25

Robustness

Robust regression estimators

[Figure: log light intensity versus log surface temperature; LS and MM fits shown.]

Robust MM estimator is less influenced by outliers!

SLIDE 26

Robustness

Robust univariate location estimators

The sample mean $\bar{X}_n$ satisfies the equation

$$\sum_{i=1}^n (X_i - \bar{X}_n) = 0.$$

The ML estimator $\hat{\theta}$ solves the equation

$$\sum_{i=1}^n \frac{\partial}{\partial\theta}\log f_\theta(X_i)\Big|_{\theta=\hat{\theta}} = 0.$$

For a suitable score function $\psi(x, \theta)$, the M-estimator $T_n$ solves the equation

$$\sum_{i=1}^n \psi(X_i - T_n) = 0.$$

SLIDE 27

Robustness

Univariate location M-estimators

$$\sum_{i=1}^n \psi(X_i - T_n) = 0$$

- Consistent if $\int \psi(y)\,dF(y) = E_F(\psi(y)) = 0$
- Asymptotic efficiency: $\dfrac{\left(\int \psi'\,d\Phi\right)^2}{\int \psi^2\,d\Phi}$
- Robustness: maximal breakdown point (50%) if $\psi(y)$ is bounded!

SLIDE 28

Robustness

Examples of M-estimators

- Sample mean: $\psi(t) = t$. Unbounded! Efficiency: 100%.
- Median: $\psi(t) = \mathrm{sign}(t)$. Bounded; efficiency: 63.7%.
- Huber estimator, with $b > 0$:

$$\psi_b(t) = \min\{b, \max\{t, -b\}\} = \begin{cases} t & \text{if } |t| \le b \\ \mathrm{sign}(t)\,b & \text{if } |t| > b \end{cases}$$
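A minimal IRLS sketch for the Huber location M-estimator; fixing the scale at the MAD is a common choice and an assumption here:

```python
import numpy as np

def huber_location(x, b=1.37, tol=1e-8):
    s = 1.483 * np.median(np.abs(x - np.median(x)))   # MAD scale (assumed > 0)
    t = np.median(x)                                  # robust starting value
    while True:
        u = (x - t) / s
        w = np.minimum(1.0, b / np.maximum(np.abs(u), 1e-12))  # psi(u)/u weights
        t_new = np.sum(w * x) / np.sum(w)             # weighted mean update
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
```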

SLIDE 29

Robustness

Huber psi function

[Figure: the Huber ψ function — linear between −b and b, constant at ±b outside.]

SLIDE 30

Robustness

Tuning the Huber M-estimator

The Huber M-estimator has maximal breakdown point for any $b < \infty$
→ $b$ can be chosen for good efficiency at $\Phi$: $b = 1.37$ yields 95% efficiency
→ trade-off between robustness and efficiency!

SLIDE 31

Robustness

Example: Copper content in flour

Copper content (parts per million) in 24 wholemeal flour samples

[Figure: dot plot of the 24 copper measurements (axis 5–30 ppm).]

SLIDE 32

Robustness

Example: Copper content in flour

Copper content (parts per million) in 24 wholemeal flour samples:
- Sample mean: 4.28
- Sample median: 3.39
- Huber M-estimator: 3.21

SLIDE 33

Robustness

Monotone M-estimates

The Huber M-estimator has a monotone psi-function. If the function $\psi(t)$ is monotone, then
- the equation $\sum_{i=1}^n \psi(X_i - T_n) = 0$ has a unique solution
- $T_n$ is easy to compute
- $T_n$ has maximal breakdown point
- large outliers still affect the estimate (although the effect remains bounded)

SLIDE 34

Robustness

Redescending M-estimates

If the function $\psi(t)$ is not monotone but redescends to zero, then the equation $\sum_{i=1}^n \psi(X_i - T_n) = 0$ has multiple solutions. Define $\rho(t)$ such that $\rho'(t) = \psi(t)$; then we need the solution of

$$\min_{T_n} \sum_{i=1}^n \rho(X_i - T_n).$$

- $T_n$ can be more difficult to compute
- $T_n$ has maximal breakdown point
- The effect of large outliers on the estimate reduces to zero! Increased robustness against large outliers.

SLIDE 35

Robustness

Redescending M-estimates

A popular family of redescending loss functions is the Tukey biweight (bisquare) family:

$$\rho_c(t) = \begin{cases} \dfrac{t^2}{2} - \dfrac{t^4}{2c^2} + \dfrac{t^6}{6c^4} & \text{if } |t| \le c \\[1ex] \dfrac{c^2}{6} & \text{if } |t| > c. \end{cases}$$

The constant $c$ can be tuned for efficiency.

SLIDE 36

Robustness

Tukey biweight ρ functions

[Figure: Tukey biweight ρ functions for c = 3, c = 2 and c = ∞.]

SLIDE 37

Robustness

Tukey biweight ψ function

[Figure: the Huber ψ function (bounded at ±b) and the Tukey biweight ψ function (redescending to zero beyond ±c).]

SLIDE 38

Robustness

Example: Copper content in flour

Copper content (parts per million) in 24 wholemeal flour samples:
- Sample mean: 4.28
- Sample median: 3.39
- Huber M-estimator: 3.21
- Tukey biweight M-estimator: 3.16

SLIDE 39

Robustness

Univariate scale estimators

Example: copper content (parts per million) in 24 wholemeal flour samples.
- Standard deviation: 5.30
- Median absolute deviation: $S_n = 1.483\,\mathrm{med}_i(|X_i - \mathrm{med}_j(X_j)|)$; MAD: 0.53

→ The standard deviation is sensitive to outliers.
→ The MAD is robust and thus more reliable.

SLIDE 40

Robustness

M-estimators of scale

An M-estimator of scale is the solution $S_n$ of

$$\sum_{i=1}^n \psi(X_i/S_n) = 0.$$

For symmetric distributions, use symmetric $\psi$ functions. The estimator is consistent if $\int \psi(y)\,dF(y) = E_F(\psi(y)) = 0$. The Tukey biweight loss functions $\rho_c$ are symmetric. Put $b = E_\Phi(\rho_c)$ and define $\psi_c(t) = \rho_c(t) - b$; then the Tukey biweight M-estimator of scale $S_n$ solves

$$\sum_{i=1}^n \psi_c(X_i/S_n) = 0, \quad \text{or equivalently} \quad \frac{1}{n}\sum_{i=1}^n \rho_c(X_i/S_n) = b.$$
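A sketch of the M-scale computed by fixed-point iteration, using the biweight ρ from the earlier slide; the tuning c = 1.548 (aimed at 50% breakdown) and the MAD starting value are assumptions:

```python
import numpy as np

def rho_biweight(t, c):
    t = np.minimum(np.abs(t), c)                 # rho equals c^2/6 beyond c
    return t**2 / 2 - t**4 / (2 * c**2) + t**6 / (6 * c**4)

def m_scale(x, c=1.548, tol=1e-8):
    # b = E_Phi(rho_c) by a simple Riemann sum over the standard normal
    z = np.linspace(-8, 8, 20001)
    b = np.sum(rho_biweight(z, c) * np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)) * (z[1] - z[0])
    s = 1.483 * np.median(np.abs(x - np.median(x)))   # MAD start (assumed > 0)
    while True:                                       # solve (1/n) sum rho(x/s) = b
        s_new = s * np.sqrt(np.mean(rho_biweight(x / s, c)) / b)
        if abs(s_new - s) < tol * s:
            return s_new
        s = s_new
```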

SLIDE 41

Robustness

Example: Copper content in flour

Copper content (parts per million) in 24 wholemeal flour samples:
- Standard deviation: 5.30
- Median absolute deviation: 0.53
- Tukey biweight M-estimator: 0.66

SLIDE 42

Robustness

Robust regression

Denote by $r_i(\boldsymbol{\beta}) = y_i - \mathbf{x}_i'\boldsymbol{\beta}$ the residuals corresponding to $\boldsymbol{\beta}$. $\hat{\boldsymbol{\beta}}_{LS}$ solves

$$\min_{\boldsymbol{\beta}} \sum_{i=1}^n (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2 = \sum_{i=1}^n r_i(\boldsymbol{\beta})^2.$$

Denote by $\hat{\sigma}(\boldsymbol{\beta}) = \sqrt{\sum_{i=1}^n r_i(\boldsymbol{\beta})^2/(n - d)}$ the estimate of the residual scale. The LS estimator $\hat{\boldsymbol{\beta}}_{LS}$ then equivalently solves $\min_{\boldsymbol{\beta}} \hat{\sigma}(\boldsymbol{\beta})$.

$\Rightarrow$ Instead, minimize a robust estimate of the residual scale.

SLIDE 43

Robustness

Least Median of Squares regression

LS: minimize $\dfrac{1}{n-d}\sum_{i=1}^n r_i(\boldsymbol{\beta})^2$ → LMS: minimize $\mathrm{med}_i\, r_i(\boldsymbol{\beta})^2$

- Maximal breakdown point (50%)
- Small bias
- Slow rate of convergence ($n^{-1/3}$)
- Inefficient

SLIDE 44

Robustness

Least Trimmed Squares regression

LS: minimize $\dfrac{1}{n-d}\sum_{i=1}^n r_i(\boldsymbol{\beta})^2$ → LTS: minimize $\dfrac{1}{h}\sum_{i=1}^h (r(\boldsymbol{\beta})^2)_{i:n}$, where $(r(\boldsymbol{\beta})^2)_{1:n} \le \cdots \le (r(\boldsymbol{\beta})^2)_{n:n}$

- Breakdown point is $\min\{h, n-h\}/n \le 50\%$
- Asymptotically normal
- Trade-off robustness-efficiency
- Low efficiency (less than 10%)

SLIDE 45

Robustness

Regression S-estimators

LS: minimize $\dfrac{1}{n}\sum_{i=1}^n r_i(\boldsymbol{\beta})^2$ → S-estimate: minimize $\hat{\sigma}(\boldsymbol{\beta})$, where for each $\boldsymbol{\beta}$, $\hat{\sigma}(\boldsymbol{\beta})$ solves

$$\frac{1}{n}\sum_{i=1}^n \rho_c\!\left(\frac{r_i(\boldsymbol{\beta})}{\sigma}\right) = b.$$

- $c$ determines both robustness and efficiency
- Trade-off robustness-efficiency
- Breakdown point can be up to 50%
- Asymptotically normal
- Efficiency can still be low (less than 35%)

SLIDE 46

Robustness

Regression M-estimators

LS: minimize $\sum_{i=1}^n r_i(\boldsymbol{\beta})^2$ → M-estimate: minimize

$$\sum_{i=1}^n \rho\!\left(\frac{r_i(\boldsymbol{\beta})}{\hat{\sigma}}\right), \quad \text{or solve} \quad \sum_{i=1}^n \psi\!\left(\frac{r_i(\boldsymbol{\beta})}{\hat{\sigma}}\right)\mathbf{x}_i = 0.$$

Requires a robust scale estimate $\hat{\sigma}$!

SLIDE 47

Robustness

MM estimates

LS: minimize $\sum_{i=1}^n r_i(\boldsymbol{\beta})^2$ → MM-estimate: minimize

$$\sum_{i=1}^n \rho\!\left(\frac{r_i(\boldsymbol{\beta})}{\hat{\sigma}}\right),$$

where $\hat{\sigma}$ is the S-estimator's M-scale.
- The M- and S-estimators both use Tukey biweight $\rho_c$ functions
- The S-estimator is tuned for robustness (breakdown point)
- The redescending M-estimator is tuned for efficiency
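A sketch of the M-step only: IRLS with the efficiency-tuned biweight $\psi_1$ and the scale held fixed at the S-estimator's M-scale. The inputs beta0 (initial S-estimate) and sigma_hat are assumed given, and c = 4.685 (95% efficiency) is an assumed tuning:

```python
import numpy as np

def mm_irls(X, y, beta0, sigma_hat, c=4.685, iters=50):
    beta = beta0.astype(float).copy()
    for _ in range(iters):
        u = (y - X @ beta) / sigma_hat
        # biweight weights psi_1(u)/u = (1 - (u/c)^2)^2 inside [-c, c], 0 outside
        w = np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)
        WX = w[:, None] * X
        beta = np.linalg.solve(WX.T @ X, WX.T @ y)   # weighted LS step
    return beta
```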

SLIDE 48

Robustness

MM: loss functions

Tukey biweight family (rescaled so that the loss equals 1 beyond $c$):

$$\rho_c(t) = \begin{cases} 3\left(\dfrac{t}{c}\right)^2 - 3\left(\dfrac{t}{c}\right)^4 + \left(\dfrac{t}{c}\right)^6 & \text{if } |t| \le c \\ 1 & \text{if } |t| > c, \end{cases}$$

[Figure: the two loss functions $\rho_0$ (tuning constant $c_0$) and $\rho_1$ (tuning constant $c_1 > c_0$).]

- $\rho_0$ determines the breakdown point (S-estimator)
- $\rho_1$ determines the efficiency (MM-estimator)

SLIDE 49

Robustness

MM estimates

Same construction as on the previous slide: minimize $\sum_{i=1}^n \rho(r_i(\boldsymbol{\beta})/\hat{\sigma})$ with $\hat{\sigma}$ the S-estimator's M-scale, the S-part tuned for breakdown and the redescending M-part tuned for efficiency.
⇒ Highly robust and efficient!

SLIDE 50

Robustness

Redescending psi function

⋆ A redescending psi function is needed for robustness, but this implies:
- multiple solutions of the score equations
- the global solution is needed (high breakdown point)
- difficult (time consuming) to compute

SLIDE 51

Robust variable selection: sequencing

Robust variable selection

Issues:
- Robust regression estimators are computationally demanding
- 'Outliers' depend on the model under consideration
- High dimensional data: outlying cases?

Our approach: a two-step procedure
1. Sequencing: construct a reduced sequence of good predictors in an efficient way.
2. Segmentation: build an optimal model from the reduced set of predictors.

SLIDE 52

Robust variable selection: sequencing

Sequencing the variables in order of importance

Automatic variable selection methods such as forward/stepwise selection, LAR and L2 boosting are computationally efficient methods to sequence predictors. These methods are based only on the means, variances and correlations of the data.
⇒ Construct computationally efficient, robust methods to sequence predictors by using computationally efficient and highly robust estimates of center, scale and correlation.

SLIDE 53

Robust variable selection: sequencing

Robust building blocks

- Location: median
- Scatter: median absolute deviation
- Correlation: bivariate Winsorization
- Correlation: bivariate M-estimators
- Correlation: Gnanadesikan-Kettenring estimators

SLIDE 54

Robust variable selection: sequencing

Winsorized correlation estimates

1. Robustly standardize the data using the median and MAD.
2. Transform the data by shifting outliers towards the center.
3. Calculate the Pearson correlation of the transformed data.
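A minimal sketch of these three steps with the univariate (Huber-type) transformation of the next slide; c = 2 is the constant shown there:

```python
import numpy as np

def winsorized_cor(x, y, c=2.0):
    def rob_standardize(v):
        mad = 1.483 * np.median(np.abs(v - np.median(v)))
        return (v - np.median(v)) / mad
    u = np.clip(rob_standardize(x), -c, c)   # shift outliers towards the center
    w = np.clip(rob_standardize(y), -c, c)
    return np.corrcoef(u, w)[0, 1]           # Pearson correlation of transformed data
```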

SLIDE 55

Robust variable selection: sequencing

Univariate Winsorization

Componentwise transformation: $u = \psi_c(x) = \min(\max(-c, x), c)$

[Figure: the Huber ψ function with c = 2 used as the Winsorizing transformation.]

SLIDE 56

Robust variable selection: sequencing

Univariate Winsorization

Componentwise transformation: $u = \psi_c(x) = \min(\max(-c, x), c)$

[Figure: scatter plot of bivariate data after univariate Winsorization of each coordinate.]

SLIDE 57

Robust variable selection: sequencing

Bivariate Winsorization

Bivariate transformation: $u = \min\left(\sqrt{c/D(\mathbf{x})},\, 1\right)\mathbf{x}$ with $c = F^{-1}_{\chi^2_2}(0.95)$ and $D(\mathbf{x}) = \mathbf{x}^t R_0^{-1}\mathbf{x}$, where $R_0$ is an initial bivariate correlation matrix.

[Figure: scatter plot of bivariate data after bivariate Winsorization.]


SLIDE 61

Robust variable selection: sequencing

Initial correlation estimate

Adjusted Winsorization: univariate Winsorization with different tuning constants in different quadrants.

Denote by $h$ the ratio of the number of observations in the second and fourth quadrants to the number in the first and third quadrants, and suppose $h \le 1$. Then
- use constant $c_1$ for Winsorizing points in the first and third quadrants
- use $c_2 = \sqrt{h}\,c_1$ for the second and fourth quadrants

$R_0$ is the correlation matrix of the adjusted Winsorized data.

SLIDE 62

Robust variable selection: sequencing

Initial correlation estimate

Adjusted Winsorization: univariate Winsorization with different tuning constants for different quadrants.

[Figure: scatter plot of bivariate data after adjusted Winsorization.]

SLIDE 63

Robust variable selection: sequencing

Initial correlation estimate

Univariate Winsorization

[Figure: the same data after plain univariate Winsorization, for comparison.]

SLIDE 64

Robust variable selection: sequencing

Correlation M-estimators

1. First center the two variables using their medians.
2. An M-estimate of the covariance matrix is the solution $V$ of the equation

$$\frac{1}{n}\sum_i u_2(d_i^2)\,\mathbf{x}_i\mathbf{x}_i' = V,$$

where $d_i^2 = \mathbf{x}_i'V^{-1}\mathbf{x}_i$ and $u_2(t) = \min(\chi^2_2(0.99)/t,\, 1)$.
3. Calculate the correlation corresponding to the bivariate covariance matrix $V$.

SLIDE 65

Robust variable selection: sequencing

Gnanadesikan-Kettenring correlation estimators

Consider the identity

$$\mathrm{cov}(X, Y) = \frac{1}{4}\left[\mathrm{sd}(X + Y)^2 - \mathrm{sd}(X - Y)^2\right].$$

Replace the sample standard deviations by robust estimates of scale to obtain robust correlation estimates.
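A sketch with the MAD as the robust scale; standardizing first and normalizing the two terms keeps the estimate in [−1, 1] (a common refinement, assumed here):

```python
import numpy as np

def mad(v):
    return 1.483 * np.median(np.abs(v - np.median(v)))

def gk_cor(x, y, scale=mad):
    u, v = x / scale(x), y / scale(y)        # robustly standardized variables
    sp, sm = scale(u + v) ** 2, scale(u - v) ** 2
    return (sp - sm) / (sp + sm)             # robust correlation estimate
```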

SLIDE 66

Robust variable selection: sequencing

Robust correlations: Computational efficiency

[Figure: CPU time versus sample size for the univariate Winsorization, adjusted Winsorization, bivariate Winsorization and Maronna (M-estimator) correlations.]

SLIDE 67

Robust variable selection: sequencing

Robust LAR: Computational efficiency

The computational efficiency of the correlation estimates largely determines the computing time of robust LAR.

[Figure: CPU time versus dimension for LARS, W-RLARS and M-RLARS.]

SLIDE 68

Robust variable selection: sequencing

Bootstrapping the sequencing algorithms

Use bootstrap averages to obtain more reliable and stable sequences. Procedure:
1. Generate 50 bootstrap samples.
2. Sequence the predictors in each sample.
3. Rank the predictors according to their average rank over the bootstrap samples.

Not all predictors have to be ranked in each bootstrap sample.

SLIDE 69

Robust variable selection: sequencing

Bootstrap effect on robust LAR

Simulation design:
- samples of size 150 in 200 dimensions
- 10 target predictors
- 20 noise covariates correlated with the target predictors
- 170 independent noise covariates
- 10% of symmetric or asymmetric high-leverage outliers

We compare with random forests, using variable importance measures to sequence the variables.

SLIDE 70

Robust variable selection: sequencing

Bootstrap RLAR vs RLAR/Random Forests

[Figure: number of target variables recovered versus number of sequenced variables for B-RLARS, RLARS, RF-OOB and RF-IMP; left panel: symmetric high-leverage outliers, right panel: asymmetric high-leverage outliers.]

SLIDE 71

Robust variable selection: sequencing

Example: Demographic data

- $n = 50$ states of the USA, $d = 25$ covariates
- Response: $y$ = murder rate
- One outlier
- 5-fold cross-validation selects a model with 7 variables
- We sequence the variables using B-RLARS and construct a learning curve: a graphical tool to select the size of the reduced sequence in practice, based on a robust $R^2$ measure, e.g.

$$R^2 = 1 - \frac{\mathrm{Med}(\text{residual}^2)}{\mathrm{MAD}^2(y)}$$

SLIDE 72

Robust variable selection: sequencing

Demographic data: learning curve

[Figure: learning curve — robust learning rate versus number of variables in the model.]

⇒ Reduced set of at most 12 predictors

SLIDE 73

Robust variable selection: sequencing

Demographic data: models

- Full CV model: 7 predictors
- B-RLAR+CV: 6 predictors
- LAR+CV: 8 predictors
- RF-SEL: 5 predictors
- RF-SEL+CV: 4 predictors
- RF-RED+CV: 5 predictors
- MSVM-RFE: 8 predictors
- MSVM-RFE+CV: 6 predictors

SLIDE 74

Robust variable selection: sequencing

Demographic data: model comparison

Density estimates based on 1000 5-fold CV-MSPE estimates.

[Figure: density estimates of the 5-fold CV-MSPE for Full-CV, LARS+CV, B-RLARS+CV, RF-SEL+CV, RF-RED+CV and MSVM-RFE.]

SLIDE 75

Robust variable selection: sequencing

Example: Protein data

$n = 4141$ protein sequences, $d = 77$ covariates. Training sample of size 2072 and test sample of size 2069. We selected predictors using
- B-RLAR: 5 predictors
- RF using OOB importance: 22 predictors
- MSVM-RFE: 22 predictors

For RF we could determine an optimal submodel in the reduced sequence using robust MM-estimates with the robust FPE. ⇒ RF+RFPE: 18 predictors

SLIDE 76

Robust variable selection: sequencing

Protein data: test sample errors

Trimmed means of squared prediction errors:

              Trimming fraction
Model        1%       5%       10%
B-RLAR       116.19   97.73    84.67
RF           111.11   93.80    81.30
RF-RFPE      111.30   93.92    81.27
MSVM-RFE     173.70   150.48   133.17

SLIDE 77

Robust variable selection: sequencing

Example: Particle data

- Quantum physics data with $d = 64$ predictors
- Training sample of size 5,000; test sample of size 45,000
- FS and SW produced a model with 25 predictors
- Robust FS and SW produced a model with only 1 predictor: indeed, for more than 80% of the cases $X_1 = Y = 0$
- For the cases with $X_1 \neq 0$, FS produced a model with 5 predictors
- We fit the final models using MM-estimators

SLIDE 78

Robust variable selection: sequencing

Particle data: test sample errors

Trimmed means of squared prediction errors:

            Trimming fraction
Model        1%      5%
FS           0.110   0.012
Robust FS    0.032   0.001

SLIDE 79

Robust variable selection: segmentation

Segmentation: Robust adjusted R-squared

Adjusted $R^2$: $A(\alpha) = 1 - \dfrac{\mathrm{RSS}(\alpha)/(n - d(\alpha))}{\mathrm{RSS}(1)/(n - 1)}$.

Based on a robust regression estimator we can construct a robust adjusted $R^2$:

$$RR^2_a(\alpha) = 1 - \frac{\hat{\sigma}^2_\alpha/(n - d(\alpha))}{\hat{\sigma}^2_0/(n - 1)},$$

where $\hat{\sigma}_\alpha$ is the robust residual scale of the submodel with predictors indexed by $\alpha$, and $\hat{\sigma}_0$ is the robust residual scale of the intercept-only model.

SLIDE 80

Robust variable selection: segmentation

Segmentation: Robust FPE

$\mathrm{FPE}(\alpha) = \dfrac{\mathrm{RSS}(\alpha)}{\hat{\sigma}^2} + 2d(\alpha)$ estimates the final prediction error

$$\mathrm{FPE}(\alpha) = \frac{1}{\sigma^2}\sum_{i=1}^n E\left[(z_i - \mathbf{x}_{\alpha i}'\hat{\boldsymbol{\beta}}_\alpha)^2\right],$$

assuming that the model is correct. Consider now the robust final prediction error

$$\mathrm{RFPE}(\alpha) = \sum_{i=1}^n E\left[\rho\!\left(\frac{z_i - \mathbf{x}_{\alpha i}'\hat{\boldsymbol{\beta}}_\alpha}{\sigma}\right)\right].$$

Assuming that the model is correct and using a second-order Taylor expansion, this can be estimated by

$$\widehat{\mathrm{RFPE}}(\alpha) = \sum_{i=1}^n \rho\!\left(\frac{r_i(\hat{\boldsymbol{\beta}}_\alpha)}{\hat{\sigma}_n}\right) + d(\alpha)\,\frac{\sum_{i=1}^n \psi^2\!\left(r_i(\hat{\boldsymbol{\beta}}_\alpha)/\hat{\sigma}_n\right)}{\sum_{i=1}^n \psi'\!\left(r_i(\hat{\boldsymbol{\beta}}_\alpha)/\hat{\sigma}_n\right)},$$

where $\hat{\sigma}_n$ is the robust scale estimate of a 'full' model $\alpha_f$. Usually, $\alpha_f = \{1, \ldots, d\}$.
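A sketch of this RFPE estimate given the MM residuals for submodel α and the full-model scale; ρ here is the rescaled biweight of slide 48, ψ = ρ′ and ψ′ = ρ″, with an assumed efficiency tuning c = 4.685:

```python
import numpy as np

def rfpe(res, sigma_hat, d_alpha, c=4.685):
    u = res / sigma_hat
    inside = np.abs(u) <= c
    v = u / c
    rho = np.where(inside, 3 * v**2 - 3 * v**4 + v**6, 1.0)
    psi = np.where(inside, 6 * u / c**2 * (1 - v**2) ** 2, 0.0)           # rho'
    dpsi = np.where(inside, 6 / c**2 * (1 - v**2) * (1 - 5 * v**2), 0.0)  # rho''
    return rho.sum() + d_alpha * (psi**2).sum() / dpsi.sum()
```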

SLIDE 81

Robust variable selection: segmentation

Robust resampling based selection criteria

Robust equivalents of the resampling-based selection criteria:

$$\mathrm{RPE}(\alpha) = \frac{\hat{\sigma}_n^2}{n}\, E^\star\!\left[\sum_{i=1}^n \rho\!\left(\frac{z_i - \mathbf{x}_{\alpha i}'\hat{\boldsymbol{\beta}}_\alpha}{\hat{\sigma}_n}\right) \,\middle|\, \mathbf{y}, X\right]$$

$$\mathrm{PRPE}(\alpha) = \frac{\hat{\sigma}_n^2}{n}\left[\sum_{i=1}^n \rho\!\left(\frac{y_i - \mathbf{x}_{\alpha i}'\hat{\boldsymbol{\beta}}_\alpha}{\hat{\sigma}_n}\right) + f(n)\,d(\alpha)\right] + \tilde{M}_n(\alpha)$$

- $\rho$ is the MM loss function and $\hat{\boldsymbol{\beta}}_{\alpha,n}$ is the MM estimate
- $f(n)\,d(\alpha)$ is the penalty term, with e.g. $f(n) = 2\log n$
- $\hat{\sigma}_n$ is the robust scale estimate of a 'full' model $\alpha_f$; usually $\alpha_f = \{1, \ldots, d\}$
- $E^\star$ is a robust resampling estimate of the expected value

SLIDE 82

Robust variable selection: segmentation

Robustness and resampling

Resampling robust estimators causes problems with both robustness and speed. The stratified bootstrap (Müller and Welsh, JASA, 2005) only solves the first problem → limited practical use. The fast and robust bootstrap solves both problems.

SLIDE 83

Robust variable selection: segmentation

MM-estimators revisited

For the model comparison we use slightly adjusted MM-estimators. The MM-estimates $\hat{\boldsymbol{\beta}}_\alpha$ satisfy

$$\frac{1}{n}\sum_{i=1}^n \psi_1\!\left(\frac{y_i - \mathbf{x}_{\alpha i}'\hat{\boldsymbol{\beta}}_\alpha}{\hat{\sigma}_n}\right)\mathbf{x}_{\alpha i} = 0,$$

where $\hat{\sigma}_n$ minimizes the M-scale $\hat{\sigma}_n(\boldsymbol{\beta})$, which for any $\boldsymbol{\beta} \in \mathbb{R}^d$ is defined as the solution of

$$\frac{1}{n}\sum_{i=1}^n \rho_0\!\left(\frac{y_i - \mathbf{x}_i'\boldsymbol{\beta}}{\hat{\sigma}_n(\boldsymbol{\beta})}\right) = b.$$

- $\rho_0$ determines the breakdown point (S-estimator)
- $\rho_1$ determines the efficiency (MM-estimator)

SLIDE 84

Robust variable selection: segmentation

Bootstrapping MM-estimates

Weighted least squares representation of the MM-estimator:

$$\hat{\boldsymbol{\beta}}_{\alpha,n} = \left(\sum_{i=1}^n \omega_{\alpha i}\,\mathbf{x}_{\alpha i}\mathbf{x}_{\alpha i}'\right)^{-1}\sum_{i=1}^n \omega_{\alpha i}\,\mathbf{x}_{\alpha i}\,y_i$$

with $\omega_{\alpha i} = \rho_1'(r_{\alpha i}/\hat{\sigma}_n)/r_{\alpha i}$ and $r_{\alpha i} = y_i - \hat{\boldsymbol{\beta}}_{\alpha,n}'\mathbf{x}_{\alpha i}$.

Let $(y_i^\star, \mathbf{x}_{\alpha i}^\star)$, $i = 1, \ldots, m$, be a bootstrap sample of size $m \le n$. Then $\hat{\boldsymbol{\beta}}^\star_\alpha$ satisfies

$$\hat{\boldsymbol{\beta}}^\star_{\alpha,m} = \left(\sum_{i=1}^m \omega^\star_{\alpha i}\,\mathbf{x}^\star_{\alpha i}\mathbf{x}^{\star\prime}_{\alpha i}\right)^{-1}\sum_{i=1}^m \omega^\star_{\alpha i}\,\mathbf{x}^\star_{\alpha i}\,y^\star_i$$

with $\omega^\star_{\alpha i} = \rho_1'(r^\star_{\alpha i}/\hat{\sigma}^\star_n)/r^\star_{\alpha i}$ and $r^\star_{\alpha i} = y^\star_i - \hat{\boldsymbol{\beta}}^{\star\prime}_{\alpha,m}\mathbf{x}^\star_{\alpha i}$.

SLIDE 85

Robust variable selection: segmentation

Fast and robust bootstrap

Weighted least squares representation of the MM-estimator:

$$\hat{\boldsymbol{\beta}}_{\alpha,n} = \left(\sum_{i=1}^n \omega_{\alpha i}\,\mathbf{x}_{\alpha i}\mathbf{x}_{\alpha i}'\right)^{-1}\sum_{i=1}^n \omega_{\alpha i}\,\mathbf{x}_{\alpha i}\,y_i$$

with $\omega_{\alpha i} = \rho_1'(r_{\alpha i}/\hat{\sigma}_n)/r_{\alpha i}$ and $r_{\alpha i} = y_i - \hat{\boldsymbol{\beta}}_{\alpha,n}'\mathbf{x}_{\alpha i}$.

Let $(y_i^\star, \mathbf{x}_{\alpha i}^\star)$, $i = 1, \ldots, m$, be a bootstrap sample of size $m \le n$. Define $\hat{\boldsymbol{\beta}}^{1,\star}_\alpha$ by

$$\hat{\boldsymbol{\beta}}^{1,\star}_{\alpha,m} = \left(\sum_{i=1}^m \omega^\star_{\alpha i}\,\mathbf{x}^\star_{\alpha i}\mathbf{x}^{\star\prime}_{\alpha i}\right)^{-1}\sum_{i=1}^m \omega^\star_{\alpha i}\,\mathbf{x}^\star_{\alpha i}\,y^\star_i$$

with $\omega^\star_{\alpha i} = \rho_1'(r^\star_{\alpha i}/\hat{\sigma}_n)/r^\star_{\alpha i}$ and $r^\star_{\alpha i} = y^\star_i - \hat{\boldsymbol{\beta}}_{\alpha,n}'\mathbf{x}^\star_{\alpha i}$.

Note that $\hat{\boldsymbol{\beta}}_{\alpha,n}$ and $\hat{\sigma}_n$ are not recalculated!

SLIDE 86

Robust variable selection: segmentation

Fast and robust bootstrap

The estimates $\hat{\boldsymbol{\beta}}^{1,\star}_{\alpha,m}$ will under-estimate the variability of the completely recalculated estimates $\hat{\boldsymbol{\beta}}^\star_{\alpha,m}$ → a correction is needed. The fast and robust bootstrap estimates $\hat{\boldsymbol{\beta}}^{R\star}_{\alpha,m}$ are given by

$$\hat{\boldsymbol{\beta}}^{R\star}_{\alpha,m} = \hat{\boldsymbol{\beta}}_{\alpha,n} + K_{\alpha,n}\left(\hat{\boldsymbol{\beta}}^{1,\star}_{\alpha,m} - \hat{\boldsymbol{\beta}}_{\alpha,n}\right),$$

where

$$K_{\alpha,n} = \hat{\sigma}_n\left(\sum_{i=1}^n \rho_1''(r_{\alpha i}/\hat{\sigma}_n)\,\mathbf{x}_{\alpha i}\mathbf{x}_{\alpha i}'\right)^{-1}\sum_{i=1}^n \omega_{\alpha i}\,\mathbf{x}_{\alpha i}\mathbf{x}_{\alpha i}'.$$

Note that $K_{\alpha,n}$ is computed only once, for the original sample.
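A sketch of the whole recipe for one submodel, assuming the original MM fit supplies beta_hat, sigma_hat and the functions psi1 (= ρ₁′) and dpsi1 (= ρ₁″); all helper names are ours:

```python
import numpy as np

def fast_robust_bootstrap(X, y, beta_hat, sigma_hat, psi1, dpsi1,
                          B=1000, m=None, seed=0):
    n = len(y)
    m = n if m is None else m                       # m-out-of-n resampling
    r = y - X @ beta_hat                            # original residuals
    w = psi1(r / sigma_hat) / r                     # WLS weights (r != 0 assumed)
    M = (dpsi1(r / sigma_hat)[:, None] * X).T @ X   # sum rho_1''(r/s) x x'
    N = (w[:, None] * X).T @ X                      # sum w x x'
    K = sigma_hat * np.linalg.solve(M, N)           # correction matrix, computed once
    rng = np.random.default_rng(seed)
    draws = np.empty((B, X.shape[1]))
    for bb in range(B):
        idx = rng.integers(n, size=m)               # resample cases
        Xb, yb, wb = X[idx], y[idx], w[idx]         # weights are NOT refit
        beta1 = np.linalg.solve((wb[:, None] * Xb).T @ Xb,
                                (wb[:, None] * Xb).T @ yb)
        draws[bb] = beta_hat + K @ (beta1 - beta_hat)  # linear correction
    return draws
```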

SLIDE 87

Robust variable selection: segmentation

Properties of fast and robust bootstrap

- Computationally efficient: only weighted least squares calculations
- Robust: no recalculation of observation weights

SLIDE 88

Robust variable selection: segmentation

Consistent model selection

Suppose a true model $\alpha_0 \subset \{1, \ldots, d\}$ exists and is included in the set $\mathcal{A}$ of models considered. If we select the model that minimizes $\mathrm{RPE}(\alpha)$ or $\mathrm{PRPE}(\alpha)$, that is,

$$\hat{\alpha}_{m,n} = \arg\min_{\alpha \in \mathcal{A}} \mathrm{RPE}(\alpha) \quad \text{and} \quad \tilde{\alpha}_{m,n} = \arg\min_{\alpha \in \mathcal{A}} \mathrm{PRPE}(\alpha),$$

then, under appropriate regularity conditions, the model selection criteria are consistent in the sense that

$$\lim_{n\to\infty} P(\hat{\alpha}_{m,n} = \alpha_0) = 1 \quad \text{and} \quad \lim_{n\to\infty} P(\tilde{\alpha}_{m,n} = \alpha_0) = 1.$$

Two conditions have practical consequences:
- $m = o(n)$ ($m$ out of $n$ bootstrap)
- $f(n) = o(n/m)$

SLIDE 90

Robust variable selection: segmentation

Examples

We compare the full model with the models selected by backward elimination based on
- $\mathrm{RPE}(\alpha)$
- $\mathrm{PRPE}(\alpha)$ with $f(n) = \log(n)$
- RFPE

For each of the models we report $RR^2_a(\alpha)$, the robust adjusted $R^2$. To compare predictive power we calculated the 5-fold CV trimmed MSPE.

SLIDE 91

Robust variable selection: segmentation

Example 1: Ozone data

Los Angeles ozone pollution data, 1976. 366 observations (different days) on 9 variables.
- Response: temperature (degrees F) at El Monte, CA
- Covariates: measurements of temperature, pressure, humidity, ozone, etc. at other places in CA
- We start from the full quadratic model ($d = 45$)

model    size   RR²a     5% trimmed MSPE
Full     45     0.8660   10.78
RFPE     23     0.8174   10.66
α̃m,n    10     0.7583   11.67
α̂m,n    7      0.7643   10.45

SLIDE 92

Robust variable selection: segmentation

Example 2: Diabetes data

442 observations on 16 variables.
- Response: measure of disease progression one year after baseline
- Covariates: 10 baseline variables (age, sex, BMI, blood pressure, ...)
- We start from a quadratic model with some interactions ($d = 65$)

model    size   RR²a     5% trimmed MSE
Full     65     0.7731   4988.1
RFPE     16     0.6045   2231.2
α̃m,n    11     0.5127   2657.2
α̂m,n    7      0.5302   2497.0

SLIDE 93

References

Khan, J.A., Van Aelst, S., and Zamar, R.H. (2007). Building a Robust Linear Model with Forward Selection and Stepwise Procedures. Computational Statistics and Data Analysis, 52, 239-248.

Khan, J.A., Van Aelst, S., and Zamar, R.H. (2007). Robust Linear Model Selection Based on Least Angle Regression. Journal of the American Statistical Association, 102, 1289-1299.

Lutz, R.W., Kalisch, M., and Bühlmann, P. (2008). Robustified L2 Boosting. Computational Statistics and Data Analysis, 52, 3331-3341.

Maronna, R. A., Martin, D. R. and Yohai, V. J. (2006). Robust Statistics: Theory and Methods, Wiley: New York.

Salibian-Barrera, M. and Van Aelst S. (2007). Robust Model Selection Using Fast and Robust Bootstrap. Computational Statistics and Data Analysis, 52, 5121-5135
