

SLIDE 1

Conditional Predictive Inference Post Model Selection

Hannes Leeb

Department of Statistics, Yale University

Model Selection Workshop, Vienna, July 25, 2008


SLIDES 2-3

Problem statement

Predictive inference post model selection, in a setting with large dimension and (comparatively) small sample size.

Example (Stenbakken & Souders, 1987, 1991): predict the performance of D/A converters. Select 64 explanatory variables from a total of 8,192, based on a sample of size 88.

Features of this example:

  • Large number of candidate models
  • Selected model is complex in relation to sample size
  • Focus on predictive performance and inference, not on correctness
  • Model is selected and fitted to the data once and then used repeatedly for prediction


SLIDE 4

Problem statement

Predictive inference post model selection, in a setting with large dimension and (comparatively) small sample size.

Problem studied here: Given a training sample of size n and a collection M of candidate models, find a 'good' model m ∈ M and conduct predictive inference based on the selected model, conditional on the training sample.

Features:

  • #M ≫ n, i.e., potentially many candidate models
  • |m| ∼ n, i.e., potentially complex candidate models
  • no strong regularity conditions


SLIDES 5-7

Overview of results

We consider a model selector and a prediction interval post model selection (both based on a variant of generalized cross-validation) in linear regression with random design. For Gaussian data we show: the prediction interval is 'approximately valid and short' conditional on the training sample, except on an event whose probability is less than

$$ C_1 \,\#M\, \exp\!\big[-C_2\,(n-|M|)\big], $$

where #M denotes the number of candidate models, and |M| denotes the number of parameters in the most complex candidate model. This finite-sample result holds uniformly over all data-generating processes that we consider.


SLIDES 8-10

The data-generating process

Gaussian linear model with random design: Consider a response y that is related to a (possibly infinite) number of explanatory variables xj, j ≥ 1, by

$$ y \;=\; \sum_{j=1}^{\infty} x_j \theta_j \;+\; u \qquad\qquad (1) $$

with x1 = 1. Assume that u has mean zero and is uncorrelated with the xj's. Moreover, assume that the xj's for j > 1 and u are jointly non-degenerate Gaussian, such that the sum converges in L2.

The unknown parameters here are θ, the variance of u, as well as the means and the variance/covariance structure of the xj's. No further regularity conditions are imposed.

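To make the setting concrete, here is a minimal simulation sketch (mine, not from the talk) that draws n independent realizations of (x, y) as in (1); the expansion is truncated to p regressors for simulation purposes, and all dimensions and the sparsity pattern are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): sample size n, truncation level p.
n, p = 100, 50
theta = np.zeros(p)
theta[:5] = rng.normal(size=5)   # a few non-zero coefficients
sigma_u = 1.0

# x_1 = 1 (intercept); x_2, ..., x_p and u jointly Gaussian,
# u has mean zero and is independent of the x_j's.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
u = rng.normal(scale=sigma_u, size=n)
Y = X @ theta + u
```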

SLIDE 11

The candidate models and predictors

Consider a sample (X, Y) of n independent realizations of (x, y) as in (1), and a collection M of candidate models. Each model m ∈ M is assumed to satisfy |m| < n − 1, and each model is fit to the data by least squares. Given a new set of explanatory variables x(f), the corresponding response y(f) is predicted, when using model m, by

$$ \hat y^{(f)}(m) \;=\; \sum_{j=1}^{\infty} x^{(f)}_j \,\tilde\theta_j(m). $$

Here (x(f), y(f)) is another independent realization from (1), and θ̃(m) is the restricted least-squares estimator corresponding to m (with θ̃j(m) = 0 for coordinates excluded from m).

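A minimal sketch of the restricted least-squares predictor just described (helper names are mine): a candidate model m is represented as a list of column indices, and the estimator θ̃(m) is zero outside m.

```python
import numpy as np

def fit_restricted_ls(X, Y, m):
    """Regress Y on the columns of X listed in the candidate model m;
    return the full-length coefficient vector, zero outside m."""
    theta_tilde = np.zeros(X.shape[1])
    coef, *_ = np.linalg.lstsq(X[:, m], Y, rcond=None)
    theta_tilde[m] = coef
    return theta_tilde

def predict(x_f, theta_tilde):
    # Out-of-sample prediction: y_hat^(f)(m) = sum_j x_j^(f) * theta_tilde_j(m).
    return x_f @ theta_tilde
```

For example, fit_restricted_ls(X, Y, [0, 3, 7]) fits the model that uses the intercept and regressors 3 and 7.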

SLIDES 12-15

Two goals

(i) Select a 'good' model from M for prediction out-of-sample, and (ii) conduct predictive inference based on the selected model, both conditional on the training sample.

Two quantities of interest: For m ∈ M, let ρ²(m) denote the conditional mean-squared error of the predictor ŷ(f)(m) given the training sample, i.e.,

$$ \rho^2(m) \;=\; E\Big[ \big( y^{(f)} - \hat y^{(f)}(m) \big)^2 \,\Big|\, X, Y \Big]. $$

For m ∈ M, the conditional distribution of the prediction error ŷ(f)(m) − y(f) given the training sample is

$$ \hat y^{(f)}(m) - y^{(f)} \,\Big|\, X, Y \;\sim\; N\big(\nu(m), \delta^2(m)\big) \;\equiv\; L(m). $$

Note that ρ²(m) = ν²(m) + δ²(m).


SLIDES 16-17

A useful observation

Write σ²(m) for the conditional variance of the response given those explanatory variables that are included in model m, i.e., σ²(m) = Var[y | xj included in model m, j ≥ 1].

Lemma:

$$ \delta^2(m) \;\sim\; \sigma^2(m)\left(1 + \frac{\chi^2_{|m|-1}}{\chi^2_{n-|m|+1}}\right), $$

where the χ²-random variables are independent. Similarly,

$$ \nu^2(m) \;\sim\; \frac{1}{n}\,\sigma^2(m)\left(1 + \frac{\chi^2_{|m|-1}}{\chi^2_{n-|m|+1}}\right), $$

and

$$ \hat\sigma^2(m) \;=\; \frac{\mathrm{RSS}(m)}{n-|m|} \;\sim\; \sigma^2(m)\,\frac{\chi^2_{n-|m|}}{n-|m|}. $$

[The Lemma extends Theorem 1.3 of Breiman & Friedman (1983).]

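The χ² representation in the Lemma is easy to check by simulation. The sketch below is my own, with illustrative sizes: with i.i.d. N(0,1) regressors and unit error variance, the conditional prediction-error variance reduces to δ²(m) = Σ_{j≥2}(θj − θ̃j(m))² + 1 and σ²(m) = 1 + Σ_{j∉m} θj², so one can compare simulated draws of δ²(m) against draws from the claimed χ² ratio.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 60, 12, 20000          # illustrative sizes (assumptions)
m = list(range(6))                  # candidate model: intercept + 5 regressors
theta = np.zeros(p)
theta[:8] = rng.normal(size=8)      # two relevant coordinates lie outside m

delta2 = np.empty(reps)
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ theta + rng.normal(size=n)
    coef, *_ = np.linalg.lstsq(X[:, m], Y, rcond=None)
    d = theta.copy()
    d[m] -= coef                    # d = theta - theta_tilde(m)
    delta2[r] = d[1:] @ d[1:] + 1.0 # conditional variance of prediction error

sigma2_m = 1.0 + theta[len(m):] @ theta[len(m):]   # Var[y | x_j, j in m]
ref = sigma2_m * (1 + rng.chisquare(len(m) - 1, reps)
                  / rng.chisquare(n - len(m) + 1, reps))
print(delta2.mean(), ref.mean())    # should agree up to Monte Carlo error
```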

SLIDES 18-21

Estimators for ρ²(m)

Note that

$$ E\big[\rho^2(m)\big] \;=\; \sigma^2(m)\,\frac{n-2}{n-1-|m|}\left(1+\frac{1}{n}\right). $$

  • The Sp criterion (Tukey, 1967): Sp(m) = σ̂²(m) (n − 2)/(n − 1 − |m|).
  • The GCV criterion (Craven & Wahba, 1978): GCV(m) = σ̂²(m) n/(n − |m|).
  • An auxiliary criterion: ρ̂²(m) = σ̂²(m) n/(n + 1 − |m|).

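In code, all three criteria are simple rescalings of the residual sum of squares; a sketch (mine):

```python
def criteria(rss, n, k):
    """Sp, GCV, and the auxiliary estimator rho2_hat for a candidate model
    with |m| = k parameters, residual sum of squares rss, sample size n."""
    sigma2_hat = rss / (n - k)
    sp = sigma2_hat * (n - 2) / (n - 1 - k)
    gcv = sigma2_hat * n / (n - k)
    rho2_hat = sigma2_hat * n / (n + 1 - k)
    return sp, gcv, rho2_hat
```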

SLIDES 22-25

Performance of ρ̂²(m)

Want: ρ̂²(m)/ρ²(m) ≈ 1 or, equivalently, log(ρ̂²(m)/ρ²(m)) ≈ 0, with high probability.

Theorem: For each ε > 0, we have

$$ P\left( \left| \log \frac{\hat\rho^2(m)}{\rho^2(m)} \right| > \varepsilon \right) \;\le\; 6\,\exp\!\left[ -\frac{n-|m|}{8}\,\frac{\varepsilon^2}{\varepsilon+8} \right], $$

for each sample size n and uniformly over all data-generating processes as in (1).

A similar result holds for the absolute difference |ρ̂²(m) − ρ²(m)|, uniformly over all data-generating processes with bounded variance, i.e., where Var[y] ≤ s² (with an upper bound of the form C1 exp[−(n − |m|) C(ε, s²)]; here s² is a fixed constant).

Method of proof: Chernoff's method or variations thereof (Gaussian case); the Marčenko-Pastur law (non-Gaussian case).

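To get a feel for the Theorem's bound, one can simply evaluate it numerically; an illustrative computation (mine), for ε = 0.5 and models with |m| = n/10:

```python
import math

def tail_bound(n, k, eps):
    # The Theorem's bound: 6 * exp(-((n - k) / 8) * eps^2 / (eps + 8)).
    return 6 * math.exp(-((n - k) / 8) * eps ** 2 / (eps + 8))

for n in (100, 1000, 10000):
    print(n, tail_bound(n, k=n // 10, eps=0.5))
```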

SLIDES 26-27

Selecting the empirically best model

Write m* and m̂ for the truly best and the empirically best candidate model, i.e., m* = argmin_{m∈M} ρ²(m) and m̂ = argmin_{m∈M} ρ̂²(m). Moreover, write |M| for the number of parameters in the most complex candidate model.

Corollary: For each fixed sample size n and uniformly over all data-generating processes as in (1), we have

$$ P\left( \log \frac{\rho^2(\hat m)}{\rho^2(m^*)} > \varepsilon \right) \;\le\; 6\,\exp\!\left[ \log \#M \;-\; \frac{n-|M|}{16}\,\frac{\varepsilon^2}{\varepsilon+16} \right], $$

$$ P\left( \left| \log \frac{\hat\rho^2(\hat m)}{\rho^2(\hat m)} \right| > \varepsilon \right) \;\le\; 6\,\exp\!\left[ \log \#M \;-\; \frac{n-|M|}{8}\,\frac{\varepsilon^2}{\varepsilon+8} \right], $$

for each ε > 0.

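Selecting the empirically best model is then a brute-force minimization of ρ̂²(m) over the candidate collection; a sketch (mine), with each model given as a list of column indices:

```python
import numpy as np

def select_model(X, Y, models):
    """Return m_hat = argmin_{m in models} rho2_hat(m) and its criterion value."""
    n = len(Y)
    best_m, best_val = None, np.inf
    for m in models:
        coef, *_ = np.linalg.lstsq(X[:, m], Y, rcond=None)
        rss = float(np.sum((Y - X[:, m] @ coef) ** 2))
        val = (rss / (n - len(m))) * n / (n + 1 - len(m))
        if val < best_val:
            best_m, best_val = m, val
    return best_m, best_val
```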

SLIDES 28-33

Other model selectors

Consider AIC (Akaike, 1969), AICc (Hurvich & Tsai, 1989), FPE (Akaike, 1970), and BIC (Schwarz, 1978). Taking the exponential of the objective functions of AIC, AICc and BIC, and using the fact that GCV(m) = (1/n) RSS(m)/(1 − |m|/n)² ≈ ρ²(m), we get

$$ \mathrm{AIC}(m) \;=\; \tfrac{1}{n}\mathrm{RSS}(m)\, e^{2|m|/n} \;\approx\; \rho^2(m)\, e^{2|m|/n}\,(1-|m|/n)^2, $$
$$ \mathrm{AICc}(m) \;=\; \tfrac{1}{n}\mathrm{RSS}(m)\, e^{2\frac{|m|-1}{n-|m|-2}} \;\approx\; \rho^2(m)\, e^{2\frac{|m|-1}{n-|m|-2}}\,(1-|m|/n)^2, $$
$$ \mathrm{FPE}(m) \;=\; \tfrac{1}{n}\mathrm{RSS}(m)\, \frac{1+|m|/n}{1-|m|/n} \;\approx\; \rho^2(m)\,(1+|m|/n)(1-|m|/n), $$
$$ \mathrm{BIC}(m) \;=\; \tfrac{1}{n}\mathrm{RSS}(m)\, e^{\log(n)\,|m|/n} \;\approx\; \rho^2(m)\, n^{|m|/n}\,(1-|m|/n)^2. $$

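On this common scale, the four selectors are again simple rescalings of RSS(m)/n; a sketch (mine), following the display above with k = |m|:

```python
import math

def selector_scores(rss, n, k):
    """AIC, AICc, FPE, BIC (and GCV) on the exponentiated, GCV-comparable scale."""
    base = rss / n
    return {
        "AIC":  base * math.exp(2 * k / n),
        "AICc": base * math.exp(2 * (k - 1) / (n - k - 2)),
        "FPE":  base * (1 + k / n) / (1 - k / n),
        "BIC":  base * math.exp(math.log(n) * k / n),   # = base * n**(k / n)
        "GCV":  base / (1 - k / n) ** 2,
    }
```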

SLIDES 34-38

Simulation Scenario I

Consider one sample of size n = 1300 from (1) with E[xj] = 0, E[xi xj] = δi,j, and E[u²] = 1. The first 1000 components of θ are shown (in absolute value) below; the remaining components are zero.

[Figure: absolute values of the first 1000 components of θ]

The non-zero coefficients of θ are 'sparse': most are small, but there are a few groups of adjacent large coefficients. Choose candidate models that can pick out the few important groups: divide the first 1000 coefficients of θ into 20 consecutive blocks of equal length, and consider all candidate models that include or exclude one block at a time, resulting in 2^20 candidate models. The model space is searched using a general-to-specific greedy strategy (one possible reading is sketched below).

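The talk does not spell out the greedy search, so the following is only one plausible reading (mine): start from the model that includes all 20 blocks and repeatedly drop the block whose removal most improves ρ̂², stopping when no single removal helps.

```python
import numpy as np

def rho2_hat(X, Y, cols):
    n, k = len(Y), len(cols)
    coef, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
    rss = float(np.sum((Y - X[:, cols] @ coef) ** 2))
    return (rss / (n - k)) * n / (n + 1 - k)

def greedy_block_search(X, Y, blocks):
    """General-to-specific search over models that include or exclude whole
    blocks; `blocks` is a list of column-index arrays (block 0 is assumed to
    contain the intercept and is never dropped)."""
    active = set(range(len(blocks)))
    cols = lambda s: np.concatenate([blocks[b] for b in sorted(s)])
    score = rho2_hat(X, Y, cols(active))
    while len(active) > 1:
        drop, best = None, score
        for b in sorted(active - {0}):
            s = rho2_hat(X, Y, cols(active - {b}))
            if s < best:
                drop, best = b, s
        if drop is None:                 # no single removal improves rho2_hat
            break
        active.discard(drop)
        score = best
    return active, score
```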

SLIDES 39-48

Simulation Scenario I

Results for X Gaussian, u Gaussian, Runs 1-4:

[Figures: Runs 1-4]

SLIDES 49-50

Simulation Scenario I

Results for X Exponential, u Bernoulli (scaled and centered), Run 1: [Figure]

Results for X Bernoulli, u Exponential (scaled and centered), Run 1: [Figure]

SLIDE 51

Simulation Scenario II

Consider the same setting as in Scenario I but, instead of a 'sparse' parameter θ, a case where none of the candidate models fits particularly well. The first 1000 components of θ are shown (in absolute value) below; the remaining components are zero.

[Figure: absolute values of the first 1000 components of θ]

SLIDES 52-54

Simulation Scenario II

Results for X Gaussian, u Gaussian, Run 1: [Figure]

Results for X Exponential, u Bernoulli (scaled and centered), Run 1: [Figure]

Results for X Bernoulli, u Exponential (scaled and centered), Run 1: [Figure]

SLIDES 55-56

Predictive Inference based on model m

Idea: Estimate the conditional distribution of the prediction error, i.e., L(m) ≡ N(ν(m), δ²(m)), by L̂(m) ≡ N(0, δ̂²(m)), where δ̂²(m) is defined like ρ̂²(m) before.

Theorem: For each fixed sample size n and uniformly over all data-generating processes as in (1), we have

$$ P\left( \big\| \hat L(m) - L(m) \big\|_{TV} \;>\; \frac{1}{\sqrt n} + \varepsilon \right) \;\le\; 7\,\exp\!\left[ -\frac{n-|m|}{2}\,\frac{\varepsilon^2}{\varepsilon+2} \right] $$

for each ε ≤ log(2) ≈ 0.69.

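For given parameters, the total-variation distance between the plug-in estimate L̂(m) and L(m) is easy to evaluate numerically; a sketch (mine) using the identity TV = ½ ∫ |f₁ − f₂|:

```python
import numpy as np
from scipy.stats import norm

def tv_normals(mu1, sd1, mu2, sd2, grid=200001, width=12.0):
    """Total-variation distance between N(mu1, sd1^2) and N(mu2, sd2^2),
    by numerical integration of |f1 - f2| on a wide grid."""
    lo = min(mu1 - width * sd1, mu2 - width * sd2)
    hi = max(mu1 + width * sd1, mu2 + width * sd2)
    x = np.linspace(lo, hi, grid)
    f1, f2 = norm.pdf(x, mu1, sd1), norm.pdf(x, mu2, sd2)
    return 0.5 * float(np.sum(np.abs(f1 - f2)) * (x[1] - x[0]))

# e.g. L(m) = N(0.1, 1.0) versus the plug-in L_hat(m) = N(0, 1.05):
print(tv_normals(0.1, 1.0, 0.0, np.sqrt(1.05)))
```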

SLIDES 57-59

Prediction intervals post model selection

Recall that ŷ(f)(m) − y(f) | X, Y ∼ N(ν(m), δ²(m)) ≡ L(m) for each m ∈ M. Based on model m, the 'prediction interval' ŷ(f)(m) − ν(m) ± q_{α/2} δ(m) has coverage probability 1 − α conditional on the training sample X, Y, but it is infeasible. In terms of the width of this interval, the 'best' model is one that minimizes δ(m); set m° = argmin_{m∈M} δ²(m). For fixed m ∈ M, a feasible prediction interval is

$$ I(m):\quad \hat y^{(f)}(m) \;\pm\; q_{\alpha/2}\,\hat\delta(m). $$

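A sketch of the feasible interval I(m) (mine); per the slide, δ̂²(m) is computed with the same rescaling as ρ̂²(m), and q_{α/2} is the standard-normal quantile:

```python
import numpy as np
from scipy.stats import norm

def prediction_interval(X, Y, m, x_f, alpha=0.05):
    """Feasible interval I(m): y_hat^(f)(m) +/- q_{alpha/2} * delta_hat(m)."""
    n, k = len(Y), len(m)
    coef, *_ = np.linalg.lstsq(X[:, m], Y, rcond=None)
    y_hat = float(x_f[m] @ coef)
    rss = float(np.sum((Y - X[:, m] @ coef) ** 2))
    delta_hat = np.sqrt((rss / (n - k)) * n / (n + 1 - k))
    q = norm.ppf(1 - alpha / 2)
    return y_hat - q * delta_hat, y_hat + q * delta_hat
```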

SLIDE 60

Prediction interval is approx. valid & adaptive

Proposition: For each ε ≤ log 2 and each fixed sample size n, we have

$$ \Big| (1-\alpha) - P\big( y^{(f)} \in I(\hat m) \,\big|\, Y, X \big) \Big| \;\le\; \frac{1}{\sqrt n} + \varepsilon \qquad\text{and}\qquad \left| \log \frac{\hat\delta(\hat m)}{\delta(m^\circ)} \right| \;\le\; \varepsilon, $$

except on an event whose probability is not larger than

$$ 11\,\exp\!\left[ \log \#M \;-\; \frac{n-|M|}{2}\,\frac{\varepsilon^2}{\varepsilon+2} \right], $$

uniformly over all data-generating processes as in (1).


SLIDES 61-63

Conclusion

Caution: The ‘large p / small n’ behavior of model selectors can be markedly different from their properties for ‘small p / large n’. Proof of concept: The two goals are achievable In ‘large p / small n’ settings and under minimal assumptions, good models can be found, and the resulting prediction intervals post model selection are approximately valid and adaptive (in finite samples with high probability uniformly over all data-generating processes considered).
