
Significance testing after cross-validation

Joshua Loftus (jloftus@turing.ac.uk)
(building from joint work with Jonathan Taylor)
9 December, 2016
Slides and markdown source at https://joftius.github.io/turing

1 / 20


Setting: regression model selection

Linear model

y = Xβ + ε
- y: vector of outcomes
- X: predictor/feature matrix
- β: parameters/weights to be estimated; assume most are "null," i.e. equal to 0 (sparsity)
- ε: random errors, assumed to follow a N(0, σ²I) distribution

Pick a subset of predictors we think are non-null. How good is the model using this subset? Are the chosen predictors actually non-null, i.e. significant?

Type 1 error: declaring a predictor significant when it is actually null.
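A toy instance of this setting (all sizes here are hypothetical, not from the talk):

set.seed(1)
n <- 100; p <- 50; s <- 5                # s non-null coefficients (sparsity)
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, s), rep(0, p - s))      # most parameters are null
y <- X %*% beta + rnorm(n)               # errors ~ N(0, I)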

2 / 20


Motivating example: forward stepwise

Data: California county health data. . .
Outcome: log-years of potential life lost.
Model: 5 out of 30 predictors chosen by FS with AIC.

model <- step(lm(y ~ . - 1, df), k = 2, trace = 0)
print(summary(model)$coefficients[, c(1, 4)], digits = 2)
##                        Estimate Pr(>|t|)
## Food.Environment.Index    0.342   0.0296
## `%.With.Access`          -0.036   0.0017
## `%.Excessive.Drinking`    0.090   0.0182
## Teen.Birth.Rate           0.026   0.0045
## Average.Daily.PM2.5      -0.225   0.0211

5 interesting effects, all significant. Time to publish!

3 / 20


What’s wrong with this?

The outcome was actually just noise, independent of the predictors:

set.seed(1)
df = read.csv("CaliforniaCountyHealth.csv")
df$y <- rnorm(nrow(df)) #!!!

(With apologies for deceiving you, I hope this makes the point. . . )

4 / 20


Selection can make noise look like signal

Any time we use the data to make a decision (e.g. pick one model instead of some others), we may introduce a selection effect (bias). This happens with forward stepwise, the Lasso, the elastic net with cross-validation, etc. Significance tests, prediction error, R², goodness-of-fit tests, and so on can all suffer from this selection bias.
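A quick way to see this: repeat the noise experiment from the previous slides many times and look at the smallest naive p-value among the selected predictors. A small simulation sketch (the sizes and replication count are arbitrary choices, not the talk's code):

set.seed(1)
pvals <- replicate(200, {
  df <- data.frame(matrix(rnorm(50 * 10), 50, 10))
  df$y <- rnorm(50)                        # outcome independent of predictors
  model <- step(lm(y ~ . - 1, df), k = 2, trace = 0)
  cf <- summary(model)$coefficients
  if (nrow(cf) == 0) NA else min(cf[, 4])  # smallest naive p-value
})
mean(pvals < 0.05, na.rm = TRUE)           # typically far above 0.05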

5 / 20


Most common solution: data splitting

Pros:
- Simple: only takes a few lines of code
- Robust: requires few assumptions
- Controls (selective) type 1 error, no selection bias

Cons:
- Reproducibility issues: different random splits, different split proportions
- Efficiency: using less data for model selection, also less power
- Feasibility: categorical variables with rare levels (e.g. rare variants)
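As a concrete illustration, a minimal data-splitting sketch reusing the df and step call from the earlier slides (the split proportion and seed are arbitrary):

set.seed(2)
half <- sample(nrow(df), nrow(df) / 2)
sel <- step(lm(y ~ . - 1, df[half, ]), k = 2, trace = 0)  # select on one half
fit <- lm(formula(sel), df[-half, ])                      # refit on the other half
summary(fit)$coefficients[, c(1, 4)]  # classical p-values, now valid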

6 / 20


Literature on (conditional) post-selection inference

- Frequentist interpretation: Hurvich & Tsai (1990)
- Lasso, sequential: Lockhart et al. (2014)
- General penalty, global null, geometry: Taylor, Loftus, and Tibshirani (2015); Azaïs, de Castro, and Mourareau (2015)
- Forward stepwise, sequential: Loftus and Taylor (2014)
- Fixed λ Lasso / conditional: Lee et al. (2015); Fithian, Sun, and Taylor (2014)
- Forward stepwise and LAR: Tibshirani et al. (2014)
- Asymptotics: Tian and Taylor (2015a)
- Unknown σ: Tian, Loftus, and Taylor (2015); Gross, Taylor, and Tibshirani (2015)
- Group selection / unknown σ: Loftus and Taylor (2015)
- Cross-validation: Tian and Taylor (2015b); Loftus (2015)
- Unsupervised learning: Blier, Loftus, and Taylor (2016)

(Incomplete list, growing fast)

7 / 20


Previous work: affine model selection

Model selection map M : R^n → M, with M the space of potential models. We observe E_m = {M(y) = m} and want to condition on this event. For many model selection procedures (e.g. the Lasso at fixed λ),

L(y | M(y) = m)  =  L(y | A(m)y ≤ b(m))

where the left side is what we want and the right side has simple geometry: on {M(y) = m}, this is an MVN constrained to a polytope.
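In this affine case, the law of a linear statistic ηᵀy given {A(m)y ≤ b(m)} is a one-dimensional truncated normal, and the truncation endpoints have a closed form (Lee et al. 2015). A minimal sketch; polytope_bounds is a hypothetical helper name, not the talk's code:

polytope_bounds <- function(A, b, y, eta, Sigma = diag(length(y))) {
  # decompose y into its eta'y component and an independent remainder z
  c_vec <- drop(Sigma %*% eta) / drop(t(eta) %*% Sigma %*% eta)
  t_obs <- drop(t(eta) %*% y)
  z <- y - c_vec * t_obs
  # A(z + c t) <= b  <=>  den_i * t <= num_i for each row i
  den <- drop(A %*% c_vec)
  num <- b - drop(A %*% z)
  vlo <- suppressWarnings(max((num / den)[den < 0]))  # -Inf if no such row
  vup <- suppressWarnings(min((num / den)[den > 0]))  # +Inf if no such row
  c(vlo = vlo, vup = vup, t_obs = t_obs)
}

Conditional on the selection, ηᵀy is then N(ηᵀµ, ηᵀΣη) truncated to [vlo, vup], which yields a selective p-value via the truncated normal CDF.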

8 / 20


Quadratic model selection framework

For some model selection procedures (e.g. forward stepwise with groups, cross-validation), the model selection event can be decomposed as

Quadratic selection event

E_m := {M(y) = m} = ∩_{j ∈ J_m} {y : yᵀQ_j y + a_jᵀy + b_j ≥ 0}

These Q_j, a_j, b_j are constant on E_m, so conditionally they are constants. For conditional inference, we need to compute this intersection of quadratics.

9 / 20


Truncated χ significance test

Suppose y ∼ N(µ, σ²I) with σ² known, H_0(m) : P_m µ = 0, where P_m is a projection that is constant on {M(y) = m}. Let r := Tr(P_m), R := P_m y, u := R/‖R‖₂, z := y − R, D_m := {t ≥ 0 : M(utσ + z) = m}, and let the observed statistic be T = ‖R‖₂/σ.

Post-selection Tχ distribution

T | (m, z, u) ∼ χ_r | D_m    (1)

where the vertical bar denotes truncation. Hence, with f_r the pdf of a central χ_r random variable,

Tχ := ( ∫_{D_m ∩ [T,∞)} f_r(t) dt ) / ( ∫_{D_m} f_r(t) dt )  ∼  U[0, 1]    (2)

is a p-value controlling selective type 1 error.
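Once D_m is represented as a union of intervals, the Tχ p-value in (2) reduces to χ_r tail masses, computable with pchisq since T ∼ χ_r exactly when T² ∼ χ²_r. A minimal sketch with a hypothetical helper name:

tchi_pvalue <- function(Tobs, r, intervals) {
  # intervals: two-column matrix of [lower, upper] pieces making up D_m
  mass <- function(lo, hi) pchisq(hi^2, df = r) - pchisq(lo^2, df = r)
  denom <- sum(apply(intervals, 1, function(iv) mass(iv[1], iv[2])))
  numer <- sum(apply(intervals, 1, function(iv) {
    lo <- max(iv[1], Tobs)
    if (lo >= iv[2]) 0 else mass(lo, iv[2])
  }))
  numer / denom
}
tchi_pvalue(2.5, r = 2, intervals = rbind(c(1, 4), c(6, Inf)))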

10 / 20


Geometry problem: intersection of quadratic regions

[Figure 1: The complement of each quadratic is shaded with a different color. The unshaded, white region is E_m. Successive animation frames mark the observed y, its decomposition into direction u and remainder z, and the point uT + z on the ray {utσ + z : t ≥ 0}.]
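The computation behind the picture works one quadratic at a time: substituting y(t) = utσ + z turns each constraint into At² + Bt + C ≥ 0, whose solution set on t ≥ 0 is a union of at most two intervals. A hypothetical sketch for a single constraint (not the paper's implementation):

quad_ray_set <- function(Q, a, b, u, z, sigma = 1, tol = 1e-12) {
  # coefficients of A t^2 + B t + C >= 0 along y(t) = u * t * sigma + z
  A <- sigma^2 * drop(t(u) %*% Q %*% u)
  B <- sigma * drop(2 * t(u) %*% Q %*% z + t(a) %*% u)
  C <- drop(t(z) %*% Q %*% z + t(a) %*% z) + b
  if (abs(A) < tol) {                    # effectively linear: B t + C >= 0
    if (abs(B) < tol) return(if (C >= 0) rbind(c(0, Inf)) else NULL)
    if (B > 0) return(rbind(c(max(0, -C / B), Inf)))
    return(if (-C / B > 0) rbind(c(0, -C / B)) else NULL)
  }
  disc <- B^2 - 4 * A * C
  if (disc <= 0) return(if (A > 0) rbind(c(0, Inf)) else NULL)
  r <- sort((-B + c(-1, 1) * sqrt(disc)) / (2 * A))
  if (A > 0) {                           # satisfied outside the roots
    ivs <- rbind(c(max(0, r[2]), Inf))
    if (r[1] > 0) ivs <- rbind(c(0, r[1]), ivs)
    return(ivs)
  }
  if (r[2] <= 0) return(NULL)            # A < 0: satisfied between the roots
  rbind(c(max(0, r[1]), r[2]))
}

Intersecting the returned interval unions over all j ∈ J_m gives D_m, which feeds directly into the Tχ computation above.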

11 / 20


Adaptive model selection with cross-validation

For K-fold CV, the data are partitioned (randomly) into D_1, . . . , D_K. For each k = 1, . . . , K, hold out D_k as a test set while training a model on the other K − 1 folds. Form an estimate RSS_k of out-of-sample prediction error, and average these estimates over the test folds. Use this to choose model complexity, evaluating RSS_{k,s} for various sparsity choices s and picking the s minimizing the cv-RSS estimate:

- Run forward stepwise with maxsteps S.
- For s = 1, . . . , S evaluate the test error RSS_{k,s}.
- Average over folds to get RSS_s. Pick s* minimizing this.
- Run forward stepwise on the whole data for s* steps.

Can we do selective inference for the final models chosen this way?
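A minimal sketch of this tuning loop (a greedy stand-in for forward stepwise, not the talk's code; assumes full-rank submatrices of X):

cv_stepwise_size <- function(X, y, K = 5, S = 10) {
  n <- nrow(X)
  folds <- sample(rep(1:K, length.out = n))   # random partition D_1, ..., D_K
  rss <- matrix(0, K, S)
  for (k in 1:K) {
    train <- folds != k
    active <- integer(0)
    for (s in 1:S) {
      # greedy step: add the variable that most reduces training RSS
      cand <- setdiff(seq_len(ncol(X)), active)
      score <- sapply(cand, function(j)
        sum(lm.fit(X[train, c(active, j), drop = FALSE], y[train])$residuals^2))
      active <- c(active, cand[which.min(score)])
      # held-out test error RSS_{k,s} of the s-variable model
      fit <- lm.fit(X[train, active, drop = FALSE], y[train])
      pred <- X[!train, active, drop = FALSE] %*% fit$coefficients
      rss[k, s] <- sum((y[!train] - pred)^2)
    }
  }
  which.min(colMeans(rss))   # s*; then rerun stepwise on all data for s* steps
}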

12 / 20


Notation for cross-validation

Let f, g index CV test folds. On fold f, write m_{f,s} for the model at step s, and let −f denote the training set for test fold f (its complement). Define

P_{f,s} := X^f_{m_{f,s}} (X^{−f}_{m_{f,s}})^†   (not a projection)

s* = argmin_s Σ_{f=1}^K ‖y^f − P_{f,s} y^{−f}‖²₂

Sums of squares. . . maybe it's a quadratic form?

13 / 20


Blockwise quadratic form of cv-RSS

Key result of Loftus (2015).

Define

Q^s_{ff} := Σ_{g≠f} (P_{g,s})_fᵀ (P_{g,s})_f

and

Q^s_{fg} := −(P_{f,s})_g − (P_{g,s})_fᵀ + Σ_{h∉{f,g}} (P_{h,s})_fᵀ (P_{h,s})_g

Then, with y_K denoting the observations ordered by CV folds,

cv-RSS(s) = y_Kᵀ Q^s y_K

This quadratic form allows us to conduct inference conditional on models selected by cross-validation.
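As a sanity check, a small numerical verification (sizes hypothetical) that cv-RSS is a quadratic form in the fold-ordered y. The Q assembled below includes an identity term on its diagonal blocks, coming from the ‖y^f‖² part of each summand; that term shifts cv-RSS(s) by the constant ‖y‖² and so does not affect the minimizing s:

set.seed(1)
n <- 12; p <- 4; K <- 3
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
fold <- rep(1:K, each = n / K)            # observations already fold-ordered
idx <- split(seq_len(n), fold)

# P[[f]]: maps the full fold-ordered y to fold-f predictions from a model
# trained on the other folds; columns on fold f itself are zero
P <- lapply(1:K, function(f) {
  tr <- unlist(idx[-f]); te <- idx[[f]]
  M <- matrix(0, length(te), n)
  M[, tr] <- X[te, ] %*% MASS::ginv(X[tr, ])   # X^f (X^{-f})^+
  M
})

rss_direct <- sum(sapply(1:K, function(f) sum((y[idx[[f]]] - P[[f]] %*% y)^2)))

Q <- matrix(0, n, n)
for (f in 1:K) {
  E <- matrix(0, length(idx[[f]]), n)          # selector of fold f rows
  E[cbind(seq_along(idx[[f]]), idx[[f]])] <- 1
  R <- E - P[[f]]
  Q <- Q + t(R) %*% R                          # blockwise: identity + Q^s blocks
}
rss_quad <- drop(t(y) %*% Q %*% y)
all.equal(rss_direct, rss_quad)                # TRUE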

14 / 20


Empirical CDF: forward stepwise simulation

[Figure: empirical CDFs of p-values (ecdf vs. p-value), with curves by Type (Adjusted, Naive, NoCV) and grouping by Null (FALSE, TRUE).]

n = 100, p = 200, K = 5, sparsity = 5, betas = 1

15 / 20


Empirical CDF: LAR simulation

[Figure: empirical CDFs of p-values (ecdf vs. p-value), with curves by Type (Adjusted, Naive, NoCV) and grouping by Null (FALSE, TRUE).]

n = 50, p = 100, K = 5, sparsity = 5

16 / 20


Remarks

Technical details are in the papers; a few notes:
- Tests are not independent
- Computationally expensive
- May have low power against some alternatives
- The σ² unknown case can also be handled
- Most of the usual limitations of model selection still apply

Software implementation: selectiveInference R package on CRAN. GitHub repo: https://github.com/selective-inference/
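A minimal usage sketch, assuming the CRAN package's forward stepwise interface as of late 2016 (fs and fsInf; argument names may differ across versions, so check the package documentation):

library(selectiveInference)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)                       # pure noise, as in the earlier example
fsfit <- fs(x, y, maxsteps = 5)     # forward stepwise path, 5 steps
out <- fsInf(fsfit, sigma = 1)      # selective p-values for the active set
out$pv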

17 / 20


References

- Taylor and Tibshirani (2015). Statistical learning and selective inference. PNAS.
- Benjamini (2010). Simultaneous and selective inference: current successes and future challenges. Biometrical Journal.
- Berk et al. (2010). Statistical inference after model selection. Journal of Quantitative Criminology.
- Berk et al. (2013). Valid post-selection inference. Annals of Statistics.
- Simon et al. (2011). Regularization paths for Cox's proportional hazards model via coordinate descent. Journal of Statistical Software.
- Loftus (2015). Selective inference after cross-validation. arXiv preprint.
- Loftus and Taylor (2015). Selective inference in regression models with groups of variables. arXiv preprint.

18 / 20


Thanks for your attention!

Questions?

jloftus@turing.ac.uk

19 / 20


More references

- Azaïs, Jean-Marc, Yohann de Castro, and Stéphane Mourareau. 2015. "Power of the Kac-Rice Detection Test." arXiv preprint arXiv:1503.05093.
- Blier, Léonard, Joshua R. Loftus, and Jonathan E. Taylor. 2016. "Inference on the Number of Clusters in k-Means Clustering." In progress.
- Fithian, William, Dennis Sun, and Jonathan Taylor. 2014. "Optimal Inference After Model Selection." arXiv preprint arXiv:1410.2597.
- Gross, S. M., J. Taylor, and R. Tibshirani. 2015. "A Selective Approach to Internal Inference." arXiv e-prints, October.
- Lee, Jason D., Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor. 2015. "Exact Post-Selection Inference with the Lasso." Annals of Statistics.
- Lockhart, Richard, Jonathan Taylor, Ryan J. Tibshirani, and Robert Tibshirani. 2014. "A Significance Test for the Lasso." Annals of Statistics 42 (2): 413.
- Loftus, J. R., and J. E. Taylor. 2015. "Selective inference in regression models with groups of variables." arXiv e-prints, November.

20 / 20