Optimal Inference After Model Selection
Will Fithian
Joint work with Dennis Sun & Jonathan Taylor
December 11, 2015

Outline

1. Introduction
2. Inference After Selection
3. Linear Regression
4. Other Examples

Two Stages

Two stages of a statistical investigation:
1. Selection: Choose a probabilistic model for the data and formulate an inference problem. Ask a question.
2. Inference: Attempt the problem using the data and the selected model. Answer the question.
Classical admonishment: no looking at the data until stage 2.
Actual practice: choose variables, check for interactions, overdispersion, ...
How should we relax the classical view?

Naive Inference After Selection

What is wrong with naive inference after selection?
Example (File Drawer Effect): Observe independent Y_i ∼ N(µ_i, 1), i = 1, ..., n.
1. Restrict attention to apparently large effects: Î = {i : |Y_i| > 1}.
2. Run a nominal level-α test of H_{0,i} : µ_i = 0 for each i ∈ Î (e.g., α = 0.05: reject if |Y_i| > 1.96).
"Everyone knows" this is invalid. Why?

Naive Inference After Selection

Problem: frequency properties among selected nulls:

  (# false rejections) / (# true nulls tested) → P_{H0,i}(i ∈ Î, reject H_{0,i}) / P(i ∈ Î) = P_{H0,i}(reject H_{0,i} | i ∈ Î)

Solution: directly control the selective type I error rate P_{H0,i}(reject H_{0,i} | i ∈ Î).
Example: P_{H0,i}(|Y_i| > 2.41 | |Y_i| > 1) = 0.05.
Guiding principle when asking random questions: the answer must be valid, given that the question was asked.

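The 2.41 cutoff can be reproduced numerically by solving P(|Y| > c | |Y| > 1) = α under the null; a quick sketch (function name is mine):

```python
from scipy.stats import norm
from scipy.optimize import brentq

def selective_cutoff(alpha=0.05, select=1.0):
    """Find c with P(|Y| > c | |Y| > select) = alpha for Y ~ N(0, 1)."""
    tail = lambda c: 2 * norm.sf(c)                 # P(|Y| > c)
    f = lambda c: tail(c) / tail(select) - alpha    # conditional tail minus alpha
    return brentq(f, select, 10.0)

c = selective_cutoff()  # close to the 2.41 quoted above
```

Setting `select=0` recovers the usual unconditional cutoff 1.96.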
False Coverage-Statement Rate

Benjamini & Yekutieli (2005): CIs for selected parameters, e.g.
- selected genes in GWAS
- selected treatments in clinical trials
Analog of FDR:

  E[ (# non-covering CIs) / (1 ∨ # CIs constructed) ] ≤ α

Conditional inference used as a device for FCR control (Weinstein, F, & Benjamini 2013).
Also used to correct bias (e.g., Sampson & Sill, 2005; Zöllner & Pritchard, 2007; Zhong & Prentice, 2008).
Difference in perspective: should we average over questions?

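The BY construction itself is short: having selected R of m parameters, report marginal intervals at level 1 − Rα/m. A sketch in the Gaussian file-drawer setting (function and variable names are mine):

```python
import numpy as np
from scipy.stats import norm

def fcr_intervals(y, alpha=0.05, threshold=1.0):
    """BY (2005) FCR-adjusted intervals for the effects with |y_i| > threshold,
    assuming y_i ~ N(mu_i, 1). Returns selected indices and an (R, 2) array."""
    y = np.asarray(y, dtype=float)
    m = len(y)
    sel = np.flatnonzero(np.abs(y) > threshold)
    R = len(sel)
    if R == 0:
        return sel, np.empty((0, 2))
    z = norm.ppf(1 - alpha * R / (2 * m))  # wider than the nominal 1.96
    return sel, np.column_stack([y[sel] - z, y[sel] + z])

sel, ci = fcr_intervals([0.2, 2.5, -3.1, 0.7])
```

With R = 2 of m = 4 selected, each interval uses level 1 − 2(0.05)/4, so it is wider than the nominal interval.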
Motivating Example 1: Verifying the Winner

Setup: Quinnipiac poll of 667 Iowa Republicans, May 2014:

  Rank  Candidate       Result
  1.    Scott Walker    21%
  2.    Rand Paul       13%
  3.    Marco Rubio     13%
  4.    Ted Cruz        12%
  ...   ...             ...
  14.   Bobby Jindal    1%
  15.   Lindsey Graham  0%

Question: Is Scott Walker really winning? By how much?
Problem: Winner's curse. "Question selection," not really "model selection."
Related to subset selection (Gupta & Nagel 1967, others).

Motivating Example 2: Inference After Model Checking

Two-sample problem: X_1, ..., X_m i.i.d. ∼ F_1; Y_1, ..., Y_n i.i.d. ∼ F_2.
Test the Gaussian model based on normalized residuals:

  R = ( (X_1 − X̄)/S_X, ..., (X_m − X̄)/S_X, (Y_1 − Ȳ)/S_Y, ..., (Y_n − Ȳ)/S_Y )

- If the test rejects, use a permutation test (e.g., Wilcoxon): F_1 = ?, F_2 = ?, H_0 : F_1 = F_2.
- Otherwise, use the two-sample t-test: F_1 = N(µ, σ²), F_2 = N(ν, τ²), H_0 : µ = ν.
Model selection, in the strong sense.

Motivating Example 3: Regression After Variable Selection

E.g., solve the lasso at fixed λ > 0 (Tibshirani, 1996):

  γ̂ = argmin_γ ‖Y − Xγ‖₂² + λ‖γ‖₁

The "active set" E = {j : γ̂_j ≠ 0} induces the selected model M(E): Y ∼ N(X_E β_E, σ²I_n).
Can we get valid tests / intervals for β_j^E, j ∈ E?
Lee, Sun, Sun, & Taylor (2013) studied a slightly different problem (inference w.r.t. a different model).

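For concreteness, a sketch of extracting the active set with scikit-learn. Note sklearn's Lasso minimizes ‖Y − Xγ‖²/(2n) + α‖γ‖₁, so α = λ/(2n) matches the objective above; the data and names here are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def active_set(X, Y, lam):
    """Active set E = {j : gamma_hat_j != 0} of the lasso solution
    argmin ||Y - X g||^2 + lam * ||g||_1 (sklearn's alpha = lam / (2 n))."""
    n = X.shape[0]
    fit = Lasso(alpha=lam / (2 * n), fit_intercept=False).fit(X, Y)
    return np.flatnonzero(fit.coef_)

# Toy data: only variables 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
Y = 3 * X[:, 0] + 2 * X[:, 2] + rng.standard_normal(100)
E = active_set(X, Y, lam=50.0)
```

The selected model M(E) is then the Gaussian linear model on the columns X[:, E].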
Random Model, Random Null

Testing null hypothesis H_0 in model M.
Selective error rate: P_{M,H0}(reject H_0 | (M, H_0) selected)
Nominal error rate: P_{M,H0}(reject H_0)
"Kosher" adaptive selection: two independent experiments.
- Select M, H_0 based on exploratory experiment 1.
- Test using confirmatory experiment 2.
M and H_0 are random, but no adjustment is necessary:

  P_{M,H0}(reject H_0 | (M, H_0) selected) = P_{M,H0}(reject H_0).

Data Splitting

Assume Y = (Y_1, Y_2) with Y_1 ⊥⊥ Y_2.
Data splitting mimics the exploratory / confirmatory split:
- Select the model based on Y_1.
- Analyze Y_2 as though the model were chosen "ahead of time."
Again, no adjustment is necessary:

  P_{M,H0}(reject H_0 | (M, H_0) selected) = P_{M,H0}(reject H_0).

Objections to data splitting:
- less data for selection
- less data for inference
- not always possible (e.g., autocorrelated data)

Data Carving

Think of the data as "revealed in stages." Let A = {(M, H_0) selected}:

  F₀ ⊆ F(1_A(Y)) [used for selection] ⊆ F(Y) [used for inference]

Conditioning on A in stage two ⟺ the fact that Y ∈ A is excluded as evidence against H_0.
Data splitting conditions on Y_1 instead of 1_A(Y_1):

  F₀ ⊆ F(1_A(Y_1)) [used for selection] ⊆ F(Y_1) [wasted] ⊆ F(Y_1, Y_2) [used for inference]

Data carving: use all leftover information for inference.

Lasso Partition

Yellow region: {y : variables 1, 3 selected}

  M.hat <- which(coef(glmnet(X, Y), s = lambda)[-1] != 0)  # drop the intercept row

Goals

Prior work on linear regression after selection with σ² known: Lockhart et al. (2014), Tibshirani et al. (2014), Lee et al. (2013), Loftus and Taylor (2014), Lee and Taylor (2014), ...
Our goals:
1. Formalize inference after selection
2. Understand power — can it be improved?
3. Generalize to unknown σ²
4. Generalize to other exponential families

Outline

1. Introduction
2. Inference After Selection
3. Linear Regression
4. Other Examples

Selective Hypothesis Tests

Setup: Observe Y ∼ F on space (Y, F), F unknown.
Question space: the collection Q of all candidate testing problems q.
A testing problem is a pair q = (M, H_0) of
- a model M(q) (a family of distributions), and
- a null hypothesis H_0(q) ⊆ M(q) (wlog H_1 = M \ H_0).
Two stages:
1. Selection: Select a subset Q̂(Y) ⊆ Q to test.
2. Inference: Test H_0 vs. M \ H_0 for each q = (M, H_0) ∈ Q̂.

Selective Hypothesis Tests

Design a hypothesis test φ_q(y) : Y → [0, 1] for question q.
We only care about its behavior on the selection event A_q = {q ∈ Q̂(Y)}: the event that q was asked.
The test φ_q is a selective level-α test if

  E_F[φ_q(Y) | A_q] ≤ α,  ∀F ∈ H_0

Selective power function: Pow_{φ_q}(F | A_q) = E_F[φ_q(Y) | A_q].
NB: Selective level is defined w.r.t. F ∈ M(q) ⟹ we can design tests "one (M, H_0) at a time."

What If the Model Is Wrong?

Some (all?) M are probably misspecified (F ∉ M). We don't know which.
Non-adaptive inference:
- Size of φ is defined w.r.t. the selected model M.
- Guarantees are vacuous when F ∉ M.
- Try to select a correct or "close enough" M.
Adaptive inference:
- Same situation: selective size of φ_q is defined w.r.t. M(q).
- Benefit: allowed to check the model.

Conditioning on Selection Variables

Sometimes we want to condition on more than A_q: {S_q = s} ⊆ A_q ⊆ Y.
More generally, we can condition on a finer selection variable S_q(Y), with A_q ∈ F(S_q), e.g.
- S_q(Y) = Y_1 (data splitting)
- S_q(Y) = active variables and signs (inference after the lasso)
Reason: tractable computation (can control FCR with S_q(Y) = (1_{Aq}(Y), |Q̂(Y)|)).
Reason: stronger inferential guarantee.

Conditioning Discards Information

φ_q has selective level α w.r.t. S_q if

  E_F[φ_q(Y) | S_q(Y)] ≤ α  a.s. on A_q,  ∀F ∈ H_0

More stringent when S_q is finer.
Finest: S_q(Y) = Y; coarsest: S_q(Y) = 1_{Aq}(Y).
Cost: conditioning on S_q ⟺ ignoring the evidence in S_q.

Leftover Information

After conditioning on S(Y) = s, the leftover information is

  I_{Y|S}(θ; s) = Var[∇ℓ(θ; Y | S = s) | S = s]

We can characterize its average:

  E[I_{Y|S}(θ; S)] = I_Y(θ) − I_S(θ) ≤ I_Y(θ).

I_S(θ): the (average) price of selection.

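A worked instance of this identity for the example Y ∼ N(µ, 1), S = 1{Y > 3} (helper name is mine): S is Bernoulli(p(µ)) with p(µ) = Φ(µ − 3), so I_S(µ) = p′(µ)² / (p(1 − p)) and the average leftover information is 1 − I_S(µ).

```python
from scipy.stats import norm

def selection_information(mu, cut=3.0):
    """Fisher information in S = 1{Y > cut} about mu, for Y ~ N(mu, 1).

    S is Bernoulli(p) with p = Phi(mu - cut), so
    I_S(mu) = (dp/dmu)^2 / (p (1 - p)) = phi(mu - cut)^2 / (p (1 - p)).
    Average leftover information: I_Y - I_S = 1 - I_S.
    """
    p = norm.cdf(mu - cut)
    return norm.pdf(mu - cut) ** 2 / (p * (1 - p))

# Selection is most expensive near the threshold: I_S(3) = 2/pi
leftover_at_cut = 1 - selection_information(3.0)
```

Far from the threshold the indicator carries almost no information about µ, so nearly all of I_Y is left over.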
Leftover Information

Y ∼ N(µ, 1), A = {Y > 3}
[Figure: leftover Fisher information as a function of µ]

Selective Confidence Interval

[Figure: selective vs. nominal confidence intervals as a function of the observed Y]

Selective Tests for Exponential Families

Goal: Test H_0 : θ = θ_0 with nuisance parameter ζ, where

  Y ∼ exp( θ T(y) + ζ′U(y) − ψ(θ, ζ) ) f_0(y)

On the selection event A:

  Y | A ∼ exp( θ T(y) + ζ′U(y) − ψ_A(θ, ζ) ) f_0(y) 1_A(y)

Conditioning on U eliminates ζ; base the test on the one-parameter family L_θ(T | U, Y ∈ A).
Side constraint, selective unbiasedness: E_θ[φ(Y) | A] ≥ α, ∀θ ≠ θ_0.

Selective Tests for Exponential Families

  Y | Y ∈ A ∼ exp( θ T(y) + ζ′U(y) − ψ_A(θ, ζ) ) f_0(y) 1_A(y)

Proposal (F, Sun & Taylor 2014):
The UMPU selective level-α test φ of H_0 : θ = θ_0 rejects for {T < C_1(U)} ∪ {T > C_2(U)}, with the C_i chosen so that

  E_{θ0}[φ(T, U) | U, A] = α  (selective level α)
  E_{θ0}[T φ(T, U) | U, A] = α E_{θ0}[T | U, A]  (selectively unbiased)

Follows from Lehmann & Scheffé (1955).
Solve for the cutoffs using Monte Carlo (sampling can be hard).
Also shown: data splitting is typically inadmissible.

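As a toy illustration of solving for cutoffs by Monte Carlo, take the earlier example Y ∼ N(µ, 1) with A = {Y > 3}. The sketch below computes equal-tailed cutoffs (simpler than the UMPU pair, which solves two equations jointly) for H_0 : µ = 0 by sampling the null conditional law exactly; all names are mine.

```python
import numpy as np
from scipy.stats import truncnorm

def equal_tailed_cutoffs(alpha=0.05, sel=3.0, n_mc=200_000, seed=0):
    """Monte Carlo cutoffs for an equal-tailed (not UMPU) selective test of
    H0: mu = 0, given Y ~ N(mu, 1) and selection event A = {Y > sel}.

    Under H0, L(Y | A) is a standard normal truncated to (sel, inf),
    which truncnorm samples exactly; the cutoffs are its MC quantiles.
    """
    rng = np.random.default_rng(seed)
    t = truncnorm.rvs(sel, np.inf, size=n_mc, random_state=rng)
    return np.quantile(t, [alpha / 2, 1 - alpha / 2])

c1, c2 = equal_tailed_cutoffs()  # both cutoffs exceed the selection threshold 3
```

In richer models the conditional law has no closed form and the Monte Carlo step is where the real computational difficulty lives.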
Data Splitting is Inadmissible

Compare the optimal test to data splitting for Y_1, Y_2 i.i.d. ∼ N(µ, 1), A = {Y_1 > 3}.
Optimal test based on L(Y_1 + Y_2 | Y_1 > 3); data splitting based on L(Y_2).
[Figures: leftover Fisher information and expected CI length as functions of µ, data splitting vs. data carving]

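A quick Monte Carlo check (my own sketch, not the authors' code) that both tests in this example are valid selective tests; the inadmissibility claim concerns power, which carving gains by reusing the leftover information in Y_1.

```python
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(1)
n_sim, alpha, sel = 50_000, 0.05, 3.0

# Null draws conditional on selection: Y1 | Y1 > 3 is an exact truncated normal.
y1 = truncnorm.rvs(sel, np.inf, size=n_sim, random_state=rng)
y2 = rng.standard_normal(n_sim)

# Data splitting: one-sided test using Y2 only.
split_reject = y2 > norm.isf(alpha)

# Data carving: test T = Y1 + Y2, calibrated against its null law given Y1 > 3.
t = y1 + y2
ref = (truncnorm.rvs(sel, np.inf, size=200_000, random_state=rng)
       + rng.standard_normal(200_000))
carve_reject = t > np.quantile(ref, 1 - alpha)

split_rate, carve_rate = split_reject.mean(), carve_reject.mean()
```

Both rejection rates land near α, confirming selective validity under µ = 0.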
Outline

1. Introduction
2. Inference After Selection
3. Linear Regression
4. Other Examples

Linear Regression

Gaussian response Y ∈ R^n, regressors X ∈ R^{n×p}.
Select an active set E ⊆ {1, ..., p} based on the lasso, LARS, forward stepwise, ...
Inference w.r.t. the selected linear model Y ∼ N(X_E β_E, σ²I_n).
Exponential family in (β_E, σ²) ⟹ ∃ a UMPU selective test for H_0 : β_j^E = 0.

Linear Regression: Selected Model

  Y ∼ exp( −(1/(2σ²)) (y − X_E β)′(y − X_E β) ) · (2πσ²)^{−n/2}

Linear Regression: Selected Model

  Y ∼ exp( (1/σ²) Σ_{k∈E} β_k X_k′y − (1/(2σ²))‖y‖² − ψ(β, σ²) ) f_0(y)

σ² known: T(y) = X_j′y, U(y) = X_{E\j}′y.
The selective z-test for β_j on event A is based on L_{βj}(X_j′Y | X_{E\j}′Y, A).
- Condition on the (n − |E|)-dim. hyperplane ∩ A.
- Hit-and-run MCMC (typically A is a polytope).
- Exact level-α tests are possible without mixing (Besag & Clifford, 1989).

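A minimal hit-and-run sketch, assuming the target is a standard Gaussian restricted to a polytope {y : Ay ≤ b}; in the regression application one would run this on the appropriate affine slice, which is omitted here.

```python
import numpy as np
from scipy.stats import truncnorm

def hit_and_run(A, b, y0, n_steps=500, rng=None):
    """Hit-and-run sampler for N(0, I) restricted to the polytope {y : A y <= b}.

    Each move picks a random direction d and samples exactly from the
    one-dimensional conditional of the target along the feasible segment.
    """
    rng = np.random.default_rng(rng)
    A, b, y = np.asarray(A, float), np.asarray(b, float), np.array(y0, float)
    for _ in range(n_steps):
        d = rng.standard_normal(len(y))
        d /= np.linalg.norm(d)
        # Feasibility of y + t*d: (A d) * t <= b - A y, row by row.
        slope, slack = A @ d, b - A @ y
        lo = np.max(slack[slope < 0] / slope[slope < 0], initial=-np.inf)
        hi = np.min(slack[slope > 0] / slope[slope > 0], initial=np.inf)
        # Along the line, exp(-||y + t d||^2 / 2) is prop. to exp(-(t + y.d)^2 / 2),
        # i.e. t ~ N(-y.d, 1) truncated to [lo, hi].
        mu = -(y @ d)
        y = y + truncnorm.rvs(lo - mu, hi - mu, loc=mu, random_state=rng) * d
    return y

# Toy polytope: the box [1, 2] x [-1, 1], written as A y <= b.
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.array([2., -1., 1., 1.])
y = hit_and_run(A, b, y0=[1.5, 0.0], rng=0)
```

Because each one-dimensional move is an exact truncated-normal draw, every iterate stays inside the polytope.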
Linear Regression: Selected Model

  Y ∼ exp( (1/σ²) Σ_{k∈E} β_k X_k′y − (1/(2σ²))‖y‖² − ψ(β, σ²) ) f_0(y)

σ² unknown: T(y) = X_j′y, U(y) = (X_{E\j}′y, ‖y‖²).
The selective t-test for β_j on event A is based on L_{βj/σ²}(X_j′Y | X_{E\j}′Y, ‖Y‖², A).
- Condition on the (n − |E|)-dim. sphere ∩ A.
- Sample using the ball {‖y‖ ≤ ‖Y‖} instead of the sphere, then adjust.

Saturated Model

What if we don't believe the linear model?
Idea: Y ∼ N(µ, σ²I_n) (the saturated model); define least-squares parameters for "model" E ⊆ {1, ..., p}:

  θ_E := argmin_θ E_µ‖Y − X_E θ‖² = (X_E′X_E)^{−1} X_E′µ

Used by Berk et al. (2012), Taylor et al. (2014), Lee et al. (2013), Loftus and Taylor (2014), Lee and Taylor (2014), others.
The parameters are linear contrasts: θ_j^E = η′µ.
σ² known: the test of H_0 : θ_j^E = 0 is based on L_{θjE}(η′Y | P⊥_η Y, A).

Linear Regression: Saturated Model

L_{θjE}(η′Y | P⊥_η Y, A): a Gaussian truncated to a "slice."
σ² unknown: we must also condition on ‖Y‖; intersecting the line with the sphere leaves only 2 points in the support.

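Assuming the selection event restricts η′Y to a single known interval [a, b] (for the lasso the slice can be a union of intervals), the truncated-Gaussian p-value is a ratio of normal CDFs. A sketch, with names of my choosing; far-tail intervals need more careful numerics than this:

```python
from scipy.stats import norm

def truncated_gauss_pvalue(z, a, b, sd=1.0):
    """One-sided p-value P(Z >= z | a <= Z <= b) for Z ~ N(0, sd^2).

    Models the 'Gaussian truncated to a slice' law of eta'Y under
    H0: theta_j^E = 0, assuming the slice is the single interval [a, b].
    """
    hi = norm.cdf(b / sd)
    return (hi - norm.cdf(z / sd)) / (hi - norm.cdf(a / sd))

# Observing z = 2.5 after selection forced eta'Y > 1:
p = truncated_gauss_pvalue(2.5, 1.0, float("inf"))
```

The same z-score that would be highly significant unconditionally gives a much larger p-value once the truncation to (1, ∞) is accounted for.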
Saturated vs. Selected z-Test

Usual z-statistic: Z = η′Y / σ_η.
Selected-model z-test based on L_{βjE}(Z | X_{E\j}′Y, A).
Saturated-model z-test based on L_{θjE}(Z | P⊥_η Y, A).
- The selected-model test is more powerful (conditions on less).
- The saturated-model test is more robust (valid under weaker assumptions).
Hybrid approaches exist.

Simulation

Setup: regression with n = 100, p = 200, Y ∼ N(Xβ, I_n).
True β_j = 7 for j = 1, ..., 7, and β_j = 0 for j > 7.
X Gaussian, pairwise correlation 0.3 between variables (normalized).
Split the data into Y^(1) = (Y_1, ..., Y_{n1}) and Y^(2) = (Y_{n1+1}, ..., Y_100).
Selection: lasso on Y^(1) using λ = 2 E(‖X′ǫ‖_∞), ǫ ∼ N(0, I), as suggested by Negahban et al. (2012).
Inference: two procedures.
- Data Splitting (Split_{n1}): use Y^(2) for inference.
- Data Carving (Carve_{n1}): selected-model z-test.

Selection–Inference Tradeoff

As n1 varies, there is a tradeoff between model selection quality and power.
[Figure: screening probability and power of carving vs. splitting, as a function of the number of data points used for selection]

Selection–Inference Tradeoff

Robustness: the same plot for t_5 errors.
[Figure: screening probability and power of carving vs. splitting under t_5 errors]

Outline

1. Introduction
2. Inference After Selection
3. Linear Regression
4. Other Examples

Motivation: Iowa Caucus

Setup: Quinnipiac poll of n = 667 Iowa Republicans:

  Rank  Candidate       Result  Votes*
  1.    Scott Walker    21%     140
  2.    Rand Paul       13%     87
  3.    Marco Rubio     13%     87
  4.    Ted Cruz        12%     80
  ...   ...             ...     ...
  14.   Bobby Jindal    1%      7
  15.   Lindsey Graham  0%

Question: Is Scott Walker really winning?
Answer: Yes (p = 0.00053), by at least 22%
p = 0.022 for the Gupta & Nagel method.

Winner vs. Runner-Up Test

Theorem (F 2015):
Let [d] denote the index of the largest count, and conclude that π_[d] > max_{j<d} π_[j] if the exact, two-sided binomial level-α test of H_0 : π_[d] ≤ π_[d−1] rejects. This is a valid level-α procedure.
An analogous result is known for Gaussians (Gutmann & Maymin, 1987).
The conditional approach leads to:
- a lower confidence bound for π_SW − max_{j≠SW} π_j
- a subset selection rule
- a stepdown procedure yielding confident ranks

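A sketch of the test on the poll counts above. Reducing the comparison to a Binomial(n1 + n2, 1/2) test conditional on the top-two total is my reading of the exact binomial test in the theorem; the helper name is mine.

```python
from scipy.stats import binomtest

def winner_vs_runner_up_pvalue(counts):
    """Exact two-sided binomial p-value comparing the top two counts.

    Conditional on their total n1 + n2, the winner's count is
    Binomial(n1 + n2, 1/2) on the boundary of H0: pi_winner <= pi_runner_up.
    """
    top = sorted(counts, reverse=True)
    n1, n2 = top[0], top[1]
    return binomtest(n1, n1 + n2, 0.5).pvalue

# Walker (140) vs. Paul (87) from the Republican poll
p = winner_vs_runner_up_pvalue([140, 87, 87, 80, 7])
```

On these counts the p-value is well under 0.001, in line with the slide's conclusion.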
Stepdown Procedure

Stepdown procedure: start with #1, reject until p > .05.
Quinnipiac poll of n = 692 Iowa Democrats:

  Rank  Candidate        Result  Votes
  1.*   Hillary Clinton  60%     415
  2.*   Bernie Sanders   15%     104
  3.*   Joe Biden        11%     76
  4.*   Don't Know       7%      48
  5.    Jim Webb         3%      21
  6.    Martin O'Malley  3%      21
  7.    Lincoln Chafee   0%

FWER controlled at α = 0.05.

Sequential Model Selection

New work (F, Taylor, Tibshirani, Tibshirani): generate a nested model sequence in an algorithmic fashion:

  M_0(Y) ⊆ M_1(Y) ⊆ · · · ⊆ M_d(Y) ⊆ M_∞

e.g.
- forward stepwise, lasso
- graphical lasso
- "best first" decision tree
Goal: select the least complex model consistent with the data; control FDR, FWER (type I error = # of extra steps).
Need to condition on the subpath M_0, ..., M_k: then the null p-values are i.i.d. uniform (use ForwardStop, accumulation tests).
Forward stepwise, lasso: 2p linear constraints after k steps.

Diabetes Example

  Step  Variable  Nominal p-value  Saturated p-value  Max-t p-value
  1     bmi       0.00             0.00               0.00
  2     ltg       0.00             0.00               0.00
  3     map       0.00             0.05               0.00
  4     age:sex   0.00             0.33               0.02
  5     bmi:map   0.00             0.76               0.08
  6     hdl       0.00             0.25               0.06
  7     sex       0.00             0.00               0.00
  8     glu2      0.02             0.03               0.32
  9     age2      0.11             0.55               0.94
  10    map:glu   0.17             0.91               0.91
  11    tc        0.15             0.37               0.25
  12    ldl       0.06             0.15               0.01
  13    ltg2      0.00             0.07               0.04
  14    age:ldl   0.19             0.97               0.85
  15    age:tc    0.08             0.15               0.03
  16    sex:map   0.18             0.05               0.40
  17    glu       0.23             0.45               0.58
  18    tch       0.31             0.71               0.82
  19    sex:tch   0.22             0.40               0.51
  20    sex:bmi   0.27             0.60               0.44