  1. Optimal Inference After Model Selection
  Will Fithian, joint work with Dennis Sun & Jonathan Taylor
  December 11, 2015

  2. Outline
  1 Introduction
  2 Inference After Selection
  3 Linear Regression
  4 Other Examples

  5. Two Stages
  Two stages of a statistical investigation:
  1. Selection: choose a probabilistic model for the data and formulate an inference problem (ask a question).
  2. Inference: attempt the problem using the data and the selected model (answer the question).
  Classical admonishment: no looking at the data until stage 2.
  Actual practice: choose variables, check for interactions, overdispersion, ...
  How should we relax the classical view?

  7. Naive Inference After Selection
  What is wrong with naive inference after selection?
  Example (File Drawer Effect): observe independent $Y_i \sim N(\mu_i, 1)$, $i = 1, \ldots, n$.
  1. Restrict attention to apparently large effects: $\hat I = \{ i : |Y_i| > 1 \}$.
  2. Run a nominal level-$\alpha$ test of $H_{0,i} : \mu_i = 0$ for each $i \in \hat I$ (e.g., $\alpha = 0.05$: reject if $|Y_i| > 1.96$).
  "Everyone knows" this is invalid. Why?
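  A minimal R simulation of the file drawer effect (assuming every null is true, $\mu_i = 0$): among the selected effects, the naive test rejects far more often than its nominal 5%.

  set.seed(1)
  n <- 1e6
  Y <- rnorm(n)                    # all nulls true: mu_i = 0
  selected <- abs(Y) > 1           # file-drawer selection of apparently large effects
  mean(abs(Y[selected]) > 1.96)    # approx 0.16, not the nominal 0.05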

  10. Naive Inference After Selection
  Problem: frequency properties among the selected nulls:
  $\frac{\#\text{false rejections}}{\#\text{true nulls tested}} \to \frac{P_{H_{0,i}}(i \in \hat I,\ \text{reject } H_{0,i})}{P(i \in \hat I)} = P_{H_{0,i}}(\text{reject } H_{0,i} \mid i \in \hat I)$
  Solution: directly control the selective type I error rate $P_{H_{0,i}}(\text{reject } H_{0,i} \mid i \in \hat I)$.
  Example: $P_{H_{0,i}}(|Y_i| > 2.41 \mid |Y_i| > 1) = 0.05$.
  Guiding principle when asking random questions: the answer must be valid, given that the question was asked.
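  The 2.41 cutoff can be checked directly in R by solving $P_0(|Y_i| > c \mid |Y_i| > 1) = 0.05$ for $c$:

  alpha <- 0.05
  p.sel <- 2 * pnorm(-1)                  # P(|Y_i| > 1) under the null
  cutoff <- qnorm(1 - alpha * p.sel / 2)  # solves 2 * pnorm(-cutoff) = alpha * p.sel
  cutoff                                  # 2.413...
  2 * pnorm(-cutoff) / p.sel              # selective type I error: 0.05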

  11. False Coverage-Statement Rate
  Benjamini & Yekutieli (2005): CIs for selected parameters, e.g.
  • selected genes in GWAS
  • selected treatments in clinical trials
  Analog of FDR: $E\left[ \frac{\#\text{non-covering CIs}}{1 \vee \#\text{CIs constructed}} \right] \le \alpha$
  Conditional inference has been used as a device for FCR control (Weinstein, Fithian, & Benjamini 2013).
  Also used to correct bias (e.g., Sampson & Sill 2005; Zöllner & Pritchard 2007; Zhong & Prentice 2008).
  Difference in perspective: should we average over questions?
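  A sketch of the Benjamini-Yekutieli FCR procedure in the unit-variance Gaussian setting (an illustration, not from the slides: marginal intervals for the $|\hat I|$ selected coordinates are widened to level $1 - |\hat I|\alpha/n$):

  # FCR-controlling intervals for unit-variance Gaussian estimates Y
  fcr.intervals <- function(Y, alpha = 0.05, cutoff = 1) {
    n <- length(Y)
    sel <- which(abs(Y) > cutoff)                  # selection rule: apparently large effects
    z <- qnorm(1 - length(sel) * alpha / (2 * n))  # widened critical value
    cbind(estimate = Y[sel], lower = Y[sel] - z, upper = Y[sel] + z)
  }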

  12. Motivating Example 1: Verifying the Winner
  Setup: Quinnipiac poll of 667 Iowa Republicans, May 2014:
  Rank  Candidate        Result
  1.    Scott Walker     21%
  2.    Rand Paul        13%
  3.    Marco Rubio      13%
  4.    Ted Cruz         12%
  ...
  14.   Bobby Jindal     1%
  15.   Lindsey Graham   0%
  Question: Is Scott Walker really winning? By how much?
  Problem: the winner's curse. This is "question selection," not really "model selection."
  Related to subset selection (Gupta & Nagel 1967, among others).
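  A hypothetical simulation of the winner's curse: even with 15 candidates truly tied at $1/15 \approx 6.7\%$ and $n = 667$ respondents, the poll leader's observed share is biased upward.

  set.seed(1)
  shares <- rmultinom(10000, size = 667, prob = rep(1/15, 15)) / 667
  mean(apply(shares, 2, max))   # approx 0.084, overstating the true 6.7%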

  14. Motivating Example 2: Inference After Model Checking
  Two-sample problem: $X_1, \ldots, X_m \overset{\text{i.i.d.}}{\sim} F_1$ and $Y_1, \ldots, Y_n \overset{\text{i.i.d.}}{\sim} F_2$.
  Test the Gaussian model based on the normalized residuals
  $R = \left( \frac{X_1 - \bar X}{S_X}, \ldots, \frac{X_m - \bar X}{S_X}, \frac{Y_1 - \bar Y}{S_Y}, \ldots, \frac{Y_n - \bar Y}{S_Y} \right)$
  If the test rejects, use a permutation test (e.g., Wilcoxon): $F_1 = ?$, $F_2 = ?$, $H_0 : F_1 = F_2$.
  Otherwise, use the two-sample t-test: $F_1 = N(\mu, \sigma^2)$, $F_2 = N(\nu, \tau^2)$, $H_0 : \mu = \nu$.
  This is model selection in the strong sense.
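  A naive R version of this two-step pipeline (a sketch for illustration: a Shapiro-Wilk test stands in for the slides' unspecified Gaussianity check, and neither branch adjusts for the fact that the test was chosen after looking at the data):

  two.stage.test <- function(x, y) {
    r <- c(scale(x), scale(y))          # pooled normalized residuals
    if (shapiro.test(r)$p.value < 0.05) {
      wilcox.test(x, y)                 # Gaussian model rejected: use a rank test
    } else {
      t.test(x, y)                      # Gaussian model retained: two-sample t-test
    }
  }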

  16. Motivating Example 3: Regression After Variable Selection
  E.g., solve the lasso at fixed $\lambda > 0$ (Tibshirani, 1996):
  $\hat\gamma = \arg\min_\gamma \| Y - X\gamma \|_2^2 + \lambda \| \gamma \|_1$
  The "active set" $E = \{ j : \hat\gamma_j \neq 0 \}$ induces the selected model $M(E)$:
  $Y \sim N(X_E \beta^E, \sigma^2 I_n)$
  Can we get valid tests / intervals for $\beta^E_j$, $j \in E$?
  Lee, Sun, Sun, & Taylor (2013) studied a slightly different problem (inference with respect to a different model).

  19. Random Model, Random Null
  Testing a null hypothesis $H_0$ in model $M$:
  Selective error rate: $P_{M, H_0}(\text{reject } H_0 \mid (M, H_0) \text{ selected})$
  Nominal error rate: $P_{M, H_0}(\text{reject } H_0)$
  "Kosher" adaptive selection uses two independent experiments:
  • Select $M$, $H_0$ based on exploratory experiment 1.
  • Test using confirmatory experiment 2.
  $M$ and $H_0$ are random, but no adjustment is necessary:
  $P_{M, H_0}(\text{reject } H_0 \mid (M, H_0) \text{ selected}) = P_{M, H_0}(\text{reject } H_0)$
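  A quick R check in the file-drawer setting (all nulls true): selecting on an independent first experiment leaves the second experiment's test at its nominal level.

  set.seed(1)
  Y1 <- rnorm(1e6); Y2 <- rnorm(1e6)  # exploratory and confirmatory experiments
  sel <- abs(Y1) > 1                  # select hypotheses using experiment 1 only
  mean(abs(Y2[sel]) > 1.96)           # approx 0.05: the nominal level is retained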

  21. Data Splitting
  Assume $Y = (Y_1, Y_2)$ with $Y_1 \perp\!\!\!\perp Y_2$.
  Data splitting mimics the exploratory/confirmatory split:
  • Select the model based on $Y_1$.
  • Analyze $Y_2$ as though the model were chosen ahead of time.
  Again, no adjustment is necessary:
  $P_{M, H_0}(\text{reject } H_0 \mid (M, H_0) \text{ selected}) = P_{M, H_0}(\text{reject } H_0)$
  Objections to data splitting:
  • less data for selection
  • less data for inference
  • not always possible (e.g., autocorrelated data)
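  A minimal data-splitting sketch for the regression example (assumes the glmnet package; X, Y, and lambda are user-supplied, and data.splitting is an illustrative helper, not the authors' code):

  library(glmnet)
  data.splitting <- function(X, Y, lambda) {
    i1 <- sample(nrow(X), nrow(X) %/% 2)               # random half of the rows for selection
    beta <- as.numeric(coef(glmnet(X[i1, ], Y[i1]), s = lambda))[-1]
    E <- which(beta != 0)                              # selected variables (intercept dropped)
    if (length(E) == 0) return(NULL)                   # nothing selected, nothing to test
    summary(lm(Y[-i1] ~ X[-i1, E, drop = FALSE]))      # classical inference on the held-out half
  }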

  24. Data Carving
  Think of the data as "revealed in stages." Let $A = \{ (M, H_0) \text{ selected} \}$:
  $\mathcal{F}_0 \subseteq \underbrace{\mathcal{F}(1_A(Y))}_{\text{used for selection}} \subseteq \underbrace{\mathcal{F}(Y)}_{\text{used for inference}}$
  Conditioning on $A$ in stage two $\iff$ $Y \in A$ is excluded as evidence against $H_0$.
  Data splitting conditions on $Y_1$ instead of $1_A(Y_1)$:
  $\mathcal{F}_0 \subseteq \underbrace{\mathcal{F}(1_A(Y_1))}_{\text{used for selection}} \subseteq \underbrace{\mathcal{F}(Y_1)}_{\text{wasted}} \subseteq \underbrace{\mathcal{F}(Y_1, Y_2)}_{\text{used for inference}}$
  Data carving: use all leftover information for inference.
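  A Monte Carlo sketch of carving in the file-drawer setting (illustrative, not from the slides): select on the first half $Y_1$, then test with the full-data statistic $(Y_1 + Y_2)/\sqrt{2}$, calibrated by its null distribution conditional on the selection event. Under an alternative this beats data splitting's test on $Y_2$ alone, since the statistic pools both halves.

  set.seed(1)
  Y1 <- rnorm(1e6); Y2 <- rnorm(1e6)   # two independent halves, all nulls true
  A <- abs(Y1) > 1                     # selection event, based on Y1 alone
  Z <- (Y1 + Y2) / sqrt(2)             # full-data statistic, N(0, 1) under the null
  cutoff <- quantile(abs(Z[A]), 0.95)  # carved critical value: about 2.35, above 1.96
  mean(abs(Z[A]) > cutoff)             # selective type I error: 0.05 by construction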

  25. Lasso Partition
  [Figure: the lasso partitions the sample space by which variables it selects; yellow region = $\{ y : \text{variables 1, 3 selected} \}$]

  26. Lasso Partition
  # Active set of the lasso at a fixed lambda (drop the intercept from coef)
  M.hat <- which(coef(glmnet(X, Y), s = lambda)[-1] != 0)

  27. Goals
  Prior work on linear regression after selection with $\sigma^2$ known: Lockhart et al. (2014), Tibshirani et al. (2014), Lee et al. (2013), Loftus and Taylor (2014), Lee and Taylor (2014), ...
  Our goals:
  1 Formalize inference after selection
  2 Understand power: can it be improved?
  3 Generalize to unknown $\sigma^2$
  4 Generalize to other exponential families

  28. Outline
  1 Introduction
  2 Inference After Selection
  3 Linear Regression
  4 Other Examples

  30. Selective Hypothesis Tests
  Setup: observe $Y \sim F$ on the space $(\mathcal{Y}, \mathcal{F})$, with $F$ unknown.
  Question space: the collection $\mathcal{Q}$ of all candidate testing problems $q$.
  A testing problem is a pair $q = (M, H_0)$ consisting of
  • a model $M(q)$ (a family of distributions)
  • a null hypothesis $H_0(q) \subseteq M(q)$ (wlog $H_1 = M \setminus H_0$)
  Two stages:
  1. Selection: select a subset $\hat{\mathcal{Q}}(Y) \subseteq \mathcal{Q}$ to test.
  2. Inference: test $H_0$ vs. $M \setminus H_0$ for each $q = (M, H_0) \in \hat{\mathcal{Q}}$.
