SLIDE 1

Selective inference: a conditional perspective

Xiaoying Tian Harris

Joint work with Jonathan Taylor

September 26, 2016

SLIDES 2-7

Model selection

◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
◮ model = lm(y ~ X1 + X2 + X3 + X4)
  model = lm(y ~ X1 + X2 + X4)
  model = lm(y ~ X1 + X3 + X4)
◮ Inference after model selection
  • 1. Use the data to select a set of variables E
  • 2. Use normal z-tests to get p-values
◮ Problem: inflated significance
  • 1. Normal z-tests need adjustment
  • 2. Selection is biased towards “significance”
SLIDE 8

Inflated Significance

Setup:

◮ X ∈ R^{100×200} has i.i.d. normal entries
◮ y = Xβ + ε, ε ∼ N(0, I)
◮ β = (5, . . . , 5, 0, . . . , 0), with the first 10 coordinates equal to 5
◮ LASSO, nonzero coefficient set E
◮ z-tests: null p-values for i ∈ E, i ∉ {1, . . . , 10}

[Figure: histogram of the null p-values after selection (x-axis: p-values, 0.0-0.5; y-axis: frequencies); they pile up near 0 rather than being uniform.]
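
To make the setup concrete, here is a minimal R sketch of the simulation (our illustration, not code from the talk); it assumes the glmnet package for the LASSO and an arbitrary fixed lambda:

    library(glmnet)

    set.seed(1)
    n <- 100; p <- 200
    X <- matrix(rnorm(n * p), n, p)              # i.i.d. normal entries
    beta <- c(rep(5, 10), rep(0, p - 10))        # 10 strong signals
    y <- X %*% beta + rnorm(n)                   # epsilon ~ N(0, I)

    # LASSO; E = set of nonzero coefficients (lambda is an arbitrary choice)
    fit <- glmnet(X, y, lambda = 2 * sqrt(log(p) / n))
    E <- which(coef(fit)[-1] != 0)

    # Naive z-tests in the selected model, ignoring selection (sigma = 1 known)
    XE <- X[, E]
    beta_hat <- solve(crossprod(XE), crossprod(XE, y))
    se <- sqrt(diag(solve(crossprod(XE))))
    pvals <- 2 * pnorm(-abs(beta_hat / se))

    # Null p-values: selected variables whose true coefficient is zero;
    # their histogram piles up near 0 instead of being uniform
    null_pvals <- pvals[E > 10]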

SLIDES 9-10

Post-selection inference

◮ PoSI approach:
  • 1. Reduce to simultaneous inference
  • 2. Protects against any selection procedure
  • 3. Conservative and computationally expensive

◮ Selective inference approach:
  • 1. Conditional approach
  • 2. Specific to particular selection procedures
  • 3. More powerful tests
SLIDES 11-12

Conditional approach: example

Consider selection for “big effects”:

◮ X1, . . . , Xn i.i.d. ∼ N(0, 1), X̄ = (1/n) ∑_{i=1}^n Xi
◮ Select for “big effects”: X̄ > 1
◮ Observation: X̄obs = 1.1, with n = 5
◮ Normal z-test vs. selective test for H0 : µ = 0

[Figure: the original distribution of X̄ alongside the conditional distribution after selection (truncated at 1).]
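
A minimal R sketch of the two tests (our illustration), using the closed form of the truncated normal: under H0, X̄ ∼ N(0, 1/n) conditioned on X̄ > 1.

    n <- 5; xbar_obs <- 1.1; se <- 1 / sqrt(n)

    # Naive z-test: ignores that we only test because Xbar > 1
    p_naive <- pnorm(xbar_obs, sd = se, lower.tail = FALSE)         # ~ 0.007

    # Selective test: N(0, 1/n) truncated to (1, Inf), so
    # p = P(Xbar > 1.1 | Xbar > 1) under H0
    p_selective <- p_naive / pnorm(1, sd = se, lower.tail = FALSE)  # ~ 0.55

The naive test declares strong significance, while the selective test finds essentially no evidence against H0, matching the two panels above.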

SLIDES 13-14

Moral of selective inference

Conditional approach:

◮ Selection, e.g. X̄ > 1.
◮ Conditional distribution after selection, e.g. N(µ, 1/n) truncated at 1.
◮ The target of inference may (or may not) depend on the outcome of the selection.
  • 1. Not dependent: e.g. H0 : µ = 0.
  • 2. Dependent: e.g. the two-sample problem, or inference for variables selected by the LASSO.
◮ Random hypothesis?

SLIDES 15-18

Random hypothesis

◮ Replication studies
◮ Data splitting: observe data (X, y), with X fixed; entries of y are independent (given X). The random hypothesis is selected by the data.
◮ Data splitting as a conditional approach:

  L(y2) = L(y2 | H0 selected by y1).

SLIDES 19-20

Selective inference: a conditional approach

◮ Data splitting as a conditional approach:

  L(y2) = L(y2 | H0 selected by y1).

◮ Inference based on the conditional law:

  L(y | H0 selected by y*),  y* = y*(y, ω),

  where ω is some randomization independent of y.

◮ Examples of y*:
  • 1. y* = y1, where ω is a random split
  • 2. y* = y, ω is void
  • 3. y* = y + ω, where ω ∼ N(0, γ²), additive noise
SLIDE 21

Different y*

◮ Much more powerful tests.
◮ Randomization transfers the properties of unselective distributions to their selective counterparts:
  • y* = y: Lee et al. (2013), Taylor et al. (2014)
  • y* = y1: data splitting, Fithian et al. (2014); T. & Taylor (2015)
  • y* = y + ω: randomized LASSO, T. & Taylor (2015)

SLIDE 22

Selective vs. unselective distributions

Example: X1, . . . , Xn i.i.d. ∼ N(0, 1), X̄ = (1/n) ∑_{i=1}^n Xi, n = 5. Selection: X̄ > 1.

[Figure: the original distribution of X̄ alongside the conditional distribution after selection (truncated at 1).]

SLIDE 23

Selective vs. unselective distributions

Example: X1, . . . , Xn i.i.d. ∼ N(0, 1), X̄ = (1/n) ∑_{i=1}^n Xi, n = 5. Selection: X̄ + ω > 1, where ω ∼ Laplace(0.15). Explicit formulas are available for the densities of the selective distribution.

[Figure: the original distribution of X̄ alongside the conditional distribution after randomized selection.]

The selective distribution is much better behaved after randomization.
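
One explicit formula in this case (our reconstruction, under the assumption that Laplace(0.15) denotes scale b = 0.15): the selective likelihood is ℓ(t) = P(ω > 1 - t), the Laplace survival function, which smooths the hard truncation at 1. A minimal R sketch:

    # Laplace(0, b) survival function
    laplace_surv <- function(x, b = 0.15) {
      ifelse(x >= 0, 0.5 * exp(-x / b), 1 - 0.5 * exp(x / b))
    }
    ell <- function(t) laplace_surv(1 - t)      # l(t) = P(omega > 1 - t)

    # Selective density under H0 (mu = 0, n = 5): proportional to f(t) * l(t)
    n <- 5
    t <- seq(-1.5, 2, length.out = 500)
    dens <- dnorm(t, sd = 1 / sqrt(n)) * ell(t)
    dens <- dens / (sum(dens) * (t[2] - t[1]))  # normalize numerically
    # dens decays smoothly below t = 1 instead of dropping to zero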

SLIDES 24-25

Selective vs. unselective distributions

◮ Suppose Xi i.i.d. ∼ F, Xi ∈ R^k.
◮ Linearizable statistics: T = (1/n) ∑_{i=1}^n ξi(Xi) + o_p(n^{-1/2}), with each ξi measurable with respect to Xi.
◮ Central limit theorem:

  T ⇒ N(µ, Σ/n),

  where E[T] = µ ∈ R^p, Var(T) = Σ.

Would this still hold under the selective distribution?

SLIDE 26

Selective distributions

Randomized selection with T* = T*(T, ω) and a model-selection map M̂ : T* ↦ M.

◮ Original distribution of T (with density f):

  f(t)

◮ Selective distribution:

  f(t) ℓ(t),  ℓ(t) ∝ ∫ 1{M̂[T*(t, ω)] = M} g(ω) dω,

  where g is the density of ω.

◮ ℓ(t) is also called the selective likelihood.
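
For the running threshold example, the integral defining ℓ(t) can be approximated by Monte Carlo; a minimal R sketch (our illustration), with T* = t + ω and M̂(T*) = 1{T* > 1}:

    # Monte Carlo estimate of l(t) = P( Mhat(t + omega) = M ) = P(omega > 1 - t)
    # for omega ~ Laplace(0, b), drawn as a random sign times an exponential
    ell_mc <- function(t, nsim = 1e5, b = 0.15) {
      omega <- rexp(nsim, rate = 1 / b) * sample(c(-1, 1), nsim, replace = TRUE)
      mean(t + omega > 1)
    }

    ell_mc(1.1)                   # approximates the closed form below
    1 - 0.5 * exp(-0.1 / 0.15)    # = P(omega > -0.1) ~ 0.74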

SLIDE 27

Selective central limit theorem

Theorem (Selective CLT, T. and Taylor (2015))

If
  • 1. model selection is made with T* = T*(T, ω),
  • 2. the selective likelihood ℓ(t) satisfies some regularity conditions,
  • 3. T has a moment generating function in a neighbourhood of the origin,

then L(T | H0 selected by T*) ⇒ L(N(µ, Σ) | H0 selected by T*).
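
A rough Monte Carlo illustration of the statement (our sketch, not from the talk): for centered exponential data, the selective law of X̄ is close to the selective law of its Gaussian limit under a randomized selection {X̄ + ω > c}.

    set.seed(3)
    n <- 200; nsim <- 20000; c0 <- 0.1; b <- 0.05

    # Xbar for non-Gaussian (centered exponential) data, and its normal limit
    xbar_exp <- rowMeans(matrix(rexp(nsim * n) - 1, nsim, n))  # mean 0, var 1/n
    xbar_gau <- rnorm(nsim, sd = 1 / sqrt(n))

    # Independent Laplace(0, b) randomizations for the selection Xbar + omega > c0
    omega1 <- rexp(nsim, 1 / b) * sample(c(-1, 1), nsim, replace = TRUE)
    omega2 <- rexp(nsim, 1 / b) * sample(c(-1, 1), nsim, replace = TRUE)

    sel_exp <- xbar_exp[xbar_exp + omega1 > c0]
    sel_gau <- xbar_gau[xbar_gau + omega2 > c0]

    # The two selective distributions nearly coincide
    quantile(sel_exp, c(0.25, 0.5, 0.75))
    quantile(sel_gau, c(0.25, 0.5, 0.75))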

SLIDE 28

Power comparison

HIVDB data, http://hivdb.stanford.edu/. Unrandomized y* = y vs. randomized y* = y + ω, ω ∼ N(0, 0.1σ²).

[Figure: parameter estimates with intervals for the mutations P62V, P65R, P67N, P69i, P75I, P77L, P83K, P90I, P115F, P151M, P181C, P184V, P190A, P215F, P215Y, P219R; left panel: unrandomized, right panel: randomized.]

SLIDE 29

Tradeoff between power and model selection

◮ Setup: y = Xβ + ε, n = 100, p = 200, ε ∼ N(0, I), β = (7, . . . , 7, 0, . . . , 0) with the first 7 coordinates equal to 7. X is equicorrelated with ρ = 0.3.
◮ Use a randomized y* to fit the LASSO, with active set E (a sketch follows below):
  • 1. Data splitting / data carving: y* = y1, a random subset of y
  • 2. Additive randomization: y* = y + ω, ω ∼ N(0, γ²I)

Data carving picture credit: Fithian et al. (2014).
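
A minimal R sketch of the two randomization schemes in this experiment (our illustration; the lambda and the split size are arbitrary choices, and glmnet is assumed for the LASSO):

    library(glmnet)

    set.seed(2)
    n <- 100; p <- 200; rho <- 0.3
    # Equicorrelated design: a shared component induces correlation rho
    X <- sqrt(1 - rho) * matrix(rnorm(n * p), n, p) + sqrt(rho) * rnorm(n)
    beta <- c(rep(7, 7), rep(0, p - 7))
    y <- X %*% beta + rnorm(n)

    # 1. Data splitting / carving: select with a random subset y1
    split <- sample(n, size = 80)
    E_split <- which(coef(glmnet(X[split, ], y[split], lambda = 0.1))[-1] != 0)

    # 2. Additive randomization: select with y* = y + omega
    gamma <- 0.5
    E_rand <- which(coef(glmnet(X, y + rnorm(n, sd = gamma), lambda = 0.1))[-1] != 0)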

SLIDE 30

References

Fithian, W., Sun, D. & Taylor, J. (2014), ‘Optimal inference after model selection’, arXiv:1410.2597 [math.ST]. URL: http://arxiv.org/abs/1410.2597