 
Selective Inference for Effect Modification
Qingyuan Zhao (Joint work with Dylan Small and Ashkan Ertefaie)
Department of Statistics, University of Pennsylvania
May 24, ACIC 2017
Manuscript and slides are available at http://www-stat.wharton.upenn.edu/~qyzhao/.

Outline: Problem formulation; Selective inference: why and how; Selective inference for effect modification; Numerical examples; Future work; References

Effect modification
Effect modification means the treatment has a different effect among different subgroups.
In other words, there is interaction between treatment and covariates in the outcome model.
Why care about effect modification?
- Extrapolation of average causal effect to a different population.
- Personalizing the treatment.
- Understanding the causal mechanism.

Subgroup analysis and regression analysis
Subgroup analysis and regression analysis are the most common ways to analyze effect modification.
Prespecified subgroups/interactions:
- Free of selection bias. Scientifically rigorous.
- Limited in number. No flexibility.
Post hoc subgroups/interactions:
- Scheffé, Tukey (1950s): multiple comparisons.
- Lots of recent work on discovering effect modification.
- But how to guarantee coverage? A call for valid inference after model selection.

Setting
A nonparametric model for the potential outcomes:
$$Y_i(t) = \eta(X_i) + t \cdot \Delta(X_i) + \epsilon_i(t), \quad i = 1, \dots, n.$$
- $\Delta(x)$ is the parameter of interest.
- Saturated if the treatment is binary, $t \in \{0, 1\}$.
Basic assumptions:
1. Consistency of the observed outcome: $Y_i = Y_i(T_i)$;
2. Unconfoundedness: $T_i \perp\!\!\!\perp Y_i(t) \mid X_i$, $\forall t \in \mathcal{T}$;
3. Positivity/Overlap: $\mathrm{Var}(T_i \mid X_i = x)$ exists and is bounded away from 0 for all $x$.

Naive linear modeling I
A straw man
Instead of the nonparametric model,
$$Y_i(t) = \eta(X_i) + t \cdot \Delta(X_i) + \epsilon_i, \quad i = 1, \dots, n,$$
fit a linear model (the intercepts are dropped for simplicity)
$$Y_i(t) = \gamma^T X_i + T_i \cdot (\beta^T X_i) + \tilde{\epsilon}_i, \quad i = 1, \dots, n.$$
Dismiss all insignificant interaction terms, then refit the model.

Naive linear modeling II
Two critical fallacies:
1. Linear model could be misspecified.
   Solution: use machine learning algorithms to estimate the nuisance parameters.
   Targeted learning [van der Laan and Rose, 2011], double machine learning [Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, et al., 2016].
2. Statistical inference ignored data snooping.
   Solution: use selective inference.
   Lee, Sun, Sun, and Taylor [2016], Fithian, Sun, and Taylor [2014], Tian and Taylor [2017b].

Background: valid inference after model selection I
- Acknowledge that the model is selected using the data.
- Model selection procedure:
  $$\{X_i, T_i, Y_i\}_{i=1}^{n} \mapsto \hat{M} \quad (\text{data} \mapsto \text{a subset of covariates}).$$
- The target parameter $\beta^*_{\hat{M}}$ is defined by $\hat{M}$: $x_{\hat{M}}^T \beta^*_{\hat{M}}$ is the "best linear approximation" of $\Delta(x)$ [Berk, Brown, Buja, Zhang, and Zhao, 2013].

Background: valid inference after model selection II
Two types of confidence intervals:
1. Simultaneous coverage [Berk et al., 2013]:
   $$P\Big( (\beta^*_{\hat{M}})_j \in [D_j^-, D_j^+] \ \text{for all } j \in \hat{M} \Big) \ge 1 - q, \quad \forall \hat{M}.$$
2. Conditional coverage [Lee et al., 2016]:
   $$P\Big( (\beta^*_{\mathcal{M}})_j \in [D_j^-, D_j^+] \ \Big|\ \hat{M} = \mathcal{M} \Big) \ge 1 - q, \quad \forall \mathcal{M}.$$
   Guarantees the control of the false coverage rate (FCR, the average proportion of non-covering intervals among the reported) [Benjamini and Yekutieli, 2005].

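To make the FCR definition concrete, here is a minimal sketch under assumptions that are not from the talk: all true effects are zero, each estimate is N(0, 1), and only "significant" estimates are reported with their ordinary 95% intervals. The names `estimates`, `covers_zero`, and `fcp` are illustrative. The sketch computes the realized false coverage proportion; the FCR is its expectation.

```python
import random

random.seed(0)
Z_CRIT = 1.96   # two-sided 95% normal quantile
m = 1000        # number of candidate effects (hypothetical; all truly zero)

# Hypothetical setup: each estimate is N(0, 1) around a true effect of 0.
estimates = [random.gauss(0.0, 1.0) for _ in range(m)]


def covers_zero(z):
    """Does the naive interval z +/- 1.96 contain the true value 0?"""
    return z - Z_CRIT <= 0.0 <= z + Z_CRIT


# Without selection, roughly 5% of the intervals miss the truth.
miss_all = sum(not covers_zero(z) for z in estimates) / m

# Report only the "significant" effects, reusing the same naive intervals.
selected = [z for z in estimates if abs(z) > Z_CRIT]
fcp = sum(not covers_zero(z) for z in selected) / len(selected)

print(f"miss rate over all intervals:       {miss_all:.3f}")
print(f"false coverage proportion reported: {fcp:.3f}")
```

In this all-null setup every reported interval excludes zero by construction, so the naive intervals have no coverage at all among the reported effects. This is the failure that conditional coverage is designed to prevent.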
Background: selective inference in linear models I
Suppose we have noisy observations of $\Delta$:
$$Y_i = \Delta(X_i) + \epsilon_i, \quad i = 1, \dots, n.$$
Submodel parameter
$$\beta^*_{\mathcal{M}} = \arg\min_{\alpha,\, \beta_{\mathcal{M}}} \sum_{i=1}^{n} \big( \Delta(X_i) - \alpha - X_{i,\mathcal{M}}^T \beta_{\mathcal{M}} \big)^2.$$
Linear selection rule
$$\{\hat{M} = \mathcal{M}\} = \{ A_{\mathcal{M}}(X) \cdot Y \le b_{\mathcal{M}}(X) \}.$$
Example: Nonzero elements in the Lasso solution.

Background: selective inference in linear models II
Main result of Lee et al. [2016]:
$(\hat{\beta}_{\hat{M}})_j \mid AY \le b$ is truncated normal with mean $(\beta^*_{\hat{M}})_j$.
Need normality of noise, but can be relaxed in large sample [Tian and Taylor, 2017a].
Geometric intuition: [figure not recoverable from the extracted text]
Invert the pivotal statistic $F\big( (\hat{\beta}_{\hat{M}})_j, (\beta^*_{\hat{M}})_j \big) \sim \mathrm{Unif}(0, 1)$ to construct selective confidence interval.

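The truncated-normal pivot can be illustrated in one dimension. The sketch below uses a simplified setting that is an assumption, not the talk's Lasso setup: a single observation Y ~ N(mu, 1) is reported only when Y > c, which is a linear selection event with A = -1 and b = -c. The helper names `norm_sf` and `pivot` are hypothetical. Inverting the pivot covers the true mean exactly when the pivot at the true mean lies in [q/2, 1 - q/2], so the sketch checks that event directly instead of solving for the interval endpoints.

```python
import math
import random


def norm_sf(x):
    """Standard normal survival function P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pivot(y, mu, c):
    """CDF of N(mu, 1) truncated to (c, inf), evaluated at y > c."""
    return 1.0 - norm_sf(y - mu) / norm_sf(c - mu)


random.seed(1)
mu_true, c, q = 0.0, 1.5, 0.10   # illustrative values
z_q = 1.6449                     # normal quantile for a two-sided 90% interval

draws = []
while len(draws) < 5000:         # keep only draws that pass the selection rule
    y = random.gauss(mu_true, 1.0)
    if y > c:
        draws.append(y)

# Selective interval covers mu_true iff the pivot lies in [q/2, 1 - q/2].
selective_cov = sum(
    q / 2 <= pivot(y, mu_true, c) <= 1 - q / 2 for y in draws
) / len(draws)

# Naive interval y +/- z_q ignores the selection step.
naive_cov = sum(abs(y - mu_true) <= z_q for y in draws) / len(draws)

print(f"selective coverage given selection: {selective_cov:.3f}")
print(f"naive coverage given selection:     {naive_cov:.3f}")
```

Given selection, the pivot-based coverage should sit near the nominal 90%, while the naive interval covers far less often (about 25% in theory for these particular values of `mu_true` and `c`).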
Eliminate the nuisance parameter
Back to the causal model (of the observables)
$$Y_i = \eta(X_i) + T_i \cdot \Delta(X_i) + \epsilon_i, \quad i = 1, \dots, n.$$
Problem: how to eliminate the nuisance parameter $\eta(x)$?
Robinson [1988]'s transformation
Let $\mu_y(x) = E[Y_i \mid X_i = x]$ and $\mu_t(x) = E[T_i \mid X_i = x]$, so $\mu_y(x) = \eta(x) + \mu_t(x) \Delta(x)$. An equivalent model is
$$Y_i - \mu_y(X_i) = \big( T_i - \mu_t(X_i) \big) \cdot \Delta(X_i) + \epsilon_i, \quad i = 1, \dots, n.$$
The new nuisance parameters $\mu_y(x)$ and $\mu_t(x)$ can be directly estimated from the data.

Our complete proposal
- Estimate $\mu_y(x)$ and $\mu_t(x)$ using machine learning algorithms (for example random forest).
- Select a model for effect modification by solving
  $$\min_{\alpha,\, \beta} \sum_{i=1}^{n} \Big[ \big( Y_i - \hat{\mu}_y(X_i) \big) - \big( T_i - \hat{\mu}_t(X_i) \big) \cdot (\alpha + X_i^T \beta) \Big]^2 + \lambda \|\beta\|_1.$$
- Use the pivotal statistic in Lee et al. [2016] to obtain selective confidence intervals of
  $$\beta^*_{\hat{M}} = \arg\min_{\alpha,\, \beta_{\hat{M}}} \sum_{i=1}^{n} \big( T_i - \mu_t(X_i) \big)^2 \big( \Delta(X_i) - \alpha - X_{i,\hat{M}}^T \beta_{\hat{M}} \big)^2.$$

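The first two steps of the proposal (residualize, then select with the Lasso) might look like the following sketch. Everything specific here is an assumption for illustration: the simulated data, the choice `lam = 200.0`, the hand-written coordinate descent in `lasso_select`, and above all the use of the true $\mu_y$ and $\mu_t$ in place of machine learning estimates. The third step, the selective confidence intervals of Lee et al. [2016], is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 10

# Hypothetical simulated data: only the first covariate modifies the effect.
X = rng.standard_normal((n, p))
T = rng.binomial(1, 0.5, size=n)        # randomized binary treatment
eta = np.sin(X[:, 1])                   # nuisance baseline eta(x)
delta = 1.0 + 2.0 * X[:, 0]             # treatment effect Delta(x)
Y = eta + T * delta + rng.standard_normal(n)

# Step 1 (stand-in): the talk estimates mu_y and mu_t by machine learning;
# here the true functions are plugged in because the simulation knows them.
mu_t = np.full(n, 0.5)
mu_y = eta + mu_t * delta

# Robinson-transformed response and design.
y_res = Y - mu_y
t_res = T - mu_t
Z = t_res[:, None] * X                  # columns (T_i - mu_t(X_i)) * X_ij


def lasso_select(y_res, t_res, Z, lam, n_iter=200):
    """Coordinate descent for
    sum_i (y_res_i - t_res_i * alpha - Z_i @ beta)^2 + lam * ||beta||_1,
    with the intercept alpha left unpenalized."""
    alpha, beta = 0.0, np.zeros(Z.shape[1])
    col_ss = (Z ** 2).sum(axis=0)
    for _ in range(n_iter):
        # Exact least-squares update of the unpenalized intercept.
        alpha = t_res @ (y_res - Z @ beta) / (t_res @ t_res)
        for j in range(Z.shape[1]):
            # Residual with coordinate j removed from the fit.
            partial = y_res - t_res * alpha - Z @ beta + Z[:, j] * beta[j]
            rho = Z[:, j] @ partial
            # Soft-threshold at lam / 2 because the loss has no 1/2 factor.
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_ss[j]
    return alpha, beta


# Step 2: select effect modifiers as the nonzero Lasso coefficients.
alpha, beta = lasso_select(y_res, t_res, Z, lam=200.0)
selected = np.flatnonzero(beta != 0)
print("selected covariates:", selected)
```

The selected set plays the role of $\hat{M}$. Valid intervals for $\beta^*_{\hat{M}}$ still require conditioning on this selection event, as in the last step above.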
Main result
Challenge: $\mu_y$ and $\mu_t$ are estimated (with error).
Assumption
Rate assumptions in Robinson [1988]:
$$\|\hat{\mu}_t - \mu_t\|_\infty = o_p(n^{-1/4}), \quad \|\hat{\mu}_y - \mu_y\|_\infty = o_p(1), \quad \|\hat{\mu}_t - \mu_t\|_\infty \cdot \|\hat{\mu}_y - \mu_y\|_\infty = o_p(n^{-1/2}).$$
Theorem
Under additional assumptions on the selection event, the pivotal statistic and hence the selective confidence interval are asymptotically valid.

Simulation
Idealized estimation error
The true design and the true outcome were generated by
$$X_i \in \mathbb{R}^{30} \overset{\text{i.i.d.}}{\sim} N(0, I), \quad Y_i \overset{\text{i.i.d.}}{\sim} N(X_i^T \beta, 1), \quad i = 1, \dots, n,$$
where $\beta = (1, 1, 1, 0, \dots, 0)^T \in \mathbb{R}^{30}$.
Then the design and the outcome were perturbed by
$$X_i \mapsto X_i \cdot (1 + n^{-\gamma} e_{1i}), \quad Y_i \mapsto Y_i + n^{-\gamma} e_{2i},$$
where $e_{1i}$ and $e_{2i}$ are independent standard normal.
In the paper we also evaluated the validity of the entire procedure.

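The data-generating process on this slide can be sketched as follows. The function name `simulate`, its arguments, and the seed handling are illustrative assumptions. The sketch only generates the perturbed data and does not run the selective inference step.

```python
import numpy as np


def simulate(n, gamma, p=30, seed=0):
    """Generate (X, Y) as on the slide, then perturb both at rate n^(-gamma)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:3] = 1.0                          # beta = (1, 1, 1, 0, ..., 0)

    X = rng.standard_normal((n, p))         # X_i ~ N(0, I)
    Y = X @ beta + rng.standard_normal(n)   # Y_i ~ N(X_i^T beta, 1)

    e1 = rng.standard_normal(n)             # one perturbation per observation
    e2 = rng.standard_normal(n)
    scale = float(n) ** (-gamma)

    X_pert = X * (1.0 + scale * e1)[:, None]   # X_i -> X_i * (1 + n^-gamma e_1i)
    Y_pert = Y + scale * e2                    # Y_i -> Y_i + n^-gamma e_2i
    return X_pert, Y_pert, beta


X_pert, Y_pert, beta = simulate(n=1000, gamma=0.3)
print(X_pert.shape, Y_pert.shape)
```

Varying `gamma` around 0.25 and `n` over several orders of magnitude reproduces the grid of settings described on the next slide.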
Rate assumptions are necessary and sufficient
Crucial rate assumption: $\|\hat{\mu}_t - \mu_t\|_\infty \cdot \|\hat{\mu}_y - \mu_y\|_\infty = o_p(n^{-1/2})$.
Phase transition at $\gamma = 0.25$.
- When $\gamma > 0.25$: FCR is controlled at 10%.
- When $\gamma < 0.25$: FCR is not controlled.
[Figure: false coverage proportion (y-axis) against sample size (x-axis, 1e+03 to 1e+05), in two panels, "Naive inference" and "Selective inference", for $\gamma \in \{0.15, 0.2, 0.25, 0.3, 0.35\}$.]

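A heuristic reading of why the transition sits at 0.25, assuming the two simulated perturbations stand in for the estimation errors of $\hat{\mu}_t$ and $\hat{\mu}_y$, each of order $n^{-\gamma}$ (ignoring logarithmic factors from the sup-norm):

```latex
\[
\|\hat{\mu}_t - \mu_t\|_\infty \cdot \|\hat{\mu}_y - \mu_y\|_\infty
  \approx n^{-\gamma} \cdot n^{-\gamma} = n^{-2\gamma},
\qquad
n^{-2\gamma} = o\big(n^{-1/2}\big)
  \iff 2\gamma > \tfrac{1}{2}
  \iff \gamma > \tfrac{1}{4}.
\]
```

Under this reading, $\gamma = 0.3$ gives a product of order $n^{-0.6}$, which satisfies the assumption, while $\gamma = 0.2$ gives $n^{-0.4}$, which does not.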