Bootstrapping Sensitivity Analysis


  1. Bootstrapping Sensitivity Analysis. Qingyuan Zhao, Statistical Laboratory, University of Cambridge. August 3, 2020 @ JSM

  2. Sensitivity analysis
The broader concept [Saltelli et al., 2004]
◮ Sensitivity analysis is “the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs”.
◮ Model inputs may be any factor that “can be changed in a model prior to its execution”, including “structural and epistemic sources of uncertainty”.
In observational studies
◮ The most typical question is: how do the qualitative and/or quantitative conclusions of the observational study change if the no-unmeasured-confounding assumption is violated?

  3. Sensitivity analysis for observational studies
State of the art
◮ Gazillions of methods specifically designed for different problems.
◮ Various forms of statistical guarantees.
◮ Often not straightforward to interpret.
Goals of this talk
1. What is the common structure behind various methods for sensitivity analysis?
2. Can we bootstrap sensitivity analysis?

  4. What is a sensitivity model?
General setup
◮ From the observed data O we want to infer the distribution of the full data F (Observed data O ⟹ Distribution of the full data F).
◮ Prototypical example: observe iid copies of O = (X, A, Y) from the underlying full data F = (X, A, Y(0), Y(1)), where A is a binary treatment, X is covariates, and Y is the outcome.
An abstraction
A sensitivity model is a family of distributions F_{θ,η} of F that satisfies:
1. Augmentation: setting η = 0 corresponds to a primary analysis assuming no unmeasured confounders.
2. Model identifiability: given η, the implied marginal distribution O_{θ,η} of the observed data O is identifiable.
Statistical problem
Given η (or the range of η), use the observed data to make inference about some causal parameter β = β(θ, η).

  5. Understanding sensitivity models
Observational equivalence
◮ F_{θ,η} and F_{θ′,η′} are said to be observationally equivalent if O_{θ,η} = O_{θ′,η′}. We write this as F_{θ,η} ≃ F_{θ′,η′}.
◮ Equivalence class: [F_{θ,η}] = { F_{θ′,η′} | F_{θ′,η′} ≃ F_{θ,η} }.
Types of sensitivity models
Testable models: when F_{θ,η} is not rich enough, [F_{θ,η}] is a singleton and η can be identified from the observed data (should be avoided in practice).
Global models: for any (θ, η) and η′, there exists F_{θ′,η′} ≃ F_{θ,η}.
Separable models: for any (θ, η), F_{θ,η} ≃ F_{θ,0}.

  6. A visualization
[Figure: the (θ, η) parameter space with equivalence classes [F_{θ,η}]. Left: global sensitivity models; Right: separable sensitivity models.]

  7. Statistical inference
Modes of inference
1. Point identified: sensitivity analysis is performed at a fixed η.
2. Partially identified: sensitivity analysis is performed simultaneously over η ∈ H for a given range H.
Statistical guarantees of interval estimators
1. A confidence interval [C_L(O_{1:n}; η), C_U(O_{1:n}; η)] satisfies
   inf_{θ₀,η₀} P_{θ₀,η₀}( β(θ₀,η₀) ∈ [C_L(η₀), C_U(η₀)] ) ≥ 1 − α.
2. A sensitivity interval [C_L(O_{1:n}; H), C_U(O_{1:n}; H)] satisfies
   inf_{θ₀,η₀} P_{θ₀,η₀}( β(θ₀,η₀) ∈ [C_L(H), C_U(H)] ) ≥ 1 − α.   (1)
They look almost the same, but because the latter interval only depends on H, (1) is actually equivalent to
   inf_{θ₀,η₀} inf_{F_{θ,η} ≃ F_{θ₀,η₀}} P_{θ₀,η₀}( β(θ,η) ∈ [C_L(H), C_U(H)] ) ≥ 1 − α.

  8. Approaches to sensitivity analysis
◮ Point identified sensitivity analysis is basically the same as a primary analysis with a known “offset” η.
◮ Partially identified sensitivity analysis is much harder. Let F_{θ₀,η₀} be the truth. The fundamental problem is to make inference about
   inf_{η ∈ H} { β(θ,η) | F_{θ,η} ≃ F_{θ₀,η₀} }   and   sup_{η ∈ H} { β(θ,η) | F_{θ,η} ≃ F_{θ₀,η₀} }.
Method 1: Solve the population optimization problems analytically.
◮ Not always feasible.
Method 2: Solve the sample approximation problem and use asymptotic normality.
◮ Central limit theorems are not always true or established.
Method 3: Take the union of confidence intervals (a minimal grid-based sketch follows this slide):
   [C_L(H), C_U(H)] = ∪_{η ∈ H} [C_L(η), C_U(η)].
◮ By the union bound, this is a (1 − α)-sensitivity interval if all [C_L(η), C_U(η)] are (1 − α)-confidence intervals.
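To make Method 3 concrete, here is a minimal sketch over a finite grid approximating H. The names `union_sensitivity_interval` and `ci_at_eta` are illustrative, not from the talk; `ci_at_eta(eta)` is assumed to return a (1 − α)-confidence interval for β at that fixed η, and the union is reported as its interval hull.

```python
import numpy as np

def union_sensitivity_interval(ci_at_eta, eta_grid):
    """Method 3: union of fixed-eta confidence intervals over a grid approximating H.

    ci_at_eta(eta) is assumed to return a (lower, upper) confidence interval for
    beta at the fixed sensitivity parameter eta.  The union of the intervals is
    reported as its interval hull [min of lowers, max of uppers].
    """
    intervals = np.array([ci_at_eta(eta) for eta in eta_grid])
    return intervals[:, 0].min(), intervals[:, 1].max()
```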

  9. Computational challenges for Method 3
   [C_L(H), C_U(H)] = ∪_{η ∈ H} [C_L(η), C_U(η)].
◮ Using asymptotic theory, it is often not difficult to construct asymptotic confidence intervals of the form
   [C_L(η), C_U(η)] = β̂(η) ∓ z_{α/2} · σ̂(η) / √n.
◮ Unlike Method 2, which only needs to optimize β̂(η), Method 3 further needs to optimize the usually much more complicated σ̂(η) over η ∈ H.

  10. Method 4: Percentile bootstrap
1. For fixed η, use the percentile bootstrap confidence interval (b indexes the data resamples):
   [C_L(η), C_U(η)] = [ Q_{α/2}(β̂_b(η)), Q_{1−α/2}(β̂_b(η)) ].
2. Use the generalized minimax inequality to interchange quantile and infimum/supremum:
   Q_{α/2}( inf_η β̂_b(η) ) ≤ inf_η Q_{α/2}(β̂_b(η)) ≤ sup_η Q_{1−α/2}(β̂_b(η)) ≤ Q_{1−α/2}( sup_η β̂_b(η) ),
   where the outer terms give the percentile bootstrap sensitivity interval and the inner terms give the union sensitivity interval.
Advantages
◮ Computation is reduced to repeating Method 2 over data resamples (a minimal sketch follows this slide).
◮ Only need a coverage guarantee for [C_L(η), C_U(η)] at fixed η.
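A minimal sketch of the percentile bootstrap sensitivity interval, assuming `data` is an array of per-unit rows and `extrema_estimator(rows)` (an illustrative name, not from the talk) returns the pair (inf over η of β̂(η), sup over η of β̂(η)) computed on those rows, e.g. by solving the linear fractional program described on the computation slide.

```python
import numpy as np

def percentile_bootstrap_si(data, extrema_estimator, alpha=0.05, n_boot=1000, seed=0):
    """Percentile bootstrap sensitivity interval (Method 4).

    extrema_estimator(rows) is assumed to return
    (inf over eta of beta_hat, sup over eta of beta_hat) computed on the given rows.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    lowers, uppers = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                # resample units with replacement
        lowers[b], uppers[b] = extrema_estimator(data[idx])
    # Q_{alpha/2} of the resampled infima and Q_{1-alpha/2} of the resampled suprema
    return np.quantile(lowers, alpha / 2), np.quantile(uppers, 1 - alpha / 2)
```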

  11. Bootstrapping sensitivity analysis
Point-identified parameter: Efron’s bootstrap.
   Point estimator ==(bootstrap)==> Confidence interval
Partially identified parameter: three ideas (optimization, percentile bootstrap, minimax inequality).
   Extrema estimator ==(percentile bootstrap + minimax inequality)==> Sensitivity interval
Rest of the talk
Apply this idea to IPW estimators for a marginal sensitivity model.

  12. Our sensitivity model
◮ Consider the prototypical example: A is a binary treatment, X is covariates, Y is the outcome.
◮ U “summarizes” unmeasured confounding, so A ⊥⊥ Y(0), Y(1) | X, U.
◮ Let e₀(x) = P₀(A = 1 | X = x) and e(x, u) = P(A = 1 | X = x, U = u).
Marginal sensitivity models
   E_M(Γ) = { e(x, u) : 1/Γ ≤ OR(e(x, u), e₀(x)) ≤ Γ, ∀ x, u }.
◮ Compare this to the Rosenbaum [2002] model:
   E_R(Γ) = { e(x, u) : 1/Γ ≤ OR(e(x, u₁), e(x, u₂)) ≤ Γ, ∀ x, u₁, u₂ }.
◮ Tan [2006] first considered the marginal model, but he did not consider statistical inference in finite samples.
◮ Relationship between the two models: E_M(√Γ) ⊆ E_R(Γ) ⊆ E_M(Γ).¹
¹ The second part needs “compatibility”: e(x, u) should marginalize to e₀(x).

  13. Parametric extension
◮ In practice, the propensity score e₀(X) = P₀(A = 1 | X) is often estimated by a parametric model.
Parametric marginal sensitivity models
   E_M(Γ, β₀) = { e(x, u) : 1/Γ ≤ OR(e(x, u), e_{β₀}(x)) ≤ Γ, ∀ x, u }.
◮ e_{β₀}(x) is the best parametric approximation to e₀(x). This sensitivity model covers both
1. model misspecification, that is, e_{β₀}(x) ≠ e₀(x); and
2. missing not at random, that is, e₀(x) ≠ e(x, u).

  14. Logistic representations
1. Rosenbaum’s sensitivity model: logit(e(x, u)) = g(x) + u · log Γ, where 0 ≤ u ≤ 1.
2. Marginal sensitivity model: logit(e_η(x, u)) = logit(e₀(x)) + η(x, u), where
   η ∈ H_Γ = { η(x, u) : ‖η‖_∞ = sup |η(x, u)| ≤ log Γ }.
3. Parametric marginal sensitivity model: logit(e_η(x, u)) = logit(e_{β₀}(x)) + η(x, u), where η ∈ H_Γ.
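To connect the logistic representation to the weights used on the next slide, here is a small sketch (the function name is mine, not from the talk): since logit(e_η) = logit(e₀) + η, we have 1/e_η = 1 + e^{−η}(1/e₀ − 1), which is monotone in η, so under |η| ≤ log Γ the inverse propensity lies between 1 + (g − 1)/Γ and 1 + (g − 1)Γ with g = 1/e₀.

```python
import numpy as np

def inverse_propensity_range(e0, gamma):
    """Range of 1/e_eta(x, u) under the marginal sensitivity model |eta| <= log(gamma).

    Since logit(e_eta) = logit(e0) + eta, we have 1/e_eta = 1 + exp(-eta) * (1/e0 - 1),
    which is monotone in eta; the extremes are attained at eta = +/- log(gamma).
    """
    g = 1.0 / np.asarray(e0, dtype=float)
    return 1.0 + (g - 1.0) / gamma, 1.0 + (g - 1.0) * gamma
```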

  15. Computation
Bootstrapping partially identified sensitivity analysis:
   Extrema estimator ==(percentile bootstrap + minimax inequality)==> Sensitivity interval
◮ Stabilized inverse-probability weighted (IPW) estimator for β = E[Y(1)]:
   β̂(η) = [ (1/n) Σ_{i=1}^n A_i / ê_η(X_i, U_i) ]⁻¹ · (1/n) Σ_{i=1}^n A_i Y_i / ê_η(X_i, U_i),
   where ê_η can be obtained by plugging in an estimator of β₀.
◮ Computing the extrema of β̂(η) is a linear fractional program: letting h_i = exp{−η(X_i, U_i)} and g_i = 1 / e_{β̂₀}(X_i),
   maximize or minimize   Σ_{i=1}^n A_i Y_i [1 + h_i (g_i − 1)] / Σ_{i=1}^n A_i [1 + h_i (g_i − 1)]
   subject to             h_i ∈ [Γ⁻¹, Γ], i = 1, …, n.
   This can be converted to a linear program and can in fact be solved in O(n) time (optimal rate); a simple enumeration sketch is given below.
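Here is a minimal sketch of the extrema computation for the treated units (the function name is mine; this is a simple O(n²) threshold enumeration, not the O(n) algorithm mentioned above). It relies on the fact that, when maximizing or minimizing a weighted average with box-constrained weights, the optimum assigns each unit one of its two extreme weights according to a threshold on the sorted outcomes, so it suffices to try all n + 1 thresholds.

```python
import numpy as np

def ipw_extrema(y, g, gamma):
    """Extrema of sum(w * y) / sum(w) over w_i = 1 + h_i * (g_i - 1), h_i in [1/gamma, gamma].

    y: outcomes of the treated units; g: 1 / e_hat(X_i) evaluated at the treated units.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    lo = 1.0 + (g - 1.0) / gamma      # smallest admissible weight for unit i
    hi = 1.0 + (g - 1.0) * gamma      # largest admissible weight for unit i
    order = np.argsort(y)             # sort outcomes in increasing order
    y, lo, hi = y[order], lo[order], hi[order]
    n = len(y)
    best_min, best_max = np.inf, -np.inf
    for k in range(n + 1):
        # Maximizer candidate: downweight the k smallest outcomes, upweight the rest.
        w = np.concatenate([lo[:k], hi[k:]])
        best_max = max(best_max, float(w @ y) / float(w.sum()))
        # Minimizer candidate: the reverse assignment.
        w = np.concatenate([hi[:k], lo[k:]])
        best_min = min(best_min, float(w @ y) / float(w.sum()))
    return best_min, best_max
```

Passed as the `extrema_estimator` in the percentile bootstrap sketch above, these per-resample extrema yield the sensitivity interval for β.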

  16. Example: fish consumption and blood mercury
◮ 873 controls: ≤ 1 serving of fish per month.
◮ 234 treated: ≥ 12 servings of fish per month.
◮ Covariates: gender, age, income (very imbalanced), race, education, ever smoked, # cigarettes.
Implementation details
◮ Rosenbaum’s method: 1-1 matching; CI constructed by Hodges-Lehmann (assuming the causal effect is constant).
◮ Our method (percentile bootstrap): stabilized IPW for the ATT, with and without augmentation by a linear outcome regression.
