


  1. Controlling for confounders through approximate sufficiency Rina Foygel Barber (joint with Lucas Janson) http://www.stat.uchicago.edu/~rina/

  2. Collaborator: Lucas Janson (Harvard U.)

  3–4. Intro: testing conditional independence
  Setting: features X, response Y, confounders Z; the question is whether X is associated with Y after accounting for Z.
  Classical (parametric) approach:
  • Assume a parametric model such as Y | X, Z ∼ f(· ; α^T X + β^T Z)
  • Use parametric inference to test H0: α = 0
  Model-X approach, a.k.a. the Conditional Randomization Test (Candès et al. 2018):
  • The distribution of X | Z is known (the distribution of Y is unknown)
  • Choose a function T(X; Y, Z) that measures association
  • Resample copies X̃(1), ..., X̃(M) i.i.d. from the distribution of X | Z, and compute
    pval = (1 + Σ_m 1{T(X̃(m); Y, Z) ≥ T(X; Y, Z)}) / (1 + M)
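The conditional randomization test above can be sketched in a few lines. This is a minimal illustration, not the speakers' code: it assumes (hypothetically) that X | Z is Gaussian with a known coefficient vector `gamma`, and the residual-correlation statistic `T` is just one convenient choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model-X setup: the law of X | Z is fully known,
# here X | Z ~ N(Z @ gamma, 1) with gamma known.
n, k, M = 200, 3, 500
gamma = np.array([1.0, -0.5, 0.2])
Z = rng.normal(size=(n, k))
X = Z @ gamma + rng.normal(size=n)                       # X depends on Z only
Y = Z @ np.array([0.3, 0.3, 0.3]) + rng.normal(size=n)   # H0 holds: X independent of Y given Z

def T(x, y, z):
    """Association measure: |inner product| of the residuals of x and y on z."""
    rx = x - z @ np.linalg.lstsq(z, x, rcond=None)[0]
    ry = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]
    return abs(rx @ ry)

t_obs = T(X, Y, Z)
# Resample copies from the known law of X | Z and compare statistics
t_copies = [T(Z @ gamma + rng.normal(size=n), Y, Z) for _ in range(M)]
pval = (1 + sum(t >= t_obs for t in t_copies)) / (1 + M)
print(pval)  # under H0, approximately uniform over {1/(M+1), ..., 1}
```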

  5–7. Intro: testing conditional independence
  Model-X approach via sufficient statistics (Huang & Janson 2019):
  • The distribution of X | Z is only partially known
  • By conditioning on a sufficient statistic S(X, Z), we can resample copies X̃(1), ..., X̃(M) i.i.d. from the distribution of X | S(X, Z), and compute the p-value for a test statistic T as before
  • Example: canonical GLMs
    - X_i has density proportional to exp(X_i · Z_i^T θ − a(Z_i^T θ)), i = 1, ..., n, with θ unknown
    - S(X, Z) = Σ_i X_i Z_i is a sufficient statistic for X = (X_1, ..., X_n)
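For the canonical-GLM example, the sufficient statistic makes resampling concrete: in a logistic model, every binary vector x with the same value of S(x, Z) = Σ_i x_i Z_i has equal conditional probability (the θ-dependent factor exp(S^T θ) cancels), so sampling X | S is uniform over the "fiber" of matching vectors. A brute-force toy sketch (our own construction, with integer-valued Z so the fiber is nontrivial; with generic real-valued Z the fiber would collapse to {X} itself):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Toy logistic model: X_i ~ Bernoulli(sigmoid(Z_i^T theta)), theta unknown.
# Conditional on S(X, Z) = sum_i X_i Z_i, the law of X is uniform over the
# fiber {x in {0,1}^n : sum_i x_i Z_i = S}, so we can resample by enumeration.
n, k = 10, 2
Z = rng.integers(0, 3, size=(n, k)).astype(float)  # integer Z keeps the fiber nontrivial
X = rng.integers(0, 2, size=n)
S = X @ Z                                          # sufficient statistic

fiber = [np.array(x) for x in itertools.product([0, 1], repeat=n)
         if np.allclose(np.array(x) @ Z, S)]
X_tilde = fiber[rng.integers(len(fiber))]          # one co-sufficient copy
print(len(fiber))        # number of datasets sharing the sufficient statistic
assert np.allclose(X_tilde @ Z, S)
```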

  8–9. Intro: testing goodness-of-fit (GoF)
  More generally...
  Goodness-of-fit test: test H0: X ∼ P_θ for some θ ∈ Θ, where {P_θ : θ ∈ Θ} is a parametric family.
  Conditional independence testing can be a special case:
  • Assume X | Z ∼ P_θ(· | Z) for some θ ∈ Θ
  • Null hypothesis H0: X ⊥⊥ Y | Z
  • Equivalently, H0: X | Y, Z ∼ P_θ(· | Z) for some θ ∈ Θ
  • Note: we condition on Y and Z (i.e., treat them as fixed)

  10. Intro: testing goodness-of-fit (GoF)
  A general framework:
  • Choose any test statistic T : X → R
  • Draw copies X̃(1), ..., X̃(M)
  • Compute the rank-based p-value
    pval = (1 + Σ_m 1{T(X̃(m)) ≥ T(X)}) / (1 + M)
  • If X, X̃(1), ..., X̃(M) are exchangeable under H0, then the p-value is valid
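The rank-based p-value in the framework above can be written as a small helper, with a quick numerical check of its validity under exchangeability (function name and the sanity check are ours):

```python
import numpy as np

def rank_pvalue(t_obs, t_copies):
    """pval = (1 + #{m : T(copy_m) >= T(X)}) / (1 + M).
    Valid, i.e. P{pval <= alpha} <= alpha, whenever X and the copies
    are exchangeable under H0."""
    t_copies = np.asarray(t_copies)
    return (1 + np.sum(t_copies >= t_obs)) / (1 + len(t_copies))

# Sanity check: with fully exchangeable (i.i.d.) draws, the p-value is uniform
# on {1/(M+1), ..., 1}, so the rejection rate at level 0.05 is about 0.05.
rng = np.random.default_rng(0)
pvals = [rank_pvalue(rng.normal(), rng.normal(size=99)) for _ in range(2000)]
print(np.mean(np.array(pvals) <= 0.05))  # close to 0.05
```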

  11. Co-sufficient sampling (CSS)
  Co-sufficient sampling: sample copies X̃(m) ∼ (distribution of X | S(X)), where S(X) is a sufficient statistic for the family {P_θ : θ ∈ Θ}.
  Can be applied to:
  1. Testing goodness-of-fit (GoF) (Engen & Lillegård 1997, Lockhart et al. 2007, Stephens 2012, Hazra 2013, ...)
  2. Testing conditional independence, a special case of GoF (Rosenbaum 1984, Kolassa 2003, Huang & Janson 2019)
  3. Constructing confidence intervals for a parameter of interest (by inverting GoF tests)

  12–13. Co-sufficient sampling (CSS)
  Co-sufficient sampling: sample copies X̃(m) ∼ (distribution of X | S(X)), where S(X) is a sufficient statistic for the family {P_θ : θ ∈ Θ}.
  Permutation tests are an example of CSS:
  • H0: X_1, ..., X_n are i.i.d. from D, for some D in a given set of distributions
  • The order statistics X_(1) ≤ ... ≤ X_(n) are sufficient under the null
  • Permutation test ⇔ resampling X conditional on its order statistics
  • Application: testing X ⊥⊥ Y, where H0 states that, conditional on Y_1, ..., Y_n, the X_1, ..., X_n are i.i.d.
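The permutation-test-as-CSS view is easy to make concrete: resampling X given its order statistics is exactly drawing a uniformly random permutation of X. A minimal sketch (the correlation statistic is our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Permutation test as CSS: under H0 the order statistics of X are sufficient,
# and resampling X given its order statistics = randomly permuting X.
n, M = 100, 999
Y = rng.normal(size=n)
X = rng.normal(size=n)          # generated independently of Y, so H0 holds

def T(x, y):
    return abs(np.corrcoef(x, y)[0, 1])

t_obs = T(X, Y)
t_copies = [T(rng.permutation(X), Y) for _ in range(M)]
pval = (1 + sum(t >= t_obs for t in t_copies)) / (1 + M)
print(pval)
```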

  14. Co-sufficient sampling (CSS)
  Limitation of co-sufficient sampling: no power in many settings!
  Example, logistic model:
  • X = (X_1, ..., X_n) ∈ {0, 1}^n, Z = (Z_1, ..., Z_n) ∈ (R^k)^n
  • If the Z_i's are in general position, then Σ_i X_i Z_i ∈ R^k uniquely determines X
    (so if we resample, we will have X̃(1) = ... = X̃(M) = X, and hence zero power)
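The zero-power phenomenon can be verified numerically: with real-valued Z in general position, every binary vector yields a different value of Σ_i X_i Z_i, so the CSS fiber is the singleton {X}. A small check of this claim (our own illustration, scalar Z for simplicity):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# With real-valued Z_i in general position, distinct subsets of {Z_i} have
# distinct sums almost surely, so the sufficient statistic sum_i X_i Z_i
# pins down the binary vector X exactly and resampling returns X itself.
n = 8
Z = rng.normal(size=n)
sums = {float(np.dot(x, Z)) for x in itertools.product([0, 1], repeat=n)}
print(len(sums))  # 2**8 = 256: every binary vector gives a different statistic
```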

  15–16. Co-sufficient sampling (CSS)
  Limitation of co-sufficient sampling: no power in many settings!
  For many other models, the minimal sufficient statistic S(X) is essentially the data itself, e.g.:
  • Mixtures of Gaussians or mixtures of GLMs
  • Non-canonical GLMs
  • Heavy-tailed distributions (e.g., multivariate t)
  • Models with missing or corrupted data

  17. Approximate sufficiency
  For a family {P_θ : θ ∈ Θ}, a function S(X) is a sufficient statistic if, for all θ, θ′:
    (distribution of X | S(X), when X ∼ P_θ) = (distribution of X | S(X), when X ∼ P_θ′)
  Asymptotic sufficiency (Le Cam, Wald, ...), informally: for all θ, θ′,
    (distribution of X | S(X), when X ∼ P_θ) ≈ (distribution of X | S(X), when X ∼ P_θ′)
  • Under regularity conditions, S(X) = θ̂_MLE(X) is asymptotically sufficient

  18. Approximate co-sufficient sampling (aCSS)
  Main idea:
  • Let θ̂ ∈ Θ be an approximate MLE given the data X
  • Let p_θ(· | θ̂) = the distribution of X | θ̂, if marginally X ∼ P_θ
    ⟹ under the null, X | θ̂ ∼ p_θ0(· | θ̂) for the unknown true θ0
  • Sample copies X̃(1), ..., X̃(M) from p_θ̂(· | θ̂), which ≈ p_θ0(· | θ̂) by approximate sufficiency
  ⟹ X, X̃(1), ..., X̃(M) are ≈ exchangeable under H0 ⟹ the p-value is ≈ valid

  19. Approximate co-sufficient sampling (aCSS)
  Distance to exchangeability:
    d_exch(X, X̃(1), ..., X̃(M)) = inf over exchangeable distributions D on X^(M+1) of d_TV((X, X̃(1), ..., X̃(M)), D)
  For any test statistic T(X), the p-value
    pval = (1 + Σ_m 1{T(X̃(m)) ≥ T(X)}) / (1 + M)
  satisfies P{pval ≤ α} ≤ α + d_exch(X, X̃(1), ..., X̃(M)).

  20–21. aCSS algorithm
  • Step 1: choose a test statistic T : X → R
  • Step 2: observe data X, and compute an approximate MLE θ̂
  • Step 3: sample copies X̃(1), ..., X̃(M) from ≈ the distribution of X | θ̂
  • Step 4: compute a rank-based p-value to test H0:
    pval = (1 + Σ_m 1{T(X̃(m)) ≥ T(X)}) / (1 + M)

  22–23. aCSS algorithm
  • Step 2: observe data X, and compute an approximate MLE θ̂
  Ideally we would like to minimize
    L(θ; X, W) = L(θ; X) + σ · W^T θ,
  where L(θ; X) = −log f(X; θ) + R(θ) is the penalized negative log-likelihood, and σ · W^T θ is a random perturbation with W ∼ N(0, (1/d) I_d); choose σ ≪ n^(1/2).
  (See also Tian & Taylor 2018: random perturbation for selective inference.)
  But... what if L is nonconvex? What if there is no global minimum?
  - The function θ̂ : X × R^d → Θ returns θ̂(X, W).
  - If θ̂(X, W) is a strict second-order stationary point (SOSP) of L(θ; X, W), proceed to the next step.
  - Otherwise, return X̃(1) = ... = X̃(M) = X, so that pval = 1.
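Step 2 can be sketched for a toy model. This is our own illustration, assuming X_i ∼ N(θ, 1) with no penalty R, where the perturbed objective is a simple quadratic and the minimizer has the closed form x̄ − σW/n, so the numerical solver can be checked against it:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# aCSS Step 2 sketch for X_i ~ N(theta, 1), d = 1, R = 0:
# minimize L(theta; X) + sigma * W^T theta, then check the SOSP condition.
n, d, sigma = 100, 1, 2.0
X = rng.normal(loc=1.5, size=n)
W = rng.normal(scale=np.sqrt(1.0 / d), size=d)   # W ~ N(0, (1/d) I_d)

def L(theta, x):
    return 0.5 * np.sum((x - theta[0]) ** 2)     # negative log-likelihood

obj = lambda theta: L(theta, X) + sigma * (W @ theta)
theta_hat = minimize(obj, x0=np.zeros(d)).x

# SOSP check: (near-)zero gradient and positive-definite Hessian
# (here the Hessian is the constant n > 0, so any stationary point qualifies).
grad = n * theta_hat[0] - X.sum() + sigma * W[0]
assert abs(grad) < 1e-3
print(theta_hat[0])  # equals x_bar - sigma * W / n up to solver tolerance
```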

  24–26. aCSS algorithm
  • Step 3: sample copies X̃(1), ..., X̃(M) from ≈ the distribution of X | θ̂
  Density of X | θ̂, conditional on the event that θ̂(X, W) is a strict SOSP:
    ∝ f(x; θ0) · exp(−‖∇_θ L(θ̂; x)‖² / (2σ²/d)) · det(∇²_θ L(θ̂; x)) · 1{x ∈ X_θ̂},
  where X_θ̂ denotes the support of X | θ̂.
  Since θ0 is unknown, use θ̂ as a plug-in estimate:
    ∝ f(x; θ̂) · exp(−‖∇_θ L(θ̂; x)‖² / (2σ²/d)) · det(∇²_θ L(θ̂; x)) · 1{x ∈ X_θ̂}
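One generic way to draw copies from the plug-in density is MCMC. The following is a sketch of our own, not the speakers' sampler, for the toy Gaussian model X_i ∼ N(θ, 1) with d = 1: there ∇_θ L(θ̂; x) = n·θ̂ − Σx_i, the Hessian determinant is the constant n (and drops out), and a random-walk Metropolis chain targets the plug-in density directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# aCSS Step 3 sketch for X_i ~ N(theta, 1), d = 1: the plug-in density of a
# copy x given theta_hat is proportional to
#   f(x; theta_hat) * exp(-(n*theta_hat - sum(x))^2 / (2 sigma^2)),
# the constant Hessian factor det = n having dropped out.
n, sigma, theta_hat = 50, 2.0, 0.8

def log_target(x):
    grad = n * theta_hat - x.sum()               # gradient of L at theta_hat
    return -0.5 * np.sum((x - theta_hat) ** 2) - grad ** 2 / (2 * sigma ** 2)

# Random-walk Metropolis chain over x in R^n
x = np.full(n, theta_hat)
accept = 0
for _ in range(5000):
    prop = x + 0.2 * rng.normal(size=n)
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x, accept = prop, accept + 1
X_tilde = x                                      # one approximate copy
print(accept / 5000)                             # acceptance rate of the chain
```

In practice one chain per copy (or exchangeable draws from a single chain) would be needed for the M copies; this sketch only produces one.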
