Controlling for confounders through approximate sufficiency

Rina Foygel Barber (joint with Lucas Janson)
http://www.stat.uchicago.edu/~rina/

Collaborator Lucas Janson (Harvard U.) 2/27

Intro: testing conditional independence

[Diagram: confounders $Z$, features $X$, response $Y$, with a "?" marking the relationship under test]

Classical (parametric) approach:
• Assume a parametric model such as $Y \mid X, Z \sim f(\cdot\,; \alpha^\top X + \beta^\top Z)$
• Parametric inference to test $H_0 : \alpha = 0$

Model-X approach, a.k.a. Conditional Randomization Test (Candès et al 2018):
• Known distribution of $X \mid Z$ (distribution of $Y$ unknown)
• Choose function $T(X; Y, Z)$ that measures association
• Resample copies $\tilde X^{(1)}, \dots, \tilde X^{(M)} \overset{\text{iid}}{\sim}$ (distribution of $X \mid Z$)

\[ \text{pval} = \frac{1 + \sum_{m} \mathbb{1}\{ T(\tilde X^{(m)}; Y, Z) \ge T(X; Y, Z) \}}{1 + M} \]

3/27
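The conditional randomization test p-value can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy model $X \mid Z \sim N(Z\gamma, 1)$ with known `gamma`, the helper names `crt_pvalue`, `sample_x_given_z`, and `abs_residual_corr`, and the choice of test statistic are all assumptions made for the example.

```python
import numpy as np

def crt_pvalue(x, y, z, sample_x_given_z, stat, M=200, rng=None):
    """Rank-based p-value: (1 + #{m : T(x_tilde_m) >= T(x)}) / (1 + M)."""
    rng = np.random.default_rng(rng)
    t_obs = stat(x, y, z)
    exceed = 0
    for _ in range(M):
        x_tilde = sample_x_given_z(z, rng)  # draw from the known law of X | Z
        if stat(x_tilde, y, z) >= t_obs:
            exceed += 1
    return (1 + exceed) / (1 + M)

# Hypothetical toy setup: X | Z ~ N(Z @ gamma, 1) with gamma known.
gamma = np.array([1.0, -0.5])

def sample_x_given_z(z, rng):
    return z @ gamma + rng.standard_normal(z.shape[0])

def abs_residual_corr(x, y, z):
    # |correlation| between y and the part of x not explained by the known mean
    return abs(np.corrcoef(x - z @ gamma, y)[0, 1])

rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal((n, 2))
x = sample_x_given_z(z, rng)
y_null = z[:, 0] + rng.standard_normal(n)            # Y depends on Z only
y_alt = 2.0 * x + z[:, 0] + rng.standard_normal(n)   # Y also depends on X

p_null = crt_pvalue(x, y_null, z, sample_x_given_z, abs_residual_corr, M=199, rng=1)
p_alt = crt_pvalue(x, y_alt, z, sample_x_given_z, abs_residual_corr, M=199, rng=2)
```

Note that nothing about the distribution of $Y$ is used: only the sampler for $X \mid Z$ and the statistic $T$ enter the computation.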

Intro: testing conditional independence

[Diagram: confounders $Z$, features $X$, response $Y$, with a "?" marking the relationship under test]

Model-X approach via sufficient statistics (Huang & Janson 2019):
• Distribution of $X \mid Z$ is only partially known
• By conditioning on a sufficient statistic $S(X, Z)$, can resample copies $\tilde X^{(1)}, \dots, \tilde X^{(M)} \overset{\text{iid}}{\sim}$ (distribution of $X \mid S(X, Z)$) and compute the p-value for test statistic $T$ as before
• Example: canonical GLMs
  - $X_i \sim \exp\{ X_i \cdot Z_i^\top \theta - a(Z_i^\top \theta) \}$, $i = 1, \dots, n$, with $\theta$ unknown
  - $S(X, Z) = \sum_i X_i Z_i$ is a sufficient statistic for $X = (X_1, \dots, X_n)$

4/27

Intro: testing goodness-of-fit (GoF)

More generally...

Goodness-of-fit test: testing $H_0 : X \sim P_\theta$ for some $\theta \in \Theta$, where $\{P_\theta : \theta \in \Theta\}$ is a parametric family

Conditional independence testing can be a special case:
• Assume $X \mid Z \sim P_\theta(\cdot \mid Z)$ for some $\theta \in \Theta$
• Null hypothesis $H_0 : X \perp\!\!\!\perp Y \mid Z$
• Equivalently, $H_0 : X \mid Y, Z \sim P_\theta(\cdot \mid Z)$ for some $\theta \in \Theta$
• Note: we condition on $Y$ and $Z$ (i.e., treat them as fixed)

5/27

Intro: testing goodness-of-fit (GoF)

A general framework:
• Choose any test statistic $T : \mathcal{X} \to \mathbb{R}$
• Draw copies $\tilde X^{(1)}, \dots, \tilde X^{(M)}$
• Compute rank-based p-value

\[ \text{pval} = \frac{1 + \sum_{m} \mathbb{1}\{ T(\tilde X^{(m)}) \ge T(X) \}}{1 + M} \]

• If $X, \tilde X^{(1)}, \dots, \tilde X^{(M)}$ are exchangeable under $H_0$ $\Rightarrow$ p-value is valid

6/27

Co-sufficient sampling (CSS)

Co-sufficient sampling: sample copies $\tilde X^{(m)} \sim$ (distribution of $X \mid S(X)$), where $S(X)$ is a sufficient statistic for the family $\{P_\theta : \theta \in \Theta\}$

Can be applied to:
1. Test goodness-of-fit (GoF) (Engen & Lillegård 1997, Lockhart et al 2007, Stephens 2012, Hazra 2013, ...)
2. Test conditional independence (special case of GoF) (Rosenbaum 1984, Kolassa 2003, Huang & Janson 2019)
3. Construct confidence intervals for a parameter of interest (by inverting GoF tests)

7/27

Co-sufficient sampling (CSS)

Co-sufficient sampling: sample copies $\tilde X^{(m)} \sim$ (distribution of $X \mid S(X)$), where $S(X)$ is a sufficient statistic for the family $\{P_\theta : \theta \in \Theta\}$

Permutation tests are an example of CSS:
• $H_0 : X_1, \dots, X_n \overset{\text{iid}}{\sim} D$ for $D \in$ (some set)
• The order statistics $X_{(1)} \le \dots \le X_{(n)}$ are sufficient under the null
• Permutation test $\Leftrightarrow$ resampling $X$ conditional on order statistics
• Application: testing $X \perp\!\!\!\perp Y$, with $H_0$: conditional on $Y_1, \dots, Y_n$, it holds that $X_1, \dots, X_n$ are i.i.d.

8/27
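The permutation-test case can be sketched as follows. This is an illustrative example only: the function names `permutation_pvalue` and `abs_corr`, the absolute-correlation statistic, and the simulated data are assumptions, not part of the slides. Each permuted copy shares the order statistics of `x`, which is exactly the conditioning described above.

```python
import numpy as np

def permutation_pvalue(x, y, stat, M=999, seed=None):
    """Resample x conditional on its order statistics by permuting it."""
    rng = np.random.default_rng(seed)
    t_obs = stat(x, y)
    exceed = 0
    for _ in range(M):
        x_tilde = rng.permutation(x)  # same multiset of values, random order
        if stat(x_tilde, y) >= t_obs:
            exceed += 1
    return (1 + exceed) / (1 + M)

def abs_corr(x, y):
    return abs(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y_dep = x + 0.5 * rng.standard_normal(100)  # depends on x
y_ind = rng.standard_normal(100)            # independent of x

p_dep = permutation_pvalue(x, y_dep, abs_corr, M=499, seed=1)
p_ind = permutation_pvalue(x, y_ind, abs_corr, M=499, seed=2)

# A permuted copy has the same sufficient statistic (sorted values) as x.
same_order_stats = np.array_equal(np.sort(rng.permutation(x)), np.sort(x))
```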

Co-sufficient sampling (CSS)

Limitation of co-sufficient sampling... no power in many settings!

Example (logistic model):
• $X = (X_1, \dots, X_n) \in \{0, 1\}^n$, $Z = (Z_1, \dots, Z_n) \in (\mathbb{R}^k)^n$
• If the $Z_i$'s are in general position, then $\sum_i X_i Z_i \in \mathbb{R}^k$ uniquely determines $X$ (so if we resample, will have $\tilde X^{(1)} = \dots = \tilde X^{(M)} = X$ $\Rightarrow$ zero power)

9/27
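A brute-force check makes the degeneracy concrete for a small case. This sketch assumes Gaussian-distributed $Z_i$ (which are in general position with probability one) and small hypothetical sizes `n = 8`, `k = 2`, so that all $2^n$ binary vectors can be enumerated.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 2
Z = rng.standard_normal((n, k))   # continuous draws: general position almost surely
X = rng.integers(0, 2, size=n)    # observed binary vector
S_obs = X @ Z                     # sufficient statistic sum_i X_i Z_i

# Enumerate every x in {0,1}^n and keep those with the same sufficient statistic.
matches = [
    x for x in itertools.product([0, 1], repeat=n)
    if np.allclose(np.array(x) @ Z, S_obs, rtol=0.0, atol=1e-9)
]
```

Here `matches` contains only the observed vector, so any sampler that conditions on $S(X, Z)$ can do nothing but return $X$ itself.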

Co-sufficient sampling (CSS)

Limitation of co-sufficient sampling... no power in many settings!

For many other models, the minimal sufficient statistic $S(X)$ is essentially the data itself, e.g.,
• Mixture of Gaussians or mixture of GLMs
• Non-canonical GLMs
• Heavy-tailed distributions (e.g., multivariate t)
• Models with missing or corrupted data

10/27

Approximate sufficiency

For a family $\{P_\theta : \theta \in \Theta\}$, a function $S(X)$ is a sufficient statistic if

\[ \forall\, \theta, \theta'. \quad (\text{distrib. of } X \mid S(X),\ X \sim P_\theta) = (\text{distrib. of } X \mid S(X),\ X \sim P_{\theta'}) \]

Asymptotic sufficiency (Le Cam, Wald, ...): informally,

\[ \forall\, \theta, \theta'. \quad (\text{distrib. of } X \mid S(X),\ X \sim P_\theta) \approx (\text{distrib. of } X \mid S(X),\ X \sim P_{\theta'}) \]

• Under regularity conditions, $S(X) = \hat\theta_{\mathrm{MLE}}(X)$ is asymptotically sufficient

11/27

Approximate co-sufficient sampling (aCSS)

Main idea:
• Let $\hat\theta \in \Theta$ be an approximate MLE given the data $X$
• Let $p_\theta(\cdot \mid \hat\theta)$ = distribution of $X \mid \hat\theta$, if marginally $X \sim P_\theta$ $\Rightarrow$ under the null, $X \mid \hat\theta \sim p_{\theta_0}(\cdot \mid \hat\theta)$ for the unknown true $\theta_0$
• Sample copies $\tilde X^{(1)}, \dots, \tilde X^{(M)}$ from $p_{\hat\theta}(\cdot \mid \hat\theta) \approx p_{\theta_0}(\cdot \mid \hat\theta)$ (by approximate sufficiency)

$X, \tilde X^{(1)}, \dots, \tilde X^{(M)}$ $\approx$ exchangeable under $H_0$ $\Rightarrow$ p-value is $\approx$ valid

12/27

Approximate co-sufficient sampling (aCSS)

Distance to exchangeability:

\[ d_{\mathrm{exch}}(X, \tilde X^{(1)}, \dots, \tilde X^{(M)}) = \inf_{\text{exch. distrib. } D \text{ on } \mathcal{X}^{M+1}} d_{\mathrm{TV}}\big( (X, \tilde X^{(1)}, \dots, \tilde X^{(M)}),\ D \big) \]

For any test statistic $T(X)$, the p-value

\[ \text{pval} = \frac{1 + \sum_{m} \mathbb{1}\{ T(\tilde X^{(m)}) \ge T(X) \}}{1 + M} \]

satisfies

\[ \mathbb{P}\{ \text{pval} \le \alpha \} \le \alpha + d_{\mathrm{exch}}(X, \tilde X^{(1)}, \dots, \tilde X^{(M)}). \]

13/27

aCSS algorithm

• Step 1: choose a test statistic $T : \mathcal{X} \to \mathbb{R}$
• Step 2: observe data $X$, and compute an approximate MLE $\hat\theta$
• Step 3: sample copies $\tilde X^{(1)}, \dots, \tilde X^{(M)}$ from $\approx$ distribution of $X \mid \hat\theta$
• Step 4: compute a rank-based p-value to test $H_0$:

\[ \text{pval} = \frac{1 + \sum_{m} \mathbb{1}\{ T(\tilde X^{(m)}) \ge T(X) \}}{1 + M} \]

14/27

aCSS algorithm

• Step 2: observe data $X$, and compute an approximate MLE $\hat\theta$

Ideally would like to minimize

\[ \mathcal{L}(\theta; X, W) = \mathcal{L}(\theta; X) + \sigma \cdot W^\top \theta \]

where $\mathcal{L}(\theta; X) = -\log f(X; \theta) + R(\theta)$ is the penalized negative log-likelihood, and we perturb with $W \sim N(0, \tfrac{1}{d} I_d)$ (choose $\sigma \ll n^{1/2}$)
(see also Tian & Taylor 2018: random perturbation for selective inference)

But... what if nonconvex? What if no global minimum?
  - Function $\hat\theta : \mathcal{X} \times \mathbb{R}^d \to \Theta$ returns $\hat\theta(X, W)$.
  - If $\hat\theta(X, W)$ is a strict SOSP of $\mathcal{L}(\theta; X, W)$, proceed to next step.
  - Otherwise return $\tilde X^{(1)} = \dots = \tilde X^{(M)} = X$ $\Rightarrow$ pval = 1.

15/27
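Step 2 can be sketched for one concrete model. The example below assumes an unpenalized logistic model ($R = 0$) for binary $X$ given covariates $Z$, uses plain Newton iterations as the solver, and replaces the exact strict-SOSP condition (zero gradient, positive definite Hessian) with a numerical tolerance `tol`. The function names and the settings `n`, `d`, `sigma` are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_hess(theta, X, Z, W, sigma):
    """Gradient and Hessian of L(theta; X, W) = L(theta; X) + sigma * W @ theta."""
    p = sigmoid(Z @ theta)
    grad = Z.T @ (p - X) + sigma * W              # perturbation adds sigma * W
    hess = Z.T @ (Z * (p * (1.0 - p))[:, None])   # perturbation is linear: Hessian unchanged
    return grad, hess

def perturbed_mle(X, Z, W, sigma, n_iter=50, tol=1e-8):
    theta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad, hess = grad_hess(theta, X, Z, W, sigma)
        if np.linalg.norm(grad) <= tol:
            break
        theta = theta - np.linalg.solve(hess, grad)  # Newton step
    grad, hess = grad_hess(theta, X, Z, W, sigma)
    # Numerical stand-in for "strict SOSP": near-zero gradient, positive definite Hessian.
    is_strict_sosp = bool(np.linalg.norm(grad) <= tol
                          and np.linalg.eigvalsh(hess).min() > 0)
    return theta, is_strict_sosp

rng = np.random.default_rng(0)
n, d = 300, 2
Z = rng.standard_normal((n, d))
theta_true = np.array([1.0, -0.5])
X = (rng.random(n) < sigmoid(Z @ theta_true)).astype(float)
W = rng.standard_normal(d) / np.sqrt(d)   # W ~ N(0, I_d / d)
sigma = 1.0                               # small relative to sqrt(n), about 17.3

theta_hat, is_sosp = perturbed_mle(X, Z, W, sigma)
```

At a stationary point of the perturbed objective, the gradient of the unperturbed loss equals $-\sigma W$, which is what links $W$ to $X$ given $\hat\theta$ in the next step. If `is_sosp` were false, the fallback on the slide applies: return copies equal to $X$, giving pval = 1.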

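aCSS algorithm

• Step 3: sample copies $\tilde X^{(1)}, \dots, \tilde X^{(M)}$ from $\approx$ distribution of $X \mid \hat\theta$

Density of $X \mid \hat\theta$, conditional on the event that $\hat\theta(X, W)$ is a strict SOSP:

\[ \propto f(x; \theta_0) \cdot \exp\left\{ -\frac{\|\nabla_\theta \mathcal{L}(\hat\theta; x)\|^2}{2\sigma^2/d} \right\} \cdot \det\left( \nabla^2_\theta \mathcal{L}(\hat\theta; x) \right) \cdot \mathbb{1}\{ x \in \mathcal{X}_{\hat\theta} \} \]

where $\mathcal{X}_{\hat\theta}$ is the support of $X \mid \hat\theta$.

$\theta_0$ unknown $\Rightarrow$ use $\hat\theta$ as plug-in estimate:

\[ \propto f(x; \hat\theta) \cdot \exp\left\{ -\frac{\|\nabla_\theta \mathcal{L}(\hat\theta; x)\|^2}{2\sigma^2/d} \right\} \cdot \det\left( \nabla^2_\theta \mathcal{L}(\hat\theta; x) \right) \cdot \mathbb{1}\{ x \in \mathcal{X}_{\hat\theta} \} \]

16/27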
