Using sparsity to overcome unmeasured confounding: Two examples

Using sparsity to overcome unmeasured confounding: Two examples
Qingyuan Zhao
Statistical Laboratory, University of Cambridge
October 15, 2019 @ MRC-BSU Seminar
Slides and more information are available at http://www.statslab.cam.ac.uk/~qz280/.

About me
New University Lecturer in the Stats Lab (in West Cambridge).
PhD (2011-2016) in Statistics from Stanford, advised by Trevor Hastie.
Postdoc (2016-2019) at University of Pennsylvania, advised by Dylan Small and Sean Hennessy.
Current research area: Causal Inference.
Applications of interest: public health, genetics, social sciences, computer science.
Qingyuan Zhao (Stats Lab) Specificity MRC-BSU Seminar 1 / 30

Growing interest in causal inference
[Figure: Interest (Google Trends, scale 0 to 100) over time, Jan 2010 to Jan 2020, for the United States and the United Kingdom. Data from Google Trends.]

Old and new problems
Epidemiology and public health: effectiveness of prevention/treatment, causal effect of risk factors, etc.
Quantitative social sciences: evaluation of social programs, policy impact, etc.
Precision medicine.
Massive online experiments.
Fairness of machine learning algorithms.
Big Data ≠ better inference.

Causal inference in Cambridge
In Stats Lab
A new 16-lecture Part III course in the Michaelmas term (Tuesday & Thursday 12-1).
A new reading group (http://talks.cam.ac.uk/show/index/105688).
In BSU and the Clinical School
I would like to learn more!!
Cross schools?
Causal inference research requires inter-disciplinary collaboration.

Back to the main topic
Bradford Hill (1965) criteria
1. Strength (effect size);
2. Consistency (reproducibility);
3. Specificity;
4. Temporality;
5. Biological gradient (dose-response relationship);
6. Plausibility (mechanism);
7. Coherence (between epidemiology and lab findings);
8. Experiment;
9. Analogy.

Hill’s original specificity criterion
“One reason, needless to say, is the specificity of the association. ... If as here, the association is limited to specific workers and to particular sites and types of disease and there is no association between the work and other modes of dying, then clearly that is a strong argument in favor of causation.”
Now considered weak or irrelevant. Counter-example: smoking.
In Hill’s era, exposure = an occupational setting or a residential location (proxies for true exposures).
Nowadays, exposure is much more precise.

This talk: Specificity
More precisely: How specificity/sparsity assumptions can help us overcome unmeasured confounding.
Growing awareness
Development in high-dimensional statistics: multiple testing, lasso and sparsity, model selection, ...
Growing interest in using negative controls for causal inference.
Biological mechanisms are often specific (or more specific as we go more micro).

Two examples
Removing “batch effects” in multiple testing
A framework called Confounder Adjusted Testing and Estimation (CATE), proposed in Wang*, Zhao*, Hastie, Owen (2017) Annals of Statistics.
Invalid instrumental variables in Mendelian randomization
A class of methods called Robust Adjusted Profile Score (RAPS), proposed in Zhao, Wang, Hemani, Bowden, Small (2019+) Annals of Statistics; Zhao, Chen, Wang, Small (2019) International Journal of Epidemiology.
Connection
The two share the same structure and are in some sense “dual” problems.

Batch effect: Motivating example
[Figure: Empirical distribution of t-statistics for microarray datasets. Four density panels, each with a fitted normal curve: N(0.024, 2.6^2), N(0.055, 0.066^2), N(-1.8, 0.51^2), and N(0.043, 0.24^2).]

Motivating example
Table: Empirical distribution of the t-statistics
Dataset                          Median   Median absolute deviation
1                                0.024    2.6
2                                0.055    0.066
3                                -1.8     0.51
2 (adjusted for known batches)   0.043    0.24
Far from the “expected” null N(0, 1) if true effect is sparse.
Most likely explanation: batch effect/unmeasured confounding.
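The two summaries in the table can be computed along the following lines. This is only a sketch: the function name `median_and_mad`, the 1.4826 scaling (which makes the MAD comparable to a normal standard deviation), and the simulated t-statistics are assumptions for illustration, not taken from the talk.

```python
import numpy as np

def median_and_mad(t_stats, scale=1.4826):
    """Median and (normal-consistent) median absolute deviation of t-statistics."""
    t = np.asarray(t_stats, dtype=float)
    med = np.median(t)
    mad = scale * np.median(np.abs(t - med))
    return med, mad

# Hypothetical t-statistics that mimic dataset 3 in the table:
# shifted away from 0 and narrower than the N(0, 1) reference.
rng = np.random.default_rng(1)
t_sim = rng.normal(loc=-1.8, scale=0.51, size=20000)
med, mad = median_and_mad(t_sim)
print(f"median = {med:.2f}, MAD = {mad:.2f}")
```

If most true effects were zero and there were no confounding, the median should be near 0 and the MAD near 1.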

Methods
Previous work
Price et al. (2006) Nat Gen: Add principal components in GWAS.
Leek and Storey (2008) PNAS: Surrogate variable analysis (SVA).
Gagnon-Bartsch and Speed (2012) Biostatistics: Remove unwanted variation (RUV) using negative control genes.
Sun, Zhang, Owen (2012) AoAS: Use sparsity to remove latent variable.
A lot of great heuristics. Methods work well in some scenarios.
Modelling assumptions were unclear, basically no theory.
Connections between the methods were unexplored.
Probably most importantly (and surprisingly), nobody called this problem “unmeasured confounding”.

Statistical model
Notations
X: treatment (n × 1 vector).
Y: outcome (n × p matrix). In this example, high-dimensional gene expressions.
U: unobserved confounder (n × d matrix).
Rows of X, Y, U are observations. Columns of Y are genes.
It turns out that everyone is (implicitly) using the following model:
$Y = X \alpha^T + U \gamma^T + \text{noise}, \quad U = X \beta^T + \text{noise}.$
Therefore, ordinary least squares of Y vs. X estimates
$\Gamma_{p \times 1} = \alpha_{p \times 1} + \gamma_{p \times d} \, \beta_{d \times 1}.$
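A small simulation can make the confounding bias concrete. This is a sketch under assumed settings: the sizes n, p, d, the particular values of α and β, and standard normal noise are all illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 5000, 50, 2                 # illustrative sizes (assumptions)

alpha = np.zeros(p)
alpha[:5] = 1.0                       # sparse direct effects of X on Y
beta = np.array([1.0, -0.5])          # effect of X on the confounders U
gamma = rng.normal(size=(p, d))       # effect of U on Y

X = rng.normal(size=n)
U = np.outer(X, beta) + rng.normal(size=(n, d))                  # U = X beta^T + noise
Y = np.outer(X, alpha) + U @ gamma.T + rng.normal(size=(n, p))   # Y = X alpha^T + U gamma^T + noise

# OLS slope of each column of Y on X (no intercept; all variables have mean zero).
Gamma_hat = X @ Y / (X @ X)

# Compare the OLS estimate with its target alpha + gamma beta, and with alpha itself.
max_err_target = np.max(np.abs(Gamma_hat - (alpha + gamma @ beta)))
max_err_alpha = np.max(np.abs(Gamma_hat - alpha))
print(f"max |Gamma_hat - (alpha + gamma beta)| = {max_err_target:.3f}")
print(f"max |Gamma_hat - alpha|                = {max_err_alpha:.3f}")
```

The first error shrinks as n grows, while the second does not: the OLS slopes converge to α + γβ rather than to α.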

Identifiability problem
$Y = X \alpha^T + U \gamma^T + \text{noise}, \quad U = X \beta^T + \text{noise}.$
Can be identified without (much) assumption
OLS of Y ∼ X: $\Gamma_{p \times 1} = \alpha_{p \times 1} + \gamma_{p \times d} \, \beta_{d \times 1}$.
Factor analysis on the residuals of the Y ∼ X regression: $\gamma$.
Specificity needed
α and β cannot be immediately identified because there are more parameters (p + d) than equations (p).
Can be resolved by assuming α is “specific”.

Diagram for CATE
[Diagram: nodes X, U, Y1, Y2, Y3. X and U are linked with coefficient β; U points to Y1, Y2, Y3 with coefficients γ1, γ2, γ3; X points to Y1, Y2, Y3 with coefficients α1, α2, α3.]
Specificity
Some entries of α are zero (arrows are missing).

Specificity assumptions
$\Gamma_{p \times 1} = \alpha_{p \times 1} + \gamma_{p \times d} \, \beta_{d \times 1}.$
We can assume two kinds of specificity (either one is enough for identification):
Negative control: At least d known entries of α are zero.
Sparsity: Most entries of α are zero, though their positions are unknown.
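To spell out why d negative controls can be enough, here is a short derivation. It uses notation and one condition that are not on the slide: C denotes the index set of the known zero entries of α, and the corresponding rows of γ, written $\gamma_C$, are assumed to have full column rank d.

```latex
\begin{align*}
\Gamma_C &= \alpha_C + \gamma_C \beta = \gamma_C \beta
  && \text{since } \alpha_C = 0, \\
\beta &= (\gamma_C^T \gamma_C)^{-1} \gamma_C^T \Gamma_C
  && \text{if } \gamma_C \in \mathbb{R}^{|C| \times d} \text{ has rank } d \ (\text{so } |C| \ge d), \\
\alpha &= \Gamma - \gamma \beta .
\end{align*}
```

With fewer than d negative controls, the first line gives fewer than d equations for the d entries of β, so β is not pinned down.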

The CATE procedure
$\Gamma_{p \times 1} = \alpha_{p \times 1} + \gamma_{p \times d} \, \beta_{d \times 1}.$
1. Obtain $\hat{\Gamma}$ by regressing Y on X;
2. Obtain $\hat{\gamma}$ by applying factor analysis on the residuals of the Y ∼ X regression;
3-1. With negative controls (say $\alpha_{1:k} = 0$), estimate β by regressing $\hat{\Gamma}_{1:k}$ on $\hat{\gamma}_{1:k}$.
3-2. Or using sparsity, estimate β by regressing $\hat{\Gamma}$ on $\hat{\gamma}$ with a robust loss function ρ: $\hat{\beta} = \arg\min_{\beta} \sum_{j=1}^{p} \rho(\hat{\Gamma}_j - \hat{\gamma}_j^T \beta)$. (Basically the same as putting a lasso penalty on α.)
4. Estimate α by $\hat{\alpha} = \hat{\Gamma} - \hat{\gamma} \hat{\beta}$.
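The steps above can be sketched in a few lines of NumPy. This is a rough illustration, not the authors' implementation: the function name `cate_sketch` is hypothetical, a truncated SVD (PCA) stands in for factor analysis, the robust loss ρ is taken to be Huber's loss fitted by iteratively reweighted least squares, the number of factors d is assumed known, and the simulated data are made up.

```python
import numpy as np

def cate_sketch(X, Y, d, nc_idx=None, huber_c=1.345, n_iter=50):
    """Sketch of steps 1-4. X: (n,), Y: (n, p); d latent factors (assumed known)."""
    n = Y.shape[0]
    # Step 1: OLS slope of every column of Y on X (no intercept; data assumed centred).
    Gamma_hat = X @ Y / (X @ X)
    # Step 2: loadings from the residuals. Truncated SVD (PCA) is a simple
    # stand-in for factor analysis; it targets the column space of gamma.
    resid = Y - np.outer(X, Gamma_hat)
    _, s, Vt = np.linalg.svd(resid, full_matrices=False)
    gamma_hat = Vt[:d].T * (s[:d] / np.sqrt(n))            # shape (p, d)
    if nc_idx is not None:
        # Step 3-1: least squares restricted to the negative controls.
        beta_hat, *_ = np.linalg.lstsq(gamma_hat[nc_idx], Gamma_hat[nc_idx], rcond=None)
    else:
        # Step 3-2: Huber regression via iteratively reweighted least squares.
        beta_hat, *_ = np.linalg.lstsq(gamma_hat, Gamma_hat, rcond=None)
        for _ in range(n_iter):
            r = Gamma_hat - gamma_hat @ beta_hat
            scale = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12
            w = np.minimum(1.0, huber_c * scale / np.maximum(np.abs(r), 1e-12))
            sw = np.sqrt(w)
            beta_hat, *_ = np.linalg.lstsq(gamma_hat * sw[:, None], Gamma_hat * sw, rcond=None)
    # Step 4: subtract the estimated confounding term.
    alpha_hat = Gamma_hat - gamma_hat @ beta_hat
    return alpha_hat, beta_hat, Gamma_hat

# Hypothetical data from the model Y = X alpha^T + U gamma^T + noise, U = X beta^T + noise.
rng = np.random.default_rng(0)
n, p, d = 4000, 200, 2
alpha = np.zeros(p)
alpha[:10] = 2.0                                           # 10 non-null genes
beta = np.array([1.0, -0.5])
gamma = rng.normal(size=(p, d))
X = rng.normal(size=n)
U = np.outer(X, beta) + rng.normal(size=(n, d))
Y = np.outer(X, alpha) + U @ gamma.T + rng.normal(size=(n, p))

nc = np.arange(p - 50, p)                                  # pretend the last 50 genes are known nulls
alpha_nc, _, Gamma_hat = cate_sketch(X, Y, d, nc_idx=nc)   # negative-control variant
alpha_rb, _, _ = cate_sketch(X, Y, d)                      # sparsity (robust) variant

err_naive = np.mean(np.abs(Gamma_hat - alpha))
err_nc = np.mean(np.abs(alpha_nc - alpha))
err_rb = np.mean(np.abs(alpha_rb - alpha))
print(f"mean abs error: naive OLS {err_naive:.3f}, "
      f"negative controls {err_nc:.3f}, robust {err_rb:.3f}")
```

Because the loadings are only recovered up to a rotation and rescaling, the returned `beta_hat` is expressed in the coordinates of `gamma_hat`; the product `gamma_hat @ beta_hat`, and hence `alpha_hat`, does not depend on that choice.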

Theory for CATE
Our paper derived an asymptotic theory for CATE (distribution of $\hat{\beta}$ and $\hat{\alpha}$, optimality, etc.)
Key assumptions
1. Factors are strong enough: $\|\gamma\|_F^2 = \Theta(p)$.
   - Recall γ is the p × d matrix of the effects of the confounders on gene expressions.
   - In real data: often a small number of strong factors + many weak factors.
2. In the sparsity scenario, α is quite sparse: $\|\alpha\|_1 \sqrt{n} / p \to 0$.
   - After working on the dual problem (MR), now I think this rate may be too stringent.
Highlight of the theory
Under these two (perhaps unrealistic) assumptions, CATE may be as efficient as the oracle OLS estimator that observes U!
Simulations show that CATE (with some tweaks) performs quite well even when these assumptions are not satisfied.
