Confounder Adjustment in Multiple Hypothesis Testing, Qingyuan Zhao

Confounder Adjustment in Multiple Hypothesis Testing Qingyuan Zhao Department of Statistics, Stanford University January 28, 2016 Slides are available at http://web.stanford.edu/~qyzhao/ .

Collaborators
Jingshu Wang, Trevor Hastie, Art Owen.
Outline: Introduction (Background, Motivating Examples, Previous Work); Model and Inference (Model and Identifiability, Estimation, Hypothesis Tests); Numerical Examples; Summary.

Microarray experiments
Responses: normalized gene expression level.
Primary variables (variables of interest): treatment, disease status, etc.
Control covariates: age, gender, batch, date, etc.

Microarray data analysis
Biologist: “Which genes are (causally) related to this disease?”
Statistician: “Let me run some analysis.”
Two common practices:
1. Sparse regression: regress the primary variable on the genes. More common for SNP data and predictive tasks.
2. Association tests/screening (this talk): for each gene, perform a significance test of correlation with the primary variable.
Statistician: “Here is a short list of candidate genes with false discovery rate (FDR) ≤ 20%.”
Biologist: “Good, let me validate these discoveries.”
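The slides do not say how the FDR-controlled candidate list is produced. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure and a hypothetical list of per-gene p-values (both are assumptions for illustration, not taken from the talk), such a list could be computed as follows:

```python
def benjamini_hochberg(p_values, q=0.20):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Sort hypothesis indices by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= q * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values for six genes (illustrative only).
p_values = [0.001, 0.30, 0.04, 0.012, 0.75, 0.09]
candidates = benjamini_hochberg(p_values, q=0.20)
print(candidates)
```

The guarantee behind this procedure relies on independent or positively dependent tests with valid p-values, which is the assumption that correlated or confounded tests can break.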

Concerns
J. P. Ioannidis. Why most published research findings are false. Chance, 18(4):40–47, 2005.
Two major challenges to reproducibility in genetic screening:
1. Correlated tests: Is the FDR still controlled? If not, can we correct the statistical analysis? Well studied in the last 15 years [Benjamini and Yekutieli, 2001, Storey et al., 2004, Efron, 2007, Fan et al., 2012].
2. Confounded tests (this talk): the individual association tests are biased in the presence of unobserved confounders. Can we still provide a good candidate list? Equally long history [e.g. Alter et al., 2000, Price et al., 2006]. Still many open questions.

Confounding
Brief history:
Fisher [1935] first uses the term in experimental design.
Kish [1959] first uses its modern meaning: a mixing of effects of unobserved extraneous factors (called confounders) with the effect of interest.
Huge literature, but mostly in causal inference.
Aliases for confounders in genetic screening:
“systematic ancestry differences” [Price et al., 2006].
“batch effects” (widely used by biologists).
“surrogate variables” [Leek and Storey, 2007, 2008].
“unwanted variation” [Gagnon-Bartsch and Speed, 2012].
“latent effects” [Sun et al., 2012].

Example 1: gender study
Which genes are more expressed in male/female?
A microarray experiment by Vawter et al. [2004]:
Postmortem samples from the brains of 10 individuals.
For each individual, 3 samples from different cortices.
Each sample is sent to 3 different labs for analysis.
Two different microarray platforms are used by the labs.
In total, 10 × 3 × 3 = 90 samples.
This example was first used by Gagnon-Bartsch and Speed [2012] to demonstrate why it is important to “remove unwanted variation”.

Screening
Notation:
Y: n × p matrix of gene expression.
X: n × 1 vector of gender.
Simplest association test:
Regress each column of Y (gene) on X.
In R, run summary(lm(Y ~ X)).
Equivalent to a two-sample t-test with equal variance.
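The equivalence between the per-gene regression and the equal-variance two-sample t-test can be checked numerically. The following is a minimal sketch in plain Python rather than R, on hypothetical toy data; the function names are illustrative and not from the talk.

```python
import math


def slope_t_stat(x, y):
    """t-statistic of the slope in the simple regression y ~ 1 + x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # residual variance with n - 2 degrees of freedom
    return slope / math.sqrt(sigma2 / sxx)


def pooled_t_stat(group1, group0):
    """Two-sample t-statistic with a pooled (equal) variance estimate."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss = sum((v - m1) ** 2 for v in group1) + sum((v - m0) ** 2 for v in group0)
    pooled_var = ss / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))


# Hypothetical toy data: 6 samples (rows) by 2 genes (columns).
X = [0, 0, 0, 1, 1, 1]
Y = [[1.0, 5.1], [1.2, 4.9], [0.9, 5.0], [2.1, 5.2], [1.8, 4.8], [2.0, 5.1]]

# One test per gene, i.e. per column of Y.
columns = [[row[j] for row in Y] for j in range(len(Y[0]))]
t_regression = [slope_t_stat(X, col) for col in columns]
t_two_sample = [
    pooled_t_stat([v for v, g in zip(col, X) if g == 1],
                  [v for v, g in zip(col, X) if g == 0])
    for col in columns
]
```

For a binary X the two lists agree up to floating-point error, because the slope equals the difference in group means and the residual variance equals the pooled variance.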

Histogram of t-statistics
[Figure: histogram of the t-statistics (x-axis: t-statistics, y-axis: density) with fitted normal curve N(0.055, 0.066^2).]
Skewed and very underdispersed.

What happened?
[Figure: scatter plot of the samples on PC1 (x-axis) and PC2 (y-axis), with points marked by lab (1, 2, 3) and platform (0, 1).]

Association test
Notation:
Y: n × p matrix of gene expression.
X: n × 1 vector of gender.
Z: n × d matrix of control covariates (lab and platform).
Modified association test:
Regress each column of Y (gene) on X and Z.
In R, run summary(lm(Y ~ X + Z)).
Report the significance of the coefficients of X.
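To make the effect of adding Z concrete, here is a minimal sketch in plain Python of the adjusted regression for a single gene. The data are hypothetical: the gene depends only on a batch indicator that is correlated with X, so the true coefficient of X is zero. The helper names and the small Gaussian-elimination solver are assumptions for illustration, not the R implementation.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, k):
            factor = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * k
    for c in reversed(range(k)):
        tail = sum(M[c][j] * x[j] for j in range(c + 1, k))
        x[c] = (M[c][k] - tail) / M[c][c]
    return x


def coef_and_t_for_x(x, Z, y):
    """OLS fit of y ~ 1 + x + Z; returns (coefficient of x, its t-statistic)."""
    n = len(y)
    # Design matrix rows: intercept, x, then the control covariates.
    D = [[1.0, float(xi)] + [float(v) for v in zi] for xi, zi in zip(x, Z)]
    k = len(D[0])
    gram = [[sum(row[a] * row[b] for row in D) for b in range(k)] for a in range(k)]
    Dty = [sum(row[a] * yi for row, yi in zip(D, y)) for a in range(k)]
    beta = solve(gram, Dty)
    rss = sum((yi - sum(d * b for d, b in zip(row, beta))) ** 2
              for row, yi in zip(D, y))
    sigma2 = rss / (n - k)
    unit = [0.0] * k
    unit[1] = 1.0
    var_x = sigma2 * solve(gram, unit)[1]  # sigma^2 times the x-diagonal of (D'D)^{-1}
    return beta[1], beta[1] / math.sqrt(var_x)


# Hypothetical data: expression depends on batch only, and batch is correlated with x.
x = [0, 0, 0, 0, 1, 1, 1, 1]
batch = [0, 0, 0, 1, 0, 1, 1, 1]
noise = [0.1, -0.1, 0.0, 0.0, 0.0, 0.1, -0.1, 0.0]
y = [2.0 * b + e for b, e in zip(batch, noise)]

naive_coef, naive_t = coef_and_t_for_x(x, [[] for _ in x], y)        # y ~ 1 + x
adj_coef, adj_t = coef_and_t_for_x(x, [[b] for b in batch], y)       # y ~ 1 + x + batch
```

In this toy setup the unadjusted coefficient of X picks up the batch effect, while the adjusted one is essentially zero. The adjustment only helps for covariates that were actually recorded.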

Histogram of t-statistics
[Figure: histogram of the t-statistics (x-axis: t-statistics, y-axis: density) with fitted normal curve N(0.043, 0.24^2).]
Better, but still problematic.
Reasonable guess: there are more unobserved confounders!

Example 2: COPD study
COPD = chronic obstructive pulmonary disease.
Singh et al. [2011] tried to find genes associated with the severity of COPD (moderate or severe).
[Figure: histogram of the t-statistics (x-axis: t-statistics, y-axis: density) with fitted normal curve N(0.024, 2.6^2).]
Overdispersed and skewed.

Example 3: Mutual fund selection
Barras et al. [2010] used the following model to select mutual funds:
$Y_{it} = \alpha_i + \gamma_i^T Z_t + e_{it}, \quad i = 1, \dots, n, \; t = 1, \dots, p.$
$Y_{it}$: observed log-return of fund $i$ at time $t$.
$\alpha_i$: risk-adjusted return (Goal: find funds with positive $\alpha$).
$Z_t$: systematic risk factors.
They assumed:
$\alpha$ is sparse (Berk and Green equilibrium);
No unobserved risk factors (is that possible/necessary?).

Idea 0: Remove the largest principal component(s)
EIGENSTRAT [Price et al., 2006]
Regression model:
$Y_{n \times p} = X_{n \times 1} \beta_{p \times 1}^T + Z_{n \times r} \Gamma_{p \times r}^T + E_{n \times p},$
where Z is the first r PC(s) of Y.
Motivation: in SNP data, the largest PC(s) usually correspond to ancestry differences.
Weakness: can easily remove true signals.
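The weakness can be seen on a toy example. The sketch below is not EIGENSTRAT itself: it assumes r = 1, hypothetical data with no confounder at all, and a plain power-iteration PCA written only for illustration. When most genes respond to X, the leading PC of Y is nearly collinear with X, so adjusting for it would absorb the true signal.

```python
import math


def top_pc_scores(M, iters=100):
    """Unit-length scores of the leading principal component of M (rows = samples),
    computed by power iteration on the column-centered matrix."""
    n, q = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(q)]
    C = [[row[j] - means[j] for j in range(q)] for row in M]
    u = [float(i + 1) for i in range(n)]  # fixed, non-constant starting vector
    for _ in range(iters):
        w = [sum(C[i][j] * u[i] for i in range(n)) for j in range(q)]  # C^T u
        u = [sum(C[i][j] * w[j] for j in range(q)) for i in range(n)]  # C w
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
    return u


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical data: 8 samples, 3 genes, every gene truly affected by x, no confounder.
x = [0, 0, 0, 0, 1, 1, 1, 1]
effects = [2.0, -1.0, 1.5]
noise = [[0.1, 0.0, -0.1], [-0.1, 0.1, 0.0], [0.0, -0.1, 0.1], [0.1, 0.0, -0.1],
         [-0.1, 0.1, 0.0], [0.0, -0.1, 0.1], [0.1, 0.0, -0.1], [-0.1, 0.1, 0.0]]
Y = [[effects[j] * x[i] + noise[i][j] for j in range(3)] for i in range(8)]

pc1 = top_pc_scores(Y)
corr_with_x = correlation(pc1, x)  # close to +1 or -1 in this setup
```

Here the first PC is essentially X itself, so regressing on both X and that PC leaves almost nothing for the coefficients of X to explain.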

Idea 1: Use control genes
Same regression model:
$Y_{n \times p} = X_{n \times 1} \beta_{p \times 1}^T + Z_{n \times r} \Gamma_{p \times r}^T + E_{n \times p}.$
RUV2 [Gagnon-Bartsch and Speed, 2012]
If we know $\beta_C = 0$ for a set of genes $C$ (negative controls),
1. Run PCA on the columns of Y in C to obtain Z.
2. Run the regression for the columns of Y not in C.
Example: bacterial RNAs (spike-in controls).
Limited by the availability and number of negative controls.
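The two steps can be sketched on a toy example. This is a simplified illustration in plain Python, not the RUV2 implementation: it assumes a single latent factor (r = 1), noiseless hypothetical control genes, and a power-iteration PCA; the variable `hidden` is used only to simulate the data and is never given to the estimator.

```python
import math


def top_pc_scores(M, iters=100):
    """Unit-length scores of the leading principal component of M (rows = samples),
    computed by power iteration on the column-centered matrix."""
    n, q = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(q)]
    C = [[row[j] - means[j] for j in range(q)] for row in M]
    u = [float(i + 1) for i in range(n)]  # fixed, non-constant starting vector
    for _ in range(iters):
        w = [sum(C[i][j] * u[i] for i in range(n)) for j in range(q)]  # C^T u
        u = [sum(C[i][j] * w[j] for j in range(q)) for i in range(n)]  # C w
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
    return u


def center(v):
    """Subtract the mean, i.e. residuals of v ~ 1."""
    v_bar = sum(v) / len(v)
    return [vi - v_bar for vi in v]


def residualize(v, z):
    """Residuals of the simple regression v ~ 1 + z."""
    vc, zc = center(v), center(z)
    slope = sum(a * b for a, b in zip(zc, vc)) / sum(a * a for a in zc)
    return [a - slope * b for a, b in zip(vc, zc)]


def coef_of_x(x, y, z=None):
    """Coefficient of x in y ~ 1 + x (+ z), via the Frisch-Waugh-Lovell theorem."""
    rx = center(x) if z is None else residualize(x, z)
    ry = center(y) if z is None else residualize(y, z)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)


# Hypothetical data: 8 samples; 'hidden' is the unobserved confounder (simulation only).
x = [0, 0, 0, 0, 1, 1, 1, 1]
hidden = [-1, -1, -1, 1, -1, 1, 1, 1]
noise = [0.1, -0.1, 0.0, 0.0, 0.0, 0.1, -0.1, 0.0]
controls = [[1.0 * h, 2.0 * h, -1.0 * h] for h in hidden]  # negative controls: beta_C = 0
target = [1.5 * h + e for h, e in zip(hidden, noise)]      # non-control gene, true beta = 0

z_hat = top_pc_scores(controls)           # step 1: PCA on the control genes
naive = coef_of_x(x, target)              # no adjustment
adjusted = coef_of_x(x, target, z_hat)    # step 2: regression on x and z_hat
```

In this setup the unadjusted coefficient is driven entirely by the hidden factor, and adjusting for the factor recovered from the controls brings it back to roughly zero. With noisy or too few controls the recovered factor would be less accurate.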

Idea 2: Sparsity
Same regression model:
$Y_{n \times p} = X_{n \times 1} \beta_{p \times 1}^T + Z_{n \times r} \Gamma_{p \times r}^T + E_{n \times p}.$
Idea: If $\beta$ contains actual effects, it should be a sparse vector.
SVA [Leek and Storey, 2008]
Iterate between
1. Weighted PCA on Y (based on how likely $\beta = 0$).
2. Regress Y on X and the estimated PCs.
Does not always converge.

Idea 2: Sparsity
Same regression model:
$Y_{n \times p} = X_{n \times 1} \beta_{p \times 1}^T + Z_{n \times r} \Gamma_{p \times r}^T + E_{n \times p}.$
Idea: If $\beta$ contains actual effects, it should be a sparse vector.
LEAPP [Sun, Zhang, and Owen, 2012]
1. Run PCA on the residuals of Y ~ X.
2. Run a sparse regression.
