Estimating the number and effect sizes of non-null hypotheses
Jennifer Brennan, Ramya Korlakai Vinayak, Kevin Jamieson
jrb@cs.washington.edu
ICML 2020

Example: Fruit Fly Genetics

Hao et al. (2008) measured the effect of 13,000 fruit fly genes on susceptibility to influenza. Measurements were distributed N(0,1) under the null; higher values indicate more protection from influenza.

Multiple hypothesis testing identifies few discoveries (significant genes).

The observed distribution does not match the theoretical null N(0,1). There are too many small, positive measurements for chance alone, yet each is too small to claim individual significance.

Example: Fruit Fly Genetics

Idea: These genes can be counted, even though they can't be identified.

Our Estimator

More than 7% of genes have effect size > 1/4 (at least 8% increase in influenza resistance). Next experiment: take precise measurements (e.g., use many replications) to identify these genes.

More than 2% of genes have effect size > 1 (at least 28% increase in influenza resistance). Next experiment: take less precise measurements and identify fewer genes.

This enables power analysis for future experimental designs.
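The link between effect size and the number of replications can be made concrete with a rough power calculation. The sketch below is not from the slides: it assumes each replicate is Gaussian with unit variance, that replicates are averaged into a single z-score, and that genes are tested one-sided with a Bonferroni correction over 13,000 hypotheses. The function name and the default values for alpha and power are hypothetical choices.

```python
import math
from statistics import NormalDist

def replications_needed(effect, n_tests=13_000, alpha=0.05, power=0.8):
    """Smallest number of replicates r so that a gene with the given effect
    size is detected with the requested power (illustrative assumptions:
    unit-variance Gaussian replicates, one-sided Bonferroni test)."""
    std = NormalDist()
    # Rejection cutoff for a single z-score after Bonferroni correction.
    z_crit = std.inv_cdf(1 - alpha / n_tests)
    # Averaging r replicates shifts the z-score mean to effect * sqrt(r),
    # so power >= target requires effect * sqrt(r) >= z_crit + z_power.
    z_power = std.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / effect) ** 2)

for effect in (0.25, 1.0):
    print(f"effect size {effect}: {replications_needed(effect)} replicates")
```

Under these assumptions the required number of replicates grows as one over the squared effect size, which is why identifying the genes with effect size above 1/4 calls for far more precise measurements than identifying those with effect size above 1.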

Formal problem statement

We view multiple hypothesis testing from the perspective of learning mixture distributions.

For $i = 1, 2, \dots, n$:
Draw $\mu_i \sim \nu_*$, where $\mu_i$ is the (unknown) effect size.
Observe $X_i \sim f(\mu_i)$, e.g. $f(\mu_i) = N(\mu_i, 1)$.

Identification: Which $\mu_i > 0$? (Returns a set in $[n]$.)
Counting: What is the probability $P_{\mu \sim \nu_*}(\mu > \gamma)$, for all $\gamma$? (Returns a fraction.)

Goal: Estimate $\zeta_{\nu_*}(\gamma) = P_{\mu \sim \nu_*}(\mu > \gamma)$, for all $\gamma$.
Constraint: Never overestimate the true fraction.
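The generative model above can be simulated to see why counting and identification differ. The sketch below is only an illustration: the three-point mixing distribution, the function names, and the z-score cutoff of 4.0 are all assumptions made here, not values from the talk.

```python
import numpy as np

# Hypothetical mixing distribution (an assumption for illustration only):
# 90% exact nulls, 7% small effects of 0.5, 3% larger effects of 2.0.
EFFECT_VALUES = np.array([0.0, 0.5, 2.0])
EFFECT_WEIGHTS = np.array([0.90, 0.07, 0.03])

def simulate(n, rng):
    """Draw an effect size per hypothesis, then one N(effect, 1) observation."""
    mu = rng.choice(EFFECT_VALUES, size=n, p=EFFECT_WEIGHTS)
    x = mu + rng.standard_normal(n)
    return mu, x

def true_fraction_above(threshold):
    """Counting target: probability that an effect size exceeds the threshold."""
    return float(EFFECT_WEIGHTS[EFFECT_VALUES > threshold].sum())

def identify(x, z_cutoff):
    """Naive identification: indices whose observation exceeds a fixed cutoff."""
    return np.flatnonzero(x > z_cutoff)

rng = np.random.default_rng(0)
mu, x = simulate(13_000, rng)
discoveries = identify(x, z_cutoff=4.0)
print("identified:", discoveries.size, "of", int(np.sum(mu > 0)), "non-nulls")
print("fraction with effect > 0.25:", true_fraction_above(0.25))
```

In this toy setting a stringent cutoff identifies only a small share of the non-null hypotheses, while the counting target is a single fraction per threshold that does not require naming any individual hypothesis.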

Related work

Estimating the number of non-nulls ($\mu \neq 0$): Early techniques [Schweder and Spjøtvoll, 1982; Genovese et al., 2004; Meinshausen et al., 2006] relied on uniformity of p-values under the null. These techniques do not extend to arbitrary thresholds ("How many genes improved influenza resistance by at least 20%?").

Plug-in estimators: Estimate the entire density $\nu$, then compute $P_{\nu}(\mu > \gamma)$. This does not respect our constraint that we cannot overestimate.

Connections to False Discovery Rate (FDR) control: Tighter FDR control can be obtained by knowing the number of non-nulls. Previous methods either do not satisfy our constraint [Storey, 2002; Li and Barber, 2019], or perform poorly in our regime of interest (many hypotheses, small effect sizes) [Stephens, 2016; Katsevich and Ramdas, 2018].

Our Estimator

Goal: Estimate the fraction of effect sizes above each threshold.
Constraint: Never overestimate.

Step 1: Consider the empirical CDF (cumulative distribution function).

Step 2: Generate confidence intervals on the true CDF using the DKW inequality. With high probability, the true CDF lives within this interval.
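Steps 1 and 2 can be sketched in a few lines. The code below covers only the empirical CDF and the DKW confidence band, not the rest of the estimator. It uses the standard form of the DKW inequality, P(sup |F_n - F| > eps) <= 2 exp(-2 n eps^2); the function name `dkw_band` and the default alpha are assumptions made for this example.

```python
import math
import numpy as np

def dkw_band(x, alpha=0.05):
    """Empirical CDF at the sorted sample points, plus a DKW band that
    contains the true CDF everywhere with probability at least 1 - alpha."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Step 1: empirical CDF evaluated at each sorted observation.
    ecdf = np.arange(1, n + 1) / n
    # Step 2: half-width from solving 2 * exp(-2 * n * eps**2) = alpha.
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    lower = np.clip(ecdf - eps, 0.0, 1.0)
    upper = np.clip(ecdf + eps, 0.0, 1.0)
    return x, ecdf, lower, upper, eps

rng = np.random.default_rng(0)
sample = rng.standard_normal(13_000)  # stand-in data drawn from the null
xs, ecdf, lower, upper, eps = dkw_band(sample)
print("band half-width:", eps)
```

With 13,000 observations and alpha = 0.05 the half-width is about 0.012, so the band is narrow enough that a few percent of excess mass in one region of the distribution can show up as a departure from the null CDF.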
