Hypothesis Testing: Summarizing Information about Causal Effects

SLIDE 1

Hypothesis Testing: Summarizing Information about Causal Effects

Fill In Your Name 30 October 2020

SLIDE 2

On Testing Many Hypotheses

Learning about the causal effects of multiple treatment arms or on multiple outcomes

SLIDE 3

Key Points for this lecture

Statistical inference (e.g., hypothesis tests and confidence intervals) involves reasoning about what we do not observe. p-values require probability distributions.

Randomization (or Design) + a Hypothesis + a Test Statistic Function → a probability distribution representing the hypothesis (the reference distribution).

Observed Value of the Test Statistic + Reference Distribution → a p-value.

SLIDE 4

The Role of Hypothesis Tests in Causal Inference I

The Fundamental Problem of Causal Inference says that we can see only one potential outcome for any given unit. A counterfactual causal effect of the treatment, Z, for Jake occurs when y_{Jake,Z=1} ≠ y_{Jake,Z=0}. So how can we learn about the causal effect? One solution is estimation of averages of causal effects (the ATE, ITT, LATE). This is what we call Neyman's approach.

SLIDE 5

The Role of Hypothesis Tests in Causal Inference II

Another solution is to make claims or guesses about the causal effects. We could say, "I think that the effect on Jake is 5," or "This experiment had no effect on anyone." Then we ask, "How much evidence does this experiment provide about that claim?" This evidence is summarized in a p-value. We call this Fisher's approach.

SLIDE 6

The Role of Hypothesis Tests in Causal Inference III

Notice: The hypothesis testing approach to causal inference doesn’t provide a best guess but instead tells you about evidence or information about a best guess. Meanwhile, the estimation approach provides a best guess but doesn’t tell you how much you know about that guess. Both approaches can converge and we nearly always report both: “Our best guess of the treatment effect was 5, and we could reject the idea that the effect was 0 (p=.01).”

SLIDE 7

Ingredients of a hypothesis test

◮ A hypothesis is a statement about a relationship among potential outcomes (strong or weak). TODO: Tara asks whether we need (Strong or Weak) here.

◮ A test statistic summarizes the relationship between treatment and observed outcomes.

◮ The design allows us to link the hypothesis and the test statistic: calculate a test statistic that describes a relationship between potential outcomes.

◮ The design also generates a distribution of possible test statistics implied by the hypothesis.

◮ A p-value describes the relationship between our observed test statistic and the possible hypothesized test statistics.

SLIDE 8

A hypothesis is a statement about or model of a relationship between potential outcomes

TODO: Tara would like col names to be more informative (Observed outcomes, treatment, potential outcomes, ITE, etc.)

(Table: example data with columns Y (observed outcome), Z (treatment), y0 and y1 (potential outcomes), tau (unit-level effect), and Ybin (a binary outcome).)

For example, the sharp, or strong, null hypothesis of no effects is H0: y_{i,1} = y_{i,0}.

SLIDE 9

Test statistics summarize treatment to outcome relationships

## The mean difference test statistic
meanTZ <- function(ys, z) {
  mean(ys[z == 1]) - mean(ys[z == 0])
}

## The difference of mean ranks test statistic
meanrankTZ <- function(ys, z) {
  ranky <- rank(ys)
  mean(ranky[z == 1]) - mean(ranky[z == 0])
}

observedMeanTZ <- meanTZ(ys = Y, z = Z)
observedMeanRankTZ <- meanrankTZ(ys = Y, z = Z)

observedMeanTZ
[1] -49.6

observedMeanRankTZ
[1] 1

SLIDE 10

Linking test statistic and hypothesis.

What we observe for each person i (Y_i) is either what we would have observed in treatment (y_{i,1}) or what we would have observed in control (y_{i,0}):

Y_i = Z_i * y_{i,1} + (1 − Z_i) * y_{i,0}

So, if y_{i,1} = y_{i,0}, then Y_i = y_{i,0}: what we actually observe is what we would have observed in the control condition.
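To make this concrete in code, here is a minimal sketch (assuming Y and Z are the observed-outcome and assignment vectors from the example data; tau0, y0_hyp, and y1_hyp are names introduced here for illustration):

## Under a sharp hypothesis H0: y_{i,1} = y_{i,0} + tau0, we can fill in the
## unobserved potential outcomes from what we observe.
tau0 <- 0                # the sharp null of no effects
y0_hyp <- Y - Z * tau0   # hypothesized outcome under control for every unit
y1_hyp <- y0_hyp + tau0  # hypothesized outcome under treatment for every unit
## When tau0 = 0, y0_hyp equals Y: the observed data reveal y_{i,0} exactly.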

SLIDE 11

Generating the distribution of hypothetical test statistics

We need to know how to repeat our experiment, re-assigning treatment just as the design did. Then we repeat it many times, calculating the implied test statistic each time (a sketch follows).
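Here is a minimal sketch of how that repetition could look (assumptions: complete randomization with roughly half of the units treated, and 10,000 draws; repeatExperiment() is an illustrative name, not necessarily the authors' exact code). meanTZ() and meanrankTZ() are from the previous slide.

repeatExperiment <- function(N) {
  ## re-assign treatment the way the (assumed) design did: permute a vector
  ## with roughly half 1s and half 0s
  sample(rep(c(0, 1), length.out = N))
}

set.seed(12345)
## Under H0: y_{i,1} = y_{i,0}, the observed Y would not change if the assignment
## changed, so we re-randomize Z many times and recompute each test statistic.
possibleMeanDiffsH0 <- replicate(10000, meanTZ(ys = Y, z = repeatExperiment(length(Y))))
possibleMeanRankDiffsH0 <- replicate(10000, meanrankTZ(ys = Y, z = repeatExperiment(length(Y))))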

SLIDE 12

Plot the randomization distributions under the null

(Figure: two histograms of test statistics consistent with the design and H0: y_{i,1} = y_{i,0}. Left panel: mean differences consistent with H0, with the observed test statistic marked. Right panel: mean differences of ranks consistent with H0, with the observed test statistic marked.)

Figure 1: An example of using the design of the experiment to test a hypothesis.

SLIDE 13

p-values summarize the plots

pMeanTZ <- mean(possibleMeanDiffsH0 >= observedMeanTZ)
pMeanRankTZ <- mean(possibleMeanRankDiffsH0 >= observedMeanRankTZ)

pMeanTZ
[1] 0.7785

pMeanRankTZ
[1] 0.3198

SLIDE 14

How to do this in R.

## using the coin package
library(coin)
set.seed(12345)
pMean2 <- pvalue(oneway_test(Y ~ factor(Z),
  data = dat, distribution = approximate(nresample = 1000)  ## nresample value assumed; the original was cut off
))
dat$rankY <- rank(dat$Y)
pMeanRank2 <- pvalue(oneway_test(rankY ~ factor(Z),
  data = dat, distribution = approximate(nresample = 1000)
))

pMean2
[1] 0.451
99 percent confidence interval:
 0.4103 0.4922

pMeanRank2
[1] 0.636
99 percent confidence interval:
 0.5957 0.6750

## using a development version of the RItools package
library(devtools)
dev_mode()
install_github("markmfredrickson/RItools@randomization-distribution", force = TRUE)

SLIDE 15

How to do this in R.

## using the ri2 package
library(ri2)
thedesign <- declare_ra(N = N)

pMean4 <- conduct_ri(Y ~ Z,
  declaration = thedesign,
  sharp_hypothesis = 0, data = dat, sims = 1000
)
summary(pMean4)

  term estimate two_tailed_p_value
1    Z    -49.6             0.4444

pMeanRank4 <- conduct_ri(rankY ~ Z,
  declaration = thedesign,
  sharp_hypothesis = 0, data = dat, sims = 1000
)
summary(pMeanRank4)

  term estimate two_tailed_p_value
1    Z        1             0.6349

SLIDE 16

Next topics:

◮ Testing weak null hypotheses H0: ȳ_1 = ȳ_0
◮ Rejecting null hypotheses (and making false positive and/or false negative errors)
◮ Power of hypothesis tests
◮ Maintaining correct false positive error rates when testing more than one hypothesis.

SLIDE 17

Testing the weak null of no average effects

The weak null hypothesis is a claim about aggregates and is nearly always stated in terms of averages: H0: ȳ_1 = ȳ_0. The test statistic for this hypothesis is nearly always the difference of means (i.e., meanTZ() above).

Z    0.3321    0.3587    0.3587

(p-values for the coefficient on Z from three testing procedures)

Why is the OLS p-value different? What assumptions is it making?
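As a point of comparison, here is a hedged sketch of two of those procedures (assuming dat contains Y and Z; the object names lm_iid and dim_neyman are illustrative):

library(estimatr)
## OLS with the default iid standard errors
lm_iid <- lm(Y ~ Z, data = dat)
summary(lm_iid)$coef["Z", "Pr(>|t|)"]
## Design-based difference in means (Neyman-style, HC2-type standard errors)
dim_neyman <- difference_in_means(Y ~ Z, data = dat)
dim_neyman$p.value

The OLS p-value leans on a constant-variance, large-sample (or Normal-errors) model, while difference_in_means() uses the randomization design itself to justify its standard error.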

SLIDE 18

Testing the weak null of no average effects

TODO: Add caption

(Figure: the observed outcome Y plotted against treatment Z.)

SLIDE 19

Testing the weak null of no average effects

observedTestStat   stderror     tstat     pval
        -49.6000    48.0448   -1.0324   0.3321

SLIDE 20

Rejecting hypotheses and making errors

How should we interpret p = 0.7785? What about p = 0.3198? TODO: Define α. What does it mean to "reject" H0: y_{i,1} = y_{i,0} at α = .05? "In typical use, the level of the test [α] is a promise about the test's performance and the size is a fact about its performance..." (Rosenbaum 2010, Glossary)

SLIDE 21

Decisions imply errors

If errors are necessary, how can we diagnose them? How do we learn whether our hypothesis testing procedure might generate too many false positive errors?

Diagnose by simulation:

SLIDE 22

Diagnosing false positive rates by simulation

Across repetitions of the design:
◮ Create a true null hypothesis.
◮ Test the true null.
◮ The p-value should be large: the proportion of small p-values should be no larger than α.

SLIDE 23

Diagnosing false positive rates by simulation

Example with a binary outcome.

collectPValues <- function(y, z, thedistribution = exact()) {
  ## Make Y and Z have no relationship by re-randomizing Z
  newz <- repeatExperiment(length(y))
  thelm <- lm(y ~ newz, data = dat)
  ttestP2 <- difference_in_means(y ~ newz, data = dat)
  owP <- pvalue(oneway_test(y ~ factor(newz), distribution = thedistribution))
  ranky <- rank(y)
  owRankP <- pvalue(oneway_test(ranky ~ factor(newz), distribution = thedistribution))
  return(c(
    lmp = summary(thelm)$coef["newz", "Pr(>|t|)"],
    neyp = ttestP2$p.value[[1]],
    rtp = owP,
    rtpRank = owRankP
  ))
}
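To complete the diagnosis, a minimal sketch of running the simulation and summarizing it (the number of simulations and the use of the binary outcome Ybin are illustrative assumptions):

set.seed(12345)
## Each column of pDist holds the four p-values from one re-randomization
## in which the null of no effects is true by construction.
pDist <- replicate(1000, collectPValues(y = dat$Ybin, z = dat$Z))
## False positive rates: the proportion of p-values below alpha = .05 for each procedure.
apply(pDist, 1, function(p) mean(p < 0.05))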

SLIDE 24

Diagnosing false positive rates by simulation

      lmp  neyp   rtp  rtpRank
[1,] 2225  2225  2225     2225
[2,] 2775  2775  2775     2775

SLIDE 25

Diagnosing false positive rates by simulation

  lmp   neyp    rtp  rtpRank
0.445  0.445  0.000    0.000

SLIDE 26

Diagnosing false positive rates by simulation

TODO: Add caption

(Figure: empirical CDFs of p-values under a true null; x-axis: p-value = p, y-axis: proportion of p-values < p; lines for OLS, Neyman, randomization inference with the mean difference, and randomization inference with the mean difference of ranks.)

SLIDE 27

False positive rate with N = 60 and binary outcome

TODO: Add caption

(Figure: empirical CDFs of p-values under a true null; x-axis: p-value = p, y-axis: proportion of p-values < p; lines for OLS, Neyman, randomization inference with the mean difference, and randomization inference with the mean difference of ranks.)

SLIDE 28

False positive rate with N = 60 and continuous outcome

(Figure: empirical CDFs of p-values under a true null; x-axis: p-value = p, y-axis: proportion of p-values < p; lines for OLS, Neyman, randomization inference with the mean difference, and randomization inference with the mean difference of ranks.)

SLIDE 29

Topics for later

◮ Power of tests

SLIDE 30

Summary:

A good test (1) rarely casts doubt on the truth and (2) easily distinguishes signal from noise (casts doubt on falsehoods often).

We can learn whether our testing procedure controls false positive rates given our design.

When false positive rates are not controlled, what might be going wrong? (This often has to do with asymptotics.)

SLIDE 31

What else to know about hypothesis tests.

Here we list a few other important but advanced topics connected to hypothesis testing:

◮ Even if a given testing procedure controls the false positive rate for a single test, it may not control the rate for a group of multiple tests. See "10 Things You Need to Know About Multiple Comparisons" for a guide to approaches to controlling such rejection rates in multiple tests.

◮ A 100(1 − α)% confidence interval can be defined as the range of hypotheses for which all of the p-values are greater than or equal to α. This is called inverting the hypothesis test (Rosenbaum 2010). That is, a confidence interval is a collection of hypothesis tests (see the sketch below).

◮ A point estimate based on hypothesis testing is called a Hodges-Lehmann point estimate (Rosenbaum 1993; Hodges and Lehmann 1963).
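A minimal sketch of test inversion (assumptions: a constant, additive effect model; dat containing Y and Z; the meanTZ() and repeatExperiment() sketches from earlier; ciByInversion, its grid, and its arguments are illustrative):

ciByInversion <- function(Y, Z, taus = seq(-200, 200, by = 5), alpha = 0.05, sims = 1000) {
  pForTau <- sapply(taus, function(tau0) {
    y0_hyp <- Y - Z * tau0  # outcomes implied by the hypothesis: effect = tau0 for everyone
    obs <- meanTZ(ys = y0_hyp, z = Z)
    nulldist <- replicate(sims, meanTZ(ys = y0_hyp, z = repeatExperiment(length(Y))))
    mean(abs(nulldist) >= abs(obs))  # two-sided p-value for this hypothesized effect
  })
  range(taus[pForTau >= alpha])  # the hypotheses we cannot reject at level alpha
}

The returned range collects every hypothesized effect whose p-value is at least α, which is exactly the 100(1 − α)% confidence interval described above.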

SLIDE 32

What else to know about hypothesis tests.

◮ A set of hypothesis tests can be combined into one single hypothesis test (Hansen and Bowers 2008; Caughey, Dafoe, and Seawright 2017).

◮ In equivalence testing, one can hypothesize that two test statistics are equivalent (i.e., the treatment group is the same as the control group) rather than hypothesizing only about one test statistic (that the difference between the two groups is zero) (Hartman and Hidalgo 2018).

◮ Since a hypothesis is a model of potential outcomes, one can use hypothesis testing to learn about complex models, such as models of spillover and propagation of treatment effects across networks (Bowers, Fredrickson, and Panagopoulos 2013; Bowers, Fredrickson, and Aronow 2016; Bowers et al. 2018).

SLIDE 33

Exercise: Hypothesis Tests and Test Statistics

1. If an intervention was very effective at increasing the variability of an outcome but did not change the mean, would the p-value reported by R or Stata (using lm_robust() or difference_in_means() or reg or t.test) be large or small?

2. If an intervention caused the mean in the control group to be moderately reduced but increased a few outcomes a lot (like a 10-times effect), would the p-value from R's lm_robust() or difference_in_means() be large or small?

SLIDE 34

Report and Discuss the Results of the Exercise

SLIDE 35

On Testing Many Hypotheses

SLIDE 36

When might we test many hypotheses?

◮ Does the effect of an experimental treatment differ between different groups? Could differences in treatment effect arise because of some background characteristics of experimental subjects?

◮ Which, among several, strategies for communication were most effective on a single outcome?

◮ Which, among several, outcomes were influenced by a single experimental intervention?

SLIDE 37

Learning about the causal effects of multiple treatment arms or on multiple outcomes

SLIDE 38

False Positive Rates in Hypothesis Testing I

(Figure: density of a Normally distributed test statistic centered at the hypothesized value 0; axes: test stat (center = 0) and prob, with the one-sided tail area marked.)

Figure 2: One-sided p-value from a Normally distributed test statistic.

Notice:
◮ The curve is centered at the hypothesized value.
◮ The curve represents the world of the hypothesis.

SLIDE 39

False Positive Rates in Hypothesis Testing II

◮ The p-value is how rare it would be to see the observed test statistic, or a value even farther from the hypothesized value (like 0), in the world of the null.

◮ In the picture, the observed value of the test statistic is consistent with the hypothesized distribution, but just not super consistent.

◮ Even if p < .05 (or p < .001), the observed test statistic must reflect some value on the hypothesized distribution. This means that you can always make an error when you reject a null hypothesis.

If we say, "The experimental result is significantly different from the hypothesized value of zero (p = .001)! We reject that hypothesis!" when the truth is zero, we are making a false positive error (claiming to detect a signal when there is no signal, only noise).

SLIDE 40

False Positive Rates in Hypothesis Testing III

If we say, "We cannot distinguish this result from zero (p = .3). We cannot reject the hypothesis of zero." when the truth is not zero, we are making a false negative error (claiming inability to detect a signal when there is a signal, but it is overwhelmed by noise).

A single test of a single hypothesis should make false positive errors rarely: if we set α = .05, we are saying that we are comfortable with our testing procedure making false positive errors in no more than 5% of tests of a given treatment assignment in a given experiment.

Also, a single test of a single hypothesis should detect signal when it exists; it should have high statistical power. Another way of saying this is that it should rarely fail to detect a signal when one exists (i.e., it should have a low false negative error rate).

TODO: Insert demo of this, perhaps using DeclareDesign (a base-R sketch follows).
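A minimal base-R power simulation sketch, rather than the DeclareDesign demo the TODO refers to (N, tau, the Normal outcomes, and the complete-randomization design are all illustrative assumptions):

powerSim <- function(N = 100, tau = 0.5, sims = 1000, alpha = 0.05) {
  mean(replicate(sims, {
    y0 <- rnorm(N)                             # potential outcomes under control
    z <- sample(rep(c(0, 1), length.out = N))  # complete randomization
    y <- y0 + z * tau                          # constant additive true effect
    t.test(y ~ factor(z))$p.value < alpha      # did we reject the null of no effect?
  }))
}
set.seed(12345)
powerSim(tau = 0)    # no true effect: the false positive rate (should be near .05)
powerSim(tau = 0.5)  # a real effect: this approximates the statistical power of the test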

SLIDE 41

False Positive Rates in Multiple Hypothesis Testing

Suppose our probability of making a false positive error is, say, .05 in a single test. What happens if we (1) ask which of 10 outcomes has a statistically significant relationship with the two arms of treatment, or (2) ask which of 10 treatment arms had a statistically significant relationship with the single outcome?

◮ The probability of a false positive error should be less than or equal to .05 in 1 test.
◮ The probability of at least one false positive error should be less than or equal to 1 − ((1 − .05) × (1 − .05)) = .0975 in 2 independent tests.
◮ The probability of at least one false positive error with α = .05 in 10 independent tests should be ≤ 1 − (1 − .05)^10 ≈ .40 (see the sketch below).
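The arithmetic in the bullets above, as a small sketch (it assumes independent tests, as the formula does):

fwer <- function(k, alpha = 0.05) 1 - (1 - alpha)^k  # P(at least one false positive in k tests)
fwer(1)   # 0.05
fwer(2)   # 0.0975
fwer(10)  # about 0.40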

SLIDE 42

False Positive Rates in Multiple Hypothesis Testing I

                              Declared Non-Significant   Declared Significant   Total
True null hypotheses          U                          V                      m0
Non-true null hypotheses      T                          S                      m − m0
Total                         m − R                      R                      m

SLIDE 43

False Positive Rates in Multiple Hypothesis Testing II

Number of errors committed when testing m null hypotheses (Benjamini and Hochberg (1995), Table 1). Cells are numbers of tests: R is the number of "discoveries," V the number of false discoveries, U the number of correct non-rejections, and S the number of correct rejections.

There are two main error rates to control when testing many hypotheses:

◮ The family-wise error rate (FWER) is P(V > 0), the probability of any false positive error. We would like to control this if we plan to make a decision based on the results of our multiple tests, i.e., if the research project is mostly confirmatory. See, for example, the projects of the OES (http://oes.gsa.gov): federal agencies will make decisions about programs depending on whether they detect results or not.

SLIDE 44

False Positive Rates in Multiple Hypothesis Testing III

◮ The false discovery rate (FDR) is E(V/R | R > 0), the average proportion of false positive errors given some rejections. We would like to control this if we are using this experiment to plan the next experiment: we are willing to accept a higher probability of error in the interest of more possibilities for discovery. For example, an organization, a government, or an NGO could decide to conduct a series of experiments as part of a learning agenda, where no single experiment determines decision making and there is more room for exploration.

For this class we will focus on the FWER, but we recommend thinking about the FDR and learning agendas as a very useful way to go (a sketch contrasting the two adjustments follows).
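A small sketch of the two targets using base R's p.adjust(); the p-values here are made up for illustration:

ps <- c(0.001, 0.012, 0.030, 0.200, 0.700)
p.adjust(ps, method = "holm")  # Holm controls the FWER: guards against any false positive
p.adjust(ps, method = "BH")    # Benjamini-Hochberg controls the FDR: the expected share of false discoveries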

SLIDE 45

Multiple hypothesis testing: Multiple Outcomes

What is the effect of one treatment on multiple outcomes? On which outcome (out of many) did the treatment have an effect? The second question, especially, can lead to the kind of uncontrolled family-wise error rate problems referred to above. Imagine we had five outcomes and one treatment:

First six rows of the simulated data (the full data also include the potential-outcome columns Y1_Z_0, Y1_Z_1, ..., Y5_Z_1 and the assignment probability Z_cond_prob):

   ID     y0_1     y0_2     y0_3     y0_4     y0_5  Z       Y1       Y2       Y3       Y4       Y5
1 001   0.1932  0.36583  0.54629 -0.62635 -0.12498  0   0.1932  0.36583  0.54629 -0.62635 -0.12498
2 002  -0.4347  0.93142 -2.23268  1.30904  1.07813  0  -0.4347  0.93142 -2.23268  1.30904  1.07813
3 003   0.9133 -1.90677  0.28828 -0.13298 -1.26111  0   0.9133 -1.90677  0.28828 -0.13298 -1.26111
4 004   1.7934  0.05199  0.54383 -1.60770 -0.45215  0   1.7934  0.05199  0.54383 -1.60770 -0.45215
5 005   0.9966 -0.84850 -1.19198 -1.30764 -1.02741  1   0.9966 -0.84850 -1.19198 -1.30764 -1.02741
6 006   1.1075 -0.36817 -0.01769 -0.04515  0.06826  0   1.1075 -0.36817 -0.01769 -0.04515  0.06826

SLIDE 46

Multiple hypothesis testing: Multiple Outcomes I

We could ask:

◮ Can we detect an effect on outcome Y1? (i.e., does the hypothesis test produce a small enough p-value?)

pvalue(oneway_test(Y1 ~ factor(Z), data = dat1))
[1] 0.8821

## Notice that the t-test p-value is also a chi-squared test p-value.
pvalue(independence_test(Y1 ~ factor(Z), data = dat1, teststat = "quadratic"))  ## teststat value assumed (chi-squared form); the original was cut off
[1] 0.8821

◮ On which of the five outcomes can we detect an effect? (i.e., does any of the five hypothesis tests produce a small enough p-value?)

SLIDE 47

Multiple hypothesis testing: Multiple Outcomes II

p1 <- pvalue(oneway_test(Y1 ~ factor(Z), data = dat1))
p2 <- pvalue(oneway_test(Y2 ~ factor(Z), data = dat1))
p3 <- pvalue(oneway_test(Y3 ~ factor(Z), data = dat1))
p4 <- pvalue(oneway_test(Y4 ~ factor(Z), data = dat1))
p5 <- pvalue(oneway_test(Y5 ~ factor(Z), data = dat1))
theps <- c(p1, p2, p3, p4, p5)
sort(theps)
[1] 0.2707 0.3032 0.4296 0.5871 0.8821

◮ Can we detect an effect for any of the five outcomes? (i.e., does the hypothesis test for all five outcomes at once produce a small enough p-value?)

pvalue(independence_test(Y1 + Y2 + Y3 + Y4 + Y5 ~ factor(Z), data = dat1))
[1] 0.6734

SLIDE 48

Multiple hypothesis testing: Multiple Outcomes III

Which approach is likely to mislead us with too many “statistically significant” results?

SLIDE 49

Multiple hypothesis testing: Multiple Outcomes I

Let's do a simulation to learn about these testing approaches. We will (1) set the true causal effects to be 0, (2) repeatedly re-assign treatment, and (3) do each of those testing approaches each time. Since the true effect is 0, we expect most of the p-values to be large. (In fact, we would like no more than 5% of the p-values to be less than p = .05 if we are using the α = .05 rejection criterion.) A base-R sketch of such a simulation follows.
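A sketch of that simulation using base R and coin (an assumption: the authors' version uses DeclareDesign; dat1 is the simulated data with outcomes Y1–Y5 and assignment Z, and oneSim() is an illustrative name):

library(coin)
set.seed(12345)
oneSim <- function() {
  dat1$newZ <- sample(dat1$Z)  # re-randomize, so the true effect is 0 by construction
  ps <- sapply(paste0("Y", 1:5), function(v) {
    pvalue(oneway_test(as.formula(paste(v, "~ factor(newZ)")), data = dat1))
  })
  omni <- pvalue(independence_test(Y1 + Y2 + Y3 + Y4 + Y5 ~ factor(newZ), data = dat1))
  c(
    best_of_five  = as.numeric(min(ps) < 0.05),                    # pick the smallest of 5 p-values
    holm_adjusted = as.numeric(min(p.adjust(ps, "holm")) < 0.05),  # adjust first, then pick the smallest
    omnibus       = as.numeric(omni < 0.05),                       # one joint test of all 5 outcomes
    single_Y1     = as.numeric(ps["Y1"] < 0.05)                    # one pre-chosen outcome
  )
}
rowMeans(replicate(1000, oneSim()))  # proportion of simulations with at least one rejection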

SLIDE 50

Multiple hypothesis testing: Multiple Outcomes I

◮ The approach using 5 tests produces p < .05 much too often; recall that there are no causal effects at all for any of these outcomes.
◮ A test of a single outcome has p < .05 in no more than 5% of the simulations.
◮ The omnibus test also shows a well-controlled error rate.
◮ Using a multiple testing correction (here, the Holm correction) also correctly controls the false positive rate.

des1_sim <- simulate_design(des1_plus, sims = 1000)
res1 <- des1_sim %>%
  group_by(estimator_label) %>%
  summarize(fwer = mean(p.value < .05), .groups = "drop")
kableExtra::kable(res1)

SLIDE 51

Multiple hypothesis testing: Multiple Outcomes II

estimator_label        fwer
t-test all             0.225
t-test all holm adj    0.045
t-test omnibus         0.043
t-test Y1              0.047

FYI, here is how to use the Holm correction (Notice what happens to the p-values):

theps
[1] 0.8821 0.5871 0.4296 0.3032 0.2707

p.adjust(theps, method = "holm")
[1] 1 1 1 1 1

## To show what happens with "significant" p-values
theps_new <- c(theps, .01)
p.adjust(theps_new, method = "holm")
[1] 1.00 1.00 1.00 1.00 1.00 0.06

SLIDE 52

Multiple hypothesis testing: Multiple Treatment Arms I

The same kind of problem can happen when the question is about the differential effects of a multi-armed treatment. With 5 arms, “the effect of arm 1” could mean many different things: “Is the average potential outcome under arm 1 bigger than arm 2?”, “Are the potential outcomes of arm 1 bigger than the average potential outcomes of all of the other arms?” If we just focus on pairwise comparisons across arms, we could have ((5 × 5) − 5)/2 = 10 unique tests!

   ID     y0_1     y0_2     y0_3     y0_4     y0_5  Z  Z_cond_prob    Y_Z_1    Y_Z_2    Y_Z_3
1 001   0.1932  0.36583  0.54629 -0.62635 -0.12498  3          0.2   0.1932  0.36583  0.54629
2 002  -0.4347  0.93142 -2.23268  1.30904  1.07813  3          0.2  -0.4347  0.93142 -2.23268
3 003   0.9133 -1.90677  0.28828 -0.13298 -1.26111  4          0.2   0.9133 -1.90677  0.28828
4 004   1.7934  0.05199  0.54383 -1.60770 -0.45215  5          0.2   1.7934  0.05199  0.54383
5 005   0.9966 -0.84850 -1.19198 -1.30764 -1.02741  2          0.2   0.9966 -0.84850 -1.19198
6 006   1.1075 -0.36817 -0.01769 -0.04515  0.06826  3          0.2   1.1075 -0.36817 -0.01769

SLIDE 53

Multiple hypothesis testing: Multiple Treatment Arms I

Here are the 10 pairwise tests with and without adjustment for multiple testing. Notice how the one "significant" result changes with adjustment.

   Comparison        Stat   p.value  p.adjust
1  1 - 2 = 0        1.435     0.231    1.0000
2  1 - 3 = 0       0.8931    0.3447    1.0000
3  1 - 4 = 0        6.404   0.01139    0.1139
4  1 - 5 = 0       0.8216    0.3647    1.0000
5  2 - 3 = 0      0.05882    0.8084    1.0000
6  2 - 4 = 0        2.641    0.1041    0.7287
7  2 - 5 = 0       0.0437    0.8344    1.0000
8  3 - 4 = 0        3.232   0.07222    0.6500
9  3 - 5 = 0    0.0003464    0.9852    1.0000
10 4 - 5 = 0        2.899   0.08861    0.7089
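A hedged sketch of how such pairwise comparisons and their adjustment might be computed (assumptions: a data frame datArms with an observed outcome Y and a five-valued arm indicator Z; the original table was produced with different code and a different test statistic):

datArms$Zf <- factor(datArms$Z)
pairs <- combn(levels(datArms$Zf), 2)  # the 10 unique pairs of arms
ps <- apply(pairs, 2, function(gr) {
  sub <- droplevels(subset(datArms, Zf %in% gr))
  t.test(Y ~ Zf, data = sub)$p.value   # one pairwise comparison
})
data.frame(
  comparison = paste(pairs[1, ], "-", pairs[2, ]),
  p.value = ps,
  p.adjust = p.adjust(ps, method = "holm")  # adjust across all 10 tests
)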

SLIDE 54

Multiple hypothesis testing: Multiple Treatment Arms

Here is an illustration of four different approaches to testing hypotheses with multiple arms: (1) do all of the pairwise tests and choose the best one (a bad idea); (2) do all of the pairwise tests and choose the best one after adjusting the p-values for multiple testing (a fine idea, but one with very low statistical power); (3) test the hypothesis of no relationship between any arm and the outcome (a fine idea); (4) choose one arm to focus on in advance (a fine idea).

estimator_label                               fwer
Choose best pairwise test                    0.238
Choose best pairwise test after adjustment   0.028
Overall test                                 0.034
t-test Z1 vs all                             0.018

SLIDE 55

Summary

◮ Multiple testing problems can arise from multiple outcomes or multiple treatments (or multiple moderators/interaction terms).

◮ Procedures for producing hypothesis tests and confidence intervals can involve error. Ordinary practice controls the error rates for a single test (or a single confidence interval), but multiple tests require extra work to ensure that error rates are controlled.

◮ The loss of power arising from adjustment approaches encourages us to consider what questions we want to ask of the data. For example, if we want to know whether the treatment had any effect, then a joint or omnibus test of multiple outcomes will increase our statistical power without requiring adjustment.

SLIDE 56

References

Benjamini, Yoav, and Yosef Hochberg. 1995. "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society, Series B 57 (1): 289–300. http://www.jstor.org/stable/2346101.

Bowers, Jake, Bruce A. Desmarais, Mark Fredrickson, Nahomi Ichino, Hsuan-Wei Lee, and Simi Wang. 2018. "Models, Methods and Network Topology: Experimental Design for the Study of Interference." Social Networks 54: 196–208.

Bowers, Jake, Mark Fredrickson, and Peter M. Aronow. 2016. "Research Note: A More Powerful Test Statistic for Reasoning About Interference Between Units." Political Analysis 24 (3): 395–403.

Bowers, Jake, Mark M. Fredrickson, and Costas Panagopoulos. 2013. "Reasoning About Interference Between Units: A General Framework." Political Analysis 21 (1): 97–124.

Caughey, Devin, Allan Dafoe, and Jason Seawright. 2017. "Nonparametric Combination (NPC): A Framework for Testing Elaborate Theories."
