Gov 2000: 6. Hypothesis Testing
Matthew Blackwell
October 11, 2016

1. Hypothesis Testing Examples
2. Hypothesis Test Nomenclature
3. Conducting Hypothesis Tests
4. p-values
5. Power Analyses
6. Exact Inference*
7. Wrap up

Where are we? Where are we going?
• Last few weeks = how to produce a best estimate of some population parameter, drawing on our knowledge of probability.
• Also learned how to derive an estimated range of plausible values of the parameter in the confidence interval.
• Now: how to use our estimates to test a particular hypothesis about the data.
• We’ll draw heavily on our probability knowledge from earlier in the term!

1/ Hypothesis Testing Examples

The lady tasting tea
• Remember the setup:
  Your advisor asks you to grab a tea with milk for him before your meeting and he says that he prefers tea poured before the milk. You stop by Darwin’s and ask for a tea with milk. When you bring it to your advisor, he complains that it was prepared milk-first.
• You are skeptical that he can really tell the difference, so you devise a test:
  ▶ Prepare 8 cups of tea, 4 milk-first, 4 tea-first
  ▶ Present cups to advisor in a random order
  ▶ Ask advisor to pick which 4 of the 8 were milk-first.

Assuming we know the truth
• Advisor picks out all 4 milk-first cups correctly!
• Statistical thought experiment: how often would he get all 4 correct if he were guessing randomly?
  ▶ Only one way to choose all 4 correct cups.
  ▶ But 70 ways of choosing 4 cups among 8.
  ▶ Choosing at random ≈ picking each of these 70 with equal probability.
• Chance of guessing all 4 correct is 1/70 ≈ 0.014, or 1.4%.
• ⇝ the guessing at random hypothesis might be implausible.

Another testing example
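The counting argument in the tea example can be checked directly in R. This is a minimal sketch; the variable names are illustrative and not from the slides, and the hypergeometric call is just one way to express "number of correct cups when guessing at random."

```r
# Number of ways to choose 4 cups out of 8
n_ways <- choose(8, 4)

# Only one of those choices gets all 4 milk-first cups right
p_all_correct <- 1 / n_ways

# Same probability from the hypergeometric distribution:
# 4 correct draws, from 4 milk-first and 4 tea-first cups, drawing 4 cups
p_hyper <- dhyper(4, m = 4, n = 4, k = 4)

# Full distribution of the number of correct cups under random guessing
null_dist <- dhyper(0:4, m = 4, n = 4, k = 4)

print(n_ways)
print(round(p_all_correct, 3))
print(round(null_dist, 3))
```

The vector `null_dist` is the reference distribution of "number of cups correct" under pure guessing, which is the kind of object the later slides call a null distribution.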

Social pressure effect

    # Load the Gerber, Green, and Larimer data (provides the `social` data frame)
    load("../data/gerber_green_larimer.RData")
    # Recode turnout as a 0/1 indicator
    social$voted <- 1 * (social$voted == "Yes")
    # Mean turnout in the "Neighbors" (treated) and "Civic Duty" (comparison) groups
    neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
    contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
    neigh.mean - contr.mean
    ## [1] 0.0634

• Treatment effect of 6.341 percentage points.
• But we know that the estimator varies from sample to sample due to random chance.
• Could this happen by random chance if there was no treatment effect at all?

Review of the difference in means
• Treated group $Y_1, Y_2, \ldots, Y_{n_y}$ i.i.d. with population mean $\mu_y$ and population variance $\sigma^2_y$
• Control group $X_1, X_2, \ldots, X_{n_x}$ i.i.d. with population mean $\mu_x$ and population variance $\sigma^2_x$
• Quantity of interest: population differences in average turnout: $\mathbb{E}[Y_i] - \mathbb{E}[X_i] = \mu_y - \mu_x$
• Estimator: sample difference in means: $\hat{D}_n = \bar{Y}_{n_y} - \bar{X}_{n_x}$
• We estimated the standard error of $\hat{D}_n$ with: $\widehat{\text{se}}[\hat{D}_n] = \sqrt{\frac{S^2_y}{n_y} + \frac{S^2_x}{n_x}}$
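As a small illustration of the estimator and its standard error, the sketch below wraps both formulas in a helper. The function name `diff_in_means` and the toy 0/1 vectors are hypothetical and are not the course data.

```r
# Hypothetical helper: difference in means and its estimated standard error
diff_in_means <- function(y, x) {
  d_hat <- mean(y) - mean(x)                                # D-hat = Ybar - Xbar
  se_hat <- sqrt(var(y) / length(y) + var(x) / length(x))   # sqrt(S_y^2/n_y + S_x^2/n_x)
  c(estimate = d_hat, se = se_hat)
}

# Toy data (made up for illustration): 1 = voted, 0 = did not vote
y_toy <- c(1, 0, 1, 1, 0, 1, 0, 1)   # treated group
x_toy <- c(0, 0, 1, 0, 1, 0, 0, 0)   # control group

est <- diff_in_means(y_toy, x_toy)
print(est)
```

Note that `var()` in R uses the $n - 1$ denominator, which matches the usual sample variance $S^2$.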

2/ Hypothesis Test Nomenclature

What is a hypothesis test?
• A hypothesis test is an evaluation of a particular hypothesis about the population distribution.
• Statistical thought experiments:
  ▶ Assume we know (part of) the true data-generating process (DGP).
  ▶ Use tools of probability to see what types of data we should see under this assumption.
  ▶ Compare our observed data to this thought experiment.
• Statistical proof by contradiction:
  ▶ We will “reject” the assumed DGP if the data is too unusual under it.

What is a hypothesis?
• Definition: A hypothesis is just a statement about population parameters.
• We might have hypotheses about causal inferences:
  ▶ Does social pressure induce higher voter turnout? (mean turnout higher in social pressure group compared to Civic Duty group?)
  ▶ Do daughters cause politicians to be more liberal on women’s issues? (voting behavior different among members of Congress with daughters?)
  ▶ Do treaties constrain countries? (behavior different among treaty signers?)
• We might also have hypotheses about other parameters:
  ▶ Is the share of Hillary Clinton supporters more than 50%?
  ▶ Are traits of treatment and control groups different?

Null and alternative hypotheses
• Definition: The null hypothesis is a proposed, conservative value for a population parameter.
  ▶ This is usually “no effect/difference/relationship.”
  ▶ We denote this hypothesis as $H_0: \theta = \theta_0$.
  ▶ $H_0$: Social pressure doesn’t affect turnout ($H_0: \mu_y - \mu_x = 0$)
• Definition: The alternative hypothesis for a given null hypothesis is the research claim we are interested in supporting.
  ▶ Usually, “there is a relationship/difference/effect.”
  ▶ We denote this as $H_a: \theta \neq \theta_0$.
  ▶ $H_a$: Social pressure affects turnout ($H_a: \mu_y - \mu_x \neq 0$)
• Always mutually exclusive

General framework
• A hypothesis test chooses whether or not to reject the null hypothesis based on the data we observe.
• Rejection based on a test statistic, $T_n = T(Y_1, \ldots, Y_n)$.
  ▶ Will help us adjudicate between the null and the alternative.
  ▶ Typically: larger values of $T_n$ ⇝ null less plausible.
  ▶ A test statistic is a random variable (r.v.).
• Definition: The null/reference distribution is the distribution of $T_n$ under the null.
  ▶ We’ll write its probabilities as $\mathbb{P}_0(T_n \leq t)$.

Test statistic example
• By the CLT, we know that the standardized difference in means has a standard normal distribution in large samples:
  $T_n = \frac{\hat{D}_n - (\mu_y - \mu_x)}{\widehat{\text{se}}[\hat{D}_n]} \xrightarrow{d} N(0, 1)$
• Under the null hypothesis of $H_0: \mu_y - \mu_x = 0$, then we have
  $T_n = \frac{\hat{D}_n}{\widehat{\text{se}}[\hat{D}_n]} \xrightarrow{d} N(0, 1)$
• If $T_n$ is very far from 0 ⇝ large sample diff-in-means ⇝ no population diff-in-means is not plausible.
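One way to see the large-sample claim is to simulate the test statistic many times in a world where the null is true. The sketch below is only illustrative: the group sizes (500 each), the common turnout rate (0.3), the seed, and the helper name `t_stat` are all assumptions chosen for the example.

```r
# Hypothetical helper: standardized difference in means under H0: mu_y - mu_x = 0
t_stat <- function(y, x) {
  d_hat <- mean(y) - mean(x)
  se_hat <- sqrt(var(y) / length(y) + var(x) / length(x))
  d_hat / se_hat
}

set.seed(2000)

# Draw many samples in which both groups share the same turnout rate,
# so the null hypothesis of no difference holds by construction
null_draws <- replicate(5000, {
  y <- rbinom(500, size = 1, prob = 0.3)
  x <- rbinom(500, size = 1, prob = 0.3)
  t_stat(y, x)
})

# If the N(0, 1) approximation is good, these should be near 0, 1, and 0.05
null_mean <- mean(null_draws)
null_sd <- sd(null_draws)
share_beyond_196 <- mean(abs(null_draws) > 1.96)
print(c(null_mean, null_sd, share_beyond_196))
```

A histogram of `null_draws` (for instance with `hist(null_draws, freq = FALSE)`) should look close to the standard normal density.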

Rejection regions
• Definition: The rejection region, $R$, contains the values of $T_n$ for which we reject the null.
  ▶ These are the areas that indicate that there is evidence against the null.
• Two-sided alternative (our focus):
  ▶ $H_0: \mu_y - \mu_x = 0$ and $H_a: \mu_y - \mu_x \neq 0$
  ▶ Implies that $T_n \gg 0$ or $T_n \ll 0$ will be evidence against the null
  ▶ Rejection regions: $|T_n| > c$ for some value $c$
• How to determine these regions?

Type I and Type II errors
Type I errors: A Type I error is when we reject the null hypothesis when it is in fact true.
• We say that the Lady is discerning when she is just guessing.
• A false discovery (very bad, thus type I).
Type II errors: A Type II error is when we fail to reject the null hypothesis when it is false.
• We say that the Lady is just guessing when she is truly discerning.
• An undetected finding (not as bad, thus type II).

Test level/size

               $H_0$ True      $H_0$ False
Retain $H_0$   Awesome!        Type II error
Reject $H_0$   Type I error    Good stuff!

• Definition: The level/size of the test, or $\alpha$, is the probability of a Type I error.
  ▶ With two-sided alternative, we reject when $|T_n| > c$
  ▶ Size of test then is: $\mathbb{P}_0(|T_n| > c) = \alpha$
• Choose a level $\alpha$ based on aversion to false discovery:
  ▶ Convention in social sciences is $\alpha = 0.05$, but nothing magical there
  ▶ Particle physicists at CERN use $\alpha \approx \frac{1}{1{,}750{,}000}$
  ▶ Lower values of $\alpha$ guard against “flukes” but increase barriers to discovery
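When the null distribution is approximately standard normal, the size of a two-sided test with cutoff $c$ is just the two tail areas added together. The sketch below computes it with `pnorm()`; the helper name `test_size` is hypothetical. Reading the CERN figure as a five-standard-deviation two-sided cutoff is an interpretation, not something stated on the slide, though the numbers line up closely.

```r
# Hypothetical helper: size of the two-sided test |T_n| > c under a N(0, 1) null
test_size <- function(c) {
  pnorm(-c) + (1 - pnorm(c))   # lower tail plus upper tail
}

size_196 <- test_size(1.96)    # the conventional cutoff
size_5 <- test_size(5)         # a much stricter cutoff

print(round(size_196, 4))
print(size_5)
print(1 / size_5)              # compare with 1,750,000
```

Raising the cutoff from 1.96 to 5 shrinks the Type I error probability by several orders of magnitude, which is the "guard against flukes" side of the trade-off.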

3/ Conducting Hypothesis Tests

Hypothesis testing procedure
1. Choose null and alternative hypotheses
2. Choose a test statistic, $T_n$
3. Choose a level, $\alpha$
4. Determine rejection region
5. Reject if $T_n$ in rejection region, fail to reject otherwise

Rejection region
[Figure: density $\mathbb{P}_0(T)$ of $T$ under the null hypothesis, with “Reject” regions in the tails beyond $-c$ and $c$ and a “Retain” region between them.]
• What’s the rejection region $|T_n| > c$ if $\alpha = 0.05$?
• Under the null hypothesis of no effect, we want $T_n$ to be in the rejection region only 5% of the time.
  ▶ ⇝ false rejection of the null only 5% of the time.
  ▶ Can find $c$ based on the null distribution being ≈ standard normal!

Determining the rejection region
[Figure: density $\mathbb{P}_0(T)$ of $T$ under the null hypothesis, with “Reject” tails of area $\alpha/2$ each beyond $-c = -z_{\alpha/2}$ and $c = z_{\alpha/2}$, a “Retain” region between them, and area $1 - \alpha/2$ to the left of $z_{\alpha/2}$.]
• Find $z_{\alpha/2}$ such that $\mathbb{P}_0(T_n < -z_{\alpha/2}) = \mathbb{P}_0(T_n > z_{\alpha/2}) = \alpha/2$
• ⇝ find quantile $\mathbb{P}_0(T_n < z_{\alpha/2}) = 1 - \alpha/2$
  ▶ if $\alpha = 0.05$ ⇝ $z_{\alpha/2}$ = qnorm(1-0.05/2) = 1.96
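The same `qnorm()` call gives the cutoff for any level. As a brief sketch, here are the critical values for three common choices of $\alpha$ (the 0.10 and 0.01 levels are added for comparison and do not appear on the slide), along with a check that the two tails together have area $\alpha$.

```r
alpha <- c(0.10, 0.05, 0.01)

# Critical value z_{alpha/2}: the (1 - alpha/2) quantile of the standard normal
crit <- qnorm(1 - alpha / 2)
print(round(crit, 2))   # 1.64 1.96 2.58

# Sanity check: area below -z plus area above z recovers alpha
tail_area <- pnorm(-crit) + (1 - pnorm(crit))
print(tail_area)
```

Smaller $\alpha$ pushes the cutoff further into the tails, so the test statistic has to be more extreme before the null is rejected.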

Final hypothesis test
1. Hypotheses: $H_0: \mu_y - \mu_x = 0$ vs. $H_a: \mu_y - \mu_x \neq 0$
2. Test statistic: $T_n = \hat{D}_n / \widehat{\text{se}}[\hat{D}_n]$
3. Use $\alpha = 0.05$
4. Rejection region is $|T_n| > 1.96$.
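The four steps above can be collected into one small function. This is a minimal sketch under the large-sample normal approximation; the function name `diff_means_test`, the simulated data, and the turnout rates of 0.36 and 0.30 are hypothetical stand-ins, since the course data file is not reproduced here.

```r
# Hypothetical helper: two-sided large-sample test of H0: mu_y - mu_x = 0
diff_means_test <- function(y, x, alpha = 0.05) {
  d_hat <- mean(y) - mean(x)                                # estimate
  se_hat <- sqrt(var(y) / length(y) + var(x) / length(x))   # estimated standard error
  t_n <- d_hat / se_hat                                     # test statistic under H0
  crit <- qnorm(1 - alpha / 2)                              # cutoff for the rejection region
  list(estimate = d_hat, se = se_hat, statistic = t_n,
       critical = crit, reject = abs(t_n) > crit)
}

# Simulated stand-in data (not the Gerber, Green, and Larimer experiment)
set.seed(6)
y <- rbinom(2000, size = 1, prob = 0.36)   # "treated" group
x <- rbinom(2000, size = 1, prob = 0.30)   # "control" group

result <- diff_means_test(y, x)
str(result)
```

With the real data, `y` and `x` would be the `voted` values for the "Neighbors" and "Civic Duty" groups; the decision rule is simply whether `abs(result$statistic)` exceeds 1.96.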
