Gov 2000: 6. Hypothesis Testing
Matthew Blackwell
October 11, 2016
1. Hypothesis Testing Examples
2. Hypothesis Test Nomenclature
3. Conducting Hypothesis Tests
4. p-values
5. Power Analyses
6. Exact Inference*
7. Wrap up
population parameter, drawing on our knowledge of probability.
values of the parameter in the confidence interval.
about the data.
the term!
Your advisor asks you to grab a tea with milk for him before your meeting and he says that he prefers tea poured before the milk. You stop by Darwin’s and ask for a tea with milk. When you bring it to your advisor, he complains that it was prepared milk-first.
devise a test:
▶ Prepare 8 cups of tea, 4 milk-first, 4 tea-first
▶ Present cups to advisor in a random order
▶ Ask advisor to pick which 4 of the 8 were milk-first.
correct if she were guessing randomly?
▶ Only one way to choose all 4 correct cups.
▶ But 70 ways of choosing 4 cups among 8.
▶ Choosing at random ≈ picking each of these 70 with equal probability.
Probability of choosing all 4 correctly at random: 1/70 ≈ 0.014, or 1.4%.
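The counting argument above can be checked directly in R. This is a minimal sketch; the variable names `n_ways` and `p_all_correct` are illustrative and not from the slides.

```r
# Number of distinct ways to choose 4 cups out of 8
n_ways <- choose(8, 4)

# Under pure guessing each of these sets is equally likely,
# and only one of them is the fully correct set
p_all_correct <- 1 / n_ways

n_ways
round(p_all_correct, 3)
```

The same number comes out of the exact-inference logic later in the deck: no large-sample approximation is needed, only counting.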
Another testing example
load("../data/gerber_green_larimer.RData")
social$voted <- 1 * (social$voted == "Yes")
neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
neigh.mean - contr.mean
## [1] 0.0634
due to random chance.
treatment effect at all?
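and population variance $\sigma_y^2$
and population variance $\sigma_x^2$
turnout: $\mathbb{E}[Y_i] - \mathbb{E}[X_i] = \mu_y - \mu_x$
$\hat{D}_n = \bar{Y}_{n_y} - \bar{X}_{n_x}$
$\hat{D}_n$ with: $\widehat{\text{se}}[\hat{D}_n] = \sqrt{\frac{S_y^2}{n_y} + \frac{S_x^2}{n_x}}$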
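As a rough illustration of the estimator and its standard error, here is a sketch on simulated data. The data, the seed, the group sizes, and the helper name `diff_in_means` are all assumptions made for this example; none of them come from the Gerber, Green, and Larimer data used in the slides.

```r
set.seed(2016)  # arbitrary seed so the sketch is reproducible

# Hypothetical binary turnout outcomes (simulated, NOT the real data)
y <- rbinom(1000, size = 1, prob = 0.38)  # treated group
x <- rbinom(1000, size = 1, prob = 0.32)  # control group

# Difference in sample means and its estimated standard error
diff_in_means <- function(y, x) {
  d_hat <- mean(y) - mean(x)
  se_hat <- sqrt(var(y) / length(y) + var(x) / length(x))
  c(estimate = d_hat, se = se_hat)
}

res <- diff_in_means(y, x)
res
```

Dividing `res["estimate"]` by `res["se"]` gives the test statistic for the null of no difference.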
about the population distribution.
▶ Assume we know (part of) the true DGP.
▶ Use tools of probability to see what types of data we should see under this assumption.
▶ Compare our observed data to this thought experiment.
▶ We will “reject” the assumed DGP if the data are too unusual under it.
parameters.
▶ Does social pressure induce higher voter turnout? (mean turnout higher in social pressure group compared to Civic Duty group?)
▶ Do daughters cause politicians to be more liberal on women’s issues? (voting behavior different among members of Congress with daughters?)
▶ Do treaties constrain countries? (behavior different among treaty signers?)
▶ Is the share of Hillary Clinton supporters more than 50%?
▶ Are traits of treatment and control groups different?
value for a population parameter.
▶ This is usually “no effect/difference/relationship.”
▶ We denote this hypothesis as $H_0: \theta = \theta_0$.
▶ $H_0$: Social pressure doesn’t affect turnout ($H_0: \mu_y - \mu_x = 0$)
hypothesis is the research claim we are interested in supporting.
▶ Usually, “there is a relationship/difference/effect.”
▶ We denote this as $H_a: \theta \neq \theta_0$.
▶ $H_a$: Social pressure affects turnout ($H_a: \mu_y - \mu_x \neq 0$)
hypothesis based on the data we observe.
▶ Will help us adjudicate between the null and the alternative.
▶ Typically: larger values of $T_n$ ⇝ null less plausible.
▶ A test statistic is a r.v.
$T$ under the null.
▶ We’ll write its probabilities as $\mathbb{P}_0(T_n \leq t)$.
means has a standard normal distribution in large samples: $T_n = \frac{\hat{D}_n - (\mu_y - \mu_x)}{\widehat{\text{se}}[\hat{D}_n]} \overset{d}{\to} N(0, 1)$
$T_n = \frac{\hat{D}_n}{\widehat{\text{se}}[\hat{D}_n]} \overset{d}{\to} N(0, 1)$
population diff-in-means is not plausible.
for which we reject the null.
▶ These are the areas that indicate that there is evidence against the null.
▶ $H_0: \mu_y - \mu_x = 0$ and $H_a: \mu_y - \mu_x \neq 0$
▶ Implies that $T_n \gg 0$ or $T_n \ll 0$ will be evidence against the null
▶ Rejection regions: $|T_n| > c$ for some value $c$
Type I errors
A Type I error is when we reject the null hypothesis when it is in fact true.
Type II errors
A Type II error is when we fail to reject the null hypothesis when it is false.
discerning.
               $H_0$ True      $H_0$ False
Retain $H_0$   Awesome!        Type II error
Reject $H_0$   Type I error    Good stuff!
a Type I error.
▶ With two-sided alternative, we reject when $|T_n| > c$
▶ Size of test then is: $\mathbb{P}_0(|T_n| > c) = \alpha$
▶ Convention in social sciences is $\alpha = 0.05$, but nothing magical there
▶ Particle physicists at CERN use $\alpha \approx 1/1{,}750{,}000$
▶ Lower values of $\alpha$ guard against “flukes” but increase barriers to discovery
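One way to see what the size of a test means is to simulate many data sets in which the null is true and count how often a 0.05-level test rejects. The sketch below does this; the sample size, the common turnout rate of 0.3, the seed, and the number of simulations are all arbitrary choices for illustration.

```r
set.seed(6)      # arbitrary seed for reproducibility
n_sims <- 2000   # number of simulated experiments
n <- 200         # hypothetical sample size per group

reject <- replicate(n_sims, {
  # Both groups share the same mean, so the null of no difference is true
  y <- rbinom(n, size = 1, prob = 0.3)
  x <- rbinom(n, size = 1, prob = 0.3)
  se_hat <- sqrt(var(y) / n + var(x) / n)
  t_n <- (mean(y) - mean(x)) / se_hat
  # Two-sided test at alpha = 0.05
  abs(t_n) > qnorm(0.975)
})

# Share of simulated experiments with a (false) rejection
false_rejection_rate <- mean(reject)
false_rejection_rate
```

Because the normal approximation is only approximate at finite sample sizes, the simulated rate should land near 0.05 rather than exactly on it.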
[Figure: density of $T$ under the null hypothesis, $\mathbb{P}_0(T)$, with a central “Retain” region and “Reject” regions in both tails beyond $c$.]
the rejection region only 5% of the time.
▶ ⇝ false rejection of the null only 5% of the time.
▶ Can find $c$ based on the null distribution being ≈ standard normal!
[Figure: density of $T$ under the null hypothesis, $\mathbb{P}_0(T)$; “Retain” region between $-c = -z_{\alpha/2}$ and $c = z_{\alpha/2}$, with “Reject” regions of area $\alpha/2$ in each tail.]
$\mathbb{P}_0(T_n < -z_{\alpha/2}) = \mathbb{P}_0(T_n > z_{\alpha/2}) = \alpha/2$
▶ If $\alpha = 0.05$ ⇝ $z_{\alpha/2}$ = qnorm(1 - 0.05/2) = 1.96
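The same `qnorm()` call gives the two-sided critical value for any level. A small sketch, where the particular set of levels is just an illustrative choice:

```r
# Two-sided critical values z_{alpha/2} for a few common levels
alphas <- c(0.10, 0.05, 0.01)
crit <- qnorm(1 - alphas / 2)
round(crit, 2)

# Check: the two tails beyond +/- crit together have area alpha
tail_area <- 2 * pnorm(-crit)
tail_area
```

Smaller levels push the critical value outward, which shrinks the rejection region.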
$\hat{D}_n / \widehat{\text{se}}[\hat{D}_n]$
neigh_var <- var(social$voted[social$treatment == "Neighbors"])
neigh_n <- 38201
civic_var <- var(social$voted[social$treatment == "Civic Duty"])
civic_n <- 38218
se_diff <- sqrt(neigh_var/neigh_n + civic_var/civic_n)
## Calculate test statistic
(0.378 - 0.315)/se_diff
## [1] 18.3
[Figure: density of $T$ under the null hypothesis, $\mathbb{P}_0(T)$, with “Retain” and “Reject” regions; the observed $T = 18.3$ lies in the upper rejection region.]
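$\hat{\theta}$ for parameter $\theta$.
where $T_n = \frac{\hat{\theta} - \theta_0}{\widehat{\text{se}}[\hat{\theta}]}$
▶ For standard normal $Z$, find $z_{\alpha/2}$ such that $\mathbb{P}(Z \leq z_{\alpha/2}) = 1 - \alpha/2$.
$\mathbb{P}_0(|T_n| > z_{\alpha/2}) \overset{p}{\to} \alpha$.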
$\hat{D}_n \pm 1.96 \times \widehat{\text{se}}$
all null hypotheses that we would not reject with an $\alpha$-level test.
▶ Construct a 95% CI $(a, b)$ for $\mu_y - \mu_x$.
▶ If $0 \in (a, b)$ ⇝ cannot reject $H_0: \mu_y - \mu_x = 0$ at $\alpha = 0.05$
▶ If $0 \notin (a, b)$ ⇝ reject $H_0: \mu_y - \mu_x = 0$ at $\alpha = 0.05$
reject them as null hypotheses.
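The duality between confidence intervals and tests can be sketched with the group means and sample sizes reported in the slides. One assumption is added here because the raw data are not available: each sample variance is approximated by p(1 - p), which is nearly identical to `var()` for a binary outcome at these sample sizes.

```r
# Group means and sample sizes as reported in the slides
neigh_mean <- 0.378; neigh_n <- 38201
civic_mean <- 0.315; civic_n <- 38218

# Assumption: approximate each sample variance by p(1 - p) (binary outcome)
se_diff <- sqrt(neigh_mean * (1 - neigh_mean) / neigh_n +
                civic_mean * (1 - civic_mean) / civic_n)
d_hat <- neigh_mean - civic_mean

# 95% confidence interval for the difference in means
ci <- d_hat + c(-1, 1) * qnorm(0.975) * se_diff

# Decision from the test statistic vs. decision from the interval
t_n <- d_hat / se_diff
reject_by_test <- abs(t_n) > qnorm(0.975)
reject_by_ci <- !(ci[1] <= 0 && 0 <= ci[2])

ci
c(test = reject_by_test, ci = reject_by_ci)
```

The two decisions always agree, because zero is outside the interval exactly when the estimate is more than 1.96 standard errors from zero.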
hypothesis that only goes in one direction.
▶ The social pressure effect is positive ($H_a: \mu_y - \mu_x > 0$)
doubt on the null hypothesis.
▶ Rejection region is only in one tail: $T_n > c$, with $c$ adjusted downward relative to two-sided test with the same level.
[Figure: density of $T$ under the null hypothesis, $\mathbb{P}_0(T)$; one-sided test with “Retain” region below $c = 1.64$ and a “Reject” region of area 0.05 in the upper tail.]
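To make the downward adjustment concrete, the sketch below compares the one-sided and two-sided critical values at the same level and applies both to a test statistic. The value `t_obs = 1.8` is hypothetical, chosen only because it falls between the two cutoffs.

```r
alpha <- 0.05
c_one_sided <- qnorm(1 - alpha)      # all of alpha in the upper tail
c_two_sided <- qnorm(1 - alpha / 2)  # alpha split across both tails

t_obs <- 1.8  # hypothetical observed test statistic

reject_one_sided <- t_obs > c_one_sided
reject_two_sided <- abs(t_obs) > c_two_sided

round(c(one_sided = c_one_sided, two_sided = c_two_sided), 3)
c(one_sided = reject_one_sided, two_sided = reject_two_sided)
```

A statistic in this range is rejected by the one-sided test but retained by the two-sided one, which is why the direction of the alternative should be fixed before looking at the data.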
informative.
($H_0: \mu_y - \mu_x = 0$) at $\alpha = 0.05$.
p-value
The p-value is the smallest value $\alpha$ such that an $\alpha$-level test would reject the null hypothesis.
significant at level $\alpha$.
▶ Ex: if p-value is 0.03, then we can reject at $\alpha = 0.05$.
$T_n = t_{\text{obs}}$, the p-value is the probability (under $H_0$) of
the one observed: $\mathbb{P}_0(|T_n| > t_{\text{obs}})$
against the null.
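The two descriptions of the p-value (a tail probability, and the smallest level at which we would reject) can be checked against each other. In this sketch `t_obs = 2.17` is a hypothetical test statistic, picked so that the p-value is close to the 0.03 used in the example above.

```r
t_obs <- 2.17  # hypothetical observed test statistic

# Two-sided p-value under a standard normal null distribution
p_value <- 2 * pnorm(-abs(t_obs))

# Compare two decision rules across several levels
alphas <- c(0.10, 0.05, 0.01)
reject_by_critical_value <- abs(t_obs) > qnorm(1 - alphas / 2)
reject_by_p_value <- p_value < alphas

round(p_value, 3)
rbind(alpha = alphas,
      by_critical_value = reject_by_critical_value,
      by_p_value = reject_by_p_value)
```

Both rules give the same answer at every level: reject at 0.10 and 0.05, retain at 0.01.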
more extreme if there were no treatment effect? $\mathbb{P}_0(|T_n| > 18.5) = \mathbb{P}_0(T_n > 18.5) + \mathbb{P}_0(T_n < -18.5) = 2 \times \mathbb{P}_0(T_n < -18.5)$
2 * pnorm(-18.5)
## [1] 2.06e-76
p-values are not:
▶ An indication of a large substantive effect
▶ The probability that the null hypothesis is false
▶ The probability that the alternative hypothesis is true
▶ Clustering of p-values at 0.049.
▶ False discovery rates actually quite high (p-value fallacy).
actually make more sense.
▶ CIs allow easy assessment of substantive and statistical significance.
38,000 for each treatment condition?
you think might be the true treatment effect:
▶ Small effect sizes (half percentage point) will require huge $n$
▶ Large effect sizes (10 percentage points) will require smaller $n$
rejects the null.
▶ Probability that we reject given some specific value of the parameter: $\mathbb{P}_\theta(|T| > c)$
▶ Power = 1 − $\mathbb{P}$(Type II error)
▶ Better tests = higher power.
world:
▶ Null is true (no treatment effect)
▶ Null is false (there is a treatment effect), but test had low power.
discrimination in hiring.
▶ Null hypothesis is that hiring rates for white and black people are equal, $H_0: \mu_w - \mu_b = 0$
▶ You sample 10 hiring records of each race, conduct hypothesis test and fail to reject null.
What’s the problem?
power analysis.
▶ Calculate how likely we are to reject different possible treatment effects at different sample sizes.
▶ Can be done before the experiment: which effects will I be able to detect with high probability at my $n$?
▶ Pick some hypothetical effect size, $\mu_y - \mu_x = 0.05$
▶ Calculate the distribution of $T$ under that effect size.
▶ Calculate the probability of rejecting the null under that distribution.
▶ Repeat for different effect sizes.
sure you have a high probability of rejecting the null if the true effect is $\mu_y - \mu_x = 0.05$.
500 mailers (250 for each type).
▶ Assume we know that $\sigma_y^2 = \sigma_x^2 = 0.2$
▶ Implies $\mathbb{V}[\hat{D}_n] = 0.2/250 + 0.2/250 = 0.0016$.
distribution of the estimator under the proposed effect size: $\hat{D}_n \approx N(0.05, 0.0016)$
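$|T| = \left| \frac{\hat{D}_n - 0}{\widehat{\text{se}}[\hat{D}_n]} \right| > 1.96 \iff |\hat{D}_n| > 1.96 \times \widehat{\text{se}}[\hat{D}_n]$
$\mathbb{V}[\hat{D}_n] = 0.0016$ then we reject when: $\{\hat{D}_n < -1.96 \times \sqrt{0.0016}\} \cup \{\hat{D}_n > 1.96 \times \sqrt{0.0016}\}$
distribution we just derived! $\mathbb{P}(\hat{D}_n < -1.96 \times \sqrt{0.0016}) + \mathbb{P}(\hat{D}_n > 1.96 \times \sqrt{0.0016})$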
$\hat{D}_n \approx N(0.05, 0.0016)$:
se <- sqrt(0.2/250 + 0.2/250)
pnorm(-1.96 * se, mean = 0.05, sd = se) +
  pnorm(1.96 * se, mean = 0.05, sd = se, lower.tail = FALSE)
## [1] 0.24
increase in voter turnout, then we would be able to reject the null of no effect about a quarter of the time.
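The same calculation can be wrapped in a function and repeated over several hypothesized effect sizes. This is a sketch under the slides' assumptions (known variances of 0.2, 250 mailers per group, normal approximation); the function name `power_two_sided` and the grid of effects are illustrative choices.

```r
# Power of a two-sided alpha-level test when the estimator is
# approximately N(effect, se^2)
power_two_sided <- function(effect, se, alpha = 0.05) {
  crit <- qnorm(1 - alpha / 2) * se  # reject when |D_hat| > crit
  pnorm(-crit, mean = effect, sd = se) +
    pnorm(crit, mean = effect, sd = se, lower.tail = FALSE)
}

se <- sqrt(0.2 / 250 + 0.2 / 250)
effects <- c(-0.2, -0.1, -0.05, 0, 0.05, 0.1, 0.2)
power <- power_two_sided(effects, se)

round(rbind(effect = effects, power = power), 3)
```

Power is symmetric in the sign of the effect, and at an effect of exactly zero it equals the size of the test, 0.05.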
[Figure sequence: distribution of $T$ under each assumed treatment effect, with “Retain” and “Reject” regions marked.]
Assumed treatment effect = -0.2 and power = 0.999.
Assumed treatment effect = -0.1 and power = 0.705.
Assumed treatment effect = -0.05 and power = 0.24.
Assumed treatment effect = 0 and power = 0.05.
Assumed treatment effect = 0.05 and power = 0.24.
Assumed treatment effect = 0.1 and power = 0.705.
Assumed treatment effect = 0.2 and power = 0.999.
plot the resulting power curve:
▶ $n = 500$ (blue), 1000 (red), 10000 (black)
[Figure: power curves; x-axis: hypothesized effect size, y-axis: power.]
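The numbers behind power curves like these can be tabulated by letting the total sample size vary. The sketch below assumes, as in the worked example, equal group sizes and a known variance of 0.2 in each group; `power_by_n` and the effect-size grid are hypothetical names and choices for this illustration.

```r
# Power as a function of effect size and total sample size,
# assuming two equal-sized groups with known variance sigma2
power_by_n <- function(effect, n_total, sigma2 = 0.2, alpha = 0.05) {
  n_group <- n_total / 2
  se <- sqrt(sigma2 / n_group + sigma2 / n_group)
  crit <- qnorm(1 - alpha / 2) * se
  pnorm(-crit, mean = effect, sd = se) +
    pnorm(crit, mean = effect, sd = se, lower.tail = FALSE)
}

effects <- seq(0, 0.2, by = 0.05)
n_totals <- c(500, 1000, 10000)

# Rows: hypothesized effect sizes; columns: total sample sizes
power_table <- sapply(n_totals, function(n) power_by_n(effects, n))
dimnames(power_table) <- list(effect = as.character(effects),
                              n = as.character(n_totals))
round(power_table, 3)
```

Each column is one curve: at any nonzero effect, power rises with the sample size, while at an effect of zero it stays at 0.05.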
inferences at any sample size?
▶ Exact means that we know or can figure out the distribution of a statistic without relying on an approximation.
▶ $Y_i$ are i.i.d. with $\mathbb{E}[Y_i] = \mu < \infty$ and $\mathbb{V}[Y_i] = \sigma^2 < \infty$
▶ Relied on large $n$ to get distribution of $\bar{Y}_n$ (CLT)
i.i.d. samples from $N(\mu, \sigma^2)$
▶ Stronger assumptions ⇝ learn more with lower $n$
▶ Model dependence: If the model is wrong ($Y_i$ are not normal), inferences will be wrong!
$T_n = \frac{\bar{Y}_n - \mu}{S_n/\sqrt{n}} \overset{d}{\to} N(0, 1)$
following for any sample size: $T_n = \frac{\bar{Y}_n - \mu}{S_n/\sqrt{n}} \sim t_{n-1}$
the t distribution) with $n - 1$ degrees of freedom (df).
▶ Family of distributions with parameter df.
pen name, Student.
freedom, which here is dictated by the sample size.
▶ As sample sizes increase, tends toward the $N(0, 1)$
▶ Similar shape to the Normal, but with fatter tails.
variance of estimating the SE.
[Figure: densities $f(x)$ of the Normal and the t (df = 5) distributions.]
$T_n = \frac{\bar{Y}_n - \mu_0}{S_n/\sqrt{n}}$
distribution in place of the normal
▶ Testing: finding critical values $t_{n-1,\alpha/2}$ such that $\mathbb{P}_0(T \leq t_{n-1,\alpha/2}) = 1 - \alpha/2$
▶ CIs: use $t_{n-1,\alpha/2}$ in place of z-values: $\bar{Y}_n \pm t_{n-1,\alpha/2} \times \frac{S_n}{\sqrt{n}}$
▶ The $t$ distribution has fatter tails ⇝ $t_{n-1,\alpha/2} > z_{\alpha/2}$
▶ ⇝ wider CIs, smaller rejection regions
[Figure: density $f(x)$ with area 0.975 to the left of an unknown critical value, t = ?]
qt(0.975, df = 6 - 1)
## [1] 2.57
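To see how the t critical value approaches the normal one as the sample grows, the sketch below evaluates `qt()` over a few degrees of freedom. The particular values of df are arbitrary illustrative choices.

```r
# 97.5th percentile of the t distribution for increasing df
dfs <- c(5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = dfs)

# The corresponding standard normal critical value
z_crit <- qnorm(0.975)

round(rbind(df = dfs, t_crit = t_crit, z_crit = z_crit), 3)
```

The t critical value is always larger than 1.96 and shrinks toward it as df increases, which is why t-based intervals are wider in small samples.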
▶ Statistical thought experiments.
▶ Allow us to test specific hypotheses about parameters.
▶ Summarize evidence against the null in this data set.
▶ Can be misleading, better to use confidence intervals.
tests.
assumptions.
winner of all 8 presidential elections since 1984.
▶ Doesn’t use any polls, just 13 true/false questions.
▶ Ex: “Challenger charisma”
▶ This year he’s trolling liberals: predicts Trump win.
random guessing?
▶ If he were randomly choosing between the two candidates in each election, he’d be flipping 8 coins with probability 0.5.
▶ ⇝ number of correct predictions is Binomial(8, 0.5)
guessing at random?
dbinom(x = 8, size = 8, prob = 0.5)
## [1] 0.00391
[Figure: Binomial(8, 0.5) probabilities; x-axis: # of Correct Predictions, y-axis: Probability.]
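The same exact probability can be obtained from R's built-in exact binomial test. This is a sketch rather than the slides' own method: treating "8 or more correct out of 8" as a one-sided alternative is an assumption made here for illustration.

```r
# Exact one-sided test of H0: success probability = 0.5,
# having observed 8 correct predictions out of 8
p_exact <- binom.test(x = 8, n = 8, p = 0.5,
                      alternative = "greater")$p.value

# Direct calculation: probability of 8 or more correct under the null
p_direct <- sum(dbinom(8:8, size = 8, prob = 0.5))

c(binom_test = p_exact, direct = p_direct)
```

Both numbers equal 0.5^8, so no normal approximation is involved; this is exact inference in the same sense as the tea-tasting example.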