Comparing two samples
Edwin Leuven
Introduction
We will now study the relationship between 2 variables X and Y
We consider the case when X is binary
◮ men/women, high school/college, city/countryside, treated/untreated
and call X the explanatory variable.
Y can be binary, continuous or have some other cardinality
◮ income, education, health
and call Y the outcome variable.
Introduction – Between group differences
We will focus today on differences in averages
For example, the difference in average income between men and women
◮ in the population
$$E[\text{income} \mid \text{men}] - E[\text{income} \mid \text{women}]$$
◮ in the sample
$$\frac{1}{n_{men}} \sum_{i:\,men} \text{income}_i - \frac{1}{n_{women}} \sum_{i:\,women} \text{income}_i = \overline{\text{income}}_{men} - \overline{\text{income}}_{women}$$
Such differences are always descriptive differences
We are also often interested in causal differences
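The sample difference in averages can be sketched in R as follows. The data are simulated, and the variable names (`woman`, `income`) and all numbers are assumptions made only for illustration.

```r
# Simulated data; names and numbers are illustrative assumptions
set.seed(1)
n <- 1000
woman  <- rbinom(n, 1, 0.5)                        # binary explanatory variable X
income <- rnorm(n, mean = 30 - 2 * woman, sd = 5)  # outcome variable Y

# Group averages and their difference (men minus women)
means <- tapply(income, woman, mean)
dif <- unname(means["0"] - means["1"])

# The same number, written out as the two sample averages
dif2 <- sum(income[woman == 0]) / sum(woman == 0) -
  sum(income[woman == 1]) / sum(woman == 1)
c(dif, dif2)
```

Both expressions give the same descriptive difference; nothing in this calculation makes it causal.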
Introduction – Sample Average Causal Effect
We refer to causal differences as “effects”, e.g.
◮ “What is the effect of college on earnings?”
With the following potential outcomes
◮ $y_i^1$ = earnings with college
◮ $y_i^0$ = earnings without college
we can define the sample average treatment effect (SATE)
$$\text{SATE} = \frac{1}{n} \sum_{i=1}^{n} (y_i^1 - y_i^0)$$
Introduction – Fundamental problem of causal inference
The observed outcome (earnings) equals
$$y_i = x_i y_i^1 + (1 - x_i) y_i^0$$
depending on whether i went to college ($x_i = 1$) or not ($x_i = 0$)
For a given person i the causal effect of $x_i$ on $y_i$
◮ equals: $y_i^1 - y_i^0$
◮ is not identified: only 1 potential outcome is observed (depending on $x_i$)
Q: How do we compute the SATE given that we only see $y_i$?
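A small simulation may help make the problem concrete. In the sketch below both potential outcomes are generated, which is only possible because the data are made up; the constant effect of 5 and all other numbers are assumptions.

```r
set.seed(2)
n  <- 8
y0 <- round(rnorm(n, mean = 20, sd = 3))  # earnings without college (hypothetical)
y1 <- y0 + 5                              # earnings with college: effect of 5 assumed
x  <- rbinom(n, 1, 0.5)                   # 1 = went to college

sate <- mean(y1 - y0)       # computable here only because both outcomes were simulated
y <- x * y1 + (1 - x) * y0  # the outcome we actually observe

# Mask the unobserved potential outcome, as in real data
data.frame(x, y, y1 = ifelse(x == 1, y1, NA), y0 = ifelse(x == 0, y0, NA))
```

Every row of the printed table has one missing potential outcome, so the unit-level effect cannot be computed from observed data.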
Introduction – Confounders
How do we estimate $\text{SATE} = \bar y^1 - \bar y^0$ with observed earnings $y_i$?
We can compare average observed earnings of people with college and people without college
$$\widehat{\text{SATE}} = \bar y_{college} - \bar y_{no\ college} = \bar y^1_{college} - \bar y^0_{no\ college}$$
Does this give the SATE? Only if
$$E[y^0 \mid college] = E[y^0 \mid no\ college], \qquad E[y^1 \mid college] = E[y^1 \mid no\ college]$$
which holds when college is randomly assigned!
Introduction – RCTs
In practice we think that college graduates are different from people without college in ways that matter for their earnings, e.g.
$$E[\text{ability} \mid x = 1] \neq E[\text{ability} \mid x = 0]$$
If ability also affects y, then we can no longer attribute differences in y to differences in x alone!
This is why randomized controlled trials (RCTs) are considered the “gold standard” when estimating causal effects:
◮ randomization equalizes the treatment and control group on pre-existing characteristics such as ability
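The role of randomization can be illustrated with simulated data. The sketch below assumes a true effect of 5 for everyone and an ability variable that raises earnings; the selection rule `pnorm(ability)` is a hypothetical choice, not something taken from the slides.

```r
set.seed(3)
n <- 100000
ability <- rnorm(n)
y0 <- 20 + 2 * ability + rnorm(n)  # earnings without college
y1 <- y0 + 5                       # true effect is 5 for everyone (assumption)

# Difference in observed group averages for a given assignment x
naive_dif <- function(x) {
  y <- x * y1 + (1 - x) * y0
  mean(y[x == 1]) - mean(y[x == 0])
}

x_selected <- rbinom(n, 1, pnorm(ability))  # high ability -> more likely to attend
x_random   <- rbinom(n, 1, 0.5)             # coin flip

dif_selected <- naive_dif(x_selected)  # picks up the ability gap as well
dif_random   <- naive_dif(x_random)    # close to the true effect
c(selected = dif_selected, random = dif_random)
```

Under selection on ability the comparison of averages overstates the effect, while under the coin flip it is close to 5.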
Introduction – RCTs
This balancing property of RCTs is good for internal validity
◮ internal validity = SATE can be given a causal interpretation in the sample
◮ but watch out for:
  ◮ Hawthorne effects (somebody’s watching me!)
  ◮ John Henry effects (am gonna show you!)
  ◮ Attrition bias (let’s get outta here!)
The external validity of RCTs is sometimes less clear
◮ external validity = is the SATE estimate valid for other samples?
Two-sample estimator
But for now assume that x is randomized by a coin flip
We want to test the following
$$H_0 : E[y \mid x = 1] = E[y \mid x = 0] \quad \text{vs} \quad H_1 : E[y \mid x = 1] \neq E[y \mid x = 0]$$
or
$$H_0 : \theta = 0 \quad \text{vs} \quad H_1 : \theta \neq 0$$
where $\theta = E[y \mid x = 1] - E[y \mid x = 0]$
We estimate θ with the corresponding difference in sample averages
$$\hat\theta = \bar y_{x=1} - \bar y_{x=0}$$
Example – Social Pressure Experiment
August 2006 Primary Statewide Election in Michigan
Send postcards with different (randomly assigned) messages
1. no message (control group)
2. civic duty message
3. “you are being studied” message (Hawthorne effect)
4. neighborhood social pressure message
Example – Social Pressure Experiment, Balance
social = read.csv("_data/social.csv")
with(social, tapply(primary2004, messages, mean))
## Civic Duty    Control  Hawthorne  Neighbors
##      0.399      0.400      0.403      0.407
with(social, tapply(hhsize, messages, mean))
## Civic Duty    Control  Hawthorne  Neighbors
##       2.19       2.18       2.18       2.19
Example – Social Pressure Experiment, Effects
m06 = with(social, tapply(primary2006, messages, mean))
m06
## Civic Duty    Control  Hawthorne  Neighbors
##      0.315      0.297      0.322      0.378
m06 - m06["Control"]
## Civic Duty    Control  Hawthorne  Neighbors
##     0.0179     0.0000     0.0257     0.0813
Example – Social Pressure Experiment
Turnout rate: $\bar Y_T = 0.38$, $\bar Y_C = 0.30$
Sample size: $n_T = 360$, $n_C = 1890$
Estimated average treatment effect:
$$\widehat{ATE} = \bar Y_T - \bar Y_C = 0.08$$
How do we compute the 95% CI of $\hat\theta$ and perform a test of $H_0$?
One-Sample CI and t-Test
First, recall how we computed the CI and performed the t-test for a single average $\bar x$.
By the CLT we know that in large samples
$$\bar X \overset{approx}{\sim} N(E[X], Var(X)/n)$$
and
$$CI_{95\%} = \widehat{E}[X] \pm 1.96 \times \sqrt{\widehat{Var}(X)/n} = \bar X \pm 1.96 \times SE$$
where
$$SE = \sqrt{\frac{1}{n - 1} \sum_i (x_i - \bar x)^2 / n}$$
and
$$t = \frac{\bar X - E[X]}{SE} \sim t(n - 1)$$
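As a minimal sketch, the one-sample formulas can be computed by hand and compared with R's built-in `t.test`. The sample is simulated, and the hypothesized mean `mu0 = 10` is an assumption for illustration.

```r
set.seed(4)
x <- rnorm(200, mean = 10, sd = 2)  # simulated sample
n <- length(x)
xbar <- mean(x)

# Standard error of the mean: sample variance divided by n, then square root
se <- sqrt(sum((x - xbar)^2) / (n - 1) / n)  # same as sd(x) / sqrt(n)

ci <- xbar + c(-1, 1) * 1.96 * se            # large-sample 95% CI

mu0 <- 10                                    # value of E[X] under H0 (assumed)
t_stat <- (xbar - mu0) / se
p_val <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

builtin <- t.test(x, mu = mu0)               # should agree with t_stat and p_val
```

Note that `t.test` reports a CI based on the t(n − 1) quantile rather than 1.96, so its interval is slightly wider than `ci`.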
Two-Sample CI and t-Test
Our estimator is now the difference between 2 sample averages:
$$\hat\theta = \bar y_{x=1} - \bar y_{x=0}$$
How is $\hat\theta$ distributed? Since
$$\bar y_{x=k} \overset{approx}{\sim} N(\mu_k, \sigma_k^2/n_k)$$
where $\mu_k = E[Y \mid X = k]$ and $\sigma_k^2 = Var(Y \mid X = k)$ we have that
$$\hat\theta \overset{approx}{\sim} N(\mu_1 - \mu_0, \sigma_1^2/n_1 + \sigma_0^2/n_0)$$
which follows from the following result.
Sums of normal random variables
If
$$X_k \sim N(\mu_k, \sigma_k^2), \quad k = 1, 2$$
where $\mu_k = E[X_k]$ and $\sigma_k^2 = Var(X_k)$, then
$$X_1 + X_2 \sim N(E[X_1 + X_2], Var(X_1 + X_2)) = N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2 + 2\sigma_{12})$$
where $\sigma_{12} = Cov(X_1, X_2)$, and $\sigma_{12} = 0$ if $X_1$ and $X_2$ are independent.
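The variance rule can be checked numerically. The sketch below uses simulated normal variables with assumed parameters; the third variable `x3` is drawn independently to illustrate the case that matters for a difference between two independent sample averages.

```r
set.seed(5)
n <- 200000
x1 <- rnorm(n, mean = 1, sd = 2)             # Var(x1) = 4
x2 <- 0.5 * x1 + rnorm(n, mean = 3, sd = 1)  # Var(x2) = 2, Cov(x1, x2) = 2
x3 <- rnorm(n, mean = 0, sd = 1)             # independent of x1, Var(x3) = 1

# Correlated case: Var(x1 + x2) = 4 + 2 + 2 * 2 = 10
var_sum <- var(x1 + x2)
var_sum_formula <- var(x1) + var(x2) + 2 * cov(x1, x2)

# Independent case: Var(x1 - x3) = 4 + 1 = 5 (variances add, also for a difference)
var_dif <- var(x1 - x3)

c(var_sum, var_sum_formula, var_dif)
```

The first two numbers agree exactly, since the identity also holds for sample variances and covariances.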
Example – Social Pressure Experiment
Turnout rate: $\bar Y_T = 0.38$, $\bar Y_C = 0.30$
Sample size: $n_T = 360$, $n_C = 1890$
Estimated average treatment effect:
$$\widehat{ATE} = \bar Y_T - \bar Y_C = 0.08$$
Standard error:
$$SE = \sqrt{\frac{\bar Y_T (1 - \bar Y_T)}{n_T} + \frac{\bar Y_C (1 - \bar Y_C)}{n_C}} = 0.028$$
95% Confidence interval based on CLT:
$$(\widehat{ATE} - SE \times z_{0.975},\ \widehat{ATE} + SE \times z_{0.975}) = (0.026, 0.134)$$
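The numbers on this slide can be reproduced directly from the reported turnout rates and sample sizes. The variable names below are chosen for illustration.

```r
yT <- 0.38; yC <- 0.30   # turnout rates
nT <- 360;  nC <- 1890   # sample sizes

ate <- yT - yC
se  <- sqrt(yT * (1 - yT) / nT + yC * (1 - yC) / nC)
ci  <- ate + c(-1, 1) * qnorm(0.975) * se

round(c(ate = ate, se = se, lower = ci[1], upper = ci[2]), 3)
##   ate    se lower upper
## 0.080 0.028 0.026 0.134
```

Both bounds use the same positive quantile `qnorm(0.975)`, subtracted for the lower bound and added for the upper bound.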
t-Test – Large samples
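Under $H_0 : \theta = 0$ our test statistic now becomes
$$t = \frac{\hat\theta - 0}{SE(\hat\theta)} = \frac{\bar y_{x=1} - \bar y_{x=0} - 0}{\sqrt{\hat\sigma^2_{x=1}/n_1 + \hat\sigma^2_{x=0}/n_0}}$$
where $\hat\sigma^2_{x=k}$ is the sample variance of $y_i$ for the x = k group:
$$\hat\sigma^2_{x=k} = \frac{1}{n_k - 1} \sum_{i:\,x_i = k} (y_i - \bar y_{x=k})^2$$
Under $H_0 : \theta = 0$ and in large samples
$$t \overset{approx}{\sim} N(0, 1)$$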
t-Test – Small samples
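In small samples
$$t \overset{approx}{\sim} t(k)$$
where
$$k \approx \frac{(\sigma^2_{x=1}/n_1 + \sigma^2_{x=0}/n_0)^2}{\sigma^4_{x=1}/(n_1^2 (n_1 - 1)) + \sigma^4_{x=0}/(n_0^2 (n_0 - 1))}$$
When $\sigma^2_{x=1} = \sigma^2_{x=0}$ and $n_1 = n_0$ this gives $k = n - 2$, where $n = n_1 + n_0$; $n - 2$ is also the degrees of freedom of the pooled-variance t-test.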
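A sketch of the small-sample test in R, with the unknown variances replaced by sample variances (the usual plug-in, stated here as an assumption). The data are simulated and the group sizes are arbitrary. The hand-computed values are compared with `t.test`, whose default is the unequal-variance (Welch) test.

```r
set.seed(7)
y1 <- rnorm(15, mean = 1, sd = 2)  # group x = 1 (simulated)
y0 <- rnorm(25, mean = 0, sd = 1)  # group x = 0 (simulated)
n1 <- length(y1); n0 <- length(y0)
v1 <- var(y1);    v0 <- var(y0)    # sample variances

se <- sqrt(v1 / n1 + v0 / n0)
t_stat <- (mean(y1) - mean(y0)) / se

# Approximate degrees of freedom, with sample variances plugged in
k <- (v1 / n1 + v0 / n0)^2 /
  (v1^2 / (n1^2 * (n1 - 1)) + v0^2 / (n0^2 * (n0 - 1)))

p_val <- 2 * pt(abs(t_stat), df = k, lower.tail = FALSE)

welch <- t.test(y1, y0)  # var.equal = FALSE is the default
```

The statistic, degrees of freedom and p-value stored in `welch` should match `t_stat`, `k` and `p_val`.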
Example – Social Pressure Experiment
Turnout rate: $\bar Y_T = 0.38$, $\bar Y_C = 0.30$
Sample size: $n_T = 360$, $n_C = 1890$
Estimated average treatment effect:
$$\widehat{ATE} = \bar Y_T - \bar Y_C = 0.08$$
Standard error:
$$SE = \sqrt{\frac{\bar Y_T (1 - \bar Y_T)}{n_T} + \frac{\bar Y_C (1 - \bar Y_C)}{n_C}} = 0.028$$
T-statistic:
$$t = \frac{0.08}{0.028} \approx 2.9$$
Example – Social Pressure Experiment, Two-sample test
We have the following reference distribution under $H_0 : p_T = p_C$
$$N\left(0, \frac{p(1 - p)}{n_T} + \frac{p(1 - p)}{n_C}\right)$$
p = (.38 * 360 + .3 * 1890) / (360 + 1890)
se0 = sqrt(p*(1-p)/360+p*(1-p)/1890)
p; se0
## [1] 0.313
## [1] 0.0267
With the standard error under the null, the test statistic t = 0.08/se0 ≈ 3.0 gives a p-value of:
2 * pnorm(.08/se0, lower.tail = FALSE) # p-value
## [1] 0.00269
Power
We reject the null if the test statistic is “too large” to be consistent with our null hypothesis:
decision = reject H0          if |t| > c
           do not reject H0   if |t| ≤ c

                  H0 is true           H0 is false
Not reject H0     Correct              Type II error
                  probability 1 − α    probability β
Reject H0         Type I error         Correct
                  probability α        probability 1 − β

Hypothesis tests control the probability of Type I error, which equals the level of the test, α
They do not control the probability of Type II error
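That the test controls the Type I error can be illustrated by simulation. In the sketch below both groups are drawn from the same distribution, so H0 is true by construction; the group sizes and number of replications are arbitrary choices.

```r
set.seed(8)
alpha <- 0.05
crit  <- qnorm(1 - alpha / 2)  # critical value c for the large-sample test
reps  <- 5000
n1 <- 200; n0 <- 200

reject <- replicate(reps, {
  y1 <- rnorm(n1)  # both groups share the same mean, so H0 is true
  y0 <- rnorm(n0)
  t_stat <- (mean(y1) - mean(y0)) / sqrt(var(y1) / n1 + var(y0) / n0)
  abs(t_stat) > crit
})

type1_rate <- mean(reject)  # share of false rejections
type1_rate
```

The rejection rate should be close to α = 0.05, up to simulation noise.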
Power
Null hypotheses are often uninteresting
But, hypothesis testing may indicate the strength of evidence for or against your theory
Our ability to discriminate between H0 and H1 is measured by power:
power = 1 − Pr(Type II error) = 1 − β
A large p-value can occur either because H0 is true or because H0 is false but the test is not powerful.
There is a tradeoff between the two types of error, but typically we want the most powerful test given the level
Power Analysis
Power analysis:
1. Choose H0, H1, α
   ◮ e.g. $H_0 : \mu = \mu_0$, $H_1 : \mu = \mu_1$, α = 0.05
2. Choose population parameter under hypothetical data generating process
   ◮ $\mu = \mu_1$ which implies $\bar X \sim N(\mu_1, V(X)/n)$
3. Fix either
   3.1 the sample size, and compute power
       ◮ we reject H0 if $|\bar X - \mu_0| > z_{1-\alpha/2} \times SE$
   3.2 the desired power, and compute required sample size
       ◮ fix the probability in 3.1 and solve for n
Power – The Big Picture
[Figure: densities of the distribution under H0 and the distribution under H1, with the critical levels marked; area = Pr(Type II error)]
Low Power
[Figure: densities of the distribution under H0 and the distribution under H1, with the critical value $z_{1-\alpha/2}$ marked; area = Pr(Type II error)]
High Power
[Figure: densities of the distribution under H0 and the distribution under H1, with the critical value $z_{1-\alpha/2}$ marked; area = Pr(Type II error)]
Power Calculation
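Under $H_0 : \mu = \mu_0$
$$\hat\mu \sim N(\mu_0, \sigma_0^2)$$
where we estimate $\sigma_0$ with $SE_0$
We reject H0 if
$$\bar x < \mu_0 - z_{1-\alpha/2} SE_0 \quad \text{or} \quad \bar x > \mu_0 + z_{1-\alpha/2} SE_0$$
Now suppose the null $H_0 : \mu = \mu_0$ is false and that instead $\mu = \mu_1$, so that $\bar x$ is approximately $N(\mu_1, SE_1^2)$
Then the probability of rejecting equals
$$\Pr(\text{reject}) = \Pr(\bar x < \mu_0 - z_{1-\alpha/2} SE_0) + \Pr(\bar x > \mu_0 + z_{1-\alpha/2} SE_0)$$
$$= \Phi\left(\frac{\mu_0 - z_{1-\alpha/2} SE_0 - \mu_1}{SE_1}\right) + \Phi\left(-\frac{\mu_0 + z_{1-\alpha/2} SE_0 - \mu_1}{SE_1}\right)$$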
Power – One sample test
# one sample test, with H0: p=p0, H1: p=p1
n = 250
p0 = .50
p1 = .48
se0 = sqrt(p0 * (1 - p0) / n)
se1 = sqrt(p1 * (1 - p1) / n)
c = qnorm(.975)
pnorm(p0 - c * se0, mean=p1, sd=se1) +
  pnorm(p0 + c * se0, mean=p1, sd=se1, lower.tail = FALSE)
## [1] 0.0967
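The analytic power of the one-sample test can be cross-checked by Monte Carlo simulation. This is a sketch under the same settings (n = 250, p0 = 0.50, p1 = 0.48); the number of replications is an arbitrary choice, and the critical value is named `crit` here rather than `c`.

```r
set.seed(9)
n <- 250; p0 <- 0.50; p1 <- 0.48
se0  <- sqrt(p0 * (1 - p0) / n)  # standard error under H0
crit <- qnorm(0.975)

reps <- 20000
phat <- rbinom(reps, size = n, prob = p1) / n  # sample proportions when H1 is true

# Share of simulated samples in which H0 is rejected
power_sim <- mean(abs(phat - p0) > crit * se0)
power_sim
```

The simulated rejection rate should land near the analytic value of 0.0967, with small differences due to simulation noise and the discreteness of the binomial.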
Power – Two sample test
# two sample test, with H0: dif=0, H1: dif!=0
n1 = n2 = 500
p1 = .1
p2 = .05
p = (p1 * n1 + p2 * n2) / (n1 + n2) # overall rate
se0 = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
c = qnorm(.975)
pnorm(- c*se0, mean=p2-p1, sd=se1) +
  pnorm(c*se0, mean=p2-p1, sd=se1, lower.tail=FALSE)
## [1] 0.852
Power – Two sample test
# two sample test, with H0: dif=0, H1: dif!=0
n1 = n2 = 500; p1 = .1; p2 = .05
power.prop.test(n=n1, p1=p1, p2=p2)
##
##      Two-sample comparison of proportions power calculation
##
##               n = 500
##              p1 = 0.1
##              p2 = 0.05
##       sig.level = 0.05
##           power = 0.852
##     alternative = two.sided
##
## NOTE: n is number in *each* group
Example – Social Pressure Experiment, Power calculation
Let $p_T = 0.38$ and $p_C = 0.30$
Two-sample test at the 5% significance level
Assume equal group size: $n = n_T = n_C$
1. If n = 1000, what is the power of the test?
2. What group size do we need for a power of 0.8?
power.prop.test(p1=0.38,p2=.3,n=1000)$power
## [1] 0.966
power.prop.test(p1=0.38,p2=.3,power=.8)$n
## [1] 549
Power graph - H0 : p = 0.3, H1 : p = p1
[Figure: power as a function of p1 (x-axis: p1, y-axis: Power), with one curve each for n = 100, n = 200 and n = 549]
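Power curves of this kind can be computed with a small helper. The sketch below is not necessarily how the original graph was produced: it assumes a two-sided two-sample comparison with equal group sizes against a fixed rate of 0.3, and `power_two_prop` is a hypothetical helper based on the normal approximation.

```r
# Normal-approximation power for a two-sided two-sample test of proportions,
# with n observations in each group (hypothetical helper)
power_two_prop <- function(p1, p2, n, alpha = 0.05) {
  p    <- (p1 + p2) / 2                                # pooled rate with equal groups
  se0  <- sqrt(p * (1 - p) * 2 / n)                    # standard error under H0
  se1  <- sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)  # standard error under H1
  crit <- qnorm(1 - alpha / 2)
  pnorm(-crit * se0, mean = p1 - p2, sd = se1) +
    pnorm(crit * se0, mean = p1 - p2, sd = se1, lower.tail = FALSE)
}

p1_grid <- seq(0.10, 0.60, by = 0.01)
power_by_n <- sapply(c(100, 200, 549),
                     function(n) power_two_prop(p1_grid, 0.3, n))
colnames(power_by_n) <- c("n = 100", "n = 200", "n = 549")

# Power at p1 = 0.38 for the three group sizes
round(power_by_n[which.min(abs(p1_grid - 0.38)), ], 3)
```

A call such as `matplot(p1_grid, power_by_n, type = "l")` would draw the three curves; power is lowest at p1 = 0.3, where it equals the significance level, and rises as p1 moves away from 0.3.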
Increase power by increasing sample size, n
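[Figure: density, with the critical value $z_{1-\alpha/2}$ marked]
Increase power by increasing sample size, 2 × n
[Figure: density, with the critical value $z_{1-\alpha/2}$ marked]
Increase power by increasing sample size, 4 × n
[Figure: density, with the critical value $z_{1-\alpha/2}$ marked]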
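The effect of doubling and quadrupling the sample size can also be shown numerically. The sketch below reuses the turnout rates 0.38 and 0.30 from the example; the baseline group size of 150 is an arbitrary assumption.

```r
n <- 150                     # baseline group size, chosen only for illustration
sizes <- c(n, 2 * n, 4 * n)

# Standard error of the difference in proportions for each group size
se <- sqrt(0.38 * 0.62 / sizes + 0.30 * 0.70 / sizes)

# Power of the two-sided test at the 5% level for each group size
pow <- sapply(sizes, function(m) {
  power.prop.test(n = m, p1 = 0.38, p2 = 0.30)$power
})

data.frame(n = sizes, se = round(se, 4), power = round(pow, 3))
```

Quadrupling n halves the standard error, which narrows both distributions and raises power.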
Conclusion
We discussed two sample tests
◮ we need this when comparing populations
You understand
◮ difference between descriptive and causal differences
◮ counterfactuals
◮ experiments, internal validity, external validity
◮ balancing and confounding
◮ type I and type II errors, power (1 − β) and size (α)
You can
◮ compute and interpret 2 sample tests for means and proportions
◮ perform power and sample size calculations