Comparing two samples, Edwin Leuven



slide-1
SLIDE 1

Comparing two samples

Edwin Leuven

slide-2
SLIDE 2

Introduction

We will now study the relationship between 2 variables X and Y.

We consider the case when X is binary

◮ men/women, high school/college, city/countryside, treated/untreated

and call X the explanatory variable.

Y can be binary, continuous or have some other cardinality

◮ income, education, health

and we call Y the outcome variable.

2/38

slide-3
SLIDE 3

Introduction – Between group differences

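We will focus today on differences in averages.

For example, the difference in average income between men and women

◮ in the population

$$E[\text{income} \mid \text{men}] - E[\text{income} \mid \text{women}]$$

◮ in the sample

$$\frac{1}{n_{\text{men}}} \sum_{i:\text{men}} \text{income}_i - \frac{1}{n_{\text{women}}} \sum_{i:\text{women}} \text{income}_i = \overline{\text{income}}_{\text{men}} - \overline{\text{income}}_{\text{women}}$$

Such differences are always descriptive differences.

We are also often interested in causal differences.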

3/38

slide-4
SLIDE 4

Introduction – Sample Average Causal Effect

We refer to causal differences as “effects”, for example

◮ “What is the effect of college on earnings?”

With the following potential outcomes

◮ $y_i^1$ = earnings with college
◮ $y_i^0$ = earnings without college

we can define the sample average treatment effect (SATE)

$$\text{SATE} = \frac{1}{n} \sum_{i=1}^{n} (y_i^1 - y_i^0)$$
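The SATE definition can be illustrated with a minimal R sketch. The potential outcomes below are simulated and purely hypothetical (they are not data from these slides); in real data we never observe both `y1` and `y0` for the same person.

```r
# Hypothetical potential outcomes (simulated, not from the slides)
set.seed(1)
n  <- 1000
y0 <- rnorm(n, mean = 30, sd = 5)       # earnings without college
y1 <- y0 + rnorm(n, mean = 5, sd = 2)   # earnings with college

# SATE = (1/n) * sum over i of (y1_i - y0_i)
sate <- mean(y1 - y0)
sate
```

Because the average of differences equals the difference of averages, `sate` is the same number as `mean(y1) - mean(y0)`.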

4/38

slide-5
SLIDE 5

Introduction – Fundamental problem of causal inference

The observed outcome (earnings) equals

$$y_i = x_i y_i^1 + (1 - x_i) y_i^0$$

depending on whether i went to college ($x_i = 1$) or not ($x_i = 0$).

For a given person i the causal effect of $x_i$ on $y_i$

◮ equals: $y_i^1 - y_i^0$
◮ is not identified: only 1 potential outcome is observed (depending on $x_i$)

Q: How do we compute the SATE given that we only see $y_i$?

5/38

slide-6
SLIDE 6

Introduction – Confounders

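How do we estimate $\text{SATE} = \bar{y}^1 - \bar{y}^0$ with observed earnings $y_i$?

We can compare average observed earnings of people with college and people without college

$$\widehat{\text{SATE}} = \bar{y}_{\text{college}} - \bar{y}_{\text{no college}} = \bar{y}^1_{\text{college}} - \bar{y}^0_{\text{no college}}$$

Does this give the SATE? Only if

$$E[y^0 \mid \text{college}] = E[y^0 \mid \text{no college}]$$
$$E[y^1 \mid \text{college}] = E[y^1 \mid \text{no college}]$$

which holds when college is randomly assigned!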
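To see why random assignment matters, here is a small simulation sketch. Everything in it is an assumption made for illustration: the variable `ability`, the constant effect of 5, and the selection rule are invented, not taken from the slides.

```r
# Hypothetical simulation: naive comparison under self-selection vs. randomization
set.seed(2)
n       <- 10000
ability <- rnorm(n)
y0 <- 30 + 3 * ability + rnorm(n)   # earnings without college depend on ability
y1 <- y0 + 5                        # constant effect, so the SATE is exactly 5

# (a) self-selection: higher ability makes college more likely
x_sel     <- rbinom(n, 1, pnorm(ability))
y_sel     <- x_sel * y1 + (1 - x_sel) * y0
naive_sel <- mean(y_sel[x_sel == 1]) - mean(y_sel[x_sel == 0])

# (b) random assignment by coin flip
x_rct     <- rbinom(n, 1, 0.5)
y_rct     <- x_rct * y1 + (1 - x_rct) * y0
naive_rct <- mean(y_rct[x_rct == 1]) - mean(y_rct[x_rct == 0])

c(SATE = mean(y1 - y0), selection = naive_sel, randomized = naive_rct)
```

Under self-selection the comparison of group means overstates the effect, because the college group also has higher `y0`; under the coin flip it lands close to the SATE.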

6/38

slide-7
SLIDE 7

Introduction – RCT’s

In practice we think that college graduates are different from people without college in ways that matter for their earnings, for example

$$E[\text{ability} \mid x = 1] \neq E[\text{ability} \mid x = 0]$$

If ability also affects y, then we can no longer attribute differences in y to differences in x alone!

This is why randomized control trials (RCT’s) are considered the “gold standard” when estimating causal effects:

◮ randomization equalizes the treatment and control group on pre-existing characteristics such as ability

7/38

slide-8
SLIDE 8

Introduction – RCT’s

This balancing property of RCT’s is good for internal validity

◮ internal validity = SATE can be given a causal interpretation in the sample
◮ but watch out for:
  ◮ Hawthorne effects (somebody’s watching me!)
  ◮ John Henry effects (am gonna show you!)
  ◮ Attrition bias (let’s get outta here!)

The external validity of RCT’s is sometimes less clear

◮ external validity = is the SATE estimate valid for other samples?

8/38

slide-9
SLIDE 9

Two-sample estimator

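But for now assume that x is randomized by a coin flip.

We want to test the following

$$H_0: E[y \mid x = 1] = E[y \mid x = 0] \quad \text{vs} \quad H_1: E[y \mid x = 1] \neq E[y \mid x = 0]$$

or

$$H_0: \theta = 0 \quad \text{vs} \quad H_1: \theta \neq 0$$

where $\theta = E[y \mid x = 1] - E[y \mid x = 0]$.

We estimate $\theta$ with the corresponding difference in sample averages

$$\hat{\theta} = \bar{y}_{x=1} - \bar{y}_{x=0}$$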

9/38

slide-10
SLIDE 10

Example – Social Pressure Experiment

August 2006 Primary Statewide Election in Michigan

Send postcards with different (randomly assigned) messages

1. no message (control group)
2. civic duty message
3. “you are being studied” message (Hawthorne effect)
4. neighborhood social pressure message

10/38

slide-11
SLIDE 11

Example – Social Pressure Experiment

11/38

slide-12
SLIDE 12

Example – Social Pressure Experiment, Balance

social = read.csv("_data/social.csv")
with(social, tapply(primary2004, messages, mean))
## Civic Duty    Control  Hawthorne  Neighbors
##      0.399      0.400      0.403      0.407
with(social, tapply(hhsize, messages, mean))
## Civic Duty    Control  Hawthorne  Neighbors
##       2.19       2.18       2.18       2.19

12/38

slide-13
SLIDE 13

Example – Social Pressure Experiment, Effects

m06 = with(social, tapply(primary2006, messages, mean))
m06
## Civic Duty    Control  Hawthorne  Neighbors
##      0.315      0.297      0.322      0.378
m06 - m06["Control"]
## Civic Duty    Control  Hawthorne  Neighbors
##     0.0179     0.0000     0.0257     0.0813

13/38

slide-14
SLIDE 14

Example – Social Pressure Experiment

Turnout rate: $\bar{Y}_T = 0.38$, $\bar{Y}_C = 0.30$

Sample size: $n_T = 360$, $n_C = 1890$

Estimated average treatment effect:

$$\widehat{\text{ATE}} = \bar{Y}_T - \bar{Y}_C = 0.08$$

How to compute the 95% CI of $\hat{\theta}$ and perform a test of $H_0$?

14/38

slide-15
SLIDE 15

One-Sample CI and t-Test

First remember how we computed the CI and performed the t-test for a single average $\bar{x}$.

By the CLT we know that in large samples

$$\bar{X} \overset{approx}{\sim} N(E[X], \text{Var}(X)/n)$$

and

$$CI_{95\%} = \widehat{E[X]} \pm 1.96 \times \sqrt{\widehat{\text{Var}}(X)/n} = \bar{X} \pm 1.96 \times SE$$

where

$$SE = \sqrt{\frac{1}{n - 1} \sum_i (x_i - \bar{x})^2 / n}$$

and

$$t = \frac{\bar{X} - E[X]}{SE} \sim t(n - 1)$$
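As a reminder of the mechanics, the one-sample CI and t-statistic can be computed by hand as in the sketch below. The sample `x` and the null value `mu0` are hypothetical, simulated only for illustration.

```r
# Hypothetical sample (simulated); mu0 is an assumed null value
set.seed(3)
x    <- rnorm(200, mean = 10, sd = 4)
n    <- length(x)
xbar <- mean(x)

# SE = sqrt( [1/(n-1)] * sum (x_i - xbar)^2 / n ), i.e. sd(x) / sqrt(n)
se   <- sqrt(sum((x - xbar)^2) / (n - 1) / n)
ci95 <- xbar + c(-1, 1) * 1.96 * se

# t-test of H0: E[X] = mu0
mu0   <- 10
tstat <- (xbar - mu0) / se
pval  <- 2 * pt(abs(tstat), df = n - 1, lower.tail = FALSE)
c(lower = ci95[1], upper = ci95[2], t = tstat, p = pval)
```

The statistic and p-value agree with `t.test(x, mu = mu0)`; the interval differs slightly from the one `t.test` reports because it uses 1.96 rather than the t(n − 1) quantile.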

15/38

slide-16
SLIDE 16

Two-Sample CI and t-Test

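Our estimator is now the difference between 2 sample averages:

$$\hat{\theta} = \bar{y}_{x=1} - \bar{y}_{x=0}$$

How is $\hat{\theta}$ distributed? Since

$$\bar{y}_{x=k} \overset{approx}{\sim} N(\mu_k, \sigma_k^2 / n_k)$$

where $\mu_k = E[Y \mid X = k]$ and $\sigma_k^2 = \text{Var}(Y \mid X = k)$, we have that

$$\hat{\theta} \overset{approx}{\sim} N(\mu_1 - \mu_0, \sigma_1^2 / n_1 + \sigma_0^2 / n_0)$$

which follows from the following result, applied to $\bar{y}_{x=1}$ and $-\bar{y}_{x=0}$, which are independent.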

16/38

slide-17
SLIDE 17

Sums of normal random variables

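If $X_k \sim N(\mu_k, \sigma_k^2)$ for $k = 1, 2$, where $\mu_k = E[X_k]$ and $\sigma_k^2 = \text{Var}(X_k)$, then

$$X_1 + X_2 \sim N(E[X_1 + X_2], \text{Var}(X_1 + X_2)) \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2 + 2\sigma_{12})$$

where $\sigma_{12} = \text{Cov}(X_1, X_2)$, and $\sigma_{12} = 0$ if $X_1$ and $X_2$ are independent.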
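A quick simulation can make the mean and variance of the sum concrete. The parameters below are arbitrary choices for illustration; `x2` is built from `x1` so that the covariance term is not zero.

```r
# Hypothetical check of the mean and variance of a sum of two normals
set.seed(4)
m  <- 100000
x1 <- rnorm(m, mean = 1, sd = 2)
x2 <- 0.5 * x1 + rnorm(m, mean = 3, sd = 1)  # correlated with x1 by construction
s  <- x1 + x2

c(mean_sum = mean(s), sum_of_means = mean(x1) + mean(x2))
c(var_sum = var(s), formula = var(x1) + var(x2) + 2 * cov(x1, x2))
```

With these parameters the theoretical mean of the sum is 4.5 and its variance is 10 (since the sum equals 1.5 times `x1` plus independent noise with variance 1).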

17/38

slide-18
SLIDE 18

Example – Social Pressure Experiment

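Turnout rate: $\bar{Y}_T = 0.38$, $\bar{Y}_C = 0.30$

Sample size: $n_T = 360$, $n_C = 1890$

Estimated average treatment effect:

$$\widehat{\text{ATE}} = \bar{Y}_T - \bar{Y}_C = 0.08$$

Standard error:

$$SE = \sqrt{\frac{\bar{Y}_T(1 - \bar{Y}_T)}{n_T} + \frac{\bar{Y}_C(1 - \bar{Y}_C)}{n_C}} = 0.028$$

95% Confidence intervals based on CLT:

$$(\widehat{\text{ATE}} - SE \times z_{0.975},\ \widehat{\text{ATE}} + SE \times z_{0.975}) = (0.026, 0.134)$$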
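The numbers on this slide can be reproduced with a few lines of R. This sketch uses only the turnout rates and sample sizes quoted above; the variable names are chosen here for illustration.

```r
# Summary numbers quoted on the slide
yT <- 0.38; yC <- 0.30
nT <- 360;  nC <- 1890

ate <- yT - yC
se  <- sqrt(yT * (1 - yT) / nT + yC * (1 - yC) / nC)
ci  <- ate + c(-1, 1) * qnorm(0.975) * se

round(c(ate = ate, se = se, lower = ci[1], upper = ci[2]), 3)
##   ate    se lower upper
## 0.080 0.028 0.026 0.134
```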

18/38

slide-19
SLIDE 19

t-Test – Large samples

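Under $H_0: \theta = 0$ our test-statistic now becomes

$$t = \frac{\hat{\theta} - 0}{SE(\hat{\theta})} = \frac{\bar{y}_{x=1} - \bar{y}_{x=0} - 0}{\sqrt{\hat{\sigma}^2_{x=1}/n_1 + \hat{\sigma}^2_{x=0}/n_0}}$$

where $\hat{\sigma}^2_{x=k}$ is the sample variance of $y_i$ for the $x = k$ group:

$$\hat{\sigma}^2_{x=k} = \frac{1}{n_k - 1} \sum_{i: x_i = k} (y_i - \bar{y}_{x=k})^2$$

Under $H_0: \theta = 0$ and in large samples

$$t \overset{approx}{\sim} N(0, 1)$$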
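A minimal sketch of the large-sample two-sample test on raw data follows. The two outcome vectors are simulated and hypothetical; the point is only to show each term of the formula.

```r
# Hypothetical outcomes for the two groups (simulated)
set.seed(5)
y1 <- rnorm(400, mean = 1.2, sd = 2)   # group with x = 1
y0 <- rnorm(600, mean = 1.0, sd = 3)   # group with x = 0

theta_hat <- mean(y1) - mean(y0)
se        <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
tstat     <- theta_hat / se                            # test of H0: theta = 0
pval      <- 2 * pnorm(abs(tstat), lower.tail = FALSE) # large-sample normal p-value
c(theta_hat = theta_hat, se = se, t = tstat, p = pval)
```

The statistic matches the one reported by `t.test(y1, y0)`, which uses the same unequal-variance standard error but a t reference distribution instead of the normal.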

19/38

slide-20
SLIDE 20

t-Test – Small samples

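In small samples

$$t \overset{approx}{\sim} t(k)$$

where

$$k \approx \frac{(\sigma^2_{x=1}/n_1 + \sigma^2_{x=0}/n_0)^2}{\sigma^4_{x=1}/(n_1^2 (n_1 - 1)) + \sigma^4_{x=0}/(n_0^2 (n_0 - 1))}$$

When $\sigma^2_{x=1} = \sigma^2_{x=0}$ and $n_1 = n_0$ this gives $k = n - 2$, with $n = n_1 + n_0$.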
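The degrees-of-freedom formula can be coded directly, plugging in sample variances for the unknown variances. The helper name `welch_df` and the simulated data are assumptions for illustration.

```r
# Approximate degrees of freedom k, using sample variances in place of sigma^2
welch_df <- function(y1, y0) {
  v1 <- var(y1) / length(y1)   # sigma^2_{x=1} / n1
  v0 <- var(y0) / length(y0)   # sigma^2_{x=0} / n0
  (v1 + v0)^2 / (v1^2 / (length(y1) - 1) + v0^2 / (length(y0) - 1))
}

# Hypothetical small samples (simulated)
set.seed(6)
y1 <- rnorm(12, mean = 1, sd = 1)
y0 <- rnorm(20, mean = 0, sd = 3)
welch_df(y1, y0)

# Equal variances and equal group sizes give k = n - 2
z <- y1 + 1                    # same spread and same size as y1
welch_df(y1, z)                # equals 2 * 12 - 2 = 22
```

The first value is the `df` that `t.test(y1, y0)` reports for its default unequal-variance test.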

20/38

slide-21
SLIDE 21

Example – Social Pressure Experiment

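Turnout rate: $\bar{Y}_T = 0.38$, $\bar{Y}_C = 0.30$

Sample size: $n_T = 360$, $n_C = 1890$

Estimated average treatment effect:

$$\widehat{\text{ATE}} = \bar{Y}_T - \bar{Y}_C = 0.08$$

Standard error:

$$SE = \sqrt{\frac{\bar{Y}_T(1 - \bar{Y}_T)}{n_T} + \frac{\bar{Y}_C(1 - \bar{Y}_C)}{n_C}} = 0.028$$

T-statistic:

$$t = \frac{0.08}{0.028} \approx 2.9$$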

21/38

slide-22
SLIDE 22

Example – Social Pressure Experiment, Two-sample test

We have the following reference distribution under $H_0: p_T = p_C$

$$N\left(0, \frac{p(1 - p)}{n_T} + \frac{p(1 - p)}{n_C}\right)$$

p = (.38 * 360 + .3 * 1890) / (360 + 1890)
se0 = sqrt(p*(1-p)/360+p*(1-p)/1890)
p; se0
## [1] 0.313
## [1] 0.0267

With this standard error, t = 0.08/0.0267 ≈ 3.0, which gives a p-value of:

2 * pnorm(.08/se0, lower.tail = FALSE) # p-value
## [1] 0.00269

22/38

slide-23
SLIDE 23

Power

We reject the null if the test statistic is “too large” to be consistent with our null hypothesis:

$$\text{decision} = \begin{cases} \text{reject } H_0 & \text{if } |t| > c \\ \text{do not reject } H_0 & \text{if } |t| \le c \end{cases}$$

                  H0 is true               H0 is false
Not reject H0     Correct                  Type II error
                  probability 1 − α        probability β
Reject H0         Type I error             Correct
                  probability α            probability 1 − β

Hypothesis tests control the probability of Type I error, which is equal to the level of the test, α.

They do not control the probability of Type II error.
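That the test controls the Type I error can be checked by simulation. In this sketch the data-generating process (two standard normal samples of size 100) and the number of replications are assumptions chosen for illustration.

```r
# Simulate the two-sample test many times when H0 is true
set.seed(7)
reps  <- 5000
n     <- 100
crit  <- qnorm(0.975)          # critical value c for alpha = 0.05

reject <- replicate(reps, {
  y1 <- rnorm(n)               # both groups have the same mean, so H0 holds
  y0 <- rnorm(n)
  tstat <- (mean(y1) - mean(y0)) / sqrt(var(y1) / n + var(y0) / n)
  abs(tstat) > crit
})

mean(reject)                   # share of false rejections, should be near 0.05
```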

23/38

slide-24
SLIDE 24

Power

Null hypotheses are often uninteresting.

But hypothesis testing may indicate the strength of evidence for or against your theory.

Our ability to discriminate between $H_0$ and $H_1$ is measured by power:

$$\text{power} = 1 - \Pr(\text{Type II error}) = 1 - \beta$$

A large p-value can occur either because $H_0$ is true or because $H_0$ is false but the test is not powerful.

There is a tradeoff between the two types of error, but typically we want the most powerful test given the level.

24/38

slide-25
SLIDE 25

Power Analysis

Power analysis:

1. Choose $H_0$, $H_1$, $\alpha$
   ◮ e.g. $H_0: \mu = \mu_0$, $H_1: \mu = \mu_1$, $\alpha = 0.05$
2. Choose population parameter under hypothetical data generating process
   ◮ $\mu = \mu_1$, which implies $\bar{X} \sim N(\mu_1, V(X)/n)$
3. Fix either
   3.1 the sample size, and compute power
       ◮ we reject $H_0$ if $|\bar{X} - \mu_0| > z_{1-\alpha/2} \times SE$
   3.2 the desired power, and compute required sample size
       ◮ fix the probability in 3.1 and solve for n

25/38
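Steps 3.1 and 3.2 can be sketched for a one-sample test of a mean with known standard deviation. The function name `power_one_sample` and the values of `mu0`, `mu1` and `sigma` are hypothetical choices for illustration; the sample size in 3.2 is found numerically with `uniroot`.

```r
# Power of a two-sided one-sample test when the true mean is mu1
power_one_sample <- function(n, mu0, mu1, sigma, alpha = 0.05) {
  se <- sigma / sqrt(n)
  z  <- qnorm(1 - alpha / 2)
  # probability that the sample mean falls in either rejection region
  pnorm(mu0 - z * se, mean = mu1, sd = se) +
    pnorm(mu0 + z * se, mean = mu1, sd = se, lower.tail = FALSE)
}

# 3.1: fix the sample size and compute power
power_one_sample(n = 100, mu0 = 0, mu1 = 0.3, sigma = 1)

# 3.2: fix the desired power (0.8) and solve for n
n_req <- uniroot(function(n) power_one_sample(n, 0, 0.3, 1) - 0.8,
                 interval = c(2, 1e5))$root
ceiling(n_req)
```

Rounding up with `ceiling` gives the smallest whole sample size that reaches the target power.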

slide-26
SLIDE 26

Power – The Big Picture

[Figure: densities of the distribution under H0 and the distribution under H1, with the critical levels marked; the shaded area is Pr(Type II error)]

26/38

slide-27
SLIDE 27

Low Power

[Figure: densities of the distribution under H0 and the distribution under H1, with the critical value $z_{1-\alpha/2}$ marked; the shaded area is Pr(Type II error)]

27/38

slide-28
SLIDE 28

High Power

[Figure: densities of the distribution under H0 and the distribution under H1, with the critical value $z_{1-\alpha/2}$ marked; the shaded area is Pr(Type II error)]

28/38

slide-29
SLIDE 29

Power Calculation

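Under $H_0: \mu = \mu_0$

$$\hat{\mu} \sim N(\mu_0, \sigma_0^2)$$

where we estimate $\sigma_0$ with $SE_0$.

We reject $H_0$ if

$$\bar{x} < \mu_0 - z_{1-\alpha/2} SE_0 \quad \text{or} \quad \bar{x} > \mu_0 + z_{1-\alpha/2} SE_0$$

Now suppose the null $H_0: \mu = \mu_0$ is false and that instead $\mu = \mu_1$, so that $\bar{x} \sim N(\mu_1, SE_1^2)$.

Then the probability of rejecting equals

$$\Pr(\text{reject}) = \Pr(\bar{x} < \mu_0 - z_{1-\alpha/2} SE_0) + \Pr(\bar{x} > \mu_0 + z_{1-\alpha/2} SE_0)$$

$$= \Phi\left(\frac{\mu_0 - z_{1-\alpha/2} SE_0 - \mu_1}{SE_1}\right) + \Phi\left(-\frac{\mu_0 + z_{1-\alpha/2} SE_0 - \mu_1}{SE_1}\right)$$

29/38
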
slide-30
SLIDE 30

Power – One sample test

# one sample test, with H0: p=p0, H1: p=p1
n = 250
p0 = .50
p1 = .48
se0 = sqrt(p0 * (1 - p0) / n)
se1 = sqrt(p1 * (1 - p1) / n)
c = qnorm(.975)
pnorm(p0 - c * se0, mean=p1, sd=se1) +
  pnorm(p0 + c * se0, mean=p1, sd=se1, lower.tail = FALSE)
## [1] 0.0967

30/38

slide-31
SLIDE 31

Power – Two sample test

# two sample test, with H0: dif=0, H1: dif!=0
n1 = n2 = 500
p1 = .1
p2 = .05
p = (p1 * n1 + p2 * n2) / (n1 + n2) # overall rate
se0 = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
c = qnorm(.975)
pnorm(- c*se0, mean=p2-p1, sd=se1) +
  pnorm(c*se0, mean=p2-p1, sd=se1, lower.tail=FALSE)
## [1] 0.852

31/38

slide-32
SLIDE 32

Power – Two sample test

# two sample test, with H0: dif=0, H1: dif!=0
n1 = n2 = 500; p1 = .1; p2 = .05
power.prop.test(n=n1, p1=p1, p2=p2)
##
##      Two-sample comparison of proportions power calculation
##
##               n = 500
##              p1 = 0.1
##              p2 = 0.05
##       sig.level = 0.05
##           power = 0.852
##     alternative = two.sided
##
## NOTE: n is number in *each* group

32/38

slide-33
SLIDE 33

Example – Social Pressure Experiment, Power calculation

Let $p_T = 0.38$ and $p_C = 0.30$

Two-sample test at the 5% significance level

Assume equal group size: $n = n_T = n_C$

1. If n = 1000, what is the power of the test?
2. What group size do we need for a power of 0.8?

power.prop.test(p1=0.38,p2=.3,n=1000)$power
## [1] 0.966
power.prop.test(p1=0.38,p2=.3,power=.8)$n
## [1] 549

33/38

slide-34
SLIDE 34

Power graph - H0 : p = 0.3, H1 : p = p1

[Figure: power (vertical axis, 0 to 1) plotted against p1 (horizontal axis, 0.1 to 0.6, with 0.38 marked) for n = 100, n = 200 and n = 549]
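A graph of this kind could be drawn along the following lines. This is a sketch under assumptions: it takes the graph to show the two-sample test with equal group sizes and a control rate of 0.3, and the helper `power_two_prop` is a hypothetical name that reuses the normal approximation from the two-sample power slide.

```r
# Power of the two-sided two-sample test of proportions, n observations per group
power_two_prop <- function(p1, p0, n, alpha = 0.05) {
  p   <- (p1 + p0) / 2                     # pooled rate with equal group sizes
  se0 <- sqrt(2 * p * (1 - p) / n)         # standard error under H0
  se1 <- sqrt(p1 * (1 - p1) / n + p0 * (1 - p0) / n)  # standard error under H1
  z   <- qnorm(1 - alpha / 2)
  pnorm(-z * se0, mean = p1 - p0, sd = se1) +
    pnorm(z * se0, mean = p1 - p0, sd = se1, lower.tail = FALSE)
}

p1_grid <- seq(0.1, 0.6, by = 0.01)
sizes   <- c(100, 200, 549)
pw <- sapply(sizes, function(n) power_two_prop(p1_grid, p0 = 0.3, n = n))

# One curve per group size; the vertical line marks p1 = 0.38
matplot(p1_grid, pw, type = "l", lty = 1:3, col = 1,
        xlab = "p1", ylab = "Power", ylim = c(0, 1))
abline(v = 0.38, lty = 3)
legend("bottomright", legend = paste("n =", sizes), lty = 1:3)
```

Each curve bottoms out at the significance level where p1 = 0.3, and the n = 549 curve passes through a power of about 0.8 at p1 = 0.38, in line with the sample size calculation above.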

34/38

slide-35
SLIDE 35

Increase power by increasing sample size, n

[Figure: density plot with the critical value $z_{1-\alpha/2}$ marked]

35/38

slide-36
SLIDE 36

Increase power by increasing sample size, 2 × n

[Figure: density plot with the critical value $z_{1-\alpha/2}$ marked]

36/38

slide-37
SLIDE 37

Increase power by increasing sample size, 4 × n

[Figure: density plot with the critical value $z_{1-\alpha/2}$ marked]

37/38

slide-38
SLIDE 38

Conclusion

We discussed two sample tests

◮ we need this when comparing populations

You understand

◮ difference between descriptive and causal differences
◮ counterfactuals
◮ experiments, internal validity, external validity
◮ balancing and confounding
◮ type I and type II errors, power (1-β) and size (α)

You can

◮ compute and interpret 2 sample tests for means and proportions
◮ perform power and sample size calculations

38/38