Comparing two samples Edwin Leuven Introduction We will now study - PowerPoint PPT Presentation

Comparing two samples Edwin Leuven

Introduction We will now study the relationship between 2 variables X and Y We consider the case when X is binary ◮ men/women, high school/college, city/countryside, treated/untreated and call X the explanatory variable. Y can be binary, continuous or have some other cardinality ◮ income, education, health and call Y the outcome variable. 2/38

Introduction – Between group differences We will focus today on differences in averages F.e., the difference in average income between men and women ◮ in the population E [ income | men ] − E [ income | women ] ◮ in the sample � � 1 1 income i − income i n men n women i : men i : women = income men − income women Such differences are always descriptive differences We are also often interested in causal differences 3/38

Introduction – Sample Average Causal Effect We refer to causal differences as “effects”, f.e. ◮ “What is the effect of college on earnings?” With the following potential outcomes ◮ y 1 i = earnings with college ◮ y 0 i = earnings without without college we can define the sample average treatment effect (SATE) n � SATE = 1 ( y 1 i − y 0 i ) n i =1 4/38

Introduction – Fundamental problem of causal inference The observed outcome (earnings) equals y i = x i y 1 i + (1 − x i ) y 0 i depending on whether i went to college ( x i = 1) or not ( x i = 0) For a given person i the causal effect of x i on y i ◮ equals: y 1 i − y 0 i ◮ is not identified: only 1 potential outcome is observed (depending on x i ) Q: How do we compute the SATE given that we only see y i ? 5/38

Introduction – Confounders How do we estimate SATE = y 1 − y 0 with observed earnings y i ? We can compare average observed earnings of people with college and people without college � SATE = y college − y no college = y 1college − y 0no college Does this give the SATE? Only if E [ y 0 | college] = E [ y 0 | no college] E [ y 1 | college] = E [ y 1 | no college] college is randomly assigned! 6/38

Introduction – RCT’s In practice we think that college graduates are different from people without college in ways that matter for their earnings, f.e. E [ ability | x = 1] � = E [ ability | x = 0] If ability also affects y , then we can no longer attribute differences in y to differences in x alone! This is why randomized control trials (RCT’s) are considered the “gold standard” when estimating causal effects: ◮ randomization equalizes the treatment and control group on pre-existing characteristics such as ability 7/38

Introduction – RCT’s This balancing properting of RCT’s is good for internal validity ◮ internal validity = � SATE can be given a causal interpretation in the sample ◮ but watch out for: ◮ Hawthorne effects (somebody’s watching me!) ◮ John Henry effects (am gonna show you!) ◮ Attrition Bias (let’s get outta here!) The external validity of RCT’s is sometimes less clear ◮ external validity = is the SATE estimate valid for other samples? 8/38

Two-sample estimator But for now assume that x is randomized by a coin flip We want to test the following H 0 : E [ y | x = 1] = E [ y | x = 0] vs H 1 : E [ y | x = 1] � = E [ y | x = 0] or H 0 : θ = 0 vs H 1 : θ � = 0 where θ = E [ y | x = 1] − E [ y | x = 0] We estimate θ with the corresponding sample average ˆ θ = y x =1 − y x =0 9/38

Example – Social Pressure Experiment August 2006 Primary Statewide Election in Michigan Send postcards with different (randomly assigned) messages 1. no message (control group) 2. civic duty message 3. “you are being studied” message (Hawthorne effect) 4. neighborhood social pressure message 10/38

Example – Social Pressure Experiment 11/38

Example – Social Pressure Experiment, Balance social = read.csv ("_data/social.csv") with (social, tapply (primary2004, messages, mean)) ## Civic Duty Control Hawthorne Neighbors ## 0.399 0.400 0.403 0.407 with (social, tapply (hhsize, messages, mean)) ## Civic Duty Control Hawthorne Neighbors ## 2.19 2.18 2.18 2.19 12/38

Example – Social Pressure Experiment, Effects m06 = with (social, tapply (primary2006, messages, mean)) m06 ## Civic Duty Control Hawthorne Neighbors ## 0.315 0.297 0.322 0.378 m06 - m06["Control"] ## Civic Duty Control Hawthorne Neighbors ## 0.0179 0.0000 0.0257 0.0813 13/38

Example – Social Pressure Experiment Turnout rate: Y T = 0 . 38, Y C = 0 . 30, Sample size: n T = 360, n C = 1890 Estimated average treatment effect: � ATE = Y T − Y C = 0 . 08 How to compute the 95% CI of ˆ θ and perform a test of H 0 ? 14/38

One-Sample CI and t-Test First remember how we computed the CI and perform the t-test for a single average ¯ x ? By the CLT we know that in large samples approx ¯ X ∼ N ( E [ X ] , Var( X ) / n ) and � CI 95% = � � E [ X ] ± 1 . 96 × Var( X ) / n = ¯ X ± 1 . 96 × SE where � � 1 x ) 2 / n SE = ( x i − ¯ ( n − 1) and ¯ X − E [ X ] t = ∼ t ( n − 1) SE 15/38

Two-Sample CI and t-Test Our estimator is now the difference between 2 sample averages: ˆ θ = y x =1 − y x =0 How is ˆ θ distributed? Since approx N ( µ 1 , σ 2 ∼ y x = k ¯ k / n k ) where µ k = E [ Y | X = k ] and σ 2 k = Var( Y | X = k ) we have that approx ˆ N ( µ 1 − µ 0 , σ 2 1 / n 1 + σ 2 θ ∼ 0 / n 0 ) which follows from the following result. 16/38

Sums of normal random variables Sums of normal random variables If X k ∼ N ( µ k , σ 2 k ) k = 1 , 2 where µ k = E [ X k ] and σ 2 k = Var( X k ), then X 1 + X 2 ∼ N ( E [ X 1 + X 2 ] , Var( X 1 + X 2 )) ∼ N ( µ 1 + µ 2 , σ 2 1 + σ 2 1 + 2 σ 12 ) where σ 12 = Cov ( X 1 , X 2 ), and σ 12 = 0 if X 1 & X 2 are independent. 17/38

Example – Social Pressure Experiment Turnout rate: Y T = 0 . 38, Y C = 0 . 30, Sample size: n T = 360, n C = 1890 Estimated average treatment effect: � ATE = Y T − Y C = 0 . 07 Standard error: � Y T (1 − Y T ) + Y C (1 − Y C ) SE = = 0 . 028 n T n C 95% Confidence intervals based on CLT: ( � � ATE − SE × z 0 . 975 , ATE + SE × z 0 . 025 ) = (0 . 026 , 0 . 134) 18/38

t-Test – Large samples Under H 0 : θ = 0 our test-statistic now becomes ˆ θ − 0 y x =1 − y x =0 − 0 t = = � SE (ˆ θ ) σ 2 σ 2 ˆ x =1 / n 1 + ˆ x =0 / n 0 σ 2 where ˆ x = k is the sample variance of y i for the x = k group: � 1 σ 2 ( y i − y x = k ) 2 ˆ x = k = n k − 1 i : x i = k Under H 0 : θ = 0 and in large samples approx ∼ N (0 , 1) t 19/38

t-Test – Small samples In small samples approx t ∼ t ( k ) where ( σ 2 x =1 / n 1 + σ 2 x =0 / n 0 ) 2 k ≈ σ 4 x =1 / ( n 2 1 ( n 1 − 1)) + σ 4 x =0 / ( n 2 0 ( n 0 − 1)) when σ 2 x =1 = σ 2 x =0 we get that k = n − 2. 20/38

Example – Social Pressure Experiment Turnout rate: Y T = 0 . 38, Y C = 0 . 30, Sample size: n T = 360, n C = 1890 Estimated average treatment effect: � ATE = Y T − Y C = 0 . 07 Standard error: � Y T (1 − Y T ) + Y C (1 − Y C ) SE = = 0 . 028 n T n C T-statistic: t = 0 . 08 0 . 028 ≈ 2 . 9 21/38

Example – Social Pressure Experiment, Two-sample test We have the following reference distribution under H 0 : p T = p C N (0 , p (1 − p ) + p (1 − p ) ) n T n C p = (.38 * 360 + .3 * 1890) / (360 + 1890) se0 = sqrt (p * (1 - p) / 360 + p * (1 - p) / 1890) p; se0 ## [1] 0.313 ## [1] 0.0267 t = 0 . 07 0 . 028 = 2 . 9 gives a p -value of: 2 * pnorm (.08 / se0, lower.tail = F) # p-value ## [1] 0.00269 22/38

Power We reject the null if the test statistic is “too large” to be consistent with our null hypothesis: � if | t | > c reject H 0 decision = do not reject H 0 if | t | ≤ c H 0 is true H 0 is false Not reject H 0 Correct Type II error probability 1 − α probability β Reject H 0 Type I error Correct probability 1 − β probability α Hypothesis tests control the probability of Type I error, which is equal to the level of tests or α They do not control the probability of Type II error 23/38

Power Null hypotheses are often uninteresting But, hypothesis testing may indicate the strength of evidence for or against your theory Our ability to discrimintate between H 0 and H 1 is measured by power: power = 1 − Pr(Type II error) = 1 − β A large p -value can occur either because H 0 is true or because H 0 is false but the test is not powerful. There is a tradeoff between the two types of error, but typically, we want a most powerful test given the level 24/38

Power Analysis Power analysis: 1. Choose H 0 , H 1 , α ◮ f.e. H 0 : µ = µ 0 , H 1 : µ = µ 1 , α = 0 . 05 ◮ µ = µ 1 which implies X ∼ N ( µ 1 , V ( X ) / n ) 2. Choose population parameter under hypothetical data generating process 3. Fix either 3.1 the sample size, and compute power ◮ we reject H 0 if | X | > µ 0 + z α/ 2 × SE 3.2 the desired power, and compute required sample size ◮ fix the probability in 3a. and solve for n 25/38

Power – The Big Picture Distribution Area = under H 0 Pr(Type II error) Distribution under H 1 Density Critical Level under Critical Level 26/38

Low Power Distribution Area = under H 0 Pr(Type II error) Density Distribution under H 1 z 1 −α 2 27/38

High Power Distribution Distribution Area = under H 0 under H 1 Pr(Type II error) Density z 1 −α 2 28/38

Comparing two samples Edwin Leuven Introduction We will now study - PowerPoint PPT Presentation

Comparing two samples Edwin Leuven Introduction We will now study the relationship between 2 variables X and Y We consider the case when X is binary men/women, high school/college, city/countryside, treated/untreated and call X the

Business Statistics CONTENTS Comparing two samples Comparing two unrelated samples Comparing

Comparing Two Samples We are often interested in comparing measurements made under two different

Chapter 8 Slide 1 Inferences from Two Samples 8-1 Overview 8-2 Inferences about Two Proportions

Single Factor Experiments Moving from comparing two samples (e.g. two formulations of mortar) to

Comparing Several Samples We are often interested in comparing measurements made under more than

Samples Advertising of samples and handing out samples Advertising Education and Assurance

-Samples [AB98] Hyp: domain S is a smooth curve or surface. S 1 -Samples [AB98] Hyp:

Huamei Dong 04/12/2016 1. Z test or T test for one mean (one sample) or two means (two samples)

Business Statistics CONTENTS Comparing two s Comparing more than two s Analysis of

STAT 113 Independent vs. Paired Samples Colin Reimer Dawson Oberlin College November 16, 2017

Biased and Unbiased Samples James J. Heckman Econ 312, Spring 2019 May 14, 2019 1 / 125

Biased and Unbiased Samples James J. Heckman Econ 312, Spring 2019 May 13, 2019 1 / 125

Climate: What Is It Anyway Comparing Weather and Climate Climate Regions and Biomes Comparing

Business Statistics CONTENTS Comparing the variance of two populations The -distribution The

ACMS 20340 Statistics for Life Sciences Chapter 20: Comparing Two Proportions Two sample tests

Labeling Blood Samples There are documented occurrences and near misses of mislabeling of blood

CS Education Research at Virginia Tech Cliff Shaffer Department of Computer Science Virginia Tech

Human factors: Intertwining of formal and empirical methods in software engineering HaPoC

Hawthorne E F F E C T Direct-to-patient clinical trial participation platform Pre-COVID

26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate

Potential outcomes & threats to validity February 19, 2020 Fill out your reading report

Evaluation C.W. Johnson, Univ ersit y of Glasgo w, Glasgo w, G12 8QQ. Scotland.

Pitfalls and Countermeasures in Software Quality Measurements and Evaluations Hironori Washizaki

MARK2052 MR3 Quantitative Research (T3-2019) 1 Lecture structure for this lecture Course

Comparing two samples Edwin Leuven Introduction We will now study - PowerPoint PPT Presentation

Comparing two samples Edwin Leuven Introduction We will now study the relationship between 2 variables X and Y We consider the case when X is binary men/women, high school/college, city/countryside, treated/untreated and call X the

Business Statistics CONTENTS Comparing two samples Comparing two unrelated samples Comparing

Comparing Two Samples We are often interested in comparing measurements made under two different

Chapter 8 Slide 1 Inferences from Two Samples 8-1 Overview 8-2 Inferences about Two Proportions

Single Factor Experiments Moving from comparing two samples (e.g. two formulations of mortar) to

Comparing Several Samples We are often interested in comparing measurements made under more than

Samples Advertising of samples and handing out samples Advertising Education and Assurance

-Samples [AB98] Hyp: domain S is a smooth curve or surface. S 1 -Samples [AB98] Hyp:

Huamei Dong 04/12/2016 1. Z test or T test for one mean (one sample) or two means (two samples)

Business Statistics CONTENTS Comparing two s Comparing more than two s Analysis of

STAT 113 Independent vs. Paired Samples Colin Reimer Dawson Oberlin College November 16, 2017

Biased and Unbiased Samples James J. Heckman Econ 312, Spring 2019 May 14, 2019 1 / 125

Biased and Unbiased Samples James J. Heckman Econ 312, Spring 2019 May 13, 2019 1 / 125

Climate: What Is It Anyway Comparing Weather and Climate Climate Regions and Biomes Comparing

Business Statistics CONTENTS Comparing the variance of two populations The -distribution The

ACMS 20340 Statistics for Life Sciences Chapter 20: Comparing Two Proportions Two sample tests

Labeling Blood Samples There are documented occurrences and near misses of mislabeling of blood

CS Education Research at Virginia Tech Cliff Shaffer Department of Computer Science Virginia Tech

Human factors: Intertwining of formal and empirical methods in software engineering HaPoC

Hawthorne E F F E C T Direct-to-patient clinical trial participation platform Pre-COVID

26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate

Potential outcomes &amp; threats to validity February 19, 2020 Fill out your reading report

Evaluation C.W. Johnson, Univ ersit y of Glasgo w, Glasgo w, G12 8QQ. Scotland.

Pitfalls and Countermeasures in Software Quality Measurements and Evaluations Hironori Washizaki

MARK2052 MR3 Quantitative Research (T3-2019) 1 Lecture structure for this lecture Course

Potential outcomes & threats to validity February 19, 2020 Fill out your reading report