Variability of a Difference CI and Test for Difference of Proportions CI and Test for Difference of Means
STAT 113: Analytic Intervals and Tests for Differences Between Two Groups
Colin Reimer Dawson, Oberlin College
November 9, 2017
Cases to Address
We will need standard errors to do CIs and tests for the following parameters:
1. Single Proportion (Last Time)
2. Single Mean (Wrap Up Today)
3. Difference of Proportions (Today)
4. Difference of Means (Today)
5. Mean of Differences (Next Week)
Example: Penguins Again!
Penguin Breeding
The scientists who studied whether metal bands were harmful to penguin survival also examined whether they affected the penguins' breeding patterns. For the metal-band group, 39 of 122 penguin-seasons (breeding seasons, combined across penguins) resulted in offspring (32%). In the control group, 70 out of 160 penguin-seasons resulted in offspring (44%).
- If we want to construct a confidence interval or do a test about the difference in the proportion of "breeding opportunities" that were successful, what is the relevant population parameter?
- What is the relevant sample statistic?
Variance and Standard Error of Differences
- With two independent samples, A and B, quantities such as $\hat{p}_A - \hat{p}_B$ that depend on both random samples have two independent sources of variability.
- So the difference is more variable than either sample statistic alone.
- Specifically, the variance of the difference is the sum of the separate variances:

$$s^2_{\hat{p}_A - \hat{p}_B} = s^2_{\hat{p}_A} + s^2_{\hat{p}_B}$$
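A quick simulation can make the additivity of variances concrete. The sketch below is illustrative only: the population proportions, sample sizes, and number of replications are made-up values, not taken from the slides.

```r
# Illustrative simulation; every numeric setting here is an assumption for the demo.
set.seed(113)
p_A <- 0.3; n_A <- 100   # hypothetical population proportion and sample size, group A
p_B <- 0.5; n_B <- 150   # hypothetical values for group B
n_sims <- 100000         # number of simulated pairs of samples

# Draw many independent samples and record each sample proportion
phat_A <- rbinom(n_sims, size = n_A, prob = p_A) / n_A
phat_B <- rbinom(n_sims, size = n_B, prob = p_B) / n_B

var_of_difference <- var(phat_A - phat_B)           # variance of the difference
sum_of_variances  <- var(phat_A) + var(phat_B)      # sum of the separate variances
theoretical       <- p_A * (1 - p_A) / n_A + p_B * (1 - p_B) / n_B

# The three values should agree closely
print(c(simulated = var_of_difference,
        sum       = sum_of_variances,
        theory    = theoretical))
```

Because the two samples are drawn independently, the simulated variance of the difference lands close to the sum of the two separate variances, and both are close to the theoretical value.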
Variance and Standard Error of Proportions
- Recall: across all random samples, the standard deviation of the sample proportions (i.e., the standard error) is

$$s_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

where $p$ is the population proportion and $n$ is the sample size.
- The variance of $\hat{p}$ is the square of this; i.e., the same thing without the square root: $s^2_{\hat{p}} = \frac{p(1 - p)}{n}$.
Standard Error of Difference of Proportions
So the variance of the difference between two independent sample proportions is

$$s^2_{\hat{p}_A - \hat{p}_B} = \frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}$$

and the standard deviation (i.e., standard error) of the difference is

$$s_{\hat{p}_A - \hat{p}_B} = \sqrt{\frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}}$$
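The standard error formula translates directly into a one-line R function. This is a minimal sketch: `se_diff_prop` is a hypothetical helper name, and the inputs in the example call are made up for illustration.

```r
# Standard error of the difference between two independent sample proportions.
# se_diff_prop is a hypothetical helper name, not something defined in the slides.
se_diff_prop <- function(p_A, n_A, p_B, n_B) {
  # Add the two separate variances, then take the square root
  sqrt(p_A * (1 - p_A) / n_A + p_B * (1 - p_B) / n_B)
}

# Made-up inputs for illustration
se_example <- se_diff_prop(p_A = 0.3, n_A = 100, p_B = 0.5, n_B = 150)
print(se_example)
```

Note that the function needs both proportions separately; the difference alone is not enough to compute it.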
Standard Error of Difference of Means
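The exact same reasoning applies to the standard error of a difference between means of two independent samples:

$$s_{\bar{x}_A} = \frac{\sigma_A}{\sqrt{n_A}} \qquad s_{\bar{x}_B} = \frac{\sigma_B}{\sqrt{n_B}}$$

$$s^2_{\bar{x}_A} = \frac{\sigma_A^2}{n_A} \qquad s^2_{\bar{x}_B} = \frac{\sigma_B^2}{n_B}$$

$$s^2_{\bar{x}_A - \bar{x}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}$$

$$s_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}$$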
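The same chain of steps (separate standard errors, separate variances, summed variance, square root) can be traced numerically. This is a sketch with made-up population standard deviations and sample sizes; none of these numbers come from the slides.

```r
# Hypothetical population standard deviations and sample sizes (illustration only)
sigma_A <- 5; n_A <- 25
sigma_B <- 3; n_B <- 36

# Standard error of each sample mean
se_A <- sigma_A / sqrt(n_A)
se_B <- sigma_B / sqrt(n_B)

# Variances add for independent samples, so square, add, and take the root
var_diff <- se_A^2 + se_B^2
se_diff  <- sqrt(var_diff)

print(c(se_A = se_A, se_B = se_B, se_diff = se_diff))
```

The standard error of the difference is larger than either individual standard error, but smaller than their sum.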
Analytic Approximations of Sampling Distributions
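| Param. | Stat. | Randomization | Theory SE | Test Dist. |
|---|---|---|---|---|
| $p$ | $\hat{p}$ | Simulate from $p_0$ | $\sqrt{\frac{p_0(1-p_0)}{n}}$ | Normal |
| $\mu$ | $\bar{x}$ | Bootstrap + shift | $\frac{s}{\sqrt{n}}$ | $t_{n-1}$ |
| $p_A - p_B$ | $\hat{p}_A - \hat{p}_B$ | Scramble groups | $\sqrt{\frac{p_A(1-p_A)}{n_A} + \frac{p_B(1-p_B)}{n_B}}$ | Normal |
| $\mu_A - \mu_B$ | $\bar{x}_A - \bar{x}_B$ | Scramble groups | $\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$ | $t_{\min(n_A, n_B)-1}$ |
| $\mu_D$ | $\bar{x}_D$ | Flip pairs* | $\frac{s_D}{\sqrt{n_D}}$ | $t_{n_D-1}$ |
| $\rho$ | $r$ | Scramble pairings | $\sqrt{\frac{1-r^2}{n-2}}$ | $t_{n-2}$ |

CI: Statistic ± Critical Value × SE

Standardized Test Statistic: $\dfrac{\text{Statistic} - \text{Null Param.}}{\text{SE}}$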
Distribution of ˆ pA − ˆ pB
- Condition: The sampling distribution of $\hat{p}_A - \hat{p}_B$ is approximately Normal when there are at least 10 cases in all four combinations:

$$n_A p_A \geq 10 \qquad n_A(1 - p_A) \geq 10 \qquad n_B p_B \geq 10 \qquad n_B(1 - p_B) \geq 10$$

- Mean: $p_A - p_B$
- Standard deviation (standard error):

$$SE_{\hat{p}_A - \hat{p}_B} = \sqrt{\frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}}$$
Confidence Interval for a Difference of Proportions
CI Summary: Difference of Proportions
To compute a confidence interval for a difference of two proportions when the sampling distribution for $\hat{p}_A - \hat{p}_B$ is approximately Normal (see the previous slide for conditions):
1. Find the standardized endpoints, $Z^*$, for the confidence level, using a standard Normal
2. "Destandardize" to get the endpoints

$$\hat{p}_A - \hat{p}_B \pm Z^* \cdot \sqrt{\frac{\hat{p}_A(1 - \hat{p}_A)}{n_A} + \frac{\hat{p}_B(1 - \hat{p}_B)}{n_B}}$$

Why do we use $\hat{p}_A$ and $\hat{p}_B$ in the standard error, again?
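The two steps above can be sketched in base R. `ci_diff_prop` is a hypothetical helper name, and the counts in the example call are made up; the sketch assumes the Normal approximation conditions already hold.

```r
# Confidence interval for p_A - p_B using the Normal approximation.
# ci_diff_prop is a hypothetical helper; x_A and x_B are counts of "successes".
ci_diff_prop <- function(x_A, n_A, x_B, n_B, level = 0.95) {
  phat_A <- x_A / n_A
  phat_B <- x_B / n_B
  # Plug the sample proportions into the standard error formula
  se <- sqrt(phat_A * (1 - phat_A) / n_A + phat_B * (1 - phat_B) / n_B)
  # Step 1: standardized endpoint Z* from a standard Normal
  z_star <- qnorm(1 - (1 - level) / 2)
  # Step 2: "destandardize" around the observed difference
  estimate <- phat_A - phat_B
  c(lower = estimate - z_star * se, upper = estimate + z_star * se)
}

# Made-up counts for illustration: 45 of 100 versus 60 of 150
ci_example <- ci_diff_prop(x_A = 45, n_A = 100, x_B = 60, n_B = 150, level = 0.95)
print(ci_example)
```

The sample proportions stand in for the unknown population proportions in the standard error, which is one answer to the question the slide poses.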
P-values for a difference of two sample proportions from a Standard Normal
Computing P-values when the null sampling distribution is approximately Normal (see previously stated conditions) is the reverse process:
1. Convert $\hat{p}_A - \hat{p}_B$ to a z-score within the theoretical null sampling distribution (i.e., using its mean and standard deviation).

$$Z_{\text{observed}} = \frac{\hat{p}_A - \hat{p}_B - 0}{?}$$

2. Find the relevant area beyond $Z_{\text{observed}}$ using a Standard Normal
Null Standard Error
- Problem: $H_0$ states what the difference is, but the standard error depends on each population proportion:

$$s_{\hat{p}_A - \hat{p}_B} = \sqrt{\frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}}$$

- This is not a function of the difference.
- But $H_0$ says that $p_A$ and $p_B$ are the same thing, so we can estimate this single number using $\hat{p}_{\text{combined}}$, the proportion of the relevant category across both groups.
- Note: we already hold this proportion constant when doing a randomization test.
P-values for a difference of two proportions
Computing P-values when the null sampling distribution is approximately Normal (see previously stated conditions) is the reverse process:
1. Convert $\hat{p}_A - \hat{p}_B$ to a z-score within the theoretical null sampling distribution (i.e., using its mean and standard deviation).

$$Z_{\text{observed}} = \frac{\hat{p}_A - \hat{p}_B - 0}{\sqrt{\frac{\hat{p}_{\text{combined}}(1 - \hat{p}_{\text{combined}})}{n_A} + \frac{\hat{p}_{\text{combined}}(1 - \hat{p}_{\text{combined}})}{n_B}}} = \frac{\hat{p}_A - \hat{p}_B - 0}{\sqrt{\hat{p}_{\text{combined}}(1 - \hat{p}_{\text{combined}})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}$$

2. Find the relevant area beyond $Z_{\text{observed}}$ using a Standard Normal
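A minimal base R sketch of this two-step test follows. `z_test_diff_prop` is a hypothetical helper name and the counts in the example call are made up; which tail area is "relevant" depends on the alternative hypothesis, so the sketch returns all three.

```r
# Z test for H0: p_A = p_B using the combined (pooled) proportion in the null SE.
# z_test_diff_prop is a hypothetical helper; x_A and x_B are counts of "successes".
z_test_diff_prop <- function(x_A, n_A, x_B, n_B) {
  phat_A <- x_A / n_A
  phat_B <- x_B / n_B
  # Under H0 both groups share one proportion, estimated from all cases together
  p_combined <- (x_A + x_B) / (n_A + n_B)
  se_null <- sqrt(p_combined * (1 - p_combined) * (1 / n_A + 1 / n_B))
  # Step 1: z-score of the observed difference under the null (null difference = 0)
  z <- (phat_A - phat_B - 0) / se_null
  # Step 2: tail areas from a Standard Normal
  list(z = z,
       p_lower     = pnorm(z),                     # for Ha: p_A < p_B
       p_upper     = pnorm(z, lower.tail = FALSE), # for Ha: p_A > p_B
       p_two_sided = 2 * pnorm(-abs(z)))           # for Ha: p_A != p_B
}

# Made-up counts for illustration: 45 of 100 versus 60 of 150
test_example <- z_test_diff_prop(x_A = 45, n_A = 100, x_B = 60, n_B = 150)
print(test_example)
```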
Example: Penguins Again!
Penguin Breeding
The scientists who studied whether metal bands were harmful to penguin survival also examined whether they affected the penguins' breeding patterns. For the metal-band group, 39 of 122 penguin-seasons (breeding seasons, combined across penguins) resulted in offspring (32%). In the control group, 70 out of 160 penguin-seasons resulted in offspring (44%).

Quantities to fill in:
- $n_{\text{metal}} =$
- $n_{\text{control}} =$
- $\hat{p}_{\text{metal}} =$
- $\hat{p}_{\text{control}} =$
- $\hat{p}_{\text{metal}} - \hat{p}_{\text{control}} =$
Penguins Breed Confidence (Interval)
Is the Normal approximation reasonable?

CI: point estimate ± $Z^* \cdot SE$

$$SE = \sqrt{\frac{\hat{p}_A(1 - \hat{p}_A)}{n_A} + \frac{\hat{p}_B(1 - \hat{p}_B)}{n_B}} =$$

$$Z^* =$$

Find a 90% CI for $p_A - p_B$.
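One way to fill in these blanks numerically is sketched below in base R, using only the counts stated in the example (39 of 122 and 70 of 160). Treating the metal-band group as A and the control group as B, and checking the Normal condition with the observed counts in place of the unknown population proportions, are assumptions of this sketch.

```r
# Counts taken from the penguin breeding example
x_metal   <- 39; n_metal   <- 122   # group A (assumed labeling)
x_control <- 70; n_control <- 160   # group B (assumed labeling)

phat_metal   <- x_metal / n_metal
phat_control <- x_control / n_control
diff_hat     <- phat_metal - phat_control   # point estimate

# Normal-approximation check: at least 10 successes and 10 failures per group,
# using observed counts as stand-ins for n * p and n * (1 - p)
counts    <- c(x_metal, n_metal - x_metal, x_control, n_control - x_control)
normal_ok <- all(counts >= 10)

# Standard error with the sample proportions plugged in
se <- sqrt(phat_metal * (1 - phat_metal) / n_metal +
           phat_control * (1 - phat_control) / n_control)

# 90% confidence leaves 5% in each tail
z_star <- qnorm(0.95)
ci_90  <- diff_hat + c(-1, 1) * z_star * se

print(normal_ok)
print(round(c(estimate = diff_hat, se = se, z_star = z_star,
              lower = ci_90[1], upper = ci_90[2]), 4))
```

Under these assumptions the whole interval lies below zero, which is consistent with a lower breeding success rate in the metal-band group.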
Do metal bands reduce breeding chances?
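$$Z_{\text{observed}} = \frac{\text{observed difference} - \text{null difference}}{\text{standard error}}$$

$$SE = \sqrt{\hat{p}_{\text{combined}}(1 - \hat{p}_{\text{combined}})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}$$

- $\hat{p}_{\text{combined}} =$
- $SE =$
- $Z_{\text{observed}} =$
- P-value =
- Decision?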
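A sketch of the calculation in base R, using the counts from the penguin example (39 of 122 with metal bands, 70 of 160 controls). The one-sided alternative $p_{\text{metal}} < p_{\text{control}}$ is an assumption read off the slide title ("reduce"); the decision itself depends on a significance level the slide does not state.

```r
# Counts taken from the penguin breeding example
x_metal   <- 39; n_metal   <- 122
x_control <- 70; n_control <- 160

phat_metal   <- x_metal / n_metal
phat_control <- x_control / n_control

# Under H0 the two proportions are equal, so estimate one combined proportion
p_combined <- (x_metal + x_control) / (n_metal + n_control)

# Null standard error using the combined proportion
se_null <- sqrt(p_combined * (1 - p_combined) * (1 / n_metal + 1 / n_control))

# Standardize the observed difference; the null difference is 0
z_observed <- (phat_metal - phat_control - 0) / se_null

# Assumed alternative: metal bands REDUCE breeding success, so use the lower tail
p_value <- pnorm(z_observed)

print(round(c(p_combined = p_combined, se_null = se_null,
              z_observed = z_observed, p_value = p_value), 4))
```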
Distribution of Difference of Sample Means
Distribution of $\bar{x}_A - \bar{x}_B$ ($\sigma$ known)
- Two populations with means $\mu_A$ and $\mu_B$, and standard deviations $\sigma_A$ and $\sigma_B$
- Conditions: The sampling distribution of $\bar{x}_A - \bar{x}_B$ is (approximately) Normal if
  - Populations are Normal, or
  - Sample sizes in each group are large enough (roughly, can use $n_A, n_B \geq 27$)
- Mean: $\mu_A - \mu_B$
- Standard deviation (standard error):

$$SE_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}$$
Estimating σA and σB
Distribution of standardized difference
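- Since we don't know $\sigma_A$ and $\sigma_B$, we use the sample standard deviations $s_A$ and $s_B$:

$$SE_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$

- Now, the standardized difference has (approximately) a t-distribution:

$$\frac{(\bar{x}_A - \bar{x}_B) - (\mu_A - \mu_B)}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} \sim t_{df}$$

- A conservative rule is to set $df = \min(n_A, n_B) - 1$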
Example: Credit Card Use and Tip Percentage
Credit Cards and Tip Percentage
We analyze the percent tip left on 157 bills from the First Crush bistro in Northern New York State. The mean percent tip left on the 106 bills paid in cash was 16.39 with a standard deviation of 5.05. The mean percent tip left on the 51 bills paid with a credit card was 17.10 with a standard deviation of 2.47.
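```r
# Load the plotting helpers and the textbook data set
library("mosaic")
library("Lock5Data")
data("RestaurantTips")

# Dot plots of percent tip, one panel per payment type (Credit = n or y)
dotPlot(~PctTip | Credit, data = RestaurantTips, width = 1, cex = 1)
```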
[Figure: dot plots of PctTip (horizontal axis) against Count (vertical axis), in two panels by Credit (n and y).]
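Quantities to fill in:
- $n_{\text{card}} =$
- $n_{\text{cash}} =$
- $\bar{x}_{\text{card}} =$
- $\bar{x}_{\text{cash}} =$
- $s^2_{\text{card}} =$
- $s^2_{\text{cash}} =$
- $\bar{x}_{\text{card}} - \bar{x}_{\text{cash}} =$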
Card Vs. Cash Confidence Interval
Is the Normal approximation reasonable?

CI: point estimate ± $T^* \cdot SE$

$$SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} =$$

$$T^* =$$

Find a 99% CI for $\mu_{\text{card}} - \mu_{\text{cash}}$.
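The interval can be computed from the summary statistics quoted in the example (51 card bills with mean 17.10 and standard deviation 2.47; 106 cash bills with mean 16.39 and standard deviation 5.05). The sketch below uses base R only and the conservative degrees of freedom $\min(n_A, n_B) - 1$; working from the rounded summary values rather than the raw data is an assumption, so results may differ slightly from a raw-data analysis.

```r
# Summary statistics quoted in the example (rounded values)
n_card <- 51;  xbar_card <- 17.10; s_card <- 2.47
n_cash <- 106; xbar_cash <- 16.39; s_cash <- 5.05

diff_hat <- xbar_card - xbar_cash   # point estimate of mu_card - mu_cash

# Standard error using the sample variances
se <- sqrt(s_card^2 / n_card + s_cash^2 / n_cash)

# Conservative degrees of freedom: smaller sample size minus 1
df <- min(n_card, n_cash) - 1

# 99% confidence leaves 0.5% in each tail of the t-distribution
t_star <- qt(0.995, df = df)
ci_99  <- diff_hat + c(-1, 1) * t_star * se

print(round(c(estimate = diff_hat, se = se, df = df, t_star = t_star,
              lower = ci_99[1], upper = ci_99[2]), 3))
```

Under these assumptions the interval contains zero, so the data are compatible with no difference in mean tip percentage between payment types.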
Do people who pay differently tip differently?
$$T_{\text{observed}} = \frac{\text{observed difference} - \text{null difference}}{\text{standard error}}$$

$$SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$

- $SE =$
- $T_{\text{observed}} =$
- P-value =
- Decision?
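A sketch of the test in base R, again working from the rounded summary statistics in the example. The two-sided alternative is an assumption read off the slide title ("tip differently"), and the degrees of freedom use the conservative rule $\min(n_A, n_B) - 1$; the decision depends on a significance level the slide does not state.

```r
# Summary statistics quoted in the example (rounded values)
n_card <- 51;  xbar_card <- 17.10; s_card <- 2.47
n_cash <- 106; xbar_cash <- 16.39; s_cash <- 5.05

# Standard error of the difference in sample means
se <- sqrt(s_card^2 / n_card + s_cash^2 / n_cash)

# Standardize the observed difference; the null difference is 0
t_observed <- ((xbar_card - xbar_cash) - 0) / se

# Conservative degrees of freedom
df <- min(n_card, n_cash) - 1

# Assumed two-sided alternative: area in both tails beyond |t_observed|
p_value <- 2 * pt(-abs(t_observed), df = df)

print(round(c(se = se, t_observed = t_observed, df = df, p_value = p_value), 3))
```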