SLIDE 1
ACMS 20340 Statistics for Life Sciences Chapter 20: Comparing Two - - PowerPoint PPT Presentation
ACMS 20340 Statistics for Life Sciences Chapter 20: Comparing Two Proportions Two sample tests of proportion Just as we compared the means of two different populations, we would like to compare the proportions of two different populations. A
SLIDE 2
SLIDE 3
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
◮ $\hat{p}_1 - \hat{p}_2$ is approximately Normal.
◮ The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
◮ The standard deviation of $\hat{p}_1 - \hat{p}_2$ is
$$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$
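The three facts above can be checked empirically. The following is a minimal simulation sketch, not part of the original slides: the function name `simulate_differences` and the population values (0.16 and 0.20, with sample sizes 183 and 164) are assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_differences(p1, n1, p2, n2, reps=2000, seed=0):
    """Draw `reps` values of p1_hat - p2_hat from two independent samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count successes in each sample, one Bernoulli trial at a time.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical population proportions and sample sizes (illustration only).
p1, n1, p2, n2 = 0.16, 183, 0.20, 164
diffs = simulate_differences(p1, n1, p2, n2)

# Standard deviation predicted by the formula for p1_hat - p2_hat.
theory_sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean {mean(diffs):.4f} vs theory {p1 - p2:.4f}")
print(f"simulated sd   {stdev(diffs):.4f} vs theory {theory_sd:.4f}")
```

With a few thousand repetitions the simulated mean and standard deviation should land close to the theoretical values, and a histogram of `diffs` should look roughly bell-shaped.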
SLIDE 4
Two population large sample CI
For a large sample CI we can estimate the SE using $\hat{p}_1$ and $\hat{p}_2$:
$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
and the corresponding confidence interval is
$$(\hat{p}_1 - \hat{p}_2) \pm z^* SE.$$
Conditions on using the large sample CI: the samples must be large enough that each sample contains at least 10 successes and at least 10 failures.
SLIDE 5
DHMO Example, continued
Broadening our survey on DHMO awareness, we would like to compare Elkhart with South Bend by finding a 95% CI for the difference in awareness. We collect the following data.

Population     Sample Size   Successes   Proportion
(1) S.Bend     183           29          0.158
(2) Elkhart    164           33          0.201

Do these data meet the requirements for a large sample confidence interval?
◮ We need each sample to contain at least 10 successes and 10 failures.
◮ Check! South Bend has 29 successes and 154 failures; Elkhart has 33 successes and 131 failures.
SLIDE 6
DHMO 2 population large sample CI, cont.
Population     Sample Size   Successes   Proportion
(1) S.Bend     183           29          0.158
(2) Elkhart    164           33          0.201

We calculate the standard error of $\hat{p}_1 - \hat{p}_2$:
$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = 0.041$$
Giving the interval
$$(0.158 - 0.201) \pm (1.96)(0.041) = [-0.124, 0.038]$$
Since 0 lies inside the interval, the data are consistent with there being no difference between the two populations.
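The arithmetic on this slide can be reproduced with a short script. This is a minimal sketch rather than anything from the original slides: the function name `two_prop_large_sample_ci` is made up here, and `NormalDist` from Python's standard library is used to obtain the critical value z*.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_large_sample_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample CI for p1 - p2 from success counts and sample sizes."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Unpooled standard error: each sample contributes its own proportion.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Two-sided critical value z*, about 1.96 for a 95% level.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se, se


# DHMO data from the slide: South Bend (29 of 183), Elkhart (33 of 164).
low, high, se = two_prop_large_sample_ci(29, 183, 33, 164)
print(f"SE = {se:.3f}, CI = [{low:.3f}, {high:.3f}]")
```

The script works from the raw counts rather than the rounded proportions, which is why the endpoints match the slide's interval to three decimal places.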
SLIDE 7
“Plus Four”
Just as in the one-sample situation, there is a way to adjust $\hat{p}_1$ and $\hat{p}_2$ to get more accurate confidence intervals. This is the “plus four” technique adapted for two samples. To do this we still add 4 imaginary observations. This time we add two to each sample, with each sample getting one imaginary success and one imaginary failure.
$$\tilde{p}_1 = \frac{X_1 + 1}{n_1 + 2}, \qquad \tilde{p}_2 = \frac{X_2 + 1}{n_2 + 2}$$
As before, the plus-four technique lets us construct a CI under much weaker conditions than the large sample method requires.
Conditions on plus-four CI: each sample has at least 5 observations (with any mixture of successes and failures).
SLIDE 8
Plus four example
Let's construct a plus-four CI on the data from before.

Population     Sample Size   Successes   Proportion
(1) S.Bend     183           29          0.158
(2) Elkhart    164           33          0.201

Calculate
$$\tilde{p}_1 = \frac{29 + 1}{183 + 2} = 0.162, \qquad \tilde{p}_2 = \frac{33 + 1}{164 + 2} = 0.205$$
The standard error is
$$SE = \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}} = 0.041$$
And the interval is
$$(\tilde{p}_1 - \tilde{p}_2) \pm z^* SE = (0.162 - 0.205) \pm (1.96)(0.041) = [-0.124, 0.039]$$
Note that we used the altered sample sizes when calculating SE.
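The plus-four calculation differs from the large sample one only in the adjusted counts and sample sizes. A minimal sketch, assuming a made-up function name `two_prop_plus_four_ci` and using the standard library's `NormalDist` for z*:

```python
from math import sqrt
from statistics import NormalDist


def two_prop_plus_four_ci(x1, n1, x2, n2, level=0.95):
    """Plus-four CI for p1 - p2: one extra success and failure per sample."""
    # Adjusted sample sizes and proportions.
    m1, m2 = n1 + 2, n2 + 2
    p1_tilde = (x1 + 1) / m1
    p2_tilde = (x2 + 1) / m2
    # The standard error uses the adjusted sample sizes as well.
    se = sqrt(p1_tilde * (1 - p1_tilde) / m1 + p2_tilde * (1 - p2_tilde) / m2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_tilde - p2_tilde
    return diff - z_star * se, diff + z_star * se, se


# Same DHMO counts as before: 29 of 183 and 33 of 164.
low, high, se = two_prop_plus_four_ci(29, 183, 33, 164)
print(f"SE = {se:.3f}, CI = [{low:.3f}, {high:.3f}]")
```

For samples this large the adjustment barely moves the interval, which is consistent with the two intervals in the slides agreeing to within 0.001.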
SLIDE 9
Comparing Two Proportions
Recall that we are comparing proportions between two different populations. We have a sample from each population.

Population   Parameter   Sample Size   Successes   Proportion
1            $p_1$       $n_1$         $X_1$       $\hat{p}_1$
2            $p_2$       $n_2$         $X_2$       $\hat{p}_2$

Last time we discussed how to find a confidence interval for the difference between the two population proportions, $p_1 - p_2$.
SLIDE 10
Significance Tests for Comparing Proportions
As we did when comparing two means, we’ll assume a null hypothesis of no difference.
$$H_0 : p_1 = p_2$$
The alternative hypothesis is also treated the same way. It can be one or two sided.
$$H_a : p_1 \neq p_2 \qquad H_a : p_1 > p_2 \qquad H_a : p_1 < p_2$$
SLIDE 11
Significance Tests for Comparing Proportions
Also, since $p_1$ and $p_2$ are unknown, we’ll use $\hat{p}_1 - \hat{p}_2$ to estimate the difference between the two proportions. To calculate a test statistic, we’ll need a value for the standard error. For confidence intervals, we used
$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
SLIDE 12
Standard Error
However, there is a change we can make when comparing proportions. If $H_0$ is true, then we can pool the two samples together to obtain a more accurate standard error. We do this by calculating a pooled sample proportion
$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}$$
SLIDE 13
Pooled Sample Proportion
Why can we combine the samples together? Because if $H_0$ is true, all observations in both samples come from populations with the same proportion of successes, so we can treat our observations as coming from one combined population. This is a special feature of tests for proportion. When comparing two means, the two populations could have different standard deviations. So assuming the null hypothesis is true in that case did not allow us to combine the two populations.
SLIDE 14
Pooled Sample Proportion
However, the standard deviation for the sampling distribution of proportions depends only on $p$ (the proportion for the combined populations). This changes our standard error to
$$SE = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
We’ll use this when calculating our test statistic.
SLIDE 15
Sampling Distribution
SLIDE 16
Significance Test for Comparing Two Proportions
To test the hypothesis $H_0 : p_1 = p_2$, we first find the pooled sample proportion. Then the two-sample z test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
We can use this procedure when the counts of successes and failures are each 5 or more in each sample.
SLIDE 17
Gastric Freezing 1
Gastric freezing was once a treatment for ulcers. Patients would swallow a deflated balloon with tubes to cool their stomach for an hour in hope of reducing acid production and relieving ulcer pain. The treatment was safe and widely used for years. A randomized comparative experiment was later performed to test the effectiveness of gastric freezing.
SLIDE 18
Gastric Freezing 2
The results of the experiment are summarized as follows. Note that a “success” is if the patient’s condition improves from the treatment administered.

Population   Treatment          Sample Size   Successes   Proportion
1            Gastric Freezing   82            28          0.342
2            Placebo            78            30          0.385
SLIDE 19
Gastric Freezing 3
As usual, we’ll use the null hypothesis of “no difference.”
$$H_0 : p_1 = p_2$$
Since we expect the treatment to work better than a placebo, we’ll use a one-sided alternative.
$$H_a : p_1 > p_2$$
Also, we have enough successes and failures to pass the necessary conditions, so we can proceed.
SLIDE 20
Gastric Freezing 4
First, we’ll find the pooled sample proportion.
$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}} = \frac{28 + 30}{82 + 78} = \frac{58}{160} = 0.363$$
SLIDE 21
Gastric Freezing 5
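So the two-sample z test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.342 - 0.385}{\sqrt{(0.363)(0.637)\left(\frac{1}{82} + \frac{1}{78}\right)}} = -0.57$$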
SLIDE 22
Gastric Freezing 6
Recall our alternative hypothesis is $H_a : p_1 > p_2$, so the P-value is the area under the standard Normal curve to the right of $z = -0.57$. The P-value is 0.7157. We fail to reject the null hypothesis. There is no evidence that gastric freezing is better than a placebo.
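The whole gastric freezing test can be run from the raw counts. This is a minimal sketch; the function name `two_prop_z_test` and its `alternative` argument are hypothetical, introduced here for illustration, and the P-value comes from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-sample z test of H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: all successes over all individuals.
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p1 > p2
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p1 < p2
        p_value = std_normal.cdf(z)
    else:                             # Ha: p1 != p2
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Gastric freezing (28 of 82) versus placebo (30 of 78), Ha: p1 > p2.
z, p_value = two_prop_z_test(28, 82, 30, 78, alternative="greater")
print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```

Because the script does not round z before computing the tail area, its P-value can differ from the table value of 0.7157 in the third decimal place. The conclusion is unchanged.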
SLIDE 23
Single Sample t Procedures
Situation: We want to perform inference methods on a single sample mean without knowing the mean or standard deviation of the population.
Confidence Interval: A level C confidence interval for the mean $\mu$ of a population is
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
where $t^*$ is the critical value of the t distribution with $n - 1$ degrees of freedom.
SLIDE 24
Single Sample t Procedures
Hypothesis Test: In order to test the hypothesis $H_0 : \mu = \mu_0$, we calculate the one-sample t statistic
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
and find P-values from the $t(n - 1)$ distribution.
Conditions: The t procedures work for samples of size n ≥ 15, unless the sample has outliers or strong skewness.
SLIDE 25
Special Case: Matched Pairs
Situation: With matched pairs, we perform inference on the difference between the two groups (as in the case with two independent samples). However, since the individuals in each group are paired up, we treat the individual differences as a single sample. We then perform the single sample inference tests on the unknown mean of the differences, $\mu_d$.
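The reduction of matched pairs to a single sample of differences can be sketched in a few lines. The function name `matched_pairs_t` and the data are hypothetical, made up purely for illustration. The standard library has no t distribution, so the sketch stops at the t statistic and its degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(before, after, mu0=0.0):
    """One-sample t statistic computed on the paired differences."""
    # Each pair contributes one difference; these form the single sample.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = (mean(diffs) - mu0) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom


# Hypothetical paired measurements (not from the slides).
before = [10, 12, 9, 11, 13]
after = [11, 14, 12, 15, 18]
t, df = matched_pairs_t(before, after)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The P-value would then be read from a t table with `df` degrees of freedom.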
SLIDE 26
Two-sample t Procedures
Situation: We want to perform inference methods by comparing two population means. We have two samples, which are independent. We do not know the mean or standard deviation of either population.
Degrees of Freedom: Instead of using $n - 1$ (since the samples may be different sizes), we calculate degrees of freedom with the following equation.
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
SLIDE 27
Two-sample t Procedures
Confidence Interval: A level C confidence interval for the difference between the population means, $\mu_1 - \mu_2$, is
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Hypothesis Test: To test the hypothesis $H_0 : \mu_1 = \mu_2$, calculate the two-sample t statistic,
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
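The two-sample t statistic and its degrees of freedom are tedious by hand but short in code. A minimal sketch, assuming a made-up function name `welch_t` and hypothetical summary statistics that do not come from the slides:

```python
from math import sqrt


def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic and its approximate degrees of freedom."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    # Degrees of freedom from the two-sample formula.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics (illustration only).
t, df = welch_t(mean1=10.0, s1=2.0, n1=10, mean2=8.0, s2=2.0, n2=10)
print(f"t = {t:.3f}, df = {df:.1f}")
```

When the two samples have equal sizes and equal standard deviations, as in this example, the formula gives $n_1 + n_2 - 2$ degrees of freedom. In general the result is not a whole number and lies between the smaller of $n_1 - 1$ and $n_2 - 1$ and $n_1 + n_2 - 2$.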
SLIDE 28
Two-sample t Procedures
Note that we find P-values and critical values in the same manner as the one-sample procedures. However, the degrees of freedom for both methods come from the equation given earlier.
Conditions: Both sample sizes should be at least 5, unless the distributions do not have similar shapes.
SLIDE 29
Single Sample Proportion Procedures
Situation: We want to perform inference methods on a proportion taken from a single sample (categorical variable).
Confidence Interval (large sample): A level C large sample confidence interval for the population proportion $p$ is
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Conditions: The number of successes and failures should each be at least 15.
SLIDE 30
Single Sample Proportion Procedures
Confidence Interval (plus four): A level C “plus four” confidence interval for the population proportion $p$ is
$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$$
where
$$\tilde{p} = \frac{\text{number of successes in the sample} + 2}{n + 4}$$
Conditions: The confidence level must be at least 90% and the sample size should be at least 10.
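The two one-sample intervals differ only in whether the counts are adjusted first, so one function can cover both. This is a minimal sketch: the function name `one_prop_ci`, its `plus_four` flag, and the example counts are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_ci(x, n, level=0.95, plus_four=False):
    """Large-sample or plus-four CI for a single proportion."""
    if plus_four:
        # Add 2 imaginary successes and 2 imaginary failures.
        x, n = x + 2, n + 4
    p = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical sample: 20 successes out of 50 (not from the slides).
lo, hi = one_prop_ci(20, 50)
lo4, hi4 = one_prop_ci(20, 50, plus_four=True)
print(f"large sample: [{lo:.3f}, {hi:.3f}]")
print(f"plus four:    [{lo4:.3f}, {hi4:.3f}]")
```

The plus-four interval is centered at $\tilde{p}$ rather than $\hat{p}$, which pulls the center slightly toward 0.5.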
SLIDE 31
Single Sample Proportion Procedures
Hypothesis Test: In order to test the hypothesis $H_0 : p = p_0$, we calculate the one-sample z statistic,
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$
Conditions: The expected counts must be large enough so that $np_0$ and $n(1 - p_0)$ are both 10 or more. (This does not rely on the sample at all.)
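Note that the standard error in this test uses the hypothesized value $p_0$, not $\hat{p}$. A minimal sketch of the calculation, with a made-up function name `one_prop_z_test` and hypothetical data; it reports the two-sided P-value only.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0):
    """One-sample z test of H0: p = p0 with a two-sided P-value."""
    # The conditions depend only on n and p0, not on the sample.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("expected counts n*p0 and n*(1-p0) must be >= 10")
    p_hat = x / n
    # Standard error computed under the null hypothesis.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample: 60 successes in 100 trials, testing p0 = 0.5.
z, p_value = one_prop_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```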
SLIDE 32
Two-sample Proportion Procedures
Situation: We want to perform inference methods by comparing two population proportions. We have two samples, which are independent.
Confidence Interval (large sample): A level C confidence interval for the difference of the two population proportions, $p_1 - p_2$, is
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Conditions: Counts of successes and failures must be at least 10 for each of the two samples.
SLIDE 33
Two-sample Proportion Procedures
Confidence Interval (plus four): A level C “plus four” confidence interval for the difference of the two population proportions, $p_1 - p_2$, is
$$(\tilde{p}_1 - \tilde{p}_2) \pm z^* \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}}$$
where $\tilde{p}_i = (X_i + 1)/(n_i + 2)$, that is, 2 trials with one success are added to each sample.
Conditions: Each sample size must be at least 5.
SLIDE 34
Two-sample Proportion Procedures
Hypothesis Test: In order to test the hypothesis $H_0 : p_1 = p_2$, we calculate the two-sample z statistic,
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where the pooled sample proportion is