ACMS 20340 Statistics for Life Sciences Chapter 20: Comparing Two Proportions



slide-1
SLIDE 1

ACMS 20340 Statistics for Life Sciences

Chapter 20: Comparing Two Proportions

slide-2
SLIDE 2

Two sample tests of proportion

Just as we compared the means of two different populations, we would like to compare the proportions of two different populations. A sample is taken from each population.

Population  Parameter  Size    Successes  Proportion
1           $p_1$      $n_1$   $X_1$      $\hat{p}_1$
2           $p_2$      $n_2$   $X_2$      $\hat{p}_2$

We will use the statistic $\hat{p}_1 - \hat{p}_2$ to estimate the difference $p_1 - p_2$.

slide-3
SLIDE 3

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

◮ $\hat{p}_1 - \hat{p}_2$ is approximately Normal.

◮ The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.

◮ The standard deviation of $\hat{p}_1 - \hat{p}_2$ is

$$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$
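The three properties above can be checked by simulation. The sketch below is only an illustration: the population proportions, sample sizes, seed, and number of repetitions are all made-up values, not taken from the slides. It repeatedly draws two independent samples and compares the simulated mean and standard deviation of the difference in sample proportions with the formulas above.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Hypothetical population proportions and sample sizes (not from the slides).
p1, n1 = 0.16, 180
p2, n2 = 0.20, 160

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_proportion(p, n):
    """Proportion of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n)) / n


# Draw many pairs of samples and record the difference in sample proportions.
diffs = [sample_proportion(p1, n1) - sample_proportion(p2, n2) for _ in range(5000)]

theory_mean = p1 - p2
theory_sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean {mean(diffs):.4f} vs theory {theory_mean:.4f}")
print(f"simulated SD   {stdev(diffs):.4f} vs theory {theory_sd:.4f}")
```

A histogram of `diffs` would be one way to look at the approximate Normal shape as well.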

slide-4
SLIDE 4

Two population large sample CI

For a large-sample CI we can estimate the standard error (SE) using $\hat{p}_1$ and $\hat{p}_2$:

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

and the corresponding confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE$$

Conditions for using the large-sample CI: the samples must be large enough that each sample contains at least 10 successes and at least 10 failures.

slide-5
SLIDE 5

DHMO Example, continued

Broadening our survey on DHMO awareness, we would like to compare Elkhart with South Bend by finding a 95% CI for the difference in awareness. We collect the following data.

Population    Size  Successes  Proportion
(1) S. Bend   183   29         0.158
(2) Elkhart   164   33         0.201

Do these data fit the requirements for a large-sample confidence interval?

◮ We need each sample to contain at least 10 successes and 10 failures.

◮ Check! (South Bend: 29 successes and 154 failures; Elkhart: 33 successes and 131 failures.)

slide-6
SLIDE 6

DHMO 2 population large sample CI, cont.

Population    Size  Successes  Proportion
(1) S. Bend   183   29         0.158
(2) Elkhart   164   33         0.201

We calculate the standard error of $\hat{p}_1 - \hat{p}_2$:

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = 0.041$$

Giving the interval (computed from unrounded values)

$$(0.158 - 0.201) \pm (1.96)(0.041) = [-0.124, 0.038]$$

Because 0 lies inside the interval, the data are consistent with there being no difference between the two populations.
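The calculation above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `two_prop_ci` is a hypothetical helper introduced here, not something from the course. It works from the raw counts rather than the rounded proportions.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample CI for p1 - p2 (hypothetical helper, not from the slides)."""
    # Large-sample condition: at least 10 successes and 10 failures per sample.
    for x, n in ((x1, n1), (x2, n2)):
        if min(x, n - x) < 10:
            raise ValueError("need at least 10 successes and 10 failures in each sample")
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # critical value, about 1.96 for 95%
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se, se


# DHMO data: South Bend (29 of 183) and Elkhart (33 of 164).
low, high, se = two_prop_ci(29, 183, 33, 164)
print(f"SE = {se:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Working from unrounded proportions is why the endpoints can differ in the last digit from a hand calculation that plugs in 0.158, 0.201, and 0.041.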

slide-7
SLIDE 7

“Plus Four”

Just as in the one-sample situation, there is a way to adjust $\hat{p}_1$ and $\hat{p}_2$ to get more accurate confidence intervals. This is the “plus four” technique adapted for two samples. To do this we still imagine adding 4 “imaginary” observations. This time we add two to each sample, with each sample getting an imaginary success and an imaginary failure.

$$\tilde{p}_1 = \frac{X_1 + 1}{n_1 + 2} \qquad \tilde{p}_2 = \frac{X_2 + 1}{n_2 + 2}$$

As before, the plus-four technique lets us construct a CI under much weaker conditions than those needed for the large-sample method.

Conditions on plus-four CI: each sample has at least 5 observations (with any mixture of successes and failures).

slide-8
SLIDE 8

Plus four example

Let’s construct a plus-four CI on the data from before.

Population    Size  Successes  Proportion
(1) S. Bend   183   29         0.158
(2) Elkhart   164   33         0.201

Calculate

$$\tilde{p}_1 = \frac{29 + 1}{183 + 2} = 0.162 \qquad \tilde{p}_2 = \frac{33 + 1}{164 + 2} = 0.205$$

The standard error is

$$SE = \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}} = 0.041$$

And the interval (computed from unrounded values) is

$$(\tilde{p}_1 - \tilde{p}_2) \pm z^* \, SE = (0.162 - 0.205) \pm (1.96)(0.041) = [-0.124, 0.039]$$

Note that we used the altered sample sizes when calculating SE.
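As a rough check of the plus-four arithmetic, here is a short Python sketch. The helper name `plus_four_two_prop_ci` is an assumption made for illustration; the formulas are the ones stated on the slides, applied to the raw counts.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_two_prop_ci(x1, n1, x2, n2, level=0.95):
    """Plus-four CI for p1 - p2 (hypothetical helper, not from the slides)."""
    # Condition from the slides: at least 5 observations in each sample.
    if min(n1, n2) < 5:
        raise ValueError("each sample needs at least 5 observations")
    # Add one imaginary success and one imaginary failure to each sample.
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    # The altered sample sizes n + 2 are used in the standard error too.
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# DHMO data: South Bend (29 of 183) and Elkhart (33 of 164).
low, high = plus_four_two_prop_ci(29, 183, 33, 164)
print(f"plus-four 95% CI = [{low:.3f}, {high:.3f}]")
```

With samples this large the plus-four interval is nearly identical to the large-sample one; the adjustment matters more for small samples.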

slide-9
SLIDE 9

Comparing Two Proportions

Recall that we are comparing proportions between two different populations. We have a sample from each population.

Population  Parameter  Size    Successes  Proportion
1           $p_1$      $n_1$   $X_1$      $\hat{p}_1$
2           $p_2$      $n_2$   $X_2$      $\hat{p}_2$

Last time we discussed how to find a confidence interval for the difference between the two population proportions, $p_1 - p_2$.

slide-10
SLIDE 10

Significance Tests for Comparing Proportions

As we did when comparing two means, we’ll assume a null hypothesis of no difference.

$$H_0: p_1 = p_2$$

The alternative hypothesis is also treated the same way. It can be one- or two-sided.

$$H_a: p_1 \neq p_2 \qquad H_a: p_1 > p_2 \qquad H_a: p_1 < p_2$$

slide-11
SLIDE 11

Significance Tests for Comparing Proportions

Also, since $p_1$ and $p_2$ are unknown, we’ll use $\hat{p}_1 - \hat{p}_2$ to estimate the difference between the two proportions. To calculate a test statistic, we’ll need a value for the standard error. For confidence intervals, we used

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

slide-12
SLIDE 12

Standard Error

However, there is a change we can make when comparing proportions. If $H_0$ is true, then we can pool the two samples together to obtain a more accurate standard error. We do this by calculating a pooled sample proportion

$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}$$

slide-13
SLIDE 13

Pooled Sample Proportion

Why can we combine the samples together? Because if $H_0$ is true, all observations in both samples come from populations with the same proportion of successes, so we can treat our observations as coming from one combined population. This is a special feature of tests for proportions. When comparing two means, the two populations could have different standard deviations even when their means are equal, so assuming the null hypothesis is true in that case did not allow us to combine the two populations.

slide-14
SLIDE 14

Pooled Sample Proportion

However, the standard deviation for the sampling distribution of proportions depends only on $p$ (the proportion for the combined populations). This changes our standard error to

$$SE = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

We’ll use this when calculating our test statistic.

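To see where the pooled form comes from, one can substitute a common value $p$ for both proportions in the standard deviation of $\hat{p}_1 - \hat{p}_2$ and factor. The short derivation below is a sketch of that step, writing $p$ for the shared proportion under $H_0$:

```latex
\begin{align*}
\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}
  &= \sqrt{\frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2}}
  && \text{(setting } p_1 = p_2 = p \text{ under } H_0\text{)} \\
  &= \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
\end{align*}
```

Replacing the unknown $p$ with the pooled estimate $\hat{p}$ gives the standard error used in the test statistic.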
slide-15
SLIDE 15

Sampling Distribution

slide-16
SLIDE 16

Significance Test for Comparing Two Proportions

To test the hypothesis $H_0: p_1 = p_2$, we first find the pooled sample proportion $\hat{p}$. Then the two-sample z test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

We can use this procedure when the counts of successes and failures are each 5 or more in each sample.

slide-17
SLIDE 17

Gastric Freezing 1

Gastric freezing was once a treatment for ulcers. Patients would swallow a deflated balloon with tubes to cool their stomach for an hour in hope of reducing acid production and relieving ulcer pain. The treatment was safe and widely used for years. A randomized comparative experiment was later performed to test the effectiveness of gastric freezing.

slide-18
SLIDE 18

Gastric Freezing 2

The results of the experiment are summarized as follows. Note that a “success” means the patient’s condition improved after the treatment administered.

Population  Treatment         Sample Size  Successes  Proportion
1           Gastric Freezing  82           28         0.342
2           Placebo           78           30         0.385

slide-19
SLIDE 19

Gastric Freezing 3

As usual, we’ll use the null hypothesis of “no difference.”

$$H_0: p_1 = p_2$$

Since we expect the treatment to work better than a placebo, we’ll use a one-sided alternative.

$$H_a: p_1 > p_2$$

Also, we have enough successes and failures to pass the necessary conditions (28 and 30 successes, 54 and 48 failures, all at least 5), so we can proceed.

slide-20
SLIDE 20

Gastric Freezing 4

First, we’ll find the pooled sample proportion.

$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}} = \frac{28 + 30}{82 + 78} = \frac{58}{160} = 0.363$$

slide-21
SLIDE 21

Gastric Freezing 5

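So the two-sample z test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.342 - 0.385}{\sqrt{(0.363)(0.637)\left(\frac{1}{82} + \frac{1}{78}\right)}} = -0.57$$
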
slide-22
SLIDE 22

Gastric Freezing 6

Recall our alternative hypothesis is $H_a: p_1 > p_2$, so the P-value is the area under the standard Normal curve to the right of $z = -0.57$. The P-value is 0.7157. We fail to reject the null hypothesis. There is no evidence that gastric freezing is better than a placebo.
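The whole gastric freezing test can be sketched in Python as follows. The function name `two_prop_z_test` and its `alternative` argument are hypothetical conveniences, not part of the course material. Because the code keeps z unrounded, its P-value may differ slightly from the table-based 0.7157.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="greater"):
    """Pooled two-sample z test of H0: p1 = p2 (hypothetical helper)."""
    # Condition from the slides: at least 5 successes and 5 failures per sample.
    for x, n in ((x1, n1), (x2, n2)):
        if min(x, n - x) < 5:
            raise ValueError("need at least 5 successes and 5 failures in each sample")
    pooled = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    std_normal = NormalDist()
    if alternative == "greater":    # Ha: p1 > p2, area to the right of z
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":     # Ha: p1 < p2, area to the left of z
        p_value = std_normal.cdf(z)
    else:                           # two-sided: both tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Gastric freezing (28 of 82 improved) versus placebo (30 of 78 improved).
z, p_value = two_prop_z_test(28, 82, 30, 78, alternative="greater")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

A negative z with a "greater" alternative always gives a P-value above 0.5, which is why the test here comes nowhere near significance.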

slide-23
SLIDE 23

Single Sample t Procedures

Situation: We want to perform inference methods on a single sample mean without knowing the mean or standard deviation of the population.

Confidence Interval: A level C confidence interval for the mean $\mu$ of a population is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

where $t^*$ is the critical value of the t distribution with $n - 1$ degrees of freedom.

slide-24
SLIDE 24

Single Sample t Procedures

Hypothesis Test: In order to test the hypothesis $H_0: \mu = \mu_0$, we calculate the one-sample t statistic

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

and find P-values from the $t(n - 1)$ distribution.

Conditions: The t procedures work for samples of size $n \geq 15$, unless the sample has outliers or strong skewness.
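A minimal Python sketch of the one-sample t statistic follows. The helper `one_sample_t` and the measurements are made up purely for illustration. The standard library has no t distribution, so the P-value would still come from a t table or statistical software.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return (t, df) for testing H0: mu = mu0 (hypothetical helper)."""
    n = len(data)
    se = stdev(data) / sqrt(n)  # stdev uses the n - 1 divisor, giving s
    t = (mean(data) - mu0) / se
    return t, n - 1


# Made-up measurements, purely for illustration (n = 15).
sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4, 5.2, 4.7, 5.5, 5.3, 5.9, 4.8, 5.1, 5.6, 5.2]
t, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```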

slide-25
SLIDE 25

Special Case: Matched Pairs

Situation: With matched pairs, we perform inference methods on the difference between the two groups (as in the case with two independent samples). However, since the individuals in each group are paired up, we treat the individual differences as a single sample. We then perform the single-sample inference procedures on the unknown mean of the differences, $\mu_d$.
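The reduction to a single sample can be sketched as below. The function `matched_pairs_t` and the before/after readings are hypothetical. The key step is taking the difference within each pair before doing any one-sample calculation.

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(first, second, mu_d0=0.0):
    """t statistic and df for H0: mu_d = mu_d0 on paired data (hypothetical helper)."""
    if len(first) != len(second):
        raise ValueError("matched pairs need the same number of observations")
    # One difference per pair: this list is the single sample we analyze.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = (mean(diffs) - mu_d0) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up before/after readings for six subjects, purely for illustration.
before = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
after = [11.6, 11.5, 12.2, 12.1, 11.3, 12.4]
t, df = matched_pairs_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Note that the degrees of freedom count pairs, not individual observations.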

slide-26
SLIDE 26

Two-sample t Procedures

Situation: We want to perform inference methods by comparing two population means. We have two samples, which are independent. We do not know the mean or standard deviation of either population.

Degrees of Freedom: Instead of using $n - 1$ (since the samples may be different sizes), we calculate degrees of freedom with the following equation.

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$
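The degrees-of-freedom formula is tedious by hand, so a small function helps. The sketch below assumes the inputs are the two sample standard deviations and sample sizes; the name `welch_df` and the example numbers are hypothetical.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the two-sample t procedures (hypothetical helper).

    s1, s2 are the sample standard deviations; n1, n2 are the sample sizes.
    """
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Made-up summary statistics, purely for illustration.
print(f"df = {welch_df(1.0, 5, 3.0, 12):.2f}")
```

The result is usually not a whole number. When the two standard deviations and sample sizes are equal, it reduces to $n_1 + n_2 - 2$.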

slide-27
SLIDE 27

Two-sample t Procedures

Confidence Interval: A level C confidence interval for the difference between the population means, $\mu_1 - \mu_2$, is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Hypothesis Test: To test the hypothesis $H_0: \mu_1 = \mu_2$, calculate the two-sample t statistic,

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
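A short sketch of the two-sample t statistic computed from summary statistics is given below. The function name `two_sample_t` and the numbers are assumptions for illustration only. The critical value $t^*$ and the P-value still need a t table or software, since the Python standard library does not provide the t distribution.

```python
from math import sqrt


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Return (t, se) for H0: mu1 = mu2 from summary statistics (hypothetical helper)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of xbar1 - xbar2
    t = (xbar1 - xbar2) / se
    return t, se


# Made-up summary statistics, purely for illustration.
t, se = two_sample_t(xbar1=10.0, s1=2.0, n1=10, xbar2=8.0, s2=2.0, n2=10)
print(f"t = {t:.3f}, SE = {se:.3f}")
```

The same `se` is what multiplies $t^*$ in the confidence interval, so one calculation serves both procedures.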

slide-28
SLIDE 28

Two-sample t Procedures

Note that we find P-values and critical values in the same manner as in the one-sample procedures. However, the degrees of freedom for each method require the equation mentioned earlier.

Conditions: Both sample sizes should be at least 5; larger samples are needed if the two distributions do not have similar shapes.

slide-29
SLIDE 29

Single Sample Proportion Procedures

Situation: We want to perform inference methods on a proportion taken from a single sample (categorical variable).

Confidence Interval (large sample): A level C large-sample confidence interval for the population proportion $p$ is

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Conditions: The number of successes and failures should each be at least 15.

slide-30
SLIDE 30

Single Sample Proportion Procedures

Confidence Interval (plus four): A level C “plus four” confidence interval for the population proportion $p$ is

$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$$

where

$$\tilde{p} = \frac{\text{number of successes in the sample} + 2}{n + 4}$$

Conditions: The confidence level must be at least 90% and the sample size should be at least 10.
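For a concrete feel, here is a hedged Python sketch of the one-sample plus-four interval. The helper name `plus_four_ci` and the counts (3 successes in 12 trials) are invented for illustration. The conditions checked are the ones stated above.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(x, n, level=0.95):
    """One-sample plus-four CI for p (hypothetical helper, not from the slides)."""
    if level < 0.90 or n < 10:
        raise ValueError("plus-four needs confidence level >= 90% and n >= 10")
    p_tilde = (x + 2) / (n + 4)  # two imaginary successes and two imaginary failures
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    return p_tilde - z_star * se, p_tilde + z_star * se


# Made-up small sample: 3 successes in 12 trials.
low, high = plus_four_ci(3, 12)
print(f"plus-four 95% CI = [{low:.3f}, {high:.3f}]")
```

With only 3 successes the large-sample interval would not be appropriate, which is the situation the plus-four adjustment is meant for.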

slide-31
SLIDE 31

Single Sample Proportion Procedures

Hypothesis Test: In order to test the hypothesis $H_0: p = p_0$, we calculate the one-sample z statistic,

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

Conditions: The expected counts must be large enough so that $np_0$ and $n(1 - p_0)$ are both 10 or more. (This does not rely on the sample at all.)
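The one-sample z test can be sketched the same way. The function `one_prop_z_test`, its `alternative` argument, and the example counts (60 successes in 100 trials against a null value of 0.5) are all hypothetical. Note that the standard error uses the null value, not the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """One-sample z test of H0: p = p0 (hypothetical helper, not from the slides)."""
    # The condition depends only on n and p0, not on the observed sample.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("need n*p0 >= 10 and n*(1 - p0) >= 10")
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "greater":    # Ha: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":     # Ha: p < p0
        p_value = std_normal.cdf(z)
    else:                           # Ha: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Made-up data: 60 successes in 100 trials, testing H0: p = 0.5.
z, p_value = one_prop_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```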

slide-32
SLIDE 32

Two-sample Proportion Procedures

Situation: We want to perform inference methods by comparing two population proportions. We have two samples, which are independent.

Confidence Interval (large sample): A level C confidence interval for the difference of the two population proportions, $p_1 - p_2$, is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

Conditions: Counts of successes and failures must be at least 10 for each of the two samples.

slide-33
SLIDE 33

Two-sample Proportion Procedures

Confidence Interval (plus four): A level C “plus four” confidence interval for the difference of the two population proportions, $p_1 - p_2$, is

$$(\tilde{p}_1 - \tilde{p}_2) \pm z^* \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}}$$

where two imaginary observations, one success and one failure, are added to each sample, so that $\tilde{p}_i = (X_i + 1)/(n_i + 2)$.

Conditions: Each sample size must be at least 5.

slide-34
SLIDE 34

Two-sample Proportion Procedures

Hypothesis Test: In order to test the hypothesis $H_0: p_1 = p_2$, we calculate the two-sample z statistic,

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where the pooled sample proportion is

$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}$$

Conditions: The number of successes and failures in each sample must be at least 5.