ACMS 20340 Statistics for Life Sciences Chapter 20: Comparing Two Proportions

Two sample tests of proportion

Just as we compared the means of two different populations, we would like to compare the proportions of two different populations. A sample is taken from each population.

Population   Parameter   Size    Successes   Sample proportion
1            $p_1$       $n_1$   $X_1$       $\hat{p}_1$
2            $p_2$       $n_2$   $X_2$       $\hat{p}_2$

We will use the statistic $\hat{p}_1 - \hat{p}_2$ to estimate the difference $p_1 - p_2$.

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

◮ $\hat{p}_1 - \hat{p}_2$ is approximately Normal.
◮ The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
◮ The standard deviation of $\hat{p}_1 - \hat{p}_2$ is

$$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$
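The three facts above can be checked with a small simulation. The sketch below is illustrative only: the population proportions and sample sizes are hypothetical values chosen for the demonstration, not data from these slides, and `simulate_diff` and `theoretical_sd` are names introduced here.

```python
import math
import random


def simulate_diff(p1, n1, p2, n2, reps=5000, seed=0):
    """Draw `reps` pairs of independent samples and return p1_hat - p2_hat for each."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


def theoretical_sd(p1, n1, p2, n2):
    """Standard deviation of p1_hat - p2_hat from the formula on the slide."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical population values, used only for illustration.
p1, n1, p2, n2 = 0.30, 50, 0.20, 60
diffs = simulate_diff(p1, n1, p2, n2)
sim_mean = sum(diffs) / len(diffs)
sim_sd = math.sqrt(sum((d - sim_mean) ** 2 for d in diffs) / (len(diffs) - 1))

print("simulated mean:", sim_mean, "target:", p1 - p2)
print("simulated sd:  ", sim_sd, "formula:", theoretical_sd(p1, n1, p2, n2))
```

The simulated mean and standard deviation should land close to $p_1 - p_2$ and the formula value; a histogram of `diffs` would show the approximately Normal shape.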

Two population large sample CI

For a large sample CI we can estimate the SE using $\hat{p}_1$ and $\hat{p}_2$:

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

and the corresponding confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE.$$

Conditions for using the large sample CI: the samples must be large enough that each sample contains at least 10 successes and at least 10 failures.

DHMO Example, continued

Broadening our survey on DHMO awareness, we would like to compare Elkhart with South Bend by finding a 95% CI for the difference in awareness. We collect the following data.

Population       Size   Successes   Proportion
(1) South Bend   183    29          0.158
(2) Elkhart      164    33          0.201

Do these data meet the requirements for a large sample confidence interval?

◮ We need each sample to contain at least 10 successes and 10 failures.
◮ Check! The smallest count is the 29 successes in South Bend; the failure counts are 154 and 131.

DHMO 2 population large sample CI, cont.

Population       Size   Successes   Proportion
(1) South Bend   183    29          0.158
(2) Elkhart      164    33          0.201

We calculate the standard error of $\hat{p}_1 - \hat{p}_2$:

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = 0.041$$

giving the interval

$$(0.158 - 0.201) \pm (1.96)(0.041) = [-0.124, 0.038].$$

Because 0 lies in the interval, the data are consistent with there being no difference between the two populations.
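The DHMO calculation can be reproduced with a short script. This is a minimal sketch using only the counts from the table; the function name `large_sample_ci` is introduced here for illustration, and the count check mirrors the "at least 10 successes and 10 failures" condition.

```python
import math


def large_sample_ci(x1, n1, x2, n2, z_star=1.96):
    """Large-sample CI for p1 - p2; returns (low, high, se)."""
    # Condition: at least 10 successes and 10 failures in each sample.
    for count in (x1, n1 - x1, x2, n2 - x2):
        if count < 10:
            raise ValueError("large-sample CI needs >= 10 successes and failures per sample")
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se, se


# South Bend: 29 of 183 aware; Elkhart: 33 of 164 aware.
low, high, se = large_sample_ci(29, 183, 33, 164)
print(round(se, 3), round(low, 3), round(high, 3))
```

The script works from the raw counts rather than the rounded proportions, and its rounded output agrees with the standard error and interval on the slide.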

“Plus Four”

Just as in the one-sample situation, there is a way to adjust $\hat{p}_1$ and $\hat{p}_2$ to get more accurate confidence intervals. This is the “plus four” technique adapted for two samples.

To do this we still imagine adding 4 “imaginary” observations. This time we add two to each sample, with each sample getting one imaginary success and one imaginary failure:

$$\tilde{p}_1 = \frac{X_1 + 1}{n_1 + 2}, \qquad \tilde{p}_2 = \frac{X_2 + 1}{n_2 + 2}.$$

As before, the plus-four technique lets us construct a CI under much weaker conditions than the large sample method requires.

Conditions for the plus-four CI: each sample has at least 5 observations (with any mixture of successes and failures).

Plus four example

Let's construct a plus-four CI on the data from before.

Population       Size   Successes   Proportion
(1) South Bend   183    29          0.158
(2) Elkhart      164    33          0.201

Calculate

$$\tilde{p}_1 = \frac{29 + 1}{183 + 2} = 0.162, \qquad \tilde{p}_2 = \frac{33 + 1}{164 + 2} = 0.205.$$

The standard error is

$$SE = \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}} = 0.041$$

and the interval is

$$(\tilde{p}_1 - \tilde{p}_2) \pm z^* \, SE = (0.162 - 0.205) \pm (1.96)(0.041) = [-0.124, 0.039].$$

Note that we used the altered sample sizes when calculating SE.
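The plus-four arithmetic can be sketched the same way. The function name `plus_four_ci` is introduced here for illustration; the adjustment (one imaginary success and one imaginary failure per sample) and the "at least 5 observations" check follow the slides.

```python
import math


def plus_four_ci(x1, n1, x2, n2, z_star=1.96):
    """Plus-four CI for p1 - p2; returns (low, high, se)."""
    # Condition: at least 5 observations in each sample.
    if n1 < 5 or n2 < 5:
        raise ValueError("plus-four CI needs at least 5 observations per sample")
    # Add one imaginary success and one imaginary failure to each sample.
    p1_tilde = (x1 + 1) / (n1 + 2)
    p2_tilde = (x2 + 1) / (n2 + 2)
    # The standard error uses the altered sample sizes n + 2.
    se = math.sqrt(
        p1_tilde * (1 - p1_tilde) / (n1 + 2) + p2_tilde * (1 - p2_tilde) / (n2 + 2)
    )
    diff = p1_tilde - p2_tilde
    return diff - z_star * se, diff + z_star * se, se


low, high, se = plus_four_ci(29, 183, 33, 164)
print(round(se, 3), round(low, 3), round(high, 3))
```

With samples this large the plus-four interval is nearly identical to the large-sample one; the adjustment matters more when samples are small.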

Comparing Two Proportions

Recall that we are comparing proportions between two different populations. We have a sample from each population.

Population   Parameter   Size    Successes   Sample proportion
1            $p_1$       $n_1$   $X_1$       $\hat{p}_1$
2            $p_2$       $n_2$   $X_2$       $\hat{p}_2$

Last time we discussed how to find a confidence interval for the difference between the two population proportions, $p_1 - p_2$.

Significance Tests for Comparing Proportions

As we did when comparing two means, we'll assume a null hypothesis of no difference.

$$H_0 : p_1 = p_2$$

The alternative hypothesis is also treated the same way. It can be one-sided or two-sided.

$$H_a : p_1 \neq p_2 \qquad H_a : p_1 > p_2 \qquad H_a : p_1 < p_2$$

Significance Tests for Comparing Proportions

Also, since $p_1$ and $p_2$ are unknown, we'll use $\hat{p}_1 - \hat{p}_2$ to estimate the difference between the two proportions.

To calculate a test statistic, we'll need a value for the standard error. For confidence intervals, we used

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$

Standard Error

However, there is a change we can make when testing whether two proportions are equal. If $H_0$ is true, then we can pool the two samples together to get a more accurate standard error.

We do this by calculating a pooled sample proportion

$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}.$$

Pooled Sample Proportion

Why can we combine the samples together? Because if $H_0$ is true, all observations in both samples come from populations with the same proportion of successes, so we can treat our observations as coming from one combined population.

This is a special feature of tests for proportions. When comparing two means, the two populations could have different standard deviations even when their means are equal, so assuming the null hypothesis is true in that case did not allow us to combine the two populations.

Pooled Sample Proportion

However, the standard deviation for the sampling distribution of proportions depends only on $p$ (the proportion for the combined populations). This changes our standard error to

$$SE = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$

We'll use this when calculating our test statistic.
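The pooled standard error follows from the standard deviation of $\hat{p}_1 - \hat{p}_2$ given earlier: set $p_1 = p_2 = p$, as $H_0$ asserts, factor out $p(1-p)$, and then estimate the unknown common $p$ with the pooled $\hat{p}$. Written out as a short derivation:

```latex
\sigma_{\hat{p}_1 - \hat{p}_2}
  = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}
  \;\overset{p_1 = p_2 = p}{=}\;
  \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
  \;\approx\;
  \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
  = SE
```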

Sampling Distribution

Significance Test for Comparing Two Proportions

To test the hypothesis $H_0 : p_1 = p_2$, we first find the pooled sample proportion $\hat{p}$. Then the two-sample z test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}.$$

We can use this procedure when the counts of successes and failures are each 5 or more in each sample.

Gastric Freezing 1

Gastric freezing was once a treatment for ulcers. Patients would swallow a deflated balloon with tubes to cool their stomach for an hour in hope of reducing acid production and relieving ulcer pain.

The treatment was safe and widely used for years. A randomized comparative experiment was later performed to test the effectiveness of gastric freezing.

Gastric Freezing 2

The results of the experiment are summarized as follows. Note that a “success” means that the patient's condition improved under the treatment administered.

Population   Treatment          Sample size   Successes   Proportion
1            Gastric freezing   82            28          0.342
2            Placebo            78            30          0.385

Gastric Freezing 3

As usual, we'll use the null hypothesis of “no difference.”

$$H_0 : p_1 = p_2$$

Since we expect the treatment to work better than a placebo, we'll use a one-sided alternative.

$$H_a : p_1 > p_2$$

Also, each sample has at least 5 successes and at least 5 failures, so the conditions are met and we can proceed.

Gastric Freezing 4

First, we'll find the pooled sample proportion.

$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}} = \frac{28 + 30}{82 + 78} = \frac{58}{160} = 0.363$$

Gastric Freezing 5

So the two-sample z test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.342 - 0.385}{\sqrt{(0.363)(0.637)\left(\frac{1}{82} + \frac{1}{78}\right)}} = -0.57.$$

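Gastric Freezing 6

Recall our alternative hypothesis is $H_a : p_1 > p_2$, so the P-value is the area under the standard Normal curve to the right of the test statistic.

The P-value is $P(Z \geq -0.57) = 0.7157$.

This is far larger than any common significance level, so we fail to reject the null hypothesis. There is no evidence that gastric freezing is better than a placebo.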
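The whole gastric freezing test can be sketched in a few lines. The function names `normal_cdf` and `two_proportion_z_test` and the `alternative` keyword are introduced here for illustration; the Normal CDF is computed from the standard library's `math.erf` rather than read from a table.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-sample z test of H0: p1 = p2; returns (z, p_value)."""
    # Condition: at least 5 successes and 5 failures in each sample.
    for count in (x1, n1 - x1, x2, n2 - x2):
        if count < 5:
            raise ValueError("z test needs >= 5 successes and failures per sample")
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    if alternative == "greater":      # Ha: p1 > p2
        p_value = 1 - normal_cdf(z)
    elif alternative == "less":       # Ha: p1 < p2
        p_value = normal_cdf(z)
    elif alternative == "two-sided":  # Ha: p1 != p2
        p_value = 2 * (1 - normal_cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Gastric freezing: 28 of 82 improved; placebo: 30 of 78 improved.
z, p_value = two_proportion_z_test(28, 82, 30, 78, alternative="greater")
print(round(z, 2), round(p_value, 3))
```

Because the script does not round $z$ before looking up the probability, its P-value can differ from the table-based 0.7157 in the third decimal place; the conclusion is the same.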

Single Sample t Procedures

Situation: We want to perform inference on a single population mean without knowing the mean or standard deviation of the population.

Confidence Interval: A level C confidence interval for the mean $\mu$ of a population is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

where $t^*$ is the critical value of the t distribution with $n - 1$ degrees of freedom.

Single Sample t Procedures

Hypothesis Test: In order to test the hypothesis $H_0 : \mu = \mu_0$, we calculate the one-sample t statistic

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

and find P-values from the $t(n - 1)$ distribution.

Conditions: The t procedures work for samples of size $n \geq 15$, unless the sample has outliers or strong skewness.

Special Case: Matched Pairs

Situation: With matched pairs, we perform inference on the difference between the two groups, as in the case of two independent samples.

However, since the individuals in each group are paired up, we treat the individual differences as a single sample.

We then perform the single sample inference procedures on the unknown mean of the differences, $\mu_d$.
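A minimal sketch of the matched pairs idea: reduce the paired measurements to a single sample of differences, then apply the one-sample t statistic to test $H_0 : \mu_d = 0$. The data below are hypothetical (and deliberately tiny, well under the $n \geq 15$ guideline) so the arithmetic is easy to follow; `matched_pairs_t` is a name introduced here.

```python
import math
import statistics


def matched_pairs_t(first, second, mu0=0.0):
    """One-sample t statistic on the paired differences; returns (t, df)."""
    if len(first) != len(second):
        raise ValueError("matched pairs need equally long samples")
    # Treat the individual differences as a single sample.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)   # sample mean of the differences
    s_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)
    t = (d_bar - mu0) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical paired measurements on five individuals (illustration only).
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 13]
t, df = matched_pairs_t(before, after)
print(round(t, 3), df)
```

The P-value would then come from the $t(n - 1)$ distribution, here with 4 degrees of freedom, using a table or statistical software.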

Two-sample t Procedures

Situation: We want to perform inference by comparing two population means. We have two samples, which are independent. We do not know the mean or standard deviation of either population.

Degrees of Freedom: Instead of using $n - 1$ (since the samples may be different sizes), we calculate degrees of freedom with the following equation.

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

Two-sample t Procedures

Confidence Interval: A level C confidence interval for the difference between the population means, $\mu_1 - \mu_2$, is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

Hypothesis Test: To test the hypothesis $H_0 : \mu_1 = \mu_2$, calculate the two-sample t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$
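The degrees-of-freedom formula is tedious by hand, so a short sketch may help. The summary statistics below are hypothetical values chosen to keep the arithmetic simple, and `two_sample_t` is a name introduced here; the function returns the t statistic, the degrees of freedom, and the standard error.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic with the slides' degrees-of-freedom formula."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Hypothetical summary statistics (illustration only).
t, df, se = two_sample_t(mean1=10.0, s1=2.0, n1=16, mean2=8.0, s2=3.0, n2=9)
print(round(t, 3), round(df, 2), round(se, 3))
```

The degrees of freedom are usually not a whole number. A confidence interval is then (mean1 - mean2) ± t* × se, with t* taken from a t table or statistical software for the computed degrees of freedom.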
