SLIDE 1
ACMS 20340 Statistics for Life Sciences
Chapter 18: Comparing Two Means
Daily Activity and Obesity
Researchers at the Mayo Clinic investigate the link between obesity and energy spent on daily activities. They choose 20 healthy volunteers and
SLIDE 2
SLIDE 3
Warning!
We CANNOT proceed as we did last time. Last time, we discussed the special case of using differences with a matched pairs design. This experiment does not pair the subjects in any way. So how do we approach these sorts of situations?
SLIDE 4
Two-Sample Problems
With two-sample problems, we are actually comparing two separate populations. Our goal is to compare the responses to two treatments or to simply compare the two populations. We are not comparing a sample to its unknown population as we have in the past.
SLIDE 5
Conditions for Inference Comparing Two Population Means
◮ We have two SRSs, from two distinct populations.
◮ The samples are independent, meaning that one sample has no influence on the other. (Matching would violate independence.)
◮ Both populations are Normally distributed. In practice, it is sufficient for the distributions to have similar shapes and no strong outliers in the data.
SLIDE 6
Comparing Two Populations
The notation we use for the populations is as follows:

Population   Population Mean   Population s.d.
1            µ1                σ1
2            µ2                σ2

All four of these parameters are unknown. When comparing two populations, focus on the difference between the two means: µ1 − µ2.
SLIDE 7
Comparing Two Populations
As with the one-sample t procedures, we estimate the parameters using our sample statistics.

Population   Sample Size   Sample Mean   Sample s.d.
1            n1            x̄1            s1
2            n2            x̄2            s2

NOTE: The sizes of the two samples may be different.
SLIDE 8
Two-Sample t Procedures
Since we’re focusing on the difference between the populations, the variable we are concerned with is the “difference in sample means,” or $\bar{x}_1 - \bar{x}_2$. The sampling distributions of $\bar{x}_1$ and $\bar{x}_2$ have standard deviations $\sigma_1/\sqrt{n_1}$ and $\sigma_2/\sqrt{n_2}$ respectively. When looking at two samples together, our formulas have to change fairly drastically.
SLIDE 9
Two-Sample t Procedures
The standard deviation of the sampling distribution for the difference $\bar{x}_1 - \bar{x}_2$ is
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
Because we do not know either population standard deviation, we instead use the standard error,
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
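As a minimal sketch of the standard error formula, the Python helper below computes SE from the two sample standard deviations and sample sizes. The function name `standard_error` is illustrative and not part of the course materials; the inputs are the summary statistics from the Daily Activity and Obesity example in this deck.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Standard error of the difference in sample means: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Summary statistics from the Daily Activity and Obesity example (lean vs. obese).
se = standard_error(107.121, 10, 67.498, 10)
print(round(se, 2))  # about 40.04
```

Note that the variances add under the square root; the result is not the sum of the two individual standard errors.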
SLIDE 10
Degrees of Freedom
Since the samples may be different sizes, we need a new way of choosing our degrees of freedom:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
This calculation rarely yields a whole number, so you must round down in order to use the t table, Table C.
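The degrees-of-freedom formula (often called the Welch-Satterthwaite approximation) can be sketched in Python as follows. The helper name `welch_df` and the input values are hypothetical, chosen only to show how the formula behaves: with equal spreads and equal sample sizes it gives n1 + n2 − 2, and with very unequal spreads it drops toward the smaller of n1 − 1 and n2 − 1.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Degrees of freedom from the two-sample formula (not yet rounded)."""
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical inputs, for illustration only.
equal_case = welch_df(1.0, 10, 1.0, 10)    # equal spreads and sizes
unequal_case = welch_df(1.0, 10, 5.0, 10)  # very different spreads

print(round(equal_case, 6))      # 18.0, i.e. n1 + n2 - 2
print(math.floor(unequal_case))  # 9, rounded down for use with a t table
```

Rounding down, as the slide instructs, is the conservative choice: fewer degrees of freedom means a slightly larger critical value.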
SLIDE 11
Confidence Intervals and Hypothesis Tests
A level C confidence interval for $\mu_1 - \mu_2$ is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
To test the hypothesis $H_0 : \mu_1 = \mu_2$ (which is equivalent to $H_0 : \mu_1 - \mu_2 = 0$), we calculate the two-sample t statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}.$$
SLIDE 12
Hypothesis Tests
Usually the null hypothesis is one of no difference, i.e. $\mu_1 - \mu_2 = 0$. In this case the two-sample t statistic simplifies to
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}.$$
Find the $t^*$ critical values and P-values the same way as before.
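The confidence interval and t statistic can be sketched as two small Python helpers. The names `two_sample_t`, `confidence_interval`, `null_diff` and `t_star` are illustrative, and the numeric inputs are hypothetical values picked so the arithmetic is easy to follow. The critical value is passed in rather than computed, since it comes from a t table.

```python
import math

def two_sample_t(xbar1, xbar2, s1, s2, n1, n2, null_diff=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = null_diff."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - null_diff) / se

def confidence_interval(xbar1, xbar2, s1, s2, n1, n2, t_star):
    """Interval for mu1 - mu2; t_star must be looked up for the chosen level and df."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return (diff - t_star * se, diff + t_star * se)

# Hypothetical summary statistics, for illustration only.
t = two_sample_t(12.0, 10.0, 3.0, 4.0, 9, 16)
print(round(t, 4))  # 2 / sqrt(2), about 1.4142
```

Leaving `null_diff` at its default of 0 gives the simplified statistic for the usual null hypothesis of no difference.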
SLIDE 13
Daily Activity and Obesity
Recall: 10 lean subjects and 10 mildly obese subjects are monitored for the amount of time spent standing or walking per day, in minutes.

Group   Condition   n    x̄         s
1       lean        10   525.751   107.121
2       obese       10   373.269   67.498

Find a 90% confidence interval for the difference in average daily minutes spent walking or standing.
SLIDE 14
Daily Activity and Obesity
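First we must find the degrees of freedom:
$$df = \frac{\left(\frac{107.121^2}{10} + \frac{67.498^2}{10}\right)^2}{\frac{1}{9}\left(\frac{107.121^2}{10}\right)^2 + \frac{1}{9}\left(\frac{67.498^2}{10}\right)^2} = 15.174$$
Using 15 degrees of freedom, find the critical value $t^*$ for a confidence level of 0.90.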
SLIDE 15
Daily Activity and Obesity
The 90% confidence interval for the difference in population means, $\mu_1 - \mu_2$, is
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
Plugging in the values, with $t^* = 1.753$ for 15 degrees of freedom, gives the interval [82.29, 222.67].
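The worked example can be checked with a few lines of Python. This is a sketch rather than part of the original slides; the critical value 1.753 is assumed from a standard t table (90% confidence, 15 degrees of freedom).

```python
import math

# Summary statistics from the slide: group 1 = lean, group 2 = obese.
n1, xbar1, s1 = 10, 525.751, 107.121
n2, xbar2, s2 = 10, 373.269, 67.498

v1, v2 = s1**2 / n1, s2**2 / n2
se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Assumption: t* = 1.753 is the table value for 90% confidence with df = 15.
t_star = 1.753
diff = xbar1 - xbar2
low, high = diff - t_star * se, diff + t_star * se

print(round(df, 2))                   # about 15.17, rounded down to 15
print(round(low, 2), round(high, 2))  # about 82.29 and 222.67
```

Because the whole interval lies above zero, the data suggest lean subjects spend more time standing or walking per day than mildly obese subjects.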
SLIDE 16
Studying Alzheimer’s Disease
An observational study of Alzheimer’s disease (AD) obtained data from 10 AD patients exhibiting moderate dementia and selected a group of 14 individuals without AD to act as a control group. For the study to be credible, the populations must be similar. We’ll perform a hypothesis test to determine if there is any difference in age between the two groups.
SLIDE 17
Studying Alzheimer’s Disease
The null hypothesis is one of no difference between the populations. H0 : µ1 = µ2 (that is, µ1 − µ2 = 0). The alternative hypothesis is two-sided because we do not have a direction in mind. Ha : µ1 ≠ µ2 (that is, µ1 − µ2 ≠ 0).
SLIDE 18
Studying Alzheimer’s Disease
The summary statistics of the two samples are as follows:

Group   Condition     n    x̄      s
1       Alzheimer’s   10   85.9   6.21
2       Control       14   83.7   8.14
SLIDE 19
Studying Alzheimer’s Disease
The two-sample t statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = 0.75.$$
SLIDE 20
Studying Alzheimer’s Disease
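The degrees of freedom (df) are given by
$$df = \frac{\left(\frac{6.21^2}{10} + \frac{8.14^2}{14}\right)^2}{\frac{1}{9}\left(\frac{6.21^2}{10}\right)^2 + \frac{1}{13}\left(\frac{8.14^2}{14}\right)^2} = 21.856$$
Using Table C, we compare t = 0.75 with the two critical values of the t(21) distribution.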
SLIDE 21
SLIDE 22
Studying Alzheimer’s Disease
We fail to reject H0. There is no significant evidence of an age difference between the two groups, even at the larger significance level α = 0.10.
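The test can be reproduced with a short Python sketch. It is not part of the original slides; the critical value 1.721 is assumed from a standard t table (df = 21, two-sided α = 0.10).

```python
import math

# Summary statistics from the slide: group 1 = Alzheimer's, group 2 = control.
n1, xbar1, s1 = 10, 85.9, 6.21
n2, xbar2, s2 = 14, 83.7, 8.14

v1, v2 = s1**2 / n1, s2**2 / n2
t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
df = math.floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))

# Assumption: 1.721 is the table critical value for df = 21 with
# two-sided alpha = 0.10 (upper-tail probability 0.05).
t_crit = 1.721
reject = abs(t) > t_crit

print(round(t, 2), df, reject)  # 0.75 21 False
```

Since |t| falls well short of the critical value, the test does not reject H0 at α = 0.10, in line with the conclusion on the slide.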
SLIDE 23
Quiz
To study the effect of the spectrum of light on the growth of plants, researchers assigned tobacco seedlings at random to two groups of 8 plants each. The plants were grown in a greenhouse under identical conditions except for lighting. The control group was grown under natural light, the experimental group under a blue light.

What is the experimental design? A completely randomized design.

Stem growth in millimeters:
Control: 4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1
Experimental: 3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1

Find a 95% confidence interval for the difference in mean stem growth.
SLIDE 24
Quiz, continued
Stem growth in millimeters:
Control: 4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1
Experimental: 3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1

               size   x̄      s
Control        8      4.09   0.164
Experimental   8      3.01   0.173

Can we use our two-sample method to compute the confidence interval?
◮ Do we have independent random samples?
◮ Is each sample approximately normal?
SLIDE 25
Quiz, continued
Stem growth in millimeters:
Control: 4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1
Experimental: 3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1

               size   x̄      s
Control        8      4.09   0.164
Experimental   8      3.01   0.173

$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 0.084$$
$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)^2/(n_1 - 1) + \left(s_2^2/n_2\right)^2/(n_2 - 1)} = 13.96$$
Round down to get df = 13.
SLIDE 26
Quiz, continued
               size   x̄      s
Control        8      4.09   0.164
Experimental   8      3.01   0.173

$SE(\bar{x}_1 - \bar{x}_2) = 0.084$, df = 13

Look up the 95% critical value: $t^* = 2.160$.

Calculate:
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \, SE(\bar{x}_1 - \bar{x}_2) = (1.08) \pm (2.160)(0.084) = [0.90, 1.26]$$
The estimated difference between populations is positive, indicating the control group has more growth than the experimental group.
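The quiz can also be worked directly from the raw measurements. The sketch below uses Python's standard `statistics` module for the sample means and standard deviations; the variable names are illustrative, and the critical value 2.160 is the one given on the slide.

```python
import math
import statistics

control = [4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1]
experimental = [3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1]

n1, n2 = len(control), len(experimental)
xbar1, xbar2 = statistics.mean(control), statistics.mean(experimental)
s1, s2 = statistics.stdev(control), statistics.stdev(experimental)  # sample s.d. (divides by n - 1)

v1, v2 = s1**2 / n1, s2**2 / n2
se = math.sqrt(v1 + v2)
df = math.floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))

t_star = 2.160  # 95% critical value for df = 13, as given on the slide
diff = xbar1 - xbar2
low, high = diff - t_star * se, diff + t_star * se

print(df, round(low, 2), round(high, 2))
```

Working from the unrounded means gives a lower endpoint near 0.89 rather than 0.90. The small gap arises because the slide rounds the difference in means to 1.08 and the standard error to 0.084 before computing the interval.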
SLIDE 27