ACMS 20340 Statistics for Life Sciences, Chapter 18: Comparing Two Means



slide-1
SLIDE 1

ACMS 20340 Statistics for Life Sciences

Chapter 18: Comparing Two Means

slide-2
SLIDE 2

Daily Activity and Obesity

Researchers at the Mayo Clinic investigate the link between obesity and energy spent on daily activities. They choose 20 healthy volunteers and monitor their activities for 10 days. The volunteers are deliberately chosen so that 10 are lean and 10 are mildly obese. However, the individuals in each group are selected randomly.

slide-3
SLIDE 3

Warning!

We CANNOT proceed as we did last time. Last time, we discussed the special case of using differences with a matched pairs design. This experiment does not include any pairing of the subjects. So how do we approach these sorts of situations?

slide-4
SLIDE 4

Two-Sample Problems

With two-sample problems, we are actually comparing two separate populations. Our goal is to compare the responses to two treatments or to simply compare the two populations. We are not comparing a sample to its unknown population as we have in the past.

slide-5
SLIDE 5

Conditions for Inference Comparing Two Population Means

◮ We have two SRSs, from two distinct populations.

◮ The samples are independent, meaning that one sample has no influence on the other. (Matching would violate independence.)

◮ Both populations are Normally distributed. In practice, it is sufficient for the distributions to have similar shapes and no strong outliers in the data.

slide-6
SLIDE 6

Comparing Two Populations

The notation we use for the populations is as follows:

Population   Population Mean   Population s.d.
1            $\mu_1$           $\sigma_1$
2            $\mu_2$           $\sigma_2$

All four of these parameters are unknown. When comparing two populations, focus on the difference between the two means: $\mu_1 - \mu_2$.

slide-7
SLIDE 7

Comparing Two Populations

As with the one-sample t procedures, we estimate the parameters using our sample statistics.

Population   Sample Size   Sample Mean   Sample S.D.
1            $n_1$         $\bar{x}_1$   $s_1$
2            $n_2$         $\bar{x}_2$   $s_2$

NOTE: The sizes of the two samples may be different.

slide-8
SLIDE 8

Two-Sample t Procedures

Since we’re focusing on the difference between the populations, the variable we are concerned with is the “difference in sample means,” or $\bar{x}_1 - \bar{x}_2$.

The sampling distributions of $\bar{x}_1$ and $\bar{x}_2$ have standard deviations $\sigma_1/\sqrt{n_1}$ and $\sigma_2/\sqrt{n_2}$, respectively. When looking at two samples together, our formulas have to change fairly drastically.

slide-9
SLIDE 9

Two-Sample t Procedures

The standard deviation of the sampling distribution for the difference $\bar{x}_1 - \bar{x}_2$ is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

Because we do not know either population standard deviation, we instead use the standard error,

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
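One way to see where the square-root formula comes from, assuming the two samples are independent (as the conditions for inference require): the variances of independent quantities add, even when the quantities themselves are subtracted.

```latex
\begin{align*}
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  &= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2) \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\end{align*}
```

Taking the square root gives the standard deviation of the difference in sample means.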

slide-10
SLIDE 10

Degrees of Freedom

Since the samples may be different sizes, we need a new way of choosing our degrees of freedom:

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

This calculation rarely yields a whole number, so you must round down in order to use the t table, Table C.
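As a rough illustration, the standard error and degrees-of-freedom formulas can be written as two small Python functions. The function names and the input values below are made up for this sketch and do not come from the slides.

```python
import math


def standard_error(s1, n1, s2, n2):
    """Standard error of the difference in sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def two_sample_df(s1, n1, s2, n2):
    """Degrees of freedom for the two-sample t procedures (not yet rounded)."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical inputs, chosen only to show how the formula behaves.
df_equal = two_sample_df(2.0, 10, 2.0, 10)    # equal spreads, equal sizes
df_unequal = two_sample_df(2.0, 10, 6.0, 10)  # very different spreads

# Round down before using a t table.
table_df = math.floor(df_unequal)

print(df_equal, df_unequal, table_df)
```

With equal sample sizes and equal sample standard deviations the formula returns n1 + n2 − 2 (here 18, up to floating-point rounding). When the spreads differ, the result is smaller, which makes the t procedures more conservative.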

slide-11
SLIDE 11

Confidence Intervals and Hypothesis Tests

A level C confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

To test the hypothesis $H_0 : \mu_1 = \mu_2$ (which is equivalent to $H_0 : \mu_1 - \mu_2 = 0$), we calculate the two-sample t statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

slide-12
SLIDE 12

Hypothesis Tests

Usually the null hypothesis is one of no difference, i.e. $\mu_1 - \mu_2 = 0$. In this case the two-sample t statistic simplifies to

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

Find the $t^*$ critical values and P-values the same way as before.

slide-13
SLIDE 13

Daily Activity and Obesity

Recall: 10 lean subjects and 10 mildly obese subjects are monitored for amount of time spent standing or walking per day in minutes.

Group   Condition   n    $\bar{x}$   $s$
1       lean        10   525.751     107.121
2       obese       10   373.269     67.498

Find a 90% confidence interval for the difference in average daily minutes spent walking or standing.

slide-14
SLIDE 14

Daily Activity and Obesity

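First we must find the degrees of freedom:

$$df = \frac{\left(\dfrac{107.121^2}{10} + \dfrac{67.498^2}{10}\right)^2}{\dfrac{1}{9}\left(\dfrac{107.121^2}{10}\right)^2 + \dfrac{1}{9}\left(\dfrac{67.498^2}{10}\right)^2} = 15.174$$

Using 15 degrees of freedom, find the critical value $t^*$ for a confidence level of 0.90.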
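The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the variable names are chosen here for illustration.

```python
import math

# Summary statistics from the slides (minutes per day standing or walking).
s1, n1 = 107.121, 10  # lean group
s2, n2 = 67.498, 10   # mildly obese group

v1 = s1**2 / n1
v2 = s2**2 / n2

# Degrees of freedom from the two-sample formula.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Round down so the value can be looked up in a t table.
df_table = math.floor(df)

print(f"df = {df:.3f}, table df = {df_table}")  # df = 15.174, table df = 15
```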

slide-15
SLIDE 15

Daily Activity and Obesity

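The 90% confidence interval for the difference in population means, $\mu_1 - \mu_2$, is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

Plugging in the values, with $t^* = 1.753$ for 15 degrees of freedom, we have the simplified solution [82.29, 222.67].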
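A minimal sketch of the plug-in step, assuming the critical value t* = 1.753 is read from a t table for 15 degrees of freedom at 90% confidence. The value is hard-coded here rather than computed, and the variable names are illustrative.

```python
import math

# Summary statistics from the slides.
xbar1, s1, n1 = 525.751, 107.121, 10  # lean
xbar2, s2, n2 = 373.269, 67.498, 10   # mildly obese

# Assumption: t* looked up in a t table (df = 15, 90% confidence).
t_star = 1.753

diff = xbar1 - xbar2
se = math.sqrt(s1**2 / n1 + s2**2 / n2)
margin = t_star * se

lower = diff - margin
upper = diff + margin

print(f"difference = {diff:.3f}, SE = {se:.3f}")
print(f"90% CI: [{lower:.2f}, {upper:.2f}]")  # 90% CI: [82.29, 222.67]
```

The interval lies entirely above zero, so the data suggest the lean group spends more time standing or walking.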

slide-16
SLIDE 16

Studying Alzheimer’s Disease

An observational study of Alzheimer’s disease (AD) obtained data from 10 AD patients exhibiting moderate dementia and selected a group of 14 individuals without AD to act as a control group. For the study to be credible, the populations must be similar. We’ll perform a hypothesis test to determine if there is any difference in age between the two groups.

slide-17
SLIDE 17

Studying Alzheimer’s Disease

The null hypothesis is one of no difference between the populations.

$H_0 : \mu_1 = \mu_2$ (that is, $\mu_1 - \mu_2 = 0$)

The alternative hypothesis is two-sided because we do not have a direction in mind.

$H_a : \mu_1 \neq \mu_2$ (that is, $\mu_1 - \mu_2 \neq 0$)

slide-18
SLIDE 18

Studying Alzheimer’s Disease

The summary statistics of the two samples are as follows:

Group   Condition     n    $\bar{x}$   $s$
1       Alzheimer’s   10   85.9        6.21
2       Control       14   83.7        8.14

slide-19
SLIDE 19

Studying Alzheimer’s Disease

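The two-sample t statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{85.9 - 83.7}{\sqrt{\dfrac{6.21^2}{10} + \dfrac{8.14^2}{14}}} = 0.75.$$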

slide-20
SLIDE 20

Studying Alzheimer’s Disease

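The degrees of freedom (df) are given by

$$df = \frac{\left(\dfrac{6.21^2}{10} + \dfrac{8.14^2}{14}\right)^2}{\dfrac{1}{9}\left(\dfrac{6.21^2}{10}\right)^2 + \dfrac{1}{13}\left(\dfrac{8.14^2}{14}\right)^2} = 21.856$$

Rounding down and using Table C, we compare t = 0.75 with the two critical values of the t(21) distribution.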

slide-21
SLIDE 21
slide-22
SLIDE 22

Studying Alzheimer’s Disease

We fail to reject H0. There is no significant evidence that there is an age difference between the two groups even at a larger significance level α = 0.10.
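The whole test can be sketched in Python as follows. The critical value 1.721 is an assumption taken from a standard t table (21 degrees of freedom, two-sided α = 0.10) and is hard-coded; the variable names are illustrative.

```python
import math

# Summary statistics from the slides (age in years).
xbar1, s1, n1 = 85.9, 6.21, 10  # Alzheimer's group
xbar2, s2, n2 = 83.7, 8.14, 14  # control group

v1 = s1**2 / n1
v2 = s2**2 / n2

# Two-sample t statistic under H0: mu1 - mu2 = 0.
t = (xbar1 - xbar2) / math.sqrt(v1 + v2)

# Degrees of freedom, rounded down for table lookup.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df_table = math.floor(df)

# Assumption: table critical value for df = 21, two-sided alpha = 0.10.
t_crit = 1.721
reject_h0 = abs(t) > t_crit

print(f"t = {t:.2f}, df = {df:.3f}, table df = {df_table}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Since |t| is well below the critical value, the sketch reaches the same conclusion as the slide.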

slide-23
SLIDE 23

Quiz

To study the effect of the spectrum of light on the growth of plants, researchers assigned tobacco seedlings at random to two groups of 8 plants each. The plants were grown in a greenhouse under identical conditions except for lighting. The control group was grown under natural light, the experimental group under a blue light.

What is the experimental design? A completely randomized design.

Stem growth in millimeters:

Control        4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1
Experimental   3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1

Find a 95% confidence interval for the difference in mean stem growth.

slide-24
SLIDE 24

Quiz, continued

Stem growth in millimeters:

Control        4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1
Experimental   3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1

               size   $\bar{x}$   $s$
Control        8      4.09        0.164
Experimental   8      3.01        0.173

Can we use our two-sample method to compute the confidence interval?

◮ Do we have independent random samples?

◮ Is each sample approximately normal?
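The summary statistics can be reproduced from the raw data with Python's standard `statistics` module. This is a small sketch; the variable names are chosen here for illustration.

```python
import statistics

# Stem growth in millimeters, copied from the slide.
control = [4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1]
experimental = [3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1]

mean_c = statistics.mean(control)
mean_e = statistics.mean(experimental)

# statistics.stdev is the sample standard deviation (divides by n - 1).
sd_c = statistics.stdev(control)
sd_e = statistics.stdev(experimental)

print(f"Control:      n = {len(control)}, mean = {mean_c:.4f}, s = {sd_c:.3f}")
print(f"Experimental: n = {len(experimental)}, mean = {mean_e:.4f}, s = {sd_e:.3f}")
```

The unrounded means are 4.0875 and 3.0125, which the slide reports as 4.09 and 3.01.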

slide-25
SLIDE 25

Quiz, continued

Stem growth in millimeters:

Control        4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1
Experimental   3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1

               size   $\bar{x}$   $s$
Control        8      4.09        0.164
Experimental   8      3.01        0.173

$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 0.084$$

$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)^2/(n_1 - 1) + \left(s_2^2/n_2\right)^2/(n_2 - 1)} = 13.96$$

Round down to get df = 13.
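A quick numerical check of the standard error and degrees of freedom, using the rounded summary statistics from the slide. The variable names are illustrative.

```python
import math

# Rounded summary statistics from the slide (stem growth in mm).
s1, n1 = 0.164, 8  # control
s2, n2 = 0.173, 8  # experimental

v1 = s1**2 / n1
v2 = s2**2 / n2

se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df_table = math.floor(df)  # round down for the t table

print(f"SE = {se:.3f}")          # SE = 0.084
print(f"df = {df:.2f}")          # df = 13.96
print(f"table df = {df_table}")  # table df = 13
```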

slide-26
SLIDE 26

Quiz, continued

               size   $\bar{x}$   $s$
Control        8      4.09        0.164
Experimental   8      3.01        0.173

$SE(\bar{x}_1 - \bar{x}_2) = 0.084$, $df = 13$

Look up the 95% critical value: $t^* = 2.160$.

Calculate:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \, SE(\bar{x}_1 - \bar{x}_2) = 1.08 \pm (2.160)(0.084) = [0.90, 1.26]$$

The estimated difference between populations is positive, indicating the control group has more growth than the experimental group.
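The final step is simple enough to verify directly. The sketch below reuses the rounded values from the slide (difference 1.08, SE 0.084, t* = 2.160); the variable names are illustrative.

```python
# Rounded values taken from the slide.
diff = 1.08     # difference in sample means (control minus experimental), mm
se = 0.084      # standard error of the difference
t_star = 2.160  # 95% critical value for df = 13, from the slide

margin = t_star * se
lower = diff - margin
upper = diff + margin

print(f"margin of error = {margin:.3f}")
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # 95% CI: [0.90, 1.26]
```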

slide-27
SLIDE 27

General Comments on Two-Sample Tests

◮ Two-sample tests are robust against the data not being exactly normal, just as long as there are no outliers.

◮ It is better to have the two samples be the same size, if possible.

◮ When the sizes of the two samples are equal and the two populations being compared have distributions with similar shapes, probabilities from the t table are fairly accurate for a broad range of distributions when the sample sizes are as small as $n_1 = n_2 = 5$.

◮ Do not try to estimate the standard deviations beyond calculating $s_1$ and $s_2$. Standard deviations are actually very hard to estimate since the methods only work if the population is normal. It is usually best to seek expert advice in this case.