ACMS 20340 Statistics for Life Sciences, Chapter 17: Inference About a Population Mean (PowerPoint Presentation)



SLIDE 1

ACMS 20340 Statistics for Life Sciences

Chapter 17: Inference About a Population Mean

SLIDE 2

Assumptions for Estimating a Population Mean

Previously, in estimating a population mean, we assumed

◮ the sample, of size n, is a SRS from the population,
◮ the population is normally distributed, N(µ, σ), and
◮ µ is unknown, while σ is known.

We will now estimate a population mean in the case where we know neither µ nor σ. In what follows, we will assume the population is much larger (at least 20 times larger) than the sample size.

(This is a standard assumption that applies to all of the inference methods that we will cover.)

SLIDE 3

Estimating µ without the knowledge of σ (1)

Since we don’t know σ, let’s approximate it using s, the sample standard deviation. Recall from way back in Chapter 2 (!!!) that the standard deviation of a sample, s, was defined to be

s = √( (1/(n − 1)) Σ (xi − x̄)² ).

(Note: s is a sample statistic, while σ is a population parameter.)

The sampling distribution of x̄ is still N(µ, σ/√n), but... WE DON’T KNOW σ. Since we don’t know σ/√n, we cannot standardize x̄ to find the one-sample z test statistic

z = (x̄ − µ) / (σ/√n).

SLIDE 4

Estimating µ without the knowledge of σ (2)

Instead of σ/√n, we use s/√n, which is called the standard error. Then we can calculate

t = (x̄ − µ) / (s/√n).

Whereas z had the normal distribution N(0, 1), t doesn’t: t has the t distribution with n − 1 degrees of freedom, denoted t(n − 1).
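As a quick numerical sketch, the standard error and t statistic can be computed directly in plain Python. The sample values and hypothesized mean µ0 below are made up purely for illustration:

```python
import math
from statistics import mean, stdev  # stdev divides by n - 1, matching the definition of s

# Hypothetical sample (n = 5) and hypothesized population mean mu0
sample = [4.8, 5.1, 4.6, 5.4, 4.9]
mu0 = 5.0

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)          # sample standard deviation s
se = s / math.sqrt(n)      # standard error s/sqrt(n)
t = (x_bar - mu0) / se     # t statistic with n - 1 = 4 degrees of freedom

print(x_bar, se, t)
```

Note that `s` here estimates σ, so `t` follows the t(4) distribution rather than N(0, 1).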

SLIDE 5

Degrees of Freedom Revisited

The degrees of freedom (df) measures how well s should approximate σ, and it depends only on the sample size n. For a sample of size n we use the t distribution having n − 1 degrees of freedom.

SLIDE 6

What is the t distribution? (1)

◮ The density curves of the t distributions are similar in shape to the standard Normal curve. They are symmetric about 0, single-peaked, and bell-shaped.

SLIDE 7

What is the t distribution? (2)

◮ However, the spread of the t distributions is greater than that of the standard Normal distribution: since we are estimating σ with s, there is extra variability from not knowing the exact value of σ. Thus the t distribution has heavier tails than the Normal distribution.

SLIDE 8

What is the t distribution? (3)

◮ As the degrees of freedom increase, t(n − 1) gets closer to N(0, 1).

SLIDE 9

Using the t distribution table

There are many t distributions, one for each df; the table lists some common values for various degrees of freedom.

SLIDE 10

Why use the t distribution table?

We can construct confidence intervals for µ and perform hypothesis tests on µ just as before, but without assuming σ is known. We just use the t table instead of the Normal tables.

SLIDE 11

Historical Aside

The t distribution was developed by William Sealy Gosset and published in 1908. He was studying quality control for his employer, the Guinness Company. Since his employer had a strict non-disclosure clause, he published under the pseudonym ‘Student’.

SLIDE 12

Example I: Constructing a Confidence Interval

Suppose we have the following observations:

4.21, 5.93, 1.92, 0.39, 6.44, 3.71, 1.43, 1.29, 4.74

We’d like to construct a 95% confidence interval for µ. First, we calculate x̄ = 3.34 and s = 2.17. The confidence interval is x̄ ± t∗ s/√n for some t∗. Which one? Since we have df = n − 1 = 8 and confidence level 95%, using the table we find t∗ = 2.306.

SLIDE 13

Example I: Constructing a Confidence Interval (cont.)

Continuing with x̄ = 3.34, s = 2.17, df = 8, and t∗ = 2.306, the final interval is

3.34 ± (2.306) × (2.17)/√9 = 3.34 ± 1.668,

that is, [1.67, 5.01].
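The interval above can be reproduced in plain Python. This is a sketch: the critical value t∗ = 2.306 is read off the t table for df = 8 and 95% confidence, exactly as in the example.

```python
import math
from statistics import mean, stdev

# The nine observations from Example I
data = [4.21, 5.93, 1.92, 0.39, 6.44, 3.71, 1.43, 1.29, 4.74]

n = len(data)
x_bar = mean(data)                  # 3.34
s = stdev(data)                     # about 2.17
t_star = 2.306                      # from the t table: df = 8, 95% confidence
margin = t_star * s / math.sqrt(n)  # about 1.67
ci = (x_bar - margin, x_bar + margin)
print(ci)
```

Using s to full precision gives a margin of about 1.67, agreeing with the hand calculation to rounding.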

SLIDE 14

Hypothesis Testing

Hypothesis testing is similar. The most difficult part is using the t-table itself.

SLIDE 15

Example 2: Hypothesis Testing

Using the same data as before, do the data support the hypothesis µ = 5?

H0 : µ = 5
Ha : µ ≠ 5

Our sample: 4.21, 5.93, 1.92, 0.39, 6.44, 3.71, 1.43, 1.29, 4.74. Again, we have x̄ = 3.34 and s = 2.17. The t-score of our sample is

t = (x̄ − 5) / (s/√9) = (3.34 − 5) / (2.17/3) = −2.29.

Next, we use the t table to estimate the P-value. The degrees of freedom is df = n − 1 = 8.

SLIDE 16

Hypothesis Example (cont.)

t = −2.29, df = 8. Remember: this is a two-tailed test. The t table doesn’t have negative values, so we look up |−2.29| = 2.29. We estimate that 0.05 < P < 0.10. Do not reject H0 at the 5% significance level.
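The whole test can be sketched in plain Python. The bracketing critical values 1.860 and 2.306 are the df = 8 table entries for one-tail probabilities 0.05 and 0.025 (i.e. two-sided P of 0.10 and 0.05):

```python
import math
from statistics import mean, stdev

data = [4.21, 5.93, 1.92, 0.39, 6.44, 3.71, 1.43, 1.29, 4.74]
mu0 = 5.0

n = len(data)
t = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))  # about -2.29

# Bracket the two-sided P-value using the df = 8 row of the t table:
# t* = 1.860 (two-sided P = 0.10) and t* = 2.306 (two-sided P = 0.05)
if 1.860 < abs(t) < 2.306:
    print("0.05 < P < 0.10: do not reject H0 at the 5% level")
```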

SLIDE 17

Question

A cola company wants to know how the sweetness of the cola is affected by storage. Ten professional tasters measure the sweetness of the cola before and after it has been stored (where the order in which they taste the cola is randomized).

Taster   Before Storage   After Storage
  1          4.0              2.0
  2          3.8              3.4
  3          4.1              3.4
  4          3.9              1.9
  5          3.1              3.5
  6          4.2              2.0
  7          2.9              4.2
  8          5.3              4.1
  9          4.9              3.8
 10          6.2              3.9

SLIDE 18

Question

What is this experimental design? A matched-pairs design.

SLIDE 19

Question

There are two populations:

◮ Cola before storage (mean sweetness µbefore).
◮ Cola after storage (mean sweetness µafter).

We don’t care what the mean sweetness of either population is, only whether the sweetness after storage is less than the sweetness before storage. However, we don’t know σ.

SLIDE 20

How do we proceed?

We handle this by using a matched pairs t-test for the population mean difference. For each pair in the experiment, we compute the difference in sweetness, and then we perform a hypothesis test for the difference being 0. Set µd = µafter − µbefore. Then we have

H0 : µd = 0
Ha : µd < 0

SLIDE 21

Calculating the Differences

Taster   Before Storage   After Storage   Difference
  1          4.0              2.0            −2.0
  2          3.8              3.4            −0.4
  3          4.1              3.4            −0.7
  4          3.9              1.9            −2.0
  5          3.1              3.5             0.4
  6          4.2              2.0            −2.2
  7          2.9              4.2             1.3
  8          5.3              4.1            −1.2
  9          4.9              3.8            −1.1
 10          6.2              3.9            −2.3
SLIDE 22

Calculating the Differences

Only focus on the differences when computing the sample statistics. In our sample the average difference is x̄ = −1.02. The sample standard deviation is s = 1.196.

SLIDE 23

Carrying Out the Test 1

x̄ = −1.02, s = 1.196. The hypothesis is

H0 : µd = 0
Ha : µd < 0

The test statistic is

t = (x̄ − 0) / (s/√n) = −1.02 / (1.196/√10) = −2.70.

This is a one-sided test. The P-value is P(t < −2.70). Since the t table only has positive values, we look up t = 2.70.

SLIDE 24

Carrying Out the Test 2

We have t = 2.70 and df = n − 1 = 9 degrees of freedom. Using the table we see 2.398 < 2.70 < 2.821, so for the one-sided test we get 0.01 < P < 0.02. This suggests a significant loss of sweetness during storage.
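The matched pairs calculation can be sketched in plain Python. The after-storage value for taster 6 is taken as 2.0, consistent with the listed difference of −2.2 and the stated x̄ = −1.02:

```python
import math
from statistics import mean, stdev

before = [4.0, 3.8, 4.1, 3.9, 3.1, 4.2, 2.9, 5.3, 4.9, 6.2]
after  = [2.0, 3.4, 3.4, 1.9, 3.5, 2.0, 4.2, 4.1, 3.8, 3.9]

# Reduce each pair to a single difference (after - before)
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = mean(diffs)                     # -1.02
s_d = stdev(diffs)                      # about 1.196
t = (d_bar - 0) / (s_d / math.sqrt(n))  # about -2.70
print(t)
```

From here the one-sided P-value is bracketed with the df = 9 row of the t table, as on the slide.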

SLIDE 25

In General. . .

In a matched pairs experiment, the sample consists of pairs of individuals. Each pair contains exactly one individual from each of two populations. We usually only care about differences between the populations. We handle this by reducing each pair of data to a single difference, and then analysing the difference as we have done before with a one-sample t-test.
SLIDE 26
Moreover. . .

The same reasoning as with other hypothesis tests applies here. In general we only care about whether the two populations have different means, so we would use a two-tailed test:

H0 : µd = 0
Ha : µd ≠ 0

Sometimes, as in the cola example, we know that the difference, if any, will be in a certain direction. In those cases use a one-sided test.

SLIDE 27

Matched Pairs Confidence Intervals

We can also estimate confidence intervals for the difference between two populations. We handle this just as in the hypothesis test:

◮ Determine the sample average and sample standard deviation of the differences between pairs.
◮ Compute a confidence interval from that data.

SLIDE 28

Confidence Interval Example

Using the cola data, estimate the average loss of sweetness from storage with 95% confidence. The sample mean difference between the pairs is x̄ = −1.02, with s.d. s = 1.196. We have degrees of freedom df = n − 1 = 9. Use the table to find the value t∗ = 2.262.

x̄ ± t∗ s/√n = −1.02 ± (2.262) × (1.196)/√10 = [−1.874, −0.164]

With 95% confidence, the true difference in sweetness from before storage to after storage is between −1.874 and −0.164.
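This interval can also be checked in plain Python, again reading t∗ = 2.262 (df = 9, 95% confidence) from the t table:

```python
import math
from statistics import mean, stdev

# Differences (after - before) from the cola example
diffs = [-2.0, -0.4, -0.7, -2.0, 0.4, -2.2, 1.3, -1.2, -1.1, -2.3]

n = len(diffs)
d_bar = mean(diffs)                    # -1.02
s_d = stdev(diffs)                     # about 1.196
t_star = 2.262                         # from the t table: df = 9, 95% confidence
margin = t_star * s_d / math.sqrt(n)
ci = (d_bar - margin, d_bar + margin)  # about (-1.88, -0.16)
print(ci)
```

Carrying s to full precision gives roughly (−1.876, −0.164), matching the hand calculation to rounding.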

SLIDE 29

General Considerations

The t tests assume the underlying population is Normal. We say a test is robust if the confidence level does not change much when the conditions for the use of the test are violated. Since the t test applies to small samples, we would like to use the t test as much as possible. Is it robust?

◮ The most important factor is that the sample is random.
◮ If n < 15 then we can use the t test if the data appear close to normal (symmetric, single peak, no outliers).
◮ If n < 15 and the data is skewed or has outliers, do not use the t test.
◮ If n ≥ 15 we can use the t test unless there are outliers or very strong skewness.
◮ If n ≥ 40, anchors away! Use the t test with abandon.