ACMS 20340 Statistics for Life Sciences, Chapter 17: Inference About a Population Mean (PowerPoint Presentation)



SLIDE 1

ACMS 20340 Statistics for Life Sciences

Chapter 17: Inference About a Population Mean

SLIDE 2

Assumptions for Estimating a Population Mean

Previously, in estimating a population mean, we assumed

◮ the sample, of size n, is a SRS from the population,
◮ the population is normally distributed, N(µ, σ), and
◮ µ is unknown, while σ is known.

We will now estimate a population mean in the case where we know neither µ nor σ. In what follows, we will assume the population is much larger (at least 20 times larger) than the sample size.

(This is a standard assumption that applies to all of the inference methods that we will cover.)

SLIDE 3

Estimating µ without the knowledge of σ (1)

Since we don’t know σ, let’s approximate it using s, the sample standard deviation. Recall from way back in Chapter 2 (!!!) that the standard deviation of a sample, s, was defined to be

s = √( (1/(n − 1)) Σ (xi − x̄)² ).

(Note: s is a sample statistic, while σ is a population parameter.)

The sampling distribution of x̄ is still N(µ, σ/√n), but... WE DON’T KNOW σ. Since we don’t know σ/√n, we cannot standardize x̄ to find the one-sample z test statistic

z = (x̄ − µ) / (σ/√n).

SLIDE 4

Estimating µ without the knowledge of σ (2)

Instead of σ/√n, we use s/√n, which is called the standard error. Then we can calculate

t = (x̄ − µ) / (s/√n).

Whereas z had the normal distribution N(0, 1), t doesn’t: t has the t distribution with n − 1 degrees of freedom, denoted t(n − 1).
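As a quick numerical sketch, the standard error and t statistic can be computed directly in plain Python. The sample values and hypothesized mean µ0 below are made up purely for illustration:

```python
import math
from statistics import mean, stdev  # stdev divides by n - 1, matching the definition of s

# Hypothetical sample (n = 5) and hypothesized population mean mu0
sample = [4.8, 5.1, 4.6, 5.4, 4.9]
mu0 = 5.0

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)          # sample standard deviation s
se = s / math.sqrt(n)      # standard error s/sqrt(n)
t = (x_bar - mu0) / se     # t statistic with n - 1 = 4 degrees of freedom

print(x_bar, se, t)
```

Note that `s` here estimates σ, so `t` follows the t(4) distribution rather than N(0, 1).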

SLIDE 5

Degrees of Freedom Revisited

The degrees of freedom (df) measures how well s should approximate σ, and it depends only on the sample size n. For a sample of size n we use the t distribution having n − 1 degrees of freedom.

SLIDE 6

What is the t distribution? (1)

◮ The density curves of the t distributions are similar in shape to the standard Normal curve. They are symmetric about 0, single-peaked, and bell-shaped.

SLIDE 7

What is the t distribution? (2)

◮ However, the spread of the t distributions is greater than that of the standard Normal distribution: since we are estimating σ with s, there is extra variability from not knowing the exact value of σ. Thus the t distribution has heavier tails than the Normal distribution.

SLIDE 8

What is the t distribution? (3)

◮ As the degrees of freedom increase, t(n − 1) gets closer to N(0, 1).

SLIDE 9

Using the t distribution table

There are many t distributions, one for each df; the table lists some common values for various degrees of freedom.

SLIDE 10

Why use the t distribution table?

We can construct confidence intervals for µ and perform hypothesis tests on µ just as before, but without assuming σ is known. We just use the t table instead of the Normal tables.

SLIDE 11

Historical Aside

The t distribution was developed by William Sealy Gosset and published in 1908. He was studying quality control for his employer, the Guinness Company. Since his employer had a strict non-disclosure clause, he published under the pseudonym ‘Student’.

SLIDE 12

Example I: Constructing a Confidence Interval

Suppose we have the following observations:

4.21, 5.93, 1.92, 0.39, 6.44, 3.71, 1.43, 1.29, 4.74

We’d like to construct a 95% confidence interval for µ. First, we calculate x̄ = 3.34 and s = 2.17. The confidence interval is x̄ ± t∗ s/√n for some t∗. Which one? Since we have df = n − 1 = 8 and confidence level 95%, using the table we find t∗ = 2.306.

SLIDE 13

Example I: Constructing a Confidence Interval (cont.)

Continuing with x̄ = 3.34, s = 2.17, df = 8, and t∗ = 2.306, the final interval is

3.34 ± (2.306) × (2.17)/√9 = 3.34 ± 1.668,

that is, [1.67, 5.01].
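The interval above can be reproduced in plain Python. This is a sketch: the critical value t∗ = 2.306 is read off the t table for df = 8 and 95% confidence, exactly as in the example.

```python
import math
from statistics import mean, stdev

# The nine observations from Example I
data = [4.21, 5.93, 1.92, 0.39, 6.44, 3.71, 1.43, 1.29, 4.74]

n = len(data)
x_bar = mean(data)                  # 3.34
s = stdev(data)                     # about 2.17
t_star = 2.306                      # from the t table: df = 8, 95% confidence
margin = t_star * s / math.sqrt(n)  # about 1.67
ci = (x_bar - margin, x_bar + margin)
print(ci)
```

Using s to full precision gives a margin of about 1.67, agreeing with the hand calculation to rounding.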

SLIDE 14

Hypothesis Testing

Hypothesis testing is similar. The most difficult part is using the t-table itself.

SLIDE 15

Example 2: Hypothesis Testing

Using the same data as before, do the data support the hypothesis µ = 5?

H0 : µ = 5
Ha : µ ≠ 5

Our sample: 4.21, 5.93, 1.92, 0.39, 6.44, 3.71, 1.43, 1.29, 4.74. Again, we have x̄ = 3.34 and s = 2.17. The t-score of our sample is

t = (x̄ − 5) / (s/√9) = (3.34 − 5) / (2.17/3) = −2.29.

Next, we use the t table to estimate the P-value. The degrees of freedom is df = n − 1 = 8.

SLIDE 16

Hypothesis Example (cont.)

t = −2.29, df = 8. Remember: this is a two-tailed test. The t table doesn’t have negative values, so we look up |−2.29| = 2.29. We estimate that 0.05 < P < 0.10. Do not reject H0 at the 5% significance level.
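The whole test can be sketched in plain Python. The bracketing critical values 1.860 and 2.306 are the df = 8 table entries for one-tail probabilities 0.05 and 0.025 (i.e. two-sided P of 0.10 and 0.05):

```python
import math
from statistics import mean, stdev

data = [4.21, 5.93, 1.92, 0.39, 6.44, 3.71, 1.43, 1.29, 4.74]
mu0 = 5.0

n = len(data)
t = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))  # about -2.29

# Bracket the two-sided P-value using the df = 8 row of the t table:
# t* = 1.860 (two-sided P = 0.10) and t* = 2.306 (two-sided P = 0.05)
if 1.860 < abs(t) < 2.306:
    print("0.05 < P < 0.10: do not reject H0 at the 5% level")
```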

SLIDE 17

Question

A cola company wants to know how the sweetness of the cola is affected by storage. Ten professional tasters measure the sweetness of the cola before and after it has been stored (where the order in which they taste the cola is randomized).

Taster   Before Storage   After Storage
  1          4.0              2.0
  2          3.8              3.4
  3          4.1              3.4
  4          3.9              1.9
  5          3.1              3.5
  6          4.2              2.0
  7          2.9              4.2
  8          5.3              4.1
  9          4.9              3.8
 10          6.2              3.9

SLIDE 18

Question

What is this experimental design? A matched-pairs design.

SLIDE 19

Question

There are two populations:

◮ Cola before storage (mean sweetness µbefore).
◮ Cola after storage (mean sweetness µafter).

We don’t care what the mean sweetness of either population is, only whether the sweetness after storage is less than the sweetness before storage. However, we don’t know σ.

SLIDE 20

How do we proceed?

We handle this by using a matched pairs t-test for the population mean difference. For each pair in the experiment, we compute the difference in sweetness, and then we perform a hypothesis test for the difference being 0. Set µd = µafter − µbefore. Then we have

H0 : µd = 0
Ha : µd < 0

SLIDE 21

Calculating the Differences

Taster   Before Storage   After Storage   Difference
  1          4.0              2.0            −2.0
  2          3.8              3.4            −0.4
  3          4.1              3.4            −0.7
  4          3.9              1.9            −2.0
  5          3.1              3.5             0.4
  6          4.2              2.0            −2.2
  7          2.9              4.2             1.3
  8          5.3              4.1            −1.2
  9          4.9              3.8            −1.1
 10          6.2              3.9            −2.3
SLIDE 22

Calculating the Differences

Only focus on the differences when computing the sample statistics. In our sample the average difference is x̄ = −1.02. The sample standard deviation is s = 1.196.

SLIDE 23

Carrying Out the Test 1

x̄ = −1.02, s = 1.196. The hypothesis is

H0 : µd = 0
Ha : µd < 0

The test statistic is

t = (x̄ − 0) / (s/√n) = −1.02 / (1.196/√10) = −2.70.

This is a one-sided test. The P-value is P(t < −2.70). Since the t table only has positive values, we look up t = 2.70.

SLIDE 24

Carrying Out the Test 2

We have t = 2.70 and df = n − 1 = 9 degrees of freedom. Using the table we see 2.398 < 2.70 < 2.821, so for the one-sided test we get 0.01 < P < 0.02. This suggests a significant loss of sweetness during storage.
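The matched pairs calculation can be sketched in plain Python. The after-storage value for taster 6 is taken as 2.0, consistent with the listed difference of −2.2 and the stated x̄ = −1.02:

```python
import math
from statistics import mean, stdev

before = [4.0, 3.8, 4.1, 3.9, 3.1, 4.2, 2.9, 5.3, 4.9, 6.2]
after  = [2.0, 3.4, 3.4, 1.9, 3.5, 2.0, 4.2, 4.1, 3.8, 3.9]

# Reduce each pair to a single difference (after - before)
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = mean(diffs)                     # -1.02
s_d = stdev(diffs)                      # about 1.196
t = (d_bar - 0) / (s_d / math.sqrt(n))  # about -2.70
print(t)
```

From here the one-sided P-value is bracketed with the df = 9 row of the t table, as on the slide.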

SLIDE 25

In General. . .

In a matched pairs experiment, the sample consists of pairs of individuals. Each pair contains exactly one individual from each of two populations. We usually only care about differences between the populations. We handle this by reducing each pair of data to a single difference, and then analysing the difference as we have done before with a one-sample t-test.
SLIDE 26
Moreover. . .

The same reasoning as with other hypothesis tests applies here. In general we only care about whether the two populations have different means, so we would use a two-tailed test:

H0 : µd = 0
Ha : µd ≠ 0

Sometimes, as in the cola example, we know that the difference, if any, will be in a certain direction. In those cases use a one-sided test.

SLIDE 27

Matched Pairs Confidence Intervals

We can also estimate confidence intervals for the difference between two populations. We handle this just as in the hypothesis test:

◮ Determine the sample average and sample standard deviation of the differences between pairs.
◮ Compute a confidence interval from that data.

SLIDE 28

Confidence Interval Example

Using the cola data, estimate the average loss of sweetness from storage with 95% confidence. The sample mean difference between the pairs is x̄ = −1.02, with s.d. s = 1.196. We have degrees of freedom df = n − 1 = 9. Use the table to find the value t∗ = 2.262.

x̄ ± t∗ s/√n = −1.02 ± (2.262) × (1.196)/√10 = [−1.874, −0.164]

With 95% confidence, the true difference in sweetness from before storage to after storage is between −1.874 and −0.164.
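This interval can also be checked in plain Python, again reading t∗ = 2.262 (df = 9, 95% confidence) from the t table:

```python
import math
from statistics import mean, stdev

# Differences (after - before) from the cola example
diffs = [-2.0, -0.4, -0.7, -2.0, 0.4, -2.2, 1.3, -1.2, -1.1, -2.3]

n = len(diffs)
d_bar = mean(diffs)                    # -1.02
s_d = stdev(diffs)                     # about 1.196
t_star = 2.262                         # from the t table: df = 9, 95% confidence
margin = t_star * s_d / math.sqrt(n)
ci = (d_bar - margin, d_bar + margin)  # about (-1.88, -0.16)
print(ci)
```

Carrying s to full precision gives roughly (−1.876, −0.164), matching the hand calculation to rounding.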

SLIDE 29

General Considerations

The t tests assume the underlying population is Normal. We say a test is robust if the confidence level does not change much when the conditions for the use of the test are violated. Since the t test applies to small samples, we would like to use the t test as much as possible. Is it robust?

◮ The most important factor is that the sample is random.
◮ If n < 15 then we can use the t test if the data appear close to normal (symmetric, single peak, no outliers).
◮ If n < 15 and the data is skewed or has outliers, do not use the t test.
◮ If n ≥ 15 we can use the t test unless there are outliers or very strong skewness.
◮ If n ≥ 40, anchors away! Use the t test with abandon.