
Inference for Distributions. Marc H. Mehlman, marcmehlman@yahoo.com



  1. Inference for Distributions. Marc H. Mehlman, marcmehlman@yahoo.com, University of New Haven. Based on the Rare Event Rule: “rare events happen – but not to me”.

  2. Table of Contents
       1. t Distribution
       2. CI for µ: σ unknown
       3. t-test: Mean (σ unknown)
       4. t-test: Matched Pairs
       5. Two Sample z-test: Means (σ_X and σ_Y known)
       6. Two Sample t-test: Means (σ_X and σ_Y unknown)
       7. Pooled Two Sample t-test: Means (σ_X = σ_Y unknown)
       8. Two Sample F-test: Variance
       9. Chapter #8 R Assignment

  3. t Distribution.

  4. t Distribution. If $X_1, \dots, X_n$ is a normal random sample, then
       $$\bar X \sim N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \;\Rightarrow\; \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
     If the random sample is not normal, but n ≥ 30, the above is also true (approximately) by the CLT. However, typically one does not know what σ equals, so one is tempted to use s instead of σ. This gives:
     Definition (Student t Distribution). If a random sample is normal or the sample size is ≥ 30, then
       $$\frac{\bar X - \mu}{S/\sqrt{n}} \sim t(n-1),$$
     where t(n − 1) is the Student t distribution with n − 1 degrees of freedom.
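The definition above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the sample size, mean, standard deviation, seed and number of replications are all illustrative assumptions. It compares how often the studentized mean of small normal samples exceeds the usual normal cutoff with what t(n − 1) and N(0, 1) each predict.

```r
# Simulation sketch: the studentized mean of small normal samples behaves like
# t(n - 1), not N(0, 1). All parameter values are illustrative assumptions.
set.seed(1)
n     <- 5       # small sample size, where t and normal differ most
mu    <- 10      # true mean (assumed)
sigma <- 2       # true standard deviation (assumed)
reps  <- 20000   # number of simulated samples

# One studentized mean per simulated sample: (x-bar - mu) / (s / sqrt(n))
t_stats <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

cutoff    <- qnorm(0.975)                  # two-sided 5% cutoff under N(0, 1)
sim_tail  <- mean(abs(t_stats) > cutoff)   # simulated two-sided tail frequency
t_tail    <- 2 * pt(-cutoff, df = n - 1)   # tail probability under t(n - 1)
norm_tail <- 2 * pnorm(-cutoff)            # tail probability under N(0, 1)

print(c(simulated = sim_tail, t_dist = t_tail, normal = norm_tail))
```

If the definition holds, the simulated frequency should sit close to the t(n − 1) value and well above the normal value, which is why using s in place of σ calls for the t distribution.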

  5. t Distribution: The t Distributions. When comparing the density curves of the standard Normal distribution and t distributions, several facts are apparent:
       - The density curves of the t distributions are similar in shape to the standard Normal curve.
       - The spread of the t distributions is a bit greater than that of the standard Normal distribution.
       - The t distributions have more probability in the tails and less in the center than does the standard Normal.
       - As the degrees of freedom increase, the t density curve approaches the standard Normal curve ever more closely.
     We can use Table D in the back of the book to determine critical values t* for t distributions with different degrees of freedom.
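As an alternative to a printed table, the critical values can be computed in R. This short sketch (the particular degrees of freedom and the 95% level are arbitrary choices for illustration) lists t* next to the corresponding standard Normal critical value, showing the heavier tails and the convergence described above.

```r
# Critical values t* for a central 95% area, for several degrees of freedom.
dfs    <- c(2, 5, 15, 30, 100, 1000)   # illustrative choices
t_crit <- qt(0.975, df = dfs)          # upper 2.5% point of t(df)
z_crit <- qnorm(0.975)                 # upper 2.5% point of N(0, 1)

# t* is always larger than z* and shrinks toward it as df grows.
print(data.frame(df = dfs, t_crit = t_crit, excess_over_z = t_crit - z_crit))
```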

  6. t Distribution: Robustness. The t procedures are exactly correct when the population is exactly Normal. This is rare. The t procedures are robust to small deviations from Normality, but:
       - The sample must be a random sample from the population.
       - Outliers and skewness strongly influence the mean and therefore the t procedures. Their impact diminishes as the sample size gets larger because of the Central Limit Theorem.
     As a guideline:
       - When n < 15, the data must be close to Normal and without outliers.
       - When 15 ≤ n ≤ 40, mild skewness is acceptable, but not outliers.
       - When n > 40, the t statistic will be valid even with strong skewness.

  7. CI for µ: σ unknown. Confidence intervals for µ when σ is unknown.

  8. CI for µ: σ unknown. Confidence intervals. A confidence interval is a range of values that contains the true population parameter with probability (confidence level) C. We have a set of data from a population with both µ and σ unknown. We use $\bar x$ to estimate µ, and s to estimate σ, using a t distribution (df n − 1).
       - C is the area between −t* and t*.
       - We find t* in the line of the table for df = n − 1.
       - The margin of error m is $m = t^* \, s/\sqrt{n}$.
     [Figure: t density curve with central area C between −t* and t*, and margin of error m on either side of the center.]

  9. CI for µ: σ unknown.
     Definition (standard error). The standard error of the sample mean for σ unknown is $SE_{\bar x} \overset{\text{def}}{=} \frac{s}{\sqrt{n}}$.
     Theorem (CI for µ, σ unknown). Assume n ≥ 30 or the population is normal. Let
       $$\text{margin of error} = m \overset{\text{def}}{=} t^\star(n-1)\,\frac{s}{\sqrt{n}} = t^\star(n-1)\,SE_{\bar x},$$
     where $t^\star(n-1)$ is the critical value of t(n − 1) for confidence level C. Then the confidence interval is $\bar x \pm m$.
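The theorem translates directly into a few lines of R. The sketch below is an illustration, not the author's code: the function name `t_ci` and the sample `x` are hypothetical, and the function assumes the conditions of the theorem (normal population or n ≥ 30) have already been checked.

```r
# Confidence interval x-bar +/- t*(n - 1) * s / sqrt(n) for a mean, sigma unknown.
# t_ci is a hypothetical helper name; it does not check normality or outliers.
t_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  stopifnot(n >= 2, conf_level > 0, conf_level < 1)
  se     <- sd(x) / sqrt(n)                            # standard error of the mean
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # critical value t*(n - 1)
  m      <- t_star * se                                # margin of error
  c(lower = mean(x) - m, upper = mean(x) + m)
}

# Hypothetical data, for illustration only.
x  <- c(4.1, 5.3, 3.8, 4.9, 5.6, 4.4, 5.0, 4.7)
ci <- t_ci(x, conf_level = 0.90)
print(ci)
```

The built-in `t.test(x, conf.level = 0.90)$conf.int` should report the same interval, so the helper is mainly useful for seeing where each term of the formula enters.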

  10. CI for µ: σ unknown. Example: The meters of total rainfall for Jupa, Beliana in the first decade of each of the last sixteen centuries are given below:
        3.790155 3.628361 3.989105 5.124677 4.227491 3.183561 5.286963 3.323666
        4.116425 2.771781 6.243354 5.040272 6.821760 6.170435 5.439190 5.206938
      Find a 90% confidence interval for the mean rainfall per first decade of each century.
      Solution: The normal quantile plot indicates that the data come from a normal (or at least almost normal) distribution.
      [Figure: Normal Q-Q Plot of the data, Sample Quantiles against Theoretical Quantiles.]
      Since σ is unknown and the distribution is close to normal with no outliers, R gives us (with the sixteen values stored in the vector mdat)
        > mean(mdat)-qt(0.95,15)*sd(mdat)/sqrt(16)
        [1] 4.124238
        > mean(mdat)+qt(0.95,15)*sd(mdat)/sqrt(16)
        [1] 5.171278
      Thus a 90% confidence interval is (4.124238, 5.171278). Notice that t.test reports the same interval:
        > t.test(mdat,mu=4.5,conf.level=0.90)
        One Sample t-test
        data: mdat
        t = 0.4948, df = 15, p-value = 0.6279
        alternative hypothesis: true mean is not equal to 4.5
        90 percent confidence interval:
        4.124238 5.171278
        sample estimates:
        mean of x
        4.647758


  12. t-test: Mean (σ unknown).

  13. t-test: Mean (σ unknown). Theorem (t-test for the Mean (σ unknown)). Given a random sample $X_1, \dots, X_n$, where either the random sample was sampled from a normal population or the sample size n ≥ 30, let the test statistic be
        $$T = \frac{\bar X - \mu_0}{S/\sqrt{n}}.$$
      Then $T \sim t(n-1)$ under $H_0: \mu = \mu_0$. Writing t for the observed value of T, the p-value of a test of $H_0$
        1. versus $H_1: \mu_X > \mu_0$ is $P(T \ge t)$.
        2. versus $H_2: \mu_X < \mu_0$ is $P(T \le t)$.
        3. versus $H_3: \mu_X \ne \mu_0$ is $2P(T \ge |t|)$.
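The three p-value formulas can be sketched as one small R function working from summary statistics. The function name `t_test_pvalue`, its argument names and the numbers in the usage line are hypothetical and chosen only for illustration; the built-in `t.test` does the same job when the raw data are available.

```r
# p-value of a one-sample t-test from summary statistics (sigma unknown).
# t_test_pvalue is a hypothetical helper; it assumes a normal population or n >= 30.
t_test_pvalue <- function(xbar, s, n, mu0,
                          alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  t_obs <- (xbar - mu0) / (s / sqrt(n))   # observed test statistic
  df    <- n - 1
  p <- switch(alternative,
    greater   = pt(t_obs, df, lower.tail = FALSE),          # P(T >= t)
    less      = pt(t_obs, df),                              # P(T <= t)
    two.sided = 2 * pt(abs(t_obs), df, lower.tail = FALSE)  # 2 P(T >= |t|)
  )
  list(statistic = t_obs, df = df, p_value = p)
}

# Illustrative (made-up) summary statistics.
res <- t_test_pvalue(xbar = 52, s = 8, n = 16, mu0 = 50, alternative = "greater")
print(res)
```

Using `lower.tail = FALSE` rather than `1 - pt(...)` is a small design choice that avoids loss of precision when the p-value is very small.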

  14. t-test: Mean (σ unknown). Example: The number of hours of Sesame Street that American 4 year olds watch each year is assumed to be distributed normally. Twenty-five 4 year olds are randomly sampled and one finds $\bar x = 125$ and $s_X^2 = 100$. What is the p-value for a test of $H_0: \mu_X = 120$ versus $H_A: \mu_X > 120$?
      Solution: The population is normally distributed, so the test statistic
        $$t = \frac{125 - 120}{10/\sqrt{25}} = 2.5$$
      comes from t(24) under $H_0$. Thus the p-value is
        > 1-pt(2.5,24)
        [1] 0.009827088
      It seems unlikely that the average number of Sesame Street watching hours is 120 or less.

