Inference for Distributions, Marc H. Mehlman, marcmehlman@yahoo.com

Inference for Distributions Marc H. Mehlman marcmehlman@yahoo.com University of New Haven Based on Rare Event Rule: “rare events happen – but not to me”. Marc Mehlman Marc Mehlman (University of New Haven) Inference for Distributions 1 / 42

Table of Contents

1. t Distribution
2. CI for µ: σ unknown
3. t-test: Mean (σ unknown)
4. t-test: Matched Pairs
5. Two Sample z-Test: Means (σ_X and σ_Y known)
6. Two Sample t-test: Means (σ_X and σ_Y unknown)
7. Pooled Two Sample t-test: Means (σ_X = σ_Y unknown)
8. Two Sample F-test: Variance
9. Chapter #8 R Assignment

t Distribution

If $X_1, \cdots, X_n$ is a normal random sample, then

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \;\Rightarrow\; \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

If the random sample is not normal, but $n \ge 30$, the above is also true (approximately) by the CLT. However, typically one does not know what $\sigma$ equals, so one is tempted to use $s$ instead of $\sigma$. This gives

Definition (Student t Distribution)
If a random sample is normal or the sample size is $\ge 30$, then (exactly in the normal case, approximately otherwise)

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1),$$

where $t(n-1)$ is the Student t Distribution with $n - 1$ degrees of freedom.
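The definition above can be checked informally by simulation. The following is only a sketch: the sample size, population mean, population standard deviation, seed and number of replications are arbitrary illustrative choices, not values from the slides. It simulates the statistic for many normal samples and compares its upper quantile with the corresponding t(n − 1) and standard Normal quantiles.

```r
# Simulate T = (xbar - mu) / (s / sqrt(n)) for normal samples and compare
# its 97.5% quantile with the t(n - 1) and N(0, 1) quantiles.
set.seed(1)     # arbitrary seed, for reproducibility only
n     <- 5      # illustrative sample size (assumption)
mu    <- 10     # illustrative population mean (assumption)
sigma <- 2      # illustrative population standard deviation (assumption)

t_stats <- replicate(20000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

sim_q <- unname(quantile(t_stats, 0.975))  # simulated 97.5% quantile
t_q   <- qt(0.975, df = n - 1)             # quantile of t(4)
z_q   <- qnorm(0.975)                      # quantile of N(0, 1)
print(c(simulated = sim_q, t = t_q, normal = z_q))
```

With a sample this small, the simulated quantile should sit much closer to the t quantile than to the Normal one, which is the reason for replacing N(0, 1) by t(n − 1) when s stands in for σ.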

The t Distributions

When comparing the density curves of the standard Normal distribution and t distributions, several facts are apparent:

• The density curves of the t distributions are similar in shape to the standard Normal curve.
• The spread of the t distributions is a bit greater than that of the standard Normal distribution.
• The t distributions have more probability in the tails and less in the center than does the standard Normal.
• As the degrees of freedom increase, the t density curve approaches the standard Normal curve ever more closely.

We can use Table D in the back of the book to determine critical values t* for t distributions with different degrees of freedom.
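Instead of a printed table, critical values can also be looked up in R. The sketch below uses base R's `qt` and `qnorm`; the particular degrees of freedom and the 95% two-sided level are illustrative choices only. It shows the critical value t* shrinking toward the standard Normal value as the degrees of freedom grow.

```r
# Two-sided 95% critical values t* for several degrees of freedom,
# compared with the standard Normal critical value z*.
df_values <- c(2, 5, 15, 30, 100, 1000)   # illustrative choices
t_star <- qt(0.975, df = df_values)
z_star <- qnorm(0.975)

print(data.frame(df = df_values, t_star = round(t_star, 4)))
print(round(z_star, 4))
```

Every t* exceeds z*, reflecting the heavier tails, and the gap narrows as the degrees of freedom increase.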

Robustness

The t procedures are exactly correct when the population is exactly Normal. This is rare. The t procedures are robust to small deviations from Normality, but:

• The sample must be a random sample from the population.
• Outliers and skewness strongly influence the mean and therefore the t procedures. Their impact diminishes as the sample size gets larger because of the Central Limit Theorem.

As a guideline:

• When n < 15, the data must be close to Normal and without outliers.
• When 15 ≤ n ≤ 40, mild skewness is acceptable, but not outliers.
• When n > 40, the t statistic will be valid even with strong skewness.
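The effect of skewness can be explored with a small coverage simulation. This is a rough sketch under stated assumptions: the Exponential(1) population (strongly right-skewed, true mean 1), the two sample sizes, the seed and the replication count are all illustrative choices, and `coverage` is a hypothetical helper name. It estimates how often a nominal 95% t interval actually contains the true mean.

```r
# Estimate the actual coverage of a nominal 95% t interval when sampling
# from a strongly skewed population (Exponential with rate 1, mean 1).
set.seed(2)   # arbitrary seed, for reproducibility only

coverage <- function(n, reps = 5000) {
  hits <- replicate(reps, {
    x  <- rexp(n, rate = 1)
    ci <- t.test(x, conf.level = 0.95)$conf.int
    ci[1] <= 1 && 1 <= ci[2]        # does the interval contain the true mean?
  })
  mean(hits)                        # proportion of intervals that do
}

cov_small <- coverage(10)    # small sample
cov_large <- coverage(100)   # large sample
print(c(n10 = cov_small, n100 = cov_large))
```

One would expect the estimated coverage to fall short of 95% for the small sample and to move closer to it for the large one, in line with the guideline above.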

CI for µ: σ unknown

Confidence intervals for µ when σ is unknown.

Confidence intervals

A confidence interval is a range of values that contains the true population parameter with probability (confidence level) C.

We have a set of data from a population with both µ and σ unknown. We use x̄ to estimate µ, and s to estimate σ, using a t distribution (df n − 1).

• C is the area between −t* and t*.
• We find t* in the appropriate line of Table D.
• The margin of error m is: $m = t^{*} \, s/\sqrt{n}$

[Figure: t density curve with central area C between −t* and t*; the margin of error m is marked on either side of the center.]

Definition (standard error)
The standard error of the sample mean for σ unknown is

$$SE_{\bar{x}} \stackrel{\text{def}}{=} \frac{S}{\sqrt{n}}.$$

Theorem (CI for µ, σ unknown)
Assume $n \ge 30$ or the population is normal. Let

$$\text{margin of error} = m \stackrel{\text{def}}{=} t^{\star}(n-1)\,\frac{s}{\sqrt{n}} = t^{\star}(n-1)\, SE_{\bar{x}}.$$

Then the confidence interval is $\bar{x} \pm m$.
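The theorem translates directly into a few lines of R. The function below is a minimal sketch, not part of the original slides: `t_confidence_interval` is a hypothetical helper name, and the summary statistics in the usage line (x̄ = 50, s = 8, n = 16) are made-up illustrative numbers.

```r
# Confidence interval for mu from summary statistics when sigma is unknown.
# Assumes a normal population or n >= 30, as in the theorem.
t_confidence_interval <- function(xbar, s, n, conf_level = 0.95) {
  alpha  <- 1 - conf_level
  t_star <- qt(1 - alpha / 2, df = n - 1)  # critical value t*(n - 1)
  se     <- s / sqrt(n)                    # standard error of the sample mean
  m      <- t_star * se                    # margin of error
  c(lower = xbar - m, upper = xbar + m)
}

# Illustrative (made-up) summary statistics
ci <- t_confidence_interval(xbar = 50, s = 8, n = 16, conf_level = 0.90)
print(ci)
```

For a 90% interval the critical value is the 0.95 quantile of t(n − 1), since the remaining 10% is split equally between the two tails.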

Example
The total rainfall, in meters, for Jupa, Beliana in the first decade of each of the last sixteen centuries is given below:

    3.790155 3.628361 3.989105 5.124677 4.227491 3.183561 5.286963 3.323666
    4.116425 2.771781 6.243354 5.040272 6.821760 6.170435 5.439190 5.206938

Find a 90% confidence interval for the mean rainfall per first decade of each century.

Solution: The following normal quantile plot indicates that the data comes from a normal (or at least almost normal) distribution.

[Figure: Normal Q-Q Plot of the rainfall data; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles.]

Since σ is unknown and the distribution is close to normal with no outliers, R gives us

    > mean(mdat)-qt(0.95,15)*sd(mdat)/sqrt(16)
    [1] 4.124238
    > mean(mdat)+qt(0.95,15)*sd(mdat)/sqrt(16)
    [1] 5.171278

Thus a 90% confidence interval is (4.124238, 5.171278). Notice

    > t.test(mdat,mu=4.5,conf.level=0.90)

            One Sample t-test

    data:  mdat
    t = 0.4948, df = 15, p-value = 0.6279
    alternative hypothesis: true mean is not equal to 4.5
    90 percent confidence interval:
     4.124238 5.171278
    sample estimates:
    mean of x
     4.647758
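For readers who want to reproduce the example, the sketch below enters the sixteen values as they are printed above (any rounding in the printed values is carried over) and computes the interval both by hand and with `t.test`. The variable names other than `mdat` are illustrative.

```r
# Rainfall data as printed in the example (meters)
mdat <- c(3.790155, 3.628361, 3.989105, 5.124677, 4.227491, 3.183561,
          5.286963, 3.323666, 4.116425, 2.771781, 6.243354, 5.040272,
          6.821760, 6.170435, 5.439190, 5.206938)

n <- length(mdat)
m <- qt(0.95, df = n - 1) * sd(mdat) / sqrt(n)   # margin of error, 90% level

manual_ci  <- mean(mdat) + c(-1, 1) * m
builtin_ci <- as.numeric(t.test(mdat, conf.level = 0.90)$conf.int)

print(manual_ci)
print(builtin_ci)
```

The two results should agree with each other and, up to rounding of the input data, with the interval reported in the example.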

t-test: Mean (σ unknown)

Theorem (t–test for the Mean (σ unknown))
Given a random sample, $X_1, \cdots, X_n$, where either the random sample was sampled from a normal population or the sample size $n \ge 30$, let the test statistic be

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}.$$

Then $T \sim t(n-1)$ under $H_0: \mu = \mu_0$. Writing $t$ for the observed value of $T$, the p–value of a test of $H_0$

1. versus $H_1: \mu_X > \mu_0$ is $P(T \ge t)$.
2. versus $H_2: \mu_X < \mu_0$ is $P(T \le t)$.
3. versus $H_3: \mu_X \ne \mu_0$ is $2P(T \ge |t|)$.
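The three p-value formulas in the theorem can be sketched in R with the t distribution function `pt`. The helper name `t_test_p_value` and the values t = 2.5, df = 24 in the usage lines are illustrative assumptions, not part of the theorem.

```r
# p-value of a one-sample t-test from the observed statistic t and its
# degrees of freedom, for each of the three alternatives in the theorem.
t_test_p_value <- function(t, df,
                           alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    greater   = pt(t, df, lower.tail = FALSE),          # P(T >= t)
    less      = pt(t, df),                              # P(T <= t)
    two.sided = 2 * pt(abs(t), df, lower.tail = FALSE)  # 2 P(T >= |t|)
  )
}

# Illustrative values
p_greater <- t_test_p_value(2.5, df = 24, alternative = "greater")
p_less    <- t_test_p_value(2.5, df = 24, alternative = "less")
p_two     <- t_test_p_value(2.5, df = 24, alternative = "two.sided")
print(c(greater = p_greater, less = p_less, two.sided = p_two))
```

Using `lower.tail = FALSE` rather than `1 - pt(...)` gives the same probability while avoiding loss of precision for very small upper-tail p-values.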

Example
The number of hours of Sesame Street that American 4 year olds watch each year is assumed to be distributed normally. Twenty-five 4 year olds are randomly sampled and one finds $\bar{x} = 125$ and $s_X^2 = 100$. What is the p–value for a test of $H_0: \mu_X = 120$ versus $H_A: \mu_X > 120$?

Solution: The population is normally distributed so the test statistic

$$t = \frac{125 - 120}{10/\sqrt{25}} = 2.5$$

comes from $t(24)$ under $H_0$. Thus the p–value is

    > 1-pt(2.5,24)
    [1] 0.009827088

It seems unlikely that the average number of Sesame Street watching hours is 120 or less.
