Beyond descriptive statistics


  1. Beyond descriptive statistics
     • When we have a data set, we usually want to do more with the data than just describe them.
     • Keep in mind that the data are information from a sample selected or generated from a population, and our goal is to make inferences about that population.

  2. Statistical inference
     Statistical inference can be further subdivided into the two main areas of estimation and hypothesis testing.
     • Estimation is concerned with estimating the values of specific population parameters (today's lecture).
     • Hypothesis testing is concerned with testing whether the value of a population parameter is equal to some specific value (next lecture).

  3. Point estimation and interval estimation
     • Sometimes we are interested in obtaining specific values as estimates of our parameters (along with their precision). These values are referred to as point estimates.
     • Sometimes we want to specify a range within which the parameter values are likely to fall. If the range is narrow, then we may feel our point estimate is good. These ranges are called interval estimates.

  4. From Sample to Population!
     • Purpose of inference: make decisions about population characteristics when it is impractical to observe the whole population and we only have a sample of data drawn from it.

  5. Towards statistical inference
     o Parameter: a number describing the population
     o Statistic: a number describing a sample
     o Statistical inference: Statistic → Parameter

  6. Estimation of population mean
     • We have a sample $(x_1, x_2, \ldots, x_n)$ randomly sampled from a population.
     • The population mean $\mu$ and variance $\sigma^2$ are unknown.
     • Question: how do we use the observed sample $(x_1, x_2, \ldots, x_n)$ to estimate $\mu$ and $\sigma^2$?

  7. Point estimator of population mean and variance
     • A natural estimator of the population mean $\mu$ is the sample mean
       $\bar{x} = \sum_{i=1}^{n} x_i / n$
     • A natural estimator of the population standard deviation $\sigma$ is the sample standard deviation
       $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

  8. Point estimator of population mean
     • A natural estimator of the population mean $\mu$ is the sample mean
       $\bar{x} = \sum_{i=1}^{n} x_i / n$
     • Question: How good is this estimate?

  9. Point estimator of population mean
     • A natural estimator of the population mean $\mu$ is the sample mean
       $\bar{x} = \sum_{i=1}^{n} x_i / n$
     • Question: How good is this estimate?
     • We would like $\bar{x}$ to be close to $\mu$; however, we don't know the value of $\mu$.
     • We need to study the distribution of $\bar{X}$.

  10. Sampling distribution of sample mean
     • To understand what properties of $\bar{X}$ make it a desirable estimator for $\mu$, we need to forget about our particular sample for the moment and consider all possible samples of size n that could have been selected from the population.
     • The values of $\bar{X}$ in different samples will be different. These values will be denoted by $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$
     • The sampling distribution of $\bar{X}$ is the distribution of the values $\bar{x}$ over all possible samples of size n that could have been selected from the study population.

  11. Research question: center of a population
     [Diagram: a population and its mean.]

  12. Research question: center of a population
     [Diagram: random samples 1, 2, ..., K drawn from the population.]
     • Each sample is representative of the population.

  13. Research question: center of a population
     [Diagram: random samples 1, 2, ..., K drawn from the population, with sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_K$.]
     • The selection of a random sample is A RANDOM EXPERIMENT.
     • $\bar{X}$ is a RANDOM VARIABLE.
     • $\bar{x}_1, \ldots, \bar{x}_K$ are observed values of $\bar{X}$.

  14. An example of a sampling distribution

  15. Sample mean is an unbiased estimator of population mean
     • We can show that the average of the sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$ (over all possible samples) is equal to the population mean $\mu$.
     • Unbiasedness: Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from some population with mean $\mu$. Then $E(\bar{X}) = \mu$.
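
The unbiasedness claim can be checked by brute force on a tiny population. The sketch below is only an illustration: the five population values and the sample size n = 2 are made up, and samples are drawn without replacement. It enumerates every possible sample and averages the resulting sample means with exact fractions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population (illustrative values, not from the lecture).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Population mean as an exact fraction.
mu = Fraction(sum(population), len(population))

# Enumerate every possible sample of size n (drawn without replacement)
# and record its sample mean.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Average of the sample means over all possible samples.
mean_of_means = sum(sample_means) / len(sample_means)

print("number of possible samples:", len(sample_means))
print("population mean:", mu)
print("average of all sample means:", mean_of_means)
```

Individual sample means differ from the population mean, but their average over all possible samples matches it exactly.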

  16. $\bar{X}$ is the minimum variance unbiased estimator of $\mu$
     • The unbiasedness of the sample mean is not sufficient reason to use it as an estimator of $\mu$.
     • For a symmetric population there are other unbiased estimators, such as the sample median and the average of the minimum and maximum.
     • We can show (but not here) that, for a normal population, the sample mean has the smallest variance among all unbiased estimators.
     • Now what is the variance of the sample mean $\bar{X}$?
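
A small simulation can make the variance comparison concrete. This is a sketch under stated assumptions, not a proof: the population is taken to be normal with made-up parameters (mean 50, standard deviation 10), and the sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical normal population
N, REPS = 10, 20000      # sample size and number of simulated samples

means, medians, midranges = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))
    # "Average of min and max" from the slide, often called the midrange.
    midranges.append((min(sample) + max(sample)) / 2)

var_mean = statistics.variance(means)
var_median = statistics.variance(medians)
var_midrange = statistics.variance(midranges)

print(f"variance of sample mean:     {var_mean:.2f}")
print(f"variance of sample median:   {var_median:.2f}")
print(f"variance of sample midrange: {var_midrange:.2f}")
```

All three estimators center on the population mean here, but the sample mean varies least from sample to sample, with a variance close to the theoretical value 10² / 10 = 10.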

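  17. Standard error (SE) of sample mean $\bar{X}$
     • The variance of the sample mean measures the precision of the estimate:
       $\mathrm{var}(\bar{X}) = \sigma^2 / n$, where $\sigma^2$ is the population variance.
     • The standard error is its square root:
       $SE(\bar{X}) = \sigma / \sqrt{n}$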

  18. Use $s / \sqrt{n}$ to estimate $\sigma / \sqrt{n}$
     • In practice, the population variance $\sigma^2$ is rarely known, and the sample variance $s^2$ is a reasonable estimator for $\sigma^2$.
     • Therefore, the standard error of the mean $\sigma / \sqrt{n}$ can be estimated by $s / \sqrt{n}$
       (recall that $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$).
     NOTE: The larger the sample size, the smaller the standard error, and the more precise $\bar{X}$ is as an estimate of $\mu$.
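
The NOTE can be illustrated numerically. The sketch below holds the sample standard deviation fixed at a made-up value (s = 20) and varies only the sample size; the helper name `standard_error` is hypothetical.

```python
import math


def standard_error(s, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)


s = 20.0  # hypothetical sample standard deviation, held fixed for illustration
for n in (10, 40, 160, 640):
    print(f"n = {n:3d}  SE = {standard_error(s, n):.2f}")
```

Because n enters through a square root, quadrupling the sample size only halves the standard error.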

  19. An example of standard error
     • A sample of 10 birthweights: 97, 125, 62, 120, 132, 135, 118, 137, 126, 118 (sample mean $\bar{x} = 117.00$ and sample standard deviation $s = 22.44$).
     • To estimate the population mean $\mu$, a point estimate is the sample mean $\bar{x} = 117.00$, with standard error given by
       $SE = s / \sqrt{n} = 22.44 / \sqrt{10} = 7.09$
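
The birthweight numbers can be reproduced with the Python standard library. This is a minimal sketch using the ten values listed on the slide; `statistics.stdev` uses the n - 1 divisor, which matches the definition of s used in this lecture.

```python
import math
import statistics

birthweights = [97, 125, 62, 120, 132, 135, 118, 137, 126, 118]

n = len(birthweights)
x_bar = statistics.mean(birthweights)   # sample mean
s = statistics.stdev(birthweights)      # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)                   # estimated standard error of the mean

print(f"mean = {x_bar:.2f}, s = {s:.2f}, SE = {se:.2f}")
# prints: mean = 117.00, s = 22.44, SE = 7.09
```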

  20. Summary of sampling distribution of $\bar{X}$
     • Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Then the mean and variance of $\bar{X}$ are $\mu$ and $\sigma^2 / n$, respectively.
     • Furthermore, if $X_1, \ldots, X_n$ is a random sample from a normal population with mean $\mu$ and variance $\sigma^2$, then by the properties of linear combinations $\bar{X}$ is also normally distributed, that is,
       $\bar{X} \sim N(\mu, \sigma^2 / n)$
     • Now the question is: if the population is NOT normal, what is the distribution of $\bar{X}$?

  21. The Central Limit Theorem
     • Let $X_1, X_2, \ldots, X_n$ denote n independent random variables sampled from some population with mean $\mu$ and variance $\sigma^2$.
     • When n is large, the sampling distribution of the sample mean is approximately normal even if the underlying population is not normal:
       $\bar{X} \approx N(\mu, \sigma^2 / n)$
     • By standardization:
       $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx Z \sim N(0, 1)$
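
The theorem can be explored with a quick simulation. The sketch below assumes a deliberately skewed population (exponential with rate 1, so mean and standard deviation are both 1); the sample size n = 50 and the number of repetitions are arbitrary illustrative choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

RATE = 1.0               # exponential population: skewed, clearly not normal
MU = SIGMA = 1.0 / RATE  # its mean and standard deviation
N, REPS = 50, 10000      # sample size and number of simulated samples

sample_means = [
    statistics.fmean([random.expovariate(RATE) for _ in range(N)])
    for _ in range(REPS)
]

# Compare the simulated sampling distribution with N(mu, sigma^2 / n).
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = SIGMA / math.sqrt(N)

# Standardize each sample mean and count how many fall within +/- 1.96.
z_scores = [(m - MU) / theoretical_se for m in sample_means]
frac_within = sum(abs(z) <= 1.96 for z in z_scores) / REPS

print(f"mean of sample means: {mean_of_means:.3f} (mu = {MU})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
print(f"fraction with |Z| <= 1.96: {frac_within:.3f}")
```

If the normal approximation is good, the last fraction should land near 0.95 even though the individual observations are far from normal.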

  22. Illustration of the Central Limit Theorem (CLT)

  23. Interval estimation
     • Let $X_1, X_2, \ldots, X_n$ denote n independent random variables sampled from some population with mean $\mu$ and variance $\sigma^2$.
     • Our goal is to estimate $\mu$. We know that $\bar{X}$ is a good point estimate.
     • Now we want a confidence interval $(\bar{X} - a, \bar{X} + a) = \bar{X} \pm a$ such that a 95% confidence interval (CI) for $\mu$ satisfies:
       $\Pr(\bar{X} - a < \mu < \bar{X} + a) = 95\%$
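
What the 95% statement means in practice can be sketched by simulation: draw many samples, build the interval each time, and count how often it contains $\mu$. The code below makes several assumptions for illustration only: a normal population with made-up parameters, a known $\sigma$, and the half-width a = 1.96 σ/√n that the following slides derive.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA = 100.0, 15.0   # hypothetical normal population, sigma treated as known
N, REPS = 25, 5000        # sample size and number of simulated samples

# Half-width of the interval (assumed here; derived on the following slides).
a = 1.96 * SIGMA / math.sqrt(N)

covered = 0
for _ in range(REPS):
    x_bar = statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    if x_bar - a < MU < x_bar + a:
        covered += 1

coverage = covered / REPS
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The random quantity is the interval, not $\mu$: across repeated samples, roughly 95% of the intervals should cover the fixed population mean.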

  24. Interval estimation
     • By the CLT we have, approximately, $\bar{X} \sim N(\mu, \sigma^2 / n)$. Using the Z-transformation, we have
       $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$
     • Thus, for a standard normal variable, how wide is the interval that contains 95% of the probability?
       Pr(-1 < Z < 1) = 0.6827
       Pr(-1.96 < Z < 1.96) = 0.95
       Pr(-2.58 < Z < 2.58) = 0.99
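
These three probabilities can be checked with `statistics.NormalDist` from the Python standard library. The helper name `central_prob` is a hypothetical one introduced for this sketch.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution


def central_prob(z):
    """Probability that a standard normal variable lies in (-z, z)."""
    return Z.cdf(z) - Z.cdf(-z)


for z in (1.0, 1.96, 2.58):
    print(f"Pr(-{z} < Z < {z}) = {central_prob(z):.4f}")
```

The computed values agree with the slide after rounding; the last one is slightly above 0.99 because 2.58 is itself a rounded critical value.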

  25. Interval estimation
     • Thus we have
       $0.95 = P(-1.96 \le Z \le 1.96) = P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le 1.96\right)$
     • However, we don't know $\sigma$, so we replace $\sigma$ with $s$, the sample standard deviation:
       $0.95 \approx P\left(-1.96 \le \frac{\bar{X} - \mu}{s / \sqrt{n}} \le 1.96\right)$
     • The term $s / \sqrt{n}$ is called the SAMPLE STANDARD ERROR, denoted $SE(\bar{X})$. Thus the 95% confidence interval for $\mu$ is:
       $0.95 \approx P(\bar{X} - 1.96\, SE(\bar{X}) \le \mu \le \bar{X} + 1.96\, SE(\bar{X}))$
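
The step from the standardized inequality to the interval for $\mu$ is plain algebra. A sketch of the rearrangement, assuming only that $SE(\bar{X}) > 0$: multiply through by $SE(\bar{X})$, subtract $\bar{X}$, then multiply by $-1$, which reverses the inequalities.

```latex
\begin{align*}
-1.96 \le \frac{\bar{X} - \mu}{SE(\bar{X})} \le 1.96
&\iff -1.96\, SE(\bar{X}) \le \bar{X} - \mu \le 1.96\, SE(\bar{X}) \\
&\iff -\bar{X} - 1.96\, SE(\bar{X}) \le -\mu \le -\bar{X} + 1.96\, SE(\bar{X}) \\
&\iff \bar{X} - 1.96\, SE(\bar{X}) \le \mu \le \bar{X} + 1.96\, SE(\bar{X})
\end{align*}
```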

  26. Example
     • Assume that the n = 92 students were a simple random sample of all NYU students. The sample mean weight is $\bar{x} = 145.2$ lbs and the sample standard deviation is $s = 23.7$ lbs. What is the 95% confidence interval for the mean weight of NYU students?

  27. Example
     • Assume that the n = 92 students were a simple random sample of all NYU students. The sample mean weight is $\bar{x} = 145.2$ lbs and the sample standard deviation is $s = 23.7$ lbs. What is the 95% confidence interval for the mean weight of NYU students?
     • Answer: First, the standard error is
       $SE(\bar{X}) = 23.7 / \sqrt{92} = 2.47$
     • The 95% CI is
       $\bar{x} \pm 1.96 \cdot SE(\bar{X}) = 145.2 \pm (1.96)(2.47) = 145.2 \pm 4.8$ lbs $= [140.4, 150.0]$ lbs
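
The arithmetic in this example can be reproduced in a few lines. The function name `z_confidence_interval` is a hypothetical helper written for this sketch; it only encodes the formula $\bar{x} \pm z \cdot s / \sqrt{n}$ from the slides.

```python
import math


def z_confidence_interval(x_bar, s, n, z=1.96):
    """Return (standard error, (lower, upper)) for x_bar +/- z * s / sqrt(n)."""
    se = s / math.sqrt(n)
    margin = z * se
    return se, (x_bar - margin, x_bar + margin)


# Values from the NYU student weight example.
se, (low, high) = z_confidence_interval(x_bar=145.2, s=23.7, n=92)
print(f"SE = {se:.2f}, 95% CI = [{low:.1f}, {high:.1f}] lbs")
# prints: SE = 2.47, 95% CI = [140.4, 150.0] lbs
```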

  28. Interval estimation for an arbitrary level of confidence $1 - \alpha$
     • $z_{\alpha/2}$ is the Z critical value for level $\alpha/2$:
       $P(Z \le z_{\alpha/2}) = \alpha/2$
     • In particular,
       $P(Z \le z_{0.025}) = 0.025$
     [Figure: standard normal density with area $1 - \alpha$ between $z_{\alpha/2}$ and $z_{1-\alpha/2}$ and area $\alpha/2$ in each tail.]
     EXCEL: NORM.INV(0.025, 0, 1) = -1.96
            NORM.INV(probability, mean, sd) = critical value

  29. Z critical value table

     1 - α          0.80    0.90    0.95    0.99
     α              0.20    0.10    0.05    0.01
     α/2            0.10    0.05    0.025   0.005
     z_{1-α/2}      1.28    1.64    1.96    2.58

     E.g., for a 0.99 level of confidence, go out 2.58 standard errors from the sample mean.
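
The table can be regenerated with `statistics.NormalDist.inv_cdf`, which plays the same role as the Excel NORM.INV call shown earlier. The helper name `z_critical` is a hypothetical one introduced for this sketch.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Upper critical value z_{1 - alpha/2} for a given confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"1 - alpha = {level:.2f}   z = {z_critical(level):.2f}")

# Lower-tail counterpart of the Excel call NORM.INV(0.025, 0, 1).
print(f"{NormalDist(mu=0, sigma=1).inv_cdf(0.025):.2f}")
```

Multiplying the chosen critical value by the standard error gives the margin to add to and subtract from the sample mean.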
