 
              Chapter 7 Inferences Based on a Single Sample: Estimation with Confidence Intervals
Large-Sample Confidence Interval for a Population Mean
How do we estimate the population mean and assess the estimate's reliability?
$\bar{x}$ is an estimate of $\mu$, and we use the Central Limit Theorem (CLT) to assess how accurate that estimate is.
According to the CLT, 95% of all $\bar{x}$ from samples of size n lie within $\pm 1.96\sigma_{\bar{x}}$ of the mean $\mu$.
We can use this to assess the accuracy of $\bar{x}$ as an estimate of $\mu$.
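Large-Sample Confidence Interval for a Population Mean
$\bar{x} \pm 1.96\sigma_{\bar{x}} = \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$
We are 95% confident, for any $\bar{x}$ from a sample of size n, that $\mu$ will lie in the interval $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$.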
Large-Sample Confidence Interval for a Population Mean
We usually don't know $\sigma$, but with a large sample s is a good estimator of $\sigma$.
We can calculate confidence intervals for different confidence coefficients.
Confidence coefficient – probability that a randomly selected confidence interval encloses the population parameter
Confidence level – confidence coefficient expressed as a percentage
Large-Sample Confidence Interval for a Population Mean
The confidence coefficient is equal to $1-\alpha$, and $\alpha$ is split equally between the two tails of the distribution ($\alpha/2$ in each tail).
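Large-Sample Confidence Interval for a Population Mean
The confidence interval is expressed more generally as
$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
For samples of size > 30, the confidence interval is expressed as
$\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
Requires that the sample used be random.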
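The large-sample formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `large_sample_mean_ci` and the summary values in the usage line are hypothetical, and the standard library's `NormalDist` is used to look up $z_{\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_mean_ci(x_bar, s, n, confidence=0.95):
    """Return (lower, upper) for x_bar +/- z_{alpha/2} * s / sqrt(n).

    Assumes a random sample with n large enough for the CLT to apply.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} from the standard normal
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical summary statistics: mean 50, sample std dev 10, n = 100
low, high = large_sample_mean_ci(x_bar=50, s=10, n=100)
print(round(low, 2), round(high, 2))  # roughly 48.04 51.96
```

Passing summary statistics rather than raw data keeps the sketch close to the formula; a version that accepts raw observations would compute `x_bar` and `s` first.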
Large-Sample Confidence Interval for a Population Mean
Commonly used values of $z_{\alpha/2}$

Confidence level $100(1-\alpha)$    $\alpha$    $\alpha/2$    $z_{\alpha/2}$
90%                                 .10         .05           1.645
95%                                 .05         .025          1.96
99%                                 .01         .005          2.575
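The tabled values can be checked against the standard normal distribution. The sketch below is an illustration only; `z_critical` is a hypothetical helper name, and it relies on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}, the value cutting off alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  z = {z_critical(level):.3f}")
# 90%  z = 1.645
# 95%  z = 1.960
# 99%  z = 2.576
```

The 99% value prints as 2.576; the 2.575 in the table is the rounded figure commonly read from printed normal tables, and the difference is negligible in practice.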
Small-Sample Confidence Interval for a Population Mean
Two problems presented by sample sizes of less than 30:
– The CLT no longer applies
– The population standard deviation $\sigma$ is almost always unknown, and s may provide a poor estimate of it when n is small
Small-Sample Confidence Interval for a Population Mean
If we can assume that the sampled population is approximately normal, then the sampling distribution of $\bar{x}$ can be assumed to be approximately normal.
Instead of using $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ we use $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
This t is referred to as the t-statistic.
Small-Sample Confidence Interval for a Population Mean
The t-statistic has:
– A sampling distribution very similar to z
– Variability dependent on n, the sample size
Variability is expressed as (n-1) degrees of freedom (df). As df gets smaller, variability increases.
Small-Sample Confidence Interval for a Population Mean
The table for the t-distribution contains t-values for various combinations of degrees of freedom and $t_\alpha$.
Partial table below shows components of table.
Need Table 7.3 from text inserted here.
Small-Sample Confidence Interval for a Population Mean
Comparing the t and z distributions for the same $\alpha$, with df = 4 for the t-distribution, you can see that the t-score is larger, and therefore the confidence interval will be wider.
The closer df gets to 30, the more closely the t-distribution approximates the normal distribution.
Small-Sample Confidence Interval for a Population Mean
When creating a confidence interval around $\mu$ for a small sample we use
$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
basing $t_{\alpha/2}$ on n-1 degrees of freedom.
We assume a random sample drawn from a population that is approximately normally distributed.
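A small-sample interval can be sketched as follows. The data and the function name `small_sample_mean_ci` are hypothetical. Because the Python standard library has no t-distribution, the critical value is passed in by hand; $t_{.025} = 2.776$ for 4 degrees of freedom is the standard t-table entry.

```python
from math import sqrt
from statistics import mean, stdev

def small_sample_mean_ci(sample, t_crit):
    """Return (lower, upper) for x_bar +/- t_{alpha/2} * s / sqrt(n).

    t_crit must be looked up for n - 1 degrees of freedom.
    Assumes a random sample from an approximately normal population.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical measurements: n = 5, so df = 4 and t_{.025} = 2.776
sample = [9.8, 10.2, 10.4, 9.9, 10.2]
low, high = small_sample_mean_ci(sample, t_crit=2.776)
print(round(low, 3), round(high, 3))  # roughly 9.796 10.404
```

Using $z_{.025} = 1.96$ in place of 2.776 on the same data would give a noticeably narrower interval, which is the understatement of uncertainty the t-statistic is meant to avoid.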
Large-Sample Confidence Interval for a Population Proportion
Confidence intervals around a proportion are confidence intervals around the probability of success in a binomial experiment.
Sample statistic of interest is $\hat{p}$.
Mean of the sampling distribution of $\hat{p}$ is p, so $\hat{p}$ is an unbiased estimator of p.
Standard deviation of the sampling distribution is $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where $q = 1-p$.
For large samples, the sampling distribution of $\hat{p}$ is approximately normal.
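Large-Sample Confidence Interval for a Population Proportion
Sample size n is large if $\hat{p} \pm 3\sigma_{\hat{p}}$ falls between 0 and 1.
Confidence interval is calculated as
$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
where $\hat{p} = \frac{x}{n}$ and $\hat{q} = 1 - \hat{p}$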
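The interval and the large-sample check can be sketched together. The function name `proportion_ci` and the counts in the usage line are hypothetical, and $\sigma_{\hat{p}}$ is approximated with $\hat{p}$ and $\hat{q}$ in the check as well as in the interval.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Return (lower, upper) for p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n).

    x is the number of successes in n trials. Raises ValueError when
    p_hat +/- 3 standard errors does not fall strictly between 0 and 1.
    """
    p_hat = x / n
    q_hat = 1 - p_hat
    se = sqrt(p_hat * q_hat / n)
    if not (p_hat - 3 * se > 0 and p_hat + 3 * se < 1):
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * se, p_hat + z * se

# Hypothetical survey: 60 successes out of 200
low, high = proportion_ci(60, 200)
print(round(low, 4), round(high, 4))  # roughly 0.2365 0.3635
```

A call such as `proportion_ci(1, 20)` fails the check and raises, since $\hat{p} - 3\sigma_{\hat{p}}$ is below 0 there.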
Large-Sample Confidence Interval for a Population Proportion
When p is near 0 or 1, the confidence intervals calculated using the formulas presented are misleading.
An adjustment can be used that works for any p, even with very small sample sizes:
$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$
where $\tilde{p} = \frac{x+2}{n+4}$ is the adjusted sample proportion
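A sketch of the adjusted interval follows. It assumes the adjusted proportion is $\tilde{p} = (x+2)/(n+4)$, the usual definition for this adjustment (add two successes and two failures); the function name `adjusted_proportion_ci` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def adjusted_proportion_ci(x, n, confidence=0.95):
    """Return (lower, upper) for p_tilde +/- z * sqrt(p_tilde * (1 - p_tilde) / (n + 4)).

    Assumption: p_tilde = (x + 2) / (n + 4).
    """
    p_tilde = (x + 2) / (n + 4)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - half_width, p_tilde + half_width

# Hypothetical small sample: 3 successes out of 20
low, high = adjusted_proportion_ci(3, 20)
print(round(low, 4), round(high, 4))  # roughly 0.0459 0.3708
```

Note that the interval is centered on $\tilde{p}$ (about .208 here), not on $\hat{p} = .15$.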
Determining the Sample Size
When we want to estimate $\mu$ to within x units with a $(1-\alpha)$ level of confidence, we can calculate the sample size needed.
We use the Sampling Error (SE), which is half the width of the confidence interval.
To estimate $\mu$ with sampling error SE and $100(1-\alpha)\%$ confidence,
$n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2}$
where $\sigma$ is estimated by s or R/4
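Determining the Sample Size
Assume $\alpha = .01$ and a range R of .4, so that $z_{\alpha/2} = 2.575$ and $\sigma \approx R/4 = .1$.
What size sample do we need to achieve a desired SE of .025?
$n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2} = \frac{(2.575)^2(.1)^2}{(.025)^2} = 106.09$
Rounding up, a sample of 107 observations is needed.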
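The arithmetic in this example can be reproduced with a short sketch. The function name `sample_size_for_mean` is hypothetical; the inputs are the slide's own values, with $z_{.005} = 2.575$ taken from the table rather than computed.

```python
from math import ceil

def sample_size_for_mean(z, sigma, se):
    """Return the unrounded n = z^2 * sigma^2 / SE^2."""
    return (z ** 2) * (sigma ** 2) / (se ** 2)

sigma = 0.4 / 4  # sigma estimated as R/4 with range R = .4
n_raw = sample_size_for_mean(z=2.575, sigma=sigma, se=0.025)
n = ceil(n_raw)  # always round up so the target SE is met
print(round(n_raw, 2), n)  # 106.09 107
```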
Determining the Sample Size
Sample size can also be estimated for a population proportion p:
$n = \frac{(z_{\alpha/2})^2(pq)}{(SE)^2}$
Since pq is unknown, it must be estimated. Using a value of p equal or close to .5 is the most conservative choice, because pq is largest at p = .5 and so gives the largest n.
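The effect of the guess for p can be seen in a small sketch. The function name `sample_size_for_proportion` and the target SE of .03 are hypothetical; $z_{.025} = 1.96$ is the tabled value for 95% confidence.

```python
from math import ceil

def sample_size_for_proportion(z, p, se):
    """Return n = z^2 * p * q / SE^2, rounded up, with q = 1 - p."""
    q = 1 - p
    return ceil((z ** 2) * p * q / (se ** 2))

# 95% confidence, hypothetical target SE of .03
print(sample_size_for_proportion(z=1.96, p=0.5, se=0.03))  # 1068
print(sample_size_for_proportion(z=1.96, p=0.3, se=0.03))  # 897
```

The p = .5 guess gives the larger n, so a sample sized that way meets the target SE whatever the true p turns out to be.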
Finite Population Correction for Simple Random Sampling
Used when the sample size n is large relative to the size of the population N, that is, when n/N > .05
Standard error calculation for $\mu$ with correction:
$\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}}$
Standard error calculation for p with correction:
$\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}}$
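Both corrected standard errors can be sketched directly from the formulas, assuming the correction factor is $\sqrt{(N-n)/N}$. The function names and the numbers in the usage lines are hypothetical.

```python
from math import sqrt

def fpc_se_mean(s, n, N):
    """Estimated standard error of x_bar with the finite population correction."""
    return (s / sqrt(n)) * sqrt((N - n) / N)

def fpc_se_proportion(p_hat, n, N):
    """Estimated standard error of p_hat with the finite population correction."""
    return sqrt(p_hat * (1 - p_hat) / n) * sqrt((N - n) / N)

# Hypothetical case: n = 100 drawn from N = 1000, so n/N = .10 > .05
print(round(fpc_se_mean(s=10, n=100, N=1000), 4))            # 0.9487
print(round(fpc_se_proportion(p_hat=0.5, n=100, N=1000), 4))  # 0.0474
```

Without the correction the two standard errors would be 1.0 and 0.05, so sampling 10% of the population shrinks each by about 5%.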
Sample Survey Designs
• Simple Random Sample
• Stratified Random Sampling
  – Separation into two or more groups of sampling units
  – Produces estimators with smaller standard errors
  – Increases representativeness
  – Can reduce cost
Sample Survey Designs
• Systematic Sampling
  – Sampling of every nth unit
  – Samples are easier to select
  – Can lead to systematic bias, particularly if there is periodicity in the data being sampled
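Systematic selection can be sketched as below. This is one common way to do it, assumed here for illustration: pick a random starting position among the first k units, then take every kth unit after it. The function name `systematic_sample` and the list of unit IDs are hypothetical.

```python
import random

def systematic_sample(units, k, rng=None):
    """Return every k-th unit from a list, starting at a random offset in [0, k)."""
    rng = rng or random.Random()
    start = rng.randrange(k)  # random start keeps the selection from being fixed in advance
    return units[start::k]

# Hypothetical sampling frame of 100 unit IDs, taking every 10th unit
frame = list(range(100))
sample = systematic_sample(frame, k=10, rng=random.Random(0))
print(len(sample))  # 10
```

If the frame has a cycle of length k (for example, daily records sampled every 7th day), every selected unit falls at the same point in the cycle, which is the periodicity bias noted above.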
Sample Survey Designs
• Randomized Response Sampling
  – Used when questions in the survey are of a sensitive nature and likely to result in false answers
Sample Survey Designs
Nonresponse – when units in the sample do not produce observations
Nonresponse can produce bias in the results of the survey if there is a relationship between the type of response and whether or not a response is achieved.
If a random sample is called for, any nonresponse means that your sample is no longer random.