Unit 3: Foundations for inference
Lecture 1: Variability in estimates and CLT

Statistics 101
Thomas Leininger
May 28, 2013

Outline
1. Announcements
2. Variability in estimates
   Example
   Sampling distributions - via simulation
   Sampling distributions - via CLT
Announcements

Labs 2 & 3 due today
PS 3 due tomorrow
Projects
Variability in estimates

Example
Source: http://pewresearch.org/pubs/2191/young-adults-workers-labor-market-pay-careers-advancement-recession
Margin of error
41% ± 2.9%: We are 95% confident that 38.1% to 43.9% of the public believe young adults, rather than middle-aged or older adults, are having the toughest time in today’s economy.
49% ± 4.4%: We are 95% confident that 44.6% to 53.4% of 18-34 year olds have taken a job they didn’t want just to pay the bills.
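Each interval above is simply the point estimate minus and plus the margin of error. A minimal Python sketch of that arithmetic (the function name `confidence_interval` is illustrative, not something from the lecture):

```python
def confidence_interval(point_estimate, margin_of_error):
    """Return (lower, upper) for point_estimate ± margin_of_error."""
    return (point_estimate - margin_of_error, point_estimate + margin_of_error)

# The two Pew results above, in percentage points.
young_adults_toughest = confidence_interval(41.0, 2.9)
took_unwanted_job = confidence_interval(49.0, 4.4)

print("Toughest time: %.1f%% to %.1f%%" % young_adults_toughest)
print("Unwanted job:  %.1f%% to %.1f%%" % took_unwanted_job)
```

The printed bounds reproduce the 38.1% to 43.9% and 44.6% to 53.4% ranges quoted in the two statements.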
Parameter estimation
We are often interested in population parameters. Since collecting data on an entire population is difficult (or impossible), we use sample statistics as point estimates for the unknown population parameters of interest.

Sample statistics vary from sample to sample. Quantifying how much they vary lets us estimate the margin of error associated with a point estimate.

Before we quantify the variability among samples, let’s try to understand how and why point estimates vary from sample to sample.

Suppose we randomly sample 1,000 adults from each state in the US. Would you expect the sample means of their heights to be the same, somewhat different, or very different?
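Even when every sample comes from the same population, the sample means differ. The Python sketch below isolates that sampling variability; it assumes a hypothetical population of adult heights that is normal with mean 170 cm and standard deviation 10 cm (made-up parameters, not real data). Real states may also differ in their true mean height, which would add to the differences seen here.

```python
import random
import statistics

# HYPOTHETICAL population model: heights ~ Normal(mean=170 cm, sd=10 cm).
POP_MEAN, POP_SD = 170.0, 10.0

def sample_mean_height(n, rng):
    """Draw n heights from the hypothetical population and return their mean."""
    heights = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.mean(heights)

rng = random.Random(2013)  # fixed seed so the run is reproducible

# Five independent samples of 1,000 adults each, all from the SAME population.
means = [sample_mean_height(1000, rng) for _ in range(5)]
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean height = {m:.2f} cm")
```

The means land close to 170 cm but are not identical: somewhat different, not very different, because averaging 1,000 heights smooths out individual variation.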
Sampling distributions - via simulation
Average number of Duke games attended
Next let’s look at the population data for the number of Duke basketball games attended:
[Figure: histogram of the population; x-axis: number of Duke games attended; y-axis: frequency]
Average number of Duke games attended (cont.)
Sampling distribution, n = 10:
[Figure: histogram; x-axis: sample means from samples of n = 10; y-axis: frequency]
What does each observation in this distribution represent?
The sample mean, x̄, of a sample of size n = 10.
Is the variability of the sampling distribution smaller or larger than the variability of the population distribution? Why?
Smaller: sample means vary less than individual observations.
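The two answers above can be checked with a small simulation. The sketch below assumes a hypothetical right-skewed population of 1,000 game counts (generated from a rounded-down exponential distribution; the lecture's actual population data are not reproduced here). Each entry of `means_10` is one observation of the sampling distribution, that is, the mean of one sample of size 10.

```python
import random
import statistics

rng = random.Random(101)  # fixed seed for reproducibility

# HYPOTHETICAL right-skewed population of 1,000 students' game counts.
population = [int(rng.expovariate(1 / 6)) for _ in range(1000)]

def sampling_distribution(pop, n, reps, rng):
    """Return reps sample means; each is the mean of one random sample
    of size n drawn without replacement from pop."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]

means_10 = sampling_distribution(population, n=10, reps=2000, rng=rng)

# One observation in the sampling distribution = one sample mean.
print("first three sample means:", [round(m, 1) for m in means_10[:3]])

# Sample means vary less than individual observations.
print("population sd:     ", round(statistics.pstdev(population), 2))
print("sd of sample means:", round(statistics.stdev(means_10), 2))
```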
Average number of Duke games attended (cont.)
Sampling distribution, n = 30:
[Figure: histogram; x-axis: sample means from samples of n = 30; y-axis: frequency]
How did the shape, center, and spread of the sampling distribution change going from n = 10 to n = 30?
Shape is more symmetric, center is about the same, spread is smaller.
Average number of Duke games attended (cont.)
Sampling distribution, n = 70:
[Figure: histogram; x-axis: sample means from samples of n = 70; y-axis: frequency]
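The shape, center, and spread comparison across n = 10, 30, and 70 can be reproduced numerically. This is a sketch under the same kind of assumption as before: a hypothetical right-skewed population (not the lecture's data). Shape is summarized by the moment coefficient of skewness (0 for a symmetric distribution), center by the mean of the sample means, and spread by their standard deviation.

```python
import random
import statistics

def skewness(xs):
    """Moment coefficient of skewness, mean((x - m)^3) / sd^3 (population form)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / sd ** 3

rng = random.Random(28)
# HYPOTHETICAL right-skewed population (not the lecture's data).
population = [int(rng.expovariate(1 / 6)) for _ in range(1000)]

results = {}
for n in (10, 30, 70):
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(2000)]
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    center, spread, shape = results[n]
    print(f"n = {n:2d}: center = {center:.2f}, spread = {spread:.2f}, skewness = {shape:.2f}")

print(f"population: center = {statistics.fmean(population):.2f}, "
      f"skewness = {skewness(population):.2f}")
```

The centers stay near the population mean while the spread shrinks as n grows, and the sample means are far less skewed than the population itself.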
Average number of Duke games attended (cont.)
Question: The mean of the sampling distribution is 5.75, and the standard deviation of the sampling distribution (also called the standard error) is 0.75. Which of the following is the most reasonable guess for the 95% confidence interval for the true average number of Duke games attended by students?
(a) 5.75 ± 0.75
(b) 5.75 ± 2 × 0.75 → (4.25, 7.25)
(c) 5.75 ± 3 × 0.75
(d) cannot tell from the information given
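Why (b)? If the sampling distribution is nearly normal, about 95% of sample means fall within 2 standard errors of the center (2 is a rounded version of 1.96), while ±1 SE covers only about 68% and ±3 SE about 99.7%. A short Python check of the three candidate intervals, using the standard-normal coverage computed from `math.erf`:

```python
import math

def normal_coverage(k):
    """P(-k < Z < k) for a standard normal Z, computed with the error function."""
    return math.erf(k / math.sqrt(2))

center = 5.75          # mean of the sampling distribution (from the question)
standard_error = 0.75  # sd of the sampling distribution (from the question)

intervals = {}
for k in (1, 2, 3):
    intervals[k] = (center - k * standard_error, center + k * standard_error)
    lower, upper = intervals[k]
    print(f"5.75 ± {k} × 0.75 -> ({lower:.2f}, {upper:.2f}), "
          f"normal coverage ≈ {normal_coverage(k):.1%}")
```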
Sampling distributions - via CLT
Central limit theorem
Central limit theorem: The distribution of the sample mean is well approximated by a normal model:

$$\bar{x} \sim N\left(\text{mean} = \mu,\; SE = \frac{\sigma}{\sqrt{n}}\right)$$

If σ is unknown, use s.

So it wasn’t a coincidence that the sampling distributions we saw earlier were symmetric. We won’t go into proving why $SE = \sigma / \sqrt{n}$, but note that as n increases, SE decreases. As the sample size increases, we would expect samples to yield more consistent sample means, so the variability among the sample means would be lower.
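The formula SE = σ/√n can be evaluated directly. The Python sketch below uses σ = 6 purely as an arbitrary illustrative value and a hypothetical sample of 10 game counts for the case where σ is unknown and s is plugged in. Because n sits under a square root, quadrupling the sample size only halves the standard error.

```python
import math
import statistics

def standard_error(sd, n):
    """SE of the sample mean: sd / sqrt(n). Pass sigma if known, otherwise s."""
    return sd / math.sqrt(n)

# SE shrinks as n grows (sigma = 6 is an arbitrary illustrative value).
for n in (10, 30, 70):
    print(f"n = {n:2d}: SE = {standard_error(6, n):.3f}")

# sigma unknown: plug in s computed from a HYPOTHETICAL sample.
sample = [3, 0, 12, 5, 1, 8, 2, 6, 4, 9]
s = statistics.stdev(sample)  # sample sd, n - 1 denominator
print(f"s = {s:.3f}, estimated SE = {standard_error(s, len(sample)):.3f}")
```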
CLT - conditions
Certain conditions must be met for the CLT to apply:
1. Independence: Sampled observations must be independent. This is difficult to verify, but is more likely if
   - random sampling/assignment is used, and
   - if sampling without replacement, n < 10% of the population.
2. Sample size/skew/outliers: Either 1) the population distribution is normal, or 2) n > 30 and the population distribution is not extremely skewed. This is also difficult to verify for the population, but we can check it using the sample data and assume that the sample mirrors the population.
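Only the numeric parts of these conditions can be checked mechanically. The helper below, `clt_rule_of_thumb_checks`, is a hypothetical name written for illustration: it tests the 10% rule and the "normal population or n > 30" rule. Whether sampling was random and whether the population is extremely skewed still have to be judged from the study design and from plots of the sample.

```python
def clt_rule_of_thumb_checks(n, population_size=None, without_replacement=True,
                             population_is_normal=False):
    """HYPOTHETICAL helper: evaluate the numeric rules of thumb in the CLT
    conditions. Random sampling and 'not extremely skewed' are NOT checked
    here; inspect the design and the sample (e.g. a histogram) for those."""
    checks = {}
    # Independence: when sampling without replacement, n should be < 10% of the population.
    if without_replacement and population_size is not None:
        checks["n < 10% of population"] = n * 10 < population_size
    # Sample size: population normal, OR n > 30 (skew still needs a visual check).
    checks["population normal or n > 30"] = population_is_normal or n > 30
    return checks

print(clt_rule_of_thumb_checks(n=50, population_size=6000))
print(clt_rule_of_thumb_checks(n=50, population_size=300))
print(clt_rule_of_thumb_checks(n=12, population_size=6000))
```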
CLT - sample size/skew condition - simulations (1)
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
CLT - sample size/skew condition - simulations (2)
CLT - sample size/skew condition - simulations (3)
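The simulations above use an online applet. As an offline stand-in, the Python sketch below draws sample means from three illustrative populations (chosen for contrast, not taken from the lecture): a symmetric uniform, a right-skewed exponential, and an extremely right-skewed lognormal. It reports the skewness of the simulated sampling distribution for n = 2, 10, and 30; values near 0 indicate a nearly symmetric, normal-looking distribution. For the extremely skewed population, n = 30 still leaves noticeably skewed sample means, which is why the condition excludes extreme skew.

```python
import random
import statistics

def skewness(xs):
    """Moment coefficient of skewness, mean((x - m)^3) / sd^3 (population form)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / sd ** 3

def skewness_of_means(draw, n, reps, rng):
    """Simulate reps sample means of size n and return their skewness."""
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return skewness(means)

# Illustrative populations (assumptions chosen for contrast).
populations = {
    "uniform(0, 1)": lambda r: r.uniform(0, 1),               # symmetric
    "exponential(1)": lambda r: r.expovariate(1.0),           # right-skewed
    "lognormal(0, 1.5)": lambda r: r.lognormvariate(0, 1.5),  # extremely skewed
}

rng = random.Random(2013)
results = {}
for name, draw in populations.items():
    results[name] = {n: skewness_of_means(draw, n, 3000, rng) for n in (2, 10, 30)}
    print(f"{name:18s}", {n: round(v, 2) for n, v in results[name].items()})
```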
CLT - sample size/skew condition
Question: Which of the below visualizations is not appropriate for checking if the sample, and hence the population, has an extremely skewed distribution?
(a) histogram
(b) boxplot
(c) normal probability plot
(d) barplot
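A boxplot is drawn from the five-number summary, so skew can also be read from those numbers: a long upper whisker relative to the lower one, and a mean pulled above the median, point to right skew. A sketch using Python's `statistics.quantiles` (Python 3.8+) on a hypothetical sample of game counts; note that `quantiles` uses the "exclusive" method by default, so its quartiles may differ slightly from those of other software.

```python
import statistics

def five_number_summary(xs):
    """Min, Q1, median, Q3, max: the numbers a boxplot displays."""
    q1, med, q3 = statistics.quantiles(xs, n=4)
    return min(xs), q1, med, q3, max(xs)

# HYPOTHETICAL sample of game counts (illustrative only).
sample = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20]
lo, q1, med, q3, hi = five_number_summary(sample)
print("five-number summary:", lo, q1, med, q3, hi)
print("mean:", round(statistics.mean(sample), 2))

# Right skew: upper whisker (hi - q3) much longer than lower (q1 - lo),
# and the mean sits above the median.
print("upper whisker:", hi - q3, " lower whisker:", q1 - lo)
```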