SLIDE 1

Unit 3: Foundations for inference Lecture 1: Variability in estimates and CLT

Statistics 101

Thomas Leininger

May 28 2013

SLIDE 2

Announcements

1. Announcements
2. Variability in estimates
   - Example
   - Sampling distributions - via simulation
   - Sampling distributions - via CLT

Statistics 101 U3 - L1: Variability in estimates and CLT Thomas Leininger

SLIDE 3

Announcements

Announcements

- Labs 2 & 3 due today
- PS 3 due tomorrow
- Projects

SLIDE 6

Variability in estimates Example

http://pewresearch.org/pubs/2191/young-adults-workers-labor-market-pay-careers-advancement-recession

SLIDE 7

Variability in estimates Example

Margin of error

41% ± 2.9%: We are 95% confident that 38.1% to 43.9% of the public believe young adults, rather than middle-aged or older adults, are having the toughest time in today’s economy.

49% ± 4.4%: We are 95% confident that 44.6% to 53.4% of 18-34 year olds have taken a job they didn’t want just to pay the bills.
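The two intervals above are simply the point estimate minus and plus the margin of error. A minimal sketch of that arithmetic follows; the helper name `interval_from_margin` is made up for illustration and is not part of the lecture.

```python
def interval_from_margin(estimate, margin):
    """Return (lower, upper) for estimate ± margin, both in percent."""
    # Rounding to one decimal keeps the output in the same form as the slide.
    return (round(estimate - margin, 1), round(estimate + margin, 1))


# Point estimates and margins of error quoted on the slide.
public = interval_from_margin(41.0, 2.9)
young_adults = interval_from_margin(49.0, 4.4)

print(f"Public: {public[0]}% to {public[1]}%")
print(f"18-34 year olds: {young_adults[0]}% to {young_adults[1]}%")
```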

SLIDE 8

Variability in estimates Example

Parameter estimation

- We are often interested in population parameters.
- Since complete populations are difficult (or impossible) to collect data on, we use sample statistics as point estimates for the unknown population parameters of interest.
- Sample statistics vary from sample to sample.
- Quantifying how sample statistics vary provides a way to estimate the margin of error associated with our point estimate.
- But before we get to quantifying the variability among samples, let’s try to understand how and why point estimates vary from sample to sample.

Suppose we randomly sample 1,000 adults from each state in the US. Would you expect the sample means of their heights to be the same, somewhat different, or very different?
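One way to build intuition for the height question is a small simulation. The sketch below assumes, purely for illustration, that every state shares the same normal population of heights (mean 170 cm, SD 10 cm); these numbers are not from the lecture. Under that assumption, any difference between the 50 sample means comes from sampling alone.

```python
import random
import statistics

rng = random.Random(101)  # fixed seed so the run is repeatable

# Assumed population parameters (illustrative only, not from the lecture).
POP_MEAN_CM = 170.0
POP_SD_CM = 10.0
SAMPLE_SIZE = 1_000
N_STATES = 50

# One sample mean per state, each based on 1,000 simulated adults.
sample_means = [
    statistics.fmean(rng.gauss(POP_MEAN_CM, POP_SD_CM) for _ in range(SAMPLE_SIZE))
    for _ in range(N_STATES)
]

print(f"smallest sample mean: {min(sample_means):.2f} cm")
print(f"largest sample mean:  {max(sample_means):.2f} cm")
```

With samples this large the means should come out close to each other, but not identical.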

SLIDE 10

Variability in estimates Sampling distributions - via simulation

Average number of Duke games attended

Next let’s look at the population data for the number of Duke basketball games attended:

[Histogram of the population: number of Duke games attended (x-axis) vs. frequency (y-axis)]

SLIDE 13

Variability in estimates Sampling distributions - via simulation

Average number of Duke games attended (cont.)

Sampling distribution, n = 10:

[Histogram: sample means from samples of n = 10 (x-axis) vs. frequency (y-axis)]

What does each observation in this distribution represent?
Sample mean, $\bar{x}$, of samples of size n = 10.

Is the variability of the sampling distribution smaller or larger than the variability of the population distribution? Why?
Smaller; sample means will vary less than individual observations.
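The claim that sample means vary less than individual observations can be checked by simulation. The lecture's population data are not available here, so this sketch substitutes a hypothetical right-skewed population (exponential draws with a mean of about 6); the population, the seed, and the number of repetitions are all assumptions.

```python
import math
import random
import statistics

rng = random.Random(2013)

# Hypothetical right-skewed population (NOT the lecture's data).
population = [rng.expovariate(1 / 6) for _ in range(10_000)]

# Each entry is the mean of one random sample of size n = 10,
# drawn without replacement from the population.
n = 10
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(5_000)
]

pop_sd = statistics.pstdev(population)
sampling_sd = statistics.stdev(sample_means)

print(f"population SD:           {pop_sd:.2f}")
print(f"SD of the sample means:  {sampling_sd:.2f}")
print(f"population SD / sqrt(n): {pop_sd / math.sqrt(n):.2f}")
```

The last two printed values should be close to each other, and both should be well below the population SD.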

SLIDE 15

Variability in estimates Sampling distributions - via simulation

Average number of Duke games attended (cont.)

Sampling distribution, n = 30:

[Histogram: sample means from samples of n = 30 (x-axis) vs. frequency (y-axis)]

How did the shape, center, and spread of the sampling distribution change going from n = 10 to n = 30?
Shape is more symmetric, center is about the same, spread is smaller.
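The same comparison of shape, center, and spread can be sketched for several sample sizes at once. As before, the population is a hypothetical right-skewed stand-in rather than the lecture's data, and the function names `skewness` and `sampling_distribution` are illustrative.

```python
import random
import statistics

rng = random.Random(28)

# Hypothetical right-skewed population (assumed for illustration only).
population = [rng.expovariate(1 / 6) for _ in range(10_000)]


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])


def sampling_distribution(n, reps=3_000):
    """Means of `reps` random samples of size n drawn from the population."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


summary = {}
for n in (10, 30, 70):
    means = sampling_distribution(n)
    summary[n] = {
        "center": statistics.fmean(means),
        "spread": statistics.stdev(means),
        "skewness": skewness(means),
    }
    print(
        f"n = {n:2d}: center = {summary[n]['center']:.2f}, "
        f"spread = {summary[n]['spread']:.2f}, "
        f"skewness = {summary[n]['skewness']:.2f}"
    )
```

Under these assumptions the center should stay near the population mean while the spread shrinks as n grows; the skewness column gives a rough numerical handle on "more symmetric".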

SLIDE 16

Variability in estimates Sampling distributions - via simulation

Average number of Duke games attended (cont.)

Sampling distribution, n = 70:

[Histogram: sample means from samples of n = 70 (x-axis) vs. frequency (y-axis)]

SLIDE 18

Variability in estimates Sampling distributions - via simulation

Average number of Duke games attended (cont.)

Question

The mean of the sampling distribution is 5.75, and the standard deviation of the sampling distribution (also called the standard error) is 0.75. Which of the following is the most reasonable guess for the 95% confidence interval for the true average number of Duke games attended by students?

(a) 5.75 ± 0.75
(b) 5.75 ± 2 × 0.75 → (4.25, 7.25)
(c) 5.75 ± 3 × 0.75
(d) cannot tell from the information given
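The arithmetic behind answer (b) is center ± 2 standard errors. A minimal sketch, where the function name `approx_95_ci` and the default multiplier of 2 are illustrative choices that follow the slide's rule of thumb:

```python
def approx_95_ci(center, standard_error, multiplier=2.0):
    """Rough 95% interval: center ± multiplier × standard error."""
    margin = multiplier * standard_error
    return (center - margin, center + margin)


# Values given in the question: mean 5.75, standard error 0.75.
ci = approx_95_ci(5.75, 0.75)
print(ci)  # (4.25, 7.25)
```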

SLIDE 20

Variability in estimates Sampling distributions - via CLT

Central limit theorem

The distribution of the sample mean is well approximated by a normal model:

$$\bar{x} \sim N\left(\text{mean} = \mu,\ SE = \frac{\sigma}{\sqrt{n}}\right)$$

If σ is unknown, use s.

So it wasn’t a coincidence that the sampling distributions we saw earlier were symmetric.

We won’t go into proving why $SE = \frac{\sigma}{\sqrt{n}}$, but note that as n increases SE decreases.

As the sample size increases we would expect samples to yield more consistent sample means, hence the variability among the sample means would be lower.
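To see how the standard error shrinks as n grows, the formula SE = σ / √n can be evaluated directly. The value σ = 6 below is an assumed population standard deviation chosen only for illustration; the sample sizes match those used in the simulation slides.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


SIGMA = 6.0  # assumed population SD, not a value from the lecture

for n in (10, 30, 70):
    print(f"n = {n:2d}: SE = {standard_error(SIGMA, n):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.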

SLIDE 21

Variability in estimates Sampling distributions - via CLT

CLT - conditions

Certain conditions must be met for the CLT to apply:

1. Independence: Sampled observations must be independent. This is difficult to verify, but is more likely if
   - random sampling/assignment is used, and,
   - if sampling without replacement, n < 10% of the population.
2. Sample size/skew/outliers: Either 1) the population distribution is normal OR 2) n > 30 and the population distribution is not extremely skewed. This is also difficult to verify for the population, but we can check it using the sample data, and assume that the sample mirrors the population.
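The two conditions can be written down as a simple checklist. This is only a sketch: the function `clt_conditions` and its parameters are hypothetical, independence can never be confirmed by a formula, and whether a distribution is "extremely skewed" is a judgment the analyst supplies after looking at the sample data.

```python
def clt_conditions(n, population_size, random_sample,
                   with_replacement=False,
                   population_normal=False,
                   extremely_skewed=False):
    """Checklist for the CLT conditions; returns one boolean per condition."""
    # Condition 1: random sampling/assignment, plus the 10% rule when
    # sampling without replacement. This makes independence more
    # plausible; it does not prove it.
    ten_percent_ok = with_replacement or n < 0.10 * population_size
    independence = random_sample and ten_percent_ok

    # Condition 2: normal population, OR n > 30 without extreme skew.
    size_skew = population_normal or (n > 30 and not extremely_skewed)

    return {"independence": independence, "sample_size_skew": size_skew}


# Example with made-up numbers: 70 students sampled from 6,000.
print(clt_conditions(n=70, population_size=6_000, random_sample=True))
```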

SLIDE 22

Variability in estimates Sampling distributions - via CLT

CLT - sample size/skew condition - simulations (1)

http://onlinestatbook.com/stat_sim/sampling_dist/index.html

SLIDE 23

Variability in estimates Sampling distributions - via CLT

CLT - sample size/skew condition - simulations (2)

SLIDE 24

Variability in estimates Sampling distributions - via CLT

CLT - sample size/skew condition - simulations (3)

SLIDE 25

Variability in estimates Sampling distributions - via CLT

CLT - sample size/skew condition

Question

Which of the below visualizations is not appropriate for checking if the sample, and hence the population, has an extremely skewed distribution?

(a) histogram
(b) boxplot
(c) normal probability plot
(d) barplot
