STAT 113 Bootstrap Confidence Intervals, Colin Reimer Dawson

SLIDE 1

Confidence Intervals Bootstrap Resampling Bootstrap Confidence Intervals Bootstrap Percentile Intervals

STAT 113 Bootstrap Confidence Intervals

Colin Reimer Dawson

Oberlin College

3 March 2017

SLIDE 2

Using Samples to Make Estimates About Populations

Statistic : Sample :: Parameter : Population

We want to use our sample statistic to estimate the corresponding population parameter.

SLIDE 3

Standard Error

Standard Error Definition

The distribution of a quantitative variable has a standard deviation. The sampling distribution of a quantitative sample statistic (like a mean) has a standard deviation too. This has a special name: the standard error (e.g., “of the mean”).

SLIDE 4

Confidence Intervals

  • A point estimate of some population parameter (like a mean), together with some measure of our confidence/uncertainty (e.g., the margin of error, MoE), defines a confidence interval.
  • Can be written in the form “statistic ± MoE”.
  • “With 95% confidence, the mean flavor-life of our gumballs is between 65.3 and 67.1 minutes.”
  • “With 95% confidence, between 39 (42 − 3) and 45 (42 + 3) percent of U.S. adults approve of Donald Trump’s job performance as president.”
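
As a quick illustration of the “statistic ± MoE” form, here is a minimal R sketch. The helper name `ci_from_moe()` is hypothetical and not part of the course materials.

```r
# Hypothetical helper (not from the slides): turn a point estimate and a
# margin of error into confidence interval endpoints, "statistic ± MoE"
ci_from_moe <- function(statistic, moe) {
  c(lower = statistic - moe, upper = statistic + moe)
}

# Poll example from the slide: 42% approval, 3-point margin of error
ci_from_moe(42, 3)   # lower = 39, upper = 45
```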

SLIDE 5

How to Determine the Margin of Error?

The population mean µ is within 2 Standard Errors of most (about 95%) sample means (from simple random samples).

Margin of Error

A 95% margin of error of 3 points means that 95% of surveys with the same procedure and sample size will yield sample statistics that are within 3 points of the corresponding population parameter. If the sampling distribution is approximately Normal, then a 95% Margin of Error is about 2 Standard Errors.
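
The claim that a 95% margin of error is about 2 standard errors can be checked by simulation. This base R sketch assumes a hypothetical Normal population (mean 66, SD 8) and samples of size 50; the seed and the 10,000 repetitions are arbitrary choices. It compares the distance from µ that covers 95% of the sample means with 2 SE.

```r
set.seed(113)
# Hypothetical Normal population: mean 66, SD 8; samples of size 50
mu <- 66; sigma <- 8; n <- 50

# Simulated sampling distribution of the sample mean
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

se  <- sd(sample_means)   # standard error of the mean
# 95% MoE: distance from mu that covers 95% of the sample means
moe <- unname(quantile(abs(sample_means - mu), 0.95))

c(se = se, two_se = 2 * se, moe_95 = moe)   # moe_95 is close to two_se
mean(abs(sample_means - mu) <= 2 * se)      # close to 0.95
```

Because the sampling distribution here is Normal, the 95% MoE comes out near 1.96 SE, which the slide rounds to 2 SE.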

SLIDE 6

Interpretations of CIs

  • “95% CIs contain 95% of the cases in the population.” False. They represent uncertainty about a population parameter, not about individual points.
  • “There is a 95% chance that the sample mean falls in the 95% CI.” False. Any given CI is centered on the sample mean for that sample, so the sample mean is inside 100% of the time.
  • “95% of samples produce confidence intervals that contain the population parameter.” True: this is the definition of a 95% confidence interval.
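
The three interpretations can be checked by simulation. This base R sketch assumes a hypothetical Normal population of 100,000 values and samples of size 50; the SE is estimated as the SD of many simulated sample means, and each CI is “sample mean ± 2 SE”.

```r
set.seed(113)
# Hypothetical population: 100,000 Normal values (mean 68, SD 10)
population <- rnorm(100000, mean = 68, sd = 10)
mu <- mean(population)   # the population parameter
n  <- 50

# Standard error: SD of many simulated sample means
se <- sd(replicate(5000, mean(sample(population, n))))

# One sample -> one CI of the form "sample mean ± 2 SE"
one_ci <- function() {
  xbar <- mean(sample(population, n))
  c(xbar = xbar, lower = xbar - 2 * se, upper = xbar + 2 * se)
}
cis <- t(replicate(5000, one_ci()))   # one row per sample

# "95% CIs contain 95% of the cases in the population": False
ci <- cis[1, ]
mean(population >= ci["lower"] & population <= ci["upper"])   # far below 0.95

# "95% chance the sample mean falls in the CI": False, it always does
all(cis[, "xbar"] >= cis[, "lower"] & cis[, "xbar"] <= cis[, "upper"])

# "95% of samples give CIs containing the parameter": True (approximately)
mean(cis[, "lower"] <= mu & mu <= cis[, "upper"])
```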

SLIDE 7

Correct or Incorrect?

A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that all students will have pulse rates between 65 and 71.” is

  • A. Correct
  • B. Incorrect

SLIDE 8

Correct or Incorrect?

A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that the mean pulse rate for this sample of students will fall between 65 and 71” is

  • A. Correct
  • B. Incorrect

SLIDE 9

Correct or Incorrect?

A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that the mean pulse rate for the population of all students will fall between 65 and 71” is

  • A. Correct
  • B. Incorrect

SLIDE 10

Correct or Incorrect?

A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “98% of the pulse rates for students at this college will fall between 65 and 71” is

  • A. Correct
  • B. Incorrect

SLIDE 12

Summary

To create a 95% confidence interval for a parameter:

  1. Take many random samples from the population, and compute the sample statistic for each sample.
  2. Compute the standard error as the standard deviation of all these statistics.
  3. For your actual sample, use statistic ± 2SE.
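
The three summary steps translate almost line for line into base R. This is a sketch under assumptions: the population is a hypothetical vector of 10,000 skewed (exponential) values, the statistic is the mean, the sample size is 40, and 5000 repetitions is an arbitrary choice.

```r
set.seed(113)
# Hypothetical finite population of 10,000 skewed values
population <- rexp(10000, rate = 1 / 50)
n <- 40

# Step 1: many random samples from the population, one statistic each
stats <- replicate(5000, mean(sample(population, n)))

# Step 2: standard error = SD of those statistics
se <- sd(stats)

# Step 3: for the sample we actually observed, statistic ± 2 SE
my_sample <- sample(population, n)
ci <- mean(my_sample) + c(-2, 2) * se
ci
```

Step 1 is only possible here because the whole population is available in the simulation, which is exactly the problem the next slide raises.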

SLIDE 13

Ok, but...

In reality we only have one sample. How do we know what the standard error is?

  • Standard error depends on population characteristics, particularly variability.
  • We can use the sample to estimate not only the parameter of interest (e.g., mean, proportion), but also the variability.
  • Two approaches: (1) Simulation, (2) Probability theory

SLIDE 14

Estimating the Margin of Error from One Sample

  • Since we only have one sample, we have to estimate the Margin of Error using only the information it contains.
  • Idea: Let the whole sample (not just the statistic of interest) serve as an estimate for the whole population.

SLIDE 15

Note: We do not literally make copies of the data, or increase our sample size, by bootstrapping!

SLIDE 16

Sampling from the Pseudo-Population

  • Sampling from the estimated population is equivalent to sampling from the sample, but never “using up” the cases.

  • In other words, we sample with replacement from the sample.
  • The resulting sample is called a bootstrap sample.
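
In base R, sampling with replacement is `sample(x, size = length(x), replace = TRUE)`. A minimal sketch with a hypothetical sample of 10 values:

```r
set.seed(113)
# Hypothetical original sample of 10 values
original <- c(12, 15, 9, 20, 14, 11, 18, 16, 13, 10)

# A bootstrap sample: same size, drawn *with replacement* from the sample
boot_sample <- sample(original, size = length(original), replace = TRUE)
boot_sample

# Some cases typically appear more than once and others not at all
table(boot_sample)
```

Nothing is copied or added to the data: every value in the bootstrap sample is one of the original cases, and the size stays the same.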

SLIDE 18

Bootstrap Statistic and Bootstrap Distribution

  • We compute the relevant statistic (e.g., mean) on the bootstrap sample. This is a bootstrap statistic.
  • Over many bootstrap samples, each contributing a bootstrap statistic, we get a bootstrap distribution.
  • Each bootstrap statistic differs from the “pseudo-population parameter” (which is really the actual sample statistic).
  • We hope these differences are similar in size to the differences between actual sample statistics and the population parameter.

Bootstrap statistic : Actual sample statistic :: Actual sample statistic : Actual Population Parameter
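
A minimal base R sketch of a bootstrap distribution for the mean, assuming a hypothetical observed sample of 50 values. The slides' own code uses mosaic's `do()` and `resample()`; `replicate()` and `sample(..., replace = TRUE)` are swapped in here so the block needs no packages.

```r
set.seed(113)
original <- rexp(50, rate = 1 / 50)   # hypothetical observed sample, n = 50

# One bootstrap statistic: the mean of one bootstrap sample
one_boot_mean <- function(x) mean(sample(x, length(x), replace = TRUE))

# Bootstrap distribution: many bootstrap statistics
boot_dist <- replicate(5000, one_boot_mean(original))

# Center is near the original sample mean (the "pseudo-population parameter")
c(sample_mean = mean(original), boot_center = mean(boot_dist))

# These differences stand in for (sample statistic - population parameter)
summary(boot_dist - mean(original))
```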

SLIDE 20

Examples: StatKey http://lock5stat.com/statkey

SLIDE 21

Population vs. Sample vs. Sampling Dist. vs. Bootstrap Dist.

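library(mosaic)   # read.file(), do(), resample(), formula-based mean()
Population <- read.file("http://colindawson.net/data/ames.csv")   # treat the full data set as the population
Sample <- sample(Population, size = 50)   # the one sample we would actually have
# Sampling distribution: means of 5000 fresh samples from the population
SamplingDist <- do(5000) * sample(Population, size = 50) %>% mean(~Price, data = .)
# Bootstrap distribution: means of 5000 resamples (with replacement) of the one sample
BootstrapDist <- do(5000) * resample(Sample) %>% mean(~Price, data = .)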

SLIDE 22

Population vs. Sample vs. Sampling Dist. vs. Bootstrap Dist.

[Figure: histograms of Price for the population cases (Pop. Cases) and the sample cases (Samp. Cases), and of Mean Price for the samples (sampling distribution) and the bootstrap samples (Boot. Samples).]

  • What is the center of the sampling distribution?
  • What is the center of the bootstrap distribution?
  • How does the spread compare?

SLIDE 23

Estimating the Margin of Error

[Figure: Mean Price for the samples (sampling distribution) and for the bootstrap samples (bootstrap distribution), each with its middle 95% marked.]

  • The spread of the bootstrap distribution approximates the spread of the true sampling distribution.
  • We can use the bootstrap distribution to get a Margin of Error for our Confidence Interval.
  • Where should the center of the CI be?
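
One way to put this into practice, sketched in base R with a hypothetical sample and 5000 resamples: estimate the SE as the SD of the bootstrap distribution, use 2 SE as an approximate 95% MoE, and center the interval on the original sample statistic.

```r
set.seed(113)
original  <- rexp(50, rate = 1 / 50)   # hypothetical observed sample
boot_dist <- replicate(5000, mean(sample(original, length(original), replace = TRUE)))

se_boot <- sd(boot_dist)               # bootstrap estimate of the standard error
moe     <- 2 * se_boot                 # approximate 95% margin of error
ci_95   <- mean(original) + c(-1, 1) * moe   # centered at the original sample statistic
ci_95
```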

SLIDE 24

Adjusting the Confidence Level

If the sampling distribution is approximately Normal, then a 95% Margin of Error is about 2 Standard Errors. If the bootstrap distribution is approximately Normal, 95% of the bootstrap statistics are within 2 SE of the bootstrap center (i.e., the original sample statistic). That is, 95% of bootstrap statistics are within the 95% CI. If the bootstrap distribution is symmetric, then capturing the middle X% of the bootstrap statistics yields an X% confidence interval!
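
A sketch checking these claims on a hypothetical, roughly symmetric sample (simulated Normal data, n = 50, 10,000 resamples): about 95% of bootstrap means fall within 2 SE of the sample mean, and the middle 95% of the bootstrap distribution nearly matches “statistic ± 2 SE”.

```r
set.seed(113)
original  <- rnorm(50, mean = 66, sd = 8)   # hypothetical, roughly symmetric data
boot_dist <- replicate(10000, mean(sample(original, length(original), replace = TRUE)))
se_boot   <- sd(boot_dist)

# Share of bootstrap statistics within 2 SE of the original sample statistic
mean(abs(boot_dist - mean(original)) <= 2 * se_boot)   # roughly 0.95

# Middle 95% of the bootstrap distribution vs. statistic ± 2 SE
q <- quantile(boot_dist, c(0.025, 0.975))
rbind(percentile = q,
      two_se     = mean(original) + c(-2, 2) * se_boot)
```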

SLIDE 25

Estimating the Margin of Error

[Figure: Mean Price for the samples (sampling distribution) and for the bootstrap samples (bootstrap distribution), each with its middle 99% marked.]

  • If we want a 99% CI, we need a MoE such that 99% of sample stats are within that MoE of the population parameter.
  • Since the bootstrap dist. has similar spread to the true sampling dist., we can estimate such an MoE there.
  • Then build a CI around the sample stat. (aka the center of the bootstrap dist.) with that MoE.
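
The recipe above can be sketched directly: find the distance from the sample statistic that covers 99% of the bootstrap statistics, and use it as the MoE. This symmetric-MoE version is an illustrative assumption; the course's own R code uses the 0.5 and 99.5 percentiles instead. The sample is hypothetical.

```r
set.seed(113)
original  <- rexp(50, rate = 1 / 50)   # hypothetical observed sample, n = 50
stat      <- mean(original)            # sample stat. = center of the bootstrap dist.
boot_dist <- replicate(10000, mean(sample(original, length(original), replace = TRUE)))

# MoE: distance from the sample stat. that covers 99% of the bootstrap statistics
moe_99 <- unname(quantile(abs(boot_dist - stat), 0.99))

# 99% CI: sample stat. ± MoE
ci_99 <- stat + c(-1, 1) * moe_99
ci_99
```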

SLIDE 26

CI with Arbitrary Confidence Level

### 99% CI goes from the 0.5th percentile to the 99.5th percentile of the bootstrap dist.
CI <- quantile(~result, data = BootstrapDist, probs = c(0.005, 0.995))
CI
##     0.5%    99.5%
## 142708.9 190408.7
histogram(~result, data = BootstrapDist, fit = "normal", nint = 100, v = CI)

[Figure: histogram of the bootstrap distribution (x-axis: result; y-axis: Density) with a fitted Normal curve and vertical lines at the CI endpoints.]

SLIDE 27

Example: Atlanta Commutes http://lock5stat.com/StatKey

SLIDE 28

Summary: Bootstrap CIs

To generate a bootstrap distribution, we

  1. Generate bootstrap samples by sampling with replacement from the original sample, using the same sample size.
  2. Compute the statistic of interest, a bootstrap statistic, for each of the bootstrap samples.
  3. Collect the statistics for many bootstrap samples to form a bootstrap distribution.

If the bootstrap distribution is symmetric, an X% CI can be estimated by taking the range of the middle X% of the bootstrap statistics.
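
The three steps plus the percentile rule can be wrapped in one function. `boot_percentile_ci()` is a hypothetical helper written in base R (not part of mosaic or StatKey), and the sample below is simulated. As the summary says, the percentile interval is appropriate when the bootstrap distribution is roughly symmetric.

```r
# Hypothetical helper (not from the slides): bootstrap percentile CI
boot_percentile_ci <- function(x, statistic = mean, level = 0.95, B = 5000) {
  n <- length(x)
  # Steps 1-3: resample with replacement (same n), compute the statistic,
  # and collect B bootstrap statistics into a bootstrap distribution
  boot_stats <- replicate(B, statistic(sample(x, n, replace = TRUE)))
  # Middle `level` share of the bootstrap distribution
  alpha <- 1 - level
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2))
}

set.seed(113)
x <- rexp(60, rate = 1 / 25)   # hypothetical skewed sample, n = 60
ci95 <- boot_percentile_ci(x, mean, level = 0.95)
ci99 <- boot_percentile_ci(x, mean, level = 0.99)
ci95
ci99
boot_percentile_ci(x, median, level = 0.90)   # other statistics work too
```

Raising the confidence level only changes which percentiles are taken, so the 99% interval comes out wider than the 95% one.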