STAT 113: Bootstrap Confidence Intervals
Colin Reimer Dawson, Oberlin College
3 March 2017
Outline: Confidence Intervals, Bootstrap Resampling, Bootstrap Confidence Intervals, Bootstrap Percentile Intervals
Using Samples to Make Estimates About Populations
Statistic : Sample :: Parameter : Population
We want to use our sample statistic to estimate the corresponding population parameter
Standard Error
Standard Error Definition
The distribution of a quantitative variable has a standard deviation. The sampling distribution of a quantitative sample statistic (like a mean) has a standard deviation too. This has a special name: the standard error (e.g., “of the mean”).
Confidence Intervals
- A point estimate of some population parameter (like a mean), together with some measure of our confidence/uncertainty (e.g., a margin of error, MoE), defines a confidence interval.
- Can be written in the form “statistic ± MoE”.
- “With 95% confidence, the mean flavor-life of our gumballs is between 65.3 and 67.1 minutes.”
- “With 95% confidence, between 39 (42 − 3) and 45 (42 + 3) percent of U.S. adults approve of Donald Trump’s job performance as president.”
How to Determine the Margin of Error?
The population mean µ is within 2 Standard Errors of most (about 95%) sample means (from simple random samples).
Margin of Error
A 95% margin of error of 3 points means that 95% of surveys with the same procedure and sample size will yield sample statistics which are within 3 points of the corresponding population parameter.
If the sampling distribution is approximately Normal, then a 95% Margin of Error is about 2 Standard Errors.
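The margin-of-error statement can be checked by simulation. The Python sketch below repeats a hypothetical survey many times and counts how often the result lands within 3 points of the truth. The true approval share (42%) and the sample size (1000 respondents) are assumptions chosen for illustration, not figures from the slides.

```python
import random

def simulate_polls(true_share, sample_size, n_polls, seed=0):
    """Return the approval share found by each of n_polls simulated surveys."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_polls):
        # Each respondent approves with probability true_share.
        approvals = sum(rng.random() < true_share for _ in range(sample_size))
        shares.append(approvals / sample_size)
    return shares

TRUE_SHARE = 0.42    # assumed population parameter (unknown in a real survey)
SAMPLE_SIZE = 1000   # assumed number of respondents per survey
MARGIN = 0.03        # a 3-point margin of error

shares = simulate_polls(TRUE_SHARE, SAMPLE_SIZE, n_polls=2000)

# Fraction of surveys whose statistic is within the margin of the parameter.
# The tiny tolerance guards against floating-point rounding at the boundary.
within = sum(abs(s - TRUE_SHARE) <= MARGIN + 1e-9 for s in shares) / len(shares)
print(f"Share of surveys within 3 points of the truth: {within:.3f}")
```

Under these assumptions the printed fraction should come out close to 0.95, which is what "a 95% margin of error of 3 points" claims about the survey procedure.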
Interpretations of CIs
- 95% CIs contain 95% of the cases in the population. False. They represent uncertainty about a population parameter, not about individual points.
- There is a 95% chance that the sample mean falls in the 95% CI. False. Any given CI is centered around the sample mean for that sample, so the sample mean is inside 100% of the time.
- 95% of samples produce confidence intervals that contain the population parameter. True: This is the definition of a confidence interval.
Correct or Incorrect?
A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that all students will have pulse rates between 65 and 71” is
- A. Correct
- B. Incorrect
Correct or Incorrect?
A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that the mean pulse rate for this sample of students will fall between 65 and 71” is
- A. Correct
- B. Incorrect
Correct or Incorrect?
A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that the mean pulse rate for the population of all students will fall between 65 and 71” is
- A. Correct
- B. Incorrect
Correct or Incorrect?
A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “98% of the pulse rates for students at this college will fall between 65 and 71” is
- A. Correct
- B. Incorrect
Summary
To create a 95% confidence interval for a parameter:
1. Take many random samples from the population, and compute the sample statistic for each sample.
2. Compute the standard error as the standard deviation of all these statistics.
3. For your actual sample, use statistic ± 2SE.
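The three steps above can be sketched in Python as follows. The population here is synthetic (20,000 right-skewed values generated on the spot) and the sample size of 40 is arbitrary; both are assumptions for illustration only. In practice step 1 is not possible, because we cannot draw many samples from the real population.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 20,000 right-skewed values (a stand-in, not real data).
population = [rng.expovariate(1 / 50) for _ in range(20_000)]
mu = statistics.mean(population)
n = 40

# Step 1: take many random samples and compute the sample mean of each.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(3000)]

# Step 2: the standard error is the standard deviation of those sample means.
se = statistics.stdev(sample_means)

# Step 3: for one "actual" sample, report statistic +/- 2 SE.
actual_mean = statistics.mean(rng.sample(population, n))
ci = (actual_mean - 2 * se, actual_mean + 2 * se)
print(f"SE = {se:.2f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f}), mu = {mu:.1f}")

# How often would an interval built this way contain mu?
# A sample's interval contains mu exactly when its mean is within 2 SE of mu.
coverage = sum(abs(m - mu) <= 2 * se for m in sample_means) / len(sample_means)
print(f"Fraction of samples whose interval contains mu: {coverage:.3f}")
```

The coverage line ties the recipe back to the definition of a confidence interval: roughly 95% of samples should yield an interval that contains the population mean.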
Ok, but...
In reality we only have one sample. How do we know what the standard error is?
- Standard error depends on population characteristics, particularly variability.
- We can use the sample to estimate not only the parameter of interest (e.g., mean, proportion), but also the variability.
- Two approaches: (1) Simulation, (2) Probability theory
Estimating the Margin of Error from One Sample
- Since we only have one sample, we have to estimate the Margin of Error using only the information it contains.
- Idea: Let the whole sample (not just the statistic of interest) serve as an estimate for the whole population.
Note: We do not literally make copies of the data, or increase our sample size, by bootstrapping!
Sampling from the Pseudo-Population
- Sampling from the estimated population is equivalent to sampling from the sample, but never “using up” the cases.
- In other words, we sample with replacement from the sample.
- The resulting sample is called a bootstrap sample.
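To make "sampling with replacement" concrete, here is a small Python sketch that draws a single bootstrap sample. The eight data values are made up for illustration.

```python
import random
from collections import Counter

rng = random.Random(2)

# Hypothetical sample of 8 commute times in minutes (made-up values).
sample = [12, 18, 21, 25, 30, 34, 41, 60]

# Sampling with replacement: every draw picks from the full sample again,
# so cases are never "used up" and the same case can appear more than once.
bootstrap_sample = rng.choices(sample, k=len(sample))

counts = Counter(bootstrap_sample)
repeated = sorted(x for x, c in counts.items() if c > 1)
left_out = [x for x in sample if x not in counts]

print("bootstrap sample:          ", sorted(bootstrap_sample))
print("cases drawn more than once:", repeated)
print("cases not drawn at all:    ", left_out)
```

The bootstrap sample has the same size as the original and contains only values from it, but typically some cases repeat and others are left out. That variation from one bootstrap sample to the next is what the method relies on.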
Bootstrap Statistic and Bootstrap Distribution
- We compute the relevant statistic (e.g., mean) on the bootstrap sample. This is a bootstrap statistic.
- Over many bootstrap samples, each contributing a bootstrap statistic, we get a bootstrap distribution.
- Each bootstrap statistic differs from the “pseudopopulation parameter” (which is really the real sample statistic).
- We hope these differences are similar in size to the differences between true sample statistics and the population parameter.
Bootstrap statistic : Actual sample statistic :: Actual sample statistic : Actual Population Parameter
Examples: StatKey http://lock5stat.com/statkey
Population vs. Sample vs. Sampling Dist. vs. Bootstrap Dist.
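library(mosaic)  # assumed: read.file(), resample() and do() come from the mosaic package
# The full population data, then one random sample of 50 cases
Population <- read.file("http://colindawson.net/data/ames.csv")
Sample <- sample(Population, size = 50)
# Sampling distribution: mean Price in each of 5000 fresh samples from the population
SamplingDist <- do(5000) * sample(Population, size = 50) %>% mean(~Price, data = .)
# Bootstrap distribution: mean Price in each of 5000 resamples of the one sample
BootstrapDist <- do(5000) * resample(Sample) %>% mean(~Price, data = .)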
Population vs. Sample vs. Sampling Dist. vs. Bootstrap Dist.
[Figure: four histogram panels. Price for the population cases (Pop. Cases); Price for the sample cases (Samp. Cases); Mean Price across samples (Samples); Mean Price across bootstrap samples (Boot. Samples).]
- What is the center of the sampling distribution?
- What is the center of the bootstrap distribution?
- How does the spread compare?
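The three questions above can be explored numerically. The following Python sketch loosely mirrors the comparison, but with a synthetic, right-skewed "price" population in place of the Ames data. All values, sizes and names are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical stand-in for a population of sale prices (synthetic, right-skewed).
population = [rng.lognormvariate(12, 0.4) for _ in range(5000)]
sample = rng.sample(population, 50)

# Sampling distribution: means of fresh samples drawn from the population.
sampling_dist = [statistics.mean(rng.sample(population, 50)) for _ in range(2000)]

# Bootstrap distribution: means of resamples (with replacement) of the one sample.
bootstrap_dist = [statistics.mean(rng.choices(sample, k=50)) for _ in range(2000)]

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
sampling_center = statistics.mean(sampling_dist)
bootstrap_center = statistics.mean(bootstrap_dist)
sampling_spread = statistics.stdev(sampling_dist)
bootstrap_spread = statistics.stdev(bootstrap_dist)

print(f"population mean  {pop_mean:10.0f}   sampling center  {sampling_center:10.0f}")
print(f"sample mean      {sample_mean:10.0f}   bootstrap center {bootstrap_center:10.0f}")
print(f"sampling spread  {sampling_spread:10.0f}   bootstrap spread {bootstrap_spread:10.0f}")
```

In a run like this, the sampling distribution centers on the population mean and the bootstrap distribution centers on the sample mean. The two spreads are of similar size, which is why the bootstrap can stand in for the sampling distribution when estimating the standard error.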
Estimating the Margin of Error
[Figure: histograms of Mean Price across samples (Samples) and across bootstrap samples (Boot. Samples), each with the middle 95% marked.]
- The spread of the bootstrap distribution approximates the spread of the true sampling distribution.
- We can use the bootstrap distribution to get a Margin of Error for our Confidence Interval.
- Where should the center of the CI be?
Adjusting the Confidence Level
If the sampling distribution is approximately Normal, then a 95% Margin of Error is about 2 Standard Errors.
If the bootstrap distribution is approximately Normal, 95% of the bootstrap statistics are within 2 SE of the bootstrap center (i.e., the original sample stat.). That is, 95% of bootstrap statistics are within the 95% CI.
If the bootstrap distribution is symmetric, then capturing the middle X% of the bootstrap statistics yields an X% confidence interval!
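A quick way to see why the two constructions agree is to compute both from the same bootstrap distribution. The Python sketch below uses a synthetic sample of 50 values (an assumption for illustration) and compares "statistic ± 2 SE" with the middle 95% of the bootstrap means.

```python
import random
import statistics

rng = random.Random(4)

# Hypothetical sample (synthetic values standing in for 50 observations).
sample = [rng.gauss(170_000, 40_000) for _ in range(50)]
sample_mean = statistics.mean(sample)

# Bootstrap distribution of the mean, sorted so percentiles can be read off.
n_boot = 4000
boot_means = sorted(
    statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
)

# Approach 1: statistic +/- 2 SE, with SE estimated by the bootstrap spread.
se = statistics.stdev(boot_means)
se_interval = (sample_mean - 2 * se, sample_mean + 2 * se)

# Approach 2: the middle 95% of the bootstrap statistics,
# i.e. from the 2.5th to the 97.5th percentile.
lo = boot_means[round(0.025 * n_boot)]
hi = boot_means[round(0.975 * n_boot) - 1]
percentile_interval = (lo, hi)

inside = sum(lo <= m <= hi for m in boot_means) / n_boot
print(f"2-SE interval:       ({se_interval[0]:.0f}, {se_interval[1]:.0f})")
print(f"percentile interval: ({lo:.0f}, {hi:.0f})  holds {inside:.1%} of bootstrap means")
```

When the bootstrap distribution is roughly Normal the two intervals nearly coincide. The percentile version has the advantage that changing the confidence level only means changing the two percentiles.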
Estimating the Margin of Error
[Figure: histograms of Mean Price across samples (Samples) and across bootstrap samples (Boot. Samples), each with the middle 99% marked.]
- If we want a 99% CI, we need a MoE such that 99% of sample stats are within that MoE of the population parameter.
- Since the bootstrap dist. has similar spread to the true sampling dist., we can estimate such an MoE there.
- Then build a CI around the sample stat. (aka center of bootstrap dist.) with that MoE.
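One way to read the recipe above: find the distance from the sample statistic that contains the desired share of bootstrap statistics, and use that distance as the MoE. The Python sketch below does this for the mean. The helper name `bootstrap_moe` and the synthetic sample are hypothetical, introduced only for this illustration.

```python
import random
import statistics

def bootstrap_moe(sample, level, n_boot=5000, seed=5):
    """Estimate a margin of error for the sample mean at a confidence level.

    Returns (center, moe): the sample mean and the distance from it that
    contains the requested share of bootstrap means.
    """
    rng = random.Random(seed)
    center = statistics.mean(sample)
    # Absolute distance of each bootstrap mean from the sample mean, sorted.
    deviations = sorted(
        abs(statistics.mean(rng.choices(sample, k=len(sample))) - center)
        for _ in range(n_boot)
    )
    moe = deviations[round(level * n_boot) - 1]
    return center, moe

# Hypothetical sample of 50 values (synthetic, for illustration only).
data_rng = random.Random(6)
sample = [data_rng.gauss(170_000, 40_000) for _ in range(50)]

center, moe95 = bootstrap_moe(sample, 0.95)
_, moe99 = bootstrap_moe(sample, 0.99)
print(f"95% CI: {center:.0f} +/- {moe95:.0f}")
print(f"99% CI: {center:.0f} +/- {moe99:.0f}")
```

Asking for more confidence widens the interval: the 99% MoE is larger than the 95% MoE because it has to cover more of the bootstrap distribution.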
CI with Arbitrary Confidence Level
### 99% CI goes from 0.5 percentile to 99.5 percentile of bootstrap dist.
CI <- quantile(~result, data = BootstrapDist, probs = c(0.005, 0.995))
CI
    0.5%    99.5%
142708.9 190408.7
histogram(~result, data = BootstrapDist, fit = "normal", nint = 100, v = CI)
[Figure: density histogram of result (the bootstrap means) with a fitted Normal curve and vertical lines at the CI endpoints.]
Example: Atlanta Commutes http://lock5stat.com/StatKey
Summary: Bootstrap CIs
To generate a bootstrap distribution, we
1. Generate bootstrap samples by sampling with replacement from the original sample, using the same sample size.
2. Compute the statistic of interest, a bootstrap statistic, for each of the bootstrap samples.
3. Collect the statistics for many bootstrap samples to form a