STAT 113 Bootstrap Confidence Intervals, Colin Reimer Dawson

STAT 113 Bootstrap Confidence Intervals. Colin Reimer Dawson, Oberlin College, 3 March 2017. Sections: Confidence Intervals; Bootstrap Resampling; Bootstrap Confidence Intervals; Bootstrap Percentile Intervals.


  1. STAT 113 Bootstrap Confidence Intervals. Colin Reimer Dawson, Oberlin College, 3 March 2017

  2. Using Samples to Make Estimates About Populations
     Statistic : Sample :: Parameter : Population
     We want to use our sample statistic to estimate the corresponding population parameter.

  3. Standard Error
     Definition: The distribution of a quantitative variable has a standard deviation. The sampling distribution of a quantitative sample statistic (like a mean) has a standard deviation too. This has a special name: the standard error (e.g., “of the mean”).
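The definition above can be checked by simulation. The following is a minimal sketch in base R (not the mosaic functions used later in the slides). The population is simulated and purely hypothetical, and the names `population`, `sample_means`, and `standard_error` are chosen here for illustration.

```r
# Hypothetical population: 10,000 simulated values (an assumption, not course data).
set.seed(113)
population <- rexp(10000, rate = 1 / 50)  # right-skewed, mean about 50

# Draw many simple random samples of size n and record each sample mean.
n <- 40
sample_means <- replicate(5000, mean(sample(population, size = n)))

# The standard deviation of the sample means is the standard error of the mean.
standard_error <- sd(sample_means)

# For comparison: population SD divided by sqrt(n), the standard
# probability-theory approximation for the SE of a mean.
theoretical_se <- sd(population) / sqrt(n)

print(c(simulated = standard_error, theoretical = theoretical_se))
```

The two numbers should be close, which is what makes the simulated standard deviation a reasonable stand-in for the standard error.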

  4. Confidence Intervals
     • A point estimate of some population parameter (like a mean), together with some measure of our confidence/uncertainty (e.g., MoE), defines a confidence interval.
     • Can be written in the form “statistic ± MoE”.
     • “With 95% confidence, the mean flavor-life of our gumballs is between 65.3 and 67.1 minutes.”
     • “With 95% confidence, between 39 (42 − 3) and 45 (42 + 3) percent of U.S. adults approve of Donald Trump’s job performance as president.”

  5. How to Determine the Margin of Error?
     The population mean µ is within 2 Standard Errors of most (about 95%) sample means (from simple random samples).
     Margin of Error: A 95% margin of error of 3 points means that 95% of surveys with the same procedure and sample size will yield sample statistics which are within 3 points of the corresponding population parameter.
     If the sampling distribution is approximately Normal, then a 95% Margin of Error is about 2 Standard Errors.
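The "about 2 Standard Errors" rule can be written out as follows. This is a sketch that assumes the sampling distribution of the sample mean is Normal, centered at µ, with standard deviation SE; the value 1.96 is the standard Normal 97.5th percentile, which the slide rounds to 2.

```latex
P\left( \lvert \bar{x} - \mu \rvert \le 1.96\,\mathrm{SE} \right) \approx 0.95
\quad\Longrightarrow\quad
\mathrm{MoE}_{95\%} = 1.96\,\mathrm{SE} \approx 2\,\mathrm{SE}
```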

  6. Interpretations of CIs
     • 95% CIs contain 95% of the cases in the population. False. They represent uncertainty about a population parameter, not about individual points.
     • There is a 95% chance that the sample mean falls in the 95% CI. False. Any given CI is centered around the sample mean for that sample, so the sample mean is inside 100% of the time.
     • 95% of samples produce confidence intervals that contain the population parameter. True: This is the definition of a confidence interval.
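The third statement can be illustrated with a coverage simulation. This is a minimal base R sketch under stated assumptions: the population of pulse rates is simulated (hypothetical, not real data), the standard error is obtained by repeated sampling from that known population, and each interval is built as statistic ± 2 SE.

```r
set.seed(2017)

# Hypothetical population of pulse rates (simulated, an assumption).
population <- rnorm(10000, mean = 68, sd = 10)
mu <- mean(population)  # the population parameter
n <- 50

# Standard error: SD of the sample mean across many random samples.
se <- sd(replicate(5000, mean(sample(population, size = n))))

# For each new sample, build "statistic +/- 2 SE" and record
# whether the interval contains the population mean.
covers <- replicate(2000, {
  xbar <- mean(sample(population, size = n))
  (xbar - 2 * se <= mu) && (mu <= xbar + 2 * se)
})

# Proportion of intervals that captured mu; should be near 0.95.
coverage <- mean(covers)
print(coverage)
```

Note that what varies from run to run is the interval, not the parameter: `mu` is fixed, and roughly 95% of the intervals land on it.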

  7. Correct or Incorrect?
     A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that all students will have pulse rates between 65 and 71” is
     A. Correct
     B. Incorrect

  8. Correct or Incorrect?
     A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that the mean pulse rate for this sample of students will fall between 65 and 71” is
     A. Correct
     B. Incorrect

  9. Correct or Incorrect?
     A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “I am 98% sure that the mean pulse rate for the population of all students will fall between 65 and 71” is
     A. Correct
     B. Incorrect

  10. Correct or Incorrect?
      A 98% confidence interval for mean pulse rate in the Oberlin student population is 65 to 71. The interpretation “98% of the pulse rates for students at this college will fall between 65 and 71” is
      A. Correct
      B. Incorrect

  12. Summary
      To create a 95% confidence interval for a parameter:
      1. Take many random samples from the population, and compute the sample statistic for each sample.
      2. Compute the standard error as the standard deviation of all these statistics.
      3. For your actual sample, use statistic ± 2 SE.

  13. Ok, but...
      In reality we only have one sample. How do we know what the standard error is?
      • Standard error depends on population characteristics, particularly variability.
      • We can use the sample to estimate not only the parameter of interest (e.g., mean, proportion), but also the variability.
      • Two approaches: (1) Simulation, (2) Probability theory

  14. Estimating the Margin of Error from One Sample
      • Since we only have one sample, we have to estimate the Margin of Error using only the information it contains.
      • Idea: Let the whole sample (not just the statistic of interest) serve as an estimate for the whole population.

  15. Note: We do not literally make copies of the data, or increase our sample size, by bootstrapping!

  16. Sampling from the Pseudo-Population
      • Sampling from the estimated population is equivalent to sampling from the sample, but never “using up” the cases.
      • In other words, we sample with replacement from the sample.
      • The resulting sample is called a bootstrap sample.
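Sampling with replacement can be sketched in base R as below. The data values are hypothetical pulse rates invented for illustration; base R's `sample()` with `replace = TRUE` is used here in place of the mosaic `resample()` function that appears in the deck's own code.

```r
set.seed(1)

# Hypothetical original sample of 8 pulse rates (made-up values).
original_sample <- c(62, 65, 66, 68, 70, 71, 74, 77)

# A bootstrap sample: same size as the original, drawn WITH replacement,
# so some cases can appear more than once and others not at all.
bootstrap_sample <- sample(original_sample,
                           size = length(original_sample),
                           replace = TRUE)

print(sort(bootstrap_sample))
```

The bootstrap sample always has the same number of cases as the original sample and contains only values that were observed.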

  18. Bootstrap Statistic and Bootstrap Distribution
      • We compute the relevant statistic (e.g., mean) on the bootstrap sample. This is a bootstrap statistic.
      • Over many bootstrap samples, each contributing a bootstrap statistic, we get a bootstrap distribution.
      • Each bootstrap statistic differs from the “pseudopopulation parameter” (which is really the real sample statistic).
      • We hope these differences are similar in size to the differences between true sample statistics and population parameter.
      Bootstrap statistic : Actual sample statistic :: Actual sample statistic : Actual Population Parameter
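A bootstrap distribution can be built by repeating the resampling step many times. The sketch below uses base R and a simulated sample of 50 hypothetical house prices (an assumption, not the course's data set); `boot_means` and `boot_se` are illustrative names.

```r
set.seed(42)

# Hypothetical sample of 50 house prices (simulated, an assumption).
original_sample <- rnorm(50, mean = 180000, sd = 60000)
sample_stat <- mean(original_sample)

# Each replicate: resample with replacement, then compute the bootstrap statistic.
boot_means <- replicate(5000, mean(sample(original_sample, replace = TRUE)))

# The bootstrap distribution is centered near the original sample statistic,
# not near the (unknown) population parameter.
print(c(sample_mean = sample_stat, bootstrap_center = mean(boot_means)))

# Its standard deviation estimates the standard error.
boot_se <- sd(boot_means)
print(boot_se)
```

The spread of `boot_means` around `sample_stat` plays the role that the spread of sample means around the population mean plays in the true sampling distribution.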

  20. Examples: StatKey http://lock5stat.com/statkey

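  21. Population vs. Sample vs. Sampling Dist. vs. Bootstrap Dist.
      # Assumes the mosaic package, which appears to supply read.file(), do(), resample(), and mean(~x, data = ...)
      library(mosaic)
      Population <- read.file("http://colindawson.net/data/ames.csv")
      Sample <- sample(Population, size = 50)  # one random sample of 50 cases
      # Sampling distribution: means of 5000 fresh samples from the population
      SamplingDist <- do(5000) * sample(Population, size = 50) %>% mean(~Price, data = .)
      # Bootstrap distribution: means of 5000 resamples (with replacement) of the one sample
      BootstrapDist <- do(5000) * resample(Sample) %>% mean(~Price, data = .)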

  22. Population vs. Sample vs. Sampling Dist. vs. Bootstrap Dist.
      [Figure: four histograms. Pop. Cases and Samp. Cases (x-axis: Price); Samples and Boot. Samples (x-axis: Mean Price).]
      • What is the center of the sampling distribution?
      • What is the center of the bootstrap distribution?
      • How does the spread compare?

  23. Estimating the Margin of Error
      [Figure: histograms of Samples and Boot. Samples (x-axis: Mean Price), each labeled 95%.]
      • The spread of the bootstrap distribution approximates the spread of the true sampling distribution.
      • We can use the bootstrap distribution to get a Margin of Error for our Confidence Interval.
      • Where should the center of the CI be?

  24. Adjusting the Confidence Level
      If the sampling distribution is approximately Normal, then a 95% Margin of Error is about 2 Standard Errors.
      If the bootstrap distribution is approximately Normal, 95% of the bootstrap statistics are within 2 SE of the bootstrap center (i.e., original sample stat.). That is, 95% of bootstrap statistics are within the 95% CI.
      If the bootstrap distribution is symmetric, then capturing the middle X% of the bootstrap statistics yields an X% confidence interval!
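The two ways of forming the interval can be compared directly. This base R sketch again uses a simulated, hypothetical sample of house prices; `quantile()` picks out the middle X% of the bootstrap statistics, and the "± 2 SE" interval is shown alongside for comparison.

```r
set.seed(7)

# Hypothetical sample of 50 house prices (simulated, an assumption).
original_sample <- rnorm(50, mean = 180000, sd = 60000)
boot_means <- replicate(5000, mean(sample(original_sample, replace = TRUE)))

# 95% percentile interval: the middle 95% of the bootstrap statistics.
percentile_ci <- quantile(boot_means, probs = c(0.025, 0.975))

# 95% interval from "statistic +/- 2 SE", using the bootstrap SE.
se_ci <- mean(original_sample) + c(-2, 2) * sd(boot_means)

# Changing the confidence level only changes the percentiles:
# the middle 90% gives a 90% interval.
ci_90 <- quantile(boot_means, probs = c(0.05, 0.95))

print(percentile_ci)
print(se_ci)
print(ci_90)
```

When the bootstrap distribution is roughly symmetric and bell-shaped, the first two intervals should nearly agree, and the 90% interval should sit inside the 95% one.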
