PS 406 – Week 3 Section: Bootstrapping

D.J. Flynn, April 21, 2014
PS406 – Week 3 Section, Spring 2014

Today’s plan
1. Logic of the Bootstrap
2. Bootstrapping in R

Logic of the Bootstrap

We are sometimes confronted with problems that would be very difficult or time-consuming to solve analytically. Usually, these problems involve calculating some measure of accuracy for a sample statistic, for example:
- How well does an ML estimator perform in small samples?
- What are the confidence intervals for complicated interaction effects?
- etc.
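As a concrete case of such a problem, the standard error of a sample median has no simple closed form, but the non-parametric bootstrap estimates it directly. This is an illustrative sketch only: the exponential sample `x`, the seed, and `B = 2000` are assumptions, not values from the course.

```r
# Illustrative sketch: bootstrap SE of a sample median (hypothetical data).
set.seed(406)
x <- rexp(50, rate = 1)         # hypothetical skewed sample, n = 50

B <- 2000                       # number of bootstrap replications
boot.medians <- numeric(B)      # storage for the bootstrap estimates
for (b in 1:B) {
  xb <- sample(x, length(x), replace = TRUE)   # resample with replacement
  boot.medians[b] <- median(xb)                # recompute the statistic
}

se.median <- sd(boot.medians)                          # bootstrap SE
ci.median <- quantile(boot.medians, c(0.025, 0.975))   # percentile CI
se.median
ci.median
```

The percentile interval is the simplest bootstrap CI; the `boot` package used later in this section also offers others, such as BCa.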

Two variants of the bootstrap

- Non-parametric bootstrapping: re-sample (with replacement) from the original data, recompute the statistic 1000+ times, and use the variance of these estimates as an estimate of the variance of the original estimator.
- Parametric bootstrapping: specify a probability model for the data (Y), typically with its parameters estimated from the original sample; generate 1000+ samples from that model, recompute the statistic on each, and use the variance of these estimates as an estimate of the variance of the original estimator.

Let’s take a look at the code to perform each of these, paying attention to the key difference between them...

Non-parametric bootstrap from class

coins <- rbinom(10, 1, .5)              # original sample: 10 coin flips
temp <- matrix(0, nrow=1000, ncol=1)    # storage for 1000 bootstrap estimates
for (i in 1:1000){
  # resample the observed flips with replacement; store the proportion of heads
  temp[i,1] <- sum(sample(coins, length(coins), replace=TRUE))/length(coins)
}

Parametric bootstrap from class

p.hat <- mean(coins)                    # estimate P(heads) from the original sample
temp <- matrix(0, nrow=1000, ncol=1)
for (i in 1:1000){
  # simulate new flips from the fitted model Bernoulli(p.hat); store the proportion of heads
  temp[i,1] <- sum(rbinom(length(coins), 1, p.hat))/length(coins)
}
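Both slides stop once `temp` is filled. A sketch of the final step runs the two variants side by side and compares the spread of the bootstrap estimates with the textbook standard error sqrt(p.hat * (1 - p.hat) / n). Assumptions: a fixed seed, B = 1000, and a parametric model Bernoulli(p.hat) with p.hat estimated from the sample.

```r
set.seed(406)
coins <- rbinom(10, 1, .5)   # original sample: 10 coin flips
n <- length(coins)
p.hat <- mean(coins)         # original estimate of P(heads)

B <- 1000
np.est <- numeric(B)         # non-parametric: resample the observed flips
p.est  <- numeric(B)         # parametric: simulate from Bernoulli(p.hat)
for (i in 1:B) {
  np.est[i] <- mean(sample(coins, n, replace = TRUE))
  p.est[i]  <- mean(rbinom(n, 1, p.hat))
}

# The SD of the bootstrap estimates is the bootstrap estimate of SE(p.hat)
se.table <- c(nonparametric = sd(np.est),
              parametric    = sd(p.est),
              analytic      = sqrt(p.hat * (1 - p.hat) / n))
se.table
```

For 0/1 data the empirical distribution is exactly Bernoulli(p.hat), so in this special case the two variants have the same sampling distribution; for most other statistics and models they differ.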

Bootstrapping in R

We use “loops” to tell R what process(es) to repeat. The basic process is as follows:
1. make a vector/matrix to store the results
2. create the loop
3. tell R what operations to perform inside the loop
4. store the results in the vector/matrix from step 1

Let’s take a look at a simple example...

mu <- 10       # population mean (named mu so it does not mask mean())
sigma <- 2     # population sd (named sigma so it does not mask sd())
result.vec <- vector(mode="numeric", length=1000)   # 1. storage
for (i in 1:1000) {                                  # 2. loop
  draws <- rnorm(1000, mean=mu, sd=sigma)            # 3. operations
  result.vec[i] <- mean(draws)                       # 4. store the result
}
summary(result.vec)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  9.823   9.959  10.000  10.000  10.040  10.220
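Base R’s `replicate()` folds the storage, loop and store steps into a single call. The sketch below reruns the example that way (the seed is an assumption) and checks the spread of the simulated means against the theoretical standard error sigma / sqrt(n) = 2 / sqrt(1000), about 0.063. Strictly speaking this example is a Monte Carlo simulation from a known distribution rather than a bootstrap, but the loop structure is the same.

```r
set.seed(406)
mu <- 10; sigma <- 2; n <- 1000

# replicate() evaluates the expression 1000 times and collects the results
result.vec <- replicate(1000, mean(rnorm(n, mean = mu, sd = sigma)))

summary(result.vec)
c(simulated = sd(result.vec), theoretical = sigma / sqrt(n))
```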

Let’s practice...

Non-parametric bootstrapping (problem 1a)

a <- 10   # true maximum, per the problem (named a so it does not mask max())
orig.sample <- runif(100, min=0, max=a)
# create vector for bias results
bias.vec <- vector(mode="numeric", length=1000)
# sample from original with replacement, take max, store relative bias
for (i in 1:1000){
  boot.sample <- sample(orig.sample, 100, replace=TRUE)
  max.boot <- max(boot.sample)
  bias.vec[i] <- (max.boot - max(orig.sample)) / max(orig.sample)
}

# let’s take a look at our bootstrapped estimate of (relative) bias:
mean(bias.vec)
# we know from the lab that the actual relative bias of the sample max is
# -1/(n + 1); here n = 100 (1000 is the number of replications, not n):
-1 / (100 + 1)
# Q: How well does the non-parametric bootstrap do at estimating bias?
# ... What about with different sample sizes?
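One way to explore the sample-size question is to wrap the problem 1a loop in a function of n. The helper `np.boot.bias()` is hypothetical (its name, the defaults a = 10 and reps = 1000, and the seed are assumptions); it returns the mean relative bias from one original sample, printed next to the actual value -1/(n + 1).

```r
# Hypothetical helper: non-parametric bootstrap estimate of the relative
# bias of the sample maximum for one Uniform(0, a) sample of size n.
np.boot.bias <- function(n, a = 10, reps = 1000) {
  orig.sample <- runif(n, min = 0, max = a)
  orig.max <- max(orig.sample)
  bias.vec <- numeric(reps)
  for (i in 1:reps) {
    boot.sample <- sample(orig.sample, n, replace = TRUE)
    bias.vec[i] <- (max(boot.sample) - orig.max) / orig.max
  }
  mean(bias.vec)
}

set.seed(406)
sizes <- c(10, 100, 1000)
boot.bias <- sapply(sizes, np.boot.bias)
cbind(n = sizes, bootstrap = boot.bias, actual = -1 / (sizes + 1))
```

Every bootstrap sample is drawn from the original observations, so the bootstrap max can never exceed the original max and each estimate is at most 0. The resampled max equals the original max whenever the largest observation is drawn at least once, which happens with probability 1 - (1 - 1/n)^n, about 0.63 for large n.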

Parametric bootstrapping (problem 1b)

This problem is similar to the previous one. You’ll need to do a few things differently:
- draw a sample from the uniform distribution using some a > 0 as the maximum
- store the max from the original sample
- inside the loop: draw a sample from the uniform using the max from the original sample as the max, and store the max of the bootstrap sample
- store the relative bias in your results vector, (bootstrap max - original max) / original max; the bias estimate is the mean of these values, i.e. (mean(bootstrap max) - original max) / original max

Try this on your own and let me know if you encounter issues.
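For reference, one possible sketch of problem 1b, assuming a = 10 and n = 100 as in problem 1a and an arbitrary seed. The only change from 1a is inside the loop: new data are drawn from the fitted model Uniform(0, original max) instead of being resampled from the observations.

```r
set.seed(406)
a <- 10                                  # assumed true maximum, a > 0
n <- 100
orig.sample <- runif(n, min = 0, max = a)
orig.max <- max(orig.sample)             # estimate of a from the data

bias.vec <- numeric(1000)
for (i in 1:1000) {
  # parametric: simulate from the fitted model Uniform(0, orig.max)
  boot.sample <- runif(n, min = 0, max = orig.max)
  bias.vec[i] <- (max(boot.sample) - orig.max) / orig.max
}
mean(bias.vec)   # compare with the actual relative bias, -1 / (n + 1)
```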

Parametric bootstrapping with the normal distribution (problem 2)

Suppose we have a sample of i.i.d. Normal observations with mean 10 and sd 2. We want to know the bias, variance, and MSE of a given estimator (e.g., the sample mean) in samples of varying sizes.

sample.mean <- 10
sample.sd <- 2
results <- vector(mode="numeric", length=1000)
for (i in 1:1000) {
  draws <- rnorm(100, mean=sample.mean, sd=sample.sd)
  results[i] <- mean(draws)
}
# calculate bias:
mean(results) - sample.mean
[1] 0.002322089
# calculate variance:
var(results)
[1] 0.03979419

# calculate MSE:
mean((results - sample.mean)^2)
[1] 0.03975979

Note that for the lab you’ll need to draw two samples inside the loop (one for X and one for Y) and then store the statistic computed from mean(X) and mean(Y) in your results vector.
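To cover the "varying sizes" part of the problem, the same simulation can be wrapped in a function of n. `sim.mean.perf()` is a hypothetical helper (its name, defaults and the seed are assumptions). Its output also illustrates an identity behind the numbers above: because var() divides by B - 1, MSE = bias^2 + ((B - 1)/B) * variance, and indeed 0.002322089^2 + 0.999 * 0.03979419 is approximately 0.03975979.

```r
# Hypothetical helper: bias, variance and MSE of the sample mean of n
# i.i.d. Normal(mu, sigma) draws, estimated from `reps` simulated samples.
sim.mean.perf <- function(n, mu = 10, sigma = 2, reps = 1000) {
  results <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))
  c(n        = n,
    bias     = mean(results) - mu,
    variance = var(results),
    mse      = mean((results - mu)^2))
}

set.seed(406)
perf <- t(sapply(c(10, 100, 1000), sim.mean.perf))
perf

# MSE = bias^2 + (reps - 1) / reps * variance, up to floating-point rounding
all.equal(perf[, "mse"], perf[, "bias"]^2 + 999 / 1000 * perf[, "variance"])
```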

Contaminated data (problem 3)

Suppose we accidentally merge two datasets: the first is a sample from a Normal population with mean 10 and sd 2, and the second (contaminated) dataset is from a Normal population with mean 500 and sd 50. We’ve calculated some statistics (e.g., SD and IQR), and now we want to know how much those estimators are affected by the presence of outliers. We know that N = 100 and that 25% of the data are contaminated.

n <- 100
results <- matrix(NA, nrow=1000, ncol=2)   # column 1: IQR, column 2: SD
for (i in 1:1000) {
  # draw fresh clean and contaminated samples in every replication
  original <- rnorm(n, mean=10, sd=2)
  contaminated <- rnorm(n, mean=500, sd=50)
  # each observation is contaminated with probability 0.25
  prob.contam <- rbinom(n, 1, prob=0.25)
  actual.dist <- ifelse(prob.contam==1, contaminated, original)
  iqr.hat <- IQR(actual.dist)
  sd.hat <- sd(actual.dist)   # named sd.hat so it does not mask sd()
  results[i,] <- c(iqr.hat, sd.hat)
}

# to look at IQR:
results[,1]
# to look at SD:
results[,2]
# relative RMSE, e.g., for SD (the true SD of the clean population is 2):
sqrt(mean((results[,2] - 2)^2)) / 2
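To see how much the contamination itself matters, the hypothetical helper `contam.rmse()` (name, defaults and seed are assumptions) reruns the problem 3 simulation for a given contamination probability p and returns the relative RMSE of each estimator against the clean population’s true values: SD = 2 and IQR = 2 * (qnorm(0.75) - qnorm(0.25)), about 2.70. Comparing p = 0 with p = 0.25 isolates the effect of the outliers. Following the simulation code, the contaminated population uses sd = 50.

```r
# Hypothetical helper: relative RMSE of the sample IQR and SD when each of
# n observations comes from N(500, 50) with probability p, else N(10, 2).
contam.rmse <- function(p, n = 100, reps = 1000) {
  true.sd  <- 2
  true.iqr <- 2 * (qnorm(0.75) - qnorm(0.25))   # IQR of N(10, 2)
  results <- matrix(NA, nrow = reps, ncol = 2,
                    dimnames = list(NULL, c("iqr", "sd")))
  for (i in 1:reps) {
    clean  <- rnorm(n, mean = 10,  sd = 2)
    dirty  <- rnorm(n, mean = 500, sd = 50)
    is.bad <- rbinom(n, 1, prob = p) == 1
    x <- ifelse(is.bad, dirty, clean)
    results[i, ] <- c(IQR(x), sd(x))
  }
  c(iqr = sqrt(mean((results[, "iqr"] - true.iqr)^2)) / true.iqr,
    sd  = sqrt(mean((results[, "sd"]  - true.sd)^2))  / true.sd)
}

set.seed(406)
rmse.table <- rbind(clean = contam.rmse(0), contaminated = contam.rmse(0.25))
rmse.table
```

Note that 25% contamination sits right at the IQR’s breakdown point, so the IQR is not protected here either: whenever more than about a quarter of the sample is contaminated, the upper quartile lands among the outliers.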

Bootstrapping mediation effects (problem 4)

Let’s build a really simple mediation model using the Mali data from last week and use bootstrapping to calculate the SE of our estimated effect. For this simple example:
- same name (samename) is the treatment
- global eval (global_eval) is the mediator
- vote prefer (vote_prefer) is the DV

data <- read.csv("http://pantheon.yale.edu/~td244/cross_cutting_apsr_replicationdata.csv")
library(car)   # for recode()
# treatment indicator: 1 if treat_assign == 6, else 0
data$samename <- recode(data$treat_assign, "1=0; 2=0; 3=0; 4=0; 5=0; 6=1")
data.new <- na.omit(data.frame(global_eval=data$global_eval,
                               samename=data$samename,
                               vote_prefer=data$vote_prefer))
model.m <- lm(global_eval ~ samename, data=data.new)                 # mediator model
model.y <- lm(vote_prefer ~ samename + global_eval, data=data.new)   # outcome model
library(mediation)
med <- mediate(model.m, model.y, sims=1000, boot=TRUE, boot.ci.type="bca",
               treat="samename", mediator="global_eval")
summary(med)

Let’s build a function to perform the bootstrap...

myfunction <- function(data, i) {
  # boot() passes the data and a vector i of resampled row indices
  boot.data <- data[i, ]
  model.m <- lm(global_eval ~ samename, data=boot.data)
  model.y <- lm(vote_prefer ~ samename + global_eval, data=boot.data)
  med <- mediate(model.m, model.y, boot=FALSE,
                 treat="samename", mediator="global_eval")
  med$d.avg   # boot() needs a numeric statistic: return the estimated ACME
}

install.packages("boot")
library(boot)
# NOTE: number of reps cannot be < N
# NOTE: this could take a few minutes. If it takes longer
# (hours), let me know -- it’s possible to do the analysis
# on a subset of cases.
boot.result <- boot(data.new, myfunction, R=700)
boot.ci(boot.result)
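As a cross-check that needs neither `mediation` nor the replication file, here is a base-R sketch on simulated data. Everything in it is hypothetical: the data-generating coefficients, n = 500, B = 1000 and the helper `indirect()`. It relies on the fact that with linear mediator and outcome models and no treatment-by-mediator interaction, the average causal mediation effect reduces to the product of the treatment coefficient in the mediator model (a) and the mediator coefficient in the outcome model (b); the loop resamples rows and recomputes a * b.

```r
# Hypothetical simulated data standing in for the Mali variables
set.seed(406)
n <- 500
samename    <- rbinom(n, 1, 0.5)                          # treatment
global_eval <- 0.5 * samename + rnorm(n)                  # mediator
vote_prefer <- 0.3 * samename + 0.4 * global_eval + rnorm(n)   # outcome
sim.data <- data.frame(samename, global_eval, vote_prefer)

# Indirect effect: (treatment -> mediator) x (mediator -> outcome)
indirect <- function(d) {
  a <- coef(lm(global_eval ~ samename, data = d))["samename"]
  b <- coef(lm(vote_prefer ~ samename + global_eval, data = d))["global_eval"]
  unname(a * b)
}

B <- 1000
boot.ab <- numeric(B)
for (r in 1:B) {
  idx <- sample(nrow(sim.data), replace = TRUE)   # resample rows (units)
  boot.ab[r] <- indirect(sim.data[idx, ])
}

c(estimate = indirect(sim.data),
  boot.se  = sd(boot.ab),
  quantile(boot.ab, c(0.025, 0.975)))
```

The row-resampling step is the same thing `boot()` does when it passes the index vector `i` to `myfunction`.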
