PS 406 Week 3 Section: Bootstrapping D.J. Flynn April 21, 2014 - - PowerPoint PPT Presentation

ps 406 week 3 section bootstrapping
SMART_READER_LITE
LIVE PREVIEW

PS 406 Week 3 Section: Bootstrapping D.J. Flynn April 21, 2014 - - PowerPoint PPT Presentation

PS 406 Week 3 Section: Bootstrapping D.J. Flynn April 21, 2014 D.J. Flynn PS406 Week 3 Section Spring 2014 1 / 22 Todays plan Logic of the Bootstrap 1 Bootstrapping in R 2 D.J. Flynn PS406 Week 3 Section Spring 2014 2


slide-1
SLIDE 1

PS 406 – Week 3 Section: Bootstrapping

D.J. Flynn April 21, 2014

D.J. Flynn PS406 – Week 3 Section Spring 2014 1 / 22

slide-2
SLIDE 2

Today’s plan

1

Logic of the Bootstrap

2

Bootstrapping in R

D.J. Flynn PS406 – Week 3 Section Spring 2014 2 / 22

slide-3
SLIDE 3

Logic of the Bootstrap

Logic of the Bootstrap

We are sometimes confronted with problems that would be very difficult/time-consuming to solve mathematically. Usually, these problems involve calculating some measure of accuracy for a sample statistic:

How well does a ML estimator perform in small samples? Confidence intervals for complicated interaction effects? etc...

D.J. Flynn PS406 – Week 3 Section Spring 2014 3 / 22

slide-4
SLIDE 4

Logic of the Bootstrap

Two variants of the bootstrap

Non-parametric bootstrapping: Re-sample (with replacement) from

  • riginal data, calculate the test statistic 1000+ times, use the variance

in estimates as an estimate of variance in original estimator. Parametric bootstrapping: Specify a probability model for the data (Y), generate 1000+ samples based on that model, use the variance in estimates as an estimate of variance in original estimator. Let’s take a look at the code to perform each of these, paying attention to the key difference between them...

D.J. Flynn PS406 – Week 3 Section Spring 2014 4 / 22

slide-5
SLIDE 5

Logic of the Bootstrap

Non-parametric bootstrap from class

coins <- rbinom(10,1,.5) temp <- matrix(0,nrow=1000,ncol=1) for (i in 1:1000){ temp[i,1] <- sum(sample(coins, length(coins), replace=TRUE))/length(coins) }

D.J. Flynn PS406 – Week 3 Section Spring 2014 5 / 22

slide-6
SLIDE 6

Logic of the Bootstrap

Parametric bootstrap from class

temp <- matrix(0,nrow=1000,ncol=1) for (i in 1:1000){ temp[i,1] <- sum( rbinom(10,1, .5))/length(coins) }

D.J. Flynn PS406 – Week 3 Section Spring 2014 6 / 22

slide-7
SLIDE 7

Bootstrapping in R

Bootstrapping in R

We use “loops” to tell R what process(es) to repeat The basic process is as follows:

1

make vector/matrix to store results

2

create loop

3

tell R what operations to perform inside the loop

4

store the results in original vector/matrix

Let’s take a look at a simple example...

D.J. Flynn PS406 – Week 3 Section Spring 2014 7 / 22

slide-8
SLIDE 8

Bootstrapping in R

mean<-10 sd<-2 result.vec<-vector(mode="numeric",length=1000) for (i in 1:1000) { draws<-rnorm(1000,mean=mean,sd=sd) result.vec[i]<-mean(draws) } summary(result.vec)

  • Min. 1st Qu.

Median Mean 3rd Qu. Max. 9.823 9.959 10.000 10.000 10.040 10.220

D.J. Flynn PS406 – Week 3 Section Spring 2014 8 / 22

slide-9
SLIDE 9

Bootstrapping in R

Let’s practice...

D.J. Flynn PS406 – Week 3 Section Spring 2014 9 / 22

slide-10
SLIDE 10

Bootstrapping in R

Non-parametric boostrapping (problem 1a)

max<-10 #per the problem

  • rig.sample<-runif(100,min=0,max=max)

#create vector for bias results bias.vec<-vector(mode="numeric",length=1000) #sample from original with replacement, take max, store bias for (i in 1:1000){ boot.sample<-sample(orig.sample,100,replace=TRUE) max.boot<-max(boot.sample) bias.vec[i]<-(max.boot - max(orig.sample)) / max(orig.sample) }

D.J. Flynn PS406 – Week 3 Section Spring 2014 10 / 22

slide-11
SLIDE 11

Bootstrapping in R

#let’s take a look at our bootstrapped estimate of bias: mean(bias.vec) #We know from the lab that the actual bias is:

  • 1 / (1000+1)

#Q: How well does the non-parametric boostrap do at estimating bias? ... What about with different sample sizes?

D.J. Flynn PS406 – Week 3 Section Spring 2014 11 / 22

slide-12
SLIDE 12

Bootstrapping in R

Parametric bootstrapping (problem 1b)

This problem is similar to the previous one. You’ll need to do a few things differently: draw a sample from the uniform distribution using some a > 0 as the maximum store the max from the original sample inside the loop: draw a sample from the uniform using the max from the original sample as the max. Store the max of the bootstrap

  • sample. Store the bias in your results vector:

bias = (mean(bootstrap max) - original max) / original max Try this on your own and let me know if you encounter issues.

D.J. Flynn PS406 – Week 3 Section Spring 2014 12 / 22

slide-13
SLIDE 13

Bootstrapping in R

Parametric bootstrapping with the normal distribution (problem 2)

Suppose we have a sample of i.i.d. Normal observations with mean 10 and sd 2. We want to know the bias, variance, and MSE for a given estimator (e.g., mean) in samples of varying sizes.

D.J. Flynn PS406 – Week 3 Section Spring 2014 13 / 22

slide-14
SLIDE 14

Bootstrapping in R

sample.mean<-10 sample.sd<-2 results<-vector(mode="numeric",length=1000) for (i in 1:1000) { draws<-rnorm(100,mean=sample.mean,sd=sample.sd) results[i]<-mean(draws) } #calculate bias: mean(results) - sample.mean [1] 0.002322089 #calculate variance: var(results) [1] 0.03979419

D.J. Flynn PS406 – Week 3 Section Spring 2014 14 / 22

slide-15
SLIDE 15

Bootstrapping in R

#calculate MSE: mean((results - sample.mean)^2) [1] 0.03975979 Note that for the lab you’ll need to draw two samples inside the loop (one for X and one for Y) – and then store mean(X)

mean(Y) in your results vector

D.J. Flynn PS406 – Week 3 Section Spring 2014 15 / 22

slide-16
SLIDE 16

Bootstrapping in R

Contaminated Data (problem 3)

Suppose we accidentally merge two datasets together: the first is a sample from a Normal population with mean 10 and sd 2, and the second (contaminated) dataset is from a Normal population with mean 500 and variance 50. We’ve calculated some statistics (e.g., SD and IQR), and now we want to know how affected those estimators are by the presence of

  • utliers. We know that the sample is N = 100 and 25% of the data are

contaminated..

D.J. Flynn PS406 – Week 3 Section Spring 2014 16 / 22

slide-17
SLIDE 17

Bootstrapping in R

  • riginal<-rnorm(100,mean=10,sd=2)

contaminated<-rnorm(100,mean=500,sd=50) results <- matrix(NA, nrow=1000, ncol=2) for (i in 1:1000) { prob.contam<-rbinom(100,1,prob=0.25) actual.dist<-ifelse(prob.contam==1, contaminated, original) iqr<-IQR(actual.dist) sd<-sd(actual.dist) results[i,]<-c(iqr,sd) }

D.J. Flynn PS406 – Week 3 Section Spring 2014 17 / 22

slide-18
SLIDE 18

Bootstrapping in R

#to look at IQR: results[,1] #to look at SD: results[,2] #can calculate RMSE as follows (e.g., for SD): sqrt(mean((results[,2] - 1)^2)) / 1

D.J. Flynn PS406 – Week 3 Section Spring 2014 18 / 22

slide-19
SLIDE 19

Bootstrapping in R

Bootstrapping Mediation Effects (problem 4)

Let’s build a really simple mediation model using the Mali data from last week and use bootstrapping to calculate SE for our estimated effect. For this simple example:

same name is the treatment global eval is the mediator vote prefer is the DV

D.J. Flynn PS406 – Week 3 Section Spring 2014 19 / 22

slide-20
SLIDE 20

Bootstrapping in R

data <- read.csv ("http://pantheon.yale.edu/~td244/cross_cutting_apsr_ replicationdata.csv") library(car) data$samename <- recode(mali$treat_assign, "1=0; 2=0; 3=0; 4=0; 5=0; 6=1") data.new <- na.omit(data.frame(global_eval=data$global_eval, samename=data$samename, vote_prefer=data$vote_prefer)) model.m<-lm(global_eval~samename,data=data.new) model.y<-lm(vote_prefer~samename+global_eval,data=data.new) library(mediation) med<-mediate(model.m,model.y,sims=1000,boot=TRUE, boot.ci.type="bca", treat="samename",mediator="global_eval") summary(med)

D.J. Flynn PS406 – Week 3 Section Spring 2014 20 / 22

slide-21
SLIDE 21

Bootstrapping in R

Let’s build a function to perform the bootstrap... myfunction<-function(data, i) { boot <- NULL boot$samename<-data.new$samename[i] boot$vote_prefer<-data.new$vote_prefer[i] boot$global_eval<-data.new$global_eval[i] model.m<-lm(global_eval~samename,data=boot) model.y<-lm(vote_prefer~samename+global_eval,data=boot) med<-mediate(model.m,model.y,boot=FALSE, boot.ci.type= "bca", treat="samename",mediator="global_eval") }

D.J. Flynn PS406 – Week 3 Section Spring 2014 21 / 22

slide-22
SLIDE 22

Bootstrapping in R

install.packages("boot") library(boot) #NOTE: number of reps cannot be < N #NOTE: this could take a few minutes. If it takes longer #(hours), let me know -- It’s possible to do the analysis #on a sub-set of cases. boot.result<-boot(data.new,myfunction,R=700) boot.ci(boot.result)

D.J. Flynn PS406 – Week 3 Section Spring 2014 22 / 22