Resampling from the data or from distribution Simple Example Spline Example
Parametric bootstrap, August 30, 2017
Bootstrap + Monte Carlo = Parametric Bootstrap
There is no more in the data than the data – one view
The bootstrap is a general tool for assessing statistical accuracy by ‘creating’ data from the data.
It is based on sampling randomly from the data to study how a quantity of interest behaves under this resampling.
It is used to assess the variability of a given characteristic of the data.
There is a model behind the data – another view
Study a mathematical model theoretically.
Fit the model statistically using the data.
Use the theory to assess variability or other properties.
What if the model is difficult to study?
Combine model with sampling
Fit the theoretical model statistically using the data.
Take Monte Carlo samples from the fitted model to investigate variability or other properties.
We use the model to get new samples, as opposed to the non-parametric bootstrap, where the samples are drawn from the data directly.
Since the model is fitted from the data, the data are still used, but indirectly.
Simple example – bootstrap and Monte Carlo
Bootstrap

    #Data
    x=scan("Table2_1.txt")
    n=length(x)
    mean(x)
    sd(x)
    #Bootstrapping variances
    B=1000
    Bvar=vector('numeric',B)
    for(i in 1:B)
    {
      #resample the data with replacement and record the sample variance
      Bvar[i]=var(sample(x,n,replace=TRUE))
    }
    sd(Bvar)
    hist(Bvar,nclass=10)

Monte Carlo

    #Monte Carlo study of variances
    N=15000
    MCvar=vector('numeric',N)
    for(i in 1:N){
      #draw a fresh normal sample of size n (mean 50, sd 0.1)
      MCx=rnorm(n,50,0.1)
      MCvar[i]=var(MCx)
    }
    mean(MCvar)
    sd(MCvar)
    X11() #graphical window in Unix
    #windows() in Windows
    #quartz() in Mac
    hist(MCvar,nclass=10)
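As a sanity check on the Monte Carlo part: for normal samples the variance of the sample variance is known in closed form, Var(s^2) = 2*sigma^4/(n-1), so sd(MCvar) should come out close to sigma^2*sqrt(2/(n-1)). The sketch below is only an illustration under stated assumptions: Table2_1.txt is not available here, so n = 20 is a hypothetical sample size, and the seed is arbitrary.

```r
# Sanity check of the Monte Carlo study against the normal-theory formula.
# Assumption: n = 20 is hypothetical; the real n comes from Table2_1.txt.
set.seed(1)
n <- 20
sigma <- 0.1
N <- 15000

# Same simulation as on the slide, written with replicate()
MCvar <- replicate(N, var(rnorm(n, mean = 50, sd = sigma)))

# Normal theory: E(s^2) = sigma^2, sd(s^2) = sigma^2 * sqrt(2 / (n - 1))
theory_mean <- sigma^2
theory_sd <- sigma^2 * sqrt(2 / (n - 1))

c(mc_mean = mean(MCvar), theory_mean = theory_mean)
c(mc_sd = sd(MCvar), theory_sd = theory_sd)
```

When a closed form like this exists, simulation is not strictly needed; the Monte Carlo approach becomes useful when the statistic or the model is too complicated for such a formula.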
Simple example – parametric bootstrap
Fitting the data by a normal model

    #Fit the model -- normal distribution
    #Data
    x=scan("Table2_1.txt")
    n=length(x)
    mu=mean(x)
    sigma=sd(x)

Parametric bootstrap

    #Simulate from the fit
    PB=1000
    PBvar=vector('numeric',PB)
    PBsd=PBvar
    for(i in 1:PB)
    {
      #draw a sample of size n from the fitted normal model
      PBx=rnorm(n,mu,sigma)
      PBvar[i]=var(PBx)
    }
    mean(PBvar)
    sd(PBvar)
    X11()
    hist(PBvar,nclass=10)
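The two resampling schemes can be put side by side on the same data. This is a minimal sketch, not the lecture's own code: since Table2_1.txt is not available here, x is simulated as a stand-in (an assumption), and B = 2000 and the seed are arbitrary choices.

```r
# Nonparametric vs parametric bootstrap of the sample variance.
# Assumption: x is simulated stand-in data, not the contents of Table2_1.txt.
set.seed(2)
x <- rnorm(30, mean = 50, sd = 0.1)
n <- length(x)
B <- 2000

# Nonparametric bootstrap: resample the observations with replacement
Bvar <- replicate(B, var(sample(x, n, replace = TRUE)))

# Parametric bootstrap: simulate new samples from the fitted normal model
mu <- mean(x)
sigma <- sd(x)
PBvar <- replicate(B, var(rnorm(n, mu, sigma)))

# Bootstrap standard error of the sample variance under each scheme
c(nonparametric = sd(Bvar), parametric = sd(PBvar))
```

When the normal model describes the data well, the two standard errors should be of similar size; a large disagreement would suggest that the fitted model misses some feature of the data, such as heavy tails.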
Fitting cubic splines - B-spline basis
We fit the data on the right by the cubic B-splines $h_j(x)$, $j = 1, \dots, 7$, on the left.
Review question: Why are there seven B-splines?
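One way to approach the review question: a cubic spline with K interior knots has 4(K+1) polynomial coefficients and 3K continuity constraints (value, first and second derivative at each knot), which leaves K + 4 free parameters. Seven basis functions would therefore correspond to K = 3 interior knots; the slide does not state the number of knots, so this is an assumption. The sketch below uses `bs()` from the `splines` package on simulated x values (also an assumption) to count the basis functions.

```r
# Counting cubic B-spline basis functions with splines::bs().
# Assumptions: x is simulated on [0, 3]; three interior knots are implied by df = 7.
library(splines)
set.seed(3)
x <- sort(runif(50, 0, 3))

# Cubic (degree 3) basis with intercept: df = (interior knots) + 4
H <- bs(x, df = 7, degree = 3, intercept = TRUE)

dim(H)            # one row per observation, one column per basis function
attr(H, "knots")  # interior knots chosen by bs() at quantiles of x
```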
Fitting through linear regression
We look for a fit of the form $\mu(x) = \sum_{j=1}^{7} \beta_j h_j(x)$.
From the standard regression solution we get $\hat{\beta} = (H^T H)^{-1} H^T y$, so that the fit is $\hat{\mu}(x) = \sum_{j=1}^{7} \hat{\beta}_j h_j(x)$.
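The regression solution can be computed directly from the basis matrix H, whose (i, j) entry is $h_j(x_i)$. The following is a sketch under assumptions: the data are simulated (the sine curve and noise level are hypothetical, not the lecture's data), and the basis is built with `splines::bs()` using `df = 7`.

```r
# Least-squares fit of the spline coefficients via the normal equations.
# Assumptions: simulated data; basis from bs() with 7 functions.
library(splines)
set.seed(4)
N <- 50
x <- sort(runif(N, 0, 3))
y <- sin(2 * x) + rnorm(N, sd = 0.3)   # hypothetical data-generating curve

H <- bs(x, df = 7, intercept = TRUE)   # N x 7 basis matrix

# Solve (H^T H) beta = H^T y rather than forming the inverse explicitly
beta_hat <- solve(crossprod(H), crossprod(H, y))
mu_hat <- drop(H %*% beta_hat)         # fitted values at the observed x

# Cross-check against lm() without an extra intercept (H already spans constants)
fit_lm <- lm(y ~ H - 1)
all.equal(as.numeric(coef(fit_lm)), as.numeric(beta_hat))
```

Solving the linear system with `solve(A, b)` is generally preferable to computing the inverse of $H^T H$ and multiplying, although both express the same formula.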
Assessing uncertainty of the fit
We have obtained the fit, but we want to assess its uncertainty (variability).
The variability of a curve is not as straightforward a concept as the variability of a point estimate; there can be many ways to define it.
The best approach is to observe how the curve varies across different fits of the model.
For this we need many samples of data.
The bootstrap can be suitable.
But how should we resample from the data?
One could resample directly from the data (both the y's and the x's).
However, when the variability of the x's is not of interest, it is better to sample from the residuals.
Resampling from the residuals
Residuals: $\hat{\varepsilon} = y - \hat{\mu}(x)$.
Compute bootstrap samples $\varepsilon^*$ from the residuals $\hat{\varepsilon}$.
For each new sample $\varepsilon^*$, evaluate the bootstrap version of the output data, $y^* = \hat{\mu}(x) + \varepsilon^*$.
Fit new cubic splines to each bootstrap sample and plot them on the graph.
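The residual-resampling steps can be sketched as follows. This is an illustration under assumptions, not the lecture's code: the data are simulated, the basis comes from `splines::bs()` with `df = 7`, the helper `fit_spline` is a hypothetical name, and B = 200 is an arbitrary number of bootstrap replicates.

```r
# Residual bootstrap for a cubic spline fit.
# Assumptions: simulated data; fit_spline() is a helper introduced for illustration.
library(splines)
set.seed(5)
N <- 50
x <- sort(runif(N, 0, 3))
y <- sin(2 * x) + rnorm(N, sd = 0.3)   # hypothetical data
H <- bs(x, df = 7, intercept = TRUE)   # the x's stay fixed, so H is reused

# Least-squares fitted values for a given response vector
fit_spline <- function(H, y) drop(H %*% solve(crossprod(H), crossprod(H, y)))

mu_hat <- fit_spline(H, y)
res <- y - mu_hat                      # residuals of the original fit

B <- 200
boot_curves <- replicate(B, {
  eps_star <- sample(res, N, replace = TRUE)  # resample residuals
  y_star <- mu_hat + eps_star                 # bootstrap responses
  fit_spline(H, y_star)                       # refit the spline
})                                            # N x B matrix, one curve per column

# Plot the data, the bootstrap curves, and the original fit on top
plot(x, y, pch = 16, col = "grey40")
matlines(x, boot_curves, lty = 1, col = "lightblue")
lines(x, mu_hat, lwd = 2)
```

The spread of the light curves around the original fit gives a visual impression of the variability of the fitted curve at each x.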
Parametric bootstrap
One can assume a normal model for the errors, with mean zero and variance $\hat{\sigma}^2 = \sum_{i=1}^{N}$