

  1. Stat 451 Lecture Notes 08: Bootstrap. Ryan Martin, UIC, www.math.uic.edu/~rgmartin. Based on Chapter 9 in Givens & Hoeting and Chapter 24 in Lange. Updated: April 4, 2016.

  2. Outline: 1 Introduction; 2 Nonparametric bootstrap; 3 Parametric bootstrap; 4 Bootstrap in regression; 5 Better bootstrap CIs; 6 Remedies for bootstrap failure; 7 Further remarks.

  3. Motivation. For hypothesis testing and confidence intervals, there is a “statistic” whose sampling distribution is required. For example, to test H_0 : µ = µ_0 based on a sample from a N(µ, σ²) population, we use the t-statistic T = (X̄ − µ_0)/(S/√n), whose null distribution (under stated conditions) is Student-t. But almost any deviation from this basic setup leads to a tremendously difficult distributional calculation. The goal of the bootstrap is to give a simple approximate solution based on simulations.

  4. Notation. For a distribution with cdf F, suppose we are interested in a parameter θ = ϕ(F), written as a functional of F. Examples: Mean: ϕ(F) = ∫ x dF(x); Median: ϕ(F) = inf{x : F(x) ≥ 0.5}; ... Given data X = {X_1, ..., X_n} from F, the empirical cdf is F̂(x) = (1/n) ∑_{i=1}^n I{X_i ≤ x}, x ∈ ℝ. Then a natural estimate of θ is θ̂ = ϕ(F̂), the same functional applied to the empirical cdf.
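The plug-in idea on the slide above can be sketched in a few lines of code. This is an illustrative sketch, not code from the notes; the toy data values are made up.

```python
# Plug-in estimation: theta-hat = phi(F-hat), illustrated with the empirical
# cdf and the median functional phi(F) = inf{x : F(x) >= 0.5}.
import bisect

def ecdf(data):
    """Return F-hat as a function: F-hat(x) = (1/n) * #{X_i <= x}."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def plugin_median(data):
    """Apply the median functional to the empirical cdf."""
    xs = sorted(data)
    F = ecdf(xs)
    for x in xs:          # inf over the support of F-hat
        if F(x) >= 0.5:
            return x

data = [2.1, -0.3, 5.0, 1.7, 0.9]
F_hat = ecdf(data)
print(F_hat(1.0))           # 2 of the 5 observations are <= 1.0, so 0.4
print(plugin_median(data))  # 1.7, the usual sample median for odd n
```

For odd n the plug-in median coincides with the usual sample median, which is why the notes treat M_n as a plug-in estimator.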

  5. Notation (cont.) For inference, some statistic T(X, F) is used; e.g., T(X, F) = (X̄ − µ_0)/(S/√n). Again, the sampling distribution of T(X, F) may be very complicated, unknown, or could depend on the unknown F. Bootstrap Idea: Replace the unknown cdf F with the empirical cdf F̂. Produce a numerical approximation of the sampling distribution of T(X, F) by repeated sampling from F̂.

  6. Notation (cont.) Let X⋆ = {X⋆_1, ..., X⋆_n} be an iid sample from F̂, i.e., a sample of size n drawn with replacement from X. Given X⋆, the statistic T⋆ = T(X⋆, F̂) can be evaluated. Repeated sampling of X⋆ gives a sequence of T⋆'s which can be used to approximate the distribution of T(X, F). For example, V{T(X, F)} ≈ Var{T⋆_1, ..., T⋆_B}. Why should the bootstrap work? The Glivenko–Cantelli theorem says that F̂ → F as n → ∞. So, iid sampling from F̂ should be approximately the same as iid sampling from F when n is large. (Footnote: lots of difficult theoretical work has been done to determine what it means for this approximation to be good and in what kinds of problems it fails.)
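The Glivenko–Cantelli heuristic is easy to see numerically. A minimal sketch, assuming F = Uniform(0,1) so that the sup-distance sup_x |F̂(x) − F(x)| has a closed form in the order statistics; the sample sizes are arbitrary choices.

```python
# Glivenko-Cantelli in action: the sup-distance between F-hat and F shrinks
# as n grows, which is what justifies resampling from F-hat in place of F.
import random

def ks_distance_uniform(sample):
    """sup_x |F-hat(x) - x| for data drawn from Uniform(0,1)."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

random.seed(77)  # same seeding habit as in the notes
for n in (100, 1000, 10000):
    d = ks_distance_uniform([random.random() for _ in range(n)])
    print(n, round(d, 4))  # the distance shrinks roughly like n**(-1/2)
```

By the Dvoretzky–Kiefer–Wolfowitz inequality the distance at n = 10000 is below 0.05 except with negligible probability, so the shrinkage is visible for essentially any seed.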

  7. Outline (next section: Nonparametric bootstrap).

  8. Basic setup. The above procedure is essentially the nonparametric bootstrap. The sampling distribution of T(X, F) is approximated directly by the empirical distribution of the bootstrap sample T⋆_1, ..., T⋆_B. For example: quantiles of T(X, F) are approximated by sample quantiles of T⋆_1, ..., T⋆_B; the variance of T(X, F) is approximated by the sample variance of T⋆_1, ..., T⋆_B. The bootstrap sample size is usually rather large, e.g., B ≈ 1000, but computationally manageable.
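The whole nonparametric bootstrap fits in a short loop. A sketch, assuming the statistic is the sample median; the toy data, B = 1000, and the seed are illustrative choices, not from the notes.

```python
# Nonparametric bootstrap: resample the data with replacement B times,
# recompute the statistic each time, then summarize the T-star values.
import random
import statistics

def bootstrap(data, stat, B=1000, rng=random):
    n = len(data)
    return [stat([rng.choice(data) for _ in range(n)]) for _ in range(B)]

random.seed(77)
data = [random.gauss(0, 1) for _ in range(50)]
T_star = bootstrap(data, statistics.median, B=1000)

var_boot = statistics.variance(T_star)  # approximates Var of the median
T_sorted = sorted(T_star)
q05, q95 = T_sorted[49], T_sorted[949]  # approximate 5% and 95% quantiles
print(round(var_boot, 4), round(q05, 3), round(q95, 3))
```

The same `bootstrap` helper works for any statistic that takes a list, which is the point of the method: only the resampling loop is generic work.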

  9. Example: variance of a sample median. Example 29.4 in DasGupta (2008). X_1, ..., X_n iid Cauchy with median µ. The sample mean X̄ is a bad estimator of µ (why?), so use the sample median M_n instead. For odd n, say n = 2k + 1, there is an exact formula: V(M_n) = [2 · n! / ((k!)² πⁿ)] ∫_0^{π/2} x^k (π − x)^k (cot x)² dx. A CLT-type asymptotic approximation is Ṽ(M_n) = π²/(4n). What about a bootstrap approximation?

  10. Example: variance of sample median (cont.) With n = 21 we have exact V(M_n) = 0.1367 and asymptotic Ṽ(M_n) = 0.1175. A bootstrap estimate of V(M_n) using B = 5000 is V̂(M_n)_boot = 0.1102, a slight under-estimate of the variance... The main point is that we got a pretty good answer with essentially no effort — the computer does all the hard work. (Footnote: note that I used set.seed(77) in the code...)
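Both benchmark numbers above can be checked numerically. A sketch, using Simpson's rule for the exact integral (the quadrature grid size is an arbitrary choice; the formulas are the ones from the previous slide).

```python
# Numerical check of the two formulas for the standard Cauchy median,
# n = 2k + 1 = 21: the exact integral and the CLT approximation pi^2/(4n).
import math

def exact_var_median(n):
    """V(M_n) = 2*n!/((k!)^2 pi^n) * integral_0^{pi/2} x^k (pi-x)^k cot(x)^2 dx."""
    k = (n - 1) // 2
    def f(x):
        if x == 0.0:
            return 0.0  # x^k * cot(x)^2 ~ x^(k-2) -> 0 for k >= 3
        return x**k * (math.pi - x)**k / math.tan(x)**2
    m = 20000  # Simpson's rule with m subintervals on (0, pi/2)
    h = (math.pi / 2) / m
    s = f(0.0) + f(math.pi / 2)
    s += sum(4 * f(i * h) for i in range(1, m, 2))
    s += sum(2 * f(i * h) for i in range(2, m, 2))
    integral = s * h / 3
    return 2 * math.factorial(n) / (math.factorial(k)**2 * math.pi**n) * integral

print(round(exact_var_median(21), 4))   # exact V(M_21), approx 0.1367
print(round(math.pi**2 / (4 * 21), 4))  # asymptotic approximation, 0.1175
```

The exact value has no closed form, which is exactly why a bootstrap shortcut is attractive here.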

  11. Technical points. What does it mean for the bootstrap to “work”? H_n(x) is the true distribution function for θ̂_n; H⋆_n(x) is the true distribution function for θ̂⋆_n. The bootstrap is “consistent” if the distance between H_n(x) and H⋆_n(x) converges to 0 (in probability) as n → ∞. The bootstrap is successful in many problems, but there are known situations when it may fail: the support depends on the parameter; the true parameter sits on the boundary of the parameter space; the estimator's convergence rate is not n^{−1/2}; ... The bootstrap can detect skewness in the distribution of θ̂_n while CLT-type approximations will not — it often has a “second-order accuracy” property. The bootstrap often underestimates variances.

  12. Bootstrap confidence intervals (CIs). A primary application of the bootstrap is to construct CIs. The simplest approach is the percentile method. Let θ̂⋆_1, ..., θ̂⋆_B be a bootstrap sample of point estimators. A two-sided 100(1 − α)% bootstrap percentile CI is [ξ⋆_{α/2}, ξ⋆_{1−α/2}], where ξ⋆_p is the 100p-th percentile of the bootstrap sample. Simple and intuitive, but there are “better” methods.
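The percentile method is just "sort and read off quantiles." A minimal sketch, assuming the statistic is the sample mean; the toy data and B = 2000 are illustrative choices, not from the notes.

```python
# Percentile-method bootstrap CI: the alpha/2 and 1 - alpha/2 sample
# quantiles of the sorted bootstrap estimates form the interval.
import random
import statistics

def percentile_ci(data, stat, alpha=0.05, B=2000, rng=random):
    n = len(data)
    boot = sorted(stat([rng.choice(data) for _ in range(n)]) for _ in range(B))
    lo = boot[int(B * alpha / 2)]          # lower percentile
    hi = boot[int(B * (1 - alpha / 2)) - 1]  # upper percentile
    return lo, hi

random.seed(77)
data = [random.gauss(10, 2) for _ in range(40)]
lo, hi = percentile_ci(data, statistics.mean)
print(round(lo, 2), round(hi, 2))  # two-sided 95% percentile interval
```

Note the interval is read directly from the bootstrap distribution with no standard-error formula, which is what makes the method so easy to apply beyond textbook statistics.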

  13. Outline (next section: Parametric bootstrap).

  14. Definition. The parametric bootstrap is a variation on the standard (nonparametric) bootstrap discussed previously. Let F = F_θ be a parametric model. The parametric bootstrap replaces sampling iid from F̂ with sampling iid from F_θ̂, where θ̂ is some estimator of θ. Potentially more complicated than the nonparametric bootstrap because sampling from F_θ̂ might be more difficult than sampling from F̂.

  15. Example: variance of sample median. X_1, ..., X_n iid Cauchy with median µ. Write M_n for the sample median. The parametric bootstrap samples X⋆_1, ..., X⋆_n from a Cauchy distribution with median M_n. Using B = 5000, the parametric bootstrap gives V̂(M_n)_p-boot = 0.1356. A bit closer to the true variance, V(M_n) = 0.1367, compared to the nonparametric bootstrap.
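A sketch of this parametric bootstrap, assuming unit scale for the Cauchy and using the inverse-cdf method to sample; the simulated "observed" data are an illustrative stand-in for the dataset used in the notes, so the printed value will not match 0.1356 exactly.

```python
# Parametric bootstrap for the Cauchy-median example: resample from a
# Cauchy centered at M_n rather than from the empirical cdf.
import math
import random
import statistics

def rcauchy(n, median, rng):
    # inverse cdf of Cauchy(median, 1): x = median + tan(pi * (u - 1/2))
    return [median + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

random.seed(77)
n, B = 21, 5000
data = rcauchy(n, 0.0, random)   # "observed" sample with true median 0
M_n = statistics.median(data)

boot_medians = [statistics.median(rcauchy(n, M_n, random)) for _ in range(B)]
print(round(statistics.variance(boot_medians), 4))  # compare with V(M_21) = 0.1367
```

Because the Cauchy is a location family, the bootstrap medians are distributed as M_n plus a Cauchy(0,1) sample median, so the estimate lands near 0.1367 up to Monte Carlo error regardless of the observed data.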

  16. Example: random effect model. Hierarchical model: µ_1, ..., µ_n iid ∼ N(λ, ψ²); Y_i | µ_i ∼ N(µ_i, σ_i²), i = 1, ..., n. Parameters (λ, ψ) unknown but the σ_i's known. The parameter of interest is ψ ≥ 0, and values ψ ≈ 0 are of interest because they suggest homogeneity. Non-hierarchical version: Y_i ind∼ N(λ, σ_i² + ψ²), i = 1, ..., n. Can estimate ψ via maximum likelihood. Use the parametric bootstrap to get confidence intervals?
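The marginal likelihood here is simple enough to maximize directly. A rough sketch of the MLE step only, profiling out λ and searching ψ over a coarse grid; the grid, sample size, and simulated data are hypothetical choices, not the notes' method or data.

```python
# MLE for (lambda, psi) under the marginal model Y_i ~ N(lambda, sigma_i^2 + psi^2):
# for fixed psi, lambda-hat(psi) is the precision-weighted mean; then pick the
# psi on a grid that maximizes the profiled log-likelihood.
import math
import random

def profile_mle(y, sigma2, psi_grid):
    best = None
    for psi in psi_grid:
        w = [1.0 / (s2 + psi**2) for s2 in sigma2]
        lam = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)  # lambda-hat(psi)
        loglik = sum(-0.5 * math.log(s2 + psi**2) - 0.5 * wi * (yi - lam)**2
                     for s2, wi, yi in zip(sigma2, w, y))
        if best is None or loglik > best[0]:
            best = (loglik, lam, psi)
    return best[1], best[2]  # (lambda-hat, psi-hat)

random.seed(77)
n = 200
sigma2 = [random.uniform(0.5, 1.5) for _ in range(n)]        # "known" variances
mu = [random.gauss(1.0, 2.0) for _ in range(n)]              # lambda = 1, psi = 2
y = [random.gauss(m, math.sqrt(s2)) for m, s2 in zip(mu, sigma2)]

lam_hat, psi_hat = profile_mle(y, sigma2, [0.05 * j for j in range(81)])
print(round(lam_hat, 2), round(psi_hat, 2))
```

The same fitting routine would be rerun inside each parametric-bootstrap replication; the grid includes ψ = 0, which matters when the truth sits near that boundary.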

  17. Example: random effect model (cont.) Want to see what happens when ψ ≈ 0. Take ψ = n^{−1/2}, near the boundary of ψ ≥ 0. Two-sided 95% parametric bootstrap percentile intervals have pretty low coverage in this case, even for large n. It is possible to get intervals with exact coverage...

       n   Coverage   Length
      50      0.758    0.183
     100      0.767    0.138
     250      0.795    0.079
     500      0.874    0.039

  18. Outline (next section: Bootstrap in regression).

  19. Setup. Consider an observational study where pairs z_i = (x_i⊤, y_i)⊤ are sampled from a joint predictor-response distribution. Let Z = {z_1, ..., z_n}. Following the basic bootstrap principle above, repeatedly sample Z⋆ = {z⋆_1, ..., z⋆_n} with replacement from Z. Then do the same approximation of sampling distributions based on the empirical distribution from the bootstrap sample. This is called the paired bootstrap. What about for a fixed design? The complication is that the y_i's are not iid. In such cases, first resample the residuals e_i = y_i − ŷ_i from the original LS fit, and then set y⋆_i = x_i⊤ β̂ + e⋆_i.
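The fixed-design (residual) variant can be sketched for simple linear regression. Illustrative only: the design points, error scale, and B are arbitrary choices, not from the notes.

```python
# Residual bootstrap for a fixed design: fit LS once, resample the residuals
# with replacement, rebuild responses y*_i = b0 + b1*x_i + e*_i, and refit.
import random
import statistics

def ls_fit(x, y):
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar)**2 for xi in x)
    return ybar - b1 * xbar, b1  # (intercept, slope)

random.seed(77)
x = [i / 10 for i in range(30)]                          # fixed design points
y = [1.0 + 0.5 * xi + random.gauss(0, 0.3) for xi in x]  # true b0=1, b1=0.5

b0, b1 = ls_fit(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

boot_slopes = []
for _ in range(1000):
    e_star = [random.choice(resid) for _ in x]           # resample residuals
    y_star = [b0 + b1 * xi + ei for xi, ei in zip(x, e_star)]
    boot_slopes.append(ls_fit(x, y_star)[1])

print(round(statistics.stdev(boot_slopes), 3))  # bootstrap SE of the slope
```

The design matrix stays fixed across replications; only the error part is resampled, which respects the fixed-design assumption that the x_i's are not random.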

  20. Example: ratio of slope coefficients. Consider the simple linear regression model y_i = β_0 + β_1 x_i + ε_i, i = 1, ..., n, where ε_1, ..., ε_n are iid mean zero, not necessarily normal. Assume this is an observational study. Suppose the parameter of interest is θ = β_1/β_0. A natural estimate of θ is θ̂ = β̂_1/β̂_0. To get the (paired) bootstrap distribution of θ̂: sample Z⋆ = {z⋆_1, ..., z⋆_n} with replacement from Z; fit the regression model with data Z⋆ to obtain β̂⋆_0 and β̂⋆_1; evaluate θ̂⋆ = β̂⋆_1/β̂⋆_0.
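The three steps above can be sketched directly. Toy data, not the dataset behind the figure on the next slide; the true coefficients and B are illustrative choices.

```python
# Paired bootstrap for theta = beta1/beta0: resample (x_i, y_i) pairs with
# replacement, refit least squares, and form the ratio each time.
import random
import statistics

def ls_fit(x, y):
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar)**2 for xi in x)
    return ybar - b1 * xbar, b1

random.seed(77)
pairs = [(xi, 2.0 - 0.4 * xi + random.gauss(0, 0.5))   # beta0=2, beta1=-0.4
         for xi in (random.uniform(0, 5) for _ in range(60))]

theta_star = []
for _ in range(2000):
    resample = [random.choice(pairs) for _ in pairs]   # resample whole pairs
    b0, b1 = ls_fit(*zip(*resample))
    theta_star.append(b1 / b0)

theta_star.sort()
print(round(theta_star[50], 3), round(theta_star[1949], 3))  # 95% percentile CI
```

Resampling whole pairs preserves the joint predictor-response distribution, which is why this variant suits the observational-study setting.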

  21. Example: ratio of slope coefficients (cont.) The figure below shows the bootstrap distribution of θ̂ = β̂_1/β̂_0. The 95% bootstrap percentile confidence interval for θ is (−0.205, −0.173). [Figure: density of the paired-bootstrap sample; x-axis: theta.paired, roughly −0.24 to −0.16; y-axis: Density.]

  22. Outline (next section: Better bootstrap CIs).
