Pseudo-Bayesian Inference for Complex Survey Data
  1. Pseudo-Bayesian Inference for Complex Survey Data. Matt Williams 1, Terrance Savitsky 2. 1 National Center for Science and Engineering Statistics, National Science Foundation, mrwillia@nsf.gov. 2 Office of Survey Methods Research, Bureau of Labor Statistics, Savitsky.Terrance@bls.gov. University of Michigan, April 8, 2020.

  2. Thank you! ◮ Terrance Savitsky for being a great collaborator and mentor. ◮ Brady West and Jennifer Sinibaldi for making this connection. ◮ Jill Esau for orchestrating. ◮ You all for sharing your time today! 2

  3. Bio 1. Work ◮ 9 years as a mathematical statistician for the federal government: USDA, HHS, NSF ◮ Sample design, weighting, imputation, estimation, disclosure limitation (production and methods development) 2. Consulting ◮ International surveys for agricultural production (USAID) and vaccination knowledge, attitudes, and behaviors (UNICEF) 3. Research (ORCID: 0000-0001-8894-1240) ◮ Constrained optimization for survey applications (weight adjustment, benchmarking model estimates) ◮ Applying Bayesian inference methods to data from complex surveys. 3

  4. Outline 1 Informative Sampling (Savitsky and Toth, 2016) 2 Theory and Examples Consistency (Williams and Savitsky, 2020) Uncertainty Quantification (Williams and Savitsky, in press) 3 Implementation Details Model Fitting Variance Estimation 4 Related and Current Works 4


  6. Example: Informative Sampling ◮ Take a sample from the U.S. population of business establishments ◮ Single-stage, fixed-size, pps sampling design ◮ $y$ = (e.g., Hires, Separations) ◮ Size variable is total employment, $x$ ◮ $y \not\perp x$ ◮ $B = 500$ Monte Carlo samples at each of $n_\nu = (100, 500, 1500, 2500)$ establishments 6

  7. Distributions of y in Informative Samples [Figure: distributions of response values (Hires, Separations) for the population and for samples of size 100 and 500.] 7

  8. Population Inference from Informative Samples ◮ Goal: perform inference about a finite population generated from an unknown model, $P_{\theta_0}(y)$ ◮ Data: collected under a complex sampling design distribution, $P_\nu(\delta)$ ◮ Probabilities of inclusion $\pi_i = \Pr(\delta_i = 1 \mid y)$ are often associated with the variable of interest (purposefully) ◮ Such sampling designs are "informative": the balance of information in the sample $\neq$ the balance in the population ◮ Biased estimation: estimating $P_{\theta_0}(y)$ without accounting for $P_\nu(\delta)$ ◮ Use inverse probability weights $w_i = 1/\pi_i$ to mitigate bias ◮ Incorrect uncertainty quantification: failure to account for dependence induced by $P_\nu(\delta)$ leads to standard errors and confidence intervals that are the wrong size. 8
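The bias from ignoring an informative design, and its correction by inverse probability weighting, can be sketched numerically. This is a minimal illustration, not from the slides: the population model, the use of Poisson sampling as a simple stand-in for a fixed-size pps design, and the Hajek estimator are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: establishment size x and outcome y associated with x.
N, n = 100_000, 1_000
x = rng.lognormal(mean=3.0, sigma=1.0, size=N)   # size measure (e.g., employment)
y = 5.0 + 2.0 * np.log(x) + rng.normal(size=N)   # outcome correlated with size

# PPS inclusion probabilities: pi_i proportional to x, scaled to expected size n.
pi = np.minimum(n * x / x.sum(), 1.0)

# Poisson sampling as a simple stand-in for a fixed-size pps design.
s = rng.uniform(size=N) < pi
w = 1.0 / pi[s]                                   # inverse probability weights

naive = y[s].mean()                               # ignores the design -> biased up
hajek = np.sum(w * y[s]) / np.sum(w)              # weighted (Hajek) estimator

print(f"population mean {y.mean():.2f}, naive {naive:.2f}, weighted {hajek:.2f}")
```

Because inclusion is size-biased, the unweighted sample mean over-represents large establishments; weighting each unit by $1/\pi_i$ restores the population balance.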


  10. Why Bayes? ◮ Allows more complex, non-parametric (semi-supervised) models ◮ Use hierarchical modeling to capture rich dependence in data ◮ Have small sample properties from posterior distribution ◮ Full uncertainty quantification ◮ Gold standard for imputation 10

  11. Pseudo Posterior ◮ Pseudo posterior $\propto$ Pseudo likelihood $\times$ Prior: $$p^\pi(\theta \mid y, \tilde{w}) \propto \left\{ \prod_{i=1}^{n} p(y_i \mid \theta)^{\tilde{w}_i} \right\} p(\theta)$$ with $w_i := 1/\pi_i$ and $\tilde{w}_i = n w_i \big/ \sum_{j=1}^{n} w_j$, for $i = 1, \ldots, n$. 11
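A minimal sketch of evaluating this pseudo posterior for a toy $\mathcal{N}(\theta, 1)$ model; the data, inclusion probabilities, and prior standard deviation are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sampled data and inclusion probabilities (illustration only).
y = rng.normal(loc=2.0, scale=1.0, size=50)
pi = rng.uniform(0.1, 0.9, size=50)

w = 1.0 / pi                    # inverse probability weights w_i = 1 / pi_i
w_tilde = len(y) * w / w.sum()  # normalize so that sum(w_tilde) == n

def pseudo_log_post(theta, y, w_tilde, prior_sd=10.0):
    """log p(theta) + sum_i w_tilde_i * log p(y_i | theta), N(theta, 1) model."""
    log_lik = -0.5 * (y - theta) ** 2 - 0.5 * np.log(2 * np.pi)
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_prior + np.sum(w_tilde * log_lik)

# Evaluate on a grid; the pseudo-posterior mode sits near the weighted mean.
grid = np.linspace(0, 4, 401)
mode = grid[np.argmax([pseudo_log_post(t, y, w_tilde) for t in grid])]
print(f"weights sum to {w_tilde.sum():.1f}; pseudo-posterior mode ~ {mode:.2f}")
```

With a diffuse prior, exponentiating each likelihood term by $\tilde{w}_i$ pulls the mode toward the weighted mean of the data rather than the unweighted one.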

  12. Similar Posterior Geometry $$\mathcal{N}_P\left(y_i \mid \mu_i, \Phi^{-1}\right)^{w_i} \propto \mathcal{N}_P\left(y_i \mid \mu_i, [w_i \Phi]^{-1}\right)$$ ◮ normalize weights so that $\sum_{i=1}^{n} w_i = n$, to scale the posterior 12
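The proportionality on this slide can be checked directly in one dimension: the weighted normal log-density and the precision-scaled one differ only by a constant in $y$. The particular values of $\mu$, $\sigma^2$, and $w$ below are arbitrary choices for the check.

```python
import numpy as np

# Check: N(y | mu, sigma^2)^w equals N(y | mu, sigma^2 / w) up to a constant
# in y, i.e. exponentiating by w scales the precision by w.
def log_norm(y, mu, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mu) ** 2 / var

mu, var, w = 1.0, 2.0, 3.5
ys = np.linspace(-5, 5, 11)

lhs = w * log_norm(ys, mu, var)    # weighted log-density
rhs = log_norm(ys, mu, var / w)    # precision scaled by w
diff = lhs - rhs                   # constant across y (normalizing terms only)

print("max spread of lhs - rhs:", np.ptp(diff))
```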


  14. Pseudo Posterior Contraction - Count Data ◮ Model: $y_{id} \overset{ind}{\sim} \text{Pois}(\exp(\psi_{id}))$, with $\Psi_{N \times D} = X_{N \times P} B_{P \times D} + E_{N \times D}$, $E \sim \mathcal{N}_{N \times D}(0, I_N, \Lambda^{-1})$ [Figure: distribution within the 95% CI for each coefficient (Emp_Hires, Emp_Seps), by sample size (500, 1000, 1500, 2500) and estimator (pop, weight, ignore, srs).] 14

  15. Frequentist Consistency of a (Pseudo) Posterior ◮ The estimated distribution $p^\pi(\theta \mid y, \tilde{w})$ collapses around the generating parameter $\theta_0$ with increasing population size $N_\nu$ and sample size $n_\nu$. ◮ Evaluated with respect to the joint distribution of population generation $P_{\theta_0}(y)$ and the sample inclusion indicators $P_\nu(\delta)$. ◮ Conditions on the model $P_{\theta_0}(y)$ (standard): ◮ Complexity of the model limited by sample size ◮ Prior distribution not too restrictive (e.g., not a point mass) ◮ Conditions on the sampling design $P_\nu(\delta)$ (new): ◮ Every unit in the population has a non-zero probability of inclusion $\Rightarrow$ finite weights ◮ Dependence restricted to countable blocks of bounded size $\Rightarrow$ arbitrary dependence within clusters, but approximate independence between clusters. 15

  16. Simulation Example: Three-Stage Sample ◮ Area (PPS), Household (Systematic, sorted by Size), Individual (PPS) [Figure: factorization matrix ($\pi_{ij}/(\pi_i \pi_j) - 1$) for two PSUs; magnitude (left) and sign (right). Systematic sampling ($\pi_{ij} = 0$). Clustering and PPS sampling ($\pi_{ij} > \pi_i \pi_j$). Independent first-stage sample ($\pi_{ij} = \pi_i \pi_j$).] 16

  17. Simulation Examples: Logistic Regression ◮ $y_i \mid \mu_i \overset{ind}{\sim} \text{Bern}(F_l(\mu_i))$, $i = 1, \ldots, N$ ◮ $\mu = -1.88 + 1.0\, x_1 + 0.5\, x_2$ ◮ The $x_1$ and $x_2$ distributions are $\mathcal{N}(0, 1)$ and $\mathcal{E}(r = 1/5)$ with rate $r$ ◮ The size measure used for sample selection is $\tilde{x}_2 = x_2 - \min(x_2) + 1$, but neither $\tilde{x}_2$ nor $x_2$ is available to the analyst ◮ The intercept is chosen so that the median of $\mu \approx 0$, hence the median of $F_l(\mu) \approx 0.5$. 17
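The population generation described above can be sketched as follows; the population size and seed are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Covariates as on the slide: x1 ~ N(0, 1), x2 ~ Exponential with rate 1/5.
x1 = rng.normal(size=N)
x2 = rng.exponential(scale=5.0, size=N)   # numpy's scale = 1 / rate

mu = -1.88 + 1.0 * x1 + 0.5 * x2
p = 1.0 / (1.0 + np.exp(-mu))             # logistic link F_l
y = rng.uniform(size=N) < p               # Bernoulli outcomes

# Size measure used only for selection, not available to the analyst.
x2_tilde = x2 - x2.min() + 1.0

print(f"median of F_l(mu): {np.median(p):.3f}")  # intercept makes this ~0.5
```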

  18. Simulation Example: Three-Stage Sample (Cont) [Figure: the marginal estimate of $\mu = f(x_1)$ for the population curve, the sample with equal weights, and inverse probability weights. Top to bottom: estimated curve, log of bias, log MSE. Left to right: sample size (50, 100, 200, 400, 800).] 18


  20. Asymptotic Variances ◮ Let $\ell_\theta(y) = \log p(y \mid \theta)$. ◮ Rely on the variance and expected curvature of the score function: $\dot{\ell}_{\theta_0} = \frac{\partial \ell}{\partial \theta} \big|_{\theta = \theta_0}$ and $\ddot{\ell}_{\theta_0} = \frac{\partial^2 \ell}{\partial \theta^2} \big|_{\theta = \theta_0}$ ◮ $H_{\theta_0} = -\frac{1}{N_\nu} \sum_{i \in U_\nu} E_{P_{\theta_0}} \left[ \ddot{\ell}_{\theta_0}(y_{\nu i}) \right]$ ◮ $J_{\theta_0} = \frac{1}{N_\nu} \sum_{i \in U_\nu} E_{P_{\theta_0}} \left[ \dot{\ell}_{\theta_0}(y_{\nu i})\, \dot{\ell}_{\theta_0}(y_{\nu i})^T \right]$ ◮ Under correctly specified models: ◮ $H_{\theta_0} = J_{\theta_0}$ (Bartlett's second identity) ◮ Posterior variance $N_\nu V(\theta \mid y) = H_{\theta_0}^{-1}$, the same as the variance of the MLE (Bernstein-von Mises) 20
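For a concrete case, Bartlett's second identity can be checked by Monte Carlo for a $\text{Pois}(\exp(\theta))$ model, where the score is $y - e^{\theta}$ and the curvature is $-e^{\theta}$; the sample size and seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)

# Poisson(exp(theta)) model: score = y - exp(theta), curvature = -exp(theta).
theta0 = 1.0
lam = np.exp(theta0)
y = rng.poisson(lam, size=200_000)   # stand-in for the population U_nu

score = y - lam                      # per-unit score at theta0
J = np.mean(score ** 2)              # Monte Carlo variance of the score
H = lam                              # minus expected curvature, exactly exp(theta0)

print(f"H = {H:.3f}, J ~ {J:.3f}  (Bartlett: H == J under correct specification)")
```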

  21. Scaling and Warping of Pseudo MLE ◮ Misspecified (under-specified) full joint sampling distribution $P_\nu(\delta)$ ◮ Failure of Bartlett's second identity for the composite likelihood ◮ Asymptotic covariance: $H_{\theta_0}^{-1} J^\pi_{\theta_0} H_{\theta_0}^{-1}$ ◮ Simple random sampling: $J^\pi_{\theta_0} = J_{\theta_0}$ ◮ Unequal weighting: $J^\pi_{\theta_0} \geq J_{\theta_0}$: $$J^\pi_{\theta_0} = J_{\theta_0} + \frac{1}{N_\nu} \sum_{i=1}^{N_\nu} E_{P_{\theta_0}} \left[ \left( \frac{1}{\pi_{\nu i}} - 1 \right) \dot{\ell}_{\theta_0}(y_{\nu i})\, \dot{\ell}_{\theta_0}(y_{\nu i})^T \right]$$ ◮ The shape of the asymptotic distribution is warped by unequal weighting $\propto 1/\pi_{\nu i}$ ◮ Under a less efficient (cluster) sampling design: $J^\pi_{\theta_0} \geq J_{\theta_0}$ ◮ Under a more efficient (stratified) sampling design: $J^\pi_{\theta_0} \leq J_{\theta_0}$ 21
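The inflation formula for $J^\pi$ under unequal weighting can be evaluated directly in the scalar Poisson case from the previous slide; the inclusion probabilities here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(11)

# Unequal weighting inflates J:
#   J^pi = J + (1/N) * sum_i (1/pi_i - 1) * E[score_i^2].
# Scalar Pois(exp(theta0)) example with hypothetical inclusion probabilities.
theta0 = 1.0
lam = np.exp(theta0)
N = 50_000
pi = rng.uniform(0.05, 0.5, size=N)   # unequal inclusion probabilities in (0, 1)

E_score_sq = lam                      # E[(y - lam)^2] = lam for Poisson
J = E_score_sq                        # value under equal-probability sampling
J_pi = J + np.mean((1.0 / pi - 1.0) * E_score_sq)

print(f"J = {J:.2f}, J^pi = {J_pi:.2f}  (J^pi >= J under unequal weighting)")
```

Since every $\pi_i < 1$, each term $(1/\pi_i - 1)$ is positive, so the weighting can only add variance in this setting.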

  22. Asymptotic Covariances Differ ◮ Pseudo MLE: $H_{\theta_0}^{-1} J^\pi_{\theta_0} H_{\theta_0}^{-1}$ (robust) ◮ Pseudo posterior: $H_{\theta_0}^{-1}$ (model-based) ◮ The un-adjusted pseudo posterior will give the wrong coverage for uncertainty regions. 22

  23. Adjust Pseudo Posterior Draws to Sandwich ◮ $\hat{\theta}_m \equiv$ sampled pseudo posterior draws for $m = 1, \ldots, M$, with mean $\bar{\theta}$ ◮ $\hat{\theta}^a_m = \left( \hat{\theta}_m - \bar{\theta} \right) R_2^{-1} R_1 + \bar{\theta}$ ◮ where $R_1' R_1 = H_{\theta_0}^{-1} J^\pi_{\theta_0} H_{\theta_0}^{-1}$ ◮ and $R_2' R_2 = H_{\theta_0}^{-1}$ 23
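A sketch of this draw adjustment, with made-up $2 \times 2$ matrices standing in for $H^{-1}$ and the sandwich: after the linear map $R_2^{-1} R_1$, the covariance of the centered draws becomes $R_1' R_1$, the sandwich.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up matrices playing the roles of H^{-1} and H^{-1} J^pi H^{-1}.
H_inv = np.array([[1.0, 0.3], [0.3, 0.5]])        # model-based covariance
sandwich = np.array([[1.8, 0.4], [0.4, 0.9]])     # robust (sandwich) covariance

# R' R factorizations (numpy's cholesky returns lower L with L L' = A, so R = L.T).
R1 = np.linalg.cholesky(sandwich).T
R2 = np.linalg.cholesky(H_inv).T

# Draws whose covariance is the (un-adjusted) model-based H^{-1}.
M = 200_000
draws = rng.multivariate_normal(mean=[0.0, 0.0], cov=H_inv, size=M)
theta_bar = draws.mean(axis=0)

# theta_a = (theta_m - theta_bar) R2^{-1} R1 + theta_bar
A = np.linalg.solve(R2, R1)          # R2^{-1} R1
adj = (draws - theta_bar) @ A + theta_bar

print("adjusted covariance:\n", np.cov(adj.T))   # ~ sandwich
```

Algebraically, $A' H^{-1} A = R_1' R_2^{-T} (R_2' R_2) R_2^{-1} R_1 = R_1' R_1$, which is why the adjusted draws carry the robust covariance while keeping the posterior mean fixed.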

  24. Adjustment Procedure ◮ Procedure to compute the adjustment $\hat{\theta}^a_m$ ◮ Input: $\hat{\theta}_m$ drawn from a single run of MCMC ◮ Re-sample the data under the sampling design: draw PSUs (clusters) without replacement ◮ Compute $\hat{H}_{\theta_0}$ and $\hat{J}^\pi_{\theta_0}$: expectations with respect to $P_{\theta_0}$, $P_\nu$ ◮ Let $P^\pi_{N_\nu} = \frac{1}{N_\nu} \sum_{i=1}^{N_\nu} \frac{\delta_{\nu i}}{\pi_{\nu i}} \delta(y_{\nu i})$ ◮ $J^\pi_{\theta_0} = \text{Var}_{P_{\theta_0}, P_\nu} \left( \sqrt{N_\nu}\, P^\pi_{N_\nu} \dot{\ell}_{\theta_0} \right)$ ◮ $H^\pi_{\theta_0} = -E_{P_{\theta_0}, P_\nu} \left( P^\pi_{N_\nu} \ddot{\ell}_{\theta_0} \right) = H_{\theta_0}$ 24
