Calibrated Bayes: an attractive framework for official statistics in the 21st century


  1. Calibrated Bayes: an attractive framework for official statistics in the 21st century
     Roderick J. Little

  2. Overview
     • Design-based versus model-based survey inference
     • Current orthodoxy: design-model compromise
       – Strengths and drawbacks
     • An alternative: Calibrated Bayes
     • Two US Census Bureau applications
       – Disclaimer: views are mine, not US Census Bureau
     NTTS 2015: Calibrated Bayes

  4. Survey estimation
     • Design-based inference: population values are fixed, and inference is based on the probability distribution of sample selection. This assumes that we have a probability sample (or “quasi-randomization”, where we pretend that we have one).
     • Model-based inference: survey variables are assumed to come from a statistical model. Probability sampling is not the basis for inference, but is useful for making the sample selection ignorable (see e.g. Gelman et al., 2003; Little 2004).

  5. Design vs model-based survey inference
     • Two main variants of model-based inference:
       – Superpopulation models: frequentist inference based on repeated samples from a “superpopulation” model
       – Bayes: add prior distribution for parameters; inference about finite population quantities or parameters based on posterior distribution
     • A fascinating part of the more general debate about frequentist versus Bayesian inference in statistics at large:
       – Design-based inference is inherently frequentist
       – Purest form of model-based inference is Bayes

  6. Design-based inference
     • $Y = (Y_1, \ldots, Y_N)$ = population values (fixed); $Z$ = design variables
     • $Q = Q(Y, Z)$ = finite population quantity
     • $I = (I_1, \ldots, I_N)$ = sample inclusion indicators (random), where $I_i = 1$ if unit $i$ is included in the sample and $I_i = 0$ otherwise
     • $Y_{inc}$ = part of $Y$ included in the survey
     • $\hat{q} = \hat{q}(Y_{inc}, I, Z)$ = sample estimate of $Q$
     • $\hat{V} = \hat{V}(Y_{inc}, I, Z)$ = sample estimate of $V$, the variance of $\hat{q}$
     • $\left(\hat{q} - 1.96\sqrt{\hat{V}},\ \hat{q} + 1.96\sqrt{\hat{V}}\right)$ = 95% confidence interval for $Q$
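
     The quantities on the design-based inference slide can be sketched for the simplest design. This is a minimal illustration, assuming simple random sampling without replacement and taking Q to be the population mean; the function name `srs_mean_ci` and the made-up population are hypothetical and not from the talk.

```python
import math
import random

def srs_mean_ci(population, n, seed=0):
    """Design-based estimate of the population mean under simple random sampling.

    Assumption (not from the slides): SRS without replacement, Q = population mean.
    """
    rng = random.Random(seed)
    N = len(population)
    sample = rng.sample(population, n)      # the n units with I_i = 1 (Y_inc)
    q_hat = sum(sample) / n                 # sample mean as estimate of Q
    if n > 1:
        s2 = sum((y - q_hat) ** 2 for y in sample) / (n - 1)
    else:
        s2 = 0.0
    v_hat = (1 - n / N) * s2 / n            # variance estimate with finite population correction
    half = 1.96 * math.sqrt(v_hat)
    return q_hat, v_hat, (q_hat - half, q_hat + half)

# Made-up fixed population values Y_1, ..., Y_N
population = [float(y) for y in range(1, 1001)]
q_hat, v_hat, ci = srs_mean_ci(population, n=100)
print(q_hat, v_hat, ci)
```

     The only randomness is in which units are selected; the population list itself is treated as fixed, which is the defining feature of the design-based view.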

  7. Choice of $\hat{q}$
     • Seek good design-based properties:
       – Design unbiasedness: $E(\hat{q} \mid Y) = Q$ (too strong)
       – Or weaker, design consistency: $\hat{q} \to Q$ as sample size gets large
     • It is natural to seek an estimate that is design-efficient. However, this kind of optimality is not possible without a model (Horvitz and Thompson 1952, Godambe 1955).
     • There are many choices of design-consistent estimates ...
     • Many survey estimates are motivated by implicit models:
       – Regression model → regression estimator
       – Ratio model → ratio estimator, etc.

  8. Limitations of design-based approach
     • Inference is based on probability sampling, but true probability samples are harder and harder to come by:
       – Noncontact and nonresponse are increasing
       – Face-to-face interviews are increasingly expensive
       – A high proportion of available information is now not based on probability samples (e.g. internet, administrative data)
     • Theory is basically asymptotic: limited tools for small samples, e.g. small area estimation

  9. Design-based methods live in the land of asymptotia
     [Cartoon: the “Asymptotia Highlands” rising above the “murky sub-asymptotial forests”. Caption: “How many more to reach the promised land of asymptotia?”]

  10. Model-based approaches
      • In model-based, or model-dependent, approaches, models are the basis for the entire inference: estimator, standard error, interval estimation
      • Two variants:
        – Superpopulation modeling
        – Bayesian (full probability) modeling
      • Common theme is to “infer” or “predict” values for the non-sampled portion of the population, conditional on the sample and model
      • Superpopulation is super, but Bayes is better … for small samples

  11. Bayes inference for surveys
      • Model: $p(Y \mid Z)$ = prior distribution for $Y$
      • Data: $Y_{inc}$ = sampled values of $Y$; $Z$ = design variables
      • Inference about $Q = Q(Y, Z)$ is based on the posterior predictive distribution $p(Q(Y, Z) \mid Y_{inc}, Z)$
      • In particular:
        – One estimate is the posterior mean: $\hat{q} = E(Q \mid Y_{inc}, Z)$
        – Standard error is the posterior sd: $\sqrt{\mathrm{Var}(Q \mid Y_{inc}, Z)}$
        – 95% posterior probability interval plays the role of confidence interval (with a simpler interpretation)

  12. Parametric models
      • Usually the prior distribution is specified via parametric models:
        $$p(Y \mid Z) = \int p(Y \mid Z, \theta)\, p(\theta \mid Z)\, d\theta$$
        – $p(Y \mid Z, \theta)$ = parametric model, as in superpopulation approach
        – $p(\theta \mid Z)$ = prior distribution for $\theta$
      • Inference about $\theta$ is then obtained from its posterior distribution, computed via Bayes’ Theorem:
        $$p(\theta \mid Y_{inc}, Z) \propto p(\theta \mid Z)\, L(\theta \mid Y_{inc}, Z)$$
        – $L(\theta \mid Y_{inc}, Z)$ = likelihood function
      • That is: Posterior = Prior × Likelihood
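
      As a hedged illustration of “Posterior = Prior × Likelihood” applied to a finite population quantity, the sketch below uses a deliberately simple model that is an assumption of this example, not the talk’s: $Y_i \mid \theta \sim N(\theta, \sigma^2)$ with $\sigma^2$ known, a normal prior on $\theta$, an ignorable sample, and $Q$ = population mean. The function name `posterior_population_mean` is hypothetical.

```python
import math

def posterior_population_mean(y_inc, N, sigma2, prior_mean, prior_var):
    """Posterior mean, variance and 95% interval for the finite population mean Q.

    Assumed model (illustrative only): Y_i | theta ~ N(theta, sigma2), sigma2 known,
    theta ~ N(prior_mean, prior_var), sampling mechanism ignorable.
    """
    n = len(y_inc)
    ybar = sum(y_inc) / n
    # Conjugate update for theta: precisions add, means are precision-weighted
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + n * ybar / sigma2)
    m = N - n                               # number of non-sampled units
    if m == 0:
        return ybar, 0.0, (ybar, ybar)      # census: Q is known exactly
    # Q = (n*ybar + sum of non-sampled Y) / N; predict the non-sampled part
    q_mean = (n * ybar + m * post_mean) / N
    # Uncertainty in theta plus sampling noise in the non-sampled mean
    q_var = (m / N) ** 2 * (post_var + sigma2 / m)
    half = 1.96 * math.sqrt(q_var)
    return q_mean, q_var, (q_mean - half, q_mean + half)

print(posterior_population_mean([1.0, 2.0, 3.0, 4.0], N=10, sigma2=2.0,
                                prior_mean=0.0, prior_var=1e12))
```

      Under these assumptions, a very diffuse prior gives a posterior variance of approximately $(1 - n/N)\,\sigma^2/n$, the same form as the design-based variance of the sample mean under simple random sampling.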

  13. Example. Spline model on weights
      [Figure: sample and population, with design variable Z and survey variable Y]
      • HT estimator: $\bar{y}_{HT} = \frac{1}{N}\sum_{i=1}^{n} y_i/\pi_i$; $\pi_i$ = selection prob
      • A modeling alternative to the HT estimator is to create predictions from a more robust model relating $Y$ to $Z$:
        $$\bar{y}_{mod} = \frac{1}{N}\left(\sum_{i=1}^{n} y_i + \sum_{i=n+1}^{N} \hat{y}_i\right)$$
        with $\hat{y}_i$ = predictions from model, e.g.:
        – $y_i \sim \mathrm{Nor}(\beta\pi_i, \sigma^2\pi_i^2)$; leads to $\bar{y}_{HT}$
        – $y_i \sim \mathrm{Nor}(S(\pi_i), \sigma^2\pi_i^k)$; $S(\pi_i)$ = penalized spline of $Y$ on $Z$
      • Simulations in Zheng and Little (2005) suggest better RMSE and confidence coverage for the spline model compared with design-based approaches
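
      The link between the HT estimator and a model can be checked numerically. The sketch below is one possible reading, not the talk’s derivation: it assumes a working model in which $y_i/\pi_i$ has constant mean $\beta$ and constant variance, fits $\beta$ by weighted least squares, and predicts every unit in the population (a projective form, which differs from the $\bar{y}_{mod}$ on the slide in that sampled values are also replaced by predictions). With inclusion probabilities that sum to $n$, the result coincides with $\bar{y}_{HT}$. All names and data are hypothetical.

```python
def ht_mean(y_s, pi_s, N):
    """Horvitz-Thompson estimate of the population mean: (1/N) * sum of y_i / pi_i."""
    return sum(y / p for y, p in zip(y_s, pi_s)) / N

def ht_model_projective_mean(y_s, pi_s, pi_all):
    """Mean of model predictions beta_hat * pi_i over the whole population.

    Assumed working model: y_i has mean beta * pi_i and variance proportional to pi_i^2,
    so weighted least squares gives beta_hat = average of y_i / pi_i over the sample.
    """
    n = len(y_s)
    beta_hat = sum(y / p for y, p in zip(y_s, pi_s)) / n
    y_hat_all = [beta_hat * p for p in pi_all]
    return sum(y_hat_all) / len(pi_all)

# Made-up example: probability-proportional-to-size design with n = 3 of N = 6
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = 3
pi_all = [n * x / sum(sizes) for x in sizes]   # inclusion probabilities, summing to n
sample_idx = [1, 3, 5]                         # hypothetical selected units
y_s = [4.1, 7.9, 12.3]                         # hypothetical sampled values
pi_s = [pi_all[i] for i in sample_idx]

print(ht_mean(y_s, pi_s, len(sizes)))
print(ht_model_projective_mean(y_s, pi_s, pi_all))
```

      The two printed values agree up to floating-point rounding, which illustrates why a poorly fitting implicit model of this kind can make the HT estimator inefficient and why a more flexible mean function such as a spline is an appealing alternative.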

  14. The model-based perspective: pros
      • Flexible, unified approach for all survey problems
        – Models for nonresponse, response and matching errors, small area models, combining data sources
      • Bayesian approach is not asymptotic, provides better small-sample inferences
      • Probability sampling is justified as making sampling mechanism ignorable, improving robustness

  15. Models bring survey inference closer to the statistical mainstream
      [Cartoon featuring the “B/F Gorilla”. Speech bubbles: “Follow my (frequentist) statistical standards” and “Why? I am an economist, I build models!”]

  16. The model-based perspective: cons
      • Explicit dependence on the choice of model, which has subjective elements (but assumptions are explicit, not buried in a formula)
      • Bad models provide bad answers: justifiable concerns about the effect of model misspecification
      • Models are needed for all survey variables: need to understand the data, and potential for more complex computations

  17. Overview
      • Design-based versus model-based survey inference
      • Current orthodoxy: design-model compromise
        – Strengths and drawbacks
      • An alternative: Calibrated Bayes
      • Two US Census Bureau applications
        – Disclaimer: views are mine, not US Census Bureau

  18. The current “status quo”: design-model compromise
      • Design-based for large samples, descriptive statistics
        – But may be model assisted, e.g. regression calibration:
          $$\hat{T}_{GREG} = \sum_{i=1}^{N} \hat{y}_i + \sum_{i=1}^{N} I_i (y_i - \hat{y}_i)/\pi_i, \quad \hat{y}_i = \text{model prediction}$$
        – Model estimates adjusted to protect against misspecification (e.g. Särndal, Swensson and Wretman 1992)
      • Model-based for small area estimation, nonresponse, time series, …
      • Attempts to capitalize on best features of both paradigms … but … at the expense of “inferential schizophrenia” (Little 2012)?
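
      The regression calibration formula can be sketched in a few lines. This is a minimal illustration under assumptions that are not from the talk: a single auxiliary variable $x$ known for every population unit, and a working model $\hat{y}_i = b\,x_i$ fitted by unweighted least squares through the origin. The function name `greg_total` and the argument layout are hypothetical.

```python
def greg_total(x_all, sample_idx, y_s, pi_s):
    """Model-assisted (GREG-type) estimate of the population total of y.

    T_GREG = sum over the population of y_hat_i
             + sum over the sample of (y_i - y_hat_i) / pi_i
    Assumed working model (illustrative): y_hat_i = b * x_i, b from least squares.
    """
    x_s = [x_all[i] for i in sample_idx]
    # Least-squares slope through the origin, fitted on the sampled units
    b = sum(x * y for x, y in zip(x_s, y_s)) / sum(x * x for x in x_s)
    # Model predictions for every unit in the population
    y_hat_all = [b * x for x in x_all]
    # Design-weighted sum of residuals protects against model misspecification
    correction = sum((y - b * x) / p for x, y, p in zip(x_s, y_s, pi_s))
    return sum(y_hat_all) + correction

# Made-up example: N = 4 units, 2 sampled with inclusion probability 0.5 each
print(greg_total([1.0, 2.0, 3.0, 4.0], [0, 2], [2.5, 5.5], [0.5, 0.5]))
```

      When the working model fits the sampled data exactly, the correction term vanishes and the estimate is the sum of predictions; when every unit is sampled with probability 1, the estimate reduces to the observed total regardless of the model.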

  19. Example: when is an area “small”?
      [Figure: an “n-o-meter” scale of sample size $n$. Above the threshold $n_0$, the “point of inferential schizophrenia”, inference is design-based; below it, inference is model-based.]
      • How do I choose $n_0$?
      • If $n_0 = 35$, should my entire statistical philosophy and inference be different when $n = 34$ and $n = 36$?
        – $n = 36$: CI is wider, since it is based on the direct estimate
        – $n = 34$: CI is narrower, since it is based on the model

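  20. Multilevel (hierarchical Bayes) models
      $$\hat{\mu}_a = w_a\,\bar{y}_a + (1 - w_a)\,\tilde{\mu}_a$$
      where $\bar{y}_a$ is the direct estimate for area $a$, $\tilde{\mu}_a$ is the model estimate, and $w_a$ is a weight between 0 and 1.
      [Figure: “n-o-meter” showing the weight $w_a$ rising from 0 to 1 as the sample size $n$ increases]
      • Bayesian multilevel model estimates borrow strength increasingly from the model as $n$ decreases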
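
      A small numerical sketch shows how such a weight can vary smoothly with sample size, with no threshold $n_0$. The weight formula used here, $w_a = \tau^2 / (\tau^2 + \sigma^2/n_a)$, is an assumption of this example: it is the standard form for a one-way normal model with known within-area variance $\sigma^2$ and between-area variance $\tau^2$, and is not stated on the slide. Function names are hypothetical.

```python
def shrinkage_weight(n_a, sigma2, tau2):
    """Weight on the direct estimate for an area with sample size n_a.

    Assumed form (normal-normal model, variances known): tau2 / (tau2 + sigma2 / n_a).
    """
    return tau2 / (tau2 + sigma2 / n_a)

def multilevel_estimate(ybar_a, n_a, model_est, sigma2, tau2):
    """Weighted combination of the direct estimate and the model estimate."""
    w = shrinkage_weight(n_a, sigma2, tau2)
    return w * ybar_a + (1 - w) * model_est

# Made-up values: the weight rises toward 1 as the area sample size grows
for n_a in (1, 5, 34, 36, 500):
    w = shrinkage_weight(n_a, sigma2=4.0, tau2=4.0)
    est = multilevel_estimate(10.0, n_a, 20.0, sigma2=4.0, tau2=4.0)
    print(n_a, round(w, 3), round(est, 3))
```

      With these made-up variances, the weights at $n = 34$ and $n = 36$ are nearly identical, so the estimate changes gradually rather than switching philosophy at a cutoff.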

  21. Overview
      • Design-based versus model-based survey inference
      • Current orthodoxy: design-model compromise
        – Strengths and drawbacks
      • An alternative: Calibrated Bayes
      • Two US Census Bureau applications
        – Disclaimer: views are mine, not US Census Bureau
