Calibrated Bayes, an Inferential Paradigm for Official Statistics - PowerPoint PPT Presentation



SLIDE 1

Calibrated Bayes, an Inferential Paradigm for Official Statistics in the Era of Big Data

Rod Little

SLIDE 2

Overview

  • Design-based versus model-based survey inference
  • Calibrated Bayes
  • Some thoughts on Bayes and adaptive design

Ross-Royall Symposium talk

SLIDE 3

Survey estimation

  • Design-based inference: population values are fixed; inference is based on the probability distribution of sample selection. Obviously this assumes that we have a probability sample (or “quasi-randomization”, where we pretend that we have one)
  • Model-based inference: survey variables are assumed to come from a statistical model
  • Probability sampling is not the basis for inference, but is useful for making the sample selection ignorable (see e.g. Gelman et al., 2003; Little 2004)

SLIDE 4

Design vs model-based survey inference

  • Two main variants of model-based inference:
    – Superpopulation models: frequentist inference based on repeated samples from a “superpopulation” model (Royall)
    – Bayes: add a prior distribution for parameters; inference about finite population quantities or parameters is based on the posterior distribution
  • A fascinating part of the more general debate about frequentist versus Bayesian inference in statistics at large:
    – Design-based inference is inherently frequentist
    – The purest form of model-based inference is Bayes

SLIDE 5

Limitations of design-based approach

  • Inference is based on probability sampling, but true probability samples are harder and harder to come by:
    – Noncontact and nonresponse are increasing
    – Face-to-face interviews are increasingly expensive
    – Can’t do “big data” (e.g. internet, administrative data) from the design-based perspective
  • Theory is basically asymptotic -- limited tools for small samples, e.g. small area estimation

SLIDE 6

Design-Based Approach Has Implicit Models

  • Although not explicitly model-based, models are needed to motivate the choice of estimator
    – E.g. the Horvitz-Thompson (HT) estimator assumes an implicit model in which the ratios y_i / π_i are “exchangeable” (iid conditional on parameters)
    – If implicit models are unreasonable, then the resulting inferences can be very poor in moderate samples (Basu’s elephant being an extreme case)
  • Models arise more explicitly in the “model-assisted” paradigm (GREG)
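To make the implicit-model point concrete, here is a minimal numpy sketch of the HT estimator of a population total; the population, inclusion probabilities, and Poisson design are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of N units with known inclusion probabilities.
N = 1000
y = rng.normal(50, 10, size=N)         # survey variable
pi = rng.uniform(0.05, 0.30, size=N)   # inclusion probabilities pi_i

# Poisson sample: unit i is included independently with probability pi_i.
I = rng.random(N) < pi

# Horvitz-Thompson estimate of the population total: sum of y_i / pi_i over
# the sampled units. Design-unbiased, but it performs well only when the
# implicit model (the y_i / pi_i roughly exchangeable) is reasonable.
T_ht = np.sum(y[I] / pi[I])

print(round(T_ht), round(y.sum()))
```

If the pi_i were highly variable and unrelated to the y_i, the same estimator could behave very badly, which is the Basu's-elephant situation in miniature.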

SLIDE 7

“Quasi” design-based inference

  • Key feature of the design-based approach is weights, inversely proportional to the probability of inclusion
  • Weights for selection, nonresponse, poststratification
  • Modeling the inclusion propensities, using frequentist or Bayesian methods, leads to weights that are less variable, potentially increasing precision
  • Inference remains essentially design-based, in my view; a full Bayesian analysis involves models for the survey variables
  • Need terms to codify this distinction: maybe “weight modeling” and “prediction modeling”

SLIDE 8

Model-based approaches

  • In model-based, or model-dependent, approaches, models are the basis for the entire inference: estimator, standard error, interval estimation
  • Two main variants:
    – Superpopulation modeling
    – Bayesian (full probability) modeling
  • Common theme is to predict the non-sampled and nonresponding portion of the population, conditional on the sample and model
  • Superpopulation models are super, but Bayes is better!

SLIDE 9

Parametric models

Usually the prior distribution is specified via parametric models:

p(Y | Z) = ∫ p(Y | Z, θ) p(θ | Z) dθ

where p(Y | Z, θ) = parametric model, as in the superpopulation approach, and p(θ | Z) = prior distribution for θ.

Inference about θ is then obtained from its posterior distribution, computed via Bayes’ theorem:

p(θ | Y_inc, Z) ∝ p(θ | Z) × L(θ | Y_inc, Z),  where L(θ | Y_inc, Z) = likelihood function

That is: Posterior ∝ Prior × Likelihood. The posterior for θ leads to inference about population quantities via the posterior predictive distribution.
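As a toy illustration of the Posterior ∝ Prior × Likelihood step, here is a hypothetical conjugate normal model (not an example from the talk): a weak normal prior on the population mean θ, updated with an included sample, followed by the posterior predictive for a non-sampled unit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical included sample Y_inc ~ N(theta, sigma^2), sigma known.
sigma = 4.0
y_inc = rng.normal(10.0, sigma, size=50)
n = len(y_inc)

# Prior p(theta | Z) = N(mu0, tau0^2) -- weak, so the likelihood dominates.
mu0, tau0 = 0.0, 100.0

# Posterior ∝ Prior × Likelihood; conjugacy gives N(mu_n, tau_n^2) with a
# precision-weighted combination of prior mean and sample mean.
post_prec = 1 / tau0**2 + n / sigma**2
tau_n = post_prec ** -0.5
mu_n = (mu0 / tau0**2 + n * y_inc.mean() / sigma**2) / post_prec

# Posterior predictive for a non-sampled unit: N(mu_n, sigma^2 + tau_n^2).
pred_sd = (sigma**2 + tau_n**2) ** 0.5

print(round(mu_n, 2), round(tau_n, 2), round(pred_sd, 2))
```

With this weak prior the posterior mean is essentially the sample mean, which is the "objective Bayes" behavior described later in the talk.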

SLIDE 10

The model-based perspective- pros

  • Flexible, unified approach for all survey problems
    – Models for nonresponse, response and matching errors, small area models, combining data sources, big data
    – Causal inference requires models
  • Bayesian approach is not asymptotic; it provides better small-sample inferences
  • Probability sampling is justified as making the sampling mechanism ignorable, improving robustness
    – Rubin’s theory of ignorable selection/nonresponse is the right framework for assessing non-probability samples

SLIDE 11

The model-based perspective- cons

  • Explicit dependence on the choice of model, which has subjective elements (but assumptions are explicit)
  • Bad models provide bad answers -- justifiable concerns about the effect of model misspecification
  • Models are needed for all survey variables -- need to understand the data, and potential for more complex computations
  • Infrastructure: need personnel trained in statistical modeling

SLIDE 12

The current “status quo” -- design-model compromise

  • Design-based for large samples, descriptive statistics
    – But may be model assisted, e.g. regression calibration: model estimates adjusted to protect against misspecification (e.g. Särndal, Swensson and Wretman 1992)
  • Model-based for small area estimation, nonresponse, time series, …
  • Attempts to capitalize on the best features of both paradigms … but at the expense of “inferential schizophrenia” (Little 2012)?

T̂_GREG = Σ_{i=1}^N ŷ_i + Σ_{i=1}^N I_i (y_i − ŷ_i)/π_i,  ŷ_i = model prediction
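A numpy sketch of the GREG estimator under an invented population and a linear working model (ŷ_i comes from an ordinary least-squares fit on the sample; the auxiliary variable x and the inclusion probabilities are assumed known for all N units):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: y roughly linear in a known auxiliary x.
N = 2000
x = rng.uniform(0, 10, size=N)
y = 3.0 + 2.0 * x + rng.normal(0, 1, size=N)
pi = np.clip(0.05 + 0.02 * x, 0.05, 0.5)   # inclusion probabilities
I = rng.random(N) < pi                      # Poisson sample indicator

# Working model fitted on the sample: yhat_i = b0 + b1 * x_i.
b1, b0 = np.polyfit(x[I], y[I], 1)
yhat = b0 + b1 * x                          # predictions for ALL N units

# GREG: model predictions summed over the population, plus a design-weighted
# correction from the sample residuals: T = sum(yhat) + sum_s (y - yhat)/pi.
T_greg = yhat.sum() + np.sum((y[I] - yhat[I]) / pi[I])

print(round(T_greg), round(y.sum()))
```

The design-weighted residual term is what protects the estimate against misspecification of the working model, as on the slide above.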

SLIDE 13

Example: when is an area “small”?

[Figure: area estimates plotted against sample size n, with design-based inference used above a cutoff n0 and model-based inference below it]

n0 = “point of inferential schizophrenia”. How do I choose n0? If n0 = 35, should my entire statistical philosophy and inference be different when n = 34 and n = 36?

n = 36, CI: [ ] (wider, since based on the direct estimate); n = 34, CI: [ ] (narrower, since based on the model)

SLIDE 14

Multilevel (hierarchical Bayes) models

  • Bayesian multilevel model estimates borrow strength increasingly from the model as n decreases:

μ̃_a = w_a ȳ_a^π + (1 − w_a) μ̂_a

where ȳ_a^π is the direct (design-weighted) estimate for area a, μ̂_a is the model estimate, and the weight w_a increases from 0 to 1 with the sample size n.

[Figure: w_a rising with sample size n, moving the estimate from the model estimate toward the direct estimate]
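The borrowing-of-strength idea can be sketched numerically; the areas and values below are hypothetical, and the precision-based weight w_a used here is just one standard choice, not necessarily the one in the talk:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical small areas: true area means drawn from N(mu, tau^2).
mu, tau, sigma = 50.0, 5.0, 20.0
n_a = np.array([4, 16, 64, 256])              # area sample sizes
theta = rng.normal(mu, tau, size=len(n_a))    # true area means

# Direct estimates: area sample means with variance sigma^2 / n_a.
ybar = np.array([rng.normal(t, sigma / np.sqrt(n)) for t, n in zip(theta, n_a)])

# Precision-based weight on the direct estimate: grows toward 1 with n_a.
w = tau**2 / (tau**2 + sigma**2 / n_a)

# Composite: w_a * direct + (1 - w_a) * model estimate (here the overall mean).
mu_tilde = w * ybar + (1 - w) * mu

print(np.round(w, 3))
print(np.round(mu_tilde, 1))
```

As the slide says, small areas (small n_a, small w_a) are pulled toward the model estimate, and large areas stay close to their direct estimates, with no abrupt cutoff n0.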

SLIDE 15

Calibrated Bayes

  • Frequentists should be Bayesian
    – Bayes is optimal under a correctly specified model
  • Bayesians should be frequentist
    – We never know the model (and all models are wrong)
    – Inferences should be robust to misspecification and have good repeated-sampling characteristics
  • Calibrated Bayes (Box 1980, Rubin 1984, Little 2006, 2012, 2013)
    – Inference based on a Bayesian model
    – Model chosen to yield inferences that are well calibrated in a frequentist sense
    – Aim for posterior credibility intervals that have (approximately) nominal frequentist coverage

SLIDE 16

Calibrated Bayes models for surveys should incorporate sample design features

  • The “Calibrated” part of Calibrated Bayes implies:
  • Generally weak priors that are dominated by the likelihood (“objective Bayes”)
  • Models that incorporate sampling design features:
    – Capture design weights and stratifying variables as covariates in the prediction model (e.g. Gelman 2007)
    – Clustering via hierarchical random-effects models

SLIDE 17

Full model for Y and I

  • The full model factors into a model for the population and a model for inclusion:

p(Y, I | Z, θ, φ) = p(Y | Z, θ) p(I | Y, Z, φ)

  • Full posterior distribution of the parameters (hard):

p(θ, φ | Y_obs, Z, I) ∝ p(θ, φ | Z) L(θ, φ | Y_obs, Z, I)

  • Posterior distribution ignoring the inclusion mechanism (easier):

p(θ | Y_obs, Z) ∝ p(θ | Z) L(θ | Y_obs, Z)

  • When the full posterior reduces to this simpler posterior, the inclusion mechanism is called ignorable for Bayesian inference (Rubin 1976)

SLIDE 18

Conditions when inclusion mechanism can be ignored

  • Two general and simple sufficient conditions for ignoring the data-collection mechanism are:

Inclusion at Random (IAR): p(I | Y, Z, φ) = p(I | Y_obs, Z, φ) for all Y
Bayesian distinctness: p(θ, φ | Z) = p(θ | Z) p(φ | Z)

  • Ignorability is specific to the survey variable Y, unlike probability sampling, which guarantees ignorability for any outcome
  • In adaptive design, can include paradata or survey data from earlier waves

SLIDE 19

Bayes and responsive design

  • Predictive Bayes modeling has more potential for gains in efficiency than Bayesian weight modeling
    – Need to model the survey variables!
    – Specifically, model the relationship of the survey variables with the weights (as covariates)

SLIDE 20

Example: subsampling callbacks

  • Elliott and Little (2000 JASA) assessed subsampling callbacks for the National Comorbidity Study (NCS)
  • “Our analysis suggests that randomly dropping a subset of late callbacks will save resources whenever (a) the per callback or per interview cost is increasing, or (b) the probability of a successful interview attempt is decreasing… In general, it appears that surveys with constant or modestly increasing callback costs, such as the 1991 NCS, yield trivial savings, whereas surveys that change mode from postal to telephone or face-to-face interview, such as the U.S. Census Bureau's ACS, yield substantial savings.”

SLIDE 21

Example: subsampling callbacks

  • “… our approach yields conservative estimates of efficiency gains from subsampling, in the sense that calculations have assumed design-based inference for population means, with weights included to compensate for differential probabilities of selection. If modeling assumptions are made about the distributions of outcomes across callback strata, then different subsampling schemes might be optimal”

Elliott and Little (2000)

SLIDE 22

Example: weighting for nonresponse

How weighting adjustments affect the bias and variance of an estimated mean of Y, by the strength of the adjustment variables X as predictors of response R and of the outcome Y:

                     corr²(X, R) low       corr²(X, R) high
corr²(X, Y) low      bias ---, var ---     bias ---, var ↑
corr²(X, Y) high     bias ---, var ⇓       bias ⇓, var ⇓

Too often weighting adjustments put us in the cell where X predicts R but not Y… Modeling the relationship between the weights and the outcomes is needed to get us out of that square! We need good predictors of Y -- but we focus on predictors of R…
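The case where the adjustment variable X predicts response R but not the outcome Y can be reproduced in a small hypothetical simulation: inverse-propensity weighting then inflates the variance of the estimated mean without removing any bias, whereas with corr(X, Y) high the same weighting removes the nonresponse bias.

```python
import numpy as np

rng = np.random.default_rng(3)

def one_survey(corr_xy):
    """One survey with nonresponse driven by X; returns (unweighted, weighted) respondent means of Y."""
    n = 2000
    x = rng.normal(size=n)
    y = corr_xy * x + np.sqrt(1 - corr_xy**2) * rng.normal(size=n)
    p_resp = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))   # X strongly predicts response R
    r = rng.random(n) < p_resp
    w = 1 / p_resp[r]                              # inverse-propensity nonresponse weights
    return y[r].mean(), np.average(y[r], weights=w)

# Case 1: corr(X, Y) = 0 -- weighting has no bias to remove, but adds variance.
unw0, wtd0 = map(np.array, zip(*[one_survey(0.0) for _ in range(500)]))
var_ratio = wtd0.var() / unw0.var()

# Case 2: corr(X, Y) high -- weighting removes the nonresponse bias (true mean is 0).
unw8, wtd8 = map(np.array, zip(*[one_survey(0.8) for _ in range(500)]))

print("var ratio (weighted/unweighted), corr=0:", round(var_ratio, 2))
print("means, corr=0.8: unweighted", round(unw8.mean(), 3), "weighted", round(wtd8.mean(), 3))
```

The response model and correlations are invented, but the qualitative pattern matches the table: variable weights buy nothing, and cost precision, unless X also predicts Y.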

SLIDE 23

Example: Penalized Spline of Propensity Prediction (PSPP)

  • PSPP (Little & An 2004, Zhang & Little 2009, 2011)
  • Regression imputation that is
    – Nonparametric (spline) on the propensity to respond
    – Parametric on the other covariates
  • Exploits the key property of the propensity score that, conditional on the propensity score and assuming missing at random, missingness of Y does not depend on the other covariates
  • This property leads to a model-based version of double robustness (as in GREG)
  • Does very well in simulation studies

SLIDE 24

Penalized Spline of Propensity model

(Y | Y*, X_2, …, X_p; β) ~ N(s(Y*) + g(Y*, X_2, …, X_p; β), σ²),  where Y* = logit(Pr(R = 1 | X_1, …, X_p))

  • Nonparametric part s(Y*): needs to be correctly specified; we choose a penalized spline
  • Parametric part g(·): misspecification does not lead to bias; increases precision
  • X_1 excluded to prevent multicollinearity
  • Impute using the regression model
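A simplified sketch of the PSPP idea on simulated data. Two shortcuts relative to the method above: the true propensity is plugged in rather than estimated (in practice it would come from, e.g., a logistic regression), and a cubic polynomial in Y* stands in for the penalized spline s(Y*).

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data: Y depends (nonlinearly) on X1, and X1 also drives the
# missingness of Y, so the data are missing at random given X1.
n = 5000
x1 = rng.normal(size=n)
y = 0.5 * x1 + 0.3 * x1**2 + rng.normal(0, 0.3, size=n)
p_resp = 1 / (1 + np.exp(-(0.3 + 1.2 * x1)))   # response propensity
r = rng.random(n) < p_resp                      # respondent indicator

# Step 1: form Y* = logit Pr(R=1 | X1) (true propensity used for brevity).
ystar = np.log(p_resp / (1 - p_resp))

# Step 2: flexible regression of Y on Y* among respondents.
# A cubic polynomial stands in for the penalized spline s(Y*).
coefs = np.polyfit(ystar[r], y[r], deg=3)

# Step 3: impute nonrespondents from the fitted curve and estimate the mean.
y_imp = y.copy()
y_imp[~r] = np.polyval(coefs, ystar[~r])

print("true mean:", round(y.mean(), 3))
print("respondent mean:", round(y[r].mean(), 3))
print("imputed-data mean:", round(y_imp.mean(), 3))
```

The respondent mean is biased upward (response rises with X1, which raises Y), while the propensity-based imputation recovers the true mean, illustrating the key property on the slide above.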

SLIDE 25

Missing Not at Random Models

  • Difficult problem, since information to fit non-MAR models is limited and highly dependent on assumptions
  • Sensitivity analysis is the preferred approach -- though this form of analysis is not appealing to consumers of statistics, who want clear answers

SLIDE 26

An MNAR model: Proxy Pattern-Mixture Analysis

x_i = best predictor of y_i given covariates z_i (estimated on the respondents, and scaled to the same variance as y_i)

[x_i, y_i | r_i = r] ~ N(μ^(r), Σ^(r))
Pr(r_i = 1 | x_i, y_i) = g(y_i*),  y_i* = x_i + λ y_i;  MAR: λ = 0, MNAR: λ ≠ 0 (Andridge and Little 2011)

[y_i independent of r_i | y_i*(λ)], which identifies the model for a given λ; g() is arbitrary, unspecified. Sensitivity analysis for different choices of λ (e.g. 0, 1, ∞).

If x_i is a noisy measure of y_i, it may be plausible to assume λ = ∞, leading to a method of adjustment for predictors with measurement error (West and Little, 2013, Applied Statistics)

SLIDE 27

Indices of potential absolute bias (PAB) for a mean

  • Let ρ̂ be the estimated correlation between X and Y, based on the sample data.
  • Let x̄ denote the mean of X from the administrative data, and x̄_R, ȳ_R the means of X and Y from the respondents.
  • Setting λ = ∞ leads to the following measures of bias for the mean of Y:
  • Define the unadjusted potential absolute bias (PABU) as PABU = |x̄ − x̄_R| / ρ̂
  • Define the adjusted potential absolute bias (PABA) as PABA = (1 − ρ̂²) |x̄ − x̄_R| / ρ̂
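A sketch computing the two indices on hypothetical data. The formulas used are PABU = |x̄ − x̄_R|/ρ̂ and PABA = (1 − ρ̂²)|x̄ − x̄_R|/ρ̂, and the slide's rescaling of X to the variance of Y is omitted for simplicity, so the numbers here only illustrate the mechanics.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical setting: X known for everyone (administrative data), Y observed
# only for respondents; X and Y positively correlated.
n = 10000
x = rng.normal(100, 15, size=n)
y = 0.6 * (x - 100) + rng.normal(0, 12, size=n)
resp = rng.random(n) < 1 / (1 + np.exp(-(x - 100) / 15))  # response rises with X

xbar = x.mean()                              # mean of X from administrative data
xbar_r, ybar_r = x[resp].mean(), y[resp].mean()
rho = np.corrcoef(x[resp], y[resp])[0, 1]    # estimated corr(X, Y) on respondents

# Unadjusted and adjusted potential absolute bias for the mean of Y.
pabu = abs(xbar - xbar_r) / rho
paba = (1 - rho**2) * abs(xbar - xbar_r) / rho

print(f"PABU = {pabu:.2f}, PABA = {paba:.2f}")
```

Adjusting on X shrinks the potential bias by the factor (1 − ρ̂²): the stronger the proxy, the more of the potential bias the adjustment removes.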

SLIDE 28

Bayes and responsive design

  • Develop priors based on previous surveys
    – Design-based approach ignores (or treats informally) information from previous surveys
    – Bayes can use prior surveys as “meta-data” to inform decisions for the current survey
    – Priors can accommodate down-weighting of previous survey information: e.g. “power” priors (Chen and Ibrahim 2000 Stat Science)
    – Bayesian power calculations -- a neglected topic, particularly in the sample survey context

SLIDE 29

Bayesian updating

  • Bayes’ rule is the natural theorem for sequential decision-making:

p(θ | D_1, D_2, …, D_k) ∝ p(θ | D_1, D_2, …, D_{k−1}) L(θ | D_k),  D_k = data at stage k

  • Selection is ignorable for likelihood inference if the design at any stage depends on data before that stage
  • Basis for sequential treatment allocation in clinical trials -- which models the outcomes!
  • The relationship between outcomes and propensity (e.g. PSPP) can be modeled and updated from prior stages
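The stage-by-stage update can be illustrated with a conjugate Beta-Binomial model (the wave counts are hypothetical): updating the posterior after each wave gives exactly the same answer as a single update on the pooled data.

```python
# Beta-Binomial sequential updating: the posterior after wave k becomes the
# prior for wave k+1; the result matches one update on the pooled data.
waves = [(120, 200), (95, 180), (60, 150)]   # (successes, trials) per wave

a, b = 1.0, 1.0                              # Beta(1, 1) prior on the rate
for s, n in waves:
    a, b = a + s, b + (n - s)                # conjugate update, stage by stage

# Pooled update in one step:
S = sum(s for s, _ in waves)
N = sum(n for _, n in waves)
a_pool, b_pool = 1.0 + S, 1.0 + (N - S)

print((a, b), (a_pool, b_pool))              # identical: (276.0, 256.0) twice
```

This coherence under sequential updating is what makes Bayes a natural fit for adaptive and responsive designs, where each stage's design may depend on everything observed so far.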

SLIDE 30

Conclusion

  • I view Bayesian modeling as a natural framework for developing responsive design and analysis
  • No free lunch: models make assumptions
  • But assumptions are explicit and can be evaluated and criticized

SLIDE 31

References 1

Box, G.E.P. (1980). Sampling and Bayes inference in scientific modeling and robustness (with discussion). JRSSA, 143, 383-430.
Joyce, P.M., Malec, D., Little, R.J., Gilary, A., Navarro, A. and Asiala, M.E. (2014). Statistical modeling methodology for the Voting Rights Act Section 203 language assistance determinations. JASA, 109, 36-47.
Gelman, A. (2007). Struggles with survey weighting and regression modeling (with discussion and rejoinder). Statist. Sci., 22, 2, 153-164.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003). Bayesian Data Analysis, 2nd edition. New York: CRC Press.
Godambe, V.P. (1955). A unified theory of sampling from finite populations. JRSSB, 17, 269-278.
Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. JASA, 47, 663-685.
Little, R.J.A. (2004). To model or not to model? Competing modes of inference for finite population sampling. JASA, 99, 546-556.

SLIDE 32

References 2

Little, R.J.A. (2006). Calibrated Bayes: a Bayes/frequentist roadmap. Am. Statist., 60, 3, 213-223.
Little, R.J.A. (2012). Calibrated Bayes: an alternative inferential paradigm for official statistics (with discussion and rejoinder). JOS, 28, 3, 309-372.
Little, R.J.A. (2013). Survey sampling: past controversies, current orthodoxies, and future paradigms. In Past, Present and Future of Statistical Science, COPSS 50th Anniversary Volume, X. Lin, D.L. Banks, C. Genest, G. Molenberghs, D.W. Scott, and J.-L. Wang, eds. CRC Press.
Rubin, D.B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals Statist., 12, 1151-1172.
Särndal, C.-E., Swensson, B. and Wretman, J.H. (1992). Model Assisted Survey Sampling. New York: Springer Verlag.
Zheng, H. and Little, R.J. (2005). Inference for the population total from probability-proportional-to-size samples based on predictions from a penalized spline nonparametric model. JOS, 21, 1-20.
