Is Random Sampling Necessary?
Dan Hedlin
Department of Statistics, Stockholm University

Focus on official statistics
● Trust is paramount (Holt 2008)
● Very wide group of users
● Official statistics is official
● Bias is important
● Not as cost-sensitive as market research
● Strong emphasis on generality and precision (on the trade-off in model building between generality, realism and precision, see Levins 1966 and Baker et al. 2013, Sec. 8.2)
NTTS 2015. Dan Hedlin, Stockholm University 10/3/2015 2

Trends
● "Wider, Deeper, Quicker, Better, Cheaper" (Holt 2007)
● Increasing rates of nonresponse and of hard-to-find and hard-to-contact units in the Western world
● Expansion of data sources and data collection methods; mixed modes
● Expanding research, mostly application-driven

On bias
● In official statistics, a minimum-MSE estimator is not necessarily desirable: bias is not on the same footing as variance
● Note that point estimates, not interval estimates, are used
● Bias is (potentially) worse than variance
● It is also an issue of trust
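A small worked illustration of why the minimum-MSE estimator can be biased, using the decomposition MSE = variance + bias². The setting is hypothetical and not from the presentation: n independent draws with mean mu and variance sigma2, and the shrinkage estimator c·ȳ. The MSE-optimal c depends on the unknown mu, so this is an oracle sketch only.

```python
def shrinkage_mse(mu: float, sigma2: float, n: int, c: float) -> tuple[float, float, float]:
    """Bias, variance and MSE of the estimator c * ybar, where ybar is the mean
    of n independent draws with mean mu and variance sigma2."""
    bias = (c - 1.0) * mu            # E[c * ybar] - mu
    variance = c * c * sigma2 / n    # Var[c * ybar]
    return bias, variance, variance + bias ** 2   # MSE = variance + bias^2


def mse_optimal_c(mu: float, sigma2: float, n: int) -> float:
    """Shrinkage factor minimising the MSE above (it depends on the unknown mu)."""
    return mu ** 2 / (mu ** 2 + sigma2 / n)


mu, sigma2, n = 10.0, 400.0, 16      # hypothetical values
for label, c in [("unbiased (c = 1)", 1.0), ("min-MSE", mse_optimal_c(mu, sigma2, n))]:
    bias, var, mse = shrinkage_mse(mu, sigma2, n, c)
    print(f"{label:18s} c={c:.2f} bias={bias:+.2f} var={var:.2f} mse={mse:.2f}")
```

With these values the minimum-MSE estimator (c = 0.8) has MSE 20 against 25 for ȳ, but a bias of -2. A user who sees only the published point estimate has no way of noticing that bias, which is the slide's concern.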

On balanced samples
● What is balance?
● Basically, that this holds (ignoring weights), where s is the sample of size n and U the population of size N:
$\frac{1}{n}\sum_{k \in s} x_k^{j} = \frac{1}{N}\sum_{k \in U} x_k^{j}$ for $j = 1, \dots, J$
"Sample balance" (Valliant et al. 2000)
● Or, with r the response set: $\bar{\mathbf{x}}_r = \bar{\mathbf{x}}_s$
"Response set balance"
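Both notions of balance can be checked directly from the auxiliary data. The following is a minimal unweighted sketch. The function names, the toy population and the response pattern are hypothetical illustrations, not part of the presentation.

```python
import numpy as np


def sample_balance_gaps(x_pop, sample_idx, J):
    """Unweighted balance check of order J: for j = 1..J, the sample mean of x**j
    minus the population mean of x**j. All zeros means balance of order J."""
    x_pop = np.asarray(x_pop, dtype=float)
    x_s = x_pop[np.asarray(sample_idx)]
    return np.array([np.mean(x_s ** j) - np.mean(x_pop ** j) for j in range(1, J + 1)])


def response_set_gap(x_sample, responded):
    """Respondent mean minus full-sample mean of x; zero means response set balance."""
    x_sample = np.asarray(x_sample, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    return x_sample[responded].mean() - x_sample.mean()


x_U = np.arange(1, 9)                  # hypothetical population, N = 8
s = [0, 3, 4, 7]                       # sample {1, 4, 5, 8}, n = 4
print(sample_balance_gaps(x_U, s, J=2))     # balanced on the mean, not on x**2
x_s = x_U[s]
print(response_set_gap(x_s, [True, False, False, True]))  # respondents {1, 8}
print(response_set_gap(x_s, [True, True, True, False]))   # respondents {1, 4, 5}
```

The sample {1, 4, 5, 8} is balanced of order 1 but not of order 2. The response set {1, 8} keeps response set balance on x, while {1, 4, 5} does not.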

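A balanced sample is good to have
● $\bar{y}_s - \bar{y}_r = (\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r)'\boldsymbol{\beta}_r + (\boldsymbol{\beta}_s - \boldsymbol{\beta}_r)'\bar{\mathbf{x}}_s$ (Särndal & Lundquist 2014), where the bars denote means over the sample s and the response set r, and $\boldsymbol{\beta}_s$, $\boldsymbol{\beta}_r$ are the coefficients of the regression of y on x fitted in s and in r
● Does a small $\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r$ imply a small $\boldsymbol{\beta}_s - \boldsymbol{\beta}_r$?
● The answer is "probably yes" (Särndal & Lundquist 2014, Sec. 6)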
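When x contains an intercept, the OLS residuals in each fit sum to zero. That gives $\bar{y}_s = \bar{\mathbf{x}}_s'\boldsymbol{\beta}_s$ and $\bar{y}_r = \bar{\mathbf{x}}_r'\boldsymbol{\beta}_r$, and the decomposition follows by adding and subtracting $\bar{\mathbf{x}}_s'\boldsymbol{\beta}_r$. The sketch below checks this numerically in the simple unweighted case. The data-generating model, the response model and the seed are hypothetical. The paper's own setting is more general (e.g. with design weights).

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(2015)
n = 200
# Hypothetical sample s: intercept, a continuous x and a 0/1 x.
X_s = np.column_stack([np.ones(n), rng.normal(50, 10, n), rng.integers(0, 2, n)])
y_s = X_s @ np.array([5.0, 0.8, 3.0]) + rng.normal(0, 4, n)

# Hypothetical response mechanism: propensity increases with the continuous x.
propensity = 1 / (1 + np.exp(-(X_s[:, 1] - 50) / 10))
r = rng.random(n) < propensity            # response indicators within s

beta_s, beta_r = ols(X_s, y_s), ols(X_s[r], y_s[r])
xbar_s, xbar_r = X_s.mean(axis=0), X_s[r].mean(axis=0)

lhs = y_s.mean() - y_s[r].mean()
imbalance_term = (xbar_s - xbar_r) @ beta_r      # (xbar_s - xbar_r)' beta_r
coefficient_term = (beta_s - beta_r) @ xbar_s    # (beta_s - beta_r)' xbar_s
print(lhs, imbalance_term + coefficient_term)
```

The two printed numbers agree up to rounding. The first term vanishes under response set balance, so under balance the bias of the respondent mean rests entirely on the coefficient difference. That term is the subject of the question on the slide.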

● So what is the causal mechanism by which a small $\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r$ leads to a small $\boldsymbol{\beta}_s - \boldsymbol{\beta}_r$?
Loosely speaking, it is a small variance of the response propensities (in groups defined by x)
● Note that we know $\bar{\mathbf{x}}_s$ and $\bar{\mathbf{x}}_r$, and that we can manipulate $\bar{\mathbf{x}}_r$ through adaptive survey design (Schouten et al. 2013, Särndal & Lundquist 2014)
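One way to make "small variance of response propensities" concrete: if unit k responds with propensity $\rho_k$, the propensity-weighted respondent mean of x differs from the sample mean by $\mathrm{cov}(\rho, x)/\bar{\rho}$. By Cauchy-Schwarz this is at most $S(\rho)S(x)/\bar{\rho}$. The sketch below assumes the propensities are known, which they are not in practice, and works without weights. It also computes the R-indicator $1 - 2S(\rho)$ discussed in Schouten et al. (2012). All data values are hypothetical.

```python
import numpy as np


def r_indicator(rho):
    """R-indicator 1 - 2 * S(rho): equals 1 when all response propensities are equal."""
    return 1.0 - 2.0 * np.std(rho)


def expected_imbalance(x, rho):
    """Propensity-weighted respondent mean of x minus the sample mean of x.
    Algebraically equal to cov(rho, x) / mean(rho) (population moments)."""
    x, rho = np.asarray(x, dtype=float), np.asarray(rho, dtype=float)
    return np.sum(rho * x) / np.sum(rho) - np.mean(x)


def imbalance_bound(x, rho):
    """Cauchy-Schwarz bound S(rho) * S(x) / mean(rho) on |expected_imbalance|."""
    return np.std(rho) * np.std(x) / np.mean(rho)


x = np.array([20, 30, 40, 50, 60, 70])            # hypothetical auxiliary variable (e.g. age)
flat = np.full(6, 0.6)                              # equal propensities
steep = np.array([0.3, 0.4, 0.5, 0.6, 0.7, 0.8])    # propensities rising with x

for name, rho in [("flat", flat), ("steep", steep)]:
    print(name, r_indicator(rho), expected_imbalance(x, rho), imbalance_bound(x, rho))
```

With equal propensities the expected imbalance is zero and R = 1. With propensities rising linearly in x, the bound is attained and the respondent mean of x drifts upward by about 5.3.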

Comparing three strategies
1. Perfect frame + random sample + unknown response propensities + we strive for response set balance
2. The same as 1 but with a nonrandom sample
3. The same as 1 but with a restricted frame. There are auxiliary data on the frame, and the only deficiency is undercoverage. E.g. a large, "good" web panel

Scene
● Specify $f(\mathbf{Y} \mid \mathbf{X}; \boldsymbol{\beta})$ for all variables Y for all N units; X is used for the sampling design and estimation
1. Analytic aim: inference about $\boldsymbol{\beta}$
2. Descriptive aim: inference about $\mathbf{Y}_s$
● Some y are used for post-stratification: $\mathbf{Y} = (\mathbf{Y}^{\text{post}}, \mathbf{Y}^{m})$
● For the robustness of post-stratification to nonresponse, see Särndal & Lundström (2005)
● We need $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{\text{post}}, \mathbf{X}; \boldsymbol{\beta})$ for inference about $\mathbf{Y}_s$
● Further, $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{\text{post}}, \mathbf{Z}, \mathbf{X}; \boldsymbol{\beta})$, where Z indicates web panel membership in Strategy 3

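1. Sample selection ignorability criterion: $f(\mathbf{I}_s \mid \mathbf{Y}_s, \mathbf{X}) = f(\mathbf{I}_s \mid \mathbf{Y}_s^{\text{post}}, \mathbf{X})$, with I the sample inclusion indicators
● True also for some nonrandom sampling designs, e.g. sample-balanced designs (Little 1982, Smith 1983)
2. To be able to ignore nonresponse, with J the response indicators: $f(\mathbf{J}_r \mid \mathbf{I}_s, \mathbf{Y}_s, \mathbf{X}) = f(\mathbf{J}_r \mid \mathbf{I}_s, \mathbf{Y}_s^{\text{post}}, \mathbf{X})$
3. To be able to ignore the web panel selection mechanism: $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{\text{post}}, \mathbf{Z}, \mathbf{X}; \boldsymbol{\beta}) = f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{\text{post}}, \mathbf{X}; \boldsymbol{\beta})$
(Little 1982, Smith 1983, Valliant et al. 2000)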
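Criterion 3 says that, given the post-stratification variables (and X), panel membership carries no further information about the remaining study variables. The deterministic sketch below shows what is at stake for a post-stratified estimator. Each population unit joins a web panel with a known, hypothetical probability. X is omitted, and each stratum's panel mean is replaced by its propensity-weighted expectation. The function name and all numbers are illustrative assumptions.

```python
import numpy as np


def expected_poststratified_mean(y, stratum, p_panel):
    """Post-stratified estimate of the population mean of y, with each stratum's
    panel mean replaced by its propensity-weighted expectation sum(p*y)/sum(p)."""
    y = np.asarray(y, dtype=float)
    stratum = np.asarray(stratum)
    p = np.asarray(p_panel, dtype=float)
    estimate = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        panel_mean_h = np.sum(p[in_h] * y[in_h]) / np.sum(p[in_h])
        estimate += in_h.mean() * panel_mean_h      # N_h / N times the stratum panel mean
    return estimate


y = np.array([2, 4, 6, 10, 12, 14])        # hypothetical study variable (plays the role of Y^m)
stratum = np.array([0, 0, 0, 1, 1, 1])      # hypothetical post-strata (plays the role of Y^post)

# Panel membership depends on the post-stratum only: criterion 3 holds.
p_ignorable = np.array([0.1, 0.1, 0.1, 0.4, 0.4, 0.4])
# Panel membership also depends on y within post-strata: criterion 3 fails.
p_nonignorable = np.array([0.1, 0.2, 0.3, 0.2, 0.3, 0.4])

print("true mean:", y.mean())
print("ignorable panel:", expected_poststratified_mean(y, stratum, p_ignorable))
print("non-ignorable panel:", expected_poststratified_mean(y, stratum, p_nonignorable))
```

When membership depends only on the post-stratum, post-stratification recovers the true mean of 8 even though the panel heavily over-represents stratum 1. When membership also depends on y within strata, the estimate drifts to about 8.56, and no reweighting on $\mathbf{Y}^{\text{post}}$ alone can remove that bias.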

● We restrict attention to ignorable sampling designs; nonrandom samples must be ignorable
● Hence it is Strategy 3 that is different
● Does criterion 3 hold in practice? For example, Sjöström (2012) found that sometimes it does and sometimes it does not. See also Baker et al. (2013)
● Note also that Strategy 3 has, to some limited extent, always been in use in survey sampling, in particular in business surveys (cut-off sampling)

Further issues
● Suppose you succeed in balancing the response set. Does it matter whether you started from a random sample or from a nonrandom, ignorable sample? It would seem that it does not.
● A more practical issue: if you strive to balance the response set, is it easier to start from a random sample?
● Which is better, balancing the response set or adjusting through estimation? There is some evidence that balancing is slightly better (Schouten et al. 2014)

● Of course, there is a broader picture (Schouten et al. 2012)

References
● Baker, R. et al. (2013). Report on the AAPOR task force on non-probability sampling. American Association for Public Opinion Research.
● Holt, D. (2007). The Official Statistics Olympic Challenge: Wider, Deeper, Quicker, Better, Cheaper (with discussion). The American Statistician, 61, 1-15.
● Holt, D. (2008). Official statistics, public policy and public trust. Journal of the Royal Statistical Society, Series A, 171, 1-20.
● Levins, R. (1966). The strategy of model building in population biology. American Scientist.
● Little, R.J.A. (1982). Models for nonresponse in sample surveys. Journal of the American Statistical Association, 77, 237-250.
● Särndal, C.-E. and Lundquist, P. (2014). Accuracy in estimation with nonresponse: a function of degree of imbalance and degree of explanation. Journal of Survey Statistics and Methodology, 1-27.
● Särndal, C.-E. and Lundström, S. (2005). Estimation in Surveys with Nonresponse. New York: Wiley.
● Schouten, B., Bethlehem, J., Beullens, K., Kleven, Ø., Loosvelt, G., Luiten, A., Rutar, K., Shlomo, N. and Skinner, C. (2012). Evaluating, comparing, monitoring, and improving representativeness of survey response through R-indicators and partial R-indicators. International Statistical Review, 80, 382-399.
● Schouten, B., Calinescu, M. and Luiten, A. (2013). Optimizing quality of response through adaptive survey designs. Survey Methodology, 39, 29-58.
● Schouten, B., Cobben, F., Lundquist, P. and Wagner, J. (2014). Theoretical and empirical support for adjustment of nonresponse by design. Discussion paper 2014/15, Statistics Netherlands.
● Sjöström, T. (2012). Självrekryterade jämfört med slumpmässigt rekryterade paneler [Self-recruited compared with randomly recruited panels]. Novus, Sweden. (In Swedish.)
● Smith, T.M.F. (1983). On the validity of inferences from non-random samples. Journal of the Royal Statistical Society, Series A, 146, 394-403.
● Valliant, R., Dorfman, A.H. and Royall, R.M. (2000). Finite Population Sampling and Inference: A Prediction Approach. New York: Wiley.
