
Is Random Sampling Necessary? Dan Hedlin Department of Statistics, Stockholm University

Focus on official statistics
● Trust is paramount (Holt 2008)
● Very wide group of users
● Official statistics is official
● Bias important
● Not as cost sensitive as market research
● Strong emphasis on generality and precision (trade-off in model building between generality, realism and precision; see Levins 1966 and Baker et al. 2013, sec. 8.2)
NTTS 2015. Dan Hedlin, Stockholm University 10/3/2015

Trends
● “Wider, Deeper, Quicker, Better, Cheaper” (Holt 2007)
● Increasing rates of nonresponse; respondents hard to find and hard to contact in the Western world
● Expansion of data sources and data collection methods; mixed modes
● Expanding research, mostly application driven

On bias
● In official statistics, a minimum MSE estimator is not necessarily desirable. Bias is not on the same footing as variance
● Note that point estimates, not interval estimates, are used
● Bias is (potentially) worse than variance
● Also an issue of trust
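For reference, the remark about MSE rests on the standard decomposition below. The symbol $\hat{\theta}$ for a generic point estimator of $\theta$ is notation introduced here, not taken from the slides.

```latex
\operatorname{MSE}(\hat{\theta})
  = \operatorname{Var}(\hat{\theta})
  + \bigl[\operatorname{Bias}(\hat{\theta})\bigr]^{2},
\qquad
\operatorname{Bias}(\hat{\theta}) = \operatorname{E}(\hat{\theta}) - \theta .
```

Two estimators with the same MSE can thus split it very differently between variance and squared bias, which is why minimising MSE alone does not settle the choice.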

On balanced samples
● What is balance?
● Basically that this holds (ignoring weights): $\frac{1}{n}\sum_{s} x^{j} = \frac{1}{N}\sum_{U} x^{j}$ for some number j (Valliant et al. 2000). “Sample balance”
● Or $\bar{\mathbf{x}}_r = \bar{\mathbf{x}}_s$. “Response set balance”
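A minimal sketch of checking sample balance on a single auxiliary variable, assuming unweighted means and reading j as a power. The function name `balance_gap` and the six-unit population are hypothetical, chosen only for illustration.

```python
from statistics import fmean

def balance_gap(x_pop, sample_idx, j=1):
    """Sample mean of x**j minus population mean of x**j (unweighted)."""
    sample_mean = fmean(x_pop[i] ** j for i in sample_idx)
    pop_mean = fmean(x ** j for x in x_pop)
    return sample_mean - pop_mean

# Hypothetical population of N = 6 values of an auxiliary variable x.
x_pop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

balanced = [0, 5]    # x = 1 and 6: sample mean equals the population mean
unbalanced = [0, 1]  # x = 1 and 2: sample mean is too low

gap_balanced = balance_gap(x_pop, balanced, j=1)
gap_unbalanced = balance_gap(x_pop, unbalanced, j=1)
# The same sample need not be balanced on the second moment.
gap_balanced_j2 = balance_gap(x_pop, balanced, j=2)
```

A gap of zero for j = 1 says the sample is balanced on the mean of x; the nonzero gap for j = 2 shows that balance on one moment does not imply balance on another.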

A balanced sample is good to have
● $\bar{y}_s - \bar{y}_r = (\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r)'\boldsymbol{\beta}_r + (\boldsymbol{\beta}_s - \boldsymbol{\beta}_r)'\bar{\mathbf{x}}_s$ (Särndal & Lundquist 2014)
● Does small $\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r$ imply small $\boldsymbol{\beta}_s - \boldsymbol{\beta}_r$?
● The answer is “probably yes” (Särndal & Lundquist 2014, Sec. 6)
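The decomposition on this slide can be checked numerically. The sketch below assumes that the coefficient vectors are unweighted least-squares fits of y on (1, x) in the sample s and in the response set r. That is a simplification made here, as are the simulated data and the response mechanism.

```python
import random
from statistics import fmean

def ols(x, y):
    """Least squares of y on (1, x); returns (intercept, slope)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

random.seed(1)
# Hypothetical sample s: auxiliary x and study variable y.
x_s = [random.uniform(0, 10) for _ in range(200)]
y_s = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x_s]
# Hypothetical response set r: units with larger x respond more often.
resp = [random.random() < 0.2 + 0.06 * xi for xi in x_s]
x_r = [xi for xi, ri in zip(x_s, resp) if ri]
y_r = [yi for yi, ri in zip(y_s, resp) if ri]

b_s, b_r = ols(x_s, y_s), ols(x_r, y_r)
xbar_s = (1.0, fmean(x_s))  # mean of the vector (1, x) in s
xbar_r = (1.0, fmean(x_r))  # mean of the vector (1, x) in r

lhs = fmean(y_s) - fmean(y_r)
# (xbar_s - xbar_r)' b_r: the part driven by imbalance in x
imbalance_term = sum((a - b) * c for a, b, c in zip(xbar_s, xbar_r, b_r))
# (b_s - b_r)' xbar_s: the part driven by differing coefficients
coef_term = sum((a - b) * c for a, b, c in zip(b_s, b_r, xbar_s))
```

The identity holds exactly because a least-squares fit with an intercept passes through the point of means. A perfectly balanced response set makes `imbalance_term` zero, leaving only `coef_term`.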

● So what is the causal mechanism small $\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r$ -> small $\boldsymbol{\beta}_s - \boldsymbol{\beta}_r$? Loosely speaking, it is: small variance of response propensities (in groups defined by x)
● Note that we know $\bar{\mathbf{x}}_s$ and $\bar{\mathbf{x}}_r$, and that we can manipulate $\bar{\mathbf{x}}_r$ by adaptive sampling (Schouten et al. 2013, Särndal & Lundquist 2014)
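As a rough illustration of "variance of response propensities in groups defined by x", the sketch below estimates each group's propensity by its observed response rate and computes the size-weighted variance of those rates. Estimating propensities this way, the function name and the toy data are all assumptions made for this example.

```python
from collections import defaultdict

def propensity_variance(groups, responded):
    """Size-weighted variance of group response rates.

    groups: group label per sampled unit (groups defined by x).
    responded: True/False per sampled unit.
    """
    n_by_g = defaultdict(int)
    r_by_g = defaultdict(int)
    for g, r in zip(groups, responded):
        n_by_g[g] += 1
        r_by_g[g] += int(r)
    n = len(groups)
    overall = sum(r_by_g.values()) / n
    return sum(
        n_by_g[g] * (r_by_g[g] / n_by_g[g] - overall) ** 2 for g in n_by_g
    ) / n

# Hypothetical sample of 8 units in two groups.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
even = [True, True, False, False, True, False, True, False]    # 50% in both
uneven = [True, True, True, True, False, False, False, False]  # 100% vs 0%

var_even = propensity_variance(groups, even)
var_uneven = propensity_variance(groups, uneven)
```

Equal response rates across groups give zero variance. An adaptive design would direct follow-up effort at the low-rate groups to move toward that situation.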

Comparing three strategies
1. Perfect frame + random sample + unknown response propensities + we strive for response set balance
2. The same as 1 but with nonrandom sample
3. The same as 1 but with a restricted frame. There is auxiliary data on the frame. Only deficiency is undercoverage. E.g. large, “good” web panel.

Scene
● Specify $f(\mathbf{Y}|\mathbf{X}; \boldsymbol{\beta})$ for all variables Y for all N units. X for sampling design and estimation.
1. Analytic aim: inference about $\boldsymbol{\beta}$
2. Descriptive aim: inference about $\mathbf{Y}_s$
● Some y used for post-stratification: $\mathbf{Y} = (\mathbf{Y}^{\mathrm{post}}, \mathbf{Y}^{m})$
● For robustness of post-stratification to nonresponse, see Särndal & Lundström (2005)
● We need $f(\mathbf{Y}_s^{m}|\mathbf{Y}_s^{\mathrm{post}}, \mathbf{X}; \boldsymbol{\beta})$ for inference about $\mathbf{Y}_s$
● Further $f(\mathbf{Y}_s^{m}|\mathbf{Y}_s^{\mathrm{post}}, \mathbf{Z}, \mathbf{X}; \boldsymbol{\beta})$, where Z indicates web panel membership in Strategy 3

1. Sample selection ignorability criterion: $f(\mathbf{I}_s|\mathbf{Y}_s, \mathbf{X}) = f(\mathbf{I}_s|\mathbf{Y}_s^{\mathrm{post}}, \mathbf{X})$
● True also for some nonrandom sampling designs, e.g. sample balanced designs (Little 1982, Smith 1983)
2. To be able to ignore nonresponse: $f(\mathbf{J}_r|\mathbf{I}_s, \mathbf{Y}_s, \mathbf{X}) = f(\mathbf{J}_r|\mathbf{I}_s, \mathbf{Y}_s^{\mathrm{post}}, \mathbf{X})$
3. To be able to ignore web panel selection mechanism: $f(\mathbf{Y}_s^{m}|\mathbf{Y}_s^{\mathrm{post}}, \mathbf{Z}, \mathbf{X}; \boldsymbol{\beta}) = f(\mathbf{Y}_s^{m}|\mathbf{Y}_s^{\mathrm{post}}, \mathbf{X}; \boldsymbol{\beta})$
(Little 1982, Smith 1983, Valliant et al. 2003)
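A small simulation can illustrate what criterion 3 buys. In the sketch below a stratum label stands in for the post-stratification variable and y for the remaining study variable. The population model, the membership probabilities and the post-stratified mean estimator are all assumptions made for this illustration, not taken from the slides.

```python
import random
from statistics import fmean

random.seed(2015)
N = 20000
# Hypothetical population: stratum g (post-stratification variable) and y.
pop = []
for _ in range(N):
    g = random.randrange(3)
    y = 10.0 * g + random.gauss(0, 5)
    pop.append((g, y))

def poststratified_mean(panel, pop):
    """Weight panel stratum means by known population stratum shares."""
    est = 0.0
    for g in range(3):
        share = sum(1 for gg, _ in pop if gg == g) / len(pop)
        est += share * fmean(y for gg, y in panel if gg == g)
    return est

# Panel A: membership depends on g only, so criterion 3 holds.
p_g = {0: 0.05, 1: 0.15, 2: 0.30}
panel_a = [(g, y) for g, y in pop if random.random() < p_g[g]]
# Panel B: membership also depends on y within strata, so criterion 3 fails.
panel_b = [(g, y) for g, y in pop
           if random.random() < (0.30 if y > 10.0 * g else 0.05)]

true_mean = fmean(y for _, y in pop)
err_a = poststratified_mean(panel_a, pop) - true_mean
err_b = poststratified_mean(panel_b, pop) - true_mean
naive_err_a = fmean(y for _, y in panel_a) - true_mean
```

In this setup the unadjusted panel mean is badly off even for panel A, post-stratification repairs it when membership depends only on the post-stratification variable, and no such repair is available for panel B, where membership still depends on y within strata.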

● We restrict attention to ignorable sampling designs; nonrandom samples must be ignorable
● Hence it is Strategy 3 that is different.
● Does criterion 3 hold in practice? E.g. Sjöström (2012) found that sometimes it does, sometimes it does not. See also Baker et al. (2013).
● Note also that Strategy 3 has, to some limited extent, always been in use in survey sampling, in particular in business surveys (cut-off sampling)

Further issues
● Suppose you are successful in balancing the set of responses. Does it matter whether you have started from a random sample or a nonrandom, ignorable sample? It would seem that it does not.
● A more practical issue: If you strive for balancing the response set, is it easier to start from a random sample?
● What is best, balancing the response set or adjusting through estimation? There is some evidence that balancing is slightly better (Schouten et al. 2014)

● Of course, there is a broader picture (Schouten et al. 2012)

References
● Baker, R. et al. (2013). Report on the AAPOR task force on non-probability sampling. American Association for Public Opinion Research.
● Holt, D. (2007). The Official Statistics Olympic Challenge: Wider, Deeper, Quicker, Better, Cheaper. (With discussion). The American Statistician, 61, 1-15.
● Holt, D. (2008). Official statistics, public policy and public trust. Journal of the Royal Statistical Society, Series A, 171, 1-20.
● Levins, R. (1966). The strategy of model building in population biology. American Scientist.
● Little, R.J.A. (1982). Models for Nonresponse in Sample Surveys. Journal of the American Statistical Association, 77, 237-250.
● Särndal, C.-E. and Lundquist, P. (2014). Accuracy in Estimation with Nonresponse: A Function of Degree of Imbalance and Degree of Explanation. Journal of Survey Statistics and Methodology, 1-27.
● Särndal, C.-E. and Lundström, S. (2005). Estimation in Surveys with Nonresponse. New York: Wiley.
● Schouten, B., Bethlehem, J., Beullens, K., Kleven, Ø., Loosvelt, G., Luiten, A., Rutar, K., Shlomo, N. and Skinner, C. (2012). Evaluating, Comparing, Monitoring, and Improving Representativeness of Survey Response Through R-Indicators and Partial R-Indicators. International Statistical Review, 80, 382-399.
● Schouten, B., Calinescu, M. and Luiten, A. (2013). Optimizing quality of response through adaptive survey designs. Survey Methodology, 39, 29-58.
● Schouten, B., Cobben, F., Lundquist, P. and Wagner, J. (2014). Theoretical and Empirical Support for Adjustment of Nonresponse by Design. Discussion paper 2014/15, Statistics Netherlands.
● Sjöström, T. (2012). Självrekryterade jämfört med slumpmässigt rekryterade paneler. Novus, Sweden. (in Swedish)
● Smith, T.M.F. (1983). On the validity of inferences from non-random sample. Journal of the Royal Statistical Society, Series A, 146, 394-403.
● Valliant, R., Dorfman, A.H. and Royall, R.M. (2000). Finite Population Sampling and Inference: A Prediction Approach. New York: Wiley.
