SLIDE 1

Is Random Sampling Necessary?

Dan Hedlin Department of Statistics, Stockholm University

SLIDE 2

Focus on official statistics

  • Trust is paramount (Holt 2008)
  • Very wide group of users
  • Official statistics is official
  • Bias is important
  • Not as cost-sensitive as market research
  • Strong emphasis on generality and precision (on the trade-off in model building between generality, realism and precision, see Levins 1966 and Baker et al. 2013, Sec. 8.2)

10/3/2015 NTTS 2015. Dan Hedlin, Stockholm University 2

SLIDE 3

Trends

  • “Wider, Deeper, Quicker, Better, Cheaper” (Holt 2007)
  • Increasing rates of nonresponse in the Western world; people are hard to find and hard to contact
  • Expansion of data sources and data collection methods; mixed modes
  • Expanding research, mostly application-driven

SLIDE 4

On bias

  • In official statistics, a minimum-MSE estimator is not necessarily desirable: bias is not on the same footing as variance

  • Note that point estimates, not interval estimates, are used

  • Bias is (potentially) worse than variance
  • Also an issue of trust
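To make the bias/variance point concrete, here is a minimal Python sketch that splits the Monte Carlo MSE of an estimator into variance and squared bias, $\mathrm{MSE} = V + B^2$. The function name `mse_components` and the replicate values are hypothetical; the sketch only shows that two estimators can have the same MSE while one is unbiased and the other is not.

```python
from statistics import fmean, pvariance

def mse_components(estimates, true_value):
    """Return (bias, variance, mse) of replicate estimates around true_value.

    The population variance of the replicates is used, so that
    mse == variance + bias**2 holds up to floating-point rounding.
    """
    bias = fmean(estimates) - true_value
    variance = pvariance(estimates)
    mse = fmean((e - true_value) ** 2 for e in estimates)
    return bias, variance, mse

# Hypothetical replicate estimates of a true value of 100.
unbiased = [98.0, 102.0, 98.0, 102.0]   # bias 0, variance 4
biased = [102.0, 102.0, 102.0, 102.0]   # bias 2, variance 0

for name, est in [("unbiased", unbiased), ("biased", biased)]:
    b, v, m = mse_components(est, 100.0)
    print(f"{name}: bias={b:.1f} variance={v:.1f} mse={m:.1f}")
```

Both estimators have MSE 4, so a pure MSE criterion cannot tell them apart, although in official statistics the biased one is the more worrying.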

SLIDE 5

On balanced samples

  • What is balance?
  • Basically that this holds: π‘¦π‘˜

π‘œ = π‘¦π‘˜ 𝑂 𝑉 𝑑

for some number j (ignoring weights). (Valliant et al. 2000) β€œSample balance”

  • Or π’š

𝑑 = π’š 𝑠 β€œResponse set balance”

SLIDE 6

A balanced sample is good to have

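  • $\bar{y}_s - \bar{y}_r = (\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r)'\boldsymbol{\beta}_r + (\boldsymbol{\beta}_s - \boldsymbol{\beta}_r)'\bar{\mathbf{x}}_s$, where $\boldsymbol{\beta}_s$ and $\boldsymbol{\beta}_r$ are the regression coefficients of y on $\mathbf{x}$ in s and in r (Särndal & Lundquist 2014)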

  • Does small π’š

𝑑 βˆ’ π’š 𝑠 imply small 𝜸 𝑑 βˆ’ 𝜸 𝑠 ?

  • The answer is “probably yes” (Särndal & Lundquist 2014, Sec. 6)
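The Särndal & Lundquist decomposition is an algebraic identity once $\bar{y}_s = \bar{\mathbf{x}}_s'\boldsymbol{\beta}_s$ and $\bar{y}_r = \bar{\mathbf{x}}_r'\boldsymbol{\beta}_r$, which holds for least-squares fits that include an intercept. The Python sketch below checks it numerically for one auxiliary variable plus an intercept, so that $\bar{\mathbf{x}} = (1, \bar{x})'$. Unweighted OLS, the function names and the toy data are assumptions for illustration.

```python
from statistics import fmean

def ols_intercept_slope(x, y):
    """Unweighted least squares of y on (1, x); returns (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def decomposition(x_s, y_s, responded):
    """Return (lhs, rhs) of ybar_s - ybar_r = (xbar_s - xbar_r)' beta_r + (beta_s - beta_r)' xbar_s."""
    x_r = [v for v, ok in zip(x_s, responded) if ok]
    y_r = [v for v, ok in zip(y_s, responded) if ok]
    beta_s = ols_intercept_slope(x_s, y_s)
    beta_r = ols_intercept_slope(x_r, y_r)
    xbar_s = (1.0, fmean(x_s))   # auxiliary vector includes the constant 1
    xbar_r = (1.0, fmean(x_r))
    lhs = fmean(y_s) - fmean(y_r)
    imbalance_term = sum((a - b) * c for a, b, c in zip(xbar_s, xbar_r, beta_r))
    coef_term = sum((a - b) * c for a, b, c in zip(beta_s, beta_r, xbar_s))
    return lhs, imbalance_term + coef_term

x_s = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_s = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3]
responded = [True, True, False, True, False, True]
lhs, rhs = decomposition(x_s, y_s, responded)
print(lhs, rhs)   # equal up to floating-point rounding
```

The first term vanishes under response set balance, which is why a small imbalance matters only insofar as it also keeps $\boldsymbol{\beta}_s - \boldsymbol{\beta}_r$ small.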

SLIDE 7
  • So what is the causal mechanism behind “small $\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r$ → small $\boldsymbol{\beta}_s - \boldsymbol{\beta}_r$”? Loosely speaking, it is a small variance of the response propensities (within groups defined by x)

  • Note that we know $\bar{\mathbf{x}}_s$ and $\bar{\mathbf{x}}_r$, and that we can manipulate $\bar{\mathbf{x}}_r$ by adaptive sampling (Schouten et al. 2013, Särndal & Lundquist 2014)
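One way to quantify “small variance of response propensities within groups defined by x” is a pooled within-group variance. The sketch below assumes the propensities are known (as in a simulation) and that x is categorical; `within_group_variance` is a hypothetical helper returning the size-weighted average of the within-group population variances.

```python
from collections import defaultdict
from statistics import pvariance

def within_group_variance(propensities, groups):
    """Size-weighted mean of the within-group (population) variances of the propensities."""
    by_group = defaultdict(list)
    for p, g in zip(propensities, groups):
        by_group[g].append(p)
    n = len(propensities)
    return sum(len(ps) * pvariance(ps) for ps in by_group.values()) / n

groups = ["a", "a", "a", "b", "b", "b"]
homogeneous = [0.8, 0.8, 0.8, 0.4, 0.4, 0.4]     # constant within groups
heterogeneous = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # varies within groups
print(within_group_variance(homogeneous, groups))    # 0.0
print(within_group_variance(heterogeneous, groups))
```

Propensities may differ a lot between groups; what the argument needs is that they vary little inside each group.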

SLIDE 8

Comparing three strategies

1. Perfect frame + random sample + unknown response propensities + we strive for response set balance

2. The same as 1 but with a nonrandom sample

3. The same as 1 but with a restricted frame. There are auxiliary data on the frame, and the only deficiency is undercoverage, e.g. a large, “good” web panel.

SLIDE 9

Scene

  • Specify $f(\mathbf{Y} \mid \mathbf{X}; \boldsymbol{\beta})$ for all variables Y for all N units, with X used for sampling design and estimation.

1. Analytic aim: inference about $\boldsymbol{\beta}$

2. Descriptive aim: inference about $\mathbf{Y}_s$

  • Some y are used for post-stratification: $\mathbf{Y} = (\mathbf{Y}^{\mathrm{post}}, \mathbf{Y}^{m})$
  • For the robustness of post-stratification to nonresponse, see Särndal & Lundström (2005)

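  • We need $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{\mathrm{post}}, \mathbf{X}; \boldsymbol{\beta})$ for inference about $\mathbf{Y}_s$

  • Further, $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{\mathrm{post}}, \mathbf{Z}, \mathbf{X}; \boldsymbol{\beta})$, where Z indicates web panel membership in Strategy 3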
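For the descriptive aim, post-stratification variables are typically used in the post-stratified mean $\hat{\bar{y}}_{PS} = \sum_h (N_h/N)\,\bar{y}_{r,h}$, where $N_h$ are known population counts and $\bar{y}_{r,h}$ is the respondent mean in post-stratum h. The sketch below is a minimal unweighted version; the function name and data are hypothetical, and it assumes every post-stratum has at least one respondent.

```python
from collections import defaultdict

def post_stratified_mean(y_r, strata_r, pop_counts):
    """Post-stratified mean: sum over h of (N_h / N) * (respondent mean of y in stratum h)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for y, h in zip(y_r, strata_r):
        sums[h] += y
        counts[h] += 1
    n_pop = sum(pop_counts.values())
    # Raises ZeroDivisionError if some post-stratum has no respondents.
    return sum(n_h / n_pop * sums[h] / counts[h] for h, n_h in pop_counts.items())

# Respondents over-represent stratum "b"; the population counts restore the mix.
y_r = [10.0, 12.0, 20.0, 22.0, 21.0, 19.0]
strata_r = ["a", "a", "b", "b", "b", "b"]
print(post_stratified_mean(y_r, strata_r, {"a": 600, "b": 400}))
```

The unadjusted respondent mean here is about 17.3, while the post-stratified mean is 14.8; the adjustment is only as good as the assumption that respondents resemble nonrespondents within each post-stratum.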

SLIDE 10

1. Sample selection ignorability criterion: $f(\mathbf{I}_s \mid \mathbf{Y}_s, \mathbf{X}) = f(\mathbf{I}_s \mid \mathbf{Y}_s^{\mathrm{post}}, \mathbf{X})$

  • This holds also for some nonrandom sampling designs, e.g. sample-balanced designs (Little 1982, Smith 1983)

2. To be able to ignore nonresponse: $f(\boldsymbol{J}_r \mid \mathbf{I}_s, \mathbf{Y}_s, \mathbf{X}) = f(\boldsymbol{J}_r \mid \mathbf{I}_s, \mathbf{Y}_s^{\mathrm{post}}, \mathbf{X})$

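3. To be able to ignore the web panel selection mechanism: $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{\mathrm{post}}, \mathbf{Z}, \mathbf{X}; \boldsymbol{\beta}) = f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{\mathrm{post}}, \mathbf{X}; \boldsymbol{\beta})$ (Little 1982, Smith 1983, Valliant et al. 2003)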
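A small deterministic illustration of criterion 3 in the setting of Strategy 3. For a known finite population, the expected mean among web panel members in post-stratum h is approximated (ratio of expectations) by the propensity-weighted mean $\sum_k p_k y_k / \sum_k p_k$. If panel membership depends only on the post-stratum, the post-stratified estimate is unbiased; if it also depends on y within the post-stratum, it is not. The population, the propensity functions and all names are hypothetical.

```python
from collections import defaultdict

def expected_ps_estimate(y, strata, propensity):
    """Approximate expected post-stratified mean of panel members.

    Within each post-stratum the panel mean is approximated by the
    propensity-weighted mean sum(p_k * y_k) / sum(p_k); strata are weighted by N_h / N.
    """
    num, den, size = defaultdict(float), defaultdict(float), defaultdict(int)
    for k, (yk, h) in enumerate(zip(y, strata)):
        p = propensity(k)
        num[h] += p * yk
        den[h] += p
        size[h] += 1
    n = len(y)
    return sum(size[h] / n * num[h] / den[h] for h in size)

# Hypothetical population: post-stratum h = k % 2; y depends on h and varies within h.
N = 1000
strata = [k % 2 for k in range(N)]
y = [10.0 * h + (k % 5) for k, h in enumerate(strata)]
true_mean = sum(y) / N

def ignorable(k):
    # Membership depends on the post-stratum only (criterion 3 holds).
    return 0.2 if strata[k] == 0 else 0.6

def nonignorable(k):
    # Membership also depends on y within the post-stratum (criterion 3 fails).
    return 0.1 + 0.15 * (k % 5)

print(expected_ps_estimate(y, strata, ignorable) - true_mean)     # about 0
print(expected_ps_estimate(y, strata, nonignorable) - true_mean)  # about 0.75
```

In practice the propensities are unknown, which is why whether criterion 3 holds is an empirical question.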

SLIDE 11
  • We restrict attention to ignorable sampling designs; nonrandom samples must be ignorable
  • Hence it is Strategy 3 that is different.
  • Does criterion 3 hold in practice? Sjöström (2012), for example, found that sometimes it does and sometimes it does not. See also Baker et al. (2013).
  • Note also that Strategy 3 has, to some limited extent, always been in use in survey sampling, in particular in business surveys (cut-off sampling)
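Cut-off sampling deliberately leaves out units below a size threshold, which amounts to planned undercoverage. A minimal sketch, assuming a known size measure on the frame; the threshold and sizes are hypothetical, and the function reports which units are taken and what share of total size they cover.

```python
def cut_off_sample(sizes, threshold):
    """Take every unit with size >= threshold; return (indices, share of total size covered)."""
    taken = [k for k, s in enumerate(sizes) if s >= threshold]
    coverage = sum(sizes[k] for k in taken) / sum(sizes)
    return taken, coverage

sizes = [500.0, 300.0, 120.0, 40.0, 25.0, 10.0, 5.0]
taken, coverage = cut_off_sample(sizes, 100.0)
print(taken, round(coverage, 3))   # [0, 1, 2] 0.92
```

Three of seven units cover 92% of total size; estimates for the excluded small units then rest on a model, much as in Strategy 3.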

SLIDE 12

Further issues

  • Suppose you succeed in balancing the set of responses. Does it matter whether you started from a random sample or from a nonrandom, ignorable sample? It would seem that it does not.
  • A more practical issue: if you strive to balance the response set, is it easier to start from a random sample?
  • Which is better, balancing the response set or adjusting through estimation? There is some evidence that balancing is slightly better (Schouten et al. 2014)
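To make “striving for response set balance” concrete, here is a sketch of one possible adaptive step: with a fixed follow-up budget, repeatedly pick the nonrespondent whose inclusion brings the respondent mean of x closest to the sample mean. It assumes, unrealistically, that every followed-up case responds; the greedy rule and the names are illustrative assumptions, not the procedure of Schouten et al.

```python
from statistics import fmean

def greedy_follow_up(x_s, responded, budget):
    """Greedily add nonrespondents to the response set to reduce |xbar_r - xbar_s|.

    Assumes each followed-up unit responds; returns new response flags
    and leaves the input list unchanged.
    """
    responded = list(responded)
    target = fmean(x_s)
    for _ in range(budget):
        candidates = [k for k, ok in enumerate(responded) if not ok]
        if not candidates:
            break
        current = [x for x, ok in zip(x_s, responded) if ok]

        def gap_if_added(k):
            return abs(fmean(current + [x_s[k]]) - target)

        responded[min(candidates, key=gap_if_added)] = True
    return responded

def imbalance(x_s, responded):
    """Absolute difference between respondent and sample means of x."""
    return abs(fmean(x for x, ok in zip(x_s, responded) if ok) - fmean(x_s))

x_s = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
responded = [True, True, True, False, False, False, False, True]
print(imbalance(x_s, responded))
after = greedy_follow_up(x_s, responded, budget=1)
print(imbalance(x_s, after))
```

Here one targeted follow-up cuts the imbalance from 1.0 to 0.3; with a random starting sample the same rule applies unchanged.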

SLIDE 13
  • Of course, there is a broader picture (Schouten et al. 2012)

SLIDE 14

References

  • Baker, R. et al. (2013). Report on the AAPOR task force on non-probability sampling. American Association for Public Opinion Research.
  • Holt, D. (2007). The Official Statistics Olympic Challenge: Wider, Deeper, Quicker, Better, Cheaper (with discussion). The American Statistician, 61, 1–15.
  • Holt, D. (2008). Official statistics, public policy and public trust. Journal of the Royal Statistical Society, Series A, 171, 1–20.
  • Levins, R. (1966). The strategy of model building in population biology. American Scientist, 54, 421–431.
  • Little, R. J. A. (1982). Models for nonresponse in sample surveys. Journal of the American Statistical Association, 77, 237–250.
  • Särndal, C.-E. and Lundquist, P. (2014). Accuracy in estimation with nonresponse: a function of degree of imbalance and degree of explanation. Journal of Survey Statistics and Methodology, 1–27.
  • Särndal, C.-E. and Lundström, S. (2005). Estimation in Surveys with Nonresponse. New York: Wiley.

SLIDE 15
  • Schouten, B., Bethlehem, J., Beullens, K., Kleven, Ø., Loosveldt, G., Luiten, A., Rutar, K., Shlomo, N. and Skinner, C. (2012). Evaluating, comparing, monitoring, and improving representativeness of survey response through R-indicators and partial R-indicators. International Statistical Review, 80, 382–399.
  • Schouten, B., Calinescu, M. and Luiten, A. (2013). Optimizing quality of response through adaptive survey designs. Survey Methodology, 39, 29–58.
  • Schouten, B., Cobben, F., Lundquist, P. and Wagner, J. (2014). Theoretical and empirical support for adjustment of nonresponse by design. Discussion paper 2014/15, Statistics Netherlands.
  • Sjöström, T. (2012). Självrekryterade jämfört med slumpmässigt rekryterade paneler [Self-recruited compared with randomly recruited panels]. Novus, Sweden. (In Swedish.)
  • Smith, T. M. F. (1983). On the validity of inferences from non-random samples. Journal of the Royal Statistical Society, Series A, 146, 394–403.
  • Valliant, R., Dorfman, A. H. and Royall, R. M. (2000). Finite Population Sampling and Inference: A Prediction Approach. New York: Wiley.
