SLIDE 1

Using Hierarchical Models to Calibrate Selection Bias

Douglas Rivers, Stanford University and YouGov, February 26, 2016

SLIDE 2

Margins of error

We agree that margin of sampling error in surveys has an accepted meaning and that this measure is not appropriate for non-probability samples. . . . We believe that users of non-probability samples should be encouraged to report measures of the precision of their estimates, but suggest that to avoid confusion, the set of terms be distinct from those currently used in probability sample surveys. AAPOR Report on Non-probability Sampling (2013)

SLIDE 3

Is inference possible with unknown selection probabilities?

◮ It had better be, since we certainly don't know what the selection probabilities are for most public opinion polls and market research surveys. With single-digit response rates, actual sample inclusion probabilities differ by two orders of magnitude from the initial unit selection probabilities.

◮ The usual approach is to assume ignorable selection (conditional independence of selection and survey variables given a set of covariates). Such inferences are made conditional upon the selection model, which is unlikely to hold exactly. Shouldn't this be reflected somehow in the margin of error?

◮ Empirically, calculated standard errors in pre-election polls substantially underestimate the RMSE. Gelman, Goel and Rothschild (2016) find that the actual RMSE was understated by between 25% and 50% (depending upon the type of election) in 4,221 polls.

SLIDE 4

Three questions

A $100(1-\alpha)\%$ confidence interval for a descriptive population parameter $\theta_0$ is usually computed as
$$\hat\theta \pm z_{1-\alpha/2}\,\mathrm{s.e.}(\hat\theta)$$
where $\hat\theta$ is a sample mean or proportion, possibly weighted.

1. Does $\hat\theta$ have a normal sampling distribution?
2. Can we estimate $\mathrm{s.e.}(\hat\theta)$ without knowing the selection probabilities?
3. Is the sampling distribution of $\hat\theta$ centered on the population parameter $\theta_0$?

If the answer to all three questions is "yes," then the confidence interval will have the stated level of coverage.

SLIDE 5
1. Is $\hat\theta$ normally distributed?

Suppose $\{y_i\}_{i=1}^N$ is a bounded sequence of real numbers and $\{D_i\}_{i=1}^N$ is a sequence of independent Bernoulli random variables with $E(D_i) = \pi_i$. Let
$$n = \sum_{i=1}^N D_i \qquad \hat\theta_N = \frac{1}{n}\sum_{i=1}^N D_i y_i \qquad \bar\pi_N = \frac{1}{N}\sum_{i=1}^N \pi_i$$
$$\theta^*_N = \frac{\sum_{i=1}^N \pi_i y_i}{N\bar\pi_N} \qquad \omega^2_N = \frac{1}{N\bar\pi_N}\sum_{i=1}^N \pi_i(1-\pi_i)(y_i - \theta^*_N)^2$$

If
(i) $\lim_{N\to\infty}\bar\pi_N = \bar\pi$ where $0 < \bar\pi < 1$
(ii) $\lim_{N\to\infty}\theta^*_N = \theta^*$
(iii) $\lim_{N\to\infty}\omega^2_N = \omega^2$ where $0 < \omega^2 < \infty$

then $\sqrt{n}\,(\hat\theta_N - \theta^*_N) \xrightarrow{L} N(0, \omega^2)$.
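This central limit result can be sanity-checked by simulation. The sketch below is a hypothetical setup (the population, the selection probabilities, and all variable names are invented for illustration): it draws repeated independent Bernoulli samples and compares the empirical variance of $\sqrt{n}(\hat\theta_N - \theta^*_N)$ to $\omega^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20_000
y = rng.uniform(0.0, 1.0, N)              # bounded population values
pi = 0.02 + 0.08 * y                      # selection probability rises with y

pi_bar = pi.mean()
theta_star = (pi * y).sum() / (N * pi_bar)                        # theta*_N
omega2 = (pi * (1 - pi) * (y - theta_star) ** 2).sum() / (N * pi_bar)

stats = []
for _ in range(2_000):
    D = rng.random(N) < pi                # independent Bernoulli(pi_i) draws
    n = D.sum()
    theta_hat = y[D].mean()               # hat(theta)_N
    stats.append(np.sqrt(n) * (theta_hat - theta_star))

print(np.var(stats), omega2)              # the two should be close
```

The analyst never needs the $\pi_i$ to compute $\hat\theta$; they are used here only to compute the theoretical target and variance.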

SLIDE 6
2. Can we estimate $\mathrm{s.e.}(\hat\theta)$?

Under the same assumptions as the preceding result,
$$\widehat{\mathrm{s.e.}}(\hat\theta) = \left[\frac{1}{n^2}\sum_{i\in s}(y_i - \hat\theta)^2\right]^{1/2}$$
is a conservative estimator of $\mathrm{s.e.}(\hat\theta)$ with asymptotic bias $O(\bar\pi_N)$. This also works with weighting, except that $y_i$ is replaced everywhere by $w_i y_i$ (where $w_i$ is the weight) and
$$\widehat{\mathrm{s.e.}}(\hat\theta) = \left[\frac{\sum_{i\in s} w_i^2 (y_i - \hat\theta)^2}{n^2 \bar w^2}\right]^{1/2}$$
Independence of the draws is enough. You don't need to know the selection probabilities.
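As a hypothetical illustration (all data are invented, and the selection probabilities are exaggerated relative to single-digit response rates so that the conservative gap is visible), the estimator can be computed from the realized sample alone and compared with the true standard error implied by the selection probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50_000
y = rng.binomial(1, 0.4, N).astype(float)   # dichotomous survey variable
pi = np.where(y == 1.0, 0.2, 0.3)           # nonignorable selection (exaggerated)

D = rng.random(N) < pi
ys = y[D]
n = ys.size
theta_hat = ys.mean()

# conservative estimator: needs only the realized sample
se_hat = np.sqrt(np.sum((ys - theta_hat) ** 2)) / n

# "true" s.e. computed from the selection probabilities (unknown in practice)
pi_bar = pi.mean()
theta_star = (pi * y).sum() / (N * pi_bar)
omega2 = (pi * (1 - pi) * (y - theta_star) ** 2).sum() / (N * pi_bar)
se_true = np.sqrt(omega2 / n)

print(se_hat, se_true)                      # se_hat exceeds se_true modestly
```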

SLIDE 7
3. Is the distribution of $\hat\theta$ centered on $\theta_0$?

Unfortunately, no. The sampling distribution of $\hat\theta$ is approximately
$$\hat\theta \overset{a}{\sim} N\!\left(\theta^*_N,\ \frac{\omega^2_N}{n}\right)$$
so the confidence interval $\hat\theta \pm z_{1-\alpha/2}\,\mathrm{s.e.}(\hat\theta)$ is shifted by the quantity
$$\mathrm{Bias}(\hat\theta) \doteq \theta^*_N - \theta_0$$
The margin of error has approximately correct coverage for $\theta^*_N$.

The interval is still useful for quantifying sampling error (how much variation could be expected from selecting another sample using the same process), but actual coverage for the population parameter is overstated (sometimes by a lot).
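A small simulation makes the point concrete (hypothetical population and invented names): nominal 95% intervals cover $\theta^*_N$ at roughly the stated rate, but essentially never cover $\theta_0$ once the bias is several standard errors wide.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30_000
y = rng.binomial(1, 0.5, N).astype(float)
pi = np.where(y == 1.0, 0.04, 0.06)          # selection depends on y
theta0 = y.mean()                            # population parameter
pi_bar = pi.mean()
theta_star = (pi * y).sum() / (N * pi_bar)   # what theta_hat actually targets

reps, cover_star, cover_0 = 1_000, 0, 0
for _ in range(reps):
    D = rng.random(N) < pi
    ys = y[D]
    n = ys.size
    theta_hat = ys.mean()
    se = np.sqrt(np.sum((ys - theta_hat) ** 2)) / n
    lo, hi = theta_hat - 1.96 * se, theta_hat + 1.96 * se
    cover_star += lo <= theta_star <= hi
    cover_0 += lo <= theta0 <= hi

print(cover_star / reps, cover_0 / reps)     # ~0.95 for theta*, near 0 for theta0
```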

SLIDE 8

Post-stratification to correct for selection bias

Bias can be eliminated if we can identify a set of covariates that make selection conditionally independent of the survey variables. The conditional independence (ignorability) assumption is more plausible if the number of covariates is large. However, post-stratification involves a bias-variance tradeoff. Post-stratifying on a large number of variables is a form of over-fitting which, while it may reduce bias, can increase the mean square error by inflating the variance:
$$\mathrm{MSE}(\hat\theta) = \mathrm{Bias}^2(\hat\theta) + V(\hat\theta)$$
Is the problem bias or variance?
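The slides that follow vary the number of raking variables. As background, raking (iterative proportional fitting) adjusts cell weights until the weighted margins match known population margins. The sketch below is a minimal two-variable implementation with invented counts and targets, not the procedure used in the study.

```python
import numpy as np

def rake(counts, row_targets, col_targets, max_iter=1_000, tol=1e-10):
    """Iterative proportional fitting: scale cell weights until the
    weighted margins of `counts` match the population targets."""
    w = np.ones_like(counts, dtype=float)
    for _ in range(max_iter):
        w *= (row_targets / (w * counts).sum(axis=1))[:, None]   # fix rows
        w *= (col_targets / (w * counts).sum(axis=0))[None, :]   # fix columns
        if np.allclose((w * counts).sum(axis=1), row_targets, atol=tol):
            break
    return w

# invented example: gender x age cells; the sample over-represents cell (0, 0)
counts = np.array([[200.0, 100.0], [50.0, 150.0]])
w = rake(counts, row_targets=np.array([250.0, 250.0]),
         col_targets=np.array([250.0, 250.0]))
weighted = w * counts
print(weighted.sum(axis=1), weighted.sum(axis=0))   # both match the targets
```

Raking with many margins is exactly where the variance inflation discussed above appears: extreme weights for rare cells drive up the standard error.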

SLIDE 9

Data

Seven opt-in internet surveys, a probability internet panel study, and an RDD phone survey fielded almost identical questionnaires in 2004-05, including eight items also included in the 2004 American Community Survey (ACS).

Primary demographics: gender, age, race, education.
Secondary demographics: marital status, home ownership, number of bedrooms, number of vehicles.

Six of the opt-in surveys used online panels, while one (SPSS-AOL) used a "river sample." All of the opt-in samples used some form of quota sampling on gender, sometimes on age and/or region, and only one on race. The probability internet panel (KN) used purposive sampling for within-panel selection, while it appears that the phone survey may have used gender quotas. Only one of the opt-in survey vendors (Harris Interactive) provided post-stratification weights.

SLIDE 10

Parallel estimates with different post-stratification schemes

[Figure: parallel estimates by survey. Each panel plots error (percent, scale −8 to 8) against the number of raking variables (1–5). Panels: Phone (SRBI); Probability Web (KN); 1. Harris; 2. Luth; 3. Greenfield; 4. SSI; 5. Survey Direct; 6. SPSS/AOL; 7. GoZing (error scale −20 to 10).]

SLIDE 11

95% confidence intervals for estimates

[Figure: 95% confidence intervals for the estimates by survey. Each panel plots error (percent, scale −8 to 8) against the number of raking variables (1–5). Panels: Phone (SRBI); Probability Web (KN); 1. Harris; 2. Luth; 3. Greenfield; 4. SSI; 5. Survey Direct; 6. SPSS/AOL; 7. GoZing (error scale −20 to 10).]

SLIDE 12

Variance inflation caused by post-stratification

Effects of Weighting on Standard Errors

[Figure: S.E. inflation (percent, scale 50–250) for cumulative sets of weighting variables: Gender; Gender + Age; Gender + Age + Race; Gender + Age + Race + Educ; Gender + Age + Race + Educ + Region.]

SLIDE 13

Testing for nonignorable selection bias

Problem: It is difficult to distinguish between sampling variability and selection bias in the full sample.

Idea: Post-stratify into 96 cells based on the primary demographics (2 gender × 4 age × 3 race × 4 education categories), compute the error in each cell for the four secondary demographics, and compare it to the expected sampling error if there is no selection bias. The ACS has a high response rate, and its large sample (nearly one million persons) provides reasonably accurate estimates of the population proportions in each cell.

SLIDE 14

Notation

$x$ = covariates with finite support $\mathcal{X}$
$y$ = survey variable (assumed to be dichotomous)
$p_0(x)$ = population distribution of $x$
$q_0(x) = P\{y = 1 \mid x\}$ = population conditional distribution of $y$
$\pi_y(x)$ = average sample selection probability for $(x, y)$ units
$\pi(x) = q_0(x)\pi_1(x) + [1 - q_0(x)]\pi_0(x)$ = selection probability in cell $x$
$\hat p(x)$ = sample proportion in cell $x$
$p^*(x) = E[\hat p(x)] = p_0(x)\pi(x)\big/\sum_{x'\in\mathcal{X}} p_0(x')\pi(x')$
$\hat q(x)$ = sample proportion with $y = 1$ in cell $x$
$q^*(x) = E[\hat q(x)] = q_0(x)\pi_1(x)/\pi(x)$

SLIDE 15

Sources of error

Estimation error has three components:
$$\hat\theta - \theta_0 = \underbrace{\sum_{x\in\mathcal{X}} \big\{[\hat p(x) - p^*(x)]\,\hat q(x) + p^*(x)\,[\hat q(x) - q^*(x)]\big\}}_{\text{sampling}} + \underbrace{\sum_{x\in\mathcal{X}} [p^*(x) - p_0(x)]\,q^*(x)}_{\text{post-stratification}} + \underbrace{\sum_{x\in\mathcal{X}} p_0(x)\,[q^*(x) - q_0(x)]}_{\text{selection bias}}$$
Post-stratification error can be eliminated by weighting the observations in cell $x$ by the ratio $p_0(x)/\hat p(x)$. We wish to test whether the last component (selection bias) is present:
$$q^*(x) = q_0(x) \quad \text{for } x \in \mathcal{X}.$$
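The decomposition is an algebraic identity, which a quick numerical check confirms (all quantities below are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
K = 8                                      # cells x
p0 = rng.dirichlet(np.ones(K))             # population distribution of x
q0 = rng.uniform(0.2, 0.8, K)              # P{y = 1 | x}
pi0 = rng.uniform(0.02, 0.10, K)           # selection prob. for y = 0 units
pi1 = rng.uniform(0.02, 0.10, K)           # selection prob. for y = 1 units

pix = q0 * pi1 + (1 - q0) * pi0            # pi(x)
p_star = p0 * pix / (p0 * pix).sum()
q_star = q0 * pi1 / pix

# stand-ins for observed sample proportions (expectations plus noise)
p_hat = np.abs(p_star + rng.normal(0, 0.01, K))
p_hat /= p_hat.sum()
q_hat = np.clip(q_star + rng.normal(0, 0.02, K), 0, 1)

theta_hat = (p_hat * q_hat).sum()
theta0 = (p0 * q0).sum()

sampling = ((p_hat - p_star) * q_hat + p_star * (q_hat - q_star)).sum()
post_strat = ((p_star - p0) * q_star).sum()
selection = (p0 * (q_star - q0)).sum()

print(theta_hat - theta0, sampling + post_strat + selection)   # identical
```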

SLIDE 16

A chi-squared test for selection bias

Let $n_x$ denote the sample count in cell $x$. Conditional upon $\{n_x : x \in \mathcal{X}\}$, the statistic
$$X^2 = \sum_{n_x > 0} \left[\frac{\hat q(x) - q_0(x)}{\mathrm{s.e.}(\hat q(x))}\right]^2$$
has expected value $J$ (where $J$ is the number of cells for which $n_x > 0$) if there is no selection bias. When there is selection bias, $X^2/J$ will tend to be large. A rough test of the hypothesis compares $X^2$ to a chi-square distribution with $J$ degrees of freedom.
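A hypothetical sketch of the test (invented cell counts and benchmarks, with a deliberately exaggerated nonignorable bias so the rejection is clear): simulate cell-level estimates, form $X^2$, and compare with the $\chi^2_J$ critical value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
K = 96                                     # cells, as on the slides
q0 = rng.uniform(0.3, 0.7, K)              # benchmark (ACS) proportions
bias = 0.06 * rng.standard_normal(K)       # exaggerated nonignorable bias
q_sel = np.clip(q0 + bias, 0.05, 0.95)     # selected-sample cell proportions

n_x = rng.integers(30, 200, K)             # cell sample counts
q_hat = rng.binomial(n_x, q_sel) / n_x
se = np.sqrt(q_hat * (1 - q_hat) / n_x)

keep = (n_x > 0) & (se > 0)                # drop empty or degenerate cells
J = int(keep.sum())
X2 = np.sum(((q_hat[keep] - q0[keep]) / se[keep]) ** 2)

print(X2 / J, stats.chi2.ppf(0.95, J) / J)   # X2/J well above the critical ratio
```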

SLIDE 17

Results

[Figure: chi-squared / degrees of freedom (scale 5–25) against the number of raking variables (1–5) for each survey: Phone (SRBI); Probability Web (KN); 1. Harris; 2. Luth; 3. Greenfield; 4. SSI; 5. Survey Direct; 6. SPSS/AOL; 7. GoZing.]

SLIDE 18

Estimating the magnitude of selection bias

Although the amount of selection bias is not particularly large, we can still reject the hypothesis of no selection bias. We model the bias as a random variable with a multilevel variance structure. We have $S+1$ surveys, denoted by $s = 0, 1, \ldots, S$ ($s = 0$ for the ACS).

Sampling model: $y_{i,s} \mid x_i \sim \mathrm{Bernoulli}(q^*_s(x))$
First level: $\mathrm{logit}(q^*_s(x)) \sim \mathrm{Normal}(\mu_0(x) + \eta_s,\ \tau^2_x + \sigma^2_s)$
Second level: $\mu_0(x) \sim \mathrm{Normal}(Z_x^T\alpha,\ \omega^2)$, where
$$Z_x^T\alpha = \alpha_{\text{gender}} + \alpha_{\text{age}} + \alpha_{\text{race}} + \alpha_{\text{education}}$$

We use diffuse normal priors for the means and half-Cauchy priors (with scale 5) for the variances. We impose the restriction that $\eta_0$ (the bias in the ACS) is zero.
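The slides do not say how the model was fit. As a structural sketch only, the unnormalized log-posterior of a simplified version can be written in plain NumPy; the first-level variance components $\tau^2_x + \sigma^2_s$ are omitted for brevity, and every function and variable name here is invented for illustration.

```python
import numpy as np

def log_posterior(params, y_sum, n_obs, Z, scale=5.0):
    """Unnormalized log-posterior for a simplified version of the model:
    logit(q_s(x)) = mu0(x) + eta_s, mu0(x) ~ N(Z_x'alpha, omega^2),
    eta_0 fixed at zero, half-Cauchy(scale) prior on omega.
    The first-level variances tau^2_x + sigma^2_s are omitted for brevity."""
    K, S = y_sum.shape
    P = Z.shape[1]
    alpha = params[:P]
    mu0 = params[P:P + K]
    eta = np.concatenate([[0.0], params[P + K:P + K + S - 1]])   # eta_0 = 0
    log_omega = params[-1]
    omega = np.exp(log_omega)

    q = 1.0 / (1.0 + np.exp(-(mu0[:, None] + eta[None, :])))
    ll = np.sum(y_sum * np.log(q) + (n_obs - y_sum) * np.log(1 - q))
    # second level: mu0(x) ~ Normal(Z alpha, omega^2)
    lp = -0.5 * np.sum((mu0 - Z @ alpha) ** 2) / omega**2 - K * log_omega
    # diffuse normal priors on alpha and eta
    lp += -0.5 * np.sum(alpha**2) / 100.0 - 0.5 * np.sum(eta**2) / 100.0
    # half-Cauchy(scale) on omega, plus Jacobian for the log transform
    lp += -np.log1p((omega / scale) ** 2) + log_omega
    return ll + lp

# tiny fabricated data: 4 cells, 2 surveys, 2 covariate columns
rng = np.random.default_rng(5)
K, S, P = 4, 2, 2
Z = rng.standard_normal((K, P))
n_obs = np.full((K, S), 200.0)
y_sum = rng.binomial(200, 0.4, (K, S)).astype(float)
theta = np.zeros(P + K + (S - 1) + 1)
print(log_posterior(theta, y_sum, n_obs, Z))    # finite log-density
```

In practice a log-posterior of this shape would be handed to an MCMC sampler; the last three prior terms mimic the diffuse normals and half-Cauchy(5) priors reported on the slide.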

SLIDE 19

Multilevel model estimates

[Figure: estimated bias (percent, scale −6 to 4) from the multilevel model for each survey: Phone (SRBI); Probability Web (KN); 1. Harris; 2. Luth; 3. Greenfield; 4. SSI; 5. Survey Direct; 6. SPSS/AOL; 7. GoZing.]

SLIDE 20

Discussion

◮ The model is estimated separately for each survey variable. Alternatively, estimate the model for a set of variables and use the predictive distribution for another variable or survey to estimate how much the MSE exceeds the variance of the estimate.

◮ The current data are limited by items correlated with income and family size, which should be used as covariates.

◮ There is little evidence of substantial selection bias: for most of the surveys, selection bias adds 1–2% to the standard error.

◮ Easy improvements are possible in panel surveys by selecting a balanced sample, reducing variability. The good performance of the probability panel is due primarily to balanced within-panel selection, not probabilistic recruitment.