Robust and Efficient Methods of Inference for Non-Probability Samples: Application to Naturalistic Driving Data



slide-1
SLIDE 1

Robust and Efficient Methods of Inference for Non-Probability Samples: Application to Naturalistic Driving Data

Ali Rafei (1), Michael R. Elliott (1), Carol A.C. Flannagan (2)

(1) Michigan Program in Survey Methodology; (2) University of Michigan Transportation Research Institute

JPSM/MPSM Seminar 2020

September 30

Rafei, Ali (MPSM) Robust Inference for Non-Probability Samples JSM 2020 1 / 35

slide-2
SLIDE 2

Problem statement

Probability sampling is the gold standard for finite population inference. Yet the 21st century has seen a re-emergence of non-probability sampling:

1. Response rates are steadily declining.
2. Massive unstructured data are increasingly available.
3. Convenience samples are easier, cheaper, and faster to collect.
4. Rare events, such as crashes, require long-term follow-up.

slide-3
SLIDE 3

Naturalistic Driving Studies (NDS)

One real-world application of sensor-based Big Data.
Driving behaviors are monitored via instrumented vehicles.
A rich resource for exploring crash causality, traffic safety, and travel dynamics.

[Figure: NDS data components: vehicle, trip, driver, event]

slide-6
SLIDE 6

Strategic Highway Research Program 2

Launched in 2010, SHRP2 is the largest NDS conducted to date.
Participants were ∼3,150 volunteers from six sites across the U.S.
∼5M trips and ∼50M driven miles were recorded. (Trip: the time interval during which the vehicle is on.)

Major challenges:

1. SHRP2 is a non-probability sample.
2. The youngest and eldest age groups were oversampled.
3. Only six sites were studied.

[Figure: participants' age group (years), percent by study]

Age group   15-24  25-34  35-44  45-54  55-64  65-74   75+
SHRP2        37.2   14.1    7.4    9.3    7.6   10.6  13.8
US Pop       11.5   17.1   19.3   18.4   18.1   10.9   4.7

[Figure: population size of residential area (x1000), percent by study]

Size (x1000)   <50  50-200  200-500  500-1000  1000+
SHRP2          4.2    14.3      5.3      33.0   43.1
NHTS          27.8     9.8     10.6       9.1   42.6

slide-7
SLIDE 7

Basic framework

Define the following notation:

1. B: Big non-probability sample
2. R: Reference survey
3. X: Set of common auxiliary variables
4. Y: Outcome variable of interest
5. Z: Indicator of being in B

Under the MAR and positivity assumptions given X, three approaches are considered:

1. Quasi-randomization (QR): estimating pseudo-inclusion probabilities (πB) in B
2. Prediction modeling (PM): predicting the outcome variable (Y) for units in R
3. Doubly robust adjustment (DR): combining the two to further protect against model misspecification

Combine B with R and define Z = I(i ∈ B).

[Figure: schematic of the combined sample (B stacked on R) with columns Z, X, π, and Y; "?" marks unobserved entries.]

slide-11
SLIDE 11

Quasi-randomization

Traditionally, propensity scores (PS) are used to estimate pseudo-weights (Lee 2006).

PS weighting when R is epsem:

$$\bar{y}_{PW} = \frac{1}{N} \sum_{i=1}^{n_B} \frac{y_i}{\pi^B(x_i)}$$

where, under a logistic regression model, we have

$$\pi^B(x_i) \propto p_i(\beta) = P(Z_i = 1 \mid x_i; \beta) = \frac{\exp\{x_i^T \beta\}}{1 + \exp\{x_i^T \beta\}}, \quad \forall i \in B$$

When R is NOT epsem, β can be estimated through a PMLE approach by solving one of:

1. $\sum_{i \in B} x_i [1 - p_i(\beta)] - \sum_{i \in R} x_i p_i(\beta) / \pi_i^R = 0$ (odds of PS) (Wang et al. 2020)
2. $\sum_{i \in B} x_i - \sum_{i \in R} x_i p_i(\beta) / \pi_i^R = 0$ (Chen et al. 2019)
3. $\sum_{i \in B} x_i / p_i(\beta) - \sum_{i \in R} x_i / \pi_i^R = 0$ (Kim 2020)
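The estimating equation attributed to Chen et al. (2019) can be solved numerically. The following is a minimal sketch rather than the authors' implementation: it assumes NumPy, design matrices that already contain an intercept column, and hypothetical names (`fit_pmle_chen`, `pmle_score`, `x_b`, `x_r`, `pi_r`). It uses Newton-Raphson, relying on the fact that the equation is the gradient of a concave pseudo-log-likelihood; the toy data are purely illustrative.

```python
import numpy as np


def expit(t):
    """Logistic function exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + np.exp(-t))


def pmle_score(beta, x_b, x_r, pi_r):
    """Left-hand side of: sum_B x_i - sum_R x_i p_i(beta) / pi_i^R = 0."""
    p_r = expit(x_r @ beta)
    return x_b.sum(axis=0) - (p_r / pi_r) @ x_r


def fit_pmle_chen(x_b, x_r, pi_r, max_iter=100, tol=1e-10):
    """Solve the estimating equation for beta by Newton-Raphson."""
    beta = np.zeros(x_b.shape[1])
    for _ in range(max_iter):
        p_r = expit(x_r @ beta)
        # Negative Jacobian of the score: sum_R p (1 - p) x x^T / pi^R
        info = (x_r * (p_r * (1.0 - p_r) / pi_r)[:, None]).T @ x_r
        step = np.linalg.solve(info, pmle_score(beta, x_b, x_r, pi_r))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data (illustrative only): a reference sample with constant inclusion
# probabilities and a non-probability sample whose covariate is shifted upward.
rng = np.random.default_rng(0)
x_r = np.column_stack([np.ones(500), rng.normal(size=500)])
pi_r = np.full(500, 0.01)
x_b = np.column_stack([np.ones(2000), rng.normal(loc=0.5, size=2000)])

beta_hat = fit_pmle_chen(x_b, x_r, pi_r)
p_b = expit(x_b @ beta_hat)  # estimated p_i(beta) for units in B
```

The other two estimating equations differ only in the score function, so the same loop could be reused with a different `pmle_score` and its matching Jacobian.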

slide-13
SLIDE 13

Quasi-randomization

However, the PMLE approach is limited to parametric models. One may be interested in applying more flexible non-parametric methods. Denote $\delta_i = \delta_i^B + \delta_i^R$. With the additional assumption $B \cap R = \emptyset$, one can show

$$\pi_i^B = P(\delta_i^B = 1 \mid x_i, \pi_i^R) = P(\delta_i = 1 \mid x_i, \pi_i^R) \, P(Z_i = 1 \mid x_i, \pi_i^R)$$

$$\pi_i^R = P(\delta_i^R = 1 \mid x_i, \pi_i^R) = P(\delta_i = 1 \mid x_i, \pi_i^R) \, P(Z_i = 0 \mid x_i, \pi_i^R)$$

Taking the ratio of the two identities cancels the common factor $P(\delta_i = 1 \mid x_i, \pi_i^R)$, which gives

Propensity Adjusted Probability Weighting (PAPW):

$$\pi_i^B(x_i^*; \beta^*) = \pi_i^R \, \frac{p_i(\beta^*)}{1 - p_i(\beta^*)}, \quad \forall i \in B$$

where $x_i^* = [x_i, \pi_i^R]$, and β* can be estimated through regular MLE.

This is especially advantageous when applying a broader range of predictive methods.
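Once the propensities P(Z_i = 1 | x_i*) have been estimated by any classifier, the PAPW step itself is a one-line transformation. The sketch below assumes NumPy and uses hypothetical function names and toy numbers; it illustrates the formula and is not the authors' code.

```python
import numpy as np


def papw_inclusion_prob(p_b, pi_r_b):
    """PAPW: pi_i^B = pi_i^R * p_i / (1 - p_i) for units in B."""
    p_b = np.asarray(p_b, dtype=float)
    pi_r_b = np.asarray(pi_r_b, dtype=float)
    return pi_r_b * p_b / (1.0 - p_b)


def pseudo_weighted_mean(y_b, pi_b, pop_size):
    """Pseudo-weighted mean: (1 / N) * sum_{i in B} y_i / pi_i^B."""
    return float(np.sum(np.asarray(y_b, dtype=float) / pi_b) / pop_size)


# Toy values (illustrative only).
p_b = np.array([0.5, 0.2])       # estimated P(Z_i = 1 | x_i*) for units in B
pi_r_b = np.array([0.01, 0.04])  # reference-survey inclusion probabilities
pi_b = papw_inclusion_prob(p_b, pi_r_b)
y_bar = pseudo_weighted_mean([1.0, 3.0], pi_b, pop_size=200)
```

Because only the fitted propensities enter the formula, the classifier behind `p_b` can be swapped without touching this step.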

slide-16
SLIDE 16

Quasi-randomization

Under certain regularity conditions, one can prove that $\hat{\bar{y}}_{PW} = \bar{y}_U + O_p(n_B^{-1/2})$.

When $\pi_i^R$ is unknown for i ∈ B, Elliott & Valliant (2017) show that

Propensity Adjusted Probability Prediction (PAPP):

$$\pi_i^B(x_i; \beta, \gamma) = P(\delta_i^R = 1 \mid x_i; \gamma) \, \frac{p_i(\beta)}{1 - p_i(\beta)}, \quad \forall i \in B$$

where γ is the vector of parameters in modeling $\delta_i^R$ on $x_i$.

To predict $P(\delta_i^R = 1 \mid x_i; \gamma)$ for i ∈ B, one can model $\pi_i^R$ on $x_i$ (for i ∈ R) instead of $\delta_i^R$ because

$$P(\delta_i^R = 1 \mid x_i) = \int_0^1 P(\delta_i^R = 1 \mid \pi_i^R, x_i) \, P(\pi_i^R \mid x_i) \, d\pi_i^R = \int_0^1 \pi_i^R \, P(\pi_i^R \mid x_i) \, d\pi_i^R = E(\pi_i^R \mid x_i), \quad i \in R$$
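The PAPP steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy, a logit-linear working model for π^R fitted by least squares within R, and hypothetical names (`papp_inclusion_prob`, `x_r`, `x_b`, `p_b`). Back-transforming a fitted logit only approximates E(π^R | x) when the residual variance is non-negligible.

```python
import numpy as np


def logit(p):
    return np.log(p / (1.0 - p))


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def papp_inclusion_prob(x_r, pi_r, x_b, p_b):
    """PAPP: predict pi^R for units in B, then apply the odds adjustment."""
    # Step 1: regress logit(pi^R) on x within the reference survey R.
    gamma, *_ = np.linalg.lstsq(x_r, logit(pi_r), rcond=None)
    # Step 2: predict pi^R for units in B from their covariates.
    pi_r_hat_b = expit(x_b @ gamma)
    # Step 3: pi^B = predicted pi^R * p / (1 - p).
    return pi_r_hat_b * p_b / (1.0 - p_b)


# Toy data (illustrative only): logit(pi^R) is exactly linear in x.
x_r = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
pi_r = expit(x_r @ np.array([-3.0, 0.5]))
x_b = np.array([[1.0, 1.0]])
p_b = np.array([0.5])

pi_b_hat = papp_inclusion_prob(x_r, pi_r, x_b, p_b)
```

Any regression method could replace the least-squares step, which is the flexibility the PAPP formulation is meant to allow.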

slide-18
SLIDE 18

Doubly robust adjustment

Augmented Inverse Propensity Weighting (AIPW) was proposed by Robins et al. (1994).

Chen et al. (2019) extend AIPW to a non-probability sample setting:

$$\hat{\bar{y}}_{DR} = \frac{1}{N} \sum_{i=1}^{n_B} \frac{y_i - m(x_i; \theta)}{\pi_i^B(x_i; \beta)} + \frac{1}{N} \sum_{j=1}^{n_R} \frac{m(x_j; \theta)}{\pi_j^R}$$

where m(.) is a continuously differentiable function w.r.t. θ. Parameters η = (β, θ) are estimated by simultaneously solving (Kim & Haziza 2014):

$$\frac{\partial}{\partial \beta} [\bar{y}_{DR} - \bar{y}_U] = \frac{1}{N} \sum_{i=1}^{N} \delta_i^B \left[ \frac{1}{\pi_i^B(x_i; \beta)} - 1 \right] \{y_i - m(x_i; \theta)\} x_i = 0$$

$$\frac{\partial}{\partial \theta} [\bar{y}_{DR} - \bar{y}_U] = \frac{1}{N} \left[ \sum_{i=1}^{N} \frac{\delta_i^B}{\pi^B(x_i; \beta)} \dot{m}(x_i; \theta) - \sum_{i=1}^{n_R} \frac{\dot{m}(x_i; \theta)}{\pi_i^R} \right] = 0$$
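Given fitted values of m(.) and estimated inclusion probabilities, the doubly robust point estimate is a direct sum of two terms. The sketch below assumes NumPy and uses hypothetical names and toy numbers; the two cases show how the estimator falls back on either the prediction model or the pseudo-weights.

```python
import numpy as np


def aipw_mean(y_b, m_b, pi_b, m_r, pi_r, pop_size):
    """AIPW mean: weighted residuals in B plus weighted predictions in R."""
    resid_term = np.sum((y_b - m_b) / pi_b)  # sum_B {y_i - m(x_i)} / pi_i^B
    pred_term = np.sum(m_r / pi_r)           # sum_R m(x_j) / pi_j^R
    return float((resid_term + pred_term) / pop_size)


# Toy values (illustrative only).
y_b = np.array([2.0, 4.0])
pi_b = np.array([0.5, 0.5])
pi_r = np.array([0.5, 0.5])

# Case 1: m reproduces y in B exactly, so only the prediction term remains.
est_pm = aipw_mean(y_b, m_b=y_b, pi_b=pi_b,
                   m_r=np.array([1.0, 3.0]), pi_r=pi_r, pop_size=4)

# Case 2: m is identically zero, so the estimator reduces to the
# pseudo-weighted mean of y over B.
est_qr = aipw_mean(y_b, m_b=np.zeros(2), pi_b=pi_b,
                   m_r=np.zeros(2), pi_r=pi_r, pop_size=4)
```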

slide-20
SLIDE 20

Adjusted DR estimator

However, if both QR and PM are incorrectly specified, the estimates are still biased. To avoid using PMLE, we recommend using the PAPW/PAPP approach for predicting $\pi_i^B$.

Proposed AIPW estimator when $\pi_i^R$ is calculable for i ∈ B:

$$\bar{y}_{DR} = \frac{1}{N} \sum_{i=1}^{n_B} \frac{1}{\pi_i^R} \left[ \frac{1 - p_i(\beta^*)}{p_i(\beta^*)} \right] \{y_i - m(x_i^*; \theta^*)\} + \frac{1}{N} \sum_{j=1}^{n_R} \frac{m(x_j^*; \theta^*)}{\pi_j^R}$$

where θ* is the vector of parameters associated with $x_i^* = [x_i, \pi_i^R]$.

Assuming that $y_i$ is observed for i ∈ R, denote $\bar{y}_R = N^{-1} \sum_{i=1}^{n_R} y_i / \pi_i^R$. We have

$$\bar{y}_{DR} - \bar{y}_R = \frac{1}{N} \sum_{i=1}^{n} \frac{1}{\pi_i^R} \left[ \frac{Z_i}{p_i(\beta^*)} - 1 \right] \{y_i - m(x_i^*; \theta^*)\}$$

which is identical to what Kim & Haziza (2014) derived for incomplete data inference.
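The identity for the difference between the DR estimator and the weighted reference mean follows by splitting the combined sample (taking n = n_B + n_R, an assumption about the notation) into units with Z_i = 1 and units with Z_i = 0. A short worked version, writing p_i for p_i(β*) and m_i for m(x_i*; θ*):

```latex
\begin{align*}
\frac{1}{N}\sum_{i=1}^{n}\frac{1}{\pi_i^R}\left[\frac{Z_i}{p_i}-1\right]\{y_i-m_i\}
&= \frac{1}{N}\sum_{i\in B}\frac{1}{\pi_i^R}\,\frac{1-p_i}{p_i}\,\{y_i-m_i\}
 \;-\; \frac{1}{N}\sum_{j\in R}\frac{y_j-m_j}{\pi_j^R} \\
&= \underbrace{\frac{1}{N}\sum_{i\in B}\frac{1}{\pi_i^R}\,\frac{1-p_i}{p_i}\,\{y_i-m_i\}
 + \frac{1}{N}\sum_{j\in R}\frac{m_j}{\pi_j^R}}_{\bar{y}_{DR}}
 \;-\; \underbrace{\frac{1}{N}\sum_{j\in R}\frac{y_j}{\pi_j^R}}_{\bar{y}_R}
\end{align*}
```

The first line uses Z_i / p_i - 1 = (1 - p_i) / p_i for units in B and Z_i / p_i - 1 = -1 for units in R.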

slide-22
SLIDE 22

Adjusted DR estimator

Therefore, under GLM, we recommend estimating η* = (β*, θ*) by solving:

$$\frac{\partial}{\partial \beta^*} [\bar{y}_{DR} - \bar{y}_R] = \frac{1}{N} \sum_{i=1}^{n} \frac{Z_i}{\pi_i^R} \left[ \frac{1}{p_i(\beta^*)} - 1 \right] \{y_i - m(x_i^*; \theta^*)\} x_i^* = 0$$

$$\frac{\partial}{\partial \theta^*} [\bar{y}_{DR} - \bar{y}_R] = \frac{1}{N} \sum_{i=1}^{n} \frac{1}{\pi_i^R} \left[ \frac{Z_i}{p_i(\beta^*)} - 1 \right] \dot{m}(x_i^*; \theta^*) = 0$$

Under some regularity conditions, one can prove that $\hat{\bar{y}}_{DR} = \bar{y}_{DR} + O_p(n^{-1/2})$. Note that using $\pi_i^R$ as a predictor in m(.) further weakens the modeling assumption.

Proposed AIPW estimator when $\pi_i^R$ is unknown for i ∈ B:

$$\bar{y}_{DR} = \frac{1}{N} \sum_{i=1}^{n_B} \frac{1}{\pi_i^R(x_i; \gamma)} \left[ \frac{1 - p_i(\beta)}{p_i(\beta)} \right] \{y_i - m(x_i; \theta)\} + \frac{1}{N} \sum_{j=1}^{n_R} \frac{m(x_j; \theta)}{\pi_j^R}$$

slide-25
SLIDE 25

Bayesian Additive Regression Trees (BART)

BART is a flexible sum-of-trees regression method (Chipman et al. 2010).

BART structure:

$$y_i = \sum_{j=1}^{m} f(x_i, T_j, M_j) + \epsilon_i$$

where $\epsilon_i \sim N(0, \sigma^2)$ and $T_j$ is the jth tree, with $M_j$ being its terminal node parameters.

BART is Bayesian, assigning prior distributions to T (length and decision rules), M, and σ. Assuming an independent structure between trees:

$$p[(T_1, M_1), \ldots, (T_m, M_m), \sigma^{-2}] = \left[ \prod_{j=1}^{m} \left\{ \prod_{i=1}^{b_j} P(\mu_{ij} \mid T_j) \right\} P(T_j) \right] P(\sigma^{-2})$$

Given the data, the posterior distribution is simulated using a backfitting MCMC method.

slide-26
SLIDE 26

Bayesian Additive Regression Trees (BART)

Advantages of BART: automatic variable selection, and quantifying uncertainty using the posterior predictive distribution (PPD). For a binary outcome, BART uses a data augmentation approach to transform Y into ℝ (a latent continuous variable).

Extending the modified DR method using BART:

$$\log\left( \frac{\pi_i^R}{1 - \pi_i^R} \right) = k(x_i) + \epsilon_i, \quad \Phi^{-1}[P(Z_i = 1 \mid x_i)] = h(x_i), \quad y_i = f(x_i) + \epsilon_i$$

For a given MCMC draw m (m = 1, 2, ..., M), we have

$$\hat{\bar{y}}_{DR}^{(m)} = \frac{1}{\hat{N}_B} \sum_{i=1}^{n_B} \frac{1 + \exp[\hat{k}^{(m)}(x_i)]}{\exp[\hat{k}^{(m)}(x_i)]} \, \frac{1 - \Phi[\hat{h}^{(m)}(x_i)]}{\Phi[\hat{h}^{(m)}(x_i)]} \left\{ y_i - \hat{f}^{(m)}(x_i) \right\} + \frac{1}{\hat{N}_R} \sum_{j=1}^{n_R} \frac{\hat{f}^{(m)}(x_j)}{\pi_j^R}$$

Final AIPW estimator under BART:

$$\hat{\bar{y}}_{DR} = \frac{1}{M} \sum_{m=1}^{M} \hat{\bar{y}}_{DR}^{(m)}$$

slide-28
SLIDE 28

Variance estimation

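To estimate variance, one has to incorporate uncertainty due to sampling, imputing pseudo-weights, and predicting the outcome. Two methods are proposed.

Asymptotic variance estimator when $\pi_i^R$ is known for i ∈ B

For the pseudo-weighting approach based on PAPW:

$$\widehat{Var}(\hat{\bar{y}}_{PW}) = \frac{1}{N^2} \sum_{i=1}^{n_B} \left( 1 - \hat{\pi}_i^B \right) \left( \frac{y_i - \hat{\bar{y}}_{PW}}{\hat{\pi}_i^B} \right)^2 - \frac{2 \hat{b}^T}{N^2} \sum_{i=1}^{n_B} \left\{ 1 - p_i(\hat{\beta}_1) \right\} \left( \frac{y_i - \hat{\bar{y}}_{PW}}{\hat{\pi}_i^B} \right) x_i + \hat{b}^T \left[ \frac{1}{N^2} \sum_{i=1}^{n} p_i(\hat{\beta}_1) x_i x_i^T \right] \hat{b}$$

where

$$\hat{b}^T = \left[ \frac{1}{N} \sum_{i=1}^{n_B} \left( \frac{y_i - \hat{\bar{y}}_{PW}}{\hat{\pi}_i^B} \right) x_i^T \right] \left[ \frac{1}{N} \sum_{i=1}^{n} p_i(\hat{\beta}_1) x_i x_i^T \right]^{-1}$$

For the modified AIPW estimator (Chen et al. 2019):

$$\widehat{Var}(\hat{\bar{y}}_{DR}) = \hat{V}_1 + \hat{V}_2 - \hat{B}(\hat{V}_2)$$

where

$$\hat{V}_1 = Var(\hat{\bar{y}}_{PM}), \quad \hat{V}_2 = \frac{1}{N^2} \sum_{i=1}^{n_B} \left\{ \frac{1 - \hat{\pi}_i^B}{(\hat{\pi}_i^B)^2} \right\} \{y_i - m(x_i^*; \hat{\theta}_1)\}^2, \quad \hat{B}(\hat{V}_2) = \frac{1}{N^2} \sum_{i=1}^{n} \left( \frac{Z_i}{\hat{\pi}_i^B} - \frac{1 - Z_i}{\pi_i^R} \right) \hat{\sigma}_i^2$$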

slide-30
SLIDE 30

Variance estimation

Variance estimation when $\pi_i^R$ is incomputable for i ∈ B:

Under GLM:

A modified bootstrap resampling method (Rao & Wu, 1991)

1. Draw M bootstrap samples of sizes $n_B - 1$ and $n_R - 1$ from B and R to estimate the $\hat{\bar{y}}_{DR}^{(m)}$'s.
2. Update the sampling weights in R to $w_i^{(m)} = w_i \frac{n_R}{n_R - 1} t_i$, where $t_i$ is the number of times unit i is selected in the m-th bootstrap sample.

$$\widehat{Var}(\hat{\bar{y}}_{DR}) = \frac{1}{M} \sum_{m=1}^{M} \left( \hat{\bar{y}}_{DR}^{(m)} - \bar{\bar{y}}_{DR} \right)^2$$

where $\bar{\bar{y}}_{DR}$ denotes the mean of the M replicate estimates.

Under BART:

A multiple imputation method using the posterior predictive draws

1. Randomly select a sample of size M from the posterior predictive draws, and estimate $\hat{\bar{y}}_{DR}^{(m)}$.
2. Use Rubin's combining rules to construct point/variance estimates.

$$\widehat{Var}(\hat{\bar{y}}_{DR}) = \bar{V}_W + (1 + 1/M) V_B$$

where $\bar{V}_W = \sum_{m=1}^{M} var\{\hat{\bar{y}}_{DR}^{(m)}\} / M$ and $V_B = \sum_{m=1}^{M} [\hat{\bar{y}}_{DR}^{(m)} - \bar{\bar{y}}_{DR}]^2 / (M - 1)$.
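Rubin's combining rules are simple enough to state in code. The sketch below uses only the Python standard library; the function name `rubin_combine` and the toy inputs are hypothetical, and the per-draw estimates and their within-draw variances are assumed to have been computed elsewhere.

```python
import statistics


def rubin_combine(estimates, within_variances):
    """Combine M per-draw estimates and variances with Rubin's rules."""
    m = len(estimates)
    point = statistics.fmean(estimates)            # average of the M estimates
    v_within = statistics.fmean(within_variances)  # mean within-draw variance
    v_between = statistics.variance(estimates)     # divides by M - 1
    total_var = v_within + (1.0 + 1.0 / m) * v_between
    return point, total_var


# Toy inputs (illustrative only): three draws with equal within-draw variance.
point, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The factor (1 + 1/M) inflates the between-draw component to account for using a finite number of draws.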

slide-31
SLIDE 31

Simulation study I (Chen et al 2019)

A population of size N = 1,000,000 was generated with the following variables:

$$z_{1i} \sim Ber(p = 0.5), \quad z_{2i} \sim U(0, 2), \quad z_{3i} \sim Exp(\mu = 1), \quad z_{4i} \sim \chi^2_{(4)}$$

$$x_{1i} = z_{1i}, \quad x_{2i} = z_{2i} + 0.3 z_{1i}, \quad x_{3i} = z_{3i} + 0.2 (x_{1i} + x_{2i}), \quad x_{4i} = z_{4i} + 0.1 (x_{1i} + x_{2i} + x_{3i})$$

Y is a continuous outcome with normal distribution as below:

$$Y_i = 2 + x_{1i} + x_{2i} + x_{3i} + x_{4i} + 0.5 \epsilon_i, \quad \epsilon_i \sim N(0, 1)$$

Two sets of unequal selection probabilities are generated as below:

$$\pi_i^R \propto \gamma_1 + z_{3i}, \qquad \log\left( \frac{\pi_i^B}{1 - \pi_i^B} \right) = \gamma_0 + 0.1 x_{1i} + 0.2 x_{2i} + 0.1 x_{3i} + 0.2 x_{4i}$$

The simulation was iterated K = 1000 times, and rel-Bias, rMSE, 95% CI coverage rates, and SE ratio were computed. Different scenarios of model misspecification were examined.
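The data-generating process above can be sketched in a few lines. This is an illustration only, assuming NumPy: the population is shrunk to 10,000 units for speed, the value `gamma1=1.0` is a hypothetical choice (the slides do not state it), and γ0 is found by bisection so that the expected size of B equals `n_b`, which is one plausible way to fix the intercept.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def simulate_population(pop_size=10_000, n_r=100, n_b=1_000,
                        gamma1=1.0, seed=0):
    """Generate covariates, outcome, and selection probabilities."""
    rng = np.random.default_rng(seed)
    z1 = rng.binomial(1, 0.5, pop_size)
    z2 = rng.uniform(0.0, 2.0, pop_size)
    z3 = rng.exponential(1.0, pop_size)
    z4 = rng.chisquare(4, pop_size)

    x1 = z1.astype(float)
    x2 = z2 + 0.3 * z1
    x3 = z3 + 0.2 * (x1 + x2)
    x4 = z4 + 0.1 * (x1 + x2 + x3)
    y = 2.0 + x1 + x2 + x3 + x4 + 0.5 * rng.normal(size=pop_size)

    # pi^R proportional to gamma1 + z3, scaled to an expected sample size n_r.
    size_r = gamma1 + z3
    pi_r = n_r * size_r / size_r.sum()

    # pi^B from the logistic model; gamma0 is chosen by bisection so that
    # the selection probabilities sum to n_b (hypothetical calibration).
    lin = 0.1 * x1 + 0.2 * x2 + 0.1 * x3 + 0.2 * x4
    lo, hi = -30.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expit(mid + lin).sum() < n_b:
            lo = mid
        else:
            hi = mid
    pi_b = expit(0.5 * (lo + hi) + lin)

    return {"x1": x1, "x2": x2, "x3": x3, "x4": x4,
            "y": y, "pi_r": pi_r, "pi_b": pi_b}


pop = simulate_population()
```

Samples R and B could then be drawn by independent Bernoulli trials with probabilities `pop["pi_r"]` and `pop["pi_b"]`; that sampling scheme is an assumption, since the slides do not specify it.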

slide-32
SLIDE 32

Simulation results I

The simulation results for nR = 100 and nB = 1,000:

[Figure: Rel-Bias (%) and 2.5%-97.5% percentiles for Y, by method (UW/FW, PAPW, IPSW, PM) and model specification (False, True)]

slide-34
SLIDE 34

Simulation results I

The simulation results for nR = 100 and nB = 1,000:

[Figure: Rel-Bias (%) and 2.5%-97.5% percentiles for Y, by model specification (QR-PM: UW/TW, True-True, True-False, False-True, False-False) and method (UW, FW, PAPW, IPSW)]

slide-36
SLIDE 36

Simulation results I

The simulation results for nR = 100 and nB = 1,000:

[Figure, two panels: 95% CI coverage rate (%) for Y and SE ratio for Y, by model specification (True-True, True-False, False-True, False-False) and method (PAPW, IPSW)]

slide-37
SLIDE 37

Simulation study II

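A clustered population with A = 1,000 clusters, each of size nα = 1,000, was generated as below:

[Equation only partly recoverable from the extraction: a bivariate vector that includes X1α follows an MVN distribution with covariance matrix [[1, 0.8], [0.8, 1]]; one visible entry of the mean vector is 1.]

$$X_{2\alpha} \sim Ber(p = 0.5)$$

Y is a continuous outcome with normal distribution as below:

$$Y_{\alpha i} \mid X_\alpha, d_\alpha \sim N\left( \mu = 2 + 0.4 x_{1\alpha}^2 + 0.3 x_{1\alpha}^3 - 0.2 x_{2\alpha} - 0.1 x_{1\alpha} x_{2\alpha} - d_\alpha + u_\alpha, \ \sigma^2 = 1 \right)$$

Two sets of unequal selection probabilities are generated as below:

$$P(\delta_\alpha^R = 1 \mid d) = \frac{e^{\gamma_0 + 0.5 d_\alpha}}{1 + e^{\gamma_0 + 0.5 d_\alpha}}, \qquad P(\delta_\alpha^B = 1 \mid x) = \frac{e^{\gamma_1 + 0.4 x_{1\alpha} - 0.2 x_{1\alpha}^2 + 0.6 x_{2\alpha} + 0.1 x_{1\alpha} x_{2\alpha}}}{1 + e^{\gamma_1 + 0.4 x_{1\alpha} - 0.2 x_{1\alpha}^2 + 0.6 x_{2\alpha} + 0.1 x_{1\alpha} x_{2\alpha}}}$$

The simulation was iterated K = 1000 times, and rel-Bias, rMSE, 95% CI coverage rates, and SE ratio were computed. Different scenarios of model misspecification were examined.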

slide-38
SLIDE 38

Simulation results II

The simulation results for nRα = 100, nBα = 50, and a = 200:

[Figure: Rel-Bias (%) and 2.5%-97.5% percentiles for Y, by method (UW/FW, PAPW, PAPP, IPSW, PM), model specification (False, True), and model (BART, GLM, NA)]

slide-40
SLIDE 40

Simulation results II

The simulation results for nRα = 100, nBα = 50, and a = 200:

[Figure: Rel-Bias (%) and 2.5%-97.5% percentiles for Y, by model specification (QR-PM: UW/TW, True-True, True-False, False-True, False-False), method (UW, FW, PAPW, PAPP, IPSW), and model (BART, GLM, NA)]

slide-42
SLIDE 42

Simulation results II

The simulation results for nRα = 100, nBα = 50, and a = 200:

[Figure, two panels: 95% CI coverage rate (%) for Y and SE ratio for Y, by model specification (True-True, True-False, False-True, False-False), method (PAPW, PAPP, IPSW), and model (BART, GLM)]

slide-43
SLIDE 43

Results on SHRP2: reference survey

The 2017 National Household Travel Survey (NHTS) serves as the reference survey.
A nationally representative survey of U.S. citizens aged ≥ 5 years (nR = 129,112).
An address-based sample with a stratified design.
Initial recruitment through mailing (RR: 30.4%).
Responding households (HH) were assigned randomly to weekdays.
Travel log using web/telephone (RR: 51.4%).
NHTS data were combined with SHRP2 data at the day level (nB = 874,211).

slide-44
SLIDE 44

Results on SHRP2: data integration

Common variables in SHRP2 and NHTS 2017 data sets

Individual level: gender, age, race, ethnicity, urban size, birth country, education, HH income, home ownership, job status
Vehicle level: vehicle make, vehicle type, vehicle age, mileage
Trip level: duration, distance, average speed, start time, weekday, month

Differences between SHRP2 and NHTS in sample composition

Feature              | NHTS                                  | SHRP2
Age range            | ≥ 5                                   | 16-80
Transportation mode  | walk, bicycle, motorbike, car, ...    | car, SUV, van, light truck
Driving status       | driver, passenger                     | driver
Vehicle ownership    | owned, rental, public transportation  | owned
Trip measurement     | self-reported                         | sensor-recorded
Followup duration    | one day                               | months or years

slide-46
SLIDE 46

Results on SHRP2: pseudo-weighting

Assessing the common support of the distribution of estimated PS in SHRP2 vs NHTS

[Figure: density of estimated propensity scores based on BART, NHTS vs SHRP2]

[Figure: estimated pseudo-weights in log scale, by method (PAPP_GLM, PAPP_BART, IPSW_GLM)]

slide-47
SLIDE 47

Results on SHRP2: model specification

Comparing the performance of BART with GLM in estimating PS and trip-related outcomes

[Figure: ROC curves (true positive rate vs false positive rate) for the PS models. Logit: AUC = 89.9%; CART: AUC = 86.2%; BART: AUC = 94.0%]

[Figure: (pseudo)-R2 in modeling trip-related outcomes (duration, distance, average speed, start hour, min hour, max speed, stop time, stop rate, brake frequency, brake rate), GLM vs BART]

slide-48
SLIDE 48

Results on SHRP2: sample composition

Comparing the distribution of common covariates: unweighted SHRP2 vs weighted NHTS 2017

[Figure: percent distribution of sex, age group, race, ethnicity, birth country, and education, SHRP2 vs NHTS]

slide-49
SLIDE 49

Results on SHRP2: sample composition

Comparing the distribution of common covariates: pseudo-weighted SHRP2 vs weighted NHTS 2017

[Figure: percent distribution of sex, age group, race, ethnicity, birth country, and education, SHRP2 vs NHTS]

slide-50
SLIDE 50

Results on SHRP2: sample composition

Comparing the distribution of common covariates: unweighted SHRP2 vs weighted NHTS 2017

[Figure: percent distribution of HH income, home ownership, job status, HH size, urban size, and vehicle age, SHRP2 vs NHTS]

slide-51
SLIDE 51

Results on SHRP2: sample composition

Comparing the distribution of common covariates: pseudo-weighted SHRP2 vs weighted NHTS 2017

[Figure: percent distribution of HH income, home ownership, job status, HH size, urban size, and vehicle age, SHRP2 vs NHTS]

slide-52
SLIDE 52

Results on SHRP2: sample composition

Comparing dist. of common covariates: unweighted SHRP2 vs weighted NHTS 2017

[Figure: SHRP2 and NHTS distributions (percent, 0 to 100) in six panels: Vehicle Make (American, Asian, European), Vehicle Type (Car, Van, SUV, Pickup), Mileage (0-4 through 30+), Fuel type (Gas/D, Other), Weekday (Sun through Sat), and Season (Winter, Spring, Summer, Fall).]


slide-53
SLIDE 53

Results on SHRP2: sample composition

Comparing dist. of common covariates: pseudo-weighted SHRP2 vs weighted NHTS 2017

[Figure: SHRP2 and NHTS distributions (percent, 0 to 100) in six panels: Vehicle Make (American, Asian, European), Vehicle Type (Car, Van, SUV, Pickup), Mileage (0-4 through 30+), Fuel type (Gas/D, Other), Weekday (Sun through Sat), and Season (Winter, Spring, Summer, Fall).]
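
The sample-composition slides contrast unweighted, pseudo-weighted, and survey-weighted covariate distributions. As a rough illustration of what such a comparison computes, the Python sketch below turns a categorical covariate and a set of weights into percentage shares. The function name `weighted_distribution` and the toy data are hypothetical and are not taken from SHRP2 or NHTS; passing weights of 1 gives the unweighted distribution.

```python
from collections import defaultdict


def weighted_distribution(categories, weights):
    """Return each category's weighted share as a percentage (shares sum to 100)."""
    if len(categories) != len(weights):
        raise ValueError("categories and weights must have the same length")
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight
    grand_total = sum(totals.values())
    return {category: 100.0 * total / grand_total for category, total in totals.items()}


# Hypothetical toy data, not SHRP2 or NHTS values.
sex = ["Female", "Male", "Male", "Female", "Male"]
pseudo_weights = [2.0, 1.0, 1.0, 4.0, 2.0]

# Unit weights reproduce the raw (unweighted) sample composition.
unweighted = weighted_distribution(sex, [1.0] * len(sex))
pseudo_weighted = weighted_distribution(sex, pseudo_weights)

for level in ("Female", "Male"):
    print(f"{level}: unweighted {unweighted[level]:.1f}%, "
          f"pseudo-weighted {pseudo_weighted[level]:.1f}%")
```

If the pseudo-weights work as intended, the pseudo-weighted shares should sit closer to the weighted reference-survey shares than the unweighted ones do.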


slide-54
SLIDE 54

Results on SHRP2: bias adjustment

Comparing adjusted estimates of some trip-related outcome vars in SHRP2 vs NHTS

[Figure: Mean daily total distance driven. Axes: distance (mile), about 31 to 36, by Method (UW, PAPP, PMLE, OR, AIPW_PAPP, AIPW_PMLE). Legend: Model (NA, GLM, BART).]


slide-55
SLIDE 55

Results on SHRP2: bias adjustment

Comparing adjusted estimates of some SHRP2-specific outcome vars in SHRP2

[Figure: Mean frequency of brakes per driven mile. Axes: frequency of brakes, about 3.5 to 5.0, by Method (UW, PAPP, PMLE, OR, AIPW_PAPP, AIPW_PMLE). Legend: Model (NA, GLM, BART).]


slide-56
SLIDE 56

Results on SHRP2: bias adjustment

Comparing adjusted estimates of some SHRP2-specific outcome vars in SHRP2

[Figure: Mean daily maximum speed. Axes: speed (MPH), about 59 to 63, by Method (UW, PAPP, PMLE, OR, AIPW_PAPP, AIPW_PMLE). Legend: Model (NA, GLM, BART).]


slide-57
SLIDE 57

Results on SHRP2: bias adjustment

Comparing adjusted estimates of maximum speed stratified by different factors

[Figure: adjusted estimates of maximum speed (MPH, axis 50 to 70) in six panels: Sex (Male, Female), Age group (16-24 through 75+), Race (White, Black, Asian, Other), Education (<HS, HS comp, College, Graduate, Post-grad), Vehicle type (Car, Van, SUV, Pickup), and Day of week (Weekday, Weekend). Legend: Method (UW, AIPW_PAPP, AIPW_IPSW); Model (NA, GLM, BART).]

slide-58
SLIDE 58

Discussion

We proposed a robust method for inference in non-probability samples.

The AIPW method under BART produced approximately unbiased estimates, especially when both the QR and PM models are unknown.

Compared to PMLE, our proposed estimator was more efficient.

Under GLM, both the point and variance estimators were doubly robust (DR).

The proposed asymptotic/bootstrap variance estimator performed well in simulations.

However, the results on the SHRP2 data were poor for some outcome variables.
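
To make the AIPW idea in this discussion concrete, here is a minimal Python sketch of a Hajek-type augmented inverse propensity weighted mean. It assumes a commonly used form: a pseudo-weighted average of prediction residuals over the non-probability sample S_B, plus a design-weighted average of predictions over the reference sample S_R. That form, the function name `aipw_mean`, and the toy numbers are assumptions for illustration; they are not the exact estimator or any result from the talk, and the outcome predictions and pseudo-inclusion probabilities are taken as already estimated (for example by GLM or BART).

```python
def aipw_mean(y_b, y_hat_b, pi_b_hat, y_hat_r, pi_r):
    """Hajek-type AIPW estimate of a population mean (illustrative form).

    y_b      : observed outcomes in the non-probability sample S_B
    y_hat_b  : model predictions of the outcome for units in S_B
    pi_b_hat : estimated pseudo-inclusion probabilities for units in S_B
    y_hat_r  : model predictions of the outcome for units in S_R
    pi_r     : known design inclusion probabilities for units in S_R
    """
    # Estimated population sizes implied by each set of weights.
    n_b_hat = sum(1.0 / p for p in pi_b_hat)
    n_r_hat = sum(1.0 / p for p in pi_r)

    # Pseudo-weighted mean residual in S_B: corrects the prediction model.
    residual_term = sum(
        (y - y_hat) / p for y, y_hat, p in zip(y_b, y_hat_b, pi_b_hat)
    ) / n_b_hat

    # Design-weighted mean prediction in S_R: the prediction-model part.
    prediction_term = sum(y_hat / p for y_hat, p in zip(y_hat_r, pi_r)) / n_r_hat

    return residual_term + prediction_term


# Hypothetical toy inputs.
estimate = aipw_mean(
    y_b=[10.0, 20.0, 30.0],
    y_hat_b=[12.0, 18.0, 30.0],
    pi_b_hat=[0.5, 0.25, 0.25],
    y_hat_r=[15.0, 25.0],
    pi_r=[0.1, 0.1],
)
print(round(estimate, 4))
```

In this form the residual term vanishes when the predictions are exact on S_B, which is one way to see why the estimator can tolerate a misspecified propensity model when the outcome model is right.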


slide-59
SLIDE 59

Discussion

Weaknesses:

1

Auxiliary variables in SHRP2 were poor predictors of trip-related outcomes.

2

The variance estimate under BART was not as accurate as under alternative methods.

3

The method is computationally demanding, especially with high-dimensional data or when n is very large.

Future directions:

1

To develop a model-assisted method using penalized spline of propensity prediction

2

To expand a sandwich-type variance estimator under GLM when $\pi^R_i$ is unknown for $i \in B$

3

To apply divide-and-recombine techniques to reduce the computational burden

4

To adjust for the differential measurement errors in covariates


slide-60
SLIDE 60

Penalized spline propensity prediction

Estimate $\pi^R_i$ for $i \in S_B$ given $x_i$ by modeling $E(\pi^R_i \mid x_i)$ if it is unknown for units of $B$.

Estimate $\pi^B_i$ based on $B \cup R$ using one of the methods discussed, PAPW/PAPP/IPSW.

Predict $y_i$ for $i \in S_R$ given $[\hat{\pi}^R_i, \hat{\pi}^B_i, x_i]$ using a penalized spline model as below:

Penalized spline model for a continuous outcome

$$y_i \mid x_i, \hat{\pi}^R_i, \hat{\pi}^B_i; \theta \sim N\left(\theta_0 + x_i^T \theta_1 + u_{i1}^T (\hat{\pi}^R_i - K_R)_+^p + u_{i2}^T (\hat{\pi}^B_i - K_B)_+^p,\ \tau^2\right)$$

where $u_{ij} \sim N(0, \sigma_j^2 I)$ is a vector of $q$ random effects and $K$ is a vector of $q$ fixed knots.

Use design-based methods in $R$ to estimate the population unknown quantity:

$$\hat{\bar{y}}_{PM} = \frac{1}{N} \sum_{i=1}^{n_R} \frac{\hat{y}_i}{\pi^R_i}$$
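
The pieces of this slide can be sketched in Python as follows: a truncated power basis $(\hat{\pi} - K)_+^p$, the mean of the normal outcome model for one unit, and the design-weighted mean of predictions. This is only an illustration under stated assumptions: the function names, knots, and coefficient values are hypothetical, and fitting the coefficients (a mixed-model or Bayesian step) is not shown.

```python
def truncated_power_basis(pi_hat, knots, degree=1):
    """Return [(pi_hat - k)_+^degree for each knot k]; assumes degree >= 1."""
    return [max(pi_hat - k, 0.0) ** degree for k in knots]


def spline_mean(x, pi_r_hat, pi_b_hat, theta0, theta1, u1, u2,
                knots_r, knots_b, degree=1):
    """Mean of the outcome model for one unit, given (hypothetical) coefficients."""
    # Parametric part: theta0 + x' theta1.
    linear = theta0 + sum(xj * tj for xj, tj in zip(x, theta1))
    # Spline parts in the two estimated selection probabilities.
    basis_r = truncated_power_basis(pi_r_hat, knots_r, degree)
    basis_b = truncated_power_basis(pi_b_hat, knots_b, degree)
    smooth_r = sum(u * b for u, b in zip(u1, basis_r))
    smooth_b = sum(u * b for u, b in zip(u2, basis_b))
    return linear + smooth_r + smooth_b


def pm_mean(y_hat, pi_r, pop_size):
    """Design-weighted mean of predictions: (1/N) * sum_i y_hat_i / pi_i^R."""
    return sum(y / p for y, p in zip(y_hat, pi_r)) / pop_size


# Hypothetical coefficients and knots for a single unit.
mu = spline_mean(
    x=[1.0, 2.0], pi_r_hat=0.5, pi_b_hat=0.1,
    theta0=1.0, theta1=[0.5, -0.25],
    u1=[1.0, 2.0], u2=[3.0, 3.0],
    knots_r=[0.2, 0.4], knots_b=[0.2, 0.4],
)

# Hypothetical predictions and design probabilities for two reference units.
y_bar_pm = pm_mean(y_hat=[10.0, 20.0], pi_r=[0.5, 0.25], pop_size=10)

print(round(mu, 4), round(y_bar_pm, 4))
```

A basis term is zero until the estimated probability passes its knot, so each random effect only bends the fitted curve to the right of that knot.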



slide-64
SLIDE 64

Thanks for your attention

Email address: arafei@umich.edu

Acknowledgements:

Professor Michael R. Elliott

Research Professor Carol A.C. Flannagan

Research Associate Professor Brady T. West

