SLIDE 1

Bootstrapping Sensitivity Analysis

Qingyuan Zhao

Department of Statistics, The Wharton School University of Pennsylvania

June 2, 2019 @ OSU Bayesian Causal Inference Workshop

(Joint work with Bhaswar B. Bhattacharya and Dylan S. Small)

SLIDE 2

Why sensitivity analysis?

◮ Unless we have a perfectly executed randomized experiment, causal inference is based on some unverifiable assumptions.

◮ In observational studies, the most commonly used assumption is ignorability or no unmeasured confounding: A ⊥⊥ (Y(0), Y(1)) | X. We can only say this assumption is “plausible”.

◮ Sensitivity analysis asks: what if this assumption does not hold? Does our qualitative conclusion still hold?

◮ This question appears in many settings:
  1. Confounded observational studies.
  2. Survey sampling with data missing not at random (MNAR).
  3. Longitudinal studies with non-ignorable dropout.

◮ In general, this means that the target parameter (e.g. the average treatment effect) is only partially identified.

SLIDE 3

Overview: Bootstrapping sensitivity analysis

Point-identified parameter: Efron’s bootstrap

  Point estimator ==(Bootstrap)==> Confidence interval

Partially identified parameter: An analogy

  Extrema estimator ==(Optimization + Percentile bootstrap + Minimax inequality)==> Confidence interval

Rest of the talk

Apply this idea to IPW estimators in a marginal sensitivity model.

SLIDE 4

Some existing sensitivity models

Generally, we need to specify how unconfoundedness is violated.

1. Y models: Consider a specific difference between the conditional distributions Y(a) | X, A and Y(a) | X.
   ◮ Commonly called “pattern mixture models”.
   ◮ Robins (1999, 2002); Birmingham et al. (2003); Vansteelandt et al. (2006); Daniels and Hogan (2008).

2. A models: Consider a specific difference between the conditional distributions A | X, Y(a) and A | X.
   ◮ Commonly called “selection models”.
   ◮ Scharfstein et al. (1999); Gilbert et al. (2003).

3. Simultaneous models: Consider a range of A models and/or Y models and report the “worst case” result.
   ◮ Cornfield et al. (1959); Rosenbaum (2002); Ding and VanderWeele (2016).

Our sensitivity model: a hybrid of the 2nd and 3rd, similar to Rosenbaum’s.

SLIDE 5

Rosenbaum’s sensitivity model

◮ Imagine there is an unobserved confounder U that “summarizes” all confounding, so A ⊥⊥ (Y(0), Y(1)) | X, U.

◮ Let e0(x, u) = P0(A = 1 | X = x, U = u).

Rosenbaum’s sensitivity model:

  R(Γ) = { e(x, u) : 1/Γ ≤ OR(e(x, u1), e(x, u2)) ≤ Γ, ∀ x ∈ X, u1, u2 },

where OR(p1, p2) := [p1/(1 − p1)]/[p2/(1 − p2)] is the odds ratio.

◮ Rosenbaum’s question: can we reject the sharp null hypothesis Y(0) ≡ Y(1) for every e0(x, u) ∈ R(Γ)?

◮ Robins (2002): we don’t need to assume the existence of U. Let U = Y(1) when the goal is to estimate E[Y(1)].

SLIDE 6

Our sensitivity model

◮ Let e0(x) = P0(A = 1 | X = x) be the propensity score.

Marginal sensitivity model:

  M(Γ) = { e(x, y) : 1/Γ ≤ OR(e(x, y), e0(x)) ≤ Γ, ∀ x ∈ X, y }.

◮ Compare this to Rosenbaum’s model:

  R(Γ) = { e(x, u) : 1/Γ ≤ OR(e(x, u1), e(x, u2)) ≤ Γ, ∀ x ∈ X, u1, u2 }.

◮ Tan (2006) first considered this model, but he did not consider statistical inference in finite samples.

◮ Relationship between the two models: M(√Γ) ⊆ R(Γ) ⊆ M(Γ).¹

◮ For observational studies, we assume both P0(A = 1 | X, Y(1)) ∈ M(Γ) and P0(A = 1 | X, Y(0)) ∈ M(Γ).

¹The second part needs “compatibility”: e(x, y) marginalizes to e0(x).

SLIDE 7

Parametric extension

◮ In practice, the propensity score e0(X) = P0(A = 1 | X) is often estimated by a parametric model.

Definition (Parametric marginal sensitivity model):

  Mβ0(Γ) = { e(x, y) : 1/Γ ≤ OR(e(x, y), eβ0(x)) ≤ Γ, ∀ x ∈ X, y },

where eβ0(x) is the best parametric approximation of e0(x).

This sensitivity model covers both
  1. Model misspecification, that is, eβ0(x) ≠ e0(x); and
  2. Missing not at random, that is, e0(x) ≠ e0(x, y).
SLIDE 8

Logistic representations

1. Rosenbaum’s sensitivity model:
   logit(e(x, u)) = g(x) + γu, where 0 ≤ U ≤ 1 and γ = log Γ.

2. Marginal sensitivity model:
   logit(e(h)(x, y)) = logit(e0(x)) + h(x, y), where ‖h‖∞ = sup |h(x, y)| ≤ γ.
   Due to this representation, we also call it a marginal L∞-sensitivity model.

3. Parametric marginal sensitivity model:
   logit(e(h)(x, y)) = logit(eβ0(x)) + h(x, y), where ‖h‖∞ = sup |h(x, y)| ≤ γ.
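The logit shift is easy to compute directly. A minimal NumPy sketch (the function names are mine, not from the talk) of the marginal model's representation, clipping h to enforce ‖h‖∞ ≤ γ:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def shifted_propensity(e0, h, gamma):
    """Marginal sensitivity model: logit(e^(h)) = logit(e0) + h, |h| <= gamma = log(Gamma).

    e0 is the (true or fitted) propensity score and h the sensitivity
    function evaluated at (x, y); both may be scalars or NumPy arrays."""
    h = np.clip(h, -gamma, gamma)  # enforce the L-infinity constraint
    return sigmoid(logit(e0) + h)
```

By construction, OR(e^(h)(x, y), e0(x)) = exp(h(x, y)), so the odds ratio stays in [1/Γ, Γ].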

SLIDE 9

Confidence interval I

◮ For simplicity, consider the “missing data” problem where Y = Y(1) is only observed if A = 1.

◮ Observe i.i.d. samples (Ai, Xi, AiYi), i = 1, . . . , n.

◮ The estimand is µ0 = E0[Y]; however, it is only partially identified under a simultaneous sensitivity model.

Goal 1 (Coverage of the true parameter):
Construct a data-dependent interval [L, U] such that
  P0(µ0 ∈ [L, U]) ≥ 1 − α
whenever e0(X, Y) = P0(A = 1 | X, Y) ∈ M(Γ).

SLIDE 10

Confidence interval II

◮ The inverse probability weighting (IPW) identity:
  E0[Y] = E0[AY / e0(X, Y)], which under MAR equals E0[AY / e0(X)].

◮ Define µ(h) = E0[AY / e(h)(X, Y)].

◮ Partially identified region: {µ(h) : e(h) ∈ M(Γ)}.

Goal 2 (Coverage of the partially identified region):
Construct a data-dependent interval [L, U] such that
  P0({µ(h) : e(h) ∈ M(Γ)} ⊆ [L, U]) ≥ 1 − α.

◮ Imbens and Manski (2004) have discussed the difference between these two goals.

SLIDE 11

An intuitive idea: “The Union Method”

◮ Suppose for any h, we have a confidence interval [L(h), U(h)] such that
  lim inf_{n→∞} P0(µ(h) ∈ [L(h), U(h)]) ≥ 1 − α.

◮ Let L = inf_h L(h) and U = sup_h U(h), so [L, U] is the union interval.

Theorem
1. [L, U] satisfies Goal 1 asymptotically.
2. Furthermore, if the intervals are “congruent”, i.e. there exists α′ < α such that
  lim sup_{n→∞} P0(µ(h) < L(h)) ≤ α′ and lim sup_{n→∞} P0(µ(h) > U(h)) ≤ α − α′,
then [L, U] satisfies Goal 2 asymptotically.

SLIDE 12

Practical challenge: How to take the union?

◮ Suppose ĝ(x) is an estimate of logit(e0(x)).

◮ For a specific difference h, we can estimate e(h)(x, y) by
  ê(h)(x, y) = 1 / (1 + exp(h(x, y) − ĝ(x))).

◮ This leads to a (stabilized) IPW estimate of µ(h):
  µ̂(h) = [ (1/n) Σ_{i=1}^n Ai / ê(h)(Xi, Yi) ]^{−1} · (1/n) Σ_{i=1}^n AiYi / ê(h)(Xi, Yi).

◮ Under regularity conditions, Z-estimation theory tells us
  √n (µ̂(h) − µ(h)) →d N(0, (σ(h))²).

◮ Therefore we can use [L(h), U(h)] = µ̂(h) ∓ z_{α/2} · σ̂(h)/√n.

◮ However, computing the union interval requires solving a complicated optimization problem.
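For a fixed sensitivity function, the stabilized IPW estimate itself is one line of NumPy. A sketch (the function name is illustrative) that takes the already-computed propensities ê(h)(Xi, Yi) as input:

```python
import numpy as np

def sipw(A, Y, e_h):
    """Stabilized IPW: [ (1/n) sum A_i/e_i ]^(-1) * (1/n) sum A_i Y_i/e_i.

    A:   0/1 missingness/treatment indicators
    Y:   outcomes (arbitrary where A = 0, since those terms get zero weight)
    e_h: estimated propensities e-hat^(h)(X_i, Y_i)"""
    w = A / e_h                       # inverse probability weights
    return np.sum(w * Y) / np.sum(w)  # the 1/n factors cancel
```

The estimate is “stabilized” because dividing by Σ wi makes it invariant to rescaling all weights, which is also why the extrema problem later becomes a ratio (fractional) program.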

SLIDE 13

Bootstrapping sensitivity analysis

Point-identified parameter: Efron’s bootstrap

  Point estimator ==(Bootstrap)==> Confidence interval

Partially identified parameter: An analogy

  Extrema estimator ==(Optimization + Percentile bootstrap + Minimax inequality)==> Confidence interval

A simple procedure for simultaneous sensitivity analysis
1. Generate B random resamples of the data. For each resample, compute the extrema of the IPW estimates under Mβ0(Γ).
2. Construct the confidence interval using L = Q_{α/2} of the B minima and U = Q_{1−α/2} of the B maxima.

Theorem
[L, U] achieves Goal 2 for Mβ0(Γ) asymptotically.
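The two steps above can be sketched in Python (NumPy only). Everything here is illustrative: `extrema` stands in for whatever routine computes the minimum and maximum of the IPW estimate over Mβ0(Γ) on one resample, and the function name and defaults are mine, not from the talk.

```python
import numpy as np

def bootstrap_sensitivity_ci(data, extrema, B=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for a partially identified parameter.

    data:    array of observations, resampled along the first axis
    extrema: function mapping a resample to (min, max) of the estimate
             over the sensitivity model (a stand-in, not from the talk)
    Returns (L, U) with L = Q_{alpha/2} of the B minima and
    U = Q_{1-alpha/2} of the B maxima."""
    rng = np.random.default_rng(seed)
    n = len(data)
    mins = np.empty(B)
    maxs = np.empty(B)
    for b in range(B):
        resample = data[rng.integers(0, n, size=n)]  # sample rows with replacement
        mins[b], maxs[b] = extrema(resample)
    return np.quantile(mins, alpha / 2), np.quantile(maxs, 1 - alpha / 2)
```

When the model is a single point (Γ = 1, so min = max), this reduces to Efron’s percentile bootstrap.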

SLIDE 14

Proof of the Theorem

Partially identified parameter: three ideas

  Extrema estimator ==(Optimization + Percentile bootstrap + Minimax inequality)==> Confidence interval

1. The sampling variability of µ̂(h) can be captured by the bootstrap. Writing µ̂(h)_b for the estimate computed on the b-th resample, the percentile bootstrap CI is
  [ Q_{α/2}(µ̂(h)_b), Q_{1−α/2}(µ̂(h)_b) ].

2. Generalized minimax inequality:
  Q_{α/2}(inf_h µ̂(h)_b) ≤ inf_h Q_{α/2}(µ̂(h)_b) ≤ sup_h Q_{1−α/2}(µ̂(h)_b) ≤ Q_{1−α/2}(sup_h µ̂(h)_b).
The two outer quantities form the percentile bootstrap CI of the extrema; the two inner quantities form the union CI.
SLIDE 15

Computation

Partially identified parameter: three ideas

  Extrema estimator ==(Optimization + Percentile bootstrap + Minimax inequality)==> Confidence interval

3. Computing the extrema of µ̂(h) is a linear fractional program: letting zi = exp(h(Xi, Yi)), we just need to solve
  max or min  [ Σ_{i=1}^n AiYi (1 + zi exp(−ĝ(Xi))) ] / [ Σ_{i=1}^n Ai (1 + zi exp(−ĝ(Xi))) ],
  subject to zi ∈ [Γ^{−1}, Γ], i = 1, . . . , n.

◮ This can be converted to a linear program.
◮ Moreover, the solution z must have the same/opposite order as Y, so the time complexity can be reduced to O(n) (optimal).
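A minimal Python sketch of this extrema computation, assuming (per the ordering property above) that an optimal z is monotone in Y and hence a threshold rule taking only the values Γ^{−1} and Γ. For clarity it scans all n+1 cutoffs in O(n²) rather than implementing the O(n) algorithm; names are illustrative.

```python
import numpy as np

def sipw_extrema(A, Y, g_hat, Gamma):
    """Extrema of the SIPW estimate over z_i in [1/Gamma, Gamma].

    Objective: sum A_i Y_i (1 + z_i b_i) / sum A_i (1 + z_i b_i),
    with b_i = exp(-g_hat_i). Assumes an optimal z is a threshold rule
    in Y (monotone ordering property), and scans every cutoff."""
    obs = A == 1
    y = Y[obs]
    b = np.exp(-g_hat[obs])
    order = np.argsort(y)
    y, b = y[order], b[order]
    n = len(y)
    lo, hi = np.inf, -np.inf
    for k in range(n + 1):
        # maximize: small z (low weight) on the k smallest outcomes, large z on the rest
        z_max = np.r_[np.full(k, 1 / Gamma), np.full(n - k, Gamma)]
        # minimize: the reverse ordering
        z_min = np.r_[np.full(k, Gamma), np.full(n - k, 1 / Gamma)]
        for z, is_max in ((z_max, True), (z_min, False)):
            w = 1 + z * b
            val = np.sum(w * y) / np.sum(w)
            hi = max(hi, val) if is_max else hi
            lo = min(lo, val) if not is_max else lo
    return lo, hi
```

At Γ = 1 the interval collapses to the single SIPW point estimate, since all zi = 1.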

The role of the bootstrap

Compared to the union method, the workflow is greatly simplified:
  1. No need to derive σ(h) analytically (though we could).
  2. No need to optimize σ(h) (which is very challenging).
SLIDE 16

Comparison with Rosenbaum’s sensitivity analysis

                      Rosenbaum’s paradigm            New bootstrap approach
Population            Sample                          Super-population
Design                Matching                        Weighting
Sensitivity model     R(Γ)                            M(Γ), where M(√Γ) ⊆ R(Γ) ⊆ M(Γ)
Inference             Bounding the p-value            CI for the ATE/ATT
Effect modification   Constant effect                 Allows for heterogeneity
Extension             Carefully developed for         Can be applied to missing
                      observational studies           data problems

SLIDE 17

Example

Fish consumption and blood mercury
◮ 873 controls: ≤ 1 serving of fish per month.
◮ 234 treated: ≥ 12 servings of fish per month.
◮ Covariates: gender, age, income (very imbalanced), race, education, ever smoked, # cigarettes.

Implementation details
◮ Rosenbaum’s method: 1-1 matching, CI constructed by Hodges-Lehmann (assuming the causal effect is constant).
◮ Our method (percentile bootstrap): stabilized IPW for the ATT, with and without augmentation by an outcome linear regression.

SLIDE 18

Results

◮ Recall that M(√Γ) ⊆ R(Γ) ⊆ M(Γ).

[Figure: estimated causal effect against Γ for matching, SIPW (ATT), and SAIPW (ATT). The solid error bars are the range of point estimates and the dashed error bars (together with the solid bars) are the confidence intervals. The circles/triangles/squares are the mid-points of the solid bars.]

SLIDE 19

Discussion: The general sensitivity analysis problem

  Extrema estimator ==(Optimization + Percentile bootstrap + Minimax inequality)==> Confidence interval

The percentile bootstrap idea can be extended to the following problem:
  max or min  E[f(X, θ, h(X))], subject to ‖h(X)‖∞ ≤ γ,
where f is a functional of the observed data X, a finite-dimensional nuisance parameter θ, and a sensitivity function h(X), as long as
◮ θ is “estimable” given X and h;
◮ the bootstrap “works” for En[f(X, θ̂, h(X))], given h.

The challenges...
  1. How to solve the sample version of the optimization problem?
  2. Can we allow infinite-dimensional θ?
  3. Can we include additional constraints such as E[g(X, θ, h(X))] ≤ 0?
SLIDE 20

References I

Reference for this talk
◮ “Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap.” To appear in JRSSB.
◮ R package: https://github.com/qingyuanzhao/bootsens.

Further references
J. Birmingham, A. Rotnitzky, and G. M. Fitzmaurice. Pattern-mixture and selection models for analysing longitudinal data with monotone missing patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1):275–297, 2003.
J. Cornfield, W. Haenszel, E. C. Hammond, A. M. Lilienfeld, M. B. Shimkin, and E. L. Wynder. Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute, 22(1):173–203, 1959.
M. J. Daniels and J. W. Hogan. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. CRC Press, 2008.
P. Ding and T. J. VanderWeele. Sensitivity analysis without assumptions. Epidemiology, 27(3):368, 2016.
P. B. Gilbert, R. J. Bosch, and M. G. Hudgens. Sensitivity analysis for the assessment of causal vaccine effects on viral load in HIV vaccine trials. Biometrics, 59(3):531–541, 2003.
G. W. Imbens and C. F. Manski. Confidence intervals for partially identified parameters. Econometrica, 72(6):1845–1857, 2004.

SLIDE 21

References II

J. M. Robins. Association, causation, and marginal structural models. Synthese, 121(1):151–179, 1999.
J. M. Robins. Comment on “Covariance adjustment in randomized experiments and observational studies”. Statistical Science, 17(3):309–321, 2002.
P. R. Rosenbaum. Observational Studies. Springer New York, 2002.
D. O. Scharfstein, A. Rotnitzky, and J. M. Robins. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448):1096–1120, 1999.
Z. Tan. A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association, 101(476):1619–1637, 2006.
S. Vansteelandt, E. Goetghebeur, M. G. Kenward, and G. Molenberghs. Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica, 16(3):953–979, 2006.