SLIDE 1

Methods for Handling Missing Data

Joseph Hogan Brown University

MDEpiNet Conference Workshop

October 22, 2018

Hogan (MDEpiNet) Missing Data October 22, 2018 1 / 160

SLIDE 2

Course Overview I

1. Introduction and Background
◮ Introduce case studies
◮ Missing data mechanisms
◮ Review and critique of commonly used methods

2. Case Study 1: Growth Hormone Study
◮ Analysis using mixture models
◮ Setting up sensitivity analysis
◮ Inference about treatment effects

SLIDE 3

Course Overview II

3. Case Study 2: Smoking cessation study
◮ Exploratory analysis for a long sequence of binary data
◮ Analysis via IPW methods under MAR
◮ Comparative analysis via GEE, ML, LOCF

SLIDE 4

INTRODUCTION AND BACKGROUND

SLIDE 5

Study 1: Growth Hormone Study

NIH-funded trial of rhGH for increasing muscle strength in the elderly. About 240 patients were randomized to one of 4 arms:

◮ Placebo
◮ rhGH
◮ Exercise + Placebo (EP)
◮ Exercise + rhGH (EG)

Primary outcome

◮ Quadriceps strength, in ft-lbs of torque
◮ Measured at baseline, 6 months, 12 months

Our analysis

◮ Mean quad strength at 12 months
◮ Compare EP and EG arms only, for illustration

SLIDE 6

Summary Statistics: Growth Hormone Study

Mean (SD) quad strength by dropout pattern k (nk = number of subjects):

                             Month
Treatment   k     nk   Baseline     6         12
EP          1      7   65 (32)
            2      2   87 (52)   86 (51)
            3     31   65 (24)   81 (25)   73 (21)
            All   40   66 (26)   82 (26)   73 (21)
EG          1     12   58 (26)
            2      4   57 (15)   68 (26)
            3     22   78 (24)   90 (32)   88 (32)
            All   38   69 (25)   87 (32)   88 (32)

SLIDE 7

Questions to be addressed

What is the mean quad strength at 12 months, among all individuals who initiated therapy?
What is the treatment effect at 12 months, among all individuals who initiated therapy?
These are questions about the full data, i.e., the data we intended to observe but did not.
The main difficulty is that a significant proportion of the data are missing.

SLIDE 8

Study 2: Smoking Cessation Study

NIH-funded study to reduce smoking among sedentary women. Roughly 300 individuals were randomized to two arms:

◮ Supervised exercise vs. wellness education program

Primary outcome

◮ Weekly smoking status over 12 weeks

Treatment comparison

◮ Smoking rate at week 12 following baseline

Analysis issues

◮ Binary outcomes
◮ Mean has some structure as a function of time
◮ Large number of repeated measures

SLIDE 9

Smoking Cessation Study: Summaries

SLIDE 10

Smoking Cessation Study: Summaries

SLIDE 11

Basics of inference with incomplete data

Formulate the precise question you want to answer
Define the quantity you want to estimate
Ascertain what information is available in the data
... and what information is unavailable
Apply a statistical method to estimate the quantity of interest
Apply statistical principles to quantify uncertainty

◮ Sampling variability
◮ Uncertainty due to missing data or untestable assumptions

SLIDE 12

Course objectives

Develop an understanding of

◮ Mechanisms that lead to missing data
◮ Biases missing data may cause
◮ Methods of addressing missing data

Use examples to illustrate the methods, and provide understanding of how they work

SLIDE 13

A word about the data examples

They are relatively simple in nature (stylized)
They are designed to promote understanding of the methods
The idea is that when you apply the methods to more complex problems, you will have a feel for how and why they work (and how and why they don't)
In real life, data analysis problems can be much harder than the ones we are using
You will have to do further research to implement these methods on complex datasets

SLIDE 14

Some reasons for missing data

Refusal to respond
Drop out of a study (patient decision)
Removal from a study (researcher / doctor decision)
Death
Administrative reasons (funding, etc.)

SLIDE 15

Defining the estimation target

First, some notation:

Y = outcome variable (e.g., CD4 count)
R = response indicator: R = 1 if Y observed, R = 0 if Y missing
X = covariates of direct interest
V = auxiliary covariates (available, not of direct interest)

SLIDE 16

Defining the estimation target with incomplete data

Possible targets of estimation:

Full-data parameter: mean outcome among all individuals intended to be in the sample, whether or not they are observed:

µ = E(Y)

Observed-data parameter: mean response among all individuals whose outcome was observed:

µ1 = E(Y | R = 1)

SLIDE 17

Defining the estimation target with incomplete data

Full-data parameter: regression parameters among all individuals intended to be in the sample:

E(Y | X) = X^T β

Observed-data parameter: regression parameters among individuals with observed outcome:

E(Y | X, R = 1) = X^T β1

Important questions to ask: When does µ = µ1? When does β = β1?

SLIDE 18

Illustration using univariate mean

Consider a univariate sample:

n units targeted; m units respond (m < n)
Data: Y1, . . . , Ym observed; Ym+1, . . . , Yn missing
R1 = R2 = · · · = Rm = 1 and Rm+1 = · · · = Rn = 0
X1, . . . , Xn is a baseline covariate, observed on everyone

Target of inference: µ = E(Y) (full-data parameter)

SLIDE 19

Data excerpt from Growth Hormone trial

V = baseline quad strength, Y = quad strength at one year

           V      Y    R
 [1,]   35.0     NA    0
 [2,]   74.5   82.6    1
 [3,]  120.2  118.7    1
 [4,]   84.8   99.6    1
 [5,]   68.6     NA    0
 [6,]   47.9   57.5    1
 [7,]   39.0     NA    0
 [8,]   52.0     NA    0
 [9,]   92.9   87.5    1
[10,]   98.8   97.4    1
[11,]   48.6   21.4    1

SLIDE 20

What can be estimated?

What we can estimate: µ1 = E(Y | R = 1)
What we cannot estimate (without making assumptions): µ = E(Y)

The main reason we cannot estimate µ is that

µ = E(Y) = E(Y | R = 1) P(R = 1) + E(Y | R = 0) P(R = 0)

and the second term involves the distribution of the missing outcomes.

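The decomposition above can be checked numerically. The sketch below is a hypothetical simulation (not the course data): it makes missingness depend on Y itself, so the observed-data mean µ1 overestimates the full-data mean µ, while the mixture identity holds exactly in the sample.

```python
import numpy as np

# Hypothetical MNAR illustration: higher Y -> more likely to be observed,
# so mu_1 = E(Y | R = 1) is biased upward for mu = E(Y).
rng = np.random.default_rng(0)

y = rng.normal(loc=80.0, scale=10.0, size=100_000)   # full data
p_obs = 1.0 / (1.0 + np.exp(-(y - 80.0) / 5.0))      # P(R = 1) increases with y
r = rng.binomial(1, p_obs)

mu = y.mean()                  # full-data mean (unobservable in practice)
mu1 = y[r == 1].mean()         # observed-data mean
mu0 = y[r == 0].mean()         # mean among the missing (unobservable)
p1 = r.mean()

# The decomposition mu = mu1 P(R=1) + mu0 P(R=0) holds exactly in-sample:
lhs = mu
rhs = mu1 * p1 + mu0 * (1 - p1)
print(mu, mu1, rhs)            # mu1 clearly exceeds mu; lhs equals rhs
```

The point of the sketch is that both terms of the decomposition are needed: with data alone we can estimate µ1 and P(R = 0), but never µ0.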

SLIDE 21

The need for assumptions to estimate full-data parameters

Cannot estimate parameters for parts of the data that are missing
Hence we need assumptions about the missing data

◮ These assumptions are called missing data mechanisms

Under most circumstances, these assumptions cannot be tested
This motivates the need to:

◮ State the assumptions unambiguously so others can critique them
◮ Carry out sensitivity analysis wherever possible

SLIDE 22

Missing data mechanisms

Classification of the association between R and Y:

MCAR – Missing completely at random
MAR – Missing at random
MNAR – Missing not at random

These are sometimes defined conditionally on covariates X or, in the case of repeated measures, on the data history up to a specific time point. More later.

SLIDE 23

Missing data mechanisms

Joint distribution of two random variables
MDM for univariate samples
MDM for a multivariate sample, where interest is in regression

◮ Have model covariates only
◮ Have model covariates and auxiliary information

SLIDE 24

Statistical independence

The notation X ⊥⊥ Y means that the random variable X is independent of the random variable Y.

Implications of independence:

The joint distribution can be factored: f(x, y) = f(x) f(y)
Conditional distributions and expectations: f(x | y) = f(x) and E(X | Y) = E(X)

That is, knowing Y does not influence the distribution or expectation of X.

SLIDE 25

Joint distribution of two random variables

To characterize inference from incomplete data, we are always sampling at least two variables, Y and R.
Usually we are interested in some aspect of f(y), such as the mean or median. Denote this by θ. For example,

θ = E(Y) = ∫ y f(y) dy

But to carry out inference, we need to make assumptions about the joint distribution of Y and R, denoted f(y, r).

SLIDE 26

Joint distribution of two random variables

The joint distribution can be decomposed into conditional distributions as follows:

Mixture factorization: f(y, r) = f(r) f(y | r)
Selection factorization: f(y, r) = f(y) f(r | y)

We will focus on the selection factorization for now.

SLIDE 27

Selection factorization

The selection factorization describes the joint distribution in terms of:

The distribution or model for the variable of interest, f(y)
The distribution of the response indicators and their dependence on y, written as f(r | y)

◮ Missing data mechanism
◮ Selection mechanism

This allows us to characterize different types of missing data mechanisms formally.

SLIDE 28

MDM for univariate sampling

Missing values of Y are missing completely at random (MCAR) if R ⊥⊥ Y, or equivalently if f(r | y) = f(r).
For univariate samples, this is also classified as missing at random (MAR). More on this distinction later.

SLIDE 29

MDM for univariate sampling

Missing values of Y are missing not at random (MNAR) if there exists at least one value of y such that f(r | y) ≠ f(r).
In words: the probability of response is systematically higher or lower for particular values of y.

SLIDE 30

MDM for univariate sampling

Under MAR, methods applied to the observed data only will generally yield valid inferences about the population.

◮ Estimates will be consistent
◮ Standard errors may be larger than if you had the full data

Under MNAR, methods applied to the observed data only will generally not yield valid inferences.

SLIDE 31

MDM for multivariate sampling – regression

Consider the setting where we are interested in the regression of Y on X. Let µ(X) = E(Y | X). Assume there are no other covariates available.
The model we have in mind is

g{µ(X)} = β0 + β1X

Here, the function g tells you the type of regression model you are fitting (linear, logistic, etc.).
The full data are (Y1, X1, R1), (Y2, X2, R2), . . . , (Yn, Xn, Rn).

SLIDE 32

MDM for multivariate sampling – regression

In cases like this, we can define missing data mechanisms relative to the objective of inference.
The Y's are missing at random if Y ⊥⊥ R | X.
In words, the MDM is a random deletion mechanism within distinct levels of X.
Another way to write this: f(r | y, x) = f(r | x).
The deletion mechanism depends on X, but within levels of X it does not depend on Y.

SLIDE 33

Examples of MAR in regression

Let Y denote blood pressure (BP), X denote gender (1 = F, 0 = M). The regression model of interest is

E(Y | X) = β0 + β1X,

so that β0 = mean BP among men and β1 = mean difference. Let's assume men have higher BP on average.
Randomly delete BP for 20% of men and 40% of women. Then R does depend on Y, but only through X:

◮ Men have higher BP
◮ Men are less likely to be deleted
◮ ⇒ those with higher BP are less likely to be deleted

Within levels of X, the deletion mechanism is completely random.

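The BP example can be simulated directly. The sketch below uses hypothetical numbers (mean BP of 130 for men, a difference of −8 for women, and the 20%/40% deletion rates from the slide): because deletion is completely random within each level of X, a complete-case fit still recovers both regression parameters.

```python
import numpy as np

# Simulation of the slide's MAR-in-regression example (numbers hypothetical).
rng = np.random.default_rng(1)
n = 200_000

x = rng.binomial(1, 0.5, size=n)                 # 1 = F, 0 = M
y = 130.0 - 8.0 * x + rng.normal(0, 10, size=n)  # men higher BP on average

p_miss = np.where(x == 0, 0.20, 0.40)            # delete 20% of men, 40% of women
r = rng.binomial(1, 1.0 - p_miss)                # R depends on Y only through X

# Complete-case estimates of b0 and b1 (group means suffice for a binary X):
b0_hat = y[(r == 1) & (x == 0)].mean()
b1_hat = y[(r == 1) & (x == 1)].mean() - b0_hat
print(b0_hat, b1_hat)   # close to 130 and -8 despite differential missingness
```

Note that the marginal mean of the observed Y's *is* biased here (men are over-represented among completers); it is the regression parameters, defined within levels of X, that remain estimable.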

SLIDE 34

MAR in regression – some practical issues

Revisit the MAR condition. If R ⊥⊥ Y | X, this also means

f(y | x, r) = f(y | x),

or

f(y | x, r = 1) = f(y | x, r = 0)

The relationship between Y and X is the same whether R = 1 or R = 0.
Consequently, a regression fit to those with R = 1 gives valid estimates of the regression parameters. (Standard errors will be higher relative to having all the data.)

SLIDE 35

MAR in regression – some practical issues

Under MAR, the inferences are still valid even if

◮ The X distribution differs between those with missing and observed Y's

Question you have to ask to (subjectively) assess MAR: is the missing data mechanism a random deletion of Y's among people who have the same X values?
Equivalent formulation of this question: is the relationship between X and Y the same among those with missing and observed Y values?

SLIDE 36

MAR for regression when auxiliary variables are available

In some cases we have information on more than just the X variables.
In a clinical trial we may be interested in E(Y | X) where X is treatment group, but we have collected many baseline covariates V.
In a longitudinal study, we may be interested in the mean outcome at the last measurement time, but we have accumulated information on the outcome at previous measurement times.
When auxiliary information is available, we can sometimes use it to make MAR more plausible. Here MAR has a slightly different formulation.

SLIDE 37

MAR with auxiliary covariates

The relationship of interest is g{µ(X)} = β0 + β1X.
The full data are (Y1, X1, V1, R1), (Y2, X2, V2, R2), . . . , (Yn, Xn, Vn, Rn), where Y is observed when R = 1 and missing when R = 0.

SLIDE 38

MAR with auxiliary covariates

Values of Y are missing at random (MAR) if Y ⊥⊥ R | (X, V).
Two equivalent ways to write this are:

f(r | x, v, y) = f(r | x, v)
f(y | x, v, r = 1) = f(y | x, v, r = 0)

The first says that within distinct levels defined by (X, V), missingness in Y is a random deletion mechanism.
The second says that the relationship between Y and (X, V) is the same whether Y is missing or not.

SLIDE 39

MAR with auxiliaries – example

Return to our BP example, but now let V denote income level. Recall that we are interested in the coefficient β1 from

E(Y | X) = β0 + β1X

and not the coefficient α1 from

E(Y | X, V) = α0 + α1X + α2V

SLIDE 40

Missing data mechanisms for longitudinal data

Need to define some notation for longitudinal data:

Yj = value of Y at time j
Rj = 1 if Yj observed, 0 otherwise
Ȳj = (Y1, Y2, . . . , Yj) = outcome history up to time j
X̄j = covariate history up to time j
Hj = (X̄j, Ȳj−1)

This allows us to define MCAR, MAR, and MNAR for longitudinal data.

SLIDE 41

Missing data mechanisms for longitudinal data

MAR

Missing at random (MAR): if interest is in marginal means such as E(Yj), MAR means

Rj ⊥⊥ Yj | (Rj−1 = 1, Hj)

Interpretation:

◮ Among those in follow-up at time j, missingness is independent of the outcome Yj, conditional on the previously observed Y's.
◮ Missingness does not depend on present or future Y's, given the past.

SLIDE 42

Missing data mechanisms for longitudinal data

MAR

Implications

1. Selection mechanism:

[Rj | Rj−1 = 1, HJ] = [Rj | Rj−1 = 1, Hj]

◮ Can model the selection probability as a function of the observed past

2. Imputation mechanism:

[Yj | Rj = 0, Rj−1 = 1, Hj] = [Yj | Rj = 1, Rj−1 = 1, Hj]

◮ Can impute a missing Yj using a model for the observed Yj
◮ Critical: must correctly specify the observed-data model

SLIDE 43

LOCF

We can characterize LOCF in this framework. It is an imputation mechanism:

◮ Missing Yj is set equal to the most recently observed value of Y
◮ The missing value is filled in with probability one (no variance)

Formally, [Yj | Rj = 0, Hj] = Yj∗ with probability one, where j∗ = max{k < j : Rk = 1}.
Not an MAR mechanism in general:

◮ The conditional distribution of a missing Yj is not equal to that of an observed Yj
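The LOCF rule above is a one-line deterministic fill. A minimal sketch (with made-up numbers) makes its defining feature visible: each missing value is replaced by the last observed one, with no imputation variance at all.

```python
import numpy as np

# Minimal LOCF sketch on one subject's monotone-dropout sequence.
def locf(y):
    """Carry the last observed (non-NaN) value forward, deterministically."""
    out = np.array(y, dtype=float)
    for j in range(1, len(out)):
        if np.isnan(out[j]):
            out[j] = out[j - 1]   # value at j* = max{k < j : R_k = 1}
    return out

y = [65.0, 81.0, np.nan, np.nan]   # subject drops out after time 2
print(locf(y))                      # [65. 81. 81. 81.]
```

Because the fill is degenerate (probability one on a single value), LOCF both distorts the conditional distribution of the missing Yj and understates uncertainty.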

SLIDE 44

Random effects and parametric models

Assume a joint distribution for the repeated measures. The model applies to the full data, hence cannot be checked.

Example: multivariate normal, (Y1, . . . , YJ)^T ∼ N(µ, Σ), where µ (J × 1) = E(Y) and Σ (J × J) = var(Y)

Special case: random effects model

◮ A particular way of structuring the mean and variance

SLIDE 45

Random effects and parametric models

When do these models yield valid inference? Most parametric models are valid if

◮ MAR holds
◮ All parts of the model are correctly specified

These models have an implied distribution for the conditionals [Yj | Y1, . . . , Yj−1].
Under MAR, the implied distribution applies to those with complete and incomplete data, but ...
Parametric assumptions cannot be checked empirically.

SLIDE 46

GEE I

Assume a mean and variance structure for the repeated measures

◮ Not necessarily a full parametric model

The assumed variance structure is the 'working covariance'.
With complete data:

◮ Inferences are most efficient when the covariance is correctly specified
◮ Inference about time-specific means is correct even if the covariance is mis-specified
◮ Reason: all information about the time-specific means is already observed

SLIDE 47

GEE II

With incomplete data:

◮ Information about time-specific means relies on 'imputation' of missing observations
◮ These imputations come from the conditional distribution [Yj | Y1, . . . , Yj−1]
◮ The form of the conditional distribution depends on the working covariance

Implication: inference about time-specific means is correct only when both the mean and the covariance are correctly specified

◮ Can get different treatment effects with different working covariances

SLIDE 48

Dependence of estimates on working covariance

From Hogan et al., 2004 Statistics in Medicine

SLIDE 49

Structure of case studies

1. Introduce modeling approach
2. Relate modeling approach to missing data hierarchy
3. Illustrate on simple cases
4. Include a treatment comparison
5. Discussion of key points from case study

SLIDE 50

CASE STUDY I: MIXTURE MODEL ANALYSIS OF GROWTH HORMONE TRIAL

SLIDE 51

Outline of analysis

Objective: Compare EG to EP at month 12

◮ Variable: Y3

Estimation of E(Y3) for the EG arm only

◮ Ignoring baseline covariates
◮ Using information from the baseline covariate Y1
◮ MAR and MNAR (sensitivity analysis)

Treatment comparisons

◮ Expand to the longitudinal case
◮ MAR – using regression imputation
◮ MNAR – sensitivity analysis

SLIDE 52

Estimate E(Y ) from univariate sample

          Y3    R
 [1,]     NA    0
 [2,]   82.6    1
 [3,]  118.7    1
 [4,]   99.6    1
 [5,]     NA    0
 [6,]   57.5    1
 [7,]     NA    0
 [8,]     NA    0
 [9,]   87.5    1
[10,]   97.4    1
[11,]   21.4    1
[12,]   47.2    1
[13,]     NA    0
[14,]   68.6    1

SLIDE 53

Estimating E(Y ) from univariate sample

Model:

E(Y | R = 1) = µ1
E(Y | R = 0) = µ0 (not identifiable)

Target of estimation:

E(Y) = µ1 P(R = 1) + µ0 P(R = 0)

Question: what to assume about µ0? In a sense, we are going to impute a value for µ0, or impute values of the missing Y's that will lead to an estimate of µ0.

SLIDE 54

Parameterizing departures from MAR

Target of estimation:

E(Y) = µ1 P(R = 1) + µ0 P(R = 0)
     = µ1 + (µ0 − µ1) P(R = 0)

This suggests the sensitivity parameter ∆ = µ0 − µ1, leading to

E∆(Y) = µ1 + ∆ P(R = 0)

Features of this formulation:

Centered at MAR (∆ = 0)
∆ cannot be estimated from the observed data
Can vary ∆ for sensitivity analysis
Allows a Bayesian approach by placing a prior on ∆

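The formula E∆(Y) = µ̂1 + ∆ P̂(R = 0) is a one-line computation. The sketch below uses the observed-data summaries reported on the sensitivity-plot slide that follows (Ȳ[R=1] = 88.3 and P̂(R = 0) = 0.42) rather than raw data:

```python
# Sensitivity formula E_Delta(Y) = mu1_hat + Delta * P_hat(R = 0),
# evaluated over a grid of Delta values. Summaries taken from the slides.
mu1_hat = 88.3   # observed-data mean, Ybar[R=1]
p_miss = 0.42    # estimated P(R = 0)

def e_delta(delta):
    return mu1_hat + delta * p_miss

for delta in (-10, -5, 0, 5, 10):
    print(delta, round(e_delta(delta), 1))
# Delta = 0 (MAR) returns the observed-data mean 88.3; each unit of
# Delta shifts the estimate by P_hat(R = 0) = 0.42.
```

This makes the "shift proportional to the fraction missing" interpretation concrete: with 42% missing, even ∆ = ±10 ft-lbs moves the estimate by only ±4.2.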

SLIDE 55

Estimation under MNAR

Recall the model E∆(Y) = µ1 + ∆ P(R = 0).

Estimate the known quantities:

n1 = Σi Ri
n0 = Σi (1 − Ri)
P̂(R = 0) = n0 / (n1 + n0)
µ̂1 = (1/n1) Σi Ri Yi

One unknown quantity remains: ∆ = µ0 − µ1.

SLIDE 56

Estimation under MAR

Plug into the model:

Ê∆(Y) = µ̂1 + (µ0 − µ1) P̂(R = 0)
      = µ̂1 + ∆ P̂(R = 0)

Interpretation:

Under MAR (∆ = 0), Ê∆(Y) = µ̂1

◮ The estimator is the observed-data mean

Under MNAR (∆ ≠ 0):

◮ Shift the observed-data mean by ∆ P̂(R = 0)
◮ The shift is proportional to the fraction of missing observations

SLIDE 57
[Figure: sensitivity plot of Ê∆(Y3) against ∆, for ∆ from −10 to 10; Ȳ[R=1] = 88.3, P̂(R = 0) = 0.42]

SLIDE 58

Using information from baseline covariates

          Y1      Y3    R
 [1,]   35.0      NA    0
 [2,]   74.5    82.6    1
 [3,]  120.2   118.7    1
 [4,]   84.8    99.6    1
 [5,]   68.6      NA    0
 [6,]   47.9    57.5    1
 [7,]   39.0      NA    0
 [8,]   52.0      NA    0
 [9,]   92.9    87.5    1
[10,]   98.8    97.4    1
[11,]   48.6    21.4    1
[12,]   45.7    47.2    1
[13,]   63.4      NA    0
[14,]   64.2    68.6    1

SLIDE 59

General model for Y

Objective: inference for E(Y3)

Model, general form:

[Y3 | Y1, R = 1] ∼ F1(y3 | y1)
[Y3 | Y1, R = 0] ∼ F0(y3 | y1)

The general form encompasses all possible models for the observed and missing values of Y3.
The model F0 cannot be estimated from data.

SLIDE 60

The model under MAR and MNAR

Under MAR, F0 = F1. This suggests the following strategy:

◮ Fit a model for F1 using the observed data; call it F̂1
◮ Use this model to impute the missing values of Y3

Under MNAR, F0 ≠ F1. This suggests the following strategy:

◮ Parameterize a model so that F0 is related to F1 through a sensitivity parameter ∆
◮ Generically, write this as F0 = F1^∆
◮ Use the fitted version of F1^∆ to impute the missing Y3

SLIDE 61

Regression parameterization of F1 and F0

Take the case of MAR first. MAR implies

[Y3 | Y1, R = 1] = [Y3 | Y1, R = 0]

Assume a regression model for [Y3 | Y1, R = 1]:

E(Y3 | Y1, R = 1) = α1 + β1 Y1

Assume a model of similar form for [Y3 | Y1, R = 0]:

E(Y3 | Y1, R = 0) = α0 + β0 Y1

whose parameters cannot be estimated from the observed data.

SLIDE 62

Regression parameterization of F1 and F0

Recall the models:

E(Y3 | Y1, R = 1) = α1 + β1 Y1
E(Y3 | Y1, R = 0) = α0 + β0 Y1

Link the models. One way to do this:

β0 = β1 + ∆β
α0 = α1 + ∆α

Under MAR: ∆α = ∆β = 0.

SLIDE 63

Caveats to using this (or any!) approach

Recall the models:

E(Y3 | Y1, R = 1) = α1 + β1 Y1
E(Y3 | Y1, R = 0) = α0 + β0 Y1

A more general version of the missing-data model is

E(Y3 | Y1, R = 0) = g(Y1; θ)

Do we know the form of g? Do we know the value of θ? Do we know that Y1 is sufficient to predict Y3?
We are assuming we know all of these things.

SLIDE 64

Estimation of E(Y3) under MAR

1. Fit the model E(Y3 | Y1, R = 1) = α1 + β1 Y1 ⇒ obtain α̂1, β̂1.

2. For those with R = 0, impute the predicted value via

Ŷ3i = Ê(Y3 | Y1i, Ri = 0) = α̂1 + β̂1 Y1i

3. Estimate the overall mean as the mixture

Ê(Y3) = (1/n) Σi { Ri Y3i + (1 − Ri) Ŷ3i }

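The three steps above can be sketched in a few lines on the 14-row data excerpt shown earlier (Y1 = baseline, Y3 = 12-month strength, NaN = missing):

```python
import numpy as np

# Regression imputation under MAR, using the course's 14-row data excerpt.
y1 = np.array([35.0, 74.5, 120.2, 84.8, 68.6, 47.9, 39.0, 52.0,
               92.9, 98.8, 48.6, 45.7, 63.4, 64.2])
y3 = np.array([np.nan, 82.6, 118.7, 99.6, np.nan, 57.5, np.nan, np.nan,
               87.5, 97.4, 21.4, 47.2, np.nan, 68.6])
r = ~np.isnan(y3)

# Step 1: fit E(Y3 | Y1, R = 1) = alpha1 + beta1 * Y1 by least squares
X = np.column_stack([np.ones(r.sum()), y1[r]])
(alpha1, beta1), *_ = np.linalg.lstsq(X, y3[r], rcond=None)

# Step 2: impute predicted values wherever R = 0
y3_hat = np.where(r, y3, alpha1 + beta1 * y1)

# Step 3: the overall mean mixes observed and imputed values
print(round(y3_hat.mean(), 1))
```

Because the dropouts have lower baseline strength, the imputed values pull the estimate below the observed-case mean, exactly the pattern in the summary table on the next slides.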

SLIDE 65

Regression imputation under MAR

[Figure: scatter plot of Y3 against Y1 among completers (R = 1), with imputed values marked +]

SLIDE 66

Sample means and imputed means under MAR

         nr    Y1    Y3
R = 0    16    57    67
R = 1    22    78    88
MAR      38          79

(For R = 0, the Y3 entry is the mean of the imputed values; the MAR row gives the combined estimate of E(Y3).)

SLIDE 67

Some intuition behind this (simple) estimator

One could base the estimate purely on the regression model:

E(Y3) = E_{Y1,R}{ E(Y3 | Y1, R) }
      = E_R[ E_{Y1|R}{ E(Y3 | Y1, R) } ]
      = E_R[ α1 + β1 E(Y1 | R) ]
      = α1 + β1 E(Y1)

Plug in the estimators for each term. This may be more efficient when the regression model is correct.

SLIDE 68

Some more details ...

If we don't want to use the regression model for Y3 throughout, we can write

E(Y3) = E(Y3 | R = 1) P(R = 1) + E(Y3 | R = 0) P(R = 0),

where the second term is

E(Y3 | R = 0) = E_{Y1|R=0}{ E(Y3 | Y1, R = 0) }
             = E_{Y1|R=0}( α1 + β1 Y1 | R = 0 )
             = α1 + β1 E(Y1 | R = 0)

Hence

Ê(Y3) = Ȳ3[R=1] P̂(R = 1) + ( α̂1 + β̂1 Ȳ1[R=0] ) P̂(R = 0)

SLIDE 69

Inference and treatment comparisons

SE and CI: bootstrap

◮ Draw a bootstrap sample
◮ Carry out the imputation procedure
◮ Repeat for many bootstrap samples (say B)
◮ Base the SE and CI on the B bootstrapped estimators

Why not multiple imputation?

◮ The estimators are linear
◮ The bootstrap takes care of missing-data uncertainty here

Treatment comparisons – coming later

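The bootstrap loop above is short to write down. A minimal sketch, reusing the same illustrative 14-row excerpt: resample subjects, redo the whole imputation, and take the SD of the B estimates as the SE.

```python
import numpy as np

# Bootstrap SE for the regression-imputation estimator of E(Y3).
rng = np.random.default_rng(2)

y1 = np.array([35.0, 74.5, 120.2, 84.8, 68.6, 47.9, 39.0, 52.0,
               92.9, 98.8, 48.6, 45.7, 63.4, 64.2])
y3 = np.array([np.nan, 82.6, 118.7, 99.6, np.nan, 57.5, np.nan, np.nan,
               87.5, 97.4, 21.4, 47.2, np.nan, 68.6])

def impute_mean(y1, y3):
    """Fit E(Y3|Y1,R=1), impute the R=0 rows, return the overall mean."""
    r = ~np.isnan(y3)
    X = np.column_stack([np.ones(r.sum()), y1[r]])
    (a, b), *_ = np.linalg.lstsq(X, y3[r], rcond=None)
    return np.where(r, y3, a + b * y1).mean()

B, n = 500, len(y1)
est = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)        # resample subjects with replacement
    if np.isnan(y3[idx]).all():             # skip the rare draw with no completers
        continue
    est.append(impute_mean(y1[idx], y3[idx]))
se = np.std(est, ddof=1)
print(round(se, 2))
```

Note the whole procedure, including refitting the imputation model, sits inside the loop; that is what lets the bootstrap capture missing-data uncertainty without multiple imputation.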

SLIDE 70

Estimation of E(Y3) under MNAR

Recall the models:

E(Y3 | Y1, R = 1) = α1 + β1 Y1
E(Y3 | Y1, R = 0) = α0 + β0 Y1

A more general version of the model is

E(Y3 | Y1, R = 1) = g1(Y1; θ1)
E(Y3 | Y1, R = 0) = g0(Y1; θ1, ∆) = h{g1(Y1; θ1), ∆}

The h function relates the missing-data and observed-data models:

The user needs to specify the form of h
The parameter ∆ should not be estimable from the observed data
Vary ∆ in a sensitivity analysis

SLIDE 71

Regression-based specification under MNAR

Specify the observed-data model:

E(Y3 | Y1, R = 1) = g1(Y1; θ1) = α1 + β1 Y1

Specify the missing-data model:

E(Y3 | Y1, R = 0) = h{g1(Y1; θ1), ∆} = ∆ + g1(Y1; θ1) = ∆ + (α1 + β1 Y1)

Many other choices are possible. Here we add a constant to the MAR imputation: MAR holds when ∆ = 0, MNAR otherwise.

SLIDE 72

Estimation of E(Y3) under MNAR

1. Fit the model E(Y3 | Y1, R = 1) = α1 + β1 Y1 ⇒ obtain α̂1, β̂1.

2. For those with R = 0, impute the predicted value via

Ŷ3i = Ê(Y3 | Y1i, Ri = 0) = ∆ + α̂1 + β̂1 Y1i

3. Estimate the overall mean as the mixture

Ê(Y3) = (1/n) Σi { Ri Y3i + (1 − Ri) Ŷ3i }
      = Ȳ3[R=1] P̂(R = 1) + ( ∆ + α̂1 + β̂1 Ȳ1[R=0] ) P̂(R = 0)

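The ∆-shifted procedure above differs from the MAR version only in Step 2, so the whole sensitivity sweep is one extra loop. A sketch on the same illustrative excerpt:

```python
import numpy as np

# Delta-shifted regression imputation: sweep Delta and recompute E_Delta(Y3).
y1 = np.array([35.0, 74.5, 120.2, 84.8, 68.6, 47.9, 39.0, 52.0,
               92.9, 98.8, 48.6, 45.7, 63.4, 64.2])
y3 = np.array([np.nan, 82.6, 118.7, 99.6, np.nan, 57.5, np.nan, np.nan,
               87.5, 97.4, 21.4, 47.2, np.nan, 68.6])
r = ~np.isnan(y3)

# Step 1 is unchanged: fit the observed-data regression once
X = np.column_stack([np.ones(r.sum()), y1[r]])
(alpha1, beta1), *_ = np.linalg.lstsq(X, y3[r], rcond=None)

def e_delta(delta):
    """Steps 2-3 with the constant Delta added to the imputed values only."""
    imputed = delta + alpha1 + beta1 * y1
    return np.where(r, y3, imputed).mean()

p_miss = (~r).mean()
for delta in (-10, 0, 10):
    print(delta, round(e_delta(delta), 1))
# Each estimate differs from the Delta = 0 (MAR) estimate by delta * p_miss.
```

The final line of the slide's algebra shows why: ∆ enters only through the term ∆ P̂(R = 0), so the sweep traces a straight line in ∆.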

SLIDE 73

Sensitivity analysis based on varying ∆

What should be the 'anchor point'?

◮ Usually appropriate to anchor the analysis at MAR
◮ Examine the effect of MNAR by varying ∆ away from 0

How to select a range for ∆?

◮ Always specific to the application
◮ Ensure the range is appropriate to context (see upcoming example)
◮ Can use a data-driven range for ∆, e.g. based on the SD

Reporting final inferences

◮ 'Stress test' approach
◮ Inverted sensitivity analysis: find the values of ∆ that would change substantive conclusions
◮ Average over plausible ∆ values

SLIDE 74

Calibrating ∆

How should the range and scale of ∆ be chosen?

Direction:

◮ ∆ > 0 ⇒ dropouts have a higher mean
◮ ∆ < 0 ⇒ dropouts have a lower mean

Range and scale:

◮ Residual variation in the outcome is quantified by the SD of the regression error:

[Y3 | Y1, R = 1] = α1 + β1 Y1 + e,  var(e) = σ²

◮ This suggests scaling ∆ in units of σ
◮ Will illustrate in the longitudinal case

SLIDE 75

Moving to longitudinal setting

Set-up for a single treatment arm. Illustrate the ideas with an analysis of the GH data:

◮ Compare treatments
◮ Illustrate sensitivity analysis under MNAR
◮ Discuss how to report results

SLIDE 76

Longitudinal case: notation

Assume the missing data pattern is monotone.

K = dropout time = Σj Rj
Ek(Yj) = E(Yj | K = k)

When j > k, Ek(Yj) cannot be estimated from the data.

SLIDE 77

Longitudinal model with J = 3: Set up

Ek(Y1) = E(Y1 | K = k) is identified for k = 1, 2, 3. For the other means, we have:

             j = 2           j = 3
K = 1    E1(Y2|Y1)      E1(Y3|Y1, Y2)
K = 2    E2(Y2|Y1)      E2(Y3|Y1, Y2)
K = 3    E3(Y2|Y1)      E3(Y3|Y1, Y2)

The components with j > k (E1(Y2|Y1), E1(Y3|Y1, Y2), and E2(Y3|Y1, Y2), shown in red on the slide) cannot be estimated. Need assumptions.

SLIDE 78

Longitudinal model with J = 3: MAR

             j = 2                                    j = 3
K = 1    ω E2(Y2|Y1) + (1 − ω) E3(Y2|Y1)        E3(Y3|Y1, Y2)
K = 2    E2(Y2|Y1)                              E3(Y3|Y1, Y2)
K = 3    E3(Y2|Y1)                              E3(Y3|Y1, Y2)

Here, ω is a weight such that 0 ≤ ω ≤ 1.

SLIDE 79

Longitudinal model with J = 3: MNAR

             j = 2                                         j = 3
K = 1    ω E2(Y2|Y1) + (1 − ω) E3(Y2|Y1) + ∆1        E3(Y3|Y1, Y2) + ∆2
K = 2    E2(Y2|Y1)                                   E3(Y3|Y1, Y2) + ∆3
K = 3    E3(Y2|Y1)                                   E3(Y3|Y1, Y2)

SLIDE 80

Procedure with several longitudinal measures

Start by imputing those with missing data at j = 2.

1. Fit a model for E(Y2 | Y1, R2 = 1):

E(Y2 | Y1, R2 = 1) = α(2) + β1(2) Y1  ⇒ obtain α̂(2), β̂1(2)

This model combines those with K = 2 and K = 3.

2. Impute the missing Y2 as before:

Ŷ2i = ∆ + α̂(2) + β̂1(2) Y1i

SLIDE 81

Procedure with several longitudinal measures

Now impute those with missing data at j = 3.

1. Fit a model for E(Y3 | Y1, Y2, R3 = 1):

E(Y3 | Y1, Y2, R3 = 1) = α(3) + β1(3) Y1 + β2(3) Y2  ⇒ obtain α̂(3), β̂1(3), β̂2(3)

2. Impute the missing Y3 as follows:

◮ For those with Y1, Y2 observed:

Ŷ3i = ∆ + α̂(3) + β̂1(3) Y1i + β̂2(3) Y2i

◮ For those with only Y1 observed:

Ŷ3i = ∆ + α̂(3) + β̂1(3) Y1i + β̂2(3) Ŷ2i

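The two imputation steps above can be sketched end to end. The data below are synthetic (generated, not from the trial), dropout is taken as MAR (∆ = 0), and the sequential fits use only subjects still in follow-up at each step:

```python
import numpy as np

# Sequential regression imputation for monotone dropout with J = 3 (Delta = 0).
rng = np.random.default_rng(3)
n = 40
y1 = rng.normal(70, 20, n)
y2 = 5 + 1.0 * y1 + rng.normal(0, 8, n)
y3 = 10 + 0.9 * y2 + rng.normal(0, 8, n)
k = rng.choice([1, 2, 3], size=n, p=[0.3, 0.1, 0.6])   # dropout time K
y2 = np.where(k >= 2, y2, np.nan)                      # observed only if K >= 2
y3 = np.where(k >= 3, y3, np.nan)                      # observed only if K = 3

def ols(covs, y):
    """Least-squares fit with intercept; covs is a list of covariate vectors."""
    X = np.column_stack([np.ones(len(covs[0])), *covs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: model Y2 | Y1 among those observed at j = 2 (K >= 2), impute the rest
obs2 = k >= 2
a2, b21 = ols([y1[obs2]], y2[obs2])
y2_hat = np.where(obs2, y2, a2 + b21 * y1)

# Step 2: model Y3 | Y1, Y2 among completers (K = 3); impute using the
# imputed y2_hat wherever Y2 itself was missing
obs3 = k == 3
a3, b31, b32 = ols([y1[obs3], y2[obs3]], y3[obs3])
y3_hat = np.where(obs3, y3, a3 + b31 * y1 + b32 * y2_hat)

print(round(y3_hat.mean(), 1))   # estimate of E(Y3) under MAR
```

Plugging the imputed Ŷ2 into the j = 3 model is exactly the chaining described on the next slide: for subjects with only Y1 observed, the imputation is the linear model of E(Y3 | Y1) implied by the two fitted regressions.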

SLIDE 82

Side note

Recall the imputation for those with only Y1 observed:

Ŷ3i = ∆ + α̂(3) + β̂1(3) Y1i + β̂2(3) Ŷ2i

This really uses only information from the observed Y1, because

Ŷ2i = ∆ + α̂(2) + β̂1(2) Y1i

Hence the imputation comes from the (linear) model of E(Y3 | Y1) that is implied by the other imputation models.

SLIDE 83

Calibration of ∆

At each time point j, ∆ is actually a multiplier of the residual SD of the observed-data regression [Yj | Y1, . . . , Yj−1, R = 1].
For example, the imputation model at j = 3 is actually

Ŷ3i = ∆ σ3 + α̂(3) + β̂1(3) Y1i + β̂2(3) Y2i

where σ3² = var(Y3 | Y1, Y2, R3 = 1).

We will generally suppress this for clarity.

SLIDE 84

Procedure with several longitudinal measures

Final step: compute estimate of E(Y3):

   Ê∆(Y3) = (1/n) Σi { R3i Y3i + (1 − R3i) Ŷ3i(∆) }

Based on the imputations, this turns out to be a weighted average of Ȳ3^[K=3], Ȳ2^[K=2], and Ȳ1^[K=1]. The weights depend on

◮ dropout rates at each time
◮ coefficients in the imputation models
◮ the sensitivity parameter(s)

Hogan (MDEpiNet) Missing Data October 22, 2018 84 / 160
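The sequential imputation procedure on the slides above can be sketched in a few lines of code. This is an illustrative Python sketch (the deck's own examples use R), simplified so that each imputation model conditions only on the immediately preceding outcome; `fit_simple` and `impute_mean_y3` are made-up helper names, and ∆ is applied in residual-SD units as on the calibration slide.

```python
# Sketch of sequential regression imputation with a sensitivity parameter.
# Simplification: each imputation model uses only the previous outcome,
# and delta is expressed in residual-SD units (the calibration of Delta).
from statistics import mean

def fit_simple(xs, ys):
    # ordinary least squares for y = a + b*x; returns (a, b, residual SD)
    xbar, ybar = mean(xs), mean(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(e * e for e in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def impute_mean_y3(data, delta):
    # data: list of (y1, y2, y3) tuples, None = missing (monotone dropout)
    # step 1: fit E(Y2 | Y1, R2 = 1) on everyone with Y2 observed (K >= 2)
    obs2 = [(y1, y2) for y1, y2, _ in data if y2 is not None]
    a2, b2, s2 = fit_simple([p[0] for p in obs2], [p[1] for p in obs2])
    # step 2: fit E(Y3 | Y2, R3 = 1) on completers (K = 3)
    obs3 = [(y2, y3) for _, y2, y3 in data if y3 is not None]
    a3, b3, s3 = fit_simple([p[0] for p in obs3], [p[1] for p in obs3])
    # step 3: impute forward in time, shifting each imputation by
    # delta residual SDs; delta = 0 reproduces MAR imputation
    y3_filled = []
    for y1, y2, y3 in data:
        if y2 is None:
            y2 = delta * s2 + a2 + b2 * y1
        if y3 is None:
            y3 = delta * s3 + a3 + b3 * y2
        y3_filled.append(y3)
    return mean(y3_filled)
```

Making ∆ more negative pulls the imputed values, and hence the overall mean, downward, which is what the sensitivity analysis varies.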

slide-85
SLIDE 85

Analysis components

Fitted regression models at each time point

◮ Can check validity of imputation

Contour plots: vary ∆ separately by treatment

◮ Treatment effect estimates ◮ p-values

Summary table

◮ Treatment effect, SE, p-value ◮ These are computed using bootstrap

Hogan (MDEpiNet) Missing Data October 22, 2018 85 / 160

slide-86
SLIDE 86

Hogan (MDEpiNet) Missing Data October 22, 2018 86 / 160

slide-87
SLIDE 87

Hogan (MDEpiNet) Missing Data October 22, 2018 87 / 160

slide-88
SLIDE 88

Hogan (MDEpiNet) Missing Data October 22, 2018 88 / 160

slide-89
SLIDE 89

Hogan (MDEpiNet) Missing Data October 22, 2018 89 / 160

slide-90
SLIDE 90

Case Study 2: Chronic Schizophrenia

Major breakthroughs have been made in the treatment of patients with psychotic symptoms. However, side effects associated with some medications have limited their usefulness. RIS-INT-3 (Marder and Meibach, 1994; Chouinard et al., 1993) was a multi-center study designed to assess the effectiveness and adverse experiences of four fixed doses of risperidone compared to haloperidol and placebo in the treatment of chronic schizophrenia.

Hogan (MDEpiNet) Missing Data October 22, 2018 90 / 160

slide-91
SLIDE 91

RIS-INT-3

Patients were required to have a PANSS (Positive and Negative Syndrome Scale) score between 60 and 120. Prior to randomization, there was a one-week washout phase (all anti-psychotic medications discontinued). If acute psychotic symptoms occurred, patients were randomized to a double-blind treatment phase scheduled to last 8 weeks. Patients were randomized to one of 6 treatment groups: risperidone 2, 6, 10, or 16 mg, haloperidol 20 mg, or placebo. Dose titration occurred during the first week of the double-blind phase.

Hogan (MDEpiNet) Missing Data October 22, 2018 91 / 160

slide-92
SLIDE 92

RIS-INT-3

Patients were scheduled for 5 post-baseline assessments at weeks 1, 2, 4, 6, and 8 of the double-blind phase. Primary efficacy variable: PANSS score. Patients who did not respond to treatment and discontinued therapy, or those who completed the study, were eligible to receive risperidone in an open-label extension study. 521 patients were randomized to receive placebo (n = 88), haloperidol 20 mg (n = 87), risperidone 2 mg (n = 87), risperidone 6 mg (n = 86), risperidone 10 mg (n = 86), or risperidone 16 mg (n = 87).

Hogan (MDEpiNet) Missing Data October 22, 2018 92 / 160

slide-93
SLIDE 93

Dropout and withdrawal

Only 49% of patients completed the 8 week treatment period. The most common reason for discontinuation was “insufficient response.” Other main reasons included: adverse events, uncooperativeness, and withdrawal of consent.

Hogan (MDEpiNet) Missing Data October 22, 2018 93 / 160

slide-94
SLIDE 94

Dropout and Withdrawal

                  Placebo   Haloperidol  Risp 2mg  Risp 6mg  Risp 10mg  Risp 16mg
                  (n = 88)  (n = 87)     (n = 87)  (n = 86)  (n = 86)   (n = 87)
Completed         27  31%   36  41%      36  41%   53  62%   48  56%    54  62%
Withdrawn         61  69%   51  59%      51  59%   33  38%   38  44%    33  38%
  Lack of Efficacy 51 58%   36  41%      41  47%   12  14%   25  29%    18  21%
  Other           10  11%   15  17%      10  11%   21  24%   13  15%    15  17%

Hogan (MDEpiNet) Missing Data October 22, 2018 94 / 160

slide-95
SLIDE 95

Central Question

What is the difference in the mean PANSS scores at week 8 between risperidone at a specified dose level vs. placebo in the counterfactual world in which all patients were followed to that week?

Hogan (MDEpiNet) Missing Data October 22, 2018 95 / 160

slide-96
SLIDE 96

Sample means and imputed means under MAR

                        N    INS   HOST   EPS   Y0    µ̂R
Placebo      R = 0      61   3.9   10.5   3.3   94    ??
             R = 1      27   3.7    8.1   3.2   89    78
Risperidone  R = 0      51   3.8   10.9   3.5   98    ??
             R = 1      36   3.8    8.1   2.8   87    71

Hogan (MDEpiNet) Missing Data October 22, 2018 96 / 160

slide-97
SLIDE 97

Sample means and imputed means under MAR

                        N    INS   HOST   EPS   Y0    µ̂R
Placebo      R = 0      61   3.9   10.5   3.3   94    79
             R = 1      27   3.7    8.1   3.2   89    78
Risperidone  R = 0      51   3.8   10.9   3.5   98    74
             R = 1      36   3.8    8.1   2.8   87    71

Hogan (MDEpiNet) Missing Data October 22, 2018 97 / 160

slide-98
SLIDE 98

Regression imputation under MAR

Placebo

Hogan (MDEpiNet) Missing Data October 22, 2018 98 / 160

slide-99
SLIDE 99

Regression imputation under MAR

Risperidone

Hogan (MDEpiNet) Missing Data October 22, 2018 99 / 160

slide-100
SLIDE 100

Sample means by dropout time: Aggregated data

[Plot: mean PANSS (y-axis, 60–120) by visit (x-axis, 1–6), stratified by dropout time]

Hogan (MDEpiNet) Missing Data October 22, 2018 100 / 160

slide-101
SLIDE 101

Extrapolation of means when ν = 0 (MAR)

[Plot: observed and extrapolated mean PANSS (y-axis, 60–120) by visit (x-axis, 1–6)]

Hogan (MDEpiNet) Missing Data October 22, 2018 101 / 160

slide-102
SLIDE 102

Extrapolation of means when ν = −1

[Plot: observed and extrapolated mean PANSS (y-axis, 60–120) by visit (x-axis, 1–6)]

Hogan (MDEpiNet) Missing Data October 22, 2018 102 / 160

slide-103
SLIDE 103

Extrapolation of means when ν = +1

[Plot: observed and extrapolated mean PANSS (y-axis, 60–120) by visit (x-axis, 1–6)]

Hogan (MDEpiNet) Missing Data October 22, 2018 103 / 160

slide-104
SLIDE 104

Summary of means

                          Placebo   Risperidone
E(Y5 | R5 = 1)            78        71
E(Y5 | R5 = 0)  ν = 0     94        87
                ν = .5    109       101
                ν = 1     123       114
E(Y5)           ν = 0     89        81
                ν = .5    99        89
                ν = 1     109       97

Hogan (MDEpiNet) Missing Data October 22, 2018 104 / 160

slide-105
SLIDE 105

Summary

Full-data mean parameterized as a mixture of observed- and missing-data means

Implemented using imputation

◮ We used regression imputation, but this is only one possibility
◮ Regression models need not be linear
◮ More complex models may require more complex imputation procedures

Key features of this approach

◮ Missing data distribution indexed by a sensitivity parameter that cannot be estimated from data
◮ Separates testable from untestable assumptions
◮ Easy to assess effect of departures from MAR

Hogan (MDEpiNet) Missing Data October 22, 2018 105 / 160

slide-106
SLIDE 106

Summary

Model parameterization

◮ Need to limit number of ∆’s to make inferences manageable
◮ Need sensible scale and range for ∆’s
◮ Scope of sensitivity analysis should be specified as part of the trial protocol to avoid reliance on post-hoc analyses

Inference about treatment effects

◮ Sensitivity analysis provides a range of conclusions
◮ Can use as a ‘stress-test’: under what MNAR scenario would our conclusions change?
◮ Can also use Bayesian formulations that average results over a prior for the sensitivity parameter (Daniels & Hogan, 2008)

Hogan (MDEpiNet) Missing Data October 22, 2018 106 / 160

slide-107
SLIDE 107

CASE STUDY II: INVERSE PROBABILITY WEIGHTING METHODS

Hogan (MDEpiNet) Missing Data October 22, 2018 107 / 160

slide-108
SLIDE 108

Inverse Probability Weighting

General idea

Consider estimating E(Y) from a sample of data.

If we had all the data, we would use the sample mean

   Ê(Y) = (1/n)(Y1 + Y2 + · · · + Yn)

Problem: only some of the Y’s are observed.

The observed Y’s may not be a random draw from the full sample.

Hogan (MDEpiNet) Missing Data October 22, 2018 108 / 160

slide-109
SLIDE 109

Inverse Probability Weighting

General idea

The solution: use a weighted mean.

Probability of being observed: πi = P(Ri = 1)

Weighted mean:

   ÊIPW(Y) = [ Σi Ri Yi / πi ] / [ Σi Ri / πi ]

Issues:

◮ Probability of being observed may depend on individual characteristics X
◮ May also depend on the actual (but unobserved) outcome Y

Hogan (MDEpiNet) Missing Data October 22, 2018 109 / 160

slide-110
SLIDE 110

Inverse Probability Weighting under MAR

Recall MAR: Y ⊥⊥ R | X. MAR implies [R | X, Y] ∼ [R | X].

IPW theory:

◮ Define π(Xi) = P(R = 1 | Xi)
◮ Assume MAR
◮ Assume π(Xi) > 0 for all i

The weighted estimator

   ÊIPW(Y) = [ Σi Ri Yi / π(Xi) ] / [ Σi Ri / π(Xi) ]

is a consistent estimate of E(Y).

Hogan (MDEpiNet) Missing Data October 22, 2018 110 / 160
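A toy numerical check of the weighted estimator (a Python sketch rather than the deck's R; `ipw_mean` is a made-up name). Response depends on group membership, so the complete-case mean is biased while weighting by 1/π recovers the full-sample mean.

```python
# Hajek-style IPW estimator: sum(R*Y/pi) / sum(R/pi)
def ipw_mean(ys, rs, pis):
    num = sum(r * y / p for y, r, p in zip(ys, rs, pis))
    den = sum(r / p for r, p in zip(rs, pis))
    return num / den

# Full sample: two units with Y = 1 (always observed) and four units
# with Y = 3 (each observed with probability 0.5).  In practice the
# Y's with r = 0 are unknown; they are listed here only to define the truth.
ys  = [1, 1, 3, 3, 3, 3]
rs  = [1, 1, 1, 1, 0, 0]                 # exactly half of the Y = 3 group responds
pis = [1.0, 1.0, 0.5, 0.5, 0.5, 0.5]

cc_mean = sum(y for y, r in zip(ys, rs) if r) / sum(rs)   # complete-case mean
full_mean = sum(ys) / len(ys)                             # full-sample truth
```

Here `cc_mean` is 2.0 but the full-sample mean is 14/6 ≈ 2.33; the IPW estimator reproduces the latter because the observed Y = 3 units each stand in for two sampled units.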

slide-111
SLIDE 111

Inverse Probability Weighting under MAR

Remains true when π(Xi) is replaced by a consistent estimator π̂(Xi).

To estimate π(Xi), must specify a model such as

   logit π(Xi) = Xiᵀ β

In this case,

   π̂(Xi) = exp(Xiᵀ β̂) / (1 + exp(Xiᵀ β̂))

Can use other models as well — but the model must yield consistent estimates of π(Xi) for IPW to give a valid estimator.

Hogan (MDEpiNet) Missing Data October 22, 2018 111 / 160

slide-112
SLIDE 112

Example: Estimate E(Y3) from GH Data

Treat Y1 as the covariate.

MAR assumption: [R | Y3, Y1] ∼ [R | Y1]

Assume a logit model for selection:

   logit π(Y1) = γ0 + γ1 Y1

Compute the weighted estimator as above.

Hogan (MDEpiNet) Missing Data October 22, 2018 112 / 160

slide-113
SLIDE 113

Fitted model π(Y1)

[Plot: fitted response probability π̂(Y1) (y-axis, roughly 0.2–0.9) against Y1 (x-axis, 40–120)]

Hogan (MDEpiNet) Missing Data October 22, 2018 113 / 160

slide-114
SLIDE 114

Relative weights: Plot of Y3 vs 1/ π(Y1)

[Plot: observed Y3 (x-axis, 20–140) against weights 1/π̂(Y1) (y-axis, roughly 1.5–2.5)]

Hogan (MDEpiNet) Missing Data October 22, 2018 114 / 160

slide-115
SLIDE 115

Compare estimators

Complete cases: 88

Imputation under MAR: 79

IPW under MAR: 80

Hogan (MDEpiNet) Missing Data October 22, 2018 115 / 160

slide-116
SLIDE 116

IPW – Longitudinal case

To illustrate, we assume monotone missingness (as in the GH trial).

Target of inference: E(Y3). The weighted estimator is

   ÊIPW(Y3) = [ Σi R3i Y3i / π3i ] / [ Σi R3i / π3i ]

How to construct the response probabilities π3i?

Hogan (MDEpiNet) Missing Data October 22, 2018 116 / 160

slide-117
SLIDE 117

Longitudinal case – no covariates

R3 = 1 is equivalent to the joint event R1 = 1, R2 = 1, R3 = 1. Hence

   P(R3 = 1) = P(R1 = 1, R2 = 1, R3 = 1)
             = P(R3 = 1 | R2 = 1, R1 = 1) × P(R2 = 1 | R1 = 1) × P(R1 = 1)
             = φ3 × φ2 × φ1

Notation: φj = P(Rj = 1 | R1 = R2 = · · · = Rj−1 = 1)

Hogan (MDEpiNet) Missing Data October 22, 2018 117 / 160

slide-118
SLIDE 118

Longitudinal case with covariates

Recall the longitudinal version of MAR:

   [Rj | Rj−1 = 1, HJ] = [Rj | Rj−1 = 1, Hj]

In words, missingness at j depends only on the observable past history of X and Y.

Implication: can use observable history in models of φj. Example:

   logit φj(Hj) = Xi1ᵀ β + Xijᵀ γ + θ Yi,j−1

Hogan (MDEpiNet) Missing Data October 22, 2018 118 / 160

slide-119
SLIDE 119

Procedure for inference about E(YJ)

1 Formulate and fit models for

   φj(Hij) = P(Rj = 1 | Ri,j−1 = 1, Hij)

2 Compute estimated value of πJ(HiJ) = P(RJ = 1 | HiJ) as

   π̂J(HiJ) = φ̂1(Hi1) × φ̂2(Hi2) × · · · × φ̂J(HiJ)

3 Compute weighted mean

   ÊIPW(YJ) = [ Σi RiJ YiJ / π̂J(HiJ) ] / [ Σi RiJ / π̂J(HiJ) ]

Hogan (MDEpiNet) Missing Data October 22, 2018 119 / 160
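The three steps can be sketched as follows (a Python illustration rather than the deck's R; `cumulative_pi` and `ipw_mean_last_visit` are made-up names, and the φ̂'s are taken as given rather than fitted).

```python
# IPW for a longitudinal outcome with monotone dropout: the probability of
# still being observed at visit J is the product of visit-specific
# continuation probabilities phi_j.
def cumulative_pi(phis):
    pi = 1.0
    for phi in phis:
        pi *= phi          # pi_J = phi_1 * phi_2 * ... * phi_J
    return pi

def ipw_mean_last_visit(records):
    # records: (y_J, r_J, [phi_1, ..., phi_J]); y_J is None when r_J = 0
    num = den = 0.0
    for y, r, phis in records:
        if r:
            w = 1.0 / cumulative_pi(phis)   # weight = inverse cumulative prob
            num += w * y
            den += w
    return num / den
```

A subject with continuation probabilities 0.9, 0.8, 0.5 has π3 = 0.36, so that one completer stands in for roughly three sampled subjects with the same history.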

slide-120
SLIDE 120

Inverse Probability Weighting under MAR

Practical issues to consider

Stability of weights

◮ Estimated response probabilities near zero (i.e., very large weights) lead to bias and inefficiency
◮ Need to check histogram
◮ Can use stabilized weights

Fit of weight models

◮ No guarantees here
◮ Can use lack-of-fit diagnostics to weed out poor-fitting models

Selection of weight models

◮ Poses a more serious problem with respect to final inferences
◮ Not good to pick the weight model that gives the lowest p-value!
◮ Pre-specify weight covariates that are related to missingness and outcome

Hogan (MDEpiNet) Missing Data October 22, 2018 120 / 160

slide-121
SLIDE 121

Analysis of Smoking Cessation Data via IPW

Specify outcome model

Select baseline covariates

Specify and fit weight model

Fit weighted longitudinal regression for treatment comparison

Hogan (MDEpiNet) Missing Data October 22, 2018 121 / 160

slide-122
SLIDE 122

Outcome model

Outcome and treatment:

◮ Yj = quit status (1 if yes, 0 if no)
◮ Z = 1 if exercise, 0 if wellness
◮ θj = P(Yj = 1)

Model:

◮ constant quit rate up to week 4
◮ separate treatment quit rates after week 4, but constant over time

   logit θj = γ0 · 1(j ≤ 4) + (γ1 + βZ) · 1(j > 4)

β = treatment log odds ratio

Hogan (MDEpiNet) Missing Data October 22, 2018 122 / 160
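The piecewise-constant model can be written directly as a function of visit and treatment. A minimal Python sketch (`quit_prob` is a made-up name; γ0, γ1, β are the parameters from the slide).

```python
# Piecewise-constant logit model for quit probability: one common rate
# through week 4, then treatment-specific rates afterward.
import math

def quit_prob(j, z, gamma0, gamma1, beta):
    # linear predictor switches at week 4; z is the treatment indicator
    s = gamma0 if j <= 4 else gamma1 + beta * z
    return 1.0 / (1.0 + math.exp(-s))
```

Before week 4 the treatment indicator has no effect; afterward β shifts the log odds of quitting for the exercise arm.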

slide-123
SLIDE 123

Hogan (MDEpiNet) Missing Data October 22, 2018 123 / 160

slide-124
SLIDE 124

Exploratory analysis

Ascertain whether previous Y’s belong in the weight model. This shows strong correlation between Rj and Yj−1.

Hogan (MDEpiNet) Missing Data October 22, 2018 124 / 160

slide-125
SLIDE 125

Exploratory analysis

Hogan (MDEpiNet) Missing Data October 22, 2018 125 / 160

slide-126
SLIDE 126

Covariates for selection model

Hogan (MDEpiNet) Missing Data October 22, 2018 126 / 160

slide-127
SLIDE 127

Check weight distribution at each time

Hogan (MDEpiNet) Missing Data October 22, 2018 127 / 160

slide-128
SLIDE 128

Summary of results

Treatment effect relatively robust on the odds ratio scale

Arm-specific cessation rates very different

Treatment effect not robust on other scales

Hogan (MDEpiNet) Missing Data October 22, 2018 128 / 160

slide-129
SLIDE 129

Hogan (MDEpiNet) Missing Data October 22, 2018 129 / 160

slide-130
SLIDE 130

Multiple Imputation

Hogan (MDEpiNet) Missing Data October 22, 2018 130 / 160

slide-131
SLIDE 131

Overview

Imputing missing data from a parametric model Build around an example using GH data

◮ Goal 1: Estimate E(Y3) in GH data ◮ Goal 2: Estimate treatment effect in CTQ data

Process of imputation

◮ Model specification
◮ Drawing imputed values from the model
◮ Combining observed and imputed information
◮ Standard error estimation
◮ Sensitivity analysis (CTQ data)

Hogan (MDEpiNet) Missing Data October 22, 2018 131 / 160

slide-132
SLIDE 132

Recall excerpt from GH data

     id  tx   Y1   Y2   Y3  R3
1  1005   1   35   32  -99
2  1007   1   75   53   83   1
3  1009   1  120  111  119   1
4  1013   1   85  119  100   1
5  1018   1   69   88  -99
6  1019   1   48   55   58   1
7  2003   1   39  -99  -99
8  2008   1   52  -99  -99
9  2011   1   93   80   88   1
10 2016   1   99   89   97   1
11 2017   1   49   41   21   1
12 2024   1   46   51   47   1
13 2031   1   63  -99  -99
14 2032   1   64   86   69   1
15 2038   1   81   63   75   1
16 2041   1   28  -99  -99
17 2043   1  126  132  146   1

Hogan (MDEpiNet) Missing Data October 22, 2018 132 / 160

slide-133
SLIDE 133

The strategy behind multiple imputation

Setting: full data for an individual is (Y, R, X, V)

Objective: interested in some feature of f(y) or f(y | x)

Assumptions:

1 MAR, in that f(y | x, v, r = 0) = f(y | x, v, r = 1)

2 Model for f(y | x, v, r = 1) has known form

Hogan (MDEpiNet) Missing Data October 22, 2018 133 / 160

slide-134
SLIDE 134

The strategy behind multiple imputation

1 Fit a model for f(y | x, v, r = 1); if it is a parametric model, this means estimating the parameters α in the model f(y | x, v, r = 1; α)

2 For each person having R = 0, take a draw of Y | X, V from the fitted model. That means, for person i having Ri = 0, plug in their values of Xi and Vi, and draw a value of Yi from the fitted model. Example coming soon.

3 Do this several times for each individual, so that each person has multiple draws of Yi. Can call these Ŷi^(1), Ŷi^(2), . . . , Ŷi^(K).

Now have K filled-in datasets.

Hogan (MDEpiNet) Missing Data October 22, 2018 134 / 160

slide-135
SLIDE 135

The strategy behind multiple imputation

4 Perform the analysis you would have carried out had the data been complete. If you are interested in the parameter θ, this gives you K parameter estimates, θ̂^(1), θ̂^(2), . . . , θ̂^(K).

Hogan (MDEpiNet) Missing Data October 22, 2018 135 / 160

slide-136
SLIDE 136

The strategy behind multiple imputation

5 Now need an estimate and standard error

◮ The estimate is the sample mean

   θ̄ = (1/K) Σj θ̂^(j)

◮ The (estimate of) variance of θ̄ combines the between- and within-imputation variance,

   var̂(θ̄) = (1/K) Σj var̂(θ̂^(j)) + (1 + 1/K) · (1/(K − 1)) Σj (θ̂^(j) − θ̄)²

where var̂(θ̂^(j)) = { s.e.(θ̂^(j)) }²

Hogan (MDEpiNet) Missing Data October 22, 2018 136 / 160
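Steps 4–5 amount to Rubin's combining rules. A Python sketch (`mi_pool` is a made-up name), including the (1 + 1/K) inflation of the between-imputation component that the deck's R code applies when forming `var.B`.

```python
# Rubin's combining rules: pool K point estimates and their squared
# standard errors into one estimate, one SE, and the fraction of
# missing information.
from statistics import mean, variance

def mi_pool(estimates, ses):
    K = len(estimates)
    theta = mean(estimates)                    # pooled point estimate
    var_w = mean([se ** 2 for se in ses])      # within-imputation variance
    var_b = variance(estimates)                # (1/(K-1)) * sum (theta_j - theta)^2
    total = var_w + (1 + 1 / K) * var_b        # total variance
    frac_missing = (1 + 1 / K) * var_b / total # fraction of missing information
    return theta, total ** 0.5, frac_missing
```

When the K estimates agree closely, `var_b` is small and the pooled SE is close to the complete-data SE; large spread across imputations inflates it.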

slide-137
SLIDE 137

Analysis 1: Use MI to estimate E(Y3)

As with single imputation, we treat Y1 as an auxiliary variable to use in the imputation model.

Missing data assumption (MAR):

   f(y3 | y1, r = 1) = f(y3 | y1, r = 0)

Specify parametric imputation model for f(y3 | y1, r = 1):

   Y3 = β0 + β1 Y1 + e,   e ∼ N(0, σ²)

◮ This implies that the model f(y3 | y1, r = 1) is a normal distribution with mean and variance

   E(Y3 | Y1) = β0 + β1 Y1,   var(Y3 | Y1) = σ²

Hogan (MDEpiNet) Missing Data October 22, 2018 137 / 160

slide-138
SLIDE 138

Applying MI to the GH data

1 Fit a model for f(y3 | y1, r3 = 1)

fitted.model = lm(Y3 ~ Y1, subset = (R3==1))
beta.hat = fitted.model$coefficients
sigma.sq = anova(fitted.model)$`Mean Sq`[2]
sigma = sqrt(sigma.sq)

For the GH data, this means fitting the regression model specified above. The estimated parameters are

   β̂0 = 15.6,   β̂1 = 0.19,   σ̂ = 20.9

Hogan (MDEpiNet) Missing Data October 22, 2018 138 / 160

slide-139
SLIDE 139

Applying MI to the GH data

2 For each person having R3 = 0, take a draw of Y3 | Y1 from the fitted model. That means, for person i having R3i = 0, plug in their value of Y1i, and draw a value of Y3 from the fitted model. So we draw imputed values of the missing Y3 from

   Y3 ∼ N(15.6 + 0.19 Y1i, 20.9²)

X.matrix = cbind( rep(1,length(Y1)), Y1)
Y3.mean = X.matrix %*% beta.hat

        Y1  R3    Y3   Y3.mean
[1,]  35.0  NA         43.36874
[2,]  74.5   1   82.6  84.66539
[3,] 120.2   1  118.7 132.44404
[4,]  84.8   1   99.6  95.43388
[5,]  68.6  NA         78.49703
[6,]  47.9   1   57.5  56.85549
[7,]  39.0  NA         47.55068
[8,]  52.0  NA         61.14198

Hogan (MDEpiNet) Missing Data October 22, 2018 139 / 160

slide-140
SLIDE 140

Applying MI to GH data

3 Do this several times for each individual, so that each person has multiple draws of Y3. Can call these Ŷ3i^(1), Ŷ3i^(2), . . . , Ŷ3i^(K).

Now have K filled-in datasets.

# Draw 10 imputations of Y3
K = 10
n = length(Y3)
Y3.imp = matrix(0, nrow=n, ncol=K)
for (j in 1:K) {
  # this line imputes a value for each person
  Y3.imp[,j] = rnorm(n=n, mean=Y3.mean, sd=sigma)
}
# this line replaces the values with observed Y3 where Y3 is observed
Y3.imp[R3==1,] = Y3[R3==1]

Hogan (MDEpiNet) Missing Data October 22, 2018 140 / 160

slide-141
SLIDE 141

Excerpt from imputed data

        Y1  R3     Y3  Y3.mean  === FIRST 4 IMPUTATIONS ===
[1,]  35.0  NA           43.4    34.5   46.8   33.3   28.6
[2,]  74.5   1   82.6    84.7    82.6   82.6   82.6   82.6
[3,] 120.2   1  118.7   132.4   118.7  118.7  118.7  118.7
[4,]  84.8   1   99.6    95.4    99.6   99.6   99.6   99.6
[5,]  68.6  NA           78.5    64.0   61.0  102.3   81.3
[6,]  47.9   1   57.5    56.9    57.5   57.5   57.5   57.5
[7,]  39.0  NA           47.6    42.3   59.3   44.4   43.8

Hogan (MDEpiNet) Missing Data October 22, 2018 141 / 160

slide-142
SLIDE 142

Applying MI to GH data

4 Perform the analysis you would have carried out had the data been complete. If you are interested in the parameter θ, this gives you K parameter estimates, θ̂^(1), θ̂^(2), . . . , θ̂^(K).

Hogan (MDEpiNet) Missing Data October 22, 2018 142 / 160

slide-143
SLIDE 143

Applying MI to GH data

# Step 4: Calculate E(Y3) for each replicated dataset
# We will do this on the matrix Y3.imp, calculating column means and s.e.
Y3.bar.imp = apply(Y3.imp, 2, mean)
Y3.sd.imp = apply(Y3.imp, 2, sd)
Y3.se.imp = Y3.sd.imp / sqrt(n)

Hogan (MDEpiNet) Missing Data October 22, 2018 143 / 160

slide-144
SLIDE 144

Applying MI to GH data

> # Here are the means and s.e. from each imputed dataset
> cbind(Y3.bar.imp, Y3.se.imp)
      Y3.bar.imp Y3.se.imp
 [1,]       75.1       5.5
 [2,]       80.6       5.3
 [3,]       77.8       5.6
 [4,]       79.9       5.4
 [5,]       76.9       5.6
 [6,]       73.6       6.2
 [7,]       81.1       5.0
 [8,]       78.3       5.4
 [9,]       81.4       5.3
[10,]       81.4       5.2

Hogan (MDEpiNet) Missing Data October 22, 2018 144 / 160

slide-145
SLIDE 145

Apply MI to GH data

5 The (estimate of) variance of θ̄ combines the between- and within-imputation variance:

   var̂(θ̄) = (1/K) Σj var̂(θ̂^(j)) + (1 + 1/K) · (1/(K − 1)) Σj (θ̂^(j) − θ̄)²

# Step 5: Calculate overall mean and SE
theta.hat = mean(Y3.bar.imp)
var.W = mean( Y3.se.imp^2 )
var.B = (1 + 1/K) * sd( Y3.bar.imp )^2
se.theta.hat = sqrt(var.W + var.B)

# missing information
miss.info = var.B / (var.W + var.B)

c(theta.hat, se.theta.hat, miss.info)
[1] 78.5996499 6.1756551 0.2199223

Hogan (MDEpiNet) Missing Data October 22, 2018 145 / 160

slide-146
SLIDE 146

Comparing results from three methods

Method                    Ê(Y3)   s.e.
Observed data only         88.3    6.7
IPW                        80.6    6.8
Regression imputation      79.0    4.3
  with bootstrap s.e.              6.8
Multiple imputation        78.6    6.2

Hogan (MDEpiNet) Missing Data October 22, 2018 146 / 160

slide-147
SLIDE 147

Example 2: CTQ Data

Goal of analysis: treatment comparison between wellness and exercise

Available data:

◮ Y = indicator of quit status at week 12 (1 if yes, 0 if no)
◮ X = treatment indicator
◮ V = auxiliary covariates measured at baseline
◮ R = indicator of whether Y is observed

Model of interest:

   logit{Pr(Y = 1 | X)} = β0 + β1 X

Hogan (MDEpiNet) Missing Data October 22, 2018 147 / 160

slide-148
SLIDE 148

Example 2: CTQ Data

Auxiliary covariates V = (F, W):

◮ F = Fagerstrom index of nicotine dependence (1–10)
◮ W = weight at baseline

Recall: the model of interest does not involve F, W:

   logit{Pr(Y = 1 | X)} = β0 + β1 X

Hogan (MDEpiNet) Missing Data October 22, 2018 148 / 160

slide-149
SLIDE 149

CTQ Data

Imputation model under MAR:

   logit{P(Y = 1 | X, F, W)} = α0 + α1 X + α2 F + α3 W
   Si = α0 + α1 Xi + α2 Fi + α3 Wi
   φi = exp(Si) / (1 + exp(Si))
   Yi ∼ Ber(φi)

Imputation involves drawing multiple values of Y from the appropriate Bernoulli distribution.

Hogan (MDEpiNet) Missing Data October 22, 2018 149 / 160
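The draw itself is a Bernoulli variate at the fitted probability. A minimal Python sketch (the deck's own implementation is in R; `expit` and `impute_binary` are made-up names), with observed values passing through untouched.

```python
# Bernoulli imputation under MAR: a missing binary outcome is drawn from
# the response probability implied by the fitted linear predictor S_i.
import math
import random

def expit(s):
    # inverse logit
    return 1.0 / (1.0 + math.exp(-s))

def impute_binary(y, s, rng):
    # y: observed 0/1 or None; s: linear predictor from the imputation model
    if y is not None:
        return y                       # keep observed values untouched
    return 1 if rng.random() < expit(s) else 0
```

With K imputed datasets, these draws are repeated K times so the binary uncertainty propagates into the between-imputation variance.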

slide-150
SLIDE 150

Apply MI to CTQ Data

Data excerpt

> ctq
    id week  y z weight j r fs basewt
2  305   12  1 0 265.00 1 1  8 238.00
4  309   12  0 1 123.00 1 1  6 120.50
6  311   12  0 0 143.50 1 1  9 142.50
8  313   12  0 1 114.50 1 1  7 106.50
10 314   12  0 0 138.50 1 1  9 132.00
12 317   12 NA 1     NA 1 0  5 136.00
14 321   12  0 0 137.00 1 1  5 137.25
16 324   12  0 0 144.00 1 1  6 140.00
18 325   12  0 0 103.00 1 1  7 102.50
20 326   12  1 1 169.00 1 1  9 154.00
22 328   12  0 1 206.00 1 1  6 202.50
24 331   12 NA 1     NA 1 0  6 142.00

Hogan (MDEpiNet) Missing Data October 22, 2018 150 / 160

slide-151
SLIDE 151

Apply MI to CTQ Data

Fit treatment model to observed data only:

   logit{P(Y = 1 | X)} = β0 + β1 X

> # fit regression model of smoking to z with observed data
> model.0 = glm(y ~ z, family=binomial(link="logit"))
> summary(model.0)

Call:
glm(formula = y ~ z, family = binomial(link = "logit"))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8224     0.2228  -3.691 0.000223 ***
z             0.5600     0.3064   1.828 0.067577 .

    Null deviance: 246.25 on 186 degrees of freedom
(90 observations deleted due to missingness)
Hogan (MDEpiNet) Missing Data October 22, 2018 151 / 160

slide-152
SLIDE 152

Apply MI to CTQ Data

2 Fit imputation model

   logit{P(Y = 1 | X, F, W)} = α0 + α1 X + α2 F + α3 W = S

> imp.model = glm(y ~ z + basewt + fs, family = binomial)
> summary(imp.model)

glm(formula = y ~ z + basewt + fs, family = binomial)

             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.970904   0.910468  -1.066  0.28625
z            0.634387   0.328802   1.929  0.05368 .
basewt       0.014262   0.005571   2.560  0.01046 *
fs          -0.334822   0.090610  -3.695  0.00022 ***

Hogan (MDEpiNet) Missing Data October 22, 2018 152 / 160

slide-153
SLIDE 153

Apply MI to CTQ Data

3 Generate imputations of Y:

   Ŝi = α̂0 + α̂1 Xi + α̂2 Fi + α̂3 Wi
   φ̂i = exp(Ŝi) / (1 + exp(Ŝi))
   Ŷi ∼ Ber(φ̂i)

> X.matrix = cbind(1, z, basewt, fs)
> X.beta = X.matrix %*% imp.model$coefficients
> p.hat = exp(X.beta) / ( 1 + exp(X.beta) )
>
> # draw an imputation for each person
> y.hat = rbinom( n=length(p.hat), size=1, prob=p.hat )
>
> # replace imputation with observed y where r=1
> y.hat[r==1] = y[r==1]

Hogan (MDEpiNet) Missing Data October 22, 2018 153 / 160

slide-154
SLIDE 154

Apply MI to CTQ Data

4 Repeat 10 times and re-fit treatment model to filled-in datasets

> round(beta.imp, digits=2)
      beta.0 beta.1
 [1,]  -0.83   0.59
 [2,]  -0.73   0.43
 [3,]  -0.90   0.50
 [4,]  -0.90   0.43
 [5,]  -0.77   0.49
 [6,]  -0.70   0.43
 [7,]  -0.86   0.43
 [8,]  -0.80   0.46
 [9,]  -0.86   0.53
[10,]  -0.80   0.55

Hogan (MDEpiNet) Missing Data October 22, 2018 154 / 160

slide-155
SLIDE 155

Apply MI to CTQ Data

5 Summarize results over imputed datasets, and compare to the original model (used K = 100 here)

> ## IMPUTATION RESULTS
> round(results, digits=3)
          est    se      Z  var.W  var.B miss.info
beta.0 -0.837 0.200 -4.177  0.033  0.007     0.184
beta.1  0.538 0.288  1.872  0.064  0.019     0.228

> ## MODEL FIT TO OBSERVED DATA ONLY
> summary(model.0)
glm(formula = y ~ z, family = binomial(link = "logit"))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8224     0.2228  -3.691 0.000223 ***
z             0.5600     0.3064   1.828 0.067577 .
Hogan (MDEpiNet) Missing Data October 22, 2018 155 / 160

slide-156
SLIDE 156

Interpreting results

Choice of imputation model (simple in this case)

Does similarity of results imply that the missing data mechanism is MAR? Why or why not?

Hogan (MDEpiNet) Missing Data October 22, 2018 156 / 160

slide-157
SLIDE 157

Introducing MNAR mechanism

Imputation model under MAR:

   logit{P(Y = 1 | X, F, W)} = α0 + α1 X + α2 F + α3 W
   Si = α0 + α1 Xi + α2 Fi + α3 Wi
   φi = exp(Si) / (1 + exp(Si))
   Yi ∼ Ber(φi)

Imputation model under MNAR (applies to those with R = 0):

   Si^∆ = ∆0(1 − Xi) + ∆1 Xi + Si
   φi^∆ = exp(Si^∆) / (1 + exp(Si^∆))
   Yi ∼ Ber(φi^∆)

Hogan (MDEpiNet) Missing Data October 22, 2018 157 / 160

slide-158
SLIDE 158

Understanding the MNAR imputation model

   Si^∆ = ∆0(1 − Xi) + ∆1 Xi + Si
   φi^∆ = exp(Si^∆) / (1 + exp(Si^∆))
   Yi ∼ Ber(φi^∆)

For a specific treatment group:

∆ > 0 implies higher probability of Y = 1, relative to MAR

◮ As ∆ → ∞, φi^∆ → 1

∆ < 0 implies lower probability of Y = 1, relative to MAR

◮ As ∆ → −∞, φi^∆ → 0

Hence the ∆ parameters move individual probabilities toward 1 or toward 0, relative to what is predicted by Si.

Hogan (MDEpiNet) Missing Data October 22, 2018 158 / 160
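The effect of the ∆ shift on an imputation probability can be sketched directly (Python; `phi_delta` is a made-up name, and for simplicity a single ∆ stands in for the treatment-specific ∆0, ∆1).

```python
# The MNAR shift Delta moves the imputation probability toward 1 or 0
# relative to the MAR linear predictor S.
import math

def phi_delta(s_mar, delta):
    # S_i^Delta = Delta + S_i for one treatment group; Delta = 0 gives MAR
    s = delta + s_mar
    return 1.0 / (1.0 + math.exp(-s))
```

Setting ∆ to a large negative value drives the imputed probabilities toward 0, which is the "dropouts are smokers" scenario used on the next slide.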

slide-159
SLIDE 159

Example

Set ∆1 = ∆0 = −100. This essentially implies everyone with missing data has Y = 0 with probability 1 (‘dropouts are smokers’).

> ## IMPUTATION RESULTS
> round(results, digits=3)
          est    se      Z  var.W
beta.0 -1.386 0.208 -6.677  0.043
beta.1  0.553 0.281  1.969  0.079

Treatment effect about the same

Overall cessation rate lower (intercept)

Notice the between-imputation variance

Hogan (MDEpiNet) Missing Data October 22, 2018 159 / 160

slide-160
SLIDE 160

Summary

Imputation is a very flexible tool for filling in missing data

Appeals to intuition

Have to carefully consider the imputation model

Multiple imputation (parametric) versus regression imputation (semiparametric)

Can introduce parameters for MNAR, sensitivity analysis

Bayesian approaches

Hogan (MDEpiNet) Missing Data October 22, 2018 160 / 160