The Stochastic EM Algorithm for Censored Mixed Models Ian Marschner - - PowerPoint PPT Presentation

the stochastic em algorithm for censored mixed models
SMART_READER_LITE
LIVE PREVIEW

The Stochastic EM Algorithm for Censored Mixed Models Ian Marschner - - PowerPoint PPT Presentation

The Stochastic EM Algorithm for Censored Mixed Models Ian Marschner Macquarie University, Sydney 10th International Conference on MCQMC Methods in Scientific Computing, Sydney, 2012 1 / 18 INTRODUCTION Stochastic EM algorithm is a Monte


slide-1
SLIDE 1

The Stochastic EM Algorithm for Censored Mixed Models

Ian Marschner

Macquarie University, Sydney

10th International Conference on MCQMC Methods in Scientific Computing, Sydney, 2012

1 / 18

slide-2
SLIDE 2

INTRODUCTION

Stochastic EM algorithm is a Monte Carlo iterative technique for computing an (approximate) MLE subject to missing data Fitting mixed models for repeated measures data with censoring can be a computationally challenging task due to the need for multidimensional integration in the likelihood function arising from the censoring (missingness) This talk presents a case study of the application of the stochastic EM algorithm to a censored mixed model analysis in the context of repeated HIV viral load measurements

2 / 18

slide-3
SLIDE 3

EM ALGORITHM

Maximisation of a log-likelihood 퐿(휃; 푌 ) in the presence of missing data Given a current estimate ˆ 휃old of 휃, an iteration of the algorithm proceeds in two steps E-step: 푄(휃∣ˆ 휃old) = 퐸 [ 퐿푐(휃; 푋)

  • 푌 ; ˆ

휃old ] where 퐿푐 is the log-likelihood corresponding to the complete data 푋 M-step: maximise 푄(휃∣ˆ 휃old) over the parameter space 휃 ∈ Θ to produce the updated estimate ˆ 휃new Iterate until convergence (ˆ 휃new = ˆ 휃old)

3 / 18

slide-4
SLIDE 4

INTRACTABLE E-STEP

In some cases the conditional expectation required for the E-step may be intractable Monte-Carlo EM (MCEM): involves calculation of conditional expectation via simulation MCEM requires a large number of simulations of missing data, conditional on observed data, at each iteration Stochastic EM (StEM): like MCEM but only one simulation at each iteration MCEM is equivalent to EM (MLE) up to the simulation approximation StEM is not equivalent to EM. It produces a Markov chain iterative sequence that can be averaged to produce an estimate of 휃

4 / 18

slide-5
SLIDE 5

STOCHASTIC EM ALGORITHM

Replace the E step with a single simulation 푋∗ of the complete data 푋, conditional on the observed data 푌 and the current estimate ˆ 휃old an StEM iteration consists of maximising 퐿푐(휃; 푋∗) The sequence of iterates arising from the StEM algorithm is a Markov chain and converges to a stationary distribution After a sufficiently long burn-in period a point estimate of 휃 can be

  • btained by averaging the sequence of StEM iterates

Results exist drawing an asymptotic equivalence between the StEM and EM algorithms

5 / 18

slide-6
SLIDE 6

CASE STUDY

HIV viral load: blood concentration of HIV (log transformed) Antiviral therapy suppresses viral load Data: Repeated measures of viral load on the same individual subsequent to (partial) cessation of antiviral therapy Data: Subjects measured an average of 6.2 times each (4-8 times) Rate of increase of viral load (viral rebound) and its relationship to certain covariates gives information about HIV viral dynamics Viral load left censored (40%) due to limitations of laboratory assay

6 / 18

slide-7
SLIDE 7

CENSORED REPEATED MEASURES EXAMPLES

TIME (DAYS) VIRAL LOAD (log10) 20 40 60 80 1 2 3 4 5 6 Crude Slope = 0.026 Adjusted Slope = 0.053

^ ^ ^ ^ ^

TIME (DAYS) VIRAL LOAD (log10) 20 40 60 80 1 2 3 4 5 6 Crude Slope = 0.092 Adjusted Slope = 0.121

^ ^

TIME (DAYS) VIRAL LOAD (log10) 20 40 60 80 1 2 3 4 5 6 Crude Slope = 0.044 Adjusted Slope = 0.061

^ ^ ^ ^

TIME (DAYS) VIRAL LOAD (log10) 20 40 60 80 1 2 3 4 5 6 Crude Slope = 0.079 Adjusted Slope = 0.097

^ ^

7 / 18

slide-8
SLIDE 8

EXPLORATORY ANALYSES

Separate censored linear regressions for each subject yield subject-specific estimates of rate of viral rebound Ignoring censoring leads to underestimation of rates of viral rebound Substantial variation exists between subjects in rates of viral rebound Various subject-specific viral parameters (covariates) may explain this variation

viral load on starting therapy immune level when therapy was ceased (CD4 count) immune response when while on therapy (change in CD4 count)

Exploratory analyses suggest a linear mixed effects model with uncorrelated slope and intercept

8 / 18

slide-9
SLIDE 9

RATES OF VIRAL REBOUND

0.04 0.06 0.08 0.10 0.12

std dev = 0.023

Subject-specific covariates may explain this variation

9 / 18

slide-10
SLIDE 10

RATES OF VIRAL REBOUND (versus other quantities)

10 / 18

slide-11
SLIDE 11

REPEATED MEASURES MIXED MODEL

Let 푡푖푗 be the 푗th time at which subject 푖 is observed Let 푉푖푗 be the value of subject 푖’s viral load at time 푡푖푗 퐴푖, 퐵푖 and 퐶푖 are the values of the covariates for individual 푖 Mixed effects model: 푉푖푗 = 휇푖 + 푔푖(퐴푖, 퐵푖, 퐶푖)푡푖푗 + 휖푖푗 where 푔푖(퐴, 퐵, 퐶) = 훼푖 + 훽퐴 + 훾퐵 + 훿퐶 {휇푖} ∼ 푁(휇, 휎2

휇); {훼푖} ∼ 푁(훼, 휎2 훼); {휖푖푗} ∼ 푁(0, 휎2 휖 )

(i.i.d.) Censoring: Observed data are 푉 ∗

푖푗 = max(푉푖푗, 퐿푖푗) for some lower

assay limit 퐿푖푗 that is independent of 푉푖푗 In the absence of censoring, estimation is standard

11 / 18

slide-12
SLIDE 12

STOCHASTIC EM ALGORITHM

EM algorithm can be developed using the uncensored data as the complete data set Problem: intractable E-step due to multi-dimensional expectations

  • ver the censoring ranges

up to 8-dimensional integration at each iteration for each individual

Stochastic EM algorithm:

simulate the uncensored data (conditional on the censoring limits)

  • nce at each iteration using truncated multivariate normal

distribution at each iteration apply standard mixed model analysis to estimate updated parameter values

Simple to implement: truncated multinormal simulation − − → ← − mixed model analysis

12 / 18

slide-13
SLIDE 13

COMPUTATIONAL DETAILS

Truncated multivariate normal simulation:

Simple acceptance/rejection sampling (easy, inefficient)

  • ff-the-shelf routine using Gibbs sampling (more efficient, still easy)

Initial estimates:

Start with subject-specific censored linear regressions For each individual, impute uncensored observations by simulating from subject-specific fitted model Calculate initial estimates using multiple imputation

Burn-in period and MCMC length:

Experience suggests very quick stationarity beginning with multiple imputation estimate Analysis used burn-in period of 50 iterations followed by averaging of the next 500 iterations

Information matrix is computable directly

13 / 18

slide-14
SLIDE 14

PRELIMINARY SIMULATION RESULTS

100 simulations of random intercept and slope model (no covariates) 10 patients with 5 observations per patient and 50% censoring Unit slope StEM slope estimates:

0.0 0.5 1.0 1.5 2.0 2.5 5 10 15 20 slope estimate

14 / 18

slide-15
SLIDE 15

ANALYSIS RESULTS

Preliminary analysis with all covariates (8 parameters) Tests of significance of covariates based on information matrix standard errors

Baseline viral load: P = 0.22 Initial immune level: P = 0.32 Immune response on therapy: P < 0.0001

Consider model with immune response as the sole covariate Consider whether observed variation in subject-specific slopes (viral rebound) is substantially explained by immune response covariate

15 / 18

slide-16
SLIDE 16

ANALYSIS RESULTS

Model with immune response only in the model:

immune response explains viral rebound differences (>80%) crude analysis ignoring censoring is biased MI analysis is similar to StEM with some differences

Method ˆ 휇 ˆ 훼 ˆ 훿 ˆ 휎2

ˆ 휎2

ˆ 휎2

uncensored 0.96 –0.026 0.054 0.046 0.00015 0.12 imputed 0.36 –0.024 0.061 0.39 9.4 × 10−5 0.13 StEM 0.46 –0.025 0.061 0.32 8.3 × 10−5 0.16 Interpretation:

On cessation of therapy, the rate of viral rebound is greatest for individuals who had a better immune response to therapy A better immune response creates more “fuel” (i.e. more targets) for viral rebound

16 / 18

slide-17
SLIDE 17

MARKOV CHAIN ITERATIVE BEHAVIOUR (6 parameters)

17 / 18

slide-18
SLIDE 18

CONCLUSIONS

Ignoring censoring in repeated measures data can lead to gross biases Stochastic EM algorithm allows repeated measures mixed models to be fitted to censored data using a standard mixed model routine for uncensored data All that is required is a truncated multivariate normal simulator, for simulation of uncensored data conditional on the censored data Assuming the availability of a NLME routine, non-linear models can be handled just as easily as linear models In the case study stationarity was achieved quickly and the fitted censored mixed effects models allowed insights into the sources of variation in changes over time

18 / 18