
SLIDE 1

Problem Models Results Results Discussion

Bayesian Nested Partially-Latent Models for Dependent Binary Data

Estimating Disease Etiology

Zhenke Wu

Postdoctoral Fellow, Department of Biostatistics

09 November 2015

R Package: https://github.com/zhenkewu/baker

Zhenke Wu (zhwu@jhu.edu), Biostat Grand Rounds, JHSPH, 09 November 2015

SLIDE 2


Question: What’s Causing Her Lung Infection?

Measurements From a Random Case

[Figure: bacterium and virus measurements using different specimens]

SLIDE 3

Motivating Application

Pneumonia Etiology Research for Child Health (PERCH)

Background:

More than 1 million deaths per year among children under 5
More than 30 possible pathogen causes

Goal:

To determine the etiology and risk factors for pneumonia

Design:

7-country, case-control study
Multiple modern diagnostic tools
∼5,000 cases and ∼5,000 controls

SLIDE 4

Common Questions on Individual and Population Health

1. a. What is the person’s health state given health measurements?
   b. What is the population distribution of health states?
   (Wu et al., 2015a,b,c)

2. How to make robust inference?

Picture source: http://www.diabetesdaily.com/voices/2014/07/why-one-size-fits-all-doesnt-work-in-diabetes

SLIDE 5

Problem and Data Features

Latent health state:

Estimating population distribution + individual diagnosis

Data Features:

1. Gold-standard measure: few or none
2. Latent state: many categories
3. Measurements: many, with distinct error rates, missingness
4. Blessing: control data

No effective, principled method exists for estimating the etiologic distribution (“pie”) from such data.

SLIDE 6

Our Approach: Direct Modeling

Connect Latent States and Measurements for Individual i

[Diagram: for each individual i, the unobserved lung infection state I_i^L, the measurements M_i and the covariates X_i are connected through the parameters θ and ψ. Labels: sensitivity (true positive rate), 1 − specificity (false positive rate), covariates, measurements, unobserved lung infection. For healthy controls, I_i^L = 0.]

SLIDE 7

Latent Class Models (LCM)

Review

[Figure: latent classes A, B, C, D, E, each with its own response probabilities ψ_k^{(j)} for the J measurements]

IDEA: marginal correlations among measurements are caused by confounding by the unobserved cluster indicators (I_i)

Assumption 1: Within-Class Homogeneity

$$P[M_{ij} = 1 \mid I_i = k] = \psi_k^{(j)}, \quad k = 1, \ldots, K$$

Assumption 2: Local Independence (LI)

$$P[M_{i1} = m_1, \ldots, M_{iJ} = m_J \mid I_i = k] = \prod_{j=1}^{J} P[M_{ij} = m_j \mid I_i = k], \quad \forall\, m = (m_1, \ldots, m_J)'$$
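As a rough illustration of the two assumptions, the marginal probability of a binary pattern is a weighted sum over classes of within-class products. The sketch below is not taken from the talk: the name `class_weights` (for P[I_i = k]) and all numeric values are hypothetical.

```python
from itertools import product


def lcm_pattern_prob(m, class_weights, psi):
    """Marginal P[M = m] under a latent class model with local independence.

    class_weights[k] stands for P[I = k] (hypothetical name and values);
    psi[k][j] stands for P[M_j = 1 | I = k], as in Assumption 1.
    """
    total = 0.0
    for w_k, psi_k in zip(class_weights, psi):
        within = 1.0
        # Assumption 2: within a class, the J measurements multiply.
        for m_j, p in zip(m, psi_k):
            within *= p if m_j == 1 else 1.0 - p
        total += w_k * within
    return total


# Hypothetical example: K = 2 classes, J = 3 binary measurements.
class_weights = [0.6, 0.4]
psi = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
probs = {m: lcm_pattern_prob(m, class_weights, psi)
         for m in product([0, 1], repeat=3)}
total_prob = sum(probs.values())
```

The measurements are independent within each class, yet the mixture over classes makes them marginally correlated, which is the idea stated on this slide.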

SLIDE 8

Partially-Latent Class Models (pLCM; Wu et al. 2015a)

Model Structure

[Figure: pLCM structure for controls and for cases in disease classes A to E, with population etiology fractions ρ_A, ..., ρ_E. Each case class has a true positive rate (TPR) for the measurement of its own pathogen and false positive rates (FPR) for the other measurements; controls have FPRs for all measurements.]

Partially-observed class: controls have no lung infection;

Non-interference: $P(M_{[-j]} \mid Y = 0) = P(M_{[-j]} \mid I^L = j, Y = 1)$;

Local independence (LI): independence among measurements given class ($I_i^L$).

Next: relax both non-interference and LI assumptions.
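The pLCM assumptions on this slide can be sketched numerically. The code below is a minimal illustration, not the baker package's implementation. It assumes one measurement per pathogen, with `theta[j]` the TPR and `psi[j]` the FPR of measurement j and `rho[j]` the etiology fraction; these names and all values are hypothetical.

```python
from itertools import product


def bernoulli(rate, m):
    """Probability of a binary outcome m given its positive rate."""
    return rate if m == 1 else 1.0 - rate


def control_prob(m, psi):
    """P(M = m | Y = 0): controls are uninfected, so every positive is a false positive."""
    out = 1.0
    for m_j, psi_j in zip(m, psi):
        out *= bernoulli(psi_j, m_j)
    return out


def case_terms(m, rho, theta, psi):
    """rho_j * P(M = m | I^L = j, Y = 1) for each cause j.

    Measurement j uses the TPR; by non-interference the other
    measurements behave as in controls and use the FPR.
    """
    terms = []
    for j, rho_j in enumerate(rho):
        term = rho_j
        for l, m_l in enumerate(m):
            term *= bernoulli(theta[l] if l == j else psi[l], m_l)
        terms.append(term)
    return terms


def case_prob(m, rho, theta, psi):
    """P(M = m | Y = 1): mixture over causes weighted by the etiology fractions."""
    return sum(case_terms(m, rho, theta, psi))


def posterior_cause(m, rho, theta, psi):
    """P(I^L = j | M = m, Y = 1) by Bayes' rule, treating the parameters as known."""
    terms = case_terms(m, rho, theta, psi)
    total = sum(terms)
    return [t / total for t in terms]


# Hypothetical values for three pathogens.
rho = [0.5, 0.3, 0.2]
theta = [0.9, 0.8, 0.7]
psi = [0.1, 0.2, 0.3]
patterns = list(product([0, 1], repeat=3))
case_total = sum(case_prob(m, rho, theta, psi) for m in patterns)
control_total = sum(control_prob(m, psi) for m in patterns)
p_100 = case_prob((1, 0, 0), rho, theta, psi)
post_100 = posterior_cause((1, 0, 0), rho, theta, psi)
```

In the Bayesian analysis the parameters are unknown and given priors; here they are fixed only to show how the control and case likelihoods share the FPRs.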

SLIDE 9

Modeling Local Dependence (LD)

[Figure: pairwise log odds ratios (logOR), their standard errors (s.e.) and standardized log odds ratios (std.logOR) among HINF (1), ADENO (2), HMPV_A_B (3), PARA_1 (4), RHINO (5) and RSV (6), shown for cases and for controls]

Direct evidence from control data; symmetry (see Figure); pathogen interactions

Impact on inference (Pepe and Janes, 2007; Albert et al., 2001)

Modeling cross-classified probability contingency tables P(M_{i1} = m_1, ..., M_{iJ} = m_J):

Log-linear parametrization
Generalized linear mixed-effect models (GLMM)
Mixed-membership models
Other non-negative decompositions
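The pairwise summaries shown on this slide (logOR, s.e., std.logOR) can be computed from a 2×2 table of two binary measurements. The sketch below uses the usual large-sample (Woolf) standard error, which is an assumption since the slides do not say how the standard errors were obtained. The counts are hypothetical.

```python
import math


def log_odds_ratio(n11, n10, n01, n00):
    """Return (logOR, s.e., standardized logOR) for two binary measurements.

    n11: both positive, n10: first only, n01: second only, n00: neither.
    Assumes all four counts are positive.
    """
    log_or = math.log((n11 * n00) / (n10 * n01))
    se = math.sqrt(1.0 / n11 + 1.0 / n10 + 1.0 / n01 + 1.0 / n00)
    return log_or, se, log_or / se


# Hypothetical counts for one pathogen pair among controls.
log_or, se, z = log_odds_ratio(30, 70, 20, 80)
```

A standardized log odds ratio far from zero among controls is direct evidence against local independence, because controls all belong to the same (uninfected) class.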

SLIDE 10

Nested pLCM

Example: 5 Pathogens, 2 Subclasses
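One plausible reading of the nested structure is that each class is itself a mixture of subclasses, with local independence holding only within a subclass. The sketch below follows that reading for 5 pathogens and 2 subclasses. It is an illustration under stated assumptions, not the formula from the slide: `nu` (control subclass weights), `eta` (case subclass weights), the subclass-specific `theta` and `psi`, and all values are hypothetical.

```python
from itertools import product
from math import prod


def bernoulli(rate, m):
    """Probability of a binary outcome m given its positive rate."""
    return rate if m == 1 else 1.0 - rate


def control_prob(m, nu, psi):
    """Controls: mixture over subclasses k of products of subclass-specific FPRs."""
    return sum(
        nu_k * prod(bernoulli(r, m_j) for r, m_j in zip(psi_k, m))
        for nu_k, psi_k in zip(nu, psi)
    )


def case_prob(m, rho, eta, theta, psi):
    """Cases: mixture over causes j and subclasses k.

    Within subclass k, measurement j uses theta[k][j] (TPR) and the
    other measurements use psi[k][l] (FPR).
    """
    total = 0.0
    for j, rho_j in enumerate(rho):
        for eta_k, theta_k, psi_k in zip(eta, theta, psi):
            rates = [theta_k[l] if l == j else psi_k[l] for l in range(len(m))]
            total += rho_j * eta_k * prod(
                bernoulli(r, m_l) for r, m_l in zip(rates, m)
            )
    return total


# Hypothetical values: 5 pathogens, 2 subclasses.
rho = [0.4, 0.25, 0.15, 0.1, 0.1]
nu = [0.5, 0.5]
eta = [0.25, 0.75]
psi = [[0.05, 0.10, 0.05, 0.10, 0.05], [0.40, 0.30, 0.40, 0.30, 0.40]]
theta = [[0.95, 0.90, 0.95, 0.90, 0.95], [0.80, 0.85, 0.80, 0.85, 0.80]]
patterns = list(product([0, 1], repeat=5))
control_total = sum(control_prob(m, nu, psi) for m in patterns)
case_total = sum(case_prob(m, rho, eta, theta, psi) for m in patterns)
```

With a single subclass this reduces to the pLCM, so the nested family contains the local independence model as a special case.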

SLIDE 11

Example: Dependence Structure; 2 Subclasses

Left: weak LD; Right: strong LD
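To see how two subclasses induce weak or strong local dependence, one can compute the marginal log odds ratio between two measurements in a two-subclass mixture. This is a sketch with hypothetical weights and response probabilities, not the settings used in the talk's figures.

```python
import math


def mixture_log_or(weights, psi_a, psi_b):
    """Marginal log odds ratio between measurements a and b.

    Within subclass k the two measurements are independent with positive
    rates psi_a[k] and psi_b[k]; mixing over subclasses induces dependence.
    """
    p11 = sum(w * a * b for w, a, b in zip(weights, psi_a, psi_b))
    p10 = sum(w * a * (1 - b) for w, a, b in zip(weights, psi_a, psi_b))
    p01 = sum(w * (1 - a) * b for w, a, b in zip(weights, psi_a, psi_b))
    p00 = sum(w * (1 - a) * (1 - b) for w, a, b in zip(weights, psi_a, psi_b))
    return math.log((p11 * p00) / (p10 * p01))


# Similar subclasses give weak LD; very different subclasses give strong LD.
weak = mixture_log_or([0.5, 0.5], [0.30, 0.35], [0.30, 0.35])
strong = mixture_log_or([0.5, 0.5], [0.05, 0.60], [0.05, 0.60])
# A single subclass recovers local independence (log odds ratio of zero).
single = mixture_log_or([1.0], [0.30], [0.30])
```

The strength of dependence is governed by how much the subclass response probabilities differ and by the subclass weights.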

SLIDE 12

Simulation: Relative Asymptotic Bias

Bias if Estimated by Working LI Model (pLCM)

Left: weak LD; Right: strong LD

SLIDE 13

Estimation in Finite Samples: How Many Subclasses?

Example: 3 Subclasses

A model selection problem:

Extra subclasses: rich correlation structure;
Few subclasses: parsimonious approximation in finite samples.

Proposed solution: model averaging via a stick-breaking prior, which encourages few subclasses but allows more when the data show rich dependence.
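A stick-breaking prior builds subclass weights by repeatedly breaking off a random fraction of the remaining "stick". The sketch below draws from a truncated version with Beta(1, alpha) breaks. The truncation level `K`, the concentration `alpha`, and the Beta(1, alpha) form are assumptions for illustration; the slides do not give the hyperparameters used.

```python
import random


def stick_breaking_weights(alpha, K, rng):
    """Draw K subclass weights from a truncated stick-breaking prior.

    V_k ~ Beta(1, alpha) for k < K and V_K = 1, so the weights sum to one.
    Small alpha puts most mass on the first few subclasses.
    """
    weights = []
    remaining = 1.0
    for k in range(K):
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)  # break off a fraction v of the stick
        remaining *= 1.0 - v           # what is left for later subclasses
    return weights


rng = random.Random(2015)
few_subclasses = stick_breaking_weights(0.2, 10, rng)   # tends to be sparse
many_subclasses = stick_breaking_weights(5.0, 10, rng)  # tends to spread mass
```

Because unused subclasses receive weights near zero, the posterior can effectively average over models with different numbers of subclasses.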

SLIDE 14

Finite-Sample Simulations: Smaller MSE by npLCM

Scenario II: Strong LD; Ncase = Ncontrol = 500

Truth: Cases’ First Subclass Weight (ηo): 0.25 0.5 0.75 1

100 × Ratio of MSE (Standard Error), by class:

A     82 (4)     25 (1)    47 (2)   115 (6)   221 (12)
B    516 (11)   177 (5)    80 (3)    62 (4)   140 (8)
C   2379 (77)   711 (26)  131 (7)   268 (13)  357 (8)
D    397 (14)   152 (6)    94 (5)    79 (4)    60 (4)
E    357 (13)   151 (6)   102 (5)    95 (6)    82 (5)

Table: ratio of mean squared errors (MSE) for pLCM vs npLCM. All numbers are averaged across 1,000 replications.
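For reference, a table entry of this kind can be computed as follows. The sketch reads "pLCM vs npLCM" as pLCM's MSE divided by npLCM's MSE, so values above 100 favor npLCM; that reading, the function names, and the toy estimates are assumptions.

```python
def mse(estimates, truth):
    """Mean squared error of replicate estimates around the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def mse_ratio_percent(est_plcm, est_nplcm, truth):
    """100 x MSE(pLCM) / MSE(npLCM) for one etiology fraction."""
    return 100.0 * mse(est_plcm, truth) / mse(est_nplcm, truth)


# Hypothetical replicate estimates of one etiology fraction whose truth is 0.5.
ratio = mse_ratio_percent([0.6, 0.7, 0.6], [0.55, 0.45, 0.5], 0.5)
```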

SLIDE 15

Analysis of PERCH Data

[Figure: estimated cause probabilities for RSV, RHINO, HMPV_A_B, ADENO, PARA_1, HINF and other causes; additional panels labeled 23.6%, 12.5%, 10.1% and 3% show cause probabilities for individual binary measurement patterns]

SLIDE 16

Model Checking: Frequent Binary Patterns

Left: pLCM; Right: npLCM

[Figure: frequencies of the most frequent binary measurement patterns, shown separately for cases and for controls; y-axis: pattern frequency]

SLIDE 17

Main Points Once Again

Input: multivariate binary data in case-control studies

Output: two histograms, both given measurements:
1) the fraction of cases caused by each pathogen;
2) the probability that a particular case is caused by each pathogen.

Proposed a larger model family (nested pLCM) to
1) Borrow covariation and measurement precision from controls;
2) Account for residual measurement correlations, or local dependence (LD);
3) Parsimoniously approximate LD by sparse Bayesian fitting

Compared to pLCM, the extended model family can
1) Reduce bias
2) Retain efficiency
3) Have near-nominal coverage

SLIDE 18

Regression Analysis

Left: pLCM (bad fit); Middle: npLCM (improved fit); Right: Seasonality

SLIDE 19

Thanks!

Collaborators: Scott Zeger, Maria Deloria-Knoll, Laura Hammitt, Katherine O’Brien

Funding: Patient-Centered Outcomes Research Institute [PCORI ME-1408-20318]; Bill & Melinda Gates Foundation [48968]; Hopkins Individualized Health (inHealth) Initiative

Related Papers (More at: zhenkewu.com)

1. Wu Z, Deloria-Knoll M, Hammitt LL, and Zeger SL, for the PERCH Core Team (2015a). Partially Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology. Journal of the Royal Statistical Society: Series C (Applied Statistics). doi: 10.1111/rssc.12101.

2. Wu Z, Zeger SL (2015b). Nested Partially-Latent Class Models for Estimating Disease Etiology from Case-Control Data. Johns Hopkins Biostatistics Working Papers No. 276.

3. Wu Z, Zeger SL (2015c). Regression Analysis for Estimating Disease Etiology from Case-Control Data. Johns Hopkins Biostatistics Working Papers.
