Bayesian Nested Partially-Latent Models for Dependent Binary Data

SLIDE 1

Problem Models Results Results Discussion

Bayesian Nested Partially-Latent Models for Dependent Binary Data

Estimating Disease Etiology Zhenke Wu

Postdoctoral Fellow Department of Biostatistics

09 November 2015

R Package: https://github.com/zhenkewu/baker

Zhenke Wu(zhwu@jhu.edu) Biostat Grand Rounds, JHSPH 09 November 2015 0 / 18

SLIDE 2

Question: What’s Causing Her Lung Infection?

Measurements From a Random Case

Bacterium Virus Measurements using different specimens

SLIDE 3

Motivating Application

Pneumonia Etiology Research for Child Health (PERCH)

Background:

  • > 1 million deaths per year among children under 5
  • > 30 possible pathogen causes

Goal:

  • To determine the etiology and risk factors for pneumonia

Design:

  • 7-country, case-control study
  • Multiple modern diagnostic tools
  • ∼5,000 cases and ∼5,000 controls

SLIDE 4

Common Questions on Individual and Population Health

  1. a. What is the person’s health state given health measurements?
     b. What is the population distribution of health states?
     (Wu et al., 2015a,b,c)
  2. How to make robust inference?

Picture source: http://www.diabetesdaily.com/voices/2014/07/why-one-size-fits-all-doesnt-work-in-diabetes

SLIDE 5

Problem and Data Features

Latent health state:

  • Estimating population distribution + individual diagnosis

Data Features:

  1. Gold-standard measure: few or none
  2. Latent state: many categories
  3. Measurements: many, with distinct error rates, missingness
  4. Blessing: control data

No effective, principled method exists to estimate the etiologic distribution (“pie”) from such data.

SLIDE 6

Our Approach: Direct Modeling

Connect Latent States and Measurements for Individual i

[Figure: graphical model for individual i linking the unobserved lung infection state $I_i^L$, the measurements $M_i$ and the covariates $X_i$, with parameters θ and ψ. Labels: sensitivity (true positive rate), 1 − specificity (false positive rate), covariates, measurements, unobserved lung infection. Healthy controls have $I_i^L = 0$.]

SLIDE 7

Latent Class Models (LCM)

Review

[Figure: grid of class-specific response probabilities for measurements A, B, C, D and E.]

  • IDEA: marginal correlations are caused by confounding by unobserved cluster indicators ($I_i$)
  • Assumption 1: Within-Class Homogeneity

$P[M_{ij} = 1 \mid I_i = k] = \psi_k^{(j)}, \quad k = 1, \ldots, K$

  • Assumption 2: Local Independence (LI)

$P[M_{i1} = m_1, \ldots, M_{iJ} = m_J \mid I_i = k] = \prod_{j=1}^{J} P[M_{ij} = m_j \mid I_i = k], \quad \forall\, m = (m_1, \ldots, m_J)'$
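A minimal Python sketch of the two LCM assumptions, not taken from the talk: it evaluates the joint probability of a binary pattern by mixing locally independent Bernoulli products over classes, then shows that the measurements are marginally correlated even though they are independent within each class. The function name, the class weights and the rates are hypothetical illustration values.

```python
from itertools import product


def lcm_joint_prob(m, class_weights, psi):
    """P[M = m] under a latent class model with local independence.

    class_weights[k] = P[I = k]; psi[k][j] = P[M_j = 1 | I = k].
    """
    total = 0.0
    for w_k, psi_k in zip(class_weights, psi):
        within = 1.0  # local independence: product over measurements
        for m_j, p in zip(m, psi_k):
            within *= p if m_j == 1 else 1.0 - p
        total += w_k * within
    return total


# Hypothetical example: two classes, two measurements.
class_weights = [0.5, 0.5]
psi = [[0.9, 0.9], [0.1, 0.1]]

joint = {m: lcm_joint_prob(m, class_weights, psi)
         for m in product([0, 1], repeat=2)}
p1 = joint[(1, 0)] + joint[(1, 1)]      # P[M_1 = 1]
p2 = joint[(0, 1)] + joint[(1, 1)]      # P[M_2 = 1]
marginal_cov = joint[(1, 1)] - p1 * p2  # nonzero: induced by the latent class
```

Here the covariance is positive only because the two classes have different response probabilities; with identical rows in `psi` it would vanish.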

SLIDE 8

Partially-Latent Class Models (pLCM; Wu et al. 2015a)

Model Structure

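[Figure: response probabilities for measurements A to E, by group]

Group (weight)        A          B          C          D          E
controls              ψ_1^(A)    ψ_1^(B)    ψ_1^(C)    ψ_1^(D)    ψ_1^(E)
cases, class A (π_A)  θ_1^(A)    ψ_1^(B)    ψ_1^(C)    ψ_1^(D)    ψ_1^(E)
cases, class B (π_B)  ψ_1^(A)    θ_1^(B)    ψ_1^(C)    ψ_1^(D)    ψ_1^(E)
cases, class C (π_C)  ψ_1^(A)    ψ_1^(B)    θ_1^(C)    ψ_1^(D)    ψ_1^(E)
cases, class D (π_D)  ψ_1^(A)    ψ_1^(B)    ψ_1^(C)    θ_1^(D)    ψ_1^(E)
cases, class E (π_E)  ψ_1^(A)    ψ_1^(B)    ψ_1^(C)    ψ_1^(D)    θ_1^(E)

Legend: ψ = false positive rate (FPR); θ = true positive rate (TPR); π = population etiology; rows = disease “class”.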

  • Partially-observed class: controls have no lung infection;
  • Non-interference: $P(M_{[-j]} \mid Y = 0) = P(M_{[-j]} \mid I^L = j, Y = 1)$;
  • Local independence (LI): independence among measurements given class ($I_i^L$).

Next: relax both non-interference and LI assumptions.
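The pLCM structure above can be sketched in Python as follows. This is an illustrative simplification, not the baker implementation: it assumes one measurement per pathogen, no "other" cause, and known parameters, whereas the talk estimates them in a Bayesian model. All function names and numeric values are hypothetical.

```python
from itertools import product


def bernoulli_product(m, rates):
    """Probability of binary vector m with independent Bernoulli(rates) entries."""
    prob = 1.0
    for m_j, rate in zip(m, rates):
        prob *= rate if m_j == 1 else 1.0 - rate
    return prob


def control_likelihood(m, psi):
    """Controls: every measurement responds at its false positive rate."""
    return bernoulli_product(m, psi)


def case_class_likelihood(m, j, theta, psi):
    """Case whose cause is j: TPR for measurement j, FPR for all others."""
    rates = list(psi)
    rates[j] = theta[j]
    return bernoulli_product(m, rates)


def case_likelihood(m, pi, theta, psi):
    """Mixture over causes, weighted by the population etiology pi."""
    return sum(pi[j] * case_class_likelihood(m, j, theta, psi)
               for j in range(len(pi)))


def individual_posterior(m, pi, theta, psi):
    """P(cause = j | measurements m) for a single case, by Bayes' rule."""
    joint = [pi[j] * case_class_likelihood(m, j, theta, psi)
             for j in range(len(pi))]
    total = sum(joint)
    return [value / total for value in joint]


# Hypothetical parameter values for three pathogens.
pi = [0.5, 0.3, 0.2]
theta = [0.9, 0.8, 0.7]
psi = [0.1, 0.2, 0.3]

posterior = individual_posterior((1, 0, 0), pi, theta, psi)
total_mass = sum(case_likelihood(m, pi, theta, psi)
                 for m in product([0, 1], repeat=3))
```

Because only the rate of measurement j changes between a control and a case of class j, the distribution of the remaining measurements is the same in both groups, which is the non-interference assumption.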

SLIDE 9

Modeling Local Dependence (LD)

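[Figure: pairwise log odds ratios among the measurements HINF (1), ADENO (2), HMPV_A_B (3), PARA_1 (4), RHINO (5) and RSV (6), in separate panels for cases and controls. Each displayed cell gives logOR, s.e. and std.logOR.]

Displayed cells (logOR, s.e., std.logOR): (0.86, 0.4, 2.1); (−2.47, 1.01, −2.4); (0.79, 0.22, 3.5); (1.67, 0.39, 4.3); (−1.3, 0.61, −2.1); (1.12, 0.24, 4.7); (0.51, 0.23, 2.2); (−1.72, 0.4, −4.3); (−3.37, 1.01, −3.3); (−3.59, 1.01, −3.6)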

  • Direct evidence from control data; symmetry (see Figure); pathogen interactions
  • Impact on inference (Pepe and Janes, 2007; Albert et al., 2001)
  • Modeling cross-classified probability contingency tables $P(M_{i1} = m_1, \ldots, M_{iJ} = m_J)$
      • Log-linear parametrization
      • Generalized linear mixed-effect models (GLMM)
      • Mixed-membership models
      • Other non-negative decompositions
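As a rough illustration of the three quantities named in the figure (logOR, s.e., std.logOR), the sketch below computes them for one pair of binary measurements using the standard 2×2-table log odds ratio and its Woolf standard error. Whether the talk used exactly this estimator is an assumption; the function name and the data are hypothetical.

```python
import math


def pairwise_log_odds_ratio(x, y, correction=0.0):
    """Return (logOR, s.e., std.logOR) for two equal-length 0/1 sequences.

    Set correction=0.5 to add 0.5 to every cell when a cell count is zero.
    """
    n11 = n10 = n01 = n00 = correction
    for a, b in zip(x, y):
        if a == 1 and b == 1:
            n11 += 1
        elif a == 1:
            n10 += 1
        elif b == 1:
            n01 += 1
        else:
            n00 += 1
    log_or = math.log(n11 * n00 / (n10 * n01))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    return log_or, se, log_or / se


# Hypothetical control data: cell counts 30 (1,1), 10 (1,0), 10 (0,1), 50 (0,0).
x = [1] * 30 + [1] * 10 + [0] * 10 + [0] * 50
y = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 50
log_or, se, std_log_or = pairwise_log_odds_ratio(x, y)
```

Under local independence the control-data log odds ratios should be close to zero, so large standardized values are direct evidence of local dependence.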

SLIDE 10

Nested pLCM

Example: 5 Pathogens, 2 Subclasses
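A sketch of how subclasses can induce local dependence, assuming the subclass-mixture form described in Wu and Zeger (2015b): within each subclass the measurements are independent, controls mix over subclasses with weights ν, and cases of a given class mix with weights η. The function names and all numeric values are hypothetical, chosen to match the slide's setting of 5 pathogens and 2 subclasses.

```python
from itertools import product


def bernoulli_product(m, rates):
    """Probability of binary vector m with independent Bernoulli(rates) entries."""
    prob = 1.0
    for m_j, rate in zip(m, rates):
        prob *= rate if m_j == 1 else 1.0 - rate
    return prob


def control_prob(m, nu, psi):
    """Controls: mixture over subclasses k with weights nu[k] and FPRs psi[k]."""
    return sum(nu_k * bernoulli_product(m, psi_k) for nu_k, psi_k in zip(nu, psi))


def case_prob(m, j, eta, theta, psi):
    """Cases of class j: subclass weights eta[k]; TPR replaces FPR at position j."""
    total = 0.0
    for eta_k, theta_k, psi_k in zip(eta, theta, psi):
        rates = list(psi_k)
        rates[j] = theta_k[j]
        total += eta_k * bernoulli_product(m, rates)
    return total


def control_covariance(a, b, nu, psi):
    """Covariance between measurements a and b among controls, by enumeration."""
    n_meas = len(psi[0])
    p_a = p_b = p_ab = 0.0
    for m in product([0, 1], repeat=n_meas):
        p = control_prob(m, nu, psi)
        p_a += p * m[a]
        p_b += p * m[b]
        p_ab += p * m[a] * m[b]
    return p_ab - p_a * p_b


# Hypothetical values: 5 pathogens, 2 subclasses.
J = 5
psi = [[0.05] * J, [0.40] * J]    # subclass-specific false positive rates
theta = [[0.90] * J, [0.60] * J]  # subclass-specific true positive rates
nu = [0.7, 0.3]                   # subclass weights among controls
eta = [0.25, 0.75]                # subclass weights among cases

cov_two_subclasses = control_covariance(0, 1, nu, psi)
cov_one_subclass = control_covariance(0, 1, [1.0], [psi[0]])
case_total = sum(case_prob(m, 2, eta, theta, psi)
                 for m in product([0, 1], repeat=J))
```

With a single subclass the covariance is zero and the model reduces to the locally independent pLCM; with two subclasses that differ in their rates, the measurements become dependent.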

SLIDE 11

Example: Dependence Structure; 2 Subclasses

Left: weak LD Right: strong LD

SLIDE 12

Simulation: Relative Asymptotic Bias

Bias if Estimated by Working LI Model (pLCM)

Left: weak LD Right: strong LD

SLIDE 13

Estimation in Finite Samples: How Many Subclasses?

Example: 3 Subclasses

A model selection problem:

  • Extra subclasses: rich correlation structure;
  • Few subclasses: parsimonious approximation in finite samples.

Proposed solution: model averaging via a stick-breaking prior, which encourages few subclasses but allows more if the data have rich dependence.
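A minimal sketch of a truncated stick-breaking construction for the subclass weights. It assumes Beta(1, α) stick proportions and a fixed truncation level K, which is the textbook form; the hyperparameters and truncation used in the talk are not stated, so `K`, `alpha` and the function names here are hypothetical.

```python
import random


def stick_breaking_weights(sticks):
    """Turn stick proportions V_1..V_K into weights V_k * prod_{l<k} (1 - V_l)."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for v in sticks:
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def sample_subclass_weights(K, alpha, rng):
    """Draw K subclass weights; the last proportion is 1 so the weights sum to 1."""
    sticks = [rng.betavariate(1.0, alpha) for _ in range(K - 1)] + [1.0]
    return stick_breaking_weights(sticks)


rng = random.Random(0)
weights = sample_subclass_weights(K=5, alpha=0.5, rng=rng)

# Deterministic check of the construction itself.
fixed = stick_breaking_weights([0.5, 0.5, 1.0])  # [0.5, 0.25, 0.25]
```

A small `alpha` tends to put most of the mass on the first few subclasses, while still leaving positive prior weight on later ones.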

SLIDE 14

Finite-Sample Simulations: Smaller MSE by npLCM

Scenario II: Strong LD; Ncase = Ncontrol = 500

100 × Ratio of MSE (Standard Error), by Truth: Cases’ First Subclass Weight (ηo)

Class   0           0.25        0.5        0.75       1
A       82 (4)      25 (1)      47 (2)     115 (6)    221 (12)
B       516 (11)    177 (5)     80 (3)     62 (4)     140 (8)
C       2379 (77)   711 (26)    131 (7)    268 (13)   357 (8)
D       397 (14)    152 (6)     94 (5)     79 (4)     60 (4)
E       357 (13)    151 (6)     102 (5)    95 (6)     82 (5)

Table: 100 × ratio of mean squared errors (MSE) for pLCM vs npLCM. All numbers are averaged across 1,000 replications.

SLIDE 15

Analysis of PERCH Data

[Figure: cause probability (axis 0 to 1) for HINF, ADENO, HMPV_A_B, PARA_1, RHINO, RSV and other; panels labeled 23.6%, 12.5%, 10.1% and 3%.]

SLIDE 16

Model Checking: Frequent Binary Patterns

Left: pLCM; Right: npLCM

[Figure: frequencies of the most frequent binary measurement patterns and of all other patterns, in separate panels for cases and controls; axis: pattern frequency.]
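The observed side of such a check can be tabulated as below: count each binary pattern, keep the most frequent ones, and pool the rest as "other". This is only a sketch of the tabulation step; comparing these frequencies with model-predicted ones is not shown, and the function name and data are hypothetical.

```python
from collections import Counter


def frequent_patterns(rows, top=3):
    """Return [(pattern, frequency), ...] for the `top` most common patterns,
    followed by ("other", pooled frequency of all remaining patterns)."""
    counts = Counter("".join(str(int(v)) for v in row) for row in rows)
    n = len(rows)
    freqs = [(pattern, count / n) for pattern, count in counts.most_common(top)]
    other = 1.0 - sum(freq for _, freq in freqs)
    freqs.append(("other", other))
    return freqs


# Hypothetical data: 10 subjects, 2 binary measurements each.
rows = [(0, 0)] * 5 + [(1, 0)] * 3 + [(1, 1)] * 2
observed = frequent_patterns(rows, top=2)
```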

SLIDE 17

Main Points Once Again

  • Input: multivariate binary data in case-control studies
  • Output: two histograms, both given measurements:
      1) the fraction of cases caused by each pathogen;
      2) the probability that a particular case is caused by each pathogen.
  • Proposed a larger model family (nested pLCM) to
      1) Borrow covariation and measurement precision from controls;
      2) Account for residual measurement correlations, or local dependence (LD);
      3) Parsimoniously approximate LD by sparse Bayesian fitting
  • Compared to pLCM, the extended model family can
      1) Reduce bias
      2) Retain efficiency
      3) Have near-nominal coverage

SLIDE 18

Regression Analysis

Left: pLCM (bad fit) Middle: npLCM (improved fit) Right: Seasonality

SLIDE 19

Thanks!

Collaborators: Scott Zeger, Maria Deloria-Knoll, Laura Hammitt, Katherine O’Brien

Funding:

  • Patient-Centered Outcomes Research Institute [PCORI ME-1408-20318]
  • Bill & Melinda Gates Foundation [48968]
  • Hopkins Individualized Health (inHealth) Initiative

Related Papers (More at: zhenkewu.com)

  1. Wu Z, Deloria-Knoll M, Hammitt LL, and Zeger SL, for the PERCH Core Team (2015a). Partially Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology. Journal of the Royal Statistical Society: Series C (Applied Statistics). doi: 10.1111/rssc.12101.
  2. Wu Z, Zeger SL (2015b). Nested Partially-Latent Class Models for Estimating Disease Etiology from Case-Control Data. Johns Hopkins Biostatistics Working Papers No. 276.
  3. Wu Z, Zeger SL (2015c). Regression Analysis for Estimating Disease Etiology from Case-Control Data. Johns Hopkins Biostatistics Working Papers.
