Prior-Driven Cluster Allocation in Bayesian Mixture Models


SLIDE 1

Prior-Driven Cluster Allocation in Bayesian Mixture Models

Sally Paganin

sally.paganin@berkeley.edu

JSM 2020, August 03, 2020

SLIDE 2

Co-authors:

  • Amy Herring, Duke University
  • David Dunson, Duke University
  • Andrew Olshan, UNC at Chapel Hill

SLIDE 3

Introduction

Clustering is one of the canonical data analysis goals in statistics

  • Distance-based methods: rely on a distance metric between data points
  • Model-based clustering: relies on discrete mixture models

Bayesian perspective: allows incorporating prior information

SLIDE 4

Introduction

What if we have prior information on the clustering itself?

SLIDE 5

Introduction

Motivating application: birth defects data

  • Relate exposure factors to the risk of developing a defect
  • Prior information available (biology/experts' judgments)

We aim to provide methods that facilitate data-adaptive clustering, using both the information in the data and external knowledge.

SLIDE 6

National Birth Defects Prevention Study

  • Population-based case-control study
      - 300 controls/100 cases per year since 1997
      - Monthly n. of controls ∝ n. of births in the previous year
  • Cases (37 major birth defects)
      - Birth defects surveillance system + clinical geneticist review
      - Cases with known etiology were excluded
  • Controls
      - Non-malformed live births
      - Birth certificates or hospital delivery records
  • Data collection
      - CATI (computer-assisted telephone interview, English/Spanish) within 24 months

http://www.nbdps.org/

SLIDE 7

National Birth Defects Prevention Study

We focus on congenital heart defects (CHD), which are problems in the structure of the heart that are present at birth.

SLIDE 8

Congenital Heart Defects

Clinical importance: priority in public health

  • Most frequent class of defects
  • High impact on pediatric mortality

Statistical relevance: challenge in birth defects modeling

  • Most defects are too rare for individual study
  • Difficult to determine how best to group birth defects

SLIDE 9

Congenital Heart Defects

Experts have provided a mechanistic classification of the defects

  • Relies on biological knowledge and embryologic development
  • Translates into a prior guess c0 for the clustering

SLIDE 10

Set partitions

A set partition c of the set [n] = {1, . . . , n} is a collection of non-empty disjoint subsets $\{B_1, B_2, \dots, B_K\}$ such that $\bigcup_{i=1}^{K} B_i = [n]$

  • Number of partitions of [n] into k blocks: Stirling numbers of the second kind $S(n, k)$
  • Total number of set partitions: Bell number $B_n = \sum_{k=1}^{n} S(n, k)$

SLIDE 11

Set partitions

[Figure: configurations (integer partitions) for n = 5, labeled 5, 32, 41, 221, 311, 2111, 11111]

  • Configuration $\lambda = \{|B_1|, \dots, |B_K|\}$

The sequence of block cardinalities identifies an integer partition, a set of positive integers $\{\lambda_1, \dots, \lambda_K\}$ such that $\sum_{i=1}^{K} \lambda_i = n$
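
To make the counting concrete, here is a minimal Python sketch (not part of the original slides) that computes S(n, k) and B_n with the standard recurrence and enumerates the set partitions of a small set by brute force. The function names are illustrative choices, and `bell` assumes n ≥ 1.

```python
from collections import Counter
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of partitions of [n] into exactly k non-empty blocks."""
    if n == k:
        return 1  # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Element n either forms its own block or joins one of the k existing blocks.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def bell(n: int) -> int:
    """Bell number B_n as the sum of S(n, k) over k = 1, ..., n (n >= 1)."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


def set_partitions(n: int):
    """Yield every partition of {1, ..., n} as a list of blocks."""
    def extend(i, blocks):
        if i > n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:        # put element i in an existing block
            b.append(i)
            yield from extend(i + 1, blocks)
            b.pop()
        blocks.append([i])      # or open a new block with element i
        yield from extend(i + 1, blocks)
        blocks.pop()
    yield from extend(1, [])


def configuration(partition):
    """Block sizes in decreasing order, i.e. the integer partition of n."""
    return tuple(sorted((len(b) for b in partition), reverse=True))


# Number of set partitions of [5] sharing each configuration.
config_counts = Counter(configuration(p) for p in set_partitions(5))
```

For n = 5 the enumeration visits 52 partitions spread over 7 configurations, which is why exhaustive approaches are only feasible for small n.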

SLIDE 12

Modeling birth defects

  • $i = 1, \dots, N$ heart defects, $j = 1, \dots, n_i$ observations
  • $y_{ij} = 1$ if observation $j$ has birth defect $i$, while $y_{ij} = 0$ if it is a control
  • $x_{ij}^T = (x_{ij1}, \dots, x_{ijp})$ observed values for $p$ dichotomous variables

Grouped logistic regression

$$ y_{ij} \sim \mathrm{Ber}(\pi_{ij}), \qquad \mathrm{logit}(\pi_{ij}) = \alpha_i + x_{ij}^T \beta_{c_i}, \qquad j = 1, \dots, n_i $$

$$ \alpha_i \sim N(a_0, \tau_0^{-1}), \qquad \beta_{c_i} \mid c \sim N_p(b, Q), \qquad i = 1, \dots, N $$

Bayesian framework: assign a prior probability p(c)

Exchangeable Partition Probability Function (EPPF)
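
As an illustration of how the clustering enters the likelihood, the following Python sketch simulates data from the grouped logistic regression: defects allocated to the same cluster share the coefficient vector, while each defect keeps its own intercept. All numerical values (`c`, `alpha`, `beta`, sample sizes) are hypothetical and chosen only for the example.

```python
import math
import random


def inv_logit(z: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_defect(alpha_i, beta_cluster, n_i, rng):
    """Draw n_i pairs (x, y): x has dichotomous entries, y ~ Ber(pi)."""
    p = len(beta_cluster)
    data = []
    for _ in range(n_i):
        x = [rng.randint(0, 1) for _ in range(p)]
        eta = alpha_i + sum(xk * bk for xk, bk in zip(x, beta_cluster))
        y = 1 if rng.random() < inv_logit(eta) else 0
        data.append((x, y))
    return data


# Hypothetical setup: N = 3 defects, p = 2 exposures, clustering c = (0, 0, 1).
c = [0, 0, 1]                              # defects 1 and 2 share a cluster
alpha = [-1.0, -0.5, 0.2]                  # defect-specific intercepts alpha_i
beta = {0: [0.8, -0.5], 1: [-1.0, 0.3]}    # cluster-specific coefficients beta_{c_i}

rng = random.Random(0)
data = [simulate_defect(alpha[i], beta[c[i]], 50, rng) for i in range(len(c))]
```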

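SLIDE 13

Uniform distribution: $p(c) = 1/B_N$

Dirichlet Process: $p(c) \propto \alpha^K \prod_{i=1}^{K} (|B_i| - 1)!$

Pitman-Yor Process: $p(c) \propto \prod_{j=1}^{K-1} (\alpha + j\sigma) \prod_{i=1}^{K} (1 - \sigma)_{|B_i| - 1}$, where $(x)_r = x(x+1)\cdots(x+r-1)$ denotes the rising factorial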
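
A short Python sketch of these baseline priors, written as unnormalized weights that depend only on the block sizes. The function names are illustrative; the Pitman-Yor weight uses the standard two-parameter EPPF, which reduces to the Dirichlet process weight (up to a constant factor α) when σ = 0.

```python
import math


def rising_factorial(x: float, r: int) -> float:
    """(x)_r = x (x + 1) ... (x + r - 1), with (x)_0 = 1."""
    out = 1.0
    for t in range(r):
        out *= x + t
    return out


def eppf_uniform(sizes) -> float:
    """Uniform prior: every partition gets the same weight."""
    return 1.0


def eppf_dp(sizes, alpha: float = 1.0) -> float:
    """Dirichlet process: alpha^K * prod_i (|B_i| - 1)!."""
    return alpha ** len(sizes) * math.prod(math.factorial(s - 1) for s in sizes)


def eppf_py(sizes, alpha: float = 1.0, sigma: float = 0.0) -> float:
    """Pitman-Yor: prod_{j=1}^{K-1} (alpha + j sigma) * prod_i (1 - sigma)_{|B_i| - 1}."""
    k = len(sizes)
    between = math.prod(alpha + j * sigma for j in range(1, k))
    within = math.prod(rising_factorial(1.0 - sigma, s - 1) for s in sizes)
    return between * within


# Weights for the partition {1, 2}{3, 4}{5}, i.e. block sizes (2, 2, 1).
w_dp = eppf_dp([2, 2, 1], alpha=1.0)
w_py = eppf_py([2, 2, 1], alpha=1.0, sigma=0.25)
```

Because the weights depend on the partition only through its configuration, they are exchangeable: relabeling the items does not change the prior probability.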

SLIDE 14

How to account for c0?

Base idea: penalize a baseline EPPF in order to center the prior distribution on the given partition c0

$$ p(c \mid c_0, \psi) \propto p_0(c) \exp\{-\psi\, d(c, c_0)\} \qquad (1) $$

  • $p_0(c)$ indicates a baseline distribution (EPPF) on $\Pi_N$
  • $d(c, c_0)$ is a suitable distance between partitions, ideally a metric on the set partitions lattice
  • $\psi$ is a penalization parameter controlling the centering
      - $\psi \to 0$: $p(c \mid c_0, \psi) \to p_0(c)$
      - $\psi \to \infty$: $p(c \mid c_0, \psi) \to \delta_{c_0}$

SLIDE 15

How to account for c0?

Choice of the distance: Variation of Information [Meila (2007)]

  • $VI(c, c') = -H(c) - H(c') + 2H(c \wedge c')$
  • $H(\cdot)$ is the information entropy
  • VI is a metric on the set partitions lattice
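
The Variation of Information is easy to compute when partitions are stored as label vectors. The sketch below is illustrative rather than the authors' code; it assumes base-2 logarithms, which is consistent with the maximum distance 4.70 ≈ log2(26) quoted later in the talk for N = 26.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Entropy (base 2) of the block proportions of a partition given as labels."""
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())


def variation_of_information(c, c_prime) -> float:
    """VI(c, c') = -H(c) - H(c') + 2 H(c ^ c')."""
    # The blocks of the meet c ^ c' are the non-empty pairwise intersections,
    # so pairing the two labels of each item gives the labels of the meet.
    meet = list(zip(c, c_prime))
    return -entropy(c) - entropy(c_prime) + 2.0 * entropy(meet)


c0 = [0, 0, 1, 1, 2]          # the partition {1, 2}{3, 4}{5}
one_block = [0, 0, 0, 0, 0]   # {1, 2, 3, 4, 5}
singletons = [0, 1, 2, 3, 4]  # {1}{2}{3}{4}{5}

d_one = variation_of_information(c0, one_block)
d_max = variation_of_information(one_block, singletons)
```

The largest distance is attained between the single-block partition and the all-singletons partition, and equals log2(N).
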
SLIDE 16

Centered Partition Processes

Define sets of partitions with distance $\delta_l$ from $c_0$ and configuration $\lambda_m$

$$ s_{lm}(c_0) = \{c \in \Pi_N : d(c, c_0) = \delta_l,\ \Lambda(c) = \lambda_m\} $$

for $l = 0, \dots, L$ and $m = 1, \dots, M$.

Centered Partition Processes - analytic form

$$ p(c \mid c_0, \psi) = \frac{g(\lambda_m)\, e^{-\psi \delta_l}}{\sum_{u=0}^{L} \sum_{v=1}^{M} |s_{uv}(c_0)|\, g(\lambda_v)\, e^{-\psi \delta_u}}, \qquad \text{for } c \in s_{lm}(c_0) $$

  • $g(\cdot)$ is a function of the configuration $\Lambda(c)$, e.g. Uniform: $g(\Lambda(c)) = 1$; DP: $g(\Lambda(c)) = \alpha^K \prod_{j=1}^{K} \Gamma(\lambda_j)$
  • $|\cdot|$ indicates the cardinality of the set $s_{lm}(c_0)$, which is not analytically tractable, but the prior can nonetheless be used in Bayesian models relying on Monte Carlo methods
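
For very small N the normalizing constant can be obtained by brute force, which gives a feel for how ψ concentrates the prior around c0. The Python sketch below enumerates all 52 partitions of 5 items; it is an illustration under the assumptions of a VI distance with base-2 logarithms and helper names of my own choosing, not the authors' implementation.

```python
import math
from collections import Counter


def all_partitions(n):
    """Yield every set partition of n items as a tuple of labels (0, then max + 1 for new blocks)."""
    def extend(prefix, n_blocks):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(n_blocks + 1):
            yield from extend(prefix + [label], max(n_blocks, label + 1))
    yield from extend([], 0)


def entropy(labels):
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())


def vi(c, c0):
    return -entropy(c) - entropy(c0) + 2.0 * entropy(list(zip(c, c0)))


def g_uniform(labels):
    return 1.0


def g_dp(labels, alpha=1.0):
    """DP baseline: alpha^K * prod_j Gamma(lambda_j)."""
    sizes = Counter(labels).values()
    return alpha ** len(sizes) * math.prod(math.gamma(s) for s in sizes)


def cp_prior(c0, psi, g=g_uniform):
    """Exact CP prior p(c | c0, psi) over all partitions (feasible only for small N)."""
    weights = {c: g(c) * math.exp(-psi * vi(c, c0)) for c in all_partitions(len(c0))}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


c0 = (0, 0, 1, 1, 2)                      # {1, 2}{3, 4}{5}
flat = cp_prior(c0, psi=0.0)              # psi = 0 recovers the baseline
peaked = cp_prior(c0, psi=50.0)           # large psi concentrates on c0
dp_base = cp_prior(c0, psi=0.0, g=g_dp)   # DP baseline with alpha = 1
```

With ψ = 0 every partition has probability 1/52 under the uniform baseline, while with ψ = 50 essentially all the mass sits on c0.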

SLIDE 17

CP Process - Uniform EPPF

[Figure: two panels, one for c0 = {1, 2, 3, 4, 5} and one for c0 = {1, 2}{3, 4}{5}]

SLIDE 18

CP Process - DP EPPF (α = 1)

[Figure: two panels, one for c0 = {1, 2, 3, 4, 5} and one for c0 = {1, 2}{3, 4}{5}]

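SLIDE 19

Prior calibration

We estimate the distribution of the distance $\delta \in \{\delta_l\}_{l=0}^{L}$

$$ p(\delta = \delta_l) = \frac{\sum_{m=1}^{M} n_{lm}\, g(\lambda_m)\, e^{-\psi \delta_l}}{\sum_{u=0}^{L} \sum_{v=1}^{M} n_{uv}\, g(\lambda_v)\, e^{-\psi \delta_u}}, \qquad n_{lm} = |s_{lm}(c_0)| $$

  • Monte Carlo procedure: uniform sampler on the set partition space $\Pi_N$ [Stam (1983)]
  • Deterministic local search for small values of the distance, $\delta \in \{\delta_0, \dots, \delta_{L^*}\}$: greedy search algorithm

[Figure: cumulative probabilities of the distance from c0 (x-axis: distances, 0.0 to 3.5; y-axis: cumulative probabilities, 0.0 to 1.0); legend: ψ = 5, 10, 15, 20]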
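
The Monte Carlo part of the calibration can be sketched as follows: draw partitions uniformly with the urn scheme of Stam (1983), tabulate their distances from c0, and reweight the counts by exp(−ψδ). This Python sketch assumes a uniform baseline (g = 1), a VI distance with base-2 logarithms, and a truncation point `k_max` for the infinite sum over the number of urns; these are illustrative choices, not the authors' settings.

```python
import math
import random
from collections import Counter


def sample_uniform_partition(n, rng, k_max=None):
    """Urn scheme: draw K with P(K = k) proportional to k^n / k!, then drop n items into K urns."""
    k_max = k_max or 4 * n + 20  # truncation of the infinite sum (assumption)
    log_w = [n * math.log(k) - math.lgamma(k + 1) for k in range(1, k_max + 1)]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    k = rng.choices(range(1, k_max + 1), weights=weights)[0]
    urns = [rng.randrange(k) for _ in range(n)]
    relabel = {}  # non-empty urns become blocks, labeled in order of appearance
    return tuple(relabel.setdefault(u, len(relabel)) for u in urns)


def entropy(labels):
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())


def vi(c, c0):
    return -entropy(c) - entropy(c0) + 2.0 * entropy(list(zip(c, c0)))


def distance_distribution(c0, psi, n_draws, rng):
    """Estimate p(delta = delta_l) under the CP prior with uniform baseline."""
    counts = Counter(
        round(vi(sample_uniform_partition(len(c0), rng), c0), 10)
        for _ in range(n_draws)
    )
    # Sampled counts estimate n_l / B_N; the penalty reweights each distance.
    weights = {d: m * math.exp(-psi * d) for d, m in counts.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in sorted(weights.items())}


rng = random.Random(1)
dist = distance_distribution((0, 0, 1, 1, 2), psi=5.0, n_draws=2000, rng=rng)
```

Uniform draws rarely land close to c0 when N is large, so the counts at small distances are poorly estimated; this is the gap the deterministic local search is meant to fill.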

SLIDE 20

Modeling birth defects

N = 26 birth defects, 4,047 cases, 8,125 controls, 90 potential risk factors

$$ y_{ij} \sim \mathrm{Ber}(\pi_{ij}), \qquad \mathrm{logit}(\pi_{ij}) = \alpha_i + x_{ij}^T \beta_{c_i}, \qquad j = 1, \dots, n_i $$

$$ \alpha_i \sim N(a_0, \tau_0^{-1}), \qquad \beta_{c_i} \mid c \sim N_p(b, Q), \qquad i = 1, \dots, N $$

$$ p(c) \sim CP(c_0, \psi, p_0(c)), \qquad p_0(c) \propto \alpha^K \prod_{k=1}^{K} (\lambda_k - 1)! $$

From the prior calibration: ψ = 40 (90% partitions with d = 0.8; d_max = 4.70)

Posterior estimation (MCMC)

  • Pólya-gamma data augmentation for Bayesian logistic regression, introducing latent variables $\omega_{ij} \sim PG(1, \alpha_i + x_{ij}^T \beta_{c_i})$
  • Class allocation step involving the prior penalization: easily adapts marginal sampling for the DP
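
To show where the penalization enters the class allocation step, here is a prior-only Gibbs sweep in Python: each item is reallocated to an existing cluster or to a new one with probability proportional to p0(c) exp{−ψ d(c, c0)}. This is a sketch under stated assumptions (DP baseline, VI distance with base-2 logarithms, likelihood term omitted), not the sampler used in the talk; in the full model each weight would also be multiplied by the likelihood contribution of the item under the candidate cluster.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())


def vi(c, c0):
    return -entropy(c) - entropy(c0) + 2.0 * entropy(list(zip(c, c0)))


def g_dp(labels, alpha=1.0):
    """DP baseline weight: alpha^K * prod_j Gamma(lambda_j)."""
    sizes = Counter(labels).values()
    return alpha ** len(sizes) * math.prod(math.gamma(s) for s in sizes)


def gibbs_sweep(c, c0, psi, rng, alpha=1.0):
    """One sweep over the items, sampling each allocation from its prior full conditional."""
    c = list(c)
    for i in range(len(c)):
        others = set(c[:i] + c[i + 1:])
        new_label = max(c) + 1  # label reserved for a brand-new cluster
        candidates = sorted(others) + [new_label]
        weights = []
        for k in candidates:
            proposal = c[:i] + [k] + c[i + 1:]
            # Baseline EPPF times the penalty centering the prior on c0.
            weights.append(g_dp(proposal, alpha) * math.exp(-psi * vi(proposal, c0)))
        c[i] = rng.choices(candidates, weights=weights)[0]
    return c


c0 = [0, 0, 1, 1, 2]
rng = random.Random(0)
free = gibbs_sweep(c0, c0, psi=0.0, rng=rng)     # psi = 0: plain DP allocation
stuck = gibbs_sweep(c0, c0, psi=200.0, rng=rng)  # huge psi: the chain stays at c0
```

With ψ = 0 the weights reduce to the usual Chinese restaurant process allocation probabilities, which is why the step is a small modification of marginal DP samplers.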

SLIDE 21

Clustering results

(a) ψ = 0, VI(ĉ, c0) = 2.43
[Figure panel: matrix plot with the 26 heart defects on both axes, in the order AORTICSTENOSIS, ASDOS, AVSD, COARCT, COMMONTRUNCUS, DORVOTHER, DORVTGA, DTGA, EBSTEIN, FALLOT, HLHS, IAANOS, IAATYPEA, IAATYPEB, PULMATRESIA, PVS, TAPVR, TRIATRESIA, VSDCONOV, VSDMUSC, VSDNOS, VSDOS, VSDPM, ASD, ASDNOS, PAPVR]

(b) ψ = 40, VI(ĉ, c0) = 1.78
[Figure panel: matrix plot with the 26 heart defects on both axes, in the order ASD, COMMONTRUNCUS, DORVOTHER, DORVTGA, FALLOT, IAANOS, IAATYPEB, PAPVR, PULMATRESIA, TAPVR, TRIATRESIA, VSDCONOV, ASDNOS, ASDOS, AVSD, EBSTEIN, PVS, VSDMUSC, VSDNOS, VSDOS, VSDPM, AORTICSTENOSIS, COARCT, DTGA, HLHS, IAATYPEA]

SLIDE 22

Clustering results

(c) ψ = 80, VI(ĉ, c0) = 1.65
[Figure panel: matrix plot with the 26 heart defects on both axes, in the order ASD, ASDNOS, ASDOS, AVSD, VSDMUSC, VSDNOS, VSDOS, VSDPM, AORTICSTENOSIS, COARCT, COMMONTRUNCUS, DORVOTHER, DORVTGA, DTGA, EBSTEIN, FALLOT, HLHS, IAANOS, IAATYPEA, IAATYPEB, PAPVR, PULMATRESIA, PVS, TAPVR, TRIATRESIA, VSDCONOV]

(d) ψ = 120, VI(ĉ, c0) = 0.86
[Figure panel: matrix plot with the 26 heart defects on both axes, in the order COMMONTRUNCUS, DORVOTHER, DORVTGA, DTGA, FALLOT, IAANOS, IAATYPEB, PAPVR, TAPVR, VSDCONOV, ASD, ASDNOS, ASDOS, AVSD, EBSTEIN, PULMATRESIA, PVS, TRIATRESIA, VSDMUSC, VSDNOS, VSDOS, VSDPM, AORTICSTENOSIS, COARCT, HLHS, IAATYPEA]

SLIDE 23

Exposure effects

[Figure: four panels (COMMONTRUNCUS, PAPVR, PULMATRESIA, AVSD); x-axis: ψ, with ticks 40, 80, 120, ∞; y-axis: exposure factors]

Exposure factors shown: Household smoking, Drink alcohol, Substance abuse, Folic acid supplement, Obese vs normal, Type 1 diabetes, Type 2 diabetes, Nausea, Asthma, Kidney/Bladder/UTI, Acetaminophen without fever, NSAIDs without fever, Antipyretic with no fever, Anti-infective, Cold meds, Doxylamine, Meclizine, Opioids, Promethazine, SSRI, Sulfamethoxazole, Trimethoprim, Relatives health problems or BD

SLIDE 24

Future work

Data analysis

  • Variable selection in order to account for shared effects.
  • Inclusion of information favoring relation between specific outcomes and exposure factors.

Methodology

  • Building prediction rules for new observations/clusters.
  • Formalize inclusion of partial information, number/sizes of clusters.

Software

  • Provide sampling methods via
SLIDE 25

Thanks!

Paganin S., Herring A. H., Olshan A. F. & Dunson D. B. (2020). Centered Partition Processes: Informative Priors for Clustering. Bayesian Analysis (Advance publication)

sally.paganin@berkeley.edu   @sampling_sally   salleuska   https://salleuska.github.io/

SLIDE 26

References i

HARTIGAN, J. A. (1990). Partition models. Commun. Statist. A 19, 2745–2756.

MEILA, M. (2007). Comparing clusterings - an information based distance. J. of Mult. Analysis 98, 873–895.

MÜLLER, P., QUINTANA, F. & ROSNER, G. L. (2011). A Product Partition Model With Regression on Covariates. J. Comput. Graph. Statist. 20, 260–278.

NEAL, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 9, 249–265.

PARK, J.-H. & DUNSON, D. B. (2010). Bayesian Generalized Product Partition Models. Stat. Sin. 20, 1203–1226.
SLIDE 27

References ii

RODRIGUEZ, A. & DUNSON, D. B. (2011). Nonparametric Bayesian models through probit stick-breaking processes. Bayesian Analysis (Online) 6.1.

STAM, A. J. (1983). Generation of a random partition of a finite set by an urn model. J. of Comb. Theory, Series A 35, 231–240.