SLIDE 1

Variational Greedy Algorithm for Clustering of Grouped Data

Linda S. L. Tan (Joint work with A/Prof. David J. Nott)

National University of Singapore

ICSA 2013, 20–23 Dec 2013

Linda Tan (NUS) Variational Greedy Algorithm ICSA 2013 1 / 15

SLIDE 2

Presentation Outline

1. Motivation
2. Mixtures of Linear Mixed Models
3. Variational Approximation
4. Hierarchical Centering
5. Variational Greedy Algorithm
6. Examples
7. Conclusion and Future Work

SLIDE 3

Motivation

Problem: clustering correlated or replicated grouped data. Example: gene expression profiles.

Clustering is used to find co-regulated and functionally related groups of genes (e.g. Celeux et al., 2005).

[Figure: time course data (Spellman et al., 1998). x-axis: time points; y-axis: gene expression levels.]

SLIDE 4

Approach

Consider mixtures of linear mixed models (MLMMs), which

- provide a mathematical framework for clustering grouped data,
- allow covariate information to be incorporated,
- are estimated using the EM algorithm (likelihood maximization), and
- support model selection via penalized log-likelihood criteria, e.g. BIC (Celeux et al., 2005; Ng et al., 2006).

We develop a variational greedy algorithm (VGA) for fitting MLMMs that

- automatically performs parameter estimation and model selection simultaneously, and
- reparametrizes the MLMM using hierarchical centering when certain parameters are weakly identifiable. We report a gain in efficiency in variational algorithms due to hierarchical centering (similar to MCMC), and provide some theoretical support.

SLIDE 5

Mixture of linear mixed models (MLMMs)

Observe y_i = [y_{i1}, ..., y_{i n_i}]^T for i = 1, ..., n. Number of mixture components: k. δ_i: latent mixture component indicators. Conditional on δ_i = j,

y_i = X_i β_j + W_i a_i + V_i b_j + ε_i,

where

- X_i, W_i and V_i are design matrices and β_j are fixed effects,
- a_i ~ N(0, σ²_{aj} I) and b_j ~ N(0, σ²_{bj} I) are random effects,
- ε_i ~ N(0, Σ_{ij}) is the error vector.

Mixture weights vary with covariates through a multinomial logit model:

P(δ_i = j | γ) = exp(u_i^T γ_j) / Σ_{l=1}^k exp(u_i^T γ_l),

where u_i is a vector of covariates, γ_1 ≡ 0 and γ_2, ..., γ_k are unknown parameters.

Priors (Bayesian approach): γ ~ N(0, Σ_γ), β_j ~ N(0, Σ_{βj}), σ²_{aj} ~ IG(α_{aj}, λ_{aj}), σ²_{bj} ~ IG(α_{bj}, λ_{bj}) and σ²_{jl} ~ IG(α_{jl}, λ_{jl}).
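As a concrete (hypothetical) illustration of the generative model on this slide, the sketch below simulates one group's response: a component δ_i is drawn from the multinomial logit weights, then y_i = X_i β_j + W_i a_i + V_i b_j + ε_i. The function names, dimensions and parameter values are made up for illustration; this is not the authors' code, and the error covariance is simplified to σ_e² I.

```python
import math
import random

def mixture_probs(u, gammas):
    """P(delta_i = j | gamma) under the multinomial logit model.
    gammas[0] is the zero vector (gamma_1 = 0) for identifiability."""
    scores = [math.exp(sum(uc * gc for uc, gc in zip(u, g))) for g in gammas]
    total = sum(scores)
    return [s / total for s in scores]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def simulate_yi(Xi, Wi, Vi, beta, b, sigma_a, sigma_e, u, gammas, rng):
    """Draw delta_i, then a_i, then y_i = X_i beta_j + W_i a_i + V_i b_j + eps_i
    (errors simplified to iid N(0, sigma_e^2))."""
    k = len(beta)
    j = rng.choices(range(k), weights=mixture_probs(u, gammas))[0]
    a_i = [rng.gauss(0.0, sigma_a[j]) for _ in range(len(Wi[0]))]
    mean = [xb + wa + vb for xb, wa, vb in
            zip(matvec(Xi, beta[j]), matvec(Wi, a_i), matvec(Vi, b[j]))]
    return j, [m + rng.gauss(0.0, sigma_e) for m in mean]
```

Pinning γ_1 at zero in `mixture_probs` mirrors the identifiability constraint on the logit weights above.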

SLIDE 6

Introduction to variational approximation

A fast, deterministic and flexible technique. In Bayesian inference, approximate the intractable true posterior p(θ|y) by a more tractable q(θ), e.g. assume

1. q(θ) belongs to some parametric family, or
2. q(θ) = ∏_{i=1}^m q_i(θ_i) for θ = {θ_1, ..., θ_m} (variational Bayes).

Minimize the Kullback–Leibler divergence between q(θ) and p(θ|y). This is equivalent to maximizing the lower bound

L = ∫ q(θ) {log p(y, θ) − log q(θ)} dθ ≤ log p(y)

on the log marginal likelihood log p(y). L is sometimes used for Bayesian model selection.
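To make the bound concrete: for the toy conjugate model y ~ N(θ, 1) with prior θ ~ N(0, 1), both L and log p(y) are available in closed form, so one can check numerically that L ≤ log p(y), with equality exactly when q is the true posterior N(y/2, 1/2). This toy sketch is illustrative only and is not part of the MLMM derivation.

```python
import math

def elbo(m, s2, y):
    """L = E_q[log p(y|theta) + log p(theta)] + H(q) for q = N(m, s2),
    with likelihood y ~ N(theta, 1) and prior theta ~ N(0, 1)."""
    e_loglik = -0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + s2)
    e_logprior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return e_loglik + e_logprior + entropy

def log_marginal(y):
    # Marginally, y ~ N(0, 2)
    return -0.5 * math.log(2 * math.pi * 2.0) - y ** 2 / 4.0

y = 1.3
# q equal to the exact posterior N(y/2, 1/2) attains the bound with equality ...
assert abs(elbo(y / 2.0, 0.5, y) - log_marginal(y)) < 1e-12
# ... and any other q gives a strictly smaller bound
assert elbo(0.0, 1.0, y) < log_marginal(y)
```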

SLIDE 7

Variational approximation for MLMMs

Assume

q(θ) = q(γ) ∏_{i=1}^n {q(a_i) q(δ_i)} ∏_{j=1}^k [ q(β_j) q(b_j) q(σ²_{aj}) q(σ²_{bj}) ∏_{l=1}^g q(σ²_{jl}) ],

where

- q(a_i) = N(μ^q_{ai}, Σ^q_{ai}), q(β_j) = N(μ^q_{βj}, Σ^q_{βj}), q(b_j) = N(μ^q_{bj}, Σ^q_{bj}),
- q(σ²_{aj}) = IG(α^q_{aj}, λ^q_{aj}), q(σ²_{bj}) = IG(α^q_{bj}, λ^q_{bj}), q(σ²_{jl}) = IG(α^q_{jl}, λ^q_{jl}),
- q(δ_i = j) = q_{ij} with Σ_{j=1}^k q_{ij} = 1 for all i, and q(γ) = 1{γ = μ^q_γ}, a point mass (for a tractable L).

Optimize L with respect to the variational parameters in a gradient ascent algorithm.

- The conditional mode of μ^q_γ is found by iteratively reweighted least squares.
- Closed-form updates are available for all other variational parameters.
- At convergence, relax q(γ) to a normal distribution (Waterhouse et al., 1996) to obtain an approximation L* to log p(y), used for model selection in the VGA.
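The "closed-form updates" are coordinate ascent steps: each factor of q is updated in turn given the current values of the others. The MLMM updates are too long to reproduce here, so the sketch below runs the same idea (mean-field CAVI) on a much simpler conjugate model, a normal sample with unknown mean and precision; the model, priors and values are illustrative assumptions, not the MLMM algorithm.

```python
def cavi_normal(y, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0, iters=50):
    """Mean-field CAVI for y_i ~ N(mu, 1/tau), mu ~ N(mu0, 1/(kappa0*tau)),
    tau ~ Gamma(a0, b0), with q(mu, tau) = q(mu) q(tau)."""
    n = len(y)
    ybar = sum(y) / n
    # These two updates hit their fixed points immediately:
    m = (kappa0 * mu0 + n * ybar) / (kappa0 + n)   # mean of q(mu)
    a = a0 + (n + 1) / 2                           # shape of q(tau)
    b = b0
    for _ in range(iters):
        s2 = 1.0 / ((kappa0 + n) * (a / b))        # q(mu) variance uses E[tau] = a/b
        b = b0 + 0.5 * (kappa0 * ((m - mu0) ** 2 + s2)
                        + sum((yi - m) ** 2 for yi in y) + n * s2)
    return m, s2, a, b

y = [1.8, 2.1, 2.4, 1.9, 2.3]
m, s2, a, b = cavi_normal(y)
# posterior mean of mu shrinks the sample mean 2.1 toward mu0 = 0
assert abs(m - 10.5 / 6) < 1e-9
```

Each pass cycles through the factors exactly as the MLMM algorithm cycles through q(a_i), q(β_j), q(δ_i) and the variance factors.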

SLIDE 8

Hierarchical centering

Recall: y_i = X_i β_j + W_i a_i + V_i b_j + ε_i conditional on δ_i = j.

1. Partial centering (X_i = W_i): introduce η_i = β_j + a_i ~ N(β_j, σ²_{aj} I), so that y_i = X_i η_i + V_i b_j + ε_i.
2. Full centering (X_i = W_i = V_i): introduce ν_j = β_j + b_j ~ N(β_j, σ²_{bj} I) and ρ_i = ν_j + a_i ~ N(ν_j, σ²_{aj} I), so that y_i = X_i ρ_i + ε_i.

We derive lower bounds and algorithms for these two cases, and observe a gain in efficiency through centering similar to MCMC. Theoretical support: we prove that the rate of convergence of variational Bayes algorithms by Gaussian approximation is equal to that of the corresponding Gibbs sampler. The result is not directly applicable to MLMMs, but it suggests that hierarchical centering may lead to improved convergence in variational algorithms just as in MCMC.

SLIDE 9

Variational greedy algorithm (VGA)

Automatic: returns a plausible number of mixture components together with the fitted model.

Bottom-up approach (VA: variational algorithm):

1. Start by fitting a one-component mixture f_1.
2. Search for the optimal way to split components in the current mixture f_k: randomly partition each component into two and apply a partial VA to the resulting mixture, updating only the variational parameters of the two split components. The trial with the highest L out of M yields the optimal way.
3. Split the components in f_k in descending order of L, applying a partial VA each time while keeping fixed the variational parameters of components awaiting splitting. A split is "successful" if L* increases; stop once a split is unsuccessful.
4. Apply the VA to the resulting mixture, updating all variational parameters.
5. Repeat steps 2–4 until all splits of the current mixture are unsuccessful.
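The greedy control flow above can be sketched on a toy problem. The sketch swaps the variational lower bound L* for a simple penalized fit score and splits 1-D clusters at their median; it is a hypothetical illustration of the split/accept/stop loop, not the authors' algorithm.

```python
def sse(cluster):
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

def score(clusters, penalty=1.0):
    # Stand-in for L*: a fit term minus a per-component complexity penalty.
    return -sum(sse(c) for c in clusters) - penalty * len(clusters)

def greedy_fit(data, penalty=1.0):
    clusters = [sorted(data)]              # step 1: one-component "mixture"
    while True:
        best = None
        for i, c in enumerate(clusters):
            if len(c) < 2:
                continue
            mid = len(c) // 2              # step 2: trial split (here: at the median)
            trial = clusters[:i] + [c[:mid], c[mid:]] + clusters[i + 1:]
            s = score(trial, penalty)
            if best is None or s > best[0]:
                best = (s, trial)
        # step 3: accept the best split only if the score improves, else stop
        if best is None or best[0] <= score(clusters, penalty):
            return clusters
        clusters = best[1]

data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
print(len(greedy_fit(data)))  # prints 2: the two well-separated groups are recovered
```

The real VGA replaces the median split with M random partitions refined by partial VAs, and the penalty is implicit in L* itself rather than added by hand.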

SLIDE 10

Variational greedy algorithm (VGA)

Increasing efficiency:

- Partial variational algorithms: update only the variational parameters of certain components instead of the entire mixture.
- Component elimination property of variational Bayes: sieves out components that resist splitting.

Optional merge moves may be carried out after the VGA has converged. The greedy approach can be adapted to fit other mixture models.

SLIDE 11

Example: Time course data (Spellman et al., 1998)

α-factor synchronization: yeast cells sampled at 7 min intervals for 119 mins (18 time points) for 612 genes.

[Figure: time course data (Spellman et al., 1998). x-axis: time points; y-axis: gene expression levels.]

We applied the VGA (without hierarchical centering) ten times, obtaining three 15-component, five 17-component and two 18-component mixtures. After merge moves, three 17-component mixtures reduced to 16 components and both 18-component mixtures reduced to 17. The VGA can overestimate the number of mixture components, but the variation in the number of components returned is relatively small.

SLIDE 12

Example: Time course data

Clustering of a 16-component mixture, obtained after applying one merge move to a 17-component mixture produced by the VGA.

[Figure: the 16 fitted clusters, containing 37, 105, 41, 20, 8, 64, 65, 79, 25, 17, 15, 49, 13, 37, 31 and 6 genes respectively. x-axis: time points; y-axis: gene expression levels. The black line is the posterior mean of the fixed effects.]

SLIDE 13

Example: Synthetic data set (Yeung et al., 2003)

[Figure: 400 gene expression profiles (4 repeated measurements); true clusters 1–4 contain 67 genes each and clusters 5–6 contain 66 genes each. x-axis: experiments; y-axis: gene expression levels.]

Model: X_i = W_i. We applied the VGA (with partial centering and without centering) five times each. The adjusted Rand index measures the degree of agreement between the true and fitted clusters.

Centering                      No           Partial
Average adjusted Rand index    < 0.01       0.99
No. of components returned     2 comp × 5   6 comp × 3, 7 comp × 2

Hierarchical centering produced much better clustering results, and the number of mixture components returned by the VGA is very close to the true number of components.
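For reference, the adjusted Rand index used above is computed from the contingency table of the two partitions; a minimal stdlib implementation (not the authors' code):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI = (Index - Expected) / (Max - Expected); 1 means perfect agreement,
    values near 0 mean chance-level agreement between the two partitions."""
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))   # contingency table counts n_ij
    rows, cols = Counter(labels_a), Counter(labels_b)
    index = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

# Invariant to relabelling of clusters:
assert adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]) == 1.0
# Maximally crossed partitions of 4 items score below chance:
assert abs(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]) + 0.5) < 1e-12
```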

SLIDE 14

Example: Water temperature data

Daily average water temperatures (290 days, Upper Peirce Reservoir). Model: X_i = W_i = V_i. We applied the VGA (with full centering and without centering) five times each; a 4-component model was obtained each time.

                                  no centering   full centering
Average CPU time (seconds)        725            469
Average log marginal likelihood   −837           −789

4-component fitted model from the VGA (full centering):

[Figure: clustering results for clusters 1–4, containing 99, 113, 48 and 30 days respectively. x-axis: depth; y-axis: water temperature.]

SLIDE 15

Conclusion and Future Work

We have

- developed an automatic variational greedy algorithm for fitting MLMMs that performs parameter estimation and model selection simultaneously, and
- shown empirically that hierarchical centering can improve the rate of convergence of variational algorithms and produce better clustering results.

Future work:

- Extend the variational greedy approach to other mixture models.
- Replace inverse gamma priors with marginally noninformative priors (Huang and Wand, 2013) to reduce sensitivity.
- Extend the variational greedy algorithm to large data sets.

Tan, S. L. and Nott, D. J. (2013). Variational approximation for mixtures of linear mixed models. Journal of Computational and Graphical Statistics. Advance online publication. DOI: 10.1080/10618600.2012.761138.

Thank You!
