SLIDE 1

Variational Greedy Algorithm for Clustering of Grouped Data

Linda S. L. Tan (Joint work with A/Prof. David J. Nott)

National University of Singapore

ICSA 2013, 20–23 Dec 2013

Linda Tan (NUS) Variational Greedy Algorithm ICSA 2013 1 / 15

SLIDE 2

Presentation Outline

1. Motivation
2. Mixtures of Linear Mixed Models
3. Variational Approximation
4. Hierarchical Centering
5. Variational Greedy Algorithm
6. Examples
7. Conclusion and Future Work

SLIDE 3

Motivation

Problem: clustering correlated or replicated grouped data. Example: gene expression profiles.

Clustering is used to find co-regulated and functionally related groups of genes (e.g. Celeux et al., 2005).

[Figure: time course data (Spellman et al., 1998). x-axis: time points; y-axis: gene expression levels.]

SLIDE 4

Approach

Consider mixtures of linear mixed models (MLMMs), which

- provide a mathematical framework for clustering grouped data,
- allow covariate information to be incorporated,
- are estimated using the EM algorithm (likelihood maximization), and
- support model selection via penalized log-likelihood criteria, e.g. BIC (Celeux et al., 2005; Ng et al., 2006).

We develop a variational greedy algorithm (VGA) for fitting MLMMs that

- automatically performs parameter estimation and model selection simultaneously, and
- reparametrizes the MLMM using hierarchical centering when certain parameters are weakly identifiable. We report a gain in efficiency in variational algorithms due to hierarchical centering (similar to MCMC), and provide some theoretical support.

SLIDE 5

Mixture of linear mixed models (MLMMs)

Observe y_i = [y_{i1}, ..., y_{i n_i}]^T for i = 1, ..., n. Number of mixture components: k. δ_i: latent mixture component indicators. Conditional on δ_i = j,

y_i = X_i β_j + W_i a_i + V_i b_j + ε_i,

where

- X_i, W_i and V_i are design matrices and β_j are fixed effects,
- a_i ~ N(0, σ²_{aj} I) and b_j ~ N(0, σ²_{bj} I) are random effects,
- ε_i ~ N(0, Σ_{ij}) is the error vector.

Mixture weights vary with covariates through a multinomial logit model:

P(δ_i = j | γ) = exp(u_i^T γ_j) / Σ_{l=1}^k exp(u_i^T γ_l),

where u_i is a vector of covariates, γ_1 ≡ 0 and γ_2, ..., γ_k are unknown parameters.

Priors (Bayesian approach): γ ~ N(0, Σ_γ), β_j ~ N(0, Σ_{βj}), σ²_{aj} ~ IG(α_{aj}, λ_{aj}), σ²_{bj} ~ IG(α_{bj}, λ_{bj}) and σ²_{jl} ~ IG(α_{jl}, λ_{jl}).
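As a concrete (hypothetical) illustration of the generative model on this slide, the sketch below simulates one group's response: a component δ_i is drawn from the multinomial logit weights, then y_i = X_i β_j + W_i a_i + V_i b_j + ε_i. The function names, dimensions and parameter values are made up for illustration; this is not the authors' code, and the error covariance is simplified to σ_e² I.

```python
import math
import random

def mixture_probs(u, gammas):
    """P(delta_i = j | gamma) under the multinomial logit model.
    gammas[0] is the zero vector (gamma_1 = 0) for identifiability."""
    scores = [math.exp(sum(uc * gc for uc, gc in zip(u, g))) for g in gammas]
    total = sum(scores)
    return [s / total for s in scores]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def simulate_yi(Xi, Wi, Vi, beta, b, sigma_a, sigma_e, u, gammas, rng):
    """Draw delta_i, then a_i, then y_i = X_i beta_j + W_i a_i + V_i b_j + eps_i
    (errors simplified to iid N(0, sigma_e^2))."""
    k = len(beta)
    j = rng.choices(range(k), weights=mixture_probs(u, gammas))[0]
    a_i = [rng.gauss(0.0, sigma_a[j]) for _ in range(len(Wi[0]))]
    mean = [xb + wa + vb for xb, wa, vb in
            zip(matvec(Xi, beta[j]), matvec(Wi, a_i), matvec(Vi, b[j]))]
    return j, [m + rng.gauss(0.0, sigma_e) for m in mean]
```

Pinning γ_1 at zero in `mixture_probs` mirrors the identifiability constraint on the logit weights above.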

SLIDE 6

Introduction to variational approximation

A fast, deterministic and flexible technique. In Bayesian inference, approximate the intractable true posterior p(θ|y) by a more tractable q(θ), e.g. assume

1. q(θ) belongs to some parametric family, or
2. q(θ) = ∏_{i=1}^m q_i(θ_i) for θ = {θ_1, ..., θ_m} (variational Bayes).

Minimize the Kullback–Leibler divergence between q(θ) and p(θ|y). This is equivalent to maximizing the lower bound

L = ∫ q(θ) {log p(y, θ) − log q(θ)} dθ ≤ log p(y)

on the log marginal likelihood log p(y). L is sometimes used for Bayesian model selection.
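To make the bound concrete: for the toy conjugate model y ~ N(θ, 1) with prior θ ~ N(0, 1), both L and log p(y) are available in closed form, so one can check numerically that L ≤ log p(y), with equality exactly when q is the true posterior N(y/2, 1/2). This toy sketch is illustrative only and is not part of the MLMM derivation.

```python
import math

def elbo(m, s2, y):
    """L = E_q[log p(y|theta) + log p(theta)] + H(q) for q = N(m, s2),
    with likelihood y ~ N(theta, 1) and prior theta ~ N(0, 1)."""
    e_loglik = -0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + s2)
    e_logprior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return e_loglik + e_logprior + entropy

def log_marginal(y):
    # Marginally, y ~ N(0, 2)
    return -0.5 * math.log(2 * math.pi * 2.0) - y ** 2 / 4.0

y = 1.3
# q equal to the exact posterior N(y/2, 1/2) attains the bound with equality ...
assert abs(elbo(y / 2.0, 0.5, y) - log_marginal(y)) < 1e-12
# ... and any other q gives a strictly smaller bound
assert elbo(0.0, 1.0, y) < log_marginal(y)
```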

SLIDE 7

Variational approximation for MLMMs

Assume

q(θ) = q(γ) ∏_{i=1}^n {q(a_i) q(δ_i)} ∏_{j=1}^k [ q(β_j) q(b_j) q(σ²_{aj}) q(σ²_{bj}) ∏_{l=1}^g q(σ²_{jl}) ],

where

- q(a_i) = N(μ^q_{ai}, Σ^q_{ai}), q(β_j) = N(μ^q_{βj}, Σ^q_{βj}), q(b_j) = N(μ^q_{bj}, Σ^q_{bj}),
- q(σ²_{aj}) = IG(α^q_{aj}, λ^q_{aj}), q(σ²_{bj}) = IG(α^q_{bj}, λ^q_{bj}), q(σ²_{jl}) = IG(α^q_{jl}, λ^q_{jl}),
- q(δ_i = j) = q_{ij} with Σ_{j=1}^k q_{ij} = 1 for all i, and q(γ) = 1{γ = μ^q_γ}, a point mass (for a tractable L).

Optimize L with respect to the variational parameters in a gradient ascent algorithm.

- The conditional mode of μ^q_γ is found by iteratively reweighted least squares.
- Closed-form updates are available for all other variational parameters.
- At convergence, relax q(γ) to a normal distribution (Waterhouse et al., 1996) to obtain an approximation L* to log p(y), used for model selection in the VGA.
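The "closed-form updates" are coordinate ascent steps: each factor of q is updated in turn given the current values of the others. The MLMM updates are too long to reproduce here, so the sketch below runs the same idea (mean-field CAVI) on a much simpler conjugate model, a normal sample with unknown mean and precision; the model, priors and values are illustrative assumptions, not the MLMM algorithm.

```python
def cavi_normal(y, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0, iters=50):
    """Mean-field CAVI for y_i ~ N(mu, 1/tau), mu ~ N(mu0, 1/(kappa0*tau)),
    tau ~ Gamma(a0, b0), with q(mu, tau) = q(mu) q(tau)."""
    n = len(y)
    ybar = sum(y) / n
    # These two updates hit their fixed points immediately:
    m = (kappa0 * mu0 + n * ybar) / (kappa0 + n)   # mean of q(mu)
    a = a0 + (n + 1) / 2                           # shape of q(tau)
    b = b0
    for _ in range(iters):
        s2 = 1.0 / ((kappa0 + n) * (a / b))        # q(mu) variance uses E[tau] = a/b
        b = b0 + 0.5 * (kappa0 * ((m - mu0) ** 2 + s2)
                        + sum((yi - m) ** 2 for yi in y) + n * s2)
    return m, s2, a, b

y = [1.8, 2.1, 2.4, 1.9, 2.3]
m, s2, a, b = cavi_normal(y)
# posterior mean of mu shrinks the sample mean 2.1 toward mu0 = 0
assert abs(m - 10.5 / 6) < 1e-9
```

Each pass cycles through the factors exactly as the MLMM algorithm cycles through q(a_i), q(β_j), q(δ_i) and the variance factors.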

SLIDE 8

Hierarchical centering

Recall: y_i = X_i β_j + W_i a_i + V_i b_j + ε_i conditional on δ_i = j.

1. Partial centering (X_i = W_i): introduce η_i = β_j + a_i ~ N(β_j, σ²_{aj} I), so that y_i = X_i η_i + V_i b_j + ε_i.
2. Full centering (X_i = W_i = V_i): introduce ν_j = β_j + b_j ~ N(β_j, σ²_{bj} I) and ρ_i = ν_j + a_i ~ N(ν_j, σ²_{aj} I), so that y_i = X_i ρ_i + ε_i.

We derive lower bounds and algorithms for these two cases, and observe a gain in efficiency through centering similar to MCMC. Theoretical support: we prove that the rate of convergence of variational Bayes algorithms by Gaussian approximation is equal to that of the corresponding Gibbs sampler. The result is not directly applicable to MLMMs, but it suggests that hierarchical centering may lead to improved convergence in variational algorithms just as in MCMC.

SLIDE 9

Variational greedy algorithm (VGA)

Automatic: returns a plausible number of mixture components together with the fitted model.

Bottom-up approach (VA: variational algorithm):

1. Start by fitting a one-component mixture f_1.
2. Search for the optimal way to split components in the current mixture f_k: randomly partition each component into two and apply a partial VA to the resulting mixture, updating only the variational parameters of the two split components. The trial with the highest L out of M yields the optimal way.
3. Split the components in f_k in descending order of L, applying a partial VA each time while keeping fixed the variational parameters of components awaiting splitting. A split is "successful" if L* increases; stop once a split is unsuccessful.
4. Apply the VA to the resulting mixture, updating all variational parameters.
5. Repeat steps 2–4 until all splits of the current mixture are unsuccessful.
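The greedy control flow above can be sketched on a toy problem. The sketch swaps the variational lower bound L* for a simple penalized fit score and splits 1-D clusters at their median; it is a hypothetical illustration of the split/accept/stop loop, not the authors' algorithm.

```python
def sse(cluster):
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

def score(clusters, penalty=1.0):
    # Stand-in for L*: a fit term minus a per-component complexity penalty.
    return -sum(sse(c) for c in clusters) - penalty * len(clusters)

def greedy_fit(data, penalty=1.0):
    clusters = [sorted(data)]              # step 1: one-component "mixture"
    while True:
        best = None
        for i, c in enumerate(clusters):
            if len(c) < 2:
                continue
            mid = len(c) // 2              # step 2: trial split (here: at the median)
            trial = clusters[:i] + [c[:mid], c[mid:]] + clusters[i + 1:]
            s = score(trial, penalty)
            if best is None or s > best[0]:
                best = (s, trial)
        # step 3: accept the best split only if the score improves, else stop
        if best is None or best[0] <= score(clusters, penalty):
            return clusters
        clusters = best[1]

data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
print(len(greedy_fit(data)))  # prints 2: the two well-separated groups are recovered
```

The real VGA replaces the median split with M random partitions refined by partial VAs, and the penalty is implicit in L* itself rather than added by hand.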

SLIDE 10

Variational greedy algorithm (VGA)

Increasing efficiency:

- Partial variational algorithms: update only the variational parameters of certain components instead of the entire mixture.
- Component elimination property of variational Bayes: sieves out components that resist splitting.

Optional merge moves may be carried out after the VGA has converged. The greedy approach can be adapted to fit other mixture models.

SLIDE 11

Example: Time course data (Spellman et al., 1998)

α-factor synchronization: yeast cells sampled at 7 min intervals for 119 mins (18 time points) for 612 genes.

[Figure: time course data (Spellman et al., 1998). x-axis: time points; y-axis: gene expression levels.]

We applied the VGA (without hierarchical centering) ten times, obtaining three 15-component, five 17-component and two 18-component mixtures. After merge moves, three 17-component mixtures reduced to 16 components and both 18-component mixtures reduced to 17. The VGA can overestimate the number of mixture components, but the variation in the number of components returned is relatively small.

SLIDE 12

Example: Time course data

Clustering of a 16-component mixture, obtained after applying one merge move to a 17-component mixture produced by the VGA.

[Figure: the 16 fitted clusters, containing 37, 105, 41, 20, 8, 64, 65, 79, 25, 17, 15, 49, 13, 37, 31 and 6 genes respectively. x-axis: time points; y-axis: gene expression levels. The black line is the posterior mean of the fixed effects.]

SLIDE 13

Example: Synthetic data set (Yeung et al., 2003)

[Figure: 400 gene expression profiles (4 repeated measurements); true clusters 1–4 contain 67 genes each and clusters 5–6 contain 66 genes each. x-axis: experiments; y-axis: gene expression levels.]

Model: X_i = W_i. We applied the VGA (with partial centering and without centering) five times each. The adjusted Rand index measures the degree of agreement between the true and fitted clusters.

Centering                      No           Partial
Average adjusted Rand index    < 0.01       0.99
No. of components returned     2 comp × 5   6 comp × 3, 7 comp × 2

Hierarchical centering produced much better clustering results, and the number of mixture components returned by the VGA is very close to the true number of components.
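For reference, the adjusted Rand index used above is computed from the contingency table of the two partitions; a minimal stdlib implementation (not the authors' code):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI = (Index - Expected) / (Max - Expected); 1 means perfect agreement,
    values near 0 mean chance-level agreement between the two partitions."""
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))   # contingency table counts n_ij
    rows, cols = Counter(labels_a), Counter(labels_b)
    index = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

# Invariant to relabelling of clusters:
assert adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]) == 1.0
# Maximally crossed partitions of 4 items score below chance:
assert abs(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]) + 0.5) < 1e-12
```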

SLIDE 14

Example: Water temperature data

Daily average water temperatures (290 days, Upper Peirce Reservoir). Model: X_i = W_i = V_i. We applied the VGA (with full centering and without centering) five times each; a 4-component model was obtained each time.

                                  no centering   full centering
Average CPU time (seconds)        725            469
Average log marginal likelihood   −837           −789

4-component fitted model from the VGA (full centering):

[Figure: clustering results for clusters 1–4, containing 99, 113, 48 and 30 days respectively. x-axis: depth; y-axis: water temperature.]

SLIDE 15

Conclusion and Future Work

We have

- developed an automatic variational greedy algorithm for fitting MLMMs that performs parameter estimation and model selection simultaneously, and
- shown empirically that hierarchical centering can improve the rate of convergence of variational algorithms and produce better clustering results.

Future work:

- Extend the variational greedy approach to other mixture models.
- Replace inverse gamma priors with marginally noninformative priors (Huang and Wand, 2013) to reduce sensitivity.
- Extend the variational greedy algorithm to large data sets.

Tan, S. L. and Nott, D. J. (2013). Variational approximation for mixtures of linear mixed models. Journal of Computational and Graphical Statistics. Advance online publication. DOI: 10.1080/10618600.2012.761138.

Thank You!
