

SLIDE 1

Memoized Online Variational Inference for Dirichlet Process Mixture Models

Michael C. Hughes Erik B. Sudderth

Department of Computer Science, Brown University

26 June 2014

Advances in Neural Information Processing Systems (2013)

Presented by Kyle Ulrich


SLIDE 2

Review: Dirichlet process mixture models

A draw, G, from a DP consists of an infinite collection of atoms:

G ∼ DP(α_0 H),   G = ∑_{k=1}^{∞} w_k δ_{φ_k}   (1)

The mixture weights w_k are represented by the stick-breaking process and the data-generating parameters φ_k are drawn from the base measure H:

w_k = v_k ∏_{ℓ=1}^{k−1} (1 − v_ℓ),   v_k ∼ Beta(1, α_0),   φ_k ∼ H(λ_0)   (2)

Each data point n = 1, . . . , N has cluster assignment z_n and observation x_n distributed according to

z_n ∼ Cat(w),   x_n ∼ F(φ_{z_n})   (3)

Often, H and F are assumed to belong to the exponential family.
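
To make the generative process concrete, here is a minimal NumPy sketch that samples data from a truncated stick-breaking approximation of this model, assuming a 1-D Gaussian base measure and likelihood; the truncation level and all numeric settings are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha0, K_trunc, N = 1.0, 20, 500      # concentration, truncation level, number of points

# Stick-breaking weights: w_k = v_k * prod_{l<k} (1 - v_l), with v_k ~ Beta(1, alpha0)
v = rng.beta(1.0, alpha0, size=K_trunc)
w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))

# Component parameters phi_k ~ H (illustrative Gaussian base measure over means)
phi = rng.normal(0.0, 5.0, size=K_trunc)

# Assignments z_n ~ Cat(w) and observations x_n ~ F(phi_{z_n})
z = rng.choice(K_trunc, size=N, p=w / w.sum())   # renormalize the truncated weights
x = rng.normal(phi[z], 1.0)
```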


SLIDE 3

Overview of inference for DPM models

1 Variational inference is attractive for large-scale datasets

However, full-dataset variational inference scales poorly and often converges to poor local optima

2 Stochastic online (SO) variational inference, alternatively, scales to large datasets

On the downside, it is sensitive to the learning rate decay schedule and the choice of batch size

3 The proposed memoized online (MO) variational inference avoids these noisy gradients and learning rates

It requires multiple full passes through the data. Birth and merge moves naturally help MO escape local optima


SLIDE 4

Mean-field variational inference for DP mixture models

With mean-field inference, we seek a variational distribution

q(z, v, φ) = ∏_{n=1}^{N} q(z_n | r̂_n) ∏_{k=1}^{K} q(v_k | α̂_{k1}, α̂_{k0}) q(φ_k | λ̂_k),   (4)

with the following distributions on the individual factors:

q(z_n) = Cat(r̂_{n1}, . . . , r̂_{nK}),   q(v_k) = Beta(α̂_{k1}, α̂_{k0}),   q(φ_k) = H(λ̂_k).

The parameters of q are optimized so that the KL divergence from the true posterior is minimized; this is equivalent to maximizing the ELBO,

L(q) ≜ E_q[ log p(x, v, z, φ | α_0, λ_0) − log q(v, z, φ) ]   (5)

Maximizing this ELBO, we iteratively update r̂_n, α̂_{k1}, α̂_{k0}, and λ̂_k. These batch updates are standard and presented in the paper.
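
As a rough illustration of one coordinate-ascent sweep, the snippet below updates the responsibilities r̂_{nk}; the expected log stick-breaking weights use standard digamma identities. The 1-D unit-variance plug-in Gaussian likelihood and the variable names are illustrative assumptions, not the paper's exact updates.

```python
import numpy as np
from scipy.special import digamma

def update_responsibilities(x, alpha1, alpha0, mu_hat):
    """Local step: r_hat[n, k] ∝ exp(E[log w_k] + log-likelihood of x_n under component k).

    x              : (N,) observations, treated as 1-D Gaussians with unit variance (illustrative)
    alpha1, alpha0 : (K,) parameters of q(v_k) = Beta(alpha1_k, alpha0_k)
    mu_hat         : (K,) expected component means under q(phi_k)
    """
    # Expected log stick fractions under the Beta factors
    e_log_v   = digamma(alpha1) - digamma(alpha1 + alpha0)
    e_log_1mv = digamma(alpha0) - digamma(alpha1 + alpha0)
    # Stick-breaking: E[log w_k] = E[log v_k] + sum_{l<k} E[log(1 - v_l)]
    e_log_w = e_log_v + np.concatenate(([0.0], np.cumsum(e_log_1mv[:-1])))

    # Plug-in Gaussian log-likelihood using the expected mean
    # (ignores the posterior variance of phi_k for brevity)
    log_lik = -0.5 * (x[:, None] - mu_hat[None, :]) ** 2

    log_r = e_log_w[None, :] + log_lik
    log_r -= log_r.max(axis=1, keepdims=True)        # stabilize before exponentiating
    r_hat = np.exp(log_r)
    return r_hat / r_hat.sum(axis=1, keepdims=True)  # rows sum to 1
```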


SLIDE 5

Truncation strategy

There are many methods to set the truncation level K of the DP:

1 Place artificially large mass on the final component, i.e., q(v_K = 1) = 1

2 Set the stick-breaking ‘tail’ to the prior, i.e., q(v_k) = p(v_k | α_0) for k > K

3 Truncate the assignments to enforce q(z_n = k) = 0 for k > K

This work uses method 3 above, which has several advantages:

1 All data is explained by the first K components, so the data are conditionally independent of all parameters with index k > K

2 Therefore, inference only needs to consider a finite set of K atoms

3 This minimizes unnecessary computation while still approximating the infinite posterior

4 Truncation is nested: any q with truncation K can be represented exactly under truncation K + 1 with zero mass on the final component


SLIDE 6

Stochastic online (SO) variational inference

At each iteration t, SO processes only a subset of data, B_t, sampled uniformly at random from the large corpus of data.

SO first updates the local factors q(z_n) for n ∈ B_t. Then, with a noisy gradient step, SO updates the sufficient statistics of the global factors. For example, for λ̂_k, compute¹

λ̂_k^∗ = λ_0 + (N / |B_t|) ∑_{n∈B_t} r̂_{nk} t(x_n)

Then update the global parameter as

λ̂_k^{(t)} ← ρ_t λ̂_k^∗ + (1 − ρ_t) λ̂_k^{(t−1)}

ρ_t is the learning rate. Convergence is guaranteed for appropriate decay schedules of ρ_t.

Performance:

This has computational advantages and sometimes achieves better solutions than the full-dataset algorithm. However, it is sensitive to the learning rate decay and the choice of batch size.

¹ t(x_n) denotes the sufficient statistics of the observation distribution.
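
A minimal sketch of this stochastic step for one global parameter, assuming a helper suff_stats for t(x_n) and a power-law decay for ρ_t; the names and the decay exponent are illustrative, not the paper's settings.

```python
import numpy as np

def so_global_step(lam_prev, lam0, x_batch, r_hat, N, t, suff_stats, kappa=0.6):
    """One stochastic online update of lambda_hat for all K components.

    lam_prev   : (K, D) current lambda_hat^(t-1)
    lam0       : (D,)   prior lambda_0
    x_batch    : data in the sampled batch B_t
    r_hat      : (|B_t|, K) responsibilities for the batch
    suff_stats : callable returning t(x_n) for the batch, shape (|B_t|, D) -- assumed helper
    """
    rho_t = (t + 1.0) ** (-kappa)                  # learning rate; 0.5 < kappa <= 1 gives a valid decay
    scale = N / r_hat.shape[0]                     # amplify the batch as if it were the full dataset
    lam_star = lam0[None, :] + scale * (r_hat.T @ suff_stats(x_batch))   # noisy target lambda*_k
    return rho_t * lam_star + (1.0 - rho_t) * lam_prev                   # blend with previous estimate
```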

SLIDE 7

Memoized online variational inference

The data is divided into B fixed batches {B_b}_{b=1}^{B}

Maintain memoized sufficient statistics² S_k^b = [N̂_k(B_b), s_k(B_b)]

Track full-dataset statistics S_k^0 = [N̂_k, s_k(x)]

Visit each distinct batch once per full pass through the data:

1 Update local parameters for the current batch, i.e., r̂_n for n ∈ B_b

2 Update cached global sufficient statistics for each component:

S_k^0 ← S_k^0 − S_k^b,   S_k^b ← [N̂_k(B_b), s_k(B_b)],   S_k^0 ← S_k^0 + S_k^b   (6)

3 Update global parameters, i.e., α̂_{k1}, α̂_{k0} and λ̂_k

Advantages:

1 Unlike SO, MO is guaranteed to improve the ELBO at every step

2 MO updates reduce to standard full-dataset updates

3 More scalable and converges faster than the full-dataset algorithm

4 Same computational complexity as SO, without the need for learning rates

² For notation, the sufficient statistics are defined as N̂_k ≜ E_q[∑_{n=1}^{N} z_{nk}] and s_k(x) ≜ E_q[∑_{n=1}^{N} z_{nk} t(x_n)].
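
A minimal sketch of one memoized pass built around the subtract-recompute-add swap of Eq. (6), assuming helper callables for the standard local step, global step, and per-datum sufficient statistics t(x_n); the data layout and names are illustrative.

```python
import numpy as np

def memoized_pass(batches, stats_total, stats_batch, global_params,
                  local_step, global_step, suff_stats):
    """One full pass of memoized online VB (birth/merge moves omitted).

    stats_total : dict with 'N' (K,) and 's' (K, D): full-dataset statistics S^0_k
    stats_batch : list of B dicts caching the per-batch statistics S^b_k
                  (initialize to zeros before the very first pass)
    local_step, global_step, suff_stats : assumed helpers for the responsibility
                  update, the global parameter update, and t(x_n)
    """
    for b, x_b in enumerate(batches):
        # Step 1: update local parameters r_hat for the current batch only
        r_hat = local_step(x_b, global_params)                     # shape (|B_b|, K)

        # Step 2: swap this batch's cached contribution (Eq. 6)
        new_N, new_s = r_hat.sum(axis=0), r_hat.T @ suff_stats(x_b)
        stats_total['N'] += new_N - stats_batch[b]['N']
        stats_total['s'] += new_s - stats_batch[b]['s']
        stats_batch[b] = {'N': new_N, 's': new_s}

        # Step 3: global update uses exact full-dataset statistics, no learning rate
        global_params = global_step(stats_total)
    return global_params, stats_total, stats_batch
```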

SLIDE 8

Birth moves

To escape local optima, we may wish to propose birth moves. This is done in three steps (a control-flow sketch follows below):

Collection: During pass 1, subsample data assigned to the targeted component k′

Creation: Before pass 2, fit a DPM to the subsampled data, yielding K′ new components

Adoption: During pass 2, update parameters with all K + K′ components. Future merge moves will eliminate unnecessary components.
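
A heavily simplified control-flow sketch of the three phases, with every helper (subsample, fit_dpm, expand) passed in as an assumed callable rather than the paper's actual routines.

```python
def birth_move(batches, target_k, global_params, stats, subsample, fit_dpm, expand):
    """Sketch of a birth proposal around component k' = target_k (all helpers assumed).

    Collection: gather data currently explained by the targeted component.
    Creation  : fit a small fresh DPM to that subsample, giving K' new components.
    Adoption  : expand the model to K + K' components; later memoized passes and
                merge moves decide which components survive.
    """
    x_target = subsample(batches, target_k, global_params)   # pass 1: collection
    new_components = fit_dpm(x_target)                       # before pass 2: creation
    return expand(global_params, stats, new_components)      # pass 2: adoption
```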


SLIDE 9

Merge moves

To reduce computation costs, we may wish to propose merge moves. A merge move has three steps (see the sketch after this list):

1 Select components k_a and k_b to merge into k_m

2 Form the candidate configuration q′ by exploiting the additive property of the statistics:

r̂_{n k_m} = r̂_{n k_a} + r̂_{n k_b},   S_{k_m}^0 = S_{k_a}^0 + S_{k_b}^0   (7)

3 Accept q′ only if the ELBO improves
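
A small sketch of constructing and testing a merge candidate, assuming k_a < k_b, dictionary-valued statistics as in the earlier memoized-pass sketch, and an assumed elbo helper that scores a configuration.

```python
import numpy as np

def propose_merge(ka, kb, r_hat, stats, elbo):
    """Build the merge candidate via the additive property (Eq. 7); assumes ka < kb.

    r_hat : (N, K) responsibilities; stats: dict with 'N' (K,) and 's' (K, D).
    elbo  : assumed helper scoring a configuration from responsibilities and statistics.
    """
    # Merged responsibilities: column ka absorbs column kb, which is removed
    r_new = np.delete(r_hat, kb, axis=1)
    r_new[:, ka] = r_hat[:, ka] + r_hat[:, kb]

    # Merged cached statistics: S^0_{km} = S^0_{ka} + S^0_{kb}
    stats_new = {key: np.delete(val, kb, axis=0) for key, val in stats.items()}
    for key in stats_new:
        stats_new[key][ka] = stats[key][ka] + stats[key][kb]

    # Accept the candidate only if the variational bound improves
    if elbo(r_new, stats_new) > elbo(r_hat, stats):
        return r_new, stats_new
    return r_hat, stats
```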

For each pass over the data, the authors' proposed algorithm performs:

One birth move

Memoized ascent steps for all batches

Several merges after the final batch


SLIDE 10

Results – toy data

Data: N = 100,000 synthetic image patches generated by a zero-mean GMM with 8 equally common components.

Each component has a 25 × 25 covariance matrix producing 5 × 5 patches. We wish to recover these covariance matrices and the number of components.


SLIDE 11

Results – MNIST digit clustering

Clustering N = 60,000 MNIST images of handwritten digits 0-9. As preprocessing, all images are projected to D = 50 dimensions via PCA.


SLIDE 12

Questions?
