  1. Variational Inference for Bayes vMF Mixture. Hanxiao Liu, September 23, 2014.

  2. Variational Inference Review
     Lower bound the likelihood:
     $\mathcal{L}(\theta; X) = \mathbb{E}_q \log p(X \mid \theta)
       = \underbrace{\mathbb{E}_q \log \frac{p(X, Z \mid \theta)}{q(Z)}}_{\mathrm{VLB}(q,\theta)}
       + \underbrace{\mathbb{E}_q \log \frac{q(Z)}{p(Z \mid X, \theta)}}_{D_{\mathrm{KL}}(q(Z) \,\|\, p(Z \mid X, \theta))}$
     Raise $\mathrm{VLB}(q, \theta)$ by coordinate ascent:
     1. $q^{t+1} = \operatorname{argmax}_{q = \prod_{i=1}^{M} q_i} \mathrm{VLB}(q, \theta^t)$
     2. $\theta^{t+1} = \operatorname{argmax}_{\theta} \mathrm{VLB}(q^{t+1}, \theta)$
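
     A minimal Python sketch of this coordinate-ascent scheme, assuming the caller supplies the per-factor updates (derived on the later slides) and the hyperparameter update as callables. All names here are illustrative, not from the paper.

```python
def coordinate_ascent_vi(X, q_params, theta, factor_updates, update_theta, n_iters=50):
    """Alternate between raising VLB(q, theta) in q (one factor at a time)
    and in theta (empirical Bayes), as in steps 1 and 2 above."""
    for _ in range(n_iters):
        # Step 1: q^{t+1} = argmax_q VLB(q, theta^t), updating one factor at a time.
        for update in factor_updates:
            q_params = update(X, q_params, theta)
        # Step 2: theta^{t+1} = argmax_theta VLB(q^{t+1}, theta).
        theta = update_theta(X, q_params)
    return q_params, theta
```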

  3. Variational Inference Review
     Goal: solve $\operatorname{argmax}_{q = \prod_{i=1}^{M} q_i} \mathrm{VLB}(q, \theta^t)$ by coordinate ascent, i.e. sequentially updating a single $q_i$ in each iteration.
     Each coordinate step has a closed-form solution:
     $\mathrm{VLB}(q_j; q_{-j}, \theta^t)
       = \mathbb{E}_q \log \frac{p(X, Z \mid \theta^t)}{q(Z)}
       = \mathbb{E}_q \log p(X, Z \mid \theta^t) - \sum_{i=1}^{M} \mathbb{E}_q \log q_i$
     $\quad = \mathbb{E}_{q_j} \underbrace{\mathbb{E}_{q_{-j}} \log p(X, Z \mid \theta^t)}_{\log \tilde{q}_j + \text{const}} - \mathbb{E}_{q_j} \log q_j + \text{const}
       = \mathbb{E}_{q_j} \log \frac{\tilde{q}_j}{q_j} + \text{const}
       = -D_{\mathrm{KL}}(q_j \,\|\, \tilde{q}_j) + \text{const}$
     $\Rightarrow \log q_j^{*} = \mathbb{E}_{q_{-j}} \log p(X, Z \mid \theta^t) + \text{const}$

  4. Bayes vMF Mixture [Gopal and Yang, 2014]
     Generative model and the corresponding variational factors:
     ◮ $\pi \sim \mathrm{Dirichlet}(\cdot \mid \alpha)$, with $q(\pi) \equiv \mathrm{Dirichlet}(\cdot \mid \rho)$
     ◮ $\mu_k \sim \mathrm{vMF}(\cdot \mid \mu_0, C_0)$, with $q(\mu_k) \equiv \mathrm{vMF}(\cdot \mid \psi_k, \gamma_k)$
     ◮ $\kappa_k \sim \mathrm{logNormal}(\cdot \mid m, \sigma^2)$, with $q(\kappa_k) \equiv \mathrm{logNormal}(\cdot \mid a_k, b_k)$
     ◮ $z_i \sim \mathrm{Multi}(\cdot \mid \pi)$, with $q(z_i) \equiv \mathrm{Multi}(\cdot \mid \lambda_i)$
     ◮ $x_i \sim \mathrm{vMF}(\cdot \mid \mu_{z_i}, \kappa_{z_i})$
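
     As a concrete starting point, here is a minimal Python sketch of how the variational parameters of these four factors might be initialized, for unit-normalized data X of shape (N, D) with K components. The initialization choices (responsibilities from a flat Dirichlet, mean directions from random data points, and so on) are illustrative assumptions, not something prescribed by the slides.

```python
import numpy as np

def init_variational_params(X, K, rng=None):
    """Initialize rho, lambda, psi, gamma, a, b for the four variational factors."""
    rng = np.random.default_rng(rng)
    N, D = X.shape
    lam = rng.dirichlet(np.ones(K), size=N)              # q(z_i)     = Multi(.| lambda_i)
    rho = np.full(K, 1.0 + N / K)                        # q(pi)      = Dirichlet(.| rho)
    psi = X[rng.choice(N, K, replace=False)].copy()      # q(mu_k)    = vMF(.| psi_k, gamma_k)
    psi /= np.linalg.norm(psi, axis=1, keepdims=True)    # mean directions must be unit norm
    gam = np.ones(K)
    a, b = np.full(K, np.log(10.0)), np.ones(K)          # q(kappa_k) = logNormal(.| a_k, b_k)
    return dict(lam=lam, rho=rho, psi=psi, gam=gam, a=a, b=b)
```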

  5. Compute $\log p(X, Z \mid \theta)$
     $p(X, Z \mid \theta) = \mathrm{Dirichlet}(\pi \mid \alpha)
       \times \prod_{i=1}^{N} \mathrm{Multi}(z_i \mid \pi)\, \mathrm{vMF}(x_i \mid \mu_{z_i}, \kappa_{z_i})
       \times \prod_{k=1}^{K} \mathrm{vMF}(\mu_k \mid \mu_0, C_0)\, \mathrm{logNormal}(\kappa_k \mid m, \sigma^2)$
     $\log p(X, Z \mid \theta) = -\log B(\alpha) + \sum_{k=1}^{K} (\alpha - 1) \log \pi_k
       + \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \log \pi_k
       + \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left[ \log C_D(\kappa_k) + \kappa_k x_i^{\top} \mu_k \right]$
     $\quad + \sum_{k=1}^{K} \left[ \log C_D(C_0) + C_0 \mu_k^{\top} \mu_0 \right]
       + \sum_{k=1}^{K} \left[ -\frac{(\log \kappa_k - m)^2}{2\sigma^2} - \log \kappa_k - \frac{1}{2} \log\!\left( 2\pi\sigma^2 \right) \right]$
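
     A brief NumPy/SciPy sketch of the two ingredients above: the vMF log-normalizer $\log C_D(\kappa) = (D/2 - 1)\log\kappa - (D/2)\log 2\pi - \log I_{D/2-1}(\kappa)$, and the complete-data log-likelihood for one-hot assignments Z. The function names are my own; the symmetric scalar $\alpha$ (so that $-\log B(\alpha) = \log\Gamma(K\alpha) - K\log\Gamma(\alpha)$) is an assumption suggested by the slide's $(\alpha - 1)$ term.

```python
import numpy as np
from scipy.special import ive, gammaln

def log_C_D(kappa, D):
    """vMF log-normalizer; ive(v, x) = I_v(x) * exp(-x) keeps the Bessel term stable."""
    v = D / 2.0 - 1.0
    return v * np.log(kappa) - (D / 2.0) * np.log(2 * np.pi) - (np.log(ive(v, kappa)) + kappa)

def complete_data_loglik(X, Z, pi, mu, kappa, alpha, mu0, C0, m, sigma2):
    """log p(X, Z | theta) as on this slide, with one-hot Z of shape (N, K)."""
    N, D = X.shape
    K = len(pi)
    lp = gammaln(K * alpha) - K * gammaln(alpha) + np.sum((alpha - 1) * np.log(pi))  # -log B(alpha) + prior on pi
    lp += np.sum(Z * np.log(pi))                                                     # sum_ik z_ik log pi_k
    lp += np.sum(Z * (log_C_D(kappa, D) + kappa * (X @ mu.T)))                       # vMF likelihood terms
    lp += np.sum(log_C_D(C0, D) + C0 * (mu @ mu0))                                   # vMF prior on mu_k
    lp += np.sum(-(np.log(kappa) - m) ** 2 / (2 * sigma2)
                 - np.log(kappa) - 0.5 * np.log(2 * np.pi * sigma2))                 # logNormal prior on kappa_k
    return lp
```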

  6. Updating $q(\pi) \equiv \mathrm{Dirichlet}(\cdot \mid \rho)$
     $\log q^{*}(\pi) = \mathbb{E}_{q \setminus \pi} \log p(X, Z \mid \theta) + \text{const}
       = \mathbb{E}_{q \setminus \pi} \left[ \sum_{k=1}^{K} (\alpha - 1) \log \pi_k + \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \log \pi_k \right] + \text{const}
       = \sum_{k=1}^{K} \left( \alpha + \sum_{i=1}^{N} \mathbb{E}_q[z_{ik}] - 1 \right) \log \pi_k + \text{const}$
     $\Rightarrow q^{*}(\pi) \propto \prod_{k=1}^{K} \pi_k^{\alpha + \sum_{i=1}^{N} \mathbb{E}_q[z_{ik}] - 1} \sim \mathrm{Dirichlet}$
     $\Rightarrow \rho_k^{*} = \alpha + \sum_{i=1}^{N} \mathbb{E}_q[z_{ik}]$
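
     As a sketch, assuming a NumPy array lam of shape (N, K) stores the responsibilities $\mathbb{E}_q[z_{ik}]$, this update is one line:

```python
def update_q_pi(lam, alpha):
    """rho_k = alpha + sum_i E_q[z_ik]; returns the Dirichlet parameter rho of shape (K,)."""
    return alpha + lam.sum(axis=0)
```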

  7. Updating $q(z_i) \equiv \mathrm{Multi}(\cdot \mid \lambda_i)$
     $\log q^{*}(z_i) = \mathbb{E}_{q \setminus z_i} \log p(X, Z \mid \theta) + \text{const}
       = \mathbb{E}_{q \setminus z_i} \left[ \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \log \pi_k + \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left( \log C_D(\kappa_k) + \kappa_k x_i^{\top} \mu_k \right) \right] + \text{const}$
     $\quad = \sum_{k=1}^{K} z_{ik} \left[ \mathbb{E}_q \log \pi_k + \mathbb{E}_q \log C_D(\kappa_k) + \mathbb{E}_q[\kappa_k]\, x_i^{\top} \mathbb{E}_q[\mu_k] \right] + \text{const}$
     $\Rightarrow q^{*}(z_i) \sim \mathrm{Multi}$, with $\lambda_{ik}^{*} \propto e^{\mathbb{E}_q \log \pi_k + \mathbb{E}_q \log C_D(\kappa_k) + \mathbb{E}_q[\kappa_k]\, x_i^{\top} \mathbb{E}_q[\mu_k]}$
     Assume $\mathbb{E}_q \log \pi_k$, $\mathbb{E}_q \log C_D(\kappa_k)$, $\mathbb{E}_q[\kappa_k]$ and $\mathbb{E}_q[\mu_k]$ are already known. We will explicitly compute them later.
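
     A short vectorized sketch of this update. Because the slide only gives $\lambda_{ik}^{*}$ up to proportionality, the rows are normalized with a log-sum-exp for numerical stability; the argument names are my own shorthands for the expectations listed above.

```python
import numpy as np
from scipy.special import logsumexp

def update_q_z(X, E_log_pi, E_log_CD, E_kappa, E_mu):
    """lam_ik ∝ exp(E_q log pi_k + E_q log C_D(kappa_k) + E_q[kappa_k] x_i^T E_q[mu_k]).
    X: (N, D); E_log_pi, E_log_CD, E_kappa: (K,); E_mu: (K, D). Returns lam: (N, K)."""
    logits = E_log_pi + E_log_CD + E_kappa * (X @ E_mu.T)    # (N, K), broadcast over rows
    logits -= logsumexp(logits, axis=1, keepdims=True)       # normalize each row to sum to 1
    return np.exp(logits)
```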

  8. Updating $q(\mu_k) \equiv \mathrm{vMF}(\cdot \mid \psi_k, \gamma_k)$
     $\log q^{*}(\mu_k) = \mathbb{E}_{q \setminus \mu_k} \log p(X, Z \mid \theta) + \text{const}
       = \mathbb{E}_{q \setminus \mu_k} \left[ \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \kappa_j x_i^{\top} \mu_j + \sum_{j=1}^{K} C_0 \mu_j^{\top} \mu_0 \right] + \text{const}
       = \mathbb{E}_q[\kappa_k] \sum_{i=1}^{N} \mathbb{E}_q[z_{ik}]\, x_i^{\top} \mu_k + C_0 \mu_k^{\top} \mu_0 + \text{const}$
     $\Rightarrow q^{*}(\mu_k) \propto e^{\left( \mathbb{E}_q[\kappa_k] \sum_{i=1}^{N} \mathbb{E}_q[z_{ik}]\, x_i + C_0 \mu_0 \right)^{\top} \mu_k} \sim \mathrm{vMF}$
     $\gamma_k^{*} = \left\| \mathbb{E}_q[\kappa_k] \sum_{i=1}^{N} \mathbb{E}_q[z_{ik}]\, x_i + C_0 \mu_0 \right\|, \qquad
      \psi_k^{*} = \frac{\mathbb{E}_q[\kappa_k] \sum_{i=1}^{N} \mathbb{E}_q[z_{ik}]\, x_i + C_0 \mu_0}{\gamma_k^{*}}$
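
     A vectorized sketch of this update for all K components at once; lam holds $\mathbb{E}_q[z_{ik}]$ and E_kappa holds $\mathbb{E}_q[\kappa_k]$, with variable names again my own.

```python
import numpy as np

def update_q_mu(X, lam, E_kappa, mu0, C0):
    """psi_k and gamma_k from s_k = E[kappa_k] * sum_i E[z_ik] x_i + C0 * mu0."""
    s = E_kappa[:, None] * (lam.T @ X) + C0 * mu0      # (K, D) natural parameters
    gam = np.linalg.norm(s, axis=1)                    # gamma_k = ||s_k||
    psi = s / gam[:, None]                             # psi_k  = s_k / gamma_k (unit norm)
    return psi, gam
```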

  9. Updating $q(\kappa_k) \equiv \mathrm{logNormal}(\cdot \mid a_k, b_k)$?
     $\log q^{*}(\kappa_k) = \mathbb{E}_{q \setminus \kappa_k} \log p(X, Z \mid \theta) + \text{const}
       = \mathbb{E}_{q \setminus \kappa_k} \left[ \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \left( \log C_D(\kappa_j) + \kappa_j x_i^{\top} \mu_j \right) + \sum_{j=1}^{K} \left( -\frac{(\log \kappa_j - m)^2}{2\sigma^2} - \log \kappa_j \right) \right] + \text{const}$
     $\quad = \sum_{i=1}^{N} \mathbb{E}_q[z_{ik}] \left( \log C_D(\kappa_k) + \kappa_k x_i^{\top} \mathbb{E}_q[\mu_k] \right) - \frac{(\log \kappa_k - m)^2}{2\sigma^2} - \log \kappa_k + \text{const}$
     $\Rightarrow q^{*}(\kappa_k) \not\sim \mathrm{logNormal}$, due to the presence of $\log C_D(\kappa_k)$.

  10. Intermediate Quantities
     Some intermediate quantities are in closed form:
     ◮ $q(z_i) \equiv \mathrm{Multi}(z_i \mid \lambda_i) \Rightarrow \mathbb{E}_q[z_{ij}] = \lambda_{ij}$
     ◮ $q(\pi) \equiv \mathrm{Dirichlet}(\pi \mid \rho) \Rightarrow \mathbb{E}_q \log \pi_k = \Psi(\rho_k) - \Psi\!\left( \sum_j \rho_j \right)$
     ◮ $q(\mu_k) \equiv \mathrm{vMF}(\mu_k \mid \psi_k, \gamma_k) \Rightarrow \mathbb{E}_q[\mu_k] = \frac{I_{D/2}(\gamma_k)}{I_{D/2-1}(\gamma_k)} \psi_k$ [Rothenbuehler, 2005]¹
     Some are not: $\mathbb{E}_q[\kappa_k]$ and $\mathbb{E}_q \log C_D(\kappa_k)$.
     1. In the absence of a good parametric form of $q(\kappa_k)$: apply sampling.
     2. Even if $\kappa_k \sim \mathrm{logNormal}$ is assumed, $\mathbb{E}_q \log C_D(\kappa_k)$ is still hard to deal with: bound $\log C_D(\cdot)$ by some simple functions.
     ¹ Can be derived from the characteristic function of the vMF.
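
     A sketch of the closed-form expectations listed above, using the digamma function for $\mathbb{E}_q \log \pi_k$ and exponentially scaled Bessel functions for the ratio $I_{D/2}(\gamma_k) / I_{D/2-1}(\gamma_k)$ (the exponential scaling factors cancel). Function and variable names are mine.

```python
import numpy as np
from scipy.special import digamma, ive

def closed_form_expectations(rho, psi, gam, D):
    """E_q[log pi_k] and E_q[mu_k] from the current variational parameters."""
    E_log_pi = digamma(rho) - digamma(rho.sum())                 # Psi(rho_k) - Psi(sum_j rho_j)
    bessel_ratio = ive(D / 2.0, gam) / ive(D / 2.0 - 1.0, gam)   # I_{D/2}(gam)/I_{D/2-1}(gam)
    E_mu = bessel_ratio[:, None] * psi                           # (K, D)
    return E_log_pi, E_mu
```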

  11. Sampling
     In principle we could sample $\kappa_k$ from $p(\kappa_k \mid X, \theta)$. Unfortunately, that sampling procedure requires samples of $z_i$, $\mu_k$, $\pi$, ..., which are not maintained by variational inference.
     Recall that the optimal posterior for $\kappa_k$ satisfies²
     $\log q^{*}(\kappa_k) = \sum_{i=1}^{N} \mathbb{E}[z_{ik}] \left( \log C_D(\kappa_k) + \kappa_k x_i^{\top} \mathbb{E}_q[\mu_k] \right) - \frac{(\log \kappa_k - m)^2}{2\sigma^2} - \log \kappa_k + \text{const}$
     $\Rightarrow q^{*}(\kappa_k) \propto \exp\!\left\{ \sum_{i=1}^{N} \mathbb{E}[z_{ik}] \left( \log C_D(\kappa_k) + \kappa_k x_i^{\top} \mathbb{E}_q[\mu_k] \right) \right\} \times \mathrm{logNormal}(\kappa_k \mid m, \sigma^2)$
     We can sample from $q^{*}(\kappa_k)$!
     ² See the derivation on the slide "Updating $q(\kappa_k)$".
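
     Since $q^{*}(\kappa_k)$ is the logNormal prior reweighted by the exponentiated likelihood term, one simple way to approximate $\mathbb{E}_q[\kappa_k]$ and $\mathbb{E}_q \log C_D(\kappa_k)$ is self-normalized importance sampling with the prior as proposal. The sketch below uses the hypothetical shorthands $N_k = \sum_i \mathbb{E}[z_{ik}]$ and $r_k = \sum_i \mathbb{E}[z_{ik}]\, x_i^{\top} \mathbb{E}_q[\mu_k]$, and reuses log_C_D from the earlier sketch; this is an illustrative option, not necessarily the sampler used in Gopal and Yang (2014).

```python
import numpy as np
from scipy.special import logsumexp

def sample_q_kappa(Nk, rk, D, m, sigma2, n_samples=2000, rng=None):
    """Monte Carlo estimates of E_q[kappa_k] and E_q[log C_D(kappa_k)] under
    q*(kappa_k) ∝ exp{Nk * log C_D(kappa_k) + kappa_k * rk} * logNormal(kappa_k | m, sigma2)."""
    rng = np.random.default_rng(rng)
    kappa = rng.lognormal(mean=m, sigma=np.sqrt(sigma2), size=n_samples)  # proposals from the prior
    log_w = Nk * log_C_D(kappa, D) + kappa * rk                           # log importance weights
    w = np.exp(log_w - logsumexp(log_w))                                  # self-normalize stably
    E_kappa = np.sum(w * kappa)
    E_log_CD = np.sum(w * log_C_D(kappa, D))
    return E_kappa, E_log_CD
```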

  12. Bounding Outline
     ◮ Assume $q(\kappa_k) \equiv \mathrm{logNormal}(\cdot \mid a_k, b_k)$.
     ◮ Lower bound $\mathbb{E}_q \log C_D(\kappa_k)$ in the VLB by some simple terms.
     ◮ To optimize $q(\kappa_k)$, use gradient ascent w.r.t. $a_k$ and $b_k$ to raise the VLB.
     Empirically, sampling outperforms bounding.

  13. Empirical Bayes for Hyperparameters
     Raise $\mathrm{VLB}(q, \theta)$ by coordinate ascent:
     1. $q^{t+1} = \operatorname{argmax}_{q = \prod_{i=1}^{M} q_i} \mathrm{VLB}(q, \theta^t)$
     2. $\theta^{t+1} = \operatorname{argmax}_{\theta} \mathrm{VLB}(q^{t+1}, \theta) = \operatorname{argmax}_{\theta} \mathbb{E}_{q^{t+1}} \log p(X, Z \mid \theta)$
     For example, one can use gradient ascent to optimize $\alpha$:
     $\max_{\alpha > 0} \; -\log B(\alpha) + (\alpha - 1) \sum_{k=1}^{K} \mathbb{E}_{q^{t+1}}[\log \pi_k]$
     $m$, $\sigma^2$, $\mu_0$ and $C_0$ can be optimized in a similar manner.³
     ³ Unlike $\alpha$, their solutions can be written in closed form.
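
     A sketch of the gradient-ascent update for $\alpha$, assuming a symmetric Dirichlet prior so that $-\log B(\alpha) = \log\Gamma(K\alpha) - K\log\Gamma(\alpha)$ and the gradient is $K\Psi(K\alpha) - K\Psi(\alpha) + \sum_k \mathbb{E}_{q^{t+1}}[\log \pi_k]$. The step size, iteration count, and projection to keep $\alpha > 0$ are illustrative choices.

```python
from scipy.special import digamma

def update_alpha(E_log_pi, alpha0=1.0, lr=0.01, n_steps=200):
    """Gradient ascent on -log B(alpha) + (alpha - 1) * sum_k E_q[log pi_k].
    E_log_pi is a NumPy array of shape (K,) holding E_{q^{t+1}}[log pi_k]."""
    K = len(E_log_pi)
    alpha = alpha0
    for _ in range(n_steps):
        grad = K * digamma(K * alpha) - K * digamma(alpha) + E_log_pi.sum()
        alpha = max(alpha + lr * grad, 1e-6)   # projected step keeps alpha > 0
    return alpha
```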

  14. References
     Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. (2005). Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, pages 1345-1382.
     Gopal, S. and Yang, Y. (2014). Von Mises-Fisher clustering models. In Proceedings of the 31st International Conference on Machine Learning, pages 154-162.
     Rothenbuehler, J. (2005). Dependence Structures beyond Copulas: A New Model of a Multivariate Regular Varying Distribution Based on a Finite von Mises-Fisher Mixture Model. PhD thesis, Cornell University.
