 
Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations
Wu Lin (UBC)
June 11, 2019
Joint work with Mohammad Emtiyaz Khan (AIP, RIKEN) and Mark Schmidt (UBC)
1 / 7
Variational Inference (VI)
VI approximates the posterior $p(z \mid \mathcal{D}) \approx q(z \mid \lambda_z)$ by maximizing the evidence lower bound (ELBO):
$\max_{\lambda_z} \mathcal{L}(\lambda_z) := \mathbb{E}_{q}\left[ \log p(\mathcal{D}, z) - \log q(z \mid \lambda_z) \right]$
where $p(\mathcal{D}, z)$ is the probabilistic model, $\mathcal{D}$ is the data, and $q(z)$ is a tractable distribution parametrized by $\lambda_z$.
2 / 7
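The ELBO above can be estimated by sampling from q. The following is a minimal sketch, not taken from the slides: it assumes a toy conjugate model (prior z ~ N(0, 1), likelihood x_i ~ N(z, 1)) and a Gaussian q, and the names `log_joint` and `elbo_estimate` are hypothetical. Because the model is conjugate, the exact posterior is known, so the sketch can check that the ELBO is largest there.

```python
import numpy as np

def log_joint(z, data):
    # Assumed toy model: z ~ N(0, 1), x_i ~ N(z, 1). Returns log p(D, z) per sample.
    log_prior = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    diffs = data[None, :] - z[:, None]
    log_lik = np.sum(-0.5 * diffs**2 - 0.5 * np.log(2 * np.pi), axis=1)
    return log_prior + log_lik

def elbo_estimate(mean, var, data, num_samples=20000, seed=0):
    # Monte Carlo estimate of E_q[log p(D, z) - log q(z)] for q = N(mean, var).
    rng = np.random.default_rng(seed)
    z = mean + np.sqrt(var) * rng.standard_normal(num_samples)
    log_q = -0.5 * (z - mean)**2 / var - 0.5 * np.log(2 * np.pi * var)
    return np.mean(log_joint(z, data) - log_q)

data = np.array([0.5, 1.0, 1.5])
n = len(data)

# Exact posterior of the assumed conjugate model.
post_mean, post_var = data.sum() / (n + 1), 1.0 / (n + 1)

elbo_at_posterior = elbo_estimate(post_mean, post_var, data)
elbo_elsewhere = elbo_estimate(0.0, 1.0, data)  # q set to the prior
```

At the exact posterior the integrand is constant in z and equals log p(D), so the estimate has no Monte Carlo noise there. Any other q gives a strictly smaller ELBO.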
ELBO Optimization
Black-box VI (BBVI): $\lambda_z \leftarrow \lambda_z + \beta \nabla_{\lambda_z} \mathcal{L}(\lambda_z)$
Natural-gradient VI (NGVI): $\lambda_z \leftarrow \lambda_z + \beta F_z(\lambda_z)^{-1} \nabla_{\lambda_z} \mathcal{L}(\lambda_z)$
where $F_z(\lambda_z)$ is the Fisher information matrix of $q(z \mid \lambda_z)$ and $F_z(\lambda_z)^{-1} \nabla_{\lambda_z} \mathcal{L}(\lambda_z)$ is the natural gradient.
Advantages of NGVI:
◮ NGVI can be simple and fast when q is in the exponential family (e.g., Gaussian) (Khan and Lin, AISTATS 2017).
NGVI for Exp-Family: $\lambda_z \leftarrow \lambda_z + \beta \nabla_{m_z} \mathcal{L}(\lambda_z)$
because $\nabla_{m_z} \mathcal{L}(\lambda_z) = F_z(\lambda_z)^{-1} \nabla_{\lambda_z} \mathcal{L}(\lambda_z)$, where $\lambda_z$ is the natural parameter and $m_z$ is the expectation parameter of q.
[Figure: test log2 loss vs. epoch on the Australian and Breast Cancer datasets, comparing Gradient VI and Natural-Gradient VI.]
3 / 7
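The exponential-family update above can be sketched for a 1-D Gaussian q. This is an illustration under assumptions, not the authors' code: it uses a toy conjugate model (z ~ N(0, 1), x_i ~ N(z, 1)) whose ELBO gradients are available in closed form, and the names `elbo_grads` and `ngvi_step` are hypothetical. With natural parameter λ = (μ/σ², −1/(2σ²)) and expectation parameter m = (μ, μ² + σ²), the chain rule gives ∇_{m1} L = ∇_μ L − 2μ ∇_{σ²} L and ∇_{m2} L = ∇_{σ²} L.

```python
import numpy as np

data = np.array([0.5, 1.0, 1.5])
n, s = len(data), data.sum()

def elbo_grads(mu, var):
    # Closed-form ELBO gradients for the assumed toy model, where
    # L(mu, var) = s*mu - 0.5*(n+1)*(mu^2 + var) + 0.5*log(var) + const.
    g_mu = s - (n + 1) * mu
    g_var = -0.5 * (n + 1) + 0.5 / var
    return g_mu, g_var

def ngvi_step(lam, beta):
    # lam holds the natural parameters (mu/var, -1/(2*var)) of q.
    var = -0.5 / lam[1]
    mu = lam[0] * var
    g_mu, g_var = elbo_grads(mu, var)
    # Gradient w.r.t. the expectation parameters (E[z], E[z^2]).
    grad_m = np.array([g_mu - 2.0 * mu * g_var, g_var])
    return lam + beta * grad_m

lam0 = np.array([0.0, -0.5])        # q initialised to N(0, 1)
lam1 = ngvi_step(lam0, beta=1.0)
ngvi_var = -0.5 / lam1[1]
ngvi_mean = lam1[0] * ngvi_var
```

In this conjugate example a single step with β = 1 lands on the exact posterior N(0.75, 0.25), and no Fisher matrix is formed or inverted.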
Problem Formulation
Challenges of NGVI when $q(z)$ is not in the exponential family:
◮ Computing $F_z(\lambda_z)^{-1} \nabla_{\lambda_z} \mathcal{L}(\lambda_z)$ could be complicated.
◮ $F_z(\lambda_z)$ can be singular.
◮ Often no simple update beyond exponential family.
Our goal: perform a simple NGVI update for more flexible variational approximations (e.g., skewness, multi-modality).
[Figure: (a) Skew Gaussian; (b) Finite Mixture of Gaussians. Recoverable axis labels: logit u, log P.]
4 / 7
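The singularity issue can be seen in a small numerical sketch. The setup is an assumption chosen for illustration: a two-component mixture π N(μ1, 1) + (1 − π) N(μ2, 1) parametrized by (π, μ1, μ2), with the Fisher matrix of the marginal q(z) computed by a Riemann sum on a grid. The function name `mixture_fisher` is hypothetical.

```python
import numpy as np

def normal_pdf(z, mean):
    return np.exp(-0.5 * (z - mean)**2) / np.sqrt(2 * np.pi)

def mixture_fisher(pi, mu1, mu2):
    # Fisher matrix of the marginal q(z) w.r.t. (pi, mu1, mu2),
    # i.e. E_q[score score^T], approximated on a fine grid.
    z = np.linspace(-12.0, 12.0, 24001)
    dz = z[1] - z[0]
    c1, c2 = normal_pdf(z, mu1), normal_pdf(z, mu2)
    q = pi * c1 + (1 - pi) * c2
    scores = np.stack([
        (c1 - c2) / q,                   # d log q / d pi
        pi * (z - mu1) * c1 / q,         # d log q / d mu1
        (1 - pi) * (z - mu2) * c2 / q,   # d log q / d mu2
    ])
    return (scores * q) @ scores.T * dz

# Components coincide: the score for pi vanishes and the two mean scores
# are proportional, so the Fisher matrix loses rank.
min_eig_degenerate = np.linalg.eigvalsh(mixture_fisher(0.3, 0.0, 0.0)).min()

# Well-separated components: the Fisher matrix is well conditioned.
min_eig_separated = np.linalg.eigvalsh(mixture_fisher(0.3, -2.0, 2.0)).min()
```

When the two components coincide, the smallest eigenvalue is zero up to rounding, so the natural gradient $F_z^{-1} \nabla \mathcal{L}$ is undefined at that point.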
This Work
Main Contribution: propose a new NGVI update for a class of mixture of exponential family distributions.
We consider the following mixture:
$q(z \mid \lambda) = \int q(z \mid w, \lambda_z) \, q(w \mid \lambda_w) \, dw$
where both $q(z \mid w, \lambda_z)$ and $q(w \mid \lambda_w)$ are in the exponential family.
We propose to use the (joint) Fisher matrix $F_{wz}$ of $q(w, z \mid \lambda)$ since:
$\nabla_m \mathcal{L}(\lambda) = F_{wz}(\lambda)^{-1} \nabla_\lambda \mathcal{L}(\lambda)$
where m is the proposed expectation parameter.
◮ Proposed NGVI update: $\lambda \leftarrow \lambda + \beta \nabla_m \mathcal{L}(\lambda)$
5 / 7
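To illustrate why the joint Fisher matrix is a convenient object, here is a sketch for a two-component Gaussian mixture with a Bernoulli indicator w. It is an assumption-laden illustration rather than the paper's construction: it works in the (π, μ1, μ2) parametrization instead of natural parameters, fixes unit variances, and uses the hypothetical name `joint_fisher`. The joint is q(w, z) = q(w) q(z | w) with q(w = 1) = π and q(z | w) = N(μ1, 1) if w = 1, else N(μ2, 1).

```python
import numpy as np

def normal_pdf(z, mean):
    return np.exp(-0.5 * (z - mean)**2) / np.sqrt(2 * np.pi)

def joint_fisher(pi, mu1, mu2):
    # Fisher matrix of the joint q(w, z) w.r.t. (pi, mu1, mu2):
    # exact sum over w in {0, 1}, Riemann sum over z.
    z = np.linspace(-12.0, 12.0, 24001)
    dz = z[1] - z[0]
    fisher = np.zeros((3, 3))
    for w, weight, mean in [(1, pi, mu1), (0, 1 - pi, mu2)]:
        dens = weight * normal_pdf(z, mean)  # q(w, z) on the grid
        scores = np.stack([
            np.full_like(z, w / pi - (1 - w) / (1 - pi)),  # d log q / d pi
            w * (z - mu1),                                  # d log q / d mu1
            (1 - w) * (z - mu2),                            # d log q / d mu2
        ])
        fisher += (scores * dens) @ scores.T * dz
    return fisher

# Even when both components coincide, the joint Fisher matrix is diagonal
# and invertible for any pi strictly between 0 and 1.
F_joint = joint_fisher(0.3, 0.0, 0.0)
min_eig_joint = np.linalg.eigvalsh(F_joint).min()
```

In this parametrization the joint Fisher matrix is diag(1/(π(1 − π)), π, 1 − π), so it stays invertible even when the two components coincide, a case in which the Fisher matrix of the marginal q(z) alone is singular.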
Proposed NGVI
Advantages of the proposed NGVI:
◮ Has the same cost as BBVI if computing $\nabla_m \mathcal{L}(\lambda)$ is easy.
◮ Converges faster than BBVI in the experiments shown.
[Figure: three panels titled breast_cancer_scale, wine, and covtype_scale, comparing BBVI and NGVI variants over iterations. Recoverable axis labels: Negative ELBO, Test RMSE, KL(q|p). Legend entries: BBVI(Gauss), BBVI(Skew-Gauss), NGVI(Skew-Gauss); BBVI-1/3/5/10 and NGVI-1/3/5/10; M=32, M=290.]
Variational approximations:
◮ Finite mixture of exp-family distributions:
    Mixture of Gaussians (multi-modality)
    Birnbaum-Saunders distribution (non-Gaussian mixture)
◮ Gaussian compound distribution:
    Skew Gaussian (skewness)
    Normal inverse-Gaussian (heavy tails)
6 / 7
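As an illustration of a Gaussian compound distribution, the sketch below samples a 1-D skew Gaussian through a latent w. The slides do not spell out the construction, so this assumes the standard one: q(w) = N(0, 1) and q(z | w) = N(μ + α|w|, σ²). The function name `sample_skew_gaussian` and the parameter values are hypothetical.

```python
import numpy as np

def sample_skew_gaussian(mu, alpha, sigma, num_samples, seed=0):
    # Assumed compound form: w ~ N(0, 1), z | w ~ N(mu + alpha * |w|, sigma^2).
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(num_samples)
    eps = rng.standard_normal(num_samples)
    return mu + alpha * np.abs(w) + sigma * eps

mu, alpha, sigma = 0.0, 2.0, 1.0
z = sample_skew_gaussian(mu, alpha, sigma, 200_000)

sample_mean = z.mean()
centered = z - sample_mean
# Standardised third moment; positive values mean a longer right tail.
sample_skewness = np.mean(centered**3) / np.mean(centered**2) ** 1.5
```

Under this construction the mean is μ + α√(2/π), and the sign of α sets the direction of the skew. Setting α = 0 recovers an ordinary Gaussian.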
Summary & Poster Presentation
Conclusion: a simple NGVI update for approximations outside the exp-family.
Poster Presentation:
◮ This work: Poster #217, Pacific Ballroom, Today, 6:30 PM
◮ New gradient estimators via Stein’s lemma: “Stein’s Lemma for the Reparameterization Trick with Exponential-family Mixtures”, the workshop on Stein’s method, Saturday
7 / 7