Variational Laplace Autoencoders



  1. Variational Laplace Autoencoders
     Yookoon Park, Chris Dongjoo Kim and Gunhee Kim
     Vision and Learning Lab, Seoul National University, South Korea

  2. Introduction
     - Variational Autoencoders
     - Two Challenges of Amortized Variational Inference
     - Contributions

  3. Variational Autoencoders (VAEs)
     • Generative network $\theta$: $p_\theta(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}(\mathbf{h}_\theta(\mathbf{z}), \sigma^2 \mathbf{I})$, $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$
     • Inference network $\phi$: amortized inference of $p_\theta(\mathbf{z} \mid \mathbf{x})$ via $q_\phi(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}_\phi(\mathbf{x}), \mathrm{diag}\,\boldsymbol{\sigma}^2_\phi(\mathbf{x}))$
     • Networks jointly trained by maximizing the Evidence Lower Bound (ELBO); see the sketch below:
       $\mathcal{L}(\mathbf{x}) = \mathbb{E}_q[\log p_\theta(\mathbf{x}, \mathbf{z}) - \log q_\phi(\mathbf{z} \mid \mathbf{x})] = \log p_\theta(\mathbf{x}) - D_{\mathrm{KL}}(q_\phi(\mathbf{z} \mid \mathbf{x}) \parallel p_\theta(\mathbf{z} \mid \mathbf{x})) \leq \log p_\theta(\mathbf{x})$
     Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. In ICLR, 2014.
     Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In ICML, 2014.
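
The ELBO above is typically estimated by sampling z from the approximate posterior with the reparameterization trick. Below is a minimal NumPy sketch (not the authors' code) of a single-sample estimate for a Gaussian decoder; `enc_mu`, `enc_logvar`, and `dec_mean` are hypothetical callables standing in for the inference and generative networks.

```python
import numpy as np

def elbo_estimate(x, enc_mu, enc_logvar, dec_mean, sigma2=1.0, rng=np.random.default_rng(0)):
    mu, logvar = enc_mu(x), enc_logvar(x)          # q_phi(z|x) = N(mu, diag(exp(logvar)))
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterization trick
    # log p_theta(x|z) for an isotropic Gaussian decoder with variance sigma2
    log_px_z = -0.5 * np.sum((x - dec_mean(z)) ** 2) / sigma2 \
               - 0.5 * x.size * np.log(2 * np.pi * sigma2)
    # Analytic KL(q_phi(z|x) || N(0, I))
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return log_px_z - kl
```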

  4. Two Challenges of Amortized Variational Inference
     1. Enhancing the expressiveness of $q_\phi(\mathbf{z} \mid \mathbf{x})$
        • The fully factorized Gaussian assumption is too restrictive to capture complex posteriors
        • E.g. normalizing flows (Rezende & Mohamed, 2015; Kingma et al., 2016)
     2. Reducing the amortization error of $q_\phi(\mathbf{z} \mid \mathbf{x})$
        • The error due to the inaccuracy of the inference network
        • E.g. gradient-based refinements of $q_\phi(\mathbf{z} \mid \mathbf{x})$ (Kim et al., 2018; Marino et al., 2018; Krishnan et al., 2018); see the sketch below
     Rezende, D. J. and Mohamed, S. Variational inference with normalizing flows. In ICML, 2015.
     Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. Improved variational inference with inverse autoregressive flow. In NeurIPS, 2016.
     Kim, Y., Wiseman, S., Miller, A. C., Sontag, D., and Rush, A. M. Semi-amortized variational autoencoders. In ICML, 2018.
     Marino, J., Yue, Y., and Mandt, S. Iterative amortized inference. In ICML, 2018.
     Krishnan, R. G., Liang, D., and Hoffman, M. D. On the challenges of learning with inference networks on sparse high-dimensional data. In AISTATS, 2018.
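
One way to reduce the amortization error, following the semi-amortized idea cited above, is to take a few gradient steps on the ELBO with respect to the variational parameters produced by the encoder. The sketch below is a hedged illustration, not the authors' method; `elbo_grad` is a hypothetical function (e.g. built with an autodiff library) returning the ELBO gradients with respect to (mu, logvar).

```python
def refine_variational_params(mu, logvar, elbo_grad, steps=5, lr=1e-2):
    """Refine the encoder's output by gradient ascent on an ELBO estimate for one input."""
    for _ in range(steps):
        g_mu, g_logvar = elbo_grad(mu, logvar)   # gradients of the ELBO estimate
        mu = mu + lr * g_mu                      # ascend, since the ELBO is maximized
        logvar = logvar + lr * g_logvar
    return mu, logvar
```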

  5. Contributions
     • The Laplace approximation of the posterior to improve the training of deep generative models with latent variables:
       1. Enhanced expressiveness from a full-covariance Gaussian posterior
       2. Reduced amortization error, since the covariance is computed directly from the behavior of the generative network
     • A novel posterior inference method exploiting the local linearity of ReLU networks

  6. Approach
     - Posterior Inference using Local Linear Approximations
     - Generalization: Variational Laplace Autoencoders

  7. Observation 1: Probabilistic PCA
     • A linear Gaussian model (Tipping & Bishop, 1999):
       $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, $p_\theta(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}(\mathbf{W}\mathbf{z} + \mathbf{b}, \sigma^2 \mathbf{I})$
     • The posterior distribution is exactly (see the sketch below):
       $p_\theta(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}\!\left(\tfrac{1}{\sigma^2} \boldsymbol{\Sigma} \mathbf{W}^\top (\mathbf{x} - \mathbf{b}),\ \boldsymbol{\Sigma}\right)$, where $\boldsymbol{\Sigma} = \left(\tfrac{1}{\sigma^2} \mathbf{W}^\top \mathbf{W} + \mathbf{I}\right)^{-1}$
     • Toy example: 1-dim pPCA on 2-dim data
     Tipping, M. E. and Bishop, C. M. Probabilistic principal component analysis. J. R. Statist. Soc. B, 61(3):611–622, 1999.
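
As a concrete illustration (a minimal sketch, not from the slides), the exact pPCA posterior above can be computed in a few lines of NumPy; `W`, `b`, and `sigma2` denote the linear decoder weights, bias, and observation variance.

```python
import numpy as np

def ppca_posterior(x, W, b, sigma2):
    """Exact posterior p(z|x) = N(mean, Sigma) of probabilistic PCA."""
    d = W.shape[1]                                        # latent dimension
    Sigma = np.linalg.inv(W.T @ W / sigma2 + np.eye(d))   # (W^T W / sigma^2 + I)^{-1}
    mean = Sigma @ W.T @ (x - b) / sigma2                 # (1/sigma^2) Sigma W^T (x - b)
    return mean, Sigma
```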

  8. Observation 2: Piece-wise Linear ReLU Networks
     • ReLU networks are piece-wise linear (Pascanu et al., 2014; Montufar et al., 2014):
       $\mathbf{h}_\theta(\mathbf{z}) \approx \mathbf{W}_{\mathbf{z}} \mathbf{z} + \mathbf{b}_{\mathbf{z}}$
     • Locally equivalent to probabilistic PCA (see the sketch below):
       $p_\theta(\mathbf{x} \mid \mathbf{z}) \approx \mathcal{N}(\mathbf{W}_{\mathbf{z}} \mathbf{z} + \mathbf{b}_{\mathbf{z}}, \sigma^2 \mathbf{I})$
     • Toy example: 1-dim ReLU VAE on 2-dim data
     Pascanu, R., Montufar, G., and Bengio, Y. On the number of response regions of deep feedforward networks with piecewise linear activations. In ICLR, 2014.
     Montufar, G., Pascanu, R., Cho, K., and Bengio, Y. On the number of linear regions of deep neural networks. In NeurIPS, 2014.
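
To make the local-linearity observation concrete, here is a minimal NumPy sketch that linearizes a one-hidden-layer ReLU decoder around a point z0, returning the weight matrix W_z and offset b_z of that linear region; the layer sizes and names are assumptions for illustration only.

```python
import numpy as np

def local_linearization(z0, W1, b1, W2, b2):
    """Return (Wz, bz) such that h(z) = Wz @ z + bz on the linear region containing z0,
    for the decoder h(z) = W2 @ relu(W1 @ z + b1) + b2."""
    pre = W1 @ z0 + b1
    mask = (pre > 0).astype(pre.dtype)     # which ReLU units are active at z0
    Wz = W2 @ (mask[:, None] * W1)         # Jacobian on this region: W2 diag(mask) W1
    bz = W2 @ (mask * pre) + b2 - Wz @ z0  # offset so that Wz @ z0 + bz = h(z0)
    return Wz, bz
```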

  9. Posterior Inference using Local Linear Approximations
     • Observation 1: Linear models give the exact posterior distribution
     • Observation 2: ReLU networks are locally linear
     → Posterior approximation based on the local linearity

  10. Posterior Inference using Local Linear Approximations
     1. Iteratively find the posterior mode $\boldsymbol{\mu}$ where the density is concentrated
        • Solve under the linear assumption $\mathbf{h}_\theta(\boldsymbol{\mu}_t) \approx \mathbf{W}_t \boldsymbol{\mu}_t + \mathbf{b}_t$:
          $\boldsymbol{\mu}_{t+1} = \left(\tfrac{1}{\sigma^2} \mathbf{W}_t^\top \mathbf{W}_t + \mathbf{I}\right)^{-1} \tfrac{1}{\sigma^2} \mathbf{W}_t^\top (\mathbf{x} - \mathbf{b}_t)$
        • Repeat for T steps
     2. Posterior approximation using $p_\theta(\mathbf{x} \mid \mathbf{z}) \approx \mathcal{N}(\mathbf{W}_{\boldsymbol{\mu}} \mathbf{z} + \mathbf{b}_{\boldsymbol{\mu}}, \sigma^2 \mathbf{I})$:
        $q(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma} = \left(\tfrac{1}{\sigma^2} \mathbf{W}_{\boldsymbol{\mu}}^\top \mathbf{W}_{\boldsymbol{\mu}} + \mathbf{I}\right)^{-1}$
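
Below is a minimal sketch of the update above, assuming a `linearize(z)` helper that returns the local (W_z, b_z) (e.g. the `local_linearization` sketch earlier with the decoder weights bound). It repeatedly solves the pPCA mode equation at the current linearization, then builds the full-covariance Gaussian from the final one.

```python
import numpy as np

def iterative_posterior(x, mu0, linearize, sigma2, T=4):
    """Approximate q(z|x) = N(mu, Sigma) via T fixed-point updates of the posterior mode."""
    mu = mu0
    d = mu0.shape[0]
    for _ in range(T):
        Wt, bt = linearize(mu)                              # h_theta(z) ~ Wt z + bt near mu
        A = Wt.T @ Wt / sigma2 + np.eye(d)
        mu = np.linalg.solve(A, Wt.T @ (x - bt) / sigma2)   # closed-form mode update
    Wm, bm = linearize(mu)
    Sigma = np.linalg.inv(Wm.T @ Wm / sigma2 + np.eye(d))   # full posterior covariance
    return mu, Sigma
```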

  11. Generalization: Variational Laplace Autoencoders
     1. Find the posterior mode s.t. $\nabla_{\mathbf{z}} \log p(\mathbf{x}, \mathbf{z}) \big|_{\mathbf{z} = \boldsymbol{\mu}} = 0$
        • Initialize $\boldsymbol{\mu}_0$ using the inference network
        • Iteratively refine $\boldsymbol{\mu}_t$ (e.g. using gradient descent)
     2. The Laplace approximation defines the posterior as:
        $q(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma}^{-1} = \boldsymbol{\Lambda} = -\nabla_{\mathbf{z}}^2 \log p(\mathbf{x}, \mathbf{z}) \big|_{\mathbf{z} = \boldsymbol{\mu}}$
     3. Evaluate the ELBO using $q(\mathbf{z} \mid \mathbf{x})$ and train the model
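
A minimal sketch of this general recipe, assuming hypothetical `grad_log_joint(z)` and `hess_log_joint(z)` callables (e.g. obtained via automatic differentiation of log p(x, z)): ascend to the mode from the encoder's initialization, then take the precision to be the negative Hessian there.

```python
import numpy as np

def laplace_posterior(z0, grad_log_joint, hess_log_joint, steps=50, lr=0.05):
    """Laplace approximation q(z|x) = N(mu, Sigma) around the mode of log p(x, z)."""
    mu = np.array(z0, dtype=float)
    for _ in range(steps):
        mu = mu + lr * grad_log_joint(mu)   # simple gradient ascent toward the mode
    Lambda = -hess_log_joint(mu)            # precision = negative Hessian at the mode
    Sigma = np.linalg.inv(Lambda)
    return mu, Sigma
```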

  12. Results
     - Posterior Covariance
     - Log-likelihood Results

  13. Experiments
     • Image datasets: MNIST, OMNIGLOT, Fashion MNIST, SVHN, CIFAR10
     • Baselines:
       • VAE
       • Semi-Amortized (SA) VAE (Kim et al., 2018)
       • VAE + Householder Flows (HF) (Tomczak & Welling, 2016)
       • Variational Laplace Autoencoder (VLAE)
     • T = 1, 2, 4, 8 (number of iterative updates or flows)

  14. Posterior Covariance Matrices

  15. Log-likelihood Results on CIFAR10
      [Figure: log-likelihood on CIFAR10 for VAE, SA-VAE, VAE+HF, and VLAE at T = 1, 2, 3, 4; vertical axis spans roughly 2350 to 2390]

  16. Thank you
      Visit our poster session at Pacific Ballroom #2
      Code available at: https://github.com/yookoon/VLAE
