SLIDE 1

MCMC and Variational Inference for AutoEncoders

Achille Thin¹, Alain Durmus², Eric Moulines¹

¹École Polytechnique, ²ENS Paris-Saclay

September 9, 2020

SLIDE 2

Outline
◮ Introduction
◮ Deep Latent Generative Models (DLGMs)
◮ MetFlow and MetVAE: MCMC & VI
◮ From classical to Flow-based MCMC
◮ Experiments

SLIDE 3

Problem

SLIDE 4

Generative modelling objective

◮ Objective: learn and sample from a model of the true underlying data distribution p∗, given a dataset {x_1, . . . , x_n} where x_i ∈ R^P, with P ≫ 1.
◮ Two steps:
  ◮ Specify a class of models {p_θ, θ ∈ Θ}.
  ◮ Find the best θ̂_n by maximizing the likelihood

    \hat{\theta}_n = \arg\max_{\theta} \sum_{i=1}^{n} \log p_\theta(x_i) .

SLIDE 5

Deep Latent Generative Models (DLGMs)
◮ Markov Chain Monte Carlo (MCMC)
◮ Variational Inference
◮ Implementation & Deep Learning

SLIDE 6

Latent variable modelling

◮ Autoencoders assume the existence of a latent variable whose dimension D is much smaller than the dimension P of the observation.
◮ Attached to the latent variable z ∈ R^D is a prior distribution π from which we can sample.
◮ The specification of the model is completed by the conditional distribution of the observation x given the latent variable z: x | z ∼ p_θ(x | z).
◮ The marginal likelihood of the observations is obtained by first forming the joint distribution of the observation and the latent variable, p_θ(x, z) = p_θ(x | z) π(z), and then marginalizing w.r.t. the latent variable z:

    p_\theta(x) = \int p_\theta(x \mid z) \, \pi(z) \, dz .

SLIDE 7

Data Generation with Latent variables

◮ Draw a latent variable z ∼ π.
◮ Draw an observation x | z ∼ p_θ(x | z).
◮ Each region of the latent space is associated with a particular form of observation.
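This two-step sampling scheme is direct to write down. A minimal numpy sketch, in which a linear map with illustrative weights W, b stands in for a trained decoder network:

```python
import numpy as np

rng = np.random.default_rng(0)
D, P = 2, 5                       # latent and observation dimensions (D << P in practice)
W = rng.normal(size=(P, D))       # hypothetical decoder weights standing in for a trained network
b = rng.normal(size=P)

def sample_observation():
    z = rng.normal(size=D)                   # z ~ pi, standard Gaussian prior
    mean = W @ z + b                         # decoder output: parameters of p_theta(x | z)
    x = mean + 0.1 * rng.normal(size=P)      # x | z ~ N(mean, 0.1^2 Id), an illustrative likelihood
    return z, x

z, x = sample_observation()
print("latent:", z, "\nobservation:", x)
```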

SLIDE 8

Optimisation of the model

◮ Estimation: perform maximum likelihood estimation with stochastic gradient techniques.
◮ This requires unbiased estimators of the gradient of

    p_\theta(x) = \int p_\theta(x \mid z) \, \pi(z) \, dz .

◮ Usually intractable!

SLIDE 9

Fisher’s Identity

◮ Idea: take advantage of Fisher’s identity:

    \nabla_\theta \log p_\theta(x)
    = \int \frac{\nabla_\theta p_\theta(x, z)}{p_\theta(x)} \, dz
    = \int \nabla_\theta \log p_\theta(x, z) \, \frac{p_\theta(x, z)}{p_\theta(x)} \, dz
    = \int \nabla_\theta \log p_\theta(x, z) \, p_\theta(z \mid x) \, dz .

◮ The gradient of the incomplete likelihood of the observations is computed using the complete likelihood (which is tractable!).
◮ However, we need to sample from the posterior p_θ(z | x).
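Fisher's identity can be checked numerically on a conjugate scalar toy model where the posterior is available in closed form (in a DLGM the posterior samples would instead come from MCMC or VI); all numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, x = 0.3, 1.7    # toy scalar model: z ~ N(0, 1), x | z ~ N(z + theta, 1)

# Exact posterior p_theta(z | x) is N((x - theta)/2, 1/2) by Gaussian conjugacy,
# so we can sample it directly and average the complete-data score.
M = 100_000
z = rng.normal((x - theta) / 2, np.sqrt(0.5), size=M)
grad_est = np.mean(x - z - theta)            # grad_theta log p_theta(x, z) = x - z - theta

grad_exact = (x - theta) / 2                 # since p_theta(x) = N(x; theta, 2)
print(grad_est, grad_exact)                  # the two should agree to ~1e-2
```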

SLIDE 10

Markov Chain Monte Carlo

◮ Idea: build an ergodic Markov chain whose invariant distribution is the target, known up to a normalization constant: p_θ(z | x) ∝ π(z) p_θ(x | z).
◮ The Metropolis-Hastings (MH) algorithm is an option:
  • Draw a proposal z′ from q_θ(z′ | z, x).
  • Accept/reject the proposal with probability

    \alpha_\theta(z, z') = 1 \wedge \frac{p_\theta(z' \mid x) \, q_\theta(z \mid z', x)}{p_\theta(z \mid x) \, q_\theta(z' \mid z, x)} .

Figure: Markov chain targeting a correlated Gaussian distribution.

SLIDE 11

Markov Chain Monte Carlo

◮ Many recent advances have produced efficient MCMC methods, using Langevin dynamics or Hamiltonian Monte Carlo.
◮ Pros: a theoretically sound framework to sample from p_θ(z | x) ∝ p_θ(x | z) π(z) (known up to a constant).
◮ Cons:
  − mixing times in high dimensions;
  − convergence assessment;
  − multimodality (metastability).
◮ But the Cons do not always outweigh the Pros, see [HM19].

SLIDE 12

Variational Inference

◮ Idea: introduce a parametric family of probability distributions Q = {q_φ, φ ∈ Φ}.
◮ Goal: minimize a divergence between q_φ and the intractable posterior p_θ(· | x).
◮ Each observation x has a different target posterior p_θ(z | x).
◮ Idea: use amortized Variational Inference: x ↦ q_φ(z | x).

SLIDE 13

Variational Inference

◮ Evidence Lower BOund (ELBO):

    \mathrm{ELBO}(\theta, \phi; x)
    = \int \log \frac{p_\theta(x, z)}{q_\phi(z \mid x)} \, q_\phi(z \mid x) \, dz
    = \int \log \frac{p_\theta(z \mid x) \, p_\theta(x)}{q_\phi(z \mid x)} \, q_\phi(z \mid x) \, dz
    = \log p_\theta(x) - \mathrm{KL}\big(q_\phi(\cdot \mid x) \,\|\, p_\theta(\cdot \mid x)\big)
    \le \log p_\theta(x) .

◮ The ELBO is a lower bound on the incomplete data likelihood, also referred to as the evidence.
  • The bound is tight if Q contains the true posterior p_θ(· | x).
◮ The KL divergence measures the discrepancy incurred when approximating the posterior with the variational distribution.
  • It can be replaced by an f-divergence.
◮ The ELBO is tractable and can be easily optimized using the reparameterization trick, which is crucial for stochastic gradient descent.
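A Monte Carlo estimate of the ELBO, on a scalar toy model where the evidence is known in closed form so the bound can be verified (the model and variational parameters below are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 0.5                                     # single scalar observation
mu, sigma = 0.2, 0.8                        # variational parameters of q(z | x) = N(mu, sigma^2)

M = 200_000
z = rng.normal(mu, sigma, size=M)
log_joint = norm.logpdf(z, 0, 1) + norm.logpdf(x, z, 1)   # log p(x, z) = log pi(z) + log p(x|z)
log_q = norm.logpdf(z, mu, sigma)
elbo = np.mean(log_joint - log_q)

print(elbo, norm.logpdf(x, 0, np.sqrt(2)))  # ELBO <= log p(x) = log N(x; 0, 2)
```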

SLIDE 14

Variational Auto Encoder

The Variational Auto Encoder (VAE) builds on the representational power of (deep) neural networks to implement a very flexible class of encoders q_φ(z | x) and decoders p_θ(x | z).
◮ The encoder q_φ is parameterized by a deep neural network, which takes the observation x as input and outputs the parameters of the distribution q_φ(· | x).
◮ The decoder p_θ(x | z) is built symmetrically, as a neural network which takes a latent variable z as input and outputs the parameters of the distribution p_θ(x | z).

SLIDE 15

“Classical” implementation

◮ In most examples, the dimension P of the observation x is large.
◮ The dimension D of the latent space is typically much smaller.
◮ The distribution π of the latent variable is Gaussian.
◮ More sophisticated priors can be considered: Gaussian mixtures or hierarchical priors.
◮ In the vanilla implementation, the variational distribution q_φ(· | x) is

    q_\phi(z \mid x) = \mathrm{N}\big(z; \mu_\phi(x), \sigma_\phi(x) \, \mathrm{Id}\big) ,

where µ_φ(x), σ_φ(x) are the output of a neural network taking the observation x as input. This parameterization is often referred to as the mean-field approximation.

SLIDE 16

Reparameterization trick

Optimization w.r.t. θ, φ of

    \mathrm{ELBO}(\theta, \phi; x) = \int \log \frac{p_\theta(x, z)}{q_\phi(z \mid x)} \, q_\phi(z \mid x) \, dz .

◮ The gradient of the function

    \phi \mapsto \int h(x, z) \, q_\phi(z \mid x) \, dz

may be written as

    \int h(x, z) \, \nabla \log q_\phi(z \mid x) \, q_\phi(z \mid x) \, dz .

◮ Monte Carlo estimation:

    M^{-1} \sum_{i=1}^{M} h(x, Z_i) \, \nabla \log q_\phi(Z_i \mid x) , \qquad Z_i \sim q_\phi(\cdot \mid x) .

◮ Problem: the variance of this vanilla unbiased estimator is generally very high!
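A sketch of this score-function estimator on a Gaussian toy integrand, illustrating both its unbiasedness and its large per-sample spread (h, mu, sigma are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, M = 1.0, 1.0, 10_000
z = rng.normal(mu, sigma, size=M)

h = z**2                                   # integrand; E[h] = mu^2 + sigma^2, so d/dmu = 2*mu
score = (z - mu) / sigma**2                # grad_mu log q(z) for q = N(mu, sigma^2)
samples = h * score                        # score-function (REINFORCE) gradient samples

print("estimate:", samples.mean(), "exact:", 2 * mu)
print("per-sample std:", samples.std())   # large spread: the vanilla estimator is noisy
```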

SLIDE 17

Reparameterization trick

◮ Reparameterization trick: assume there exist a diffeomorphism V_{φ,x} and a distribution g that is easy to sample from, such that

    \epsilon \sim g , \qquad z = V_{\phi,x}(\epsilon) \sim q_\phi(\cdot \mid x) .

◮ Using the reparameterization, the ELBO writes

    \mathrm{ELBO}(\theta, \phi; x) = \int \log \frac{p_\theta\big(x, V_{\phi,x}(\epsilon)\big)}{q_\phi\big(V_{\phi,x}(\epsilon) \mid x\big)} \, g(\epsilon) \, d\epsilon .

◮ The gradient is computed using the chain rule.
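The same toy gradient computed with the reparameterization trick; compared with the score-function sketch above, the per-sample spread drops markedly:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, M = 1.0, 1.0, 10_000
eps = rng.normal(size=M)                    # eps ~ g = N(0, 1)
z = mu + sigma * eps                        # z = V_{phi,x}(eps) ~ q = N(mu, sigma^2)

samples = 2 * z                             # pathwise gradient d/dmu h(mu + sigma*eps) for h(z) = z^2

print("estimate:", samples.mean(), "exact:", 2 * mu)
print("per-sample std:", samples.std())    # markedly smaller than the score-function spread above
```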

SLIDE 18

Limitations of the VAE

The vanilla VAE suffers from some well-known limitations.
◮ The mean-field approximation is usually believed to be too simple.
◮ It leads to overfitting or mode dropping (the reverse KL is used in Variational Inference).
◮ Moreover, we can rewrite the ELBO as

    \mathrm{ELBO}(\theta, \phi; x) = \mathbb{E}_{q_\phi(\cdot \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(\cdot \mid x) \,\|\, \pi\big) .

This can lead to an uninformed posterior approximation, motivating the introduction of β-VAE and Ladder Variational Autoencoders [HMP+16, SRM+16].

SLIDE 19

Enriching the variational approximation

◮ To address the first issue, [RM15] suggests improving the variational mean-field approximation with parameterized diffeomorphisms, which increase the flexibility of the distribution.
◮ These diffeomorphisms are referred to as Normalizing Flows.
◮ Thanks to recent advances in MCMC methods, flows [CDS18] and other MCMC-inspired methods have come to enrich the variational distribution [SKW15, Hof17].
◮ However, none of these approaches thoroughly combines MCMC and Metropolis-Hastings methods with Variational Inference.

SLIDE 20

MetFlow and MetVAE: MCMC & VI
◮ Metropolis Hastings kernels
◮ Variational inference with MetFlow family

SLIDE 21

MetFlow variational family

Our objective: construct a family of variational distributions based on the K-th marginal of a Markov chain with the following properties:
◮ The chain is initialized with the amortized variational mean-field approximation, whose density is denoted m^0_φ.
◮ The Markov chain has the true posterior p_θ(z | x) as invariant distribution.
◮ The Markov kernels depend on learnable parameters, also denoted φ, which can be adjusted.
We specify a framework in which the parameters of the Markov kernel and of the initial distribution are all learnable.

SLIDE 22

Metropolis Hastings kernel

Denote by π the target distribution; the dependence on x and θ is implicit.
◮ Innovation noise: (U_k)_{k∈N}, an i.i.d. sequence of random vectors in R^{D_u} with density h.
◮ Proposal mapping: T : R^D × R^{D_u} → R^D.
◮ Algorithm:
  ◮ Propose a move Y_{k+1} = T(Z_k, U_{k+1}) = T_{U_{k+1}}(Z_k).
  ◮ Accept the move, Z_{k+1} = Y_{k+1}, with probability α_{U_{k+1}}(Z_k).
  ◮ Otherwise, set Z_{k+1} = Z_k.

SLIDE 23

Metropolis-Hastings kernel

◮ Q_u, the Markov kernel conditional on the innovation noise:

    Q_u(z, A) = \alpha_u(z) \, \delta_{T_u(z)}(A) + \big(1 - \alpha_u(z)\big) \, \delta_z(A) .

◮ The Metropolis-Hastings kernel M_h is obtained by marginalizing w.r.t. the distribution of the innovation:

    M_h(z, A) = \int Q_u(z, A) \, h(u) \, du .

◮ The acceptance function α_u is chosen to satisfy the reversibility condition

    \pi(dz) \, M_h(z, dz') = \pi(dz') \, M_h(z', dz) .

SLIDE 24

Random Walk Metropolis

◮ Here D_u = D and h = N(0, Σ).
◮ Draw an innovation U_k ∼ h.
◮ Propose a point

    Y_{k+1} = T^{\mathrm{RWM}}_{U_k}(Z_k) = Z_k + U_k .

◮ Accept with probability

    \alpha^{\mathrm{RWM}}_u(z) = 1 \wedge \big( \pi(T^{\mathrm{RWM}}_u(z)) / \pi(z) \big) .

◮ Very simple and straightforward... but slow mixing in high dimensions.
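A minimal numpy implementation of Random Walk Metropolis targeting a correlated 2D Gaussian (as in the earlier figure); the target covariance and step size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma_target = np.array([[1.0, 0.9], [0.9, 1.0]])
prec = np.linalg.inv(Sigma_target)
log_pi = lambda z: -0.5 * z @ prec @ z      # unnormalized log target

def rwm(n_steps, step=0.5):
    z = np.zeros(2)
    chain, accepts = [], 0
    for _ in range(n_steps):
        u = step * rng.normal(size=2)       # innovation U_k ~ N(0, step^2 Id)
        y = z + u                           # proposal T_u(z) = z + u
        if np.log(rng.uniform()) < log_pi(y) - log_pi(z):   # alpha = 1 ∧ pi(y)/pi(z)
            z, accepts = y, accepts + 1
        chain.append(z)
    return np.array(chain), accepts / n_steps

chain, rate = rwm(20_000)
print("acceptance rate:", rate)
print("empirical cov:\n", np.cov(chain[5000:].T))   # should approach Sigma_target
```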

SLIDE 25

Metropolis Adjusted Langevin Algorithm

◮ Idea: inform the MH proposal mapping with the target distribution.
◮ Here D_u = D and h = N(0, Id). Assume that z ↦ log π(z) is differentiable and denote by ∇ log π(z) its gradient. At each step k:
  • Draw an innovation U_k ∼ h.
  • Propose

    Y_{k+1} = T^{\mathrm{MALA}}_{U_k}(Z_k) = Z_k + \Sigma \nabla \log \pi(Z_k) + \sqrt{2} \, \Sigma^{1/2} U_k .

  • Accept with probability

    \alpha^{\mathrm{MALA}}_u(z) = 1 \wedge \frac{\pi\big(T^{\mathrm{MALA}}_u(z)\big) \, g\big(T^{\mathrm{MALA}}_u(z), z\big)}{\pi(z) \, g\big(z, T^{\mathrm{MALA}}_u(z)\big)} ,

    where g(z_1, z_2) = N(z_2; z_1 + Σ ∇ log π(z_1), 2Σ) is the proposal kernel density.
◮ Mixing is faster than for RWM, but the proposed moves are still local.
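A MALA sketch on the same correlated Gaussian, taking Σ = γ Id so the proposal reads y = z + γ∇log π(z) + √(2γ) u; the value of γ is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
log_pi = lambda z: -0.5 * z @ prec @ z
grad_log_pi = lambda z: -prec @ z

def mala(n_steps, gamma=0.2):
    def log_g(z_from, z_to):                 # log proposal density g(z_from, z_to), cov = 2*gamma*Id
        m = z_from + gamma * grad_log_pi(z_from)
        return -np.sum((z_to - m) ** 2) / (4 * gamma)
    z, accepts = np.zeros(2), 0
    for _ in range(n_steps):
        y = z + gamma * grad_log_pi(z) + np.sqrt(2 * gamma) * rng.normal(size=2)
        log_alpha = log_pi(y) + log_g(y, z) - log_pi(z) - log_g(z, y)
        if np.log(rng.uniform()) < log_alpha:
            z, accepts = y, accepts + 1
    return accepts / n_steps

print("acceptance rate:", mala(10_000))
```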

SLIDE 26

Hamiltonian Monte Carlo I

◮ Currently viewed as the state-of-the-art MCMC algorithm.
◮ Uses a data augmentation approach: it artificially extends the state space by adding a momentum variable. The extended target density is

    \pi(q, p) = \pi_q(q) \, \mathrm{N}(p; 0, \mathrm{Id}) ,

where π_q is the distribution of interest over the position q.
◮ The marginal distribution is ∫ π(q, p) dp = π_q(q): it therefore suffices to sample from the joint distribution and to discard the momentum variable.

SLIDE 27

Hamiltonian system

◮ The extended target is π(p, q) ∝ exp(−H(p, q)), where the Hamiltonian H(p, q) is the sum of the potential and kinetic energies:

    H(p, q) = U(q) + K(p) , \qquad U(q) = -\log \pi_q(q) , \qquad K(p) = \tfrac{1}{2} |p|^2 .

◮ Hamilton’s equations:

    \dot{q} = \nabla_p H(p, q) = p , \qquad \dot{p} = -\nabla_q H(p, q) = -\nabla_q U(q) .

◮ Hamilton’s equations can easily be shown to be equivalent to Newton’s equations.
◮ Because a system described by conservative forces conserves the total energy, Hamilton’s equations conserve the Hamiltonian.

SLIDE 28

Leapfrog steps

◮ When an exact analytic solution of the Hamiltonian dynamics is available, we can use the resulting flow as a proposal.
◮ However, in general there is no analytic solution to Hamilton’s equations, which must therefore be approximated by discretizing time.
◮ The leapfrog discretization, also called the Störmer-Verlet method, provides a good approximation of the Hamiltonian dynamics: LF_γ(q_0, p_0) = (q_1, p_1) with

    p_{1/2} = p_0 - \tfrac{\gamma}{2} \nabla U(q_0) , \qquad q_1 = q_0 + \gamma \, p_{1/2} , \qquad p_1 = p_{1/2} - \tfrac{\gamma}{2} \nabla U(q_1) .
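The leapfrog step translates directly into code; the sketch below also checks that the Hamiltonian is nearly conserved along the discretized trajectory (the quadratic potential and step size are illustrative):

```python
import numpy as np

prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
grad_U = lambda q: prec @ q                          # U(q) = -log pi_q(q), Gaussian case
H = lambda q, p: 0.5 * q @ prec @ q + 0.5 * p @ p    # Hamiltonian = potential + kinetic

def leapfrog(q, p, gamma):
    p = p - 0.5 * gamma * grad_U(q)       # half step on the momentum
    q = q + gamma * p                     # full step on the position
    p = p - 0.5 * gamma * grad_U(q)       # half step on the momentum
    return q, p

q, p = np.array([1.0, -1.0]), np.array([0.5, 0.3])
h0 = H(q, p)
for _ in range(100):
    q, p = leapfrog(q, p, gamma=0.05)
print("energy drift:", H(q, p) - h0)      # small: leapfrog nearly conserves H
```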

SLIDE 29

Partial refresh

◮ Define the mappings, for a partial refresh coefficient κ ∈ (0, 1):

    T^{\mathrm{LF}}_{\gamma} : (q, p) \mapsto \mathrm{LF}_{\gamma,N}(q, -p) , \qquad
    T^{\mathrm{ref}}_{\kappa,u} : (q, p) \mapsto \big(q, \kappa p + \sqrt{1 - \kappa^2} \, u\big) , \quad u \in \mathbb{R}^P ,

where LF_{γ,N} is the N-fold composition of LF_γ.

SLIDE 30

Hamiltonian Monte Carlo II

◮ Set D_u = P and h = N(0, Id_P).
◮ Draw an innovation U_k ∼ h.
◮ Propose the point

    Y_{k+1} = T^{\mathrm{LF}}_{\gamma} \circ T^{\mathrm{ref}}_{\kappa, U_k}(Z_k) .

◮ Accept with probability

    \alpha_u(q, p) = 1 \wedge \big( \pi(T_u(q, p)) / \pi(q, p) \big) .

◮ This is not a “classical” MH algorithm, yet the resulting kernel is reversible w.r.t. π, see [Nea11, Section 3.2] and [BRJM18, Section 6].
◮ Proposals can be far from the current point thanks to the leapfrog steps.
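Putting the pieces together, a sketch of one HMC transition with partial momentum refresh and momentum flip, targeting the same toy Gaussian (γ, N, κ are illustrative settings):

```python
import numpy as np

rng = np.random.default_rng(0)
prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
grad_U = lambda q: prec @ q
H = lambda q, p: 0.5 * q @ prec @ q + 0.5 * p @ p

def leapfrog(q, p, gamma, N):
    for _ in range(N):
        p = p - 0.5 * gamma * grad_U(q)
        q = q + gamma * p
        p = p - 0.5 * gamma * grad_U(q)
    return q, p

def hmc_step(q, p, gamma=0.1, N=10, kappa=0.5):
    u = rng.normal(size=p.shape)                     # innovation U_k ~ N(0, Id)
    p = kappa * p + np.sqrt(1 - kappa**2) * u        # partial refresh T_ref
    q_new, p_new = leapfrog(q, -p, gamma, N)         # T_LF with momentum flip
    if np.log(rng.uniform()) < H(q, p) - H(q_new, p_new):   # alpha = 1 ∧ pi(T(q,p))/pi(q,p)
        return q_new, p_new
    return q, p

q, p = np.zeros(2), rng.normal(size=2)
samples = []
for _ in range(10_000):
    q, p = hmc_step(q, p)
    samples.append(q)
print(np.cov(np.array(samples)[2000:].T))   # approaches the target covariance
```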

SLIDE 31

MetFlow variational family

◮ Let M_{φ,h} be a parameterized MH kernel, with associated proposal mappings T_{φ,u}, innovation noise density h and acceptance functions α_{φ,u}.
◮ Define the MetFlow variational family

    Q := \{ \xi^K_\phi = \xi^0_\phi M^K_{\phi,h} : \phi \in \Phi \} .

◮ M^K_{φ,h} is the K-th iterate of M_{φ,h}, and thus ξ^K_φ is the distribution of the K-th iterate Z_K of the Markov chain (Z_k)_{k∈N} with Z_0 ∼ ξ^0_φ.
◮ Idea: express the marginal distribution of the Markov chain after K iterations.

SLIDE 32

Flavour of the proof

To give an idea, we show here the expression after only one iteration. For a C¹(R^D, R^D) diffeomorphism ψ, denote by J_ψ(z) the absolute value of the Jacobian determinant of ψ at z ∈ R^D.

Lemma

Let (u, φ) ∈ R^{D_u} × Φ. Assume that ξ^0_φ admits a density m^0_φ w.r.t. the Lebesgue measure. Assume in addition that T_{φ,u} is a C¹ diffeomorphism. Then the distribution ξ^1_φ(· | u) = ∫_{R^D} m^0_φ(z_0) Q_{φ,u}(z_0, ·) dz_0 has a density w.r.t. the Lebesgue measure given by

    m^1_\phi(z \mid u) = \alpha^1_{\phi,u}\big(T^{-1}_{\phi,u}(z)\big) \, m^0_\phi\big(T^{-1}_{\phi,u}(z)\big) \, J_{T^{-1}_{\phi,u}}(z) + \alpha^0_{\phi,u}(z) \, m^0_\phi(z) ,

with α^1_{φ,u}(z) = α_{φ,u}(z) and α^0_{φ,u}(z) = 1 − α_{φ,u}(z). The distribution ξ^1_φ has a density given by

    m^1_\phi(z) = \int m^1_\phi(z \mid u) \, h(u) \, du .

SLIDE 33

Flavour of the proof

Proof.

Idea: the change of variable z_1 = T_{φ,u}(z_0):

    \int f(z) \, m^0_\phi(z_0) \, Q_{\phi,u}(z_0, dz) \, dz_0
    = \int m^0_\phi(z_0) \big[ \alpha^1_{\phi,u}(z_0) \, f\big(T_{\phi,u}(z_0)\big) + \alpha^0_{\phi,u}(z_0) \, f(z_0) \big] \, dz_0
    = \int \big\{ \alpha^1_{\phi,u}\big(T^{-1}_{\phi,u}(z_1)\big) \, m^0_\phi\big(T^{-1}_{\phi,u}(z_1)\big) \, J_{T^{-1}_{\phi,u}}(z_1) + \alpha^0_{\phi,u}(z_1) \, m^0_\phi(z_1) \big\} \, f(z_1) \, dz_1 .

◮ Different flows arise depending on the results of the accept/reject steps: the final distribution is a mixture of the corresponding push-forward distributions.
◮ Increased complexity and ability to recover different modes (while the invariance of the MCMC kernels guarantees that we do “better” at each step).

SLIDE 34

Main Result

Define, for a family {T_i}_{i=1}^K of mappings on R^D and 1 ≤ i ≤ k ≤ K, ∘_{j=i}^{k} T_j = T_i ∘ · · · ∘ T_k, and for a family of vectors u_K = (u_1, . . . , u_K) set h(u_K) = ∏_{i=1}^{K} h(u_i). By convention, T^0 = Id.

Proposition

Assume that for any (u, φ) ∈ R^{D_u} × Φ, T_{φ,u} is a C¹ diffeomorphism and ξ^0_φ admits a density m^0_φ w.r.t. the Lebesgue measure. For any {u_i ∈ R^{D_u}}_{i=1}^K and φ ∈ Φ, ξ^K_φ(dz | u_K) = ξ^0_φ Q_{φ,u_1} · · · Q_{φ,u_K}(dz) has a density given by

    m^K_\phi(z \mid u_K) = \sum_{a_K \in \{0,1\}^K} m^K_\phi(z, a_K \mid u_K) ,

where

    m^K_\phi(z, a_K \mid u_K) = m^0_\phi\Big( \circ_{j=1}^{K} T^{-a_j}_{\phi,u_j}(z) \Big) \, J_{\circ_{j=1}^{K} T^{-a_j}_{\phi,u_j}}(z) \, \prod_{i=1}^{K} \alpha^{a_i}_{\phi,u_i}\Big( \circ_{j=i}^{K} T^{-a_j}_{\phi,u_j}(z) \Big) .

In particular,

    m^K_\phi(z) = \int m^K_\phi(z \mid u_K) \, h(u_K) \, du_K .

SLIDE 35

A New ELBO

◮ Objective: optimize the ELBO

    \mathrm{ELBO}(\theta, \phi; x) = \int \log \frac{p_\theta(x, z)}{m^K_{\theta,\phi}(z \mid x)} \, m^K_{\theta,\phi}(z \mid x) \, dz .

◮ Note that m^K_{θ,φ} now also depends on θ, since the MCMC kernels target p_θ(· | x).
◮ Problem: the distribution m^K_{θ,φ} is intractable (a mixture of 2^K components)!
◮ Idea: define a new ELBO

    \mathcal{L}(\theta, \phi; x) = \sum_{a_K \in \{0,1\}^K} \int h(u_K) \, m^K_{\theta,\phi}(z_K, a_K \mid u_K, x) \, s_{\theta,\phi}(x, z_K, a_K, u_K) \, dz_K \, du_K ,

where

    s_{\theta,\phi}(x, z_K, a_K, u_K) = \log \big( 2^{-K} p_\theta(x, z_K) / m^K_{\theta,\phi}(z_K, a_K \mid u_K, x) \big) .

SLIDE 36

A new ELBO

This is a proper evidence lower bound! Jensen’s inequality w.r.t. m^K_{θ,φ}(z_K, a_K | u_K, x) indeed shows:

    \sum_{a_K \in \{0,1\}^K} \int m^K_{\theta,\phi}(z_K, a_K \mid u_K, x) \log \frac{2^{-K} p_\theta(x, z_K)}{m^K_{\theta,\phi}(z_K, a_K \mid u_K, x)} \, dz_K \le \log p_\theta(x) .

SLIDE 37

Further investigating the lower bound

◮ Define

    m^K_{\theta,\phi}(z_K, a_K, u_K \mid x) = h(u_K) \, m^K_{\theta,\phi}(z_K, a_K \mid u_K, x) ,
    \qquad
    m^K_{\theta,\phi}(a_K, u_K \mid z_K, x) = m^K_{\theta,\phi}(z_K, a_K, u_K \mid x) / m^K_{\theta,\phi}(z_K \mid x) .

◮ Jensen’s inequality w.r.t. m^K_{θ,φ}(u_K, a_K | z_K, x):

    \mathcal{L}(\theta, \phi)
    = \sum_{a_K \in \{0,1\}^K} \int m^K_{\theta,\phi}(z_K, a_K, u_K \mid x) \log \frac{2^{-K} p_\theta(x, z_K)}{m^K_{\theta,\phi}(z_K, a_K \mid u_K, x)} \, dz_K \, du_K
    = \int m^K_{\theta,\phi}(z_K \mid x) \Big( \sum_{a_K} \int m^K_{\theta,\phi}(a_K, u_K \mid z_K, x) \log \frac{2^{-K} p_\theta(x, z_K)}{m^K_{\theta,\phi}(z_K, a_K \mid u_K, x)} \, du_K \Big) dz_K
    \le \int m^K_{\theta,\phi}(z_K \mid x) \log \Big( \sum_{a_K} \int m^K_{\theta,\phi}(a_K, u_K \mid z_K, x) \, \frac{2^{-K} p_\theta(x, z_K)}{m^K_{\theta,\phi}(z_K, a_K \mid u_K, x)} \, du_K \Big) dz_K .
MCMC and Variational Inference for AutoEncoders

slide-38
SLIDE 38

Introduction Deep Latent Generative Models (DLGMs) MetFlow and MetVAE: MCMC & VI From classical to Flow-based MCMC Experiments Metropolis Hastings kernels Variational inference with MetFlow family

Further investigating the lower bound

◮ Recall the definitions

    m^K_{\theta,\phi}(z_K, a_K, u_K \mid x) = h(u_K) \, m^K_{\theta,\phi}(z_K, a_K \mid u_K, x) ,
    \qquad
    m^K_{\theta,\phi}(a_K, u_K \mid z_K, x) = m^K_{\theta,\phi}(z_K, a_K, u_K \mid x) / m^K_{\theta,\phi}(z_K \mid x) .

◮ Hence, we get

    \mathcal{L}(\theta, \phi)
    \le \int m^K_{\theta,\phi}(z_K \mid x) \log \Big( \sum_{a_K} \int m^K_{\theta,\phi}(a_K, u_K \mid z_K, x) \, \frac{2^{-K} p_\theta(x, z_K)}{m^K_{\theta,\phi}(z_K, a_K \mid u_K, x)} \, du_K \Big) dz_K
    \le \int m^K_{\theta,\phi}(z_K \mid x) \log \Big( \sum_{a_K} \int m^K_{\theta,\phi}(u_K \mid z_K, x) \, \frac{2^{-K} p_\theta(x, z_K)}{m^K_{\theta,\phi}(z_K \mid u_K, x)} \, du_K \Big) dz_K .

SLIDE 39

Further investigating the lower bound

◮ Recall the definitions

    m^K_{\theta,\phi}(z_K, a_K, u_K \mid x) = h(u_K) \, m^K_{\theta,\phi}(z_K, a_K \mid u_K, x) ,
    \qquad
    m^K_{\theta,\phi}(a_K, u_K \mid z_K, x) = m^K_{\theta,\phi}(z_K, a_K, u_K \mid x) / m^K_{\theta,\phi}(z_K \mid x) .

◮ Finally,

    \mathcal{L}(\theta, \phi)
    \le \int m^K_{\theta,\phi}(z_K \mid x) \log \Big( \sum_{a_K} \int m^K_{\theta,\phi}(u_K \mid z_K, x) \, \frac{2^{-K} p_\theta(x, z_K)}{m^K_{\theta,\phi}(z_K \mid u_K, x)} \, du_K \Big) dz_K
    = \int m^K_{\theta,\phi}(z_K \mid x) \log \big( p_\theta(x, z_K) / m^K_{\theta,\phi}(z_K \mid x) \big) \, dz_K .

SLIDE 40

Other methods for MCMC & VI: [Hof17]

◮ A simple method to improve a variational approximation with MCMC steps.
◮ First optimize the variational mean-field distribution m_φ using the classical ELBO.
◮ Sample Z_0 ∼ m_φ.
◮ Perform K MCMC steps (typically HMC) targeting p_θ(· | x) to obtain a sample Z_K.
◮ Use the sample Z_K from the “improved” variational distribution to update θ.
◮ Pros: very straightforward to implement and understand.
◮ Cons: compared to the MetFlow ELBO, there is no feedback between the MCMC steps and the variational approximation! It does not fix mode dropping in most cases, as MCMC struggles to mix within a few iterations.

SLIDE 41

Improving [Hof17] with Normalizing Flows

◮ The method in [Hof17] is simple, and easily improved.
◮ Idea: NeutraHMC [HSD+19] improves HMC with a Normalizing Flow.
◮ First optimize a flow f_φ to minimize the KL divergence between the push-forward f_φ#q, with density q(f_φ^{-1}(z)) J_{f_φ^{-1}}(z), and the target π.
◮ Perform HMC initialized from q with target f_φ^{-1}#π (in the original space, the target “unwarped” by the flow).
◮ Push the obtained samples through the flow f_φ.
◮ Pros: simplifies the space on which HMC is performed; improves efficiency and flexibility.
◮ Cons: additional parameters and optimization; does not necessarily correct the bias of VI.

SLIDE 42

Numerical example

◮ Target: a mixture of 8 well-separated 2D Gaussian distributions.
◮ HMC kernels with L = 1 leapfrog step, a learnable step size and a learnable mean-field initialization for our HMC-MetFlow.
◮ Comparison with the plain method of [Hof17], and with the method of [Hof17] improved with a Neural Autoregressive Flow (NAF), i.e. NeutraHMC [HSD+19].

Figure: Left to right: target distribution, HMC-MetFlow with 2 HMC transitions, Hoffman’s method [Hof17], and NeutraHMC.

SLIDE 43

From classical to Flow-based MCMC

SLIDE 44

MCMC with Normalizing flows

◮ Let T_φ : R^D → R^D be a learnable invertible flow parameterized by φ ∈ Φ; T_φ should be a C¹-diffeomorphism.
◮ Denote by π the target distribution; its parameters are implicit.
◮ Idea: construct a Markov kernel, reversible w.r.t. π, based on T_φ.
◮ T_φ kernel: at each step k,
  ◮ Draw a direction V_{k+1} ∈ {−1, +1} with probabilities (1 − p, p).
  ◮ Define a proposal Y_{k+1} = T_φ^{V_{k+1}}(Z_k).
  ◮ Accept with probability α_{φ,V_{k+1}}(Z_k), where

    \alpha_{\phi,1}(z) = 1 \wedge \frac{1-p}{p} \, \frac{\pi(T_\phi(z))}{\pi(z)} \, J_{T_\phi}(z) ,
    \qquad
    \alpha_{\phi,-1}(z) = 1 \wedge \frac{p}{1-p} \, \frac{\pi(T_\phi^{-1}(z))}{\pi(z)} \, J_{T_\phi^{-1}}(z) .

◮ The next value is proposed using either the forward or the backward mapping.
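A sketch of this kernel with an affine flow T(z) = sz + b (an illustrative stand-in for a learned Real-NVP). Since a single deterministic flow alone need not be irreducible, the check below verifies π-invariance instead of running a long chain: starting from π, one kernel step should leave the marginal unchanged:

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)
s, b, p = 1.3, 0.4, 0.5                      # affine flow T(z) = s*z + b (illustrative choice)
T, Tinv = lambda z: s * z + b, lambda z: (z - b) / s
log_pi = lambda z: norm.logpdf(z, 0, 1)      # target pi = N(0, 1)

def kernel(z):                               # one step of the T_phi kernel
    if rng.uniform() < p:                    # forward direction, v = +1
        y = T(z)
        log_a = np.log((1 - p) / p) + log_pi(y) - log_pi(z) + np.log(s)     # J_T = s
    else:                                    # backward direction, v = -1
        y = Tinv(z)
        log_a = np.log(p / (1 - p)) + log_pi(y) - log_pi(z) - np.log(s)     # J_{T^-1} = 1/s
    return y if np.log(rng.uniform()) < log_a else z

z = rng.normal(size=50_000)                  # start at stationarity, z ~ pi
z1 = np.array([kernel(zi) for zi in z])      # apply the kernel once
print(kstest(z1, 'norm'))                    # pi-invariance: z1 should still look N(0, 1)
```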

SLIDE 45

MetFlow with Normalizing flows

◮ Closely related to the “classical” MCMC framework, taking the direction (V_k) as the innovation noise with distribution ν over {−1, +1}: ν(1) = p, ν(−1) = 1 − p.
◮ In this setting, the conditional Markov kernel is given by

    Q_{\phi,v}(z, A) = \alpha^1_{\phi,v}(z) \, \delta_{T^v_\phi(z)}(A) + \alpha^0_{\phi,v}(z) \, \delta_z(A) ,

where we denote again α^1_{φ,v}(z) = α_{φ,v}(z) and α^0_{φ,v}(z) = 1 − α_{φ,v}(z).
◮ The integrated Markov kernel M_{φ,ν} is defined by

    M_{\phi,\nu}(z, A) = \sum_{v \in \{-1,1\}} \nu(v) \big[ \alpha^1_{\phi,v}(z) \, \delta_{T^v_\phi(z)}(A) + \alpha^0_{\phi,v}(z) \, \delta_z(A) \big] .

◮ Problem: the integration is over a discrete distribution... the proposal distribution does not have a density! We cannot directly apply the classical Metropolis-Hastings argument.

SLIDE 46

MCMC with Normalizing Flows

◮ Marginalizing w.r.t. the direction v ∈ {−1, +1}, the T_φ kernel defines a Markov kernel

    M_\phi(z, A) = p \, \alpha^1_{\phi,1}(z) \, \delta_{T_\phi(z)}(A) + (1 - p) \, \alpha^1_{\phi,-1}(z) \, \delta_{T_\phi^{-1}(z)}(A)
    + \big[ p \, \alpha^0_{\phi,1}(z) + (1 - p) \, \alpha^0_{\phi,-1}(z) \big] \, \delta_z(A) .

◮ [DB14] has shown that M_φ is reversible w.r.t. the target π.
◮ Reversibility is guaranteed because either T_φ(z) or T_φ^{-1}(z) is proposed (see the next slides).

SLIDE 47

Reversibility

Let f, g be positive functions.

    \iint \pi(dz) \, M_\phi(z, dz') \, f(z) \, g(z')
    = \int \pi(z) \, f(z) \, g(T_\phi(z)) \, p \, \alpha^1_{\phi,1}(z) \, dz
    + \int \pi(z) \, f(z) \, g(T_\phi^{-1}(z)) \, (1 - p) \, \alpha^1_{\phi,-1}(z) \, dz
    + \int \pi(z) \, f(z) \, g(z) \big[ p \, \alpha^0_{\phi,1}(z) + (1 - p) \, \alpha^0_{\phi,-1}(z) \big] dz .

It checks out!

SLIDE 48

Reversibility

Change of variables:

    \iint \pi(dz) \, M_\phi(z, dz') \, f(z) \, g(z')
    = \int \pi(T_\phi^{-1}(\tilde{z})) \, f(T_\phi^{-1}(\tilde{z})) \, g(\tilde{z}) \, p \, \alpha^1_{\phi,1}(T_\phi^{-1}(\tilde{z})) \, J_{T_\phi^{-1}}(\tilde{z}) \, d\tilde{z}
    + \int \pi(T_\phi(\tilde{z})) \, f(T_\phi(\tilde{z})) \, g(\tilde{z}) \, (1 - p) \, \alpha^1_{\phi,-1}(T_\phi(\tilde{z})) \, J_{T_\phi}(\tilde{z}) \, d\tilde{z}
    + \int \pi(d\tilde{z}) \, f(\tilde{z}) \, g(\tilde{z}) \big[ p \, \alpha^0_{\phi,1}(\tilde{z}) + (1 - p) \, \alpha^0_{\phi,-1}(\tilde{z}) \big] .

It checks out!

SLIDE 49

Reversibility

Change of variables:

    \iint \pi(dz) \, M_\phi(z, dz') \, f(z) \, g(z')
    = \int \pi(T_\phi^{-1}(\tilde{z})) \, f(T_\phi^{-1}(\tilde{z})) \, g(\tilde{z}) \, p \, \alpha^1_{\phi,1}(T_\phi^{-1}(\tilde{z})) \, J_{T_\phi^{-1}}(\tilde{z}) \, d\tilde{z}
    + \int \pi(T_\phi(\tilde{z})) \, f(T_\phi(\tilde{z})) \, g(\tilde{z}) \, (1 - p) \, \alpha^1_{\phi,-1}(T_\phi(\tilde{z})) \, J_{T_\phi}(\tilde{z}) \, d\tilde{z}
    + \int \pi(d\tilde{z}) \, f(\tilde{z}) \, g(\tilde{z}) \big[ p \, \alpha^0_{\phi,1}(\tilde{z}) + (1 - p) \, \alpha^0_{\phi,-1}(\tilde{z}) \big] .

Reversibility:

    p \, \alpha^1_{\phi,1}(T_\phi^{-1}(z)) \, J_{T_\phi^{-1}}(z) \, \pi(T_\phi^{-1}(z)) = (1 - p) \, \alpha^1_{\phi,-1}(z) \, \pi(z) ,
    \qquad
    (1 - p) \, \alpha^1_{\phi,-1}(T_\phi(z)) \, J_{T_\phi}(z) \, \pi(T_\phi(z)) = p \, \alpha^1_{\phi,1}(z) \, \pi(z) .

It checks out!

SLIDE 50

MetFlow with Normalizing Flows

◮ Because the innovation has a discrete distribution, the proposal distribution does not have a density, and the classical Metropolis-Hastings argument establishing that M_{φ,ν} is reversible w.r.t. π no longer applies directly.
◮ But... most of the results derived above still hold, or can be readily adapted!
◮ In particular, the definition of our new ELBO is still valid, enabling us to learn the parameters θ, φ of a full VAE.

SLIDE 51

MetFlow with Normalizing Flows

◮ Assumption: a sequence (T_{φ,i})_{i=1}^K of C¹ diffeomorphisms.
◮ Idea: transform an initial distribution with density m^0_φ by successively applying the Markov kernels

    M_{\phi,\nu,i}(z, A) = \sum_{v \in \{-1,1\}} \nu(v) \big[ \alpha^1_{\phi,v,i}(z) \, \delta_{T^v_{\phi,i}(z)}(A) + \alpha^0_{\phi,v,i}(z) \, \delta_z(A) \big] .

◮ After K steps, the marginal distribution has a density given by

    m^K_\phi(z) = \sum_{a_K \in \{0,1\}^K} \sum_{v_K \in \{-1,1\}^K} m^K_\phi(z, a_K \mid v_K) \, \nu(v_K) ,

where

    m^K_\phi(z, a_K \mid v_K) = m^0_\phi\Big( \circ_{j=1}^{K} T^{-v_j a_j}_{\phi,j}(z) \Big) \, J_{\circ_{j=1}^{K} T^{-v_j a_j}_{\phi,j}}(z) \, \prod_{i=1}^{K} \alpha^{a_i}_{\phi,v_i,i}\Big( \circ_{j=i}^{K} T^{-v_j a_j}_{\phi,j}(z) \Big) .

◮ A mixture of forward and backward transforms!
◮ Optimization is possible using the MetFlow ELBO.
◮ Idea: it is possible to train MetFlow kernels with Normalizing Flows and to repeat them after training is complete, refining the final distribution at low computational cost (no additional gradient computation).

SLIDE 52

Toy distributions

◮ Target: distributions proposed by [RM15].
◮ Comparison of Real Non-Volume-Preserving (Real-NVP) flows [DSDB16] and our Real-NVP MetFlow. Real-NVP-MetFlow (50) is a specific instance of MetFlow in which more MetFlow kernels are applied after training the original 5.

SLIDE 53

Experiments
◮ Application: Collaborative filtering
◮ MNIST experiments on MetFlow with Normalizing Flows

SLIDE 54

Collaborative Filtering

◮ Collaborative filtering predicts which items a user will prefer by discovering and exploiting the similarity patterns across users and items.
◮ Latent factor models still largely dominate the collaborative filtering research literature due to their simplicity and effectiveness.
  • However, these models are inherently linear, which limits their modeling capacity.
  • Previous work has demonstrated that adding carefully crafted non-linear features to linear latent factor models can significantly boost recommendation performance.
  • Recently, a growing body of work applies neural networks to the collaborative filtering setting, with promising results.
◮ VAEs generalize linear latent-factor models:
  • they enable us to explore non-linear probabilistic latent-variable models, powered by neural networks, on large-scale recommendation datasets.

SLIDE 55

Collaborative Filtering

◮ Data: an incomplete matrix of user-item interactions.
◮ Task: given binary user-item interactions, predict for each user a “complete” set of items to interact with.
  • We use u ∈ {1, . . . , U} to index users and i ∈ {1, . . . , I} to index items.
  • The user-by-item interaction matrix is X ∈ N^{U×I}.
  • x_u = [x_{u,1}, . . . , x_{u,I}]^T ∈ N^I is a binary vector: x_{u,i} = 1 if user u had an interaction with item i.

SLIDE 56

Generative model

◮ For each user u, the model starts by sampling a D-dimensional latent representation z_u from a standard Gaussian prior.
◮ The latent representation z_u is transformed via a non-linear function g_θ to produce a probability distribution π_θ(z_u) over the I items. Here we set π_θ(z) = softmax(g_θ(z)).
◮ Given the total number of interactions N_u = Σ_i x_{u,i}, x_u is assumed to be sampled from

    x_u \mid z_u, N_u \sim \mathrm{Mult}\big(N_u, \pi_\theta(z_u)\big) .

◮ The non-linear function g_θ(·) is a multilayer perceptron with parameters θ.
◮ The log-likelihood for user u, conditioned on the latent representation, is

    \log p_\theta(x_u \mid z_u) = \sum_{i=1}^{I} x_{u,i} \log \pi_{\theta,i}(z_u) .

SLIDE 57

Evaluation of the models

◮ The generative model needs access to the number of items chosen by the user.
◮ To assess performance, use top-n metrics.
◮ Complete the items selected by a user and compare them to all of the selections using

    \mathrm{Recall}@n = \frac{|\text{relevant items} \cap \text{recommended items}|}{|\text{recommended items}|} ,
    \qquad
    \mathrm{nDCG}@n = \frac{\mathrm{DCG}@n}{\mathrm{IDCG}@n} ,

where

    \mathrm{DCG}@n = \sum_{i=1}^{n} \mathrm{rel}(i) / \log_2(i + 1)
    \qquad \text{and} \qquad
    \mathrm{IDCG}@n = \sum_{i=1}^{|R_n|} 1 / \log_2(i + 1) .

R_n: the set of the n relevant items.
rel(i): the relevance of the i-th recommended item in the list, equal to 1 if the item ranked at i is relevant, and 0 otherwise.

SLIDE 58

Datasets & Competitors

◮ Three real-world datasets: Foursquare [YCM+13], Gowalla [CML11], MovieLens.
◮ Preprocessed and binarized to fit the CF task [LKHJ18].
◮ Competitors:
  ◮ MultiVAE [LKHJ18], a VAE for CF.
  ◮ WRMF [HKV08], a weighted regularized matrix factorization for implicit feedback datasets.
  ◮ BPR [RFGST09], a Bayesian ranking method.
  ◮ GlbAvg, a generic naive baseline (recommends the most popular items among all users).

SLIDE 59

Results

Figure: Recommendation scores, in terms of Recall@5, Recall@10 and nDCG@100, of the considered methods (GlbAvg, BPR, WRMF, MultiVAE, MetVAE) on the Foursquare, Gowalla and MovieLens20M datasets. MetVAE shows consistently better results compared to the other methods.

SLIDE 60

MNIST dataset and experiments

◮ MNIST dataset.
◮ Fix a generative model p_θ achieving SOTA results.
◮ First experiment: consider L fixed observations.
◮ Approximate the posterior p_θ(z | (x_i)_{i=1}^L).
◮ Comparison between a NAF (a SOTA Normalizing Flow) and MetFlow with 5 Real-NVP flows.
◮ Similar computational complexity.

SLIDE 61

Mixture of 3 on MNIST

Figure: Mixture of 3 on MNIST. Left to right: fixed digits, NAF, MetFlow.

SLIDE 62

Inpainting on MNIST

◮ In-painting set-up introduced in [LHSD17].
◮ In-paint the top of an image using block Gibbs sampling: given an image x, we denote by x^t, x^b its top-half and bottom-half pixels.
◮ Start from x_0.
◮ At each step, sample z_t ∼ p_θ(z | x_t) and then x̃_t ∼ p_θ(x | z_t).
◮ Set x_{t+1} = (x̃_t^t, x_0^b).
◮ Use three variational approximations for p_θ(z | x): a mean-field approximation, a mean-field with a NAF push-forward, and MetFlow initialized at the mean-field. A sketch of the sampling loop follows below.

Figure: Top to bottom: Mean-Field approximation and MetFlow, Mean-Field approximation, Mean-Field approximation and NAF. Orange samples on the left represent the initialization image.
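A schematic sketch of the block Gibbs loop; the "encoder" and "decoder" below are illustrative linear placeholders for the trained networks, so only the sampling structure is meaningful:

```python
import numpy as np

rng = np.random.default_rng(0)
P, D = 784, 16
top = np.arange(P // 2)                 # indices of the top-half pixels
bot = np.arange(P // 2, P)              # indices of the bottom-half pixels

# Placeholder encoder/decoder standing in for the trained VAE (illustrative linear maps).
We, Wd = rng.normal(size=(D, P)) / P, rng.normal(size=(P, D)) / D
sample_posterior = lambda x: We @ x + 0.1 * rng.normal(size=D)   # stands in for z ~ p_theta(z | x)
decode = lambda z: 1 / (1 + np.exp(-(Wd @ z)))                   # Bernoulli means for p_theta(x | z)

x0 = (rng.uniform(size=P) < 0.2).astype(float)   # initial image; its bottom half is kept fixed
x = x0.copy()
for _ in range(50):                              # block Gibbs sweeps
    z = sample_posterior(x)                      # z_t ~ p_theta(z | x_t)
    x_tilde = (rng.uniform(size=P) < decode(z)).astype(float)    # x~_t ~ p_theta(x | z_t)
    x[top] = x_tilde[top]                        # x_{t+1} = (x~_t^top, x_0^bottom)
    x[bot] = x0[bot]
print("in-painted top-half mean:", x[top].mean())
```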

SLIDE 63

Bibliography I

[BRJM18] N. Bou-Rabee and J. M. Sanz-Serna, Geometric integrators and the Hamiltonian Monte Carlo method, Acta Numerica (2018), 1–92.

[CDS18] Anthony L. Caterini, Arnaud Doucet, and Dino Sejdinovic, Hamiltonian variational auto-encoder, Advances in Neural Information Processing Systems, 2018, pp. 8167–8177.

[CML11] Eunjoon Cho, Seth A. Myers, and Jure Leskovec, Friendship and mobility: user movement in location-based social networks, KDD ’11, 2011.

[DB14] Somak Dutta and Sourabh Bhattacharya, Markov chain Monte Carlo based on deterministic transformations, Statistical Methodology 16 (2014), 100–116.

[DSDB16] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio, Density estimation using Real NVP, arXiv preprint arXiv:1605.08803 (2016).

SLIDE 64

Bibliography II

[HKV08] Yifan Hu, Yehuda Koren, and Chris Volinsky, Collaborative filtering for implicit feedback datasets, Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM ’08), IEEE Computer Society, 2008, pp. 263–272.

[HM19] Matthew D. Hoffman and Yian Ma, Langevin dynamics as nonparametric variational inference.

[HMP+16] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner, beta-VAE: learning basic visual concepts with a constrained variational framework.

[Hof17] Matthew D. Hoffman, Learning deep latent Gaussian models with Markov chain Monte Carlo, Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, PMLR, 2017, pp. 1510–1519.

SLIDE 65

Bibliography III

[HSD+19] Matthew Hoffman, Pavel Sountsov, Joshua V. Dillon, Ian Langmore, Dustin Tran, and Srinivas Vasudevan, NeuTra-lizing bad geometry in Hamiltonian Monte Carlo using neural transport, arXiv preprint arXiv:1903.03704 (2019).

[LHSD17] Daniel Levy, Matthew D. Hoffman, and Jascha Sohl-Dickstein, Generalizing Hamiltonian Monte Carlo with neural networks, arXiv preprint arXiv:1711.09268 (2017).

[LKHJ18] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara, Variational autoencoders for collaborative filtering, Proceedings of the 2018 World Wide Web Conference (WWW ’18), International World Wide Web Conferences Steering Committee, 2018, pp. 689–698.

[Nea11] R. M. Neal, MCMC using Hamiltonian dynamics, Handbook of Markov Chain Monte Carlo (2011), 113–162.

SLIDE 66

Bibliography IV

[RFGST09] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI ’09), AUAI Press, 2009, pp. 452–461.

[RM15] Danilo Rezende and Shakir Mohamed, Variational inference with normalizing flows, International Conference on Machine Learning, 2015, pp. 1530–1538.

[SKW15] Tim Salimans, Diederik Kingma, and Max Welling, Markov chain Monte Carlo and variational inference: bridging the gap, International Conference on Machine Learning, 2015, pp. 1218–1226.

[SRM+16] Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther, Ladder variational autoencoders, Advances in Neural Information Processing Systems, 2016, pp. 3738–3746.

SLIDE 67

Bibliography V

[YCM+13] Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thalmann, Time-aware point-of-interest recommendation, Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’13), ACM, 2013, pp. 363–372.
