SLIDE 1

Variational Auto-encoders

Lecture 3

SLIDE 2

INTRODUCTION

In this talk I will describe in some detail the paper by Kingma and Welling, "Auto-Encoding Variational Bayes," International Conference on Learning Representations (ICLR), 2014, arXiv:1312.6114 [stat.ML].

SLIDE 3

INTRODUCTION

[Diagram: a basic auto-encoder; the Input is encoded into a Hidden layer and decoded back to the Output]

SLIDE 4

MANIFOLD HYPOTHESIS

  • x is a high-dimensional vector.
  • The data is concentrated around a low-dimensional manifold.
  • We hope to find a representation z of that manifold.

SLIDE 5

MANIFOLD HYPOTHESIS

[Figure: a low-dimensional representation (a line, coordinate z1) embedded in the high-dimensional pixel space (x1, x2); P(X | Z) maps the 1D representation into 2D, or a 2D representation into 3D]

credit: http://www.deeplearningbook.org/

SLIDE 6

PRINCIPAL IDEA: ENCODER NETWORK

  • We have a set of N observations (e.g. images) {x(1), x(2), …, x(N)}.
  • We use a complex model parameterized by θ.
  • There is a latent space z with
      z ~ p(z), a multivariate Gaussian,
      x | z ~ pθ(x | z).
  • We wish to learn θ from the N training observations x(i), i = 1, …, N.

SLIDE 7

TRAINING AS AN AUTOENCODER

  • Training uses maximum likelihood: maximize p(x) given the training data.
  • Problem: the posterior pθ(z | x) cannot be calculated.
  • Solutions:
      • MCMC (too costly)
      • Approximate pθ(z | x) with qφ(z | x)

SLIDE 8

MODEL FOR DECODER NETWORK

  • We want a complex model of the distribution of x given z.
  • Idea: a neural network outputs the parameters of a Gaussian (or Bernoulli), here with diagonal covariance Σ (see the sketch below).
  • For illustration, z is one-dimensional and x is 2D.

[Figure: a network maps z to (µx1, σ²x1) and (µx2, σ²x2), so that x | z ~ N(µx, σ²x), i.e. pθ(x | z) is Gaussian]
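The following is a minimal NumPy sketch of this decoder idea: a tiny network maps a one-dimensional z to the mean and log-variance of a diagonal Gaussian over a 2-D x. The layer sizes, the tanh hidden layer, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative decoder weights: z (1-D) -> parameters of a diagonal Gaussian over x (2-D)
W1 = rng.normal(scale=0.1, size=(16, 1)); b1 = np.zeros(16)      # hidden layer
W_mu = rng.normal(scale=0.1, size=(2, 16)); b_mu = np.zeros(2)   # mean head (mu_x1, mu_x2)
W_lv = rng.normal(scale=0.1, size=(2, 16)); b_lv = np.zeros(2)   # log-variance head

def decode(z):
    """Return (mu_x, sigma2_x) of p_theta(x | z) for a scalar latent z."""
    h = np.tanh(W1 @ np.atleast_1d(z) + b1)
    mu = W_mu @ h + b_mu
    sigma2 = np.exp(W_lv @ h + b_lv)      # predict log-variance; exponentiate for positivity
    return mu, sigma2

def log_p_x_given_z(x, z):
    """Diagonal-Gaussian log-density log p_theta(x | z)."""
    mu, sigma2 = decode(z)
    return float(np.sum(-0.5 * (np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)))

# Example: evaluate an (untrained) decoder at z = 0.3 for a 2-D point
print(log_p_x_given_z(np.array([0.2, -0.1]), 0.3))
```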

SLIDE 9

COMPLETE AUTO-ENCODER

  • Encoder qφ(z | x) and decoder pθ(x | z) together form the auto-encoder.
  • The parameters φ and θ are learned via backpropagation.
  • What remains is determining the loss function.

SLIDE 10

TRAINING: LOSS FUNCTION

  • What is (one of) the most beautiful ideas in statistics? Maximum likelihood: tune φ and θ to maximize the likelihood.
  • We maximize the (log-)likelihood of a given "image" x(i) from the training set.
  • Later we sum over all training data (using minibatches).

SLIDE 11

LOWER BOUND OF LIKELIHOOD

For an image x(i) from the training set (writing x = x(i) for short), the log-likelihood decomposes as

  log pθ(x) = DKL( qφ(z | x) ‖ pθ(z | x) ) + Lv

  • DKL is the KL divergence; it is ≥ 0 and depends on how well q(z | x) can approximate p(z | x).
  • Lv is the "variational lower bound" of the (log-)likelihood; Lv = log pθ(x) for a perfect approximation.

SLIDE 12

APPROXIMATE INFERENCE

For an example x(i), the lower bound splits into two terms:

  Lv = E_{z ~ qφ(z | x(i))} [ log pθ(x(i) | z) ] − DKL( qφ(z | x(i)) ‖ p(z) )

  • Reconstruction quality: the first term equals log(1) = 0 if x(i) is always reconstructed perfectly (z produces x(i)).
  • Regularisation: the second term; p(z) is usually a simple prior, N(0,1).

SLIDE 13

CALCULATION OF THE REGULARIZATION

  • Use N(0,1) as the prior p(z).
  • q(z | x(i)) is Gaussian with parameters (µ(i), σ(i)) determined by the NN.
  • The KL term then has a closed form (see the sketch below):
      −DKL( q(z | x(i)) ‖ p(z) ) = ½ Σj ( 1 + log σj²(i) − µj²(i) − σj²(i) )
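As a quick check of the closed-form expression above, here is a small NumPy sketch; the function implements the standard formula from Kingma and Welling, while the example values are made up.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """DKL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

# Example with a 3-dimensional latent code (illustrative numbers)
mu = np.array([0.5, -0.2, 0.0])
log_var = np.array([-0.1, 0.3, 0.0])
print(kl_to_standard_normal(mu, log_var))   # ~0.17; exactly 0 when mu = 0 and log_var = 0
```
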
SLIDE 14

SAMPLING TO CALCULATE

The expectation over qφ(z | x(i)) is estimated by sampling:

  E_{z ~ qφ(z | x(i))} [ log pθ(x(i) | z) ] ≈ (1/L) Σ_{l=1..L} log pθ(x(i) | z(i,l)),
  where z(i,l) ~ N(µz(i), σz²(i)).

SLIDE 15

A USEFUL TRICK

  • Backpropagation is not possible through random sampling!
  • Sampling (reparametrization trick): instead of drawing z(i,l) ~ N(µ(i), σ²(i)) directly, write
      z(i,l) = µ(i) + σ(i) ⊙ ε,   ε ~ N(0,1).
  • Writing z in this form splits it into a deterministic part and noise. One cannot backpropagate through a randomly drawn number, but z has the same distribution, and now one can backpropagate, since the randomness enters only through ε (see the sketch below).
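A minimal NumPy sketch of the reparametrized sampling step, assuming a diagonal-Gaussian posterior; names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_z(mu, sigma, L=1):
    """Reparametrized sampling: z = mu + sigma * eps with eps ~ N(0, I).

    mu and sigma are deterministic outputs of the encoder, so gradients can flow
    through them; the randomness enters only through eps, which needs no gradient.
    """
    eps = rng.standard_normal(size=(L,) + mu.shape)
    return mu + sigma * eps                 # shape (L, dim_z): L samples z^(i,l)

mu = np.array([0.5, -1.0])
sigma = np.array([0.2, 0.3])
print(sample_z(mu, sigma, L=3))
```
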
SLIDE 16

PUTTING IT ALL TOGETHER

  • Prior p(z) ~ N(0,1), and p, q Gaussian; the extension to dim(z) > 1 is trivial.
  • Cost: reconstruction term plus regularisation term (see the sketch below).
  • We use mini-batch gradient descent to optimize the cost function over all x(i) in the mini-batch.
  • For constant variance the Gaussian reconstruction cost reduces to least squares.

[Figure: the encoder maps (x1, x2) to (µz1, σ²z1); the decoder maps z1 back to (µx1, σ²x1) and (µx2, σ²x2)]
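Here is a short NumPy sketch that combines the pieces from the last few slides into the cost for one mini-batch: closed-form KL regularisation plus a Monte-Carlo reconstruction term with reparametrized samples, assuming a standard-normal prior and a constant-variance decoder so that the reconstruction cost reduces to least squares. The toy encoder/decoder and all names are illustrative, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_cost(x_batch, encode, decode, L=1):
    """Negative variational lower bound, averaged over a mini-batch.

    encode(x) -> (mu_z, log_var_z); decode(z) -> mu_x.
    Constant decoder variance => the reconstruction cost is a squared error.
    """
    total = 0.0
    for x in x_batch:
        mu_z, log_var_z = encode(x)
        sigma_z = np.exp(0.5 * log_var_z)
        # Regularisation: DKL( q(z|x) || N(0, I) ) in closed form
        kl = -0.5 * np.sum(1.0 + log_var_z - mu_z**2 - np.exp(log_var_z))
        # Reconstruction: Monte-Carlo estimate with L reparametrized samples
        rec = 0.0
        for _ in range(L):
            z = mu_z + sigma_z * rng.standard_normal(mu_z.shape)
            rec += 0.5 * np.sum((x - decode(z)) ** 2)
        total += rec / L + kl
    return total / len(x_batch)

# Toy usage with linear "networks" (purely illustrative)
dim_x, dim_z = 4, 2
A = rng.normal(size=(dim_z, dim_x))
B = rng.normal(size=(dim_x, dim_z))
encode = lambda x: (A @ x, np.zeros(dim_z))      # returns (mu_z, log_var_z)
decode = lambda z: B @ z                         # returns mu_x
x_batch = rng.normal(size=(8, dim_x))
print(vae_cost(x_batch, encode, decode, L=2))
```
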

SLIDE 17

PUTTING IT ALL TOGETHER

[Figure: the complete variational auto-encoder architecture]

SLIDE 18

Denoising Auto-encoders

Lecture 4

SLIDE 19

INTRODUCTION

Denoising autoencoders for learning deep networks. For more details, see:

  • P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and Composing Robust Features with Denoising Autoencoders," Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 1096–1103, Omnipress, 2008.

SLIDE 20

INTRODUCTION

Building good predictors on complex domains means learning complicated functions. These are best represented by multiple levels of non-linear operations, i.e. deep architectures. Deep architectures are an old idea: multi-layer perceptrons. Learning the parameters of deep architectures proved to be challenging!

SLIDE 21

MAIN IDEA

Open question: what would make a good unsupervised criterion for finding good initial intermediate representations?

Inspiration: our ability to "fill in the blanks" in sensory input: missing pixels, small occlusions, image from sound, …

Good fill-in-the-blanks performance ↔ the distribution is well captured → the old notion of associative memory (which motivated Hopfield models (Hopfield, 1982)).

What we propose: unsupervised initialization by explicit fill-in-the-blanks training.

SLIDE 22

DENOISING AUTOENCODER

  • The clean input x ∈ [0, 1]^d is partially destroyed, yielding the corrupted input x̃ ~ qD(x̃ | x).
  • x̃ is mapped to a hidden representation y = fθ(x̃).
  • From y we reconstruct z = gθ′(y).
  • Train the parameters to minimize the cross-entropy "reconstruction error" LH(x, z) = H(Bx ‖ Bz), where Bx denotes the multivariate Bernoulli distribution with parameter x.

[Figure: x → qD → x̃ → fθ → y → gθ′ → z, with z compared to the clean x via LH(x, z)]

SLIDE 27

NOISE PROCESS

Choose a fixed proportion ν of the components of x at random and reset their values to 0. This can be viewed as replacing a component considered missing by a default value. Other corruption processes are possible.
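A one-function NumPy sketch of this masking noise, with ν as the destroyed fraction; the function name and example are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, nu):
    """Corruption q_D(x_tilde | x): reset a random fraction nu of the components to 0."""
    n_destroy = int(round(nu * x.size))
    idx = rng.choice(x.size, size=n_destroy, replace=False)
    x_tilde = x.copy()
    x_tilde[idx] = 0.0
    return x_tilde

x = rng.uniform(size=10)        # clean input in [0, 1]^d
print(corrupt(x, nu=0.25))      # about 25% of the components reset to 0
```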

SLIDE 28

ENCODER - DECODER

We use standard sigmoid network layers:

  y = fθ(x̃) = sigmoid(W x̃ + b),    with W of size d′ × d and b of size d′ × 1,
  z = gθ′(y) = sigmoid(W′ y + b′),  with W′ of size d × d′ and b′ of size d × 1,

and the cross-entropy loss.
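To make these equations concrete, here is a small NumPy sketch of one stochastic-gradient step of a denoising autoencoder with exactly these sigmoid layers and the cross-entropy reconstruction error. The sizes, learning rate, and the simple masking corruption are illustrative choices rather than the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

d, d_prime = 8, 5                                # input size d and hidden size d'
W  = rng.normal(scale=0.1, size=(d_prime, d)); b  = np.zeros(d_prime)
Wp = rng.normal(scale=0.1, size=(d, d_prime)); bp = np.zeros(d)

def f_theta(x_tilde):                            # encoder  y = sigmoid(W x_tilde + b)
    return sigmoid(W @ x_tilde + b)

def g_theta_prime(y):                            # decoder  z = sigmoid(W' y + b')
    return sigmoid(Wp @ y + bp)

def cross_entropy(x, z):                         # L_H(x, z) = H(B_x || B_z)
    return -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z))

def sgd_step(x, x_tilde, lr=0.1):
    """One gradient step on L_H(x, g(f(x_tilde))): reconstruct the *clean* x."""
    global W, b, Wp, bp
    y = f_theta(x_tilde)
    z = g_theta_prime(y)
    dz = z - x                                   # gradient w.r.t. the decoder pre-activation
    dy = (Wp.T @ dz) * y * (1 - y)               # backprop through the encoder sigmoid
    Wp -= lr * np.outer(dz, y); bp -= lr * dz
    W  -= lr * np.outer(dy, x_tilde); b -= lr * dy
    return cross_entropy(x, z)

x = rng.uniform(size=d)                          # clean input in [0, 1]^d
x_tilde = x * (rng.uniform(size=d) > 0.25)       # masking corruption with nu ~ 0.25
print(sgd_step(x, x_tilde))
```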

SLIDE 29

ENCODER - DECODER

Denoising is a fundamentally different task. Think of the classical autoencoder in the overcomplete case, d′ ≥ d: perfect reconstruction is possible without having learnt anything useful! The denoising autoencoder does learn a useful representation in this case, because being good at denoising requires capturing structure in the input.

Denoising using classical autoencoders was actually introduced much earlier (LeCun, 1987; Gallinari et al., 1987), as an alternative to Hopfield networks (Hopfield, 1982).

SLIDE 30

LAYER-WISE INITIALIZATION

  1. Learn the first mapping fθ by training it as a denoising autoencoder.
  2. Remove the scaffolding (qD, gθ′, LH). Use fθ directly on the input, yielding a higher-level representation.
  3. Learn the next-level mapping f(2)θ by training a denoising autoencoder on the current-level representation.
  4. Iterate to initialize subsequent layers (see the sketch below).
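Below is a schematic NumPy sketch of this greedy layer-wise recipe, reusing the same kind of tiny sigmoid denoising autoencoder as in the encoder-decoder sketch above; all sizes, hyper-parameters, and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def train_denoising_autoencoder(data, n_hidden, nu=0.25, lr=0.1, epochs=5):
    """Tiny DAE trainer (sigmoid layers, cross-entropy, masking noise).
    Returns the learned encoder f: x -> sigmoid(W x + b)."""
    d = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_hidden, d)); b = np.zeros(n_hidden)
    Wp = rng.normal(scale=0.1, size=(d, n_hidden)); bp = np.zeros(d)
    for _ in range(epochs):
        for x in data:
            x_t = x * (rng.uniform(size=d) > nu)           # corrupt
            y = sigmoid(W @ x_t + b); z = sigmoid(Wp @ y + bp)
            dz = z - x; dy = (Wp.T @ dz) * y * (1 - y)     # backprop
            Wp -= lr * np.outer(dz, y); bp -= lr * dz
            W -= lr * np.outer(dy, x_t); b -= lr * dy
    return lambda x, W=W, b=b: sigmoid(W @ x + b)

def greedy_layerwise_init(data, layer_sizes):
    """Steps 1-4: train a DAE per layer, then push the clean data through its encoder."""
    encoders, rep = [], data
    for n_hidden in layer_sizes:
        f = train_denoising_autoencoder(rep, n_hidden)     # steps 1 and 3
        encoders.append(f)
        rep = np.array([f(r) for r in rep])                # step 2: remove the scaffolding
    return encoders                                        # step 4: one encoder per layer

data = rng.uniform(size=(100, 16))                         # toy "images" in [0, 1]^16
stack = greedy_layerwise_init(data, layer_sizes=[12, 8, 4])
print(len(stack), stack[-1](stack[0](data[0])).shape if False else "3 encoders trained")
```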

SLIDE 36

SUPERVISED FINE-TUNING

  • The initial deep mapping was learnt in an unsupervised way → it is used as the initialization for a supervised task.
  • An output layer gets added.
  • Global fine-tuning by gradient descent on the supervised criterion (see the sketch below).

[Figure: the stack fθ, f(2)θ, f(3)θ on input x, topped by a supervised output layer f sup θ trained against the target under the supervised cost]
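A NumPy sketch of the fine-tuning stage under simplifying assumptions: a softmax output layer is added on top of the pretrained stack and trained by gradient descent on the supervised cross-entropy. For brevity only the new output layer is updated here; the global fine-tuning described on the slide would additionally backpropagate the supervised gradient into the pretrained layers. All names and the toy "pretrained" layers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def add_and_train_output_layer(encoders, data, labels, n_classes, lr=0.1, epochs=20):
    """Add a softmax output layer on top of the pretrained stack and train it with
    gradient descent on the supervised cross-entropy criterion."""
    def features(x):
        for f in encoders:                       # forward pass through the pretrained layers
            x = f(x)
        return x
    d_top = features(data[0]).size
    V = rng.normal(scale=0.1, size=(n_classes, d_top)); c = np.zeros(n_classes)
    for _ in range(epochs):
        for x, t in zip(data, labels):
            h = features(x)
            p = softmax(V @ h + c)
            grad = p.copy(); grad[t] -= 1.0      # d(cross-entropy)/d(pre-activation)
            V -= lr * np.outer(grad, h); c -= lr * grad
    return lambda x: int(np.argmax(V @ features(x) + c))

# Toy usage: two random sigmoid layers stand in for the pretrained stack (labels are random)
W1, W2 = rng.normal(size=(12, 16)), rng.normal(size=(8, 12))
encoders = [lambda x: sigmoid(W1 @ x), lambda x: sigmoid(W2 @ x)]
data = rng.uniform(size=(50, 16))
labels = rng.integers(0, 3, size=50)
predict = add_and_train_output_layer(encoders, data, labels, n_classes=3)
print(predict(data[0]))
```
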

SLIDE 39

MANIFOLD LEARNING PERSPECTIVE

The denoising autoencoder can be seen as a way to learn a manifold:

  • Suppose the training data (×) concentrate near a low-dimensional manifold.
  • Corrupted examples (·) are obtained by applying the corruption process qD(X̃ | X) and will lie farther from the manifold.
  • The model learns, via p(X | X̃), to "project them back" onto the manifold.
  • The intermediate representation Y can be interpreted as a coordinate system for points on the manifold.

[Figure: points x on the manifold, corrupted points x̃ away from it, and the reconstruction gθ′(fθ(x̃)) projecting back]

SLIDE 40

INFORMATION THEORETIC PERSPECTIVE

Consider X ~ q(X), with q unknown, X̃ ~ qD(X̃ | X), and Y = fθ(X̃). It can be shown that minimizing the expected reconstruction error amounts to maximizing a lower bound on the mutual information I(X; Y). Denoising-autoencoder training can thus be justified by the objective that the hidden representation Y captures as much information as possible about X, even though Y is a function of the corrupted input.

SLIDE 41

GENERATIVE MODEL PERSPECTIVE

Denoising-autoencoder training can be shown to be equivalent to maximizing a variational bound on the likelihood of a generative model for the corrupted data.

[Figure: two graphical models over the data X, hidden factors Y, and corrupted data X̃ (only X̃ observed): the variational model and the generative model]

SLIDE 42

VARIATIONS ON MNIST DIGIT CLASSIFICATION

  • basic: subset of the original MNIST digits: 10 000 training, 2 000 validation, and 50 000 test samples.
  • rot: random rotation applied (angle between 0 and 2π radians).
  • bg-rand: background made of random pixels (values in 0…255).
  • bg-img: background is a random patch from one of 20 images.
  • rot-bg-img: combination of rotation and background image.

SLIDE 43

SHAPE DISCRIMINATION

  • rect: discriminate between tall and wide rectangles on a black background.
  • rect-img: borderless rectangle filled with a random image patch; the background is a different image patch.
  • convex: discriminate between convex and non-convex shapes.

SLIDE 44

EXPERIMENTATION

We compared the following algorithms on the benchmark problems:

  • SVMrbf: Support Vector Machines with a Gaussian kernel.
  • DBN-3: Deep Belief Nets with 3 hidden layers (stacked Restricted Boltzmann Machines trained with contrastive divergence).
  • SAA-3: Stacked Autoassociators with 3 hidden layers (no denoising).
  • SdA-3: Stacked Denoising Autoassociators with 3 hidden layers.

Hyper-parameters for all algorithms were tuned based on classification performance on the validation set (in particular the hidden-layer sizes, and ν for SdA-3).

SLIDE 45

PERFORMANCE COMPARISON

Classification error (%); ν denotes the fraction of destroyed input components.

Dataset      SVMrbf       DBN-3        SAA-3        SdA-3 (ν)          SVMrbf (ν)
basic        3.03±0.15    3.11±0.15    3.46±0.16    2.80±0.14 (10%)    3.07 (10%)
rot          11.11±0.28   10.30±0.27   10.30±0.27   10.29±0.27 (10%)   11.62 (10%)
bg-rand      14.58±0.31   6.73±0.22    11.28±0.28   10.38±0.27 (40%)   15.63 (25%)
bg-img       22.61±0.37   16.31±0.32   23.00±0.37   16.68±0.33 (25%)   23.15 (25%)
rot-bg-img   55.18±0.44   47.39±0.44   51.93±0.44   44.49±0.44 (25%)   54.16 (10%)
rect         2.15±0.13    2.60±0.14    2.41±0.13    1.99±0.12 (10%)    2.45 (25%)
rect-img     24.04±0.37   22.50±0.37   24.05±0.37   21.59±0.36 (25%)   23.00 (10%)
convex       19.13±0.34   18.63±0.34   18.41±0.34   19.06±0.34 (10%)   24.20 (10%)

SLIDE 52

LEARNT FILTERS (0% DESTROYED)

[Figure: learnt filters]

SLIDE 53

LEARNT FILTERS (10% DESTROYED)

[Figure: learnt filters]

SLIDE 54

LEARNT FILTERS (25% DESTROYED)

[Figure: learnt filters]

SLIDE 55

LEARNT FILTERS (50% DESTROYED)

[Figure: learnt filters]

SLIDE 56

CONCLUDING REMARKS

  • Unsupervised initialization of layers with an explicit denoising criterion appears to help capture interesting structure in the input distribution.
  • This leads to intermediate representations much better suited for subsequent learning tasks such as supervised classification.
  • The resulting algorithm for learning deep networks is simple and improves on the state of the art on benchmark problems.
  • Although our experimental focus was supervised classification, SdA is directly usable in a semi-supervised setting.
  • We are currently investigating the effect of different types of corruption process, and applying the technique to recurrent nets.

SLIDE 57

READINGS

  • Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy layer-wise training of deep networks. In NIPS 19.
  • Gallinari, P., LeCun, Y., Thiria, S., and Fogelman-Soulié, F. (1987). Mémoires associatives distribuées. In Proceedings of COGNITIVA 87, Paris, La Villette.
  • Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554.
  • Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79.
  • LeCun, Y. (1987). Modèles connexionistes de l'apprentissage. PhD thesis, Université de Paris VI.

SLIDE 58

READINGS

  • Ranzato, M., Poultney, C., Chopra, S., and LeCun, Y. (2007). Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems (NIPS 2006). MIT Press.
  • Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323:533–536.