  1. Variational Autoencoders. Tom Fletcher, March 25, 2019.

  2. Talking about this paper: Diederik Kingma and Max Welling, Auto-Encoding Variational Bayes, in International Conference on Learning Representations (ICLR), 2014.

  3. Autoencoders. Input $x \in \mathbb{R}^D$ → Latent Space $z \in \mathbb{R}^d$ → Output $x' \in \mathbb{R}^D$, with $d \ll D$.

  4. Autoencoders
     ◮ Linear activation functions give you PCA
     ◮ Training:
        1. Given data $x$, feed forward to the output $x'$
        2. Compute a loss, e.g., $L(x, x') = \| x - x' \|^2$
        3. Backpropagate the loss gradient to update the weights
     ◮ Not a generative model!
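
A minimal sketch of this training loop in PyTorch (illustrative only; the layer sizes, optimizer, and random stand-in data are assumptions, not from the slides):

    import torch
    import torch.nn as nn

    D, d = 784, 32                      # input and latent dimensions (assumed values)

    # Encoder R^D -> R^d and decoder R^d -> R^D with nonlinear activations.
    # (With purely linear activations this reduces to a PCA-like solution.)
    encoder = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, d))
    decoder = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, D))

    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

    x = torch.rand(128, D)              # stand-in batch of data

    for step in range(100):
        z = encoder(x)                                    # 1. feed forward to the latent code
        x_prime = decoder(z)                              #    and back to the reconstruction x'
        loss = ((x - x_prime) ** 2).sum(dim=1).mean()     # 2. L(x, x') = ||x - x'||^2
        opt.zero_grad()
        loss.backward()                                   # 3. backpropagate the loss gradient
        opt.step()                                        #    and update the weights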

  5. Variational Autoencoders. Input $x \in \mathbb{R}^D$ → Encoder outputs $\mu, \sigma^2$ → Latent Space $z \sim N(\mu, \sigma^2)$ → Output $x' \in \mathbb{R}^D$.
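
A sketch of an encoder that outputs the distribution parameters $\mu$ and $\sigma^2$ instead of a single code (illustrative; the architecture and the log-variance parameterization are assumptions):

    import torch
    import torch.nn as nn

    D, d = 784, 32                              # assumed input / latent dimensions

    class GaussianEncoder(nn.Module):
        """Maps x in R^D to the parameters (mu, sigma^2) of N(mu, sigma^2) over z in R^d."""
        def __init__(self):
            super().__init__()
            self.hidden = nn.Sequential(nn.Linear(D, 256), nn.ReLU())
            self.mu = nn.Linear(256, d)         # mean head
            self.log_var = nn.Linear(256, d)    # log sigma^2 head (log keeps sigma^2 positive)

        def forward(self, x):
            h = self.hidden(x)
            return self.mu(h), self.log_var(h)

    mu, log_var = GaussianEncoder()(torch.rand(8, D))
    z = torch.distributions.Normal(mu, torch.exp(0.5 * log_var)).sample()   # z ~ N(mu, sigma^2)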

  6. Generative Models. Sample a new $x$ in two steps:
        1. Prior: $z \sim p(z)$
        2. Generator: $x \sim p_\theta(x \mid z)$
     Now the analogy to the “encoder” is the posterior: $p(z \mid x)$.
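
A toy sketch of the two-step sampling, with a made-up linear generator standing in for $p_\theta(x \mid z)$:

    import numpy as np

    rng = np.random.default_rng(0)
    d, D = 2, 5
    theta = rng.normal(size=(D, d))          # stand-in generator parameters

    # Step 1 (prior): draw a latent code z ~ p(z) = N(0, I)
    z = rng.normal(size=d)

    # Step 2 (generator): draw x ~ p_theta(x | z), here N(theta @ z, 0.1^2 I)
    x = theta @ z + 0.1 * rng.normal(size=D)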

  7. Posterior Inference. Posterior via Bayes' rule:
        $p(z \mid x) = \dfrac{p_\theta(x \mid z)\, p(z)}{\int p_\theta(x \mid z)\, p(z)\, dz}$
     The integral in the denominator is (usually) intractable! Could use Monte Carlo to approximate it, but that is expensive.
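
A sketch of the naive Monte Carlo approximation of that denominator, $p(x) \approx \frac{1}{S} \sum_s p_\theta(x \mid z^{(s)})$ with $z^{(s)} \sim p(z)$, using the same kind of toy linear generator (assumed for illustration). Most prior samples explain $x$ poorly, which is why a stable estimate needs many samples:

    import numpy as np

    rng = np.random.default_rng(0)
    d, D = 2, 5
    theta = rng.normal(size=(D, d))                              # toy generator: p(x|z) = N(theta @ z, 0.1^2 I)
    x = theta @ rng.normal(size=d) + 0.1 * rng.normal(size=D)    # one observed data point

    S = 100_000                                                  # many prior samples for a stable estimate
    z = rng.normal(size=(S, d))                                  # z^(s) ~ p(z) = N(0, I)
    means = z @ theta.T                                          # theta @ z^(s), shape (S, D)
    # log N(x; mean, 0.1^2 I) for every sample
    log_lik = -0.5 * (D * np.log(2 * np.pi * 0.01) + ((x - means) ** 2).sum(axis=1) / 0.01)
    # log( (1/S) sum_s p(x | z^(s)) ), computed stably with log-sum-exp
    log_px = np.logaddexp.reduce(log_lik) - np.log(S)
    print(log_px)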

  8. Kullback-Leibler Divergence
        $D_{KL}(q \,\|\, p) = -\int q(z) \log\!\left(\dfrac{p(z)}{q(z)}\right) dz = \mathbb{E}_q\!\left[-\log \dfrac{p}{q}\right]$
     The average information gained from moving from $q$ to $p$.
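
A quick numerical illustration: estimate $D_{KL}(q \,\|\, p) = \mathbb{E}_q[\log q - \log p]$ by sampling from $q$, and compare with the known closed form for two 1-D Gaussians (the particular distributions are arbitrary choices):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    q = norm(loc=0.0, scale=1.0)                   # q = N(0, 1)
    p = norm(loc=1.0, scale=2.0)                   # p = N(1, 2^2)  (arbitrary example)

    z = rng.normal(0.0, 1.0, size=200_000)         # samples from q
    kl_mc = np.mean(q.logpdf(z) - p.logpdf(z))     # E_q[ -log(p/q) ] = E_q[ log q - log p ]

    # Closed form for two Gaussians: log(s_p/s_q) + (s_q^2 + (m_q - m_p)^2) / (2 s_p^2) - 1/2
    kl_exact = np.log(2.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5
    print(kl_mc, kl_exact)                         # both close to 0.443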

  9. Variational Inference. Approximate the intractable posterior $p(z \mid x)$ with a manageable distribution $q(z)$. Minimize the KL divergence: $D_{KL}(q(z) \,\|\, p(z \mid x))$.

  10. Evidence Lower Bound (ELBO)
        $D_{KL}(q(z) \,\|\, p(z \mid x)) = \mathbb{E}_q\!\left[-\log \dfrac{p(z \mid x)}{q(z)}\right]$
        $= \mathbb{E}_q\!\left[-\log \dfrac{p(z, x)}{q(z)\, p(x)}\right]$
        $= \mathbb{E}_q\left[-\log p(z, x) + \log q(z) + \log p(x)\right]$
        $= -\mathbb{E}_q[\log p(z, x)] + \mathbb{E}_q[\log q(z)] + \log p(x)$
      Rearranging: $\log p(x) = D_{KL}(q(z) \,\|\, p(z \mid x)) + \mathcal{L}[q(z)]$
      ELBO: $\mathcal{L}[q(z)] = \mathbb{E}_q[\log p(z, x)] - \mathbb{E}_q[\log q(z)]$
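
The identity $\log p(x) = D_{KL}(q(z) \,\|\, p(z \mid x)) + \mathcal{L}[q(z)]$ can be checked numerically on a tiny discrete model (the probability tables below are made up purely for the check):

    import numpy as np

    # Toy joint p(z, x) over 3 latent states and 2 observed states (made-up numbers, sums to 1)
    p_zx = np.array([[0.10, 0.20],
                     [0.15, 0.05],
                     [0.30, 0.20]])          # rows: z, columns: x
    x = 0                                    # condition on one observation
    p_x = p_zx[:, x].sum()                   # evidence p(x)
    p_z_given_x = p_zx[:, x] / p_x           # exact posterior p(z | x)

    q = np.array([0.5, 0.2, 0.3])            # arbitrary approximate posterior q(z)

    kl = np.sum(q * np.log(q / p_z_given_x))                       # D_KL(q || p(z|x))
    elbo = np.sum(q * np.log(p_zx[:, x])) - np.sum(q * np.log(q))  # E_q[log p(z,x)] - E_q[log q(z)]

    print(np.log(p_x), kl + elbo)            # identical up to floating point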

  11. Variational Autoencoder. Encoder network: $q_\phi(z \mid x)$. Decoder network: $p_\theta(x \mid z)$.
      Maximize the ELBO: $\mathcal{L}(\theta, \phi, x) = \mathbb{E}_{q_\phi}[\log p_\theta(x, z) - \log q_\phi(z \mid x)]$

  12. VAE ELBO
        $\mathcal{L}(\theta, \phi, x) = \mathbb{E}_{q_\phi}[\log p_\theta(x, z) - \log q_\phi(z \mid x)]$
        $= \mathbb{E}_{q_\phi}[\log p_\theta(z) + \log p_\theta(x \mid z) - \log q_\phi(z \mid x)]$
        $= \mathbb{E}_{q_\phi}\!\left[\log \dfrac{p_\theta(z)}{q_\phi(z \mid x)} + \log p_\theta(x \mid z)\right]$
        $= -D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z)) + \mathbb{E}_{q_\phi}[\log p_\theta(x \mid z)]$
      Problem: the gradient $\nabla_\phi \mathbb{E}_{q_\phi}[\log p_\theta(x \mid z)]$ is intractable! Use a Monte Carlo approximation, sampling $z^{(s)} \sim q_\phi(z \mid x)$:
        $\nabla_\phi \mathbb{E}_{q_\phi}[\log p_\theta(x \mid z)] \approx \dfrac{1}{S} \sum_{s=1}^{S} \log p_\theta(x \mid z^{(s)})\, \nabla_\phi \log q_\phi(z^{(s)})$
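
A sketch of that score-function (REINFORCE-style) estimator on a 1-D problem where the exact gradient is known; here $q_\phi = N(\phi, 1)$ and $f(z)$ stands in for $\log p_\theta(x \mid z)$ (all specific choices are assumptions for illustration). The estimate is unbiased but noisy, which motivates the reparameterization trick on the next slide:

    import numpy as np

    rng = np.random.default_rng(0)

    phi = 1.5                                   # variational parameter: q_phi(z) = N(phi, 1)
    f = lambda z: -z**2                         # stand-in for log p_theta(x | z)
    # Exact: E_q[f(z)] = -(phi^2 + 1), so d/dphi E_q[f(z)] = -2*phi = -3.0

    S = 10_000
    z = rng.normal(phi, 1.0, size=S)            # z^(s) ~ q_phi(z | x)
    score = z - phi                             # grad_phi log N(z; phi, 1) = z - phi
    grad_est = np.mean(f(z) * score)            # (1/S) sum_s f(z^(s)) * grad_phi log q_phi(z^(s))

    print(grad_est, -2 * phi)                   # noisy estimate vs. exact gradient -3.0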

  13. Reparameterization Trick. What about the other term, $-D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z))$? It says the encoder $q_\phi(z \mid x)$ should make the code $z$ look like the prior distribution. Instead of encoding $z$ directly, encode the parameters of a normal distribution, $N(\mu, \sigma^2)$.
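
A sketch of the resulting sampling step for a Gaussian encoder: draw $\varepsilon \sim N(0, I)$ and set $z = \mu + \sigma \varepsilon$, so that $z \sim N(\mu, \sigma^2)$ while gradients flow back into $\mu$ and $\sigma$ (the log-variance parameterization and the shapes are assumed conventions):

    import torch

    mu = torch.zeros(8, 32, requires_grad=True)        # encoder mean output (stand-in values)
    log_var = torch.zeros(8, 32, requires_grad=True)   # encoder log sigma^2 output

    eps = torch.randn_like(mu)                          # eps ~ N(0, I), involves no parameters
    z = mu + torch.exp(0.5 * log_var) * eps             # z ~ N(mu, sigma^2), differentiable in mu, log_var

    z.sum().backward()                                  # gradients reach mu and log_var through z
    print(mu.grad.shape, log_var.grad.shape)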

  14. Reparameterization Trick
        $q_\phi(z_j \mid x^{(i)}) = N\big(\mu_j^{(i)}, \sigma_j^{2\,(i)}\big)$, and $p_\theta(z) = N(0, I)$
      The KL divergence between these two is:
        $D_{KL}(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)) = -\dfrac{1}{2} \sum_{j=1}^{d} \left[ 1 + \log\big(\sigma_j^{2\,(i)}\big) - \big(\mu_j^{(i)}\big)^2 - \sigma_j^{2\,(i)} \right]$
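
The same closed-form KL term written in code, assuming the encoder outputs $\mu$ and $\log \sigma^2$ for each latent dimension:

    import torch

    def gaussian_kl(mu, log_var):
        """D_KL( N(mu, sigma^2) || N(0, I) ), summed over the d latent dimensions.

        Implements -0.5 * sum_j (1 + log sigma_j^2 - mu_j^2 - sigma_j^2).
        """
        return -0.5 * torch.sum(1 + log_var - mu ** 2 - torch.exp(log_var), dim=-1)

    mu = torch.randn(8, 32)                 # stand-in encoder outputs for a batch of 8
    log_var = torch.randn(8, 32)
    print(gaussian_kl(mu, log_var).shape)   # one KL value per example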

  15. Results from Kingma & Welling
