Variational Inference and Generative Models (CS 294-112: Deep Reinforcement Learning, Sergey Levine)


  1. Variational Inference and Generative Models. CS 294-112: Deep Reinforcement Learning. Sergey Levine

  2. Class Notes: (1) Homework 3 is due next Wednesday. (2) Accept CMT peer review invitations. These are required (part of your final project grade). If you have not received or cannot find the invitation, email Kate Rakelly!

  3. Where we are in the course: RL algorithms, advanced topics

  4. Today’s Lecture: (1) Probabilistic latent variable models. (2) Variational inference. (3) Amortized variational inference. (4) Generative models: variational autoencoders. Goals: understand latent variable models in deep learning; understand how to use (amortized) variational inference.

  5. Probabilistic models

  6. Latent variable models (figure label: mixture element)

  7. Latent variable models in general. Figure annotations: “easy” distribution (e.g., Gaussian); “easy” distribution (e.g., conditional Gaussian); “easy” distribution (e.g., Gaussian).

  8. Latent variable models in RL: conditional latent variable models for multi-modal policies; latent variable models for model-based RL

  9. Other places we’ll see latent variable models: (a) using RL/control + variational inference to model human behavior (Muybridge, c. 1870; Mombaur et al. ’09; Li & Todorov ’06; Ziebart ’08); (b) using generative models and variational inference for exploration

  10. How do we train latent variable models?

  11. Estimating the log-likelihood

  12. The variational approximation

  13. The variational approximation: Jensen’s inequality
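
The equations on this slide did not survive extraction. As a hedged reconstruction of the standard argument (the symbols x_i for a datapoint, z for the latent variable, and q_i(z) for the approximate posterior are assumed notation, not taken from the transcript), Jensen's inequality gives a lower bound on the log-likelihood:

```latex
\begin{align*}
\log p(x_i) &= \log \int_z p(x_i \mid z)\, p(z)\, dz \\
            &= \log E_{z \sim q_i(z)}\left[ \frac{p(x_i \mid z)\, p(z)}{q_i(z)} \right] \\
            &\ge E_{z \sim q_i(z)}\left[ \log \frac{p(x_i \mid z)\, p(z)}{q_i(z)} \right]
               \qquad \text{(Jensen's inequality: $\log$ is concave)} \\
            &= E_{z \sim q_i(z)}\left[ \log p(x_i \mid z) + \log p(z) \right] + \mathcal{H}(q_i)
\end{align*}
```

The last line uses the entropy of q_i, which is the negative expected log probability of q_i under itself.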

  14. A brief aside… Entropy. Intuition 1: how random is the random variable? Intuition 2: how large is the log probability in expectation under itself? Figure labels: high entropy, low entropy. Annotations: this maximizes the first part; this also maximizes the second part (makes it as wide as possible).

  15. A brief aside… KL-divergence. Intuition 1: how different are two distributions? Intuition 2: how small is the expected log probability of one distribution under another, minus entropy? Why entropy? Annotations: this maximizes the first part; this also maximizes the second part (makes it as wide as possible).
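
The definitions behind these two asides are missing from the transcript. The standard forms, written here with assumed symbols p and q for the two distributions, may help connect the intuitions to formulas:

```latex
\begin{align*}
\mathcal{H}(p) &= -E_{x \sim p(x)}\left[ \log p(x) \right] \\
D_{\mathrm{KL}}(q \,\|\, p) &= E_{x \sim q(x)}\left[ \log \frac{q(x)}{p(x)} \right]
  = -E_{x \sim q(x)}\left[ \log p(x) \right] - \mathcal{H}(q)
\end{align*}
```

The second form of the KL-divergence matches Intuition 2: it is small when q places its mass where log p is large, and the entropy term rewards q for staying as wide as possible.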

  16. The variational approximation

  17. The variational approximation
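
The slide content is not in the transcript. A commonly used identity at this point in the argument, sketched here under assumed notation (q_i(z) for the approximate posterior and L_i for the lower bound), shows that the gap between the bound and the true log-likelihood is a KL-divergence:

```latex
\begin{align*}
\mathcal{L}_i(p, q_i) &= E_{z \sim q_i(z)}\left[ \log p(x_i \mid z) + \log p(z) \right] + \mathcal{H}(q_i) \\
D_{\mathrm{KL}}\left( q_i(z) \,\|\, p(z \mid x_i) \right)
  &= E_{z \sim q_i(z)}\left[ \log q_i(z) - \log \frac{p(x_i \mid z)\, p(z)}{p(x_i)} \right] \\
  &= -\mathcal{L}_i(p, q_i) + \log p(x_i) \\
\log p(x_i) &= \mathcal{L}_i(p, q_i) + D_{\mathrm{KL}}\left( q_i(z) \,\|\, p(z \mid x_i) \right)
\end{align*}
```

Because the KL term is non-negative, maximizing the bound with respect to q_i tightens it, and the bound is exact when q_i(z) equals the true posterior p(z | x_i).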

  18. How do we use this? (annotation: how?)

  19. What’s the problem?

  20. Break

  21. What’s the problem?

  22. Amortized variational inference: how do we calculate this?

  23. Amortized variational inference. Annotations: look up the formula for the entropy of a Gaussian; can just use policy gradient! What’s wrong with this gradient?

  24. The reparameterization trick. Is there a better way? Most autodiff software (e.g., TensorFlow) will compute this for you!
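
A minimal NumPy sketch of the idea, assuming a one-dimensional Gaussian q(z) = N(mu, sigma^2) and a toy objective f(z) = z^2 (both chosen here for illustration, not taken from the slides). The sample is written as z = mu + sigma * eps with eps ~ N(0, 1), so the gradient can pass through the sample. The derivative of f is supplied by hand where an autodiff library would normally compute it.

```python
import numpy as np

def reparam_gradients(mu, sigma, f_grad, num_samples=10_000, seed=0):
    """Estimate d/dmu and d/dsigma of E_{z ~ N(mu, sigma^2)}[f(z)].

    Writes z = mu + sigma * eps with eps ~ N(0, 1), so that
    dz/dmu = 1 and dz/dsigma = eps. f_grad is f'(z), given by hand.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(num_samples)   # noise independent of mu and sigma
    z = mu + sigma * eps                     # reparameterized sample
    df_dz = f_grad(z)
    grad_mu = np.mean(df_dz)                 # chain rule with dz/dmu = 1
    grad_sigma = np.mean(df_dz * eps)        # chain rule with dz/dsigma = eps
    return grad_mu, grad_sigma

# Toy check: f(z) = z^2 gives E[f] = mu^2 + sigma^2,
# so the exact gradients are 2 * mu and 2 * sigma.
grad_mu, grad_sigma = reparam_gradients(1.5, 0.5, lambda z: 2.0 * z)
```

With mu = 1.5 and sigma = 0.5 the estimates should land close to the exact values 3.0 and 1.0, up to Monte Carlo noise.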

  25. Another way to look at it… This often has a convenient analytical form (e.g., KL-divergence for Gaussians).
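
As one example of such an analytical form, the sketch below computes the KL-divergence from a diagonal Gaussian to a standard normal. The choice of a standard normal prior and the log-variance parameterization are assumptions made for illustration. The function name is hypothetical.

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis.

    Closed form: 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# The divergence is zero when the two distributions coincide,
# and grows as the mean moves away from the origin.
kl_same = kl_diag_gaussian_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_diag_gaussian_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```

No sampling is needed for this term, which is why it is convenient to separate it from the expected log-likelihood.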

  26. Reparameterization trick vs. policy gradient. Policy gradient: can handle both discrete and continuous latent variables; high variance, requires multiple samples and small learning rates. Reparameterization trick: only continuous latent variables; very simple to implement; low variance.
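
The variance claim can be checked numerically on a toy problem. The sketch below is an illustration under assumed settings (a 1-D Gaussian, f(z) = z^2, ten samples per estimate), not an experiment from the lecture. It estimates the gradient of E[f(z)] with respect to mu using both estimators and compares their spread across repeated trials.

```python
import numpy as np

def compare_estimators(mu=1.0, sigma=1.0, num_trials=2000,
                       samples_per_trial=10, seed=0):
    """Per-trial estimates of d/dmu E_{z ~ N(mu, sigma^2)}[z^2] (exact: 2 * mu)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((num_trials, samples_per_trial))
    z = mu + sigma * eps
    f = z**2
    # Score-function (policy gradient) estimator:
    # f(z) * d/dmu log N(z; mu, sigma^2) = f(z) * (z - mu) / sigma^2
    score = (z - mu) / sigma**2
    policy_grad = np.mean(f * score, axis=1)
    # Reparameterization estimator: d/dmu f(mu + sigma * eps) = f'(z) = 2 * z
    reparam = np.mean(2.0 * z, axis=1)
    return policy_grad, reparam

pg, rp = compare_estimators()
print("exact gradient:", 2.0)
print("policy gradient  mean %.3f  var %.3f" % (pg.mean(), pg.var()))
print("reparameterized  mean %.3f  var %.3f" % (rp.mean(), rp.var()))
```

Both estimators are unbiased for this problem, so their means should agree with the exact gradient, while the policy gradient estimates should show noticeably larger variance.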

  27. The variational autoencoder
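
The slide's diagram is not in the transcript. As a rough sketch of how the pieces fit together, the code below evaluates a single-sample estimate of the lower bound for a toy model. Everything specific here is an assumption for illustration: linear encoder and decoder maps in place of neural networks, a standard normal prior, a unit-variance Gaussian decoder, and the hypothetical names init_params and elbo. It shows only the forward computation; training would require gradients from an autodiff library.

```python
import numpy as np

def init_params(x_dim, z_dim, seed=0):
    """Random weights for a toy linear encoder and decoder (illustrative only)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "enc_mu": scale * rng.standard_normal((x_dim, z_dim)),
        "enc_log_var": scale * rng.standard_normal((x_dim, z_dim)),
        "dec": scale * rng.standard_normal((z_dim, x_dim)),
    }

def elbo(params, x, rng):
    """Single-sample lower-bound estimate for each row of x."""
    # Encoder: q(z | x) = N(mu(x), diag(exp(log_var(x))))
    mu = x @ params["enc_mu"]
    log_var = x @ params["enc_log_var"]
    # Reparameterized sample z = mu + sigma * eps
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    # Decoder: p(x | z) = N(z @ dec, I), evaluated at the observed x
    x_mean = z @ params["dec"]
    log_px_given_z = -0.5 * np.sum((x - x_mean) ** 2 + np.log(2.0 * np.pi), axis=-1)
    # Closed-form KL( q(z | x) || N(0, I) )
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)
    return log_px_given_z - kl

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 6))          # four synthetic datapoints
params = init_params(x_dim=6, z_dim=2)
values = elbo(params, x, rng)            # one bound estimate per datapoint
```

Maximizing the average of these values with respect to both encoder and decoder parameters is the usual training objective for this kind of model.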

  28. Using the variational autoencoder

  29. Conditional models

  30. Examples

  31. (1) Collect data. (2) Learn embedding of image and dynamics model (jointly). (3) Run iLQG to learn to reach image of goal. A type of variational autoencoder with temporally decomposed latent state!

  32. Local models with images

  33. Local models with images: variational autoencoder with stochastic dynamics

  34. We’ll see more of this for… (a) using RL/control + variational inference to model human behavior (Muybridge, c. 1870; Mombaur et al. ’09; Li & Todorov ’06; Ziebart ’08); (b) using generative models and variational inference for exploration
