SLIDE 1

CSC2541: Differentiable Inference and Generative Models

Lecture 2: Variational autoencoders

SLIDE 2

Admin:

  • TAs:
  • Tony Wu (ywu@cs.toronto.edu)
  • Kamal Rai (kamal.rai@mail.utoronto.ca)
  • Extra seminar: Model-based reinforcement learning
  • Seminar sign-up
SLIDE 3

Seminars

  • 7 weeks of seminars, about 8-9 people each
  • Each day will have one or two major themes, with 3-6 papers covered
  • Divided into 2-3 presentations of about 30-40 mins each
  • Explain the main idea, relate it to previous work and future directions

SLIDE 4

Computational Tools

  • Automatic differentiation
  • Neural networks
  • Stochastic optimization
  • Simple Monte Carlo
SLIDE 5

Computational Tools

  • Can specify arbitrarily flexible functions with a deep net:

      y = f_θ(x)

  • Can specify arbitrarily complex conditional distributions with a deep net:

      p(y = c | x) = (1/Z_θ) exp([f_θ(x)]_c)

  • Density networks:

      p(y | x) = N(y | µ = f_θ(x), Σ = g_θ(x))

  • Bayesian neural network:

      p(y | x) = ∫ f_θ(x) p(θ) dθ
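As a concrete illustration (a minimal sketch of my own, not the course code; the layer sizes and helper names are assumptions), a density network is just a small net whose outputs parameterize a Gaussian p(y|x):

    import numpy as np

    def density_net(params, x):
        # Hypothetical one-hidden-layer MLP mapping x to the mean and
        # log-variance of a Gaussian conditional p(y|x).
        W1, b1, W2, b2 = params
        h = np.tanh(x @ W1 + b1)       # hidden layer
        out = h @ W2 + b2              # two outputs: mean and log-variance
        return out[:, 0], out[:, 1]

    def gaussian_log_density(y, mu, log_var):
        # log N(y | mu, exp(log_var)), evaluated per datapoint
        return -0.5 * (np.log(2 * np.pi) + log_var
                       + (y - mu) ** 2 / np.exp(log_var))

    rng = np.random.default_rng(0)
    params = (rng.normal(size=(1, 20)), np.zeros(20),
              rng.normal(size=(20, 2)), np.zeros(2))
    mu, log_var = density_net(params, np.array([[0.5]]))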

SLIDE 6

Computational Tools

  • Can optimize continuous parameters wrt any objective J given unbiased estimates of its gradient:

      given an estimator ĝ with E_p(x)[ ĝ(θ, x) ] = ∇_θ J(θ),

      can use θ̂ = SGD(θ_init, ĝ) ≈ argmin_θ J(θ)
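A minimal sketch of this in Python (my own example, not the course code): SGD where each step uses a minibatch drawn from p(x) to form an unbiased gradient estimate.

    import numpy as np

    def sgd(grad_estimate, theta_init, data, lr=0.01, steps=2000, batch=32, seed=0):
        # Stochastic gradient descent: each minibatch x ~ p(x) gives an
        # unbiased estimate of the full gradient of J.
        rng = np.random.default_rng(seed)
        theta = theta_init.copy()
        for _ in range(steps):
            x = data[rng.integers(len(data), size=batch)]
            theta = theta - lr * grad_estimate(theta, x)
        return theta

    # Example: J(θ) = E[(θ - x)^2]; an unbiased minibatch gradient is 2·mean(θ - x).
    data = np.random.default_rng(1).normal(3.0, 1.0, size=(10000,))
    theta = sgd(lambda th, x: 2 * np.mean(th - x), np.array(0.0), data)
    # theta ends up near the data mean (≈ 3.0)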

SLIDE 7

Computational Tools

  • Can differentiate any deterministic, continuous function using reverse-mode automatic differentiation (backprop)
  • Cost of evaluating the gradient is about the same as evaluating the function
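For instance (a minimal sketch using the Autograd library; any reverse-mode autodiff tool would do):

    import autograd.numpy as np
    from autograd import grad

    def f(x):
        # an arbitrary deterministic, continuous function
        return np.tanh(x) ** 2 + np.sin(x)

    df = grad(f)      # reverse-mode derivative of f
    print(df(0.5))    # costs roughly a small constant multiple of evaluating f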

SLIDE 8

Computational Tools

  • Simple Monte Carlo gives unbiased estimates of integrals given samples:

      E_p(x)[ f(x) ] ≈ (1/N) Σ_{i=1}^N f(x_i),   x_i ~ p(x)
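A quick numeric illustration (my own example):

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.normal(size=100_000)    # x_i ~ p(x) = N(0, 1)
    print(np.mean(samples ** 2))          # unbiased estimate of E[x^2] = 1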

SLIDE 9

Benefits of Bayesianism

  • Examples: Diagnosing disease, doing regression
  • Captures uncertainty
  • Necessary for decision-making
  • Why pretend we’re certain?
  • Automatic regularization from ensembling
  • Latent variables can be meaningful
  • Can combine datasets/models (semi-supervised learning)
  • Marginal likelihood automatically chooses model capacity
  • Inference is deterministic given the model, and gives an automatic answer for hyperparameters
SLIDE 10

What is inference?

  • Estimate the posterior:

      p(z | x, θ) = p(x | z, θ) p(z) / ∫ p(x | z′, θ) p(z′) dz′

  • Compute expectations:

      E_p(z|x,θ)[ f(z | x, θ) ]

  • Make predictions:

      p(x₂ | x₁, θ) = ∫ p(x₂ | z) p(z | x₁, θ) dz

  • Marginal likelihood:

      p(x | θ) = ∫ p(x | z, θ) p(z) dz

  • Can all be estimated using samples from the posterior and simple Monte Carlo!
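As a sketch of the last point (the helper names here are hypothetical): given samples z_i from the posterior p(z | x₁, θ), prediction reduces to a simple Monte Carlo average.

    import numpy as np

    def predictive_density(x2, posterior_samples, likelihood):
        # p(x2 | x1, θ) = E_{p(z|x1,θ)}[ p(x2 | z) ] ≈ mean of p(x2 | z_i)
        return np.mean([likelihood(x2, z) for z in posterior_samples])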

SLIDE 11

Variational Inference

From importance sampling to variational inference:

Integral problem:

    log p(y) = log ∫ p(y|z) p(z) dz

Proposal (introduce q(z) as an importance-weighting distribution):

    log p(y) = log ∫ [ p(y|z) p(z) / q(z) ] q(z) dz

Jensen's inequality (log ∫ p(x) g(x) dx ≥ ∫ p(x) log g(x) dx):

    log p(y) ≥ ∫ q(z) log[ p(y|z) p(z) / q(z) ] dz

Variational lower bound:

    log p(y) ≥ ∫ q(z) log p(y|z) dz − ∫ q(z) log[ q(z) / p(z) ] dz
             = E_q(z)[ log p(y|z) ] − KL[ q(z) ‖ p(z) ]

[from Shakir Mohamed]
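A minimal numeric sketch of this bound (my own example): for a scalar Gaussian q(z) = N(m, s²) and prior p(z) = N(0, 1), estimate E_q[log p(y|z)] by simple Monte Carlo and use the closed-form KL between the two Gaussians.

    import numpy as np

    def elbo_estimate(log_lik, m, s, n=10_000, seed=0):
        rng = np.random.default_rng(seed)
        z = m + s * rng.normal(size=n)                   # z ~ q(z) = N(m, s^2)
        kl = np.log(1.0 / s) + (s**2 + m**2 - 1) / 2     # KL[N(m, s^2) || N(0, 1)]
        return np.mean(log_lik(z)) - kl                  # E_q[log p(y|z)] - KL

    # Example: one observation y = 1.0 with likelihood p(y|z) = N(y | z, 1)
    log_lik = lambda z: -0.5 * (np.log(2 * np.pi) + (1.0 - z) ** 2)
    print(elbo_estimate(log_lik, m=0.5, s=0.8))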

SLIDE 12

Interpretations

  • Bound maximized when q(z|x) = p(z|x)
  • Reconstruction + difference from prior
  • MAP + Entropy
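The last two bullets come from grouping the terms of the same bound differently (standard identities, not from the slides):

    L(q) = E_q(z)[ log p(x|z) ] − KL[ q(z) ‖ p(z) ]    (reconstruction + difference from prior)
         = E_q(z)[ log p(x, z) ] + H[ q(z) ]           (MAP objective + entropy)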
SLIDE 13

Show demos

  • Toy example
  • Mixture example
  • Bayesian neural network
SLIDE 14

When we have lots of data, and global model parameters:

  • Can alternate between optimizing variational parameters and model parameters
  • A generalization of expectation-maximization (EM)
  • Slow because of alternating optimization: need to update θ, then each q(zᵢ)
  • Slow and memory-intensive when we have many datapoints

      p(x | θ) = ∏_{i=1}^N ∫ p(xᵢ | zᵢ, θ) p(zᵢ) dzᵢ,   with one variational factor q(zᵢ | xᵢ, θ) per datapoint
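A schematic of the alternating scheme (my own sketch; the update rules are left abstract and passed in as functions):

    def alternating_vi(update_q, update_theta, theta, q_params, data, rounds=100):
        # Coordinate ascent on the variational bound:
        # generalized E-step: update each per-datapoint q(z_i | x_i);
        # generalized M-step: update the global model parameters theta.
        for _ in range(rounds):
            q_params = [update_q(theta, q_i, x_i)
                        for q_i, x_i in zip(q_params, data)]
            theta = update_theta(theta, q_params, data)
        return theta, q_params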

SLIDE 15

Variational autoencoders

  • Model: latent-variable model p(x | z, θ), usually specified by a neural network
  • Inference: recognition network for q(z | x, θ), usually specified by a neural network
  • Training objective: simple Monte Carlo for an unbiased estimate of the variational lower bound
  • Optimization method: stochastic gradient ascent, with automatic differentiation for gradients
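Putting the pieces together (a minimal VAE sketch in Python with Autograd, my own illustration rather than the course code; the single linear layers and Bernoulli likelihood are simplifying assumptions):

    import autograd.numpy as np
    from autograd import grad

    def encoder(params, x):
        # Recognition network q(z|x): a linear layer outputs mean and log-variance.
        W, b = params
        out = x @ W + b
        D = out.shape[1] // 2
        return out[:, :D], out[:, D:]

    def decoder(params, z):
        # Model p(x|z): a linear layer outputs Bernoulli logits.
        W, b = params
        return z @ W + b

    def elbo(params, x, eps):
        # Single-sample Monte Carlo estimate of the variational lower bound.
        enc, dec = params
        mu, log_var = encoder(enc, x)
        z = mu + np.exp(0.5 * log_var) * eps      # reparameterized sample z ~ q(z|x)
        logits = decoder(dec, z)
        log_lik = np.sum(x * logits - np.logaddexp(0, logits), axis=1)    # Bernoulli log p(x|z)
        kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1 - log_var, axis=1)  # KL[q(z|x) || N(0, I)]
        return np.mean(log_lik - kl)

    elbo_grad = grad(elbo)    # unbiased ELBO gradient, for stochastic gradient ascent

With hypothetical sizes D_x = 784, D_z = 2, the encoder weights would be shaped (D_x, 2·D_z) and the decoder weights (D_z, D_x); eps is drawn as standard normal noise of shape (batch, D_z) each step.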

SLIDE 16

Show VAE demo

  • Maximizing the ELBO, or minimizing KL from the true posterior
  • Relation to denoising autoencoders: training the ‘encoder’ and ‘decoder’ together
  • Decoder specifies the model, encoder specifies inference

SLIDE 17

Pros and Cons

  • Pros:
  • Flexible generative model
  • End-to-end gradient training
  • Measurable objective (and it is a lower bound, so the model is at least this good)
  • Fast test-time inference
  • Cons:
  • Sub-optimal variational factors
  • Limited approximation to the true posterior (will revisit)
  • Can have high-variance gradients
SLIDE 20

Questions

SLIDE 21

Class Projects

  • Develop a generative model for a new medium
  • Extend existing models, inference, or training
  • Apply an existing approach in a new way
  • Review / comparison / tutorials
SLIDE 22

Other ideas

  • Backprop through beam search
  • Backprop through dynamic programming for DNA alignment
  • Conditional GANs for mesh upsampling
  • Apply a VAE with a switching linear dynamical system (SLDS) to human speech
  • Generate images from captions
  • Learn to predict time-reversed physical dynamics
  • Investigate minimax optimization methods for GANs
  • Model-based RL (show demo)