Deep Variational Inference
FLARE Reading Group Presentation Wesley Tansey 9/28/2016
What is Variational Inference?

Want to estimate some distribution, p*(x).
[Figure: the target density p*(x) and a simpler approximating density q(x)]
Still hard! p*(x) usually has an intractable normalizing constant Z, so we only have the unnormalized p~(x) = Z p*(x)
log(q(x) / p*(x)) = log(q(x)) - log(p*(x))
                  = log(q(x)) - log(p~(x) / Z)
                  = log(q(x)) - log(p~(x)) + log(Z)

log(Z) is constant => can ignore it in our objective
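A minimal numerical sketch of this point (the Gaussian example and all names here are illustrative, not from the slides): estimate E_q[log q(x) - log p~(x)] by Monte Carlo. It equals KL(q || p*) minus the constant log(Z), so minimizing it in q minimizes the true KL even though we never compute Z.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_q(x, mu=0.0, sigma=1.0):
    # log density of the variational approximation q(x)
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2

def log_p_tilde(x):
    # unnormalized target: p~(x) = exp(-x^2 / 2), so Z = sqrt(2*pi)
    return -0.5 * x**2

x = rng.normal(size=100_000)                       # samples from q
kl_up_to_const = np.mean(log_q(x) - log_p_tilde(x))
# kl_up_to_const = KL(q || p*) - log(Z); log(Z) is constant in q,
# so minimizing this expectation minimizes the true KL.
print(kl_up_to_const + 0.5 * np.log(2 * np.pi))    # ~0, since q == p* here
```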
[1] Blei, Ng, Jordan, “Latent Dirichlet Allocation”, JMLR, 2003.
(Kingma and Welling, 2013)
(Rezende and Mohamed, 2015)
(Kingma and Welling, 2014)
High-level idea (a minimal sketch follows the citation below):
1) Optimize the same lower bound that we get in VB
2) The reparameterization trick leads to a lower-variance estimator
3) Many choices of q(z|x) and p(z) lead to a partial closed form
4) Use a neural network to parameterize qϕ(z | x) and pθ(x | z)
5) SGD to fit everything
[1] Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR, 2014.
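A minimal VAE sketch in PyTorch, assuming 784-dim binary inputs and an MLP encoder/decoder (all layer sizes are illustrative, not from the paper): the encoder gives qϕ(z|x) = N(mu, diag(sigma^2)), the decoder gives pθ(x|z) as a per-pixel Bernoulli, and the loss is the negative ELBO.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.log_var = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
        # so gradients flow through mu and sigma.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        logits = self.dec(z)
        # Closed-form KL(q_phi(z|x) || N(0, I)) for diagonal Gaussians
        kl = -0.5 * torch.sum(1 + log_var - mu**2 - log_var.exp(), dim=1)
        recon = nn.functional.binary_cross_entropy_with_logits(
            logits, x, reduction="none").sum(dim=1)
        return (recon + kl).mean()  # negative ELBO; minimize with SGD/Adam
```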
Distributions we can reparameterize (sketch below):
- Tractable inverse CDF: Exponential, Cauchy, Logistic, Rayleigh, Pareto, Weibull, Reciprocal, Gompertz, Gumbel, Erlang
- Location-scale family: Laplace, Elliptical, Student’s t, Logistic, Uniform, Triangular, Gaussian
- Composition of the above: Log-Normal (exponentiated normal), Gamma (sum of exponentials), Dirichlet (sum of Gammas), Beta, Chi-Squared, F
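A short sketch of all three tricks (parameter values are illustrative): in each case the sample is a deterministic, differentiable function of the distribution's parameters applied to parameter-free base noise, which is what lets gradients pass through.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)     # base noise, no parameters
eps = rng.normal(size=10_000)

# Inverse-CDF trick: Exponential(rate) via u ~ Uniform(0, 1)
rate = 2.0
exp_samples = -np.log(1.0 - u) / rate

# Location-scale trick: Gaussian(mu, sigma) via eps ~ N(0, 1)
mu, sigma = 1.0, 0.5
gauss_samples = mu + sigma * eps

# Composition: Log-Normal is an exponentiated Gaussian
lognorm_samples = np.exp(mu + sigma * eps)
```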
More info in Karol Gregor’s DeepMind lecture: https://www.youtube.com/watch?v=P78QYjWh5sM
High-level idea (a sketch of one flow step follows the citation below):
1) VAEs are great, but our posterior q(z|x) needs to be simple
2) Take a simple q_0(z | x) and apply a series of k transformations to z to get q_k(z | x). Metaphor: z “flows” through each transform.
3) Be clever in the choice of transforms (computational issue: the log-det-Jacobian must be cheap to compute)
4) The variational posterior q_k can now approach the true posterior p
5) A deep NN now parameterizes q and the flow parameters
[1] Rezende and Mohamed, “Variational Inference with Normalizing Flows”, ICML, 2015.
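A minimal sketch of one planar flow step from the paper, f(z) = z + u·tanh(wᵀz + b), whose log-det-Jacobian is log|1 + uᵀψ(z)| with ψ(z) = (1 - tanh²(wᵀz + b))·w. Dimensions and random initialization here are illustrative; in practice (u, w, b) for each step come out of the encoder network.

```python
import torch

def planar_flow(z, u, w, b):
    # z: (batch, d); u, w: (d,); b: scalar
    lin = z @ w + b                                 # (batch,)
    f_z = z + u * torch.tanh(lin)[:, None]          # transformed samples
    psi = (1 - torch.tanh(lin) ** 2)[:, None] * w   # (batch, d)
    log_det = torch.log(torch.abs(1 + psi @ u))     # (batch,)
    # (for guaranteed invertibility the paper constrains w^T u >= -1;
    # that constraint is skipped in this sketch)
    return f_z, log_det

d = 2
z0 = torch.randn(64, d)            # samples from the simple q_0
log_q = -0.5 * (z0**2).sum(1)      # log N(0, I) up to a constant
u, w, b = torch.randn(d), torch.randn(d), torch.randn(())
z1, log_det = planar_flow(z0, u, w, b)
log_q = log_q - log_det            # density after one flow step
# Stacking k such steps gives q_k(z | x), subtracting each step's
# log-det-Jacobian from the running log density.
```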