Neural Variational Inference and Learning
Andriy Mnih, Karol Gregor 22 June 2014
Introduction
◮ Training directed latent variable models is difficult because inference in them is intractable.
◮ Both MCMC and traditional variational methods involve iterative inference, which can be slow.
◮ Use a feedforward approximation to inference to implement efficient sampling from the approximate posterior.
◮ These distributions have to be efficient to evaluate and sample from.
Traditional variational inference requires:
◮ Variational distributions Q with a simple factored form and no dependencies between the latent variables.
◮ Simple models Pθ(x, h) yielding tractable expectations.
◮ Iterative optimization to compute Q for each x.
◮ This allows us to sample from Qφ(h|x) very efficiently.
◮ We will refer to Q as the inference network because it implements approximate inference in the model.
◮ We compute all the required expectations using samples from Q (see the sketch below).
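To make the single-pass sampling concrete, here is a minimal sketch (my illustration, not the authors' code) of a one-layer inference network with factorized Bernoulli latents; the layer sizes, initialization, and function names are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inference network Q_phi(h|x): a single linear layer mapping
# an observation x to the means of factorized Bernoulli latent variables.
W = rng.normal(scale=0.01, size=(200, 784))  # phi: weight matrix
b = np.zeros(200)                            # phi: biases

def sample_q(x):
    """One feedforward pass: compute q_i = Q_phi(h_i = 1 | x), then sample h."""
    q = sigmoid(W @ x + b)
    h = (rng.random(q.shape) < q).astype(float)
    return h, q

x = rng.integers(0, 2, size=784).astype(float)  # a dummy binary observation
h, q = sample_q(x)  # an exact sample from Q_phi(h|x), no iterative inference
```

Sampling costs one matrix-vector product per observation, which is the efficiency the feedforward approximation buys over iterative per-example optimization.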
The gradient of the bound with respect to the inference network parameters can be estimated from samples h ∼ Qφ(h|x):

∂/∂φ Lθ,φ(x) = E_Qφ(h|x)[ (log Pθ(x, h) − log Qφ(h|x) − b) ∂/∂φ log Qφ(h|x) ] for any b independent of h.
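As an illustration of this identity (a sketch under my own toy setup, not the paper's implementation), the following estimates the φ-gradient by Monte Carlo for a factorized Bernoulli Q with logits φ; the stand-in joint log-probability log_p is made up:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

phi = np.zeros(5)  # variational parameters: logits of factorized Bernoulli Q

def log_p(x, h):
    """Made-up stand-in for log P_theta(x, h); any tractable joint works."""
    return -0.5 * np.sum((h - x) ** 2)

def grad_phi_estimate(x, b=0.0, n_samples=100):
    """Score-function estimate of d/dphi L(x); valid for any constant b."""
    q = sigmoid(phi)
    total = np.zeros_like(phi)
    for _ in range(n_samples):
        h = (rng.random(q.shape) < q).astype(float)
        log_q = np.sum(h * np.log(q) + (1.0 - h) * np.log(1.0 - q))
        signal = log_p(x, h) - log_q - b   # learning signal minus baseline
        total += signal * (h - q)          # (h - q) = d/dphi log Q_phi(h|x)
    return total / n_samples

x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
g = grad_phi_estimate(x, b=0.0)  # same expectation for any b; only the
                                 # variance of the estimate changes
```

Subtracting b leaves the estimator unbiased because E_Q[∂/∂φ log Qφ(h|x)] = 0, but a well-chosen b can reduce its variance substantially.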
Variance reduction techniques (a sketch combining them follows below):
◮ Constant baseline: makes the learning signal zero-mean. Enough to obtain reasonable models on MNIST.
◮ Input-dependent baseline (IDB): can be seen as capturing log Pθ(x); an MLP with a single real-valued output. Makes learning considerably faster and leads to better results.
◮ Variance normalization (VN): can be seen as simple global learning rate adaptation. Makes learning faster and more robust.
◮ Local learning signals: take advantage of the Markov properties of the models.
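A hedged sketch of how the first three techniques might fit together (my own construction with made-up update rules and decay constants; the paper's exact estimators may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

# Running statistics for the constant baseline and variance normalization.
baseline = 0.0   # running mean of the learning signal (constant baseline)
variance = 1.0   # running variance of the learning signal
alpha = 0.99     # decay rate for both moving averages (illustrative value)

c = rng.normal(scale=0.01, size=784)  # stand-in parameters for the IDB

def input_dependent_baseline(x):
    """Stand-in for the MLP with a single real-valued output, C_psi(x)."""
    return float(c @ x)

def adjusted_signal(raw_signal, x):
    """Center the signal, subtract the IDB, then variance-normalize it."""
    global baseline, variance
    s = raw_signal - baseline - input_dependent_baseline(x)
    baseline = alpha * baseline + (1.0 - alpha) * raw_signal
    variance = alpha * variance + (1.0 - alpha) * (raw_signal - baseline) ** 2
    return s / max(1.0, np.sqrt(variance))  # variance normalization

x = rng.integers(0, 2, size=784).astype(float)
s = adjusted_signal(-120.0, x)  # e.g. raw_signal = log P(x,h) - log Q(h|x)
```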
[Figure: validation set bound vs. number of parameter updates for an SBN 200−200 model, comparing variance reduction settings: Baseline, IDB, & VN; Baseline & VN; Baseline only; VN only; no baselines & no VN.]
Document modeling experiments:
◮ 20 Newsgroups: 11K documents, 2K vocabulary
◮ Reuters RCV1: 800K documents, 10K vocabulary
◮ Can handle both continuous and discrete latent variables.
◮ Easy to apply, requiring no model-specific derivations beyond the gradients of log Pθ(x, h) and log Qφ(h|x).