Bayesian Deep Learning
- Prof. Leal-Taixé and Prof. Niessner
Going Full Bayesian
– Bayes = probabilities
– Hypothesis = model
– Evidence = data
– Bayes' theorem:
$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\; p(\text{model})}{p(\text{data})}$$
Applied to the weights $w$ of a network and the data $\mathcal{D}$:
$$\underbrace{p(w \mid \mathcal{D})}_{\text{posterior}} = \frac{\overbrace{p(\mathcal{D} \mid w)}^{\text{likelihood}} \;\; \overbrace{p(w)}^{\text{prior}}}{\underbrace{p(\mathcal{D})}_{\text{evidence (data)}}}$$
– The evidence $p(\mathcal{D})$ has no dependence on the weights
This lecture
– Finding a point estimate (MAP) → what we have been doing so far!
– Finding a probability distribution of the weights
Advantages of Deep Learning models
– Very expressive models
– Good for tasks such as classification, regression, sequence prediction
– Modular structure, efficient training, many tools
– Scales well with large amounts of data
Disadvantages…
– “Black-box” feeling
– We cannot judge how “confident” the model is about a decision
– We want to know what our models know and what they do not know
[Figure: photo of a dog with candidate labels Bulldog, German shepherd, Chihuahua] What answer will my NN give?
[Figure: same dog photo, labels Bulldog, German shepherd, Chihuahua] I would rather get the answer that my model is not certain about the type of dog breed
– We want to know what our models know and what they do not know
– Decision making
– Learning from limited, noisy, and missing data
– Insights on why a model failed
– Finding a point estimate (MAP) → what we have been doing so far!
– Finding a probability distribution of the weights
Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537
– Place a probability distribution over the network weights and see how this affects our model’s predictions
Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537
[Figure annotation: “I am not really sure”]
Kendall & Gal. “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?” NIPS 2017
Prediction now requires averaging over the model parameters:
$$p(y^* \mid x^*, \mathcal{D}) = \int p(y^* \mid x^*, w)\; p(w \mid \mathcal{D})\, dw$$
How do we compute this?
– Intractable: we would have to consider all possible combinations of weight values
– Instead, we settle for an approximation of the posterior:
$$q_\theta(w) \approx p(w \mid \mathcal{D})$$
Markov Chain Monte Carlo
– A chain of samples that converges to the posterior distribution
Variational Inference
– Find an approximation that is close to the true posterior
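For reference (my notation, not from the slide), variational inference typically fits a parametric family $q_\theta$ by minimizing the KL divergence to the true posterior:
$$\theta^* = \arg\min_\theta \; KL\big(q_\theta(w) \,\|\, p(w \mid \mathcal{D})\big)$$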
– Markov Chain Monte Carlo: SLOW for deep networks with millions of weights
Dropout (Srivastava et al. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” JMLR 2014)
[Figure: forward pass of a standard network vs. a network with randomly dropped units]
[Figure: dog classifier basing its score on many features: furry, has two eyes, has a tail, has paws, has two ears]
– Redundant representations
– Base your scores on more features
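As an illustration (not from the slides; layer sizes are my own choice), a minimal PyTorch sketch of dropout applied between layers:

```python
import torch
import torch.nn as nn

# Dropout randomly zeroes activations during training,
# forcing the network to build redundant representations.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # active in train mode, identity in eval mode
    nn.Linear(256, 10),
)

model.train()                 # dropout enabled
x = torch.randn(32, 784)      # a dummy batch
logits = model(x)
```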
[Figure: two different sub-networks (Model 1, Model 2) obtained by dropping different units: dropout as an implicit ensemble]
– Find an approximation that is close to the true posterior
– The variational distribution is constructed from a Bernoulli distribution (where the states are “on” and “off”)
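For reference (my notation, following Gal & Ghahramani), each weight matrix in the variational distribution is a learned matrix whose columns are randomly switched off by Bernoulli variables:
$$\widehat{W}_i = M_i \cdot \operatorname{diag}\!\big([z_{i,j}]_{j=1}^{K_i}\big), \qquad z_{i,j} \sim \operatorname{Bernoulli}(p_i)$$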
Y Gal, Z Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016
– Apply dropout before every weight layer
– Keep dropout active at test time
– Sampling is done in a Monte Carlo fashion, hence the name Monte Carlo dropout
Y Gal, Z Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016
– Sampling is done in a Monte Carlo fashion, e.g.,
$$p(y^* \mid x^*) \approx \frac{1}{T} \sum_{t=1}^{T} p\big(y^* \mid x^*, \widehat{w}_t\big), \qquad \widehat{w}_t \sim q(w),$$
where $q(w)$ is the dropout distribution
Y Gal, Z Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016
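A minimal sketch of Monte Carlo dropout at test time in PyTorch (my own helper, not from the paper): keep only the dropout layers stochastic and average several forward passes.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=50):
    """T stochastic forward passes with dropout kept active."""
    model.eval()
    # Re-enable only the dropout layers, leaving e.g. batch norm in eval mode
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    # Mean approximates the MC average; variance serves as an uncertainty estimate
    return preds.mean(dim=0), preds.var(dim=0)
```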
[Figure: classification with a neural network under parameter sampling]
Kendall & Gal. “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?” NIPS 2017
Variational Autoencoders
– Assumption: the data is generated by a random process, involving an unobserved continuous random (latent) variable $z$
– Variational inference: find an approximation that is close to the true posterior
Autoencoder: compress the input with the encoder and reconstruct it with the decoder
[Figure: autoencoder architecture: convolutional (Conv) encoder, transpose-convolutional (Transpose Conv) decoder]
[Figure: variational autoencoder: the encoder outputs a mean and a diagonal covariance, defining a Gaussian distribution over the latent variable $z$]
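A minimal sketch of such an encoder head in PyTorch (layer sizes are my own; the slides do not specify an architecture):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input to the mean and log-diagonal-covariance of q(z|x)."""
    def __init__(self, in_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.fc = nn.Linear(in_dim, hidden_dim)
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # mean
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # log of diagonal covariance

    def forward(self, x):
        h = torch.relu(self.fc(x))
        return self.fc_mu(h), self.fc_logvar(h)
```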
– The goal was: maximize the likelihood of the data
– I want to optimize $\log p(x)$
– I draw samples of the latent variable $z$ from my encoder $q(z \mid x)$
Bayes rule gives the posterior over the latent variable:
$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}$$
– $\log p(x)$ does not depend on $z$: under the expectation over $q(z \mid x)$ it is just a constant
– Rearranging the terms yields Kullback-Leibler divergences:
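For reference (standard definition, not from the slide), the KL divergence between distributions $q$ and $p$ is:
$$KL\big(q \,\|\, p\big) = \int q(z) \log \frac{q(z)}{p(z)}\, dz \;\ge\; 0$$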
$$\log p(x) = \mathbb{E}_{z \sim q(z \mid x)}\big[\log p(x \mid z)\big] - KL\big(q(z \mid x) \,\|\, p(z)\big) + KL\big(q(z \mid x) \,\|\, p(z \mid x)\big)$$
– First term: reconstruction loss
– Second term: measures how good my latent distribution is with respect to my prior
– Third term: I still cannot express the shape of the true posterior $p(z \mid x)$, but this KL divergence is always $\ge 0$
Dropping the intractable (non-negative) last term leaves the loss function (lower bound):
$$\mathcal{L} = \mathbb{E}_{z \sim q(z \mid x)}\big[\log p(x \mid z)\big] - KL\big(q(z \mid x) \,\|\, p(z)\big) \;\le\; \log p(x)$$
– Encoder: make the approximate posterior distribution close to the prior (close to a unit Gaussian distribution)
[Figure: the encoder outputs $\mu$ and $\Sigma$; a latent $z$ is sampled from $\mathcal{N}(\mu, \Sigma)$]
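In practice this sampling step is implemented with the reparameterization trick from Kingma & Welling, so gradients can flow through $\mu$ and $\Sigma$; a minimal sketch (my own helper) in PyTorch:

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and logvar
    std = torch.exp(0.5 * logvar)   # logvar = log(sigma^2)
    eps = torch.randn_like(std)     # eps ~ N(0, I)
    return mu + eps * std
```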
[Figure: full pipeline: encoder → sample $z$ → decoder]
– The output of the decoder is also parameterized as a distribution
Maximize the likelihood of reconstructing the input
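Putting the two terms of the lower bound together, a minimal sketch of the loss (my own, assuming a Bernoulli decoder with sigmoid outputs and the diagonal-Gaussian encoder above; the closed-form KL term is standard):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    # Reconstruction term: likelihood of the input under the decoder
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl   # minimizing this maximizes the lower bound
```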
Kingma & Welling. “Auto-Encoding Variational Bayes.” ICLR 2014
http://kvfrans.com/variational-autoencoders-explained/
– Generation: sample from the prior distribution (e.g., unit Gaussian) and pass the sample through the decoder
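A generation sketch under the same assumptions (`decoder` is a hypothetical trained decoder network; the latent dimension of 20 is assumed):

```python
import torch

with torch.no_grad():
    z = torch.randn(16, 20)   # 16 samples from the unit Gaussian prior
    samples = decoder(z)      # decode into new, generated outputs
```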
Each element of z encodes a different feature
[Figure: latent traversals: one element of $z$ controls the degree of smile, another the head pose]
[Figure: reconstruction comparison: Autoencoder vs. Variational Autoencoder vs. Ground Truth]
https://github.com/kvfrans/variational-autoencoder
Autoencoder:
– Reconstruct input
– Unsupervised learning
– Latent space features are useful
Variational Autoencoder:
– Probability distribution in latent space (e.g., Gaussian)
– Interpretable latent space (head pose, smile)
– Sample from model to generate output
Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Define a more tractable density function
Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
I do not care about the shape, I just want to sample!
– Sohn, Kihyuk, Honglak Lee, and Xinchen Yan. “Learning Structured Output Representation using Deep Conditional Generative Models.” NIPS 2015.
– Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee. “Attribute2Image: Conditional Image Generation from Visual Attributes.” ECCV 2016.
– Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert. “An Uncertain Future: Forecasting from Static Images using Variational Autoencoders.” ECCV 2016.
– Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther. “Autoencoding beyond pixels using a learned similarity metric.” ICML 2016.
– Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, David Forsyth. “Learning Diverse Image Colorization.” arXiv 2016.
– Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala. “Semantic Facial Expression Editing using Autoencoded Flow.” arXiv 2016.
– Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. “Semi-Supervised Learning with Deep Generative Models.” NIPS 2014.