Bayesian Deep Learning

Bayesian Deep Learning (Prof. Leal-Taixé) - PowerPoint PPT Presentation



  1. Bayesian Deep Learning (Prof. Leal-Taixé and Prof. Niessner)

  2. Going Full Bayesian • Bayes = probabilities • Bayes' theorem, where the hypothesis is the model and the evidence is the data: p(model | data) = p(data | model) p(model) / p(data)

  3. Going Full Bayesian • Start with a prior on the model parameters • Choose a statistical model for the data • Use the data to refine the prior, i.e., compute the posterior (the evidence in the denominator has no dependence on the parameters)

  4. Going Full Bayesian • Start with a prior on the model parameters • Choose a statistical model for the data • Use the data to refine the prior, i.e., compute the posterior: posterior ∝ likelihood × prior

  5. Going Full Bayesian • 1. Learning: computing the posterior – Finding a point estimate (MAP) → what we have been doing so far! – Finding a probability distribution over the parameters → this lecture
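
For reference, the two options on this slide can be written out as follows; the notation θ for the parameters and D for the data is mine, not the slide's:

```latex
% MAP: a single best point estimate of the parameters
\theta_{\mathrm{MAP}} = \arg\max_{\theta} p(\theta \mid \mathcal{D})
                      = \arg\max_{\theta} p(\mathcal{D} \mid \theta)\, p(\theta)

% Fully Bayesian: keep the whole posterior and average predictions over it
p(y^* \mid x^*, \mathcal{D}) = \int p(y^* \mid x^*, \theta)\, p(\theta \mid \mathcal{D})\, d\theta
```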

  6. What have we learned so far? • Advantages of deep learning models – Very expressive models – Good for tasks such as classification, regression, sequence prediction – Modular structure, efficient training, many tools – Scales well with large amounts of data • But we also have disadvantages… – "Black-box" feeling – We cannot judge how "confident" the model is about a decision

  7. Modeling uncertainty • Wish list: – We want to know what our models know and what they do not know

  8. Modeling uncertainty • Example: I have built a dog breed classifier (bulldog, German shepherd, chihuahua). What answer will my NN give?

  9. Modeling uncertainty • Example: I have built a dog breed classifier (bulldog, German shepherd, chihuahua). I would rather get as an answer that my model is not certain about the type of dog breed.

  10. Modeling uncertainty • Wish list: – We want to know what our models know and what they do not know • Why do we care? – Decision making – Learning from limited, noisy, and missing data – Insights into why a model failed

  11. Modeling uncertainty • Finding the posterior – Finding a point estimate (MAP) → what we have been doing so far! – Finding a probability distribution over the weights (Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537)

  12. Modeling uncertainty • We can sample many times from the distribution and see how this affects our model's predictions • If the predictions are consistent, the model is confident (Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537)

  13. Modeling uncertainty • "I am not really sure" – an example of a prediction where the model should express its uncertainty (Kendall & Gal, "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?", NIPS 2017)

  14. How do we get the posterior? • Compute the posterior over the weights: p(W | D) = p(D | W) p(W) / p(D) • The denominator p(D) = ∫ p(D | W) p(W) dW is the probability of observing our data under all possible model parameters. How do we compute this?

  15. How do we get the posterior? • How do we compute this? • The denominator: we cannot compute it over all possible parameter combinations • Two ways to compute an approximation of the posterior: Markov Chain Monte Carlo and Variational Inference

  16. How do we get the posterior? • Markov Chain Monte Carlo (MCMC) – a chain of samples that converges to the true posterior (SLOW) • Variational Inference – find an approximation that is close to the true posterior
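
For reference, the variational-inference objective alluded to here is usually written as follows (standard formulation, not taken from the slide):

```latex
% Find the member of a tractable family Q that is closest (in KL) to the true posterior
q^*(\theta) = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\!\left( q(\theta) \,\|\, p(\theta \mid \mathcal{D}) \right)
```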

  17. Dropout for Bayesian Inference

  18. Recall: Dropout • Disable a random set of neurons (typically 50%) in each forward pass (Srivastava et al. 2014)
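
A minimal PyTorch sketch of the dropout forward pass described above; the tensor sizes and the 50% rate are illustrative assumptions:

```python
import torch

def dropout_forward(x, p=0.5, train=True):
    """Disable a random set of activations with probability p (inverted dropout)."""
    if not train or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()   # 1 = keep, 0 = drop
    return x * mask / (1.0 - p)               # rescale so the expected activation is unchanged

h = torch.relu(torch.randn(16, 128))          # some hidden activations
h_dropped = dropout_forward(h, p=0.5, train=True)
```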

  19. Recall: Dropout • Using half the network = half capacity – Redundant representations (e.g., features such as: furry, has two eyes, has a tail, has paws, has two ears)

  20. Recall: Dropout • Using half the network = half capacity – Redundant representations – Base your scores on more features • Consider it as a model ensemble

  21. Recall: Dropout • Two models in one (Model 1, Model 2)

  22. MC dropout • Variational Inference – find an approximation that is close to the true posterior • Dropout training – the variational distribution is drawn from a Bernoulli distribution (where the states are "on" and "off") (Y. Gal and Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", ICML 2016)

  23. MC dropout • 1. Train a model with dropout before every weight layer • 2. Apply dropout at test time – Sampling is done in a Monte Carlo fashion, hence the name Monte Carlo dropout (Y. Gal and Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", ICML 2016)

  24. MC dropout – Sampling is done in a Monte Carlo fashion, e.g., for classification: p(y | x) ≈ (1/T) Σ_{t=1..T} softmax(f(x; Ŵ_t)), where Ŵ_t ~ q(W) and q(W) is the dropout distribution (Y. Gal and Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", ICML 2016)
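
A minimal PyTorch sketch of MC dropout as described on slides 23-24; the toy architecture and the number of samples T = 20 are assumptions, not taken from the slides:

```python
import torch
import torch.nn as nn

# Toy classifier with dropout before every weight layer.
model = nn.Sequential(
    nn.Dropout(0.5), nn.Linear(784, 256), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(256, 10),
)

def mc_dropout_predict(model, x, T=20):
    """Keep dropout active at test time and average T stochastic forward passes."""
    model.train()                               # train() keeps the Dropout layers sampling masks
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    return probs.mean(dim=0), probs.std(dim=0)  # predictive mean and per-class spread

mean, std = mc_dropout_predict(model, torch.randn(4, 784))
```

The spread across the T samples is what the slides use as the uncertainty signal: consistent predictions give a small spread, inconsistent ones a large spread.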

  25. Measure your model's uncertainty (Kendall & Gal, "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?", NIPS 2017)

  26. Another look

  27. Let us take another look • We know the posterior is intractable, so we approximate it • The denominator expresses how my data is generated

  28. Let us take another look • We assume that the data is generated by some random process involving an unobserved continuous random (latent) variable z • Generation process: sample z, then generate x from z • Posterior: p(z | x)
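
Written out in standard latent-variable notation (the slide equations are not preserved in this transcript, so this is a reconstruction of the usual formulation):

```latex
% Generation process: sample a latent variable, then the observation
z \sim p_\theta(z), \qquad x \sim p_\theta(x \mid z)

% Marginal likelihood (the intractable evidence)
p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz

% Posterior over the latent variable
p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(x)}
```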

  29. Let us take another look • Variational Inference – find an approximation to the intractable posterior • My approximation is parameterized by a model

  30. Variational Autoencoders

  31. Recall: Autoencoders • Encode the input into a representation (bottleneck) and reconstruct it with the decoder (encoder: conv layers; decoder: transposed conv layers)
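
A minimal PyTorch sketch of the encoder/decoder structure on the slide, assuming 28×28 single-channel inputs and arbitrary channel counts:

```python
import torch
import torch.nn as nn

# Encoder compresses the input to a bottleneck, decoder reconstructs it.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),    # 28x28 -> 14x14
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),   # 14x14 -> 7x7 (bottleneck)
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),    # 7x7 -> 14x14
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 14x14 -> 28x28
)

x = torch.randn(8, 1, 28, 28)
x_recon = decoder(encoder(x))    # reconstruction with the same shape as the input
```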

  32. Variational Autoencoder (same encoder/decoder structure: conv layers in the encoder, transposed conv layers in the decoder)

  33. Variational Autoencoder • The latent space is now a distribution • Specifically, it is a Gaussian

  34. Variational Autoencoder • The latent space is now a distribution • Specifically, it is a Gaussian with a mean and a diagonal covariance

  35. Variational Autoencoder • The latent space is now a distribution • Specifically, it is a Gaussian with a mean and a diagonal covariance (both produced by the encoder)
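
A minimal PyTorch sketch of how such an encoder head is typically implemented: two linear layers produce the mean and the (log of the) diagonal covariance, and the latent code is sampled with the reparameterization trick. The layer sizes are assumptions:

```python
import torch
import torch.nn as nn

latent_dim, feat_dim = 32, 256                 # hypothetical sizes
fc_mu = nn.Linear(feat_dim, latent_dim)        # mean of q(z|x)
fc_logvar = nn.Linear(feat_dim, latent_dim)    # log of the diagonal covariance of q(z|x)

def sample_latent(features):
    """Sample z ~ N(mu, diag(sigma^2)) in a differentiable way (reparameterization trick)."""
    mu, logvar = fc_mu(features), fc_logvar(features)
    eps = torch.randn_like(mu)                 # all randomness comes from eps
    z = mu + torch.exp(0.5 * logvar) * eps
    return z, mu, logvar

z, mu, logvar = sample_latent(torch.randn(8, feat_dim))
```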

  36. Variational Autoencoder • Back to our Bayesian view, our generation process was: p(x) = ∫ p(x | z) p(z) dz • This is the denominator of the posterior and the quantity I want to optimize

  37. Variational Autoencoder • Loss function for a data point: I draw samples of the latent variable z from my encoder q(z | x)

  38. Variational Autoencoder • Loss function for a data point: expand the posterior p(z | x) with Bayes' rule

  39. Variational Autoencoder • Loss function for a data point: the log p(x) term does not depend on z, so it is just a constant

  40. Variational Autoencoder • Loss function for a data point (rearranging terms)

  41. Variational Autoencoder • Loss function for a data point: the terms group into Kullback-Leibler divergences

  42. Variational Autoencoder • Loss function for a data point – One KL term measures how good my latent distribution is with respect to my prior – One term is the reconstruction loss – For the KL term against the true posterior I still cannot express the shape of the distribution, but I know it is non-negative
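
The equations on slides 37-42 are not preserved in this transcript; the derivation they walk through is the standard evidence lower bound (ELBO), which for a single data point x reads:

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{\text{reconstruction}}
  - \underbrace{\mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{\text{match to the prior}}
  + \underbrace{\mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\right)}_{\text{intractable, but}\ \ge 0}
```

Since the last KL term compares against the intractable true posterior but is non-negative, dropping it leaves a lower bound on log p_θ(x), which is what the next slide takes as the training objective.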

  43. Variational Autoencoder • Loss function for a data point: the resulting loss function is the (negative of the) lower bound
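
A minimal PyTorch sketch of this loss (the negative ELBO) for a Gaussian q(z|x) with a standard-normal prior; using binary cross-entropy as the reconstruction term assumes a Bernoulli decoder, which is my assumption and not stated on the slide:

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    """Per-datapoint negative ELBO = reconstruction loss + KL(q(z|x) || N(0, I))."""
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")     # -E_q[log p(x|z)] for a Bernoulli decoder
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())    # closed-form Gaussian KL divergence
    return recon + kl
```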

