
Bayesian Deep Learning - Prof. Leal-Taixé - PowerPoint PPT Presentation



  1. Bayesian Deep Learning - Prof. Leal-Taixé and Prof. Niessner

  2. Going full Bayesian • Bayes = probabilities; hypothesis = model; evidence = data • Bayes' theorem: p(H | E) = p(E | H) p(H) / p(E)

  3. Going full Bayesian • Start with a prior on the model parameters p(θ) • Choose a statistical model p(x | θ) of the data • Use the data to refine the prior, i.e., compute the posterior p(θ | x) = p(x | θ) p(θ) / p(x); the denominator p(x) has no dependence on the parameters

  4. Going full Bayesian • Start with a prior on the model parameters p(θ) • Choose a statistical model p(x | θ) of the data • Use the data to refine the prior, i.e., compute the posterior p(θ | x) = p(x | θ) p(θ) / p(x), where p(θ | x) is the posterior, p(x | θ) the likelihood, and p(θ) the prior

  5. Going full Bayesian • 1. Learning: computing the posterior – Finding a point estimate (MAP) → what we have been doing so far! p(θ | x) ∝ p(x | θ) p(θ) – Finding a probability distribution over θ → this lecture: p(θ | x) = p(x | θ) p(θ) / p(x)

  6. What have we learned so far? • Advantages of deep learning models – Very expressive models – Good for tasks such as classification, regression, sequence prediction – Modular structure, efficient training, many tools – Scale well with large amounts of data • But we also have disadvantages… – “Black-box” feeling – We cannot judge how “confident” the model is about a decision

  7. Modeling uncertainty • Wish list: – We want to know what our models know and what they do not know

  8. Modeling uncertainty • Example: I have built a dog breed classifier (Bulldog, German shepherd, Chihuahua). What answer will my NN give?

  9. Modeling uncertainty • Example: I have built a dog breed classifier (Bulldog, German shepherd, Chihuahua). I would rather get as an answer that my model is not certain about the type of dog breed

  10. Modeling uncertainty • Wish list: – We want to know what our models know and what they do not know • Why do we care? – Decision making – Learning from limited, noisy, and missing data – Insights on why a model failed

  11. Modeling uncertainty • Finding the posterior – Finding a point estimate (MAP) → what we have been doing so far! – Finding a probability distribution over θ Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537

  12. Modeling uncertainty • We can sample many times from the distribution and see how this affects our model’s predictions • If the predictions are consistent, the model is confident (see the sketch below) Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537
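
A minimal sketch of this idea (not from the lecture), using a toy one-parameter linear model with an assumed Gaussian approximate posterior over its weight; the distribution parameters and test input are illustrative. The spread of the sampled predictions plays the role of the model's uncertainty at that input.

```python
# Sample parameters from an assumed approximate posterior and inspect
# how much the predictions move. Small spread = consistent = confident.
import numpy as np

rng = np.random.default_rng(0)

# Assumed approximate posterior over the single weight: theta ~ N(2.0, 0.3^2)
theta_samples = rng.normal(loc=2.0, scale=0.3, size=1000)

x = 1.5                          # one test input
predictions = theta_samples * x  # predictions of the toy linear model y = theta * x

print("mean prediction:", predictions.mean())
print("predictive std :", predictions.std())  # spread = uncertainty at this input
```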

  13. Modeling uncertainty “I am not really sure” Kendall & Gal, “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?”, NIPS 2017

  14. Why?

  15. How do we get the posterior? • Compute the posterior over the weights: p(θ | x) = p(x | θ) p(θ) / p(x) • The evidence is the probability of observing our data under all possible model parameters: p(θ | x) = p(x | θ) p(θ) / ∫_θ p(x | θ) p(θ) dθ → how do we compute this?

  16. How do we get the posterior? • How do we compute p(θ | x) = p(x | θ) p(θ) / ∫_θ p(x | θ) p(θ) dθ? • Denominator: we cannot compute it over all possible parameter combinations • Two ways to compute an approximation of the posterior: Markov Chain Monte Carlo and Variational Inference

  17. How do we get the posterior? • Markov Chain Monte Carlo (MCMC) – A chain of samples θ_t → θ_{t+1} → θ_{t+2} → … that converges to p(θ | x) – SLOW • Variational Inference – Find an approximation q(θ) such that arg min_q KL(q(θ) || p(θ | x))

  18. Dropout for Bayesian Inference

  19. Recall: Dropout • Disable a random set of neurons (typically 50%) in the forward pass (a minimal sketch follows below) Srivastava et al. 2014
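
A minimal sketch of a dropout forward pass (not the lecture's code), assuming the common "inverted dropout" formulation where surviving activations are rescaled by the keep probability; the activation values are made up.

```python
import numpy as np

def dropout_forward(activations, p_keep=0.5, rng=np.random.default_rng()):
    """Inverted dropout on one activation vector."""
    mask = rng.random(activations.shape) < p_keep  # Bernoulli "on"/"off" state per neuron
    return activations * mask / p_keep             # rescale so the expected activation is unchanged

h = np.array([0.2, 1.3, -0.7, 0.9])
print(dropout_forward(h))  # roughly half of the entries are zeroed out
```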

  20. Recall: Dropout • Using half the network = half capacity – Redundant representations (e.g., furry, has two eyes, has a tail, has paws, has two ears)

  21. Recall: Dropout • Using half the network = half capacity – Redundant representations – Base your scores on more features • Consider it as a model ensemble

  22. Recall: Dropout • Two models in one (Model 1 and Model 2)

  23. MC dropout • Variational Inference – Find an approximation q(θ) such that arg min_q KL(q(θ) || p(θ | x)) • Dropout training – The variational distribution is built from a Bernoulli distribution (where the states are “on” and “off”) Y. Gal, Z. Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016

  24. MC dropout • 1. Train a model with dropout before every weight layer • 2. Apply dropout at test time – Sampling is done in a Monte Carlo fashion, hence the name Monte Carlo dropout Y. Gal, Z. Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016

  25. MC dropout – Sampling is done in a Monte Carlo fashion, e.g., for classification: p(y = c | x) ≈ (1/T) Σ_{t=1..T} Softmax(f_θ̂_t(x)), where the NN parameters θ̂_t ∼ q(θ) are sampled from the dropout distribution (see the sketch below) Y. Gal, Z. Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016
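
A minimal PyTorch sketch of MC dropout (illustrative, not the lecture's or the paper's code): dropout sits before every weight layer and is kept active at test time, the softmax outputs of T stochastic forward passes are averaged, and the predictive entropy serves as a simple uncertainty score. The layer sizes and T = 50 are assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Dropout(p=0.5), nn.Linear(784, 256), nn.ReLU(),
    nn.Dropout(p=0.5), nn.Linear(256, 10),
)

def mc_dropout_predict(model, x, T=50):
    # Keep the dropout layers stochastic even at test time.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        # T stochastic forward passes, softmax of each -> shape (T, batch, classes)
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    mean_probs = probs.mean(dim=0)  # p(y = c | x) ≈ (1/T) Σ_t Softmax(f_θ̂_t(x))
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy      # prediction + per-sample uncertainty

x = torch.randn(4, 784)             # dummy batch
mean_probs, uncertainty = mc_dropout_predict(model, x)
print(mean_probs.shape, uncertainty)  # (4, 10) and one entropy value per input
```

Larger T gives a smoother estimate of the predictive distribution at the cost of T forward passes per input.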

  26. Measure your model’s uncertainty Kendall & Gal, “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?”, NIPS 2017

  27. Variational Autoencoders

  28. Recall: Autoencoders • Encode the input x into a bottleneck representation z (encoder: Conv) and reconstruct it as x̃ with the decoder (Transpose Conv); a minimal sketch follows below
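
A minimal autoencoder sketch (illustrative, not the lecture's code); the specific layer sizes and the 1×28×28 input shape are assumptions made to keep the example short.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28x28 -> 14x14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14x14 -> 7x7
            nn.Flatten(), nn.Linear(32 * 7 * 7, latent_dim),       # bottleneck z
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)     # bottleneck representation
        return self.decoder(z)  # reconstruction x̃

x = torch.randn(8, 1, 28, 28)
print(AutoEncoder()(x).shape)   # torch.Size([8, 1, 28, 28])
```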

  29. Variational Autoencoder • Same encoder/decoder structure, now probabilistic: the encoder q_φ(z | x) (Conv) maps x to the latent z, the decoder p_θ(x̃ | z) (Transpose Conv) maps z back to the reconstruction x̃

  30. Variational Autoencoder • Goal: sample from the latent distribution to generate new outputs!

  31. Variational Autoencoder • The latent space is now a distribution • Specifically, it is a Gaussian: the encoder predicts µ_{z|x} and Σ_{z|x}, a latent z | x ∼ N(µ_{z|x}, Σ_{z|x}) is sampled, and the decoder reconstructs x̃

  32. Variational Autoencoder • The latent space is now a distribution • Specifically, it is a Gaussian z | x ∼ N(µ_{z|x}, Σ_{z|x}), with mean µ_{z|x} and diagonal covariance Σ_{z|x} predicted by the encoder

  33. Variational Autoencoder • Training: encoder → (µ_{z|x}, Σ_{z|x}) → sample z | x ∼ N(µ_{z|x}, Σ_{z|x}) → decoder → x̃

  34. Variational Autoencoder • Test: sample z from the latent space, z ∼ N(µ_{z|x}, Σ_{z|x}), and run only the decoder to generate x̃ (see the sketch below)
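
A minimal VAE sketch (illustrative, not the lecture's code), using dense layers instead of the conv stacks above: the encoder outputs µ_{z|x} and a diagonal log-variance, z is drawn with the reparameterization trick, and at test time a z is sampled and only the decoder is run. The layer sizes and sampling from N(0, I) at test time are assumptions.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 400), nn.ReLU())
        self.fc_mu = nn.Linear(400, latent_dim)       # µ_{z|x}
        self.fc_logvar = nn.Linear(400, latent_dim)   # log of the diagonal of Σ_{z|x}
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 400), nn.ReLU(),
            nn.Linear(400, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)          # sample z | x ~ N(µ, Σ) via reparameterization
        return self.decoder(z), mu, logvar

# Test time: sample z (here from the assumed prior N(0, I)) and run only the decoder.
vae = VAE()
z = torch.randn(16, 20)
new_samples = vae.decoder(z)
```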

  35. VAE: training • Goal: estimate the parameters θ of my generative model • Back to the Bayesian view for training: p_θ(x) = ∫_z p_θ(x | z) p_θ(z) dz – Prior p_θ(z) = Gaussian – Decoder p_θ(x | z) = neural network – Intractable: we cannot compute the integral over every z

  36. VAE: training • Goal: estimate the parameters of my generative model • We approximate the intractable part with an encoder q_φ(z | x); the decoder is p_θ(x̃ | z)

  37. VAE: loss function • Loss function for a data point x_i: log p_θ(x_i) = E_{z ∼ q_φ(z | x_i)}[log p_θ(x_i)] – I draw samples of the latent variable z from my encoder (a sketch of the resulting loss follows below)
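
Expanding this expectation leads to the usual evidence lower bound, i.e., a reconstruction term plus a KL term. A minimal sketch of that loss (illustrative, not the lecture's code), assuming a Bernoulli decoder and the diagonal-Gaussian encoder of the VAE sketch above; the dummy tensors stand in for encoder/decoder outputs.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    # Reconstruction term: one-sample estimate of E_{z~q_φ(z|x)}[-log p_θ(x|z)]
    # for a Bernoulli decoder.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL(q_φ(z|x) || N(0, I)) for a diagonal Gaussian.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # negative ELBO

# Dummy tensors standing in for x_recon, mu, logvar from the VAE sketch above.
x = torch.rand(16, 784)
x_recon = torch.sigmoid(torch.randn(16, 784, requires_grad=True))
mu = torch.randn(16, 20, requires_grad=True)
logvar = torch.randn(16, 20, requires_grad=True)
vae_loss(x, x_recon, mu, logvar).backward()
```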
