Bayesian Deep Learning, Prof. Leal-Taixé and Prof. Niessner (PowerPoint presentation)


SLIDE 1

Bayesian Deep Learning

SLIDE 2

Going Full Bayesian

  • Bayes = probabilities
  • Bayes' Theorem:

p(H|E) = p(E|H) p(H) / p(E)

  • Evidence = data; Hypothesis = model
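Plugging concrete numbers into Bayes' theorem makes the update tangible. A minimal sketch in Python; the prior and likelihood values are made up purely for illustration:

```python
# Bayes' theorem: p(H|E) = p(E|H) p(H) / p(E), with the evidence
# p(E) expanded over both hypotheses (H and not-H).
p_H = 0.3             # prior belief in the hypothesis (illustrative value)
p_E_given_H = 0.8     # likelihood of the evidence if H is true
p_E_given_notH = 0.1  # likelihood of the evidence if H is false

p_E = p_E_given_H * p_H + p_E_given_notH * (1 - p_H)  # total evidence
p_H_given_E = p_E_given_H * p_H / p_E                 # posterior

print(round(p_H_given_E, 4))  # → 0.7742
```

The evidence in the denominator is what later becomes intractable for neural networks, where it is an integral over all parameters rather than a two-term sum.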

SLIDE 3

Going Full Bayesian

  • Start with a prior on the model parameters: p(θ) (no dependence on the data)
  • Choose a statistical model: the likelihood p(x|θ)
  • Use the data to refine the prior, i.e., compute the posterior:

p(θ|x) = p(x|θ) p(θ) / p(x)

SLIDE 4

Going Full Bayesian

  • Start with a prior p(θ) on the model parameters
  • Choose a statistical model p(x|θ)
  • Use the data x to refine the prior, i.e., compute the posterior:

p(θ|x) = p(x|θ) p(θ) / p(x)

(prior p(θ), likelihood p(x|θ), posterior p(θ|x), data x)

SLIDE 5

Going Full Bayesian

  • 1. Learning: computing the posterior p(θ|x) = p(x|θ) p(θ) / p(x)

– Finding a point estimate (MAP) → what we have been doing so far!
– Finding a probability distribution over θ → this lecture
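The MAP-vs-distribution distinction can be seen in a conjugate toy model. A hedged sketch, assuming a Beta-Bernoulli coin (not from the slides): the Beta prior makes the posterior available in closed form, so the point estimate (MAP) and the full distribution's mean can both be read off directly.

```python
# Beta(a, b) prior on a coin's heads-probability θ; Bernoulli likelihood.
# The posterior is again a Beta, so "going full Bayesian" is exact here.
a, b = 2.0, 2.0       # prior pseudo-counts (illustrative)
heads, tails = 7, 3   # observed data

a_post, b_post = a + heads, b + tails                # posterior Beta(a', b')
map_estimate = (a_post - 1) / (a_post + b_post - 2)  # posterior mode = MAP point estimate
post_mean = a_post / (a_post + b_post)               # mean of the full posterior

print(round(map_estimate, 4), round(post_mean, 4))  # → 0.6667 0.6429
```

The two numbers differ: the point estimate throws away the spread of the posterior, which is exactly the information this lecture is after.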

SLIDE 6

What have we learned so far?

  • Advantages of deep learning models

– Very expressive models
– Good for tasks such as classification, regression, sequence prediction
– Modular structure, efficient training, many tools
– Scales well with large amounts of data

  • But we also have disadvantages…

– "Black-box" feeling
– We cannot judge how "confident" the model is about a decision

SLIDE 7

Modeling uncertainty

  • Wish list:

– We want to know what our models know and what they do not know

SLIDE 8

Modeling uncertainty

  • Example: I have built a dog breed classifier

(Figure: Bulldog, German shepherd, Chihuahua) What answer will my NN give?

SLIDE 9

Modeling uncertainty

  • Example: I have built a dog breed classifier

(Figure: Bulldog, German shepherd, Chihuahua) I would rather get as an answer that my model is not certain about the dog's breed

SLIDE 10

Modeling uncertainty

  • Wish list:

– We want to know what our models know and what they do not know

  • Why do we care?

– Decision making
– Learning from limited, noisy, and missing data
– Insights into why a model failed

SLIDE 11

Modeling uncertainty

  • Finding the posterior

– Finding a point estimate (MAP) → what we have been doing so far!
– Finding a probability distribution over θ

Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537

SLIDE 12

Modeling uncertainty

  • We can sample many times from the distribution and see how this affects our model's predictions
  • If the predictions are consistent, the model is confident

Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537
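The sampling idea above can be sketched with a toy one-parameter "network" and an assumed Gaussian posterior over its weight (all numbers are illustrative, not from the slides):

```python
import random
random.seed(0)

# Toy 1-D "network": y = w * x, with an (assumed) Gaussian posterior over w.
mu_w, sigma_w = 2.0, 0.1  # narrow posterior -> a confident model
x = 3.0

# Sample many weights from the posterior and collect the predictions.
preds = [random.gauss(mu_w, sigma_w) * x for _ in range(1000)]
mean_pred = sum(preds) / len(preds)
var_pred = sum((p - mean_pred) ** 2 for p in preds) / len(preds)

# Consistent predictions (small variance) = the model is confident here.
print(round(mean_pred, 2), round(var_pred, 3))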

SLIDE 13

Modeling uncertainty

(Figure: "I am not really sure")

Kendall & Gal. "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?" NIPS 2017

SLIDE 14

Why?

SLIDE 15

How do we get the posterior?

  • Compute the posterior over the weights
  • The evidence p(x) is the probability of observing our data under all possible model parameters:

p(θ|x) = p(x|θ) p(θ) / p(x) = p(x|θ) p(θ) / ∫_θ p(x|θ) p(θ) dθ

  • How do we compute this?

SLIDE 16

How do we get the posterior?

p(θ|x) = p(x|θ) p(θ) / ∫_θ p(x|θ) p(θ) dθ

  • How do we compute this?
  • Denominator: we cannot compute it over all possible parameter combinations
  • Two ways to approximate the posterior:

– Markov Chain Monte Carlo
– Variational Inference

SLIDE 17

How do we get the posterior?

  • Markov Chain Monte Carlo (MCMC)

– A chain of samples θt → θt+1 → θt+2 … that converges to p(θ|x) (SLOW)

  • Variational Inference

– Find an approximation q(θ) that minimizes KL(q(θ) || p(θ|x))
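A minimal random-walk Metropolis sampler illustrates the MCMC idea: the chain θt → θt+1 → … eventually draws from the target posterior without ever evaluating the intractable normalizer. The unnormalized target here is a standard normal, chosen purely for illustration:

```python
import math
import random
random.seed(1)

def unnorm_post(theta):
    # Unnormalized posterior p(x|θ)p(θ): a standard normal, for illustration.
    return math.exp(-0.5 * theta * theta)

theta, samples = 5.0, []  # deliberately bad starting point, far from the mode
for t in range(20000):
    prop = theta + random.gauss(0.0, 1.0)  # random-walk proposal θt -> θ'
    # Metropolis acceptance: the normalizer cancels in the ratio.
    if random.random() < unnorm_post(prop) / unnorm_post(theta):
        theta = prop
    samples.append(theta)

burned = samples[2000:]  # drop the burn-in phase
mean = sum(burned) / len(burned)
print(round(mean, 2))    # ≈ 0, the posterior mean
```

The "SLOW" label on the slide is visible even here: thousands of correlated steps are needed for a one-dimensional target, and the cost grows sharply with dimension.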

SLIDE 18

Dropout for Bayesian Inference

SLIDE 19

Recall: Dropout

  • Disable a random set of neurons (typically 50%) in the forward pass

(Srivastava et al. 2014)

SLIDE 20

Recall: Dropout

  • Using half the network = half capacity → redundant representations

(Figure: features such as "furry", "has two eyes", "has a tail", "has paws", "has two ears")

SLIDE 21

Recall: Dropout

  • Using half the network = half capacity

– Redundant representations
– Base your scores on more features

  • Consider it as a model ensemble

SLIDE 22

Recall: Dropout

  • Two models in one

(Figure: Model 1, Model 2)

SLIDE 23

MC dropout

  • Variational Inference

– Find an approximation q(θ) that minimizes KL(q(θ) || p(θ|x))

  • Dropout training

– The variational distribution is a Bernoulli distribution (where the states are "on" and "off")

Y. Gal, Z. Ghahramani. "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning." ICML 2016

SLIDE 24

MC dropout

  • 1. Train a model with dropout before every weight layer
  • 2. Apply dropout at test time

– Sampling is done in a Monte Carlo fashion, hence the name Monte Carlo dropout

SLIDE 25

MC dropout

– Sampling is done in a Monte Carlo fashion: draw θ̂t ∼ q(θ), where q(θ) is the dropout distribution, and average the network's softmax outputs:

p(y = c | x) ≈ (1/T) Σ_{t=1}^{T} Softmax(f_{θ̂t}(x))

(parameter sampling θ̂t ∼ q(θ); NN f; classification output)
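The Monte Carlo average above can be sketched with a hypothetical one-layer classifier whose hidden units are dropped out at test time. The weights and inputs are made-up numbers, with p_drop = 0.5 and inverted-dropout scaling:

```python
import math
import random
random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy classifier f_θ: logits = W h, where h is a hidden vector whose units
# are dropped (Bernoulli "on"/"off") at *test* time.
W = [[1.0, -1.0, 0.5, 0.2],
     [-0.5, 1.0, -0.2, 0.3]]  # 2 classes, 4 hidden units (illustrative)
h = [0.8, 0.1, 0.5, 0.9]

T = 1000
avg = [0.0, 0.0]
for t in range(T):
    mask = [1.0 if random.random() > 0.5 else 0.0 for _ in h]  # θ̂_t ~ q(θ)
    h_t = [hi * mi / 0.5 for hi, mi in zip(h, mask)]           # inverted-dropout scaling
    logits = [sum(wi * xi for wi, xi in zip(row, h_t)) for row in W]
    probs = softmax(logits)
    avg = [a + p / T for a, p in zip(avg, probs)]  # (1/T) Σ Softmax(f_θ̂t(x))

print([round(p, 2) for p in avg])  # approximate predictive distribution p(y=c|x)
```

The spread of the per-sample `probs` around `avg` is the model's uncertainty at this input; a deterministic forward pass would hide it.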

SLIDE 26

Measure your model's uncertainty

Kendall & Gal. "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?" NIPS 2017

SLIDE 27

Variational Autoencoders

SLIDE 28

Recall: Autoencoders

  • Encode the input into a representation (bottleneck) and reconstruct it with the decoder

(Figure: x → Encoder (Conv) → z → Decoder (Transpose Conv) → x̃)

SLIDE 29

Variational Autoencoder

(Figure: x → Encoder qφ(z|x), parameters φ → z → Decoder pθ(x̃|z), parameters θ → x̃)

SLIDE 30

Variational Autoencoder

Goal: sample from the latent distribution to generate new outputs!

(Figure: x → Encoder (φ) → z → Decoder (θ) → x̃)

SLIDE 31

Variational Autoencoder

  • The latent space is now a distribution
  • Specifically, it is a Gaussian: z|x ∼ N(μz|x, Σz|x)

(Figure: x → Encoder (φ) → μz|x, Σz|x → Sample z → Decoder (θ) → x̃)

SLIDE 32

Variational Autoencoder

  • The latent space is now a distribution
  • Specifically, it is a Gaussian: z|x ∼ N(μz|x, Σz|x), with mean μz|x and diagonal covariance Σz|x

(Figure: x → Encoder (φ) → μz|x, Σz|x)

SLIDE 33

Variational Autoencoder

  • Training

(Figure: x → Encoder (φ) → μz|x, Σz|x → Sample z|x ∼ N(μz|x, Σz|x) → Decoder (θ) → x̃)

SLIDE 34

Variational Autoencoder

  • Test: sampling from the latent space

(Figure: Sample z ∼ N(μz|x, Σz|x) → Decoder (θ) → x̃)

SLIDE 35

VAE: training

  • Back to the Bayesian view for training

Goal: estimate the parameters θ of my generative model

pθ(x) = ∫_z pθ(x|z) pθ(z) dz

– pθ(z): prior = Gaussian
– pθ(x|z): decoder (neural network)
– Intractable to compute the output for every z
SLIDE 36

VAE: training

  • We approximate the intractable posterior with an encoder qφ(z|x)

Goal: estimate the parameters θ of my generative model

(Figure: x → Encoder (φ) → μz|x, Σz|x → Sample z → Decoder pθ(x̃|z) → x̃)

SLIDE 37

VAE: loss function

  • Loss function for a data point xi:

log pθ(xi) = E_{z∼qφ(z|xi)}[log pθ(xi)]

I draw samples of the latent variable z from my encoder

SLIDE 38

VAE: loss function

  • Loss function for a data point xi:

log pθ(xi) = E_{z∼qφ(z|xi)}[log pθ(xi)] = E_{z∼qφ(z|xi)}[log (pθ(xi|z) pθ(z) / pθ(z|xi))]

  • Bayes' rule, recall: pθ(z|x) = pθ(x|z) pθ(z) / pθ(x)
  • Using the latent variable will become useful to simplify the expressions later, according to our AE formulation

SLIDE 39

VAE: loss function

  • Loss function for a data point xi:

log pθ(xi) = E_{z∼qφ(z|xi)}[log (pθ(xi|z) pθ(z) / pθ(z|xi))]
           = E_z[log (pθ(xi|z) pθ(z) qφ(z|xi) / (pθ(z|xi) qφ(z|xi)))]

  • (log pθ(xi) is just a constant with respect to z, so wrapping it in the expectation is valid)
SLIDE 40

VAE: loss function

  • Loss function for a data point xi:

log pθ(xi) = E_z[log (pθ(xi|z) pθ(z) qφ(z|xi) / (pθ(z|xi) qφ(z|xi)))]
           = E_z[log pθ(xi|z)] − E_z[log (qφ(z|xi) / pθ(z))] + E_z[log (qφ(z|xi) / pθ(z|xi))]

  • Apply the logarithm and group as needed
SLIDE 41

VAE: loss function

  • Loss function for a data point xi:

= E_z[log pθ(xi|z)] − E_z[log (qφ(z|xi) / pθ(z))] + E_z[log (qφ(z|xi) / pθ(z|xi))]

  • The last two terms are Kullback-Leibler divergences, which measure how similar two distributions are
SLIDE 42

VAE: loss function

  • Loss function for a data point xi (as Kullback-Leibler divergences):

= E_z[log pθ(xi|z)] − E_z[log (qφ(z|xi) / pθ(z))] + E_z[log (qφ(z|xi) / pθ(z|xi))]
= E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi))
SLIDE 43

VAE: loss function

  • Loss function for a data point xi:

= E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi))

– E_z[log pθ(xi|z)]: reconstruction loss (how well my decoder reconstructs a data point given the latent vector z); we need to sample z
– KL(qφ(z|xi) || pθ(z)): measures how good my latent distribution is with respect to my Gaussian prior
– KL(qφ(z|xi) || pθ(z|xi)): I still cannot express the shape of this distribution, but I know it is ≥ 0

SLIDE 44

VAE: loss function

  • Loss function for a data point xi:

E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi)), with the last KL term ≥ 0

  • Loss function (lower bound): L(xi, φ, θ) = E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)), so that log pθ(xi) ≥ L(xi, φ, θ)

SLIDE 45

VAE: loss function

  • Loss function (lower bound) for a data point xi:

L(xi, φ, θ) = E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z))

  • Optimize:

φ*, θ* = arg max_{φ,θ} Σ_{i=1}^{N} L(xi, φ, θ)
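For a diagonal-Gaussian encoder and a unit-Gaussian prior, the KL term of the lower bound L(xi, φ, θ) has a closed form, KL = ½ Σ (σ² + μ² − 1 − log σ²). A sketch with hypothetical encoder outputs (the μ and log σ² values are made up):

```python
import math

# Closed-form KL( N(μ, diag(σ²)) || N(0, I) ): the prior-matching term of the
# lower bound, computed per latent dimension and summed.
def kl_to_unit_gaussian(mu, log_var):
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# Hypothetical encoder outputs for one data point x_i:
mu = [0.5, -0.3]
log_var = [-0.1, 0.2]

kl = kl_to_unit_gaussian(mu, log_var)
print(round(kl, 4))  # 0 only when q equals the unit-Gaussian prior
```

Because this term is available in closed form, only the reconstruction term E_z[log pθ(xi|z)] needs Monte Carlo samples of z during training.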

SLIDE 46

Variational Autoencoder

  • Training

E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi))

– The KL(qφ(z|xi) || pθ(z)) term makes the posterior distribution close to the prior (close to a unit Gaussian)

(Figure: x → Encoder (φ) → μz|x, Σz|x)

SLIDE 47

Variational Autoencoder

  • Training

E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi))

(Figure: x → Encoder (φ) → μz|x, Σz|x, with z|x ∼ N(μz|x, Σz|x))

SLIDE 48

Variational Autoencoder

  • Training

E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi))

(Figure: x → Encoder (φ) → μz|x, Σz|x → Sample z|x ∼ N(μz|x, Σz|x))

SLIDE 49

Variational Autoencoder

  • Training

E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi))

(Figure: x → Encoder (φ) → μz|x, Σz|x → Sample z → Decoder (θ) → x̃)

SLIDE 50

Variational Autoencoder

  • Training

E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi))

– The output is also parameterized: x|z ∼ N(μx|z, Σx|z)

(Figure: z → Decoder (θ) → μx|z, Σx|z → Sample x̃)
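When the decoder output is parameterized as a diagonal Gaussian N(μx|z, Σx|z), the reconstruction term log pθ(x|z) is just the Gaussian log-density of the input under the decoder's output. A sketch with made-up decoder outputs for a three-pixel "image":

```python
import math

# log N(x; μ, diag(σ²)): the reconstruction term of the VAE lower bound
# for a Gaussian decoder, summed over dimensions.
def gaussian_log_likelihood(x, mu, log_var):
    return sum(
        -0.5 * (lv + math.log(2 * math.pi) + (xi - m) ** 2 / math.exp(lv))
        for xi, m, lv in zip(x, mu, log_var)
    )

# Hypothetical input and decoder outputs (illustrative numbers):
x = [0.9, 0.2, 0.4]
mu = [0.8, 0.25, 0.5]
log_var = [-2.0, -2.0, -2.0]

print(round(gaussian_log_likelihood(x, mu, log_var), 3))
```

Maximizing this term pushes μx|z toward the input; with a fixed variance it reduces to a scaled squared-error reconstruction loss.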

SLIDE 51

Variational Autoencoder

  • Training

E_z[log pθ(xi|z)] − KL(qφ(z|xi) || pθ(z)) + KL(qφ(z|xi) || pθ(z|xi))

– The E_z[log pθ(xi|z)] term maximizes the likelihood of reconstructing the input x̃

SLIDE 52

Variational Autoencoder

  • For more details and the mathematical derivation, see Kingma and Welling. "Auto-Encoding Variational Bayes." ICLR 2014
  • The reparameterization trick (expressing variables as Gaussians) allows us to perform backpropagation
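The reparameterization trick can be sketched in a few lines: sampling ε from a fixed N(0, I) and computing z = μ + σ·ε makes z a deterministic function of (μ, σ), so gradients can flow through the sampling step to the encoder parameters (the values below are illustrative):

```python
import random
random.seed(0)

# Reparameterization trick: instead of sampling z ~ N(μ, σ²) directly
# (not differentiable w.r.t. μ and σ), sample ε ~ N(0, I) and set
# z = μ + σ * ε, a deterministic, differentiable function of μ and σ.
def reparameterize(mu, sigma):
    return [m + s * random.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

mu, sigma = [0.5, -1.0], [0.1, 0.2]
zs = [reparameterize(mu, sigma) for _ in range(5000)]

# Sanity check: the samples still have the intended mean.
mean0 = sum(z[0] for z in zs) / len(zs)
print(round(mean0, 2))  # ≈ μ[0] = 0.5
```

All randomness lives in ε, which has no learnable parameters; that is what makes backpropagation through the latent sample possible.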

SLIDE 53

How about generating data?

  • Training as seen before

http://kvfrans.com/variational-autoencoders-explained/

SLIDE 54

How about generating data?

  • After training, generate random samples
  • Sample from the latent distribution (e.g., a unit Gaussian)

SLIDE 55

Generating data

  • Each element of z encodes a different feature

SLIDE 56

Generating data

(Figure: varying z changes the degree of smile and the head pose)

SLIDE 57

Autoencoder vs VAE

(Figure: reconstructions from an Autoencoder, a Variational Autoencoder, and the Ground Truth)

https://github.com/kvfrans/variational-autoencoder

SLIDE 58

Autoencoder Overview

  • Autoencoders (AE)

– Reconstruct the input
– Unsupervised learning
– Latent space features are useful

  • Variational Autoencoders (VAE)

– Probability distribution in latent space (e.g., Gaussian)
– Interpretable latent space (head pose, smile)
– Sample from the model to generate output

SLIDE 59

Next lectures

  • More on generative models
  • This Wednesday the 4th: first project presentations!

SLIDE 60

Other references

  • Conditional Variational Autoencoders:

– Sohn, Kihyuk, Honglak Lee, and Xinchen Yan. "Learning Structured Output Representation using Deep Conditional Generative Models." NIPS 2015.
– Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee. "Attribute2Image: Conditional Image Generation from Visual Attributes." ECCV 2016.

SLIDE 61

Other references

  • Interesting reads:

– Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert. "An Uncertain Future: Forecasting from Static Images using Variational Autoencoders." ECCV 2016.
– Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther. "Autoencoding beyond pixels using a learned similarity metric." ICML 2016.
– Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, David Forsyth. "Learning Diverse Image Colorization." arXiv 2016.
– Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala. "Semantic Facial Expression Editing using Autoencoded Flow." arXiv 2016.
– Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. "Semi-Supervised Learning with Deep Generative Models." NIPS 2014.