Bayesian Deep Learning – Prof. Leal-Taixé – PowerPoint PPT Presentation



SLIDE 1

Bayesian Deep Learning

SLIDE 2

Going full Bayesian


  • Bayes = Probabilities
  • Bayes Theorem

Evidence = data
Hypothesis = model
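In symbols (a standard statement of Bayes' theorem matching the labels above; the formula itself is not from the extracted slide):

    p(\text{hypothesis} \mid \text{evidence}) = \frac{p(\text{evidence} \mid \text{hypothesis})\, p(\text{hypothesis})}{p(\text{evidence})}

Here the hypothesis is the model and the evidence is the data.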

SLIDE 3

Going full Bayesian


  • Start with a prior on the model parameters
  • Choose a statistical model
  • Use data to refine my prior, i.e., compute the posterior

[Equation annotations: the prior p(θ) has no dependence on the data; θ = model parameters, D = data]

SLIDE 4

Going full Bayesian


  • Start with a prior on the model parameters
  • Choose a statistical model
  • Use data to refine my prior, i.e., compute the posterior

p(θ | D) = p(D | θ) p(θ) / p(D)   — posterior = likelihood × prior / evidence, computed from the data

SLIDE 5

Going full Bayesian


  • 1. Learning: Computing the posterior

– Finding a point estimate (MAP) → what we have been doing so far!
– Finding a probability distribution over the weights → this lecture

SLIDE 6

What have we learned so far?

  • Advantages of Deep Learning models

– Very expressive models
– Good for tasks such as classification, regression, sequence prediction
– Modular structure, efficient training, many tools
– Scales well with large amounts of data

  • But we also have disadvantages…

– ”Black-box” feeling
– We cannot judge how “confident” the model is about a decision

SLIDE 7

Modeling uncertainty

  • Wish list:

– We want to know what our models know and what they do not know

SLIDE 8

Modeling uncertainty

  • Example: I have built a dog breed classifier

[Figure: dog photo. Classes: Bulldog, German shepherd, Chihuahua] What answer will my NN give?

SLIDE 9

Modeling uncertainty

  • Example: I have built a dog breed classifier

[Figure: dog photo. Classes: Bulldog, German shepherd, Chihuahua] I would rather get as an answer that my model is not certain about the dog breed

SLIDE 10

Modeling uncertainty

  • Wish list:

– We want to know what our models know and what they do not know

  • Why do we care?

– Decision making
– Learning from limited, noisy, and missing data
– Insights on why a model failed

SLIDE 11

Modeling uncertainty


  • Finding the posterior

– Finding a point estimate (MAP) → what we have been doing so far!
– Finding a probability distribution over the weights

Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537

SLIDE 12

Modeling uncertainty


  • We can sample many times from the distribution and see how this affects our model’s predictions
  • If the predictions are consistent, the model is confident

Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537

SLIDE 13

Modeling uncertainty


[Figure: the model answers “I am not really sure”]

Kendall & Gal. “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?“ NIPS 2017

SLIDE 14

How do we get the posterior?

  • Compute the posterior over the weights
  • Probability of observing our data under all possible model parameters


How do we compute this?
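In standard notation (the slide's own equation image did not survive extraction, so this is a reconstruction): the posterior over the weights θ given data D is

    p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}, \qquad p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta

The integral over all possible model parameters is what makes the denominator intractable for deep networks.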

SLIDE 15

How do we get the posterior?

  • How do we compute this?
  • Denominator = we cannot compute all possible combinations
  • Two ways to compute the approximation of the posterior:


– Markov Chain Monte Carlo
– Variational Inference

SLIDE 16

How do we get the posterior?

  • Markov Chain Monte Carlo (MCMC)
    – A chain of samples that converges to the posterior p(θ | D) → SLOW
  • Variational Inference
    – Find an approximation q(θ) that is close to the true posterior

SLIDE 17

Dropout for Bayesian Inference

SLIDE 18

Recall: Dropout

  • Disable a random set of neurons (typically 50%)

Srivastava et al. 2014

[Figure: forward pass with a random subset of neurons disabled]
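As a minimal sketch of what this does in a forward pass (illustrative PyTorch-style code, not from the slides; uses the standard “inverted dropout” scaling):

    import torch

    def dropout_forward(x, p=0.5, train=True):
        if not train:
            return x  # at standard test time, dropout is a no-op
        # keep each activation with probability 1-p, rescale the survivors
        # by 1/(1-p) so the expected activation is unchanged
        mask = (torch.rand_like(x) > p).float() / (1 - p)
        return x * mask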

SLIDE 19

Recall: Dropout

  • Using half the network = half capacity

[Figure: redundant representations — furry, has two eyes, has a tail, has paws, has two ears]

SLIDE 20

Recall: Dropout

  • Using half the network = half capacity
    – Redundant representations
    – Base your scores on more features

  • Consider it as model ensemble
SLIDE 21

Recall: Dropout

  • Two models in one

[Figure: two dropout masks of the same network = Model 1 and Model 2]

SLIDE 22

MC dropout

  • Variational Inference
    – Find an approximation q(θ) that is close to the true posterior
  • Dropout training
    – The variational distribution is a Bernoulli distribution (where the states are “on” and “off”)


Y Gal, Z Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016
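In the notation of Gal & Ghahramani (a sketch of their construction, reconstructed here rather than copied from the slide): the variational distribution over each weight matrix is

    \mathbf{W}_i = \mathbf{M}_i \cdot \mathrm{diag}(\mathbf{z}_i), \qquad z_{i,j} \sim \mathrm{Bernoulli}(p_i)

so each unit of the learned matrix M_i is randomly kept (“on”) or zeroed (“off”) — exactly what dropout does.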

SLIDE 23

MC dropout

  • 1. Train a model with dropout before every weight layer
  • 2. Apply dropout at test time
    – Sampling is done in a Monte Carlo fashion, hence the name Monte Carlo dropout


Y Gal, Z Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016
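A minimal sketch of step 2 in PyTorch (illustrative; `model` stands for any network trained with dropout, and we assume it contains no batch-norm layers so that calling .train() only re-enables dropout):

    import torch

    @torch.no_grad()
    def mc_dropout_predict(model, x, T=50):
        model.train()  # keep dropout active at test time
        # T stochastic forward passes = T samples from the dropout distribution
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
        mean = probs.mean(dim=0)  # Monte Carlo estimate of the prediction
        var = probs.var(dim=0)    # disagreement across samples = uncertainty
        return mean, var

If the T passes agree (low variance), the model is confident; if they disagree, it is not.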

SLIDE 24

MC dropout

    – Sampling is done in a Monte Carlo fashion, e.g.,

      p(y* | x*, D) ≈ (1/T) Σ_{t=1}^{T} p(y* | x*, θ_t),   θ_t ~ q(θ),

      where q(θ) is the dropout distribution


Y Gal, Z Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, ICML 2016

[Figure: parameter sampling → NN → classification]

SLIDE 25

Measure your model’s uncertainty


Kendall & Gal. “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?“ NIPS 2017

SLIDE 26

Another look

SLIDE 27

Let us take another look

  • We know it is intractable, we approximate it
  • The denominator expresses how my data is generated

SLIDE 28

Let us take another look

  • We assume that the data is generated by some random process, involving an unobserved continuous random (latent) variable z
  • Generation process: p_θ(x) = ∫ p_θ(x | z) p(z) dz
  • Posterior: p_θ(z | x) = p_θ(x | z) p(z) / p_θ(x)
SLIDE 29

Let us take another look

  • Variational Inference
    – Find an approximation q_φ(z | x) that is close to the true posterior
  • My approximation is parameterized by a model (the encoder network)
SLIDE 30

Variational Autoencoders

SLIDE 31

Recall: Autoencoders

  • Encode the input into a representation (bottleneck) and reconstruct it with the decoder


[Figure: conv encoder → bottleneck → transpose-conv decoder]
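A minimal fully-connected sketch of this idea (illustrative PyTorch; the slide draws a convolutional version):

    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, in_dim=784, bottleneck=32):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                         nn.Linear(256, bottleneck))
            self.decoder = nn.Sequential(nn.Linear(bottleneck, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))

        def forward(self, x):
            z = self.encoder(x)     # compress into the bottleneck code
            return self.decoder(z)  # reconstruct the input from the code

Training minimizes a reconstruction loss between the input and the output, e.g. mean squared error.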

SLIDE 32

Variational Autoencoder

[Figure: conv encoder → bottleneck → transpose-conv decoder]

SLIDE 33

Variational Autoencoder

  • Latent space is now a distribution
  • Specifically, it is a Gaussian

[Figure: encoder maps the input to a latent distribution]

SLIDE 34

Variational Autoencoder

  • Latent space is now a distribution
  • Specifically, it is a Gaussian

[Figure: encoder outputs a mean and a diagonal covariance]

SLIDE 35

Variational Autoencoder

  • Latent space is now a distribution
  • Specifically, it is a Gaussian

[Figure: encoder outputs a mean and a diagonal covariance]
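A sketch of the changed encoder head (illustrative PyTorch; instead of a single code, the encoder predicts the mean and the diagonal covariance of a Gaussian over z):

    import torch.nn as nn

    class VAEEncoder(nn.Module):
        def __init__(self, in_dim=784, z_dim=32):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
            self.mu = nn.Linear(256, z_dim)       # mean of q(z | x)
            self.logvar = nn.Linear(256, z_dim)   # log of the diagonal covariance

        def forward(self, x):
            h = self.backbone(x)
            return self.mu(h), self.logvar(h)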

SLIDE 36

Variational Autoencoder

  • Back to our Bayesian view, our generation process was: p_θ(x) = ∫ p_θ(x | z) p(z) dz
  • Which is the denominator of the posterior: p_θ(z | x) = p_θ(x | z) p(z) / p_θ(x)

I want to optimize p_θ(x), the likelihood of my data

SLIDE 37

Variational Autoencoder

  • Loss function for a data point x: start from log p_θ(x) = E_{z ~ q_φ(z | x)}[ log p_θ(x) ]

I draw samples of the latent variable z from my encoder

SLIDE 38

Variational Autoencoder

  • Loss function for a data point

Expand log p_θ(x) with Bayes’ rule, which brings in the posterior p_θ(z | x)

SLIDE 39

Variational Autoencoder

  • Loss function for a data point

log p_θ(x) does not depend on z — inside the expectation it is just a constant

SLIDE 40

Variational Autoencoder

  • Loss function for a data point
SLIDE 41

Variational Autoencoder

  • Loss function for a data point

Two of the terms can be grouped into Kullback-Leibler divergences

SLIDE 42

Variational Autoencoder

  • Loss function for a data point

Annotations: a reconstruction loss; a KL term that measures how good my latent distribution is with respect to my prior; and a KL term to the true posterior — I still cannot express the shape of that distribution, but I know the KL is ≥ 0
SLIDE 43

Variational Autoencoder

  • Loss function for a data point

Loss function (lower bound)

SLIDE 44

Variational Autoencoder

  • Loss function for a data point
  • Optimize the lower bound with respect to both the encoder and decoder parameters

Loss function (lower bound)
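In its standard written form (as in Kingma & Welling; reconstructed here because the slide's equation images did not survive extraction):

    \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)

The first term is the reconstruction loss; the second keeps the approximate posterior close to the prior.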

SLIDE 45

Variational Autoencoder

  • Training

[Figure: encoder] Make the posterior distribution close to the prior (close to a unit Gaussian distribution)

SLIDE 46

Variational Autoencoder

  • Training

[Figure: encoder]

SLIDE 47

Variational Autoencoder

  • Training

[Figure: encoder → sample z]

SLIDE 48

Variational Autoencoder

  • Training

[Figure: encoder → sample z → decoder]

SLIDE 49

Variational Autoencoder

  • Training

[Figure: sample z → decoder] The output is also parameterized (the decoder predicts the parameters of p(x | z))

SLIDE 50

Variational Autoencoder

  • Training

Maximize the likelihood of reconstructing the input

SLIDE 51

Variational Autoencoder

  • For more details and the mathematical derivation:
  • Reparameterization trick that allows us to backprop
  • Kingma and Welling. “Auto-Encoding Variational Bayes“. ICLR 2014
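A minimal sketch of the reparameterization trick and the resulting loss (illustrative PyTorch; `encoder` returns (mu, logvar) as in the sketch above, and `decoder` is a hypothetical reconstruction network):

    import torch
    import torch.nn.functional as F

    def vae_loss(encoder, decoder, x):
        mu, logvar = encoder(x)
        # Reparameterization trick: z is a deterministic, differentiable
        # function of (mu, logvar) plus parameter-free noise, so gradients
        # can flow back through the sampling step.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        recon = decoder(z)
        recon_loss = F.mse_loss(recon, x, reduction='sum')
        # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon_loss + kl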

SLIDE 52

How about generating data?

  • Training as seen before

http://kvfrans.com/variational-autoencoders-explained/

SLIDE 53

How about generating data?

  • After training, generate random samples

Sample z from the prior distribution (e.g., a unit Gaussian)
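As a sketch (same hypothetical decoder as above), generation is just decoding prior samples:

    z = torch.randn(16, 32)   # 16 draws from the unit-Gaussian prior p(z)
    samples = decoder(z)      # each decoded z is a newly generated data point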

SLIDE 54

Generating data


Each element of z encodes a different feature

SLIDE 55

Generating data


[Figure: traversing z varies the degree of smile and the head pose]

slide-56
SLIDE 56

Autoencoder vs VAE

[Figure: reconstructions — Autoencoder vs Variational Autoencoder vs Ground Truth]

https://github.com/kvfrans/variational-autoencoder

SLIDE 57

Autoencoder Overview

  • Autoencoders (AE)

– Reconstruct input
– Unsupervised learning
– Latent space features are useful

  • Variational Autoencoders (VAE)

– Probability distribution in latent space (e.g., Gaussian)
– Sample from model to generate output

SLIDE 58

Autoencoder Overview

  • Autoencoders (AE)

– Reconstruct input
– Unsupervised learning
– Latent space features are useful

  • Variational Autoencoders (VAE)

– Probability distribution in latent space (e.g., Gaussian)
– Interpretable latent space (head pose, smile)
– Sample from model to generate output

SLIDE 59

Generative models

SLIDE 60

Taxonomy of generative models


Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017

SLIDE 61

Taxonomy of generative models


Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017

SLIDE 62

Taxonomy of generative models


Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017

Define a more tractable density function

SLIDE 63

Taxonomy of generative models


Figure from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017

I do not care about the shape, I just want to sample!

SLIDE 64

Next lectures

  • Next Monday 10th, more on Generative models
  • 3rd round of presentations this Friday → you will receive feedback about the presentations

  • Keep working on the projects!
SLIDE 65

Other references

  • Conditional Variational Autoencoders:

– Sohn, Kihyuk, Honglak Lee, and Xinchen Yan. “Learning Structured Output Representation using Deep Conditional Generative Models.” Advances in Neural Information Processing Systems. 2015.
– Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee. “Attribute2Image: Conditional Image Generation from Visual Attributes.” ECCV, 2016.

SLIDE 66

Other references

  • Interesting read:

– Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert. “An Uncertain Future: Forecasting from Static Images using Variational Autoencoders.” ECCV, 2016.
– Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther. “Autoencoding beyond pixels using a learned similarity metric.” ICML, 2016.
– Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, David Forsyth. “Learning Diverse Image Colorization.” arXiv, 2016.
– Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala. “Semantic Facial Expression Editing using Autoencoded Flow.” arXiv, 2016.
– Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. “Semi-Supervised Learning with Deep Generative Models.” NIPS, 2014.
