Bayesian Neural Network: Foundation and Practice


  1. Bayesian Neural Network: Foundation and Practice. Tianyu Cui, Yi Zhao, Department of Computer Science, Aalto University. May 2, 2019.

  2. Outline: Introduction to Bayesian Neural Network; Dropout as Bayesian Approximation; Concrete Dropout.

  3. Introduction to Bayesian Neural Network

  4. What’s a Neural Network? Figure: A simple NN (left) and a BNN (right) [Blundell, 2015]. Probabilistic interpretation of an NN:
  ▶ Model: y = f(x; w) + ϵ, ϵ ∼ N(0, σ²)
  ▶ Likelihood: P(y | x, w) = N(y; f(x; w), σ²)
  ▶ Prior: P(w) = N(w; 0, σ_w² I)
  ▶ Posterior: P(w | y, x) ∝ P(y | x, w) P(w)
  ▶ MAP: w⋆ = argmax_w P(w | y, x) (see the derivation sketched below)
  ▶ Prediction: y′ = f(x′; w⋆)
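A short derivation, not on the slide, connects the MAP bullet to the usual training loss: with the Gaussian likelihood and Gaussian prior above, and assuming a training set of N i.i.d. pairs (x_n, y_n), maximizing the log-posterior is the familiar squared error plus weight decay:

```latex
w^\star = \arg\max_w \; \log P(y \mid x, w) + \log P(w)
        = \arg\min_w \; \frac{1}{2\sigma^2} \sum_{n=1}^{N} \bigl(y_n - f(x_n; w)\bigr)^2
          + \frac{1}{2\sigma_w^2} \lVert w \rVert_2^2 + \text{const.}
```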

  5. What’s a Bayesian Neural Network? Figure: A simple NN (left) and a BNN (right) [Blundell, 2015]. What do I mean by being Bayesian?
  ▶ Model: y = f(x; w) + ϵ, ϵ ∼ N(0, σ²)
  ▶ Likelihood: P(y | x, w) = N(y; f(x; w), σ²)
  ▶ Prior: P(w) = N(w; 0, σ_w² I)
  ▶ Posterior: P(w | y, x) ∝ P(y | x, w) P(w)
  ▶ MAP: w⋆ = argmax_w P(w | y, x)
  ▶ Prediction: y′ = f(x′; w), w ∼ P(w | y, x) (the posterior predictive this stands for is written out below)
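The prediction bullet is shorthand for the posterior predictive distribution; written out (implied by the slide rather than stated on it), it is approximated by averaging over T posterior samples:

```latex
P(y' \mid x', \mathcal{D}) = \int P(y' \mid x', w)\, P(w \mid \mathcal{D})\, dw
  \approx \frac{1}{T} \sum_{t=1}^{T} P(y' \mid x', w_t), \qquad w_t \sim P(w \mid \mathcal{D}),
```

where D = {x, y} is the training data. Making this integral tractable is exactly what the approximate-inference methods later in the deck address.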

  6. Why Should We Care? Calibrated prediction uncertainty: the models should know what they don’t know. One example [Gal, 2017]:
  ▶ We train a model to recognise dog breeds.
  ▶ What would you want your model to do when a cat is given?
  ▶ A prediction with high uncertainty.
  Successful applications:
  ▶ Identifying adversarial examples [Smith, 2018].
  ▶ Adapting the exploration rate in RL [Gal, 2016].
  ▶ Self-driving cars [McAllister, 2017, Michelmore, 2018] and medical analysis [Gal, 2017].
  One simple algorithm: dropout as Bayesian approximation.

  7. How To Learn a Bayesian Neural Network? What’s the difficult part?
  ▶ P(w | y, x) is generally intractable.
  ▶ Standard approximate inference (difficult):
    ▶ Laplace approximation [MacKay, 1992];
    ▶ Hamiltonian Monte Carlo [Neal, 1995];
    ▶ (Stochastic) variational inference [Blundell, 2015] (see the objective sketched below).
  ▶ Most of the algorithms above are complicated both in theory and in practice.
  ▶ A simple and practical Bayesian neural network: dropout [Gal, 2016].
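For reference, the (stochastic) variational inference option fits an approximate posterior q_θ(w) by maximizing the evidence lower bound; this is the objective optimized by Bayes by Backprop [Blundell, 2015]:

```latex
\mathcal{L}(\theta) = \mathbb{E}_{q_\theta(w)}\bigl[\log P(y \mid x, w)\bigr]
  - \mathrm{KL}\bigl(q_\theta(w) \,\|\, P(w)\bigr),
```

estimated in practice on minibatches with the reparameterization trick.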

  8. Dropout as Bayesian Approximation

  9. Dropout as Bayesian Approximation. Dropout works by randomly setting network units to zero. We can obtain a distribution over predictions by repeating the forward pass several times; a minimal code sketch follows below.
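A minimal sketch of this test-time procedure (commonly called MC dropout) in PyTorch. The two-layer regression network, dropout rate, and number of forward passes T are illustrative assumptions, not taken from the slides:

```python
import torch
import torch.nn as nn

# Toy regression network with dropout after the hidden layer (sizes are illustrative).
model = nn.Sequential(
    nn.Linear(1, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # units are randomly zeroed with probability p
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, T=100):
    """Repeat the stochastic forward pass T times and summarize the samples."""
    model.train()  # keep dropout active at prediction time (unlike model.eval())
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(T)])  # shape: (T, N, 1)
    return samples.mean(dim=0), samples.std(dim=0)  # predictive mean and spread

# Usage: predictive mean and standard deviation for 10 test inputs.
x_test = torch.randn(10, 1)
mean, std = mc_dropout_predict(model, x_test)
```

The mean over the T passes plays the role of the Monte Carlo posterior predictive average, and the spread across passes is the (approximate) prediction uncertainty.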
