Bayesian Neural Network: Foundation and Practice


  1. Bayesian Neural Network: Foundation and Practice. Tianyu Cui, Yi Zhao, Department of Computer Science, Aalto University. May 2, 2019.

  2. Outline: Introduction to Bayesian Neural Network; Dropout as Bayesian Approximation; Concrete Dropout.

  3. Introduction to Bayesian Neural Network

  4. What’s a Neural Network? Figure: A simple NN (left) and a BNN (right) [Blundell, 2015]. Probabilistic interpretation of an NN:
  ▶ Model: y = f(x; w) + ϵ, ϵ ∼ N(0, σ²)
  ▶ Likelihood: P(y | x, w) = N(y; f(x; w), σ²)
  ▶ Prior: P(w) = N(w; 0, σ_w² I)
  ▶ Posterior: P(w | y, x) ∝ P(y | x, w) P(w)
  ▶ MAP: w⋆ = argmax_w P(w | y, x) (see the derivation sketched below)
  ▶ Prediction: y′ = f(x′; w⋆)
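A short derivation, not on the slide, connects the MAP bullet to the usual training loss: with the Gaussian likelihood and Gaussian prior above, and assuming a training set of N i.i.d. pairs (x_n, y_n), maximizing the log-posterior is the familiar squared error plus weight decay:

```latex
w^\star = \arg\max_w \; \log P(y \mid x, w) + \log P(w)
        = \arg\min_w \; \frac{1}{2\sigma^2} \sum_{n=1}^{N} \bigl(y_n - f(x_n; w)\bigr)^2
          + \frac{1}{2\sigma_w^2} \lVert w \rVert_2^2 + \text{const.}
```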

  5. What’s a Bayesian Neural Network? Figure: A simple NN (left) and a BNN (right) [Blundell, 2015]. What do I mean by being Bayesian?
  ▶ Model: y = f(x; w) + ϵ, ϵ ∼ N(0, σ²)
  ▶ Likelihood: P(y | x, w) = N(y; f(x; w), σ²)
  ▶ Prior: P(w) = N(w; 0, σ_w² I)
  ▶ Posterior: P(w | y, x) ∝ P(y | x, w) P(w)
  ▶ MAP: w⋆ = argmax_w P(w | y, x)
  ▶ Prediction: y′ = f(x′; w), w ∼ P(w | y, x) (the posterior predictive this stands for is written out below)
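The prediction bullet is shorthand for the posterior predictive distribution; written out (implied by the slide rather than stated on it), it is approximated by averaging over T posterior samples:

```latex
P(y' \mid x', \mathcal{D}) = \int P(y' \mid x', w)\, P(w \mid \mathcal{D})\, dw
  \approx \frac{1}{T} \sum_{t=1}^{T} P(y' \mid x', w_t), \qquad w_t \sim P(w \mid \mathcal{D}),
```

where D = {x, y} is the training data. Making this integral tractable is exactly what the approximate-inference methods later in the deck address.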

  6. Why Should We Care? Calibrated prediction uncertainty: the models should know what they don’t know. One example [Gal, 2017]:
  ▶ We train a model to recognise dog breeds.
  ▶ What would you want your model to do when a cat is given?
  ▶ A prediction with high uncertainty.
  Successful applications:
  ▶ Identifying adversarial examples [Smith, 2018].
  ▶ Adapting the exploration rate in RL [Gal, 2016].
  ▶ Self-driving cars [McAllister, 2017, Michelmore, 2018] and medical analysis [Gal, 2017].
  One simple algorithm: dropout as Bayesian approximation.

  7. How To Learn a Bayesian Neural Network? What’s the difficult part?
  ▶ P(w | y, x) is generally intractable.
  ▶ Standard approximate inference (difficult):
    ▶ Laplace approximation [MacKay, 1992];
    ▶ Hamiltonian Monte Carlo [Neal, 1995];
    ▶ (Stochastic) variational inference [Blundell, 2015] (see the objective sketched below).
  ▶ Most of the algorithms above are complicated both in theory and in practice.
  ▶ A simple and practical Bayesian neural network: dropout [Gal, 2016].
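For reference, the (stochastic) variational inference option fits an approximate posterior q_θ(w) by maximizing the evidence lower bound; this is the objective optimized by Bayes by Backprop [Blundell, 2015]:

```latex
\mathcal{L}(\theta) = \mathbb{E}_{q_\theta(w)}\bigl[\log P(y \mid x, w)\bigr]
  - \mathrm{KL}\bigl(q_\theta(w) \,\|\, P(w)\bigr),
```

estimated in practice on minibatches with the reparameterization trick.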

  8. Dropout as Bayesian Approximation

  9. Dropout as Bayesian Approximation. Dropout works by randomly setting network units to zero. We can obtain a distribution over predictions by repeating the forward pass several times; a minimal code sketch follows below.
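A minimal sketch of this test-time procedure (commonly called MC dropout) in PyTorch. The two-layer regression network, dropout rate, and number of forward passes T are illustrative assumptions, not taken from the slides:

```python
import torch
import torch.nn as nn

# Toy regression network with dropout after the hidden layer (sizes are illustrative).
model = nn.Sequential(
    nn.Linear(1, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # units are randomly zeroed with probability p
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, T=100):
    """Repeat the stochastic forward pass T times and summarize the samples."""
    model.train()  # keep dropout active at prediction time (unlike model.eval())
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(T)])  # shape: (T, N, 1)
    return samples.mean(dim=0), samples.std(dim=0)  # predictive mean and spread

# Usage: predictive mean and standard deviation for 10 test inputs.
x_test = torch.randn(10, 1)
mean, std = mc_dropout_predict(model, x_test)
```

The mean over the T passes plays the role of the Monte Carlo posterior predictive average, and the spread across passes is the (approximate) prediction uncertainty.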
