How Good is the Bayes Posterior in Deep Neural Networks Really?


  1. How Good is the Bayes Posterior in Deep Neural Networks Really? Florian Wenzel (Google Research Berlin), 15 June 2020. Joint first authors: Kevin Roth, Bas Veeling. With: Jakub Swiatkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin. Code: github.com/google-research/google-research/tree/master/cold_posterior_bnn

  2. Bayesian Deep Learning. Goal: enable Bayesian inference for deep networks to improve the robustness of predictions. An active research field where most work focuses on improving approximate inference to get closer to the Bayes posterior.

  3. But is the Bayes posterior actually good?

  4. Bayesian Neural Networks (BNNs). [Figure: a feed-forward neural network with input, hidden, and output layers.] Likelihood of the data under a neural network: p(D | θ) = ∏_i p(y_i | x_i, θ). Different models are obtained by different θ.
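To make the likelihood concrete, here is a minimal numpy sketch, not the paper's code: the linear softmax model and the name log_likelihood are illustrative assumptions, but the quantity computed is exactly log p(D | θ) = Σ_i log p(y_i | x_i, θ).

```python
import numpy as np

def log_likelihood(theta, X, y):
    """log p(D | theta) = sum_i log p(y_i | x_i, theta) for a linear softmax model.

    theta: (num_features, num_classes) weights
    X:     (n, num_features) inputs
    y:     (n,) integer class labels
    """
    logits = X @ theta
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return log_probs[np.arange(len(y)), y].sum()  # sum over data points
```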

  5. Bayesian Neural Networks (BNNs). [Figure: the same network, now with a distribution over its weights.] Bayesian neural network joint distribution: p(θ, D) = ∏_i p(y_i | x_i, θ) p(θ). Posterior: the distribution over likely models given the data, p(θ | D).
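Continuing the sketch above (again illustrative; an isotropic Gaussian prior p(θ) = N(0, σ²I) is an assumption, not the paper's prescription), the unnormalized log posterior is just log likelihood plus log prior:

```python
import numpy as np

def log_posterior_unnormalized(theta, X, y, prior_sigma=1.0):
    # log p(theta | D) = log p(D | theta) + log p(theta) + const.
    log_prior = -0.5 * np.sum(theta ** 2) / prior_sigma ** 2
    return log_likelihood(theta, X, y) + log_prior  # log_likelihood from the sketch above
```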

  6. BNNs: Predictions. In standard deep learning we optimize a single point estimate θ_SGD (MAP). BNNs instead use samples from the posterior (an ensemble of models): θ_1, θ_2, θ_3, … ∼ p(θ | D) ∝ exp(−U(θ)).
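The paper draws such samples with SG-MCMC. As a hedged illustration, here is the simplest Langevin-style variant (SGLD); grad_U is an assumed callable returning a (minibatch) estimate of ∇U(θ), and step-size schedules, preconditioning, and the paper's exact scheme are omitted:

```python
import numpy as np

def sgld_samples(grad_U, theta0, step_size=1e-4, num_steps=10_000, thin=100):
    """Approximate samples theta ~ p(theta | D) ∝ exp(-U(theta)) via SGLD."""
    theta = theta0.copy()
    samples = []
    for t in range(num_steps):
        noise = np.random.randn(*theta.shape)
        # Gradient step on U plus injected Gaussian noise of matched scale.
        theta = theta - step_size * grad_U(theta) + np.sqrt(2.0 * step_size) * noise
        if (t + 1) % thin == 0:  # keep every `thin`-th iterate to reduce correlation
            samples.append(theta.copy())
    return samples
```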

  7. BNNs: Predictions. Predict with an average over models: p(y | x) ≈ (1/S) ∑_{s=1}^{S} p(y | x, θ_s) with θ_1, θ_2, θ_3, … ∼ p(θ | D), rather than with the single point estimate θ_SGD (MAP). In this talk: a model is good if it predicts well (e.g. low cross-entropy loss).
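A sketch of the model average and of the evaluation metric; predict_probs is an assumed helper that returns per-class probabilities for one parameter sample. Note the averaging happens in probability space, not logit space:

```python
import numpy as np

def posterior_predictive(predict_probs, samples, X):
    # p(y | x) ≈ (1/S) sum_s p(y | x, theta_s), averaged over posterior samples.
    return np.mean([predict_probs(theta, X) for theta in samples], axis=0)

def cross_entropy(probs, y):
    # "Predicts well" in this talk: low average negative log-likelihood.
    return -np.mean(np.log(probs[np.arange(len(y)), y]))
```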

  8. Bayesian Neural Networks (BNNs). Promises of BNNs*: • Robustness in generalization • Better uncertainty quantification (calibration) • Enables new deep learning applications (continual learning, sequential decision making, …) *[e.g., Neal 1995, Gal et al. 2016, Wilson 2019, Ovadia et al. 2019]

  9. Bayesian Neural Networks (BNNs). But in practice BNNs are rarely used!

  10. Bayesian Neural Networks (BNNs). In practice: • Often, the Bayes posterior is worse than SGD point estimates • But Bayes predictions can be improved by using the cold posterior*, p(θ | D)^{1/T} ∝ exp(−U(θ)/T). For temperature T < 1 we sharpen the posterior (over-count the evidence). *Explicitly (or implicitly) used by most recent Bayesian DL papers [e.g., Li et al. 2016, Zhang et al. 2020, Ashukha et al. 2020]
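In a Langevin-style sampler, targeting the cold posterior exp(−U(θ)/T) amounts to scaling the injected noise by √T. A minimal sketch, with the same caveats as the SGLD sketch above (grad_U assumed, paper's exact SG-MCMC scheme omitted):

```python
import numpy as np

def sgld_step_tempered(theta, grad_U, step_size, T):
    # Stationary distribution ∝ exp(-U(theta) / T): T = 1 recovers the Bayes
    # posterior, T < 1 sharpens it (cold posterior), T -> 0 approaches MAP.
    noise = np.random.randn(*theta.shape)
    return theta - step_size * grad_U(theta) + np.sqrt(2.0 * step_size * T) * noise
```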

  11. Bayesian Neural Networks (BNNs). [Figure: posterior density over θ, sharpening as the temperature decreases.] Cold posterior: for temperature T < 1 we sharpen the posterior (over-count the evidence).

  12. [Figure: test performance as a function of temperature T for ResNet-20 / CIFAR-10 and CNN-LSTM / IMDB; the optimal cold posterior (T < 1) outperforms the true Bayes posterior (T = 1).]

  13. The cold posterior sharply deviates from the Bayesian paradigm. What is the use of more accurate posterior approximations if the posterior itself is poor?

  14. Our paper: hypotheses for the origin of the improved performance of cold posteriors, in three groups:
  • Inference: Inaccurate SDE simulation? Bias of SG-MCMC? Minibatch noise (which is not Gaussian)? A bias-variance tradeoff induced by the cold posterior?
  • Likelihood: Dirty likelihoods (batch normalization, dropout, data augmentation)?
  • Prior: Are the current priors used for BNN parameters poor? The effect becomes stronger with increasing model depth and capacity.
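As one example of how such a hypothesis could be probed, here is a toy harness (illustrative only, not from the paper; the synthetic logistic-regression setup and all names are assumptions) that compares minibatch gradients to the full-batch gradient and inspects the noise for non-Gaussianity via excess kurtosis, which would be ≈ 0 for a Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 10_000, 5, 32
X = rng.standard_normal((n, d))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ np.ones(d)))).astype(float)
theta = np.zeros(d)

def grad(idx):
    # Mean gradient of the logistic loss over the examples in idx.
    p = 1.0 / (1.0 + np.exp(-X[idx] @ theta))
    return X[idx].T @ (p - y[idx]) / len(idx)

full = grad(np.arange(n))  # full-batch gradient as the reference
noise = np.array([grad(rng.choice(n, batch, replace=False)) - full
                  for _ in range(2_000)])
z = noise[:, 0] / noise[:, 0].std()
print(f"excess kurtosis of one gradient coordinate: {np.mean(z**4) - 3.0:.2f}")
```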

