WESLEY MADDOX
JOINT WORK WITH TIMUR GARIPOV , PAVEL IZMAILOV , DMITRY VETROV , ANDREW GORDON WILSON
FAST UNCERTAINTY ESTIMATES AND BAYESIAN MODEL AVERAGING OF DNNS
SUMMARY
Stochastic Weight Averaging (Izmailov et al, UAI,
$$\ldots \sum_{i=1}^{\ldots} \theta_i^{2} - \theta_{\mathrm{swa}}^{2}$$
See also Liu et al, 2018, UDL Workshop
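The surviving terms above suggest a diagonal covariance built from the second moment of the SGD iterates minus the squared SWA mean. A minimal sketch of that computation follows, assuming a 1/T normalization (the prefactor is not recoverable from the slide) and using the hypothetical helper name `swag_diagonal_moments`; it is an illustration, not the authors' implementation.

```python
def swag_diagonal_moments(iterates):
    """Return (theta_swa, diag_var) from a list of weight vectors.

    Assumption: diag_var = mean(theta_i^2) - theta_swa^2 with a 1/T average.
    """
    dim = len(iterates[0])
    mean = [0.0] * dim      # running average of theta
    sq_mean = [0.0] * dim   # running average of theta^2
    for n, theta in enumerate(iterates, start=1):
        for j, w in enumerate(theta):
            mean[j] += (w - mean[j]) / n
            sq_mean[j] += (w * w - sq_mean[j]) / n
    # Clamp small negative values caused by floating-point round-off.
    var = [max(s - m * m, 0.0) for m, s in zip(mean, sq_mean)]
    return mean, var


# Three toy "SGD iterates" of a two-parameter model.
iterates = [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
theta_swa, diag_var = swag_diagonal_moments(iterates)
print(theta_swa, diag_var)
```

Running averages are used so that only two extra copies of the weights need to be stored, regardless of how many iterates are collected.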
$$p(y_* \mid y) \approx \frac{1}{K} \sum_{i=1}^{K} p(y_* \mid \theta_i), \qquad \theta_i \sim q_{\mathrm{SWAG}}(\theta \mid y)$$
… et al 2015)
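The Monte Carlo average above can be sketched in a few lines. The example below assumes a diagonal Gaussian for the approximate posterior and a toy logistic likelihood; the names `predict` and `bma_predict` and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Toy likelihood p(y*=1 | theta) for a logistic model (illustrative only)."""
    return sigmoid(sum(w * xi for w, xi in zip(theta, x)))


def bma_predict(mean, var, x, K=30, seed=0):
    """Average predictions over K weight samples from N(mean, diag(var))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(K):
        # Draw theta_i from the assumed diagonal Gaussian posterior.
        theta = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
        total += predict(theta, x)
    return total / K


x_test = [1.0, 1.0]
p_point = predict([2.0, -1.0], x_test)                       # single-point prediction
p_bma = bma_predict([2.0, -1.0], [0.5, 0.5], x_test, K=200)  # model-averaged prediction
print(p_point, p_bma)
```

With zero variance every sample equals the mean, so the average reduces to the single-point prediction; with nonzero variance the averaged probability reflects uncertainty in the weights.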
Method          ECE
Laplace         0.7604
SWA             0.7650
SWAG-Diagonal   0.7093
SWAG            0.6001
VGG16 on CIFAR100.
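For reference, expected calibration error (ECE) is commonly computed by binning predictions by confidence and averaging the gap between accuracy and confidence in each bin, weighted by bin size. The sketch below assumes equal-width bins over [0, 1] with left-closed intervals; the bin count, the binning convention, and the function name are assumptions, and this is not necessarily the exact procedure behind the numbers in the table.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (n_b / N) * |accuracy_b - mean confidence_b|."""
    n_total = len(confidences)
    # Per bin: [count, sum of confidences, number of correct predictions].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += int(ok)
    ece = 0.0
    for count, conf_sum, n_correct in bins:
        if count:
            ece += (count / n_total) * abs(n_correct / count - conf_sum / count)
    return ece


# Two predictions at 0.9 confidence (both right), two at 0.6 (one right).
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
print(ece)
```

In this toy case each occupied bin has a gap of 0.1 between accuracy and confidence, so the weighted average is also 0.1.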
▸ Check out our poster…
▸ X. Chen, J. D. Lee, X. T. Tong, and Y. Zhang. Statistical Inference for Model Parameters in
Stochastic Gradient Descent. arXiv:1610.08637, Oct. 2016.
▸ P. Izmailov, D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson. Averaging Weights Leads to
Wider Optima and Better Generalization. In UAI, 2018.
▸ J. Liu, S. Tripathi, U. Kurup, and M. Shah. Make (Nearly) Every Neural Network Better: Generating
Neural Network Ensembles by Weight Parameter Resampling. In UAI Workshop on Uncertainty in Deep Learning, 2018.
▸ M. P. Naeini, G. F. Cooper, and M. Hauskrecht. Obtaining Well Calibrated Probabilities Using
Bayesian Binning. In AAAI, pages 2901–2907, 2015.
▸ B. T. Polyak and A. B. Juditsky. Acceleration of Stochastic Approximation by Averaging. SIAM
Journal on Control and Optimization, 30(4):838–855, July 1992.
▸ D. Ruppert. Efficient Estimators from a Slowly Convergent Robbins-Monro Process. Technical
Report 781, Cornell University, School of Operations Research and Industrial Engineering, 1988.
$$\ldots \sum_{i=1}^{T} \ldots_{ii}\, X X' H_{ii}^{-1})$$
From Naeini et al 2015, also Guo et al ICML 2017 “On Calibration of Modern Neural Networks”