Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration
Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, Peter Flach
NeurIPS 2019
Contributions
- New parametric calibration method: Dirichlet calibration
- New regularization method for matrix scaling (and for Dirichlet calibration): ODIR – Off-Diagonal and Intercept Regularisation
- Multi-class classifier evaluation: confidence-reliability diagram, classwise-reliability diagrams, confidence-ECE, classwise-ECE; notions of confidence-calibrated, classwise-calibrated, and multiclass-calibrated classifiers
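The classwise-ECE metric listed above averages, over all classes, the gap between predicted class probability and observed class frequency within probability bins. A minimal numpy sketch (function name and equal-width binning are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

def classwise_ece(probs, labels, n_bins=15):
    """Classwise expected calibration error with equal-width bins.

    probs:  (n, k) predicted class probabilities
    labels: (n,) integer class labels in {0, ..., k-1}
    """
    n, k = probs.shape
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for j in range(k):
        p_j = probs[:, j]                       # predicted probability of class j
        y_j = (labels == j).astype(float)       # 1 if the true class is j
        # assign each prediction for class j to a probability bin
        bin_idx = np.clip(np.digitize(p_j, edges[1:-1]), 0, n_bins - 1)
        for b in range(n_bins):
            mask = bin_idx == b
            if mask.any():
                # gap between average predicted probability and class frequency
                gap = abs(p_j[mask].mean() - y_j[mask].mean())
                total += (mask.sum() / n) * gap
    return total / k
```

A perfectly classwise-calibrated set of predictions (predicted probabilities matching empirical frequencies in every bin) scores 0.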
Making classifiers more trustworthy

- Consider a classifier with 60% accuracy on a set of instances: it is right 60% of the time, but we do not know which predictions to trust.
- If the classifier reports class probabilities, then we get instance-specific confidence estimates.
Trustworthy if confidence-calibrated

Confidence-calibrated: among the instances predicted with confidence c, the proportion of correct predictions is approximately c, for every confidence level c.
Deep nets are usually over-confident

Example: uncalibrated predictions
- Experimental setup: CIFAR-10, ResNet Wide 32
- Accuracy: 94% overall, but only 58% among predictions made with 90% confidence

Example: after calibration with temperature scaling
- Accuracy: 94% overall; 88% among predictions made with 90% confidence
- Classwise-calibrated: every class probability, not just the confidence, should be reliable; here accuracy is only 76% among predictions with 90% class-2 probability

Example: after calibration with Dirichlet calibration
- Accuracy: 90% among predictions with 90% class-2 probability
How to calibrate a multi-class classifier?

[Diagram: any feed-forward network: input layer → last hidden layer → logits → softmax → class probabilities]
Temperature scaling

[Diagram: the network is frozen; the logits are divided by a single temperature parameter before the softmax, giving scaled logits → class probabilities]
- Parameters: a single scalar temperature (network weights frozen)
- C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017
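Temperature scaling amounts to one division before the softmax; the temperature is fitted by minimising log-loss on a validation set. A minimal numpy sketch (function names and the grid-search fitting are illustrative; Guo et al. optimise the temperature with a gradient-based solver):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(logits, t):
    """Divide frozen logits by a scalar temperature t (t > 1 softens, t < 1 sharpens)."""
    return softmax(logits / t)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the temperature minimising validation log-loss (simple grid search)."""
    def nll(t):
        p = temperature_scale(logits, t)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)
```

Note that temperature scaling never changes the argmax, so accuracy is unchanged; only the confidence values move.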
Vector scaling

[Diagram: the network is frozen; each logit is rescaled by a per-class weight plus a per-class intercept before the softmax, giving scaled logits → class probabilities]
- Parameters: one scale and one intercept per class (network weights frozen)
Matrix scaling

[Diagram: the network is frozen; the logits are transformed by a full linear map before the softmax, giving scaled logits → class probabilities]
- Parameters: a full k×k weight matrix and a k-vector intercept (network weights frozen)
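The three scaling methods form a hierarchy: temperature scaling and vector scaling are restrictions of matrix scaling. A minimal numpy sketch making that explicit (function name is mine):

```python
import numpy as np

def matrix_scale(logits, W, b):
    """Matrix scaling: affine map of the frozen logits, then softmax.

    Vector scaling restricts W to a diagonal matrix;
    temperature scaling further restricts it to (1/t) * I with b = 0.
    """
    z = logits @ W.T + b
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

With W the identity and b zero this reduces to the plain softmax of the logits, i.e. no recalibration at all.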
Dirichlet calibration can calibrate any classifier

[Diagram: any probabilistic multi-class classifier → class probabilities → Dirichlet calibration]
Parametric calibration methods

Binary classification:
- Logit space: Platt scaling[1], derived from the Gaussian distribution
- Class probability space: Beta calibration[2] (+ constrained variants), derived from the Beta distribution

Multi-class classification:
- Logit space: Matrix scaling[3] (+ vector scaling, temperature scaling)
- Class probability space: Dirichlet calibration (+ constrained variants), derived from the Dirichlet distribution

[1] J. Platt. Probabilities for SV machines. In Advances in Large Margin Classifiers, pages 61–74, MIT Press, 2000.
[2] M. Kull, T. Silva Filho, P. Flach. Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers. AISTATS 2017.
[3] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017.
Dirichlet calibration

[Diagram: any probabilistic multi-class classifier → class probabilities → log-transform → fully-connected linear layer → softmax → calibrated probabilities]
- Parameters: the weight matrix and intercept vector of the fully-connected linear layer applied to log-probabilities
- Regularisation:
  - L2
  - ODIR (Off-Diagonal and Intercept Regularisation)
Non-neural experiments

- 21 datasets x 11 classifiers = 231 settings
- Compared by average rank on:
  - Classwise-ECE
  - Log-loss
  - Error rate

Which classifiers are calibrated?
Neural experiments

- Datasets: CIFAR-10, CIFAR-100, SVHN
- 11 convolutional NNs trained as in Guo et al. + 3 pretrained
- Evaluated on log-loss and classwise-ECE
Conclusion

- 1. Dirichlet calibration: new parametric general-purpose multiclass calibration method
- a. Natural extension of two-class Beta calibration
- b. Easy to implement with multinomial logistic regression on log-transformed class probabilities
- 2. Best or tied-best performance across 21 datasets x 11 classifiers
- 3. Advances the state of the art on neural networks by introducing ODIR regularisation