Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration (PowerPoint presentation transcript)




SLIDE 1

Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration

Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, Peter Flach

NeurIPS 2019

SLIDE 2

Contributions

  • New parametric calibration method: Dirichlet calibration
  • New regularization method for matrix scaling (and for Dirichlet calibration): ODIR – Off-Diagonal and Intercept Regularisation
  • Multi-class classifier evaluation: confidence-reliability diagram, classwise-reliability diagrams, confidence-ECE, classwise-ECE; notions of confidence-calibrated, classwise-calibrated, and multiclass-calibrated
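The two ECE variants listed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' reference implementation; the function names and the 15-bin default are my own choices, following common practice.

```python
import numpy as np

def confidence_ece(probs, labels, n_bins=15):
    """Confidence-ECE: bin predictions by their top-class probability
    (confidence) and average |accuracy - mean confidence| over bins,
    weighted by bin size."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    # Assign each prediction to one of n_bins equal-width bins on [0, 1].
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def classwise_ece(probs, labels, n_bins=15):
    """Classwise-ECE: apply the same binning to every class's predicted
    probability (not just the top class) and average over classes."""
    n, k = probs.shape
    total = 0.0
    for j in range(k):
        p_j = probs[:, j]
        y_j = (labels == j).astype(float)
        bins = np.clip((p_j * n_bins).astype(int), 0, n_bins - 1)
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                total += mask.mean() * abs(y_j[mask].mean() - p_j[mask].mean())
    return total / k
```

Confidence-ECE only looks at the predicted class, so a model can have zero confidence-ECE while still misreporting the probabilities of the non-predicted classes; classwise-ECE catches that case.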

SLIDE 3

Making classifiers more trustworthy

SLIDES 4-7

Making classifiers more trustworthy

  • A classifier with 60% accuracy on a set of instances
  • If the classifier reports class probabilities, then we get instance-specific confidence

SLIDES 8-16

Trustworthy if confidence-calibrated

Confidence-calibrated: among the instances predicted with confidence p, a proportion of (approximately) p is classified correctly.

SLIDES 17-19

Deep nets are usually over-confident

Example: uncalibrated predictions

  • Experimental setup: CIFAR-10, ResNet Wide 32
  • Accuracy: overall 94%; among predictions made at 90% confidence, only 58%

SLIDES 20-21

Example: after calibration with temperature scaling

  • Experimental setup: CIFAR-10, ResNet Wide 32
  • Accuracy after temperature scaling: overall 94%; at 90% confidence: 88%; at 90% class-2 probability: 76%
  • Temperature scaling restores confidence-calibration, but the model is still not classwise-calibrated

SLIDE 22

Example: after calibration with Dirichlet calibration

  • Accuracy after Dirichlet calibration: at 90% class-2 probability: 90% (compared with 76% after temperature scaling)

SLIDE 23

How to calibrate a multi-class classifier?

[Diagram: any feed-forward network; the last hidden layer produces logits, and a softmax turns the logits into class probabilities.]

SLIDE 24

Temperature scaling

[Diagram: the feed-forward network is frozen; its logits are divided by a single learned temperature parameter, and the softmax of the scaled logits gives the calibrated class probabilities.]

  • C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017
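Temperature scaling, as described in Guo et al., learns the single scalar T on a held-out validation set by minimising the negative log-likelihood. A minimal NumPy sketch; the grid search is my own simplification (the paper fits T by gradient-based optimisation):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels):
    """Mean negative log-likelihood of the true labels."""
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 8.0, 151)):
    """Pick the temperature T that minimises validation NLL of softmax(z / T).
    Everything else stays frozen, so accuracy is unchanged."""
    return min(grid, key=lambda t: nll(logits / t, labels))
```

Because dividing all logits by the same positive constant does not change their argmax, temperature scaling never alters the predicted class, only the reported confidence.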

SLIDE 25

Vector scaling

[Diagram: the feed-forward network is frozen; each logit is multiplied by a per-class weight and shifted by a per-class bias, and the softmax of the scaled logits gives the calibrated class probabilities.]

  • C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017

SLIDE 26

Matrix scaling

[Diagram: the feed-forward network is frozen; the logits are transformed by a full learned linear map (weight matrix plus bias vector), and the softmax of the scaled logits gives the calibrated class probabilities.]

  • C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017
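Matrix scaling replaces the single temperature with a full linear map on the logits, and vector scaling is the restriction of that map to a diagonal weight matrix plus a bias. The three calibration maps side by side, forward passes only (fitting would minimise validation NLL as in Guo et al.); for k classes they have 1, 2k, and k^2 + k parameters respectively:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def temperature_scaling(logits, T):
    return softmax(logits / T)              # 1 parameter

def vector_scaling(logits, w, b):
    return softmax(logits * w + b)          # 2k parameters

def matrix_scaling(logits, W, b):
    return softmax(logits @ W.T + b)        # k^2 + k parameters
```

Vector scaling is exactly matrix scaling with W constrained to be diagonal, which is why the growing parameter count (and overfitting risk) of matrix scaling motivates the ODIR regularisation introduced later in the deck.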

SLIDE 27

Dirichlet calibration can calibrate any classifier

[Diagram: any probabilistic multi-class classifier outputs class probabilities, which Dirichlet calibration takes directly as input; no access to logits is required.]

SLIDES 28-30

Parametric calibration methods

  • Binary classification: Platt scaling [1] in logit space (derived from the Gaussian distribution); Beta calibration [2] in class-probability space (derived from the Beta distribution, + constrained variants)
  • Multi-class classification: matrix scaling [3] in logit space (+ vector scaling, temperature scaling); Dirichlet calibration in class-probability space (derived from the Dirichlet distribution, + constrained variants)

[1] J. Platt. Probabilities for SV machines. In Advances in Large Margin Classifiers, pages 61–74, MIT Press, 2000.
[2] M. Kull, T. Silva Filho, P. Flach. Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers. AISTATS 2017.
[3] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017.

SLIDE 31

Dirichlet calibration

[Diagram: class probabilities from any probabilistic multi-class classifier are log-transformed, passed through a fully-connected linear layer (parameters: a weight matrix and an intercept vector), and renormalised with a softmax.]

Regularisation:

  • L2
  • ODIR (Off-Diagonal and Intercept Regularisation)
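The pipeline on this slide (log-transform, fully-connected linear layer, softmax) and the ODIR penalty can be sketched as below. The penalty names `lam` and `mu` are hypothetical labels for the two regularisation strengths; this is a sketch of the idea, not the authors' code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dirichlet_map(probs, W, b, eps=1e-12):
    """Dirichlet calibration: a linear map of the log class probabilities,
    renormalised with a softmax."""
    return softmax(np.log(probs + eps) @ W.T + b)

def odir_penalty(W, b, lam=1.0, mu=1.0):
    """ODIR: penalise off-diagonal weights and intercepts, leaving the
    diagonal free, so an already-calibrated (identity) map is not punished."""
    off_diag = W - np.diag(np.diag(W))
    return lam * np.sum(off_diag ** 2) + mu * np.sum(b ** 2)
```

With W the identity and b zero, the map is the identity on the probability simplex, and the ODIR penalty is zero; plain L2 regularisation would instead shrink the diagonal too, pulling even a perfectly calibrated model away from the identity map.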

SLIDE 32

Non-neural experiments

  • 21 datasets x 11 classifiers = 231 settings
  • Average rank on:
    ○ Classwise-ECE
    ○ Log-loss
    ○ Error rate

SLIDES 33-34

Which classifiers are calibrated?

SLIDES 35-36

Neural experiments

  • Datasets: CIFAR-10, CIFAR-100, SVHN
  • 11 convolutional NNs trained as in Guo et al. + 3 pretrained
  • Results compared on log-loss and classwise-ECE

SLIDE 37

Conclusion

  • 1. Dirichlet calibration: a new parametric, general-purpose multiclass calibration method
    • a. A natural extension of two-class Beta calibration
    • b. Easy to implement with multinomial logistic regression on log-transformed class probabilities
  • 2. Best or tied-best performance across 21 datasets x 11 classifiers
  • 3. Advances the state of the art on neural networks by introducing ODIR regularisation
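The conclusion's point that Dirichlet calibration reduces to multinomial logistic regression on log-transformed class probabilities can be sketched end to end with plain batch gradient descent (scikit-learn's multinomial `LogisticRegression` on `np.log(probs)` would do the same job; the learning rate and step count here are arbitrary illustrative choices, and regularisation is omitted for brevity):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_dirichlet(probs, labels, lr=0.05, steps=2000, eps=1e-12):
    """Fit W, b of the Dirichlet calibration map by minimising log-loss:
    multinomial logistic regression on the log class probabilities."""
    X = np.log(probs + eps)
    n, k = X.shape
    Y = np.eye(k)[labels]           # one-hot targets
    W, b = np.eye(k), np.zeros(k)   # start from the identity map
    for _ in range(steps):
        P = softmax(X @ W.T + b)
        G = (P - Y) / n             # gradient of mean log-loss w.r.t. logits
        W -= lr * (G.T @ X)
        b -= lr * G.sum(axis=0)
    return W, b

def log_loss(probs, labels, eps=1e-12):
    return -np.log(probs[np.arange(len(labels)), labels] + eps).mean()
```

Starting from the identity map means the fitted calibrator can only improve on (or match) the uncalibrated log-loss of the validation set it is trained on.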
SLIDE 38

Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration

Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, Peter Flach

NeurIPS 2019