

SLIDE 1

On Symmetric Losses for Learning from Corrupted Labels

Nontawat Charoenphakdee 1,2, Jongyeong Lee 1,2 and Masashi Sugiyama 2,1

The University of Tokyo 1, RIKEN Center for Advanced Intelligence Project (AIP) 2

SLIDE 2

Supervised learning


Machine learning pipeline:

  • Data collection: gather features (input) and labels (output)
  • Learning: learn a prediction function $f$ from the input-output pairs
  • Prediction: predict the output of unseen input accurately, such that $f(x) \approx y$

Standard supervised learning assumes clean labels: it provides no noise robustness.

SLIDE 3

Learning from corrupted labels


Data collection consists of feature collection and a labeling process. The labeling process is often error-prone. Examples:

  • Expert labelers (human error)
  • Crowdsourcing (non-expert error)

Our goal: noise-robust ML that still learns a good prediction function from corrupted labels.

SLIDE 4

Contents

  • Background and related work
  • The importance of symmetric losses
  • Theoretical properties of symmetric losses
  • Barrier hinge loss
  • Experiments


SLIDE 5

Warmup: Binary classification

Notation: $x \in \mathbb{R}^d$: feature vector, $y \in \{-1, +1\}$: label, $f: \mathbb{R}^d \to \mathbb{R}$: prediction function.

  • Given: input-output pairs $\{(x_i, y_i)\}_{i=1}^{n}$ drawn i.i.d. from $p(x, y)$
  • Goal: minimize the expected error (risk) $R(f) = \mathbb{E}_{(x,y) \sim p}[\ell_{01}(y f(x))]$

No access to the distribution: minimize the empirical error instead (Vapnik, 1998): $\widehat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(y_i f(x_i))$, where $\ell$ is a margin loss function evaluated at the margin $z = y f(x)$. The 0-1 loss is 0 when $y$ and $f(x)$ have the same sign (correct) and 1 when they have different signs (error).
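As a minimal illustration (the linear scorer and toy data below are hypothetical, not from the slides), the empirical error with the 0-1 loss can be computed like this:

```python
import numpy as np

def zero_one_loss(margin):
    """0-1 loss on the margin z = y * f(x): 1 for a wrong sign, else 0."""
    return (margin <= 0).astype(float)

def empirical_risk(f, X, y, loss):
    """Empirical error (1/n) * sum_i loss(y_i * f(x_i))."""
    return loss(y * f(X)).mean()

# Hypothetical toy data and a fixed linear scorer f(x) = <w, x>.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=100) > 0, 1.0, -1.0)
f = lambda X: X @ np.array([1.0, 0.0])
print(empirical_risk(f, X, y, zero_one_loss))
```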

SLIDE 6

Surrogate losses

Minimizing the 0-1 loss directly is difficult:

  • It is discontinuous and not differentiable (Ben-David+, 2003, Feldman+, 2012)

In practice, we minimize a surrogate loss instead (Zhang, 2004, Bartlett+, 2006): a tractable margin loss $\ell(z)$ evaluated at the margin $z = y f(x)$.
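For concreteness, here are some standard margin-based surrogates (these are textbook formulas, not specific to this paper):

```python
import numpy as np

# Standard margin-based surrogate losses, each a function of z = y * f(x).
def logistic_loss(z):
    return np.log1p(np.exp(-z))      # convex, differentiable

def hinge_loss(z):
    return np.maximum(0.0, 1.0 - z)  # convex, not differentiable at z = 1

def squared_loss(z):
    return (1.0 - z) ** 2            # convex, differentiable

def sigmoid_loss(z):
    return 1.0 / (1.0 + np.exp(z))   # non-convex, symmetric: l(z) + l(-z) = 1

z = np.linspace(-2, 2, 5)
for loss in (logistic_loss, hinge_loss, squared_loss, sigmoid_loss):
    print(loss.__name__, np.round(loss(z), 3))
```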

SLIDE 7

Learning from corrupted labels

Given: two sets of corrupted data (Scott+, 2013, Menon+, 2015, Lu+, 2019), drawn from mixtures of the clean class-conditionals $p(x \mid y = +1)$ and $p(x \mid y = -1)$:

  • Corrupted positive: $\tilde{x}^{\mathrm{P}} \sim \pi_{\mathrm{P}}\, p(x \mid y = +1) + (1 - \pi_{\mathrm{P}})\, p(x \mid y = -1)$
  • Corrupted negative: $\tilde{x}^{\mathrm{N}} \sim \pi_{\mathrm{N}}\, p(x \mid y = +1) + (1 - \pi_{\mathrm{N}})\, p(x \mid y = -1)$

$\pi_{\mathrm{P}}, \pi_{\mathrm{N}}$: class priors (mixing proportions). Clean data is the special case $\pi_{\mathrm{P}} = 1, \pi_{\mathrm{N}} = 0$; positive-unlabeled learning (du Plessis+, 2014) is the case where the negative set is replaced by unlabeled data. This setting covers many weakly-supervised settings (Lu+, 2019).
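A minimal sketch of this data-generating process, assuming Gaussian class-conditionals (a hypothetical choice made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_class_conditional(y, n):
    """Hypothetical clean class-conditionals: Gaussians with opposite means."""
    mean = 1.0 if y == +1 else -1.0
    return rng.normal(mean, 1.0, size=(n, 1))

def sample_corrupted(pi, n):
    """Draw n points from pi * p(x|y=+1) + (1 - pi) * p(x|y=-1)."""
    from_pos = rng.random(n) < pi
    return np.where(from_pos[:, None],
                    sample_class_conditional(+1, n),
                    sample_class_conditional(-1, n))

# Corrupted positive set (mostly positives) and corrupted negative set.
X_tilde_P = sample_corrupted(pi=0.8, n=1000)   # pi_P = 0.8
X_tilde_N = sample_corrupted(pi=0.3, n=1000)   # pi_N = 0.3
print(X_tilde_P.mean(), X_tilde_N.mean())      # means reflect the mixing
```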

SLIDE 8

Issue on class priors

Given: the two sets of corrupted data above, with the assumption $\pi_{\mathrm{P}} > \pi_{\mathrm{N}}$ (the positive set is "more positive" than the negative set).

Problem: $\pi_{\mathrm{P}}$ and $\pi_{\mathrm{N}}$ are unidentifiable from samples (Scott+, 2013).

How can we learn without estimating $\pi_{\mathrm{P}}$ and $\pi_{\mathrm{N}}$?

SLIDE 9

Related work

  • Classification error: $R(f) = \pi\, \mathbb{E}_{\mathrm{P}}[\ell_{01}(f(x))] + (1 - \pi)\, \mathbb{E}_{\mathrm{N}}[\ell_{01}(-f(x))]$. Class priors are needed! (Lu+, 2019)
  • Balanced error rate (BER): $\mathrm{BER}(f) = \frac{1}{2}\big(\mathbb{E}_{\mathrm{P}}[\ell_{01}(f(x))] + \mathbb{E}_{\mathrm{N}}[\ell_{01}(-f(x))]\big)$. Class priors are not needed! (Menon+, 2015)
  • Area under the receiver operating characteristic curve (AUC) risk: $\mathbb{E}_{x^{\mathrm{P}}}\mathbb{E}_{x^{\mathrm{N}}}[\ell_{01}(f(x^{\mathrm{P}}) - f(x^{\mathrm{N}}))]$. Class priors are not needed! (Menon+, 2015)

SLIDE 10

Related work: BER and AUC optimization

  • Menon+, 2015: we can treat corrupted data as if they were clean; the squared loss was used in experiments.
  • van Rooyen+, 2015: symmetric losses are also useful for BER minimization (no experiments); the proof relies on a property of the 0-1 loss.
  • Ours: using a symmetric loss is preferable for both BER and AUC, theoretically and experimentally!

SLIDE 11

Contents

  • Background and related work
  • The importance of symmetric losses
  • Theoretical properties of symmetric losses
  • Barrier hinge loss
  • Experiments


SLIDE 12

Symmetric losses

A margin loss $\ell$ is symmetric if $\ell(z) + \ell(-z) = K$ for some constant $K$.

Applications:

  • Robustness under symmetric noise (label flip with a fixed probability) (Ghosh+, 2015, van Rooyen+, 2015)
  • Risk estimator simplification in weakly-supervised learning (du Plessis+, 2014, Kiryo+, 2017, Lu+, 2018)
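A quick numerical check of the symmetric condition $\ell(z) + \ell(-z) = K$, using standard loss formulas (sketch):

```python
import numpy as np

def sigmoid_loss(z):
    return 1.0 / (1.0 + np.exp(z))

def ramp_loss(z):
    return 0.5 * np.clip(1.0 - z, 0.0, 2.0)

def logistic_loss(z):
    return np.log1p(np.exp(-z))

z = np.linspace(-5, 5, 11)
for loss in (sigmoid_loss, ramp_loss, logistic_loss):
    s = loss(z) + loss(-z)           # constant iff the loss is symmetric
    print(f"{loss.__name__}: l(z)+l(-z) in [{s.min():.3f}, {s.max():.3f}]")
# sigmoid and ramp give a constant (K = 1); logistic does not.
```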

SLIDE 13

AUC maximization

Symmetric losses: $\ell(z) + \ell(-z) = K$.

The corrupted AUC risk decomposes into the clean AUC risk plus excessive terms. With a symmetric loss, the excessive terms become constant, so they can be safely ignored: minimizing the corrupted AUC risk also minimizes the clean AUC risk.
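As a minimal sketch (the linear scorer and Gaussian data are hypothetical; the sigmoid loss stands in for any symmetric loss), the empirical pairwise AUC risk on two possibly corrupted sets:

```python
import numpy as np

def sigmoid_loss(z):
    """Symmetric: sigmoid_loss(z) + sigmoid_loss(-z) = 1."""
    return 1.0 / (1.0 + np.exp(z))

def empirical_auc_risk(f, X_p, X_n, loss=sigmoid_loss):
    """Pairwise AUC risk: mean of loss(f(x_p) - f(x_n)) over all
    (positive, negative) pairs. With a symmetric loss, minimizing this
    on corrupted sets also minimizes the clean AUC risk."""
    margins = f(X_p)[:, None] - f(X_n)[None, :]  # all pairwise score gaps
    return loss(margins).mean()

# Hypothetical usage with a linear scorer on 2-D data.
rng = np.random.default_rng(0)
f = lambda X: X @ np.array([1.0, -0.5])
print(empirical_auc_risk(f, rng.normal(1, 1, (50, 2)), rng.normal(-1, 1, (40, 2))))
```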

SLIDE 14

BER minimization

Similarly, the corrupted BER decomposes into the clean BER plus an excessive term. With a symmetric loss, the excessive term becomes constant and can be safely ignored: minimizing the corrupted BER also minimizes the clean BER. This coincides with van Rooyen+, 2015.
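Correspondingly, a sketch of the surrogate BER estimator on the two (possibly corrupted) sets, again with the sigmoid loss as an assumed symmetric loss:

```python
import numpy as np

def sigmoid_loss(z):
    return 1.0 / (1.0 + np.exp(z))

def empirical_ber_risk(f, X_p, X_n, loss=sigmoid_loss):
    """Surrogate BER: average of the positive-set error (f should be
    positive on X_p) and the negative-set error (f should be negative
    on X_n). With a symmetric loss, the corrupted BER differs from the
    clean BER only by a constant term."""
    return 0.5 * (loss(f(X_p)).mean() + loss(-f(X_n)).mean())
```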

SLIDE 15

Contents

  • Background and related work
  • The importance of symmetric losses
  • Theoretical properties of symmetric losses
  • Barrier hinge loss
  • Experiments


SLIDE 16

Theoretical properties of symmetric losses

Nonnegative symmetric losses are non-convex.

  • Theory of convex losses cannot be applied.


We provide a better understanding of symmetric losses:

  • Necessary and sufficient condition for classification-calibration
  • Excess risk bound in binary classification
  • Inability to estimate class posterior probability
  • A sufficient condition for AUC-consistency

➢ These results cover many symmetric losses, e.g., sigmoid and ramp (du Plessis+, 2014, Ghosh+, 2015).

Well-known symmetric losses, e.g., the sigmoid and ramp losses, are classification-calibrated and AUC-consistent!

SLIDE 17

Contents

  • Background and related work
  • The importance of symmetric losses
  • Theoretical properties of symmetric losses
  • Barrier hinge loss
  • Experiments


SLIDE 18

Convex symmetric losses?

By sacrificing nonnegativity:

  • Only the unhinged loss, $\ell(z) = 1 - z$, is convex and symmetric (van Rooyen+, 2015).

This loss had been considered before, although its noise robustness was not discussed (Devroye+, 1996, Schoelkopf+, 2002, Shawe-Taylor+, 2004, Sriperumbudur+, 2009, Reid+, 2011).

SLIDE 19

Barrier hinge loss

$\ell_{b,r}(z) = \max\big(-b(r + z) + r,\ \max(b(z - r),\ r - z)\big)$

$b$: slope of the non-symmetric region; $r$: width of the symmetric region. The loss gives a high penalty when the input is misclassified or the output is outside the symmetric region.
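A sketch of the barrier hinge loss, assuming the formula reconstructed above (defaults b=200, r=50 follow the experiment settings), together with a numerical check of its interval-restricted symmetry:

```python
import numpy as np

def barrier_hinge_loss(z, b=200.0, r=50.0):
    """Barrier hinge loss: max(-b(r+z)+r, max(b(z-r), r-z)).
    b: slope outside the symmetric region, r: width of that region."""
    return np.maximum(-b * (r + z) + r, np.maximum(b * (z - r), r - z))

# Symmetric only in an interval: l(z) + l(-z) is constant (= 2r) on [-r, r] ...
z_in = np.linspace(-50, 50, 101)
print(np.ptp(barrier_hinge_loss(z_in) + barrier_hinge_loss(-z_in)))   # ~0

# ... but not outside of it.
z_out = np.linspace(-100, 100, 201)
print(np.ptp(barrier_hinge_loss(z_out) + barrier_hinge_loss(-z_out)))  # > 0
```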

SLIDE 20

Symmetry of the barrier hinge loss

The barrier hinge loss satisfies the symmetric property in an interval: $\ell(z) + \ell(-z)$ is constant for $z \in [-r, r]$.

If the output range is restricted to the symmetric region, the unhinged, hinge, and barrier losses are equivalent.

SLIDE 21

Contents

  • Background and related work
  • The importance of symmetric losses
  • Theoretical properties of symmetric losses
  • Barrier hinge loss
  • Experiments


SLIDE 22

Experiments: BER/AUC optimization from corrupted labels

To empirically answer the following questions:

  1. Does the symmetric condition significantly help?
  2. Do we need a loss to be symmetric everywhere?
  3. Does the negative unboundedness degrade the practical performance?

We conducted experiments that fix the models and vary the loss functions.

  • Losses: Barrier [b=200, r=50], Unhinged, Sigmoid, Logistic, Hinge, Squared, Savage
  • Experiment 1: MLPs on UCI/LIBSVM datasets
  • Experiment 2: CNNs on more difficult datasets (MNIST, CIFAR-10)

SLIDE 23

Experiments: BER/AUC optimization from corrupted labels

For UCI datasets:

Multilayer perceptrons (MLPs) with one hidden layer: [d-500-1]. Activation function: Rectified Linear Units (ReLU) (Nair+, 2010).

MNIST and CIFAR-10:

Convolutional neural networks (CNNs): [d-Conv[18,5,1,0]-Max[2,2]-Conv[48,5,1,0]-Max[2,2]-800-400-1]. ReLU after each fully connected layer, followed by a dropout layer (Srivastava+, 2014).

MNIST: odd digits vs. even digits. CIFAR-10: one class vs. airplane (following Ishida+, 2017).

Conv[18,5,1,0]: 18 channels, 5 x 5 convolutions, stride 1, padding 0. Max[2,2]: max pooling with kernel size 2 and stride 2.
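A PyTorch sketch of this architecture (the 1x28x28 input, ReLU after the convolutions, and the dropout rate are assumptions; the layer sizes follow the slide):

```python
import torch
import torch.nn as nn

class CorruptedLabelCNN(nn.Module):
    """Sketch of the slide's CNN:
    [d-Conv[18,5,1,0]-Max[2,2]-Conv[48,5,1,0]-Max[2,2]-800-400-1]."""
    def __init__(self, in_channels=1, in_size=28, dropout=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 18, kernel_size=5, stride=1, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(18, 48, kernel_size=5, stride=1, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        flat = 48 * (((in_size - 4) // 2 - 4) // 2) ** 2   # 48*4*4 for 28x28
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 800), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(800, 400), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(400, 1),   # single real-valued score f(x)
        )

    def forward(self, x):
        return self.classifier(self.features(x)).squeeze(-1)

# Usage: scores for a batch of MNIST-sized images.
model = CorruptedLabelCNN()
print(model(torch.randn(8, 1, 28, 28)).shape)  # torch.Size([8])
```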

SLIDE 24

Experiment 1: MLPs on UCI/LIBSVM datasets

Dataset information and more experiments can be found in our paper.

[Results table: the higher the better.]

SLIDE 25

Experiment 1: MLPs on UCI/LIBSVM datasets

Symmetric losses and the barrier hinge loss are preferable!

[Results table: the higher the better.]

SLIDE 26

Experiment 2: CNNs on MNIST/CIFAR-10

SLIDE 27

Conclusion

We showed that a symmetric loss is preferable under corrupted labels for:

  • Area under the receiver operating characteristic curve (AUC) maximization
  • Balanced error rate (BER) minimization

We provided general theoretical properties for symmetric losses:

  • Classification-calibration, excess risk bound, AUC-consistency
  • Inability to estimate the class posterior probability

We proposed a barrier hinge loss:

  • As a proof of concept of the importance of the symmetric condition
  • Symmetric only in an interval, but benefits greatly from the symmetric condition
  • Significantly outperformed all losses in BER/AUC optimization using CNNs


Poster #135: today 6:30-9:00 PM