Class Weighted Classification: Trade-offs and Robust Approaches - PowerPoint PPT Presentation




SLIDE 1

Class Weighted Classification: Trade-offs and Robust Approaches

Ziyu Xu (Neil), Chen Dan, Justin Khim, Pradeep Ravikumar Machine Learning Department, Computer Science Department Carnegie Mellon University ICML 2020 (July 12th, 2020)

SLIDE 2

Problem

We study the class imbalance problem in machine learning, which arises in applications such as e-commerce and object detection.

SLIDE 3

Contributions

  • Fundamental trade-off for different weightings
  • Formulation for robust risk on a set of weightings
  • Stochastic programming solution to robust risk
  • Statistical guarantees for generalization of robust risk (paper)
SLIDE 4

Organization

  • Motivation and previous approaches
  • Fundamental trade-off for different weightings
  • Formulation for robust risk on a set of weightings
  • Stochastic programming solution to robust risk
SLIDE 5

Class Imbalance

The classes are very imbalanced...

~20x difference!

SLIDE 6

Is accuracy/risk a good measure?

Example: 99% microwave, 1% keyboard

  • Classifier A: Predicts everything as microwave

Accuracy: 99%

  • Classifier B: Classifies all keyboards correctly, 2% error on microwaves

Accuracy: 98%
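These accuracies can be checked with a couple of lines of arithmetic (a quick sketch using only the class split and error rates stated above):

```python
# Class proportions from the example: 99% microwave, 1% keyboard
p_microwave, p_keyboard = 0.99, 0.01

# Classifier A: predicts everything as microwave, so it is right
# on every microwave and wrong on every keyboard
acc_a = p_microwave * 1.0 + p_keyboard * 0.0

# Classifier B: all keyboards correct, 2% error on microwaves
acc_b = p_microwave * 0.98 + p_keyboard * 1.0

print(acc_a)  # 0.99
print(acc_b)  # ~0.9802
```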

SLIDE 7

Previous Approaches: Data Augmentation

  • SMOTE (Chawla et al. 2002)
  • Under/oversampling (Zhou and Liu 2006)
  • GANs (Mariani et al. 2018)
SLIDE 8

Previous Approaches: Alternative Metrics

F1 Score

Precision: proportion of minority-class predictions that are correct.

Recall: proportion of true minority-class samples that are predicted as the minority class.

These metrics are poorly understood and may not be the desired metric.
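As a concrete illustration (a minimal sketch; the helper function and the 99/1 example are ours, not from the slides), precision and recall expose what accuracy hides:

```python
def precision_recall_f1(y_true, y_pred, minority=1):
    """Precision, recall, and F1 for the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != minority and p == minority)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p != minority)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# "Classifier A" from the earlier slide: predicts the majority class everywhere
y_true = [0] * 99 + [1]
p, r, f1 = precision_recall_f1(y_true, [0] * 100)
print(p, r, f1)  # 0.0 0.0 0.0, despite 99% accuracy
```

Note that the always-majority classifier scores 0 on all three minority-class metrics even though its accuracy is 99%.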

SLIDE 9

Class Weighting

We formalize errors on different classes with class-conditioned risks.

SLIDE 10

Class Weighting

Weighted risk is the weighted sum of the class-conditioned risks.
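In symbols (our notation, since the slide's equations are not in the transcript): with $K$ classes, loss $\ell$, and weights $w_k$,

```latex
R_k(f) = \mathbb{E}\bigl[\ell(f(X), Y) \mid Y = k\bigr],
\qquad
R_w(f) = \sum_{k=1}^{K} w_k \, R_k(f).
```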

SLIDE 11

Class Weighting

However, choosing weights is a difficult task: there are many hyperparameters to choose!

SLIDE 12

Example: Credit Card Fraud

Avg. cost of misclassification: $10 (non-fraud), $100 (fraud)

Cost(fraud) = 10 × Cost(non-fraud)


SLIDE 14

Class Weighting

However, choosing weights is a difficult task: there are many hyperparameters to choose!

What is the effect of choosing different weightings?

SLIDE 15
  • Motivation and previous approaches
  • Fundamental trade-off for different weightings
  • Formulation for robust risk on a set of weightings
  • Stochastic programming solution to robust risk
SLIDE 16

Fundamental Tradeoff

Binary classification setup; Bayes optimal classifier:
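The slide's equations are images; for weighted 0-1 loss in the binary case, the standard form of the weighted Bayes optimal classifier is (our reconstruction, with $\eta$ the class-1 posterior and weights $w_0, w_1$):

```latex
\eta(x) = \mathbb{P}(Y = 1 \mid X = x),
\qquad
f_w^{*}(x) = \mathbb{1}\!\left[\eta(x) \ge \frac{w_0}{w_0 + w_1}\right].
```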

SLIDE 17

Fundamental Tradeoff

Plug-in estimator: Weighted excess risk:
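Again reconstructing the missing equations (an assumption on our part, with $\hat{\eta}$ an estimate of the class-1 posterior):

```latex
\hat{f}_w(x) = \mathbb{1}\!\left[\hat{\eta}(x) \ge \frac{w_0}{w_0 + w_1}\right],
\qquad
\mathcal{E}_w(f) = R_w(f) - R_w(f_w^{*}).
```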

SLIDE 18

Fundamental Tradeoff

Region where differing predictions occur
SLIDE 19

Fundamental Tradeoff

Optimizing for one weighting inevitably reduces performance on another

Region where differing predictions occur
SLIDE 20
  • Motivation and previous approaches
  • Fundamental trade-off for different weightings
  • Formulation for robust risk on a set of weightings
  • Stochastic programming solution to robust risk
SLIDE 21

Robust Weighting

Define Q as a set of weightings - we define a robust risk as the maximum weighted risk over Q:
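Written out (our notation for the slide's missing equation):

```latex
R_{\mathcal{Q}}(f) = \max_{w \in \mathcal{Q}} \; \sum_{k=1}^{K} w_k \, R_k(f).
```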

SLIDE 22
  • Motivation and previous approaches
  • Fundamental trade-off for different weightings
  • Formulation for robust risk on a set of weightings
  • Stochastic programming solution to robust risk
SLIDE 23

Label CVaR

The result is label CVaR (LCVaR), a new optimization objective based on a specific robust weighted risk.

SLIDE 24

Label CVaR

The result is label CVaR (LCVaR), a new optimization objective based on a specific robust weighted risk.

The weights must form a probability distribution, and each weight has a selected upper bound.
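One standard CVaR-style choice of such a constraint set, with class priors $p_k$ and level $\alpha$, would be (an assumption on our part; the slide's exact set is not in the transcript):

```latex
\mathcal{Q}_{\alpha} = \left\{ w \;:\; \sum_{k} w_k = 1, \;\; 0 \le w_k \le \frac{p_k}{\alpha} \right\}.
```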

SLIDE 25

LHCVaR

Since different classes have different sizes, we can also use different maximum weights. We call this version label heterogeneous CVaR (LHCVaR), since the label weights are not necessarily uniform as in LCVaR.

SLIDE 26

CVaR

This type of robust problem has been studied in portfolio optimization. One formulation is the α-conditional value-at-risk (CVaR), which is the average loss conditional on the loss being above the (1 − α)-quantile.
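The two views of CVaR, the tail average and the Rockafellar-Uryasev variational form min over λ of λ + E[(Z − λ)₊]/α, can be checked numerically (a sketch with made-up losses):

```python
import numpy as np

losses = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
alpha = 0.3  # look at the worst 30% of losses

# Direct: average of the worst alpha-fraction of losses
k = round(alpha * len(losses))
cvar_direct = np.sort(losses)[-k:].mean()

# Variational form: min over lambda of  lambda + E[(Z - lambda)_+] / alpha,
# evaluated on a fine grid of candidate lambdas
grid = np.linspace(losses.min(), losses.max(), 1001)
dual_vals = grid + np.maximum(losses[None, :] - grid[:, None], 0).mean(axis=1) / alpha
cvar_dual = dual_vals.min()

print(cvar_direct)  # 9.0
print(cvar_dual)    # ~9.0 (matches, up to grid/float precision)
```

The minimizing λ is a (1 − α)-quantile of the losses, which is what makes a closed-form λ update possible when optimizing objectives of this shape.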

SLIDE 27

CVaR

Main idea: instead of optimizing the worst α-proportion of losses in a portfolio, achieve good accuracy on the worst α-proportion of class labels.

SLIDE 28

Optimization

The connection to CVaR gives us a dual form that allows for minimization over all variables.

SLIDE 29

Conclusions

  • Minimizing LCVaR/LHCVaR enables good performance on all weightings, rather than on a single weighting.
  • LCVaR requires fewer user-tuned parameters.
  • LCVaR/LHCVaR have dual forms that can be optimized efficiently.

SLIDE 30

Thank you!

SLIDE 31

Main equations

LCVaR:
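The slide's equation is an image; applying the standard CVaR dual to the class-conditioned risks, the LCVaR objective should take a form like (our reconstruction, with class priors $p_k$):

```latex
R^{\alpha}_{\mathrm{LCVaR}}(f)
= \min_{\lambda \in \mathbb{R}}
\left\{ \lambda + \frac{1}{\alpha} \sum_{k=1}^{K} p_k \bigl( R_k(f) - \lambda \bigr)_{+} \right\}.
```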

SLIDE 32

Main equations

LHCVaR:
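By analogy with the CVaR dual, a version with a per-class level $\alpha_k$ plausibly takes the form (our reconstruction; the slide's equation is an image):

```latex
R^{\boldsymbol{\alpha}}_{\mathrm{LHCVaR}}(f)
= \min_{\lambda \in \mathbb{R}}
\left\{ \lambda + \sum_{k=1}^{K} \frac{p_k}{\alpha_k} \bigl( R_k(f) - \lambda \bigr)_{+} \right\}.
```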

SLIDE 33

Fundamental Trade-off Summary

SLIDE 34

Hyperparameter tuning for LHCVaR

Recall that LHCVaR is the heterogeneous version of our loss, i.e. we can choose a different alpha for each class. That means the number of hyperparameters scales with the number of classes, which is scary.

SLIDE 35

Hyperparameter tuning for LHCVaR

It seems somewhat reasonable to choose alphas inversely proportional to the class proportions:

Acts as an upper bound on any alpha.

Temperature parameter: as kappa goes to infinity, the alphas become closer to uniform; as kappa goes to 0, the alphas become sharper.
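One parameterization consistent with this description (our guess; the slide's formula is not in the transcript) is

```latex
\alpha_k \;\propto\; p_k^{-1/\kappa},
```

so that as $\kappa \to \infty$ the $\alpha_k$ flatten toward uniform, and as $\kappa \to 0$ they concentrate on the rarest classes.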

SLIDE 36

Dual form optimization tricks

Note that the dual form is non-smooth, which actually makes gradient descent a little inefficient in this case, but we can explicitly calculate lambda at each step:
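Concretely, for a CVaR-type dual $\min_\lambda \{\lambda + \frac{1}{\alpha}\sum_k p_k (R_k - \lambda)_+\}$, the minimizing $\lambda$ is a $(1-\alpha)$-quantile of the class risks under the prior (a standard CVaR fact, stated here in our notation):

```latex
\lambda^{*} = \inf\left\{ \lambda \;:\; \sum_{k} p_k \, \mathbb{1}\bigl[R_k \le \lambda\bigr] \ge 1 - \alpha \right\}.
```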

SLIDE 37

Dual form optimization tricks

Dual objective:

SLIDE 38

Numerical validation

SLIDE 39

Experimental Evaluation

  • A synthetic dataset, in which we simulate large class imbalance for binary classification.
  • A real dataset from the UCI repository, which has multiclass imbalance.

In our experiments, we use a logistic regression model.

SLIDE 40

Synthetic Experiment

We generate a binary classification dataset in which we vary the probability of class 0, the majority class.

SLIDE 41

Synthetic Experiment

Plots: risk on the majority class and risk on the minority class. LCVaR/LHCVaR beats the balanced baseline on the majority class, and the standard baseline on the minority class.

SLIDE 42

Synthetic Experiment

Plot: worst-case risk. Consequently, LCVaR/LHCVaR has increasingly better worst-case risk as imbalance increases.

SLIDE 43

Real Data Experiment

Covertype dataset: https://archive.ics.uci.edu/ml/datasets/covertype. 54-dimensional feature set, 7 labels.

SLIDE 44

Real Data Experiment

Worst-case class risk: Balanced (0.5333), Standard (0.5111), LCVaR (0.5037), LHCVaR (0.4907). LHCVaR/LCVaR achieve the best worst-case class risk.