

SLIDE 1

Combating Label Noise in Deep Learning using Abstention

Speaker: Sunil Thulasidasan sunil@lanl.gov

SLIDE 2

A Practical Challenge for Deep Learning State-of-the-art models require large amounts of clean, annotated data.

SLIDE 3

Annotation is labor intensive!

  • 49k workers
  • 167 countries
  • 2.5 years to complete!

ImageNet: 15 million labeled images; over 20,000 classes

The data that transformed AI research—and possibly the world (D. Gershgorn, Quartz magazine, 2017)

Slide from Fei-Fei Li and Jia Deng

SLIDE 4

Approaches to large-scale labeling

  • Crowdsource at scale: labor intensive, but relatively cheap
  • Use weak labels from queries, user tags, and pre-trained classifiers

SLIDE 5

Approaches to large-scale labeling

  • Crowdsource at scale: labor intensive, but cheap
  • Use weak labels from queries, user tags, and pre-trained classifiers

Both approaches can lead to significant labeling errors!

[Figure: examples of mislabeled images tagged “Dog”, “Taxi”, “Banana”.]

Slide credit: S. Guo et al., 2018

SLIDE 6
  • Label noise is an inconsistent mapping from features X to labels Y.

[Figure: three different images, all labeled “Dog”.]

SLIDE 7

The Deep Abstaining Classifier (DAC). Approach: use the learning difficulty of incorrectly labeled or confusing samples to defer learning on them -- “abstain” -- until the correct mapping is learned.

SLIDE 8

Training a Deep Abstaining Classifier

Cross entropy as usual:

$$\mathcal{L}(x) = (1 - p_{k+1}(x))\left(-\sum_{i=1}^{k} t_i(x)\,\log\frac{p_i(x)}{1 - p_{k+1}(x)}\right) + \alpha\,\log\frac{1}{1 - p_{k+1}(x)}$$

SLIDE 9

Training a Deep Abstaining Classifier

Cross entropy over the actual classes, with an extra abstention class $p_{k+1}$:

$$\mathcal{L}(x) = (1 - p_{k+1}(x))\left(-\sum_{i=1}^{k} t_i(x)\,\log\frac{p_i(x)}{1 - p_{k+1}(x)}\right) + \alpha\,\log\frac{1}{1 - p_{k+1}(x)}$$

The first term encourages abstention.

SLIDE 10

Training a Deep Abstaining Classifier

Cross entropy over the actual classes:

$$\mathcal{L}(x) = \underbrace{(1 - p_{k+1}(x))\left(-\sum_{i=1}^{k} t_i(x)\,\log\frac{p_i(x)}{1 - p_{k+1}(x)}\right)}_{\text{encourages abstention}} + \underbrace{\alpha\,\log\frac{1}{1 - p_{k+1}(x)}}_{\text{penalizes abstention}}$$

Here $p_{k+1}$ is the abstention class, and $\alpha$ is automatically tuned during learning.
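The loss above fits in a few lines of code. Below is a minimal NumPy sketch (my own illustration, not the authors' reference implementation; `dac_loss` and its signature are hypothetical), assuming one-hot targets so the inner sum reduces to the true-class probability:

```python
import numpy as np

def dac_loss(probs, targets, alpha):
    """Illustrative sketch of the abstention loss (hypothetical helper,
    not the authors' reference implementation).

    probs   : (n, k+1) softmax outputs; the last column is the abstention class.
    targets : (n,) integer labels over the k real classes.
    alpha   : abstention penalty (auto-tuned during learning in the DAC).
    """
    eps = 1e-12
    p_abs = probs[:, -1]                                  # p_{k+1}(x)
    p_true = probs[np.arange(len(targets)), targets]      # sum_i t_i(x) p_i(x)
    # Cross entropy over the k real classes, renormalized by (1 - p_{k+1})
    ce = -np.log(p_true / (1.0 - p_abs) + eps)
    # First term lets abstention discount the CE on hard samples;
    # second term, weighted by alpha, penalizes abstention.
    per_sample = (1.0 - p_abs) * ce + alpha * np.log(1.0 / (1.0 - p_abs + eps))
    return per_sample.mean()
```

With $p_{k+1}(x) = 0$ the expression reduces to ordinary cross entropy, matching the “cross entropy as usual” annotation on the earlier slide.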

SLIDE 11

Abstention Dynamics

[Figure: abstained percent on the training set vs. epoch, with 10% label noise; callouts mark the ideal rate of abstention and the overfitting regime.]

  • Introduce abstention after a warm-up period.
  • Abstention decreases as the DAC makes learning progress.
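One way to realize the warm-up behavior described above is a penalty schedule; the sketch below is purely illustrative (the paper tunes $\alpha$ automatically, and the function name, epoch counts, and constants here are all made up). A very large $\alpha$ during warm-up makes abstaining prohibitively expensive, so training behaves like plain cross entropy; afterwards $\alpha$ is ramped down to allow abstention:

```python
def abstention_penalty(epoch, warmup_epochs=20, ramp_epochs=10,
                       alpha_init=100.0, alpha_final=1.0):
    """Hypothetical alpha schedule for the abstention loss.
    Warm-up: alpha stays very large, suppressing abstention entirely.
    Afterwards: alpha ramps down linearly to its final value."""
    if epoch < warmup_epochs:
        return alpha_init  # abstention effectively disabled
    ramp = min(1.0, (epoch - warmup_epochs) / ramp_epochs)
    return alpha_init + (alpha_final - alpha_init) * ramp
```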

SLIDE 12

The DAC gives state-of-the-art results in label-noise experiments.

[Figures: results on CIFAR-10 (60% and 80% label noise), CIFAR-100 (60% label noise), and WebVision, a real-world noisy dataset of ~2.4M images with ~35-40% label noise.]

Training protocol:
  • Use the DAC to identify and eliminate label noise.
  • Retrain on the cleaner set.

GCE: Generalized Cross-Entropy Loss (Zhang et al., NIPS ’18); Forward (Patrini et al., CVPR ’17); MentorNet (Li et al., ICML ’18)
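The two-step training protocol on this slide can be sketched as follows; `filter_abstained` is a hypothetical helper name, and the retraining step would use any standard classifier on the filtered subset:

```python
import numpy as np

def filter_abstained(features, labels, dac_probs):
    """Drop samples a trained DAC abstains on (argmax falls on the
    abstention class, the last column), keeping a cleaner subset on
    which to retrain a standard classifier. Illustrative helper only."""
    keep = dac_probs.argmax(axis=1) != dac_probs.shape[1] - 1
    return features[keep], labels[keep]
```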

SLIDE 13

Abstention in the presence of Systematic Label Noise: The Random Monkeys Experiment

All the monkey labels in the training set (STL-10) are randomized. Can the DAC learn that images containing monkey features have unreliable labels, and abstain on monkeys in the test set?

SLIDE 14

Random Monkeys: DAC Predictions on Monkey Images

[Figure: distribution of DAC predictions on monkey test images across the ten STL-10 classes (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck) and the abstained class.]

The DAC abstains on most of the monkeys in the test set!

SLIDE 15

Image Blurring

  • Blur a subset (20%) of the images in the training set and randomize their labels.
  • Will the DAC learn to abstain on blurred images in the test set?

SLIDE 16

DAC Behavior on Blurred Images

The DAC abstains on most of the blurred images in the test set. For the DAC, validation accuracy is calculated only on non-abstained samples.
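The accuracy convention mentioned on this slide (score only the samples the DAC actually predicts on) might look like this in NumPy; the function name is my own:

```python
import numpy as np

def nonabstained_accuracy(dac_probs, labels):
    """Validation accuracy computed only over samples the DAC does not
    abstain on; the last column of dac_probs is the abstention class."""
    preds = dac_probs.argmax(axis=1)
    covered = preds != dac_probs.shape[1] - 1   # samples actually predicted
    if not covered.any():
        return 0.0
    return float((preds[covered] == labels[covered]).mean())
```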

SLIDE 17

Conclusions

  • Abstention training is an effective way to clean label noise in a deep learning pipeline.
  • Abstention can also be used as a representation learner for label noise.
  • Especially useful for interpretability in “don’t-know” decision situations.

Code available at https://github.com/thulas/dac-label-noise

SLIDE 18

Joint work with:

  • Jamal Mohd-Yusof, Los Alamos National Lab
  • Tanmoy Bhattacharya, Los Alamos National Lab
  • Jeff Bilmes, University of Washington
  • Gopinath Chennupati, Los Alamos National Lab

Point of Contact: Sunil Thulasidasan (sunil@lanl.gov)

Poster: Tue Jun 11th, 06:30 -- 09:00 PM @ Pacific Ballroom #9