Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels (PowerPoint PPT Presentation)



SLIDE 1

Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels

Lu Jiang, Di Huang, Mason Liu, Weilong Yang.


SLIDE 2

Deep Learning on Noisy Labels

Deep networks are very good at memorizing noisy labels (Zhang et al. 2017). This memorization is a critical issue because noisy labels are inevitable in big data.

Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization." ICLR (2017).

SLIDE 3

Controlled Noisy Labels

Performing controlled experiments on noisy labels is essential in existing works. (Figure: training sets at noise levels of 20%, 40%, and 80%.)

(Legend: wrong label / correct label.)

SLIDE 4

Issues with Controlled Synthetic Labels

Issue: existing studies only perform controlled experiments on synthetic labels (or random labels). 1. Contradictory findings. For example: are DNNs robust to massive label noise?

SLIDE 5

Issues with Controlled Synthetic Labels

Issue: existing studies only perform controlled experiments on synthetic labels (or random labels). 1. Contradictory findings. For example: are DNNs robust to massive label noise?

Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization." ICLR (2017).
Rolnick, D., et al. "Deep learning is robust to massive label noise." arXiv preprint arXiv:1705.10694 (2017).

(Figure panels: Zhang et al. 2017 vs. Rolnick et al. 2017.)

SLIDE 6

Issues with Controlled Synthetic Labels

Issue: existing studies only perform controlled experiments on synthetic labels (or random labels). 2. Inconsistent empirical results. We found that methods that perform well on synthetic noise may not work as well on real-world noisy labels.

  • Motivation of our research project.
SLIDE 7

Our Contributions:

1. We establish the first benchmark of controlled real-world label noise (from the web).
2. We propose a simple but highly effective method to overcome both synthetic and real-world noisy labels (best results on the WebVision benchmark).
3. We conduct the largest study by far into understanding deep neural networks trained on noisy labels, across different noise levels, noise types, network architectures, methods, and training settings.

SLIDE 8

Contribution I: New Dataset

First benchmark of controlled real-world label noise

SLIDE 9

Datasets of noisy training labels

(Diagram: a taxonomy of noisy training data, split into label noise (synthetic vs. real-world) and content noise: image corruption (Hendrycks & Dietterich, 2019) and adversarial attack (Zhang et al., 2019a).)

SLIDE 10

Datasets of noisy training labels

(Diagram: content noise has controlled benchmarks for image corruption (Hendrycks & Dietterich, 2019) and adversarial attack (Zhang et al., 2019a). For label noise, synthetic noise is controlled (Zhang et al. 2017), while real-world datasets such as WebVision and Clothing1M are uncontrolled. The controlled real-world cell is missing.)

  • Our work fills this gap.
SLIDE 11

(Diagram repeated from Slide 10.)
SLIDE 12

Construction of controlled synthetic label noise

1. Start with a well-labeled dataset.
2. Randomly select p% of the examples.
3. Independently flip each selected label to a random incorrect class (symmetric or asymmetric).
4. Repeat Steps 1-3 with a different p (noise level). (A code sketch of this procedure follows below.)

(Figure: Mini-ImageNet examples with correct labels.)
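As a reference, here is a minimal Python sketch of the procedure for the symmetric case (function and variable names are our own, not from the paper):

```python
import numpy as np

def add_symmetric_label_noise(labels, p, num_classes, seed=0):
    """Flip p percent of the labels to a uniformly random incorrect class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    # Step 2: randomly select p% of the examples.
    noisy_idx = rng.choice(len(labels), size=int(round(len(labels) * p)),
                           replace=False)
    for i in noisy_idx:
        # Step 3: draw uniformly from the num_classes - 1 incorrect classes,
        # skipping the true class.
        wrong = rng.integers(num_classes - 1)
        labels[i] = wrong if wrong < labels[i] else wrong + 1
    return labels

# Step 4 amounts to rerunning this with a different p, e.g. 0.2, 0.4, 0.8.
clean = np.random.randint(0, 100, size=50000)   # stand-in for real labels
noisy = add_symmetric_label_noise(clean, p=0.4, num_classes=100)
```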

SLIDE 13

Construction of controlled synthetic label noise

(Steps repeated from Slide 12.)

(Figure: noise level p = 20%. Legend: correct label.)

SLIDE 14

Construction of controlled synthetic label noise

(Steps repeated from Slide 12.)

(Figure: noise level p = 20%. Legend: wrong label / correct label.)

SLIDE 15

Construction of controlled synthetic label noise

(Steps repeated from Slide 12.)

(Figure: noise level p = 40%.)

This process generates controlled synthetic label noise.

(Legend: wrong label / correct label.)

SLIDE 16

(Diagram repeated from Slide 10.)
SLIDE 17

Construction of uncontrolled web label noise

(Figure: images collected from the web; noise level p = ??%.)

This process can automatically collect noisy labeled images from the web, but the noise level is fixed and unknown, which makes it unsuitable for controlled studies.

(Figure note: label correctness unknown.)

SLIDE 18

(Diagram repeated from Slide 10.)
SLIDE 19

From uncontrolled to controlled noise

We have each retrieved image annotated by 3-5 workers using the Google Cloud Data Labeling Service.

https://cloud.google.com/ai-platform/data-labeling/docs

With these annotations, the noise level p is known.

(Figure: annotator judgments per image: correct / incorrect / correct. Legend: wrong label / correct label.)
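With per-image annotator votes in hand, the realized noise level can be measured directly. A small sketch (majority voting as the aggregation rule is our assumption, not stated on the slides):

```python
def measured_noise_level(votes_per_image):
    """votes_per_image: one list of booleans per image from its 3-5
    annotators, True meaning 'the label is correct'."""
    mislabeled = sum(1 for votes in votes_per_image
                     if sum(votes) * 2 <= len(votes))  # majority: incorrect
    return mislabeled / len(votes_per_image)

# Example: three images; the second is judged mislabeled, so p = 1/3.
print(measured_noise_level([[True, True, True],
                            [False, False, True, False],
                            [True, True, False]]))
```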

SLIDE 20

Construction of our dataset

1. Start with a well-labeled dataset.
2. Randomly select p% of the examples.
3. Replace the clean images with incorrectly labeled web images, leaving the labels unchanged*.
4. Repeat Steps 1-3 with a different p (noise level). (A code sketch follows after the footnote.)

(Figure: noise level p = 20%. Legend: wrong label / correct label.)

*We show in the Appendix that an alternative construction, which removes all image-to-image search results, leads to consistent conclusions.
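A minimal sketch of this construction, parallel to the synthetic one above; `web_mislabeled_pool`, a per-class pool of web images whose labels annotators judged incorrect, is a hypothetical helper:

```python
import numpy as np

def add_web_label_noise(images, labels, web_mislabeled_pool, p, seed=0):
    """Replace p percent of the clean images with incorrectly labeled web
    images while keeping the original labels unchanged."""
    rng = np.random.default_rng(seed)
    images = list(images)
    noisy_idx = rng.choice(len(images), size=int(round(len(images) * p)),
                           replace=False)
    for i in noisy_idx:
        # Swap in a web image that was retrieved for class labels[i] but
        # whose annotators marked the label as incorrect.
        pool = web_mislabeled_pool[labels[i]]
        images[i] = pool[rng.integers(len(pool))]
    # The labels stay untouched; the noise lives in the images.
    return images, labels
```

The only difference from the synthetic construction is Step 3: the label is kept and the image is swapped, so the resulting noise reflects real confusions found on the web.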

SLIDE 21

Our Dataset: Controlled Noisy Labels from the Web

We manually annotated 212K images through 800K annotations, establishing the first benchmark of controlled web label noise for two classification tasks: coarse (Mini-ImageNet) and fine-grained (Stanford Cars).

SLIDE 22

(Repeated from Slide 21.)

Red noise: label noise from the web. Blue noise: synthetic label noise.

SLIDE 23

(Figure: example images from Mini-ImageNet and Stanford Cars.)

SLIDE 24

Contribution II: New Method

to overcome synthetic and real-world label noise

SLIDE 25

Problem: given a noisy dataset with an unknown noise level, find a robust learning method that generalizes well on the clean test data.

Prior works: many techniques tackle this problem from multiple directions, among others:

  • Regularization (Azadi et al., 2016; Noh et al., 2017; etc.)
  • Label cleaning (Reed et al., 2014; Goldberger, 2017; Li et al., 2017b; Veit et al., 2017; Song et al., 2019; etc.)
  • Example weighting (Jiang et al., 2018; Ren et al., 2018; Shu et al., 2019; Jiang et al., 2015; Liang et al., 2016; etc.)
  • Data augmentation (Zhang et al., 2018; Cheng et al., 2019)
  • ...

Our method: a simple and effective method called MentorMix. Why yet another method? We show that our method overcomes both synthetic and real-world noisy labels.

Overview

SLIDE 26

MentorMix is inspired by MentorNet (for curriculum learning) and Mixup (for vicinal risk minimization). It comprises four steps: weight¹, sample, mixup, and weight again².

Jiang, Lu, et al. "MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels." ICML (2018).
Zhang, Hongyi, et al. "mixup: Beyond empirical risk minimization." ICLR (2018).

Method

1. The simplest MentorNet form is a loss-thresholding function: v_i = 1(ℓ_i ≤ γ), keeping example i only when its loss is below the threshold γ.
2. We found that the second weighting is useful at high noise levels.
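A hedged PyTorch sketch of one MentorMix training step as the four steps describe it (the Beta(α, α) mixing prior follows the Mixup paper; names and details are ours; see the official code at http://www.lujiang.info/cnlw for the exact procedure):

```python
import torch
import torch.nn.functional as F

def mentormix_step(model, x, y, gamma, alpha=8.0):
    # 1) Weight: the simplest MentorNet, a loss-thresholding function
    #    v_i = 1(loss_i <= gamma).
    with torch.no_grad():
        losses = F.cross_entropy(model(x), y, reduction="none")
        v = (losses <= gamma).float()

    # 2) Sample: draw mixup partners with probability proportional to v.
    idx = torch.multinomial(v + 1e-8, num_samples=x.size(0), replacement=True)

    # 3) Mixup: convex combination with the sampled (low-loss) partners.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x + (1 - lam) * x[idx]
    logits = model(x_mix)
    mixed = lam * F.cross_entropy(logits, y, reduction="none") \
        + (1 - lam) * F.cross_entropy(logits, y[idx], reduction="none")

    # 4) Weight again: re-threshold the mixed losses (the slides note this
    #    second weighting helps at high noise levels).
    v2 = (mixed.detach() <= gamma).float()
    return (v2 * mixed).mean()
```

In a training loop, this scalar would be backpropagated in place of the plain cross-entropy; γ can be set, for example, to a percentile of the mini-batch losses so the threshold tracks training progress.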

SLIDE 27

Experimental Results

MentorMix: a simple but highly effective method to overcome both synthetic and real-world noisy labels. On our dataset:

Methods that perform well on synthetic noise may not work as well on real-world noisy labels, and vice versa. MentorMix is able to overcome both synthetic and real-world noisy labels.

(Table: each cell is the mean over 10 different noise levels from 0% to 80%.)

SLIDE 28

Experimental Results

MentorMix: a simple but highly effective method to overcome both synthetic and real-world noisy labels. (Tables: results on public CIFAR with synthetic noise and on public WebVision with real-world noise.)

SLIDE 29

Experimental Results

(Repeated from Slide 28.)

The best published result on the WebVision benchmark!

SLIDE 30

Contribution III: New findings

  • on real-world label noise
SLIDE 31

Contribution III

We conduct the largest study by far into understanding deep neural networks trained on noisy labels. Our study confirms existing findings on synthetic noisy labels and brings forward new findings that may challenge our preconceptions.

SLIDE 32

Blue Noise (symmetric)

(1) DNNs generalize poorly on synthetic label noise (Zhang et al., 2017).

(The colored belt plots the 95% confidence interval across 10 noise levels; a wider belt means poorer generalization.)

SLIDE 33

Blue Noise (symmetric) Red Noise (web)

(1) DNNs generalize poorly on synthetic label noise (Zhang et al., 2017), but generalize much better on web label noise.

(Caption repeated from Slide 32.)

SLIDE 34

(Repeated from Slide 33.)

SLIDE 35

(2) DNNs learn patterns first on noisy training labels (Arpit et al., 2017).

Blue Noise (symmetric)

(Figure annotation: accuracy drop.)

SLIDE 36

(2) DNNs learn patterns first on noisy training labels (Arpit et al., 2017). DNNs may NOT learn patterns first on web label noise.

Blue Noise (symmetric) Red Noise (web)

(Figure annotation: accuracy drop.)

SLIDE 37

Conclusions

SLIDE 38

ImageNet architectures generalize on noisy labels when the networks are fine-tuned. (Figure panel: clean data.)

ImageNet architectures generalize on clean training labels when the networks are fine-tuned (Kornblith et al., 2019). This also holds on noisy labels.

(Figure panels: blue noise and red noise.)

SLIDE 39

Key takeaways:

1. We proposed:
   a. the first benchmark of real-world controlled label noise (from the web),
   b. a simple method (MentorMix) to overcome both synthetic and real-world noisy labels.
2. We found:
   a. DNNs may NOT learn patterns first, but generalize much better, on real-world web label noise.
   b. Methods that perform well on synthetic noise may not work as well on real-world noisy labels.
   c. Advanced pretrained architectures are better at overcoming noisy labels.
   d. Further using MentorMix yields the best results.

SLIDE 40

(Key takeaways repeated from Slide 39.)

Thanks for watching. Please find our data and code at: http://www.lujiang.info/cnlw

SLIDE 41

Appendix

SLIDE 42

Contribution II

MentorMix consists of two key operations: MentorNet (for curriculum learning) and Mixup (for vicinal risk minimization). (Panels: MentorNet as importance sampling; Mixup for minimizing the vicinal risk.)

We use the simplest MentorNet here, which is a thresholding function: v_i = 1(ℓ_i ≤ γ).

Jiang, Lu, et al. "MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels." ICML (2018).
Zhang, Hongyi, et al. "mixup: Beyond empirical risk minimization." ICLR (2018).
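In math, under the notation above (ℓ_i the loss of example i, γ the threshold; the Beta prior is from the Mixup paper), a sketch of the two operations:

```latex
% Simplest MentorNet: a per-example loss-thresholding weight
v_i = \mathbf{1}\left(\ell_i \le \gamma\right)

% Importance-sampling view: the weights induce a partner distribution
p_j = \frac{v_j}{\sum_k v_k}

% Mixup with partner j drawn from p (vicinal risk minimization)
\lambda \sim \mathrm{Beta}(\alpha, \alpha), \qquad
\tilde{x}_i = \lambda x_i + (1 - \lambda) x_j, \qquad
\tilde{y}_i = \lambda y_i + (1 - \lambda) y_j
```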

SLIDE 43

(Flow diagram: a mini-batch forward pass produces per-example losses; losses become weights, weights become a sampling distribution; the pipeline is Weight → Sample → Mixup → Weight.)