Making deep neural networks robust to label noise: a loss - PowerPoint PPT Presentation

Making deep neural networks robust to label noise: � a loss correction approach Giorgio Patrini 23 July 2017 CVPR, Honolulu joint work with Alessandro Rozza, Aditya Krishna Menon, Richard Nock and Lizhen Qu ANU, Data61, Waynaut, University of Sydney

Label noise: motivations “Data science becomes the art of extracting labels out of thin air” [Malach & Shalev-Shwartz 17] G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Label noise: motivations “Data science becomes the art of extracting labels out of thin air” [Malach & Shalev-Shwartz 17] Labels from Web queries Crowd sourcing : ? : jaguar : leopard : cheetah G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Previous work (sample) • Noise-aware deep nets (CV) – Good performance on specific domains, scalable – Heuristics – In many cases, need some clean labels [Sukhbaat ar et al. ICLR15, Krause et al. ECCV16, Xiao et al. CVPR15] • Theoretically robust loss functions (ML) – Theoretically sound – Unrealistic assumptions … knowing the noise distribution! [Natarajan et al. NIPS13, Patrini et al. ICML16] • Estimating the noise from noisy data [Menon et al. ICML15] G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Contributions • Two procedures for loss correction . Loss/architecture/ dataset agnostic. • Theoretical guarantee: same model as without noise (in expectation). • Noise estimation, by using the same deep net. • Tests on MNIST, CIFAR10/100, IMDB with multiple nets (CNN, ResNets, LSTM, … ). SOTA on data of [Xiao et al. 15]. G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Supervised learning • Sample from p ( x , y ) y ∈ { e j : j = 1 , . . . , c } • -class classification: c • Learn a neural network p ( y | x ) G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Supervised learning • Sample from p ( x , y ) y ∈ { e j : j = 1 , . . . , c } • -class classification: c • Learn a neural network p ( y | x ) • Minimize the empirical risk associated with loss : ` ( y , p ( y | x )) argmin E S ` ( y , p ( y | x )) p ( y | x ) � > • Let ` ( e 1 , p ( y | x )) , . . . , ` ( e c , p ( y | x )) � ` ( p ( y | x )) = G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Asymmetric label noise • Sample from p ( x , ˜ y ) • Corruption by asymmetric noise, defined by a transition matrix : T ∈ [0 , 1] c × c p (˜ y | y ) y = e j | y = e i ) ˜ T ij = p (˜ y y x Feature independent noise G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Asymmetric label noise • Sample from p ( x , ˜ y ) • Corruption by asymmetric noise, defined by a transition matrix : T ∈ [0 , 1] c × c p (˜ y | y ) y = e j | y = e i ) ˜ T ij = p (˜ y y x Feature independent noise • How to be robust to such noise? G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Backward loss correction • -class version of [Natarajan et al. 13] c ` ← ( p ( y | x )) = T − 1 ` ( p ( y | x )) • Rationale: linear combination of losses, weighted by the inverse of the noise probabilities • “One step back” in the Markov chain T G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Backward loss correction: theory • Theorem: if is non-singular, is ` ← T unbiased . It follows that the models learned with/without noise are the same under noise expectation: y ` ← ( y , p ( y | x )) = argmin argmin E x , y ` ( y , p ( y | x )) E x , ˜ p ( y | x ) p ( y | x ) G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Forward loss correction • Inspired by [Sukhbaatar et al. 15]: “absorbs” the noise in a top linear layer, emulating T ` ! ( p ( y | x )) = ` ( T > p ( y | x )) • Rationale: compare noisy labels with “noisified” predictions G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Forward loss correction: theory • Theorem: if is non-singular, is such ` → T that the models with/without noise are the same under noise expectation* : y ` → ( y , p ( y | x )) = argmin argmin E x , y ` ( y , p ( y | x )) E x , ˜ p ( y | x ) p ( y | x ) * Technically, the loss needs to be proper composite here. Cross- entropy and square are OK. G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Noise estimation • -class extension of [Menon et al. 15] c G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Noise estimation • -class extension of [Menon et al. 15] c • Hp: there are some “perfect examples”, and the net can model very well p (˜ y | x ) G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Noise estimation • -class extension of [Menon et al. 15] c • Hp: there are some “perfect examples”, and the net can model very well p (˜ y | x ) • First, train and get p (˜ y | x ) • Then estimate by ˆ T x i = argmax y = e i | x ) ¯ p (˜ x ∀ i, j y = e j | ¯ x i ) T ij = p (˜ G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Noise estimation • -class extension of [Menon et al. 15] c • Hp: there are some “perfect examples”, and the net can model very well p (˜ y | x ) • First, train and get p (˜ y | x ) • Then estimate by ˆ T x i = argmax y = e i | x ) ¯ p (˜ x ∀ i, j y = e j | ¯ x i ) T ij = p (˜ • Rationale: mistakes on “perfect examples” must be due to the noise G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Recap: the algorithm ˆ (1) Train the network on noisy data to obtain T y | x ) → ˆ argmin y ` ( y , p ( y | x )) = p (˜ E x , ˜ T p ( y | x ) (2) Re-train the network correcting with backward/forward loss, e.g. y ` ← ( y , p ( y | x )) argmin E x , ˜ p ( y | x ) n i e g n a h c o n n o i t a g a p o r p - k c a b G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Empirics: models and datasets • Goal: show robustness independently from architecture and dataset Simulated noise: – MNIST: 2 x fully connected, dropout – IMDB: word embedding + LSTM – CIFAR10/100: various ResNets Real noise: – Clothing1M [Xiao et al. 15], 50-ResNet G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Inject sparse, asymmetric T   1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0     0 0 .3 0 0 0 0 .7 0 0     0 0 0 .3 0 0 0 0 .7 0   T   0 0 0 0 1 0 0 0 0 0     0 0 0 0 0 .3 .7 0 0 0     0 0 0 0 0 .7 .3 0 0 0     0 .7 0 0 0 0 0 .3 0 0     0 0 0 0 0 0 0 0 1 0   0 0 0 0 0 0 0 0 0 1   1 ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏ 1 ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏     .33 .67 ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏ ˆ   T   .35 ✏ .65 ✏ ✏ ✏ ✏ ✏ ✏ ✏     1 ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏     ✏ < 10 − 6 ✏ .29 .71 ✏ ✏ ✏ ✏ ✏ ✏ ✏     ✏ .73 .26 ✏ ✏ ✏ ✏ ✏ ✏ ✏     ✏ .75 .25 ✏ ✏ ✏ ✏ ✏ ✏ ✏     1 ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏   1 ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏ ✏ G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Experiments with real noise Clothing1M [Xiao et al. CVPR15] • Trainset: 1M noisy label + 50k clean labels • Testset: 10k clean labels G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Experiments with real noise Clothing1M # model loss init training accuracy 50 k 72 . 63 1 AlexNet cross-. ImageNet 2 AlexNet cross-. #1 1 M, 50 k 76 . 22 3 2x AlexNet cross-. #1 1 M, 50 k 78 . 24 4 50-ResNet cross- ImageNet 1 M 68 . 94 5 50-ResNet backward ImageNet 1 M 69 . 13 6 50-ResNet forward ImageNet 1 M 69 . 84 50 k 75 . 19 7 50-ResNet cross-. ImageNet 8 50-ResNet cross-. #6 50 k 80 . 38 Recipe for SOTA: Our method • Pre-train: “forward loss” on 1M noisy labels • Fine-tune: cross-entropy on 50k clean labels G. Patrini | Making deep neural networks robust to label noise | github.com/giorgiop/loss-correction

Making deep neural networks robust to label noise: a loss - PowerPoint PPT Presentation

Making deep neural networks robust to label noise: a loss correction approach Giorgio Patrini 23 July 2017 CVPR, Honolulu joint work with Alessandro Rozza, Aditya Krishna Menon, Richard Nock and Lizhen Qu ANU, Data61, Waynaut, University

Blue Label Pilot-plant Reactor 1 Product Line-up Platinum Label Gold Label Blue Label Blue

AG! Blue Label Bench-top Reactor 1 Product line up Platinum Label Gold Label Blue Label Blue

Making Polynomials Robust to Noise Alexander Sherstov U C L A Noise in computation 2 Noise in

Neural Networks Neural networks arise from attempts to model Neural Networks human/animal

Extreme Classification A New Paradigm for Ranking & Recommendation Manik Varma Microsoft

Learning Neural Networks Learning Neural Networks Neural Networks can represent complex Neural

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

Module-2c: Two Port Noise Modelling 20 July 2018 16:40 Shot Noise vs. Flicker Noise Simple

Neural Networks and Handwriting Recognition Background Neural Networks Neural Network Steven

Optimizing Deep Neural Networks Leena Chennuru Vankadara 26-10-2015 Table of Contents Neural

Boosting under high noise. Adaboost is sensitive to label noise Letter / Irvine Database

ICASSP 2017 Tutorial on Methods for Interpreting and Understanding Deep Neural Networks G.

Deep Learning with Neural Networks The Structure and Optimization of Deep Neural Networks Allan

Introduction to Artificial Intelligence Neural Networks - Deep Learning for NLP Janyl Jumadinova

Outlier Outlier Outlier- Outlier - -robust - robust robust robust identification

Visioning Committee Air Quality and Noise January 23, 2020 Noise Data Noise is evaluated on

Hadron mass corrections Hadron mass corrections in SIDIS and DIS in SIDIS and DIS Alberto

The representation of asteroid shapes: a test for the inversion of Gaia photometry A. Carbognani

Sliding window detection January 29, 2009 Kristen Grauman UT-Austin Schedule

Vir irgin inia ia Sar arah ah E. Ras askin, kin, PhD, MPH on behalf of the VCU iCubed Oral

The role of neutrinos in the ejection of matter from binary neutron star mergers. Albino Perego

Jet Substructure at the LHC Wouter Waalewijn LANL - January 8, 2015 Outline Introduction

Story of Medea, Roman sarcophagus, Ancona Leo X directly involved with help of theologian or

Verse in 24 Do you not know that in a race all the runners run, but only one receives the prize?

Sambuz

Useful Links

Newsletter

Mail Us

Making deep neural networks robust to label noise: a loss - PowerPoint PPT Presentation

Making deep neural networks robust to label noise: a loss correction approach Giorgio Patrini 23 July 2017 CVPR, Honolulu joint work with Alessandro Rozza, Aditya Krishna Menon, Richard Nock and Lizhen Qu ANU, Data61, Waynaut, University

Blue Label Pilot-plant Reactor 1 Product Line-up Platinum Label Gold Label Blue Label Blue

AG! Blue Label Bench-top Reactor 1 Product line up Platinum Label Gold Label Blue Label Blue

Making Polynomials Robust to Noise Alexander Sherstov U C L A Noise in computation 2 Noise in

Neural Networks Neural networks arise from attempts to model Neural Networks human/animal

Extreme Classification A New Paradigm for Ranking &amp; Recommendation Manik Varma Microsoft

Learning Neural Networks Learning Neural Networks Neural Networks can represent complex Neural

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

Module-2c: Two Port Noise Modelling 20 July 2018 16:40 Shot Noise vs. Flicker Noise Simple

Neural Networks and Handwriting Recognition Background Neural Networks Neural Network Steven

Optimizing Deep Neural Networks Leena Chennuru Vankadara 26-10-2015 Table of Contents Neural

Boosting under high noise. Adaboost is sensitive to label noise Letter / Irvine Database

ICASSP 2017 Tutorial on Methods for Interpreting and Understanding Deep Neural Networks G.

Deep Learning with Neural Networks The Structure and Optimization of Deep Neural Networks Allan

Introduction to Artificial Intelligence Neural Networks - Deep Learning for NLP Janyl Jumadinova

Outlier Outlier Outlier- Outlier - -robust - robust robust robust identification

Visioning Committee Air Quality and Noise January 23, 2020 Noise Data Noise is evaluated on

Hadron mass corrections Hadron mass corrections in SIDIS and DIS in SIDIS and DIS Alberto

The representation of asteroid shapes: a test for the inversion of Gaia photometry A. Carbognani

Sliding window detection January 29, 2009 Kristen Grauman UT-Austin Schedule

Vir irgin inia ia Sar arah ah E. Ras askin, kin, PhD, MPH on behalf of the VCU iCubed Oral

The role of neutrinos in the ejection of matter from binary neutron star mergers. Albino Perego

Jet Substructure at the LHC Wouter Waalewijn LANL - January 8, 2015 Outline Introduction

Story of Medea, Roman sarcophagus, Ancona Leo X directly involved with help of theologian or

Verse in 24 Do you not know that in a race all the runners run, but only one receives the prize?

Sambuz

Useful Links

Newsletter

Mail Us

Extreme Classification A New Paradigm for Ranking & Recommendation Manik Varma Microsoft