SLIDE 1

Error-Bounded Correction of Noisy Labels

Songzhu Zheng, Pengxiang Wu, Aman Goswami, Mayank Goswami, Dimitris Metaxas, Chao Chen

The State University of New York at Stony Brook; Rutgers University; The City University of New York, Queens College


SLIDE 2

Label Noise is Ubiquitous and Troublesome

Label Noise can be Introduced by:

  • Mistakes made by human or automatic annotators (Yan et al. 2014; Veit et al. 2017)

[Figure: training with noisy labels yields a noisy model; at inference, the noisy model may label a dog image as "Cat".]

SLIDE 3

Settings

  • ỹ is the noisy label (observed); y is the clean label (unknown)

  • Challenge:

Train with noisy data (x, ỹ), but the model is required to give the correct prediction y.

[Figure: at inference, can a robust model trained with (x, ỹ) still predict "Cat" correctly?]

SLIDE 4

Settings

(Setup restated from Slide 3: ỹ is the observed noisy label and y the unknown clean label; we train with (x, ỹ) but must predict y correctly.)

  • Noise transition matrix T. Each entry τ_jk = P(ỹ = k | y = j). Rows are the true label, columns the noisy label:

Uniform noise:
            cat    dog    human
  cat       0.4    0.3    0.3
  dog       0.3    0.4    0.3
  human     0.3    0.3    0.4

Pairwise noise:
            cat    dog    human
  cat       0.6    0.4    0
  dog       0      0.6    0.4
  human     0.4    0      0.6
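To make the noise model concrete, here is a minimal Python sketch (not the authors' code) of how clean labels could be synthetically corrupted with such a transition matrix; the helper name corrupt_labels and the toy labels are illustrative only:

    import numpy as np

    def corrupt_labels(y, T, rng=None):
        # Sample noisy labels with P(y_tilde = k | y = j) = T[j, k].
        rng = np.random.default_rng() if rng is None else rng
        return np.array([rng.choice(len(T), p=T[label]) for label in y])

    # Uniform noise: every wrong class is equally likely.
    T_uniform = np.array([[0.4, 0.3, 0.3],
                          [0.3, 0.4, 0.3],
                          [0.3, 0.3, 0.4]])

    # Pairwise noise: each class flips to a single "neighbor" class.
    T_pairwise = np.array([[0.6, 0.4, 0.0],
                           [0.0, 0.6, 0.4],
                           [0.4, 0.0, 0.6]])

    y_clean = np.array([0, 1, 2, 0, 1, 2])   # 0 = cat, 1 = dog, 2 = human
    y_noisy = corrupt_labels(y_clean, T_uniform)
    print(y_noisy)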

SLIDE 5

Existing Solutions – Model Re-calibration

  • Introduce a new loss term to obtain a robust model:

1) Estimate the noise transition matrix T to correct the loss term (Goldberger & Ben-Reuven, 2017; Patrini et al., 2017)
2) Robust deep learning layer (Van Rooyen et al., 2015)
3) Reconstruction loss term (Reed et al., 2014)

  • Pros:

Global regularization; theoretical guarantees

  • Cons:

Not flexible enough; omits local (point-wise) information

SLIDE 6

Existing Solutions – Data Re-calibration


  • Re-weight or select data points using the noisy classifier
  • The noisy classifier's confidence determines the weight
  • Points with clean labels receive higher weight
  • Re-weighting and training happen jointly
  • Pros:

Better performance than model re-calibration methods; flexible enough to fully use point-wise information

  • Cons:

No theoretical support

[Figure: loop of input → re-weighting → training → robust model.]

SLIDE 7


Contribution

  • The first theoretical explanation for data re-calibration methods
  • Explains why a noisy classifier can be used to decide whether a label is trustworthy or not

  • A theory-inspired data re-calibration algorithm
  • Easy to tune
  • Scalable
  • Label Correction

Image source: https://media.istockphoto.com/vectors/hand-drawn-vector-cartoon-illustration-of-a-broken-robot-trying-to-vector-id1131797122?k=6&m=1131797122&s=612x612&w=0&h=H2fviprWr24dxlO2QPae1R8X3nrHB-J40NCunv2aE84=

SLIDE 8

(Noisy) Classifier and (Noisy) Posterior

The classification scoring function f(x) approximates the posterior probability of the label:

  • Trained on clean data (x, y): f(x) approximates the clean posterior η(x) = P(y = 1 | x)
  • Trained on noisy data (x, ỹ): f(x) approximates the noisy posterior η̃(x) = P(ỹ = 1 | x)

SLIDE 9

(Noisy) Classifier and (Noisy) Posterior

(Restated from Slide 8: trained on clean data, f(x) approximates η(x) = P(y = 1 | x); trained on noisy data, f(x) approximates η̃(x) = P(ỹ = 1 | x).)

  • There is a linear relationship between the two posteriors:

η̃(x) = (1 − τ₁₀ − τ₀₁) · η(x) + τ₀₁,  where τ₁₀ = P(ỹ = 0 | y = 1) and τ₀₁ = P(ỹ = 1 | y = 0).

SLIDE 10

Low Confidence of η̃(x) Implies Noise

Theorem 1. Let ϵ ≔ ‖f − η̃‖_∞ and Δ = (1 − τ₁₀ − τ₀₁) / 2. There exist constants C, λ > 0 such that:

  • For ỹ = 1:  Prob[ f(x) ≤ Δ, ỹ is clean ] ≤ C · O(ϵ^λ)
  • For ỹ = 0:  Prob[ 1 − f(x) ≤ Δ, ỹ is clean ] ≤ C · O(ϵ^λ)

SLIDE 11

Low Confidence of η̃(x) Implies Noise

(Theorem 1 restated, now only requiring some Δ ∈ (0, 1).)

[Figure: clean posterior η(x) and noisy posterior η̃(x) with clean and noisy labeled points; Δ marks the low-confidence band.]

SLIDE 12

Low Confidence of η̃(x) Implies Noise

(Theorem 1 restated.)

[Figure build: the classifier score f(x) is added alongside η(x) and η̃(x).]

SLIDE 13

Low Confidence of η̃(x) Implies Noise

(Theorem 1 restated.)

[Figure build: same plot, recalling that η̃(x) = (1 − τ₁₀ − τ₀₁) · η(x) + τ₀₁.]

SLIDE 14

Low Confidence of η̃(x) Implies Noise

(Theorem 1 restated; same figure as the previous slide.)

SLIDE 15

Low Confidence of η̃(x) Implies Noise

(Theorem 1 restated; this build highlights the ỹ = 0 case.)

[Figure: the complements 1 − η(x), 1 − η̃(x), and 1 − f(x), with clean and noisy labeled points.]

SLIDE 16

Tsybakov Condition

[Figure: clean and noisy labeled points around the decision boundary η(x) = 1/2, with the band 1/2 − t ≤ η(x) ≤ 1/2 + t highlighted.]

SLIDE 17

Tsybakov Condition

  • Tsybakov Condition [2004]. There exist constants C, λ > 0 and t₀ ∈ (0, 1/2] such that, for all t ≤ t₀:

Prob[ |η(x) − 1/2| ≤ t ] ≤ C · t^λ

[Figure: same illustration as the previous slide.]

SLIDE 18

Tsybakov Condition

  • Tsybakov Condition [2004]. There exist constants C, λ > 0 and t₀ ∈ (0, 1/2] such that, for all t ≤ t₀, Prob[ |η(x) − 1/2| ≤ t ] ≤ C · t^λ.
  • Empirical verification (CIFAR-10): fitted Ĉ = 0.32 and λ̂ = 1.04; the fit is statistically significant.

[Figure: same illustration as Slide 16.]

SLIDE 19

Low Confidence of η̃(x) Implies Noise

Theorem 1, instantiated using the CIFAR-10 estimates. Let ϵ ≔ ‖f − η̃‖_∞; for the corresponding Δ ∈ (0, 1):

  • For ỹ = 1:  Prob[ f(x) ≤ Δ, ỹ is clean ] ≤ 0.23 · O(ϵ^1.04)
  • For ỹ = 0:  Prob[ 1 − f(x) ≤ Δ, ỹ is clean ] ≤ 0.23 · O(ϵ^1.04)

SLIDE 20

Theory-Inspired Algorithm


SLIDE 21

Theory-Inspired Algorithm

Corollary 1. Let ϵ ≔ max_x |f(x) − η̃(x)|. If ỹ_new denotes the output of LRT-Correction with input (x, ỹ), f, and the threshold ε, then there exist constants C, λ > 0 such that:

Prob[ ỹ_new is clean ] > 1 − C · O(ϵ^λ)

Remark: the extension to the multi-class setting is natural.
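For concreteness, a minimal sketch of a likelihood-ratio label-correction step in this spirit (assumptions of this sketch, which may not match the paper's exact rule: the test statistic is taken as f_x(ỹ) / max_k f_x(k), labels failing the test are replaced by the classifier's argmax, and lrt_correction is an illustrative name):

    import numpy as np

    def lrt_correction(probs, noisy_labels, eps):
        # probs:        (n, K) softmax outputs f(x) of the current classifier
        # noisy_labels: (n,) observed labels y_tilde
        # eps:          threshold in (0, 1]; smaller eps keeps more labels
        rows = np.arange(len(probs))
        y_hat = probs.argmax(axis=1)                     # classifier's best guess
        ratio = probs[rows, noisy_labels] / probs[rows, y_hat]
        keep = ratio >= eps                              # likelihood-ratio test
        return np.where(keep, noisy_labels, y_hat)       # corrected labels y_new

    # Toy usage: the first point fails the test and is corrected to class 0.
    probs = np.array([[0.85, 0.10, 0.05],
                      [0.30, 0.35, 0.35]])
    y_tilde = np.array([2, 1])
    print(lrt_correction(probs, y_tilde, eps=1 / 1.2))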

SLIDE 22

AdaCorr: Using LRT-Correction During Training

Step 1: Train f(x) on (x, ỹ)
Step 2: Apply LRT-Correction using (x, ỹ), f(x), and ε
Step 3: Set ỹ ← ỹ_new
Step 4: Repeat Steps 1–3

Remark: in Step 1, to get a good approximation of η̃(x), we first train f(x) on (x, ỹ) for several warm-up epochs.
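A schematic training loop along these lines (a sketch under assumptions, not the authors' implementation: the tiny MLP, synthetic data, warm-up length, and the floor on the 1/ε schedule are placeholders; the correction rule is the one sketched on the previous slide):

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    torch.manual_seed(0)
    X = torch.randn(600, 2)                     # toy features
    y_tilde = torch.randint(0, 3, (600,))       # observed (noisy) labels

    model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 3))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    warmup_epochs, total_epochs, inv_eps = 5, 20, 1.2

    for epoch in range(total_epochs):
        # Step 1: train on the current (possibly corrected) labels.
        loader = DataLoader(TensorDataset(X, y_tilde), batch_size=64, shuffle=True)
        for xb, yb in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(xb), yb).backward()
            opt.step()

        # Steps 2-3: after warm-up, run the LRT-style test and overwrite labels.
        if epoch >= warmup_epochs:
            with torch.no_grad():
                probs = model(X).softmax(dim=1)
            rows = torch.arange(len(X))
            ratio = probs[rows, y_tilde] / probs.max(dim=1).values
            y_tilde = torch.where(ratio >= 1.0 / inv_eps, y_tilde, probs.argmax(dim=1))
            inv_eps = max(1.0, inv_eps - 0.02)  # anneal 1/eps as on Slide 23 (the floor is this sketch's assumption)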

SLIDE 23

Experiment - Setting

Data Sets:

  • MNIST (LeCun & Cortes, 2010);
  • CIFAR-10/CIFAR-100 (Krizhevsky et al., 2009);
  • ModelNet40 (Z. Wu et al., 2015)
  • Clothing1M (Xiao et al., 2015)

Baselines:

  • Forward Correction (Patrini et al., 2017)
  • Decoupling (Malach & Shalev-Shwartz, 2017)
  • Forgetting (Arpit et al., 2017)
  • Co-teaching (Han et al., 2018)
  • MentorNet (Jiang et al., 2018)
  • Abstention (Thulasidasan et al., 2019)

Backbone for every baseline:

  • Preactivation ResNet-34 (He et al., 2016) for MNIST and CIFAR-10/100
  • PointNet (Qi et al.) for ModelNet40 point clouds
  • ResNet-50 for Clothing1M

Training protocol (all methods): 180 epochs; RAdam optimizer (Liu et al., 2019); learning rate 0.001, decayed by 0.5 every 60 epochs.

Hyper-parameters for AdaCorr:

  • 30 epochs as burn-in (warm-up) period
  • Initial 1/ε set to 1.2, decreased by 0.02 every epoch


SLIDE 24

Experiment - Performance


SLIDE 25

Experiment - Performance


SLIDE 26

Experiment - Performance


SLIDE 27

Conclusion

  • We addressed the problem of training with label noise
  • We provided the first theoretical justification for data re-calibration methods
  • We proved that the noisy classifier can be used to decide whether a label is clean
  • We proposed a new theory-inspired algorithm
  • Scalable; easy to tune; good performance

Code will be available on GitHub: https://github.com/pingqingsheng/LRT

Thanks for watching