SLIDE 1

Learning with Multiple Complementary Labels

Lei Feng 1*, Takuo Kaneko 2,3*, Bo Han 3,4, Gang Niu 3, Bo An 1, Masashi Sugiyama 2,3

1 Nanyang Technological University, Singapore
2 The University of Tokyo, Tokyo, Japan
3 RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
4 Hong Kong Baptist University, Hong Kong SAR, China
* Equal Contribution

ICML 2020

SLIDE 2

Outline


⚫ Learning Frameworks
⚫ Problem Formulation
⚫ The Proposed Methods
  ❑ Wrappers
  ❑ Unbiased Risk Estimator
  ❑ Upper-Bound Surrogate Losses
⚫ Experiments
⚫ Conclusion

SLIDE 3

Learning Frameworks


⚫ Supervised Learning: (instance, true label)
⚫ Unsupervised Learning: (instance, ???)
⚫ Semi-Supervised Learning [Chapelle et al., 2006]: (instance, true label) mixed with (instance, ???)
⚫ Complementary-Label Learning [Ishida et al., 2017; 2019]: (instance, a single false label)
⚫ Learning with Multiple Complementary Labels (our paper): (instance, a set of false labels)

SLIDE 4

Data Distribution

For complementary-label (CL) learning [Ishida et al., 2017; 2019]:

$\bar{q}(\boldsymbol{x}, \bar{y}) = \frac{1}{l-1} \sum_{y \neq \bar{y}} q(\boldsymbol{x}, y).$

For learning with multiple complementary labels (MCLs):

$\bar{q}(\boldsymbol{x}, \bar{Y}) = \sum_{k=1}^{l-1} q(t=k)\, \bar{q}(\boldsymbol{x}, \bar{Y} \mid t=k),$

where

$\bar{q}(\boldsymbol{x}, \bar{Y} \mid t=k) := \begin{cases} \frac{1}{\binom{l-1}{k}} \sum_{y \notin \bar{Y}} q(\boldsymbol{x}, y) & \text{if } |\bar{Y}| = k, \\ 0 & \text{otherwise.} \end{cases}$

➢ $l$: the number of classes
➢ $q(\boldsymbol{x}, y)$: joint distribution over an instance and its true label
➢ $\bar{q}(\boldsymbol{x}, \bar{y})$: joint distribution with a single CL
➢ $\bar{q}(\boldsymbol{x}, \bar{Y})$: joint distribution with a set of MCLs
➢ $q(t=k)$: the probability that the size of the set of MCLs is $k$
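To make the generation process concrete, here is a minimal NumPy sketch of sampling a set of MCLs according to this distribution. The uniform choice of $q(t=k)$ and the name sample_mcls are our own illustrative assumptions, not part of the paper.

```python
import numpy as np

def sample_mcls(true_label, num_classes, rng):
    """Hypothetical sketch: draw a set of MCLs for one instance.
    Step 1: draw the set size t = k with probability q(t = k)
            (assumed uniform over {1, ..., l-1}, purely for illustration).
    Step 2: draw the set uniformly among all size-k subsets that
            exclude the true label, matching q-bar(x, Y-bar | t = k)."""
    sizes = np.arange(1, num_classes)          # possible sizes k = 1, ..., l-1
    k = rng.choice(sizes)                      # uniform q(t = k) assumption
    candidates = [c for c in range(num_classes) if c != true_label]
    return set(rng.choice(candidates, size=k, replace=False).tolist())

rng = np.random.default_rng(0)
print(sample_mcls(true_label=3, num_classes=10, rng=rng))  # one random MCL set
```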

SLIDE 5

Wrappers

➢ #TP: how many times the correct label serves as a non-complementary label for each instance
➢ #FP: how many times labels other than the correct label serve as non-complementary labels for each instance
➢ Supervision purity: #TP / (#TP + #FP)

Decomposing a set of MCLs into many single CLs: Decomposition after Shuffle / Decomposition before Shuffle. E.g., suppose $\bar{Y} = \{\bar{y}_1, \bar{y}_2\}$; then $(\boldsymbol{x}, \bar{Y})$ is decomposed into $(\boldsymbol{x}, \bar{y}_1)$ and $(\boldsymbol{x}, \bar{y}_2)$. Using the wrappers, we can apply any existing complementary-label learning method. However, the supervision purity is diluted by decomposition, as the purity comparison on this slide shows. A sketch of the decomposition step follows below.
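A minimal sketch of the decomposition step (function and variable names are ours):

```python
def decompose(instance, mcl_set):
    """Split one MCL example (x, Y-bar) into |Y-bar| single-CL examples
    (x, y-bar), so any off-the-shelf complementary-label method can
    consume them. This is exactly where supervision purity gets diluted:
    each copy of x now carries only one complementary label."""
    return [(instance, cl) for cl in mcl_set]

# Y-bar = {2, 5} yields the two single-CL examples (x, 2) and (x, 5).
examples = decompose(instance="x", mcl_set={2, 5})
```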

SLIDE 6

Unbiased Risk Estimator

The classification risk can be equivalently expressed as

$R(g) = \sum_{k=1}^{l-1} q(t=k)\, \bar{R}_k(g),$

where $\bar{R}_k(g) := \mathbb{E}_{\bar{q}(\boldsymbol{x}, \bar{Y} \mid t=k)}\big[\bar{\mathcal{L}}_k(g(\boldsymbol{x}), \bar{Y})\big]$, and

$\bar{\mathcal{L}}_k(g(\boldsymbol{x}), \bar{Y}) := \sum_{z \notin \bar{Y}} \mathcal{L}(g(\boldsymbol{x}), z) - \frac{l-1-k}{k} \sum_{z' \in \bar{Y}} \mathcal{L}(g(\boldsymbol{x}), z').$

➢ $R(g)$: the classification risk, defined as $\mathbb{E}_{q(\boldsymbol{x}, y)}[\mathcal{L}(g(\boldsymbol{x}), y)]$
➢ $\mathcal{L}(g(\boldsymbol{x}), z)$: a multi-class loss function

Each set of MCLs is taken as a whole!
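A minimal PyTorch sketch of the corrected loss $\bar{\mathcal{L}}_k$ above, assuming every example in the batch has the same set size $k$; names such as mcl_unbiased_loss are ours.

```python
import torch
import torch.nn.functional as F

def mcl_unbiased_loss(logits, cl_mask, base_loss):
    """Corrected loss L-bar_k from this slide (illustrative sketch).
    logits:    (batch, l) model outputs g(x).
    cl_mask:   (batch, l) bool, True where a class is in the MCL set;
               assumed to contain the same number k of CLs in every row.
    base_loss: base_loss(logits, z) -> (batch,) per-example loss L(g(x), z)."""
    n, l = logits.shape
    k = int(cl_mask[0].sum())
    total = torch.zeros(n)
    for z in range(l):
        per_class = base_loss(logits, torch.full((n,), z, dtype=torch.long))
        in_set = cl_mask[:, z].float()
        # +L(g(x), z) if z not in the set; -(l-1-k)/k * L(g(x), z') if z' is in it
        total += (1.0 - in_set) * per_class - in_set * (l - 1 - k) / k * per_class
    return total.mean()

# Example base loss: per-example cross-entropy (CCE).
cce = lambda lg, t: F.cross_entropy(lg, t, reduction="none")
```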

SLIDE 7

Practical Implementation

Observation: the empirical risk estimator can be unbounded from below when the loss function is unbounded (the negative term in $\bar{\mathcal{L}}_k$ can grow without limit), which leads to over-fitting.
Conjecture: bounded losses behave better than unbounded ones.
Results: our experiments validate that MAE, MSE, GCE [Zhang & Sabuncu, 2018], and PHuber-CE [Menon et al., 2020] outperform (unbounded) CCE. The toy computation below illustrates the unboundedness.
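A toy computation (our own numbers): with $l = 4$ classes and the single CL $\bar{Y} = \{1\}$, the corrected CCE risk $\mathcal{L}(g(\boldsymbol{x}), 0) + \mathcal{L}(g(\boldsymbol{x}), 2) + \mathcal{L}(g(\boldsymbol{x}), 3) - 2\,\mathcal{L}(g(\boldsymbol{x}), 1)$ can be driven arbitrarily negative.

```python
import torch
import torch.nn.functional as F

# l = 4, k = 1, complementary set {1}: corrected loss is
# CE(0) + CE(2) + CE(3) - 2 * CE(1). Pushing the predicted probability of
# class 1 toward 0 makes CE(1) explode, so the (unbounded) corrected CCE
# risk diverges to -inf, which is exactly the over-fitting risk above.
for scale in (1.0, 10.0, 100.0):
    logits = torch.tensor([[0.0, -scale, 0.0, 0.0]])
    ce = lambda z: F.cross_entropy(logits, torch.tensor([z])).item()
    print(f"scale={scale:6.1f} -> {ce(0) + ce(2) + ce(3) - 2 * ce(1):9.2f}")
# Bounded losses such as MAE cannot diverge this way, supporting the conjecture.
```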

SLIDE 8

Is Bounded Loss Good Enough?

Is the performance of the unbiased risk estimator with a bounded loss good enough? Take MAE as an example: inserting MAE into the empirical risk estimator yields the equivalent formulation

$\mathcal{L}_{\mathrm{MAE}}(g(\boldsymbol{x}_j), \bar{Y}_j) = 1 - \sum_{k \notin \bar{Y}_j} q_{\theta}(k \mid \boldsymbol{x}_j),$

whose gradient is

$\frac{\partial \mathcal{L}_{\mathrm{MAE}}}{\partial \theta} = \begin{cases} -\nabla_{\theta}\, q_{\theta}(k \mid \boldsymbol{x}_j) \cdot 1 & \text{if } k \notin \bar{Y}_j, \\ 0 & \text{otherwise.} \end{cases}$

Here $q_{\theta}(k \mid \boldsymbol{x}_j)$ denotes the predicted probability of class $k$ for instance $\boldsymbol{x}_j$, with model parameters $\theta$.

Each example is treated as equally important for optimization: the gradient weight is the constant 1 (see the sketch below).
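A one-function PyTorch sketch of this MAE formulation (names are ours); autograd reproduces the constant per-example gradient weight of 1.

```python
import torch

def mcl_mae_loss(logits, cl_mask):
    """MAE under MCLs, as on this slide: 1 minus the total predicted
    probability of the non-complementary labels. Its per-example
    gradient weight is the constant 1."""
    probs = torch.softmax(logits, dim=1)
    s = (probs * (~cl_mask).float()).sum(dim=1)  # sum of q_theta(k | x_j) over k outside the set
    return (1.0 - s).mean()
```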

SLIDE 9

Upper-Bound Surrogate Losses

We propose the following upper-bound surrogate losses:

$\mathcal{L}_{\mathrm{EXP}}(g(\boldsymbol{x}_j), \bar{Y}_j) = \exp\Big(-\sum_{k \notin \bar{Y}_j} q_{\theta}(k \mid \boldsymbol{x}_j)\Big),$

$\mathcal{L}_{\mathrm{LOG}}(g(\boldsymbol{x}_j), \bar{Y}_j) = -\log\Big(\sum_{k \notin \bar{Y}_j} q_{\theta}(k \mid \boldsymbol{x}_j)\Big).$

Their gradients can be expressed as

$\frac{\partial \mathcal{L}_{\mathrm{EXP}}}{\partial \theta} = \begin{cases} -\nabla_{\theta}\, q_{\theta}(k \mid \boldsymbol{x}_j) \cdot w_{\mathrm{EXP}} & \text{if } k \notin \bar{Y}_j, \\ 0 & \text{otherwise,} \end{cases}$

$\frac{\partial \mathcal{L}_{\mathrm{LOG}}}{\partial \theta} = \begin{cases} -\nabla_{\theta}\, q_{\theta}(k \mid \boldsymbol{x}_j) \cdot w_{\mathrm{LOG}} & \text{if } k \notin \bar{Y}_j, \\ 0 & \text{otherwise,} \end{cases}$

where $w_{\mathrm{EXP}} = \exp\big(-\sum_{k \notin \bar{Y}_j} q_{\theta}(k \mid \boldsymbol{x}_j)\big)$ and $w_{\mathrm{LOG}} = \big(\sum_{k \notin \bar{Y}_j} q_{\theta}(k \mid \boldsymbol{x}_j)\big)^{-1}$.

Higher weights are given to hard examples, i.e., those with a small $\sum_{k \notin \bar{Y}_j} q_{\theta}(k \mid \boldsymbol{x}_j)$! A sketch of both losses follows below.
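A matching PyTorch sketch of the two surrogates (the eps guard is our addition to keep the log finite):

```python
import torch

def mcl_exp_loss(logits, cl_mask):
    """EXP surrogate: exp(-S_j), where S_j is the total predicted
    probability of the non-complementary labels; its gradient weight
    exp(-S_j) grows as S_j shrinks, up-weighting hard examples."""
    s = (torch.softmax(logits, dim=1) * (~cl_mask).float()).sum(dim=1)
    return torch.exp(-s).mean()

def mcl_log_loss(logits, cl_mask, eps=1e-12):
    """LOG surrogate: -log(S_j); gradient weight 1/S_j, again larger
    for hard examples (small S_j)."""
    s = (torch.softmax(logits, dim=1) * (~cl_mask).float()).sum(dim=1)
    return -torch.log(s + eps).mean()
```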

SLIDE 10

Experiments

Benchmark datasets: MNIST, Kuzushiji-MNIST, Fashion-MNIST, CIFAR-10.

UCI datasets: Yeast, Texture, Dermatology, Synthetic Control, 20Newsgroups.

Compared methods: GA, NN, and Free [Ishida et al., 2019]; PC [Ishida et al., 2017]; Forward [Yu et al., 2018]; CLPL [Cour et al., 2011]; our unbiased risk estimator with the bounded losses MAE, MSE, GCE [Zhang & Sabuncu, 2018], and PHuber-CE [Menon et al., 2020] and with the unbounded loss CCE; and the two upper-bound surrogate losses EXP and LOG. Extensive experimental results clearly demonstrate the effectiveness of our proposed methods.

SLIDE 11

Conclusion

❑ A novel problem setting that generalizes learning with a single CL to learning with MCLs.
❑ Solutions including the wrappers and an unbiased risk estimator.
❑ Upper-bound surrogate losses.


Thank you!