Robust Deep Learning Based on Meta-learning Deyu Meng Xian Jiaotong - PowerPoint PPT Presentation

Robust Deep Learning Based on Meta-learning Deyu Meng Xi’an Jiaotong University dymeng@mail.xjtu.edu.cn http://gr.xjtu.edu.cn/web/dymeng

Deep Learning • Robust • Meta-learning •

The Success of Deep Learning Relies on well-annotated & big data sets LFW

What we think we have: But what we really have is always:

Commonly Encountered Data Bias (low quality data) Class imbalance Data noise Label noise

Robust Machine Learning for Data Bias Design specific optimization objective (especially, robust loss) to make it robust to certain data bias: Class imbalance Data noise Label noise Meng, et al., Information Sciences, 2017 Lin, et al., TPAMI, 2018 Yong, et al., TPAMI, 2018

Two Critical Issues Hyperparameter Tunning Generalized Cross Entropy Zhang, et al., NeurIPS, 2018 Symmetric Cross Entropy Wang, et al., ICCV, 2019 Bi-Tempered logistic Loss Amid, et al., NeurIPS, 2019 Polynomial SoftWeighting loss Non-convexity Zhao, et al., AAAI, 2015 Focal loss Lin, et al., TPAMI, 2018 CT loss Xie, et al., TMI, 2018

Training Data VS Validation Data Hyper-parameter tuning: by validation data Training loss Validation loss 𝑁 1 𝑛 (𝒙 ∗ (Θ)) ≈ argmin 𝑁 ෍ 𝑀 𝑗 Θ∈{Θ 1 ,Θ 2 ,⋯,Θ 𝑡 } 𝑗=1

Training Data VS Validation Data ✓ Low efficiency ✓ Low accuracy ✓ Search instead of optimization Hyper-parameter tuning: by validation data ✓ Heuristic instead of intelligent Training loss Validation loss 𝑁 1 𝑛 (𝒙 ∗ (Θ)) ≈ argmin 𝑁 ෍ 𝑀 𝑗 Θ∈{Θ 1 ,Θ 2 ,⋯,Θ 𝑡 } 𝑗=1

Intrinsic Functions of Validation Data • The function of validation data is higher than training data ➢ Hyper-parameter tuning VS classifier parameter learning ➢ Make the model adaptable to data fit (general to specific) • Validation data is different from training data! ➢ Teacher vs. student ➢ Ideal vs. real ➢ High quality vs. low quality ➢ Small scale vs. large scale ➢ Fixed vs. dynamic (relatively) • What we should do? ➢ Lower the threshold of training data collection; higher the threshold of validation data selection

From Validation Loss Searching to Meta Loss Training Hyper-parameter tuning: by meta data Training loss Meta loss 𝑁 1 𝑛 (𝒙 ∗ (Θ)) = argmin 𝑁 ෍ 𝑀 𝑗 ✓ Optimization instead of search Θ∈𝒣 𝑗=1 ✓ Intelligent instead of heuristic (partially)

Many Recent Attempts ◆ Loss function. Wu L, Tian F, Xia Y, et al. Learning to teach with dynamic loss functions. In NeurIPS, 2018: 6466-6477. Huang C, Zhai S, Talbott W, et al. Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment. In ICML, 2019: 2891-2900. Xu H, Zhang H, Hu Z, et al. AutoLoss: Learning Discrete Schedule for Alternate Optimization. In ICLR, 2019. Li C, Yuan X, Lin C, et al. AM-LFS: AutoML for Loss Function Search. In ICCV, 2019: 8410-8419. Grabocka J, Scholz R, Schmidt-Thieme L. Learning Surrogate Losses[J]. arXiv preprint arXiv:1905.10108, 2019. ◆ Regularization. Feng J, Simon N. Gradient-based regularization parameter selection for problems with nonsmooth penalty functions[J]. Journal of Computational and Graphical Statistics, 2018, 27(2): 426-435. Frecon J, Salzo S, Pontil M. Bilevel learning of the group lasso structure. In NeurIPS 2018: 8301-8311. Streeter M. Learning Optimal Linear Regularizers. In ICML. 2019: 5996-6004. ◆ learner (NAS). Zoph B, Le Q V. Neural architecture search with reinforcement learning. In ICLR, 2017. Baker B, Gupta O, Naik N, et al. Designing neural network architectures using reinforcement learning. In ICLR, 2017. Pham H, Guan M, Zoph B, et al. Efficient Neural Architecture Search via Parameter Sharing. ICML. 2018: 4092-4101. Zoph B, Vasudevan V, Shlens J, et al. Learning transferable architectures for scalable image recognition. In CVPR, 2018: 8697-8710. Liu H, Simonyan K, Yang Y. Darts: Differentiable architecture search. In ICLR, 2019. Xie S, Zheng H, Liu C, et al. SNAS: stochastic neural architecture search. In ICLR, 2019. Liu C, Zoph B, Neumann M, et al. Progressive neural architecture search. In ECCV, 2018: 19-34.

Many Recent Attempts ◆ Hyper-parameters learning. Maclaurin D, Duvenaud D, Adams R. Gradient-based hyperparameter optimization through reversible learning. In ICML, 2015: 2113-2122. Pedregosa F. Hyperparameter optimization with approximate gradient. In ICML, 2016: 737-746. Luketina J, Berglund M, Greff K, et al. Scalable gradient-based tuning of continuous regularization hyperparameters. In ICML. 2016: 2952-2960. Franceschi L, Donini M, Frasconi P, et al. Forward and reverse gradient-based hyperparameter optimization. In ICML, 2017: 1165-1173. Franceschi L, Frasconi P, Salzo S, et al. Bilevel Programming for Hyperparameter Optimization and Meta-Learning. In ICML, 2018: 1563-1572. ◆ Gradients and learning rate. Andrychowicz M, Denil M, Gomez S, et al. Learning to learn by gradient descent by gradient descent. In NeurIPS, 2016. Baydin A G, Cornish R, Rubio D M, et al. Online learning rate adaptation with hypergradient descent. In ICLR, 2018. Jacobsen A, Schlegel M, Linke C, et al. Meta-descent for Online, Continual Prediction. In AAAI. 2019. Metz L,, et al. Understanding and correcting pathologies in the training of learned optimizers. In ICML,2019:4556-4565. Xu Z, Dai A M, Kemp J, et al. Learning an Adaptive Learning Rate Schedule. arXiv preprint arXiv:1909.09712, 2019. ◆ Sample reweighing. Jiang L, Zhou Z, Leung T, et al. MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels. In ICML, 2018: 2309-2318. Ren M, Zeng W, Yang B, et al. Learning to Reweight Examples for Robust Deep Learning. In ICML, 2018: 4331-4340. Shu J, Xie Q, Yi L, et al. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting. In NeurIPS, 2019. Zhao S, Fard M M, Narasimhan H, et al. Metric-Optimized Example Weights. In ICML 2019: 7533-7542.

Adaptively Learning the Robust Loss Generalized Cross Entropy Zhang, et al., NeurIPS, 2018 Symmetric Cross Entropy Wang, et al., ICCV, 2019 Bi-Tempered logistic Loss Amid, et al., NeurIPS, 2019 Polynomial SoftWeighting loss Zhao, et al., AAAI, 2015

Hyperparameter Learning by Meta Learning Training loss Meta loss Shu, et al., submitted, 2019

Experimental Results Shu, et al., submitted, 2019

Experimental Results ✓ The hyper-parameter adaptively learned by meta-learning actually not the optimal one for the original loss, with fixed hyper-parameter throughout its iteration. ✓ Meta learning adaptively finds a proper hyper-parameter and simultaneously explores a good initialization network parameter under its current hyper-parameter in a dynamical way. ✓ Such adaptive learning manner should be more suitable for simultaneously obtain optimal values for both of them rather than only updating one under the other fixed. Shu, et al., submitted, 2019

When Model Contains Large Amount of Hyperparameters? Training loss Meta loss ➢ Overfitting issue easily occurs (similar to conventional machine learning) ➢ How to alleviate this issue? ➢ Build parametric prior representation (neither too large nor too small) for hyperparameters (similar to conventional machine learning) ➢ Learner VS meta-learner ➢ Need to deeply understand the data as well as the learning problem! ✓ Multi-view learning, multi-task learning (parameter - similar) ✓ Subspace learning (matrix – low rank)

When Model Contains Large Amount of Hyperparameters?

Deep Learning with Training Data Bias Problem: big data often come with noisy labels or class imbalance.

Deep Networks tend to overfit to Training Data! Zhang et al. (2017) found that: Deep neural networks easily fit(memorizing) random labels. Zhang C, Bengio S, Hardt M, et al. Understanding deep learning requires rethinking generalization. ICLR 2017. best paper

How to robustly train deep networks on training data bias to improve the generalization performance?

Related work: Learning with Training Data Bias ◆ Sample weighting methods ✓ dataset resampling(Chawla et al., 2002) ✓ instance re-weight (Zadrozny, 2004) ✓ AdaBoost method (Freund & Schapire, 1997) ✓ Hard example mining (Malisiewicz et al., 2011) ✓ focal loss (Lin et al., 2018) ✓ self-paced learning (Kumar et al., 2010) ✓ Iterative reweighting strategy (Fernando & Mkchael, 2003; Zhang & Sabuncu, 2018) ✓ prediction variance method (Chang et al., 2017) ◆ Meta learning methods ✓ FWL (Dehghani et al.,2018) ✓ learning to teach (Fan et al., 2018; Wu et al., 2018) ✓ MentorNet (Jiang et al., 2018) ✓ L2RW (Ren et al., 2018) ◆ Other methods ✓ GLC (Hendrycks et al., 2018) ✓ Reed (Reed et al., 2015) ✓ Co-teaching (Han et al., 2018) ✓ D2L (Ma et al.,2018) ✓ S-Model (Goldberger & Ben-Reuven, 2017)

Robust Deep Learning Based on Meta-learning Deyu Meng Xian Jiaotong - PowerPoint PPT Presentation

Robust Deep Learning Based on Meta-learning Deyu Meng Xian Jiaotong University dymeng@mail.xjtu.edu.cn http://gr.xjtu.edu.cn/web/dymeng Deep Learning Robust Meta-learning The Success of Deep Learning Relies on

Meta- Meta -Programming with Programming with Modelica Modelica for Meta- for Meta

Outlier Outlier Outlier- Outlier - -robust - robust robust robust identification

META Seal of Recognition and META Prize Award Ceremony Georg Rehm (DFKI) on behalf of the

Short Course in Supervised Learning Robust Optimization and Machine Learning Robust Supervised

Bayesian Model-Agnostic Meta-Learning Taesup Kim* (presenter), Jaesik Yoon* Ousmane Dia,

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

Robust Evidence Synthesis Ullrika Sahlin Thursday 9.00-12.30 1 Evidence-based Meta-analysis

Meta Learning Shengchao Liu Background Meta Learning (AKA Learning to Learn) A

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Intelligent Tutoring Systems: A Meta-Analysis Meta-Analysis Wenting Ma March, 2011

Company profile Capabilities Customers & References META-LRA Kft. 8400 Ajka,

Individual Participant Data (IPD) Reviews and Meta analyses Lesley Stewart Director, CRD Larysa

Lecture 31/Chapter 25 More about Meta-Analysis Benefits and Pitfalls An Application:

Simultaneous meta and data manipulation in Blaise Marien Lina Statistics netherlands Statistics

treatment in geophysical data assimilation - Some ideas Alberto

Lecture on Multicores Darius Sidlauskas Post-doc Darius Sidlauskas, 25/02-2014 1/21 Outline

Access to health and how to do it inclusively Prof. Francesco Castelli University of Brescia

Practice Based Research in the Community Pharmacy Stephanie A. Gernant, PharmD MS. Assistant

Consistency Management Anthony Finkelstein University College London Department of Computer

4G Wireless Networks 4G Wireless Networks Need for Improved Loss Tolerance K. K. Ramakrishnan

Introduction to Software Architecture The top level.... (and design revisited) 1 System Software

1 <Insert Picture Here> C++, Java and .NET: Lessons learned from the Internet Age, and

Robust Deep Learning Based on Meta-learning Deyu Meng Xian Jiaotong - PowerPoint PPT Presentation

Robust Deep Learning Based on Meta-learning Deyu Meng Xian Jiaotong University dymeng@mail.xjtu.edu.cn http://gr.xjtu.edu.cn/web/dymeng Deep Learning Robust Meta-learning The Success of Deep Learning Relies on

Meta- Meta -Programming with Programming with Modelica Modelica for Meta- for Meta

Outlier Outlier Outlier- Outlier - -robust - robust robust robust identification

META Seal of Recognition and META Prize Award Ceremony Georg Rehm (DFKI) on behalf of the

Short Course in Supervised Learning Robust Optimization and Machine Learning Robust Supervised

Bayesian Model-Agnostic Meta-Learning Taesup Kim* (presenter), Jaesik Yoon* Ousmane Dia,

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

Robust Evidence Synthesis Ullrika Sahlin Thursday 9.00-12.30 1 Evidence-based Meta-analysis

Meta Learning Shengchao Liu Background Meta Learning (AKA Learning to Learn) A

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Intelligent Tutoring Systems: A Meta-Analysis Meta-Analysis Wenting Ma March, 2011

Company profile Capabilities Customers &amp; References META-LRA Kft. 8400 Ajka,

Individual Participant Data (IPD) Reviews and Meta analyses Lesley Stewart Director, CRD Larysa

Lecture 31/Chapter 25 More about Meta-Analysis Benefits and Pitfalls An Application:

Simultaneous meta and data manipulation in Blaise Marien Lina Statistics netherlands Statistics

treatment in geophysical data assimilation - Some ideas Alberto

Lecture on Multicores Darius Sidlauskas Post-doc Darius Sidlauskas, 25/02-2014 1/21 Outline

Access to health and how to do it inclusively Prof. Francesco Castelli University of Brescia

Practice Based Research in the Community Pharmacy Stephanie A. Gernant, PharmD MS. Assistant

Consistency Management Anthony Finkelstein University College London Department of Computer

4G Wireless Networks 4G Wireless Networks Need for Improved Loss Tolerance K. K. Ramakrishnan

Introduction to Software Architecture The top level.... (and design revisited) 1 System Software

1 &lt;Insert Picture Here&gt; C++, Java and .NET: Lessons learned from the Internet Age, and

Company profile Capabilities Customers & References META-LRA Kft. 8400 Ajka,

1 <Insert Picture Here> C++, Java and .NET: Lessons learned from the Internet Age, and