AutoML in the Full Life Cycle of the Deep Learning Assembly Line
Junjie Yan, SenseTime Group Limited. 2019/10/09. Works by the AutoML Group @ SenseTime Research.
A Brief History of Axiomatic Systems

Why AutoML
Moore's Law vs. the Flynn Effect
The deep learning assembly line and its AutoML counterparts: Data Set → Data Augmentation → Model / Network Architecture → Optimization / Loss Function. Data augmentation is automated by AutoAugment, the architecture by NAS, and the loss by Loss Function Search; the remaining stages are still marked "?".
Online Hyper-parameter Learning for Auto-Augmentation Strategy. Lin, Chen, Minghao Guo, Chuming Li, Wei Wu, Dahua Lin, Wanli Ouyang, and Junjie Yan. ICCV 2019.
AutoAugment searches augmentation policies on a proxy task, using a reduced dataset and a predefined CNN.
Hyper-parameters of the training strategy are known to be deeply coupled with the specific dataset and the underlying network architecture, so a policy searched on a proxy task may not transfer to the target setting.

Search methods such as evolution and Bayesian optimization are computationally expensive and implausible to apply to industrial-scale datasets, since each trial requires a full training run.

OHL-AutoAug instead formulates augmentation policy search as online hyper-parameter learning: it learns the hyper-parameter within only a single run, training the model at the same time. The reward of a sampled policy is its performance on the validation set.
Online optimization loop: start from an initial model and an initial distribution q_0(θ) over hyper-parameters; sample θ_1, ..., θ_n and distribute them to n parallel trainings, obtaining scores S_1, ..., S_n; update the distribution to q_1(θ) using the scores as rewards, broadcast the model with the highest reward, and repeat.
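A minimal Python sketch of this loop; `train_one_stage(weights, theta)` and `validate(weights)` are hypothetical stand-ins for parallel training and validation-set evaluation, and the REINFORCE-style update is illustrative rather than the paper's exact estimator.

```python
import numpy as np

def ohl_autoaug(init_weights, train_one_stage, validate,
                num_policies, num_workers=8, num_stages=20, lr=0.1):
    """Minimal sketch of the online hyper-parameter learning loop."""
    logits = np.zeros(num_policies)          # q_0: uniform over policies
    weights = init_weights
    for stage in range(num_stages):
        probs = np.exp(logits) / np.exp(logits).sum()
        thetas, models, rewards = [], [], []
        for _ in range(num_workers):         # distribute theta_1 .. theta_n
            theta = np.random.choice(num_policies, p=probs)
            w = train_one_stage(weights, theta)
            thetas.append(theta)
            models.append(w)
            rewards.append(validate(w))      # reward = validation accuracy
        baseline = np.mean(rewards)          # simple variance reduction
        for theta, r in zip(thetas, rewards):
            grad = -probs.copy()             # d log softmax / d logits
            grad[theta] += 1.0
            logits += lr * (r - baseline) * grad
        weights = models[int(np.argmax(rewards))]  # broadcast best model
    return weights, logits
```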
The search space follows AutoAugment, with minor modification: the augmentation policy applied at each training iteration is a random variable drawn from a parameterized augmentation distribution, which is optimized jointly with the network at the same time.
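The distribution parameters are presumably updated with the standard score-function (REINFORCE) estimator, using validation accuracy as the reward R; writing φ for the distribution parameters, a sketch of the gradient is:

$$\nabla_{\phi} J(\phi) = \mathbb{E}_{\theta \sim q_{\phi}}\!\left[R(\theta)\,\nabla_{\phi}\log q_{\phi}(\theta)\right] \approx \frac{1}{n}\sum_{i=1}^{n} R(\theta_i)\,\nabla_{\phi}\log q_{\phi}(\theta_i)$$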
CIFAR-10 top-1 error (%):

Method        ResNet18   WRN-28   DPN-92   AmoebaNet-B
Baseline      4.66       3.87     4.55     3.40
Cutout        3.62       3.08     3.71     2.90
AutoAugment   3.46       2.68     3.16     1.75
OHL-AutoAug   3.29       2.61     2.75     1.89
ImageNet top-1 error (%):

Method        ResNet50   SE-ResNet101
Baseline      24.70      20.07
AutoAugment   22.37      20.03
OHL-AutoAug   21.07      19.30
Search cost comparison (share of total compute): on ImageNet, AutoAugment 96% vs. OHL-AutoAug 4%; on CIFAR-10, AutoAugment 98% vs. OHL-AutoAug 2%.
Back to the assembly line: Network Architecture → NAS.
A timeline of NAS methods (Nov 2016, May 2017, Dec 2017, July 2018, Feb 2019, Sep 2019): Neural Architecture Search with Reinforcement Learning; Regularized Evolution for Image Classifier Architecture Search; Efficient Neural Architecture Search via Parameter Sharing; DARTS: Differentiable Architecture Search; ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware; Single Path One-Shot Neural Architecture Search with Uniform Sampling; BlockQNN: Efficient Block-wise Neural Network Architecture Generation; IRLAS: Inverse Reinforcement Learning for Architecture Search; MBNAS: Multi-branch Neural Architecture Search (preprint).
Improving One-Shot NAS By Suppressing The Posterior Fading. Xiang Li*, Chen Lin*, Chuming Li, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang. Preprint.
One-shot NAS adopts a weight-sharing approach: all candidate architectures in the search space share a single set of parameters during training. However, weight sharing has been shown to hurt the supernet's ability to rank models.*
*Christian Sciuto (Swisscom), Kaicheng Yu, Martin Jaggi, and Mathieu Salzmann. "Evaluating the Search Phase of Neural Architecture Search." https://arxiv.org/pdf/1902.08142.pdf
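To make weight sharing concrete, here is a minimal single-path supernet sketch in PyTorch; the layer count and candidate operators are invented for illustration, not the paper's search space.

```python
import random
import torch
import torch.nn as nn

class SuperLayer(nn.Module):
    """One supernet layer: each candidate op owns weights, but every
    sampled architecture reuses the same ops across training steps."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])

    def forward(self, x, op_idx):
        return self.ops[op_idx](x)

class Supernet(nn.Module):
    def __init__(self, channels=16, depth=4):
        super().__init__()
        self.layers = nn.ModuleList([SuperLayer(channels) for _ in range(depth)])

    def forward(self, x, arch):
        for layer, op_idx in zip(self.layers, arch):
            x = layer(x, op_idx)
        return x

net = Supernet()
# uniform single-path sampling: one architecture per training step
arch = [random.randrange(3) for _ in net.layers]   # 3 candidate ops per layer
out = net(torch.randn(2, 16, 8, 8), arch)
```

Every sampled path updates the same shared parameters, and this coupling is exactly what degrades the supernet's ranking of models.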
Consider the posterior over the weights of a single operator (operator o at the l-th layer), either trained alone or trained with shared weights under certain independence assumptions.
The shared-weight posterior drifts away from the stand-alone one by a sum of cross-entropy terms over the models sharing the weights (Posterior Fading). Reducing the number of models that share the weights could reduce this misalignment.
In PC-NAS, the candidate choices of the earlier layers are therefore reduced to a fixed set (the partial model pool) when models are sampled for training.
Training proceeds in stages. At stage i, architectures are uniformly sampled, with the earlier i layers sampled from the partial model pool. Between stages, the pool is updated by expanding its partial models by one layer and selecting the top-K partial models (a code sketch follows the scoring rule below).
A partial model is scored by repeatedly completing it, uniformly sampling the unspecified layers, and averaging the resulting accuracies; any completed architecture with unsatisfied latency is removed from the average.
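A hedged Python sketch of the pool update and scoring just described; `accuracy(arch)` and `latency(arch)` are hypothetical stand-ins for supernet evaluation and on-device measurement.

```python
import random

def score_partial(partial, depth, num_ops, accuracy, latency,
                  max_latency_ms=10.0, num_samples=8):
    """Estimate a partial model's score: complete the unspecified layers
    uniformly at random, drop completions violating the latency budget,
    and average the remaining accuracies."""
    accs = []
    for _ in range(num_samples):
        arch = list(partial) + [random.randrange(num_ops)
                                for _ in range(depth - len(partial))]
        if latency(arch) <= max_latency_ms:
            accs.append(accuracy(arch))
    return sum(accs) / len(accs) if accs else float("-inf")

def expand_pool(pool, depth, num_ops, accuracy, latency, top_k=20):
    """Expand every partial model in the pool by one layer, keep top-K."""
    candidates = [p + [op] for p in pool for op in range(num_ops)]
    candidates.sort(key=lambda p: score_partial(p, depth, num_ops,
                                                accuracy, latency),
                    reverse=True)
    return candidates[:top_k]

# stage-wise search, starting from the empty partial model:
# pool = [[]]
# for stage in range(depth):
#     ... train the supernet, sampling the first layers from `pool` ...
#     pool = expand_pool(pool, depth, num_ops, accuracy, latency)
```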
Models are searched under a 10 ms latency constraint.
Latency vs. error on ImageNet (latency 5 to 30 ms against top-1 error 21.5 to 26%): AmoebaNet-A, PNASNet, MNASNet, ProxylessGPU, EfficientNet-B0, MixNet-S, PC-NAS-S, PC-NAS-L.
Ablation: searching with vs. without the partial model pool; top models among the final candidates are selected in both cases.
Feng Liang, Ronghao Guo, Chen Lin, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang. Preprint.
Previous work "DetNAS: Backbone Search for Object Detection" uses a fixed allocation, which is common in NAS for classification.
Adapting receptive fields has proven valuable for detection, as in Dai et al. 2017 and Zhu et al. 2019.
Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. "Deformable Convolutional Networks." ICCV 2017.
Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai. "Deformable ConvNets v2: More Deformable, Better Results." CVPR 2019.
The allocation of computation is a determining factor of effective receptive fields, and thus crucial to detection. Searching the allocation directly on detection tasks improves the backbone, and the result transfers to improve the performance of various networks.
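To make the receptive-field argument concrete, here is a small sketch computing the theoretical receptive field of a conv stack under two dilation allocations of identical cost; the layer configurations are illustrative, not the searched ones.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of conv layers.
    Each layer is (kernel_size, stride, dilation)."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1          # dilation enlarges the kernel
        rf += (k_eff - 1) * jump
        jump *= s
    return rf

# same FLOPs, different dilation allocation -> very different RF
uniform  = [(3, 1, 1)] * 8
back_dil = [(3, 1, 1)] * 4 + [(3, 1, 2)] * 4
print(receptive_field(uniform))   # 17
print(receptive_field(back_dil))  # 25
```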
Back to the assembly line: Loss Function → Loss Function Search.
AM-LFS: AutoML for Loss Function Search. Li, Chuming, Chen Lin, Minghao Guo, Wei Wu, Wanli Ouyang, and Junjie Yan. ICCV 2019.
The loss function plays an important role in visual analysis tasks. Existing designs require domain experts to explore the large design space, which is usually suboptimal and time-consuming. AM-LFS searches the loss function automatically, within the same online hyper-parameter learning framework.
Key observation: the loss functions of these tasks can be approximated within a simple function space.
Margin-based softmax losses differ only in the transform t(x) applied to the target logit:

Loss Function   t(x)
SphereFace      cos(m * arccos(x))
CosFace         x - m
ArcFace         cos(arccos(x) + m)

$$L_i = -\log\frac{e^{\|W_{y_i}\|\|x_i\|\,t(\cos\theta_{y_i})}}{e^{\|W_{y_i}\|\|x_i\|\,t(\cos\theta_{y_i})}+\sum_{k\neq y_i}e^{\|W_k\|\|x_i\|\cos\theta_k}}$$
Focal loss instead applies a transform τ(x) to the predicted probability:

Loss Function   τ(x)
Focal Loss      x(1 - x)^m

$$L_i = -\log\,\tau\!\left(\frac{e^{\|W_{y_i}\|\|x_i\|\cos\theta_{y_i}}}{e^{\|W_{y_i}\|\|x_i\|\cos\theta_{y_i}}+\sum_{k\neq y_i}e^{\|W_k\|\|x_i\|\cos\theta_k}}\right)$$
Search space of t and τ: piecewise linear functions over section-1, section-2, and section-3, with each section's slope and intercept sampled from the hyper-parameter distribution.
Unified formulation, combining both transforms:

$$L_i = -\log\,\tau\!\left(\frac{e^{\|W_{y_i}\|\|x_i\|\,t(\cos\theta_{y_i})}}{e^{\|W_{y_i}\|\|x_i\|\,t(\cos\theta_{y_i})}+\sum_{k\neq y_i}e^{\|W_k\|\|x_i\|\cos\theta_k}}\right)$$
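A hedged PyTorch sketch of this unified loss with piecewise-linear t and τ; the parameterization (slopes, intercepts, interval bounds) and the assumption of normalized class weights are illustrative choices, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def piecewise_linear(x, slopes, intercepts, bounds):
    """Piecewise linear transform: on [bounds[i], bounds[i+1]) the
    output is slopes[i] * x + intercepts[i]."""
    y = torch.zeros_like(x)
    for i in range(len(slopes)):
        mask = (x >= bounds[i]) & (x < bounds[i + 1])
        y = torch.where(mask, slopes[i] * x + intercepts[i], y)
    return y

def unified_loss(cos_theta, feat_norm, labels, t_params, tau_params):
    """t reshapes the target logit, tau reshapes the target probability.
    Assumes normalized class weights, so logit_k = ||x_i|| * cos(theta_k)."""
    idx = labels.unsqueeze(1)
    target = piecewise_linear(cos_theta.gather(1, idx), *t_params)
    logits = feat_norm.unsqueeze(1) * cos_theta.scatter(1, idx, target)
    prob = F.softmax(logits, dim=1).gather(1, idx)
    # assumes the sampled tau keeps probabilities positive
    return -torch.log(piecewise_linear(prob, *tau_params)).mean()

# identity transforms recover plain softmax cross-entropy:
# t_params   = ([1.0], [0.0], [-1.0, 1.01])   # one section over the cos range
# tau_params = ([1.0], [0.0], [0.0, 1.01])    # one section over [0, 1]
```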
Person re-identification results (mAP / top-1 accuracy, %):

Method          mAP           Top-1 Acc
SFT             73.2          86.9
MGN             78.4          88.7
MGN(RK)         88.6          90.9
SFT + ours      73.8 (+0.6)   87.0
MGN + ours      80.0 (+1.6)   89.9
MGN(RK) + ours  90.1 (+1.5)   92.4
Classification accuracy (%) under label noise:

Noise ratio   Baseline   Ours
0%            91.2       93.1
10%           87.9       89.9
20%           84.9       87.3