SLIDE 1

AutoML for TinyML with Once-for-All Network

Song Han, Massachusetts Institute of Technology

Once-for-All, ICLR’20

SLIDE 2

AutoML for TinyML with Once-for-All Network

  • Fewer engineering resources: AutoML. Replace many engineers designing a large model by hand with automated design that needs fewer engineers.
  • Fewer computational resources: TinyML. Replace a large model and a lot of computation with a small model and less computation.

SLIDE 3

Challenge: Efficient Inference on Diverse Hardware Platforms

  • Cloud AI: Memory 32 GB; Computation: TFLOPS
  • Mobile AI: Memory 4 GB; Computation: GFLOPS
  • Tiny AI (AIoT): Memory <100 KB; Computation: <MFLOPS

  • Different hardware platforms have different resource constraints. We need to customize our models for each platform to achieve the best accuracy-efficiency trade-off, especially on resource-constrained edge devices.

SLIDE 4

Challenge: Efficient Inference on Diverse Hardware Platforms

Design cost: ~200 GPU hours to train one hand-designed model.

    for training iterations: forward-backward();

The design cost is calculated under the assumption of using MobileNet-v2.

SLIDE 5

Challenge: Efficient Inference on Diverse Hardware Platforms

Design cost: ~40K GPU hours once a neural architecture search is added.

    (1) for search episodes:
            for training iterations: forward-backward();
            if good_model: break;
        for post-search training iterations: forward-backward();

The design cost is calculated under the assumption of using MnasNet. [1] Tan, Mingxing, et al. "MnasNet: Platform-aware neural architecture search for mobile." CVPR 2019.

SLIDE 6

Challenge: Efficient Inference on Diverse Hardware Platforms

Diverse hardware platforms (device generations from 2013, 2015, 2017, 2019): repeating the search for each device raises the design cost from 40K to 160K GPU hours.

    (2) for devices:
    (1)     for search episodes:
                for training iterations: forward-backward();
                if good_model: break;
            for post-search training iterations: forward-backward();

The design cost is calculated under the assumption of using MnasNet. [1] Tan, Mingxing, et al. "MnasNet: Platform-aware neural architecture search for mobile." CVPR 2019.

SLIDE 7

Challenge: Efficient Inference on Diverse Hardware Platforms

Diverse hardware platforms: Cloud AI (10^12 FLOPS), Mobile AI (10^9 FLOPS), Tiny AI (10^6 FLOPS). Searching for many devices raises the design cost from 40K and 160K to 1600K GPU hours.

    (2) for many devices:
    (1)     for search episodes:
                for training iterations: forward-backward();
                if good_model: break;
            for post-search training iterations: forward-backward();

The design cost is calculated under the assumption of using MnasNet. [1] Tan, Mingxing, et al. "MnasNet: Platform-aware neural architecture search for mobile." CVPR 2019.

SLIDE 8

Challenge: Efficient Inference on Diverse Hardware Platforms

Diverse hardware platforms: Cloud AI (10^12 FLOPS), Mobile AI (10^9 FLOPS), Tiny AI (10^6 FLOPS).

Design cost and carbon footprint:

  • 40K GPU hours -> 11.4k lbs CO2 emission
  • 160K GPU hours -> 45.4k lbs CO2 emission
  • 1600K GPU hours -> 454.4k lbs CO2 emission

1 GPU hour translates to 0.284 lbs CO2 emission according to Strubell, Emma, et al. "Energy and policy considerations for deep learning in NLP." ACL 2019.

    (2) for many devices:
    (1)     for search episodes:
                for training iterations: forward-backward();
                if good_model: break;
            for post-search training iterations: forward-backward();
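
To make the emission numbers above concrete, here is a minimal arithmetic sketch in Python applying the 0.284 lbs CO2 per GPU hour factor; the mapping of GPU-hour figures to scenarios follows these slides.

LBS_CO2_PER_GPU_HOUR = 0.284   # Strubell et al., ACL 2019

def co2_lbs(gpu_hours: float) -> float:
    # design cost (GPU hours) -> estimated CO2 emission (lbs)
    return gpu_hours * LBS_CO2_PER_GPU_HOUR

for scenario, hours in [("search for one device", 40_000),
                        ("search for a few devices", 160_000),
                        ("search for many devices", 1_600_000)]:
    print(f"{scenario}: {hours:,} GPU hours -> {co2_lbs(hours):,.0f} lbs CO2")
# search for one device: 40,000 GPU hours -> 11,360 lbs CO2
# search for a few devices: 160,000 GPU hours -> 45,440 lbs CO2
# search for many devices: 1,600,000 GPU hours -> 454,400 lbs CO2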

SLIDE 9

We need Green AI: Solve the Environmental Problem of NAS

  • Problem: TinyML (inference) comes at the cost of BigML (training/search).
  • The Evolved Transformer (ICML’19) illustrates the cost; our Hardware-Aware Transformer (ACL’20) brings the search emission down to 52 lbs CO2, roughly 4 orders of magnitude less.

SLIDE 10

OFA: Decouple Training and Search

Conventional NAS:

    (2) for devices:
    (1)     for search episodes:
                for training iterations: forward-backward();
                if good_model: break;
            for post-search training iterations: forward-backward();

=> Once-for-All (decouple training and search):

    # training
    for OFA training iterations: forward-backward();
    # search
    for devices:
        for search episodes:
            sample from OFA;
            if good_model: break;
    direct deploy without training;

SLIDE 11

Challenge: Efficient Inference on Diverse Hardware Platforms

A single Once-for-All Network serves diverse hardware platforms: Cloud AI (10^12 FLOPS), Mobile AI (10^9 FLOPS), Tiny AI (10^6 FLOPS), instead of paying the 40K / 160K / 1600K GPU-hour design cost (11.4k / 45.4k / 454.4k lbs CO2 emission; 1 GPU hour translates to 0.284 lbs CO2 emission according to Strubell, Emma, et al. "Energy and policy considerations for deep learning in NLP." ACL 2019).

    # training
    for OFA training iterations: forward-backward();
    # search (decoupled)
    for devices:
        for search episodes:
            sample from OFA;
            if good_model: break;
    direct deploy without training;

SLIDE 12

Once-for-All Network: Decouple Model Training and Architecture Design

  • Train a single once-for-all network once; specialized sub-networks are then selected from it for each deployment scenario, with no additional training.
SLIDE 16

Challenge: how to prevent different sub-networks from interfering with each other?

SLIDE 17

Solution: Progressive Shrinking

  • More than 10^19 different sub-networks are contained in a single once-for-all network, covering 4 different dimensions: resolution, kernel size, depth, width.
  • Directly optimizing the once-for-all network from scratch is much more challenging than training a normal neural network, given so many sub-networks to support.
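
As a rough sanity check on the 10^19 figure, a back-of-the-envelope count in Python. The specific space (5 units; depth 2/3/4 per unit; per-layer kernel sizes 3/5/7 and width expansion ratios 3/4/6) is an assumption chosen for illustration and may not match the paper's exact configuration.

N_UNITS = 5
DEPTHS = [2, 3, 4]                 # layers per unit
CHOICES_PER_LAYER = 3 * 3          # 3 kernel sizes x 3 width expansion ratios

# sub-networks per unit: sum over depth choices of independent per-layer choices
per_unit = sum(CHOICES_PER_LAYER ** d for d in DEPTHS)
total = per_unit ** N_UNITS
print(f"about {total:.1e} architectures, before counting input resolutions")
# about 2.2e+19, consistent with "more than 10^19"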

SLIDE 18

Solution: Progressive Shrinking

  • Small sub-networks are nested in large sub-networks.
  • Cast the training process of the once-for-all network as a progressive shrinking and joint fine-tuning process: train the full model, shrink the model along 4 dimensions, and jointly fine-tune both large and small sub-networks.

SLIDE 19

Connection to Network Pruning

  • Network Pruning: train the full model -> shrink the model (only width) -> fine-tune the small net -> a single pruned network.
  • Progressive Shrinking: train the full model -> shrink the model (4 dimensions) -> fine-tune both large and small sub-nets -> a once-for-all network.
  • Progressive shrinking can be viewed as generalized network pruning with much higher flexibility across 4 dimensions.

SLIDE 20

Progressive Shrinking

The once-for-all network is trained in stages across four elastic dimensions; each dimension starts by supporting only the full setting and gradually adds partial (smaller) settings:

  • Elastic Resolution
  • Elastic Kernel Size
  • Elastic Depth
  • Elastic Width

SLIDE 44

Progressive Shrinking: Elastic Resolution

  • Randomly sample the input image size for each training batch.
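
A minimal PyTorch-style sketch of elastic resolution, assuming bilinear resizing of each batch to a randomly chosen size; the resolution list follows slide 68, while the function and variable names are illustrative rather than the authors' code.

import random
import torch
import torch.nn.functional as F

RESOLUTIONS = list(range(128, 225, 4))   # e.g. 128, 132, ..., 224

def random_resolution_batch(images: torch.Tensor) -> torch.Tensor:
    # images: (N, 3, H, W) float tensor; resize the whole batch to a random resolution
    r = random.choice(RESOLUTIONS)
    return F.interpolate(images, size=(r, r), mode="bilinear", align_corners=False)

# usage inside the training loop (sketch):
# for images, labels in loader:
#     images = random_resolution_batch(images)
#     loss = criterion(model(images), labels); loss.backward(); ...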

SLIDE 50

Progressive Shrinking: Elastic Kernel Size

  • Start with the full kernel size (7x7). A smaller kernel (5x5, then 3x3) takes the centered weights of the larger kernel, passed through a learned transformation matrix: a 25x25 matrix for the 5x5 kernel and a 9x9 matrix for the 3x3 kernel.
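
A minimal PyTorch sketch of the idea, assuming a depthwise convolution whose smaller kernels are produced from the centered weights of the 7x7 kernel through per-kernel-size learned linear transforms. Class and method names are illustrative, and the real implementation applies the transforms somewhat differently.

import torch
import torch.nn as nn

class ElasticKernelConv(nn.Module):
    def __init__(self, channels: int, max_kernel: int = 7):
        super().__init__()
        self.max_kernel = max_kernel
        # depthwise conv weights at the largest kernel size
        self.weight = nn.Parameter(torch.randn(channels, 1, max_kernel, max_kernel) * 0.01)
        # one learned transform per smaller kernel size: 5x5 -> 25x25, 3x3 -> 9x9
        self.transforms = nn.ParameterDict({
            "5": nn.Parameter(torch.eye(25)),
            "3": nn.Parameter(torch.eye(9)),
        })

    def get_kernel(self, k: int) -> torch.Tensor:
        w = self.weight
        if k == self.max_kernel:
            return w
        start = (self.max_kernel - k) // 2
        center = w[:, :, start:start + k, start:start + k]        # centered weights
        flat = center.reshape(center.shape[0], center.shape[1], -1)
        flat = flat @ self.transforms[str(k)]                      # learned transform
        return flat.reshape(center.shape[0], center.shape[1], k, k)

    def forward(self, x: torch.Tensor, k: int = 7) -> torch.Tensor:
        kernel = self.get_kernel(k)
        return nn.functional.conv2d(x, kernel, padding=k // 2, groups=x.shape[1])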

SLIDE 56

Progressive Shrinking: Elastic Depth

  • Gradually allow later layers in each unit to be skipped to reduce the depth: each unit is first trained with full depth, then shrunk so that shallower sub-networks keep only the first layers and skip the rest.
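
A minimal PyTorch sketch of elastic depth, assuming each unit is a stack of residual layers of which only the first `depth` are executed for a shallower sub-network; the layer contents and names are illustrative.

import torch
import torch.nn as nn

class ElasticUnit(nn.Module):
    def __init__(self, channels: int, max_depth: int = 4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            for _ in range(max_depth)
        ])

    def forward(self, x: torch.Tensor, depth: int = 4) -> torch.Tensor:
        # keep the first `depth` layers, skip the rest (they still exist for larger sub-nets)
        for layer in self.layers[:depth]:
            x = x + layer(x)        # residual form, so skipping layers is well-behaved
        return x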

SLIDE 62

Progressive Shrinking: Elastic Width

  • Train with the full width, then progressively shrink the width.
  • Keep the most important channels when shrinking via channel sorting: compute each channel's importance, reorganize the channels in decreasing importance, and let narrower sub-networks use the top-ranked channels.
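
A minimal PyTorch sketch of channel sorting, assuming channel importance is measured by the L1 norm of each output channel's weights; the bookkeeping here (for example, how the following layer's input channels are handled) is simplified compared with a real implementation.

import torch
import torch.nn as nn

def sort_channels_by_importance(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> None:
    # importance of each output channel = L1 norm of its weights
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    order = torch.argsort(importance, descending=True)
    with torch.no_grad():
        conv.weight.copy_(conv.weight[order])
        if conv.bias is not None:
            conv.bias.copy_(conv.bias[order])
        # keep BatchNorm parameters and statistics aligned with the reordered channels
        for name in ("weight", "bias", "running_mean", "running_var"):
            tensor = getattr(bn, name)
            tensor.data.copy_(tensor.data[order])

def narrow_forward(x: torch.Tensor, conv: nn.Conv2d, bn: nn.BatchNorm2d, width: int) -> torch.Tensor:
    # a narrower sub-network uses only the first (most important) `width` channels
    weight = conv.weight[:width]
    bias = conv.bias[:width] if conv.bias is not None else None
    y = nn.functional.conv2d(x, weight, bias, stride=conv.stride, padding=conv.padding)
    return nn.functional.batch_norm(
        y, bn.running_mean[:width], bn.running_var[:width],
        weight=bn.weight[:width], bias=bn.bias[:width], training=False, eps=bn.eps)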

SLIDE 68

Progressive Shrinking: put it together

  • Train the full network: K = 7, D = 4, W = 6. Elastic resolution is active throughout, with R ∈ [128, 132, …, 224] sampled per batch.
  • Elastic Kernel Size (D = 4, W = 6, K ∈ [7, 5, 3]): sample K at each layer, generate the kernel weights via the transformation matrices (Fig. 3), fine-tune weights & transformation matrices.
  • Elastic Depth (W = 6, K ∈ [7, 5, 3]): first D ∈ [4, 3], then D ∈ [4, 3, 2]. Sample D at each unit (and K at each layer), keep the first D layers of each unit and skip the top (4 - D) layers (Fig. 3), fine-tune weights.
  • Elastic Width (D ∈ [4, 3, 2], K ∈ [7, 5, 3]): first W ∈ [6, 4], then W ∈ [6, 4, 3]. Perform channel sorting (Fig. 4), sample W at each layer (and K, D), fine-tune weights.
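
The same schedule written out as data, as a minimal sketch; the stage ordering follows the slide, while epoch counts and other hyper-parameters are intentionally left out because they are not shown here.

PROGRESSIVE_SHRINKING_SCHEDULE = [
    # stage name                    kernel sizes    depths          width expansions
    {"stage": "full network",        "K": [7],       "D": [4],       "W": [6]},
    {"stage": "elastic kernel size", "K": [7, 5, 3], "D": [4],       "W": [6]},
    {"stage": "elastic depth (1)",   "K": [7, 5, 3], "D": [4, 3],    "W": [6]},
    {"stage": "elastic depth (2)",   "K": [7, 5, 3], "D": [4, 3, 2], "W": [6]},
    {"stage": "elastic width (1)",   "K": [7, 5, 3], "D": [4, 3, 2], "W": [6, 4]},
    {"stage": "elastic width (2)",   "K": [7, 5, 3], "D": [4, 3, 2], "W": [6, 4, 3]},
]
RESOLUTIONS = list(range(128, 225, 4))   # elastic resolution is active in every stage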

SLIDE 69

Performance of Sub-networks on ImageNet

[Figure: ImageNet top-1 accuracy (67-78%) of sub-networks under various architecture configurations (D: depth, W: width, K: kernel size), from D=2, W=3, K=3 up to D=4, W=6, K=7, trained without vs. with progressive shrinking (PS). PS improves top-1 accuracy by 2.5%-3.7% depending on the configuration.]

  • Progressive shrinking consistently improves the accuracy of sub-networks on ImageNet.
SLIDE 70

How about search?

    # training
    for OFA training iterations: forward-backward();
    # search (decoupled; with evolution)
    for devices:
        for search episodes:
            sample from OFA;
            if good_model: break;
    direct deploy without training;
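
A minimal sketch of the evolutionary search loop (mutation-only, for brevity). `predict_accuracy` and `measure_latency` are placeholders for the accuracy predictor and per-device latency model used in practice; the population sizes and search-space constants are illustrative assumptions.

import random

KS, DS, WS = [3, 5, 7], [2, 3, 4], [3, 4, 6]
N_UNITS, MAX_DEPTH = 5, 4

def random_config():
    return {"d": [random.choice(DS) for _ in range(N_UNITS)],
            "k": [random.choice(KS) for _ in range(N_UNITS * MAX_DEPTH)],
            "w": [random.choice(WS) for _ in range(N_UNITS * MAX_DEPTH)]}

def mutate(cfg, prob=0.1):
    new = {key: list(vals) for key, vals in cfg.items()}
    choices = {"d": DS, "k": KS, "w": WS}
    for key, vals in new.items():
        for i in range(len(vals)):
            if random.random() < prob:
                vals[i] = random.choice(choices[key])
    return new

def evolutionary_search(predict_accuracy, measure_latency, latency_budget,
                        population=100, generations=30):
    pop = [random_config() for _ in range(population)]
    pop = [c for c in pop if measure_latency(c) <= latency_budget] or pop
    for _ in range(generations):
        pop.sort(key=predict_accuracy, reverse=True)      # keep the fittest configs
        parents = pop[:population // 4]
        children = [mutate(random.choice(parents)) for _ in range(population - len(parents))]
        children = [c for c in children if measure_latency(c) <= latency_budget]
        pop = parents + children
    return max(pop, key=predict_accuracy)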

SLIDE 71

2.6x faster than EfficientNet, 1.5x faster than MobileNetV3

[Figure: ImageNet top-1 accuracy vs. Google Pixel 1 latency. Left, OFA vs. EfficientNet: OFA reaches 80.1% top-1 and is 2.6x faster at the same accuracy, or 3.8% more accurate at the same latency. Right, OFA vs. MobileNetV3: OFA is 1.5x faster at the same accuracy, or 4% more accurate at the same latency.]

SLIDE 72

More accurate than training from scratch

[Figure: the same accuracy-latency plots as the previous slide, with added curves for the OFA architectures trained from scratch.]

  • Training from scratch cannot achieve the same level of accuracy as the sub-networks directly inherited from the once-for-all network.

SLIDE 73

OFA: 80% Top-1 Accuracy on ImageNet

[Figure: ImageNet top-1 accuracy (69-81%) vs. MACs (1-9 billion), with marker size indicating model size (2M-64M parameters), for handcrafted and AutoML-designed models: Once-for-All (ours), EfficientNet, ProxylessNAS, MobileNetV3, AmoebaNet, MobileNetV2, PNASNet, ShuffleNet, DARTS, IGCV3-D, MobileNetV1, NASNet-A, InceptionV2, InceptionV3, DenseNet-121/169/264, ResNet-50/101, ResNeXt-50/101, DPN-92, Xception. OFA reaches 80.0% top-1 at 595M MACs, up to 14x less computation than models of comparable accuracy.]

  • Once-for-all sets a new state-of-the-art 80% ImageNet top-1 accuracy under the mobile vision setting (< 600M MACs).


SLIDE 75

OFA Enables Fast Specialization on Diverse Hardware Platforms

[Figure: ImageNet top-1 accuracy vs. measured latency on six platforms: Samsung S7 Edge, Google Pixel 2, and LG G8 (vs. MobileNetV2 and MobileNetV3), NVIDIA 1080Ti GPU (batch size 64), Intel Xeon CPU (batch size 1), and Xilinx ZU3EG FPGA (batch size 1, quantized). OFA-specialized sub-networks give a better accuracy-latency trade-off than the baselines on every platform.]

SLIDE 76

Diverse Hardware Platforms: 50+ Pretrained Models are Released

SLIDE 77

OFA for FPGA Accelerators

[Figure: measured results on a Xilinx ZU3EG FPGA: arithmetic intensity (OPs/Byte, 0-50) and throughput (GOPS, 0-80) for MobileNetV2, MnasNet, and OFA (ours). OFA improves on the baselines by 40% and 57%.]

  • Non-specialized neural networks do not fully utilize the hardware resources. There is large room for improvement via neural network specialization.

SLIDE 78

We need Green AI: Solve the Environmental Problem of NAS

SLIDE 79

How to save CO2 emission

  • 1. Once-for-All: amortize the search cost across many sub-networks and deployment scenarios. (Once-for-All, ICLR’20)
  • 2. Lite Transformer: human-in-the-loop design; apply human insights into hardware & ML rather than “just search it”. (Lite Transformer, ICLR’20)

SLIDE 80

OFA has broad applications

  • Efficient Transformer
  • Efficient Video Recognition
  • Efficient 3D Vision
  • Efficient GAN Compression

SLIDE 81

OFA’s Application: Hardware-Aware Transformer (HAT, ACL’20)

  • 3.7x smaller model size with the same performance on WMT’14 En-De; 3x, 1.6x, and 1.5x faster than the Transformer baseline on Raspberry Pi, CPU, and GPU; about 12,000x less CO2 than the Evolved Transformer.

[Figure: CO2 emission (lbs) for reference points and models: human life (avg. 1 year) 11,023; American life (avg. 1 year) 36,156; US car w/ fuel (avg. 1 lifetime) 126,000; Evolved Transformer search 626,155; HAT (ours) 52, a 12,041x reduction.]

  • Example: translating “Nice to meet you” into “Encantada de conocerte” (Spanish), “만나서 반갑습니다” (Korean), and “Freut mich, dich kennenzulernen” (German). Efficient NLP on mobile devices enables real-time conversation between speakers of different languages.

SLIDE 82

OFA’s Application: Efficient Video Recognition (TSM, ICCV’19)

[Figure: Kinetics top-1 accuracy (69-75%) vs. computation (10-40 GFLOPs) for OFA + TSM (large and small), MobileNetV2 + TSM, ResNet-50 + TSM, and ResNet-50 + I3D.]

  • 7x less computation with the same performance as TSM + ResNet-50; 3% higher accuracy than TSM + MobileNetV2 at the same computation.

SLIDE 83

OFA’s Application: Efficient 3D Recognition (follow-up of PVCNN, NeurIPS’19 spotlight)

  • Today’s 3D recognition is costly: self-driving needs a whole trunk of GPUs; AR/VR needs a whole backpack of computers.
  • Accuracy vs. latency trade-off: 4x FLOPs reduction and 2x speedup over MinkowskiNet, with 3.6% better accuracy under the same computation budget.

SLIDE 84

OFA’s Application: GAN Compression (GAN Compression, CVPR’20)

  • 8-21x FLOPs reduction on CycleGAN, Pix2pix, and GauGAN; 1.7-18.5x speedup on CPU/GPU and mobile CPU/GPU.

SLIDE 85

Summary: Once-for-All Network

  • We introduce the once-for-all network for efficient inference on diverse hardware platforms.
  • We present an effective progressive shrinking approach for training once-for-all networks: train the full model, shrink the model in 4 dimensions, and fine-tune both large and small sub-nets.
  • The once-for-all network surpasses MobileNetV3 and EfficientNet by a large margin under all scenarios, setting a new state-of-the-art 80% ImageNet top-1 accuracy under the mobile setting (< 600M MACs).
  • First place in the 3rd Low-Power Computer Vision Challenge, DSP track @ICCV’19.
  • First place in the 4th Low-Power Computer Vision Challenge @NeurIPS’19, both classification & detection.
  • Released the training code & the pre-trained OFA network that provides diverse sub-networks without training (see the usage sketch below):
    ofa_network = ofa_net(net_id, pretrained=True)
  • Released 50+ different pre-trained OFA models for diverse hardware platforms (CPU/GPU/FPGA/DSP):
    net, image_size = ofa_specialized(net_id, pretrained=True)

Project Page: https://ofa.mit.edu
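
A short usage sketch of the released API. The net_id strings and the sub-network selection methods below follow my reading of the mit-han-lab/once-for-all repository and may differ from the current release; only ofa_net(net_id, pretrained=True) and ofa_specialized(net_id, pretrained=True) appear on the slide.

from ofa.model_zoo import ofa_net, ofa_specialized

# Load the once-for-all network (the net_id below is an assumed example).
ofa_network = ofa_net("ofa_mbv3_d234_e346_k357_w1.0", pretrained=True)

# Select a sub-network (kernel size, expansion ratio, depth) and extract it with
# inherited weights; it can be deployed directly, without retraining.
ofa_network.set_active_subnet(ks=5, e=4, d=3)
subnet = ofa_network.get_active_subnet(preserve_weight=True)

# Or load one of the 50+ released specialized models (assumed example net_id).
net, image_size = ofa_specialized("pixel1_lat@20ms_top1@71.4_finetune@25", pretrained=True)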

SLIDE 86

References

Model Compression & NAS:

  • Once-For-All: Train One Network and Specialize It for Efficient Deployment, ICLR’20
  • ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR’19
  • APQ: Joint Search for Network Architecture, Pruning and Quantization Policy, CVPR’20
  • HAQ: Hardware-Aware Automated Quantization with Mixed Precision, CVPR’19
  • Defensive Quantization: When Efficiency Meets Robustness, ICLR’19
  • AMC: AutoML for Model Compression and Acceleration on Mobile Devices, ECCV’18

Efficient Vision:

  • GAN Compression: Learning Efficient Architectures for Conditional GANs, CVPR’20
  • TSM: Temporal Shift Module for Efficient Video Understanding, ICCV’19
  • PVCNN: Point Voxel CNN for Efficient 3D Deep Learning, NeurIPS’19

Efficient NLP:

  • Lite Transformer with Long Short Term Attention, ICLR’20
  • HAT: Hardware-aware Transformer, ACL’20

Hardware & EDA:

  • SpArch: Efficient Architecture for Sparse Matrix Multiplication, HPCA’20
  • Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning, DAC’20
SLIDE 87

Make AI Efficient: Tiny Computational Resources, Tiny Human Resources

Website: songhan.mit.edu
YouTube: youtube.com/c/MITHANLab
GitHub: github.com/mit-han-lab