AutoML for TinyML with Once-for-All Network
Song Han, Massachusetts Institute of Technology (Once-for-All, ICLR'20)


  1. AutoML for TinyML with Once-for-All Network. Song Han, Massachusetts Institute of Technology. Once-for-All, ICLR'20.

  2. AutoML for TinyML with Once-for-All Network. Less engineer resources via AutoML: fewer engineers instead of many engineers. Less computational resources via TinyML: a small model with less computation instead of a large model with a lot of computation.

  3. Challenge: Efficient Inference on Diverse Hardware Platforms. From Cloud AI to Mobile AI to Tiny AI (AIoT), resources shrink: Cloud AI has 32 GB of memory and TFLOPS of compute; Mobile AI has 4 GB and GFLOPS; Tiny AI has <100 KB and <MFLOPS. Different hardware platforms have different resource constraints, so we need to customize our models for each platform to achieve the best accuracy-efficiency trade-off, especially on resource-constrained edge devices.

  4. Challenge: Efficient Inference on Diverse Hardware Platforms. Training a single model:

    for training iterations:
        forward-backward();

Design cost: 200 GPU hours. The design cost is calculated under the assumption of using MobileNet-v2.

  5. Challenge: Efficient Inference on Diverse Hardware Platforms. Adding architecture search:

    for search episodes:             # loop (1)
        for training iterations:
            forward-backward();
        if good_model: break;
    for post-search training iterations:
        forward-backward();

Design cost: 40K GPU hours. The design cost is calculated under the assumption of using MnasNet [1].
[1] Tan, Mingxing, et al. "MnasNet: Platform-aware neural architecture search for mobile." CVPR 2019.

  6. Challenge: Efficient Inference on Diverse Hardware Platforms. Hardware platforms have diversified over the years (2013, 2015, 2017, 2019), so the search is repeated per device:

    for devices:                     # loop (2)
        for search episodes:         # loop (1)
            for training iterations:
                forward-backward();
            if good_model: break;
        for post-search training iterations:
            forward-backward();

Design cost: 40K GPU hours per device, 160K GPU hours in total (MnasNet assumption).

  7. Challenge: Efficient Inference on Diverse Hardware Platforms. The platforms span many orders of magnitude: Cloud AI (10^12 FLOPS), Mobile AI (10^9 FLOPS), Tiny AI (10^6 FLOPS). Repeating the search for many devices:

    for many devices:                # loop (2)
        for search episodes:         # loop (1)
            for training iterations:
                forward-backward();
            if good_model: break;
        for post-search training iterations:
            forward-backward();

Design cost grows from 40K to 160K to 1600K GPU hours as the number of devices grows (MnasNet assumption).

  8. Challenge: Efficient Inference on Diverse Hardware Platforms. The design cost translates directly into CO2 emission: 40K GPU hours → 11.4k lbs CO2, 160K → 45.4k lbs, 1600K → 454.4k lbs. (1 GPU hour translates to 0.284 lbs of CO2 emission according to Strubell, Emma, et al. "Energy and policy considerations for deep learning in NLP." ACL 2019.)
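The cost arithmetic on these slides can be checked directly. A small Python sketch, using the 0.284 lbs CO2 per GPU hour figure from Strubell et al. cited above; the device count of 40 is an assumption chosen so that 40 devices times 40K GPU hours reproduces the 1600K-hour total on the slide:

```python
# Reproduces the design-cost arithmetic on the slides. 0.284 lbs CO2 per
# GPU hour is the figure from Strubell et al. (ACL 2019); the count of
# 40 devices is an assumption chosen to match the slide's 1600K total.
LBS_CO2_PER_GPU_HOUR = 0.284

def design_cost(gpu_hours_per_device, n_devices):
    """Total GPU hours when every device needs its own search-and-train run."""
    return gpu_hours_per_device * n_devices

def co2_lbs(gpu_hours):
    return gpu_hours * LBS_CO2_PER_GPU_HOUR

print(co2_lbs(40_000))                   # about 11,360 lbs (the slide's 11.4k)
print(co2_lbs(160_000))                  # about 45,440 lbs (45.4k)
print(co2_lbs(design_cost(40_000, 40)))  # about 454,400 lbs (454.4k)
```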

  9. Problem: TinyML (inference) comes at the cost of BigML (training/search). We need Green AI: solve the environmental problem of NAS (ICML'19, ACL'19). Our Hardware-Aware Transformer (ACL'20) cuts the search cost by 4 orders of magnitude compared with the Evolved Transformer.

  10. OFA: Decouple Training and Search.

    Conventional NAS:
        for devices:                     # loop (2)
            for search episodes:         # loop (1)
                for training iterations:
                    forward-backward();
                if good_model: break;
            for post-search training iterations:
                forward-backward();

    Once-for-All (training and search decoupled):
        # training
        for OFA training iterations:
            forward-backward();
        # search
        for devices:
            for search episodes:
                sample from OFA;
                if good_model: break;
            direct deploy without training;
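The decoupling can be summarized as a toy cost model. Everything here is illustrative except the 40K GPU-hours-per-device figure the earlier slides attribute to an MnasNet-style search; `train_once_hours` and `hours_per_search` are made-up placeholders meant only to show how the totals scale with the number of devices:

```python
# Toy cost model contrasting conventional NAS with OFA's train-once,
# search-many scheme. The OFA-side hour figures are placeholders, not
# measured numbers.
def conventional_nas_cost(n_devices, hours_per_device=40_000):
    # Search + retraining is repeated from scratch for every device.
    return n_devices * hours_per_device

def ofa_cost(n_devices, train_once_hours=1_200, hours_per_search=40):
    # The once-for-all network is trained once; each device only samples
    # sub-networks and deploys them directly, without retraining.
    return train_once_hours + n_devices * hours_per_search

for n in (1, 10, 40):
    print(n, conventional_nas_cost(n), ofa_cost(n))
```

The key structural point is that the expensive term (training) moves outside the per-device loop, so the total cost becomes nearly flat in the number of deployment targets.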

  11. Challenge: Efficient Inference on Diverse Hardware Platforms. The once-for-all network replaces the per-device costs of conventional NAS (40K GPU hours → 11.4k lbs CO2, 160K → 45.4k lbs, 1600K → 454.4k lbs) with a train-once, search-many scheme:

    # training (paid once)
    for OFA training iterations:
        forward-backward();
    # search (per device)
    for devices:
        for search episodes:
            sample from OFA;
            if good_model: break;
        direct deploy without training;

  12. Once-for-All Network: Decouple Model Training and Architecture Design. [Diagram of the once-for-all network; the original slides 12-15 animate selecting different sub-networks from it.]


  16. Challenge: how to prevent different sub-networks from interfering with each other?

  17. Solution: Progressive Shrinking.
  • More than 10^19 different sub-networks in a single once-for-all network, covering 4 different dimensions: resolution, kernel size, depth, width.
  • Directly optimizing the once-for-all network from scratch is much more challenging than training a normal neural network, given so many sub-networks to support.
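The sub-network count follows from simple combinatorics. A sanity check of the "more than 10^19" claim, using search-space sizes taken (as an assumption here) from the OFA paper: 5 units, each with depth in {2, 3, 4}, and every layer independently choosing a kernel size from {3, 5, 7} and a width-expansion ratio from {3, 4, 6}:

```python
# Counts architectural sub-networks; input resolution adds further
# deployable configurations on top of this count.
KERNEL_CHOICES = 3        # kernel size in {3, 5, 7}
WIDTH_CHOICES = 3         # expansion ratio in {3, 4, 6}
DEPTH_CHOICES = (2, 3, 4) # layers per unit
N_UNITS = 5

per_layer = KERNEL_CHOICES * WIDTH_CHOICES             # 9 configs per layer
per_unit = sum(per_layer ** d for d in DEPTH_CHOICES)  # 81 + 729 + 6561 = 7371
total = per_unit ** N_UNITS                            # roughly 2 x 10^19
print(total > 10 ** 19)  # True
```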

  18. Solution: Progressive Shrinking.
  • More than 10^19 different sub-networks in a single once-for-all network, covering 4 different dimensions: resolution, kernel size, depth, width.
  • Directly optimizing the once-for-all network from scratch is much more challenging than training a normal neural network, given so many sub-networks to support.
  • Progressive Shrinking pipeline: train the full model → shrink the model (4 dimensions) → jointly fine-tune both large and small sub-networks → once-for-all network.
  • Small sub-networks are nested in large sub-networks.
  • Cast the training process of the once-for-all network as a progressive shrinking and joint fine-tuning process.
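The shrink-then-jointly-fine-tune pipeline can be sketched as a staged training schedule. The stage ordering mirrors the elastic dimensions named on the slide, but the concrete kernel/depth/width values and the `train` stub are illustrative placeholders, not the authors' exact recipe:

```python
# Minimal sketch of a progressive-shrinking schedule: start from the full
# model, then progressively allow smaller kernels, depths, and widths,
# jointly fine-tuning large and small sub-networks at every stage.
def train(supernet, *, kernels, depths, widths):
    # Stand-in for one training phase: each real step would sample a
    # sub-network from the currently allowed space and run forward-backward.
    supernet["log"].append((tuple(kernels), tuple(depths), tuple(widths)))

supernet = {"log": []}
stages = [
    dict(kernels=[7],       depths=[4],       widths=[6]),        # full model
    dict(kernels=[7, 5, 3], depths=[4],       widths=[6]),        # elastic kernel
    dict(kernels=[7, 5, 3], depths=[4, 3, 2], widths=[6]),        # + elastic depth
    dict(kernels=[7, 5, 3], depths=[4, 3, 2], widths=[6, 4, 3]),  # + elastic width
]
for stage in stages:
    train(supernet, **stage)
print(len(supernet["log"]))  # 4 training phases
```

Each stage only enlarges the sampling space, so large sub-networks keep being trained while smaller ones are gradually added, which is what prevents the small networks from degrading the big ones.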

  19. Connection to Network Pruning.
  Network Pruning: train the full model → shrink the model (only width) → fine-tune the single pruned network (the small net).
  Progressive Shrinking: train the full model → shrink the model (4 dimensions) → fine-tune both large and small sub-nets → once-for-all network.
  • Progressive shrinking can be viewed as generalized network pruning with much higher flexibility across 4 dimensions.
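The "generalized pruning" view is easiest to see for the width dimension: shrinking width keeps the most important output channels, here ranked by the L1 norm of their weights (a common channel-importance heuristic, also used in OFA). The tensor shape below is made up for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
full_weight = rng.normal(size=(8, 4, 3, 3))  # (out_ch, in_ch, kH, kW)

def shrink_width(weight, keep):
    """Keep the `keep` output channels with the largest L1 norm."""
    importance = np.abs(weight).sum(axis=(1, 2, 3))
    top = np.argsort(importance)[::-1][:keep]
    return weight[np.sort(top)]  # preserve the original channel order

small = shrink_width(full_weight, keep=4)
print(small.shape)  # (4, 4, 3, 3)
```

The difference from pruning is that in progressive shrinking the dropped channels are not discarded: the full-width network stays in the once-for-all network and keeps being fine-tuned alongside the narrow one.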

  20. Progressive Shrinking. [Diagram: the four dimensions (Resolution, Kernel Size, Depth, Width) each start Full and become Elastic in turn, passing through partially-shrunk states; the original slides 20-32 animate this step by step.]
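The kernel-size dimension is elastic because small kernels nest inside large ones. A minimal numpy sketch of the nesting idea, taking a center crop of the large kernel (OFA additionally applies a learned transformation matrix to the cropped weights, omitted here for brevity):

```python
import numpy as np

def center_crop_kernel(weight, k):
    """Take the central k x k patch of a (out, in, K, K) weight tensor."""
    K = weight.shape[-1]
    start = (K - k) // 2
    return weight[..., start:start + k, start:start + k]

# A toy 7x7 kernel; its 5x5 and 3x3 sub-kernels share the central weights.
w7 = np.arange(7 * 7, dtype=float).reshape(1, 1, 7, 7)
w5 = center_crop_kernel(w7, 5)
w3 = center_crop_kernel(w7, 3)
print(w5.shape, w3.shape)  # (1, 1, 5, 5) (1, 1, 3, 3)
```

Because the sub-kernels are views into the same weights, training a 3x3 sub-network also updates the center of the 7x7 kernel, which is exactly the kind of interference progressive shrinking has to manage.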

