

  1. MCUNet: Tiny Deep Learning on IoT Devices
     Ji Lin (1), Song Han (1), Wei-Ming Chen (1,2), Yujun Lin (1), John Cohn (3), Chuang Gan (3)
     (1) MIT  (2) National Taiwan University  (3) MIT-IBM Watson AI Lab
     NeurIPS 2020 (spotlight)

  2-4. Background: The Era of AIoT on Microcontrollers (MCUs)
     • Low-cost, low-power
     • Rapid growth
       [Chart: annual MCU shipments (#Units, billions), 2012 through 2019 (F = forecast)]
     • Wide applications: Smart Retail, Personalized Healthcare, Precision Agriculture, Smart Home, …

  5-10. Challenge: Memory Too Small to Hold DNN

                             Cloud AI    Mobile AI    Tiny AI
     Memory (Activation)     16 GB       4 GB         320 kB   (13,000x smaller)
     Storage (Weights)       ~TB/PB      256 GB       1 MB     (50,000x smaller)

     We need to reduce the peak activation size AND the model size to fit a DNN into MCUs.
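Why the 320 kB activation budget bites can be seen with a back-of-the-envelope estimate: under int8 inference, a layer's working set is roughly its input buffer plus its output buffer. The layer shapes below are illustrative assumptions (a MobileNetV2-style stem at 224x224 input), not a profiled model.

```python
# Back-of-the-envelope activation memory for int8 inference.
# A layer's working set ~= input activation buffer + output activation buffer.

def act_bytes(c, h, w):
    return c * h * w  # int8: 1 byte per element

# Assumed stem of a MobileNetV2-style model at 224x224 input:
layers = [
    ((3, 224, 224), (32, 112, 112)),   # stride-2 stem conv
    ((32, 112, 112), (16, 112, 112)),  # first inverted-residual block
]
peak = max(act_bytes(*i) + act_bytes(*o) for i, o in layers)
print(f"peak activation: {peak / 1024:.0f} kB (MCU SRAM budget: 320 kB)")
```

Even this toy estimate lands well above 320 kB, which is why both the architecture (resolution, width) and the inference engine must be co-designed to shrink peak activation memory.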

  11. Existing efficient networks only reduce model size, NOT activation size!
      [Chart: at ~70% ImageNet top-1, comparing ResNet-18, MobileNetV2-0.75, and MCUNet
       on Param (MB) and Peak Activation (MB): parameters shrink 4.6x, peak activation only 1.8x]

  12-14. Challenge: Memory Too Small to Hold DNN
     [Chart: peak memory (kB) vs the 320 kB constraint: ResNet-50 is 23x over budget,
      MobileNetV2 22x, MobileNetV2 (int8) 5x; MCUNet fits under the constraint]

  15-19. MCUNet: System-Algorithm Co-design
     (a) Search a NN model on a fixed, existing library (e.g., ProxylessNAS, MnasNet): Library -> NAS -> NN Model
     (b) Tune the deep learning library given a fixed NN model (e.g., TVM): NN Model -> Library
     (c) MCUNet: system-algorithm co-design. TinyNAS (efficient neural architecture) and
         TinyEngine (efficient compiler/runtime) are optimized jointly.

  20-22. TinyNAS: Two-Stage NAS for Tiny Memory Constraints
     • Search space design is crucial for NAS performance, but there is no prior expertise on MCU model design.
     • Stage 1: shrink the full network space into a search space optimized under memory/storage constraints.
     • Stage 2: perform model specialization within the optimized search space.
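The two stages above can be sketched end to end. Everything below is a toy illustration: the cost models, the candidate (width, resolution) pairs, and the "specialization" criterion are made-up stand-ins for the real profilers and accuracy predictors, kept only to show the control flow.

```python
import random

SRAM_KB = 320  # MCU activation budget from the slides

def peak_mem_kb(width, res):
    # Crude int8 estimate: activations after a stride-2 stem with `width` channels.
    return width * (res // 2) ** 2 / 1024

def flops_m(width, res, depth):
    return width * res * res * depth / 1e6  # made-up FLOPs proxy (MFLOPs)

def sample_space(width, res, n=100, seed=0):
    # Sample toy models from a (width, resolution) sub-space, varying depth.
    rng = random.Random(seed)
    return [(width, res, rng.choice([2, 3, 4])) for _ in range(n)]

def mean_flops_of_satisfying(width, res):
    models = [m for m in sample_space(width, res)
              if peak_mem_kb(m[0], m[1]) <= SRAM_KB]
    return sum(flops_m(*m) for m in models) / len(models) if models else 0.0

# Stage 1: pick the (width, resolution) sub-space whose memory-satisfying
# models have the largest mean FLOPs (larger FLOPs -> more capacity).
candidates = [(16, 160), (24, 144), (32, 112)]
best_space = max(candidates, key=lambda wr: mean_flops_of_satisfying(*wr))

# Stage 2: specialize within the chosen sub-space (here: best FLOPs proxy;
# the real system evaluates sub-network accuracy instead).
best_model = max(sample_space(*best_space), key=lambda m: flops_m(*m))
print(best_space, best_model)
```

The key design point preserved here is that stage 1 never trains anything: it ranks whole sub-spaces by a cheap statistic of their satisfying models, and only stage 2 searches for a concrete architecture.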

  23-27. TinyNAS: (1) Automated Search Space Optimization
     Revisit the ProxylessNAS search space: S = kernel size x expansion ratio x depth,
     with per-block choices over the inverted-residual block (pw1 -> dw -> pw2):
     kernel size k ∈ {3, 5, 7}, expansion ratio e ∈ {2, 4, 6}, depth d ∈ {2, 3, 4}.
     Searching this space directly for MCUs runs out of memory!
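The per-block choices listed above multiply quickly. A short enumeration (the number of stages is an assumption for illustration; the slides do not state it):

```python
from itertools import product

# Per-block choices from the slides: kernel size k, expansion ratio e, depth d.
kernel_sizes = [3, 5, 7]
expand_ratios = [2, 4, 6]
depths = [2, 3, 4]

stage_choices = list(product(kernel_sizes, expand_ratios, depths))
print(len(stage_choices))        # 27 combinations per stage

# With, say, 5 independently configured stages the space explodes:
print(len(stage_choices) ** 5)   # 27^5 = 14,348,907 candidate networks
```

This combinatorial size is why TinyNAS first prunes the space itself before searching for any single architecture.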

  28-31. TinyNAS: (1) Automated Search Space Optimization
     Extend the search space to cover a wide range of hardware capacities:
     S' = kernel size x expansion ratio x depth x input resolution R x width multiplier W
     Different R and W suit different hardware capacities (i.e., different optimized sub-spaces):
     • Mobile setting: R = 224, W = 1.0
     • Larger-capacity setting*: R = 260, W = 1.4
     • MCUs (F412 / F743 / H746, with 256 kB / 320 kB / 512 kB memory): R = ?, W = ?
     * Cai et al., Once-for-All: Train One Network and Specialize it for Efficient Deployment, ICLR'20
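The point of adding R and W as axes is that each memory budget admits a different feasible region. A hypothetical sweep (the (W, R) grid and the stem-activation memory model are assumptions, not the paper's):

```python
# Illustrative sweep of the extended (W, R) axes against different MCU SRAM
# budgets. Grid and memory model are made up for the sketch.

def stem_act_kb(width_mult, res):
    # int8 activations after a stride-2 stem with int(32 * W) channels
    ch = int(32 * width_mult)
    return ch * (res // 2) ** 2 / 1024

grid = [(w / 10, r) for w in range(3, 11)        # W in 0.3 .. 1.0
        for r in range(96, 225, 16)]             # R in 96 .. 224

feasible = {}
for sram_kb in (256, 320, 512):                  # e.g., the three MCU budgets
    feasible[sram_kb] = [(w, r) for w, r in grid
                         if stem_act_kb(w, r) <= sram_kb]
    print(sram_kb, len(feasible[sram_kb]))
```

Even in this toy model the feasible set grows with the budget, so the optimized sub-space (the R, W choice) must be re-derived per device rather than copied from the mobile setting.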

  32-37. TinyNAS: (1) Automated Search Space Optimization
     Analyze the FLOPs distribution (CDF) of satisfying models in each candidate search space:
     larger FLOPs -> larger model capacity -> more likely to give higher accuracy.
     [Plot: cumulative probability vs FLOPs (M) for candidate spaces (width-resolution | mean FLOPs):
      w0.3-r160 | 32.5, w0.4-r112 | 32.4, w0.4-r128 | 39.3, w0.4-r144 | 46.9, w0.5-r112 | 38.3,
      w0.5-r128 | 46.9, w0.5-r144 | 52.0, w0.6-r112 | 41.3, w0.7-r96 | 31.4, w0.7-r112 | 38.4.
      At p = 80%, a bad design space reaches only 32.3M FLOPs (best acc: 74.2%), while a good
      design space reaches 50.3M FLOPs (best acc: 78.7%). A good design space is one that is
      likely to achieve high FLOPs under the memory constraint.]
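The comparison heuristic in the plot can be sketched directly: sample memory-satisfying models from each candidate space and compare their FLOPs at a fixed percentile (the slides use p = 80%). The two synthetic "FLOPs distributions" below are stand-ins for profiled models, not real measurements.

```python
import random

def percentile(xs, p):
    # Simple empirical percentile: value at rank floor(p * n) of sorted data.
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(p * len(xs)))]

rng = random.Random(0)
# Synthetic FLOPs (M) of satisfying models from two hypothetical spaces:
space_a = [rng.uniform(25, 40) for _ in range(500)]  # narrow/bad space
space_b = [rng.uniform(35, 60) for _ in range(500)]  # wide/good space

fa = percentile(space_a, 0.8)
fb = percentile(space_b, 0.8)
print(f"p=80%: space A {fa:.1f}M vs space B {fb:.1f}M FLOPs")
```

The space with larger FLOPs at the same percentile is preferred: it packs more model capacity under the same memory constraint, and is therefore more likely to contain the highest-accuracy satisfying model.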

  38-40. TinyNAS: (2) Resource-Constrained Model Specialization
     • One-shot NAS through weight sharing*: train a super network once by randomly sampling
       and jointly fine-tuning multiple sub-networks (varying kernel size, expansion, depth).
     • Small sub-networks are nested inside large sub-networks.
     • Directly evaluate the accuracy of sub-networks without retraining.
     • Elastic kernel size, elastic depth, elastic width.
     * Cai et al., Once-for-All: Train One Network and Specialize it for Efficient Deployment, ICLR'20
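The "nested sub-networks" idea means a sub-network is literally a slice of the super network's weights. A minimal sketch with toy tensors (nested lists standing in for real conv kernels; the slicing conventions are assumptions in the spirit of Once-for-All):

```python
# Weight sharing in miniature: sub-networks reuse slices of super-network weights.

def center_crop(kernel, k):
    """Elastic kernel size: take the center k x k of a larger shared kernel."""
    big = len(kernel)
    s = (big - k) // 2
    return [row[s:s + k] for row in kernel[s:s + k]]

# Shared 7x7 kernel; entries tagged by position so slices are checkable.
super_kernel = [[(i, j) for j in range(7)] for i in range(7)]
sub_kernel = center_crop(super_kernel, 3)   # a 3x3 sub-net reuses the center

# Elastic depth: a shallower sub-net keeps the first d shared blocks.
super_blocks = ["block0", "block1", "block2", "block3"]
sub_blocks = super_blocks[:2]

# Elastic width: a narrower sub-net keeps the first c output channels.
super_channels = list(range(64))
sub_channels = super_channels[:16]
```

Because every sub-network is a slice of the same trained weights, its accuracy can be evaluated directly, which is what makes the specialization stage cheap.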
