MCUNet: Tiny Deep Learning on IoT Devices

Ji Lin 1, Song Han 1, Wei-Ming Chen 1,2, Yujun Lin 1, John Cohn 3, Chuang Gan 3
1 MIT, 2 National Taiwan University, 3 MIT-IBM Watson AI Lab
NeurIPS 2020 (spotlight)

Background: The Era of AIoT on Microcontrollers (MCUs)
• Low-cost, low-power
• Rapid growth
  [Figure: bar chart of #Units (Billion), axis 0 to 50, by year: 12, 13, 14, 15F, 16F, 17F, 18F, 19F]
• Wide applications: Smart Retail, Personalized Healthcare, Precision Agriculture, Smart Home, …

Challenge: Memory Too Small to Hold DNN

                     | Cloud AI | Mobile AI | Tiny AI
Memory (Activation)  | 16GB     | 4GB       | 320kB
Storage (Weights)    | ~TB/PB   | 256GB     | 1MB

Tiny AI memory is about 13,000x smaller than Mobile AI (4GB vs. 320kB) and about 50,000x smaller than Cloud AI (16GB vs. 320kB).
We need to reduce the peak activation size AND the model size to fit a DNN into MCUs.
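The 13,000x and 50,000x gaps on this slide can be checked with a few lines of arithmetic. The sketch below assumes binary units (1kB = 1024 bytes), which the slide does not state, and compares only the Memory (Activation) row; the helper name `times_smaller` is made up for illustration.

```python
# Assumption: binary units (1 kB = 1024 bytes); the slide does not say which convention it uses.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Memory (Activation) row from the slide.
memory_bytes = {"Cloud AI": 16 * GB, "Mobile AI": 4 * GB, "Tiny AI": 320 * KB}


def times_smaller(larger: int, smaller: int) -> float:
    """How many times smaller `smaller` is than `larger`."""
    return larger / smaller


mobile_gap = times_smaller(memory_bytes["Mobile AI"], memory_bytes["Tiny AI"])
cloud_gap = times_smaller(memory_bytes["Cloud AI"], memory_bytes["Tiny AI"])

print(f"Mobile AI vs. Tiny AI: {mobile_gap:,.0f}x")
print(f"Cloud AI vs. Tiny AI: {cloud_gap:,.0f}x")
```

Under that assumption the two ratios land close to the rounded 13,000x and 50,000x on the slide, which suggests both annotations refer to memory rather than storage.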

Existing efficient networks only reduce model size but NOT activation size!
[Figure: bar chart of Param (MB) and Peak Activation (MB), axis 0 to 50, for ResNet-18, MobileNetV2-0.75, and MCUNet at ~70% ImageNet Top-1; annotations: 4.6x, 1.8x]

Challenge: Memory Too Small to Hold DNN
[Figure: Peak Memory (kB), axis 0 to 8000, with the 320kB constraint marked; bars for ResNet-50 (23x), MobileNetV2 (22x), MobileNetV2 (int8) (5x), and MCUNet]

MCUNet: System-Algorithm Co-design
(a) Search NN model on an existing library, e.g., ProxylessNAS, MnasNet
(b) Tune deep learning library given a NN model, e.g., TVM
(c) MCUNet: system-algorithm co-design
    • TinyNAS: Efficient Neural Architecture
    • TinyEngine: Efficient Compiler / Runtime

TinyNAS: Two-Stage NAS for Tiny Memory Constraints
Search space design is crucial for NAS performance
There is no prior expertise on MCU model design
[Diagram: Full Network Space -> (Memory/Storage Constraints) -> Optimized Search Space -> Model Specialization]

TinyNAS: (1) Automated search space optimization
Revisit ProxylessNAS search space: S = kernel size × expansion ratio × depth
• kernel size: k=3, k=5, k=7
• expansion ratio (block layers: dw, pw1, pw2): e=2, e=4, e=6
• depth: d=2, d=3, d=4
Out of memory!
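To make the size of S concrete, the following sketch enumerates the choices listed on the slide. It is a simplification that assumes a single (kernel size, expansion ratio, depth) triple per stage; a real search space makes these choices per block and per stage, so the full space is far larger. The constant and function names are illustrative only.

```python
from itertools import product

# Option values as listed on the slide.
KERNEL_SIZES = (3, 5, 7)
EXPANSION_RATIOS = (2, 4, 6)
DEPTHS = (2, 3, 4)


def enumerate_choices():
    """All (kernel size, expansion ratio, depth) triples for one stage (simplified)."""
    return list(product(KERNEL_SIZES, EXPANSION_RATIOS, DEPTHS))


choices = enumerate_choices()
print(len(choices))   # 3 * 3 * 3 = 27 triples
print(choices[0])     # (3, 2, 2): the smallest option in every dimension
print(choices[-1])    # (7, 6, 4): the largest option in every dimension
```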

TinyNAS: (1) Automated search space optimization
Extended search space to cover wide range of hardware capacity:
S' = kernel size × expansion ratio × depth × input resolution R × width multiplier W
Different R and W for different hardware capacity (i.e., different optimized sub-space)
• R=260, W=1.4 *
• R=224, W=1.0
• R=?, W=? for F412/F743/H746/… (256kB/320kB/512kB/…)
* Cai et al., Once-for-All: Train One Network and Specialize it for Efficient Deployment, ICLR’20

TinyNAS: (1) Automated search space optimization
Analyzing FLOPs distribution of satisfying models in each search space:
Larger FLOPs -> Larger model capacity -> More likely to give higher accuracy
[Figure: Cumulative Probability (0% to 100%) vs. FLOPs (M) (25 to 65), one curve per search space, read at p=80%; annotation: 320kB?]

width-res. | mFLOPs
w0.3-r160  | 32.5
w0.4-r112  | 32.4
w0.4-r128  | 39.3
w0.4-r144  | 46.9
w0.5-r112  | 38.3
w0.5-r128  | 46.9
w0.5-r144  | 52.0
w0.6-r112  | 41.3
w0.7-r96   | 31.4
w0.7-r112  | 38.4

Points marked at p=80%:
• (32.3M, 80%): Bad design space, best acc: 74.2%
• (45.4M, 80%): best acc: 76.4%
• (50.3M, 80%): Good design space (likely to achieve high FLOPs under memory constraint), best acc: 78.7%
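The selection rule on this slide can be sketched as follows: for each candidate search space, keep only the sampled models that satisfy the memory constraint, read the FLOPs value at the p=80% point of their empirical CDF, and prefer the space where that value is highest. This is a minimal sketch, not the authors' implementation. The space names, the sampled (mFLOPs, peak memory in kB) pairs, and both function names are invented for illustration; the FLOPs values are chosen so the 80% points echo the 32.3M and 50.3M marked on the slide.

```python
def flops_at_percentile(flops_values, percent=80):
    """Smallest sampled FLOPs value whose empirical CDF reaches `percent`."""
    ordered = sorted(flops_values)
    # Integer form of ceil(percent / 100 * n); avoids floating-point rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def pick_search_space(samples_by_space, mem_limit_kb, percent=80):
    """Return (best space name, per-space FLOPs at the given CDF percentile)."""
    scores = {}
    for name, models in samples_by_space.items():
        # Keep only models that satisfy the memory constraint.
        fitting = [mflops for mflops, peak_kb in models if peak_kb <= mem_limit_kb]
        if fitting:
            scores[name] = flops_at_percentile(fitting, percent)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical samples: (mFLOPs, peak memory in kB) per randomly drawn model.
samples = {
    "space_a": [(28.0, 250), (30.5, 270), (31.0, 300), (32.3, 310), (36.0, 318), (40.0, 400)],
    "space_b": [(44.0, 260), (47.5, 280), (49.0, 300), (50.3, 315), (58.0, 319), (70.0, 500)],
}

best, scores = pick_search_space(samples, mem_limit_kb=320)
print(best, scores)
```

Comparing a CDF statistic instead of training a model in every space keeps the selection step cheap, since only FLOPs and peak memory need to be computed for each sampled model.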

TinyNAS: (2) Resource-constrained model specialization
• One-shot NAS through weight sharing *
  Super Network -> Random sample (kernel size, expansion, depth) -> Jointly fine-tune multiple sub-networks -> Directly evaluate the accuracy of sub-nets …
• Small sub-networks are nested in large sub-networks.
• Elastic Kernel Size, Elastic Depth, Elastic Width
* Cai et al., Once-for-All: Train One Network and Specialize it for Efficient Deployment, ICLR’20
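Two ideas from this slide can be illustrated in a few lines: randomly sampling a sub-network configuration from the super network, and nesting a small kernel inside a large one. The sketch below is an assumption-laden toy, not the method from the talk. It reuses the k/e/d option values from the earlier search-space slide, represents a kernel as a plain list of lists, and nests by taking the centered window only; `sample_subnet` and `center_crop_kernel` are hypothetical names.

```python
import random

# Option values assumed from the search-space slide (k, e, d).
KERNEL_SIZES = (3, 5, 7)
EXPANSION_RATIOS = (2, 4, 6)
DEPTHS = (2, 3, 4)


def sample_subnet(num_stages, rng):
    """Randomly sample one sub-network: a depth per stage, then k and e per block."""
    config = []
    for _ in range(num_stages):
        depth = rng.choice(DEPTHS)
        blocks = [
            {"k": rng.choice(KERNEL_SIZES), "e": rng.choice(EXPANSION_RATIOS)}
            for _ in range(depth)
        ]
        config.append(blocks)
    return config


def center_crop_kernel(kernel, size):
    """Centered size x size window of a square kernel, so small kernels share the large kernel's weights."""
    start = (len(kernel) - size) // 2
    return [row[start:start + size] for row in kernel[start:start + size]]


rng = random.Random(0)  # fixed seed so the sample is repeatable
subnet = sample_subnet(num_stages=3, rng=rng)
print([len(stage) for stage in subnet])  # sampled depth of each stage

# A 7x7 kernel whose entries encode their position (row * 7 + col).
kernel_7x7 = [[row * 7 + col for col in range(7)] for row in range(7)]
print(center_crop_kernel(kernel_7x7, 3))
```

Because every sampled sub-network reads its weights from the same super network, its accuracy can be evaluated directly, without training each candidate from scratch.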
