Towards Benchmarking AIOT Device based on MCU



SLIDE 1

Towards Benchmarking AIOT Device based on MCU

Dong Li
Seaway Technology Inc.; ICT, CAS
2019-11-15

SLIDE 2

Bench19 Seaway tech.

Outline

  • 01 MCU-based AIOT Device and Benchmarking
  • 02 SeawayRTOS Intro. & Auditing Kernel
  • 03 Early Experiments for Benchmarking
  • 04 Benchmarking Goal and Method

SLIDE 3

01 MCU-based AIOT Device and Benchmarking

SLIDE 4

MCU-based AIOT Device

  • 1. Tiny smart devices with computing ability are already cheap and everywhere.
  • 2. The future of machine learning will be tiny.
SLIDE 5

MCU and Sensors Are Already in the Milliwatt Range

  • 6-inch display: 400 mW
  • 4G cell radio: 800 mW
  • Low-power BLE 4.0 & WiFi: 100 mW
  • Gyroscope sensor: 130 mW
  • GPS: 180 mW
  • 1/4 CMOS camera: 300 mW

  • ARM & Princeton [arXiv:1905.12107]

SLIDE 6

Deep Learning Works Well and Is Energy-Efficient on MCUs

  • 1. ARM CMSIS-5 for Cortex-M
      • CMSIS-NN
      • uTensor
  • 2. TensorFlow Lite for MCU
      • Person detection
      • Speech keyword spotting
      • Classify physical gestures
  • 3. Microsoft Embedded Learning Library (ELL)

Boards shown: ESP32 SoC (WiFi and BLE); SparkFun Edge with Apollo3; Nordic nRF52840 (BLE); STM32F746 Discovery kit

SLIDE 7

Existing MCUs and New AIOT Low-Power Processors

Existing MCU/DSP

  • 1. MCU: 40~200 MHz
  • 2. RAM (SDRAM): 32 KB ~ 512 KB
  • 3. ROM (Flash): 512 KB ~ 1 MB
  • 4. Energy: ~100 uA/MHz (1.2 V - 5 V)

New AIOT Processor (MCU/DSP+NPU)

  • 1. MCU+NPU by ARM or RISC-V
  • 2. MCU+DSP+ Spec. NN Accelerator by ARM/RISC-V/FPGA
  • 3. MCU+PIM (Process in Memory) chip

Boards shown: ESP32 by TFLite for Face Recognition; ICT RISC-V MCU+NPU FPGA Board
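The energy figure for existing MCUs (~100 uA/MHz at 1.2 V - 5 V) can be turned into an approximate active power with P = I × V. The following is a minimal sketch, assuming current scales linearly with clock frequency. The function name and the sample operating point (100 MHz at 3.3 V) are illustrative assumptions, not values from the slides.

```python
def active_power_mw(ua_per_mhz: float, freq_mhz: float, volts: float) -> float:
    """Approximate active power in milliwatts, assuming current scales linearly with clock."""
    current_ma = ua_per_mhz * freq_mhz / 1000.0  # uA -> mA
    return current_ma * volts                    # mA * V = mW


# Illustrative operating point (assumed): 100 uA/MHz, 100 MHz clock, 3.3 V supply
power = active_power_mw(100.0, 100.0, 3.3)
print(f"{power:.1f} mW")
```

Under these assumptions the core draws about 10 mA, which places it in the same tens-of-milliwatts range as the sensors and radios listed earlier in the deck.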

SLIDE 8

Benchmarking Goal: The Best Shape

Metrics:

  • Accuracy
  • Energy Consumption (picojoules per op)
  • Max RAM
  • Cost
  • Max ROM
  • Computing Performance

A spindle shape is the best shape.

SLIDE 9

02 SeawayRTOS Intro. & Auditing Kernel

SLIDE 10

SeawayRTOS for AIOT Devices

KB-Level Runtime

  • Online AIOT App Store
  • Support JavaScript and Python
  • ROM<100K, RAM<2K
  • Function Migration

KB-Level EdgeStack

  • Support for MQTT, CoAP and HTTP
  • WiFi, BLE, LoRa, NB-IoT and Zigbee
  • ROM<32K, RAM<2K
  • Resp to Req <200 ms

KB-Level Seaway RTOS Kernel

  • Auditing Kernel
  • Active Sleep Mode
  • ROM<10K & RAM<1K & TCB<10B
  • Task Fail Rate <0.1%

Architecture diagram labels: Data/Ins. Bus; I/O Bus; Little Core; Sensor Hub; Sensors; Actua.; Big Core OS; AI core; Inference; Memory Controller; Comm. Controller; EdgeStack; Seaway Kernel; HAL & BSP; Seaway Runtime; AIOT Framework; App; Energy Opt.; Files
SLIDE 11

Seaway Runtime

Technical Features

  • 1. AIOT App Store
      • Method for executing AIOT Apps without writing them to storage
      • Quasi-single-machine programming for the edge domain
  • 2. AIOT Runtime Development
      • on Kernel: Native C/C++
      • on Runtime: JavaScript/Python
      • Dynamic Task Allocation and Execution
  • 3. Less code than traditional embedded programs

Experiment result (by ECMA-262 benchmark)

Evaluation index | WebletScript | JerryScript | Duktape | Espruino
Compatibility (%) | 58.6 | 99.7 | 99.4 | 66.5
Footprint (KB) | 80 | 168 | 184 | 231

SLIDE 12

Seaway EdgeSuite

  • End AIOT Device: Seaway RTOS
  • Edge AIOT Device: Seaway Edge
  • Cloud: Seaway Cloud

Developers now need only one application for the whole end-edge-cloud system.

SLIDE 13

Auditing Kernel Design

Design Goals

  • Kernel information monitoring for an event-driven RTOS should be enabled inside the kernel
  • A lightweight resource auditing tool: less than 1 KB ROM and 1 KB RAM
  • Early security warning when an abnormal resource usage pattern is captured

SLIDE 14

Auditing Kernel Design

  • Process
      • Confirm the execution entity of a task
      • Locate the executable code segment
  • Event
      • The event statistics data of a task in the kernel
      • Identify abnormal event usage
  • Hardware resource usage
      • Quantity and pattern of hardware resource consumption, including processor, memory, radio and sensors

SLIDE 15

Seaway Resource Auditing Overview

  • 1. The Resource Auditor Module collects the running information and generates the log data of an AIoT device.
  • 2. Seaway analyzes the log data on Edge devices according to the corresponding resource usage model.
  • 3. The AIoT devices receive the performance status.

SLIDE 16

Kernel Auditing Architecture

  • Data Hook
      • Process-Event Model
      • Hardware Time-Base Model
  • Data Processing Module
  • Warning Handle Module
  • Kernel inner loop function
      • The entity of a task
      • The executable code segment
      • Set up hooks in basic kernel functions such as do_poll / do_event
      • Save the data in the local file system
      • Or send them out to the gateway for analysis
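The idea of placing hooks in basic kernel functions such as do_poll / do_event can be sketched as a wrapper that records each call before passing it on. This is only an illustration in Python, not the SeawayRTOS implementation: the `audited` helper, the `audit_log` counter and the stand-in function bodies are hypothetical, and only the names do_poll and do_event come from the slide.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical audit log: counts of (hook, process, event) tuples.
audit_log: Counter = Counter()

# Records what the stand-in kernel actually handled.
handled: List[Tuple[str, str]] = []


def audited(hook_name: str, fn: Callable[[str, str], None]) -> Callable[..., None]:
    """Wrap a kernel function so every call is recorded before it runs."""
    def wrapper(process: str, event: str = "poll") -> None:
        audit_log[(hook_name, process, event)] += 1
        fn(process, event)
    return wrapper


def do_event(process: str, event: str) -> None:
    """Stand-in for the kernel's event dispatch (assumed behaviour)."""
    handled.append((process, event))


def do_poll(process: str, event: str) -> None:
    """Stand-in for the kernel's poll handling (assumed behaviour)."""
    handled.append((process, event))


# Install the hooks by rebinding the kernel entry points.
do_event = audited("do_event", do_event)
do_poll = audited("do_poll", do_poll)

do_event("tcpip", "packet_in")
do_event("tcpip", "packet_in")
do_poll("sensor")
print(dict(audit_log))
```

The accumulated counts are the kind of data that could then be saved locally or sent to a gateway for analysis, as the slide describes.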

SLIDE 17

Capture the Kernel Data for Hardware Resources

  • Hardware resource scheduling
      • Quantity and pattern of the consumption of events and tasks

Category | Component | Parameter | Kernel Events
Network Data Package | Network | wifi_init_result | WiFi init
Network Data Package | Network | wifi_mode | WiFi set_mode
Network Data Package | Network | wifi_state | WiFi On/Off
Network Data Package | Network | source (source IP) | package_transfer
Network Data Package | Network | destination (destination IP) | package_transfer
System Scheduling Data | Task Information | taskID | xTaskCreate
System Scheduling Data | Task Information | task_running_frequency | portYIELD, xPortSysTickHandler
Hardware Module Usage | CPU | CPU_Frequency | CPU frequency switch
Hardware Module Usage | Sensors | enviroment_data | sensor_get_data
Hardware Module Usage | Sensors | Sensors_Frequency | sensor frequency switch

SLIDE 18

03 Experiments for Getting Bench Score

SLIDE 19

Experiment Setup

  • SeawayRTOS
      • An event-driven scheduling system
      • Multi-threaded
      • Lightweight threading technology: Protothreads
      • File system (Coffee)
      • Network support: LwIP
      • OTA
  • CC2538 + ESP32
      • An ARM Cortex-M3 with up to 32 MHz clock speed
      • 32 KB of RAM
      • 256 KB flash
      • Zigbee in CC2538
      • WiFi/BLE in ESP32
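An event-driven scheduler with Protothreads-style lightweight tasks can be illustrated with Python generators, which are likewise stackless and resume at the point where they last waited. This is an analogy under assumptions, not SeawayRTOS code: the `blinker` task, the `run` loop and the event names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Generator, List, Tuple

Event = str
Proto = Generator[None, Event, None]

trace: List[str] = []


def blinker() -> Proto:
    """Stackless task: resumes once per delivered event, like a protothread."""
    count = 0
    while True:
        event = yield            # wait for the next event
        count += 1
        trace.append(f"blinker got {event} #{count}")


def run(processes: Dict[str, Proto], queue: Deque[Tuple[str, Event]]) -> None:
    """Deliver queued events to their target process until the queue is empty."""
    for proc in processes.values():
        next(proc)               # run each task up to its first wait point
    while queue:
        target, event = queue.popleft()
        processes[target].send(event)


events: Deque[Tuple[str, Event]] = deque([("blinker", "timer"), ("blinker", "timer")])
run({"blinker": blinker()}, events)
print(trace)
```

Each task keeps only its local variables between events, which is why this style suits devices with tens of kilobytes of RAM such as the CC2538 listed in the setup.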

SLIDE 20

Evaluation

We capture the kernel event data and process information of a benchmark task running on SeawayRTOS.

SLIDE 21

The Analysis Result of the TCP/IP Experiment with the Process-Event Model

  • The Process-Event Analysis Result
      • There are different operations in Periods 1056 & 1057 compared with the base behavior of this benchmarking task
      • The system is using the radio to send data
      • Warning generated

Figure axis label: period
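One simple way to flag periods whose operations differ from the base behavior is to compare the (process, event) pairs seen in each period against the set seen during a baseline run. The slides do not specify how the Process-Event Model is computed, so the set-difference rule below is an assumption, and the pair names are made up. Only the period numbers echo the slide.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (process, event)


def abnormal_periods(baseline: Set[Pair], periods: Dict[int, List[Pair]]) -> Dict[int, Set[Pair]]:
    """Return, per period, the (process, event) pairs never seen in the baseline."""
    report: Dict[int, Set[Pair]] = {}
    for period, pairs in periods.items():
        unseen = set(pairs) - baseline
        if unseen:
            report[period] = unseen
    return report


# Hypothetical data: a baseline profile and three observed periods.
baseline = {("sensor", "sample"), ("tcpip", "timer")}
observed = {
    1055: [("sensor", "sample"), ("tcpip", "timer")],
    1056: [("sensor", "sample"), ("radio", "send")],
    1057: [("radio", "send")],
}
warnings = abnormal_periods(baseline, observed)
print(sorted(warnings))
```

A real model would probably also compare event frequencies, not just membership, but the membership check is enough to show how an unexpected radio send stands out.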

SLIDE 22

The Analysis Result of the Time-Base Model

  • The Time-Base Analysis Result
      • We obtained the working-state information of the CPU, memory, radio and sensors
      • There are suspicious operations in Periods 5 & 6 compared with the normal behavior of this application
      • The system is using the radio to listen for other data
      • A warning is generated, and the task should be suspended until the administrator decides

Figure axis label: period
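A time-based check can be sketched as comparing how long each hardware component is active in a period against its normal value. The threshold rule (more than twice the baseline), the component names and all numbers below are assumptions for illustration. The slides only state that Periods 5 & 6 showed suspicious radio activity.

```python
from typing import Dict, List

# Hypothetical normal active time per period, in milliseconds.
BASELINE_MS = {"cpu": 20.0, "radio": 5.0, "sensors": 2.0}


def suspicious(usage_ms: Dict[str, float], baseline_ms: Dict[str, float],
               factor: float = 2.0) -> List[str]:
    """Components whose active time in a period exceeds factor x baseline."""
    return sorted(name for name, ms in usage_ms.items()
                  if ms > factor * baseline_ms[name])


# Hypothetical measurements for three consecutive periods.
periods = {
    4: {"cpu": 21.0, "radio": 5.5, "sensors": 2.0},
    5: {"cpu": 22.0, "radio": 40.0, "sensors": 2.0},
    6: {"cpu": 20.0, "radio": 38.0, "sensors": 2.1},
}
flagged = {}
for period, usage in periods.items():
    hits = suspicious(usage, BASELINE_MS)
    if hits:
        flagged[period] = hits
print(flagged)
```

A flagged period would be the point at which the kernel raises a warning and suspends the task pending an administrator decision.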

SLIDE 23

04 Benchmarking Goal and Method

SLIDE 24

  • 1. An open-source testbed board with sensors and radios
  • 2. The main processor

Board labels:

  • A: Low-power BLE/WiFi module
  • B: MIC
  • C: Accelerometers
  • D: Temperature & humidity
  • E: multi-threaded Protothreads
  • F: CMOS image sensor
  • G: PIR (motion) sensor
  • H: GPS

SLIDE 25

Run the Benchmark Tasks on Datasets

  • MNIST database: handwritten digits
  • CIFAR-10: objects classification
  • Wechat Audio 100: keyword spotting (by Seaway Tech.)
  • Chars74K dataset: character recognition
  • Band Accelerator Data, 100 hours: pattern recognition
  • Band Heart Rate, 100 hours: for DL and SVM alg. (by Seaway Tech.)

We can provide some baseline results on these datasets with our own implementation on STM32 and ESP32.
SLIDE 26

Benchmark Design

First satisfy:

  • 1. Benchmark alg. accuracy > baseline
  • 2. Max ROM < baseline
  • 3. Max RAM < baseline
  • 4. Processor cost

Then compare: how much energy a single benchmark task costs, given the picojoules per op.
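The two-stage design (pass the baseline gates first, then compare energy per task) can be sketched as a filter followed by a sort. This is a minimal illustration under assumptions: the `Result` fields, the sample numbers and the choice to compute task energy as ops × pJ/op are hypothetical, and the fourth gate (processor cost) is left out because the slide gives no criterion for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    name: str
    accuracy: float      # fraction correct on the benchmark dataset
    rom_kb: float        # peak ROM usage
    ram_kb: float        # peak RAM usage
    ops: float           # operations per benchmark task
    pj_per_op: float     # picojoules per operation

    def energy_uj(self) -> float:
        """Energy of one benchmark task in microjoules (1 uJ = 1e6 pJ)."""
        return self.ops * self.pj_per_op / 1e6


def rank(results: List[Result], base: Result) -> List[Result]:
    """Keep results that pass the baseline gates, then sort by energy per task."""
    passed = [r for r in results
              if r.accuracy > base.accuracy
              and r.rom_kb < base.rom_kb
              and r.ram_kb < base.ram_kb]
    return sorted(passed, key=Result.energy_uj)


# Hypothetical baseline and candidate submissions.
base = Result("baseline", 0.90, 512, 128, 1e6, 50)
a = Result("A", 0.92, 400, 100, 1e6, 20)
b = Result("B", 0.95, 300, 64, 2e6, 5)
c = Result("C", 0.85, 200, 32, 1e6, 1)   # fails the accuracy gate
ranked = rank([a, b, c], base)
print([(r.name, r.energy_uj()) for r in ranked])
```

Candidate C uses the least energy but is excluded, which reflects the slide's ordering: accuracy and memory limits are preconditions, and energy decides only among submissions that meet them.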

SLIDE 27

Thanks

Dong Li, Seaway Technology Inc.
lidong@haiwei.tech

SLIDE 28

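Comparison

Item | AliOS Things | Amazon FreeRTOS | Microsoft ThreadX | Seaway
Licensing | Community edition open source | Small part open source | Closed source | Community edition open source
Base kernel footprint | 8KB | 8KB | 10KB | 8KB
Device-side application-layer protocol stack | Separate stack per protocol, 80K | MQTT protocol stack, 20K | Proprietary protocol, 80K | MCH integrated stack, 32KB
Third-party application support | Device and cloud separate | Device-cloud integrated | Device-cloud integrated | End-edge-cloud integrated
IoT cloud service | Tied to Alibaba Cloud | Tied to AWS | Tied to Azure | Free choice

Rows whose column alignment did not survive extraction (values in source order):

  • ML inference model support: Supported; Supported; Supported
  • Low-power control: Supported; Supported (<0.1 W)
  • Edge computing support: Supported; Supported; Supported
  • Native security mechanism: Supported
  • AI math library support: up to Cortex-A class; up to Cortex-M class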