Quest(-V): A Secure and Predictable System for Smart IoT Devices

SLIDE 1

Quest(-V): A Secure and Predictable System for Smart IoT Devices

Richard West richwest@cs.bu.edu

Computer Science

SLIDE 2

Emerging “Smart” Devices Need an OS

  • Multiple cores
  • GPIOs
  • PWM
  • Virtualization support
  • Integrated Graphics
  • Various bus interfaces
  • Timing + data security requirements

SLIDE 3

Recap: Quest-V Separation Kernel

[Figure: Quest-V sandboxes 1, 2, ..., M, each with its own monitor, VCPUs, address space, threads, I/O devices, and PCPU(s), linked by communication + migration]

Exploit VT-x/EPT capabilities on Intel multicore processors for efficient sandboxing

SLIDE 4

VCPUs in Quest(-V)

[Figure: threads in an address space mapped to Main VCPUs and I/O VCPUs, which are mapped to PCPUs (cores)]

  • Temporal isolation between VCPUs
  • Guarantee budget C every T cycles (or time units)
  • I/O VCPUs use simpler bandwidth preservation scheme
  • Reduces timer reprogramming overheads for short-lived interrupts

SLIDE 5

Proposed Work

  • Implement and study Quest(-V) on Intel SBCs
  • Port of Quest to Intel Galileo [Done]
  • Port of Quest(-V) to Intel Edison and Minnowboard Max [Quest is working]
  • Qduino API [Version 1 complete]
      – Now working on QduinoMC [In progress]
  • IoT “smart” devices/apps: 3D printing / manufacturing, robotics, secure home automation, UAVs, etc. [In progress]

SLIDE 6

Smart Devices

  • Dumb device?
      – Requires remote inputs to function
      – No autonomy
  • Smart device?
      – Ability to make own decisions, at least partly, based on sensory inputs that determine the state of the environment and the device itself
  • e.g., Smart 3D printer
      – Spool requests via webserver
      – High level (STL file) requests rather than g-codes
      – Local slicer engine & g-code parser
      – Local verifier for “correctness” of requests
      – Possible communication/coordination with other smart devices

SLIDE 7

Developments 1/2

  • Built 3D printer controller circuit using:
      – MinnowMax/Turbot
      – RAMPS 1.4
      – ADS7828 I2C Analog-to-Digital Converter
      – 4 x 4988 Pololu Stepper Motor drivers
      – PNP/NPN transistors, resistors etc. for level shifting
  • Tested on a Printrbot Simple Metal
      – See: www.cs.bu.edu/fac/richwest/smartprint3d.php

SLIDE 8

Developments 2/2

  • Ported Marlin 3D printer firmware to Yocto Linux
      – Used Intel IoT devkit libmraa library to interface w/ I2C ADC and GPIOs via sysfs
  • Ported Quest to MinnowMax and Turbot
      – Developed test scenarios for 3D print objects
      – Details to follow
  • Papers
      – Qduino – RTSS'15
      – Quest-V – ACM TOCS

SLIDE 9

Marlin on Arduino

  • One loop and two timer interrupt handlers
      – Loop: read G-code commands, translate them to motor movements and fan/heater operations
      – A high frequency, sporadic timer interrupt to drive motors (up to 10 kHz)
          • Trapezoidal speed control
      – A low frequency, periodic timer interrupt to read extruder temperature (1 kHz)

SLIDE 10

Real-Time Challenges

  • Nanosleep timing for stepper motor control
  • Matching extrusion rate with bed motion
  • Let:
      – B = gear pitch (e.g., 2 mm for GT2 pulley)
      – C = gear tooth count (e.g., 20)
      – S = stepper motor steps per revolution (e.g., 200)
      – α = microstepping (e.g., 16 for 4988 driver)
      – V = feedrate in given axis (e.g., 125 mm/s)
  • GPIO stepper pulse frequency, F:
      – F = (V * S * α) / (B * C) = 10 kHz using above params
      – Requires 100 microsecond pulse timing
      – Won't work with Linux scheduling accuracy!
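
A quick check of the pulse-frequency arithmetic above, as a minimal C++ sketch. The function and parameter names are illustrative only (they are not taken from Marlin or Quest); units are assumed to be mm, mm/s, Hz, and microseconds.

```cpp
#include <cassert>

// F = (V * S * alpha) / (B * C), in Hz.
// V: feedrate (mm/s), S: full steps per revolution, alpha: microsteps per step,
// B: belt/gear pitch (mm per tooth), C: pulley tooth count.
double step_frequency_hz(double feedrate_mm_s, int steps_per_rev,
                         int microsteps, double pitch_mm, int teeth) {
    assert(pitch_mm > 0.0 && teeth > 0);
    return (feedrate_mm_s * steps_per_rev * microsteps) / (pitch_mm * teeth);
}

// Time between consecutive step pulses, in microseconds.
double step_period_us(double freq_hz) {
    assert(freq_hz > 0.0);
    return 1e6 / freq_hz;
}
```

With the slide's example values, step_frequency_hz(125, 200, 16, 2, 20) gives 10000 Hz, so one pulse every 100 microseconds.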

SLIDE 11

Marlin on Linux/MinnowBoard Max

  • Ported Marlin to a Linux program
      – Replaced hardware timer interrupts with high resolution software timers
          • Linux hrtimer-based nanosleep
      – Replaced architecture-dependent I/O operations with mraa library functions
      – Cons: approach fails to utilize underlying hardware parallelism

SLIDE 12

Marlin on Linux

[Figure: Marlin data flow on Linux. G-code is read from the file system and coordinates are translated to steps; each block in the buffer contains the steps for one command; steps are extracted from the block to pulse the stepper motors. The temperature is read and used for PID control, and the temperature PID output adjusts the fan & heater.]

SLIDE 13

Quest on MinnowBoard Max

  • Ported Quest to MinnowBoard Max
      – Added I2C Driver
      – Added GPIO Driver
      – Updated ACPI firmware to latest version
  • Implemented partial mraa library on Quest
      – I2C Module (read/write bytes on I2C bus)
      – GPIO Module (get/set value+direction of GPIOs)
  • Qduino Framework

SLIDE 14

Marlin on Quest/MinnowMax

  • Three Qduino loops
      – Loop 1: command reading and path planning
          • Calculate & buffer steps+direction along each axis
      – Loop 2: motor driving
          • Smallest period and largest CPU utilization
      – Loop 3: temperature reading & adjustment
          • Largest period and smallest utilization

SLIDE 15

Marlin on Quest/MinnowMax

[Figure: three Qduino loops, each on its own VCPU, running above the Qduino library and Quest kernel on a two-core MinnowBoard. Labels: G-code translation and temperature PID control (Loop 1); extract steps from the block and pulse the steppers (Loop 2); read temperature and adjust fan/heater (Loop 3). The loops are connected by a buffer and by the temperature PID output.]

SLIDE 16

Qduino

  • Qduino – Enhanced Arduino API for Quest
      – Parallel and predictable loop execution
      – Real-time communication b/w loops
      – Predictable and efficient interrupt management
      – Real-time event delivery
      – Backward compatible with Arduino API
      – Simplifies multithreaded real-time programming

SLIDE 17

Interleaved Sketches

// Sketch 1: toggle GPIO pin 9 every 2s
int val9 = 0;

void setup() {
  pinMode(9, OUTPUT);
}

void loop() {
  val9 = !val9; // flip the output value
  digitalWrite(9, val9);
  delay(2000); // delay 2s
}

// Sketch 2: toggle pin 10 every 3s
int val10 = 0;

void setup() {
  pinMode(10, OUTPUT);
}

void loop() {
  val10 = !val10; // flip the output value
  digitalWrite(10, val10);
  delay(3000); // delay 3s
}

How do you merge the sketches and keep the correct delays?

SLIDE 18

Interleaved Sketches

int val9 = 0, val10 = 0;
// millis() returns unsigned long; a 16-bit int would overflow after ~32s
unsigned long next_flip9 = 0, next_flip10 = 0;

void setup() {
  pinMode(9, OUTPUT);
  pinMode(10, OUTPUT);
}

void loop() {
  if (millis() >= next_flip9) {
    val9 = !val9; // flip the output value
    digitalWrite(9, val9);
    next_flip9 += 2000;
  }
  if (millis() >= next_flip10) {
    val10 = !val10; // flip the output value
    digitalWrite(10, val10);
    next_flip10 += 3000;
  }
}

  • Do scheduling by hand
  • Inefficient
  • Hard to scale

SLIDE 19

Qduino Multi-threaded Sketch

int val9 = 0, val10 = 0;
int C = 500, T = 1000;

void setup() {
  pinMode(9, OUTPUT);
  pinMode(10, OUTPUT);
}

void loop(1, C, T) {
  val9 = !val9; // flip the output value
  digitalWrite(9, val9);
  delay(2000);
}

void loop(2, C, T) {
  val10 = !val10; // flip the output value
  digitalWrite(10, val10);
  delay(3000);
}

SLIDE 20

Qduino Organization

[Figure: Qduino software stack. User level: a sketch (loop1 ... loopN) on the Qduino libs, alongside Quest native apps. Kernel level: GPIO, SPI, and I2C drivers. Hardware: x86 SoC (Galileo, Edison, Minnowboard).]

SLIDE 21

Qduino New APIs

Function signatures by category:

  • Structure
      – loop(loop_id, C, T)
  • Interrupt
      – interruptsVcpu(C,T) ← I/O VCPU
      – attachInterruptVcpu(pin,ISR,mode,C,T) ← Main VCPU
  • Spinlock
      – spinlockInit(lock)
      – spinlockLock(lock)
      – spinlockUnlock(lock)
  • Four-slot
      – channelWrite(channel,item)
      – item channelRead(channel)
  • Ring buffer
      – ringbufInit(buffer,size)
      – ringbufWrite(buffer,item)
      – ringbufRead(buffer,item)
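
To illustrate the kind of bounded FIFO the ring buffer calls above suggest, here is a minimal C++ sketch. It is not the Qduino implementation: the RingBuf class and its method names are hypothetical, and it is not thread-safe as written (a real inter-loop buffer would need the spinlocks listed above or a lock-free design).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity FIFO ring buffer (illustration only).
template <typename T, std::size_t N>
class RingBuf {
    static_assert(N > 0, "capacity must be positive");
public:
    // Appends an item; returns false and leaves the buffer unchanged when full.
    bool write(const T& item) {
        if (count_ == N) return false;
        data_[(head_ + count_) % N] = item;  // slot just past the newest item
        ++count_;
        return true;
    }
    // Copies the oldest item into `out`; returns false when empty.
    bool read(T& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % N;  // advance past the consumed slot
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }
private:
    T data_[N] = {};
    std::size_t head_ = 0;   // index of the oldest item
    std::size_t count_ = 0;  // number of stored items
};
```

A producer loop would call write() for each buffered block of steps and a consumer loop would call read(); a false return tells the caller to retry in its next period instead of blocking.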

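SLIDE 22

Qduino Event Handling

[Figure: Qduino event handling path across hardware, kernel, and user levels. A hardware interrupt from the GPIO expander reaches the GPIO driver; the interrupt bottom half runs on an I/O VCPU and wakes up the user interrupt handler (registered with attachInterruptVcpu), which runs on a Main VCPU; the sketch thread runs on another Main VCPU. The scheduler maps VCPUs to CPU core(s).]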

SLIDE 23

Qduino Temporal Isolation

[Figure: counter (x10^4) vs. time (periods, 100T to 500T); series: (50,100),2; (50,100),4; (70,100),2; (70,100),4; (90,100),2; (90,100),4; Linux,2; Linux,4]

  • Foreground loop increments counter during loop period
  • 2-4 background loops act as potential interference, consuming remaining CPU capacity
  • No temporal isolation or timing guarantees w/ Linux

SLIDE 24

Qduino Rover

  • Autonomous Vehicle
  • Collision avoidance using ultrasonic sensor
  • Two tasks:
      – A sensing task detects distance to an obstacle – delay(200)
      – An actuation task controls the motors – delay(100)

SLIDE 25

Rover Performance

  • Measure the time interval between two consecutive calls to the motor actuation code
  • Clanton Linux single loop
      – Delay from both sensing and actuation task
  • Qduino multi-loop
      – No delay from sensing loop
      – No delay from sensor timeout
  • The shorter the worst case time interval, the faster the vehicle can drive

[Figure: time (milliseconds) vs. sample #; series: Clanton Single-loop, Qduino Multi-loop, Qduino Single-loop, Clanton Interrupt]

SLIDE 26

RacerX Autonomous Vehicle

SLIDE 27

Edison 3D Printer Controller

[Figure: Quest-V on the Edison (dual-core Atom Silvermont plus Quark MCU) with three sandboxes, each with its own monitor, memory, core(s), VCPU(s), and I/O devices, split into trusted and untrusted parts. Sandbox 1: real-time sensing & control (I/O devices, e.g., motors, extruder, temp sensors). Sandbox 2: real-time job scheduling (I/O devices, e.g., flash storage). Sandbox 3: Linux with web server / verification (I/O devices, e.g., NIC), connected to the Internet. Sandboxes interact through comms.]

SLIDE 28

MinnowMax 3D Printer Controller

SLIDE 29

MinnowMax 3D Printer Controller

[Figure: circuit schematic. The MinnowBoard Max 26-pin header (GND, +5V, +3.3V, SCL, SDA, PWM0, SPI_MISO, SPI_MOSI, FAN, HEATER, and GPIO lines for X/Y/Z/E0 STEP, DIR, and ENABLE plus X_STOP, Y_STOP_max, Z_STOP) connects to a RAMPS 1.4 board carrying four 4988 stepper drivers (X, Y, Z, E0). An ADS7828 I2C ADC reads the T0 thermistor (THERM0). A 7805 5V regulator, 7404, 1N4728, and 1K resistor appear in the Z_STOP sensor circuit. A 2N3906/2N3904 transistor circuit with 1K and 4.7K resistors (x2) drives FAN_IN (D9) and HEAT_IN (D10).]

SLIDE 30

Future Directions

  • QduinoMC
      – API support to map loops to cores
      – Load balancing via MARACAS [RTSS'16] framework
      – Pub/sub communications between Quest-V sandboxes
          • e.g., Linux ↔ Quest
  • QROS
      – Legacy Linux ROS nodes communicate w/ time-critical Quest services

SLIDE 31

Future Directions

  • Smart Devices / Apps
      – Use Intel SBCs/SoCs (Up board, Edison, MinnowMax, Celeron Braswell, Skylake U, Kaby/Apollo Lake NUCs)
      – Energy + CPU + GPU + latency-sensitive I/O requirements
      – RacerX autonomous rover
      – Smart drones
          • Configurable mission objectives (indoor / outdoor)
          • Search & rescue, surveillance, package delivery, SLAM, target tracking
          • Real-time adaptive control (e.g. in windy conditions)
      – Biokinematic / body sensor network (Edison/Curie)

SLIDE 32

Future Directions

  • Quest-V on Edison, MinnowMax, Up board, other Intel SBCs/SoCs
      – Mixed-criticality: Linux + Quest
      – Mixed-criticality scheduling (ECRTS'16)
      – TMR fault tolerance using replicated sandboxes

SLIDE 33

MARACAS Framework

  • Quest memory+cache-aware scheduling framework
      – Supports VCPU load balancing to share background cycles across cores
          • Background cycles: 1 - C/T
          • Uses h/w perf counters to identify bus congestion
          • Congestion? Throttle select cores with available background cycles
          • Reduces memory/bus congestion while guaranteeing VCPU foreground timing requirements

SLIDE 34

MARACAS Framework

  • Avg memory request latency = Occupancy / Requests
  • Occupancy = UNC_ARB_TRK_OCCUPANCY.ALL
      – Cycles weighted by queued memory requests
  • Requests = UNC_ARB_TRK_REQUEST.ALL
      – # of requests to memory controller request queue
  • If latency exceeds threshold apply weighted throttling to cores
  • Use COLORIS [PACT'14] dynamic page coloring for cache isolation
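
The latency metric and the background-cycle fraction (1 - C/T) can be sketched in C++ as below. This is an illustration under stated assumptions, not MARACAS code: the function names are hypothetical, the counter values are assumed to have been sampled elsewhere, and the weighting used for throttling is not specified in the slides, so only the threshold check is shown.

```cpp
#include <cassert>
#include <cstdint>

// Average memory request latency (cycles) = occupancy / requests,
// where both values are counter deltas over one sampling window.
double avg_mem_latency(std::uint64_t occupancy, std::uint64_t requests) {
    if (requests == 0) return 0.0;  // no memory traffic in this window
    return static_cast<double>(occupancy) / static_cast<double>(requests);
}

// Fraction of a VCPU's period available as background cycles: 1 - C/T.
double background_fraction(double C, double T) {
    assert(T > 0.0 && C >= 0.0 && C <= T);
    return 1.0 - C / T;
}

// Hypothetical policy check: throttle only when latency exceeds the threshold.
bool should_throttle(std::uint64_t occupancy, std::uint64_t requests,
                     double threshold_cycles) {
    return avg_mem_latency(occupancy, requests) > threshold_cycles;
}
```

For example, a VCPU with (C, T) = (50, 100) has half of its period available as background cycles, which is the capacity a throttling policy could reclaim without touching its foreground guarantee.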

SLIDE 35

Lessons Learned

  • Intel SBCs for “smart” devices
      – Multiple cores (good for multi-tasking)
      – VT-x capabilities for security/isolation/fault tolerance
      – GPIOs for interfacing sensors + actuators
      – PWMs for motor & servo control
      – Serial interfaces for device communication
      – Shared caches + memory bus affect temporal isolation (not good for real-time!)
          • ARINC 653 requires space-time isolation b/w cores

SLIDE 36

Wish List 1/2

  • Intel SBCs for “smart” devices
      – Temporal isolation b/w cores
          • TI ARM PRU-like features for dedicated core(s)
          • Quark offers something close on the Edison
          • Support for cache + bus isolation (way partitioning, page coloring, TDMA bus management?)
      – Better GPU support
          • Needed for vision+AI+deep learning tasks
          • Georgia Tech AutoRally vehicle uses Mini-ITX + Nvidia GTX 750 Ti PCIe card, which is too power-hungry and heavy
      – Low-wattage “PC” with GPIOs, serial buses, GPU à la Nvidia Jetson (but better!)
          • e.g., “smart” drone has energy and weight restrictions

SLIDE 37

Wish List 2/2

  • Intel SBCs for “smart” devices
      – Simplified VT-x support
          • Basic memory partitioning b/w sandboxes (no EPT walking)
          • Like segmentation with simplified VMCS
          • Simplified IOMMU w/ DMA to sandbox physical offset address
      – Tagged memory for confidentiality + integrity on secure information flows between sandboxes
      – H/W-assisted port-based I/O interposition
          • To prevent sandbox discovery/access to unauthorized devices

SLIDE 38

The End

? || /* */

SLIDE 39

Extra Slides

SLIDE 40

Goals

  • Develop high-confidence (embedded) systems

– Mixed criticalities: timeliness and safety

  • Predictable
  • Secure
  • Safe / Fault tolerant
  • Efficient

SLIDE 41

Target Applications

  • Healthcare
  • Avionics
  • Automotive
  • Factory automation
  • Robotics
  • Space exploration
  • Internet-of-Things (IoT)
  • Industry 4.0 smart factories
  • Smart drones, other devices

SLIDE 42

Internet of Things

  • Number of Internet-connected devices > 12.5 billion in 2010
  • World population > 7 billion (2015)
  • Cisco predicts 50 billion Internet devices by 2020
  • Challenges:
      – Secure management of data
      – Reliable + predictable data processing & exchange
      – Device interoperability

SLIDE 43

Background: Quest Real-Time OS

  • Initially a “small” RTOS
  • ~30KB ROM image for uniprocessor version
  • Page-based address spaces
  • Threads
  • Dual-mode kernel-user separation
  • Real-time Virtual CPU (VCPU) Scheduling
  • Later SMP support
  • LAPIC timing

[Figure: Quest positioned between FreeRTOS, uC/OS-II etc. and Linux, Windows, Mac OS X etc.]

SLIDE 44

From Quest to Quest-V

  • Quest-V for multi-/many-core processors
      – Distributed system on a chip
      – Time as a first-class resource
          • Cycle-accurate time accountability
      – Separate sandbox kernels for system components
      – Memory isolation using h/w-assisted memory virtualization
      – Also CPU, I/O, cache partitioning
  • Focus on safety, efficiency, predictability + security

SLIDE 45

Related Work

  • Existing virtualized solutions for resource partitioning
      – Wind River Hypervisor, XtratuM, PikeOS, Mentor Graphics Hypervisor
      – Xen, Oracle PDOMs, IBM LPARs
      – Muen, (Siemens) Jailhouse

SLIDE 46

SS (Sporadic Server) Scheduling

  • Model periodic tasks
      – Each SS has a pair (C,T) s.t. a server is guaranteed C CPU cycles every period of T cycles when runnable
          • Guarantee applied at foreground priority
          • Background priority when budget depleted
      – Rate-Monotonic Scheduling theory applies

SLIDE 47

PIBS Scheduling

  • IO VCPUs have utilization factor, U_{V,IO}
  • IO VCPUs inherit priorities of tasks (or Main VCPUs) associated with IO events
      – Currently, priorities are f(T) for corresponding Main VCPU
      – IO VCPU budget is limited to:
          • T_{V,main} * U_{V,IO} for period T_{V,main}

SLIDE 48

PIBS Scheduling

  • IO VCPUs have eligibility times, when they can execute
  • t_e = t + C_actual / U_{V,IO}
      – t = start of latest execution
      – t >= previous eligibility time
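
The PIBS budget limit and eligibility-time rule can be written out as a small C++ sketch. The function names are hypothetical and this is not Quest kernel code; time is expressed in arbitrary units matching the slide's notation.

```cpp
#include <cassert>

// Budget available to an I/O VCPU within one Main VCPU period:
// T_{V,main} * U_{V,IO}.
double pibs_budget(double t_main, double u_io) {
    assert(t_main > 0.0 && u_io > 0.0 && u_io <= 1.0);
    return t_main * u_io;
}

// Next eligibility time: t_e = t + C_actual / U_{V,IO},
// where t is the start of the latest execution and C_actual is the time
// actually consumed in that execution.
double pibs_next_eligible(double t_start, double c_actual, double u_io) {
    assert(c_actual >= 0.0 && u_io > 0.0 && u_io <= 1.0);
    return t_start + c_actual / u_io;
}
```

For instance, with U = 0.25 and a Main VCPU period of 16, the I/O VCPU may use at most 4 units per period; if it starts at t = 9 and runs for 1 unit, it becomes eligible again at t = 13.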

SLIDE 49

Example VCPU Schedule: SS-Only

[Figure: timeline (time axis 8, 16, 24, 32) of an SS-only schedule. τ1: main application on a Sporadic Server, C=8, T=16. τ2: I/O interrupt bottom half on a Sporadic Server, C=4, T=16. Markers show execution, when the I/O event is initiated, when interrupts occur, and a missed deadline.]

SLIDE 50

Example VCPU Schedule: SS+PIBS

[Figure: timeline (time axis 8, 16, 24, 32) of an SS+PIBS schedule. τ1: main application on a Sporadic Server, C=8, T=16. τ2: I/O interrupt bottom half on a PIBS, U=0.25. Markers show execution, when the I/O event is initiated, and when interrupts occur; no deadline is missed.]

SLIDE 51

Utilization Bound Test

  • Sandbox with 1 PCPU, n Main VCPUs, and m I/O VCPUs
      – C_i = Budget Capacity of V_i
      – T_i = Replenishment Period of V_i
      – Main VCPU, V_i
      – U_j = Utilization factor for I/O VCPU, V_j

$$\sum_{i=0}^{n-1} \frac{C_i}{T_i} + \sum_{j=0}^{m-1} (2 - U_j) \cdot U_j \le n \cdot \left( \sqrt[n]{2} - 1 \right)$$
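
The bound test above can be checked numerically with a short C++ sketch. The struct and function names are hypothetical, and returning false for n = 0 is an assumption made here because the slide does not cover that case.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// (C, T) pair of one Main VCPU: budget capacity and replenishment period.
struct MainVcpu {
    double C;
    double T;
};

// Returns true when
//   sum(C_i / T_i) + sum((2 - U_j) * U_j) <= n * (2^(1/n) - 1)
// for n Main VCPUs and the given I/O VCPU utilization factors U_j.
bool passes_utilization_bound(const std::vector<MainVcpu>& mains,
                              const std::vector<double>& io_utils) {
    const std::size_t n = mains.size();
    if (n == 0) return false;  // assumption: bound undefined without Main VCPUs
    double lhs = 0.0;
    for (const MainVcpu& v : mains) {
        assert(v.T > 0.0);
        lhs += v.C / v.T;
    }
    for (double u : io_utils) {
        lhs += (2.0 - u) * u;
    }
    const double bound = static_cast<double>(n) * (std::pow(2.0, 1.0 / static_cast<double>(n)) - 1.0);
    return lhs <= bound;
}
```

With one Main VCPU (C=8, T=16) and one I/O VCPU with U=0.25, the left side is 0.5 + 0.4375 = 0.9375 against a bound of 1, so the set passes the test.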

SLIDE 52

Cache Partitioning

  • Shared caches controlled using color-aware memory allocator [COLORIS – PACT'14]
  • Cache occupancy prediction based on h/w performance counters
      – E' = E + (1 - E/C) * m_l - (E/C) * m_o
      – Enhanced with hits + misses [Book Chapter, OSR'11, PACT'10]
  • 5 patents (3 awarded so far) w/ VMware
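
The occupancy update E' = E + (1 - E/C) * m_l - (E/C) * m_o can be tried out with the C++ sketch below. The slides do not define the symbols, so the reading used here is an assumption: E is the current occupancy estimate in cache lines, C is the cache size in lines, m_l is the number of misses by the thread being tracked, and m_o is the number of misses by other threads. The clamp to [0, C] is also an addition for safety, not part of the slide's formula.

```cpp
#include <cassert>

// One update step of the cache occupancy estimate:
//   E' = E + (1 - E/C) * m_l - (E/C) * m_o
// E: current occupancy (lines), C: cache size (lines),
// m_l: local misses, m_o: misses by other threads (assumed meanings).
double next_occupancy(double E, double C, double m_l, double m_o) {
    assert(C > 0.0 && E >= 0.0 && E <= C);
    double next = E + (1.0 - E / C) * m_l - (E / C) * m_o;
    // Assumption: keep the estimate within the physical cache size.
    if (next < 0.0) next = 0.0;
    if (next > C) next = C;
    return next;
}
```

A thread holding half of a 100-line cache stays at 50 lines when it and the other threads each incur 10 misses, since its gains and losses cancel.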