From Reflexes to In-Network Processing Enabling Ultra-low Latency - - PowerPoint PPT Presentation


From Reflexes to In-Network Processing: Enabling Ultra-low Latency and High Reliability for Cyber-physical Networking. Klaus Wehrle, joint work by the COMSYS team, http://comsys.rwth-aachen.de. NIPAA@ICNP, 13.10.2020


SLIDE 1

http://comsys.rwth-aachen.de

From Reflexes to In-Network Processing

Enabling Ultra-low Latency and High Reliability for Cyber-physical Networking

NIPAA@ICNP, 13.10.2020 Klaus Wehrle, joint work by the COMSYS team

SLIDE 2

Motivation

Evolution of Communication Systems

[Figure: timeline from the good old Internet via Web & Cloud, WiFi, 3G, 4G and the Internet of Things to CPS / IIoT control]

Human-centric communication (Cyber, Physical, Human)

  • Humans are 'slow' and compensate for (communication and system) errors
  • Latency (<20 ms) was never a big issue

Cyber-physical networking (Cyber, Physical)

  • Remote control of machines, humans not involved
  • High precision required
  • Challenges: ultra-low latency and high reliability

SLIDE 3

Cyber-Physical Networking – Challenge 1: Ultra-low Latency

Challenge: Ultra-low latency

[Figure: protocol stacks (Application, Transport, Network, MAC, PHY) of sensors and actuators, connected via two switches (IP, MAC, PHY) to a cloud with the controller]

SLIDE 4

Cyber-Physical Networking – Challenge 1: Ultra-low Latency

Challenge: Ultra-low latency

  • Problem 1: Physical distance
  • Solution: Reduce the distance 😐
      → Edge cloud reduces horizontal distance, e.g. in 5G
  • Edge cloud still not nearest to plant

[Figure: sensors and actuators connected via two switches to a cloud with the controller; a second cloud with the controller is placed closer to the plant]

SLIDE 5

Cyber-Physical Networking – Challenge 1: Ultra-low Latency

  • Problem 2: Vertical distance of the layered system approach
      • Control and communication layers heavily abstract from each other
          • Control is just an(other) application
          • Network seen as (stable) black box
      • No joint optimization possible

→ Cyber-Physical Networking Initiative

  • Co-Designed Networked Control

[Figure: SENSE, DECIDE, ACT loop running through the layered stacks (MAC, NET, TRANS) of sensor, switches, cloud controller and actuator; control and communication are separated by abstraction; result: pendulum unstable]

SLIDE 6

Co-Designing Communication and Control

Software-defined Networking (SDN)

  • SDN actions not suitable for control actions
  • Simply pre-computing bloats rule tables
      → But modern programmable switches (Tofino, FPGA, SmartNICs) are paving the way towards In-Network Processing

Reflexes

  • Control task: Simple Approximated Control Rules
  • Communication task: Simple Reduced Communication Rules

[Figure: SDN control plane (SDN software controller, cloud controller) and SDN data plane (two switches holding reflexes, Reflex n); low-latency data path between the switches, control path to the controller]

[Figure: token state machine with states and events "wait token", "token rcv", "token missing", "wrong token", "pre-alert", "lost token", "Token rcvd", "Token passed" and rules INR1, OUTR1, TOR1, INR2, OUTR2]

SLIDE 7

Accuracy-Latency-(Throughput) Trade-Off – Computing Platforms

  1) End-host computations
  2) In-kernel processing (XDP, TC)
  3) SmartNIC
  4) Switch (e.g. Tofino)

[Figure: end host with NIC, kernel and userspace]

  • Towards the network: faster and more predictable, but very restricted operations
  • Towards the end host: fewer computational restrictions, but slower and less predictable execution

SLIDE 8

REFLEXES – A Co-Designed Architecture for In-Network Control

Challenge:

  • Make joint decisions on control and communication
  • Combine possible reactions into many reflex candidates
  • Push reflexes nearer to the plant

[Figure: SENSE, DECIDE, ACT loop through the layered stacks of sensors, switches, cloud controller and actuators; reflexes (Reflex n) on the switches allow an earlier and communication/control-optimized ACT; without reflexes: pendulum unstable, with reflexes: pendulum very stable]

SLIDE 9

General REFLEXES Framework

  • Task separation: Separating data processing and coordination
      • Fast and simple reactions based on in-network processing (INP)
          • Use computation in the network to execute simple tasks
          • Push simplified control algorithm (reflex) to the switch
      • Main control algorithm stays in edge cloud to do delay-insensitive adaptation
          • Slow-path processing, coordination and state management stay in the cloud
          • Cloud updates reflex if necessary, e.g. latency change, process is mobile, etc.

[Figure: sensors and actuators attached via access points and switches to the edge cloud and remote cloud; a reflex (R) sits on a switch; low-latency data path between sensor, switch and actuator, control path to the cloud]
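The split between a fast reflex on the switch and a slow adaptation loop in the cloud can be sketched as follows. This is a minimal illustration, not the REFLEXES implementation: the proportional rule, the class names `Reflex` and `CloudController`, the gain values and the 5 ms threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Reflex:
    """Simplified control rule as it might be pushed to a switch (hypothetical)."""
    gain_num: int   # gain expressed as the integer fraction gain_num / 2**shift
    shift: int
    setpoint: int   # target sensor value, in sensor units
    limit: int      # actuator saturation, in actuator units

    def react(self, sensor_value: int) -> int:
        """Fast path: integer-only reaction using multiply, shift and clamp."""
        error = self.setpoint - sensor_value
        command = (self.gain_num * error) >> self.shift
        return max(-self.limit, min(self.limit, command))


class CloudController:
    """Slow path: adapts the reflex parameters when conditions change."""

    def update_reflex(self, reflex: Reflex, measured_latency_ms: float) -> None:
        # Assumed policy for illustration only: use a smaller gain when the
        # measured loop latency exceeds 5 ms, otherwise the nominal gain.
        reflex.gain_num = 8 if measured_latency_ms > 5.0 else 16


reflex = Reflex(gain_num=16, shift=4, setpoint=100, limit=50)
fast_command = reflex.react(90)          # computed per packet, near the plant
CloudController().update_reflex(reflex, measured_latency_ms=10.0)
adapted_command = reflex.react(90)       # same input after the cloud update
```

The reflex only uses operations that are plausible on restricted data-plane hardware, while anything requiring state or floating-point reasoning stays in the controller.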

SLIDE 10

Two Real-world Examples (Cluster Internet of Production)

Arc welding robots

  • Control loops
      • Single-digit millisecond latency
      • Multiple sensor sources
          • HD and infrared camera
          • Current draw of light arc
      • Actuators
          • Robot positioning
          • Light arc voltage

Mobile robot cooperation

  • Control loops
      • Positioning coordinated by many inputs
          • e.g. indoor coordinate system, camera, etc.
          • In-network coordinate transformation
      • Human-in-the-loop detection (safety zone)
          • e.g. logical safety loop among cameras, lasers, Lidar
      • Robot interaction via multiple sensors
      • Augmented Reality …

[Figure: schematics of both setups with nodes labeled S, A, HD and R; one element annotated "6x"]

SLIDE 11

Networked Control – Real-World Example: Laser Tracker

[Figure: laser tracker setup with in-network coordinate transformation (C)]

SLIDE 12

In-Network Coordinate Transformation – Fundamentals

Challenge: Coordinate transformation (Spherical to Cartesian)

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} r \sin\theta \cos\varphi \\ r \sin\theta \sin\varphi \\ r \cos\theta \end{pmatrix}
\]

  • Restricted fixed-point arithmetic
      • Format: ± [0 … 2^m] . [0 … 2^k], where the integer width m and the fractional width k share the available bits
      • Choose the fixed point to
          • ensure the range is sufficiently large (application range)
          • maximize the fractional part (required accuracy)

  • Approximate trigonometric functions
      1. Chebyshev polynomials
      2. Table lookup
          • Problem: large table space needed
          • Use the sum-of-angles identity: \( \sin(a + b) = \sin a \cos b + \cos a \sin b \)

Lookup table:

    θ          sin θ
    0.000000   0.000000
    0.000488   0.000488
    0.000977   0.000977
    …
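The fixed-point version of the spherical-to-Cartesian transformation can be sketched as below. The split into 16 fractional bits and the helper names (`to_fixed`, `fx_mul`, `spherical_to_cartesian`) are assumptions for illustration; the trigonometric values come from `math.sin` and `math.cos` here and merely stand in for whatever approximation the data plane uses.

```python
import math

FRAC_BITS = 16              # assumed number of fractional bits, not from the slides
SCALE = 1 << FRAC_BITS


def to_fixed(value: float) -> int:
    """Convert a real number to a scaled integer."""
    return int(round(value * SCALE))


def to_float(fixed: int) -> float:
    """Convert a scaled integer back to a real number."""
    return fixed / SCALE


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point numbers and rescale the product by a shift."""
    return (a * b) >> FRAC_BITS


def spherical_to_cartesian(r: float, theta: float, phi: float):
    """Compute (x, y, z) using only integer multiplications and shifts."""
    r_fx = to_fixed(r)
    sin_t, cos_t = to_fixed(math.sin(theta)), to_fixed(math.cos(theta))
    sin_p, cos_p = to_fixed(math.sin(phi)), to_fixed(math.cos(phi))
    x = fx_mul(fx_mul(r_fx, sin_t), cos_p)
    y = fx_mul(fx_mul(r_fx, sin_t), sin_p)
    z = fx_mul(r_fx, cos_t)
    return to_float(x), to_float(y), to_float(z)


x, y, z = spherical_to_cartesian(2.0, math.pi / 2, 0.0)
```

More fractional bits reduce the rounding error of each product, at the cost of a smaller representable range for the same word width, which is the trade-off named in the bullets above.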

SLIDE 13

In-Network Coordinate Transformation – Fundamentals

Challenge: Coordinate transformation (Spherical to Cartesian)

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} r \sin\theta \cos\varphi \\ r \sin\theta \sin\varphi \\ r \cos\theta \end{pmatrix}
\]

  • Restricted fixed-point arithmetic
      • Format: ± [0 … 2^m] . [0 … 2^k], where the integer width m and the fractional width k share the available bits
      • Choose the fixed point to
          • ensure the range is sufficiently large (application range)
          • maximize the fractional part (required accuracy)

  • Approximate trigonometric functions
      1. Chebyshev polynomials
      2. Table lookup

Worked example with a high-part and a low-part table:

\[
\sin(6.282714000002) \approx -0.000471
\]

\[
\sin\theta_{\mathrm{high}} \cos\theta_{\mathrm{low}} + \cos\theta_{\mathrm{high}} \sin\theta_{\mathrm{low}}
\approx -0.000470 \cdot 1 + 0.999999 \cdot 5.960464 \cdot 10^{-8}
\approx -0.000470
\]

    θ_high     (sin θ_high, cos θ_high)
    0.000000   (0, 1)
    0.000488   (0.000488, 0.999999)
    0.000977   (0.000977, 0.999995)
    …
    6.282714   (-0.000470, 0.999999)

    θ_low      (sin θ_low, cos θ_low)
    000000     (0, 1)
    000001     (2.980232e-8, 1)
    000002     (5.960464e-8, 1)
    …
    000488     (-0.000470, 0.999999)
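The two-table lookup can be sketched as follows. The step sizes are inferred from the table values on this slide (0.000488 ≈ 2^-11 for the high part, 2.980232e-8 ≈ 2^-25 for the low part); the function name `table_sin` and the use of Python floats in place of fixed-point table entries are assumptions for illustration.

```python
import math

HIGH_BITS = 11                     # high-table step 2**-11, about 0.000488
LOW_BITS = 25                      # low-table step 2**-25, about 2.980232e-8
HIGH_STEP = 2.0 ** -HIGH_BITS
LOW_STEP = 2.0 ** -LOW_BITS
LOW_ENTRIES = 1 << (LOW_BITS - HIGH_BITS)          # low steps per high step
HIGH_ENTRIES = int(2 * math.pi / HIGH_STEP) + 1    # covers angles in [0, 2*pi)

# Each entry stores the (sin, cos) pair of the angle it represents.
HIGH_TABLE = [(math.sin(i * HIGH_STEP), math.cos(i * HIGH_STEP))
              for i in range(HIGH_ENTRIES)]
LOW_TABLE = [(math.sin(j * LOW_STEP), math.cos(j * LOW_STEP))
             for j in range(LOW_ENTRIES)]


def table_sin(theta: float) -> float:
    """Approximate sin(theta) for theta in [0, 2*pi) with two lookups."""
    ticks = int(theta / LOW_STEP)               # quantize to the low-table step
    high_index, low_index = divmod(ticks, LOW_ENTRIES)
    sin_high, cos_high = HIGH_TABLE[high_index]
    sin_low, cos_low = LOW_TABLE[low_index]
    # Sum-of-angles identity: sin(a + b) = sin a cos b + cos a sin b
    return sin_high * cos_low + cos_high * sin_low


approx = table_sin(6.282714000002)
```

With these step sizes the two tables hold 12868 and 16384 entries, whereas a single table at the same 2^-25 resolution would need roughly 2.1 · 10^8 entries.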

SLIDE 14

In-Network Image Processing

  • Low-latency computer vision often needed
      • Fast reactions to the environment
  • Camera images rarely fit into a single packet
      • Use local computation strategies like convolution

[Figure: steering decision (turn right, forward, turn left) taken from the middle position between the two highest responses]
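A local, row-wise computation of this kind might look like the sketch below. It assumes that each packet carries one row of pixel intensities and that two bright markers delimit the path; the kernel, the dead zone and the mapping of positions to "turn left" and "turn right" are illustrative assumptions, not details from the talk.

```python
def convolve_row(row, kernel):
    """Valid-mode 1D convolution of one pixel row with a small symmetric kernel."""
    width = len(kernel)
    return [sum(row[i + j] * kernel[j] for j in range(width))
            for i in range(len(row) - width + 1)]


def steer(row, kernel=(-1, 2, -1), dead_zone=2):
    """Steer towards the middle position between the two highest responses."""
    responses = convolve_row(row, kernel)
    # Indices of the two strongest filter responses.
    top_two = sorted(range(len(responses)),
                     key=responses.__getitem__, reverse=True)[:2]
    # Shift by half the kernel width to get back to pixel coordinates.
    middle = sum(top_two) / 2 + len(kernel) // 2
    centre = (len(row) - 1) / 2
    if middle < centre - dead_zone:
        return "turn left"
    if middle > centre + dead_zone:
        return "turn right"
    return "forward"


row = [0] * 21
row[7] = row[13] = 10            # two bright markers, centred in the row
decision = steer(row)
```

Because each decision needs only the pixels of one row, it can be computed from a single packet without reassembling the whole image.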

SLIDE 15

Data Stream Processing

  • Collection and analysis of process data
      • Data-driven improvement of production and efficiency
          • Collect every data item the process and machines are emitting
          • Derive immediate feedback on process status and product quality
          • Real-time feedback for the production process
      • Problem: Data rate of produced process data

[Figure: sensors and an actuator attached via access points and switches to the edge cloud and remote cloud]

SLIDE 16

Real-world example: Fine Blanking

Decoiler

  • Sampling: 2.5-5 kHz
  • Data rate: 45-90 Mbps

Leveler

  • Data not relevant
  • 64 signals at 32 bit
  • Sampling: 5 kHz
  • Data rate: 10 Mbps

Lubricator

  • Infrared camera: 160 Mbps
  • Press control/sensors: 25 Mbps
  • Vibration sensor: 1 MHz, 150 Mbps
  • ~500 Mbps per 4K camera

Press
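The 10 Mbps figure quoted for the 64-signal source follows directly from signal count, sample width and sampling frequency. The short calculation below reproduces it; the helper name `data_rate_mbps` is hypothetical, and the final sum assumes, purely for illustration, that one instance of each camera, control and vibration stream listed above shares a single uplink.

```python
def data_rate_mbps(signals: int, bits_per_sample: int, sampling_hz: float) -> float:
    """Raw data rate in Mbps, ignoring packet headers and encoding overhead."""
    return signals * bits_per_sample * sampling_hz / 1e6


# 64 signals at 32 bit, sampled at 5 kHz
signal_rate = data_rate_mbps(64, 32, 5_000)

# Assumed aggregation: infrared camera + press control/sensors
# + vibration sensor + one 4K camera, all values in Mbps from the slide.
shared_uplink = 160 + 25 + 150 + 500
```

Here `signal_rate` evaluates to 10.24 Mbps and `shared_uplink` to 835 Mbps, which shows how quickly a single machine approaches the capacity of a 1 Gbps link.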

SLIDE 17

Data Stream Processing at 40 Gbps Line-Rate

  • Collection and analysis of process data
      • Data-driven improvement of production and efficiency
          • Collect every data item the process and machines are emitting
          • Derive immediate feedback on process status and product quality
          • Real-time feedback for the production process
      • Problem: Data rate of produced process data

  • Reduce/process the data as early as possible in the network
      • Apply filtering, aggregation, compression, classification on the data path

[Figure: sensors and an actuator attached via access points and switches to the edge cloud and remote cloud; filters (F), derived from data, are placed on the access points and switches]
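Filtering and aggregation on the data path can be sketched as below. This is a minimal illustration under assumptions: the filter is a band derived from historical samples (mean plus or minus three standard deviations), and the function names `derive_filter`, `filter_stream` and `aggregate` are hypothetical.

```python
from statistics import mean, pstdev


def derive_filter(history, k=3.0):
    """Derive a 'normal' band from past data (assumed rule: mean +/- k * std)."""
    mu, sigma = mean(history), pstdev(history)
    return mu - k * sigma, mu + k * sigma


def filter_stream(samples, band):
    """Forward only the samples that fall outside the normal band."""
    low, high = band
    return [s for s in samples if s < low or s > high]


def aggregate(samples, window):
    """Replace each window of samples by its (min, max, mean) summary."""
    chunks = (samples[i:i + window] for i in range(0, len(samples), window))
    return [(min(c), max(c), sum(c) / len(c)) for c in chunks]


band = derive_filter([10.0, 10.2, 9.8, 10.1, 9.9])
forwarded = filter_stream([10.0, 10.1, 14.0, 9.9], band)
summary = aggregate([1, 2, 3, 4, 5], window=2)
```

In this division of labor the band would be computed in the cloud from collected data and installed on the network element, which then only applies comparisons per sample.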

SLIDE 18

Proposed Framework for IRTF: Computing in the Network

  • Proposed framework
      • Enable computation in the network elements (switches, SmartNICs, access points, etc.)
          • For simple control tasks
          • For filtering, aggregating, etc. of data on the path to the cloud
          • For boosting data analysis in a data center (not discussed here)
      • Hierarchical placement of computational tasks
          • Simple and predictable computation in the network
              • Used to satisfy tight constraints (e.g. low-latency response)
          • Long-term computation, state management and coordination in the cloud (complex tasks)

[Figure: Process / Plant, Data Plane INP (process, compute, …; filter, aggregate, reduce, …; etc.) and Cloud Controller. Plant to data plane: data at high rate/volume/precision. Data plane to plant: control actions, fast feedback. Data plane to cloud: data at low rate. Cloud to data plane: update of models, filters, functions etc.; configuration; state management; mobility]

→ COIN Research Group

SLIDE 19

Pre-Deployment Performance Prediction of INP-Components

[Figure: analysis pipeline with the elements Network Function Code, Symbolic Analysis, Execution Tree, Analysis of Paths, Instruction Chains, Instruction-, Cache- & CPU-Model, Traffic Pattern and Performance Predictions; feedback loop: Performance- & Analysis-Feedback (Prediction, Fix bugs)]

[Figure detail: execution tree nodes annotated with path constraints {}, {len < 54}, {len ≥ 54}, {len ≥ 54, (data + 12) = 2048}, {len ≥ 54, (data + 12) = 2048, λ = 0}, {len ≥ 54, (data + 12) = 2048, λ = 0, (λ) = 0}]

[Figure detail: plots of frequency and CDF over CPU cycles, measured vs. predicted, annotated with rate in million pkt/s]
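The idea of walking an execution tree and attaching a cost to every path can be sketched as below. The constraints `len < 54`, `len >= 54` and `(data + 12) = 2048` are taken from the slide; the complementary branch `(data + 12) != 2048`, the `Node` structure and all cycle counts are hypothetical values chosen only to make the example run.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    constraint: str      # branch condition added at this node ("" for the root)
    cycles: int          # assumed cost of the instruction chain at this node
    children: list = field(default_factory=list)


def enumerate_paths(node, constraints=(), cycles=0):
    """Return (path constraints, total cycles) for every root-to-leaf path."""
    if node.constraint:
        constraints = constraints + (node.constraint,)
    cycles += node.cycles
    if not node.children:
        return [(constraints, cycles)]
    paths = []
    for child in node.children:
        paths.extend(enumerate_paths(child, constraints, cycles))
    return paths


tree = Node("", 10, [
    Node("len < 54", 5),
    Node("len >= 54", 8, [
        Node("(data + 12) = 2048", 40),
        Node("(data + 12) != 2048", 6),
    ]),
])

paths = enumerate_paths(tree)
```

Weighting each path's cost by how often a given traffic pattern satisfies its constraints would then yield a predicted cycle distribution of the kind plotted on this slide.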

SLIDE 20

Example Findings from Symbolic Performance Analysis

  • Allows pre-deployment understanding of network function (NF) performance
      • Impact of different implementation designs (e.g. linear list vs. decision tree)
      • Impact of different traffic patterns (e.g. regular vs. attack traffic, or IPv4 vs. IPv6)

[Figure: three panels, each plotting frequency and CDF over CPU cycles, measured vs. predicted. Panel 1: Firewall NF: decision list, annotated with rate in million pkt/s. Panel 2: Firewall NF: decision tree, annotated with rate in million pkt/s. Panel 3: Cilium Load Balancer NF, measured and predicted for IPv4 and IPv6]

SLIDE 21

Summary

  • On-path computation
      • has always been considered harmful in networks
          • Erroneous, slow and unpredictable

  • But…
      • these days latency matters more and more
          • Detours via clouds/data centers are costly
          • Type of computation is often not too complex and not too lengthy
      • there is (more and more) hardware that can …
          • do more than just packet forwarding, counting, and dropping
          • at 40/100 Gbps line speed (depends on the architecture)
      • there are better methods …
          • to predict the (software) behavior
          • to predict the resulting performance
          • to ensure the code does what you want it to do

  • Our goal is to find out the limits and to push them further