Real-Time Edge Computing
Chenyang Lu
Industrial Internet of Things (IIoT)
Ø Synergizing sensing, analytics, and control
ü Cloud computing for high capacity
ü Edge computing for timely performance
[Figure: a wireless sensor network (e.g., in a wind farm) connects to IIoT services on Edge 1, Edge 2, ..., Edge N; the edges connect to a private cloud for machine learning training and storage (database). Applications: condition monitoring, emergency response, predictive maintenance, …]
Research challenge #1: timeliness
Ø Timing constraints:
q IIoT applications have latency requirements
q Events carrying physical data have temporal semantics
Image source: https://www.maintwiz.com/what-is-condition-monitoring/
Application example: condition monitoring
Contribution #1: Cyber-Physical Event Processing Architecture
- latency differentiation
- time consistency enforcement
Research challenge #2: loss-tolerance
Ø An IIoT service must deliver messages reliably, but
q fault-tolerant systems can be slow or costly
q heterogeneous traffic and platforms can increase pessimism
[Figure: IIoT devices, a Primary service and a Backup service, and edge and cloud applications]
Contribution #2: Fault-Tolerant Real-Time Messaging Architecture
- co-scheduling fault-tolerant real-time activities
- traffic/platform-aware service configuration
Research challenge #3: efficiency
Ø Efficiency atop loss-tolerance and timeliness:
q costly to back up many small in-band computations
q costly to recompute for fault recovery
Image source: https://aws.amazon.com/lambda/
Example of in-band computations: AWS Lambda function for IIoT inference
Contribution #3: Adaptive Real-Time Reliable Edge Computing
- selective lazy data replication
- proactive cleanup of obsolete data
Contributions
Ø Three new IIoT middleware designs and implementations:
q Real-time cyber-physical event processing (CPEP)
q Fault-tolerant real-time messaging (FRAME)
q Adaptive real-time reliable edge computing (ARREC)
All have been implemented and validated within the TAO real-time event service [1].
[1] Harrison, T.H., Levine, D.L. and Schmidt, D.C., 1997. The design and performance of a real-time CORBA event service. ACM SIGPLAN Notices, 32(10), pp.184-200.
[Figure: CPEP, FRAME, and ARREC each shown against three axes (timeliness, loss-tolerance, efficiency), alongside the original TAO event channel components: Subscription & Filtering, Supplier Proxies, Event Correlation, Dispatching, Consumer Proxies]
Outline
Ø CPEP: real-time cyber-physical event processing
Ø FRAME: fault-tolerant real-time messaging
Ø ARREC: adaptive real-time reliable edge computing
[Figure: the original TAO event channel (Supplier Proxies, Subscription & Filtering, Event Correlation, Dispatching, Consumer Proxies) and the event channel with CPEP]
Cyber-physical event processing model
Ø Temporal semantics
q Absolute time consistency
- A bound on an event’s elapsed time since its creation
q Relative time consistency
- A bound on the difference between events’ creation times
[Figure: IIoT devices (suppliers s1 to s5) feed the IIoT event service, where operations O1 to O7 (Oi: filtering, transformation, encryption, …) run at high, middle, and low priority and deliver to IIoT applications (consumers c1 to c4)]
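The two consistency notions can be written compactly. The slides give no notation, so every symbol here is an assumption introduced for illustration: $t_c(e)$ is the creation time of event $e$, $t$ is the current time, $E$ is the set of events being combined, and $V_a$ and $V_r$ are the absolute and relative validity intervals.

```latex
% Absolute time consistency of event e at time t:
% the time elapsed since creation stays within the bound.
t - t_c(e) \le V_a

% Relative time consistency of a set of events E:
% the spread of creation times stays within the bound.
\max_{e \in E} t_c(e) - \min_{e \in E} t_c(e) \le V_r
```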
Real-time event processing
Ø Processing in the order of priorities propagated from applications
Ø Temporal semantics enforcement and shedding:
q Absolute time consistency
q Relative time consistency
- Track both the earliest and the latest event creations, per operator
[Figure: the example event graph (suppliers s1 to s5, operations O1 to O7, consumers c1 to c4) annotated with high, middle, and low priority, and a timeline t1 to t7 for S1, S2, S3, and C2]
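One plausible reading of "priorities propagated from applications" is that each supplier and operation inherits the most urgent priority among the consumers it feeds. The sketch below works under that assumption. The graph, the names `downstream`, `consumer_priority`, and `propagate_priorities`, and the convention that a smaller number means higher priority are all hypothetical and not taken from CPEP.

```python
from functools import lru_cache

# Hypothetical event graph: node -> nodes it feeds.
downstream = {
    "s1": ["O1"], "s2": ["O1", "O2"], "s3": ["O2"],
    "O1": ["c1"], "O2": ["O3"], "O3": ["c2", "c3"],
}
# Assumed convention: 0 = high, 1 = middle, 2 = low.
consumer_priority = {"c1": 0, "c2": 1, "c3": 2}


def propagate_priorities(downstream, consumer_priority):
    """Give every node the most urgent priority of any consumer it reaches.

    Assumes the graph is acyclic.
    """

    @lru_cache(maxsize=None)
    def priority(node):
        if node in consumer_priority:
            return consumer_priority[node]
        return min(priority(n) for n in downstream[node])

    nodes = set(downstream) | set(consumer_priority)
    return {n: priority(n) for n in nodes}


priorities = propagate_priorities(downstream, consumer_priority)

# Operations are then served most-urgent first (name breaks ties).
processing_order = sorted(
    (n for n in priorities if n.startswith("O")),
    key=lambda n: (priorities[n], n),
)
```

Here `O2` and `O3` end up at middle priority because the most urgent consumer they reach is `c2`, while `s2` is high priority because it also feeds `O1`.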
The CPEP processing architecture
Both workers and movers are further prioritized, enabling an appropriate activity ordering.
[Figure: CPEP processing architecture]
Enforcing Absolute Time Consistency
Ø Tracking the earliest end time of the validity interval
Ø Responses to consistency violation
q Marking: deferring the handling to consumers
q Shedding: cancelling all subsequent processing (improving efficiency)
[Figure: the example event graph with labels es1, es2, es3]
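The enforcement step above can be sketched as follows. This is a minimal illustration, not CPEP's implementation: the `Event` fields, the `Policy` enum, and the functions `combine` and `enforce_absolute` are hypothetical names. It assumes that an event derived from several inputs stops being valid as soon as its earliest input expires.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    MARK = "mark"  # flag the event and let the consumer decide
    SHED = "shed"  # drop the event and skip all later processing


@dataclass
class Event:
    payload: object
    valid_until: float          # end of the validity interval (absolute time)
    inconsistent: bool = False  # set when the event is marked


def combine(inputs, payload):
    """Derive an event whose validity ends when its earliest input expires."""
    return Event(payload, valid_until=min(e.valid_until for e in inputs))


def enforce_absolute(event, now, policy):
    """Return the event to forward, or None if it was shed."""
    if now <= event.valid_until:
        return event
    if policy is Policy.SHED:
        return None
    event.inconsistent = True
    return event


a = Event("vibration", valid_until=10.0)
b = Event("temperature", valid_until=7.5)
fused = combine([a, b], "fused")  # inherits the earlier end time from b
```

Carrying only the earliest end time means each operation checks one number per event, regardless of how many inputs contributed to it.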
Enforcing Relative Time Consistency
Ø Maintaining an ordered list of events’ timestamps
q One timestamp per event type
q Comparing the maximum time difference with the validity interval
Ø Responses to consistency violation
q Marking: deferring the handling to consumers
q Shedding: cancelling all subsequent processing (improving efficiency)
[Figure: the example event graph with labels es1, es2, es3, and es1']
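A minimal sketch of the bookkeeping described above, assuming timestamps are integers in milliseconds. The class name `RelativeConsistencyTracker` and its methods are hypothetical. For brevity the list is re-sorted on each check rather than kept ordered incrementally.

```python
class RelativeConsistencyTracker:
    """Keeps the latest creation timestamp for each event type."""

    def __init__(self, validity_interval):
        self.validity_interval = validity_interval
        self.latest = {}  # event type -> creation timestamp

    def update(self, event_type, created_at):
        """Record a new event, replacing the older one of the same type."""
        self.latest[event_type] = created_at

    def ordered_timestamps(self):
        return sorted(self.latest.values())

    def is_consistent(self):
        """Compare the largest timestamp difference with the validity interval."""
        ts = self.ordered_timestamps()
        if len(ts) < 2:
            return True
        return ts[-1] - ts[0] <= self.validity_interval


tracker = RelativeConsistencyTracker(validity_interval=50)
tracker.update("es1", 1000)
tracker.update("es2", 1030)
ok_two_types = tracker.is_consistent()   # spread is 30
tracker.update("es3", 1080)
ok_three_types = tracker.is_consistent() # spread is 80
tracker.update("es1", 1060)              # a fresher es1 replaces the old one
ok_after_refresh = tracker.is_consistent()  # spread is 50
```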
Experiment design
Ø IIoT workload:
q Filtering
q Data transform
q Encryption
[Figure: workload graph with suppliers s1 to s8 publishing at 50 Hz, 100 Hz, and 200 Hz; operators EKF1 to EKF4, FFT1 to FFT3, CAT1, and AES1 to AES3; and consumers c1 to c3 at high, middle, and low priority]
Ø Test-bed configuration:
q Machine 1: suppliers
q Machine 2: CPEP
q Machine 3: consumers
Ø Comparison baseline:
q Apache Flink stream processing framework [1]
[1] https://flink.apache.org
Latency performance
[Figure: 99th percentile latency (unit: ms) for high, middle, and low priority]
CPEP maintained high-priority latency performance as workload increased.
CPEP differentiated latency according to priority level.
Benefits of shedding inconsistent events
Improve the throughput of consistent events.
Reduce CPU utilization.
[Figure panel: latency of low-priority processing]
Effectiveness of CPEP Sharing
Ø Experiment setup
[Figure: two workload graphs at 100 Hz each, with suppliers s1 to s8; operators EKF1 to EKF4, FFT1 to FFT4, CAT1 to CAT3, and AES1 to AES4; and consumers c1 to c3 at high, middle, and low priority]
Ø Results of sharing vs. non-sharing
[Figure panels: latency of low-priority processing; CPU utilization]
CPEP sharing helped reduce latency.
Outline
Ø CPEP: Real-time cyber-physical event processing
Ø FRAME: Fault-tolerant real-time messaging
Ø ARREC: Adaptive real-time reliable edge computing
Message loss-tolerance requirement
Ø Application-specific requirements for an IIoT service
q Li: the tolerable number of consecutive losses for topic i

Value of Li    Application examples
0              emergency response; predictive maintenance
k > 0          condition monitoring

(Within the tolerable number, applications may use estimates for the missing data.)
Image source: https://www.originlab.com/doc/Origin-Help/Math-Inter-Extrapoltate-YfromX
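A subscriber-side check of this requirement might look like the sketch below. It assumes each message on a topic carries a consecutive integer sequence number, which the slides do not state; the function name `meets_loss_tolerance` is hypothetical. Only gaps between received messages are counted, so losses at the very start or end of a stream are not detected.

```python
def meets_loss_tolerance(received_seq, tolerable_losses):
    """True if no run of consecutive missing messages exceeds the tolerance."""
    ordered = sorted(set(received_seq))
    for prev, cur in zip(ordered, ordered[1:]):
        missing = cur - prev - 1  # messages lost between two received ones
        if missing > tolerable_losses:
            return False
    return True


# Zero tolerance: any gap is a violation.
strict_ok = meets_loss_tolerance([1, 2, 3, 4], tolerable_losses=0)
strict_gap = meets_loss_tolerance([1, 2, 4], tolerable_losses=0)

# Tolerance of 2: gaps of up to two consecutive messages are acceptable.
relaxed_ok = meets_loss_tolerance([1, 2, 4, 7], tolerable_losses=2)
relaxed_gap = meets_loss_tolerance([1, 5], tolerable_losses=2)
```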
Fault-tolerance model
Ø A crash failure may happen to an IIoT service host (fail-stop)
Ø Lost messages may be recovered
1. via retransmissions from message publishers
2. via a backup service [1]
[Figure: IIoT devices, a Primary service and a Backup service, and edge and cloud applications]
[1] Budhiraja, N., Marzullo, K., Schneider, F.B. and Toueg, S., 1993. The primary-backup approach. Distributed systems, 2, pp.199-216.
Fault-tolerant real-time processing
Ø Specify provable deadlines for message replication and dispatch
Ø Co-schedule replication and dispatch using, e.g., earliest-deadline-first (EDF)
[Figure: the Primary service, with dispatch toward edge and cloud applications and replication toward the Backup service]
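Co-scheduling under EDF means dispatch and replication jobs share one queue ordered by deadline, rather than one activity always running before the other. A minimal sketch follows; the `EdfQueue` class, the job tuple layout, and the deadline values are hypothetical.

```python
import heapq
import itertools


class EdfQueue:
    """One queue for both job kinds, served earliest deadline first."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # FIFO order among equal deadlines

    def submit(self, deadline, kind, message_id):
        heapq.heappush(self._heap, (deadline, next(self._tie), kind, message_id))

    def next_job(self):
        deadline, _, kind, message_id = heapq.heappop(self._heap)
        return deadline, kind, message_id

    def __len__(self):
        return len(self._heap)


queue = EdfQueue()
queue.submit(30, "dispatch", "m1")
queue.submit(20, "replicate", "m1")
queue.submit(25, "dispatch", "m2")
order = [queue.next_job() for _ in range(len(queue))]
```

In this run the replication of `m1` is served first because its deadline (20) precedes both dispatch deadlines.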
Necessary condition for a message loss
Ø A message may be lost only if both:
1. the publisher has deleted its copy
2. a copy of the message has not been replicated to the Backup
[Figure: events between message creation and its delivery]
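The condition can be expressed as a predicate evaluated for an undelivered message when the Primary crashes. The sketch assumes the publisher keeps a fixed number of its most recent messages for retransmission; the `Publisher` class, its `retained` parameter, and `may_be_lost` are hypothetical names.

```python
from collections import deque


class Publisher:
    """Keeps only the most recent messages available for retransmission."""

    def __init__(self, retained):
        # A bounded deque deletes the oldest copy automatically.
        self.buffer = deque(maxlen=retained)

    def publish(self, message_id):
        self.buffer.append(message_id)

    def holds(self, message_id):
        return message_id in self.buffer


def may_be_lost(message_id, publisher, replicated_ids):
    """Both conditions must hold: no publisher copy and no Backup copy."""
    return not publisher.holds(message_id) and message_id not in replicated_ids


pub = Publisher(retained=2)
for mid in (1, 2, 3):
    pub.publish(mid)  # message 1 is evicted when 3 is published
```

With this state, message 1 is at risk only if it was never replicated; messages 2 and 3 can still be retransmitted by the publisher.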
Deadlines for dispatch and replication
Ø Deadline for dispatch:
Ø Deadline for replication:
Parameters for topic i: Ni (number of most-recent messages a publisher can retransmit), Li (loss-tolerance requirement), Ti (sending period), and the latency requirement.
[Figure: two timelines. Dispatch: Publisher, Primary Broker, Subscriber, with segments labeled PB and BS. Replication: Publisher, Primary Broker, Backup Broker, Subscriber, with segments labeled PB and BB, a crash marker, and the labels x and (Ni + Li) Ti.]
The deadline specifications help in configuring IIoT traffic/platform parameters.
The FRAME messaging architecture
Ø EDF scheduling to dispatch/replicate a message
Ø Suppress replication if the dispatch deadline is smaller
Ø Prune dispatched messages
[Figure: data publishers send to a Message Proxy on the Primary Broker and on the Backup Broker; each broker has Replicators, Dispatchers, and Message Delivery, which delivers to data subscribers]
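The suppression and pruning rules above can be sketched as two small pieces. This is an illustration under stated assumptions, not FRAME's code: `plan_jobs` and `BackupStore` are hypothetical names, and "smaller" is read as the dispatch deadline being strictly earlier than the replication deadline.

```python
def plan_jobs(message_id, dispatch_deadline, replication_deadline):
    """Return the jobs to schedule for one message, earliest deadline first."""
    jobs = [(dispatch_deadline, "dispatch", message_id)]
    # Assumed rule: replicate only when dispatch is NOT due strictly earlier.
    # If dispatch is due first, the replication job is suppressed.
    if dispatch_deadline >= replication_deadline:
        jobs.append((replication_deadline, "replicate", message_id))
    return sorted(jobs)


class BackupStore:
    """Copies held by the Backup Broker for recovery."""

    def __init__(self):
        self.copies = {}

    def replicate(self, message_id, payload):
        self.copies[message_id] = payload

    def prune(self, message_id):
        """Discard a copy once the Primary has dispatched the message."""
        self.copies.pop(message_id, None)


suppressed = plan_jobs("m1", dispatch_deadline=10, replication_deadline=40)
both = plan_jobs("m2", dispatch_deadline=50, replication_deadline=40)

store = BackupStore()
store.replicate("m2", b"reading")
store.prune("m2")  # m2 was dispatched, so its backup copy is obsolete
```

Pruning keeps the Backup from re-sending messages that subscribers already have after a failover, at the cost of extra work on the fault-free path.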
Experiment design
Ø IIoT topic configuration:
Ø Test-bed deployment:
[Figure: edge publishers EP1 and EP2; Primary Broker and Backup Broker (B1, B2); edge subscribers ES1 and ES2; cloud subscriber CS1]
Ø Service configurations:
q FRAME+; FRAME; FCFS; FCFS-
Loss tolerance performance
[Figure: success rate of meeting loss-tolerance requirements]
FRAME succeeded in assuring loss-tolerance.
A small increase in Ni can greatly improve performance (FRAME+).
Latency performance for recovery (when the Primary host crashed)
FRAME can mitigate the latency penalty.
Without suppressing replication, pruning could overload the system.
No pruning, however, could cause a latency penalty by re-sending obsolete messages.
Latency during fault-free operations
[Figure: success rate of meeting soft latency requirements]
All configurations gave similar latency performance.
Observation: both replication and pruning could delay message dispatching.
How to improve efficiency?
Outline
Ø CPEP: Real-time cyber-physical event processing
Ø FRAME: Fault-tolerant real-time messaging
Ø ARREC: Adaptive real-time reliable edge computing
Edge Computing for IIoT
Ø Timely, reliable, and efficient IIoT edge computing
q CPEP: Real-time cyber-physical event processing
q FRAME: Fault-tolerant real-time messaging
q ARREC: Adaptive real-time reliable edge computing
References
Ø C. Wang, C. Gill and C. Lu, Real-Time Middleware for Cyber-Physical Event Processing, ACM Transactions on Cyber-Physical Systems, Special Issue on Real-Time aspects in Cyber-Physical Systems, 3(3), Article 29, August 2019.
Ø C. Wang, C. Gill and C. Lu, FRAME: Fault Tolerant and Real-Time Messaging for Edge Computing, IEEE International Conference on Distributed Computing Systems (ICDCS'19), July 2019.
Ø C. Wang, C. Gill and C. Lu, Adaptive Data Replication in Real-Time Reliable Edge Computing for Internet of Things, ACM/IEEE International Conference on Internet of Things Design and Implementation (IoTDI'20), April 2020.