Critiques 1/2 page critiques of research papers Due at 10am on the - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Critiques

Ø 1/2 page critiques of research papers
Ø Due at 10am on the class day (hard deadline)
Ø Email Dingwen dingwenli@wustl.edu in plain txt
Ø Back-of-envelope notes - NOT whole essays
Ø Guidelines: http://www.cs.wustl.edu/~lu/cse521s/critique.html
Ø Critique #3

Ø Due on 10/31

Ø C. Wang, C. Gill and C. Lu, FRAME: Fault Tolerant and Real-Time Messaging for Edge Computing, IEEE International Conference on Distributed Computing Systems (ICDCS'19), July 2019.

1

slide-2
SLIDE 2

Real-Time Systems 101

Chenyang Lu

slide-3
SLIDE 3

Consequence of Deadline Miss

Ø Hard deadline

q System fails if missed.
q Goal: guarantee no deadline miss.

Ø Soft deadline

q User may notice, but system does not fail.
q Goal: meet most deadlines most of the time.

3

slide-4
SLIDE 4

Cyber-Physical Systems (CPS)

Ø CPS are built from, and depend upon, the seamless integration of computational algorithms and physical components. [NSF]
Ø Since the application interacts with the physical world, its computation must be completed under a time constraint.

4

[Figure: Real-Time Hybrid Simulation (RTHS) across the cyber-physical boundary; Robert L. and Terry L. Bowen Large Scale Structures Laboratory at Purdue University]

slide-5
SLIDE 5

Cyber-Physical Systems (CPS)

5

Cyber-Physical Boundary

slide-6
SLIDE 6

Interactive Cloud Services (ICS)

Need to respond within 100 ms for users to find it responsive*.

6

Search the web

* Jeff Dean et al. (Google) "The tail at scale." Communications of the ACM 56.2 (2013)

[Figure: Web search pipeline: query → doc index search → 2nd phase ranking → snippet generator → response]

slide-7
SLIDE 7

Interactive Cloud Services (ICS)

Need to respond within 100 ms for users to find it responsive*. E.g., web search, online gaming, stock trading, etc.

7

* Jeff Dean et al. (Google) "The tail at scale." Communications of the ACM 56.2 (2013)

Search the web

slide-8
SLIDE 8

Comparison

Ø General-purpose systems

q Fairness to all tasks (no starvation)
q Optimize throughput
q Optimize average performance

Ø Real-time systems

q Meet all deadlines.
q Fairness or throughput is not important.
q Hard real-time: worry about worst-case performance.

8

slide-9
SLIDE 9

Terminology

Ø Task

q Maps to a process or thread
q May be released multiple times

Ø Job: an instance of a task
Ø Periodic task

q Ideal: inter-arrival time = period
q General: inter-arrival time >= period

Ø Aperiodic task

q Inter-arrival time does not have a lower bound

9

slide-10
SLIDE 10

Timing Parameters

Ø Task Ti

q Period Pi
q Worst-case execution time Ci
q Relative deadline Di

Ø Job Jik

q Release time: time when a job is ready
q Response time Ri = finish time – release time
q Absolute deadline = release time + Di

Ø A job misses its deadline if

q Response time Ri > Di
q Finish time > absolute deadline

10
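These parameters can be captured in a small C struct (illustrative names, not from any RTOS API); the two miss conditions above are equivalent, since Ri > Di exactly when finish time > release time + Di:

```c
#include <assert.h>

/* Timing parameters of one job (names are illustrative only). */
typedef struct {
    double release;       /* time when the job became ready */
    double finish;        /* time when the job completed */
    double rel_deadline;  /* Di, the task's relative deadline */
} job_t;

static double response_time(job_t j) { return j.finish - j.release; }
static double abs_deadline(job_t j)  { return j.release + j.rel_deadline; }

/* Both miss conditions on the slide are equivalent:
   Ri > Di  <=>  finish > release + Di. */
static int missed(job_t j) { return response_time(j) > j.rel_deadline; }
```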

slide-11
SLIDE 11

Example

Ø P1 = D1 = 5, C1 = 2; P2 = D2 = 7, C2 = 4.

11

slide-12
SLIDE 12

Metrics

Ø A task set is schedulable if all jobs meet their deadlines.
Ø Optimal scheduling algorithm

q A task set that is unschedulable under the optimal algorithm is unschedulable under any other algorithm.

Ø Overhead: Time required for scheduling.

12

slide-13
SLIDE 13

Optimal Scheduling Algorithms

Ø Rate Monotonic (RM)

q Higher rate (1/period) → higher priority
q Optimal preemptive static-priority scheduling algorithm

Ø Earliest Deadline First (EDF)

q Earlier absolute deadline → higher priority
q Optimal preemptive dynamic-priority scheduling algorithm

13

slide-14
SLIDE 14

Example

Ø P1 = D1 = 5, C1 = 2; P2 = D2 = 7, C2 = 4.
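For this task set, a unit-step simulation over the 35-unit hyperperiod shows the classic outcome: EDF meets every deadline (utilization 2/5 + 4/7 ≈ 0.97 ≤ 1), while fixed-priority scheduling in RM order makes the second task miss at t = 7. A minimal sketch (assuming Di = Pi, so a miss shows up as a job still unfinished when its successor is released):

```c
#include <assert.h>

typedef struct { int period, wcet, deadline; } task_t;
typedef struct { int rem, adl, next_release; } job_t;

/* Unit-step simulation. use_edf = 1: EDF; use_edf = 0: fixed priority with
   task index as priority (index 0 highest -- RM order for this task set).
   Assumes Di = Pi, so a deadline miss is detected when a new job is
   released while the old one is unfinished; the stale job is then dropped. */
static int count_misses(const task_t *ts, int n, int horizon, int use_edf) {
    job_t js[16];
    int misses = 0, i, t;
    for (i = 0; i < n; i++) { js[i].rem = 0; js[i].next_release = 0; }
    for (t = 0; t < horizon; t++) {
        for (i = 0; i < n; i++) {
            if (t == js[i].next_release) {            /* new job arrives */
                if (js[i].rem > 0) misses++;          /* predecessor missed */
                js[i].rem = ts[i].wcet;
                js[i].adl = t + ts[i].deadline;
                js[i].next_release = t + ts[i].period;
            }
        }
        int pick = -1;
        for (i = 0; i < n; i++) {
            if (js[i].rem == 0) continue;
            if (pick < 0 || (use_edf && js[i].adl < js[pick].adl)) pick = i;
        }
        if (pick >= 0) js[pick].rem--;                /* run one time unit */
    }
    return misses;
}
```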

14

slide-15
SLIDE 15

Process States

Ø A process can be in one of three states:

q executing on the CPU;
q ready to run;
q waiting for data.

15

[Figure: State diagram: ready → executing (scheduler gives CPU); executing → ready (preempted); executing → waiting (needs data); waiting → ready (gets data)]
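The three process states and their transitions can be captured in a few lines of C (a sketch; the event names are made up for illustration):

```c
#include <assert.h>

typedef enum { READY, EXECUTING, WAITING } pstate_t;
typedef enum { GETS_CPU, PREEMPTED, NEEDS_DATA, GETS_DATA } pevent_t;

/* Transitions of the three-state model; -1 marks an illegal move. */
static int next_state(pstate_t s, pevent_t e) {
    switch (s) {
    case READY:     return e == GETS_CPU   ? EXECUTING : -1;
    case EXECUTING: return e == PREEMPTED  ? READY
                         : e == NEEDS_DATA ? WAITING   : -1;
    case WAITING:   return e == GETS_DATA  ? READY     : -1;
    }
    return -1;
}
```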

slide-16
SLIDE 16

Priority Scheduling

Ø Every process has a priority.
Ø CPU goes to the ready process with the highest priority.

q Fixed vs. dynamic priority
q Preemptive vs. non-preemptive

16

slide-17
SLIDE 17

Preemptive Priority Scheduling

Ø Each process has a fixed priority (1 = highest).
Ø P1: priority 1; P2: priority 2; P3: priority 3.

17

[Figure: Example timeline (0–60) showing releases of P1, P2, and P3 and their execution under preemptive priority scheduling]

slide-18
SLIDE 18

Preemptive Priority Scheduling

Ø Most common real-time scheduling approach

q Real-time POSIX
q Real-time priorities in Linux
q Most RTOS

Ø Not the only possible way

q Non-preemptive
q Clock-driven scheduling
q Reservation-based scheduling

18

slide-19
SLIDE 19

How Real-Time Is Linux?

Ø I believe that Linux is ready to handle applications requiring sub-millisecond process-scheduling and interrupt latencies with 99.99+ percent probabilities of success. No, that does not cover every imaginable real-time application, but it does cover a very large and important subset.
Ø The Linux 2.6 kernel, if configured carefully and run on fast hardware, can provide sub-millisecond interrupt and process scheduling latencies with extremely high probabilities of success. There are patches out there that are expected to provide latencies in the tens of microseconds. These patches need some work, but are maturing quickly.

Paul McKenney, IBM Linux Technology Center, "Shrinking slices: Looking at real time for Linux, PowerPC, and Cell"

19

slide-20
SLIDE 20

Linux Scheduling

Ø Real-time scheduling class

q Fixed priority

  • SCHED_FIFO: First-In-First-Out for threads of the same priority
  • SCHED_RR: Round-Robin for threads of the same priority

q SCHED_DEADLINE: EDF

Ø Non-real-time scheduling class (SCHED_NORMAL)

q CFS: Completely Fair Scheduler

Ø Default

q Real-time: 0 – 99
q Non-real-time: 100 – 139

20

slide-21
SLIDE 21

Scheduler Setup – Priorities

Ø chrt command (can also check task priorities)

http://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/

q sudo chrt -f -p 99 4800 # pid 4800 with priority 99 and FIFO

Ø sched_setscheduler [http://linux.die.net/man/2/sched_setscheduler]

21

#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    …
    struct sched_param sched;
    sched.sched_priority = 98;
    if (sched_setscheduler(getpid(), SCHED_FIFO, &sched) < 0) {
        exit(EXIT_FAILURE);
    }
    …
}

slide-22
SLIDE 22

Real-Time Edge Computing

Chenyang Lu

slide-23
SLIDE 23

Industrial Internet of Things (IIoT)

Ø Synergizing sensing, analytics, and control

ü Cloud computing for high capacity
ü Edge computing for timely performance

23

[Figure: IIoT architecture: a wireless sensor network (e.g., in a wind farm) feeds IIoT services on edge nodes (Edge 1 … Edge N); a private cloud hosts machine learning training, storage, and a database; applications include condition monitoring, emergency response, and predictive maintenance]

slide-24
SLIDE 24

Research challenge #1: timeliness

Ø Timing constraints:

q IIoT applications have latency requirements
q Events carrying physical data have temporal semantics

24

Image source: https://www.maintwiz.com/what-is-condition-monitoring/

Application example: condition monitoring

slide-25
SLIDE 25

Research challenge #1: timeliness

Ø Timing constraints:

q IIoT applications have latency requirements
q Events carrying physical data have temporal semantics

25

Image source: https://www.maintwiz.com/what-is-condition-monitoring/

Application example: condition monitoring

Contribution #1: Cyber-Physical Event Processing Architecture

  • latency differentiation
  • time consistency enforcement
slide-26
SLIDE 26

Research challenge #2: loss-tolerance

Ø An IIoT service must deliver messages reliably, but

q fault-tolerant systems can be slow or costly
q heterogeneous traffic and platforms can increase pessimism

26

[Figure: Primary service and Backup service connecting IIoT devices, edge applications, and cloud applications]

slide-27
SLIDE 27

Research challenge #2: loss-tolerance

Ø An IIoT service must deliver messages reliably, but

q fault-tolerant systems can be slow or costly
q heterogeneous traffic and platforms can increase pessimism

27

Contribution #2: Fault-Tolerant Real-Time Messaging Architecture

  • co-scheduling fault-tolerant real-time activities
  • traffic/platform-aware service configuration
slide-28
SLIDE 28

Research challenge #3: efficiency

Ø Efficiency atop loss-tolerance and timeliness:

q costly to backup many in-band small computations
q costly to recompute for fault recovery

28

Image source: https://aws.amazon.com/lambda/

Example of in-band computations: AWS Lambda function for IIoT inference

slide-29
SLIDE 29

Research challenge #3: efficiency

Ø Efficiency atop loss-tolerance and timeliness:

q costly to backup many in-band small computations
q costly to recompute for fault recovery

29

Image source: https://aws.amazon.com/lambda/

Example of in-band computations: AWS Lambda function for IIoT inference

Contribution #3: Adaptive Real-Time Reliable Edge Computing

  • selective lazy data replication
  • proactive cleanup of obsolete data
slide-30
SLIDE 30

Contributions

Ø Three new IIoT middleware designs and implementations:

q Real-time cyber-physical event processing (CPEP)
q Fault-tolerant real-time messaging (FRAME)
q Adaptive real-time reliable edge computing (ARREC)

30

All have been implemented and validated within the TAO real-time event service [1].

[1] Harrison, T.H., Levine, D.L. and Schmidt, D.C., 1997. The design and performance of a real-time CORBA event service. ACM SIGPLAN Notices, 32(10), pp.184-200.

[Figure: Venn diagrams of timeliness, loss-tolerance, and efficiency for CPEP, FRAME, and ARREC; the original TAO architecture: Supplier Proxies → Subscription & Filtering → Event Correlation → Dispatching → Consumer Proxies]
slide-31
SLIDE 31

Outline

Ø CPEP: real-time cyber-physical event processing
Ø FRAME: fault-tolerant real-time messaging
Ø ARREC: adaptive real-time reliable edge computing

31

slide-32
SLIDE 32

Cyber-physical event processing model

Ø Temporal semantics

q Absolute time consistency

  • A bound on an event’s elapsed time since its creation

q Relative time consistency

  • A bound on the difference between events’ creation times

32

Oi: operations (filtering, transformation, encryption, …)

[Figure: IIoT event service: suppliers s1–s5 feed operators O1–O7 at high, middle, and low priorities, which deliver to consumers c1–c4 of IIoT applications]

slide-33
SLIDE 33

Real-time event processing

Ø Processing in the order of priorities propagated from applications
Ø Temporal semantics enforcement and shedding:

q Absolute time consistency
q Relative time consistency

  • Track both the earliest and the latest event creations, per operator

33

[Figure: Prioritized event processing graph; timeline of events from suppliers S1–S3 reaching consumer C2 at times t1–t7]

slide-34
SLIDE 34

The CPEP processing architecture

34

Both workers and movers are further prioritized, enabling an appropriate activity ordering.


slide-35
SLIDE 35

Enforcing Absolute Time Consistency

Ø Tracking the earliest end time of the validity interval
Ø Responses to consistency violation

q Marking: deferring the handling to consumers
q Shedding: cancelling all subsequent processing

35


(Improving efficiency)

slide-36
SLIDE 36

Enforcing Relative Time Consistency

Ø Maintaining an ordered list of events’ timestamps

q One timestamp per event type
q Comparing the maximum time difference with the validity interval

Ø Responses to consistency violation

q Marking: deferring the handling to consumers
q Shedding: cancelling all subsequent processing

36


(Improving efficiency)
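The two consistency checks can be sketched in C (hypothetical helper names; per the slides, an operator only needs the earliest and latest creation times for the relative check):

```c
#include <assert.h>

typedef struct { double created; } event_t;

/* Absolute time consistency: every event's age (now - created) is
   within the validity interval. */
static int abs_consistent(const event_t *ev, int n, double now, double validity) {
    for (int i = 0; i < n; i++)
        if (now - ev[i].created > validity) return 0;
    return 1;
}

/* Relative time consistency: the spread between the earliest and
   latest creation times is within the validity interval. */
static int rel_consistent(const event_t *ev, int n, double validity) {
    double lo = ev[0].created, hi = ev[0].created;
    for (int i = 1; i < n; i++) {
        if (ev[i].created < lo) lo = ev[i].created;
        if (ev[i].created > hi) hi = ev[i].created;
    }
    return hi - lo <= validity;
}
```

On a violation, the operator would then either mark the output for the consumer or shed all subsequent processing.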

slide-37
SLIDE 37

Experiment design

Ø IIoT workload:

q Filtering
q Data transform
q Encryption

Ø Test-bed configuration:
Ø Comparison baseline:

q Apache Flink stream processing framework [1]

37

[Figure: Machine 1 runs suppliers s1–s8 at 50–200 Hz; Machine 2 runs CPEP with EKF, FFT, CAT, and AES operators at high, middle, and low priorities; Machine 3 runs consumers c1–c3]

[1] https://flink.apache.org

slide-38
SLIDE 38

Latency performance

38

[Chart: 99th percentile latency (unit: ms) for high-, middle-, and low-priority processing]

CPEP maintained high-priority latency performance as workload increased.
CPEP differentiated latency according to priority level.

slide-39
SLIDE 39

Benefits of shedding inconsistent events

39

Improve the throughput of consistent events.
Save CPU utilization.

slide-40
SLIDE 40

Effectiveness of CPEP Sharing

Ø Experiment setup
Ø Results of sharing vs. non-sharing

[Chart: latency of low-priority processing]

40

[Figure: Two 100 Hz workload groups (suppliers s1–s8, EKF/FFT/CAT/AES operators, consumers c1–c3) at high, middle, and low priorities, with and without sharing]

CPEP sharing helped reduce latency

slide-41
SLIDE 41

Effectiveness of Sharing

Ø Results of sharing vs. non-sharing

41

[Charts: latency of low-priority processing; CPU utilization]

slide-42
SLIDE 42

Outline

Ø CPEP: Real-time cyber-physical event processing
Ø FRAME: Fault-tolerant real-time messaging
Ø ARREC: Adaptive real-time reliable edge computing

42

[Figure: FRAME replaces the Supplier and Consumer Proxies of the original TAO event service; Venn diagram of timeliness, loss-tolerance, and efficiency]

slide-43
SLIDE 43

Message loss-tolerance requirement

Ø Application-specific requirements to an IIoT service

q : the tolerable number of consecutive losses for topic i

43

Image source: https://www.originlab.com/doc/Origin-Help/Math-Inter-Extrapoltate-YfromX

Value of k | Application examples
 | emergency response; predictive maintenance
k > 0 | condition monitoring

(Within the tolerable number, applications may use estimates for the missing data.)

slide-44
SLIDE 44

Fault-tolerance model

Ø A crash failure may happen to an IIoT service host (fail-stop)
Ø Lost messages may be recovered

1. via retransmissions from message publishers
2. via a backup service [1]

44

[1] Budhiraja, N., Marzullo, K., Schneider, F.B. and Toueg, S., 1993. The primary-backup approach. Distributed Systems, 2, pp.199-216.

[Figure: Primary service and Backup service connecting IIoT devices, edge applications, and cloud applications]

slide-45
SLIDE 45

Design principles

Ø Fault-tolerant real-time processing:

q Specify provable deadlines for message replication and dispatch
q Co-schedule replication and dispatch using, e.g., the earliest-deadline-first (EDF) policy

45

[Figure: dispatch from the Primary service to edge and cloud applications; replication from the Primary service to the Backup service]

slide-46
SLIDE 46

Necessary condition for a message loss

Ø A message may be lost only if both

1. the publisher has deleted its copy, and
2. a copy of the message has not been replicated to the Backup

46

Events between message creation and its delivery:

slide-47
SLIDE 47

Deadlines for dispatch and replication

Ø Deadline for dispatch:
Ø Deadline for replication:

47

Ni: # of most-recent messages a publisher can retransmit
Li: loss-tolerance requirement
Ti: topic’s sending period
: latency requirement

[Figure: fault-free path Publisher → Primary Broker → Subscriber (latencies ΔPB, ΔBS); crash path Publisher → Primary Broker → Backup Broker → Subscriber (latencies ΔPB, ΔBB); deadline term ( Ni + Li ) Ti]

The deadline specifications help in configuring IIoT traffic/platform parameters.
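One plausible reading of the ( Ni + Li ) Ti term (an assumption for illustration, not the paper's exact formula): a message created at time t on topic i stays recoverable while the publisher still buffers it (Ni later messages, i.e., Ni·Ti time) and while up to Li consecutive losses remain tolerable (another Li·Ti), so replication must complete within that window:

```c
#include <assert.h>

/* Hypothetical sketch: deadline by which a copy of a message created at
   t_created must reach the Backup, derived from the (Ni + Li) * Ti term. */
static double replication_deadline(double t_created, int Ni, int Li, double Ti) {
    return t_created + (Ni + Li) * Ti;
}
```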

slide-48
SLIDE 48

The FRAME messaging architecture

Ø EDF scheduling to dispatch/replicate a message
Ø Suppress replication if the dispatch deadline is smaller
Ø Prune dispatched messages

48

[Figure: FRAME architecture: publishers and subscribers connect via Message Proxies to the Primary and Backup Brokers, each with Replicators and Dispatchers performing Message Delivery]
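The first two mechanisms can be sketched as follows (hypothetical types and names; FRAME's actual implementation differs): dispatch and replication actions share one EDF queue, and replication of a message is suppressed when its dispatch deadline is the smaller of the two.

```c
#include <assert.h>

typedef struct { int msg_id; double deadline; int is_dispatch; } action_t;

/* EDF: pick the pending action (dispatch or replication) with the
   earliest deadline. Returns its index, or -1 if the queue is empty. */
static int pick_next(const action_t *q, int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (best < 0 || q[i].deadline < q[best].deadline) best = i;
    return best;
}

/* Suppression rule from the slide: if the dispatch deadline is smaller,
   the message will be delivered before replication matters, so skip it. */
static int suppress_replication(double dispatch_dl, double replication_dl) {
    return dispatch_dl < replication_dl;
}
```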

slide-49
SLIDE 49

Experiment design

Ø IIoT topic configuration:
Ø Test-bed deployment:
Ø Service configurations:

q FRAME+; FRAME; FCFS; FCFS-

49

[Figure: Test-bed: edge publishers EP1, EP2 and edge subscribers ES1, ES2 attached to Primary Broker B1 and Backup Broker B2; cloud subscriber CS1]

slide-50
SLIDE 50

Loss tolerance performance

50

Success rate of meeting loss-tolerance requirements

A small increase in Ni can greatly improve performance (FRAME+). FRAME succeeded in assuring loss-tolerance.

slide-51
SLIDE 51

Latency performance for recovery

51

FRAME can mitigate the latency penalty. Without suppressing replication, pruning could overload the system. No pruning, however, could cause a latency penalty by re-sending obsolete messages.

when the Primary host crashed

slide-52
SLIDE 52

Latency during fault-free operations

52

Observation: Both replication and pruning could delay message dispatching…

[Chart: success rate of meeting soft latency requirements]

All configurations gave similar latency performance.
How to improve efficiency?

slide-53
SLIDE 53

Outline

Ø CPEP: Real-time cyber-physical event processing
Ø FRAME: Fault-tolerant real-time messaging
Ø ARREC: Adaptive real-time reliable edge computing

53

[Figure: ARREC replaces the Supplier and Consumer Proxies of the original TAO event service; Venn diagram of timeliness, loss-tolerance, and efficiency]

slide-54
SLIDE 54

Edge Computing for IIoT

Ø Timely, reliable, and efficient IIoT edge computing

q CPEP: Real-time cyber-physical event processing
q FRAME: Fault-tolerant real-time messaging
q ARREC: Adaptive real-time reliable edge computing

54

[Figure: Venn diagram of timeliness, loss-tolerance, and efficiency covered by CPEP, FRAME, and ARREC, over the IIoT architecture of wireless sensor networks, edge services (Edge 1 … Edge N), database, and cloud]