1

Adaptive QoS Control in Distributed Real-Time Middleware

Chenyang Lu

Department of Computer Science and Engineering Washington University in St. Louis

2

Challenges for Real-Time Systems

Classical real-time scheduling theory relies on accurate knowledge of the workload and platform.

New challenges under uncertainty:

  • Maintain robust real-time properties in the face of unknown and varying workload, system failure, and system upgrade
  • Certification and testing of real-time properties of adaptive systems

3

Challenge 1:

Workload Uncertainties

Task execution times

  • Heavily influenced by sensor data or user input
  • Unknown and time-varying

Disturbances

  • Aperiodic events
  • Resource contention from subsystems
  • Denial of Service attacks

e.g., SCADA for power grid management, total ship computing environment

4

Challenge 2:

System Failure

Maintaining functional reliability alone is not sufficient. Must also maintain robust real-time properties!

1. Norbert fails.
2. Move its tasks to other processors.

Hermione and Harry are overloaded!

5

Challenge 3:

System Upgrade

Goal: Portable application across HW/OS platforms

  • Same application “works” on multiple platforms

Existing real-time middleware

  • Supports functional portability
  • Lacks QoS portability: applications must be manually reconfigured for each platform to achieve the desired real-time properties
      • Profile execution times
      • Determine/implement allocation and task rates
      • Test/analyze schedulability

Time-consuming and expensive!

6

Example: nORB Middleware

[Figure: nORB architecture. A client hosting CORBA objects sends operation requests over request lanes to the server, which contains connection threads, priority queues, worker threads, and a timer thread. Task rates (T1: 2 Hz, T2: 12 Hz) are set by offline, manual configuration.]

7

Challenge 4:

Certification

Uncertainties call for adaptive solutions. But…

  • Adaptation can make things worse.
  • Adaptive systems are difficult to test and certify.

[Figure: An unstable adaptive system. CPU utilization of P1 and P2 versus time (sampling periods), plotted against the set point.]

8

Adaptive QoS Control

[Figure: Adaptive QoS Control Middleware sits between the applications and the drivers/OS/HW, facing open questions: Available resources? HW failure? Sensor/human input? Disturbance?]

Maintain QoS guarantees

  • w/o accurate knowledge about workload/platform
  • w/o hand tuning

  • Develop software feedback control in middleware
  • Achieve robust real-time properties for many applications
  • Apply control theory to design and analyze control algorithms
  • Facilitate certification of embedded software

9

Adaptive QoS Control Middleware

FCS/nORB: Single server control FC-ORB: Distributed systems with end-to-end tasks

10

Feedback Control Real-Time Scheduling (FCS) Service

  • Developers specify
      • Performance specs
          • CPU utilization = 70%; Deadline miss ratio = 1%.
      • Tunable parameters
          • Range of task rate: digital control loop, video/data display
          • Quality levels: image quality, filters
          • Admission control
  • FCS guarantees specs by tuning parameters based on online feedback
      • Automatic: No need for hand tuning
      • Transparent to developers
      • Performance Portability!

11

A Feedback Control Loop

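[Figure: FC-U feedback control loop inside the middleware. The Monitor measures the utilization U(k), the Controller compares it with the spec (Us = 70%), and the Actuator applies new task rates {Ri(k+1)} within the tunable parameter ranges (R1: [1, 5] Hz, R2: [10, 20] Hz). The HW, drivers/OS, application, sensors, and inputs are treated as unknowns.]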

12

The FC-U Algorithm

Us: utilization reference
Ku: control parameter
Ri(0): initial rate

1. Get utilization U(k) from Utilization Monitor.
2. Utilization Controller: B(k+1) = B(k) + Ku*(Us – U(k)) /* Integral Controller */
3. Rate Actuator adjusts task rates: Ri(k+1) = (B(k+1)/B(0)) Ri(0)
4. Inform clients of new task rates.
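The four steps above can be sketched in Python as follows. This is a minimal illustration, not the FCS/nORB implementation: the class name, the clamping of rates to their allowed ranges, the value chosen for B(0), and the toy plant used in place of the Utilization Monitor are all assumptions made for the example.

```python
class FCUController:
    """Sketch of the FC-U loop: integral control of CPU utilization."""

    def __init__(self, u_set, ku, initial_rates, rate_ranges, b0):
        self.u_set = u_set                        # Us: utilization reference
        self.ku = ku                              # Ku: control parameter
        self.initial_rates = list(initial_rates)  # Ri(0), in Hz
        self.rate_ranges = list(rate_ranges)      # (min, max) per task; clamping is an assumption
        self.b0 = b0                              # B(0): hypothetical initial value
        self.b = b0                               # B(k)

    def step(self, u_measured):
        """Steps 2 and 3: update B(k), then rescale every task rate."""
        # B(k+1) = B(k) + Ku * (Us - U(k))
        self.b += self.ku * (self.u_set - u_measured)
        # Ri(k+1) = (B(k+1) / B(0)) * Ri(0), kept inside the allowed range
        scale = self.b / self.b0
        return [min(max(scale * r0, lo), hi)
                for r0, (lo, hi) in zip(self.initial_rates, self.rate_ranges)]


def simulate(ctrl, exec_times, steps):
    """Toy plant standing in for the monitor: U(k) = sum(exec_time_i * rate_i)."""
    rates = list(ctrl.initial_rates)
    for _ in range(steps):
        u = sum(c * r for c, r in zip(exec_times, rates))  # step 1: measure U(k)
        rates = ctrl.step(u)                               # steps 2-3: new rates
    return rates, sum(c * r for c, r in zip(exec_times, rates))


# Hypothetical workload: two tasks whose execution times the controller never sees.
ctrl = FCUController(0.70, 0.5, [2.0, 12.0], [(1.0, 5.0), (10.0, 20.0)], b0=0.5)
rates, u = simulate(ctrl, exec_times=[0.05, 0.03], steps=50)
print(rates, round(u, 3))
```

In this toy run the controller only observes U(k), yet the utilization settles near the 70% reference, which is the point of the integral term: the error keeps accumulating into B(k) until U(k) matches Us.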

13

The Family of FCS Algorithms

FC-U controls utilization

  • Performance spec: U(k) = Us
  • Meet all deadlines if Us ≤ schedulable utilization bound
  • Relatively low utilization if utilization bound is pessimistic

FC-M controls miss ratio

  • Performance spec: M(k) = Ms
  • High utilization
  • Does not require utilization bound to be known a priori
  • Small but non-zero deadline miss ratio: M(k) > 0

FC-UM combines FC-U and FC-M

  • Performance specs: Us, Ms
  • Allow higher utilization than FC-U
  • No deadline misses in “nominal” case
  • Performance bounded by FC-M

14

Control Analysis

Rigorously designed based on feedback control theory

Analytic guarantees on

  • Stability
  • Steady state performance
  • Transient state: settling time and overshoot
  • Robustness against variation in execution time

Do not assume accurate knowledge of execution time

Lu, Stankovic, Tao, and Son, Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms, Real-Time Systems, 23(1/2), July/September 2002.

15

Dynamic Response

[Figure: controlled variable versus time, compared with the reference. The plot marks the transient state (settling time), the steady state (steady state error), and stability.]

16

FCS/nORB Architecture

[Figure: FCS/nORB architecture. The server has a timer thread, worker threads, connection threads, priority queues, a utilization monitor, a miss monitor, a controller, and a rate assigner. The client hosts CORBA objects and a rate modulator. Client and server are connected by operation request lanes and a feedback lane.]

17

Implementation

Running on top of COTS Linux

Deadline Miss Monitor

  • Instrument operation request lanes
  • Time-stamp operation request and response on each lane

CPU Utilization Monitor

  • Interface with Linux /proc/stat file
  • Count idle time; “coarse” granularity: jiffy (10 ms)

Only controls server delay
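As an illustration of the CPU Utilization Monitor idea, the sketch below derives utilization from two snapshots of the aggregate `cpu` line in `/proc/stat` by counting idle jiffies. It is a minimal sketch, not the FCS/nORB monitor: the function names are hypothetical, and it assumes the standard column order (user, nice, system, idle, ...) and ignores any columns beyond the first eight.

```python
def parse_cpu_line(stat_text):
    """Return (idle_jiffies, total_jiffies) from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep at most user..steal; older kernels expose only four columns.
            values = [int(v) for v in fields[1:9]]
            idle = values[3]  # fourth column counts idle jiffies
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def utilization(prev_text, curr_text):
    """CPU utilization over the interval between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(prev_text)
    idle1, total1 = parse_cpu_line(curr_text)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0  # no jiffies elapsed; nothing to measure
    return 1.0 - (idle1 - idle0) / elapsed


# Synthetic snapshots: 400 jiffies elapsed, 120 of them idle.
before = "cpu  100 0 50 850\n"
after = "cpu  340 0 90 970\n"
print(utilization(before, after))
```

On a Linux host the two snapshots would come from reading `/proc/stat` at the start and end of a sampling period. Because the counters advance in jiffies, a long sampling period (such as the 4 s used in the experiments) keeps the quantization error small.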

18

Offline or Online?

Offline

  • FCS executed in testing phase on a new platform
  • Turned off after entering steady state
  • No run-time overhead
  • Cannot deal with varying workload

Online

  • Run-time overhead (actually small…)
  • Robustness in the face of changing execution times

19

Set-up

  • OS: Redhat Linux
  • Hardware platform
      • Server A: 1.8 GHz Celeron, 512 MB RAM
      • Server B: 1.99 GHz Pentium 4, 256 MB RAM
      • Same client
      • Connected via 100 Mbps LAN
  • Experiment
      1. Overhead
      2. Steady execution time (offline case)
      3. Varying execution time (online case)

20

Server Overhead

[Figure: Server overhead per sampling period (ms) for FC-U, FC-M, and FC-UM; sampling period = 4 sec.]

  • Overhead: FC-UM > FC-M > FC-U
  • FC-UM increases CPU utilization by <1% for a 4 s sampling period.
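As a rough check on the <1% figure, the added utilization can be read as the per-period overhead divided by the sampling period. The symbols below are introduced only for this illustration and do not come from the slides.

```latex
\Delta U = \frac{C_{\text{overhead}}}{T_s},
\qquad
\Delta U < 1\% \iff C_{\text{overhead}} < 0.01 \times 4\,\text{s} = 40\,\text{ms}
```

So with a 4 s sampling period, any controller overhead below 40 ms per period stays under 1% of the CPU.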

21

Performance Portability

Steady Execution Time

[Figure: two plots of U(k), B(k), and M(k) versus time (sampling periods of 4 sec), with Us = 70%. Panel 1: FC-U on Server A (1.8 GHz Celeron, 512 MB RAM). Panel 2: FC-U on Server B (1.99 GHz Pentium 4, 256 MB RAM).]

  • Same CPU utilization (and no deadline miss) on different platforms w/o hand-tuning!

22

Steady-state Deadline Miss Ratio

Server A

[Figure: Average Deadline Miss Ratio in Steady State (%) for FC-U, FC-M, and FC-UM. FC-M: 1.49% (Ms = 1.5%).]

  • FC-M enforces miss ratio spec
  • FC-U and FC-UM cause no deadline misses

23

Steady-State CPU Utilization

Server A

[Figure: Average CPU Utilization in Steady State (%). FC-U: 70.01 (Us = 70%), FC-M: 98.93, FC-UM: 74.97 (Us = 75%).]

  • FC-U and FC-UM enforce utilization spec
  • FC-M achieves higher utilization

24

Robust Guarantees

Varying Execution Time

[Figure: U(k), B(k), and M(k) versus time (sampling periods of 4 sec) while execution times vary.]

  • Same CPU utilization and no deadline miss in steady state despite changes in execution times!

25

Tolerance to Load Increase

Surprise: server crashes under FC-M when execution time increases

  • FCS/nORB threads run at real-time priority
  • Kernel starvation when CPU utilization reaches 100%

Tolerance margin of load increase

  • FC-U, FC-UM: margin = 1/Us - 1
      • Us = 70%: server can tolerate a (1/0.7 - 1) = 43% increase in execution time
  • FC-M: small and “unknown” margin
      • Inappropriate as a middleware-level service when execution time can increase unexpectedly
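The margin formula follows from utilization scaling linearly with execution time at fixed task rates: starting from Us, the CPU saturates once execution times grow by a factor of 1/Us. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def tolerance_margin(u_set):
    """Largest relative execution-time increase before utilization hits 100%."""
    if not 0.0 < u_set <= 1.0:
        raise ValueError("utilization set point must be in (0, 1]")
    return 1.0 / u_set - 1.0


def utilization_after_increase(u_set, increase):
    """Utilization right after execution times grow, before rates adapt."""
    return min(1.0, u_set * (1.0 + increase))


margin = tolerance_margin(0.70)
print(f"{margin:.0%}")                          # about 43%, as on the slide
print(utilization_after_increase(0.70, 0.30))   # still below saturation
print(utilization_after_increase(0.70, 0.50))   # saturated at 1.0
```

A lower set point buys a larger margin at the cost of steady-state utilization, which is the trade-off between FC-U/FC-UM and FC-M described above.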

26

Summary of Experimental Results

FCS algorithms enforce the specified CPU utilization or miss ratio in steady state

  • Experimental validation of control design and analysis of FCS

Performance Portability: FCS/nORB achieves the same performance guarantee when

  • the platform changes
  • execution time changes (within tolerance margin)

Overhead is acceptable, so FCS can be used online

27

Summary: FCS/nORB

  • FCS/nORB supports robust, performance-portable real-time software
  • Program the application once; it runs on multiple platforms with robust performance guarantees!

  • FCS/nORB 1.0 release:

http://deuce.doc.wustl.edu/FCS_nORB

  • Next: FC-ORB
  • Handle end-to-end tasks
  • Fault tolerance

28

Adaptive QoS Control Middleware

FCS/nORB: Single server control FC-ORB: Distributed systems with end-to-end tasks

29

End-to-End Task Model

Periodic task Ti = chain of subtasks {Tij} on different processors

  • All subtasks run at the same rate
  • End-to-end deadline

Task rate can be adjusted within a range

  • Trade-off between video quality and rate
  • Higher rate → better video quality and higher CPU utilization

[Figure: tasks T1, T2, T3 and subtasks T11, T12, T13 on processors P1, P2, P3, with labels marking subtasks and precedence constraints.]

30

End-to-End Utilization Control

CPU utilization

  • Too high → system overload → crash
  • Too low → poor performance (e.g. poor video quality)
  • Utilization < schedulable bound → meet deadlines

Uncertainties: varying task execution times

  • Adjust task rates to compensate for variations

[Figure: tasks T1, T2, T3 and subtasks T11, T12, T13 on processors P1, P2, P3, with labels marking subtasks and precedence constraints.]

31

Challenges of End-to-End Utilization Control

  • Multi-Input-Multi-Output (MIMO) control
  • Utilizations are coupled due to end-to-end tasks
      • A rate change affects all processors in the task chain
  • Constraints on task rates
  • Stability assurance

[Figure: a controller observing the CPU utilizations of processors P1, P2, P3 (example values between 30% and 80%), which run tasks T1, T2, T3 with subtasks T11, T12, T13.]
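The coupling can be made concrete with a small sketch. It assumes the usual model in which a processor's utilization is the sum, over the subtasks it hosts, of execution time times task rate. The placement, execution times, and rates below are hypothetical and are not taken from the slides.

```python
def utilizations(rates, subtasks, n_procs):
    """u_i = sum over subtasks on processor i of (exec_time * rate of its task)."""
    u = [0.0] * n_procs
    for task, proc, exec_time in subtasks:
        u[proc] += exec_time * rates[task]
    return u


# (task index, processor index, execution time in seconds); hypothetical values.
SUBTASKS = [
    (0, 0, 0.02),  # T11 on P1
    (0, 1, 0.03),  # T12 on P2
    (0, 2, 0.01),  # T13 on P3
    (1, 0, 0.04),  # T2 on P1
    (2, 2, 0.05),  # T3 on P3
]

before = utilizations([10.0, 5.0, 8.0], SUBTASKS, 3)
after = utilizations([15.0, 5.0, 8.0], SUBTASKS, 3)  # only T1's rate raised
print(before)
print(after)
```

Raising the rate of the end-to-end task T1 moves the utilization of all three processors at once, whereas changing T2 would only affect P1. A controller that tunes each processor in isolation would therefore disturb its neighbours, which motivates the MIMO formulation.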

32

EUCON – Centralized Control Algorithm

[Figure: a Model Predictive Controller connected to a Utilization Monitor (UM) and a Rate Modulator (RM) on each processor.]

EUCON (End-to-end Utilization CONtrol)

  • Designed based on Model Predictive Control (MPC) theory
  • Invoked periodically to control the utilizations of all processors

Controlled variables: CPU utilizations

$$u(k) = \begin{bmatrix} u_1(k) \\ \vdots \\ u_n(k) \end{bmatrix}$$

Manipulated variables: task rate changes

$$\Delta r(k) = \begin{bmatrix} \Delta r_1(k) \\ \vdots \\ \Delta r_m(k) \end{bmatrix}$$

Constraints: allowed rate range for tasks

$$\begin{bmatrix} R_{\min,1} & R_{\max,1} \\ \vdots & \vdots \\ R_{\min,m} & R_{\max,m} \end{bmatrix}$$

Desired utilization bounds

$$B = \begin{bmatrix} B_1 \\ \vdots \\ B_n \end{bmatrix}$$

33

FC-ORB

Feedback Controlled Object Request Broker

End-to-end utilization control

Maintains desired utilizations on all processors

End-to-end ORB architecture

Specialized for rate adaptation

Task migration

Improves reliability in terms of both system functions and real-time performance

34

End-to-End Utilization Control Service

  • Implements EUCON (End-to-end Utilization CONtrol)
  • Provides functional and performance portability

[Figure: a Model Predictive Controller connected through remote request lanes to three processors, each with a Priority Manager, Rate Modulator, and Utilization Monitor. Controlled variables: utilizations. Manipulated variables: rate changes.]

35

End-to-End Object Request Broker

  • Release guard for end-to-end tasks
  • Priority management
      • Rate adaptation → continuous priority changes
      • Thread-per-priority → high overhead
      • Thread-per-subtask: change priority only when the order of task rates changes

[Figure: three processors connected by remote request lanes, each with a Priority Manager, Rate Modulator, and Utilization Monitor.]
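The thread-per-subtask rule can be illustrated with a short sketch. Under rate-monotonic scheduling (the policy used in the experiments) only the relative order of task rates determines priorities, so a rate change that preserves the order needs no priority update. The function names and the tie-breaking by task name are assumptions made for this example.

```python
def rm_priorities(rates):
    """Map task name -> priority rank (0 = highest) in rate-monotonic order."""
    # Higher rate first; ties broken by task name (an assumption).
    ordered = sorted(rates, key=lambda task: (-rates[task], task))
    return {task: rank for rank, task in enumerate(ordered)}


def needs_priority_update(old_rates, new_rates):
    """True only when the rate change reorders the tasks."""
    return rm_priorities(old_rates) != rm_priorities(new_rates)


old = {"T1": 2.0, "T2": 12.0, "T3": 5.0}
scaled = {"T1": 3.0, "T2": 15.0, "T3": 6.0}     # all rates change, order kept
reordered = {"T1": 6.0, "T2": 12.0, "T3": 5.0}  # T1 overtakes T3

print(needs_priority_update(old, scaled))
print(needs_priority_update(old, reordered))
```

Because the controller adjusts rates every sampling period, skipping the update whenever the order is unchanged avoids most thread priority changes.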

36

Task Migration

  • Fault model: permanent processor failure
  • Subtasks have backups on different processors
  • Utilization control + fault-tolerance
      • Automatic controller reconfiguration
      • Handle overload caused by task migration

[Figure: a Model Predictive Controller connected through remote request lanes to processors that each have a Priority Manager, Rate Modulator, and Utilization Monitor.]

Utilizations with three processors:

$$u(k) = \begin{bmatrix} u_1(k) \\ u_2(k) \\ u_3(k) \end{bmatrix}$$

Rate changes:

$$\Delta r(k) = \begin{bmatrix} \Delta r_1(k) \\ \Delta r_2(k) \end{bmatrix}$$

Utilizations with two processors:

$$u(k) = \begin{bmatrix} u_1(k) \\ u_2(k) \end{bmatrix}$$
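A minimal sketch of the bookkeeping behind migration: subtasks of the failed processor move to their backups, the failed processor disappears from the set of controlled utilizations, and the surviving processors see a higher load that the utilization controller must then bring back down. This is not FC-ORB code; the data layout, names, and numbers are hypothetical.

```python
def migrate(subtasks, backups, failed_proc):
    """Return the placement after a permanent failure of `failed_proc`."""
    placement = []
    for name, task, proc, exec_time in subtasks:
        if proc == failed_proc:
            proc = backups[name]  # raises KeyError if a subtask has no backup
        placement.append((name, task, proc, exec_time))
    return placement


def utilization_by_processor(subtasks, rates):
    """Per-processor utilization: sum of exec_time * task rate."""
    u = {}
    for _, task, proc, exec_time in subtasks:
        u[proc] = u.get(proc, 0.0) + exec_time * rates[task]
    return u


# (subtask, task, processor, execution time in seconds); hypothetical values.
SUBTASKS = [
    ("T11", "T1", "P1", 0.03),
    ("T12", "T1", "P2", 0.02),
    ("T13", "T1", "P3", 0.03),
    ("T21", "T2", "P2", 0.05),
    ("T22", "T2", "P3", 0.02),
]
BACKUPS = {"T13": "P1", "T22": "P2"}
RATES = {"T1": 10.0, "T2": 8.0}

before = utilization_by_processor(SUBTASKS, RATES)
after = utilization_by_processor(migrate(SUBTASKS, BACKUPS, "P3"), RATES)
print(before)
print(after)
```

After the failure the controller has two utilizations to regulate instead of three, while the two task rates remain its manipulated variables. Lowering those rates is what relieves the extra load on P1 and P2.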

37

FC-ORB Implementation

  • Implemented based on FCS/nORB, nORB and ACE
  • Specialized for memory-constrained Distributed Real-time and Embedded (DRE) systems
  • 7017 lines of C++ code
  • Controller is implemented as a Dynamic Link Library (DLL) generated by MATLAB

38

Experimental Setup

  • 12 tasks (25 subtasks) and 4 Pentium IV processors
  • KURT Linux 2.4.22
  • Rate Monotonic Scheduling
  • Subtasks on Norbert have backups on other processors

[Figure: allocation of normal subtasks and backup subtasks (labelled i_j for subtask j of task i) across the processors Harry, Ron, Hermione, and Norbert.]

39

Goal 1: Robust Utilization Control

  • Execution times change at runtime
  • Disturbance from external resource contention

Desired utilization: 73% (0.73)

40

Goal 2: Performance Portability

Same utilization: portable performance

  • Even on different systems with different computing capacity

[Figure: two plots of CPU utilization versus time (sec) for Ron, Harry, Norbert, and Hermione, with a desired utilization of 73% (0.73). Panel 1: real execution times are twice as long as nominal (running on slow machines). Panel 2: real execution times are 1/4 of nominal (running on fast machines).]

41

Goal 3: Fault Tolerance

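1. Norbert fails.
2. Move its tasks to other processors.
3. Reconfigure the controller.
4. Control utilization by adjusting task rates.

[Figure: tasks T1, T2, T3 with subtasks T11, T12, T13 on P1, P2, and Norbert. Processor utilizations are labelled 73%, with one processor marked “100% !!”.]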

42

Summary: FC-ORB

1. Robust utilization control, despite
     • unknown or varying execution times
     • external disturbances
2. Performance portability
3. Fault tolerance, in terms of
     • functionality
     • real-time performance

43

Conclusion: Adaptive QoS Control

  • Software feedback control: achieves robust real-time properties under uncertainty
  • Middleware: provides reusable adaptive QoS control services to many real-time applications
  • Control analysis: facilitates certification of embedded software

Future

  • Advanced control: event-driven, discrete configurations
  • Coordination of multiple control policies
  • Sophisticated fault tolerance techniques
  • Certification/testing methodologies

44

Reading

Control of a single server

  • FCS/nORB: Feedback Control Real-Time Scheduling in ORB Middleware, RTAS’03. (required)
  • FCS: Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms, Real-Time Systems, 2002.

Centralized control of distributed systems

  • FC-ORB: Enhancing the Robustness of Distributed Real-Time Middleware via End-to-End Utilization Control, RTSS’05.
  • EUCON: Feedback Utilization Control in Distributed Real-Time Systems with End-to-End Tasks, RTSS’05, IEEE TPDS.

Decentralized control of distributed systems

  • DEUCON: Decentralized Utilization Control in Distributed Real-Time Systems, RTSS’05, IEEE TPDS.

45

For More Information

  • Papers: http://www.cs.wustl.edu/~lu
  • Open source middleware: http://www.cse.wustl.edu/~lu/aqc.htm