Dynamic Real-Time Workload: A Practical Approach Based On - PowerPoint PPT Presentation

Semi-Partitioned Scheduling of Dynamic Real-Time Workload: A Practical Approach Based On Analysis-driven Load Balancing
Daniel Casini, Alessandro Biondi, and Giorgio Buttazzo
Scuola Superiore Sant’Anna, ReTiS Laboratory, Pisa, Italy

This talk in a nutshell
- Linear-time methods for task splitting: an approximation scheme for C=D with very limited utilization loss (<3%)
- Load-balancing algorithms for semi-partitioned scheduling: how to handle dynamic workload under semi-partitioned scheduling with limited task re-allocations and high schedulability performance (>87%)

Dynamic real-time workload
- Real-time tasks can join and leave the system dynamically
- No a-priori knowledge of the workload
[Figure: tasks τ1, τ2, τ3, τ4, τ5; CPUs: CPU 1, CPU 2]

Is dynamic workload relevant?
- Many real-time applications do not have a-priori knowledge of the workload
- Cloud computing, multimedia, real-time databases, …
- Example: multimedia applications on Linux that require guaranteed timing performance
- Workload typically changes at runtime while the system is operating
- The SCHED_DEADLINE scheduling class can be used to achieve EDF scheduling with reservations

Is dynamic workload relevant?
- Many real-time operating systems provide syscalls to spawn tasks at run-time (SCHED_DEADLINE)

Multiprocessor Scheduling
- Most RTOSes for multiprocessors implement APA (Arbitrary Processor Affinities) schedulers
[Figure: tasks τ1, τ2, τ3 and CPUs; labels: Global Scheduling, Partitioned Scheduling]

Global Scheduling
- Provides automatic (transparent) load balancing by construction
[Figure: tasks τ1, τ2, τ3; CPUs: CPU 1, CPU 2]

Global Scheduling
- Automatic load balancing
- High run-time overhead
- Execution difficult to predict
- Difficult derivation of worst-case bounds
- …

Partitioned Scheduling
- Typically exploits a-priori knowledge of the workload and an off-line partitioning phase
[Figure: tasks τ1 to τ7; CPUs]

Semi-Partitioned Scheduling
Anderson et al. (2005)
- Builds upon partitioned scheduling
- Tasks that do not fit in a processor are split into sub-tasks
- τ3 may experience a migration across the two processors
[Figure: τ3 split into τ3' and τ3''; tasks τ1, τ2; CPUs: CPU 1, CPU 2]

C=D Splitting
Burns et al. (2010)
- Allows splitting tasks into multiple chunks, with the first n-1 chunks at zero laxity (C = D)
- Based on EDF
Example: two chunks
- τ3 = (C_i, D_i, T_i) = (30, 100, 100)
- Zero-laxity chunk: τ3' = (20, 20, 100), with C_i' = D_i'
- Last chunk: τ3'' = (10, 80, 100), with D_i'' = T_i − D_i'

C=D Splitting
Burns et al. (2010)
- Allows splitting tasks into multiple chunks, with the first n-1 chunks at zero laxity (C = D)
- Based on EDF
- τ3' = (20, 20, 100)
- τ3'' = (10, 80, 100)
[Figure: timelines of τ3' (labels 20, 100) and τ3'' (labels 10, 80), connected by a migration]
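
The two-chunk example above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on the slide, not the authors' implementation: the `Task` type and the `split_c_equals_d` helper are hypothetical names, and the zero-laxity budget (20 here) is taken as given rather than computed. The last chunk's deadline is computed as D − C', which equals T − D' in the slide's example because the task has D = T.

```python
from typing import NamedTuple, Tuple


class Task(NamedTuple):
    wcet: int      # C: worst-case execution time (or budget)
    deadline: int  # D: relative deadline
    period: int    # T: period


def split_c_equals_d(task: Task, zero_laxity_budget: int) -> Tuple[Task, Task]:
    """Split a task into a zero-laxity chunk (C = D) and a last chunk."""
    if not 0 < zero_laxity_budget < task.wcet:
        raise ValueError("zero-laxity budget must be in (0, C)")
    # First chunk: deadline equals execution time, so it has zero laxity.
    first = Task(zero_laxity_budget, zero_laxity_budget, task.period)
    # Last chunk: remaining execution, deadline shortened by the first chunk.
    last = Task(task.wcet - zero_laxity_budget,
                task.deadline - zero_laxity_budget,
                task.period)
    return first, last


first, last = split_c_equals_d(Task(30, 100, 100), 20)
print(first)  # zero-laxity chunk
print(last)   # last chunk
```

The printed chunks correspond to τ3' and τ3'' in the example.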

A very important result
Brandenburg and Gül (2016), “Global Scheduling Not Required”
- Empirically, near-optimal schedulability (99%+) achieved with simple, well-known and low-overhead techniques
- Based on C=D Semi-Partitioned Scheduling
- Performance achieved by applying multiple clever heuristics (off-line)
- Conceived for static workload

Semi-Partitioned Scheduling
- More predictable execution
- Reuse of results for uniprocessors
- Excellent worst-case performance
- Low overhead
- A-priori knowledge of the workload
- Off-line partitioning and splitting phase

Global vs Semi-partitioned
Global:
- Automatic load balancing
- High run-time overhead
- Execution difficult to predict
- Difficulty in deriving worst-case bounds
Semi-Partitioned:
- More predictable execution
- Reuse of results for uniprocessors
- Excellent worst-case performance
- Low overhead
- Off-line partitioning and splitting phase
- A-priori knowledge of the workload

HOW TO MAINTAIN THE BENEFITS OF SEMI-PARTITIONED SCHEDULING WITHOUT REQUIRING ANY OFF-LINE PHASE?
How to partition and split tasks online?

This work
- This work considers dynamic workload consisting of reservations (budget, period)
- This model is compliant with the one available in Linux (SCHED_DEADLINE), hence present in billions of devices around the world
- The workload is executed under C=D Semi-Partitioned Scheduling
- Budget splitting: the budget is split into a zero-laxity chunk and a remaining chunk

C=D Budget Splitting
- τ = (budget = 30, period = 100), to be split
- τ' = (20, 20, 100)
- τ'' = (10, 80, 100)
- How to find a safe zero-laxity budget?
[Figure: timelines of τ' (labels 20, 100) and τ'' (labels 10, 80), connected by a migration]

How to find the zero-laxity budget?
Burns et al. (2010)
- Iterative process based on QPA (Quick Processor-demand Analysis) with high complexity (no bound provided by the authors)
- Also used by Brandenburg and Gül (2016)
- QPA is pseudo-polynomial (exponential if U=1)
- Fixed-point iteration, potentially looping a high number of times
[Flowchart: START, then QPA; if yes, END; if no, reduce C_i and run QPA again]

How to find the zero-laxity budget?
Burns et al. (2010)
- Iterative process based on QPA (Quick Processor-demand Analysis) with high complexity (no bound provided by the authors)
- Also used by Brandenburg and Gül (2016)
- QPA is pseudo-polynomial (exponential if U=1)
- Fixed-point iteration, potentially looping a high number of times
- Unsuitable to be performed online!
[Flowchart: START, then QPA; if yes, END; if no, reduce C_i and run QPA again]
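
To make the cost of the exact search concrete, here is a minimal sketch of a "reduce the budget until the test passes" loop. It is not QPA and not the algorithm of Burns et al.: as a stated simplification it uses the plain processor-demand criterion (checking dbf(t) ≤ t at every absolute deadline up to hyperperiod + max D), integer parameters, and a step of one time unit. The names `dbf`, `edf_schedulable`, and `max_zero_laxity_budget` are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import List, Tuple

Task = Tuple[int, int, int]  # (C, D, T), all integers


def dbf(task: Task, t: int) -> int:
    """Exact demand-bound function of one sporadic task at time t."""
    c, d, p = task
    return 0 if t < d else ((t - d) // p + 1) * c


def edf_schedulable(tasks: List[Task]) -> bool:
    """Processor-demand test: total dbf(t) <= t at every absolute deadline."""
    if not tasks:
        return True
    if sum(Fraction(c, p) for c, _, p in tasks) > 1:
        return False
    hyper = reduce(lambda a, b: a * b // gcd(a, b), (p for _, _, p in tasks))
    horizon = hyper + max(d for _, d, _ in tasks)
    deadlines = sorted({d + k * p
                        for _, d, p in tasks
                        for k in range((horizon - d) // p + 1)})
    return all(sum(dbf(task, t) for task in tasks) <= t for t in deadlines)


def max_zero_laxity_budget(resident: List[Task], budget: int, period: int) -> int:
    """Largest c <= budget such that the chunk (c, c, period) fits on the CPU."""
    for c in range(budget, 0, -1):           # "reduce C" step of the flowchart
        if edf_schedulable(resident + [(c, c, period)]):
            return c
    return 0                                  # no zero-laxity chunk fits


# One resident task (C=2, D=4, T=4); the new reservation has budget 3, period 8.
print(max_zero_laxity_budget([(2, 4, 4)], budget=3, period=8))  # prints 2
```

The number of checked deadlines grows with the hyperperiod, and the outer loop repeats the whole test for each candidate budget, which illustrates why this kind of search is a poor fit for on-line use.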

Our approach: approximated C=D
Main goal: compute a safe bound for the zero-laxity budget in linear time
- In this work we propose an approximate method based on solving a system of inequalities
- System: C' = D' ≤ K_1, …, C' = D' ≤ K_N, which gives C' = min(K_1, …, K_N)
- K_1, …, K_N are constants depending on static task-set parameters
- N is in the order of the number of tasks

Our approach: approximated C=D
How have we achieved the closed-form formulation?
- Approach based on approximate demand-bound functions, some of them similar to those proposed by Fisher et al. (2006)
- Plus theorems to obtain a closed-form formulation
- The derivation of the closed-form solution has also been mechanized with the Wolfram Mathematica tool
[Figure: approximate dbf(t) plotted against t]
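
The slides do not give the approximate demand-bound functions themselves, so the following is only an illustration of the general idea, using the well-known single-segment linear upper bound dbf*(t) = C + (C/T)(t − D) for t ≥ D (and 0 before D). Whether this matches any of the functions used in the talk is an assumption; the names `dbf_exact` and `dbf_linear` are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

Task = Tuple[int, int, int]  # (C, D, T)


def dbf_exact(task: Task, t: int) -> int:
    """Exact demand-bound function: a staircase with a step of C every T."""
    c, d, p = task
    return 0 if t < d else ((t - d) // p + 1) * c


def dbf_linear(task: Task, t: int) -> Fraction:
    """Linear upper bound: first step kept, then a line with slope C/T."""
    c, d, p = task
    return Fraction(0) if t < d else c + Fraction(c, p) * (t - d)


task = (2, 4, 4)
for t in (3, 4, 7, 8):
    print(t, dbf_exact(task, t), dbf_linear(task, t))

# The linear bound never underestimates the exact demand.
assert all(dbf_linear(task, t) >= dbf_exact(task, t) for t in range(50))
```

Because the bound is linear in t after the first deadline, a schedulability condition written with it becomes an inequality that can be solved for the unknown budget directly, which is the kind of closed form the slide refers to.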

Approximated C=D: Extensions
The approximation can be improved by:
- Extension 1: iterative algorithm that refines the bound; repeats the approximated C=D for a fixed number k of refinements, O(k*n)
- Extension 2: refinement of the precision of the approximate dbfs; adds a fixed number k of discontinuities, O(k*n)
[Figure: dbf(t) plotted against t]

Approximated C=D: Extensions
The approximation can be improved by:
- Extension 1: iterative algorithm that refines the bound; repeats the approximated C=D for a fixed number k of refinements, O(k*n)
- We found that significant improvements can be achieved with just two iterations
- Extension 2: refinement of the precision of the approximate dbfs; adds a fixed number k of discontinuities, O(k*n)
[Figure: dbf(t) plotted against t]

Experimental Study
- Measure the utilization loss introduced by our approach with respect to the (exact) algorithm of Burns et al.
- Setup: given a task set and a task τ_new to be split, Burns et al.'s C=D yields C*_new and our approach yields C'_new; the loss is U*_new − U'_new
- Tested almost 2 million task sets over a wide range of parameters

Representative Results
4 tasks
- Extension 1 is effective for low utilization values
- Extension 2 is effective for high utilization values
[Figure: plot for increasing CPU load; the lower the better]

Representative Results
4 tasks
- Extension 1 is effective for low utilization values
- Extension 2 is effective for high utilization values
- Utilization loss ~2% w.r.t. the exact algorithm

Representative Results
4 tasks
- Extension 1 is effective for low utilization values
- Extension 2 is effective for high utilization values
13 tasks
- The average utilization loss decreases as the number of tasks increases

Representative Results
- Utilization loss of the baseline approach reaches very low values for n > 12
- Same trend observed for all utilization values
[Figure panels: Utilization = 0.4; Utilization = 0.6]

HOW TO APPLY ON-LINE SEMI-PARTITIONING TO PERFORM LOAD BALANCING?

Why not use classical approaches?
- Existing task-placement algorithms for semi-partitioning would require reallocating many tasks (they were conceived for static workload)
- Impracticable to perform on-line: the previous allocation cannot be ignored!
[Figure: old allocation and new allocation of tasks τ1 to τ6 on CPU 1 and CPU 2]

The problem
How to achieve high schedulability performance while
- using a very limited number of re-allocations; and
- keeping the mechanism as simple as possible?
Focus on practical applicability

Proposed approach
- First try a simple bin-packing heuristic (e.g., first-fit)
[Figure: tasks τ1, τ2, τ3; CPUs: CPU 1, CPU 2]
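
A first-fit attempt can be sketched as follows. The admission test is an assumption made for brevity: every CPU is taken to host only unsplit reservations (budget, period) with deadline equal to period, so the EDF utilization bound U ≤ 1 is enough; CPUs that already host zero-laxity chunks would need a demand-based test instead. The name `first_fit` is hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Reservation = Tuple[int, int]  # (budget, period), deadline assumed equal to period


def utilization(reservations: List[Reservation]) -> Fraction:
    """Total utilization of the reservations on one CPU."""
    return sum((Fraction(b, p) for b, p in reservations), Fraction(0))


def first_fit(cpus: List[List[Reservation]], new: Reservation) -> Optional[int]:
    """Place `new` on the first CPU where EDF utilization stays <= 1.

    Returns the CPU index, or None if no CPU can host the whole reservation
    (the case in which splitting would be attempted next).
    """
    for index, assigned in enumerate(cpus):
        if utilization(assigned) + Fraction(*new) <= 1:
            assigned.append(new)
            return index
    return None


cpus = [[(60, 100)], [(50, 100)]]
print(first_fit(cpus, (30, 100)))  # 0: CPU 0 goes from 0.6 to 0.9
print(first_fit(cpus, (30, 100)))  # 1: CPU 0 is too full, CPU 1 goes to 0.8
print(first_fit(cpus, (30, 100)))  # None: no CPU can host it whole
```

Existing allocations are never moved, which keeps the number of re-allocations at zero for this step.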

Proposed approach
- If not schedulable, try to split
[Figure: τ4 split into τ4' and τ4''; tasks τ1, τ2, τ3; CPUs: CPU 1, CPU 2]

Proposed approach
- How to split? Take the maximum zero-laxity budget across the processors: C_8' = max(C_8^{',1}, C_8^{',2}, C_8^{',3}, C_8^{',4})
[Figure: τ8 split into τ8' and τ8''; tasks τ1 to τ7; CPUs: CPU 1 to CPU 4]
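
The selection rule above can be sketched as a small helper. It assumes the per-CPU zero-laxity budgets have already been computed by some safe method (exact or approximated) and are passed in as a list; the function name `plan_split` and the returned tuple layout are hypothetical. Placing the remaining chunk on another processor is a further step that this sketch does not attempt.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[int, int, int]  # (C, D, T)


def plan_split(budget: int, period: int,
               per_cpu_zero_laxity: List[int]) -> Optional[Tuple[int, Chunk, Chunk]]:
    """Pick the CPU offering the largest zero-laxity budget and build both chunks.

    Returns (cpu_index, zero_laxity_chunk, remaining_chunk), or None when no
    proper split exists (nothing fits anywhere, or no split is needed).
    """
    if not per_cpu_zero_laxity:
        return None
    # On ties, max() keeps the lowest CPU index.
    cpu = max(range(len(per_cpu_zero_laxity)), key=lambda k: per_cpu_zero_laxity[k])
    c_first = per_cpu_zero_laxity[cpu]
    if c_first <= 0 or c_first >= budget:
        return None
    zero_laxity_chunk = (c_first, c_first, period)
    remaining_chunk = (budget - c_first, period - c_first, period)
    return cpu, zero_laxity_chunk, remaining_chunk


# Reservation (budget = 30, period = 100); hypothetical budgets on four CPUs.
print(plan_split(30, 100, [5, 20, 0, 12]))
# → (1, (20, 20, 100), (10, 80, 100))
```

Taking the maximum leaves the smallest possible remaining chunk, which should make that chunk easier to place elsewhere.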
