SLIDE 1

Practical Steady-State Scheduling for Tree-Shaped Task Graphs

Sékou DIAKITÉ1, Loris MARCHAL2, Jean-Marc NICOD1, Laurent PHILIPPE1 - 19/11/2009

1: Laboratoire d’Informatique de Franche-Comté Université de France Comté, France 2: Laboratoire de l’Informatique du Parallélisme CNRS - INRIA - Université de Lyon, France

slide-2
SLIDE 2

Outline

• Scheduling problem
• Principle of steady-state scheduling: overview, shortcomings
• Reducing the latency: dependencies, mixed integer program, heuristic approach
• Using non-conservative steady-state solutions
• Experimental results: simulation settings, inter-period dependencies, scheduling efficiency, number of running instances, running time of the algorithms
• Synthesis

DIAKITÉ, MARCHAL, NICOD, PHILIPPE ROMA/GRAAL working Group - 19/11/2009 2 / 43

SLIDE 3

Scheduling problem

Definitions

Execution platform

undirected graph, Gp = (Vp, Ep)

• Vp = {P1, ..., Pn}: n processors
• Ep: communication links between the processors
• bidirectional one-port model; ci,j is the time needed to send a unit of data from Pi to Pj

Example

(figure: example platform graph connecting processors P1, P2, P3, P4)

SLIDE 4

Scheduling problem

Definitions

Application

DAG with no forks (in-trees), Ga = (Va, Ea)

Va = {T1, ..., Tk} : k tasks

unrelated computation model, wi,k : time needed by Pi to execute Tk

Ea: dependencies between tasks

Fk,l is the amount of data (File) produced by Tk and consumed by Tl

(figure: example in-tree task graph with tasks T1, T2, T3, T4, to be executed 10 to 1000 times)
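The platform and application definitions above can be sketched as plain Python data structures. This is an illustrative encoding only: the processor/task names and all numeric values are assumptions, not taken from the paper.

```python
from collections import Counter

# Platform: undirected graph Gp = (Vp, Ep) with bidirectional one-port
# links; c[(i, j)] is the time to send one unit of data from Pi to Pj.
processors = ["P1", "P2", "P3", "P4"]
c = {("P1", "P3"): 2.0, ("P2", "P3"): 1.0, ("P3", "P4"): 3.0}

# Application: an in-tree Ga = (Va, Ea); w[(i, k)] is the time needed by
# Pi to execute Tk (unrelated machines), F[(k, l)] the size of the file
# produced by Tk and consumed by Tl.
tasks = ["T1", "T2", "T3", "T4"]
w = {("P1", "T1"): 2.0, ("P2", "T1"): 5.0}   # partial, for illustration
F = {("T1", "T4"): 1.0, ("T2", "T4"): 2.0, ("T3", "T4"): 1.0}

# In-tree check: every task has at most one successor (no forks).
out_deg = Counter(src for src, _ in F)
assert all(out_deg[t] <= 1 for t in tasks)
print(out_deg)
```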

SLIDE 5

Scheduling problem

How do we proceed?

Problem

Executing a batch of graphs (from 10 to 1000)

Objective

Minimizing the makespan Cmax

Chosen method

Steady-state technique which is asymptotically optimal (throughput)

SLIDE 7

Principle of steady-state scheduling

Overview

This study is based on

  • O. Beaumont, A. Legrand, L. Marchal and Y. Robert. Steady-state scheduling on heterogeneous clusters. Int. J. of Foundations of Computer Science, 16(2):163–194, 2005.

SLIDE 8

Principle of steady-state scheduling

Overview

Converting the scheduling problem into a linear program

the steady state is characterized by activity variables:

• the average number of Tk processed by Pi in one time unit
• the average number of Fk,l sent by Pi to Pj in one time unit

these activity variables allow us to write constraints:

• on processor speeds and link bandwidths
• "conservation laws" stating that each Fk,l has to be produced by Tk and consumed by Tl

these constraints describe a valid steady-state schedule; adding the objective of maximizing the steady-state throughput yields a linear program
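A minimal sketch of such a throughput-maximizing LP, under strong simplifying assumptions (one task type, two processors, communication ignored; this is not the paper's full formulation). The activity alpha_i is the average number of tasks processed by Pi per time unit, constrained by the processor speed.

```python
from scipy.optimize import linprog

# Unrelated processing times: w_i time units per task on processor Pi.
# Each processor can be busy at most 1 time unit per time unit, so
# w_i * alpha_i <= 1. Objective: maximize throughput sum(alpha_i).
w = [2.0, 4.0]
c = [-1.0, -1.0]                      # linprog minimizes, so negate
A_ub = [[w[0], 0.0], [0.0, w[1]]]     # w_i * alpha_i <= 1
b_ub = [1.0, 1.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
alpha, rho = res.x, -res.fun
print(alpha, rho)                     # alpha = [0.5, 0.25], rho = 0.75
```

Each processor saturates its own constraint, and the optimal throughput is the sum of the individual rates 1/w_i.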

SLIDE 9

Principle of steady-state scheduling

Overview

From the linear program to a periodic schedule (period)

the optimal solution of the linear program gives rational activities, but we cannot split tasks and files

→ the period length L is the LCM of the activities' denominators
→ multiplying every activity by L makes all activities integers

L is large but bounded; the period allows us to schedule any number of graphs. The final schedule consists of 3 phases:

• initialization
• steady state: n periods
• clean-up
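The LCM step can be sketched in a few lines of Python, with hypothetical rational activities standing in for an LP solution:

```python
from fractions import Fraction
from math import lcm

# Hypothetical rational activities from the LP solution: average number
# of occurrences of each activity per time unit.
activities = [Fraction(1, 2), Fraction(1, 3), Fraction(5, 6)]

# Period length L = LCM of the denominators; multiplying every activity
# by L turns it into an integer count per period.
L = lcm(*(a.denominator for a in activities))
counts = [int(a * L) for a in activities]
print(L, counts)   # 6 [3, 2, 5]
```

This also shows why L can blow up: one activity with a large denominator inflates the LCM, which is the motivation for the non-conservative reduction later in the talk.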

SLIDE 10

Principle of steady-state scheduling

Overview

Example

(figure: platform graph P1, P2, P3; task graph T1 → T2 with file F1,2; two allocations A1 and A2; Gantt chart of one steady-state period of length L showing T1, T2 and the transfers F1,2 on processors P1, P2, P3 and on links P1 → P3 and P2 → P3)

SLIDE 12

Principle of steady-state scheduling

Shortcomings

Long latency

several periods are necessary to process an instance

→ a drawback for interactive applications
→ leads to large buffers: at every time step, a large number of ongoing jobs has to be stored

Long initialization and clean-up phases

the period contains a large number of ongoing jobs

→ long initialization phase to enter steady-state → long clean-up phase to leave steady-state

initialization and clean-up are done with heuristic scheduling

→ we lose the benefit of the optimal steady-state phase

SLIDE 13

Principle of steady-state scheduling

Shortcomings

Addressing the shortcomings

the original steady-state algorithm reaches a good Cmax as soon as the number of instances is large enough; in this study, we aim at reducing this threshold

SLIDE 14

Principle of steady-state scheduling

Addressing the shortcomings

Means of action

decrease the length of the period

hard to do when we want to keep an optimal period

reduce the latency (inter/intra dependencies)

side benefit : less work to do in initialization and clean-up (gain on Cmax)

reduce the period length by allowing a small reduction of the throughput

side benefit : reducing the latency

SLIDE 16

Reducing the latency

Dependencies

How to reduce the latency?

Intra-period dependencies.

The original steady-state (only inter-period dependencies)

(figure: platform graph P1, P2, P3 and task graph T1 → T2 with file F1,2; Gantt chart of the original steady-state schedule, where each period runs two instances of each task (e.g. T1^2n, T1^2n+1) and every file F1,2 produced in one period is consumed only in a later period: all dependencies are inter-period)

SLIDE 17

Reducing the latency

Dependencies

The steady-state with intra-period dependencies

(figure: the same schedule after reorganization; the Gantt chart shows periods n, n+1, n+2 with both dependency kinds annotated: files such as F1,2^2n are now consumed by T2^2n within the period that produced them, turning inter-period dependencies into intra-period ones)

SLIDE 19

Reducing the latency

Mixed Integer Program

Ordering

Tasks (Tj, Tk) on the same processor Pi

• binary variable yj,k = 1 if and only if Tj is processed before Tk
• tj is the starting time of task Tj; L is the length of the period

tj − tk ≥ −yj,k × L   (1)
yj,k + yk,j = 1   (2)
tk − (tj + wi,j) ≥ (yj,k − 1) × L   (3)
tj + wi,j ≤ L   (4)
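Constraints (1)-(4) can be checked on toy values. The snippet below only verifies one candidate assignment of y and t (the values are assumptions for illustration); a real MIP solver such as CPLEX searches over these variables.

```python
# Two tasks Tj, Tk on the same processor Pi, period length L = 10.
def feasible(t, y, w, L):
    a, b = "Tj", "Tk"
    return (
        t[a] - t[b] >= -y[(a, b)] * L                     # (1)
        and y[(a, b)] + y[(b, a)] == 1                    # (2)
        and t[b] - (t[a] + w[a]) >= (y[(a, b)] - 1) * L   # (3)
        and t[a] - (t[b] + w[b]) >= (y[(b, a)] - 1) * L   # (3), other order
        and t[a] + w[a] <= L and t[b] + w[b] <= L         # (4)
    )

w = {"Tj": 3.0, "Tk": 4.0}                # processing times on Pi
t = {"Tj": 0.0, "Tk": 3.0}                # candidate starting times
y = {("Tj", "Tk"): 1, ("Tk", "Tj"): 0}    # Tj ordered before Tk
print(feasible(t, y, w, L=10.0))          # True
```

Note how (3) is inactive (right-hand side −L) when yj,k = 0, and becomes the no-overlap condition when yj,k = 1.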

SLIDE 20

Reducing the latency

Mixed Integer Program

Dependencies

For each dependency Tj → Tk

• binary variable ej,k = 1 for an intra-period dependency (ej,k = 0: inter-period)

tk − (tj + wi,j) ≥ (ej,k − 1) × L   (5)

Objective

Maximize Σj,k ej,k under the constraints (1), (2), (3), (4) and (5)

SLIDE 22

Reducing the latency

Heuristic approach

Limitations

The heuristic algorithm is not allowed to move tasks inside the period.

Algorithm 1: Heuristic algorithm

IntraDep ← {}
Prod ← set of all sources of the dependencies (sorted by completion time)
Cons ← set of all destinations of the dependencies (sorted by starting time)
forall Tsrc ∈ Prod do
    forall Tdst ∈ Cons do
        if there is a dependency Tsrc → Tdst then
            if end(Tsrc) ≤ start(Tdst) then
                remove Tdst from Cons
                IntraDep ← IntraDep ∪ {(Tsrc → Tdst)}
                continue with the next Tsrc
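A runnable sketch of Algorithm 1, with assumed task records (names and times are illustrative). A dependency (src, dst) becomes intra-period when the source occurrence finishes before the destination occurrence starts within the same period; tasks are never moved.

```python
def greedy_intra_deps(deps, end, start):
    """deps: list of (src, dst); end/start: completion and starting
    times of each task occurrence inside the period."""
    prod = sorted({s for s, _ in deps}, key=lambda t: end[t])
    cons = sorted({d for _, d in deps}, key=lambda t: start[t])
    dep_set = set(deps)
    intra = []
    for src in prod:
        for dst in list(cons):
            if (src, dst) in dep_set and end[src] <= start[dst]:
                cons.remove(dst)
                intra.append((src, dst))
                break   # "continue": move on to the next source
    return intra

# Toy period: T1a ends at 2, T1b ends at 5; consumers start at 3 and 4.
deps = [("T1a", "T2a"), ("T1b", "T2b")]
end = {"T1a": 2, "T1b": 5}
start = {"T2a": 3, "T2b": 4}
print(greedy_intra_deps(deps, end, start))   # [('T1a', 'T2a')]
```

Only the first dependency can stay inside the period; the second (source ending at 5, destination starting at 4) remains inter-period.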

SLIDE 24

Using non-conservative solutions

Motivation

How to reduce the period length?

• one of the main drawbacks of the method
• the period length comes from the solution of a linear program
• it is hard to find another optimal solution with a smaller period

→ modify the solution

A sub-optimal solution

• decreasing the system throughput gains flexibility on the period length
→ our claim: the period can be shortened a lot at the cost of a slight reduction of the throughput

side benefits: shorter latencies and smaller buffers

SLIDE 25

Using non-conservative solutions

Principle

Steady-state scheduling and allocations

• allocations: A1, …, Am
• throughput of Ak: ρk = αk/βk, and total throughput ρ = Σk ρk
• period length T = lcmk βk → in one period, Ak is processed T × αk/βk ∈ ℕ times

Influence of a large value of βk

• contributes a small amount to the total throughput
• responsible for a large period size
→ suppress βk from the computation of T (scale to ⌊(αk × T)/βk⌋)

hope: the loss in ρ is compensated by a much shorter T
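A numeric sketch of this trade-off, with assumed allocation throughputs αk/βk (the values below are illustrative, not from the experiments):

```python
from math import lcm

alloc = [(1, 2), (1, 3), (1, 100)]             # (alpha_k, beta_k)

T_full = lcm(*(b for _, b in alloc))           # 300, dominated by beta = 100
rho_full = sum(a / b for a, b in alloc)        # ~0.8433

# Drop the largest beta from the LCM and round the per-period counts
# down: floor(alpha_k * T / beta_k) occurrences of A_k per period.
T_short = lcm(2, 3)                            # 6
counts = [a * T_short // b for a, b in alloc]  # [3, 2, 0]
rho_short = sum(counts) / T_short              # ~0.8333

print(T_full, T_short)   # period 50x shorter for ~1.2% throughput loss
```

The allocation with β = 100 contributed only 1% of the throughput but multiplied the period by 50; sacrificing it shortens the period dramatically.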

SLIDE 26

Using non-conservative solutions

Algorithm

Algorithm 2: Shorten the period of the steady-state schedule

Data: Ntotal instances, m allocations Ak with throughput ρk = αk/βk.
Parameters: K = 0.25 (maximum ratio initialization/total) and L = 0.85 (maximum degradation).

Sort the allocations by non-increasing βk, so that β1 ≥ β2 ≥ ⋯ ≥ βm
Ninit ← estimateInitTermJobCount(ρ1, …, ρm)
i ← 1; ρorig ← Σ(k=1..m) αk/βk; ρ ← ρorig
while i < m − 1 and Ninit/Ntotal > K and ρ > L × ρorig do
    T ← lcm{βi, …, βm}
    foreach allocation Ak in {A1, …, Am} do
        ρk_rollback ← ρk
        ρk ← ⌊(αk × T)/βk⌋ / T
    ρ ← Σ(k=1..m) ρk
    Ninit ← estimateInitTermJobCount(ρ1, …, ρm)
    i ← i + 1
    if Ninit/Ntotal ≤ K or ρ ≤ L × ρorig then
        foreach allocation Ak in {A1, …, Am} do
            ρk ← ρk_rollback
return (ρ1, …, ρm)

SLIDE 28

Experimental results

Comparison of 6 algorithms

• the original steady-state implementation
• the steady-state with the reduction of inter-period dependencies using a MIP (steady-state+MIP)
• the steady-state with the reduction of inter-period dependencies using the greedy heuristic (steady-state+greedy)
• the steady-state with the non-conservative period reduction (steady-state+suboptimal)
• the steady-state with both the greedy heuristic and the non-conservative period reduction (steady-state+heuristic+suboptimal)
• a classical list-scheduling algorithm based on HEFT

SLIDE 29

Experimental results

Simulation settings

Simulator

• results are obtained with a simulator built on top of SimGrid
• simulations of 200 random platform/application scenarios
• batches from 1 to 1000 task graphs
• MIP solving using CPLEX

Limitations

• the MIP solver was able to find a solution within 15 minutes for 142 SIMPLE scenarios
• in the GENERAL case, we do not give results for the MIP approach

SLIDE 31

Experimental results

Inter-period dependencies

Results

• SIMPLE case: the MIP solves 32% of the intra-period dependencies
• SIMPLE case: the heuristic solves 26% of the intra-period dependencies
• GENERAL case: the heuristic solves 25% of the intra-period dependencies

Notes

both the MIP and the heuristic perform well at resolving intra-period dependencies; how do they perform on other metrics?

SLIDE 33

Experimental results

Scheduling efficiency

Efficiency

ratio to the optimal throughput obtained by an algorithm

One complex example

(figure: efficiency, as the ratio to the optimal throughput, versus number of jobs (1 to 10000) on one complex example, for steady-state, steady-state+heuristic, steady-state+suboptimal, steady-state+heuristic+suboptimal and the list-scheduling heuristic; efficiency ranges roughly from 0.4 to 0.9)

SLIDE 34

Experimental results

Scheduling efficiency

SIMPLE

(figure: percentage of SIMPLE experiments with efficiency above 90%, versus number of jobs (100 to 10000), for steady-state, steady-state+heuristic, steady-state+MIP, steady-state+suboptimal, steady-state+heuristic+suboptimal and list-scheduling)

GENERAL

(figure: percentage of GENERAL experiments with efficiency above 90%, versus number of jobs (100 to 10000), for the same algorithms without the MIP)

Notes

Proportion of scenarios where we reach 90% of the optimal throughput

SLIDE 36

Experimental results

Number of running instances

Example

(figure: number of jobs being processed (up to about 500) over simulation time, for one example run)

Results

• SIMPLE case: the MIP induces 30% fewer running instances
• SIMPLE case: the heuristic induces 24% fewer running instances
• GENERAL case: heuristic + non-conservative reduction induces 37% fewer running instances (548 down to 126)
→ smaller buffers

SLIDE 37

Experimental results

Latency and buffer size

SIMPLE and GENERAL

TAB.: Performance of the algorithms in latency and buffer size, relative to the original steady-state implementation (smaller latency and number of running jobs are better).

                         SIMPLE scenarios                       GENERAL scenarios
Algorithm                avg. lat.  max. lat.  max. jobs        avg. lat.  max. lat.  max. jobs
MIP                      94%        67%        70%              N/A        N/A        N/A
heuristic                95%        74%        76%              90%        90%        93%
suboptimal               100%       100%       100%             53%        93%        88%
heuristic+suboptimal     95%        74%        75%              33%        67%        63%

NB: GENERAL cases include the SIMPLE cases too, so the decrease is substantial for complex cases

SLIDE 39

Experimental results

Running time of the algorithms

SIMPLE

(figure: average CPU time in seconds (log scale, 1 to 1000) versus number of jobs (100 to 10000) for SIMPLE scenarios: steady-state, steady-state + heuristic, steady-state + MIP, steady-state suboptimal, steady-state suboptimal + heuristic, list-scheduling heuristic)

GENERAL

(figure: average CPU time in seconds versus number of jobs for GENERAL scenarios: the same algorithms without the MIP)

Notes

Average CPU-time in seconds

SLIDE 41

Synthesis

Conclusion

Summary

study of an adaptation of steady-state techniques to practical conditions:

• medium-size batches
• performance metric (throughput) and practical interest (latency and buffers)

two optimizations:

• dependency reorganization (NP-complete: MIP + heuristic)
• shortening the period by decreasing the throughput (< 15%)

measurement of the impact of our optimizations (efficiency, buffer size, latency)

Conclusion

steady-state scheduling is an efficient tool for dealing with collections of task graphs

SLIDE 42

Synthesis

Future work

• steady-state techniques and tolerance to variations of the platform capabilities
• evaluation in a Grid context (cf. the MAO project)

SLIDE 43

Thank you for your attention

Questions ?