Lecture 5: Load Balancing


Scheduling

  • MIMD parallel program

– A number of tasks executing serially or in parallel

  • The scheduling problem is NP-complete (in general)

– Distribute tasks on processors so that minimal execution time is achieved

  • Optimal distribution

– Processor allocation + execution order such that the execution time is minimized

  • Scheduling system (Consumer, Policy, Resource)

[Figure: scheduling system: a Scheduler applies a Policy to match a Consumer to a Resource]


Load Balancing

[Figure: imperfect vs. perfect load balance across processors]

For the observer, it is the longest execution time that matters!
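The point above can be made concrete with a small sketch (the task times below are hypothetical, not from the slides): the observed execution time is the load of the busiest processor, so the same total amount of work can finish at very different times.

```python
# Makespan = the load of the most loaded processor; this is what the
# observer sees. Task times are hypothetical example values.
def makespan(assignment):
    """assignment: list of per-processor task-time lists."""
    return max(sum(tasks) for tasks in assignment)

imperfect = [[6, 2], [2], [2]]   # 12 units of work, loads 8 / 2 / 2
perfect   = [[4], [4], [4]]      # same 12 units, loads 4 / 4 / 4

print(makespan(imperfect))  # 8
print(makespan(perfect))    # 4
```

Both distributions do 12 units of work, but the imperfect one takes twice as long to finish.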


Scheduling Principles

  • Local scheduling

– Timesharing between processes on one processor

  • Global scheduling

– Allocate work to the processors in a parallel system

  • Static allocation (before execution, at compile time)
  • Dynamic allocation (during execution)

[Figure: scheduler taxonomy. Static vs. dynamic; static: optimal or sub-optimal (approximate, heuristic); dynamic: distributed or non-distributed; distributed: cooperative or non-cooperative; each branch again optimal or sub-optimal (approximate, heuristic).]

Static Load Balancing

  • Scheduling decisions are made before execution

– Task graph known before execution
– Each job is allocated to one processor statically

  • Optimal scheduling (impossible?)
  • Sub-optimal scheduling

– Heuristics (use knowledge acquired through experience)

  • Example: Put tasks that communicate a lot on the same processor

– Approximative

  • Limited machine-/program-model, suboptimal
  • Drawbacks

– Cannot handle non-determinism in programs; should not be used when we do not know exactly what will happen (e.g., DFS search)


Dynamic Load Balancing

  • Scheduling decisions during program execution
  • Distributed

– Decisions made by local distributed schedulers
– Cooperative

  • Local schedulers cooperate ⇒ global scheduling

– Non-cooperative

  • Local schedulers do not cooperate ⇒ affect only local performance

  • Non distributed

– Decisions made by one processor (master)

  • Disadvantages

– Hard to find optimal schedulers
– Overhead, since scheduling is done during execution


Other kinds of scheduling

  • Single application / multiple application system

– Only one application at a time: minimize the execution time for that application

– Several parallel applications (compare to batch queues): minimize the average execution time over all applications

  • Adaptive / non-adaptive scheduling

– Adaptive: changes behavior depending on feedback from the system

– Non-adaptive: is not affected by feedback

  • Preemptive / non-preemptive scheduling

– Preemptive: allows a process to be interrupted if it is allowed to resume later on

– Non-preemptive: does not allow a process to be interrupted

[Figure: tasks 1, 2, 3 executed with preemptive vs. non-preemptive scheduling]


Static Scheduling

  • Graph Theory Approach

– (for programs without loops and jumps)
– DAG (directed acyclic graph) = task graph
– Start node (no parents), exit node (no children)

  • Machine Model

– Processors P = {P1, ..., Pm}
– Edge matrix (m×m), communication cost Pi,j
– Processor performance Si [instructions per second]

  • Parallel Program Model

– Tasks T = {T1, ..., Tn}
– The execution order is given by the arrows
– Communication matrix (n×n), number of elements Di,j
– Number of instructions Ai

[Figure: example task graph with instruction counts Ai on the nodes and communication volumes Di,j on the edges]


Construction of schedules

  • Schedule: a mapping that allocates one or more disjoint time intervals to each task so that

– Exactly one processor gets each interval
– The sum of the intervals equals the execution time of the task
– Different intervals on the same processor do not overlap
– The order between tasks is maintained
– Some processor is always allocated a job


Optimal Scheduling Algorithms

  • The scheduling problem is NP-complete in the general case. Exceptions:

– HLF (Highest Level First), CP (Critical Path), LP (Longest Path), which in most cases give optimal schedules

  • List scheduling: keep a priority list of the nodes and allocate the nodes one by one to the processors. Choose the node with the highest priority and allocate it to the first available processor. Repeat until the list is empty.

– How the priority is computed varies between algorithms

  • Tree structured task graph. Simplification:

– All tasks have the same execution time – All processors have the same performance

  • Arbitrary task graph on two processors. Simplification:

– All tasks have the same execution time


List Scheduling

  • Remember

– Each task is allocated a priority and is placed in a list sorted by priority
– When a processor is free, allocate the task with the highest priority

  • If two tasks have the same priority, take one randomly
  • Different choice of priority gives different kinds of

scheduling

– Level gives closest to optimal priority order (HLF)

[Figure: example task graph and a table of Task, Level, and #Pr used for HLF list scheduling]


Scheduling of a tree structured task graph

  • Level

– the maximum number of nodes on a path from x to a terminal node

  • Optimal algorithm (HLF)

– Determine the level of each node; this is its priority
– When a processor is available, schedule the ready task with the highest priority

  • HLF can fail

– You can always construct an example where it fails
– This holds for most algorithms
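The HLF procedure above can be sketched as a short program, under the simplifications from the previous slide (unit execution times, identical processors). The example graph is hypothetical.

```python
# HLF (Highest Level First) list scheduling for a task graph with
# unit-time tasks on m identical processors.

def levels(succ, nodes):
    """Level of a node = max number of nodes on a path to a terminal node."""
    memo = {}
    def level(v):
        if v not in memo:
            memo[v] = 1 + max((level(c) for c in succ.get(v, [])), default=0)
        return memo[v]
    return {v: level(v) for v in nodes}

def hlf_schedule(succ, nodes, m):
    """Repeatedly run the m ready tasks with the highest level for one
    unit-time step; returns the resulting makespan in steps."""
    pred_count = {v: 0 for v in nodes}
    for v, cs in succ.items():
        for c in cs:
            pred_count[c] += 1
    prio = levels(succ, nodes)
    ready = [v for v in nodes if pred_count[v] == 0]
    done, time = set(), 0
    while len(done) < len(nodes):
        ready.sort(key=lambda v: -prio[v])   # highest level first
        running, ready = ready[:m], ready[m:]
        for v in running:                    # each task takes one step
            done.add(v)
            for c in succ.get(v, []):
                pred_count[c] -= 1
                if pred_count[c] == 0:       # child becomes ready
                    ready.append(c)
        time += 1
    return time

# Hypothetical graph: a,b -> c; c,d -> e (critical path a-c-e has length 3)
succ = {'a': ['c'], 'b': ['c'], 'c': ['e'], 'd': ['e']}
nodes = ['a', 'b', 'c', 'd', 'e']
print(hlf_schedule(succ, nodes, 2))  # 3
```

With two processors the schedule runs {a, b}, then {d, c}, then {e}: three steps, which matches the critical-path length and is therefore optimal here.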


Scheduling Heuristics

  • The complexity increases if the model allows

– Tasks with different execution times
– Different speeds of the communication links
– Communication conflicts
– Loops and jumps
– Limited networks

  • Find suboptimal solutions

– Find, with the help of a heuristic, solutions that are close to optimal most of the time


Parallelism vs Communication Delay

  • Scheduling must be based on both

– Communication delay
– The time when a processor is ready to work

  • Trade-off between maximizing the parallelism and minimizing the communication (max-min problem)

[Figure: tasks 1, 2, 3 on P1 and P2; T3 runs on P2 when the communication delay Dx < T2 and on P1 when Dx > T2]


Example: Trade-off Parallelism vs. Communication Time

– If D3 < T2, assign T3 to P2:
  Time = T1 + D3 + T3 + Dy + T4, or
  Time = T1 + T2 + Dx + T4 (whichever path to T4 is longer)
– If min(Dx, Dy) > T3, assign T3 to P1

[Figure: Gantt charts on P1 and P2 for the alternative placements of T3, with communication delays D2, D3, Dx, Dy]
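The trade-off can be checked numerically. The task and communication times below are hypothetical, and one assumption is made explicit: T4 starts once the later of T2's and T3's results has arrived, so the time is the maximum of the two chains on the slide.

```python
# Hypothetical times; T4 waits for both incoming results (assumption).
def time_T3_on_P2(T1, T2, T3, T4, D3, Dx, Dy):
    return max(T1 + D3 + T3 + Dy + T4,   # chain through T3 on P2
               T1 + T2 + Dx + T4)        # chain through T2

def time_T3_on_P1(T1, T2, T3, T4):
    return T1 + T2 + T3 + T4             # all serial: no communication

T1, T2, T3, T4 = 4, 3, 2, 4
print(time_T3_on_P2(T1, T2, T3, T4, 1, 1, 1))  # fast links: 12
print(time_T3_on_P1(T1, T2, T3, T4))           # serial: 13
print(time_T3_on_P2(T1, T2, T3, T4, 1, 3, 3))  # min(Dx, Dy) > T3: 14
```

With cheap links, running T3 in parallel wins (12 < 13); once min(Dx, Dy) exceeds T3, keeping T3 on P1 is faster (13 < 14), exactly as the rule on this slide says.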


The Granularity Problem

  • Find the best clustering of tasks in the task graph (minimize execution time)

  • Coarse Grain

– Less parallelism

  • Fine Grain

– More parallelism
– More scheduling time
– More communication conflicts


Redundant Computing

  • Sometimes you may eliminate communication delays by duplicating work

[Figure: schedules on P1 and P2 for an example task graph, with and without duplicating a task; duplication removes the communication delay]

Dynamic Load Balancing

  • Local scheduling

– Example: threads, processes, I/O

  • Global scheduling

– Example: some simulations
– Pool of tasks / distributed pool of tasks

  • receiver-initiated or sender-initiated

– Queue line structure


Pool of Tasks

  • Centralized
  • Decentralized
  • Distributed
  • How to choose a processor to communicate with?

[Figure: centralized, decentralized, and distributed task pools]

Work Transfer - Distributed

  • The receiver takes the initiative (”pull”)

– One process asks another process for work
– The process asks when it is out of work, or has too little to do
– Works well even when the system load is high
– It can be expensive to estimate the system load
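A single-threaded sketch of the pull idea (the names and the steal-half policy are illustrative, not from the slides): an idle worker polls a randomly chosen other worker and takes roughly half of its queued tasks.

```python
# Receiver-initiated ("pull") work transfer, modeled with plain deques.
import random
from collections import deque

def pull_work(queues, idle_id, rng):
    """Idle worker idle_id asks a randomly chosen other worker ("random
    polling") for roughly half of its tasks; returns how many it got."""
    victim = rng.choice([w for w in queues if w != idle_id])
    taken = deque()
    while len(queues[victim]) > len(taken) + 1:   # steal about half
        taken.append(queues[victim].pop())
    queues[idle_id].extend(taken)
    return len(taken)

queues = {0: deque(), 1: deque(range(6))}
got = pull_work(queues, 0, random.Random(1))
print(got, len(queues[1]))  # 3 3
```

In a real distributed setting the ask and the transfer are messages rather than direct queue accesses, which is where the polling cost mentioned above comes from.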


Work Transfer - Distributed

  • The sender takes the initiative (”push”)

– One process sends work to another process
– The process asks (or just sends) when it has too many tasks, or a high load
– Works well when the system load is low
– Hard to know when to send


Work Transfer - Decentralized

  • Examples of how to choose a process

– By load

  • (hard)

– Round robin

  • Must make sure that the processes do not “get in phase”, i.e. they all ask the same process

– Randomly (random polling)

  • Is a good random generator necessary?


Queue Line Structure

  • Have two processes per node
  • One worker process that

– computes
– asks the queue for work

  • Another that

– asks (to the left) for new tasks if the queue is nearly empty
– receives new tasks from the left neighbor
– receives requests from the right neighbor and from the worker process, and answers these requests

Tree Based Queue

  • Each process sends to one of two processes

– generalization of the previous technique


Example – Shortest Path

  • ”Given a set of linked nodes where the edges between the nodes are marked with ’weights’, find the path from one specific node to another that has the least accumulated weight.”

  • How do you represent the graph?


[Figure: example graph for the shortest-path problem]

Moore's Algorithm

  • dj = min(dj, di + wi,j)

  • Keep a queue containing the vertices not yet computed on. Begin with the start vertex.

  • Keep a list of shortest distances. Begin with zero for the start vertex, and infinity for the others.

  • For each node at the front of the queue, update the list according to the expression above. If there is an update, add the vertex to the queue again.
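The steps above translate directly into a runnable sequential sketch; the adjacency matrix below is a hypothetical example.

```python
# Moore's algorithm: queue of vertices to revisit, distance list, and the
# relaxation d_j = min(d_j, d_i + w_ij).
from collections import deque

def moore(w, start):
    """w: adjacency matrix with float('inf') for missing edges."""
    n = len(w)
    dist = [float('inf')] * n
    dist[start] = 0
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if w[i][j] != float('inf'):
                newdist = dist[i] + w[i][j]
                if newdist < dist[j]:
                    dist[j] = newdist
                    queue.append(j)   # vertex back to the queue
    return dist

inf = float('inf')
w = [[inf, 2, 5, inf],
     [inf, inf, 1, 7],
     [inf, inf, inf, 3],
     [inf, inf, inf, inf]]
print(moore(w, 0))  # [0, 2, 3, 6]
```

Note how vertex 2 is relaxed twice (first via the direct edge of weight 5, then via vertex 1 to weight 3), which is why an updated vertex must re-enter the queue.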


Sequential code

  • Using an adjacency matrix.

while ((i = next_vertex()) != no_vertex)  /* while there is a vertex */
    for (j = 1; j < n; j++)               /* get next edge */
        if (w[i][j] != infinity) {        /* if an edge */
            newdist_j = dist[i] + w[i][j];
            if (newdist_j < dist[j]) {
                dist[j] = newdist_j;
                append_queue(j);          /* vertex to queue if not there */
            }
        }                                 /* no more vertices to consider */


Parallel Implementation I

  • Dynamic load balancing
  • Centralized work pool

– Each computational node takes vertices from the queue and returns new vertices
– The distances are stored as a list, copied out to the nodes


Parallel Implementation I

  • Code:

Master:

    while (vertex_queue() != empty) {
        recv(PANY, source = Pi);               /* request from a worker */
        v = get_vertex_queue();
        send(&v, Pi);
        send(&dist, &n, Pi);
        .
        recv(&j, &dist[j], PANY, source = Pi); /* updated distance */
        append_queue(j, dist[j]);
    };
    recv(PANY, source = Pi);
    send(Pi, termination_tag);

Worker:

    while (true) {
        send(Pmaster);                         /* ask for work */
        recv(&v, Pmaster, tag);
        if (tag != termination_tag) {
            recv(&dist, &n, Pmaster);
            for (j = 1; j < n; j++) {
                if (w[v][j] != infinity) {
                    newdist_j = dist[v] + w[v][j];
                    if (newdist_j < dist[j]) {
                        dist[j] = newdist_j;
                        send(&j, &dist[j], Pmaster);
                    }
                }
            }
        } else {
            break;
        }
    }


Parallel Implementation II

  • Decentralized work pool

– Each vertex is a process. As soon as a vertex gets a new distance (start node → itself), it sends new distances to its neighbors


Parallel Implementation II

  • Code:

recv(newdist, PANY);
if (newdist < dist)
    dist = newdist;           /* start searching around the vertex */
for (j = 1; j < n; j++)       /* get next edge */
    if (w[j] != infinity) {
        d = dist + w[j];
        send(&d, Pj);         /* send distance to process j */
    }

  • Have to handle ”messages in the air” (MPI_Probe)


Shortest Path

  • Probably have to group the vertices, i.e., several vertices per processor.

  • Vertices close to each other on the same processor ⇒

– Little communication
– Little parallelism

  • Vertices far away on the same processor (scattered) ⇒

– A lot of communication
– Much parallelism
– Group messages? Synchronizing?

  • Terminating


Terminating Algorithms

Ring algorithm:

  • Let a process p0 send a token on the ring when p0 is out of work

  • When a process receives a token:

– If out of work, pass the token on
– If not, wait until out of work, and then pass the token on

  • When p0 gets the token back, p0 knows that everyone is out of work

  • p0 can then notify the others
  • Does not work if processes ”borrow” work from each other
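A single-threaded sketch of the token round (the work model, a counter of remaining work units per process, is illustrative): each process finishes its remaining work before passing the token on, so when p0 sees the token again everyone is idle. It deliberately does not model the work-borrowing case that breaks the algorithm.

```python
# Ring termination: p0 emits a token when idle; each process passes the
# token on only when it is out of work.
def ring_termination(work):
    """work: remaining work units for p0..p(n-1). Returns True when the
    token is back at p0 and every process is idle."""
    n = len(work)
    while work[0] > 0:              # p0 works until out of work...
        work[0] -= 1
    token = 0                       # ...then sends the token on the ring
    for _ in range(n):
        token = (token + 1) % n
        while work[token] > 0:      # wait until out of work, then pass on
            work[token] -= 1
    return token == 0 and all(u == 0 for u in work)

print(ring_termination([3, 0, 5, 2]))  # True
```

If processes could hand work back to someone the token has already passed, the invariant "everything behind the token is idle" would no longer hold, which is exactly what Dijkstra's black/white refinement on the next slide repairs.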


Terminating Algorithms

Dijkstra's ring algorithm:

  • Let a process p0 send a white token on the ring when p0 is out of work

  • If a process pi sends work to pj, j < i, pi will be colored black

  • When a process receives a token:

– If the process is black, the token is colored black
– If out of work, pass the token on
– If not, wait until out of work, then pass the token on

  • If p0 gets a white token back, p0 knows that everyone is out of work, and sends a terminating message (e.g., a red token)

  • If p0 gets a black token back, p0 sends out a white token

[Figure: ring of processes with p0; pi sends work to pj]


Review Questions

  • Assume that five (worker) processes solve shortest path for the graph to the right using ”Parallel Implementation I”. How many messages are sent, and which?

  • Assume that five (worker) processes solve shortest path for the graph to the right using ”Parallel Implementation II”. How many messages are sent, and which?

  • Find an optimal schedule of the task graph to the right onto two processors.

[Figure: weighted example graph and task graph for the review questions]