scheduling mimd parallel program a number of tasks
play

Scheduling MIMD parallel program A number of tasks executing - PDF document

Scheduling MIMD parallel program A number of tasks executing serially or in parallel The scheduling problem NP-complete problem (in general) Distribute tasks on processors so that minimal execution time Lecture 5: Load Balancing


  1. Scheduling • MIMD parallel program – A number of tasks executing serially or in parallel • The scheduling problem NP-complete problem (in general) – Distribute tasks on processors so that minimal execution time Lecture 5: Load Balancing is achieved • Optimal distribution – Processor allocation + execution order such that the execution time is minimized • Scheduling system (Consumer, Policy, Resource) Scheduler Consumer Resource Policy 1 2 Load Balancing Scheduling Principles • Local scheduling – Timesharing between processes on one processor Imperfect balance Perfect balance • Global scheduling – Allocate work to processors in a // system • Static allocation (before execution, at compile time) • Dynamic allocation (during execution) scheduler static dynamic sub-optimal optimal distributed non-distributed heuristic approx For the observer it is the longest cooperative non cooperative execution time that matters!!! optimal sub-optimal 3 heuristic approx 4 Dynamic Load Balancing Static Load Balancing • Scheduling decisions during program execution • Scheduling decisions are made before execution • Distributed – Task graph known before execution – Decisions made by local distributed schedulers – Each job is allocated to one processor statically – Cooperative • Optimal scheduling (impossible?) • Local schedulers cooperate ⇒ global scheduling • Sub-optimal scheduling – Non cooperative – Heuristics (use knowledge acquired through experience) • Local schedulers do not cooperate ⇒ affect only local • Example: Put tasks that communicate a lot on the same processor performance – Approximative • Non distributed • Limited machine-/program-model, suboptimal – Decisions made by one processor (master) • Drawbacks • Disadvantages – Can not handle non-determinism in programs, should not – Hard to find optimal schedulers be used when we do not know exactly what will happen – Overhead as it is done during execution (e.g. DFS-search) 5 6

  2. Other kinds of scheduling Static Scheduling ● Single application / multiple application system • Graph Theory Approach ● Only one application at the time, minimize execution time – (for programs without loops and jumps) for that application – DAG (directed acyclic graph) = task graph ● Several parallel applications (compare to batch-queues), – Start-node (no parents), exit-node (no children) minimize the average execution time for all applications • Machine Model ● Adaptive / non adaptive scheduling – Processors P = {P 1 , ..., P m } T 1 , A 1, D 14 ● Changes behavior depending on feedback from the system – Edge matrix (mxm), comm-cost P i,j 1 ● Is not affected by feedback – Processor performance S i [instructions per second] 5 1 • Parallel Program Model ● Preemptive / non-preemptive scheduling 1 2 ● Allows a process to be interrupted if it is allowed to – Tasks T = {T 1 , ..., T n } 2 3 4 10 5 8 – The execution order is given by the arrows resume later on 2 non-preemptive preemptive 3 – Communication matrix (nxn), no. elem. D i,j 2 2 ● Does not allow a process to 3 1 2 5 – Number of instructions A i 1 5 be interrupted 3 7 8 Optimal Scheduling Algorithms Construction of schedules • The scheduling problem is NP complete for the general case. Exceptions: • Schedule: mapping that allocates one or more – HLF (Highest Level First), CP (Critical Path), LP (Longest disjunct time interval to each task so that Path) which in most cases gives optimal scheduling – Exactly one processor gets each interval • List scheduling: priority list with nodes and allocate the nodes one by one to the processes. Choose the node with highest – The sum of the intervals equals the priority and allocate that to the first available process. Repeat execution time for the task until the list is empty. – Different intervals on the same processor – It varies between algorithms how to compute priority • Tree structured task graph. Simplification: do not overlap – All tasks have the same execution time – The order between tasks is maintained – All processors have the same performance – Some processor is always allocated a job • Arbitrary task graph on two processors. Simplification: – All tasks have the same execution time 9 10 List Scheduling Scheduling of a tree structured task graph • Remember • Level – Each task is allocated a priority & is placed in a list sorted by priority – maximum number of nodes from x to a – When a processor is free, allocate the task with the highest priority terminal node • If two tasks have the same priority, take one randomly • Optimal algorithm (HLF) • Different choice of priority gives different kinds of scheduling – Determine the level of each node = priority – Level gives closest to optimal priority order (HLF) – When a processor is available, schedule the Task #Pr Level ready task with the highest priority 1 1 Number of reasons 1 1 0 3 1 I'm not ready • HLF can fail 2 3 2 1 2 1 2 – You can always construct an example that fails 1 1 3 1 2 – Works for most algorithms 4 4 2 1 1 11 12

  3. Parallelism vs Communication Delay Scheduling Heuristics • Scheduling must be based on both • The complexity increases if the model allows – Communication delay – Tasks with different execution times – The time when a processor is ready to work – Different speed of the communication links • Trade-off between maximizing the parallelism & – Communication conflicts minimizing the communication (max-min problem) – Loops and jumps P1 P2 P1 P2 – Limited networks 1 1 1 Dx • Find suboptimal solutions 2 Dx 2 Dx 3 2 3 – Find, with the help of a heuristic, solutions 3 3 that most of the time are close to optimal Dx > T2 Dx < T2 13 14 Example, Trade-off The Granularity Problem // vs Communication Time • Find the best clustering of tasks in the task graph (minimize execution time) P1 P2 P1 P2 P1 • Coarse Grain 1 1 1 D3 1 D2 D3 D3 – Less parallelism 2 2 3 3 2 2 3 • Fine Grain Dx Dy 3 Dx Dy 4 4 4 – More parallelism 4 – More scheduling time D3 < T2, assign T3 to P2 Time = T1 + D3 + T3 + Dy + T4 , or – More communication conflicts Time = T1 + T2 + Dx + T4 If min(Dx, Dy) > T3 assign T3 to P1 15 16 Dynamic Load Balancing Redundant Computing • Local scheduling • Sometimes you may eliminate communication delays Example: Threads, Processes, I/O by duplicating work • Global scheduling Example: Some simulations P1 P2 P1 P2 – Pool of tasks / distributed pool of tasks 1 1 1 1 1 1 1 • receiver-initiated or sender-initiated 2 3 2 2 3 – Queue line structure 1 1 5 3 1 1 1 4 3 5 4 5 1 1 4 17 18

  4. Pool of Tasks Work Transfer - Distributed Centralized • Centralized • The receiver takes the initiative. ”Pull” • Decentralized – One process asks another process for work • Distributed – The process asks when it is out of work, or has too little to do. • How to choose processor – Works well, even when the system load is high to communicate with? Distributed – Can be expensive to approximate system loads 19 20 Decentralized Work Transfer - Distributed Work Transfer - Decentralized • Example of process choices • The sender takes the initiative. ”Push” – Load – One process sends work to another • (hard) process – Round robin – The process asks (or just sends) when it • Must make sure that the has too many tasks, or high load processes do not “get in phase”, i.e. they all ask the same process – Works well when the system load is low – Randomly (random polling) – Hard to know when to send • Good generator necessary?? 21 22 Queue Line Structure Tree Based Queue • Each process sends to one of two processes • Have two processes per node – generalization of the previous technique • One worker process that – computes – asks the queue for work • Another that – asks (to the left) for new tasks if the queue is nearly empty – receives new tasks from the left neighbor – receives requests from the right neighbor and from the worker process and answers these requests 23 24

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend