
  1. Toward a Fully Decentralized Algorithm for Multiple Bag-of-tasks Application Scheduling on Grids
     Rémi Bertin, Arnaud Legrand, Corinne Touati
     Laboratoire LIG, CNRS-INRIA, Grenoble, France
     Aussois Workshop

  2. Outline
     1. Framework
     2. Lagrangian Optimization
     3. Simulations: Early Results

  3. Motivation
     Large-scale distributed computing platforms result from the collaboration of many users:
     ◮ Sharing resources amongst users should somehow be fair.
     ◮ The size of these systems prevents the use of centralized approaches ⇒ need for distributed scheduling.
     ◮ Task regularity (SETI@home, BOINC, ...) ⇒ steady-state scheduling.
     Goal: design a fair and distributed scheduling algorithm for this framework.


  4. Platform Model
     ◮ General platform graph G = (N, E, W, B).
     ◮ Speed of P_n ∈ N: W_n (in MFlop/s).
     ◮ Bandwidth of (P_i → P_j): B_{i,j} (in MB/s).
     ◮ Linear-cost communication and computation model: X/B_{i,j} time units to send a message of size X from P_i to P_j.
     ◮ Communications and computations can be overlapped.
     ◮ Multi-port communication model.
     [Figure: platform graph with node speeds W_i, W_j and link bandwidth B_{i→j}.]
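
As an illustration (not part of the talk), here is a minimal Python sketch of this platform model; the names Platform, comm_time and comp_time are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Platform:
    speed: dict      # W_n: node -> speed in MFlop/s
    bandwidth: dict  # B_{i,j}: (i, j) edge -> bandwidth in MB/s

    def comm_time(self, i, j, size_mb):
        # Linear cost: X / B_{i,j} time units for a message of size X MB.
        return size_mb / self.bandwidth[(i, j)]

    def comp_time(self, n, work_mflop):
        # Time for node n to process work_mflop MFlop at speed W_n.
        return work_mflop / self.speed[n]

# Example: two nodes linked by a 10 MB/s edge.
p = Platform(speed={"P1": 100.0, "P2": 50.0},
             bandwidth={("P1", "P2"): 10.0})
print(p.comm_time("P1", "P2", 5.0))  # 0.5 time units
print(p.comp_time("P2", 200.0))      # 4.0 time units
```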

  5. Application Model
     Multiple applications:
     ◮ A set A of K applications A_1, ..., A_K.
     ◮ Each consists of a large number of same-size independent tasks ⇒ each application is characterized by a computation cost w_k (in MFlop) and a communication cost b_k (in MB).
     ◮ Different applications have different communication and computation demands.
     [Figure: tasks of applications A_1, A_2, A_3.]
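
A possible encoding of this application model, again as an editorial sketch with hypothetical names, keeps only the per-task costs w_k and b_k and the master node of each application:

```python
from dataclasses import dataclass

@dataclass
class Application:
    name: str
    w: float     # computation cost per task, in MFlop
    b: float     # communication cost per task (input data), in MB
    master: str  # master node P_m(k) that holds the input data

# A1 is compute-intensive, A2 is communication-intensive.
apps = [Application("A1", w=10.0, b=1.0, master="P1"),
        Application("A2", w=2.0, b=5.0, master="P2")]
print([(a.name, a.w / a.b) for a in apps])  # compute/communication ratios
```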

  6. Hierarchical Deployment
     ◮ Each application A_k originates from a master node P_m(k) that initially holds all of its input data.
     ◮ Communications are only required outwards from the master nodes: the amount of data returned by the workers is negligible.
     ◮ Each application A_k is deployed on the platform as a tree; hence, if application k uses a node P_n, all its data travel along a single path from P_m(k) to P_n, denoted (P_m(k) ⇝ P_n).
     [Figure: two masters P_m(1) and P_m(2), each with its own deployment tree.]
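
Because each deployment is a tree, the route (P_m(k) ⇝ P_n) is unique and can be recovered by walking parent pointers, as in this small illustrative sketch (not from the talk):

```python
def path_from_master(parent, node):
    """Edges on the unique path from the tree root (the master) to `node`,
    given child -> parent pointers of the deployment tree."""
    edges = []
    while node in parent:
        edges.append((parent[node], node))
        node = parent[node]
    return list(reversed(edges))

# Example tree rooted at the master P1: P1 -> {P2, P3}, P2 -> P4.
parent = {"P2": "P1", "P3": "P1", "P4": "P2"}
print(path_from_master(parent, "P4"))  # [('P1', 'P2'), ('P2', 'P4')]
```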

  7. Steady-State Scheduling and Utility
     ◮ All tasks of a given application are identical and independent ⇒ we do not really need to care about where and when (as opposed to classical scheduling problems).
     ◮ We only need to focus on average values in steady state.
     ◮ Steady-state values:
       ◮ Variables: ϱ_{n,k}, the average number of tasks of application k processed by processor n per time unit.
       ◮ Throughput of application k: ϱ_k = Σ_{n ∈ N} ϱ_{n,k}.
     Theorem 1. From "feasible" ϱ_{n,k}, it is possible to build an optimal periodic infinite schedule (i.e. one whose steady-state rates are exactly the ϱ_{n,k}). Such a schedule is asymptotically optimal for the makespan.
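
To make the steady-state accounting concrete, here is an editorial sketch of the throughput definition ϱ_k = Σ_n ϱ_{n,k} together with the per-node computation constraint Σ_k ϱ_{n,k}·w_k ≤ W_n; all names are illustrative, and the full model would also bound, for each link, the data forwarded along the deployment trees:

```python
def throughput(rho, k):
    # rho_k = sum over nodes n of rho_{n,k}.
    return sum(rates.get(k, 0.0) for rates in rho.values())

def computation_feasible(rho, w, speed):
    # A node n is not overloaded when sum_k rho_{n,k} * w_k <= W_n.
    return all(sum(r * w[k] for k, r in rates.items()) <= speed[n]
               for n, rates in rho.items())

rho = {"P1": {"A1": 5.0}, "P2": {"A1": 3.0, "A2": 10.0}}  # rho_{n,k}
w = {"A1": 10.0, "A2": 2.0}          # w_k in MFlop
speed = {"P1": 100.0, "P2": 50.0}    # W_n in MFlop/s
print(throughput(rho, "A1"))                # 8.0 tasks per time unit
print(computation_feasible(rho, w, speed))  # True
```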
