SLIDE 1

Peer-to-peer cooperative scheduling architecture for National Grid Infrastructure

  • L. Matyska, M. Ruda, S. Toth

CESNET, Czech Republic

10th March 2010

ISGC2010 (Taipei, Taiwan) Cooperative scheduling 10th March 2010 1 / 15

SLIDE 2

Job scheduling in Grid

Many approaches and types of schedulers in standard grids

Multi-layered approach: grid middleware usually deals with the three top layers

Pilot scheduling usually more user-centric

Usually requires remote services available

Often leads to local by-pass and direct cluster submits

SLIDE 3

META Centrum

META Centrum (http://meta.cesnet.cz)

Does anyone remember the term metacomputing?

Czech national grid infrastructure

Under the umbrella of CESNET

Computational resources

Mostly clusters, installed across the country, centrally managed

The same team involved in EGEE

Computing site, user and VO support, gLite development

Virtualization and job scheduling as one research focus

SLIDE 4

Current META Centrum scheduling

Basic features:

Relies on batch schedulers more than usual

Global batch system instead of multi-level scheduling

Standard grid interfaces (gLite/Globus) also available

Integrated with scheduling of virtual machines

Based on a central PBSPro installation

Central knowledge of the system’s state

Easy implementation of global scheduling policies

Fairshare

Avoid problems with multi-level schedulers

Jobs can stall when waiting for a cluster in maintenance

Local jobs not visible to the global scheduler

Support for large, multi-site jobs

SLIDE 5

Deficiencies of the used approach

Scalability

Adding new sites increases burden on central scheduler

Stability of a central-server-based solution

Only limited support for wide-area replication

Inability to submit new jobs if the central service is not up/available

Local unusability of a disconnected cluster

Leads to frustrated users by-passing the META Centrum scheduling

Not able to cope with the planned major extension of the national grid infrastructure

SLIDE 6

New scheduling architecture

Motivation

Keep positive aspects of a centralized solution

Especially the ability to take global decisions

While not introducing multi-level scheduling

Remove (some of) the negative aspects of a centralized solution

Scalability

Use of disconnected resources

General features

Self-contained scheduler at each site (or even at a large cluster)

Always able to accept jobs for the whole infrastructure

Always able to submit jobs to the local cluster

Cooperating with similar schedulers at other sites

Exchanging information about the whole infrastructure (global state)

Ability to make a “global” decision

Moving jobs directly between schedulers
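The general features above can be sketched as a tiny Python model (a sketch only, with invented class and site names; the real system extends Torque): each site always accepts jobs, can report a global load view, and may move queued jobs directly to a less loaded peer.

```python
class SiteScheduler:
    """Simplified model of one self-contained site scheduler."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # free slots on the local cluster
        self.queue = []            # jobs waiting at this site
        self.peers = []            # cooperating schedulers at other sites

    def submit(self, job):
        # Always accept the job, even if all peers are unreachable.
        self.queue.append(job)

    def global_state(self):
        # Exchange load information with every reachable peer.
        return {s.name: len(s.queue) / max(s.capacity, 1)
                for s in [self] + self.peers}

    def schedule_once(self):
        # Start jobs locally while free slots remain (simplified: jobs
        # occupy a slot and never finish in this model).
        while self.queue and self.capacity > 0:
            self.queue.pop(0)
            self.capacity -= 1
        # Move one surplus job directly to the least loaded peer.
        if self.queue and self.peers:
            target = min(self.peers,
                         key=lambda s: len(s.queue) / max(s.capacity, 1))
            if len(target.queue) < len(self.queue):
                target.submit(self.queue.pop(0))
```

A usage example: a one-slot site with four jobs runs one locally and hands one over to an idle three-slot peer, keeping the rest queued.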


SLIDE 8

Proposed architecture in more detail

SLIDE 9

Architecture implementation

Basic features:

Torque at the heart of each local scheduler, extended with:

A gateway interface to accept jobs and store them into a routing queue

A “global” scheduling strategy

L&B from gLite as the persistent information storage for job monitoring

This leads on each site to:

A “standard” Torque installation

An extended scheduler managing jobs from multiple servers

Jobs submitted through the gateway to a routing queue

The scheduler then either:

Moves the job to a different server, where the job is to be started, or

Moves the job to a local queue, where the job is started

Jobs monitored from any gateway, job information stored in L&B
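The routing-queue decision can be illustrated with a small sketch (the threshold, server names and function are invented for illustration, not META Centrum's actual policy): a job taken from the routing queue either stays in the local execution queue or is forwarded to another site's server.

```python
def route(job, local_load, remote_loads, threshold=0.8):
    """Decide where a routing-queue job goes.

    local_load   -- load fraction of this site's cluster
    remote_loads -- map of server name -> load fraction (invented names)
    Returns ('local', queue) or ('remote', server).
    """
    # Below the threshold, keep the job: local start needs no remote service.
    if local_load <= threshold:
        return ("local", "execution_queue")
    # Otherwise look for a strictly less loaded peer server.
    best = min(remote_loads, key=remote_loads.get, default=None)
    if best is not None and remote_loads[best] < local_load:
        return ("remote", best)
    # No better peer (or all peers unreachable): keep the job locally.
    return ("local", "execution_queue")
```

Keeping the local branch as the fallback mirrors the design goal that a disconnected site can always start jobs on its own cluster.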

SLIDE 10

Main development tasks

Cooperative scheduling

Torque enhancements to support peer-to-peer scheduling

Maintenance of globally available information used for scheduling

Fair-share uses actual accounting information

Support for multi-site jobs

Scheduler extensions

PBSPro originally used for better stability across the Czech Republic

Switch to Torque

Need to port Kerberos support

Need to port scheduling enhancements

Support for management of virtual machines

Magrathea system (extending node states)

Direct support for virtualized fabrics must be ported to Torque too
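An accounting-based fair-share can be illustrated with the classic decay formula (a sketch only; the talk does not give META Centrum's exact formula, and all names are invented): a user's priority drops exponentially as their consumed CPU time, taken from accounting records, exceeds their target share.

```python
def fairshare_priority(used_cpu_seconds, total_cpu_seconds, target_share):
    """Return a scheduling priority in (0, 1]; higher runs sooner.

    used_cpu_seconds  -- this user's CPU time from accounting records
    total_cpu_seconds -- CPU time consumed by everyone in the window
    target_share      -- the user's intended fraction of the machine
    """
    if total_cpu_seconds == 0:
        return 1.0   # no history yet: full priority
    usage = used_cpu_seconds / total_cpu_seconds
    # Priority halves each time actual usage exceeds the target share
    # by another full share (2^(-usage/share) decay).
    return 2.0 ** (-usage / target_share)
```

For example, a user entitled to half the machine who has actually consumed half of all CPU time gets priority 0.5, while an unused account keeps priority 1.0.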

SLIDE 11

Current status

Peer-to-peer extensions—prototype done, reasonable overhead

Fair-share—simple solution done, more development later

Multi-site jobs—several possibilities in discussion

Torque scheduler extensions—on-going work

Kerberos support ported

Magrathea support—on-going work

Gateway and L&B usage—next phase

SLIDE 12

Peer-to-peer overhead—Experimental setup

Series of measurements

Realistic simulation of a production environment using a light-VM extension of the Linux kernel

5000 jobs submitted to 200 nodes on up to 5 sites

All the jobs ran

[Chart: JobCount over time: Known jobs (jobs that entered the clusters), In system (jobs in the cluster now), Queued (jobs waiting in queues), Done (jobs that finished running); with Load, RunTrafic and MoveTrafic plotted over the same run.]

SLIDE 13

Peer-to-peer overhead—Experimental setup

Interaction between schedulers and sites

[Chart: total time (s), roughly 750 to 1050 s, for the configurations 1:1, 2:1, 2:2, optimized 2:2, and 2:2 with job moving.]

SLIDE 14

Communication scheme

Original proposal: full information everywhere

“Neighbor” approach: information routing

On-demand super-scheduler for multi-site jobs
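The “neighbor” scheme can be sketched as a simple gossip round (topology and site names invented for illustration): each site sends its state view only to direct neighbors, and information about remote sites is routed hop by hop instead of every scheduler contacting every other.

```python
def propagate(topology, state, rounds):
    """Gossip state views through a neighbor topology.

    topology -- map of site -> list of neighbor sites
    state    -- map of site -> {known site: load}; mutated in place
    """
    for _ in range(rounds):
        # Freeze this round's views so updates do not cascade within a round.
        snapshot = {s: dict(v) for s, v in state.items()}
        for site, neighbors in topology.items():
            for n in neighbors:
                # Merge everything the neighbor knew at the start of the round.
                state[site].update(snapshot[n])
    return state
```

In a line topology a - b - c, site a learns c's load after two rounds: information takes one round per hop, which trades freshness for far fewer connections than the full-information scheme.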

SLIDE 15

Conclusion

Cooperative scheduling architecture supports

High scalability (esp. with a proper communication scheme)

Independence of remote services; local submission always possible

Ability to make decisions based on global state

Free job movement between sites based on local scheduler decisions

Direct inclusion of virtualized resources

Easy integration of different gateways (e.g. gLite CE interface)

Its META Centrum implementation is underway

Based on the Torque system

Extended to multi-site scheduling

META Centrum native gateways

Use of gLite L&B for job monitoring

Initial experiments are encouraging (acceptable overhead of peer-to-peer communication)

Expected to be in full production this year

SLIDE 16

Thank you. Questions?
