Flux: Practical Job Scheduling

Dong H. Ahn, Ned Bass, Al Chu, Jim Garlick, Mark Grondona, Stephen Herbein, Tapasya Patki, Tom Scogland, Becky Springmeyer. August 15, 2018. LLNL-PRES-757227.


slide-1
SLIDE 1

LLNL-PRES-757227

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Flux: Practical Job Scheduling

Dong H. Ahn, Ned Bass, Al Chu, Jim Garlick, Mark Grondona, Stephen Herbein, Tapasya Patki, Tom Scogland, Becky Springmeyer August 15, 2018

slide-2
SLIDE 2

What is Flux?

▪ New Resource and Job Management Software (RJMS) developed here at LLNL
▪ A way to manage remote resources and execute tasks on them

slide-9
SLIDE 9

What about …?

▪ Closed-source
▪ Not designed for HPC
▪ Limited Scalability, Usability, and Portability

slide-18
SLIDE 18

Why Flux?

▪ Extensibility
— Open source
— Modular design with support for user plugins
▪ Scalability
— Designed from the ground up for exascale and beyond
— Already tested at 1000s of nodes & millions of jobs
▪ Usability
— C, Lua, and Python bindings that expose 100% of Flux’s functionality
— Can be used as a single-user tool or a system scheduler
▪ Portability
— Optimized for HPC and runs in Cloud and Grid settings too
— Runs on any set of Linux machines: only requires a list of IP addresses or PMI

Flux is designed to make hard scheduling problems easy

slide-22
SLIDE 22

Portability: Running Flux

▪ Already installed on LC systems (including Sierra)
— spack install flux-sched for everywhere else
▪ Flux can run anywhere that MPI can run (via PMI, the Process Management Interface)
— Inside a resource allocation from: itself (hierarchical Flux), Slurm, Moab, PBS, LSF, etc.
— flux start OR srun flux start
▪ Flux can run anywhere that supports TCP, provided you have the IP addresses
— flux broker -Sboot.method=config -Sboot.config_file=boot.conf
— boot.conf:

    session-id = "mycluster"
    tbon-endpoints = [ "tcp://192.168.1.1:8020",
                       "tcp://192.168.1.2:8020",
                       "tcp://192.168.1.3:8020" ]
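The boot.conf on this slide is a session name plus one TCP endpoint per broker, so it can be generated from a host list. The following is a minimal sketch that assumes the two keys shown on the slide (session-id and tbon-endpoints) are the only ones needed. The helper name make_boot_conf and its default port are illustrative and not part of Flux.

```python
def make_boot_conf(session_id, ip_addresses, port=8020):
    """Render a boot.conf like the one on the slide (hypothetical helper)."""
    endpoints = ['"tcp://{}:{}"'.format(ip, port) for ip in ip_addresses]
    lines = [
        'session-id = "{}"'.format(session_id),
        "tbon-endpoints = [",
        # One endpoint per line, comma-separated, inside the array.
        ",\n".join("    " + endpoint for endpoint in endpoints),
        "]",
    ]
    return "\n".join(lines) + "\n"


# Example: the three-broker cluster used on the slide.
boot_conf = make_boot_conf(
    "mycluster", ["192.168.1.1", "192.168.1.2", "192.168.1.3"]
)
```

Writing boot_conf to a file named boot.conf would give the input expected by the flux broker command line shown on the slide.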

slide-27
SLIDE 27

Usability: Submitting a Batch Job

▪ Slurm
— sbatch -N2 -n4 -t 2:00 sleep 120
▪ Flux CLI
— flux submit -N2 -n4 -t 2m sleep 120
▪ Flux API:

    import json, flux
    # Same request as the CLI call: 2 nodes, 4 tasks, 120 seconds of walltime
    jobreq = {'nnodes': 2, 'ntasks': 4, 'walltime': 120, 'cmdline': ["sleep", "120"]}
    f = flux.Flux()
    resp = f.rpc_send("job.submit", json.dumps(jobreq))
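The CLI takes the time limit as 2m, while the API request carries walltime as 120, presumably in seconds. A small sketch of that conversion and of building the request dictionary follows. The function names and the set of accepted suffixes (s, m, h, d) are assumptions made for illustration; the parser in the real Flux CLI may accept other forms.

```python
import json

# Assumed unit suffixes; not taken from the Flux source.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(text):
    """Convert '2m', '90s', '1h', or a bare number of seconds to an int."""
    text = text.strip()
    if text and text[-1] in _UNIT_SECONDS:
        return int(float(text[:-1]) * _UNIT_SECONDS[text[-1]])
    return int(float(text))


def make_jobreq(nnodes, ntasks, duration, cmdline):
    """Build a request dictionary with the keys used in the slide's example."""
    return {
        "nnodes": nnodes,
        "ntasks": ntasks,
        "walltime": duration_to_seconds(duration),
        "cmdline": list(cmdline),
    }


jobreq = make_jobreq(2, 4, "2m", ["sleep", "120"])
payload = json.dumps(jobreq)  # the string handed to f.rpc_send on the slide
```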

slide-31
SLIDE 31

Usability: Running an Interactive Job

▪ Slurm
— srun -N2 -n4 -t 2:00 sleep 120
▪ Flux CLI
— flux wreckrun -N2 -n4 -t 2m sleep 120
▪ Flux API:

    # f, json, and jobreq are defined as in the batch-job example
    import sys
    from flux import kz
    resp = f.rpc_send("job.submit", json.dumps(jobreq))
    kvs_dir = resp['kvs_dir']
    # Attach the stdout stream of every task to this terminal
    for task_id in range(jobreq['ntasks']):
        kz.attach(f, "{}.{}.stdout".format(kvs_dir, task_id), sys.stdout)
    f.reactor_run(f.get_reactor(), 0)

slide-35
SLIDE 35

Usability: Tracking Job Status

▪ CLI: slow, non-programmatic, inconvenient to parse
— watch squeue -j JOBID
— watch flux wreck ls JOBID
▪ Tracking via the filesystem
— date > $JOBID.start; srun myApp; date > $JOBID.stop

    → quota -vf ~/quota.conf
    Disk quotas for herbein1:
    Filesystem        used     quota   limit   files
    /p/lscratchrza    760.3G   n/a     n/a     8.6M

slide-36
SLIDE 36

Usability: Tracking Job Status

[Figure: "Runtime Stages" chart with stages UQP Startup, Job Submission, File Creation, and File Access; legend entries I/O and Non-I/O]

slide-37
SLIDE 37

Usability: Tracking Job Status

▪ CLI: slow, non-programmatic, inconvenient to parse
— watch squeue -j JOBID
— watch flux wreck ls JOBID
▪ Tracking via the filesystem
— date > $JOBID.start; srun myApp; date > $JOBID.stop
▪ Push notification via Flux’s Job Status and Control (JSC):

    import json
    from flux import jsc

    # Called once for every job state change; f is the flux.Flux() handle
    def jsc_cb(jcbstr, arg, errnum):
        jcb = json.loads(jcbstr)
        jobid = jcb['jobid']
        state = jsc.job_num2state(jcb[jsc.JSC_STATE_PAIR][jsc.JSC_STATE_PAIR_NSTATE])
        print("flux.jsc: job {} changed its state to {}".format(jobid, state))

    jsc.notify_status(f, jsc_cb, None)
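To show why push notification is easier to consume than polling squeue or the filesystem, here is a toy, in-process sketch of the pattern. It is not the JSC API: the class JobStatusTracker, its methods, and the state names are all invented for illustration. The point is only that the callback fires once per actual state change, so nothing has to be parsed or polled.

```python
class JobStatusTracker:
    """Toy stand-in for a push-style job status service (not the Flux JSC API)."""

    def __init__(self):
        self._callbacks = []
        self._states = {}

    def notify_status(self, callback, arg=None):
        """Register callback(jobid, old_state, new_state, arg)."""
        self._callbacks.append((callback, arg))

    def update(self, jobid, new_state):
        """Record a state change and push it to every registered callback."""
        old_state = self._states.get(jobid)
        if old_state == new_state:
            return  # nothing changed, so nobody is notified
        self._states[jobid] = new_state
        for callback, arg in self._callbacks:
            callback(jobid, old_state, new_state, arg)


history = []


def on_change(jobid, old_state, new_state, arg):
    # arg is whatever was passed at registration; here it is the history list.
    arg.append((jobid, new_state))


tracker = JobStatusTracker()
tracker.notify_status(on_change, history)
# The repeated "running" update does not trigger a second notification.
for state in ["submitted", "running", "running", "complete"]:
    tracker.update(42, state)
```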

slide-45
SLIDE 45

Scalability: Running Many Jobs

▪ Slurm
— find ./ -exec sbatch -N1 tar -cf {}.tgz {} \;
  • Slow: requires acquiring a lock in Slurm, can time out, causing failures
  • Inefficient: uses 1 node for each task
— find ./ -exec srun -n1 tar -cf {}.tgz {} \;
  • Slow: spawns a process for every submission
  • Inefficient: is not a true scheduler; can overlap tasks on cores
▪ Flux API:

    import os
    # f is the flux.Flux() handle; payload is the job request dictionary
    for path in os.listdir('.'):
        payload['command'] = ["tar", "-cf", "{}.tgz".format(path), path]
        resp = f.rpc_send("job.submit", payload)

Subject: Good Neighbor Policy

You currently have 271 jobs in the batch system on lamoab.

The good neighbor policy is that users keep their maximum submitted job count at a maximum of 200 or less. Please try to restrict yourself to this limit in the future. Thank you.

slide-46
SLIDE 46

Scalability: Running Many Jobs

[Figure: a Capacitor Scheduler sits between a Variable Input Job Stream and a Constant Output Job Stream]
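The capacitor idea in the figure, a bursty input stream turned into a steady output stream, can be illustrated with a small buffer that never lets more than a fixed number of jobs be outstanding. The slides do not describe how flux-capacitor works internally, so the class below is only a sketch of the buffering concept, with invented names throughout.

```python
from collections import deque


class Capacitor:
    """Toy job buffer: accepts bursts, keeps at most `limit` jobs outstanding."""

    def __init__(self, limit, submit):
        self.limit = limit
        self.submit = submit      # callable that actually launches one job
        self.pending = deque()
        self.outstanding = 0

    def add(self, job):
        """Accept a job at any rate; it is launched only when there is room."""
        self.pending.append(job)
        self._drain()

    def job_finished(self):
        """Signal that one launched job completed, freeing a slot."""
        self.outstanding -= 1
        self._drain()

    def _drain(self):
        while self.pending and self.outstanding < self.limit:
            self.submit(self.pending.popleft())
            self.outstanding += 1


launched = []
cap = Capacitor(limit=2, submit=launched.append)
for job in range(5):      # a burst of five jobs arrives at once
    cap.add(job)
```

After the burst only two jobs have been launched; each call to cap.job_finished() releases one more from the buffer.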

slide-47
SLIDE 47

Scalability: Running Many Jobs

▪ Flux Capacitor
— find ./ -printf "-n1 tar -cf %p.tgz %p\n" | flux-capacitor
— flux-capacitor --command_file my_command_file
  • -n1 tar -cf dirA.tgz ./dirA
  • -n1 tar -cf dirB.tgz ./dirB
  • -n1 tar -cf dirC.tgz ./dirC
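A command file in the format listed on this slide can also be produced from Python. The sketch below writes one "-n1 tar -cf NAME.tgz ./NAME" line per subdirectory. The function names are illustrative, directory names containing spaces are not quoted, and the "./NAME" paths assume the jobs run from the directory that was scanned.

```python
import os


def capacitor_lines(directory="."):
    """Yield one '-n1 tar -cf NAME.tgz ./NAME' line per subdirectory."""
    for name in sorted(os.listdir(directory)):
        if os.path.isdir(os.path.join(directory, name)):
            yield "-n1 tar -cf {0}.tgz ./{0}".format(name)


def write_command_file(path, directory="."):
    """Write the command file and return how many lines it contains."""
    lines = list(capacitor_lines(directory))
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
    return len(lines)
```

Calling write_command_file("my_command_file") in a directory holding dirA, dirB, and dirC would produce the three lines listed on the slide.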

slide-50
SLIDE 50

Scalability: Running Many Heterogeneous Jobs

▪ Slurm
— No support for heterogeneous job steps in versions before 17.11
— Limited support in versions after 17.11

https://slurm.schedmd.com/heterogeneous_jobs.html#limitations

slide-51
SLIDE 51

Scalability: Running Many Heterogeneous Jobs

▪ Flux Capacitor
— flux-capacitor --command_file my_command_file
  • -n1 tar -cf dirA.tgz ./dirA
  • -n32 make -j 32
  • -N4 my_mpi_app
  • ...

slide-53
SLIDE 53

Scalability: Running Millions of Jobs

▪ Flux Capacitor (Depth-1)
— flux-capacitor --command_file my_command_file

[Figure: a Cluster of four Nodes with 16 cores (C) each, spanned by one Flux Instance running a single Capacitor]

slide-54
SLIDE 54

Scalability: Running Millions of Jobs

▪ Hierarchical Flux Capacitor (Depth-2)
— for x in ./*.commands; do
      flux submit -N1 flux start \
          flux-capacitor --command_file $x
  done

[Figure: the same four-node Cluster with one top-level Flux Instance and four child Flux Instances, each running its own Capacitor]
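The Depth-2 loop expects the work to be split into several *.commands files already, one per child instance. One way to produce them is to deal the lines of a large command file round-robin into N smaller files. This is a sketch with illustrative names; the "partN.commands" naming is an assumption chosen only so that the files match the ./*.commands glob.

```python
def split_commands(lines, n_chunks):
    """Deal lines round-robin into n_chunks lists of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for index, line in enumerate(lines):
        chunks[index % n_chunks].append(line)
    return chunks


def write_chunks(lines, n_chunks, prefix="part"):
    """Write prefix0.commands, prefix1.commands, ... and return the file names."""
    names = []
    for number, chunk in enumerate(split_commands(lines, n_chunks)):
        name = "{}{}.commands".format(prefix, number)
        with open(name, "w") as handle:
            handle.write("".join(line + "\n" for line in chunk))
        names.append(name)
    return names
```

Round-robin dealing keeps the chunks within one line of each other in length, which is a reasonable default when nothing is known about individual job runtimes.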

slide-55
SLIDE 55

Scalability: Running Millions of Jobs

▪ Flux Hierarchy (Depth-3+)
— flux-hierarchy --config=config.json --command_file my_command_file

[Figure: the same four-node Cluster with one top-level Flux Instance, four nested Flux Instances, and a Capacitor]

slide-61
SLIDE 61

Extensibility: Modular Design

▪ At the core of Flux is an overlay network
— Built on top of ZeroMQ
— Supports RPCs, Pub/Sub, Push/Pull, etc.
▪ Modules provide extended functionality (i.e., services)
— User-built modules are loadable too
— Some modules also support plugins
▪ External tools and commands can access services
— User authentication and roles supported

[Figure: a Flux Instance. Comms Message Broker: Overlay Networks & Routing, Msg Idioms (RPC/Pub-Sub). Service Modules: Sched Framework, Remote Execution, Policy Plugin A, Resource, Key-Value Store, Heartbeat. Commands: flux submit, flux-capacitor.]

slide-65
SLIDE 65

Extensibility: Creating Your Own Module

▪ Register a new service “pymod.new_job” that ingests jobs and responds with a Job ID
▪ Load using flux module load pymod --module=path/to/file.py

    import itertools, json, flux

    # Request handler: queue the job and reply with the next job ID
    def handle_new_job(f, typemask, message, arg):
        job_queue, job_ids = arg
        job_queue.append(message.payload)
        response = {'jobid': next(job_ids)}
        f.respond(message, 0, json.dumps(response))

    # Module entry point: register the handler, then run the reactor
    def mod_main(f, *argv):
        f.msg_watcher_create(flux.FLUX_MSGTYPE_REQUEST, handle_new_job, "pymod.new_job",
                             args=([], itertools.count(0))).start()
        f.reactor_run(f.get_reactor(), 0)
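The handler logic on this slide can be exercised without a running broker by passing in stand-ins for the handle and the message. FakeHandle and FakeMessage below are hypothetical test doubles, not Flux classes. They implement only the two things the handler touches: a payload attribute and a respond method.

```python
import itertools
import json


class FakeMessage:
    """Minimal stand-in for a request message (only carries a payload)."""

    def __init__(self, payload):
        self.payload = payload


class FakeHandle:
    """Minimal stand-in for a Flux handle that records responses."""

    def __init__(self):
        self.responses = []

    def respond(self, message, errnum, payload):
        self.responses.append((errnum, payload))


def handle_new_job(f, typemask, message, arg):
    job_queue, job_ids = arg
    job_queue.append(message.payload)      # ingest the job
    response = {"jobid": next(job_ids)}    # hand out the next ID
    f.respond(message, 0, json.dumps(response))


queue, ids = [], itertools.count(0)
handle = FakeHandle()
for cmd in (["sleep", "1"], ["hostname"]):
    request = FakeMessage(json.dumps({"cmdline": cmd}))
    handle_new_job(handle, None, request, (queue, ids))
```

Because the queue and the ID counter travel in through arg, the handler keeps no global state, which is what makes it easy to test in isolation.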

slide-69
SLIDE 69

Extensibility: Flux’s Communication Overlay

▪ Connect to a running flux instance
— f = flux.Flux()
▪ Send an RPC to a service and receive a response
— resp = f.rpc_send("pymod.new_job", payload)
  jobid = json.loads(resp)['jobid']
▪ Subscribe to and publish an event
— f.event_subscribe("node_down")
  f.msg_watcher_create(node_down_cb,
                       raw.FLUX_MSGTYPE_EVENT,
                       "node_down").start()
— f.event_send("node_down")

slide-74
SLIDE 74

Extensibility: Scheduler Plugins

▪ Common, built-in scheduler plugins:
— First-Come First-Served (FCFS)
— Backfilling
  • Conservative
  • EASY
  • Hybrid
▪ Various, advanced scheduler plugins:
— I/O-aware
— CPU performance variability aware
— Network-aware
▪ Create your own!
▪ Loading the plugins
— flux module load sched.io-aware
— FLUX_SCHED_OPTS="plugin=sched.fcfs" flux start
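To make the difference between the FCFS and EASY backfilling policies concrete, here is a textbook-style sketch of both over a single pool of identical nodes. It is not the flux-sched plugin code. The data layout (tuples of name, nodes, runtime) and the function names are assumptions, and runtimes are taken to be exact user estimates.

```python
def fcfs(free, queue):
    """Start jobs strictly in arrival order; stop at the first that does not fit."""
    started = []
    for name, nodes, runtime in queue:
        if nodes > free:
            break
        free -= nodes
        started.append(name)
    return started


def easy_backfill(free, running, queue, now=0):
    """Return names of jobs to start now under EASY backfilling.

    running: list of (nodes, end_time) for jobs already executing
    queue:   list of (name, nodes, runtime) in arrival order
    """
    started, queue, running = [], list(queue), list(running)
    # FCFS phase: start jobs from the head of the queue while they fit.
    while queue and queue[0][1] <= free:
        name, nodes, runtime = queue.pop(0)
        free -= nodes
        running.append((nodes, now + runtime))
        started.append(name)
    if not queue:
        return started
    # Reserve for the blocked head job: find the earliest time (the
    # "shadow time") at which enough nodes will have been released.
    head_nodes = queue[0][1]
    available, shadow = free, None
    for nodes, end in sorted(running, key=lambda job: job[1]):
        available += nodes
        if available >= head_nodes:
            shadow = end
            break
    if shadow is None:
        return started  # head job can never fit; backfill nothing
    extra = available - head_nodes  # nodes the head job leaves unused at shadow
    # Backfill phase: a later job may start only if it cannot delay the head job.
    for name, nodes, runtime in queue[1:]:
        if nodes > free:
            continue
        if now + runtime <= shadow:   # finishes before the reservation begins
            free -= nodes
            started.append(name)
        elif nodes <= extra:          # runs on nodes the head job does not need
            free -= nodes
            extra -= nodes
            started.append(name)
    return started
```

With 2 free nodes, a running 2-node job ending at time 10, and a queue whose head needs 4 nodes, fcfs starts nothing, while easy_backfill starts a later 2-node job that finishes by time 10. Conservative backfilling differs in that it holds a reservation for every queued job rather than only the head.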

slide-76
SLIDE 76

Extensibility: Open Source

▪ Flux-Framework code is available on GitHub
▪ Most project discussions happen in GitHub issues
▪ PRs and collaboration welcome!

Thank You!

slide-77
SLIDE 77

Disclaimer

This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.