Day 13: Scripting Workflows II DAGMan

SLIDE 1

Cartwright 2012 Fall

Computer Sciences 368 Scripting for CHTC

Day 13: Scripting Workflows II DAGMan


SLIDE 2

Homework Review

SLIDE 3

Advanced DAGMan

SLIDE 4

Retrying Nodes

  • Specifies number of times to retry given node
  • Affects entire node, not just its job
  • Especially useful if job is sensitive to environment

    RETRY node count UNLESS-EXIT value

    JOB Analyze1 analysis.sub
    RETRY Analyze1 3 UNLESS-EXIT 99
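
The retry rule can be modelled in a few lines of Python. This is only a sketch of the semantics described on this slide, not DAGMan's implementation. The function name and the `attempt` callable are made up for illustration, and it assumes an exit status of 0 means the node succeeded.

```python
def run_node_with_retries(attempt, retries, unless_exit=None):
    """Simulate RETRY semantics and return (final_status, attempts_made).

    `attempt` is a hypothetical zero-argument callable that runs the whole
    node once and returns its exit status (0 is assumed to mean success).
    """
    attempts = 0
    while True:
        status = attempt()
        attempts += 1
        if status == 0:
            return status, attempts      # node succeeded
        if unless_exit is not None and status == unless_exit:
            return status, attempts      # UNLESS-EXIT value: stop retrying
        if attempts > retries:
            return status, attempts      # first try plus all retries used up


# A node that fails twice and then succeeds, as with RETRY Analyze1 3 UNLESS-EXIT 99.
statuses = iter([1, 1, 0])
result = run_node_with_retries(lambda: next(statuses), retries=3, unless_exit=99)
```

With a count of 3, a node that always fails is attempted four times in total: the first try plus three retries.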

SLIDE 5

Node Directories

  • Use directory for all files for this node
  • Submit file, executable, inputs, outputs, everything
  • Effectively:

    cd directory
    condor_submit submit-file

  • In submit, reference common files as, e.g., ../foo

    JOB name submit-file DIR directory

    JOB Wibble wibble.sub DIR wibble

    % ls wibble
    go-wibble.py  input-1.txt  wibble.sub
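
Because the job is effectively submitted from inside the node directory, paths in the submit file are relative to that directory. The sketch below only illustrates the path arithmetic. The helper name is made up, and it uses `posixpath` on the assumption of a Unix-style submit machine.

```python
import posixpath


def resolve_from_node_dir(node_dir, reference):
    """Resolve a path written in the submit file relative to the node directory."""
    return posixpath.normpath(posixpath.join(node_dir, reference))


# ../foo climbs out of wibble/ to the directory that holds the DAG file.
shared = resolve_from_node_dir('wibble', '../foo')
# A plain filename stays inside the node directory.
local = resolve_from_node_dir('wibble', 'input-1.txt')
```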

SLIDE 6

Node Priorities

  • Sets DAGMan priority for the given node
  • Determines when DAGMan submits job to queue
  • Hence, different than job priority (set in submit file)
  • Useful when throttling jobs (-maxjobs, -maxidle)
  • Integer (+/–), defaults to 0, higher submits sooner

    PRIORITY node value

    JOB Analyze1 analysis.sub
    PRIORITY Analyze1 10
    JOB Analyze2 analysis.sub
    PRIORITY Analyze2 5
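
One way to picture the effect of priorities under throttling is the sketch below. It is a simplified model and not DAGMan's scheduler. The function name, the `slots` parameter (how many more jobs the throttle allows), and the node `Other` are assumptions for illustration.

```python
def next_to_submit(ready, priorities, slots):
    """Pick which ready nodes to submit when only `slots` more may be queued.

    Nodes without a PRIORITY line default to 0; higher values go first.
    """
    ranked = sorted(ready, key=lambda node: priorities.get(node, 0), reverse=True)
    return ranked[:slots]


priorities = {'Analyze1': 10, 'Analyze2': 5}
# Three nodes are ready, but the throttle leaves room for only two.
chosen = next_to_submit(['Analyze2', 'Other', 'Analyze1'], priorities, slots=2)
```

Without a throttle every ready node is submitted right away, so the priority makes little practical difference.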

SLIDE 7

Skipping Nodes

  • If node’s Pre-Script exits with the given exit status, skip rest of node

  • Node is marked as successful

    PRE_SKIP node exit-status

    JOB Foo foo.sub
    SCRIPT PRE Foo set-up-foo.py
    PRE_SKIP Foo 1
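
A pre-script such as set-up-foo.py might decide whether to skip by checking for existing results. The sketch below is one hypothetical way to do that and not the course's script. The output name `foo.out` and the function name are assumptions.

```python
import os
import tempfile

SKIP_STATUS = 1  # must match the exit status given on the PRE_SKIP line


def pre_script_status(result_path):
    """Return the exit status a pre-script could use for this node."""
    if os.path.exists(result_path):
        return SKIP_STATUS  # results already present: skip the rest of the node
    return 0                # nothing there yet: let the node's job run


# A real pre-script would end with: sys.exit(pre_script_status('foo.out'))
workdir = tempfile.mkdtemp()
done = os.path.join(workdir, 'foo.out')  # hypothetical output file
before = pre_script_status(done)
open(done, 'w').close()                  # pretend an earlier run produced output
after = pre_script_status(done)
```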

SLIDE 8

Node Variables

  • Define macro(s) (= variable(s)) for submit file
  • macroname is \w+, cannot start with queue
  • Multiple macros for node on same line, or separate
  • In value, $(JOB) expands to the node name

    VARS node macroname="value" ...

    JOB Foo foo.sub
    VARS Foo arg1="hello" arg2="42"
    VARS Foo arg3="$(JOB)"

SLIDE 9

Using Node Variables

  • In HTCondor submit, use macro as $(macroname)

    # DAGMan file
    JOB Foo foo.sub
    VARS Foo arg1="hello" arg2="42"
    VARS Foo arg3="$(JOB)"

    # Submit file (foo.sub)
    executable = /bin/echo
    universe = local
    output = test.out
    error = test.err
    log = test.log
    arguments = "... $(arg1) -n=$(arg2) ..."
    queue
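
To see what the arguments line becomes for node Foo, the substitution can be imitated in Python. This is a simplified model of `$(macroname)` expansion and not HTCondor's macro processor. The function name is made up, and unknown macros are simply left untouched here.

```python
import re


def expand_macros(text, macros):
    """Replace each $(name) in text with macros[name]; leave unknown names as-is."""
    def substitute(match):
        return macros.get(match.group(1), match.group(0))
    return re.sub(r'\$\((\w+)\)', substitute, text)


# Values from the VARS lines for node Foo; arg3 was "$(JOB)", the node name.
node_vars = {'arg1': 'hello', 'arg2': '42', 'arg3': 'Foo'}
arguments = expand_macros('... $(arg1) -n=$(arg2) ...', node_vars)
```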

SLIDE 10

Node Variables Can Simplify Submit Files

  • Move data from many submit files to 1 DAGMan file
  • Use VARS, $(cluster), and/or $(process)

    # DAGMan file
    JOB Analysis1 analysis.sub
    VARS Analysis1 jobname="$(JOB)" arg="ABW"
    JOB Analysis2 analysis.sub
    VARS Analysis2 jobname="$(JOB)" arg="ADO"

    # Submit file (analysis.sub)
    output = analysis.$(jobname).out
    error = analysis.$(jobname).err
    log = analysis.log
    arguments = "$(arg)"
    queue
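
With one shared submit file, the per-node data lives only in the DAGMan file, which is easy to generate. A minimal sketch follows. The function name is made up, and numbering the nodes Analysis1, Analysis2, ... is taken from the example on this slide.

```python
def analysis_dag_lines(args):
    """Build JOB and VARS lines for one analysis node per argument."""
    lines = []
    for number, arg in enumerate(args, start=1):
        node = 'Analysis%d' % number
        lines.append('JOB %s analysis.sub' % node)
        lines.append('VARS %s jobname="$(JOB)" arg="%s"' % (node, arg))
    return lines


# Reproduces the two-node example; a longer list gives a wider DAG.
dag_text = '\n'.join(analysis_dag_lines(['ABW', 'ADO']))
```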

SLIDE 11

Scripting Simple DAGs

SLIDE 12

Designing DAGs for Scripting

  • Mostly, focus on wide, parallel parts
  • Consider pros and cons of each choice
  • VARS and 1 submit file, or 1 submit file per node?

      – Often easier to script one complex DAG submit file
      – Submit file can specify subdirectories (initialdir)

  • Use sub-directories?

      – Same considerations as without DAG
      – More useful with distinct inputs or lots of output files
      – Put common files in ../ or ../common/

  • Consider using DAGMan for independent jobs

SLIDE 13

Scripting DAG Submit Files

    from itertools import product

    def psub(text): ...  # add text to submit file

    psub(dag_submit_header)
    n = 0
    # one node per combination of the two parameter lists
    for t in product(parameter_1, parameter_2):
        n += 1
        psub('JOB N%d node.sub DIR node-%d' % (n, n))
        psub('RETRY N%d 3 UNLESS-EXIT 1' % (n))
        if t[0] < 1.0:
            psub('PRIORITY N%d 10' % (n))
        args = '%d %s' % (n, t[1])
        psub('SCRIPT PRE N%d pre.py %s' % (n, args))
        psub('PARENT Start CHILD N%d' % (n))
        write_node_dir(sources, n, t)
    psub(dag_submit_footer)

SLIDE 14

Setting Up Node Directories

  • Much like before, but need to include submit file

    import os

    # sources: dict from filename to contents
    def prepare_node_dir(sources, node, params):
        node_dir = 'node-%d' % (node)
        os.mkdir(node_dir)

        # write node submit file, incl. job arguments
        node_sub = os.path.join(node_dir, 'node.sub')
        write_node_submit(node_sub, params)

        # fill in each template and write it into the node directory
        for filename in sources:
            text = sources[filename]
            target = os.path.join(node_dir, filename)
            write_template(text, target, params)

SLIDE 15

Splices

SLIDE 16

Understanding Splices

  • Reusable DAG fragment, inserted into larger DAG
  • Like a function, if you think about it
  • Common use: write outer DAG once, replace insides

SLIDE 17

Splice Syntax

  • Like the JOB statement, except it names a DAG file
  • All nodes in splice become part of (outer) DAG
  • Can create PARENT / CHILD relationships for splice, which affect all of its initial/final nodes

    SPLICE name inner-dag-file DIR directory

    JOB Start start.sub
    JOB End end.sub
    SPLICE Diamond1 diamond.dag
    SPLICE Diamond2 diamond.dag
    PARENT Start CHILD Diamond1 Diamond2

SLIDE 18

Splice Example

    # Splice
    JOB A a.sub
    VARS A x="$(JOB)"
    JOB B b.sub
    VARS B x="$(JOB)"
    PARENT A CHILD B

    # Outer
    JOB X x.sub
    SPLICE Y000 spl.dag
    ···
    SPLICE Y999 spl.dag
    JOB Z z.sub
    PARENT X CHILD Y000
    PARENT Y000 CHILD Z

[Diagram: DAG with nodes X and Z]
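
Nobody types a thousand SPLICE lines by hand, so an outer DAG like this is a natural thing to script. The sketch below is one possible generator. The function name is made up, and it assumes every splice should sit between X and Z, whereas the slide only shows the PARENT / CHILD lines for Y000.

```python
def outer_dag_lines(count):
    """Build an outer DAG that runs `count` copies of spl.dag between X and Z."""
    names = ['Y%03d' % i for i in range(count)]
    lines = ['JOB X x.sub']
    lines += ['SPLICE %s spl.dag' % name for name in names]
    lines.append('JOB Z z.sub')
    # Assumption: each splice depends on X, and Z depends on each splice.
    lines.append('PARENT X CHILD %s' % ' '.join(names))
    lines.append('PARENT %s CHILD Z' % ' '.join(names))
    return lines


lines = outer_dag_lines(1000)
```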

SLIDE 19

Sub-DAGs

SLIDE 20

Understanding Sub-DAGs

  • Reusable DAG fragment, submitted by larger DAG
  • Also like a function, if you think about it
  • Splices are better in most cases, except for one…

SLIDE 21

SUBDAG Syntax

  • Like the JOB statement, except it names a DAG file
  • Nodes in sub-DAG do not become part of DAG
  • DAGMan submits inner-dag when job is run

    SUBDAG EXTERNAL name inner-dag DIR dir

    JOB Start start.sub
    JOB End end.sub
    SUBDAG EXTERNAL Diamond1 diamond.dag
    SUBDAG EXTERNAL Diamond2 diamond.dag
    PARENT Start CHILD Diamond1 Diamond2
    PARENT Diamond1 Diamond2 CHILD End

SLIDE 22

Running Nested DAGs

  • DAGMan does condor_submit_dag on DAG file

      – Hence, another copy of DAGMan is running
      – If there are many copies, submit machine may suffer

  • Sub-DAG not processed until needed

      – Allows for some cool tricks…
      – Errors not discovered until run-time!

  • Rescue DAGs are complicated, but still work

SLIDE 23

Dynamic DAGs

SLIDE 24

The Need for Dynamic DAGs

  • Suppose the exact number of parallel jobs depends on some initial (significant) input processing

      … or exact number of stages …
      … or exact DAG shape …

  • We could:

      – Run one job to process input, then…
      – Manually run script to generate rest of DAG
      – But we want to automate!

  • Dynamic DAG — build (part of) DAG during run

SLIDE 25

Dynamic DAGs

  • How to implement:

      – In DAG, add one or more SUBDAG EXTERNAL nodes
      – (Re)Write their DAGMan submit files in earlier node (or, even in the node’s pre-script!)

  • Again, errors not found until sub-DAG is submitted
  • Outer DAG can be very simple and/or generic:

SLIDE 26

Dynamic DAG Example

  • DAGMan submit file for simple, generic outer DAG:

    JOB Start start.sub
    SUBDAG EXTERNAL Innards dynamic.dag
    JOB End end.sub
    SCRIPT PRE Innards generate-dag.py
    PARENT Start CHILD Innards
    PARENT Innards CHILD End
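
The pre-script generate-dag.py has to write dynamic.dag before DAGMan submits it. The sketch below shows one shape such a script could take; it is not the course's script. The node names `Work1`, `Work2`, ..., the submit file `work.sub`, the macro `chunk`, and the chunk filenames are all hypothetical.

```python
import os
import tempfile


def write_dynamic_dag(dag_path, chunks):
    """Write one independent node per input chunk into the inner DAG file."""
    with open(dag_path, 'w') as dag:
        for number, chunk in enumerate(chunks, start=1):
            dag.write('JOB Work%d work.sub\n' % number)
            dag.write('VARS Work%d chunk="%s"\n' % (number, chunk))
    return len(chunks)


# In a real workflow the chunk list would come from the output of node Start.
dag_path = os.path.join(tempfile.mkdtemp(), 'dynamic.dag')
count = write_dynamic_dag(dag_path, ['part-a.txt', 'part-b.txt', 'part-c.txt'])
with open(dag_path) as dag:
    written = dag.read().splitlines()
```

Because the file is only read when Innards runs, a mistake in the generated DAG shows up at run-time and not when the outer DAG is submitted.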

SLIDE 27

Workflow Management Systems

SLIDE 28

makeflow

  • Different way to describe workflow DAG

      – Uses syntax like make
      – Handles data transfers (so does HTCondor/DAGMan)
      – Highly fault tolerant (so is DAGMan)

  • Works with several distributed computing systems

      – HTCondor
      – Sun Grid Engine (SGE)
      – Work Queue (also from CCL)

  • From Doug Thain’s Cooperative Computing Lab

http://nd.edu/~ccl/software/makeflow/

SLIDE 29

Pegasus WMS

  • Supports higher-level workflow abstractions
  • Compiles down to DAG
  • Works with HTCondor, OSG, Amazon EC2, XSEDE, …
  • Used on a wide variety of complex science projects
  • Lots of cool example applications online
  • From Information Sciences Institute, USC

http://pegasus.isi.edu/


SLIDE 30

SOAR

  • System Of Automated Runs
  • Automatically scans directories for jobs to run
  • Each “job” can be a complete DAG in itself
  • Puts jobs into DAG and manages workflow
  • Also handles R and MATLAB jobs well
  • Provides extra tracking and reporting tools
  • From Bill Taylor, CHTC Team

http://submit.chtc.wisc.edu/SOAR/


SLIDE 31

Homework

SLIDE 32

Homework

  • Script a workflow!
  • Using the Mandelbrot generator again, but adding the stitching step at the end
  • Note: Use a different universe (scheduler) for the montage node (only)!
  • If you have an alternate workflow that you would like to work on instead, talk to me