A Practical Approach for a Workflow Management System
Simone Pellegrini, Francesco Giacomini, Antonia Ghiselli
INFN Cnaf Viale B. Pichat, 6/2 40127 Bologna {simone.pellegrini | francesco.giacomini | antonia.ghiselli}@cnaf.infn.it
– Thanks to workflows, the processes' business logic
– Several languages for workflow description exist:
– Changing the WfMS has a high cost:
– Petri-Net based
– Independent of the underlying Grid middleware
– Multi-language:
– Deals with interoperability
Workflow Management System
[Figure: layered architecture. Layers: Workflow Gateway, Workflow Engine, Grid Abstraction Layer, Grid Middleware/s. Labels: language interoperability, engine interoperability, execution nodes, storage nodes. Annotations: aims at Grid middleware independence; aims at language independence.]
JDL, GworkflowDL, BPEL
Grid Abstraction Layer
[Figure: components Dispatcher, Observer, Data Transfer, Reservation]
– Dispatcher: job submission/cancellation
– Data Transfer: moves data between Grid nodes
– Observer: monitors the status of submitted jobs
– Reservation: resource reservation
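The four responsibilities listed above could be exposed to the workflow engine through one middleware-neutral interface. The following is only a minimal sketch of that idea: the class names, method names, and status strings are assumptions made for illustration and are not taken from the presented system.

```python
from abc import ABC, abstractmethod


class GridAbstractionLayer(ABC):
    """Hypothetical middleware-neutral interface (all names are assumptions)."""

    @abstractmethod
    def submit(self, job_description: str) -> str:
        """Dispatcher: submit a job and return its identifier."""

    @abstractmethod
    def cancel(self, job_id: str) -> None:
        """Dispatcher: cancel a submitted job."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Observer: report the status of a submitted job."""

    @abstractmethod
    def transfer(self, source: str, destination: str) -> None:
        """Data Transfer: move data between Grid nodes."""

    @abstractmethod
    def reserve(self, resource: str) -> str:
        """Reservation: reserve a resource and return a reservation id."""


class InMemoryGrid(GridAbstractionLayer):
    """Toy backend used only to exercise the interface."""

    def __init__(self):
        self.jobs = {}
        self.transfers = []
        self.reservations = []

    def submit(self, job_description):
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = "SUBMITTED"
        return job_id

    def cancel(self, job_id):
        self.jobs[job_id] = "CANCELLED"

    def status(self, job_id):
        return self.jobs[job_id]

    def transfer(self, source, destination):
        self.transfers.append((source, destination))

    def reserve(self, resource):
        self.reservations.append(resource)
        return f"reservation-{len(self.reservations)}"
```

With such an interface, supporting another middleware would mean writing another subclass, while the engine code that calls `submit`, `status`, and the other methods stays unchanged.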
Workflow Gateway
[Figure: labels Parser, Model Translator, Dag, BPEL, gLite (classAd), JSDL, Compiler, Pi-Calculus]
– Parser: Extracts
– Compiler: Produces a workflow representation
– Part of a workflow (or a sub-workflow) can be
– Legacy workflows can be executed on our WfMS
– Petri-Net based
– Micro-Kernel architecture:
– Distributed
– Services maturity
– Reliability
– Large adoption
– simplify interaction with Grid services
– The client sends the workflow description to the server;
– The WfMS server manages workflow execution;
[Figure: Client, Workflow Description, WFMS Server, Workflow running instance, Submitted Jobs, Monitoring, Grid]
The interface to the Grid is provided by the WMS API.
– Client submits a workflow to the Grid via the WMS and
– The WfMS ends up running on a Grid node
[Figure: Client, Workflow Description (submit), WFMS running instances, Monitoring, Grid]
– The WfMS instance
– record-like structure composed of a finite number of
DAG Model
    [
      Type = "dag";
      [ ... ]
      nodes = [
        father = [ ... ];
        son1 = [ ... ];
        son2 = [ ... ];
        final = [ ... ];
        dependencies = {
          {father, {son1, son2}},
          {son1, final},
          {son2, final}
        };
      ];
    ]
[Figure: JDL parser + model extractor. Jobs: father, son1, son2, final. Dependencies: father precedes son1 and son2; son1 and son2 precede final.]
– Acts as a Meta-Scheduler for Condor jobs
– Submits jobs respecting their inter-dependencies
– In case of job failure, DAGMan continues until it can
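Submitting jobs while respecting their inter-dependencies amounts to walking the DAG in topological order. The sketch below illustrates that with Python's standard `graphlib` module on the father/son1/son2/final example; it is not DAGMan's implementation, and the function name `submission_waves` is hypothetical.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it can start.
predecessors = {
    "son1": {"father"},
    "son2": {"father"},
    "final": {"son1", "son2"},
}


def submission_waves(predecessors):
    """Group jobs into waves; every job in a wave has all predecessors done."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that can be submitted now
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave completed successfully
    return waves


waves = submission_waves(predecessors)
```

Jobs in the same wave have no dependency on each other, so a scheduler is free to submit them in parallel.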
– Lack of error handling
– Lack of task types other than computation
– a DAG node can be represented by a Petri Net
– the flow of data among DAG nodes is modeled
[Figure: DAG to Petri Net model converter. DAG model: father, son1, son2, final. Petri-Net model: places init, P1, P2, P3, P4, end; transitions father, son1, son2, final.]
Petri-Net Model
    <workflow [...]>
      <place ID="init" />
      <place ID="P1" />
      <place ID="P2" />
      [...]
      <place ID="end" />
      <transition ID="father">
        <inputPlace placeID="init"/>
        <outputPlace placeID="P1"/>
        <outputPlace placeID="P2"/>
        <operation />
      </transition>
      [...]
    </workflow>
Compiler
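The DAG to Petri Net conversion shown above can be sketched in a few lines. This is an illustrative reading of the slides, not the authors' converter: it assumes each DAG node becomes a transition, each dependency edge becomes a place, and the root and leaf nodes are attached to `init` and `end`. The numbering of P3 and P4, the `Transition` class, and both function names are assumptions, and the sketch expects a single root node.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str
    inputs: list = field(default_factory=list)   # input place IDs
    outputs: list = field(default_factory=list)  # output place IDs


def dag_to_petri_net(nodes, edges):
    """Map DAG nodes to transitions and dependency edges to places."""
    transitions = {n: Transition(n) for n in nodes}
    has_pred = {dst for _, dst in edges}
    has_succ = {src for src, _ in edges}
    places = ["init"]
    for n in nodes:
        if n not in has_pred:  # root node: enabled by the initial token
            transitions[n].inputs.append("init")
    for i, (src, dst) in enumerate(edges, start=1):
        place = f"P{i}"  # one place per dependency edge
        places.append(place)
        transitions[src].outputs.append(place)
        transitions[dst].inputs.append(place)
    places.append("end")
    for n in nodes:
        if n not in has_succ:  # leaf node: produces the final token
            transitions[n].outputs.append("end")
    return places, transitions


def run(places, transitions):
    """Fire enabled transitions until none is enabled (one token in init)."""
    marking = {p: 0 for p in places}
    marking["init"] = 1
    fired = []
    progress = True
    while progress:
        progress = False
        for t in transitions.values():
            if all(marking[p] > 0 for p in t.inputs):
                for p in t.inputs:
                    marking[p] -= 1
                for p in t.outputs:
                    marking[p] += 1
                fired.append(t.name)
                progress = True
    return fired, marking


nodes = ["father", "son1", "son2", "final"]
edges = [("father", "son1"), ("father", "son2"),
         ("son1", "final"), ("son2", "final")]
places, transitions = dag_to_petri_net(nodes, edges)
fired, marking = run(places, transitions)
```

For the example DAG, the `father` transition ends up with input place `init` and output places `P1` and `P2`, which matches the XML fragment above.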
– The WMS serves job submission requests, returning
– The ID is used to query (or register for notifications
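JobExecute
[Figure: Petri net refining JobExecute between places P1 and P2. Internal places: P1-1, P1-2, P1-3, P1-4, P1-5. Transitions: JobRegister (tokens: jobID, jdl, InputSandbox), data movement, JobStart, wait_for_termination (produces result), then data movement guarded by if(result.SUCCESS) and Recovery Strategy guarded by if(result.FAIL).]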
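Read sequentially, the JobExecute net above describes a fixed sequence of steps followed by a branch on the result. The sketch below is one possible reading of the figure, assuming that success leads to the final data movement and failure to the recovery strategy. `FakeGrid`, its methods, and the result strings are hypothetical stand-ins, not the WMS API.

```python
class FakeGrid:
    """Stand-in for the Grid services; records the calls it receives."""

    def __init__(self, outcome):
        self.outcome = outcome  # result returned by wait_for_termination
        self.calls = []

    def register(self, jdl):
        self.calls.append("JobRegister")
        return "job-1"

    def move_data(self, job_id, direction):
        self.calls.append(f"data movement ({direction})")

    def start(self, job_id):
        self.calls.append("JobStart")

    def wait_for_termination(self, job_id):
        self.calls.append("wait_for_termination")
        return self.outcome


def job_execute(grid, jdl, recovery):
    """Run the JobExecute steps in order and branch on the result."""
    job_id = grid.register(jdl)        # JobRegister yields the jobID
    grid.move_data(job_id, "in")       # stage the InputSandbox
    grid.start(job_id)                 # JobStart
    result = grid.wait_for_termination(job_id)
    if result == "SUCCESS":
        grid.move_data(job_id, "out")  # retrieve the output data
        return result
    return recovery(job_id, result)    # Recovery Strategy on failure


ok_grid = FakeGrid("SUCCESS")
ok_result = job_execute(ok_grid, "[ ... ]", lambda job_id, result: "RECOVERED")
bad_grid = FakeGrid("FAIL")
bad_result = job_execute(bad_grid, "[ ... ]", lambda job_id, result: "RECOVERED")
```

Passing the recovery strategy as a callable keeps the failure policy separate from the execution steps.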
Task Execution
[Figure: P1, JobExecute, P2]
wait_for_termination (polling)
[Figure: between places P1-3 and P1-4, getJobState takes the jobID and returns a state s. Guard if(s.job_running): wait N sec (place P1-3-1) and poll again. Guard if(s.job_done || s.job_fail): continue to P1-4.]
– getJobState() operation of the LB service.
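The polling variant of wait_for_termination can be sketched as a simple loop. The `get_job_state` callable stands in for the getJobState() operation of the LB service. Its signature, the state strings, and the injectable `sleep` parameter are assumptions for illustration.

```python
import time


def wait_for_termination(job_id, get_job_state, interval_sec=5.0,
                         sleep=time.sleep):
    """Poll the job state every interval_sec until the job is done or failed."""
    while True:
        state = get_job_state(job_id)
        if state in ("DONE", "FAILED"):  # s.job_done || s.job_fail
            return state
        sleep(interval_sec)              # s.job_running: wait N sec, poll again


# Usage with a fake state source and a sleep that only records the delays.
states = iter(["RUNNING", "RUNNING", "DONE"])
naps = []
final_state = wait_for_termination(
    "job-1", lambda job_id: next(states), interval_sec=2, sleep=naps.append
)
```

Making `sleep` a parameter lets the loop be tested without real delays.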
wait_for_termination (notify)
[Figure: places P1-3 and P1-4; labels jobID, N sec, do_polling, result]
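One plausible reading of the notify variant is that the WfMS waits for a notification and falls back to do_polling when nothing arrives within N seconds. The sketch below follows that assumption. The queue-based notification channel, the function name, and the convention that `do_polling` returns `None` while the job is still running are all hypothetical.

```python
import queue


def wait_with_fallback(job_id, notifications, do_polling, timeout_sec):
    """Wait for a notification; poll once whenever the wait times out."""
    while True:
        try:
            # Block until a result is pushed or timeout_sec elapses.
            return notifications.get(timeout=timeout_sec)
        except queue.Empty:
            result = do_polling(job_id)  # fallback after N sec of silence
            if result is not None:
                return result


# A notification is already waiting: it is returned without polling.
notified = queue.Queue()
notified.put("DONE")
first = wait_with_fallback("job-1", notified, lambda job_id: None, 0.01)

# No notification arrives: the polling fallback supplies the result.
silent = queue.Queue()
second = wait_with_fallback("job-1", silent, lambda job_id: "FAIL", 0.01)
```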
– Other WfMSs can easily take advantage of
– Customizable target language depending on the
– Solves low-level aspects of workflow management