Converting Scripts into Reproducible Workflow Research Objects
Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros lucas.carvalho@ic.unicamp.br
Baltimore, Maryland, USA October 23-26, 2016
Background and Motivation
– Collection of scripts, programs and (big) data
Manual collection and
Papers
How to understand, reproduce or reuse data and models of experiments?
What are the inputs and outputs?
How can this local program be replaced by a similar web service?
Example of script code.
Difficult to understand, to reuse, and to reproduce.
Example of Scientific Workflow Management System.
Create Understand Reuse Reproduce
Methodology
Step 1 Step 2 Step 3 Step 4 Step 5
– Domain experts who understand the experiment, and
– Scientists who are also familiar with workflow and
– Computer scientists who are familiar enough with the
– Responsible for authoring, documenting and
Step 1 [Figure: a script-based experiment and its abstract workflow, in which activities (Activity 1, Activity 2, ..., Activity n) are connected through ports.]
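The abstract workflow described above (activities linked through ports) can be sketched as a small data structure. This is only an illustration under stated assumptions, not the authors' model: the class names `Activity` and `AbstractWorkflow`, the `links` helper and the port names are all hypothetical, and ports are matched simply by name.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """A workflow step with named input and output ports (hypothetical model)."""
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass
class AbstractWorkflow:
    """An ordered collection of activities; no implementation is attached yet."""
    activities: list[Activity] = field(default_factory=list)

    def links(self) -> list[tuple[str, str, str]]:
        """Return (producer, port, consumer) for every output port that
        another activity reads as an input port of the same name."""
        found = []
        for producer in self.activities:
            for port in producer.outputs:
                for consumer in self.activities:
                    if consumer is not producer and port in consumer.inputs:
                        found.append((producer.name, port, consumer.name))
        return found


# Illustrative names only: two activities sharing "port_3".
workflow = AbstractWorkflow([
    Activity("activity_1", inputs=["port_1", "port_2"], outputs=["port_3"]),
    Activity("activity_2", inputs=["port_3"], outputs=["port_4"]),
])
```

Calling `workflow.links()` lists the data dependencies between activities, which is the information a workflow-like view needs to draw its edges.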
Step 2 [Figure: a script-based experiment and the corresponding executable workflow.]
Step 3 [Figure: two panels, (a) and (b), with labels Local, Algorithm A and Algorithm B.]
Step 4 [Figure: provenance graph of a workflow run. Nodes: Activity 1, Activity 2, Output 1, Output 2, Sample, Workflow Run, Lucas. Relations: used, wasGeneratedBy, wasAssociatedWith, wasStartedAt (“2012-06-01”).]
Step 5 [Figure: abstract workflows, concrete workflows, annotations, papers and reports, provenance, authors, scripts, data.]
Step 1: Generate abstract workflow
Step 2: Create an executable workflow
Step 3: Refine workflow
Step 4: Annotate and check quality
Step 5: Bundle resources into a Research Object
Script. Abstract workflow. Concrete workflow.
Research Object Model
– Many branches of material sciences, computational
– Scripts (shell script), programs (NAMD, VMD, Fortran)
– Phases: set up, simulation and analysis of trajectories.
– Inputs: protein structure, simulation parameters and
– Output: trajectories and analysis results.
Step 1: Generate Abstract Workflow
Manually annotate; create workflow-like view.
Script code. Annotated script code. Abstract workflow.
Code blocks and their inputs/outputs are annotated with YesWorkflow (McPhillips et al., 2015).
independent tool for recovering workflow information from scripts,” International Journal of Digital Curation, vol. 10, no. 1, pp. 298–313, 2015.
Create Workflow-like view
Abstract workflow. Annotated script code.
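As a rough illustration of annotating code blocks and their inputs/outputs, the sketch below embeds YesWorkflow-style `@begin`, `@end`, `@in` and `@out` comments in a script and reads them back. The block and data names are invented (loosely following the use case's phases), and `extract_blocks` is a hypothetical helper, not the YesWorkflow tool itself; it ignores nested blocks and the tool's other keywords.

```python
import re

# A hypothetical annotated script; the block and data names are invented.
ANNOTATED_SCRIPT = """\
# @begin setup
# @in protein_structure
# @out prepared_system
# ... setup code ...
# @end setup

# @begin simulation
# @in prepared_system
# @in simulation_parameters
# @out trajectory
# ... simulation code ...
# @end simulation
"""

# Matches comment lines such as "# @in prepared_system".
TAG = re.compile(r"#\s*@(begin|end|in|out)\s+(\S+)")


def extract_blocks(script: str) -> dict[str, dict[str, list[str]]]:
    """Collect the @in/@out names declared inside each @begin/@end pair."""
    blocks: dict[str, dict[str, list[str]]] = {}
    current = None
    for line in script.splitlines():
        match = TAG.match(line.strip())
        if not match:
            continue
        keyword, name = match.groups()
        if keyword == "begin":
            current = name
            blocks[current] = {"in": [], "out": []}
        elif keyword == "end":
            current = None
        elif current is not None:
            blocks[current][keyword].append(name)
    return blocks


blocks = extract_blocks(ANNOTATED_SCRIPT)
```

Each extracted block corresponds to one activity of the abstract workflow, and a name that is an `@out` of one block and an `@in` of another indicates a data link between them.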
Step 2: Create an executable workflow
Create implementation: copy code blocks from the script.
Abstract workflow. Executable workflow. Script code.
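A minimal sketch of what attaching implementations to the abstract workflow could look like, assuming each activity is implemented by a plain function holding the code copied from the script. The runner `run_workflow`, the `Step` tuple layout and the two placeholder activities are hypothetical; a real workflow management system would supply its own execution engine.

```python
from typing import Callable

# Each step: (name, input ports, output ports, implementation).
# The implementations stand in for code blocks copied from the script.
Step = tuple[str, list[str], list[str], Callable[..., tuple]]


def run_workflow(steps: list[Step], initial: dict[str, object]) -> dict[str, object]:
    """Run steps in the listed order, passing values through named ports."""
    data = dict(initial)
    for name, inputs, outputs, implementation in steps:
        missing = [port for port in inputs if port not in data]
        if missing:
            raise KeyError(f"{name} is missing inputs: {missing}")
        results = implementation(*(data[port] for port in inputs))
        data.update(zip(outputs, results))
    return data


def split(sample: list[int]) -> tuple[list[int], list[int]]:
    """Placeholder body: split a sample into two parts."""
    middle = len(sample) // 2
    return sample[:middle], sample[middle:]


def merge_sizes(first: list[int], second: list[int]) -> tuple[int]:
    """Placeholder body: report the combined size of both parts."""
    return (len(first) + len(second),)


steps: list[Step] = [
    ("split", ["sample"], ["output_1", "output_2"], split),
    ("merge_sizes", ["output_1", "output_2"], ["total"], merge_sizes),
]
result = run_workflow(steps, {"sample": [1, 2, 3, 4, 5]})
```

Keeping the port names explicit means the executable workflow still mirrors the abstract one, so the two can be compared activity by activity.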
Step 3: Refine executable workflow
Create new version. Modify resources:
Executable workflow. New workflow version.
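One way to picture creating a new version when a resource is modified is to treat each workflow version as immutable and derive the next one from it. The sketch below assumes this design; `WorkflowVersion`, `refine`, the activity name `analysis` and the two stand-in algorithms are hypothetical and not taken from the authors' implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class WorkflowVersion:
    """An immutable workflow version (hypothetical structure)."""
    version: int
    implementations: dict[str, Callable[[float], float]]
    derived_from: Optional[int] = None


def refine(base: WorkflowVersion, activity: str,
           new_implementation: Callable[[float], float]) -> WorkflowVersion:
    """Return a new version in which one activity's implementation is swapped.
    The base version is left untouched so it can still be re-run."""
    if activity not in base.implementations:
        raise KeyError(f"unknown activity: {activity}")
    updated = {**base.implementations, activity: new_implementation}
    return replace(base, version=base.version + 1,
                   implementations=updated, derived_from=base.version)


def algorithm_a(x: float) -> float:
    """Stand-in for the original resource."""
    return x * 2.0


def algorithm_b(x: float) -> float:
    """Stand-in for the replacement resource."""
    return x + x


v1 = WorkflowVersion(version=1, implementations={"analysis": algorithm_a})
v2 = refine(v1, "analysis", algorithm_b)
```

Recording `derived_from` on every new version keeps the chain from the refined workflow back to the initial executable workflow.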
Steps 2 and 3 [Figure: W3C PROV provenance of an executable workflow run. Nodes: split, psgen, Output 1, Output 2, Sample, Workflow Run, Lucas. Relations: used, wasGeneratedBy, wasAssociatedWith, wasEnactedBy, hasSpecification, wasStartedAt (“2012-06-01”).]
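The provenance relations listed above can be held as simple (subject, relation, object) triples and queried for lineage. This is a sketch only: the relation names are the labels from the slide, but which nodes each edge connects is an assumption, and `lineage` is a hypothetical helper rather than part of any PROV library.

```python
# Provenance statements as (subject, relation, object) triples.
# Relation names are the labels shown on the slide; which node each edge
# connects is an assumption, since the figure layout did not survive.
statements = [
    ("output_1", "wasGeneratedBy", "split"),
    ("output_2", "wasGeneratedBy", "split"),
    ("split", "used", "sample"),
    ("psgen", "used", "output_1"),
    ("workflow_run", "wasAssociatedWith", "Lucas"),
    ("workflow_run", "wasStartedAt", "2012-06-01"),
]


def lineage(entity: str, triples: list[tuple[str, str, str]]) -> list[str]:
    """Walk wasGeneratedBy/used edges backwards from a node and return
    every activity or entity it depends on, nearest first."""
    found: list[str] = []
    frontier = [entity]
    while frontier:
        node = frontier.pop(0)
        for subject, relation, obj in triples:
            if subject == node and relation in ("wasGeneratedBy", "used"):
                if obj not in found:
                    found.append(obj)
                    frontier.append(obj)
    return found


upstream = lineage("output_1", statements)
```

A query of this kind answers the question of which activities and inputs a given result depends on.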
Steps 2 and 3 [Figure: W3C PROV derivation links. Nodes: script code, executable workflow, new workflow version, Curator. Relations: wasDerivedFrom, wasAssociatedWith.]
Step 4: Annotate and check quality
– To check the quality of the conversion process.
Script code. Executable workflow.
Workflow version. Initial executable workflow.
– not clearly identified the main logical processing
– a mistake when migrating script code into the
– not provided the correct input files and parameters;
– the coding of the workflow itself contained errors.
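The slides do not say how the check is carried out. One possible check, assuming both the script and the workflow write their results to files, is to compare the two sets of outputs by digest. The helper names and the sample files below are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_outputs(script_dir: Path, workflow_dir: Path) -> dict[str, str]:
    """Classify each output file name as 'match', 'differs', or 'missing'
    (present on only one side)."""
    names = {p.name for p in script_dir.iterdir() if p.is_file()}
    names |= {p.name for p in workflow_dir.iterdir() if p.is_file()}
    report = {}
    for name in sorted(names):
        left, right = script_dir / name, workflow_dir / name
        if not (left.is_file() and right.is_file()):
            report[name] = "missing"
        elif digest(left) == digest(right):
            report[name] = "match"
        else:
            report[name] = "differs"
    return report


# Tiny demonstration with throwaway files (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    script_out, workflow_out = Path(tmp, "script"), Path(tmp, "workflow")
    script_out.mkdir()
    workflow_out.mkdir()
    (script_out / "analysis.txt").write_text("rmsd 1.2\n")
    (workflow_out / "analysis.txt").write_text("rmsd 1.2\n")
    (script_out / "trajectory.dat").write_text("frame 1\n")
    report = compare_outputs(script_out, workflow_out)
```

Byte-level equality is a strict criterion. For simulations that are not deterministic, a tolerance-based comparison of the analysis results would be a more realistic choice.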
Step 5: Bundle Resources into a Research Object
Script, abstract workflow, concrete workflow(s), annotations, paper, provenance, data, attributions.
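Bundling can be pictured as writing a manifest that lists every aggregated resource together with its kind. The sketch below uses a deliberately simplified JSON layout that does not follow any Research Object specification; `build_manifest` and all file names are hypothetical.

```python
import json


def build_manifest(resources: dict[str, list[str]]) -> str:
    """Serialize a flat list of aggregated resources, each tagged with its kind.
    The layout is illustrative and does not follow any Research Object spec."""
    aggregates = [
        {"path": path, "kind": kind}
        for kind, paths in resources.items()
        for path in paths
    ]
    return json.dumps({"aggregates": aggregates}, indent=2)


# All file names below are hypothetical.
manifest = build_manifest({
    "script": ["scripts/experiment.sh"],
    "abstract_workflow": ["workflows/abstract.txt"],
    "concrete_workflow": ["workflows/v1.wf", "workflows/v2.wf"],
    "provenance": ["provenance/run1.txt"],
    "paper": ["docs/paper.pdf"],
})
```

Keeping the script, both workflow forms and the provenance in one manifest lets a reader trace a result back to the code that produced it.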
– elaborated based on requirements;
– showcased via a real-world use case from the field of Molecular Dynamics;
– Scientific Workflows, YesWorkflow, Research Objects, the W3C PROV recommendations and the Web Annotation Data Model.
– Center for Computational Engineering & Sciences