Challenges in managing implicit and abstract provenance data: experiences with ProvManager


Workshop on the Theory and Practice of Provenance 2011. Challenges in managing implicit and abstract provenance data: experiences with ProvManager. Anderson Marinho, Marta Mattoso, Cláudia Werner, Vanessa Braganholo, Leonardo Murta.


SLIDE 1

Challenges in managing implicit and abstract provenance data: experiences with ProvManager

Anderson Marinho, Marta Mattoso, Cláudia Werner, Vanessa Braganholo, Leonardo Murta

Workshop on the Theory and Practice of Provenance 2011
Federal University of Rio de Janeiro (UFRJ), Brazil
Fluminense Federal University (UFF), Brazil

SLIDE 2

Problem/Motivation

  • Some challenges in managing provenance
    • Which provenance data should be gathered?
      • Open Provenance Model is a possible solution
    • How to capture provenance data?
      • Three levels: Workflow, Operating System, Activity
    • How to manage provenance in workflows that are executed by different execution environments (distributed environments)?

SLIDE 3

ProvManager - Overview of provenance gathering strategy

[Figure: ProvManager, the SWfMSs (Taverna, Kepler, VisTrails), the Experiment and the Scientist, linked by the numbered steps below]

  • 1. Collect the workflow specifications from SWfMS
  • 2. Publish the workflow specifications
  • 3. Obtain the adapted workflow specifications
  • 4. Load the adapted workflow specifications in the SWfMS
  • 5. Run
  • 6. Provenance data sent by the activities

[SWF 2009; IPAW, 2010]

SLIDE 4

Workflow instrumentation

[Figure: original workflow with activities A and B]

SLIDE 5

Workflow instrumentation

[Figure: two stages. Original workflow: activities A and B. Instrumenting workflow: A and B with PGAs attached.]

SLIDE 6

Workflow instrumentation

[Figure: three stages. Original workflow: activities A and B. Instrumenting workflow: A and B with PGAs attached. Wrapping activities: each activity and its PGAs grouped into A’ and B’.]

SLIDE 7

Workflow instrumentation

[Figure: four stages. Original workflow: activities A and B. Instrumenting workflow: A and B with PGAs attached. Wrapping activities: each activity and its PGAs grouped into A’ and B’. Adapted workflow: A’ and B’.]
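
The instrumentation stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ProvManager's implementation: the slides do not expand "PGA", so it is assumed here to be a provenance-gathering step that runs before and after an activity; `ProvenanceStore`, `wrap_activity`, `adapt_workflow` and `run` are hypothetical names, and the workflow is reduced to a linear chain of Python callables.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Activity = Callable[[Any], Any]


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory stand-in for a provenance repository."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def publish(self, record: Dict[str, Any]) -> None:
        self.records.append(record)


def wrap_activity(name: str, activity: Activity, store: ProvenanceStore) -> Activity:
    """Build A' from A: gather provenance before and after the original call."""
    def wrapped(value: Any) -> Any:
        # Assumed role of a PGA placed before the activity: record its input.
        store.publish({"activity": name, "event": "input", "data": value})
        result = activity(value)
        # Assumed role of a PGA placed after the activity: record its output.
        store.publish({"activity": name, "event": "output", "data": result})
        return result
    return wrapped


def adapt_workflow(activities: List[Tuple[str, Activity]],
                   store: ProvenanceStore) -> List[Tuple[str, Activity]]:
    """Replace every activity X of a linear workflow by its wrapped version X'."""
    return [(name + "'", wrap_activity(name, fn, store)) for name, fn in activities]


def run(workflow: List[Tuple[str, Activity]], value: Any) -> Any:
    """Execute the steps in order, feeding each output to the next step."""
    for _name, fn in workflow:
        value = fn(value)
    return value


store = ProvenanceStore()
original = [("A", lambda x: x + 1), ("B", lambda x: x * 2)]  # toy activities
adapted = adapt_workflow(original, store)
result = run(adapted, 3)
```

Running the adapted workflow gives the same result as the original one, while `store.records` holds one input and one output record per activity. The original callables are left untouched, which mirrors the idea that the scientist's workflow is adapted rather than rewritten.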

SLIDE 8

There are still some problems…

  • Implicit provenance data
    • Difficulty in gathering provenance data when they are not explicitly declared in the workflow specification
  • Lack of higher provenance abstraction levels
    • Concrete workflow-related provenance data are not enough to help scientists in the experiment analysis
    • Some scientists may not be used to such information

SLIDE 9

Implicit provenance data

[Figure: activities A, B, C and D across the SWfMS domain and the OS domain; data labels: img.jpg, C:\data, res.zip, “C:\data”, “C:\res.zip”, {1,3,5}]
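
To make the problem concrete: when an activity only receives a path string, the files it reads and writes never appear in the workflow specification. The sketch below shows one crude way such data could be observed, by comparing directory snapshots taken before and after the activity runs. It is an assumption-laden illustration, not the mechanism used by ProvManager or any SWfMS: `snapshot`, `run_with_os_observation` and the toy `compress` activity are hypothetical, and a real OS-level collector would trace file accesses rather than diff listings.

```python
import os
import tempfile
import zipfile
from typing import Any, Callable, Dict, Set, Tuple


def snapshot(directory: str) -> Set[str]:
    """Return the set of file paths currently found under `directory`."""
    found = set()
    for root, _dirs, files in os.walk(directory):
        for fname in files:
            found.add(os.path.join(root, fname))
    return found


def run_with_os_observation(activity: Callable[[str], Any],
                            directory: str) -> Tuple[Any, Dict[str, Set[str]]]:
    """Run `activity` and report files that existed before and files it created."""
    before = snapshot(directory)
    result = activity(directory)
    after = snapshot(directory)
    return result, {"candidate_inputs": before, "created": after - before}


def compress(directory: str) -> str:
    """Toy activity: the workflow only sees the directory path, not the files."""
    target = os.path.join(directory, "res.zip")
    with zipfile.ZipFile(target, "w") as archive:
        for name in sorted(os.listdir(directory)):
            if name != "res.zip":  # skip the archive being written
                archive.write(os.path.join(directory, name), arcname=name)
    return target


with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "img.jpg"), "wb") as fh:
        fh.write(b"placeholder bytes")  # made-up content for the example
    result, observed = run_with_os_observation(compress, data_dir)
    created_names = sorted(os.path.basename(p) for p in observed["created"])
    input_names = sorted(os.path.basename(p) for p in observed["candidate_inputs"])
    result_name = os.path.basename(result)
```

Here the workflow-level record would contain only the directory path and the returned archive path, while the snapshot comparison reveals that `img.jpg` was available as input and `res.zip` was produced.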

SLIDE 10

Lack of higher provenance abstraction levels

[Figure: “Analysis of Movements of Platform Prosim” shown across three abstraction levels: Conceptual, Abstract, Concrete]

Problem: the scientist cannot easily relate data from one abstraction level to another. Questions such as “what is the result data of the ‘analysis of platform movements’ activity?” cannot be easily answered.

SLIDE 11

Some ideas…

  • Implicit data
    • Adopt an OS-level provenance gathering mechanism to work together with the PGA in ProvManager
    • There is a similar initiative in VisTrails (Koop et al., 2010)
  • Lack of higher provenance abstraction levels
    • Create a “conceptual provenance data” model
    • Salayandia and da Silva (2010) propose something similar
    • Map this model to the existing “concrete provenance data”
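
The mapping idea in the last bullet can be illustrated with a small Python sketch. Everything in it is hypothetical: the slides do not define the conceptual model, so a plain dictionary from a conceptual activity to an ordered list of concrete activities is assumed, the provenance records are made up, and "result data" is taken to mean the outputs of the last concrete activity in that list.

```python
from typing import Any, Dict, List

# Assumed mapping: conceptual activity -> ordered concrete workflow activities.
conceptual_to_concrete: Dict[str, List[str]] = {
    "analysis of platform movements": ["A", "B"],
}

# Made-up concrete provenance records, in the style of input/output events.
concrete_provenance: List[Dict[str, Any]] = [
    {"activity": "A", "event": "input", "data": "img.jpg"},
    {"activity": "A", "event": "output", "data": r"C:\data"},
    {"activity": "B", "event": "input", "data": r"C:\data"},
    {"activity": "B", "event": "output", "data": r"C:\res.zip"},
]


def result_data(conceptual_activity: str,
                mapping: Dict[str, List[str]],
                records: List[Dict[str, Any]]) -> List[Any]:
    """Answer 'what is the result data of <conceptual activity>?'.

    Assumption: the result is whatever the last mapped concrete
    activity recorded as output.
    """
    concrete = mapping.get(conceptual_activity, [])
    if not concrete:
        return []
    final_step = concrete[-1]
    return [r["data"] for r in records
            if r["activity"] == final_step and r["event"] == "output"]


answer = result_data("analysis of platform movements",
                     conceptual_to_concrete, concrete_provenance)
unknown = result_data("unmapped activity",
                      conceptual_to_concrete, concrete_provenance)
```

With such a mapping in place, a question phrased at the conceptual level is answered by a lookup over the concrete records, so the scientist does not need to know which concrete activities implement the conceptual one.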

SLIDE 12

Challenges in managing implicit and abstract provenance data: experiences with ProvManager

Anderson Marinho, Marta Mattoso, Cláudia Werner, Vanessa Braganholo, Leonardo Murta

Workshop on the Theory and Practice of Provenance 2011
Federal University of Rio de Janeiro (UFRJ), Brazil
Fluminense Federal University (UFF), Brazil