Sharing Experiments and their Provenance David Koop Juliana Freire


Sharing Experiments and their Provenance David Koop Juliana Freire Large-Scale Visualization and Data Analysis (VIDA) Center Polytechnic Institute of New York University www.vistrails.org NSF Community Codes 2012


slide-1
SLIDE 1

www.vistrails.org NSF Community Codes 2012

Sharing Experiments and their Provenance

David Koop Juliana Freire

Large-Scale Visualization and Data Analysis (VIDA) Center Polytechnic Institute of New York University

slide-3
SLIDE 3

Science Today

[Figure: Collect/Generate/Obtain → Data → Filter/Analyze/Visualize → Results → Publish/Share → Findings]

  • There’s more...
  • Revisit or extend the initial result
  • Share with a colleague who wants to reproduce an experiment
  • Investigate the effect of new techniques in the same framework
  • Determine how flawed data or algorithms impacted results

slide-4
SLIDE 4
Provenance, Reproducibility, and Sharing

  • Goals:
      • Capture necessary provenance
      • Support reproducibility
      • Improve sharing and collaboration

[Figure: Text, Data, Workflows, Source Code, Libraries, Results, Visualizations]

slide-5
SLIDE 5

Demo

[Figure: non-Hermitian DYL model. Ground-state degeneracy splitting (E1-E0) x 1000 versus coupling parameter θ/π, for L = 4, L = 6, L = 8, and L = 10.]

FIG. 6. (color online) Ground-state degeneracy splitting of the non-Hermitian doubled Yang-Lee model when perturbed by a string tension (θ ≠ 0).

Galois Conjugates of Topological Phases

M. H. Freedman,¹ J. Gukelberger,² M. B. Hastings,¹ S. Trebst,¹ M. Troyer,² and Z. Wang¹

¹Microsoft Research, Station Q, University of California, Santa Barbara, CA 93106, USA
²Theoretische Physik, ETH Zurich, 8093 Zurich, Switzerland

(Dated: July 6, 2011) Galois conjugation relates unitary conformal field theories (CFTs) and topological quantum field theories (TQFTs) to their non-unitary counterparts. Here we investigate Galois conjugates of quantum double models, such as the Levin-Wen model. While these Galois conjugated Hamiltonians are typically non-Hermitian, we find that their ground state wave functions still obey a generalized version of the usual code property (local operators do not act on the ground state manifold) and hence enjoy a generalized topological protection. The key question addressed in this paper is whether such non-unitary topological phases can also appear as the ground states of Hermitian Hamiltonians. Specific attempts at constructing Hermitian Hamiltonians with these ground states lead to a loss of the code property and topological protection of the degenerate ground states. Beyond this we rigorously prove that no local change of basis (IV.5) can transform the ground states of the Galois conjugated doubled Fibonacci theory into the ground states of a topological model whose Hermitian Hamiltonian satisfies Lieb-Robinson bounds. These include all gapped local or quasi-local Hamiltonians. A similar statement holds for many other non-unitary TQFTs. One consequence is that the “Gaffnian” wave function cannot be the ground state of a gapped fractional quantum Hall state.

PACS numbers: 05.30.Pr, 73.43.-f

I. INTRODUCTION

Galois conjugation, by definition, replaces a root of a polynomial by another one with identical algebraic properties. For example, $i$ and $-i$ are Galois conjugate (consider $z^2 + 1 = 0$) as are $\phi = \frac{1+\sqrt{5}}{2}$ and $-\frac{1}{\phi} = \frac{1-\sqrt{5}}{2}$ (consider $z^2 - z - 1 = 0$), as well as $\sqrt[3]{2}$, $\sqrt[3]{2}\,e^{2\pi i/3}$, and $\sqrt[3]{2}\,e^{-2\pi i/3}$ (consider $z^3 - 2 = 0$). In physics Galois conjugation can be used to convert non-unitary conformal field theories (CFTs) to unitary ones, and vice versa. One famous example is the non-unitary Yang-Lee CFT, which is Galois conjugate to the Fibonacci CFT $(G_2)_1$, the even (or integer-spin) subset of $su(2)_3$.

In statistical mechanics non-unitary conformal field theories have a venerable history [1,2]. However, it has remained less clear if there exist physical situations in which non-unitary models can provide a useful description of the low energy physics of a quantum mechanical system – after all, Galois conjugation typically destroys the Hermitian property of the Hamiltonian. Some non-Hermitian Hamiltonians, which surprisingly have totally real spectrum, have been found to arise in the study of PT-invariant one-particle systems [3] and in some Galois conjugate many-body systems [4] and might be seen to open the door a crack to the physical use of such models. Another situation, which has recently attracted some interest, is the question whether non-unitary models can describe 1D edge states of certain 2D bulk states (the edge holographic for the bulk). In particular, there is currently a discussion on whether or not the “Gaffnian” wave function could be the ground state for a gapped fractional quantum Hall (FQH) state albeit with a non-unitary “Yang-Lee” CFT describing its edge [5–7]. We conclude that this is not possible, further restricting the possible scope of non-unitary models in quantum mechanics.

We reach this conclusion quite indirectly. Our main thrust is the investigation of Galois conjugation in the simplest non-Abelian Levin-Wen model [8]. This model, which is also called “DFib”, is a topological quantum field theory (TQFT) whose states are string-nets on a surface labeled by either a trivial or “Fibonacci” anyon. From this starting point, we give a rigorous argument that the “Gaffnian” ground state cannot be locally conjugated to the ground state of any topological phase, within a Hermitian model satisfying Lieb-Robinson (LR) bounds [9] (which includes but is not limited to gapped local and quasi-local Hamiltonians).

Lieb-Robinson bounds are a technical tool for local lattice models. In relativistically invariant field theories, the speed of light is a strict upper bound to the velocity of propagation. In lattice theories, the LR bounds provide a similar upper bound by a velocity called the LR velocity, but in contrast to the relativistic case there can be some exponentially small “leakage” outside the light-cone in the lattice case. The Lieb-Robinson bounds are a way of bounding the leakage outside the light-cone. The LR velocity is set by microscopic details of the Hamiltonian, such as the interaction strength and range. Combining the LR bounds with the spectral gap enables us to prove locality of various correlation and response functions. We will call a Hamiltonian a Lieb-Robinson Hamiltonian if it satisfies LR bounds.

We work primarily with a single example, but it should be clear that the concept of Galois conjugation can be widely applied to TQFTs. The essential idea is to retain the particle types and fusion rules of a unitary theory but when one comes to writing down the algebraic form of the F-matrices (also called 6j symbols), the entries are now Galois conjugated. A slight complication, which is actually an asset, is that writing an F-matrix requires a gauge choice and the most convenient choice may differ before and after Galois conjugation.

Our method is not restricted to Galois conjugated DFibG and its factors FibG and FibG, but can be generalized to infinitely many non-unitary TQFTs, showing that they will not arise as low energy models for a gapped 2D quantum mechan-

arXiv:1106.3267v3 [cond-mat.str-el] 5 Jul 2011

slide-6
SLIDE 6

Benefits of Provenance-Rich Publications

  • Produce more knowledge–not just text
  • Allow scientists to stand on the shoulders of giants (and their own)
  • Science can move faster!
  • Higher-quality publications
  • Authors will be more careful
  • Many eyes to check results
  • Describe more of the discovery process: people only describe successes, can we learn from mistakes?
  • Expose users to different techniques and tools: expedite their training; and potentially reduce their time to insight

slide-7
SLIDE 7

VisTrails

  • Combines features of visualization, data analysis, and scientific workflow systems
  • Orchestrate multiple tools and libraries (e.g., VTK, R, matplotlib)
  • Visual spreadsheet for comparing results
  • Tracks provenance automatically as users generate and test hypotheses
  • Leverages provenance to streamline exploration
  • Supports reflective reasoning and collaboration
  • Concerned with usability

slide-8
SLIDE 8

VisTrails

  • Open-source, freely downloadable system (www.vistrails.org)
  • Also on GitHub (github.com/vistrails)
  • Multi-platform: users on Mac, Linux, and Windows
  • Written in Python; uses PyQt and Qt for the interface
  • Over 35,000 downloads
  • User’s guide, wiki, and mailing list
  • Many users in different disciplines and countries:
      • Visualizing environmental simulations (CMOP STC)
      • Simulation for solid, fluid and structural mechanics (Galileo Network, UFRJ Brazil)
      • Quantum physics simulations (ALPS, ETH Zurich)
      • Climate analysis (CDAT)
      • Habitat modeling (USGS)
      • Open Wildland Fire Modeling (U. Colorado, NCAR)
      • High-energy physics (LEPP, Cornell)
      • Cosmology simulations (LANL)
      • Using TMS for improving memory (Psychiatry, U. Utah)
      • eBird (Cornell, NSF DataONE)
      • Astrophysical Systems (Tohline, LSU)
      • NIH NBCR (UCSD)
      • Pervasive Technology Labs (Heiland, Indiana University)
      • Linköping University
      • University of North Carolina, Chapel Hill
      • UTEP
slide-9
SLIDE 9

DataONE Integration

  • Distributed framework and sustainable cyberinfrastructure to access well-described and easily discovered observational data
  • Have VisTrails package to access data from DataONE

slide-10
SLIDE 10

USGS Habitat Modeling

[Morisette et al., 2012]
slide-11
SLIDE 11
UV-CDAT: Climate Analysis

  • Climate-specific app built on VisTrails workflows and provenance

[Figure: UV-CDAT interface with panels Variables, Visualization Properties, Visual Spreadsheet, Plots & Analyses, Project Workspace]

[Santos et al., 2012] [uv-cdat.llnl.gov]

slide-12
SLIDE 12

Workflows

    import vtk

    # Read the structured-points volume (VTK 5-era API, as on the slide)
    data = vtk.vtkStructuredPointsReader()
    data.SetFileName("../examples/data/head.120.vtk")

    # Extract the isosurface at scalar value 67
    contour = vtk.vtkContourFilter()
    contour.SetInput(data.GetOutput())
    contour.SetValue(0, 67)

    # Map the surface geometry to a renderable actor
    mapper = vtk.vtkPolyDataMapper()
    mapper.SetInput(contour.GetOutput())
    mapper.ScalarVisibilityOff()
    actor = vtk.vtkActor()
    actor.SetMapper(mapper)

    # Position the camera
    cam = vtk.vtkCamera()
    cam.SetViewUp(0, 0, -1)
    cam.SetPosition(745, -453, 369)
    cam.SetFocalPoint(135, 135, 150)
    cam.ComputeViewPlaneNormal()

    # Assemble the renderer, window, and interactor
    ren = vtk.vtkRenderer()
    ren.AddActor(actor)
    ren.SetActiveCamera(cam)
    ren.ResetCamera()
    renwin = vtk.vtkRenderWindow()
    renwin.AddRenderer(ren)
    style = vtk.vtkInteractorStyleTrackballCamera()
    iren = vtk.vtkRenderWindowInteractor()
    iren.SetRenderWindow(renwin)
    iren.SetInteractorStyle(style)
    iren.Initialize()
    iren.Start()

slide-13
SLIDE 13

Workflows

PythonSource (the script from the previous slide, shown as a single module)

slide-15
SLIDE 15

Workflows

[Figure: the script from the earlier Workflows slide shown as a workflow of modules: vtkActor, VTKCell, vtkRenderer, vtkContourFilter, vtkStructuredPointsReader, vtkDataSetMapper, vtkCamera]

  • Orchestrate multiple tools
  • Structured: easier to understand
  • Natural granularity for tracking modifications
  • Simpler maintenance
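
The structure these bullets refer to can be pictured as a dataflow graph. The following is a minimal sketch, not VisTrails code: the module names come from the slide, while the connections between them and the `execution_order` helper are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical encoding: each module maps to the set of modules it reads from.
# Module names follow the slide; the connections are assumed.
workflow = {
    "vtkStructuredPointsReader": set(),
    "vtkContourFilter": {"vtkStructuredPointsReader"},
    "vtkDataSetMapper": {"vtkContourFilter"},
    "vtkActor": {"vtkDataSetMapper"},
    "vtkCamera": set(),
    "vtkRenderer": {"vtkActor", "vtkCamera"},
    "VTKCell": {"vtkRenderer"},
}


def execution_order(graph):
    """Return one valid order in which upstream modules run before downstream ones."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)
```

Because each module and connection is a separate entry, a change such as swapping the mapper touches one node and its edges, which is the granularity at which modifications can be tracked.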
slide-16
SLIDE 16

Making code available in VisTrails

  • Package infrastructure
  • Wrap Python libraries, command-line calls, or use other interfaces (jpype, rpy, etc.)
  • Need to specify:
      1. Package identification information
      2. Module structures: input & output ports
      3. Compute method for each module

slide-17
SLIDE 17

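Example: Wrapping an existing Python library

  • seawater Python package:
  • http://pypi.python.org/pypi/seawater/1.0.3

    # 1. Package identification information
    identifier = 'org.ocefpaf.seawater'
    version = '1.0.3'
    name = 'Seawater Routines'

    import seawater
    # Assumed import paths for Module and Float (VisTrails 2.x layout);
    # the slide does not show them.
    from core.modules.vistrails_module import Module
    from core.modules.basic_modules import Float

    # 2. Module structure: input and output ports
    class SaturationN2(Module):
        _input_ports = [('S', Float), ('T', Float)]
        _output_ports = [('res', Float)]

        # 3. Compute method
        def compute(self):
            s = self.getInputFromPort("S")
            t = self.getInputFromPort("T")
            res = seawater.satN2(s, t)
            self.setResult('res', res)

    _modules = [SaturationN2,
                # ... the rest of the list is cut off on the slide
                ]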
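
To try the port-and-compute pattern without VisTrails installed, a stand-in base class is enough. This is only a sketch: the `Module` class below is a simplified imitation written for this example, and `Scale` and `get_output` are hypothetical names, not part of VisTrails.

```python
# Stand-in for the VisTrails Module base class, written only so this sketch
# runs on its own; the real class has a much richer interface.
class Module:
    _input_ports = []
    _output_ports = []

    def __init__(self, **inputs):
        self._inputs = dict(inputs)
        self._outputs = {}

    def getInputFromPort(self, name):
        return self._inputs[name]

    def setResult(self, name, value):
        self._outputs[name] = value

    def get_output(self, name):  # hypothetical helper, not a VisTrails method
        return self._outputs[name]


class Scale(Module):
    """Hypothetical module: multiplies input 'x' by 'factor'."""
    _input_ports = [("x", float), ("factor", float)]
    _output_ports = [("res", float)]

    def compute(self):
        x = self.getInputFromPort("x")
        factor = self.getInputFromPort("factor")
        self.setResult("res", x * factor)


m = Scale(x=2.0, factor=3.5)
m.compute()
print(m.get_output("res"))
```

The wrapped library call lives entirely inside `compute`, so the same shape applies whether the body calls a Python function, a command-line tool, or another interface.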

slide-18
SLIDE 18

Change-based Provenance

  • Undo/redo stacks are linear!
  • We lose history of exploration
  • Old Solution: User saves files/state
  • VisTrails Solution:
      • Automatically & transparently capture entire history as a tree
      • Users can tag or annotate each version
      • Users can go back to any version by selecting it in the tree
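
The contrast with a linear undo stack can be sketched in a few lines. This is an illustrative toy, not the VisTrails data model: the `VersionTree` class and its methods are hypothetical, and the tag names are borrowed from the talk only as labels.

```python
class VersionTree:
    """Minimal version tree: every change creates a child of the current version."""

    def __init__(self):
        self.parent = {0: None}   # version id -> parent id; 0 is the empty root
        self.tags = {}            # tag name -> version id
        self.current = 0
        self._next_id = 1

    def change(self):
        """Record a change as a new child of the current version."""
        vid = self._next_id
        self._next_id += 1
        self.parent[vid] = self.current
        self.current = vid
        return vid

    def tag(self, name):
        self.tags[name] = self.current

    def checkout(self, name):
        """Go back to a tagged version; later changes branch from it."""
        self.current = self.tags[name]

    def path_to_root(self, vid):
        path = []
        while vid is not None:
            path.append(vid)
            vid = self.parent[vid]
        return path[::-1]


tree = VersionTree()
tree.change()
tree.tag("Isosurface")         # version 1
tree.change()
tree.tag("Histogram")          # version 2, child of 1
tree.checkout("Isosurface")    # going back does not discard version 2
tree.change()
tree.tag("Volume Rendering")   # version 3, a second child of 1
print(tree.path_to_root(2), tree.path_to_root(3))
```

With an undo stack, going back to version 1 and making a new change would overwrite version 2; here both branches remain reachable.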

slide-20
SLIDE 20

Representing Provenance: Version Tree

[Figure: version tree with tagged versions Isosurface Script, Volume Rendering SW, Combined Rendering HW, Clipping Plane HW, Volume Rendering HW, Clipping Plane SW, Histogram, Combined Rendering SW, Image Slices HW, Isosurface, Image Slices SW]

Workflows shown for three of the versions:

  • Isosurface: vtkActor, VTKCell, vtkRenderer, vtkContourFilter, vtkStructuredPointsReader, vtkDataSetMapper, vtkCamera
  • Volume Rendering HW: vtkVolumeTextureMapper3D, vtkStructuredPointsReader, vtkColorTransferFunction, vtkPiecewiseFunction, VTKCell, vtkVolumeProperty, vtkCamera, vtkRenderer, vtkVolume
  • Histogram: vtkActor, VTKCell, vtkRenderer, vtkContourFilter, vtkStructuredPointsReader, vtkDataSetMapper, vtkCamera, MplPlot, MplFigureCell, MplFigure

slide-21
SLIDE 21

Structure of Changes

[Figure: the Isosurface workflow (vtkActor, VTKCell, vtkRenderer, vtkContourFilter, vtkStructuredPointsReader, vtkDataSetMapper, vtkCamera) and the Histogram workflow (the same modules plus MplPlot, MplFigureCell, MplFigure)]

  • Change 1 (add module): add module MplPlot
  • Change 2 (change configuration): add function source(“vspr = self.getInputFromPort(...”)
  • Change 3 (add connection): add connection vtkStructuredPointsReader → MplPlot
  • Change 4 (paste): add module MplFigure; add module MplFigureCell; add connection MplFigure → MplFigureCell
  • Change 5 (add connection): add connection MplPlot → MplFigure

[Freire et al., 2006]
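
One way to read the five changes is as a list of actions that, replayed on the Isosurface workflow, yield the Histogram workflow. The sketch below assumes a tuple encoding of actions and a `replay` function invented for this example; attaching the `source` function to MplPlot is also an assumption, since the slide does not name the target module.

```python
def replay(actions, modules=()):
    """Apply change actions in order to a starting module set; return the new workflow."""
    wf = {"modules": set(modules), "connections": set(), "functions": {}}
    for op, *args in actions:
        if op == "add_module":
            wf["modules"].add(args[0])
        elif op == "add_connection":
            src, dst = args
            wf["connections"].add((src, dst))
        elif op == "add_function":
            module, function = args
            wf["functions"].setdefault(module, []).append(function)
        else:
            raise ValueError(f"unknown action: {op}")
    return wf


isosurface = ["vtkActor", "VTKCell", "vtkRenderer", "vtkContourFilter",
              "vtkStructuredPointsReader", "vtkDataSetMapper", "vtkCamera"]

# Changes 1 to 5 from the slide, flattened into individual actions.
changes = [
    ("add_module", "MplPlot"),
    ("add_function", "MplPlot", "source"),   # target module is assumed
    ("add_connection", "vtkStructuredPointsReader", "MplPlot"),
    ("add_module", "MplFigure"),
    ("add_module", "MplFigureCell"),
    ("add_connection", "MplFigure", "MplFigureCell"),
    ("add_connection", "MplPlot", "MplFigure"),
]

histogram = replay(changes, modules=isosurface)
print(sorted(histogram["modules"]))
```

Storing only the actions keeps each version small, and any version can be rebuilt by replaying the actions on the path from the root to that version.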

slide-22
SLIDE 22

Execution Provenance

[Figure: workflow with modules vtkActor, VTKCell, vtkRenderer, vtkContourFilter, vtkStructuredPointsReader, vtkDataSetMapper, vtkCamera]

<module id="12" name="vtkDataSetReader" start_time="2010-02-19 11:01:05" end_time="2010-02-19 11:01:07">
  <annotation key="hash" value="c54bea63cb7d912a43ce"/>
</module>
<module id="13" name="vtkContourFilter" start_time="2010-02-19 11:01:07" end_time="2010-02-19 11:01:08"/>
<module id="15" name="vtkDataSetMapper" start_time="2010-02-19 11:01:09" end_time="2010-02-19 11:01:12"/>
<module id="16" name="vtkActor" start_time="2010-02-19 11:01:12" end_time="2010-02-19 11:01:13"/>
<module id="17" name="vtkCamera" start_time="2010-02-19 11:01:13" end_time="2010-02-19 11:01:14"/>
<module id="18" name="vtkRenderer" start_time="2010-02-19 11:01:14" end_time="2010-02-19 11:01:14"/>
...
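
A log of this shape is easy to query. As a small sketch, the snippet below computes per-module run times from three of the entries above; the enclosing `<log>` element and the `durations` helper are assumptions added so the fragment parses as a single XML document.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Three entries copied from the slide; the <log> wrapper is an assumption.
LOG = """<log>
  <module id="12" name="vtkDataSetReader" start_time="2010-02-19 11:01:05" end_time="2010-02-19 11:01:07">
    <annotation key="hash" value="c54bea63cb7d912a43ce"/>
  </module>
  <module id="13" name="vtkContourFilter" start_time="2010-02-19 11:01:07" end_time="2010-02-19 11:01:08"/>
  <module id="15" name="vtkDataSetMapper" start_time="2010-02-19 11:01:09" end_time="2010-02-19 11:01:12"/>
</log>"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def durations(xml_text):
    """Map each module name to its run time in seconds."""
    result = {}
    for module in ET.fromstring(xml_text).iter("module"):
        start = datetime.strptime(module.get("start_time"), TIME_FORMAT)
        end = datetime.strptime(module.get("end_time"), TIME_FORMAT)
        result[module.get("name")] = (end - start).total_seconds()
    return result


times = durations(LOG)
print(times)
```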

slide-23
SLIDE 23

Provenance: Beyond Reproducibility

  • Support reflective reasoning
  • Compare data products
  • Explore parameter spaces and compare results
  • Suggest new directions

slide-25
SLIDE 25
Reflective Reasoning

  • Data analysis and visualization are iterative processes
  • In exploratory tasks, change is the norm!

[Figure: visualization model relating Knowledge, Data, Data Products, Specification, Computation, Perception & Cognition, and Exploration. Modified from Van Wijk, Vis 2005]

“Reflective thought requires the ability to store temporary results, to make inferences from stored knowledge, and to follow chains of reasoning backward and forward, sometimes backtracking when a promising line of thought proves to be unfruitful. The process takes time.” – Donald A. Norman
slide-27
SLIDE 27
Exploring and Comparing Data & Results

  • Workflow Differences

[Figure: two workflows and their difference]

  • First workflow: vtkActor, VTKCell, vtkRenderer, vtkContourFilter, vtkStructuredPointsReader, vtkDataSetMapper, vtkCamera
  • Second workflow: vtkVolumeTextureMapper3D, vtkStructuredPointsReader, vtkColorTransferFunction, vtkPiecewiseFunction, VTKCell, vtkVolumeProperty, vtkCamera, vtkRenderer, vtkVolume
  • Difference view: vtkActor, VTKCell, vtkRenderer, vtkContourFilter, vtkStructuredPointsReader, vtkDataSetMapper, vtkCamera, vtkVolumeTextureMapper3D, vtkColorTransferFunction, vtkPiecewiseFunction, vtkVolumeProperty, vtkVolume
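
A rough feel for the difference view comes from comparing the two module lists as sets. This is a deliberate simplification: the `diff_workflows` function is hypothetical and matches modules by name only, whereas a real workflow diff would also compare connections and parameters.

```python
def diff_workflows(a, b):
    """Split module names into those shared, only in a, and only in b."""
    a, b = set(a), set(b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


isosurface = ["vtkActor", "VTKCell", "vtkRenderer", "vtkContourFilter",
              "vtkStructuredPointsReader", "vtkDataSetMapper", "vtkCamera"]
volume_rendering = ["vtkVolumeTextureMapper3D", "vtkStructuredPointsReader",
                    "vtkColorTransferFunction", "vtkPiecewiseFunction", "VTKCell",
                    "vtkVolumeProperty", "vtkCamera", "vtkRenderer", "vtkVolume"]

result = diff_workflows(isosurface, volume_rendering)
for key in ("shared", "only_a", "only_b"):
    print(key, sorted(result[key]))
```

The union of the three groups has twelve modules, which matches the module list in the difference view on the slide.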

slide-28
SLIDE 28
Exploring and Comparing Data & Results

  • Parameter Exploration
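
Parameter exploration amounts to running one workflow over a grid of parameter values and keeping the results side by side. The sketch below shows the idea with a Cartesian product; `explore`, `fake_workflow`, and the parameter names `isovalue` and `opacity` are all hypothetical stand-ins, not VisTrails functions.

```python
from itertools import product


def explore(run, **param_values):
    """Run `run` once for every combination of the given parameter values."""
    names = list(param_values)
    results = {}
    for combo in product(*(param_values[n] for n in names)):
        results[combo] = run(**dict(zip(names, combo)))
    return results


def fake_workflow(isovalue, opacity):
    """Hypothetical stand-in for executing a workflow with two parameters."""
    return f"iso={isovalue},opacity={opacity}"


grid = explore(fake_workflow, isovalue=[57, 67, 77], opacity=[0.5, 1.0])
for combo, output in grid.items():
    print(combo, output)
```

Each key of `grid` identifies one cell of the comparison, which is how the results could be laid out in a spreadsheet-style view.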

slide-29
SLIDE 29

VisComplete

  • Similar to textual completions on the web and in user interfaces
  • Mine provenance collection: Identify fragments that co-occur in a collection of workflows
  • Predict sets of likely workflow additions to a given partial workflow

[Koop et al., 2008]

slide-30
SLIDE 30

VisComplete

[Figure: three suggested completions]

  • VTKCell, vtkRenderer, vtkActor, vtkPolyDataMapper, vtkTubeFilter, vtkStreamTracer, vtkDataSetReader
  • VTKCell, vtkRenderer, vtkActor, vtkDataSetMapper, vtkContourFilter, vtkDataSetReader
  • VTKCell, vtkRenderer, vtkActor, vtkPolyDataMapper, vtkGlyph3D, vtkMaskPoints, vtkDataSetReader
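
The mining idea can be illustrated with a much simpler model than the one VisComplete uses. The sketch below only counts which module directly follows which in a collection of linear pipelines; the published technique mines graph fragments, so treat this as an analogy. The pipelines reuse the module names from the slide, read from reader to cell, and that direction is an assumption.

```python
from collections import Counter, defaultdict

pipelines = [
    ["vtkDataSetReader", "vtkStreamTracer", "vtkTubeFilter", "vtkPolyDataMapper",
     "vtkActor", "vtkRenderer", "VTKCell"],
    ["vtkDataSetReader", "vtkContourFilter", "vtkDataSetMapper",
     "vtkActor", "vtkRenderer", "VTKCell"],
    ["vtkDataSetReader", "vtkMaskPoints", "vtkGlyph3D", "vtkPolyDataMapper",
     "vtkActor", "vtkRenderer", "VTKCell"],
]


def mine_successors(collection):
    """Count, for each module, which modules directly follow it."""
    counts = defaultdict(Counter)
    for pipeline in collection:
        for upstream, downstream in zip(pipeline, pipeline[1:]):
            counts[upstream][downstream] += 1
    return counts


def suggest(counts, module, k=3):
    """Return up to k most frequent successors of `module`."""
    return [name for name, _ in counts[module].most_common(k)]


counts = mine_successors(pipelines)
print(suggest(counts, "vtkDataSetReader"))
print(suggest(counts, "vtkActor"))
```

After a reader, three different continuations are equally likely, while after an actor the collection always continues with a renderer; a completion tool would rank suggestions by such frequencies.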

slide-31
SLIDE 31

Sharing and Collaboration

  • Packaging: maintain vistrail file/database that contains all workflow versions, packages used, user/date/time stamps, mashups
  • Multiple users can work on the same vistrail
  • Working on allowing users to more easily include code and data
  • Stronger links from provenance to actual data
  • Workflow Mashups: simplify interaction in intuitive interfaces
  • crowdLabs: a social web site for sharing workflows and provenance
  • www.crowdlabs.org
  • Upload workflows from VisTrails
  • Run workflows from a web browser
  • Explore parameterizations from a web browser using mashups
slide-32
SLIDE 32

Support multiple users

  • Provenance allows others to see what you have done, how you computed it, and build from that
  • Distributed like modern version control systems (e.g. git)

[Figure: version trees for User 1, User 2, and User 3]

[Ellkvist et al., 2008]

slide-33
SLIDE 33

Linking Provenance and Data

  • Filenames are often the mode of identification in data exploration
  • We might also use URIs or access curated data stores
  • Can this always be expected for exploratory tasks?
  • What happens if offline?
  • Solution:
      • Managed store for data associated with computations
      • Improved data identification
      • Automatic versioning

<workflow_exec id="1">
  <m_exec id="5" name="vtkStructuredDataReader" package="edu.utah.sci.vistrails.vtk" version="5.6.0">
    <param id="2" name="SetFile" value="/MyData/05-12-sc2.dat"/>
  </m_exec>
  <m_exec id="6" name="vtkContourFilter" package="edu.utah.sci.vistrails.vtk" version="5.6.0">
    <param id="3" name="SetValue" value="[1, 57]"/>
    <param id="4" name="ComputeScalarsOn" value="True"/>
  </m_exec>
  ...
  <m_exec id="11" name="FileSink" package="edu.utah.sci.vistrails.basic" version="1.5">
    <param id="15" name="path" value="/home/a/results/23.out"/>
  </m_exec>

slide-35
SLIDE 35

Linking Provenance and Data

[Same bullets and execution log as on the first Linking Provenance and Data slide, with two warnings overlaid on the log]

! FILE NOT FOUND
! FILE NOT FOUND

slide-36
SLIDE 36

Full Data Provenance

[Figure: newfilename.dat → HASH CONTENTS → QUERY FILE STORE → OBTAIN FILE REFERENCE (12ab3-45ef2...) → QUERY PROVENANCE → OBTAIN INPUT REFS (0ab678cd...) → QUERY FILE STORE → OBTAIN INPUT FILES → input files]

[Koop et al., 2010]
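
The "hash contents, then query the file store" steps suggest a content-addressed store, where a file is identified by a digest of its bytes rather than by its name. The sketch below shows that idea under stated assumptions: the `FileStore` class, its `put` and `get` methods, and the use of SHA-1 are choices made for this example, not details taken from the talk.

```python
import hashlib
import os
import shutil
import tempfile


class FileStore:
    """Hypothetical managed store that keeps files under the hash of their contents."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, path):
        """Copy a file into the store and return its content hash."""
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        target = os.path.join(self.root, digest)
        if not os.path.exists(target):
            shutil.copyfile(path, target)
        return digest

    def get(self, digest):
        """Return the stored path for a content hash."""
        return os.path.join(self.root, digest)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "newfilename.dat")
    with open(original, "wb") as f:
        f.write(b"example data")

    store = FileStore(os.path.join(tmp, "store"))
    ref = store.put(original)

    # The original may be renamed or deleted; the reference still resolves.
    os.remove(original)
    with open(store.get(ref), "rb") as f:
        recovered = f.read()

print(ref, recovered)
```

Recording `ref` in the provenance log instead of a path means a renamed or moved file can still be matched to the computation that produced or consumed it.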

slide-37
SLIDE 37

Workflow Mashups

[Figure: mashups on Mobile, Web, and Desktop]

[Santos et al., 2009]

slide-38
SLIDE 38

crowdLabs

slide-43
SLIDE 43

Adding Provenance to 3rd-Party Tools

  • Autodesk Maya
  • ParaView
  • VisIt
  • ImageVis3D

[Callahan et al., 2008]

slide-44
SLIDE 44

Provenance SDK

  • Enable existing and new applications to incorporate provenance

[Figure: version tree with labels Volume Rendering, Create Reader, Apply New Colors, Clip With Error, Slice Only, Color Change, vtkSMRepresentationProxy, Isosurface, Multiple Isosurfaces]

[VisTrails, Inc.]

slide-45
SLIDE 45

Conclusions and Future Work

  • Provenance is important for computational science not only for archiving but also for enabling better and more efficient work
  • We need the ability to share work and make it more accessible
  • Scalability

slide-46
SLIDE 46

Acknowledgements

  • Juliana Freire and Cláudio T. Silva direct the VisTrails and crowdLabs projects
  • Many students and staff have contributed to these projects
  • Matthias Troyer and his group (ALPS project)
  • Other VisTrails users and collaborators
  • Funding sources:

slide-47
SLIDE 47

Questions?