a graph model for data and workflow provenance
Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan van den Bussche, & Stijn Vansummeren. TaPP 2010.


  1. A graph model for data and workflow provenance • Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan van den Bussche, & Stijn Vansummeren • TaPP 2010

  2. Provenance in ... • Databases: mainly for the (nested) relational model; where-provenance ("source location"); lineage, why ("witnesses"); how/semiring model; relatively formal • Workflows: many different systems; many different models (converging on OPM?); graphs/DAGs; relatively informal

  3. Provenance in ... • Databases: mainly for the (nested) relational model; where-provenance ("source location"); lineage, why ("witnesses"); how/semiring model; relatively formal • ????? • Workflows: many different systems; many different models (converging on OPM?); graphs/DAGs; relatively informal

  4. This talk • Relate database & workflow "styles" • Develop a common graph formalism • Need a common, expressive language that supports many database queries and describes some (simple) workflows

  5. Previous work • Dataflow calculus (DFL), based on nested relational calculus (NRC) • Provenance "run" model by Kwasnikowska & Van den Bussche (DILS 07, IPAW 08) • "Provenance trace" model for NRC (Acar, Ahmed & C. '08) • Open Provenance Model (bipartite graphs) (Moreau et al. 2008-9), used in many WF systems

  6. NRC/DFL background • A very simple, functional language: • basic functions +, *, ... & constants 0, 1, 2, 3, ... • variables x, y, z • pair/record types: (A:e, ..., B:e), π_A(e) • collection (set) types: {e, ...}, e ∪ e, {e | x in e'}, ⋃ e
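The constructs listed on this slide can be made concrete with a small interpreter. The following is only a sketch under stated assumptions: the tagged-tuple encoding, the `evaluate` function, and the `PRIMS` table are hypothetical names chosen for illustration and are not taken from the talk. Records are encoded as sorted tuples of (field, value) pairs so that they can be set elements.

```python
import operator

# Hypothetical table of primitive functions (only + and * here).
PRIMS = {"+": operator.add, "*": operator.mul}

def evaluate(e, env):
    """Evaluate a tagged-tuple expression of an NRC-like fragment."""
    tag = e[0]
    if tag == "const":      # ("const", c)
        return e[1]
    if tag == "var":        # ("var", x)
        return env[e[1]]
    if tag == "prim":       # ("prim", f, e1, e2)
        return PRIMS[e[1]](*(evaluate(a, env) for a in e[2:]))
    if tag == "record":     # ("record", {"A": e, ...})
        return tuple(sorted((k, evaluate(v, env)) for k, v in e[1].items()))
    if tag == "proj":       # ("proj", "A", e)  i.e. π_A(e)
        return dict(evaluate(e[2], env))[e[1]]
    if tag == "set":        # ("set", e1, ..., en)  i.e. {e1, ..., en}
        return frozenset(evaluate(a, env) for a in e[1:])
    if tag == "union":      # ("union", e1, e2)  i.e. e1 ∪ e2
        return evaluate(e[1], env) | evaluate(e[2], env)
    if tag == "for":        # ("for", x, src, body)  i.e. {body | x in src}
        return frozenset(evaluate(e[3], {**env, e[1]: v})
                         for v in evaluate(e[2], env))
    if tag == "flatten":    # ("flatten", e)  i.e. ⋃ e
        return frozenset().union(*evaluate(e[1], env))
    raise ValueError(f"unknown expression tag: {tag!r}")

# {π_A(x) * 2 | x in R} over two records
R = frozenset({(("A", 1), ("B", 2)), (("A", 4), ("B", 5))})
expr = ("for", "x", ("var", "R"),
        ("prim", "*", ("proj", "A", ("var", "x")), ("const", 2)))
result = evaluate(expr, {"R": R})
```

Using `frozenset` keeps set semantics (duplicates collapse), which matches the "collection (set) types" bullet; a bag or list semantics would need a different container.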

  13. An example • Suppose R = {(1,2,3), (4,5,6), (9,8,7)} • sum { x * y | (x,y,z) in R, x < y } = sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)} } = sum {1 * 2, 4 * 5} = sum {2, 20} = 22
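The same calculation can be replayed step by step with Python set comprehensions. This is a sketch in Python rather than DFL; the variable names `selected`, `products`, and `total` are illustrative only.

```python
R = {(1, 2, 3), (4, 5, 6), (9, 8, 7)}

# Step 1: keep only the tuples with x < y.
selected = {(x, y, z) for (x, y, z) in R if x < y}

# Step 2: compute x * y for each remaining tuple (as a set, so equal
# products from different tuples would collapse into one element).
products = {x * y for (x, y, z) in selected}

# Step 3: sum the set of products.
total = sum(products)
```

Here `selected` is {(1,2,3), (4,5,6)}, `products` is {2, 20}, and `total` is 22, matching the slide.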

  14. Another example • In DFL, built-in functions / constants can be whole programs & files, as in the Provenance Challenge 1 workflow: • let WarpParams := {align_warp(img,hdr) | (img,hdr) in Inputs} in let Reslices := {reslice(wp) | wp in WarpParams} in softmean(Reslices)

  17. Goal: Define "provenance graphs" for DFL • let WarpParams := {align_warp(img,hdr) | (img,hdr) in Inputs} in let Reslices := {reslice(wp) | wp in WarpParams} in softmean(Reslices) • http://www.flickr.com/photos/schneertz/679692806/
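To see the shape of this workflow without the real imaging tools, it can be mimicked in Python. The three functions below are hypothetical stand-ins that only build strings; they are not the actual `align_warp`, `reslice`, or `softmean` programs, and the file names are invented for the example.

```python
# Stand-ins for the external programs: each just records what it was applied to.
def align_warp(img, hdr):
    return f"warp({img},{hdr})"

def reslice(wp):
    return f"reslice({wp})"

def softmean(reslices):
    # Sort so the result does not depend on set iteration order.
    return "softmean(" + ",".join(sorted(reslices)) + ")"

def workflow(inputs):
    # let WarpParams := {align_warp(img,hdr) | (img,hdr) in Inputs} in
    warp_params = {align_warp(img, hdr) for (img, hdr) in inputs}
    # let Reslices := {reslice(wp) | wp in WarpParams} in
    reslices = {reslice(wp) for wp in warp_params}
    # softmean(Reslices)
    return softmean(reslices)

result = workflow({("a1.img", "a1.hdr"), ("a2.img", "a2.hdr")})
```

Because the stand-ins return descriptive strings, `result` spells out which inputs flowed through which step, which is the kind of information a provenance graph is meant to capture in a structured way.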

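  18. First step: values • a value node is a constant c, or a copy of a value v (copy edge), or a set node {} with elem edges to values v ... v, or a record node <> with edges A1 ... An to values v ... v [reconstructed from a flattened figure]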

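  19. Example value • [figure: a {} node with two elem edges to <> nodes; each <> node has A and B edges; the leaf values are 1, 2 and 3, so the figure appears to show {<A:1, B:2>, <A:2, B:3>} with the node 2 shared]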
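A value graph of this kind can be represented with a small node class. This is a sketch, assuming the example value is the set {<A:1, B:2>, <A:2, B:3>} with the node for 2 shared between the two records; that reading of the figure, and the names `Node`, `const`, `record`, `make_set`, and `to_value`, are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # nodes are compared by identity, so sharing is visible
class Node:
    label: object                               # a constant, "<>" or "{}"
    edges: list = field(default_factory=list)   # list of (edge_label, target Node)

def const(c):
    return Node(c)

def record(**fields):
    return Node("<>", [(name, node) for name, node in fields.items()])

def make_set(*elems):
    return Node("{}", [("elem", node) for node in elems])

def to_value(node):
    """Read the plain value back out of a value graph."""
    if node.label == "<>":
        return tuple(sorted((name, to_value(t)) for name, t in node.edges))
    if node.label == "{}":
        return frozenset(to_value(t) for _, t in node.edges)
    return node.label

one, two, three = const(1), const(2), const(3)
r1 = record(A=one, B=two)
r2 = record(A=two, B=three)   # reuses the same node for 2
example = make_set(r1, r2)
value = to_value(example)
```

The graph keeps information that the plain value loses: `r1` and `r2` point at the same `two` node, while `to_value` only reports that both records contain the number 2.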

  20. Next step: evaluation nodes ("process") • Constants, primitive functions: a node c; a node f with edges 1 ... n to e ... e • Variables & temporary bindings: a node x; a node let x with a head edge to e and a body edge to e

  21. Pairing • Record building: a <> node with edges A1 ... An to e ... e • Field lookup: a π_A node with an edge to e

  22. Conditionals • an if node with a test edge to e and a then edge to e • or an if node with a test edge to e and an else edge to e • Note: Only taken branch is recorded
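The note that only the taken branch is recorded can be illustrated with a small tracing helper. This is a sketch, not the talk's formal definition: `eval_if`, the dictionary-based node, and the `trace` list are hypothetical, and the branches are passed as zero-argument functions so that the untaken one is never run.

```python
def eval_if(test, then_branch, else_branch, trace):
    """Evaluate a conditional and append an 'if' node for it to trace.

    test, then_branch and else_branch are zero-argument callables.
    Only the branch that is actually taken is evaluated and recorded.
    """
    test_value = test()
    node = {"op": "if", "test": test_value}
    if test_value:
        result = then_branch()
        node["then"] = result
    else:
        result = else_branch()
        node["else"] = result
    trace.append(node)
    return result

trace = []
# The else branch would raise ZeroDivisionError if it were evaluated.
answer = eval_if(lambda: 1 < 2, lambda: "small", lambda: 1 / 0, trace)
```

After this call the trace holds a single node with a `test` and a `then` entry and no `else` entry, mirroring the two alternative node shapes on the slide.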

  23. Sets: basic operations • Empty set: a node ∅ • Singleton: a node {} with an edge to e • Union: a node ∪ with edges 1 and 2 to e and e

  24. Sets: complex operations • Flattening: a node ⋃ with an edge to e • Iteration: a node for x with a head edge to e and body edges to e ... e
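The iteration node, with one head and several bodies, can be sketched as follows. The helper `eval_for` and its dictionary layout are hypothetical; the assumption illustrated is that the body is recorded once per element of the head, even when several elements produce the same result.

```python
def eval_for(var, head_values, body):
    """Evaluate {body(x) | x in head_values} and return (value, node).

    The node records the head and one body entry per element iterated over.
    """
    node = {"op": f"for {var}", "head": list(head_values), "bodies": []}
    result = set()
    for v in node["head"]:
        out = body(v)
        node["bodies"].append({"binding": v, "value": out})
        result.add(out)
    return frozenset(result), node

value, node = eval_for("x", [1, 2, 3], lambda x: x % 2)
```

Here `value` has only two elements (0 and 1), while `node["bodies"]` has three entries, so the evaluation structure is strictly more detailed than the resulting set.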

  25. Provenance graphs • are graphs with "both value and evaluation structure" • [figure: example provenance graph; labels not recoverable]

  26. A bigger example • [figure: provenance graph of a larger example; labels not recoverable]

  27. Value structure • [figure: the same graph as slide 26]

  28. Value structure • [figure: the same graph with value nodes overlaid; legible labels include 1, 2, T, F, <>, {} and C]

  29. Input values • [figure: the same graph and overlay as slide 28]

  30. Return value • [figure: the same graph and overlay as slide 28]

  31. Expression structure • [figure: the same graph as slide 26]

  32. Expression structure • [figure: the same graph with expression nodes overlaid; legible labels include let R, let S, for x, for y, if, fst, snd, =, +, empty, {}, U, R, s, x and y]

  33. Building provenance graphs • is complicated • Here we'll use high-level "graph rewrite rule" formalism • Mostly because it is nicer to look at than formal version

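  34. [rewrite rules, reconstructed from a flattened figure] • Constants: a node c yields the value c • Primitive functions: a node f whose arguments 1 ... n have values v1 ... vn yields f(v1,...,vn) • let x: the head has value v and the body is e; the figure shows copy edges at x and at body

  35. [rewrite rules, reconstructed from a flattened figure] • Record building: a <> node whose fields A1 ... An have values v1 ... vn yields a record value <> with fields A1 ... An • Field lookup: π_Ai applied to such a record yields vi, linked by a copy edge

  36. [rewrite rules, reconstructed from a flattened figure] • test is True: the if node yields a copy of the then branch e1 • test is False: the if node yields a copy of the else branch e2

  37. [rewrite rules, reconstructed from a flattened figure] • empty? applied to a {} node with elem edges to v ... v yields False • empty? applied to a {} node with no elements yields True

  38. [rewrite rules, reconstructed from a flattened figure] • ∅ yields an empty {} • singleton {} applied to v yields a {} with an elem edge to v • ∪ applied to two sets yields a {} with elem edges to the elements of both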
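Several of these rules mention copy edges (for let, field lookup, and conditionals). A minimal sketch of that idea follows. It assumes that a result produced by copying keeps a pointer back to the value node it was copied from; the class `V` and the helpers `origin`, `project`, and `let` are hypothetical and simplify the rules considerably.

```python
class V:
    """A value node; copy_of is the node it was copied from, if any."""
    def __init__(self, value, copy_of=None):
        self.value = value
        self.copy_of = copy_of

def origin(v):
    """Follow copy edges back to the node where the value first appeared."""
    while v.copy_of is not None:
        v = v.copy_of
    return v

def project(record, field_name):
    """Field lookup: the result is a copy of the selected field's value."""
    src = record[field_name]           # record: dict from field name to V
    return V(src.value, copy_of=src)

def let(head_value, body):
    """let x := head in body: x is a copy of the head value,
    and the overall result is a copy of the body's value."""
    x = V(head_value.value, copy_of=head_value)
    out = body(x)                      # body: function from V to V
    return V(out.value, copy_of=out)

a, b = V(1), V(2)
# let x := a in π_A(<A: x, B: b>)
result = let(a, lambda x: project({"A": x, "B": b}, "A"))
```

Following the copy chain from `result` leads back to `a`, which is roughly the "source location" question that where-provenance asks in the database setting.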

  39. OK, take a deep breath!
