"[An article about computational science in a scientific - - PowerPoint PPT Presentation

an article about computational science in a scientific
SMART_READER_LITE
LIVE PREVIEW

"[An article about computational science in a scientific - - PowerPoint PPT Presentation

P RUNE : A Preserving Run Environment for Reproducible Scientific Computing -Peter Ivie Reproducibility "[An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of


slide-1
SLIDE 1

PRUNE: A Preserving Run Environment for Reproducible Scientific Computing

  • Peter Ivie
slide-2
SLIDE 2

Reproducibility

  • "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions." – Jon Claerbout

slide-3
SLIDE 3

Verify and Extend

  • Don’t re-invent the wheel
  • Stand on the shoulders of giants

slide-4
SLIDE 4

PRUNE features

  • Designed for Big Data
  • Manage storage and compute resources
  • Reproducible workflow specifications
  • Share workflow with others
  • Reshare changes back
  • User defined granularity


slide-5
SLIDE 5

Accepted philosophy

[Diagram: Design → Execute → Observe → Share/Publish → Preserve, with preservation deferred until later]

  • Libraries
  • Hardware
  • Network
  • System Administrators
  • Remote Collaborators
  • Graduated Students

slide-6
SLIDE 6

Proposed philosophy

[Diagram: a preserve-first cycle — Design, Preserve, Execute, Observe, Share/Publish, with an Unpreserve step available when needed]

slide-7
SLIDE 7

Differences

  • Git: User decides when to preserve

slide-8
SLIDE 8

Differences

  • Git: User decides when to preserve
  • Preserve ALL specification changes

slide-9
SLIDE 9

Differences

  • Git: User decides when to preserve
  • Preserve ALL specification changes
  • Git: Code commits separate from code execution

slide-10
SLIDE 10

Differences

  • Git: User decides when to preserve
  • Preserve ALL specification changes
  • Git: Code commits separate from code execution
  • System manages ALL computation

slide-11
SLIDE 11

Differences

  • Git: User decides when to preserve
  • Preserve ALL specification changes
  • Git: Code commits separate from code execution
  • System manages ALL computation
  • Remove unneeded items later on

slide-12
SLIDE 12

What to Preserve

[Diagram: a Prune Task — an Environment (Virtual Machine / Container) wrapping Hardware, Kernel, Operating System, and Software — around the following task record:]

  Command: ‘do < in.txt in.dat > out.txt o2.txt’
  arguments: [ file_id1, file_id2 ]
  parameters: [ ‘in.txt’, ‘in.dat’ ]
  returns: [ ‘out.txt’, ‘o2.txt’ ]
  results: [ file_id3, file_id4 ]
  environment: envi_id1
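The task record above is pure data, which is what makes it preservable and shareable. A minimal sketch of the idea (not PRUNE's actual internals; the field names follow the slide, but the hashing scheme here is an illustrative assumption):

```python
# Sketch: bundle everything a task needs into a plain record, then
# derive a stable identifier from that record. Field names follow the
# slide; the SHA-1-over-JSON scheme is an assumption for illustration.
import hashlib
import json

task = {
    "environment": "envi_id1",                 # VM/container to run in
    "cmd": "do < in.txt in.dat > out.txt o2.txt",
    "arguments": ["file_id1", "file_id2"],     # content ids of the inputs
    "parameters": ["in.txt", "in.dat"],        # names the command sees
    "returns": ["out.txt", "o2.txt"],          # files the command produces
}

# Because the record is pure data, a deterministic hash of it can
# identify the computation itself, independent of where or when it runs.
task_id = hashlib.sha1(
    json.dumps(task, sort_keys=True).encode()
).hexdigest()
print(task_id)
```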

slide-13
SLIDE 13

Overview

User space (Workflow Version #2):

  E1 = envi_add( type=‘EC2’, image=‘hep.beta’ )
  E2 = envi_add( type=‘EC2’, image=‘hep.stable’ )
  F1 = file_add( filename=‘./observed.dat’ )
  T4 = task_add( cmd=‘simulate > output’, returns=[‘output’], environment=E1 )
  T5 = task_add( args=[ F1 ], ... )   (remaining arguments the same as above)
  T6 = task_add( args=[ T4[0] ], params=[‘input_data’], cmd=‘analyze < in_data > out_data’, returns=[‘out_data’], environment=E2 )
  T7 = task_add( cmd=‘plot in1 in2 out1 out2’, args=[ T5[0], T6[0] ], params=[‘in1’,‘in2’], returns=[‘out1’,‘out2’], environment=E2 )
  export( [ T7[1] ], filename=‘./plot.jpg’ )

[Diagram: the user interface feeds PRUNE space, where files F1–F9 and tasks T1–T7 (running on compute resources in environments E1 and E2) form a Simulate → Analyze → Plot pipeline]

slide-14
SLIDE 14

Sample code: Merge sort

  #!/usr/bin/env python
  from prune import client
  prune = client.Connect()  # Use SQLite3

  ###### Import sources stage ######
  E1 = prune.env_add( type='EC2', image='ami-b06a98d8' )
  D1, D2 = prune.file_add( 'nouns.txt', 'verbs.txt' )

slide-15
SLIDE 15

Sample code: Merge sort

  ###### Sort stage ######
  D3, = prune.task_add( returns=['output.txt'], env=E1,
                        cmd='sort input.txt > output.txt',
                        args=[D1], params=['input.txt'] )
  D4, = prune.task_add( returns=['output.txt'], env=E1,
                        cmd='sort input.txt > output.txt',
                        args=[D2], params=['input.txt'] )

  ###### Merge stage ######
  D5, = prune.task_add( returns=['merged_out.txt'], env=E1,
                        cmd='sort -m input*.txt > merged_out.txt',
                        args=[D3,D4], params=['input1.txt','input2.txt'] )
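To see what these three tasks compute, here is a plain-Python equivalent run locally. The word lists are made-up stand-ins for nouns.txt and verbs.txt, and `heapq.merge` plays the role of `sort -m` (merging already-sorted inputs without re-sorting):

```python
# Local sketch of what the three PRUNE tasks compute:
# sort each input independently, then merge the sorted streams.
import heapq

nouns = ["tree", "apple", "river"]   # stand-in for nouns.txt
verbs = ["run", "jump", "climb"]     # stand-in for verbs.txt

d3 = sorted(nouns)                   # task D3: sort input.txt > output.txt
d4 = sorted(verbs)                   # task D4: sort input.txt > output.txt
d5 = list(heapq.merge(d3, d4))       # task D5: sort -m input*.txt

print(d5)  # the merged output is globally sorted
```

The independence of the two sort tasks is what lets PRUNE schedule them in parallel on separate workers.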

slide-16
SLIDE 16

Prune Task

[Diagram: a Prune Task — an Environment (Virtual Machine / Container) wrapping Hardware, Kernel, Operating System, and Software — around the following task record:]

  Command: ‘do < in.txt in.dat > out.txt o2.txt’
  arguments: [ file_id1, file_id2 ]
  parameters: [ ‘in.txt’, ‘in.dat’ ]
  returns: [ ‘out.txt’, ‘o2.txt’ ]
  results: [ file_id3, file_id4 ]
  environment: envi_id1

slide-17
SLIDE 17

Sample code: Merge sort

  ###### Sort stage ######
  D3, = prune.task_add( returns=['output.txt'], env=E1,
                        cmd='sort input.txt > output.txt',
                        args=[D1], params=['input.txt'] )
  D4, = prune.task_add( returns=['output.txt'], env=E1,
                        cmd='sort input.txt > output.txt',
                        args=[D2], params=['input.txt'] )

  ###### Merge stage ######
  D5, = prune.task_add( returns=['merged_out.txt'], env=E1,
                        cmd='sort -m input*.txt > merged_out.txt',
                        args=[D3,D4], params=['input1.txt','input2.txt'] )

slide-18
SLIDE 18

Sample code: Merge sort

  ###### Execute the workflow ######
  prune.execute( worker_type='local', cores=8 )
  #prune.execute( worker_type='wq', name='myapp' )

  ###### Export ######
  prune.export( D5, 'merged.txt' )  # Final data
  prune.export( D5, 'wf.prune', lineage=2 )

slide-19
SLIDE 19

Sample code: Merge sort

  ###### Execute the workflow ######
  prune.execute( worker_type='local', cores=8 )
  #prune.execute( worker_type='wq', name='myapp' )

  ###### Export ######
  prune.export( D5, 'merged.txt' )  # Final data
  prune.export( D5, 'wf.prune', lineage=2 )

slide-20
SLIDE 20

Sharable workflow description file

  {"body": {"args": ["f908ff689b9e57f0055875d927d191ccd2d6deef:0", "319418e43783a78e3cb7e219f9a1211cba4b3b31:0"], "cmd": "sort -m input*.txt > merged_output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input1.txt", "input2.txt"], "precise": true, "returns": ["merged_output.txt"], "types": []}, "cbid": "e82855394e9dcdee03ed8a25c96c79245fd0481a", "size": 322, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.7171359}
  {"body": {"args": ["29ae0a576ab660cb17bf9b14729c7b464fa98cca"], "cmd": "sort input.txt > output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input.txt"], "precise": true, "returns": ["output.txt"], "types": []}, "cbid": "f908ff689b9e57f0055875d927d191ccd2d6deef", "size": 241, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.484422}
  {"body": {"args": ["48044131b31906e6c917d857ddd1539278c455cf"], "cmd": "sort input.txt > output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input.txt"], "precise": true, "returns": ["output.txt"], "types": []}, "cbid": "319418e43783a78e3cb7e219f9a1211cba4b3b31", "size": 241, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.6183109}
  {"cbid": "29ae0a576ab660cb17bf9b14729c7b464fa98cca", "size": 144, "type": "file", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.2482941}
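Since each line of the file is a self-contained JSON object, a collaborator can read a workflow back with line-by-line parsing and relink the calls by their content ids. A sketch with shortened, made-up `cbid` values (the real ones are hex digests like those above; `"bbb:0"` denotes output 0 of call `bbb`, as in the `args` fields shown):

```python
# Sketch: parse a JSON-lines workflow description and recover which
# call depends on which. The cbid values here are made up for brevity.
import json

lines = [
    '{"cbid": "aaa", "type": "file", "size": 144}',
    '{"cbid": "bbb", "type": "call",'
    ' "body": {"args": ["aaa"], "cmd": "sort input.txt > output.txt"}}',
    '{"cbid": "ccc", "type": "call",'
    ' "body": {"args": ["bbb:0"], "cmd": "sort -m input*.txt > merged_output.txt"}}',
]

records = {r["cbid"]: r for r in (json.loads(line) for line in lines)}

def deps(cbid):
    """Content ids this record depends on ("bbb:0" = output 0 of call bbb)."""
    body = records[cbid].get("body", {})
    return [arg.split(":")[0] for arg in body.get("args", [])]

print(deps("ccc"))  # the merge call depends on the sort call
```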

slide-21
SLIDE 21

Workflow evolution (US Censuses)

[Diagram: US Census files for 1850–1940 flowing through a seven-stage workflow]

  Stage 1: Uncompress (year+fragment)
  Stage 2: Normalize (year+fragment)
  Stage 3: Split by key (year+fragment+key)
  Stage 4: Join fragments (year+key)
  Stage 5: Pair by year (year1+year2+key)
  Stage 6: Group matches (year1+year2+key)
  Stage 7: Filter 1-1 matches (year1+year2+key)
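Stages 5 through 7 can be illustrated with toy records. The key function and data below are illustrative assumptions, not the workflow's real matching criteria:

```python
# Sketch of stages 5-7: pair people across two census years by a
# shared key, then keep only the unambiguous 1-to-1 matches.
from collections import defaultdict

# (key, person_id) records after stages 1-4; keys and ids are made up
census = {
    1850: [("smith-john", "p1"), ("doe-jane", "p2"), ("doe-jane", "p3")],
    1860: [("smith-john", "p4"), ("doe-jane", "p5")],
}

# Stage 5: pair by year - collect candidate pairs sharing a key
pairs = defaultdict(list)
for key, pid in census[1850]:
    for key2, pid2 in census[1860]:
        if key == key2:
            pairs[key].append((pid, pid2))

# Stages 6-7: group matches per key, keep only 1-1 matches
matches = {k: v[0] for k, v in pairs.items() if len(v) == 1}
print(matches)  # {'smith-john': ('p1', 'p4')}
```

Splitting by key first (stage 3) is what keeps this pairing tractable: candidates only need to be compared within a key bucket, not across the whole census.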

slide-22
SLIDE 22

Redefine filter criteria

[Diagram: the same seven-stage census workflow, showing which stages re-execute after the filter criteria change]

slide-23
SLIDE 23

Redefine match criteria

[Diagram: the same seven-stage census workflow, showing which stages re-execute after the match criteria change]

slide-24
SLIDE 24

New key function chosen

[Diagram: the same seven-stage census workflow, showing which stages re-execute after a new key function is chosen]

slide-25
SLIDE 25

Re-normalize

[Diagram: the same seven-stage census workflow, showing which stages re-execute after re-normalization]

slide-26
SLIDE 26

New input data

[Diagram: the same seven-stage census workflow, showing which stages re-execute when new input data is added]

slide-27
SLIDE 27

Derivation History = Cacheable Results

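A sketch of why this works (not PRUNE's implementation): when a task's identity is a deterministic hash of its command, inputs, and environment, the preserved derivation history doubles as a result cache, so an unchanged task is never executed twice:

```python
# Sketch: content-addressed task ids turn derivation history into a cache.
import hashlib
import json

store = {}   # task_id -> result; stands in for the preserved store
runs = []    # records which tasks actually executed

def run_task(cmd, args, env):
    task_id = hashlib.sha1(
        json.dumps([cmd, args, env], sort_keys=True).encode()
    ).hexdigest()
    if task_id not in store:
        runs.append(task_id)                  # cache miss: really execute
        store[task_id] = f"result-of({cmd})"  # placeholder for real output
    return store[task_id]

run_task("sort input.txt", ["f1"], "E1")  # run #1: executes
run_task("sort input.txt", ["f1"], "E1")  # run #2: served from history
print(len(runs))  # only one real execution
```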

slide-28
SLIDE 28

Execution time cut in half for run #2


slide-29
SLIDE 29

Quotas


slide-30
SLIDE 30

Scalability

  • ~12,000 parallel cores
  • ~3 million tasks
  • Overhead: ~1% above native wall clock

slide-31
SLIDE 31

Sharing workflow between users

  {"body": {"args": ["f908ff689b9e57f0055875d927d191ccd2d6deef:0", "319418e43783a78e3cb7e219f9a1211cba4b3b31:0"], "cmd": "sort -m input*.txt > merged_output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input1.txt", "input2.txt"], "precise": true, "returns": ["merged_output.txt"], "types": []}, "cbid": "e82855394e9dcdee03ed8a25c96c79245fd0481a", "size": 322, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.7171359}
  {"body": {"args": ["29ae0a576ab660cb17bf9b14729c7b464fa98cca"], "cmd": "sort input.txt > output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input.txt"], "precise": true, "returns": ["output.txt"], "types": []}, "cbid": "f908ff689b9e57f0055875d927d191ccd2d6deef", "size": 241, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.484422}
  {"body": {"args": ["48044131b31906e6c917d857ddd1539278c455cf"], "cmd": "sort input.txt > output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input.txt"], "precise": true, "returns": ["output.txt"], "types": []}, "cbid": "319418e43783a78e3cb7e219f9a1211cba4b3b31", "size": 241, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.6183109}
  {"cbid": "29ae0a576ab660cb17bf9b14729c7b464fa98cca", "size": 144, "type": "file", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.2482941}

slide-32
SLIDE 32

http://ccl.cse.nd.edu/research/papers/

  • Sample workflows:
    – Merge sort
    – Pairwise comparisons (US Censuses)
    – High-energy Physics
  • http://ccl.cse.nd.edu/software/prune/prune.html

Thank You!

For more information: pivie@nd.edu