Reducing Technical Debt with Reproducible Containers, Tanu Malik

Reducing Technical Debt with Reproducible Containers. Tanu Malik, 2019 BSSw Fellow, Assistant Professor, School of Computing, DePaul University, Chicago, IL. IDEAS-ECP Webinar, November 4th, 2020.


  1. Reducing Technical Debt with Reproducible Containers
     Tanu Malik, 2019 BSSw Fellow, Assistant Professor, School of Computing, DePaul University, Chicago, IL
     IDEAS-ECP Webinar, November 4th, 2020

  2. Who am I
     Tanu Malik, Assistant Professor, School of Computing; Director, Data Systems and Opt. Lab, DePaul University, Chicago, IL
     https://facsrv.cs.depaul.edu/~tmalik1
     Tanu.Malik@depaul.edu
     My expertise is:
     • Databases and distributed computing
     • Data provenance: history and lineage of data and software
     • Computational reproducibility: repeating and recreating someone else's work
     Systems built: http://sciunit.run
     I want to know more about:
     • Reproducibility case studies in HPC and how containers are used
     Problems I'm currently working on:
     • Provenance alignment: using provenance to highlight sources of irreproducibility
     • State maintenance in lineage graphs: making Jupyter Notebooks reproducible

  3. Outline
     PART 1: How does technical debt affect reproducibility?
     PART 2: Do reproducible containers provide a start?
     PART 3: Guidance and summary

  4. PART 1: How does technical debt affect reproducibility?

  5. Monetary debt

  6. Monetary debt meets the objective “sooner”

  7. Technical debt¹ is no different
     ¹ A metaphor introduced by Ward Cunningham in 1992.

  8. Technical debt¹ is no different
     ¹ A metaphor introduced by Ward Cunningham in 1992.

  9. Technical debt is no different.
     [Figure: axes Productivity and Time; labels: Journal deadline, Good scientific software, Technical debt, Poor scientific software]

  10. Dimensions of Technical Debt
     • Poor quality code
     • Poor design
     • Environment debt
     • Documentation debt
     • Testing debt

  11. Consequence of Mismanaged Debt
     REPOSSESSED

  12. Consequence of Mismanaged Debt
     REPOSSESSED
     IRREPRODUCIBLE

  13. Dimensions of Scientific Technical Debt
     • Poor quality code
     • Poor design
     • Environment debt
     • Documentation debt
     • Testing debt
     ¹ E. Tom, A. Aurum, R. Vidgen, An exploration of technical debt, Journal of Systems and Software, Volume 86, Issue 6, 2013, Pages 1498-1516, ISSN 0164-1212, https://doi.org/10.1016/j.jss.2012.12.052.

  14. Dimensions of Scientific Technical Debt
     • Poor quality code
     • Poor design
     ✓ Environment debt
     ✓ Documentation debt
     • Testing debt

  15. https://www.newscientist.com/gallery/software-bugs

  17. https://www.nature.com/articles/d41586-020-01685-y

  18. Cost of Scientific Technical Debt

  19. Supercomputing Artifact Description and Evaluation Initiative
     https://sc20.supercomputing.org/planning-committee/

  20. Lack of artifacts will get a paper rejected
     [Bar chart: Total Number, axis from 0 to 400]
     • Unacceptable AD/AE: 1
     • with VG/E AD/AE (Phase 2): 24
     • Submissions (Phase 2): 43
     • with VG/E AD/AE (Phase 1): 5
     • Per reviewer: 80
     • Submissions (Phase 1): 380

  21. Technical debt incurs burden
     • “Sticks” from reviewers work
     • Authors who have not taken AD/AE process seriously do submit additional work
     • Time consuming task
     • No tools to check if everything relevant for the publication is submitted
     • No mapping of experiments to content in the paper.
     • No infrastructure for efficiently verifying claimed results

     • Reproducibility is an afterthought.
     • Identifying files for an application is a challenge
     • Missing workflows
     • Really, that data/algorithm should be part of the bundle?

  22. PART 2: Do reproducible containers provide a start?

  23. Reproducibility ecosystem
     GitHub, Sharing images via the cloud, Package managers, Zenodo.org, OpenData.gov, Figshare, Docker.com
     C. Boettiger, An introduction to Docker for reproducible research, ACM SIGOPS Operating Systems Review, 2015, dl.acm.org

  24. Docker: Using containers from build to run
     https://www.exascaleproject.org/event/conthpc

  25. Containers provide constrained resource isolation
     Filesystem, Network, CPU, Memory

  26. Authors must program a Dockerfile

  27. Containers do not reduce technical debt
     • Declarative encapsulation of dependencies for isolated execution
     • E.g., various shell utilities and library versions unknown to user
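
The "library versions unknown to user" point on slide 27 can be made concrete by recording what is actually installed when the code runs. The following is a minimal sketch using only the Python standard library; the function name `snapshot_environment` and the layout of the returned dictionary are illustrative assumptions and are not part of Docker or Sciunit.

```python
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Return a dict describing the interpreter, OS, and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that do not report a name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }
```

Calling `snapshot_environment()` next to an experiment and saving the result (for example with `json.dumps`) gives a reviewer the versions that were really used, which a hand-written Dockerfile may not state.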

  28. Automatic Encapsulation of Dependencies: The Sciunit

  29. Key Idea: Identify dependencies during program execution
     • Captures application dependencies during executions
     • Repeats executions (with guarantees) within isolated environments

  30. Sciunit: Audit
     • Audit uses ptrace to observe dependencies and environment variables
     • Identifies binaries, libraries, scripts, and environment variables that the application is dependent on.
     • Dependencies are copied into a directory in the filesystem
     • Inclusion of data files is optional: the user may or may not want to package them, based on the size of the dataset.
     D.H. Ton That, G. Fils, Z. Yuan, T. Malik. Sciunits: Reusable Research Objects. In IEEE eScience Conference (eScience), 374-383, 2017
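
The audit idea on slide 30 (observe which files a run touches, then sort them into binaries, libraries, scripts, and data) can be sketched without ptrace itself. The sketch below is an assumption-laden stand-in: it parses strace-style text lines (strace is built on ptrace) rather than tracing a live process, and the names `audit`, `classify`, and `SAMPLE_TRACE`, the sample paths, and the file-extension rules are all hypothetical. It is not Sciunit's implementation.

```python
import re

# Matches execve/open/openat calls in strace-style output, for example:
#   openat(AT_FDCWD, "/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
SYSCALL_RE = re.compile(
    r'^(?P<call>execve|openat|open)\((?:AT_FDCWD, )?"(?P<path>[^"]+)".*\)'
    r'\s+=\s+(?P<ret>-?\d+)'
)


def classify(call, path):
    """Bucket a path as a binary, library, script, or data file (rough rules)."""
    if call == "execve":
        return "binaries"
    if ".so" in path.rsplit("/", 1)[-1]:
        return "libraries"
    if path.endswith((".py", ".sh", ".R", ".pl")):
        return "scripts"
    return "data"


def audit(trace_lines):
    """Collect the files successfully executed or opened in a trace."""
    deps = {"binaries": set(), "libraries": set(), "scripts": set(), "data": set()}
    for line in trace_lines:
        m = SYSCALL_RE.match(line)
        if m is None or int(m.group("ret")) < 0:
            continue  # not a tracked call, or the call failed
        deps[classify(m.group("call"), m.group("path"))].add(m.group("path"))
    return deps


# Hypothetical trace of one run; paths are made up for illustration.
SAMPLE_TRACE = [
    'execve("/usr/bin/python3", ["python3", "model.py"], 0x7ffd /* 20 vars */) = 0',
    'openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/home/user/model.py", O_RDONLY) = 3',
    'openat(AT_FDCWD, "/home/user/missing.cfg", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'openat(AT_FDCWD, "/home/user/input.csv", O_RDONLY) = 4',
]
```

`audit(SAMPLE_TRACE)` returns one set per category. Keeping data files in their own bucket mirrors the slide's point that packaging data is optional and can be decided by dataset size.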

  31. Audits provenance during execution time
     Utilizing Provenance in Reusable Research Objects. In Special Issue on Using Computational Provenance, MDPI Informatics, Vol 5(1), 2018.
     Light-weight Database Virtualization. In IEEE International Conference on Data Engineering, ICDE, 2015.
     Auditing and Maintaining Provenance in Software Packages. In International Provenance and Annotation Workshop (IPAW), 97-109, 2014

  32. Sciunit: Share as a Zip file or Docker container
     [Figure labels: Sciunit; Containment; Provenance Graph; Computational Artifacts; Sciunit Log; Documentation (from websites); Docker File; Identification of Inputs, Outputs, Processes, Dependencies]
     Documenting Computing Environments for Reproducible Experiments. In Parallel Computing: Technology Trends, 756-765, 2020

  33. Sciunit: Repeat
     • Sciunit uses namespace isolation during repeat
     • Redirection of each call into the package
     Efficient Provenance Alignment in Reproduced Executions. In Theory and Practice of Provenance, 2020.
     ScIInc: A Container Runtime for Incremental Recomputation. In IEEE 15th International Conference on eScience (eScience), 291-300, 2019, doi: 10.1109/eScience.2019.00040.
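
"Redirection of each call into the package" on slide 33 boils down to a path mapping: a file the application asks for is served from the copy made at audit time. The sketch below shows only that mapping logic in user space; it does not perform namespace isolation or system call interception, and the names `redirect` and `demo` and the example paths are hypothetical.

```python
import os
import tempfile


def redirect(path, package_root):
    """Map an absolute path to its copy inside the package, if one exists."""
    candidate = os.path.join(package_root, path.lstrip("/"))
    return candidate if os.path.exists(candidate) else path


def demo():
    with tempfile.TemporaryDirectory() as package_root:
        # Pretend the audit copied /opt/app/config.ini into the package.
        packaged = os.path.join(package_root, "opt/app/config.ini")
        os.makedirs(os.path.dirname(packaged))
        with open(packaged, "w") as f:
            f.write("threads=4\n")

        hit = redirect("/opt/app/config.ini", package_root)
        miss = redirect("/not/in/package.txt", package_root)
        with open(hit) as f:
            content = f.read()
        return hit == packaged, miss, content
```

Falling back to the original path when the package has no copy is a choice made for this sketch; a stricter repeat could refuse the access so that an undeclared dependency is surfaced rather than silently read from the host.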

  34. Sciunit steps and external requirements
     1. Create
     2. Share
     3. Repeat

  35. Network-enabled Sciunit: Audit
     [Diagram: a root Network-enabled Sciunit spawns task 1 and task 2 (step 1), each run in its own Network-enabled Sciunit (steps 2&3), and the results are merged (step 4). Legend: Possible with Network-enabled Sciunit.]
     Note:
     1. Identify remote host & copy Sciunit to it
     2&3. Configure & run task with Sciunit
     4. Retrieve & manually merge

  36. Network-enabled Sciunit: Repeat on single node
     [Diagram: Network-enabled Sciunit; Run application; No connection]
     Note:
     1. Repeat all computations at root node.
     2. Network system calls are supplied through the content data captured during the original audit.
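
Slide 36's second note describes a record-and-replay scheme: replies captured during the audit are fed back during repeat, so no connection is needed. A minimal sketch of that idea at the application level follows. It is an illustration under stated assumptions, not how Sciunit intercepts system calls; `RecordingChannel`, `ReplayChannel`, `application`, and `fake_remote` are hypothetical names.

```python
class RecordingChannel:
    """Audit mode: forward requests to a real fetch function and log the replies."""

    def __init__(self, fetch):
        self._fetch = fetch
        self.log = []  # ordered (request, response) pairs

    def request(self, payload):
        response = self._fetch(payload)
        self.log.append((payload, response))
        return response


class ReplayChannel:
    """Repeat mode: answer from the log; no connection is ever opened."""

    def __init__(self, log):
        self._pending = list(log)

    def request(self, payload):
        if not self._pending:
            raise RuntimeError("replay log exhausted")
        expected, response = self._pending.pop(0)
        if payload != expected:
            raise RuntimeError(f"unexpected request: {payload!r}")
        return response


def application(channel):
    """Toy workload that makes two 'network' calls and combines the replies."""
    first = channel.request(b"GET /a")
    second = channel.request(b"GET /b")
    return first + second


def fake_remote(payload):
    # Stand-in for a remote service; a real audit would use the network here.
    return payload.split(b"/")[-1].upper()
```

Running `application(RecordingChannel(fake_remote))` and then `application(ReplayChannel(recorder.log))` yields the same result. The replay side raises on an unexpected request, which makes divergence between the original and repeated runs visible instead of hiding it.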

  37. Network-enabled Sciunit: Repeat on multiple nodes
     [Diagram: a Network-enabled Sciunit runs the application; two Network-enabled Sciunits with sub-containers run task 1 and task 2]
     Requirements:
     1. Identical number of nodes
     2. Descriptions of new hostnames or IP addresses
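
The two requirements on slide 37 can be checked before a multi-node repeat is attempted. The sketch below assumes the audited hosts are known as a list and the new hostnames or IP addresses are supplied as a dictionary; the function name `plan_multi_node_repeat` and both data shapes are hypothetical and are not Sciunit's interface.

```python
def plan_multi_node_repeat(audited_hosts, host_map):
    """Return the new host for each audited host, or raise if a requirement fails."""
    new_hosts = [host_map.get(host) for host in audited_hosts]

    # Requirement 2: every audited host needs a new hostname or IP address.
    missing = [h for h, n in zip(audited_hosts, new_hosts) if n is None]
    if missing:
        raise ValueError(f"no new hostname or IP given for: {missing}")

    # Requirement 1: the repeat must use the same number of distinct nodes.
    if len(set(new_hosts)) != len(set(audited_hosts)):
        raise ValueError("repeat needs the same number of nodes as the audit")

    return dict(zip(audited_hosts, new_hosts))
```

For example, `plan_multi_node_repeat(["root", "w1"], {"root": "10.0.0.1", "w1": "10.0.0.2"})` returns the mapping unchanged, while mapping both hosts to the same address raises a `ValueError`.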
