Challenge of Reproducible Pipelines, Pjotr Prins, 11th Biohackathon




SLIDE 1

Challenge of Reproducible Pipelines

Pjotr Prins 11th Biohackathon 2018 Matsue, Japan, December 9th,

UMC Utrecht/UTHSC GeneNetwork.org

SLIDE 2

Challenge

Reproducible analysis starts with software

SLIDE 3

Deployment

Software deployment is boring

SLIDE 4

Avoid

Programmers prefer to look away

SLIDE 5

Reproducibility

What about Docker?

  • Docker is a binary blob
  • Creating Docker images is also not reproducible
  • Nor are Debian, Conda, Brew etc. reproducible
  • It is all about fixing dependencies (and bootstrapping)
  • Building on shifting sands

SLIDE 6

GNU Guix

  • Guix gives reproducible software installation
  • Guix is easy
  • Guix has versioning
  • Guix gives real control over the full dependency graph
  • It just works (tm)
  • Guix creates reproducible binaries with dependencies, AND even Docker containers
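
To make "control over the full dependency graph" concrete, here is a minimal Python sketch of the general idea: a package's identity is a hash over its own recipe plus the identities of all its dependencies, so any change anywhere in the graph yields a new identity. The package names, recipe strings, and the `package_id` helper are hypothetical, and this is not Guix's actual hashing scheme, only an illustration of the principle.

```python
import hashlib

# Hypothetical package graph: name -> (recipe text, direct dependency names).
# The recipe strings stand in for "source + build instructions".
PACKAGES = {
    "libc": ("libc-recipe-v1", []),
    "zlib": ("zlib-recipe-v1", ["libc"]),
    "aligner": ("aligner-recipe-v1", ["zlib", "libc"]),
}


def package_id(name, packages, cache=None):
    """Return a hex digest covering the recipe and all transitive dependencies."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    recipe, deps = packages[name]
    h = hashlib.sha256()
    h.update(recipe.encode("utf-8"))
    # Sort dependencies so the result does not depend on listing order.
    for dep in sorted(deps):
        h.update(package_id(dep, packages, cache).encode("ascii"))
    cache[name] = h.hexdigest()
    return cache[name]
```

Under this scheme, editing the hypothetical `libc` recipe changes the identity of `aligner` as well, even though the `aligner` recipe itself is untouched. That is what it means for the whole graph to be pinned.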

SLIDE 7

Confession

I love GNU Guix

SLIDE 8

Goal

Write a pipeline using CWL and Guix and document it

SLIDE 9

Goal

Write a pipeline using CWL and Guix and document it

  • CWL reference runner
  • Software graph is reproducible (from source)
  • Data is content-addressable
  • Metadata: software and data origins/descriptions (Wikidata)
  • See if we can embed it in Shogun - blockchain
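
As an illustration of "data is content-addressable", the following Python sketch stores each blob under the SHA-256 digest of its bytes and verifies the digest on retrieval. The `store` and `fetch` helpers and the flat directory layout are assumptions made for this example, not something prescribed by the talk.

```python
import hashlib
from pathlib import Path


def store(data: bytes, root: Path) -> str:
    """Write data under its SHA-256 digest and return that digest as the address."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest
    # Identical content always maps to the same address, so writing once is enough.
    if not path.exists():
        path.write_bytes(data)
    return digest


def fetch(digest: str, root: Path) -> bytes:
    """Read data by address and check that it still matches its digest."""
    data = (root / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("content does not match its address")
    return data
```

Because the address is derived from the content, a pipeline step that records the digest of its input pins exactly which data it ran on, and any later modification of the file is detected by `fetch`.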

SLIDE 10

ENV

Never go it alone

  • CWL (Michael Crusoe a.o.)
  • Galaxy (Conda support, CWL support, RStudio, Jupyter Labs ...)
  • GeneNetwork.org
  • Wikidata
  • Blockchain scientific credit (Alexander Garcia Castro a.o.)

SLIDE 11

WIP

Wouldn’t it be amazing to have fully reproducible and shareable pipelines?

  • It can be done. We have the technology
  • And I have found software deployment is not boring
  • Full control over the software dependency graph means things get fixed in time, so you can move forward
