TRAPPIST: A toolkit for comparative analysis and visualization of genomic regions

Geraldine A. Van der Auwera, PhD
https://github.com/gglobster/trappist


  1. TRAPPIST: A toolkit for comparative analysis and visualization of genomic regions. Geraldine A. Van der Auwera, PhD. https://github.com/gglobster/trappist

  2. TRAPPIST: A toolkit [overlay: full-featured application] for comparative analysis and visualization of genomic regions. Geraldine A. Van der Auwera, PhD; Shankar Ambady. https://github.com/gglobster/trappist

  3. The source code of Life: evolution = 4 bn years of forking without version tracking … and you thought legacy Fortran code was a pain

  4. Two distinct issues
      - Getting the code from the repository (living beings): extraction, sequencing, assembly
      - Reverse-engineering the code (zero documentation!): experimentation, mutagenesis + (comparative) sequence analysis

  5. Top issue for “getting”: NGS is outscaling Moore’s Law (Lincoln Stein, via C. Titus Brown @ PyCon 2011)

  6. Top issue for “getting”: NGS is outscaling Moore’s Law [overlay: WHATEVER] (Lincoln Stein, via C. Titus Brown @ PyCon 2011)

  7. Evolving process of rev-eng
      - No genomes → entirely experimental: make random mutants, trace back effect to gene of interest
      - One genome → some predictive filtering: design mutants, long iterative process
      - Many related genomes → much better predictive filtering: nature’s mutants, drastically reduced iterative process

  8. Nature’s mutants (example) [figure]
      - Region labels: Anthrax PAI, rep2, repX, tra1, tra2, tra3
      - Sequence labels: pXO1, pBCXO1, p03BB102_179, pAH820_272, pAH187_270, NZ_ACMR0, NZ_ACMH0, IS075, pBc10987, NZ_ACMC0, NZ_ACMS0, NZ_ACMT0, NZ_ACNI0, VD022, Schrouff, NZ_ACMO0, TIAC129, NZ_ACNJ0, NZ_ABDM0, NZ_ACNB0, NZ_ACLY0, NZ_ACMP0, NZ_ACNK0, pBc239, NZ_ACNE0, NZ_ABDA0, NZ_ACLV0, NZ_ACLT0, NZ_ACNF0, NZ_ACNA0

  9. Typical analysis process [diagram, including BLAST]: all done through separate GUIs → poor batching, no automation, no chaining

  10. Programmatic access: the servers can be accessed with scripts*, and there are awesome libraries that provide wrappers, data structures, etc. But here’s the rub… (* there are a few GUI pipeline apps, but usability is an issue </diplomatic>)

  11. “What’s a command line?” Exhibit A: Experimental Biologist

  12-14. TRAPPIST: Totally Rad Analysis Pipelines Python Super Tool

  15. Example pipeline / workflow (my research) [diagram]
      - Stages: Data Input, Core Analysis, [Optional Analyses]
      - Component labels: CONTIG FISHER, HOST HK_SET EXTRACTOR, CONSERVATION SORTER, STRUCTURE COMPARATOR, PHYLOGENY CONSTRUCT, SEQUENCE VARIATION, CONTENT FUNCTIONS, PHYLOGENY CONGRUENCE
      - Data labels: genome DBs, list of genomes, list of targets, reference plasmids + constructs OR list of reference genes, segment sets, H_K sets, host trees, plasmid trees

  16. Do it manually? Exhibit B: Lazy Postdoc (me)

  17. Long story short: collection of one-time scripts → toolkit → full-featured application → DNA-based OS?

  18. Fundamental requirements
      - Design, assembly and modification of pipelines / workflows
      - Automated execution, parameter / output versioning, provenance data bundling, interactive visualization

  19. Staging / Execution

  20. Staging system
      - Basic requirements: intuitive, flexible, extensible
      - + Pre-assembled workflows / pipelines
      - + Hooks for external / roll-your-own functions

  21. Staging area - workflows

  22. Workflow components
      - TRAPPIST provides discrete task components for every step of analysis:
        - Initial inputs selection
        - Data processing steps (existing algorithms)
        - Graphical output
      - Component I/O relies on matching ports with data object classes → forces validation of data type/format (not up to user)
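
The port-matching idea on this slide can be sketched roughly as follows. This is a minimal illustration, not TRAPPIST's actual code: `SequenceSet`, `Tree`, `Port`, `connect` and the port names are all hypothetical, and the check shown (the output's data class must be the input's class or a subclass of it) is one plausible way to force type validation at connection time.

```python
# Hypothetical sketch: none of these names come from the TRAPPIST code base.

class SequenceSet:
    """Stand-in data object class: a collection of sequences."""
    def __init__(self, records):
        self.records = list(records)


class Tree:
    """Stand-in data object class: a phylogenetic tree as a Newick string."""
    def __init__(self, newick):
        self.newick = newick


class Port:
    """A component input or output that carries one data object class."""
    def __init__(self, name, data_class):
        self.name = name
        self.data_class = data_class


def connect(output_port, input_port):
    """Return a (source, target) link, refusing mismatched data classes."""
    if not issubclass(output_port.data_class, input_port.data_class):
        raise TypeError(
            f"cannot connect {output_port.name} "
            f"({output_port.data_class.__name__}) to {input_port.name} "
            f"({input_port.data_class.__name__})"
        )
    return (output_port, input_port)


segments_out = Port("extractor.segments", SequenceSet)
segments_in = Port("comparator.segments", SequenceSet)
tree_in = Port("congruence.host_tree", Tree)

link = connect(segments_out, segments_in)  # accepted: classes match

try:
    connect(segments_out, tree_in)  # rejected: SequenceSet is not a Tree
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

Because the check runs when two ports are wired together, a mismatched workflow fails at design time rather than halfway through execution.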

  23. Component representation

  24. Interacting with components

  25-27. Connecting components

  28. Execution system
      - Progressive / modular / dependency-aware
      - Parameter set versioning linked to output versioning → users more likely to try various parameters to test assumptions
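
One way to link parameter set versions to output versions is to derive the output name from a hash of the parameters. The sketch below assumes that approach; the slides do not say how TRAPPIST does it, and `parameter_version`, `output_name` and the parameter names (`evalue`, `min_identity`) are hypothetical.

```python
import hashlib
import json


def parameter_version(params):
    """Derive a short, stable version tag from a parameter set."""
    # Sorting keys makes the tag independent of dictionary ordering.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def output_name(component, params):
    """Name an output after the component and the parameter version."""
    return f"{component}_{parameter_version(params)}.out"


# Illustrative parameter sets (names are assumptions, not TRAPPIST settings).
run_a = {"evalue": 1e-5, "min_identity": 0.80}
run_b = {"min_identity": 0.80, "evalue": 1e-5}   # same values, other order
run_c = {"evalue": 1e-10, "min_identity": 0.80}  # one value changed

same_output = output_name("blast", run_a) == output_name("blast", run_b)
new_output = output_name("blast", run_a) != output_name("blast", run_c)
```

With this scheme, trying a new parameter value never overwrites an earlier result, which is what makes experimenting with parameters cheap.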

  29. Flow control rule
      - Component ports have “fill” status
      - If all its inputs are filled, component is OK to execute → add component to execution queue (“I’m OK to go!”)
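
The flow control rule above might look something like this in Python. It is a sketch under assumptions: `Component`, `fill` and `execution_queue` are hypothetical names, and an unfilled port is represented simply as `None`.

```python
from collections import deque


class Component:
    """A task whose input ports each have a fill status (None = unfilled)."""
    def __init__(self, name, input_names):
        self.name = name
        self.inputs = {port: None for port in input_names}
        self.queued = False

    def ready(self):
        """A component is OK to execute once every input port is filled."""
        return all(value is not None for value in self.inputs.values())


execution_queue = deque()


def fill(component, port, value):
    """Fill one input port; queue the component once all inputs are filled."""
    component.inputs[port] = value
    if component.ready() and not component.queued:
        component.queued = True
        execution_queue.append(component)


comparator = Component("structure_comparator", ["segments", "reference"])

fill(comparator, "segments", ["seg1", "seg2"])
queued_after_first_fill = len(execution_queue)   # still waiting on "reference"

fill(comparator, "reference", "pXO1")
queued_names = [c.name for c in execution_queue]  # now OK to go
```

The `queued` flag keeps a component from being added twice if one of its ports is refilled later.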

  30. Database architecture [diagram]: a Central DB, plus a Staging DB and an Execution DB for each workflow (Workflow 1, Workflow 2)

  31. Staging DB: DB schema + data dump sufficient to fully describe a workflow
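
To illustrate how a schema plus a data dump can fully describe a workflow, here is a small sketch using Python's built-in `sqlite3` module. The slides do not name the database engine, so SQLite is an assumption, and the `component` and `link` tables are a made-up schema, not TRAPPIST's.

```python
import sqlite3

# Hypothetical staging DB: components and the links between them.
staging = sqlite3.connect(":memory:")
staging.executescript("""
    CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE link (
        source_id INTEGER NOT NULL REFERENCES component(id),
        target_id INTEGER NOT NULL REFERENCES component(id)
    );
    INSERT INTO component (id, name) VALUES (1, 'extractor');
    INSERT INTO component (id, name) VALUES (2, 'comparator');
    INSERT INTO link (source_id, target_id) VALUES (1, 2);
""")
staging.commit()

# iterdump() yields the schema and the data as SQL statements.
dump = "\n".join(staging.iterdump())

# Replaying the dump in a fresh database rebuilds the same workflow.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)

links = restored.execute("""
    SELECT s.name, t.name
    FROM link
    JOIN component AS s ON s.id = link.source_id
    JOIN component AS t ON t.id = link.target_id
""").fetchall()
```

The text dump is what would travel with a shared workflow; anyone replaying it gets the same components and connections.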

  32. Execution DB

  33. Enforcing good practices
      - Provenance bundle including: workflow schema, parameter sets, code version info
      - Executable papers! Reproducibility! Science!
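
A provenance bundle with the three items listed on this slide could be as simple as one JSON document. The sketch below assumes that format; `provenance_bundle`, the field names and the example values are all hypothetical.

```python
import json


def provenance_bundle(workflow_schema, parameter_sets, code_version):
    """Serialize the three provenance items from the slide into one document."""
    bundle = {
        "workflow_schema": workflow_schema,
        "parameter_sets": parameter_sets,
        "code_version": code_version,
    }
    # Sorted keys give a stable text form that is easy to diff and archive.
    return json.dumps(bundle, sort_keys=True, indent=2)


# Illustrative values only; not taken from a real TRAPPIST run.
bundle_text = provenance_bundle(
    workflow_schema={
        "components": ["extractor", "comparator"],
        "links": [["extractor", "comparator"]],
    },
    parameter_sets={"comparator": {"min_identity": 0.8}},
    code_version="0.0.0-example",
)

reloaded = json.loads(bundle_text)
```

Shipping such a file alongside the results is one way to let a reader rerun the analysis exactly as it was performed.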

  34. Interactive visualization

  35. U haz questions?
