TRAPPIST: A toolkit for comparative analysis and visualization of genomic regions Geraldine A. Van der Auwera, PhD � https://github.com/gglobster/trappist

TRAPPIST: d e r u t a e f - l l u f n o A toolkit for comparative analysis and i t a c i l p p a visualization of genomic regions Geraldine A. Van der Auwera, PhD Shankar Ambady � https://github.com/gglobster/trappist

The source code of Life  Evolution = 4 bn years of forking without version tracking … and you thought legacy Fortran code was a pain

Two distinct issues  Getting the code from the  Reverse-engineering the code repository (living beings) (zero documentation!) Extraction, sequencing, assembly Experimentation, mutagenesis + (comparative) sequence analysis

Top issue for “getting” NGS: it is outscaling Moore’s Law (source: Lincoln Stein, via C. Titus Brown @ PyCon 2011)

Top issue for “getting” NGS: it is outscaling Moore’s Law (stamped over: “WHATEVER”) (source: Lincoln Stein, via C. Titus Brown @ PyCon 2011)

Evolving process of rev-eng  No genomes  entirely experimental  Make random mutants, trace back effect to gene of interest  One genome  some predictive filtering  Design mutants, long iterative process  Many related genomes  much better predictive filtering  Nature’s mutants, drastically reduced iterative process

Nature’s mutants (example) [Figure: comparative map of the regions Anthrax PAI, rep2, repX, tra1, tra2 and tra3 across pXO1 and related sequences: pBCXO1, p03BB102_179, pAH820_272, pAH187_270, NZ_ACMR0, NZ_ACMH0, IS075, pBc10987, NZ_ACMC0, NZ_ACMS0, NZ_ACMT0, NZ_ACNI0, VD022, Schrouff, NZ_ACMO0, TIAC129, NZ_ACNJ0, NZ_ABDM0, NZ_ACNB0, NZ_ACLY0, NZ_ACMP0, NZ_ACNK0, pBc239, NZ_ACNE0, NZ_ABDA0, NZ_ACLV0, NZ_ACLT0, NZ_ACNF0, NZ_ACNA0]

Typical analysis process BLAST All done through separate GUIs  poor batching, no automation, no chaining

Programmatic access  The servers can be accessed with scripts*, and there are awesome libraries that provide wrappers, data structures etc. But here’s the rub… * (there are a few GUI pipeline apps but usability is an issue </diplomatic>)
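To make "accessed with scripts" concrete: NCBI exposes BLAST through a URL API (submissions use CMD=Put; results are later fetched with CMD=Get and the returned request ID), and libraries such as Biopython wrap it (e.g. Bio.Blast.NCBIWWW.qblast). The sketch below only builds submission URLs for a batch of queries and sends nothing; the function name and example sequences are hypothetical, and a real script would also have to poll for results and respect NCBI's published rate limits.

```python
from urllib.parse import urlencode

# NCBI BLAST Common URL API endpoint. Nothing is sent here; the function
# only builds submission URLs so they can be batched or chained.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_requests(queries, program="blastn", database="nt"):
    """Return (name, url) pairs, one CMD=Put submission per query sequence.

    `queries` maps a query name to its nucleotide sequence string.
    """
    submissions = []
    for name, seq in queries.items():
        params = {
            "CMD": "Put",  # submit a search; the reply carries a request ID (RID)
            "PROGRAM": program,
            "DATABASE": database,
            "QUERY": seq,
        }
        submissions.append((name, BLAST_URL + "?" + urlencode(params)))
    return submissions


if __name__ == "__main__":
    # Hypothetical batch of targets, the kind of list a GUI forces you to paste one by one.
    batch = {"target_1": "ATGACCATGATTACG", "target_2": "GGCTTTACACTTTATGC"}
    for name, url in build_put_requests(batch):
        print(name, url)
```

The point of the loop is batching: once submissions are plain data, they can be generated, logged and chained into later steps instead of being typed into a web form.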

“What’s a command line?” Exhibit A: Experimental Biologist

TRAPPIST Totally Rad Analysis Pipelines Python Super Tool �

Example pipeline / workflow (my research) [Figure: workflow diagram in three tiers. Data input: genome DBs, list of reference genomes, list of targets (reference plasmids + constructs, OR genes). Core analysis: CONTIG FISHER, HOST, HK_SET, CONSERVATION, EXTRACTOR, SORTER, STRUCTURE, COMPARATOR, producing segment sets and H_K sets. Optional analyses: PHYLOGENY, SEQUENCE VARIATION, CONTENT, CONSTRUCT FUNCTIONS, and PHYLOGENY CONGRUENCE (host trees vs. plasmid trees).]

Do it manually? Exhibit B: Lazy Postdoc (me)

Long story short Collection of one-time scripts  Toolkit  Full-featured application  DNA-based OS?

Fundamental requirements  Design, assembly and modification of pipelines / workflows  Automated execution, parameter / output versioning, provenance data bundling, interactive visualization

Staging / Execution

Staging system  Basic requirements  intuitive  flexible  extensible + Pre-assembled workflows / pipelines + Hooks for external / roll-your-own functions
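One simple way to provide "hooks for roll-your-own functions" is a registry that user code joins through a decorator. This is a minimal sketch under that assumption; COMPONENTS, component() and the gc_content step are hypothetical names, not TRAPPIST's actual API.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a step name to the callable that implements it.
COMPONENTS: Dict[str, Callable] = {}


def component(name: str):
    """Decorator that registers a user-defined function as a workflow step."""
    def register(func: Callable) -> Callable:
        if name in COMPONENTS:
            raise ValueError(f"component {name!r} already registered")
        COMPONENTS[name] = func
        return func
    return register


@component("gc_content")
def gc_content(seq: str) -> float:
    """Example roll-your-own step: fraction of G/C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

The staging UI could then list everything in COMPONENTS next to the pre-assembled steps, so a user's function becomes just another box to wire up.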

Staging area - workflows

Workflow components  TRAPPIST provides discrete task components for every step of analysis:  Initial inputs selection  Data processing steps (existing algorithms)  Graphical output  Component I/O relies on matching ports with data object classes Forces validation of data type/format (not up to user) 
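Matching ports to data object classes means a type check happens at the port rather than in the user's head. The sketch below assumes ports declare the class they accept and reject anything else; DataObject, SequenceSet, Tree, Port and can_connect are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type


class DataObject:
    """Base class for anything passed between components (hypothetical)."""


@dataclass
class SequenceSet(DataObject):
    records: Dict[str, str]  # sequence name -> sequence


@dataclass
class Tree(DataObject):
    newick: str


@dataclass
class Port:
    name: str
    accepts: Type[DataObject]           # data object class this port handles
    value: Optional[DataObject] = None  # None means "not filled yet"

    def fill(self, obj: DataObject) -> None:
        # Validation lives in the port, not with the user: wrong data is rejected.
        if not isinstance(obj, self.accepts):
            raise TypeError(
                f"port {self.name!r} expects {self.accepts.__name__}, "
                f"got {type(obj).__name__}"
            )
        self.value = obj


def can_connect(output: Port, input_: Port) -> bool:
    """An output may feed an input only if its class is the same or a subclass."""
    return issubclass(output.accepts, input_.accepts)
```

Using issubclass in can_connect lets a more specific data class (say, a hypothetical subclass of SequenceSet) plug into any port that accepts the general one.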

Component representation

Interacting with components

Connecting components

Execution system  Progressive / modular / dependency-aware  Parameter set versioning linked to output versioning  Users more likely to try various parameters to test assumptions
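One way to link parameter-set versions to output versions is to derive a stable ID from the parameters themselves and file outputs under it. This sketch assumes a hash of canonical JSON; param_version, output_path and the "outputs/" layout are hypothetical, not how TRAPPIST necessarily does it.

```python
import hashlib
import json


def param_version(params: dict) -> str:
    """Stable short ID for a parameter set: canonical JSON, then SHA-1."""
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hash identically.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:10]


def output_path(component: str, params: dict) -> str:
    """Each output lives under the ID of the parameter set that produced it."""
    return f"outputs/{component}/{param_version(params)}"
```

Re-running with the same parameters maps to the same folder, so an existing result can be reused, while a new parameter set gets its own folder and never overwrites earlier results. That is what makes trying alternative parameters cheap.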

Flow control rule  Component ports have “fill” status  If all its inputs are filled, component is OK to execute � add component to execution queue I’m OK to go!
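The fill-status rule is enough to drive dependency-aware execution on its own: run whatever is ready, fill downstream ports with the results, and enqueue any component whose inputs are now complete. A minimal sketch, assuming a FIFO queue and components wired by (target, port) pairs; Component and run are hypothetical names.

```python
from collections import deque


class Component:
    """Minimal stand-in for a workflow component (hypothetical, not TRAPPIST's class)."""

    def __init__(self, name, inputs, func):
        self.name = name
        self.func = func
        self.inputs = {port: None for port in inputs}  # None = port not filled
        self.downstream = []  # (target component, target port) pairs
        self.result = None
        self.done = False

    def ready(self):
        # The flow control rule: OK to execute once every input port is filled.
        # (In this sketch a step that returns None would never count as filled.)
        return not self.done and all(v is not None for v in self.inputs.values())


def run(components):
    """Execute components in dependency order using a fill-status queue."""
    queue = deque(c for c in components if c.ready())  # e.g. input-selection steps
    order = []
    while queue:
        comp = queue.popleft()
        comp.result = comp.func(**comp.inputs)
        comp.done = True
        order.append(comp.name)
        for target, port in comp.downstream:
            target.inputs[port] = comp.result  # fill the downstream port
            if target.ready() and target not in queue:
                queue.append(target)
    return order
```

For example, a "load" step with no inputs feeding both a "gc" step and a "report" step (which also needs gc's output) executes as load, gc, report; report waits in place until its second port is filled.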

Database architecture [Figure: Workflow 1 and Workflow 2, each with its own Staging DB and Execution DB, plus a Central DB]
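The database-architecture slide names a Central DB and, per workflow, a Staging DB and an Execution DB. The text does not say how they are linked or which engine is used, so this sketch assumes SQLite and a central table that simply records where each workflow's two databases live. All table and column names are hypothetical.

```python
import sqlite3


def init_central(conn):
    """Central DB: one row per workflow pointing at its staging/execution DBs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflows "
        "(id TEXT PRIMARY KEY, staging_db TEXT, execution_db TEXT)"
    )
    conn.commit()


def register_workflow(central, workflow_id, staging_path, execution_path):
    """Create a workflow's staging and execution DBs and record them centrally."""
    staging = sqlite3.connect(staging_path)
    # Staging DB: what the workflow *is* (components and how they are wired).
    staging.execute("CREATE TABLE IF NOT EXISTS components (name TEXT PRIMARY KEY, kind TEXT)")
    staging.execute(
        "CREATE TABLE IF NOT EXISTS links (src TEXT, src_port TEXT, dst TEXT, dst_port TEXT)"
    )
    staging.commit()

    execution = sqlite3.connect(execution_path)
    # Execution DB: what happened when it ran (per component and parameter version).
    execution.execute(
        "CREATE TABLE IF NOT EXISTS runs "
        "(component TEXT, param_version TEXT, status TEXT, output TEXT)"
    )
    execution.commit()

    with central:  # commits on success, rolls back on error
        central.execute(
            "INSERT INTO workflows VALUES (?, ?, ?)",
            (workflow_id, staging_path, execution_path),
        )
    return staging, execution
```

In real use the paths would be files (e.g. a hypothetical "workflow_1_staging.db"); keeping staging separate from execution means a workflow definition can be shared without dragging its run history along.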

Staging DB  DB schema + data dump sufficient to fully describe a workflow
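If the staging DB is SQLite (an assumption; the slide does not name an engine), "schema + data dump" maps directly onto the standard library: Connection.iterdump() emits SQL that recreates both tables and rows, and executescript() replays it. The function names and the components table are hypothetical.

```python
import sqlite3


def dump_staging(conn: sqlite3.Connection) -> str:
    """Serialize the staging DB's schema and rows as one SQL script."""
    return "\n".join(conn.iterdump())


def restore_staging(sql_text: str) -> sqlite3.Connection:
    """Rebuild an identical workflow description in a fresh in-memory DB."""
    clone = sqlite3.connect(":memory:")
    clone.executescript(sql_text)
    return clone
```

Because the dump is plain SQL text, it can be versioned, diffed, or attached to a provenance bundle alongside the results it produced.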

Execution DB

Enforcing good practices  Provenance bundle including:  Workflow schema  Parameter sets  Code version info  Executable papers!  Reproducibility!  Science!
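The slide lists three ingredients for the provenance bundle. A minimal sketch that packs them into a single zip archive; the archive format, file names, the code_version string and the min_identity parameter in the usage note are all illustrative assumptions.

```python
import io
import json
import platform
import zipfile


def build_provenance_bundle(workflow_schema: str, parameter_sets: dict, code_version: str) -> bytes:
    """Pack workflow schema, parameter sets and code version info into one zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # e.g. the staging DB's SQL dump
        zf.writestr("workflow_schema.sql", workflow_schema)
        # sort_keys keeps the JSON stable, so identical bundles compare equal
        zf.writestr("parameter_sets.json", json.dumps(parameter_sets, indent=2, sort_keys=True))
        zf.writestr(
            "code_version.json",
            json.dumps({"tool_version": code_version, "python": platform.python_version()}, indent=2),
        )
    return buf.getvalue()
```

Usage: build_provenance_bundle(sql_dump, {"HK_SET": {"min_identity": 0.9}}, "abc123") returns bytes that can be written next to a result set or attached to a paper's supplementary data.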

Interactive visualization

U haz questions?