Webinars series Live demonstrations on the e-infrastructure - PowerPoint PPT Presentation

Webinars series Live demonstrations on the e-infrastructure deployment and the risk assessment case studies Topic Date & Time See Webinar recordings: ● Session 1 (24 Sep 2018) Introduction sessions to the OpenRiskNet e- ● Session 2 (27 Sept 2018) infrastructure ● Session 3 (4 Oct 2018) ● Session 4 (30 Oct 2018) Learn how to deploy the OpenRiskNet virtual See Webinar recordings (25 Feb 2019) research environment Demonstration on data curation and creation of Monday, 18 March 2019 pre-reasoned datasets in the OpenRiskNet Past events 16:00 CET framework Identification and linking of data related to Tuesday, 26 March 2019 AOPWiki (an OpenRiskNet case study) 17:00 CET https://openrisknet.org/events/ Monday, 1 April 2019 Semantic annotation 16:00 CET The Adverse Outcome Pathway Database Monday, 8 April 2019 (AOP-DB) 16:00 CET Current Event Nextflow and TGX case study Monday, 27 May 2019 www.openrisknet.org

Nextflow for toxicogenomics-based predictions on the OpenRiskNet Virtual Research Infrastructure Evan Floden (Centre for Genomic Regulation) Webinar - 27 May 2019 OpenRiskNet: Open e-Infrastructure to Support Data Sharing, Knowledge Integration and in silico Analysis and Modelling in Risk Assessment www.openrisknet.org Project Number 731075

About the project OpenRiskNet is a 3-year EU Horizon 2020 project with the main objective to develop an open e-infrastructure providing resources and services to a variety of communities requiring risk assessment, including chemicals, cosmetic ingredients, therapeutic agents and nanomaterials. Main components: ➔ Case-study-driven development - examples of tools to be integrated are selected based on the case study needs. More information: https://openrisknet.org/e-infrastructure/development/case-studies/ ➔ Solutions for all areas by integrating existing tools from consortium and associated partners (via the implementation challenge) ➔ Integrated approach combining experimental data ( in vivo, in vitro, in chemico ) with analysis, modelling and simulation tools into risk assessment workflows www.openrisknet.org

Webinars series Live demonstrations on the e-infrastructure deployment and the risk assessment case studies Topic Date & Time See Webinar recordings: ● Session 1 (24 Sep 2018) Introduction sessions to the OpenRiskNet e- ● Session 2 (27 Sept 2018) infrastructure ● Session 3 (4 Oct 2018) ● Session 4 (30 Oct 2018) Learn how to deploy the OpenRiskNet virtual See Webinar recordings (25 Feb 2019) research environment Demonstration on data curation and creation of Monday, 18 March 2019 pre-reasoned datasets in the OpenRiskNet Past events 16:00 CET framework Identification and linking of data related to Tuesday, 26 March 2019 AOPWiki (an OpenRiskNet case study) 17:00 CET https://openrisknet.org/events/ Monday, 1 April 2019 Semantic annotation 16:00 CET The Adverse Outcome Pathway Database Monday, 8 April 2019 (AOP-DB) 16:00 CET Current Event Nextflow and TGX case study Monday, 27 May 2019 www.openrisknet.org

https://prod.openrisknet.org/ The OpenRiskNet VE www.openrisknet.org

https://github.com/OpenRiskNet/home/wiki The OpenRiskNet VE www.openrisknet.org

What is Toxicogenomics? www.openrisknet.org

External Compute Resources www.openrisknet.org

https://ewels.github.io/sra-explorer/ https://ewels.github.io/AWS-iGenomes/ External Data Resources www.openrisknet.org

Portable Computation Virtual Infrastructure Application www.openrisknet.org

Scientific Workflow Managers www.openrisknet.org

Toxicogenomic workflows ● Data analysis applications performs computation to generate information from large genomic datasets (resource requirements) ● Embarrassingly parallelisation, can spawn 100s-100k jobs over distributed cluster ● Mash-up of many different tools and scripts (dependancies!) Complex dependency trees and configuration → very fragile ecosystem ● www.openrisknet.org

Steinbiss et al., Companion parassite genome annotation pipeline , DOI: 10.1093/nar/gkw292 www.openrisknet.org

a lot of moving parts 70 tasks 55 external scripts 39 software tools & libraries www.openrisknet.org

To reproduce the result of a typical   computational biology paper   requires 280 hours. ≈ 1.7 months! www.openrisknet.org

* Di Tommaso P, et al., Nextflow enables computational reproducibility , Nature Biotech, 2017 www.openrisknet.org

Comparison of the Companion pipeline annotation of Leishmania infantum genome executed across different platforms * Platform Amazon Linux Debian Linux Mac OSX Number of chromosomes 36 36 36 Overall length (bp) 32,032,223 32,032,223 32,032,223 Number of genes 7,781 7,783 7,771 236.64 236.64 236.32 Gene density Number of coding genes 7,580 7,580 7570 Average coding length (bp) 1,764 1,764 1,762 113 113 111 Number of genes with multiple CDS Number of genes with known function 4,147 4,147 4,142 Number of t-RNAs 88 90 88 * Di Tommaso P, et al., Nextflow enables computational reproducibility , Nature Biotech, 2017 www.openrisknet.org

challenges for risk assessment entering into the omics era Reproducibility Portability Scalability Usability Traceability www.openrisknet.org

PUSH-THE-BUTTON PIPELINES

The fundamentals for scaleable genomic workflows code orchestration dependencies Git GitHub sharing & reproducibility deployment www.openrisknet.org

how to achieve this? ● Fast prototyping ⇒ custom DSL that enables tasks composition, simplifies most use cases + general purpose programming lang. for corner cases ● Easy parallelisation ⇒ declarative reactive programming model based on dataflow paradigm, implicit portable parallelism ● Self-contained ⇒ functional approach, a task execution is idempotent ie. cannot modify the state of other tasks + isolate dependencies with containers ● Portable deployments ⇒ executor abstraction layer + deployment configuration from implementation logic www.openrisknet.org

task example bwa mem reference.fa sample.fq \ | samtools sort -o sample.bam www.openrisknet.org

task example process align_sample { input: file 'reference.fa' from genome_ch file 'sample.fq' from reads_ch output: file 'sample.bam' into bam_ch script: """ bwa mem reference.fa sample.fq \ | samtools sort -o sample.bam """ } www.openrisknet.org

tasks composition process align_sample { process index_sample { input: input: file 'reference.fa' from genome_ch file 'sample.bam' from bam_ch file 'sample.fq' from reads_ch output: output: file 'sample.bai' into bai_ch file 'sample.bam' into bam_ch script: script: """ """ samtools index sample.bam bwa mem reference.fa sample.fq \ """ | samtools sort -o sample.bam } """ } www.openrisknet.org

dataflow programming model ● Declarative computational model for parallel process executions ● Processes wait for data, when an input set is ready the process is executed ● They communicate by using dataflow variables i.e. async FIFO queues called channels ● Parallelisation and tasks dependencies are implicitly defined by process in/out declarations www.openrisknet.org

How parallelisation works process task 1 out x data x channel task 2 out y data y data z data x data y task 3 data z out z www.openrisknet.org

how parallelisation works samples_ch = Channel.fromPath('data/sample.fastq') process FASTQC { input: file reads from samples_ch output: file 'fastqc_logs' into fastqc_ch script: """ mkdir fastqc_logs fastqc -o fastqc_logs -f fastq -q ${reads} """ } www.openrisknet.org

how parallelisation works samples_ch = Channel.fromPath(‘data/*.fastq') process FASTQC { input: file reads from samples_ch output: file 'fastqc_logs' into fastqc_ch script: """ mkdir fastqc_logs fastqc -o fastqc_logs -f fastq -q ${reads} """ } www.openrisknet.org

implicit parallelism Channel.fromPath("data/*.fastq") clustalo clustalo FASTQC www.openrisknet.org

handling file pairs Channel.fromFilePairs("*_{1,2}.fq") gut_1.fq gut_2.fq liver_1.fq liver_2.fq lung_1.fq ( gut, [gut_1.fq, gut_2.fq] ) lung_2.fq ( lung, [lung_1.fq, lung_2.fq] ) ( liver, [liver_1.fq, liver_2.fq] ) www.openrisknet.org

basic example ( gut, [gut_1.fq, gut_2.fq] ) ( lung, [lung_1.fq, lung_2.fq] ) ( liver, [liver_1.fq, liver_2.fq] ) process FASTQC { input: set pair_id, file(reads) from samples_ch output: file 'fastqc_logs' into fastqc_ch """ mkdir fastqc_logs fastqc -o fastqc_logs -f fastq -q ${reads} """ } www.openrisknet.org

deployment scenarios www.openrisknet.org

local execution ● Common development scenario laptop / workstation ● Dependencies can be managed using a nextflow container runtime ● Parallelisations is managed spawning posix processes docker/singularity ● Can scale vertically using fat server / shared mem. machine OS local storage www.openrisknet.org

Webinars series Live demonstrations on the e-infrastructure - PowerPoint PPT Presentation

Webinars series Live demonstrations on the e-infrastructure deployment and the risk assessment case studies Topic Date & Time See Webinar recordings: Session 1 (24 Sep 2018) Introduction sessions to the OpenRiskNet e- Session 2

Lead Screw Motors LSM08 Series LSM11 Series LSM14 Series LSM17 Series

Ask the Experts Webinar Series Series of recorded webinars to be posted on the NASP

Welcome to the series of webinars on Ocean Literacy! www.responseable.eu 2 Why these webinars?

standard series Overview DP series DX series H series M series bitte hier

Fiscal Year 2016 Arts Grants Workshops and Webinars Format for the Grant Workshops & Webinars

The Nutritionist 2019 Live and Recorded Ruminant Nutrition Webinars More Information at

National Grid Webinars Feedback Q1: Did you find the webinars useful? Question1. Website

Data Drives Revenue. begins soon Upcoming Webinars for your calendar: Sales Webinars:

Sponsors Wednesday Webinars Upcoming Webinars- 8/26 Newsletters August 12, 2015 9/9 How

APS Webinars APS Webinars Getting more out of APS Meetings Getting more out of APS Meetings

E- -Series: Series: Water Mist Extinguishers Water Mist Extinguishers E E- -Series: Series:

Fourier Series Fourier Sine Series Fourier Cosine Series Fourier Series Convergence

Time Series Analysis and Mining with R Time Series Decomposi- tion Time Series Forecasting

Ensuring Accurate Data for CCRPI Session 1 Accountability Webinars Accountability will host a

Upcoming GLR Learning Tuesdays Webinars: EMERGING INNOVATIONS SERIES Bridging the Early Years

Outline Time series and forecasting Time series objects 1 in R Basic time series functionality

Heritage Fund Moray Part of the Scottish Governments Highlands & European Regional

Swisscontact Niger: Youth Training and Employment in a fragile context SDC Seminar

Government Caring For Historic Records Agenda Part 1: Role of the Archivist Part 2:

Nametag Glyph Fold your paper in half and write your name in the center. In the top left

Fragile X Monday November 27th, 2017 www-x-fragile.be Agenda Debrief from annual European

Consistently Supporting and Connecting Families Jayne Dixon Weber Missy Zolecki Laureen Majeske

measuring emotional outbursts Justin Chung kwrg@contacts.bham.ac.uk Dr Kate Anne Woodcock

Lane Cove Cats Junior AFL Club We offer the opportunity for children to participate in the game

Webinars series Live demonstrations on the e-infrastructure - PowerPoint PPT Presentation

Webinars series Live demonstrations on the e-infrastructure deployment and the risk assessment case studies Topic Date & Time See Webinar recordings: Session 1 (24 Sep 2018) Introduction sessions to the OpenRiskNet e- Session 2

Lead Screw Motors LSM08 Series LSM11 Series LSM14 Series LSM17 Series

Ask the Experts Webinar Series Series of recorded webinars to be posted on the NASP

Welcome to the series of webinars on Ocean Literacy! www.responseable.eu 2 Why these webinars?

standard series Overview DP series DX series H series M series bitte hier

Fiscal Year 2016 Arts Grants Workshops and Webinars Format for the Grant Workshops &amp; Webinars

The Nutritionist 2019 Live and Recorded Ruminant Nutrition Webinars More Information at

National Grid Webinars Feedback Q1: Did you find the webinars useful? Question1. Website

Data Drives Revenue. begins soon Upcoming Webinars for your calendar: Sales Webinars:

Sponsors Wednesday Webinars Upcoming Webinars- 8/26 Newsletters August 12, 2015 9/9 How

APS Webinars APS Webinars Getting more out of APS Meetings Getting more out of APS Meetings

E- -Series: Series: Water Mist Extinguishers Water Mist Extinguishers E E- -Series: Series:

Fourier Series Fourier Sine Series Fourier Cosine Series Fourier Series Convergence

Time Series Analysis and Mining with R Time Series Decomposi- tion Time Series Forecasting

Ensuring Accurate Data for CCRPI Session 1 Accountability Webinars Accountability will host a

Upcoming GLR Learning Tuesdays Webinars: EMERGING INNOVATIONS SERIES Bridging the Early Years

Outline Time series and forecasting Time series objects 1 in R Basic time series functionality

Heritage Fund Moray Part of the Scottish Governments Highlands &amp; European Regional

Swisscontact Niger: Youth Training and Employment in a fragile context SDC Seminar

Government Caring For Historic Records Agenda Part 1: Role of the Archivist Part 2:

Nametag Glyph Fold your paper in half and write your name in the center. In the top left

Fragile X Monday November 27th, 2017 www-x-fragile.be Agenda Debrief from annual European

Consistently Supporting and Connecting Families Jayne Dixon Weber Missy Zolecki Laureen Majeske

measuring emotional outbursts Justin Chung kwrg@contacts.bham.ac.uk Dr Kate Anne Woodcock

Lane Cove Cats Junior AFL Club We offer the opportunity for children to participate in the game

Fiscal Year 2016 Arts Grants Workshops and Webinars Format for the Grant Workshops & Webinars

Heritage Fund Moray Part of the Scottish Governments Highlands & European Regional