Genomics Virtual Laboratory
Mike Pheasant (UQ), Andrew Lonie (VLSCI)
What is the Genomics Virtual Laboratory?
A NeCTAR-funded, nationally distributed platform for genomics, built on the Research Cloud and RDSI
NeCTAR? Research Cloud? RDSI?
NeCTAR = National eResearch Collaboration Tools and Resources
http://www.nectar.org.au
NeCTAR Research Cloud
http://www.nectar.org.au/research-cloud
RDSI = Research Data Storage Infrastructure
http://www.rdsi.uq.edu.au/
What is the Genomics Virtual Laboratory?
A NeCTAR-funded, nationally distributed platform for genomic analyses:
Infrastructure
- Workflow management system
- Bioinformatics toolkit (for command-line users)
- Visualisation services
- Scalable compute infrastructure
Resources
- Tutorials and exemplar workflows targeted at common high-throughput genomics tasks
- Data catalogues and coordination centres
- Subscription-based support
What is the Genomics Virtual Lab?
Workflow platforms
Interactive platforms for developing genomics workflows and interactive data analysis
- Galaxy
- GenePattern, others possible (Bioflow, ...)
What's Galaxy?
"an open, web-based platform for performing accessible, reproducible, and transparent genomic science."
http://galaxyproject.org
- Accessible: Users without programming experience can easily specify parameters and run tools and workflows
- Reproducible: Galaxy captures information so that any user can repeat and understand a complete computational analysis
- Transparent: Users share and publish analyses via the web
Visualisation platforms
Cluster-on-the-cloud
CloudBioLinux - Linux with a comprehensive, actively maintained suite of bioinformatics tools
http://cloudbiolinux.org/
CloudMan: platform for launching and scaling CloudBioLinux clusters and Galaxy clusters on the cloud
http://usecloudman.org
Research Cloud: ~25000 CPUs to be spread across 6-10 research centres around Australia, to host research activities 'on demand'
http://www.nectar.org.au/research-cloud
Data catalogues
UCSC databases
Ensembl databases
ENCODE
dbSNP, HapMap
ICGC, COSMIC
BPA Framework Datasets
- sarcoma
- wheat
- soil diversity
Tutorials and workshops
Tutorials and education resources
NGS School - summer schools, 2-day workshops
Galaxy-based online tutorials:
- Intro to NGS
- Genome Browsers
- Common analyses
○ Differential gene expression
○ Variant calling
○ ChIP-seq
○ ...
Exemplar best practice workflows
Exemplar workflows
- Variant calling:
○ GATK best-practice
○ microbial
○ cancer-optimised
- RNA-seq differential expression
- Fusion gene discovery from RNA-seq
- MicroRNA analysis
- De novo genome and transcriptome assembly
- Metagenomics
- ChIP-seq
- Variant annotation
- Pathway analysis
- Methylation
Support
Genomics Informatics Network
Institutional subscriptions:
- genomics support (% of FTE)
- large compute and data resources
- managed instances of GVL
- new GVL tool development
- advocacy to funding bodies for resources
- communities of best practice
Or...roll your own GVL
Progress and timelines
Dec 2012 Prototype at Qld (UQ) and Vic (UoM)
- Galaxy
- UCSC browser + databases
- Bioinformatics cluster-on-the-cloud
- Initial tutorials and exemplars
Jun 2013 Production at Qld (UQ) and Vic (UoM), prototype at other Research Cloud nodes
- Data coordination centres, data catalogues
Dec 2013
- Additional workflows and tutorials
- Additional nodes
Jun 2014 Operations (support centres - subscriptions)