The study of microbial communities: Bioinformatics applications


SLIDE 1

The study of microbial communities: Bioinformatics applications within the UL HPC environment

UL HPC school 2017, 13 June 2017

Shaman Narayanasamy

Eco-Systems Biology group of LCSB

SLIDE 2

The subject: microbial communities

SLIDE 3

The samples: Biomolecules

Roume et al. ISME J. (2013) 7:110-21; Roume et al. Methods Enzymol. (2013) 531:219-36

SLIDE 4

The measurements: High-throughput data

  • Metagenomics
  • Metatranscriptomics
  • Metaproteomics
  • Data integration

Roume et al. ISME J. (2013) 7:110-21; Roume et al. Methods Enzymol. (2013) 531:219-36

SLIDE 5

The measurements: Random shotgun sequencing

Biological sample → DNA / cDNA → WGS library → NGS reads (in silico data)

cDNA: complementary DNA; WGS: whole genome shotgun; NGS: next-generation sequencing

SLIDE 6

The data: Next-generation sequencing (NGS)

Uncompressed Size: 14-82 GB
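
To give a feel for what this data looks like in practice, here is a minimal sketch that streams reads and tallies read and base counts without loading the whole file into memory, which matters at 14-82 GB. It assumes the reads are stored as four-line FASTQ records; the slides do not name the file format, and the function names `read_fastq` and `summarize` are made up for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].strip(), sequence, quality


def summarize(handle: TextIO) -> Tuple[int, int]:
    """Return (number of reads, total bases) for a FASTQ stream."""
    n_reads = 0
    n_bases = 0
    for _, sequence, _ in read_fastq(handle):
        n_reads += 1
        n_bases += len(sequence)
    return n_reads, n_bases


# Tiny in-memory example standing in for a real multi-gigabyte file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n")
print(summarize(example))
```

Because the reader is a generator, memory use stays flat regardless of file size; a real file would be opened with `open(path)` (or `gzip.open(path, "rt")` for compressed input) instead of `io.StringIO`.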

SLIDE 7

The process: NGS read preprocessing

In silico data: NGS reads → Preprocessing → Preprocessed NGS reads
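
One common preprocessing step is quality trimming. The sketch below is a deliberately simplified stand-in, not the algorithm of any particular tool: it assumes Phred+33 quality encoding, removes low-quality bases from the 3' end of each read, and discards reads that end up too short. The thresholds and function names are illustrative assumptions.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (read_id, sequence, quality string)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(sequence: str, quality: str, min_quality: int = 20) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_quality."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


def preprocess(reads: List[Read], min_quality: int = 20, min_length: int = 3) -> List[Read]:
    """Trim every read and keep only those still at least min_length long."""
    kept = []
    for read_id, sequence, quality in reads:
        sequence, quality = trim_3prime(sequence, quality, min_quality)
        if len(sequence) >= min_length:
            kept.append((read_id, sequence, quality))
    return kept


raw_reads = [
    ("r1", "ACGTAC", "IIII##"),  # two poor bases at the 3' end
    ("r2", "ACGT", "I###"),      # mostly poor quality, dropped after trimming
]
print(preprocess(raw_reads))
```

Real preprocessing usually also removes adapters and filters unwanted sequences (for example rRNA or host reads), which this sketch leaves out.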

SLIDE 8

The process: NGS read preprocessing

  • Preprocessing: Trimmomatic, CutAdapt, SortMeRNA, *BWA, *Bowtie2
  • Assembly: IDBA-UD, MEGAHIT, SPAdes, ABySS, Newbler, CAP3
  • Post-assembly: BWA, Bowtie2, MaxBin, dRep, HMMER, BLASTn, AMPHORA2, PhyloPhlAn
  • Automation: Bash, Make, Python, Perl, Galaxy, Snakemake, CWL, Ruffus
  • Containerization: Docker, LXD, Vagrant, *BioConda

SLIDE 9

The process: De novo assembly

In silico data: NGS reads → Preprocessing → Preprocessed NGS reads → De novo assembly → Assembled contigs (Contig 1, Contig 2)
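
The idea of turning overlapping reads into contigs can be illustrated with a toy greedy overlap merger. This is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats); production assemblers use far more sophisticated graph-based methods, and the function names here are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap until none remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Reads drawn from two made-up "genomes"; they should collapse into two contigs.
reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGGGAAAC", "AAACCCC"]
print(sorted(greedy_assemble(reads)))
```

The pairwise comparison is quadratic in the number of reads, so this approach only works for toy inputs; it is meant to show the concept of contigs, not to scale to the data sizes mentioned earlier.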

SLIDE 10

The process: De novo assembly

SLIDE 11

The process: Post-assembly analysis

Assembled contigs (Contig 1, Contig 2)

  • Annotation → Predicted genes (Gene A on Contig 1, Gene B on Contig 2) → Function information
  • Binning → Bins (Contig 1 in Bin X, Contig 2 in Bin Y) → Structure information
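
The two post-assembly branches can be sketched in a few lines. The example below is a toy under loud assumptions: "annotation" is reduced to scanning the forward strand for open reading frames (ATG to a stop codon), and "binning" is reduced to splitting contigs by GC content at an arbitrary threshold. Real annotation and binning tools use much richer signals; the contig sequences and function names are invented for illustration.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs, end exclusive, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def bin_by_gc(contigs: Dict[str, str], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Assign each contig to a low-GC or high-GC bin (a crude stand-in for real binning)."""
    bins: Dict[str, List[str]] = {"low_gc": [], "high_gc": []}
    for name, seq in contigs.items():
        key = "high_gc" if gc_content(seq) >= threshold else "low_gc"
        bins[key].append(name)
    return bins


contigs = {
    "contig_1": "CCATGAAATTTTAGGG",
    "contig_2": "GGCCGCATGCCCGGGTGACC",
}
for name, seq in contigs.items():
    print(name, find_orfs(seq), round(gc_content(seq), 3))
print(bin_by_gc(contigs))
```

Predicted genes feed functional analysis, while bins group contigs that plausibly come from the same population, which is what gives the structural view of the community.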

SLIDE 12

The process: Post-assembly analysis

SLIDE 13

The process: Automation
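
Workflow tools such as Make and Snakemake work out the order in which steps must run from their dependencies. The sketch below mimics that idea with Python's standard-library `graphlib` (Python 3.9+); it is not Snakemake syntax, and the step names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "preprocess": set(),
    "assemble": {"preprocess"},
    "annotate": {"assemble"},
    "bin": {"assemble"},
    "report": {"annotate", "bin"},
}


def run_pipeline(steps: Dict[str, Set[str]], actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each step only after all of its dependencies; return the execution order."""
    order = list(TopologicalSorter(steps).static_order())
    for name in order:
        actions[name]()
    return order


# Placeholder actions that only record that a step ran.
log: List[str] = []
actions = {name: (lambda n=name: log.append(n)) for name in PIPELINE}
order = run_pipeline(PIPELINE, actions)
print(order)
```

A real workflow manager adds what this sketch lacks: file-based targets, skipping of up-to-date steps, parallel execution, and submission to a cluster scheduler.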

SLIDE 14

The process: Reproducibility

SLIDE 15

The process: Integrated meta-omics pipeline (IMP)

IMP available at:

http://r3lab.uni.lu/web/imp

Narayanasamy, Jarosz et al. bioRxiv (2016); Narayanasamy, Jarosz et al. Genome Biology (2016)

Original logo by Linda Wampach

SLIDE 16

The requirements, performance and output: In numbers

  • Workflow: Snakemake, 42 tools
  • Input: 14-82 GB
  • Output: 44-182 GB
  • Run time: 20-280 hrs
  • Computing platforms:
      • 8 cores, 256-1024 GB RAM
      • r3.4xlarge: 16 cores, 122 GB

Narayanasamy, Jarosz et al. bioRxiv (2016); Narayanasamy, Jarosz et al. Genome Biology (2016)

SLIDE 17

The outcome: Knowledge on microbial communities

  • Muller, Pinel et al. Nature Communications (2014)
  • Roume, Heintz-Buschart et al. NPJ Microbiome and Biofilms (2015)
  • Laczny et al. Frontiers in Microbiology (2016)
  • Heintz-Buschart et al. Nature Microbiology (2016)
  • Narayanasamy, Jarosz et al. Genome Biology (2016)
  • Wampach et al. Frontiers in Microbiology (2017)
  • Kaysen et al. Translational Research (accepted)
  • Muller, Narayanasamy et al. Standards in Genomic Sciences (in review)
  • Wampach, Heintz-Buschart et al. (in preparation)
  • Herold et al. (in preparation)
  • Narayanasamy, Martinez-Arbas et al. (in preparation)

SLIDE 18

The outcome: AcKnowledge the HPC

SLIDE 19

The outcome: AcKnowledge the HPC

And in all presentations and posters at international conferences, and in PhD theses!

SLIDE 20

The experience: Continued improvement

  • First impression: Impressed!
  • Initial problems:
      • Learning curve
      • File system issues
      • Users “misbehaving”
      • Independent systems (bigbug compute node and storage “boxes”)
      • No dedicated system admin for LCSB
  • Improvements over the years:
      • Solved file system issues
      • HPC school
      • Improved documentation
      • Well-behaved users
      • Dedicated system admin for LCSB
  • Additional request:
      • High-quality logo on HPC website for presentations
SLIDE 21

The future: Best practices and improvements

  • Best practices:
      • (Try to) Be a good user; attend the HPC school
      • Incorporate cost of HPC into budgets/grants
      • Acknowledge the HPC (manuscripts, presentations)
      • Communicate effectively!
  • Future practices and improvements:
      • Integration of independent machines with HPC
      • Reduce reliance on Docker
      • Better data management
      • Software management
      • Software benchmarking
      • *Dedicated personnel within group
      • Continuous learning!
SLIDE 22

Acknowledgements

Former ESBers: Emilie Muller, Cedric Laczny, Abdul Sheik, Hugo Roume, Myriam Zeimes
