Sept 9, 2018
A metagenomic tool for cheese ecosystems
Anne-Laure Abraham, Quentin Cavaillé, Thibaut Guirimand, Sandra Dérozier, Charlie Pauvert, Mahendra Mariadassou, Bédis Dridi, Valentin Loux, Pierre Renault
Jouy-en-Josas, France, Sept 9, 2018
.02
JOUR / MOIS / ANNEE RCAM 2018
2
Cheesemaking
Evolution of the ecosystem during cheese making
- Inoculated microorganisms and house microbiota
- Starters
- Microorganisms from animal milk, water flows, air flows
- Microorganisms from salt
- Ripening cultures
- Microorganisms from shelves, cellar
- Microorganisms: bacteria, yeasts, fungi, phages
Properties of cheese microorganisms
- Sources: starters; microorganisms from animal milk, water flows, air flows; microorganisms from salt; ripening cultures; microorganisms from shelves, cellar
- Organoleptic properties: acid flavor, fruity flavor, formation of bubbles, coat texture, coat color
- Production of lactic acid, carbon dioxide, alcohol, aldehydes, ketones…
Knowledge of cheese microorganisms
- Sources: starters; microorganisms from animal milk, water flows, air flows; microorganisms from salt; ripening cultures; microorganisms from shelves, cellar
- Defined starter cultures: known, more vulnerable to bacteriophage attack
- Undefined complex starters: not completely known
- Inoculated microorganisms ("domesticated cultures"): known
- House microbiota: not completely known
Why study cheese ecosystems?
- Protect functional properties of strains
- Identify origin of organoleptic properties of strains
- Quality control
- Follow ecosystem during cheese manufacturing
- Compare production lines
- Study strain diversity
- Major reduction in the diversity of microorganisms due to sanitary pressure & intensification of production
Food microbiomes project
- Project with academic & dairy industries
- Use metagenomics to achieve a better understanding of cheese ecosystems
- Develop a user-friendly tool to analyze cheese samples
- Characteristics of cheese ecosystems
- Few species (a few dozen)
- More than 4000 sequenced dairy genomes (at least 1 genome for most species)
- Needs
- Precise taxonomic assignment (strain level)
- Identification of low-abundance species
- Identification of genes (and their functions)
- A user-friendly interface for non-bioinformaticians
- A database with dairy genomes
- Results easy to understand
- Public / private genomes & metagenomes
Metagenomic shotgun taxonomic assignment
- Methods based on k-mers or the Burrows–Wheeler transform: Kraken (Wood, 2014), CLARK (Ounit, 2015), Kaiju (Menzel, 2015), Centrifuge (Kim, 2016)
- Methods based on genome/contig mapping and methods based on marker genes: Sigma (Ahn, Bioinformatics, 2015), MicrobeGPS (Lindner, PLoS One, 2015), DESMAN (Quince, Genome Biol, 2017), MetaSNV (Costea, PLoS One, 2017), ConStrains (Luo, Nat Biotech, 2015), metaMLST (Zolfo, NAR, 2017), StrainPhlAn (Truong, Genome Research, 2017)
- Trade-off across methods: from fast, large database and limited taxonomic assignment precision, to slow, limited database and precise taxonomic assignment with identification of strain-level variation
Metagenomic alignment
- Workflow: ecosystem, sequencing, alignment on reference genomes
- Mismatches and unaligned reads: sequencing errors & absence of good reference genome
- Consequence: choice of alignment parameters
Metagenomic alignment (continued)
- Workflow: ecosystem, sequencing, alignment on reference genomes
- Regions with high read coverage: repeated regions, transposable elements, conserved regions
- Heterogeneous sequencing depth: low abundance, high abundance
- Consequence: choice of how alignment results are cleaned
Coverage of genomes
- Figure panels: close strain, intermediate abundance; absent strain; very close strain, high abundance
Characteristics of alignment
Software Bowtie (Langmead, Genome Biology 2009)
- 3 mismatches allowed (-v)
- If several best hits, choose one randomly (-a --best --strata -M 1)
Select reads that align on CDS
Filter some CDS:
- Annotated: integrase, transposases, IS, phage
- Length <300nt
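The CDS filter described above can be sketched as follows. This is only an illustration: the `Cds` record, the keyword list and the matching on the product text are assumptions, not the tool's actual implementation (in particular, IS elements are matched here through the wording "insertion sequence").

```python
from dataclasses import dataclass

# Keywords derived from the slide; matching them in the product text is an assumption.
EXCLUDED_KEYWORDS = ("integrase", "transposase", "insertion sequence", "phage")
MIN_CDS_LENGTH = 300  # nt, threshold given on the slide


@dataclass
class Cds:
    """Hypothetical minimal CDS record (coordinates are 1-based, inclusive, as in GFF)."""
    name: str
    product: str
    start: int
    end: int


def keep_cds(cds: Cds) -> bool:
    """Return True if the CDS passes both the length and the annotation filter."""
    length = cds.end - cds.start + 1
    if length < MIN_CDS_LENGTH:
        return False
    product = cds.product.lower()
    return not any(keyword in product for keyword in EXCLUDED_KEYWORDS)


example_cds = [
    Cds("geneA", "lactate dehydrogenase", 1, 960),
    Cds("geneB", "IS30 family transposase", 2000, 3100),
    Cds("geneC", "hypothetical protein", 4000, 4200),  # 201 nt, too short
    Cds("geneD", "phage tail protein", 5000, 6500),
]
kept_names = [cds.name for cds in example_cds if keep_cds(cds)]
```

Only reads aligned on the CDS kept by such a filter would then be counted.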
Characteristics of mapping
Samtools (htslib.org) & bedtools:
- Identify variant positions
- VCF file
Compute expected coverage
- Fraction of genome that should be covered by at least one read if the genome is present
- Lander & Waterman statistics:
$$C = 1 - \exp\left(-\frac{\text{ReadNumber} \times \text{ReadLength}}{\text{GenomeLength}}\right)$$
- Figure: observed distribution vs. expected distribution
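The Lander & Waterman expected coverage can be computed directly from the three quantities named on the slide. The sketch below assumes reads of uniform length placed uniformly at random on the genome; the function name is hypothetical.

```python
import math


def expected_covered_fraction(read_number: int, read_length: int, genome_length: int) -> float:
    """Expected fraction of the genome covered by at least one read (Lander & Waterman)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Mean sequencing depth: total sequenced bases divided by genome size.
    mean_depth = read_number * read_length / genome_length
    return 1.0 - math.exp(-mean_depth)


# Synthetic example: 1000 reads of 100 nt on a 100 kb genome gives a mean depth of 1.
fraction = expected_covered_fraction(1000, 100, 100_000)
```

Comparing this expected fraction with the observed fraction of covered positions is what separates a genome that is really present at low abundance from one that only attracts reads on a few conserved regions.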
Schema
- Inputs: reference genomes database (GenBank), gene annotations (GFF), metagenome (FASTQ)
- Reference creation: genome indexes
- Alignment (Bowtie): reads alignment (BAM)
- Summary (Samtools, Bedtools): summary for each genome (CSV), reads for each CDS (GFF)
Software output
Summary for each genome (CSV):
- Genome name
- CDS number
- % CDS with at least 1 read
- % positions covered by reads
- Expected % positions covered by reads (Lander & Waterman)
- Mean, median, sd coverage
- Number of variant positions
Reads for each CDS (GFF):
- CDS localization
- CDS name & product
- CDS length, length covered by reads & number of positions with mismatches
- CDS coverage
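As an illustration of the per-genome summary, the coverage statistics can be derived from a list of per-position read depths. This is a minimal sketch: the input format, the field names and the use of the population standard deviation are assumptions, not the tool's actual output code.

```python
import statistics


def summarize_depth(depths: list[int]) -> dict[str, float]:
    """Summarize per-position read depths for one genome (or one set of CDS)."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered_positions = sum(1 for depth in depths if depth > 0)
    return {
        "mean_coverage": statistics.mean(depths),
        "median_coverage": statistics.median(depths),
        "sd_coverage": statistics.pstdev(depths),  # population sd (assumption)
        "pct_positions_covered": 100.0 * covered_positions / len(depths),
    }


# Toy example: five positions, two of them without any read.
summary = summarize_depth([0, 0, 2, 4, 4])
```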
A dedicated dairy database
- Based on organisms known to be in dairy products
- Database enrichment: sequencing and assembly of new species isolated from dairy products
- 150 bacterial species & 15 filamentous fungi and yeasts
- 4000 genomes, manually selected
- Work in progress:
- Use text mining to:
- Identify dairy species in the literature
- Identify habitat of species found in metagenomics (for example: sea for salt bacteria)
- Annotation enrichment: genes of technological interest (Almeida et al. 2014 BMC Genomics)
- Collaboration with C. Nédellec's team, MaIAGE
Web interface & server
Quentin Cavaillé, Thibaut Guirimand, Sandra Dérozier, Pierre Renault, Valentin Loux
- User friendly interface
- Public/private genomes and samples
- Personalized analyses
Tchapalo ecosystem
Racha Zaarir
- Tchapalo: traditional beer in Côte d’Ivoire
- Mean production: 38,000 t/year
- Daily familial consumption
- Income-generating economic activity
- Production process:
- Sorghum malt goes through a double fermentation:
- Natural lactic fermentation => sour wort
- Alcoholic fermentation => Tchapalo
Tchapalo ecosystem analysis
- Figure comparing metagenomic analysis and microbiology analysis (values shown: 72.3%, 25.1%, 80.2%, 15.9%)
Tchapalo ecosystem: abundant species
| Genome | % CDS covered | Mean coverage | % coverage | Expected % coverage |
| Lactobacillus fermentum S6 | 100 | 54.979 | 99.215 | 100 |
| Lactobacillus delbrueckii subsp. lactis KCCM 34717 | 95.503 | 150.326 | 91.717 | 100 |
| Lactobacillus delbrueckii subsp. jakobsenii | 99.669 | 164.759 | 99.119 | 100 |
- The strain Lactobacillus fermentum S6 is very close to the strain of the ecosystem
- Lactobacillus delbrueckii subsp. jakobsenii is closer to the strain of the ecosystem than Lactobacillus delbrueckii subsp. lactis KCCM 34717
Tchapalo ecosystem: low-abundance species
| Genome | % CDS covered | Mean coverage | % coverage | Expected % coverage | # reads |
| Saccharomyces cerevisiae YJM326 | 90.727 | 0.094 | 8.145 | 8.908 | 21405 |
| Pediococcus acidilactici DSM 20284 | 81.706 | 0.577 | 9.418 | 46.005 | 28706 |
- The strain Saccharomyces cerevisiae YJM326 is very close to the strain of the ecosystem
- Pediococcus acidilactici DSM 20284 is absent from the ecosystem (reads coming from other Lactobacillaceae)
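The reasoning behind these two calls can be made explicit by comparing observed and expected coverage. In the sketch below, the figures come from the slide, but the decision rule and its 0.8 threshold are hypothetical and only serve as an illustration.

```python
def coverage_ratio(observed_pct: float, expected_pct: float) -> float:
    """Ratio of observed to expected percentage of covered positions."""
    if expected_pct <= 0:
        raise ValueError("expected_pct must be positive")
    return observed_pct / expected_pct


def looks_present(observed_pct: float, expected_pct: float, min_ratio: float = 0.8) -> bool:
    """Hypothetical rule: call the genome present if the ratio reaches min_ratio."""
    return coverage_ratio(observed_pct, expected_pct) >= min_ratio


# (observed % coverage, expected % coverage) taken from the slide.
slide_values = {
    "Saccharomyces cerevisiae YJM326": (8.145, 8.908),
    "Pediococcus acidilactici DSM 20284": (9.418, 46.005),
}
calls = {name: looks_present(obs, exp) for name, (obs, exp) in slide_values.items()}
```

With these values the ratio is about 0.91 for Saccharomyces cerevisiae YJM326 and about 0.20 for Pediococcus acidilactici DSM 20284. In the second case the reads pile up on a small part of the genome instead of spreading as expected.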
Conclusion
- Providing a user-friendly tool for metagenomic analysis: reference genome database, metagenomic software, web interface
- Will be publicly available for research purposes
- An account on the INRA Migale platform is required
- The development of the software and database is still ongoing
- Pipeline recap: reference genomes database (GenBank), gene annotations (GFF) and metagenome (FASTQ) as inputs; reference creation (genome indexes); alignment (Bowtie) producing the reads alignment (BAM); summary (Samtools, Bedtools) producing a summary for each genome (CSV) and reads for each gene (GFF)
Perspectives
- Genome pre-selection using a faster method (k-mer or Burrows–Wheeler transform) to speed up computation
- Allow comparison of metagenome analyses
- Apply it to the MetaPDOCheese project (next slide)
- Application to other ecosystems with enough reference genomes (for example: fermented food, animal digestive ecosystems…)
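A k-mer based genome pre-selection could look like the toy sketch below. It is not the method planned by the authors: the function names, the scoring (fraction of a genome's k-mers seen in the reads) and the thresholds are all assumptions, and reverse complements are ignored for brevity.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of k-mers of a sequence (forward strand only)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def preselect_genomes(reads: list[str], genomes: dict[str, str],
                      k: int = 4, min_shared_fraction: float = 0.5) -> list[str]:
    """Keep genomes whose k-mers are sufficiently represented in the reads."""
    read_kmers: set[str] = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    selected = []
    for name, genome_sequence in genomes.items():
        genome_kmers = kmers(genome_sequence, k)
        if not genome_kmers:
            continue  # sequence shorter than k
        shared_fraction = len(genome_kmers & read_kmers) / len(genome_kmers)
        if shared_fraction >= min_shared_fraction:
            selected.append(name)
    return selected


toy_genomes = {"g1": "ACGTACGTTAGC", "g2": "GGGGGCCCCCAAAAA"}
toy_reads = ["ACGTACGTT", "CGTTAGC"]
selected_genomes = preselect_genomes(toy_reads, toy_genomes)
```

Only the pre-selected genomes would then go through the slower alignment step. A real implementation would need a much larger k and a threshold adapted to low-abundance species.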
MetaPDO Cheese Project
INRA: MaIAGE (S. Dérozier, V. Loux, M. Mariadassou, C. Nédellec, Q. Cavaillé), Micalis (P. Renault, T. Guirimand, B. Dridi, C. Pauvert), GMPA (F. Irlinger), URF (C. Delbès), CNIEL
- Compare ecosystems of the same PDO area
- Follow the ecosystem over the time scale of cheesemaking
- What are the structural and functional diversities of cheese ecosystems?
- What are the evolutionary mechanisms of microbial populations?
- 44 Protected Designation of Origin French cheeses
- 1200 samples, 16S & ITS sequencing
- Some samples with shotgun sequencing
- Sequencing of 100 new genomes
Thanks to
StatInfOmics and Bibliome teams, Migale platform: Robert Bossy, Quentin Cavaillé, Estelle Chaix, Hélène Chiapello, Louise Deleger, Sandra Dérozier, Valentin Loux, Mahendra Mariadassou, Claire Nédellec, Pierre Nicolas, Sophie Schbath
Micalis: Pierre Renault, Charlie Pauvert, Thibaut Guirimand, Bédis Dridi, Racha Zaarir
Figure: % ID vs. positions covered with 100 nt reads / positions covered with 35 nt reads
Figure: % ID vs. number of reads at 100 nt / number of reads at 35 nt
Cheese ecosystems
From: Irlinger et al. FEMS Microbiol Lett. 2014
Challenges of taxonomic assignment (or not? Also challenges for functions?)
- We don’t have reference genomes for each strain of the ecosystem
- Some genera have many reference genomes, others have none
- Impossible to sequence every strain (non-cultivable species, cost of DNA extraction, sequencing and storage…)
- Computational challenge: impossible to compare reads to every sequenced genome
- November 2017: 124,481 prokaryotic genomes
- A metagenome: millions of reads per sample
- Figure: tree of life showing reference genomes and ecosystem strains
GeDI method
- Sequencing bias & repeated regions: heterogeneous genome coverage
- Artefact: alignment on a close genome
- Strains present in different proportions
- Figure: % coverage along genome position and gene position (very close strain, high abundance)