A metagenomic tool for cheese ecosystems
Anne-Laure Abraham, Quentin Cavaillé, Thibaut Guirimand, Sandra Dérozier, Charlie Pauvert, Mahendra Mariadassou, Bedis Dridi, Valentin Loux, Pierre Renault
Jouy en Josas, France
Sept 9, 2018


SLIDE 1

Sept 9, 2018

A metagenomic tool for cheese ecosystems

Anne-Laure Abraham, Quentin Cavaillé, Thibaut Guirimand, Sandra Dérozier, Charlie Pauvert, Mahendra Mariadassou, Bedis Dridi, Valentin Loux, Pierre Renault Jouy en Josas – France

SLIDE 2

RCAM 2018

Cheesemaking

Evolution of the ecosystem during cheese making

  • Inoculated micro-organisms
  • House microbiota
  • Starters
  • Micro-organisms from animal milk, water flows, airflows
  • Micro-organisms from salt
  • Ripening cultures
  • Micro-organisms from shelves, cellar
  • Micro-organisms: bacteria, yeasts, fungi, phages

SLIDE 3

Properties of cheese micro-organisms

Sources of micro-organisms:

  • Starters
  • Micro-organisms from animal milk, water flows, airflows
  • Micro-organisms from salt
  • Ripening cultures
  • Micro-organisms from shelves, cellar

Organoleptic properties:

  • Acid flavor
  • Fruity flavor
  • Formation of bubbles
  • Production of lactic acid, carbon dioxide, alcohol, aldehydes, ketones…
  • Coat texture
  • Coat color

SLIDE 4

Knowledge of cheese micro-organisms

Sources of micro-organisms:

  • Starters
  • Micro-organisms from animal milk, water flows, airflows
  • Micro-organisms from salt
  • Ripening cultures
  • Micro-organisms from shelves, cellar

Diagram labels:

  • Inoculated micro-organisms; house microbiota
  • Defined starter cultures; undefined complex starters
  • Known; not completely known (shown twice)
  • More vulnerable to bacteriophage attack
  • “Domesticated cultures”

SLIDE 5

Why study cheese ecosystems?

  • Protect functional properties of strains
  • Identify origin of organoleptic properties of strains
  • Quality control
  • Follow ecosystem during cheese manufacturing
  • Compare production lines
  • Study strain diversity

Major reduction in the diversity of micro-organisms due to sanitary pressure & intensification of production

SLIDE 6

Food microbiomes project

  • Project with academic and dairy-industry partners
  • Use metagenomics to achieve a better understanding of cheese ecosystems

Develop a user-friendly tool to analyze cheese samples

  • Characteristics of cheese ecosystems
      • Few species (a few dozen)
      • More than 4000 sequenced dairy genomes (≥ 1 genome for most species)
  • Needs
      • Precise taxonomic assignment (strain level)
      • Identification of low-abundance species
      • Identification of genes (and their functions)
  • A user-friendly interface for non-bioinformaticians
  • A database with dairy genomes
  • Results that are easy to understand
  • Public / private genomes & metagenomes

SLIDE 7

Shotgun metagenomic taxonomic assignment

Methods based on k-mers or Burrows–Wheeler transform:

  • Kraken (Wood, 2014)
  • CLARK (Ounit, 2015)
  • Kaiju (Menzel, 2015)
  • Centrifuge (Kim, 2016)

Methods based on genomes/contigs mapping and methods based on marker genes:

  • Sigma (Ahn, Bioinformatics, 2015)
  • MicrobeGPS (Lindner, PLoS One, 2015)
  • DESMAN (Quince, Genome Biol, 2017)
  • MetaSNV (Costea, PLoS One, 2017)
  • ConStrains (Luo, Nat Biotech, 2015)
  • metaMLST (Zolfo, NAR, 2017)
  • StrainPhlAn (Truong, Genome Research, 2017)

Trade-offs shown on the slide:

  • Limited taxonomic assignment precision vs. precise taxonomic assignment
  • Fast, large database vs. slow, limited database
  • Identification of strain-level variation

SLIDE 8

Metagenomic alignment

  • Ecosystem → sequencing → alignment on reference genomes
  • Mismatches and unaligned reads, caused by sequencing errors & absence of a good reference genome
  • Choice of alignment parameters

SLIDE 9

Metagenomic alignment

  • Ecosystem → sequencing → alignment on reference genomes
  • Regions with high read coverage: repeated regions, transposable elements, conserved regions
  • Heterogeneous sequencing depth: low abundance, high abundance
  • Choice of how to clean alignment results

SLIDE 10

Coverage of genomes

Figure panels:

  • Close strain – intermediate abundance
  • Absent strain
  • Very close strain – high abundance

SLIDE 11

Characteristics of alignment

Software: Bowtie (Langmead, Genome Biology 2009)

  • 3 mismatches allowed (-v 3)
  • If several best hits, choose one randomly (-a --best --strata -M 1)

Select reads that align on CDS. Filter out some CDS:

  • Annotated as integrase, transposase, IS, phage
  • Length < 300 nt
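
The CDS filter above could be sketched as follows. This is a minimal illustration, not the tool's code: the `Cds` record, the keyword matching on the product text, and the expansion of "IS" to "insertion sequence" are all assumptions; only the keyword list and the 300 nt threshold come from the slide.

```python
from dataclasses import dataclass

# Keywords from the slide; matching them inside the product text is an assumption.
EXCLUDED_KEYWORDS = ("integrase", "transposase", "insertion sequence", "phage")
MIN_CDS_LENGTH = 300  # nt, from the slide


@dataclass
class Cds:
    """Hypothetical CDS record (coordinates follow the GFF convention)."""
    name: str
    product: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def keep_cds(cds: Cds) -> bool:
    """Return True if the CDS is long enough and not a mobile/phage element."""
    if cds.length < MIN_CDS_LENGTH:
        return False
    product = cds.product.lower()
    return not any(keyword in product for keyword in EXCLUDED_KEYWORDS)


def filter_cds(cds_list):
    """Keep only the CDS that pass both filters, preserving order."""
    return [cds for cds in cds_list if keep_cds(cds)]
```

A plain substring match is the simplest choice here; a real annotation file would likely need a more careful vocabulary.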

SLIDE 12

Characteristics of mapping

Samtools & bedtools:

  • Identify variant positions
  • VCF file

Compute expected coverage

  • Fraction of genome that should be covered by at least one read if the genome is present
  • Lander & Waterman statistics

$$ C = 1 - \exp\left(-\frac{\text{ReadNumber} \times \text{ReadLength}}{\text{GenomeLength}}\right) $$

Figure: observed distribution vs. expected distribution

htslib.org
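
As a worked illustration of the Lander & Waterman expected coverage, here is a minimal Python sketch. The function name and the example numbers are hypothetical; only the formula, C = 1 - exp(-ReadNumber × ReadLength / GenomeLength), comes from the slide.

```python
import math


def expected_coverage_fraction(read_number, read_length, genome_length):
    """Fraction of the genome expected to be covered by at least one read."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Mean sequencing depth: total sequenced bases divided by genome size.
    depth = read_number * read_length / genome_length
    return 1.0 - math.exp(-depth)


# Hypothetical example: 20,000 reads of 100 nt on a 2 Mb genome (depth 1x).
print(round(expected_coverage_fraction(20_000, 100, 2_000_000), 3))
```

At a mean depth of 1x, roughly 63% of positions are expected to be covered, which is why a low-abundance genome can be present even though most of it shows no reads.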

SLIDE 13

Schema

  • Inputs: reference genomes database (GenBank), gene annotations (GFF), metagenome (FASTQ)
  • Reference creation → genome indexes
  • Alignment (Bowtie) → reads alignment (BAM)
  • Summary (Samtools, Bedtools) → summary for each genome (CSV), reads for each CDS (GFF)
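
The steps of this schema could be chained roughly as below. This is only a sketch under stated assumptions: the file names and the `build_commands` helper are hypothetical, the Bowtie options are the ones listed on the alignment slide, and the exact commands used by the tool are not given in the presentation. The function only builds the command lines; it does not run them.

```python
import shlex


def build_commands(reference_fasta, reads_fastq, cds_gff, prefix):
    """Return the command lines of a hypothetical index/align/summarize run."""
    index = f"{prefix}_index"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Reference creation: build the Bowtie index of the reference genomes.
        ["bowtie-build", reference_fasta, index],
        # Alignment with the options listed on the alignment slide, SAM output.
        ["bowtie", "-v", "3", "-a", "--best", "--strata", "-M", "1",
         "-S", index, reads_fastq, sam],
        # Sort and index the alignments (BAM).
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Per-CDS coverage from the gene annotations (GFF).
        ["bedtools", "coverage", "-a", cds_gff, "-b", bam],
    ]


for command in build_commands("genomes.fasta", "sample.fastq", "cds.gff", "sample"):
    print(shlex.join(command))
```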

SLIDE 14

Software output

Summary for each genome (CSV):

  • Genome name
  • CDS number
  • % CDS with at least 1 read
  • Mean, median, sd coverage
  • % positions covered by reads
  • Expected % positions covered by reads (Lander & Waterman)
  • Number of variant positions

Reads for each CDS (GFF):

  • CDS localization
  • CDS name & product
  • CDS length, length covered by reads & number of positions with mismatches
  • CDS coverage
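
Some of the per-genome summary fields can be derived from per-position read depths, as in this minimal sketch. The function name, the dictionary keys, and the use of the population standard deviation are assumptions; the presentation does not say how the tool computes these values.

```python
import statistics


def summarize_depth(depths):
    """Summarize a list of per-position read depths for one genome."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth > 0)
    return {
        "mean_coverage": statistics.fmean(depths),
        "median_coverage": statistics.median(depths),
        # Population standard deviation; the tool may use the sample version.
        "sd_coverage": statistics.pstdev(depths),
        "pct_positions_covered": 100.0 * covered / len(depths),
    }


# Hypothetical depths for a 4-position toy genome.
print(summarize_depth([0, 0, 2, 2]))
```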

SLIDE 15

A dedicated dairy database

  • Based on organisms known to be in dairy products
  • Database enrichment: sequencing and assembly of new species isolated from dairy products (150 bacterial species & 15 filamentous fungi and yeasts)
  • 4000 genomes, manually selected
  • Work in progress:
      • Use text mining to:
          • Identify dairy species in the literature
          • Identify the habitat of species found in metagenomics (for example: sea for salt bacteria)
      • Annotation enrichment: genes of technological interest

(Almeida et al. 2014 BMC Genomics)
Collaboration with C. Nedellec's team, MaIAGE

SLIDE 16

Web interface & server

Quentin Cavaillé, Thibaut Guirimand, Sandra Dérozier, Pierre Renault, Valentin Loux

  • User friendly interface
  • Public/private genomes and samples
  • Personalized analyses

SLIDE 17

Tchapalo ecosystem

Racha ZAARIR

  • Tchapalo: traditional beer in Côte d’Ivoire
  • Mean production: 38,000 t/year
  • Daily family consumption
  • Income-generating economic activity
  • Production process: sorghum malt goes through a double fermentation:
      • Natural lactic fermentation => sour wort
      • Alcoholic fermentation => Tchapalo

SLIDE 18

Tchapalo ecosystem analysis

Figure: metagenomic analysis compared with microbiology analysis (values shown: 72.3%, 25.1%, 80.2%, 15.9%)

SLIDE 19

Tchapalo ecosystem: abundant species

genome                                              % CDS covered  meanCoverage  % coverage  Expected % coverage
Lactobacillus fermentum S6                          100            54.979        99.215      100
Lactobacillus delbrueckii subsp. lactis KCCM 34717  95.503         150.326       91.717      100
Lactobacillus delbrueckii subsp. jakobsenii         99.669         164.759       99.119      100

  • The strain Lactobacillus fermentum S6 is very close to the strain of the ecosystem
  • Lactobacillus delbrueckii subsp. jakobsenii is closer to the strain of the ecosystem than Lactobacillus delbrueckii subsp. lactis KCCM 34717

SLIDE 20

Tchapalo ecosystem: low-abundance species

genome                              % CDS covered  meanCoverage  % coverage  Expected % coverage  # reads
Saccharomyces cerevisiae YJM326     90.727         0.094         8.145       8.908                21405
Pediococcus acidilactici DSM 20284  81.706         0.577         9.418       46.005               28706

  • The strain Saccharomyces cerevisiae YJM326 is very close to the strain of the ecosystem
  • Pediococcus acidilactici DSM 20284 is absent from the ecosystem (reads coming from other Lactobacillaceae)
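
The reasoning on this slide, comparing observed and expected coverage, can be illustrated with a small sketch. The percentages are the ones reported on the slide; the function names and the 0.5 cut-off are hypothetical, since the presentation gives no decision rule.

```python
def coverage_ratio(observed_pct, expected_pct):
    """Ratio of observed to expected % of positions covered."""
    if expected_pct <= 0:
        raise ValueError("expected_pct must be positive")
    return observed_pct / expected_pct


def looks_present(observed_pct, expected_pct, min_ratio=0.5):
    """Hypothetical rule: call the genome present if the ratio is high enough."""
    return coverage_ratio(observed_pct, expected_pct) >= min_ratio


# (genome, % coverage, expected % coverage) as reported on the slide.
rows = [
    ("Saccharomyces cerevisiae YJM326", 8.145, 8.908),
    ("Pediococcus acidilactici DSM 20284", 9.418, 46.005),
]
for name, observed, expected in rows:
    print(name, round(coverage_ratio(observed, expected), 2),
          looks_present(observed, expected))
```

With these values the ratio is about 0.91 for S. cerevisiae and about 0.20 for P. acidilactici, in line with the slide's reading that the first is present at low abundance and the second is absent.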

SLIDE 21

Conclusion

Providing a user-friendly tool for metagenomic analysis: reference genome database, metagenomic software, web interface

  • Will be publicly available for research purposes
  • An account on the INRA Migale platform is required
  • Software and database development is still ongoing

Schema: reference genomes database (GenBank) and gene annotations (GFF) → reference creation → genome indexes; metagenome (FASTQ) → alignment (Bowtie) → reads alignment (BAM) → summary (Samtools, Bedtools) → summary for each genome (CSV), reads for each gene (GFF)

SLIDE 22

Perspectives

  • Genome pre-selection using a faster method (k-mer or Burrows–Wheeler transform) to speed up computation
  • Allow comparison of metagenome analyses
  • Apply it to the MetaPDOCheese project (next slide)
  • Application to other ecosystems with enough reference genomes (for example: fermented food, animal digestive ecosystems…)
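
To give an idea of what a k-mer based genome pre-selection might look like, here is a toy sketch. It is not the authors' planned method: the function names, the shared-fraction criterion, and the thresholds are all assumptions, and reverse complements are ignored for brevity.

```python
def kmers(sequence, k):
    """Return the set of k-mers of a sequence (forward strand only)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def preselect_genomes(reads, genomes, k=21, min_shared_fraction=0.1):
    """Keep genomes sharing enough of the reads' k-mers.

    reads: iterable of read sequences.
    genomes: dict mapping genome name to sequence.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    if not read_kmers:
        return []
    selected = []
    for name, sequence in genomes.items():
        shared = len(read_kmers & kmers(sequence, k))
        if shared / len(read_kmers) >= min_shared_fraction:
            selected.append(name)
    return selected


# Toy example with very short sequences and k=4.
toy_genomes = {"g1": "ACGTACGTTAGC", "g2": "GGGGGGGGCCCC"}
print(preselect_genomes(["ACGTACGT", "GTTAGC"], toy_genomes, k=4))
```

Only the pre-selected genomes would then go through the slower alignment step.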

SLIDE 23

MetaPDO Cheese Project

INRA: MaIAGE (S. Dérozier, V. Loux, M. Mariadassou, C. Nedellec, Q. Cavaillé), Micalis (P. Renault, T. Guirimand, B. Dridi, C. Pauvert), GMPA (F. Irlinger), URF (C. Delbès), CNIEL

  • Compare ecosystems of the same PDO area
  • Follow the ecosystem over the time scale of cheesemaking
  • What are the structural and functional diversities of cheese ecosystems?
  • What are the evolutionary mechanisms of microbial populations?
  • 44 Protected Designation of Origin French cheeses
  • 1200 samples with 16S & ITS sequencing
  • Some samples with shotgun sequencing
  • Sequencing of 100 new genomes

SLIDE 24

Thanks to

StatInfOmics and Bibliome teams, Migale platform: Robert Bossy, Quentin Cavaillé, Estelle Chaix, Hélène Chiapello, Louise Deleger, Sandra Dérozier, Valentin Loux, Mahendra Mariadassou, Claire Nédellec, Pierre Nicolas, Sophie Schbath

Micalis: Pierre Renault, Charlie Pauvert, Thibaut Guirimand, Bédis Dridi, Racha Zaarir

SLIDE 25

Sept 9, 2018

SLIDE 26

Figure axes: % ID; positions covered (100 nt reads) / positions covered (35 nt reads)

SLIDE 27

Figure axes: % ID; number of reads (100 nt) / number of reads (35 nt)

SLIDE 28

Cheese ecosystems

From: Irlinger et al. FEMS Microbiol Lett. 2014

SLIDE 29

Challenges of taxonomic assignment (or not? Also challenges for functions?)

  • We don’t have reference genomes for each strain of the ecosystem
  • Some genera have many reference genomes, others have none
  • Impossible to sequence every strain (non-cultivable species, cost of DNA extraction, sequencing and storage…)
  • Computational challenge: impossible to compare reads to every sequenced genome
      • November 2017: 124,481 prokaryotic genomes
      • A metagenome: millions of reads per sample

Figure labels: tree of life, reference genomes, ecosystem strains

SLIDE 30

GeDI method

  • Sequencing bias & repeated regions
  • Heterogeneous genome coverage
  • Artefact
  • Alignment on close genome
  • Strains present in different proportions

Figure: % coverage along gene position / genome position (panel: very close strain – high abundance)

SLIDE 31

Knowledge of cheese micro-organisms

Sources of micro-organisms:

  • Starters
  • Micro-organisms from animal milk, water flows, airflows
  • Micro-organisms from salt
  • Ripening cultures
  • Micro-organisms from shelves, cellar

Diagram labels:

  • Inoculated micro-organisms; house microbiota
  • Defined starter cultures; undefined complex starters
  • Known; not completely known (shown twice)
  • More vulnerable to bacteriophage attack
  • “Domesticated cultures”