
Statistical analysis of meta-omics data (Sandra Plancade, INRA)



  1. Statistical analysis of meta-omics data. Sandra Plancade, INRA (French Institute of Research in Agriculture). 24 February 2016.

  2. Outline: (1) Presentation of meta-omics; (2) Sequencing of metagenomics data; (3) Statistical analysis of metagenomics data; (4) Some of my topics of interest.

  3. Outline: (1) Presentation of meta-omics; (2) Sequencing of metagenomics data; (3) Statistical analysis of metagenomics data; (4) Some of my topics of interest.

  4. Microbial ecosystems
     - Microbial ecosystem = population of bacteria that interact in a given environment. → Examples: soil, sea water, gut.
     - A varying proportion of the bacteria have neither been genotyped nor can be cultivated.
     - Before metagenomics: analysis of bacterial cultures.
     - Metagenomics = analysis of the bacterial genes in a given biological sample (≠ genomics = analysis of the genome of a given organism).
     - Metagenomics was made possible by technological advances. → NGS (next-generation sequencing).

  5. Meta-omics data
     - Meta-omics data = omics data measured on a population of bacteria in a given environment.
     - Metagenomics data = DNA of the bacteria. Two types of measurement:
       - the 16S gene only, characteristic of the species;
       - all genes (Whole Genome Sequencing).
     - → Widely studied.
     - Meta-transcriptomics data = RNA of the bacteria.
     - Meta-proteomics data = proteins of the bacteria.
     - → New.
     - Diagram: DNA → RNA → proteins → function, with the corresponding fields genomics, transcriptomics, proteomics, metabolomics.

  6. Outline: (1) Presentation of meta-omics; (2) Sequencing of metagenomics data; (3) Statistical analysis of metagenomics data; (4) Some of my topics of interest.

  7. Metagenomics WGS (Whole Genome Sequencing), or shotgun sequencing
     - Biological sample: a population of bacteria.
     - Next-generation sequencing: genes are cut into small sequences that are "read" by the machine.
     - Output: a list of 30-100 million reads (e.g. AGGCTGCCA, GCCATTCAGTCA, GCAGGCTA, ...).

  8. Construction of a catalogue from a large number of samples
     - The reads from samples 1 to n (e.g. AGGCTGCCA, GCCATTCAGTCA, GTACGTAAG, AGCCTAGTCT, ...) are pooled.
     - The pool of reads is assembled with a de Bruijn graph (e.g. CGCAAT and GCAATCG overlap to give CGCAATCG), which yields long sequences of nucleotides (e.g. CGCATTTGAGCTAGCCTAGCATCGAGG).
     - Delimitation of genes: sequences characteristic of the beginning and end of a gene.
     - Result: the metagenomics catalogue (e.g. CGCATTTG, AGCTAGCCTA, GCATCGAGGC, CTTA).
     - → In the gut, the MetaHIT catalogue contains 10 million genes.
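
The assembly step on this slide can be illustrated with a toy de Bruijn graph. This is only a minimal sketch, not the assembler used to build the catalogue: the function names and the choice k = 4 are assumptions made for illustration, and real assemblers also handle sequencing errors, branching and reverse complements. The reads are the two short sequences shown on the slide.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Return the set of edges between (k-1)-mers, one edge per distinct k-mer."""
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.add((kmer[:-1], kmer[1:]))  # prefix -> suffix
    return edges


def walk_single_path(edges):
    """Spell the contig if the graph is one simple path, else return None."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    for source, target in edges:
        successors[source].append(target)
        in_degree[target] += 1
    nodes = set(successors) | set(in_degree)
    starts = [node for node in nodes if in_degree[node] == 0]
    branching = any(len(nexts) > 1 for nexts in successors.values())
    merging = any(degree > 1 for degree in in_degree.values())
    if len(starts) != 1 or branching or merging:
        return None  # ambiguous graph: out of scope for this sketch
    node = starts[0]
    contig = node
    while successors.get(node):
        node = successors[node][0]
        contig += node[-1]  # each step adds one nucleotide
    return contig


# Toy reads taken from the slide; k = 4 is an arbitrary illustrative choice.
reads = ["CGCAAT", "GCAATCG"]
contig = walk_single_path(de_bruijn_edges(reads, k=4))
print(contig)
```

With these two reads the graph is a single unbranched path, so the walk recovers the merged sequence shown on the slide.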

  9. Compute metagenomic abundances in a biological sample
     - Reads from a biological sample (e.g. AGGCTGCCA, GCCATTCAGTCA, GCAGGCTA, ...) are mapped on the catalogue (10 M genes).
     - Gene counts = number of reads mapped on each gene of the catalogue.
     - Over n samples, this gives a matrix of abundances $A_{i,j}$ (n samples × 10 M genes), where
       $$\text{abundance of gene } g = \frac{\text{counts of gene } g}{(\text{length of gene } g) \times (\#\,\text{reads mapped})}$$
     - Characteristics of the data:
       - high technical variability;
       - very large dimension: $\log(p) > n$;
       - in the gut, 200-500,000 genes present in each sample: high sparsity.
     - Dimension reduction:
       - grouping of genes based on sequence (similarity between proteins translated in silico): COG (Clusters of Orthologous Genes). → Functional grouping.
       - MGS (MetaGenomic Species): grouping by covariance of abundances.
       - Gene annotation (KEGG): database of genes whose function has been identified. → Limited to known bacterial genes.
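
The normalisation on this slide divides each gene count by the gene length and by the number of mapped reads. A minimal NumPy sketch follows. It assumes a counts matrix with samples in rows and genes in columns, and it assumes that "# reads mapped" means the total number of mapped reads in the sample; the function name `gene_abundances` is hypothetical.

```python
import numpy as np


def gene_abundances(counts, gene_lengths):
    """Normalise a (samples x genes) count matrix by gene length and depth.

    Assumes every sample has at least one mapped read (otherwise the
    division below would be by zero).
    """
    counts = np.asarray(counts, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)
    # Total number of mapped reads per sample, kept as a column vector.
    reads_mapped = counts.sum(axis=1, keepdims=True)
    return counts / (gene_lengths[np.newaxis, :] * reads_mapped)


# Toy data: 2 samples, 3 genes (values invented for illustration).
toy_counts = [[10, 0, 30], [5, 5, 0]]
toy_lengths = [1000, 500, 3000]
print(gene_abundances(toy_counts, toy_lengths))
```

In the first toy sample, genes 1 and 3 get the same abundance although their counts differ by a factor of three, because gene 3 is three times longer.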

  10. 16S metagenomics data
      - 16S: gene characteristic of the species.
      - Data: matrix of abundances of bacterial species (100 to 1000 variables).
      - Phylogenetic tree: tree that represents the evolutionary relationships between species.
        - → Built from distances between the nucleotide sequences of the 16S genes.
        - → Structure in the variables.

  11. Comparison 16S / WGS
      - 16S:
        - less expensive;
        - more widely used (⇒ more specific statistical methods);
        - less technical variability;
        - ecology questions: species present or absent in given conditions, co-presence...
      - WGS:
        - large number of variables;
        - high technical variability;
        - functional analysis.
      - Point of controversy: phylogenetic grouping corresponds approximately to functional grouping.

  12. To sum up, metagenomics data are: of large or very large dimension; (very) noisy; highly correlated; sparse; potentially structured.

  13. Other meta-omics data
      - Meta-transcriptomics: similar to metagenomics.
      - Meta-proteomics and metabolomics: technologies similar to those used in omics (GC-MS, MS-MS):
        - fractioning of molecules (metabolites/proteins) into fragments (ions/peptides);
        - identification of the fragments by comparing their m/z spectra with a database of peptides/ions;
        - recovery of the abundances of the molecules.
      - Difficulty: identification requires alignment, which is more difficult for molecules present in few biological samples.

  14. Outline: (1) Presentation of meta-omics; (2) Sequencing of metagenomics data; (3) Statistical analysis of metagenomics data; (4) Some of my topics of interest.

  15. General biological issues
      - Ecology: description of the species present in the environment.
        - Differences between conditions (e.g. comparison of soil samples from different geographical areas).
        - Co-presence of species.
      - Functionality: how does the microbiota work?
        - Interactions between bacteria.
        - Link between the microbiota and phenotypes/omics data.
      - → The related statistical questions may be imprecise.

  16. Usual statistical approaches
      - Multiple testing (differential analysis):
        - zero-inflated parametric models;
        - permutation tests [White et al., PLoS Comput. Biol. 2009].
      - Mixed models (multiple time points) [Le Cao et al. 2015]:
        $$X_i^j(t) = \underbrace{f^j(t)}_{\text{time effect: splines}} + \underbrace{\alpha_i^j + \beta_i^j\, t}_{\text{random individual effect}} + \varepsilon_{i,j}(t)$$
      - Adaptation of multivariate analysis methods:
        - centered log-ratio (CLR) transformation + methods based on correlation (PLS...);
        - variance decomposition (multi-site measurements);
        - methods based on distance matrices;
        - penalisation constraining structure based on phylogenetic trees [Chen 2012].
      - Variable selection by sparse multivariate methods.
      - Bi-clustering: non-negative matrix factorization.
      - Network inference: GGM (Gaussian graphical models).
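
The centered log-ratio transformation listed on this slide can be sketched in a few lines of NumPy. This is an illustrative sketch only: the slide does not say how zero counts are handled, so the pseudocount of 0.5 added before taking logarithms is an assumption, as is the function name `clr`. Samples are assumed to be in rows.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a (samples x variables) matrix.

    The pseudocount is an assumed way of avoiding log(0) in sparse data;
    other zero-replacement strategies exist.
    """
    shifted = np.asarray(counts, dtype=float) + pseudocount
    log_values = np.log(shifted)
    # Subtract the per-sample mean of the logs, i.e. the log geometric mean.
    return log_values - log_values.mean(axis=1, keepdims=True)


# Toy counts: 2 samples, 3 species (values invented for illustration).
toy = [[1, 1, 1], [0, 10, 100]]
print(clr(toy))
```

Each transformed row sums to zero, which removes the dependence on sequencing depth before correlation-based methods such as PLS are applied.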

  17. Example of analysis based on distance matrices
      - Goal: test the effect of breed on the rumen microbiota of cows.
      - Data:
        - $(X_{u,k})$, $u = 1, \dots, N$, $k = 1, \dots, p$: 16S measurements of the abundances of $p$ bacterial species for $N$ cows;
        - $Y_u \in \{1, \dots, a\}$: breeds.
      - "ANOVA" notation: $X_{i,j,k}$, with $i = 1, \dots, a$ the category (breed), $j = 1, \dots, n$ the repetition (cow), $k = 1, \dots, p$ the variable (species).

  18. Example of analysis based on distance matrices (continued). Same goal, data and notation as on the previous slide, plus: UniFrac distance, based on phylogeny, between two 16S samples. [Figure: tree whose species are marked "o" (sample 1) or "x" (sample 2).]
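
A phylogeny-based distance of this kind can be sketched as follows. The slides do not say whether the weighted or unweighted variant is used, so this sketch assumes unweighted UniFrac: the total length of the branches leading to species from only one of the two samples, divided by the total length of the branches leading to species from either sample. The tree encoding (a dict mapping each node to its parent and branch length), the toy tree and all names are hypothetical, and conventions about branches near the root differ between implementations.

```python
def covered_edges(leaves, parent):
    """Edges on the paths from the given leaves up to the root.

    Each edge is identified by its child node; the root has no entry
    in `parent`.
    """
    edges = set()
    for node in leaves:
        while node in parent:
            edges.add(node)
            node = parent[node][0]
    return edges


def total_length(edges, parent):
    """Sum of the branch lengths of a set of edges."""
    return sum(parent[edge][1] for edge in edges)


def unweighted_unifrac(leaves_a, leaves_b, parent):
    """Unshared branch length divided by total covered branch length."""
    edges_a = covered_edges(leaves_a, parent)
    edges_b = covered_edges(leaves_b, parent)
    total = total_length(edges_a | edges_b, parent)
    if total == 0:
        return 0.0
    return total_length(edges_a ^ edges_b, parent) / total


# Hypothetical toy tree: root -> n1 -> {A, B}, root -> n2 -> {C, D},
# every branch of length 1. Format: child -> (parent, branch length).
toy_parent = {
    "n1": ("root", 1.0), "n2": ("root", 1.0),
    "A": ("n1", 1.0), "B": ("n1", 1.0),
    "C": ("n2", 1.0), "D": ("n2", 1.0),
}
print(unweighted_unifrac({"A", "B"}, {"B", "C"}, toy_parent))
```

In the toy call, the branches to A, C and n2 are used by only one sample and five branches are used in total, so three fifths of the covered tree is unshared.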

  19. Example of analysis based on distance matrices (continued). Same content as on the previous slide; the figure now distinguishes the shared edges from the unshared edges of the tree between sample 1 ("o") and sample 2 ("x").
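
The slides available here stop before naming the test applied to the distance matrix. One common way to test a group effect from pairwise distances is a permutation test on a pseudo-F statistic (PERMANOVA-style), which reuses the "ANOVA" decomposition with squared distances in place of squared deviations. The sketch below shows that general technique as an assumption, not the author's method; the function names, the number of permutations and the seed are illustrative choices.

```python
import numpy as np


def pseudo_f(dist, groups):
    """Pseudo-F statistic computed from a square matrix of pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n_total = len(groups)
    labels = np.unique(groups)
    squared = dist ** 2
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = squared[np.triu_indices(n_total, k=1)].sum() / n_total
    # Within-group sum of squares, group by group.
    ss_within = 0.0
    for label in labels:
        idx = np.where(groups == label)[0]
        block = squared[np.ix_(idx, idx)]
        ss_within += block[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    a = len(labels)
    # Assumes ss_within > 0, i.e. the samples within a group are not all identical.
    return (ss_between / (a - 1)) / (ss_within / (n_total - a))


def permutation_test(dist, groups, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling the group labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = pseudo_f(dist, groups)
    permuted = np.array(
        [pseudo_f(dist, rng.permutation(groups)) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(permuted >= observed)) / (n_perm + 1)
    return observed, p_value
```

A typical call passes the N × N matrix of distances between cows and the vector of breed labels, for example `permutation_test(dist, breeds)`, where `dist` and `breeds` are placeholders for the user's own data.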
