Statistical analysis of meta-omics data, Sandra Plancade, INRA

Statistical analysis of meta-omics data
Sandra Plancade, INRA (French Institute of Research in Agriculture)
24 February 2016

1. Presentation of meta-omics
2. Sequencing of metagenomics data
3. Statistical analysis of metagenomics data
4. Some of my topics of interest

Microbial ecosystems
Microbial ecosystem = population of bacteria that interact in a given environment.
→ Examples: soil, sea water, gut.
A varying proportion of the bacteria are neither genotyped nor cultivable.
Before metagenomics: analysis of bacterial cultures.
Metagenomics = analysis of bacterial genes in a given biological sample (≠ genomics = analysis of the genome of a given organism).
Metagenomics was made possible by technological advances.
→ NGS (next-generation sequencing).

Meta-omics data
Meta-omics data = omics data measured on a population of bacteria in a given environment.
Metagenomics data = DNA of bacteria. Two types of measurement:
- only the 16S gene, characteristic of the species
- all genes (Whole Genome Sequencing)
→ Widely studied.
Meta-transcriptomics data = RNA of bacteria.
Meta-proteomics data = proteins of bacteria.
→ New.
[Diagram: DNA → RNA → proteins → function, with the matching omics levels: genomics, transcriptomics, proteomics, metabolomics.]

Metagenomics WGS (Whole Genome Sequencing), or shotgun
[Figure: biological sample (population of bacteria) → next-generation sequencing: genes are cut into small sequences that are « read » by the machine → list of 30-100 million reads, e.g. AGGCTGCCA, GCCATTCAGTCA, GCAGGCTA, ...]

Construction of a catalogue from a large number of samples
[Figure: the reads of sample 1, ..., sample n (AGGCTGCCA, GCCATTCAGTCA, GTACGTAAG, AGCCTAGTCT, ...) go through the steps below.]
- Pool of reads.
- Assembly by de Bruijn graph: overlapping reads are merged (CGCAAT and GCAATCG give CGCAATCG) into a long sequence of nucleotides (CGCATTTGAGCTAGCCTAGCATCGAGG).
- Delimitation of genes: sequences characteristic of the beginning/end of a gene.
- Metagenomics catalogue: CGCATTTG, AGCTAGCCTA, GCATCGAGGC, CTTA, ...
→ In the gut, the MetaHIT catalogue contains 10 million genes.
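The assembly step can be illustrated with a minimal sketch of a de Bruijn graph, using the two reads from the slide. The function names, the choice k = 4 and the naive path walk are assumptions made for illustration; real assemblers also handle sequencing errors, branching and reverse complements.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor.

    Simplification (assumption): only out-degree is checked, and the walk
    stops as soon as it would revisit a node.
    """
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

# The two overlapping reads shown on the slide.
reads = ["CGCAAT", "GCAATCG"]
graph = de_bruijn_graph(reads, k=4)
contig = walk_unambiguous_path(graph, "CGC")
print(contig)
```

With these two reads the walk recovers the merged sequence shown on the slide.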

Computing metagenomic abundances in a biological sample
[Figure: the reads from a biological sample (AGGCTGCCA, GCCATTCAGTCA, GCAGGCTA, ...) are « mapped » on the catalogue of 10 M genes. Gene counts = # reads mapped on each gene of the catalogue. Result: matrix of abundances A_{i,j}, n samples × 10 M genes.]
$$\text{Abundance of gene } g = \frac{\text{counts of gene } g}{(\text{length of gene } g) \times (\#\,\text{reads mapped})}$$
Characteristics of the data
- High technical variability
- Very large dimension: log(p) > n
- In the gut, 200,000-500,000 genes present in each sample: high sparsity
Dimension reduction
- Grouping of genes based on sequence (similarity between proteins translated in silico): COG (Clusters of Orthologous Groups). → Functional grouping.
- MGS (MetaGenomic Species): grouping by covariance of abundances.
- Gene annotation (KEGG): bank of genes whose function has been identified. → Limited to known bacterial genes.
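As a small worked example of the abundance formula, the sketch below normalises raw gene counts by gene length and by the number of mapped reads. It assumes that "# reads mapped" means the total number of reads mapped in the sample; the gene names, counts and lengths are hypothetical.

```python
def gene_abundances(counts, lengths):
    """Normalise raw counts by gene length and by the sample's mapped reads.

    Assumption: "# reads mapped" is the sum of the counts over all genes
    of the catalogue for this sample.
    """
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        raise ValueError("no read was mapped in this sample")
    return {
        gene: count / (lengths[gene] * total_mapped)
        for gene, count in counts.items()
    }

# Hypothetical counts and gene lengths (in nucleotides) for one sample.
counts = {"geneA": 30, "geneB": 10, "geneC": 0}
lengths = {"geneA": 1500, "geneB": 500, "geneC": 900}

abundances = gene_abundances(counts, lengths)
for gene, value in abundances.items():
    print(gene, value)
```

Here geneA has three times the count of geneB but is also three times longer, so both get the same abundance, which is the point of the length correction.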

16S metagenomics data
16S: gene characteristic of the species.
Data: matrix of abundances of bacterial species (100 to 1,000 variables).
Phylogenetic tree: tree that represents the evolutionary relationships between species.
→ Built from distances between the nucleotide sequences of the 16S genes.
→ Structure in the variables.

Comparison 16S/WGS
16S
- Less expensive
- More widely used (⇒ more specific statistical methods)
- Less technical variability
- Ecology issues: species present/absent in given conditions, co-presence...
WGS
- Large number of variables
- High technical variability
- Functional analysis
Controversy: phylogenetic grouping corresponds approximately to functional grouping.

To sum up, metagenomics data are:
- of large/very large dimension
- (very) noisy
- highly correlated
- sparse
- potentially structured

Other meta-omics data
Meta-transcriptomics: similar to metagenomics.
Meta-proteomics and metabolomics: technologies similar to omics (GC-MS, MS-MS)
- Fractionation of molecules (metabolites/proteins) into fragments (ions/peptides)
- Identification of fragments by their m/z spectra, compared to a bank of peptides/ions
- Recovery of molecule abundances
Difficulty: identification requires alignment, which is more difficult for molecules present in few biological samples.

General biological issues
Ecology: description of the species present in the environment.
- Differences between conditions (e.g. comparison of soil samples from different geographic areas)
- Co-presence of species
Functionality: how does the microbiota work?
- Interactions between bacteria
- Link between microbiota and phenotypes/omics data
→ The related statistical questions may be imprecise.

Usual statistical approaches
Multiple testing (differential analysis)
- Zero-inflated parametric models
- Permutation tests [White et al., PLoS Comput. Biol. 2009]
Mixed models (multiple time points) [Le Cao et al 2015]
$$X_i^j(t) = \underbrace{f^j(t)}_{\text{time effect: splines}} + \underbrace{\alpha_i^j + \beta_i^j t}_{\text{random individual effect}} + \varepsilon_{i,j}(t)$$
Adaptation of multivariate analysis methods
- Centered Log-Ratio transformation + methods based on correlation (PLS...)
- Variance decomposition (multi-site measurements)
- Methods based on distance matrices
- Penalization constraining the structure based on phylogenetic trees [Chen 2012]
Variable selection by sparse multivariate methods
Bi-clustering: Non-negative Matrix Factorization
Network inference: GGM (Gaussian graphical models)
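The Centered Log-Ratio transformation mentioned above can be sketched as follows: each abundance is log-transformed and centred by the mean log-abundance of the sample. The pseudocount used to handle zero counts is an assumption of this sketch (the slides do not say how zeros are treated), and the example counts are hypothetical.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered Log-Ratio transform of one sample.

    clr(x)_k = log(x_k) - mean over k of log(x_k).
    Assumption: a pseudocount is added to every count so that zeros can be
    logged; with pseudocount=0 any zero count raises ValueError.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical abundances of four species in one sample.
sample = [120, 30, 0, 50]
clr_sample = clr(sample)
print(clr_sample)

# Without pseudocount the transform does not depend on sequencing depth.
shallow = clr([1, 2, 4], pseudocount=0.0)
deep = clr([10, 20, 40], pseudocount=0.0)
```

The transformed values of a sample sum to zero, and multiplying all counts by a constant leaves them unchanged, which is why correlation-based methods such as PLS are usually applied after this step rather than on raw proportions.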



Example of analysis based on distance matrices
Goal: test the effect of breed on the rumen microbiota of cows.
Data:
- $(X_{u,k})$, $u = 1, \dots, N$, $k = 1, \dots, p$: 16S measurements of the abundances of $p$ bacterial species for $N$ cows
- $Y_u \in \{1, \dots, a\}$: breeds
- "ANOVA" notation: $X_{i,j,k}$, with $i = 1, \dots, a$: category (breed); $j = 1, \dots, n$: repetition (cow); $k = 1, \dots, p$: variable (species)
UniFrac distance, based on phylogeny, between two 16S samples.
[Figure: phylogenetic tree whose leaves are marked o (sample 1) or x (sample 2); edges are labelled as shared or unshared between the two samples.]
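The shared/unshared edges idea can be sketched with the unweighted variant of UniFrac: the total length of the branches leading to species from only one sample, divided by the total length of the branches leading to species from either sample. The slide does not say which variant is used, so the unweighted form is an assumption, and the toy tree, branch lengths and species names below are hypothetical.

```python
def branches_covered(parent, present_species):
    """Branches (named by their child node) on a path from a present species to the root."""
    covered = set()
    for leaf in present_species:
        node = leaf
        # Stop at the root, or as soon as the rest of the path is already covered.
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node][0]
    return covered

def unweighted_unifrac(parent, sample1, sample2):
    """Unshared branch length divided by total branch length covered by either sample."""
    b1 = branches_covered(parent, sample1)
    b2 = branches_covered(parent, sample2)
    total = sum(parent[b][1] for b in b1 | b2)
    if total == 0:
        raise ValueError("no branch is covered by the two samples")
    unshared = sum(parent[b][1] for b in b1 ^ b2)
    return unshared / total

# Hypothetical tree: node -> (parent node, branch length).
parent = {
    "n1": ("root", 1.0), "n2": ("root", 1.0),
    "A": ("n1", 1.0), "B": ("n1", 1.0),
    "C": ("n2", 1.0), "D": ("n2", 1.0),
}

d_close = unweighted_unifrac(parent, {"A", "B", "C"}, {"B", "C", "D"})
d_far = unweighted_unifrac(parent, {"A"}, {"C"})
print(d_close, d_far)
```

Two samples that share most of their species get a small distance, and two samples living on different sides of the tree get the maximal distance of 1. Computing this distance for every pair of cows gives the distance matrix on which the breed effect can then be tested.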
