
DuctApe: a tool for the analysis and correlation of genomic and high throughput phenotypic Biolog data


  1. Marco Galardini (@mgalactus). DuctApe: a tool for the analysis and correlation of genomic and high throughput phenotypic Biolog data. University of Florence, Microbial genetics lab, Florence computational biology group. 04/03/2013

  2. Who we are • Three bioinformatics groups from Unifi • Est. 2011 • Microbiology (clinical, agronomical, ecological) • Biological sequences information analysis • Bioinformatics software development • Italian Agricultural Research Council • Soil and agricultural microbiology • @combogenomics, combo.unifi@gmail.com, http://www.unifi.it/dbefcb

  3. Who we are: Other collaborations • Dipartimento di Scienze delle Produzioni Agroalimentari e dell'Ambiente • Bacterial genomics and phenomics • Phenotypic assays on chemical sensitivities

  4. Introduction • The wishing well: the genomics and phenomics era

  5. The wishing well: The genomics era • MacLean et al., 2009 • genomesonline.com

  6. The wishing well: The genomics era • Metabolic networks reconstruction • From genomes to metabolomes • High throughput genomics/metabolomics • http://www.genome.jp/kegg/

  7. The wishing well: The phenomics era • Many compounds on KEGG DB • High throughput phenomics • www.biolog.com

  8. Introduction • Genome data analysis: Genome map to KEGG; Pangenome prediction (core, accessory, unique) • Phenome data analysis: Metabolic activity parameters; Replica management; Clear comparisons; Clear visualizations; Compounds map to KEGG

  9. Introduction: How to combine genomic and phenomic data? • All data in a single metabolic map • Genetic basis for phenotypic differences

  10. The missing link • DuctApe: the missing link between genomics and phenomics

  11. The missing link: Three different experimental setups • Single strain(s) • Mutant(s): deletion / insertion mutants; correlation of mutated genes / different phenotypes • PanGenome: prediction of Core / Accessory / Unique genome; correlation between Dispensable genome and phenotypes

  12. The missing link: Three different modules • dgenome: genes are mapped to KEGG database; PanGenome prediction (Blast-BBH) • dphenome: Phenotype microarray data handling (sigmoid fit); classification of metabolic activity (Activity index); compounds are mapped to KEGG database • dape: generation of combined KEGG metabolic maps; metabolic network analysis (through graph algorithms); metabolic hotspots prediction

  13. The missing link: dgenome, genomics made easy

  14. dgenome: Genome map to KEGG (1) • Blast BBH on a local KEGG database* • Blast BBH using the KAAS web-server** • *Since July 1st, 2011, access to the KEGG FTP requires a $2000/$5000 licence • **Available for free, fast and reliable
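The bidirectional best hit (BBH) criterion named on this slide can be sketched as follows. This is a minimal illustration and not DuctApe's own code: it assumes the BLAST results have already been parsed into `(query, subject, bitscore)` tuples, and every name in it (`best_hits`, `bidirectional_best_hits`, the gene and ortholog identifiers) is hypothetical.

```python
def best_hits(hits):
    """Return {query: subject}, keeping the highest-scoring subject per query.

    `hits` is an iterable of (query, subject, bitscore) tuples
    (assumed input format, e.g. parsed from BLAST tabular output).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward_hits, reverse_hits):
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data with made-up identifiers: genome proteins vs. reference orthologs.
genome_vs_reference = [
    ("geneA", "K00001", 250.0),
    ("geneA", "K00002", 120.0),
    ("geneB", "K00002", 300.0),
    ("geneC", "K00002", 90.0),
]
reference_vs_genome = [
    ("K00001", "geneA", 245.0),
    ("K00002", "geneB", 310.0),
    ("K00002", "geneC", 95.0),
]

pairs = bidirectional_best_hits(genome_vs_reference, reference_vs_genome)
print(pairs)
```

In the toy data `geneC` also hits `K00002`, but the reverse search prefers `geneB`, so only `geneA` and `geneB` receive an assignment.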

  15. dgenome: Genome map to KEGG (2) • Fast, multi-threaded access to the KEGG public API • Detailed info on: Reactions, Compounds, Pathways

  16. dgenome: Pangenome prediction • Many genomes • Serial BBH • User-defined PanGenome • Core Genome (conserved pathways) • Dispensable Genome (variable pathways): Accessory Genome, Unique Genome
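The core / accessory / unique partition listed on this slide can be illustrated with a short sketch. It assumes orthologous groups have already been built (for instance by serial BBH) and are given as a mapping from a group identifier to the set of genomes carrying it; the function name, the group names and the strain names are hypothetical.

```python
def classify_pangenome(ortholog_groups, n_genomes):
    """Split ortholog groups into core, accessory and unique sets.

    core      : present in every genome
    unique    : present in exactly one genome
    accessory : present in more than one genome, but not in all
    """
    core, accessory, unique = set(), set(), set()
    for group, genomes in ortholog_groups.items():
        if len(genomes) == n_genomes:
            core.add(group)
        elif len(genomes) == 1:
            unique.add(group)
        else:
            accessory.add(group)
    return core, accessory, unique


# Toy example with three made-up strains.
groups = {
    "grp1": {"strainA", "strainB", "strainC"},
    "grp2": {"strainA", "strainB"},
    "grp3": {"strainC"},
    "grp4": {"strainA", "strainB", "strainC"},
}
core, accessory, unique = classify_pangenome(groups, n_genomes=3)
dispensable = accessory | unique  # the dispensable genome is everything non-core
print(sorted(core), sorted(accessory), sorted(unique))
```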

  17. The missing link: dphenome, painless high-throughput phenomics

  18. dphenome: From raw data to phenotypic variability • 1. Parsing

  19. dphenome: From raw data to phenotypic variability • 1. Parsing • 2. Control signal subtraction (optional)
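A minimal sketch of the optional control subtraction step, assuming each well is a list of readings taken at the same time points as the control well. Clipping negative differences to zero is an assumption of this sketch, not something the slide states, and `subtract_control` is a hypothetical name.

```python
def subtract_control(well_signal, control_signal):
    """Subtract the control well from a well, time point by time point.

    Negative differences are clipped to zero (assumption of this sketch).
    """
    if len(well_signal) != len(control_signal):
        raise ValueError("signals must have the same number of time points")
    return [max(0.0, w - c) for w, c in zip(well_signal, control_signal)]


control = [10.0, 12.0, 15.0, 15.0]
well = [9.0, 20.0, 60.0, 95.0]
corrected = subtract_control(well, control)
print(corrected)  # [0.0, 8.0, 45.0, 80.0]
```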

  20. dphenome: From raw data to phenotypic variability • 1. Parsing • 2. Control signal subtraction (optional) • 3. Signal refinement

  21. dphenome: From raw data to phenotypic variability • 1. Parsing • 2. Control signal subtraction (optional) • 3. Signal refinement • 4. Sigmoid fit
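The sigmoid fit step can be illustrated with a dependency-free sketch. The slide does not say which sigmoid model or optimizer DuctApe uses, so this example swaps in a plain three-parameter logistic curve and a brute-force grid search: for each candidate rate and midpoint, the plateau that minimizes the squared error has a closed form. All function and parameter names are hypothetical.

```python
import math


def logistic(t, plateau, rate, midpoint):
    """Three-parameter logistic curve (assumed model, not DuctApe's)."""
    return plateau / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_logistic(times, signal, rates, midpoints):
    """Grid search over rate and midpoint; plateau by linear least squares.

    Returns (plateau, rate, midpoint) with the lowest sum of squared errors.
    """
    best = None
    for rate in rates:
        for midpoint in midpoints:
            shape = [logistic(t, 1.0, rate, midpoint) for t in times]
            # For a fixed shape, the best plateau is sum(y*s) / sum(s*s).
            plateau = sum(y * s for y, s in zip(signal, shape)) / sum(
                s * s for s in shape
            )
            sse = sum((y - plateau * s) ** 2 for y, s in zip(signal, shape))
            if best is None or sse < best[0]:
                best = (sse, plateau, rate, midpoint)
    return best[1:]


# Synthetic curve: 48 hours sampled every 2 hours, known parameters.
times = list(range(0, 49, 2))
signal = [logistic(t, 200.0, 0.5, 20) for t in times]

plateau, rate, midpoint = fit_logistic(
    times,
    signal,
    rates=[i / 10 for i in range(1, 11)],  # 0.1 ... 1.0
    midpoints=range(0, 49),                # whole hours
)
print(round(plateau, 3), rate, midpoint)
```

A real pipeline would more likely use a proper non-linear optimizer; the grid search only keeps the example short and deterministic.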

  22. dphenome: From raw data to phenotypic variability • 5. Parameters extraction

  23. dphenome: From raw data to phenotypic variability • 5. Parameters extraction: Max, Min, Slope, Lag, Plateau, plus Area and Average height
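Several of the parameters named on this slide can be computed directly from a sampled curve, as in the sketch below. The definitions are assumptions made for illustration: slope is the steepest interval between consecutive readings, lag is where the tangent through that interval meets the minimum signal, and area uses the trapezoid rule. The plateau is left out because it would normally come from the fitted curve. The function name is hypothetical.

```python
def curve_parameters(times, signal):
    """Extract simple descriptive parameters from a sampled curve."""
    steps = list(zip(times, times[1:], signal, signal[1:]))
    slopes = [(y1 - y0) / (t1 - t0) for t0, t1, y0, y1 in steps]
    slope = max(slopes)
    i = slopes.index(slope)
    low = min(signal)

    # Tangent through the midpoint of the steepest interval, extended
    # down to the minimum signal, gives the lag time.
    if slope > 0:
        t_mid = (times[i] + times[i + 1]) / 2
        y_mid = (signal[i] + signal[i + 1]) / 2
        lag = t_mid - (y_mid - low) / slope
    else:
        lag = None  # flat or decreasing curve: no lag can be defined

    area = sum((t1 - t0) * (y0 + y1) / 2 for t0, t1, y0, y1 in steps)
    return {
        "min": low,
        "max": max(signal),
        "slope": slope,
        "lag": lag,
        "area": area,
        "average_height": sum(signal) / len(signal),
    }


params = curve_parameters(
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 10.0, 20.0, 20.0],
)
print(params)
```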

  24. dphenome: Phenotypic variability at a glance

  25. dphenome: Phenotypic variability at a glance • Multiple strain comparison • How to discriminate different activities? • A single, summarized value is needed: AV = Activity Index

  26. dphenome: Activity index (AV) • K-means clustering on 5 parameters, with 10 clusters • Fast: from raw .csv files to AV in less than 5 minutes
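The idea of turning five curve parameters into a single activity index through k-means can be sketched as below. The slide gives only the number of parameters and clusters, so several choices here are assumptions: the deterministic centroid initialization, the absence of parameter normalization, and ranking clusters by the sum of their centroid coordinates to obtain the AV. The toy example uses 3 clusters instead of 10 to stay readable, and all names are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means with a deterministic start (assumption of this sketch)."""
    # Initial centroids: evenly spaced points after sorting by coordinate sum.
    ordered = sorted(points, key=sum)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [list(ordered[round(i * step)]) for i in range(k)]

    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def activity_index(points, k):
    """Cluster the parameter vectors and rank clusters from low to high."""
    labels, centroids = kmeans(points, k)
    order = sorted(range(k), key=lambda c: sum(centroids[c]))
    rank = {cluster: av for av, cluster in enumerate(order)}
    return [rank[lab] for lab in labels]


# Six made-up wells, five parameters each.
wells = [
    (20, 20, 10, 20, 10),
    (0, 0, 0, 0, 0),
    (10, 10, 5, 10, 5),
    (1, 1, 0, 1, 0),
    (21, 19, 11, 20, 10),
    (11, 9, 5, 10, 6),
]
av = activity_index(wells, k=3)
print(av)
```

Each well ends up with an integer from 0 (least active cluster) to `k - 1` (most active cluster).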

  27. dphenome: Activity index (AV) • Scale from No activity to Max activity • Easier comparisons • More understandable metrics • Different experiments comparison

  28. dphenome: Activity index (AV) • Plate heatmaps: phenotypic variability at a glance

  29. dphenome: Activity index (AV) • AV boxplots: overall strains comparison (also on single compound categories)

  30. dphenome: Activity index (AV) • AV rings: overall strains comparison, with Δ AV shown as + / -

  31. dphenome: Activity index (AV) • Replica management: discard inconsistent replicas using the Δ AV • 2 replicas • 3 replicas • Keep-min
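One way to read "discard inconsistent replicas using the Δ AV" is sketched below. The exact rules DuctApe applies are not given on the slide, so the policy here is an assumption: replicates are accepted when their AV spread is within a threshold; otherwise either only the lowest AV is kept (a guess at what "Keep-min" means) or only the replicates close to the low median are kept. The function and parameter names are hypothetical.

```python
import statistics


def filter_replicates(av_values, max_delta, keep_min=False):
    """Return the replicate AVs considered consistent (assumed policy)."""
    values = list(av_values)
    if max(values) - min(values) <= max_delta:
        return values  # all replicates agree within the threshold
    if keep_min:
        return [min(values)]  # conservative choice: lowest activity only
    # median_low always returns one of the observed values.
    center = statistics.median_low(values)
    return [v for v in values if abs(v - center) <= max_delta]


consistent = filter_replicates([5, 6, 5], max_delta=2)
purged = filter_replicates([7, 8, 2], max_delta=2)
lowest = filter_replicates([7, 8, 2], max_delta=2, keep_min=True)
print(consistent, purged, lowest)
```

With two replicates the low median is the smaller value, so under this assumed policy an inconsistent pair collapses to its lower AV.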

  32. The missing link: dape, the missing link

  33. dape: Whole metabolic network reconstruction

  34. dape: Single genome metabolic network • Interactive metabolic maps (as web pages) • Reactions copy number • Compounds AV

  35. dape: Single genome metabolic network • Scale from No activity to Max activity • Interactive metabolic maps (as graph files) • Can be used with graph analysis software (e.g. Gephi) • Generation of tables with network statistics on single pathways

  36. dape: Single genome metabolic network • Scale from No activity to Max activity • Interactive metabolic maps (as graph files) • Can be used with graph analysis software (e.g. Gephi) • Generation of tables with network statistics on single pathways
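The kind of per-pathway network statistics mentioned on these slides can be sketched with the standard library alone. The deck states that DuctApe relies on networkx for metabolic networks; this example deliberately avoids it to stay self-contained. Which statistics DuctApe reports is not stated, so the ones below (node count, edge count, mean degree, connected components) are an assumption, the network is treated as undirected with no self-loops, and the compound names are made up.

```python
from collections import defaultdict, deque


def network_statistics(edges):
    """Basic statistics for an undirected compound network.

    `edges` is an iterable of (compound_a, compound_b) pairs, one per
    reaction link (assumed representation).
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    n_nodes = len(adjacency)
    n_edges = sum(len(neighbors) for neighbors in adjacency.values()) // 2

    # Count connected components with a breadth-first search.
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)

    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "mean_degree": 2 * n_edges / n_nodes if n_nodes else 0.0,
        "components": components,
    }


stats = network_statistics([
    ("cpdA", "cpdB"),
    ("cpdB", "cpdC"),
    ("cpdD", "cpdE"),
])
print(stats)
```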

  37. dape: Metabolic network comparisons

  38. The missing link: Under the hood, technical features

  39. Under the hood: Technical features • Inputs, DuctApe project file, Outputs • DuctApe comes as a UNIX command line program • Clear, modular and expressive syntax • A web interface is under development • Next versions will be compatible with opm

  40. Under the hood: Technical features • Language • Standing on the shoulders of giants: Curve fitting, Signal handling, Clustering, Sequence handling, Plots, Metabolic network (networkx)

  41. Under the hood • http://combogenomics.github.com/DuctApe • “combogenomics ductape” • ductape-users@googlegroups.com • @combogenomics

  42. Acknowledgements • University of Florence: Alessio Mengoni, Marco Bazzicalupo, Emanuela Marchi, Giulia Spini, Francesca Decorosi, Carlo Viti, Luciana Giovannetti • Biolog Inc.: Barry Bochner • CRA: Stefano Mocali, Alessandro Florio, Anna Benedetti • University of Lille: Emanuele Biondi
