DuctApe: a tool for the analysis and correlation of genomic and high-throughput phenotypic Biolog data


SLIDE 1

DuctApe

A tool for the analysis and correlation of genomic and high-throughput phenotypic Biolog data

Marco Galardini (@mgalactus)

University of Florence, Microbial genetics lab, Florence computational biology group

04/03/2013

SLIDE 2

Who we are

  • Three bioinformatics groups from Unifi
  • Est. 2011
  • Microbiology (clinical, agronomical, ecological)
  • Analysis of biological sequence information
  • Bioinformatics software development

@combogenomics
combo.unifi@gmail.com
http://www.unifi.it/dbefcb

  • Italian Agricultural Research Council
  • Soil and agricultural microbiology
SLIDE 3

Who we are

  • Bacterial genomics and phenomics
  • Phenotypic assays on chemical sensitivities

Other collaborations

Dipartimento di Scienze delle Produzioni Agroalimentari e dell'Ambiente

SLIDE 4

Introduction

The wishing well

The genomics and phenomics era

SLIDE 5

The wishing well

MacLean et al., 2009

The genomics era

genomesonline.com

SLIDE 6

The wishing well

http://www.genome.jp/kegg/

The genomics era

  • Metabolic network reconstruction
  • From genomes to metabolomes
  • High throughput genomics/metabolomics
SLIDE 7

The wishing well

The phenomics era

www.biolog.com

  • Many compounds on KEGG DB
  • High throughput phenomics
SLIDE 8

Introduction

Genome data analysis

  • Genome map to KEGG
  • Pangenome prediction
      • core
      • accessory
      • unique

Phenome data analysis

  • Metabolic activity parameters
  • Replica management
  • Clear comparisons
  • Clear visualizations
  • Compounds map to KEGG
SLIDE 9

Introduction

How to combine genomic and phenomic data?

  • All data in a single metabolic map
  • Genetic basis for phenotypic differences
SLIDE 10

The missing link

DuctApe

The missing link between genomics and phenomics

SLIDE 11

The missing link

Three different experimental setups

Single strain(s)

Mutant(s)

  • Correlation of mutated genes / different phenotypes
  • Deletion / insertion mutants

PanGenome

  • Prediction of Core / Accessory / Unique genome
  • Correlation between Dispensable genome and phenotypes
SLIDE 12

The missing link

Three different modules

dgenome

  • Genes are mapped to KEGG database
  • PanGenome prediction (Blast-BBH)

dphenome

  • Phenotype microarray data handling (sigmoid fit)
  • Classification of metabolic activity (Activity index)
  • Compounds are mapped to KEGG database

dape

  • Generation of combined KEGG metabolic maps
  • Metabolic network analysis (through graph algorithms)
  • Metabolic hotspots prediction
SLIDE 13

The missing link

dgenome

Genomics made easy

SLIDE 14

dgenome

Genome map to KEGG (1)

  • Blast BBH on a local KEGG database*
  • Blast BBH using the KAAS web server**

* Since July 1st, 2011, access to the KEGG FTP requires a $2000/$5000 licence
** Available for free, fast and reliable
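
The bidirectional best hit (BBH) criterion behind both mapping routes can be sketched as follows. This is a minimal illustration and not DuctApe's own code: the function names and the input layout (dictionaries of bit scores, as might be parsed from tabular Blast output) are assumptions, and the gene identifiers are made up.

```python
def best_hits(scores):
    """Return {query: best-scoring subject} from {query: {subject: score}}."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def bidirectional_best_hits(forward, reverse):
    """Return sorted (gene, reference) pairs that are each other's best hit.

    forward: {gene: {reference_gene: bit_score}} (genome vs. reference)
    reverse: {reference_gene: {gene: bit_score}} (reference vs. genome)
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical bit scores, for illustration only.
forward = {
    "geneA": {"ref1": 250.0, "ref2": 90.0},
    "geneB": {"ref2": 120.0},
    "geneC": {"ref1": 200.0},
}
reverse = {
    "ref1": {"geneA": 250.0, "geneC": 200.0},
    "ref2": {"geneB": 120.0, "geneA": 90.0},
}
pairs = bidirectional_best_hits(forward, reverse)
print(pairs)
```

Here geneC is dropped: its best hit is ref1, but ref1's best hit is geneA, so the relation is not reciprocal.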

SLIDE 15

dgenome

Genome map to KEGG (2)

KEGG public API

Detailed info on:

  • Reactions
  • Compounds
  • Pathways

Fast, multi-threaded access
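
Multi-threaded access to a public web API could be organised along these lines. This is a sketch under assumptions, not DuctApe's actual client: it targets the public KEGG REST endpoint at rest.kegg.jp, the function names are hypothetical, and the downloader is passed in as a parameter so the threading logic can be tried without touching the network.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # public KEGG REST endpoint


def kegg_url(entry_id):
    """Build the URL of the flat-file record for one KEGG entry."""
    return f"{KEGG_REST}/get/{entry_id}"


def http_fetch(url):
    """Download one record as text (performs a network request)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def fetch_entries(entry_ids, fetch=http_fetch, workers=4):
    """Fetch several KEGG entries concurrently; returns {id: text}."""
    ids = list(entry_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so ids and texts line up.
        texts = pool.map(lambda entry_id: fetch(kegg_url(entry_id)), ids)
        return dict(zip(ids, texts))
```

A call such as `fetch_entries(["R00001", "C00031", "map00010"])` would request one reaction, one compound and one pathway record in parallel. The number of workers should stay small to be polite to the public server.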

SLIDE 16

dgenome

Pangenome prediction

  • Many genomes
  • Serial BBH
  • User-defined PanGenome
  • Core Genome (conserved pathways)
  • Dispensable Genome (variable pathways)
      • Accessory Genome
      • Unique Genome
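
Once orthologous groups have been built (for instance by serial BBH), the core / accessory / unique split reduces to counting how many genomes carry each group. The sketch below assumes a hypothetical input layout (group id mapped to the set of genomes that contain it) and treats the dispensable genome as accessory plus unique; it is an illustration, not DuctApe's implementation.

```python
def classify_pangenome(groups, n_genomes):
    """Split orthologous groups into core / accessory / unique.

    groups: {group_id: set of genome names carrying the group}
    n_genomes: total number of genomes in the comparison
    """
    pangenome = {"core": [], "accessory": [], "unique": []}
    for group_id, genomes in sorted(groups.items()):
        if len(genomes) == n_genomes:
            pangenome["core"].append(group_id)       # present everywhere
        elif len(genomes) == 1:
            pangenome["unique"].append(group_id)     # strain-specific
        else:
            pangenome["accessory"].append(group_id)  # shared by some
    pangenome["dispensable"] = sorted(
        pangenome["accessory"] + pangenome["unique"]
    )
    return pangenome


# Hypothetical groups over three genomes, for illustration only.
groups = {
    "og1": {"strainA", "strainB", "strainC"},
    "og2": {"strainA", "strainB"},
    "og3": {"strainC"},
}
result = classify_pangenome(groups, n_genomes=3)
print(result)
```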
SLIDE 17

The missing link

dphenome

Painless high-throughput phenomics

SLIDE 18

dphenome

From raw data to phenotypic variability

  1. Parsing
SLIDE 19

dphenome

From raw data to phenotypic variability

  1. Parsing
  2. Control signal subtraction (optional)
SLIDE 20

dphenome

From raw data to phenotypic variability

  1. Parsing
  2. Control signal subtraction (optional)
  3. Signal refinement
SLIDE 21

dphenome

From raw data to phenotypic variability

  1. Parsing
  2. Control signal subtraction (optional)
  3. Signal refinement
  4. Sigmoid fit
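
Steps 2 and 3 of the list above could look like the following sketch. The slides do not say how the signal is refined, so the choices here are assumptions: negative values after control subtraction are clipped to zero, and refinement is shown as a running median. Function names and data are hypothetical.

```python
from statistics import median


def subtract_control(signal, control):
    """Subtract the control-well signal point by point, clipping at zero."""
    return [max(s - c, 0.0) for s, c in zip(signal, control)]


def smooth(signal, window=3):
    """Running median with an odd window that shrinks at the edges."""
    half = window // 2
    return [
        median(signal[max(i - half, 0): i + half + 1])
        for i in range(len(signal))
    ]


# Hypothetical readings for one well and its control well.
signal = [10.0, 12.0, 50.0, 14.0, 40.0, 60.0]
control = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
net = subtract_control(signal, control)
refined = smooth(net)
print(net)
print(refined)
```

The spike at the third time point is damped by the median filter, while the sustained rise at the end of the curve is kept.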
SLIDE 22

dphenome

From raw data to phenotypic variability

  5. Parameter extraction
SLIDE 23

dphenome

From raw data to phenotypic variability

  5. Parameter extraction

Lag, Min, Max, Slope, Plateau, plus Area and Average height
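
Most of the curve parameters listed above can be approximated directly from the sampled signal, as in the sketch below. This is a simplification under stated assumptions: DuctApe extracts them after a sigmoid fit, whereas here the slope is the steepest rise between consecutive points, the lag is where that tangent crosses the minimum signal, and the plateau (which needs the fitted asymptote) is left out. The function name and data are hypothetical.

```python
def curve_parameters(times, values):
    """Estimate summary parameters directly from a sampled curve."""
    steps = list(zip(times, times[1:], values, values[1:]))
    # Area under the curve by the trapezoidal rule.
    area = sum((t1 - t0) * (v0 + v1) / 2 for t0, t1, v0, v1 in steps)
    # Steepest rise between consecutive points, with its midpoint.
    slope, t_mid, v_mid = max(
        ((v1 - v0) / (t1 - t0), (t0 + t1) / 2, (v0 + v1) / 2)
        for t0, t1, v0, v1 in steps
    )
    low, high = min(values), max(values)
    # Lag: time at which the steepest tangent crosses the minimum signal.
    lag = t_mid - (v_mid - low) / slope if slope > 0 else None
    return {
        "min": low,
        "max": high,
        "slope": slope,
        "lag": lag,
        "area": area,
        "average_height": sum(values) / len(values),
    }


# Hypothetical curve: flat, then a linear rise, then flat again.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [0.0, 0.0, 10.0, 20.0, 20.0]
params = curve_parameters(times, values)
print(params)
```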

SLIDE 24

dphenome

Phenotypic variability at a glance

SLIDE 25

dphenome

Phenotypic variability at a glance

  • Multiple strain comparison
  • How to discriminate different activities?
  • A single, summarized value is needed

AV = Activity Index

SLIDE 26

dphenome

Activity index (AV)

  • K-means clustering on 5 parameters, with 10 clusters
  • Fast: from raw .csv files to AV in less than 5 minutes
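
The idea of turning clusters of curve parameters into a single ordered index can be sketched with a plain k-means (Lloyd's algorithm) in pure Python. Several details are assumptions made for illustration and are not taken from the slides: parameters are rescaled to the 0-1 range, higher values are taken to mean more activity, the starting centroids are chosen deterministically, and clusters are ranked by the sum of their centroid coordinates. The example uses 3 clusters on made-up data; the slides use 10.

```python
def normalize(rows):
    """Rescale every column to the 0-1 range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [
        [(x - lo) / sp for x, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm with a deterministic start."""
    ordered = sorted(points, key=sum)
    # Start from k points spread evenly along the ordering by sum.
    last = len(ordered) - 1
    centroids = [ordered[i * last // max(k - 1, 1)] for i in range(k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def activity_index(rows, k=10):
    """Assign each curve an index from 0 (least active) to k - 1."""
    labels, centroids = kmeans(normalize(rows), k)
    order = sorted(range(k), key=lambda c: sum(centroids[c]))
    rank = {cluster: av for av, cluster in enumerate(order)}
    return [rank[lab] for lab in labels]


# Hypothetical parameter vectors (5 values per curve).
rows = [
    [10.0, 0.0, 100.0, 2.0, 200.0],
    [1.0, 0.0, 5.0, 0.1, 10.0],
    [5.0, 0.0, 50.0, 1.0, 100.0],
    [1.2, 0.0, 6.0, 0.1, 12.0],
    [9.5, 0.0, 95.0, 1.9, 190.0],
    [5.5, 0.0, 55.0, 1.1, 110.0],
]
av = activity_index(rows, k=3)
print(av)
```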

SLIDE 27

dphenome

Activity index (AV)

AV scale: from no activity to max activity

  • Easier comparisons
  • More understandable metrics
  • Comparison of different experiments
SLIDE 28

dphenome

Activity index (AV)

Plates heatmaps: phenotypic variability at a glance

SLIDE 29

dphenome

Activity index (AV)

AV boxplots: overall strain comparison (also on single compound categories)

SLIDE 30

dphenome

Activity index (AV)

AV rings: overall strain comparison

Δ AV

SLIDE 31

dphenome

Activity index (AV)

Replica management: discard inconsistent replicates using the Δ AV

Panels: 3 replicates, 2 replicates, Keep-min
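
A "keep-min" style purge of inconsistent replicates might be sketched as follows. The exact rule is not spelled out on the slide, so this is an assumption: a replicate is kept only if its AV lies within a chosen Δ AV threshold of the lowest replicate for the same strain and well. The function name, the threshold value and the data are hypothetical.

```python
def purge_keep_min(replicate_avs, delta=3):
    """Return the replicates kept under a 'keep-min' style policy.

    replicate_avs: {replicate_id: AV} for one strain and one well.
    Replicates whose AV exceeds the lowest AV by more than `delta`
    are discarded; a consistent set passes through unchanged.
    """
    lowest = min(replicate_avs.values())
    return {
        rep: av
        for rep, av in replicate_avs.items()
        if av - lowest <= delta
    }


# Hypothetical AV values for three replicates of the same well.
inconsistent = {"rep1": 2, "rep2": 3, "rep3": 8}
consistent = {"rep1": 5, "rep2": 6, "rep3": 7}
kept_a = purge_keep_min(inconsistent)
kept_b = purge_keep_min(consistent)
print(kept_a)
print(kept_b)
```

In the first case rep3 is discarded because it sits 6 AV units above the minimum; in the second case all three replicates agree and are kept.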

SLIDE 32

The missing link

dape

The missing link

SLIDE 33

dape

Whole metabolic network reconstruction

SLIDE 34

dape

Single genome metabolic network

Interactive metabolic maps (as web pages)

  • Reaction copy number
  • Compound AV
SLIDE 35

dape

Single genome metabolic network

AV scale: from no activity to max activity

Interactive metabolic maps (as graph files)

  • Can be used with graph analysis software (e.g. Gephi)
  • Generation of tables with network statistics on single pathways
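
The kind of per-pathway network statistics mentioned above can be illustrated with a small undirected compound graph. DuctApe relies on networkx for its metabolic networks; the sketch below uses only the standard library to stay self-contained, and the function names, the chosen statistics and the edge list are assumptions for illustration.

```python
from collections import deque


def build_graph(reactions):
    """Undirected compound graph from (substrate, product) pairs."""
    graph = {}
    for a, b in reactions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_components(graph):
    """List the connected components, found by breadth-first search."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(component)
    return components


def pathway_statistics(reactions):
    """Basic statistics for one pathway given its compound links."""
    graph = build_graph(reactions)
    n_edges = sum(len(nb) for nb in graph.values()) // 2
    components = connected_components(graph)
    return {
        "nodes": len(graph),
        "edges": n_edges,
        "mean_degree": 2 * n_edges / len(graph) if graph else 0.0,
        "components": len(components),
        "largest_component": max(map(len, components), default=0),
    }


# Illustrative links between KEGG compound identifiers.
reactions = [
    ("C00031", "C00092"),
    ("C00092", "C00085"),
    ("C00022", "C00024"),
]
stats = pathway_statistics(reactions)
print(stats)
```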
SLIDE 37

dape

Metabolic network comparisons

SLIDE 38

The missing link

Under the hood

Technical features

SLIDE 39

Under the hood

Technical features

DuctApe comes as a UNIX command line program

  • Clear, modular and expressive syntax
  • A web interface is under development
  • Next versions will be compatible with opm

Diagram: Inputs, DuctApe project file, Outputs

SLIDE 40

Under the hood

Technical features

Language

Standing on the shoulders of giants

  • Curve fitting
  • Signal handling
  • Clustering
  • Sequence handling
  • Plots
  • Metabolic network (networkx)
SLIDE 41

Under the hood

http://combogenomics.github.com/DuctApe
“combogenomics ductape”
ductape-users@googlegroups.com
@combogenomics

SLIDE 42

Acknowledgements

  • University of Florence

Alessio Mengoni, Marco Bazzicalupo, Emanuela Marchi, Giulia Spini, Francesca Decorosi, Carlo Viti, Luciana Giovannetti

  • Biolog Inc.

Barry Bochner

  • CRA

Stefano Mocali, Alessandro Florio, Anna Benedetti

  • University of Lille

Emanuele Biondi
