SLIDE 1


9 December, 2008 AERES

Some biological questions in bacterial comparative genomics

Meriem El Karoui, INRA, Jouy-en-Josas, MICALIS

January 15th 2010

Meriem_elkaroui@hms.harvard.edu

SLIDE 2

Comparative genomics: definition

  • The study of relationships between the complete genomes of different species or individuals.

  • Objectives:

– Describe and understand genomic diversity

  • Identify conserved and variable regions
  • Understand species/individual specificity
  • Identify functional parts: genes, promoters ...

– Understand genome evolution

Marseille, January 2010 MICALIS

SLIDE 3

Key dates

  • 1995: complete genomes of Haemophilus influenzae and Mycoplasma genitalium
  • 1999: first comparison at the strain level (Helicobacter pylori)
  • 2003: first metagenome
  • 2004: first complete human genome
  • 2007: second human genome

SLIDE 4

A very large amount of data

[Figure: complete genomes by domain (Archaea, Bacteria, Eukarya), 1996 to 2008. Source: GOLD database, September 2009]

SLIDE 5

Data

  • Complete genomes
  • Not-so-complete genomes
  • Metagenomes
  • Different sequence quality standards are emerging (draft, high-quality draft, improved high-quality draft, annotation-directed improvement, noncontiguous finished, finished; Chain et al., Science, 2009)

SLIDE 6

Next generation sequencing

Platform             DNA amplification / template   Read length (bases)   Gb/run
Roche 454            Yes / MP                       350                   0.45
Illumina             Yes / MP                       75                    18
SOLiD                Yes / MP                       35                    30
Heliscope            No / MP                        32                    37
Pacific Bioscience   No                             965                   N/A

Metzker, Nature Reviews Genetics, 2010
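The throughput and read-length columns of the table can be turned into two rough figures for a bacterial project: approximate reads per run (throughput divided by read length) and fold coverage per run (throughput divided by genome size). This is a minimal sketch assuming a 5 Mb genome, a round number chosen only for illustration; the names `PLATFORMS`, `GENOME_SIZE_BP` and `run_summary` are hypothetical, and the sketch ignores paired reads, run configuration and quality filtering, so the results are orders of magnitude only.

```python
# Read length (bases) and throughput (Gb per run) from the table (Metzker, 2010).
# Pacific Bioscience is omitted because no Gb/run figure is given.
PLATFORMS = {
    "Roche 454": (350, 0.45),
    "Illumina": (75, 18),
    "SOLiD": (35, 30),
    "Heliscope": (32, 37),
}

# Assumption: a ~5 Mb bacterial genome, a round figure for illustration only.
GENOME_SIZE_BP = 5_000_000


def run_summary(read_length_bp, gb_per_run, genome_size_bp=GENOME_SIZE_BP):
    """Return (approximate reads per run, fold coverage of one genome per run)."""
    bases_per_run = gb_per_run * 1e9
    reads = bases_per_run / read_length_bp
    coverage = bases_per_run / genome_size_bp
    return reads, coverage


for name, (length, gb) in PLATFORMS.items():
    reads, cov = run_summary(length, gb)
    print(f"{name:10s} ~{reads / 1e6:8.1f} M reads/run, ~{cov:6.0f}x coverage")
```

Under these assumptions even the lowest-throughput platform in the table covers a 5 Mb genome about 90 times per run.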

SLIDE 7

Evolutionary scale

  • Different individuals in the same species
  • Closely related species
  • Divergent species

SLIDE 8

Vertebrate genomes

Margulies and Birney, Nat. Rev. Genetics, 2008

SLIDE 9

Bacterial genomes

Wu, Hugenholtz et al., Nature, 2009

SLIDE 10

Databases

  • GenBank
  • EMBL
  • DDBJ
  • Specialized databases (NAR database issue, January 2010)

SLIDE 11

ANALYSIS OF GENETIC DIVERSITY

SLIDE 12

Define conserved and variable regions

  • Objectives:

– Understand phenotypic behavior (pathogenicity, susceptibility to diseases).
– Find functional information (identify genes, promoters, functional DNA motifs).
– Establish the « gene repertoire » of a species and discover new protein families.

SLIDE 13

Two different approaches in genome comparison

  • Comparison of complete proteomes (gene level)
  • Comparison of complete genomes (nucleotide level)

SLIDE 14

Analysis at the gene level

  • Based on the identification of homologous genes
  • Allows comparisons at various evolutionary time scales
  • Can be applied, to some extent, to unfinished genomes
  • Dependent on genome annotation and on the accuracy of the gene alignment procedure (usually BLAST)
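One common way to turn BLAST similarity scores into putative orthologs (one kind of homologous genes) is the reciprocal best hit criterion: gene a in genome A and gene b in genome B are paired when each is the other's best-scoring hit. The slide does not say which criterion is used, so treat this as one illustrative option. The sketch assumes the scores have already been parsed from BLAST output into a dictionary; the helper names, gene names and bit scores are hypothetical.

```python
from typing import Dict, List, Tuple

# (gene in genome A, gene in genome B) -> similarity score (e.g. a BLAST bit score)
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, from_a: bool) -> Dict[str, str]:
    """For each query gene, keep the target gene with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in scores.items():
        query, target = (a, b) if from_a else (b, a)
        if query not in best or s > best[query][1]:
            best[query] = (target, s)
    return {q: t for q, (t, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    a_to_b = best_hits(scores, from_a=True)
    b_to_a = best_hits(scores, from_a=False)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Hypothetical scores: a2's best hit is b2, but b2's best hit is a3.
scores = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 200.0}
print(reciprocal_best_hits(scores))  # [('a1', 'b1'), ('a3', 'b2')]
```

The a2/b2 case shows why this depends on alignment accuracy: a one-directional best hit is not enough to call two genes orthologs under this rule.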

SLIDE 15

What is the extent of gene conservation among prokaryotes?

Koonin, E. V. et al. Nucl. Acids Res. 2008

SLIDE 16

A high level of horizontal gene transfer (HGT) in prokaryotes

Koonin, E. V. et al. Nucl. Acids Res. 2008

SLIDE 17

Ecosystem level: metagenomics

  • Sampling the genome sequences of a community of organisms inhabiting a common environment
  • Genomes of dominant species can be fully reconstructed
  • Most data are short next-generation sequencing (NGS) reads that can be related to genes
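A toy illustration of relating short reads to genes: each read is assigned to the reference gene that shares the most exact k-mers with it, and assignments are tallied per gene. This k-mer voting is a deliberately simple stand-in for the similarity searches (such as BLAST) usually used; it only looks at the forward strand, and the helper names, gene names, sequences and k = 4 are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def build_kmer_index(genes: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer to the single gene it occurs in; k-mers found in several genes are dropped."""
    index: Dict[str, str] = {}
    shared = set()
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index and index[kmer] != name:
                shared.add(kmer)
            index[kmer] = name
    for kmer in shared:
        del index[kmer]
    return index


def assign_read(read: str, index: Dict[str, str], k: int) -> Optional[str]:
    """Return the gene supported by most k-mers of the read (forward strand only), or None."""
    votes = Counter(index[read[i:i + k]]
                    for i in range(len(read) - k + 1) if read[i:i + k] in index)
    return votes.most_common(1)[0][0] if votes else None


def gene_profile(reads: Iterable[str], genes: Dict[str, str], k: int = 4) -> Counter:
    """Count how many reads are assigned to each gene, ignoring unassigned reads."""
    index = build_kmer_index(genes, k)
    assigned = (assign_read(r, index, k) for r in reads)
    return Counter(g for g in assigned if g is not None)


genes = {"geneA": "AAAACCCCGGGG", "geneB": "TTTTGGGGCCCC"}  # hypothetical
reads = ["AAAACCC", "TTTGGGG", "ACGTACGT"]
print(gene_profile(reads, genes))  # Counter({'geneA': 1, 'geneB': 1})
```

The third read shares no k-mer with either gene and is left unassigned, which mirrors the fact that many metagenomic reads cannot be related to any known gene.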

SLIDE 18

Analysis of phyla representation

Ley R. et al., Nature Reviews Microbiology, 2008

SLIDE 19

« Gene-centric » analysis

Analysis of gene content, regardless of species content

Hugenholtz and Tyson, Nature, 2008

SLIDE 20

Analysis at the species level

  • Core genome: genes shared by all the strains analysed. Reflects basic functions and the phenotypic characteristics of the species.
  • Pan genome: core genome + « dispensable genome ». Reflects species diversity and functions related to niche adaptation.
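Once genes have been grouped into families of homologs, the core and pan genomes reduce to set intersection and union across strains, and the dispensable genome is their difference. The sketch also computes an accumulation curve (core and pan sizes as strains are added one at a time), the kind of curve used to ask whether a pan genome keeps growing as new genomes are sequenced. Strain names, family labels and function names are hypothetical, and the gene-family clustering step is assumed to be done already.

```python
from typing import Dict, List, Optional, Set, Tuple


def core_pan(strains: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, pan, dispensable) gene-family sets for the strains given."""
    families = list(strains.values())
    core = set.intersection(*families)
    pan = set.union(*families)
    return core, pan, pan - core


def accumulation_curve(strains: Dict[str, Set[str]], order: List[str]) -> List[Tuple[int, int]]:
    """(core size, pan size) after adding each strain in the given order."""
    core: Optional[Set[str]] = None
    pan: Set[str] = set()
    sizes = []
    for name in order:
        families = strains[name]
        core = set(families) if core is None else core & families
        pan |= families
        sizes.append((len(core), len(pan)))
    return sizes


strains = {  # hypothetical gene-family content of three strains
    "S1": {"f1", "f2", "f3", "f4"},
    "S2": {"f1", "f2", "f3", "f5"},
    "S3": {"f1", "f2", "f3", "f4", "f6"},
}
core, pan, dispensable = core_pan(strains)
print(sorted(core), sorted(dispensable))  # ['f1', 'f2', 'f3'] ['f4', 'f5', 'f6']
print(accumulation_curve(strains, ["S1", "S2", "S3"]))  # [(4, 4), (3, 5), (3, 6)]
```

The core can only shrink and the pan genome can only grow as strains are added; the shape of these two curves is what the pan-genome slides summarise.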

SLIDE 21

GBS pan-genome

Tettelin H et al. PNAS 2005;102:13950-13955

Streptococcus agalactiae pan genome

SLIDE 22

Pan/core genome of Escherichia coli

Touchon et al., PLoS Genetics, 2009

SLIDE 23

Conclusion: gene-based analysis

  • Very powerful for characterizing genomic diversity at different evolutionary scales.
  • Reveals a surprising level of genetic diversity in prokaryotes, in large part due to the « mobilome » (mobile genetic elements).
  • Dependent on annotation and on the accuracy of the gene comparison method.
  • Does not take genome structure into account.

SLIDE 24

Nucleotide-level analysis

  • Short evolutionary time scale
  • Takes chromosome organisation into account

– Complete multiple-genome alignment
– Genome « mapping » (NGS)

SLIDE 25

Complete genome alignment

Brudno et al.

SLIDE 26

Multiple whole-genome alignment

  • 1. Identify local regions of similarity (matches)
  • 2. Chain the matches (rearrangements)
  • 3. Align the gaps

Software: MGA, MAUVE, MAVID, MultiLAGAN...

Dewey and Pachter, Human Molecular Genetics, 2006
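Steps 1 and 2 can be sketched for two genomes: anchors are k-mers that occur exactly once in each sequence, and chaining keeps the largest set of anchors that appear in the same order in both genomes (a longest-increasing-subsequence style dynamic program). This is a simplification, not the method of any tool listed above: those tools use more elaborate seeds and scoring, and MAUVE in particular handles rearrangements by finding several locally collinear blocks rather than one chain. Step 3 (aligning the gaps between chained anchors) is not shown; the sequences, k = 3 and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Match = Tuple[int, int]  # (position in genome A, position in genome B)


def kmer_matches(a: str, b: str, k: int) -> List[Match]:
    """Step 1: anchors are k-mers occurring exactly once in each genome."""
    def unique_positions(s: str) -> Dict[str, int]:
        seen: Dict[str, int] = {}
        repeated = set()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if w in seen:
                repeated.add(w)
            else:
                seen[w] = i
        return {w: i for w, i in seen.items() if w not in repeated}

    pa, pb = unique_positions(a), unique_positions(b)
    return sorted((pa[w], pb[w]) for w in pa.keys() & pb.keys())


def chain(matches: List[Match]) -> List[Match]:
    """Step 2: longest chain of anchors increasing in both genomes (O(n^2) DP)."""
    if not matches:
        return []
    best = [1] * len(matches)   # length of the best chain ending at each anchor
    prev = [-1] * len(matches)  # back-pointer to the previous anchor in that chain
    for j, (xj, yj) in enumerate(matches):
        for i in range(j):
            xi, yi = matches[i]
            if xi < xj and yi < yj and best[i] + 1 > best[j]:
                best[j], prev[j] = best[i] + 1, i
    j = max(range(len(matches)), key=best.__getitem__)
    out = []
    while j != -1:
        out.append(matches[j])
        j = prev[j]
    return out[::-1]


# Hypothetical genomes whose two halves are swapped (a rearrangement).
a = "AAACCCGGGTTT"
b = "GGGTTTAAACCC"
anchors = kmer_matches(a, b, 3)
print(chain(anchors))  # [(0, 6), (1, 7), (2, 8), (3, 9)]
```

Only one of the two swapped halves survives in a single chain, which is why rearrangement-aware aligners report several collinear blocks instead.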

SLIDE 27

Comparison of two E. coli strains

SLIDE 28

Chromosome organisation

SLIDE 29

Chromosome rearrangements in Yersinia pestis

Darling et al., PLoS Genetics, 2008

SLIDE 30

Backbone/variable segments

[Figure: complete genome alignment, with backbone and variable segments labelled]

SLIDE 31

Bacterial genome segmentation

  • 1. Genome alignment (MGA, MAUVE)
  • 2. Segmentation (http://genome.jouy.inra.fr/mosaic)

Chiapello et al., BMC Bioinformatics, 2005; Chiapello et al., BMC Bioinformatics, 2008
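The segmentation step can be illustrated with a simple presence/absence rule applied to each column of a multiple genome alignment: a column is backbone if every genome has a base there and variable if at least one genome has a gap; consecutive columns with the same label are merged into segments. This is a stand-in, not the MOSAIC method, whose criteria the slide does not describe; a real segmentation would also filter or merge very short segments. The function name and example alignment are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (label, first column, one past last column)


def segment_alignment(rows: List[str]) -> List[Segment]:
    """Split a gapped multiple alignment into 'backbone' runs (every genome has a
    base in the column) and 'variable' runs (at least one genome has a gap)."""
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("expected a non-empty set of equal-length aligned rows")
    labels = ["backbone" if all(r[c] != "-" for r in rows) else "variable"
              for c in range(len(rows[0]))]
    segments: List[Segment] = []
    start = 0
    for label, run in groupby(labels):  # consecutive identical labels
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


rows = [  # hypothetical alignment of three genomes; genome 1 lacks 4 bases
    "ACGTACGT----ACGT",
    "ACGTACGTTTGAACGT",
    "ACGTACGTTTGAACGT",
]
print(segment_alignment(rows))
# [('backbone', 0, 8), ('variable', 8, 12), ('backbone', 12, 16)]
```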

SLIDE 32

Robustness of genome comparison

  • H. Devillers, S. Schbath, ANR Cocogen

Simulations

– Random perturbation of genomes and segmentation
– Robustness score

SLIDE 33

Identification of functional motifs

  • Perform a multiple complete genome alignment
  • Define the backbone
  • Look for motifs that have a particular distribution on the backbone
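A first look at whether a motif has a particular distribution on the backbone is to compare its density (occurrences per base) on backbone and variable sequence. The sketch uses the E. coli Chi site, GCTGGTGG, as the motif. It is a plain density ratio, not a statistical test of over-representation, and it ignores motif orientation, which matters for Chi and KOPS. The function names and the example sequences are hypothetical.

```python
def count_occurrences(seq: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif on the given strand only."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def backbone_enrichment(backbone: str, variable: str, motif: str) -> float:
    """Ratio of motif density on backbone sequence to density on variable sequence."""
    c_b = count_occurrences(backbone, motif)
    c_v = count_occurrences(variable, motif)
    if c_v == 0:
        return float("inf") if c_b > 0 else float("nan")
    # Cross-multiplied so the ratio is computed from integer counts and lengths.
    return (c_b * len(variable)) / (c_v * len(backbone))


CHI = "GCTGGTGG"  # E. coli Chi site, read on one strand only

# Hypothetical sequences: 2 Chi sites in 100 bp of backbone, 1 in 200 bp of variable DNA.
backbone = (CHI + "A" * 42) * 2
variable = CHI + "T" * 192
print(backbone_enrichment(backbone, variable, CHI))  # 4.0
```

A ratio above 1 only flags a candidate; deciding whether it is significant requires a model of the expected motif count given the sequence composition.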

SLIDE 34
Identification of functional motifs

[Figure: chromosome map; labels include dif]

DNA repair: Chi

Halpern et al., PLoS Genetics, 2007

Chromosome segregation: KOPS

Bigot et al., EMBO J., 2005; Val et al., PLoS Genetics, 2008

Macrodomain organisation: MatS

Mercier et al., Cell, 2008

These motifs are enriched on the backbone.

SLIDE 35

Characterisation of mutants in Bacillus subtilis

Srivatsan et al., PLoS Genetics, 2008; see also Medvedev et al., Nature Methods, 2009

SLIDE 36

Analysis at the nucleotide level

  • Very precise identification of variations (single-nucleotide mutations, indels, etc.)
  • Complete genome alignment is still an open question
  • New methods to compare unfinished genomes are developing fast
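Once two genomes are aligned, or reads are mapped onto a reference, single-nucleotide mutations and indels can be read directly off the alignment. The sketch lists them from two gapped, equal-length rows, reporting 0-based coordinates on the ungapped reference; an insertion is reported at the reference position it precedes. The function name and example alignment are hypothetical, and real variant callers also weigh base qualities and read depth.

```python
from typing import List, Tuple

Variant = Tuple[str, int, str]  # (kind, 0-based reference position, bases or "ref>alt")


def call_variants(ref_aln: str, qry_aln: str) -> List[Variant]:
    """List SNPs, insertions and deletions between two aligned rows ('-' = gap)."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have the same length")
    variants: List[Variant] = []
    ref_pos = 0  # position on the ungapped reference
    i = 0
    while i < len(ref_aln):
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q == "-":  # column empty in both rows: ignore it
            i += 1
        elif r == "-" or q == "-":
            kind = "insertion" if r == "-" else "deletion"
            j = i
            # Extend over following columns with the same gap pattern.
            while (j < len(ref_aln) and (ref_aln[j] == "-") == (r == "-")
                   and (qry_aln[j] == "-") == (q == "-")):
                j += 1
            source = qry_aln if kind == "insertion" else ref_aln
            variants.append((kind, ref_pos, source[i:j]))
            if kind == "deletion":
                ref_pos += j - i
            i = j
        else:
            if r != q:
                variants.append(("SNP", ref_pos, f"{r}>{q}"))
            ref_pos += 1
            i += 1
    return variants


print(call_variants("ACGT--ACGTA", "ACCTGGAC-TA"))
# [('SNP', 2, 'G>C'), ('insertion', 4, 'GG'), ('deletion', 6, 'G')]
```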

SLIDE 37

UNDERSTANDING GENOME EVOLUTION

SLIDE 38

Analysis of E. coli genome evolution

  • 20 high-quality E. coli genomes
  • 1 complete genome of Escherichia fergusonii (outgroup)

SLIDE 39

Phylogenetic tree reconstruction:

  • E. coli

Touchon et al., PLoS Genetics, 2009

SLIDE 40

A high level of gene variation along the tree

SLIDE 41

Conclusion

  • Comparative genomics has revealed an unexpected amount of variability among prokaryotic genomes.
  • It raises challenging questions about genome evolution, e.g. the bacterial species concept.
  • It paves the way for other types of comparisons:

– Comparisons of networks
– Comparisons of transcriptomes (RNA-seq) and protein-binding regions (ChIP-seq)

SLIDE 42
  • S. Robin
  • S. Schbath
  • A. Jacquemard
  • H. Chiapello
  • C. Caron

  • F.X. Barre
  • F. Cornet
  • F. Boccard
  • O. Espeli

MIG, INRA, Jouy; OMIP, AgroParisTech; MIG, INRA, Jouy; CGM, CNRS, Gif-sur-Yvette; LMGM, CNRS, Toulouse

Bioinformatics; Statistics; Experimental biology

  • E. Rivals
  • R. Uricaru

LIRMM, CNRS, Montpellier

Algorithmics

  • P. Lebourgeois

  • M-A Petit

  • D. Halpern
  • H. Devillers
  • F. Touzain

MICALIS, INRA Jouy