Benjamin LINARD, thesis director: Julie Thompson



SLIDE 1

EvoluCode: an original view of Human Systems Evolution

Benjamin LINARD

Team leader: O. Poch

Thesis director: Julie Thompson

SLIDE 2

Outline

1) General context
2) EvoluCodes: evolutionary barcodes
3) Evolutionary knowledge extraction in human networks

Conclusion & perspectives

SLIDE 3

Species Evolution & Technologies

  • 1. Context

A historical perspective …

  • Medieval Islamic philosophers
  • Lamarck: transmutation of species
  • 1859: Darwin, theory of evolution

SLIDE 4

Species Evolution & Technologies

  • 1. Context

A historical perspective …

  • Medieval Islamic philosophers
  • Lamarck: transmutation of species
  • 1859: Darwin, theory of evolution
  • Bean/Drosophila crossings: Mendelian laws, heredity
  • Statistics (game theory): population genetics
  • Notions of gene & mutation

SLIDE 5

Species Evolution & Technologies

  • 1. Context

A historical perspective …

  • Medieval Islamic philosophers
  • Lamarck: transmutation of species
  • 1859: Darwin, theory of evolution
  • Mendelian laws, heredity
  • Population genetics
  • 1953: Watson & Crick, DNA structure
  • 1977: Sanger sequencing
  • 1983: Mullis, PCR

SLIDE 6

Species Evolution & Technologies

  • 1. Context

A historical perspective …

  • Medieval Islamic philosophers
  • Lamarck: transmutation of species
  • 1859: Darwin, theory of evolution
  • Mendelian laws, heredity
  • Population genetics
  • 1953: Watson & Crick, DNA structure
  • 1977: Sanger sequencing
  • 1983: Mullis, PCR
  • Reign of molecular biology

SLIDE 7

Species Evolution & Technologies

  • 1. Context

A historical perspective …

  • Medieval Islamic philosophers
  • Lamarck: transmutation of species
  • 1859: Darwin, theory of evolution
  • Mendelian laws, heredity
  • Population genetics
  • 1953: Watson & Crick, DNA structure
  • 1977: Sanger sequencing
  • 1983: Mullis, PCR
  • Reign of molecular biology
  • 2000s: NGS, "omics" techniques, systems biology

Phenotypic variation and gene variation: linking both?

SLIDE 8

Species Evolution & Technologies

  • 1. Context

http://skepticwonder.fieldofscience.com/

SLIDE 9

Evolutionary Systems biology

  • 1. Context

1 day = 1 sequenced genome

Number of complete genomes (GOLD Database, March 2012):

  • Archaea: 182
  • Bacteria: 3490
  • Eukarya: 586

→ Problems of dispersed data
→ Problems of visualisation
→ Need for summarisation

Sept 2008: 1 gene = 1 terabyte!

  • Proteomics
  • Transcriptomics
  • Interactomics
  • Expression data

SLIDE 10

Evolutionary Systems biology

  • 1. Context

Analysis of large-scale biological parameters

Phenomic parameters: expression level, network centrality, dispensability, …

Evolutionary parameters: sequence evolution rate, gene loss, genetic events, …

Observed general trends

  • CAI: codon adaptation index
  • EL: expression level
  • ER: evolutionary rate
  • GI: genetic interactions
  • KE: knockout lethal effect
  • NP: number of paralogs
  • PA: protein abundance
  • PGL: propensity for gene loss
  • PPI: protein-protein interactions

Koonin EV, Wolf Y, 2006

How can these trends be traced back to the individual gene?

SLIDE 11

EvoluCodes and systems biology

How to study large-scale biological parameters on a per-gene basis?

→ Representing gene variation (i.e. history)
→ Summarising multi-scale data

  • 2. EvoluCodes

EvoluCodes : Evolutionary Barcodes

Linard & al., Evo Bioinfo 2012

1 gene = 1 evolutionary history = 1 barcode

SLIDE 12

EvoluCodes in vertebrates

  • 2. Evolucodes

Multi-level evolutionary data in vertebrates

  • 1. Multiple alignment related parameters (Thompson JD & al, PLoS ONE 2011)

1 EvoluCode for each human protein-coding gene

Human genes compared to 17 vertebrate species

[Figure: multiple alignment of a query sequence with its orthologs and paralogs, annotated with domains and core blocks]

~20,000 human proteins (1 per coding gene)

Levels: residue, domain, protein, clades

→ 500,000 aligned sequences with annotation processes

SLIDE 13

EvoluCodes in vertebrates

  • 2. Evolucodes

Multi-level evolutionary data in vertebrates

  • 1. Multiple alignment related parameters (Thompson JD & al, PLoS ONE 2011)
  • 2. Ortholog/Inparalog relationships (Linard B. & al, BMC Bioinfo 2011)

OrthoInspector software & database: >11,000,000 orthologous relations

lbgi.igbmc.fr/orthoinspector

SLIDE 14

EvoluCodes

  • 2. Evolucodes

Multi-level evolutionary data in vertebrates

  • 1. Multiple alignment related parameters (Thompson JD & al, PLoS ONE 2011)
  • 2. Ortholog/Inparalog relationships (Linard B. & al, BMC Bioinfo 2011); Google "orthoinspector"
  • 3. Synteny data for vertebrates (Prosdocimi F., Linard B. & al, BMC Genomics 2012): ~280,000 inter-species genome mappings (image generated with CoGe:GEvo)

~20,000 human EvoluCodes

SLIDE 15

EvoluCodes

  • 2. Evolucodes

EvoluCode examples

Glucagon receptor (GLR_HUMAN)

Barcode layout: n vertebrate species × N parameters

1 barcode = 1 evolutionary scenario

Human genes are reference genes

Colour legend (parameter distribution in a given species): typical value, higher atypical value, lower atypical value
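The legend on this slide suggests that each barcode cell is coloured by comparing a gene's parameter value with the distribution of that parameter in a given species. The sketch below illustrates that idea only. The 5th/95th percentile cut-offs, the function names `classify` and `barcode`, and the toy data are assumptions made for illustration, not the procedure published by Linard & al.

```python
from statistics import quantiles

def classify(value, distribution, low_pct=5, high_pct=95):
    """Label a value against one species' parameter distribution."""
    if value is None:                      # gene has no counterpart in this species
        return "absent"
    cuts = quantiles(distribution, n=100)  # 99 percentile cut points
    if value < cuts[low_pct - 1]:
        return "low"                       # lower atypical value
    if value > cuts[high_pct - 1]:
        return "high"                      # higher atypical value
    return "typical"

def barcode(gene_values, distributions):
    """Build one barcode: a row of N labels for each species.

    gene_values:   {species: [N values, None where absent]}
    distributions: {species: [N lists of values observed across all genes]}
    """
    return {
        species: [classify(v, dist) for v, dist in zip(values, distributions[species])]
        for species, values in gene_values.items()
    }

# Toy data (hypothetical): 1 species, 2 parameters, background values 1..100.
background = {"mouse": [list(range(1, 101)), list(range(1, 101))]}
print(barcode({"mouse": [3, 50]}, background))
```

Any other definition of "atypical" (for example a z-score threshold) could replace the percentile test without changing the barcode structure.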

SLIDE 16

EvoluCodes

  • 2. Evolucodes

Several EvoluCode profiles

  • Developmental pluripotency-associated protein 3 (DPPA3_HUMAN): recent innovation in primates and rodents, fast-evolving gene
  • Pogo transposable element with ZNF domain (POGZ_HUMAN): mammalian innovation with a new domain composition, strongly conserved in all mammals since this genetic event
  • HERV-K_1q22 provirus ancestral Pol protein (POK12_HUMAN): variable distribution in vertebrates, viral DNA integration

Colour legend: high parameter value, low parameter value, generally observed value, absent in the species

SLIDE 17

EvoluCodes

  • 2. Evolucodes

1D-EvoluCodes

n vertebrate species (Mammalia, Teleostei, Reptilia, Amphibia) × N parameters

→ Mean normalized by phylum composition

→ Vector of N values (1 value per parameter)
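One plausible reading of "mean normalized by phylum composition" is that each parameter is first averaged within each clade and the clade means are then averaged, so that a heavily sampled clade such as Mammalia does not dominate. The sketch below implements that reading. The function name `one_d_evolucode` and the input layout are assumptions for illustration, and the actual normalisation used for 1D-EvoluCodes may differ.

```python
from collections import defaultdict

def one_d_evolucode(values, clade_of):
    """Collapse a species x parameter barcode into a vector of N values.

    values:   {species: [N parameter values, None where absent]}
    clade_of: {species: clade name}
    """
    n_params = len(next(iter(values.values())))
    vector = []
    for p in range(n_params):
        per_clade = defaultdict(list)
        for species, params in values.items():
            if params[p] is not None:
                per_clade[clade_of[species]].append(params[p])
        # Average within each clade first, then across clades.
        clade_means = [sum(v) / len(v) for v in per_clade.values()]
        vector.append(sum(clade_means) / len(clade_means) if clade_means else None)
    return vector

# Toy data (hypothetical): two mammals and one fish, a single parameter.
values = {"mouse": [1.0], "rat": [3.0], "zebrafish": [6.0]}
clades = {"mouse": "Mammalia", "rat": "Mammalia", "zebrafish": "Teleostei"}
print(one_d_evolucode(values, clades))
```

With a plain mean the two mammals would pull the value towards 3.3; the clade-balanced mean gives the fish the same weight as the mammal group.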

SLIDE 18

Large scale analysis

  • 2. Evolucodes

Chromosome 17, keratin type 1 cluster region (Michael Hesse & al, 2003)

[Figure: EvoluCodes along the human type 1 keratin cluster. Region labels: cytokeratins, hair keratins, inner root sheath keratins, keratin-associated proteins (mammal-specific cluster). Chromosome coordinates shown: 38,811,872; 39,780,882; 39,155,446; 39,502,371. Parameter labels: Sequence Conservation Clade, Hydrophobicity.]
SLIDE 19

Large scale analysis

  • 2. Evolucodes

Clustering the EvoluCodes

  • non-parametric technique, super-paramagnetic clustering
  • improved Potts clustering model (Murua et al., 2008)

[Figure: clustered EvoluCodes. Parameter labels: Sequence Conservation Clade, Hydrophobicity.]

→ 303 EvoluCode clusters
→ 1 cluster = similar evolutionary scenario

Functional enrichment analysis

# genes  GO accession  GO term                                               log10(p)     FDR
55       GO:0022904    respiratory electron transport chain                  -10.894378
         GO:0006796    phosphate metabolic process                           -5.162176    0.003
130      GO:0007608    sensory perception of smell                           -69.573133
         GO:0007606    sensory perception of chemical stimulus               -66.771345
         GO:0007186    G-protein coupled receptor protein signaling pathway  -55.368505
88       GO:0042742    defense response to bacterium                         -10.156822
         GO:0009607    response to biotic stimulus                           -5.232461
         GO:0006950    response to stress                                    -4.145167    0.018
129      GO:0030029    actin filament-based process                          -8.190798
         GO:0007265    Ras protein signal transduction                       -3.375746    0.015
         GO:0014065    phosphoinositide 3-kinase cascade                     -2.923239    0.031
25       GO:0006414    translational elongation                              -14.67022
         GO:0042273    ribosomal large subunit biogenesis                    -5.260087
         GO:0016072    rRNA metabolic process                                -4.21555
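The enrichment table reports a p-value and an FDR for each GO term within a cluster. The slide does not say which test produced them. As an illustration, the sketch below uses a common choice: a one-sided hypergeometric test followed by Benjamini-Hochberg correction. Both the choice of test and the function names are assumptions, not the documented method of this analysis.

```python
from math import comb, log10

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster
    of n genes, when K of the N background genes carry the GO term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy example (hypothetical counts): 10 of 55 cluster genes carry a term
# that annotates 100 of 20,000 background genes.
p = hypergeom_pvalue(10, 55, 100, 20000)
print(log10(p), benjamini_hochberg([p, 0.04, 0.03]))
```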
SLIDE 20

Reaching system level

  • 3. Evolutionary knowledge extraction

Mapping EvoluCodes with biological networks

Human proteome EvoluCodes (schematic representation of vertebrate gene evolutionary histories) + gene network = "evolutionary map"

[Figure: gene network extracted from KEGG map hsa04141. Labels: eIF2α, NRF2, ATF4, AARE, CHOP, Bcl2, S1P, S2P, WFS1, GADD34, ERSE, PERK, ATF6, p50 ATF6; apoptosis, ER stress, ER stress recovery; endoplasmic reticulum (ER), nucleus, cytosol.]

The evolutionary map:

  • provides an evolutionary context for a network
  • allows knowledge discovery

Local outlier factor approach, LOF_k(A): identifies "outlier" evolutionary histories, with outlierness based on multi-scale parameters!

Phenotypic variation and gene variation: linking both?
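The local outlier factor named on this slide is commonly defined (Breunig et al., 2000) as the average ratio between the local reachability density of the k nearest neighbours of A and that of A itself. Values near 1 indicate a point as dense as its neighbourhood, and values well above 1 indicate an outlier. The sketch below is a small pure-Python version of that standard definition. Applying it to 1D-EvoluCode vectors, and the toy points used here, are assumptions for illustration.

```python
from math import dist

def local_outlier_factors(points, k):
    """Return the LOF score of every point (standard Breunig et al. definition).

    Assumes no more than k duplicate points, otherwise a density is undefined.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]   # pairwise distances

    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        kd = d[i][order[k - 1]]                           # k-distance of point i
        k_dist.append(kd)
        neighbours.append([j for j in order if d[i][j] <= kd])  # ties included

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Toy vectors (hypothetical): four similar histories and one atypical one.
vectors = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = local_outlier_factors(vectors, k=2)
print(scores)
```

For larger datasets, an established implementation such as scikit-learn's `LocalOutlierFactor` would be the more practical choice.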

SLIDE 21

Pathway-level knowledge discovery

  • 3. Evolutionary knowledge extraction

KEGG Pathway database (www.genome.jp/kegg/)

Analysis of 40 human metabolic pathways

Total number of pathway reactions: 875

Evolutionary history and network topology

Topological classes of reactions (groups shown on the slide: Connectivity, Redundancy):

  • A: multiple genes for same reaction step
  • B: alternative path for 2-n reactions
  • C: pathway interface (reaction linked to another pathway)
  • D: start/end point of pathway, single substrate/product
  • E: multiple substrates and/or products
  • F: other topology, mainly linear paths

SLIDE 22

Pathway-level knowledge discovery

  • 3. Evolutionary knowledge extraction

[Figure: the six topological classes A to F, grouped under Connectivity, Redundancy and Other topology]

Distribution of outliers in 6 topological classes

  • Pie chart: C 35%, A 19%, B 5%, D 16%, E 8%, F 17%
  • Bar chart, topological classification of outliers (normalised by total number of reactions): D 27%, C 22%, B 19%, A 15%, E 14%, F 8%

→ The evolutionary history of metabolic genes is related to pathway topology!
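The two sets of percentages on this slide can be read as two views of the same outlier counts: the share of all outliers that falls in each class (which sums to 100%), and the outlier rate within each class once divided by the number of reactions in that class. The sketch below shows the arithmetic under that reading. The counts are invented for illustration, since the slide only gives percentages, and the function name `outlier_shares` is hypothetical.

```python
def outlier_shares(outliers, reactions):
    """Return (share of all outliers per class, outlier rate within each class).

    outliers:  {class: number of outlier reactions in the class}
    reactions: {class: total number of reactions in the class}
    """
    total = sum(outliers.values())
    share = {c: outliers[c] / total for c in outliers}
    rate = {c: outliers[c] / reactions[c] for c in outliers}
    return share, rate

# Hypothetical counts: class A is large, class B is small.
share, rate = outlier_shares({"A": 3, "B": 1}, {"A": 30, "B": 4})
print(share, rate)
```

In this toy case class A holds most of the outliers (75%), yet class B has the higher outlier rate (25% against 10%), which is why normalising by class size can reorder the classes.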

SLIDE 23

Cellular-level knowledge extraction

  • 3. Evolutionary knowledge extraction

Global analysis for all human pathways

Inter-pathway graph

[Figure: inter-pathway graph with graph nodes A, B, C, D and link labels 3x, 3x, 2x, 2x]

Widely distributed genes

  • Genes: SLC9A1 (hsa:6548), CFTR (hsa:1080), GNAS (hsa:2778), CA2 (hsa:760), CHRM3 (hsa:1131)
  • Pathways: vascular smooth muscle contraction (hsa04270), salivary secretion (hsa04970), gastric acid secretion (hsa04971), pancreatic secretion (hsa04972), bile secretion (hsa04976)

Representing the differential evolutionary behavior of pathways
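An inter-pathway graph of this kind can be derived from pathway membership alone: pathways become nodes, and two pathways are linked when they share at least one gene. The sketch below builds such a graph. The gene-to-pathway memberships are invented for illustration and are not KEGG annotations, and the function name `inter_pathway_graph` is hypothetical.

```python
from itertools import combinations

def inter_pathway_graph(pathway_genes):
    """Link every pair of pathways that share genes.

    pathway_genes: {pathway id: set of gene symbols}
    Returns {(pathway_a, pathway_b): set of shared genes}.
    """
    edges = {}
    for a, b in combinations(sorted(pathway_genes), 2):
        shared = pathway_genes[a] & pathway_genes[b]
        if shared:
            edges[(a, b)] = shared
    return edges

# Hypothetical memberships, for illustration only.
membership = {
    "hsa04970": {"CFTR", "GNAS", "CHRM3"},
    "hsa04971": {"GNAS", "CA2", "CHRM3"},
    "hsa04976": {"CFTR", "CA2"},
}
graph = inter_pathway_graph(membership)
for (a, b), genes in graph.items():
    print(a, b, len(genes), sorted(genes))
```

The number of shared genes can serve as an edge weight, and genes that appear on many edges are the widely distributed ones.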

SLIDE 24

Cellular-level knowledge extraction

  • 3. Evolutionary knowledge extraction

Global analysis for all human pathways

Cellular-level evolutionary map

[Figure: pathways shown: p53 signaling, cell cycle, oocyte meiosis, progesterone-mediated oocyte maturation, adipocytokine signaling]

Cell cycle & oocyte meiosis pathways

→ well conserved in most animals

Oocyte maturation pathway

→ varying signals of maturation: steroid hormones (fish, frog), removal of a follicular inhibitor (animals), but they converge on the same target complex, MPF (kinases)

→ Highlighting evolutionary innovations in vertebrate systems!

SLIDE 25

Conclusion

EvoluCodes:

"an innovative way to represent and exploit evolutionary histories in biological systems" (Linard & al., Evo Bioinfo 2012)

→ multi-scale approach
→ high-throughput technique
→ facilitates the use of knowledge discovery techniques

Human genes:

  • Chromosomal regions sharing similar histories
  • Limited number of evolutionary histories
  • Histories linked to gene function/localisation

Human networks:

  • Gene evolutionary history is linked to topology in metabolic networks
  • New high-throughput & multi-scale approaches to uncover evolutionary messages at the cellular level

SLIDE 26

Project perspectives

Perspectives

Integrate supplementary "omics" in EvoluCodes

  • Genomic context (intron/exon): GECO db, gps.igbmc.fr
  • Expression: GxDB, gx.igbmc.fr
  • SNP and disease: SM2PHdb, decrypthon.igbmc.fr/sm2ph/

BME: Best Model Evaluation?

Extending system-level analysis

  • Knowledge discovery: Inductive Logic Programming (ILP) …
  • Direct comparison of "evolutionary maps" constructed for several species

SLIDE 27

Acknowledgments

Julie Thompson, Olivier Poch

Tien-Dao Luu, Luc Moulinier, Jean Muller, Ngoc-Hoan Nguyen, Nicodème Paul, Emmanuel Perrodou, Laetitia Poidevin, Wolfgang Raffelsberger, Raymond Ripp, Nicolas Wicker, Odile Lecompte, Xavier Brochet, Claire Redin, Marc Bigler, Alexis Allot, J-B. Hartmann, Vincent Walter, Vinod Santhara, Alin Inafosi, Cécile Pizot

Pierre Pontarotti

Quest for Orthologs Consortium

Looking for a post-doc (abroad)