SLIDE 1

01‐Apr‐15 1

Orthology* and paralogy

Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, April 2nd 2015

*Not to be confused with Ornithology

BLAST

>protein_sequence_A
MTQSSHAVAA FDLGAALRQE GLTETDYSEI QRDPNRAELG TFGV
>protein_sequence_B
MLTETDYSEI QRRLGRDPNR AELGMFGVMN RAELGMFGY
>protein_sequence_C
MHAVAAFDLG AALRQEGLTE TDYSEIQRRL GRAMFGVMWS EHCCYRNDDA RPLLRPIKSP FGAWVVIV

Using phylogenetics to study evolution of a protein

  • You are interested in the evolution of a protein and use its sequence to find homologs in the database
    – blastp homology search
    – RefSeq database

  • You can make a phylogenetic tree to see how the protein hits (homologs) are related
    – Progressive multiple sequence alignment
    – ML phylogenetic tree

The origin of homologs

  • Homologous genes or proteins result from:
    – Speciation: when separate lineages diverge from a common ancestor and experience different evolutionary pressure
        • This means that the two homologs are present in the genome sequences of different species
        • These homologs are called orthologs
    – Gene duplication: when a gene is duplicated within a species and the duplication becomes fixed in the population
        • This means that the two homologs are present in the same genome sequence
        • These homologs are called paralogs

[Figure: ancestral genome and current genomes (speciation); ancestral genome and current genome (gene duplication)]

The evolution of a gene

  • In this example, we assume that:
    – We know all genes in the genomes, so that we have not missed any homolog
    – The gene is never lost from any of the genomes

Speciations and gene duplications

  • Thus, phylogenetic trees of protein families contain two types of nodes
    – Speciation nodes, where the protein sequences in the tree diverged due to a speciation event
    – Gene duplication nodes, where the protein sequences in the tree diverged due to a gene duplication within one genome

SLIDE 2

How to tell the difference?

  • If the two daughter branches of a node contain the same species, it is a gene duplication node
    – Only if those two proteins are both really present in the same genome and there has been no horizontal gene transfer (HGT)

  • Otherwise it can be a speciation node
    – Unless genes were lost or missed in these genomes; this is called unrecognized paralogy
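The species-overlap rule above can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (no HGT, no lost or missed genes). The nested-list tree encoding and the names `leaf_species` and `classify_nodes` are hypothetical choices made for this example, not part of the lecture.

```python
def leaf_species(tree):
    """Return the set of species found under a node.

    A leaf is a (species, gene) tuple; an internal node is a list
    holding its two child trees (an encoding assumed for this sketch).
    """
    if isinstance(tree, tuple):
        return {tree[0]}
    left, right = tree
    return leaf_species(left) | leaf_species(right)


def classify_nodes(tree, path="root"):
    """Label every internal node as 'duplication' or 'speciation'.

    A node is called a duplication when its two daughter branches
    share at least one species; otherwise it is called a speciation.
    """
    if isinstance(tree, tuple):
        return {}
    left, right = tree
    shared = leaf_species(left) & leaf_species(right)
    labels = {path: "duplication" if shared else "speciation"}
    labels.update(classify_nodes(left, path + ".L"))
    labels.update(classify_nodes(right, path + ".R"))
    return labels


# Hypothetical gene tree: one duplication, then a speciation in each copy.
example = [
    [("mouse", "A"), ("human", "A")],
    [("mouse", "B"), ("human", "B")],
]
print(classify_nodes(example))
```

Because the rule only looks at species labels, a speciation call can still hide unrecognized paralogy when genes were lost or missed, as the bullet above warns.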

The evolution of gene functions

  • Orthologs are directly related to the same ancestral gene, but have been retained in different species
    – Since there is only one gene to perform the function, orthologous proteins generally perform the same function in these different genomes

[Figure: ancestral genome and current genomes]

  • Paralogs have evolved side by side in the same genome for a while
    – Since there are two genes with a redundant function, paralogous proteins might be more free to change their function in time
    – Some can acquire new functions (neo‐ and sub‐functionalization)
    – Most gene families evolve through duplications

[Figure: ancestral genome and current genome]

Question

[Figure: Tree 1 with leaves Mouse A, Mouse B, Human A, Human B; Tree 2 with leaves Mouse C, Mouse D, Human C, Human D]

  • Observe the two simplified gene trees above of two homologs from mouse and two homologs from human.

1. Which are the speciation nodes and which are the gene duplication nodes?
2. What kind of homologs are Mouse A and Human A in Tree 1?
3. What kind of homologs are Human C and Human D in Tree 2?
4. Which genes may have the same function in Tree 1?
5. Which genes may have the same function in Tree 2?

Answers

[Figure: Tree 1 and Tree 2 as in the question, with the speciation and gene duplication nodes marked]

1. See above.
2. The Mouse A and Human A genes in Tree 1 are orthologs.
3. The Human C and Human D genes in Tree 2 are paralogs.
4. In Tree 1, Mouse A / Human A, and Mouse B / Human B are more likely to have the same function.
5. We cannot say which genes may have the same function in Tree 2. What we can say is that the genes are all homologous, so they will probably have a similar function to some extent.
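The answers can be checked with a small Python sketch that labels each pair of genes by the node where they last shared an ancestor. The tree figures are not preserved in this text, so the two topologies below are assumptions chosen to agree with answers 2 and 3. The names `leaves` and `pair_relations` are likewise hypothetical.

```python
def leaves(tree):
    """List the (species, gene) leaves under a node of a nested-list tree."""
    if isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def pair_relations(tree, relations=None):
    """Map each pair of leaves to 'orthologs' or 'paralogs'.

    The relation is set at the node that separates the two leaves:
    if its daughter branches share a species it is treated as a
    duplication (paralogs), otherwise as a speciation (orthologs).
    """
    if relations is None:
        relations = {}
    if isinstance(tree, tuple):
        return relations
    left, right = leaves(tree[0]), leaves(tree[1])
    shared = {sp for sp, _ in left} & {sp for sp, _ in right}
    kind = "paralogs" if shared else "orthologs"
    for a in left:
        for b in right:
            relations[frozenset((a, b))] = kind
    pair_relations(tree[0], relations)
    pair_relations(tree[1], relations)
    return relations


# Assumed topologies (the original figures are not available here).
tree1 = [[("Mouse", "A"), ("Human", "A")], [("Mouse", "B"), ("Human", "B")]]
tree2 = [[("Mouse", "C"), ("Mouse", "D")], [("Human", "C"), ("Human", "D")]]

rel1 = pair_relations(tree1)
rel2 = pair_relations(tree2)
print(rel1[frozenset((("Mouse", "A"), ("Human", "A")))])
print(rel2[frozenset((("Human", "C"), ("Human", "D")))])
```

Under the assumed Tree 2 topology, each mouse gene comes out as an ortholog of both human genes, which is one way to see why answer 5 declines to pair them up by function.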

Using orthology for function prediction

  • Researchers are often trying to identify orthologs in model organisms, because if the function of the ortholog has been studied in the model organism, it might perform the same function in other organisms as well
  • However, note that the definition of orthology says nothing about function

[Figure: gene tree over time, from the ancestral Hex gene in the ancestor of the tetrapods to Hex1 and Hex2 in the present species (mouse, human)]

Identifying orthologs

  • Orthologs are best identified by studying phylogenetic trees
    – They are derived from a speciation node
    – There can be simple 1:1 orthologs, but if one or both of the daughter lineages expanded by gene duplication (possibly resulting in many paralogs), there can also be 1:many or many:many orthology relationships
    – This can make function prediction based on orthology difficult

  • Operational definition of orthology: bi‐directional best hits (BBH)
    – Blast human_Hex1 against all proteins in mouse → mouse_Hex1 is the best hit
    – Blast mouse_Hex1 against all proteins in human → if human_Hex1 is the best hit, then human_Hex1 and mouse_Hex1 are probably orthologs
    – If there is no 1:1 BBH, then it is likely that one or both of the orthologs duplicated since the speciation event

[Figure: bi‐directional best hit between Protein_1 in Sp_1 and Protein_2 in Sp_2; gene tree of Hex1 and Hex2 in frog, mouse, and human, descending from the ancestral Hex gene]
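The BBH procedure can be sketched as follows, assuming the BLAST results have already been parsed into score tables. The bit scores are invented for illustration and are not real BLAST output. The function names `best_hit` and `bidirectional_best_hits` are hypothetical.

```python
def best_hit(query, scores):
    """Return the highest-scoring subject for a query, or None if no hits.

    On a tie, the subject listed first in the table wins.
    """
    hits = scores.get(query, {})
    if not hits:
        return None
    return max(hits, key=hits.get)


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs in which each protein is the other's best hit."""
    pairs = []
    for a in a_vs_b:
        b = best_hit(a, a_vs_b)
        if b is not None and best_hit(b, b_vs_a) == a:
            pairs.append((a, b))
    return pairs


# Made-up bit scores, for illustration only.
human_vs_mouse = {
    "human_Hex1": {"mouse_Hex1": 410.0, "mouse_Hex2": 180.0},
    "human_Hex2": {"mouse_Hex2": 395.0, "mouse_Hex1": 175.0},
}
mouse_vs_human = {
    "mouse_Hex1": {"human_Hex1": 405.0, "human_Hex2": 170.0},
    "mouse_Hex2": {"human_Hex2": 390.0, "human_Hex1": 185.0},
}
print(bidirectional_best_hits(human_vs_mouse, mouse_vs_human))
```

A query whose best hit does not point back at it is simply left out of the result, which corresponds to the "no 1:1 BBH" case in the last bullet.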

SLIDE 3

Gene losses

  • If we assume that this tree contains all the members of this gene family in the species shown, then we can deduce where gene losses occurred
  • How many copies of this gene did the last common ancestor of all fishes have?
  • How many copies of this gene did the last common ancestor of all mammals have?

[Figure: gene tree with gene losses marked, including a loss in the ancestor of Medaka and Trout]
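One way to make the deduction of gene losses concrete is sketched below for a single gene copy. It assumes that the copy was present in the root ancestor and that no genes were missed, and it places one loss at the ancestor of every maximal clade lacking the copy. The species tree, the placeholder names `species_X` and `species_Y`, and the function names are hypothetical. Only the Medaka and Trout grouping is taken from the slide.

```python
def species_under(node):
    """Return the set of species names under a node of the species tree.

    A leaf is a species name (str); an internal node is a list of children.
    """
    if isinstance(node, str):
        return {node}
    return set().union(*(species_under(child) for child in node))


def infer_losses(node, present):
    """Return the maximal clades, as sorted tuples, that lack the gene copy.

    Each returned clade stands for one inferred loss in its ancestor,
    which needs fewer events than a separate loss in every species.
    """
    clade = species_under(node)
    if not clade & present:
        return [tuple(sorted(clade))]
    if isinstance(node, str):
        return []
    losses = []
    for child in node:
        losses.extend(infer_losses(child, present))
    return losses


# Hypothetical species tree; species_X and species_Y are placeholders.
species_tree = [["Medaka", "Trout"], ["species_X", "species_Y"]]
copy_present_in = {"species_X", "species_Y"}
print(infer_losses(species_tree, copy_present_in))
```

Here the copy is missing from both Medaka and Trout, so the sketch reports a single loss in their common ancestor rather than two independent losses.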

Gene evolution can quickly get very complex

[Figure: evolution of a gene from an ancestor into species A, B, and C, marking gene duplication nodes (paralogs), speciation nodes (orthologs), gene loss, and gene invention]

  • There are many possible causes of conflict between the phylogenetic tree of a gene and the species tree or “Tree of Life”:
    – Unrecognized paralogy after differential gene losses
    – Horizontal gene transfer
    – Mutation saturation, biases, different rates of evolution

The “Tree” of Life?

  • Complex evolutionary processes, plus several endosymbiosis events such as the mitochondrion and the chloroplast, have led to an intricate network of evolutionary relationships that might not best be represented in the form of a Tree of Life

Detecting HGT in trees