Error Detection and Correction of Gene Trees Using Gene Order

Manuel Lafond, Krister M. Swenson and Nadia El-Mabrouk, Université de Montréal. Gene trees reflect the evolutionary history of a family of homologous genes.


  1. Error Detection and Correction of Gene Trees Using Gene Order. Manuel Lafond, Krister M. Swenson and Nadia El-Mabrouk, Université de Montréal

  2. Introduction: Gene trees reflect the evolutionary history of a family of homologous genes, that is, genes that all descend from a common ancestor. [Figure: gene tree G with leaves g1, g2, g3, g4, g5]

  3. Introduction: Ancestral genes may have undergone speciation or duplication. [Figure: gene tree G with leaves g1 to g5; internal nodes marked as speciations or duplications]

  4. Introduction: Relationships between modern genes are defined by their lowest common ancestor (LCA). Orthologs: the LCA is a speciation (g1 and g5 are orthologs). Paralogs: the LCA is a duplication (g1 and g3 are paralogs). [Figure: gene tree G with leaves g1 to g5; internal nodes marked as speciations or duplications]
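
The LCA rule on this slide can be sketched in a few lines of Python. The tree topology below is an assumption: the slide figure did not survive extraction, so the tree was chosen only so that g1 and g5 come out as orthologs and g1 and g3 as paralogs, as the slide states. The `Node` class and the function names are illustrative, not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    children: List["Node"] = field(default_factory=list)


def path_to(root, leaf_name):
    """Return the list of nodes from root down to the named leaf, or None."""
    if not root.children:
        return [root] if root.name == leaf_name else None
    for child in root.children:
        sub = path_to(child, leaf_name)
        if sub is not None:
            return [root] + sub
    return None


def lca(root, x, y):
    """Lowest common ancestor of leaves x and y: last shared node on both paths."""
    last = None
    for u, v in zip(path_to(root, x), path_to(root, y)):
        if u is not v:
            break
        last = u
    return last


def relationship(root, x, y):
    """Orthologs if the LCA is a speciation, paralogs if it is a duplication."""
    return "orthologs" if lca(root, x, y).event == "speciation" else "paralogs"


# Hypothetical topology (assumption): ((g1, g2), (g3, g4)) under a duplication,
# joined with g5 under a speciation at the root.
g = {n: Node(n) for n in ["g1", "g2", "g3", "g4", "g5"]}
n12 = Node("n12", "speciation", [g["g1"], g["g2"]])
n34 = Node("n34", "speciation", [g["g3"], g["g4"]])
dup = Node("dup", "duplication", [n12, n34])
G = Node("root", "speciation", [dup, g["g5"]])

print(relationship(G, "g1", "g5"))
print(relationship(G, "g1", "g3"))
```

Walking root-to-leaf paths is quadratic in the worst case; it is used here only because it keeps the definition visible.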

  5. Introduction: Speciations and duplications are typically inferred by reconciling G with its corresponding species tree S. Idea: map each modern gene to the species containing it, and add duplications to make G "agree" with S. [Figure: gene tree G with leaves a1, a2, b1, c1, d1; species tree S with leaves a, b, c, d]

  6. Introduction: An internal node g of V(G) is a speciation when there is a node s in V(S) such that the leaves in the left subtree of g all map to leaves in the left subtree of s, and likewise for the right subtrees. [Figure: node g highlighted in gene tree G (leaves a1, a2, b1, c1, d1) and node s highlighted in species tree S (leaves a, b, c, d)]

  8. Introduction: Otherwise, g is a duplication. In this case the duplication is apparent: two copies of the same gene ended up in species a. Non-apparent duplications are also possible (we will see this later). [Figure: node g highlighted in gene tree G (leaves a1, a2, b1, c1, d1) and node s highlighted in species tree S (leaves a, b, c, d)]
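
The speciation/duplication test described on the last slides can be sketched as follows. This is a minimal illustration, not the authors' implementation: trees are nested 2-tuples with strings as leaves, both tree shapes are assumptions (only the leaf labels survive from the figures), and because the trees are treated as unordered, the check also accepts the left and right sides swapped.

```python
def leaves(tree):
    """Set of leaf labels below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def internal_nodes(tree):
    """Yield every internal node of a binary tree given as nested 2-tuples."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree:
        yield from internal_nodes(child)


def is_speciation(g, S, species_of):
    """True if some s in S separates the species found under g's two subtrees."""
    left_sp = {species_of[x] for x in leaves(g[0])}
    right_sp = {species_of[x] for x in leaves(g[1])}
    for s in internal_nodes(S):
        s_left, s_right = leaves(s[0]), leaves(s[1])
        if left_sp <= s_left and right_sp <= s_right:
            return True
        if left_sp <= s_right and right_sp <= s_left:  # sides swapped
            return True
    return False


def classify(G, S, species_of):
    """Map each internal node of G (keyed by its sorted leaves) to an event."""
    return {
        tuple(sorted(leaves(g))): (
            "speciation" if is_speciation(g, S, species_of) else "duplication"
        )
        for g in internal_nodes(G)
    }


# Hypothetical shapes (assumption): both trees are caterpillars.
S = ((("a", "b"), "c"), "d")
G = (((("a1", "a2"), "b1"), "c1"), "d1")
species_of = {"a1": "a", "a2": "a", "b1": "b", "c1": "c", "d1": "d"}

for node, event in classify(G, S, species_of).items():
    print(node, event)
```

With these assumed shapes, the only duplication is the node joining a1 and a2, which matches the apparent duplication described on the slide.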

  9. Introduction: Suppose we are given the orthology/paralogy relationships. For instance, some deity lets us know that a1 and b1 are orthologous. Then this gene tree is wrong! [Figure: gene tree G with leaves a1, a2, b1, c1, d1; species tree S with leaves a, b, c, d]

  10. Introduction: How can we make a1 and b1 orthologous? [Figure: gene tree G with leaves a1, a2, b1, c1, d1; species tree S with leaves a, b, c, d]

  12. Introduction: How can we make a1 and b1 orthologous? [Figure: candidate gene tree G with leaves in the order a2, a1, b1, c1, d1; species tree S with leaves a, b, c, d]

  13. Introduction: How can we make a1 and b1 orthologous? [Figure: candidate gene tree G with leaves in the order a1, b1, c1, a2, d1; species tree S with leaves a, b, c, d]

  14. Introduction: How can we make a1 and b1 orthologous? And how can we do so while changing G as little as possible? What if we are given many orthology constraints? [Figure: candidate gene tree G with leaves in the order a1, b1, c1, a2, d1; species tree S with leaves a, b, c, d]

  15. Problem statement: Given: a gene tree G, a species tree S, and a set P of pairs of genes that are required to be orthologous. Find: a corrected gene tree G' in which every pair (g1, g2) in P is orthologous, such that the Robinson-Foulds distance between G and G' is minimized. [Figure: gene tree G with leaves in the order a1, b1, c1, a2, d1; species tree S with leaves a, b, c, d]
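
To make the objective concrete, here is a minimal sketch of a Robinson-Foulds distance on rooted trees, counted as the number of clades (leaf sets of internal nodes) present in one tree but not in the other. The slides do not say which variant is used, so the rooted, clade-based definition is an assumption, and so are the two tree shapes: only the leaf orders come from the figures.

```python
def clades(tree):
    """Return (leaf set of this node, set of leaf sets of all internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    below = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        below |= child_leaves
        found |= child_clades
    found.add(below)
    return below, found


def rf_distance(t1, t2):
    """Number of clades that appear in exactly one of the two rooted trees."""
    return len(clades(t1)[1] ^ clades(t2)[1])


# Hypothetical shapes (assumption): caterpillar trees over the slide's leaf orders.
G = (((("a1", "a2"), "b1"), "c1"), "d1")
G_corrected = (((("a1", "b1"), "c1"), "a2"), "d1")

print(rf_distance(G, G_corrected))
```

For these two trees the clades {a1, a2} and {a1, a2, b1} occur only in G, and {a1, b1} and {a1, b1, c1} occur only in the corrected tree, so the distance is 4.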

  16. Introduction: Two copies of the same gene (g1, g2) were found in the same species, so we need to infer a duplication. [Figure: gene tree G with leaves in species a, a, b, c, d; species tree S with leaves a, b, c, d]

  17. Accuracy of gene trees: A few misplaced leaves in G can lead to a completely different reconciliation. [Figure: gene tree G with leaves g1:a, g2:a, g3:b, g4:c, g5:d; species tree S with leaves a, b, c, d]

  18. Accuracy of gene trees: A few misplaced leaves in G can lead to a completely different reconciliation. [Figure: gene tree G with leaves g1:a, g2:a, g3:b, g4:c, g5:d; gene tree G' with leaves g1:a, g3:b, g4:c, g2:a, g5:d; species tree S with leaves a, b, c, d]

  20. Accuracy of gene trees: Inaccuracies in gene trees lead to erroneous topologies and erroneous orthology/paralogy relationships. We use gene order to detect and correct such errors. [Figure: gene tree G with leaves g1:a, g2:a, g3:b, g4:c, g5:d; species tree S with leaves a, b, c, d]

  21. Gene tree inference and correction: Some of the information available to infer and correct gene trees: sequences (MP, ML, Bayesian, …); species tree topology (GIGA); branch/clade support (LSM); speciation/duplication events inferred by reconciliation (TreeBeST); gene synteny (SYNERGY); gene position and order on the genome.

  22. Gene order: Genome: a string of genes, giving the order in which genes are found in a given species (for example, the genome of species X: "a b c d e f g …"). Region: a subsequence of a genome, obtained by picking a subset of the genome's genes while maintaining their order (for example, a b c d e f g h … => region b c e g). Typically, we impose a limit on the size of a region and on the genome distance between its members.
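
The region definition above can be checked mechanically. The sketch below assumes that each gene occurs once in the genome and that "genome distance between its members" means the distance between consecutive region members; the slide gives neither detail, and the parameter names `max_size` and `max_gap` are illustrative.

```python
def is_region(genome, region, max_size, max_gap):
    """Check that `region` is an order-preserving subsequence of `genome`
    with at most `max_size` genes and consecutive members at most
    `max_gap` positions apart (both limits are assumed interpretations)."""
    if not region or len(region) > max_size:
        return False
    position = {gene: i for i, gene in enumerate(genome)}
    if any(gene not in position for gene in region):
        return False
    idx = [position[gene] for gene in region]
    pairs = list(zip(idx, idx[1:]))
    in_order = all(i < j for i, j in pairs)
    close_enough = all(j - i <= max_gap for i, j in pairs)
    return in_order and close_enough


genome = list("abcdefgh")
print(is_region(genome, list("bceg"), max_size=4, max_gap=2))
print(is_region(genome, list("bceg"), max_size=4, max_gap=1))
```

The slide's example region b c e g passes with a gap limit of 2 and fails with a gap limit of 1, because c and e are two positions apart.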

  23. Region homology: Two genes are homologous if they descend from a common ancestral gene, which has undergone speciation or duplication.

  24. Region homology: Two genes are homologous if they descend from a common ancestral gene, which has undergone speciation or duplication. Can we define region homology similarly?

  25. Region homology: Two genes are homologous if they descend from a common ancestral gene, which has undergone speciation or duplication. Can we define region homology similarly? Two regions are homologous if they descend from a common ancestral region, which has undergone speciation or duplication.

  26. Region homology: Two genes are homologous if they descend from a common ancestral gene, which has undergone speciation or duplication. Can we define region homology similarly? Two regions are homologous if they descend from a common ancestral region, which has undergone speciation or duplication. What does that even mean?

  27. Region homology: Common ancestral region, for two given regions R1 and R2. [Figure: region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  28. Region homology: Common ancestral region, for two given regions R1 and R2. Subdivide their genes into gene families F1, F2, …, Fn (in the example, four families: a, b, c, d). Look at the roots of the gene trees of all the Fi. [Figure: family roots a, b, c, d above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  29. Region homology: Common ancestral region. If all these ancestral genes are in the same ancestral genome, R1 and R2 share a common ancestral region R_A. [Figure: ancestral region R_A (a, b, c, d) above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  30. Region homology: Region speciation: all the roots are speciations. [Figure: ancestral region R_A (a, b, c, d) above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  31. Region homology: Region duplication: all the roots are duplications. This corresponds to a segmental duplication (or "region duplication") in the ancestral genome. [Figure: ancestral region R_A (a, b, c, d) above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  32. Region homology: Regions that are not homologous. [Figure: ancestral region R_A (a, b, c, d) above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  33. No convergent evolution hypothesis: Hypothesis: similar regions are homologous. [Figure: ancestral region R_A (a, b, c, d) above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  34. Homology contradiction: If we find two similar regions and look at the roots of the gene family trees, we expect them all to be of the same type. [Figure: region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  35. Homology contradiction: If we find two similar regions and look at the roots of the gene family trees, we expect them all to be of the same type. [Figure: family roots a, b, c, d above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  37. Homology contradiction: Otherwise, there is a homology contradiction (an error in one of the gene trees). [Figure: region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]
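
The check described on these slides amounts to comparing root event types across the gene families shared by two similar regions. A minimal sketch, assuming the root event of each family tree has already been inferred (for instance by reconciliation); the function name, the input dictionary and the returned labels are illustrative:

```python
def compare_regions(root_events):
    """Classify a pair of similar regions from the root event of each shared
    gene family tree, given as a dict: family name -> "speciation" or
    "duplication"."""
    if not root_events:
        raise ValueError("need at least one shared gene family")
    kinds = set(root_events.values())
    if kinds == {"speciation"}:
        return "region speciation"
    if kinds == {"duplication"}:
        return "region duplication"
    # Mixed (or unknown) root types: at least one gene tree is suspect.
    return "homology contradiction"


print(compare_regions({"a": "speciation", "b": "speciation",
                       "c": "speciation", "d": "speciation"}))
print(compare_regions({"a": "speciation", "b": "duplication",
                       "c": "speciation", "d": "speciation"}))
```

In the second call the b family disagrees with the other three, which is the situation the slides flag as a likely gene tree error.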

  38. Homology contradiction: Why not? If b_A duplicated, the copy typically went somewhere else on the ancestral genome. [Figure: ancestral gene b_A above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  39. Homology contradiction: Why not? If b_A duplicated, the copy typically went somewhere else on the ancestral genome. [Figure: ancestral gene b_A and its copy b_A' above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  40. Homology contradiction: Why not? If b_A duplicated, the copy typically went somewhere else on the ancestral genome. It would then have had to end up, during evolution and mostly by chance, in a region similar to R1. [Figure: ancestral gene b_A and its copy b_A' above region R1 (a1, b1, c1, d1) on genome X and region R2 (a2, b2, c2, d2) on genome Y]

  41. Strong no convergent evolution: Hypothesis: similarity is inherited from the common ancestral region, and is preserved during the course of evolution. [Figure: gene tree G for the g family, with leaf labels a1 g1 b1, g2, g3, a2 g4 b2]

  42. Strong no convergent evolution: Hypothesis: similarity is inherited from the common ancestral region, and is preserved during the course of evolution. [Figure: ancestral region a_A g_A b_A above the gene tree for the g family, with leaf labels a1 g1 b1, g2, g3, a2 g4 b2]
