phylogenetic tree



  1. Phylogenetic tree Michael Schroeder Biotechnology Center TU Dresden

  2. Phylogenetic trees • Motivation • Rooted and unrooted trees • Rooted trees: Hierarchical clustering • Drawing trees • Unrooted trees: Neighbour joining

  3. Distance in matrix = distance in tree?

             A   B   C   D   E
         A       2   6  10   9
         B           5   9   8
         C               4   5
         D                   3
         E

     [Figure: tree with leaves A, B, C, D, E]

  4. Distance in matrix = distance in 2D?

             A   B   C   D   E
         A       2   6  10   9
         B           5   9   8
         C               4   5
         D                   3
         E

  5. Distance in matrix = distance in 2D?

             A   B   C   D   E
         A       2   6  10   9
         B           5   9   8
         C               4   5
         D                   3
         E

     [Figure: points A, B, C, D, E placed in the plane]

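  6. Distance in matrix = distance in 2D? No, not always.

             A   B   C
         A       1  10
         B           1
         C

     Here d(A,C) = 10 exceeds d(A,B) + d(B,C) = 2, so no three points in the plane have these distances.

     Distance in matrix = distance in tree?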
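
One way to see why the three-node matrix cannot be drawn to scale is to test the triangle inequality, which any set of points in the plane must satisfy. The following is a minimal sketch; the helper name `triangle_violations` is made up for illustration.

```python
from itertools import permutations

# Distances from the three-node example: d(A,B) = 1, d(A,C) = 10, d(B,C) = 1.
D = {frozenset("AB"): 1, frozenset("AC"): 10, frozenset("BC"): 1}

def triangle_violations(d, labels):
    """Return triples (i, j, k) where the direct distance i-k is longer
    than the detour i-j-k, which is impossible for points in a plane."""
    return [
        (i, j, k)
        for i, j, k in permutations(labels, 3)
        if d[frozenset((i, k))] > d[frozenset((i, j))] + d[frozenset((j, k))]
    ]

violations = triangle_violations(D, "ABC")
print(violations)
```

Both reported triples concern the direct distance between A and C, which is longer than the detour via B.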

  7. Distance in matrix = distance in tree
     • If the tree is additive, the distance from v to w is the sum of the lengths of the edges connecting v to w

  8. Additive Tree: Is there an additive tree?

             A   B   C   D   E   F
         A      27  24  22  31  30
         B          11  21  12  11
         C              18  15  14
         D                  25  24
         E                       5
         F

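  9. Additive Tree: Yes, there is an additive tree

             A   B   C   D   E   F
         A      27  24  22  31  30
         B          11  21  12  11
         C              18  15  14
         D                  25  24
         E                       5
         F

     [Figure: unrooted tree with leaf edges A 14, B 4, C 4, D 8, E 3, F 2; A and D share an internal node, E and F share an internal node, and the internal edges between these two nodes have lengths 6, 3, 5, with C and B branching off along the way]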
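
The claim that this tree is additive can be checked by summing edge lengths along each leaf-to-leaf path and comparing with the matrix. The sketch below assumes the topology implied by the edge labels in the figure; the internal node names `x`, `q`, `p`, `y` are hypothetical, since the slide does not name them.

```python
# Edges of the tree; x, q, p, y are made-up names for the internal nodes.
EDGES = {
    ("A", "x"): 14, ("D", "x"): 8,
    ("x", "q"): 6, ("C", "q"): 4,
    ("q", "p"): 3, ("B", "p"): 4,
    ("p", "y"): 5, ("E", "y"): 3, ("F", "y"): 2,
}

# Upper triangle of the distance matrix from the slide.
MATRIX = {
    ("A", "B"): 27, ("A", "C"): 24, ("A", "D"): 22, ("A", "E"): 31, ("A", "F"): 30,
    ("B", "C"): 11, ("B", "D"): 21, ("B", "E"): 12, ("B", "F"): 11,
    ("C", "D"): 18, ("C", "E"): 15, ("C", "F"): 14,
    ("D", "E"): 25, ("D", "F"): 24,
    ("E", "F"): 5,
}

def build_adjacency(edges):
    """Turn the edge list into a symmetric adjacency map."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def tree_distance(adj, start, goal):
    """Sum of edge lengths on the unique path from start to goal."""
    stack = [(start, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, w in adj[node].items():
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    raise ValueError("nodes are not connected")

adj = build_adjacency(EDGES)
matches = all(tree_distance(adj, a, b) == d for (a, b), d in MATRIX.items())
print(matches)
```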

  10. Additive Tree
      Tree is additive iff* every four nodes can be labelled i, j, k, l such that
      $D_{i,j} + D_{k,l} = D_{i,k} + D_{j,l} \ge D_{i,l} + D_{j,k}$
      [Figure: four nodes labelled j, i, l, k]
      * "iff" is used in maths and computer science for "if and only if"
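
The four-point condition can be tested mechanically: for every group of four nodes, compute the three possible pair sums and check that the two largest are equal. This is a sketch under that reading of the condition; the function names and the tolerance parameter `tol` are assumptions. It is applied to the six-node matrix from the "Additive Tree" slides and to the five-node matrix used in the WPGMA example of this deck.

```python
from itertools import combinations

def upper_triangle(labels, rows):
    """Map label pairs to distances from the rows of an upper-triangular matrix."""
    return {
        frozenset((labels[i], labels[i + off])): value
        for i, row in enumerate(rows)
        for off, value in enumerate(row, start=1)
    }

def is_additive(dist, labels, tol=1e-9):
    """Four-point condition: in every quadruple, the two largest of the
    three pair sums must be equal."""
    def d(a, b):
        return dist[frozenset((a, b))]
    for i, j, k, l in combinations(labels, 4):
        sums = sorted([d(i, j) + d(k, l), d(i, k) + d(j, l), d(i, l) + d(j, k)])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

SIX = upper_triangle("ABCDEF", [[27, 24, 22, 31, 30], [11, 21, 12, 11],
                                [18, 15, 14], [25, 24], [5]])
FIVE = upper_triangle("ABCDE", [[3, 7, 8, 10], [6, 8, 7], [4, 5], [6]])

print(is_additive(SIX, "ABCDEF"))
print(is_additive(FIVE, "ABCDE"))
```

The second matrix fails the test already for A, B, C, D, where the pair sums are 7, 14 and 15.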

  11. Constructing the Edges of the Tree: average linkage (WPGMA)

      Original distances:

                 A     B     C     D     E
          A            3     7     8    10
          B                  6     8     7
          C                        4     5
          D                              6
          E

      After joining A and B:

                 (A,B)     C     D     E
          (A,B)          6.5     8   8.5
          C                      4     5
          D                            6
          E

      [Figure: node (A,B) joined to A and to B by edges of length 1.5 each]

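  12. Constructing the Edges of the Tree: average linkage (WPGMA)

      Before:

                 (A,B)     C     D     E
          (A,B)          6.5     8   8.5
          C                      4     5
          D                            6
          E

      After joining C and D:

                 (A,B)  (C,D)     E
          (A,B)          7.25   8.5
          (C,D)                 5.5
          E

      [Figure: (A,B) joined to A and B by edges of length 1.5 each; (C,D) joined to C and D by edges of length 2 each]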

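  13. Constructing the Edges of the Tree: average linkage (WPGMA)

      Before:

                 (A,B)  (C,D)     E
          (A,B)          7.25   8.5
          (C,D)                 5.5
          E

      After joining (C,D) and E:

                      (A,B)  ((C,D),E)
          (A,B)                  7.875
          ((C,D),E)

      [Figure: ((C,D),E) joined to (C,D) by an edge of length 0.75 and to E by an edge of length 2.75; (A,B) joined to A and B by edges of length 1.5 each; (C,D) joined to C and D by edges of length 2 each]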

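  14. Constructing the Edges of the Tree

                      (A,B)  ((C,D),E)
          (A,B)                  7.875
          ((C,D),E)

      [Figure: root ((A,B),((C,D),E)) joined to (A,B) by an edge of length 2.4375 and to ((C,D),E) by an edge of length 1.1875; ((C,D),E) joined to (C,D) by 0.75 and to E by 2.75; (A,B) joined to A and B by 1.5 each; (C,D) joined to C and D by 2 each]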

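  15. Constructing the Edges of the Tree
      • If node w = (v,u) joins nodes v and u, then $L_{v,w} = 0.5\,D_{u,v} - L_{v,v'}$
      • D refers to the distances (from the matrix) and L to the lengths of the edges
      • $L_{v,v'}$ is zero if v is a leaf node; otherwise it is the length of the path from v down to a leaf v' below it
      [Figure: node w joined to u and v; v joined to v' below it; edges labelled $L_{v,w}$ and $L_{v,v'}$]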
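
The merging steps and the edge-length rule can be combined into a short program. This is a sketch of WPGMA as described on the slides, not a reference implementation; the function name `wpgma` and the tuple layout of its result are assumptions. The term `height[v]` plays the role of $L_{v,v'}$.

```python
from itertools import combinations

def wpgma(labels, dist):
    """Average-linkage (WPGMA) clustering.

    dist maps frozenset({a, b}) to a distance. Returns one tuple per merge:
    (v, u, height of the new node w, edge length v-w, edge length u-w).
    """
    dist = dict(dist)                        # work on a copy
    height = {name: 0.0 for name in labels}  # leaves sit at height 0
    active = list(labels)
    merges = []
    while len(active) > 1:
        # Join the closest pair of active clusters.
        v, u = min(combinations(active, 2), key=lambda p: dist[frozenset(p)])
        h = dist[frozenset((v, u))] / 2      # 0.5 * D_{u,v}
        w = f"({v},{u})"
        merges.append((v, u, h, h - height[v], h - height[u]))
        height[w] = h
        active = [x for x in active if x not in (v, u)]
        for k in active:                     # WPGMA: plain average of the two rows
            dist[frozenset((w, k))] = (dist[frozenset((v, k))]
                                       + dist[frozenset((u, k))]) / 2
        active.append(w)
    return merges

LABELS = ["A", "B", "C", "D", "E"]
ROWS = [[3, 7, 8, 10], [6, 8, 7], [4, 5], [6]]
DIST = {frozenset((LABELS[i], LABELS[i + off])): val
        for i, row in enumerate(ROWS) for off, val in enumerate(row, start=1)}

merges = wpgma(LABELS, DIST)
for merge in merges:
    print(merge)
```

On the example matrix the last merge joins (A,B) with the cluster of C, D and E at height 3.9375, giving edges of length 2.4375 and 1.1875 as on the slides. The cluster is named `(E,(C,D))` here because the sketch does not reorder names.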

  16. Original and tree distances may differ
      The linkage method changes the distances. The tree reflects the changed distances.

      Original distances:

                 A     B      C      D      E
          A            3      7      8     10
          B                   6      8      7
          C                          4      5
          D                                 6
          E

      Distances in tree:

                 A     B      C      D      E
          A            3  7.875  7.875  7.875
          B               7.875  7.875  7.875
          C                          4    5.5
          D                               5.5
          E

      [Figure: the WPGMA tree with edge lengths 1.5, 1.5 (A, B), 2, 2 (C, D), 0.75 and 2.75 ((C,D) and E)]
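
In a tree built this way, the distance between two leaves is twice the height of the node at which they are first joined. The sketch below recomputes the "Distances in tree" table from the merge heights of the example and counts how many pairs changed; the data layout of `MERGES` is an assumption made for illustration.

```python
# (leaves under the left child, leaves under the right child, height of the join)
MERGES = [
    ({"A"}, {"B"}, 1.5),
    ({"C"}, {"D"}, 2.0),
    ({"C", "D"}, {"E"}, 2.75),
    ({"A", "B"}, {"C", "D", "E"}, 3.9375),
]

ORIGINAL = {
    frozenset("AB"): 3, frozenset("AC"): 7, frozenset("AD"): 8, frozenset("AE"): 10,
    frozenset("BC"): 6, frozenset("BD"): 8, frozenset("BE"): 7,
    frozenset("CD"): 4, frozenset("CE"): 5,
    frozenset("DE"): 6,
}

def tree_distances(merges):
    """Leaf-to-leaf distances: two leaves meet at the join that first puts
    them on opposite sides, and the path goes up and down its height."""
    dist = {}
    for left, right, height in merges:
        for a in left:
            for b in right:
                dist[frozenset((a, b))] = 2 * height
    return dist

tree = tree_distances(MERGES)
changed = sorted("".join(sorted(pair)) for pair in ORIGINAL
                 if ORIGINAL[pair] != tree[pair])
print(changed)
```

Only the pairs A-B and C-D keep their original distance; the other eight are replaced by averages.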

  17. Is hierarchical clustering always right?

      Original data:

                 A   B   C   D   E   F
          A          5   4   7   6   8
          B              7  10   9  11
          C                  7   6   8
          D                      5   9
          E                          8
          F

      • A and C are closest
      • A and C have the same distance to D, E, and F
      • B is closer to A than to C

      http://www.icp.ucl.ac.be/~opperd/private/upgma.html

  18. Is hierarchical clustering always right?

      Same data as on slide 17:
      • A and C are closest
      • A and C have the same distance to D, E, and F
      • B is closer to A than to C

      [Figure: two panels, "Original data" and "Topology by UPGMA" (Unweighted pair group method using arithmetic mean)]

      http://www.icp.ucl.ac.be/~opperd/private/upgma.html

  19. Is hierarchical clustering always right?

      Same data as on slide 17:
      • A and C are closest
      • A and C have the same distance to D, E, and F
      • B is closer to A than to C

      [Figure: three panels, "Original data", "Topology by UPGMA" (Unweighted pair group method using arithmetic mean), and "Better topology"]

      How do we compute the better topology? Hierarchical clustering takes a local perspective; we need a global one.

      http://www.icp.ucl.ac.be/~opperd/private/upgma.html

  20. Phylogenetic trees • Motivation • Rooted and unrooted trees • Rooted trees: Hierarchical clustering • Drawing trees • Unrooted trees: Neighbour joining

  21. Neighbour Joining Based on wikipedia

  22. Neighbour Joining
      • Idea: join the pair of nodes that is farthest from all other nodes
      • Let $u_i$ be the sum of the distances from node i to all other nodes: $u_i = \sum_{k=1}^{n} d_{ik}$
      • Find the pair of nodes i, j with minimal $Q(i,j) = (n-2)\,d_{i,j} - (u_i + u_j)$
      Based on Wikipedia and Felsenstein. Phylogenies

  23. Neighbour Joining
      1. Calculate Q
      2. Choose the pair i, j with the lowest value in Q
      3. Create a new node u
      4. Calculate the distances from i and j to u
      5. Calculate the distances from all remaining k to u
      6. Start the algorithm again, replacing i and j by u

  24. Example

      Distances:

                 a    b    c    d    e
          a           5    9    9    8
          b               10   10    9
          c                     8    7
          d                          3
          e

      Q:

                 a    b    c    d    e
          a         -50  -38  -34  -34
          b              -38  -34  -34
          c                   -40  -40
          d                        -48
          e

      $u_i = \sum_{k=1}^{n} d_{ik}$, $Q(i,j) = (n-2)\,d_{i,j} - (u_i + u_j)$

      Based on Wikipedia
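
The Q table can be reproduced directly from the two formulas. The following is a minimal sketch; the function name `q_matrix` and the use of `frozenset` keys for unordered pairs are choices made for illustration.

```python
from itertools import combinations

LABELS = ["a", "b", "c", "d", "e"]
ROWS = [[5, 9, 9, 8], [10, 10, 9], [8, 7], [3]]   # upper triangle of the example
D = {frozenset((LABELS[i], LABELS[i + off])): val
     for i, row in enumerate(ROWS) for off, val in enumerate(row, start=1)}

def q_matrix(labels, d):
    """Q(i,j) = (n-2) * d(i,j) - (u_i + u_j), with u_i the row sum of node i."""
    n = len(labels)
    u = {i: sum(d[frozenset((i, k))] for k in labels if k != i) for i in labels}
    return {
        frozenset((i, j)): (n - 2) * d[frozenset((i, j))] - (u[i] + u[j])
        for i, j in combinations(labels, 2)
    }

Q = q_matrix(LABELS, D)
best = min(Q, key=Q.get)
print(sorted(best), Q[best])
```

For this matrix the minimum is Q(a,b) = -50, so a and b are joined first.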

  25. Example

      Distances:

                 a    b    c    d    e
          a           5    9    9    8
          b               10   10    9
          c                     8    7
          d                          3
          e

      If node u joins i and j, then the distance from i to u is

      $d(i,u) = \frac{1}{2}\,d(i,j) + \frac{1}{2(n-2)}\,(u_i - u_j)$, where $u_i = \sum_{k=1}^{n} d_{ik}$

      i.e. give weight to the (differing) distances of i and j to the other nodes k

      Based on Wikipedia
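
Putting the six steps together gives a complete run on the example. This sketch uses the Q and d(i,u) formulas from the slides. For step 5 it assumes the standard neighbour-joining update d(u,k) = (d(i,k) + d(j,k) - d(i,j)) / 2, which the slides do not spell out. The names `neighbour_joining`, `u1`, `u2`, ... are hypothetical, and ties in Q are broken by taking the first minimal pair.

```python
from itertools import combinations

def neighbour_joining(labels, d):
    """Return the edges (node, node, length) of the neighbour-joining tree.
    d maps frozenset({a, b}) to a distance."""
    d = dict(d)
    nodes = list(labels)
    edges = []
    count = 0
    while len(nodes) > 2:
        n = len(nodes)
        u = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Steps 1-2: pair with the lowest Q value.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - u[p[0]] - u[p[1]])
        d_ij = d[frozenset((i, j))]
        # Step 3: new internal node.
        count += 1
        new = f"u{count}"
        # Step 4: branch lengths from i and j to the new node.
        d_iu = 0.5 * d_ij + (u[i] - u[j]) / (2 * (n - 2))
        edges.append((i, new, d_iu))
        edges.append((j, new, d_ij - d_iu))
        # Step 5: distances from the remaining nodes (assumed standard update).
        nodes = [k for k in nodes if k not in (i, j)]
        for k in nodes:
            d[frozenset((new, k))] = 0.5 * (d[frozenset((i, k))]
                                            + d[frozenset((j, k))] - d_ij)
        # Step 6: continue with the new node in place of i and j.
        nodes.append(new)
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges

LABELS = ["a", "b", "c", "d", "e"]
ROWS = [[5, 9, 9, 8], [10, 10, 9], [8, 7], [3]]
D = {frozenset((LABELS[i], LABELS[i + off])): val
     for i, row in enumerate(ROWS) for off, val in enumerate(row, start=1)}

edges = neighbour_joining(LABELS, D)
for edge in edges:
    print(edge)
```

The first join places a at distance 2 and b at distance 3 from the new node, since 0.5 · 5 + (31 - 34) / 6 = 2.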

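  26. Rooting unrooted trees
      • "Lift up" at the midpoint of the longest path in the tree
      • Longest path is from A to E: 14+6+3+5+3 = 31
      • Root at the midpoint of the longest path: 31/2 = 15.5
      • What is an outgroup? How does it relate?
      [Figure: the additive tree with leaf edges A 14, B 4, C 4, D 8, E 3, F 2 and internal edges 6, 3, 5; the root splits the internal edge of length 6 into 1.5 and 4.5]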
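
Midpoint rooting can be sketched as two steps: find the longest path in the tree, then walk along it until half its length is reached. The code below assumes the tree topology implied by the figure's edge labels; the internal node names `x`, `q`, `p`, `y` and the function names are hypothetical.

```python
# Edges of the example tree; x, q, p, y are made-up names for internal nodes.
EDGES = {
    ("A", "x"): 14, ("D", "x"): 8,
    ("x", "q"): 6, ("C", "q"): 4,
    ("q", "p"): 3, ("B", "p"): 4,
    ("p", "y"): 5, ("E", "y"): 3, ("F", "y"): 2,
}

def build_adjacency(edges):
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def farthest(adj, start):
    """Return (distance, path) for the node farthest from start."""
    best = (0, [start])
    stack = [(start, None, 0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[0]:
            best = (dist, path)
        for nxt, w in adj[node].items():
            if nxt != parent:
                stack.append((nxt, node, dist + w, path + [nxt]))
    return best

def midpoint_root(adj):
    """Return (length of longest path, path, (u, v, offset)): the root lies
    on edge u-v, at distance offset from u."""
    # Two sweeps: the node farthest from any start is one end of the longest path.
    _, first = farthest(adj, next(iter(adj)))
    length, path = farthest(adj, first[-1])
    half, walked = length / 2, 0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= half:
            return length, path, (u, v, half - walked)
        walked += adj[u][v]

length, path, root_edge = midpoint_root(build_adjacency(EDGES))
print(length, path, root_edge)
```

The root falls on the internal edge of length 6, at 4.5 from the end nearer E and 1.5 from the end nearer A, matching the 1.5 and 4.5 in the figure.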

  27. Assessing Quality: Bootstrapping
      • Given a tree obtained from one of the methods above
      • Generate a multiple alignment
      • For a number of iterations:
        • Generate new sequences by selecting columns (possibly the same column more than once) from the multiple alignment
        • Generate a tree for the new sequences
        • Compare this new tree with the given tree
        • For each cluster in the given tree that also appears in the new tree, the bootstrap value is increased
      • Bootstrap value = percentage of trees containing the same cluster
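
The resampling loop can be sketched as follows. The tree-building method is passed in as a function that returns the set of clusters of its tree. The `closest_pair_cluster` function here is only a toy stand-in that reports a single cluster (the two sequences with the smallest Hamming distance), not one of the tree methods from the slides. All names, the iteration count and the seed are assumptions.

```python
import random
from itertools import combinations

def resample_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def bootstrap_values(alignment, build_clusters, iterations=100, seed=0):
    """Percentage of resampled trees that contain each cluster of the given tree."""
    rng = random.Random(seed)
    reference = build_clusters(alignment)
    counts = {cluster: 0 for cluster in reference}
    for _ in range(iterations):
        replicate = build_clusters(resample_columns(alignment, rng))
        for cluster in reference:
            if cluster in replicate:
                counts[cluster] += 1
    return {cluster: 100.0 * c / iterations for cluster, c in counts.items()}

def closest_pair_cluster(alignment):
    """Toy stand-in for a tree builder: one cluster, the closest pair."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    pair = min(combinations(sorted(alignment), 2), key=lambda p: hamming(*p))
    return {frozenset(pair)}

ALIGNMENT = {"a": "GGGGGG", "b": "GGGAGT", "c": "GGATAG", "d": "GATCAT"}
support = bootstrap_values(ALIGNMENT, closest_pair_cluster)
for cluster, value in support.items():
    print(sorted(cluster), value)
```

A real analysis would plug in a full tree method, for example neighbour joining, and compare every cluster of the tree rather than just one.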

  28. Parsimony method
      • Approach: generate the "smallest" tree containing all the sequences as leaves

          Seq  1  2  3  4  5  6
          a    G  G  G  G  G  G
          b    G  G  G  A  G  T
          c    G  G  A  T  A  G
          d    G  A  T  C  A  T

      [Figure: tree with leaves a GGGGGG, b GGGAGT, c GGATAG, d GATCAT; branches are labelled with substitutions (site and base change): 3 G->A, 4 G->T, 5 G->A, 2 G->A, 3 T->A, 4 G->A, 4 T->C, 6 G->T, 6 G->T]

  29. Parsimony
      • Generate the smallest tree
      • Informative vs. non-informative sites
      • Build pairs with the fewest possible substitutions
      • Example:
        • 3 possible trees: ((a,b),(c,d)) or ((a,c),(b,d)) or ((a,d),(b,c))
        • Sites 1, 2, 3, 4 are not informative
        • Sites 5, 6 are informative
        • Site 5: ((a,b),(c,d))
        • Site 6: ((a,c),(b,d))

          Seq  1  2  3  4  5  6
          a    G  G  G  G  G  G
          b    G  G  G  A  G  T
          c    G  G  A  T  A  G
          d    G  A  T  C  A  T
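
The example can be checked by counting substitutions for each of the three trees. The sketch below uses Fitch's algorithm for the counting, which the slides do not name, and treats a site as informative when at least two bases each occur in at least two sequences. The function names are assumptions.

```python
from collections import Counter

SEQS = {"a": "GGGGGG", "b": "GGGAGT", "c": "GGATAG", "d": "GATCAT"}
TOPOLOGIES = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "d"), ("b", "c")),
]

def fitch(tree, site):
    """Return (possible ancestral bases, substitution count) for one column."""
    if isinstance(tree, str):                 # leaf: its own base, no changes
        return {SEQS[tree][site]}, 0
    (left, lcost), (right, rcost) = fitch(tree[0], site), fitch(tree[1], site)
    common = left & right
    if common:                                # children agree: no extra change
        return common, lcost + rcost
    return left | right, lcost + rcost + 1    # children disagree: one change

def tree_length(tree):
    """Minimum number of substitutions over all sites."""
    return sum(fitch(tree, site)[1] for site in range(len(SEQS["a"])))

def informative_sites():
    """1-based sites where at least two bases each occur at least twice."""
    sites = []
    for site in range(len(SEQS["a"])):
        counts = Counter(seq[site] for seq in SEQS.values())
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(site + 1)
    return sites

lengths = [tree_length(t) for t in TOPOLOGIES]
print(informative_sites())
print(lengths)
```

Sites 5 and 6 are the only informative ones. The first two trees need 9 substitutions each and ((a,d),(b,c)) needs 10, because site 5 favours ((a,b),(c,d)) and site 6 favours ((a,c),(b,d)).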

  30. Maximum likelihood
      • Assigns quantitative probabilities to mutation events
      • Reconstructs ancestors for all nodes in the tree
      • Assigns branch lengths based on the probabilities of the mutational events
      • For each possible tree topology, the assumed substitution rates are varied to find the parameters that give the highest likelihood of producing the observed data

  31. Summary
      • Drawing trees from hierarchical clustering
      • Neighbour joining
      • Assessing quality with bootstrapping
      • (Parsimony and maximum likelihood)
