Michael Schroeder
Biotechnology Center TU Dresden
Phylogenetic trees
§ Motivation
§ Rooted and unrooted trees
§ Rooted trees: Hierarchical clustering
§ Drawing trees
§ Unrooted trees: Neighbour joining
§ Distance
Distance matrix:
     A    B    C    D    E
A    -    2    6   10    9
B    -    -    5    9    8
C    -    -    -    4    5
D    -    -    -    -    3
E    -    -    -    -    -
[Figure: tree with leaves A, B, C, D, E]
[Figure: example with nodes A, B, C and the values 1, 10, 1]
No, not always
Distance matrix:
     A    B    C    D    E    F
A    -   27   24   22   31   30
B    -    -   11   21   12   11
C    -    -    -   18   15   14
D    -    -    -    -   25   24
E    -    -    -    -    -    5
F    -    -    -    -    -    -
[Figure: tree with leaves A, B, C, D, E, F and edge lengths 8, 14, 3, 2, 4, 5, 3, 4, 6]
[Figure: four nodes i, j, k, l]
* "iff" is used in mathematics and computer science for "if and only if".
Average linkage (WPGMA)

Original distances:
     A    B    C    D    E
A    -    3    7    8   10
B    -    -    6    8    7
C    -    -    -    4    5
D    -    -    -    -    6
E    -    -    -    -    -

Step 1: join A and B (distance 3) into (A,B). Edges A and B each get length 1.5.
         (A,B)    C     D     E
(A,B)      -     6.5    8    8.5
C          -      -     4     5
D          -      -     -     6
E          -      -     -     -

Step 2: join C and D (distance 4) into (C,D). Edges C and D each get length 2.
         (A,B)   (C,D)    E
(A,B)      -     7.25    8.5
(C,D)      -      -      5.5
E          -      -       -

Step 3: join (C,D) and E (distance 5.5) into ((C,D),E). Edge E gets length 2.75, edge (C,D) gets length 0.75.
              (A,B)   ((C,D),E)
(A,B)           -       7.875
((C,D),E)       -         -

Step 4: join (A,B) and ((C,D),E) (distance 7.875) into ((A,B),((C,D),E)). Edge (A,B) gets length 2.4375, edge ((C,D),E) gets length 1.1875.
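The WPGMA steps on this matrix can be reproduced with a few lines of Python. This is a minimal sketch, not the slides' own implementation: the function name `wpgma` and the dictionary layout of the distances are assumptions made for illustration, and ties between equally close pairs are broken arbitrarily (first pair found).

```python
from itertools import combinations

def wpgma(labels, upper):
    """Cluster with WPGMA and return the merges as (left, right, distance)."""
    # Store distances under unordered pairs so lookups work in either order.
    d = {frozenset(pair): float(value) for pair, value in upper.items()}
    clusters = list(labels)
    merges = []
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merges.append((x, y, d[frozenset((x, y))]))
        clusters = [c for c in clusters if c not in (x, y)]
        # WPGMA update: the new distance is the plain mean of the two old ones.
        for c in clusters:
            d[frozenset(((x, y), c))] = (d[frozenset((x, c))] + d[frozenset((y, c))]) / 2
        clusters.append((x, y))
    return merges

# Upper triangle of the example matrix for A..E.
upper = {
    ("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8, ("A", "E"): 10,
    ("B", "C"): 6, ("B", "D"): 8, ("B", "E"): 7,
    ("C", "D"): 4, ("C", "E"): 5,
    ("D", "E"): 6,
}
merges = wpgma("ABCDE", upper)
for left, right, distance in merges:
    print(left, right, distance)
```

The join distances come out as 3, 4, 5.5 and 7.875, matching the reduced matrices above.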
§ If node w = (v,u) joins nodes v and u, then
§ $L_{v,w} = 0.5\, D_{u,v} - L_{v,v'}$
§ D refers to the distances (from the matrix) and L to the lengths of the edges
§ $L_{v,v'}$ is zero if v is a leaf node
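The edge-length rule can be checked against the WPGMA example with a short sketch. It assumes that $L_{v,v'}$ stands for the path length from v down to a leaf v' below it (the height of v), which is the reading that reproduces the edge lengths 0.75, 2.4375 and 1.1875; the variable names and the hard-coded merge list are illustrative only.

```python
# Merges of the WPGMA example: (child v, child u, distance D[u, v] at which they join).
merges = [
    ("A", "B", 3.0),
    ("C", "D", 4.0),
    ("(C,D)", "E", 5.5),
    ("(A,B)", "((C,D),E)", 7.875),
]

height = {}       # assumed meaning of L[v, v']: path length from v down to a leaf
edge_length = {}  # (child, parent) -> length of the connecting edge

for v, u, d_uv in merges:
    w = f"({v},{u})"
    height[w] = 0.5 * d_uv
    for child in (v, u):
        # L[child, w] = 0.5 * D[u, v] - L[child, child'], which is 0 for a leaf.
        edge_length[(child, w)] = height[w] - height.get(child, 0.0)

for (child, parent), length in edge_length.items():
    print(f"{child} -> {parent}: {length}")
```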
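Original distances:
     A    B    C    D    E
A    -    3    7    8   10
B    -    -    6    8    7
C    -    -    -    4    5
D    -    -    -    -    6
E    -    -    -    -    -

Distances in tree:
     A      B      C      D      E
A    -      3    7.875  7.875  7.875
B    -      -    7.875  7.875  7.875
C    -      -      -      4     5.5
D    -      -      -      -     5.5
E    -      -      -      -      -

Linkage method changes distances. Tree reflects changed distances.
[Figure: tree with leaves A, B, C, D, E, internal nodes (A,B), (C,D), ((C,D),E) and edge lengths 1.5, 1.5, 2, 2, 2.75, 0.75]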
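To see which distances the tree preserves, the tree distances can be derived from the merge order: two leaves are as far apart in the tree as the join distance of the first cluster that contains both. The sketch below hard-codes the four merges of the example. The names `merges`, `tree_distance` and `unchanged` are illustrative assumptions.

```python
from itertools import product

original = {
    ("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8, ("A", "E"): 10,
    ("B", "C"): 6, ("B", "D"): 8, ("B", "E"): 7,
    ("C", "D"): 4, ("C", "E"): 5,
    ("D", "E"): 6,
}

# (leaves of left cluster, leaves of right cluster, join distance)
merges = [
    ({"A"}, {"B"}, 3.0),
    ({"C"}, {"D"}, 4.0),
    ({"C", "D"}, {"E"}, 5.5),
    ({"A", "B"}, {"C", "D", "E"}, 7.875),
]

tree_distance = {}
for left, right, distance in merges:
    # Every leaf pair split across the two clusters meets at this join.
    for x, y in product(left, right):
        tree_distance[frozenset((x, y))] = distance

unchanged = sorted(
    pair for pair, value in original.items()
    if tree_distance[frozenset(pair)] == value
)
print(unchanged)
```

Only the pairs joined directly as leaves, (A,B) and (C,D), keep their original distance; all other pairs are replaced by averages.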
Original data
     A    B    C    D    E    F
A    -    5    4    7    6    8
B    -    -    7   10    9   11
C    -    -    -    7    6    8
D    -    -    -    -    5    9
E    -    -    -    -    -    8
F    -    -    -    -    -    -
§ A and C are closest
§ A and C have same distance to D, E, and F
§ B is closer to A than to C
http://www.icp.ucl.ac.be/~opperd/private/upgma.html
Topology by UPGMA
(Unweighted pair group method using arithmetic mean)
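A minimal UPGMA sketch for the A to F data follows. It differs from WPGMA in that the distance between two clusters is the mean over all leaf pairs, so larger clusters carry more weight. The function name `upgma_topology` and the nested-tuple tree representation are assumptions for illustration; branch lengths are omitted and ties are broken arbitrarily.

```python
from itertools import combinations, product

def upgma_topology(labels, upper):
    """Return the UPGMA topology as nested tuples (no branch lengths)."""
    d = {frozenset(pair): value for pair, value in upper.items()}
    # Each cluster is (nested-tuple tree, tuple of its leaves).
    clusters = [(name, (name,)) for name in labels]

    def average(c1, c2):
        # Unweighted mean over all leaf pairs between the two clusters.
        pairs = list(product(c1[1], c2[1]))
        return sum(d[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

upper = {
    ("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 7, ("A", "E"): 6, ("A", "F"): 8,
    ("B", "C"): 7, ("B", "D"): 10, ("B", "E"): 9, ("B", "F"): 11,
    ("C", "D"): 7, ("C", "E"): 6, ("C", "F"): 8,
    ("D", "E"): 5, ("D", "F"): 9,
    ("E", "F"): 8,
}
topology = upgma_topology("ABCDEF", upper)
print(topology)
```

On this data the sketch joins A with C first, then D with E, then B with (A,C), then the two groups, and F last.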
Better topology
[Figure: a better topology for the original data]
How do we compute the better topology? Hierarchical clustering takes a local perspective; we need a global one.
Based on wikipedia
§ Pair of nodes farthest from all other nodes
§ Let $u_i$ be the sum of the distances from node i to all other nodes
§ Find the pair of nodes i, j with minimal
§ $Q(i,j) = (n-2)\, d_{i,j} - (u_i + u_j)$
Based on wikipedia and Felsenstein. Phylogenies
$u_i = \sum_{k=1}^{n} d_{ik}$
Distances
     a    b    c    d    e
a    -    5    9    9    8
b    -    -   10   10    9
c    -    -    -    8    7
d    -    -    -    -    3
e    -    -    -    -    -

Q
[Empty table over a, b, c, d, e]

$Q(i,j) = (n-2)\, d_{i,j} - (u_i + u_j)$
$u_i = \sum_{k=1}^{n} d_{ik}$
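The Q values for this five-node example can be computed directly from the two formulas. The following is a small sketch. The dictionary layout and the names `upper`, `u`, `Q` and `best_pair` are illustrative assumptions, and the sum for $u_i$ skips k = i because $d_{ii}$ contributes nothing.

```python
labels = ["a", "b", "c", "d", "e"]
upper = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}

# Symmetric lookup table built from the upper triangle.
d = {}
for (i, j), value in upper.items():
    d[i, j] = d[j, i] = value

n = len(labels)
# u[i]: sum of the distances from node i to all other nodes.
u = {i: sum(d[i, k] for k in labels if k != i) for i in labels}
# Q(i, j) = (n - 2) * d(i, j) - (u_i + u_j), for each unordered pair.
Q = {(i, j): (n - 2) * d[i, j] - (u[i] + u[j]) for (i, j) in upper}
best_pair = min(Q, key=Q.get)

for pair, value in Q.items():
    print(pair, value)
print("join:", best_pair)
```

Although d and e have the smallest raw distance (3), the pair with minimal Q is a and b, which is the global perspective the Q criterion adds.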
Distances
If node u joins i and j, then the distance from i to u is:
$d(i,u) = \frac{1}{2}\, d(i,j) + \frac{1}{2(n-2)}\,(u_i - u_j)$
where
$u_i = \sum_{k=1}^{n} d_{ik}$
i.e. give weight to the (differing) distances of i and j to the other nodes k.

     a    b    c    d    e
a    -    5    9    9    8
b    -    -   10   10    9
c    -    -    -    8    7
d    -    -    -    -    3
e    -    -    -    -    -
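As a worked instance of the branch-length formula, the sketch below joins a and b from the matrix above, using their row sums 31 and 34 and n = 5. The function name `branch_lengths` is an illustrative assumption; the second length follows from applying the same formula with i and j swapped, which simplifies to d(i,j) minus the first length.

```python
def branch_lengths(d_ij, u_i, u_j, n):
    """Distances from i and from j to the new node that joins them."""
    d_iu = 0.5 * d_ij + (u_i - u_j) / (2 * (n - 2))
    # Swapping i and j in the formula gives d(i, j) - d_iu.
    d_ju = d_ij - d_iu
    return d_iu, d_ju

# a and b: d(a, b) = 5, u_a = 5 + 9 + 9 + 8 = 31, u_b = 5 + 10 + 10 + 9 = 34
d_au, d_bu = branch_lengths(d_ij=5, u_i=31, u_j=34, n=5)
print(d_au, d_bu)
```

Because b is on average farther from the remaining nodes than a, it receives the longer of the two edges.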
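[Figure: tree with leaves A, B, C, D, E, F and edge lengths 8, 14, 3, 2, 4, 5, 3, 4, 6]
Longest path is from A to E: 14 + 6 + 3 + 5 + 3 = 31
Root at midpoint of longest path: 31 / 2 = 15.5, which splits the edge of length 6 into 1.5 and 4.5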
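The midpoint calculation can be sketched as a walk along the longest path. The function below only takes the edge lengths of that path in order (here from A to E). Finding the longest path in the tree is not shown, and the name `midpoint_on_path` is an illustrative assumption.

```python
def midpoint_on_path(edge_lengths):
    """Locate the midpoint of a path given its edge lengths in order.

    Returns (index of the edge holding the midpoint,
             length before the midpoint on that edge,
             length after the midpoint on that edge).
    """
    half = sum(edge_lengths) / 2
    walked = 0.0
    for index, length in enumerate(edge_lengths):
        if walked + length >= half:
            offset = half - walked
            return index, offset, length - offset
        walked += length
    raise ValueError("path has no edges")

# Longest path from A to E in the example tree.
index, before, after = midpoint_on_path([14, 6, 3, 5, 3])
print(index, before, after)
```

The root lands on the second edge of the path (index 1, length 6), 1.5 from its A side and 4.5 from its E side.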
"Lift up" at the midpoint of the longest path in the tree
What is an outgroup? How does it relate?
§ Generate a tree for the new sequences
§ Compare this new tree with the given tree
§ For each cluster in the given tree that also appears in the new tree, the bootstrap value is increased
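The comparison step can be sketched as counting shared clusters. The code below assumes rooted trees written as nested tuples of leaf names and treats a cluster as the set of leaves below an internal node. The replicate trees are hypothetical placeholders, since generating the new sequences and building trees from them are not shown here.

```python
def clusters(tree):
    """Return the leaf sets under every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found

def bootstrap_values(given_tree, replicate_trees):
    """Count, for each cluster of the given tree, the replicates that contain it."""
    support = {c: 0 for c in clusters(given_tree)}
    for tree in replicate_trees:
        seen = clusters(tree)
        for c in support:
            if c in seen:
                support[c] += 1
    return support

given = (("a", "b"), ("c", "d"))
# Hypothetical trees built from three resampled data sets.
replicates = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "b"), ("d", "c")),
]
support = bootstrap_values(given, replicates)
for cluster, count in support.items():
    print(sorted(cluster), count)
```

Dividing each count by the number of replicates would turn the counts into the usual percentage form.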
§ Approach: Generate “smallest” tree containing all the sequences as leaves
§ ((a,b),(c,d)) or ((a,c),(b,d)) or ((a,d),(b,c))
§ 5: ((a,b),(c,d))
§ 6: ((a,c),(b,d))