Michael Schroeder
Biotechnology Center TU Dresden
Phylogenetic trees
§ Motivation
§ Rooted and unrooted trees
§ Rooted trees: Hierarchical clustering
§ Drawing trees
§ Unrooted trees: Neighbour joining
§ Origin of
Figure: Sequence comparison (Blast) of 601 mitochondrial yeast genes to bacteria and archaea (panels: Bacteria, Archaea)
Horiike et al. Nat Cell Biol. 2001. Adapted from Campbell and Heyer. Discovering genomics, proteomics, bioinformatics.
§ Goal: create taxonomy
§ Approach:
  § Phenotype
  § Phylogeny
§ 3 schools:
  § Phenotype only
  § Evolutionary taxonomists: phenotype (+ phylogeny)
  § Cladists: phylogeny (+ phenotype)
Thompson, NAR, 1994
ClustalW uses phylogenetic trees as guide trees for multiple sequence alignment
Topixgallery.com
Tree terminology (example tree with leaves A, B, C, D):
§ Edge or branch
§ Ancestral node (root)
§ Internal node (hypothetical ancestor)
§ Terminal node (leaf): genes, proteins, populations, species, ...
§ Bifurcating: every internal node has two descendants
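A minimal sketch in Python of how these terms map onto a data structure. The class `Node` and the helpers `count` and `to_parens` are hypothetical names chosen for illustration; the root is counted as an internal node, and the parenthesised output follows the notation used on the later slides (not full Newick, which would add edge lengths and a trailing semicolon).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a rooted tree; a node without children is a terminal node (leaf)."""
    label: str = ""
    children: list[Node] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def count(node: Node) -> tuple[int, int, int]:
    """Return (leaves, internal nodes, edges) of the subtree rooted at node."""
    if node.is_leaf():
        return 1, 0, 0
    # this node is internal and contributes one edge per child
    leaves, internal, edges = 0, 1, len(node.children)
    for child in node.children:
        l, i, e = count(child)
        leaves += l
        internal += i
        edges += e
    return leaves, internal, edges


def to_parens(node: Node) -> str:
    """Parenthesised notation as on the slides, e.g. ((A,B),(C,D))."""
    if node.is_leaf():
        return node.label
    return "(" + ",".join(to_parens(c) for c in node.children) + ")"


# A bifurcating tree with leaves A, B, C, D; the outermost node is the root.
tree = Node(children=[
    Node(children=[Node("A"), Node("B")]),
    Node(children=[Node("C"), Node("D")]),
])
print(to_parens(tree))  # ((A,B),(C,D))
print(count(tree))      # (4, 3, 6)
```

With m = 4 leaves the tree has m-1 = 3 internal nodes and 2m-2 = 6 edges, as stated for rooted trees further on.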
The principal uses of these numbers will be ... to frighten taxonomists.
Figure: the trees for three leaves A, B, C; the 3 unrooted trees for four leaves A, B, C, D; the 15 rooted trees for four leaves A, B, C, D
By Michael Schroeder, Biotec 20
8,200,794,532,637,891,559,375 rooted trees for 20 leaves (and 221,643,095,476,699,771,875 unrooted ones)!
To get a feeling: 8,200,794,532,637,891,559,375 ms is about 20 times the age of the universe.
Felsenstein, 1978
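The comparison with the age of the universe can be checked with a few lines of Python. The age of the universe (about 13.8 billion years) and the 365.25-day year are assumptions not stated on the slide.

```python
# Assumptions: age of the universe ~13.8 billion years; Julian year of 365.25 days.
n_trees = 8_200_794_532_637_891_559_375   # one tree per millisecond
seconds = n_trees / 1000
year_s = 365.25 * 24 * 3600               # 31,557,600 s
universe_s = 13.8e9 * year_s
ratio = seconds / universe_s
print(f"{ratio:.1f}")                     # 18.8
```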
§ A rooted tree with m leaves has m-1 internal nodes and 2m-2 edges.
§ An unrooted tree with m leaves has m-2 internal nodes and 2m-3 edges.
§ Let $T_{\text{unroot}}(m)$ be the number of unrooted trees with m leaves.
§ Given an unrooted tree with m leaves, an extra leaf can be added to any of its 2m-3 edges to make a tree with m+1 leaves, so
  $T_{\text{unroot}}(m+1) = (2m-3)\,T_{\text{unroot}}(m)$
§ Starting from $T_{\text{unroot}}(3) = 1$, this is satisfied by $T_{\text{unroot}}(m) = (2m-5)!!$.
§ Double factorial: a factorial leaving out every other number, e.g. $7!! = 7 \cdot 5 \cdot 3 \cdot 1 = 105$.
Felsenstein, 1978
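A sketch that checks the recurrence against the closed form. The function names are hypothetical. The rooted count is an added step not on the slide: a root can be placed on any of the 2m-3 edges of an unrooted tree, so $T_{\text{root}}(m) = (2m-3)\,T_{\text{unroot}}(m) = (2m-3)!!$.

```python
def double_factorial(n: int) -> int:
    """n!! = n * (n-2) * (n-4) * ...; the empty product (n <= 1) is 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def t_unroot_recursive(m: int) -> int:
    """Unrooted bifurcating trees with m >= 3 labelled leaves, via T(m+1) = (2m-3) T(m)."""
    t = 1                    # T(3) = 1: only one unrooted tree with three leaves
    for k in range(3, m):    # grow from k to k+1 leaves
        t *= 2 * k - 3       # the new leaf can go on any of the 2k-3 edges
    return t


def t_unroot(m: int) -> int:
    return double_factorial(2 * m - 5)


def t_root(m: int) -> int:
    # Added step: a root on any of the 2m-3 edges of an unrooted tree
    # gives T_root(m) = (2m-3) T_unroot(m) = (2m-3)!!
    return double_factorial(2 * m - 3)


for m in range(3, 21):
    assert t_unroot_recursive(m) == t_unroot(m)

print(t_unroot(4), t_root(4))  # 3 15
print(t_unroot(20))            # 221643095476699771875
print(t_root(20))              # 8200794532637891559375
```

The 3 unrooted and 15 rooted trees for four leaves match the earlier figure, and the count for 20 leaves quoted above is the rooted one.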
Consequence: algorithms that generate all trees, score them, and pick the best are infeasible beyond a handful of leaves, as there are far too many trees. Alternatives: hierarchical clustering and neighbour joining.
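Distance matrix for five sequences A to E (upper triangle):

           A     B     C     D     E
    A      -     2     6    10     9
    B            -     5     9     8
    C                  -     4     5
    D                        -     3
    E                              -

Step 1: the least distance is D(A,B) = 2, so A and B are joined into (A,B). The distances to the new cluster are the minima of the old ones (single linkage):

           (A,B)   C     D     E
    (A,B)    -     5     9     8
    C              -     4     5
    D                    -     3
    E                          -

Step 2: the least distance is D(D,E) = 3, so D and E are joined into (D,E):

           (A,B)   C    (D,E)
    (A,B)    -     5      8
    C              -      4
    (D,E)                 -

Step 3: the least distance is 4, so C is joined with (D,E):

               (A,B)  (C,(D,E))
    (A,B)        -        5
    (C,(D,E))             -

Step 4: the two remaining trees are joined at distance 5, giving ((A,B),(C,(D,E))).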
    const m   number of original sequences
    var   U   a set of current trees, initially one tree for each original sequence
          D   the distances between the trees in U
    begin
      U = the set of one tree (each of one node) for each original sequence
      while |U| > 1 do
        (u, v) = the roots of the two trees in U with the least distance in D
        make a new tree with root w and with u and v as children
        calculate the lengths of the edges (u, w) and (v, w)
        for each root x of the trees in U - {u, v} do
          D(x, w) = the distance between x and the new node w
        end
        U = (U - {u, v}) ∪ {w}     // update U
      end
    end
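A runnable sketch of this pseudocode in Python, applied to the five-sequence example. All names are hypothetical; trees are represented by their parenthesised strings, the edge-length step is left out, and ties are broken by the order of the roots in U. Single linkage is assumed because the updated distances in the worked example are minima.

```python
from __future__ import annotations

from itertools import combinations
from typing import Callable

# A linkage rule computes D(x, w) for the new root w = (u, v) from D(x, u) and
# D(x, v); the leaf counts m_u, m_v are passed along for rules such as UPGMA.
Linkage = Callable[[float, float, int, int], float]


def single_linkage(dxu: float, dxv: float, mu: int, mv: int) -> float:
    return min(dxu, dxv)


def from_upper_triangle(labels: list[str], rows: list[list[float]]) -> dict[frozenset, float]:
    """Turn the upper triangle of a distance matrix (row by row) into a lookup table."""
    dist = {}
    for i, row in enumerate(rows):
        for j, d in enumerate(row, start=i + 1):
            dist[frozenset((labels[i], labels[j]))] = d
    return dist


def cluster(labels: list[str], dist: dict[frozenset, float],
            linkage: Linkage) -> tuple[str, list[float]]:
    """Agglomerative clustering; returns the final tree and the merge distances."""
    D = dict(dist)
    size = {x: 1 for x in labels}  # number of original sequences below each root
    U = list(labels)               # roots of the current trees
    merge_distances = []
    while len(U) > 1:
        # (u, v): the two roots with the least distance (first pair wins on ties)
        u, v = min(combinations(U, 2), key=lambda pair: D[frozenset(pair)])
        w = f"({u},{v})"
        merge_distances.append(D[frozenset((u, v))])
        for x in U:
            if x not in (u, v):
                D[frozenset((x, w))] = linkage(
                    D[frozenset((x, u))], D[frozenset((x, v))], size[u], size[v]
                )
        size[w] = size[u] + size[v]
        U = [x for x in U if x not in (u, v)] + [w]
    return U[0], merge_distances


labels = ["A", "B", "C", "D", "E"]
rows = [[2, 6, 10, 9], [5, 9, 8], [4, 5], [3]]
tree, merges = cluster(labels, from_upper_triangle(labels, rows), single_linkage)
print(tree)    # ((A,B),(C,(D,E)))
print(merges)  # [2, 3, 4, 5]
```

Passing the leaf counts into every linkage rule keeps the loop unchanged when swapping single linkage for one of the rules below.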
Distance to the new cluster w = (u,v)
§ Single linkage:
  § $D(x,w) = \min\{D(x,u), D(x,v)\}$
  § Example: distance of (A,B) to C is 1
§ Complete linkage:
  § $D(x,w) = \max\{D(x,u), D(x,v)\}$
  § Example: distance of (A,B) to C is 2
§ Average linkage, WPGMA (weighted pair group method with arithmetic mean):
  § $D(x,w) = \frac{D(x,u) + D(x,v)}{2}$
  § Example: distance of (A,B) to C is 1.5
§ More general, UPGMA (unweighted pair group method with arithmetic mean):
  § $D(x,w) = \frac{m_u\,D(x,u) + m_v\,D(x,v)}{m_u + m_v}$
  § $m_u$ is the number of leaves (original sequences) in the subtree u
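The four update rules written as Python functions, a sketch with hypothetical names. The example values D(A,C) = 2 and D(B,C) = 1 are an assumption chosen to be consistent with the examples above (1, 2 and 1.5); w = (A,B) and x = C.

```python
def single(dxu: float, dxv: float, mu: int = 1, mv: int = 1) -> float:
    return min(dxu, dxv)


def complete(dxu: float, dxv: float, mu: int = 1, mv: int = 1) -> float:
    return max(dxu, dxv)


def wpgma(dxu: float, dxv: float, mu: int = 1, mv: int = 1) -> float:
    # u and v count equally, whatever their sizes
    return (dxu + dxv) / 2


def upgma(dxu: float, dxv: float, mu: int = 1, mv: int = 1) -> float:
    # u and v are weighted by their numbers of leaves m_u and m_v
    return (mu * dxu + mv * dxv) / (mu + mv)


# Assumed example values: D(A,C) = 2, D(B,C) = 1
d_ac, d_bc = 2, 1
print(single(d_ac, d_bc), complete(d_ac, d_bc), wpgma(d_ac, d_bc), upgma(d_ac, d_bc))
# 1 2 1.5 1.5
```

When u and v are single leaves (m_u = m_v = 1), UPGMA and WPGMA coincide; they differ only once the merged subtrees have different numbers of leaves.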
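Question: What's the difference between UPGMA and WPGMA? Note: the names refer to the original sequences. UPGMA weights u and v by their numbers of leaves, so every original sequence contributes equally ("unweighted"). WPGMA gives u and v equal weight regardless of size, so the sequences in the smaller subtree count more ("weighted").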
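Figure: two small examples with their trees
§ D(A,B) = 1, D(B,C) = 1, D(A,C) = 2 (leaves C, B, A)
§ D(D,E) = 1, D(D,F) = 2, D(E,F) = 98 (leaves F, E, D)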
Unweighted pair group method using arithmetic mean
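           A     B     C     D     E
    A      -     3     7     8    10
    B            -     6     8     7
    C                  -     4     5
    D                        -     6
    E                              -

Join A and B (distance 3):

           (A,B)   C     D     E
    (A,B)    -    6.5    8    8.5
    C              -     4     5
    D                    -     6
    E                          -

Join C and D (distance 4):

           (A,B)  (C,D)   E
    (A,B)    -    7.25   8.5
    (C,D)          -     5.5
    E                     -

Join (C,D) and E (distance 5.5):

               (A,B)  ((C,D),E)
    (A,B)        -       7.67
    ((C,D),E)             -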
UPGMA: (2*7.25 + 1*8.5) / 3 = 7.67    WPGMA: (7.25 + 8.5) / 2 = 7.875
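A sketch that replays this example with both rules, to make the UPGMA/WPGMA difference concrete. The function name `cluster` and its `weighted` flag are hypothetical. The newest cluster is placed first in U, which only affects tie-breaking and the order of labels (here it reproduces the label ((C,D),E)); the example has no ties.

```python
from itertools import combinations


def cluster(labels, rows, weighted):
    """Average-linkage clustering on an upper-triangular distance matrix.

    weighted=False: UPGMA, D(x,w) = (m_u D(x,u) + m_v D(x,v)) / (m_u + m_v)
    weighted=True:  WPGMA, D(x,w) = (D(x,u) + D(x,v)) / 2
    Returns one (new cluster, merge distance, distances to it) triple per step.
    """
    D = {}
    for i, row in enumerate(rows):
        for j, d in enumerate(row, start=i + 1):
            D[frozenset((labels[i], labels[j]))] = d
    size = {x: 1 for x in labels}  # leaves below each root
    U = list(labels)
    steps = []
    while len(U) > 1:
        u, v = min(combinations(U, 2), key=lambda p: D[frozenset(p)])
        w = f"({u},{v})"
        new = {}
        for x in U:
            if x in (u, v):
                continue
            dxu, dxv = D[frozenset((x, u))], D[frozenset((x, v))]
            if weighted:
                new[x] = (dxu + dxv) / 2
            else:
                new[x] = (size[u] * dxu + size[v] * dxv) / (size[u] + size[v])
            D[frozenset((x, w))] = new[x]
        size[w] = size[u] + size[v]
        U = [w] + [x for x in U if x not in (u, v)]  # newest cluster first
        steps.append((w, D[frozenset((u, v))], new))
    return steps


labels = ["A", "B", "C", "D", "E"]
rows = [[3, 7, 8, 10], [6, 8, 7], [4, 5], [6]]

for name, weighted in (("UPGMA", False), ("WPGMA", True)):
    print(name)
    for w, height, new in cluster(labels, rows, weighted):
        print(f"  join {w} at {round(height, 3)}:",
              {x: round(d, 3) for x, d in new.items()})
```

The two runs agree until the third step, where (C,D) (two leaves) is joined with E (one leaf): UPGMA gives (A,B) a distance of 23/3 ≈ 7.67 to the new cluster, WPGMA gives 7.875.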
           A     B     C     D
    A      -     1     2     5
    B            -     4     5
    C                  -     3
    D                        -

Figure: two trees with leaves A, B, C, D