

SLIDE 1

Michael Schroeder

Biotechnology Center TU Dresden

Phylogenetic tree

SLIDE 2

Phylogenetic trees

  • Motivation
  • Rooted and unrooted trees
  • Rooted trees: Hierarchical clustering
  • Drawing trees
  • Unrooted trees: Neighbour joining
SLIDE 3

Distance in matrix = distance in tree?

      A   B   C   D   E
  A       2   6  10   9
  B           5   9   8
  C               4   5
  D                   3
  E

(Figure: a tree with leaves A, B, C, D, E.)

SLIDE 4

Distance in matrix = distance in 2D?

      A   B   C   D   E
  A       2   6  10   9
  B           5   9   8
  C               4   5
  D                   3
  E

SLIDE 5

Distance in matrix = distance in 2D?

      A   B   C   D   E
  A       2   6  10   9
  B           5   9   8
  C               4   5
  D                   3
  E

(Figure: points A, B, C, D, E placed in the plane.)

SLIDE 6

Distance in matrix = distance in 2D?

(Figure: three points A, B, C with pairwise distances 1, 10, and 1; since 1 + 1 < 10, the triangle inequality is violated and the points cannot be placed in 2D.)

Distance in matrix = distance in tree?

No, not always

SLIDE 7

§ If the tree is additive,
§ the distance from v to w is the
§ sum of the edge lengths on the path connecting v to w

Distance in matrix = distance in tree

SLIDE 8

Additive Tree

Is there an additive tree?

      A    B    C    D    E    F
  A        27   24   22   31   30
  B             11   21   12   11
  C                  18   15   14
  D                       25   24
  E                             5
  F

SLIDE 9

Additive Tree

Yes, there is an additive tree

(Figure: an unrooted tree with leaves A, B, C, D, E, F and edge lengths 8, 14, 3, 2, 4, 5, 3, 4, 6; summing the edge lengths along each leaf-to-leaf path reproduces the matrix.)

      A    B    C    D    E    F
  A        27   24   22   31   30
  B             11   21   12   11
  C                  18   15   14
  D                       25   24
  E                             5
  F

SLIDE 10

Additive Tree

A tree is additive iff* for all nodes i, j, k, l (labelled as in the figure, with i and l on one side and j and k on the other):

  D(i,j) + D(k,l) = D(i,k) + D(j,l) ≥ D(i,l) + D(j,k)

This is the four-point condition.

(Figure: a quartet tree with i and l on one side and j and k on the other.)

* iff is used in math/comp sci for "if and only if"
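As a sketch, the four-point condition can be checked mechanically over all quartets. The Python below is an illustration (function names are my own, not from the slides), applied to the A..F matrix of the additive-tree example:

```python
from itertools import combinations

# Distance matrix from the A..F additive-tree example (upper triangle).
D = {
    ("A", "B"): 27, ("A", "C"): 24, ("A", "D"): 22, ("A", "E"): 31, ("A", "F"): 30,
    ("B", "C"): 11, ("B", "D"): 21, ("B", "E"): 12, ("B", "F"): 11,
    ("C", "D"): 18, ("C", "E"): 15, ("C", "F"): 14,
    ("D", "E"): 25, ("D", "F"): 24,
    ("E", "F"): 5,
}

def d(x, y):
    return 0 if x == y else D.get((x, y), D.get((y, x)))

def is_additive(labels):
    """Four-point condition: for every quartet, the two largest of the
    three pairwise sums must be equal."""
    for i, j, k, l in combinations(labels, 4):
        sums = sorted([d(i, j) + d(k, l), d(i, k) + d(j, l), d(i, l) + d(j, k)])
        if sums[1] != sums[2]:  # the two largest sums must coincide
            return False
    return True

print(is_additive("ABCDEF"))  # → True
```

For the quartet A, B, C, D, for instance, the sums are 45, 45, and 33, so the condition holds.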

SLIDE 11

Constructing the Edges of the Tree

(Figure: A and B are joined first; each gets an edge of length 1.5 to the new node (A,B).)

Average linkage (WPGMA)

Initial distances:

      A   B   C   D   E
  A       3   7   8  10
  B           6   8   7
  C               4   5
  D                   6
  E

After merging A and B into (A,B):

        (A,B)   C    D    E
  (A,B)        6.5   8   8.5
  C                  4    5
  D                       6
  E

SLIDE 12

Constructing the Edges of the Tree

(Figure: C and D are joined next; each gets an edge of length 2 to the new node (C,D). A and B keep their edges of length 1.5.)

Average linkage (WPGMA)

        (A,B)   C    D    E
  (A,B)        6.5   8   8.5
  C                  4    5
  D                       6
  E

After merging C and D into (C,D):

        (A,B)  (C,D)   E
  (A,B)        7.25   8.5
  (C,D)               5.5
  E

SLIDE 13

Constructing the Edges of the Tree

(Figure: (C,D) and E are joined next at height 2.75; E gets an edge of length 2.75 and (C,D) an edge of length 0.75.)

Average linkage (WPGMA)

        (A,B)  (C,D)   E
  (A,B)        7.25   8.5
  (C,D)               5.5
  E

After merging (C,D) and E into ((C,D),E):

            (A,B)  ((C,D),E)
  (A,B)              7.875
  ((C,D),E)

SLIDE 14

Constructing the Edges of the Tree

(Figure: finally (A,B) and ((C,D),E) are joined into the root ((A,B),((C,D),E)) at height 3.9375; the edge to (A,B) has length 2.4375 and the edge to ((C,D),E) has length 1.1875.)

            (A,B)  ((C,D),E)
  (A,B)              7.875
  ((C,D),E)

SLIDE 15

Constructing the Edges of the Tree

§ If node w = (v,u) joins nodes v and u, then
§ L(v,w) = 0.5 · D(u,v) − L(v,v')

§ D refers to the distances (from the matrix) and
§ L to the lengths of the edges
§ L(v,v') is zero if v is a leaf node

(Figure: w joins v and u; L(v,w) is the edge from v up to w, and L(v,v') is the height of v above its leaves.)
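The whole construction can be sketched in a few lines of Python (the function name `wpgma` and the data layout are my own, not from the slides); it reproduces the node heights 1.5, 2.0, 2.75, and 3.9375 from the example, with the edge length from a parent to a child being the difference of their heights:

```python
def wpgma(dist):
    """dist: {frozenset((u, v)): distance}. Returns a dict of node heights;
    leaves have height 0, and the edge length from a parent w to a child v
    is heights[w] - heights[v]."""
    dist = dict(dist)                      # work on a copy
    clusters = {c for pair in dist for c in pair}
    heights = {c: 0.0 for c in clusters}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)     # closest current pair
        u, v = tuple(pair)
        w = (u, v)                         # new internal node
        heights[w] = dist[pair] / 2.0
        clusters -= {u, v}
        # WPGMA: distance to the new cluster is the plain average
        for c in clusters:
            dist[frozenset((w, c))] = (dist[frozenset((u, c))]
                                       + dist[frozenset((v, c))]) / 2.0
        dist = {p: x for p, x in dist.items() if u not in p and v not in p}
        clusters.add(w)
    return heights

D0 = {frozenset(p): float(x) for p, x in {
    ("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8, ("A", "E"): 10,
    ("B", "C"): 6, ("B", "D"): 8, ("B", "E"): 7,
    ("C", "D"): 4, ("C", "E"): 5, ("D", "E"): 6}.items()}

h = wpgma(D0)
print(sorted(h.values()))  # → [0.0, 0.0, 0.0, 0.0, 0.0, 1.5, 2.0, 2.75, 3.9375]
```

For example, the root at height 3.9375 connects to (A,B) at height 1.5 by an edge of length 2.4375, matching the drawing above.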

SLIDE 16

Original and tree distances may differ

Original distances:

      A   B   C   D   E
  A       3   7   8  10
  B           6   8   7
  C               4   5
  D                   6
  E

Distances in the tree:

      A    B      C      D      E
  A        3   7.875  7.875  7.875
  B            7.875  7.875  7.875
  C                     4     5.5
  D                           5.5
  E

The linkage method changes the distances; the tree reflects the changed distances.

SLIDE 17

Is hierarchical clustering always right?

  • A and C are closest
  • A and C have the same distance to D, E, and F
  • B is closer to A than to C

      A    B    C    D    E    F
  A        5    4    7    6    8
  B             7   10    9   11
  C                  7    6    8
  D                       5    9
  E                            8
  F

http://www.icp.ucl.ac.be/~opperd/private/upgma.html

Original data

SLIDE 18

Is hierarchical clustering always right?

  • A and C are closest
  • A and C have the same distance to D, E, and F
  • B is closer to A than to C

      A    B    C    D    E    F
  A        5    4    7    6    8
  B             7   10    9   11
  C                  7    6    8
  D                       5    9
  E                            8
  F

Topology by UPGMA
(Unweighted pair group method using arithmetic mean)

Original data

http://www.icp.ucl.ac.be/~opperd/private/upgma.html

SLIDE 19

Is hierarchical clustering always right?

  • A and C are closest
  • A and C have the same distance to D, E, and F
  • B is closer to A than to C

      A    B    C    D    E    F
  A        5    4    7    6    8
  B             7   10    9   11
  C                  7    6    8
  D                       5    9
  E                            8
  F

Topology by UPGMA
(Unweighted pair group method using arithmetic mean)

Better topology

Original data

http://www.icp.ucl.ac.be/~opperd/private/upgma.html

How do we compute the better topology? Hierarchical clustering takes a local perspective; we need a global one.

SLIDE 20

Phylogenetic trees

  • Motivation
  • Rooted and unrooted trees
  • Rooted trees: Hierarchical clustering
  • Drawing trees
  • Unrooted trees: Neighbour joining
SLIDE 21

Neighbour Joining

Based on wikipedia

SLIDE 22

Neighbour Joining

§ Find the pair of nodes that is close together but far from all other nodes
§ Let u_i be the sum of distances from node i to all other nodes:

  u_i = Σ_{k=1..n} d(i,k)

§ Find the pair of nodes i, j with minimal

  Q(i,j) = (n-2) d(i,j) - (u_i + u_j)

Based on Wikipedia and Felsenstein, Phylogenies

SLIDE 23

Neighbour Joining

  • 1. Calculate Q
  • 2. Choose the pair i, j with the lowest value in Q
  • 3. Create a new node u
  • 4. Calculate the distances from i and j to u
  • 5. Calculate the distances from all remaining nodes k to u
  • 6. Start the algorithm again, replacing i and j by u

SLIDE 24

Example

Based on wikipedia

Distances:

      a    b    c    d    e
  a        5    9    9    8
  b            10   10    9
  c                  8    7
  d                       3
  e

Q(i,j) = (n-2) d(i,j) - (u_i + u_j),  with  u_i = Σ_{k=1..n} d(i,k)

Q:

      a     b     c     d     e
  a       -50   -38   -34   -34
  b             -38   -34   -34
  c                   -40   -40
  d                         -48
  e

The minimum is Q(a,b) = -50, so a and b are joined first.
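A Python sketch of the Q computation for this example (the variable names are mine); it reproduces the table above and finds Q(a,b) = -50 as the minimum:

```python
labels = ["a", "b", "c", "d", "e"]
D = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}

def d(x, y):
    return 0 if x == y else D.get((x, y), D.get((y, x)))

n = len(labels)
u = {i: sum(d(i, k) for k in labels) for i in labels}  # row sums: u_a = 31, ...

Q = {(i, j): (n - 2) * d(i, j) - (u[i] + u[j])
     for i in labels for j in labels if i < j}

best = min(Q, key=Q.get)
print(best, Q[best])  # → ('a', 'b') -50
```

The row sums are u_a = 31, u_b = 34, u_c = 34, u_d = 30, u_e = 27, so Q(a,b) = 3·5 − (31 + 34) = −50.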

SLIDE 25

Example

Based on wikipedia

Distances:

      a    b    c    d    e
  a        5    9    9    8
  b            10   10    9
  c                  8    7
  d                       3
  e

If node u joins i and j, then the distance from i to u is:

  d(i,u) = 1/2 d(i,j) + (u_i - u_j) / (2(n-2)),  where  u_i = Σ_{k=1..n} d(i,k)

i.e. the formula gives weight to the (differing) distances of i and j to the other nodes k.
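Plugging in the numbers for the join of a and b from the previous slide (n = 5, d(a,b) = 5, u_a = 31, u_b = 34 from the matrix), a quick check; the complementary length d(j,u) = d(i,j) − d(i,u) is the usual companion rule, not shown on the slide:

```python
# Branch lengths for the new node u joining a and b.
n, d_ab, u_a, u_b = 5, 5, 31, 34
d_au = d_ab / 2 + (u_a - u_b) / (2 * (n - 2))  # 2.5 - 0.5
d_bu = d_ab - d_au                             # the two branch lengths sum to d(a,b)
print(d_au, d_bu)  # → 2.0 3.0
```

Because u_b > u_a, node b sits farther from the rest of the tree and receives the longer branch.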

SLIDE 26

Rooting unrooted trees

(Figure: the unrooted tree from Slide 9, with leaves A, B, C, D, E, F and edge lengths 8, 14, 3, 2, 4, 5, 3, 4, 6; the root splits one internal edge into segments of 1.5 and 4.5.)

Longest path is from A to E: 14 + 6 + 3 + 5 + 3 = 31
Root at the midpoint of the longest path: 31 / 2 = 15.5
(The midpoint lies 1.5 into the internal edge of length 6, splitting it into 1.5 and 4.5.)

"Lift up" the tree at the midpoint of the longest path in the tree.

What is an outgroup? How does it relate?
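For an additive tree, the farthest pair can be read straight off its distance matrix. A small sketch on the matrix from Slide 8, which this tree realizes (the variable names are mine):

```python
# Distance matrix of the additive tree from Slide 8 (upper triangle).
D = {
    ("A", "B"): 27, ("A", "C"): 24, ("A", "D"): 22, ("A", "E"): 31, ("A", "F"): 30,
    ("B", "C"): 11, ("B", "D"): 21, ("B", "E"): 12, ("B", "F"): 11,
    ("C", "D"): 18, ("C", "E"): 15, ("C", "F"): 14,
    ("D", "E"): 25, ("D", "F"): 24,
    ("E", "F"): 5,
}
pair = max(D, key=D.get)        # leaf pair with the largest tree distance
midpoint = D[pair] / 2          # the root goes halfway along this path
print(pair, D[pair], midpoint)  # → ('A', 'E') 31 15.5
```

This reproduces the slide's numbers: the longest path runs from A to E with length 31, and the root is placed 15.5 along it.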

SLIDE 27

Assessing Quality: Bootstrapping

§ Given a tree obtained from one of the methods above
§ Generate a multiple alignment
§ For a number of iterations:
  § Generate new sequences by selecting columns (possibly the same column more than once) from the multiple alignment
  § Generate a tree for the new sequences
  § Compare this new tree with the given tree
  § For each cluster in the given tree that also appears in the new tree, increase that cluster's bootstrap count

§ Bootstrap value = percentage of trees containing the same cluster
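The column-resampling step can be sketched as follows (a toy alignment; the tree-building step for each replicate, omitted here, would be whatever method produced the original tree):

```python
import random

alignment = ["GGGGGG", "GGGAGT", "GGATAG", "GATCAT"]  # one row per sequence

def bootstrap_columns(alignment, rng=random):
    """Draw as many columns as the alignment has, with replacement,
    and rebuild the rows from the drawn columns."""
    ncols = len(alignment[0])
    picks = [rng.randrange(ncols) for _ in range(ncols)]
    return ["".join(row[c] for c in picks) for row in alignment]

replicate = bootstrap_columns(alignment)
print(replicate)  # same shape as the input; columns drawn with replacement
```

Each replicate keeps the alignment's dimensions but may repeat some columns and drop others, which is what makes the cluster frequencies a measure of support.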

SLIDE 28

Parsimony method

§ Approach: Generate “smallest” tree containing all the sequences as leaves

  Seq   1  2  3  4  5  6
  a     G  G  G  G  G  G
  b     G  G  G  A  G  T
  c     G  G  A  T  A  G
  d     G  A  T  C  A  T

(Figure: a tree over a, b, c, d with the substitutions on its edges: 2 G→A, 3 G→A, 3 T→A, 4 G→T, 4 G→A, 4 T→C, 5 G→A, 6 G→T, 6 G→T; positions refer to the columns above.)

SLIDE 29

Parsimony

§ Generate the smallest tree
§ Informative vs. non-informative sites
§ Build pairs with the fewest possible substitutions
§ Example:

§ 3 possible trees:
§ ((a,b),(c,d)) or ((a,c),(b,d)) or ((a,d),(b,c))

§ Sites 1, 2, 3, 4 are not informative
§ Sites 5, 6 are informative

§ Site 5 supports ((a,b),(c,d))
§ Site 6 supports ((a,c),(b,d))

  Seq   1  2  3  4  5  6
  a     G  G  G  G  G  G
  b     G  G  G  A  G  T
  c     G  G  A  T  A  G
  d     G  A  T  C  A  T
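The informative sites of this example can be found mechanically. A sketch (the helper name is mine), using the standard criterion that a site is parsimony-informative when at least two characters each occur at least twice in its column:

```python
from collections import Counter

seqs = {"a": "GGGGGG", "b": "GGGAGT", "c": "GGATAG", "d": "GATCAT"}

def informative_sites(seqs):
    """Return the 1-based positions that are parsimony-informative."""
    result = []
    for pos, column in enumerate(zip(*seqs.values()), start=1):
        counts = Counter(column)
        # at least two characters must each appear at least twice
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            result.append(pos)
    return result

print(informative_sites(seqs))  # → [5, 6]
```

Site 5 pairs a, b (G) against c, d (A), and site 6 pairs a, c (G) against b, d (T), matching the two topologies named on the slide.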

SLIDE 30

Maximum likelihood

§ Assigns quantitative probabilities to mutation events
§ Reconstructs ancestors for all nodes in the tree
§ Assigns branch lengths based on the probabilities of the mutational events
§ For each possible tree topology, the assumed substitution rates are varied to find the parameters that give the highest likelihood of producing the observed data
SLIDE 31

Summary

§ Drawing trees from hierarchical clustering
§ Neighbour joining
§ Assessing quality with bootstrapping
§ (Parsimony and maximum likelihood)