Phylogenetic tree: Michael Schroeder, Biotechnology Center TU Dresden

SLIDE 1

Michael Schroeder

Biotechnology Center TU Dresden

Phylogenetic tree

SLIDE 2

Phylogenetic trees

  • Motivation
  • Rooted and unrooted trees
  • Rooted trees: Hierarchical clustering
  • Drawing trees
  • Unrooted trees: Neighbour joining
SLIDE 3

Origin of mitochondria in eucaryotes?

Sequence comparison (BLAST) of 601 mitochondrial yeast genes to bacteria and archaea

SLIDE 4

Origin of mitochondria in eucaryotes?

Sequence comparison (BLAST) of 601 mitochondrial yeast genes to bacteria and archaea

[Figure: panels labelled Bacteria and Archaea]

Horiike et al., Nat Cell Biol, 2001. Adapted from Campbell and Heyer, Discovering Genomics, Proteomics, and Bioinformatics.

SLIDE 5

Darwin's Tree of Life


SLIDE 6

Tree of Life with 2.3 million species

  • pentreeoflife.org


SLIDE 7

Phylogeny

§ Taxonomists classify and group organisms
§ Aristotle, De Partibus Animalium:
  § …discuss each separate species – man, lion, ox,…
  § or … deal first with the attributes which they have in common…

SLIDE 8

Schools of Taxonomists

§ Goal: create a taxonomy
§ Approach:
  § Phenotype
  § Phylogeny
§ 3 schools:
  § Phenotype only
  § Evolutionary taxonomists: phenotype (+ phylogeny)
  § Cladists: phylogeny (+ phenotype)

SLIDE 9

West Nile virus in New York

SLIDE 10

When did homo sapiens leave Africa?

§ Recent-Africa hypothesis: one or a few hundred thousand years ago
§ Multiregional hypothesis: one or more million years ago

SLIDE 11

§ 53 humans
§ Outgroup: chimpanzee

SLIDE 12

Clustal W: over 50 000 citations

SLIDE 13

Thompson, NAR, 1994

ClustalW uses phylogenetic trees as guide trees for multiple sequence alignment

SLIDE 14

Phylogenetic trees

  • Motivation
  • Rooted and unrooted trees
  • Rooted trees: Hierarchical clustering
  • Drawing trees
  • Unrooted trees: Neighbour joining
SLIDE 15

Topixgallery.com

SLIDE 16

Bifurcating Trees

[Figure: bifurcating tree with leaves A, B, C, D]

§ Edge or branch
§ Ancestral node (root)
§ Internal node (hypothetical ancestor)
§ Terminal node (leaf): genes, proteins, populations, species, ...
§ Bifurcating = each internal node has two descendants

SLIDE 17

Unrooted and Rooted Trees

The principal uses of these numbers will be ... to frighten taxonomists.

SLIDE 18

Unrooted and Rooted Trees

[Figure: for three leaves A, B, C there is one unrooted tree and there are three rooted trees]

SLIDE 19

Unrooted and Rooted Trees

[Figure: for four leaves A, B, C, D there are 3 unrooted trees and 15 rooted trees]
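The fifteen rooted trees for four leaves can be generated with the same leaf-insertion idea used in the counting argument: take every rooted tree on the first k leaves and attach the next leaf on each edge, including the position above the root. A minimal Python sketch, assuming trees are written as nested tuples (a representation chosen here for illustration, not taken from the slides):

```python
from typing import Iterator, Sequence, Union

Tree = Union[str, tuple]  # a leaf name or a (left, right) pair


def insert_everywhere(tree: Tree, leaf: str) -> Iterator[Tree]:
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`."""
    # Attach the new leaf on the edge above the current node
    # (at the top level this means above the root).
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def all_rooted_trees(leaves: Sequence[str]) -> Iterator[Tree]:
    """Yield each rooted bifurcating tree on `leaves` exactly once."""
    if len(leaves) == 1:
        yield leaves[0]
        return
    for tree in all_rooted_trees(leaves[:-1]):
        yield from insert_everywhere(tree, leaves[-1])


def canonical(tree: Tree):
    """Order-free form of a tree, so that (A,B) and (B,A) compare equal."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


if __name__ == "__main__":
    for t in all_rooted_trees("ABCD"):
        print(t)
```

A rooted tree on k leaves has 2k-1 nodes, so each one offers 2k-1 places for the next leaf; this is why the counts grow as 1, 3, 15, 105, ...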

SLIDE 20

By Michael Schroeder, Biotec 20

Unrooted and Rooted Trees

8,200,794,532,637,891,559,375 rooted trees for 20 leaves (the same number as unrooted trees for 21 leaves)!

To get a feeling: 8,200,794,532,637,891,559,375 milliseconds is roughly 20 times the age of the universe.

Felsenstein, 1978
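The comparison with the age of the universe can be checked in a few lines. This sketch assumes an age of about 13.8 billion years and a 365.25-day year; neither value comes from the slides.

```python
# Assumptions (not from the slides): universe age ~13.8e9 years, 365.25-day year.
N_TREES = 8_200_794_532_637_891_559_375  # tree count quoted on the slide
AGE_OF_UNIVERSE_YEARS = 13.8e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds = N_TREES / 1000  # spend one millisecond per tree
ratio = seconds / (AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)
print(f"{seconds:.2e} s, about {ratio:.1f} times the age of the universe")
```

Under these assumptions the ratio comes out a little below 19, in line with the rounded "20 times".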

SLIDE 21


Unrooted and Rooted Trees

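§ A rooted tree with $m$ leaves has $m-1$ internal nodes and $2m-2$ edges.
§ An unrooted tree with $m$ leaves has $m-2$ internal nodes and $2m-3$ edges.
§ Let $T_{\text{unroot}}(m)$ be the number of unrooted trees with $m$ leaves.
§ Given an unrooted tree with $m$ leaves, an extra leaf can be added to any of its $2m-3$ edges to make a tree with $m+1$ leaves, so $T_{\text{unroot}}(m+1) = (2m-3)\,T_{\text{unroot}}(m)$.
§ Starting from $T_{\text{unroot}}(3) = 1$, this is satisfied by $T_{\text{unroot}}(m) = (2m-5)!!$.
§ Double factorial = factorial leaving out every other number, e.g. $7!! = 7 \cdot 5 \cdot 3 \cdot 1 = 105$.
§ Placing a root on any of the $2m-3$ edges gives the number of rooted trees: $T_{\text{root}}(m) = (2m-3)\,T_{\text{unroot}}(m) = (2m-3)!!$.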

Felsenstein, 1978
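The counting formulas translate directly into code. A minimal Python sketch (the function names are illustrative, not from the slides) that computes the closed forms and cross-checks the unrooted count against the recurrence:

```python
def double_factorial(n: int) -> int:
    """n!! = n * (n-2) * (n-4) * ... down to 1 or 2; by convention (-1)!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def unrooted_trees(m: int) -> int:
    """Unrooted bifurcating trees with m >= 3 labelled leaves: (2m-5)!!."""
    if m < 3:
        raise ValueError("formula assumes at least 3 leaves")
    return double_factorial(2 * m - 5)


def rooted_trees(m: int) -> int:
    """Rooted bifurcating trees with m >= 2 labelled leaves: (2m-3)!!."""
    if m < 2:
        raise ValueError("formula assumes at least 2 leaves")
    return double_factorial(2 * m - 3)


def unrooted_by_recurrence(m: int) -> int:
    """Same count via T(k+1) = (2k-3) T(k), starting from T(3) = 1."""
    t = 1
    for k in range(3, m):
        t *= 2 * k - 3  # the new leaf can go on any of the 2k-3 edges
    return t


if __name__ == "__main__":
    for m in (3, 4, 5, 10, 20):
        print(m, unrooted_trees(m), rooted_trees(m))
```

Python integers have arbitrary precision, so even the 22-digit count for 20 leaves is exact; rooted_trees(20) gives 8,200,794,532,637,891,559,375, the figure quoted for 20 leaves.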

SLIDE 22

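Consequence: algorithms that generate all trees, score each one, and pick the best cannot work except for very few leaves, because there are far too many trees. Alternatives: hierarchical clustering and neighbour joining, which build a single tree directly from pairwise distances.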

SLIDE 23

Phylogenetic trees

  • Motivation
  • Rooted and unrooted trees
  • Rooted trees: Hierarchical clustering
  • Drawing trees
  • Unrooted trees: Neighbour joining
SLIDE 24

Hierarchical clustering

§ Input: Pairwise distances between sequences
§ Output: A tree of clusters of sequences

      A    B    C    D    E
A          2    6   10    9
B               5    9    8
C                    4    5
D                         3
E

[Dendrogram: leaves A, B, C, D, E]

SLIDE 25

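Hierarchical clustering

Smallest distance: D(A,B) = 2, so A and B are merged into (A,B). The distance from (A,B) to each other sequence is the minimum of its distances to A and B (single linkage):

         (A,B)      C      D      E
(A,B)               5      9      8
C                          4      5
D                                 3
E

[Dendrogram: A and B joined]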

SLIDE 26

Hierarchical clustering

Smallest distance: D(D,E) = 3, so D and E are merged into (D,E):

         (A,B)      C  (D,E)
(A,B)               5      8
C                          4
(D,E)

[Dendrogram: (A,B) and (D,E) joined]

SLIDE 27

Hierarchical clustering

Smallest distance: D(C,(D,E)) = 4, so C and (D,E) are merged into (C,(D,E)):

                 (A,B)  (C,(D,E))
(A,B)                           5
(C,(D,E))

[Dendrogram: (A,B) and (C,(D,E)) joined]

SLIDE 28

Hierarchical clustering

The last two clusters, at distance 5, are merged into the final tree:

((A,B),(C,(D,E)))

[Dendrogram: ((A,B),(C,(D,E))) over leaves A, B, C, D, E]

SLIDE 29

Algorithm

const m: number of original sequences
var U: a set of current trees, initially one tree for each original sequence
    D: the distances between the trees in U
begin
  U := the set of one-node trees, one for each original sequence
  while |U| > 1 do
    (u, v) := the roots of the two trees in U with the least distance in D
    make a new tree with root w and with u and v as children
    calculate the lengths of the edges (u, w) and (v, w)
    for each root x of the trees in U - {u, v} do
      D(x, w) := the distance between x and the new node w
    end
    U := (U - {u, v}) ∪ {w}
  end
end
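The pseudocode can be sketched in Python. This is a minimal version under a few assumptions of its own: distances are stored in a dict keyed by frozenset({a, b}), each current tree is named by its parenthesised string such as "(A,B)", ties go to the first pair found, and edge lengths are not computed. The linkage rule defaults to single linkage (min), which reproduces the worked example above.

```python
from itertools import combinations


def from_upper_triangle(labels, rows):
    """Turn the upper triangle of a distance matrix into {frozenset({a, b}): d}."""
    dist = {}
    for i, row in enumerate(rows):
        for j, d in enumerate(row, start=i + 1):
            dist[frozenset((labels[i], labels[j]))] = d
    return dist


def cluster(labels, dist, linkage=min):
    """Agglomerative clustering as in the pseudocode; returns the tree as a string.

    `linkage` combines D(x,u) and D(x,v) into D(x,w); `min` gives single linkage.
    Edge lengths are not computed in this sketch.
    """
    D = dict(dist)
    U = list(labels)  # current trees, named by their string form
    while len(U) > 1:
        # The two current roots with the least distance (first pair wins ties).
        u, v = min(combinations(U, 2), key=lambda pair: D[frozenset(pair)])
        w = f"({u},{v})"
        for x in U:
            if x not in (u, v):
                D[frozenset((x, w))] = linkage(D[frozenset((x, u))], D[frozenset((x, v))])
        U = [x for x in U if x not in (u, v)] + [w]
    return U[0]


LABELS = ["A", "B", "C", "D", "E"]
DIST = from_upper_triangle(LABELS, [[2, 6, 10, 9], [5, 9, 8], [4, 5], [3]])
print(cluster(LABELS, DIST))  # ((A,B),(C,(D,E)))
```

Passing `max` instead of `min` switches to complete linkage without touching the loop.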

SLIDE 30

Hierarchical Clustering

Distance to the new cluster w = (u,v)

§ Single linkage:
  § $D(x,w) = \min\{D(x,u), D(x,v)\}$
  § Example: distance of (A,B) to C is 1
§ Complete linkage:
  § $D(x,w) = \max\{D(x,u), D(x,v)\}$
  § Example: distance of (A,B) to C is 2
§ Average linkage, WPGMA (weighted pair group method with arithmetic mean):
  § $D(x,w) = \bigl(D(x,u) + D(x,v)\bigr)/2$
  § Example: distance of (A,B) to C is 1.5
§ More general, UPGMA (unweighted pair group method with arithmetic mean):
  § $D(x,w) = \bigl(m_u D(x,u) + m_v D(x,v)\bigr)/(m_u + m_v)$
  § $m_u$ is the number of sequences (leaf nodes) in the subtree $u$


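Question: What’s the difference between UPGMA and WPGMA? Note: “weighted” because u and v may contain different numbers of sequences; averaging the two subtrees equally (WPGMA) gives the sequences in the smaller subtree more weight, while UPGMA weights every sequence equally.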

[Figure: three sequences A, B, C; the distances from A and B to C are 1 and 2; dendrogram over C, B, A]
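The four update rules differ only in how they combine D(x,u) and D(x,v). A minimal Python sketch; giving every rule the same signature (with subtree sizes m_u and m_v that only UPGMA uses) is a choice made here for illustration. Which of A and B sits at distance 1 from C is an assumption, but min, max, and the unweighted average do not depend on the order.

```python
def single(d_xu, d_xv, m_u=1, m_v=1):
    """Single linkage: distance to the closer of the two merged subtrees."""
    return min(d_xu, d_xv)


def complete(d_xu, d_xv, m_u=1, m_v=1):
    """Complete linkage: distance to the farther of the two merged subtrees."""
    return max(d_xu, d_xv)


def wpgma(d_xu, d_xv, m_u=1, m_v=1):
    """WPGMA: plain average; the subtree sizes m_u, m_v are ignored."""
    return (d_xu + d_xv) / 2


def upgma(d_xu, d_xv, m_u=1, m_v=1):
    """UPGMA: average weighted by the number of sequences in each subtree."""
    return (m_u * d_xu + m_v * d_xv) / (m_u + m_v)


if __name__ == "__main__":
    # Slide example: the distances from A and B to C are 2 and 1.
    for rule in (single, complete, wpgma, upgma):
        print(rule.__name__, rule(2, 1))
    # Sizes only matter for UPGMA: 1000 sequences at distance 2, one at 98.
    print("wpgma", wpgma(2, 98, 1000, 1), "upgma", round(upgma(2, 98, 1000, 1), 3))
```

With two single sequences UPGMA and WPGMA agree (both 1.5); the last line shows how far they drift apart once the subtrees are very unequal in size.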

SLIDE 31

Are WPGMA and UPGMA the same?

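§ Subtree D has 1000 nodes ($m_D = 1000$)
§ Subtree E has 1 node ($m_E = 1$)
§ With D(D,F) = 2 and D(E,F) = 98, the distance of (D,E) to F is
  § $(2 + 98)/2 = 50$ for WPGMA
  § $(1000 \cdot 2 + 1 \cdot 98)/(1000 + 1) \approx 2.10$ for UPGMA

[Figure: D(D,F) = 2, D(E,F) = 98; dendrogram over F, E, D]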

SLIDE 32

UPGMA

Unweighted pair group method using arithmetic mean

      A    B    C    D    E
A          3    7    8   10
B               6    8    7
C                    4    5
D                         6
E

Merge A and B (distance 3):

         (A,B)      C      D      E
(A,B)             6.5      8    8.5
C                          4      5
D                                 6
E

Merge C and D (distance 4):

         (A,B)  (C,D)      E
(A,B)             7.25    8.5
(C,D)                     5.5
E

Merge (C,D) and E (distance 5.5):

                 (A,B)  ((C,D),E)
(A,B)                        7.67
((C,D),E)

UPGMA: (2*7.25 + 1*8.5) / 3 = 7.67   WPGMA: (7.25 + 8.5) / 2 = 7.875
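The whole UPGMA run, and the WPGMA variant for the last step, can be reproduced with a short sketch. It assumes the same conventions as a plain agglomerative loop: a dict of frozenset pairs, clusters named by their parenthesised strings, and the first pair winning ties. A flag decides whether the subtree sizes are used (UPGMA) or ignored (WPGMA). The sketch names the last cluster "(E,(C,D))", which is the slide's ((C,D),E) with the children listed in the other order.

```python
from itertools import combinations


def average_linkage(labels, rows, weighted_by_size=True):
    """UPGMA (weighted_by_size=True) or WPGMA (False) on an upper-triangular matrix.

    Returns the final tree string and the distance table, keyed by
    frozenset({cluster_a, cluster_b}) with clusters named like "(A,B)".
    """
    D = {frozenset((labels[i], labels[j])): d
         for i, row in enumerate(rows) for j, d in enumerate(row, start=i + 1)}
    size = {x: 1 for x in labels}  # number of sequences in each current tree
    U = list(labels)
    while len(U) > 1:
        u, v = min(combinations(U, 2), key=lambda pair: D[frozenset(pair)])
        w = f"({u},{v})"
        m_u, m_v = (size[u], size[v]) if weighted_by_size else (1, 1)
        for x in U:
            if x not in (u, v):
                D[frozenset((x, w))] = (m_u * D[frozenset((x, u))]
                                        + m_v * D[frozenset((x, v))]) / (m_u + m_v)
        size[w] = size[u] + size[v]
        U = [x for x in U if x not in (u, v)] + [w]
    return U[0], D


LABELS = ["A", "B", "C", "D", "E"]
ROWS = [[3, 7, 8, 10], [6, 8, 7], [4, 5], [6]]

tree, D = average_linkage(LABELS, ROWS)
_, D_w = average_linkage(LABELS, ROWS, weighted_by_size=False)
last = frozenset(("(A,B)", "(E,(C,D))"))
print(tree, round(D[last], 2), D_w[last])
```

Both variants pick the same merges on this matrix; they differ only in the final distance, 7.67 for UPGMA versus 7.875 for WPGMA, because only the last merge combines subtrees of unequal size.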

SLIDE 33

Does linkage method change trees?


      A    B    C    D
A          1    2    5
B               4    5
C                    3
D

[Figure: two trees over A, B, C, D built from this matrix with different linkage methods]
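Running the same agglomerative loop on this matrix with two different linkage rules answers the question directly. A minimal Python sketch, under the same illustrative conventions as before (frozenset-keyed distances, clusters named by their parenthesised strings, first pair wins ties):

```python
from itertools import combinations


def cluster(labels, rows, linkage):
    """Agglomerative clustering on an upper-triangular matrix with a given linkage."""
    D = {frozenset((labels[i], labels[j])): d
         for i, row in enumerate(rows) for j, d in enumerate(row, start=i + 1)}
    U = list(labels)
    while len(U) > 1:
        u, v = min(combinations(U, 2), key=lambda pair: D[frozenset(pair)])
        w = f"({u},{v})"
        for x in U:
            if x not in (u, v):
                D[frozenset((x, w))] = linkage(D[frozenset((x, u))], D[frozenset((x, v))])
        U = [x for x in U if x not in (u, v)] + [w]
    return U[0]


LABELS = ["A", "B", "C", "D"]
ROWS = [[1, 2, 5], [4, 5], [3]]
print("single:  ", cluster(LABELS, ROWS, min))  # (D,(C,(A,B)))
print("complete:", cluster(LABELS, ROWS, max))  # ((A,B),(C,D))
```

Both rules first merge A and B at distance 1. Single linkage then sees (A,B) at distance min(2, 4) = 2 from C and attaches C to it, while complete linkage sees max(2, 4) = 4 and prefers to join C with D at distance 3, so the topologies differ.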

SLIDE 34

Summary

§ Applications of phylogenetic trees
  § Clustal W
§ Hierarchical clustering
  § Linkage methods