

slide-1
SLIDE 1

Phylogenetic trees III Maximum Parsimony

Gerhard Jäger Words, Bones, Genes, Tools February 28, 2018

Gerhard Jäger Maximum Parsimony WBGT 1 / 30

slide-2
SLIDE 2

Background

Background


slide-3
SLIDE 3

Background

Character-based tree estimation

distance-based tree estimation has several drawbacks:

- very strong theoretical assumptions, e.g., all characters evolve at the same rate
- Neighbor Joining and UPGMA produce good but sub-optimal trees
- no solid statistical justification for those algorithms
- distances are black boxes: we get a tree, but we learn nothing about the history of individual characters

character-based tree estimation

- estimates a complete scenario (or a distribution over scenarios) for each character
- finds the tree that best explains the observed variation in the data (at least in theory, that is...)


slide-4
SLIDE 4

Parsimony

Parsimony


slide-5
SLIDE 5

Parsimony

Parsimony of a tree

- background reading: Ewens and Grant (2005), 15.6
- suppose a character matrix and a tree are given
- parsimony score: minimal number of mutations that have to be assumed to explain the character values at the tips, given the tree


slide-6
SLIDE 6

Parsimony

Parsimony of a tree

Kopf "head" kop "head" head "head" tête "head" testa "head" cap "head"



slide-8
SLIDE 8

Parsimony

Parsimony of a tree

[Figure: tree with the tips Kopf, kop, head, tête, testa, cap (all "head"); the word forms at the five ancestral nodes are still unknown (?).]


slide-9
SLIDE 9

Parsimony

Parsimony of a tree

Kopf "head" kop "head" head "head" tête "head" testa "head" cap "head" *kop "head" testa "head"


slide-10
SLIDE 10

Parsimony

Parsimony of a tree

[Figure: tree with the tips Kopf, kop, head, tête, testa, cap (all "head") and the reconstructed ancestral forms *kop, *haubud-, testa, caput, *kaput- (all "head").]


slide-11
SLIDE 11

Parsimony

Parsimony reconstruction

[Figure: example tree with tip states A, C, C, A, B, B and one assignment of states to the five internal nodes.]

Parsimony = 2
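A minimal sketch of how the unweighted parsimony score of a fixed tree can be computed. It uses Fitch's set-based method, which the slides do not name, and it assumes the topology ((A,(C,C)),(A,(B,B))) for the six tips; that shape is consistent with the scores on the slides but cannot be read off the extracted figure with certainty. The function name `fitch` and the nested-tuple tree encoding are illustrative choices.

```python
def fitch(tree):
    """Return (candidate_states, mutations) for a binary nested-tuple tree.

    A tip is a string holding its observed state; an internal node is a
    pair (left, right).
    """
    if isinstance(tree, str):
        # A tip: only its observed state is possible, no mutation needed.
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # The daughters can agree on a state: no extra mutation here.
        return common, left_cost + right_cost
    # The daughters disagree: one mutation is needed below this node.
    return left_states | right_states, left_cost + right_cost + 1


# Assumed topology for the tips A, C, C, A, B, B (hypothetical, see above).
TREE = (("A", ("C", "C")), ("A", ("B", "B")))

root_states, score = fitch(TREE)
print(score)  # 2
```

With this topology the minimum is reached with state A at the root, one change towards the (C, C) pair and one towards the (B, B) pair.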


slide-12
SLIDE 12

Parsimony

Parsimony reconstruction

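[Figure: the same tree with tip states A, C, C, A, B, B and a different assignment of states to the five internal nodes.]

Parsimony = 3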


slide-13
SLIDE 13

Parsimony

Parsimony reconstruction

[Figure: the same tree with tip states A, C, C, A, B, B and a third assignment of states to the five internal nodes.]

Parsimony = 3


slide-14
SLIDE 14

Parsimony

Weighted parsimony reconstruction

[Figure: example tree with tip states A, C, C, A, B, B and one assignment of states to the five internal nodes.]

Weighted Parsimony = 3

Weight matrix

      A  B  C
  A   0  1  2
  B   1  0  2
  C   2  2  0


slide-15
SLIDE 15

Parsimony

Weighted parsimony reconstruction

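[Figure: the same tree with tip states A, C, C, A, B, B and a different assignment of states to the five internal nodes.]

Weighted Parsimony = 4

Weight matrix

      A  B  C
  A   0  1  2
  B   1  0  2
  C   2  2  0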


slide-16
SLIDE 16

Parsimony

Weighted parsimony reconstruction

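[Figure: the same tree with tip states A, C, C, A, B, B and a third assignment of states to the five internal nodes.]

Weighted Parsimony = 5

Weight matrix

      A  B  C
  A   0  1  2
  B   1  0  2
  C   2  2  0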


slide-17
SLIDE 17

Parsimony

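Dynamic Programming (Sankoff Algorithm)

\[ wp(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \bigl( w(s, s') + wp(d, s') \bigr) \]

[Figure: tree with tip states A, C, C, A, B, B. Each tip has cost 0 for its observed state and ∞ for the other two states; no internal node has been filled in yet.]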

slide-18
SLIDE 18

Parsimony

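Dynamic Programming (Sankoff Algorithm)

\[ wp(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \bigl( w(s, s') + wp(d, s') \bigr) \]

[Figure: tree with tip states A, C, C, A, B, B. Each tip has cost 0 for its observed state and ∞ for the other two states. Costs filled in at the two lowest internal nodes: 0 2 4 4 4 0]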


slide-19
SLIDE 19

Parsimony

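Dynamic Programming (Sankoff Algorithm)

\[ wp(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \bigl( w(s, s') + wp(d, s') \bigr) \]

[Figure: tree with tip states A, C, C, A, B, B. Each tip has cost 0 for its observed state and ∞ for the other two states. Costs filled in at the internal nodes below the root: 1 3 2 4 1 4 4 4 0 2 2]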


slide-20
SLIDE 20

Parsimony

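Dynamic Programming (Sankoff Algorithm)

\[ wp(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \bigl( w(s, s') + wp(d, s') \bigr) \]

[Figure: tree with tip states A, C, C, A, B, B. Each tip has cost 0 for its observed state and ∞ for the other two states. Costs filled in at all internal nodes, including the root: 1 3 4 2 4 1 4 4 4 0 2 2 3 5]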


slide-21
SLIDE 21

Parsimony

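Dynamic Programming (Sankoff Algorithm)

\[ wp(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \bigl( w(s, s') + wp(d, s') \bigr) \]

[Figure: tree with tip states A, C, C, A, B, B. Each tip has cost 0 for its observed state and ∞ for the other two states. Costs filled in at all internal nodes, including the root: 1 3 4 2 4 1 4 4 4 0 2 2 3 5]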
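The recursion above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's own code: the topology ((A,(C,C)),(A,(B,B))) is assumed (it reproduces the numbers shown on the slides), the weight matrix is the one from the weighted-parsimony slides with a zero diagonal assumed, and the names `sankoff`, `WEIGHTS` and `TREE` are illustrative.

```python
import math

STATES = ("A", "B", "C")

# Symmetric weight matrix from the slides; the zero diagonal is an assumption.
WEIGHTS = {
    "A": {"A": 0, "B": 1, "C": 2},
    "B": {"A": 1, "B": 0, "C": 2},
    "C": {"A": 2, "B": 2, "C": 0},
}


def sankoff(node):
    """Return {state: wp(node, state)} for a nested-tuple tree with string tips."""
    if isinstance(node, str):
        # Tip: cost 0 for the observed state, infinity for every other state.
        return {s: (0 if s == node else math.inf) for s in STATES}
    daughter_costs = [sankoff(d) for d in node]
    # wp(mother, s) = sum over daughters of min over s' of w(s, s') + wp(d, s')
    return {
        s: sum(min(WEIGHTS[s][t] + wp_d[t] for t in STATES) for wp_d in daughter_costs)
        for s in STATES
    }


# Assumed topology for the tips A, C, C, A, B, B (hypothetical, see above).
TREE = (("A", ("C", "C")), ("A", ("B", "B")))

root_costs = sankoff(TREE)
print(root_costs)                # {'A': 3, 'B': 4, 'C': 5}
print(min(root_costs.values()))  # 3
```

The root vector gives the best achievable weighted parsimony for each possible root state; the smallest entry is the weighted parsimony score of the tree, and tracing back the minimizing choices yields the ancestral states.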


slide-26
SLIDE 26

Parsimony

Searching for the best tree

- total parsimony score of a tree: sum over all characters
- note: if the weight matrix is symmetric, the location of the root doesn't matter
- the Sankoff algorithm efficiently computes the parsimony score of a given tree
- goal: the tree which minimizes the parsimony score
- no efficient way to find the optimal tree → heuristic tree search


slide-27
SLIDE 27

Searching the tree space

Searching the tree space


slide-28
SLIDE 28

Searching the tree space

How many rooted tree topologies are there?

[Figure: the single rooted tree topology for n=2 taxa.]


slide-29
SLIDE 29

Searching the tree space

How many rooted tree topologies are there?

[Figure: rooted tree topologies for n=2 and n=3 taxa.]


slide-30
SLIDE 30

Searching the tree space

How many rooted tree topologies are there?

[Figure: rooted tree topologies for n=2, n=3 and n=4 taxa.]


slide-31
SLIDE 31

Searching the tree space

How many rooted tree topologies are there?

\[ f(2) = 1, \qquad f(n+1) = (2n-1)\,f(n), \qquad f(n) = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} \]

 n   f(n)
 2   1
 3   3
 4   15
 5   105
 6   945
 7   10395
 8   135135
 9   2027025
10   34459425
11   654729075
12   13749310575
13   316234143225
14   7.9e+12
15   2.1e+14
16   6.1e+15
17   1.9e+17
18   6.3e+18
19   2.2e+20
20   8.2e+21
21   3.1e+23
22   1.3e+25
23   5.6e+26
24   2.5e+28
25   1.1e+30
26   5.8e+31
27   2.9e+33
28   1.5e+35
29   8.6e+36
30   4.9e+38
31   2.9e+40
32   1.7e+42
33   1.1e+44
34   7.2e+45
35   4.8e+47
36   3.3e+49
37   2.3e+51
38   1.7e+53
39   1.3e+55
40   1.0e+57
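As a quick check of the table, the count of rooted binary topologies can be computed both step by step and from the closed form. The sketch below assumes the multiplier 2n − 1 in the recurrence, which is the value that reproduces the tabulated sequence 1, 3, 15, 105, ...; the function names are illustrative.

```python
from math import factorial


def rooted_by_recurrence(n):
    """Number of rooted binary topologies on n >= 2 labelled tips."""
    count = 1  # f(2) = 1
    for k in range(2, n):
        # A rooted tree on k tips has 2k - 1 places (2k - 2 branches plus
        # the position above the root) where tip k + 1 can be attached.
        count *= 2 * k - 1
    return count


def rooted_closed_form(n):
    """Closed form (2n - 3)! / (2^(n - 2) * (n - 2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (2, 3, 4, 10, 13):
    print(n, rooted_by_recurrence(n), rooted_closed_form(n))
```

Both functions agree for every n in the table, and the exact integers match the rounded entries for n ≥ 14.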


slide-32
SLIDE 32

Searching the tree space

How many unrooted tree topologies are there?

[Figure: the single unrooted tree topology for n=3 taxa.]

slide-33
SLIDE 33

Searching the tree space

How many unrooted tree topologies are there?

[Figure: unrooted tree topologies for n=3 (one tree) and n=4 (three trees).]

slide-34
SLIDE 34

Searching the tree space

How many unrooted tree topologies are there?

[Figure: unrooted tree topologies for n=3 (one tree), n=4 (three trees) and n=5 (fifteen trees).]

slide-35
SLIDE 35

Searching the tree space

How many unrooted tree topologies are there?

\[ f(3) = 1, \qquad f(n+1) = (2n-3)\,f(n), \qquad f(n) = \frac{(2n-5)!}{2^{n-3}\,(n-3)!} \]

 n   f(n)
 3   1
 4   3
 5   15
 6   105
 7   945
 8   10395
 9   135135
10   2027025
11   34459425
12   654729075
13   13749310575
14   316234143225
15   7.90e+12
16   2.13e+14
17   6.19e+15
18   1.91e+17
19   6.33e+18
20   2.21e+20
21   8.20e+21
22   3.19e+23
23   1.31e+25
24   5.63e+26
25   2.53e+28
26   1.19e+30
27   5.84e+31
28   2.98e+33
29   1.57e+35
30   8.68e+36
31   4.95e+38
32   2.92e+40
33   1.78e+42
34   1.12e+44
35   7.29e+45
36   4.88e+47
37   3.37e+49
38   2.39e+51
39   1.74e+53
40   1.31e+55
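To get a feeling for how quickly exhaustive search becomes infeasible, the recurrence for unrooted topologies can be turned into a small calculator. The budget of one billion tree evaluations used below is an arbitrary assumption for illustration, and the function names are hypothetical.

```python
def unrooted_topologies(n):
    """Number of unrooted binary topologies on n >= 3 labelled tips."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1  # f(3) = 1
    for k in range(3, n):
        # An unrooted tree on k tips has 2k - 3 branches where tip k + 1
        # can be attached.
        count *= 2 * k - 3
    return count


def largest_exhaustive_n(budget):
    """Largest n whose complete tree space fits into `budget` evaluations."""
    n = 3
    while unrooted_topologies(n + 1) <= budget:
        n += 1
    return n


print(unrooted_topologies(10))        # 2027025
print(largest_exhaustive_n(10 ** 9))  # 12
```

Under that assumed budget, the complete space of unrooted trees can be enumerated for at most 12 taxa; one more taxon already requires more than 13 billion evaluations.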


slide-36
SLIDE 36

Searching the tree space

Heuristic tree search

- tree space is too large for an exhaustive search if n (the number of taxa) is larger than 12 or so
- heuristic search:
  1. start with some tree topology (e.g., the Neighbor-Joining tree)
  2. apply a bunch of local modifications to the current tree
  3. if one of the modified trees has lower or equal parsimony, move to that tree
  4. stop if no further improvement is possible

⇒ standard approach for optimization problems in computer science
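The search loop can be sketched generically. The version below is a steepest-descent variant that only moves on strict improvement, which is a deliberate simplification: accepting equally good candidates, as described above, needs extra bookkeeping to avoid cycling. The callables `neighbors` and `score` are hypothetical placeholders; in a real tree search they would enumerate local rearrangements of a tree and compute its parsimony score. Integers stand in for trees in the toy example.

```python
def hill_climb(start, neighbors, score):
    """Greedy local search: move to the best neighbour until nothing improves.

    start     -- initial candidate (for trees: e.g. a Neighbor-Joining tree)
    neighbors -- function returning the local modifications of a candidate
    score     -- function to minimize (for trees: the parsimony score)
    """
    current, current_score = start, score(start)
    while True:
        best = min(neighbors(current), key=score, default=None)
        if best is None or score(best) >= current_score:
            # No neighbour is strictly better: a (possibly local) optimum.
            return current, current_score
        current, current_score = best, score(best)


# Toy stand-in: candidates are integers, neighbours differ by one,
# and the "parsimony" to minimize is the squared distance from 7.
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: (x - 7) ** 2)
print(result)  # (7, 0)
```

Because only neighbours of the current candidate are inspected, the result depends on the starting point and on how far-reaching the neighbourhood is.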


slide-37
SLIDE 37

Searching the tree space

Tree modifjcations

three tree modifications commonly in use:

1. Nearest Neighbor Interchange (NNI)
2. Tree Bisection and Reconnection (TBR)
3. Subtree Pruning and Regrafting (SPR)

local modifications are better than arbitrary moves in tree space because partial parsimony computations can be re-used in the modified tree


slide-38
SLIDE 38

Searching the tree space

Nearest Neighbor Interchange


slide-39
SLIDE 39

Searching the tree space

Tree Bisection and Reconnection

[Figure: three trees over the taxa 1 to 10 illustrating a TBR move.]


slide-40
SLIDE 40

Searching the tree space

Subtree Pruning and Regrafting

[Figure: three trees over the taxa 1 to 10 illustrating an SPR move.]


slide-41
SLIDE 41

Searching the tree space

Heuristic tree search

- NNI is very local → only O(n) possible moves
- SPR and TBR are more aggressive → O(n^2) / O(n^3) possible moves
- NNI search is comparatively fast, but prone to get stuck in local optima


slide-42
SLIDE 42

Searching the tree space

Running example: SPR search with cognate data

parsimony=1984

[Figure: tree over Spanish, Portuguese, Hindi, Nepali, Bengali, Greek, Breton, Welsh, Irish, Dutch, German, English, Swedish, Danish, Icelandic, Polish, Czech, Russian, Ukrainian, Bulgarian, Lithuanian, French, Italian, Romanian, Catalan.]

starting with Neighbor Joining tree . . .


slide-43
SLIDE 43

Searching the tree space

Running example: SPR search with cognate data

parsimony=1979

[Figure: current tree of the SPR search over the same 25 languages.]


slide-44
SLIDE 44

Searching the tree space

Running example: SPR search with cognate data

parsimony=1975

[Figure: current tree of the SPR search over the same 25 languages.]


slide-45
SLIDE 45

Searching the tree space

Running example: SPR search with cognate data

parsimony=1973

[Figure: current tree of the SPR search over the same 25 languages.]


slide-46
SLIDE 46

Searching the tree space

Running example: SPR search with cognate data

parsimony=1969

[Figure: final tree of the SPR search over the same 25 languages.]

. . . Maximum Parsimony tree


slide-47
SLIDE 47

Searching the tree space

Running example: SPR search with cognate data

there are actually 16 different trees with minimal parsimony score

Greek Irish Welsh Breton French Italian Catalan Portuguese Spanish Romanian Icelandic Swedish Danish German Dutch English Hindi Nepali Bengali Ukrainian Russian Polish Czech Bulgarian Lithuanian


slide-48
SLIDE 48

Searching the tree space

MP tree for WALS characters

Bengali Breton Irish Welsh Bulgarian Greek Czech Lithuanian Polish Russian Ukrainian Catalan Romanian Italian Spanish Portuguese French Danish Swedish Icelandic Dutch German English Hindi Nepali


slide-49
SLIDE 49

Searching the tree space

MP tree for sound-concept characters

Greek Bulgarian Russian Ukrainian Polish Czech Lithuanian Icelandic Swedish Danish Dutch German English Catalan Portuguese Spanish Italian Romanian French Breton Welsh Irish Hindi Bengali Nepali


slide-50
SLIDE 50

Searching the tree space

Dollo parsimony

- previous trees were estimated with a symmetric weight matrix
- if weights are asymmetric, the location of the root matters
- extreme case: Dollo Parsimony, w(0 → 1) = ∞

Spanish Portuguese Hindi Nepali Bengali Greek Breton Welsh Irish Dutch German English Swedish Danish Icelandic Polish Czech Russian Ukrainian Bulgarian Lithuanian French Italian Romanian Catalan
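A small sketch of why the root location matters once the weights are asymmetric. It runs the Sankoff recursion on two rootings of the same unrooted tree for a binary character, once with symmetric unit weights and once with Dollo-style weights where a gain (0 → 1) costs ∞ and a loss (1 → 0) costs 1. The loss cost of 1, the toy tip states, and all names are assumptions made for illustration.

```python
import math

INF = math.inf

# w[mother_state][daughter_state]
SYMMETRIC = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
DOLLO = {0: {0: 0, 1: INF}, 1: {0: 1, 1: 0}}  # gains forbidden, losses cost 1


def sankoff(node, w, states=(0, 1)):
    """Return {state: cost} at `node` for a nested-tuple tree with integer tips."""
    if not isinstance(node, tuple):
        # Tip: cost 0 for the observed state, infinity otherwise.
        return {s: (0 if s == node else INF) for s in states}
    daughters = [sankoff(d, w, states) for d in node]
    return {
        s: sum(min(w[s][t] + wp_d[t] for t in states) for wp_d in daughters)
        for s in states
    }


def score(tree, w):
    """Parsimony score: the cheapest entry of the root's cost vector."""
    return min(sankoff(tree, w).values())


# Two rootings of the same unrooted tree with tip states 0,0 | 1,1 | 0,0.
ROOTING_1 = ((0, 0), ((1, 1), (0, 0)))
ROOTING_2 = ((1, 1), ((0, 0), (0, 0)))

print(score(ROOTING_1, SYMMETRIC), score(ROOTING_2, SYMMETRIC))  # 1 1
print(score(ROOTING_1, DOLLO), score(ROOTING_2, DOLLO))          # 2 1
```

With symmetric weights both rootings need one change. With the Dollo-style weights the root must carry state 1, so the first rooting needs two independent losses while the second needs only one.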


slide-51
SLIDE 51

Searching the tree space

Maximum Parsimony: Discussion

- Once we have found the best tree (or, in any event, a tree which is very close to the best tree), we can reconstruct ancestral states via the Sankoff algorithm.
- This allows us to compute statistics about the stability of characters, the frequency and location of parallel changes, etc.

⇒ much more informative than distance-based inference


slide-52
SLIDE 52

Searching the tree space

Maximum Parsimony: Discussion

disadvantages of MP:

- simulation studies: capacity to recover the true tree is decent but not overwhelming
- possibility of multiple mutations on a single branch is not taken into consideration
- all characters are treated equally; no discrimination between stable and volatile characters
- ties are common, especially if you have few data
- values for the weight matrix are ad hoc
- no real theoretical justification
  - Why should the true tree minimize the total number of mutations?
  - Rests on a valid intuition: mutations are unlikely, so assuming fewer mutations increases the likelihood of the data.
  - Likelihood is not formally derived from a probabilistic model, though.

Next step: Maximum Likelihood tree estimation


slide-53
SLIDE 53

Searching the tree space

Hands on

1. Install the software Paup*.
2. Go to the directory where you have put the nexus files and type
       > paup4 ielex.bin.nex
3. At Paup's command prompt, type
       paup> hsearch
4. Display the tree with
       paup> describetree /plot=phylo
5. Save the result with
       paup> savetree format=newick file = ielex.mp.tre brlen=yes
6. Leave Paup* with
       paup> q
7. Install Dendroscope or FigTree and load ielex.mp.tre.


slide-54
SLIDE 54

Searching the tree space

Ewens, W. and G. Grant (2005). Statistical Methods in Bioinformatics: An Introduction. Springer, New York.
