
SLIDE 1

c

N. Salamin

Sept 2007 Lecture outline Maximum likelihood in phylogenetics

Definition Maximum likelihood and models Likelihood of a tree Computational complexity

Statistical properties

Maximum parsimony Maximum likelihood Experimental design

Hypothesis testing

Tree support Tests of topology Tests of models

Phylogenetics and bioinformatics for evolution

Maximum Likelihood

September, 2007

Lecture outline

1 Maximum likelihood in phylogenetics Definition Maximum likelihood and models Likelihood of a tree Computational complexity

2 Statistical properties Maximum parsimony Maximum likelihood Experimental design

3 Hypothesis testing Tree support Tests of topology Tests of models

SLIDE 2

Description

Given a hypothesis H and some data D, the likelihood of H is

$$L(H) = \mathrm{Prob}(D \mid H) = \mathrm{Prob}(D_1 \mid H)\,\mathrm{Prob}(D_2 \mid H) \cdots \mathrm{Prob}(D_n \mid H)$$

if D can be split into n independent parts. Note that L(H) is not the probability of the hypothesis, but the probability of the data, given the hypothesis.

Maximum likelihood properties (Fisher, 1922)

consistency – converges to the correct value of the parameter as the amount of data grows
efficiency – has (asymptotically) the smallest possible variance around the true parameter value

SLIDE 3

Toss a coin

Let us say we toss a coin 11 times and obtain 5 heads and 6 tails. All tosses are independent and all have the same unknown probability of heads p. What is the probability of this data?

$$L(p) = \mathrm{Prob}(D \mid p) = p^5 (1 - p)^6$$

The maximum likelihood estimate is found by setting the derivative of L(p) with respect to p to zero and solving:

$$\frac{dL(p)}{dp} = 5p^4(1 - p)^6 - 6p^5(1 - p)^5 = 0$$

Dividing by $p^4(1 - p)^5$ gives $5(1 - p) - 6p = 0$, which yields $\hat{p} = 5/11 = 0.454545$.
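As a quick numerical check of the coin example, the sketch below evaluates the log-likelihood on a grid of candidate values of p and keeps the best one. The grid resolution and the name `log_likelihood` are illustrative assumptions, not part of the lecture. The logarithm is maximised instead of L(p) itself because both have their maximum at the same p.

```python
import math

HEADS, TAILS = 5, 6  # observed data from the coin example

def log_likelihood(p):
    """ln L(p) = 5 ln p + 6 ln(1 - p)."""
    return HEADS * math.log(p) + TAILS * math.log(1.0 - p)

# Grid search over (0, 1); used only for illustration, since the
# analytical solution is 5/11.
grid = [i / 10000 for i in range(1, 10000)]
p_hat = max(grid, key=log_likelihood)

print(f"grid estimate: {p_hat:.4f}, analytical: {HEADS / (HEADS + TAILS):.4f}")
```

In real phylogenetic problems no closed-form solution exists, so a numerical search of this kind (usually a much smarter one) is the norm.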

Likelihood and models

Maximum likelihood relies on explicit probabilistic models of evolution. However, the process of evolution is so complex and multifaceted that even basic models involve assumptions built upon assumptions. This reliance is often seen as a weakness of the likelihood framework, but

the need to make explicit assumptions is a strength
it enables both inferences about evolutionary history and assessments of the accuracy of the assumptions made
this has led to a better understanding of evolution

“The purpose of models is not to fit the data, but to sharpen the questions” (S. Karlin)

SLIDE 4

Basic settings for models of evolution

In order to discuss models of DNA evolution, we need to make some basic assumptions. Some can be relaxed in more complex models.

the DNA sequences are alignable
the number of substitutions through time follows a Poisson distribution
the sites of DNA sequence evolved independently
all the sites have the same rate of substitutions

With these assumptions, we can easily model substitutions with a Markov chain.

Markov chain

Let X(t) be the state of the chain (one of A, C, G or T) at time t. The Markov chain is characterized by its generator matrix $Q = \{q_{ij}\}$, where $q_{ij}$ is the instantaneous rate of change from i to j (for i ≠ j), that is, as $\Delta t \to 0$,

$$\Pr\{X(t + \Delta t) = j \mid X(t) = i\} = q_{ij}\,\Delta t$$

The diagonal elements $q_{ii}$ are specified by the requirement that each row of Q sums to zero, that is

$$q_{ii} = -\sum_{j \neq i} q_{ij}$$

Thus $-q_{ii}$ is the substitution rate of state i, i.e. the rate at which the Markov chain leaves i.

SLIDE 5

Transition-probability matrix

The Q matrix fully determines the dynamics of the Markov chain. It specifies, in particular, the transition-probability matrix over any time t > 0,

$$P(t) = \{p_{ij}(t)\}, \qquad p_{ij}(t) = \Pr\{X(t) = j \mid X(0) = i\}$$

We can further show that

$$\lim_{h \to 0^+} \frac{P(h) - I}{h} = Q$$

Transition-probability matrix

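Thus (by the Chapman-Kolmogorov relation)

$$\begin{aligned}
P(t + \Delta t) &= P(t)\,P(\Delta t) \\
P(t + \Delta t) - P(t) &= P(t)\,\bigl(P(\Delta t) - I\bigr) \\
\frac{P(t + \Delta t) - P(t)}{\Delta t} &= P(t)\,\frac{P(\Delta t) - I}{\Delta t} \\
P'(t) &= P(t)\,Q
\end{aligned}$$

and, as $P(0) = I$, we finally get

$$P(t) = e^{Qt}$$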
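The relation between Q and P(t) can be checked numerically. The sketch below approximates the matrix exponential with a truncated Taylor series and compares it with the known closed form for the Jukes-Cantor model. The helper names `mat_mul` and `mat_exp`, the number of series terms, and the value of t are assumptions made for illustration; production software typically uses an eigendecomposition of Q instead.

```python
import math

def mat_mul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, terms=40):
    """Approximate exp(Q t) by the truncated series sum_k (Q t)^k / k!."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # k = 0 term is the identity
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result

# Jukes-Cantor generator scaled so that the total rate of leaving a state is 1
Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
t = 0.5
P = mat_exp(Q, t)

# Closed-form Jukes-Cantor probability of finding the same base after time t
p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
print(P[0][0], p_same)
```

Each row of P(t) sums to one, as expected for a matrix of transition probabilities.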

SLIDE 6

Rate of mutation

As Q and t occur only in the form of a product, it is conventional to scale Q so that the average rate is 1. In phylogenetics, branch length is therefore measured in expected substitutions per site. A long branch can therefore be due to either

long evolutionary time
a rapid rate of substitution
a combination of both

Kimura 2-parameter, 1980

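$$Q = \begin{pmatrix}
- & \beta/4 & \alpha/4 & \beta/4 \\
\beta/4 & - & \beta/4 & \alpha/4 \\
\alpha/4 & \beta/4 & - & \beta/4 \\
\beta/4 & \alpha/4 & \beta/4 & -
\end{pmatrix}$$

(rows and columns ordered A, C, G, T; each diagonal entry, shown as −, is set so that its row sums to zero) where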

α is the transition rate
β is the transversion rate

It simplifies to Jukes-Cantor, 1969 if α = β.

SLIDE 7

Tamura-Nei, 1993

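$$Q = \begin{pmatrix}
- & \beta\pi_C & \alpha_R\pi_G/\pi_R + \beta\pi_G & \beta\pi_T \\
\beta\pi_A & - & \beta\pi_G & \alpha_Y\pi_T/\pi_Y + \beta\pi_T \\
\alpha_R\pi_A/\pi_R + \beta\pi_A & \beta\pi_C & - & \beta\pi_T \\
\beta\pi_A & \alpha_Y\pi_C/\pi_Y + \beta\pi_C & \beta\pi_G & -
\end{pmatrix}$$

where

$\alpha_R$ is the purine transition rate, $\pi_R$ is the frequency of purines (A+G)
$\alpha_Y$ is the pyrimidine transition rate, $\pi_Y$ is the frequency of pyrimidines (C+T)

It simplifies to

Hasegawa, Kishino, Yano, 1985 if $\alpha_R/\alpha_Y = \pi_R/\pi_Y$
Felsenstein, 1984 if $\alpha_R = \alpha_Y$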

General-time reversible

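$$Q = \begin{pmatrix}
- & \alpha\pi_C & \beta\pi_G & \gamma\pi_T \\
\alpha\pi_A & - & \delta\pi_G & \epsilon\pi_T \\
\beta\pi_A & \delta\pi_C & - & \eta\pi_T \\
\gamma\pi_A & \epsilon\pi_C & \eta\pi_G & -
\end{pmatrix}$$

where

$\alpha, \ldots, \eta$ are the rates of change from one nucleotide to another
$\pi_i$ are the frequencies of the nucleotides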
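To make the structure of the general time-reversible matrix concrete, the sketch below builds Q from six exchange rates and four base frequencies, then checks that every row sums to zero and that the reversibility condition $\pi_i q_{ij} = \pi_j q_{ji}$ holds. The numerical values and the function name `gtr_q` are hypothetical and chosen only for illustration.

```python
BASES = "ACGT"

def gtr_q(rates, freqs):
    """Build the GTR generator: q_ij = rate(i, j) * pi_j for i != j,
    with each diagonal entry set so that its row sums to zero."""
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i][j] = rates[frozenset((i, j))] * freqs[j]
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q

# Hypothetical exchange rates (alpha ... eta in the matrix above) and frequencies
rates = {
    frozenset("AC"): 1.0,  # alpha
    frozenset("AG"): 4.0,  # beta
    frozenset("AT"): 0.8,  # gamma
    frozenset("CG"): 1.2,  # delta
    frozenset("CT"): 5.0,  # epsilon
    frozenset("GT"): 1.0,  # eta
}
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

Q = gtr_q(rates, freqs)
for i in BASES:
    print(i, [round(Q[i][j], 3) for j in BASES])
```

Because each exchange rate is shared by the two directions of a substitution, reversibility follows directly from the construction.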

SLIDE 8

Variation of substitution rates

Up to now, we assumed that every site had the same rate of substitution. How can we change that?

The idea is to use a probability distribution to model changes in rates of substitution among sites, e.g. the Gamma distribution

the mean of the distribution is $\alpha\beta$, and its variance is $\alpha\beta^2$
we set the mean rate of substitution to 1, so we assume $\beta = 1/\alpha$
the single remaining parameter $\alpha$ then controls the shape of the distribution (the variance is $1/\alpha$)
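The effect of the shape parameter can be illustrated by drawing site rates from a Gamma distribution with β = 1/α. This is a minimal sketch using Python's standard `random.gammavariate`, whose second argument is a scale parameter so that the mean is αβ, matching the parameterisation above. The sample size, seed and helper names are assumptions; phylogenetic programs usually use a small number of discrete rate categories instead of random draws.

```python
import random

def sample_site_rates(alpha, n_sites, seed=1):
    """Draw n_sites relative rates from Gamma(shape=alpha, scale=1/alpha)."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v

for alpha in (0.5, 2.0):
    m, v = mean_and_variance(sample_site_rates(alpha, 100_000))
    print(f"alpha={alpha}: mean={m:.3f}, variance={v:.3f}, expected variance={1 / alpha:.3f}")
```

Small values of α give strong rate heterogeneity (many nearly invariant sites and a few fast ones), whereas large values approach the equal-rates case.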

Felsenstein, 2004

Discrete characters

Note that it is easy to extend the same approach to other discrete character datasets simply by modifying the size of the Q matrix, and thus the number of parameters.

codon or aa

61 or 21 states
computational problems due to the number of parameters involved
for codons, solutions have been found to reduce the size, based on the HKY85 model

morphology

usually 2 states

SLIDE 9

Continuous characters

Continuous characters can be modelled as evolving according to a Brownian motion model.

consider a particle moving along an x-axis in small steps that are independent from each other
each step has a mean displacement of zero and a constant variance $s^2$
after n steps, the net displacement is the sum of the individual steps and its variance is the variance of the sum, $n s^2$
given a divergence time t between two species
let $\sigma^2$ be the variance expected to accumulate per unit of time in a continuous character
the variance after an interval of time t is $\sigma^2 t$
the mean is assumed to stay the same over time t

The Ornstein-Uhlenbeck model generalizes the Brownian motion model and makes it possible to model traits subject to multiple types of selection processes.
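The claim that the variance of the net displacement grows as $n s^2$ can be checked by simulation. The sketch below draws independent Gaussian steps; the normal step distribution, the number of walks, the seed and the function name are assumptions made for illustration only.

```python
import random

def simulate_displacements(n_walks, n_steps, s, seed=1):
    """Net displacement of n_walks independent random walks, each made of
    n_steps Gaussian steps with mean 0 and standard deviation s."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, s) for _ in range(n_steps))
            for _ in range(n_walks)]

n_steps, s = 20, 1.0
final = simulate_displacements(5000, n_steps, s)
mean = sum(final) / len(final)
var = sum((x - mean) ** 2 for x in final) / (len(final) - 1)

print(f"mean displacement: {mean:.3f} (expected 0)")
print(f"variance: {var:.2f} (expected n * s^2 = {n_steps * s ** 2:.2f})")
```

The same reasoning, applied in continuous time, gives the variance $\sigma^2 t$ used for trait evolution along a branch.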

Likelihood of a tree

Suppose we have a data set D of DNA sequences with m sites. We are given a topology T with branch lengths and a model of evolution Q that allows us to compute $P_{ij}(t)$.

Assumptions made to compute the likelihood $L(T, Q) = \mathrm{Prob}(D \mid T, Q)$

evolution in different sites is independent
evolution in different lineages is independent

SLIDE 10

Independence of sites

Felsenstein, 2004

The assumption of independence of sites along a sequence allows us to decompose the likelihood into a product of likelihoods for each site

$$L(T, Q) = \mathrm{Prob}(D \mid T, Q) = \prod_{i=1}^{m} \mathrm{Prob}(D_i \mid T, Q)$$

The likelihood of this tree for site i is

$$\mathrm{Prob}(D_i \mid T, Q) = \sum_{x} \sum_{y} \sum_{z} \sum_{w} \mathrm{Prob}(A, C, C, C, G, x, y, z, w \mid T, Q)$$

where each sum runs over the four bases A, C, G, T.

Independence of lineages

Felsenstein, 2004

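With the independence of lineages assumption, we can decompose the right hand side of the equation a bit further:

$$\mathrm{Prob}(A, C, C, C, G, x, y, z, w \mid T, Q) = \mathrm{Prob}(x)\,P_{xy}(t_6)\,P_{yA}(t_1)\,P_{yC}(t_2)\,P_{xz}(t_8)\,P_{zC}(t_3)\,P_{zw}(t_7)\,P_{wC}(t_4)\,P_{wG}(t_5)$$

so that the likelihood for site i becomes

$$\mathrm{Prob}(D_i \mid T, Q) = \sum_{x} \sum_{y} \sum_{z} \sum_{w} \mathrm{Prob}(x)\,P_{xy}(t_6)\,P_{yA}(t_1)\,P_{yC}(t_2)\,P_{xz}(t_8)\,P_{zC}(t_3)\,P_{zw}(t_7)\,P_{wC}(t_4)\,P_{wG}(t_5)$$

where each sum runs over the four bases A, C, G, T.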

SLIDE 11

Number of terms

It is reasonable to take Prob(x) to be the equilibrium probability of base x under the model selected. On a tree with n species, there are n − 1 interior nodes, and each can have one of 4 states; the sum therefore has $4^{n-1}$ terms, which quickly becomes too many:

n = 10, we have $4^{9}$ terms: 262,144
n = 20, we have $4^{19}$ terms: 274,877,906,944

Pruning algorithm

Goal: render the likelihood computation practicable using “dynamic programming”

Idea: move summation signs as far right as possible and enclose them in parentheses where possible

$$\mathrm{Prob}(D_i \mid T) = \sum_{x} \mathrm{Prob}(x) \left( \sum_{y} P_{xy}(t_6)\,P_{yA}(t_1)\,P_{yC}(t_2) \right) \times \left( \sum_{z} P_{xz}(t_8)\,P_{zC}(t_3) \times \left( \sum_{w} P_{zw}(t_7)\,P_{wC}(t_4)\,P_{wG}(t_5) \right) \right)$$

where each sum runs over the four bases A, C, G, T. The pattern of parentheses and the terms for the tips correspond exactly to the structure of the tree.

SLIDE 12

Conditional likelihoods

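Information flows down the tree, from the tips towards the root. The computation makes use of the conditional likelihood of a subtree, e.g. $P_{yA}(t_1)\,P_{yC}(t_2)$: the probability of everything seen at or above that node, given that the node has base y. So at each node k, with descendants l and m, we can compute for site i

$$L_k^{(i)}(s) = \left( \sum_{x} P_{sx}(t_l)\,L_l^{(i)}(x) \right) \left( \sum_{y} P_{sy}(t_m)\,L_m^{(i)}(y) \right)$$

The likelihood at the root node 0 is then

$$L^{(i)} = \sum_{x} \pi_x\,L_0^{(i)}(x)$$

where each sum runs over the four bases A, C, G, T.

The logic is similar to that of the Sankoff algorithm for parsimony.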
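The recursion on conditional likelihoods can be sketched in a few lines. The example below uses the five-tip tree from the preceding slides (tips A, C, C, C, G; interior nodes x, y, z, w) and checks the pruning result against the brute-force sum over all $4^4$ assignments of interior states. The Jukes-Cantor model, the branch-length values and all function names are assumptions chosen for illustration; the lecture does not fix them.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(i, j, t):
    """Jukes-Cantor P_ij(t), with t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional(node):
    """Return {s: L(s)}, the probability of the tips descending from `node`
    given that `node` is in state s."""
    if isinstance(node, str):                  # tip: the observed base
        return {s: 1.0 if s == node else 0.0 for s in BASES}
    result = {s: 1.0 for s in BASES}
    for child, t in node:                      # interior node: (child, branch length) pairs
        child_l = conditional(child)
        for s in BASES:
            result[s] *= sum(jc_prob(s, x, t) * child_l[x] for x in BASES)
    return result

def site_likelihood(root):
    """Sum the root conditional likelihoods weighted by equilibrium frequencies."""
    cond = conditional(root)
    return sum(0.25 * cond[x] for x in BASES)

# Hypothetical branch lengths t1..t8
t1, t2, t3, t4, t5, t6, t7, t8 = 0.1, 0.2, 0.15, 0.1, 0.3, 0.05, 0.1, 0.2
w = [("C", t4), ("G", t5)]
z = [("C", t3), (w, t7)]
y = [("A", t1), ("C", t2)]
root = [(y, t6), (z, t8)]

def brute_force():
    """Direct sum over all states of the four interior nodes."""
    total = 0.0
    for x, yy, zz, ww in product(BASES, repeat=4):
        total += (0.25 * jc_prob(x, yy, t6) * jc_prob(yy, "A", t1) * jc_prob(yy, "C", t2)
                  * jc_prob(x, zz, t8) * jc_prob(zz, "C", t3)
                  * jc_prob(zz, ww, t7) * jc_prob(ww, "C", t4) * jc_prob(ww, "G", t5))
    return total

pruned = site_likelihood(root)
print(pruned, brute_force())
```

The pruning version visits each node once, so its cost grows linearly with the number of species, whereas the brute-force sum grows as $4^{n-1}$.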

Unrootedness

The trees inferred so far appeared to be rooted trees, but if the model of evolution is reversible, the trees are unrooted.

Looking at the region near the root

$$L(T) = \sum_{x} \sum_{y} \sum_{z} \mathrm{Prob}(x)\,P_{xy}(t_6)\,P_{xz}(t_8)$$

But reversibility of the substitution process guarantees that

$$\mathrm{Prob}(x)\,P_{xy}(t_6) = \mathrm{Prob}(y)\,P_{yx}(t_6)$$

Substituting that, we get

$$L(T) = \sum_{x} \sum_{y} \sum_{z} \mathrm{Prob}(y)\,P_{yx}(t_6)\,P_{xz}(t_8)$$

SLIDE 13

Estimating parameters

Up to now we assumed that both branch lengths and model parameters were fixed at given values. Of course, we don’t know these values, so we estimate them by numerical optimisation:

start from a random guess
make small changes and compare the improvement in likelihood
repeat until the likelihood does not change any more
every time the topology is changed, re-estimate these parameters

Drawback: this is very computationally intensive, but there are solutions to that.

End of part 1

SLIDE 14

Statistical properties of parsimony

As the amount of data approaches infinity, an estimator is

consistent if it converges to the true value of the parameter with probability 1
inconsistent if it converges to something else

One way to test the consistency of parsimony is to check data patterns. For four taxa:

$4^4 = 256$ possible site patterns, from AAAA to TTTT
3 unrooted tree topologies ⇒ work out for each how many changes are necessary for the patterns to evolve
patterns xxyy, xyxy, or xyyx are the only ones to affect the length of the tree in parsimony
these are commonly called phylogenetically informative characters
patterns such as xxyz do not affect a parsimony method, as they require 2 changes on any tree

SLIDE 15

Data patterns

Felsenstein, 2004

Observed numbers of patterns

The length of a tree is obtained by counting how many times we observe the patterns xxyy, xyxy, and xyyx, whatever nucleotides are represented by x and y. On the first tree, the number of changes is

$$n_{xxyy} + 2n_{xyxy} + 2n_{xyyx} = 2(n_{xxyy} + n_{xyxy} + n_{xyyx}) - n_{xxyy}$$

Similarly, for the two other trees,

$$2n_{xxyy} + n_{xyxy} + 2n_{xyyx} = 2(n_{xxyy} + n_{xyxy} + n_{xyyx}) - n_{xyxy}$$

$$2n_{xxyy} + 2n_{xyxy} + n_{xyyx} = 2(n_{xxyy} + n_{xyxy} + n_{xyyx}) - n_{xyyx}$$

Since the first term on the right is the same for all three trees, we can ignore it: the tree chosen by parsimony is the one whose subtracted count is largest, i.e. the tree supported by the most frequent of the three patterns.
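The pattern counts and the resulting tree lengths are easy to compute for a four-taxon alignment. In the sketch below, the alignment is hypothetical, the function names are illustrative, and each tree is labelled by the pattern that requires a single change on it (so "xxyy" stands for the tree grouping taxa 1 and 2 against taxa 3 and 4). Only changes at informative sites are counted, as in the equations above.

```python
from collections import Counter

PATTERNS = ("xxyy", "xyxy", "xyyx")

def classify(column):
    """Return the informative pattern of a four-taxon column, or None."""
    a, b, c, d = column
    if a == b and c == d and a != c:
        return "xxyy"
    if a == c and b == d and a != b:
        return "xyxy"
    if a == d and b == c and a != b:
        return "xyyx"
    return None

def parsimony_choice(seqs):
    """Count informative patterns and compute the length of the three trees."""
    counts = Counter({p: 0 for p in PATTERNS})
    for column in zip(*seqs):
        pattern = classify(column)
        if pattern is not None:
            counts[pattern] += 1
    informative = sum(counts[p] for p in PATTERNS)
    lengths = {p: 2 * informative - counts[p] for p in PATTERNS}
    best = min(lengths, key=lengths.get)
    return dict(counts), lengths, best

# Hypothetical alignment of four taxa and eight sites
seqs = ["AACGTTAC", "AACGTAAC", "GGCATTAC", "GGCAAAAC"]
counts, lengths, best = parsimony_choice(seqs)
print(counts, lengths, best)
```

Here three sites show pattern xxyy and one shows xyxy, so the tree supported by xxyy is the shortest.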

SLIDE 16

Pattern probabilities

Felsenstein, 2004

p and q represent the net probability of change along a branch, and we assume independent evolutionary processes in different lineages. If we assign the character state 0 to both internal nodes, the contribution to the probability of pattern 1100 is

$$\tfrac{1}{2}(1 - p)(1 - q)^2\,pq$$

Summing over all possible states at both internal nodes:

$$P_{1100} = \tfrac{1}{2}\left[(1 - p)(1 - q)^2\,pq + (1 - p)^2(1 - q)^2\,q + p^2 q^3 + pq(1 - p)(1 - q)^2\right]$$
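The four terms of this formula can be recovered by enumerating the states of the two internal nodes. The sketch assumes a two-state character and the tree implied by the formula: tips 1 and 2 attached to one internal node, tips 3 and 4 to the other, with tips 1 and 3 on long branches (change probability p) and tips 2 and 4 and the interior branch on short ones (q). This layout is inferred from the formula, since the original figure is not reproduced here, and the function names are illustrative.

```python
import math
from itertools import product

def branch(a, b, change):
    """Probability of going from state a to state b along a branch whose
    net probability of change is `change`."""
    return change if a != b else 1.0 - change

def p1100_enumerated(p, q):
    """Sum over the states (u, v) of the two internal nodes."""
    total = 0.0
    for u, v in product((0, 1), repeat=2):
        total += (0.5                    # probability of state u at the first internal node
                  * branch(u, 1, p)      # long branch to tip 1 (state 1)
                  * branch(u, 1, q)      # short branch to tip 2 (state 1)
                  * branch(u, v, q)      # interior branch
                  * branch(v, 0, p)      # long branch to tip 3 (state 0)
                  * branch(v, 0, q))     # short branch to tip 4 (state 0)
    return total

def p1100_formula(p, q):
    """Closed form given in the text."""
    return 0.5 * ((1 - p) * (1 - q) ** 2 * p * q
                  + (1 - p) ** 2 * (1 - q) ** 2 * q
                  + p ** 2 * q ** 3
                  + p * q * (1 - p) * (1 - q) ** 2)

print(p1100_enumerated(0.3, 0.1), p1100_formula(0.3, 0.1))
```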

Long branch attraction

Long branch attraction:

probability of parallel changes along both long branches is $p^2$
probability of a single change in the interior branch is q
if $p^2$ becomes greater than q ⇒ long branch attraction

Felsenstein, 2004

But if the tree is short enough, even large ratios of the length of the long to the short branches do not cause inconsistency.

SLIDE 17

Generalisation

The proof of inconsistency of parsimony has been generalized to DNA data, and larger trees. However, parsimony can be “rescued” if long branches in the tree are broken by adding more taxa:

Graybeal, 1995

Problem: we don’t know in advance which branch to break, but a good taxon sampling should take care of that

Consistency and overparameterisation

Maximum likelihood can be proved to be consistent (see Felsenstein, 2004, p. 271)

true if we use the correct model of evolution
but what happens if the model is not correct? No guarantee can be given
less problematic if what we want to infer is just the topology

Beware of overparameterisation problems

adding more and more parameters to the model will result in a better fit to the data
but this can lead to inconsistency
a few parameters are more important than others, in particular the gamma distribution

SLIDE 18

Phylogenetic information

“Given the recognition of phylogenetic inference as being inherently statistical in nature, it is surprising that so little attention has been paid to experimental design”

Goldman, 1998

$$I_\theta = -\frac{\partial^2 \ln(L)}{\partial\theta_i\,\partial\theta_j}$$

where $I_\theta$ is proportional to the precision of an estimator of θ. For experimental design, the Fisher (or expected) information is useful, as its inverse gives asymptotic lower bounds for the variances of estimators of the $\theta_i$.

What can we do with that?

best rate of evolution

Townsend, 2007

where to add a new sequence

Goldman, 1998

SLIDE 19

End of part 2

SLIDE 20

Bootstrap on phylogenetic tree

The bootstrap allows us to infer the variability of parameters in models that are too complex for easy calculation of their variance. This is the case for topologies!

Felsenstein, 2004

Procedure

sample whole columns of data with replacement
recreate n pseudomatrices with the same number of species and sites as the original one
build n phylogenetic trees from these n pseudoreplicates
if several trees are obtained for replicate i, down-weight each of them by the number of trees obtained
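The resampling step of the procedure can be sketched as follows. Only the construction of the pseudoreplicate matrices is shown; tree building is left out because it depends on the inference program used. The alignment, the number of replicates, the seed and the function name are hypothetical.

```python
import random

def bootstrap_alignment(seqs, rng):
    """Resample whole alignment columns with replacement, keeping the same
    number of species and sites as the original matrix."""
    n_sites = len(seqs[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in seqs]

# Hypothetical alignment: four species, six sites
alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "GCCTAA"]
rng = random.Random(42)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]

print(replicates[0])
```

Sampling whole columns keeps the association between species at each site intact, which is what makes each pseudoreplicate a plausible data set.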

Summarizing results

The end result of a bootstrap is a cloud of trees. What is the best way to summarize it, given that trees have discrete topologies and continuous branch lengths?

We could make a histogram of the length of a particular branch

it will give a lower limit on the branch length
if 0 lies outside the 95% interval, we would assert the existence of the branch

A simpler solution is to count how many times a particular branch appears in the list of trees estimated by bootstrap. A majority-rule consensus tree containing the clades that appear in more than 50% of these trees can then be built.

SLIDE 21

Simple example

Felsenstein, 2004

Jackknife on phylogenetic tree

Resample a portion of the characters without replacement.

Bootstrap

weights $m_j$ on the characters have a multinomial distribution, with n trials and equal probability for all n characters
mean weight per character is 1, with variance 1 − 1/n

Jackknife

delete a fraction f of the characters (weights 0 or 1)
mean weight per character is 1 − f, with variance f(1 − f)
when f = 1/2, same coefficient of variation as the bootstrap when n → ∞

Some authors argue that a jackknife with f = 1/e is better, but it has a smaller variance than the bootstrap.

SLIDE 22

Multiple tests

We rarely, if ever, know in advance which group interests us. If we look for the most supported group on the tree and report its p-value, we have a “multiple-tests” problem

even if there is no significant evidence for the existence of any group on a tree
5% of branches are expected to be above 0.95
so one out of every 20 branches of a tree would appear significant

In this situation, the p-value cannot be interpreted as that of a single statistical test.

Independence of characters

The independence assumption of resampling techniques may not be met. Imagine that pairs of characters are identical

we should draw only once for each identical pair, because there are only n/2 independent characters
if we draw n times, we will be sampling too often
the variation between bootstrap samples will be too small
the trees generated will be too similar

Result: the apparent corroborating evidence for groups on the tree will be higher than it really is.

Problem: there is no easy way to know how much correlation there is between characters.

SLIDE 23

Invariant characters

What is the impact of invariant characters on the bootstrap percentages? How often will a single varying character appear in a bootstrap replicate?

with N characters in total, it is chosen at each draw with probability 1/N
it is omitted at each draw with probability 1 − 1/N
the probability that it is omitted entirely from a replicate is $(1 - 1/N)^N$
increasing the number of invariant characters increases N
the probability of being omitted tends towards $e^{-1} = 0.36788$ when N → ∞
90% of this value is reached with N = 6
about 99% with N = 50
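The convergence of the omission probability towards $e^{-1}$ can be tabulated directly. The function name and the values of N shown are illustrative choices.

```python
import math

def prob_omitted(n):
    """Probability that one given character is absent from a bootstrap
    replicate of n characters drawn with replacement."""
    return (1.0 - 1.0 / n) ** n

limit = math.exp(-1.0)
for n in (2, 6, 50, 1000):
    p = prob_omitted(n)
    print(f"N={n:5d}  P(omitted)={p:.5f}  fraction of limit={p / limit:.4f}")
```

Beyond a few dozen characters, adding more invariant sites therefore has almost no further effect on how often a given informative site is left out of a replicate.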

Biases in bootstrap

Estimated p-value is conservative:

Hillis and Bull (1993)

One source of conservatism

with the bootstrap, we make a statement about branch lengths µ
then we reduce that to statements about tree topology, i.e. µ > 0 or µ < 0
the generalisation of the Hillis and Bull (1993) results is not clear

SLIDE 24

Parametric bootstrap

Felsenstein, 2004

Use computer simulations to create pseudorandom data sets

Advantages and disadvantages

can sample from the desired distribution, even with small data sets
flexible hypothesis testing framework
close reliance on the correctness of the model of evolution
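A minimal sketch of the idea, under strong simplifying assumptions: only two sequences, the JC69 model, and a hypothetical estimated distance `d_hat`. Under JC69 each site differs independently with the same probability, so simulating the number of differing sites is equivalent to simulating the sequences themselves. A real analysis would simulate full alignments on the estimated tree.

```python
import math
import random

def jc69_p_diff(d):
    """Expected proportion of differing sites at JC69 distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p_hat):
    """JC69 distance estimate from an observed proportion of differences."""
    if p_hat >= 0.75:
        raise ValueError("distance is undefined for p_hat >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)

def parametric_bootstrap(d_hat, n_sites, n_replicates, rng):
    """Simulate data sets under the fitted model and re-estimate d on each."""
    p = jc69_p_diff(d_hat)
    estimates = []
    for _ in range(n_replicates):
        n_diff = sum(rng.random() < p for _ in range(n_sites))
        estimates.append(jc69_distance(n_diff / n_sites))
    return sorted(estimates)

rng = random.Random(42)
d_hat = 0.10   # hypothetical estimate obtained from the real data
reps = parametric_bootstrap(d_hat, n_sites=500, n_replicates=1000, rng=rng)
lower, upper = reps[24], reps[974]   # approximate 95% percentile interval
print(f"95% interval for d: {lower:.3f} to {upper:.3f}")
```

The interval is only as trustworthy as the simulation model: if JC69 is a poor description of the data, the simulated replicates will not reflect the real sampling variation.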

Are two topologies different?

LRT or AIC cannot be applied to topologies because they are discrete entities, not parameters. We therefore have to rely on paired-sites tests

the expected lnL is the average lnL per site as the number of sites grows without limit
if sites are independent and two trees have equal expected lnL, then the differences in lnL at each site are drawn independently with expectation zero
statistical test of whether the mean of these differences is zero
valid for likelihood and parsimony

SLIDE 25

Possible tests

Different forms of test are possible

sites: for each site, score which tree is better and use a binomial distribution to test whether the proportion of sites preferring one tree differs significantly from 0.5
z: assume the differences at each site are normally distributed and estimate the variance of the differences of the scores
Wilcoxon: replace the absolute values of the differences by their ranks, then re-apply the signs; the sum of values for one tree is used as the statistic
KH: use bootstrap sampling to infer the distribution of the sum of differences of scores and see whether 0 lies in the tails of the distribution
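Three of these tests can be sketched in a few lines, assuming the per-site differences in score between the two trees are already available as a list. The data below are made up, the function names are illustrative, and the Wilcoxon test is left out for brevity.

```python
import math
import random

def sign_test(diffs):
    """Sites test: two-tailed binomial test of a 50:50 split (ties dropped)."""
    wins = sum(d > 0 for d in diffs)
    n = sum(d != 0 for d in diffs)
    k = max(wins, n - wins)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def z_test(diffs):
    """z test: two-tailed normal test that the mean difference is zero."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    z = sum(diffs) / math.sqrt(n * var)   # sum divided by its standard deviation
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def kh_test(diffs, n_replicates, rng):
    """KH test: bootstrap the sites and ask whether 0 lies in the tails
    of the distribution of the summed differences (two-tailed)."""
    n = len(diffs)
    below = 0
    for _ in range(n_replicates):
        total = sum(rng.choice(diffs) for _ in range(n))
        below += total <= 0
    frac = below / n_replicates
    return min(1.0, 2.0 * min(frac, 1.0 - frac))

rng = random.Random(0)
# Made-up per-site differences lnL(tree I) - lnL(tree II) for 232 sites.
diffs = [rng.gauss(0.02, 0.2) for _ in range(232)]
print("sites test p =", sign_test(diffs))
print("z test (z, p) =", z_test(diffs))
print("KH test p =", kh_test(diffs, 2000, rng))
```

The sites test uses only the sign of each difference, whereas the z and KH tests also use its magnitude, which is why they can disagree on the same data.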

Example

232 sites of mtDNA for 7 mammals

Felsenstein, 2004

SLIDE 26

Example

Results

sites: 160 sites favor tree I and 72 favor tree II ⇒ p-value = $3.279 \times 10^{-9}$ against the 116:116 expectation under H0
z: the sum of lnL differences is 3.18 and $\sigma^2 = 0.04$, so the variance of the sum of differences is 11.31; the z statistic is 3.18/3.36 = 0.94, with a p-value of 0.34
KH: 10,000 bootstrap samples of sites; 8,326 favored tree II, which yields a one-tailed probability of 0.16 and a two-tailed probability of 0.33
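The arithmetic behind these results can be retraced from the quoted numbers alone. The sketch below recomputes each probability; the per-site data are not available here, so only the summary values from the slide are used.

```python
import math

# Sites test: exact binomial tail for 160 of 232 sites under a 50:50 null.
n, k = 232, 160
one_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"sites: one-tailed p = {one_tail:.3e}, two-tailed p = {2 * one_tail:.3e}")

# z test: sum of differences and variance of that sum, as quoted.
sum_diff, var_sum = 3.18, 11.31
z = sum_diff / math.sqrt(var_sum)
p_z = math.erfc(z / math.sqrt(2.0))   # two-tailed normal tail area
print(f"z = {z:.3f}, p = {p_z:.3f}")

# KH test: bootstrap counts as quoted.
replicates, favor_tree_2 = 10_000, 8_326
p_one = (replicates - favor_tree_2) / replicates
print(f"KH: one-tailed p = {p_one:.4f}, two-tailed p = {2 * p_one:.4f}")
```

The z and KH figures agree with the quoted values up to rounding or truncation in the last digit, and both tests fail to reject equality of the two trees while the sites test rejects it strongly.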

Multiple tests, again

If we want to test more than two trees

compare each tree to the best tree
accept all trees that cannot be rejected by the KH test
this is a multiple-tests setting, but no reduction of the nominal rejection level is possible
need to correct for all the different ways the data can vary, ways that support different trees

When two trees are compared, but one of them is the actual best tree, we should do a one-tailed test.

SLIDE 27

SH test

Resampling technique that approximately corrects for testing multiple trees

1 make R bootstrap samples of the N sites
2 for each tree, normalize the resampled lnL values so that they have the same expectation
3 for the jth bootstrap replicate, calculate for the ith tree $\tilde{S}_{ij}$, how far its normalized value is below the maximum across all trees for that replicate
4 for each tree i, the tail probability is the proportion of bootstrap replicates in which $\tilde{S}_{ij}$ is greater than the actual difference between the maximum lnL and the lnL of that tree

The resampling builds a “least-favorable” case in which the trees show patterns of covariation between sites like those in the actual data but do not differ in overall lnL.

One limitation: the test assumes that all proposed trees are possibly equal in likelihood.
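The four steps can be sketched as follows, assuming a matrix `site_lnl` of per-site log-likelihoods (one row per tree) has already been computed. The data are invented, and the p-value is taken as the proportion of replicates in which the resampled gap exceeds the observed one, which is the usual convention for this test.

```python
import random

def sh_test(site_lnl, n_replicates, rng):
    """site_lnl[i][k] = lnL of site k on tree i. Returns one p-value per tree."""
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    totals = [sum(row) for row in site_lnl]
    observed = [max(totals) - t for t in totals]   # best lnL minus lnL of tree i

    # Step 1: bootstrap the sites, using the same resampled sites for every tree.
    boot = []
    for _ in range(n_replicates):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        boot.append([sum(row[k] for k in idx) for row in site_lnl])

    # Step 2: centre each tree's resampled lnL so all trees share expectation 0.
    means = [sum(rep[i] for rep in boot) / n_replicates for i in range(n_trees)]
    centred = [[rep[i] - means[i] for i in range(n_trees)] for rep in boot]

    # Steps 3 and 4: gap to the best tree in each replicate, then the
    # proportion of replicates in which that gap exceeds the observed gap.
    p_values = []
    for i in range(n_trees):
        exceed = sum(max(rep) - rep[i] > observed[i] for rep in centred)
        p_values.append(exceed / n_replicates)
    return p_values

rng = random.Random(3)
base = [rng.gauss(-2.0, 0.5) for _ in range(200)]      # made-up site lnL values
site_lnl = [
    base,
    [x + rng.gauss(0.0, 0.1) for x in base],           # about as good as tree 0
    [x - 0.2 + rng.gauss(0.0, 0.1) for x in base],     # clearly worse
]
p = sh_test(site_lnl, 500, rng)
print([round(v, 3) for v in p])
```

Reusing the same resampled site indices for every tree is what preserves the covariation between trees; resampling each tree separately would discard it.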

Uncertainty assessment

Likelihood not only gives a point estimate of the topology and branch lengths, it also gives information about the uncertainty of our estimate. It is possible to use the likelihood curve to test hypotheses and to make interval estimates. Asymptotically (i.e. when the number of data points tends towards ∞), the ML estimate $\hat{\theta}$ is normally distributed around its true value $\theta_0$.

SLIDE 28

Likelihood ratio test

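The variance σ around $\hat{\theta}$ can also be found using the curvature of the likelihood surface; then

$$\frac{\hat{\theta} - \theta_0}{\sqrt{\sigma}} \sim N(0, 1)$$

Considering the lnL curve as locally quadratic

$$\ln L(\theta_0) = \ln L(\hat{\theta}) - \frac{1}{2} \frac{(\theta_0 - \hat{\theta})^2}{\sigma}$$

Rearranging, and noting that the square of a standard normal variable is $\chi^2_1$ distributed, leads to

$$2 \left[ \ln L(\hat{\theta}) - \ln L(\theta_0) \right] \sim \chi^2_1$$

For p parameters, twice the difference in log-likelihood is $\chi^2_p$ distributed.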
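In practice the test reduces to computing the statistic and looking up a $\chi^2$ tail area. The sketch below uses only the standard library, so the tail probability is written out with the closed-form series valid for integer degrees of freedom; the log-likelihood values are made up.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite correction series.
    tail = math.erfc(math.sqrt(x / 2.0))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total

def lrt(lnl_general, lnl_restricted, df):
    """Likelihood ratio statistic and its asymptotic p-value."""
    stat = 2.0 * (lnl_general - lnl_restricted)
    return stat, chi2_sf(stat, df)

stat, p = lrt(-1234.5, -1238.2, df=1)   # hypothetical log-likelihoods
print(f"2 * delta lnL = {stat:.2f}, p = {p:.4f}")
```

With a statistics library available, the hand-written tail function would normally be replaced by that library's chi-square survival function.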

Nested models

Assumptions of the likelihood ratio test

the null hypothesis should lie in the interior of the space that contains the alternative hypotheses
if q parameters have been constrained, they must be able to vary in both directions
if L0 restricts one parameter to the end of its range, the distribution of twice the log-likelihood ratio has half its mass at 0 and the other half in the usual $\chi^2$ distribution
in that case, halve the tail probability obtained with the usual $\chi^2$

Valid only asymptotically

the estimates should be close enough to the true values
we can then approximate the likelihood curve as being shaped like a normal distribution

true with very large amounts of data
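The boundary correction can be sketched for the simplest case: a single extra parameter that the null hypothesis fixes at the end of its range (for instance a proportion that cannot be negative and is fixed at 0). The function name and the numbers are illustrative assumptions.

```python
import math

def lrt_p_value_boundary(lnl_general, lnl_restricted):
    """LRT for one parameter fixed by the null at the edge of its range.

    The null distribution is half point mass at 0 and half chi-square
    with 1 df, so the usual 1-df tail probability is halved.
    """
    stat = max(0.0, 2.0 * (lnl_general - lnl_restricted))
    if stat == 0.0:
        return stat, 1.0
    usual = math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) upper tail
    return stat, 0.5 * usual

stat, p = lrt_p_value_boundary(-1000.0, -1001.353)   # made-up log-likelihoods
print(f"2 * delta lnL = {stat:.3f}, p = {p:.3f}")
```

A statistic of about 2.71 therefore already reaches the 5% level in this setting, instead of the 3.84 required by the usual $\chi^2_1$ test.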

SLIDE 29

Akaike Information Criterion

A more general model will always have a likelihood at least as high as that of its restricted versions. So choosing the model with the highest likelihood will lead to one that is unnecessarily complex. We should therefore balance goodness of fit against the complexity of the model.

AIC for hypothesis i with $p_i$ parameters:

$$AIC_i = -2 \ln L_i + 2 p_i$$

The hypothesis with the lowest AIC is preferred.
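A short sketch of the comparison, using three hypothetical model fits; the model names, log-likelihoods and parameter counts are invented for illustration.

```python
def aic(lnl, n_params):
    """Akaike Information Criterion: -2 lnL + 2 p."""
    return -2.0 * lnl + 2.0 * n_params

# Hypothetical fits: (log-likelihood, number of free parameters).
fits = {
    "model A": (-2050.0, 0),
    "model B": (-2010.0, 1),
    "model C": (-2009.5, 5),
}
scores = {name: aic(lnl, p) for name, (lnl, p) in fits.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("preferred:", best)   # preferred: model B
```

Model C has the highest likelihood, but its four extra parameters buy only half a log-likelihood unit over model B, so the penalty term makes model B the preferred hypothesis.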

Models hierarchy

All the different models seen so far are special cases of the GTR+Γ+I model

setting Γ and I to 0 leads to GTR
setting transversion rates to β and transition rates to α leads to F84
setting β = α leads to K2P
setting all nucleotide frequencies to 1/4 leads to JC69

SLIDE 30

Examples

Possible comparisons

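GTR+Γ vs GTR
$2 \times [\ln L_{\mathrm{GTR}+\Gamma} - \ln L_{\mathrm{GTR}}]$
difference in df = 1
GTR+Γ vs F84+Γ
$2 \times [\ln L_{\mathrm{GTR}+\Gamma} - \ln L_{\mathrm{F84}+\Gamma}]$
difference in df = 4
F84+Γ vs JC69
$2 \times [\ln L_{\mathrm{F84}+\Gamma} - \ln L_{\mathrm{JC69}}]$
difference in df = 6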

Testing the molecular clock

For the same topology, if the tree likelihood is estimated by enforcing a molecular clock, the number of branch lengths to estimate is reduced. Under a molecular clock, the distance from an ancestor to each of its descendant tips is the same, so for s species we only have to estimate the s − 1 node ages instead of 2s − 3 branch lengths. The likelihood ratio test is thus:

$2 \times [\ln L_{\mathrm{no\ clock}} - \ln L_{\mathrm{clock}}]$
difference in df = s − 2
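As a worked count, assuming s = 7 species (the size of the mtDNA example used earlier in the lecture) and a fully bifurcating tree:

```latex
\begin{aligned}
\text{branch lengths without a clock} &= 2s - 3 = 2 \times 7 - 3 = 11 \\
\text{node ages with a clock} &= s - 1 = 7 - 1 = 6 \\
\text{difference in df} &= (2s - 3) - (s - 1) = s - 2 = 5
\end{aligned}
```

Under these assumptions the statistic would be compared with a $\chi^2$ distribution with 5 degrees of freedom.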