Markov models in molecular phylogeny and evolution, Nicolas Galtier

Aug 17, 2022


SLIDE 1

Markov models in molecular phylogeny and evolution Nicolas Galtier

CNRS UMR 5554 – Institut des Sciences de l’Evolution Université Montpellier 2 galtier@univ-montp2.fr

SLIDE 2

Markov models in molecular phylogeny

Generalities about Markov processes

Definition:

Markov chains (= Markov processes) are mathematical objects devoted to the description/modelling of the variations in time of a system under the (very weak) hypothesis of lack of memory: the future of the system only depends on its current state, not on the pathway that was followed to reach it.

A few examples:

discrete time, discrete states: branching process
discrete time, continuous states: random walks
continuous time, discrete states: Poisson process
continuous time, continuous states: Brownian motion

In molecular phylogeny, states are the 4 nucleotides / 20 amino-acids / 61 codons, and the process is typically represented by a rate matrix in continuous time.

SLIDE 3

Markov models in molecular phylogeny

Example rate matrices

Kimura model (nucleotides):

      A       C       G       T
A     X       α       κ.α     α
C     α       X       α       κ.α
G     κ.α     α       X       α
T     α       κ.α     α       X

WAG model (amino-acids)
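A minimal sketch of how the Kimura matrix can be built in code. It assumes, as is standard for rate matrices, that each diagonal entry X is minus the sum of the other entries in its row; the function name and the parameter values are illustrative, not taken from the slides.

```python
NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}

def kimura_rate_matrix(alpha, kappa):
    """Return the 4x4 Kimura rate matrix in the order A, C, G, T.

    Transversions occur at rate alpha, transitions (A<->G, C<->T)
    at rate kappa * alpha. Diagonal entries make each row sum to zero
    (assumption: this is what X stands for on the slide).
    """
    m = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(NUCLEOTIDES):
        for j, y in enumerate(NUCLEOTIDES):
            if i == j:
                continue
            is_transition = (x in PURINES) == (y in PURINES)
            m[i][j] = kappa * alpha if is_transition else alpha
        m[i][i] = -sum(m[i])  # diagonal is still 0.0 when the sum is taken
    return m

# Hypothetical parameter values, for illustration only.
M = kimura_rate_matrix(alpha=1.0, kappa=2.0)
```

With κ = 1 all off-diagonal rates are equal, which is the one-parameter special case.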

SLIDE 4

Markov models in molecular phylogeny

Markov models are the fundamental tool of molecular phylogeny

Why?
because evolution is very generally memoryless
because the theory of Markov chains is well developed

How?
thanks to the statistical approach in molecular phylogeny

What for?
simulating data
building phylogenies accounting for the evolutionary process
inferring the processes and learning about the forces underlying molecular evolution

SLIDE 5

Markov models in molecular phylogeny

The statistical approach in molecular phylogeny

1- modelling: Sequence evolution is represented by a Markov process running along a tree.
2- computing expectations: Calculate the likelihood function, i.e. the probability of the data given the model.
3- fitting model to data: Maximise the likelihood over the parameter space, and thus obtain maximum likelihood estimates for parameters. Or calculate the posterior probability of parameters given the data and the priors (Bayesian approach).

SLIDE 6

Markov models in molecular phylogeny

Likelihood calculation in molecular phylogeny

Rate matrix: M (states A, C, G, T; off-diagonal rates α and β)

Tree topology: T (internal nodes X0, X1, X2, X3)

Branch lengths: li (l1, ..., l8)

Data: Y (three sites, five sequences)
y1: A A C A G
y2: T T C T T
y3: A A A A A

SLIDE 7

Markov models in molecular phylogeny

Likelihood calculation in molecular phylogeny

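The likelihood is a product over sites j:

$$L(l_i, M, T) = \Pr(Y \mid l_i, M, T) = \prod_{j} \Pr(y_j \mid l_i, M, T)$$

For the first site, summing over the unknown states of the internal nodes:

$$\begin{aligned}
\Pr(y_1 \mid l_i, M, T) = \sum_{x_0} \sum_{x_1} \sum_{x_2} \sum_{x_3} \; & \Pr(X_0 = x_0) \cdot \Pr(X_1 = x_1 \mid X_0 = x_0) \cdot \Pr(X_2 = x_2 \mid X_1 = x_1) \\
& \cdot \Pr(y_{11} = A \mid X_2 = x_2) \cdot \Pr(y_{12} = A \mid X_2 = x_2) \cdot \Pr(y_{13} = C \mid X_1 = x_1) \\
& \cdot \Pr(X_3 = x_3 \mid X_0 = x_0) \cdot \Pr(y_{14} = A \mid X_3 = x_3) \cdot \Pr(y_{15} = G \mid X_3 = x_3)
\end{aligned}$$

Felsenstein 1981 J Mol Evol 17:368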
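The quadruple sum can be sketched directly in code. This is a brute-force illustration under stated assumptions, not the author's implementation: transition probabilities come from the closed-form solution of a two-rate (Kimura-type) model, the root distribution is uniform, and the branch lengths and rate values are made up. Felsenstein's pruning algorithm computes the same quantity more efficiently by reorganising the sums.

```python
import math
from itertools import product

STATES = "ACGT"
PURINES = {"A", "G"}

def k80_prob(x, y, t, alpha=1.0, beta=0.5):
    """Pr(state y after time t | state x): transitions at rate alpha,
    each transversion at rate beta (closed-form two-rate solution)."""
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    if x == y:
        return 0.25 + 0.25 * e1 + 0.5 * e2
    if (x in PURINES) == (y in PURINES):  # transition
        return 0.25 + 0.25 * e1 - 0.5 * e2
    return 0.25 - 0.25 * e1               # one specific transversion

# Hypothetical branch lengths, keyed by the node at the bottom of the branch.
LENGTHS = {"X1": 0.10, "X2": 0.05, "X3": 0.20,
           "s1": 0.10, "s2": 0.10, "s3": 0.15, "s4": 0.10, "s5": 0.30}

def site_likelihood(site, lengths=LENGTHS):
    """Sum over x0, x1, x2, x3 of the product of branch probabilities.

    Tree assumed from the formula: X0 -> (X1, X3); X1 -> (X2, seq 3);
    X2 -> (seq 1, seq 2); X3 -> (seq 4, seq 5).
    """
    def p(x, y, branch):
        return k80_prob(x, y, lengths[branch])

    total = 0.0
    for x0, x1, x2, x3 in product(STATES, repeat=4):
        total += (0.25                       # Pr(X0 = x0), uniform root
                  * p(x0, x1, "X1") * p(x1, x2, "X2")
                  * p(x2, site[0], "s1") * p(x2, site[1], "s2")
                  * p(x1, site[2], "s3")
                  * p(x0, x3, "X3")
                  * p(x3, site[3], "s4") * p(x3, site[4], "s5"))
    return total

def likelihood(sites):
    """Product of site likelihoods (sites assumed independent)."""
    return math.prod(site_likelihood(s) for s in sites)

L = likelihood(["AACAG", "TTCTT", "AAAAA"])
```

The cost of the brute-force sum grows as 4 to the power of the number of internal nodes, which is why it is only practical for very small trees.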

SLIDE 8

Markov models in molecular phylogeny

Calculating transition probabilities

$$P(t) = e^{Mt}$$

t is for time (branch length).
M is the rate matrix: $1/m_{ij}$ = average waiting time before state i changes to state j.
P(t) is the substitution probability matrix: $p_{ij}(t)$ is the probability of observing state j after evolution during time t starting from state i.

Deriving this formula starts by writing differential equations like:

$$A(t+dt) = A(t)\,[1 - (m_{AC} + m_{AG} + m_{AT})\,dt] + C(t)\,m_{CA}\,dt + G(t)\,m_{GA}\,dt + T(t)\,m_{TA}\,dt$$

where A(t), C(t), G(t) and T(t) are the probabilities of being in each state at time t.

Calculating the exponential of a matrix is easy when the matrix is diagonalisable.
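A minimal numerical sketch of P(t) = exp(Mt). The slide mentions diagonalisation; this example swaps in a different, simpler technique (scaling and squaring with a truncated Taylor series) so that it needs no linear-algebra library. The two-rate matrix and its values are hypothetical.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, t, squarings=10, terms=12):
    """Approximate exp(m * t): Taylor series on m*t / 2**squarings,
    then square the result `squarings` times."""
    n = len(m)
    scale = 2 ** squarings
    small = [[m[i][j] * t / scale for j in range(n)] for i in range(n)]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        # term becomes small**k / k!
        term = [[v / k for v in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical two-rate nucleotide matrix, order A, C, G, T:
# transitions (A<->G, C<->T) at rate ts, transversions at rate tv.
ts, tv = 2.0, 1.0
d = -(ts + 2 * tv)
M = [[d, tv, ts, tv],
     [tv, d, tv, ts],
     [ts, tv, d, tv],
     [tv, ts, tv, d]]

P = mat_exp(M, 0.1)  # substitution probabilities after t = 0.1
```

Each row of P is a probability distribution over the final state, so the rows sum to one; as t grows, every row approaches the equilibrium frequencies.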

SLIDE 9

Markov models in molecular phylogeny

Using the likelihood function

Knowing how to calculate the likelihood, we can:

estimate parameters by maximising (ML = Maximum Likelihood)
test hypotheses by comparing models (LRT = Likelihood Ratio Test)
recover details of the process using conditional likelihoods (EB = Empirical Bayesian)

The Bayesian approach can fulfill the same purposes with more complex models, provided we are willing to specify prior distributions for the parameters.

SLIDE 10

Markov models in molecular phylogeny

Example biological questions requiring good usage of Markov models:

has my favourite protein evolved under positive selection? (codon models)
has it undergone any functional change? (covarion = heterotachous models)
can we detect coevolution between sites? (models of departure from independence)
what did the ancestral sequence look like? (empirical Bayesian)
has it undergone any compositional change? (non-stationary models)
which changes occurred? In which branches? (substitution mapping)
when did speciations occur? (relaxed-clock models)

SLIDE 11

A non-stationary model

Galtier and Gouy 1998 Mol Biol Evol 15:871

Tamura 1992 model (rows: initial state, columns: final state):

      A           C        G        T
A     X           θα       θκα      (1-θ)α
C     (1-θ)α      X        θα       (1-θ)κα
G     (1-θ)κα     θα       X        (1-θ)α
T     (1-θ)α      θκα      θα       X

θ = equilibrium GC-content

[Figure: two trees of five sequences (1 to 5)]
stationary, homogeneous: the same θ on each of the eight branches, plus ω
non-stationary, non-homogeneous: one θ per branch (θ1, ..., θ8), plus ω
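A small check, sketched in code, that θ really is the equilibrium GC-content of the Tamura 1992 matrix. The sketch assumes rows are initial states and columns final states, and that diagonal entries make rows sum to zero; function names and parameter values are illustrative.

```python
NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}

def t92_rate_matrix(theta, alpha, kappa):
    """Tamura 1992 rate matrix, order A, C, G, T.

    The rate towards G or C is proportional to theta, the rate towards
    A or T to (1 - theta); transitions get the extra factor kappa.
    """
    m = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(NUCLEOTIDES):
        for j, y in enumerate(NUCLEOTIDES):
            if i == j:
                continue
            target = theta if y in "GC" else 1.0 - theta
            is_transition = (x in PURINES) == (y in PURINES)
            m[i][j] = target * alpha * (kappa if is_transition else 1.0)
        m[i][i] = -sum(m[i])  # rows sum to zero (assumed meaning of X)
    return m

def net_flux(m, freqs):
    """Rate of change of each state frequency: the vector freqs x M."""
    return [sum(freqs[i] * m[i][j] for i in range(4)) for j in range(4)]

# Hypothetical values.
theta = 0.7
M = t92_rate_matrix(theta, alpha=1.0, kappa=3.0)

# Candidate equilibrium: G and C share theta, A and T share 1 - theta.
equilibrium = [(1 - theta) / 2, theta / 2, theta / 2, (1 - theta) / 2]
flux = net_flux(M, equilibrium)  # all entries should be (numerically) zero
```

In the non-homogeneous setting of the slide, one such matrix (with its own θ) would be attached to each branch, so sequence GC-content need not be at equilibrium anywhere in the tree.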

SLIDE 12

A non-stationary model

Accuracy of ancestral GC% estimation (simulations)

Simulation conditions: low GCanc (10-25%), high eqGC (90%), medium sequence GC (~40%)

actual    MP      NHML
18%       32%     19%
10%       27%     11%
22%       40%     21%
14%       30%     16%
14%       28%     15%

SLIDE 13

A non-stationary model

Optimal growth temperature versus rRNA GC% in prokaryotes

[Figure: two panels (SSU, LSU); axes: Topt and rRNA G+C-content]

SLIDE 14

A non-stationary model

The rRNA universal tree of life

estimated ancestral GC%: 56.1%

EUCARYA
Giardia 70.4%
Entamoeba 43.7%
Euglena 51.7%
FUNGI 48.6%
PLANTA 50.4%
METAZOA 52.4%

CRENARCHAE
Desulfurococcus 64.2%
Thermoproteus 63.5%

EURYARCHAE
M.jannashi 62.3%
M.vannieli 57.7%
Halococcus 58.9%
Halobacterium 58.7%

BACTERIA
Thermus 61.3%
Thermotoga 60.9%
LOW GC GRAM+ 54.2%
PROTEOBACTERIA 54.1%
HIGH GC GRAM+ 57.0%
CHLOROPLASTS 52.5%

SLIDE 15

A non-stationary model

A non-hyperthermophilic ancestor?

[Figure: two panels (SSU, LSU); axes: Topt and rRNA G+C-content]

SLIDE 16

A non-stationary model

Controlling for species sampling

Giardia 70.4%
Entamoeba 43.7%
Desulfurococcus 64.2%
Thermoproteus 63.5%
M.jannashi 62.3%
M.vannieli 57.7%
Halococcus 58.9%
Halobacterium 58.7%

56.1% 57.3%

Eukaryote 1 70.9%
Eukaryote 2 70.9%
Crenarchae 1 65.4%
Crenarchae 2 65.1%
Euryarchae 1 65.2%
Euryarchae 2 65.0%
Bacteria 1 63.2%
Bacteria 2 62.3%

SLIDE 17

A non-stationary model

A non-hyperthermophilic ancestor?

[Figure: two panels (SSU, LSU); axes: Topt and rRNA G+C-content]

Galtier et al. 1999 Science 283:221

SLIDE 18

Codon models, positive selection

The standard genetic code

TTT → Phe    TCT → Ser    TAT → Tyr     TGT → Cys
TTC → Phe    TCC → Ser    TAC → Tyr     TGC → Cys
TTA → Leu    TCA → Ser    TAA → Stop    TGA → Stop
TTG → Leu    TCG → Ser    TAG → Stop    TGG → Trp

CTT → Leu    CCT → Pro    CAT → His     CGT → Arg
CTC → Leu    CCC → Pro    CAC → His     CGC → Arg
CTA → Leu    CCA → Pro    CAA → Gln     CGA → Arg
CTG → Leu    CCG → Pro    CAG → Gln     CGG → Arg

ATT → Ile    ACT → Thr    AAT → Asn     AGT → Ser
ATC → Ile    ACC → Thr    AAC → Asn     AGC → Ser
ATA → Ile    ACA → Thr    AAA → Lys     AGA → Arg
ATG → Met    ACG → Thr    AAG → Lys     AGG → Arg

GTT → Val    GCT → Ala    GAT → Asp     GGT → Gly
GTC → Val    GCC → Ala    GAC → Asp     GGC → Gly
GTA → Val    GCA → Ala    GAA → Glu     GGA → Gly
GTG → Val    GCG → Ala    GAG → Glu     GGG → Gly

SLIDE 19

Codon models, positive selection

The Goldman-Yang codon model

$$m_{XY} = \begin{cases}
0 & \text{if codon X and codon Y differ by more than one base} \\
\beta \, \pi_Y & \text{if codon X and codon Y differ by one synonymous transversion} \\
\beta \, \omega \, \pi_Y & \text{if codon X and codon Y differ by one non-synonymous transversion} \\
\alpha \, \pi_Y & \text{if codon X and codon Y differ by one synonymous transition} \\
\alpha \, \omega \, \pi_Y & \text{if codon X and codon Y differ by one non-synonymous transition}
\end{cases}$$

Goldman & Yang 1994 Mol Biol Evol 11:725

ω is the parameter of interest:

ω=1 in case of neutral evolution
ω<1 in case of negative selection (constraint)
ω>1 in case of positive selection (adaptation)
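The case analysis of the codon model translates almost line by line into code. The sketch below is illustrative only: the function name, the compact encoding of the standard genetic code, and the numerical values are assumptions, and the equilibrium frequency of the target codon is simply passed in as a number.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stops.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE_CODONS = [c for c, aa in CODE.items() if aa != "*"]  # the 61 states
PURINES = {"A", "G"}

def codon_rate(x, y, alpha, beta, omega, pi_y):
    """Off-diagonal rate m_XY between two sense codons x and y.

    alpha: transition rate, beta: transversion rate,
    omega: non-synonymous / synonymous rate ratio,
    pi_y: equilibrium frequency of codon y.
    Returns 0.0 when the codons differ at more than one position
    (and also when x == y: diagonal entries are set separately).
    """
    diffs = [(a, b) for a, b in zip(x, y) if a != b]
    if len(diffs) != 1:
        return 0.0
    a, b = diffs[0]
    is_transition = (a in PURINES) == (b in PURINES)
    rate = alpha if is_transition else beta
    if CODE[x] != CODE[y]:  # non-synonymous change
        rate *= omega
    return rate * pi_y

# Hypothetical parameter values.
r_syn_ts = codon_rate("TTT", "TTC", alpha=2.0, beta=1.0, omega=0.5, pi_y=0.1)
r_nonsyn_tv = codon_rate("TTT", "TTA", alpha=2.0, beta=1.0, omega=0.5, pi_y=0.1)
```

Filling a 61 x 61 matrix with `codon_rate` over `SENSE_CODONS`, then setting each diagonal to minus its row sum, would give a full rate matrix usable with P(t) = exp(Mt).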

SLIDE 20

Codon models, positive selection

Primate lysozyme evolution

Model 0: ω0 = ωC    ln(L) = -1043.84    ω0 = ωC = 0.574
Model 1: ω0 ≠ ωC    ln(L) = -1041.70    ω0 = 0.489; ωC = 3.383

Yang 1998 Mol Biol Evol 15:568

SLIDE 21

Codon models, positive selection

The likelihood ratio test (LRT)

LRTs are used to decide whether the increase in likelihood obtained by adding parameters (= degrees of freedom) to a model is significant.

Let M0 and M1 be two nested models: M0 (p parameters) is a special instance of M1 (p+n parameters). Let L0 and L1 be the maximum likelihoods under M0 and M1, respectively. Twice the log-likelihood ratio is asymptotically χ2 distributed (n degrees of freedom) under M0:

2 . log(L1/L0) ~ χ2 (n df)

SLIDE 22

Codon models, positive selection

Primate lysozyme evolution

Model 0: ω0 = ωC    ln(L) = -1043.84    ω0 = ωC = 0.574
Model 1: ω0 ≠ ωC    ln(L) = -1041.70    ω0 = 0.489; ωC = 3.383

2 [ln(L1) - ln(L0)] = 4.28 *

Yang 1998 Mol Biol Evol 15:568

SLIDE 23

Codon models, positive selection

Primate lysozyme evolution

Model 0: ω0 = ωC            ln(L) = -1043.84    ω0 = ωC = 0.574
Model 1: ω0 ≠ ωC            ln(L) = -1041.70    ω0 = 0.489; ωC = 3.383
Model 2: ω0 ≠ ωC, ωC = 1    ln(L) = -1042.50    ω0 = 0.488; ωC = 1

2 [ln(L1) - ln(L2)] = 1.6 NS

Yang 1998 Mol Biol Evol 15:568
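The two tests on the lysozyme data can be reproduced from the reported log-likelihoods. This sketch assumes each comparison involves one extra free parameter (1 degree of freedom), for which the χ2 upper tail has a closed form in terms of the complementary error function; the function names are illustrative.

```python
import math

def lrt_statistic(lnl_general, lnl_restricted):
    """Twice the log-likelihood difference between two nested models."""
    return 2.0 * (lnl_general - lnl_restricted)

def chi2_pvalue_1df(stat):
    """Upper-tail probability of a chi-square with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-square(1), so
    Pr(Z**2 > s) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))

# Log-likelihoods reported on the slides (Yang 1998).
lnl_m0, lnl_m1, lnl_m2 = -1043.84, -1041.70, -1042.50

# Model 1 (two omegas) against Model 0 (one omega).
stat_m1_vs_m0 = lrt_statistic(lnl_m1, lnl_m0)
p_m1_vs_m0 = chi2_pvalue_1df(stat_m1_vs_m0)

# Model 1 (omega_C free) against Model 2 (omega_C fixed to 1).
stat_m1_vs_m2 = lrt_statistic(lnl_m1, lnl_m2)
p_m1_vs_m2 = chi2_pvalue_1df(stat_m1_vs_m2)
```

The first p-value falls below 0.05 and the second above it, matching the "*" and "NS" annotations: ωC differs significantly from ω0, but is not significantly different from 1.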

SLIDE 24

Codon models, positive selection

Variation of ω across sites: class 1 MHC

[Figure legend] green: volume; blue: polarity; orange: charge; brown: dN/dS

Sainudiin et al 2005 J. Mol. Evol. 60:315

SLIDE 25

Codon models, positive selection

MORMYRIFORMES (Africa), GYMNOTIFORMES (South America)

Zakon et al 2006 PNAS 103:3675

A sodium channel in electric fish

SLIDE 26

Codon models, positive selection

A genomic approach: 13731 human/chimpanzee orthologous genes

Function                   n      p-val
Immunity                   417    <10^-10
Sensory perception         51     <10^-3
Gametogenesis              40     <10^-2
Inhibition of apoptosis    133    <5%

Tissue     n      p-val
Testis     247    <10^-3
Brain      66     <5%
Thyroid    405    NS
Blood      133    NS

The main targets of adaptation in apes are immunity, perception/communication, and sperm competition / genomic conflicts.

Nielsen et al 2005 PLoS 3:170

SLIDE 27

Covarion models, functional shifts

Distribution of substitution rate across sites

[Figure panel: Constant rate among sites]

SLIDE 28

Covarion models, functional shifts

Distribution of substitution rate across sites

[Figure panels: Constant rate among sites; Variable rates between sites]

SLIDE 29

Covarion models, functional shifts

Calculating the likelihood assuming variable rates across sites

Gamma:

$$L(y) = \int_{r} f(r) \cdot L(y \mid r) \, dr$$

f: Gamma probability density function

discrete Gamma (rate classes r1, r2, r3, r4, ...):

$$L(y) = \sum_{i=1}^{g} \Pr(r_i) \cdot L(y \mid r_i)$$

g: assumed number of classes

The likelihood conditional on r is obtained by first multiplying branch lengths by r.

Yang 1994 J Mol Evol 39:306
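The discrete-rate sum can be sketched on the smallest possible case: one site observed in two sequences separated by a distance d. Everything specific here is an assumption made for illustration: the Jukes-Cantor transition probabilities, the function names, and in particular the four rate values, which are made-up numbers with mean 1 rather than true Gamma category rates.

```python
import math

def jc69_pair_site_prob(a, b, d):
    """Probability of observing states a and b at one site of two sequences
    separated by distance d (Jukes-Cantor model, equal base frequencies)."""
    e = math.exp(-4.0 * d / 3.0)
    p_change = 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e
    return 0.25 * p_change  # 0.25 = frequency of state a

def site_likelihood_with_rates(a, b, d, rates, weights):
    """L(y) = sum_i Pr(r_i) * L(y | r_i).

    Conditioning on rate r_i amounts to multiplying the branch
    length d by r_i before computing the likelihood.
    """
    return sum(w * jc69_pair_site_prob(a, b, d * r)
               for r, w in zip(rates, weights))

# Hypothetical rate classes (mean 1) with equal probabilities, g = 4.
RATES = [0.1, 0.5, 1.0, 2.4]
WEIGHTS = [0.25, 0.25, 0.25, 0.25]

l_same = site_likelihood_with_rates("A", "A", 0.3, RATES, WEIGHTS)
l_diff = site_likelihood_with_rates("A", "G", 0.3, RATES, WEIGHTS)
```

With a single class of rate 1 the function reduces to the constant-rate likelihood, which gives a simple sanity check.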

SLIDE 30

Covarion models, functional shifts

Functional shifts and site-specific rate variations

[Figure labels: function 1, favourable mutation, function 2, covarion]

SLIDE 31

Covarion models, functional shifts

Distribution of substitution rate across sites

[Figure panels: Constant rate among sites; Variable rates between sites]

Site-specific rate variation = COVARIONS

SLIDE 32

Covarion models, functional shifts

Galtier 2001 Mol Biol Evol 18:866

a. Constant rate across sites: a single matrix M
b. Variable rates across sites: matrices M.r1, M.r2, M.r3
c. Site-specific rate variation = covarions = heterotachy: matrices M.r1, M.r2, M.r3 connected by switches between rate classes at rate ν
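One way to picture case c is as a single large rate matrix whose states are (rate class, nucleotide) pairs. The sketch below follows the block layout suggested by the slide (M.r1, M.r2, M.r3 on the diagonal blocks, ν linking the same nucleotide across classes); it is an illustration under that assumption, and details of the published model may differ. All names and values are hypothetical.

```python
def covarion_matrix(m, rates, nu):
    """Block rate matrix over (rate class, state) pairs.

    Within class c, substitutions follow m scaled by rates[c].
    A site switches from its class to any other class at rate nu,
    keeping its current state. Diagonals make rows sum to zero.
    """
    n, g = len(m), len(rates)
    size = n * g
    q = [[0.0] * size for _ in range(size)]
    for c in range(g):
        for i in range(n):
            row = c * n + i
            for j in range(n):
                if i != j:
                    q[row][c * n + j] = rates[c] * m[i][j]
            for c2 in range(g):
                if c2 != c:
                    q[row][c2 * n + i] = nu
            q[row][row] = -sum(q[row])  # diagonal is still 0.0 here
    return q

# Hypothetical inputs: equal-rate nucleotide matrix and three rate classes.
M = [[-3.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
Q = covarion_matrix(M, rates=[0.5, 1.0, 1.5], nu=0.1)
```

With ν = 0 the classes are disconnected and the model reduces to case b, where each site keeps the same rate over the whole tree.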

SLIDE 33

Covarion models, functional shifts

A likelihood-ratio test for detecting site-specific rate variation

M0 (constant rate in time): site classes r < 1, r = 1, r > 1
M1 (variable rate in time): additional site classes r2 > r1, r1 > r2

LR = 2 . [ln(L1) - ln(L0)] ~ χ2 (1 df)

SLIDE 34

Covarion models, functional shifts

[Figure: alignment of seven sites (a to g); the label PRIMATES marks part of the tree]

a b c d e f g
S T M F S L P
S T M F S L P
S T M F I F P
S T M F T F P
S T M F Y F M
S T M F H F H
S T M F H F T
S T M F Y F P
S T M F L F P
S T M F F F F
S T M F H F T
S T M F Y F A
S T M F P F P
S T M F P F P
S T M F P H L
S T M F P F P
S T M F L H T
S T M F W V F
S T M F F T P
S T M F T V F
S T M F L F L
A A M V L F I
A T M I L F I
A T N A L F I
A I V S L F I
S V M F L F I
T T V I L F I
F T T L L F I
S T M F W S I
S T M M W S T
S T M F M N Q
S T M F P H Y
S T M F P H P

Pupko & Galtier 2002 Proc Roy Soc 269:1313

SLIDE 35

Coevolution between sites

Modelling coevolution?

A rate matrix over pairs of states (AA, AC, AG, AT, CA, ...): pairs of states are what evolve.

Tillier & Collins 1998 Genetics 148:1993, Pollock et al 1999 J Mol Biol 287:187

Such models, however, are difficult to use and to generalize.

SLIDE 36

Coevolution between sites

An approach based on substitution mapping

A C A G T T C . . .
A G A G C T A . . .
A G A G C T A . . .
T C A G T T C . . .
T C G G T T T . . .
. . .

Steps: probabilistic mapping, clustering of mappings, significance test

SLIDE 37

Coevolution between sites

Probabilistic substitution mapping

For each site di, we want to estimate the number vik of changes that occurred in each branch k of the tree.

$$v_{ik} = \sum_{X} \sum_{Y} \frac{\Pr(d_i, X, Y)}{\Pr(d_i)} \; n_{XY}(k)$$

Pr(di): likelihood for site di
Pr(di, X, Y): likelihood for site di and states X and Y at top and bottom nodes of branch k
nXY(k): expected number of changes along branch k knowing states X and Y at top and bottom nodes

SLIDE 38

Coevolution between sites

[Figure: rRNA secondary structure diagram, Escherichia coli (U18997), with helices labelled B1 to I3]

16S rRNA P formyl-transferase

Dutheil et al 2005 Mol Biol Evol 22:1919, Dutheil & Galtier 2007 submitted

SLIDE 39

Molecular clock models

Dating divergences

a natural goal: providing a time scale to species divergence
easy in principle: for two species A and B that diverged t time units ago,

t = dist(A,B) / (2 . r)    (r: molecular evolutionary rate)

since distance accumulates along both lineages, dist(A,B) = 2 . r . t

difficult in practice:

(i) We typically deal with an arbitrary number of sequences, not just two: need for a phylogenetic approach.
(ii) We typically have several, potentially discordant, fossil calibrations, not just one: need to reconcile them.
(iii) There is no such thing as "the molecular evolutionary rate": need to account for departures from the molecular clock.
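The "easy in principle" case can be written in a few lines. The sketch assumes a strict molecular clock and that the distance between two species accumulates along both lineages, so that dist(A,B) = 2 . r . t; the function names and all numbers are made up for illustration.

```python
def rate_from_calibration(distance, divergence_time):
    """Clock rate implied by one calibrated pair: distance = 2 * r * t."""
    return distance / (2.0 * divergence_time)

def divergence_time(distance, rate):
    """Divergence time of a pair under a strict clock with known rate."""
    return distance / (2.0 * rate)

# Hypothetical calibration: two species known (e.g. from a fossil) to have
# diverged 10 Myr ago, separated by 0.10 substitutions per site.
r = rate_from_calibration(distance=0.10, divergence_time=10.0)

# Dating another pair separated by 0.25 substitutions per site.
t = divergence_time(distance=0.25, rate=r)
```

Points (i) to (iii) are exactly where this breaks down: with many sequences, several calibrations and rates that vary among lineages, a single ratio is no longer enough.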

SLIDE 40

Molecular clock models

Dating divergences

[Figure: two trees of A, B, C, D, E]
What we have: fossil bounds tmin(A,B), tmax(A,B), tmin(D,E), tmax(D,E)
What we want: a tree drawn on a time axis t

SLIDE 41

Molecular clock models

A bayesian approach

data D: a set of aligned sequences
parameters θ : tree topology, divergence times, rate autocorrelation
priors: "flat" for topology, from fossils for some divergence dates

Pr(θ | D): posterior distribution
Pr(D | θ): likelihood

SLIDE 42

Molecular clock models

A bayesian approach

data D: a set of aligned sequences
parameters θ : tree topology, divergence times, rate autocorrelation
priors: "flat" for topology, from fossils for some divergence dates

Bayes theorem:

$$\Pr(\theta \mid D) = \frac{\Pr(D \mid \theta) \, \Pr(\theta)}{\Pr(D)}$$

Pr(θ | D): posterior distribution
Pr(D | θ): likelihood
Pr(θ): prior distribution

Calculating Pr(D) would require integrating the likelihood over the space of branch lengths. This is not computationally feasible ⇒ Markov chain Monte Carlo (MCMC).
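A toy illustration of why MCMC sidesteps Pr(D): the Metropolis acceptance rule only uses the ratio of posterior densities, in which Pr(D) cancels. The sketch swaps the tree problem of the slide for a much simpler one, a single parameter (the Jukes-Cantor distance d between two sequences), with an exponential prior; the data summary, prior, step size and function names are all assumptions.

```python
import math
import random

def log_unnormalised_posterior(d, n_sites, n_diff):
    """log[ Pr(D | d) * Pr(d) ] up to an additive constant.

    Likelihood: Jukes-Cantor, n_diff differing sites out of n_sites.
    Prior: exponential with mean 1 on the distance d (assumption).
    """
    if d <= 1e-9:  # distances are restricted to be positive
        return -math.inf
    e = math.exp(-4.0 * d / 3.0)
    log_lik = (n_diff * math.log(0.75 - 0.75 * e)
               + (n_sites - n_diff) * math.log(0.25 + 0.75 * e))
    return log_lik - d

def metropolis(n_sites, n_diff, steps=20000, burn_in=2000,
               step_sd=0.02, seed=1):
    """Sample d from its posterior with a Gaussian random-walk proposal."""
    rng = random.Random(seed)
    d = 0.5
    lp = log_unnormalised_posterior(d, n_sites, n_diff)
    samples = []
    for step in range(steps):
        cand = d + rng.gauss(0.0, step_sd)
        lp_cand = log_unnormalised_posterior(cand, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio); Pr(D) cancels.
        if math.log(1.0 - rng.random()) < lp_cand - lp:
            d, lp = cand, lp_cand
        if step >= burn_in:
            samples.append(d)
    return samples

# Hypothetical data: 100 differences among 1000 aligned sites.
samples = metropolis(n_sites=1000, n_diff=100)
posterior_mean = sum(samples) / len(samples)
```

For these data the maximum-likelihood distance is -3/4 ln(1 - 4/3 x 0.1), about 0.107, and the posterior mean should land close to it because the prior is weak compared with 1000 sites.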

SLIDE 43

Conclusions

Markov models:

A basic tool of phyloinformatics. Most existing programs use Markov models.

New biological questions in molecular evolution are typically (optimally) answered by building new models.

A field under rapid development since the mid-90's. See papers by Yang, Goldman, Whelan, Pupko, Thorne, Nielsen, Huelsenbeck, Rannala, Suchard, Lartillot, Rodrigo, Guindon, among others. A Royal Society conference about this topic, London, April 2008.

Many good achievements, but still some work to be done.