D T E Bi & Bogart 2010. BMC - - PowerPoint PPT Presentation



SLIDE 1

D T E

Bi & Bogart 2010. BMC Evol. Biol.

SLIDE 2

D T E

Methods estimate the SUBSTITUTION RATE and TIME separately

[Figure: three trees, with branch lengths = SUBSTITUTION RATE × TIME, branch lengths = SUBSTITUTION RATE, and branch lengths = TIME]

Unconstrained methods provide branch length estimates that are the product of SUBSTITUTION RATE and TIME
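
A minimal sketch of why rate and time cannot be separated from an unconstrained branch length alone. The function name `branch_length` and all numbers are hypothetical and chosen only for illustration.

```python
def branch_length(rate, time):
    """Expected substitutions per site on a branch: rate multiplied by time."""
    return rate * time


# Illustrative values only: rate in substitutions/site/Myr, time in Myr.
fast_and_young = branch_length(rate=0.02, time=5.0)
slow_and_old = branch_length(rate=0.005, time=20.0)

# Both scenarios imply (up to floating-point rounding) the same branch length,
# so the sequence data alone cannot tell them apart.
same_length = abs(fast_and_young - slow_and_old) < 1e-12
print(fast_and_young, slow_and_old, same_length)
```

Separating the two factors therefore needs extra information, such as a model of rate variation and calibrations on node ages.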

SLIDE 3

M  S-R V

These models make assumptions about how substitution rate changes over the tree

◮ Global molecular clock (Zuckerkandl & Pauling, 1962)
◮ Local molecular clocks (Kishino, 1990; Rambaut & Bromham, 1998; Yang & Yoder, 2003)
◮ Compound Poisson process (Huelsenbeck et al., 2000)
◮ Autocorrelated rates – Log-normal distribution (Thorne et al., 1998; Kishino & Thorne, 2001; Thorne et al., 2002)
◮ Uncorrelated rates (Drummond et al., 2006; Rannala & Yang, 2007; Lepage et al., 2007)
◮ Non-parametric rate smoothing/Penalized likelihood (Sanderson, 1997, 2002)
◮ Ornstein-Uhlenbeck process (Aris-Brosou & Yang, 2002)
◮ Cox–Ingersoll–Ross process (Lepage et al., 2006)

SLIDE 4

G M C

Substitution rate is constant across all lineages

[Figure: three trees, with branch lengths = SUBSTITUTION RATE × TIME, branch lengths = SUBSTITUTION RATE, and branch lengths = TIME]

(Zuckerkandl & Pauling, 1962)
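
A minimal sketch of dating under a global clock, assuming one calibrated split and clock-like pairwise distances. The function names `clock_rate` and `split_time` and all numbers are hypothetical.

```python
def clock_rate(pairwise_distance, divergence_time):
    """Rate per lineage: both lineages accumulate changes since the split."""
    return pairwise_distance / (2.0 * divergence_time)


def split_time(pairwise_distance, rate):
    """Age of a split implied by a pairwise distance and a single shared rate."""
    return pairwise_distance / (2.0 * rate)


# Illustrative values: a calibrated pair is 0.20 subs/site apart and split 10 Myr ago.
rate = clock_rate(pairwise_distance=0.20, divergence_time=10.0)

# With one rate for every lineage, any other pairwise distance converts to an age.
split_age = split_time(pairwise_distance=0.08, rate=rate)
print(rate, split_age)
```

The conversion is only as good as the constant-rate assumption it rests on.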

SLIDE 5

R  G M C

Incorrect models of sequence evolution lead to errors in the estimation of rates

◮ Almost any error in the model can lead to biases (or higher than needed variance) in detecting multiple hits
◮ Assumption of a Poisson clock can be wrong – even if we correctly count the number of changes, we don’t account for over-dispersion (higher than Poisson variance in the number of substitutions) (Cutler, 2000)
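
A small simulation sketch of over-dispersion, assuming lineage-specific rates that follow a gamma distribution with mean 1. The helper names, the seed, and every parameter value are hypothetical; the Poisson sampler is Knuth's standard method, used here only to keep the example dependency-free.

```python
import math
import random
import statistics


def poisson_draw(mean, rng):
    """Draw a Poisson count with Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for a Poisson process."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(1)
expected_changes = 10.0  # hypothetical mean number of substitutions per lineage

# Poisson clock: every lineage shares the same expected number of changes.
clock_counts = [poisson_draw(expected_changes, rng) for _ in range(5000)]

# Lineage-specific rates: a gamma multiplier with mean 1 (shape 4, scale 0.25).
varying_counts = [
    poisson_draw(expected_changes * rng.gammavariate(4.0, 0.25), rng)
    for _ in range(5000)
]

print(dispersion_index(clock_counts), dispersion_index(varying_counts))
```

Under these assumptions the second index should sit well above 1, which is the signature of over-dispersion.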

[Figure: three trees, with branch lengths = SUBSTITUTION RATE × TIME, branch lengths = SUBSTITUTION RATE, and branch lengths = TIME]

SLIDE 6

R  G M C

Rates of evolution can vary across lineages and over time

◮ mutation rates can vary (mutations per cell cycle, mutations per time, number of cell cycles per generation, generation time)
◮ strength and targets of selection can vary
◮ population sizes can vary

[Figure: three trees, with branch lengths = SUBSTITUTION RATE × TIME, branch lengths = SUBSTITUTION RATE, and branch lengths = TIME]

SLIDE 7

L M C

Closely related lineages share the same rate – rates are clustered by subclades

[Figure legend: rate, from low to high]

(Kishino, 1990; Rambaut & Bromham 1998; Yang & Yoder, 2003)

SLIDE 8

C P P

Rate changes occur along lineages according to a point process.

At rate-change events, the new rate is the product of the old rate and a Γ-distributed multiplier.

[Figure legend: rate, from low to high]

(Huelsenbeck et al., 2000)
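
A minimal simulation sketch of a compound Poisson process on a single lineage. It assumes exponentially distributed waiting times between rate-change events and a gamma multiplier with mean 1; the mean-1 choice, the function name `simulate_cpp_lineage`, and all parameter values are assumptions for illustration, not details taken from the slide.

```python
import random


def simulate_cpp_lineage(start_rate, duration, event_rate, shape, rng):
    """Return [(segment_duration, rate), ...] for one lineage."""
    segments = []
    rate, elapsed = start_rate, 0.0
    while True:
        wait = rng.expovariate(event_rate)  # time until the next rate change
        if elapsed + wait >= duration:
            segments.append((duration - elapsed, rate))
            return segments
        segments.append((wait, rate))
        elapsed += wait
        # New rate = old rate x gamma multiplier (shape, scale 1/shape -> mean 1).
        rate *= rng.gammavariate(shape, 1.0 / shape)


rng = random.Random(7)
segments = simulate_cpp_lineage(
    start_rate=0.01, duration=10.0, event_rate=0.5, shape=2.0, rng=rng
)

# The branch length is the sum of rate x time over the piecewise-constant segments.
lineage_length = sum(seg_time * seg_rate for seg_time, seg_rate in segments)
print(len(segments), lineage_length)
```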

SLIDE 9

A R (L-)

Substitution rates evolve gradually over the tree – closely related lineages have similar rates.

The rate at a node is drawn from a lognormal distribution with a mean equal to the parent rate.

[Figure legend: rate, from low to high]

(Thorne et al., 1998; Kishino & Thorne, 2001; Thorne et al., 2002)
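
A minimal sketch of drawing a descendant rate whose expectation equals the parent rate. It assumes the variance of the log rate grows with branch duration through a hypothetical parameter `nu`; that scaling, the function name `draw_child_rate`, and the numbers are illustrative assumptions.

```python
import math
import random


def draw_child_rate(parent_rate, branch_time, nu, rng):
    """Lognormal draw with E[child rate] equal to parent_rate."""
    sigma2 = nu * branch_time  # assumed: log-rate variance grows with time
    # Shift the log-scale mean so the mean on the natural scale is parent_rate.
    mu = math.log(parent_rate) - sigma2 / 2.0
    return rng.lognormvariate(mu, math.sqrt(sigma2))


rng = random.Random(3)
parent_rate = 0.01
children = [draw_child_rate(parent_rate, 2.0, 0.1, rng) for _ in range(20000)]
mean_child_rate = sum(children) / len(children)
print(mean_child_rate)
```

Shorter branches give a smaller variance in this sketch, so a node separated from its parent by little time ends up with a similar rate.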

SLIDE 10

U R (L-  G)

The rates associated with each lineage are drawn independently from a log-normal or gamma distribution.

Common models used in BEAST.

[Figure legend: rate, from low to high]

(Drummond et al., 2006)
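
A minimal sketch of uncorrelated rates: each branch gets its own draw, with no reference to its parent. The function name `draw_uncorrelated_rates`, the `family` switch, and the default `sigma` and `shape` values are hypothetical and do not reproduce BEAST's parameterization.

```python
import math
import random


def draw_uncorrelated_rates(n_branches, mean_rate, rng,
                            family="lognormal", sigma=0.5, shape=4.0):
    """Independent branch rates sharing one distribution with mean mean_rate."""
    if family == "lognormal":
        mu = math.log(mean_rate) - sigma ** 2 / 2.0
        return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]
    if family == "gamma":
        scale = mean_rate / shape  # gamma mean = shape x scale
        return [rng.gammavariate(shape, scale) for _ in range(n_branches)]
    raise ValueError("family must be 'lognormal' or 'gamma'")


rng = random.Random(11)
lognormal_rates = draw_uncorrelated_rates(20000, 0.01, rng, family="lognormal")
gamma_rates = draw_uncorrelated_rates(20000, 0.01, rng, family="gamma")
print(sum(lognormal_rates) / 20000, sum(gamma_rates) / 20000)
```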

SLIDE 11

M  S- V

Are our models appropriate across all data sets?

[Figure: dated phylogeny of extinct and extant bears (harbor seal outgroup), with mean node ages (Ma), 95% CIs, node support from MP, ML, and Bayesian analyses, and a time scale from the Eocene to the Holocene]

Krause et al., 2008. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evol. Biol. 8.

[Figure: time-scaled phylogeny of ray-finned fishes (MYA) with taxon counts per lineage and per-clade estimates of r, ε, and ΔAIC]

Santini et al., 2009. Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes. BMC Evol. Biol. 9.

SLIDE 12

D T E S

Program        Models/Method
r8s            Strict clock, local clocks, NPRS, PL
ape (R)        NPRS, PL
multidivtime   log-n autocorrelated (plus some others)
PhyBayes       OU, log-n autocorrelated (plus some others)
PhyloBayes     CIR, white noise (uncorrelated) (plus some others)
BEAST          Uncorrelated (log-n & gamma), local clocks
TreeTime       Dirichlet model, CPP, uncorrelated
RevBayes       CPP, strict clock, DPP, autocorrelated

SLIDE 13

P  N T

Relaxed clock Bayesian analyses require a prior distribution on node times

Uniform prior: the time at a given node has equal probability across the interval between the time of the parent node and the time of the oldest daughter node

Birth-death prior: node times are sampled from a stochastic process with parameters for speciation and extinction (and in some cases taxon sampling)

[Figure panels: Uniform prior, Birth-death prior]
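
A minimal sketch of the uniform prior on a single node time, with ages measured backward from the present so that the parent is older than its daughters. The function name `draw_uniform_node_age` and the ages are hypothetical; the birth-death prior is not sketched here.

```python
import random


def draw_uniform_node_age(parent_age, oldest_daughter_age, rng):
    """Uniform draw between the oldest daughter node and the parent node."""
    if oldest_daughter_age > parent_age:
        raise ValueError("a daughter node cannot be older than its parent")
    return rng.uniform(oldest_daughter_age, parent_age)


rng = random.Random(5)
# Illustrative ages in Myr before present.
node_ages = [draw_uniform_node_age(30.0, 12.0, rng) for _ in range(1000)]
print(min(node_ages), max(node_ages))
```

Every age between the two bounds is equally probable a priori, which is what makes this prior easy to state but potentially informative about relative node ages.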

SLIDE 14

P  N T

A comparison of the prior and posterior estimates of relative node ages using an autocorrelated rates model and the uniform or birth-death priors on node times

Lepage et al., 2007. A General Comparison of Relaxed Molecular Clock Models. MBE 24:2669–2680.

SLIDE 15

C  T

Goal: branch lengths in absolute time

Time

SLIDE 16

C  T

We have an estimate of the tree topology

SLIDE 17

C  T

Known ages for sampled extant taxa

Estimates of minimum ages (from fossils or biogeographical data) that can be applied to nodes on the tree

[Figure: Time axis marked with the fossil age and the present]

SLIDE 18

C  T

Assigning fossils to clades

(Benton & Donoghue 2007 Mol. Biol. Evol. 24(1):26–53)

SLIDE 19

C  T

Assigning fossils to clades

Crown clade: all living species and their most-recent common ancestor (MRCA)

(Benton & Donoghue 2007 Mol. Biol. Evol. 24(1):26–53)

SLIDE 20

C  T

Assigning fossils to clades

Stem lineages: purely fossil forms that are closer to their descendant crown clade than any other crown clade

(Benton & Donoghue 2007 Mol. Biol. Evol. 24(1):26–53)

SLIDE 21

C  T

Assigning fossils to clades

Fossiliferous horizons: the sources in the rock record for relevant fossils

(Benton & Donoghue 2007 Mol. Biol. Evol. 24(1):26–53)

SLIDE 22

P  C N

Fossils typically provide MINIMUM bounds for calibrating nodes

Reliable MAXIMUM bounds are difficult to obtain

Uniform ( , ? )

[Figure: Time axis marked with the fossil age and the present]

SLIDE 23

P  C N

Different types of distributions that do not require maximum bounds can be applied to calibrated nodes

Exponential (λ)
Gamma (α, β)
Uniform ( , ∞)

[Figure: Time axis marked with the fossil age and the present]
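
A minimal sketch of calibration densities that need only a minimum bound, written as log densities on the node age and offset by the fossil age. It assumes λ and β are rate parameters; that reading, and the function names `offset_exponential_logpdf` and `offset_gamma_logpdf`, are assumptions for illustration.

```python
import math


def offset_exponential_logpdf(node_age, fossil_age, lam):
    """Log density of Exponential(lam) on (node_age - fossil_age)."""
    if node_age < fossil_age:
        return -math.inf  # the node cannot be younger than its fossil
    return math.log(lam) - lam * (node_age - fossil_age)


def offset_gamma_logpdf(node_age, fossil_age, alpha, beta):
    """Log density of Gamma(alpha, rate beta) on (node_age - fossil_age)."""
    x = node_age - fossil_age
    if x <= 0:
        return -math.inf
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(x) - beta * x)


# Illustrative values: a 20 Myr old fossil and a candidate node age of 25 Myr.
print(offset_exponential_logpdf(25.0, 20.0, lam=0.1))
print(offset_gamma_logpdf(25.0, 20.0, alpha=2.0, beta=0.2))
```

Both densities place zero probability on ages younger than the fossil and let the probability decay for older ages, so no hard maximum has to be specified.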

SLIDE 24

B D  

Time

SLIDE 25

C  R C M

◮ Dependent on and sensitive to fossil calibrations – fossil age estimates and node assignment are not without error
◮ Models are not biologically realistic
◮ Different methods/models can produce very different estimates of the same divergence times
◮ Priors are too informative
◮ Studies comparing methods have produced conflicting and unclear results

SLIDE 26

HIV T  L

An outbreak of HIV (and HepC) among patients in a Libyan hospital resulted in the 8-yr imprisonment of 6 foreign medical workers.

The defendants were accused of deliberately infecting over 400 children and sentenced to death.

(Butler, 2007 Nature)

Prosecutors claimed the medical workers used the kids as test subjects in an illicit clinical trial

SLIDE 27

HIV T  L

A study by HIV experts – Were the viral strains present in Libya before or after the arrival of the foreign medics (March 1998)?

(de Oliveira et al., 2006 Nature)

SLIDE 28

HIV T  L

Strict and relaxed clock methods were used to estimate the date of the MRCA for each cluster.

Every MRCA predated March 1998.

Many different models were used to obtain robust results and build a solid case against deliberate infection.

(de Oliveira et al., 2006 Nature)

This study supported the findings of a previous epidemiological study: the outbreak was the result of poor and unsanitary medical practices

SLIDE 29

HIV T  L

The Libyan court denied the validity of the findings and held that the outbreak was deliberate.

International outrage and the phylogenetic analyses resulted in the eventual release of the defendants.