Using phylogenetics to estimate species divergence times ... More - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Using phylogenetics to estimate species divergence times ... More accurately ... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

slide-2
SLIDE 2

"A comparison of the structures of homologous proteins ... from different species is important, therefore, for two reasons. First, the similarities found give a measure of the minimum structure for biological function. Second, the differences found may give us important clues to the rate at which successful mutations have occurred throughout evolutionary time and may also serve as an additional basis for establishing phylogenetic relationships."

From p. 143 of The Molecular Basis of Evolution

by Dr. Christian B. Anfinsen (Wiley, 1959)

slide-3
SLIDE 3

[Figure: tree with branch labels 0.5%, 0.5%, 4.5%, 5%, 10%, 5%, 10%, 20%]

slide-4
SLIDE 4

[Figure: tree with branch labels 0.5%, 0.5%, 4.5%, 5%, 10%, 5%, 10%, 20%; one node marked "200 Million Year Old Fossil"]

slide-5
SLIDE 5

[Figure: tree with branch labels 0.5%, 0.5%, 4.5%, 5%, 10%, 5%, 10%, 20%; one node marked "200 Million Year Old Fossil"; other nodes labeled 400 Million, 100 Million, 10 Million]

20% Sequence Divergence in 200 Mill. Years means 1% divergence per 10 Mill. Years

The "Clock Idea"
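The arithmetic behind the clock idea can be sketched in a few lines. This is only an illustration: the calibration (20% divergence at the 200 million year old fossil) comes from the slide, while the three divergence values assigned to the other splits (40%, 10%, 1%) are assumptions chosen so that the result matches the node labels on the figure.

```python
# Strict molecular clock: calibrate one constant rate from a dated fossil,
# then convert every other divergence into an age.

CALIBRATION_DIVERGENCE_PCT = 20.0   # divergence at the fossil-dated split (from the slide)
CALIBRATION_AGE_MY = 200.0          # fossil age in millions of years (from the slide)


def clock_age(divergence_pct: float) -> float:
    """Age in million years implied by a constant rate of divergence."""
    return divergence_pct * CALIBRATION_AGE_MY / CALIBRATION_DIVERGENCE_PCT


# Assumed divergences for the three remaining splits (not stated on the slide).
assumed_divergences_pct = [40.0, 10.0, 1.0]
ages_my = [clock_age(d) for d in assumed_divergences_pct]
print(ages_my)  # [400.0, 100.0, 10.0]
```

Everything here rests on the single rate of 1% per 10 million years holding on every branch.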

slide-6
SLIDE 6

“Ernst Mayr recalled at this meeting that there are two distinct aspects to phylogeny: the splitting of lines, and what happens to the lines subsequently by divergence. He emphasized that, after splitting, the resulting lines may evolve at very different rates... How can one then expect a given type of protein to display constant rates of evolutionary modification along different lines of descent?” (Evolving Genes and Proteins. Zuckerkandl and Pauling, 1965, p. 138).

slide-7
SLIDE 7

[Figure: tree with branch labels 0.5%, 0.5%, 4.5%, 5%, 10%, 5%, 10%, 20%; one node marked "200 Million Year Old Fossil"; other nodes labeled 400 Million, 100 Million, 10 Million]

A problem with the "Clock Idea": Rates of Molecular Evolution Change Over Time !!

slide-8
SLIDE 8

[Figure: tree with branch labels 0.5%, 0.5%, 4.5%, 5%, 10%, 5%, 10%, 20%]

If mammal head is derived character & fossil is 200 Mill. Years old then bird-mammal split must have been at least 200 million years old. This is a constraint on a divergence time.

Another problem with the "Clock Idea": Fossils are unlikely to represent same organism as genetic common ancestor.

slide-9
SLIDE 9

Bayesian Idea: (Prior Information ) X (Information from data) = Posterior Information

slide-10
SLIDE 10

Basic Idea for Bayesian Divergence Time Inference

R: rates
T: node times
C: Fossil Evidence (constraints)
S: Sequence Data

\[
P(R,T \mid S,C) = \frac{P(S,R,T \mid C)}{P(S \mid C)} = \frac{P(S \mid R,T,C)\,P(R \mid T,C)\,P(T \mid C)}{P(S \mid C)} = \frac{P(S \mid R,T)\,P(R \mid T)\,P(T \mid C)}{P(S \mid C)}
\]
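To make the factorization concrete, here is a minimal grid sketch for a single branch. All numbers are hypothetical, and the modeling choices (a Poisson count for P(S|R,T), an exponential P(R|T) that ignores T, and a uniform P(T|C) between two fossil bounds) are simplifying assumptions made for illustration, not the models used in real divergence time software.

```python
import math

N_SITES = 1000                 # hypothetical alignment length
N_SUBST = 100                  # hypothetical substitutions observed on the branch
T_MIN, T_MAX = 100.0, 300.0    # hypothetical fossil constraints C on the node time (My)
PRIOR_MEAN_RATE = 0.001        # hypothetical prior mean rate (subst/site/My)


def log_likelihood(rate, time):
    """log P(S | R, T): Poisson count whose mean depends only on rate * time."""
    mean = N_SITES * rate * time
    return N_SUBST * math.log(mean) - mean - math.lgamma(N_SUBST + 1)


def log_prior_rate(rate):
    """log P(R | T): exponential prior, assumed here not to depend on T."""
    return -math.log(PRIOR_MEAN_RATE) - rate / PRIOR_MEAN_RATE


def log_prior_time(time):
    """log P(T | C): uniform between the fossil bounds, impossible outside."""
    if T_MIN <= time <= T_MAX:
        return -math.log(T_MAX - T_MIN)
    return -math.inf


rates = [i * 1e-5 for i in range(1, 301)]        # 0.00001 .. 0.003
times = [float(t) for t in range(10, 501, 5)]    # 10 .. 500 My

# Unnormalised posterior on the grid: likelihood x rate prior x time prior.
weights = {}
for r in rates:
    for t in times:
        log_w = log_likelihood(r, t) + log_prior_rate(r) + log_prior_time(t)
        weights[(r, t)] = math.exp(log_w)        # exp(-inf) is 0.0

total = sum(weights.values())    # grid stand-in for the denominator P(S | C)
posterior = {rt: w / total for rt, w in weights.items()}

post_mean_time = sum(t * p for (r, t), p in posterior.items())
post_mean_rate = sum(r * p for (r, t), p in posterior.items())
print(round(post_mean_time, 1), round(post_mean_rate, 6))
```

Grid points outside the fossil bounds get zero posterior weight, which is how C enters the answer in this sketch.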

slide-11
SLIDE 11

Bayesian Divergence Time Components

  • 1. DNA or protein sequence data
  • 2. Model of Sequence Change
  • 3. Model of Rate Change
  • 4. Prior Distributions for Rates, Times, etc.
  • 5. Fossil or other information
slide-12
SLIDE 12

[Plot with axes Rate and Time, each marked 1 to 5]

Branch Length = Rate x Time

(the information from molecular sequence data)
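A small sketch of why sequence data alone cannot separate rate from time: under a substitution model the likelihood depends on the two only through their product. The Jukes-Cantor model and the counts below are assumptions picked for illustration.

```python
import math


def jc69_log_likelihood(rate, time, n_sites, n_diff):
    """Log-likelihood (up to a constant) of n_diff differing sites under Jukes-Cantor."""
    branch_length = rate * time                       # expected substitutions per site
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    return n_diff * math.log(p_diff) + (n_sites - n_diff) * math.log(1.0 - p_diff)


# Three hypothetical (rate, time) pairs with the same product, 0.1 subst/site.
pairs = [(0.001, 100.0), (0.002, 50.0), (0.0005, 200.0)]
log_likelihoods = [jc69_log_likelihood(r, t, n_sites=1000, n_diff=95) for r, t in pairs]
print(log_likelihoods)

# A pair with a different product is distinguishable from the others.
other = jc69_log_likelihood(0.001, 200.0, n_sites=1000, n_diff=95)
print(other)
```

The first three values agree to within floating-point rounding, so any information that pulls rate and time apart has to come from the priors or from fossils.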

slide-13
SLIDE 13

[Plot with axes Rate and Time, each marked 1 to 5]

Prior Distribution

slide-14
SLIDE 14

[Plot with axes Rate and Time, each marked 1 to 5]

slide-15
SLIDE 15

[Plot with axes Rate and Time, each marked 1 to 5]

Posterior with constraints

The region between the green vertical lines is the constraint on node time.

slide-16
SLIDE 16

[Plot with axes Rate and Time, each marked 1 to 5]

Yang-Rannala “Soft” Constraints (dashed green lines treated as imperfect fossil evidence)
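The difference between hard and soft constraints can be sketched as two prior densities on a node time. The soft version below is a simplified stand-in (a uniform core with exponential tails holding 2.5% of the prior mass each), not the exact density defined by Yang and Rannala, and the bounds are hypothetical.

```python
import math

T_LOWER, T_UPPER = 200.0, 400.0   # hypothetical fossil bounds (My)
TAIL = 0.025                      # prior mass allowed beyond each bound (assumed)


def hard_bound_density(t):
    """Uniform prior between the bounds; times outside are impossible."""
    return 1.0 / (T_UPPER - T_LOWER) if T_LOWER <= t <= T_UPPER else 0.0


def soft_bound_density(t):
    """Uniform core holding 1 - 2*TAIL of the mass, with exponential tails."""
    height = (1.0 - 2.0 * TAIL) / (T_UPPER - T_LOWER)
    decay = height / TAIL         # a full exponential tail then integrates to TAIL
    if t < T_LOWER:
        # The lower tail is cut off at t = 0; the lost mass is negligible here.
        return height * math.exp(-decay * (T_LOWER - t))
    if t > T_UPPER:
        return height * math.exp(-decay * (t - T_UPPER))
    return height


for t in (150.0, 300.0, 405.0):
    print(t, hard_bound_density(t), soft_bound_density(t))
```

With the soft version, sequence data that strongly favor a time just outside a bound can still move the posterior there instead of piling it up against the bound.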

slide-17
SLIDE 17

Bayesian Divergence Time Components

  • 1. DNA or protein sequence data

Sequence data is needed for branch length (rate x time) estimation. Sequence data does not separate rates and times. Better to invest in improving other time estimation components?

slide-18
SLIDE 18

Bayesian Divergence Time Components

  • 2. Model of Sequence Change

Branch Length (BL) Errors → Divergence Time Errors

Posterior distributions for times are a compromise between branch length information from sequence data, prior information, and fossil information.

slide-19
SLIDE 19

[Plot with axes Time and Rate, each marked 1 to 5]

Branch length estimation error can affect divergence time estimates ...

slide-20
SLIDE 20

Bayesian Divergence Time Components

  • 2. Model of Sequence Change

Branch Length (BL) Errors → Divergence Time Errors
Errors in BL uncertainty → Divergence Time Errors

Posterior distributions for times are a compromise between branch length information from sequence data, prior information, and fossil information.

slide-21
SLIDE 21

[Plot with axes Time and Rate, each marked 1 to 5]

Red line represents “best” branch length estimate. How good are yellow and green estimates?

Point: Rate and time estimates are a compromise between branch length uncertainty and prior information... Errors in assessing branch length uncertainty could have big effect on divergence time inferences ...

slide-22
SLIDE 22

Errors in BL uncertainty have more serious consequences for divergence time estimation than for phylogeny inference. Sources of these errors include failure to account for dependent change among sequence positions:

  • Context-Dependent Mutation
  • Codons
  • Protein Tertiary Structure
  • RNA Secondary Structure
  • Other Genotype-Phenotype Connections

slide-23
SLIDE 23

Bayesian Divergence Time Components

  • 3. Model of Rate Change

How much of what appears to be rate change really is rate change?

see Cutler, D.J. (2000) Estimating divergence times in the presence of an overdispersed molecular clock. Mol. Biol. Evol. 17:1647-1660.
slide-24
SLIDE 24

A point made well by Cutler (2000) ... Rejection of constant rate hypothesis may not be due to variation of rates over time as much as being due to poor models of sequence evolution that may mislead us about how confident we can be regarding branch length estimates ...

(my viewpoint... "first principles" of evolutionary biology mean constant rate hypothesis must be formally wrong even though it may sometimes be nearly right)

slide-25
SLIDE 25

[Figure: two trees on tips A, B, C, D, E, one labeled "Molecular Clock" and one labeled "No Clock"; axis: amount of evolution (substitutions per site)]

slide-26
SLIDE 26

Why might rates of molecular evolution change over time? Candidates include changes in ...

  • mutation rate per generation
  • generation time
  • natural selection (including effects due to duplication)
  • population size (higher rates for small pop. size)

slide-27
SLIDE 27

From: Lartillot N, Poujol R. 2011. Reconstruction of the evolution of body mass in carnivores. Mol Biol Evol 28:729-744

A promising idea: By allowing them to evolve along with substitution rates, phenotypic characters that may be correlated with substitution rates can be leveraged to improve divergence time estimates

slide-28
SLIDE 28

Bayesian Divergence Time Components

  • 4. Prior Distributions for Rates, Times, etc.

Difficulty in specifying appropriate prior distributions is arguably the biggest obstacle for Bayesian inference and this difficulty is especially great for divergence time estimation. In many situations, prior distribution is not too important if data set is large. However, large amounts of sequence data do not overcome need for good rate and time priors here ...

slide-29
SLIDE 29

A nice paper ...

Drummond, Ho, Phillips, and Rambaut. 2006. Relaxed Phylogenetics and Dating With Confidence. PLOS Biology 4(5):e88 (see also their BEAST software)

(i) Divergence time estimation without prespecified topology
(ii) Phylogeny inference incorporating models of rate evolution

[Figure: tree with nodes labeled A, B, I, C, D, J]

Branch length between Nodes A & I and between Nodes B & I should be correlated even if rates on these branches are independent of each other.

Reason: These branches represent the same amount of time.
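The point about correlated branch lengths can be checked with a small simulation. This is a sketch under assumed distributions (a uniform shared time and independent lognormal rates); only the structure, two branches that share one time but have independent rates, comes from the slide.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rates_ai, rates_bi, lengths_ai, lengths_bi = [], [], [], []
for _ in range(5000):
    shared_time = random.uniform(1.0, 10.0)       # same time span for A-I and B-I
    rate_ai = random.lognormvariate(0.0, 0.3)     # rates drawn independently
    rate_bi = random.lognormvariate(0.0, 0.3)
    rates_ai.append(rate_ai)
    rates_bi.append(rate_bi)
    lengths_ai.append(rate_ai * shared_time)      # branch length = rate x time
    lengths_bi.append(rate_bi * shared_time)

rate_correlation = pearson(rates_ai, rates_bi)
length_correlation = pearson(lengths_ai, lengths_bi)
print(round(rate_correlation, 2), round(length_correlation, 2))
```

The rates come out essentially uncorrelated while the branch lengths are clearly positively correlated, purely because both lengths are scaled by the same time.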

slide-30
SLIDE 30

BEAST & relatives (see http://tree.bio.ed.ac.uk/software/)

  • BEAUti: make XML files as input for BEAST analyses (or make your own XML files to input to BEAST)
  • BEAST: MCMC on rooted gene or species trees
  • Tracer: diagnose MCMC convergence, visualize MCMC output
  • FigTree: draw trees
  • Other Programs: other MCMC programs (e.g. MrBayes)

slide-31
SLIDE 31

General impressions when data sets are analyzed with and without the constant rate assumption...

  • ... often best estimate of all node times is very similar for the two situations
  • ... often divergence time estimates are very similar except for one or a few nodes
  • ... less often divergence time estimates differ greatly at most or all nodes

slide-32
SLIDE 32

More general impressions ...

  • Uncertainty on node time estimates is higher when clock is not assumed
  • Prior distribution requires more Markov chain Monte Carlo cycles to approximate well than posterior distribution
  • Uncertainty on node time estimates is generally very high unless there is at least one node constrained with lower bound time and at least one node constrained with upper bound time

slide-33
SLIDE 33

(Incomplete) List of Multigene Analysis Possibilities:

  • 1. Genes do not share common divergence times (for pop. gen. and closely related species)
  • 2. Genes share divergence times and pattern of rate change (concatenate genes for this case?)
  • 3. Genes share divergence times and common tendency to change rates but not actual patterns of rate change
  • 4. Genes share divergence times but not tendency to change rates or actual patterns of rate change

Lineage effects? Do functionally related genes have similar patterns of rate change?

slide-34
SLIDE 34

[Plot with axes labeled 18S and 28S, each marked 0.0 to 3.5]

Rate Change for Divergence Times versus for other reasons...

slide-35
SLIDE 35

Bayesian Divergence Time Components

  • 5. Fossil or other information

Prospects for much improved treatment of fossil evidence are good (particular progress by Ronquist et al. 2012. Syst. Biol. in press; see also Lee et al. 2009. Mol. Phylo. Evol. 50:661-666)
slide-36
SLIDE 36

[Figure labels: 2006, 1995]

Serially Sampled Data

Can separate rates and times for quickly evolving (e.g., viral) lineages but cannot for slow lineages.
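One simple way to see how sampling dates separate rate from time is a root-to-tip regression: plot each sequence's distance from the root against its sampling year and fit a line. This technique is swapped in here for illustration and is not taken from the slides; the sampling years between 1995 and 2006 and all distances are hypothetical.

```python
# Hypothetical serially sampled data: sampling year and root-to-tip distance.
SAMPLING_YEARS = [1995.0, 1998.0, 2001.0, 2004.0, 2006.0]
ROOT_TO_TIP = [0.051, 0.058, 0.069, 0.076, 0.084]   # substitutions per site


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rate_per_year, intercept = least_squares_line(SAMPLING_YEARS, ROOT_TO_TIP)
root_year = -intercept / rate_per_year   # year at which the fitted distance is zero
print(rate_per_year, root_year)
```

The slope estimates the substitution rate and the x-intercept estimates the date of the root. For a slowly evolving lineage the distances would barely change over eleven years, so the slope would be indistinguishable from zero.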

slide-37
SLIDE 37

[Figure labels: 2006, 10 MYA?]

Bayesian techniques can (in principle) account for uncertainty in phylogenetic placement of fossils and for uncertainty in fossil dating!

slide-38
SLIDE 38

[Figure labels: 2006, 10 MYA]

Can get sequence data and morphological data for 2006. Can get morphological (fossil) data for 10 million years ago!

Strategy: Use both molecular & morphological models of character change !!

slide-39
SLIDE 39

Protein Sequences from Mastodon and Tyrannosaurus Rex Revealed by Mass Spectrometry

Asara et al. 2007. Science 316:280-285

68 mya collagen protein sequence data !!

slide-40
SLIDE 40

68 mya collagen protein sequence data !! Ancient protein sequences to supplement morphological fossil data (i.e., extend serially sampled techniques way way beyond HIV data) ?

slide-41
SLIDE 41

68 mya collagen protein sequence data !! With Genotype-Phenotype mapping information can we accurately predict (and validate) ancient protein/DNA sequences based on morphological evidence?

slide-42
SLIDE 42

Bayesian Divergence Time Components

  • 5. Fossil or other information

Other information in the form of mutation data ...

slide-43
SLIDE 43

[Plot with axes Rate and Time, each marked 1 to 5]

Prior Distribution

slide-44
SLIDE 44

[Plot with axes Rate and Time, each marked 1 to 5]

slide-45
SLIDE 45

[Plot with axes Rate and Time, each marked 1 to 5]

Time Information from Fossil Data

slide-46
SLIDE 46

[Plot with axes Rate and Time, each marked 1 to 5]

Rate Information from Mutation Data

slide-47
SLIDE 47

  • Next Generation Sequence Data (Parent-Offspring Or Mutation Accumulation Data) → Mutations per Generation
  • Mutations per Generation x Generations per Year = Mutations per Year
  • Neutral Assumption: Mutations per Year = Substitutions per Year
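The chain from mutation data to a substitution rate is a short calculation. The numeric inputs below are hypothetical placeholders; the only step taken from the slide is the logic (mutations per generation times generations per year, then equating mutation and substitution rates under neutrality).

```python
def substitutions_per_site_per_year(mutations_per_site_per_generation,
                                    generation_time_years):
    """Neutral-theory conversion of a per-generation mutation rate to a yearly rate."""
    generations_per_year = 1.0 / generation_time_years
    mutations_per_year = mutations_per_site_per_generation * generations_per_year
    # Neutral assumption: the substitution rate equals the mutation rate.
    return mutations_per_year


# Hypothetical inputs: 1.2e-8 mutations/site/generation, 25-year generations.
rate = substitutions_per_site_per_year(1.2e-8, 25.0)
print(rate)
```

If selection removes or favors some mutations, the last step no longer holds and the substitution rate will differ from the mutation rate.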

slide-48
SLIDE 48

Our (H.-J. Lee, H. Kishino, J.L. Thorne) Basic Idea ...

R: rates
T: node times
M: Mutation Data
S: Aligned Homologous Sequence Data

\[
P(R,T \mid M,S) = \frac{P(M,S,R,T)}{P(M,S)} = \frac{P(M \mid R,T,S)\,P(S \mid R,T)\,P(R \mid T)\,P(T)}{P(M,S)} = \frac{P(M \mid R)\,P(S \mid R,T)\,P(R \mid T)\,P(T)}{P(M,S)}
\]
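A grid sketch of what the extra factor P(M|R) buys: because mutation data inform the rate directly, adding them narrows the posterior for the time. Everything below is assumed for illustration (Poisson counts for both S and M, flat priors on the grid, and all numbers), so it shows the shape of the argument and not the authors' implementation.

```python
import math


def poisson_logpmf(count, mean):
    """Log probability of a Poisson count with the given mean."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


# All numbers below are hypothetical.
N_SITES, N_SUBST = 1000, 100           # S: substitutions seen on one branch
MUT_COUNT, MUT_EXPOSURE = 50, 5.0e4    # M: mutations seen over MUT_EXPOSURE site-My

RATES = [i * 1e-5 for i in range(10, 301)]       # subst/site/My
TIMES = [float(t) for t in range(10, 501, 5)]    # My


def time_posterior_sd(use_mutation_data):
    """Posterior standard deviation of T on the grid, with flat priors on R and T."""
    weight_by_time = {}
    for t in TIMES:
        weight = 0.0
        for r in RATES:
            log_w = poisson_logpmf(N_SUBST, N_SITES * r * t)          # P(S | R, T)
            if use_mutation_data:
                log_w += poisson_logpmf(MUT_COUNT, MUT_EXPOSURE * r)  # P(M | R)
            weight += math.exp(log_w)
        weight_by_time[t] = weight
    total = sum(weight_by_time.values())
    mean = sum(t * w for t, w in weight_by_time.items()) / total
    var = sum((t - mean) ** 2 * w for t, w in weight_by_time.items()) / total
    return math.sqrt(var)


sd_without = time_posterior_sd(use_mutation_data=False)
sd_with = time_posterior_sd(use_mutation_data=True)
print(sd_without, sd_with)
```

Without M, the sequence data only pin down the product of rate and time, so the time posterior stays wide; with M it tightens around the time implied by the mutation-based rate.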

slide-49
SLIDE 49

  • Next Generation Sequence Data (Parent-Offspring Or Mutation Accumulation Data) → Mutations per Generation
  • Mutations per Generation x Generations per Year = Mutations per Year
  • Neutral Assumption: Mutations per Year = Substitutions per Year

A Future Direction ... Mutation-Selection Balance

slide-50
SLIDE 50

Korber et al. 2000. Timing the Ancestor of the HIV-1 Pandemic Strains. Science 288:1789
slide-51
SLIDE 51

HIV substitution rates before and after therapy (Log-Likelihood Surface)

[Plot axis labels as extracted: Rate after therapy (substitutions/site/day) x 10^-5; Rate after therapy (substitutions/site/day) x 10^-4]

From Drummond et al. 2001. MBE 18:1365-1371

slide-52
SLIDE 52

Bayesian Divergence Time Components

  • 1. DNA or protein sequence data - Bountiful
  • 2. Model of Sequence Change - Difficult
  • 3. Model of Rate Change - Difficult
  • 4. Prior Distributions for Rates, Times, etc. - ? ? ?
  • 5. Fossil or other information - Progress !!
slide-53
SLIDE 53

THE END!

Some divergence time inference software:

  • BEAST http://beast.bio.ed.ac.uk/
  • PAML http://abacus.gene.ucl.ac.uk/software/paml.html
  • PhyloBayes www.phylobayes.org/

slide-54
SLIDE 54

Additional References / Good divergence time reading material:

Aris-Brosou, S., Yang, Z. 2002. Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal phylogeny. Syst. Biol. 51(5):703-714.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376.
Gillespie, J.H. 1991. The causes of molecular evolution. Oxford University Press, New York.
Hasegawa, M., Kishino, H., Yano, T. 1985. Dating of the Human-Ape Splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.
Huelsenbeck, J.P., Larget, B., Swofford, D.L. 2000. A compound Poisson process for relaxing the molecular clock. Genetics 154:1879-1892.
Kishino, H., Hasegawa, M. 1990. Converting distance to time: an application to human evolution. Methods in Enzymology 183:550-570.
Kishino, H., Thorne, J.L., Bruno, W.J. 2001. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol. Biol. Evol. 18:352-361.
Leitner, T., Albert, J. 1999. The molecular clock of HIV-1 unveiled through analysis of a known transmission history. Proc. Natl. Acad. Sci. USA 96:10752-10757.
Rambaut, A. 2000. Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16:395-399.
Sanderson, M.J. 1997. A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol. 14:1218-1232.
Sanderson, M.J. 2002. Estimating absolute rates of molecular evolution and divergence times: A penalized likelihood approach. Mol. Biol. Evol. 19:101-109.
Sanderson, M.J. 2003. R8S: Inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19:301-302.
Thorne, J.L., Kishino, H., Painter, I.S. 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15:1647-1657.
Thorne, J.L., Kishino, H. 2002. Divergence time and evolutionary rate estimation with multilocus data. Syst. Biol. 51:689-702.
Yang, Z., Rannala, B. 2006. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23(1):212-226.
Yoder, A.D., Yang, Z.H. 2000. Estimation of primate speciation dates using local molecular clocks. Mol. Biol. Evol. 17:1081-1090.
Zuckerkandl, E., Pauling, L. 1962. Molecular disease, evolution, and genic heterogeneity. In: Kasha, M., Pullman, B. (eds) Horizons in Biochemistry: Albert Szent-Gyorgyi Dedicatory Volume. Academic Press, New York.
Zuckerkandl, E., Pauling, L. 1965. Evolutionary divergence and convergence in proteins. In: Bryson, V., Vogel, H.J. (eds) Evolving Genes and Proteins. Academic Press, New York.