Phylogenies
Phylogenies describe history
- Haeckel. 1879.
- Pace. 1997. Science.
Phylogenies are the result of branching processes
Timeseries and phylogeny are dual outcomes of an infectious process
Epidemic process
[Figure: epidemic process over time, with case count over time]
Can ask for the probability of observing this timeseries given epidemiological parameters β and γ.
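To make the role of β and γ concrete, the following is a minimal sketch of one way to generate such a timeseries. It assumes a standard stochastic SIR model simulated event by event (a Gillespie-style scheme); the slide names neither the model nor the method, and the function name and parameters are hypothetical.

```python
import random


def simulate_sir(beta, gamma, population, initial_infected=1, seed=None):
    """Simulate a stochastic SIR epidemic one event at a time.

    Assumption: transmission occurs at rate beta * S * I / N and
    recovery at rate gamma * I. Returns a list of (time, infected_count)
    pairs, one per event, ending when no infected individuals remain.
    """
    rng = random.Random(seed)
    susceptible = population - initial_infected
    infected = initial_infected
    time = 0.0
    series = [(time, infected)]
    while infected > 0:
        infection_rate = beta * susceptible * infected / population
        recovery_rate = gamma * infected
        total_rate = infection_rate + recovery_rate
        # Waiting time to the next event is exponential with the total rate.
        time += rng.expovariate(total_rate)
        # Choose which event happened, in proportion to its rate.
        if rng.random() < infection_rate / total_rate:
            susceptible -= 1
            infected += 1
        else:
            infected -= 1
        series.append((time, infected))
    return series


if __name__ == "__main__":
    trajectory = simulate_sir(beta=0.45, gamma=0.25, population=200, seed=1)
    print(len(trajectory), "events; final time", round(trajectory[-1][0], 1))
```

Each run gives a different count trajectory for the same β and γ, which is why the slide frames inference as the probability of the observed timeseries given the parameters.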
Epidemic process
[Figure: epidemic process over time]
Sample some individuals
Epidemic branching process
[Figure: epidemic branching process over time]
Can ask for the probability of observing this tree given epidemiological parameters β and γ.
The coalescent
Assume equilibrium number of infecteds. Call this equilibrium N.
Sample some individuals.
Each generation, there is a small chance of coalescence for each pair of lineages:
$\Pr(\text{coal} \mid i = 2) = \frac{1}{N}$
The probability of coalescence scales quadratically with the lineage count i:
$\Pr(\text{coal}) = \binom{i}{2} \frac{1}{N} = \frac{i(i-1)}{2N}$
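[Figure: coalescent tree with waiting times T3 and T2]
The waiting time $T_i$ while i lineages remain is exponentially distributed with mean $\frac{2N}{i(i-1)}$:
$T_i \sim \text{Exponential}\left(\frac{2N}{i(i-1)}\right)$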
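The coalescent waiting times can be sketched in a few lines. This is an illustrative simulation under the constant-size assumption stated above, not the demo used in the talk; the function names are hypothetical. Note that Python's `random.expovariate` takes a rate, so the mean 2N / (i(i − 1)) enters as its reciprocal.

```python
import random


def coalescence_probability(lineages, population_size):
    """Per-generation probability that some pair of lineages coalesces."""
    return lineages * (lineages - 1) / (2 * population_size)


def simulate_coalescent_intervals(sample_size, population_size, seed=None):
    """Draw the waiting time T_i for each lineage count i.

    Returns a dict mapping i (from sample_size down to 2) to a waiting
    time in generations, drawn from an exponential distribution with
    mean 2N / (i(i - 1)).
    """
    rng = random.Random(seed)
    intervals = {}
    for lineages in range(sample_size, 1, -1):
        rate = coalescence_probability(lineages, population_size)
        intervals[lineages] = rng.expovariate(rate)
    return intervals


if __name__ == "__main__":
    times = simulate_coalescent_intervals(sample_size=5, population_size=1000, seed=42)
    for lineages, waiting_time in times.items():
        print(f"T_{lineages} = {waiting_time:.0f} generations")
    print(f"Tree height = {sum(times.values()):.0f} generations")
```

Because the rate grows quadratically with the lineage count, the intervals near the tips tend to be short and the final interval T2 tends to dominate the tree height.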
Demo
[Figure: simulated trees for N = 500, N = 1000, N = 2000, N = 5000, N = 10000 and N = 20000; time axis with ticks every 5k]
Population size affects tree shape
The rate of coalescence is inversely proportional to the population size N, so expected coalescence times increase linearly with N.
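The dependence on N can be made concrete by summing the expected waiting times $E[T_i] = \frac{2N}{i(i-1)}$ over a sample of n lineages, which gives the expected height of the tree. The sample size n is notation introduced here for illustration and does not appear on the slides.

```latex
E[T_{\mathrm{MRCA}}]
  = \sum_{i=2}^{n} \frac{2N}{i(i-1)}
  = 2N \sum_{i=2}^{n} \left( \frac{1}{i-1} - \frac{1}{i} \right)
  = 2N \left( 1 - \frac{1}{n} \right)
```

Doubling N therefore doubles the expected tree height, and for large samples the height approaches 2N generations.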
Changing population size
[Figure: trees under constant size and under a growing population]
Given a phylogeny, how can we learn about the evolutionary process that underlies it?
Generally, we want to know: p(model|data)
Bayes rule: p(model|data) ∝ p(data|model) p(model)
Often referred to as: posterior ∝ likelihood × prior
In this case, we have:
p(λ|τ) ∝ p(τ|λ) p(λ)
λ – coalescent model
τ – phylogeny
However, we don’t observe the tree directly:
p(τ, µ|D) ∝ p(D|τ, µ) p(τ) p(µ)
D – sequence data
µ – mutation model
We integrate over uncertainty:
p(λ|D) ∝ ∫ p(D|τ, µ) p(τ|λ) p(λ) p(µ) dτ dµ
BEAST: Bayesian Evolutionary Analysis by Sampling Trees
Integration through Markov chain Monte Carlo
[Figure: two-dimensional plot with axes x1 and x2]
Metropolis-Hastings algorithm
Starting from state θ, propose a new state θ*. For the following, this proposal must be symmetric, i.e. Q(θ → θ*) = Q(θ* → θ).
Acceptance probability: min(1, p(θ*) / p(θ))
If the new state is more likely, always accept. If the new state is less likely, accept with probability equal to the ratio of the new state's probability to the old state's.
Simple example: p(x) = 0.2, p(y) = 0.8
A(x → y) = min(1, 0.8/0.2) = 1
A(y → x) = min(1, 0.2/0.8) = 0.25
Mass moving from x to y: p(x) A(x → y) = 0.2 × 1 = 0.2
Mass moving from y to x: p(y) A(y → x) = 0.8 × 0.25 = 0.2
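The two-state example can be run as a short simulation. This is a minimal sketch, assuming the proposal always suggests the other state (which is symmetric); the function names are hypothetical and it is not how BEAST implements its sampler.

```python
import random


def acceptance_probability(p_current, p_proposed):
    """Metropolis acceptance probability for a symmetric proposal."""
    return min(1.0, p_proposed / p_current)


def metropolis_two_state(target, steps, seed=None):
    """Sample from a two-state target distribution.

    `target` maps the states "x" and "y" to their probabilities.
    Returns the fraction of steps the chain spent in each state.
    """
    rng = random.Random(seed)
    state = "x"
    counts = {"x": 0, "y": 0}
    for _ in range(steps):
        proposal = "y" if state == "x" else "x"
        if rng.random() < acceptance_probability(target[state], target[proposal]):
            state = proposal
        # Count the current state whether or not the move was accepted.
        counts[state] += 1
    return {name: count / steps for name, count in counts.items()}


if __name__ == "__main__":
    frequencies = metropolis_two_state({"x": 0.2, "y": 0.8}, steps=100_000, seed=0)
    print(frequencies)
```

Because the mass moving in each direction balances, the long-run fraction of time spent in each state approaches the target probabilities 0.2 and 0.8.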
BEAST will produce samples from:
λ – coalescent model
τ – phylogeny
µ – mutation model
Use a ‘skyline’ demographic model
[Figure: population sizes N1, N2, N3, N4 across successive intervals]
Practical part 1
Estimating R0 from timeseries data
[Figure: individuals (log scale, 0.1 to 1000) against days (0 to 350)]
r = 0.20 per day for 1918 influenza
r(0) = β − γ
We know the approximate recovery rate. We can solve for β and hence R0.
γ ≈ 0.25
β = r + γ ≈ 0.45
R0 = β / γ ≈ 0.45 / 0.25 ≈ 1.8
Growth rate of pandemic H1N1
[Figure: laboratory confirmed cases (log scale, 1 to 1000), March to May]
r = 0.11 per day
β = 0.11 + 0.33 = 0.44 per day
R0 = 0.44 / 0.33 = 1.33
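Both R0 calculations follow the same two steps, which can be written as a small helper. This is a sketch of the arithmetic on the slides, using the relation r = β − γ given for the start of the epidemic; the function name is hypothetical.

```python
def estimate_r0(growth_rate, recovery_rate):
    """Estimate the transmission rate and R0 from the early growth rate.

    Uses r = beta - gamma, so beta = r + gamma and R0 = beta / gamma.
    All rates are per day. Returns the pair (beta, r0).
    """
    beta = growth_rate + recovery_rate
    return beta, beta / recovery_rate


if __name__ == "__main__":
    for label, r, gamma in [
        ("1918 influenza", 0.20, 0.25),
        ("pandemic H1N1", 0.11, 0.33),
    ]:
        beta, r0 = estimate_r0(r, gamma)
        print(f"{label}: beta = {beta:.2f} per day, R0 = {r0:.2f}")
```

The estimate is only as good as the assumed recovery rate γ, since the same growth rate gives a different R0 for a different duration of infection.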
Generation time τ of infection
At the beginning of the epidemic, new infections emerge at rate β.
[Figure: individuals (log scale, 0.1 to 1000) against days (0 to 350)]
τ = 1 / (2βS(0)) = 1 / (2 × 0.36) = 1.39
Final susceptible fraction: S(∞) = exp(−R0 (1 − S(∞)))
At the end of the epidemic: τ = 1 / (2βS(∞)) = 1 / (2 × 0.36 × 0.84) = 1.65
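The final susceptible fraction is defined implicitly, so it has to be solved numerically. The sketch below uses bisection, which is one choice among several, and hypothetical function names. The slide does not state the R0 behind S(∞) = 0.84; the value 1.09 used here is an assumption, chosen because it is consistent with β = 0.36 and γ = 0.33 per day.

```python
import math


def final_susceptible_fraction(r0, tolerance=1e-12):
    """Solve S = exp(-R0 * (1 - S)) for the root below 1 by bisection.

    For R0 <= 1 the only solution is S = 1 (no epidemic takes off).
    """
    if r0 <= 1:
        return 1.0
    low, high = 0.0, 1.0 - 1e-9
    while high - low > tolerance:
        mid = (low + high) / 2
        # exp(-R0 (1 - S)) - S is positive below the root, negative above it.
        if math.exp(-r0 * (1 - mid)) - mid > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def generation_time(beta, susceptible_fraction):
    """Generation time tau = 1 / (2 * beta * S), in days."""
    return 1 / (2 * beta * susceptible_fraction)


if __name__ == "__main__":
    s_final = final_susceptible_fraction(1.09)  # assumed R0, see note above
    print(f"S(inf) = {s_final:.2f}")
    print(f"tau at start = {generation_time(0.36, 1.0):.2f} days")
    print(f"tau at end   = {generation_time(0.36, s_final):.2f} days")
```

The generation time lengthens as susceptibles are depleted, because each infection takes longer on average to produce the next one.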
[Figure: τ against time]
Effective population sizes of flu vs measles
[Figure: panels for Influenza A (H3N2), x-axis 1970 to 2010, and Measles, x-axis 1950 to 2010]
Influenza A (H3N2):
- Ne = 7.2 years
- Ne = 1050 infections (duration of infection of 5 days)
- N = 70 million infections (prevalence)
- Off by a factor of 6,700
Measles:
- Ne = 124.6 years
- Ne = 8270 infections (duration of infection of 11 days)
- N = 0.9 million infections (prevalence)
- Off by a factor of 110
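The slide does not say how the estimate in years is converted to a number of infections. The figures are consistent with dividing by a generation time equal to half the duration of infection, so the sketch below assumes that conversion and a 365-day year; the function name is hypothetical.

```python
def effective_size_in_infections(ne_years, infection_duration_days):
    """Convert an effective size in years to a number of infections.

    Assumption (not stated on the slide): the generation time is half
    the duration of infection, and a year has 365 days.
    """
    generation_time_days = infection_duration_days / 2
    return ne_years * 365 / generation_time_days


if __name__ == "__main__":
    flu = effective_size_in_infections(7.2, 5)
    measles = effective_size_in_infections(124.6, 11)
    print(f"Influenza A (H3N2): about {flu:.0f} infections")
    print(f"Measles: about {measles:.0f} infections")
```

Under these assumptions the results land within rounding of the 1050 and 8270 infections quoted for the two viruses.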
Practical part 2
[Figure: probability in state X (0 to 1) against time (0 to 1)]