Fitting Markovian binary trees using global and individual - - PowerPoint PPT Presentation

fitting markovian binary trees using global and
SMART_READER_LITE
LIVE PREVIEW

Fitting Markovian binary trees using global and individual - - PowerPoint PPT Presentation

Motivation in population biology The MBT model Global data Individual data Numerical results Fitting Markovian binary trees using global and individual population data Sophie Hautphenne * joint work with Melanie Massaro and Kate Turner *The


slide-1
SLIDE 1

Motivation in population biology The MBT model Global data Individual data Numerical results

Fitting Markovian binary trees using global and individual population data

Sophie Hautphenne*

joint work with Melanie Massaro and Kate Turner

*The University of Melbourne & EPFL

MAM9 Budapest, 28–30 June 2016

1

slide-2
SLIDE 2

Motivation in population biology The MBT model Global data Individual data Numerical results

1

Motivation in population biology

2

The MBT model

3

Global data

4

Individual data

5

Numerical results

2

slide-3
SLIDE 3

Motivation in population biology The MBT model Global data Individual data Numerical results

Motivation : Modelling the black robin population

The Chatham Islands

Picture: Melanie Massaro

3

slide-4
SLIDE 4

Motivation in population biology The MBT model Global data Individual data Numerical results

Saving the world’s most endangered bird

By 1980, the population of black robins had declined to five birds, including only a single successful breeding pair. In 1980-1989, intensive conservation efforts helped the population recover to 93 birds by 1990 1. In 1990-1998, the population was closely monitored, but without human intervention, and continued to grow rapidly to 197 adults by 1998. From 1999, the population growth slowed considerably and it only reached about 250 adults in 2014.

  • 1. D. Butler and D. Merton. The Black Robin : Saving the World’s Most

Endangered Bird. Oxford University Press, Auckland (1992)

4

slide-5
SLIDE 5

Motivation in population biology The MBT model Global data Individual data Numerical results

Motivation : Modelling the black robin population

We work in collaboration with field biologists who have been collecting raw data on the population for more than 30 years We want to fit branching processes to these data to make an age-specific demographic analysis of the population

5

slide-6
SLIDE 6

Motivation in population biology The MBT model Global data Individual data Numerical results

1

Motivation in population biology

2

The MBT model

3

Global data

4

Individual data

5

Numerical results

6

slide-7
SLIDE 7

Motivation in population biology The MBT model Global data Individual data Numerical results

Markovian binary trees : Trade-off between realism and tractability

Linear birth-and-death process : Lifetimes follow an exponential distribution Reproduction occurs according to a Poisson process Not realistic enough ! Bellman-Harris branching processes : Lifetimes follow an arbitrary distribution Reproduction occurs according to a more general process Not tractable enough ! We consider an intermediate type of branching process, called the Markovian binary tree (MBT), which is at the same time very general and tractable. In an MBT, individuals’ lifetime is structured into phases.

7

slide-8
SLIDE 8

Motivation in population biology The MBT model Global data Individual data Numerical results

Phase-structured lifetime

2 3 1 2 3 1 Parent Child 3 types of transitions:

  • ``Hidden’’ transitions
  • Birth
  • Death

8

slide-9
SLIDE 9

Motivation in population biology The MBT model Global data Individual data Numerical results

The individuals’ lifetime in an MBT

Lifetime controlled by an underlying Markov process with n transient phases and one absorbing phase ; αi i i → j (D0)ij di i → 0 i → k

✉ s

. . . j Bi,jk α : initial phase distribution (1 × n vector) ; D0 : hidden phase transition rates (n × n matrix) ; B : transition rates associated with a birth (n × n2 matrix) ; d : transition rates associated with the death (n × 1 vector). n phases → n3 + n2 + n − 1 parameters !

9

slide-10
SLIDE 10

Motivation in population biology The MBT model Global data Individual data Numerical results

Simple structure

2 3 1 Parent Child 2 3 1

µ2 λ1 ϒ1 µ1 µ3 ϒ2 λ3 λ2 µ2 ϒ1 µ1 µ3 ϒ2

Now, α1 = 1, (D0)i,i+1 = γi, Bi,1i = λi, di = µi n phases → 3n − 1 parameters γ1, . . . , γn−1, λ1, . . . , λn, µ1, . . . , µn

10

slide-11
SLIDE 11

Motivation in population biology The MBT model Global data Individual data Numerical results

Reproduction and lifetime in the simple MBT

In this MBT model, reproduction events occur according to a transient Markov modulated Poisson process, the lifetime of an individual has a Coxian phase-type distribution.

11

slide-12
SLIDE 12

Motivation in population biology The MBT model Global data Individual data Numerical results

Parameter fitting using population data

Aim : to fit the parameters of the MBT to different types of population data sets, distinguishing between

1 global population data : averaged age-specific fertility and

mortality rates over the whole population, or

2 individual population data : age-specific fertility and mortality

counts per individual. The estimation method depends on the precision of the available data. We aim at choosing the optimal number of phases n using some validation methods. Once an estimator is known for the parameters, we derive confidence intervals for the model outcomes.

12

slide-13
SLIDE 13

Motivation in population biology The MBT model Global data Individual data Numerical results

1

Motivation in population biology

2

The MBT model

3

Global data

4

Individual data

5

Numerical results

13

slide-14
SLIDE 14

Motivation in population biology The MBT model Global data Individual data Numerical results

Global population data and non-linear regression

Assume that we have estimates of the mean mortality and fertility rates at age x, denoted respectively by dx and bx, for ages x = 0, 1, . . . , M. These are our data points. Example : black robin data, close monitoring period 1990-2001

1 2 3 4 5 6 7 8 9 10 11 12 0.2 0.4 0.6 0.8 1 Age x Mortality rate, dx

1 2 3 4 5 6 7 8 9 10 11 12 0.5 1 1.5 2

Age x Fertility rate, bx 14

slide-15
SLIDE 15

Motivation in population biology The MBT model Global data Individual data Numerical results

Mean age-specific fertility and mortality rates

Let L be the lifetime of an individual, and D = D0+diag(λ). We can show that the expressions for the mean mortality and fertility rates at age x in the MBT model are respectively given by ¯ d(x) = P(x < L ≤ x + 1 | L > x) = αeDx(I − eD)1 αeDx1 ¯ β(x) = E(N([x, x + 1)) | L > x) = αeDx(I − eD)(−D)−1λ αeDx1 These functions are non-linear in both the input variable x and in the parameters.

15

slide-16
SLIDE 16

Motivation in population biology The MBT model Global data Individual data Numerical results

Weighted non-linear least square estimates

We want the same MBT model to fit both the age-specific fertility and mortality data. The 3n − 1 parameters are estimated by minimizing the sum of weighted squared errors F =

M

  • x=0
  • (dx − ¯

d(x))2 + (βx − ¯ β(x))2 Sx, where Sx = (1 − d0)(1 − d1) · · · (1 − dx−1) is the observed probability of survival until age x.

16

slide-17
SLIDE 17

Motivation in population biology The MBT model Global data Individual data Numerical results

1

Motivation in population biology

2

The MBT model

3

Global data

4

Individual data

5

Numerical results

17

slide-18
SLIDE 18

Motivation in population biology The MBT model Global data Individual data Numerical results

Individual population data and maximum likelihood

Instead of using the mean age-specific mortality and fertility rates, we can do much better by directly exploiting the individual data counts We organise these data into samples of i.i.d. individual life vectors v(i) = [0, −2, 0, 2, 6, 2, 2, 5, 0, −1, −1, −1, −1], 1 ≤ i ≤ N, recording life and reproduction data for each age class (allowing for missing information)

18

slide-19
SLIDE 19

Motivation in population biology The MBT model Global data Individual data Numerical results

Maximum likelihood estimation

The log-likelihood function is L(θ; v(1), . . . , v(N)) =

N

  • j=1

log p(v(j)|θ), (1) where θ = {α, D0, λ, d}, and p(v(j)|θ) is the probability of

  • bserving v(j), under the model parameter θ.

The probabilities p(v(j)|θ) can be decomposed into products involving matrices P(k) for 1 ≤ k ≤ K, where Pij(k) = P[N(1) = k, ϕ(1) = j | N(0) = 0, ϕ(0) = i], and K = maxi,j{v(j)

i

: 1 ≤ i, 1 ≤ j ≤ N}.

19

slide-20
SLIDE 20

Motivation in population biology The MBT model Global data Individual data Numerical results

Computing P(k) (I)

Pij(k, t) := P[N(t) = k, ϕ(t) = j|N(0) = 0, ϕ(0) = i] P(k, t) = (Pij(k, t))1≤i,j≤n P∗(z, t) :=

k≥0 P(k, t)zk,

P(k, t) = ∂kP∗(z, t) k!(∂z)k

  • z=0

Kolmogorov equations : ∂P∗(z, t)/∂t = (D0 + zD1)P∗(z, t) → P∗(z, t) = exp[(D0 + zD1)t].

20

slide-21
SLIDE 21

Motivation in population biology The MBT model Global data Individual data Numerical results

Computing P(k) (II)

Successive differentiations w.r. to z :

∂P∗(z, t)/∂t = (D0 + zD1)P∗(z, t) ∂2P∗(z, t)/(∂t)(∂z) = D1P∗(z, t) + (D0 + zD1)∂P∗(z, t)/∂z . . . ∂(K+1)P∗(z, t)/(∂t)(∂z)K = KD1∂(K−1)P∗(z, t)/(∂z)(K−1) +(D0 + zD1)∂K P∗(z, t)/(∂z)K .

Equivalent to ∂tY (z, t) = A(z)Y (z, t) with Yi(z, t) = ∂i−1P∗(z, t)/(∂z)i−1 1 ≤ i ≤ K + 1, A(z) =   

(D0 + zD1) D1 (D0 + zD1) 2D1 (D0 + zD1) ... KD1 (D0 + zD1)

  

21

slide-22
SLIDE 22

Motivation in population biology The MBT model Global data Individual data Numerical results

Computing P(k) (III)

We obtain P(k) = P(k, 1) = 1 k!(ek ⊗ I) eM (e⊤

1 ⊗ I)

with M =        D0 D1 D0 2D1 D0 ... KD1 D0        .

22

slide-23
SLIDE 23

Motivation in population biology The MBT model Global data Individual data Numerical results

Optimal number of phases (I) : Akaike Information Criterion

We compute AIC = 2p − 2L(ˆ θ; v(1), . . . , v(N)), where p = 3n − 1 is the number of parameters, for different values

  • f n, and choose the value of n which minimizes the AIC.

23

slide-24
SLIDE 24

Motivation in population biology The MBT model Global data Individual data Numerical results

Optimal number of phases (II) : Cross-validation method

We perform a K-fold cross validation over the data sample of individual life vectors (with K = 5). The idea is to randomly divide the data into K equal-sized parts. We leave out part k, fit the model to the other K − 1 parts (combined), and then evaluate the likelihood of the left-out kth part under the estimated parameters. This is done in turn for each part k = 1, 2, ...K, and then the results are averaged. We choose the model which maximizes this averaged value.

24

slide-25
SLIDE 25

Motivation in population biology The MBT model Global data Individual data Numerical results

Confidence intervals

When the real model is unknown, we estimate point-wise confidence intervals for the model outcomes in an empirical way, using bootstrapped datasets, or in a theoretical way, using the fact that for any function g(t, θ), g(t, ˆ θ) ∼ N(g(t, θ), ∇g(t, θ) J−1 ∇g(t, θ)⊤), where J = −∂2L(θ) ∂θ∂θ⊤

  • θ=ˆ

θ

is the observed information matrix.

25

slide-26
SLIDE 26

Motivation in population biology The MBT model Global data Individual data Numerical results

1

Motivation in population biology

2

The MBT model

3

Global data

4

Individual data

5

Numerical results

26

slide-27
SLIDE 27

Motivation in population biology The MBT model Global data Individual data Numerical results

Toy Example 1 : Comparison of model fits

Example with n = 3 phases. We simulated a dataset of N = 250 life vectors and recorded results for the first 15 age classes.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0.2 0.4 0.6 0.8 1 Age Mortality rate 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 2 3 4 5 6 Age Fertility rate Average age−specific rates Initial model Real model Estimation from individual data Estimation from global data Average age−specific rates Initial model Real model Estimation from individual data Estimation from global data

27

slide-28
SLIDE 28

Motivation in population biology The MBT model Global data Individual data Numerical results

Toy Example 1 : Confidence intervals (I)

Mean and 95% pointwise CIs of the model fits corresponding to 50 simulations from the real model Global population data Individual population data

5 10 15 0.2 0.25 0.3 0.35 0.4 Age Mortality rates 5 10 15 0.2 0.25 0.3 0.35 0.4 Age Mortality rates 5 10 15 1 2 3 4 5 6 Age Fertility rates 5 10 15 2.5 3 3.5 4 4.5 5 5.5 Age Fertility rates Real mortality Mean−based estimation Real mortality Individual−based estimation Real fertility Mean−based estimation Real fertility Individual−based estimation

28

slide-29
SLIDE 29

Motivation in population biology The MBT model Global data Individual data Numerical results

Toy Example 1 : Confidence intervals (II)

Mean and 95% pointwise CIs of the model fits corresponding to 50 bootstrapped datasets generated from the first dataset Global population data Individual population data

5 10 15 0.1 0.2 0.3 0.4 0.5 Age Mortality rate 5 10 15 0.1 0.2 0.3 0.4 0.5 Age Mortality rate 5 10 15 1 2 3 4 5 6 Age Fertility rate 5 10 15 2 3 4 5 6 Age Fertility rate Real mortality Mean−based estimation Real mortality Individual−based estimation Real fertility Mean−based estimation Real fertility Individual−based estimation

29

slide-30
SLIDE 30

Motivation in population biology The MBT model Global data Individual data Numerical results

Toy Example 1 : Confidence intervals (III)

Theoretical CI in the individual population data case

1 2 3 4 5 6 7 8 9 10 11 12 13 14 0.1 0.2 0.3 0.4 0.5 Age Mortality rate 1 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 Age Fertility rate Real model Individual−based estimated model CI lower bound CI upper bound

30

slide-31
SLIDE 31

Motivation in population biology The MBT model Global data Individual data Numerical results

Toy Example 2 : Akaike Information Criterion

Example with n = 4 phases. We simulated a dataset of N = 500 life vectors and recorded results for the first 25 age classes. Frequency of optimal n according to AIC based on 40 simulations

1 2 3 4 5 6 7 8 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 Optimal n according to AIC Frequency

31

slide-32
SLIDE 32

Motivation in population biology The MBT model Global data Individual data Numerical results

Toy Example 2 : Cross-validation

Frequency of optimal n according to CV based on 20 simulations

1 2 3 4 5 6 7 8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Optimal n according to CV Frequency

32

slide-33
SLIDE 33

Motivation in population biology The MBT model Global data Individual data Numerical results

Black robin population : Optimal number of phases (I)

AIC optimal value : n = 8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1000 1200 1400 1600 1800 2000

n

AIC Intensive management period 1989−90 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4000 5000 6000 7000

n

AIC Close monitoring period 1990−2001 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1500 2000 2500 n AIC Monitoring period 2007−14 33

slide-34
SLIDE 34

Motivation in population biology The MBT model Global data Individual data Numerical results

Black robin population : Optimal number of phases (II)

Cross validation optimal value : n = 14, 11, 13

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 −250 −200 −150 −100 n Mean test likelihood Intensive management period 1989−90 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 −700 −600 −500 −400 n Mean test likelihood Close monitoring period 1990−2001 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 −300 −250 −200 −150 −100 n Mean test likelihood Monitoring period 2007−14 34

slide-35
SLIDE 35

Motivation in population biology The MBT model Global data Individual data Numerical results

Black robin population : Parameter estimation results (I)

Intensive management period 1980-89

1 2 3 4 5 6 7 8 9 10 11 12 0.5 1 Age Mortality rate Estimated age−specific mortality rates Individual−data estimated mortality 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 Age Fertility rate Estimated age−specific fertility rates Individual−data estimated fertility

35

slide-36
SLIDE 36

Motivation in population biology The MBT model Global data Individual data Numerical results

Black robin population : Parameter estimation results (II)

Close monitoring period 1990-2001

1 2 3 4 5 6 7 8 9 10 11 12 0.2 0.4 0.6 0.8 1 Age Mortality rate 1 2 3 4 5 6 7 8 9 10 11 12 0.5 1 1.5 2 Age Fertility rate Estimated age−specific mortality rates Individual−data estimated mortality Estimated age−specific fertility rates Individual−data estimated fertility

36

slide-37
SLIDE 37

Motivation in population biology The MBT model Global data Individual data Numerical results

Black robin population : Parameter estimation results (III)

Monitoring period 2007-2014

1 2 3 4 5 6 7 8 9 10 11 12 0.5 1 Age Mortality rate Estimated age−specific mortality rates Individual−data estimated mortality 1 2 3 4 5 6 7 8 9 10 11 12 0.5 1 1.5 Age Fertility rate Estimated age−specific fertility rates Individual−data estimated fertility

37

slide-38
SLIDE 38

Motivation in population biology The MBT model Global data Individual data Numerical results

Black robin population : Age-specific rates comparison

1 2 3 4 5 6 7 8 9 10 11 12 0.2 0.4 0.6 0.8 1 Age Mortality rate Intensive management period 1980 − 89 Monitored period 1990 − 2001 Monitored period 2007 − 2014 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 Age Fertility rate

38

slide-39
SLIDE 39

Motivation in population biology The MBT model Global data Individual data Numerical results

Black robin population : Extinction probability

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0.4 0.5 0.6 0.7 0.8 0.9 1 Age Extinction probability Intensive management period 1980 − 89 Monitored period 1990 − 2001 Monitored period 2007 − 14

39

slide-40
SLIDE 40

Motivation in population biology The MBT model Global data Individual data Numerical results

Thank you for your attention !

40