

SLIDE 1

Multilevel Monte Carlo for the continuous time Markov chain models arising in biology

David F. Anderson∗

∗anderson@math.wisc.edu

Department of Mathematics University of Wisconsin - Madison

Work is joint with Des Higham. MCQMC 2012, February 17th, 2012.

SLIDE 2

Overview

◮ The most common stochastic models of biochemical reaction systems are continuous time Markov chains.

◮ Common examples include:

  • 1. Gene regulatory networks.
  • 2. Models of viral infection.
  • 3. General population models (epidemic, predator-prey, etc.)

◮ These models are becoming dominant in molecular biology.

Problem: extend multi-level Monte Carlo to this setting.

SLIDE 3

Why?

  • 1. Why would this be useful?

◮ Short answer: the common simulation methods can be very computationally expensive.

◮ Many reactions happening over the time-frame of interest.

  • 2. Why is this non-trivial?

◮ In the SDE case, we have a useful representation:

X(t) = X(0) + ∫₀ᵗ b(X(s)) ds + ∫₀ᵗ σ(X(s)) dW(s),

Zℓ(t) = Zℓ(0) + ∫₀ᵗ b(Zℓ ∘ η(s)) ds + ∫₀ᵗ σ(Zℓ ∘ η(s)) dW(s).

◮ The two processes are coupled via the Brownian motion W.
◮ It is clear how to simulate the two processes with different time steps.
◮ We need a pathwise representation to get a good coupling that:

2.1 couples closely, and
2.2 is easy to analyze.

SLIDE 4

The Poisson process

A Poisson process, Y(·):
(a) Let {ξi} be i.i.d. exponential random variables with parameter one.
(b) Now, put points down on a line with spacings equal to the ξi.

[Figure: points laid on a time line with spacings ξ1, ξ2, ξ3, . . . , up to time t]

◮ Let Y(t) denote the number of points hit by time t.
◮ In the figure above, Y(t) = 6.
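This construction can be sketched directly in code. The following Python snippet (the editor's illustration, not part of the talk; all function names are made up) builds a unit-rate Poisson process from i.i.d. Exp(1) spacings and counts points up to time t:

```python
import random

def unit_poisson_jumps(t_max, rng):
    """Lay points on [0, t_max] with i.i.d. Exp(1) spacings xi_i."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0)  # xi_i ~ Exp(1)
        if t > t_max:
            return times
        times.append(t)

def Y(t, jump_times):
    """Y(t) = number of points hit by time t."""
    return sum(1 for s in jump_times if s <= t)

rng = random.Random(0)
t_max = 3.0
n = 20000
# Sanity check: E[Y(t)] = t for a unit-rate Poisson process.
mean = sum(len(unit_poisson_jumps(t_max, rng)) for _ in range(n)) / n
print(round(mean, 2))  # close to 3.0
```

The empirical mean of Y(3) over many paths should sit near 3, since a unit-rate process has E[Y(t)] = t.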

SLIDE 5

The Poisson process

Let Y be a unit-rate Poisson process and λ ≥ 0. Then Y(λt) is a Poisson process with parameter λ.

[Figure: the same time line, now traversed at rate λ up to λt]

There is no reason λ needs to be constant in time, in which case

Yλ(t) ≡ Y(∫₀ᵗ λ(s) ds)

is a non-homogeneous Poisson process with intensity λ(t).
SLIDE 6

Build up model: Random time change representation of Tom Kurtz

Consider the simple system A + B → C, where one molecule each of A and B is converted to one of C. Simple book-keeping: if X(t) = (XA(t), XB(t), XC(t))ᵀ gives the state at time t, then

X(t) = X(0) + R(t) (−1, −1, 1)ᵀ,

where

◮ R(t) is the number of times the reaction has occurred by time t, and
◮ X(0) is the initial condition.

SLIDE 7

Build up model: Random time change representation of Tom Kurtz

Assuming the intensity (or propensity) of the reaction is λ(X(s)) = κ XA(s) XB(s), we can model

R(t) = Y(∫₀ᵗ κ XA(s) XB(s) ds),

where Y is a unit-rate Poisson process. Hence

(XA(t), XB(t), XC(t))ᵀ ≡ X(t) = X(0) + (−1, −1, 1)ᵀ Y(∫₀ᵗ κ XA(s) XB(s) ds).
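Because this system has a single reaction channel, the next jump of Y(∫₀ᵗ λ(X(s)) ds) arrives after an Exp(λ(X)) waiting time, so the chain can be simulated exactly in a few lines. A minimal Python sketch (editor's illustration, with an arbitrary rate constant κ = 0.3):

```python
import random

def simulate_abc(xa, xb, xc, kappa, t_max, rng):
    """Exact simulation of A + B -> C with intensity kappa*XA*XB.
    Each firing applies the state-change vector (-1, -1, +1)."""
    path = [(0.0, (xa, xb, xc))]
    t = 0.0
    while xa > 0 and xb > 0:
        lam = kappa * xa * xb
        t += rng.expovariate(lam)  # next jump of Y(int_0^t lam ds)
        if t > t_max:
            break
        xa, xb, xc = xa - 1, xb - 1, xc + 1
        path.append((t, (xa, xb, xc)))
    return path

rng = random.Random(1)
path = simulate_abc(10, 8, 0, 0.3, 50.0, rng)
print(path[-1][1])  # B is limiting here, so the chain stops once XB = 0
```

Note the book-keeping identities XA(t) + XC(t) = XA(0) and XB(t) + XC(t) = XB(0) hold along every path, exactly as the representation above dictates.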
SLIDE 8

Build up model: Random time change representation of Tom Kurtz

Now consider a network of reactions involving d chemical species S1, . . . , Sd:

νk1 S1 + νk2 S2 + · · · + νkd Sd → ν′k1 S1 + ν′k2 S2 + · · · + ν′kd Sd.

◮ Denote the reaction vector by ζk = ν′k − νk.
◮ The intensity (or propensity) of the kth reaction is λk : Zᵈ≥0 → R.
◮ By analogy with before,

X(t) = X(0) + Σk Yk(∫₀ᵗ λk(X(s)) ds) ζk,

where the Yk are independent, unit-rate Poisson processes.

SLIDE 9

Example

Consider a model of gene transcription and translation:

G → G + M,      rate 25,     (Transcription)
M → M + P,      rate 1000,   (Translation)
P + P → D,      rate 0.001,  (Dimerization)
M → ∅,          rate 0.1,    (Degradation of mRNA)
P → ∅,          rate 1.      (Degradation of Protein)

Then, if X = (XM, XP, XD)ᵀ,

X(t) = X(0) + Y1(25t) (1, 0, 0)ᵀ
       + Y2(1000 ∫₀ᵗ XM(s) ds) (0, 1, 0)ᵀ
       + Y3(0.001 ∫₀ᵗ XP(s)(XP(s) − 1) ds) (0, −2, 1)ᵀ
       + Y4(0.1 ∫₀ᵗ XM(s) ds) (−1, 0, 0)ᵀ
       + Y5(∫₀ᵗ XP(s) ds) (0, −1, 0)ᵀ.
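A minimal stochastic simulation algorithm (Gillespie) sketch for this network, written by the editor as an illustration of the representation above (G = 1 is held fixed, since the gene only acts as a catalyst):

```python
import random

# Rates and state-change vectors for the five reactions above; the
# state is (XM, XP, XD) and the gene count G = 1 is treated as constant.
RATES = (25.0, 1000.0, 0.001, 0.1, 1.0)
ZETA = ((1, 0, 0), (0, 1, 0), (0, -2, 1), (-1, 0, 0), (0, -1, 0))

def intensities(x):
    m, p, _ = x
    return (RATES[0], RATES[1] * m, RATES[2] * p * (p - 1),
            RATES[3] * m, RATES[4] * p)

def ssa(x, t_max, rng):
    """Gillespie's algorithm: exponential holding times, then choose a
    reaction with probability proportional to its intensity."""
    t = 0.0
    while True:
        lam = intensities(x)
        lam0 = sum(lam)
        t += rng.expovariate(lam0)
        if t > t_max:
            return x
        u, acc = rng.random() * lam0, 0.0
        for k, lk in enumerate(lam):
            acc += lk
            if u <= acc:
                break
        x = tuple(xi + z for xi, z in zip(x, ZETA[k]))

rng = random.Random(2)
x_T = ssa((0, 0, 0), 1.0, rng)
print(x_T)  # (XM, XP, XD) at T = 1; XD is typically in the thousands
```

Each path of this model requires on the order of tens of thousands of jump events per unit of time, which is exactly the cost issue the rest of the talk addresses.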

SLIDE 10

Computing Expectations

Problem: approximate Ef(X(T)) to some desired tolerance ε > 0. Easy!

◮ Simulate the CTMC exactly,
◮ generate independent paths X[i](t) and use the unbiased estimator

μn = (1/n) Σ_{i=1}^{n} f(X[i](T)),

◮ stopping when the desired confidence interval is ± ε.

Total computational complexity = (cost per path) × (# paths) = O(N × ε⁻²).
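The crude Monte Carlo estimator with its 95% confidence interval can be sketched generically (editor's illustration; checked here on a toy sampler with a known mean rather than a CTMC, so the target E = 1 is an assumption of the example):

```python
import random

def crude_mc(sampler, n, rng):
    """Crude Monte Carlo: sample mean plus a 95% confidence half-width."""
    xs = [sampler(rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    half = 1.96 * (var / n) ** 0.5  # 95% CI half-width
    return mean, half

rng = random.Random(3)
# Toy target: f(X) = X with X ~ Exp(1), so Ef(X) = 1 exactly.
mean, half = crude_mc(lambda r: r.expovariate(1.0), 200000, rng)
print(f"{mean:.3f} +/- {half:.3f}")
```

The half-width shrinks like n^{-1/2}, which is where the ε⁻² factor in the complexity count comes from.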

SLIDE 11

Tau-leaping: Euler’s method

Recall:

X(t) = X(0) + Σk Yk(∫₀ᵗ λk(X(s)) ds) ζk.

Tau-leaping is essentially an Euler approximation of ∫₀ᵗ λk(X(s)) ds:

Z(h) = Z(0) + Σk Yk(∫₀ʰ λk(Z(s)) ds) ζk
     ≈ Z(0) + Σk Yk(λk(Z(0)) h) ζk
     = Z(0) + Σk Poisson(λk(Z(0)) h) ζk,

the last equality holding in distribution.
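A generic tau-leaping step can be sketched as follows (editor's illustration; the helper names and the toy birth-death network are made up, and the Poisson sampler is a simple arrival-counting one rather than a library routine):

```python
import random

def poisson(lam, rng):
    """Poisson(lam) by counting unit-rate arrivals (fine for small lam)."""
    if lam <= 0.0:
        return 0
    t, k = rng.expovariate(1.0), 0
    while t <= lam:
        k += 1
        t += rng.expovariate(1.0)
    return k

def tau_leap(x, intensities, zeta, h, n_steps, rng):
    """Explicit tau-leaping: freeze each intensity at the left endpoint
    of a step of length h and fire Poisson(lambda_k(Z)*h) copies of k."""
    for _ in range(n_steps):
        lam = intensities(x)
        jumps = [poisson(lk * h, rng) for lk in lam]
        x = tuple(xi + sum(j * z[i] for j, z in zip(jumps, zeta))
                  for i, xi in enumerate(x))
    return x

# Toy birth-death chain: 0 -> S at rate 10, S -> 0 at unit per-capita rate.
rng = random.Random(4)
reps = 400
finals = [tau_leap((0,), lambda x: (10.0, 1.0 * x[0]), ((1,), (-1,)),
                   0.05, 200, rng)[0] for _ in range(reps)]
mean = sum(finals) / reps
print(round(mean, 1))  # the stationary mean of this chain is 10
```

Note this sketch does nothing to prevent negative copy numbers; that is one of the tau-leaping drawbacks discussed below.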
SLIDE 12

Euler’s method

The path-wise representation for Z(t) generated by Euler's method is

Z(t) = X(0) + Σk Yk(∫₀ᵗ λk(Z ∘ η(s)) ds) ζk,

where η(s) = tn if tn ≤ s < tn+1 = tn + h is the step function giving the left endpoints of the time discretization.

SLIDE 13

Return to approximating Ef(X(T))

Let

μn = (1/n) Σ_{i=1}^{n} f(ZL,[i](T)).

We have

MSE(μn) = (Bias)² + (Variance).

If we have an order one method,

Total computational complexity = (cost per path) × (# paths) = O(ε⁻¹ × ε⁻²) = O(ε⁻³).

SLIDE 14

Benefits/drawbacks

Benefits:

  • 1. Can drastically lower the computational complexity of a problem.

Drawbacks:

  • 1. Convergence results usually give only an order of convergence; they cannot give a precise hL.
  • 2. Tau-leaping has problems: what happens if a species count goes negative?
  • 3. We have given up an unbiased estimator.
SLIDE 15

Multi-level Monte Carlo

Usual MLMC:

Ef(X(t)) ≈ E[f(ZL(t)) − f(ZL−1(t))] + Σ_{ℓ=ℓ0+1}^{L−1} E[f(Zℓ(t)) − f(Zℓ−1(t))] + Ef(Zℓ0(t)).

In our setting:

Ef(X(t)) = E[f(X(t)) − f(ZL(t))] + Σ_{ℓ=ℓ0+1}^{L} E[f(Zℓ(t)) − f(Zℓ−1(t))] + Ef(Zℓ0(t)),

which gives an unbiased estimator.

SLIDE 16

Multi-level Monte Carlo: an unbiased estimator

Ef(X(t)) = E[f(X(t)) − f(ZL(t))] + Σ_{ℓ=ℓ0+1}^{L} E[f(Zℓ(t)) − f(Zℓ−1(t))] + Ef(Zℓ0(t)).

Estimators:

QE := (1/nE) Σ_{i=1}^{nE} (f(X[i](T)) − f(ZL,[i](T))),

Qℓ := (1/nℓ) Σ_{i=1}^{nℓ} (f(Zℓ,[i](T)) − f(Zℓ−1,[i](T))), for ℓ ∈ {ℓ0 + 1, . . . , L},

Q0 := (1/n0) Σ_{i=1}^{n0} f(Zℓ0,[i](T)).

Question: what is the coupling, and what is the computational cost?

SLIDE 17

How do we couple Poisson processes

Suppose I want to generate:

◮ a Poisson process with intensity 13.1, and
◮ a Poisson process with intensity 13.

We could let Y1 and Y2 be independent, unit-rate Poisson processes and set

Z13.1(t) = Y1(13.1 t),    Z13(t) = Y2(13 t).

Using this representation, the two processes are independent and, hence, not coupled. The variance of the difference is large:

Var(Z13.1(t) − Z13(t)) = Var(Y1(13.1 t)) + Var(Y2(13 t)) = 26.1 t.

SLIDE 18

How do we generate processes simultaneously

Suppose I want to generate:

◮ a Poisson process with intensity 13.1, and
◮ a Poisson process with intensity 13.

We could instead let Y1 and Y2 be independent, unit-rate Poisson processes and set

Z13.1(t) = Y1(13 t) + Y2(0.1 t),    Z13(t) = Y1(13 t).

The variance of the difference is much smaller:

Var(Z13.1(t) − Z13(t)) = Var(Y2(0.1 t)) = 0.1 t.
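The two constructions can be compared numerically. This Python sketch (editor's illustration; the arrival-counting Poisson sampler is an assumption of the example) estimates Var(Z13.1(1) − Z13(1)) both ways, where the true values are 26.1 and 0.1:

```python
import random

def poisson(lam, rng):
    """Poisson(lam) by counting unit-rate arrivals."""
    if lam <= 0.0:
        return 0
    t, k = rng.expovariate(1.0), 0
    while t <= lam:
        k += 1
        t += rng.expovariate(1.0)
    return k

def sample_var(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / (len(samples) - 1)

rng = random.Random(5)
n, t = 20000, 1.0

# Independent construction: Z13.1 = Y1(13.1 t), Z13 = Y2(13 t).
indep = [poisson(13.1 * t, rng) - poisson(13.0 * t, rng) for _ in range(n)]

# Coupled construction: Z13.1 = Y1(13 t) + Y2(0.1 t), Z13 = Y1(13 t),
# so the difference is just Y2(0.1 t).
coupled = []
for _ in range(n):
    y1 = poisson(13.0 * t, rng)
    y2 = poisson(0.1 * t, rng)
    coupled.append((y1 + y2) - y1)

var_i, var_c = sample_var(indep), sample_var(coupled)
print(round(var_i, 1), round(var_c, 2))  # near 26.1 and 0.1
```

The shared stream Y1 carries all the common randomness, which is exactly what makes the coupled difference cheap to estimate.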

SLIDE 19

How do we generate processes simultaneously

More generally, suppose we want

  • 1. a non-homogeneous Poisson process with intensity f(t), and
  • 2. a non-homogeneous Poisson process with intensity g(t).

Let Y1, Y2, and Y3 be independent, unit-rate Poisson processes and define

Zf(t) = Y1(∫₀ᵗ f(s) ∧ g(s) ds) + Y2(∫₀ᵗ f(s) − (f(s) ∧ g(s)) ds),

Zg(t) = Y1(∫₀ᵗ f(s) ∧ g(s) ds) + Y3(∫₀ᵗ g(s) − (f(s) ∧ g(s)) ds).
SLIDE 20

Back to our processes

X(t) = X(0) + Σk Yk(∫₀ᵗ λk(X(s)) ds) ζk,

Z(t) = X(0) + Σk Yk(∫₀ᵗ λk(Z ∘ η(s)) ds) ζk.

Now couple:

X(t) = X(0) + Σk Yk,1(∫₀ᵗ λk(X(s)) ∧ λk(Zℓ ∘ ηℓ(s)) ds) ζk
            + Σk Yk,2(∫₀ᵗ λk(X(s)) − λk(X(s)) ∧ λk(Zℓ ∘ ηℓ(s)) ds) ζk,

Zℓ(t) = Zℓ(0) + Σk Yk,1(∫₀ᵗ λk(X(s)) ∧ λk(Zℓ ∘ ηℓ(s)) ds) ζk
              + Σk Yk,3(∫₀ᵗ λk(Zℓ ∘ ηℓ(s)) − λk(X(s)) ∧ λk(Zℓ ∘ ηℓ(s)) ds) ζk.

The algorithm for simulating X is equivalent to simulating the CTMC exactly.

SLIDE 21

For approximate processes

Zℓ(t) = Zℓ(0) + Σk Yk,1(∫₀ᵗ λk(Zℓ ∘ ηℓ(s)) ∧ λk(Zℓ−1 ∘ ηℓ−1(s)) ds) ζk
              + Σk Yk,2(∫₀ᵗ λk(Zℓ ∘ ηℓ(s)) − λk(Zℓ ∘ ηℓ(s)) ∧ λk(Zℓ−1 ∘ ηℓ−1(s)) ds) ζk,

Zℓ−1(t) = Zℓ−1(0) + Σk Yk,1(∫₀ᵗ λk(Zℓ ∘ ηℓ(s)) ∧ λk(Zℓ−1 ∘ ηℓ−1(s)) ds) ζk
                  + Σk Yk,3(∫₀ᵗ λk(Zℓ−1 ∘ ηℓ−1(s)) − λk(Zℓ ∘ ηℓ(s)) ∧ λk(Zℓ−1 ∘ ηℓ−1(s)) ds) ζk.

The algorithm for simulation is equivalent to Euler's method. What about the variance and, hence, the computational cost?
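This level-to-level coupling can be sketched for tau-leaping (editor's illustration on a made-up birth-death toy model; the per-reaction split into a shared stream Y_{k,1} at rate λ_fine ∧ λ_coarse plus residual streams follows the display above):

```python
import random

def poisson(lam, rng):
    """Poisson(lam) by counting unit-rate arrivals (fine for small lam)."""
    if lam <= 0.0:
        return 0
    t, k = rng.expovariate(1.0), 0
    while t <= lam:
        k += 1
        t += rng.expovariate(1.0)
    return k

def coupled_pair(x0, intensities, zeta, h, M, n_coarse, rng):
    """One coupled pair (Z_fine, Z_coarse): fine step h, coarse step M*h.
    Per reaction k the shared stream carries rate min(lam_f, lam_c);
    the residual streams carry what is left on each side."""
    zf = zc = x0
    for _ in range(n_coarse):
        lam_c = intensities(zc)          # frozen over the coarse step
        for _ in range(M):
            lam_f = intensities(zf)      # frozen over the fine step
            for k, zk in enumerate(zeta):
                a = min(lam_f[k], lam_c[k])
                shared = poisson(a * h, rng)                    # Y_{k,1}
                fine_only = poisson((lam_f[k] - a) * h, rng)    # Y_{k,2}
                coarse_only = poisson((lam_c[k] - a) * h, rng)  # Y_{k,3}
                zf = tuple(v + (shared + fine_only) * zk[i]
                           for i, v in enumerate(zf))
                zc = tuple(v + (shared + coarse_only) * zk[i]
                           for i, v in enumerate(zc))
    return zf, zc

# Toy birth-death chain: 0 -> S at rate 10, S -> 0 at unit per-capita rate.
rng = random.Random(6)
diffs = []
for _ in range(1000):
    zf, zc = coupled_pair((0,), lambda x: (10.0, 1.0 * x[0]),
                          ((1,), (-1,)), 0.02, 3, 20, rng)
    diffs.append(zf[0] - zc[0])
m = sum(diffs) / len(diffs)
v = sum((d - m) ** 2 for d in diffs) / (len(diffs) - 1)
print(round(m, 2), round(v, 2))
```

The estimated variance of Z_fine(T) − Z_coarse(T) comes out far below what two independent paths would give, which is what makes the level estimators Qℓ cheap.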

SLIDE 22

Multi-level Monte Carlo: chemical kinetic setting

Can prove:

Theorem (Anderson, Higham 2011)
Suppose (X, Zℓ) satisfy the coupling. Then

sup_{t≤T} E|X(t) − Zℓ(t)|² ≤ C1(T) N^{−ρ} hℓ + C2(T) hℓ².

Theorem (Anderson, Higham 2011)
Suppose (Zℓ, Zℓ−1) satisfy the coupling. Then

sup_{t≤T} E|Zℓ(t) − Zℓ−1(t)|² ≤ C1(T) N^{−ρ} hℓ + C2(T) hℓ².

David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for stochastically modeled chemical kinetic systems. To appear in SIAM Multiscale Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.

SLIDE 23

Multi-level Monte Carlo: an unbiased estimator

For well-chosen n0, nℓ, and nE, we have

Var(Q) = Var(QE + Σ_{ℓ=ℓ0+1}^{L} Qℓ + Q0) = O(ε²),

with

Comp. cost = ε⁻² (N^{−ρ} hL + hL²) N + ε⁻² (h_{ℓ0}^{−1} + ln(ε)² N^{−ρ} + ln(ε⁻¹) (1/(M − 1)) h_{ℓ0}).

SLIDE 24

Multi-level Monte Carlo: an unbiased estimator

Some observations:

  • 1. The weak error plays no role in the analysis: we are free to choose hL.
  • 2. The common problems associated with tau-leaping,

◮ such as negativity of species numbers,

do not matter. Just define the approximate process in a sensible way.

  • 3. The method is unbiased.
SLIDE 25

Example

Consider a model of gene transcription and translation:

G → G + M,      rate 25,
M → M + P,      rate 1000,
P + P → D,      rate 0.001,
M → ∅,          rate 0.1,
P → ∅,          rate 1.

Suppose we:

  • 1. initialize with G = 1, M = 0, P = 0, D = 0,
  • 2. want to estimate the expected number of dimers at time T = 1,
  • 3. to an accuracy of ± 1.0 with 95% confidence.
SLIDE 26

Example

Method: exact algorithm with crude Monte Carlo.

  Approximation   | # paths   | CPU Time                    | # updates
  3,714.2 ± 1.0   | 4,740,000 | 149,000 CPU s (41 hours!)   | 8.27 × 10¹⁰

Method: Euler tau-leaping with crude Monte Carlo.

  Step-size | Approximation   | # paths   | CPU Time   | # updates
  h = 3⁻⁷   | 3,712.3 ± 1.0   | 4,750,000 | 13,374.6 s | 6.2 × 10¹⁰
  h = 3⁻⁶   | 3,707.5 ± 1.0   | 4,750,000 | 6,207.9 s  | 2.1 × 10¹⁰
  h = 3⁻⁵   | 3,693.4 ± 1.0   | 4,700,000 | 2,803.9 s  | 6.9 × 10⁹
  h = 3⁻⁴   | 3,654.6 ± 1.0   | 4,650,000 | 1,219.0 s  | 2.6 × 10⁹

Method: unbiased MLMC with ℓ0 = 2, and M and L detailed below.

  Step-size parameters | Approx.         | CPU Time  | # updates
  M = 3, L = 6         | 3,713.9 ± 1.0   | 1,063.3 s | 1.1 × 10⁹
  M = 3, L = 5         | 3,714.7 ± 1.0   | 1,114.9 s | 9.4 × 10⁸
  M = 3, L = 4         | 3,714.2 ± 1.0   | 1,656.6 s | 1.0 × 10⁹
  M = 4, L = 4         | 3,714.2 ± 1.0   | 1,334.8 s | 1.1 × 10⁹
  M = 4, L = 5         | 3,713.8 ± 1.0   | 1,014.9 s | 1.1 × 10⁹

◮ The exact algorithm with crude Monte Carlo demanded 140 times more CPU time than our unbiased MLMC estimator!

SLIDE 27

Example

Method: exact algorithm with crude Monte Carlo.

  Approximation   | # paths   | CPU Time                    | # updates
  3,714.2 ± 1.0   | 4,740,000 | 149,000 CPU s (41 hours!)   | 8.27 × 10¹⁰

Unbiased multi-level Monte Carlo with M = 3, L = 5, and ℓ0 = 2.

  Level                   | # paths   | CPU Time  | Var. estimator | # updates
  (X, Z_{3⁻⁵})            | 3,900     | 279.6 s   | 0.0658         | 6.8 × 10⁷
  (Z_{3⁻⁵}, Z_{3⁻⁴})      | 30,000    | 49.0 s    | 0.0217         | 8.8 × 10⁷
  (Z_{3⁻⁴}, Z_{3⁻³})      | 150,000   | 71.7 s    | 0.0179         | 1.5 × 10⁸
  (Z_{3⁻³}, Z_{3⁻²})      | 510,000   | 112.3 s   | 0.0319         | 1.7 × 10⁸
  Tau-leap with h = 3⁻²   | 8,630,000 | 518.4 s   | 0.1192         | 4.7 × 10⁸
  Totals                  | N.A.      | 1,031.0 s | 0.2565         | 9.5 × 10⁸

SLIDE 28

Some conclusions about this method

  • 1. Gillespie’s algorithm is by far the most common way to compute expectations:

1.1 means,
1.2 variances,
1.3 probabilities.

  • 2. The new method (MLMC) also performs this task with no bias (it is exact).
  • 3. It will commonly be many orders of magnitude faster than the usual methods.
  • 4. It is applicable to essentially all continuous time Markov chains:

X(t) = X(0) + Σk Yk(∫₀ᵗ λk(X(s)) ds) ζk.

  • 5. It makes no use of any specific structure or scaling in the problem.
SLIDE 29

Another example: Viral infection

Let

  • 1. T = viral template,
  • 2. G = viral genome,
  • 3. S = viral structure,
  • 4. V = virus.

Reactions:

  R1) T + stuff → T + G,   κ1 = 1
  R2) G → T,               κ2 = 0.025
  R3) T + stuff → T + S,   κ3 = 1000
  R4) T → ∅,               κ4 = 0.25
  R5) S → ∅,               κ5 = 2
  R6) G + S → V,           κ6 = 7.5 × 10⁻⁶

◮ R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
◮ E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
◮ K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
◮ W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.

SLIDE 30

Another example: Viral infection

The stochastic equations for X = (XG, XS, XT, XV) are

X1(t) = X1(0) + Y1(∫₀ᵗ X3(s) ds) − Y2(0.025 ∫₀ᵗ X1(s) ds) − Y6(7.5 × 10⁻⁶ ∫₀ᵗ X1(s) X2(s) ds),

X2(t) = X2(0) + Y3(1000 ∫₀ᵗ X3(s) ds) − Y5(2 ∫₀ᵗ X2(s) ds) − Y6(7.5 × 10⁻⁶ ∫₀ᵗ X1(s) X2(s) ds),

X3(t) = X3(0) + Y2(0.025 ∫₀ᵗ X1(s) ds) − Y4(0.25 ∫₀ᵗ X3(s) ds),

X4(t) = X4(0) + Y6(7.5 × 10⁻⁶ ∫₀ᵗ X1(s) X2(s) ds).
SLIDE 31

Another example: Viral infection

Reactions:

  R1) T + stuff → T + G,   κ1 = 1
  R2) G → T,               κ2 = 0.025
  R3) T + stuff → T + S,   κ3 = 1000
  R4) T → ∅,               κ4 = 0.25
  R5) S → ∅,               κ5 = 2
  R6) G + S → V,           κ6 = 7.5 × 10⁻⁶

If T > 0,

◮ reactions 3 and 5 are much faster than the others;
◮ by an M/M/∞ queue argument, S is approximately Poisson(500 × T).

We can average out S to get an approximate process Z(t).

SLIDE 32

Another example: Viral infection

The approximate process satisfies

Z1(t) = X1(0) + Y1(∫₀ᵗ Z3(s) ds) − Y2(0.025 ∫₀ᵗ Z1(s) ds) − Y6(3.75 × 10⁻³ ∫₀ᵗ Z1(s) Z3(s) ds),

Z3(t) = X3(0) + Y2(0.025 ∫₀ᵗ Z1(s) ds) − Y4(0.25 ∫₀ᵗ Z3(s) ds),

Z4(t) = X4(0) + Y6(3.75 × 10⁻³ ∫₀ᵗ Z1(s) Z3(s) ds).   (1)

Now use

Ef(X(t)) = E[f(X(t)) − f(Z(t))] + Ef(Z(t)).

SLIDE 33

Another example: Viral infection

The coupled pair is

X(t) = X(0)
  + Y1,1(∫₀ᵗ min{X3(s), Z3(s)} ds) ζ1 + Y1,2(∫₀ᵗ X3(s) − min{X3(s), Z3(s)} ds) ζ1
  + Y2,1(0.025 ∫₀ᵗ min{X1(s), Z1(s)} ds) ζ2 + Y2,2(0.025 ∫₀ᵗ X1(s) − min{X1(s), Z1(s)} ds) ζ2
  + Y3(1000 ∫₀ᵗ X3(s) ds) ζ3
  + Y4,1(0.25 ∫₀ᵗ min{X3(s), Z3(s)} ds) ζ4 + Y4,2(0.25 ∫₀ᵗ X3(s) − min{X3(s), Z3(s)} ds) ζ4
  + Y5(2 ∫₀ᵗ X2(s) ds) ζ5
  + Y6,1(∫₀ᵗ min{λ6(X(s)), Λ6(Z(s))} ds) ζ6 + Y6,2(∫₀ᵗ λ6(X(s)) − min{λ6(X(s)), Λ6(Z(s))} ds) ζ6,

Z(t) = Z(0)
  + Y1,1(∫₀ᵗ min{X3(s), Z3(s)} ds) ζ1 + Y1,3(∫₀ᵗ Z3(s) − min{X3(s), Z3(s)} ds) ζ1
  + Y2,1(0.025 ∫₀ᵗ min{X1(s), Z1(s)} ds) ζ2 + Y2,3(0.025 ∫₀ᵗ Z1(s) − min{X1(s), Z1(s)} ds) ζ2
  + Y4,1(0.25 ∫₀ᵗ min{X3(s), Z3(s)} ds) ζ4 + Y4,3(0.25 ∫₀ᵗ Z3(s) − min{X3(s), Z3(s)} ds) ζ4
  + Y6,1(∫₀ᵗ min{λ6(X(s)), Λ6(Z(s))} ds) ζ6 + Y6,3(∫₀ᵗ Λ6(Z(s)) − min{λ6(X(s)), Λ6(Z(s))} ds) ζ6.
SLIDE 34

Another example: Viral infection

Suppose we want E XV(20), given T(0) = 10 and all other species zero.

Method: exact algorithm with crude Monte Carlo.

  Approximation   | # paths | CPU Time       | # updates
  13.85 ± 0.07    | 75,000  | 24,800 CPU s   | 1.45 × 10¹⁰

Method: Ef(X(t)) = E[f(X(t)) − f(Z(t))] + Ef(Z(t)).

  Approximation   | CPU Time        | # updates
  13.91 ± 0.07    | 1,118.5 CPU s   | 2.41 × 10⁸

Exact + crude Monte Carlo used:

  • 1. 60 times more total steps,
  • 2. 22 times more CPU time.
SLIDE 35

Thanks!

References:

  • 1. David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics, to appear in SIAM Multiscale Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.

Funding:

  • 1. NSF grant DMS-1009275.
  • 2. Thanks to Art Owen.