Foundations of Chemical Kinetics Lecture 25: The Gillespie - - PowerPoint PPT Presentation
Foundations of Chemical Kinetics Lecture 25: The Gillespie - - PowerPoint PPT Presentation
Foundations of Chemical Kinetics Lecture 25: The Gillespie stochastic simulation algorithm Marc R. Roussel Department of Chemistry and Biochemistry The chemical master equation vs simulations As noted earlier, the chemical master equation
The chemical master equation vs simulations
◮ As noted earlier, the chemical master equation (CME) is only
solvable in some special cases.
◮ Probability distributions are a bit abstract for most people. ◮ Simulations can be carried out for any reaction and they’re
- ften easier to understand.
◮ Tradeoff: You need to do a lot of simulations to generate
statistical data (probability distributions or averages, standard deviations, etc.).
The basic ideas behind Gillespie stochastic simulations
◮ In a simulation, we follow one possible set of reactive events
and the consequent changes in molecular populations. This is called a realization of a stochastic process.
◮ In the simulation, at any time t, we have a specific
composition and therefore specific values of the propensities.
◮ The propensities therefore become reaction probabilities per
unit time (not conditional probabilities as in the CME).
◮ Generate random reactive events consistent with CME
statistics.
◮ Random reaction time ◮ Randomly selected reaction ◮ Every time a reaction occurs, update the numbers of
molecules. This affects the reaction propensities.
Selecting a random reaction time
◮ Each propensity is the probability per unit time that a specific
reaction occurs.
◮ The probability per unit time that any reaction occurs is just
the sum of the propensities. a0 ≡
- r
ar
◮ The probability per unit time of a reaction occurring is
constant until a reaction occurs.
◮ A constant probability per unit time implies exponential decay
- f the probability that a reaction has not occurred yet.
Punreacted = e−a0(t−tref) where tref is some reference time (e.g. the time at which the last reaction occurred).
Selecting a random reaction time (continued)
◮ The cumulative distribution for the probability of reaction is
therefore Preacted = 1 − e−a0(t−tref)
◮ The distribution of reaction times is therefore
p(t) = dPreacted dt = a0e−a0(t−tref)
◮ Thus, we need to generate exponentially distributed reaction
times.
Selecting a random reaction time (continued)
◮ Most programmable computer packages (from spreadsheets to
full-blown programming languages) have a uniform random number generator that generates pseudo-random numbers in the range [0, 1).
◮ There is a standard trick for converting a uniform random
number r1 into an exponentially distributed random number: treact = 1 a0 ln 1 r1
Selecting a random reaction
◮ Suppose that we have just two reactions. If reaction 1 has
probability per unit time (propensity) of a1 and reaction 2 has probability per unit time of a2, then the probability that the next reaction to occur is reaction 1 is a1/(a1 + a2).
◮ In general, the probability that the next reaction is reaction r
is ar/a0.
◮ To pick a random reaction, ◮ “Line up” the fractions ar/a0 in the interval [0, 1]. ◮ Choose a random number, r2, between 0 and 1. ◮ Figure out in which reaction interval r2 falls.
a a a r2 1
...
a
2
a a
3
a a
r 1
Selecting a random reaction
a a a r2 1
...
a
2
a a
3
a a
r 1 ◮ Using r2 as a “pointer” to determine which reaction occurs
next is equivalent to finding the value of µ for which the following inequality is satisfied:
µ−1
- r=1
ar < r2a0 ≤
µ
- r=1
ar
◮ Alternatively, we are looking for the smallest µ for which
µ
- r=1
ar > r2a0
The Gillespie algorithm
- 1. Setup: Store initial populations and rate constants, set t = 0,
etc.
- 2. Calculate reaction propensities.
- 3. Generate two uniform random numbers, r1 and r2.
- 4. Calculate treact, the time to next reaction, using r1.
- 5. Determine the next reaction using r2.
- 6. Add treact to t.
- 7. Update the populations based on the reaction chosen.
- 8. Go to step 2 until some chosen stopping criterion is reached
(exhaustion of a chemical, target simulation time reached, . . . )
Statistics
◮ If we want the kind of information we can get from the CME,
we need to repeat the simulation many times (with different sequences of pseudo-random numbers each time), and then average across realizations.
◮ If we are interested in the stationary distribution, we can also
run one simulation for a very long time and get a time average.
◮ The two procedures described above are equivalent if a
stochastic system has the property of ergodicity. A Markov chain (Markov process on a set of discrete states) is ergodic if any state can be reached from any other state in some finite number of steps.
Statistics
◮ Whichever procedure we adopt, we need a lot of data to get
reasonable estimates of system statistics: Convergence of statistics goes as N−1/2, where N is the number of samples (time points or realizations). Adding a significant figure (decrease in uncertainty by a factor
- f 10) therefore requires that we increase the amount of data