Stochastic reaction-diffusion modelling

Introduction

There is a fairly large literature on reaction-diffusion modelling using partial differential equations (PDEs). There is also a fairly large literature on stochastic modelling of coupled chemical reactions, which account for the discreteness of reacting species at low concentrations. There is some literature on combining the two, to form stochastic reaction-diffusion systems, but much less.

In this post we will look at one approach to the stochastic reaction-diffusion problem, based on an underlying stochastic process often described by the reaction diffusion master equation (RDME). We will start by generating exact realisations from this process using the spatial Gillespie algorithm, before switching to a continuous stochastic approximation often known as the spatial chemical Langevin equation (spatial CLE). For fine discretisations, this spatial CLE is just an explicit numerical scheme for an associated reaction-diffusion stochastic partial differential equation (SPDE), and we can easily contrast such SPDE dynamics with their deterministic PDE approximation. We will investigate using simulation, based on my Scala library, scala-smfsb, which accompanies the third edition of my textbook, Stochastic modelling for systems biology, as discussed in previous posts.

All of the code used to generate the plots and movies in this post is available in my blog repo, and is very simple to build and run.

The Lotka-Volterra reaction network

Exact simulation from the RDME

My favourite toy coupled chemical reaction network is the Lotka-Volterra predator-prey system, presented as the three reactions

X \longrightarrow 2X
X + Y \longrightarrow 2Y
Y \longrightarrow \emptyset

with X representing the prey species and Y the predator. I showed how to simulate realisations from this process using the Scala library in the previous post. Here we will consider simulation of this model in 2d, and simulate exact realisation from the appropriate RDME using the spatial Gillespie algorithm. Full runnable code for this simulation is here, but the key lines are:

val r = 100; val c = 120
val model = SpnModels.lv[IntState]()
val step = Spatial.gillespie2d(model, DenseVector(0.6, 0.6), maxH=1e12)
val x00 = DenseVector(0, 0)
val x0 = DenseVector(50, 100)
val xx00 = PMatrix(r, c, Vector.fill(r*c)(x00))
val xx0 = xx00.updated(c/2, r/2, x0)
val s = Stream.iterate(xx0)(step(_,0.0,0.1))

which sets up an infinite lazy Stream of states on a 100×120 grid over time steps of 0.1 units with diffusion rates of 0.6 for both species. We can then map this to a stream of images and visualise it using my scala-view library (described in this post). Running gives the following output:

Movie

The above image is the final frame of a movie which can be viewed by clicking on the image. In the simulation, blue represents the prey species, X, and red represents the predator, Y. The simulation is initialised with a few prey and predators in the central pixel. At each time step of the simulation, either a reaction or a diffusion event may occur. If diffusion occurs, an individual moves from its current location to one of the four adjacent pixels. This algorithm is extremely computationally intensive, however well it is implemented. The implementation used here (using the function Spatial.gillespie2d in the scala-smfsb library) is quite inefficient. A more efficient implementation would use the next subvolume method or similar algorithm. But since every reaction event is simulated sequentially, this algorithm is always going to be intolerably slow for most interesting problems.

The spatial CLE

The spatial CLE effectively approximates the true RDME dynamics with a set of coupled stochastic differential equations (SDEs) on the spatial grid. This can be interpreted as an explicit scheme for numerically integrating an SPDE. But this numerical scheme is much more efficient, allowing sensible time-stepping of the process, and vectorises and parallelises nicely. The details are in my book, but the Scala implementation is here. Diffusion is implemented efficiently and in parallel using the comonadic approach that I’ve described previously. We can quickly and easily generate large simulations using the spatial CLE. Here is a movie generated on a 250×300 grid.

Movie

Again, clicking on the frame should give the movie. We see that although the quantitative details are slightly different to the exact algorithm, the essential qualitative behaviour of the system is captured well by the spatial CLE. Full code for this simulation is here.

Reaction-diffusion PDE

If we remove all of the noise terms from the spatial CLE, we get a set of coupled ODEs, which again, may be interpreted as a numerical scheme for integrating a reaction-diffusion PDE model. Below are the dynamics on the same 250×300 grid.

Movie

It seems a bit harsh to describe a reaction-diffusion PDE as “boring”, but it certainly isn’t as interesting as the stochastic dynamics. Also, it has qualitatively quite different behaviour to the stochastic models, with wavefronts being less pronounced and less well separated. The code for this one is here.

Other initialisations

Instead of just seeding the simulation with some individuals in the central pixel, we can initialise 3 pixels. We can look first at a spatial CLE simulation.

Movie

Code here.

We can look at the same problem, but now using a PDE.

Movie

Code here.

Alternatively, we can initialise every pixel independently with random numbers of predator and prey. A movie for this is given below, following a short warm-up.

Movie

Code here.

Again, we can look at the corresponding deterministic integration.

Movie

Code here.

The SIR model

Let’s now turn attention to a spatial epidemic process model, the spatial susceptible-infectious-recovered model. Again, we’ll start from the discrete reaction formulation.

S + I \longrightarrow 2I
I \longrightarrow R

I’ll add this model to the next release of scala-smfsb, but in the meantime we can easily define it ourselves with:

def sir[S: State](p: DenseVector[Double] = DenseVector(0.1, 0.5)): Spn[S] =
  UnmarkedSpn[S](
    List("S", "I", "R"),
    DenseMatrix((1, 1, 0), (0, 1, 0)),
    DenseMatrix((0, 2, 0), (0, 0, 1)),
    (x, t) => {
      val xd = x.toDvd
      DenseVector(
        xd(0) * xd(1) * p(0), xd(1) * p(1)
      )}
  )

We can seed a simulation with a few infectious individuals in the centre of a roughly homogeneous population of susceptibles.

Spatial CLE

This time we’ll skip the exact simulation, since it’s very slow, and go straight to the spatial CLE. A simulation on a 250×300 grid is given below.

Movie

Here, green represents S, red I and blue R. In this simulation, I diffuses more slowly than S, and R doesn’t diffuse at all.
Code here.

PDE model

If we ditch the noise to get a reaction-diffusion PDE model, the dynamics are as follows.

Movie

Again, we see that the deterministic model is quite different to the stochastic version, and kind-of boring. Code here.

Further information

All of the code used to generate the plots and movies in this post is available in an easily runnable form in my blog repo. It is very easy to adapt the examples to vary parameters and initial conditions, and to study other reaction systems. Further details relating to stochastic reaction-diffusion modelling based on the RDME can be found in Chapter 9 of my textbook, Stochastic modelling for systems biology, third edition.

The scala-smfsb library

In the previous post I gave a very quick introduction to the smfsb R package. As mentioned in that post, although good for teaching and learning, R isn’t a great language for serious scientific computing or computational statistics. So for the publication of the third edition of my textbook, Stochastic modelling for systems biology, I have created a library in the Scala programming language replicating the functionality provided by the R package. Here I will give a very quick introduction to the scala-smfsb library. Some familiarity with both Scala and the smfsb R package will be helpful, but is not strictly necessary. Note that the library relies on the Scala Breeze library for linear algebra and probability distributions, so some familiarity with that library can also be helpful.

Setup

To follow the along you need to have Sbt installed, and this in turn requires a recent JDK. If you are new to Scala, you may find the setup page for my Scala course to be useful, but note that on many Linux systems it can be as simple as installing the packages openjdk-8-jdk and sbt.

Once you have Sbt installed, you should be able to run it by entering sbt at your OS command line. You now need to use Sbt to create a Scala REPL with a dependency on the scala-smfsb library. There are many ways to do this, but if you are new to Scala, the simplest way is probably to start up Sbt from an empty or temporary directory (which doesn’t contain any Scala code), and then paste the following into the Sbt prompt:

set libraryDependencies += "com.github.darrenjw" %% "scala-smfsb" % "0.6"
set libraryDependencies += "org.scalanlp" %% "breeze-viz" % "0.13.2"
set scalaVersion := "2.12.6"
set scalacOptions += "-Yrepl-class-based"
console

The first time you run this it will take a little while to download and cache various library dependencies. But everything is cached, so it should be much quicker in future. When it is finished, you should have a Scala REPL ready to enter Scala code.

An introduction to scala-smfsb

It should be possible to type or copy-and-paste the commands below one-at-a-time into the Scala REPL. We need to start with a few imports.

import smfsb._
import breeze.linalg.{Vector => BVec, _}
import breeze.numerics._
import breeze.plot._

Note that I’ve renamed Breeze’s Vector type to BVec to avoid clashing with that in the Scala standard library. We are now ready to go.

Simulating models

Let’s begin by instantiating a Lotka-Volterra model, simulating a single realisation of the process, and then plotting it.

// Simulate LV with Gillespie
val model = SpnModels.lv[IntState]()
val step = Step.gillespie(model)
val ts = Sim.ts(DenseVector(50, 100), 0.0, 20.0, 0.05, step)
Sim.plotTs(ts, "Gillespie simulation of LV model with default parameters")

The library comes with a few other models. There’s a Michaelis-Menten enzyme kinetics model:

// Simulate other models with Gillespie
val stepMM = Step.gillespie(SpnModels.mm[IntState]())
val tsMM = Sim.ts(DenseVector(301,120,0,0), 0.0, 100.0, 0.5, stepMM)
Sim.plotTs(tsMM, "Gillespie simulation of the MM model")

and an auto-regulatory genetic network model, for example.

val stepAR = Step.gillespie(SpnModels.ar[IntState]())
val tsAR = Sim.ts(DenseVector(10, 0, 0, 0, 0), 0.0, 500.0, 0.5, stepAR)
Sim.plotTs(tsAR, "Gillespie simulation of the AR model")

If you know the book and/or the R package, these models should all be familiar.
We are not restricted to exact stochastic simulation using the Gillespie algorithm. We can use an approximate Poisson time-stepping algorithm.

// Simulate LV with other algorithms
val stepPts = Step.pts(model)
val tsPts = Sim.ts(DenseVector(50, 100), 0.0, 20.0, 0.05, stepPts)
Sim.plotTs(tsPts, "Poisson time-step simulation of the LV model")

Alternatively, we can instantiate the example models using a continuous state rather than a discrete state, and then simulate using algorithms based on continous approximations, such as Euler-Maruyama simulation of a chemical Langevin equation (CLE) approximation.

val stepCle = Step.cle(SpnModels.lv[DoubleState]())
val tsCle = Sim.ts(DenseVector(50.0, 100.0), 0.0, 20.0, 0.05, stepCle)
Sim.plotTs(tsCle, "Euler-Maruyama/CLE simulation of the LV model")

If we want to ignore noise temporarily, there’s also a simple continuous deterministic Euler integrator built-in.

val stepE = Step.euler(SpnModels.lv[DoubleState]())
val tsE = Sim.ts(DenseVector(50.0, 100.0), 0.0, 20.0, 0.05, stepE)
Sim.plotTs(tsE, "Continuous-deterministic Euler simulation of the LV model")

Spatial stochastic reaction-diffusion simulation

We can do 1d reaction-diffusion simulation with something like:

val N = 50; val T = 40.0
val model = SpnModels.lv[IntState]()
val step = Spatial.gillespie1d(model,DenseVector(0.8, 0.8))
val x00 = DenseVector(0, 0)
val x0 = DenseVector(50, 100)
val xx00 = Vector.fill(N)(x00)
val xx0 = xx00.updated(N/2,x0)
val output = Sim.ts(xx0, 0.0, T, 0.2, step)
Spatial.plotTs1d(output)

For 2d simulation, we use PMatrix, a comonadic matrix/image type defined within the library, with parallelised map and coflatMap (cobind) operations. See my post on comonads for scientific computing for further details on the concepts underpinning this, though note that it isn’t necessary to understand comonads to use the library.

val r = 20; val c = 30
val model = SpnModels.lv[DoubleState]()
val step = Spatial.cle2d(model, DenseVector(0.6, 0.6), 0.05)
val x00 = DenseVector(0.0, 0.0)
val x0 = DenseVector(50.0, 100.0)
val xx00 = PMatrix(r, c, Vector.fill(r*c)(x00))
val xx0 = xx00.updated(c/2, r/2, x0)
val output = step(xx0, 0.0, 8.0)
val f = Figure("2d LV reaction-diffusion simulation")
val p0 = f.subplot(2, 1, 0)
p0 += image(PMatrix.toBDM(output map (_.data(0))))
val p1 = f.subplot(2, 1, 1)
p1 += image(PMatrix.toBDM(output map (_.data(1))))

Bayesian parameter inference

The library also includes functions for carrying out parameter inference for stochastic dynamical systems models, using particle MCMC, ABC and ABC-SMC. See the examples directory for further details.

Next steps

Having worked through this post, the next step is to work through the tutorial. There is some overlap of content with this blog post, but the tutorial goes into more detail regarding the basics. It also finishes with suggestions for how to proceed further.

Source

This post started out as a tut document (the Scala equivalent of an RMarkdown document). The source can be found here.

Stochastic Modelling for Systems Biology, second edition

The second edition of my textbook, Stochastic Modelling for Systems Biology was published on 7th November, 2011. One of the new features introduced into the new edition is an R package called smfsb which contains all of the code examples discussed in the text, which allow modelling, simulation and inference for stochastic kinetic models. The smfsb R package is the main topic of this post, but it seems appropriate to start off the post with a quick introduction to the book, and the main new features of the second edition.

The first edition was published in April 2006. It provided an introduction to mathematical modelling for systems biology from a stochastic viewpoint. It began with an introduction to biochemical network modelling, then moved on to probability theory, stochastic simulation and Markov processes. After providing all of the necessary background material, the book then introduced the theory of stochastic kinetic modelling and the Gillespie algorithm for exact discrete stochastic event simulation of stochastic kinetic biochemical network models. This was followed by examples and case studies, advanced simulation algorithms, and then a brief introduction to Bayesian inference and its application to inference for stochastic kinetic models.

The first edition proved to be very popular, as it was the first self-contained introduction to the field, and was aimed at an audience without a strong quantitative background. The decision to target an applied audience meant that it contained only the bare essentials necessary to get started with stochastic modelling in systems biology. The second edition was therefore an opportunity not only to revise and update the existing material, but also to add in additional material, especially new material which could provide a more solid foundation for advanced study by students with a more mathematical focus. New material introduced into the second edition includes a greatly expanded chapter on Markov processes, with particular emphasis on diffusion processes and stochastic differential equations, as well as Kolmogorov equations, the Fokker-Planck equation (FPE), Kurtz’s random time change representation of a stochastic kinetic model, an additional derivation of the chemical Langevin equation (CLE), and a derivation of the linear noise approximation (LNA). There is now also discussion of the modelling of “extrinsic” in addition to “intrinsic” noise. The final chapters on inference have also been greatly expanded, including discussion of importance resampling, particle filters, pseudo-marginal “exact approximate” MCMC, likelihood-free techniques and particle MCMC for rate parameter inference. I have tried as far as possible to maintain the informal and accessible style of the first edition, and a couple of the more technical new sections have been flagged as “skippable” by less mathematically trained students. In terms of computing, all of the SBML models have been updated to the new Level 3 specification, and all of the R code has been re-written, extended, documented and packaged as an open source R package. The rest of this post is an introduction to the R package. Although the R package is aimed mainly at owners of the second edition, it is well documented, and should therefore be usable by anyone with a reasonable background knowledge of the area. In particular, the R package should be very easy to use for anyone familiar with the first edition of the book. The introduction given here is closely based on the introductory vignette included with the package.

smfsb: an R package for simulation and inference in stochastic kinetic models

Overview

The smfsb package provides all of the R code associated with the book, Wilkinson (2011). Almost all of the code is pure R code, intended to be inspected from the R command line. In order to keep the code short, clean and easily understood, there is almost no argument checking or other boilerplate code.

Installation

The package is available from CRAN, and it should therefore be possible to install from the R command prompt using

install.packages("smfsb")

from any machine with an internet connection.

The package is being maintained on R-Forge, and so it should always be possible to install the very latest nightly build from the R command prompt with

install.packages("smfsb",repos="http://r-forge.r-project.org")

but you should only do this if you have a good reason to, in order not to overload the R-Forge servers (not that I imagine downloads of this package are likely to overload the servers…).

Once installed, the package can be loaded ready for use with

library(smfsb)

Accessing documentation

I have tried to ensure that the package and all associated functions and datasets are properly documented with runnable examples. So,

help(package="smfsb")

will give a brief overview of the package and a complete list of all functions. The list of vignettes associated with the package can be obtained with

vignette(package="smfsb")

At the time of writing, the introductory vignette is the only one available, and can be accessed from the R command line with

vignette("smfsb",package="smfsb")

Help on functions can be obtained using the usual R mechanisms. For example, help on the function StepGillespie can be obtained with

?StepGillespie

and the associated example can be run with

example(StepGillespie)

The sourcecode for the function can be obtained by typing StepGillespie on a line by itself. In this case, it returns the following R code:

function (N) 
{
    S = t(N$Post - N$Pre)
    v = ncol(S)
    return(function(x0, t0, deltat, ...) {
        t = t0
        x = x0
        termt = t0 + deltat
        repeat {
            h = N$h(x, t, ...)
            h0 = sum(h)
            if (h0 < 1e-10)
                t = 1e+99 
            else if (h0 > 1e+06) {
                t = 1e+99
                warning("Hazard too big - terminating simulation!")
            } 
            else 
                t = t + rexp(1, h0)
            if (t >= termt) 
                return(x)
            j = sample(v, 1, prob = h)
            x = x + S[, j]
        }
    })
}

A list of demos associated with the package can be obtained with

demo(package="smfsb")

A list of data sets associated with the package can be obtained with

data(package="smfsb")

For example, the small table, mytable from the introduction to R in Chapter 4 can by loaded with

data(mytable)

After running this command, the data frame mytable will be accessible, and can be examined by typing

mytable

at the R command prompt.

Simulation of stochastic kinetic models

The main purpose of this package is to provide a collection of tools for building and simulating stochastic kinetic models. This can be illustrated using a simple Lotka-Volterra predator-prey system. First, consider the prey, X_1 and the predator X_2 as a stochastic network, viz

R_1:\quad X_1 \longrightarrow 2 X_1
R_2:\quad X_1 + X_2\longrightarrow 2X_2
R_3:\quad X_2 \longrightarrow \emptyset.

The first “reaction” represents predator reproduction, the second predator-prey interaction and the third predator death. We can write the stoichiometries of the reactions, together with the rate (or hazard) of each reaction, in tabular form as

Reaction Pre Post Hazard
X_1 X_2 X_1 X_2 h()
R_1 1 0 2 0 \theta_1 x_1
R_2 1 1 0 2 \theta_2 x_1 x_2
R_3 0 1 0 0 \theta_3 x_2

This can be encoded in R as a stochastic Petri net (SPN) using

# SPN for the Lotka-Volterra system
LV=list()
LV$Pre=matrix(c(1,0,1,1,0,1),ncol=2,byrow=TRUE)
LV$Post=matrix(c(2,0,0,2,0,0),ncol=2,byrow=TRUE)
LV$h=function(x,t,th=c(th1=1,th2=0.005,th3=0.6))
{
 with(as.list(c(x,th)),{
         return(c(th1*x1, th2*x1*x2, th3*x2 ))
        })
}

This object could be created directly by executing

data(spnModels)

since the LV model is one of the standard demo models included with the package. Functions for simulating from the transition kernel of the Markov process defined by the SPN can be created easily by passing the SPN object into the appropriate constructor. For example, if simulation using the Gillespie algorithm is required, a simulation function can be created with

stepLV=StepGillespie(LV)

This resulting function (closure) can then be used to advance the state of the process. For example, to simulate the state of the process at time 1, given an initial condition of X_1=50, X_2=100 at time 0, use

stepLV(c(x1=50,x2=100),0,1)

Alternatively, to simulate a realisation of the process on a regular time grid over the interval [0,100] in steps of 0.1 time units, use

out = simTs(c(x1=50,x2=100),0,100,0.1,stepLV)
plot(out,plot.type="single",col=c(2,4))

which gives the resulting plot

See the help and runnable example for the function StepGillespie for further details, including some available alternative simulation algorithms, such as StepCLE.

Inference for stochastic kinetic models from time course data

Estimating the parameters of stochastic kinetic models using noisy time course measurements on some aspect of the system state is a very important problem. Wilkinson (2011) takes a Bayesian approach to the problem, using particle MCMC methodology. For this, a key aspect is the use of a particle filter to compute an unbiased estimate of marginal likelihood. This is accomplished using the function pfMLLik. Once a method is available for generating unbiased estimates for the marginal likelihood, this may be embedded into a fairly standard marginal Metropolis-Hastings algorithm for parameter estimation. See the help and runnable example for pfMLLik for further details, along with the particle MCMC demo, which can by run using demo(PMCMC). I’ll discuss more about particle MCMC and rate parameter inference in the next post.

References

  • Wilkinson, D. J. (2006) Stochastic Modelling for Systems Biology, Boca Raton, Florida: Chapman & Hall/CRC Press.
  • Wilkinson, D. J. (2011) Stochastic Modelling for Systems Biology, second edition, Boca Raton, Florida: Chapman & Hall/CRC Press.