Stochastic reaction-diffusion modelling

Introduction

There is a fairly large literature on reaction-diffusion modelling using partial differential equations (PDEs). There is also a fairly large literature on stochastic modelling of coupled chemical reactions, which account for the discreteness of reacting species at low concentrations. There is some literature on combining the two, to form stochastic reaction-diffusion systems, but much less.

In this post we will look at one approach to the stochastic reaction-diffusion problem, based on an underlying stochastic process often described by the reaction diffusion master equation (RDME). We will start by generating exact realisations from this process using the spatial Gillespie algorithm, before switching to a continuous stochastic approximation often known as the spatial chemical Langevin equation (spatial CLE). For fine discretisations, this spatial CLE is just an explicit numerical scheme for an associated reaction-diffusion stochastic partial differential equation (SPDE), and we can easily contrast such SPDE dynamics with their deterministic PDE approximation. We will investigate using simulation, based on my Scala library, scala-smfsb, which accompanies the third edition of my textbook, Stochastic modelling for systems biology, as discussed in previous posts.

All of the code used to generate the plots and movies in this post is available in my blog repo, and is very simple to build and run.

The Lotka-Volterra reaction network

Exact simulation from the RDME

My favourite toy coupled chemical reaction network is the Lotka-Volterra predator-prey system, presented as the three reactions

X \longrightarrow 2X
X + Y \longrightarrow 2Y
Y \longrightarrow \emptyset

with X representing the prey species and Y the predator. I showed how to simulate realisations from this process using the Scala library in the previous post. Here we will consider simulation of this model in 2d, and simulate exact realisation from the appropriate RDME using the spatial Gillespie algorithm. Full runnable code for this simulation is here, but the key lines are:

val r = 100; val c = 120
val model = SpnModels.lv[IntState]()
val step = Spatial.gillespie2d(model, DenseVector(0.6, 0.6), maxH=1e12)
val x00 = DenseVector(0, 0)
val x0 = DenseVector(50, 100)
val xx00 = PMatrix(r, c, Vector.fill(r*c)(x00))
val xx0 = xx00.updated(c/2, r/2, x0)
val s = Stream.iterate(xx0)(step(_,0.0,0.1))

which sets up an infinite lazy Stream of states on a 100×120 grid over time steps of 0.1 units with diffusion rates of 0.6 for both species. We can then map this to a stream of images and visualise it using my scala-view library (described in this post). Running gives the following output:

Movie

The above image is the final frame of a movie which can be viewed by clicking on the image. In the simulation, blue represents the prey species, X, and red represents the predator, Y. The simulation is initialised with a few prey and predators in the central pixel. At each time step of the simulation, either a reaction or a diffusion event may occur. If diffusion occurs, an individual moves from its current location to one of the four adjacent pixels. This algorithm is extremely computationally intensive, however well it is implemented. The implementation used here (using the function Spatial.gillespie2d in the scala-smfsb library) is quite inefficient. A more efficient implementation would use the next subvolume method or similar algorithm. But since every reaction event is simulated sequentially, this algorithm is always going to be intolerably slow for most interesting problems.

The spatial CLE

The spatial CLE effectively approximates the true RDME dynamics with a set of coupled stochastic differential equations (SDEs) on the spatial grid. This can be interpreted as an explicit scheme for numerically integrating an SPDE. But this numerical scheme is much more efficient, allowing sensible time-stepping of the process, and vectorises and parallelises nicely. The details are in my book, but the Scala implementation is here. Diffusion is implemented efficiently and in parallel using the comonadic approach that I’ve described previously. We can quickly and easily generate large simulations using the spatial CLE. Here is a movie generated on a 250×300 grid.

Movie

Again, clicking on the frame should give the movie. We see that although the quantitative details are slightly different to the exact algorithm, the essential qualitative behaviour of the system is captured well by the spatial CLE. Full code for this simulation is here.

Reaction-diffusion PDE

If we remove all of the noise terms from the spatial CLE, we get a set of coupled ODEs, which again, may be interpreted as a numerical scheme for integrating a reaction-diffusion PDE model. Below are the dynamics on the same 250×300 grid.

Movie

It seems a bit harsh to describe a reaction-diffusion PDE as “boring”, but it certainly isn’t as interesting as the stochastic dynamics. Also, it has qualitatively quite different behaviour to the stochastic models, with wavefronts being less pronounced and less well separated. The code for this one is here.

Other initialisations

Instead of just seeding the simulation with some individuals in the central pixel, we can initialise 3 pixels. We can look first at a spatial CLE simulation.

Movie

Code here.

We can look at the same problem, but now using a PDE.

Movie

Code here.

Alternatively, we can initialise every pixel independently with random numbers of predator and prey. A movie for this is given below, following a short warm-up.

Movie

Code here.

Again, we can look at the corresponding deterministic integration.

Movie

Code here.

The SIR model

Let’s now turn attention to a spatial epidemic process model, the spatial susceptible-infectious-recovered model. Again, we’ll start from the discrete reaction formulation.

S + I \longrightarrow 2I
I \longrightarrow R

I’ll add this model to the next release of scala-smfsb, but in the meantime we can easily define it ourselves with:

def sir[S: State](p: DenseVector[Double] = DenseVector(0.1, 0.5)): Spn[S] =
  UnmarkedSpn[S](
    List("S", "I", "R"),
    DenseMatrix((1, 1, 0), (0, 1, 0)),
    DenseMatrix((0, 2, 0), (0, 0, 1)),
    (x, t) => {
      val xd = x.toDvd
      DenseVector(
        xd(0) * xd(1) * p(0), xd(1) * p(1)
      )}
  )

We can seed a simulation with a few infectious individuals in the centre of a roughly homogeneous population of susceptibles.

Spatial CLE

This time we’ll skip the exact simulation, since it’s very slow, and go straight to the spatial CLE. A simulation on a 250×300 grid is given below.

Movie

Here, green represents S, red I and blue R. In this simulation, I diffuses more slowly than S, and R doesn’t diffuse at all.
Code here.

PDE model

If we ditch the noise to get a reaction-diffusion PDE model, the dynamics are as follows.

Movie

Again, we see that the deterministic model is quite different to the stochastic version, and kind-of boring. Code here.

Further information

All of the code used to generate the plots and movies in this post is available in an easily runnable form in my blog repo. It is very easy to adapt the examples to vary parameters and initial conditions, and to study other reaction systems. Further details relating to stochastic reaction-diffusion modelling based on the RDME can be found in Chapter 9 of my textbook, Stochastic modelling for systems biology, third edition.

The scala-smfsb library

In the previous post I gave a very quick introduction to the smfsb R package. As mentioned in that post, although good for teaching and learning, R isn’t a great language for serious scientific computing or computational statistics. So for the publication of the third edition of my textbook, Stochastic modelling for systems biology, I have created a library in the Scala programming language replicating the functionality provided by the R package. Here I will give a very quick introduction to the scala-smfsb library. Some familiarity with both Scala and the smfsb R package will be helpful, but is not strictly necessary. Note that the library relies on the Scala Breeze library for linear algebra and probability distributions, so some familiarity with that library can also be helpful.

Setup

To follow the along you need to have Sbt installed, and this in turn requires a recent JDK. If you are new to Scala, you may find the setup page for my Scala course to be useful, but note that on many Linux systems it can be as simple as installing the packages openjdk-8-jdk and sbt.

Once you have Sbt installed, you should be able to run it by entering sbt at your OS command line. You now need to use Sbt to create a Scala REPL with a dependency on the scala-smfsb library. There are many ways to do this, but if you are new to Scala, the simplest way is probably to start up Sbt from an empty or temporary directory (which doesn’t contain any Scala code), and then paste the following into the Sbt prompt:

set libraryDependencies += "com.github.darrenjw" %% "scala-smfsb" % "0.6"
set libraryDependencies += "org.scalanlp" %% "breeze-viz" % "0.13.2"
set scalaVersion := "2.12.6"
set scalacOptions += "-Yrepl-class-based"
console

The first time you run this it will take a little while to download and cache various library dependencies. But everything is cached, so it should be much quicker in future. When it is finished, you should have a Scala REPL ready to enter Scala code.

An introduction to scala-smfsb

It should be possible to type or copy-and-paste the commands below one-at-a-time into the Scala REPL. We need to start with a few imports.

import smfsb._
import breeze.linalg.{Vector => BVec, _}
import breeze.numerics._
import breeze.plot._

Note that I’ve renamed Breeze’s Vector type to BVec to avoid clashing with that in the Scala standard library. We are now ready to go.

Simulating models

Let’s begin by instantiating a Lotka-Volterra model, simulating a single realisation of the process, and then plotting it.

// Simulate LV with Gillespie
val model = SpnModels.lv[IntState]()
val step = Step.gillespie(model)
val ts = Sim.ts(DenseVector(50, 100), 0.0, 20.0, 0.05, step)
Sim.plotTs(ts, "Gillespie simulation of LV model with default parameters")

The library comes with a few other models. There’s a Michaelis-Menten enzyme kinetics model:

// Simulate other models with Gillespie
val stepMM = Step.gillespie(SpnModels.mm[IntState]())
val tsMM = Sim.ts(DenseVector(301,120,0,0), 0.0, 100.0, 0.5, stepMM)
Sim.plotTs(tsMM, "Gillespie simulation of the MM model")

and an auto-regulatory genetic network model, for example.

val stepAR = Step.gillespie(SpnModels.ar[IntState]())
val tsAR = Sim.ts(DenseVector(10, 0, 0, 0, 0), 0.0, 500.0, 0.5, stepAR)
Sim.plotTs(tsAR, "Gillespie simulation of the AR model")

If you know the book and/or the R package, these models should all be familiar.
We are not restricted to exact stochastic simulation using the Gillespie algorithm. We can use an approximate Poisson time-stepping algorithm.

// Simulate LV with other algorithms
val stepPts = Step.pts(model)
val tsPts = Sim.ts(DenseVector(50, 100), 0.0, 20.0, 0.05, stepPts)
Sim.plotTs(tsPts, "Poisson time-step simulation of the LV model")

Alternatively, we can instantiate the example models using a continuous state rather than a discrete state, and then simulate using algorithms based on continous approximations, such as Euler-Maruyama simulation of a chemical Langevin equation (CLE) approximation.

val stepCle = Step.cle(SpnModels.lv[DoubleState]())
val tsCle = Sim.ts(DenseVector(50.0, 100.0), 0.0, 20.0, 0.05, stepCle)
Sim.plotTs(tsCle, "Euler-Maruyama/CLE simulation of the LV model")

If we want to ignore noise temporarily, there’s also a simple continuous deterministic Euler integrator built-in.

val stepE = Step.euler(SpnModels.lv[DoubleState]())
val tsE = Sim.ts(DenseVector(50.0, 100.0), 0.0, 20.0, 0.05, stepE)
Sim.plotTs(tsE, "Continuous-deterministic Euler simulation of the LV model")

Spatial stochastic reaction-diffusion simulation

We can do 1d reaction-diffusion simulation with something like:

val N = 50; val T = 40.0
val model = SpnModels.lv[IntState]()
val step = Spatial.gillespie1d(model,DenseVector(0.8, 0.8))
val x00 = DenseVector(0, 0)
val x0 = DenseVector(50, 100)
val xx00 = Vector.fill(N)(x00)
val xx0 = xx00.updated(N/2,x0)
val output = Sim.ts(xx0, 0.0, T, 0.2, step)
Spatial.plotTs1d(output)

For 2d simulation, we use PMatrix, a comonadic matrix/image type defined within the library, with parallelised map and coflatMap (cobind) operations. See my post on comonads for scientific computing for further details on the concepts underpinning this, though note that it isn’t necessary to understand comonads to use the library.

val r = 20; val c = 30
val model = SpnModels.lv[DoubleState]()
val step = Spatial.cle2d(model, DenseVector(0.6, 0.6), 0.05)
val x00 = DenseVector(0.0, 0.0)
val x0 = DenseVector(50.0, 100.0)
val xx00 = PMatrix(r, c, Vector.fill(r*c)(x00))
val xx0 = xx00.updated(c/2, r/2, x0)
val output = step(xx0, 0.0, 8.0)
val f = Figure("2d LV reaction-diffusion simulation")
val p0 = f.subplot(2, 1, 0)
p0 += image(PMatrix.toBDM(output map (_.data(0))))
val p1 = f.subplot(2, 1, 1)
p1 += image(PMatrix.toBDM(output map (_.data(1))))

Bayesian parameter inference

The library also includes functions for carrying out parameter inference for stochastic dynamical systems models, using particle MCMC, ABC and ABC-SMC. See the examples directory for further details.

Next steps

Having worked through this post, the next step is to work through the tutorial. There is some overlap of content with this blog post, but the tutorial goes into more detail regarding the basics. It also finishes with suggestions for how to proceed further.

Source

This post started out as a tut document (the Scala equivalent of an RMarkdown document). The source can be found here.

The smfsb R package

Introduction

In the previous post I gave a brief introduction to the third edition of my textbook, Stochastic modelling for systems biology. The algorithms described in the book are illustrated by implementations in R. These implementations are collected together in an R package on CRAN called smfsb. This post will provide a brief introduction to the package and its capabilities.

Installation

The package is on CRAN – see the CRAN package page for details. So the simplest way to install it is to enter

install.packages("smfsb")

at the R command prompt. This will install the latest version that is on CRAN. Once installed, the package can be loaded with

library(smfsb)

The package is well-documented, so further information can be obtained with the usual R mechanisms, such as

vignette(package="smfsb")
vignette("smfsb")
help(package="smfsb")
?StepGillespie
example(StepCLE1D)

The version of the package on CRAN is almost certainly what you want. However, the package is developed on R-Forge – see the R-Forge project page for details. So the very latest version of the package can always be installed with

install.packages("smfsb", repos="http://R-Forge.R-project.org")

if you have a reason for wanting it.

A brief tutorial

The vignette gives a quick introduction the the library, which I don’t need to repeat verbatim here. If you are new to the package, I recommend working through that before continuing. Here I’ll concentrate on some of the new features associated with the third edition.

Simulating stochastic kinetic models

Much of the book is concerned with the simulation of stochastic kinetic models using exact and approximate algorithms. Although the primary focus of the text is the application to modelling of intra-cellular processes, the methods are also appropriate for population modelling of ecological and epidemic processes. For example, we can start by simulating a simple susceptible-infectious-recovered (SIR) disease epidemic model.

set.seed(2)
data(spnModels)

stepSIR = StepGillespie(SIR)
plot(simTs(SIR$M, 0, 8, 0.05, stepSIR),
  main="Exact simulation of the SIR model")

Exact simulation of the SIR epidemic model
The focus of the text is stochastic simulation of discrete models, so that is the obvious place to start. But there is also support for continuous deterministic simulation.

plot(simTs(SIR$M, 0, 8, 0.05, StepEulerSPN(SIR)),
  main="Euler simulation of the SIR model")

Euler simulation of the SIR model
My favourite toy population dynamics model is the Lotka-Volterra (LV) model, so I tend to use this frequently as a running example throughout the book. We can simulate this (exactly) as follows.

stepLV = StepGillespie(LV)
plot(simTs(LV$M, 0, 30, 0.2, stepLV),
  main="Exact simulation of the LV model")

Exact simulation of the Lotka-Volterra model

Stochastic reaction-diffusion modelling

The first two editions of the book were almost exclusively concerned with well-mixed systems, where spatial effects are ignorable. One of the main new features of the third edition is the inclusion of a new chapter on spatially extended systems. The focus is on models related to the reaction diffusion master equation (RDME) formulation, rather than individual particle-based simulations. For these models, space is typically divided into a regular grid of voxels, with reactions taking place as normal within each voxel, and additional reaction events included, corresponding to the diffusion of particles to adjacent voxels. So to specify such models, we just need an initial condition, a reaction model, and diffusion coefficients (one for each reacting species). So, we can carry out exact simulation of an RDME model for a 1D spatial domain as follows.

N=20; T=30
x0=matrix(0, nrow=2, ncol=N)
rownames(x0) = c("x1", "x2")
x0[,round(N/2)] = LV$M
stepLV1D = StepGillespie1D(LV, c(0.6, 0.6))
xx = simTs1D(x0, 0, T, 0.2, stepLV1D, verb=TRUE)
image(xx[1,,], main="Prey", xlab="Space", ylab="Time")

Discrete 1D simulation of the LV model

image(xx[2,,], main="Predator", xlab="Space", ylab="Time")

Discrete 1D simulation of the LV model
Exact simulation of discrete stochastic reaction diffusion systems is very expensive (and the reference implementation provided in the package is very inefficient), so we will often use diffusion approximations based on the CLE.

stepLV1DC = StepCLE1D(LV, c(0.6, 0.6))
xx = simTs1D(x0, 0, T, 0.2, stepLV1D)
image(xx[1,,], main="Prey", xlab="Space", ylab="Time")

Spatial CLE simulation of the 1D LV model

image(xx[2,,], main="Predator", xlab="Space", ylab="Time")

Spatial CLE simulation of the 1D LV model
We can think of this algorithm as an explicit numerical integration of the obvious SPDE approximation to the exact model.

The package also includes support for simulation of 2D systems. Again, we can use the Spatial CLE to speed things up.

m=70; n=50; T=10
data(spnModels)
x0=array(0, c(2,m,n))
dimnames(x0)[[1]]=c("x1", "x2")
x0[,round(m/2),round(n/2)] = LV$M
stepLV2D = StepCLE2D(LV, c(0.6,0.6), dt=0.05)
xx = simTs2D(x0, 0, T, 0.5, stepLV2D)
N = dim(xx)[4]
image(xx[1,,,N],main="Prey",xlab="x",ylab="y")

Spatial CLE simulation of the 2D LV model

image(xx[2,,,N],main="Predator",xlab="x",ylab="y")

Spatial CLE simulation of the 2D LV model

Bayesian parameter inference

Although much of the book is concerned with the problem of forward simulation, the final chapters are concerned with the inverse problem of estimating model parameters, such as reaction rate constants, from data. A computational Bayesian approach is adopted, with the main emphasis being placed on “likelihood free” methods, which rely on forward simulation to avoid explicit computation of sample path likelihoods. The second edition included some rudimentary code for a likelihood free particle marginal Metropolis-Hastings (PMMH) particle Markov chain Monte Carlo (pMCMC) algorithm. The third edition includes a more complete and improved implementation, in addition to approximate inference algorithms based on approximate Bayesian computation (ABC).

The key function underpinning the PMMH approach is pfMLLik, which computes an estimate of marginal model log-likelihood using a (bootstrap) particle filter. There is a new implementation of this function with the third edition. There is also a generic implementation of the Metropolis-Hastings algorithm, metropolisHastings, which can be combined with pfMLLik to create a PMMH algorithm. PMMH algorithms are very slow, but a full demo of how to use these functions for parameter inference is included in the package and can be run with

demo(PMCMC)

Simple rejection-based ABC methods are facilitated by the (very simple) function abcRun, which just samples from a prior and then carries out independent simulations in parallel before computing summary statistics. A simple illustration of the use of the function is given below.

data(LVdata)
rprior <- function() { exp(c(runif(1, -3, 3),runif(1,-8,-2),runif(1,-4,2))) }
rmodel <- function(th) { simTs(c(50,100), 0, 30, 2, stepLVc, th) }
sumStats <- identity
ssd = sumStats(LVperfect)
distance <- function(s) {
    diff = s - ssd
    sqrt(sum(diff*diff))
}
rdist <- function(th) { distance(sumStats(rmodel(th))) }
out = abcRun(10000, rprior, rdist)
q=quantile(out$dist, c(0.01, 0.05, 0.1))
print(q)
##       1%       5%      10% 
## 772.5546 845.8879 881.0573
accepted = out$param[out$dist < q[1],]
print(summary(accepted))
##        V1                V2                  V3         
##  Min.   :0.06498   Min.   :0.0004467   Min.   :0.01887  
##  1st Qu.:0.16159   1st Qu.:0.0012598   1st Qu.:0.04122  
##  Median :0.35750   Median :0.0023488   Median :0.14664  
##  Mean   :0.68565   Mean   :0.0046887   Mean   :0.36726  
##  3rd Qu.:0.86708   3rd Qu.:0.0057264   3rd Qu.:0.36870  
##  Max.   :4.76773   Max.   :0.0309364   Max.   :3.79220
print(summary(log(accepted)))
##        V1                V2               V3         
##  Min.   :-2.7337   Min.   :-7.714   Min.   :-3.9702  
##  1st Qu.:-1.8228   1st Qu.:-6.677   1st Qu.:-3.1888  
##  Median :-1.0286   Median :-6.054   Median :-1.9198  
##  Mean   :-0.8906   Mean   :-5.877   Mean   :-1.9649  
##  3rd Qu.:-0.1430   3rd Qu.:-5.163   3rd Qu.:-0.9978  
##  Max.   : 1.5619   Max.   :-3.476   Max.   : 1.3329

Naive rejection-based ABC algorithms are notoriously inefficient, so the library also includes an implementation of a more efficient, sequential version of ABC, often known as ABC-SMC, in the function abcSmc. This function requires specification of a perturbation kernel to “noise up” the particles at each algorithm sweep. Again, the implementation is parallel, using the parallel package to run the required simulations in parallel on multiple cores. A simple illustration of use is given below.

rprior <- function() { c(runif(1, -3, 3), runif(1, -8, -2), runif(1, -4, 2)) }
dprior <- function(x, ...) { dunif(x[1], -3, 3, ...) + 
    dunif(x[2], -8, -2, ...) + dunif(x[3], -4, 2, ...) }
rmodel <- function(th) { simTs(c(50,100), 0, 30, 2, stepLVc, exp(th)) }
rperturb <- function(th){th + rnorm(3, 0, 0.5)}
dperturb <- function(thNew, thOld, ...){sum(dnorm(thNew, thOld, 0.5, ...))}
sumStats <- identity
ssd = sumStats(LVperfect)
distance <- function(s) {
    diff = s - ssd
    sqrt(sum(diff*diff))
}
rdist <- function(th) { distance(sumStats(rmodel(th))) }
out = abcSmc(5000, rprior, dprior, rdist, rperturb,
    dperturb, verb=TRUE, steps=6, factor=5)
## 6 5 4 3 2 1
print(summary(out))
##        V1                V2               V3        
##  Min.   :-2.9961   Min.   :-7.988   Min.   :-3.999  
##  1st Qu.:-1.9001   1st Qu.:-6.786   1st Qu.:-3.428  
##  Median :-1.2571   Median :-6.167   Median :-2.433  
##  Mean   :-1.0789   Mean   :-6.014   Mean   :-2.196  
##  3rd Qu.:-0.2682   3rd Qu.:-5.261   3rd Qu.:-1.161  
##  Max.   : 2.1128   Max.   :-2.925   Max.   : 1.706

We can then plot some results with

hist(out[,1],30,main="log(c1)")

ABC-SMC posterior for the LV model

hist(out[,2],30,main="log(c2)")

ABC-SMC posterior for the LV model

hist(out[,3],30,main="log(c3)")

ABC-SMC posterior for the LV model

Although the inference methods are illustrated in the book in the context of parameter inference for stochastic kinetic models, their implementation is generic, and can be used with any appropriate parameter inference problem.

The smfsbSBML package

smfsbSBML is another R package associated with the third edition of the book. This package is not on CRAN due to its dependency on a package not on CRAN, and hence is slightly less straightforward to install. Follow the available installation instructions to install the package. Once installed, you should be able to load the package with

library(smfsbSBML)

This package provides a function for reading in SBML files and parsing them into the simulatable stochastic Petri net (SPN) objects used by the main smfsb R package. Examples of suitable SBML models are included in the main smfsb GitHub repo. An appropriate SBML model can be read and parsed with a command like:

model = sbml2spn("mySbmlModel.xml")

The resulting value, model is an SPN object which can be passed in to simulation functions such as StepGillespie for constructing stochastic simulation algorithms.

Other software

In addition to the above R packages, I also have some Python scripts for converting between SBML and the SBML-shorthand notation I use in the book. See the SBML-shorthand page for further details.

Although R is a convenient language for teaching and learning about stochastic simulation, it isn’t ideal for serious research-level scientific computing or computational statistics. So for the third edition of the book I have also developed scala-smfsb, a library written in the Scala programming language, which re-implements all of the models and algorithms from the third edition of the book in Scala, a fast, efficient, strongly-typed, compiled, functional programming language. I’ll give an introduction to this library in a subsequent post, but in the meantime, it is already well documented, so see the scala-smfsb repo for further details, including information on installation, getting started, a tutorial, examples, API docs, etc.

Source

This blog post started out as an RMarkdown document, the source of which can be found here.