The second edition of my textbook, Stochastic Modelling for Systems Biology was published on 7th November, 2011. One of the new features introduced into the new edition is an R package called `smfsb` which contains all of the code examples discussed in the text, which allow modelling, simulation and inference for stochastic kinetic models. The `smfsb` R package is the main topic of this post, but it seems appropriate to start off the post with a quick introduction to the book, and the main new features of the second edition.

The first edition was published in April 2006. It provided an introduction to mathematical modelling for systems biology from a stochastic viewpoint. It began with an introduction to biochemical network modelling, then moved on to probability theory, stochastic simulation and Markov processes. After providing all of the necessary background material, the book then introduced the theory of stochastic kinetic modelling and the Gillespie algorithm for exact discrete stochastic event simulation of stochastic kinetic biochemical network models. This was followed by examples and case studies, advanced simulation algorithms, and then a brief introduction to Bayesian inference and its application to inference for stochastic kinetic models.

The first edition proved to be very popular, as it was the first self-contained introduction to the field, and was aimed at an audience without a strong quantitative background. The decision to target an applied audience meant that it contained only the bare essentials necessary to get started with stochastic modelling in systems biology. The second edition was therefore an opportunity not only to revise and update the existing material, but also to add in additional material, especially new material which could provide a more solid foundation for advanced study by students with a more mathematical focus. New material introduced into the second edition includes a greatly expanded chapter on Markov processes, with particular emphasis on diffusion processes and stochastic differential equations, as well as Kolmogorov equations, the Fokker-Planck equation (FPE), Kurtz’s random time change representation of a stochastic kinetic model, an additional derivation of the chemical Langevin equation (CLE), and a derivation of the linear noise approximation (LNA). There is now also discussion of the modelling of “extrinsic” in addition to “intrinsic” noise. The final chapters on inference have also been greatly expanded, including discussion of importance resampling, particle filters, pseudo-marginal “exact approximate” MCMC, likelihood-free techniques and particle MCMC for rate parameter inference. I have tried as far as possible to maintain the informal and accessible style of the first edition, and a couple of the more technical new sections have been flagged as “skippable” by less mathematically trained students. In terms of computing, all of the SBML models have been updated to the new Level 3 specification, and all of the R code has been re-written, extended, documented and packaged as an open source R package. The rest of this post is an introduction to the R package. Although the R package is aimed mainly at owners of the second edition, it is well documented, and should therefore be usable by anyone with a reasonable background knowledge of the area. In particular, the R package should be very easy to use for anyone familiar with the first edition of the book. The introduction given here is closely based on the introductory vignette included with the package.

### smfsb: an R package for simulation and inference in stochastic kinetic models

#### Overview

The `smfsb` package provides all of the R code associated with the book, Wilkinson (2011). Almost all of the code is pure R code, intended to be inspected from the R command line. In order to keep the code short, clean and easily understood, there is almost no argument checking or other boilerplate code.

#### Installation

The package is available from CRAN, and it should therefore be possible to install from the R command prompt using

install.packages("smfsb")

from any machine with an internet connection.

The package is being maintained on R-Forge, and so it should always be possible to install the very latest nightly build from the R command prompt with

install.packages("smfsb",repos="http://r-forge.r-project.org")

but you should only do this if you have a good reason to, in order not to overload the R-Forge servers (not that I imagine downloads of this package are likely to overload the servers…).

Once installed, the package can be loaded ready for use with

library(smfsb)

#### Accessing documentation

I have tried to ensure that the package and all associated functions and datasets are properly documented with runnable examples. So,

help(package="smfsb")

will give a brief overview of the package and a complete list of all functions. The list of vignettes associated with the package can be obtained with

vignette(package="smfsb")

At the time of writing, the introductory vignette is the only one available, and can be accessed from the R command line with

vignette("smfsb",package="smfsb")

Help on functions can be obtained using the usual R mechanisms. For example, help on the function `StepGillespie` can be obtained with

?StepGillespie

and the associated example can be run with

example(StepGillespie)

The sourcecode for the function can be obtained by typing `StepGillespie` on a line by itself. In this case, it returns the following R code:

function (N) { S = t(N$Post - N$Pre) v = ncol(S) return(function(x0, t0, deltat, ...) { t = t0 x = x0 termt = t0 + deltat repeat { h = N$h(x, t, ...) h0 = sum(h) if (h0 < 1e-10) t = 1e+99 else if (h0 > 1e+06) { t = 1e+99 warning("Hazard too big - terminating simulation!") } else t = t + rexp(1, h0) if (t >= termt) return(x) j = sample(v, 1, prob = h) x = x + S[, j] } }) }

A list of demos associated with the package can be obtained with

demo(package="smfsb")

A list of data sets associated with the package can be obtained with

data(package="smfsb")

For example, the small table, `mytable` from the introduction to R in Chapter 4 can by loaded with

data(mytable)

After running this command, the data frame `mytable` will be accessible, and can be examined by typing

mytable

at the R command prompt.

#### Simulation of stochastic kinetic models

The main purpose of this package is to provide a collection of tools for building and simulating stochastic kinetic models. This can be illustrated using a simple Lotka-Volterra predator-prey system. First, consider the prey, and the predator as a stochastic network, viz

The first “reaction” represents predator reproduction, the second predator-prey interaction and the third predator death. We can write the stoichiometries of the reactions, together with the rate (or hazard) of each reaction, in tabular form as

Reaction | Pre | Post | Hazard | ||
---|---|---|---|---|---|

1 | 0 | 2 | 0 | ||

1 | 1 | 0 | 2 | ||

0 | 1 | 0 | 0 |

This can be encoded in R as a stochastic Petri net (SPN) using

# SPN for the Lotka-Volterra system LV=list() LV$Pre=matrix(c(1,0,1,1,0,1),ncol=2,byrow=TRUE) LV$Post=matrix(c(2,0,0,2,0,0),ncol=2,byrow=TRUE) LV$h=function(x,t,th=c(th1=1,th2=0.005,th3=0.6)) { with(as.list(c(x,th)),{ return(c(th1*x1, th2*x1*x2, th3*x2 )) }) }

This object could be created directly by executing

data(spnModels)

since the LV model is one of the standard demo models included with the package. Functions for simulating from the transition kernel of the Markov process defined by the SPN can be created easily by passing the SPN object into the appropriate constructor. For example, if simulation using the Gillespie algorithm is required, a simulation function can be created with

stepLV=StepGillespie(LV)

This resulting function (closure) can then be used to advance the state of the process. For example, to simulate the state of the process at time 1, given an initial condition of , at time 0, use

stepLV(c(x1=50,x2=100),0,1)

Alternatively, to simulate a realisation of the process on a regular time grid over the interval [0,100] in steps of 0.1 time units, use

out = simTs(c(x1=50,x2=100),0,100,0.1,stepLV) plot(out,plot.type="single",col=c(2,4))

which gives the resulting plot

See the help and runnable example for the function `StepGillespie` for further details, including some available alternative simulation algorithms, such as `StepCLE`.

#### Inference for stochastic kinetic models from time course data

Estimating the parameters of stochastic kinetic models using noisy time course measurements on some aspect of the system state is a very important problem. Wilkinson (2011) takes a Bayesian approach to the problem, using particle MCMC methodology. For this, a key aspect is the use of a particle filter to compute an unbiased estimate of marginal likelihood. This is accomplished using the function `pfMLLik`. Once a method is available for generating unbiased estimates for the marginal likelihood, this may be embedded into a fairly standard marginal Metropolis-Hastings algorithm for parameter estimation. See the help and runnable example for `pfMLLik` for further details, along with the particle MCMC demo, which can by run using `demo(PMCMC)`. I’ll discuss more about particle MCMC and rate parameter inference in the next post.