sampler – Darren Wilkinson's blog

Introduction to the particle Gibbs sampler

Introduction

Particle MCMC (the use of approximate SMC proposals within exact MCMC algorithms) is arguably one of the most important developments in computational Bayesian inference of the 21st Century. The key concepts underlying these methods are described in a famously impenetrable “read paper” by Andrieu et al (2010). Probably the most generally useful method outlined in that paper is the particle marginal Metropolis-Hastings (PMMH) algorithm that I have described previously – that post is required preparatory reading for this one.

In this post I want to discuss some of the other topics covered in the pMCMC paper, leading up to a description of the particle Gibbs sampler. The basic particle Gibbs algorithm is arguably less powerful than PMMH for a few reasons, some of which I will elaborate on. But there is still a lot of active research concerning particle Gibbs-type algorithms, which are attempting to address some of the deficiencies of the basic approach. Clearly, in order to understand and appreciate the recent developments it is first necessary to understand the basic principles, and so that is what I will concentrate on here. I’ll then finish with some pointers to more recent work in this area.

PIMH

I will adopt the same approach and notation as for my post on the PMMH algorithm, using a simple bootstrap particle filter for a state space model as the SMC proposal. It is simplest to understand particle Gibbs first in the context of known static parameters, and so it is helpful to first reconsider the special case of the PMMH algorithm where there are no unknown parameters and only the state path, $x$ of the process is being updated. That is, we target $p(x|y)$ (for known, fixed, $\theta$ ) rather than $p(\theta,x|y)$ . This special case is known as the particle independent Metropolis-Hastings (PIMH) sampler.

Here we envisage proposing a new path $x_{0:T}^\star$ using a bootstrap filter, and then accepting the proposal with probability $\min\{1,A\}$ , where $A$ is the Metropolis-Hastings ratio

$\displaystyle A = \frac{\hat{p}(y_{1:T})^\star}{\hat{p}(y_{1:T})},$

where $\hat{p}(y_{1:T})^\star$ is the bootstrap filter’s estimate of marginal likelihood for the new path, and $\hat{p}(y_{1:T})$ is the estimate associated with the current path. Again using notation from the previous post it is clear that this ratio targets a distribution on the joint space of all simulated random variables proportional to

$\displaystyle \hat{p}(y_{1:T})\tilde{q}(\mathbf{x}_0,\ldots,\mathbf{x}_T,\mathbf{a}_0,\ldots,\mathbf{a}_{T-1})$

and that in this case the marginal distribution of the accepted path is exactly $p(x_{0:T}|y_{1:T})$ . Again, be sure to see the previous post for the explanation.

Conditional SMC update

So far we have just recapped the previous post in the case of known parameters, but it gives us insight in how to proceed. A general issue with Metropolis independence samplers in high dimensions is that they often exhibit “sticky” behaviour, whereby an unusually “good” accepted path is hard to displace. This motivates consideration of a block-Gibbs-style algorithm where updates are used that are always accepted. It is clear that simply running a bootstrap filter will target the particle filter distribution

$\tilde{q}(\mathbf{x}_0,\ldots,\mathbf{x}_T,\mathbf{a}_0,\ldots,\mathbf{a}_{T-1})$

and so the marginal distribution of the accepted path will be the approximate $\hat{p}(x_{0:T}|y_{1:T})$ rather than the exact conditional distribution $p(x_{0:T}|y_{1:T})$ . However, we know from consideration of the PIMH algorithm that what we really want to do is target the slightly modified distribution proportional to

$\displaystyle \hat{p}(y_{1:T})\tilde{q}(\mathbf{x}_0,\ldots,\mathbf{x}_T,\mathbf{a}_0,\ldots,\mathbf{a}_{T-1})$ ,

as this will lead to accepted paths with the exact marginal distribution. For the PIMH this modification is achieved using a Metropolis-Hastings correction, but we now try to avoid this by instead conditioning on the previously accepted path. For this target the accepted paths have exactly the required marginal distribution, so we now write the target as the product of the marginal for the current path times a conditional for all of the remaining variables.

$\displaystyle \frac{p(x_{0:T}^k|y_{1:T})}{M^T} \times \frac{M^T}{p(x_{0:T}^k|y_{1:T})} \hat{p}(y_{1:T})\tilde{q}(\mathbf{x}_0,\ldots,\mathbf{x}_T,\mathbf{a}_0,\ldots,\mathbf{a}_{T-1})$

where in addition to the correct marginal for $x$ we assume iid uniform ancestor indices. The important thing to note here is that the conditional distribution of the remaining variables simplifies to

$\displaystyle \frac{\tilde{q}(\mathbf{x}_0,\ldots,\mathbf{x}_T,\mathbf{a}_0,\ldots,\mathbf{a}_{T-1})} {\displaystyle p(x_0^{b_0^k})\left[\prod_{t=0}^{T-1} \pi_t^{b_t^k}p\left(x_{t+1}^{b_{t+1}^k}|x_t^{b_t^k}\right)\right]}$ .

The terms in the denominator are precisely the terms in the numerator corresponding to the current path, and hence “cancel out” the current path terms in the numerator. It is therefore clear that we can sample directly from this conditional distribution by running a bootstrap particle filter that includes the current path and which leaves the current path fixed. This is the conditional SMC (CSMC) update, which here is just a conditional bootstrap particle filter update. It is clear from the form of the conditional density how this filter must be constructed, but for completeness it is described below.

The bootstrap filter is run conditional on one trajectory. This is usually the trajectory sampled at the last run of the particle filter. The idea is that you do not sample new state or ancestor values for that one trajectory. Note that this guarantees that the conditioned on trajectory survives the filter right through to the final sweep of the filter at which point a new trajectory is picked from the current selection of $M$ paths, of which the conditioned-on trajectory is one.

Let $x_{1:T} = (x_1^{b_1},x_2^{b_2},\ldots,x_T^{b_T})$ be the path that is to be conditioned on, with ancestral lineage $b_{1:T}$ . Then, for $k\not= b_1$ , sample $x_0^k \sim p(x_0)$ and set $\pi_0^k=1/M$ . Now suppose that at time $t$ we have a weighted sample from $p(x_t|y_{1:t})$ . First resample by sampling $a_t^k\sim \mathcal{F}(a_t^k|\boldsymbol{\pi}_t),\ \forall k\not= b_t$ . Next sample $x_{t+1}^k\sim p(x_{t+1}^k|x_t^{a_t^k}),\ \forall k\not=b_t$ . Then for all $k$ set $w_{t+1}^k=p(y_{t+1}|x_{t+1}^k)$ and normalise with $\pi_{t+1}^k=w_{t+1}^k/\sum_{i=1}^M w_{t+1}^i$ . Propagate this weighted set of particles to the next time point. At time $T$ select a single trajectory by sampling $k'\sim \mathcal{F}(k'|\boldsymbol{\pi}_T)$ .

This defines a block Gibbs sampler which updates $2(M-1)T+1$ of the $2MT+1$ random variables in the augmented state space at each iteration. Since the block of variables to be updated is random, this defines an ergodic sampler for $M\geq2$ particles, and we have explained why the marginal distribution of the selected trajectory is the exact conditional distribution.

Before going on to consider the introduction of unknown parameters, it is worth considering the limitations of this method. One of the main motivations for considering a Gibbs-style update was concern about the “stickiness” of a Metropolis independence sampler. However, it is clear that conditional SMC updates also have the potential to stick. For a large number of time points, particle filter genealogies coalesce, or degenerate, to a single path. Since here we are conditioning on the current path, if there is coalescence, it is guaranteed to be to the previous path. So although the conditional SMC updates are always accepted, it is likely that much of the new path will be identical to the previous path, which is just another kind of “sticking” of the sampler. This problem with conditional SMC and particle Gibbs more generally is well recognised, and quite a bit of recent research activity in this area is directed at alleviating this sticking problem. The most obvious strategy to use is “backward sampling” (Godsill et al, 2004), which has been used in this context by Lindsten and Schon (2012), Whiteley et al (2010), and Chopin and Singh (2013), among others. Another related idea is “ancestor sampling” (Lindsten et al, 2014), which can be done in a single forward pass. Both of these techniques work well, but both rely on the tractability of the transition kernel of the state space model, which can be problematic in certain applications.

Particle Gibbs sampling

As we are working in the context of Gibbs-style updates, the introduction of static parameters, $\theta$ , into the problem is relatively straightforward. It turns out to be correct to do the obvious thing, which is to alternate between sampling $\theta$ given $y$ and the currently sampled path, $x$ , and sampling a new path using a conditional SMC update, conditional on the previous path in addition to $\theta$ and $y$ . Although this is the obvious thing to do, understanding exactly why it works is a little delicate, due to the augmented state space and conditional SMC update. However, it is reasonably clear that this strategy defines a “collapsed Gibbs sampler” (Lui, 1994), and so actually everything is fine. This particular collapsed Gibbs sampler is relatively easy to understand as a marginal sampler which integrates out the augmented variables, but then nevertheless samples the augmented variables at each iteration conditional on everything else.

Note that the Gibbs update of $\theta$ may be problematic in the context of a state space model with intractable transition kernel.

In a subsequent post I’ll show how to code up the particle Gibbs and other pMCMC algorithms in a reasonably efficient way.

References

Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. “Particle Markov chain Monte Carlo methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72.3 (2010): 269-342.
Chopin, Nicolas, and Sumeetpal S. Singh. “On the particle Gibbs sampler.” arXiv preprint arXiv:1304.1887 (2013).
Godsill, Simon J., Arnaud Doucet, and Mike West. “Monte Carlo smoothing for nonlinear time series.” Journal of the American Statistical Association 99.465 (2004).
Lindsten, Fredrik, and Thomas B. Schön. “On the use of backward simulation in the particle Gibbs sampler.” Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012.
Lindsten, Fredrik, Michael I. Jordan, and Thomas B. Schön. “Particle Gibbs with Ancestor Sampling.” arXiv preprint arXiv:1401.0604 (2014).
Liu, Jun S. “The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem.” Journal of the American Statistical Association 89.427 (1994): 958-966.
Whiteley, Nick, Christophe Andrieu, and Arnaud Doucet. “Efficient Bayesian inference for switching state-space models using discrete particle Markov chain Monte Carlo methods.” arXiv preprint arXiv:1011.2437 (2010).

A functional Gibbs sampler in Scala

For many years I’ve had a passing interest in functional programming and languages which support functional programming approaches. I’m also quite interested in MOOCs and their future role in higher education. So I recently signed up for my first on-line course, Functional Programming Principles in Scala, via Coursera. I’m around half way through the course at the time of writing, and I’m enjoying it very much. I knew that I didn’t know much about Scala before starting the course, but during the course I’ve also learned that I didn’t know as much about functional programming as I thought I did, either! 😉 The course itself is very interesting, the assignments are well designed and appropriately challenging, and the web infrastructure to support the course is working well. I suspect I’ll try other on-line courses in the future.

Functional programming emphasises immutability, and discourages imperative programming approaches that use variables that can be modified during run-time. There are many advantages to immutability, especially in the context of parallel and concurrent programming, which is becoming increasingly important as multi-core systems become the norm. I’ve always found functional programming to be intellectually appealing, but have often worried about the practicalities of using functional programming in the context of scientific computing where many algorithms are iterative in nature, and are typically encoded using imperative approaches. The Scala programming language is appealing to me as it supports both imperative and functional styles of programming, as well as object oriented approaches. However, as a result of taking this course I am now determined to pursue functional approaches further, and get more of a feel for how practical they are for scientific computing applications.

For my first experiment, I’m going back to my post describing a Gibbs sampler in various languages. See that post for further details of the algorithm. In that post I did have an example implementation in Scala, which looked like this:

object GibbsSc {
 
    import cern.jet.random.tdouble.engine.DoubleMersenneTwister
    import cern.jet.random.tdouble.Normal
    import cern.jet.random.tdouble.Gamma
    import Math.sqrt
    import java.util.Date
 
    def main(args: Array[String]) {
        val N=50000
        val thin=1000
        val rngEngine=new DoubleMersenneTwister(new Date)
        val rngN=new Normal(0.0,1.0,rngEngine)
        val rngG=new Gamma(1.0,1.0,rngEngine)
        var x=0.0
        var y=0.0
        println("Iter x y")
        for (i <- 0 until N) {
            for (j <- 0 until thin) {
                x=rngG.nextDouble(3.0,y*y+4)
                y=rngN.nextDouble(1.0/(x+1),1.0/sqrt(2*x+2))
            }
            println(i+" "+x+" "+y)
        }
    }
 
}

At the time I wrote that post I knew even less about Scala than I do now, so I created the code by starting from the Java version and removing all of the annoying clutter! 😉 Clearly this code has an imperative style, utilising variables (declared with var) x and y having mutable state that is updated by a nested for loop. This algorithm is typical of the kind I use every day, so if I can’t re-write this in a more functional style, removing all mutable variables from my code, then I’m not going to get very far with functional programming!

In fact it is very easy to re-write this in a more functional style without utilising mutable variables. One possible approach is presented below.

object FunGibbs {
 
    import cern.jet.random.tdouble.engine.DoubleMersenneTwister
    import cern.jet.random.tdouble.Normal
    import cern.jet.random.tdouble.Gamma
    import java.util.Date
    import scala.math.sqrt

    val rngEngine=new DoubleMersenneTwister(new Date)
    val rngN=new Normal(0.0,1.0,rngEngine)
    val rngG=new Gamma(1.0,1.0,rngEngine)

    class State(val x: Double,val y: Double)

    def nextIter(s: State): State = {
         val newX=rngG.nextDouble(3.0,(s.y)*(s.y)+4.0)
         new State(newX, 
              rngN.nextDouble(1.0/(newX+1),1.0/sqrt(2*newX+2)))
    }

    def nextThinnedIter(s: State,left: Int): State = {
       if (left==0) s 
       else nextThinnedIter(nextIter(s),left-1)
    }

    def genIters(s: State,current: Int,stop: Int,thin: Int): State = {
         if (!(current>stop)) {
             println(current+" "+s.x+" "+s.y)
             genIters(nextThinnedIter(s,thin),current+1,stop,thin)
         }
         else s
    }

    def main(args: Array[String]) {
        println("Iter x y")
        genIters(new State(0.0,0.0),1,50000,1000)
     }

}

Although it is a few lines longer, it is a fairly clean implementation, and doesn’t look like a hack. Like many functional programs, this one makes extensive use of recursion. This is one of the things that has always concerned me about functional programming – many scientific computing applications involve lots of iteration, and that can potentially translate into very deep recursion. The above program has an apparent recursion depth of 50 million! However, it runs fine without crashing despite the fact that most programming languages will crash out with a stack overflow with recursion depths of more than a couple of thousand. So why doesn’t this crash? It runs fine because the recursion I used is a special form of recursion known as a tail call. Most functional (and some imperative) programming languages automatically perform tail call elimination which essentially turns the tail call into an iteration which runs very fast without creating new stack frames. In fact, this functional version of the code runs at roughly the same speed as the iterative version I presented first (perhaps just a few percent slower – I haven’t done careful timings), and runs well within a factor of 2 of imperative C code. So actually this seems perfectly practical so far, and I’m looking forward to experimenting more with functional programming approaches to statistical computation over the coming months…

Inlining JAGS models in R scripts for rjags

JAGS (Just Another Gibbs Sampler) is a general purpose MCMC engine similar to WinBUGS and OpenBUGS. I have a slight preference for JAGS as it is free and portable, works well on Linux, and interfaces well with R. It is tempting to write a tutorial introduction to JAGS and the corresponding R package, rjags, but there is a lot of material freely available on-line already, so it isn’t really necessary. If you are new to JAGS, I suggest starting with Getting Started with JAGS, rjags, and Bayesian Modelling. In this post I want to focus specifically on the problem of inlining JAGS models in R scripts as it can be very useful, and is usually skipped in introductory material.

JAGS and rjags on Ubuntu Linux

On recent versions of Ubuntu, assuming that R is already installed, the simplest way to install JAGS and rjags is using the command

sudo apt-get install jags r-cran-rjags

Now rjags is a CRAN package, so it can be installed in the usual way with install.packages("rjags"). However, taking JAGS and rjags direct from the Ubuntu repos should help to ensure that the versions of JAGS and rjags are in sync, which is a good thing.

Toy model

For this post, I will use a trivial toy example of inference for the mean and precision of a normal random sample. That is, we will assume data

$X_i \sim N(\mu,1/\tau),\quad i=1,2,\ldots n,$

with priors on $\mu$ and $\tau$ of the form

$\tau\sim Ga(a,b),\quad \mu \sim N(c,1/d).$

Separate model file

The usual way to fit this model in R using rjags is to first create a separate file containing the model

  model {
    for (i in 1:n) {
      x[i]~dnorm(mu,tau)
    }
    mu~dnorm(cc,d)
    tau~dgamma(a,b)
  }

Then, supposing that this file is called jags1.jags, an R session to fit the model could be constructed as follows:

require(rjags)
x=rnorm(15,25,2)
data=list(x=x,n=length(x))
hyper=list(a=3,b=11,cc=10,d=1/100)
init=list(mu=0,tau=1)
model=jags.model("jags1.jags",data=append(data,hyper), inits=init)
update(model,n.iter=100)
output=coda.samples(model=model,variable.names=c("mu", "tau"), n.iter=10000, thin=1)
print(summary(output))
plot(output)

This is all fine, and it can be very useful to have the model declared in a separate file, especially if the model is large and complex, and you might want to use it from outside R. However, very often for simple models it can be quite inconvenient to have the model separate from the R script which runs it. In particular, people often have issues with naming files correctly, making sure R is looking in the correct directory, moving the model with the R script, etc. So it would be nice to be able to just inline the JAGS model within an R script, to keep the model, the data, and the analysis all together in one place.

Using a temporary file

What we want to do is declare the JAGS model within a text string inside an R script and then somehow pass this into the call to jags.model(). The obvious way to do this is to write the string to a text file, and then pass the name of that text file into jags.model(). This works fine, but some care needs to be taken to make sure this works in a generic platform independent way. For example, you need to write to a file that you know doesn’t exist in a directory that is writable using a filename that is valid on the OS on which the script is being run. For this purpose R has an excellent little function called tempfile() which solves exactly this naming problem. It should always return the name of a file which does not exist in a writable directly within the standard temporary file location on the OS on which R is being run. This function is exceedingly useful for all kinds of things, but doesn’t seem to be very well known by newcomers to R. Using this we can construct a stand-alone R script to fit the model as follows:

require(rjags)
x=rnorm(15,25,2)
data=list(x=x,n=length(x))
hyper=list(a=3,b=11,cc=10,d=1/100)
init=list(mu=0,tau=1)
modelstring="
  model {
    for (i in 1:n) {
      x[i]~dnorm(mu,tau)
    }
    mu~dnorm(cc,d)
    tau~dgamma(a,b)
  }
"
tmpf=tempfile()
tmps=file(tmpf,"w")
cat(modelstring,file=tmps)
close(tmps)
model=jags.model(tmpf,data=append(data,hyper), inits=init)
update(model,n.iter=100)
output=coda.samples(model=model,variable.names=c("mu", "tau"), n.iter=10000, thin=1)
print(summary(output))
plot(output)

Now, although there is a file containing the model temporarily involved, the script is stand-alone and portable.

Using a text connection

The solution above works fine, but still involves writing a file to disk and reading it back in again, which is a bit pointless in this case. We can solve this by using another under-appreciated R function, textConnection(). Many R functions which take a file as an argument will work fine if instead passed a textConnection object, and the rjags function jags.model() is no exception. Here, instead of writing the model string to disk, we can turn it into a textConnection object and then pass that directly into jags.model() without ever actually writing the model file to disk. This is faster, neater and cleaner. An R session which takes this approach is given below.

require(rjags)
x=rnorm(15,25,2)
data=list(x=x,n=length(x))
hyper=list(a=3,b=11,cc=10,d=1/100)
init=list(mu=0,tau=1)
modelstring="
  model {
    for (i in 1:n) {
      x[i]~dnorm(mu,tau)
    }
    mu~dnorm(cc,d)
    tau~dgamma(a,b)
  }
"
model=jags.model(textConnection(modelstring), data=append(data,hyper), inits=init)
update(model,n.iter=100)
output=coda.samples(model=model,variable.names=c("mu", "tau"), n.iter=10000, thin=1)
print(summary(output))
plot(output)

This is my preferred way to use rjags. Note again that textConnection objects have many and varied uses and applications that have nothing to do with rjags.

MCMC on the Raspberry Pi

I’ve recently taken delivery of a Raspberry Pi mini computer. For anyone who doesn’t know, this is a low cost, low power machine, costing around 20 GBP (25 USD) and consuming around 2.5 Watts of power (it is powered by micro-USB). This amazing little device can run linux very adequately, and so naturally I’ve been interested to see if I can get MCMC codes to run on it, and to see how fast they run.

Now, I’m fairly sure that the majority of readers of this blog won’t want to be swamped with lots of Raspberry Pi related posts, so I’ve re-kindled my old personal blog for this purpose. Apart from this post, I’ll try not to write about my experiences with the Pi here on my main blog. Consequently, if you are interested in my ramblings about the Pi, you may wish to consider subscribing to my personal blog in addition to this one. Of course I’m not guaranteeing that the occasional Raspberry-flavoured post won’t find its way onto this blog, but I’ll try only to do so if it has strong relevance to statistical computing or one of the other core topics of this blog.

In order to get started with MCMC on the Pi, I’ve taken the C code gibbs.c for a simple Gibbs sampler described in a previous post (on this blog) and run it on a couple of laptops I have available, in addition to the Pi, and looked at timings. The full details of the experiment are recorded in this post over on my other blog, to which interested parties are referred. Here I will just give the “executive summary”.

The code runs fine on the Pi (running Raspbian), at around half the speed of my Intel Atom based netbook (running Ubuntu). My netbook in turn runs at around one fifth the speed of my Intel i7 based laptop. So the code runs at around one tenth of the speed of the fastest machine I have conveniently available.

As discussed over on my other blog, although the Pi is relatively slow, its low cost and low power consumption mean that is has a bang-for-buck comparable with high-end laptops and desktops. Further, a small cluster of Pis (known as a bramble) seems like a good, low cost way to learn about parallel and distributed statistical computing.

Gibbs sampling a Gaussian Markov random field (GMRF) using Java

Introduction

As I’ve explained previously, I’m gradually coming around to the idea of using Java for the development of MCMC codes, and I’m starting to build up a collection of simple examples for getting started. One of the advantages of Java is that it includes a standard cross-platform GUI library. This might not seem like the most important requirement for MCMC, but can actually be very handy in several contexts, particularly for monitoring convergence. One obvious context is that of image analysis, where it can be useful to monitor image reconstructions as the sampler is running. In this post I’ll show three very small simple Java classes which together provide an application for running a Gibbs sampler on a (non-stationary, unconditioned) Gaussian Markov random field.

The model is essentially that the distribution of each pixel is defined intrinsically, dependent only on its four nearest neighbours on a rectangular lattice, and here the distribution will be Gaussian with mean equal to the sample mean of the four neighbouring pixels and a fixed (unit) variance. On its own this isn’t especially useful, but it is a key component of many image analysis applications.

A simple Java implementation

We will start with the class MrfApp containing the main method for the application:

MrfApp.java

import java.io.*;
class MrfApp {
    public static void main(String[] arg)
	throws IOException
    {
	Mrf mrf;
	System.out.println("started program");
	mrf=new Mrf(800,600);
	System.out.println("created mrf object");
	mrf.update(1000);
	System.out.println("done updates");
	mrf.saveImage("mrf.png");
	System.out.println("finished program");
	mrf.frame.dispose();
	System.exit(0);
    }
}

Hopefully this code is largely self-explanatory, but relies on a class called Mrf which contains all of the logic associated with the GMRF.

Mrf.java

import java.io.*;
import java.util.*;
import java.awt.image.*;
import javax.swing.*;
import javax.imageio.ImageIO;


class Mrf 
{
    int n,m;
    double[][] cells;
    Random rng;
    BufferedImage bi;
    WritableRaster wr;
    JFrame frame;
    ImagePanel ip;
    
    Mrf(int n_arg,int m_arg)
    {
	n=n_arg;
	m=m_arg;
	cells=new double[n][m];
	rng=new Random();
	bi=new BufferedImage(n,m,BufferedImage.TYPE_BYTE_GRAY);
	wr=bi.getRaster();
	frame=new JFrame("MRF");
	frame.setSize(n,m);
	frame.add(new ImagePanel(bi));
	frame.setVisible(true);
    }
    
    public void saveImage(String filename)
	throws IOException
    {
	ImageIO.write(bi,"PNG",new File(filename));
    }
    
    public void updateImage()
    {
	double mx=-1e+100;
	double mn=1e+100;
	for (int i=0;i<n;i++) {
	    for (int j=0;j<m;j++) {
		if (cells[i][j]>mx) { mx=cells[i][j]; }
		if (cells[i][j]<mn) { mn=cells[i][j]; }
	    }
	}
	for (int i=0;i<n;i++) {
	    for (int j=0;j<m;j++) {
		int level=(int) (255*(cells[i][j]-mn)/(mx-mn));
		wr.setSample(i,j,0,level);
	    }
	}
	frame.repaint();
    }
    
    public void update(int num)
    {
	for (int i=0;i<num;i++) {
	    updateOnce();
	}
    }
    
    private void updateOnce()
    {
	double mean;
	for (int i=0;i<n;i++) {
	    for (int j=0;j<m;j++) {
		if (i==0) {
		    if (j==0) {
			mean=0.5*(cells[0][1]+cells[1][0]);
		    } 
		    else if (j==m-1) {
			mean=0.5*(cells[0][j-1]+cells[1][j]);
		    } 
		    else {
			mean=(cells[0][j-1]+cells[0][j+1]+cells[1][j])/3.0;
		    }
		}
		else if (i==n-1) {
		    if (j==0) {
			mean=0.5*(cells[i][1]+cells[i-1][0]);
		    }
		    else if (j==m-1) {
			mean=0.5*(cells[i][j-1]+cells[i-1][j]);
		    }
		    else {
			mean=(cells[i][j-1]+cells[i][j+1]+cells[i-1][j])/3.0;
		    }
		}
		else if (j==0) {
		    mean=(cells[i-1][0]+cells[i+1][0]+cells[i][1])/3.0;
		}
		else if (j==m-1) {
		    mean=(cells[i-1][j]+cells[i+1][j]+cells[i][j-1])/3.0;
		}
		else {
		    mean=0.25*(cells[i][j-1]+cells[i][j+1]+cells[i+1][j]
			       +cells[i-1][j]);
		}
		cells[i][j]=mean+rng.nextGaussian();
	    }
	}
	updateImage();
    }
    
}

This class contains a few simple methods for creating and updating the GMRF, and also for maintaining and updating a graphical view of the GMRF as the sampler is running. The Gibbs sampler update itself is encoded in the final method, updateOnce, and most of the code is to deal with edge and corner cases (in the literal rather than metaphorical sense!). This is called repeatedly by the method update for the required number of iterations. At the end of each iteration, the method updateOnce triggers updateImage which updates the image associated GMRF. The GMRF itself is stored in a 2-dimensional array of doubles, but an image pixel typically consists of a grayscale value represented by an unsigned byte – that is, an integer from 0 to 255. So updateImage scans through the GMRF to find the maximum and minimum values and then maps the GMRF values onto the 0 to 255 scale. The image itself is set up by the constructor method, Mrf. This class relies on an additional class called ImagePanel, which is a simple GUI panel for displaying images:

ImagePanel.java

import java.awt.*;
import java.awt.image.*;
import javax.swing.*;

class ImagePanel extends JPanel {

	protected BufferedImage image;

	public ImagePanel(BufferedImage image) {
		this.image=image;
		Dimension dim=new Dimension(image.getWidth(),image.getHeight());
		setPreferredSize(dim);
		setMinimumSize(dim);
		revalidate();
		repaint();
	}

	public void paintComponent(Graphics g) {
		g.drawImage(image,0,0,this);
	}

}

This completes the application, which can be compiled and run from the command line with

javac *.java
java MrfApp

This should compile the code and run the application, which will show a GMRF updating for 1000 iterations. When the 1000 iterations are complete, the application writes the final image to a file and then quits.

Using Parallel COLT

The above classes are very convenient, as they should work with any standard Java installation. However, in more complex scenarios, it is likely that a math library such as Parallel COLT will be required. In this case it will make sense to make use of features in the COLT library, such as random number generators and 2d matrix objects. We can adapt the above application by replacing the MrfApp and Mrf classes with the following versions (the ImagePanel class remains unchanged):

MrfApp.java

import java.io.*;
import cern.jet.random.tdouble.engine.*;

class MrfApp {

    public static void main(String[] arg)
	throws IOException
    {
	Mrf mrf;
	int seed=1234;
	System.out.println("started program");
        DoubleRandomEngine rngEngine=new DoubleMersenneTwister(seed);
	mrf=new Mrf(800,600,rngEngine);
	System.out.println("created mrf object");
	mrf.update(1000);
	System.out.println("done updates");
	mrf.saveImage("mrf.png");
	System.out.println("finished program");
	mrf.frame.dispose();
	System.exit(0);
    }

}

Mrf.java

import java.io.*;
import java.util.*;
import java.awt.image.*;
import javax.swing.*;
import javax.imageio.ImageIO;
import cern.jet.random.tdouble.*;
import cern.jet.random.tdouble.engine.*;
import cern.colt.matrix.tdouble.impl.*;

class Mrf 
{
    int n,m;
    DenseDoubleMatrix2D cells;
    DoubleRandomEngine rng;
    Normal rngN;
    BufferedImage bi;
    WritableRaster wr;
    JFrame frame;
    ImagePanel ip;
    
    Mrf(int n_arg,int m_arg,DoubleRandomEngine rng)
    {
	n=n_arg;
	m=m_arg;
	cells=new DenseDoubleMatrix2D(n,m);
	this.rng=rng;
	rngN=new Normal(0.0,1.0,rng);
	bi=new BufferedImage(n,m,BufferedImage.TYPE_BYTE_GRAY);
	wr=bi.getRaster();
	frame=new JFrame("MRF");
	frame.setSize(n,m);
	frame.add(new ImagePanel(bi));
	frame.setVisible(true);
    }
    
    public void saveImage(String filename)
	throws IOException
    {
	ImageIO.write(bi,"PNG",new File(filename));
    }
    
    public void updateImage()
    {
	double mx=-1e+100;
	double mn=1e+100;
	for (int i=0;i<n;i++) {
	    for (int j=0;j<m;j++) {
		if (cells.getQuick(i,j)>mx) { mx=cells.getQuick(i,j); }
		if (cells.getQuick(i,j)<mn) { mn=cells.getQuick(i,j); }
	    }
	}
	for (int i=0;i<n;i++) {
	    for (int j=0;j<m;j++) {
		int level=(int) (255*(cells.getQuick(i,j)-mn)/(mx-mn));
		wr.setSample(i,j,0,level);
	    }
	}
	frame.repaint();
    }
    
    public void update(int num)
    {
	for (int i=0;i<num;i++) {
	    updateOnce();
	}
    }
    
    private void updateOnce()
    {
	double mean;
	for (int i=0;i<n;i++) {
	    for (int j=0;j<m;j++) {
		if (i==0) {
		    if (j==0) {
			mean=0.5*(cells.getQuick(0,1)+cells.getQuick(1,0));
		    } 
		    else if (j==m-1) {
			mean=0.5*(cells.getQuick(0,j-1)+cells.getQuick(1,j));
		    } 
		    else {
			mean=(cells.getQuick(0,j-1)+cells.getQuick(0,j+1)+cells.getQuick(1,j))/3.0;
		    }
		}
		else if (i==n-1) {
		    if (j==0) {
			mean=0.5*(cells.getQuick(i,1)+cells.getQuick(i-1,0));
		    }
		    else if (j==m-1) {
			mean=0.5*(cells.getQuick(i,j-1)+cells.getQuick(i-1,j));
		    }
		    else {
			mean=(cells.getQuick(i,j-1)+cells.getQuick(i,j+1)+cells.getQuick(i-1,j))/3.0;
		    }
		}
		else if (j==0) {
		    mean=(cells.getQuick(i-1,0)+cells.getQuick(i+1,0)+cells.getQuick(i,1))/3.0;
		}
		else if (j==m-1) {
		    mean=(cells.getQuick(i-1,j)+cells.getQuick(i+1,j)+cells.getQuick(i,j-1))/3.0;
		}
		else {
		    mean=0.25*(cells.getQuick(i,j-1)+cells.getQuick(i,j+1)+cells.getQuick(i+1,j)
			       +cells.getQuick(i-1,j));
		}
		cells.setQuick(i,j,mean+rngN.nextDouble());
	    }
	}
	updateImage();
    }
    
}

Again, the code should be reasonably self explanatory, and will compile and run in the same way provided that Parallel COLT is installed and in your classpath. This version runs approximately twice as fast as the previous version on all of the machines I’ve tried it on.

Reference

I have found the following book very useful for understanding how to work with images in Java:

Hunt, K.A. (2010) The Art of Image Processing with Java, A K Peters/CRC Press.

Gibbs sampler in various languages (revisited)

Introduction

Regular readers of this blog will know that in April 2010 I published a short post showing how a trivial bivariate Gibbs sampler could be implemented in the four languages that I use most often these days (R, python, C, Java), and I discussed relative timings, and how one might start to think about trading off development time against execution time for more complex MCMC algorithms. I actually wrote the post very quickly one night while I was stuck in a hotel room in Seattle – I didn’t give much thought to it, and the main purpose was to provide simple illustrative examples of simple Monte Carlo codes using non-uniform random number generators in the different languages, as a starting point for someone thinking of switching languages (say, from R to Java or C, for efficiency reasons). It wasn’t meant to be very deep or provocative, or to start any language wars. Suffice to say that this post has had many more hits than all of my other posts combined, is still my most popular post, and still attracts comments and spawns other posts to this day. Several people have requested that I re-do the post more carefully, to include actual timings, and to include a few additional optimisations. Hence this post. For reference, the original post is here. A post about it from the python community is here, and a recent post about using Rcpp and inlined C++ code to speed up the R version is here.

The sampler

So, the basic idea was to construct a Gibbs sampler for the bivariate distribution

$f(x,y) = kx^2\exp\{-xy^2-y^2+2y-4x\},\qquad x>0,y\in\Bbb{R}$

with unknown normalising constant $k>0$ ensuring that the density integrates to one. Unfortunately, in the original post I dropped a factor of 2 constructing one of the full conditionals, which meant that none of the samplers actually had exactly the right target distribution (thanks to Sanjog Misra for bringing this to my attention). So actually, the correct full conditionals are

$\displaystyle x|y \sim Ga(3,y^2+4)$

$\displaystyle y|x \sim N\left(\frac{1}{1+x},\frac{1}{2(1+x)}\right)$

Note the factor of two in the variance of the full conditional for $y$ . Given the full conditionals, it is simple to alternately sample from them to construct a Gibbs sampler for the target distribution. We will run a Gibbs sampler with a thin of 1000 and obtain a final sample of 50000.

Implementations

R

Let’s start with R again. The slightly modified version of the code from the old post is given below

gibbs=function(N,thin)
{
	mat=matrix(0,ncol=3,nrow=N)
	mat[,1]=1:N
	x=0
	y=0
	for (i in 1:N) {
		for (j in 1:thin) {
			x=rgamma(1,3,y*y+4)
			y=rnorm(1,1/(x+1),1/sqrt(2*x+2))
		}
		mat[i,2:3]=c(x,y)
	}
	mat=data.frame(mat)
	names(mat)=c("Iter","x","y")
	mat
}

writegibbs=function(N=50000,thin=1000)
{
	mat=gibbs(N,thin)
	write.table(mat,"data.tab",row.names=FALSE)
}

writegibbs()

I’ve just corrected the full conditional, and I’ve increased the sample size and thinning to 50k and 1k, respectively, to allow for more accurate timings (of the faster languages). This code can be run from the (Linux) command line with something like:

time Rscript gibbs.R

I discuss timings in detail towards the end of the post, but this code is slow, taking over 7 minutes on my (very fast) laptop. Now, the above code is typical of the way code is often structured in R – doing as much as possible in memory, and writing to disk only if necessary. However, this can be a bad idea with large MCMC codes, and is less natural in other languages, anyway, so below is an alternative version of the code, written in more of a scripting language style.

gibbs=function(N,thin)
{
	x=0
	y=0
        cat(paste("Iter","x","y","\n"))
	for (i in 1:N) {
		for (j in 1:thin) {
			x=rgamma(1,3,y*y+4)
			y=rnorm(1,1/(x+1),1/sqrt(2*x+2))
		}
		cat(paste(i,x,y,"\n"))
	}
}

gibbs(50000,1000)

This can be run with a command like

time Rscript gibbs-script.R > data.tab

This code actually turns out to be a slightly slower than the in-memory version for this simple example, but for larger problems I would not expect that to be the case. I always analyse MCMC output using R, whatever language I use for running the algorithm, so for completeness, here is a bit of code to load up the data file, do some plots and compute summary statistics.

fun=function(x,y)
{
	x*x*exp(-x*y*y-y*y+2*y-4*x)
}

compare<-function(file="data.tab")
{
	mat=read.table(file,header=TRUE)
	op=par(mfrow=c(2,1))
	x=seq(0,3,0.1)
	y=seq(-1,3,0.1)
	z=outer(x,y,fun)
	contour(x,y,z,main="Contours of actual (unnormalised) distribution")
	require(KernSmooth)
	fit=bkde2D(as.matrix(mat[,2:3]),c(0.1,0.1))
	contour(fit$x1,fit$x2,fit$fhat,main="Contours of empirical distribution")
	par(op)
	print(summary(mat[,2:3]))
}
compare()

Python

Another language I use a lot is Python. I don’t want to start any language wars, but I personally find python to be a better designed language than R, and generally much nicer for the development of large programs. A python script for this problem is given below

import random,math

def gibbs(N=50000,thin=1000):
    x=0
    y=0
    print "Iter  x  y"
    for i in range(N):
        for j in range(thin):
            x=random.gammavariate(3,1.0/(y*y+4))
            y=random.gauss(1.0/(x+1),1.0/math.sqrt(2*x+2))
        print i,x,y

gibbs()

It can be run with a command like

time python gibbs.py > data.tab

This code turns out to be noticeably faster than the R versions, taking around 4 minutes on my laptop (again, detailed timing information below). However, there is a project for python known as the PyPy project, which is concerned with compiling regular python code to very fast byte-code, giving significant speed-ups on certain problems. For this post, I downloaded and install version 1.5 of the 64-bit linux version of PyPy. Once installed, I can run the above code with the command

time pypy gibbs.py > data.tab

To my astonishment, this “just worked”, and gave very impressive speed-up over regular python, running in around 30 seconds. This actually makes python a much more realistic prospect for the development of MCMC codes than I imagined. However, I need to understand the limitations of PyPy better – for example, why doesn’t everyone always use PyPy for everything?! It certainly seems to make python look like a very good option for prototyping MCMC codes.

C

Traditionally, I have mainly written MCMC codes in C, using the GSL. C is a fast, efficient, statically typed language, which compiles to native code. In many ways it represents the “gold standard” for speed. So, here is the C code for this problem.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

void main()
{
  int N=50000;
  int thin=1000;
  int i,j;
  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
  double x=0;
  double y=0;
  printf("Iter x y\n");
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
      x=gsl_ran_gamma(r,3.0,1.0/(y*y+4));
      y=1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(2*x+2));
    }
    printf("%d %f %f\n",i,x,y);
  }
}

It can be compiled and run with command like

gcc -O4 gibbs.c -lgsl -lgslcblas -lm -o gibbs
time ./gibbs > datac.tab

This runs faster than anything else I consider in this post, taking around 8 seconds.

Java

I’ve recently been experimenting with Java for MCMC codes, in conjunction with Parallel COLT. Java is a statically typed object-oriented (O-O) language, but is usually compiled to byte-code to run on a virtual machine (known as the JVM). Java compilers and virtual machines are very fast these days, giving “close to C” performance, but with a nicer programming language, and advantages associated with virtual machines. Portability is a huge advantage of Java. For example, I can easily get my Java code to run on almost any University Condor pool, on both Windows and Linux clusters – they all have a recent JVM installed, and I can easily bundle any required libraries with my code. Suffice to say that getting GSL/C code to run on generic Condor pools is typically much less straightforward. Here is the Java code:

import java.util.*;
import cern.jet.random.tdouble.*;
import cern.jet.random.tdouble.engine.*;

class Gibbs
{

    public static void main(String[] arg)
    {
	int N=50000;
	int thin=1000;
	DoubleRandomEngine rngEngine=new DoubleMersenneTwister(new Date());
	Normal rngN=new Normal(0.0,1.0,rngEngine);
	Gamma rngG=new Gamma(1.0,1.0,rngEngine);
	double x=0;
	double y=0;
	System.out.println("Iter x y");
	for (int i=0;i<N;i++) {
	    for (int j=0;j<thin;j++) {
		x=rngG.nextDouble(3.0,y*y+4);
		y=rngN.nextDouble(1.0/(x+1),1.0/Math.sqrt(2*x+2));
	    }
	    System.out.println(i+" "+x+" "+y);
	}
    }

}

It can be compiled and run with

javac Gibbs.java
time java Gibbs > data.tab

This takes around 11.6s seconds on my laptop. This is well within a factor of 2 of the C version, and around 3 times faster than even the PyPy python version. It is around 40 times faster than R. Java looks like a good choice for implementing MCMC codes that would be messy to implement in C, or that need to run places where it would be fiddly to get native codes to run.

Scala

Another language I’ve been taking some interest in recently is Scala. Scala is a statically typed O-O/functional language which compiles to byte-code that runs on the JVM. Since it uses Java technology, it can seamlessly integrate with Java libraries, and can run anywhere that Java code can run. It is a much nicer language to program in than Java, and feels more like a dynamic language such as python. In fact, it is almost as nice to program in as python (and in some ways nicer), and will run in a lot more places than PyPy python code. Here is the scala code (which calls Parallel COLT for random number generation):

object GibbsSc {

	import cern.jet.random.tdouble.engine.DoubleMersenneTwister
	import cern.jet.random.tdouble.Normal
	import cern.jet.random.tdouble.Gamma
	import Math.sqrt
 	import java.util.Date

	def main(args: Array[String]) {
		val N=50000
		val thin=1000
		val rngEngine=new DoubleMersenneTwister(new Date)
		val rngN=new Normal(0.0,1.0,rngEngine)
		val rngG=new Gamma(1.0,1.0,rngEngine)
		var x=0.0
		var y=0.0
		println("Iter x y")
		for (i <- 0 until N) {
			for (j <- 0 until thin) {
				x=rngG.nextDouble(3.0,y*y+4)
				y=rngN.nextDouble(1.0/(x+1),1.0/sqrt(2*x+2))
			}
			println(i+" "+x+" "+y)
		}
	}

}

It can be compiled and run with

scalac GibbsSc.scala
time scala GibbsSc > data.tab

This code takes around 11.8s on my laptop – almost as fast as the Java code! So, on the basis of this very simple and superficial example, it looks like scala may offer the best of all worlds – a nice, elegant, terse programming language, functional and O-O programming styles, the safety of static typing, the ability to call on Java libraries, great speed and efficiency, and the portability of Java! Very interesting.

Groovy

James Durbin has kindly sent me a Groovy version of the code, which he has also discussed in his own blog post. Groovy is a dynamic O-O language for the JVM, which, like Scala, can integrate nicely with Java applications. It isn’t a language I have examined closely, but it seems quite nice. The code is given below:

import cern.jet.random.tdouble.engine.*;
import cern.jet.random.tdouble.*;

N=50000;
thin=1000;
rngEngine= new DoubleMersenneTwister(new Date());
rngN=new Normal(0.0,1.0,rngEngine);
rngG=new Gamma(1.0,1.0,rngEngine);
x=0.0;
y=0.0;
println("Iter x y");
for(i in 1..N){
	for(j in 1..thin){
		x=rngG.nextDouble(3.0,y*y+4);
		y=rngN.nextDouble(1.0/(x+1),1.0/Math.sqrt(2*x+2));
	}
	println("$i $x $y");
}

It can be run with a command like:

time groovy Gibbs.gv > data.tab

Again, rather amazingly, this code runs in around 35 seconds – very similar to the speed of PyPy. This makes Groovy also seem like a potential very attractive environment for prototyping MCMC codes, especially if I’m thinking about ultimately porting to Java.

Timings

The laptop I’m running everything on is a Dell Precision M4500 with an Intel i7 Quad core (x940@2.13Ghz) CPU, running the 64-bit version of Ubuntu 11.04. I’m running stuff from the Ubuntu (Unity) desktop, and running several terminals and applications, but the machine is not loaded at the time each job runs. I’m running each job 3 times and taking the arithmetic mean real elapsed time. All timings are in seconds.

R 2.12.1 (in memory)	435.0
R 2.12.1 (script)	450.2
Python 2.7.1+	233.5
PyPy 1.5	32.2
Groovy 1.7.4	35.4
Java 1.6.0	11.6
Scala 2.7.7	11.8
C (gcc 4.5.2)	8.1

If we look at speed-up relative to the R code (in-memory version), we get:

R (in memory)	1.00
R (script)	0.97
Python	1.86
PyPy	13.51
Groovy	12.3
Java	37.50
Scala	36.86
C	53.70

Alternatively, we can look at slow-down relative to the C version, to get:

R (in memory)	53.7
R (script)	55.6
Python	28.8
PyPy	4.0
Groovy	4.4
Java	1.4
Scala	1.5
C	1.0

Discussion

The findings here are generally consistent with those of the old post, but consideration of PyPy, Groovy and Scala does throw up some new issues. I was pretty stunned by PyPy. First, I didn’t expect that it would “just work” – I thought I would either have to spend time messing around with my configuration settings, or possibly even have to modify my code slightly. Nope. Running python code with pypy appears to be more than 10 times faster than R, and only 4 times slower than C. I find it quite amazing that it is possible to get python code to run just 4 times slower than C, and if that is indicative of more substantial examples, it really does open up the possibility of using python for “real” problems, although library coverage is currently a problem. It certainly solves my “prototyping problem”. I often like to prototype algorithms in very high level dynamic languages like R and python before porting to a more efficient language. However, I have found that this doesn’t always work well with complex MCMC codes, as they just run too slowly in the dynamic languages to develop, test and debug conveniently. But it looks now as though PyPy should be fast enough at least for prototyping purposes, and may even be fast enough for production code in some circumstances. But then again, exactly the same goes for Groovy, which runs on the JVM, and can access any existing Java library… I haven’t yet looked into Groovy in detail, but it appears that it could be a very nice language for prototyping algorithms that I intend to port to Java.

The results also confirm my previous findings that Java is now “fast enough” that one shouldn’t worry too much about the difference in speed between it and native code written in C (or C++). The Java language is much nicer than C or C++, and the JVM platform is very attractive in many situations. However, the Scala results were also very surprising for me. Scala is a really elegant language (certainly on a par with python), comes with all of the advantages of Java, and appears to be almost as fast as Java. I’m really struggling to come up with reasons not to use Scala for everything!

Speeding up R

MCMC codes are used by a range of different scientists for a range of different problems. However, they are very (most?) often used by Bayesian statisticians who use the algorithms to target a Bayesian posterior distribution. For various (good) reasons, many statisticians are heavily invested in R, like to use R as much as possible, and do as much as possible from within the R environment. These results show why R is not a good language in which to implement MCMC algorithms, so what is an R-dependent statistician supposed to do? One possibility would be to byte-code compile R code in an analogous way to python and pypy. The very latest versions of R support such functionality, but the post by Dirk Eddelbuettel suggests that the current version of cmpfun will only give a 40% speedup on this problem, which is still slower than regular python code. Short of a dramatic improvement in this technology, the only way forward seems to be to extend R using code from another language. It is easy to extend R using C, C++ and Java. I have shown in previous posts how to do this using Java and using C, and the recent post by Dirk shows how to extend using C++. Although interesting, this doesn’t really have much bearing on the current discussion. If you extend using Java you get Java-like speedups, and if you extend using C you get C-like speedups. However, in case people are interested, I intend to gather up these examples into one post and include detailed timing information in a subsequent post.

Calling Java code from R

Introduction

In the previous post I looked at some simple methods for calling C code from R using a simple Gibbs sampler as the motivating example. In this post we will look again at the same Gibbs sampler, but now implemented in Java, and look at a couple of options for calling that code from an R session.

Stand-alone Java code

Below is some Java code for implementing the bivariate Gibbs sampler discussed previously. It relies on Parallel COLT, which must be installed and in the Java CLASSPATH in order to follow the examples.

import java.util.*;
import cern.jet.random.tdouble.*;
import cern.jet.random.tdouble.engine.*;

class Gibbs
{

    public static void main(String[] arg)
    {
	if (arg.length != 3) {
            System.err.println("Usage: java Gibbs <Iters> <Thin> <Seed>");
            System.exit(1);  
        }
	int N=Integer.parseInt(arg[0]);
	int thin=Integer.parseInt(arg[1]);
	int seed=Integer.parseInt(arg[2]);
	DoubleRandomEngine rngEngine=new DoubleMersenneTwister(seed);
	Normal rngN=new Normal(0.0,1.0,rngEngine);
	Gamma rngG=new Gamma(1.0,1.0,rngEngine);
	double x=0,y=0;
	System.out.println("Iter x y");
	for (int i=0;i<N;i++) {
	    for (int j=0;j<thin;j++) {
		x=rngG.nextDouble(3.0,y*y+4);
		y=rngN.nextDouble(1.0/(x+1),1.0/Math.sqrt(x+1));
	    }
	    System.out.println(i+" "+x+" "+y);
	}
    }

}

It can be compiled and run stand-alone from an OS shell with the following commands:

javac Gibbs.java
java Gibbs 10 1000 1

As discussed in the previous post, it is possible to call any command-line program from inside an R session using the system() command. A small wrapper function for conveniently running this code from within R can be written as follows.

gibbs<-function(N=10000,thin=500,
             seed=trunc(runif(1)*1e6),
             exec="Gibbs",
             tmpfile=tempfile())
{
  command=paste("java",exec,N,thin,seed,">",tmpfile)
  system(command)
  read.table(tmpfile,header=TRUE)
}

This can then be run from within an R session with a simple call to gibbs(). Note that a random seed is being generated within R to be passed to the Java code to be used to seed the COLT random number generator used within the Java code. As previously discussed, for many long running codes, this approach can be quite effective, and is clearly very simple. However, there is an overhead associated with the system() call, and also with writing output to disk and then reading it back again.

Using rJava

It is possible to avoid the overheads associated with the above approach by directly calling the Java code from R and having the return values returned directly into the R session from memory. There isn’t really direct support for this within the core R language, but there are couple of different solutions provided by R packages. The simplest and most popular approach seems to be the rJava package. This package can be installed with a simple

install.packages("rJava")

This should “just work” on some OSs (eg. Windows), but may fail on other OSs if R is not aware of the local Java environment. If the installation fails, check the error message carefully for advice/instructions. On most Linux systems, the problem can be fixed by quitting R, then running the following command from the shell

sudo R CMD javareconf

before re-starting R and re-attempting the installation. rJava provides a mechanism for starting a JVM within the running R session, creating objects, calling methods and having method return values returned to R. It is actually much more flexible than the .C() function for C code discussed in the previous post.

In order to use this package for our example, we must first re-factor the code slightly in the following way.

import java.util.*;
import cern.jet.random.tdouble.*;
import cern.jet.random.tdouble.engine.*;

class GibbsR
{

    public static void main(String[] arg)
    {
	if (arg.length != 3) {
            System.err.println("Usage: java GibbsR <Iters> <Thin> <Seed>");
            System.exit(1);  
        }
	int N=Integer.parseInt(arg[0]);
	int thin=Integer.parseInt(arg[1]);
	int seed=Integer.parseInt(arg[2]);
	double[][] mat=gibbs(N,thin,seed);
	System.out.println("Iter x y");
	for (int i=0;i<N;i++) {
	    System.out.println(""+i+" "+mat[0][i]+" "+mat[1][i]);
	}	
    }

    public static double[][] gibbs(int N,int thin,int seed)
    {
	DoubleRandomEngine rngEngine=new DoubleMersenneTwister(seed);
	Normal rngN=new Normal(0.0,1.0,rngEngine);
	Gamma rngG=new Gamma(1.0,1.0,rngEngine);
	double x=0,y=0;
	double[][] mat=new double[2][N];
	for (int i=0;i<N;i++) {
	    for (int j=0;j<thin;j++) {
		x=rngG.nextDouble(3.0,y*y+4);
		y=rngN.nextDouble(1.0/(x+1),1.0/Math.sqrt(x+1));
	    }
	    mat[0][i]=x; mat[1][i]=y;
	}
	return mat;
    }

}

This code can be compiled and run from the command-line just as the previous code could.

javac GibbsR.java
java GibbsR 10 1000 1

However, we have now separated out the code we want to be able to call from R into a static method called gibbs, which runs the Gibbs sampler and stores the result in a 2-dimensional array which is its return value. We can now see how to call this code from within a running R session. We first need to set up the R environment ready to call the code.

library(rJava)
.jinit()
obj=.jnew("GibbsR")

Line 1 loads the package, line 2 starts up the JVM, and line 3 creates a link to the the GibbsR class (in general this is used to create a new Java object of the given type, but here we are using static methods). Java methods are called on Java objects using .jcall(). We can write a simple R function to conveniently call the method as follows.

jgibbs<-function(N=10000,thin=500,seed=trunc(runif(1)*1e6))
{
    result=.jcall(obj,"[[D","gibbs",as.integer(N),as.integer(thin),as.integer(seed))
    mat=sapply(result,.jevalArray)
    mat=cbind(1:N,mat)
    colnames(mat)=c("Iter","x","y")
    mat
}

This can now be called with a simple jgibbs(). The first line of the function body carries out the actual method call. The return type of the method must be explicitly declared – “[[D” means a 2-dimensional array of doubles, using JNI notation. Care must also be taken to coerce the method parameters into the correct type that the Java method expects to receive. .jcall() is generally quite good at unpacking basic Java types into corresponding R types. However, the two dimensional array is here returned as an R list consisting of one-dimensional Java array objects. The unpacking is completed using the subsequent call to jevalArray() using sapply(), before the resulting matrix is tidied up and returned to the R session.

Summary and further reading

We have looked at a couple of very simple methods for calling Java code from an R session. The rJava package is a very flexible mechanism for integrating Java code into R.

I haven’t found a lot of tutorial-level material on the web for the rJava package. However, the package itself has very good documentation associated with it. Start with the information on the rJava home page. From an R session with the rJava package loaded, help(package="rJava") lists the available functions, all of which have associated documentation. ?.jinit, ?.jnew, ?.jcall and ?.jevalArray provide further background and information on the example covered here.

After that, the source code of R packages which use rJava are a useful source of further inspiration – look at the reverse-depends list for rJava in CRAN. In particular, the helloJavaWorld package is a tutorial for how to include Java code in an R package (read the associated vignette).

Calling C code from R

Introduction

In this post I’ll look at how to call compiled C code from an R session. The focus here is on calling C code from R, rather than on extending R using C. Although the two are technically very similar problems, the emphasis is somewhat different. A lot of existing documentation focuses on the latter problem, and this is one of the motivations for writing this post. Fortunately, the problem of calling existing C code from R is a bit simpler than the more general problem of extending R in C.

In a previous post I looked at how to implement a trivial bivariate Gibbs sampler in various languages. It was seen there that the C version ran approximately 60 times faster than the R version. It is therefore often desirable to code up MCMC algorithms in C. However, it is usually very convenient to be able to call such algorithms from inside an R session. There are various ways to do this, ranging from the trivial to very complex. In this post I will look at some of the simpler methods and discuss the pros and cons.

Standalone C code

We will restrict attention to the Gibbs sampler discussed in a previous post. We will focus on the C version of the code. Below is a slightly modified version of the code which includes some command-line arguments that enable some flexibility in how the code is run post-compilation.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

int main(int argc, char *argv[])
{
  if (argc!=4) {
    fprintf(stderr,"Usage: %s <Iters> <Thin> <Seed>\n",argv[0]);
    exit(EXIT_FAILURE);
  }
  long N=(long) atoi(argv[1]);
  long thin=(long) atoi(argv[2]);
  long seed=(long) atoi(argv[3]);
  long i,j;
  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
  gsl_rng_set(r,seed);
  double x=0;
  double y=0;
  printf("Iter x y\n");
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
      x=gsl_ran_gamma(r,3.0,1.0/(y*y+4));
      y=1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(x+1));
    }
    printf("%ld %f %f\n",i,x,y);
  }
  exit(EXIT_SUCCESS);
}

Assuming a Unix/Linux environment (including a GSL implementation), the above code can be compiled from the Unix shell with a command like:

gcc -O2 -lgsl -lgslcblas standalone.c -o standalone

and run with a command like:

./standalone 10000 500 1 > data.tab

The first command-line argument is the number of iterations required, and the second is the “thin” to be applied to the output. The third argument is the “seed” to be applied to the GSL random number generator (RNG). This allows different (not quite independent – see my post on parallel MCMC for details) runs to be obtained by selecting different seed values. The simplest way to call this code from within an R session is to call this unmodified executable using the R system() command. A small “wrapper” function to do this is given below.

standalone<-function(N=10000,thin=500,
             seed=trunc(runif(1)*1e6),
             exec=file.path(".","standalone"),
             tmpfile=tempfile())
{
  command=paste(exec,N,thin,seed,">",tmpfile)
  system(command)
  read.table(tmpfile,header=TRUE)
}

Note the use of the file.path() and tempfile() R functions in a (probably vain!) attempt to make the code somewhat portable. Just running standalone() from an R session should then return a data frame containing the MCMC output. I gave some commands for analysing this output in a previous post. This approach to calling external code is very simple and crude, and quite generic (it is not specific to C code at all). However, it is very quick and easy to implement, and in many cases quite efficient. There is a considerable computational overhead in executing the system command and parsing output files from disk. However, if the code being called is very computationally intensive and relatively slow (as is typically the case), then this overhead can often be negligible, rendering this approach quite practical.

Building and linking to a shared library

If one is really keen to avoid the overhead of executing an R system command, then it is necessary to compile the required C code into a shared library (or DLL), and link this code into R where it can be called directly via R’s foreign language interface. Below is a version of the previous C code modified to make it appropriate for calling from R.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <R.h>

void gibbs(int *Np,int *thinp,int *seedp,double *xvec,double *yvec)
{
  int i,j;
  int N=*Np,thin=*thinp,seed=*seedp;
  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
  gsl_rng_set(r,seed);
  double x=0;
  double y=0;
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
      x=gsl_ran_gamma(r,3.0,1.0/(y*y+4));
      y=1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(x+1));
    }
    xvec[i]=x; yvec[i]=y;
  }
}

Note that it is only possible to pass pointers from simple R/C data types, and so all function arguments must be pointers. Also note that there is no return value to the function, and that values are retrieved in R by modifying some of the values pointed to by the pointer arguments. This is the mode of operation imposed by the basic method that R provides for calling C code from R (the .C() function). Note that there are other methods for extending R in C, using the .Call() and .External() functions, but these are beyond the scope of this post. Again assuming a Unix/Linux environment, this code can be compiled into a shared library with a command like:

R CMD SHLIB -lgsl -lgslcblas dynamic.c

It can then be loaded into a running R session with a command like dyn.load("dynamic.so"). Again, if we are attempting to write portable code, we might use a command like:

dyn.load(file.path(".",paste("dynamic",.Platform$dynlib.ext,sep="")))

You can check what dynamic libraries are loaded into the current R session with getLoadedDLLs(). Once the DLL (Dynamic Link Library) is loaded, it can be called using the .C() function. A small wrapper function appropriate in this instance is given below:

dynamic<-function(n=10000,thin=500,seed=trunc(runif(1)*1e6))
{
  tmp=.C("gibbs",as.integer(n),as.integer(thin),
               as.integer(seed),x=as.double(1:n),
                  y=as.double(1:n))
  mat=cbind(1:n,tmp$x,tmp$y) 
  colnames(mat)=c("Iter","x","y")
  mat
}

Note how a random seed is generated in R to be passed to the C code to be used to seed the GSL random generator used within the C code. The code can then be run with a simple call to dynamic() and everything should work OK provided that all of the required libraries are found. This is the simplest way to link C code into R in a way that avoids the overhead associated with a system() call. However, this approach is also not without issues. In particular, the C code relies on the GSL, and more specifically on the random number streams provided by the GSL. These are completely separate from the random number streams used within the R system. In some situations it would make sense to use the same random number streams used within the R session, and to remove the dependence of the C code on the GSL.

Using the R API

The C code discussed in the previous section relies on the GSL only for the generation of (non-uniform) random numbers. Obviously R has its own very sophisticated system for handling random numbers and it is possible to use this system from within externally called C code using the R API. In particular, C versions of functions such as rnorm() and rgamma() can be called in C by including Rmath.h. Below is a version of the C code previously given modified to use the R random number generation routines and to remove all dependence on the GSL.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <R.h>
#include <Rmath.h>

void gibbsR(int *Np,int *thinp,double *xvec,double *yvec)
{
  int i,j;
  int N=*Np,thin=*thinp;
  GetRNGstate();
  double x=0;
  double y=0;
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
      x=rgamma(3.0,1.0/(y*y+4));
      y=rnorm(1.0/(x+1),1.0/sqrt(x+1));
    }
    xvec[i]=x; yvec[i]=y;
  }
  PutRNGstate();
}

Note that a call to GetRNGstate() must be made before calling any random number functions and that a call to PutRNGstate() must be called before the function returns control back to R. This code can be compiled with a command like

R CMD SHLIB dynamicR.c

and linked into R with a command like

dyn.load(file.path(".",paste("dynamicR",.Platform$dynlib.ext,sep="")))

An appropriate wrapper for this code is given below:

dynamicR<-function(n=10000,thin=500)
{
  tmp=.C("gibbsR",as.integer(n),as.integer(thin),
                x=as.double(1:n),y=as.double(1:n))
  mat=cbind(1:n,tmp$x,tmp$y) 
  colnames(mat)=c("Iter","x","y")
  mat
}

This code is now slightly simpler, and the lack of dependence on external libraries such as the GSL makes it much easier to integrate into R packages, should this be desired.

Summary and further reading

Foreign language interfaces are a notoriously complex subject and this post has obviously just scratched the surface of the problem. For a few more examples, first see my old computer practicals on Stochastic simulation in R and C. The examples are a bit out of date, but easy to fix. Also see a howto by the Flemish Supercomputing Centre on a similar topic to this one. For more detailed information, see the manual on Writing R extensions, especially the sections on Foreign language interfaces and the R API. I also find Chapter 6 of R Programming for Bioinformatics to be a useful introduction to more complex aspects.

I have also somewhat belatedly re-discovered Charlie Geyer‘s notes on Calling C and Fortran from R, which covers very similar ground to this post. They were probably the unconscious inspiration for this post…