Calling C code from R

Introduction

In this post I’ll look at how to call compiled C code from an R session. The focus here is on calling C code from R, rather than on extending R using C. Although the two are technically very similar problems, the emphasis is somewhat different. A lot of existing documentation focuses on the latter problem, and this is one of the motivations for writing this post. Fortunately, the problem of calling existing C code from R is a bit simpler than the more general problem of extending R in C.

In a previous post I looked at how to implement a trivial bivariate Gibbs sampler in various languages. The C version ran approximately 60 times faster than the R version, so it is often desirable to code up MCMC algorithms in C. However, it is usually very convenient to be able to call such algorithms from inside an R session. There are various ways to do this, ranging from the trivial to the very complex. In this post I will look at some of the simpler methods and discuss the pros and cons.

Standalone C code

We will restrict attention to the Gibbs sampler discussed in a previous post. We will focus on the C version of the code. Below is a slightly modified version of the code which includes some command-line arguments that enable some flexibility in how the code is run post-compilation.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

int main(int argc, char *argv[])
{
  if (argc!=4) {
    fprintf(stderr,"Usage: %s <Iters> <Thin> <Seed>\n",argv[0]);
    exit(EXIT_FAILURE);
  }
  long N=(long) atoi(argv[1]);
  long thin=(long) atoi(argv[2]);
  long seed=(long) atoi(argv[3]);
  long i,j;
  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
  gsl_rng_set(r,seed);
  double x=0;
  double y=0;
  printf("Iter x y\n");
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
      x=gsl_ran_gamma(r,3.0,1.0/(y*y+4));
      y=1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(x+1));
    }
    printf("%ld %f %f\n",i,x,y);
  }
  exit(EXIT_SUCCESS);
}

Assuming a Unix/Linux environment (including a GSL implementation), the above code can be compiled from the Unix shell with a command like:

gcc -O2 standalone.c -o standalone -lgsl -lgslcblas -lm

and run with a command like:

./standalone 10000 500 1 > data.tab

The first command-line argument is the number of iterations required, and the second is the “thin” to be applied to the output. The third argument is the “seed” to be applied to the GSL random number generator (RNG). This allows different (not quite independent – see my post on parallel MCMC for details) runs to be obtained by selecting different seed values. The simplest way to call this code from within an R session is to call this unmodified executable using the R system() command. A small “wrapper” function to do this is given below.

standalone<-function(N=10000,thin=500,
             seed=trunc(runif(1)*1e6),
             exec=file.path(".","standalone"),
             tmpfile=tempfile())
{
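  # format(...,scientific=FALSE) stops R writing e.g. 100000 as "1e+05", which atoi() reads as 1
  command=paste(exec,format(N,scientific=FALSE),format(thin,scientific=FALSE),
                format(seed,scientific=FALSE),">",tmpfile)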
  system(command)
  read.table(tmpfile,header=TRUE)
}

Note the use of the file.path() and tempfile() R functions in a (probably vain!) attempt to make the code somewhat portable. Just running standalone() from an R session should then return a data frame containing the MCMC output. I gave some commands for analysing this output in a previous post. This approach to calling external code is very simple and crude, and quite generic (it is not specific to C code at all). However, it is very quick and easy to implement, and in many cases quite efficient. There is a considerable computational overhead in executing the system command and parsing output files from disk. However, if the code being called is very computationally intensive and relatively slow (as is typically the case), then this overhead can often be negligible, rendering this approach quite practical.
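One fragile point in the system() approach is the argument handling. The standalone program reads its arguments with atoi(), which stops at the first non-digit character and reports no error. R prints the number 100000 as 1e+05 by default, and atoi("1e+05") returns 1. The following is a minimal sketch of stricter parsing using the standard strtol() function; the helper name parse_count is hypothetical and is not part of the original program.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (an assumption, not from the original code):
   parse a non-negative decimal integer from the string s.
   Returns 0 and stores the value in *out on success, -1 on failure. */
int parse_count(const char *s, long *out)
{
  char *end;
  errno = 0;
  long v = strtol(s, &end, 10);
  if (end == s || *end != '\0') return -1;  /* no digits, or trailing junk */
  if (errno == ERANGE || v < 0) return -1;  /* out of range, or negative */
  *out = v;
  return 0;
}
```

In main(), each atoi() call could then be replaced by a test such as if (parse_count(argv[1],&N)!=0), followed by the usage message, so that a malformed argument stops the run rather than silently shortening it.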

Building and linking to a shared library

If one is really keen to avoid the overhead of executing an R system command, then it is necessary to compile the required C code into a shared library (or DLL), and link this code into R where it can be called directly via R’s foreign language interface. Below is a version of the previous C code modified to make it appropriate for calling from R.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <R.h>

void gibbs(int *Np,int *thinp,int *seedp,double *xvec,double *yvec)
{
  int i,j;
  int N=*Np,thin=*thinp,seed=*seedp;
  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
  gsl_rng_set(r,seed);
  double x=0;
  double y=0;
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
      x=gsl_ran_gamma(r,3.0,1.0/(y*y+4));
      y=1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(x+1));
    }
    xvec[i]=x; yvec[i]=y;
  }
  gsl_rng_free(r); /* release the generator so repeated calls from R do not leak memory */
}

Note that it is only possible to pass pointers to simple R/C data types, and so all function arguments must be pointers. Also note that the function has no return value: results are passed back to R by modifying the values pointed to by some of the pointer arguments. This is the mode of operation imposed by the basic method that R provides for calling C code (the .C() function). There are other methods for extending R in C, using the .Call() and .External() functions, but these are beyond the scope of this post. Again assuming a Unix/Linux environment, this code can be compiled into a shared library with a command like:

R CMD SHLIB -lgsl -lgslcblas dynamic.c

It can then be loaded into a running R session with a command like dyn.load("dynamic.so"). Again, if we are attempting to write portable code, we might use a command like:

dyn.load(file.path(".",paste("dynamic",.Platform$dynlib.ext,sep="")))

You can check what dynamic libraries are loaded into the current R session with getLoadedDLLs(). Once the DLL (Dynamic Link Library) is loaded, it can be called using the .C() function. A small wrapper function appropriate in this instance is given below:

dynamic<-function(n=10000,thin=500,seed=trunc(runif(1)*1e6))
{
  tmp=.C("gibbs",as.integer(n),as.integer(thin),
               as.integer(seed),x=as.double(1:n),
                  y=as.double(1:n))
  mat=cbind(1:n,tmp$x,tmp$y) 
  colnames(mat)=c("Iter","x","y")
  mat
}

Note how a random seed is generated in R and passed to the C code, where it is used to seed the GSL random number generator. The code can then be run with a simple call to dynamic() and everything should work OK provided that all of the required libraries are found. This is the simplest way to link C code into R in a way that avoids the overhead associated with a system() call. However, this approach is also not without issues. In particular, the C code relies on the GSL, and more specifically on the random number streams provided by the GSL. These are completely separate from the random number streams used within the R system. In some situations it would make sense to use the same random number streams used within the R session, and to remove the dependence of the C code on the GSL.
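To isolate the .C() calling convention from the details of the Gibbs sampler, here is a minimal sketch of a routine written in the same style. The function name running_mean and the R call shown in the comment are illustrative assumptions, not code from the examples above.

```c
/* Hypothetical example of a .C()-compatible routine: every argument is a
   pointer, nothing is returned, and results are written into out.

   After building with R CMD SHLIB and loading with dyn.load(), a call
   from R might look like:
     .C("running_mean", as.double(x), as.integer(length(x)),
        out=double(length(x)))$out                                    */
void running_mean(double *x, int *np, double *out)
{
  int n = *np;       /* scalars arrive as pointers to length-one vectors */
  double sum = 0.0;
  for (int i = 0; i < n; i++) {
    sum += x[i];
    out[i] = sum / (i + 1);  /* mean of x[0..i] */
  }
}
```

As with the gibbs() wrapper, the output vector is allocated in R, passed in, and then picked out of the list returned by .C() by name. Since .C() works on copies of its arguments, the vector x in the R session should be left unchanged.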

Using the R API

The C code discussed in the previous section relies on the GSL only for the generation of (non-uniform) random numbers. Obviously R has its own very sophisticated system for handling random numbers and it is possible to use this system from within externally called C code using the R API. In particular, C versions of functions such as rnorm() and rgamma() can be called in C by including Rmath.h. Below is a version of the C code previously given modified to use the R random number generation routines and to remove all dependence on the GSL.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <R.h>
#include <Rmath.h>

void gibbsR(int *Np,int *thinp,double *xvec,double *yvec)
{
  int i,j;
  int N=*Np,thin=*thinp;
  GetRNGstate();
  double x=0;
  double y=0;
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
      x=rgamma(3.0,1.0/(y*y+4));
      y=rnorm(1.0/(x+1),1.0/sqrt(x+1));
    }
    xvec[i]=x; yvec[i]=y;
  }
  PutRNGstate();
}

Note that a call to GetRNGstate() must be made before calling any random number functions and that a call to PutRNGstate() must be made before the function returns control to R. This code can be compiled with a command like:

R CMD SHLIB dynamicR.c

and linked into R with a command like

dyn.load(file.path(".",paste("dynamicR",.Platform$dynlib.ext,sep="")))

An appropriate wrapper for this code is given below:

dynamicR<-function(n=10000,thin=500)
{
  tmp=.C("gibbsR",as.integer(n),as.integer(thin),
                x=as.double(1:n),y=as.double(1:n))
  mat=cbind(1:n,tmp$x,tmp$y) 
  colnames(mat)=c("Iter","x","y")
  mat
}

This code is now slightly simpler, and the lack of dependence on external libraries such as the GSL makes it much easier to integrate into R packages, should this be desired.

Summary and further reading

Foreign language interfaces are a notoriously complex subject and this post has obviously just scratched the surface of the problem. For a few more examples, first see my old computer practicals on Stochastic simulation in R and C. The examples are a bit out of date, but easy to fix. Also see a howto by the Flemish Supercomputing Centre on a similar topic to this one. For more detailed information, see the manual on Writing R extensions, especially the sections on Foreign language interfaces and the R API. I also find Chapter 6 of R Programming for Bioinformatics to be a useful introduction to more complex aspects.

I have also somewhat belatedly re-discovered Charlie Geyer’s notes on Calling C and Fortran from R, which cover very similar ground to this post. They were probably the unconscious inspiration for this post…

13 Responses to “Calling C code from R”

  4. Afshin Sadeghi Says:

    Hi, I am now thinking of using the functionality of both C and R jointly, but is it faster, like the C extensions written for R?

    • darrenjw Says:

      I’m not sure I understand your question, but writing C code and linking it in to R is one way to extend R using C. It is usually much faster than just writing pure R code, but is rather problem-dependent.

  6. Rajen Shah Says:

    Thanks for the excellent post! I also read your post on using MPI, and I was wondering whether it’s possible to call C code which uses MPI, from R?

    • darrenjw Says:

      I don’t see why not, but I’ve never done it, and it may be a bit fiddly. Also see the “Rmpi” package for using MPI directly from R.

  8. Piggy Says:

    Cool! I really needed such a thing.
    Thanks!

  9. lindonslog Says:

    Hi Darren,

    The purists always maintain that .C() should not be used and that .Call() should be used instead as the latter incurs some data transfer overheads. However, .C(), for me personally seems like the easiest thing to grab onto and the unprotect/protect statements with .Call() obfuscate the code quite a bit. When you write extensions for R, do you personally use .C() or .Call()? I’m just wondering whether I should really care about this extra overhead with .C().

    • darrenjw Says:

      I generally use .C(), because it’s easier, and is usually fine for what I need. But .Call() really is better, and so you probably should consider using it if you do a lot of writing of R extensions using C.

  10. lindonslog Says:

    I meant former.
