Calling Java code from R

Introduction

In the previous post I looked at some simple methods for calling C code from R using a simple Gibbs sampler as the motivating example. In this post we will look again at the same Gibbs sampler, but now implemented in Java, and look at a couple of options for calling that code from an R session.

Stand-alone Java code

Below is some Java code for implementing the bivariate Gibbs sampler discussed previously. It relies on Parallel COLT, which must be installed and in the Java CLASSPATH in order to follow the examples.

import java.util.*;
import cern.jet.random.tdouble.*;
import cern.jet.random.tdouble.engine.*;

class Gibbs
{

    public static void main(String[] arg)
    {
	if (arg.length != 3) {
            System.err.println("Usage: java Gibbs <Iters> <Thin> <Seed>");
            System.exit(1);  
        }
	int N=Integer.parseInt(arg[0]);
	int thin=Integer.parseInt(arg[1]);
	int seed=Integer.parseInt(arg[2]);
	DoubleRandomEngine rngEngine=new DoubleMersenneTwister(seed);
	Normal rngN=new Normal(0.0,1.0,rngEngine);
	Gamma rngG=new Gamma(1.0,1.0,rngEngine);
	double x=0,y=0;
	System.out.println("Iter x y");
	for (int i=0;i<N;i++) {
	    for (int j=0;j<thin;j++) {
		x=rngG.nextDouble(3.0,y*y+4);
		y=rngN.nextDouble(1.0/(x+1),1.0/Math.sqrt(x+1));
	    }
	    System.out.println(i+" "+x+" "+y);
	}
    }

}

It can be compiled and run stand-alone from an OS shell with the following commands:

javac Gibbs.java
java Gibbs 10 1000 1

As discussed in the previous post, it is possible to call any command-line program from inside an R session using the system() command. A small wrapper function for conveniently running this code from within R can be written as follows.

gibbs<-function(N=10000,thin=500,
             seed=trunc(runif(1)*1e6),
             exec="Gibbs",
             tmpfile=tempfile())
{
  command=paste("java",exec,N,thin,seed,">",tmpfile)
  system(command)
  read.table(tmpfile,header=TRUE)
}

This can then be run from within an R session with a simple call to gibbs(). Note that a random seed is being generated within R to be passed to the Java code to be used to seed the COLT random number generator used within the Java code. As previously discussed, for many long running codes, this approach can be quite effective, and is clearly very simple. However, there is an overhead associated with the system() call, and also with writing output to disk and then reading it back again.

Using rJava

It is possible to avoid the overheads associated with the above approach by directly calling the Java code from R and having the return values returned directly into the R session from memory. There isn’t really direct support for this within the core R language, but there are couple of different solutions provided by R packages. The simplest and most popular approach seems to be the rJava package. This package can be installed with a simple

install.packages("rJava")

This should “just work” on some OSs (eg. Windows), but may fail on other OSs if R is not aware of the local Java environment. If the installation fails, check the error message carefully for advice/instructions. On most Linux systems, the problem can be fixed by quitting R, then running the following command from the shell

sudo R CMD javareconf

before re-starting R and re-attempting the installation. rJava provides a mechanism for starting a JVM within the running R session, creating objects, calling methods and having method return values returned to R. It is actually much more flexible than the .C() function for C code discussed in the previous post.

In order to use this package for our example, we must first re-factor the code slightly in the following way.

import java.util.*;
import cern.jet.random.tdouble.*;
import cern.jet.random.tdouble.engine.*;

class GibbsR
{

    public static void main(String[] arg)
    {
	if (arg.length != 3) {
            System.err.println("Usage: java GibbsR <Iters> <Thin> <Seed>");
            System.exit(1);  
        }
	int N=Integer.parseInt(arg[0]);
	int thin=Integer.parseInt(arg[1]);
	int seed=Integer.parseInt(arg[2]);
	double[][] mat=gibbs(N,thin,seed);
	System.out.println("Iter x y");
	for (int i=0;i<N;i++) {
	    System.out.println(""+i+" "+mat[0][i]+" "+mat[1][i]);
	}	
    }

    public static double[][] gibbs(int N,int thin,int seed)
    {
	DoubleRandomEngine rngEngine=new DoubleMersenneTwister(seed);
	Normal rngN=new Normal(0.0,1.0,rngEngine);
	Gamma rngG=new Gamma(1.0,1.0,rngEngine);
	double x=0,y=0;
	double[][] mat=new double[2][N];
	for (int i=0;i<N;i++) {
	    for (int j=0;j<thin;j++) {
		x=rngG.nextDouble(3.0,y*y+4);
		y=rngN.nextDouble(1.0/(x+1),1.0/Math.sqrt(x+1));
	    }
	    mat[0][i]=x; mat[1][i]=y;
	}
	return mat;
    }

}

This code can be compiled and run from the command-line just as the previous code could.

javac GibbsR.java
java GibbsR 10 1000 1

However, we have now separated out the code we want to be able to call from R into a static method called gibbs, which runs the Gibbs sampler and stores the result in a 2-dimensional array which is its return value. We can now see how to call this code from within a running R session. We first need to set up the R environment ready to call the code.

library(rJava)
.jinit()
obj=.jnew("GibbsR")

Line 1 loads the package, line 2 starts up the JVM, and line 3 creates a link to the the GibbsR class (in general this is used to create a new Java object of the given type, but here we are using static methods). Java methods are called on Java objects using .jcall(). We can write a simple R function to conveniently call the method as follows.

jgibbs<-function(N=10000,thin=500,seed=trunc(runif(1)*1e6))
{
    result=.jcall(obj,"[[D","gibbs",as.integer(N),as.integer(thin),as.integer(seed))
    mat=sapply(result,.jevalArray)
    mat=cbind(1:N,mat)
    colnames(mat)=c("Iter","x","y")
    mat
}

This can now be called with a simple jgibbs(). The first line of the function body carries out the actual method call. The return type of the method must be explicitly declared – “[[D” means a 2-dimensional array of doubles, using JNI notation. Care must also be taken to coerce the method parameters into the correct type that the Java method expects to receive. .jcall() is generally quite good at unpacking basic Java types into corresponding R types. However, the two dimensional array is here returned as an R list consisting of one-dimensional Java array objects. The unpacking is completed using the subsequent call to jevalArray() using sapply(), before the resulting matrix is tidied up and returned to the R session.

Summary and further reading

We have looked at a couple of very simple methods for calling Java code from an R session. The rJava package is a very flexible mechanism for integrating Java code into R.

I haven’t found a lot of tutorial-level material on the web for the rJava package. However, the package itself has very good documentation associated with it. Start with the information on the rJava home page. From an R session with the rJava package loaded, help(package="rJava") lists the available functions, all of which have associated documentation. ?.jinit, ?.jnew, ?.jcall and ?.jevalArray provide further background and information on the example covered here.

After that, the source code of R packages which use rJava are a useful source of further inspiration – look at the reverse-depends list for rJava in CRAN. In particular, the helloJavaWorld package is a tutorial for how to include Java code in an R package (read the associated vignette).