Parallel Monte Carlo using Scala

Introduction

In previous posts I have discussed general issues regarding parallel MCMC and examined in detail parallel Monte Carlo on a multicore laptop. In those posts I used the C programming language in conjunction with the MPI parallel library in order to illustrate the concepts. In this post I want to take the example from the second post and re-examine it using the Scala programming language.

The toy problem considered in the parallel Monte Carlo post used 10^9 U(0,1) random quantities to construct a Monte Carlo estimate of the integral

\displaystyle I=\int_0^1\exp\{-u^2\}du.

A very simple serial program to implement this algorithm is given below:

import java.util.concurrent.ThreadLocalRandom
import scala.math.exp
import scala.annotation.tailrec

object MonteCarlo {

  @tailrec
  def sum(its: Long,acc: Double): Double = {
    if (its==0) 
      (acc)
    else {
      val u=ThreadLocalRandom.current().nextDouble()
      sum(its-1,acc+exp(-u*u))
    }
  }

  def main(args: Array[String]) = {
    println("Hello")
    val iters=1000000000
    val result=sum(iters,0.0)
    println(result/iters)
    println("Goodbye")
  }

}

Note that ThreadLocalRandom is a parallel random number generator introduced into recent versions of the Java programming language, which can be easily utilised from Scala code. Assuming that Scala is installed, this can be compiled and run with commands like

scalac monte-carlo.scala
time scala MonteCarlo

This program works, and the timings (in seconds) for three runs are 57.79, 57.77 and 57.55 on the same laptop considered in the previous post. The first thing to note is that this Scala code is actually slightly faster than the corresponding C+MPI code in the single processor special case! Now that we have a good working implementation we should think how to parallelise it…

Parallel implementation

Before constructing a parallel implementation, we will first construct a slightly re-factored serial version that will be easier to parallelise. The simplest way to introduce parallelisation into Scala code is to parallelise a map over a collection. We therefore need a collection and a map to apply to it. Here we will just divide our 10^9 iterations into N=4 separate computations, and use a map to compute the required Monte Carlo sums.

import java.util.concurrent.ThreadLocalRandom
import scala.math.exp
import scala.annotation.tailrec

object MonteCarlo {

  @tailrec
  def sum(its: Long,acc: Double): Double = {
    if (its==0) 
      (acc)
    else {
      val u=ThreadLocalRandom.current().nextDouble()
      sum(its-1,acc+exp(-u*u))
    }
  }

  def main(args: Array[String]) = {
    println("Hello")
    val N=4
    val iters=1000000000
    val its=iters/N
    val sums=(1 to N).toList map {x => sum(its,0.0)}
    val result=sums.reduce(_+_)
    println(result/iters)
    println("Goodbye")
  }

}

Running this new code confirms that it works and gives similar estimates for the Monte Carlo integral as the previous version. The timings for 3 runs on my laptop were 57.57, 57.67 and 57.80, similar to the previous version of the code. So far so good. But how do we make it parallel? Like this:

import java.util.concurrent.ThreadLocalRandom
import scala.math.exp
import scala.annotation.tailrec

object MonteCarlo {

  @tailrec
  def sum(its: Long,acc: Double): Double = {
    if (its==0) 
      (acc)
    else {
      val u=ThreadLocalRandom.current().nextDouble()
      sum(its-1,acc+exp(-u*u))
    }
  }

  def main(args: Array[String]) = {
    println("Hello")
    val N=4
    val iters=1000000000
    val its=iters/N
    val sums=(1 to N).toList.par map {x => sum(its,0.0)}
    val result=sums.reduce(_+_)
    println(result/iters)
    println("Goodbye")
  }

}

That’s it! It’s now parallel. Studying the above code reveals that the only difference from the previous version is the introduction of the 4 characters .par in line 22 of the code. R programmers will find this very much analagous to using lapply() versus mclapply() in R code. The function par converts the collection (here an immutable List) to a parallel collection (here an immutable parallel List), and then subsequent maps, filters, etc., can be computed in parallel on appropriate multicore architectures. Timings for 3 runs on my laptop were 20.74, 20.82 and 20.88. Note that these timings are faster than the timings for N=4 processors for the corresponding C+MPI code…

Varying the size of the parallel collection

We can trivially modify the previous code to make the size of the parallel collection, N, a command line argument:

import java.util.concurrent.ThreadLocalRandom
import scala.math.exp
import scala.annotation.tailrec

object MonteCarlo {

  @tailrec
  def sum(its: Long,acc: Double): Double = {
    if (its==0) 
      (acc)
    else {
      val u=ThreadLocalRandom.current().nextDouble()
      sum(its-1,acc+exp(-u*u))
    }
  }

  def main(args: Array[String]) = {
    println("Hello")
    val N=args(0).toInt
    val iters=1000000000
    val its=iters/N
    val sums=(1 to N).toList.par map {x => sum(its,0.0)}
    val result=sums.reduce(_+_)
    println(result/iters)
    println("Goodbye")
  }

}

We can now run this code with varying sizes of N in order to see how the runtime of the code changes as the size of the parallel collection increases. Timings on my laptop are summarised in the table below.

 N     T1     T2     T3
 1   57.67  57.62  57.83
 2   32.20  33.24  32.76
 3   26.63  26.60  26.63
 4   20.99  20.92  20.75
 5   20.13  18.70  18.76
 6   16.57  16.52  16.59
 7   15.72  14.92  15.27
 8   13.56  13.51  13.32
 9   18.30  18.13  18.12
10   17.25  17.33  17.22
11   17.04  16.99  17.09
12   15.95  15.85  15.91

16   16.62  16.68  16.74
32   15.41  15.54  15.42
64   15.03  15.03  15.28

So we see that the timings decrease steadily until the size of the parallel collection hits 8 (the number of processors my hyper-threaded quad-core presents via Linux), and then increases very slightly, but not much as the size of the collection increases. This is better than the case of C+MPI where performance degrades noticeably if too many processes are requested. Here, the Scala compiler and JVM runtime manage an appropriate number of threads for the collection irrespective of the actual size of the collection. Also note that all of the timings are faster than the corresponding C+MPI code discussed in the previous post.

However, the notion that the size of the collection is irrelevant is only true up to a point. Probably the most natural way to code this algorithm would be as:

import java.util.concurrent.ThreadLocalRandom
import scala.math.exp

object MonteCarlo {

  def main(args: Array[String]) = {
    println("Hello")
    val iters=1000000000
    val sums=(1 to iters).toList map {x => ThreadLocalRandom.current().nextDouble()} map {x => exp(-x*x)}
    val result=sums.reduce(_+_)
    println(result/iters)
    println("Goodbye")
  }

}

or as the parallel equivalent

import java.util.concurrent.ThreadLocalRandom
import scala.math.exp

object MonteCarlo {

  def main(args: Array[String]) = {
    println("Hello")
    val iters=1000000000
    val sums=(1 to iters).toList.par map {x => ThreadLocalRandom.current().nextDouble()} map {x => exp(-x*x)}
    val result=sums.reduce(_+_)
    println(result/iters)
    println("Goodbye")
  }

}

Although these algorithms are in many ways cleaner and more natural, they will bomb out with a lack of heap space unless you have a huge amount of RAM, as they rely on having all 10^9 realisations in RAM simultaneously. The lesson here is that even though functional languages make it very easy to write clean, efficient parallel code, we must still be careful not to fill up the heap with gigantic (immutable) data structures…

Advertisement

Scala as a platform for statistical computing and data science

There has been a lot of discussion on-line recently about languages for data analysis, statistical computing, and data science more generally. I don’t really want to go into the detail of why I believe that all of the common choices are fundamentally and unfixably flawed – language wars are so unseemly. Instead I want to explain why I’ve been using the Scala programming language recently and why, despite being far from perfect, I personally consider it to be a good language to form a platform for efficient and scalable statistical computing. Obviously, language choice is to some extent a personal preference, implicitly taking into account subjective trade-offs between features different individuals consider to be important. So I’ll start by listing some language/library/ecosystem features that I think are important, and then explain why.

A feature wish list

It should:

  • be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks
  • be free, open-source and platform independent
  • be fast and efficient
  • have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra
  • have a strong type system, and be statically typed with good compile-time type checking and type safety
  • have reasonable type inference
  • have a REPL for interactive use
  • have good tool support (including build tools, doc tools, testing tools, and an intelligent IDE)
  • have excellent support for functional programming, including support for immutability and immutable data structures and “monadic” design
  • allow imperative programming for those (rare) occasions where it makes sense
  • be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications

The not-very-surprising punch-line is that Scala ticks all of those boxes and that I don’t know of any other languages that do. But before expanding on the above, it is worth noting a couple of (perhaps surprising) omissions. For example:

  • have excellent data viz capability built-in
  • have vast numbers of statistical routines in the standard library

The above are points (and there are other similar points) where other languages (for example, R), currently score better than Scala. It is not that these things are not important – indeed, they are highly desirable. But I consider them to be of lesser importance as they are much easier to fix, given a suitable platform, than fixing an unsuitable language and platform. Visualisation is not trivial, but it is not fantastically difficult in a language with excellent GUI libraries. Similarly, most statistical routines are quite straightforward to implement for anyone with reasonable expertise in scientific and statistical computing and numerical linear algebra. These are things that are relatively easy for a community to contribute to. Building a great programming language, on the other hand, is really, really, difficult.

I will now expand briefly on each point in turn.

be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks

History has demonstrated, time and time again, that domain specific languages (DSLs) are synonymous with idiosyncratic, inconsistent languages that are terrible for anything other than what they were specifically designed for. They can often be great for precisely the thing that they were designed for, but people always want to do other things, and that is when the problems start. For the avoidance of controversy I won’t go into details, but the whole Python versus R thing is a perfect illustration of this general versus specific trade-off. Similarly, although there has been some buzz around another new language recently, which is faster than R and Python, my feeling is that the last thing the world needs right now is Just Unother Language for Indexed Arrays…

In this day-and-age it is vital that statistical code can use a variety of libraries, and communicate with well-designed network libraries and web frameworks, as statistical analysis does not exist in a vacuum. Scala certainly fits the bill here, being used in a large number of important high-profile systems, ensuring a lively, well-motivated ecosystem. There are numerous well-maintained libraries for almost any task. Picking on web frameworks, for example, there are a number of excellent libraries, including Lift and Play. Scala also has the advantage of offering seamless Java integration, for those (increasingly rare) occasions when a native Scala library for the task at hand doesn’t exist.

be free, open-source and platform independent

This hardly needs expanding upon, other than to observe that there are a few well-known commercial software solutions for scientific, statistical and mathematical computing. There are all kinds of problems with using closed proprietary systems, including transparency and reproducibility, but also platform and scalability problems. eg. running code requiring a license server in the cloud. The academic statistical community has largely moved away from commercial software, and I don’t think there is any going back. Scala is open source and runs on the JVM, which is about as platform independent as it is possible to get.

be fast and efficient

Speed and efficiency continue to be important, despite increasing processor speeds. Computationally intensive algorithms are being pushed to ever larger and more complex models and data sets. Compute cycles and memory efficiency really matter, and can’t be ignored. This doesn’t mean that we all have to code in C/C++/Fortran, but we can’t afford to code in languages which are orders of magnitude slower. This will always be a problem. Scala code generally runs well within a factor of 2 of comparable native code – see my Gibbs sampler post for a simple example including timings.

have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra

I hesitated about including this in my list of essentials, because it is certainly something that can, in principle, be added to a language at a later date. However, building such libraries is far from trivial, and they need to be well-designed, comprehensive and efficient. For Scala, Breeze is rapidly becoming the standard scientific library, including special functions, non-uniform random number generation and numerical linear algebra. For a data library, there is Saddle, and for a scalable analytics library there is Spark. These libraries certainly don’t cover everything that can be found in R/CRAN, but they provide a fairly solid foundation on which to build.

have a strong type system, and be statically typed with good compile-time type checking and type safety

I love dynamic languages – they are fun and exciting. It is fun to quickly throw together a few functions in a scripting language without worrying about declaring the types of anything. And it is exciting to see the myriad of strange and unanticipated ways your code can crash-and-burn at runtime! 😉 But this excitement soon wears off, and you end up adding lots of boilerplate argument checking code that would not only be much cleaner and simpler in a statically typed language, but would be checked at compile-time, making the static code faster and more efficient. For messing about prototyping, dynamic languages are attractive, but as a solid platform for statistical computing, they really don’t make sense. Scala has a strong type system offering a high degree of compile-time checking, making it a safe and efficient language.

have reasonable type inference

A common issue with statically typed languages is that they lead to verbose code containing many redundant type declarations that the compiler ought to be able to check. This doesn’t just mean more typing – it leads to verbose code that can hide the program logic. Languages with type inference offer the best of both worlds – the safety of static typing without the verbosity. Scala does a satisfactory job here.

have a REPL for interactive use

One thing that dynamic languages have taught us is that it is actually incredibly useful to have a REPL for interactive analysis. This is true generally, but especially so for statistical computing, where human intervention is often desirable. Again, Scala has a nice REPL.

have good tool support (including build tools, doc tools, testing tools, and an intelligent IDE)

Tools matter. Scala has an excellent build tool in the SBT. It has code documentation in the form of scaladoc (similar to javadoc). It has a unit testing framework, and a reasonably intelligent IDE in the form of the Scala IDE (based on Eclipse).

have excellent support for functional programming, including support for immutability and immutable data structures and “monadic” design

I, like many others, am gradually coming to realise that functional programming offers many advantages over other programming styles. In particular, it provides best route to building scalable software, in terms of both program complexity and data size/complexity. Scala has good support for functional programming, including immutable named values, immutable data structures and for-comprehensions. And if off-the-shelf Scala isn’t sufficiently functional already, libraries such as scalaz make it even more so.

allow imperative programming for those (rare) occasions where it makes sense

Although most algorithms in scientific computing are typically conceived of and implemented in an imperative style, I’m increasingly convinced that most can be recast in a pure functional way without significant loss of efficiency, and with significant benefits. That said, there really are some problems that are more efficient to implement in an imperative framework. It is therefore important that the language is not so “pure” functional that this is forbidden. Again, Scala fits the bill.

be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications

These days scalability typically means exploiting concurrency and parallelism. In an imperative world this is hard, and libraries such as MPI prove that it is difficult to bolt parallelism on top of a language post-hoc. Check-points, communication overhead, deadlocks and race conditions make it very difficult to build codes that scale well to more than a few processors. Concurrency is more straightforward in functional languages, and this is one of the reasons for the recent resurgence of functional languages and programming. Scala has good concurrency support built-in, and libraries such as Akka make it relatively easy to build truly scalable software.

Summary

The Scala programming language ticks many boxes when it comes to forming a nice solid foundation for building a platform for efficient scalable statistical computing. Although I still use R and Python almost every day, I’m increasingly using Scala for serious algorithm development. In the short term I can interface to my Scala code from R using jvmr, but in the longer term I hope that Scala will become a complete framework for statistics and data science. In a subsequent post I will attempt to give a very brief introduction to Scala and the Breeze numerical library.