The last Valencia meeting on Bayesian Statistics and the future of Bayesian computation

I’ve spent the last week in Benidorm, Spain, for the 9th and final Valencia meeting on Bayesian Statistics. Nine of us travelled from Newcastle University, making us one of the best-represented groups at the meeting. This was my fifth Valencia meeting. The first I attended was Valencia 5, which took place in Alicante back in 1994, when I was a PhD student at Durham University working with Michael Goldstein. Our contributed paper to the proceedings of that meeting was my first publication, and I’ve been to every meeting since. Michael is one of the few people to have attended all 9 Valencia meetings (in addition to the somewhat mythical “Valencia 0”). It was therefore very fitting that Michael opened the Valencia 9 invited programme (with a great talk on Bayesian analysis of complex computer models). Like many others, I’ll be a little sad to think that the Valencia meetings have come to an end.

The meeting itself was scientifically very interesting. I wish that I had the energy to give a summary of the scientific programme, but unfortunately I don’t! However, anyone who does want to get something of the flavour of the programme should take a look at the “Valencia Snapshots” on Christian Robert’s blog. My own talk gets a mention in Snapshot 4. I presented a paper entitled “Parameter inference for stochastic kinetic models of bacterial gene regulation: a Bayesian approach to systems biology”. Unfortunately my discussant, Sam Kou, was unable to travel to the meeting due to passport problems, but he very kindly produced a pre-recorded video discussion to be played to me and the audience at the end of my talk. After a brief problem with the audio (a recurring theme of the meeting!), this actually worked quite well, though it felt slightly strange replying to his discussion knowing that he could not hear what I was saying!

There were several talks discussing Bayesian approaches to challenging problems in bioinformatics and molecular biology, and these were especially interesting to me. I was also particularly interested in the talks on Bayesian computation. Several talks mentioned the possibility of speeding up Bayesian computation using GPUs, and Chris Holmes gave a nice overview of the current technology and its potential, together with a link to a website providing further information.

Although there is no doubt that GPU technology can provide fairly impressive speedups for certain Bayesian computations, I’m actually a bit of a GPU-sceptic, so let me explain why. There are several reasons. First, I’m always a bit suspicious of a fairly closed and proprietary technology being pushed by a large, powerful company: I prefer my hardware to be open, and my software to be free and open. Second, there isn’t really anything that you can do on a GPU that you can’t do on a decent multicore server or cluster using standard, well-established technologies such as MPI and OpenMP. Third, GPUs are relatively difficult to program, and software development time is very expensive, so in many cases it will dwarf any difference in hardware costs. Fourth, in the days when 64-bit chips and many GB of RAM are sitting on everyone’s desktops, do I really want to go back to 1 GB of RAM, single precision arithmetic and no math libraries?! That hardly seems like the future I’m envisaging! Fifth, there are related products on the horizon, like the Intel Knights Corner, that are likely to offer similar performance gains while being much simpler to develop for. Sixth, it seems likely to me that machines in the future are going to feature massively multicore CPUs, rendering GPU computing obsolete. Finally, although GPUs offer one possible approach to tackling the problem of speedup, they do little for the far more general and important problem of scalability of Bayesian computing and software.

From that perspective, I really enjoyed the talk by Andrew McCallum on “Probabilistic programming with imperatively-defined factor graphs”. Andrew was talking about a flexible machine learning library he is developing called factorie, written for the interesting new language Scala. Whilst that particular library is not exactly what I need, his talk was fundamentally about building frameworks for Bayesian computation which really scale. I think this is the really big issue facing Bayesian computation. Modern languages and software platforms, including so-called cloud computing approaches and technologies like Hadoop and MapReduce, probably represent some of the directions we should be looking in. There is an interesting project called CRdata which is a first step in that direction. Clearly these technologies are somewhat orthogonal to the GPU/speedup issue, but I don’t think they are completely unrelated.
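
As a rough illustration of why MapReduce-style frameworks suit this kind of problem, here is a minimal Python sketch. It is not taken from Andrew’s talk, factorie or CRdata. It assumes independent observations from a normal model, so that the log-likelihood is a sum of per-observation terms which can be computed chunk by chunk (the “map”) and then added together (the “reduce”). All of the function names and the toy data are hypothetical. The chunks are processed sequentially here, whereas a real framework would send each one to a separate worker.

```python
import math
from functools import reduce


def normal_loglik(x, mu, sigma):
    """Log-density of a single observation x under N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces of similar size."""
    size = max(1, math.ceil(len(data) / n_chunks))
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_step(chunk, mu, sigma):
    """Partial log-likelihood for one chunk (the work a single node would do)."""
    return sum(normal_loglik(x, mu, sigma) for x in chunk)


def reduce_step(a, b):
    """Combine two partial log-likelihoods; addition is associative."""
    return a + b


def loglik_mapreduce(data, mu, sigma, n_chunks=4):
    """Full log-likelihood assembled from per-chunk partial sums."""
    partials = [map_step(c, mu, sigma) for c in chunked(data, n_chunks)]
    return reduce(reduce_step, partials, 0.0)


# Toy data (made up for illustration only).
data = [0.3, -1.2, 0.8, 2.1, -0.4, 1.5, 0.0, -0.7, 1.1]

direct = sum(normal_loglik(x, 0.5, 1.0) for x in data)
distributed = loglik_mapreduce(data, 0.5, 1.0, n_chunks=3)

# The two agree up to floating point rounding.
print(abs(direct - distributed) < 1e-9)
```

The point of the sketch is only that the expensive part of many likelihood evaluations splits cleanly across data. Adding more data means adding more chunks rather than needing a faster processor, which is a different kind of gain from the speedup a GPU offers.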