Interfacing your model with R

Florian Hartig

Abstract

This tutorial discusses how to interface models written in other programming languages with R, so that they can be fit with BayesianTools

Creating an R package from your code

The first step is optional, but we recommend that you take it from the start, because there is really no downside to it. You work with R code in several files that you run by hand, or diretly put all code into an R package. Creating and managing R packages is very easy, and it’s easier to pass on your code because everything, including help, is in on package.

To make parameter variation of calibrations from R, the first thing thing we need to do is to be able to execute the model with a given set of parameters. We will assume that we want to run the model with parameters coded in an object called par, which should be either a (named) vector or a list.

What happens now depends on how your model is programmed - I have listed the steps in order of convenience / speed. If your model has never been interfaced with R you will likely have to move down to the last option.

Case 1 - model programmed in R

Usually, you will have to do nothing. Just make sure you can call your model like that

Case 2 - model programmed in C / C++, interfaced with RCPP

RCPP is a highly flexible environment to interface between R and C/C++. If your model is coded in C / C++, RCPP offers the most save and powerful way to connect with R (much more flexible than with command line or dll). However, doing the interface may need some larger adjustments in the beginning, and there can be technical problems that are difficult to solve for beginners. We do not recommend to attempt this unless you have RCPP experience, and thus we do not provide further help on this topic.

Case 3 - compiled executable, parameters set via command line (std I/O)

If your model is written in a compiled or interpreted language, and accepts parameters via std I/O, wrapping is usually nothing more than writing the system call in an R function. An example would be

Note: If you have problems with the system command, try system2.

Case 4 - compiled dll, parameters are set via dll interface

If you have your model prepared as a dll, or you can prepare it that way, you can use the function to link R to your model

The tricky thing in this approach is that you have to code the interface to your dll, which technically means in most programming languages to set your variables as external or something similar, so that they can be accessed from the outside. How this works will depend on the programming language.

Case 5 - compiled model, parameters set via parameter file or in any other method

Many models read parameters with a parameter file. In this case you want to do something like this

Depending on your problem, it can also make sense to define a setup function such as

How you do the write parameter function depends on the file format you use for the parameters. In general, you probably want to create a template parameter file that you use as a base and from which you change parameters

Case 6 - compiled model, parameters cannot be changed

You have to change your model code to achieve one of the former options. If the model is in C/C++, going directly to RCPP seems the best alternative.

Step 2: Reading back data

For simple models, you might consider returning the model output directly with the runMyModel function. This is probably so for cases a) and b) above, i.e. model is already in R, or accepts parameters via command line.

More complicated models, however, produce a large number of outputs and you typically don’t need all of them. It is therefore more useful to make on or several separate readData or getDate function. The only two different cases I will consider here is

Model is a dll If the model is a dll file, the best thing would probably be to implement appropriate getData functions in the source code that can then be called from R. If your model is in C and in a dll, interfacing this via RCPP would probably be easier, because you can directly return R dataframes and other data structures.

Model writes file output If the model writes file output, write a getData function that reads in the model outputs and returns the data in the desired format, typically the same that you would use to represent your field data.

getData(type = X){
  
  read.csv(xxx)
  
  # do some transformation 
  
  # return data in desidered format   
}

From R, you should now be able to do something like that

par = c(1,2,3,4 ..)

runMyModel(par)

output <- getData(type = DesiredType)

plot(output)

Speed optimization and parallelization

For running sensitivity analyses or calibrations, runtime is often an issue. Before you parallelize, make sure your model is as fast as possible.

Easy things

Difficult things

Parallelization

A possibilty to speed up the run time of your model is to run it on multiple cores (CPU’s). To do so, you have two choices:

  1. Parallelize the model itself
  2. Parallelize the model call, so that you can do several model evaluations in parallel

Which makes more sense depends a lot on your problem. The usual advice in parallel computing is to parallelize the outer loops first to minimize commonication, which would suggest to start with 2. This is also much easier to program. This approach is particular useful for algorithms that can use a large number of cores in parallel, e.g. sensitivity analyses or SMC sampling. MCMC samplers or optimization algorithms, however, can typically only use a limited number of cores.

To parallelize the model itself will be interesting in particular for very large models, which could otherwise not be calibrated with MCMCs. However, this approach will typically require to write parrallel C/C++ code and require advanced programming skills, which is the reason why we will not further discuss it here.

Let’s assume thus that we want to run several model evaluations in R in parallel, on multiple cores of across a cluster or grid computing system. A first requirement to do so is to to have your model wrapped into an R function (see PREVIOUS SECTION).

If that is the case, R offers a number of options to run functions in parallel. The easiest is to use the parallel package that comes with the R core. For other packages, see the internet and the CRAN task view on High Performance Computing

As an example, assume we have the following, very simple model:

To start a parallel computation, we will first need to create a cluster object. Here we will initiate a cluster with 2 CPU’s.

## [1] 0.30 0.41

Something like the previous loop is automatized in BayesianTools. You can create a parallel model evaluation function with the function generateParallelExecuter, or alternatively directly in the createBayesianSetup

Thread safety

Thread safety quite generally means that you can execute multiple instances of your code on your hardware. There are various things that can limit Thread safety, for example

  • writing outputs to file (several threads might write to the same file at the same time)