Creating an R package from your code

The first step is optional, but we recommend that you take it from the start, because there is really no downside to it. You work with R code in several files that you run by hand, or diretly put all code into an R package. Creating and managing R packages is very easy, and it’s easier to pass on your code because everything, including help, is in on package.

To make parameter variation of calibrations from R, the first thing thing we need to do is to be able to execute the model with a given set of parameters. We will assume that we want to run the model with parameters coded in an object called par, which should be either a (named) vector or a list.

What happens now depends on how your model is programmed - I have listed the steps in order of convenience / speed. If your model has never been interfaced with R you will likely have to move down to the last option.

Case 1 - model programmed in R

Usually, you will have to do nothing. Just make sure you can call your model like that

runMyModel(par)

Case 2 - model programmed in C / C++, interfaced with RCPP

RCPP is a highly flexible environment to interface between R and C/C++. If your model is coded in C / C++, RCPP offers the most save and powerful way to connect with R (much more flexible than with command line or dll). However, doing the interface may need some larger adjustments in the beginning, and there can be technical problems that are difficult to solve for beginners. We do not recommend to attempt this unless you have RCPP experience, and thus we do not provide further help on this topic.

Case 3 - compiled executable, parameters set via command line (std I/O)

If your model is written in a compiled or interpreted language, and accepts parameters via std I/O, wrapping is usually nothing more than writing the system call in an R function. An example would be

runMyModel(par){
  
  # Create here a string with what you would write to call the model from the command line
  systemCall <- paste("model.exe", par[1], par[2])
  
  # this 
  system(systemCall)
  
}

Note: If you have problems with the system command, try system2.

Case 4 - compiled dll, parameters are set via dll interface

If you have your model prepared as a dll, or you can prepare it that way, you can use the function to link R to your model

dyn.load(model)

runMyModel(par){
  # model call here 
}

The tricky thing in this approach is that you have to code the interface to your dll, which technically means in most programming languages to set your variables as external or something similar, so that they can be accessed from the outside. How this works will depend on the programming language.

Case 5 - compiled model, parameters set via parameter file or in any other method

Many models read parameters with a parameter file. In this case you want to do something like this

runMyModel(par, returnData = NULL){
  
  writeParameters(par)
  
  system("Model.exe")
  
  if(! is.null(returnData)) return(readData(returnData)) # The readData function will be defined later
  
}

writeParameters(par){
  
  # e.g.
  # read template parameter fil
  # replace strings in template file
  # write parameter file 
}

Depending on your problem, it can also make sense to define a setup function such as

setUpModel <- function(parameterTemplate, site, localConditions){
  
  # create the runModel, readData functions (see later) here
  
  return(list(runModel, readData))
  
}

How you do the write parameter function depends on the file format you use for the parameters. In general, you probably want to create a template parameter file that you use as a base and from which you change parameters

If your parameter file is in an .xml format, check out the xml functions in R
If your parameter file is in a general text format, the best option may be to create a template parameter file, place a unique string at the locations of the parameters that you want to replace, and then use string replace functions in R, e.g. grep to replace this string.

Case 6 - compiled model, parameters cannot be changed

You have to change your model code to achieve one of the former options. If the model is in C/C++, going directly to RCPP seems the best alternative.

Step 2: Reading back data

For simple models, you might consider returning the model output directly with the runMyModel function. This is probably so for cases a) and b) above, i.e. model is already in R, or accepts parameters via command line.

More complicated models, however, produce a large number of outputs and you typically don’t need all of them. It is therefore more useful to make on or several separate readData or getDate function. The only two different cases I will consider here is

via dll / RCPP
via file ouputs

Model is a dll If the model is a dll file, the best thing would probably be to implement appropriate getData functions in the source code that can then be called from R. If your model is in C and in a dll, interfacing this via RCPP would probably be easier, because you can directly return R dataframes and other data structures.

Model writes file output If the model writes file output, write a getData function that reads in the model outputs and returns the data in the desired format, typically the same that you would use to represent your field data.

getData(type = X){
  
  read.csv(xxx)
  
  # do some transformation 
  
  # return data in desidered format   
}

From R, you should now be able to do something like that

par = c(1,2,3,4 ..)

runMyModel(par)

output <- getData(type = DesiredType)

plot(output)

Speed optimization and parallelization

For running sensitivity analyses or calibrations, runtime is often an issue. Before you parallelize, make sure your model is as fast as possible.

Easy things

Are you compiling with maximum optimization (e.g. -o3 in cpp)
If you have a spin-up phase, could you increase the time-step during this phase?
Could you increase the time step generally
Do you write unnecessary outputs that you could turn off (harddisk I/O is often slow)?

Difficult things

Make the model directly callable (RCPPor dll) to avoid harddisk I/O
Is it possible to reduce initialization time (not only spin-up, but also for reading in the forcings / drivers) by avoid ending the model executable after each run, but rather keep it “waiting” for a new run.
Code optimization: did you use a profiler? Read up on code optimization
Check for unnecessary calculations in your code / introduce compiler flags where appropriate

Parallelization

A possibilty to speed up the run time of your model is to run it on multiple cores (CPU’s). To do so, you have two choices:

Parallelize the model itself
Parallelize the model call, so that you can do several model evaluations in parallel

Which makes more sense depends a lot on your problem. The usual advice in parallel computing is to parallelize the outer loops first to minimize commonication, which would suggest to start with 2. This is also much easier to program. This approach is particular useful for algorithms that can use a large number of cores in parallel, e.g. sensitivity analyses or SMC sampling. MCMC samplers or optimization algorithms, however, can typically only use a limited number of cores.

To parallelize the model itself will be interesting in particular for very large models, which could otherwise not be calibrated with MCMCs. However, this approach will typically require to write parrallel C/C++ code and require advanced programming skills, which is the reason why we will not further discuss it here.

Let’s assume thus that we want to run several model evaluations in R in parallel, on multiple cores of across a cluster or grid computing system. A first requirement to do so is to to have your model wrapped into an R function (see PREVIOUS SECTION).

If that is the case, R offers a number of options to run functions in parallel. The easiest is to use the parallel package that comes with the R core. For other packages, see the internet and the CRAN task view on High Performance Computing

As an example, assume we have the following, very simple model:

mymodel<-function(x){
  output<-0.2*x+0.1^x
  return(output)
}

To start a parallel computation, we will first need to create a cluster object. Here we will initiate a cluster with 2 CPU’s.

library(parallel)
cl <- makeCluster(2)

runParallel<- function(parList){
  
  parSapply(cl, parList, mymodel)
  
}

runParallel(c(1,2))

## [1] 0.30 0.41

Something like the previous loop is automatized in BayesianTools. You can create a parallel model evaluation function with the function generateParallelExecuter, or alternatively directly in the createBayesianSetup

library(BayesianTools)
parModel <- generateParallelExecuter(mymodel)

Thread safety

Thread safety quite generally means that you can execute multiple instances of your code on your hardware. There are various things that can limit Thread safety, for example

writing outputs to file (several threads might write to the same file at the same time)

Interfacing your model with R

Florian Hartig