The dqrng package provides fast random number generators (RNG) with good statistical properties for usage with R. It combines these RNGs with fast distribution functions to sample from uniform, normal or exponential distributions. Both the RNGs and the distribution functions are distributed as C++ header-only library.
Support for the following 64bit RNGs are currently included:
Of these RNGs Xoroshiro128+ is fastest and therefore used in the examples and set as default RNG.
Using these RNGs from R is deliberately similar to using R’s build-in RNGs:
dqRNGkind()
is analogue to RNGkind()
and sets the RNGdqset.seed()
is analogue to set.seed()
and sets the seeddqrunif()
, dqrnorm()
, and dqrexp()
are analogue to runif()
, rnorm()
, and rexp()
and sample from uniform, normal or exponential distributionsdqsample()
and dqsample.int
are analogue to sample
and sample.int
for creating random samples of vectors and vector indicesLet’s look at the classical example of calculating \(\pi\) via simulation. The basic idea is to generate a large number of random points within the unit square. An approximation for \(\pi\) can then be calculated from the ratio of points within the unit circle to the total number of points. A vectorized implementation in R where we can switch the RNG might look like this:
<- 1e7
N <- function(n, rng = runif) {
piR <- rng(n)
x <- rng(n)
y 4 * sum(sqrt(x^2 + y^2) < 1.0) / n
}set.seed(42)
system.time(cat("pi ~= ", piR(N), "\n"))
#> pi ~= 3.140899
#> user system elapsed
#> 0.960 0.112 1.081
Using dqrng is about three times faster:
library(dqrng)
dqRNGkind("Xoroshiro128+")
dqset.seed(42)
system.time(cat("pi ~= ", piR(N, rng = dqrunif), "\n"))
#> pi ~= 3.14042
#> user system elapsed
#> 0.343 0.103 0.496
Since the calculations add a constant off-set, the speed-up for the RNGs alone has to be even greater:
system.time(runif(N))
#> user system elapsed
#> 0.311 0.037 0.350
system.time(dqrunif(N))
#> user system elapsed
#> 0.040 0.036 0.076
Similar for the exponential distribution:
system.time(rexp(N))
#> user system elapsed
#> 0.512 0.013 0.524
system.time(dqrexp(N))
#> user system elapsed
#> 0.086 0.020 0.106
And for the normal distribution:
system.time(rnorm(N))
#> user system elapsed
#> 0.642 0.020 0.661
system.time(dqrnorm(N))
#> user system elapsed
#> 0.115 0.016 0.132
As well as for sampling with and without replacement:
system.time(for (i in 1:100) sample.int(N, N/100, replace = TRUE))
#> user system elapsed
#> 0.727 0.016 0.744
system.time(for (i in 1:100) dqsample.int(N, N/100, replace = TRUE))
#> user system elapsed
#> 0.046 0.012 0.059
system.time(for (i in 1:100) sample.int(N, N/100))
#> user system elapsed
#> 2.550 1.592 4.215
system.time(for (i in 1:100) dqsample.int(N, N/100))
#> user system elapsed
#> 0.125 0.008 0.132
The RNGs and distributions functions can also be used from C++ at various levels of abstraction. Technically there are three ways to make use of dqrng at the C++ level:
// [[Rcpp::depends(dqrng)]]
together with Rcpp::sourceCpp()
Rcpp::cppFunction(depends = "dqrng", ...)
LinkingTo: dqrng
Here only the first approach is shown.
The functions available in R directly call corresponding C++ functions. These functions are also available at the C++ level if you include dqrng.h
. The full ist of functions is available with vignette("cpp-api", package = "dqrng")
. Revisiting the example of approximating \(\pi\) we can use:
// [[Rcpp::depends(dqrng)]]
#include <Rcpp.h>
#include <dqrng.h>
using Rcpp::IntegerVector;
using Rcpp::NumericVector;
using Rcpp::sqrt;
using Rcpp::sum;
using dqrng::dqrunif;
// [[Rcpp::export]]
double piCpp(const int n) {
"Xoroshiro128+");
dqrng::dqRNGkind(42));
dqrng::dqset_seed(IntegerVector::create(
NumericVector x = dqrunif(n);
NumericVector y = dqrunif(n);
NumericVector d = sqrt(x*x + y*y);return 4.0 * sum(d < 1.0) / n;
}/*** R
system.time(cat("pi ~= ", piCpp(1e7), "\n"))
*/
Note that in C++ you have to use dqrng::dqset_seed()
, whereas the analogue function in the R interface is called dqrng::dqset.seed()
. For sampling with and without replacement dqrng::dqsample_int()
and dqrng::dqsample_num()
are the analogue of dqrng::dqsample.int()
in the R interface:
// [[Rcpp::depends(dqrng)]]
#include <Rcpp.h>
#include <dqrng.h>
// [[Rcpp::export]]
void sampleCpp(const int n) {
"Xoroshiro128+");
dqrng::dqRNGkind(42));
dqrng::dqset_seed(Rcpp::IntegerVector::create(100, true);
Rcpp::IntegerVector sample = dqrng::dqsample_int(n, n/std::endl;
Rcpp::Rcout << sample <<
}/*** R
sampleCpp(1000)
*/
The RNG wrapper and distributions functions can be used from C++ by including dqrng_generator.h
and dqrng_distribution.h
. In order to use these header files, you have to use at least C++11 and link to the BH
and sitmo
packages as well. For example, you can use the distribution functions from dqrng together with some foreign 64bit RNG. Here a minimal SplitMix generator is used together with dqrng::normal_distribution
:
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH, sitmo)]]
#include <dqrng_distribution.h>
// [[Rcpp::plugins(cpp11)]]
#include <cstdint>
class SplitMix {
public:
typedef uint64_t result_type;
result_type seed) : state(seed) {};
SplitMix (result_type operator() () {
result_type z = (state += 0x9e3779b97f4a7c15ULL);
30)) * 0xbf58476d1ce4e5b9ULL;
z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
z = (z ^ (z >> return z ^ (z >> 31);
}void seed(result_type seed) {state = seed;}
static constexpr result_type min() {return 0;};
static constexpr result_type max() {return UINT64_MAX;};
private:
result_type state;
};
// [[Rcpp::export]]
const int n, const double mean = 0.0, const double sd = 1.0) {
Rcpp::NumericVector splitmix_rnorm(auto rng = dqrng::generator<SplitMix>(42);
dqrng::normal_distribution dist(mean, sd);
Rcpp::NumericVector result(n);std::generate(result.begin(), result.end(), [&dist, &rng](){return dist(*rng);});
return result;
}/*** R
splitmix_rnorm(10)
system.time(splitmix_rnorm(1e7))
*/
Since SplitMix is a very fast RNG, the speed of this function is comparable to dqrnorm
. Generally speaking you can use any C++11 compliant RNG with 64 bit output size. For example, here the 64 bit Threefry engine with 13 rounds from package sitmo is used:
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH, sitmo)]]
#include <dqrng_distribution.h>
#include <threefry.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
const int n, const double mean = 0.0, const double sd = 1.0) {
Rcpp::NumericVector threefry_rnorm(auto rng = dqrng::generator<sitmo::threefry_13_64>(42);
dqrng::normal_distribution dist(mean, sd);
Rcpp::NumericVector result(n);std::generate(result.begin(), result.end(), [&dist, &rng](){return dist(*rng);});
return result;
}/*** R
threefry_rnorm(10)
system.time(threefry_rnorm(1e7))
*/
Alternatively, you could combine the included RNGs together with dqrng’s tooling and some other distribution function. For example, this function generates random numbers according to the normal distribution using the standard library from C++11:
#include <random>
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH, sitmo)]]
#include <dqrng_distribution.h>
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
const int n, const double mean = 0.0, const double sd = 1.0) {
Rcpp::NumericVector std_rnorm(auto rng = dqrng::generator<dqrng::xoroshiro128plus>(42);
std::normal_distribution<double> dist(mean, sd);
Rcpp::NumericVector result(n);std::generate(result.begin(), result.end(), [&dist, &rng](){return dist(*rng);});
return result;
}/*** R
std_rnorm(10)
system.time(std_rnorm(1e7))
*/
Typically this is not as fast as dqrnorm
, but the technique is useful to support distributions not (yet) included in dqrng. Note however, that the algorithms used for the distributions from C++11 are implementation defined.
Finally you could directly use the base generators, which are provided as header-only libraries, without dqrng’s tooling. For example, the following function uses the 32 bit PCG variant together with Boost’s normal distribution function:
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH, sitmo)]]
#include <pcg_random.hpp>
#include <boost/random/normal_distribution.hpp>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
const int n, const double mean = 0.0, const double sd = 1.0) {
Rcpp::NumericVector boost_pcg_rnorm(42);
pcg32 rng(boost::random::normal_distribution<double> dist(mean, sd);
Rcpp::NumericVector result(n);std::generate(result.begin(), result.end(), [&dist, &rng](){return dist(rng);});
return result;
}/*** R
boost_pcg_rnorm(10)
system.time(boost_pcg_rnorm(1e7))
*/
This is quite fast since boost::random::normal_distribution
uses the fast Ziggurat algorithm. For some applications it is necessary to draw random numbers from multiple distributions with varying parameters. The following function uses a binomial distribution (from boost.random
) as well as the normal distribution from dqrng
. The parameters of the distributions are adjusted for every draw using <distribution>::param_type
:
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH, sitmo)]]
#include <boost/random/binomial_distribution.hpp>
#include <dqrng_distribution.h>
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]
// aliases for the used ditributions
using binomial = boost::random::binomial_distribution<int>;
using normal = dqrng::normal_distribution;
// [[Rcpp::export]]
int n) {
Rcpp::NumericMatrix multiple_distributions(42);
dqrng::xoshiro256plus rng(// distributions with default parameters
binomial bernoulli;
normal normal;3);
Rcpp::NumericMatrix out(n, double p = 0.0;
for (int i = 0; i < n; ++i) {
double(i) / double(n);
p = 0) = bernoulli(rng, binomial::param_type(1, p));
out(i,1) = normal(rng, normal::param_type(p, 1.0));
out(i,2) = normal(rng, normal::param_type(4.0, 3.0 - p));
out(i,
}"Bernoulli", "Normal1", "Normal2");
Rcpp::colnames(out) = Rcpp::CharacterVector::create(return out;
}
/*** R
multiple_distributions(5)
*/