Operator-based rational approximation

Introduction

Several popular Gaussian random field models can be represented as solutions to stochastic partial differential equations (SPDEs) of the form \[ L^{\beta} (\tau u) = \mathcal{W}. \]

Here \(\mathcal{W}\) is Gaussian white noise, \(L\) is a second-order differential operator, the fractional power \(\beta>0\) determines the smoothness of \(u\), and \(\tau>0\) scales the variance of \(u\).

If \(2\beta\) is an integer and if the domain \(\mathcal{D}\) where the model is defined is bounded, then \(u\) can be approximated by a Gaussian Markov random field (GMRF) \(\boldsymbol{\mathrm{u}}\) via a finite element method (FEM) for the SPDE. Specifically, the approximation can be written as \[ u_h(s) = \sum_{i=1}^n u_i \varphi_i(s). \] Here \(\{\varphi_i\}\) are piecewise linear basis functions defined by some triangulation of \(\mathcal{D}\) and the vector of weights \(\boldsymbol{\mathrm{u}} = (u_1,\ldots,u_n)^T\) is normally distributed, \(N(\boldsymbol{\mathrm{u}},\tilde{\boldsymbol{\mathrm{Q}}}^{-1})\), where \(\tilde{\boldsymbol{\mathrm{Q}}}\) is sparse. See (Lindgren, Rue, and Lindström 2011)(https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2011.00777.x) for further details.

In this vignette we focus on the operator-based rational approximation. This approach, introduced by (Bolin and Kirchner 2020)(https://www.tandfonline.com/doi/full/10.1080/10618600.2019.1665537), results in an approximation of the original SPDE which is of the form \(P_l x = P_r \mathcal{W}\), where \(P_l\) and \(P_r\) are non-fractional operators defined in terms of polynomials \(p_l\) and \(p_r\). The order of \(p_r\) is given by \(m\) and the order of \(p_l\) is \(m + m_{\beta}\) where \(m_{\beta}\) is the integer part of \(\beta\) if \(\beta>1\) and \(m_{\beta} = 1\) otherwise.

The combination of the rational approximation of the operator with the FEM discretization yields an approximation \(u_h\) of \(u\) on the basis expansion form above. The difference to the non-fractional case is that the vector of stochastic weights now is \(\boldsymbol{\mathrm{u}} \sim N(\boldsymbol{\mathrm{0}},\boldsymbol{\mathrm{P}}_r\boldsymbol{\mathrm{Q}}^{-1}\boldsymbol{\mathrm{P}}_r^T)\) where \(\boldsymbol{\mathrm{Q}}\) and \(\boldsymbol{\mathrm{P}}_r\) are sparse matrices. Alternatively, \(\boldsymbol{\mathrm{u}}\) can be represented as \(\boldsymbol{\mathrm{u}} = \boldsymbol{\mathrm{P}}_r \boldsymbol{\mathrm{x}}\) where \(\boldsymbol{\mathrm{x}} \sim N(\boldsymbol{\mathrm{0}},\boldsymbol{\mathrm{Q}}^{-1})\), which means that the discrete approximation is a latent GMRF. This can be used for computationally efficient inference and simulation. See Bolin and Kirchner (2020) for further details.

Using the package to perform operator-based rational approximations

The main purpose of the rSPDE package is to provide functions for creating the rational approximation. In this vignette we focus on the operator-based rational approximation, which means assembling the matrices \(\boldsymbol{\mathrm{Q}}\) and \(\boldsymbol{\mathrm{P}}_r\). There are three functions for computing the rational approximation. The most general function is fractional.operators(), which works for a wide class of models with a general differential operator \(L\). For the stationary Matérn case, where \(L = \kappa^2 - \Delta\), the function matern.operators() provides a simplified model specification. For the generalized non-stationary Matérn model, defined through the SPDE \[ (\kappa(s)^2 - \Delta)^\beta (\tau(s)u(s)) = \mathcal{W}, \] the function spde.matern.operators() can be used.

For the covariance-based rational approximation, we refer the reader to the Rational approximation with the rSPDE package vignette.

It is noteworthy that the covariance-based rational approximation only applies to fractional SPDE models with Gaussian noise, whereas the operator-based rational approximation can be used for more general noises such as the type-G Lévy noises. We refer the reader to (Wallin and Bolin 2015)(https://onlinelibrary.wiley.com/doi/full/10.1111/sjos.12141), (Bolin 2013)(https://onlinelibrary.wiley.com/doi/abs/10.1111/sjos.12046) and (Asar et al. 2020)(https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/rssc.12405) for examples of models driven by type-G Lévy noises. We also refer the reader to the ngme package where one can fit such models.

Once the approximation has been constructed, it can be included manually in statistical models just as for the non-fractional case. The package has some built-in functions for basic use of the approximation, such as simulate() which can be applied for simulation of the field. There are also functions for likelihood evaluation and kriging prediction for geostatistical models with Gaussian measurement noise where \(u\) is included. In the following sections, we will illustrate the usage of these functions.

Constructing the approximation

In this section, we explain how the different main functions can be used for constructing the rational approximation. The first step for constructing the rational SPDE approximation is to define the FEM mesh. For illustration purposes, the rSPDE package contains a simple FEM implementation for models on \(\mathbb{R}\) which we will use first. We will then illustrate how spatial models can be constructed, if the FEM implementation of the R-INLA package is used instead.

Assume that we want to define a model on the interval \([0,1]\). We then start by defining a vector with mesh nodes \(s_i\) where the basis functions \(\varphi_i\) are centered.

s <- seq(from = 0, to = 1, length.out = 101)

Based on these nodes, we use the built-in function rSPDE.fem1d() to assemble two matrices needed for creating the approximation of a basic Matérn model. These matrices are the mass matrix \(\boldsymbol{\mathrm{C}}\), with elements \(C_{ij} = \int \varphi_j(s) \varphi_i(s) ds\), and the stiffness matrix \(\boldsymbol{\mathrm{G}}\), with elements \(G_{ij} = \int \nabla\varphi_j(s) \cdot \nabla\varphi_i(s) ds\).

fem <- rSPDE.fem1d(s)

We can now use matern.operators() to construct a rational SPDE approximation of degree \(m=1\) for a Gaussian random field with a Matérn covariance function on the interval. Since we are using the operator-based approximation, we must set type to "operator".

kappa <- 20
sigma <- 2
nu <- 0.8
op <- matern.operators(kappa = kappa, sigma = sigma, nu = nu,
                       G = fem$G, C = fem$C, d = 1, m = 1,
                       type="operator")

The object op contains the matrices needed for evaluating the distribution of the stochastic weights \(\boldsymbol{\mathrm{u}}\). If we want to evaluate \(u_h(s)\) at some locations \(s_1,\ldots, s_n\), we need to multiply the weights with the basis functions \(\varphi_i(s)\) evaluated at the locations. For this, we can construct the observation matrix \(\boldsymbol{\mathrm{A}}\) with elements \(A_{ij} = \varphi_j(s_i)\), which links the FEM basis functions to the locations. This matrix can be constructed using the function rSPDE.A1d().

To evaluate the accuracy of the approximation, let us compute the covariance function between the process at \(s=0.5\) and all other locations in s and compare with the true Matérn covariance function, which is the folded Matérn, see Theorem 1 in An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach for further details. The covariances can be calculated as \[ \boldsymbol{\mathrm{A}} \boldsymbol{\mathrm{P}}_r \boldsymbol{\mathrm{Q}}^{-1}\boldsymbol{\mathrm{P}}_r^T\boldsymbol{\mathrm{v}}. \] Here \(\boldsymbol{\mathrm{A}}\) is an identity matrix since we are evaluating the approximation in the nodes of the FEM mesh and \(\boldsymbol{\mathrm{v}}\) is a vector with all basis functions evaluated in \(s=0.5\).

v <- t(rSPDE.A1d(s,0.5))
c.approx <- op$Pr %*% solve(op$Q, op$Pr %*% v)
c.true <- folded.matern.covariance(rep(0.5,length(s)),abs(s), kappa, nu, sigma)

The covariance function and the error compared with the Matérn covariance are shown in the following figure.

To improve the approximation we can increase the degree of the polynomials, by increasing \(m\), and/or increase the number of basis functions used for the FEM approximation. Let us, for example, compute the approximation with \(m=2\) using the same mesh, as well as the approximation when we increase the number of basis functions and use \(m=1\) and \(m=2\).

op2 <- matern.operators(kappa = kappa, sigma = sigma, nu = nu,
                       G = fem$G, C = fem$C, d = 1, m = 2,
                       type="operator")
c.approx2 <- op2$Pr %*% solve(op2$Q, op2$Pr %*% v)

s2 <- seq(from = 0, to = 1, length.out = 501)
fem2 <- rSPDE.fem1d(s2)
op <- matern.operators(kappa = kappa, sigma = sigma, nu = nu,
                       G = fem2$G, C = fem2$C, d = 1, m=1,
                       type="operator")
A <- rSPDE.A1d(s2,s)
v <- t(rSPDE.A1d(s2,0.5))
c.approx3 <- A%*%op$Pr %*% solve(op$Q, op$Pr %*% v)
op <- matern.operators(kappa = kappa, sigma = sigma, nu = nu,
                       G = fem2$G, C = fem2$C, d = 1, m=2,
                       type="operator")
c.approx4 <- A%*%op$Pr %*% solve(op$Q, op$Pr %*% v)

The resulting errors are shown in the following figure.

Since the error induced by the rational approximation decreases exponentially in \(m\), there is in general rarely a need for an approximation with a large value of \(m\). This is good because the number of non-zero elements in \(\boldsymbol{\mathrm{P}}_r\) and \(\boldsymbol{\mathrm{Q}}\) increases with \(m\), which makes the approximation more computationally costly to use. Further, the condition number of \(\boldsymbol{\mathrm{P}}_r\) and \(\boldsymbol{\mathrm{Q}}\) increases with \(m\), which can cause numerical problems when working with these matrices. To illustrate this, let us compute the norm of the approximation error for different \(m\).

errors <- rep(0,4)
for(i in 1:4){
  op <- matern.operators(kappa = kappa, sigma = sigma, nu = nu,
                       G = fem2$G, C = fem2$C, d = 1, m = i,
                       type = "operator")
  c.app <- A%*%op$Pr %*% solve(op$Q, op$Pr %*% v)
  errors[i] <- norm(c.true-c.app)
}
print(errors)
#> [1]  1.0120165  0.1100542 52.4841240 54.6499071

We see that the error decreases when increasing \(m\) from \(1\) to \(2\), but is very large for \(m=3\) and \(m=4\). The reason for this is not that the approximation is bad, but that the numerical accuracy of the product \(\boldsymbol{\mathrm{P}}_r\boldsymbol{\mathrm{Q}}^{-1}\boldsymbol{\mathrm{P}}_rv\) is low due to the high condition numbers of the matrices.

It is important to notice that the covariance-based rational approximation is more numerically stable, the main reason being that its clever decomposition of the latent field into a vector of random fields remove the need of higher order finite element matrices to have higher order rational approximations. We refer the reader to the Rational approximation with the rSPDE package vignette for further details.

To solve this issue for the operator-based rational approximation, the package contains functions for performing operations such as \(\boldsymbol{\mathrm{P}}_rv\) or \(\boldsymbol{\mathrm{P}}_r^{-1}v\) that takes advantage of the structure of \(\boldsymbol{\mathrm{P}}_r\) to avoid numerical issues. A complete list of these function can be seen by typing ?operator.operations. One of these functions is Sigma.mult(), which performs the multiplication \(\boldsymbol{\mathrm{P}}_r\boldsymbol{\mathrm{Q}}^{-1}\boldsymbol{\mathrm{P}}_rv\) in a more numerically stable way. Let us use this function to compute the errors of the approximations again to see that we indeed get better approximations as \(m\) increases.

errors2 <- rep(0,4)
for(i in 1:4){
  op <- matern.operators(kappa = kappa, sigma = sigma, nu = nu,
                       G = fem2$G, C = fem2$C, d = 1, m = i,
                       type="operator")
  c.app <- A%*%Sigma.mult(op, v)
  errors2[i] <- norm(c.true-c.app)
}
print(errors2)
#> [1] 1.01201540 0.10479953 0.02474815 0.01788324

A non-stationary model

Let us now examine a non-stationary model \((\kappa(s)^2 - \Delta)^\beta (\tau(s)u(s)) = \mathcal{W}\) with \(\kappa(s) = 10(1+2s^2)\) and \(\tau(s) = 0.1(1 - 0.7s^2)\). We can then use spde.matern.operators(), which currently is only available for operator-based rational approximation, to create the rational approximation with \(m=1\) as follows.

s <- seq(from = 0, to = 1, length.out = 501)
fem <- rSPDE.fem1d(s)
kappa <-  10*(1+2*s^2)
tau <-  0.1*(1 - 0.7*s^2)
op <- spde.matern.operators(kappa = kappa, tau = tau, nu = nu, 
                            G = fem$G, C = fem$C, d = 1, m=1)

Let us compute the covariance function \(C(s,s_i)\) of the non-stationary model for the locations \(s_1=0.1, s_2 = 0.5,\) and \(s_3 = 0.9\).

v <- t(rSPDE.A1d(s, c(0.1,0.5,0.9)))
covs <- Sigma.mult(op, v)

The three covariances are shown in the following figure.

We see that this choice of \(\kappa(s)\) and \(\tau(s)\) results in a model with longer range for small values of \(s\) and smaller variance in the middle of the domain. We can also apply the general function fractional.operators() to construct the approximation. This function requires that one supplies a discretization of the non-fractional operator \(L\), as well as a scaling factor \(c>0\) which is a lower bound for the smallest eigenvalue of \(L\). In our case we have \(L = \kappa(s)^2 - \Delta\), and the eigenvalues of this operator is bounded from below by \(c = \min_s \kappa(s)^2\). We compute this constant and the discrete operator.

c = min(kappa)^2
L = fem$G + fem$C %*% Diagonal(501, kappa^2)

Another difference between fractional.operators() and the previous functions for constructing the approximation, is that it requires specifying \(\beta\) instead of the smoothness parameter \(\nu\) for the Matérn covariance. These two parameters are related as \(2\beta = \nu + d/2\).

op <- fractional.operators(L = L, beta = (nu + 1/2)/2, C = fem$C, 
                           scale.factor = c, tau = tau, m = 1)

Let’s make sure that we have the same approximation by comparing the previously computed covariances.

covs2 <- Sigma.mult(op,v)
norm(covs-covs2)
#> [1] 0

Obviously, it is simpler to use spde.matern.operators() in this case, but the advantage with fractional.operators() is that it also can be used for other more general models such as one with \(L = \kappa(s)^2 - \nabla \cdot (\boldsymbol{\mathrm{H}}(s) \nabla)\) for some matrix-valued function \(\boldsymbol{\mathrm{H}}(s)\).

Using the approximation

For any approximation, constructed using the functions fractional.operators(), matern.operators(), or spde.matern.operators(), we can simulate from the model using simulate().

u <- simulate(op)

There is also built-in support for kriging prediction. To illustrate this, we use the simulation to create some noisy observations of the process. For this, we first construct the observation matrix linking the FEM basis functions to the locations where we want to simulate. We first randomly generate some observation locations and then construct the matrix.

n.obs <- 20
obs.loc <- runif(n = n.obs, min = 0, max = 1)
A <- rSPDE.A1d(s, obs.loc)

We now generate the observations as \(Y_i = u(s_i) + \varepsilon_i\), where \(\varepsilon_i \sim N(0,\sigma_e^2)\) is Gaussian measurement noise.

sigma.e <- 0.3
Y <- as.vector(A %*% u + sigma.e * rnorm(n.obs))

Finally, we compute the kriging prediction of the process \(u\) at the locations in s based on these observations. To specify which locations that should be predicted, the argument Aprd is used. This argument should be an observation matrix that links the mesh locations to the prediction locations.

A.krig <- rSPDE.A1d(s, s)
u.krig <- predict(op, A = A, Aprd = A.krig, Y = Y, sigma.e = sigma.e)

The process simulation, the observed data, and the kriging prediction are shown in the following figure.

Spatial data and parameter estimation

The functions used in the previous examples also work for spatial models. We then need to construct a mesh over the domain of interest and then compute the matrices needed to define the operator. These tasks can be performed, for example, using the R-INLA package. Let us start by defining a mesh over \([0,1]\times [0, 1]\) and compute the mass and stiffness matrices for that mesh.

library(INLA)
m = 100
loc_2d_mesh = matrix(runif(m*2),m,2)
mesh_2d = inla.mesh.create.helper(
  points=loc_2d_mesh,
  cutoff=0.05,
  offset=c(0.1,0.4),
  max.edge=c(0.05,0.5) )
plot(mesh_2d, main= "")
points(loc_2d_mesh[,1],loc_2d_mesh[,2])


fem_mesh_2d <- inla.mesh.fem(mesh_2d)
C = fem_mesh_2d$c0
G <- fem_mesh_2d$g1

We can now use these matrices to define a rational SPDE approximation of degree \(m=1\) for a Matérn model in the same was as before. We will now simulate a latent process with standard deviation \(\sigma=1\) and range \(0.1\). We will use \(\nu=0.5\) so that the model has an exponential covariance function.

nu = 0.5
sigma = 1 
range = 0.1
kappa = sqrt(8*nu)/range
tau = 1/(sqrt(4*pi)*kappa*sigma)
op <- spde.matern.operators(kappa = kappa, tau = tau, nu = nu, G = G, C = C, d = 2, m = 1)

Now let us simulate some noisy data that we will use to estimate the parameters of the model. To construct the observation matrix, we use the R-INLA function inla.spde.make.A(). We will sample 30 replicates of the latent field.

n.rep = 30
u <- simulate(op, nsim=n.rep)
A <- inla.spde.make.A(
  mesh=mesh_2d,
  loc=loc_2d_mesh)
sigma.e<- 0.1
Y = A %*% u + matrix(rnorm(m*n.rep),ncol=n.rep) * sigma.e

The first replicate of the simulated random field as well as the observation locations are shown in the following figure.

For each type of rational approximation of degree \(m\), there is a corresponding likelihood function that can be used for likelihood-based parameter estimation. Since we constructed the model with spde.matern.operators(), we can use the function spde.matern.loglike() to define the likelihood. To simplify parameter estimation, we create an objective function to minimize which is the negative log-likelihood, parametrized using the logarithm of each parameter to avoid constrained optimization.

mlik <- function(theta, Y, G, C, A) {
  return(-spde.matern.loglike(tau=exp(theta[1]), kappa=exp(theta[2]), 
                              nu=exp(theta[3]), sigma.e=exp(theta[4]),
                              Y = Y, G = G, C = C, A = A, d = 2, m=1))
}

We can now estimate the parameter using optim().

theta0 = c(get.inital.values.rSPDE(mesh=mesh_2d, include.tau = TRUE),
              log(0.1*sqrt(var(as.vector(Y)))))
start_time <- Sys.time()
pars <- optim(theta0, mlik, Y = Y, G = G, C = C, A = A, method = "L-BFGS-B")
end_time <- Sys.time()
total_time <- end_time - start_time
results <- data.frame(tau = c(tau, exp(pars$par[1])),
                      kappa = c(kappa, exp(pars$par[2])), 
                      nu = c(nu, exp(pars$par[3])),
                      sigma.e = c(sigma.e, exp(pars$par[4])),
                      row.names = c("True", "Estimate"))
print(results)
#>                  tau    kappa        nu    sigma.e
#> True     0.014104740 20.00000 0.5000000 0.10000000
#> Estimate 0.002885492 25.69506 0.8895145 0.08292449

#Total time
print(total_time)
#> Time difference of 1.38297 mins