The INLA method is used to compute fast approximative posterior distribution for Bayesian generalised additive models. The hierarchical structure of such a model with latent Gaussian components \(\boldsymbol{u}\), covariance parameters \(\boldsymbol{\theta}\), and measured response variables \(\boldsymbol{y}\), can be written as \[ \begin{aligned} \boldsymbol{\theta} &\sim p(\boldsymbol{\theta}) \\ \boldsymbol{u}|\boldsymbol{\theta} &\sim \mathcal{N}\!\left(\boldsymbol{\mu}_u, \boldsymbol{Q}(\boldsymbol{\theta})^{-1}\right) \\ \boldsymbol{\eta}(\boldsymbol{u}) &= \boldsymbol{A}\boldsymbol{u} \\ \boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta} & \sim p(\boldsymbol{y}|\boldsymbol{\eta}(\boldsymbol{u}),\boldsymbol{\theta}) \end{aligned} \] where typically each linear predictor element, \(\eta_i(\boldsymbol{u})\), is linked to a location parameter of the distribution for observation \(y_i\), for each \(i\), via a (non-linear) link function \(g^{-1}(\cdot)\). In the R-INLA implementation, the observations are assumed to be conditionally independent, given \(\boldsymbol{\eta}\) and \(\boldsymbol{\theta}\).
The premise for the inlabru method for non-linear predictors is to build on the existing implementation, and only add a linearisation step. The properties of the resulting approximation will depend on the nature of the non-linearity.
Let \(\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})\) be a non-linear predictor, and let \(\overline{\boldsymbol{\eta}}(\boldsymbol{u})\) be the 1st order Taylor approximation at \(\boldsymbol{u}_0\), \[ \overline{\boldsymbol{\eta}}(\boldsymbol{u}) = \widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) + \boldsymbol{B}(\boldsymbol{u} - \boldsymbol{u}_0) = \left[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) - \boldsymbol{B}\boldsymbol{u}_0\right] + \boldsymbol{B}\boldsymbol{u} , \] where \(\boldsymbol{B}\) is the derivative matrix for the non-linear predictor, evaluated at \(\boldsymbol{u}_0\).
The non-linear observation model \(p(\boldsymbol{y}|g^{-1}[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta})\) is approximated by replacing the non-linear predictor with its linearisation, so that the linearised model is defined by \[ \overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}) = p(\boldsymbol{y}|\overline{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}) = p(\boldsymbol{y}|g^{-1}[\overline{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta}) \approx p(\boldsymbol{y}|g^{-1}[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta}) = p(\boldsymbol{y}|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}) = \widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}) \] The non-linear model posterior is factorised as \[ \widetilde{p}(\boldsymbol{\theta},\boldsymbol{u}|\boldsymbol{y}) = \widetilde{p}(\boldsymbol{\theta}|\boldsymbol{y})\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}), \] and the linear model approximation is factorised as \[ \overline{p}(\boldsymbol{\theta},\boldsymbol{u}|\boldsymbol{y}) = \overline{p}(\boldsymbol{\theta}|\boldsymbol{y})\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}) . \]
The remaining step of the approximation is how to choose the
linearisation point \(\boldsymbol{u}_0\). We start by introducing
a functional \(f(\overline{p}_{\boldsymbol{v}})\) of the
posterior distribution linearised at \(\boldsymbol{v}\), that generates a latent
field configuration. We then seek a fix point of the functional, so that
\(\boldsymbol{u}_0=f(\overline{p}_{\boldsymbol{u}_0})\).
Potential choices for \(f(\cdot)\)
include the posterior expectation \(\overline{E}(\boldsymbol{u}|\boldsymbol{y})\)
and the joint conditional mode, \[
f(\overline{p}_{\boldsymbol{v}})=\mathop{\mathrm{arg\,max}}_{\boldsymbol{u}}
\overline{p}_{\boldsymbol{v}}(\boldsymbol{u}|\boldsymbol{y},\widehat{\boldsymbol{\theta}}),
\] where \(\widehat{\boldsymbol{\theta}}=\mathop{\mathrm{arg\,max}}_{\boldsymbol{\theta}}
\overline{p}_{\boldsymbol{v}}(\boldsymbol{\theta}|\boldsymbol{y})\)
(used in inlabru
from version 2.2.31) One key to the fix
point iteration is that the observation model is linked to \(\boldsymbol{u}\) only through the
non-linear predictor \(\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})\).
In step 4, an approximate line search can be used, that avoids many
potentially expensive evaluations of the non-linear predictor. We
evaluate \(\widetilde{\boldsymbol{\eta}}_1=\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_1)\)
and make use of the linearised predictor information. Let \(\widetilde{\boldsymbol{\eta}}_\alpha=\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_\alpha)\)
and \(\overline{\boldsymbol{\eta}}_\alpha=\overline{\boldsymbol{\eta}}(\boldsymbol{u}_\alpha)=(1-\alpha)\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0)+\alpha\overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)\).
An exact line search would minimise \(\|\widetilde{\boldsymbol{\eta}}_\alpha-\overline{\boldsymbol{\eta}}_1\|\).
Instead, we define a quadratic approximation to the non-linear predictor
as a function of \(\alpha\), \[
\breve{\boldsymbol{\eta}}_\alpha =
\overline{\boldsymbol{\eta}}_\alpha + \alpha^2
(\widetilde{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_1)
\] and minimise the quartic polynomial in \(\alpha\), \[
\begin{aligned}
\|\breve{\boldsymbol{\eta}}_\alpha-\overline{\boldsymbol{\eta}}_1\|^2
&=
\| (\alpha-1)(\overline{\boldsymbol{\eta}}_1 -
\overline{\boldsymbol{\eta}}_0) + \alpha^2
(\widetilde{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_1) \|^2
.
\end{aligned}
\] If initial expansion and contraction steps are carried out,
leading to an initial guess of \(\alpha=\gamma^k\), where \(\gamma>1\) is a scaling factor (see
?bru_options
, bru_method$factor
) and \(k\) is the (signed) number of expansions
and contractions, the quadratic expression is replaced by \[
\begin{aligned}
\|\breve{\boldsymbol{\eta}}_\alpha-\overline{\boldsymbol{\eta}}_1\|^2
&=
\| (\alpha-1)(\overline{\boldsymbol{\eta}}_1 -
\overline{\boldsymbol{\eta}}_0) + \frac{\alpha^2}{\gamma^{2k}}
(\widetilde{\boldsymbol{\eta}}_{\gamma^k} -
\overline{\boldsymbol{\eta}}_{\gamma^k}) \|^2
,
\end{aligned}
\] which is minimised on the interval \(\alpha\in[\gamma^{k-1},\gamma^{k+1}]\).
A potential improvement of step 4 might be to also take into account the prior distribution for \(\boldsymbol{u}\) as a minimisation penalty, to avoid moving further than would be indicated by a full likelihood optimisation.
Note: In inlabru
version 2.1.15, \[
f(\overline{p}_{\boldsymbol{v}})=\left\{\mathop{\mathrm{arg\,max}}_{u_i}
\overline{p}_{\boldsymbol{v}}(u_i|\boldsymbol{y}),\,i=1,\dots,n\right\},
\] was used, which caused problems for some nonlinear models.↩︎