Iterative INLA method

The INLA method for linear predictors

The INLA method is used to compute fast approximate posterior distributions for Bayesian generalised additive models. The hierarchical structure of such a model, with latent Gaussian components \(\boldsymbol{u}\), covariance parameters \(\boldsymbol{\theta}\), and measured response variables \(\boldsymbol{y}\), can be written as \[ \begin{aligned} \boldsymbol{\theta} &\sim p(\boldsymbol{\theta}) \\ \boldsymbol{u}|\boldsymbol{\theta} &\sim \mathcal{N}\!\left(\boldsymbol{\mu}_u, \boldsymbol{Q}(\boldsymbol{\theta})^{-1}\right) \\ \boldsymbol{\eta}(\boldsymbol{u}) &= \boldsymbol{A}\boldsymbol{u} \\ \boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta} & \sim p(\boldsymbol{y}|\boldsymbol{\eta}(\boldsymbol{u}),\boldsymbol{\theta}) \end{aligned} \] where typically each linear predictor element, \(\eta_i(\boldsymbol{u})\), is linked to a location parameter of the distribution for observation \(y_i\), for each \(i\), via a (potentially non-linear) inverse link function \(g^{-1}(\cdot)\). In the R-INLA implementation, the observations are assumed to be conditionally independent given \(\boldsymbol{\eta}\) and \(\boldsymbol{\theta}\).
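As a concrete illustration (not part of the derivation), such a model can be specified in inlabru via latent components and a predictor formula. A minimal sketch, assuming a data frame `df` with a Poisson response `y` and a covariate `x` (the data and component names are illustrative):

```r
# Latent Gaussian model with a *linear* predictor eta = Intercept + beta * x
library(inlabru)

# Components define the latent field u
cmp <- ~ Intercept(1) + beta(x, model = "linear")

# The formula defines eta = A u; 'family' fixes the observation model and link
fit <- bru(cmp,
           formula = y ~ Intercept + beta,
           family  = "poisson",
           data    = df)
```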

Approximate INLA for non-linear predictors

The premise for the inlabru method for non-linear predictors is to build on the existing implementation, and only add a linearisation step. The properties of the resulting approximation will depend on the nature of the non-linearity.

Let \(\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})\) be a non-linear predictor, and let \(\overline{\boldsymbol{\eta}}(\boldsymbol{u})\) be its first order Taylor approximation at \(\boldsymbol{u}_0\), \[ \overline{\boldsymbol{\eta}}(\boldsymbol{u}) = \widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) + \boldsymbol{B}(\boldsymbol{u} - \boldsymbol{u}_0) = \left[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) - \boldsymbol{B}\boldsymbol{u}_0\right] + \boldsymbol{B}\boldsymbol{u} , \] where \(\boldsymbol{B}\) is the derivative matrix of the non-linear predictor, evaluated at \(\boldsymbol{u}_0\).
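To make this concrete, here is a small numerical sketch (the predictor and values are toy choices, not from the text), using `numDeriv::jacobian()` to compute \(\boldsymbol{B}\):

```r
# First order Taylor approximation of a toy non-linear predictor at u0
library(numDeriv)

eta_tilde <- function(u) c(exp(u[1]) + u[2], u[1] * u[2])  # toy eta~(u)
u0 <- c(0.5, -1.0)                        # linearisation point
B  <- jacobian(eta_tilde, u0)             # derivative matrix B at u0
eta_bar <- function(u) drop(eta_tilde(u0) + B %*% (u - u0))

eta_tilde(c(0.6, -0.9))                   # exact predictor
eta_bar(c(0.6, -0.9))                     # its linearisation, close near u0
```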

The non-linear observation model \(p(\boldsymbol{y}|g^{-1}[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta})\) is approximated by replacing the non-linear predictor with its linearisation, so that the linearised model is defined by \[ \overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}) = p(\boldsymbol{y}|\overline{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}) = p(\boldsymbol{y}|g^{-1}[\overline{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta}) \approx p(\boldsymbol{y}|g^{-1}[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta}) = p(\boldsymbol{y}|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}) = \widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}) . \] The non-linear model posterior is factorised as \[ \widetilde{p}(\boldsymbol{\theta},\boldsymbol{u}|\boldsymbol{y}) = \widetilde{p}(\boldsymbol{\theta}|\boldsymbol{y})\,\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}), \] and the linear model approximation is factorised as \[ \overline{p}(\boldsymbol{\theta},\boldsymbol{u}|\boldsymbol{y}) = \overline{p}(\boldsymbol{\theta}|\boldsymbol{y})\,\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}) . \]
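Continuing the toy sketch above, the substitution evaluates an otherwise unchanged likelihood at the linearised predictor; for a Poisson model with log link (so \(g^{-1}(\cdot)=\exp(\cdot)\)):

```r
# Exact vs linearised Poisson log-likelihood
# (eta_tilde, eta_bar, u0 as in the previous sketch; y and u are toy values)
loglik <- function(eta, y) sum(dpois(y, lambda = exp(eta), log = TRUE))

y <- c(3, 1)
u <- c(0.6, -0.9)           # a point near the linearisation point u0
loglik(eta_tilde(u), y)     # log p~(y | u, theta)
loglik(eta_bar(u), y)       # log p_bar(y | u, theta), close for u near u0
```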

Fixed point iteration

The remaining step of the approximation is how to choose the linearisation point \(\boldsymbol{u}_0\). We start by introducing a functional \(f(\overline{p}_{\boldsymbol{v}})\) of the posterior distribution linearised at \(\boldsymbol{v}\) that generates a latent field configuration. We then seek a fixed point of the functional, so that \(\boldsymbol{u}_0=f(\overline{p}_{\boldsymbol{u}_0})\). Potential choices for \(f(\cdot)\) include the posterior expectation \(\overline{E}(\boldsymbol{u}|\boldsymbol{y})\) and the joint conditional mode, \[ f(\overline{p}_{\boldsymbol{v}})=\mathop{\mathrm{arg\,max}}_{\boldsymbol{u}} \overline{p}_{\boldsymbol{v}}(\boldsymbol{u}|\boldsymbol{y},\widehat{\boldsymbol{\theta}}), \] where \(\widehat{\boldsymbol{\theta}}=\mathop{\mathrm{arg\,max}}_{\boldsymbol{\theta}} \overline{p}_{\boldsymbol{v}}(\boldsymbol{\theta}|\boldsymbol{y})\) (used in inlabru from version 2.2.3).¹ One key to the fixed point iteration is that the observation model is linked to \(\boldsymbol{u}\) only through the non-linear predictor \(\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})\). The iteration proceeds as follows (a schematic implementation is sketched after the list):

  1. Let \(\boldsymbol{u}_0\) be an initial linearisation point for the latent variables.
  2. Compute the predictor linearisation at \(\boldsymbol{u}_0\).
  3. Compute the linearised INLA posterior \(\overline{p}(\boldsymbol{\theta}|\boldsymbol{y})\).
  4. Let \(\boldsymbol{u}_1=f(\overline{p}_{\boldsymbol{u}_0})\) be the initial candidate for the new linearisation point.
  5. Let \(\boldsymbol{u}_\alpha=(1-\alpha)\boldsymbol{u}_0+\alpha\boldsymbol{u}_1\), and find the value \(\alpha\) that minimises \(\|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_\alpha)-\overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)\|\).
  6. Set the linearisation point \(\boldsymbol{u}_0\) equal to \(\boldsymbol{u}_\alpha\) and repeat from step 2, unless the iteration has converged to a given tolerance.
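The following is a schematic, self-contained sketch of steps 1 to 6 in plain R, with a toy Gaussian stand-in for the linearised INLA fit (`fit_linearised()` below is a hypothetical helper, not inlabru API; in inlabru the linearised fit is computed by R-INLA):

```r
# Schematic fixed point iteration with line search (steps 1-6)
library(numDeriv)

eta_tilde <- function(u) c(exp(u[1]) + u[2], u[1] * u[2])  # toy eta~(u)
y <- c(3, 1)                                # toy Gaussian observations

# Toy stand-in for f(p_bar_{u0}): joint conditional mode of the linearised
# Gaussian model y ~ N(eta0 + B (u - u0), sigma2 I), prior u ~ N(0, I / tau)
fit_linearised <- function(eta0, B, u0, y, sigma2 = 1, tau = 1) {
  A <- crossprod(B) / sigma2 + tau * diag(length(u0))
  b <- crossprod(B, y - eta0 + B %*% u0) / sigma2
  drop(solve(A, b))
}

u0 <- c(0.5, -1.0)                          # step 1: initial point
for (iter in 1:20) {
  eta0 <- eta_tilde(u0)                     # step 2: linearise at u0
  B <- jacobian(eta_tilde, u0)
  eta_bar <- function(u) drop(eta0 + B %*% (u - u0))
  u1 <- fit_linearised(eta0, B, u0, y)      # steps 3-4: candidate point
  # Step 5: line search over alpha, minimising the predictor discrepancy
  obj <- function(a) {
    ua <- (1 - a) * u0 + a * u1
    sum((eta_tilde(ua) - eta_bar(u1))^2)
  }
  alpha <- optimise(obj, interval = c(0, 2))$minimum
  u_new <- (1 - alpha) * u0 + alpha * u1
  # Step 6: update the linearisation point; stop when it has stabilised
  converged <- sqrt(sum((u_new - u0)^2)) < 1e-8 * (1 + sqrt(sum(u0^2)))
  u0 <- u_new
  if (converged) break                      # otherwise repeat from step 2
}
u0                                          # approximate fixed point
```

In inlabru itself the iteration runs inside `bru()`; to the best of our understanding, the maximum number of iterations can be adjusted with the `bru_max_iter` option, for example via `bru_options_set(bru_max_iter = 20)`.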

  1. Note: In inlabru version 2.1.15, \[ f(\overline{p}_{\boldsymbol{v}})=\left\{\mathop{\mathrm{arg\,max}}_{u_i} \overline{p}_{\boldsymbol{v}}(u_i|\boldsymbol{y}),\,i=1,\dots,n\right\}, \] the vector of component-wise conditional modes, was used, which caused problems for some non-linear models.