Iterative INLA method

Approximate INLA for non-linear predictors

The premise for the inlabru method for non-linear predictors is to build on the existing implementation, and only add a linearisation step. The properties of the resulting approximation will depend on the nature of the non-linearity.

Let $\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})$ be a non-linear predictor, and let $\overline{\boldsymbol{\eta}}(\boldsymbol{u})$ be the 1st order Taylor approximation at $\boldsymbol{u}_0$, \[ \overline{\boldsymbol{\eta}}(\boldsymbol{u}) = \widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) + \boldsymbol{B}(\boldsymbol{u} - \boldsymbol{u}_0) = \left[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) - \boldsymbol{B}\boldsymbol{u}_0\right] + \boldsymbol{B}\boldsymbol{u} , \] where $\boldsymbol{B}$ is the derivative matrix for the non-linear predictor, evaluated at $\boldsymbol{u}_0$.

The non-linear observation model $p(\boldsymbol{y}|g^{-1}[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta})$ is approximated by replacing the non-linear predictor with its linearisation, so that the linearised model is defined by \[ \overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}) = p(\boldsymbol{y}|\overline{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}) = p(\boldsymbol{y}|g^{-1}[\overline{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta}) \approx p(\boldsymbol{y}|g^{-1}[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta}) = p(\boldsymbol{y}|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}) = \widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}) \] The non-linear model posterior is factorised as \[ \widetilde{p}(\boldsymbol{\theta},\boldsymbol{u}|\boldsymbol{y}) = \widetilde{p}(\boldsymbol{\theta}|\boldsymbol{y})\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}), \] and the linear model approximation is factorised as \[ \overline{p}(\boldsymbol{\theta},\boldsymbol{u}|\boldsymbol{y}) = \overline{p}(\boldsymbol{\theta}|\boldsymbol{y})\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}) . \]

Fixed point iteration

The remaining step of the approximation is how to choose the linearisation point $\boldsymbol{u}_0$. We start by introducing a functional $f(\overline{p}_{\boldsymbol{v}})$ of the posterior distribution linearised at $\boldsymbol{v}$, that generates a latent field configuration. We then seek a fix point of the functional, so that $\boldsymbol{u}_0=f(\overline{p}_{\boldsymbol{u}_0})$. Potential choices for $f(\cdot)$ include the posterior expectation $\overline{E}(\boldsymbol{u}|\boldsymbol{y})$ and the joint conditional mode, \[ f(\overline{p}_{\boldsymbol{v}})=\mathop{\mathrm{arg\,max}}_{\boldsymbol{u}} \overline{p}_{\boldsymbol{v}}(\boldsymbol{u}|\boldsymbol{y},\widehat{\boldsymbol{\theta}}), \] where $\widehat{\boldsymbol{\theta}}=\mathop{\mathrm{arg\,max}}_{\boldsymbol{\theta}} \overline{p}_{\boldsymbol{v}}(\boldsymbol{\theta}|\boldsymbol{y})$ (used in inlabru from version 2.2.3¹) One key to the fix point iteration is that the observation model is linked to $\boldsymbol{u}$ only through the non-linear predictor $\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})$.

Let $\boldsymbol{u}_0$ be an initial linearisation point for the latent variables.
Compute the predictor linearisation at $\boldsymbol{u}_0$,
Compute the linearised INLA posterior $\overline{p}(\boldsymbol{\theta}|\boldsymbol{y})$
Let $\boldsymbol{u}_1=f(\overline{p}_{\boldsymbol{u}_0})$ be the initial candidate for new linearisation point.
Let $\boldsymbol{u}_\alpha=(1-\alpha)\boldsymbol{u}_0+\alpha\boldsymbol{u}_1$, and find the value $\alpha$ that minimises $\|\widetilde{\eta}(\boldsymbol{u}_\alpha)-\overline{\eta}(\boldsymbol{u}_1)\|$.
Set the linearisation point equal to $\boldsymbol{u}_\alpha$ and repeat from step 1, unless the iteration has converged to a given tolerance.

Line search

In step 4, an approximate line search can be used, that avoids many potentially expensive evaluations of the non-linear predictor. We evaluate $\widetilde{\boldsymbol{\eta}}_1=\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_1)$ and make use of the linearised predictor information. Let $\widetilde{\boldsymbol{\eta}}_\alpha=\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_\alpha)$ and $\overline{\boldsymbol{\eta}}_\alpha=\overline{\boldsymbol{\eta}}(\boldsymbol{u}_\alpha)=(1-\alpha)\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0)+\alpha\overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)$. An exact line search would minimise $\|\widetilde{\boldsymbol{\eta}}_\alpha-\overline{\boldsymbol{\eta}}_1\|$. Instead, we define a quadratic approximation to the non-linear predictor as a function of $\alpha$, \[ \breve{\boldsymbol{\eta}}_\alpha = \overline{\boldsymbol{\eta}}_\alpha + \alpha^2 (\widetilde{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_1) \] and minimise the quartic polynomial in $\alpha$, \[ \begin{aligned} \|\breve{\boldsymbol{\eta}}_\alpha-\overline{\boldsymbol{\eta}}_1\|^2 &= \| (\alpha-1)(\overline{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_0) + \alpha^2 (\widetilde{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_1) \|^2 . \end{aligned} \] If initial expansion and contraction steps are carried out, leading to an initial guess of $\alpha=\gamma^k$, where $\gamma>1$ is a scaling factor (see ?bru_options, bru_method$factor) and $k$ is the (signed) number of expansions and contractions, the quadratic expression is replaced by \[ \begin{aligned} \|\breve{\boldsymbol{\eta}}_\alpha-\overline{\boldsymbol{\eta}}_1\|^2 &= \| (\alpha-1)(\overline{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_0) + \frac{\alpha^2}{\gamma^{2k}} (\widetilde{\boldsymbol{\eta}}_{\gamma^k} - \overline{\boldsymbol{\eta}}_{\gamma^k}) \|^2 , \end{aligned} \] which is minimised on the interval $\alpha\in[\gamma^{k-1},\gamma^{k+1}]$.

A potential improvement of step 4 might be to also take into account the prior distribution for $\boldsymbol{u}$ as a minimisation penalty, to avoid moving further than would be indicated by a full likelihood optimisation.

Note: In inlabru version 2.1.15, \[ f(\overline{p}_{\boldsymbol{v}})=\left\{\mathop{\mathrm{arg\,max}}_{u_i} \overline{p}_{\boldsymbol{v}}(u_i|\boldsymbol{y}),\,i=1,\dots,n\right\}, \] was used, which caused problems for some nonlinear models.↩︎

Iterative INLA method

The INLA method for linear predictors

Approximate INLA for non-linear predictors

Fixed point iteration

Line search