This vignette demonstrates how to improve the Monte Carlo sampling accuracy of leave-one-out cross-validation with the loo package and Stan. The loo package automatically monitors the sampling accuracy using Pareto \(k\) diagnostics for each observation. Here, we present a method for quickly improving the accuracy when the Pareto diagnostics indicate problems. This is done by performing some additional computations using the existing posterior sample. If successful, this will decrease the Pareto \(k\) values, making the model assessment more reliable. loo also stores the original Pareto \(k\) values with the name influence_pareto_k
which are not changed. They can be used as a diagnostic of how much each observation influences the posterior distribution.
The methodology presented is based on the paper
More information about the Pareto \(k\) diagnostics is given in the following papers
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. :10.1007/s11222-016-9696-4. Links: published | arXiv preprint.
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. arXiv preprint arXiv:1507.02646.
We will use the same example as in the vignette Using the loo package (version >= 2.0.0). See the demo for a description of the problem and data. We will use the same Poisson regression model as in the case study.
Here is the Stan code for fitting the Poisson regression model, which we will use for modeling the number of roaches.
<- "
stancode data {
int<lower=1> K;
int<lower=1> N;
matrix[N,K] x;
int y[N];
vector[N] offset;
real beta_prior_scale;
real alpha_prior_scale;
}
parameters {
vector[K] beta;
real intercept;
}
model {
y ~ poisson(exp(x * beta + intercept + offset));
beta ~ normal(0,beta_prior_scale);
intercept ~ normal(0,alpha_prior_scale);
}
generated quantities {
vector[N] log_lik;
for (n in 1:N)
log_lik[n] = poisson_lpmf(y[n] | exp(x[n] * beta + intercept + offset[n]));
}
"
Following the usual approach recommended in Writing Stan programs for use with the loo package, we compute the log-likelihood for each observation in the generated quantities
block of the Stan program.
In addition to loo, we load the rstan package for fitting the model, and the rstanarm package for the data.
library("rstan")
library("loo")
<- 9547
seed set.seed(seed)
Next we fit the model in Stan using the rstan package:
# Prepare data
data(roaches, package = "rstanarm")
$roach1 <- sqrt(roaches$roach1)
roaches<- roaches$y
y <- roaches[,c("roach1", "treatment", "senior")]
x <- log(roaches[,"exposure2"])
offset <- dim(x)[1]
n <- dim(x)[2]
k
<- list(N = n, K = k, x = as.matrix(x), y = y, offset = offset, beta_prior_scale = 2.5, alpha_prior_scale = 5.0)
standata
# Compile
<- stan_model(model_code = stancode)
stanmodel
# Fit model
<- sampling(stanmodel, data = standata, seed = seed, refresh = 0)
fit print(fit, pars = "beta")
Inference for Stan model: anon_model.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1] 0.16 0 0.00 0.16 0.16 0.16 0.16 0.16 2344 1
beta[2] -0.57 0 0.03 -0.62 -0.59 -0.57 -0.55 -0.52 2395 1
beta[3] -0.31 0 0.03 -0.38 -0.34 -0.31 -0.29 -0.25 2135 1
Samples were drawn using NUTS(diag_e) at Wed Mar 23 14:12:20 2022.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Let us now evaluate the predictive performance of the model using loo()
.
<- loo(fit) loo1
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
loo1
Computed from 4000 by 262 log-likelihood matrix
Estimate SE
elpd_loo -5459.4 694.1
p_loo 258.8 55.4
looic 10918.9 1388.2
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 241 92.0% 241
(0.5, 0.7] (ok) 7 2.7% 53
(0.7, 1] (bad) 7 2.7% 24
(1, Inf) (very bad) 7 2.7% 2
See help('pareto-k-diagnostic') for details.
The loo()
function output warnings that there are some observations which are highly influential, and thus the accuracy of importance sampling is compromised as indicated by the large Pareto \(k\) diagnostic values (> 0.7). As discussed in the vignette Using the loo package (version >= 2.0.0), this may be an indication of model misspecification. Despite that, it is still beneficial to be able to evaluate the predictive performance of the model accurately.
To improve the accuracy of the loo()
result above, we could perform leave-one-out cross-validation by explicitly leaving out single observations and refitting the model using MCMC repeatedly. However, the Pareto \(k\) diagnostics indicate that there are 19 observations which are problematic. This would require 19 model refits which may require a lot of computation time.
Instead of refitting with MCMC, we can perform a faster moment matching correction to the importance sampling for the problematic observations. This can be done with the loo_moment_match()
function in the loo package, which takes our existing loo
object as input and modifies it. The moment matching requires some evaluations of the model posterior density. For models fitted with rstan, this can be conveniently done by using the existing stanfit
object.
First, we show how the moment matching can be used for a model fitted using rstan. It only requires setting the argument moment_match
to TRUE
in the loo()
function. Optionally, you can also set the argument k_threshold
which determines the Pareto \(k\) threshold, above which moment matching is used. By default, it operates on all observations whose Pareto \(k\) value is larger than 0.7.
# available in rstan >= 2.21
<- loo(fit, moment_match = TRUE) loo2
Warning: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
loo2
Computed from 4000 by 262 log-likelihood matrix
Estimate SE
elpd_loo -5478.8 700.0
p_loo 269.8 61.5
looic 10957.7 1400.1
------
Monte Carlo SE of elpd_loo is 0.4.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 252 96.2% 236
(0.5, 0.7] (ok) 10 3.8% 50
(0.7, 1] (bad) 0 0.0% <NA>
(1, Inf) (very bad) 0 0.0% <NA>
All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.
After the moment matching, all observations have the diagnostic Pareto \(k\) less than 0.7, meaning that the estimates are now reliable. The total elpd_loo
estimate also changed from -5457.8
to -5478.5
, showing that before moment matching, loo()
overestimated the predictive performance of the model.
The updated Pareto \(k\) values stored in loo2$diagnostics$pareto_k
are considered algorithmic diagnostic values that indicate the sampling accuracy. The original Pareto \(k\) values are stored in loo2$pointwise[,"influence_pareto_k"]
and these are not modified by the moment matching. These can be considered as diagnostics for how big influence each observation has on the posterior distribution. In addition to the Pareto \(k\) diagnostics, moment matching also updates the effective sample size estimates.
loo_moment_match()
directlyThe moment matching can also be performed by explicitly calling the function loo_moment_match()
. This enables its use also for models that are not using rstan or another package with built-in support for loo_moment_match()
. To use loo_moment_match()
, the user must give the model object x
, the loo
object, and 5 helper functions as arguments to loo_moment_match()
. The helper functions are
post_draws
x
as the first argument and returns a matrix of posterior draws of the model parameters, pars
.log_lik_i
x
and i
and returns a matrix (one column per chain) or a vector (all chains stacked) of log-likeliood draws of the ith observation based on the model x
. If the draws are obtained using MCMC, the matrix with MCMC chains separated is preferred.unconstrain_pars
x
and pars
, and returns posterior draws on the unconstrained space based on the posterior draws on the constrained space passed via pars
.log_prob_upars
x
and upars
, and returns a matrix of log-posterior density values of the unconstrained posterior draws passed via upars
.log_lik_i_upars
x
, upars
, and i
and returns a vector of log-likelihood draws of the i
th observation based on the unconstrained posterior draws passed via upars
.Next, we show how the helper functions look like for RStan objects, and show an example of using loo_moment_match()
directly. For stanfit objects from rstan objects, the functions look like this:
# create a named list of draws for use with rstan methods
<- function(x, skeleton) {
.rstan_relist <- utils::relist(x, skeleton)
out for (i in seq_along(skeleton)) {
dim(out[[i]]) <- dim(skeleton[[i]])
}
out
}
# rstan helper function to get dims of parameters right
<- function(pars, dims) {
.create_skeleton <- lapply(seq_along(pars), function(i) {
out <- length(dims[[i]])
len_dims if (len_dims < 1) return(0)
return(array(0, dim = dims[[i]]))
})names(out) <- pars
out
}
# extract original posterior draws
<- function(x, ...) {
post_draws_stanfit as.matrix(x)
}
# compute a matrix of log-likelihood values for the ith observation
# matrix contains information about the number of MCMC chains
<- function(x, i, parameter_name = "log_lik", ...) {
log_lik_i_stanfit ::extract_log_lik(x, parameter_name, merge_chains = FALSE)[, , i]
loo
}
# transform parameters to the unconstraint space
<- function(x, pars, ...) {
unconstrain_pars_stanfit <- .create_skeleton(x@sim$pars_oi, x@par_dims[x@sim$pars_oi])
skeleton <- apply(pars, 1, FUN = function(theta) {
upars ::unconstrain_pars(x, .rstan_relist(theta, skeleton))
rstan
})# for one parameter models
if (is.null(dim(upars))) {
dim(upars) <- c(1, length(upars))
}t(upars)
}
# compute log_prob for each posterior draws on the unconstrained space
<- function(x, upars, ...) {
log_prob_upars_stanfit apply(upars, 1, rstan::log_prob, object = x,
adjust_transform = TRUE, gradient = FALSE)
}
# compute log_lik values based on the unconstrained parameters
<- function(x, upars, i, parameter_name = "log_lik",
log_lik_i_upars_stanfit
...) {<- nrow(upars)
S <- numeric(S)
out for (s in seq_len(S)) {
<- rstan::constrain_pars(x, upars = upars[s, ])[[parameter_name]][i]
out[s]
}
out }
Using these function, we can call loo_moment_match()
to update the existing loo
object.
<- loo::loo_moment_match.default(
loo3 x = fit,
loo = loo1,
post_draws = post_draws_stanfit,
log_lik_i = log_lik_i_stanfit,
unconstrain_pars = unconstrain_pars_stanfit,
log_prob_upars = log_prob_upars_stanfit,
log_lik_i_upars = log_lik_i_upars_stanfit
)
Warning: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
loo3
Computed from 4000 by 262 log-likelihood matrix
Estimate SE
elpd_loo -5478.8 700.0
p_loo 269.8 61.5
looic 10957.7 1400.1
------
Monte Carlo SE of elpd_loo is 0.4.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 252 96.2% 236
(0.5, 0.7] (ok) 10 3.8% 50
(0.7, 1] (bad) 0 0.0% <NA>
(1, Inf) (very bad) 0 0.0% <NA>
All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.
As expected, the result is identical to the previous result of loo2 <- loo(fit, moment_match = TRUE)
.
Gelman, A., and Hill, J. (2007). Data Analysis Using Regression and Multilevel Hierarchical Models. Cambridge University Press.
Stan Development Team (2020) RStan: the R interface to Stan, Version 2.21.1 https://mc-stan.org
Paananen, T., Piironen, J., Buerkner, P.-C., Vehtari, A. (2021). Implicitly adaptive importance sampling. Statistics and Computing, 31, 16. :10.1007/s11222-020-09982-2. arXiv preprint arXiv:1906.08850.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. :10.1007/s11222-016-9696-4. Links: published | arXiv preprint.
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. arXiv preprint arXiv:1507.02646.