The ‘gfilmm’ package allows to generate simulations from the generalized fiducial distribution of the parameters of a Gaussian linear mixed model with categorical random effects (numeric random effects are not supported) and interval data. It also provides some helper functions to get summary statistics and confidence intervals.
The algorithm implemented in ‘gfilmm’ is the one described in the paper Generalized fiducial inference for normal linear mixed models written by Jessi Cisewski and Jan Hannig. It is coded in C++ and the code is based on the original Matlab code written by Jessi Cisewski.
Fiducial inference has something similar to Bayesian inference: the uncertainty about the parameters are represented by a distribution, the fiducial distribution, with the help of which we conduct inference on the parameters in a way similar to the Bayesian way, based on the posterior distribution of the parameters. The main difference is that there is no prior distribution (so fiducial inference is similar to objective Bayesian inference). The fiducial inference yields results close to the ones of the frequentist inference.
Some pieces of code in the algorithm are run in parallel. In the
examples below, as well as in the examples of the package documentation,
I set nthreads = 2L
because of CRAN restrictions: CRAN does
not allow more than two threads and then it would not accept the package
if a higher number of threads were set.
The data must be given as a dataframe. Here we simulate data from a simple linear regression model:
set.seed(666L)
<- 30L
n <- 1L:n
x <- rnorm(n, mean = x, sd = 2)
y <- round(y, digits = 1L)
y_rounded <- data.frame(
dat ylwr = y_rounded - 0.05,
yupr = y_rounded + 0.05,
x = x
)
Now we run the fiducial sampler:
library(gfilmm)
<- gfilmm(
fidSims y = ~ cbind(ylwr, yupr), # interval data
fixed = ~ x, # fixed effects
random = NULL, # random effects
data = dat, # data
N = 30000L, # number of simulations
nthreads = 2L
)
A summary of the fiducial simulations (the Pr(=0)
column
will be explained latter):
gfiSummary(fidSims)
#> mean median lwr upr Pr(=0)
#> (Intercept) 0.8714978 0.8789662 -0.9678679 2.669521 NA
#> x 0.9108290 0.9108441 0.8074198 1.013744 NA
#> sigma_error 2.4460004 2.4053447 1.8862373 3.216785 0
#> attr(,"confidence level")
#> [1] 0.95
The fiducial confidence intervals are close to the frequentist ones:
<- lm(y ~ x)
lmfit confint(lmfit)
#> 2.5 % 97.5 %
#> (Intercept) -0.9589356 2.70406
#> x 0.8076886 1.01402
The fiducial cumulative distribution function of the slope:
<- gfiCDF(~ x, fidSims)
Fslope plot(Fslope, main = "Slope", ylab = expression("Pr("<="x)"))
To get a fiducial density, I recommend the ‘kde1d’ package:
library(kde1d)
<- kde1d(fidSims$VERTEX[["x"]], weights = fidSims$WEIGHT, mult = 4)
kfit curve(dkde1d(x, kfit), from = 0.7, to = 1.1, col = "red", lwd = 2)
# observe the resemblance with the distribution of the
# frequentist estimate of the slope:
curve(
dnorm(x, coef(lmfit)["x"], sqrt(vcov(lmfit)["x","x"])), add = TRUE,
col = "blue", lwd = 2, lty = "dashed"
)
The fiducial simulations are weighted, so it makes no sense to plot them without taking the weights into account. We can get “pseudo-simulations” of the fiducial distribution by picking the fiducial simulations at random according to their weight:
<- sample.int(30000L, replace = TRUE, prob = fidSims$WEIGHT)
indcs <- fidSims$VERTEX[indcs,]
pseudoSims library(GGally)
#> Le chargement a nécessité le package : ggplot2
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
ggpairs(
pseudoSims,upper = list(continuous = ggally_density),
lower = list(continuous = wrap("points", alpha = 0.1))
)
The scatterplot of the intercept and the slope has the same shape as the frequentist confidence ellipse of these two parameters:
library(car)
#> Le chargement a nécessité le package : carData
dataEllipse(
"(Intercept)"]], pseudoSims[["x"]],
pseudoSims[[levels = c(0.5,0.95), col = c("black", "red"),
xlab = "Intercept", ylab = "Slope"
)confidenceEllipse(
1:2, levels = c(0.5,0.95), add = TRUE,
lmfit, col = "blue", lty = "dashed"
)
The gfilmmPredictive
function samples the generalized
fiducial predictive distribution. All the functions seen above can be
applied to the output.
<- gfilmmPredictive(fidSims, newdata = data.frame(x = c(1, 30)))
fpd gfiSummary(fpd)
#> mean median lwr upr
#> y1 1.789606 1.799935 -3.41942 7.028119
#> y2 28.196433 28.202328 22.99711 33.329155
#> attr(,"confidence level")
#> [1] 0.95
Compare with the frequentist approach:
predict(lmfit, newdata = data.frame(x = c(1, 30)), interval = "prediction")
#> fit lwr upr
#> 1 1.783417 -3.408471 6.975305
#> 2 28.198198 23.006310 33.390086
Now let us simulate some data from a one-way ANOVA model with a random factor:
<- 10000 # grand mean
mu <- 2
sigmaBetween <- 3
sigmaWithin <- 6L # number of groups
I <- 5L # sample size per group
J
set.seed(31415926L)
<- rnorm(I, mu, sigmaBetween)
groupmeans <- c(
y vapply(groupmeans, function(gmean) rnorm(J, gmean, sigmaWithin), numeric(J))
)<- round(y, digits = 1L)
y_rounded <- data.frame(
dat ylwr = y_rounded - 0.05,
yupr = y_rounded + 0.05,
group = gl(J, I)
)
We run the fiducial sampler:
<- gfilmm(
fidSims ~ cbind(ylwr, yupr), ~ 1, ~ group, data = dat, N = 30000L, nthreads = 2L
)
Observe that the between standard deviation sigma_group
has a positive value in the Pr(=0)
column:
gfiSummary(fidSims)
#> mean median lwr upr Pr(=0)
#> (Intercept) 9999.985967 9999.987730 9997.636256 10002.333351 NA
#> sigma_group 1.982167 1.733644 0.000000 5.445910 0.03838671
#> sigma_error 2.594661 2.543399 1.898396 3.596938 0.00000000
#> attr(,"confidence level")
#> [1] 0.95
What does it mean? The fiducial distributions of the variance
components have a mass at zero, and this value is the probability that
the between standard deviation equals zero. So you have to be careful if
you are interested in the fiducial density of a standard deviation: if
Pr(=0)
is not null for the standard deviation you are
interested in, the fiducial distribution of this standard deviation
does not have a density. It has a mass at zero, and a density
on the strictly positive real numbers.
Compare the fiducial confidence interval of the grand mean to its Kenward-Roger confidence interval:
library(lmerTest)
library(emmeans)
<- lmer(y ~ (1|group), data = dat)
fit emmeans(fit, ~ 1)
#> 1 emmean SE df lower.CL upper.CL
#> overall 10000 0.83263 4 9997.7 10002
#>
#> Degrees-of-freedom method: kenward-roger
#> Confidence level used: 0.95
Actually the design is balanced, and we can get the same confidence interval by the frequentist method:
library(AOV1R)
<- aov1r(y ~ dat$group)
aovfit confint(aovfit)
#> estimate lwr upr
#> grandMean 10000.000191 9997.688438 10002.311943
#> within 2.575337 2.019727 3.555018
#> between 1.536546 -0.314190 5.239974
#> total 2.998889 2.432479 5.885821
#>
#> attr(,"confidence level")
#> [1] 0.95
#> attr(,"standard deviations")
#> [1] TRUE
With gfiConfInt
we can get a fiducial confidence
interval for any parameter of interest, for example the
coefficient of total variation:
gfiConfInt(~ sqrt(sigma_group^2 + sigma_error^2)/`(Intercept)`, fidSims)
#> 2.5% 97.5%
#> 0.0002319522 0.0006015996
This is a good question. Unfortunately I have no clue about the answer. A good practice consists in running several fiducial samplers and to check whether the results change. Since we have already tried 30000 simulations, let us try with 40000 and 50000:
<- lapply(c(40000L, 50000L), function(N){
gfs gfilmm(~ cbind(ylwr, yupr), ~ 1, ~ group, data = dat, N = N, nthreads = 2L)
})lapply(gfs, gfiSummary)
#> [[1]]
#> mean median lwr upr Pr(=0)
#> (Intercept) 9999.978671 9999.968214 9997.552700 10002.419754 NA
#> sigma_group 2.035821 1.784611 0.000000 5.485706 0.03675751
#> sigma_error 2.574917 2.522337 1.894993 3.551445 0.00000000
#> attr(,"confidence level")
#> [1] 0.95
#>
#> [[2]]
#> mean median lwr upr Pr(=0)
#> (Intercept) 9999.977925 9999.956154 9997.576927 10002.458668 NA
#> sigma_group 2.040705 1.792377 0.000000 5.554322 0.0322059
#> sigma_error 2.565906 2.512663 1.895767 3.528799 0.0000000
#> attr(,"confidence level")
#> [1] 0.95
We find similar results, so N = 30000L
should be
enough.
Never combine the fiducial simulations of two different runs. That makes no sense.
I have performed some simulations to evaluate the frequentist coverage of the fiducial 95%-confidence intervals for two different designs of a one-way ANOVA model with a random factor: the first one with three groups and two replicates per group (3x2 design), and the second one with six groups and three replicates per group (6x3 design). I compared with the frequentist confidence intervals calculated by the ‘AOV1R’ package. These ones are theoretically exact, except the one about the between standard deviation.
Here are the coverage probabilities of the frequentist confidence intervals:
Interval
Parameter two-sided left-sided right-sided
grandMean 95.250 97.800 97.45
within 94.600 97.550 97.05
between 94.575 96.975 97.60
And the ones of the fiducial confidence intervals:
Interval
Parameter two-sided left-sided right-sided
grandMean 99.450 99.750 99.700
within 95.800 97.950 97.850
between 99.425 99.975 99.450
total 96.850 99.925 96.925
CV 96.850 99.925 96.925
The figures below show the confidence intervals for fifteen simulations.
Coverage probabilities of the frequentist confidence intervals:
Interval
Parameter two-sided left-sided right-sided
grandMean 95.10 97.400 97.700
within 94.85 97.450 97.400
between 94.95 97.275 97.675
Coverage probabilities of the fiducial confidence intervals:
Interval
Parameter two-sided left-sided right-sided
grandMean 96.925 98.400 98.525
within 95.475 97.925 97.550
between 98.325 99.675 98.650
total 95.950 99.375 96.575
CV 95.975 99.375 96.600
Confidence intervals for fifteen simulations: