An R
package implementing the SE-set algorithm, a tool to explore statistically-equivalent path models from correlation matrices and Gaussian Graphical Models (GGMs).
This repository contains an R
package used by Ryan, Bringmann and Schuurman (pre-print)[PsyArXiv] to aid researchers in investigating the relationship between a given GGM estimate (in the form of a precision matrix) and possible underlying linear path models. Linear path models can also be interpreted as “weighted DAGs”. Full details on the package can be found in Appendix B of the manuscript, and code to reproduce the empirical illustration in that paper can be found in this repository.
Please note that we recommend the use of this tool only for GGMs with 13 or less nodes as the size of the outputted object, and the run-time, grows factorialy (\(p!\)) with each node in the network.
The current version of this package can be installed directly from github using
The main SE-set function takes a precision matrix as input by default (but can also take a matrix of partial correlations, or a covariance matrix, detailed below). This can be estimated using packages such as qgraph
, using either raw data or a matrix of correlations. For example, using the riskcor
correlation matrix supplied with the package as input, the precision matrix omega
can be estimated by running
# load data
data(riskcor)
# estimate precision matrix
estimate <- qgraph::EBICglasso(riskcor, n = 69, returnAllResults = TRUE)
omega <- estimate$optwi
# The precision matrix can also be standardized to a partial correlation matrix, and plotted as a network
library(qgraph)
parcor <- qgraph::wi2net(omega)
pnet <- qgraph(parcor, repulsion = .8,vsize = c(10,15), theme = "colorblind", fade = F, edge.labels = TRUE)
Given a \(p\)-variate precision matrix, the SEset
package can be used to obtain a set of maximally \(p!\) statistically equivalent weighted DAGs (i.e., linear path models): One DAG for every possible topological ordering of the \(p\) variables, from exogenous to endogenous. This is the SE-set of omega
which gives the package its name.
The statistically-equivalent set is found by using the network_to_SEset
function
where the digits
arguments determines the rounding of parameters in the weighted DAGs, and the rm_duplicates
argument indicates that only unique weighted DAGs should be returned: duplicates are removed after rounding. Typically, when duplicates are removed, the number of unique DAGs returned is much less than \(p!\).
Individual members of the SE-set can be plotted, for example using qgraph. When visualizing, remember that the weights matrix of the path model \(B\) is given in SEM form \(X = BX +e\), and so should be read “from column to row”, that is \(b_{ij}\) encodes the direct effect of \(X_j\) on \(X_i\). As a consequence, matrices should be transposed when being used as input to qgraph
and most other network visualization tools.
DAG1 <- matrix(SE_example[1,],6,6)
DAG2 <- matrix(SE_example[30,],6,6)
par(mfrow=c(1,2))
qgraph(t(DAG1), repulsion = .8,vsize = c(10,15), theme = "colorblind", fade = F,
layout = pnet$layout, maximum = 2, edge.labels = T)
qgraph(t(DAG2), repulsion = .8,vsize = c(10,15), theme = "colorblind", fade = F,
layout = pnet$layout, maximum = 2, edge.labels = T)
We also supply additional functions to aid in exploring the SE-set.
For example, the function propcal
calculates the proportion of DAGs in the SE-set in which an edge between two variables is present. You can choose distinguish between the presence of some edge or an edge of a particular direction using the directed
option
# Undirected edge frequency
propu <- propcal(SE_example, rm_duplicate = F, directed = FALSE)
# Directed edge frequency
propd <- propcal(SE_example, rm_duplicate = F, directed = TRUE)
# Plot each as a network
par(mfrow=c(1,2))
qgraph(propu, edge.color = "darkgreen", layout = pnet$layout, edge.labels = T, maximum = 1)
qgraph(propd, edge.color = "darkgreen", layout = pnet$layout, edge.labels = T, maximum = 1)
Given the SEset, you can essentially explore how any property of one graphical model varies across members. For example, we may be interested in the distribution of variance explained (\(R^2\)) for each variable when predicted by it’s directed causes. This will differ across SEset members because the direct casues of any given variable will typically differ across the different path models. This can be computed using the r2_distribution
function.
r2set <- r2_distribution(SEmatrix = SE_example, cormat = riskcor, names = NULL,
indices = c(1,3,4,5,6))
This distribution can be visualized, for instance using ggplot
library(tidyr)
library(ggplot2)
require(ggridges)
df <- as.data.frame(r2set, col.names = paste0("X",1:6))
df2 <- tidyr::gather(df)
p <- ggplot(df2, aes(y = key, x = value)) +
geom_density_ridges(fill = "light seagreen") +
labs(y = "Variable", x = expression(paste("Controllability value ", R^2)))
p + theme(text = element_text(size = 20),
axis.title.x = element_text(size = 20),
axis.title.y = element_text(size = 20),
panel.background = element_blank())
Alternatively, we can supply a matrix of partial correlations, such as output by the parcor
package, or a (model-implied) covariance matrix using the input_type
argument, demonstrated below.
# Using the partial correlation matrix as input
SE_example_2 <- network_to_SEset(parcor, digits = 2, rm_duplicates = TRUE, input_type = "parcor")
# Calculating the model-implied covariance matrix from the precision matrix
MIcov <- solve(omega)
SE_example_3 <- network_to_SEset(MIcov, digits = 2, rm_duplicates = TRUE, input_type = "MIcov")
Note only that, since the partial correlation matrix does not contain information on the diagonal elements of the precision matrix (that is, the partial variances), if a partial correlation matrix is supplied, we transform this to a correlation matrix using the corpcor
package. As such, small numerical differences (approximately in the 7th decimal place) may be present depending on the input used.
For more details please contact o.ryan@uu.nl