Getting started with sjSDM: a scalable joint Species Distribution Model

Maximilian Pichler & Florian Hartig, Theoretical Ecology, University of Regensburg

2022-06-22

Abstract

sjSDM is a scalable joint species distribution model based on Monte-Carlo intergration of the joint likelihood.

Getting started

Load the package via

library(sjSDM)

Installing dependencies

sjSDM depends on the Anaconda distribution of python and pytorch. You will get a warning if you don’t have python installed. sjSDM depends on Anaconda which will be installed automatically if it is not available.

To install the CPU version, run:

install_sjSDM()

For the GPU version (NVIDIA GPUs are only supported), run:

install_sjSDM(method = "gpu")

More details on and trouble-shooting for installing the necessary dependencies is explained in the vignette dependencies, call

vignette("Dependencies", package = "sjSDM")

Citing sjSDM

To cite sjSDM in a publication, use

citation("sjSDM")

Working with sjSDM

sjSDM has a function to create test data. Here, we create a dataset with 3 environmental predictors, 5 species and 100 sites.

com = simulate_SDM(env = 3L, species = 5L, sites = 100L)

Fitting a model

The model is then fit with the function sjSDM(). You have to provide predictors (can be also a data.frame) and response as matrices.

model = sjSDM(Y = com$response, env = com$env_weights, iter = 100L, se=TRUE)

Interpreting model output

Things you can do with the model output

coef(model)
summary(model)
getCov(model)

Adding quadratic predictors and interactions

sjSDM supports formula description for the predictors.

E.g. interaction with intercept and calculate p-values:

model = sjSDM(Y = com$response, 
              env = linear(data = com$env_weights, formula = ~ X1*X2,), iter = 50L, se = TRUE)
summary(model)

E.g. quadratic effect without intercept:

model = sjSDM(Y = com$response, 
              env = linear(data = com$env_weights, formula = ~0+ I(X1^2)), iter = 50L, se = TRUE)
summary(model)

Handling spatial autocorrelation

Simulating spatial data

jSDMs account for correlation between species within communities (sites), in real datasets, however, communities (sites) are often additionally correlated (== spatial autocorrelation).

Let’s first simulate test data: 1) Simulate jSDM without a link (normal response)

com = simulate_SDM(env = 3L, species = 5L, sites = 100L, 
                   link = "identical", response = "identical")
X = com$env_weights
Y = com$response
  1. add spatial residuals (create coordinates and use spatial distance matrix to draw autocorrelated residuals for each species)
XYcoords = matrix(rnorm(200), 100, 2)+2
WW = as.matrix(dist(XYcoords))
spatialResiduals = mvtnorm::rmvnorm( 5L, sigma = exp(-WW))
  1. Finish test data
Ysp = Y + t(spatialResiduals)
Y = ifelse(Ysp < 0, 0, 1) # multivariate probit model

Fitting spatial jSDM

Using spatial eigenvectors as additional predictors is a common approach:

SPeigen = generateSpatialEV(XYcoords)

model = sjSDM(Y, env = linear(X, ~.), spatial = linear(SPeigen, ~0+.), iter = 100L)
summary(model)

Further analyses

Anova

Type I analysis of variance for the two/three different components (environment, biotic associations, and space (optional)).

an = anova(model)
plot(an)

Internal meta-community structure

We can also visualize the internal meta-community structure (only for models with space):

plot(an, internal = TRUE)

Importance

We can also partition the effects of the different components. This function will be deprecated in future. Please use plot(anova(model), internal = TRUE)

imp = importance(model)
plot(imp)

Regularization

Regularization on abiotic coefficients

sjSDM supports l1 (lasso) and l2 (ridge) regularization: * alpha is the weighting between lasso and ridge * alpha = 0.0 corresponds to pure lasso * alpha = 1.0 corresponds to pure ridge

model = sjSDM(Y = com$response, env = linear(data = com$env_weights, formula = ~0+ I(X1^2),lambda = 0.5), iter = 50L)
summary(model)

Regularization on species-species associations

We can do the same for the species associations:

model = sjSDM(Y = com$response, 
              env = linear(data = com$env_weights, formula = ~0+ I(X1^2),lambda = 0.5),
              biotic = bioticStruct(lambda =0.1),
              iter = 50L)
summary(model)

Regularization on the spatial model:

model = sjSDM(Y, env = linear(X, ~X1+X2), spatial = linear(XYcoords, ~0+X1:X2, lambda = 0.4))
summary(model)

Fitting a non-linear (deep neural network) model

com = simulate_SDM(env = 3L, species = 5L, sites = 100L)
X = com$env_weights
Y = com$response

# three fully connected layers with relu as activation function
model = sjSDM(Y = Y, 
              env = DNN(data = X, formula = ~., hidden = c(10L, 10L, 10L), activation = "relu"), 
              iter = 50L, se = TRUE)
summary(model)

The methods for sjSDM() work also for the non-linear model:

getCov(model) # species association matrix
pred = predict(model) # predict on fitted data
pred = predict(model, newdata = X) # predict on new data

Extract and set weights of model:

weights = getWeights(model) # get layer weights and sigma
setWeights(model, weights)

Plot the training history:

plot(model)

We can also use a non-linear model as a spatial model:

model = sjSDM(Y, env = linear(X, ~X1+X2), spatial = DNN(XYcoords, ~0+X1:X2, lambda = 0.4))
summary(model)

Or for both:

model = sjSDM(Y, env = DNN(X, ~X1+X2), spatial = DNN(XYcoords, ~0+X1:X2, lambda = 0.4))
summary(model)

Fitting other distributions (e.g. species frequencies)

sjSDM supports other responses than presence-absence data: Simulate non-presence-absence data:

com = simulate_SDM(env = 3L, species = 5L, sites = 100L,link = "identical", response = "count") # samples from a Poisson distribution
X = com$env_weights
Y = com$response

Poisson

model = sjSDM(Y, env = linear(X, ~.), se = TRUE, iter = 50L, family = poisson("log"))
summary(model)

Normal (gaussian)

model = sjSDM(log(Y+0.01), env = linear(X, ~.), se = TRUE, iter = 50L, family = gaussian("identity"))
summary(model)