Siland: Spatial Influence of landscape

Florence Carpentier and Olivier Martin

2021-01-27

Introduction

The package siland aims to estimate the effects of landscape and local variables on a measurement of interest. Two methods are available which enable to estimate effect intensities but also the scale of the effect for landscape variables, without any a priori.

It is worth noting that the both methods simultaneously estimate intensities of effect and scales of effect using a numerical likelihood maximisation procedure. This means that the buffers radii (Bsiland) or the mean distances of the SIFs (Fsiland) are not given by the user but estimated for each landscape variable.

siland package is designed to be a user-friendly tool: data format are commonly used (data.frame and shape files from GIS data), model expression is simple (classical formula as lm expression), various outputs are very easily computed: summary of estimates and tests, graphical maps of effects and graphical checking of likelihood maximisation. Based on linear model, it includes numerous extensions of linear model: interactions, random effects, GLM (binomial and poisson). It is also possible to build multiannual model. We detailed hereafter some examples of simple use of the siland package. (More details about the method can be found in https://www.biorxiv.org/content/10.1101/692566v1.

A first example

Data

siland package requires two data objects:

landdata=st_read(dsn = "FILE",layer="NAME")

Here, we use the objects dataSiland and landSiland, included in siland package. You can load data using the data command.

library(siland)
## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
data(dataSiland)
data(landSiland)

Main results for this example have been saved in object VignetteSiland and can be loaded to avoid computing time.

data("likresB1","resB1","resB2","resB3","resF1","resF3","resF4","resF5.1","resF5.2","resY")

dataSiland is a data.frame with columns: “obs”, variable of interest and “x1” variable, a continuous local variable. The “Id” column indicates the identification number of the plots where observations were made. landSiland is an object of class sf where landscape is characterised by two features named L1 and L2. Landscape is described by polygons. For each polygon, L1 is equal to 1 if the polygon is of type L1, 0 otherwise (so is it for L2).

str(dataSiland)
## 'data.frame':    100 obs. of  5 variables:
##  $ X  : num  8508 8508 6578 6578 1904 ...
##  $ Y  : num  7177 7177 3829 3829 1555 ...
##  $ x1 : num  6.23 6.23 -3.91 -3.91 7.55 ...
##  $ Id : num  1 1 2 2 3 3 4 4 5 5 ...
##  $ obs: num  18.34 18.17 1.95 1.09 6.81 ...
str(landSiland)
## Classes 'sf' and 'data.frame':   4884 obs. of  3 variables:
##  $ L1      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ L2      : num  0 1 0 0 0 0 0 0 1 0 ...
##  $ geometry:sfc_MULTIPOLYGON of length 4884; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:8, 1:2] 5993 5999 5978 5977 5975 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
##   ..- attr(*, "names")= chr [1:2] "L1" "L2"

You can look at spatial configuration of the loaded data using the following commands :

library(ggplot2)
library(sf)
L1pol=st_geometry(landSiland[landSiland$L1==1,])#extract an sf object with only polygons of type L1 
L2pol=st_geometry(landSiland[landSiland$L2==1,])#extract an sf object with only polygons of type L2 
ggplot(landSiland)+
  geom_sf(colour="grey",fill="white")+
  geom_sf(data=L1pol,fill="red")+
  geom_sf(data=L2pol,fill="blue")+
  geom_point(data=dataSiland, aes(X,Y),col="green")

Buffer approach : Bsiland()

Classical disks buffers (point observations)

Buffer approach modelizes effect of lanscape variable with a constant intensity on a buffer (i.e. an area of constant radius) around the observation location. Estimating effects of the local variable x1 and landscape variables L1 and L2 on the variable of interest obs can be done with the command Bsiland():

resB1=Bsiland(obs~x1+L1+L2,land=landSiland,data=dataSiland)
resB1
## Model: obs ~ x1 + L1 + L2
## 
## Landscape variables: L1 L2
##  
## Coefficients:
## (Intercept)          x1          L1          L2        B.L1        B.L2 
##       2.040       0.111     -11.068      25.401     121.179     218.946 
## 
## standard error: 1.437742
## AIC: 366.32  AIC (no landscape): 619.37
## (No landscape effect) p-value: <1e-16

Coefficients are the parameter estimates. For a landscape variable, there are two values : for instance for L1, L1 (-11.068) is the intensity of the effect and B.L1 ( 121.179) is the estimated buffer size for L1. AIC is AIC the of the model resB1 (Model: obs ~ x1 + L1 + L2) and AIC(no landscape) is the AIC with only local variable (Model0: obs ~ x1). (No landscape effect) p-value is related to the test of landscape global effect significativity, i.e. H0= ‘Model0: obs ~ x1’ vs H1=‘Model: obs ~ x1 + L1 + L2’ (Likelihood ratio test).

Parameter estimates and tests can be obtained using the command summary:

summary(resB1)
## Buffer sizes:
##     B.L1     B.L2 
## 121.1786 218.9462 
## 
## -- Tests are given conditionally to the best estimated buffer sizes --
## 
## Call:
## obs ~ x1 + L1 + L2
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.9453  -0.9165  -0.1116   0.9083   4.0983  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.04012    0.20636   9.886 2.61e-16 ***
## x1            0.11068    0.02361   4.688 9.12e-06 ***
## L1          -11.06774    1.09401 -10.117  < 2e-16 ***
## L2           25.40082    0.78080  32.532  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.067102)
## 
##     Null deviance: 2965.92  on 99  degrees of freedom
## Residual deviance:  198.44  on 96  degrees of freedom
## AIC: 362.32
## 
## Number of Fisher Scoring iterations: 2

Note that p.values are given conditionnaly to estimates of buffer radii.

resB1$parambuffer gives the vector of estimated buffers radii. resB1$buffer gives a data.frame of the percentages of each landscape variable (in columns) in buffers of estimated radii for each observation (in rows).

resB1$parambuffer
##     B.L1     B.L2 
## 121.1786 218.9462

The plotBsiland.land() function gives a graphical representation of the estimation. The color intensity is the effect of the landscape variable i.e. the product of the occupancy percentage of the landscape variable in the buffer by the associated coeffcient.

plotBsiland.land(x=resB1,land=landSiland,data=dataSiland)
## Plot for landscape variable  L1
##     B.L1 
## 121.1786


plotBsiland.land(x=resB1,land=landSiland,data=dataSiland,var=2)
## Plot for landscape variable  L2
##     B.L2 
## 218.9462

“Doughnut” buffers (polygonal observations)

In function Bsiland, the argument border=F (by default) indicates that buffers are computed from the observations locations. Buffers are thus a circle of constant radius centered on observation locations.
The option border=T indicates that buffers are computed from the border of the polygon (observation plot) where observations are made. Then the buffers look much more like “doughnuts”, with an empty area in the center.

resB2=Bsiland(obs~x1+L1+L2,land=landSiland,data=dataSiland,family=gaussian,border=T)
resB2$parambuffer
plotBsiland.land(x=resB2,land=landSiland,data=dataSiland)
plotBsiland.land(x=resB2,land=landSiland,data=dataSiland,var=2)

Visual check of likelihood maximisation procedure.

As with all numerical maximisation procedures, optimization problems may arise. The function Bsiland.lik() allows to point out possible problems of optimization.

likresB1=Bsiland.lik(resB1,land= landSiland,data=dataSiland,varnames=c("L1","L2"))
likresB1

On this graphic, the -Log-likelihood is represented in function of buffer radii. The estimation is made by maximazing the likelihood i.e. by minimizing the -Log-likelihood. The orange line indicates the minimal value obtained during the estimation. The black line represents the -loglikelihood for L1 when the buffer radius for L1 is set to given values in seqd argument (seqd=seq(2,200,length=10)) and the other parameters are fixed to their estimation. The black dotted line indicates the value of L1 buffer size estimated during the global estimation procedure. The red continuous and dotted red lines simirlarly indicates the -loglikelihood and the value of buffer size estimated for L2. When minization correctly occurs, the minimal values of the profiled -loglikelihoods are equal and equal the minimal value of the global -Log-likelihood. This means that the minimums of continuous black and red lines are on the orange line (and that dotted black and red lines intersect the continuous black and red lines at their minimum, respectively). If it is not the case, the minimizing procedure has failed and it is necessary to proceed with a new estimation with different initialisation values. This is possible using the argument init in function Bsiland. For instance, for starting estimation from buffer sizes of 20 and 150 for L1 and L2 landscape variables, respectively, use the command Bsiland(obs~x1+L1+L2,land=landSiland,data=dataSiland,init=c(20,150)).

Spatial Influence Function approach : Fsiland()

In the Spatial Influence Function (SIF) approach, each unit of landscape variable (cell/pixel) has an influence which is maximal at its location and decreases as the distance increases. For modelling a given landscape variable influence, we consider every cell where the landscape variable is distributed, and computed its influence by summing spatial influence of all cells. The spatial influence of a cell is determined by its intensity (negative or positive) and its SIF, which is a density function described by its mean distance (in siland package). As for the buffer approach, the siland package allows to estimate the intensity of effect and the scale of effects i.e. here the mean distance of SIF. Estimating effects of the local variable x1 and landscape variables L1 and L2 on the variable of interest obs can be done with the command Fsiland() in a similar syntax than for Bsiland :

resF1=Fsiland(obs~x1+L1+L2,land=landSiland,data=dataSiland)
resF1
## Model: obs ~ x1 + L1 + L2
## 
## Landscape variables: L1 L2
##  
## Coefficients:
## (Intercept)          x1          L1          L2      SIF.L1      SIF.L2 
##       1.249       0.085     -11.957      29.458     113.489     190.179 
## 
## standard error: 1.123119
## AIC: 316.93  AIC (no landscape): 619.37
## (No landscape effect) p-value: <1e-16
summary(resF1)
## SIF parameters:
##   SIF.L1   SIF.L2 
## 113.4894 190.1787 
## 
## -- Tests are given conditionally to the best SIF parameters --
## 
## Call:
## obs ~ x1 + L1 + L2
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.97160  -0.83136   0.02424   0.66240   2.68791  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.24886    0.17868   6.989 3.65e-10 ***
## x1            0.08501    0.01861   4.568 1.46e-05 ***
## L1          -11.95677    0.97356 -12.281  < 2e-16 ***
## L2           29.45810    0.68872  42.772  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.261396)
## 
##     Null deviance: 2965.92  on 99  degrees of freedom
## Residual deviance:  121.09  on 96  degrees of freedom
## AIC: 312.93
## 
## Number of Fisher Scoring iterations: 2

Here the parameters related to the landscape variable L1 are intensity L1 (-11.957) and the mean distance of its SIF, SIF.L1 (113.4894). The vector resF1$SIF gives for each landscape variable the mean distance of estimated SIFs. The data.frame resF1$landcontri gives for each landscape variable, the cumulative spatial influence received by each observation (i.e. the sum of spatial influence received by each observation from all cells of the landscape).

The command plotFsiland.sif gives a representation of the estimated SIF for each landscape variable.

plotFsiland.sif(resF1)

The command plotFsiland.land gives map of effects of the landscape. By default, it gives the global contribution of all landscape variables:

plotFsiland.land(x=resF1,land=landSiland,data=dataSiland)

One can obtain a map for a specific landscape variable by specifying its number with the argument var:

plotFsiland.land(x=resF1,land=landSiland,data=dataSiland,var=2)

Fsiland() arguments

  • The argument wd indicates the mesh size used to construct spatial unit (cells/pixels) of landscape variable. In fact, lanscape variable described as polygon are discretized on grid during the procedure estimation (like raster). The choice of wd is a tradeoff between computing precision and computing time (and memory size). The smallest wd is, the better are the precision but the longer the computing time is (and the larger the required memory size is). It is worth to note that estimated parameters can be very sensitive to this mesh size. To obtain a reliable estimation, we recommand to ensure, after the estimation procedure, that wd size is at least three times smaller than the smallest estimated SIF. If not, it is recommended to proceed with a new estimation with a smaller wd size.

  • The argument border indicates whether an observation receives the spatial influence of cells belonging to the same plot (i.e. the plot where the observation was made). If border=T, its receives influence only from the border of the plot (so no influence from the cells of the plot). By default, border = F (all cells are considered).

  • The argument sif defines the family function of SIF, i.e. the form of decrease for the landscape influence. The family can be exponential, gaussian, or uniform. If influence is uniform, it implies that there is no decrease of inlfuence and influence is uniform around each pixel of the raster. Note that all landscape variables have the same form.

Comparison between Fsiland() and Bsiland()

Buffer approach modelizes landscape effect from the observation location whereas SIF approach modelizes landscape from the lanscape variables locations. These difference of point of view made difficult to compare the estimated results directly. It is possible to compare the landscape effect received by each observation, named hereafter (landscape) contribution, estimated by the both models.


resB1$parambuffer
##     B.L1     B.L2 
## 121.1786 218.9462
resF1$paramSIF
##   SIF.L1   SIF.L2 
## 113.4894 190.1787
plot(resB1$buffer[,1],resF1$landcontri[,1],xlab="Buffer contribution",ylab="FIS contribution", main="Variable L1")
abline(0,1)

plot(resB1$buffer[,2],resF1$landcontri[,2],xlab="Buffer contribution",ylab="FIS contribution", main="Variable L2")
abline(0,1)

To compare models adequacy to data, one can compare their AIC (the smaller the better) :

resB1$AIC
## [1] 366.3203
resF1$AIC
## [1] 316.9274

Model extensions

Mixed model

Random effect can be included using the syntax (1| ). Note that only local effect are concerned.

resB3=Bsiland(obs~x1+L1+L2+(1|Id),land=landSiland,data=dataSiland)
summary(resB3)
## Buffer sizes:
##     B.L1     B.L2 
## 121.7323 219.1212 
## 
## -- Tests are given conditionally to the best estimated buffer sizes --
## Linear mixed model fit by REML ['lmerMod']
## 
## REML criterion at convergence: 345.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0652 -0.5592 -0.1006  0.5043  2.1234 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Id       (Intercept) 0.9769   0.9884  
##  Residual             1.1309   1.0635  
## Number of obs: 100, groups:  Id, 50
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)   2.03952    0.25219   8.087
## x1            0.11059    0.02884   3.835
## L1          -11.08324    1.33819  -8.282
## L2           25.41874    0.95426  26.637
## 
## Correlation of Fixed Effects:
##    (Intr) x1     L1    
## x1  0.161              
## L1 -0.328 -0.090       
## L2 -0.658 -0.223  0.066
resF3=Fsiland(obs~x1+L1+L2+(1|Id),land=landSiland,data=dataSiland)
summary(resF3)
## SIF parameters:
##   SIF.L1   SIF.L2 
## 113.4894 190.1787 
## 
## -- Tests are given conditionally to the best SIF parameters --
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## 
##      AIC      BIC   logLik deviance df.resid 
##    314.7    330.3   -151.4    302.7       94 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.62434 -0.73048  0.01841  0.59069  2.34981 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Id       (Intercept) 0.080    0.2828  
##  Residual             1.131    1.0635  
## Number of obs: 100, groups:  Id, 50
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)   1.24886    0.18076   6.909
## x1            0.08501    0.01883   4.515
## L1          -11.95677    0.98490 -12.140
## L2           29.45810    0.69674  42.280
## 
## Correlation of Fixed Effects:
##    (Intr) x1     L1    
## x1  0.201              
## L1 -0.373 -0.107       
## L2 -0.702 -0.252  0.055

Non gaussian model

To consider distributions of the variable of interest that differ from Gaussian, use the option family available in the Bsiland and the Fsiland functions. Family can be “gaussian”, “poisson” or “binomial” and the associated link function are identity, log and logit respectively.

Model with interaction between local and landscape variables

Interaction between local and landscape variables using the syntax : or * (as well as localxlocal interaction). Interaction term modify the intensity of a landscape variable according to the local variable value. But the scale of effect of the landscape variable remains constant (buffer size for Bsiland or SIF mean distance for Fsiland).

#Model with main and interaction effect
resF4=Fsiland(obs~x1*L1+L2,land=landSiland,data=dataSiland)
resF4
## Model: obs ~ x1 * L1 + L2
## 
## Landscape variables: L1 L2
##  
## Coefficients:
## (Intercept)          x1          L1          L2       x1:L1      SIF.L1 
##       1.243       0.081     -12.023      29.435       0.100     110.550 
##      SIF.L2 
##     190.241 
## 
## standard error: 1.128036
## AIC: 318.75  AIC (no landscape): 619.37
## (No landscape effect) p-value: <1e-16
#Model with only interaction effect
resF5.1=Fsiland(obs~x1:L1+L2,land=landSiland,data=dataSiland)
resF5.1
## Model: obs ~ x1:L1 + L2
## 
## Landscape variables: L1 L2
##  
## Coefficients:
## (Intercept)          L2       x1:L1      SIF.L1      SIF.L2 
##       0.240      31.366      -0.483      54.074     200.230 
## 
## standard error: 1.780699
## AIC: 408.14  AIC (no landscape): 626.76
## (No landscape effect) p-value: <1e-16
## Warning in print.Fsiland(x): 
## It is recommended that wd is three times smaller than the estimated SIF mean distance. A new estimation with smaller wd should be more appropriate (see argument wd in Fsiland).
resF5.2=Fsiland(obs~x1:L1+L2,land=landSiland,data=dataSiland,wd=15)
resF5.2
## Model: obs ~ x1:L1 + L2
## 
## Landscape variables: L1 L2
##  
## Coefficients:
## (Intercept)          L2       x1:L1      SIF.L1      SIF.L2 
##       0.215      31.603      -0.457      54.721     200.628 
## 
## standard error: 1.801733
## AIC: 410.49  AIC (no landscape): 626.76
## (No landscape effect) p-value: <1e-16

The same syntax can be applied with function Bsiland().

Multiyear model and multisite model

It is possible to deal with data observed during several years, i.e. data associated with several landscapes with the function Bsiland (this option is not implemented yet in Fsiland). The two data objects required are then : * a list of data.frames of located observations, one for each year. An important point is that data.frames column names have to be exactly the same and ordered in the same way. * a list of sf object containing a landscape description of each year.

Let us suppose we have two years of observations associated with two landscapes. Since the goal is only to show how to deal with such datasets, we take the same datastets for observations and for landscapes for the two years.

landSilandY1=landSiland
landSilandY2=landSiland
#landSilandY is a list with the landscape for each year
landSilandY=list(landSilandY1,landSilandY2)
dataSilandY1=dataSiland
dataSilandY2=dataSiland
dataSilandY1$year=factor("2018")
dataSilandY2$year=factor("2019")
head(dataSilandY1)
head(dataSilandY2)
dataSilandY=list(dataSilandY1,dataSilandY2)
resY=Bsiland(obs~year+x1+L1+L2, land = landSilandY,data=dataSilandY)
resY
## Model: obs ~ year + x1 + L1 + L2
## 
## Landscape variables: L1 L2
##  
## Coefficients:
## (Intercept)    year2019          x1          L1          L2        B.L1 
##       2.040       0.000       0.111     -11.068      25.401     121.179 
##        B.L2 
##     218.946 
## 
## standard error: 1.42664
## AIC: 720.64  AIC (no landscape): 1234.73
## (No landscape effect) p-value: <1e-16
summary(resY)
## Buffer sizes:
##     B.L1     B.L2 
## 121.1786 218.9462 
## 
## -- Tests are given conditionally to the best estimated buffer sizes --
## 
## Call:
## obs ~ year + x1 + L1 + L2
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.9453  -0.9165  -0.1116   0.9083   4.0983  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.040e+00  1.765e-01  11.561  < 2e-16 ***
## year2019    -1.618e-15  2.018e-01   0.000        1    
## x1           1.107e-01  1.657e-02   6.681 2.42e-10 ***
## L1          -1.107e+01  7.676e-01 -14.418  < 2e-16 ***
## L2           2.540e+01  5.478e-01  46.365  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.035301)
## 
##     Null deviance: 5931.84  on 199  degrees of freedom
## Residual deviance:  396.88  on 195  degrees of freedom
## AIC: 716.64
## 
## Number of Fisher Scoring iterations: 2

It is possible to deal with multisite data, using the same procedure considering different sites instead of different years.

Remarks

As for an object ot type GLM, functions AIC(), residuals() and fitted() are available. In fact, conditionnaly to the estimated buffers or SIFs, the fitted model is a GLM or a LMM (Linear Mixed Model) or a GLMM (Generalized Linear Mixed Model). So after an estimation with Bsiland or Fsiland(), it is possible to analyse more precisely the estimated model with the object result stored in the output. Object result is an an object from glm() or lmer() or glmer() functions.

summary(resB1$result)
## 
## Call:
## glm(formula = model, family = family, data = newdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.9453  -0.9165  -0.1116   0.9083   4.0983  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.04012    0.20636   9.886 2.61e-16 ***
## x1            0.11068    0.02361   4.688 9.12e-06 ***
## L1          -11.06774    1.09401 -10.117  < 2e-16 ***
## L2           25.40082    0.78080  32.532  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.067102)
## 
##     Null deviance: 2965.92  on 99  degrees of freedom
## Residual deviance:  198.44  on 96  degrees of freedom
## AIC: 362.32
## 
## Number of Fisher Scoring iterations: 2
BIC(resB1$result)
## [1] 375.3461
fitted(resF1$result)[1:10]
##         1         2         3         4         5         6         7         8 
## 17.515084 17.515084  1.931324  1.931324  6.116991  6.116991 -3.618455 -3.618455 
##         9        10 
##  2.579123  2.579123
residuals(resF1$result)[1:10]
##           1           2           3           4           5           6 
##  0.82133235  0.65286713  0.01963192 -0.84277194  0.69098527 -1.67560150 
##           7           8           9          10 
##  1.01085931 -1.20498043  0.12104466  1.27277387
class(resB1$result)
## [1] "glm" "lm"
class(resF1$result)
## [1] "glm" "lm"

Further developments