Autocorrelated Kernel Density Estimation

Christen H. Fleming and Justin M. Calabrese

2022-07-10

In this vignette we walk through autocorrelated kernel density estimation. We will assume that you have already estimated a good ctmm movement model for your data. For demonstration purposes we will be working from the results of the “Variograms and Model Selection” vignette (see vignette("variogram")). Here we will use the buffalo “Pepper” instead of “Cilla”, because Pepper has a GPS collar malfunction that can be mitigated with optimal weighting.

library(ctmm)
data(buffalo)
Pepper <- buffalo$Pepper
M.IID <- ctmm.fit(Pepper) # no autocorrelation timescales
GUESS <- ctmm.guess(Pepper,interactive=FALSE) # automated model guess
M.OUF <- ctmm.fit(Pepper,GUESS) # in general, use ctmm.select instead

M.IID is the innapropriate, IID model, which will result in a conventional kernel-density estimate, while M.OUF is the superior, continuous-velocity OUF model. Note that you want the best model for each individual, even if that differs by individual. Different movement behaviors and sampling schedules will reveal different autocorrelation structures in the data.

Now we can calculate a uniformly weighted akde object for each model.

KDE <- akde(Pepper,M.IID) # KDE
AKDE <- akde(Pepper,M.OUF) # AKDE

However, Pepper’s sampling schedule is anything but uniform.

# plot all sampling intervals
dt.plot(Pepper)

Keeping in mind that Pepper’s home-range crossing timescale is two weeks, we have several options for calculating optimal weights here. The exact algorithm is the easiest to implement, but it can be prohibitively slow on larger datasets (10k-100k). On the other hand, the fast algorithm can scale to extremely large datasets, but requires an appropriate choice of discrete-time grid dt argument. Here, for the fast algorithm, we need to either toss out the the tiny sampling intervals from the data or choose a tiny discrete-time grid dt argument.

For the exact algorithm, Pepper’s 1,725 locations would not be prohibitive. On the other hand, for the fast algorithm, if we use the minimum sampling interval for dt and considering that Pepper was sampled for 8.5 months (summary(Pepper)) and has a minimum sampling interval of 3 minutes (min(diff(Pepper$t))), then the discrete-time grid would be (8.5 %#% 'month') / (3 %#% 'min') or 120k points, which is very feasible in terms of computer memory. (A very large discrete-time grid could cause an out-of-memory error!) Finally, we could toss out the tiny (and not very informative) sampling intervals and then use the fast algorithm with dt=1 %#% 'hr'. This would reduce the discrete-time grid size to (8.5 %#% 'month') / (1 %#% 'hr') or 6k points.

# Option 1) exact calculation - slow on larger datasets
# UD2w <- akde(Pepper,M.OUF,weights=TRUE,fast=FALSE,PC='direct')

# Option 2) use tiny dt
dt <- min(diff(Pepper$t))
wAKDE <- akde(Pepper,M.OUF,weights=TRUE,dt=dt)

# Option 3) remove tiny sampling intervals and use smaller dt
# dt <- 1 %#% 'hr'
# ...
# wAKDE <- akde(Pepper,M.OUF,weights=TRUE,dt=dt)

Now let us plot the results.

# calculate one extent for all UDs
EXT <- extent(list(KDE,AKDE,wAKDE),level=0.95)

plot(Pepper,UD=KDE,xlim=EXT$x,ylim=EXT$y)
title(expression("IID KDE"["C"]))
plot(Pepper,UD=AKDE,xlim=EXT$x,ylim=EXT$y)
title(expression("OUF AKDE"["C"]))
plot(Pepper,UD=wAKDE,xlim=EXT$x,ylim=EXT$y)
title(expression("weighted OUF AKDE"["C"]))
## DOP values missing. Assuming DOP=1.
## DOP values missing. Assuming DOP=1.
## DOP values missing. Assuming DOP=1.

By default both the density function and its 95% contours are plotted along with the location data. The middle contour represent the maximum likelihood area where the animal spends 95% of its time. This percentage can be changed with the level.UD option (see help(plot.telemetry)). The inner and outer contours correspond to confidence intervals on the magnitude of the area, which can be adjusted with the level option.

The optimal bandwidth determines the “resolution” of the kernel density estimate. By default we plot grid lines with dimensions matching the standard deviations of the individual kernels. This gives a rough guideline as to what spatial details are and are not important in the density estimate. One can see that the IID KDEC estimate fits tightly to the data and reports many significant details in the buffalo’s home range. The autocorrelated estimates predict future space use more accurately, based on the diffusion model, and yield a more honest account of their uncertainties.

In the case of Pepper, the GPS collar malfunctioned during data collection and the sampling interval mysteriously changed from 1 hour (yellow) to 2 hours (red). This irregularity makes Pepper a good candidate for optimally weighted AKDE (weights=TRUE). The optimally weighted estimate features smaller error, finer resolution, and mitigation of sampling bias. You can see that the uniformly weighted estimates place excessive emphasis on the oversampled data (yellow).

Finally, we can compare the area estimates and effective sample sizes.

summary(KDE)
## $DOF
##      area bandwidth 
##      1724      1725 
## 
## $CI
##                               low      est     high
## area (square kilometers) 482.0913 505.6825 529.8294
## 
## attr(,"class")
## [1] "area"
summary(wAKDE)
## $DOF
##      area bandwidth 
##  15.65767  28.16610 
## 
## $CI
##                               low      est     high
## area (square kilometers) 432.3389 761.6594 1182.695
## 
## attr(,"class")
## [1] "area"