The function fclust can work using different methods and models managed by options. The three main options are opt.method, opt.model and opt.mean. Additional option is opt.jack. The options are defined by default for focusing on the main results provided by a functional clustering. They are: opt.tree = list(“prd”, “leg”), opt.perf = list(“prd”, “pub”), opt.motif = list(“obs”, “hor”, “leg”).
opt.method determines the method of clustering. The option can be “divisive”, “agglomerative” or “apriori”. All methods generate hierarchical trees. Each tree is complete, running from a unique trunk to as many leaves as components.
If opt.method = “divisive”, the components are clustered from the trivial cluster where all components are together, towards the clustering where each component is isolated in a cluster. This method is very efficient because the first component partioning change considerably the coefficient of determination of clustering model. The first divisions are thus very discriminative, and it is therefore possible to surely observe the effect on performance of co-occurring components within assemblages.
If opt.method = “agglomerative”, the components are clustered from the trivial clustering where each component is isolated in a cluster, towards the cluster where all components are together. In most cases, the first component clustering do not change the coefficient of determination of clustering model: the first clustering are not discriminative and it is therefore not possible to select the component combinations associated with the strongest effects of assemblage performance.
res <- fclust(dat.2004, nbElt, opt.method = "divisive")
fclust_plot(res, opt.tree = list("prd"))
res <- fclust(dat.2004, nbElt, opt.method = "agglomerative")
fclust_plot(res, opt.tree = list("prd"))
The left graph shows the tree obtained using “divisive”" method, the right graph the tree obtained using “agglomerative”" method. Divisive method gives generally a more accurate and a more predictive tree than agglomerative method. Divisive and agglomerative methods give the same results whether the number of components is small, thus the number of possible component partitions also is small. (The possible component partitions is given by the number of Stirling of second species, see the function stirling).
apriori <- c(1,3,2,4,1,3,3,2,1,2,1,4,2,3,4,4)
apriori <- c("F","C3","L","C4","F","C3","C3","L","F","L","F","C4","L","C3","C4","C4")
res <- fclust(dat.2004, nbElt, opt.method = "apriori", affectElt = apriori)
fclust_plot(res, opt.tree = list("prd", "leg", cols = apriori))
In ecology, meadow species are classically a priori clustered in Legumes (in red), Forbs (in cyan), C3-grasses (in blue) and C4 grasses (in gold). A functional clustering suggests that only the legumes group is pertinent.
opt.model determines the model for predicting assemblage performance. The option can be “bymot” or “byelt”.
opt.mod = “bymot”: the option is the simplest one. The performances are modelled as the mean performance of assemblages that share a same assembly motif, by including all assemblages that belong to the same assembly motif. Consequently, all assemblages that share a same assembly motif have the same predicted performance, whatever their component composition. The modelling of assemblage performances is therefore based only on the partitioning of components into functional groups, and the resulting partitioning of assemblages into assembly motifs. It does not take into account the fact that the elementary composition of each assemblage differs from one assemblage to another within the same assembly motif, and that these elementary compositions are known.
opt.model = “byelt”: the option uses the fact that the elementary composition of each assemblage differs from one assemblage to another within the same assembly motif, and that these elementary compositions are known. Within each assembly motif, the performances of assemblages that contain a given components are first averaged. Consequently, a mean value of performance is associated with each component that occurs within the assembly motif. Second the performance of each assemblage is computed as the mean of mean performances of assemblages that contain the same components as the assemblage to predict. If no assemblage contains component belonging to assemblage to predict, performance is computed as the mean performance of all assemblages that share a same assembly motif, as in opt.mod = “bymot”. As a whole, this procedure partitions assemblages by assembly motif and adds a linear model within each assembly motif based on the component occurrence within each assemblage. This procedure can improve the explanatory and predictive abilities of the modelling (see Jaillard et al., 2018a, Meth. Ecol. Evol.).
res <- fclust(dat.2004, nbElt, opt.mod = "bymot")
fclust_plot(res, opt.tree = list("prd"), opt.perf = list("prd", "aov", pvalue = 0.01))
res <- fclust(dat.2004, nbElt, opt.mod = "byelt")
fclust_plot(res, opt.tree = list("prd"), opt.perf = list("prd", "aov", pvalue = 0.01))
Both the highest graphs correspond to opt.model = “bymot”, both the lowest graphs to opt.model = “byelt”. The resulting trees are different: the first redgroup contains Luppe in both the trees, the second blue-group contains Liaas and Lesca in both the trees, but differs by Amocan and Koecr, the third *gold“group contains Andge* in both the trees, but Koecr in bymot-tree and several other species in byelt-tree, etc…. The coefficients of determination are equivalent (R2 = 0.906 against 0.909), and the predictive ability of assemblage performances are more robust with opt.model = “bymot” than with opt.model = “byelt” (E = 0.851 against 0.797, then E/R2 = 0.940 against 0.877). However, our experiment suggests that opt.model = “byelt” gives the most likely result.
opt.mean determines the formula to use in averaging. The option can be “amean” or “gmean”. Functional clustering is based on computations of mean performances of assemblages, differently partitioned. The mean formula to use depends on the distribution of assemblage performance: it can shift a little the resuls.
If opt.mean = “amean”, mean performances are computed using an arithmetic formula.
If opt.mean = “gmean”, mean performances are computed using a geometric formula.
res <- fclust(dat.2004, nbElt, opt.mean = "amean")
fclust_plot(res, opt.tree = list("prd"), opt.perf = list("prd", "aov", pvalue = 0.01))
res <- fclust(dat.2004, nbElt, opt.mean = "gmean")
fclust_plot(res, opt.tree = list("prd"), opt.perf = list("prd", "aov", pvalue = 0.01))
The left graph corresponds to opt.mean = “amean”, the right graph to opt.mean = “gmean”. The resulting trees are the same, and the model goodness-of-fit (R2 = 0.909 against 0.940; E = 0.797 against 0.798) are not significantly different.
opt.jack determines the method of cross-validation. By default (opt.jack = FALSE), the performance of each assemblage is predicted by a Leave-One-Out method: the performance of each assemblage is predicted as the mean performance of assemblages that share a same assembly motif, except the only assemblage to predict. If the number of assemblages that share a same assembly motif is large, Leave-One-Out method is time-consuming. It is more convenient to switch towards a jackknife method (opt.jack = TRUE): the performances of assemblages that belong to each subset are predicted as the mean performance of assemblages of other subsets, except the assemblage subset to predict. jack then specifies how to divide the assemblage collection. jack is an integer vector of length 2: the first integer specifies the size of subset, the second integer specifies the number of subsets.
Note that some computations are time-consuming. To facilitate the monitoring of the smooth running of the computations, informations are written on the Console and graphs are drawn on the Plots panel. The writting are activated or deactivated by the “verbose” option.
getOption("verbose")
#> [1] FALSE
# to follow the computations
options(verbose = TRUE)
# to deactivate the option
options(verbose = FALSE)