Searching for temporal trends in phenotypes or rates

Silvia Castiglione, Carmela Serio, Pasquale Raia

Index

  1. search.trend basics
  2. Temporal trends on the entire tree
  3. Temporal trends at clade level
  4. Multivariate data
  5. Guided examples

search.trend basics

The branch-wise estimation of phenotypic evolutionary rates and the computation of ancestral states at each node make RRphylo especially suitable to study temporal trends in phenotypic means and absolute rates applying to the entire phylogeny or just on a part of it. This is particularly true as the RRphylo method is specifically meant to work with phylogenies of extinct species, taking full advantage from fossil information. The function search.trend (Castiglione et al. 2019) is designed to explore the domain of macroevolution, addressing patterns such as Cope’s rule, or to test other specific research questions as to whether the rate of evolution increased/decreased in a particular tree or clade within it. Although possible in principle, running search.trend without extinct species in the tree makes little sense.

search.trend takes an object produced by RRphylo, which is necessary to retrieve branch-wise rate estimates and ancestral characters at nodes. By default, the function searches for significant temporal trends in phenotypic means and absolute evolutionary rates as applying to the entire phylogeny. If a specific hypothesis about one or more clades is tested, the clades presumed to experience trended evolution, in either phenotypes, rates or both, must be indicated by the argument node. If more than one clade is under testing, the function performs a pairwise comparison of trends also between the clades.

Temporal trends on the entire tree

To search for temporal trends in phenotypes occurring on the entire phylogeny, the function computes the regression slopes between phenotypic values at nodes and tips and their age, meant as the distance from the tree root. The same is implemented to detect temporal trends in absolute rates, but absolute rates are first rescaled in the 0-1 range and logged. Since a significant relationship between phenotypes (absolute rates) and age is possible even under the Brownian Motion (BM from now on), significance is assessed comparing the real slope to a family of 100 slopes (the number can be specified by setting the argument nsim within the function) generated by simulating phenotypes according to the BM process (BMslopes).

search.trend(RR=RR,y=y)->ST
slope p.real p.random spread dev
rescaled absolute rate regression 0.038 0 1 9.38
phenotypic regression 0.008 0 1 0.681

As in the table above, search.trend results for phenotypic ($phenotypic.regression) and absolute rate ($rate.regression) regressions both include: the actual regression slope and its p-value (p.real), and the p-value derived by contrasting the real slope to the BMslopes (p.random). The user wants to look at p.random to see if a real trend applies.

The output of search.trend also includes two metrics specifically designed to quantify the magnitude of the phenotypic/absolute rates deviation from BM. The spread represents the deviation from BM produced by a trend in absolute rates. It is calculated as the ratio between the range of phenotypic values and the range of such values halfway along the tree height, divided to the same metric value generated under BM. spread is 1 under BM. The dev quantifies the deviation from BM produced by a trend in phenotypic means. It represents the deviation of the phenotypic mean from the root value in terms of the number of standard deviations of the trait distribution. dev is zero under the BM.

Therefore, the example above produced significant results for both phenotypic and absolute rates temporal trends (p.random are both > 0.95). Absolute rates increase over time is significant (p.real < 0.05) and some nine times as much (spread = 9.380) as expected under BM. The increase in phenotypic means through time is significant (p.real < 0.05) and significantly different from the BM simulation, describing a deviation in mean phenotype more than one half of the standard deviation of the phenotype (dev = 0.681).

Temporal trends at clade level

A peculiarity of search.trend is that it can test whether individual clades follow a different trend in phenotypic means (absolute rates) over time, as compared to the rest of the tree.

In the case of phenotypic trend, individual clades are tested for the hypothesis that trend intensity (slope) does not depart from BM and whether the marginal means of the regression values differ from the means calculated for the rest of the tree. In the case of a trend in absolute rates the actual regression slope depends on the relative position (age) of the focal nodes respective to the root, given the exponential nature of phenotypic variance change in time. Because of this, estimated marginal means and regression slopes for the rate versus age regression of the focal clade are contrasted directly to the same metrics generated for the rest of the tree. The comparisons for both phenotypes (estimated marginal means) and rates regression (estimated marignal means and regression slopes) are implemented by using the functions emmeans and emtrends in package emmeans (Lenth 2020).

If more than one clade is under testing, the function performs the pairwise comparisons of trend intensity between clades. For absolute rates regression, such comparison is performed by computing differences in estimated marginal means and regression slopes as described above. Trends in phenotypic means are compared by computing the difference in regression slopes between the clades and contrasting such value to a random distribution of differences generated under BM. The difference in estimated marginal means of phenotype versus age regression between clades is also returned.

search.trend(RR=RR,y=y,node=c(225,319))->ST
Trend in absolute rates
Trend in phenotypic means
node emm.difference p.emm slope.node slope.others p.slope node slope p.slope emm.difference p.emm
225 -0.098 0 -0.001 0.001 0.317 225 0.018 1 0.215 0
319 -0.118 0 -0.003 0.002 0.010 319 0.011 1 0.433 0

Same as for the entire tree, search.trend returns the results for phenotypic ($node.phenotypic.regression) and absolute rate ($node.rate.regression) regressions over time at each node under testing. For trends in absolute rates, significance level for both differences in estimated marginal means (emm.difference and p.emm) and regression slopes (slope.node,slope.others, and p.slope) between each clade and the rest of the tree are reported. Results for trend in phenotypic means through each node include significance for the comparison of the real slope to the random slopes (slope and p.slope), and the significance for the difference in estimated marginal means between each clade and the rest of the tree (emm.difference and p.emm).

In the example above, the absolute rates versus age regression through the clade subtended by node 225 (just node 225 from now on) produced significant and negative difference in the estimated marginal means of the clade as compared to the rest of the tree (p.emm < 0.05) but no significant difference in regression slopes (p.slope > 0.05). Similarly, the estimated marginal means of regression through node 319 are significantly lower than the rest of the tree (emm.difference < 0 and p.emm < 0.05), and the difference in slopes is negative and significant (p.slope < 0.05). As for the trend in phenotypic means, this is positive and significantly higher than BM for both clades (slope > 0 and p.slope > 0.975). Both clades also show significantly larger estimated marginal means of phenotypes versus age regression than the rest of the tree (emm.difference > 0 and p.emm < 0.05).

Please note, the dark gray line in both figures represents the whole tree, inclusive of both clades under testing.

Comparison of trends in absolute rates
group_1 group_2 emm.difference p.emm slope.group1 slope.group2 p.slope
g225 g319 0.016 0.702 -0.001 -0.003 0.575
Comparison of trends in phenotypic means
group_1 group_2 slope.group_1 slope.group_2 p.slope emm.difference p.emm
g225 g225 g319 0.018 0.011 0.9 -0.168 0

If more than one clade is indicated, the results also include pairwise comparison of trends between clades ($group.comparison). The comparison of trends in absolute rates is performed in the same way as the comparison of a single clade against the rest of the tree. Thus, results consist of differences in both estimated marginal means (emm.difference) and slopes (slope.group_1,slope.group_2) between clades with respective p.values (p.emm and p.slope respectively). For the comparison of phenotypic trends between clades, the differences in regression slopes (slope.difference) and estimated marginal means (emm.difference) between clades are returned. In this case the p-value for the slope difference (p.slope) is computed through randomization.

In the example above, the comparison of trends in absolute rates at nodes 225 and 319 is not significant for either slope or estimated marginal means difference (both p.emm and p.slope > 0.05). On the other hand, trends in phenotypic means significantly differ in estimate marginal means (p.emm < 0.05), but not in slope (p.slope > 0.05).

Multivariate data and multiple RRphylo output

When applied on multivariate data, search.trend treats each phenotypic and rate (derived by RRphylo) component independently. Also, it performs the whole set of analyses on a multivariate vector of phenotypes (rates), derived by computing the L2-norm of the individual phenotypes (rates) at each branch (the same as in multivariate RRphylo).

When applying search.trend on a multiple RRphylo output, the possible effect of an additional predictor is incorporated in the computation

The multiple regression version of RRphylo allows to incorporate the effect of an additional predictor in the computation of evolutionary rates without altering the ancestral character estimation. Thus, when a multiple RRphylo output is fed to search.trend, the predictor effect is accounted for on the absolute evolutionary rates, but not on the phenotype. However, in some situations the user might want to ‘factor out’ the predictor effect on phenotypes as well. Under the latter circumstance, by setting the argument x1.residuals = TRUE, the residuals of the response to predictor regression are used as to represent the phenotype.

Guided examples

# load the RRphylo example dataset including Cetaceans tree and data
data("DataCetaceans")
DataCetaceans$treecet->treecet # phylogenetic tree
DataCetaceans$masscet->masscet # logged body mass data
DataCetaceans$brainmasscet->brainmasscet # logged brain mass data
DataCetaceans$aceMyst->aceMyst # known phenotypic value for the most recent 
                               # common ancestor of Mysticeti


par(mar=c(0,0,0,1))
plot(ladderize(treecet,FALSE),show.tip.label = FALSE,edge.color = "gray40")
plotinfo<-get("last_plot.phylo",envir =ape::.PlotPhyloEnv)
nodelabels(text="",node=128,frame="circle",bg="red",cex=0.5)
nodelabels(text="Mystacodon",node=128,frame="n",bg="w",cex=1,adj=c(-0.1,0.5),font=2)
range(plotinfo$yy[which(treecet$tip.label%in%tips(treecet,128))])->yran128
rect(plotinfo$x.lim[2]+0.4,yran128[1],plotinfo$x.lim[2]+0.7,yran128[2],col="red",border="red")
range(plotinfo$yy[which(treecet$tip.label%in%tips(treecet,142))])->yran142
rect(plotinfo$x.lim[2]+0.4,yran142[1],plotinfo$x.lim[2]+0.7,yran142[2],col="blue",border="blue")
mtext(c("Mysticeti","Odontoceti"), side = 4,line=-0.5,at=c(sum(yran128)/2,sum(yran142)/2),
      cex=1.5,adj=0.5,col=c("red","blue"))

# check the order of your data: best if data vectors
# are sorted in the same order of the species on the phylogeny
masscet[match(treecet$tip.label,names(masscet))]->masscet

## Example 1: search.trend by setting values at internal nodes
# Set the body mass of Mysticetes ancestor (Mystacodon selenensis) 
# as known value at node and perform RRphylo on the vector of (log) body mass
RRphylo(tree=treecet,y=masscet,aces=aceMyst)->RR

# Perform search.trend on the RR object and (log) body mass by indicating Mysticeti as focal clade
search.trend(RR=RR,y=masscet,node=as.numeric(names(aceMyst)))


## Example 2: search.trend on multiple regression version of RRphylo
# cross-reference the phylogenetic tree and body and brain mass data. Remove from both the tree and
# vector of body sizes the species whose brain size is missing
drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),treecet$tip.label)])->treecet.multi
masscet[match(treecet.multi$tip.label,names(masscet))]->masscet.multi

# check the order of your data: best if
# data vectors (i.e. masscet and brainmasscet) are sorted
# in the same order of the species on the phylogeny
masscet.multi[match(treecet.multi$tip.label,names(masscet.multi))]->masscet.multi
brainmasscet[match(treecet.multi$tip.label,names(brainmasscet))]->brainmasscet

# perform RRphylo on tree and body mass data
RRphylo(tree=treecet.multi,y=masscet.multi)->RRmass.multi

# create the predictor vector: retrieve the ancestral character estimates 
# of body size at internal nodes from the RR object ($aces) and collate them
# to the vector of species' body sizes to create
c(RRmass.multi$aces[,1],masscet.multi)->x1.mass

# perform the multiple regression version of RRphylo by using
# the brain size as variable and the body size as predictor
RRphylo(treecet.multi,y=brainmasscet,x1=x1.mass)->RRmulti

# Perform search.trend on the multiple RR object to inspect the effect of body 
# size on absolute rates temporal trend only
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,clus=cc)

# Perform search.trend on the multiple RR object to inspect the effect of body
# size on trends in both absolute evolutionary rates and phenotypic values 
# (by using brain versus body mass regression residuals)
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,x1.residuals=TRUE,
             clus=cc)

References

Castiglione, S., Serio, C., Mondanaro, A., Di Febbraro, M., Profico, A., Girardi, G., & Raia, P. (2019). Simultaneous detection of macroevolutionary patterns in phenotypic means and rate of change with and within phylogenetic trees including extinct species. PloS one, 14: e0210101. doi.org/10.1371/journal.pone.0210101

Lenth, R. (2020). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.4.4. https://CRAN.R-project.org/package=emmeans