--- title: 'mvMORPH: an R package for fitting multivariate evolutionary models' output: pdf_document vignette: > %\VignetteIndexEntry{mvMORPH: an R package for fitting multivariate evolutionary models} %\VignetteEngine{knitr::rmarkdown} --- *** To illustrate how **mvMORPH** works, we describe and comment below two simulated examples. These examples are deliberately simple in order to show how the package works; therefore they are not directly related to any real biological context, although we will use names of known environments in order to make the understanding of objectives and results easier. In the first example, we show how the package works in the classic situation of testing evolution towards two distinct selective regimes (Fig. 1). We simulate two correlated continuous traits evolving according to an Ornstein-Uhlenbeck (OU) process with two optima along the tree. We then fit an OU multivariate model with one (OU1) or two optima (OUM), and we compare the results with those obtained from separate univariate analyses on each simulated trait. This example is intended to show how covariations between traits can blur differences in optima that are not detected using a set of univariate analyses. In this specific case, we further show differences in type I and type II errors under univariate and multivariate settings; we also illustrate how to compare for significant interactions between traits in the path toward the optima. In the second worked example, we compare the structure of evolutionary covariances in two, ecologically distinct species group using a nested hierarchy of tests thanks to the different rate matrix parameterizations proposed in **mvMORPH**. Such an approach is particularly useful to assess statistically how evolutionary covariations differ between groups of species. Indeed, we can expect various covariance structures depending on the selective pressures acting, e.g., on different ecological groups (e.g., low covariances for phenotypically diversified species when compared to more specialized ones, same directions of trait covariances, common structure...). Note that the help pages of the **mvMORPH** package (accessible through the question mark argument in the R console; e.g., ?mvBM) also provide simulated examples illustrating how the various functions work. ## Example 1: Models with distinct selective regimes We first load the package and its dependencies, and initialize the random number generator: ```{r, comment=">"} # Load the package and dependencies (ape, phytools, corpcor, subplex, spam) library(mvMORPH) # Use a specified random number seed for reproducibility set.seed(14) ``` Then we simulate a random pure-birth tree with 100 taxa using the phytools package: ```{r} tree<-pbtree(n=100) ``` In this example, we are interested in evaluating the optima of two species groups (one optimum per species group) experiencing different environments. We first map on the tree the location of each regime (Fig. 1), and then simulate traits according to a multivariate OU process with two slightly distinct optima. For the sake of illustration, we hypothesize that the two selective regimes in this case are two well-known environments found in Africa: Savannah and Tropical Evergreen Forest. In real case studies, we may expect, e.g., that morpho-functional traits found in species occurring in each environment display differences in evolutionary rates, or adaptation toward distinct optima due to distinct selective pressures. We use stochastic character mapping of discrete traits (SIMMAP; Huelsenbeck et al. 2003; Bollback 2006) as proposed by the make.simmap function from **phytools** for mapping the selective regimes: ```{r, comment=">"} # Simulate two selective regimes state<-as.vector(c(rep("Forest",60),rep("Savannah",40))); names(state)<-tree$tip.label # Make the tree with mapped states using SIMMAP tree<-make.simmap(tree, state, model="ER", nsim=1) ``` ![alt text](ellipses.png "Fig. 1") **Figure 1.** *Simulated correlated traits under a multi-optimum OU process (function "mvSIM"). The tree was simulated according to a pure-birth process with 100 taxa and two selective regimes representing hypothesized "Savannah" and "Tropical Evergreen Forest" environments mapped on. The mapping is done using the "make.simmap" function from the phytools package, and is used as an input tree by the mvMORPH functions.* The resulting tree can be plotted with the mapped selective regimes as displayed in Fig. 1 for illustrative purpose: ```{r, comment=">"} # Plot the phylogeny with the mapped discrete trait col<-c("blue","orange"); names(col)<-c("Forest","Savannah") plotSimmap(tree,col, fsize=0.6, node.numbers=FALSE, lwd=3, pts=FALSE) ``` Then we simulate the evolution of a multivariate Ornstein-Uhlenbeck process with two optima on the tree. In order to illustrate the problem shown in Fig. 1 of the main text, we choose here a configuration where the leading eigenvector is oriented at ~45 in the bivariate space of the two evolving traits: ```{r, comment=">"} # 2 Random traits evolving along the phylogeny as a two-optimum OU set.seed(101) alpha<-matrix(c(1.1,-0.9,-0.9,1),2) sigma<-matrix(c(0.35,0.06,0.06,0.35),2) theta<-c(5.5,5.1,1.2,1.4) data<-mvSIM(tree, param=list(sigma=sigma, alpha=alpha, ntraits=2, theta=theta, names_traits=c("limb.length","limb.width")), model="OUM", nsim=1) ``` We now fit a single optimum and a multiple optimum Ornstein-Uhlenbeck process to each trait separately (i.e., univariate analysis), and then a single and a multiple OU process to both traits analyzed simultaneously (i.e., multivariate analysis): ```{r, comment=">"} # Fitting the Ornstein Uhlenbeck on the whole tree trait1_OU1<- mvOU(tree, data[,1], model="OU1", diagnostic=FALSE, echo=FALSE) trait2_OU1<- mvOU(tree, data[,2], model="OU1", diagnostic=FALSE, echo=FALSE) # Fitting the Ornstein Uhlenbeck with multiple optimums trait1_OUM<- mvOU(tree, data[,1], model="OUM", diagnostic=FALSE, echo=FALSE) trait2_OUM<- mvOU(tree, data[,2], model="OUM", diagnostic=FALSE, echo=FALSE) # Compare the AIC values between models fit AIC(trait1_OUM); AIC(trait1_OU1) AIC(trait2_OUM); AIC(trait2_OU1) # Now compare with the multivariate fit OUM<- mvOU(tree, data, model="OUM") OU1<- mvOU(tree, data, model="OU1") AIC(OUM); AIC(OU1) ``` After model fitting, we extract their related AIC or AICc using the expression "$AIC" or with the AIC() generic method. Despite of the low number of parameters to estimate in the univariate analysis (3 and 4 for the OU1 and OUM models, respectively), it is difficult to distinguish between the two models and the unique optimum OU1 is slightly preferred. In stark contrast, the multivariate analysis clearly favors the actual simulated (generating) model (two optima) with a delta AIC > 35. We can further assess the type I and type II error rates through simulations - using the generic simulate() function or mvSIM(). Note that parametric bootstrapping and simulations could be parallelized using forking with the **parallel** package to speed up the computations: ```{r, eval=FALSE} # Simulate 1000 traits under the two optima (OUM) and the unique optimum OU (OU1) process library(parallel) nsim=1000 # Dataset simulated with the OUM maximum likelihood estimates data1<-simulate(OUM, nsim=nsim, tree=tree) # Dataset simulated with the MLE of the OU1 model data2<-simulate(OU1, nsim=nsim, tree=tree) # Fit of the models using the parallel package (we will use 2 cores), can take a while... library(parallel) nb_cores=2L oum_data1<- mclapply(1:nsim, function(x){ mvOU(tree, data1[[x]], model="OUM", method="sparse", diagnostic=F, echo=F) }, mc.cores = getOption("mc.cores", nb_cores)) ou1_data1<- mclapply(1:nsim, function(x){ mvOU(tree, data1[[x]], model="OU1", method="sparse", diagnostic=F, echo=F) }, mc.cores = getOption("mc.cores", nb_cores)) # Now same simulations on the second dataset oum_data2<- mclapply(1:nsim, function(x){ mvOU(tree, data2[[x]], model="OUM", method="sparse", diagnostic=F, echo=F) }, mc.cores = getOption("mc.cores", nb_cores)) ou1_data2<- mclapply(1:nsim, function(x){ mvOU(tree, data2[[x]], model="OU1", method="sparse", diagnostic=F, echo=F) }, mc.cores = getOption("mc.cores", nb_cores)) # Retrieve the results from the simulations OUM_simul<-sapply(1:nsim, function(x){ c(oum_data1[[x]]$AICc,ou1_data1[[x]]$AICc) }) OU1_simul<-sapply(1:nsim, function(x){ c(oum_data2[[x]]$AICc,ou1_data2[[x]]$AICc) }) # Now compute the type I error and power (type II) sum(OU1_simul[1,]