R Hackathon 1/ContinuousData

From Phyloinformatics
Jump to: navigation, search

Inferring Models for Continuous Characters in R

Most investigations into the evolution of continuous characters start with the Brownian Motion (BM) model. This model is analytically very tractable, and is the basis for a variety of comparative method approaches (e.g., independent contrasts, most instances of phylogenetic GLS). A variety of extensions to BM have been developed; here we focus on the BM model and one alternative: an Ornstein-Uhlenbeck (OU) model with a single optimum.

Fitting Evolutionary Models

Start by reading in the Geospiza data and extracting the the wingL varable into a variable called 'wL'

  data(geospiza)
  wL<- geospiza$geospiza.data[,1] 
  names(wL)<- row.names(geospiza$geospiza.data)

Get the Geospiza tree, call it 'tr.' Then drop the olivacea species and save the new tree as 'tr2.'

 tr<- geospiza$geospiza.tree
 tr2<- drop.tip(tr, "olivacea")

Now, we are ready to fit the BM using geiger's fitContinuous function.

  BMfit <- fitContinuous(tr2, wL, model="BM")
  BMfit

Note that the reported rate parameter ('beta') is 0.0705, with a log-likelihood of 8.243, and a AICc of -11.153.

We can also fit an Orstein-Uhlenbeck model with one optimum to these data. According to this model, there exists an optimal phenotypic value, to which evolutionary trajectories are attracted. This approach has been used to model adaption on macroevolutionary time scales.

 OUfit <- fitContinuous(tr2, wL, model="OU")
 OUfit

The solution for this OU model gives a rather high alpha parameter, which measures the strength of the attraction to the optimum. This estimate should be close to alpha = 15.79, which corresponds to a phylogenetic half-life of log(2)/alpha = 0.043 units. By plotting the tree and adding an axis,

 plot(tr2)
 axisPhylo()

you can see that this is a rather short duration relative to the temporal span of the tree. This is indicative of evolutionary trajectories that are rapidly drawn to the phenotypic optimum.

Comparing the Fit of Evolutionary Models

Because the two fitted models have different numbers of parameters (two for BM, three for OU), their log-likelihoods cannot be compared directly. One useful way to compare the performance of these models is though their Akaike Information Criterion, which balances goodness of fit (log-likelihood) against model complexity as measured by the number of parameters. For relatively small data sets, it is generally recommended to use a bias-corrected version of the AIC, often called AICc. Lower values for AICc indicate more support, and so the BM model, with an AICc score of -11.15, is more strongly supported than the OU model (AICc = -10.68). However, the difference in scores is not great, and so neither model conclusively outperforms the other.

A Note about Multiple Implementations of these Methods

Currently, there are multiple implementations of several of these models across different packages. The results are not always the same across methods, however, because either the purpose of the analysis is slightly different across packages, or the implementations use different computational strategies. The models that currently have multiple implementations as of this writing (13 December 2007) are (i) Brownian motion and (ii) Ornstein-Uhlenbeck with one optimum.

Brownian motion

Currently, there are three packages that implement inference of the BM model: ape in function ace(), geiger in function fitContinuous(), and ouch in function brown.fit(). The function ace() will return a log-likelihood and a parameter estimate for the BM model, but these are conditional upon the reconstructed trait values of the nodes. This is consistent with the main purpose of ace(), but these results are not comparable with other model fits (which are not usually conditioned in this way). Thus, unless your main focus is on ancestral reconstructions, you should probably be using the functions in either geiger or ouch to fit the BM model.

The functions in geiger and ouch share the approach of not conditioning the model fits on particular values at ancestral nodes. Although their computational strategies differ somewhat, they seem to produce log-likelihoods and rate estimates that are very nearly the same (note that rate parameter in ouch, "sigma", is the square root of the rate parameter in geiger, "beta").

Ornstein-Uhlenbeck

Both geiger and ouch can fit OU models with a single optimum (only ouch can currently fit OU models with more than one optimum). Again, the approaches differ in algorithmic approach. The log-likelihoods produced seem to be very similar, but the parameter estimates may not be. Differences in the parameter estimates reflect that this can be a very difficult model to fit, as there is often a very large ridge over which the log-likelihood changes very little. The differences can be small enough so that optimization routines can differ as to where they stop along this ridge.