R Hackathon 1/Divergence Time Estimation

From Phyloinformatics
Jump to: navigation, search

Many of the comparative methods require a ultrmetric tree. Currently, there are many programs available that estimate divergence times (e.g., PAUP*, paml, BEAST, r8smultidivtime, PATHd8). Different methods making different assumptions about how the tree may be parameterized with respect to branching times. The other available way to estimate divergence times is in R with the ape package. In each of the following examples you will need a rooted tree with branch lengths. Likewise, your tree will need to be dichotomous (i.e., with no polytomies), therefore it might need a little massaging. See the page on Tree & Data manipulation.

How do I estimate divergence times using nonparametric rate smoothing (NPRS)

The first step for each of these methods is to load the functions from the ape package:


next, you will want to read in your rooted tree with branch lengths equal or proportional to number of base pair substitutions with the read.tree command:

mytree <- read.tree(file="PATH_TO_FILE")


mytree <- read.nexus(file="PATH_TO_FILE")

depending on how your tree is formatted. The variable mytree is now an object of class phylo. This tree can be used with all of the following examples. The first is to transform your branch lengths using nonparametric rate smoothing (NPRS; see Sanderson, 1997). This is achieved by issuing the command:

chronotree <- chronogram(mytree)

This command takes three additional subcommand:

scale-- This assigns a age to the root of the tree.

expo-- This defines the exponent of the exponential function.

minEdgeLength-- Minimum edge length in the phylogram (default value: 1e-06). If any branches in the tree are shorter then this value, then they will be assigned it.

It is then possible to view the tree by passing the chronogram argument to the plot function:


Likewise, you can save the tree to file by passing the chronogram argument to the write.tree function:

write.tree(chronogram(mytree), file="/Users/cbell/tree")

where "/Users/cbell/tree" is the path to where I want the file to be saved.

How do I estimate divergence times using penalized likelihood (PL)

This next function estimates the node ages of a tree using a semi-parametric method based on penalized likelihood (Sanderson 2002).

chronopl(phy, lambda, node.age = 1, node = "root", CV = FALSE)

The branch lengths of the input tree are interpreted as (mean) numbers of substitutions where 'phy' is an object of class "phylo." lambda equals a value of the smoothing parameter; node.age is a numeric values specifying the fixed node ages; 'node' is the numbers of the nodes whose ages are given by node.age; "root" is a short-cut for the number of the node; and 'CV' is whether to perform cross-validation (see Sanderson, 2002). One thing to keep in mind, however, is that the likelihood function is calculated based on a Poisson approximation using the number of substitutions observed to perform the calculations. This is not a problem if parsimony was used to calculate branch lengths, where the branch length represents the inferred number of changes that occurred along any given branch. If maximum likelihood was used to infer branch lengths, the values are in expected substitutions per site. In the program r8s, the user provides the number of sites used to infer the branch lengths to convert branch lengths to the observed number of substitutions. In ape, there is no such conversion/option. The user my want to convert there user tree branch lengths to observed number of substitutions by issuing the following commands in R:

x <- mytree$edge.length

for (i in x)

mytree$edge.length <-y

In the example above, 'mytree' is the tree with branch lengths (edge.length) I read into R. I then converted the branch lengths by a factor of 7999 (the number of sites used in the original dataset to calculate the branch lengths).

Now your tree is ready for the penalized likelihood analysis. However, you need to determine a lambda value (smoothing parameter). Determining an appropriate 'lambda' value is the crux of the matter. This is where the cross-validation procedure comes in.

l <- 10^(-1:6)
cv <- numeric(length(l))

for (i in 1:length(l))
    cv[i] <- sum(attr(chronopl(mammals, lambda = l[i], CV=TRUE), "D2"))
plot(l, cv)

How do I estimate divergence times using mean path length (MPL)

Another method available in ape is the mean path length method of Britton et al. (2002, 2007). This is achieved by issuing the command:


This function has two sub commands:

se-- a logical specifying whether to compute the standard-errors of the node ages (TRUE by default).

test--a logical specifying whether to test the molecular clock at each node (TRUE by default).

The tests performed if test = TRUE is a comparison of the MPL of the two subtrees originating from a node; the null hypothesis is that the rate of substitution was the same in both subtrees (Britton et al. 2002). The test statistic follows, under the null hypothesis, a standard normal distribution. The returned P-value is the probability of observing a greater absolute value (i.e., a two-sided test). No correction for multiple testing is applied: this is left to the user.

Credit: Some of the information on this page is paraphrased from the book Analysis of Phylogenetics and Evolution with R" (Paradis, 2006) or from documentation within ape itself.