R Hackathon 1/Phylogenetic Independent Contrasts

From Phyloinformatics
Jump to: navigation, search

Phylogenetic Independent Contrasts

This example assumes you have installed R and the package ape (if not start here) and downloaded the tree and dataset Geospiza from geospiza.rda. Upload the tree and data file into your R workspace as a phylo Class for the tree (read.nexus) and a dataframe for the data (read.table).

  > library(ape)
  > mydata<-read.table("geospiza.txt")
  > mytree<-read.nexus("geospiza.nex")

The following example will calculate phylogenetic independent contrasts (Felsenstein, 1985) using the pic function in ape.

IMPORTANT you can only calculate contrasts on a single variable, the soon to be released CAIC package will allow you to run more than one trait at a time as well as discrete traits.

WARNING your tree must be fully dichotomous and there must be no gaps in your data.

So remove the outgroup "olivacea" from the tree as there are no data for it in the dataframe.

  > mytree <- drop.tip(mytree, "olivacea")

Calculating Contrasts

Create a numeric vector for the traits wingL and tarsusL

  > wingL <- mydata$wingL
  > tarsusL <- mydata$tarsusL

However if you leave the data like this the analysis will run but the data will not be associated with the correct tips. So as long as your row names in your dataframe (mydata) match the tip labels in the phylogeny (mytree$tip.label) you can associate the data with the correct tip by:

  > names(wingL) <- row.names(mydata)
  > names(tarsusL) <- row.names(mydata)

To calculate contrasts that have been scaled using the expected variances:

  > ContrastwingL <- pic(wingL, mytree)
  > ContrasttarsusL <- pic(tarsusL, mytree)

If you want to extract the expected variances as well as the contrasts:

  > ContrastwingL <- pic(wingL, mytree, var.contrasts=TRUE)

If you don't want to scale the contrasts using the expected variances:

 > ContrastwingL <- pic(wingL, mytree, scaled=FALSE)

Displaying Contrasts

To see the contrasts of the wingL variable type:

  > ContrastwingL

To see the contrasts plotted at the appropriate nodes, when you have saved both expected variances and contrasts to the object ContrastwingL and ContrasttarsusL, use the function nodelabels:

  > plot (mytree)
  > nodelabels(round(ContrastwingL [,1], 3), adj = c(0, -0.5), frame="n")
  > nodelabels(round(ContrasttarsusL [,1], 3), adj = c(0, 1), frame="n")

So what is this code actually doing? it is plotting the first column ([,1]) of the object ContrastwingL to 3 decimal places (round(ContrastwingL [,1], 3). You will notice that the instructions for the 2 sets of contrasts are slightly different, this is to stop the 2 values being printed upon one another!

To see the contrasts plotted at the appropriate nodes, when you have only saved the contrasts to the object ContrastwingL and ContrasttarsusL, use the function nodelabels:

  > plot (mytree)
  > nodelabels(round(ContrastwingL, 3), adj = c(0, -0.5), frame="n")
  > nodelabels(round(ContrasttarsusL, 3), adj = c(0, 1), frame="n")

Correlated Trait Evolution

To run a linear regression on the wing and tarsus contrasts when we have not extracted the variances:

  > RegressTarsusWing <- lm(ContrastwingL~ContrasttarsusL -1)

The -1 specifies that the regression is through the origin (the intercept is set to zero) as recommended by Garland et al., 1992.

If you have extracted the expected variances there will be more than one column in the objects ContrastwingL and ContrasttarsusL so you need to specify that the contrasts are in the first column:

  > RegressTarsusWing <- lm(ContrastwingL [,1]~ContrasttarsusL [,1] -1)

To see the regression statistics:

  > summary.lm(RegressTarsusWing)

To plot the contrasts to visualize the regression - which is always wise to check that they meet the assumption of linear relationship between the 2 variables:

  > plot(ContrastwingL [,1], ContrasttarsusL [,1])

To add the regression line to the plot you add the regression coefficients from the linear regression.

  > abline(RegressTarsusWing)

You can change lots of things about the plot such as the line thickness, colour etc. to see more information type:

  > help(plot)

References

Felsenstein, J. (1985) Phylogenies and the comparative method. Am. Nat. 125, 1-15.

Garland, Jr. T., Harvey, P.H. & Ives, A. R. (1992) Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol 41, 18-32.

Purvis, A. & Rambaut, A. (1995) Comparative Analysis by Independent Contrasts (CAIC): an Apple Macintosh application for analysing comparative data. Bioinformatics 11(3), 247-251.