Difference between revisions of "R Hackathon 1/Ancestral State Reconstruction"

From Phyloinformatics
Jump to: navigation, search
m
Line 9: Line 9:
 
   geotree <- drop.tip(geotree, "olivacea")
 
   geotree <- drop.tip(geotree, "olivacea")
  
IMPORTANT.  The row.names of your data.frame must match the tip.labels of your phylogeny.  In the example above, the row.names for geodata will match the tip.labels for geotree after "olivacea" has been culled.  However, the individual columns of geodata (e.g. geodata$wingL) do not automatically have the row.names of the whole data table associated with them! If you call a column of the data.table with aca without first dealing with this issue, the analysis will run, but the tip data will be disassociated from the proper tips! There are two workarounds.
+
IMPORTANT.  The row.names of your dataframe must match the tip.labels of your phylogeny.  In the example above, the row.names for geodata will match the tip.labels for geotree after "olivacea" has been culled.  However, the individual columns of geodata (e.g. geodata$wingL) do not automatically have the row.names of the whole data table associated with them! If you call a column of the data.table with ace without first dealing with this issue, the analysis will run, but the tip data will be disassociated from the proper tips! There are two workarounds.
  
 
One option is to sort the datatable so that the taxa appear in the same order in the table as in the phylogeny. For example:
 
One option is to sort the datatable so that the taxa appear in the same order in the table as in the phylogeny. For example:

Revision as of 13:17, 12 December 2007

The primary ancestral state reconstruction algorithms in R are accessed through the function "ace" in package "ape". To work through the following example using the Geospiza dataset, make sure that you have installed and loaded ape into your R session and loaded the Geospiza phylogeny and tip data into memory.

  library(ape)
  geotree <- read.nexus("geospiza.nex")
  geodata <- read.table("geospiza.txt")

Tip data is not available for the outgroup "olivacea", so drop that taxon from the analysis.

  geotree <- drop.tip(geotree, "olivacea")

IMPORTANT. The row.names of your dataframe must match the tip.labels of your phylogeny. In the example above, the row.names for geodata will match the tip.labels for geotree after "olivacea" has been culled. However, the individual columns of geodata (e.g. geodata$wingL) do not automatically have the row.names of the whole data table associated with them! If you call a column of the data.table with ace without first dealing with this issue, the analysis will run, but the tip data will be disassociated from the proper tips! There are two workarounds.

One option is to sort the datatable so that the taxa appear in the same order in the table as in the phylogeny. For example:

  geodata <- geodata[geotree$tip.label, ]

The other, and better option is to extract the column of data of interest from the overall datatable, creating a new vector, and to transfer the row.names of the datatable to the NAMES of the vector. This is the option least likely to inadverently disassociate the tip data from the tips.

  wingL <- geodata$wingL
  names(wingL) <- row.names(geodata)
  

The worked examples below assume that you have used the second solution (vector extraction and name assignment).

Reconstructing Ancestral States for Continuous Variables

There are three general options for continuous variables: reconstructions based on maximum likelihood (ML), reconstructions based upon phylogenetic independent contrasts (pic) and reconstructions based on generalized least squares.

MAXIMUM LIKELIHOOD This syntax will reconstruct the ancestral states for the variable "wingL" (extracted from geodata) using the Brownian motion-based maximum likelihood (ML) estimator of Schluter et. al. (1997), which is the default.

  MLreconstruction <- ace(wingL, geotree, type="continuous", method="ML")

ML reconstruction using ace tends to generate a large number of not-a-number error messages. Te result when the program calculates the likelihood of particular bad fits. You can safely ignore these messages.

PHYLOGENETIC INDEPENDENT CONSTRASTS This syntax will reconstruct the ancestral states for the variable "wingL" (extracted from geodata) using Felsenstein's (1985) phylogenetic independent contrasts (pic). This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored.

  picreconstruction <- ace(wingL, geotree, type="continuous", method="pic")

GENERALIZED LEAST SQUARES Reconstructions based upon generalized least squares require specification of a correlation structure for the generalized linear model. There are three basic options for the specification of the correlation structure: 1) corBrownian, which uses a simple Brownian motion model, corMartins, which uses an Ornstein-Uhlenbeck (constrained random-walk) model, and corGrafen, which is a modified Brownian motion model (Grafen 1989) (ADD MORE INFO ON WHAT EXACTLY THE GRAFEN MODEL DOES).

This is the syntax for the simple Brownian model

  GLSreconstruction <- ace(wingL, geotree, type="continuous", method="GLS", corStruct = corBrownian(1, geotree))

(CHECK RESULTS . . . fixes basal node at 0?)

The syntax for the Ornstein-Uhlenbeck model requires specifying an alpha parameter (here, 0.5).

  GLSreconstruction <- ace(wingL, geotree, type="continuous", method="GLS", corStruct = corMartins(0.5, geotree))

(CHECK RESULTS . . . gives wacky answers)

The syntax for the Grafen model requires specifying a rho parameter (here, 1) which exponentiates the recalculated branch lengths.

  GLSreconstruction <- ace(wingL, geotree, type="continuous", method="GLS", corStruct = corGrafen(1, geotree))

(This is also giving wacky answers .. . .)

WHAT ABOUT SQUARED CHANGE PARSIMONY?

Squared change parsimony is mathematically equivalent to a special case of Schluter et. al.'s maximum likelihood reconstruction in which branch length information is ignored. Thus, to perform a squared change parsimony reconstruction, set all branch lengths equal to 1 and calculate a maximum likelihood solution.

  geotreeones <- compute.brlen(geotree, 1)
  MLreconstruction <- ace(wingL, geotreeones, type="continuous", method="ML")

Reconstructing Ancestral States for Discrete Variables